[bksvol-discuss] Re: Any easy way in Word to convert book submitted as two-column rtf ?

  • From: Cindy <popularplace@xxxxxxxxx>
  • To: bksvol-discuss@xxxxxxxxxxxxx
  • Date: Sun, 14 Mar 2010 01:38:35 -0800 (PST)

I think what Melissa is seeing is a result of the book being scanned in 
two-page mode, with some of the sentences from one page leaking onto the other. 
I've proofed several books that were as she describes.
Cindy

Wish List (i.e., books wanted added to the collection) and books-being-scanned 
list available at sites below



Wish List: https://wiki.benetech.org/display/BSO/Bookshare+Wish+List

Books Being Scanned List: 
https://wiki.benetech.org/display/BSO/Books+Being+Scanned+List


--- On Sat, 3/13/10, Mike <mlsestak@xxxxxxxxxxxxx> wrote:

> From: Mike <mlsestak@xxxxxxxxxxxxx>
> Subject: [bksvol-discuss] Re: Any easy way in Word to convert book submitted 
> as two-column rtf ?
> To: bksvol-discuss@xxxxxxxxxxxxx
> Date: Saturday, March 13, 2010, 10:59 PM
> Melissa,
> 
> I don't know if either of these is what you are
> encountering (and since you could reject the version with
> the problems, it doesn't matter much, but I thought I'd
> throw in my oar on what I've seen anyway), but I have seen
> two cases where Word has problems with search and replace on
> section breaks.
> 
> The first case: if there is a section break inside a table,
> Word will claim to find and replace it, but nothing actually
> seems to happen. I've seen this when the OCR decides to make
> the table of contents into a Word table, or when the OCR
> decides to make the page header into a table.
> 
> The second case is a little more complicated to explain.
> First, think of section breaks as being like chapter
> indicators (that's how Word expects them to be used). There
> are two kinds of section breaks for this. Most chapters
> begin on a new page, so there are section breaks that start
> both a new section and a new page. These are what OCR
> programs put in most of the time. But there are books with
> chapters that begin in the middle of a page, so Word also
> has section breaks that start a new section but not a new
> page.
> 
> When you search and replace a section break that is also a
> page break, to convert it to just a page break, all works
> properly. When you try to convert a non-page-break section
> break to a page break, Word seems to get confused. If you
> search and replace these section breaks with page breaks,
> Word may delete the section break without putting in a page
> break, but it will turn the next section break into a
> non-page-break section break. When you then delete that one,
> it too is removed without a page break being put in, and so
> on until all the section breaks are removed from your book
> with no page breaks put in to replace them. Sometimes Word
> just will not replace these non-page-break section breaks
> (that is, even though it will find them, it will not
> replace them).
> 
> When Carrie Karnos created a bunch of books that had both
> kinds of section breaks in them, because someone changed the
> settings in the OCR program at Bookshare, I found out that
> doing the search and replace from end to beginning instead
> of beginning to end worked correctly (it made the page-break
> section breaks into page breaks and removed the
> non-page-break section breaks). I don't actually know why
> this works or even why I tried it in the first place.
> 
> Misha
> 
> Melissa Smith wrote:
> > I don't know, but I have seen those section breaks
> > before, the ones that Word doesn't find with ^b. I
> > rejected it, because there was another copy of the same
> > book on the checkout page that was a better copy. I would
> > like to know what those section breaks are, though.
> > 
> > Melissa Smith
> > 
> > 
> > On 3/13/2010 2:57 PM, Judy s. wrote:
> >> I'm proofreading a young adult novel that's really had
> >> me frustrated.
> >> 
> >> Every page is really two pages. They obviously scanned
> >> it two pages at a time, and when it was OCRed they
> >> didn't convert it correctly. Every "page" in the rtf
> >> ended up really being two pages, coded in Word as two
> >> side-by-side columns.
> >> 
> >> The book has zero page breaks. They are all section
> >> breaks, which are usually easy to convert. In this
> >> case, when I convert them, it runs the two columns
> >> (that are really two separate pages, side by side)
> >> together. On top of that, it gives me a book that is
> >> one long column and only one letter wide! Then, it
> >> still has a kind of section break that I've never seen
> >> before, occurring on pages that have footnotes. The ^b
> >> command does not find those, and I can't get Word to
> >> copy them, so I can't figure out an ASCII code for them
> >> that way. I can't delete them easily, either. I've had
> >> to go through the book looking for them visually,
> >> putting a blank line before and after each one,
> >> highlighting that little section, and then deleting it.
> >> I did a Google search and haven't come up with a code
> >> for it either.
> >> 
> >> Has anyone found a way using Word to easily convert a
> >> book like this into text that correctly has the pages
> >> one after another instead of side by side? Highlighting
> >> the entire book and removing the columns didn't work. I
> >> tried that several different ways.
> >> 
> >> I finally figured out a messy brute-force way to do it,
> >> by grabbing all the text and dumping it into a new rtf
> >> file as a special paste with no formatting. That gives
> >> me the text pretty much correctly (not completely;
> >> sometimes the columns are still intermingled), but I
> >> have to put in all the page breaks individually now.
> >> That isn't too bad, because it was missing half of the
> >> page breaks anyway. However, I can only find the
> >> missing ones by comparing the original rtf visually
> >> with my new rtf, since half of the page numbers are
> >> missing. Yuck.
> >> 
> >> Any thoughts on other ways to do this are welcome! The
> >> scan, by the way, is beautiful to look at if you are
> >> sighted. It is an exact match to what the book must
> >> have looked like in printed form. But it's totally
> >> wrong for what we need! It's been checked out and
> >> released by several volunteers before me, and I sure
> >> know why! smile.
> >> 
> >> Judy s.
> >> 
> >> To unsubscribe from this list send a blank Email to
> >> bksvol-discuss-request@xxxxxxxxxxxxx
> >> put the word 'unsubscribe' by itself in the subject
> >> line. To get a list of available commands, put the
> >> word 'help' by itself in the subject line.
> >> 
> >> 
> 

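A note on Mike's two kinds of section break, quoted above: in an RTF file they show up as control words, so the same conversion can be tried outside Word. What follows is only a minimal Python sketch under stated assumptions: the file is plain RTF text, `\sect` ends a section, `\sbknone` among the properties that follow marks a break that does not start a new page, and `\page` is an ordinary page break. The function name and the rule of scanning only as far as the next `\par` or `\pard` are illustrative choices, not anything from the thread. The loop walks the breaks from the end of the text to the beginning so that earlier offsets stay valid; whether that has anything to do with why end-to-beginning works in Word is not known.

```python
import re

# \sect, but not \sectd or any longer control word; also eats one delimiter space.
SECT = re.compile(r"\\sect(?![a-z]) ?")
# Assumption: section properties end where paragraph content starts (\par or \pard).
PARA_START = re.compile(r"\\pard?(?![a-z])")
# Break type meaning "new section, no new page".
CONTINUOUS = re.compile(r"\\sbknone(?![a-z])")


def convert_section_breaks(rtf: str) -> str:
    """Turn new-page section breaks into page breaks and drop continuous ones."""
    matches = list(SECT.finditer(rtf))
    # Work from the last break back to the first so earlier offsets are unaffected.
    for m in reversed(matches):
        tail = rtf[m.end():]
        stop = PARA_START.search(tail)
        props = tail[:stop.start()] if stop else tail
        if CONTINUOUS.search(props):
            replacement = ""          # continuous break: just remove it
        else:
            replacement = "\\page "   # new-page break (the RTF default): keep the page break
        rtf = rtf[:m.start()] + replacement + rtf[m.end():]
    return rtf


if __name__ == "__main__":
    sample = r"one\par\sect \sectd\sbkpage\pard two\par\sect \sectd\sbknone\pard three\par"
    print(convert_section_breaks(sample))
```

The sketch leaves the old section properties (`\sectd`, `\sbk...`, `\cols...`) in place, so it is best tried on a copy of the file and checked in Word afterwards.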

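For a file like the one Judy describes above, where ^b misses some breaks and the pages sit in side-by-side columns, it may help to count which break and column control words the RTF actually contains before converting anything. This is a hedged Python sketch: the list of control words worth counting is an assumption, and the thread does not say which control word her footnote-page breaks use, so an unexpected entry in the output is a lead to follow rather than an answer.

```python
import re
from collections import Counter

# An RTF control word: backslash, lowercase letters, optional numeric parameter.
CONTROL_WORD = re.compile(r"\\([a-z]+)(-?\d+)?")
# Assumption: these are the words relevant to section breaks, page breaks and columns.
INTERESTING = {"sect", "page", "sbknone", "sbkpage", "cols"}


def summarize_breaks(rtf: str) -> Counter:
    """Count break-related and column-related control words in RTF text."""
    counts = Counter()
    for word, param in CONTROL_WORD.findall(rtf):
        if word in INTERESTING:
            # Keep the parameter for \cols so \cols1 and \cols2 are counted separately.
            key = word + param if word == "cols" else word
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = r"{\rtf1\sectd\cols2 left\par\sect\sectd\sbknone\cols2 right\page end}"
    for name, n in sorted(summarize_breaks(sample).items()):
        print(name, n)
```

The pattern does not treat an escaped backslash specially, so literal text such as `\\sect` would be miscounted; for a quick look at an OCR export that is probably acceptable.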
      
