[bksvol-discuss] Re: Advice needed for submission with no page numbers

  • From: "Gerald Hovas" <GeraldHovas@xxxxxxxxxxx>
  • To: <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Wed, 18 Oct 2006 23:10:57 -0500

Monica,

Headers are often smaller, sometimes in a different font, and sometimes in a
different color than the regular text.  This makes them more difficult for the
OCR software to recognize.  K-1000 will have the same problem recognizing
them, though, because it also uses FineReader and OmniPage to recognize
text.

When I mentioned stripping headers, I was speaking of the header text, not
the entire header.  If you strip out the entire header, the page numbers go
with it.  If the book begins numbering the prologue or first chapter with
page 1, you've then changed the page numbering so that page 1 is the first
page in the scan, not the first page of the main text.  That will cause the
page numbering to be off in the Bookshare copy.
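
To put numbers on that shift, here is a rough sketch.  It assumes the pages
are renumbered sequentially starting from the first scanned page;
tool_page_number and front_matter_pages are names made up for illustration,
not part of any Bookshare tool.

```python
def tool_page_number(printed_page, front_matter_pages):
    """Page number a sequential renumbering would assign to a printed page.

    Assumption: numbering restarts at 1 on the first scanned page, so every
    printed page is pushed back by the count of scanned pages that come
    before printed page 1 (title page, contents, and so on).
    """
    return printed_page + front_matter_pages


# With 6 scanned pages ahead of the main text, printed page 1 ends up as 7.
shifted = tool_page_number(1, 6)
```

The more front matter in the scan, the further every page number drifts
from the print book.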

Using OpenBook or K-1000 to strip headers/footers also makes it impossible
for a validator to verify that all of the pages are present by referring to
the page numbers that did scan properly, since they get stripped right along
with the text.  While you may think it's easy to verify this by reading the
book all the way through, I've found at least one instance where it was
impossible to determine that two pages had been skipped when I proofed a
book because there appeared to be no break in the story.  The only reason I
knew some pages had been skipped was because the page numbers didn't match
up later in the book.  Since FineReader had trouble recognizing about 20
page numbers in that area of the book, I was only able to determine which
pages I'd missed by referring back to the print book.  Once I found the
pages and inserted them, I could see how they added to the overall plot, but
the sentence at the bottom of the page prior to the missing pages flowed
smoothly into the sentence at the top of the page following the missing
pages.
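
The page-number check described above could be sketched roughly like this.
It is only an illustration, not how any validator tool actually works: it
assumes you already have one entry per scanned page, in scan order, holding
the page number the OCR read or None where it could not be read.
find_page_gaps is a hypothetical name.

```python
def find_page_gaps(page_numbers):
    """Return (last_read, next_read, missing_count) for each jump found.

    page_numbers: one entry per scanned page, in scan order; an int where
    the page number was recognized, None where it was not (assumed input).
    """
    gaps = []
    last_number = None  # most recent page number that was recognized
    unread_since = 0    # scanned pages since then with no readable number
    for number in page_numbers:
        if number is None:
            unread_since += 1
            continue
        if last_number is not None:
            # Unreadable pages still count as one page each.
            expected = last_number + unread_since + 1
            if number != expected:
                gaps.append((last_number, number, number - expected))
        last_number = number
        unread_since = 0
    return gaps


# Three scanned pages with unreadable numbers sit between 10 and 16,
# so two pages were skipped somewhere in that stretch.
gaps = find_page_gaps([9, 10, None, None, None, 16, 17])
```

Note that when the unreadable numbers fall in the same stretch as the
skipped pages, a check like this can only say how many pages are missing
between two readable numbers, not which ones, so you still have to go back
to the print book.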

So, I don't recommend using OpenBook's built-in tool (or K-1000's, for that
matter) to strip headers/footers.

HTH

Gerald

-----Original Message-----
From: bksvol-discuss-bounce@xxxxxxxxxxxxx
[mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On Behalf Of Monica Willyard
Sent: Wednesday, October 18, 2006 10:37 PM
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] Re: Advice needed for submission with no page
numbers

Gerald, when I scan, I frequently get clear text but get headers that 
are mostly garbage.  I use Openbook 7 and just assumed that everyone 
else gets gibberish for headers too.  The numeric page numbers are 
usually legible,  but Roman numerals are usually messed up like my 
headers.  Do you have any idea of what I'm doing wrong to get that 
result?  Since the headers are often so messy and aren't uniform, I 
typically tell Openbook to strip them out and let the Bookshare tool 
number the pages.  I do this because you wrote a while back that the 
stripper looks for uniform headers to know when to strip things, so 
I've been stripping them myself instead.

Monica Willyard

At Wednesday 10/18/2006 11:08 PM, you wrote:
>OpenBook recognizes page numbers just as well as K-1000.  That's because
>they both use the same OCR packages to do the recognition.  What it doesn't
>do is allow you to renumber the page numbers it assigns so that its
>numbering agrees with the page numbers in the headers so you can easily
>determine if pages are missing.
>
>HTH
>
>Gerald

 To unsubscribe from this list send a blank Email to
bksvol-discuss-request@xxxxxxxxxxxxx
put the word 'unsubscribe' by itself in the subject line.  To get a list of
available commands, put the word 'help' by itself in the subject line.

