[bksvol-discuss] Re: 550 books in the download queue

  • From: Guido Corona <guidoc@xxxxxxxxxx>
  • To: bksvol-discuss@xxxxxxxxxxxxx
  • Date: Tue, 10 Aug 2004 10:34:28 -0500

Kenneth, I do agree that we cannot make things perfect, especially if
graphs and pictures are involved, which typically generate optical noise.
On the other hand, I do urge all scanning volunteers to perform a modicum
of cleanup of their materials prior to posting to the system.

1.  PLEASE do perform a book integrity check.  If a print copy has 320 
pages, there is no reason whatsoever that the etext should contain 1760 
pages instead. 
There should be a 1 to 1 correspondence in the body text between printed
pages and etext pages. Any volunteer can in most cases easily remove some
duplicate pages, but can't fix things when they are totally out of
kilter. Furthermore, it is a lot easier for the submitter to integrate any
missing pages, as he/she still has access to the print copy.
In most cases,  even if a few missing pages need to be inserted,  a page 
integrity check for a book does not take more than 15 minutes and would 
save all reviewers a lot of wasted time.
In some cases there is -- as already mentioned -- a tremendous discrepancy 
between pages in the printed book and etext pages,  where the etext has 5 
or 6 times the number of pages in the original.  This likely points to 
scanning/OCR settings that are way off,  or an OCR package being used 
which is less than adequate. 
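
For submitters who can export their scan as plain text, the page integrity
check described above can be roughly sketched in Python. This is only an
illustration under stated assumptions: it assumes pages are separated by
form-feed characters, and the names split_pages, normalize and
integrity_report are hypothetical, not part of Kurzweil or any Bookshare
tool.

```python
from collections import defaultdict

PAGE_BREAK = "\f"  # assumption: the exported etext separates pages with form feeds


def split_pages(etext):
    """Split the etext into pages, ignoring an empty page after the last break."""
    pages = etext.split(PAGE_BREAK)
    if pages and not pages[-1].strip():
        pages.pop()
    return pages


def normalize(page):
    """Collapse whitespace and case so trivially different copies compare equal."""
    return " ".join(page.split()).lower()


def integrity_report(etext, printed_page_count):
    """Compare the etext page count to the print copy and list duplicate pages."""
    if printed_page_count <= 0:
        raise ValueError("printed_page_count must be positive")
    pages = split_pages(etext)
    seen = defaultdict(list)
    for number, page in enumerate(pages, start=1):
        key = normalize(page)
        if key:  # blank pages are not counted as duplicates of each other
            seen[key].append(number)
    duplicates = [numbers for numbers in seen.values() if len(numbers) > 1]
    return {
        "etext_pages": len(pages),
        "printed_pages": printed_page_count,
        "ratio": len(pages) / printed_page_count,
        "duplicates": duplicates,
    }
```

A ratio close to 1 with no duplicates suggests the pages line up. A ratio
of 5 or 6, as in the case mentioned above, points to a settings problem
that is better fixed by rescanning than by hand.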

2.  Lots of broken page headers make a book very tiring to read.  Please 
fix them or remove them.  Kurzweil lets you remove page headers 
automatically. Version 8 was a little radical in this regard and ended up
also removing stuff it was not supposed to. The newest version 9, just
announced yesterday, now has an option for 'careful' header removal.
Yesterday I worked on one of your books. Kurzweil removed 190 headers,
and I removed approximately 120 more manually. It took me all of 15
minutes to do the cleanup. As I had already performed a page integrity
check and had come up with perfect correspondence after removal of
duplicate pages, I also removed page numbers to do a faster job. I just
want to get the backlog down quickly.
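
For anyone working outside Kurzweil, a conservative kind of header removal
can be sketched in Python. This is not Kurzweil's algorithm, which I have
not seen. It is a simple heuristic with hypothetical names: the first text
line of a page is treated as a running header only if the same line,
ignoring digits, opens at least min_repeats pages.

```python
import re
from collections import Counter


def header_key(line):
    """Ignore digits so that '12 MY BOOK' and 'MY BOOK 13' compare equal."""
    return re.sub(r"\d+", "", line).strip().lower()


def first_text_line(lines):
    """Return the index of the first non-blank line, or None for a blank page."""
    for index, line in enumerate(lines):
        if line.strip():
            return index
    return None


def remove_running_headers(pages, min_repeats=3):
    """Drop a page's first text line only if it recurs on enough pages."""
    split = [page.splitlines() for page in pages]

    # First pass: count how often each candidate header opens a page.
    counts = Counter()
    for lines in split:
        index = first_text_line(lines)
        if index is not None:
            key = header_key(lines[index])
            if key:  # a bare page number has an empty key and is left alone
                counts[key] += 1

    # Second pass: remove only the candidates that repeat often enough.
    cleaned = []
    removed = 0
    for lines in split:
        index = first_text_line(lines)
        if index is not None:
            key = header_key(lines[index])
            if key and counts[key] >= min_repeats:
                lines = lines[:index] + lines[index + 1:]
                removed += 1
        cleaned.append("\n".join(lines))
    return cleaned, removed
```

Headers that OCR has mangled will not match their clean copies, so a pass
like this still leaves some for manual removal.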

3.  Ah yes, those amazing synopses. Let us all try to be informative.
The short one should give our paying customers a very brief sense of the 
book.  If we are inclined to give more detailed info,  a longer and useful 
description can go in the long synopsis.  I confess I hardly ever make up 
my own synopsis,  but I liberally borrow from the front matter of the 
book,  the back cover of paperbacks and the front and back flaps of hard 
covers. Synopses such as "Set in Alabama", "It's all in the Title",
"Thorough Treatment of the matter at hand" are unfortunately not 
noticeably helpful and will only cause our paying customers to get 
irritated and lose faith in Bookshare.

Of course,  there is a lot more that submitters can do to improve the 
quality of their postings, but even a little upfront cleanup and a
clean submission process will enable us to offer a quality product to OUR 
PAYING CUSTOMERS.

NOTE:  By the way,  Kurzweil will start shipping K1000 version 9 shortly. 
I have used the beta and found the OCR quality improved over earlier
versions. I will post the announcement shortly. Cost of the upgrade is
$95 or $0.00,  depending on the status of your account.


Guido D. Corona
IBM Accessibility Center,  Austin Tx.
IBM Research,
Phone:  (512) 838-9735
Email: guidoc@xxxxxxxxxxx

Visit my weekly Accessibility WebLog at:
http://www-3.ibm.com/able/weblog/corona_weblog.html
