[bksvol-discuss] Re: Quality checks procedure -- was WRe: Re: Self-validation

  • From: "Gary Wunder" <gwunder@xxxxxxxxxxxxx>
  • To: <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Fri, 18 Jun 2004 14:02:53 -0500

Nice list - thank you.


  ----- Original Message ----- 
  From: Guido Corona 
  To: bksvol-discuss@xxxxxxxxxxxxx 
  Sent: Friday, June 18, 2004 1:55 PM
  Subject: [bksvol-discuss] Quality checks procedure -- was WRe: Re: Self-validation



  I will not get into the self-validation debate.  But I suggest that both 
scanners and validators go through a set of quality checks that are as 
mechanized as possible, to avoid the 'my book' vs. 'someone else's book' 
problem.  I found that the following set of checks tends to produce rather 
high-quality results: 


  1.  Page Integrity Check:  sample one page out of every 20. 

  2. Chapter header check:  works best if the book uses the word 'chapter' or 
some other marker you can search for. 

  3. Page header check: lately I have been stripping headers manually and 
leaving only page numbers.  Definitely tedious, as I do it on each and every 
page. 

  4.  During the previous step I also check the first word of each page.  
If the word looks like a fragment, I merge it with the last word on the 
previous page if appropriate. 

  5.  Search for a short dash followed by a space, and a short dash followed by 
a newline.  These searches will find all sorts of words that were split at the 
end of lines or pages, so they can be repaired. 
    
  6.  Search for the tab char (\t in k1k).  This is most often a junk char, 
especially abundant at the beginning and end of lines.  You will frequently 
find it next to other junk chars, or to single letters that have no 
business being there.  Manually remove each of these nasty clusters as 
appropriate. 
    
  7.  Junk char hunt.  Look for junk chars that can be typed from the keyboard.  
Start from the top left of the keyboard and work your way down to the bottom 
right.  Remove or repair each one manually as required. 

  8.  Non-keyboard junk chars:  Jim Pardee also suggested we keep a file 
containing the chars that cannot be typed from the keyboard, so we can copy and 
paste them into the Find dialog to search for them in the document. 

  9.  Look for whole words consisting of the digit '1'.  In many cases you 
should change them to 'I'.  Sometimes they should be deleted.  Make each change 
manually as appropriate. 

  10.  Look for the digit 1 followed by an apostrophe:  in most cases it should 
be changed to I followed by an apostrophe (1'm becomes I'm). 

  11.  Look for an apostrophe followed by the digit 1.  In most cases it is 
part of '11, which should become 'll (you'11 becomes you'll). 

  12.  Do a mass replacement of each pair of single quotes ('') with one double 
quote ("). 

  13. Remove/fix single-letter words:  start with 'b', since 'a' is a real word.  
The search should be case-insensitive, except for 'i', which should be searched 
in lower case only.  Delete or repair each occurrence manually as appropriate.  
Be careful: you may be deleting someone's middle initial. 

  14. Spell check:  This step should remove most residual problems, except for 
scanos (scanning errors) that happen to produce valid English words.   
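  The search steps above (5, 6, 9, 10, 11 and 13) lend themselves to a simple 
pattern scan.  What follows is a hypothetical Python sketch, not a feature of 
k1k or of any Bookshare tool; it assumes the book has been exported to a 
plain-text string, and each regex only approximates its check.  In keeping with 
the advice to fix each occurrence by hand, it changes nothing and only reports 
line and column positions.

```python
import re
from typing import List, NamedTuple


class Hit(NamedTuple):
    line: int     # 1-based line number in the exported text
    column: int   # 1-based column where the match starts
    check: str    # which numbered check produced the hit
    context: str  # the whole line, shown for manual review


# Hypothetical pattern table: each regex only approximates the numbered check.
CHECKS = [
    # 5: hyphen followed by a space or by the end of the line
    ("5 hyphen at break", re.compile(r"-(?= |$)")),
    # 6: tab characters, usually junk
    ("6 tab", re.compile(r"\t")),
    # 9: a lone digit 1 standing as a whole word (often a misread 'I')
    ("9 lone digit 1", re.compile(r"(?<![\w'])1(?![\w'])")),
    # 10: digit 1 followed by an apostrophe, as in 1'm
    ("10 digit 1 then apostrophe", re.compile(r"\b1'")),
    # 11: apostrophe followed by digit 1, as in you'11
    ("11 apostrophe then digit 1", re.compile(r"'1")),
    # 13: single letters other than 'a'; capital 'I' is a word, lowercase 'i'
    # is not.  Letters inside contractions (don't) are skipped, but middle
    # initials will still be reported, so review before deleting.
    ("13 single letter", re.compile(r"(?<![\w'])(?:[b-hj-zB-HJ-Z]|i)(?![\w'])")),
]


def find_suspects(text: str) -> List[Hit]:
    """Report every match; nothing is changed, so each fix stays manual."""
    hits: List[Hit] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in CHECKS:
            for match in pattern.finditer(line):
                hits.append(Hit(lineno, match.start() + 1, label, line))
    return hits
```

  Printing each Hit from find_suspects gives a worklist.  Expect false 
positives such as middle initials or a genuine number 1, which is exactly why 
the fixes themselves stay manual.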
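  Step 12, the page sampling in step 1, and Jim Pardee's file of 
non-keyboardable characters are mechanical enough to sketch as well.  Again, 
this is a hypothetical Python illustration working on exported plain text.  
Treating string.printable (printable ASCII) as "what the keyboard can type" is 
an assumption, so legitimate accented letters will also be reported and should 
be left alone.

```python
import string
from collections import Counter
from typing import List

# Assumption: "typeable from the keyboard" is approximated by printable ASCII.
KEYBOARD_CHARS = set(string.printable)


def fix_double_single_quotes(text: str) -> str:
    """Step 12: turn each pair of single quotes into one double-quote character."""
    return text.replace("''", '"')


def non_keyboard_chars(text: str) -> Counter:
    """Count characters outside printable ASCII, for a copy-and-paste search file."""
    return Counter(ch for ch in text if ch not in KEYBOARD_CHARS)


def sample_pages(page_count: int, every: int = 20) -> List[int]:
    """Step 1: 1-based page numbers to spot-check, one page in every `every`."""
    return list(range(1, page_count + 1, every))
```

  For a 614-page book, sample_pages(614) lists pages 1, 21, 41 and so on up to 
601.  The quote replacement is the only one of these that edits text, matching 
the list's single mass-replacement step.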

  Hope this helps. 

  Guido D. Corona
  IBM Accessibility Center,  Austin Tx.
  IBM Research,
  Phone:  (512) 838-9735
  Email: guidoc@xxxxxxxxxxx

  Visit my weekly Accessibility WebLog at:
  http://www-3.ibm.com/able/weblog/corona_weblog.html




  From: Nolan Crabb <aa3go@xxxxxxxxxxx>
  Sent by: bksvol-discuss-bounce@xxxxxxxxxxxxx
  Date: 06/18/2004 12:23 PM
  Please respond to: bksvol-discuss
  To: bksvol-discuss@xxxxxxxxxxxxx
  Subject: [bksvol-discuss] Re: Self-validation

  Rui warned against the urge to self-validate.  I completely concur!

  I've made a living editing my work and that of others for years.  The harsh 
  truth is, your own errors are much easier to miss, even if you've let that 
  book sit up there and cool its digital heels for days.  I guess the urge to 
  self-validate is a natural one, since people get submission credits and 
  such that will help them pay for next year's subscription.  I have a Kate 
  Wilhelm mystery that's been up there for some time now, and I want very 
  much to just validate the thing and get the credits and, more importantly, 
  get it up there so others can enjoy it.  But I won't.  I'm too aware of how 
  easy it is to skip errors in things you've either written or read.  I think 
  the checks and balances that exist here--the ones that encourage others to 
  validate what you've submitted--are the way it should work.  I realize 
  others will challenge my position, suggesting that self-validation is 
  absolutely the only way some of the more esoteric titles will get 
  approved.  I disagree.  The first book I ever validated was a Christian 
  romance--decidedly not, not, not something I would normally want to read 
  under any circumstances.  Oddly enough, that's precisely the reason I chose 
  it.  I figured the material would be so new and different to me that I'd be 
  more prone to catch errors.  That book entered the Bookshare system with a 
  "good" rating, presumably provided by the submitter.  I spent some time with 
  the book, and today it carries an "excellent" rating and is part of the 
  collection.

  Please try not to misinterpret this, folks.  I don't use it as an example 
  to demonstrate how amazing I am.  Very nearly all of you have been at the 
  submission and validation end of this far longer than I have, and you're 
  doubtless the ultimate experts, having forgotten more in a day than I will 
  learn in years.  I just find self-validation a little scary, especially in 
  light of the rather strong messages lately calling for higher-quality 
  scans and validations.  There's no doubt we achieve higher-quality 
  validations if we don't do them ourselves.

  The quarterly magazine I edit goes through no fewer than four different 
  edits before it ever sees the inside of our subscribers' mailboxes.  I'm 
  not advocating absolute, rigid perfection; we are volunteers, after all, 
  who have lives.  But self-validation is an excellent way to introduce more 
  potential errors into the system.

  So that I don't come across here as the loudmouthed whiner on the list, 
  here's a little proposal:  If you have a book that's been up there quite a 
  while, I'll take yours and validate it, regardless of the subject or 
  whatever, if you take mine and get it approved.  It's called 
  "The Casebook of Constance and Charlie Vol. 1," and it's 614 pages, so I'm 
  sure that's discouraged more than one person from taking it.  Obviously, 
  this is one of those first-come, first-accepted challenges. <smile>

  Again, I don't wish to offend anyone here.  But in light of recent 
  messages that have called for higher standards in terms of better-quality 
  scans and better validations, redoubling our resolve to let others validate 
  our work is probably one good way to raise the quality of the 
  collection.

  Best Regards,

  Nolan, who is donning his fire-retardant e-mail-reading suit in preparation 
  for all that indignant mail from self-validators :-)


