View Full Version : Scan/OCR book pages to text file vs. retyping?
03-01-2005, 04:58 PM
One of my clients has two old books they need to reprint, but they don't have the original text files. The books are short -- one is about 20 pages, the other about 40 -- but nobody wants to retype them.
Does anyone know of a service bureau that can scan the pages (the books are saddle-stitched and could be easily cut into single pages), run the scans through an OCR program, clean up the text and provide a clean text file?
I'm also asking around to see if anyone wants to just retype them. I don't know the going rates either for typing or for scanning/OCR'ing, so I'm not sure which would be the most cost-effective.
FWIW, pages are 8.5 x 11, text is one column (margins are enormous so the line width is not too horrendously wide) and large (looks like 12 point Times Roman). The books are probably good candidates for scanning, as well as being retyped.
03-02-2005, 05:59 AM
One of my clients has two old books they need to reprint, but they don't have the original text files. The books are short -- one is about 20 pages, the other about 40 -- but nobody wants to retype them. … I don't know the going rates either for typing or for scanning/OCR'ing, so I'm not sure which would be the most cost-effective.
Seems to me they have three options, the third being to shoot the printed pages and make new plates. The type may heavy up a little (though a good printer could probably keep that under control).
Still need to do a cost comparison, but if I were guessing, I’d say retyping would be cheaper than OCR (possibly only half a day’s work for a good typist). But you might talk to a printer about shooting the pages as well.
03-02-2005, 02:05 PM
I don't know of an SB that would do it but my guess is that with a decent ocr program it would be pretty straightforward, particularly given your description...a few years ago, I scanned my brother-in-law's CV--10 pages, most of it medical article citations--and was pleasantly surprised at how my no-name-came-with-the-scanner ocr software did with it...I'd say it was close to 95% accurate...
03-02-2005, 02:36 PM
'I'd say it was close to 95% accurate'
A good scanner & OCR program should do better than that, but a good typist will probably do even better. But a typist is more likely to be fazed by a medical article than OCR, which works by reading letters (and numbers etc.), not words.
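For a rough sense of what an accuracy figure means in proofing terms, here is a minimal sketch. It assumes the percentage is per character and that a page holds about 2,000 characters; both are assumptions for illustration, not figures from this thread. Only the 60-page total comes from the job described above.

```python
def expected_errors(chars_per_page: int, pages: int, accuracy: float) -> int:
    """Expected count of wrong characters at a given per-character accuracy."""
    total_chars = chars_per_page * pages
    return round(total_chars * (1.0 - accuracy))


CHARS_PER_PAGE = 2000  # assumption: not measured from the actual books
PAGES = 60             # about 20 + 40 pages, per the original post

for accuracy in (0.95, 0.99, 0.999):
    errors = expected_errors(CHARS_PER_PAGE, PAGES, accuracy)
    print(f"{accuracy:.1%} accurate: about {errors} characters to fix")
```

Under these assumptions, the gap between 95% and 99% is the gap between thousands of corrections and roughly a fifth as many, which is why the cleanup time matters more than the scanning time.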
03-02-2005, 02:55 PM
In the past, they have shot printed pages, but that's not an option this time. These books were printed in 1989 and 1991, and the type looks pretty bad (not the actual print quality, but the spacing is terrible, and the text is too large).
And they want to bring these books into conformance with their new branding standards, which limit us to two font families (neither of which is Times Roman).
So it's gotta be a do-over.
03-02-2005, 03:01 PM
I used OmniPage many years ago (I suspect my old version, which was on floppies, probably won't even run under XP) with decent results, but that was for scanning a lot fewer pages than we are dealing with now.
I am not willing to do the scanning and OCR'ing (even if the client were to pay for the software). I don't have the patience to scan and OCR 60 pages. And I doubt my client would pay my hourly rate.
(One reason I'm not doing the retyping is that they don't want to pay my hourly design/production rate for plain old typing. And I don't like production typing, so I am not volunteering to lower my rate to get the job.)
If we don't find a service bureau that can do it, the client will probably just hire a temp typist or something.
03-02-2005, 04:17 PM
So it's gotta be a do-over.
Based on what you say, sounds like a blessing.
03-03-2005, 02:12 PM
>>michaelr: A good scanner & OCR program should do better than that, but a good typist will probably do even better.
There was *no* way I was going to type that CV and so the ocr output was very useful and cut the work I had to do down to a very reasonable level.
>>But a typist is more likely to be fazed by a medical article than OCR, which works by reading letters (and numbers etc.), not words.
That's really what I didn't want to have to retype...the article names were just a bear--primarily because I'm not an electrophysiologist...'-}}
03-03-2005, 02:13 PM
>>marlene: If we don't find a service bureau that can do it, the client will probably just hire a temp typist or something.
As KT said...a blessing...'-}}
03-04-2005, 02:33 PM
The current OmniPage software is very good, and with the automatic page feed on my copier/fax/scanner, a job like this can be done very fast.
I send large jobs to a freelancer who does it very cheaply -- but the courier cost isn't worth it unless it's a larger job than yours. For something this small, if you don't have the software and equipment to do it yourself, it is probably cheapest for the client to hire a temp typist to rekey the whole thing.
03-04-2005, 05:08 PM
sounds like a blessing
It'll be a blessing to my checkbook after I lay out the books. <g>
03-04-2005, 05:11 PM
I send large jobs to a freelancer who does it very cheaply
Does your freelancer do the scan/OCR thing or retype the copy?
Either way, is your freelancer looking for more work?
03-12-2005, 09:14 AM
He does OCR Scanning. He's in Ontario, Canada, and is both fast and very cheap. I can get the info for you if you like.
We get back a completely unproofed Word document -- but it has to be proofed on this end regardless, so that tends not to matter so much to me. I skimmed through one such scan recently, from an entire 144-page book, and found half a dozen gibberish paragraphs that required going back to the initial document. The rest was just little stuff.
03-16-2005, 12:35 PM
the service we currently use is: www.discountdocumentscanning.com/ -- he's in Ontario.
I also got a quote from www.katscan-ocr.com -- she's in the U.S., Missouri, I think. We ended up going with the local company, though.
03-22-2005, 09:55 PM
Belated thanks for your responses. I am saving the info for future reference. My client decided to have the documents retyped in-house -- one of their staffers wanted the hours.
And of course you're right about the documents having to be proofed thoroughly whether they are retyped by a human or scanned and OCR'd. Either way, there are going to be typos (or scanos).
03-29-2005, 04:00 PM
If it's any use to you, UC Press has the originals of pre-computer-age books retyped in China, simultaneously by two non-English-proficient operators who compare what is on their screens (if this is vague it's because it's second or third hand). We have three books (in revision) processed this way on our computers and it seems to have worked very well. The warts in the mss come through quite accurately -- which might not happen if the typists were reading what they were processing. Kind of a human-brain OCR.
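The compare step in that kind of double keying can be approximated in software. This is only a sketch of the general idea using Python's standard difflib module, not a description of what UC Press or its vendor actually does; the function name and the sample strings are made up for illustration.

```python
import difflib


def keying_disagreements(first_pass: str, second_pass: str) -> list[tuple[str, str]]:
    """Return (first, second) word runs where two independent typings differ."""
    a = first_pass.split()
    b = second_pass.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    disagreements = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # One side may be empty when a word was dropped or added.
            disagreements.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return disagreements


# Hypothetical sample: the second typist misread one word.
typist_one = "The warts in the manuscript come through"
typist_two = "The warts in the manuscrlpt come through"
for left, right in keying_disagreements(typist_one, typist_two):
    print(f"check against the original: {left!r} vs {right!r}")
```

Only the spots where the two typings disagree need a look at the printed page. An error that both typists make the same way would still slip through, so the result needs proofing regardless.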
(The less said about what happens when the supposedly English speaking minions of the press get their hands on it the better...)
03-30-2005, 08:59 AM
That's an interesting procedure!
My client decided to retype their books in-house. They were small books, and it should not have taken much time for a competent typist.
04-01-2005, 04:40 PM
They were small books, and it should not have taken much time for a competent typist.
That was what I thought when I saw the # of pages. By the time you cut loose the pages, scan and then run pages through the OCR you've put in some time.
By the way--
The vile software that came with my current HP scanner all but negates document scanning AND acquisition through other software. If you "scan document", the page borders are set by the software automatically and you can't do a thing about them. But the background is white and the scanner can't SEE the page borders and a significant amount of text is always cut off with the margins it also can't see. For the same reason, you can't use it to make a copy. And the default resolution is 200 dpi --unacceptable to my Presto! OCR software, and if you acquire the document through the OCR program it scans at 200dpi --you can't do a thing about it. (Oddly, its own settings for text for OCR are 300 & 400dpi). This is also true of scanning from Photoshop but there you can change some of the settings --for each and every picture you scan, it reverts to 200dpi "millions of colors" between scans. This is the 8200, at the time I bought it the high-end home office scanner --no excuse for this kind of "we'll make your decisions" idiocy!
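To put the 200 vs. 300 dpi complaint in numbers, here is a small sketch of the arithmetic. It assumes a letter-size page and 12-point type (the sizes mentioned earlier in the thread for the books, not necessarily what is being scanned here) and uses the standard 72 points per inch. Whether a given OCR program accepts a given resolution is up to that program; this only shows how many pixels it gets to work with.

```python
def scan_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def type_body_height_px(point_size: float, dpi: int) -> float:
    """Height in pixels of the type body (1 point = 1/72 inch)."""
    return point_size * dpi / 72


for dpi in (200, 300, 400):
    width_px, height_px = scan_dimensions(8.5, 11, dpi)
    body_px = type_body_height_px(12, dpi)
    print(f"{dpi} dpi: {width_px} x {height_px} px page, 12 pt body is about {body_px:.0f} px tall")
```

Going from 200 to 300 dpi gives each letter half again as many pixels in each direction.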
04-02-2005, 12:30 PM
>>molly/ca: The vile software that came with my current HP scanner all but negates document scanning AND acquisition through other software.
Sounds pretty nasty...scanners aren't all that pricey these days...might be worth buying a new one or maybe even seeing if HP has newer drivers??
Just did a quick check...not sure what OS you are running but for Win2000, the latest drivers have late 2003 to 2005 dates and for XP looks about the same...
For mac OS-X, the driver dates range from 2004 to 2005...
04-05-2005, 04:59 PM
I think it's not the driver but the software, and you can't run the scanner without the software as far as I can see. It's where all the settings live, for one thing.
Ross and I are getting his dad/my husband a new computer and printer so I threw in a new laser printer too, because he's always bugging me about getting a deskjet that will print 13" wide so that's what I got (if it works --"recertified" from TigerDirect was best price I found), but he's not going to like it when he finds out how much ink it uses. Opened the LaserJet box and found the directions are ALL in pictures! Not ONE single word from beginning to end. Unfortunately, this leaves a few little matters in doubt --for instance, what is turned on or off when you start plugging the computer in... There's a little blow-in sheet that says for more detail see the CD, which you're supposed to load next to last...
I think I'll just assume that the computer is going to figure it out when it finds the new printer hooked up when it boots. The picture shows the printer being plugged into either the // or USB port with a picture of the software dialogue screens on its screen, which implies that you're doing it hot, but it's news to me that you can do that with a // port. Even if I could get to it, which I doubt I can.
I still have the manuals from our very first desk and laserjets (with the printer language codes in them --been a while since I've needed them but--). What a difference.
( I should try scanning a page of the setup folder --it has to be seen to be believed-- maybe later.)
04-06-2005, 10:59 AM
>>molly/ca: I think it's not the driver but the software, and you can't run the scanner without the software as far as I can see. It's where all the settings live, for one thing.
Interesting...99% of the time, I run my Epson 2450 scanner from within Photoshop using the Epson TWAIN driver and have full control over the scanning process...I wonder if there is some sort of pref set on yours for "automatic"?
I rarely, if ever, use the other Epson scanner software--I do sometimes use its copier process.
>>if it works --"recertified" from TigerDirect was best price I found),
Another good place to check for refurb'd units is http://refurbdepot.com
I bought my Epson 1160 printer and my Olympus C-4040 digicam from them...good prices and service...
>>found the directions are ALL in pictures! Not ONE single word from beginning to end
Weird...probably saves them having to have someone on staff who can translate directions into various languages...
>>but it's news to me that you can do that with a // port. Even if I could get to it, which I doubt I can.
I wouldn't do it that way...
With my Epson printers, they suggest installing the drivers, then turning off the pc and plugging the printer in--power and port--and then powering up the pc...I'd suggest that if you are using a parallel connection...
04-07-2005, 01:33 PM
>> There's a little blow-in sheet that says for more detail see the CD, which you're supposed to load next to last... >>
They're talking about installing the driver and software here. There is probably a directory on the CD for \manual or \help or \documentation or something similar. That's where you may find a real manual, most likely in PDF format.
Just put in your CD and look for that. Don't let it autorun. I always turn that OFF immediately on installing Windows because I don't like things that happen automatically. (I assume you're using Windows.) To prevent autorun from happening, hold down the Shift key while you insert the CD and it spins up.