DTP


 
Lively discussions on the graphic arts and publishing — in print or on the web


04-26-2008, 08:32 AM   #1
RJ Emery
Member
Join Date: Mar 2005
Posts: 248
Recognizing Photo Caption as Text

I have a document that I scanned from a very old magazine article using Adobe Acrobat Standard 8.0 on my XP Pro SP2 machine.

Two of the pages are full-size color photos with captions overlaying the images. Where the caption is black text, the built-in OCR function of Acrobat Standard recognizes the text and makes the text searchable. Where the caption is white text, the OCR function does not see it and otherwise ignores it.

Where there is white text overlaying an image, how can I cause Acrobat Standard to recognize and process via OCR that white-on-color text?

   
__________________
RJ Emery, Eastern USA
WordPerfect 8 User on XP Pro SP3 System
OCR ScanSoft PaperPort SE v9 on
Brother MFC-8840DN Printer/Scanner/Fax
04-26-2008, 08:45 AM   #2
terrie
Staff
Join Date: Oct 2004
Posts: 8,918

I don't know anything really about Acrobat but if it's basically a single line of text, would it be an option to take the scan into Photoshop (or other imaging software) and select the text and then fill it with black?

Terrie
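
For anyone who would rather script Terrie's idea than do it by hand in an image editor, here is a minimal sketch in Python. It assumes the scanned page has already been loaded into a grid of (R, G, B) tuples by some imaging library (loading and saving are left out), and the function name and the threshold of 230 are illustrative choices, not anything taken from Acrobat or Photoshop. It turns near-white pixels black and everything else white, which gives an OCR engine a black-on-white version of the caption to work with.

```python
def isolate_white_text(pixels, threshold=230):
    """Return a grayscale grid: near-white pixels become 0 (black),
    all other pixels become 255 (white).

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    `threshold` is a hypothetical cut-off; tune it per scan.
    """
    result = []
    for row in pixels:
        new_row = []
        for r, g, b in row:
            # A pixel counts as "white text" only if all three channels are bright.
            is_near_white = min(r, g, b) >= threshold
            new_row.append(0 if is_near_white else 255)
        result.append(new_row)
    return result


# Tiny 2x2 example: white text pixels on red and blue background pixels.
sample = [
    [(255, 255, 255), (200, 30, 30)],
    [(20, 40, 200), (240, 240, 235)],
]
binarized = isolate_white_text(sample)
```

Bright highlights in the photo itself will also pass the threshold, so this is likely to need cropping to the caption area first.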
04-26-2008, 09:45 PM   #3
Howard Allen
Member
Join Date: Oct 2007
Location: Calgary, Alberta, Canada
Posts: 824

RJ--

After some experimenting (with Acrobat 6 Pro/Mac), I had no luck. I made a Photoshop image with a blurry/blotchy red and blue background, then overlaid white text (Times, 18 pt) on top. I ran this through Acrobat's Paper Capture, and it actually OCRed about half of the white text correctly (every second line). Unfortunately, half ain't good enough. I then went back to Photoshop and tried what Terrie suggested: selected the white text and changed it to black. Then I OCRed this, and got only about 20% of the text OCRed correctly. The rest of the text was apparently interpreted as part of the image and not converted to text.

I think that's your biggest problem: text of any colour on a coloured background just doesn't offer enough contrast for Acrobat's Paper Capture to work with. It's good enough for black-on-white, but that's about all. Perhaps a more robust OCR application might do the job--sorry, I can't offer any suggestions.

Here's a kludge I came up with that may work for you (again, this is with Acrobat 6 Pro: hopefully you can do this with Acro 8 standard--I dunno):

1) Create the PDF document with the white text and don't bother to run it through Paper Capture.

2) Select the "Text Box Tool" (Tools menu/Advanced commenting) and create a text box. Type the text of the photo caption you want into the text box.

3) Drag the text box into the middle of the caption (unfortunately, there's no way of scaling the type in the text box, as far as I can see, but it may not matter; keep reading).

4) Right-click the text box and select "Properties". In the "Appearance" tab, set the opacity to "0" (zero). Click the "locked" button to keep the text box from moving.

This will make the text box invisible as it sits on top of your photo caption text. Since the text in the text box is searchable (I tried it), your readers will be able to search for and find any words in the text, just as if they were part of the visible caption. If that's all you want to be able to do, it should work.

Good luck!
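
The idea behind the kludge above (invisible but searchable text sitting over the picture) can also be sketched at the PDF level. Note that this is a different technique from the zero-opacity text box annotation described in the steps: it writes the caption into the page content using the PDF text rendering mode 3 (the `3 Tr` operator), which draws text with neither fill nor stroke. The function name, coordinates, and font size below are hypothetical, the page contains no image (a real file would draw the scan underneath), and only Latin-1 captions are handled.

```python
def build_invisible_text_pdf(caption, x=72, y=100, size=18):
    """Return the bytes of a one-page PDF holding invisible, searchable text."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = caption.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # "3 Tr" selects text rendering mode 3: neither filled nor stroked.
    content = f"BT /F1 {size} Tf 3 Tr {x} {y} Td ({escaped}) Tj ET".encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, needed by xref
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # mandatory free entry for object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


pdf_bytes = build_invisible_text_pdf("Sunset over the harbour")
```

Writing `pdf_bytes` to a file opened in binary mode should give a page that looks blank but in which the caption text can be found by a viewer's search, which is the same effect the invisible text box is meant to achieve.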

   
__________________
Howard

OSX 10.10.5
04-27-2008, 02:07 AM   #4
RJ Emery
Member
Join Date: Mar 2005
Posts: 248

Howard,

I'll give your suggestion a try later. I would, however, like to ask a follow-up question:

I am not opposed to purchasing better OCR software (or even Acrobat Professional). Once installed, how do I tell Acrobat to use the new OCR software as opposed to its built-in OCR capability?

FWIW, the OCR software I am most likely to acquire is IRIS ReadIRIS Pro 11. ABBYY FineReader Professional is too expensive for me, and I have my doubts about Nuance OmniPage Pro 16. I use ScanSoft's PaperPort SE version 9, which came bundled with my Brother MFC-8840DN printer/scanner/fax machine. ScanSoft later became Nuance.

   
04-27-2008, 08:53 AM   #5
Howard Allen
Member
Join Date: Oct 2007
Location: Calgary, Alberta, Canada
Posts: 824

Quote:
Originally Posted by RJ Emery
Once installed, how do I tell Acrobat to use the new OCR software as opposed to its built-in OCR capability?

Just don't use Acrobat. Your new OCR software will do all the work, and should give you the option of producing PDF output. If need be, you can use Acrobat to insert the ReadIRIS-produced PDF file into your compiled PDF document.

You'd better test the trial version of any OCR software you're thinking of buying, to make sure it'll do what you want. I can't give any specific recommendations; the only stand-alone OCR software I've used is an ancient version of ABBYY's FineReader Pro (Mac). FWIW, I was very impressed with the accuracy of their OCR engine, though in those days it didn't produce the layered PDFs (bitmap on top, editable text below) that you're looking for.

Cheers,

   
04-27-2008, 02:33 PM   #6
RJ Emery
Member
Join Date: Mar 2005
Posts: 248

Quote:
Originally Posted by Howard Allen
You'd better test the trial version of any OCR software you're thinking of buying, to make sure it'll do what you want. I can't give any specific recommendations ...

I really don't use PDF creation and OCR capabilities all that often, perhaps just once a month -- and each time, I encounter one show stopper or another.

From other recently uncovered shortcomings, I now realize I should have purchased Acrobat Professional rather than the Standard version. I compared the feature list of the two products, and not knowing any better, falsely concluded the Pro version had features I really don't need, opting then to purchase just the Standard version. How wrong I was.

With OCR software, I really can't predict what I need, nor do I know enough to truly understand what one product offers vis-à-vis another. I simply have to go with whatever is affordable and otherwise highly recommended by others, take what I get and hope it will do what I require when the need arises.

With OCR, there are only three choices:

1) ABBYY FineReader Pro at $400 is too expensive for my budget.

2) Nuance OmniPage Pro 16 is affordable at $90 and is the successor to the PaperPort SE 9 that I have. However, I am not impressed with PaperPort. For one reason, the W98 version of the same software was more robust and usable than the same implementation when re-installed on my new XP system.

3) That leaves IRIS ReadIRIS Pro 11 at $110. I will most likely go with it.

   
04-27-2008, 06:14 PM   #7
Howard Allen
Member
Join Date: Oct 2007
Location: Calgary, Alberta, Canada
Posts: 824

Quote:
Originally Posted by RJ Emery
I now realize I should have purchased Acrobat Professional rather than the Standard version. I compared the feature list of the two products, and not knowing any better, falsely concluded the Pro version had features I really don't need, opting then to purchase just the Standard version. How wrong I was.

Perhaps so, but don't be fooled into thinking you'd get a better OCR function with the Pro version. The OCR in Acro Pro is just as rinky-dink as in the Standard version; trust me. Acro Pro has more markup and workgroup collaboration tools (which I seldom use) and various arcane tools for fine-tuning PDFs for press (which I've never used). The only reason I got Pro over Standard is that it was included with the CS Publishing Suite. Don't get me wrong: I find Acrobat indispensable, but I don't think most people would find Pro more useful than Standard. I've got Standard on my Win XP machine, and don't miss any of the Pro features.

   
04-28-2008, 12:53 AM   #8
iamback
Member
Join Date: Oct 2005
Location: Amsterdam, NL
Posts: 4,894

Quote:
Originally Posted by RJ Emery
With OCR, there are only three choices:

1) ABBYY FineReader Pro at $400 is too expensive for my budget.

2) Nuance OmniPage Pro 16 is affordable at $90 and is the successor to the PaperPort SE 9 that I have. However, I am not impressed with PaperPort. For one reason, the W98 version of the same software was more robust and usable than the same implementation when re-installed on my new XP system.

3) That leaves IRIS ReadIRIS Pro 11 at $110. I will most likely go with it.

I have an older version of ABBYY FineReader that I got (free) on a magazine cover disk. I rarely need it, but it's excellent and I never felt the need to upgrade.

You might try to find an offer on eBay... for any of these, really, but I found FineReader much better than the OCR that came with PaperPort, which was bundled with a scanner. (Says she who just placed her first-ever bid on eBay!)

   
__________________
Marjolein Katsma
Look through my eyes on Cultural Surfaces (soon!), My ArtFlakes shop and Flickr.
Occasionally I am also connecting online dots... and sometimes you can follow me on Marjolein's Travel Blog
Contents copyright 2004–2014 Desktop Publishing Forum and its members.