DTP


 
Lively discussions on the graphic arts and publishing — in print or on the web


Old 02-10-2005, 05:36 AM   #1
LoisWakeman
Staff
Join Date: Jan 2005
Location: Uplyme, Devon, England
Posts: 1,402
PDF document encoding

Hi all, my first question in the new forum.

A very naive user at my client has just sent me a PDF file to extract text for a user guide. The text comes out as strings of binary characters (extract below), and if I look in the properties/fonts dialogue, the font given is Type 1 of a face I have never heard of (called PSOwsthelpvs). The encoding says "Custom".

&" 0## 
)*

 . 
!! ',
&'  (  )  
(  (  (  + '  5  1. 2 # $  & # 7 (    3 ''

Is there any simple way to get at the text, or do I need to type it in from the display? AFAIK, their DMS uses Acrobat 5 for imaging, and I haven't had this problem in the past. I don't have ATM, but can use either Acrobat 4 or 6 to investigate if needed.
Old 02-10-2005, 10:55 AM   #2
annc
Sysop
Join Date: Oct 2004
Location: Subtropical Queensland, Australia, between the mountains and the Coral Sea
Posts: 4,434

Quote:
Originally Posted by LoisWakeman
Hi all, my first question in the new forum.

A very naive user at my client has just sent me a PDF file to extract text for a user guide. The text comes out as strings of binary characters (extract below), and if I look in the properties/fonts dialogue, the font given is Type 1 of a face I have never heard of (called PSOwsthelpvs). The encoding says "Custom".

&" 0## 
)*

 . 
!! ',
&'  (  )  
(  (  (  + '  5  1. 2 # $  & # 7 (    3 ''

Is there any simple way to get at the text, or do I need to type it in from the display? AFAIK, their DMS uses Acrobat 5 for imaging, and I haven't had this problem in the past. I don't have ATM, but can use either Acrobat 4 or 6 to investigate if needed.
Hi Lois, and welcome to our new home.

I suppose it isn't as simple as security in the file preventing text gathering? I just opened one of my security-protected files in BBEdit and got a heap of binary characters. The encoding/font object looked like this:

30 0 obj<</Type/Font/Encoding/WinAnsiEncoding/BaseFont/ArialMT/FirstChar 32/LastChar 122/Subtype/TrueType/FontDescriptor 32 0 R/Widths[278 0 0 0 0 0 0 191 0 0 0 0 0 0 278 278 556 556 556 556 556 556 556 556 556 556 278 0 0 0 0 0 0 667 667 722 722 667 611 778 722 278 500 667 556 833 722 778 667 778 722 667 611 722 0 0 0 667 611 0 0 0 0 0 0 556 556 500 556 556 278 556 556 222 0 500 222 833 556 556 556 0 333 500 278 556 500 722 500 500 500]>>
endobj

and I think that is determined by the originating application, which was FileMaker Pro in my case. The original font was Arial, but the PDF was created on a Mac, not Windows.
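
For anyone who wants to check a font object like the one above without reading it by eye, here is a minimal Python sketch. It assumes the font dictionary is stored uncompressed in the file, as in the dump above, and it uses a plain regular expression rather than a real PDF parser. The function name `summarize_font` and the shortened sample string are made up for illustration. A named encoding such as WinAnsiEncoding usually extracts cleanly; a font with no named encoding and no /ToUnicode entry is a likely candidate for garbage on extraction, though that is a rule of thumb and not a diagnosis.

```python
import re

# Shortened copy of the font object shown above (the /Widths array is omitted).
FONT_OBJ = (
    "30 0 obj<</Type/Font/Encoding/WinAnsiEncoding/BaseFont/ArialMT"
    "/FirstChar 32/LastChar 122/Subtype/TrueType/FontDescriptor 32 0 R>>"
)

def summarize_font(obj_text):
    """Pull the entries that matter for text extraction out of a font dictionary."""
    def name_after(key):
        # Matches "/Key/Name" or "/Key /Name"; returns None when the value is
        # not a plain name (for example an indirect reference like "12 0 R").
        m = re.search(r"/" + key + r"\s*/([^\s/<>\[\]()]+)", obj_text)
        return m.group(1) if m else None

    return {
        "base_font": name_after("BaseFont"),
        "subtype": name_after("Subtype"),
        "encoding": name_after("Encoding"),  # None suggests a custom encoding
        "has_tounicode": "/ToUnicode" in obj_text,
    }

info = summarize_font(FONT_OBJ)
print(info)
```

If `encoding` comes back as `None` and `has_tounicode` is `False`, the viewer has little to go on when mapping character codes back to real characters.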

Hmm, I just created a new PDF from within InDesign, using the same security settings, and it correctly identified the font as Trajan.

I'll try again and remove the ability to print.

Same result.

So, I dunno. Do you know how the PDF was created? The ones I've been using here to test were all created with the Adobe-supplied PDF printer driver for Mac OS X.

   
Old 02-10-2005, 11:18 AM   #3
ktinkel
Founding Sysop
Join Date: Oct 2004
Location: In Connecticut, on the Housatonic River near its mouth at Long Island Sound.
Posts: 11,189

Quote:
Originally Posted by LoisWakeman
Is there any simple way to get at the text, or do I need to type it in from the display? AFAIK, their DMS uses Acrobat 5 for imaging, and I haven't had this problem in the past. I don't have ATM, but can use either Acrobat 4 or 6 to investigate if needed.
My only thought is that the document security settings forbid text extraction, either by design or accident.

Open the PDF in Acrobat and take a look at the security settings. In version 6, it is File > Document Properties > Security. You should be able to see there what is permitted (and whether, in fact, there is any security attached to the file).
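
The same information can be read from the file itself. In an encrypted PDF, the permissions shown in that dialog are stored as an integer in the /P entry of the encryption dictionary. The Python sketch below decodes a few of those bits, assuming you have already found the /P value by opening the file in a text editor. The function name `decode_permissions` and the sample values are illustrative only and are not taken from the file discussed here.

```python
# Permission bits from the /P entry of a PDF encryption dictionary.
# The PDF Reference numbers bits from 1, so bit n has the value 1 << (n - 1).
PERMISSION_BITS = {
    "print": 1 << 2,                      # bit 3
    "modify": 1 << 3,                     # bit 4
    "copy_or_extract": 1 << 4,            # bit 5
    "annotate": 1 << 5,                   # bit 6
    "extract_for_accessibility": 1 << 9,  # bit 10
}

def decode_permissions(p_value):
    """Return which operations the /P integer allows (True means allowed)."""
    # /P is a signed 32-bit value; Python's & works on negative ints
    # as if they were two's complement, so no conversion is needed.
    return {name: bool(p_value & mask) for name, mask in PERMISSION_BITS.items()}

# Illustrative values only.
for p in (-4, -44, -3904):
    print(p, decode_permissions(p))
```

If `copy_or_extract` is allowed and the text still comes out garbled, security is probably not the cause.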

Do you have Adobe Illustrator? Ask it to open the PDF (it will ask you to tell it which page), and see if the text comes up in the clear there. If so, you should be able to copy it line by line, which is tedious. But at least you can be pretty sure there is no security issue.

I’d also consider asking the client for a new PDF, as this one may simply be broken in some way.

Kathleen

   
Old 02-10-2005, 11:49 PM   #4
LoisWakeman
Staff

Ann and KT,

Thanks for the thoughts - I'd already checked the security settings and text extraction is allowed.

The PDF is generated in Documentum (which uses V5), and I've had many others from the same source that worked. I did request a different PDF and the problem was the same. My next step is to try and find the font, and also try importing into Word in different encodings - didn't think of that yesterday.

I have Photoshop, so will try reading it in that - your InDesign hint gave me a clue to follow up!
Old 02-11-2005, 06:08 AM   #5
ktinkel
Founding Sysop

Quote:
Originally Posted by LoisWakeman
Ann and KT,

Thanks for the thoughts - I'd already checked the security settings and text extraction is allowed.

The PDF is generated in Documentum (which uses V5), and I've had many others from the same source that worked. I did request a different PDF and the problem was the same. My next step is to try and find the font, and also try importing into Word in different encodings - didn't think of that yesterday.

I have Photoshop, so will try reading it in that - your InDesign hint gave me a clue to follow up!
Photoshop will rasterize the page, no?

I seem to remember there being a way to export Word RTF from Acrobat, but I can’t find anything about it now. If it exists, that might be an option.

Acrobat was supposed to solve all document interchange problems. :-(

   
Old 02-11-2005, 06:11 AM   #6
LoisWakeman
Staff

Friday afternoon, and no joy. The font is not found by Google, PhotoShop cannot find any text in the PDF, and I just tried using the text capture tool in Acrobat - which tells me the page has "graphics other than text or images on it. It cannot be captured." So I saved it as a JPEG and got an unrecoverable error in Capture server at the end of the OCR process.

go figure...
Old 02-11-2005, 06:50 AM   #7
ktinkel
Founding Sysop

Quote:
Originally Posted by LoisWakeman
Friday afternoon, and no joy. The font is not found by Google, PhotoShop cannot find any text in the PDF, and I just tried using the text capture tool in Acrobat - which tells me the page has "graphics other than text or images on it. It cannot be captured." So I saved it as a JPEG and got an unrecoverable error in Capture server at the end of the OCR process.

go figure...
That isn’t a real font, it’s some sort of PostScript thingie, so I am not surprised you didn’t find it anywhere.

Have you asked at the Adobe user-to-user forums? There are some serious Acrobat gurus there, and one might be able to tell you definitively that all is lost, at least. Or maybe solve the problem.

   
Old 02-11-2005, 10:44 AM   #8
annc
Sysop

Quote:
Originally Posted by LoisWakeman
Friday afternoon, and no joy. The font is not found by Google, PhotoShop cannot find any text in the PDF, and I just tried using the text capture tool in Acrobat - which tells me the page has "graphics other than text or images on it. It cannot be captured." So I saved it as a JPEG and got an unrecoverable error in Capture server at the end of the OCR process.

go figure...
It sounds as if what you're seeing isn't text at all, but a graphic. Pictures of text, IOW. Maybe a PDF was placed in a Word document and then re-exported as PDF. I've done similar things when getting reports out of FileMaker Pro so that secretaries could place them as 'tables' in their company reports, which had to be in Word. If that were then re-exported as PDF, I imagine it would be treated as a graphic.

Just a thought.

   
Old 02-13-2005, 11:42 PM   #9
LoisWakeman
Staff

Ann,

The odd thing is that I can select text in Reader 6 using the text tool (but not in Reader 4).

(And thanks for the tip re the forums, KT)

Unfortunately, my deadline is looming, and I think typing is the order of the day. I do want to sort it out one day though!
Old 02-14-2005, 05:11 AM   #10
ktinkel
Founding Sysop

Quote:
Originally Posted by LoisWakeman
Unfortunately, my deadline is looming, and I think typing is the order of the day. I do want to sort it out one day though!
Ah, yes — typing. The tried-and-true (if tedious) solution for so many of these high-tech problems!

:-)

   
Contents copyright 2004–2014 Desktop Publishing Forum and its members.