02-18-2013, 02:40 PM   #1
johnnyboy
Member
Join Date: Oct 2008
Location: Tasmania, Australia
Posts: 127
PDF Editing

Here in Australia, old newspapers have been digitised and made available online under the name Trove. When an article comes up in a search, it has an OCR panel on the left. The options at the top of this panel for copying are PDF, JPG and TXT. The PDF copy is taken from the newspaper page, and I wondered whether it could be edited with a suitable PDF program. I suspect this is a forlorn hope, as my attempts so far have failed.

02-18-2013, 05:02 PM   #2
Howard Allen
Member
Join Date: Oct 2007
Location: Calgary, Alberta, Canada
Posts: 824

PDFs were never really intended for editing. They're the digital equivalent of a printed page. You can do some minor editing, mostly along the lines of replacing a few characters to fix typos. It's sort of the digital equivalent of dabbing out letters on a printed page with white-out and replacing them with rub-on Letraset (do they still make that stuff?). Doing any significant editing, like changing blocks of text or inserting stuff, is awkward at best.

Another problem is that some of those archive-type PDFs are raster images (photos of the page), so there's no actual text to edit. Others are generated by OCR software and contain a hidden layer of editable text underneath the page image. These are frequently peppered with OCR errors, in my experience, but usable if you're willing to do some proofreading.

If you really need editable text, your best option is probably the .TXT file.

   
02-18-2013, 06:23 PM   #3
johnnyboy

Thanks, Howard. I thought that would be the case. I have been copying the TXT files into Word and correcting them, with the PDF version on the other monitor as the guide. I am putting together 100-plus articles of 4000-6000 words each that were published in 1883-85. It is a long job, so I was looking for an easier way. I am up to No. 40, so I guess I will get there eventually.
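
For anyone facing a similar proofreading job, one possible shortcut is to have a small script point out the tokens most likely to be OCR slips before comparing against the page image. The following is only a rough sketch in Python: the patterns and the function name `flag_suspicious` are my own assumptions, not anything Trove provides. It will miss real-word errors such as "tbe" for "the", and it will wrongly flag some legitimate tokens such as "4th" or "McDonald".

```python
import re

# Heuristic patterns (assumptions, not an exhaustive list) for common OCR slips.
SUSPICIOUS = [
    re.compile(r"[A-Za-z]\d|\d[A-Za-z]"),  # digit fused with letters, e.g. "sh1p"
    re.compile(r"[a-z][A-Z]"),             # capital in mid-word, e.g. "tHe"
    re.compile(r"[|~^_{}<>\\]"),           # stray symbols rarely seen in newsprint
]


def flag_suspicious(text):
    """Return (line_number, token) pairs for tokens matching any heuristic."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            if any(pattern.search(token) for pattern in SUSPICIOUS):
                hits.append((lineno, token))
    return hits


if __name__ == "__main__":
    sample = "The sh1p arrived at Hobart.\nIt was tbe finest vessel."
    for lineno, token in flag_suspicious(sample):
        print(f"line {lineno}: {token}")
```

Pasting the TXT copy of an article into `sample` (or reading it from a file) gives a short list of places to check first. It does not replace reading the text against the PDF.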

02-19-2013, 12:29 PM   #4
BobRoosth
Member
Join Date: Jan 2005
Location: Los Angeles, Ca.
Posts: 933

I assume you mean this site: http://trove.nla.gov.au/newspaper

The pages are clearly scanned or photographed. You are lucky there is any usable OCR'd text.

02-19-2013, 03:18 PM   #5
johnnyboy

It is not too bad. Of course, the raw copy needs a lot of corrections. However, Trove has come up with what I regard as an excellent system of correction. Registered members can go in and correct the text, and as a result a lot of the articles are fairly good. I have done some myself. If you delve into the site, you can find the people who have done corrections, ranked by who has done the most. The top ones have done a staggering amount.

Last edited by johnnyboy; 02-19-2013 at 04:18 PM.
Contents copyright 2004–2014 Desktop Publishing Forum and its members.