DTP


 
Lively discussions on the graphic arts and publishing — in print or on the web


Old 01-09-2006, 07:58 AM   #1
RJ Emery
Member
 
Join Date: Mar 2005
Posts: 248
Index of PDF Content

Given a corpus of existing PDF files acquired from various and varied sources, how does one build an index of them? What software do I need? What are the steps involved?

FWIW, I currently have a P2 450 MHz W98 system and Acrobat Reader 5.1, but I hope to move to new hardware and Linux soon.

   
__________________
RJ Emery, Eastern USA
WordPerfect 8 User on XP Pro SP3 System
OCR ScanSoft PaperPort SE v9 on
Brother MFC-8840DN Printer/Scanner/Fax
Old 01-18-2006, 06:35 PM   #2
John Spragens
Member
 
Join Date: Jan 2005
Posts: 437

What kind of index do you want to have when you're done?

The full Acrobat allows you to create an index of all the PDF files in a directory that you specify. But it's not a human-readable index. It's something Acrobat uses to speed word searches across all the files in the collection.

   
__________________

www.enigmaterial.com
Old 01-18-2006, 10:00 PM   #3
RJ Emery
Member
 
Join Date: Mar 2005
Posts: 248

My hope was that I could produce an index that I could then print. However, if I could invoke the previously produced index, then search for keywords across all the PDF files in a folder or tree, that would be an acceptable compromise.

   
Old 01-19-2006, 05:06 AM   #4
dthomsen8
Member
 
Join Date: Aug 2005
Location: Philadelphia, PA 19130
Posts: 2,158
Good Question!

Quote:
Originally Posted by RJ Emery
Given a corpus of existing PDF files acquired from various and varied sources, how does one build an index of them? What software do I need? What are the steps involved?

Good question! What you are seeking is something like what Picasa2 and similar tools do for graphics. If there isn't such a tool, the idea is one that some clever programmer could develop into a shareware product.

Good luck, and let us know if you find such a tool. I have a lot of PDF files, and sometimes have trouble finding the one I want.
Old 01-19-2006, 12:41 PM   #5
annc
Sysop
 
Join Date: Oct 2004
Location: Subtropical Queensland, Australia, between the mountains and the Coral Sea
Posts: 4,436

Quote:
Originally Posted by RJ Emery
My hope was that I could produce an index that I could then print. However, if I could invoke the previously produced index, then search for keywords across all the PDF files in a folder or tree, that would be an acceptable compromise.

Many years ago, when I was working in a petroleum industry library, one of the professional organisations (Society of Exploration Geophysicists, I think) produced a series of CDs of their professional papers in PDF format, and the series had a searchable index. But the whole thing was encapsulated, so I have no idea what software they used.

The searchable index was presented as an application.

   
Old 01-19-2006, 06:37 PM   #6
Steve Rindsberg
Staff
 
Join Date: Nov 2004
Posts: 6,742

Acrobat can index a PDF file or a whole folder full of them. The index it produces isn't human-readable, nor can its contents be extracted for printing. In other words, it's nothing like a typical book index.

In use, a PDF is set to "attach" a particular index file when it opens. You can then do searches against the index.

   
__________________
Steve Rindsberg
====================
www.pptfaq.com
www.pptools.com
and stuff
Old 01-20-2006, 05:14 AM   #7
RJ Emery
Member
 
Join Date: Mar 2005
Posts: 248

Steve,

Could you elaborate? I have to open a PDF first before I can see an index or search other PDFs in the folder?

Adobe tells me Acrobat Capture 3.0 Personal Edition should do what I seek, which is to produce an index of PDFs in a folder that I could then search for keywords, opening only those PDFs that match the search.

I had hoped to produce a printable index, yes, something like at the end of a book, but that apparently is not possible.

   

Last edited by RJ Emery; 01-20-2006 at 05:26 AM.
Old 01-20-2006, 03:57 PM   #8
Steve Rindsberg
Staff
 
Join Date: Nov 2004
Posts: 6,742

>>Could you elaborate? I have to open a PDF first before I can see an index or search other PDFs in the folder?

Yes - you need to open a PDF that's got an "attached" index, in order to use the index. You can't *see* the index at all. The search feature in Acrobat/Reader uses it.

>>Adobe tells me Acrobat Capture 3.0 Personal Edition should do what I seek, which is to produce an index of PDFs in a folder that I could then search for keywords, opening only those PDFs that match the search.

That's how it works, yes.

>>I had hoped to produce a printable index, yes, something like at the end of a book, but that apparently is not possible.

Not with Acrobat, no.
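For what it's worth, a printable, back-of-book style index can be roughed out outside Acrobat once the text is out of the PDFs. The Python sketch below is only an illustration under stated assumptions: the page text has already been extracted by some other tool (not shown), and the names `build_printable_index`, `pages_by_file`, and `terms` are made up for this example. It has nothing to do with Acrobat's own index format.

```python
import re

def build_printable_index(pages_by_file, terms):
    """Return back-of-book style lines listing where each term occurs.

    pages_by_file: dict mapping a file name to a list of page texts
                   (page 1 first). Assumed to be extracted elsewhere.
    terms:         hand-picked words or phrases to index.
    """
    lines = []
    for term in sorted(terms, key=str.lower):
        # Whole-word, case-insensitive match for the term.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        locations = []
        for name in sorted(pages_by_file):
            page_numbers = [
                str(number)
                for number, text in enumerate(pages_by_file[name], start=1)
                if pattern.search(text)
            ]
            if page_numbers:
                locations.append(f"{name} p. {', '.join(page_numbers)}")
        # Terms that never occur are left out of the printed index.
        if locations:
            lines.append(f"{term}: {'; '.join(locations)}")
    return lines

# Hypothetical sample data standing in for extracted page text.
pages_by_file = {
    "manual.pdf": ["Installing the scanner driver",
                   "Scanner settings and OCR options"],
    "notes.pdf": ["OCR accuracy tips"],
}
terms = ["scanner", "OCR", "fax"]

for line in build_printable_index(pages_by_file, terms):
    print(line)
```

Unlike a full-text search index, this approach needs a list of terms chosen by hand, which is the usual trade-off with a book-style index.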

I wish I knew a good source to suggest so you could download a PDF or set of them with index files included so you could try it out before spending money on add'l software. Does anyone have any ideas on that score?

   
Old 01-20-2006, 09:20 PM   #9
RJ Emery
Member
 
Join Date: Mar 2005
Posts: 248

Steve,

Let me understand the procedure. If I have a collection of PDF files, all presumably without attached index files, I would invoke Acrobat Capture and make index files for each of my PDFs, attaching each index to the PDF itself. That done, I could then open any of the PDFs, and from there search for keywords in any of the PDFs within my collection. Does that describe the process?

   
Old 01-21-2006, 12:42 PM   #10
Steve Rindsberg
Staff
 
Join Date: Nov 2004
Posts: 6,742

>>Let me understand the procedure. If I have a collection of PDF files, all presumably without attached index files, I would invoke Acrobat Capture and make index files for each of my PDFs, attaching each index to the PDF itself.

No, it's actually simpler. Assuming Capture works roughly the same as the older cataloguing procedures built into Acrobat, you'd point it at a whole folder full of PDFs and it would create one index for the lot of them.

Any one of the PDFs (or all of them) could then be set to attach your newly created index automatically when it opens. Once the index is attached during a given Acrobat/Reader session, it stays attached, even though you open other files.

>>That done, I could then open any of the PDFs, and from there search for keywords in any of the PDFs within my collection. Does that describe the process?

With amendments as above, yes. And one more: you can do full text searches. You're not limited to keywords. Unlike the kind of indices I think you were thinking of, you don't need to create lists of terms to be indexed.
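To make the idea of one full-text index covering a whole folder a bit more concrete, here is a minimal Python sketch of an inverted index. It is not how Acrobat or Capture work internally (that isn't described here), and it assumes the text of each PDF has already been extracted by some other tool. The function names and sample file names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map every word to the set of file names whose text contains it.

    documents: dict of {file name: extracted text}. No term list is
    needed, because every word in every file is indexed.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index

def search(index, query):
    """Return the sorted file names that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

# Hypothetical sample data standing in for text pulled out of PDFs.
documents = {
    "survey.pdf": "Seismic survey methods for offshore exploration",
    "drilling.pdf": "Offshore drilling safety and survey checklists",
    "fonts.pdf": "Choosing fonts for print and web publishing",
}
index = build_index(documents)
print(search(index, "offshore survey"))
print(search(index, "fonts"))
```

One index serves the whole collection, and a search only names the files that match, so only those need to be opened.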

Hmm. With Acrobat, you may be able to take an existing index created in some applications (Word, for example) and have it become a printable index as part of the PDF you make from the Word doc AND have the index entries clickable (to take you to the referenced page). I've never done that, so I'm not sure.

   
Contents copyright 2004–2014 Desktop Publishing Forum and its members.