Document classification
Steve Rindsberg
12-22-2010, 07:31 AM
This seems the best place to post this. I'm about to take delivery on a document scanner. In case you're not familiar with these, they're special-purpose scanners that scan small piles of letter-size and smaller paper very quickly. Most can scan directly to PDF (and in some cases, do text recognition along the way).
Both the scanners and disk space have gotten inexpensive enough to make this practical for me. Space here is tight, and I figure I can eventually get rid of an entire 4-drawer file cabinet in the office and bunches of boxes before the contents gather mildew in the basement.
Great.
But having made that decision, I realized that I've no idea how I will (should!) name the files as I scan these things. Between the librarian/database wizard in our virtual midst and all the other smart folks, I figured this'd be a good place to start gathering ideas.
The paper I'll be scanning includes bank records, tax records, technical information on mostly computer-related subjects, contracts/agreements.
I'm thinking of some sort of short encoding system; I don't care for novella-length file names. Too much potential for compu-mishap there.
Possibly start each file name with e.g.
TAX_ for tax documents
FIN_ for financial/bank papers
Then subcategories followed by date/short descriptions, a la:
FIN_STM_2009_CreditUnion.PDF
Bank statements from 2009 from the credit union
TAX_FED_2008_Personal
2008 personal federal taxes
The subcategories would be different for different main categories. The year wouldn't be especially useful information for, say, programming info.
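A minimal sketch of the naming scheme described above, assuming the parts are always joined with underscores and that the year is simply left out when it isn't useful. The function name `build_name`, its parameters, and the `TEC`/`VBA` codes are made up for illustration; only the `FIN_` and `TAX_` examples come from the post.

```python
# Hypothetical helper: the name, parameters, and defaults are assumptions,
# not part of any scanner software.
def build_name(category, subcategory, description, year=None, ext="PDF"):
    """Join the parts with underscores, skipping the year when none is given."""
    parts = [category.upper(), subcategory.upper()]
    if year is not None:
        parts.append(str(year))
    parts.append(description)
    return "_".join(parts) + "." + ext


print(build_name("FIN", "STM", "CreditUnion", year=2009))  # FIN_STM_2009_CreditUnion.PDF
print(build_name("TAX", "FED", "Personal", year=2008))     # TAX_FED_2008_Personal.PDF
# "TEC" and "VBA" are invented codes, showing a category where the year is dropped.
print(build_name("TEC", "VBA", "StringFunctions"))         # TEC_VBA_StringFunctions.PDF
```

Keeping the category and subcategory codes in a fixed position means a plain alphabetical listing already groups related files together.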
But all this is just first-round thoughts. I have to assume that there are already classification systems worked out and tested. I'd like to give those a bit of study before I reinvent the wheel (and possibly invent a hexagonal one instead of just using the circular design that's already available).
ktinkel
12-22-2010, 08:09 AM
Great questions! I look forward to this discussion, as I have been thinking about one of those scanners myself.
What one are you getting, btw?
I usually file data files like the ones you describe with names that begin with the date: YYYYMMDD. I started doing that a few years ago and it does help point me to the right place on my computer. On the other hand, I could have subfolders by year (say), and save some of that.
The bad part is that it puts a long string of nonsense in front of every file name.
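For what it's worth, here is a small sketch of why a YYYYMMDD prefix helps: plain alphabetical sorting then matches chronological order. The underscore separator, the `dated_name` helper, and the sample descriptions are assumptions for illustration only.

```python
from datetime import date


# Hypothetical helper; the underscore after the date is an assumption.
def dated_name(day, description, ext="pdf"):
    """Prefix the description with the date written as YYYYMMDD."""
    return f"{day:%Y%m%d}_{description}.{ext}"


names = [
    dated_name(date(2010, 12, 22), "ScannerReceipt"),
    dated_name(date(2009, 1, 5), "BankStatement"),
    dated_name(date(2010, 3, 14), "TaxLetter"),
]

# Sorting the names as plain text puts them in date order, oldest first.
for name in sorted(names):
    print(name)
```

The same ordering would not hold for MM-DD-YYYY style prefixes, since those sort by month before year.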
Anyway, this should be good!
Steve Rindsberg
12-22-2010, 09:25 AM
I ordered the Fujitsu ScanSnap S1500 from NewEgg. It has 100 reviews, average 5-star rating, and most of the reviewers just *rave* about it; quite a few have used other scanners, quite a few have bought several of these for their offices. And in reading reviews for some other scanners, it's surprising how often you hear "Spend a little more, get the Fujitsu" or "I bought this to replace the Fujitsu we wore out. I should have bought another Fujitsu."
The only real drawback I see is that it doesn't have TWAIN or ISIS drivers; you have to use their own software. But I don't see any complaints about that, just that some folks want to use the scanner with their existing software and that's not generally in the cards.
I definitely want the date as part of the name for certain things, but date doesn't work for me as the primary lookup key. I'm *real* vague about dates.
"A new stove? Why think about a new stove? We just got that one." "Twenty-five years ago, just. Uh-huh."
Assuming you've refined your folder system in the filing cabinet over the years, and have been finding the files in there, can you start with that system and refine it further?
Also, does your scanner allow you to add metadata as it scans? I'm just thinking about how you are going to find individual file-equivalents-of-pieces-of-paper later.
We've just gone to a region-wide records management system at work and I am having soooo much difficulty finding my files in it. Doesn't help that I work in one department and work for another. I find myself looking for stuff by working my way down the hierarchy rather than searching through the entire regional council file system.
ktinkel
12-22-2010, 12:10 PM
>> I definitely want the date as part of the name for certain things, but date doesn't work for me as the primary lookup key. I'm *real* vague about dates.
>> "A new stove? Why think about a new stove? We just got that one." "Twenty-five years ago, just. Uh-huh."
I know what you mean!
ktinkel
12-22-2010, 12:23 PM
>> I ordered the Fujitsu ScanSnap S1500 from NewEgg.
Interesting. Are there different models under the same model number? One for PC, another for Mac? That was at Amazon; NewEgg is lots less confusing, but are they all the same thing?
>> The only real drawback I see is that it doesn't have TWAIN or ISIS drivers; you have to use their own software. But I don't see any complaints about that, just that some folks want to use the scanner with their existing software and that's not generally in the cards.
So long as their software runs on the Mac, and so long as they are good at keeping up with revisions. Or so long as pushing a button gets you what you need. <g>
terrie
12-22-2010, 01:16 PM
steve: I don't care for novella-length file names.
Me either! However, I also loathe underscores--what a pita to type them--why not just run the names all together with caps for the start of significant words???
So..."TAX_FED_2008_Personal" would be "TaxFed2008Personal"
I think the idea of a category prefix is good...
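A rough sketch of that conversion, assuming all-caps codes like TAX and FED should become Tax and Fed while mixed-case words are left alone. The function name `to_camel` is made up for illustration.

```python
# Hypothetical helper: turns underscore-separated names into run-together caps.
def to_camel(name):
    words = []
    for part in name.split("_"):
        if part.isupper():
            # All-caps codes such as "TAX" become "Tax".
            words.append(part.capitalize())
        else:
            # Mixed-case words and digits keep their inner capitals,
            # so "CreditUnion" does not turn into "Creditunion".
            words.append(part[:1].upper() + part[1:])
    return "".join(words)


print(to_camel("TAX_FED_2008_Personal"))     # TaxFed2008Personal
print(to_camel("FIN_STM_2009_CreditUnion"))  # FinStm2009CreditUnion
```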
Terrie
Steve Rindsberg
12-22-2010, 05:22 PM
There's a ScanSnap S1500M version. M for Mac
I think it's the same unit, just a different set of software (which includes Acrobat Standard 8 for Mac).
So yep, they got software for Mac. Can't say how good they are at keeping up with revs though.
Steve Rindsberg
12-22-2010, 05:33 PM
>> Assuming you've refined your folder system in the filing cabinet over the years, and have been finding the files in there, can you start with that system and refine it further?
The business stuff is fairly well organized, as is the project file. The general information ... kind of a free-for-all. If you paw through it long enough, you find what you're after. I'm hoping to refine that a bit. Pawing through PDFs is nowhere near as quick. <g>
[LATER] Kayza's reply made a nice "CLICK" happen. Forget the paper Piling System. I'm going to take your advice but look to the way I already file electronic information on the shared network drive. It's not perfect, but I can generally locate pretty much anything I need in reasonable time. Thank you!
>> Also, does your scanner allow you to add metadata as it scans? I'm just thinking about how you are going to find individual file-equivalents-of-pieces-of-paper later.
I'm figuring on using the filing system as the "first level" index, of sorts. Folders for major categories, filenames that reflect the contents. But there are search utilities that'll search within PDFs for specific text.
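As a sketch of the "folders for major categories" idea, assuming each file name starts with its category code followed by an underscore, the folder can be worked out from the name itself. The root folder `/scans` and the helper `target_path` are hypothetical.

```python
from pathlib import PurePosixPath


# Hypothetical helper: picks the category folder from the file name's prefix.
def target_path(root, filename):
    """Return root/CATEGORY/filename, where CATEGORY is the text before the first underscore."""
    category = filename.split("_", 1)[0]
    return PurePosixPath(root) / category / filename


print(target_path("/scans", "FIN_STM_2009_CreditUnion.PDF"))
# /scans/FIN/FIN_STM_2009_CreditUnion.PDF
print(target_path("/scans", "TAX_FED_2008_Personal.PDF"))
# /scans/TAX/TAX_FED_2008_Personal.PDF
```

`PurePosixPath` only builds the path text and never touches the disk, so this is safe to try before moving any real files.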
The scanning software does allow for keywords (in several ways, I think). The most intriguing: you use a standard highlighter on keywords within the document, it adds them (presumably to the document properties in the PDF).
>> We've just gone to a region-wide records management system at work and I am having soooo much difficulty finding my files in it. Doesn't help that I work in one department and work for another. I find myself looking for stuff by working my way down the hierarchy rather than searching through the entire regional council file system.
I have the advantage here. I can pretty much guarantee that the taxonomy used in the system will line up nicely with my mental model of my little corner of the InfoWorld. <g>
I found a few interesting bits on the web. The first discusses the very problem you're talking about. Hierarchies and taxonomies designed by system implementers don't necessarily work for users.
Ten taxonomy myths
http://www.montague.com/review/myths.html
Digital Landfill (blog)
http://aiim.typepad.com/aiim_blog/
Steve Rindsberg
12-22-2010, 05:36 PM
I don't mind typing the underscores and find that I can read the results better with them. They substitute nicely for spaces (in fact, a couple of my apps routinely convert spaces to underscores in filenames that'll get uploaded to a web server.)
But I'll defend to the deat^H^H^H^Hmild pain in one or two fingers your right to do it your way on your computer.
And I promise not to send you copies of my TAX_Return_Scan.pdf files. <g>
Kayza
12-22-2010, 06:19 PM
The ScanSnap is considered the one to beat in the category. I haven't used the software it comes with, but if I'm right, naming may not be an issue. Some of the software should act as a document management system, especially if you get the bundle.
terrie
12-22-2010, 07:07 PM
steve: And I promise not to send you copies of my TAX_Return_Scan.pdf files. <g>
LOL!!! Thank you...now...if I could just get one of my sisters to stop using spaces in her filenames--which are just gawdawful!--I'd take underscores...'-}}
Her filenames are like sentences and make me absolutely crazy when I have worked on her laptop--fortunately, that doesn't happen all that often...
Terrie
Steve Rindsberg
12-23-2010, 06:45 AM
It sure does seem to get the best reviews, doesn't it? Have you used it with other software than the included bundle? I'd be interested in hearing more about that.
By bundle, do you mean the Rack2-Filer software? I didn't get that (though apparently I'll get a demo/trial version with the scanner's normal software shipment).
In any case, unless it maintains the actual document files in PDF or some other standard format and lets me name them as I wish, I wouldn't be using it anyhow. I want to be able to drop the files on a shared network drive or DVD and make them accessible to all of the computers on the network, w/o having to install proprietary software to get at them.
I've already got a decent system for filing electronic documents ... basically using the computer file system as a hierarchically organized filing system. And thank you for leading me to THAT point. Ann suggested looking at my existing paper filing system for guidance. I think I'll look at how I've got the computer files arranged. With a few extensions, that should work pretty well.