[Tutor] Hoping to benefit from someone's experience...

Marc Tompkins marc.tompkins at gmail.com
Sat Apr 19 21:45:34 CEST 2008


Just as a follow-up: I ended up using a Python script to recurse through
the folders of files and, for each file found, extract the doctor, date and
medical record number from the filename, then run a Word macro on the file
to insert the header slug.

If anyone ever needs to do something similar, or just wants to laugh at what
a hack this turned out to be, I've attached a text file containing the
Python and the macro.  (The UIDs and names have all been sanitized for my
protection.)
Bear in mind that this was a one-off; if I end up reusing it, I will clean
it up, put things in classes, load UIDs from an external file, etc.

Background:
In the folder "U:\transcripts.doc" there are folders called "Recent" and
"Archive" for each of about twelve providers.  I need only Able, Baker, Doe,
Roe, Smith and Jones, which is why I put the folder names in a tuple instead
of simply processing the root folder.  Inside each folder there are
subfolders for individual patients, but there are also many patient
documents in the main folder.
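For anyone adapting this, here is a minimal sketch of the folder walk (not the script in the attachment).  It assumes a hypothetical layout of <root>/<provider>/<Recent|Archive>/...; the real folder names on drive U may be arranged differently.  os.walk picks up both the documents sitting directly in each folder and those in the patient subfolders.

```python
import os

# ASSUMPTION: hypothetical layout <root>/<provider>/<Recent|Archive>/...
# Only the six providers named in the post are processed.
PROVIDERS = ("Able", "Baker", "Doe", "Roe", "Smith", "Jones")
SUBFOLDERS = ("Recent", "Archive")


def iter_doc_files(root, providers=PROVIDERS, subfolders=SUBFOLDERS):
    """Yield (provider, full_path) for every .doc file under the chosen folders.

    os.walk descends into the per-patient subfolders and also reports the
    documents stored directly in each provider folder.
    """
    for provider in providers:
        for sub in subfolders:
            top = os.path.join(root, provider, sub)
            if not os.path.isdir(top):
                continue  # skip a missing folder instead of failing the whole run
            for dirpath, _dirnames, filenames in os.walk(top):
                for name in filenames:
                    if name.lower().endswith(".doc"):  # case-insensitive match
                        yield provider, os.path.join(dirpath, name)
```

Usage would be something like `for provider, path in iter_doc_files(r"U:\transcripts.doc"): ...`.  Keeping the provider names in a tuple means folders for the other providers are never visited at all.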

File names are supposed to follow the format
    "Last,First-MRN-01Jan2008-Able.doc"
but there are many variations on this theme:
    "Last,First-01Jan2008-MRN-Able.doc"
    "Last,First-MRN-01Jan2008-Able-Echo.doc"
    "Last,First-MRN-01Jan2008-Echo-Able.doc"
    "Last,First-01Jan2008-MRN-Echo-Able.doc"
    "Last,First-01Jan2008-MRN-Able-Echo.doc"
    etc.

Last,First - the patient's name; irrelevant to my purpose.
MRN - medical record number.
Echo - most of these files are consultations, but the ones labeled "Echo"
are echocardiogram reports.  For these I need to set both Category and
Description to "Echo"; otherwise Category is "Consultation External" and
Description is "Consultation".
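Because the fields can appear in any order, one way to handle all of the variants above is to split on "-" and look for each field by content rather than by position.  This is a minimal sketch under that assumption, not the attached script; it also assumes the patient name itself contains no hyphen, since a hyphenated last name would shift everything after field 0 and could even match a provider or "Echo".

```python
import os

# The six providers named in the post; any other name is treated as unknown.
PROVIDERS = ("Able", "Baker", "Doe", "Roe", "Smith", "Jones")


def split_fields(filename):
    """Split e.g. 'Last,First-MRN-01Jan2008-Able-Echo.doc' on '-' after
    dropping the extension.  Field 0 is the patient name."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    return stem.split("-")


def find_provider(fields, providers=PROVIDERS):
    """Return the first field naming a known provider, wherever it sits."""
    for field in fields[1:]:  # skip the patient name
        if field in providers:
            return field
    return None


def category_and_description(fields):
    """An 'Echo' field anywhere after the name marks an echocardiogram report."""
    if any(field.lower() == "echo" for field in fields[1:]):
        return "Echo", "Echo"
    return "Consultation External", "Consultation"
```

Searching by content is what makes "Able-Echo" and "Echo-Able" come out the same.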

The doctor is supposed to dictate the MRN, and the transcriptionist puts it
in the filename, but there are many cases where the MRN is missing.  These
can look like:
    "Last,First-XXX-01Jan2008-Able.doc"
    "Last,First--01Jan2008-Able.doc"
    "Last,First-01Jan2008-Able.doc"
so I needed several different filters.
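If MRNs have a recognizable shape, the three missing-MRN variants can collapse into a single check.  The sketch below assumes, hypothetically, that a real MRN is all digits (the actual format isn't given here, since the sample names are sanitized); under that rule "XXX", the empty field left by "--", and an omitted field all fall through to None.

```python
def find_mrn(fields):
    """Return the MRN from a list of dash-separated filename fields, or None.

    ASSUMPTION (hypothetical): a real MRN consists only of digits, so
    placeholders like 'XXX', empty fields from '--', dates such as
    '01Jan2008' and provider names are all skipped.
    """
    for field in fields[1:]:  # field 0 is the patient name
        if field.isdigit():   # ''.isdigit() is False, so '--' is handled too
            return field
    return None
```

If the real MRNs can contain letters, the isdigit() test would need to be replaced with whatever pattern they actually follow.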

The date needs to be passed in MM/DD/YYYY format.  I found that the easiest
way to figure out which field was the date was to try parsing each one with
strptime() and catch the exception when it failed.  I'm sure there's a more
elegant way using regular expressions, but I was in quick-and-dirty mode.
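A minimal sketch of the try/except approach: datetime.strptime() raises ValueError for any field that doesn't match the "01Jan2008" pattern, so the exception doubles as the "is this the date?" test, and strftime() then produces MM/DD/YYYY.  One assumption to flag: %b matches month abbreviations in the current locale, so this expects an English (or default C) locale.

```python
from datetime import datetime


def find_date(fields):
    """Return the first field that parses as a date like '01Jan2008',
    reformatted as MM/DD/YYYY, or None if no field parses.

    ASSUMPTION: an English/C locale, since %b is locale-dependent.
    """
    for field in fields[1:]:  # skip the patient name
        try:
            parsed = datetime.strptime(field, "%d%b%Y")
        except ValueError:
            continue  # not a date: MRN, provider, 'Echo', 'XXX', ...
        return parsed.strftime("%m/%d/%Y")
    return None
```

For example, `find_date("Last,First-12345-15Mar2008-Able".split("-"))` should give "03/15/2008".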

As you might expect, most of the time is spent in Word.  It would probably
be faster if I set Visible = False, but for some reason the macro fails
(silently) when I do that, so I just minimize Word instead to cut down on
screen refreshes.  Also, drive U is a Samba share, so there's network
latency to take into account.  Even so, 10,231 files took less than 15
minutes to convert, which I can live with.
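For the Word side, here is a minimal sketch of the open/run/save/close loop with Word minimized rather than hidden.  It assumes the caller creates the Word object with pywin32 (win32com.client.Dispatch("Word.Application")), passes absolute paths, and calls word.Quit() afterwards; the macro name "InsertHeaderSlug" and its argument list are hypothetical placeholders, and the macro is assumed to act on the active document.  Passing `word` in as a parameter keeps the loop testable without Word installed.

```python
WD_WINDOW_STATE_MINIMIZE = 2  # Word's wdWindowStateMinimize constant


def process_files(word, jobs, macro_name="InsertHeaderSlug"):
    """Run a header macro on each document using an existing Word instance.

    `word` is expected to be a Word.Application COM object; `jobs` yields
    (absolute_path, macro_args) pairs.  Returns the number of files processed.
    ASSUMPTION: macro_name and its arguments are hypothetical placeholders.
    """
    # Keep Word visible (the macro reportedly fails silently when hidden)
    # but minimized to cut down on screen refreshes.
    word.Visible = True
    word.WindowState = WD_WINDOW_STATE_MINIMIZE
    count = 0
    for path, macro_args in jobs:
        doc = word.Documents.Open(path)  # the opened document becomes active
        try:
            word.Run(macro_name, *macro_args)
            doc.Save()
        finally:
            doc.Close()  # already saved, so Word should not prompt
        count += 1
    return count
```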

-- 
www.fsrtechnologies.com
-------------- next part --------------
Attachment: rtfConversion.txt (the Python script and the Word macro)
URL: http://mail.python.org/pipermail/tutor/attachments/20080419/e4e8e4af/attachment-0001.txt