[Chicago] Request for a programmer from a lawyer friend

Bob Haugen bob.haugen at gmail.com
Sun Jan 8 19:04:07 CET 2012


Peter Van Schaick (contact info below and in the cc) is a lawyer,
friend, and all-around good guy, if you can believe that about a
lawyer.  He does a lot of public interest legal work, for example.

Anyway, he is looking for a part-time long-term programmer.  Paid
work.  Your choice of tools. No commute.  Work from wherever.

I don't have time, nor the required ready skills.

I told him Python would be perfect for the job, and Chipy had lots of
excellent programmers.   So here's his request.  Contact him directly
if you're interested.

Peter writes:
I'm continuing to do some legal work with a friend in NYC, and I've
got a set of problems with electronic data that I'd like to solve with a
computer programmer, instead of a "litigation support" person. So I'm
looking for a long-term, part-time person. Here's my immediate
problem.

My opponents produced a set of electronic copies of 1.3M pages of
documents, instead of paper copies. There are roughly 250,000
documents, reflected in pairs of image and ocr files (tif + text).
Each member of a document pair is labeled with the same number-name;
they differ in that one has a tif extension; the other a txt
extension. The file names are in sequential order in several dozen
numbered folders.

The odd fact is that about 15% of the tif files do NOT have a
corresponding txt file. A file like 00523451.tif may be missing its
parallel 00523451.txt file.

To start, I need a list of the 60,000 tif files that do NOT have a
parallel txt file. It seems that a recursive comparison routine, with
a result for both match and no match, would do the trick.
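
For anyone sizing up the job, here is a minimal sketch of that comparison in
Python. It assumes each tif and its txt sit in the same folder and share the
same base name, and it treats extensions case-insensitively; the function name
and the folder layout are illustrative guesses, not details from Peter.

```python
import os


def pair_report(root):
    """Walk `root` and split .tif files into (matched, missing) lists.

    Assumption: a tif's txt partner lives in the same folder and has the
    same base name, e.g. 00523451.tif <-> 00523451.txt.
    """
    matched, missing = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Base names of every txt file in this folder, lowercased.
        txt_stems = {
            os.path.splitext(name)[0].lower()
            for name in filenames
            if name.lower().endswith(".txt")
        }
        for name in filenames:
            stem, ext = os.path.splitext(name)
            if ext.lower() != ".tif":
                continue
            path = os.path.join(dirpath, name)
            if stem.lower() in txt_stems:
                matched.append(path)
            else:
                missing.append(path)
    return sorted(matched), sorted(missing)
```

Calling pair_report on the top-level folder and writing the second list to a
file, one path per line, would give the "no match" list; the first list is the
"match" result.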

Next, I'd like to develop indices and search tools. I'd start with
Boolean searches, and work up to proximity searches, i.e., a pair of
text strings within a number of characters of each other.
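
As a rough illustration of what those two kinds of search could look like,
here is a small Python sketch: a word index for Boolean queries, plus a
character-distance check for proximity. All names are hypothetical, the
tokenizing is deliberately crude, and a real tool for 250,000 documents would
likely want a proper search library rather than this.

```python
import re
from collections import defaultdict


def build_index(docs):
    """docs maps a file name to its OCR text; returns word -> set of names."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search_all(index, *words):
    """Boolean AND: documents containing every word."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()


def search_any(index, *words):
    """Boolean OR: documents containing at least one word."""
    return set().union(*(index.get(w.lower(), set()) for w in words))


def within(text, a, b, max_chars):
    """True if some occurrence of `a` and some occurrence of `b` have at
    most `max_chars` characters between them, in either order."""
    text, a, b = text.lower(), a.lower(), b.lower()
    a_starts = [m.start() for m in re.finditer(re.escape(a), text)]
    b_starts = [m.start() for m in re.finditer(re.escape(b), text)]
    for i in a_starts:
        for j in b_starts:
            # Gap runs from the end of the earlier string to the start
            # of the later one.
            gap = j - (i + len(a)) if i <= j else i - (j + len(b))
            if gap <= max_chars:
                return True
    return False
```

Boolean NOT falls out of set difference, e.g.
search_all(index, "severance") - search_any(index, "draft"). The proximity
check compares every pair of occurrences, which is simple but slow on long
texts; it is meant to show the idea, not to be the final design.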

Can you think of someone who might be interested in helping me? I'm
thinking of a paid consulting relationship. As you know, I'm
comfortable working remotely with the right person.

Peter van Schaick
201-388-3383
Email worklaw at gmail.com

