parsing and searching big text files

Simon Brunning SBrunning at trisystems.co.uk
Wed Dec 6 12:48:27 EST 2000


> From:	Gabor Gludovatz [SMTP:ggabor at sopron.hu]
> I have a big text file which contains 3 variable-length columns with
> names. I have to search this file for names in any column and have to
> show the other 2 names, and I have to do this very fast!
> 
> The text file is about 3 megs long. Does someone know a method to do this
> quickly?
> 
> Which is faster: should I read the whole text file into memory and
> parse and search it there, or should I read it from the disk line by line
> and parse it?
> The latter seemed to be very slow.
> 
> Here is an example line from this text file:
> Foo Bar:John Doe:Bill
> 
> If I look for, for example, John Doe, the function should return Foo Bar
> and Bill and all the other records which contain John Doe.
> 
> Again: I have to do it very fast.
 
Try this (untested):

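def scanGaborsFile(file, scanFor='John Doe', separator=':'):
    """Return the other fields of every record that contains scanFor."""
    # Accept either an open file object or a file name.
    try:
        lines = file.read().splitlines()
    except AttributeError:
        with open(file, 'r') as f:
            lines = f.read().splitlines()
    result = []
    for line in lines:
        fields = line.split(separator)
        if scanFor in fields:
            # Drop the first occurrence of the matched name; keep the rest.
            fields.remove(scanFor)
            result.append(fields)
    return result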

To answer your specific question: reading the whole file into memory will be
by far the fastest way of doing it, unless you run out of memory, which is
unlikely for a file of about 3 megs.
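
If you need to do many lookups against the same file, one option is to read it
once and build a dictionary keyed on name, so that each search is a dictionary
lookup rather than a scan of every line. This is only a sketch and I haven't
timed it against your data; build_name_index and other_names are names made up
for illustration, and every sample record except the first is invented.

```python
from collections import defaultdict


def build_name_index(lines, separator=':'):
    """Map each name to the list of records (field lists) it appears in."""
    index = defaultdict(list)
    for line in lines:
        fields = line.split(separator)
        # set() so a name repeated within one record is indexed only once.
        for name in set(fields):
            index[name].append(fields)
    return index


def other_names(index, name):
    """Return, per matching record, the fields left after removing name once."""
    results = []
    for fields in index.get(name, []):
        rest = list(fields)  # copy so the index itself is not modified
        rest.remove(name)
        results.append(rest)
    return results


# Sample records in the format from the question (hypothetical data
# apart from the first line).
sample = [
    'Foo Bar:John Doe:Bill',
    'Alice:Bob:Carol',
    'John Doe:Dave:Eve',
]
index = build_name_index(sample)
print(other_names(index, 'John Doe'))
```

For the real file you would pass in the lines read from disk instead of the
sample list. Building the index costs one pass over the file, so it should only
pay off if you search more than once.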

Cheers,
Simon Brunning
TriSystems Ltd.
sbrunning at trisystems.co.uk



