comparing strings
Daniel Fackrell
unlearned at DELETETHIS.learn2think.org
Tue Jul 30 15:41:20 EDT 2002
"jadedlime" <jadedlime at hotmail.com> wrote in message
news:3d46e543$1_9 at news.teranews.com...
> Hi, I am very new to Python and very new to programming in general.
> I need to take a particular string of data and compare it to a dictionary
> which will put all matching records in a specific file and all the records
> from the string that do not match in another to be dealt with on an
> individual basis.
>
> I appreciate any and all help on this problem, part of the code is included.
> this seems to produce the matches, but not the mismatches.
>
>
>
> import string, re, codecs
>
> # create inputs and outputs
>
> input = codecs.open("/home/jadedlime/Julio/julio.work", "r",
> encoding="ISO-8859-1")
> output = codecs.open("sec_try", "w", encoding="ISO-8859-1")
> output2 = codecs.open("sec_trynotfound", "w", encoding="ISO-8859-1")
>
> # read and split the lines of the main julio records
>
> whole = input.read()
> lines = string.split(whole, "\n")
>
> # create the dictionary with regular expressions built in
>
> dictionary = {"^aarl australian academic & research libraries$" : "10735",
> "^acimed$" : "11225",
> "^adbs: l'association des professionnels de l'information et de la documentation$" : "11715",
> "^alpha 94. strat\351gies d\222alphab\351tisation et de d\351veloppement culturel en milieu rural\
> $" : "12205",
> "^american archivists$" : "13185",
> "^anales de documentaci\363n$" : "14165",
> "^annual review of information science and technology (arist)$" : "14655",
> "^aproximaciones a la traducci\363n$" : "15145",
> "^apuntes$" : "15635",
> "^architectural records conference report$" : "16125",
> "^archivaria$" : "16615",
> }
>
>
> # take the desired field (journal titles) and put it in a list format
>
> journallist = []
> for line in lines:
>     field = string.split(line, '"')
>     journalitems = string.strip(string.lower(field[23]))
>     journallist.append(journalitems)
>
> # compile the dictionary and make it into a list format
>
> dictionarykeys = dictionary.keys()
> dictionarylist=[]
> for dictionaryexp in dictionarykeys:
>     regular = re.compile(dictionaryexp)
>     dictionarylist.append(regular)
>
> # run a search that should match all the journal titles in the julio file to the ones in
> # the dictionary, if they match send them to a specific file, if they do not, send them
> # to another file so adjustments can be made.
>
> for key in dictionarylist:
>     for item in journallist:
>         if re.search(key, item):
>             output.write("found\t" + item + "\n")
>     else:
>         output2.write("not found\t" + item + "\n")
I'm not sure exactly what you're seeing, but the following appears to work.
I added a break under the output.write() call to end the inner loop early,
so that the output2.write() call in the else clause is skipped when a match
is found. For testing, I bound both output and output2 to sys.stdout and
created two fictional lists. If the strings are going to be exact matches, as
opposed to the key being a regular expression, you don't need re.search at
all and can simply test key == item (the same result as cmp(key, item) == 0).
>>> import re
>>> import sys
>>> dictionarylist=['abc', 'def', 'ghi']
>>> journallist=['def', 'ghi', 'jkl']
>>> output=sys.stdout
>>> output2=sys.stdout
>>> for key in dictionarylist:
...     for item in journallist:
...         if re.search(key, item):
...             output.write("found\t" + item + "\n")
...             break
...     else:
...         output2.write("not found\t" + item + "\n")
...
not found jkl
found def
found ghi
>>>
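One thing to watch in the session above: the outer loop runs over the keys, so the "not found" line fires when a key matches no item, and it prints whatever item the inner loop ended on. If the goal is one verdict per journal title, it may be easier to put the titles in the outer loop. The following is only a rough sketch of that idea, not a drop-in replacement; classify and classify_exact are made-up names, and the lists are the same fictional test data.

```python
import re

def classify(items, patterns):
    """Split items into (found, not_found), testing each item against every pattern."""
    compiled = [re.compile(p) for p in patterns]
    found, not_found = [], []
    for item in items:                # outer loop: one verdict per journal title
        for regex in compiled:
            if regex.search(item):
                found.append(item)
                break                 # stop at the first pattern that matches
        else:                         # runs only if the inner loop never hit break
            not_found.append(item)
    return found, not_found

def classify_exact(items, table):
    """Exact-match variant (hypothetical helper): a dictionary lookup replaces re.search."""
    found = [item for item in items if item in table]
    not_found = [item for item in items if item not in table]
    return found, not_found

found, not_found = classify(['def', 'ghi', 'jkl'], ['abc', 'def', 'ghi'])
print(found)      # ['def', 'ghi']
print(not_found)  # ['jkl']
```

The two returned lists can then be written to output and output2 in whatever format is needed. With exact matching, classify_exact can take the original dictionary directly once the ^ and $ anchors are dropped from its keys.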
--
Daniel Fackrell (unlearned at learn2think.org)
When we attempt the impossible, we can experience true growth.