comparing strings

Tue Jul 30 15:41:20 EDT 2002

"jadedlime" <jadedlime at hotmail.com> wrote in message
news:3d46e543$1_9 at news.teranews.com...
> Hi, I am very new to Python and very new to programming in general.
> I need to take a particular string of data and compare it to a dictionary
> which will put all matching records in a specific file and all the records
> from the string that do not match in another to be dealt with on an
> individual basis.
>
> I appreciate any and all help on this problem, part of the code is included.
> this seems to produce the matches, but not the mismatches.
>
>
>
> import string, re, codecs
>
> # create inputs and outputs
>
> input = codecs.open("/home/jadedlime/Julio/julio.work", "r",
> encoding="ISO-8859-1")
> output = codecs.open("sec_try", "w", encoding="ISO-8859-1")
> output2 = codecs.open("sec_trynotfound", "w", encoding="ISO-8859-1")
>
> # read and split the lines of the main julio records
>
> whole = input.read()
> lines = string.split(whole, "\n")
>
> # create the dictionary with regular expressions built in
>
> dictionary = {"^aarl australian academic & research libraries$" : "10735",
>               "^acimed$" : "11225",
>               "^adbs: l'association des professionnels de l'information et de la documentation$" : "11715",
>               "^alpha 94. strat\351gies d\222alphab\351tisation et de d\351veloppement culturel en milieu rural\
>               $" : "12205",
>               "^american archivists$" : "13185",
>               "^anales de documentaci\363n$" : "14165",
>               "^annual review of information science and technology (arist)$" : "14655",
>               "^aproximaciones a la traducci\363n$" : "15145",
>               "^apuntes$" : "15635",
>               "^architectural records conference report$" : "16125",
>               "^archivaria$" : "16615",
> }
>
>
> # take the desired field (journal titles) and put it in a list format
>
> journallist = []
> for line in lines:
>         field = string.split(line, '"')
>         journalitems = string.strip(string.lower(field[23]))
>         journallist.append(journalitems)
>
> #  compile the dictionary and make it into a list format
>
> dictionarykeys = dictionary.keys()
> dictionarylist=[]
> for dictionaryexp in dictionarykeys:
>         regular = re.compile(dictionaryexp)
>         dictionarylist.append(regular)
>
> # run a search that should match all the journal titles in the julio file
> # to the ones in the dictionary, if they match send them to a specific file,
> # if they do not, send them to another file so adjustments can be made.
>
> for key in dictionarylist:
>         for item in journallist:
>                 if re.search(key, item):
>                         output.write("found\t" + item + "\n")
>         else:
>                 output2.write("not found\t" + item + "\n")

I'm not sure exactly what you're seeing, but the following appears to work.
I added a break under the output.write() call so that the inner loop ends
early when a match is found.  The else clause belongs to that for loop, so
the break also skips the output2.write() call.  For testing, I bound both
output and output2 to sys.stdout and created two fictional lists.  If the
strings are going to be exact matches, as opposed to the key being a regular
expression, you don't need re.search at all, but should be able to simply
compare them with key == item (or cmp(key, item) == 0).

>>> import re
>>> import sys
>>> dictionarylist=['abc', 'def', 'ghi']
>>> journallist=['def', 'ghi', 'jkl']
>>> output=sys.stdout
>>> output2=sys.stdout
>>> for key in dictionarylist:
...     for item in journallist:
...         if re.search(key, item):
...             output.write("found\t" + item + "\n")
...             break
...     else:
...         output2.write("not found\t" + item + "\n")
...
not found       jkl
found   def
found   ghi
>>>
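
One thing worth checking in the session above: the else clause is attached
to the inner "for item" loop, so when a key matches nothing, only the last
item of journallist is reported as not found.  If the goal is one verdict
per journal title, a possible variation is to put the items in the outer
loop and the patterns in the inner one.  The following is only a sketch
under that assumption; the names classify, found and not_found are made up
for illustration and do not appear in the original code.

```python
import re

def classify(items, patterns):
    """Split items into (found, not_found) by testing each item
    against every pattern."""
    compiled = [re.compile(p) for p in patterns]
    found = []
    not_found = []
    for item in items:               # outer loop: one verdict per item
        for regex in compiled:       # inner loop: try every pattern
            if regex.search(item):
                found.append(item)
                break                # first match is enough; skips the else
        else:
            # runs only if the inner loop finished without a break,
            # i.e. no pattern matched this item
            not_found.append(item)
    return found, not_found

# same fictional lists as in the session above
found, not_found = classify(['def', 'ghi', 'jkl'], ['abc', 'def', 'ghi'])
```

With the two test lists, found should hold 'def' and 'ghi' and not_found
should hold 'jkl'.  Writing the two result lists to output and output2
would then be a separate step.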

--
Daniel Fackrell (unlearned at learn2think.org)
When we attempt the impossible, we can experience true growth.