[Tutor] Script to search in string of values from file A in file B

aduarte aduarte at itqb.unl.pt
Wed May 9 21:28:47 CEST 2012


Dear All,

Sorry, it seems that I subscribed to the wrong mailing list ...


I got the idea that this list was open to newbies ... from the answers I
got, I see that I was wrong.


"

In that case, what do you use for data of the last key?


If you really have to handle the case where there is a final key with no
data, then you'll have to detect that case, and make up the data
separately.  That could be done with a try block, but this is probably
clearer:

rawlines = object.readlines()
if len(rawlines) % 2 != 0:
     rawlines.append("")      # add an empty data line for the final key
lines = iter(rawlines)

for keyline in lines:
     linedata = next(lines)   # the data line that follows the key line
     for word in searches:
         if word in keyline:
             print word, "-->", linedata
"
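
For reference, the "try block" alternative mentioned in the quoted advice
could look roughly like the sketch below. It is written for Python 3, and
the function names pairs_with_try and find_matches are made up for this
illustration; they do not come from the original thread. It still assumes
that key lines and data lines strictly alternate.

```python
def pairs_with_try(lines):
    """Yield (keyline, dataline) pairs; a trailing key with no data gets ""."""
    it = iter(lines)
    for keyline in it:
        try:
            linedata = next(it)
        except StopIteration:
            # Odd number of lines: the final key has no data line.
            linedata = ""
        yield keyline, linedata


def find_matches(lines, searches):
    """Return (word, dataline) for every search word found in a key line."""
    results = []
    for keyline, linedata in pairs_with_try(lines):
        for word in searches:
            if word in keyline:
                results.append((word, linedata.rstrip("\n")))
    return results
```

Because the missing data line is handled at the moment it is needed, there
is no need to count the lines first or to append anything to the input.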


  After chatting on other mailing lists about other languages, I realized
that this mailing list is not in my league for Python ...
  Interestingly, I did get some strange advice from this list: try awk ...
or Perl for the job, as Python is kind of tricky when it comes to printing
the line after the one you selected (yes, that was my question, and I still
don't understand why people advise me to insert new lines in 500 MB files
and so on to do it ...)
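
As a point of comparison, printing the line that follows a matched line does
not require loading or modifying the file. The sketch below (Python 3) reads
the data file lazily, one pair of lines at a time. The names load_keys and
print_following_lines are hypothetical, and it assumes, as discussed in the
thread, that key lines and data lines alternate and that a key may appear
anywhere inside its key line.

```python
import io


def load_keys(key_file):
    """Read one search key per line, skipping blank lines."""
    return [line.strip() for line in key_file if line.strip()]


def print_following_lines(data_file, keys, out=print):
    """Stream data_file once; for each key line containing a key, emit the next line."""
    lines = iter(data_file)          # a file object yields its lines lazily
    for keyline in lines:
        linedata = next(lines, "")   # "" if the file ends on a key line
        for key in keys:
            if key in keyline:
                out(key + " --> " + linedata.rstrip("\n"))


# Small in-memory stand-ins for A.txt (keys) and B.txt (key/data pairs).
demo_keys = load_keys(io.StringIO("key1\nkey3\n"))
demo_data = io.StringIO("trash key1 more\ndata one\ntrash key2 more\ndata two\n")
print_following_lines(demo_data, demo_keys)
```

With real files, the same functions could be called inside
"with open('A.txt') as a, open('B.txt') as b:", passing a to load_keys and b
to print_following_lines. Only two lines of B.txt are held in memory at any
time, so the size of the file should not matter much.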

Once again sorry about the time.

Cheers

Afonso




On 2012-05-09 16:16, Dave Angel wrote:
> On 05/09/2012 11:04 AM, Afonso Duarte wrote:
>>
>>
>> -----Original Message-----
>> From: Dave Angel [mailto:d at davea.name]
>> <SNIP>
>>>
>>> Please post your messages as plain-text.   The double-spacing I get is
>>> very annoying.
>>
>> Sorry about that, my Outlook messes it up.
>
> I'm sure there's a setting to say use plain-text.  In Thunderbird, I
> tell it that any message to forums is to be plain-text.
>
>>
>>> There's a lot you don't say, which is implied in your code.
>>> Are the lines in file B.txt really alternating:
>>>
>>> key1
>>> data for key1
>>> key2
>>> data for key2
>>> ...
>>
>> Sure, that's why I described them like that in the email and didn't say
>> that they weren't.
>>
>>> Are the key lines in file B.txt exact matches, or do they just
>>> "contain" the key somewhere in the line?  Your code assumes the latter,
>>> but the whole thing could be much simpler if it were always an exact
>>> match.
>>
>> The entry in B has text before and after (the size of that text changes
>> from entry to entry).
>
> In other words, the line pairs are not like your sample, but more like:
>
> trash  key1    more trash
> Useful associated data for the previous key
> trash2 key2    more trash
> Useful associated data for the previous key
>
>
>>
>>
>>> Are the keys in A.txt unique?  If so, you could store them in a set, and
>>> make lookup basically instantaneous.
>>
>> That I indeed didn't mention; the entries from A are unique in B.
>
> Not what I asked.  Are the keys in A.txt ever present more than once in
> A.txt?  But then again, if the key line can contain garbage before
> and/or after the key, then the set idea is moot anyway.
>
>>
>>
>>> I think the real question you had was how to access the line following
>>> the key, once you matched the key.
>>
>> True, that is my real question (as the code above works just for the
>> title line, I basically want to print the next line of B.txt for each
>> entry).
>>
>>> Something like this should do it (untested)
>>>
>>> lines = iter( object )
>>> for key in lines:
>>>     linedata = next(lines)
>>>     if key in mydictionary:
>>>         print key, "-->", linedata
>>
>>
>>> Main caveat I can see is the file had better have an even number of
>>> lines.
>>
>>
>> That changes from file to file, and it's unlikely that they all have an
>> even number.
>
> In that case, what do you use for data of the last key?
>
>
> If you really have to handle the case where there is a final key with no
> data, then you'll have to detect that case, and make up the data
> separately.  That could be done with a try block, but this is probably
> clearer:
>
> rawlines = object.readlines()
> if len(rawlines) % 2 != 0:
>     rawlines.append("")      # add an empty data line for the final key
> lines = iter(rawlines)
>
> for keyline in lines:
>     linedata = next(lines)   # the data line that follows the key line
>     for word in searches:
>         if word in keyline:
>             print word, "-->", linedata
>
>
>>
>> Thanks
>>
>>
>> Afonso
>>
>>
