[Tutor] Script to search in string of values from file A in file B
aduarte
aduarte at itqb.unl.pt
Wed May 9 21:28:47 CEST 2012
Dear All,
Sorry, it seems that I subscribed to the wrong mailing list...
I had the impression that this list was open to newbies... judging by the
answers I got, I see that I was wrong.
"
In that case, what do you use for data of the last key?
If you really have to handle the case where there is a final key with no
data, then you'll have to detect that case and make up the data
separately. That could be done with a try block, but this is probably
clearer:
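rawlines = object.readlines()        # object: the open B.txt file (original poster's code)
if len(rawlines) % 2 != 0:
    rawlines.append("")              # pad with an empty data line (+= "" would add nothing)
lines = iter(rawlines)
for keyline in lines:
    linedata = next(lines)           # the line after the key line
    for word in searches:            # searches: presumably the keys read from A.txt
        if word in keyline:
            print(word, "-->", linedata.rstrip("\n"))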
"
After chatting on other mailing lists about other languages, I realized
that this mailing list is not in my league for Python...
Interestingly, I did get some strange advice from this list: try awk
or Perl for the job, as Python is kind of tricky for printing the line
after the one you selected. (Yes, that was my question, and I still don't
understand how people advise me to insert new lines in 500 MB files and
so on to do it...)
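For comparison with the 500 MB concern: the padding in the quoted snippet only modifies an in-memory list, never the file on disk. A minimal sketch that avoids even loading the whole file, assuming (as the thread does) that B.txt alternates key lines and data lines and that a key may sit anywhere inside its key line. It walks B lazily and calls `next()` with a default value, so an unpaired final key line gets empty data instead of raising `StopIteration`. The function name `find_keys` is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def find_keys(keys: Iterable[str], b_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (key, data) for each key found inside a key line of B.

    Assumes b_lines alternates: key line, data line, key line, ...
    """
    key_list = [k for k in keys if k]      # skip empty keys (blank lines in A)
    lines = iter(b_lines)                  # a list or an open file both work
    for keyline in lines:
        # next() with a default returns "" instead of raising StopIteration
        # when the final key line has no data line after it.
        data = next(lines, "").rstrip("\n")
        for key in key_list:
            if key in keyline:             # substring match: B lines carry extra text
                yield key, data
```

With real files, `keys` could be `[line.strip() for line in open("A.txt")]` and `b_lines` the open B.txt file object, since a file object is itself an iterator over its lines.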
Once again sorry about the time.
Cheers
Afonso
On 2012-05-09 16:16, Dave Angel wrote:
> On 05/09/2012 11:04 AM, Afonso Duarte wrote:
>>
>>
>> -----Original Message-----
>> From: Dave Angel [mailto:d at davea.name]
>> <SNIP>
>>>
>>> Please post your messages as plain-text. The double-spacing I get is
>>> very annoying.
>>
>> Sorry about that; my Outlook messed it up.
>
> I'm sure there's a setting to use plain-text. In Thunderbird, I
> tell it that any message to forums is to be plain-text.
>
>>
>>> There's a lot you don't say, which is implied in your code.
>>> Are the lines in file B.txt really alternating:
>>>
>>> key1
>>> data for key1
>>> key2
>>> data for key2
>>> ...
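For a file laid out like this, one common Python idiom (not one proposed in the thread) pairs consecutive lines by zipping an iterator with itself. A minimal sketch, assuming strictly alternating key/data lines; the function name `pair_lines` is hypothetical. Note that plain `zip()` silently drops a trailing key line that has no data line.

```python
from typing import Iterable, List, Tuple


def pair_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Group alternating lines into (key line, data line) pairs."""
    it = iter(lines)
    # zip pulls from the same iterator twice per step, so each tuple
    # holds two consecutive lines; a lone final line is dropped.
    return [(k.rstrip("\n"), d.rstrip("\n")) for k, d in zip(it, it)]


print(pair_lines(["key1\n", "data for key1\n", "key2\n", "data for key2\n"]))
# [('key1', 'data for key1'), ('key2', 'data for key2')]
```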
>>
>> Sure, that's why I described them that way in the email and didn't
>> say that they weren't.
>>
>>> Are the key lines in file B.txt exact matches, or do they just
>>> "contain" the key somewhere in the line? Your code assumes the latter,
>>> but the whole thing could be much simpler if it were always an exact
>>> match.
>>
>> The entry in B has text before and after it (the length of that text
>> changes from entry to entry).
>
> In other words, the line pairs are not like your sample, but more
> like:
>
> trash key1 more trash
> Useful associated data for the previous key
> trash2 key2 more trash
> Useful associated data for the previous key
>
>
>>
>>
>>> Are the keys in A.txt unique? If so, you could store them in a set,
>>> and make lookup basically instantaneous.
>>
>> That I indeed didn't mention: the entries from A are unique in B.
>
> Not what I asked. Are the keys in A.txt ever present more than once in
> A.txt? But then again, if the key line can contain garbage before
> and/or after the key, then the set idea is moot anyway.
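To illustrate the distinction drawn here, a minimal sketch with made-up keys and lines (the function names are hypothetical). If a key line held exactly the key, a `set` of keys answers membership with one hash lookup per line, and building the set also discards any duplicate keys read from A.txt. Once the key is surrounded by other text, each line has to be tested against every key instead.

```python
from typing import Iterable, List, Set


def exact_matches(keys: Set[str], keylines: Iterable[str]) -> List[str]:
    """Key lines that equal a key exactly: one hash lookup per line."""
    return [line.strip() for line in keylines if line.strip() in keys]


def substring_matches(keys: Set[str], keylines: Iterable[str]) -> List[str]:
    """Keys found anywhere inside a key line: every key is tested per line."""
    return [key for line in keylines for key in sorted(keys) if key in line]


keys = set(["key1", "key2", "key1"])          # the duplicate "key1" collapses
lines = ["key1\n", "trash key2 more trash\n"]
print(exact_matches(keys, lines))             # ['key1']
print(substring_matches(keys, lines))         # ['key1', 'key2']
```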
>
>>
>>
>>> I think the real question you had was how to access the line following
>>> the key, once you matched the key.
>>
>> True, that is my real question (the code above works just for the
>> title line; I basically want to print the next line of B.txt for each
>> entry).
>>
>>> Something like this should do it (untested)
>>>
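>>> lines = iter(object)               # object: the open B.txt file
>>> for keyline in lines:
>>>     key = keyline.strip()          # drop the trailing newline
>>>     linedata = next(lines)         # the line following the key line
>>>     if key in mydictionary:        # exact match against the A.txt keys
>>>         print(key, "-->", linedata.rstrip("\n"))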
>>
>>
>>> Main caveat I can see is the file had better have an even number of
>>> lines.
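One way around this caveat, sketched with the standard-library `itertools.zip_longest` (an alternative, not something proposed in the thread; the function name `pairs_padded` is hypothetical). It pairs lines lazily and fills the missing data line of an odd-length file with an empty string, so the line list never has to be padded by hand.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple


def pairs_padded(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (key line, data line) pairs; a lone last key gets data ''."""
    it = iter(lines)
    # zip_longest keeps going until both slots are exhausted and
    # substitutes fillvalue for the missing item.
    for keyline, data in zip_longest(it, it, fillvalue=""):
        yield keyline.rstrip("\n"), data.rstrip("\n")


print(list(pairs_padded(["key1\n", "data1\n", "key2\n"])))
# [('key1', 'data1'), ('key2', '')]
```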
>>
>>
>> That changes from file to file, and it's unlikely that all of them have
>> an even number.
>
> In that case, what do you use for data of the last key?
>
>
> If you really have to handle the case where there is a final key with no
> data, then you'll have to detect that case and make up the data
> separately. That could be done with a try block, but this is probably
> clearer:
>
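> rawlines = object.readlines()        # object: the open B.txt file
> if len(rawlines) % 2 != 0:
>     rawlines.append("")              # pad with an empty data line (+= "" would add nothing)
> lines = iter(rawlines)
>
> for keyline in lines:
>     linedata = next(lines)           # the line after the key line
>     for word in searches:            # searches: presumably the keys from A.txt
>         if word in keyline:
>             print(word, "-->", linedata.rstrip("\n"))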
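The try-block variant mentioned above might look like this minimal sketch, assuming `searches` is the list of keys from A.txt and B's lines arrive as an iterable (the function name `search_with_try` is hypothetical). `next()` raises `StopIteration` when the iterator is exhausted, so catching it supplies empty data for an unpaired last key without calling `readlines()`.

```python
from typing import Iterable, List, Tuple


def search_with_try(searches: Iterable[str], b_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (word, data) for each search word found in a key line."""
    words = list(searches)
    found = []
    lines = iter(b_lines)
    for keyline in lines:
        try:
            linedata = next(lines).rstrip("\n")
        except StopIteration:              # last key line has no data line
            linedata = ""
        for word in words:
            if word in keyline:
                found.append((word, linedata))
    return found
```

After the exception is caught, the `for` loop's own next step finds the iterator exhausted and ends normally.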
>
>
>>
>> Thanks
>>
>>
>> Afonso
>>
>>