[Tutor] Replacing fields in lines of various lengths
spir
denis.spir at free.fr
Tue May 5 12:06:48 CEST 2009
On Tue, 5 May 2009 00:22:45 -0400,
Dan Liang <danliang20 at gmail.com> wrote:
> -------------Begin data----------------------------
>
> w1 \t case_def_acc \t yes
> w2 \t noun_prop \t no
> w3 \t case_def_gen \t
> w4 \t dem_pron_f \t no
> w3 \t case_def_gen \t
> w4 \t dem_pron_f \t no
> w1 \t case_def_acc \t yes
> w3 \t case_def_gen \t
> w3 \t case_def_gen \t
>
> -------------End data----------------------------
> I tried to make changes to the code above by changing the function where we
> read the dictionary, but it did not work. While it is ugly, I include it as
> a proof that I have worked on the problem. I am sure you will have various
> nice ideas.
>
>
> -------------Begin code----------------------------
> def newlyTaggedWord(line):
>     tagging = ""
>     line = line.split(TAB) # separate parts of line, keeping data only
>     if len(line)==3:
>         word = line[-3]
>         tag = line[-2]
>         new_tags = tags[tag]
>         decision = line[-1]
>
>         # in decision I wanted to store either yes or no if one of these existed
>
>     elif len(line)==2:
>         word = line[-2]
>         tag = line[-1]
>         decision = TAB
>
>         # I thought if it is a must to put sth in decision while decision is really
>         # absent in line, I would put a tab. But I really want to avoid putting
>         # anything there.
>
>     new_tags = tags[tag] # read in dict
>     tagging = TAB.join(new_tags) # join with TABs
>     return word + TAB + tagging + TAB + decision
> -------------End code----------------------------
>
For simplicity, it would be nice if the file had some placeholder in place of the absent yes/no 'decisions', so that you know there are always 3 fields. Most languages would need that. But Python is rather flexible and clever in such border cases: as long as the second TAB is present, splitting still gives 3 fields. Watch the example below:
s1, s2 = "1\t2\t3", "1\t2\t"
items1, items2 = s1.split('\t'), s2.split('\t')
print items1, items2
==>
['1', '2', '3'] ['1', '2', '']
So you always have 3 items, the 3rd one possibly being the empty string. Right?
This means:
* You can safely write "(word,tag,decision) = line.split(TAB)"
[Beware of misleading naming like "line = line.split(TAB)", for after this the name 'line' actually refers to field values.]
* You can have a single process.
* The elif branch in your code above will never run, I guess ;-)
[place a print instruction inside to check that]
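To make the points above concrete, here is a minimal sketch of such a single process. It assumes that fields are separated by exactly one TAB with no surrounding spaces and that the second TAB is always present. The contents of the 'tags' dict are made up for illustration (your real one is read from a file), and stripping the line ending is my own addition.

```python
TAB = "\t"

# Hypothetical tag dictionary, standing in for the one read from file.
tags = {
    "case_def_acc": ["case", "def", "acc"],
    "noun_prop": ["noun", "prop"],
    "case_def_gen": ["case", "def", "gen"],
    "dem_pron_f": ["dem", "pron", "f"],
}

def newlyTaggedWord(line):
    # Remove only the line ending: a bare strip() would also eat the
    # trailing TAB and leave two fields instead of three.
    word, tag, decision = line.rstrip("\r\n").split(TAB)
    tagging = TAB.join(tags[tag])  # join the new tags with TABs
    # decision is '' when absent, so nothing extra is written after the last TAB
    return word + TAB + tagging + TAB + decision

print(repr(newlyTaggedWord("w1\tcase_def_acc\tyes\n")))
print(repr(newlyTaggedWord("w3\tcase_def_gen\t\n")))
```

If a line really has only 2 fields (no trailing TAB), the unpacking raises a ValueError, which is a handy signal that the line does not match the expected format.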
Denis
PS: I noticed that in your final version, for the case of files with 2 fields only, you misplaced the file closings. They fit better inside the function.
------
la vita e estrany