Splitting Tree
Cameron Simpson
cs at zip.com.au
Sun Dec 2 17:11:51 EST 2012
On 02Dec2012 07:02, subhabangalore at gmail.com <subhabangalore at gmail.com> wrote:
| On Sunday, December 2, 2012 5:39:32 PM UTC+5:30, subhaba... at gmail.com wrote:
| > I am using NLTK and I used the following command,
| > chunk=nltk.ne_chunk(tag)
| >
| > print "The Chunk of the Line Is:",chunk
| >
| > The Chunk of the Line Is: (S
| > ''/''
| > It/PRP
[...]
| > Now I am trying to split the output preferably by ",/,".
[...]
|
| Sorry to ask this. I converted in string and then splitted it.
I'm glad you solved your problem, but I would like to point out that
this is generally a risky way of manipulating data.
The problem arises if the string you're splitting on occurs as a literal
piece of text, but _not_ in the sense you intend. It may be the case
that it will not happen in your particular situation, but in general the
procedure:
- convert structure to string somehow
- perform simple text manipulation
- unconvert
is at risk of simplistic parsing of the string.
A common example is with CSV data. Suppose you wanted the third
column from an array of tuples:
    rows = [ (1,2,"A",4),
             (5,6,"B",8),
             (9,10,"C,D",12),
           ]
and you wanted [ "A", "B", "C,D" ]. If one went with the "convert to
text" approach, and decided that converting each tuple to a CSV style
data row was a good idea, you might write:
    column_3 = []
    for row in rows:
        csv_string = ",".join( str(item) for item in row )
        item3 = csv_string.split(",")[2]
        column_3.append(item3)
The (simplistic) code above will give you "C" from the third row, not
"C,D", because it naively assumes there are no commas in the data and
then does a simplistic textual split to find the third column.
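For contrast, here is a minimal sketch of two safer routes, written for Python 3. The first indexes the tuples directly and never makes a string at all. The second assumes you really do need a text form, and lets the standard csv module do both the quoting and the parsing so that the embedded comma survives. The name column_3_via_csv is just mine, for illustration.

```python
import csv
import io

rows = [
    (1, 2, "A", 4),
    (5, 6, "B", 8),
    (9, 10, "C,D", 12),
]

# Route 1: work on the structure directly; no string conversion, no parsing.
column_3 = [row[2] for row in rows]

# Route 2: if a text form is required, let the csv module quote on the way
# out and parse on the way back in, so "C,D" stays a single field.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)
buf.seek(0)
column_3_via_csv = [fields[2] for fields in csv.reader(buf)]
```

Note that the round trip through CSV turns every field into a string, so the numbers come back as "1", "2" and so on. That is another reason to prefer the first route when you already have the structure in hand.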
Obviously you wouldn't really do that for something this simple; it is to
show the issue. But your situation where manipulating a tree was tricky
and you converted it to a string is very similar conceptually.
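The same idea applied to tagged tokens might look like the sketch below. It is only an illustration: the tagged list is sample data I made up, not your actual output, and split_on_token is a hypothetical helper of mine, not part of NLTK. I am assuming you can get your tree's contents as a flat list of (word, tag) pairs; I believe NLTK trees have a leaves() method for that, but please check the NLTK documentation.

```python
# Made-up sample data standing in for the (word, tag) pairs of a tagged line.
tagged = [
    ("It", "PRP"), ("rained", "VBD"),
    (",", ","),
    ("so", "IN"), ("we", "PRP"), ("left", "VBD"),
]

def split_on_token(tokens, separator):
    """Split a list of tokens into sublists at each occurrence of separator."""
    groups = [[]]
    for token in tokens:
        if token == separator:
            # Start a new group; the separator itself is dropped.
            groups.append([])
        else:
            groups[-1].append(token)
    return groups

# Split at the comma token, comparing whole (word, tag) pairs rather than text.
clauses = split_on_token(tagged, (",", ","))
```

Because the comparison is against the whole (word, tag) pair, a comma that appears inside some other token's text cannot be mistaken for a separator.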
Hoping this shows you the issue,
--
Cameron Simpson <cs at zip.com.au>
I'm not making any of this up you know. - Anna Russell