which data structure should I use?

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Fri Jan 15 17:25:14 EST 2010


On Fri, 15 Jan 2010 01:56:24 -0300, Eknath Venkataramani
<eknath.iyer at gmail.com> wrote:

> I have a txt file in the following format:

> [code]
> "confident" => {
>   count => 4,
>   trans => {
>      "ashahvasahta" => 0.74918568,
>     "atahmavaishahvaasa" => 0.09095465,
>     "pahraaram\.nbha" => 0.06990729,
>          "mailatae" => 0.02856427,
>            "utanai" => 0.01929341,
>              "anaa" => 0.01578552,
>          "uthaanae" => 0.01403157,
>          "jaitanae" => 0.01227762,
>     },
> },
> "consumers" => {
>   count => 4,
>   trans => {
>     "upabhaokahtaa" => 0.75144362,
> ...

> and I need to extract "confident" , "ashahvasahta" from the first
> record, "consumers",  "upabhaokahtaa" from the second record...
> i.e. "word in english" and the "first word in the probable-translations"

The most robust way would be to write a dedicated parser for this format.
That should be easy with pyparsing: http://pyparsing.wikispaces.com/
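
To give an idea of what such a parser might look like, here is a minimal
sketch. It assumes pyparsing is installed and that every record has exactly
the shape shown in your sample (a quoted word, then count, then trans). The
names grammar, SAMPLE and first_translations are mine, and the closing braces
of the second record are my guess, since your sample is cut off there.

```python
import pyparsing as pp

# Punctuation that structures the file but carries no data.
ARROW = pp.Suppress("=>")
LBRACE = pp.Suppress("{")
RBRACE = pp.Suppress("}")
COMMA = pp.Suppress(",")

quoted = pp.QuotedString('"')
number = pp.pyparsing_common.number

# "translation" => probability,
pair = pp.Group(quoted + ARROW + number + pp.Optional(COMMA))

# count => 4,
count = pp.Suppress("count") + ARROW + number("count") + pp.Optional(COMMA)

# trans => { ...pairs... },
trans = (pp.Suppress("trans") + ARROW + LBRACE
         + pp.Group(pp.OneOrMore(pair))("trans")
         + RBRACE + pp.Optional(COMMA))

# "english word" => { count ..., trans ... },
entry = pp.Group(quoted("word") + ARROW + LBRACE + count + trans
                 + RBRACE + pp.Optional(COMMA))

grammar = pp.OneOrMore(entry)

# Shortened sample; the end of the second record is assumed.
SAMPLE = r'''
"confident" => {
  count => 4,
  trans => {
     "ashahvasahta" => 0.74918568,
    "atahmavaishahvaasa" => 0.09095465,
    },
},
"consumers" => {
  count => 4,
  trans => {
    "upabhaokahtaa" => 0.75144362,
    },
},
'''


def first_translations(text):
    """Return (english_word, first_translation) for every record in text."""
    return [(rec["word"], rec["trans"][0][0])
            for rec in grammar.parseString(text)]


if __name__ == "__main__":
    for word, translation in first_translations(SAMPLE):
        print(word, translation)
```

I used the camelCase method name parseString; recent pyparsing releases also
offer parse_string. If the real file has other keys besides count and trans,
the entry rule would have to be extended accordingly.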

If you can guarantee certain properties (e.g. keys like "confident" and
"consumers" always appear on a line of their own; translations appear one
per line; there are no line breaks before or after the => sign, etc.) then
you could process the file line by line, looking for those separators. But
only do that if you are completely sure the format is fixed (e.g. the file
is computer-generated, not human-written). Anyway, it isn't much easier
than writing a real parser, and the latter is a lot more reliable. Learning
how to use a tool like pyparsing is in no way a waste of time.
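
For comparison, the line-by-line approach could look roughly like the sketch
below, using only the standard library. It relies on all the guarantees
listed above; the regular expressions and the names HEADER, TRANS and
first_translations_by_line are my own illustration, not something dictated
by the format.

```python
import re

# Record header: a quoted word followed by "=> {" on the same line.
HEADER = re.compile(r'^\s*"([^"]*)"\s*=>\s*\{')
# Translation line: a quoted word followed by "=>" and a number.
TRANS = re.compile(r'^\s*"([^"]*)"\s*=>\s*[0-9.]+')


def first_translations_by_line(lines):
    """Return (english_word, first_translation) pairs from an iterable of lines."""
    pairs = []
    current = None  # English word still waiting for its first translation
    for line in lines:
        m = HEADER.match(line)
        if m:
            current = m.group(1)
            continue
        m = TRANS.match(line)
        if m and current is not None:
            pairs.append((current, m.group(1)))
            current = None  # ignore the remaining translations of this record
    return pairs


# Shortened sample in the format from the question.
SAMPLE_LINES = r'''
"confident" => {
  count => 4,
  trans => {
     "ashahvasahta" => 0.74918568,
    "pahraaram\.nbha" => 0.06990729,
    },
},
"consumers" => {
  count => 4,
  trans => {
    "upabhaokahtaa" => 0.75144362,
'''.splitlines()

if __name__ == "__main__":
    for word, translation in first_translations_by_line(SAMPLE_LINES):
        print(word, translation)
```

To run it on a real file, pass the open file object instead of SAMPLE_LINES.
Note that it silently skips anything it does not recognize, which is exactly
why I would only trust it on machine-generated input.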

-- 
Gabriel Genellina



