one-element tuples

Jussi Piitulainen jussi.piitulainen at helsinki.fi
Mon Apr 11 02:57:29 EDT 2016


Fillmore writes:

> so, I do not quite control the format of the file I am trying to
> parse.
>
> it has the format:
>
> "str1","str2",....,"strN" => more stuff
>   :
>
> in some cases there is just one "str" which is what created me
> problem.  The first "str1" has special meaning and, at times, it can
> be alone.
>
> The way I handle this is:
>
>     parts = line.strip().split(" => ")
>     tokens = eval(parts[0])
>
>     if type(tokens) == str:   #Handle case that there's only one token

[- -]

>     else:

[- -]

> which admittedly is not very elegant. If you have suggestions on how
> to avoid the use of eval() and still achieve the same, I would be
> delighted to hear them

It depends on what a "strK" can be. You already trust that it cannot be
"A => B", but can it be "A, B"? Can it be '"Literally" this'? Or "\""?
That is, can the strings contain commas or quote characters?

If not, some variant of this may be preferable:

        [ item.strip('"') for item in parts[0].split(',') ]

Or something like this:

        re.findall('[^",]+', parts[0])

Both are brittle if the format allows the crucial characters to occur
(or not occur) in different ways. The regex way may be flexible enough
to deal with the details, but it would be more complicated than the
above example.
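
To make the two variants concrete, here is a minimal sketch. The helper
names tokens_by_split and tokens_by_regex are made up for illustration,
and both assume that no string contains a comma or a double quote and
that there is no whitespace around the commas. Each returns a list even
when there is only one token, so the special case goes away.

```python
import re

def tokens_by_split(head):
    # Hypothetical helper: split on commas, then drop the surrounding quotes.
    return [item.strip('"') for item in head.split(',')]

def tokens_by_regex(head):
    # Hypothetical helper: collect every run of characters that is
    # neither a double quote nor a comma.
    return re.findall(r'[^",]+', head)

line = '"str1" => more stuff'
parts = line.strip().split(" => ")

print(tokens_by_split(parts[0]))
print(tokens_by_regex(parts[0]))
```

One difference to be aware of: an empty string "" in the input survives
the split variant as '' but disappears in the regex variant.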

One thing would be easy to do either way: _check_ the line for no
surprises, and _then_ proceed to split, or raise an alert, or deal with
each detected special case specially.
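
A rough sketch of the check-then-split idea follows. The pattern HEAD_OK
and the function parse_head are hypothetical names, and the pattern
encodes an assumption about the format: one or more double-quoted items
separated by bare commas, with no comma or double quote inside an item.
Adjust it to whatever the real format turns out to allow.

```python
import re

# Assumed format: "item","item",... with no commas or double quotes
# inside an item and no whitespace around the commas.
HEAD_OK = re.compile(r'"[^",]*"(?:,"[^",]*")*')

def parse_head(head):
    # Refuse anything that does not look exactly as expected.
    if HEAD_OK.fullmatch(head) is None:
        raise ValueError("unexpected format: %r" % head)
    # Safe to extract now: every item is a quoted run without surprises.
    return re.findall(r'"([^",]*)"', head)

line = '"str1","str2" => more stuff'
parts = line.strip().split(" => ")
print(parse_head(parts[0]))
```

Raising an exception is only one choice here. The same check could
instead log the line and skip it, or hand it to a special-case handler.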

If the format allows commas or double quotes inside the strings, the
quoting rules may be close enough to CSV that you can abuse the CSV
reader to interpret them. That depends very much on the details, and it
is no different from the way you currently rely on the first part of
the line following Python's expression syntax.

        import csv
        import io

        next(csv.reader(io.StringIO('"fo,o",\'"bar"\'')))
        # => ['fo,o', '\'"bar"\'']
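
Applied to a whole line, that might look like the sketch below. The
function name parse_line is invented for illustration, and the sketch
assumes that the part before " => " follows the default CSV quoting
rules (a double quote inside a string is written as two double quotes)
and that " => " never occurs inside a string.

```python
import csv
import io

def parse_line(line):
    # Split at the first " => "; everything before it is the token list.
    head, _, rest = line.strip().partition(" => ")
    # Let the CSV reader interpret the quoting. It returns a list even
    # when there is only one token.
    tokens = next(csv.reader(io.StringIO(head)))
    return tokens, rest

print(parse_line('"str1" => more stuff'))
print(parse_line('"fo,o","say ""hi""" => more stuff'))
```

If the file quotes things differently (backslash escapes, for example),
the reader's escapechar and doublequote parameters may or may not be
enough to cover it.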


