[Tutor] Converting from unicode to nonstring

Steven D'Aprano steve at pearwood.info
Fri Oct 15 15:37:57 CEST 2010


On Fri, 15 Oct 2010 09:26:48 pm David Hutto wrote:
> Ok, Let me restate and hopefully further clarify.
>
> 1. I have a field for a wxpython app using matplotlib to display
> 2. I have a sqlite3 db which I'm retrieving information from

Both of those points are irrelevant.


> 3. The sqlite data is returned as unicode: u'field'

Semi-relevant. What's important is that you have data as strings. It 
could be coming from a text file:

data = open("my data.txt").read()

or from the user:

data = raw_input("Enter some data: ")

or any other source that gives a string. It makes no difference where it 
comes from, the only thing that is important is that it is a string.
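A database is just one more source of strings. Here is a minimal sketch using an in-memory sqlite3 database, assuming Python 3. The table name `plots` and column name `data` are made up for illustration. A TEXT column comes back as a string (`unicode` in Python 2, `str` in Python 3), exactly like the file and raw_input examples.

```python
import sqlite3

# In-memory database; the table "plots" and column "data" are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plots (data TEXT)")
conn.execute("INSERT INTO plots VALUES (?)", (u'[1,2,3,4]',))

# fetchone() returns a one-element tuple; unpack it.
(value,) = conn.execute("SELECT data FROM plots").fetchone()
conn.close()

print(type(value))   # <class 'str'> under Python 3
print(repr(value))   # '[1,2,3,4]': a string that merely looks like a list
```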


> 4. The portion of the matplotlib code is filled in, in a for x in y:

I don't really understand this sentence. What do you mean by "the 
portion of code is filled in"?

But in any case, it's irrelevant that you are doing something in a for 
loop. For loop, while loop, using the data once only, who cares?


> 5. in plot(self.plot), self.plot is the variable I'm using from the
> unicoded db field comes in from sqlite as u'[1,2,3,4]', which places
> a string in quotes in that variables place:
>
> plot(u'[1,2,3,4]')

Your sentence above is a bit convoluted, but I *think* this is what 
you're trying to say:

- you take the string (which originally comes from the database, but it 
doesn't matter where it comes from) and store it in self.plot;

- you pass self.plot to a function plot(); and

- since self.plot is a string, you are passing a string to the function 
plot().

Presumably this is a problem, because plot() expects a list of ints, not 
a string.
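A quick way to see why a string is the wrong thing to hand to plot(): Python treats a string as a sequence of characters, not numbers. A minimal sketch, assuming Python 3 (where the u prefix is accepted and just means an ordinary str):

```python
as_string = u'[1,2,3,4]'   # what came out of the database
as_list = [1, 2, 3, 4]     # what plot() presumably wants

print(len(as_string))        # 9: it counts characters, not numbers
print(len(as_list))          # 4
print(list(as_string)[:3])   # ['[', '1', ','] are characters, not ints
print(sum(as_list))          # 10

# Doing arithmetic on the "items" of the string fails.
try:
    sum(as_string)
except TypeError as err:
    print("TypeError:", err)
```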


> 6. the plot(eval(self.plot)), changes the variable from the
> u'[1,2,3,4]' to just [1,2,3,4]

Well, yes it does, so long as the string is perfectly formed, and so 
long as it doesn't contain anything unexpected.

It is also slow, and unnecessary, and a bad habit to get into unless you 
know exactly what you are doing.


> 7 As stated somewhere above, the float error has nothing to do with
> the problem, only the fact that it was used as if I had placed ''
> around the necessary data from the db field.

Huh?


> 8. If anyone has a way better than eval to convert the u'field' when
> replacing a variable so that
>
> self.plot = [1,2,3,4]
>
> instead of
>
> self.plot = u'[1,2,3,4]'

And now, finally, after a dozen or more emails and 8 points (most of 
which are irrelevant!) we finally come to the real problem:

"I have a unicode string that looks like u'[1,2,3,4]' and I want to 
convert it into a list [1, 2, 3, 4]. How do I do that?"

There are at least three solutions to this:

(1) eval. The benefit of eval is that it is a built-in, so you don't 
have to do any programming. The downsides are that:

- it is very powerful, so it can do too much;
- it is dangerous if you can't trust the source of the data;
- because it does so much, it's a little slow;
- because it does so much, it will happily accept data that *should* 
give you an error;
- on the other hand, it's also quite finicky about what it does accept, 
and when it fails, the error messages may be cryptic.
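A short sketch of several of those points, assuming Python 3 (Python 2 behaves the same way here). The `__import__('os').getcwd()` call is harmless; it stands in for whatever an untrusted database row might contain.

```python
import os

# Well-formed data works as expected.
as_data = eval(u'[1,2,3,4]')               # [1, 2, 3, 4]

# eval runs expressions, not just data: arithmetic is quietly evaluated.
with_code = eval(u'[1+1, 2*3]')            # [2, 6]

# Function calls run too, which is why untrusted input is dangerous.
cwd = eval(u"__import__('os').getcwd()")
print(cwd == os.getcwd())                  # True

# Malformed input gives a SyntaxError about Python code, not about your data.
malformed_error = None
try:
    eval(u'[1,2,3')
except SyntaxError as err:
    malformed_error = err
    print("SyntaxError:", err)
```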


(2) ast.literal_eval. The benefit of this is that it is a standard 
library function starting in Python 2.6, so all you need do is import 
the ast module. The downsides are:

- it is still far more general than you need (it accepts any Python 
literal, such as dicts, strings and None), so it can do too much;
- because it does so much, it will happily accept data that *should* 
give you an error;
- on the other hand, it's also quite finicky about what it does accept, 
and when it fails, the error messages may be cryptic.
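For comparison, a sketch of similar inputs passed to ast.literal_eval, again assuming Python 3. It will not call functions, but it accepts any Python literal, so a list of mixed types gets through without complaint.

```python
import ast

as_data = ast.literal_eval(u'[1,2,3,4]')   # [1, 2, 3, 4]

# Function calls are rejected with ValueError instead of being executed.
refused = False
try:
    ast.literal_eval(u"__import__('os').getcwd()")
except ValueError:
    refused = True

# Any literal is accepted, not just ints.
mixed = ast.literal_eval(u'[1, 2.5, "three", None]')
print(mixed)   # [1, 2.5, 'three', None]
```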


(3) Write your own converter. The benefit of this is that you can make 
it as flexible or as finicky as you like. The downside is that you have 
to write it. But that's not actually very hard, and we can make sure 
that we get a nice error message in the event of a problem:


def str_to_list(s):
    """Convert a string that looks like a list to a list of ints."""
    s = s.strip()  # ignore leading and trailing spaces
    if not (s.startswith("[") and s.endswith("]")):
        raise ValueError("string does not look like a list")
    s = s[1:-1]  # throw away the [ and ]
    s = s.replace(",", " ")
    result = []
    try:
        for word in s.split():
            result.append(int(word))
    except ValueError:
        raise ValueError("item `%s` does not look like an int" % word)
    return result


>>> str_to_list(u' [1 , 2, 3, 15 , -3, 26,1, 7 ]   ')
[1, 2, 3, 15, -3, 26, 1, 7]


If you pass it faulty data, it gives you a nice error message:

>>> str_to_list(u'{1:2}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in str_to_list
ValueError: string does not look like a list

>>> str_to_list(u'[1,2,3,None]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 13, in str_to_list
ValueError: item `None` does not look like an int
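Since the point of writing your own is that you choose how finicky it is, here is a hypothetical stricter variant, `str_to_list_strict` (the name is illustrative, not from the thread). It is a sketch assuming Python 3: it splits on commas instead of turning them into spaces, so input such as `[1 2 3]` or `[1,,2]`, which str_to_list quietly accepts, is rejected.

```python
def str_to_list_strict(s):
    """Convert '[1, 2, 3]' to [1, 2, 3], requiring commas between items."""
    s = s.strip()
    if not (s.startswith("[") and s.endswith("]")):
        raise ValueError("string does not look like a list")
    inner = s[1:-1].strip()
    if not inner:
        return []  # "[]" is a valid empty list
    result = []
    for item in inner.split(","):
        item = item.strip()
        try:
            result.append(int(item))
        except ValueError:
            # Empty items (from "[1,,2]") and space-separated runs land here.
            raise ValueError("item `%s` does not look like an int" % item)
    return result

print(str_to_list_strict(u' [1 , 2, 3, 15 , -3, 26,1, 7 ]   '))
# [1, 2, 3, 15, -3, 26, 1, 7]
```

With this version, str_to_list_strict(u'[1 2 3]') raises "ValueError: item `1 2 3` does not look like an int", whereas str_to_list returns [1, 2, 3].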


Compared to eval and ast.literal_eval, both of which do too much:

>>> eval(u'{1:2}')
{1: 2}
>>> eval(u'[1,2,3,None]')
[1, 2, 3, None]

>>> import ast
>>> ast.literal_eval(u'{1:2}')
{1: 2}
>>> ast.literal_eval(u'[1,2,3,None]')
[1, 2, 3, None]
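One more standard-library option, not among the three above, is the json module (added in Python 2.6). A sketch assuming Python 3: because this particular string happens to be valid JSON, json.loads converts it, and it rejects both bad inputs, since JSON has no integer dictionary keys and spells None as null. It would still accept floats, strings and null inside the list, so you may want to check item types afterwards.

```python
import json

parsed = json.loads(u'[1,2,3,4]')
print(parsed)   # [1, 2, 3, 4]

rejected = []
for bad in (u'{1:2}', u'[1,2,3,None]'):
    try:
        json.loads(bad)
    except ValueError as err:   # json's decode error is a ValueError subclass
        rejected.append(bad)
        print("rejected %r: %s" % (bad, err))
```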


-- 
Steven D'Aprano

