[Tutor] Converting from unicode to nonstring
Steven D'Aprano
steve at pearwood.info
Fri Oct 15 15:37:57 CEST 2010
On Fri, 15 Oct 2010 09:26:48 pm David Hutto wrote:
> Ok, Let me restate and hopefully further clarify.
>
> 1. I have a field for a wxpython app using matplotlib to display
> 2. I have a sqlite3 db which I'm retrieving information from
Both of those points are irrelevant.
> 3. The sqlite data is returned as unicode: u'field'
Semi-relevant. What's important is that you have data as strings. It
could be coming from a text file:
data = open("my data.txt").read()
or from the user:
data = raw_input("Enter some data: ")
or any other source that gives a string. It makes no difference where it
comes from; the only thing that matters is that it is a string.
> 4. The portion of the matplotlib code is filled in, in a for x in y:
I don't really understand this sentence. What do you mean, portion of
code is filled in?
But in any case, it's irrelevant that you are doing something in a for
loop. For loop, while loop, using the data once only, who cares?
> 5. in plot(self.plot), self.plot is the variable I'm using from the
> unicoded db field comes in from sqlite as u'[1,2,3,4]', which places
> a string in quotes in that variables place:
>
> plot(u'[1,2,3,4]')
Your sentence above is a bit convoluted, but I *think* this is what
you're trying to say:
- you take the string (which originally comes from the database, but it
doesn't matter where it comes from) and store it in self.plot;
- you pass self.plot to a function plot(); and
- since self.plot is a string, you are passing a string to the function
plot().
Presumably this is a problem, because plot() expects a list of ints, not
a string.
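To make the difference concrete, here is a tiny sketch. I'm assuming that
plot() wants a sequence of numbers; the variable names are mine, not yours:

```python
data = u'[1,2,3,4]'    # what comes back from the database: a string
wanted = [1, 2, 3, 4]  # what plot() presumably expects: a list of ints

length_of_string = len(data)    # counts characters, brackets and commas included
length_of_list = len(wanted)    # counts items
first_char = data[0]            # the character "[", not the number 1

print("%d characters versus %d items; the string starts with %s"
      % (length_of_string, length_of_list, first_char))
```

The two objects print almost identically, but they are entirely different
types.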
> 6. the plot(eval(self.plot)), changes the variable from the
> u'[1,2,3,4]' to just [1,2,3,4]
Well, yes it does, so long as the string is perfectly formed, and so
long as it doesn't contain anything unexpected.
It is also slow, and unnecessary, and a bad habit to get into unless you
know exactly what you are doing.
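To see what "anything unexpected" can mean, here is a small sketch. The
sample strings are made up for illustration; they are not from your
program:

```python
# Illustrative inputs only; none of these come from the original thread.
samples = [
    u"[1,2,3,4]",    # the intended case: a list literal
    u"[1,2,3] * 2",  # an expression, not a plain list
    u"len('abc')",   # a function call
]

# eval runs each string as a Python expression, whatever it happens to be.
results = [eval(s) for s in samples]

for s, r in zip(samples, results):
    print("%-12s -> %r" % (s, r))
```

Only the first of those is the data you actually wanted, but eval accepts
all three without complaint.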
> 7 As stated somewhere above, the float error has nothing to do with
> the problem, only the fact that it was used as if I had placed ''
> around the necessary data from the db field.
Huh?
> 8. If anyone has a way better than eval to convert the u'field' when
> replacing a variable so that
>
> self.plot = [1,2,3,4]
>
> instead of
>
> self.plot = u'[1,2,3,4]'
And now, after a dozen or more emails and 8 points (most of which are
irrelevant!), we finally come to the real problem:
"I have a unicode string that looks like u'[1,2,3,4]' and I want to
convert it into a list [1, 2, 3, 4]. How do I do that?"
There are at least three solutions to this:
(1) eval. The benefit of eval is that it is a built-in, so you don't
have to do any programming. The downsides are that:
- it is very powerful, so it can do too much;
- it is dangerous if you can't trust the source of the data;
- because it does so much, it's a little slow;
- because it does so much, it will happily accept data that *should*
give you an error;
- on the other hand, it's also quite finicky about what it does accept,
and when it fails, the error messages may be cryptic.
(2) ast.literal_eval. The benefit of this is that it is a standard
library function starting in Python 2.6, so all you need do is import
the ast module. Unlike eval, it only evaluates literals, so it cannot
run arbitrary code. The downsides are:
- it accepts any Python literal (strings, numbers, tuples, lists, dicts,
booleans and None), not just lists of ints, so it can still do too much;
- because it accepts so much, it will happily accept data that *should*
give you an error;
- on the other hand, it's also quite finicky about what it does accept,
and when it fails, the error messages may be cryptic.
(3) Write your own converter. The benefit of this is that you can make
it as flexible or as finicky as you like. The downside is that you have
to write it. But that's not actually very hard, and we can make sure
that we get a nice error message in the event of a problem:
def str_to_list(s):
    """Convert a string that looks like a list to a list of ints."""
    s = s.strip()  # ignore leading and trailing spaces
    if not (s.startswith("[") and s.endswith("]")):
        raise ValueError("string does not look like a list")
    s = s[1:-1]  # throw away the [ and ]
    s = s.replace(",", " ")
    result = []
    try:
        for word in s.split():
            result.append(int(word))
    except ValueError:
        raise ValueError("item `%s` does not look like an int" % word)
    return result
>>> str_to_list(u' [1 , 2, 3, 15 , -3, 26,1, 7 ] ')
[1, 2, 3, 15, -3, 26, 1, 7]
If you pass it faulty data, it gives you a nice error message:
>>> str_to_list( u'{1:2}')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in str_to_list
ValueError: string does not look like a list
>>> str_to_list(u'[1,2,3,None]')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 13, in str_to_list
ValueError: item `None` does not look like an int
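As for "as flexible or as finicky as you like": because the version above
turns commas into spaces, it also accepts strings such as u'[1 2 3]' and
u'[1,,2]'. If that is too forgiving for your data, a stricter variant
might look like the sketch below. The name str_to_list_strict is just
something I made up for illustration, and treating u'[]' as an empty list
is my own assumption about what you would want:

```python
def str_to_list_strict(s):
    """Like str_to_list, but require exactly one comma between items."""
    s = s.strip()  # ignore leading and trailing spaces
    if not (s.startswith("[") and s.endswith("]")):
        raise ValueError("string does not look like a list")
    body = s[1:-1].strip()  # throw away the [ and ]
    if not body:
        return []  # assumption: "[]" means an empty list
    result = []
    for word in body.split(","):
        word = word.strip()
        try:
            result.append(int(word))
        except ValueError:
            # covers empty items ("1,,2") and missing commas ("1 2")
            raise ValueError("item `%s` does not look like an int" % word)
    return result
```

Whether you want the forgiving version or the strict one depends on how
much you trust whatever wrote the data into the database.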
Compare str_to_list to eval and ast.literal_eval, both of which do too
much:
>>> eval(u'{1:2}')
{1: 2}
>>> eval(u'[1,2,3,None]')
[1, 2, 3, None]
>>> import ast
>>> ast.literal_eval(u'{1:2}')
{1: 2}
>>> ast.literal_eval(u'[1,2,3,None]')
[1, 2, 3, None]
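If you would still rather let ast.literal_eval do the parsing, one option
is to check the result afterwards. This is only a sketch: the function
name literal_to_int_list and the wording of the error messages are my own
invention, not anything from the standard library:

```python
import ast

def literal_to_int_list(s):
    """Parse s with ast.literal_eval, then insist on a list of ints."""
    try:
        value = ast.literal_eval(s.strip())
    except (ValueError, SyntaxError):
        raise ValueError("could not parse %r as a literal" % (s,))
    if not isinstance(value, list):
        raise ValueError("expected a list, got %s" % type(value).__name__)
    for item in value:
        # bool is a subclass of int, so reject True and False explicitly
        if isinstance(item, bool) or not isinstance(item, int):
            raise ValueError("item %r is not an int" % (item,))
    return value
```

That way u'{1:2}' and u'[1,2,3,None]' are rejected with a ValueError
instead of slipping through, at the cost of writing nearly as much code
as the hand-written converter.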
--
Steven D'Aprano