Wait... WHAT?

Tim Chase python.list at tim.thechases.com
Wed Feb 12 22:29:53 EST 2014


On 2014-02-13 00:59, Mark Lawrence wrote:
> > >>> s = "\u3141" # HANGUL LETTER MIEUM
> > >>> f = open('test.txt', 'w')
> > >>> f.write("\u3141")
> > Traceback (most recent call last):
> >    File "<stdin>", line 1, in <module>
> > UnicodeEncodeError: 'ascii' codec can't encode character '\u3141'
> > in position 0: ordinal not in range(128)
> >
> > Just because the open() call hides the specification of how Python
> > should do that encoding doesn't prevent the required encoding from
> > happening. :-)
> 
> Which clearly reinforces the fact that what you originally said is 
> incorrect, I don't have to do anything, Python very kindly does
> things for me under the covers.

...and when they break, you get to keep both pieces. :)

If you don't know that encoding is being done, exceptions like the
above are a lot harder to make sense of, and it's harder to trust the
assumption that you can write strings directly to files. My original
point (though perhaps not conveyed as well as I'd intended) was that
only bytes get written to the disk, so some encoding must take place.
It can be done implicitly using defaults that may break (as demoed),
but one is better off doing it explicitly, as Chris shows:

  >>> f = open('test.txt', 'w', encoding='utf-8')
  >>> f.write("\u3141")  
  1

UTF-8'rs gonna 8. (or whatever memes the cool kids are riffing these
days)
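
To make the "only bytes hit the disk" point concrete, here is a rough
sketch (Python 3 assumed; the temp-file path and the variable names
are just made up for illustration). It encodes the same character by
hand, shows that an ASCII codec refuses it, and then reads the file
back in binary mode to see what actually got written:

```python
import os
import tempfile

text = "\u3141"  # HANGUL LETTER MIEUM

# Encoding by hand shows the bytes that would actually reach the disk.
utf8_bytes = text.encode("utf-8")

# An ASCII codec (the default in the traceback above) cannot
# represent this character at all.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Write with an explicit encoding, then read the raw bytes back.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    written = f.write(text)  # count of characters written, not bytes

with open(path, "rb") as f:
    on_disk = f.read()

print(utf8_bytes, ascii_failed, written, on_disk == utf8_bytes)
```

The one character turns into three bytes on disk, which is the
encoding step that open() otherwise does behind your back.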

-tkc