[Tutor] converting string to text

Dave Angel davea at davea.name
Wed Jul 10 12:45:15 CEST 2013


On 06/28/2013 07:08 PM, grubber at roadrunner.com wrote:
> I don't get why this doesn't work. Its a very small program to read in a group of numbers. I've attached the the python file and my test file. The test file just contains one line. The error I get is that it can't convert the string to a float, but its a valid number.
>

Good job reducing the problem to a pretty small sample.  It'd also be 
nice if you had included the code and data inline, since not everyone 
can get attachments, or is willing to open them if they do.

Please specify Python version and give the complete error message 
(including traceback), not a paraphrased error.

Get rid of the BOM from the data file, and it'll work fine.  You don't 
specify what version of Python you're using, so I have to guess.  But 
there's a utf-8 encoding of a BOM (byte order mark) at the beginning of 
that file, and that's not numeric.  Best would be to change the way you 
generate that file, so that it doesn't put in a BOM for utf-8.

BOMs are markers that are put at the beginning of files in certain 
encodings to distinguish between big-endian (BE) and little-endian (LE) 
byte orders.  But since your file is utf-8, which has only one byte 
order, a BOM is unnecessary and confusing.  It may even be illegal, but 
I'm not sure about that.

When in doubt about this sort of thing, use a hex dumper to examine the 
file.  But in this case, all you had to do was
     print repr(line)

and you would have seen:
'\xef\xbb\xbf1.0000000 0.0000000 0.0000000\n'

The BOM is by definition a single Unicode code point.  But once encoded 
to utf-8, it's 3 bytes long.
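
A quick way to convince yourself of that is the sketch below.  It is 
only an illustration (the variable names are mine), written so it runs 
unchanged on both 2.7 and 3.3+:

```python
import codecs

bom = u'\ufeff'                # the BOM is the single code point U+FEFF
encoded = bom.encode('utf-8')  # encoding it as utf-8 produces three bytes

assert len(bom) == 1
assert encoded == b'\xef\xbb\xbf'
assert encoded == codecs.BOM_UTF8  # the stdlib exposes the same constant
```

Those three bytes are exactly what shows up at the front of the repr() 
output above.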

You appear to be using Python 2.7, since the string module doesn't 
contain the split function in 3.3.  Therefore you're reading it into a 
byte string, and the BOM is exactly 3 bytes.  So remove the first 3 
bytes of lines[0].  You probably want to specifically use a startswith() 
method to make sure those 3 bytes ARE a BOM, in case some of your files 
have it and some do not.
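
Something along these lines would do it.  This is a minimal sketch, not 
your program: strip_utf8_bom is a name I made up, and the sample line is 
the one from your data file.  It assumes the line was read as a byte 
string, as happens by default in 2.7.

```python
import codecs

def strip_utf8_bom(data):
    """Return data without a leading utf-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data  # no BOM, so leave the bytes untouched

line = b'\xef\xbb\xbf1.0000000 0.0000000 0.0000000\n'
values = [float(field) for field in strip_utf8_bom(line).split()]
```

Because of the startswith() check, the same function is safe to call on 
files that have no BOM.  If you later decode the file to text instead of 
working with bytes, the 'utf-8-sig' codec drops a leading BOM for you.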




BTW, there are other comments I could make about the source.   But the 
main concern is that you're mixing tabs and spaces.  Nasty bugs can 
occur that way.
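
If you want to see where the mixing happens, a rough sketch like this 
one lists the lines whose indentation contains tabs and those whose 
indentation contains spaces.  The function name and the sample source 
are hypothetical, not taken from your attachment.

```python
def indent_report(source):
    """Return (tab_lines, space_lines) as lists of 1-based line numbers."""
    tab_lines, space_lines = [], []
    for number, line in enumerate(source.splitlines(), 1):
        # Leading whitespace is whatever lstrip() would remove.
        indent = line[:len(line) - len(line.lstrip(' \t'))]
        if '\t' in indent:
            tab_lines.append(number)
        if ' ' in indent:
            space_lines.append(number)
    return tab_lines, space_lines

sample = "for x in data:\n\tprint(x)\n        print(x)\n"
tabs, spaces = indent_report(sample)
```

If both lists come back non-empty, the file mixes the two styles.  On 
Python 2 you can also run the script with the -tt option, which turns 
inconsistent tab usage into an error.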


-- 
DaveA


