[Tutor] converting string to text
Dave Angel
davea at davea.name
Wed Jul 10 12:45:15 CEST 2013
On 06/28/2013 07:08 PM, grubber at roadrunner.com wrote:
> I don't get why this doesn't work. It's a very small program to read in a group of numbers. I've attached the Python file and my test file. The test file just contains one line. The error I get is that it can't convert the string to a float, but it's a valid number.
>
Good job reducing the problem to a pretty small sample. It'd also be
nice if you had included the code and data inline, since not everyone
can get attachments, or is willing to open them if they do.
Please specify Python version and give the complete error message
(including traceback), not a paraphrased error.
Get rid of the BOM (byte order mark) at the start of the data file, and
it'll work fine. You don't specify what version of Python you're using,
so I have to guess. But there's a UTF-8-encoded BOM at the beginning of
that file, and that's not numeric. The best fix would be to change the
way you generate that file so that it doesn't write a BOM for UTF-8.
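If you can't control how the file is produced, you can make the reader tolerant instead. Here's a minimal Python 3 sketch, assuming a whitespace-separated file of floats like the one in your test file. The "utf-8-sig" codec (part of the standard library) drops a leading BOM if one is there and otherwise behaves like plain UTF-8. The helper name read_floats is hypothetical, and the demo file just recreates the bytes shown further down.

```python
import os
import tempfile


def read_floats(path):
    """Hypothetical helper: parse whitespace-separated floats, one row per line."""
    rows = []
    # 'utf-8-sig' strips a leading UTF-8 BOM if present,
    # and decodes BOM-less UTF-8 unchanged.
    with open(path, encoding="utf-8-sig") as f:
        for line in f:
            if line.strip():  # skip blank lines
                rows.append([float(tok) for tok in line.split()])
    return rows


# Demo: write a one-line file with a BOM, like the poster's test file.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"\xef\xbb\xbf1.0000000 0.0000000 0.0000000\n")
    path = tmp.name

result = read_floats(path)
os.remove(path)
print(result)  # [[1.0, 0.0, 0.0]]
```

The same helper reads files without a BOM too, so you don't have to know in advance which kind you've got.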
BOMs are markers that are put at the beginning of certain encodings of
files to distinguish between big-endian (BE) and little-endian (LE) byte
order. But since your file is UTF-8, which has only one byte order, a BOM
is unnecessary and confusing. The Unicode standard allows a BOM in UTF-8
but neither requires nor recommends it.
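To see what a BOM is actually for, here's a small Python 3 sketch using UTF-16, where byte order matters. The same code point, U+FEFF, comes out as different bytes in each order, and the generic "utf-16" decoder reads those bytes to decide how to decode the rest. The constants codecs.BOM_UTF16_BE and codecs.BOM_UTF16_LE are from the standard library. The two-byte "A" samples are just made-up illustrations.

```python
import codecs

# The same code point, U+FEFF, serialized in each UTF-16 byte order.
be = "\ufeff".encode("utf-16-be")
le = "\ufeff".encode("utf-16-le")
print(be, le)  # b'\xfe\xff' b'\xff\xfe'
print(be == codecs.BOM_UTF16_BE, le == codecs.BOM_UTF16_LE)  # True True

# The generic "utf-16" decoder uses the BOM to pick the byte order,
# then drops it, so both byte sequences decode to the same text.
text_from_be = b"\xfe\xff\x00A".decode("utf-16")
text_from_le = b"\xff\xfeA\x00".decode("utf-16")
print(text_from_be, text_from_le)  # A A
```

UTF-8 has no such ambiguity, which is why the BOM buys you nothing there.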
When in doubt about this sort of thing, use a hex dumper to examine the
file. But in this case, all you had to do was
print repr(line)
and you would have seen:
'\xef\xbb\xbf1.0000000 0.0000000 0.0000000\n'
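In Python 2.7 that one print statement is enough. In Python 3, reading the file in binary mode gives you the same raw bytes a hex dumper would show. Here's a sketch that recreates a one-line file from the repr above (an assumption, since I only have that one line to go on):

```python
import os
import tempfile

# Recreate a one-line file like the poster's: BOM followed by three floats.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"\xef\xbb\xbf1.0000000 0.0000000 0.0000000\n")
    path = tmp.name

# Binary mode shows the raw bytes, BOM included.
with open(path, "rb") as f:
    first = f.readline()
os.remove(path)

print(repr(first))     # b'\xef\xbb\xbf1.0000000 0.0000000 0.0000000\n'
print(first[:3].hex()) # efbbbf

# Decoded as plain UTF-8, the BOM survives as the code point U+FEFF.
print(repr(first.decode("utf-8")))  # '\ufeff1.0000000 0.0000000 0.0000000\n'
```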
The BOM is by definition a single Unicode code point (U+FEFF). But once
encoded to UTF-8, it's 3 bytes long.
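A quick Python 3 check of the one-code-point, three-bytes claim. codecs.BOM_UTF8 is the standard library's name for those three bytes:

```python
import codecs

bom = "\ufeff"                 # the BOM as a single code point
encoded = bom.encode("utf-8")  # the same BOM as UTF-8 bytes
print(len(bom), len(encoded))  # 1 3
print(encoded)                 # b'\xef\xbb\xbf'
print(encoded == codecs.BOM_UTF8)  # True
```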
You appear to be using Python 2.7, since the string module doesn't
contain the split function in 3.3. Therefore you're reading it into a
byte string, and the BOM is exactly 3 bytes. So remove the first 3
bytes of lines[0]. You'll probably want to check with the startswith()
method that those 3 bytes ARE a BOM first, in case some of your files
have it and some do not.
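Here's a sketch of that startswith() check. It's written for Python 3, where you'd get a byte string by opening the file in "rb" mode. In 2.7 a plain str is already bytes, and codecs.BOM_UTF8 exists there too. The helper name strip_utf8_bom is hypothetical, and the sample line is the one from the repr above. It also shows the failure you hit: with the BOM still attached, float() rejects the first field.

```python
import codecs


def strip_utf8_bom(data):
    # Hypothetical helper: remove the 3-byte UTF-8 BOM only if it is present,
    # so files with and without a BOM are handled the same way.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


raw = b"\xef\xbb\xbf1.0000000 0.0000000 0.0000000\n"

# Without stripping, the first field is '\ufeff1.0000000', which float() rejects.
try:
    float(raw.decode("utf-8").split()[0])
except ValueError as exc:
    print("ValueError:", exc)

values = [float(tok) for tok in strip_utf8_bom(raw).decode("utf-8").split()]
print(values)                    # [1.0, 0.0, 0.0]
print(strip_utf8_bom(b"2.5\n"))  # b'2.5\n' (no BOM, left unchanged)
```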
BTW, there are other comments I could make about the source, but the
main concern is that you're mixing tabs and spaces in your indentation.
Nasty bugs can occur that way.
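Python 2 will quietly accept mixed tabs and spaces unless you run it with the -tt flag, and the standard tabnanny module (python -m tabnanny yourfile.py) can flag ambiguous indentation. Python 3 refuses ambiguous mixing outright with a TabError. Here's a small Python 3 sketch showing that. The helper name has_tab_error and the sample snippets are made up for illustration.

```python
def has_tab_error(source):
    """Hypothetical helper: True if Python 3 rejects the source for mixing tabs/spaces."""
    try:
        compile(source, "<mixed>", "exec")
    except TabError:
        return True
    return False


# One body line indented with a tab, the next with eight spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"
# The same code indented consistently with spaces.
clean = "if True:\n    x = 1\n    y = 2\n"

print(has_tab_error(mixed))  # True
print(has_tab_error(clean))  # False
```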
--
DaveA