Ascii Encoding Error with UTF-8 encoder

John Machin sjmachin at lexicon.net
Tue Jun 27 20:25:30 EDT 2006


On 28/06/2006 9:44 AM, Mike Currie wrote:
> 
> What I am doing is converting data for processing that will be tab (for 
> columns) and newline (for rows) delimited. Some of the data contains tabs 
> and newlines, so I have to convert them to something else so the file 
> integrity is good.
> 
> Not my idea, I've been left with the implementation however.
> 

Do you *need* UTF-8? Or is that only there to hide away the \x88 and 
\x83? Apart from tab and linefeed, what (if any) other characters are 
there in the data that are not printable ASCII characters?
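
One quick way to answer that question is to count every character in the data that falls outside printable ASCII (space through tilde, 0x20 to 0x7E). The sketch below is in modern Python 3 and assumes the data is already available as rows of text fields; the function name `unusual_chars` and the row layout are hypothetical, for illustration only.

```python
from collections import Counter

# "Printable ASCII" is taken here to mean space (0x20) through tilde (0x7E).
PRINTABLE_ASCII = set(chr(c) for c in range(0x20, 0x7F))

def unusual_chars(rows):
    """Count characters in the data that are not printable ASCII.

    Hypothetical helper: `rows` is assumed to be an iterable of rows,
    each row an iterable of str fields.
    """
    counts = Counter()
    for row in rows:
        for field in row:
            counts.update(ch for ch in field if ch not in PRINTABLE_ASCII)
    return counts

# Example: tab, newline and a Latin-1 e-acute each show up once.
print(unusual_chars([["a\tb", "x\ny"], ["caf\xe9"]]))
```

If the result contains only tab and newline, plain ASCII output with some form of quoting or escaping may be enough and UTF-8 is not needed at all.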

In any case, if you have 8-bit string data, the CSV file format would 
appear to meet the requirement: it preserves your data by "quoting" 
delimiters and newlines that appear in the actual data. The Python csv 
module has been included in every Python distribution since 2.3.
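
A minimal sketch of that approach, written for modern Python 3 (the 2.x csv module worked on byte strings and files opened in binary mode, so details differ there). It keeps tab as the column delimiter and newline as the row terminator; with the default QUOTE_MINIMAL setting, the writer wraps any field containing a tab, a newline or a double quote in quotes, and the reader restores the original values. The helper names `write_rows` and `read_rows` are hypothetical.

```python
import csv
import io

def write_rows(rows):
    """Serialize rows as tab-delimited text with CSV-style quoting."""
    buf = io.StringIO()
    # Fields containing the delimiter, quotechar or a line terminator get quoted.
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

def read_rows(text):
    """Parse text produced by write_rows back into a list of rows."""
    # newline="" stops line endings inside quoted fields from being translated.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    return [row for row in reader]

rows = [["id", "comment"], ["1", "has\ttab"], ["2", "line one\nline two"]]
text = write_rows(rows)
print(repr(text))
print(read_rows(text) == rows)
```

Note that this only helps if whatever consumes the file also understands CSV-style quoting; a reader that simply splits on tabs and newlines will still break on the quoted fields.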

Cheers,
John


