writing \feff at the beginning of a file
Peter Billam
peter at www.pjb.com.au
Sun Aug 15 07:08:08 EDT 2010
On 2010-08-14, Martin v. Loewis <martin at v.loewis.de> wrote:
>> Is there a standard way to autodetect the encoding of a text file?
> Use the chardet module:
> http://chardet.feedparser.org/
Very timely: the python-chardet package seems to have just appeared
on Debian squeeze :-) After my latest "aptitude safe-upgrade":
box8 (debian) ~> aptitude show python-chardet
Package: python-chardet
State: installed
Automatically installed: yes
Version: 2.0.1-1
Priority: optional
Section: python
Maintainer: Piotr Ożarowski <piotr at debian.org>
Uncompressed Size: 721k
Depends: python, python-support (>= 0.90.0)
Description: universal character encoding detector
Chardet takes a sequence of bytes in an unknown character encoding,
and attempts to determine the encoding.
Supported encodings:
* ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
* Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and
  Simplified Chinese)
* EUC-JP, SHIFT_JIS, ISO-2022-JP (Japanese)
* EUC-KR, ISO-2022-KR (Korean)
* KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
* ISO-8859-2, windows-1250 (Hungarian)
* ISO-8859-5, windows-1251 (Bulgarian)
* windows-1252 (English)
* ISO-8859-7, windows-1253 (Greek)
* ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
* TIS-620 (Thai)
This library is a port of the auto-detection code in Mozilla.
Homepage: http://chardet.feedparser.org/
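
For the \feff case in the subject line, a rough sketch of how detection
might go: check for a byte-order mark first, and only fall back to
guessing when none is present. The names sniff_bom and guess_encoding
and the "utf-8" default are my own assumptions for illustration, not
part of chardet. The chardet.detect() call is only attempted if the
module happens to be installed.

```python
import codecs

# BOMs are checked longest-first, because the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data):
    """Return a codec name if data starts with a known BOM, else None.

    The returned codecs ("utf-8-sig", "utf-16", "utf-32") strip the BOM
    themselves when decoding.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def guess_encoding(data, default="utf-8"):
    """Hypothetical helper: BOM first, then chardet if available, else default."""
    name = sniff_bom(data)
    if name is not None:
        return name
    try:
        import chardet  # third-party; may not be installed
    except ImportError:
        return default
    guess = chardet.detect(data)  # dict with 'encoding' and 'confidence' keys
    return guess["encoding"] or default
```

Something like guess_encoding(open(path, "rb").read()) would then give a
codec name to pass to bytes.decode(). Without a BOM the result is only a
statistical guess, so it is worth checking the 'confidence' value before
trusting it.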
Regards, Peter
--
Peter Billam www.pjb.com.au www.pjb.com.au/comp/contact.html