writing \feff at the beginning of a file

Peter Billam peter at www.pjb.com.au
Sun Aug 15 07:08:08 EDT 2010


On 2010-08-14, Martin v. Loewis <martin at v.loewis.de> wrote:
>> Is there a standard way to autodetect the encoding of a text file? 
> Use the chardet module:
> http://chardet.feedparser.org/

Very timely: the python-chardet package seems to have just appeared
in Debian squeeze :-)  It showed up after my latest "aptitude safe-upgrade":

  box8 (debian) ~> aptitude show python-chardet
  Package: python-chardet                  
  State: installed
  Automatically installed: yes
  Version: 2.0.1-1
  Priority: optional
  Section: python
  Maintainer: Piotr Ożarowski <piotr at debian.org>
  Uncompressed Size: 721k
  Depends: python, python-support (>= 0.90.0)
  Description: universal character encoding detector
   Chardet takes a sequence of bytes in an unknown character encoding,
   and attempts to determine the encoding. 
 
   Supported encodings: 
   * ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants) 
   * Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and
     Simplified Chinese) 
   * EUC-JP, SHIFT_JIS, ISO-2022-JP (Japanese) 
   * EUC-KR, ISO-2022-KR (Korean) 
   * KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic) 
   * ISO-8859-2, windows-1250 (Hungarian) 
   * ISO-8859-5, windows-1251 (Bulgarian) 
   * windows-1252 (English) 
   * ISO-8859-7, windows-1253 (Greek) 
   * ISO-8859-8, windows-1255 (Visual and Logical Hebrew) 
   * TIS-620 (Thai) 
   
   This library is a port of the auto-detection code in Mozilla.
  Homepage: http://chardet.feedparser.org/
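
For anyone wanting to try it, here is a minimal sketch of how the
detection might be wired up. It is only an illustration, not tested
against this exact package version. The helper name guess_encoding is
hypothetical; it first checks for a byte order mark using the stdlib
codecs constants (an addition of mine, since the thread is about \feff),
and only then falls back to chardet.detect(), assuming chardet is
installed:

```python
import codecs

# Byte order marks, longest first: the UTF-32 LE mark begins with the
# same two bytes as the UTF-16 LE mark, so it has to be tested earlier.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw):
    """Return a best-guess codec name for raw bytes, or None if unknown.

    Hypothetical helper: a BOM check first, then chardet if available.
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    try:
        import chardet  # third-party; assumed installed (python-chardet)
    except ImportError:
        return None
    # chardet.detect() returns a dict with 'encoding' and 'confidence' keys;
    # 'encoding' may itself be None when chardet cannot decide.
    return chardet.detect(raw)["encoding"]
```

Typical use would be to read the file in binary mode, pass the bytes to
guess_encoding(), and decode with the returned name. The result is a
guess, so treating a low chardet confidence as "unknown" may be wiser.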

Regards,  Peter

-- 
Peter Billam       www.pjb.com.au    www.pjb.com.au/comp/contact.html


