Converting text file to different encoding.

Peter Otten __peter__ at web.de
Fri Apr 17 11:06:33 EDT 2015


Chris Angelico wrote:

> On Sat, Apr 18, 2015 at 12:26 AM,  <subhabrata.banerji at gmail.com> wrote:
>> I tried to do as follows,
>> >>> import codecs
>> >>> sourceEncoding = "iso-8859-1"
>> >>> targetEncoding = "utf-8"
>> >>> source = open("source1","w")
>> >>> string1="String type"
>> >>> str1=str(string1)
>> >>> source.write(str1)
>> >>> source.close()
>> >>> target = open("target", "w")
>> >>> source=open("source1","r")
>> >>> target.write(unicode(source.read(), sourceEncoding).encode(targetEncoding))
>> >>>
>>
>> am I going ok?
> 
> Here's how I'd do it.
> 
> $ python3
> >>> with open("source1", encoding="iso-8859-1") as source, open("target", "w", encoding="utf-8") as target:
> ...     target.write(source.read())

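This approach also works in Python 2.6 and 2.7 if you use io.open() 
instead of the builtin open(). On 2.6 you have to nest the two with 
statements, because a single with statement accepts multiple context 
managers only from 2.7 on.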
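
A minimal sketch of that variant, assuming a helper named convert() (my
name, not something from the thread) and the encodings used above. The
with statements are nested so the same code should also run on 2.6:

```python
import io


def convert(src_path, dst_path, src_enc="iso-8859-1", dst_enc="utf-8"):
    """Re-encode the text file src_path into dst_path (hypothetical helper)."""
    # io.open() takes encoding= on Python 2.6/2.7; on Python 3 it is the
    # same function as the builtin open().
    with io.open(src_path, encoding=src_enc) as source:
        with io.open(dst_path, "w", encoding=dst_enc) as target:
            # Reads the whole file into memory as text, then writes it
            # back out in the target encoding.
            target.write(source.read())
```

Called as convert("source1", "target") it does the same thing as the
interactive session quoted above.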

To limit memory consumption for big files, you can replace

target.write(source.read())

with

shutil.copyfileobj(source, target)
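
Put together, the chunked copy might look like the sketch below. The
function name convert_streaming() is only an illustration, and the
encodings are the ones from the original question:

```python
import io
import shutil


def convert_streaming(src_path, dst_path,
                      src_enc="iso-8859-1", dst_enc="utf-8"):
    """Re-encode a text file without reading it all at once (hypothetical helper)."""
    with io.open(src_path, encoding=src_enc) as source:
        with io.open(dst_path, "w", encoding=dst_enc) as target:
            # copyfileobj() loops over read()/write() in fixed-size chunks,
            # so only one chunk of decoded text is held in memory at a time.
            shutil.copyfileobj(source, target)
```

Because both files are opened in text mode, the decoding and encoding
still happen inside the file objects; copyfileobj() only moves the text.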

If you want to be sure that line endings are preserved, open both files with

io.open(..., newline="") # disable newline translation
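
As a rough sketch (convert_keep_newlines() is a made-up name), passing
newline="" on both sides means "\r\n", "\n" and "\r" are neither
translated on reading nor on writing:

```python
import io


def convert_keep_newlines(src_path, dst_path,
                          src_enc="iso-8859-1", dst_enc="utf-8"):
    """Re-encode a text file, leaving its line endings as they are (hypothetical helper)."""
    # newline="" switches off universal-newlines translation when reading
    # and the "\n" -> os.linesep translation when writing.
    with io.open(src_path, encoding=src_enc, newline="") as source:
        with io.open(dst_path, "w", encoding=dst_enc, newline="") as target:
            target.write(source.read())
```

Without newline="" a file with Windows line endings would be read with
"\r\n" turned into "\n", and on writing "\n" would become whatever
os.linesep is on the machine doing the conversion.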



