detecting newline character

Daniel Geržo danger at rulez.sk
Sun Apr 24 03:43:58 EDT 2011


On 23.4.2011 21:18, Thomas 'PointedEars' Lahn wrote:
> Daniel Geržo wrote:
>
>> I need to detect the newline characters used in the file I am reading.
>> For this purpose I am using the following code:
>>
>> def _read_lines(self):
>>     with contextlib.closing(codecs.open(self.path, "rU")) as fobj:
>>         fobj.readlines()
>>         if isinstance(fobj.newlines, tuple):
>>             self.newline = fobj.newlines[0]
>>         else:
>>             self.newline = fobj.newlines
>>
>> This works fine if I call codecs.open() without the encoding argument; I am
>> testing with an ASCII English text file, and in that case
>> fobj.newlines is correctly detected as '\r\n'. However, when I
>> call codecs.open() with the encoding='ascii' argument, fobj.newlines is
>> None and I can't figure out why that is the case. Reading the PEP at
>> http://www.python.org/dev/peps/pep-0278/ I don't see any reason why
>> I would end up with newlines being None after I call readlines().
>>
>> Does anyone have an idea? You can fetch the file I am testing with from
>> http://danger.rulez.sk/subrip_ascii.srt
>
> I see nothing suspicious in your .srt *after* downloading it.  file -i
> confirms that it only contains US-ASCII characters (but see below).

That is indeed the case in my environment too.

danger@[danger-mbp ~/devel/pysublib/pysublib/test/files]> file -i subrip_ascii.srt
subrip_ascii.srt: regular file
danger@[danger-mbp ~/devel/pysublib/pysublib/test/files]> file subrip_ascii.srt
subrip_ascii.srt: ASCII English text, with CRLF line terminators


> The only reason I can think of for this not working ATM comes from the
> documentation, where it says that 'U' requires Python to be built with
> universal newline support; that it is *usually* so, but might not be so in
> your case (but then the question remains: How could it be not None without
> `encoding' argument?)

Yes, this is what does not make sense. If I didn't have the universal 
newline support enabled, I wouldn't have the newlines attribute at all.

> <http://docs.python.org/library/codecs.html?highlight=codecs.open#codecs.open>
> <http://docs.python.org/library/functions.html#open>
>
> WFM with and without `encoding' argument in python-2.7.1-8 (CPython), Debian
> GNU/Linux 6.0.1, Linux 2.6.35.5-pe (custom) SMP i686.
>
> Which Python implementation and version are you using on which system?

This is a standard Python installation from MacPorts. The system is OS X
10.6.7. I have now tried both Python 2.7.1 and Python 2.6.6 from
MacPorts, and also 2.6.6 on FreeBSD. All fail for me when I set the
encoding.
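
One point that may be relevant here: the codecs.open() documentation
notes that when an encoding is given, the underlying file is always
opened in binary mode and no automatic conversion of '\n' is done. For
comparison, below is a minimal sketch of the same detection using
io.open() (available since Python 2.6), which handles newlines in its
own text layer. This is an assumption about a possible alternative, not
something tried in this thread; detect_newline and the temporary sample
file are hypothetical and not part of pysublib.

```python
import io
import os
import tempfile


def detect_newline(path, encoding="ascii"):
    """Return the newline style detected in the file, or None."""
    # newline=None enables universal newlines mode in the io text layer.
    with io.open(path, "r", encoding=encoding, newline=None) as fobj:
        fobj.read()
        newlines = fobj.newlines
    # newlines is a tuple when several styles were seen; take the first
    # entry, mirroring the code quoted above.
    if isinstance(newlines, tuple):
        return newlines[0]
    return newlines


# Hypothetical sample data standing in for subrip_ascii.srt.
fd, sample_path = tempfile.mkstemp(suffix=".srt")
with os.fdopen(fd, "wb") as out:
    out.write(b"line one\r\nline two\r\n")

detected = detect_newline(sample_path)
os.remove(sample_path)
print(repr(detected))
```

The encoding is passed explicitly on purpose, since that is the case
that fails with codecs.open().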

> On which system has the "ASCII" file been created and how?  Note that both
> uploading the file with FTP in ASCII mode and downloading over HTTP might
> have removed the problem Python has with it.

Unfortunately I am not 100% sure where I created the file; it was quite
some time ago, but it was either WinXP or OS X Leopard. The source code
can be found at https://bitbucket.org/danger/pysublib/src - I noticed
the subtitle file tests (e.g. test/test_subripfile.py) are failing for
me, and I have traced the problem to newlines being None after
calling read().
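
To rule out the file itself, the line terminators can also be counted
directly from the raw bytes, which bypasses any newline translation in
Python. A rough sketch; count_line_endings is a hypothetical helper and
the sample bytes are made up for illustration.

```python
def count_line_endings(data):
    """Count CRLF, lone CR and lone LF terminators in a bytes object."""
    crlf = data.count(b"\r\n")
    # Subtract the CRLF pairs so that only lone CR and lone LF remain.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {"\r\n": crlf, "\r": cr, "\n": lf}


# Made-up sample mixing all three terminator styles.
sample = b"one\r\ntwo\r\nthree\nfour\r"
counts = count_line_endings(sample)
print(counts)
```

For a real file, the bytes would come from open(path, "rb").read().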

-- 
S pozdravom / Best regards
   Daniel Gerzo


