File Name issue

dn PythonList at DancesWithMice.info
Sun Oct 18 14:48:00 EDT 2020


On 19/10/2020 05:58, Mladen Gogala via Python-list wrote:
> On Sun, 18 Oct 2020 21:00:18 +1300, dn wrote:
>> On 18/10/2020 12:58, Mladen Gogala via Python-list wrote:
>>> On Sat, 17 Oct 2020 22:51:11 +0000, Mladen Gogala wrote:
>>> BTW, I used this
>>> cp /var/log/syslog ./in-file.log
>>> #!/usr/bin/env python3
>>> import io
>>> with open("in-file.log","r") as infile:
>>>       for line in infile:
>>>           print(line)
>>> I got a different error:
>>> Traceback (most recent call last):
>>>     File "./test.py", line 4, in <module>
>>>       for line in infile:
>>>     File "/usr/lib/python3.8/codecs.py", line 322, in decode
>>>       (result, consumed) = self._buffer_decode(data, self.errors, final)
>>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd8 in position 897: invalid continuation byte
>>
>>
>> @Mladen: is syslog a text file or binary format?
> 
> Hi!
> Syslog is the system log. It's a text file. This only happens if I use
> infile as iterable. If I use readline, all is well:
> 
> #!/usr/bin/env python3
> import io
> with open("in-file.log","r") as infile:
>      while True:
>          line=infile.readline()
>          if not line:
>              break
>          print(line)
> 
> I don't particularly like this idiom, but it works. That is probably a bug
> in the utf-8 decoder on Ubuntu. It doesn't happen on my Fedora 32 VM. I
> haven't tried with infile.reconfigure(encoding=None)


[Slightly OT from OP]

Some logging has started to move from simple text to a more 
compressed/efficient 'binary' format, hence my question.
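On the text-versus-binary question, a rough check is possible before opening a file in text mode at all. The sketch below assumes a simple heuristic that is not from this thread: a NUL byte in the first few kilobytes suggests a binary format, because ASCII/UTF-8/Latin-1 text logs practically never contain one (UTF-16 text would, so treat the answer as a hint only). `looks_binary` is a hypothetical name.

```python
import sys


def looks_binary(path, sample_size=8192):
    """Heuristic guess: report a file as binary if its first bytes contain NUL.

    Assumption: ASCII/UTF-8/Latin-1 text practically never contains b"\\x00",
    while most binary formats do. UTF-16 text would be misreported.
    """
    with open(path, "rb") as f:  # binary mode: no decoding happens here
        sample = f.read(sample_size)
    return b"\x00" in sample


if __name__ == "__main__":
    for name in sys.argv[1:]:
        kind = "probably binary" if looks_binary(name) else "probably text"
        print(f"{name}: {kind}")
```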

Your observation is doubly interesting.

Fedora uses UTF-8 by default. I would have expected the same of Ubuntu. 
One wonders whether different decoder/encoder defaults are set by the 
repository managers, or some such explanation.
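To compare the two machines, assuming any difference lies in the locale: when `open()` is called in text mode without `encoding=`, Python uses `locale.getpreferredencoding(False)`, which reports UTF-8 when Python's UTF-8 mode is on. (That said, the traceback above already names the 'utf-8' codec, so the Ubuntu run was decoding as UTF-8.) The sketch prints both settings; `text_defaults` and `open_log` are hypothetical helpers, the latter making the encoding explicit and substituting U+FFFD for undecodable bytes instead of raising.

```python
import locale
import sys


def text_defaults():
    """Report what open() will use in text mode when encoding= is omitted."""
    return {
        # The default text encoding, derived from the locale (or UTF-8 mode).
        "preferred_encoding": locale.getpreferredencoding(False),
        # 1 when UTF-8 mode is on (PYTHONUTF8=1 or -X utf8), otherwise 0.
        "utf8_mode": sys.flags.utf8_mode,
    }


def open_log(path):
    """Hypothetical helper: open a log as UTF-8, replacing bad bytes.

    errors="replace" turns each undecodable byte into U+FFFD rather than
    raising UnicodeDecodeError, so iterating over the file never stops early.
    """
    return open(path, "r", encoding="utf-8", errors="replace")


if __name__ == "__main__":
    print(text_defaults())
```

Running the script on each machine shows whether the defaults actually differ; `errors="backslashreplace"` is an alternative that keeps the offending byte visible as `\xd8`.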

Using Fedora 32 (as before) and a copy of "/var/log/messages" (Fedora 
doesn't use "syslog"), it works happily:

>>> with open( "messages", "r" ) as infile:
...      for line in infile:
...          print(line)
...          break
...
Oct 18 00:01:01 JrBrown systemd[1]: Starting update of the root trust anchor for DNSSEC validation in unbound...

However, the decisive point is the actual data. Have you worked out 
which line in the log causes the error, and thus the offending string 
of characters?
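One way to find it, sketched under the assumption that the log is meant to be UTF-8: read the file in binary mode, so the text decoder never runs, then decode each line on its own and report the ones that fail. `find_bad_lines` is a hypothetical name; the offset it reports is relative to the start of that line, not the file.

```python
import sys


def find_bad_lines(path, encoding="utf-8"):
    """Yield (line_number, offset_in_line, raw_bytes, error) for bad lines.

    Binary mode sidesteps the text decoder, so every line can be read;
    each one is then decoded individually to see which ones fail.
    """
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                yield lineno, exc.start, raw, exc


if __name__ == "__main__" and len(sys.argv) > 1:
    for lineno, offset, raw, exc in find_bad_lines(sys.argv[1]):
        # Show the bytes around the failure; !r keeps non-printables visible.
        snippet = raw[max(0, offset - 20):offset + 20]
        print(f"line {lineno}, byte {offset}: {exc.reason}: {snippet!r}")
```

Run it as, for example, `python3 find_bad.py in-file.log` (the script name is arbitrary).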
-- 
Regards =dn

