[Patches] [ python-Patches-1214889 ] file.encoding support for file.write and file.writelines
SourceForge.net
noreply at sourceforge.net
Mon Aug 8 08:49:50 CEST 2005
Patches item #1214889, was opened at 2005-06-04 19:45
Message generated for change (Comment added) made by birkenfeld
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1214889&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core (C code)
Group: Python 2.5
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Reinhold Birkenfeld (birkenfeld)
Assigned to: Nobody/Anonymous (nobody)
Summary: file.encoding support for file.write and file.writelines
Initial Comment:
Here is a patch that lets Unicode strings written to a
file be encoded automatically. It enables Python code
to set file.encoding and obeys this encoding when
writing Unicode strings with write() or writelines().
It is my first core hackery, so forgive me one leaked
ref or the other. I hope I got the error handling
right; it is kind of confusing...
(btw: Bug #967986 will be fixed with this)
----------------------------------------------------------------------
>Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-08-08 08:49
Message:
Logged In: YES
user_id=1188172
Rejecting. This is incomplete and will be addressed more
properly in Py3k.
----------------------------------------------------------------------
Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-07-14 10:34
Message:
Logged In: YES
user_id=1188172
I agree with you that writing Unicode objects to a binary
file should raise an exception, but with the 'et#' format
string, 8-bit string objects should pass through file.write
unrecoded.
About your second comment: Yes, codecs is one way to do it,
but then I think that the encoding handling for print should
be ripped out, too. After all, that's what many people
complain about: "print unistr" works, while
"sys.stdout.write(unistr)" does not. As the comment below
about bug 1099364 shows, this shows up in various locations.
If this is rejected, file.write() shouldn't accept Unicode
anymore, and print should behave the same way.
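The consistency asked for here (write() and print agreeing on how
Unicode is encoded) can be sketched as follows. This assumes Python 3's
io.TextIOWrapper rather than the patched 2.x file object; the helper
name encode_via_write_and_print and the sample text are illustrative
only.

```python
import io


def encode_via_write_and_print(text, encoding="latin-1"):
    """Send the same text through write() and print(); return both byte results."""
    # Path 1: explicit write() on a text wrapper with a fixed encoding.
    raw_w = io.BytesIO()
    w = io.TextIOWrapper(raw_w, encoding=encoding, newline="\n")
    w.write(text + "\n")
    w.flush()

    # Path 2: print() to an identically configured wrapper.
    raw_p = io.BytesIO()
    p = io.TextIOWrapper(raw_p, encoding=encoding, newline="\n")
    print(text, file=p)
    p.flush()

    return raw_w.getvalue(), raw_p.getvalue()


written, printed = encode_via_write_and_print(u"caf\u00e9")
```

Under these assumptions both paths go through the same encoder, so the
two byte strings are identical.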
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2005-07-14 10:19
Message:
Logged In: YES
user_id=38388
I've thought about this some more: I'm not sure whether it
is such a good idea to try to move code from the codecs into
the standard file object - after all, the codecs already
support all this and do a much better job at handling error
cases and the like.
Furthermore, codecs support both directions: reading and
writing. Your patch only does one way.
The encoding support you currently find in the file object
is only needed for printing Unicode objects - it is not used
anywhere else.
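A minimal sketch of the codecs route described above, using the
standard codecs.getwriter and codecs.getreader factories. The helper
name roundtrip and the sample string are illustrative only, and an
in-memory byte stream stands in for a real file.

```python
import codecs
import io


def roundtrip(text, encoding="utf-8"):
    """Encode text through a codec StreamWriter, then decode it back."""
    raw = io.BytesIO()
    # Writing direction: the StreamWriter encodes Unicode before
    # passing the bytes to the underlying stream.
    writer = codecs.getwriter(encoding)(raw)
    writer.write(text)
    encoded = raw.getvalue()

    # Reading direction: the StreamReader decodes bytes back to Unicode.
    reader = codecs.getreader(encoding)(io.BytesIO(encoded))
    return encoded, reader.read()


encoded, decoded = roundtrip(u"Gr\u00fc\u00dfe")
```

The same pair of wrappers covers both reading and writing, which is
the point made above about the patch handling only one direction.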
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2005-07-14 09:52
Message:
Logged In: YES
user_id=38388
This doesn't quite work (yet): you've broken the support for
writing binary data to the file via file.write(). Encodings
should only be used for non-binary files.
Also note that you are not freeing the memory allocated by
the "et#" parser for s.
Please add some test cases where you open a binary file and
write:
a) binary strings
b) contents of a buffer object
c) Unicode objects
to it.
Case c) should raise an exception. a) and b) should result
in the data being written as-is - without doing any recoding.
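The three cases above could be checked along these lines. This is a
sketch assuming Python 3 semantics, where a binary file rejects text
with TypeError. A memoryview stands in for the 2.x buffer object, and
check_binary_writes is a hypothetical helper name, not part of the
patch's regrtest.

```python
import tempfile


def check_binary_writes():
    """Write bytes, a buffer-like object, and text to a binary file."""
    results = {}
    with tempfile.TemporaryFile() as f:  # default mode is binary ("w+b")
        f.write(b"\x00\xffraw")          # a) binary string, written as-is
        f.write(memoryview(b"buf"))      # b) contents of a buffer-like object
        try:
            f.write(u"caf\u00e9")        # c) Unicode object
            results["unicode_rejected"] = False
        except TypeError:
            results["unicode_rejected"] = True
        f.seek(0)
        results["data"] = f.read()
    return results


results = check_binary_writes()
```

Cases a) and b) should come back byte for byte, and case c) should
leave the file contents untouched.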
----------------------------------------------------------------------
Comment By: Petr Prikryl (prikryl)
Date: 2005-07-12 11:59
Message:
Logged In: YES
user_id=771873
The title and the comments do not say so, but the patch was
created by Reinhold Birkenfeld to solve the bug
[ 1099364 ] raw_input() displays wrong unicode prompt
As the bug was closed and Reinhold says this is his "first
core hackery", I'd like to ask someone else to review whether
the patch is the correct solution to the reported bug. The bug
seems to be very visible (hence serious) in non-English
speaking countries where Unicode promises to solve many
problems. Because of that I ask whether the bug should be
closed before accepting the patch. I am adding this text also
to link this patch to the original problem.
Thanks, Petr
----------------------------------------------------------------------
Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-06-05 18:31
Message:
Logged In: YES
user_id=1188172
Okay, put on #5.
----------------------------------------------------------------------
Comment By: Hye-Shik Chang (perky)
Date: 2005-06-05 17:20
Message:
Logged In: YES
user_id=55188
Yet another thing to fix:
You can't put local variable declarations after
non-declaration statements. Because Python uses C89 as its C
source code standard, you should put all declarations at the
top of functions only.
----------------------------------------------------------------------
Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-06-05 14:17
Message:
Logged In: YES
user_id=1188172
Thanks! Corrected in patch #4.
----------------------------------------------------------------------
Comment By: George Yoshida (quiver)
Date: 2005-06-05 14:09
Message:
Logged In: YES
user_id=671362
Reinhold, libstdtypes.tex needs two fixes.
\versionadded{2.3}
+\versionchanged[The encoding attribute is now writable and
is used
+for encoding Unicode strings given to \method{write()} and
+\method{writelines()}.]{
~~~
First, the versionchanged tag does not have a trailing brace,
which results in a compile error.
Second (really trivial), the versionchanged macro automatically
appends a period at the end of the sentence (see the link [*]),
so you don't need to put it in by hand.
Then the above line would become:
+\method{writelines()}]{2.5}
[*]: http://docs.python.org/doc/inline-markup.html
----------------------------------------------------------------------
Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-06-05 09:56
Message:
Logged In: YES
user_id=1188172
Third revision; adds new documentation and allows Python
code to set the encoding to Py_None.
----------------------------------------------------------------------
Comment By: Hye-Shik Chang (perky)
Date: 2005-06-05 04:26
Message:
Logged In: YES
user_id=55188
The idea looks good to me.
I attached a revised patch that fixes the code style and the
C99-style local variable declarations, and adds a regrtest.
I think some documentation updates will be needed as well.
----------------------------------------------------------------------