[Patches] [ python-Patches-1214889 ] file.encoding support for file.write and file.writelines

SourceForge.net noreply at sourceforge.net
Mon Aug 8 08:49:50 CEST 2005


Patches item #1214889, was opened at 2005-06-04 19:45
Message generated for change (Comment added) made by birkenfeld
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1214889&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core (C code)
Group: Python 2.5
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Reinhold Birkenfeld (birkenfeld)
Assigned to: Nobody/Anonymous (nobody)
Summary: file.encoding support for file.write and file.writelines

Initial Comment:
Here is a patch that lets Unicode strings written to
a file be encoded automatically. It enables Python
code to set file.encoding and obeys this encoding when
writing Unicode strings with write() or writelines().
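
[Illustrative sketch, not the patch itself.] The behaviour described
above can be approximated in pure Python. The EncodingWriter class below
is hypothetical and written for Python 3, where str plays the role of the
Unicode string and bytes the role of the 8-bit string. The fallback to
ASCII when no encoding is set is an assumption that mirrors Python 2's
default encoding; the patch may have handled that case differently.

```python
import io


class EncodingWriter:
    """Hypothetical wrapper; emulates a file with a writable encoding."""

    def __init__(self, raw, encoding=None):
        self.raw = raw            # underlying binary stream
        self.encoding = encoding  # writable attribute, may be None

    def write(self, data):
        if isinstance(data, str):
            # Assumption: fall back to ASCII when no encoding is set.
            data = data.encode(self.encoding or "ascii")
        # Byte strings pass through without any recoding.
        return self.raw.write(data)

    def writelines(self, lines):
        for line in lines:
            self.write(line)


buf = io.BytesIO()
out = EncodingWriter(buf)
out.encoding = "utf-8"              # set from Python code
out.write("gr\u00fc\u00df\n")       # text is encoded as UTF-8
out.writelines(["a\n", b"b\n"])     # text and bytes may be mixed
result = buf.getvalue()
```

Here result holds the UTF-8 bytes of the text followed by the two short
lines, with the byte string written as-is.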

It is my first core hackery, so forgive me one leaked
ref or the other. I hope I got the error handling
right; it is kind of confusing...

(btw: Bug #967986 will be fixed with this)

----------------------------------------------------------------------

>Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-08-08 08:49

Message:
Logged In: YES 
user_id=1188172

Rejecting. This is incomplete and will be addressed more
properly in Py3k.

----------------------------------------------------------------------

Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-07-14 10:34

Message:
Logged In: YES 
user_id=1188172

I agree with you that writing Unicode objects to a binary
file should raise an exception, but with the 'et#' format
string, 8-bit string objects should pass through file.write
unrecoded.

About your second comment: Yes, codecs is one way to do it,
but then I think that the encoding handling for print should
be ripped out, too. After all, that's what many people
complain about: "print unistr" works, while
"sys.stdout.write(unistr)" does not. As the comment below
about bug 1099364 shows, this shows up in various locations.

If this is rejected, file.write() shouldn't accept Unicode
anymore, and print should behave the same way.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2005-07-14 10:19

Message:
Logged In: YES 
user_id=38388

I've thought about this some more: I'm not sure whether it
is such a good idea to try to move code from the codecs into
the standard file object - after all, the codecs already
support all this and do a much better job at handling error
cases and the like.

Furthermore, codecs support both directions: reading and
writing. Your patch only does one way.

The encoding support you currently find in the file object
is only needed for printing Unicode objects - it is not used
anywhere else.
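
[Illustrative sketch, not part of the original comment.] The codecs
route mentioned above might look roughly like this. It uses
codecs.getwriter() and codecs.getreader() from the standard library on
files opened in binary mode; the file name and sample text are made up
for the example, and the code is written for Python 3.

```python
import codecs
import os
import tempfile

text = "caf\u00e9\n"
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Writing: the stream writer encodes text before it reaches the file.
with open(path, "wb") as raw:
    writer = codecs.getwriter("utf-8")(raw)
    writer.write(text)

# Reading: the stream reader decodes the bytes on the way back in.
with open(path, "rb") as raw:
    reader = codecs.getreader("utf-8")(raw)
    round_tripped = reader.read()

# The bytes on disk are the UTF-8 encoding of the text.
with open(path, "rb") as raw:
    raw_bytes = raw.read()
```

The same codec object family covers both directions, which is the point
made above about the patch only handling writes.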

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2005-07-14 09:52

Message:
Logged In: YES 
user_id=38388

This doesn't quite work (yet): you've broken the support for
writing binary data to the file via file.write(). Encodings
should only be used for non-binary files.

Also note that you are not freeing the memory allocated by
the "et#" parser for s.

Please add some test cases where you open a binary file and
write:
a) binary strings 
b) contents of a buffer object
c) Unicode objects 
to it.

Case c) should raise an exception. a) and b) should result
in the data being written as-is - without doing any recoding.
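
[Illustrative sketch, not part of the original comment.] The three
requested cases could be checked along these lines. This is not the
regrtest for the patch: it runs against Python 3's binary file objects,
which already behave as described, and memoryview stands in for the
Python 2 buffer object. The file name is made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "binary.dat")
unicode_rejected = False

with open(path, "wb") as f:
    f.write(b"\x00\xffabc")              # a) binary string
    f.write(memoryview(b"\x01\x02"))     # b) contents of a buffer object
    try:
        f.write("text")                  # c) Unicode object
    except TypeError:
        unicode_rejected = True

# Read the file back to confirm that nothing was recoded.
with open(path, "rb") as f:
    written = f.read()
```

Cases a) and b) end up in the file byte for byte, and case c) is refused
with a TypeError before anything is written.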


----------------------------------------------------------------------

Comment By: Petr Prikryl (prikryl)
Date: 2005-07-12 11:59

Message:
Logged In: YES 
user_id=771873

The title and the comments do not say so, but the patch was 
created by Reinhold Birkenfeld to solve the bug 

[ 1099364 ] raw_input() displays wrong unicode prompt

As the bug was closed and Reinhold says this is his "first 
core hackery", I'd like to ask someone else to review whether 
the patch is the correct solution to the reported bug. The bug 
seems to be very visible (hence serious) in non-English-speaking 
countries, where Unicode promises to solve many 
problems. Because of that, I ask whether the bug should be 
closed before the patch is accepted. I am also adding this text 
to link this patch to the original problem.

Thanks, Petr


----------------------------------------------------------------------

Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-06-05 18:31

Message:
Logged In: YES 
user_id=1188172

Okay, uploaded as patch #5.

----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2005-06-05 17:20

Message:
Logged In: YES 
user_id=55188

Yet another thing to fix:

You can't put local variable declarations after
non-declaration statements.  Because Python uses C89 as its C
source code standard, you should put all declarations at the top
of functions only.


----------------------------------------------------------------------

Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-06-05 14:17

Message:
Logged In: YES 
user_id=1188172

Thanks! Corrected in patch #4.

----------------------------------------------------------------------

Comment By: George Yoshida (quiver)
Date: 2005-06-05 14:09

Message:
Logged In: YES 
user_id=671362

Reinhold, libstdtypes.tex needs two fixes.

 \versionadded{2.3}
+\versionchanged[The encoding attribute is now writable and is used
+for encoding Unicode strings given to \method{write()} and 
+\method{writelines()}.]{
                      ~~~
First, the versionchanged tag does not have a trailing brace, and this 
results in a compile error.

Second (really trivial), the versionchanged macro automatically 
appends a period at the end of the sentence (see the link [*]), 
so you don't need to add one by hand.

Then the above line would become:

+\method{writelines()}]{2.5}

[*]: http://docs.python.org/doc/inline-markup.html


----------------------------------------------------------------------

Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-06-05 09:56

Message:
Logged In: YES 
user_id=1188172

Third revision; adds new documentation and allows Python
code to set the encoding to Py_None.

----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2005-06-05 04:26

Message:
Logged In: YES 
user_id=55188

The idea looks good to me.
I attached a revised patch that fixes the code style and the
C99-style local variable declarations, and adds a regrtest.
I think some documentation update will be needed also.

----------------------------------------------------------------------
