[pypy-commit] pypy unicode-utf8-py3: improve TODO

mattip pypy.commits at gmail.com
Mon Jan 21 10:44:25 EST 2019


Author: Matti Picus <matti.picus at gmail.com>
Branch: unicode-utf8-py3
Changeset: r95689:8830c8411301
Date: 2019-01-21 17:43 +0200
http://bitbucket.org/pypy/pypy/changeset/8830c8411301/

Log:	improve TODO

diff --git a/TODO b/TODO
--- a/TODO
+++ b/TODO
@@ -1,15 +1,23 @@
-* find a better way to run "find" without creating the index storage,
-  if one is not already readily available
+* find a better way to run "find" without creating the index storage, if one
+  is not already readily available
 * write the correct jit_elidable in _get_index_storage
 * improve performance of splitlines
 * fix _pypyjson to not use a wrapped dict when decoding an object
 * make sure we review all the places that call ord(unichr) to check for ValueErrors
 * Find a more elegant way to define MAXUNICODE in rpython/rlib/runicode.py
-* rewrite unicodeobject.unicode_to_decimal_w to only use utf8 encoded bytes
-* revisit why runicode import str_decode_utf_8_impl needed instead of runicode import str_decode_utf_8
-* revisit all places where we do utf8.decode('utf-8'), they should work directly with utf8
+* revisit why runicode import str_decode_utf_8_impl needed instead of runicode
+  import str_decode_utf_8
+* revisit all places where we do utf8.decode('utf-8'), they should work
+  directly with utf8
   - rutf8.utf8_encode_mbcs
   - unicodehelper.fsencode
+  - unicodehelper.unicode_to_decimal_w
 * remove asserts from _WIN32 paths in rlib.rposix.re{name,place}
-* convert all realunicode_w to unicode_w after we flush out all old uses of unicode_w
-* benchmark
+* convert all realunicode_w to unicode_w after we flush out all old uses of
+  unicode_w
+* benchmark more (looks good so far)
+* Review all uses of W_Unicode.text_w, right now it is exactly W_Unicode.utf8_w.
+  It should only return valid utf8 (see 0be26dc39a59 which broke translation on
+  win32 and failed tests on linux64). Then we can use it in places like
+  _socket.interp_func.getaddrinfo instead of space.encode_unicode_object(w_port,
+  'utf-8', 'strict')
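
Illustration for the "work directly with utf8" item in the TODO above: a
minimal sketch, in plain Python 3 rather than RPython, of answering a question
about a UTF-8 byte string without first calling .decode('utf-8'). The helper
name utf8_codepoint_count is hypothetical and is not taken from rutf8 or
unicodehelper; it assumes its input is already well-formed UTF-8.

```python
def utf8_codepoint_count(data: bytes) -> int:
    """Count code points in well-formed UTF-8 without decoding it.

    Hypothetical helper for illustration only; assumes `data` is valid UTF-8.
    """
    # Continuation bytes look like 0b10xxxxxx. Every other byte starts a new
    # code point, so counting the non-continuation bytes gives the length.
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


if __name__ == "__main__":
    sample = "h\u00e9llo \u65e5\u672c\u8a9e"
    encoded = sample.encode("utf-8")
    # The byte-level count agrees with the length of the decoded string.
    print(utf8_codepoint_count(encoded) == len(sample))
```

The same idea (inspect lead and continuation bytes in place) is what makes it
possible to avoid a round trip through a decoded string; whether it applies to
a given call site listed in the TODO would need to be checked case by case.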

