[pypy-dev] Fw: [pypy-svn] r14617 - pypy/dist/pypy/module/unicodedata
Ben.Young at risk.sungard.com
Wed Jul 13 16:35:41 CEST 2005
Hi, I was just looking over the Subversion commit messages and had a
couple of comments to make.
Did you really mean to write [0 * ...] in some of these diffs, rather than
[0] * ...? As written, each extend() call appends only a single 0. Also,
don't lists already over-allocate when they grow, to avoid this kind of
frequent reallocation?
Sorry if you had already spotted them!
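To make both points concrete, here is a minimal sketch in plain Python. The helper names (grow_buggy, grow_fixed, count_reallocations) are made up for illustration and are not part of the commit. The last function relies on sys.getsizeof, so it only says something about CPython's list behaviour, not about RPython or the translated PyPy code.

```python
import sys


def grow_buggy(result, needed):
    # Same shape as the calls in the diff: the multiplication happens
    # inside the list literal, so the literal is [0] and exactly one
    # element is appended, whatever `needed` is.
    result.extend([0 * needed])
    return len(result)


def grow_fixed(result, needed):
    # Multiplying the list itself appends `needed` zeros.
    result.extend([0] * needed)
    return len(result)


def count_reallocations(n):
    # Count how often the reported size of a list changes while
    # appending n items one at a time. Assumption: CPython, where
    # sys.getsizeof reflects the currently allocated capacity.
    items = []
    last_size = sys.getsizeof(items)
    changes = 0
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:
            changes += 1
            last_size = size
    return changes
```

With the buggy form the list can end up shorter than the code expects, so a later indexed store such as result[j + 1] = V may raise IndexError in plain Python. On CPython, count_reallocations(1000) comes out far smaller than 1000, which is why I wondered whether the manual preallocation is needed at all.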
Cheers,
Ben
pypy-svn-bounces at codespeak.net wrote on 13/07/2005 15:05:11:
> Author: ac
> Date: Wed Jul 13 16:05:10 2005
> New Revision: 14617
>
> Modified:
> pypy/dist/pypy/module/unicodedata/function.py
> Log:
> Avoid frequent reallocations when normalizing unicode.
>
> Modified: pypy/dist/pypy/module/unicodedata/function.py
> ==============================================================================
> --- pypy/dist/pypy/module/unicodedata/function.py (original)
> +++ pypy/dist/pypy/module/unicodedata/function.py Wed Jul 13 16:05:10 2005
> @@ -100,6 +100,7 @@
> NCount = (VCount*TCount)
> SCount = (LCount*NCount)
>
> +
> def normalize(space, w_form, w_unistr):
> form = space.str_w(w_form)
> if not space.is_true(space.isinstance(w_unistr, space.w_unicode)):
> @@ -121,7 +122,9 @@
> space.wrap('invalid normalization form'))
>
> strlen = space.int_w(space.len(w_unistr))
> - result = []
> + result = [0] * (strlen + strlen / 10 + 10)
> + j = 0
> + resultlen = len(result)
> # Expand the character
> for i in range(strlen):
> ch = space.int_w(space.ord(space.getitem(w_unistr, space.wrap(i))))
> @@ -132,33 +135,57 @@
> V = VBase + (SIndex % NCount) / TCount;
> T = TBase + SIndex % TCount;
> if T == TBase:
> - result.extend([L, V])
> + if j + 2 > resultlen:
> + result.extend([0 * (j + 2 - resultlen + 10)])
> + resultlen = len(result)
> + result[j] = L
> + result[j + 1] = V
> + j += 2
> else:
> - result.extend([L, V, T])
> + if j + 3 > resultlen:
> + result.extend([0 * (j + 3 - resultlen + 10)])
> + resultlen = len(result)
> + result[j] = L
> + result[j + 1] = V
> + result[j + 2] = T
> + j += 3
> continue
> -
> - result.extend(decomposition.get(ch, [ch]))
> + decomp = decomposition.get(ch)
> + if decomp:
> + decomplen = len(decomp)
> + if j + decomplen > resultlen:
> + result.extend([0 * (j + decomplen - resultlen + 10)])
> + resultlen = len(result)
> + for ch in decomp:
> + result[j] = ch
> + j += 1
> + else:
> + if j + 1 > resultlen:
> + result.extend([0 * (j + 1 - resultlen + 10)])
> + resultlen = len(result)
> + result[j] = ch
> + j += 1
>
> # Sort all combining marks
> - for i in range(len(result)):
> + for i in range(j):
> ch = result[i]
> comb = unicodedb.combining(ch)
> if comb == 0:
> continue
> - for j in range(i, 0, -1):
> - if unicodedb.combining(result[j - 1]) <= comb:
> - result[j] = ch
> + for k in range(i, 0, -1):
> + if unicodedb.combining(result[k - 1]) <= comb:
> + result[k] = ch
> break
>
> - result[j] = result[j - 1]
> + result[k] = result[k - 1]
> else:
> result[0] = ch
>
> if not composed: # If decomposed normalization we are done
> - return space.newunicode(result)
> + return space.newunicode(result[:j])
>
> - if len(result) <= 1:
> - return space.newunicode(result)
> + if j <= 1:
> + return space.newunicode(result[:j])
>
> current = result[0]
> starter_pos = 0
> @@ -166,8 +193,8 @@
> prev_combining = 0
> if unicodedb.combining(current):
> prev_combining = 256
> - for j in range(1, len(result)):
> - next = result[j]
> + for k in range(1, j):
> + next = result[k]
> next_combining = unicodedb.combining(next)
> if next_insert == starter_pos + 1 or prev_combining < next_combining:
> # Combine if not blocked