[Python-Dev] PEP 393 Summer of Code Project
fwierzbicki at gmail.com
Fri Sep 9 21:58:41 CEST 2011
On Fri, Sep 9, 2011 at 10:16 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> I am curious how you index by code point rather than code unit with 16-bit
> code units and how it compares with the method I posted. Is there anything I
> can read? Reply off list if you want.
I'll post on-list until someone complains, just in case there are
interested onlookers :)
There aren't docs, but the code is here:
https://bitbucket.org/jython/jython/src/8a8642e45433/src/org/python/core/PyUnicode.java
Here are (I think) the most relevant bits for random access. Note
that getString() returns the internal representation of the PyUnicode,
which is a java.lang.String:
    @Override
    protected PyObject pyget(int i) {
        if (isBasicPlane()) {
            return Py.makeCharacter(getString().charAt(i), true);
        }
        int k = 0;
        while (i > 0) {
            int W1 = getString().charAt(k);
            if (W1 >= 0xD800 && W1 < 0xDC00) {
                k += 2;
            } else {
                k += 1;
            }
            i--;
        }
        int codepoint = getString().codePointAt(k);
        return Py.makeCharacter(codepoint, true);
    }

    public boolean isBasicPlane() {
        if (plane == Plane.BASIC) {
            return true;
        } else if (plane == Plane.UNKNOWN) {
            plane = (getString().length() == getCodePointCount()) ?
                    Plane.BASIC : Plane.ASTRAL;
        }
        return plane == Plane.BASIC;
    }

    public int getCodePointCount() {
        if (codePointCount >= 0) {
            return codePointCount;
        }
        codePointCount = getString().codePointCount(0, getString().length());
        return codePointCount;
    }
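
For onlookers who want to try the idea without a Jython checkout, here
is a minimal standalone sketch of the same approach on a plain
java.lang.String. The class name CodePointIndex and its method names are
made up for illustration and are not part of Jython. It assumes
well-formed UTF-16 (no unpaired surrogates). The second method uses the
standard String.offsetByCodePoints call, which should give the same
result as the hand-written loop on well-formed input.

```java
// Hypothetical helper class for illustration only; not Jython code.
final class CodePointIndex {
    private CodePointIndex() {}

    // Hand-written scan, mirroring the loop in pyget above: walk the
    // 16-bit code units and skip two of them for each high surrogate.
    // Assumes every high surrogate is followed by a low surrogate.
    static int codePointAtManual(String s, int i) {
        int k = 0;
        while (i > 0) {
            k += Character.isHighSurrogate(s.charAt(k)) ? 2 : 1;
            i--;
        }
        return s.codePointAt(k);
    }

    // Same lookup using the JDK: offsetByCodePoints converts a code
    // point offset into a code unit index.
    static int codePointAtLibrary(String s, int i) {
        int k = s.offsetByCodePoints(0, i);
        return s.codePointAt(k);
    }

    // A string has no surrogate pairs exactly when its code unit count
    // equals its code point count (the check isBasicPlane() caches).
    static boolean isBasicPlane(String s) {
        return s.length() == s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a', then U+1D50A written as a surrogate pair, then 'b'
        String s = "a\uD835\uDD0Ab";
        System.out.println(s.length());                    // 4 code units
        System.out.println(s.codePointCount(0, s.length())); // 3 code points
        System.out.println(Integer.toHexString(codePointAtManual(s, 1)));
        System.out.println(Integer.toHexString(codePointAtLibrary(s, 2)));
        System.out.println(isBasicPlane(s));
    }
}
```

Either way, indexing a string that contains astral characters is a
linear scan from the start, so the fast charAt path only applies when
the whole string is in the basic plane.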
-Frank