[pypy-dev] Speeds of various utf8 operations

Richard Plangger planrichi at gmail.com
Wed Mar 8 13:09:57 EST 2017


Yes ;)… at some point. For now I'm still experimenting with the operations we
think we need.

Cheers,
Richard

On Mar 8, 2017 6:50 PM, "David Edelsohn" <dje.gcc at gmail.com> wrote:

> And POWER VSX and Z VX? ;-)
>
> - David
>
>
> On Wed, Mar 8, 2017 at 12:17 PM, Richard Plangger <planrichi at gmail.com>
> wrote:
> > Hi,
> >
> > as we discussed at the sprint, I have now experimented with an SSE/AVX
> > implementation of 'len(utf8 string)' (this includes a check that it is
> > valid utf8). Since this is related to this mailing list thread I'll just
> > add it here!
> >
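> > For concreteness, here is a minimal C sketch of the counting part of the
> > idea: a code point starts at every byte that is not a continuation byte
> > (10xxxxxx), so you can count those 16 at a time. This is only an
> > illustration with made-up names; it skips the validity check entirely, so
> > it is not the actual implementation I benchmarked below:
> >
> > #include <stddef.h>
> > #include <stdint.h>
> > #include <nmmintrin.h>   /* SSE4.2, for _mm_popcnt_u32 */
> >
> > /* counts code points, does NOT validate the input */
> > static size_t utf8_count_codepoints_sse(const uint8_t *s, size_t len)
> > {
> >     size_t i = 0, count = 0;
> >     const __m128i topbits = _mm_set1_epi8((char)0xC0);
> >     const __m128i contpat = _mm_set1_epi8((char)0x80);
> >     for (; i + 16 <= len; i += 16) {
> >         __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
> >         /* a byte is a continuation byte iff its top two bits are 10 */
> >         __m128i is_cont =
> >             _mm_cmpeq_epi8(_mm_and_si128(v, topbits), contpat);
> >         unsigned mask = (unsigned)_mm_movemask_epi8(is_cont);
> >         count += 16 - _mm_popcnt_u32(mask);
> >     }
> >     for (; i < len; i++)               /* scalar tail */
> >         count += (s[i] & 0xC0) != 0x80;
> >     return count;
> > }
> >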
> > I ran some small measurements on it:
> >
> > Here is some explanation of the names:
> >
> > pypy-seq-.*: sequential implementation in C, nothing fancy, just a
> > baseline
> > pypy-sse4-vec-.*: implementation using SSE4 (128 bit registers)
> > pypy-avx2-vec-.*: implementation using AVX2 (256 bit registers)
> > libunistring-.*: benchmarking the function u8_check in that GNU
> > library; NO length is calculated
> > mystrlenutf8-.*: a third-party length calculation (no validity check)
> > that reads 64-bit words instead of iterating per byte (see [1]; a rough
> > sketch of the idea follows after these name explanations)
> >
> > .*-news-de: HTML of a German website (has quite a lot of 2 byte code
> > points), ~1 MB
> > .*-news-cn: worldjournarl.com -> Mandarin (HTML website with lots of 4
> > byte code points), ~700 KB
> > .*-tipitaka-thai: XML page of a religious text with lots of 3 byte code
> > points, ~4.5 MB (the original 300 KB file copied many times)
> >
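> > The idea behind mystrlenutf8 (sketched here from memory, not the exact
> > code from [1]) is the same counting trick done with plain 64-bit words:
> >
> > #include <stddef.h>
> > #include <stdint.h>
> > #include <string.h>
> >
> > /* length only, no validity check */
> > static size_t utf8_strlen_words(const char *s, size_t len)
> > {
> >     size_t i = 0, cont = 0;
> >     for (; i + 8 <= len; i += 8) {
> >         uint64_t w;
> >         memcpy(&w, s + i, 8);        /* unaligned-safe load */
> >         /* bit7 of a byte survives iff bit7 is set and bit6 is clear,
> >            i.e. the byte is a continuation byte 10xxxxxx */
> >         uint64_t t = w & ~(w << 1) & 0x8080808080808080ULL;
> >         cont += (size_t)__builtin_popcountll(t);  /* GCC/Clang builtin */
> >     }
> >     for (; i < len; i++)
> >         cont += ((unsigned char)s[i] & 0xC0) == 0x80;
> >     return len - cont;
> > }
> >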
> > Why is u8u16 missing? Well, as far as I can tell there is no function in
> > u8u16 that returns the length of a utf8 string and checks its validity
> > at the same time, without rewriting it. u8u16 is really just for
> > transforming utf8 to utf16.
> >
> > The benchmark runs read the content from a file (e.g. .*-news-de, a
> > German HTML news website), then in a loop call the UTF-8
> > get-length-and-check function written in C 10 times and sum up the time
> > for each run (using clock_t clock(void) in C, man 3 clock).
> >
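> > In case it helps to reproduce, the harness is roughly shaped like this
> > (just the shape: the file name is made up, the function under test here
> > is a trivial scalar counter, and the real runs report median/std dev
> > over many samples rather than a single sum):
> >
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <time.h>
> >
> > /* stand-in for the utf-8 get-length-and-check function */
> > static size_t utf8_len(const unsigned char *s, size_t n)
> > {
> >     size_t count = 0;
> >     for (size_t i = 0; i < n; i++)
> >         count += (s[i] & 0xC0) != 0x80;  /* non-continuation bytes */
> >     return count;
> > }
> >
> > int main(int argc, char **argv)
> > {
> >     FILE *f = fopen(argc > 1 ? argv[1] : "news-de.html", "rb");
> >     if (!f) return 1;
> >     fseek(f, 0, SEEK_END);
> >     long n = ftell(f);
> >     fseek(f, 0, SEEK_SET);
> >     unsigned char *buf = malloc((size_t)n);
> >     if (!buf || fread(buf, 1, (size_t)n, f) != (size_t)n) return 1;
> >     fclose(f);
> >
> >     clock_t total = 0;
> >     for (int i = 0; i < 10; i++) {       /* 10 iterations per run */
> >         clock_t start = clock();
> >         volatile size_t len = utf8_len(buf, (size_t)n);
> >         total += clock() - start;
> >         (void)len;
> >     }
> >     printf("%.1f us\n", 1e6 * (double)total / CLOCKS_PER_SEC);
> >     free(buf);
> >     return 0;
> > }
> >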
> > .....................
> > pypy-seq-news-de: Median +- std dev: 76.0 us +- 1.4 us
> > .....................
> > pypy-sse4-vec-news-de: Median +- std dev: 5.16 us +- 0.14 us
> > .....................
> > pypy-avx2-vec-news-de: Median +- std dev: 384 ns +- 11 ns
> > .....................
> > libunistring-news-de: Median +- std dev: 33.0 us +- 0.4 us
> > .....................
> > mystrlenutf8-news-de: Median +- std dev: 9.25 us +- 0.22 us
> > .....................
> > pypy-seq-news-cn: Median +- std dev: 59.8 us +- 1.2 us
> > .....................
> > pypy-sse4-vec-news-cn: Median +- std dev: 7.70 us +- 0.12 us
> > .....................
> > pypy-avx2-vec-news-cn: Median +- std dev: 23.3 ns +- 0.4 ns
> > .....................
> > libunistring-news-cn: Median +- std dev: 30.5 us +- 0.4 us
> > .....................
> > mystrlenutf8-news-cn: Median +- std dev: 6.54 us +- 0.20 us
> > .....................
> > pypy-seq-tipitaka-thai: Median +- std dev: 939 us +- 39 us
> > .....................
> > pypy-sse4-vec-tipitaka-thai: Median +- std dev: 425 us +- 7 us
> > .....................
> > pypy-avx2-vec-tipitaka-thai: Median +- std dev: 19.9 ns +- 0.3 ns
> > .....................
> > libunistring-tipitaka-thai: Median +- std dev: 615 us +- 28 us
> > .....................
> > WARNING: the benchmark seems unstable, the standard deviation is high
> > (stdev/median: 17%)
> > Try to rerun the benchmark with more runs, samples and/or loops
> >
> > mystrlenutf8-tipitaka-thai: Median +- std dev: 45.1 us +- 7.9 us
> >
> > What do you think?
> >
> > I think it would even be a good idea to take a look at AVX512, which
> > gives you a crazy amount of 512 bits (or 64 bytes) in your vector
> > register.
> >
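> > Just to make the 64-bytes-per-iteration point concrete, the same
> > counting loop with AVX-512 would look roughly like this (needs
> > AVX-512BW, again no validity check, and again only a sketch):
> >
> > #include <stddef.h>
> > #include <stdint.h>
> > #include <immintrin.h>
> >
> > static size_t utf8_count_codepoints_avx512(const uint8_t *s, size_t len)
> > {
> >     size_t i = 0, count = 0;
> >     const __m512i topbits = _mm512_set1_epi8((char)0xC0);
> >     const __m512i contpat = _mm512_set1_epi8((char)0x80);
> >     for (; i + 64 <= len; i += 64) {
> >         __m512i v = _mm512_loadu_si512((const void *)(s + i));
> >         /* one mask bit per byte: set iff the byte is 10xxxxxx */
> >         __mmask64 m =
> >             _mm512_cmpeq_epi8_mask(_mm512_and_si512(v, topbits), contpat);
> >         count += 64 - (size_t)_mm_popcnt_u64(m);
> >     }
> >     for (; i < len; i++)
> >         count += (s[i] & 0xC0) != 0x80;
> >     return count;
> > }
> >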
> > The AVX implementation is a bit fishy (compare
> > pypy-avx2-vec-tipitaka-thai and pypy-avx2-vec-news-cn). I need to
> > recheck that; it would not make sense to process 10x 4.5 MB in 20 ns
> > but 10x 700 KB in 23 ns.
> >
> > As soon as I have ironed out the issue I'll start to think about
> > indexing...
> >
> > Cheers,
> > Richard
> >
> > [1] http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html
> >
> > On 03/04/2017 07:01 PM, Maciej Fijalkowski wrote:
> >> Hello everyone
> >>
> >> I've been experimenting a bit with faster utf8 operations (and
> >> conversion that does not do much). I'm writing down the results so
> >> they don't get forgotten, as well as trying to put them in rpython
> >> comments.
> >>
> >> As far as non-SSE algorithms go, for things like splitlines, split
> >> etc. it is important to walk the utf8 string quickly and check
> >> properties of characters.
> >>
> >> So far the finding has been that a lookup table, for example:
> >>
> >> def next_codepoint_pos(code, pos):
> >>     chr1 = ord(code[pos])
> >>     if chr1 < 0x80:
> >>         return pos + 1
> >>     return pos + ord(runicode._utf8_code_length[chr1 - 0x80])
> >>
> >> is significantly slower than the following code (neither does error
> >> checking):
> >>
> >> def next_codepoint_pos(code, pos):
> >>     chr1 = ord(code[pos])
> >>     if chr1 < 0x80:
> >>         return pos + 1
> >>     if chr1 >= 0xC2 and chr1 <= 0xDF:
> >>         return pos + 2
> >>     if chr1 >= 0xE0 and chr1 <= 0xEF:
> >>         return pos + 3
> >>     return pos + 4
> >>
> >> The exact difference depends on how many multi-byte characters there
> >> are and how big the strings are. It's up to 40%, but as a general
> >> rule: the more ASCII characters there are, the smaller the impact, and
> >> the larger the strings are, the more impact the memory/L2/L3 caches
> >> have.
> >>
> >> PS. SSE will be faster still, but we might not want SSE for just
> >> splitlines
> >>
> >> Cheers,
> >> fijal
> >>
>