PEP 393 vs UTF-8 Everywhere

Jussi Piitulainen jussi.piitulainen at helsinki.fi
Sat Jan 21 10:56:35 EST 2017


Steve D'Aprano writes:

[snip]

> You could avoid that error by increasing the offset by the right
> amount:
>
> stuff = text[offset + len("ф".encode('utf-8')):]
>
> which is awful. I believe that's what Go and Julia expect you to do.

Julia provides a function, nextind, to get the next valid index.

let text = "ἐπὶ οἴνοπα πόντον", offset = 1
    while offset <= endof(text)
        print(text[offset], ".")
        offset = nextind(text, offset)
    end
    println()
end # prints: ἐ.π.ὶ. .ο.ἴ.ν.ο.π.α. .π.ό.ν.τ.ο.ν.
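
For comparison, here is a rough sketch of the same idea in Python over a
UTF-8 bytes object. The nextind function below is a hypothetical helper
written for this illustration, not something Python provides, and it
assumes the data is valid UTF-8 and that the index it is given already
sits at the start of a code point.

```python
def nextind(data: bytes, i: int) -> int:
    """Return the index of the first byte of the code point after the one at i.

    Hypothetical helper: assumes data is valid UTF-8 and i is a code point start.
    """
    i += 1
    # UTF-8 continuation bytes all have the bit pattern 10xxxxxx.
    while i < len(data) and (data[i] & 0xC0) == 0x80:
        i += 1
    return i


text = "ἐπὶ οἴνοπα πόντον".encode("utf-8")
offset = 0
pieces = []
while offset < len(text):
    end = nextind(text, offset)
    # Each slice covers exactly one encoded code point.
    pieces.append(text[offset:end].decode("utf-8"))
    offset = end

# Put a "." after every code point, as the Julia loop does.
print("".join(piece + "." for piece in pieces))
```

The loop never needs to know how many bytes a given character takes,
which is the point of having a next-index function in the first place.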


