.title() - annoying mistake

Karen Shaeffer klsshaeffer at icloud.com
Fri Mar 19 13:45:17 EDT 2021



> On Mar 19, 2021, at 9:42 AM, Grant Edwards <grant.b.edwards at gmail.com> wrote:
> 
> On 2021-03-19, Skip Montanaro <skip.montanaro at gmail.com> wrote:
>>> 
>>> That's annoying. You have to roll your own solution!
>>> 
>> 
>> Certainly seems like a known issue:
>> 
>> https://bugs.python.org/issue12737
> 
> While that is an issue with string.title(), I don't see how it's
> related to what the OP is reporting. Issue 12737 is about Unicode
> combining marks.

Hi,
I’ve been frustrated by my experiences processing unstructured multilingual text with Python. I had always assumed this was due to my insufficient experience with Python 3 text processing. I’ve recently begun coding in Go (I also continue to code in Python), and Go handles unstructured, multilingual UTF-8 encoded text in an exceptionally crisp and clear way.

In just a few days of working with text processing in Go, using the book “The Go Programming Language” by Donovan and Kernighan, along with the Go language specification and other free online help, I have acquired a clear understanding of how to work effectively with unstructured, multilingual UTF-8 encoded text (including emojis) and any Unicode code point, even invalid ones.

To see some of these issues firsthand, write a palindrome detector that works with any sequence of UTF-8 encoded code points, including invalid code points. I’m sure it can be done in Python, although I’ve not done it. It’s a trivial exercise in Go.
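
For readers who want to try the exercise, here is one possible Python sketch, not a definitive implementation. It assumes the input arrives as raw bytes, that each undecodable byte should count as one comparable unit, and that comparing code point by code point is what is wanted. The function name is_palindrome and the choice of the "surrogateescape" error handler are illustrative assumptions, not something from this thread.

```python
def is_palindrome(data: bytes) -> bool:
    """Return True if the code points decoded from data read the same both ways."""
    # Assumption: errors="surrogateescape" maps each undecodable byte to one
    # lone surrogate (U+DC80..U+DCFF) instead of raising UnicodeDecodeError.
    text = data.decode("utf-8", errors="surrogateescape")
    # A str is a sequence of code points, so the slice reverses code points,
    # not the underlying bytes.
    return text == text[::-1]


if __name__ == "__main__":
    print(is_palindrome("\u00e9t\u00e9".encode("utf-8")))  # True (multi-byte code points)
    print(is_palindrome(b"\xffaba\xff"))                   # True (invalid bytes at both ends)
    print(is_palindrome(b"\xffab"))                        # False
```

This compares individual code points only, so text containing combining marks (one visible character made of several code points) would need extra handling. It also treats invalid bytes differently from Go, where ranging over a string yields U+FFFD for each invalid byte.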

I’m not bashing Python here. I will continue to code with Python. It’s an exceptional language and community. Just commenting on my experience.

humbly,
Karen


