Unicode and Python - how often do you index strings?

Ian Kelly ian.g.kelly at gmail.com
Thu Jun 5 20:05:34 EDT 2014


On Thu, Jun 5, 2014 at 2:34 PM, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
>> If you want to be really picky about removing exactly one line
>> terminator, then this captures all the relatively modern variations:
>> re.sub('\r?\n$|\n?\r$', '', line, count=1)
>
> or perhaps: re.sub("[^ \S]+$", "", line)

That will remove more than one terminator, plus tabs. Points for
including \f and \v though.
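
To make that concrete, here is a quick sketch of what the character class does. The helper name strip_trailing and the sample strings are mine, purely for illustration; the pattern is the one quoted above, written as a raw string.

```python
import re

# [^ \S] means "whitespace, but not a plain space", so the class covers
# \t, \r, \n, \f and \v, and the + consumes a whole run of them.
TRAILING_WS = re.compile(r"[^ \S]+$")

def strip_trailing(line):
    """Hypothetical helper: apply the quoted pattern to one line."""
    return TRAILING_WS.sub("", line)

# A tab and three terminator characters all disappear in one go.
print(repr(strip_trailing("spam\t\r\n\n")))

# Plain spaces are excluded from the class, so they survive.
print(repr(strip_trailing("spam  \f\v")))
```

The first call strips the tab along with every terminator, which is more than "exactly one line terminator"; the second leaves the two spaces in place.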

I suppose if we want to be absolutely correct, we should follow the
Unicode standard:
re.sub(r'\r?\n$|[\r\v\f\x85\u2028\u2029]$', '', line, count=1)
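
As a rough sketch of how that might be wrapped up, assuming Python 3.3 or later (where re understands \u escapes in patterns): the name chomp is my own invention, and note that re.sub takes the replacement before the string, i.e. re.sub(pattern, repl, string, count).

```python
import re

# One line terminator at the end of the string: CRLF or LF, or a lone
# CR, VT, FF, NEL (U+0085), LS (U+2028) or PS (U+2029).
ONE_TERMINATOR = re.compile(r'\r?\n$|[\r\v\f\x85\u2028\u2029]$')

def chomp(line):
    """Hypothetical helper: remove at most one trailing line terminator."""
    return ONE_TERMINATOR.sub('', line, count=1)

for sample in ("spam\r\n", "spam\n", "spam\r", "spam\u2028", "spam\n\n", "spam\t"):
    print(repr(sample), "->", repr(chomp(sample)))
```

One caveat: without re.MULTILINE, $ also matches just before a newline that ends the string, so a line ending in "\x85\n" would lose the \x85 rather than the \n. Anchoring with \Z instead of $ should avoid that, if it matters.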


