Is there a function to remove escape characters from a string ?

John Machin sjmachin at lexicon.net
Thu Dec 25 20:03:01 EST 2008


On Dec 26, 8:53 am, Stef Mientki <stef.mien... at gmail.com> wrote:
> Steven D'Aprano wrote:
> > On Thu, 25 Dec 2008 11:00:18 +0100, Stef Mientki wrote:
>
> >> hello,
>
> >> Is there a function to remove escape characters from a string ?
> >> (preferable all escape characters except "\n").
>
> > Can you explain what you mean? I can think of at least four alternatives:
>
> I have the following kind of strings,
> the funny "þ" is ASCII character 254, used as a separator character

ASCII ends at 127. Just refer to it as chr(254).
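
A quick check of that point, as a sketch that assumes Python 3 (where chr() returns a Unicode character); the name sep is only for illustration:

```python
sep = chr(254)            # the separator character from the data
print(repr(sep))          # 'þ'
print(ord(sep))           # 254
print(ord(sep) < 128)     # False: outside the 7-bit ASCII range

# Encoding to ASCII fails, while Latin-1 maps it to the single byte 0xFE.
try:
    sep.encode("ascii")
except UnicodeEncodeError:
    print("not ASCII")
print(sep.encode("latin-1"))  # b'\xfe'
```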

>
> [FSM]
> Counts = "1þ11þ16"     ==>   1,11,16
> Init1 = "1þ\BCtrl"     ==>    1,Ctrl
> State5 = "8þ\BJUMP_COMPL\b\n>PCWrite = 1\n>PCSource = 10"
>          ==> 8, JUMP_COMPL\n>PCWrite = 1\n>PCSource = 10

After making those substitutions, what are you going to do with it?
Split it up into fields using the csv module or stuff.split(",") or
some other DIY method? Is there a possibility that whoever "designed"
that data format used chr(254) as a separator because the data fields
contained "," sometimes and so "," could not be used as a separator?
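
To illustrate the concern, here is a minimal sketch, assuming Python 3. The helper name split_fields and the sample value with an embedded comma are made up for illustration and do not come from the real data:

```python
SEP = chr(254)  # the separator used in the data file


def split_fields(value):
    """Split a value into fields directly on the original separator."""
    return value.split(SEP)


counts = "1" + SEP + "11" + SEP + "16"
print(split_fields(counts))   # ['1', '11', '16']

# Hypothetical field that itself contains a comma.
tricky = "8" + SEP + "a, b"
print(split_fields(tricky))                 # ['8', 'a, b']
print(tricky.replace(SEP, ",").split(","))  # ['8', 'a', ' b']
```

Splitting on chr(254) directly keeps such a field intact; converting to commas first and then splitting does not.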

> Seeing and testing all your answers, with great solutions that I've
> never seen before,

As far as str methods and built-ins that work on str objects are
concerned, there is no corpus of secret knowledge known only to a
cabal of wizards; it's all in the manual, and you don't need special
magical spectacles to see it :-)

> knowing nothing of escape sequences (I'm a windows guy ;-)

Why do you think that whether or not you are a "windows guy" is
relevant to knowing anything about escape sequences?

> I now see that the characters I need to remove, like  \B  and \b  are
> not "official" escape sequences.

\b *is* an "official" escape sequence, just like \n; see below:

| >>> x = '\b'; print len(x), repr(x)
| 1 '\x08'
| >>> x = r'\b'; print len(x), repr(x)
| 2 '\\b'
| >>> x = '\B'; print len(x), repr(x)
| 2 '\\B'
| >>> x = r'\B'; print len(x), repr(x)
| 2 '\\B'

> So in this case the best (easiest to understand) method is a few replace
> statements:
> s = s.replace ( '\b', '' ).replace( '\B',  '' )

It's probable that \b and \B in your data are both TWO-byte sequences
(a backslash followed by a letter), in which case you should use r'\b'
so that it matches those two bytes rather than the single backspace
character '\x08'; use r'\B' as well, for consistency.
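
A small sketch of the difference, assuming Python 3 and assuming the data really hold a literal backslash followed by a letter (as they would if read from a file); the names raw and cleaned are only for illustration:

```python
# Raw string literal: every backslash below is a real character in the data.
raw = r"8þ\BJUMP_COMPL\b\n>PCWrite = 1\n>PCSource = 10"

# '\b' is the single backspace character chr(8); it does not occur in raw,
# so this replace changes nothing.
print(raw.replace("\b", "") == raw)   # True

# r'\b' and r'\B' are two characters each and match the markers in the data.
cleaned = raw.replace(r"\b", "").replace(r"\B", "")
print(cleaned)   # 8þJUMP_COMPL\n>PCWrite = 1\n>PCSource = 10
```

The two-character \n sequences are left untouched, since only r'\b' and r'\B' are replaced.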



