[Tutor] FW: wierd replace problem

Timo timomlists at gmail.com
Tue Sep 14 09:32:38 CEST 2010


On 14-09-10 09:28, Roelof Wobben wrote:
>
>
> Hello,
>
> Strip ('"'') does not work.
> Still this message : SyntaxError: EOL while scanning string literal
>    
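Look at it again and count the quotes you are using. In '"'' the second
single quote closes the string, so the third one opens a new string that
never ends.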

For example, this won't work either:
 >>> s = 'foo'bar'

You need to escape the quotes with a backslash, like:
 >>> s = 'foo\'bar'
 >>> print s
foo'bar
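
Escaping is not the only option. As a small sketch (the sample word below is made up for illustration), you can also delimit the string with the other kind of quote, which is one way to write a strip() argument that holds both quote characters. The print() form with a single argument runs on Python 2 and 3 alike:

```python
# Three equivalent ways to write a string that contains a single quote.
a = 'foo\'bar'      # escape the inner quote with a backslash
b = "foo'bar"       # use double quotes as the delimiter instead
c = '''foo'bar'''   # or use triple quotes

# A strip() argument holding both quote characters, written two ways.
quotes = '\'"'      # single-quoted literal, inner ' escaped
same = "'\""        # double-quoted literal, inner " escaped

word = '"\'hello\'"'            # hypothetical sample: "'hello'"
stripped = word.strip(quotes)   # removes quotes at both ends only
print(stripped)
```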


Cheers,
Timo

> So I think I will go with Bob's suggestion and develop a program which deletes all the ' and " by scanning the text character by character.
>
>   Roelof
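
A minimal sketch of that character-by-character idea, assuming only the two quote characters need to go. The function name and the sample sentence are hypothetical, not taken from the thread:

```python
def remove_quotes(text):
    """Return text with every ' and " removed (hypothetical helper)."""
    kept = []
    for ch in text:
        # Keep every character that is not one of the two quote marks.
        if ch not in ('"', "'"):
            kept.append(ch)
    return "".join(kept)


sample = 'He said "don\'t"'
print(remove_quotes(sample))
```

Chaining two str.replace() calls, one per quote character, should give the same result without an explicit loop.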
>
>
>    
>> ----------------------------------------
>>      
>>> From: steve at pearwood.info
>>> To: tutor at python.org
>>> Date: Tue, 14 Sep 2010 09:39:29 +1000
>>> Subject: Re: [Tutor] wierd replace problem
>>>
>>> On Tue, 14 Sep 2010 09:08:24 am Joel Goldstick wrote:
>>>        
>>>> On Mon, Sep 13, 2010 at 6:41 PM, Steven D'Aprano
>>>>          
>>> wrote:
>>>        
>>>>> On Tue, 14 Sep 2010 04:18:36 am Joel Goldstick wrote:
>>>>>            
>>>>>> How about using str.split() to put words in a list, then run
>>>>>> strip() over each word with the required characters to be removed
>>>>>> ('`")
>>>>>>              
>>>>> Doesn't work. strip() only removes characters at the beginning and
>>>>> end of the word, not in the middle:
>>>>>            
>>>> Exactly, you first split the words into a list of words, then strip
>>>> each word
>>>>          
>>> Of course, if you don't want to remove ALL punctuation marks, but only
>>> those at the beginning and end of words, then strip() is a reasonable
>>> approach. But if the aim is to strip out all punctuation, no matter
>>> where, then it can't work.
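
The difference between the two approaches can be seen in a short sketch. The sample sentence and the set of punctuation characters are assumptions chosen for illustration:

```python
text = '"can\'t," he said.'
punct = '\'",.'   # characters to remove: ' " , .

# Approach 1: split into words, then strip each word.
# Only the ends of each word are cleaned, so the inner apostrophe stays.
stripped = [w.strip(punct) for w in text.split()]

# Approach 2: delete the characters wherever they occur, then split.
deleted = "".join(ch for ch in text if ch not in punct).split()

print(stripped)
print(deleted)
```

The first list keeps "can't" intact while the second collapses it to "cant". Note that strip() would also reduce a dialect word such as 'e to plain e.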
>>>
>>> Since the aim is to count words, a better approach might be a hybrid --
>>> remove all punctuation marks like commas, fullstops, etc. no matter
>>> where they appear, keep internal apostrophes so that words like "can't"
>>> are different from "cant", but remove external ones. Although that
>>> loses information in the case of (e.g.) dialect speech:
>>>
>>> "'e said 'e were going to kill the lady, Mister Holmes!"
>>> cried the lad excitedly.
>>>
>>> You probably want to count the word as 'e rather than just e.
>>>
>>> And hyphenation is tricky too. A lone hyphen - like these - should be
>>> deleted. But double-dashes--like these--are word separators, so need to
>>> be replaced by a space. Otherwise, single hyphens should be kept. If a
>>> word begins or ends with a hyphen, it should be joined up with the
>>> previous or next word. But then it gets more complicated, because you
>>> don't know whether to keep the hyphen after joining or not.
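
The first three hyphen rules might be sketched as follows. The function name is hypothetical, and words that begin or end with a hyphen are deliberately left alone here, since joining them needs the extra decision described above:

```python
def normalize_hyphens(text):
    """Split text into words, applying simple hyphen rules (a sketch)."""
    # Double dashes separate words, so turn them into spaces first.
    text = text.replace("--", " ")
    # Drop tokens that are just a lone hyphen; hyphens inside words survive.
    return [tok for tok in text.split() if tok != "-"]


words = normalize_hyphens("a lone hyphen - like these - and double-dashes--like these--too")
print(words)
```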
>>>
>>> E.g. if the line ends with:
>>>
>>> blah blah blah blah some-
>>> thing blah blah blah.
>>>
>>> should the joined up word become the compound word "some-thing" or the
>>> regular word "something"? In general, there's no way to be sure,
>>> although you can make a good guess by looking it up in a dictionary and
>>> assuming that regular words should be preferred to compound words. But
>>> that will fail if the word has changed over time, such as "cooperate",
>>> which until very recently used to be written "co-operate", and before
>>> that as "coöperate".
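
The dictionary guess could look something like this. It is only a sketch: KNOWN_WORDS is a tiny hypothetical stand-in for a real word list, and join_hyphenated is an invented name:

```python
# Hypothetical stand-in for a real dictionary of regular words.
KNOWN_WORDS = {"something", "cooperate"}


def join_hyphenated(first, second, known=KNOWN_WORDS):
    """Join a line-ending fragment like 'some-' with the next line's 'thing'.

    Prefer the regular word if the dictionary knows it; otherwise
    keep the hyphen and treat the result as a compound word.
    """
    plain = first[:-1] + second   # drop the trailing hyphen
    if plain.lower() in known:
        return plain
    return first + second


print(join_hyphenated("some-", "thing"))
print(join_hyphenated("well-", "known"))
```

With this word list the first call yields "something" and the second keeps "well-known". As the text says, the guess depends entirely on the dictionary, so an older text that really meant "co-operate" would be joined wrongly.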
>>>
>>>
>>>
>>> --
>>> Steven D'Aprano
>>> _______________________________________________
>>> Tutor maillist - Tutor at python.org
>>> To unsubscribe or change subscription options:
>>> http://mail.python.org/mailman/listinfo/tutor
>>>        


