aligning text with space-normalized text

John Machin sjmachin at lexicon.net
Thu Jun 30 07:49:20 EDT 2005


Steven Bethard wrote:
> John Machin wrote:
> 
>> If "work" is meant to detect *all* possibilities of 'chunks' not 
>> having been derived from 'text' in the described manner, then it 
>> doesn't work -- all information about the positions of the whitespace 
>> is thrown away by your code.
>>
>> For example, text = 'foo bar', chunks = ['foobar']
> 
> 
> This doesn't match the (admittedly vague) spec

That is *exactly* my point -- it is not valid input, and you are not 
reporting all cases of invalid input: you raise an exception when the 
non-whitespace characters cannot match, but none when the whitespace 
cannot match.


> which said that chunks
> are created "as if by ' '.join(chunk.split())".  For the text:
>     'foo bar'
> the possible chunk lists should be something like:
>     ['foo bar']
>     ['foo', 'bar']
> If it helps, you can think of chunks as lists of words, where the words 
> have been ' '.join()ed.

If it helps, you can re-read my message.
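
To make the two failure cases concrete, here is a minimal sketch of a
check that rejects both kinds of bad input. It is not the code under
discussion in this thread. The function name align is hypothetical, and
the sketch assumes the "lists of words" reading of the spec: each chunk
is a run of consecutive words from text.split(), joined by single
spaces, and empty chunks are not allowed.

```python
def align(text, chunks):
    """Return a (start, end) offset pair in text for each chunk.

    Hypothetical helper; assumes every chunk is a run of consecutive
    words of text.split() joined by single spaces. Raises ValueError
    for anything else, including whitespace mismatches.
    """
    chunk_words = [chunk.split() for chunk in chunks]

    # Each chunk must already be space-normalized (and non-empty,
    # which is an assumption of this sketch).
    for chunk, words in zip(chunks, chunk_words):
        if not words or chunk != ' '.join(words):
            raise ValueError('chunk %r is not space-normalized' % chunk)

    # Comparing word lists catches both kinds of mismatch: wrong
    # non-whitespace characters and wrong whitespace positions.
    flat = [word for words in chunk_words for word in words]
    if flat != text.split():
        raise ValueError('chunks do not match the words of text')

    # The input is now known to be valid, so each word is found at the
    # next non-whitespace position at or after pos.
    spans = []
    pos = 0
    for words in chunk_words:
        start = None
        for word in words:
            pos = text.index(word, pos)
            if start is None:
                start = pos
            pos += len(word)
        spans.append((start, pos))
    return spans
```

With this check, align('foo bar', ['foo', 'bar']) gives offsets for
both chunks, while align('foo bar', ['foobar']) and
align('foobar', ['foo bar']) both raise ValueError, because the word
lists differ even though the non-whitespace characters agree.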

> 
> STeVe


