Cutting slices

Sun Mar 5 19:28:01 EST 2023

On 06/03/2023 11.59, aapost wrote:
> On 3/5/23 17:43, Stefan Ram wrote:
>>    The following behaviour of Python strikes me as being a bit
>>    "irregular". A user tries to chop off sections from a string,
>>    but does not use "split" because the separator might become
>>    more complicated so that a regular expression will be required
>>    to find it. But for now, let's use a simple "find":
>> |>>> s = 'alpha.beta.gamma'
>> |>>> s[ 0: s.find( '.', 0 )]
>> |'alpha'
>> |>>> s[ 6: s.find( '.', 6 )]
>> |'beta'
>> |>>> s[ 11: s.find( '.', 11 )]
>> |'gamm'
>> |>>>
>>
>>    . The user always inserted the position of the previous find plus
>>    one to start the next "find", so he uses "0", "6", and "11".
>>    But the "a" is missing from the final "gamma"!
>>    And it seems that there is no numerical value at all that
>>    one can use for "n" in "string[ 0: n ]" to get the whole
>>    string, isn't it?
>>
>>
> 
> I would agree with 1st part of the comment.
> 
> Just noting that string[11:], string[11:None], as well as string[11:16] 
> work ... as well as string[11:324242]... lol..

To expand on the above, answering the OP's second question: the numeric 
value is len( s ).
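
As for the missing "a": find() returns -1 when the separator is not
found, and -1 as the end of a slice means "stop before the last
character". A minimal sketch, using the OP's own string (the names end,
truncated and whole are mine, for illustration only):

```python
s = 'alpha.beta.gamma'

# find() signals "not found" with -1 rather than raising an exception
end = s.find('.', 11)

# so the OP's last slice is really s[11:-1], which drops the final character
truncated = s[11:end]

# len(s) as the end index gives the whole remainder
whole = s[11:len(s)]

print(end, truncated, whole)
```

Likewise s[ 0:len( s ) ] gives back the whole string.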

If the repetitive process is required, try a loop around something like:

 >>> start_index = 11	#to cure the issue-raised

 >>> try:
...     s[ start_index:s.index( '.', start_index ) ]
... except ValueError:
...     s[ start_index:len( s ) ]
...
'gamma'
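
Wrapped in an actual loop, the same idea might look like the sketch
below. The function name cut_slices() is hypothetical, and it assumes a
fixed, non-empty separator string; a more complicated separator would
need a different search inside the try-block.

```python
def cut_slices(s, sep='.'):
    """Yield the sections of s found between occurrences of sep."""
    if not sep:
        # an empty separator would never advance start_index
        raise ValueError('empty separator')
    start_index = 0
    while True:
        try:
            end_index = s.index(sep, start_index)
        except ValueError:
            # no further separator: take everything up to the end
            yield s[start_index:len(s)]
            return
        yield s[start_index:end_index]
        # resume the search just past the separator found
        start_index = end_index + len(sep)


print(list(cut_slices('alpha.beta.gamma')))
```

Using index() rather than find() means the "not found" case arrives as
an exception, instead of a -1 which quietly works as a slice index.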

However, if the objective is to split, then use the function built for 
the purpose:

 >>> s.split( "." )
['alpha', 'beta', 'gamma']

(yes, the OP says this won't work - but doesn't show why)
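
If the separator really does grow into a regular expression, as the OP
anticipates, the re module has a split() of its own. A sketch only: the
pattern and the sample string are made up, because the OP's actual
separators are not shown.

```python
import re

s = 'alpha.beta, gamma'

# hypothetical pattern: a dot, or a comma followed by optional white-space
sections = re.split(r'\.|,\s*', s)

print(sections)
```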

If life must be more complicated, but the next separator can be
predicted, then split()'s close relative is partition().
NB both split() and partition() can be used on the sub-strings produced
by an earlier split() or ... ie there may be no reason to work strictly
from left to right.
(can't really help further with this, because the information above only
shows multiple "." characters, and not how multiple separators might be
interpreted)
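
By way of illustration, using the "." from the OP's example (all the
variable names are mine), partition() and its right-hand sibling
rpartition() behave like this:

```python
s = 'alpha.beta.gamma'

# partition() splits once, at the first separator, and returns a 3-tuple
head, sep, rest = s.partition('.')

# rpartition() does the same, working from the right-hand end
front, _, last = s.rpartition('.')

# if the separator is absent, the whole string comes back, followed by
# two empty strings, so there is no -1 to trip over
missing = 'gamma'.partition('.')

print(head, rest)
print(front, last)
print(missing)
```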

A straight-line approach might be to use maketrans() and translate() to 
convert all the separators to a single character, eg white-space, which 
can then be split using any of the previously-mentioned methods.
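
A sketch of that idea follows. The set of separators is assumed, purely
for illustration, since the OP only showed ".". Note that it also
assumes the data itself contains no white-space, otherwise the final
split() would cut there too.

```python
# assumed separators: not from the OP, who only showed "."
separators = '.;:'

# map every separator character to a single space
table = str.maketrans(separators, ' ' * len(separators))

s = 'alpha.beta;gamma:delta'
words = s.translate(table).split()

print(words)
```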

If the problem is sufficiently complicated and the OP is prepared to go
whole-hog, then the PSL's tokenize module (which is aimed at Python
source-code) or various parser libraries may be worth consideration...

-- 
Regards,
=dn