A tough one: split on word length?

Laurent Pointal laurent.pointal at free.fr
Mon May 16 13:13:35 EDT 2016


DFS wrote:

> Have:
> '584323 Fri 13 May 2016 17:37:01 -0000 (UTC) 584324 Fri 13 May 2016
> 13:44:40 -0400 584325 13 May 2016 17:45:25 GMT 584326 Fri 13 May 2016
> 13:47:28 -0400'
> 
> Want:
> [('584323', 'Fri 13 May 2016 17:37:01 -0000 (UTC)'),
>    ('584324', 'Fri 13 May 2016 13:44:40 -0400'),
>    ('584325', '13 May 2016 17:45:25 GMT'),
>    ('584326', 'Fri 13 May 2016 13:47:28 -0400')]
> 
> 
> Or maybe split() on space, then run through and add words of 6+ numbers
> to the list, then recombine everything until you hit the next group of
> 6+ numbers, and so on?
> 
> The data is guaranteed to contain those 6+ groups of numbers.
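
The split-and-recombine idea quoted above can be sketched roughly as follows. This is only an illustration, not tested beyond the sample: the function name `group_by_id` and the `min_digits` parameter are made up here, and it assumes every group id is a whitespace-delimited run of digits at least six long.

```python
def group_by_id(text, min_digits=6):
    """Hypothetical helper: start a new group at every all-digit word of
    at least min_digits characters, and collect the words that follow it."""
    groups = []
    for word in text.split():
        if word.isdigit() and len(word) >= min_digits:
            # A long run of digits opens a new (id, words) group.
            groups.append((word, []))
        elif groups:
            # Any other word belongs to the most recent group.
            groups[-1][1].append(word)
    # Recombine the collected words of each group into one string.
    return [(ident, ' '.join(words)) for ident, words in groups]


have = ('584323 Fri 13 May 2016 17:37:01 -0000 (UTC) 584324 Fri 13 May 2016 '
        '13:44:40 -0400 584325 13 May 2016 17:45:25 GMT 584326 Fri 13 May 2016 '
        '13:47:28 -0400')
print(group_by_id(have))
```

Short numbers such as `13` or `2016` stay inside the date text because they fail the length check, and `-0000` is not all digits, so it is never mistaken for an id.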

A test with a regexp under Python 3:

>>> import re
>>> s = ('584323 Fri 13 May 2016 17:37:01 -0000 (UTC) 584324 Fri 13 May 2016 '
...      '13:44:40 -0400 584325 13 May 2016 17:45:25 GMT 584326 Fri 13 May 2016 '
...      '13:47:28 -0400')
>>> re.split(r"(\d{6})(.*?)", s)
['', '584323', '', ' Fri 13 May 2016 17:37:01 -0000 (UTC) ',
 '584324', '', ' Fri 13 May 2016 13:44:40 -0400 ',
 '584325', '', ' 13 May 2016 17:45:25 GMT ',
 '584326', '', ' Fri 13 May 2016 13:47:28 -0400']


Discard the empty items, strip the leading and trailing whitespace from 
each remaining string, pair each number with the text that follows it, and 
that's done.
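
The clean-up step described above could look something like this. It is a minimal sketch rather than a polished solution: the helper name `pair_up` is invented for illustration, and it assumes the ids are exactly six digits long, as in the sample data.

```python
import re


def pair_up(text):
    """Hypothetical helper: split on six-digit ids, clean up, and pair."""
    parts = re.split(r"(\d{6})(.*?)", text)
    # Drop the empty strings produced by the split and strip the rest.
    items = [p.strip() for p in parts if p.strip()]
    # What remains alternates id, date text, id, date text, ...
    return list(zip(items[0::2], items[1::2]))


s = ('584323 Fri 13 May 2016 17:37:01 -0000 (UTC) 584324 Fri 13 May 2016 '
     '13:44:40 -0400 584325 13 May 2016 17:45:25 GMT 584326 Fri 13 May 2016 '
     '13:47:28 -0400')
for ident, stamp in pair_up(s):
    print(ident, '|', stamp)
```

The pairing relies on the text starting with an id, so that ids land at the even positions and date strings at the odd ones.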

A+
Laurent.
Note: re experts will provide a cleaner solution.


