How to find the best solution?

Tim Chase python.list at tim.thechases.com
Tue Mar 23 13:31:09 EDT 2010


Johny wrote:
> I have a text and would like to split it into smaller parts,
> say of 100 characters each. But if the 100th character is not a
> blank (but part of a word), the part must be shorter than 100
> characters. That means a word itself cannot be split.
> These smaller parts must contain only whole (not split) words.
> I was thinking about a regex but do not know how to find the correct
> regular expression.

While I suspect you can come close with a regular expression:

   import re, random
   size = 100
   # match up to `size` characters, ending on a word boundary
   r = re.compile(r'.{1,%i}\b' % size)
   # generate a random text string with a mix of word-lengths
   words = ['a', 'an', 'the', 'four', 'fives', 'sixsix']
   data = ' '.join(random.choice(words) for _ in range(200))
   # for each chunk of at most 100 characters (fewer when the
   # 100th character falls mid-word), do something
   for bit in r.finditer(data):
       chunk = bit.group(0)
       # parenthesized so it runs under both Python 2 and Python 3
       print("%i: [%s]" % (len(chunk), chunk))

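it may have an EOF fencepost error, so you might have to clean up
the last item: as far as I can tell, text that ends in punctuation
or whitespace is not followed by a word boundary, so those trailing
characters would be left out of the final chunk.  Chunks can also
begin or end with a space, so you may want to .strip() them.  My
simple test seemed to show it worked without cleanup, though.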
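
One way to check whether cleanup is needed is to track how much of
the input the matches actually cover.  This is only a sketch: the
helper name chunks_and_leftover is made up for illustration, and it
reuses the same pattern as above.

```python
import re

size = 100
# same pattern as above: up to `size` characters ending on a word boundary
r = re.compile(r'.{1,%i}\b' % size)

def chunks_and_leftover(data):
    """Return (chunks, leftover); leftover holds text no match covered."""
    chunks = []
    leftover = []
    pos = 0
    for m in r.finditer(data):
        if m.start() > pos:
            # text skipped between the previous match and this one
            leftover.append(data[pos:m.start()])
        chunks.append(m.group(0))
        pos = m.end()
    if pos < len(data):
        # text after the final match, e.g. trailing punctuation
        leftover.append(data[pos:])
    return chunks, leftover

chunks, leftover = chunks_and_leftover("the four fives.")
print(chunks)
print(leftover)
```

With input that ends in a period, the period ends up in leftover
rather than in the last chunk.  If leftover comes back empty, no
cleanup is needed for that input.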

-tkc





