Regular expression question

alex23 wuwei23 at gmail.com
Sun Nov 1 22:23:21 EST 2009


On Oct 31, 12:48 pm, elca <high... at gmail.com> wrote:
> Hello,
> I have a text document to parse.
> The sample text is shown below.
> From this document, I would like to extract the following:
> SUBJECT = 'NETHERLANDS MUSIC EPA'
> CONTENT = 'Michael Buble performs in Amsterdam Canadian singer Michael Buble
> performs during a concert in Amsterdam, The Netherlands, 30 October 2009.
> Buble released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK '
>
> If anyone can help me, I would much appreciate it.
>
> "
> NETHERLANDS MUSIC EPA | 36 before
> Michael Buble performs in Amsterdam Canadian singer Michael Buble performs
> during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble
> released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK
> "

You really don't need regular expressions for this:

>>> text = '''
... NETHERLANDS MUSIC EPA | 36 before
... Michael Buble performs in Amsterdam Canadian singer Michael Buble performs
... during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble
... released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK
... '''
>>> text = text.strip()  # drop the leading and trailing newlines
>>> lines = text.splitlines()  # handles both \n and \r\n line endings
>>> subject = lines[0].split(' | ')[0]  # everything before the first ' | '
>>> content = ' '.join(lines[1:])  # rejoin the remaining lines with spaces
>>> subject
'NETHERLANDS MUSIC EPA'
>>> content
"Michael Buble performs in Amsterdam Canadian singer Michael Buble performs during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK"
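
If you have many items like this, the same steps can be wrapped in a small function. This is only a sketch: the name parse_item is made up for illustration, and it assumes every item has a first line of the form 'SUBJECT | something' followed by one or more content lines, which is all the sample shows.

```python
def parse_item(text):
    """Split a news item into (subject, content).

    Assumes the first non-blank line looks like 'SUBJECT | something'
    and that every following line belongs to the content.
    """
    lines = text.strip().splitlines()
    if not lines:
        raise ValueError("empty item")
    # partition() returns the whole line as the subject if ' | ' is absent
    subject, _, _ = lines[0].partition(' | ')
    # strip each line so stray indentation does not leak into the result
    content = ' '.join(line.strip() for line in lines[1:])
    return subject.strip(), content


sample = """
NETHERLANDS MUSIC EPA | 36 before
Michael Buble performs in Amsterdam Canadian singer Michael Buble performs
during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble
released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK
"""

subject, content = parse_item(sample)
print(subject)
print(content)
```

Using splitlines() rather than splitting on a fixed separator means the function does not care whether the file came from Windows or Unix.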



More information about the Python-list mailing list