splitting file/content into lines based on regex termination

MRAB python at mrabarnett.plus.com
Thu Nov 7 13:13:14 EST 2013


On 07/11/2013 17:45, bruce wrote:
> update...
>
>    dat=re.compile("<br>#(\d+) / (\d+)#(\d+)#").split(s)
>
> almost works..
>
> except i get
> m = 10116#000#C S#S#100##001##DAY#Fund of Computing#Barrett,
> William#3#MWF<br>#08:00am<br>#08:50am<br>#3718 HBLL
> m = 45
> m = 58
> m = 0
> m = 10116#000#C S#S#100##002##DAY#Fund of Computing#Barrett,
> William#3#MWF<br>#09:00am<br>#09:50am<br>#3718 HBLL
> m = 9
> m = 58
> m = 0
>
> and what i want is:
> m = 10116#000#C S#S#100##001##DAY#Fund of Computing#Barrett,
> William#3#MWF<br>#08:00am<br>#08:50am<br>#3718 HBLL 45 / 58,0
> m = 10116#000#C S#S#100##002##DAY#Fund of Computing#Barrett,
> William#3#MWF<br>#09:00am<br>#09:50am<br>#3718 HBLL 9 / 58,0
>
>
> so i'd have the results of the "compile/regex process" to be added to
> the split lines
>
> thoughts/comments??
>
> thanks
>
The split method also returns what's matched in any capture groups,
i.e. "(\d+)". Try omitting the parentheses:

     dat = re.compile(r"<br>#\d+ / \d+#\d+#").split(s)
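
To see the difference, here is a small sketch. The string s is a shortened, made-up sample in the same shape as the test data, not the real file:

```python
import re

# Hypothetical, shortened sample: two records, each ended by a
# "<br>#N / N#N#" delimiter like the one in the test data.
s = "10116#001#3718 HBLL <br>#45 / 58#0#10116#002#3718 HBLL <br>#9 / 58#0#"

# With capture groups, split() interleaves the captured text with the pieces.
with_groups = re.split(r"<br>#(\d+) / (\d+)#(\d+)#", s)
print(with_groups)
# ['10116#001#3718 HBLL ', '45', '58', '0', '10116#002#3718 HBLL ', '9', '58', '0', '']

# Without capture groups, only the pieces between delimiters are returned.
without_groups = re.split(r"<br>#\d+ / \d+#\d+#", s)
print(without_groups)
# ['10116#001#3718 HBLL ', '10116#002#3718 HBLL ', '']
```

The final '' is there because the sample ends with a delimiter, so there is an empty piece after the last match.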

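You should also be using raw string literals as above (r"..."). It
doesn't matter in this instance, because "\d" isn't a string escape,
but it would with something like "\b", which a normal string literal
turns into a backspace character.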
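
If the numbers should stay attached to each line, as in the "what i want" example, one possible approach is to keep the capture groups and stitch the pieces back together. This is only a sketch: split_records and the sample string are hypothetical names and data for illustration, and the "45 / 58,0" layout is copied from the wanted output:

```python
import re

DELIM = re.compile(r"<br>#(\d+) / (\d+)#(\d+)#")

def split_records(s):
    """Split s on DELIM and append each delimiter's numbers to the text before it."""
    parts = DELIM.split(s)
    records = []
    # parts repeats as: text, group 1, group 2, group 3, text, ...
    # and ends with one leftover piece (whatever follows the last
    # delimiter), which this sketch ignores.
    for i in range(0, len(parts) - 3, 4):
        text, a, b, c = parts[i:i + 4]
        records.append("%s%s / %s,%s" % (text, a, b, c))
    return records

# Hypothetical, shortened sample in the same shape as the test data.
sample = "10116#001#3718 HBLL <br>#45 / 58#0#10116#002#3718 HBLL <br>#9 / 58#0#"
for m in split_records(sample):
    print("m = " + m)
# m = 10116#001#3718 HBLL 45 / 58,0
# m = 10116#002#3718 HBLL 9 / 58,0
```

Note that the leftover piece is dropped here; with the real data it would hold the incomplete record after the last delimiter, so it may need handling separately.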
>
>
> On Thu, Nov 7, 2013 at 12:15 PM, bruce <badouglas at gmail.com> wrote:
>> hi.
>>
>> got a test file with the sample content listed below:
>>
>> the content is one long string, and needs to be split into separate lines
>>
>> I'm thinking the pattern to split on should be a kind of regex like::
>> <br>#45 / 58#0#
>> or
>> <br>#9 / 58#0
>> but i have no idea how to make this happen!!
>>
>> if i read the content into a buf -> s
>>
>> import re
>> dat = re.compile("what goes here??").split(s)
>>
>> --i'm not sure what goes in the compile() to get the process to work..
>>
>> thoughts/comments would be helpful.
>>
>> thanks
>>
>>
>> test dat::
>> 10116#000#C S#S#100##001##DAY#Fund of Computing#Barrett,
>> William#3#MWF<br>#08:00am<br>#08:50am<br>#3718 HBLL <br>#45 /
>> 58#0#10116#000#C S#S#100##002##DAY#Fund of Computing#Barrett,
>> William#3#MWF<br>#09:00am<br>#09:50am<br>#3718 HBLL <br>#9 /
>> 58#0#10178#000#C S#S#124##001##DAY#Computer Systems#Roper,
>> Paul#3#MWF<br>#11:00am<br>#11:50am<br>#1170 TMCB <br>#41 /
>> 145#0#10178#000#C S#S#124##002##DAY#Computer Systems#Roper,
>> Paul#3#MWF<br>#2:00pm<br>#2:50pm<br>#1170 TMCB <br>#40 /
>> 120#0#01489#002#C S#S#142##001##DAY#Intro to Computer
>> Programming#Burton, Robert <div class='instructors'>Seppi, Kevin<br
>> /></div><span
>



