Regular Expression

MRAB python at mrabarnett.plus.com
Sun Apr 12 20:05:03 EDT 2015


On 2015-04-13 00:47, Pippo wrote:
> On Sunday, 12 April 2015 19:44:05 UTC-4, Pippo  wrote:
>> On Sunday, 12 April 2015 19:28:44 UTC-4, MRAB  wrote:
>> > On 2015-04-12 23:49, Pippo wrote:
>> > > I have a text as follows:
>> > >
>> > > "#D{#C[Health] #P[Information] -
>> > > means any information, including #ST[genetic information],
>> > > whether #C[oral | (recorded in (any form | medium))], that
>> > > (1)#C[Is created or received by] a
>> > > #A[health care provider | health plan | public health authority | employer | life insurer | school | university | or health care clearinghouse];
>> > > (2)#C[Relates to] #C[the past, present, or future physical | mental health | condition of an individual] |
>> > > #C[the provision of health care to an individual] |
>> > > #C[the past, present, or future payment for the provision of health care to an individual].}"
>> > >
>> > > I want to get all elements that start with #C and are enclosed in [], and put them in an array. For example, #C[Health]. I tried with a regex but it doesn't work:
>> > >
>> > "... it doesn't work"? In what way doesn't it work?
>> >
>> > > import re
>> > > import tkinter.filedialog
>> > > import readfile
>> > >
>> > >
>> > >
>> > > j = 0
>> > >
>> > > text = [ ]
>> > >
>> > >
>> > > content = readfile.pattread()
>> > >
>> > > while j < len(content):
>> > >
>> > There's a syntax error here:
>> >
>> > >      constraint = re.compile(r'(#C\[\w*\]'))
>> > >      result = constraint.search(content[j],re.MULTILINE)
>> > >      text.append(result)
>> > >      print(text)
>> > >      j = j+1
>> > >
>>
>> result is empty, although it should have some content.
>>
>> What is the syntax error?
>
> I fixed the syntax error but the result shows:
>
>>>>
> [None]
> [None, None]
> [None, None, None]
> [None, None, None, None]
> [None, None, None, None, None]
> [None, None, None, None, None, None]
> [None, None, None, None, None, None, None]
> [None, None, None, None, None, None, None, None]
>>>>
>
>
> There is no error now. But if, instead of the content I posted above, I use this as the content: #content = "#C[Health] #P[Information]"
>
> then result gives me #C[Health]
>
What does 'readfile.pattread()' return? Does it return a list of
strings? I'm guessing it does.

Try printing each string you're trying to match using 'repr', i.e.:

     print(repr(content[j]))

Do any look like they should match?
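
A minimal sketch of that check, assuming 'readfile.pattread()' returns a list
of lines (the list below is a hypothetical stand-in built from the posted
text, and 'report' is an illustrative helper, not part of the original code):

```python
import re

# Hypothetical stand-in for what readfile.pattread() might return.
content = [
    '"#D{#C[Health] #P[Information] -\n',
    'means any information, including #ST[genetic information],\n',
    '(1)#C[Is created or received by] a\n',
]

# The posted pattern with the misplaced parenthesis fixed.
constraint = re.compile(r'(#C\[\w*\])')

def report(lines):
    """Pair the repr of each line with the matched text, or None."""
    rows = []
    for line in lines:
        match = constraint.search(line)
        rows.append((repr(line), match.group(1) if match else None))
    return rows

for shown, found in report(content):
    print(shown, '->', found)
```

The repr makes quotes, trailing newlines and other hidden characters visible,
so it is easier to see which lines cannot match. Here the third line contains
"#C[", but '\w*' does not match spaces, so it still gives None.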

If one doesn't, but you think it should, post it here so that someone
can tell you why it doesn't! :-)
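
Two things worth checking in the posted code, shown here as a sketch rather
than a definitive fix. First, the second argument of a compiled pattern's
'search' method is a start position, not a flags value, so passing
re.MULTILINE (which equals 8) starts the search at index 8. Second, '\w*'
does not match spaces or punctuation. The sample line and the looser pattern
below are assumptions made for illustration:

```python
import re

# Sample line assembled from the posted text (an assumption about the file).
line = '#C[Health] #P[Information] - (1)#C[Is created or received by] a'

strict = re.compile(r'#C\[\w*\]')     # the posted pattern, syntax fixed
loose = re.compile(r'#C\[[^\]]*\]')   # assumed alternative: anything up to ']'

# On a compiled pattern, search(string, pos): re.MULTILINE is taken as pos=8.
from_pos_8 = strict.search(line, re.MULTILINE)
from_start = strict.search(line)

# findall returns every non-overlapping match in the line as a list.
all_strict = strict.findall(line)
all_loose = loose.findall(line)

print(from_pos_8)
print(from_start.group())
print(all_strict)
print(all_loose)
```

Flags belong in the 're.compile' call. Using 'findall' instead of 'search'
collects every '#C[...]' element on a line, not just the first one.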



