Regular Expression

Pippo adm2303.2304 at gmail.com
Sun Apr 12 20:25:30 EDT 2015


On Sunday, 12 April 2015 20:06:08 UTC-4, MRAB  wrote:
> On 2015-04-13 00:47, Pippo wrote:
> > On Sunday, 12 April 2015 19:44:05 UTC-4, Pippo  wrote:
> >> On Sunday, 12 April 2015 19:28:44 UTC-4, MRAB  wrote:
> >> > On 2015-04-12 23:49, Pippo wrote:
> >> > > I have a text as follows:
> >> > >
> >> > > "#D{#C[Health] #P[Information] -
> >> > > means any information, including #ST[genetic information],
> >> > > whether #C[oral | (recorded in (any form | medium))], that
> >> > > (1)#C[Is created or received by] a
> >> > > #A[health care provider | health plan | public health authority | employer | life insurer | school | university | or health care clearinghouse];
> >> > > (2)#C[Relates to] #C[the past, present, or future physical | mental health | condition of an individual] |
> >> > > #C[the provision of health care to an individual] |
> >> > > #C[the past, present, or future payment for the provision of health care to an individual].}"
> >> > >
> >> > > I want to get all elements that start with #C and are []  and put it in an array. For example #C[Health], I try with regex but it doesn't work:
> >> > >
> >> > "... it doesn't work"? In what way doesn't it work?
> >> >
> >> > > import re
> >> > > import tkinter.filedialog
> >> > > import readfile
> >> > >
> >> > >
> >> > >
> >> > > j = 0
> >> > >
> >> > > text = [ ]
> >> > >
> >> > >
> >> > > content = readfile.pattread()
> >> > >
> >> > > while j < len(content):
> >> > >
> >> > There's a syntax error here:
> >> >
> >> > >      constraint = re.compile(r'(#C\[\w*\]'))
> >> > >      result = constraint.search(content[j],re.MULTILINE)
> >> > >      text.append(result)
> >> > >      print(text)
> >> > >      j = j+1
> >> > >
> >>
> >> result is empty! Although it should have a content.
> >>
> >> What is the syntax error?
> >
> > I fixed the syntax error but the result shows:
> >
> >>>>
> > [None]
> > [None, None]
> > [None, None, None]
> > [None, None, None, None]
> > [None, None, None, None, None]
> > [None, None, None, None, None, None]
> > [None, None, None, None, None, None, None]
> > [None, None, None, None, None, None, None, None]
> >>>>
> >
> >
> > No error but if I don't call the content I posted up and call this as a content: #content = "#C[Health] #P[Information]"
> >
> > result gives me #C[Health]
> >
> What does 'readfile.pattread()' return? Does it return a list of
> strings? I'm guessing it does.

Yes, it reads a file and returns a list of strings similar to the text I posted above.

> 
> Try printing each string you're trying to match using 'repr', i.e.:
> 
>      print(repr(content[j]))
> 
> Do any look like they should match?

print(repr(content[j])) gives me the following (interleaved with the output of print(text)):

[None]
'#D{#C[Health] #P[Information] - \n'
[None, None]
'means any information, including #ST[genetic information], \n'
[None, None, None]
'whether #C[oral | (recorded in (any form | medium))], that \n'
[None, None, None, None]
'(1)#C[Is created or received by] a \n'
[None, None, None, None, None]
'#A[health care provider | health plan | public health authority | employer | life insurer | school | university | or health care clearinghouse];  \n'
[None, None, None, None, None, None]
'(2)#C[Relates to] #C[the past, present, or future physical | mental health | condition of an individual] | \n'
[None, None, None, None, None, None, None]
'#C[the provision of health care to an individual] | \n'
[None, None, None, None, None, None, None, None]
'#C[the past, present, or future payment for the provision of health care to an individual].}\n'

Shouldn't it match "#C[Health]" in the first line? If not, what is the best way to collect these items in a list?

> 
> If one doesn't, but you think it should, post it here so that someone
> can tell you why it doesn't! :-)
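
A possible explanation for the [None] results, offered as a sketch rather than a confirmed diagnosis: on a compiled pattern, the second positional argument of search() is the start position (pos), not a flags value. re.MULTILINE has the integer value 8, so the search would begin at index 8 and skip "#C[Health]", which starts at index 3. Separately, \w* does not match spaces, commas, or "|", so most of the other tags would not match even from position 0. The example below assumes the tags never contain a nested "]"; the name `lines` is a hypothetical stand-in for whatever readfile.pattread() returns, using strings copied from the repr() output above.

```python
import re

# Hypothetical stand-in for the list returned by readfile.pattread();
# the strings are copied from the repr() output in this thread.
lines = [
    '#D{#C[Health] #P[Information] - \n',
    '(1)#C[Is created or received by] a \n',
    '(2)#C[Relates to] #C[the past, present, or future physical | '
    'mental health | condition of an individual] | \n',
]

# The original pattern, without the extra parenthesis.
old_pattern = re.compile(r'#C\[\w*\]')

# Passing re.MULTILINE here sets pos=8, so the match at index 3 is skipped.
skipped = old_pattern.search(lines[0], re.MULTILINE)
found = old_pattern.search(lines[0])

# [^\]]* accepts any characters up to the closing bracket,
# including spaces, commas and '|'.
constraint = re.compile(r'#C\[[^\]]*\]')

matches = []
for line in lines:
    # findall returns every non-overlapping match in the line as a string.
    matches.extend(constraint.findall(line))

print(skipped)
print(found.group())
print(matches)
```

If flags are needed, they belong in re.compile(pattern, flags); re.MULTILINE only changes how ^ and $ behave, so it is probably unnecessary for this pattern.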


