[Tutor] re module

Albert-Jan Roskam fomcl at yahoo.com
Thu Aug 14 20:50:26 CEST 2014


-----------------------------
On Thu, Aug 14, 2014 4:07 PM CEST Chris “Kwpolska” Warrick wrote:

>On 14 Aug 2014 15:58 "Sunil Tech" <sunil.techspk at gmail.com> wrote:
>>
>> Hi,
>>
>> I have string like
>> stmt = '<p><span style="font-size: 11pt;"><span style="font-family: times
>new roman,times;">Patient name: Upadhyay Shyam</span><span
>style="font-family: times new roman,times;">  <br />Date of
>birth:   08/08/1988 <br />Issue(s) to be
>analyzed:  testttttttttttttttttttt</span></span><br /><span
>style="font-size: 11pt;"><span style="font-family: times new
>roman,times;">Nurse Clinical summary:  test1</span><span
>style="font-family: times new roman,times;"> <br /><br />Date of
>injury:   12/14/2013</span><br /><span style="font-family:
>times new roman,times;">Diagnoses:   723.4 - 300.02 - 298.3
>- 780.50 - 724.4 Brachial neuritis or radiculitis nos - Generalized
>anxiety disorder - Acute paranoid reaction - Unspecified sleep disturbance
>- Thoracic or lumbosacral neuritis or radiculitis, unspecified</span><br
>/><span style="font-family: times new roman,times;">Requester
>name:   Demo Spltycdtestt</span><br /><span
>style="font-family: times new roman,times;">Phone #:   (213)
>480-9000</span><br /><br /><span style="font-family: times new
>roman,times;">Medical records reviewed <br />__ pages of medical and
>administrative records were reviewed including:<br /><br /><br />Criteria
>used in analysis <br /> <br /><br />Reviewer comments <br /><br /><br
>/>Determination<br />Based on the clinical information submitted for this
>review and using the evidence-based, peer-reviewed guidelines referenced
>above, this request is <br /><br />Peer Reviewer
>Name/Credentials  </span><br /><span style="font-family: times
>new roman,times;">Solis, Test, PhD</span><br /><span
>style="font-family: times new roman,times;">Internal Medicine</span><br
>/><span style="font-family: times new roman,times;"> </span><br /><br
>/><span style="font-family: times new roman,times;">Attestation<br /><br
>/><br />Contact Information</span><span style="font-family: times new
>roman,times;"> <br /></span></span></p><br/><font face=\'times new
>roman,times\' size=\'3\'>Peer to Peer contact attempt 1: 08/13/2014 02:46
>PM, Central, Incoming Call, Successful, No Contact Made, Peer Contact Did
>Not Change Determination</font>'
>>
>>
>> I am trying to find the various font sizes and font faces in this string.
>>
>> I tried:
>>
>> print re.search("<span style=\"(.*)\"", stmt).group()
>>
>>
>> Thank you.
>>
>Don't use regular expressions for HTML. Use lxml instead.
>
>Also, why would you need that exact thing? It's useless. Also, this code is
>very ugly, with too many <span>s and — worse — <font>s which should not be
>used at all.

Why lxml and not BeautifulSoup (bs)? I read that BeautifulSoup deals better with malformed HTML. You said the HTML above is messy, which is not necessarily the same as malformed, but still. Anyway, this reference also seems to favor lxml: http://stackoverflow.com/questions/4967103/beautifulsoup-and-lxml-html-what-to-prefer
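
Whichever parser wins, the general idea of walking the tags and reading their attributes (instead of pattern-matching the raw text) could look roughly like the sketch below. To keep it runnable without installing anything, it uses the standard library's html.parser (Python 3) as a stand-in; it is not lxml or BeautifulSoup, though the same approach should carry over to either. FontCollector and collect_fonts are names made up for this illustration, and sample is a shortened stand-in for the original stmt.

```python
from html.parser import HTMLParser


class FontCollector(HTMLParser):
    """Collect font sizes and faces from style attributes and <font> tags.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self):
        super().__init__()
        self.sizes = []
        self.faces = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Inline CSS, e.g. style="font-size: 11pt; font-family: times"
        for decl in (attrs.get("style") or "").split(";"):
            name, sep, value = decl.partition(":")
            if not sep:
                continue  # empty or malformed declaration
            name, value = name.strip().lower(), value.strip()
            if name == "font-size":
                self.sizes.append(value)
            elif name == "font-family":
                self.faces.append(value)
        # Legacy <font face="..." size="..."> attributes
        if tag == "font":
            if attrs.get("size"):
                self.sizes.append(attrs["size"])
            if attrs.get("face"):
                self.faces.append(attrs["face"])


def collect_fonts(html):
    """Return (sizes, faces) in document order, duplicates included."""
    parser = FontCollector()
    parser.feed(html)
    parser.close()
    return parser.sizes, parser.faces


# Shortened stand-in for the stmt string in the original question.
sample = (
    '<p><span style="font-size: 11pt;">'
    '<span style="font-family: times new roman,times;">Patient name</span>'
    "</span></p>"
    "<font face='times new roman,times' size='3'>Peer to Peer</font>"
)
sizes, faces = collect_fonts(sample)
print(sorted(set(sizes)))
print(sorted(set(faces)))
```

Wrapping the results in set() drops the duplicates, which is probably what "the various font sizes" calls for. Note that the size from a <font> tag ("3") and a CSS size ("11pt") use different units, so they are only collected here, not compared.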

