[Tutor] How to match strange characters

J. Van Brimmer jerry.vb at gmail.com
Mon Sep 8 17:01:26 CEST 2008


Kent Johnson wrote:
> On Mon, Sep 8, 2008 at 2:46 AM, J. Van Brimmer <jerry.vb at gmail.com> wrote:
>   
>> I have a legacy program at work that outputs a text file with this header:
>>
>> ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»
>> º Radio Source Precession Program º
>> º by John B. Doe º
>> º 31 August 1992 º
>> ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍŒ
>> Enter Date for Precession as (MM-DD-YYYY) or C/R for 05-28-2004 > 05-28-2004
>> Enter the Catalog Name or C/R for CATALOG.SRC >
>> The Julian Date is = 2453153.5
>> 0022+002 5.6564 +0.2713 00:22:37.54 00:16:16.65
>> 0106+013 17.2117 +1.6052 01:08:50.80 01:36:18.58
>> .
>> I am trying to write a Python script to strip this header (the first five
>> lines) from the file.
>>     
>
>   
>> As you can see, I can print out the three lines after the strange header
>> lines, but not the strange character lines. How can I match on those strange
>> characters? What are they?
>>     
>
> The strange characters seem to be box drawing characters from DOS
> codepage 437. See
> http://www.microsoft.com/globaldev/reference/oem/437.htm
>
> My guess is that the characters in your program are not actually the
> same as the characters in the file because they use different
> encodings. Try using the hex values for the characters:
> if re.search('\xc9\xcd\xcd\xcd', line):
>
> Kent
>   
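
The encoding mismatch Kent describes can be seen by decoding the same bytes
two ways. This is only a small sketch, assuming Python 3 (the thread itself
predates it) and assuming the viewer that shows "É" and "Í" is treating the
file as Latin-1; the variable names are made up for illustration.

```python
# The same raw bytes, interpreted through two different encodings.
# Assumption: the file holds CP437 bytes, and the mail/editor view is Latin-1.
raw = b"\xc9\xcd\xcd\xcd\xbb"

as_cp437 = raw.decode("cp437")      # DOS box-drawing characters
as_latin1 = raw.decode("latin-1")   # accented letters, as seen in the post

print(as_cp437)
print(as_latin1)
```

Matching on the hex values sidesteps the question of which encoding the
editor used when the script was typed in.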

Thanks Kent, that worked. This is what the output looks like:


$ python srclist.py
???????????????????????????????????????????????????????????????????????????????

Hi there!
Hi there!



Not exactly what I expected, but at least it's recognizing the line, so now
I can delete it.
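
For the deletion step, something along these lines might work. It is a rough
sketch, not the actual srclist.py: it assumes Python 3, works on bytes so no
encoding guess is needed for matching, and the names BOX_LINE,
strip_box_header and sample are hypothetical. Decoding as cp437 before
printing should also avoid the row of question marks on a Unicode terminal.

```python
import re

# Bytes 0xC9, 0xBA and 0xC8 are the CP437 box characters that begin the
# top, side and bottom lines of the banner (hypothetical helper names).
BOX_LINE = re.compile(rb"^[\xc9\xba\xc8]")

def strip_box_header(data: bytes) -> bytes:
    """Return data without the lines that start with a box-drawing byte."""
    kept = [line for line in data.splitlines(keepends=True)
            if not BOX_LINE.match(line)]
    return b"".join(kept)

# Shortened stand-in for the real file contents.
sample = (
    b"\xc9\xcd\xcd\xbb\n"
    b"\xba Radio Source Precession Program \xba\n"
    b"\xc8\xcd\xcd\xbc\n"
    b"The Julian Date is = 2453153.5\n"
)

cleaned = strip_box_header(sample)
print(cleaned.decode("cp437"), end="")
```

On the real file the bytes would come from open(filename, "rb").read(), and
the two "Enter ..." prompt lines would need their own test if they should go
too.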

Thanks a million!

Jerry

