[Tutor] Parsing txt file

Cameron Simpson cs at zip.com.au
Thu Aug 21 01:31:53 CEST 2014


On 20Aug2014 16:35, Dima Kulik <dexternet89 at mail.ru> wrote:
> Hi to all. I have a problem with parsing file.
>I have txt file exported from AD and it has such structure:
>
>DistinguishedName : CN=*** ,OU=*** ,OU=*** ,DC=*** ,DC=***,DC=***
>GroupCategory     : Distribution
>GroupScope        : Universal
>Name              : ****
>ObjectClass       : group
>ObjectGUID        : 0b74b4e2-aad1-4342-a8f4-2fa7763e1d49
>SamAccountName    : ****
>SID               : S-1-5-21-1801674531-492894223-839522115-16421
[...]
>I've tried to make little parser:
>
>keywords = ['Name', 'Name:']
>input_file=open("Mail_Groups.txt","r").readlines()
>output_file=open("Out.txt","w")
>for line in input_file:
>    for word in line.split():

Aside from the remarks from others, I would change the way you're parsing each 
line. Based entirely on what you show above, I'd make the main outer loop look 
like this:

   for line in input_file:
       left, right = line.split(':', 1)
       label = left.strip()
       value = right.strip()

and then make decisions using "label" and "value".
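
As a rough sketch, the loop above could be wrapped up as a small generator. 
Two things here are assumptions and not part of the original script: the 
function name parse_fields is made up for illustration, and lines without a 
colon (blank separator lines, for instance) are skipped, since the sample 
doesn't show what sits between records. It uses str.partition() instead of 
split(':', 1) only because partition() does not raise on a colon-less line:

```python
def parse_fields(lines):
    """Yield (label, value) pairs from 'Label : value' report lines."""
    for line in lines:
        # partition() splits on the first colon only; sep is '' if none found.
        left, sep, right = line.partition(':')
        if not sep:
            # Assumption: blank or colon-less lines carry no field; skip them.
            continue
        yield left.strip(), right.strip()


# Sample lines in the shape shown in the quoted export.
sample = [
    "GroupCategory     : Distribution\n",
    "\n",
    "Name              : ****\n",
    "SID               : S-1-5-21-1801674531-492894223-839522115-16421\n",
]
fields = list(parse_fields(sample))

# Decisions are then made on the label, not on whitespace-split words.
names = [value for label, value in fields if label == "Name"]
```

With real input you would pass the open file object in place of "sample".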

Your approach breaks the line into "words" on whitespace, which has several 
difficulties. One is that the example input looks like a report, and in a 
report a long label will often abut its colon directly, eg:

     HereIsALongNameLabel: info...

Your split() presumes the colon is always separated from the label by whitespace.

Just splitting once on the first colon and then trimming the whitespace from 
the two pieces is simpler and gets you a more reliable parse.
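
To make the difference concrete, here is a small comparison of the two 
approaches on a spaced-out label and on an abutting one. The helper name 
split_field is hypothetical, and the second line is just the made-up long 
label example from above, not real data from the export:

```python
spaced = "Name              : ****"
abutting = "HereIsALongNameLabel: info..."

# Splitting on whitespace: the colon is its own word in one case and is
# glued to the label in the other, so the label must be matched two ways.
words_spaced = spaced.split()      # ['Name', ':', '****']
words_abutting = abutting.split()  # ['HereIsALongNameLabel:', 'info...']


def split_field(line):
    """Split once on the first colon, then trim both pieces."""
    left, right = line.split(':', 1)
    return left.strip(), right.strip()


# Splitting on the colon: both lines produce a clean (label, value) pair.
pair_spaced = split_field(spaced)      # ('Name', '****')
pair_abutting = split_field(abutting)  # ('HereIsALongNameLabel', 'info...')
```

This is also why the original keywords list needed both 'Name' and 'Name:'.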

Cheers,
Cameron Simpson <cs at zip.com.au>

Trust the computer industry to shorten Year 2000 to Y2K. It was this
thinking that caused the problem in the first place.
- Mark Ovens <marko at uk.radan.com>

