[Tutor] Parsing txt file
Cameron Simpson
cs at zip.com.au
Thu Aug 21 01:31:53 CEST 2014
On 20Aug2014 16:35, Dima Kulik <dexternet89 at mail.ru> wrote:
> Hi to all. I have a problem with parsing file.
>I have txt file exported from AD and it has such structure:
>
>DistinguishedName : CN=*** ,OU=*** ,OU=*** ,DC=*** ,DC=***,DC=***
>GroupCategory : Distribution
>GroupScope : Universal
>Name : ****
>ObjectClass : group
>ObjectGUID : 0b74b4e2-aad1-4342-a8f4-2fa7763e1d49
>SamAccountName : ****
>SID : S-1-5-21-1801674531-492894223-839522115-16421
[...]
>I've tried to make little parser:
>
>keywords = ['Name', 'Name:']
>input_file=open("Mail_Groups.txt","r").readlines()
>output_file=open("Out.txt","w")
>for line in input_file:
> for word in line.split():
Aside from the remarks from others, I would change the way you're parsing each
line. Based entirely on what you show above, I'd make the main outer loop look
like this:
  for line in input_file:
    left, right = line.split(':', 1)
    label = left.strip()
    value = right.strip()
and then make decisions using "label" and "value".
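As a rough sketch of what those decisions might look like: the helper names parse_line and names_from_lines are mine, the sample values are invented (your real ones are masked), and I'm assuming that lines with no colon at all, such as blank lines, can simply be skipped.

```python
def parse_line(line):
    """Split a 'Label : value' line on its first colon.

    Returns (label, value), or None for a line with no colon
    (a blank line, for example). Skipping such lines is an assumption.
    """
    if ':' not in line:
        return None
    left, right = line.split(':', 1)
    return left.strip(), right.strip()


def names_from_lines(lines):
    """Collect the value of every line whose label is exactly 'Name'."""
    names = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        label, value = parsed
        if label == 'Name':
            names.append(value)
    return names


# Invented sample data in the same shape as the export.
sample = [
    "GroupCategory : Distribution\n",
    "Name : Sales Team\n",
    "\n",
    "SamAccountName : sales\n",
]
print(names_from_lines(sample))
```

Because the label is compared exactly, "SamAccountName" and "DistinguishedName" are not mistaken for "Name", and a value containing spaces stays in one piece.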
Your approach breaks the line into "words" on whitespace, which has several
difficulties. For one, the example input data look like a report, and in
reports a long label will often abut its colon, eg:

  HereIsALongNameLabel: info...

Your split() presumes the colon is always spaced away from the label.
Just splitting once on the first colon and then trimming the whitespace from
the two pieces is simpler and gets you a more reliable parse.
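To see the difference, here is a small comparison of the two ways of splitting. The "Sales Team" value is made up for illustration and split_label is just a name I picked:

```python
spaced = "Name : Sales Team"
abutting = "HereIsALongNameLabel: info..."

# Whitespace splitting: the label only comes out clean when the colon
# is spaced out, and a value with spaces is broken into several words.
print(spaced.split())     # ['Name', ':', 'Sales', 'Team']
print(abutting.split())   # ['HereIsALongNameLabel:', 'info...']


def split_label(line):
    """Split once on the first colon, then trim both pieces."""
    left, right = line.split(':', 1)
    return left.strip(), right.strip()


# Colon splitting: both layouts give a clean label and a whole value.
print(split_label(spaced))    # ('Name', 'Sales Team')
print(split_label(abutting))  # ('HereIsALongNameLabel', 'info...')
```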
Cheers,
Cameron Simpson <cs at zip.com.au>
Trust the computer industry to shorten Year 2000 to Y2K. It was this
thinking that caused the problem in the first place.
- Mark Ovens <marko at uk.radan.com>