Reading files, splitting on a delimiter and newlines.

John Machin sjmachin at lexicon.net
Wed Jul 25 18:44:29 EDT 2007


On Jul 26, 3:08 am, Stargaming <stargam... at gmail.com> wrote:
> On Wed, 25 Jul 2007 09:16:26 -0700, kyosohma wrote:
> > On Jul 25, 10:46 am, chris... at gmail.com wrote:
> >> Hello,
>
> >> I have a situation where I have a file that contains text similar to:
>
> >> myValue1 = contents of value1
> >> myValue2 = contents of value2 but
> >>                         with a new line here
> >> myValue3 = contents of value3
>
> >> My first approach was to open the file, use readlines to split the
> >> lines on the "=" delimiter into a key/value pair (to be stored in a
> >> dict).
>
> >> After processing a couple of files I noticed it's possible that a newline
> >> can be present in the value, as shown in myValue2.
>
> >> In this case it's not an option to, say, remove the newlines if it's a
> >> "multi line" value, as the value data needs to stay intact.
>
> >> I'm a bit confused as to how to go about getting this to work.
>
> >> Any suggestions on an approach would be greatly appreciated!
>
> > I'm confused. You don't want the newline to be present, but you can't
> > remove it because the data has to stay intact? If you don't want to
> > change it, then what's the problem?
>
> > Mike
>
> It's obvious that simple line-by-line filtering won't handle multi-line
> statements.
>
> You could solve that by saving the last item you added something to and,
> if the line currently being handled doesn't look like an assignment,
> appending it to this item. You might run into problems with data like this:
>
>   foo = modern maths
>   proved that 1 = 1
>   bar = single
>
> If your dataset always has indentation on subsequent lines, you might use
> this. Or if the key's name is always just one word.
>

My take: all of the above, plus this: given that you want to extract
items of the form <LHS> = <RHS>, I'd suggest developing a fairly precise
regular expression for the LHS, maybe even for the RHS, and testing it on
as many of these files as you can.
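
A minimal sketch of the LHS part, assuming a key is a single
identifier-like word at the very start of a line, and that any line not
matching is a continuation of the previous value. The names LHS_RE and
parse_pairs are made up for illustration, and the pattern would certainly
need tuning against the real files.

```python
import re

# Assumption: a key is one word (letters, digits, underscore) at line start.
LHS_RE = re.compile(r"^(\w+)\s*=\s?(.*)$")

def parse_pairs(lines):
    """Collect key/value pairs, folding non-matching lines into the previous value."""
    result = {}
    key = None
    for raw in lines:
        line = raw.rstrip("\n")
        m = LHS_RE.match(line)
        if m:
            key = m.group(1)
            result[key] = m.group(2)
        elif key is not None:
            # Continuation line: keep the newline so the value stays intact.
            result[key] += "\n" + line
        # Lines seen before the first key are ignored in this sketch.
    return result

sample = [
    "myValue1 = contents of value1\n",
    "myValue2 = contents of value2 but\n",
    "                        with a new line here\n",
    "myValue3 = contents of value3\n",
]
pairs = parse_pairs(sample)
```

With a file object you would pass the open file (or f.readlines()) instead
of the sample list. Because the pattern insists on one word before the "=",
a line such as "proved that 1 = 1" is treated as a continuation, not as a
new key.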

Why an RE for RHS? Consider:

foo = somebody said "I think that
REs = trouble
maybe_better = pyparsing"

:-)
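
A rough sketch of how one might cope with that case, under the (big)
assumptions that values only use double quotes, that the quotes always
balance, and that there is no escaping. LHS_RE and parse_quoted are
hypothetical names; a real grammar is probably a job for a proper parser.

```python
import re

# Hypothetical LHS pattern: one word at line start, then "=".
LHS_RE = re.compile(r"^(\w+)\s*=\s?(.*)$")

def parse_quoted(lines):
    """Like a plain LHS match, but never start a new key inside an open quote."""
    result = {}
    key = None
    in_quote = False
    for raw in lines:
        line = raw.rstrip("\n")
        # While a double quote is still open, do not even try to match a key.
        m = None if in_quote else LHS_RE.match(line)
        if m:
            key = m.group(1)
            result[key] = m.group(2)
        elif key is not None:
            result[key] += "\n" + line
        else:
            continue  # nothing to attach this line to
        # An odd number of double quotes on the line toggles the open state.
        if line.count('"') % 2 == 1:
            in_quote = not in_quote
    return result

tricky = [
    'foo = somebody said "I think that\n',
    "REs = trouble\n",
    'maybe_better = pyparsing"\n',
    "bar = single\n",
]
parsed = parse_quoted(tricky)
```

Here "REs" and "maybe_better" stay inside the value of foo, and only "bar"
becomes a second key.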



