Simple code and suggestion

Jussi Piitulainen jussi.piitulainen at helsinki.fi
Wed Nov 30 09:01:38 EST 2016


g thakuri writes:

> I would want to avoid using multiple split in the below code , what
> options do we have before tokenising the line?, may be validate the
> first line any other ideas
>
>  cmd = 'utility   %s' % (file)
>  out, err, exitcode = command_runner(cmd)
>  data = stdout.strip().split('\n')[0].split()[5][:-2]

That .strip() looks suspicious to me, but perhaps you know better.

Also, stdout should be out, right?

You can use io.StringIO to turn a string into an object that you can
read line by line just like a file object. This reads just the first
line and picks the part that you want:

import io

data = next(io.StringIO(out)).split()[5][:-2]

I don't know how much this affects performance, but it's kind of neat.
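To make that concrete, here is a small sketch of the same idea. The sample string is invented, since the real output of the utility is not shown in this thread, and first_line_or_none is only a hypothetical helper name. Note that next() raises StopIteration when the string is empty, unless it is given a default:

```python
import io

# Hypothetical output; the real format of the utility is not shown in the thread.
out = "a b c d e 1234kB g\nsecond line is ignored\n"

# Iterating a StringIO yields lines like a file object does,
# so the first line still carries its trailing newline.
first_line = next(io.StringIO(out))

# Sixth whitespace-separated field, minus its last two characters.
data = first_line.split()[5][:-2]


def first_line_or_none(text):
    """Return the first line of text, or None if text is empty."""
    # The second argument to next() is returned instead of raising StopIteration.
    return next(io.StringIO(text), None)
```

With the invented sample above, data ends up as '1234'. Whether you want the default or the exception for empty output depends on how you want to treat a utility that printed nothing.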

A thing I like to do is name all the fields even if I don't use them all.
The assignment will fail with a ValueError if there's an unexpected
number of fields, and that's usually what I want when input is bad:

line = next(io.StringIO(out))
ID, FORM, LEMMA, POS, TAGS, WEV, ETC = line.split()
data = WEV[:-2]

(Those are probably not appropriate names for your fields :)
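Here is a rough sketch of how that failure shows up in practice. The function name, the placeholder field names, the sample strings, and the count of seven fields are all assumptions made for illustration, not anything known about the real utility:

```python
import io


def extract_sixth_field(out):
    """Return the sixth field of the first line, minus its last two characters.

    Raises ValueError if the first line does not have exactly seven fields.
    """
    line = next(io.StringIO(out))
    # Placeholder names; seven fields is an assumption for this sketch.
    f1, f2, f3, f4, f5, wanted, f7 = line.split()
    return wanted[:-2]


good = "a b c d e 1234kB g\n"  # invented sample with seven fields
bad = "a b c\n"                # invented sample with too few fields

print(extract_sixth_field(good))

try:
    extract_sixth_field(bad)
except ValueError as exc:
    # Tuple unpacking reports a mismatch in the number of fields as ValueError.
    print("bad input:", exc)
```

The point of the design is that a malformed line is rejected at the unpacking step, rather than producing an IndexError or a wrong value further along.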

Just a couple of ideas that you may like to consider.


