Extracting values from text file

bearophileHUGS at lycos.com
Fri Jun 16 05:20:13 EDT 2006


Here's a first try. There are probably better ways to do it, and it's
far from resilient: it breaks in a lot of different ways (for example,
more than one number on a line, a number with text on both sides of
it, etc.). I have split the data munging into many small steps so I
can see what's happening, and so you can fix or modify the code quickly.

Bye,
bearophile


data1 = """
 Some text that can span some lines.
More text
  Apples 34
  56 Ducks

Some more text.

  0.5 g butter
"""

import re
# Separate lines in a list
data2 = data1.split("\n")
print data2, "\n"

# clear lines from trailing and leading spaces, newlines, etc.
data3 = map(str.strip, data2)
print data3, "\n"

# remove blank lines after the stripping
data4 = filter(None, data3)
print data4, "\n"

# create a list of (line, number) pairs, keeping only the lines that
# contain a number
patt1 = re.compile(r"\d+\.?\d*") # No scientific notation
data5 = [(line, n) for line in data4 for n in patt1.findall(line)]
print data5, "\n"

# remove the number from the lines, and strip such lines
data6 = [(line.replace(num, "").strip(), num) for line, num in data5]
print data6, "\n"

def nconv(num):
    "To convert a number to an int, and if not possible to a float"
    try:
        result = int(num)
    except ValueError:
        result = float(num)
    return result

# convert the number strings into ints or floats
data7 = [(line, nconv(num)) for line, num in data6]
print data7, "\n"

# build the final dict of (line: number)
result = dict(data7)
print result, "\n"
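
The script above breaks in the cases I mentioned: several numbers on
one line, or a number with text on both sides. Also note that
line.replace(num, "") removes every occurrence of those digits, so in
a line like "5 and 15" it damages the second number as well. Here is
a rough sketch of one possible way around that, not a definitive fix.
It is written for Python 3 (the script above is Python 2), it reuses
the same pattern, and the function name extract() and the sample text
are made up for illustration. It assumes you are happy with a dict
mapping each label to a list of numbers.

```python
import re

# Same pattern as patt1 above: digits, optional dot, optional digits.
NUMBER = re.compile(r"\d+\.?\d*")

def nconv(num):
    """Convert a numeric string to an int, falling back to a float."""
    try:
        return int(num)
    except ValueError:
        return float(num)

def extract(text):
    """Map the text of each line (numbers removed) to the numbers found on it.

    Hypothetical helper: lines without any number are skipped.
    """
    result = {}
    for line in text.splitlines():
        nums = NUMBER.findall(line)
        if not nums:
            continue
        # Remove the numbers by match position rather than with
        # str.replace, then collapse the leftover runs of whitespace.
        label = " ".join(NUMBER.sub(" ", line).split())
        result.setdefault(label, []).extend(nconv(n) for n in nums)
    return result

sample = """
Some text
  Apples 34
  56 Ducks
  mix 2 eggs and 0.5 g butter
"""
print(extract(sample))
# {'Apples': [34], 'Ducks': [56], 'mix eggs and g butter': [2, 0.5]}
```

Two lines that end up with the same label get their numbers merged
into one list instead of overwriting each other. Signs and scientific
notation are still not handled.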



