[Tutor] My problem in simple terms

Mon Mar 4 09:07:09 EST 2019

Edward Kanja wrote:

> Hi there ,
> Earlier i had sent an email on how to use re.sub function to eliminate
> square brackets. I have simplified the statements. Attached txt file named
> unon.Txt has the data im extracting from. The file named code.txt has the
> codes I'm using to extract the data.The regular expression works fine but
> my output has too many square brackets. How do i do away with them thanks.

The square brackets appear because re.findall() returns a list. If you know 
that there is only one match or if you are only interested in the first 
match you can extract it with

first = re.findall(...)[0]

This will of course fail with an IndexError if there is no match at all, so 
you have to check the length first. You can also use the length check to 
skip the lines with no match at all, i.e. the lines appearing as

[] [] []

in your script's output.
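
As a rough sketch of that length check (the sample lines, the pattern and the 
helper name first_match are made up for illustration, they are not taken from 
your code.txt):

```python
import re

# Hypothetical sample lines and pattern, for illustration only.
lines = ["index: 11113648", "no digits here", "index: 10000007"]
pattern = re.compile(r"\d+")

def first_match(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    matches = pattern.findall(text)
    if len(matches) == 0:
        return None
    return matches[0]

firsts = []
for line in lines:
    first = first_match(pattern, line)
    if first is None:
        continue  # skip lines without a match instead of printing []
    firsts.append(first)
    print(first)
```

If you only ever need the first match, re.search() may be a better fit than 
re.findall(): it returns a match object, or None when nothing matches.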

Now looking at your data -- at least from the sample it seems to be rather 
uniform. There are records separated by "---..." and fields separated by 
"|". I'd forego regular expressions for that:

$ cat code.py
from itertools import groupby

def is_record_sep(line):
    return not line.rstrip().strip("-")

with open("unon.txt") as instream:
    for sep, group in groupby(instream, key=is_record_sep):
        if not sep:
            record = [
                [field.strip() for field in line.split("|")]
                for line in group if line.strip().strip("|-")
            ]
            # select fields by their position in the record
            names = record[0][1]
            station = record[0][2]
            index = record[1][1]
            print(index, names, station, sep=", ")
$ python3 code.py 
11113648, Rawzeea NLKPP, VE11-Nairobi
10000007, Pattly MUNIIZ, TX00-Nairobi
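
To see what groupby() does with the is_record_sep key, here is a small sketch 
that runs on an in-memory sample instead of the file. The sample text is 
invented and only assumes the layout described above ("---..." between 
records, "|" between fields); your real unon.txt may differ.

```python
from itertools import groupby

# Hypothetical sample in the assumed layout; the real unon.txt may differ.
sample = """\
------------------------------
| Jane DOE | AB01-Nairobi |
| 12345678 | x |
------------------------------
| John ROE | CD02-Nairobi |
| 87654321 | y |
------------------------------
"""

def is_record_sep(line):
    # True for lines that contain only dashes (and for blank lines)
    return not line.rstrip().strip("-")

# groupby() bundles consecutive lines that share the same key value,
# so separator runs and record bodies alternate.
groups = [
    (sep, [line.rstrip("\n") for line in group])
    for sep, group in groupby(sample.splitlines(keepends=True), key=is_record_sep)
]
for sep, lines in groups:
    print(sep, lines)
```

Every group with sep == False is one record. Note that is_record_sep() is also 
true for blank lines, so those act as separators, too.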