[Tutor] My problem in simple terms
Peter Otten
__peter__ at web.de
Mon Mar 4 09:07:09 EST 2019
Edward Kanja wrote:
> Hi there ,
> Earlier i had sent an email on how to use re.sub function to eliminate
> square brackets. I have simplified the statements. Attached txt file named
> unon.Txt has the data im extracting from. The file named code.txt has the
> codes I'm using to extract the data.The regular expression works fine but
> my output has too many square brackets. How do i do away with them thanks.
The square brackets appear because re.findall() returns a list. If you know
that there is only one match or if you are only interested in the first
match you can extract it with
first = re.findall(...)[0]
This will of course fail with an IndexError if there is no match at all, so
you have to check the length of the list first. You can also use the length
check to skip the lines with no match at all, i.e. the lines appearing as
[] [] []
in your script's output.
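A minimal sketch of that length check. The pattern and the sample lines are
made up for illustration and are not taken from your code.txt:

```python
import re

# Hypothetical pattern and sample lines, for illustration only.
pattern = re.compile(r"\d{8}")
lines = ["| 11113648 | some text |", "-----------", "| no digits here |"]


def first_match(pattern, line):
    """Return the first match in line, or None if there is no match."""
    matches = pattern.findall(line)
    if len(matches) == 0:
        return None
    return matches[0]


for line in lines:
    first = first_match(pattern, line)
    if first is None:
        continue  # skip lines that would otherwise show up as []
    print(first)
```

If you only ever need the first match, re.search() is an alternative: it
returns a match object or None instead of a list.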
Now looking at your data -- at least from the sample it seems to be rather
uniform. There are records separated by "---..." and fields separated by
"|". I'd forego regular expressions for that:
$ cat code.py
from itertools import groupby


def is_record_sep(line):
    return not line.rstrip().strip("-")


with open("unon.txt") as instream:
    for sep, group in groupby(instream, key=is_record_sep):
        if not sep:
            record = [
                [field.strip() for field in line.split("|")]
                for line in group if line.strip().strip("|-")
            ]
            # select fields by their position in the record
            names = record[0][1]
            station = record[0][2]
            index = record[1][1]
            print(index, names, station, sep=", ")
$ python3 code.py
11113648, Rawzeea NLKPP, VE11-Nairobi
10000007, Pattly MUNIIZ, TX00-Nairobi
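To try the approach without the attachment, here is a sketch that runs on an
in-memory sample. The SAMPLE layout is only my guess at what unon.txt looks
like (the "some field" column is invented), and parse_records() is a
hypothetical helper name, so adjust the positions if your file differs:

```python
from itertools import groupby

# Hypothetical sample in the layout described above; the real file may differ.
SAMPLE = """\
------------------------------
| Rawzeea NLKPP | VE11-Nairobi |
| 11113648      | some field   |
------------------------------
| Pattly MUNIIZ | TX00-Nairobi |
| 10000007      | some field   |
------------------------------
"""


def is_record_sep(line):
    # True for lines that are empty or consist only of "-" characters.
    return not line.rstrip().strip("-")


def parse_records(lines):
    """Yield (index, names, station) for each record in lines."""
    for sep, group in groupby(lines, key=is_record_sep):
        if sep:
            continue  # a run of separator lines, nothing to extract
        record = [
            [field.strip() for field in line.split("|")]
            for line in group
            if line.strip().strip("|-")
        ]
        # select fields by their position in the record
        yield record[1][1], record[0][1], record[0][2]


rows = list(parse_records(SAMPLE.splitlines()))
for row in rows:
    print(*row, sep=", ")
```

groupby() collects consecutive lines with the same key, so every run of
non-separator lines becomes one record.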