Strange re problem

Peter Otten __peter__ at web.de
Fri Jun 20 07:12:03 EDT 2008


TYR wrote:

> OK, this ought to be simple. I'm parsing a large text file (originally
> a database dump) in order to process the contents back into a SQLite3
> database. The data looks like this:
> 
> 'AAA','PF',-17.416666666667,-145.5,'Anaa, French Polynesia','Pacific/
> Tahiti','Anaa';'AAB','AU',-26.75,141,'Arrabury, Queensland,
> Australia','?','?';'AAC','EG',31.133333333333,33.8,'Al Arish,
> Egypt','Africa/Cairo','El Arish International';'AAE','DZ',
> 36.833333333333,8,'Annaba','Africa/Algiers','Rabah Bitat';
> 
> which goes on for another 308 lines. As keen and agile minds will no
> doubt spot, the rows are separated by a ; so it should be simple to
> parse it using a regex. So, I establish a db connection and cursor,
> create the table, and open the source file.
> 
> Then we do this:
> 
> f = file.readlines()
> biglist = re.split(';', f)
> 
> and then iterate over the output from re.split(), inserting each set
> of values into the db, and finally close the file and commit
> transactions. But instead, I get this error:
> 
> Traceback (most recent call last):
>   File "converter.py", line 12, in <module>
>     biglist = re.split(';', f)
>   File "/usr/lib/python2.5/re.py", line 157, in split
>     return _compile(pattern, 0).split(string, maxsplit)
> TypeError: expected string or buffer
> 
> Is this because the lat and long values are integers rather than
> strings? (If so, any ideas?)

No, the integers have nothing to do with it: file.readlines() returns a list
of lines, but re.split() expects a string as its second argument.
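To see the mismatch concretely, here is a small sketch in Python 3. The thread uses Python 2.5, but readlines() and read() differ in the same way there. An io.StringIO object stands in for the opened dump file, and the sample text is a shortened excerpt of the data above:

```python
import io
import re

# io.StringIO stands in for the opened dump file; the text is a shortened
# excerpt of the data quoted above.
sample = "'AAA','PF',-17.416666666667,-145.5;'AAB','AU',-26.75,141;"

lines = io.StringIO(sample).readlines()  # list of strings, one per line
text = io.StringIO(sample).read()        # the whole file as one string

print(type(lines).__name__)  # list
print(type(text).__name__)   # str

try:
    re.split(";", lines)  # a list is not a string: this raises TypeError
except TypeError as exc:
    print("TypeError:", exc)

records = re.split(";", text)  # splitting the string works
print(records)
```

Note the empty string at the end of `records`: a trailing ";" produces one, so the insert loop should skip empty entries.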

f = file.read()
biglist = re.split(";", f)

should work if the file fits into memory, but you don't need regular
expressions here:

biglist = file.read().split(";")

is just as good -- or bad, if any of the quoted fields themselves contain a
";" character.
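If some quoted fields might contain ";", one option is to split only on separators that fall outside single quotes, then let the csv module parse each record. This is a minimal sketch under a few assumptions: fields are quoted with ', a literal quote inside a field is written as '' (backslash escapes would need different handling), and fields never contain newlines. The table name airports, its columns, the helpers split_records() and load(), and the ZZZ record (made up to contain a ";") are all hypothetical. It also converts latitude and longitude with float(), so integer-looking values such as 141 are stored as REAL.

```python
import csv
import sqlite3


def split_records(text, sep=";", quote="'"):
    """Split text on sep, ignoring any sep that appears inside quotes."""
    records, current, in_quotes = [], [], False
    for ch in text:
        if ch == quote:
            in_quotes = not in_quotes  # a doubled '' toggles twice, so it nets out
        if ch == sep and not in_quotes:
            records.append("".join(current))
            current = []
        else:
            current.append(ch)
    records.append("".join(current))
    # Drop surrounding whitespace/newlines and the empty piece after a trailing sep.
    return [r.strip() for r in records if r.strip()]


def load(text, conn):
    """Parse every record in text and insert it into a hypothetical airports table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS airports "
        "(code TEXT, country TEXT, lat REAL, lon REAL, city TEXT, tz TEXT, name TEXT)"
    )
    rows = []
    for record in split_records(text):
        # csv handles the commas inside quoted fields such as 'Anaa, French Polynesia'.
        fields = next(csv.reader([record], quotechar="'"))
        fields[2] = float(fields[2])  # latitude
        fields[3] = float(fields[3])  # longitude
        rows.append(fields)
    conn.executemany("INSERT INTO airports VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


sample = (
    "'AAA','PF',-17.416666666667,-145.5,'Anaa, French Polynesia',"
    "'Pacific/Tahiti','Anaa';\n"
    "'ZZZ','XX',0,0,'Semi; Colon','?','Hypothetical';\n"
)
conn = sqlite3.connect(":memory:")
print(load(sample, conn))  # 2
```

On this sample a plain sample.split(";") yields three non-empty pieces because of the ";" in 'Semi; Colon', while split_records() keeps two records. The character-by-character loop is slower than str.split(), but for a file of a few hundred lines that should not matter.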

Peter


