Iterate over text file, discarding some lines via context manager

wxjmfauth at gmail.com
Sat Nov 29 03:00:33 EST 2014


>>> with open('UnicodeData.txt', 'rb') as f:
...     t = f.read()
...     
>>> t = t.decode('ascii')
>>> z = t.splitlines()
>>> # process: split each line into its ';'-separated fields
>>> zz = [e.split(';') for e in z]
>>> for e in zz[:3]:
...     print(e)
...     
['0000', '<control>', 'Cc', '0', 'BN', '', '', '', '', 'N', 'NULL', '', '', '', '']
['0001', '<control>', 'Cc', '0', 'BN', '', '', '', '', 'N', 'START OF HEADING', '', '', '', '']
['0002', '<control>', 'Cc', '0', 'BN', '', '', '', '', 'N', 'START OF TEXT', '', '', '', '']
>>> (len(t), len(z), len(zz))
(1509570, 27268, 27268)
>>> 
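
The same steps can be wrapped in a small function. This is only a sketch under stated assumptions: the name `parse_records`, the `sep` parameter and the `sample` bytes are made up for illustration, and dropping empty lines is just one guess at what "discarding some lines" should mean here.

```python
def parse_records(data, sep=';'):
    """Decode bytes as ASCII and split each non-empty line into fields.

    Hypothetical helper: it mirrors the decode / splitlines / split
    steps of the session above and additionally skips empty lines.
    """
    text = data.decode('ascii')
    return [line.split(sep) for line in text.splitlines() if line]


# Two records in the UnicodeData.txt layout, followed by an empty line
# that gets discarded (illustrative sample, not the real file).
sample = (
    b"0000;<control>;Cc;0;BN;;;;;N;NULL;;;;\n"
    b"0001;<control>;Cc;0;BN;;;;;N;START OF HEADING;;;;\n"
    b"\n"
)

records = parse_records(sample)
for fields in records:
    print(fields)
```

With the real file, the bytes would come from `open('UnicodeData.txt', 'rb')` inside a `with` block, exactly as in the session above.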


Fast, simple, unbeatable (and no aspirin needed).

jmf


