Splitting text into lines

George Trojan - NOAA Federal george.trojan at noaa.gov
Tue Dec 13 11:45:34 EST 2016


I have files containing ASCII text with lines separated by '\r\r\n'.
Example:

$ od -c FTAK31_PANC_131140.1481629265635
0000000   F   T   A   K   3   1       P   A   N   C       1   3   1   1
0000020   4   0  \r  \r  \n   T   A   F   A   B   E  \r  \r  \n   T   A
0000040   F  \r  \r  \n   P   A   B   E       1   3   1   1   4   0   Z
0000060       1   3   1   2   /   1   4   1   2       0   7   0   1   0
0000100   K   T       P   6   S   M       S   C   T   0   3   5       O
0000120   V   C   0   6   0  \r  \r  \n                       F   M   1
0000140   3   2   1   0   0       1   0   0   1   2   G   2   0   K   T
0000160       P   6   S   M       B   K   N   1   0   0       W   S   0
0000200   1   5   /   1   8   0   3   5   K   T  \r  \r  \n
0000220           F   M   1   4   1   0   0   0       0   9   0   1   5
0000240   G   2   5   K   T       P   6   S   M       B   K   N   0   5
0000260   0       W   S   0   1   5   /   1   8   0   4   0   K   T   =
0000300  \r  \r  \n
0000303

What is the proper way of getting a list of lines?
Both
>>> open('FTAK31_PANC_131140.1481629265635').readlines()
['FTAK31 PANC 131140\n', '\n', 'TAFABE\n', '\n', 'TAF\n', '\n', 'PABE
131140Z 1312/1412 07010KT P6SM SCT035 OVC060\n', '\n', '     FM132100
10012G20KT P6SM BKN100 WS015/18035KT\n', '\n', '     FM141000 09015G25KT
P6SM BKN050 WS015/18040KT=\n', '\n']

and

>>> open('FTAK31_PANC_131140.1481629265635').read().splitlines()
['FTAK31 PANC 131140', '', 'TAFABE', '', 'TAF', '', 'PABE 131140Z 1312/1412
07010KT P6SM SCT035 OVC060', '', '     FM132100 10012G20KT P6SM BKN100
WS015/18035KT', '', '     FM141000 09015G25KT P6SM BKN050 WS015/18040KT=',
'']

introduce empty (or single character '\n') strings. I can do this:

>>> [x.rstrip() for x in open('FTAK31_PANC_131140.1481629265635',
'rb').read().decode().split('\n')]
['FTAK31 PANC 131140', 'TAFABE', 'TAF', 'PABE 131140Z 1312/1412 07010KT
P6SM SCT035 OVC060', '     FM132100 10012G20KT P6SM BKN100 WS015/18035KT',
'     FM141000 09015G25KT P6SM BKN050 WS015/18040KT=', '']

but it looks cumbersome. In Python 2.x I stripped '\r' before passing the
string to split():

>>> open('FTAK31_PANC_131140.1481629265635').read().replace('\r', '')
'FTAK31 PANC 131140\nTAFABE\nTAF\nPABE 131140Z 1312/1412 07010KT P6SM
SCT035 OVC060\n     FM132100 10012G20KT P6SM BKN100 WS015/18035KT\n
FM141000 09015G25KT P6SM BKN050 WS015/18040KT=\n'

but Python 3.x, reading in text mode, replaces '\r\r\n' by '\n\n' on read().
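
The '\n\n' comes from universal newline translation, which Python 3 text
mode applies by default: the first '\r' is turned into '\n' and the
following '\r\n' into another '\n'. A minimal sketch of one way around it,
assuming it is acceptable to pass newline='' to open() so that no
translation happens; the sample bytes and the temporary file are made up
for illustration:

```python
import os
import tempfile

# Hypothetical sample with '\r\r\n' line endings, written to a temp file.
sample = b'FTAK31 PANC 131140\r\r\nTAFABE\r\r\nTAF\r\r\n'
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(sample)
    path = f.name

# Default text mode: '\r' and '\r\n' are each translated to '\n'.
with open(path) as f:
    translated = f.read()

# newline='' turns translation off, so the '\r' characters survive read().
with open(path, newline='') as f:
    raw = f.read()

os.remove(path)

# With the '\r' characters intact, the Python 2 idiom works again.
lines = raw.replace('\r', '').splitlines()
```

Here translated contains doubled '\n' characters, while lines holds the
three text lines without empty entries.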

Ideally I'd like to have code that handles both '\r\r\n' and '\n' as the
split character.
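
One possible shape for such code, as a sketch rather than a definitive
answer: split on a regular expression that treats '\n' preceded by any
number of '\r' as a single separator. The helper name split_lines is
made up here, and the sketch assumes the data is read in binary mode
('rb') or with newline='', since default text mode would already have
turned '\r\r\n' into '\n\n':

```python
import re

def split_lines(data):
    """Split bytes or str on '\n' preceded by any number of '\r'."""
    if isinstance(data, bytes):
        # Assumption: the files are plain ASCII, as described above.
        data = data.decode('ascii')
    lines = re.split(r'\r*\n', data)
    # A trailing terminator leaves one empty string at the end; drop it.
    if lines and lines[-1] == '':
        lines.pop()
    return lines
```

For example, split_lines(b'A\r\r\nB\nC\r\n') gives ['A', 'B', 'C'], so
'\r\r\n', '\r\n' and '\n' are all handled the same way. Leading spaces in
continuation lines are kept, because only the line terminators are
matched.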

George


