[Tutor] splits and pops
bob gailer
bgailer at gmail.com
Sat Jul 12 16:28:54 CEST 2008
Please reply to the list and not just me. That way we all get to
contribute and to learn.
Eric Abrahamsen wrote:
> Sorry I haven't explained this clearly, it's just one more symptom of
> my confusion... Your example has a tab between records as well as
> between fields:
That's not how I see it! Look again:
"11\t12\t13\t\n14\t15\t16\t17\t18\t19\n21\t22\t23\t24\t25\t26\t27\t28\t29"
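A quick way to check is to split the string on tabs and list the pieces that contain a newline. This is only a sketch in Python 3 syntax; the names pieces and with_newline are made up for illustration:

```python
fullstring = "11\t12\t13\t\n14\t15\t16\t17\t18\t19\n21\t22\t23\t24\t25\t26\t27\t28\t29"

# Split on the "safe" delimiter only.
pieces = fullstring.split('\t')

# Collect (index, piece) for every piece that contains a newline.
with_newline = [(i, p) for i, p in enumerate(pieces) if '\n' in p]
print(with_newline)  # [(3, '\n14'), (8, '19\n21')]
```

The newline at index 3 sits inside the fourth field (there is a tab on either side of that field), so it is an embedded newline. Only the piece at index 8 joins the last field of one record to the first field of the next.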
> my text file had tabs only between fields, and only a newline between
> records.
>
> The test string I was practicing with was this:
>
> test = 'one\ttwo\tthree\nfour\tfive\tsix'
>
> split on tabs produced this:
>
> test = ['one', 'two', 'three\nfour', 'five', 'six']
>
> My loop (breaking test[2] on '\n') worked fine with this test, which
> was what confused me. I only realized what the problem was when I
> tried it on a test like this:
>
> test = ['one', 'two', 'three\nfour', 'five', 'six', 'seven\neight',
> 'nine']
>
> That showed me that I needed to step one extra item, in order to reach
> the next item that needed to be split. My brain still hurts.
>
>
> E
>
> On Jul 12, 2008, at 9:44 PM, bob gailer wrote:
>
>> Eric Abrahamsen wrote:
>>> I have a horribly stupid text parsing problem that is driving me
>>> crazy, and making me think my Python skills have a long, long way to
>>> go...
>>>
>>> What I've got is a poorly-thought-out SQL dump, in the form of a text
>>> file, where each record is separated by a newline, and each field in
>>> each record is separated by a tab. BUT, and this is what sinks me,
>>> there are also newlines within some of the fields. Newlines are not
>>> 'safe' – they could appear anywhere – but tabs are 'safe' – they
>>> only appear as field delimiters.
>>>
>>> There are nine fields per record. All I can think to do is read the
>>> file in as a string, then split on tabs. That gives me a list where
>>> every eighth item is a string like this: u'last-field\nfirst-field'.
>>> Now I want to iterate through the list of strings, taking every
>>> eighth item, splitting it on '\n', and replacing it with the two
>>> resulting strings. Then I'll have the proper flat list where every
>>> nine list items constitutes one complete record, and I'm good to go
>>> from there.
>>>
>>> I've been fooling around with variations on the following (assuming
>>> splitlist = fullstring.split('\t')):
>>>
>>> for x in xrange(8, sys.maxint, 8):
>>>     try:
>>>         splitlist[x:x] = splitlist.pop(x).split('\n')
>>>     except IndexError:
>>>         break
>>>
>>> The first line correctly steps over all the list items that need to
>>> be split, but I can't come up with a line that correctly replaces
>>> those list items with the two strings I want. Either the cycle goes
>>> off and splits the wrong strings, or I get nested list items, which
>>> is not what I want. Can someone please point me in the right
>>> direction here?
>> I tried a simple case with fullstring =
>> "11\t12\t13\t\n14\t15\t16\t17\t18\t19\n21\t22\t23\t24\t25\t26\t27\t28\t29"
>>
>> Your spec is a little vague: "each field in each record is separated
>> by a tab". I assumed that to mean "fields in each record are
>> separated by tabs".
>> The result was ['11', '12', '13', '\n14', '15', '16', '17', '18',
>> '19', '21', '22', '23', '24', '25', '26', '27', '28', '29']
>> which I had expected.
>>
>> Give us an example of text for which it does not work.
>
>
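For the case described above (tabs only between fields, one newline between records), each replacement makes the list one item longer, so the next item to split is nine positions further on, not eight. Here is a minimal sketch of that idea in Python 3 syntax. The function name flatten_fields and the nfields parameter are made up for illustration. It assumes the dump is well formed and that the first field of a record never contains a newline, so the last newline in each boundary item is the record separator:

```python
def flatten_fields(fullstring, nfields=9):
    """Return a flat list of fields, nfields per record (hypothetical helper)."""
    splitlist = fullstring.split('\t')
    x = nfields - 1  # index of the first 'last-field\nfirst-field' item
    # Stop before the final item: it is the last field of the last record
    # and does not join two records.
    while x < len(splitlist) - 1:
        # Assumption: the LAST newline is the record separator, so any
        # earlier newlines stay inside the preceding record's last field.
        splitlist[x:x + 1] = splitlist[x].rsplit('\n', 1)
        x += nfields  # the list grew by one, so step by nfields
    return splitlist


fullstring = "11\t12\t13\t\n14\t15\t16\t17\t18\t19\n21\t22\t23\t24\t25\t26\t27\t28\t29"
flat = flatten_fields(fullstring)

# Group the flat list into records of nine fields each.
records = [flat[i:i + 9] for i in range(0, len(flat), 9)]
print(len(records))   # 2
print(records[0][3])  # the embedded newline survives: prints a blank line, then 14
```

With three fields per record, flatten_fields('one\ttwo\tthree\nfour\tfive\tsix\nseven\teight\tnine', nfields=3) gives nine separate strings. If a boundary item has no newline at all, the list does not grow and later records will be misaligned, so real data would need a check for that.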
--
Bob Gailer
919-636-4239 Chapel Hill, NC