[Tutor] splits and pops

bob gailer bgailer at gmail.com
Sat Jul 12 16:28:54 CEST 2008


Please reply to the list and not just me. That way we all get to 
contribute and to learn.

Eric Abrahamsen wrote:
> Sorry I haven't explained this clearly, it's just one more symptom of 
> my confusion... Your example has a tab between records as well as 
> between fields: 

That's not how I see it! Look again:

"11\t12\t13\t\n14\t15\t16\t17\t18\t19\n21\t22\t23\t24\t25\t26\t27\t28\t29"
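
A quick check, sketched in Python 3 syntax, of what splitting that
string on tabs gives. The name "items" is only for illustration. The
newline after 13 is the start of the fourth field, and the newline
that separates the two records ends up inside item 8:

```python
fullstring = "11\t12\t13\t\n14\t15\t16\t17\t18\t19\n21\t22\t23\t24\t25\t26\t27\t28\t29"

# Tabs are the only field delimiter, so split on them first.
items = fullstring.split("\t")

print(len(items))      # number of items before any newline handling
print(repr(items[3]))  # a field that merely starts with a newline
print(repr(items[8]))  # last field of record 1 joined to first field of record 2
```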

> my text file had tabs only between fields, and only a newline between 
> records.
>
> The test string I was practicing with was this:
>
> test = 'one\ttwo\tthree\nfour\tfive\tsix'
>
> split on tabs produced this:
>
> test = ['one', 'two', 'three\nfour', 'five', 'six']
>
> My loop (breaking test[2] on '\n') worked fine with this test, which 
> was what confused me. I only realized what the problem was when I 
> tried it on a test like this:
>
> test = ['one', 'two', 'three\nfour', 'five', 'six', 'seven\neight', 
> 'nine']
>
> That showed me that I needed to step one extra item, in order to reach 
> the next item that needed to be split. My brain still hurts.
>
>
> E
>
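
The stepping issue described above can be sketched like this (Python 3
syntax). Each replacement makes the list one item longer, so the next
joined item is a full record (nine items) further on, not eight. The
function name split_records and the choice to split on the LAST
newline are assumptions for illustration only: they presume the first
field of a record never contains a newline, which the original
description does not say.

```python
def split_records(fullstring, fields=9):
    """Return a list of records, each a list of `fields` strings.

    Assumption (not from the original post): the first field of every
    record contains no newline, so the last newline in a joined item
    is the record separator.
    """
    items = fullstring.split("\t")
    x = fields - 1                      # index of the first joined item
    while x < len(items) - 1:           # the final item is not joined to anything
        last_field, first_field = items[x].rsplit("\n", 1)
        items[x:x + 1] = [last_field, first_field]  # one item becomes two
        x += fields                     # list grew by one, so step a full record
    # Chunk the flat list into records of `fields` items each.
    return [items[i:i + fields] for i in range(0, len(items), fields)]


records = split_records("one\ttwo\tthree\nfour\tfive\tsix\nseven\teight\tnine", fields=3)
for record in records:
    print(record)
```

If a joined item has no newline at all, the unpacking raises
ValueError, which at least flags a malformed record instead of
silently shifting the fields.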
> On Jul 12, 2008, at 9:44 PM, bob gailer wrote:
>
>> Eric Abrahamsen wrote:
>>> I have a horribly stupid text parsing problem that is driving me 
>>> crazy, and making me think my Python skills have a long, long way to 
>>> go...
>>>
>>> What I've got is a poorly-thought-out SQL dump, in the form of a text 
>>> file, where each record is separated by a newline, and each field in 
>>> each record is separated by a tab. BUT, and this is what sinks me, 
>>> there are also newlines within some of the fields. Newlines are not 
>>> 'safe' – they could appear anywhere – but tabs are 'safe' – they 
>>> only appear as field delimiters.
>>>
>>> There are nine fields per record. All I can think to do is read the 
>>> file in as a string, then split on tabs. That gives me a list where 
>>> every eighth item is a string like this: u'last-field\nfirst-field'. 
>>> Now I want to iterate through the list of strings, taking every 
>>> eighth item, splitting it on '\n', and replacing it with the two 
>>> resulting strings. Then I'll have the proper flat list where every 
>>> nine list items constitutes one complete record, and I'm good to go 
>>> from there.
>>>
>>> I've been fooling around with variations on the following (assuming 
>>> splitlist = fullstring.split('\t')):
>>>
>>> for x in xrange(8, sys.maxint, 8):
>>>    try:
>>>        splitlist[x:x] = splitlist.pop(x).split('\n')
>>>    except IndexError:
>>>        break
>>>
>>> The first line correctly steps over all the list items that need to 
>>> be split, but I can't come up with a line that correctly replaces 
>>> those list items with the two strings I want. Either the cycle goes 
>>> off and splits the wrong strings, or I get nested list items, which 
>>> is not what I want. Can someone please point me in the right 
>>> direction here?
>> I tried a simple case with fullstring = 
>> "11\t12\t13\t\n14\t15\t16\t17\t18\t19\n21\t22\t23\t24\t25\t26\t27\t28\t29" 
>>
>> Your spec is a little vague: "each field in each record is separated 
>> by a tab". I assumed that to mean "fields in each record are 
>> separated by tabs".
>> The result was ['11', '12', '13', '\n14', '15', '16', '17', '18', 
>> '19', '21', '22', '23', '24', '25', '26', '27', '28', '29']
>> which I had expected.
>>
>> Give us an example of text for which it does not work.
>>>
>>>
>>
>>
>> -- 
>> Bob Gailer
>> 919-636-4239 Chapel Hill, NC
>>
>
>


-- 
Bob Gailer
919-636-4239 Chapel Hill, NC