[Tutor] text processing lines variable content

Mark Lawrence breamoreboy at gmail.com
Wed Feb 6 13:07:10 EST 2019


On 06/02/2019 16:33, ingo janssen wrote:
> For parsing the output of the Voro++ program and writing the data to a 
> POV-Ray include file I created a bunch of functions.
> 
> def pop_left_slice(inputlist, length):
>    outputlist = inputlist[0:length]
>    del inputlist[:length]
>    return outputlist

That's going to do a lot of work slicing and dicing the input lists. 
Perhaps a chunked recipe like this 
https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.chunked 
would be better.
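more_itertools.chunked splits an iterable into pieces of one fixed size. That fits if every chunk on a line has the same width. Here the widths vary, so consuming fields from a single iterator may fit better. Below is a minimal sketch using only the standard library. It assumes the fields of a line are read strictly left to right. The helper is a variant of the take() recipe from the itertools docs, with its arguments swapped to match pop_left_slice. The sample line is made up.

```python
from itertools import islice

def take(fields, n):
    """Return the next n items of the iterator `fields` as a list.

    The iterator just advances; unlike del inputlist[:length], the
    remaining items are never shifted to the front of a list.
    """
    return list(islice(fields, n))

fields = iter("7 1.0 2.0 3.0 0.5".split())  # one iterator per input line
(label,) = take(fields, 1)
x, y, z = take(fields, 3)
(radius,) = take(fields, 1)
```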

> 
> This is used by every function to chop off the required part of the input 
> line.
> Two examples of the functions that process a chopped-off slice of the line 
> and append the data to the appropriate list.
> 
> def f_vector(outlist):
>    x,y,z = pop_left_slice(line,3)
>    outlist.append(f"<{x},{y},{z}>,")
> 
> def f_vector_array(outlist, length):
>    rv = pop_left_slice(line, length)
>    rv = [f'<{i[1:-1]}>' for i in rv]  #i format is: '(1.234,2.345,3.456)'
>    rv = ",".join(rv)
>    outlist.append(f"  //label: {lbl}\n  array[{length}]"+"{\n "+rv+"\n  }\n")
> 
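f_vector and f_vector_array read `line` and `lbl` as globals, which ties them to the loop further down. The sketch below passes everything in as arguments and returns the formatted string. The caller then does the outlist.append(...). It assumes returning a string is acceptable in place of appending inside the function. The names pov_vector and pov_vector_array are made up for this example. The output format is copied from the functions above.

```python
def pov_vector(x, y, z):
    """Format three coordinates as a POV-Ray vector with a trailing comma,
    as f_vector does."""
    return f"<{x},{y},{z}>,"

def pov_vector_array(items, label):
    """Format items like '(1.234,2.345,3.456)' as a labelled POV-Ray array.

    Same output as f_vector_array, but the label and the items are
    arguments, and the length is taken from the items themselves.
    """
    vectors = ",".join(f"<{item[1:-1]}>" for item in items)  # '(..)' -> '<..>'
    return f"  //label: {label}\n  array[{len(items)}]{{\n {vectors}\n  }}\n"

points = []
points.append(pov_vector("1.0", "2.0", "3.0"))
```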
> Every line can contain up to 21 data chunks. Within one file each line 
> contains the same number of chunks, but it varies between files. The 
> types of chunks vary and their position varies. I know beforehand how a 
> line in a file is constructed. I'd like to adapt the order in which the 
> functions are applied, but how?

I suspect that you're trying to overcomplicate things; what's wrong 
with a simple if/elif chain, a switch based on a dict or similar?
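For example, here is a sketch of the dict-based switch. It assumes each file's line layout can be written down as a list of chunk-type names. The names "vector" and "value" and the layout lists are hypothetical. The dict maps a chunk type to the function that parses it, and the per-file list sets the order in which those functions are applied. Every parser consumes its fields from one shared iterator, so no slicing is needed.

```python
from itertools import islice

def parse_value(fields):
    """Consume one field and return it as a float."""
    return float(next(fields))

def parse_vector(fields):
    """Consume three fields and format them as a POV-Ray vector."""
    x, y, z = islice(fields, 3)
    return f"<{x},{y},{z}>"

# Dict-based switch: chunk type -> parser function (stored, not called).
PARSERS = {
    "value": parse_value,
    "vector": parse_vector,
}

def parse_line(line, layout):
    """Apply the parsers in the order given by `layout`."""
    fields = iter(line.split())  # shared by all parsers, consumed left to right
    return [PARSERS[kind](fields) for kind in layout]

# Two files with the same chunk types in a different order.
print(parse_line("1.0 2.0 3.0 0.5", ["vector", "value"]))  # ['<1.0,2.0,3.0>', 0.5]
print(parse_line("0.5 1.0 2.0 3.0", ["value", "vector"]))  # [0.5, '<1.0,2.0,3.0>']
```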

> 
> for i, line in enumerate(open("vorodat.vol",'r')):
>    points = i+1

enumerate takes a start argument (enumerate(..., start=1)), so you shouldn't need the above line.
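With start=1 the line count comes straight from the loop variable. The sketch below uses io.StringIO as a stand-in for open("vorodat.vol"), so it runs without the file. With the real file, a `with open("vorodat.vol") as vorodat:` block would close it afterwards. Also, line.split() with no argument splits on any run of whitespace and ignores leading and trailing whitespace, so the separate strip() isn't needed.

```python
import io

# Stand-in for open("vorodat.vol"), so the example runs without the file.
vorodat = io.StringIO("1 0.5 0.5 0.5\n2 1.5 0.5 0.5\n")

points = 0  # stays 0 if the input is empty
for points, line in enumerate(vorodat, start=1):
    # split() with no argument drops the newline and any extra spaces.
    fields = line.split()

print(points)  # 2
```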

>    line = line.strip()
>    line = line.split(" ")
>    lbl = f_label(label)
>    f_vector(point)

Presumably the above is points?

>    f_value(radius)
>    v=f_number(num_vertex)
>    f_vector_array(rel_vertex,v)
>    f_vector_array(glob_vertex,v)
>    f_value_array(vertex_orders,v)
>    f_value(max_radius)
>    e=f_number(num_edge)
>    f_value(edge_dist)
>    ...etc
> 
> I thought about putting the functions in a dict and then create a list 
> with the proper order, but can't get it to work.

Please show us your code and exactly why it didn't work.
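One wrinkle a plain dict of functions has to handle is that some chunks, like the vertex arrays, need a count parsed earlier on the same line (v = f_number(num_vertex)). The sketch below assumes each layout entry can be written as a hypothetical (name, kind, count_name) tuple. Every parser takes the shared field iterator plus a length, and the loop looks the length up among the values already parsed. The chunk names are borrowed from the code above. The layout format and the sample line are illustrative only.

```python
from itertools import islice

# All parsers share one signature; n is unused by the single-field ones.
def parse_count(fields, n):
    """Consume one field holding a count, e.g. num_vertex."""
    return int(next(fields))

def parse_value(fields, n):
    """Consume one field holding a float, e.g. max_radius."""
    return float(next(fields))

def parse_vector_array(fields, n):
    """Consume n fields shaped like '(1.234,2.345,3.456)'."""
    return [f"<{t[1:-1]}>" for t in islice(fields, n)]

PARSERS = {
    "count": parse_count,
    "value": parse_value,
    "vector_array": parse_vector_array,
}

# Hypothetical layout for one file: (name, kind, name of the count it needs).
LAYOUT = [
    ("num_vertex", "count", None),
    ("rel_vertex", "vector_array", "num_vertex"),
    ("max_radius", "value", None),
]

def parse_line(line, layout):
    fields = iter(line.split())
    parsed = {}
    for name, kind, count_name in layout:
        # Arrays get their length from a count parsed earlier on this line.
        n = parsed[count_name] if count_name else None
        parsed[name] = PARSERS[kind](fields, n)
    return parsed

result = parse_line("2 (1,2,3) (4,5,6) 0.75", LAYOUT)
```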

> 
> A second question: all this works for small files with hundreds of 
> lines, but some have 100000. Then I can get at most 22 lists with 100000 
> items. Not fun. I tried writing the data to a file "out of sequence", 
> not fun either. What would be the way to do this?
> I thought about writing each data chunk to a proper temporary file 
> instead of putting it in a list first. This would require at most 22 temp 
> files and then a merge of the files into one.

I'm not absolutely sure what you're saying here, but would something 
like the SortedList from 
http://www.grantjenks.com/docs/sortedcontainers/ help?
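For comparison, here is a sketch of the temporary-file approach ingo describes, in place of SortedList. It assumes each input line can be turned into one already formatted string per output section. The function name write_sections and the "// name" section headers are illustrative. Rows are streamed into one tempfile.TemporaryFile per section, and at the end the sections are copied into the output in order. Memory then holds one row at a time instead of up to 22 lists of 100000 items.

```python
import io
import shutil
import tempfile

def write_sections(rows, section_names, out):
    """Stream each row's chunks into one temp file per section, then
    concatenate the sections into `out` in section order.

    `rows` yields one sequence of formatted strings per input line, in the
    same order as `section_names` (zip() silently drops any extras).
    """
    temps = [tempfile.TemporaryFile(mode="w+") for _ in section_names]
    try:
        for row in rows:
            for temp, text in zip(temps, row):
                temp.write(text)
        for name, temp in zip(section_names, temps):
            out.write(f"// {name}\n")
            temp.seek(0)  # rewind before copying the section out
            shutil.copyfileobj(temp, out)
    finally:
        for temp in temps:
            temp.close()

# rows could equally be a generator reading the .vol file line by line.
out = io.StringIO()
rows = [["<1,2,3>,\n", "0.5,\n"], ["<4,5,6>,\n", "0.7,\n"]]
write_sections(rows, ["points", "radius"], out)
```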

> 
> TIA,
> 
> ingo


-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


