CSV methodology

Terry Reedy tjreedy at udel.edu
Sun Sep 14 14:42:52 EDT 2014


On 9/14/2014 12:56 PM, jayte wrote:
> On Sun, 14 Sep 2014 03:02:12 -0400, Terry Reedy <tjreedy at udel.edu> wrote:
>
>> On 9/13/2014 9:34 PM, jetrn at newsguy.com wrote:
>
> [...]
>
>> First you need to think about (and document) what your numbers mean and
>> how they should be organized for analysis.
>>
>>> An example of the data:
>>> 1.850358651774470E-0002
>>
>> Why is this so smaller than the next numbers.  Are all those digits
>> significant, or are they mostly just noise -- and best dropped by
>> rounding the number to a few significant digits.
>
> Sorry, I neglected to mention the values' significance.  The MXP program
> uses the "distance estimate" algorithm in its fractal data generation.  The
> values are thus, for each point in a 1778 x 1000 image:
>
> Distance,   (an extended double)
> Iterations,  (a 16 bit int)
> zc_x,        (a 16 bit int)
> zc_y         (a 16 bit int)

> (Durring the "orbit" calculations, the result of each iteration will be positive
> or negative.  The "zc" is a "zero crossing" count, or frequency, and is
> used in the eventual coloring algorithm.  "Distance" can range from zero
> to very small.)
>
>>> 32
>>> 22
>>> 27

If you can output Distance as an 8 byte double in IEEE 754 binary64 
format, you could read a binary file with Python using the struct module.
https://docs.python.org/3/library/struct.html#module-struct
This would be the fastest way for output and subsequent input.

If you want a text file either for human readability or because you 
cannot write a readable binary file, each line would have the 4 fields 
listed above (with image row/col implied by line number). I strongly 
suggest a fixed column format with spaces between the fields.  It might 
be easiest to put the float as the end instead of the beginning. 
Allowing max iterations = 999, your first line would look like

  32  22  27 1.850358651774470E-0002

for line in file
     iternum, zc_x, zc_y, distance = line.split()

This will be faster for both you and the machine to read.

> Initially, I tried exporting the raw binary (hex) data,

'hex' usually means a hex string representation of the raw binary, so I 
am not sure what you actually did.

 > and no matter
> what I tried, could not get Python to read it in any useful way (which
> I attribute to my lack of knowledge in Python)

Again, see struct module.  Your mention of 'zooming' implies that you 
might want to write and read multiple multi-megabyte images in a single 
session.  If so, true binary would be best.

Another suggestion is to use the numpy package to read your image files 
into binary arrays (rather than arrays of Python number objects).  This 
would be much faster (and use less memory). Numpy arrays are 'standard' 
within Pythonland and can be used by scipy and other programs to display 
and analyze the arrays.

A third idea is to make your generator directly callable from Python. 
If you can put it in a dll, you could access it with ctypes.  If you can 
wrap it in a C program, you could use cython to make a Python extension 
modules.

Now I have probably given you too much to think about, so time to stop ;-).

-- 
Terry Jan Reedy




More information about the Python-list mailing list