Data structure for plotting monotonically expanding data set

Peter J. Holzer hjp-python at hjp.at
Thu May 27 13:03:53 EDT 2021


On 2021-05-27 11:28:11 +0200, Loris Bennett wrote:
> I currently have around 3 years' worth of files like
> 
>   home.20210527
>   home.20210526
>   home.20210525
>   ...
> 
> so around 1000 files, each of which contains information about data
> usage in lines like
> 
>   name    kb
>   alice   123
>   bob     4
>   ...
>   zebedee 9999999
> 
> (there are actually more columns).  I have about 400 users and the
> individual files are around 70 KB in size.
> 
> Once a month I want to plot the historical usage as a line graph for the
> whole period for which I have data for each user.
[...]
> Obviously I will want to extract all the data for all users from a file
> once I have opened it.  After looping over all files I would naively end
> up with, say, a nested dict like
> 
>     {"20210527": { "alice" : 123, ..., "zebedee": 9999999},
>      "20210526": { "alice" : 123, "bob" : 3, ..., "zebedee": 9},
>      "20210525": { "alice" : 123, "bob" : 1, ..., "zebedee": 9999999},
>      "20210524": { "alice" : 123, ..., "zebedee": 9},
>      "20210523": { "alice" : 123, ..., "zebedee": 9999999},
>      ...}
> 
> where the user keys would vary over time as accounts, such as 'bob', are
> added and later deleted.
> 
> Is creating a potentially rather large structure like this the best way
> to go (I obviously could limit the size by, say, only considering the
> last 5 years)?

I don't think that would be a problem. However, I assume that you want
to create one graph per user, not a single graph with 400 lines (that
would be very cluttered). So I would swap the levels around:

{
    "alice": { "20210527": 123, "20210526": 123, ... },
    "bob":   { "20210526": 3, "20210525": 1, ... },
    "zebedee": { "20210527": 9999999, "20210526": 9, ... }
}

That way you have the data for each graph grouped together.
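
A minimal sketch of building that structure, assuming each file has a
"name kb" header line followed by whitespace-separated columns with the
user name first and the size second. The function names parse_usage and
collect_by_user are made up for illustration, and the snapshots list
stands in for reading the real home.YYYYMMDD files:

```python
from collections import defaultdict


def parse_usage(lines):
    """Yield (name, kb) pairs from lines like 'alice   123'."""
    for line in lines:
        fields = line.split()
        # Assumption: skip blank lines and the "name kb" header row.
        if len(fields) < 2 or fields[0] == "name":
            continue
        yield fields[0], int(fields[1])


def collect_by_user(snapshots):
    """Build {user: {date_string: kb}} from (date_string, lines) pairs."""
    usage = defaultdict(dict)
    for date_string, lines in snapshots:
        for name, kb in parse_usage(lines):
            usage[name][date_string] = kb
    return dict(usage)


# Stand-in for the contents of home.20210526 and home.20210527.
snapshots = [
    ("20210526", ["name    kb", "alice   123", "bob     3"]),
    ("20210527", ["name    kb", "alice   123"]),
]
usage = collect_by_user(snapshots)
```

A user who is missing from a file simply has no entry for that date, so
accounts that come and go need no special handling.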

It might also be a good idea to use actual date objects instead of
strings.
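
As a rough sketch of that idea, assuming the date is always the part of
the file name after the last dot and is in YYYYMMDD form (the helper
name date_from_filename is hypothetical):

```python
from datetime import date, datetime


def date_from_filename(filename):
    """Turn a name like 'home.20210527' into a datetime.date."""
    # Assumption: the date is everything after the last dot.
    suffix = filename.rsplit(".", 1)[1]
    return datetime.strptime(suffix, "%Y%m%d").date()


usage_by_day = {
    date_from_filename("home.20210527"): 123,
    date_from_filename("home.20210525"): 123,
}

# Date keys sort chronologically and support arithmetic, so a missing
# day shows up as a gap instead of being silently skipped.
days = sorted(usage_by_day)
gap = days[1] - days[0]
```

Plotting libraries generally know how to put date objects on a time
axis, which strings like "20210527" would not give you.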

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp at hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"