[TriPython] TriPython October 2020 Online Meeting: Merging Traffic Ahead

Calloway, Chris cbc at unc.edu
Mon Oct 26 14:28:19 EDT 2020


https://www.meetup.com/tripython/events/274045807/

Thursday, October 29, 2020
6:00 PM to 8:00 PM EDT

This will be our second-ever online bimonthly TriPython meeting. It will consist of a featured presentation by our very own Mark Hutchinson. His presentation 'heapq is neither Heap nor Stack' will focus on a recent intriguing data matching problem that had to be solved using as little CPU and memory as possible. This will be followed by impromptu lightning talks. The meeting will be conducted via Zoom. Please RSVP for this event on meetup.com to view the Zoom link.

Title(s):
Merging Traffic Ahead
Merge is not Zip
Hey, merge(), your assumptions annoy me
heapq is neither Heap nor Stack

Abstract:
I faced a data matching problem in which the program had to consume as little CPU and memory as possible. In other words, it had to be inconspicuous and unobtrusive. The data was in two rather large files, so in-memory processing was out.

I looked at the heapq.merge() function, but by default it only works for inputs that are already sorted in ascending order. This presentation will follow my journey to use heapq.merge() in ways that I didn't think possible. I was also able to expand the flexibility of the sort()/sorted() functions.
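
As a small illustration of the kind of thing the abstract hints at (this is not taken from the presentation itself, and the sample data is made up): since Python 3.5, heapq.merge() accepts key and reverse arguments, so it can merge inputs that are sorted in descending order or sorted on one field of a record.

```python
import heapq

# Hypothetical sample data: two inputs, each already sorted in DESCENDING order.
a = [9, 7, 4, 1]
b = [8, 7, 3]

# reverse=True tells merge() that every input is sorted largest-first.
merged_desc = list(heapq.merge(a, b, reverse=True))

# key= makes merge() compare records on a chosen field, much like sorted().
left = [("apple", 3), ("cherry", 1)]
right = [("banana", 2), ("date", 5)]
merged_by_name = list(heapq.merge(left, right, key=lambda rec: rec[0]))
```

Note that merge() never sorts anything itself. It assumes each input is already ordered consistently with the key and reverse arguments, and it returns a lazy iterator, so only one item per input needs to be held in memory at a time.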

Featured Libraries:
heapq
itertools
pandas
pandas.DataFrame.sort_values
pandas.DataFrame.drop_duplicates
pandas.DataFrame.to_csv

Featured Data Structures:
Classes
Lists

Performance Topics:
Space vs. time trade-off
I/O vs. memory trade-off
Measuring the memory of Python data structures
Chunking and buffering
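
For readers new to the memory-measurement topic, here is a minimal sketch using only the standard library. The helper name shallow_and_deep_size and the sample list are hypothetical, and the presentation may well use a different approach. The point is that sys.getsizeof() reports only the container itself, not the objects it refers to.

```python
import sys

def shallow_and_deep_size(items):
    """Return (size of the list object alone, size including its elements).

    The "deep" figure here goes only one level down, which is enough
    for a flat list of strings or numbers.
    """
    shallow = sys.getsizeof(items)
    deep = shallow + sum(sys.getsizeof(item) for item in items)
    return shallow, deep

# Hypothetical sample data: 1000 short strings.
records = [str(n) * 10 for n in range(1000)]
shallow, deep = shallow_and_deep_size(records)
print(f"list only: {shallow} bytes, list plus elements: {deep} bytes")
```

The exact numbers depend on the Python version and platform, so none are shown here.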

Computer Science Topics:
Search (look-up)
Internal vs. External sorts
Multi-column sorting
Merge Sort
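
To give a flavor of how these topics fit together, the following is a rough sketch of an external merge sort with a multi-column key. It is an assumption about the general technique, not the speaker's code; the function name external_sort, the chunk_size parameter, and the sample rows are all made up for illustration.

```python
import heapq
import itertools
import tempfile

def external_sort(lines, chunk_size=1000, key=None):
    """Yield newline-terminated lines in sorted order.

    Only chunk_size lines are sorted in memory at once. Each sorted
    chunk (a "run") is written to a temporary file, and the runs are
    then combined lazily with heapq.merge().
    """
    runs = []
    source = iter(lines)
    while True:
        chunk = sorted(itertools.islice(source, chunk_size), key=key)
        if not chunk:
            break
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(chunk)
        run.seek(0)  # rewind so the run can be read back from the start
        runs.append(run)
    try:
        yield from heapq.merge(*runs, key=key)
    finally:
        for run in runs:
            run.close()

def by_count_then_name(line):
    """Multi-column key: second CSV column first, then the first column."""
    name, count = line.rstrip("\n").split(",")
    return (count, name)

# Hypothetical sample rows in "name,count" form.
rows = ["b,2\n", "a,2\n", "c,1\n", "a,1\n", "d,3\n"]
result = list(external_sort(rows, chunk_size=2, key=by_count_then_name))
```

The trade-off is extra disk I/O in exchange for a small, bounded memory footprint.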

--
Sincerely,

Chris Calloway
Applications Analyst
University of North Carolina
Renaissance Computing Institute
(919) 599-3530
