memory leak with re.match

Albert-Jan Roskam sjeik_appie at hotmail.com
Wed Jul 5 03:52:12 EDT 2017


From: Python-list <python-list-bounces+sjeik_appie=hotmail.com at python.org> on behalf of Mayling ge <maylinge0903 at gmail.com>
Sent: Tuesday, July 4, 2017 9:01 AM
To: python-list
Subject: memory leak with re.match
    
   Hi,

   My function handles a file line by line, in the following way. There are
   multiple error patterns defined that need to be applied to each line. I use
   multiprocessing.Pool to handle the file in blocks.

   The memory usage increases to 2 GB for a 1 GB file and stays at 2 GB even
   after the file has been processed. The file is closed at the end.

   If I comment out the call to re_pat.match, memory usage is normal and
   stays under 100 MB.

   Am I using re in a wrong way? I cannot figure out a way to fix the memory
   leak, and I have googled for it.

   def line_match(lines, errors):

  <snip>

           lines = list(itertools.islice(fo, line_per_proc))

===> do you really need to listify the iterator?
           if not lines:

               break

           result = p.apply_async(line_match, args=(errors, lines))

===> the signature of line_match is (lines, errors), but in args you pass (errors, lines), so the arguments are swapped
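
===> to make both points concrete, here is a rough, untested sketch of how I would wire it up. scan_file, line_per_proc and the pattern handling inside line_match are my own guesses at what your elided code does, not your actual code:

    import itertools
    import multiprocessing
    import re

    def line_match(lines, errors):
        # compile each error pattern once per chunk instead of on every line
        patterns = [re.compile(err) for err in errors]
        matched = []
        for line in lines:
            for pat in patterns:
                if pat.match(line):
                    matched.append(line)
                    break
        return matched

    def scan_file(path, errors, line_per_proc=100000):
        async_results = []
        with open(path) as fo, multiprocessing.Pool() as pool:
            while True:
                # one chunk per task; islice keeps only line_per_proc lines in
                # memory at a time (the list is what gets pickled to the worker)
                lines = list(itertools.islice(fo, line_per_proc))
                if not lines:
                    break
                # keyword arguments make the (lines, errors) order explicit
                async_results.append(
                    pool.apply_async(line_match,
                                     kwds={'lines': lines, 'errors': errors}))
            pool.close()
            pool.join()
        return [res.get() for res in async_results]

On Windows you would also need the usual "if __name__ == '__main__':" guard around the Pool creation.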



