memory leak with re.match

Mayling ge maylinge0903 at gmail.com
Wed Jul 5 04:36:04 EDT 2017


   Sorry. The code here is just pseudo code meant to describe the issue, so
   please forgive any typos. I list the lines because I need the line context.
   On 07/05/2017 15:52, [1]Albert-Jan Roskam wrote:

     From: Python-list
     <python-list-bounces+sjeik_appie=hotmail.com at python.org> on behalf of
     Mayling ge <maylinge0903 at gmail.com>
     Sent: Tuesday, July 4, 2017 9:01 AM
     To: python-list
     Subject: memory leak with re.match

        Hi,

        My function, shown below, handles a file line by line. There are
        multiple error patterns defined that need to be applied to each line.
        I use multiprocessing.Pool to handle the file in blocks.

        Memory usage increases to 2 GB for a 1 GB file, and stays at 2 GB even
        after the file has been processed. The file is closed at the end.

        If I comment out the call to re_pat.match, memory usage is normal and
        stays under 100 MB.

        Am I using re in a wrong way? I cannot figure out a way to fix the
        memory leak, and I have googled for it.

        def line_match(lines, errors):

       <snip>
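
      For context, a minimal sketch of what the snipped body might look like,
      assuming errors is a list of precompiled re.Pattern objects (this is an
      assumption, not the original code):

          def line_match(lines, errors):
              # Apply every precompiled error pattern to each line and
              # collect the lines that match any of them.
              matched = []
              for line in lines:
                  for re_pat in errors:
                      if re_pat.match(line):
                          matched.append(line)
                          break
              return matched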

                lines = list(itertools.islice(fo, line_per_proc))

      ===> do you really need to listify the iterator?

                if not lines:
                    break

                result = p.apply_async(line_match, args=(errors, lines))

      ===> the signature of line_match is (lines, errors), but in args you pass
      (errors, lines)
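
      A corrected driver along those lines might look like the sketch below,
      with the arguments passed in the order the signature declares
      (line_per_proc, fo and p are the names used in the snippet above; the
      surrounding function and its parameters are assumed):

          import itertools
          import multiprocessing
          import re

          def process_file(path, patterns, line_per_proc=100000, workers=4):
              # Compile the error patterns once; compiled patterns are
              # picklable, so they can be shipped to the worker processes.
              errors = [re.compile(pat) for pat in patterns]
              pending = []
              matched = []
              with multiprocessing.Pool(workers) as p, open(path) as fo:
                  while True:
                      lines = list(itertools.islice(fo, line_per_proc))
                      if not lines:
                          break
                      # args must follow the signature: line_match(lines, errors)
                      pending.append(p.apply_async(line_match,
                                                   args=(lines, errors)))
                  for r in pending:
                      # Only the matched lines are collected in the parent.
                      matched.extend(r.get())
              return matched

      If building each block as a list is itself the concern, Pool.imap with a
      chunksize (for example p.imap(match_one_line, fo, chunksize=10000), where
      match_one_line is a hypothetical helper that checks one line against the
      patterns) lets the pool pull lines straight from the file object without
      the parent holding a whole block at once.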

References

   Visible links
   1. mailto:sjeik_appie at hotmail.com


