[Tutor] Python - help with something most essential

Abdur-Rahmaan Janhangeer arj.python at gmail.com
Mon Jun 12 04:21:24 EDT 2017


I might add that using

with open( . . .

instead of

foo = open( . . .

also shows some maturity in Python.
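
A minimal sketch of the difference, assuming a throwaway temporary file in
place of a real data file. The names read_lines and count_lines_manual are
hypothetical and do not come from the scripts quoted below.

```python
import os
import tempfile


def read_lines(filename):
    # The with block closes the file as soon as the generator is
    # exhausted or closed, instead of waiting for garbage collection.
    with open(filename) as f:
        for line in f:
            yield line.strip()


def count_lines_manual(filename):
    # The same guarantee without "with": close() has to be guarded by hand.
    f = open(filename)
    try:
        return sum(1 for _ in f)
    finally:
        f.close()


# Hypothetical sample data, written to a temporary file for the demo.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("123\n321\n456\n")

lines = list(read_lines(path))
n = count_lines_manual(path)
os.remove(path)

print(lines)
print(n)
```

Both functions release the file handle deterministically; the with form
simply needs less code to get there.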

Abdur-Rahmaan Janhangeer,
Mauritius
abdurrahmaanjanhangeer.wordpress.com

On 11 Jun 2017 12:33, "Peter Otten" <__peter__ at web.de> wrote:

> Japhy Bartlett wrote:
>
> > I'm not sure that they cared about how you used file.readlines(), I think
> > the memory comment was a hint about instantiating Counter()s
>
> Then they would have been clueless ;)
>
> Both Schtvveer's original script and his subsequent "Verschlimmbesserung"
> (a beautiful German word for making things worse when trying to improve
> them) use only two Counters at any given time. The second version is very
> inefficient because it builds the same Counter over and over again, but
> this does not affect peak memory usage much.
>
> Here's the original version that triggered the comment:
>
> [Schtvveer Schvrveve]
>
> > import sys
> > from collections import Counter
> >
> > def main(args):
> >     filename = args[1]
> >     word = args[2]
> >     print countAnagrams(word, filename)
> >
> > def countAnagrams(word, filename):
> >
> >     fileContent = readFile(filename)
> >
> >     counter = Counter(word)
> >     num_of_anagrams = 0
> >
> >     for i in range(0, len(fileContent)):
> >         if counter == Counter(fileContent[i]):
> >             num_of_anagrams += 1
> >
> >     return num_of_anagrams
> >
> > def readFile(filename):
> >
> >     with open(filename) as f:
> >         content = f.readlines()
> >
> >     content = [x.strip() for x in content]
> >
> >     return content
> >
> > if __name__ == '__main__':
> >     main(sys.argv)
> >
>
> referenced as before.py below, and here's a variant that removes
> readlines(), range(), and the [x.strip() for x in content] list
> comprehension, the goal being minimal changes, not code as I would write it
> from scratch.
>
> # after.py
> import sys
> from collections import Counter
>
> def main(args):
>     filename = args[1]
>     word = args[2]
>     print countAnagrams(word, filename)
>
> def countAnagrams(word, filename):
>
>     fileContent = readFile(filename)
>     counter = Counter(word)
>     num_of_anagrams = 0
>
>     for line in fileContent:
>         if counter == Counter(line):
>             num_of_anagrams += 1
>
>     return num_of_anagrams
>
> def readFile(filename):
>     # this relies on garbage collection to close the file
>     # which should normally be avoided
>     for line in open(filename):
>         yield line.strip()
>
> if __name__ == '__main__':
>     main(sys.argv)
>
> How to measure memory usage? I found
> <https://stackoverflow.com/questions/774556/peak-memory-usage-of-a-linux-unix-process>
> and as test data I use files containing 10**5 and 10**6 integers. With that
> setup (snipping everything but memory usage from the time -v output):
>
> $ /usr/bin/time -v python before.py anagrams5.txt 123
> 6
>         Maximum resident set size (kbytes): 17340
> $ /usr/bin/time -v python before.py anagrams6.txt 123
> 6
>         Maximum resident set size (kbytes): 117328
>
>
> $ /usr/bin/time -v python after.py anagrams5.txt 123
> 6
>         Maximum resident set size (kbytes): 6432
> $ /usr/bin/time -v python after.py anagrams6.txt 123
> 6
>         Maximum resident set size (kbytes): 6432
>
> See the pattern? before.py uses O(N) memory, after.py O(1).
>
> Run your own tests if you need more datapoints or prefer a different method
> to measure memory consumption.
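
One such different method, sketched under assumptions: Python 3's
tracemalloc module reports the peak of Python-level allocations, which is
not the same number as the resident set size shown by time -v, but it
should show the same pattern. The functions count_eager, count_lazy and
peak_bytes are hypothetical stand-ins for before.py and after.py, and the
test file is assumed to hold the integers 0 to 10**5 - 1, one per line.

```python
import os
import tempfile
import tracemalloc
from collections import Counter


def count_eager(word, filename):
    # Like before.py: read every line into a list before counting.
    with open(filename) as f:
        content = [line.strip() for line in f.readlines()]
    target = Counter(word)
    return sum(1 for line in content if Counter(line) == target)


def count_lazy(word, filename):
    # Like after.py: look at one line at a time.
    target = Counter(word)
    with open(filename) as f:
        return sum(1 for line in f if Counter(line.strip()) == target)


def peak_bytes(func, *args):
    # Run func and return its result plus the peak traced allocation size.
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Assumed test data: the integers 0 .. 10**5 - 1, one per line.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    for i in range(10**5):
        out.write("%d\n" % i)

eager_result, eager_peak = peak_bytes(count_eager, "123", path)
lazy_result, lazy_peak = peak_bytes(count_lazy, "123", path)
os.remove(path)

print("eager: count=%d peak=%d bytes" % (eager_result, eager_peak))
print("lazy:  count=%d peak=%d bytes" % (lazy_result, lazy_peak))
```

The exact byte counts depend on the interpreter version and platform, so
only the relative sizes of the two peaks are meaningful.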

