[Tutor] String within a string solution (newbie question)

Matt Ruffalo matt.ruffalo at gmail.com
Thu Oct 27 12:04:14 EDT 2016


On 10/26/2016 02:06 PM, Wish Dokta wrote:
> Hello,
>
> I am currently writing a basic program to calculate and display the size of
> folders within a drive/directory. To do this I am storing each directory in a
> dict as the key, with the value being the sum of the size of all files in
> that directory (but not its subdirectories).
>
> For example:
>
> { "C:\\docs" : 10, "C:\\docs123" : 200, "C:\\docs\\code\\snippets" : 5,
> "C:\\docs\\code" : 20, "C:\\docs\\pics" : 200, "C:\\docs\\code\\python" :
> 10  }
>
> Then to return the total size of a directory I am searching for a string in
> the key:
>
> For example:
>
> for "C:\\docs\\code" in key:
>
> Which works fine and will return "C:\\docs\\code" : 20,
> "C:\\docs\\code\\snippets" : 5, "C:\\docs\\code\\python" : 10 = (35)
>
> However it fails when I try to calculate the size of a directory such as
> "C:\\docs", as it also returns "C:\\docs123".
>
> I'd be very grateful if anyone could offer any advice on how to correct
> this.

Hello-

As you saw in your current approach, using strings for paths can be
problematic in a lot of scenarios. I've found it really useful to use a
higher-level abstraction instead, like what is provided by pathlib in
the standard library. You're obviously using Windows, and you didn't
mention your Python version, so I'll assume you're using something
current like 3.5.2 (at least 3.4 is required for the following code).
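
To make the problem concrete, here is a minimal sketch of why the substring
test gives a false positive and how a path object avoids it. The variable
names are only for illustration, and I'm using PureWindowsPath here so the
snippet behaves the same on any operating system:

```python
from pathlib import PureWindowsPath

top = "C:\\docs"
other = "C:\\docs123"

# Plain substring test: "C:\docs" is a prefix of "C:\docs123", so this
# is True even though docs123 is a sibling, not a subdirectory.
substring_match = top in other

# Path objects compare whole components ('docs' vs. 'docs123'), so the
# sibling is no longer mistaken for a child.
path_match = PureWindowsPath(top) in PureWindowsPath(other).parents

# A real subdirectory still has C:\docs among its parents.
child_match = PureWindowsPath(top) in PureWindowsPath("C:\\docs\\code").parents

print(substring_match, path_match, child_match)  # True False True
```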

You could do something like the following:

"""
from pathlib import PureWindowsPath

# From your example
sizes_str_keys = {
    "C:\\docs": 10,
    "C:\\docs123": 200,
    "C:\\docs\\code\\snippets": 5,
    "C:\\docs\\code": 20,
    "C:\\docs\\pics": 200,
    "C:\\docs\\code\\python": 10,
}

# Same dict, but with Path objects as keys, and the same sizes as values.
# You would almost definitely want to use Path in your code (and adjust
# the 'pathlib' import appropriately), but I'm on a Linux system so I had
# to use a PureWindowsPath instead.
sizes_path_keys = {PureWindowsPath(p): s for (p, s) in sizes_str_keys.items()}

def filter_paths(size_dict, top_level_directory):
    for path in size_dict:
        # Given some directory we're examining (e.g. c:\docs\code\snippets),
        # and top-level directory (e.g. c:\docs), we want to yield this
        # directory if it exactly matches (of course) or if the top-level
        # directory is a parent of what we're looking at:

        # >>> pprint(list(PureWindowsPath("C:\\docs\\code\\snippets").parents))
        # [PureWindowsPath('C:/docs/code'),
        #  PureWindowsPath('C:/docs'),
        #  PureWindowsPath('C:/')]

        # so in that case we'll find 'c:\docs' in iterating over path.parents.

        # You'll definitely want to remove the 'print' calls too:
        if path == top_level_directory or top_level_directory in path.parents:
            print('Matched', path)
            yield path
        else:
            print('No match for', path)

def compute_subdir_size(size_dict, top_level_directory):
    total_size = 0
    for dir_key in filter_paths(size_dict, top_level_directory):
        total_size += size_dict[dir_key]
    return total_size
"""

Then you could call 'compute_subdir_size' like so:

"""
>>> compute_subdir_size(sizes_path_keys, PureWindowsPath(r'c:\docs'))
Matched C:\docs\code\snippets
No match for C:\docs123
Matched C:\docs\code\python
Matched C:\docs\pics
Matched C:\docs\code
Matched C:\docs
245
>>> compute_subdir_size(sizes_path_keys, PureWindowsPath(r'c:\docs\code'))
Matched C:\docs\code\snippets
No match for C:\docs123
Matched C:\docs\code\python
No match for C:\docs\pics
Matched C:\docs\code
No match for C:\docs
35
"""

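Once the 'print' calls are removed, the two functions above can be collapsed
into one. This is only a sketch of one way to do it, using the same example
sizes and the same matching rule (exact match, or the top-level directory
appearing in path.parents); the names 'sizes', 'total_docs' and 'total_code'
are just for illustration:

```python
from pathlib import PureWindowsPath

def compute_subdir_size(size_dict, top_level_directory):
    # Sum the sizes of the directory itself and every directory below it.
    return sum(
        size
        for path, size in size_dict.items()
        if path == top_level_directory or top_level_directory in path.parents
    )

# Same example data as before, keyed by path objects.
sizes = {
    PureWindowsPath(p): s
    for p, s in {
        "C:\\docs": 10,
        "C:\\docs123": 200,
        "C:\\docs\\code\\snippets": 5,
        "C:\\docs\\code": 20,
        "C:\\docs\\pics": 200,
        "C:\\docs\\code\\python": 10,
    }.items()
}

total_docs = compute_subdir_size(sizes, PureWindowsPath(r"c:\docs"))
total_code = compute_subdir_size(sizes, PureWindowsPath(r"c:\docs\code"))
print(total_docs, total_code)  # 245 35
```

The lowercase 'c:' still matches because Windows-flavoured paths compare
case-insensitively.
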
MMR...

