Changing strings in files

Cameron Simpson cs at cskk.id.au
Tue Nov 10 02:37:54 EST 2020


On 10Nov2020 07:24, Manfred Lotz <ml_news at posteo.de> wrote:
>I have a situation where in a directory tree I want to change a certain
>string in all files where that string occurs.
>
>My idea was to do
>
>- os.scandir and for each file

Use os.walk for trees. scandir does a single directory.
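Unlike a single os.scandir call, os.walk visits every directory below the top. Here is a minimal sketch of a walk over a whole tree. The helper name iter_file_paths is hypothetical. Note the order of the tuple os.walk yields, and that the filenames it gives you are bare names that need joining to dirpath:

```python
import os

def iter_file_paths(top_dirpath):
    """Yield the path of every file below top_dirpath, recursively."""
    # os.walk yields (dirpath, dirnames, filenames) for each directory visited
    for dirpath, dirnames, filenames in os.walk(top_dirpath):
        for filename in filenames:
            # filenames are bare names; join with dirpath to get a usable path
            yield os.path.join(dirpath, filename)
```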

>   - check if a file is a text file

This requires reading the entire file. You want to check that it 
consists entirely of lines of text in your expected text encoding. 
These days UTF-8 is the common default, but getting the encoding right 
is essential if you want to recognise text. So, as a first cut (totally 
untested):

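    import os

    # top_dirpath is the root of the tree to scan
    # note the tuple order: os.walk yields (dirpath, dirnames, filenames)
    for dirpath, dirnames, filenames in os.walk(top_dirpath):
        for filename in filenames:
            pathname = os.path.join(dirpath, filename)
            is_text = False
            try:
                # expect utf-8, fail if non-utf-8 bytes encountered
                with open(pathname, encoding='utf-8', errors='strict') as f:
                    for lineno, line in enumerate(f, 1):
                        # ... other checks on each line of the file ...
                        if not line.endswith('\n'):
                            raise ValueError("line %d: no trailing newline" % lineno)
                        # beware: isprintable() treats tabs as non-printable
                        if not line[:-1].isprintable():
                            raise ValueError("line %d: not all printable" % lineno)
                # if we get here all checks passed, consider the file to
                # be text
                is_text = True
            except (OSError, ValueError) as e:
                # UnicodeDecodeError is a subclass of ValueError
                print(pathname, "not text", e)
            if not is_text:
                print("skip", pathname)
                continue
            # ... otherwise, replace the string in pathname ...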

You could add all sorts of other checks. "text" is a loosely defined 
idea. But you could assert: all these lines decoded cleanly, so I can't 
do much damage rewriting them.
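One example of adjusting those checks, as a hedged sketch: str.isprintable() returns False for a tab character, so a plain isprintable() test rejects tab-indented files, which may not be what you want. The helper line_is_printable and its allowed parameter are hypothetical names:

```python
def line_is_printable(line, allowed='\t'):
    """Return True if every character of line, apart from a trailing
    newline, is printable or appears in allowed."""
    # strip the newline the file iterator leaves on each line
    body = line[:-1] if line.endswith('\n') else line
    # a bare body.isprintable() would reject tabs; let the allowed chars through
    return all(ch.isprintable() or ch in allowed for ch in body)
```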

>   - if it is not a text file skip that file
>   - change the string as often as it occurs in that file

You could, in the loop above, gather up all the lines of the file in a 
list. If the file passes all the checks, replace your string in the 
list and, if anything was changed, rewrite the file from the list of 
lines.
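A minimal sketch of that read, replace, rewrite-if-changed step, assuming UTF-8 and that the file has already passed the text checks. replace_in_file is a hypothetical name. Using newline='' on both open() calls turns off newline translation, so CRLF and LF line endings are written back exactly as they were read. Because replacement is done per line, an old string that itself contains a newline will never match:

```python
def replace_in_file(path, old, new, encoding='utf-8'):
    """Replace every occurrence of old with new in path.

    Returns True if the file was rewritten, False if old did not occur.
    Raises UnicodeDecodeError if the file does not decode as encoding.
    """
    # newline='' keeps line endings untranslated on read
    with open(path, encoding=encoding, errors='strict', newline='') as f:
        lines = list(f)
    new_lines = [line.replace(old, new) for line in lines]
    if new_lines == lines:
        return False  # nothing to change: leave the file (and mtime) alone
    # newline='' again so the original line endings are written back as-is
    with open(path, 'w', encoding=encoding, newline='') as f:
        f.writelines(new_lines)
    return True
```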

>What is the best way to check if a file is a text file? In a script I
>could use the `file` command which is not ideal as I have to grep the
>result.

Not to mention relying on `file`, which (a) has a simplistic idea of 
text and (b) only looks at the start of each file, not at the whole 
content. Very dodgy.
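To make point (b) concrete, a hedged sketch: a strict decode of just a prefix can succeed while the whole file fails. The 4096-byte prefix is an arbitrary illustrative size, not the amount `file` actually reads, and first_bad_offset is a hypothetical helper:

```python
def first_bad_offset(data, encoding='utf-8'):
    """Return the offset of the first undecodable byte, or None."""
    try:
        data.decode(encoding, errors='strict')
    except UnicodeDecodeError as e:
        return e.start
    return None

# 10000 lines of 17 ASCII bytes, then one byte that is never valid UTF-8
data = b"plain ascii text\n" * 10000 + b"\xff\n"
print(first_bad_offset(data[:4096]))  # None: the prefix looks like text
print(first_bad_offset(data))         # 170000: the bad byte is near the end
```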

If you're really batch editing files, you could (a) put everything into 
a VCS (e.g. hg or git) so you can roll back changes, (b) work on a copy 
of your directory tree, or (c) just print the "text" filenames to stdout 
and pipe them into GNU parallel, invoking "sed -i.bak s/this/that/g" to 
batch edit the checked files while keeping a backup of each.
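If you would rather stay in Python than shell out to sed, something roughly like the -i.bak behaviour can be sketched by copying the original aside before rewriting it. write_with_backup and its suffix parameter are hypothetical names. This version overwrites the file in place after the copy, so it is not atomic:

```python
import shutil

def write_with_backup(path, new_text, encoding='utf-8', suffix='.bak'):
    """Copy path to path + suffix, then overwrite path with new_text.

    Returns the backup path.
    """
    backup = path + suffix
    # copy2 preserves the original contents plus metadata such as mtime
    shutil.copy2(path, backup)
    # newline='' writes new_text's line endings exactly as given
    with open(path, 'w', encoding=encoding, newline='') as f:
        f.write(new_text)
    return backup
```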

Cheers,
Cameron Simpson <cs at cskk.id.au>

