Changing strings in files

Cameron Simpson cs at cskk.id.au
Tue Nov 10 17:55:26 EST 2020


On 11Nov2020 07:25, Chris Angelico <rosuav at gmail.com> wrote:
>If the main job of the program, as in this situation, is to read the
>entire file, I would probably have it read in the first 1KB or 16KB or
>thereabouts, see if that has any NUL bytes, and if not, proceed to
>read in the rest of the file. But depending on the situation, I might
>actually have a hard limit on the file size (say, "any file over 1GB
>isn't what I'm looking for"), so that would reduce the risks too.
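
For reference, the sniff test described above might look roughly like this. It is only a sketch: the function name should_skip and the default sizes (16KB sniff, 1GB cap) are made up for illustration, taken from the figures in the quoted text.

```python
import os

def should_skip(filename, sniff_size=16 * 1024, max_size=1024 ** 3):
    """Return True if the file is too big or its first block contains a NUL."""
    # Hard size limit: anything larger is "not what I'm looking for".
    if os.path.getsize(filename) > max_size:
        return True
    # Read only the leading block in binary mode and look for a NUL byte.
    with open(filename, 'rb') as f:
        return b'\0' in f.read(sniff_size)
```

A False result only means the first block looked harmless; the rest of the file still has to be read and checked.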

You could adapt my suggested code to do this efficiently.

It had a loop body like this:

         is_text = False
         try:
             # expect utf-8, fail if non-utf-8 bytes encountered
             with open(filename, encoding='utf-8', errors='strict') as f:
                 for lineno, line in enumerate(f, 1):
                     # ... other checks on each line of the file ...
                     if not line.endswith('\n'):
                         raise ValueError("line %d: no trailing newline" % lineno)
                     if not str.isprintable(line[:-1]):
                         raise ValueError("line %d: not all printable" % lineno)
                 # if we get here all checks passed, consider the file to
                 # be text
                 is_text = True
         except Exception as e:
             print(filename, "not text", e)
         if not is_text:
             print("skip", filename)
             continue

which scans the entire file to see if it is all text (criteria to be 
changed to suit the user, but I was going for clean strict utf-8 decode, 
all chars "printable").  Since we're doing that, we could accumulate the 
lines as we went and make the replacement in memory. If we get all the 
way out the bottom, rewrite the file.
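
A rough sketch of that in-memory variant, reusing the same checks. The function name replace_in_text_file and the old/new parameters are hypothetical, and it assumes a plain str.replace is the substitution wanted:

```python
def replace_in_text_file(filename, old, new):
    """Replace old with new if the whole file passes the text checks.

    Returns True if the file was rewritten, False otherwise.
    """
    lines = []
    changed = False
    try:
        # newline='' disables newline translation, so line endings
        # round-trip unchanged when we write the lines back.
        with open(filename, encoding='utf-8', errors='strict', newline='') as f:
            for lineno, line in enumerate(f, 1):
                if not line.endswith('\n'):
                    raise ValueError("line %d: no trailing newline" % lineno)
                if not line[:-1].isprintable():
                    raise ValueError("line %d: not all printable" % lineno)
                new_line = line.replace(old, new)
                if new_line != line:
                    changed = True
                lines.append(new_line)
    except (OSError, ValueError) as e:
        # UnicodeDecodeError is a subclass of ValueError
        print(filename, "not text", e)
        return False
    if not changed:
        return False
    # Every check passed and something changed: rewrite the file.
    with open(filename, 'w', encoding='utf-8', newline='') as f:
        f.writelines(lines)
    return True
```

Nothing is written until the whole file has been scanned, so a file that fails a check halfway through is left untouched.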

If memory is a concern, we could copy modified lines to a temporary 
file, and copy back if everything was good (or not if we make no 
replacements).
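
Something like the following would do for the temporary file variant. Again only a sketch: replace_via_tempfile and its old/new parameters are hypothetical names, and the checks are the same ones as in the loop body above.

```python
import shutil
import tempfile

def replace_via_tempfile(filename, old, new):
    """Stream modified lines to a temporary file, copy back if all is good.

    Returns True if the file was rewritten, False otherwise.
    """
    changed = False
    # The temporary file is removed automatically when the with block exits.
    with tempfile.TemporaryFile('w+', encoding='utf-8', newline='') as tmp:
        try:
            with open(filename, encoding='utf-8', errors='strict',
                      newline='') as f:
                for lineno, line in enumerate(f, 1):
                    if not line.endswith('\n'):
                        raise ValueError(
                            "line %d: no trailing newline" % lineno)
                    if not line[:-1].isprintable():
                        raise ValueError(
                            "line %d: not all printable" % lineno)
                    new_line = line.replace(old, new)
                    if new_line != line:
                        changed = True
                    tmp.write(new_line)
        except (OSError, ValueError) as e:
            print(filename, "not text", e)
            return False
        if not changed:
            # no replacements made, leave the original alone
            return False
        # Rewind and copy the modified text back over the original.
        tmp.seek(0)
        with open(filename, 'w', encoding='utf-8', newline='') as f:
            shutil.copyfileobj(tmp, f)
    return True
```

Only one line is held in memory at a time, at the cost of writing the data twice when a replacement is made.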

Cheers,
Cameron Simpson <cs at cskk.id.au>

