[Tutor] problem with back slash

Wed Feb 23 16:41:36 EST 2022

On 23Feb2022 11:37, Alex Kleider <alexkleider at gmail.com> wrote:
>I've written myself a little utility that accepts a text file which
>might have very long lines and returns a file with the same text but
>(as much as possible) with the lines no longer than MAX_LEN
>characters. (I've chosen 70.)
>It seems to work except when the source file contains back
>slashes! (Presence of a back slash appears to cause the program
>to go into an endless loop.)

There's _nothing_ in your code which cares about backslashes.

>I've tried converting to raw strings but to no avail.

I have no idea what you mean here - all the strings you're manipulating 
come from the text file. They're just "strings". A Python "raw string" 
is just a _syntactic_ way to express a string in a programme, eg:

    r'some regexp maybe \n foo'

After that's evaluated, it is just a string.
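
As a small illustration (the strings below are made up for the example, they are not from your file), the r prefix only changes how the literal is parsed. The result is an ordinary str, the same as the one you would get by escaping the backslash yourself or by reading the characters from a file:

```python
# A raw literal and an escaped literal that spell the same characters.
raw = r'some regexp maybe \n foo'
escaped = 'some regexp maybe \\n foo'

same_value = (raw == escaped)       # both hold a backslash followed by 'n'
same_type = (type(raw) is str)      # there is no separate "raw string" type
backslash_count = raw.count('\\')   # exactly one backslash character

print(same_value, same_type, backslash_count)
```

So there is nothing to "convert" once the text has been read in.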

>Here's the code, followed by an example source file.

Thank you. This shows the bug. Here's me running it:

    [~/tmp/p1]fleet2*> py3 foo.py input.txt
    line: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935
    '
    writing: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935'
    remaining: ''
    line: '
    '
    writing: ''
    remaining: ''
    line: 'You can activate Windows 10 using the product key for your hardware which
    '

    ^CTraceback (most recent call last):
      File "/Users/cameron/tmp/p1/foo.py", line 63, in <module>
        line2write, line = split_on_space_closest_to_max_len(
      File "/Users/cameron/tmp/p1/foo.py", line 47, in split_on_space_closest_to_max_len
        if next_space > max_len: break
    KeyboardInterrupt

It hung just before the traceback, where I interrupted it with ^C.

Now, that tells me where in the code it was - the programme is not hung, 
it is spinning. When interrupted it was in this loop:

    while True:
        next_space = unindented_line.find(' ', i_space+1)
        if next_space > max_len: break
        else: i_space =next_space

On the face of it, that loop should always advance i_space and therefore 
exit. But find() can return -1:

    >>> help(str.find)
    Help on method_descriptor:

    find(...)
        S.find(sub[, start[, end]]) -> int

        Return the lowest index in S where substring sub is found,
        such that sub is contained within S[start:end].  Optional
        arguments start and end are interpreted as in slice notation.

        Return -1 on failure.

i.e. when there is no space from the search point onward. So this could 
spin out. Let's see with modified code:

    print("LOOP1")
    while True:
        assert ' ' in unindented_line[i_space+1:], (
            "no space in unindented_line[i_space(%d)+1:]: %r"
            % (i_space, unindented_line[i_space+1:])
        )
        next_space = unindented_line.find(' ', i_space+1)
        if next_space > max_len: break
        else: i_space =next_space
    print("LOOP1 DONE")

thus:

    [~/tmp/p1]fleet2*> py3 foo.py input.txt
    line: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935
    '
    writing: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935'
    remaining: ''
    line: '
    '
    writing: ''
    remaining: ''
    line: 'You can activate Windows 10 using the product key for your hardware which
    '
    LOOP1
    Traceback (most recent call last):
      File "/Users/cameron/tmp/p1/foo.py", line 69, in <module>
        line2write, line = split_on_space_closest_to_max_len(
      File "/Users/cameron/tmp/p1/foo.py", line 47, in split_on_space_closest_to_max_len
        assert ' ' in unindented_line[i_space+1:], (
    AssertionError: no space in unindented_line[i_space(67)+1:]: 'which'

As suspected. Commenting out the assert and printing next_space shows 
the cycle, with this code:

    print("LOOP1")
    while True:
        ##assert ' ' in unindented_line[i_space+1:], (
        ##    "no space in unindented_line[i_space(%d)+1:]: %r"
        ##    % (i_space, unindented_line[i_space+1:])
        ##)
        next_space = unindented_line.find(' ', i_space+1)
        print("next_space =", next_space)
        if next_space > max_len: break
        else: i_space =next_space
    print("LOOP1 DONE")

which outputs this:

    [~/tmp/p1]fleet2*> py3 foo.py input.txt 2>&1 | sed 50q
    line: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935
    '
    writing: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935'
    remaining: ''
    line: '
    '
    writing: ''
    remaining: ''
    line: 'You can activate Windows 10 using the product key for your hardware which
    '
    LOOP1
    next_space = 7
    next_space = 16
    next_space = 24
    next_space = 27
    next_space = 33
    next_space = 37
    next_space = 45
    next_space = 49
    next_space = 53
    next_space = 58
    next_space = 67
    next_space = -1
    next_space = 3
    next_space = 7
    next_space = 16
    next_space = 24
    next_space = 27
    next_space = 33

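and so on indefinitely. You can see next_space reset to -1: because -1 
is not greater than max_len the loop does not break, i_space becomes -1, 
and the next find() starts again from index 0.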
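
One possible way to stop the spin, as a sketch only: treat -1 from find() as a reason to leave the loop. The function name last_space_within and the starting value of i_space are my own assumptions for the example, not taken from your programme:

```python
def last_space_within(text, max_len):
    """Return the index of the last space at or before max_len, or -1 if none.

    Hypothetical helper: same loop as above, plus a check for find() failing.
    """
    i_space = -1
    while True:
        next_space = text.find(' ', i_space + 1)
        # find() returns -1 when no further space exists: stop then too,
        # otherwise i_space would be reset and the search would restart.
        if next_space == -1 or next_space > max_len:
            break
        i_space = next_space
    return i_space


line = 'You can activate Windows 10 using the product key for your hardware which'
print(last_space_within(line, 70))   # 67
```

If I have the indices right, text.rfind(' ', 0, max_len + 1) should give the same answer without a loop at all.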

While I'm here, some random remarks about the code:

>    original_line = line[:]

There's no need for this. Because strings are immutable, you can just 
go:

    original_line = line

All the other operations on "line" return new strings (because strings 
are immutable), leaving original_line untouched.
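
A quick sketch of that (the sample text is invented for the example):

```python
line = "    indented text\n"
original_line = line      # no copy needed: just a second name for the same str
line = line.lstrip()      # returns a new string and rebinds the name "line"
line = line.rstrip()      # likewise

print(repr(original_line))   # '    indented text\n'
print(repr(line))            # 'indented text'
```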

>    unindented_line = line.lstrip()
>    n_leading_spaces = line_length - len(unindented_line)
>    if n_leading_spaces > max_len:  # big indentation!!!
>        return ('', line[max_len:])
>    indentation = ' ' * n_leading_spaces

Isn't this also line[:n_leading_spaces]? I would be inclined to use 
that in case the whitespace isn't just spaces (eg TABs), because 
"line.lstrip()" strips leading whitespace, not just leading spaces. 
This would preserve whatever was there.

However your code is focussed on the space character, so maybe a more 
precise lstrip() would be better:

    line.lstrip(' ')

stripping only space characters.
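
To show the difference between the two, a small sketch with an invented line that starts with a TAB:

```python
line = "\t  text after a tab and two spaces"

stripped_all = line.lstrip()         # removes the tab and both spaces
stripped_spaces = line.lstrip(' ')   # removes nothing: the line starts with a tab

n_leading = len(line) - len(stripped_all)
indentation = line[:n_leading]       # the real leading whitespace, tab included

print(repr(stripped_all))
print(repr(stripped_spaces))
print(repr(indentation))             # '\t  '
```

Slicing the original line keeps the tab, whereas ' ' * n_leading would quietly turn it into a space.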

[...]
>    while True:
>        next_space = unindented_line.find(' ', i_space+1)
>        if next_space > max_len: break
>        else: i_space =next_space

A lot of us might write this:

    while True:
        next_space = unindented_line.find(' ', i_space+1)
        if next_space > max_len:
            break
        i_space =next_space

dropping the "else:". It is just style, but to my eye it is more clear 
that the "i_space =next_space" is an "unconditional" part of the normal 
loop iteration.

>        for line in source:
>#           line = repr(line.rstrip())

The commented out line above would damage "line" (by adding quotes and 
stuff to it), if uncommented.
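
For example, with an invented line containing backslashes, repr() wraps the text in quotes and doubles each backslash, so the string you would go on to split is no longer the text from the file:

```python
line = 'path C:\\temp\\file.txt\n'   # two real backslashes in the text

plain = line.rstrip()
mangled = repr(plain)

print(plain)     # path C:\temp\file.txt
print(mangled)   # 'path C:\\temp\\file.txt'
print(len(plain), len(mangled))
```

repr() is fine inside a debugging print, just not as a replacement for the data itself.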

>            print("line: '{}'".format(line))  # for debugging

You know you can just write this?

    print("line:", line)

Cheers,
Cameron Simpson <cs at cskk.id.au>