textwrap.dedent replaces tabs?

Sat Dec 23 14:10:33 EST 2006

Frederic Rentsch wrote:

> Following a call to dedent () it shouldn't be hard to translate leading 
> groups of so many spaces back to tabs.

Sure, but the point is more that I don't think it's valid to change to
tabs in the first place.

E.g.:

 input = (' ' + '\t' + 'hello\n' +
          '\t' + 'world')

 output = textwrap.dedent(input)

will strip all of the leading whitespace, which IMHO is a
violation of its stated function.  In this case, nothing should be
stripped, because the leading whitespace in these two lines does not
/actually/ match.  Sure, it visually matches, but that's not the point
(although I can understand that that's a point of contention in the
interpreter anyway; I would have no problem with it not accepting "1 tab
= 8 spaces" for indentation...  But that's another holy war).
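
To make the mismatch concrete, here is a rough sketch that compares the
leading whitespace of the two lines literally and again after expanding
tabs.  The helper leading_whitespace is a name made up for this
illustration, not something textwrap provides, and os.path.commonprefix
is used only because it compares strings character by character.

```python
import os.path

def leading_whitespace(line):
    # Hypothetical helper for illustration: the whitespace run that starts the line.
    return line[:len(line) - len(line.lstrip())]

text = ' ' + '\t' + 'hello\n' + '\t' + 'world'
lines = text.split('\n')

# Compare the indentation exactly as written.
literal_common = os.path.commonprefix(
    [leading_whitespace(l) for l in lines])

# Compare it again after expanding tabs to 8-column stops.
expanded_common = os.path.commonprefix(
    [leading_whitespace(l.expandtabs()) for l in lines])

print(repr(literal_common))   # ''
print(repr(expanded_common))  # '        ' (8 spaces)
```

Taken literally, the two lines share no leading whitespace at all; they
only appear to share eight columns once the tabs have been expanded.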

> If I understand your problem, you want to restore the dedented line to 
> its original composition if spaces and tabs are mixed and this doesn't 
> work because the information doesn't survive dedent ().

Sure, although would there be a case to be made to simply not strip the
tabs in the first place?

Like this, keeping current functionality and everything...  (although I
would think if someone wanted tabs expanded, they'd call expandtabs on
the input before calling the function!):

def dedent(text, expand_tabs=True):
    """dedent(text : string, expand_tabs : bool) -> string

    Remove any whitespace that can be uniformly removed from the left
    of every line in `text`, optionally expanding tabs before altering
    the text.

    This can be used e.g. to make triple-quoted strings line up with
    the left edge of screen/whatever, while still presenting it in the
    source code in indented form.

    For example:

        def test():
            # end first line with \ to avoid the empty line!
            s = '''\
             hello
            \t  world
            '''
            print repr(s)     # prints '     hello\n    \t  world\n    '
            print repr(dedent(s))  # prints ' hello\n\t  world\n'
    """
    if expand_tabs:
        text = text.expandtabs()
    lines = text.split('\n')

    margin = None
    for line in lines:
        if margin is None:
            content = line.lstrip()
            if not content:
                continue
            indent = len(line) - len(content)
            margin = line[:indent]
        elif not line.startswith(margin):
            if len(line) < len(margin):
                content = line.lstrip()
                if not content:
                    continue
            while not line.startswith(margin):
                margin = margin[:-1]

    if margin is not None and len(margin) > 0:
        margin = len(margin)
        for i in range(len(lines)):
            lines[i] = lines[i][margin:]

    return '\n'.join(lines)

import unittest

class DedentTest(unittest.TestCase):
    def testBasicWithSpaces(self):
        input = "\n   Hello\n      World"
        expected = "\nHello\n   World"
        self.assertEqual(expected, dedent(input))

    def testBasicWithTabLeadersSpacesInside(self):
        input = "\n\tHello\n\t   World"
        expected = "\nHello\n   World"
        self.assertEqual(expected, dedent(input, False))

    def testAllTabs(self):
        input = "\t\tHello\n\tWorld"
        expected = "\tHello\nWorld"
        self.assertEqual(expected, dedent(input, False))

    def testFirstLineNotIndented(self):
        input = "Hello\n\tWorld"
        expected = input
        self.assertEqual(expected, dedent(input, False))

    def testMixedTabsAndSpaces(self):
        input = "  \t Hello\n   \tWorld"
        expected = "\t Hello\n \tWorld"
        self.assertEqual(expected, dedent(input, False))

if __name__ == '__main__':
    unittest.main()
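
For comparison, the caller-side alternative mentioned earlier (expand
tabs yourself, then dedent) might look like the sketch below.  It
assumes a textwrap.dedent that compares leading whitespace literally,
which is how Python 2.5 and later are documented to behave; on a version
that expands tabs internally, both calls would give the same result.

```python
import textwrap

text = "\thello\n        world"

# Assumption: dedent() does not expand tabs itself (Python 2.5+).
# A tab and eight spaces share no literal prefix, so nothing is stripped.
as_written = textwrap.dedent(text)

# A caller who wants a tab treated as 8 columns expands tabs first.
expanded_first = textwrap.dedent(text.expandtabs())

print(repr(as_written))      # '\thello\n        world'
print(repr(expanded_first))  # 'hello\nworld'
```

Doing it this way leaves the choice with the caller, so the function
never has to guess what a tab is worth.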

-tom!