Grouping function for string module?

Tue Sep 14 05:27:08 EDT 1999

Hello all,

I started the thread below with a simple request for opinions 
about the usefulness of a string "grouping" function in the 
string module. 

Unfortunately, I did it on the wrong mailing list... so, now, 
let's go public...

Regards,

Dinu

-- 
Dinu C. Gherman

................................................................
"An average of more than 15 % of adults in 12 industrialized 
countries are functionally illiterate; in Ireland, the United 
Kingdom and the United States, the rates are over 20 %."

  (The State of the World's Children 1999,
   UNICEF, http://www.unicef.org/sowc99)

------------------------------------------------------------

"Dinu C. Gherman" wrote:
> 
> Hello,
> 
> I wonder if a grouping function would be considered useful
> by more than a few people? I have already needed it several times
> (and implemented it) and was always surprised it "wasn't
> there" as it seems quite a natural thing to me.
> 
> I think of something like this, without giving my own current
> implementation (there could be a dual function for left-grouping,
> as with lstrip/rstrip, or an additional parameter on a single
> function doing both left and right grouping):
> 
> def rgroup(str, size, sep=' '):
>     """Group a string into blocks (starting at the right).
> 
>     e.g. rgroup('11000', 4) ==> '1 1000'
>     """
> 
>     return str   # dummy implementation...
> 
> Any comments?
> 
> Dinu
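
Dinu did not post his own implementation, so the following is only a
minimal sketch of what the proposed rgroup() could look like on a
current Python 3. The size check and the slicing approach are
assumptions made for illustration, not part of the original proposal.

```python
def rgroup(s, size, sep=' '):
    """Group a string into blocks of `size`, counting from the right."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    # Characters that do not fill a whole block form a short leading block.
    head = len(s) % size
    blocks = [s[i:i + size] for i in range(head, len(s), size)]
    if head:
        blocks.insert(0, s[:head])
    return sep.join(blocks)

print(rgroup('11000', 4))         # 1 1000
print(rgroup('1234567', 3, ','))  # 1,234,567
```

The second call shows the "commas in a monetary value" use that comes
up later in the thread.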

------------------------------------------------------------

Jeff Pinyan wrote:
> 
> > Any comments?
> 
> Coming from a perl background, it sounds vaguely like a functionality of
> the pack() and unpack() functions.
> 
> Just as an exercise, I wrote up an rgroup().  Not a bad concept, Dinu.

------------------------------------------------------------

Greg Stein wrote:
> 
> I don't see much utility other than for inserting commas/periods in a
> monetary value. The function seems overly specialized. Worse: the
> parameters would expand unreasonably; people will say "but I don't want
> a space" or "I want it to count from the other side" -- two more params
> all of a sudden. Or skip the latter param and bring in Yet Another
> Function.
> 
> What other uses does this apply to? Why should this be part of the
> standard library? (other than for parity with Perl's function)
> 
> Cheers,
> -g

------------------------------------------------------------

Tim Peters wrote:
> 
> I don't know of any string language that offers this as a primitive, so am
> not surprised at its absence in Python.  struct.unpack can easily be twisted
> toward this end, and is much more flexible in catering to an arbitrary mix
> of "field widths" too.
> 
> import string
> import struct
> 
> def rgroup(str, size, sep=' '):
>     """Group a string into blocks (starting at the right).
> 
>     e.g. rgroup('11000', 4) ==> '1 1000'
>     """
> 
>     whole, leftover = divmod(len(str), size)
>     fmt = (`size` + "s") * whole
>     if leftover:
>         fmt = `leftover` + "s" + fmt
>     return string.join(struct.unpack(fmt, str), sep)
> 
> [...]
> 
> naggingly y'rs  - tim
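
Tim's code targets the Python of 1999 (backtick repr, string.join).
A rough Python 3 sketch of the same struct.unpack idea follows. It
assumes ASCII-only input, since struct works on bytes, and the name
split_fields is a hypothetical helper added here to illustrate the
"arbitrary mix of field widths" remark.

```python
import struct

def rgroup_struct(s, size, sep=' '):
    """Right-grouping via struct.unpack; assumes s is ASCII-only."""
    whole, leftover = divmod(len(s), size)
    fmt = f"{size}s" * whole
    if leftover:
        # A short leading field takes the characters that do not fill a block.
        fmt = f"{leftover}s" + fmt
    fields = struct.unpack(fmt, s.encode('ascii'))
    return sep.join(f.decode('ascii') for f in fields)

def split_fields(s, widths):
    """Hypothetical helper: cut s into fields of the given widths.

    The widths must add up to len(s), or struct.unpack raises struct.error.
    """
    fmt = ''.join(f"{w}s" for w in widths)
    return [f.decode('ascii') for f in struct.unpack(fmt, s.encode('ascii'))]

print(rgroup_struct('11000', 4))            # 1 1000
print(split_fields('19990914', [4, 2, 2]))  # ['1999', '09', '14']
```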

------------------------------------------------------------

"Dinu C. Gherman" wrote:
> 
> Greg Stein wrote:
> >
> > I don't see much utility other than for inserting commas/periods in a
> > monetary value. The function seems overly specialized. Worse: the
> > parameters would expand unreasonably; people will say "but I don't want
> > a space" or "I want it to count from the other side" -- two more params
> > all of a sudden. Or skip the latter param and bring in Yet Another
> > Function.
> 
> I needed it for formatting binary numbers expressed as Python
> strings. And it has a rather clearly defined job: grouping a
> string into blocks of a given size with an optional separator.
> Having lstrip, rstrip and strip could also be regarded as
> an inflation of functions, couldn't it? After all, rstrip and
> strip can be written in terms of lstrip (given below):
> 
> def rstrip(s):
>     r = map(None, s)
>     r.reverse()
>     r = lstrip(r) # r being a list now!
>     r.reverse()
>     return string.join(r, '')
> 
> def strip(s):
>     return lstrip(rstrip(s))
> 
> > What other uses does this apply to? Why should this be part of the
> > standard library? (other than for parity with Perl's function)
> 
> Don't get me wrong: I'm not imposing this idea on you, I just
> asked for opinions! Mine is that lgroup/rgroup would be as
> useful/convenient a pair in the standard string module as
> lstrip/rstrip.
> 
> Ok, let me ask bluntly, what good applications are there for
> a function like lstrip? Why do we need it in a standard lib
> if you can write it down in a few lines like this (assuming
> string was imported before):
> 
> def lstrip(s):
>     r = s[:]
>     # stop once r is used up, so all-whitespace input cannot raise IndexError
>     while r and r[0] in string.whitespace:
>         r = r[1:]
>     return r
> 
> Ok, let me give you another application, not to persuade
> you, but to be constructive. You could do things like this
> more easily:
> 
> >>> from string import lgroup, split
> >>> s = '0001101001011100'
> >>> for b in split(lgroup(s, 4, ' '), ' '):
> ...     print b
> ...
> 0001
> 1010
> 0101
> 1100
> >>>
> 
> Then, although I don't care much about Perl, its users seem
> to use/need/appreciate... such a function, simply because it's
> there - which means nothing, ok, except that for the same
> reason they can also just ignore it.
> 
> Regards,
> 
> Dinu

------------------------------------------------------------

"Dinu C. Gherman" wrote:
> 
> Tim Peters wrote:
> >
> > I don't know of any string language that offers this as a primitive, so am
> > not surprised at its absence in Python.  struct.unpack can easily be twisted
> > toward this end, and is much more flexible in catering to an arbitrary mix
> > of "field widths" too.
> 
> True, variable block sizes would add more flexibility, but
> then you get closer to a real parsing function, which would
> be at least some overkill for the string module, perhaps...
> 
> [...]
> 
> Dinu

------------------------------------------------------------

Michael Muller wrote:
> 
> I've been following this thread and it seems to me that a more generalized
> solution to this problem would be to overload string.split() so that the
> second parameter could be an integer indicating the maximum width of each
> substring.  So for example:
> 
>    string.split('12345678', 3)  =>  ['123', '456', '78']
> 
> Solving the problem identified here then becomes a 2-step process:
> 
>    string.join(string.split(someString, width), separator)

------------------------------------------------------------

"M.-A. Lemburg" wrote:
> 
> Michael Muller wrote:
> >
> > I've been following this thread and it seems to me that a more generalized
> > solution to this problem would be to overload string.split() so that the
> > second parameter could be an integer indicating the maximum width of each
> > substring.  So for example:
> >
> >    string.split('12345678', 3)  =>  ['123', '456', '78']
> >
> > Solving the problem identified here then becomes a 2-step process:
> >
> >    string.join(string.split(someString, width), separator)
> 
> That won't work since split() already has up to 3 arguments. How
> about adding two new functions to cut strings into even parts,
> e.g. cut and rcut:
> 
> cut(string,snippet_length)
>         Returns a list of substrings generated by splitting string
>         at even intervals of the given length. The last entry
>         may have fewer characters.
> 
> rcut(...)
>         Just like cut() except that it works from right to left.
> 
> Would be a useful addition to produce human readable output
> or to format a long HEX string into multiple lines. Anyway,
> I will probably have something like this in mxTextTools sooner or
> later...
> 
> --
> Marc-Andre Lemburg
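
The cut()/rcut() pair is only described, not implemented, in the
message above. A minimal Python 3 sketch under two assumptions: both
functions return the pieces in their original left-to-right order, and
rcut() puts the short piece first instead of last. This is not the
mxTextTools code.

```python
def cut(text, length):
    """Split text at even intervals; the last piece may be shorter."""
    return [text[i:i + length] for i in range(0, len(text), length)]

def rcut(text, length):
    """Like cut(), but measured from the right; the first piece may be shorter."""
    head = len(text) % length
    pieces = cut(text[head:], length)
    return ([text[:head]] if head else []) + pieces

print(cut('12345678', 3))   # ['123', '456', '78']
print(rcut('12345678', 3))  # ['12', '345', '678']

# Formatting a long hex string into multiple lines of 8 digits each:
print('\n'.join(cut('DEADBEEFCAFEBABE0123', 8)))
```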

------------------------------------------------------------

Skip Montanaro wrote:
> 
>     Michael> I've been following this thread and it seems to me that a more
>     Michael> generalized solution to this problem would be to overload
>     Michael> string.split() so that the second parameter could be an integer
>     Michael> indicating the maximum width of each substring.  So for
>     Michael> example:
> 
>     Michael>    string.split('12345678', 3)  =>  ['123', '456', '78']
> 
>     Michael> Solving the problem identified here then becomes a 2-step process:
> 
>     Michael>    string.join(string.split(someString, width), separator)
> 
> string.split already takes two other optional parameters (string to split on
> and max number of splits).  Why not just use re.split:
> 
>     >>> l = re.split("(...)", string.lowercase)
>     >>> map(l.remove, [""]*l.count(""))
>     [None, None, None, None, None, None, None, None]
>     >>> l
>     ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stu', 'vwx', 'yz']
> 
> ?
> 
> Skip

------------------------------------------------------------

Skip Montanaro wrote:
> 
> Actually, the more I think about it, why not encapsulate it in a function?
> 
>     import re
>     def splitn(s, n):
>         """split string s into chunks no more than n characters long"""
>         l = re.split("(.{%d,%d})" % (n,n), s)
>         map(l.remove, [""]*l.count(""))
>         return l
> 
> Skip
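
On Python 3 the map() call in splitn() is lazy, so the empty strings
would never actually be removed. One possible port, sketched here with
re.findall instead of re.split, avoids the cleanup step altogether.
The re.DOTALL flag is an addition so that newlines are kept; the
original pattern would drop them.

```python
import re

def splitn(s, n):
    """Split string s into chunks no more than n characters long."""
    # ".{1,n}" greedily takes n characters at a time; the final match
    # picks up whatever shorter remainder is left.
    return re.findall(r".{1,%d}" % n, s, flags=re.DOTALL)

print(splitn('abcdefghijklmnopqrstuvwxyz', 3))
# ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stu', 'vwx', 'yz']
```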

------------------------------------------------------------

Guido van Rossum wrote:
> 
> > Actually, the more I think about it, why not encapsulate it in a function?
> >
> >     import re
> >     def splitn(s, n):
> >         """split string s into chunks no more than n characters long"""
> >         l = re.split("(.{%d,%d})" % (n,n), s)
> >         map(l.remove, [""]*l.count(""))
> >         return l
> 
> Can we please stop this silly thread?  This will never become a
> standard function as long as I am in charge.
> 
> --Guido van Rossum (home page: http://www.python.org/~guido/)

------------------------------------------------------------

Ken Manheimer wrote:
> 
> I can't resist:
> 
> > -----Original Message-----
> > From: Skip Montanaro [mailto:skip at mojam.com]
> > Sent: Thursday, September 09, 1999 11:07 AM
> > [...]
> >     >>> l = re.split("(...)", string.lowercase)
> >     >>> map(l.remove, [""]*l.count(""))
> >     [None, None, None, None, None, None, None, None]
> >     >>> l
> >     ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stu', 'vwx', 'yz']
> 
> Here's a case where filter() is your friend - instead of the map
> expression:
> 
> >>> l = filter(None, l)
> 
> or, composing it all:
> 
> >>> l = filter(None, re.split("(...)", string.lowercase))
> 
> (I only mention this because it seems like the filter expr is a lot less
> complicated than the map you constructed.  I have no feel for the
> performance implications of this construct w.r.t., eg, an explicit loop,
> though - nor enough concern to investigate, i must say:-)
> 
> Ken
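
For anyone trying Ken's one-liner on Python 3, a small sketch of the
adjustments it needs: string.lowercase is now string.ascii_lowercase,
and filter() returns an iterator, so it has to be wrapped in list().
The list comprehension is shown only as an equivalent spelling.

```python
import re
import string

# re.split with a capturing group keeps the matched chunks, but also
# yields an empty string between consecutive matches.
pieces = re.split("(...)", string.ascii_lowercase)

chunks = list(filter(None, pieces))       # drop the empty strings
same_chunks = [p for p in pieces if p]    # equivalent comprehension

print(chunks)
# ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stu', 'vwx', 'yz']
```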

------------------------------------------------------------

Perry Stoll wrote:
> 
>     I've been following this thread and it seems to me that a more generalized
>     solution to this problem would be to overload string.split() so that the
>     second parameter could be an integer indicating the maximum width of each
>     substring.  So for example:
> 
>        string.split('12345678', 3)  =>  ['123', '456', '78']
> 
> More generalized? How about something that operates on sequence objects,
> something like the following.
> 
> -Perry
> 
> def regroup(seq, size):
>     """Return a list of sequences of length SIZE which are
> subsequences of input IN.
> 
> LIST out = regroup( SEQUENCE in , INT size)
> 
> So, for i < len(IN) / size:
> 
>    out[i] = in[ (i * size) : (i + 1) * size ]
> 
> Note, OUT[-1] will be shorter than SIZE iff len(IN) % size != 0
> 
>   >>> regroup('Pythonistas Unite!', 2)
>   ['Py', 'th', 'on', 'is', 'ta', 's ', 'Un', 'it', 'e!']
> 
>   >>> regroup(range(1,10), 2)
>   [[1, 2], [3, 4], [5, 6], [7, 8], [9]]
> 
> """
>     out = []
>     outlen = len(seq) / size
>     save = out.append                   # shortcut function lookup in loop
>     end = 0                             # initialize for first loop
>     for i in range(outlen):
>         start = end                     # start where we left off last time
>         end = end + size                # compute new end point
>         save( seq[ start : end ] )      # save the sub sequence
>     if len(seq) % size != 0:            # if there is any remaining
>         save( seq[ end : ] )             #  grab everything remaining
>     return out
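
Perry's function relies on "/" being integer division, which no longer
holds on Python 3 (range() would be handed a float). A compact sketch
of the same idea for current Python, using a stepped range so that no
division is needed; it assumes seq supports len() and slicing.

```python
def regroup(seq, size):
    """Return a list of consecutive slices of seq, each of length size.

    The last slice is shorter when len(seq) is not a multiple of size.
    """
    out = []
    for start in range(0, len(seq), size):
        out.append(seq[start:start + size])
    return out

print(regroup('Pythonistas Unite!', 2))
# ['Py', 'th', 'on', 'is', 'ta', 's ', 'Un', 'it', 'e!']

# range() is no longer a list on Python 3, so convert it first to get
# list slices back.
print(regroup(list(range(1, 10)), 2))
# [[1, 2], [3, 4], [5, 6], [7, 8], [9]]
```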

------------------------------------------------------------

Skip Montanaro wrote:
> 
>     Skip> Actually, the more I think about it, why not encapsulate it in a
>     Skip> function?
>         ...
>     Guido> Can we please stop this silly thread?  This will never become a
>     Guido> standard function as long as I am in charge.
> 
> I don't recall proposing it as a standard function.  Others proposed
> extending string.split.  I merely pointed out that re.split+map (or
> re.split+filter as Ken explained) already does what people asked for.
> Encapsulating it as a function instead of always having two somewhat
> mystical lines of code seemed to make sense.
>
> [...]
> 
> Skip

------------------------------------------------------------

"Barry A. Warsaw" wrote:
> 
> >>>>> "Guido" == Guido van Rossum <guido at cnri.reston.va.us> writes:
> 
>     Guido> This will never become a standard function as long as I am
>     Guido> in charge.
> 
> You should add a "Mr. Bond" and maniacal laugh when you say that. :)
> 
> -Barry

------------------------------------------------------------

"Dinu C. Gherman" wrote:
> 
> Guido van Rossum wrote:
> >
> > Can we please stop this silly thread?  This will never become a
> > standard function as long as I am in charge.
> 
> Bondish or not ;-), I agree this is just about the wrong place to
> discuss this, so please do ME a favour and stop contributing to this
> thread, be it silly or not!
> 
> I take all the blame for starting it here. Yes, I'm guilty,
> mea culpa, mea maxima culpa! It will never happen again! But a
> good overview of all Python mailing lists and their charters
> would, perhaps, indeed be a good idea, if it's not already there.
> 
> As some sort of self-punishment I will wrap up the contributions
> so far and post them to c.l.p., so we can all peacefully shake
> and enjoy our Martinis again...
> 
> Cheers,
> 
> Dinu