get each pair from a string.

Joshua Landau joshua.landau.ws at gmail.com
Sun Oct 21 16:50:42 EDT 2012


On 21 October 2012 19:33, Vincent Davis <vincent at vincentdavis.net> wrote:

> I am looking for a good way to get every pair from a string. For example,
> input:
> x = 'apple'
> output:
> 'ap'
> 'pp'
> 'pl'
> 'le'
>
> I am not seeing an obvious way to do this without multiple for loops, but
> maybe there is not :-)
> In the end I am going to want to get triples, quads, and so on as well.
>

The best way for *sliceable* objects is probably your way. However, not
everything you can iterate over can be sliced (generators and files, for
example).

One way is this:

Let us say you have a string:

> >>> my_string = "abcdefghijklmnopqrstuvwxyz"
> >>> my_string
> 'abcdefghijklmnopqrstuvwxyz'


If you "zip" that, you get a "zip object":

> >>> zip(my_string)
> <zip object at 0x1b67f38>


To see what is inside, turn it into a list:

> >>> list(zip(my_string))
> [('a',), ('b',), ('c',), ('d',), ('e',), ('f',), ('g',), ('h',), ('i',),
> ('j',), ('k',), ('l',), ('m',), ('n',), ('o',), ('p',), ('q',), ('r',),
> ('s',), ('t',), ('u',), ('v',), ('w',), ('x',), ('y',), ('z',)]


So why would you want "zip" anyway? Let us see what it does with two inputs.

> >>> list(zip(my_string, my_string))
> [('a', 'a'), ('b', 'b'), ('c', 'c'), ('d', 'd'), ('e', 'e'), ('f', 'f'),
> ('g', 'g'), ('h', 'h'), ('i', 'i'), ('j', 'j'), ('k', 'k'), ('l', 'l'),
> ('m', 'm'), ('n', 'n'), ('o', 'o'), ('p', 'p'), ('q', 'q'), ('r', 'r'),
> ('s', 's'), ('t', 't'), ('u', 'u'), ('v', 'v'), ('w', 'w'), ('x', 'x'),
> ('y', 'y'), ('z', 'z')]


I see. It takes one item from the first input and one item from the second,
and puts them together in a tuple. It repeats this until the shorter input
runs out.

All we want to do is offset the second input by one item:

> >>> list(zip(my_string, my_string[1:]))
> [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('e', 'f'), ('f', 'g'),
> ('g', 'h'), ('h', 'i'), ('i', 'j'), ('j', 'k'), ('k', 'l'), ('l', 'm'),
> ('m', 'n'), ('n', 'o'), ('o', 'p'), ('p', 'q'), ('q', 'r'), ('r', 's'),
> ('s', 't'), ('t', 'u'), ('u', 'v'), ('v', 'w'), ('w', 'x'), ('x', 'y'),
> ('y', 'z')]
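
The offset works without any special handling of the last character because
zip stops as soon as its shortest input is exhausted. A small sketch with the
string from the question (the names word, shifted and pairs are only for
illustration):

```python
# zip() stops as soon as its shortest argument runs out.
word = "apple"
shifted = word[1:]                  # 'pple', one character shorter than word

# The final 'e' of word has no partner in shifted, so it starts no pair.
pairs = list(zip(word, shifted))
print(pairs)   # [('a', 'p'), ('p', 'p'), ('p', 'l'), ('l', 'e')]
```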


And then convert the results to single strings:

> >>> ["".join(strs) for strs in zip(my_string, my_string[1:])]
> ['ab', 'bc', 'cd', 'de', 'ef', 'fg', 'gh', 'hi', 'ij', 'jk', 'kl', 'lm',
> 'mn', 'no', 'op', 'pq', 'qr', 'rs', 'st', 'tu', 'uv', 'vw', 'wx', 'xy',
> 'yz']


And this can be generalised to groups of any length (here 4), in a more
complicated way:

> >>> ["".join(strs) for strs in zip(*[my_string[n:] for n in range(4)])]
> ['abcd', 'bcde', 'cdef', 'defg', 'efgh', 'fghi', 'ghij', 'hijk', 'ijkl',
> 'jklm', 'klmn', 'lmno', 'mnop', 'nopq', 'opqr', 'pqrs', 'qrst', 'rstu',
> 'stuv', 'tuvw', 'uvwx', 'vwxy', 'wxyz']
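
If you want pairs, triples and quads from the same code, the expression above
could be wrapped in a small function. This is only a sketch; the name ngrams
and its parameters are made up for illustration.

```python
def ngrams(text, n):
    """Return every run of n consecutive characters in text (hypothetical helper)."""
    # One slice per offset: text, text[1:], ..., text[n-1:]
    shifted = [text[offset:] for offset in range(n)]
    # zip stops at the shortest slice, so only complete runs are produced
    return ["".join(chars) for chars in zip(*shifted)]

print(ngrams("apple", 2))   # ['ap', 'pp', 'pl', 'le']
print(ngrams("apple", 3))   # ['app', 'ppl', 'ple']
```

When n is larger than the string, the last slices are empty and the result is
an empty list.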


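With islice in place of slicing, the same idea works for iterables that
cannot be sliced (as long as they can be iterated over more than once):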

> >>> from itertools import islice
> >>> ["".join(strs) for strs in zip(*[islice(my_string, n, None)
> ...                                    for n in range(4)])]
> ['abcd', 'bcde', 'cdef', 'defg', 'efgh', 'fghi', 'ghij', 'hijk', 'ijkl',
> 'jklm', 'klmn', 'lmno', 'mnop', 'nopq', 'opqr', 'pqrs', 'qrst', 'rstu',
> 'stuv', 'tuvw', 'uvwx', 'vwxy', 'wxyz']
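
One caveat: the islice line above relies on each islice call getting a fresh
iterator from my_string. With a one-shot iterator (a generator or an open
file, say) all the islice objects would share the same underlying iterator
and the groups would come out wrong. A possible way around that, sketched
here with itertools.tee, is to make independent copies first. The helper name
windows is an assumption, not something from the standard library.

```python
from itertools import islice, tee

def windows(iterable, n):
    """Yield tuples of n consecutive items from any iterable (hypothetical helper)."""
    # tee gives n independent iterators over the same underlying stream
    iterators = tee(iterable, n)
    # advance the k-th copy by k items so the copies are staggered
    staggered = [islice(it, offset, None) for offset, it in enumerate(iterators)]
    # zip stops when the most-advanced copy runs out
    return zip(*staggered)

letters = iter("apple")   # a one-shot iterator: cannot be sliced or restarted
print(["".join(group) for group in windows(letters, 2)])
# ['ap', 'pp', 'pl', 'le']
```

Note that tee has to buffer the items between the slowest and fastest copy,
so this trades a little memory for generality.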


The islice version should be much *faster* for short groups taken from
massive strings and much *slower* for long groups taken from medium-sized
strings.
The first method *slices*, which copies part of the whole string each time.
This is slow for large copies.
The second method *loops until it reaches the start*. This is slow when the
start is a long way in.
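
If you would rather measure than take my word for it, something along these
lines could be used to compare the two on your own data. It is only a rough
sketch: the function names, the test string and the group lengths are
arbitrary choices, and the timings will vary by machine and Python version.

```python
from itertools import islice
from timeit import timeit

def by_slicing(text, n):
    # first method: n slices, each a copy of the tail of the string
    return ["".join(chars) for chars in zip(*[text[k:] for k in range(n)])]

def by_islice(text, n):
    # second method: n iterators, the k-th one skipping k items before it starts
    return ["".join(chars) for chars in zip(*[islice(text, k, None) for k in range(n)])]

text = "abcdefghij" * 1000   # 10,000 characters, an arbitrary test size
for n in (2, 50):
    # both methods must agree before the timings mean anything
    assert by_slicing(text, n) == by_islice(text, n)
    print(n,
          timeit(lambda: by_slicing(text, n), number=5),
          timeit(lambda: by_islice(text, n), number=5))
```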

However, if your items are sliceable, the best way is the one you've already
worked out, as it does no extra copying or looping.
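
For reference, the plain slicing approach might look something like the
following. The original question does not show its code, so this is my
assumption about what it looked like, and the function name is made up.

```python
def ngrams_by_index(text, n):
    """Return every slice of length n from text (hypothetical helper)."""
    # Each window is a single slice of length n starting at position i;
    # the last valid start is len(text) - n.
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(ngrams_by_index("apple", 2))   # ['ap', 'pp', 'pl', 'le']
print(ngrams_by_index("apple", 3))   # ['app', 'ppl', 'ple']
```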