[Python-es] extracting non-adjacent sequences

Medardo Rodriguez (Merchise Group) med.swl at gmail.com
Sun Apr 18 16:25:53 CEST 2010


On 4/17/10, Antonio Reyes <areyespgil at gmail.com> wrote:
> the sequences are not adjacent, ...
>
> does any of you know how I could do it?

<code>
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#----------------------------------------------------------------------
# Copyright (c) 2010 Medardo Rodriguez (Merchise)
#
# This is free software; you can redistribute it and/or modify it under
# the terms of the GNU General Public License (GPL) as published by the
# Free Software Foundation;  either version 2  of  the  License, or (at
# your option) any later version.

from __future__ import with_statement

import sys
from re import compile as _regex_compile

STEP = 3    # step (gap) to use between the paired words

regex = _regex_compile(r'\W*(\w+)')

def word_sequences(words, step=1):
    """Yield pairs of words that are `step` positions apart."""
    step = max(step, 1)    # clamp: the step must be at least 1
    for i in range(len(words) - step):
        yield (words[i], words[i + step])

def file_words(fname):
    with open(fname, 'r') as f:
        return regex.findall(f.read())

if __name__ == '__main__':
    fname = __file__ if len(sys.argv) <= 1 else sys.argv[1]
    print('Reading from: %s' % fname)
    words = file_words(fname)
    print('The file contains %s words.' % len(words))
    seqs = list(word_sequences(words, step=STEP))
    print('List of %s sequences:\n%s' % (len(seqs), seqs))
</code>
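
To make the effect of the step concrete, here is a minimal sketch of the same pairing written with zip and run on a hard-coded sentence. The helper name pairs_with_gap and the sample text are only illustrative assumptions; they are not part of the script above.

```python
import re

# Same pattern as in the script: capture each run of word characters.
WORD_RE = re.compile(r'\W*(\w+)')

def pairs_with_gap(words, step=1):
    """Return (words[i], words[i + step]) pairs (hypothetical helper)."""
    step = max(step, 1)  # a step below 1 is treated as 1
    # zip stops at the shorter argument, so the last `step` words
    # never start a pair.
    return list(zip(words, words[step:]))

sample = 'one two three four five six'  # made-up input
words = WORD_RE.findall(sample)
print(pairs_with_gap(words, step=3))
# [('one', 'four'), ('two', 'five'), ('three', 'six')]
```

With step=1 the same call would give the usual adjacent pairs, so the step is the only thing that controls how far apart the two words are.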

Regards


