Line segments, overlap, and bits

Istvan Albert istvan.albert at gmail.com
Thu Mar 27 11:54:58 EDT 2008


On Mar 26, 5:28 pm, Sean Davis <seand... at gmail.com> wrote:
> I am working with genomic data.  Basically, it consists of many tuples
> of (start,end) on a line.  I would like to convert these tuples of
> (start,end) to a string of bits where a bit is 1 if it is covered by
> any of the regions described by the (start,end) tuples and 0 if it is
> not.  I then want to do set operations on multiple bit strings (AND,
> OR, NOT, etc.).  Any suggestions on how to (1) set up the bit string
> and (2) operate on 1 or more of them?  Java has a BitSet class that
> keeps this kind of thing pretty clean and high-level, but I haven't
> seen anything like it for python.

The solution depends on what size of genomes you want to work with.

There is a BitVector class that can probably do what you want, although
it may not scale well to large inputs because it is pure Python.

http://cobweb.ecn.purdue.edu/~kak/dist/BitVector-1.2.html
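
For small inputs you can also get quite far with nothing but plain
Python integers used as bit sets. The sketch below is only an
illustration of the idea, not the BitVector API: the helper names
(intervals_to_bits, to_string) are made up for this example, and it
assumes half-open (start, end) intervals, which may not match your data.

```python
def intervals_to_bits(intervals):
    """Return an int whose bit i is 1 if position i is covered.

    Assumption: each (start, end) is half-open, so start is
    included and end is not.
    """
    bits = 0
    for start, end in intervals:
        if end > start:
            # A run of (end - start) ones, shifted up to position start.
            bits |= ((1 << (end - start)) - 1) << start
    return bits


def to_string(bits, length):
    """Render positions 0..length-1, left to right, as '0'/'1'."""
    return "".join("1" if (bits >> i) & 1 else "0" for i in range(length))


length = 10
mask = (1 << length) - 1  # NOT needs a fixed width, so mask the result

a = intervals_to_bits([(0, 3), (5, 8)])
b = intervals_to_bits([(2, 6)])

print(to_string(a & b, length))      # AND: covered by both
print(to_string(a | b, length))      # OR: covered by either
print(to_string(~a & mask, length))  # NOT: not covered by a
```

Every operation here builds a new integer, so I would expect this to
become slow and memory hungry long before you reach whole-genome sizes.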

If you want something fast (implemented in C and Pyrex) that works
for large-scale genomic data analysis, the bx-python package might do
what you need (and even things that you don't yet know that you really
want to do)

http://bx-python.trac.bx.psu.edu/

but of course this one is a lot more complicated.

i.



More information about the Python-list mailing list