[Baypiggies] Hiring / Bioinformatics Tutor/Hack day

Thu May 13 19:48:40 CEST 2010

I am interested in this tutor/hack day.

On Wed, May 12, 2010 at 5:21 PM, Glen Jarvis <glen at glenjarvis.com> wrote:

> This email covers two topics (which can be, but don't have to be,
> related):
>
> * A job opening
> * A tutor/hack day to give the computer scientists a real Bioinformatics
> problem to solve
>
> I put them together as a benefit for those who may be considering a job in
> this field. You can have a day to work on these types of problems to see if
> it interests you or bores you to tears.
>
>
> === Job Opening ===
> Some time back, I sent out an email regarding my bioinformatics lab hiring
> a programmer. I tried to give a feel for what the work would be like on a
> daily basis. And I tried to set your expectations for pay (less than
> industry).
>
> We still have that job opening -- probably because I set your expectations
> so well  :( .
>
> I was intentionally not involved in the interviewing/hiring process because
> I wanted to avoid any appearance of impropriety (as I was also interviewing
> for a position to move from contractor to full-time employee). So, if you
> weren't hired, I don't really know why. I stayed out of that loop to keep
> things as professional as possible. I only know the position is still
> open.
>
> With that said, my boss is talking about hiring another programmer for a
> short term (possibly a year or less), although, if it works out on both
> sides, it could turn into a permanent position (as it was for me - I was
> hired full time). Finding a fit for this position is actually difficult
> (on both sides).
>
> Sooooooo......  I'm going to stick my neck out and try something new:
> Working on a small bioinformatics problem in an open source environment.
>
>
> === Tutor/Hack day ===
> I've been wanting to get the open source community more involved with some
> of the problems that we're tackling. Open Source code is *so* much better
> than code reviewed by only a few eyes. And, this would also give everyone a
> chance to see what a problem would be like.
>
> There are some *real* bioinformaticians on this list (I don't consider
> myself on that level yet -- although I'm getting there). So, if you're a
> real bioinformatician, this may be a trivial problem for you. But, if you
> want to come and help explain things/help others work this out, that'd be
> cool!
>
> I'd like to get together (on a weekend, possibly) and hack on this problem.
> I will describe the things that I think you need to know:
>
> * What is FASTA format (http://www.ncbi.nlm.nih.gov/blast/fasta.shtml)
> * A brief introduction to BioPython (http://biopython.org/)
> * What is a genome
> * What is a gene
> * What are amino acids (contrasting against DNA data)
> * What is a 'percent identity' between genes
> * What is a species
> * What is a strain (loosely defined because it seems to be very loose in
> this problem)
> * The term taxa (plural) and taxon (singular)
> * How can genes vary and still be the same gene
> * How errors can exist in different databases
> * An introduction to the JGI (http://www.jgi.doe.gov/) database
> * An introduction to the UniProt (http://www.uniprot.org/) database
>
>
> With this introduction, you should have a theoretical understanding of all
> that you need to solve this problem -- the rest is coding. (That is, if I do
> my job and explain things well -- and don't fall into potholes of
> information that I don't know.) Also, I'm oversimplifying things that you
> don't need to know for this problem (e.g., we won't talk about open reading
> frames at all or what that means. Since we're already given amino acids, we
> don't care).
>
> The problem is:
>
> I will give you a file in FASTA format of the genes for a particular
> species (let's say: Chlamydophila pneumoniae). That file will contain a list
> of genes, one after the other. Each gene will have its JGI unique
> identifier. However, we also want the UniProt identifier for each of these
> genes.
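
A minimal sketch of reading the kind of FASTA file described above, using
only the standard library (BioPython's Bio.SeqIO could do the same job). The
identifiers and sequences here are made up, and taking the first token after
'>' as the identifier is an assumption; real JGI headers may need different
handling.

```python
def parse_fasta(text):
    """Return a dict mapping each record's identifier to its sequence.

    Assumption: the identifier is the first whitespace-delimited token
    after '>' on the header line.
    """
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            records[current_id] = []
        elif current_id is not None:
            # Sequence data may be wrapped over several lines.
            records[current_id].append(line)
    return {rec_id: "".join(parts) for rec_id, parts in records.items()}


# Hypothetical records; these are placeholders, not real JGI entries.
example = """>gene_0001 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>gene_0002 another hypothetical protein
MSDNELLK
"""

genes = parse_fasta(example)
for gene_id, sequence in genes.items():
    print(gene_id, len(sequence))
```
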
>
> Now, this should be as simple as: "Take the gene from the JGI database,
> look up the same gene in UniProt, record the number, dust off your hands -
> you're done" -- There are lots of tedious little problems, however, that
> keep it from being this easy.
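
The "easy" version of the lookup can be sketched as an exact match on the
amino acid sequence. This is only an illustration: both inputs are assumed to
be plain dicts of identifier -> sequence already loaded in memory, and every
identifier below is hypothetical. It also shows two of the tedious cases: a
sequence that matches more than one entry, and one that matches none.

```python
def match_by_sequence(jgi_genes, uniprot_genes):
    """Map each JGI id to the UniProt ids whose sequence is identical."""
    by_sequence = {}
    for uniprot_id, seq in uniprot_genes.items():
        by_sequence.setdefault(seq.upper(), []).append(uniprot_id)
    return {
        jgi_id: by_sequence.get(seq.upper(), [])
        for jgi_id, seq in jgi_genes.items()
    }


# Hypothetical identifiers and sequences, for illustration only.
jgi_genes = {"jgi_A": "MKTAYIAKQR", "jgi_B": "MSDNELLK"}
uniprot_genes = {
    "up_1": "MKTAYIAKQR",
    "up_2": "MSDNELLR",   # differs from jgi_B at the last position
    "up_3": "MKTAYIAKQR",  # same sequence as up_1
}

matches = match_by_sequence(jgi_genes, uniprot_genes)
for jgi_id, uniprot_ids in matches.items():
    print(jgi_id, uniprot_ids)
```
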
>
> For example, if two genes are identical (they have the same amino acid
> sequence) except at a single position, are they actually the same gene?
> What if the sequence was found in a strain rather than in the exact
> original species?
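
One way to put a number on "identical except at a single position" is a
percent identity. The sketch below is a simplification: it assumes the two
sequences are the same length and already line up position by position, so
it does no alignment and does not handle gaps. The sequences are made up.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (no alignment is done)")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    same = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * same / len(seq_a)


# Two made-up 10-residue sequences that differ only at the last position.
identity = percent_identity("MKTAYIAKQR", "MKTAYIAKQK")
print(identity)
```

Whether a given percentage is high enough to call two entries "the same
gene" is a judgment call that this sketch does not make.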
>
> Let me ask another question: If you were to somehow magically sequence your
> entire personal genome (everything - not just genes) from a cell in your toe
> and also sequence your entire genome from a cell from your nose, would they
> be identical?  I bet not... I'll explain why. Now, we expect fewer
> differences in actual genes (than in other parts of your genome), but even
> then, there can be some variation...
>
> These are the types of questions/problems that we'll be getting into if
> you're so interested...
>
> Who's up for this?  We'll set a date and time once we have a group of
> interested people...
>
> You don't have to be interested in this job to be interested in this
> problem (and/or to do more in bioinformatics).
>
>
> Cheers,
>
>
>
> Glen
> --
> Whatever you can do or imagine, begin it;
> boldness has beauty, magic, and power in it.
>
> -- Goethe
>
> _______________________________________________
> Baypiggies mailing list
> Baypiggies at python.org
> To change your subscription options or unsubscribe:
> http://mail.python.org/mailman/listinfo/baypiggies
>