[Baypiggies] Hiring / Bioinformatics Tutor/Hack day

Mon May 17 09:44:23 CEST 2010

+1, sounds interesting

On 5/12/2010 5:21 PM, Glen Jarvis wrote:
> This email covers two topics (although they can be, but don't have to 
> be, inter-related):
>
> * A job opening
> * A tutor/hack day to give the computer scientists a real 
> Bioinformatics problem to solve
>
> I put them together as a benefit for those who may be considering a 
> job in this field. You can spend a day working on these types of 
> problems to see whether they interest you or bore you to tears.
>
>
> === Job Opening ===
> Some time back, I sent out an email regarding my bioinformatics lab 
> hiring a programmer. I tried to give a feel for what work would be 
> like on a daily basis, and I tried to set your expectations for pay 
> (less than industry).
>
> We still have that job opening -- probably because I set your 
> expectations so well :(
>
> I was intentionally not involved in the interviewing/hiring process 
> because I wanted to have no appearance of impropriety (as I was also 
> interviewing for a position to move from contractor to full time 
> employee). So, if you weren't hired, I don't really know why. I 
> intentionally stayed out of that loop to keep things as professional 
> as possible. I only know the position is still open.
>
> With that said, my boss is talking about hiring another programmer 
> for a short term (possibly a year or less), although if it works out 
> on both sides, it could turn into a permanent position (as it did 
> for me - I was hired full time). Finding a fit for this position is 
> actually difficult (on both sides).
>
> Sooooooo......  I'm going to stick my neck out and try something new: 
> Working on a small bioinformatics problem in an open source environment.
>
>
> === Tutor/Hack day ===
> I've been wanting to get the open source community more involved with 
> some of the problems that we're tackling. Open Source code is *so* 
> much better than code reviewed by only a few eyes. And, this would 
> also give everyone a chance to see what a problem would be like.
>
> There are some *real* bioinformaticians on this list (I don't 
> consider myself on that level yet -- although I'm getting there). So, 
> if you're a real bioinformatician, this may be a trivial problem for 
> you. But, if you want to come and help explain things/help others work 
> this out, that'd be cool!
>
> I'd like to get together (on a weekend, possibly) and hack on this 
> problem. I will describe the things that I think you need to know:
>
> * What is FASTA format (http://www.ncbi.nlm.nih.gov/blast/fasta.shtml)
> * A brief introduction to BioPython (http://biopython.org/)
> * What is a genome
> * What is a gene
> * What are amino acids (contrasting against DNA data)
> * What is a 'percent identity' between genes
> * What is a species
> * What is a strain (loosely defined because it seems to be very loose 
> in this problem)
> * The terms taxa (plural) and taxon (singular)
> * How can genes vary and still be the same gene
> * How errors can exist in different databases
> * An introduction to the JGI (http://www.jgi.doe.gov/) database
> * An introduction to the UniProt (http://www.uniprot.org/) database
>
>
> With this introduction, you should have a theoretical understanding of 
> all that you need to solve this problem -- the rest is coding. (That 
> is, if I do my job and explain things well -- and don't fall into 
> potholes of information that I don't know.) Also, I will oversimplify 
> or skip things that you don't need to know for this problem (e.g., we 
> won't talk about open reading frames at all or what they mean. Since 
> we're already given amino acids, we don't care).
>
> The problem is:
>
> I will give you a file in FASTA format of the genes for a particular 
> species (let's say: Chlamydophila pneumoniae). That file will contain 
> a list of genes, one after the other, again in FASTA format. The file 
> will have the JGI unique identifiers. However, we also want the 
> UniProt identifier for each of these same genes.
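
To make the input concrete, here is a minimal sketch of reading such a file in plain Python. It assumes the identifier is the first whitespace-delimited token on each ">" header line, which may not match how the real JGI headers are laid out. The record names and sequences in SAMPLE are made up for illustration. BioPython's SeqIO module would be the more usual tool for this; the hand-rolled parser is only here to show what the format looks like.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted file object."""
    identifier, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                yield identifier, "".join(chunks)
            # Assumption: the identifier is the first token of the header.
            header = line[1:].split()
            identifier = header[0] if header else ""
            chunks = []
        elif identifier is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


# Hypothetical records, not real JGI data.
SAMPLE = """>jgi_0001 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>jgi_0002 another made-up record
MSDNE
"""

records = list(read_fasta(StringIO(SAMPLE)))
for identifier, sequence in records:
    print(identifier, len(sequence))
```
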
>
> Now, this should be as simple as: "Take the gene from the JGI 
> database, look-up the same gene in UniProt, record the number, dust 
> off your hands - you're done" -- There are lots of little tedious 
> problems, however, that keep it from being this easy.
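
The "as simple as" version could be sketched as an exact-sequence lookup, as below. This is only one possible reading of the naive approach, not the method the lab uses. The function name, the identifiers, and the sequences are all hypothetical, and both inputs are assumed to be lists of (identifier, amino acid sequence) pairs already loaded in memory.

```python
def map_by_exact_sequence(jgi_records, uniprot_records):
    """Map each JGI identifier to the UniProt identifiers whose
    amino acid sequence is exactly the same string."""
    by_sequence = {}
    for uniprot_id, sequence in uniprot_records:
        # Several UniProt entries may share one sequence, so keep a list.
        by_sequence.setdefault(sequence, []).append(uniprot_id)
    return {
        jgi_id: by_sequence.get(sequence, [])
        for jgi_id, sequence in jgi_records
    }


# Hypothetical identifiers and sequences, for illustration only.
jgi = [("jgi_0001", "MKTAYIAKQR"), ("jgi_0002", "MSDNE")]
uniprot = [("UP_A", "MKTAYIAKQR"), ("UP_B", "MKTAYIAKQR"), ("UP_C", "MSDNQ")]

mapping = map_by_exact_sequence(jgi, uniprot)
print(mapping)
```

In this made-up data, jgi_0002 gets no match because its sequence differs from UP_C at one position, and jgi_0001 gets two candidates. Those are exactly the kinds of cases an exact lookup cannot resolve.
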
>
> For example, if two genes have the same amino acid sequence except 
> at a single position, are they actually the same gene? What if the 
> sequence found was from a strain instead of from the exact original 
> species?
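
One way to put a number on "almost the same" is a position-by-position percent identity. The sketch below assumes the two sequences are already aligned and have equal length. That is a simplification, since real comparisons would run an alignment first to handle insertions and deletions. The function name and the sequences are made up for illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Two hypothetical ten-residue sequences differing at the last position.
identity = percent_identity("MKTAYIAKQR", "MKTAYIAKQK")
print(identity)
```

Whether a pair at, say, 90% identity counts as "the same gene" is a threshold decision the code cannot make for you. It only gives you the number to argue about.
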
>
> Let me ask another question: If you were to somehow magically sequence 
> your personal entire genome (everything - not just genes) from a cell 
> in your toe and also sequence your entire genome from a cell from your 
> nose, would they be identical?  I bet not... I'll explain why. Now, we 
> expect fewer differences in actual genes (not in other parts of your 
> genome), but even then, there can be some variation...
>
> These are the types of questions/problems that we'll be getting into 
> if you're so interested...
>
> Who's up for this?  We'll set a date and time once we have a group of 
> interested people...
>
> You don't have to be interested in this job to be interested in this 
> problem (and/or to do more in bioinformatics).
>
>
> Cheers,
>
>
>
> Glen
> -- 
> Whatever you can do or imagine, begin it;
> boldness has beauty, magic, and power in it.
>
> -- Goethe
>
>
> _______________________________________________
> Baypiggies mailing list
> Baypiggies at python.org
> To change your subscription options or unsubscribe:
> http://mail.python.org/mailman/listinfo/baypiggies
