[Tutor] Translating R Code to Python-- reading in csv files, writing out to csv files

Sun May 20 04:28:58 CEST 2012

Thanks Martin-- this is really great.  My major question now is that I need
to transition to Python for a project and I need to learn how to think in
Python instead of in R.  The two strategies I have used so far are: a)
going through the description and exercises in
http://www.openbookproject.net/thinkcs/python/english2e/ and b) trying to
convert my R code into Python.

At a high level, do you have any other suggestions for how I could go about
becoming more proficient in Python?

Thanks again to you and everyone else who responded.  I am really very much
obliged.

Benjamin

On Sat, May 19, 2012 at 5:32 PM, Martin A. Brown <martin at linux-ip.net> wrote:

>
> Greetings Benjamin,
>
> To begin: I do not know R.
>
>  : I'm trying to improve my python by translating R code that I
>  : wrote into Python.
>  :
>  : *All I am trying to do is take in a specific column in
>  : "uncurated" and write that whole column as output to "curated."
>  : It should be a pretty basic command, I'm just not clear on how to
>  : execute it.*
>
> The hardest part about translation is learning how to think in a
> different language.  If you know any other human languages, you
> probably know that you can say things in some languages that do not
> translate particularly well (other than circumlocution) into another
> language.  Why am I starting with this?  I am starting here because
> you seem quite comfortable with thinking and operating in R, but you
> don't seem as comfortable yet with thinking and operating in Python.
>
> Naturally, that's why you are asking the Tutor list about this, so
> welcome to the right place!  Let's see if we can get you some help.
>
>  : As background, GSEXXXXX_full_pdata.csv has different patient
>  : information (such as unique patient IDs, whether the tissue used
>  : was tumor or normal, and other things). I'll just use the first
>  : two characteristics for now. Template.csv is a template we built
>  : that allows us to take different datasets and standardize them
>  : for meta-analysis.  So for example, "curated$alt_sample_name"
>  : refers to the unique patient ID, and "curated$sample_type" refers
>  : to the type of tissue used.
>
> I have fabricated some data after your description that looks like
> this:
>
>  patientID,title,sample_type
>  V6IF0OqVu,0.5788,70
>  GXj51ljB2,0.3449,88
>
> You doubtless have more columns and the data here are probably
> nothing like yours, but consider it useful for illustrative purposes
> only.  (Illustrating porpoises!  How did they get here?  Next thing
> you know we will have illuminating egrets and animating
> dromedaries!)
>
>  : I've been reading about the python csv module and realized it was
>  : best to get some expert input to clarify some confusion on my
>  : part.
>
> The csv module is very useful and quite powerful for reading data in
> different ways and iterating over data sets.  Supposing you know the
> index of the column of interest to you...well this is quite trivial:
>
>  import csv
>
>  def main(f, field):
>      # print the first column and the requested column of every row
>      for row in csv.reader(f):
>          print(row[0], row[field])
>
>  # -- lists/tuples are zero-based [0,1,2], so 2 is the third column
>  with open('GSEXXXXX_full_pdata.csv', newline='') as f:
>      main(f, 2)
>
> OK, but if your data files have a different number or ordering of
> columns, then this can become a bit fragile.  So maybe you would
> want to learn how to use csv.DictReader, which gives you the same
> rows but uses the first (header) line to name the columns, so
> then you could do something more like this:
>
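>  import csv
>
>  def main(f, id_field, field):
>      # each row is a dict keyed by the names in the header line
>      for row in csv.DictReader(f):
>          print(row[id_field], row[field])
>
>  with open('GSEXXXXX_full_pdata.csv', newline='') as f:
>      main(f, 'patientID', 'sample_type')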
>
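The examples above only print the columns. A minimal sketch of the other half of the task (writing the chosen columns out to a new csv) might look like the following. The function name copy_columns and the in-memory io.StringIO stand-ins for the real files are assumptions made for illustration; the sample rows are the fabricated ones from the message.

```python
import csv
import io

def copy_columns(src, dst, fields):
    """Copy only the named columns from src to dst (open file objects)."""
    reader = csv.DictReader(src)
    # extrasaction='ignore' silently drops every column not listed in fields
    writer = csv.DictWriter(dst, fieldnames=fields, extrasaction='ignore',
                            lineterminator='\n')
    writer.writeheader()
    for row in reader:
        writer.writerow(row)

# In-memory stand-in for the uncurated file (fabricated sample data).
uncurated = io.StringIO(
    "patientID,title,sample_type\n"
    "V6IF0OqVu,0.5788,70\n"
    "GXj51ljB2,0.3449,88\n"
)
curated = io.StringIO()
copy_columns(uncurated, curated, ['patientID', 'sample_type'])
print(curated.getvalue())
```

With real files, the same function should work on handles opened with open(name, newline='') for reading and open(name, 'w', newline='') for writing.
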
> Would you like more detail on this?  Well, have a look at this nice
> little summary:
>
>  http://www.doughellmann.com/PyMOTW/csv/
>
> Now, that really is just giving you a glimpse of the csv module.
> This is not really your question.  Your question was more along the
> lines of 'How do I, in Python, accomplish this task that is quite
> simple in R?'
>
> You may find that list-comprehensions, generators and iterators are
> all helpful in mangling the data according to your nefarious will
> once you have used the csv module to load the data into a data
> structure.
>
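As a rough illustration of that point, here is a small sketch that loads the rows with csv.DictReader and then uses a list comprehension and a generator expression on them. The variable names and the in-memory sample data are assumptions for illustration only.

```python
import csv
import io

# In-memory stand-in for the csv file (fabricated sample data).
data = io.StringIO(
    "patientID,title,sample_type\n"
    "V6IF0OqVu,0.5788,70\n"
    "GXj51ljB2,0.3449,88\n"
)

# Load every row into a list of dicts keyed by the header names.
rows = list(csv.DictReader(data))

# List comprehension: pull one whole column out as a plain list of strings.
sample_types = [row['sample_type'] for row in rows]

# Generator expression: convert a column to numbers lazily while summing it.
title_total = sum(float(row['title']) for row in rows)

print(sample_types)
print(title_total)
```

Note that the csv module hands back every value as a string, so any numeric conversion (the float() call here) has to be done explicitly.
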
> In point of fact, though, Python does not have this particular
> feature that you are seeking...not in the core libraries, anyway.
>
> The lack of this capability has bothered a few people over the
> years, so there are a few different types of solutions.  You have
> already heard a reference to RPy (about which I know nothing):
>
>  http://rpy.sourceforge.net/
>
> There are, however, a few other tools that you may find quite
> useful.  One chap wanted access to some features of R that he used
> all the time along with many of the other convenient features of
> Python, so he decided to implement dataframes (an R concept?) in
> Python.  This idea was present at the genesis of the pandas library.
>
>  http://pandas.pydata.org/
>
> So, how would you do this with pandas?  Well, you could:
>
>  import pandas
>
>  def main(f, field):
>      uncurated = pandas.read_csv(f)
>      # selecting by column name returns just that column
>      curated = uncurated[field]
>      print(curated)
>
>  with open('GSEXXXXX_full_pdata.csv') as f:
>      main(f, 'sample_type')
>
> Note that pandas is geared to let you access your data by its
> 'handles': the unique identifier for the row and the column name.
> Selecting a column this way prints a tabular view of just the
> single column you want.  You may find that pandas affords you
> access to tools with which you are already intellectually familiar.
>
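To go one step further and write the selected columns back out, a sketch along these lines may help. It assumes pandas is installed, uses the fabricated sample rows in place of the real file, and writes to an in-memory buffer rather than to disk; the variable names are illustrative.

```python
import io
import pandas

# In-memory stand-in for the uncurated file (fabricated sample data).
text = (
    "patientID,title,sample_type\n"
    "V6IF0OqVu,0.5788,70\n"
    "GXj51ljB2,0.3449,88\n"
)
uncurated = pandas.read_csv(io.StringIO(text))

# A list of column names selects a smaller DataFrame, in that column order.
curated = uncurated[['patientID', 'sample_type']]

# index=False leaves the automatic 0, 1, 2... row labels out of the output.
out = io.StringIO()
curated.to_csv(out, index=False)
print(out.getvalue())
```

Passing a filename to to_csv instead of the buffer should write the same content to disk.
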
> Good luck,
>
> -Martin
>
> P.S. While I was writing this, you sent in some sample data that
>   looked tab-separated (well, anyway, not comma-separated).  The
>   csv and pandas libraries allow for delimiter='\t' options to
>   most object constructor calls.  So, you could do:
>     csv.reader(f,delimiter='\t')
>
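A small sketch of the delimiter option from the P.S., using a made-up tab-separated snippet held in memory instead of a real file:

```python
import csv
import io

# Hypothetical tab-separated data standing in for the real file.
tsv = io.StringIO("patientID\tsample_type\nV6IF0OqVu\t70\n")

# delimiter='\t' tells the reader to split each line on tabs, not commas.
rows = list(csv.reader(tsv, delimiter='\t'))
print(rows)
```

The same keyword is accepted by csv.DictReader, and pandas.read_csv takes sep='\t' for the same purpose.
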
> --
> Martin A. Brown
> http://linux-ip.net/
>