[Tutor] Get a single random sample

Fri Sep 9 14:49:09 CEST 2011

kitty wrote:

> Hi,
> 
> I'm new to python and I have read through the tutorial on:
> http://docs.python.org/tutorial/index.html
> which was really good, but I have been an R user for 7 years and am
> finding it difficult to do even basic things in python. For example, I want
> to import my data (a tab-delimited .txt file) so that I can index and
> select a random sample of one column based on another column. My data has
> 2 columns named 'area' and 'change.dens'.
> 
> In R I would just
> 
> data<-read.table("FILE PATH\\Road.density.municipio.all.txt", header=T)
> #header=T gives columns their headings so that I can call each
> #individually
> names(data)
> attach(data)
> 
> Then to index I would simply:
> # return change.dens values that have corresponding areas between 700 and 2000
> subset<-change.dens[area<2000&area>700]
> 
> then to randomly sample a value from that I just need to
> random<-sample(subset,1)
> 
> 
> My question is how do I get python to do this???

I don't know R, but I believe the following does what you want:

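import csv
import random
from collections import namedtuple

filename = "FILE PATH\\Road.density.municipio.all.txt"
# Python 3: open in text mode with newline="" as the csv module expects
with open(filename, newline="") as f:
    rows = csv.reader(f, delimiter="\t")
    headers = next(rows)
    # "change.dens" is not a valid attribute name, so it becomes "change_dens"
    rowtype = namedtuple("RT", [h.replace(".", "_") for h in headers])
    data = [rowtype(*map(float, row)) for row in rows]

# show everything that was read
print(data)
# keep only the rows with an area between 700 and 2000
subset = [row for row in data if 700 < row.area < 2000]
# pick one of those rows at random and show its change.dens value
print(random.choice(subset).change_dens)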
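
The same filter-then-sample idea can also be written with csv.DictReader, which keeps the original column names so "change.dens" does not have to be renamed. This is only a sketch for Python 3: the sample rows are made up so that it runs without the real file, and the function name sample_change_dens and its low/high parameters are my own invention, not anything from R or the standard library.

```python
import csv
import io
import random

# Made-up rows standing in for the real tab-delimited file (assumption).
SAMPLE = (
    "area\tchange.dens\n"
    "500\t0.10\n"
    "800\t0.25\n"
    "1500\t0.40\n"
    "2500\t0.55\n"
)


def sample_change_dens(lines, low=700, high=2000, rng=random):
    """Return one random change.dens value whose area is strictly between low and high."""
    # DictReader uses the first line as header and yields one dict per row.
    reader = csv.DictReader(lines, delimiter="\t")
    subset = [
        float(row["change.dens"])
        for row in reader
        if low < float(row["area"]) < high
    ]
    if not subset:
        # random.choice would fail on an empty list, so report it clearly.
        raise ValueError("no rows with area in the requested range")
    return rng.choice(subset)


value = sample_change_dens(io.StringIO(SAMPLE))
print(value)  # prints 0.25 or 0.4
```

For the real data you would pass an open file instead of the StringIO object, for example one opened with open(filename, newline="").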

As you can see, Python with just the standard library requires a lot more
legwork than R. You might have a look at scipy/numpy; perhaps they offer
functions that simplify your task. Finally, there's also RPy, which I've
never tried:

http://rpy.sourceforge.net/