[Tutor] R: Tutor Digest, Vol 125, Issue 49

Dan Janzen dan.janzen at gmail.com
Thu Jul 17 21:50:06 CEST 2014


****DISCLAIMER*****

I have deliberately not read any of the other replies to this problem so 
my answer may be totally redundant! (but here it is anyway...)

The first issue to address is that your "CSV" file is probably not in 
the format you assume. Each line looks like the printed form of a Python 
list, not the traditional "values separated by commas" format that one 
normally expects in a CSV file. One way to deal with that is to resave 
the file as a .txt file and treat each line as a list, i.e. use list 
subscripting to pick out each element and clean it up with regex code. 
Having said that, in the spirit of minimalism, there are ways to deal 
with it as a CSV file as well.
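
To give a rough idea of the list-per-line approach: assuming each line 
really is the text of a Python list, ast.literal_eval from the standard 
library can turn it back into a real list, after which plain 
subscripting or a loop works. This is only a sketch; the helper name 
strip_version and the sample line are mine, for illustration.

```python
import ast
import re

def strip_version(line):
    """Parse a line like "['uc002uvo.3 ', 'uc001mae.1']" and drop each ".N" suffix."""
    # literal_eval safely converts the text of a list literal into a list
    fields = ast.literal_eval(line.strip())
    # remove a dot followed by one digit from every element
    return [re.sub(r'\.\d', '', field) for field in fields]

sample = "['uc002uvo.3 ', 'uc001mae.1']"
print(strip_version(sample))
```

Note that literal_eval raises an error on a line that is not a valid 
Python literal, so a real script would want to skip or report such lines.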

First, import the csv module and use its reader() function to properly 
access the contents.

import re
import csv

with open('non.csv', 'r') as p:
    f = csv.reader(p, delimiter=',')

Then, still inside the with block, use a for loop to access each row and 
put the regex calls in the print statement

    for w in f:
        print(re.sub(r'(\.\d)', '', w[0]), re.sub(r'(\.\d)', '', w[1]))

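The regex calls access the row's elements with subscripting. The "$" 
was not necessary: in your data the first field has a space after the 
digit, so a pattern anchored to the end of the string with "$" cannot 
match there. Without the "$" you get the desired results.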
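
Here is a small sketch of why the "$" gets in the way, using two values 
copied from the posted data. The third pattern is not from the original 
question; it is just one assumed alternative that keeps the anchor but 
tolerates trailing whitespace.

```python
import re

first = 'uc002uvo.3 '    # trailing space, as in the posted data
second = 'uc001mae.1'

anchored = r'\.\d$'       # pattern from the original question
unanchored = r'\.\d'      # same pattern without the "$"
tolerant = r'\.\d+\s*$'   # assumed alternative: anchored, allows trailing whitespace

for pattern in (anchored, unanchored, tolerant):
    # repr() makes any leftover trailing space visible
    print(pattern, repr(re.sub(pattern, '', first)), repr(re.sub(pattern, '', second)))
```

The anchored pattern leaves 'uc002uvo.3 ' unchanged while still cleaning 
'uc001mae.1', which matches the half-cleaned output in the original 
question. The tolerant pattern also removes the trailing space, so use 
it only if that space is not wanted.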

TO SUMMARIZE:

With the following contents of file named "non.csv":

['uc002uvo.3 ', 'uc001mae.1']

['uc010dya.2 ', 'uc001kko.2']

and the following code run in Eclipse:

##test.py

import re
import csv

with open('non.csv', 'r') as p:
    f = csv.reader(p, delimiter=',')
    for w in f:
        print(re.sub(r'(\.\d)', '', w[0]), re.sub(r'(\.\d)', '', w[1]))

I get:

['uc002uvo ''uc001mae']

['uc010dya ''uc001kko']





On 7/16/14, 4:04 AM, jarod_v6 at libero.it wrote:
> Hi there!!!
> I have a file  with this data
> ['uc002uvo.3 ', 'uc001mae.1']
> ['uc010dya.2 ', 'uc001kko.2']
> ['uc003ejx.2 ', 'uc010yfr.1']
> ['uc001bhk.2 ', 'uc003eib.2']
> ['uc001znc.2 ', 'uc001efn.2']
> ['uc002ycq.2 ', 'uc001vnh.2']
> ['uc001odf.1 ', 'uc002mwd.2']
> ['uc010jkn.1 ', 'uc010luk.1']
> ['uc003uhf.3 ', 'uc010tqd.1']
> ['uc002rue.3 ', 'uc001tex.2']
> ['uc011dtt.1 ', 'uc001lkv.1']
> ['uc003yyt.2 ', 'uc003mkl.2']
> ['uc003pkv.2 ', 'uc003ytw.2']
> ['uc010bhz.2 ', 'uc002kbt.1']
> ['uc001wnj.2 ', 'uc009wtj.1']
> ['uc011lyh.1 ', 'uc003jvb.2']
> ['uc002awj.1 ', 'uc009znm.1']
> ['uc010bft.2 ', 'uc002cxz.1']
> ['uc011mar.1 ', 'uc001lvb.1']
> ['uc001oxl.2 ', 'uc002lvx.1']
>
> I want to replace of the things after the dots, so I want to have  a file with
> this output:
>
> ['uc002uvo ', 'uc001mae']
> ['uc010dya ', 'uc001kko']
> ...
>
> I try to use regular expression but I have  a strange output
>
> with open("non_annotati.csv") as p:
>      for i in p:
>          lines= i.rstrip("\n").split("\t")
>          mit = re.sub(r'(\.\d$)','',lines[0])
>          mit2 = re.sub(r'(\.\d$)','',lines[1])
>          print mit,mit2
>
>
> uc003klv.2  uc010lxj
> uc001tzy.2  uc011kzk
> uc010qdj.1  uc001iku
> uc004coe.2  uc002vmf
> uc002dvw.2  uc004bxn
> uc001dmp.2  uc001dmo
> uc002rqd.2  uc010ynl
> uc010cvm.1  uc002qjc
> uc003ewy.3  uc003hgx
> uc002ejy.2  uc003mvb
> uc002fou.1  uc010ilx
> uc003vhf.2  uc010qlo
> uc003mix.2  uc010tdt
> uc002nez.1  uc003wxe
> uc011cpu.1  uc002keg
> uc001ovu.2  uc011dne
> uc010zfg.1  uc001jvq
> uc010jlf.2  uc011azi
> uc001ors.3  uc001vzx
> uc010tyt.1  uc003vih
> uc010fde.2  uc002xgq
> uc010bit.1  uc003zle
> uc010xcb.1  uc010wsg
> uc011acg.1  uc009wlp
> uc002bnj.2  uc004ckd
>
>
> Where is the error? what is wrong in my regular expression code?
>
>
