[Tutor] Tokenizing a simple string with split()

Jean Mark Gawron gawron at rohan.sdsu.edu
Sun Apr 1 19:47:14 CEST 2007


The split method in the "re" module does what you want here.
It is available as a method on compiled pattern objects (and as the
module-level function re.split()), so first you construct a regular
expression that matches any one of the desired separators:

     >>> import re
     >>> s = "spam;egg mail"
     >>> x_re = re.compile(r'[; ]')
     >>> s.split("; ")
     ['spam;egg mail']
     >>> x_re.split(s)
     ['spam', 'egg', 'mail']
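
For the path string in the original question, here is a rough sketch of the
same idea wrapped in a small helper. The name tokenize and its separators
argument are made up for illustration and are not part of the re module.
re.escape is used so that characters such as the backslash are taken
literally inside the character class. Empty strings, which show up wherever
two separators sit next to each other (as in \;), are dropped here; that may
or may not be what you want.

```python
import re

def tokenize(text, separators):
    """Split text on any single character in separators (hypothetical helper)."""
    # Build a character class such as [\\;]; re.escape keeps special
    # characters (backslash, ], ^, -) literal inside the brackets.
    pattern = re.compile("[" + re.escape(separators) + "]")
    # Adjacent separators produce empty strings, so filter those out.
    return [token for token in pattern.split(text) if token]

print(tokenize("spam;egg mail", "; "))
# ['spam', 'egg', 'mail']
print(tokenize("C:\\My\\Doc\\;D:\\backup\\", "\\;"))
# ['C:', 'My', 'Doc', 'D:', 'backup']
```

The path is written with doubled backslashes instead of r"" notation because
a raw string literal cannot end in a single backslash.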

>
> Message: 1
> Date: Sat, 31 Mar 2007 18:00:47 -0400
> From: Rafael Garcia <rgarcia at fas.harvard.edu>
> Subject: Re: [Tutor] tokenizing a simple string with split()
> To: tutor at python.org
> Message-ID: <460EDA0F.7030801 at fas.harvard.edu>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I think by "multiple characters" they mean more than one character for
> ONE separator. I don't know how to specify multiple separators. You
> could try something like this:
>
> s = "spam;egg mail"
> tokens = []
> for t in s.split(";"):
>     for u in t.split(" "):
>         tokens.append(u)
>
> which yields:
> tokens = ['spam', 'egg', 'mail']
>
>
> Rafael
>
> On 3/31/07 5:43 PM, Andrei Petre wrote:
>> I want to split a string like "C:\My\Doc\;D:\backup\" with two
>> separators: \ and ;
>> I found that \ is handled with /raw string/ notation r"". But the
>> problem I encountered is with the split() function.
>> The 2.5 reference says that "The sep argument of the split()
>> function may consist of multiple characters". But I cannot figure
>> out why this simple example is not working:
>>
>> s = "spam;egg mail"
>> s.split("; ")
>>
>> output: ['spam;egg mail']
>>
>> instead of ['spam', 'egg', 'mail']
>>
>> any suggestion is welcome,
>> andrei
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Tutor maillist  -  Tutor at python.org
>> http://mail.python.org/mailman/listinfo/tutor
>>
>
> ------------------------------
>
> Message: 2
> Date: Sat, 31 Mar 2007 17:20:02 -0500
> From: "Greg Corradini" <gregcorradini at gmail.com>
> Subject: [Tutor] Create an Access table w/ mxODBC module
> To: tutor at python.org
> Message-ID:
> 	<429320790703311520w1ab9f56agd20914acb567bc77 at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello all,
> I'm brand new to the mxODBC module from egenix (like yesterday). I currently
> use the module for data laundering. I connect to a Microsoft Access .mdb and
> perform some SQL queries on tables. Almost everything is going fine.
> However, I can't find out how to create a new table. Even other examples
> that I've found on the web that use the "create table" SQL command aren't
> working for me. When I run mycursor.execute('create table TEST (nr Integer,
> val Integer)'), I only get a return value of -1. I was hoping somebody with
> more mxODBC experience could give me a hand with this minor problem.
>
> Thanks
> Greg
>
> ------------------------------
>
> Message: 3
> Date: Sat, 31 Mar 2007 23:20:21 +0100
> From: Dave S <pythontut at pusspaws.net>
> Subject: Re: [Tutor] My Python project - an update
> To: tutor at python.org
> Message-ID: <200703312320.21457.pythontut at pusspaws.net>
> Content-Type: text/plain;  charset="iso-8859-15"
>
> Very valid points, I was not aware that MD5 had been cracked :)
>
> Dave
>
>
> ------------------------------
>
> Message: 4
> Date: Sun, 1 Apr 2007 00:01:49 +0100
> From: "Alan Gauld" <alan.gauld at btinternet.com>
> Subject: Re: [Tutor] Detect errors when using os.popen.readlines()
> To: tutor at python.org
> Message-ID: <eump92$ka9$1 at sea.gmane.org>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
> 	reply-type=original
>
>
> "Peter" <lists1 at marscode.net> wrote
>
>> Is there a way to detect errors when running shell commands using
>> os.popen?
>
> You have to parse the program's output.
> Usually errors will appear on stderr so you need to read that as
> well as stdout.
>
> This may be slightly easier using the new subprocess module
> and the Popen class.
>
>> if an interface doesn't exist I get an error from the shell command.
>> I tried using try and except, but that didn't seem to work.
>
> Even if the program returns an error, popen is still working just
> fine, so no exception gets raised. You must parse the output
> (or check the status value, but that's not reliable in all programs).
>
> HTH,
>
> -- 
> Alan Gauld
> Author of the Learn to Program web site
> http://www.freenetpages.co.uk/hp/alan.gauld
>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Sat, 31 Mar 2007 20:45:21 -0400
> From: Jay Mutter III <jmutter at uakron.edu>
> Subject: Re: [Tutor] Another parsing question
> To: Kent Johnson <kent37 at tds.net>
> Cc: tutor at python.org
> Message-ID: <7BD9F003-BE85-47BF-A83C-6A8926EB31C0 at uakron.edu>
> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
>
> Kent;
> Again thanks for the help.
> I am not sure if this is what you meant, but I put
>
> for line in s:
>      jay = patno.findall(line)
>      jay2 = "".join(jay[0])
>      print jay2
>
> and it prints fine up until line 111 which is a line that had
> previously returned [ ] since a number didn't exist on that line and
> then exits with
>
> Traceback (most recent call last):
>    File "patentno2.py", line 12, in ?
>      jay2 = "".join(jay[0])
> IndexError: list index out of range
>
>
> And as long as i am writing, how can I delete a return at the end of
> a line if the line ends in a certain pattern?
>
> For instance, if line ends with the abbreviation  No.
> I want to join the current line with next line.
> Are lists immutable or can they be changed?
>
> Thanks again
>
> jay
>
> On Mar 31, 2007, at 2:27 PM, Kent Johnson wrote:
>
>> Jay Mutter III wrote:
>>> I have the following that I am using to extract "numbers" from a file
>>> ...
>>> which yields the following
>>> [('1', '337', '912')]
>>> ...
>>> So what do i have above ? A list of tuples?
>>
>> Yes, each line is a list containing one tuple containing three
>> string values.
>>
>>> How do I  send the output to a file?
>>
>> When you print, the values are automatically converted to strings
>> by calling str() on them. When you use p2.write(), this conversion
>> is not automatic, you have to do it yourself via
>>   p2.write(str(jay))
>>
>> You can also tell the print statement to output to a file like this:
>>   print >>p2, jay
>>
>>> Is there a way to get the output as
>>> 1337912  instead of   [('1', '337', '912')]  ?
>>
>> In [4]: jay=[('1', '337', '912')]
>>
>> jay[0] is the tuple alone:
>> In [6]: jay[0]
>> Out[6]: ('1', '337', '912')
>>
>> Join the elements together using an empty string as the separator:
>> In [5]: ''.join(jay[0])
>> Out[5]: '1337912'
>> In [7]:
>>
>> Kent
>
>
>
> ------------------------------
>
> Message: 6
> Date: Sat, 31 Mar 2007 23:42:14 -0400
> From: Kent Johnson <kent37 at tds.net>
> Subject: Re: [Tutor] tokenizing a simple string with split()
> To: Andrei Petre <andrei.petre at gmail.com>
> Cc: tutor at python.org
> Message-ID: <460F2A16.4050206 at tds.net>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Andrei Petre wrote:
>> I want to split a string like "C:\My\Doc\;D:\backup\" with two
>> separators: \ and ;
>> I found that \ is handled with /raw string/ notation r"". But the
>> problem i encountered is with split() function.
>> In the 2.5 reference is said that "The sep argument of the split()
>> function may consist of multiple characters".
>
> The argument to split() is the literal string to split on, not a list of
> potential splitting characters. So to split on '; ' your string would
> have to be 'spam; egg; mail'.
>
> To split on one of a list of characters you have to use a regular
> expression and re.split().
>
> In [1]: import re
> In [3]: re.split('[; ]', "spam;egg mail")
> Out[3]: ['spam', 'egg', 'mail']
>
> [; ] is a regular expression that means, "match either of ; or space".
>
> Kent
>
>
> ------------------------------
>
> Message: 7
> Date: Sun, 1 Apr 2007 00:43:34 -0600
> From: "David Heiser" <David.Heiser at intelliden.com>
> Subject: Re: [Tutor] tokenizing a simple string with split()
> To: <tutor at python.org>
> Message-ID:
> 	<DB30DA681DB9544886EA69FE9082737CCAF316 at csoexc02.intelliden.net>
> Content-Type: text/plain;	charset="us-ascii"
>
>
> Or you can try something like:
>
> x = r"C:\My\Doc\;D:\backup"
> x = x.replace("\\", ";")
> x = x.split(";")
>
>
>
>
> ------------------------------
>
> Message: 8
> Date: Sun, 1 Apr 2007 08:54:02 +0100
> From: "Alan Gauld" <alan.gauld at btinternet.com>
> Subject: Re: [Tutor] Another parsing question
> To: tutor at python.org
> Message-ID: <eunoeu$prr$1 at sea.gmane.org>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
> 	reply-type=original
>
>
> "Jay Mutter III" <jmutter at uakron.edu> wrote
>
>> for line in s:
>>     jay = patno.findall(line)
>>     jay2 = "".join(jay[0])
>>     print jay2
>>
>> and it prints fine up until line 111 which is a line that had
>> previously returned [ ] since a number didn't exist on that line and
>> then exits with
>
>> IndexError: list index out of range
>
> Either catch the exception with try/except or, right after the
> findall() call, add an
> if not jay: continue  # or use a default string
>
>> And as long as i am writing, how can I delete a return at the end of
>> a line if the line ends in a certain pattern?
>>
>> For instance, if line ends with the abbreviation  No.
>
> if line.endswith(string): line = line.rstrip()
>
>> I want to join the current line with next line.
>> Are lists immutable or can they be changed?
>
> lists can be changed, tuples cannot.
>
> HTH,
>
> -- 
> Alan Gauld
> Author of the Learn to Program web site
> http://www.freenetpages.co.uk/hp/alan.gauld
>
>
>
>
> ------------------------------
>
>
>
> End of Tutor Digest, Vol 38, Issue 1
> ************************************


