split and regexp on textfile

mik3l3374 at gmail.com
Fri Apr 13 04:40:34 EDT 2007


On Apr 13, 3:59 pm, "Flyzone" <flyz... at technologist.com> wrote:
> Hi,
> i have a problem with the split function and regexp.
> I have a file that i want to split using the date as token.
> Here a sample:
> -----
> Mon Apr  9 22:30:18 2007
> text
> text
> Mon Apr  9 22:31:10 2007
> text
> text
> ----
>
> I'm trying to put all the lines in a one string and then to separate
> it
> (could be better to not delete the \n if possible...)
>   while 1:
>      line = ftoparse.readline()
>      if not line: break
>      if line[-1]=='\n': line=line[:-1]
>      file_str += line
>   matchobj=re.compile('[A-Z][a-z][a-z][ ][A-Z][a-z][a-z][ ][0-9| ][0-9][ ][0-9][0-9][:]')
>   matchobj=matchobj.split(file_str)
>   print matchobj
>
> i have tried also
>    matchobj=re.split(r"^[A-Z][a-z][a-z][ ][A-Z][a-z][a-z][ ][0-9| ][0-9][ ][0-9][0-9][:]",file_str)
> and reading all with one:
>    file_str=ftoparse.readlines()
> but the split doesn't work...where i am wronging?

You're trying to match the date part, right? If re is what you want,
here's one example:

>>> import re
>>> data = open("file").read()
>>> pat = re.compile(r"[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}")
>>> print(pat.findall(data))
['Mon Apr  9 22:30:18 2007', 'Mon Apr  9 22:31:10 2007']
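
If you want the split itself rather than just the dates, something along these lines might work. It is only a sketch: the inline sample text stands in for your file, and the names date_pat, parts and records are made up for the example. Two things probably tripped up your attempts: without re.M the ^ only matches at the very start of the string, and readlines() gives a list, which re.split cannot take. Putting the date in a capturing group makes split() keep it, and leaving the \n characters in the data keeps the text lines apart.

```python
import re

# Sample text standing in for the file contents (same shape as the sample above).
data = (
    "Mon Apr  9 22:30:18 2007\n"
    "text\n"
    "text\n"
    "Mon Apr  9 22:31:10 2007\n"
    "text\n"
    "text\n"
)

# ^ is anchored to each line start by re.M; the capturing group
# makes split() return the matched dates along with the text between them.
date_pat = re.compile(
    r"^([A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\n",
    re.M,
)

parts = date_pat.split(data)

# parts[0] is whatever precedes the first date (empty here);
# after that, dates and text blocks alternate.
records = list(zip(parts[1::2], parts[2::2]))

for date, body in records:
    print(repr(date), repr(body))
```

Each entry in records pairs a date line with the text that followed it, newlines included.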
