re search through a text Vs line

Dave Angel davea at davea.name
Sun Oct 5 07:57:50 EDT 2014


Shiva <shivaji_tn at yahoo.com.dmarc.invalid> Wrote in message:
> Hi,
> 
> I am doing a regular expression search for a year through a file.
> 

I think you are being confused in part by your choice of names.
 Let's go through and describe the variable contents.

>   fileextract = open(fullfilename,'r')
>   line = fileextract.read()

'line' is a single string containing all the lines in the file.

>   texts = re.search(r'1\d\d\d', line)
>   print(texts.group())
> 
> The above works.
> 
> However if I do:
>      fileextract = open(fullfilename,'r')
>      line = fileextract.readlines()

Now 'line' is a list, with each item of the list being a string.
 The name is very misleading, and should be something like
 'lines'.

> 
>      for l in line:
>         texts = re.search(r'1\d\d\d', line)

The second argument here is a list, not a string. You probably
 meant to search the variable named 'l'. Of course, if you renamed
 things, then you might have a loop of
     for line in lines:
and after that change, your search call would be correct again.

>      print(texts.group())

Here you look only at the last result. You probably want this line
 indented, so it's part of the loop.
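
A minimal sketch of the loop with both fixes applied, assuming the
 lines are already in a list named 'lines' (the sample data is made
 up). Note that re.search returns None for a line with no match, so
 the result is checked before calling group().

```python
import re

# Hypothetical sample data standing in for fileextract.readlines()
lines = ["Born in 1987\n", "no year here\n", "Moved in 1999\n"]

found = []
for line in lines:
    match = re.search(r'1\d\d\d', line)  # search one string at a time
    if match is not None:                # None means no match on this line
        found.append(match.group())
        print(match.group())             # inside the loop: every match is shown
```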

> 
> 
> None is returned. Why is it not iterating through each line of the file and
> doing a search? - It seems to return none.
> 

re.search only searches strings, so it has to be given one line at
 a time, not the whole list.

Other comments: you neglected to close the file. It doesn’t hurt
 here, but it's best to get into good habits. Look up the with
 statement.
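
A rough sketch of what that could look like. The temporary file and
 its contents are made up so the example runs on its own; only the
 second with block is the part that matters.

```python
import os
import re
import tempfile

# Create a small throwaway file so the sketch is runnable (made-up content).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("Popularity in 1990\nsecond line\n")
    fullfilename = tmp.name

with open(fullfilename, 'r') as fileextract:
    text = fileextract.read()
# The file is closed at this point, even if read() had raised an exception.

match = re.search(r'1\d\d\d', text)
year = match.group() if match else None
print(year)

os.remove(fullfilename)  # clean up the throwaway file
```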

The readlines call was unnecessary, as you could have iterated
 over the file object.
   for line in fileextract:
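
As an illustration, here io.StringIO stands in for an opened text
 file (an assumption made only to keep the sketch self-contained);
 a real file object returned by open() iterates the same way.

```python
import io
import re

# io.StringIO stands in for an opened text file; the content is made up.
fileextract = io.StringIO("first 1984\nnothing\nlast 1776\n")

years = []
for line in fileextract:  # yields one line at a time, newline included
    match = re.search(r'1\d\d\d', line)
    if match:
        years.append(match.group())
print(years)
```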

Your regexp won't match recent years (2000 and later), and it will
 match parts of longer numbers like 51420, 1994333, and so on.
Perhaps you want to restrict where in the line those four digits
 may be.
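
One possible way to tighten the pattern, offered as a sketch rather
 than the right answer for your data: it assumes a year is a
 standalone four-digit number between 1000 and 2099, and uses \b
 word boundaries to reject digits embedded in longer numbers.

```python
import re

old = re.compile(r'1\d\d\d')                  # pattern from the original post
new = re.compile(r'\b(?:1\d{3}|20\d{2})\b')   # assumed range: 1000-2099

samples = ["51420", "1994333", "released in 2014", "born 1987"]
for s in samples:
    o = old.search(s)
    n = new.search(s)
    # Show what each pattern finds (None when there is no match)
    print(s, '->', o.group() if o else None, n.group() if n else None)
```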

When asking questions, it is frequently useful to specify the
 Python version.


-- 
DaveA



