[Tutor] regex help for a noob

Thomas A. Anderson thomas.anderson at little-beak.com
Mon Feb 15 18:17:33 EST 2021


Thanks for the reply. I love this! I have a lot to learn.

After reading some, I figured I was making a list of a list, since
findall creates a list.

I was able to create a list comprehension to flatten the list, but would
rather have a cleaner, more pythonic way of writing the code, so I will
examine everything again.
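
For reference, here is a rough sketch of the flattening step mentioned above,
next to one standard-library alternative. The nested data is made up for
illustration; it only stands in for what appending each findall() result
would produce.

```python
from itertools import chain

# Made-up stand-in for a list built by appending each findall() result.
nested = [["a"], ["b", "c"], ["d"]]

# Flattening with a nested list comprehension.
flat_comprehension = [item for sub in nested for item in sub]

# The same result using itertools.chain.from_iterable.
flat_chain = list(chain.from_iterable(nested))

print(flat_comprehension)
print(flat_chain)
```

Both give the same flat list, so the choice is mostly a matter of taste.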


On 15.02.21 23:47, Alan Gauld via Tutor wrote:
> There are several things to comment on here...
>
> On 15/02/2021 20:39, Thomas A. Anderson via Tutor wrote:
>
>> The single characters I am looking for are nestled within a ("_"), i.e.
>> parenthesis and double quote.
>>
>> I have tried the following code:
>>
>>
>> import re
>>
>> def getlist():
>>     """ creates a list from file """ 
>>     list = []
>>     dataload = open("/Users/drexl/Lyntin/sample.txt", "r")
> Best Python practice says use a with statement for this:
>
>       with open("/Users/drexl/Lyntin/sample.txt", "r") as dataload:
>
> That will ensure it gets closed again, even if you hit an exception.
>
>>     regExp = '\".*?\"' 
> This regex does not correspond to your specification. Where are the ()?
> I'd expect something like:
>
> regExp = r'\("(.)"\)'   # match any single char between (" and ")...
>
> You want to extract the bit inside the quotes so that's
> what the group (ie the (.) bit) will do.
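
A small sketch of how that group behaves with findall(), as far as I
understand it: when the pattern contains exactly one group, findall()
returns only the text captured by the group, not the whole match. The
sample string is invented for the test, and the pattern is written as a
raw string so the backslashes reach the regex engine unchanged.

```python
import re

# \(" and "\) match the literal delimiters; (.) captures one character.
pattern = re.compile(r'\("(.)"\)')

# Invented sample line: two valid hits, one two-character miss, one bare quote.
sample = 'north ("n") then south ("s"), but not ("xy") or "q"'

matches = pattern.findall(sample)
print(matches)
```

Here only the characters inside ("n") and ("s") should come back, since
("xy") holds two characters and "q" has no parentheses around it.
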
>
>>      for line in dataload.readlines():
> You don't need readlines(); it's better to use the file
> object as an iterator:
>
>      for line in dataload:
>
> However I'm not sure you even need to scan line by line, you
> could just read() the whole file and do it as a single search
> with findall()... But there may be data complications that
> preclude that...
>
>>         x = re.findall(regExp, line)
>>         if x:
>>             list.append(x)
> findall() returns a list of found items. You are appending the whole
> list to your list. You probably want to add the lists together:
>
> list += x
>
> Also it's very bad practice to use a type name for a variable. You
> have hidden the list() function so you can't now convert strings,
> say, to lists:
>
> Ls = list("abc")   -> error because list is now an actual list.
>
>> I have tried various other regex expressions, but they only give me worse or the same results.
>> So, I don't think it is regex related? But somewhere else, I am missing something?
> You are mostly missing the fact that appending a list to a
> list puts the whole list into the containing list:
>
> a = [1]
> b = [2]
> c = []
> c.append(a)  -> [[1]]
> c.append(b)  -> [[1],[2]]
>
> But there are quite a few other things to tidy up too.
>
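
Putting the suggestions above together, a minimal sketch might look like
the following. The names get_chars, PATTERN and found are my own
placeholders, and the temporary file only stands in for sample.txt so the
example can run anywhere; treat it as one possible arrangement, not the
definitive version.

```python
import os
import re
import tempfile

# Hypothetical name; matches one character between (" and ").
PATTERN = re.compile(r'\("(.)"\)')

def get_chars(path):
    """Return every single character found between (" and ") in the file."""
    found = []  # avoids shadowing the built-in list
    with open(path, "r") as dataload:  # closed automatically, even on error
        for line in dataload:  # the file object is its own iterator
            # extend() adds the items themselves, not a nested list
            found.extend(PATTERN.findall(line))
    return found

# Stand-in data written to a temporary file for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write('go ("n") and ("e")\nnothing here\nthen ("u")\n')

chars = get_chars(tmp.name)
os.remove(tmp.name)
print(chars)
```

Using extend() (or +=) keeps the result flat from the start, so no
separate flattening step is needed afterwards.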

