Code improvement question

Mike Dewhirst miked at dewhirst.com.au
Tue Nov 14 22:41:20 EST 2023


On 15/11/2023 10:25 am, MRAB via Python-list wrote:
> On 2023-11-14 23:14, Mike Dewhirst via Python-list wrote:
>> I'd like to improve the code below, which works. It feels clunky to me.
>>
>> I need to clean up user-uploaded files the size of which I don't know in
>> advance.
>>
>> After cleaning they might be as big as 1 MB but that would be super rare.
>> Perhaps only for testing.
>>
>> I'm extracting CAS numbers and here is the pattern: xx-xx-x up to
>> xxxxxxx-xx-x, e.g., 1012300-77-4
>>
>> def remove_alpha(txt):
>>
>>       """  r'[^0-9\- ]':
>>
>>       [^...]: Match any character that is not in the specified set.
>>
>>       0-9: Match any digit.
>>
>>       \: Escape character.
>>
>>       -: Match a hyphen.
>>
>>       Space: Match a space.
>>
>>       """
>>
>>       cleaned_txt = re.sub(r'[^0-9\- ]', '', txt)
>>
>>       bits = cleaned_txt.split()
>>
>>       pieces = []
>>
>>       for bit in bits:
>>
>>           # minimum size of a CAS number is 7 so drop smaller clumps of digits
>>
>>           pieces.append(bit if len(bit) > 6 else "")
>>
>>       return " ".join(pieces)
>>
>>
>> Many thanks for any hints
>>
> Why don't you use re.findall?
>
> re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{1}\b', txt)

I think I can see what you did there but it won't make sense to me - or 
whoever looks at the code - in future.

That answers your specific question. However, I am in awe of people who 
can just "do" regular expressions and I thank you very much for what 
would have been a monumental effort had I tried it.

That little re.sub() came from ChatGPT and I can understand it without 
too much effort because it came documented.
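
A findall pattern can carry its own documentation in much the same way if 
it is compiled with re.VERBOSE, which lets whitespace and comments sit 
inside the pattern. What follows is only a minimal sketch: the names 
CAS_PATTERN and find_cas_numbers are hypothetical, and the final group is 
assumed to be a single digit, following the xx-xx-x to xxxxxxx-xx-x shape 
given in the quoted question.

```python
import re

# Hypothetical names for illustration; not taken from the thread.
CAS_PATTERN = re.compile(
    r"""
    \b          # start at a word boundary
    [0-9]{2,7}  # first group: two to seven digits
    -           # literal hyphen
    [0-9]{2}    # second group: exactly two digits
    -           # literal hyphen
    [0-9]       # final group: a single digit (assumed from xx-xx-x)
    \b          # end at a word boundary
    """,
    re.VERBOSE,
)


def find_cas_numbers(txt):
    """Return every CAS-shaped substring found in txt, in order."""
    return CAS_PATTERN.findall(txt)
```

This only checks the shape of each candidate. It returns a list rather 
than a space-joined string, so callers wanting the old format would need 
" ".join() on the result.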

I suppose ChatGPT is the answer to this thread. Or everything. Or will be.

Thanks

Mike

