Code improvement question

Mike Dewhirst miked at dewhirst.com.au
Tue Nov 14 18:14:10 EST 2023


I'd like to improve the code below, which works. It feels clunky to me.

I need to clean up user-uploaded files whose size I don't know in 
advance.

After cleaning they might be as big as 1 MB, but that would be very rare, 
perhaps only in testing.

I'm extracting CAS numbers, which follow the pattern xx-xx-x up to 
xxxxxxx-xx-x, e.g. 1012300-77-4
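
The pattern described above can also be written directly as a regular expression. What follows is only a minimal sketch: the names CAS_PATTERN and find_cas_numbers are hypothetical, and it assumes each number stands on its own in the text, not glued to other digits or hyphens. Unlike a strip-then-split approach, it never joins digits that were separated by letters.

```python
import re

# Two to seven digits, a hyphen, two digits, a hyphen, one final digit.
# The lookarounds stop a match from starting or ending inside a longer
# run of digits or hyphens.
CAS_PATTERN = re.compile(r"(?<![\d-])\d{2,7}-\d{2}-\d(?![\d-])")


def find_cas_numbers(txt):
    """Return every substring of txt that is shaped like a CAS number."""
    return CAS_PATTERN.findall(txt)
```

This checks only the shape of each candidate, not whether it is a registered CAS number.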

import re


def remove_alpha(txt):
    r"""Strip everything except digits, hyphens and spaces, then blank
    out clumps too short to be a CAS number.

    r'[^0-9\- ]' breaks down as:
        [^...]  match any character that is not in the set
        0-9     any digit
        \-      a literal hyphen (the backslash escapes it)
        space   a literal space
    """
    cleaned_txt = re.sub(r'[^0-9\- ]', '', txt)
    pieces = []
    for bit in cleaned_txt.split():
        # The shortest CAS number (xx-xx-x) is 7 characters long, so
        # replace anything shorter with an empty string.
        pieces.append(bit if len(bit) > 6 else "")
    return " ".join(pieces)
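
One possible tidy-up, offered as a sketch, not a definitive answer, keeps the same strip-then-filter idea but folds the loop into a generator expression. The name remove_alpha_compact is hypothetical. It differs from the function above in one way: short clumps are dropped outright instead of being replaced by empty strings, so the result has no runs of extra spaces.

```python
import re


def remove_alpha_compact(txt):
    """Keep only clumps of digits and hyphens that are 7+ characters long."""
    # Delete every character that is not a digit, a hyphen or a space.
    cleaned = re.sub(r"[^0-9\- ]", "", txt)
    # split() with no argument collapses runs of spaces, and the filter
    # drops clumps shorter than the shortest CAS number (xx-xx-x).
    return " ".join(bit for bit in cleaned.split() if len(bit) > 6)
```

For example, remove_alpha_compact("CAS 50-00-0 is x 12 and 1012300-77-4") returns "50-00-0 1012300-77-4".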


Many thanks for any hints

Cheers

Mike

