Question on List processing

Steven D'Aprano steve at pearwood.info
Tue Apr 26 12:29:33 EDT 2016


On Wed, 27 Apr 2016 01:38 am, subhabangalore at gmail.com wrote:


> I am trying to send you a revised example.
> list1=[u"('koteeswaram/BHPERSN engaged/NA ','class1')",
> u"('koteeswaram/BHPERSN is/NA ','class1')"]


Please don't use generic names that mean nothing, like "list1". We can see it
is a list, but what is it for? Use a name that describes the purpose of the
list. Even "input" and "output" are better names.


> [('koteeswaram/BHPERSN engaged/NA ','class1'),
>  ('koteeswaram/BHPERSN is/NA  ','class1')]

What is this? The output? Don't make us guess what things are.

My *guess* is that you have a list of Unicode strings that look like this:

u"('aaa/TAG bbb/TAG ','class1')"

and you want to do six things:

- normalise the string;

- convert the Unicode string to ASCII, ignoring anything that isn't ASCII;

- delete the parentheses in the string;

- delete the leading and trailing single quotes;

- split the string on the comma;

- combine them into a tuple.
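The first two steps are the least obvious. The following is a minimal Python 3 sketch, and the helper name `to_ascii` is hypothetical rather than taken from the post. It shows why normalising with NFKD before encoding matters. NFKD splits "é" into "e" plus a combining accent, so the 'ignore' error handler drops only the accent instead of the whole letter. NFKD also expands compatibility characters such as the "ﬁ" ligature into plain "fi".

```python
import unicodedata

def to_ascii(text):
    # Hypothetical helper: decompose characters (NFKD), then drop any
    # code points that cannot be encoded as ASCII.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(to_ascii(u'caf\u00e9'))                                   # cafe
# Without normalising first, the accented letter is lost entirely:
print(u'caf\u00e9'.encode('ascii', 'ignore').decode('ascii'))   # caf
print(to_ascii(u'\ufb01le'))                                    # file
```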


So let's make some functions:

# Untested (Python 2)
import unicodedata

def remove_parentheses(string):
    # Strip one pair of enclosing parentheses, if present.
    if string.startswith("(") and string.endswith(")"):
        string = string[1:-1]
    return string

def remove_single_quotes(string):
    # Strip one pair of enclosing single quotes, if present.
    if string.startswith("'") and string.endswith("'"):
        string = string[1:-1]
    return string

def convert(string):
    if not isinstance(string, unicode):
        raise TypeError("expected unicode, but got %s"
                        % type(string).__name__)
    # Decompose accented characters, then drop anything that isn't ASCII.
    string = unicodedata.normalize('NFKD', string)
    string = string.encode('ascii', 'ignore')
    string = remove_parentheses(string)
    # Assumes exactly one comma separates the two parts.
    first_part, second_part = string.split(",")
    first_part = remove_single_quotes(first_part)
    second_part = remove_single_quotes(second_part)
    return (first_part, second_part)


input = [ ... ]  # your input strings
output = []
for string in input:
    output.append(convert(string))
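The code above targets Python 2, where `unicode` is a separate type and `encode()` returns a byte string. What follows is a minimal sketch of the same approach adapted for Python 3, an adaptation that is not part of the original reply. Here `str` is already Unicode and `encode()` returns `bytes`, so the sketch decodes back to `str` so that `startswith()` and `split()` keep working with string arguments. It runs the two sample strings from the question. The name `tagged_sentences` is a hypothetical choice that avoids shadowing the built-in `input`.

```python
import unicodedata

def remove_parentheses(string):
    # Strip one pair of enclosing parentheses, if present.
    if string.startswith("(") and string.endswith(")"):
        string = string[1:-1]
    return string

def remove_single_quotes(string):
    # Strip one pair of enclosing single quotes, if present.
    if string.startswith("'") and string.endswith("'"):
        string = string[1:-1]
    return string

def convert(string):
    if not isinstance(string, str):
        raise TypeError("expected str, but got %s" % type(string).__name__)
    string = unicodedata.normalize('NFKD', string)
    # In Python 3, encode() returns bytes; decode() turns them back into str.
    string = string.encode('ascii', 'ignore').decode('ascii')
    string = remove_parentheses(string)
    # Assumes exactly one comma separates the two parts.
    first_part, second_part = string.split(",")
    return (remove_single_quotes(first_part), remove_single_quotes(second_part))

tagged_sentences = [u"('koteeswaram/BHPERSN engaged/NA ','class1')",
                    u"('koteeswaram/BHPERSN is/NA ','class1')"]
output = [convert(s) for s in tagged_sentences]
print(output)
# [('koteeswaram/BHPERSN engaged/NA ', 'class1'), ('koteeswaram/BHPERSN is/NA ', 'class1')]
```

If a tagged sentence can itself contain a comma, `split(",")` will produce more than two parts and the unpacking will fail. In that case, `rsplit(",", 1)` splits only on the last comma.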





-- 
Steven




More information about the Python-list mailing list