[Tutor] Data pattern query.

Alan Gauld alan.gauld at yahoo.co.uk
Mon Jan 7 04:19:58 EST 2019


On 07/01/2019 02:38, mhysnm1964 at gmail.com wrote:

> All the descriptions of the transactions are 
> in a single column. I am trying to work out the 
> easiest method of identifying the same pattern 
> of text in the fields. 

What does a single column mean? That presumably is how it
appears in the spreadsheet? But how is it stored in your
Python code? A list? A list of lists? A dictionary?

We don't know what your data looks like.
Post a sample along with an explanation of how it is
structured.

In general when looking for patterns in text a regular
expression is the tool of choice. But only if you know
what the pattern looks like. Identifying patterns as
you go is a much more difficult challenge.
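
As a rough sketch of the regular expression approach, assuming
(purely for illustration) that each description starts with a
known transaction type followed by the rest of the text. The
sample strings and the pattern are invented, not taken from
your data:

```python
import re

# Hypothetical descriptions; the real data format is not known.
descriptions = [
    "EFTPOS ACME HARDWARE SPRINGFIELD",
    "EFTPOS ACME HARDWARE SHELBYVILLE",
    "DIRECT DEBIT EXAMPLE POWER CO",
]

# Assumed layout: a known transaction type, whitespace, then the remainder.
pattern = re.compile(r"^(EFTPOS|DIRECT DEBIT)\s+(.+)$")

parsed = []
for text in descriptions:
    match = pattern.match(text)
    if match:
        # group(1) is the transaction type, group(2) is everything after it
        parsed.append((match.group(1), match.group(2)))
```

This only works because the pattern is known in advance; a
description that does not fit it is simply skipped.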

> Then I am going to group these vendors by categories. 

And how do you categorize them? Is the category also in
the data or is it some arbitrary thing that you have devised?

> In the field, there is the vendor name, suburb/town, type of transaction, etc.

etc is kind of vague!
Show us some data and tell us which field is which.
Without that it's difficult to impossible to tell you
how to extract anything!

The important thing is not how it looked in the spreadsheet
but how it looks now you have it in Python.

> How can I teach the program to learn new vendor names? 

Usually you would use a set or dictionary and add new
names as you find them.
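
Something like the following, perhaps. The function name
learn_vendor and the normalisation (strip and upper-case) are
my own assumptions for the example, and the vendor names are
made up:

```python
# Hypothetical starting set of vendors already seen.
known_vendors = {"ACME HARDWARE"}

def learn_vendor(name, vendors):
    """Add name to vendors; return True if it was not seen before."""
    # Assumed normalisation so that case and stray spaces do not matter.
    name = name.strip().upper()
    if name in vendors:
        return False
    vendors.add(name)
    return True

first = learn_vendor("Example Power Co", known_vendors)    # new name
second = learn_vendor("example power co ", known_vendors)  # same vendor again
```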

> I was thinking of removing all the duplicate entries 

Using a set would do that for you automatically.
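
For instance, assuming the names are already extracted into a
list (the values here are invented):

```python
# Hypothetical list of vendor names with a repeat in it.
names = ["ACME HARDWARE", "EXAMPLE POWER CO", "ACME HARDWARE"]

# Converting to a set discards the duplicates.
unique_names = set(names)

# Sets are unordered, so sort if a stable order is needed.
ordered = sorted(unique_names)
```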

> Was thinking of using dictionaries for this. 
> But not sure if this is the best approach. 

If you make the vendor name the key of a dictionary
then it has the same effect as using a set. But whether
a set or dict is best depends on what else you need
to store. If it's only the vendor names then a set
is best. If you want to store associated data then
a dict is better.
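
A minimal sketch of the dict version, assuming you want to keep
a category and a transaction count per vendor. The field names,
the record function and the sample values are all hypothetical:

```python
# Vendor name is the key, so each vendor appears only once.
vendors = {}

def record(name, category, table):
    """Store the vendor on first sight and count each occurrence."""
    # setdefault only creates the entry if the key is missing.
    entry = table.setdefault(name, {"category": category, "count": 0})
    entry["count"] += 1

record("ACME HARDWARE", "hardware", vendors)
record("EXAMPLE POWER CO", "utilities", vendors)
record("ACME HARDWARE", "hardware", vendors)
```

The keys behave like a set of vendor names, while the values
hold whatever associated data you need.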

You need to be much more specific about what your
data looks like, how you identify the fields you
want, and how you will categorize them.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



