Creating a dictionary from a .txt file

Roy Smith roy at panix.com
Sun Mar 31 14:41:41 EDT 2013


In article <mailman.4023.1364751102.2939.python-list at python.org>,
 Dave Angel <davea at davea.name> wrote:

> On 03/31/2013 12:52 PM, C.T. wrote:
> > On Sunday, March 31, 2013 12:20:25 PM UTC-4, zipher wrote:
> >>  <SNIP>
> >>
> >
> > Thank you, Mark! My problem is the data isn't consistently ordered. I can 
> > use slicing and indexing to put the year into a tuple, but because a car 
> > manufacturer could have two names (e.g., Aston Martin) or a car model could
> > have two names (e.g., Iron Duke), it's harder to use slicing and indexing for
> > those two.  I've added the following, but the output is still not what I 
> > need it to be.
> 
> So the correct answer is "it cannot be done," and an explanation.
> 
> Many times I've been given impossible conditions for a problem.  And 
> invariably the correct solution is to press [back] on the supplier of the 
> constraints.

In real life, you often have to deal with crappy input data (and bogus 
project requirements).  Sometimes you just need to be creative.

There's only a small set of car manufacturers.  A good start would be 
mining Wikipedia's [[List of automobile manufacturers]].  Once you've 
got that list, you could try matching portions of the input against 
it.
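
To make that concrete, here is a minimal sketch of the matching step.
The manufacturer list, the sample records, and the parse_car() helper
are all made up for illustration; it assumes each record holds a
four-digit year somewhere, plus a manufacturer name and a model.

```python
import re

# Hypothetical sample list; a real one could be built from
# Wikipedia's [[List of automobile manufacturers]].
MANUFACTURERS = ["Aston Martin", "Pontiac", "Ford", "DeLorean"]

# Assumption: years are four digits starting with 19 or 20.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def parse_car(line, manufacturers=MANUFACTURERS):
    """Return (year, manufacturer, model), or None if no maker matches."""
    year_match = YEAR_RE.search(line)
    year = int(year_match.group()) if year_match else None
    # Drop the year, wherever it appears, and normalize whitespace.
    rest = " ".join(YEAR_RE.sub(" ", line).split())
    # Try longer names first so a multi-word maker wins over a shorter one.
    for maker in sorted(manufacturers, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(maker) + r"\b", re.IGNORECASE)
        m = pattern.search(rest)
        if m:
            # Whatever is left after removing the maker is taken as the model.
            leftover = rest[:m.start()] + " " + rest[m.end():]
            return year, maker, " ".join(leftover.split())
    return None


if __name__ == "__main__":
    for record in ["1964 Aston Martin DB5", "Pontiac Iron Duke 1980"]:
        print(parse_car(record))
```

Anything that comes back as None is a record worth eyeballing by hand,
or a manufacturer missing from the list.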

Depending on how much effort you wanted to put into this, you could 
explore all sorts of fuzzy matching (e.g., "delorean" vs "delorean motor 
company"), but even a simple search is better than giving up.
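
As one possible starting point for the fuzzy route, the standard
library's difflib can do a rough job before you reach for anything
third-party.  This is only a sketch: the canonical names, the
closest_manufacturer() helper, and the 0.5 cutoff are assumptions
picked for illustration (the default cutoff of 0.6 would reject
"delorean" vs "delorean motor company").

```python
import difflib

# Hypothetical canonical names, for illustration only.
CANONICAL = ["Aston Martin", "DeLorean Motor Company", "Ford Motor Company"]


def closest_manufacturer(name, canonical=CANONICAL, cutoff=0.5):
    """Return the canonical name most similar to `name`, or None."""
    # Compare case-insensitively, but hand back the original spelling.
    lowered = {c.lower(): c for c in canonical}
    hits = difflib.get_close_matches(
        name.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None


if __name__ == "__main__":
    print(closest_manufacturer("delorean"))
    print(closest_manufacturer("aston martn"))
```

A lower cutoff catches more abbreviations but also more false
positives, so it's worth tuning against a sample of the real data.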

And this is a good excuse to explore some of the interesting 
third-party modules.  For example, mwclient ("pip install mwclient") 
gives you a neat Python interface to Wikipedia.  And there's a whole 
landscape of string matching packages to explore.

We deal with this every day at Songza.  Are Kesha and Ke$ha the same 
artist?  Pushing back on the record labels to clean up their catalogs 
isn't going to get us very far.


