A fun python CLI program for all to enjoy!

DFS nospam at dfs.com
Fri May 6 19:12:41 EDT 2016


On 5/6/2016 4:30 PM, MRAB wrote:
> On 2016-05-06 20:10, DFS wrote:
>> getAddresses.py
>>
>> Scrapes addresses from www.usdirectory.com and stores them in a SQLite
>> database, or writes them to text files for mailing labels, etc
>>
>> Now, just by typing 'fast food Taco Bell <city> 10 db all' you can find
>> out how many Taco Bells are within 10 miles of you, and store all the
>> addresses in your own address database.
>>
>> No more convoluted Googling, or hitting the 'Next Page' button, or
>> fumbling with the Yellow Pages...
>>
>> Note: the db structure is flat on purpose, and the .csv files aren't
>> quote delimited.
>>
>> Put the program in its own directory.  It creates the SQLite database
>> there, and writes files there, too.
>>
>> Reviews of code, bug reports, criticisms, suggestions for improvement,
>> etc are all welcome.
>>
> OK, you asked for it... :-)
>
> 1. It's shorter and clearer not to compare with True or False:
>
>        if verbose:
>
>    and:
>
>        if not dupeRow:


Done.  It will take some getting used to, though.  I like that it's 
shorter, but I could do the same in VBA and almost always chose not to.



> 2. You can print a blank line with an empty print statement:
>
>        print

Done.  I actually like the way print  looks better than print ""



> 3. When looking for unique items, a set is a better choice than a list:
>
>        addrCheck = set()
>
>        def addrUnique(addr):
>            if addr not in addrCheck:
>                x = True
>                addrCheck.add(addr)
>            else:
>                x = False
>            return x

Done.

I researched this just now on StackOverflow:

"Sets are significantly faster when it comes to determining if an object 
is present in the set"
and
"lists are very nice to sort and have order while sets are nice to use 
when you don't want duplicates and don't care about order."

The speed difference won't matter here in my little app, but it's better 
to use the right construct for the job.



> 4. Try string formatting instead multiple concatenation:
>
>        print "%s arguments" % argCnt


You're referring to this line:
print str(argCnt) + " arguments"

Is there a real benefit of using string formatting here?  (other than 
the required str() conversion)



> 5. Strings have a .join method, and when you combine it with string
> slicing:
>
>        keyw = "+".join(sys.argv[1 : argCnt - 5])


Slick.  Works a treat, and saved 2 lines of code.  String handling is 
another area in which python shines compared to VB.


> 6. Another example of string formatting:
>
>        search = "%s %s %s %s %s" % (keyw, cityzip, state, miles, addrWant)

Done.  It's shorter, and doesn't require the str() conversion I had to 
do on several of the items.

If I can remember to use it, it should eliminate these:
"TypeError: cannot concatenate 'str' and 'int' objects"



> 7. It's recommended to use the 'with' statement when handling files:
>
>        with open(webfile, "w") as f:
>            if store == "csv":
>                f.write("Name,Address,CityStateZip\n")


Done.  I read that using 'with' means Python closes the file even if an 
exception occurs.  So a definite benefit.



>    If you don't want to use the 'with' statement, note that closing the
> file is:
>
>            f.close()
>
>    It needs the "()"!

I used close() in 1 place, but close without parens in 2 other places. 
So it works either way.  Good catch.

(it's moot now: all 'f.open()/f.close()' replaced by 'with open()')



> 8. When using SQL, you shouldn't try to insert the values yourself; you
> should use parametrised queries:
>
>        cSQL = "DELETE FROM addresses WHERE datasrc = ? AND search = ?;"
>        if verbose:
>            print cSQL
>        db.execute(cSQL, (datasrc, search))
>        conn.commit()
>
>     It'll insert the values where the "?" are and will do any necessary
> quoting itself. (Actually, some drivers use "?", others use "%s", so if
> it doesn't work with one, try the other.)
>
>     The way you wrote it, it would fail if a value contained a "'". It's
> that kind of thing that leads to SQL injection attacks.

Fixed.

You'll notice later on in the code I used the parameterized method for 
INSERTS.  I hate the look of that method, but it does make dealing with 
apostrophes easier, and makes it safer as you say.




Thanks for the code review, RMAB.  Good improvements.





More information about the Python-list mailing list