Searching for lots of similar strings (filenames) in sqlite3 database

Adam Funk a24061 at ducksburg.com
Wed Jul 2 08:30:08 EDT 2014


On 2014-07-02, Chris Angelico wrote:

> On Wed, Jul 2, 2014 at 7:32 PM, Adam Funk <a24061 at ducksburg.com> wrote:
>> Well, I've changed it to the following anyway.
>>
>>         subdir_glob = subdir + '/*'
>>         cursor.execute('SELECT filename FROM files WHERE filename GLOB ?',
>>                        (subdir_glob,))
>>         rows = cursor.fetchall()
>>         known_files = {row[0] for row in rows}
>>
>> I see what you mean about paths containing '%', but I don't see why
>> you were concerned about underscores, though.
>
> With GLOB, presumably ? matches a single character and * matches any
> number of characters. With LIKE, _ matches a single character and %
> matches any number. So, for instance, WHERE filename LIKE
> '/foo/bar/spam_spam/%' will match '/foo/bar/spam2spam/1234', which may
> be a little surprising. It's not going to be a serious problem in most
> cases, as it'll also match '/foo/bar/spam_spam/1234', but the false
> positives will make one of those "Huhhhhh????" moments if you don't
> keep an eye on your magic characters.
>
> In your specific case, you happen to be safe, but as I look over the
> code, my paranoia kicks in and tells me to check :) It's just one of
> those things that flags itself to the mind - anything that might help
> catch bugs early is a good feature of the mind, in my opinion!

Oh, I'd just missed the '_' in the LIKE documentation.  Doh!


-- 
Indentation is for enemy skulls, not code!
            --- Klingon Programmer's Guide



More information about the Python-list mailing list