PyPI password rules

Skip Montanaro skip at pobox.com
Thu Aug 28 00:28:21 EDT 2014


On Wed, Aug 27, 2014 at 10:32 PM, Chris Angelico <rosuav at gmail.com> wrote:

> I'm not sure I understand how your 'common' value works, though. Does
> the default 0.6 mean you take the 60% most common words? Those above
> the 60th percentile of frequency? Something else?
>

Yes, basically. A word has to pass the following hurdles before being
deemed "common":

* length >= 4
* all lower case
* no punctuation
* not already "emitted" (made it to the common list)
* seen this word at least 10 times
* have seen at least 100 words

Then and only then, if its word count places it in the top T percent of all
seen words (T defaults to 60%), is it added to the "emitted" or common word
list. Only words in that list are chosen as password material. Further, the
dict command allows you to identify words in the common list which aren't
in your computer's words file. You can give any of them (or any other word
you don't like) as arguments to the "bad" command.
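
To make the hurdles above concrete, here is a minimal sketch of how such a filter might look. It is not the actual polly code: the names `is_candidate`, `in_top_fraction` and `consider` are hypothetical, "have seen at least 100 words" is assumed to mean 100 distinct words, and "top T percent" is assumed to mean ranking distinct words by count and keeping the first T fraction of that ranking.

```python
import string
from collections import Counter

MIN_LENGTH = 4          # length >= 4
MIN_SEEN = 10           # seen this word at least 10 times
MIN_TOTAL_WORDS = 100   # assumption: at least 100 distinct words seen


def is_candidate(word, counts, emitted):
    """Apply the fixed hurdles from the list above."""
    return (len(word) >= MIN_LENGTH
            and word.islower()
            and not any(ch in string.punctuation for ch in word)
            and word not in emitted
            and counts[word] >= MIN_SEEN
            and len(counts) >= MIN_TOTAL_WORDS)


def in_top_fraction(word, counts, threshold=0.6):
    """Assumed reading of 'top T percent': rank by count, keep the top slice."""
    ranked = [w for w, _ in counts.most_common()]
    cutoff = int(len(ranked) * threshold)
    return word in ranked[:cutoff]


def consider(word, counts, emitted, threshold=0.6):
    """Count one occurrence of word; emit it if it clears every hurdle."""
    counts[word] += 1
    if is_candidate(word, counts, emitted) and in_top_fraction(word, counts, threshold):
        emitted.add(word)
        return True
    return False
```

With `counts = Counter()` and `emitted = set()`, calling `consider()` once per word in each message would grow `emitted` over time. Lowering `threshold` in this sketch shrinks the ranked slice, so whether a lower T enlarges or shrinks the pool depends on how the real code defines the cutoff.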

I won't pretend to understand all that entropy stuff, and I realize that
given my 35k+ messages and my somewhat severe constraints, I have only
deemed 1057 words from my corpus as "worthy" so far. That's about 10 bits
of entropy per word? That obviously improves the chances my passwords can
be guessed, but I suspect I can lower my T value sufficiently to increase
the pool of candidate words to whatever amount of entropy you require. I
agree though, it is a bit backwards from how the XKCD 936 thing works.
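
The "about 10 bits" estimate can be checked with a few lines of Python. This is only a sketch under the usual assumptions: each word is drawn uniformly and independently from the pool, and the attacker knows the pool. The function names are made up for illustration.

```python
import math


def bits_per_word(pool_size):
    """Entropy in bits of one word drawn uniformly from pool_size words."""
    return math.log2(pool_size)


def passphrase_bits(pool_size, words=4):
    """Entropy of a passphrase of independently chosen words."""
    return words * bits_per_word(pool_size)


print(round(bits_per_word(1057), 2))    # 10.05
print(round(passphrase_bits(1057), 2))  # 40.18
print(passphrase_bits(2048))            # 44.0
```

So 1057 words gives roughly 10 bits per word and about 40 bits for four words. A 2048-word pool gives 11 bits per word and 44 bits for four words, which is the figure used in XKCD 936.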

I just realized something. To keep it from taking forever to start up
before I had a pickle save file, I limited the messages to those since
2014-08-22. Not too many. Not sure how to deal with that, but for the
moment, I initialize Polly.latest to 2014-05-01 in my sandbox (not checked
in). That will considerably increase the number of messages scanned. While
it's doing that (in a separate thread), I can watch the progress with the
stat command at the ? prompt:

? stat
messages: 0
all words: 0
common words: 0
'bad' words: 0
... time passes ...
?
messages: 716
all words: 4532
common words: 725
'bad' words: 0
? bad flaskapp luofeiyu lilypond
? stat
messages: 1013
all words: 5637
common words: 994
'bad' words: 3
?
messages: 1361
all words: 6545
common words: 1251
'bad' words: 3
? password
formatted overlap relation itself

... and so on. After a while I should get near a 2000-word common set.
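
For illustration, the password command could be approximated along these lines. This is a guess, not the real implementation: `make_password` is a hypothetical name, the count of four words is inferred from the sample output above, and words are drawn with replacement using the operating system's random source.

```python
import random


def make_password(common_words, n=4, rng=None):
    """Join n words picked uniformly, with replacement, from the common list."""
    rng = rng or random.SystemRandom()  # OS-backed randomness, not the default PRNG
    pool = sorted(common_words)         # fixed order so the choice depends only on rng
    return " ".join(rng.choice(pool) for _ in range(n))
```

Passing the set of emitted common words, minus anything flagged with the "bad" command, would give output shaped like the four-word line shown in the transcript.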

Hmmm... I realize now that I'm not seeing all messages, at least I don't
think so. So much to learn about IMAP...

Skip