Encryption - was Hello World

Chris Angelico rosuav at gmail.com
Mon Dec 22 17:29:14 EST 2014


On Tue, Dec 23, 2014 at 6:57 AM, Dave Angel <d at davea.name> wrote:
> I figure I must be misunderstanding something in your explanation, since a
> brute-force password guesser would seem to only need four billion tries to
> (probably) crack that.
>
> 1) Are you assuming that the cracker can read the source code, but cannot
> modify the version of the code that is running?
>
> 2) Are you really doing something equivalent to:
>
> test = time_calc() - get a one-byte byte-string according to hour of the
> week
> encoded_pw = hash(password)
> if encoded_pw.startswith(test*2) and encoded_pw.endswith(test*2):
>       ---passed---
>
> I can understand that being sufficiently obscure for the pointy haired boss,
> but I figure I've got to be missing something.  A quick test with 3.2 shows
> that around a million hashes can be generated per second, so checking four
> billion is only an hour or so.  Since some of them will collide, that gives
> us something better than 50% likelihood of having found a useful pw in an
> hour.  But a few more hours and we'll most likely have it.
>
> For that matter, you must have already written such a pw finder.
>
> I'm back to figuring I'm misunderstanding what you're saying.

No, actually you're understanding that fairly well. Of course, I
didn't share the password finder script.

The code was similar in functionality to what you describe, but it
used a more obscure coding style so it wasn't obvious. Imagine using a
regex to verify that part of the hash. (It wasn't actually a regex,
but it wasn't Python either, and the significance is that it was
obfuscated code.) I don't remember exactly which hashing algorithm I
was using for this, but the password finder took about a week (running
roughly eight hours a day, while I was there) to cover most of the
required passwords.

As to the assumptions... uhh... that was never something I really
understood. I think you're probably right, and this was part of the
paranoia of "my code might be stolen". You're attempting to attribute
a level of logic to the requirements which has no supporting evidence
:)

But what you've proven above is how ineffective this technique is at
keeping out a determined, and mathematically-adept, attacker. Yaknow,
*real* security. This code was *extremely* effective at satisfying my
boss. As I said, he wasn't satisfied with the idea of just embedding a
SHA256 hash into the code; I would have used an XKCD 936 compliant
password, so brute-forcing that would take (assuming your
million-hashes-per-second figure) about a year, and that assuming the
attacker knew my exact style.

Aside: XKCD 936 overestimates the time to generate guesses (1000/sec),
which presumably means it's not talking about reversing a hash, but
attempting some other attack. (Not a big deal, since the same figure
is used for both types of password.) But it also underestimates the
password entropy of four words. Let's see. First off, a 4K corpus
isn't that hard to work with, so that potentially gives you another
four bits of entropy; in /usr/share/dict/words I have 72861 words with
no capital letters, punctuation, etc, so it wouldn't be unreasonable
to push that up even to 16 bits per word (which sounds weird, worded
like that), raising the total entropy from 44 bits to 64. And there's
no guarantee that one person's corpus will exactly match another's.
Plus, you might and might not capitalize the first letters of the
words (another bit), and you could run them together with no
punctuation, or use any common punctuation to separate them (space, or
"-:,./\" - eight easy options, 3 bits). So in theory, an attacker
might know that you're using an XKCD 936 password, but there could
still be up to 68 bits of entropy, *easily*. Even in a dedicated
personal attack, the estimate of 44 bits would be an absolute minimum,
and it's likely to cost rather more than that.

ChrisA



More information about the Python-list mailing list