How to print all expressions that match a regular expression

Alf P. Steinbach alfps at start.no
Sat Feb 6 23:38:38 EST 2010


* hzhuo1 at gmail.com:
>> So it seems we both misunderstood the problem.
>>
>> I didn't read the top level article until now, and reading it, I can't make
>> sense of it.
>>
> 
> [1] Seems that you should read the whole thing before making a post, or
> else you cannot know what we are talking about.
> Steven doesn't misunderstand me. We are talking about what I need, and
> he tries to help.

If you were not misunderstood then you've posted a question for which there's no 
practical solution.


>>> "Given the function hashlib.sha256, enumerate all the possible inputs
>>> that give the hexadecimal result
>>> 0a2591aaf3340ad92faecbc5908e74d04b51ee5d2deee78f089f1607570e2e91."
>> I tried some "parrot" variants but no dice. :-(
>>
>> [snip]
>>
> 
> This is a hash collision problem. Nobody has proved that SHA-256 is
> collision free, even not in the random oracle model, because people
> always suppose that a random oracle exists, and make hash function its
> substitution. That means it may be broken someday. And any provable
> security based on random oracle model is not secure.

Stephen's little challenge wasn't about breaking SHA-256 but about guessing his 
secret phrase, given his clues.


>>> I'm suggesting that, in general, there's no way to tell in advance which
>>> regexes will be easy and which will be hard, and even when they are easy,
>>> the enumeration will often be infinite.
> 
> It is hard to tell in advance.

No, it's trivial.


> [2] However, we can add some timing limit
> or counting limit, to make it an algorithm, which can halt. For
> example, whenever the program outputs more than 1000000 expressions
> that match the input regex, we can halt because that exceeds our
> limit. But surely this is not efficient because of the post-decision.

You don't need to wait for that output to complete. You can calculate the number 
of strings up front. Like it appears that you do below:


>> Essentially, any regexp that includes '+' or '*' (directly or via e.g. notation
>> that denotes "digit sequence") yields an infinite number of strings.
> 
> Infinity is really relative, not absolute. It is relative to the
> computing speed. For example, the regex '^[0|1]{2048}$' is rather
> simple and doesn't contain '+' or '$', but trying to output all
> expressions that match it has a complexity of 2^2048. If we can do
> that, then we can break RSA-2048.
> We must face the reality .

Here it seems that you have no problem calculating number of combinations, yet 
above, at the paragraph marked [2], you talk about waiting for a million strings 
to be output before seeing that it's too much, and earlier, at the paragraph 
marked [1], you maintain that your original question about generating all such 
strings (completely impractical) was what you wanted help with?

I'm sorry but I can't make sense of this; it appears to be meaningless.

Perhaps if you tried to clarify the requirements a bit.


Cheers & hth.,

- Alf



More information about the Python-list mailing list