From frank.siebenlist at gmail.com Fri Jul 1 12:11:10 2016
From: frank.siebenlist at gmail.com (Frank Siebenlist)
Date: Fri, 1 Jul 2016 09:11:10 -0700
Subject: [Cryptography-dev] "intrinsic" symmetric key identifier?
Message-ID:

Many times you will have two parties with a shared symmetric key that they will use to communicate authenticated and private messages to each other. If you have multiple keys, then you somehow have to match the key to the received message based on the context, the sender, or some key identifier that both parties associate with the used key.

I'm looking for a good symmetric key identifier to use without the need for context or any pre-shared key identifier: some standardized way to derive a key-id from the key itself, such that both parties can derive it independently without any pre-shared key-specific knowledge. Of course that key identifier shouldn't reveal anything that could compromise the key itself.

I haven't been able to find a well-established way to achieve this (yet)...

One possible solution could be to just take the sha256 of the key. As long as the key is truly random... that should be ok (?). It could conflict with possible derived keys that are generated that way.

Or maybe use one of the available KDFs? Those should be one-way functions that wouldn't leak anything (?). Maybe use a well-known nonce to avoid any possible collisions with derived keys.

Any suggestions? Anything I missed?

Regards, Frank.

From _ at lvh.io Fri Jul 1 12:51:12 2016
From: _ at lvh.io (lvh)
Date: Fri, 1 Jul 2016 11:51:12 -0500
Subject: [Cryptography-dev] "intrinsic" symmetric key identifier?
In-Reply-To:
References:
Message-ID: <749A9BBD-6191-46E6-BF2B-4134FC37A27B@lvh.io>

Hi Frank,

> On Jul 1, 2016, at 11:11 AM, Frank Siebenlist wrote:
>
> snip snip key identifiers

This is why some key derivation functions and PRFs have "purpose" or "info" fields, yes; including BLAKE2 and HKDF.
Deriving a lesser key (which might just be a keyid) is a perfectly valid strategy from objcap practice. I'm doing something similar as part of a larger semiprivate key scheme using libsodium. You probably do want something that explicitly supports that instead of just implicitly picking a particular nonce or whatever - I'm not sure which nonce you're referring to, I don't think the systems you mentioned take one. TL;DR: make the derivation completely distinct based on what you're deriving and why you're deriving it :)

You might also want to look at the related concept of NMR and key-wrap, which might let you solve the problem at a slightly different part of your protocol; essentially giving you a protected key with associated data about that key. It's not entirely clear what the people standardizing GCM-SIV want to do exactly (other than "not TLS", I don't think they've said), but this is the obvious choice, especially given GCM-SIV's separate code path for tiny messages and the historical linking of the two from a crypto design perspective.

I am also writing NMR stuff on the side in libsodium/caesium, but that focuses mostly on being a Fernet replacement, rather than a keywrap, using secretbox (which makes it easy because big nonce space). Pretty sure I can translate it to the AEAD schemes, but the security proof gets iffier. Which reminds me: we should talk about Clojure bindings to libsodium some time :)

lvh

From frank.siebenlist at gmail.com Fri Jul 1 13:54:44 2016
From: frank.siebenlist at gmail.com (Frank Siebenlist)
Date: Fri, 1 Jul 2016 10:54:44 -0700
Subject: [Cryptography-dev] "intrinsic" symmetric key identifier?
In-Reply-To: <749A9BBD-6191-46E6-BF2B-4134FC37A27B@lvh.io>
References: <749A9BBD-6191-46E6-BF2B-4134FC37A27B@lvh.io>
Message-ID:

Hi lvh,

Guess you're the "lvh" who is responsible for "lvh/caesium" ;-).

Good to see that you've reanimated that project!
Believe you were kind of distracted for awhile, which "forced" me to play around with "franks42/naclj"... which has been on life-support for about a year now, because my new job consumes even my playtime.

As part of that "franks42/naclj" effort, I suggested to standardize the derivation of a kid from the two curve25519 public keys. However, I recognize that you do not always have any DH-keys available when you have a bare symmetric key, so I suggested a scheme based on blake2. I wrote up some rationale for those choices here: "https://github.com/franks42/naclj/blob/master/Keys%2C%20IDs%2C%20and%20URNs.md", but never got much traction on the libsodium list,... and then I got distracted.

Now I'm faced again with similar key-management issues, which could benefit from such key-derived kid's - so I try again.

In summary, your suggestions all resonate very well, but... there are too many of them. Let's just pick one identifier derivation mechanism for symmetric keys, document it, implement it, use it!

Groetjes, Frank.

> _______________________________________________
> Cryptography-dev mailing list
> Cryptography-dev at python.org
> https://mail.python.org/mailman/listinfo/cryptography-dev

From _ at lvh.io Fri Jul 1 18:53:12 2016
From: _ at lvh.io (lvh)
Date: Fri, 1 Jul 2016 17:53:12 -0500
Subject: [Cryptography-dev] "intrinsic" symmetric key identifier?
In-Reply-To:
References: <749A9BBD-6191-46E6-BF2B-4134FC37A27B@lvh.io>
Message-ID: <0D20123F-F12B-4234-84DF-E9AECE6E31C9@lvh.io>

> On Jul 1, 2016, at 12:54 PM, Frank Siebenlist wrote:
>
> Hi lvh,
>
> Guess you're the "lvh" who is responsible for "lvh/caesium" ;-).

Yup. I'm also a founding member of PyCA and the resident cryptographer, which is why I'm on this list :-)

> Good to see that you've reanimated that project! Believe you were kind of
> distracted for awhile, which "forced" me to play around with
> "franks42/naclj"...
> which has been on life-support for about a year
> now, because my new job consumes even my playtime.

It did what I needed it to do at the time, so I didn't fix what wasn't broken ;-) I don't recall anyone reaching out or filing issues. Once someone did ask questions and contributed code, I was happy to merge/review/cut new releases/do new development. More dev is happening now to scratch my own itch :)

Currently I'm doing a lot of work around NMR as mentioned before, and API design around e.g. different byte buffer types, so that for example you can efficiently dump a nonce and a ciphertext in the same buffer, or derive multiple keys in one iteration of BLAKE2, etc. Also a bunch of work around e.g. pinning and verification of the produced binding and related benchmarking :)

I invite you to look at caesium again, because some of the criticisms you make in naclj's README no longer apply (e.g. caesium no longer uses kalium and instead binds libsodium directly, albeit for a different reason than what naclj mentions). Because the binding is done in Clojure, it can do all sorts of metaprogramming including binding every permutation of a particular method for various byte types in addition to the inspection mentioned above, e.g.:

https://github.com/lvh/caesium/blob/master/src/caesium/binding.clj#L56-L62

Do you intend to continue to develop naclj, or is it effectively retired?

> As part of that "franks42/naclj" effort, I suggested to standardize
> the derivation of a kid from the two curve25519 public keys. However,
> I recognize that you do not always have any DH-keys available when you
> have a bare symmetric key,

Is that scheme documented anywhere? I wonder what the use case is for two curve25519 pubkeys - the "obvious" case would seem to easily degenerate to the shared symmetric secret (after doing a DH exchange).

> so I suggested a scheme based on blake2.
> I wrote up some rationale for those choices here:
> "https://github.com/franks42/naclj/blob/master/Keys%2C%20IDs%2C%20and%20URNs.md",
> but never got much traction on the libsodium list,... and then I got
> distracted.
>
> Now I'm faced again with similar key-management issues, which could
> benefit from such key-derived kid's - so I try again.
>
> In summary, your suggestions all resonate very well, but... there are
> too many of them. Let's just pick one identifier derivation mechanism
> for symmetric keys, document it, implement it, use it!

I think there are a few problems preventing this from happening right now, including:

- Historically, cryptographers have not researched key wrap anywhere near as much as other schemes. I think the only reason it's en vogue now is the interest in NMR, which at least a handful of cryptographers (Rogaway, Krovetz, and humbly, myself) care about now, and is incidentally a related problem.
- People want subtly different things for their protocols, further reducing interest. Do you just want to identify a key? That's fine, but a problem many protocols dodge. Do you want to ship a key to someone who already has a secret or asymmetric key? AEAD (including NMR AEAD in particular, so key wrap) and just asymmetric encryption (a la non-PFS TLS or GPG) is probably where you're going to land.
- How does this fit in a grander protocol and what is that protocol trying to accomplish?
- How is the key identifier authenticated? What prevents Mallory from just modifying the key id bytes to effectively deny service? E.g. if I'm doing this to make sure I can rotate keys effectively, how do I auth that? Ideally without replacing an unrotatable secret key with another unrotatable secret key :D (Effective key rotation for KEKs is definitely something I care about.)
- When keys are being sent alongside messages, how do we make this not a footgun for e.g. key selection attacks?
  (Granted, harder for EdDSA, but I want protocols to be correct for arbitrary schemes :)). PyCA cares about recipes being not footguns. That's a mixed bag: on the one hand, it means we can give safe advice, on the other hand, it does mean that all we have is Fernet...

Overall, I think this is a reasonable idea for some protocols, but I think we need to be extremely clear about what that is, who it's for, and how to use it.

lvh

From _ at lvh.io Fri Jul 1 18:56:33 2016
From: _ at lvh.io (lvh)
Date: Fri, 1 Jul 2016 17:56:33 -0500
Subject: [Cryptography-dev] "intrinsic" symmetric key identifier?
In-Reply-To: <0D20123F-F12B-4234-84DF-E9AECE6E31C9@lvh.io>
References: <749A9BBD-6191-46E6-BF2B-4134FC37A27B@lvh.io> <0D20123F-F12B-4234-84DF-E9AECE6E31C9@lvh.io>
Message-ID: <74CD0FB6-06AD-4601-BEEE-34F27BCCDBB6@lvh.io>

esprit de l'escalier: there's also the difference between public-parameter hashes and a PRF, and BLAKE2 will do both for you. So, are you trying to identify a key in such a way that Eve can not detect the key being reused (but Bob shares a key with you), or is that OK?

lvh

From frank.siebenlist at gmail.com Sat Jul 2 19:52:27 2016
From: frank.siebenlist at gmail.com (Frank Siebenlist)
Date: Sat, 2 Jul 2016 16:52:27 -0700
Subject: [Cryptography-dev] "intrinsic" symmetric key identifier?
In-Reply-To: <0D20123F-F12B-4234-84DF-E9AECE6E31C9@lvh.io>
References: <749A9BBD-6191-46E6-BF2B-4134FC37A27B@lvh.io> <0D20123F-F12B-4234-84DF-E9AECE6E31C9@lvh.io>
Message-ID: <5471F776-D4B0-4941-9FAB-96341A530A55@gmail.com>

Hi Laurens,

I'm afraid that I have not been very good at explaining my use case, because the questions you ask point at more complicated solutions than I thought were necessary.

The aim is to find the most convenient symmetric key identifier to embed in a cipher message that would require the minimum amount of key management.

What is best depends on the context. Sometimes it's easy because there is only one key, or the security context is so unambiguous that associating the right key is trivial. Other times it's a bit more challenging. We have an existing application with tens of long-lived keys, and the current key-management complicates key-rotation and upgrades to modern algos and such.

If both Alice and Bob can generate key identifiers (kid's) directly from the key that they share, i.e. derive them from the symmetric key, then there is no need to exchange or agree upon a name for that key as it would be kind of a "true name" (read Vinge if you haven't ;-) ). The parties only have to agree on the key identifier derivation method.

For example, Alice and Bob could agree to name their symmetric keys by taking the sha256 of that key's bytes, base64url-encoding the hash, and representing it as a urn, like "urn:s256:V2jyhd8tX-19vpEhyrDzIHgUYyDA5MS1Qi71iw1SUP0". This would allow both parties to maintain their own key-db with (kid, key) associations.
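
The sha256-based naming described above can be sketched in a few lines of Python, using only the standard library. This is an illustration of the idea and not a vetted recipe: the function name `sha256_kid` is made up, and stripping the base64 `=` padding is an assumption inferred from the 43-character example identifier.

```python
import base64
import hashlib
import os


def sha256_kid(key: bytes) -> str:
    """Name a symmetric key by the SHA-256 of its bytes, formatted as a urn.

    Hypothetical helper. Only sensible for uniformly random keys: a
    low-entropy key could be recovered by hashing candidate values.
    """
    digest = hashlib.sha256(key).digest()
    # base64url without '=' padding: 32 bytes encode to 43 characters.
    b64 = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "urn:s256:" + b64


# Each party builds its own (kid -> key) table independently.
key = os.urandom(32)
key_db = {sha256_kid(key): key}

# The receiver treats the kid carried in a message as a lookup hint only;
# the message itself still has to authenticate under the selected key.
kid_from_message = sha256_kid(key)
assert key_db[kid_from_message] == key
```

Both parties compute the same identifier without exchanging anything beyond the key itself, which is the property asked for at the start of the thread.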
Embedding the kid in the exchanged cipher messages would allow both parties to easily find the key to decrypt the received message.
(very much like we often use the hash of the public key (or pk-cert) to identify the private key to decrypt)

The kid embedded in the cipher message is no more than a "hint". It could be signed as part of the whole cipher message, but its integrity can only be confirmed after the message is decrypted & authenticated. Changing the kid in a cipher message results in DoS, but so would flipping any other bit in that message.

In its most simple form, I believe that the kid-derivation could be a sha2 of the key as long as the key is "truly" random. The only concern may be that some use a simple hash of the key for key derivation...(?). To avoid any of those usage collisions, you could define the convention of prepending the key with some publicly known constant, like b'pre-kid-constant' or fancier.

If one believes that a simple sha2 hash is only borderline secure enough (?), then maybe use a CMAC or HMAC, where you use the key on the key-value itself, and the resulting tag would constitute the identifier. (I did something like that in franks42/naclj with blake2)

Or use HKDF, with maybe a kid-derivation-specific constant for the salt, a kid-specific info value, and a sufficient length of the resulting key, i.e. identifier, that makes everybody happy.

Maybe something based on HKDF would be best (?).

Hope this additional explanation helps.

Thanks, Frank.

PS. Don't believe I will resurrect that franks42/naclj - I'll add a note about deprecation and send them to your effort - it was a good experience learning about Curve/Ed25519 and the nacl/libsodium code though - also trying to keep all data structures as immutable as possible was a good exercise.

From frank.siebenlist at gmail.com Mon Jul 4 14:24:54 2016
From: frank.siebenlist at gmail.com (Frank Siebenlist)
Date: Mon, 4 Jul 2016 11:24:54 -0700
Subject: [Cryptography-dev] "intrinsic" symmetric key identifier?
In-Reply-To: <5471F776-D4B0-4941-9FAB-96341A530A55@gmail.com>
References: <749A9BBD-6191-46E6-BF2B-4134FC37A27B@lvh.io> <0D20123F-F12B-4234-84DF-E9AECE6E31C9@lvh.io> <5471F776-D4B0-4941-9FAB-96341A530A55@gmail.com>
Message-ID:

To make it a little more real, please look at this gist:

https://gist.github.com/franks42/b8b28049adcdf4504271238391c3525b

which implements a key identifier generation based on HKDF.

Any security concerns with such an approach? Better alternatives?

Thanks, Frank.
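
For readers who want something concrete without opening the gist, here is a minimal sketch of what an HKDF-based kid derivation along the lines discussed in this thread could look like. It is not the code from the gist. The salt and info constants, the 16-byte output length, and the `urn:hkdf-kid:` prefix are all made-up placeholders that the two parties would have to agree on. HKDF (RFC 5869) is written out with the standard library `hmac` module so the sketch stays self-contained.

```python
import base64
import hashlib
import hmac

# Hypothetical domain-separation constants; both parties must use the same ones.
KID_SALT = b"example-kid-salt"
KID_INFO = b"example: symmetric key identifier v1"


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with HMAC-SHA256: extract, then expand."""
    if length > 255 * 32:
        raise ValueError("requested output too long")
    # Extract: PRK = HMAC(salt, input keying material)
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated and truncated
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf_kid(key: bytes, length: int = 16) -> str:
    """Derive a key identifier from a symmetric key (hypothetical helper)."""
    raw = hkdf_sha256(key, KID_SALT, KID_INFO, length)
    b64 = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return "urn:hkdf-kid:" + b64
```

The kid-specific info value is what keeps the identifier from colliding with keys derived from the same secret for other purposes, which is the "make the derivation completely distinct" advice from earlier in the thread. Whether a 16-byte identifier is long enough depends on how many keys a deployment expects to hold.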
>>>>
>>>> I am also writing NMR stuff on the side in libsodium/caesium, but that focuses mostly on being a Fernet replacement, rather than a keywrap, using secretbox (which makes it easy because big nonce space). Pretty sure I can translate it to the AEAD schemes, but the security proof gets iffier. Which reminds me: we should talk about Clojure bindings to libsodium some time :)
>>>>
>>>>
>>>> lvh
>>>> _______________________________________________
>>>> Cryptography-dev mailing list
>>>> Cryptography-dev at python.org
>>>> https://mail.python.org/mailman/listinfo/cryptography-dev

From _ at lvh.io Tue Jul 5 10:39:08 2016
From: _ at lvh.io (lvh)
Date: Tue, 5 Jul 2016 09:39:08 -0500
Subject: [Cryptography-dev] "intrinsic" symmetric key identifier?
In-Reply-To:
References: <749A9BBD-6191-46E6-BF2B-4134FC37A27B@lvh.io> <0D20123F-F12B-4234-84DF-E9AECE6E31C9@lvh.io> <5471F776-D4B0-4941-9FAB-96341A530A55@gmail.com>
Message-ID:

My apologies for the delay in replying; I've been busy taking time off and spending 4th of July weekend with my family. I'll write a reply to this soon, it's just that it will probably be a long one ;)

> On Jul 4, 2016, at 1:24 PM, Frank Siebenlist wrote:
>
> To make it a little more real, please look at this gist:
>
> https://gist.github.com/franks42/b8b28049adcdf4504271238391c3525b
>
> which implements a key identifier generation based on HKDF.
>
> Any security concerns with such an approach?
>
> Better alternatives?
>
> Thanks, Frank.
>
>
> On Sat, Jul 2, 2016 at 4:52 PM, Frank Siebenlist wrote:
>> Hi Laurens,
>>
>> I'm afraid that I have not been very good in explaining my use case, because the questions you ask point at more complicated solutions than I thought were necessary.
>>
>> The aim is to find the most convenient symmetric key identifier to embed in a cipher message that would require the minimum amount of key management.
>>
>> What is best depends on the context. Sometimes it's easy because there is only one key, or the security context is so unambiguous that associating the right key is trivial. Other times it's a bit more challenging. We have an existing application with tens of long-lived keys, and the current key-management complicates key-rotation and upgrades to modern algos and such.
>>
>> If both Alice and Bob can generate key identifiers (kid's) from the key that they share directly, like derive if from the symmetric key, then there is no need to exchange or agree upon a name for that key as it would be kind of a "true name" (read Vinge if you haven't ;-) ). The parties only have to agree on the key identifier derivation method.
>>
>> For example, if Alice and Bob agree to name their symmetric keys by taking the sha256 of that key's bytes, base64url encode the hash, and represent it as a urn, like "urn:s256:V2jyhd8tX-19vpEhyrDzIHgUYyDA5MS1Qi71iw1SUP0". This would allow both parties to maintain their own key-db with (kid, key) associations. Embedding the kid in the exchanged cipher messages would allow both parties to easily find the key to decrypt the received message.
>> (very much like we often use the hash of the public key (or pk-cert) to identify the private key to decrypt)
>>
>> The kid embedded in the cipher message is no more than a "hint". It could be signed as part of the whole cipher message, but its integrity can only be confirmed after the message is decrypted&authenticated.
Changing the kid in a cipher message results in DoS, but so would flipping any other bit in that message.

From _ at lvh.io Wed Jul 6 13:13:43 2016
From: _ at lvh.io (lvh)
Date: Wed, 6 Jul 2016 12:13:43 -0500
Subject: [Cryptography-dev] "intrinsic" symmetric key identifier?
In-Reply-To: <5471F776-D4B0-4941-9FAB-96341A530A55@gmail.com>
References: <749A9BBD-6191-46E6-BF2B-4134FC37A27B@lvh.io> <0D20123F-F12B-4234-84DF-E9AECE6E31C9@lvh.io> <5471F776-D4B0-4941-9FAB-96341A530A55@gmail.com>
Message-ID:

Hi,

> On Jul 2, 2016, at 6:52 PM, Frank Siebenlist wrote:
> The aim is to find the most convenient symmetric key identifier to embed in a cipher message that would require the minimum amount of key management.
>
> What is best depends on the context. Sometimes it's easy because there is only one key, or the security context is so unambiguous that associating the right key is trivial. Other times it's a bit more challenging.
We have an existing application with tens of long-lived keys, and the current key-management complicates key-rotation and upgrades to modern algos and such.

I'm assuming these keys are symmetric and don't live in an HSM (since you seem to be able to perform arbitrary computation with them)?

> If both Alice and Bob can generate key identifiers (kid's) from the key that they share directly, like derive if from the symmetric key, then there is no need to exchange or agree upon a name for that key as it would be kind of a "true name" (read Vinge if you haven't ;-) ). The parties only have to agree on the key identifier derivation method.
>
> For example, if Alice and Bob agree to name their symmetric keys by taking the sha256 of that key's bytes, base64url encode the hash, and represent it as a urn, like "urn:s256:V2jyhd8tX-19vpEhyrDzIHgUYyDA5MS1Qi71iw1SUP0". This would allow both parties to maintain their own key-db with (kid, key) associations. Embedding the kid in the exchanged cipher messages would allow both parties to easily find the key to decrypt the received message.
> (very much like we often use the hash of the public key (or pk-cert) to identify the private key to decrypt)

For some more context on "it depends what you want to accomplish, and generic schemes are hard"; Bob and Alice may also want to have key ids that only work for _them_ - e.g. Bob and Alice's static DH keys are used to generate a shared secret used for an AD key wrap scheme.

> The kid embedded in the cipher message is no more than a "hint". It could be signed as part of the whole cipher message, but its integrity can only be confirmed after the message is decrypted&authenticated. Changing the kid in a cipher message results in DoS, but so would flipping any other bit in that message.

Does a failed decryption cause Bob to reject the message, or just try all the other keys? If so, what's the benefit between just giving keys names, like sequence numbers or even strings?
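
The sha256-based naming quoted above can be sketched roughly as follows. This is only an illustration using Python's standard library: the function names `sha256_kid`, `build_key_db`, and `find_key` are hypothetical and do not come from this thread, and only the "urn:s256:" prefix and the unpadded base64url encoding are taken from the quoted example. An unknown kid is simply reported as "no key found"; whether to reject or to try other keys is the open question raised above.

```python
import base64
import hashlib
import os


def sha256_kid(key: bytes) -> str:
    """Name a key by the unpadded base64url encoding of its SHA-256 digest."""
    digest = hashlib.sha256(key).digest()
    b64 = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "urn:s256:" + b64


def build_key_db(keys):
    """Each party builds its own kid -> key table from the keys it holds."""
    return {sha256_kid(k): k for k in keys}


def find_key(key_db, kid):
    """Treat the kid as a hint: return the key, or None if it is unknown."""
    return key_db.get(kid)


# Alice and Bob hold the same random keys but never exchange names for them.
shared_keys = [os.urandom(32) for _ in range(3)]
alice_db = build_key_db(shared_keys)
bob_db = build_key_db(shared_keys)

kid = sha256_kid(shared_keys[1])  # Alice embeds this kid in her message
assert find_key(bob_db, kid) == shared_keys[1]
assert find_key(bob_db, "urn:s256:unknown") is None
```

Both tables end up with identical kids without any naming agreement beyond the derivation method itself, which is the property the proposal relies on.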

> In its most simple form, I believe that the kid-derivation could be a sha2 of the key as long as the key is "truly" random. The only concern may be that some use a simple hash of the key for key derivation...(?). To avoid any of those usage collisions, you could define the convention of pre-pending the key with some publicly know constant, like b'pre-kid-constant' or fancier.

SHA2's problems are a little less obvious when inputs are fixed length, but keys aren't always - I'd recommend a SHA3-era hash like BLAKE2b or SHA-3 itself to not have to worry about that part at all :)

> If one believes that a simple sha2 hash is only borderline enough secure (?), then maybe use a CMAC or HMAC, where you use the key on the key-value itself, and the resulting tag would constitute the identifier. (I did something like that in franks42/naclj with blake2)

What's the key used to compute the MAC? (In this case, I think what you _really_ want is AD key wrapping schemes, including GCM-SIV's tiny mode).

> Or use HKDF, with maybe a kid-derivation specific constant for the salt, a kid-specific info value, and a sufficient length of the resulting key, i.e. identifier, that makes everybody happy.

I'd probably go with BLAKE2b if this is _all_ you're trying to do, but I think what you might really want is key wrap :)

> Hope this additional explanation helps.

A little :) Is this for encryption at rest, with multiple recipients, where the recipients are assumed to already have all of the keys?

> PS. Don't believe I will resurrect that franks42/naclj - I'll add a note about depreciation and send them to your effort - it was a good experience learning about Curve/Ed25519 and the nacl/libsodium code though - also trying to keep all data structures as immutable as possible was a good exercise.

It definitely has. I'm working on a blog post (series of blog posts) on crypto API design, particularly in the context of libsodium and the JVMs plethora of byte types.
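
A minimal sketch of the BLAKE2b suggestion above, assuming Python's standard-library `hashlib.blake2b` (available since Python 3.6) and its `person` parameter as the "purpose" field. The purpose strings, the 16-byte identifier length, and the helper names are illustrative assumptions, not values proposed anywhere in this thread.

```python
import hashlib
import os

# Hypothetical purpose strings; BLAKE2b limits `person` to 16 bytes.
KID_PURPOSE = b"example-key-id"
SUBKEY_PURPOSE = b"example-subkey"


def derive(key: bytes, purpose: bytes, size: int = 32) -> bytes:
    """Hash the key under a purpose-specific personalization string."""
    return hashlib.blake2b(key, digest_size=size, person=purpose).digest()


def key_id(key: bytes) -> str:
    """A 16-byte identifier, hex-encoded, derived only from the key."""
    return derive(key, KID_PURPOSE, size=16).hex()


key = os.urandom(32)

# Both parties compute the same identifier from the same key.
assert key_id(key) == key_id(key)

# A different purpose yields an unrelated output, so an identifier can
# never coincide with a sub-key derived under another purpose string.
assert derive(key, KID_PURPOSE) != derive(key, SUBKEY_PURPOSE)
```

The personalization string is what keeps "what you're deriving and why" distinct: the key bytes are the same in both calls, and only the purpose differs.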

lvh

From frank.siebenlist at gmail.com Wed Jul 6 14:22:05 2016
From: frank.siebenlist at gmail.com (Frank Siebenlist)
Date: Wed, 6 Jul 2016 11:22:05 -0700
Subject: [Cryptography-dev] "intrinsic" symmetric key identifier?
In-Reply-To:
References: <749A9BBD-6191-46E6-BF2B-4134FC37A27B@lvh.io> <0D20123F-F12B-4234-84DF-E9AECE6E31C9@lvh.io> <5471F776-D4B0-4941-9FAB-96341A530A55@gmail.com>
Message-ID:

Thanks for the detailed scrutiny!

Comments/answers in-line:

> ...
>> The aim is to find the most convenient symmetric key identifier to embed in a cipher message that would require the minimum amount of key management.
>>
>> What is best depends on the context. Sometimes it's easy because there is only one key, or the security context is so unambiguous that associating the right key is trivial. Other times it's a bit more challenging. We have an existing application with tens of long-lived keys, and the current key-management complicates key-rotation and upgrades to modern algos and such.
>
> I'm assuming these keys are symmetric and don't live in an HSM (since you seem to be able to perform arbitrary computation with them)?

Correct - you need access to the key's bytes for the key identifier scheme I'm looking for.

>> If both Alice and Bob can generate key identifiers (kid's) from the key that they share directly, like derive if from the symmetric key, then there is no need to exchange or agree upon a name for that key as it would be kind of a "true name" (read Vinge if you haven't ;-) ). The parties only have to agree on the key identifier derivation method.
>>
>> For example, if Alice and Bob agree to name their symmetric keys by taking the sha256 of that key's bytes, base64url encode the hash, and represent it as a urn, like "urn:s256:V2jyhd8tX-19vpEhyrDzIHgUYyDA5MS1Qi71iw1SUP0". This would allow both parties to maintain their own key-db with (kid, key) associations. Embedding the kid in the exchanged cipher messages would allow both parties to easily find the key to decrypt the received message.
>> (very much like we often use the hash of the public key (or pk-cert) to identify the private key to decrypt)
>
> For some more context on "it depends what you want to accomplish, and generic schemes are hard"; Bob and Alice may also want to have key ids that only work for _them_ - e.g. Bob and Alice's static DH keys are used to generate a shared secret used for an AD key wrap scheme.

You're right - Alice may name the key she shares with Bob: "Bob's key", while Bob may name the same key: "Alice's key" on his end. They can/should use whatever name is easiest to construct the cipher messages that they want to exchange with each other. However, the key identifier that they embed inside of the cipher message cannot be a local nickname, but should be one that both parties can use, like the key identifier that I'm looking for.

>> The kid embedded in the cipher message is no more than a "hint". It could be signed as part of the whole cipher message, but its integrity can only be confirmed after the message is decrypted&authenticated. Changing the kid in a cipher message results in DoS, but so would flipping any other bit in that message.
>
> Does a failed decryption cause Bob to reject the message, or just try all the other keys? If so, what's the benefit between just giving keys names, like sequence numbers or even strings?

What you do with a failed decryption is an interesting question, but I'm not sure why it's relevant for the key identifier scheme... (if you loop through all the keys and find one that decrypts the message even though it doesn't match the kid... could be fishy...).

You could use any key identifier you want, as long as both Alice and Bob will know how to find the right key for that kid. When you use uuids, or arbitrary names/strings, though, you require Alice and Bob to agree on the (identifier, key) association separately, before the cipher message can be decrypted.
However, when you use an "intrinsic" identifier, like the one I'm proposing, then both Alice and Bob can generate those kid's for all the keys that they have and share, without any separate agreement - they only have to agree on the kid-derivation method. That observation is probably the main selling point.

>> In its most simple form, I believe that the kid-derivation could be a sha2 of the key as long as the key is "truly" random. The only concern may be that some use a simple hash of the key for key derivation...(?). To avoid any of those usage collisions, you could define the convention of prepending the key with some publicly known constant, like b'pre-kid-constant' or fancier.
>
> SHA2's problems are a little less obvious when inputs are fixed length, but keys aren't always - I'd recommend a SHA3-era hash like BLAKE2b or SHA-3 itself to not have to worry about that part at all :)
>
>> If one believes that a simple sha2 hash is only borderline secure enough (?), then maybe use a CMAC or HMAC, where you use the key on the key-value itself, and the resulting tag would constitute the identifier. (I did something like that in franks42/naclj with blake2)
>
> What's the key used to compute the MAC? (In this case, I think what you _really_ want is AD key wrapping schemes, including GCM-SIV's tiny mode).

For that blake2 scheme that I used in franks42/naclj, the authentication-key is the key itself - you hash the key and use that same key to provide additional integrity protection. Pretty sure HMAC-like schemes were never meant for that purpose... but it doesn't hurt...

>> Or use HKDF, with maybe a kid-derivation specific constant for the salt, a kid-specific info value, and a sufficient length of the resulting key, i.e. identifier, that makes everybody happy.
>
> I'd probably go with BLAKE2b if this is _all_ you're trying to do, but I think what you might really want is key wrap :)

Love blake2, but it's not available in plain-vanilla pyca/cryptography...
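
The simplest variant discussed above (a SHA-256 over a public constant followed by the key bytes, base64url-encoded and wrapped in a URN) could be sketched roughly as follows. This is only an illustration using the Python standard library: the function names `sha256_kid`, `build_key_db` and `lookup_key` are hypothetical, the prefix value is just the placeholder constant mentioned in the thread, and reusing the "urn:s256:" label for the prefixed variant is an assumption rather than an agreed convention.

```python
import base64
import hashlib

# Placeholder domain-separation constant taken from the thread's example;
# any fixed, publicly known byte string would play the same role.
KID_PREFIX = b"pre-kid-constant"


def sha256_kid(key: bytes, prefix: bytes = KID_PREFIX) -> str:
    """Derive a key identifier from the key bytes alone (hypothetical helper)."""
    digest = hashlib.sha256(prefix + key).digest()
    # base64url without padding, as in the URN example quoted in the thread.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "urn:s256:" + encoded


def build_key_db(keys):
    """Each party builds its own (kid -> key) table independently."""
    return {sha256_kid(key): key for key in keys}


def lookup_key(key_db, kid):
    """The kid is only a hint: an unknown kid simply yields None."""
    return key_db.get(kid)
```

Both parties can run `build_key_db` over the keys they already hold and then use the kid carried in a message to select the decryption key. Because the prefix has a fixed length, the hashed input stays unambiguous even when keys differ in length.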
Any concerns with using HKDF for this as I suggested in the gist?
https://gist.github.com/franks42/b8b28049adcdf4504271238391c3525b

Now comes my question about this "key wrap" that you so obviously try to promote as a solution ;-)... If I understand it well, key-wrap schemes also require a second kek-like key, which we do not have... How would that work?

>> Hope this additional explanation helps.
>
> A little :) Is this for encryption at rest, with multiple recipients, where the recipients are assumed to already have all of the keys?

Cipher messages at rest or in flight - both use cases apply - any time you have to find the key to decrypt/verify a message through a key identifier sent along with that message.

Multiple recipients - sure - they face the same issue of finding the right key to decrypt - although you may use the individually shared keys as kek's but those scenarios are probably distracting...

Yes, sending and receiving parties must have a shared (symmetric) key to make this work - through what key-exchange mechanism this was achieved is not important for this scheme to work.

Regards, Frank.

>
>> PS. Don't believe I will resurrect that franks42/naclj - I'll add a note about deprecation and send them to your effort - it was a good experience learning about Curve/Ed25519 and the nacl/libsodium code though - also trying to keep all data structures as immutable as possible was a good exercise.
>
> It definitely has. I'm working on a blog post (series of blog posts) on crypto API design, particularly in the context of libsodium and the JVM's plethora of byte types.

Crypto API design is complicated and has been screwed up many times - in my experience API design should probably not be left to the cryptographers as they live on a different planet ;-) - looking forward to that blog post!
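
For readers who want to see what the HKDF option in this message amounts to, here is a minimal sketch. It is not a reproduction of the linked gist, whose contents are not shown in this thread. HKDF-SHA256 is written out from RFC 5869 with only the standard library so the example is self-contained; the salt and info constants, the 16-byte output length and the function names are all assumptions chosen for illustration.

```python
import hashlib
import hmac

# Hypothetical, kid-specific constants; both parties would have to agree on them.
KID_SALT = b"kid-derivation-salt"
KID_INFO = b"symmetric-key-id-v1"


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) extract-then-expand with HMAC-SHA256."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output too long for HKDF-SHA256")
    if not salt:
        salt = bytes(hash_len)  # RFC 5869: an absent salt means HashLen zero bytes
    # Extract: concentrate the input keying material into a pseudorandom key.
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) | info | i), concatenated and truncated.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf_kid(key: bytes, length: int = 16) -> str:
    """Derive a hex key identifier from the key bytes (hypothetical helper)."""
    return hkdf_sha256(key, KID_SALT, KID_INFO, length).hex()
```

The info value is what keeps a kid derived this way distinct from any other value derived from the same key, which matches the advice earlier in the thread to make each derivation depend on what is being derived and why.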
_______________________________________________
Cryptography-dev mailing list
Cryptography-dev at python.org
https://mail.python.org/mailman/listinfo/cryptography-dev

From _ at
lvh.cc Wed Jul 6 18:20:22 2016 From: _ at lvh.cc (Laurens Van Houtven) Date: Wed, 6 Jul 2016 17:20:22 -0500 Subject: [Cryptography-dev] "intrinsic" symmetric key identifier? In-Reply-To: References: <749A9BBD-6191-46E6-BF2B-4134FC37A27B@lvh.io> <0D20123F-F12B-4234-84DF-E9AECE6E31C9@lvh.io> <5471F776-D4B0-4941-9FAB-96341A530A55@gmail.com> Message-ID: <5AAC30C0-1042-4DDF-A6AD-75C2DE0AED6C@lvh.cc>

Hi,

Sent from my iPhone

> On Jul 6, 2016, at 13:22, Frank Siebenlist wrote:
>> For some more context on "it depends what you want to accomplish, and generic schemes are hard": Bob and Alice may also want to have key ids that only work for _them_ - e.g. Bob and Alice's static DH keys are used to generate a shared secret used for an AD key wrap scheme.
>
> You're right - Alice may name the key she shares with Bob "Bob's key", while Bob may name the same key "Alice's key" on his end. They can/should use whatever name is easiest to construct the cipher messages that they want to exchange with each other. However, the key identifier that they embed inside of the cipher message cannot be a local nickname, but should be one that both parties can use, like the key identifier that I'm looking for.

Key wrap is symmetric, deterministic encryption; the resulting key ids are only local within the context of that key, not local to an identity.

>>> The kid embedded in the cipher message is no more than a "hint". It could be signed as part of the whole cipher message, but its integrity can only be confirmed after the message is decrypted & authenticated. Changing the kid in a cipher message results in DoS, but so would flipping any other bit in that message.
>>
>> Does a failed decryption cause Bob to reject the message, or just try all the other keys? If so, what's the benefit over just giving keys names, like sequence numbers or even strings?
>
> What you do with a failed decryption is an interesting question, but I'm not sure why it's relevant for the key identifier scheme...
> (if you loop through all the keys and find one that decrypts the message even though it doesn't match the kid... could be phishy...).

I'm not sure; how you use it has relevant consequences for how efficient you can make the scheme.

> You could use any key identifier you want, as long as both Alice and Bob will know how to find the right key for that kid. When you use uuids, or arbitrary names/strings, though, you require Alice and Bob to agree on the (identifier, key) separately, before the cipher message can be decrypted. However, when you use an "intrinsic" identifier, like the one I'm proposing, then both Alice and Bob can generate those kid's for all the keys that they have and share, without any separate agreement - they only have to agree on the kid-derivation method. That observation is probably the main selling point.

>>> If one believes that a simple sha2 hash is only borderline secure enough (?), then maybe use a CMAC or HMAC, where you use the key on the key-value itself, and the resulting tag would constitute the identifier. (I did something like that in franks42/naclj with blake2)
>>
>> What's the key used to compute the MAC? (In this case, I think what you _really_ want is AD key wrapping schemes, including GCM-SIV's tiny mode).
>
> For that blake2 scheme that I used in franks42/naclj, the authentication-key is the key itself - you hash the key and use that same key to provide additional integrity protection. Pretty sure HMAC-like schemes were never meant for that purpose... but it doesn't hurt...

Do you have a proof of security for that? I'm in a car and don't have my notebook, but it seems like it'd be pretty easy to build a secure PRF for which that is not OK; I'm thinking CBC-MAC style vulns for example. Doing this securely (with a real key) is what key wrap tries to solve.
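
One way to avoid the self-keyed MAC question raised here is to hash the key without using it as a MAC key, and to rely on BLAKE2b's built-in personalization field for domain separation. The following is a sketch under stated assumptions, not a vetted construction and not the naclj scheme: it uses `hashlib.blake2b` from the Python standard library (Python 3.6 or later), and the personalization string, the 16-byte digest size and the name `blake2b_kid` are hypothetical choices.

```python
import hashlib

# Hypothetical purpose string; BLAKE2b accepts at most 16 bytes of personalization.
KID_PERSON = b"kid-v1"


def blake2b_kid(key: bytes, digest_size: int = 16) -> str:
    """Unkeyed, personalized BLAKE2b over the key bytes (hypothetical helper)."""
    hasher = hashlib.blake2b(key, digest_size=digest_size, person=KID_PERSON)
    return hasher.hexdigest()
```

The personalization value changes the hash's initial state, so the same key bytes hashed for another purpose, or with no personalization at all, give an unrelated digest. Whether this is good enough for a given protocol still depends on the questions asked in this thread about how the identifier is used.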
>>> Or use HKDF, with maybe a kid-derivation specific constant for the salt, a kid-specific info value, and a sufficient length of the resulting key, i.e. identifier, that makes everybody happy.
>>
>> I'd probably go with BLAKE2b if this is _all_ you're trying to do, but I think what you might really want is key wrap :)
>
> Love blake2, but it's not available in plain-vanilla pyca/cryptography...

You should go fix that! ;)

> Any concerns with using HKDF for this as I suggested in the gist?
> https://gist.github.com/franks42/b8b28049adcdf4504271238391c3525b

Seems fine; will give it a more thorough review when I get back.

> Now comes my question about this "key wrap" that you so obviously try to promote as a solution ;-)...

No horse in this race; it's just that the deterministic encryption folks used to encrypt 16/32 bytes at a time and call what they do "key wrap" and it sounded a lot like what you want. Here's where I would get started: http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/siv/siv.pdf

> If I understand it well, key-wrap schemes also require a second kek-like key, which we do not have...
> How would that work?

See above; with a real key; perhaps not exactly what you're looking for.

> Cipher messages at rest or in flight - both use cases apply - any time you have to find the key to decrypt/verify a message through a key identifier sent along with that message.
>
> Multiple recipients - sure - they face the same issue of finding the right key to decrypt - although you may use the individually shared keys as kek's but those scenarios are probably distracting...
>
> Yes, sending and receiving parties must have a shared (symmetric) key to make this work - through what key-exchange mechanism this was achieved is not important for this scheme to work.

Right.
The reason I'm being so persistent is similar to why a lot of cryptographers dislike PAKE -- it's not that it's bad or hard to do -- it just seems like a weird problem to have. To quote Glyph, it sounded a bit like a jackhammer problem :)

In short: HKDF and BLAKE2 seem like what you want :)

lvh

From _ at lvh.io Thu Jul 7 07:43:59 2016 From: _ at lvh.io (lvh) Date: Thu, 7 Jul 2016 06:43:59 -0500 Subject: [Cryptography-dev] "intrinsic" symmetric key identifier? In-Reply-To: References: <749A9BBD-6191-46E6-BF2B-4134FC37A27B@lvh.io> <0D20123F-F12B-4234-84DF-E9AECE6E31C9@lvh.io> <5471F776-D4B0-4941-9FAB-96341A530A55@gmail.com> Message-ID: <81D49326-13EE-4150-A91A-18146990135F@lvh.io>

Hi,

Apologies in advance for late and possibly duplicated message. Originally sent from my iPhone, from wrong e-mail address, which made the mailing list manager unhappy.

lvh

From simo at redhat.com Thu Jul 7 08:22:33 2016 From: simo at redhat.com (Simo Sorce) Date: Thu, 07 Jul 2016 08:22:33 -0400 Subject: [Cryptography-dev] "intrinsic" symmetric key identifier?
In-Reply-To: <5AAC30C0-1042-4DDF-A6AD-75C2DE0AED6C@lvh.cc>
References: <749A9BBD-6191-46E6-BF2B-4134FC37A27B@lvh.io> <0D20123F-F12B-4234-84DF-E9AECE6E31C9@lvh.io> <5471F776-D4B0-4941-9FAB-96341A530A55@gmail.com> <5AAC30C0-1042-4DDF-A6AD-75C2DE0AED6C@lvh.cc>
Message-ID: <1467894153.3121.158.camel@redhat.com>

On Wed, 2016-07-06 at 17:20 -0500, Laurens Van Houtven wrote:
>
> Right. The reason I'm being so persistent is similar to why a lot of
> cryptographers dislike PAKE -- it's not that it's bad or hard to do --
> it just seems like a weird problem to have. To quote Glyph, it sounded
> a bit like a jackhammer problem :)

Sorry for the OT, I find PAKE very useful and we have a draft[1] to get a variant (SPAKE) in the Kerberos protocol.
Do you have any reference to documents describing this "dislike"?
I'd like to know more about it.

Simo.

[1] https://www.ietf.org/archive/id/draft-mccallum-kitten-krb-spake-preauth-00.txt

--
Simo Sorce * Red Hat, Inc * New York

From _ at lvh.io Thu Jul 7 08:36:12 2016
From: _ at lvh.io (lvh)
Date: Thu, 7 Jul 2016 07:36:12 -0500
Subject: [Cryptography-dev] "intrinsic" symmetric key identifier?
In-Reply-To: <1467894153.3121.158.camel@redhat.com>
References: <749A9BBD-6191-46E6-BF2B-4134FC37A27B@lvh.io> <0D20123F-F12B-4234-84DF-E9AECE6E31C9@lvh.io> <5471F776-D4B0-4941-9FAB-96341A530A55@gmail.com> <5AAC30C0-1042-4DDF-A6AD-75C2DE0AED6C@lvh.cc> <1467894153.3121.158.camel@redhat.com>
Message-ID: <96D84BA1-AEA6-4C9C-B323-A9036B968AB4@lvh.io>

> On Jul 7, 2016, at 7:22 AM, Simo Sorce wrote:
>
> Sorry for the OT, I find PAKE very useful and we have a draft[1] to get
> a variant (SPAKE) in the Kerberos protocol.
> Do you have any reference to documents describing this "dislike" ?
> I'd like to know more about it.

Nope. I don't share those opinions of PAKE, regardless; but I do agree that it's a solution to a very specific problem. If you want a reasonable way to go from a low-entropy shared secret to a high-entropy one, then you probably want SPAKE2.

lvh

From simo at redhat.com Thu Jul 7 08:51:14 2016
From: simo at redhat.com (Simo Sorce)
Date: Thu, 07 Jul 2016 08:51:14 -0400
Subject: [Cryptography-dev] "intrinsic" symmetric key identifier?
In-Reply-To: <96D84BA1-AEA6-4C9C-B323-A9036B968AB4@lvh.io>
References: <749A9BBD-6191-46E6-BF2B-4134FC37A27B@lvh.io> <0D20123F-F12B-4234-84DF-E9AECE6E31C9@lvh.io> <5471F776-D4B0-4941-9FAB-96341A530A55@gmail.com> <5AAC30C0-1042-4DDF-A6AD-75C2DE0AED6C@lvh.cc> <1467894153.3121.158.camel@redhat.com> <96D84BA1-AEA6-4C9C-B323-A9036B968AB4@lvh.io>
Message-ID: <1467895874.3121.159.camel@redhat.com>

On Thu, 2016-07-07 at 07:36 -0500, lvh wrote:
> Nope. I don't share those opinions of PAKE, regardless; but I do agree
> that it's a solution to a very specific problem. If you want a
> reasonable way to go from a low-entropy shared secret to a
> high-entropy one, then you probably want SPAKE2.

Yes, we are using SPAKE2, thanks.

Simo.
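
For the key-identifier question that opened this thread, the HKDF option (a kid-specific salt, a kid-specific info value, and an output length chosen for the identifier) could look roughly like the sketch below. It is only an illustration under stated assumptions: the names `hkdf_sha256` and `key_id`, the salt and info strings, and the 16-byte identifier length are made up for the example and are not taken from the gist or from any library. HKDF is written out with the standard library's hmac and hashlib, following RFC 5869, so the snippet runs on Python 3 without extra packages.

```python
import hashlib
import hmac


def hkdf_sha256(key_material, salt, info, length):
    """HKDF (RFC 5869) with HMAC-SHA256: extract, then expand."""
    # Extract: condense the input key material into a pseudorandom key.
    prk = hmac.new(salt, key_material, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output bytes exist.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Hypothetical domain-separation constants; both parties must agree on them.
KID_SALT = b"example-kid-salt-v1"
KID_INFO = b"key-identifier"


def key_id(key, length=16):
    """Derive an identifier from a symmetric key (illustrative only)."""
    return hkdf_sha256(key, KID_SALT, KID_INFO, length).hex()


# Both parties compute the same identifier from the same key,
# without exchanging anything beyond the derivation method.
alice_key = bytes(range(32))
bob_key = bytes(range(32))
assert key_id(alice_key) == key_id(bob_key)
```

Keeping the salt and info values specific to identifier derivation is what separates the kid from any other key derived from the same secret. Whether a truncated HKDF output is acceptable as a public identifier in a given protocol is a design decision the thread leaves open.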

From frank.siebenlist at gmail.com Mon Jul 11 23:42:26 2016
From: frank.siebenlist at gmail.com (Frank Siebenlist)
Date: Mon, 11 Jul 2016 20:42:26 -0700
Subject: [Cryptography-dev] hash.SHA256 cpu expensive per byte versus byte-string?
Message-ID:

I ran into some unexpected timing issues while using pyca/cryptography's hashes.SHA256, and I'm wondering if there is something wrong with the timing discrepancy I see between two different hashing approaches.

When I hash a single byte-string of 10 million bytes, it seems to take 2-3 orders of magnitude less time than when I loop over the bytes and hash them one by one.

Please look at the following bare-bones snippet:

    from __future__ import absolute_import, division, print_function
    import time
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.backends import default_backend

    # two independent hash contexts, one per approach
    d1 = hashes.Hash(algorithm=hashes.SHA256(), backend=default_backend())
    d2 = d1.copy()

    n = 10000000
    print('n:', n)

    # b is a single byte, bs is the same byte repeated n times
    b = b'a'
    ba = bytearray(n*b'a')
    bs = bytes(ba)

    # approach 1: one update() call with the whole byte-string
    s = time.time()
    d1.update(bs)
    t = time.time() - s
    print('ba: ', t)
    print(d1.finalize())

    # approach 2: n update() calls with one byte each
    s = time.time()
    for i in range(n):
        d2.update(b)
    t = time.time() - s
    print('b: ', t)
    print(d2.finalize())

The output is:

    /usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/bin/python3.5 /Users/franksiebenlist/git/pyvate23/src/pyvate/messagedigest_tst.py
    n: 10000000
    ba: 0.027185916900634766
    b'\x01\xf4\xa8|\x04\xb4\n\xf5\x9a\xad\xc0\xe8\x12)5\tp\x9c\x9a\x87c\xa6\x0b\x7f\x9e\x1903"\xf8\xb0<'
    b: 15.677960872650146
    b'\x01\xf4\xa8|\x04\xb4\n\xf5\x9a\xad\xc0\xe8\x12)5\tp\x9c\x9a\x87c\xa6\x0b\x7f\x9e\x1903"\xf8\xb0<'

    Process finished with exit code 0

Results for python 2 and 3 are similar.

I understand that there may be a few more object-creations and casts involved in the looping, but 500 times slower? That was an unexpected surprise.

Comments? Observations?

Thanks, Frank.
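
Since both runs above hash the same bytes and differ only in how many update() calls are made, one way to stay on the fast path when the data comes from a stream is to hand update() a large buffer at a time. A minimal sketch, assuming Python 3: the helper name `sha256_stream` and the 64 KiB buffer size are illustrative choices, and hashlib is used instead of pyca/cryptography only so the snippet runs with the standard library alone.

```python
import hashlib
import io


def sha256_stream(stream, buffer_size=64 * 1024):
    """Hash a binary stream with one update() per buffer, not one per byte."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(buffer_size)
        if not chunk:  # empty bytes object signals end of stream
            break
        digest.update(chunk)
    return digest.hexdigest()


data = b"a" * 1000000
# Feeding the data in buffers gives the same digest as a single call.
assert sha256_stream(io.BytesIO(data)) == hashlib.sha256(data).hexdigest()
```

The buffer size mainly trades memory for fewer Python-to-C calls; the best value for a given workload would have to be measured.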

From _ at lvh.io Tue Jul 12 11:07:13 2016
From: _ at lvh.io (lvh)
Date: Tue, 12 Jul 2016 10:07:13 -0500
Subject: [Cryptography-dev] hash.SHA256 cpu expensive per byte versus byte-string?
In-Reply-To:
References:
Message-ID:

Hi,

> On Jul 11, 2016, at 10:42 PM, Frank Siebenlist wrote:
> I understand that there may be a few more object-creations and casts involved in the looping, but 500 times slower? That was an unexpected surprise.

As expected. You both get massively increased C call overhead and the worst case because you don't get to hit a block until every 512/8 == 64 updates. Alas, openssl speed doesn't distinguish between the same message sizes but in different chunk sizes, but you can at least clearly see the performance multiplier for larger messages.

lvh

From frank.siebenlist at gmail.com Tue Jul 12 13:49:07 2016
From: frank.siebenlist at gmail.com (Frank Siebenlist)
Date: Tue, 12 Jul 2016 10:49:07 -0700
Subject: [Cryptography-dev] hash.SHA256 cpu expensive per byte versus byte-string?
In-Reply-To:
References:
Message-ID:

After I sent my message yesterday evening, I was also wondering about that 512-bit (64-byte) block size of sha256, and whether that would add to the observed slowness.
The following output shows time as a function of byte-chunk size (1, 2, 8, 32, 64, 128, 256 bytes):

b: 12.111763954162598
b2: 5.806451082229614
b8: 1.4664850234985352
b32: 0.37551307678222656
b64: 0.20229697227478027
b128: 0.11141395568847656
b256: 0.06758689880371094
8388608 bs: 0.020879030227661133

Time seems to go down in proportion to the increase in chunk size, and there is no perceived "speed boost" when we go through the 64-byte threshold.
Time seems to be only linearly related to the number of python-to-C calls.

And again, I can understand that the overhead is proportional to the number of python-to-C calls, but it's just the factor of 500 (2-3 orders of magnitude) that (unpleasantly) surprised me. It requires one to optimize the size of the byte-string passed to update() when you have many bytes to hash. For example, if you read from a file or socket, don't update() 1 byte at a time while you read from the stream, but fill up a (big) buffer first and pass that buffer.

-Frank.

PS. I haven't looked at the sha256 C code, but I can imagine that when you pass update() one byte at a time, it will fill up some 64-byte buffer, and when that buffer is full, it will churn/hash that block. Adding a byte to the buffer is all low-level fast code in C, while the churning would use significantly more CPU cycles... hard to fathom that you would see much slower performance when you pass a single byte at a time in C...

From frank.siebenlist at gmail.com Thu Jul 14 01:23:31 2016
From: frank.siebenlist at gmail.com (Frank Siebenlist)
Date: Wed, 13 Jul 2016 22:23:31 -0700
Subject: [Cryptography-dev] hash.SHA256 cpu expensive per byte versus byte-string?
In-Reply-To:
References:
Message-ID:

Python's native hashing module (hashlib), shows similar results:
- about the same time when passed the 8MB blob in one go (probably expected as both use openssl)
- substantial overhead when looping over small chunks (up to 100 times)
- except that it's about 6 times faster per single byte..

n: 8388608
b: 1.958238124847412
b2: 1.0818939208984375
b8: 0.2987058162689209
b32: 0.10640311241149902
b64: 0.06242084503173828
b128: 0.04123806953430176
b256: 0.03258681297302246
8388608 bs: 0.02389383316040039

Guess hashlib used some better optimization on the C-calls (?).

This is my last update on this observation.
Conclusion is "so be it", and using bigger chunks for hashing gives (much) better performance.

-Frank.

From _ at lvh.io Thu Jul 14 09:57:57 2016
From: _ at lvh.io (lvh)
Date: Thu, 14 Jul 2016 08:57:57 -0500
Subject: [Cryptography-dev] hash.SHA256 cpu expensive per byte versus byte-string?
In-Reply-To:
References:
Message-ID:

Hi Frank,

> On Jul 14, 2016, at 12:23 AM, Frank Siebenlist wrote:
>
> Python's native hashing module (hashlib), shows similar results:
> - about the same time when passed the 8MB blob in one go
> (probably expected as both use openssl)
> - substantial overhead when looping over small chunks (up to 100 times)
> - except that it's about 6 times faster per single byte..

The perf by chunk is a consequence of how SHA256 works. The higher perf for many calls is a consequence of extension modules vs cffi.

lvh

From frank.siebenlist at gmail.com Thu Jul 14 12:01:24 2016
From: frank.siebenlist at gmail.com (Frank Siebenlist)
Date: Thu, 14 Jul 2016 09:01:24 -0700
Subject: [Cryptography-dev] hash.SHA256 cpu expensive per byte versus byte-string?
In-Reply-To:
References:
Message-ID:

> The perf by chunk is a consequence of how SHA256 works.

I politely disagree...
Having chunks smaller or larger than SHA256's 64-byte block size doesn't seem to affect the timing results in any noticeable way.
If you do not fill up SHA256's block-size buffer with update(), it simply returns, and there is only the overhead of the function call.

Unless I misunderstand the inner workings...

Regards, Frank.

From donald at stufft.io Thu Jul 14 12:17:01 2016
From: donald at stufft.io (Donald Stufft)
Date: Thu, 14 Jul 2016 12:17:01 -0400
Subject: [Cryptography-dev] hash.SHA256 cpu expensive per byte versus byte-string?
In-Reply-To:
References:
Message-ID:

> On Jul 14, 2016, at 1:23 AM, Frank Siebenlist wrote:
>
> Guess hashlib used some better optimization on the C-calls (?).
>
> This is my last update on this observation.
> Conclusion is "so be it", and using bigger chunks for hashing gives
> (much) better performance.

I believe this is going to be due to the overhead of CFFI on CPython.
Every time we call a C function via CFFI there is some marshaling and such that goes on, so when you call update() a whole lot of times (one per byte) there's a whole lot of marshaling and crossing the C boundary going on. In contrast, hashlib is written using the C-EXT API in CPython, which means that it integrates directly into the internals of CPython and doesn't need to pay that marshaling cost.

In terms of safety, CFFI is far superior to directly writing C in the C-EXT API; it's also more portable since it uses a pluggable backend approach, and on PyPy it tends to be much faster since it offers introspection that the JIT can take advantage of. The downside is, putting a bunch of CFFI calls in a hot loop on CPython can be slower than C-EXTs.

-- Donald Stufft

From frank.siebenlist at gmail.com Mon Jul 18 16:12:04 2016
From: frank.siebenlist at gmail.com (Frank Siebenlist)
Date: Mon, 18 Jul 2016 13:12:04 -0700
Subject: [Cryptography-dev] Fernet NG or alternative simple, high-level, encrypted message module?
In-Reply-To: <1467137494.3121.8.camel@redhat.com>
References: <948A93A3-0FBC-41CB-937E-DFF2A7FBEC9D@gmail.com> <1465823210.3498.49.camel@redhat.com> <1467137494.3121.8.camel@redhat.com>
Message-ID:

Did this discussion about jwcrypto integration in pyca/cryptography happen?
Anything we can do to help/facilitate this?

Thanks, Frank.

On Tue, Jun 28, 2016 at 11:11 AM, Simo Sorce wrote:
> I see no problem matching the license, let's discuss if this merge can
> be done and I will change the license as we start working on it for
> real.
>
> Simo.
>
> On Tue, 2016-06-14 at 09:19 -0700, Frank Siebenlist wrote:
>> Hi Simo - if you could accommodate the jwcrypto-license to match
>> pyca/cryptography's... that would be fantastic and generous!!! -
>> Thanks, Frank.
>>
>> On Mon, Jun 13, 2016 at 6:06 AM, Simo Sorce wrote:
>> > On Sun, 2016-06-12 at 23:51 -0400, Paul Kehrer wrote:
>> >> In general I'm in favor of pulling jwcrypto (or something like it)
>> >> into cryptography. The obstacles are going to be figuring out the
>> >> licensing (cryptography is Apache2/BSD dual licensed and any code
>> >> contributed to it needs to be available under those licenses),
>> >> discussing what (if any) API changes need to be made to fit in with
>> >> the API design of the hazmat layer, and general "make the code style
>> >> match cryptography".
>> >
>> > Jwcrypto author here,
>> > from my POV we can discuss license/API/style adjustments needed, just
>> > let me know in which form you want to have this discussion.
>> >
>> > Simo.
>
> --
> Simo Sorce * Red Hat, Inc * New York
>
> _______________________________________________
> Cryptography-dev mailing list
> Cryptography-dev at python.org
> https://mail.python.org/mailman/listinfo/cryptography-dev

From simo at redhat.com Wed Jul 20 05:02:15 2016
From: simo at redhat.com (Simo Sorce)
Date: Wed, 20 Jul 2016 05:02:15 -0400
Subject: [Cryptography-dev] Fernet NG or alternative simple, high-level, encrypted message module?
In-Reply-To:
References: <948A93A3-0FBC-41CB-937E-DFF2A7FBEC9D@gmail.com> <1465823210.3498.49.camel@redhat.com> <1467137494.3121.8.camel@redhat.com>
Message-ID: <1469005335.21393.27.camel@redhat.com>

On Mon, 2016-07-18 at 13:12 -0700, Frank Siebenlist wrote:
> Did this discussion about jwcrypto integration in pyca/cryptography
> happen?

Not yet.

> Anything we can do to help/facilitate this?

Jumpstart it with an Issue on github ?

Simo.

From frank.siebenlist at gmail.com Wed Jul 20 12:37:49 2016
From: frank.siebenlist at gmail.com (Frank Siebenlist)
Date: Wed, 20 Jul 2016 09:37:49 -0700
Subject: [Cryptography-dev] Fernet NG or alternative simple, high-level, encrypted message module?
In-Reply-To: <1469005335.21393.27.camel@redhat.com>
References: <948A93A3-0FBC-41CB-937E-DFF2A7FBEC9D@gmail.com> <1465823210.3498.49.camel@redhat.com> <1467137494.3121.8.camel@redhat.com> <1469005335.21393.27.camel@redhat.com>
Message-ID:

Issue 57: Adopt jwcrypto as a jose/jwk/jwe/jws hazmat module
https://github.com/pyca/cryptography/issues/3050