The Holy Grail of Cloud
Kane Hsieh, Dec 14, 2012
A difficult part of being a VC is evaluating a company at all levels, from engineering feasibility to business viability. Over the coming weeks I will try to explain some complex engineering concepts in plain English.
A term that is being tossed around more and more these days is homomorphic encryption, and it’s important to understand what it is when analyzing the value of cloud services targeted towards large institutions that have lots of sensitive data. (If you are mathematically inclined, the Wiki dives a lot deeper).
Right now, many companies hesitate to move to fully hosted data solutions because they don’t want to put sensitive data on servers they don’t control (i.e., the cloud). For instance, if Acme Co. has all its sensitive data hosted on Microsoft Azure, and the DOJ subpoenas Microsoft for Acme’s data, the DOJ gets all that data from Microsoft. Or a rogue employee might swipe the disks (unlikely, but paranoia is a strong deterrent).
“Why not encrypt it?” you might ask. The problem with cloud now is that if I want to do computation on data I have in the cloud, it has to be unencrypted. Why? Consider the statement “4 + 3 = 7.” Let 4 and 3 represent some really confidential financial data, and let 7 be the confidential result of some function, in this case addition. If I want to store 4 and 3 in the cloud, I encrypt them. Pretend the protocol I’m using encrypts 4 to AB, 3 to EF, and 7 to WX. I can now safely store AB and EF in the cloud. But I need to find out what the confidential result of 4 + 3 is. In the cloud, I would have to compute AB + EF, since my data is encrypted. But that’s a meaningless statement: AB + EF doesn’t result in anything useful. Therefore, to do any computation in the cloud, unencrypted data has to exist in the cloud for some period of time. And that’s a dealbreaker for many companies.
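To see the problem concretely, here is a minimal Python sketch. It assumes a toy "cipher" that XORs each number with a fixed secret mask; the mask value and the function names are made up for illustration, and this is a stand-in for a conventional cipher, not something secure. Adding the two encrypted values and then decrypting the sum does not give back 7.

```python
# Toy stand-in for a conventional cipher: XOR with a fixed secret mask.
# KEY is a hypothetical value chosen for illustration; this is NOT secure.
KEY = 182


def encrypt(m):
    return m ^ KEY


def decrypt(c):
    return c ^ KEY


ab = encrypt(4)  # what the cloud stores instead of 4
ef = encrypt(3)  # what the cloud stores instead of 3

# The cloud can only add the encrypted values it holds.
cloud_sum = ab + ef

print(decrypt(cloud_sum))         # 465, not the 7 we wanted
print(decrypt(ab) + decrypt(ef))  # 7, but only after decrypting first
```

The only way to get 7 here is to decrypt before adding, which is exactly the step companies don’t want happening on someone else’s server.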
Homomorphic encryption is an extraordinarily clever cryptosystem such that AB + EF = WX. That is, addition on two pieces of encrypted data results in the encrypted version of the sum of the two original, unencrypted pieces of data. A system is fully homomorphic if addition and multiplication can both be computed on encrypted data.
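For the curious, the addition half of this can be sketched with the Paillier cryptosystem, a well-known scheme that supports addition only (so it is not fully homomorphic). The sketch below is a toy: the primes are tiny so the numbers stay readable, the function names are my own, and it needs Python 3.8+ for the modular inverse. One wrinkle compared with the AB + EF picture: in Paillier the operation performed on the encrypted values is a multiplication, and it corresponds to addition of the underlying data.

```python
import math
import secrets

# Toy Paillier parameters. These primes are far too small to be secure;
# they are chosen only so the example is easy to follow.
P, Q = 61, 53
N = P * Q
N_SQ = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # modular inverse, Python 3.8+


def encrypt(m):
    # Pick a random r in 1..N-1 that is coprime to N.
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c):
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1, c2):
    # Multiplying ciphertexts adds the plaintexts underneath.
    return (c1 * c2) % N_SQ


ab = encrypt(4)
ef = encrypt(3)
wx = add_encrypted(ab, ef)  # computed without ever seeing 4, 3, or 7

print(decrypt(wx))  # 7
```

Because encryption is randomized, wx will not be byte-for-byte identical to a fresh encrypt(7); what matters is that it decrypts to 7.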
To make that rigorous:
• Let the encryption function be P()
• Let a, b, c, and d be data
A fully homomorphic encryption scheme is one such that, whenever a + b = c and a * b = d, P(a) + P(b) = P(c) and P(a) * P(b) = P(d).
Homomorphic encryption is sort of a holy grail of enterprise cloud, because it means no unencrypted data ever has to touch a cloud server.
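The multiplication half of that definition can be checked with textbook RSA, which happens to satisfy P(a) * P(b) = P(d) but has no matching operation for addition. This is a sketch under toy assumptions: the key is tiny, the names are my own, and unpadded RSA like this is not safe for real use.

```python
# Textbook RSA with a toy key. Unpadded RSA is insecure in practice;
# it is used here only because its homomorphic property is easy to see.
N = 61 * 53          # modulus, 3233
PHI = 60 * 52        # (61 - 1) * (53 - 1)
E = 17               # public exponent
D = pow(E, -1, PHI)  # private exponent, Python 3.8+


def P(m):
    """The encryption function from the bullets above."""
    return pow(m, E, N)


def decrypt(c):
    return pow(c, D, N)


a, b = 4, 3
d = a * b

# Multiply the encrypted values without decrypting them.
product_of_ciphertexts = (P(a) * P(b)) % N

print(product_of_ciphertexts == P(d))   # True: P(a) * P(b) = P(d)
print(decrypt(product_of_ciphertexts))  # 12
```

Since this scheme handles multiplication but not addition, it is only partially homomorphic by the definition above; getting both operations at once is the hard part.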
Unfortunately, fully homomorphic encryption does not exist, at least not in a usable way. However, there are companies that provide what I call “practically homomorphic” encryption: software that isn’t actually homomorphic but behaves as if it were. This is usually accomplished via a referential set of proprietary metadata that is attached to the encrypted data. The software’s algorithms then use the metadata to approximate functions on the original data.
For parsing certain data, such as searching email, I don’t need rigorous homomorphic encryption. Unlike accounting, which has to give exact, deterministic results, email search results that are close enough (e.g., a search that only fails on a certain Greek symbol) add value if they let a company use a cloud-hosted email server (Office 365, Gmail) without worrying about sensitive data existing on the hosting server.
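To give a feel for how search over data the host cannot read might work, here is a hypothetical sketch of a keyed-hash index. This is my own illustration, not a description of how any particular vendor’s product works: the key, the function names, and the sample messages are all made up. Each word is replaced by an HMAC token computed with a key that stays with the company, so the host can match tokens without learning the words. In a real deployment the message bodies would also be stored encrypted; that part is left out here.

```python
import hashlib
import hmac
import re

# Hypothetical secret that stays with the company, never with the host.
KEY = b"client-side secret, never sent to the host"


def tokens(text):
    # Deliberately simple tokenizer: lowercase ASCII letters and digits only.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {hmac.new(KEY, w.encode(), hashlib.sha256).hexdigest() for w in words}


def build_index(messages):
    # This token index is the "metadata" the host would store.
    return {msg_id: tokens(body) for msg_id, body in messages.items()}


def search(index, query):
    wanted = tokens(query)
    if not wanted:
        return set()  # nothing searchable in the query
    return {msg_id for msg_id, toks in index.items() if wanted <= toks}


messages = {
    1: "Q3 budget draft attached",
    2: "Lunch on Friday?",
    3: "Revised budget: Σ of all line items",
}
index = build_index(messages)

print(sorted(search(index, "budget")))  # [1, 3]
print(sorted(search(index, "Σ")))       # [] since the tokenizer drops Greek letters
```

The second search shows the kind of "close enough" trade-off described above: the tokenizer ignores non-ASCII symbols, so that one query fails while ordinary word searches work.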
What does this all mean? Many remain bearish on the value of hosted storage to companies with a lot of sensitive information. True homomorphism, while still the domain of academic researchers, could potentially change all that; until then, companies like CipherCloud, which just raised $30M, will make practically homomorphic products for certain sets of data, and companies will timidly build out hybrid clouds.