A Peek Under the Hood
ZeroDB enables end-to-end encrypted queries: clients retrieve records without exposing decrypted data to the database server. The familiar client-server architecture stays the same, but query logic and decryption keys are pushed client-side. Because the server never sees plaintext or keys, a server-side breach does not expose the data: even if attackers successfully infiltrate the server, they won't have access to the cleartext.
After issuing a query, the client interacts with the server over several round trips while the query executes. The encrypted index on the server is stored as a B-Tree. The client traverses the index using ZeroDB’s algorithm, requesting buckets as it needs them, to retrieve the necessary encrypted records efficiently.
ZeroDB is able to provide end-to-end encryption while maintaining most of the functionality expected from a modern database. Because it performs query logic client-side, ZeroDB doesn’t slow down with parallel requests.
Query Protocol
As is typically the case with databases, data is structured as B-Trees. A B-Tree consists of buckets, each of which is a root, branch, or leaf node. The leaf nodes of a tree point to the actual objects being stored. Thus, searching the database is a simple tree traversal.
In order to make the database secure but still capable of performing search functions, the client encrypts the buckets with a key (at the time of creation or modification). The server, which stores the buckets, never knows this encryption key. The objects referenced by the leaf nodes of the B-Tree indexes are also encrypted.
The server doesn’t know how individual objects are organized within a tree structure, or whether they even belong to a tree structure at all. It cannot compare objects, or even tell whether they are the same object.
When a client performs a query, it asks the server to return buckets of the tree as it traverses it, in a just-in-time fashion. Figure 2 shows a sequence of client requests for traversal of the tree from Figure 1. Buckets can be cached client-side, so that subsequent queries hit the cache, cutting down the number of network round trips.
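The traversal and caching described above can be sketched in a few lines of Python. This is a toy model, not ZeroDB's actual code or API: the Server and Client classes, the JSON bucket layout, and the bucket ids are all hypothetical, and toy_cipher is a placeholder XOR stream that is not secure and only stands in for real client-side encryption.

```python
import hashlib
import json
from bisect import bisect_right


def _keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from the key (illustration only)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Placeholder XOR cipher, NOT secure; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


class Server:
    """Hypothetical server: stores opaque blobs by id and never sees the key."""

    def __init__(self):
        self.blobs = {}
        self.requests = 0

    def put(self, bucket_id, blob):
        self.blobs[bucket_id] = blob

    def get(self, bucket_id):
        self.requests += 1  # one network round trip per fetched bucket
        return self.blobs[bucket_id]


class Client:
    """Hypothetical client: holds the key, the query logic, and a bucket cache."""

    def __init__(self, server, key):
        self.server, self.key, self.cache = server, key, {}

    def store(self, bucket_id, bucket):
        # Encrypt before upload, so the server only ever holds ciphertext.
        plaintext = json.dumps(bucket).encode()
        self.server.put(bucket_id, toy_cipher(self.key, plaintext))

    def load(self, bucket_id):
        # Fetch and decrypt on a cache miss; otherwise reuse the cached bucket.
        if bucket_id not in self.cache:
            blob = self.server.get(bucket_id)
            self.cache[bucket_id] = json.loads(toy_cipher(self.key, blob))
        return self.cache[bucket_id]

    def lookup(self, root_id, key):
        # Walk from the root to a leaf, requesting each bucket just in time.
        bucket = self.load(root_id)
        while not bucket["leaf"]:
            child = bucket["children"][bisect_right(bucket["keys"], key)]
            bucket = self.load(child)
        object_id = bucket["objects"].get(key)
        return None if object_id is None else self.load(object_id)


server = Server()
client = Client(server, key=b"client-only secret")

# A two-level tree: keys below "m" live in the left leaf, the rest in the right.
client.store("root", {"leaf": False, "keys": ["m"], "children": ["left", "right"]})
client.store("left", {"leaf": True, "objects": {"apple": "obj1", "kiwi": "obj2"}})
client.store("right", {"leaf": True, "objects": {"mango": "obj3", "zebra": "obj4"}})
for object_id, value in [("obj1", "red"), ("obj2", "green"),
                         ("obj3", "orange"), ("obj4", "striped")]:
    client.store(object_id, {"value": value})

first = client.lookup("root", "kiwi")
cold_requests = server.requests                   # root, leaf, and object fetched
second = client.lookup("root", "apple")
warm_requests = server.requests - cold_requests   # only the new object is fetched
print(first, cold_requests, second, warm_requests)
```

In this sketch the second lookup reuses the cached root and leaf buckets, so only the object itself is requested from the server.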
The server can provide objects (trees and the stored data) to multiple clients simultaneously, each of whom may have different encryption keys (for different indexes to their private data) or the same key (in which case the server can set quotas or throttle data in case one of the clients is compromised).
Since the encrypted database is stored remotely, the performance of queries is mostly defined by client-server latency, and does not increase server CPU load above what is expected from traditional databases.
Performance
Compared to traditional databases, this protocol requires multiple requests to perform one query. Because each request pays the latency between client and server, the number of requests should be kept small. If index size is index_size, bucket size is bucket_size, and a reference to the next bucket or object occupies ref_size in a bucket, the number of requests needed to perform one query is roughly log(index_size / ref_size) / log(bucket_size / ref_size). In a practical case of ref_size = 30 bytes, index_size = 1 GB, and bucket_size = 50 KB, the formula gives about 2.3, so we need to make three requests to complete a query, each of which transfers 50 KB of data. The number of required requests grows logarithmically with the index size (or the data size).
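As a quick check of the estimate above, the formula can be evaluated directly. The function name estimated_requests is our own, and we assume decimal units (1 GB = 10^9 bytes, 50 KB = 50,000 bytes) and round up to whole requests.

```python
import math


def estimated_requests(index_size: int, bucket_size: int, ref_size: int) -> int:
    """Approximate number of round trips for one query, per the formula above."""
    entries = index_size / ref_size   # references the index has to address
    fanout = bucket_size / ref_size   # references that fit in one bucket
    # Tree depth is log base `fanout` of `entries`; round up to whole requests.
    return math.ceil(math.log(entries) / math.log(fanout))


# The case from the text: 30-byte references, 1 GB index, 50 KB buckets.
requests_1gb = estimated_requests(index_size=10**9, bucket_size=50_000, ref_size=30)

# A 1000x larger index needs only one more request (logarithmic growth).
requests_1tb = estimated_requests(index_size=10**12, bucket_size=50_000, ref_size=30)

print(requests_1gb, requests_1tb)
```

Larger buckets increase the fanout and so reduce the number of round trips, at the cost of transferring more data per request.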
We tested ZeroDB’s full text search over the lkml archive [1] (plaintext ~250 MB), which showed that a remote query over encrypted data is ~0.5s (the server was deployed at AWS in Oregon while the query was performed from San Francisco).
Saving the data into the database also takes logarithmic time. Our tests showed that insert queries take ~0.5s in similar conditions.
We're still learning many of the possible applications for ZeroDB. Why don't you share your ideal use case in the comments?
Thanks for reading and if you haven't already, head over to http://www.zerodb.io/#beta to sign up for the waitlist!