Don't be lazy, write your own proof!
By Tom Godden
ZK compilers and virtual machines are amazing pieces of technology. Here is why not to use them.
Writing your own R1CS (rank-one constraint system) or arithmetic circuit for ZKPs (Zero-Knowledge Proofs of Knowledge) is hard. You have to convert your Turing-complete code into an enormous matrix of multiplications and additions for the former, or into a monstrous network of addition and multiplication gates for the latter.
For this reason, many tools have been developed that allow you to take high-level code in some form or another and transform it into a ZKP. Circom and Cairo allow you to compile your code to a zero-knowledge proof for DApps (Decentralized applications) on a blockchain. zkVM is a virtual machine for verifiable computation that runs code and proves its correct execution in zero-knowledge.
However, this comes at a cost.
A constraint system like an R1CS or an arithmetic circuit is not equivalent to a Turing machine (a computer): there is no “runtime” and there are no lazy conditional statements.
Therefore, whenever a programmer would normally use an `if` statement, the constraint system needs to eagerly evaluate both branches of this statement.
If one of these branches has poor performance, it can destroy the efficiency of the entire algorithm.
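To make this concrete, here is a minimal sketch (illustrative only, not the API of any particular ZK toolchain) of how an `if` is typically arithmetised: both branch values are computed unconditionally, and a boolean selector picks one of them.

```rust
// Illustrative sketch: how `if cond { a } else { b }` is typically expressed in
// a constraint system. In a real circuit these would be field elements rather
// than u64 values; this only shows the shape of the computation.
fn eager_if(cond: bool, then_value: u64, else_value: u64) -> u64 {
    // In the circuit, `cond` is a wire constrained to be boolean:
    //     cond * (cond - 1) == 0
    // and the result is the linear combination:
    //     result = cond * then_value + (1 - cond) * else_value
    let c = cond as u64;
    c * then_value + (1 - c) * else_value
}
```

Note that both `then_value` and `else_value` have to be fully computed (and fully constrained) before the selection happens; the selection only chooses which of the two results is exposed.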
It gets even worse when we add loops into the mix.
Many algorithms are by their very nature recursive:
sorting algorithms, graph traversals, binary trees, logic reasoners, parsers, the list goes on.
When we start using recursive eager `if`s, we effectively start nesting `if`s in `if`s in `if`s in …, which leads to an exponential blow-up in the algorithm’s cost.
Effectively, the cost of the constraint system is always at least that of the worst-case execution.
As an example, let’s look at a typical binary tree lookup algorithm.
We’re assuming that there is a `BinaryTree` struct with all the common functions. It doesn’t really matter how the `BinaryTree` is represented in the backend.
fn find<T: Ord>(tree: &BinaryTree<T>, elmt: &T) -> bool {
    if tree.is_leaf() {
        tree.value() == elmt
    } else if elmt <= tree.value() {
        // The tree is sorted, so on a normal machine we only descend into one child.
        find(tree.left(), elmt)
    } else {
        find(tree.right(), elmt)
    }
}
In a Turing machine, such an algorithm runs in $\mathcal{O}(\log_2(n))$ (assuming a balanced, sorted tree). However, in a constraint system, eager `if`s make it so that the algorithm always has a complexity of $\Theta(n)$, effectively making it just as inefficient as a linear search.
Proving on multiple levels
For many algorithms, it is easier to prove a solution than to calculate said solution. This property is of course the basis of large parts, if not all, of cryptography. For example, prime factorisation is very difficult for a computer, but it is very easy to take some large primes and multiply them. It is extremely difficult to calculate the preimage of a hash, but if you know the preimage, it is very easy to prove that that preimage in fact hashes to that hash value. In fact, ZKPs also work this way: if you know the secret witness data, it is easy to generate a valid proof, but if you don’t, it’s computationally impossible to do so.
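As a small illustration of that last asymmetry (a sketch using the sha2 crate, not something from the original post): checking a claimed preimage is a single hash evaluation, while recovering a preimage from the digest alone is computationally infeasible.

```rust
use sha2::{Digest, Sha256};

// Verifying knowledge of a preimage: one SHA-256 evaluation.
// Finding a preimage given only `digest`: computationally infeasible.
fn verify_preimage(claimed_preimage: &[u8], digest: &[u8; 32]) -> bool {
    Sha256::digest(claimed_preimage).as_slice() == digest.as_slice()
}
```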
This is where we get to the main topic of this post: We are creating a zero-knowledge proof of knowledge. This means we already have the solution to whatever problem we are modeling. We don’t need to calculate the solution, we just need to prove that our solution is correct. This can increase the efficiency of our implementation exponentially (as in literally “with an exponent”).
The main idea is as follows: we can compute the solution “inefficiently” on our very efficient Turing machine. Afterwards, we can prove this solution efficiently in our very inefficient constraint system.
To go back to our example of the binary tree lookup: since we know where the element sits in the tree, we simply need to give the solution, namely how to find this element.
enum Child {
    Left,
    Right,
}

fn prove_find<T: PartialEq>(tree: &BinaryTree<T>, elmt: &T, path: &[Child]) -> bool {
    match path.first() {
        None => tree.value() == elmt,
        Some(Child::Left) => prove_find(tree.left(), elmt, &path[1..]),
        Some(Child::Right) => prove_find(tree.right(), elmt, &path[1..]),
    }
}
This allows us to prove the existence of `elmt` in the tree in $\mathcal{O}(\log_2(n))$.
Note that in this solution, the path is not a witness, which allows us to branch on it.
Of course this leaks meta-information about `elmt`, namely its position in the sorted list.
If we dive deeper into this example, it turns out that it’s better to not have a sorted tree at all.
If we use an unsorted array instead of a sorted binary tree to represent our set, we can calculate the location of `elmt` on our Turing machine. Of course, this happens in $\mathcal{O}(n)$, but now we can prove in $\mathcal{O}(1)$ that the element is in the set in our constraint system:
fn prove_find<T: PartialEq>(set: &[T], elmt: &T, index: usize) -> bool {
    &set[index] == elmt
}
Furthermore, because the set is now unsorted, we don’t leak any meta-information about `elmt`!
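To make the division of labour explicit, here is a hypothetical wrapper (the name `find_and_prove` is mine, not from the original code): the index is found with an ordinary $\mathcal{O}(n)$ scan on the Turing machine, and only the $\mathcal{O}(1)$ check above would end up in the constraint system.

```rust
// Hypothetical wrapper around `prove_find` above, purely illustrative.
fn find_and_prove<T: PartialEq>(set: &[T], elmt: &T) -> bool {
    // Computed on the Turing machine, outside the constraint system: O(n).
    match set.iter().position(|x| x == elmt) {
        // The only part that becomes constraints: O(1).
        Some(index) => prove_find(set, elmt, index),
        // The element is not in the set, so no proof can be given.
        None => false,
    }
}
```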
Write your own proof
Of course, the concept of proving instead of calculating is very well known for certain problems like Merkle trees and shuffle proofs. However, it is applicable to almost any algorithmic problem you try to model in a constraint system. By using a high-level language that compiles down to R1CS, it is very easy to forget the restrictions of the constraint system. These tools are very powerful, but their abstractions can lead to a severe lack of transparency when writing proofs. Many of these tools try to come as close as possible to the experience of “just copy-paste your {C,Java,Rust,…} code and we’ll generate a proof”, which is a very noble goal. But it is very important to remember that you are designing a zero-knowledge proof of knowledge, where you should already have the answer.
I know I might sound like an old-school hardcore computer scientist who says we don’t need fancy compilers or high-level abstractions, and that life was just fine when we programmed in C or assembly. However, there is a very big difference here: a traditional compiler compiles from a Turing-complete language to a Turing-complete language. The source and target language have the same expressive power. This is not the case when you compile a Turing-complete language to a constraint system: the latter is significantly less expressive. Therefore, one should know what they are doing when relying on this automated conversion.
Case study: Circuitree
A nice demonstration of this topic is our own Circuitree. Circuitree is a zero-knowledge reasoner. It can be used to prove that some statement follows from a set of rules and initial data. This is useful for, e.g., access control, which is often defined as a set of rules. Circuitree is very much designed with the philosophy of this post in mind.
In our initial naive implementation of Circuitree, we did not heed the warnings above. Reasoners are, by their very nature, full of recursion and conditional statements. This means that our initial performance was doubly exponential in the reasoning depth.
A very quick synopsis of how a Datalog reasoner works: the reasoner starts with a set of rules, a set of assertions (also called “facts”) and a query. In each iteration, the reasoner applies the rules to the assertion set in order to generate new assertions. These new assertions are then added to the assertion set of the next iteration. This process is repeated until either the query is found, or no new assertions have been added to the assertion set in the last iteration, indicating that the search space is exhausted.
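For intuition, here is a minimal sketch of such a forward-chaining loop in ordinary Rust (with made-up `Assertion` and `Rule` types for illustration; this is not Circuitree’s actual code):

```rust
use std::collections::HashSet;

// Made-up minimal types: an assertion is a plain string fact, and a rule
// derives new assertions from the current assertion set.
type Assertion = String;
type Rule = fn(&HashSet<Assertion>) -> Vec<Assertion>;

// Forward-chaining loop: apply every rule in each iteration, and stop when the
// query has been derived or when an iteration adds nothing new (fixpoint).
fn reason(rules: &[Rule], mut assertions: HashSet<Assertion>, query: &Assertion) -> bool {
    loop {
        if assertions.contains(query) {
            return true;
        }
        let derived: Vec<Assertion> =
            rules.iter().flat_map(|rule| rule(&assertions)).collect();
        let before = assertions.len();
        assertions.extend(derived);
        if assertions.len() == before {
            return false; // fixpoint reached: the search space is exhausted
        }
    }
}
```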
The issue with the naive implementation is that, because we cannot branch, we have to add the result of each rule application to the assertion set, even if the arguments don’t match. Because we have to supply every possible combination of assertions to every rule, the assertion set grows exponentially, with the number of arguments of each rule as the exponent. This happens in each iteration, which is how we get to the doubly exponential performance.
Our first improvement reduces one of the exponents. To achieve this, we first perform the reasoning on our Turing machine. For each iteration of the reasoner, we log the input and output assertion set. Then, in the constraint system, we prove that the output set follows from the input set, based on the rule set. In the next iteration, we start with a fresh precomputed input set, which means we don’t have an exponential explosion of the assertion set size.
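The Turing-machine half of this improvement can be sketched as follows (again with the made-up types from the sketch above, not Circuitree’s actual API): run the reasoner normally and record each iteration’s input and output assertion set, so that the constraint system only has to check one step at a time.

```rust
// Precompute the reasoning trace on the Turing machine. Each entry is the
// (input, output) assertion set of one iteration; the constraint system then
// only proves that each output follows from its input under the rule set.
fn precompute_trace(
    rules: &[Rule],
    mut assertions: HashSet<Assertion>,
    iterations: usize,
) -> Vec<(HashSet<Assertion>, HashSet<Assertion>)> {
    let mut trace = Vec::new();
    for _ in 0..iterations {
        let input = assertions.clone();
        let derived: Vec<Assertion> =
            rules.iter().flat_map(|rule| rule(&assertions)).collect();
        assertions.extend(derived);
        trace.push((input, assertions.clone()));
    }
    trace
}
```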
Our second improvement (which sadly did not make it into the article) reduces the second exponent. If we apply our rules in the same order in the precomputation as in the constraint system, we can supply the arguments to the rules in the correct order. This means that we don’t need to supply each combination of assertions in the assertion sets as arguments to the rule. Instead, we can simply use a sliding window, which reduces the number of rule applications from $\mathcal{O}(|A|^n)$ to $\mathcal{O}(|A|)$, with $A$ the assertion set and $n$ the number of arguments of the longest rule.
Conclusion
ZK compilers and interpreters do their best to hide the lower layers from the proof designer, but this means that the barrier between the Turing-complete source language and the target constraint system becomes very opaque. This makes it very easy to create unnecessarily big and slow proofs. By keeping in mind that you are performing a zero-knowledge proof of knowledge, you can very significantly improve the efficiency of your constraint system and therefore your ZKP.