Dark Mode Toggle

Obfuscation: building the final boss of cryptography (Part I)

2026 Jun 29 See all posts

Obfuscation: building the final boss of cryptography (Part I)

Special thanks to Sora Suegami, Janmajaya Mall, Aayush Jain and Fun Killer for feedback and review.

The most powerful primitive that has been conceived in cryptography is obfuscation. Obfuscation lets you convert a program \(P\) into an "encrypted program" \(Obf(P)\) such that you can run \(Obf(P)\) on cleartext inputs and get the same cleartext outputs that \(P\) gives you, but the internal workings of \(P\) are hidden. The precise formalism typically used, indistinguishability obfuscation (iO), says that if you are given obfuscations of two different programs that have the same functionality, you can't tell which is which. Effectively, it's hiding the code, not the data.

Obfuscation is powerful because it comes very close to the theoretical ideal of a universal "trustless trusted third party":

Source: The God Protocols (Nick Szabo), 1997

Cryptography protocols are often described by first imagining a protocol that relies on a trusted third party who sees everyone's messages and responds honestly, and then figuring out some way to do the same thing without the trust.

Encryption is simple: the "trusted third party" is effectively a postage system that accepts instructions saying "I want [recipient] to see [message]" and passes along the message to the recipient.
Zero knowledge proofs replace a trusted third party who receives your data, checks it, and then confirms to anyone who asks that the data is in some sense correct

Obfuscation (technically, obfuscation plus hashes) lets you make a simulated trusted third party for basically any protocol, so it can replace both of the above and much much more. There is only one major exception: an obfuscated program can't prevent itself from being copied, so it can't do "stateful" things like money - and that's exactly the gap that blockchains are well-placed to fill.

And so if you have obfuscation and a blockchain, you can do some pretty magical things. Like, say, a secure, private and collusion-resistant voting system that has almost no trust assumption at all - no M-of-N threshold committee required. Or basically anything from this list from 2014, without any M-of-N trust assumption.

A fairly general-purpose way to combine obfuscation and blockchains to make something very close to a "trustless trusted third party"

So what's the catch? Well, it turns out that making a secure form of obfuscation is really really hard.

There has been a decades-long tradition of insecure obfuscation: people shuffling around the logic in compiled programs to make it harder to see what's going on - admittedly, often to prevent users from modifying proprietary programs like games. This is the equivalent of things like the Caesar cipher for encryption - and, like Caesar-style ciphers, it regularly gets broken.

As a result, there has also been a decades-long tradition of trying to create an obfuscation protocol that we can mathematically prove is secure. But almost from the beginning, this ran into a problem. In 2001, we got a famous result that creating an ideal form of obfuscation - obfuscate \(P\) in such a way that running \(Obf(P)\) reveals nothing beyond what you can learn by querying an API that gives \(P(x)\) for any user-supplied \(x\) - is impossible. The core idea is that a code instantiation of \(P\) always reveals at least something beyond its outputs to user-supplied inputs: at the very least, you can learn things by applying \(P\) to its own code.

From that point, researchers shifted to trying to prove the second-best target: indistinguishability obfuscation (iO). This has been a twenty-year project, with many failed attempts, many constructions of protocols that build on top of an ingredient that does not yet exist, many people trying to build that ingredient, failed attempts of that, and so on. But in the last few years, we finally have some good news: we know how to achieve iO under reasonable security assumptions.

But within the good news, there is bad news: the run time is literally galactic. It's technically polynomial, but it involves stacking many layers of "take one thing that's vaguely like fully homomorphic encryption, now put the circuit for evaluating that into another thing that's vaguely like fully homomorphic encryption, run that in plain old regular fully homomorphic encryption once for each bit generating an intermediate multi-megabyte value, and oh yeah, did I mention that you have to put that whole thing into another thing that's vaguely like fully homomorphic encryption, and then run all that once for each bit in the input?" As a result, runtimes for these "sort of provably-secure iO schemes" are somewhere over λ¹⁰ (where λ is the "security parameter", ie. the logarithm of how long it takes to break the scheme; it's standard to say λ = 100 or 120)

There are two hopeful stories that you can tell here. One is that this is similar to where SNARKs were in 2010, and now that we know it's possible, smart people (and bots) will start coming up with clever workarounds to each bottleneck, and chopping off orders of magnitude from the runtime one after the other, and eventually we'll get to something that "only" takes a day on a heavy GPU to run (that may still sound prohibitive, but it's actually enough for many interesting applications). Another is that we will see more work on a different strategy toward the same goal: getting much better at developing new cryptographic assumptions, and getting better at telling which new assumptions are likely to be actually safe.

If you add in the more heuristic approaches, there are roughly three (not-yet-dead) families of obfuscation protocols so far, and you can place them on a "tradeoff frontier" of efficiency vs bravery on security assumptions:

This post will describe in detail the most galactic, but also the most rigorous, family so far: the one that's in blue in the diagram above.

Just a warning: there will be a lot of math. Obfuscation is hard because it basically requires stacking almost every primitive that cryptographers have invented in the past twenty years, except for the primitives that you already know about if you're a blockchain developer, such as SNARKs and STARKs. The underlying math will also be different: whereas SNARKs and STARKs tend to have a lot of polynomials, hashes and elliptic curves, obfuscation will have a lot of lattices, vectors and matrices.

Notation notes

The descriptions in this post are based on a combination of approaches from different papers, they do not completely follow any single one of them.
In some cases choice of notation and vocabulary will differ from any individual underlying post
Vectors are lowercase, matrices are uppercase
In the diagrams, the greyed text and dotted lines informally mean "X depends on Y, but you don't have to actually pass Y into X, so the (size / runtime) of X may be much smaller than the (size / runtime) of Y"
"iO" = "indistinguishability obfuscation", as opposed to "IO" = "input/output", and both as opposed to the British Indian Ocean Territory, which is what the .io TLD is named after

The standard pipeline

The "reasonably provably secure" obfuscation protocols that are built today are built on top of a decade-old tower of constructions: the AJ15 / BV15 / LPST15 / LPST16 lineage.

AJ15 and BV15 are two roughly simultaneous papers that both discovered roughly the same way to build obfuscation on top of a primitive called functional encryption: an "authority" publishes keys tied to a function \(F\), and once those keys exist, anyone with the encryption key can encrypt \(x%\) in such a way that anyone with the decryption key can recover \(F(x)\). LPST15 came up with something similar. But the functional encryption required needed to have very strong properties which were not yet available. Fortunately, a year later, LPST16 discovered a way to build obfuscation on top of something similar, sublinear compact randomized encoding, and then built it on top of an already-known "succinct-but-not-compact" functional encryption scheme plus a new primitive called XiO - obfuscation that is only slightly smaller in size than publishing a table of the outputs of the function on all possible inputs. The bulk of the work since then has been figuring out ways to actually implement XiO (though some protocols take other routes, eg. JLS20).

We will split this description in three parts:

Assuming you have a succinct FE scheme and an XiO scheme, how do you build an obfuscation protocol?
How do you build a succinct FE scheme?
How do you build an XiO scheme?

Assuming you have a succinct FE scheme and an XiO scheme, how do you build an obfuscation protocol?

There are several ways to build an obfuscation protocol on top of succinct FE and XiO, and they are all roughly equal in their high-level principles and their properties. So I will stick to a simplified version of the LPST15 design.

Note that in this post, we will often use "function" and "circuit" interchangeably. A circuit is a set of AND, OR, NOT, etc gates with wires linking them that can be used to evaluate a function. To get a better intuition of circuits, I recommend this post explaining garbled circuits (a primitive that we will need later anyway).

First, let us define succinct FE and XiO, so we understand the properties of these gadgets.

Here is succinct FE:

Authority generates circuit-independent public parameters, and circuit-specific decryption keys for a publicly-known function with a single-bit output with circuit \(C\).

Encryptor can use the encryption keys to encrypt \(x\). The runtime of this step does not increase (or increases only slightly) with the circuit size of \(C\), though it does scale linearly with input length.

Decryptor can use the decryption keys to learn \(C(x)\) and nothing else about \(x\). The runtime of this step does scale with the circuit size of \(C\).

In other words: someone does a trusted setup, I encrypt \(x\), you can decrypt \(C(x)\) (different from fully homomorphic encryption: in FHE, anyone who can decrypt \(C(x)\) can also decrypt \(x\) or any other function of \(x\))

Functional encryption is not on its own sufficient for obfuscation, because (i) it doesn't hide the function, and (ii) each published encryption is bound to one input and can only decrypt the function evaluated on that one input. But it gets you a lot of the way there.

Note that the more common definition of succinct FE requires the FE to tolerate high circuit size, not high circuit depth. In our usage, however, we do need to tolerate high depth. Fortunately, depth-independent succinct FE is a fairly simple wrapper on top of depth-bounded succinct FE (we will get into this later), so in this post we will just assume that the succinct FE that we use is also depth-independent.

Now, here is XiO:

Generator chooses parameters for a hidden function with circuit \(C\) that has a single-bit output. They generate an encoding of that function. This step is allowed to take as long as evaluating \(C\) on all possible inputs or even longer, but the output must be smaller than the truth table (the set of outputs for all possible inputs)

Evaluator evaluates the encoding to learn \(C(x)\) for any input \(x\).

The class of functions that XiO is designed for is functions that are really best understood as zero-input functions (aka "thunks") that produce a large but manageably-sized output. Given a thunk \(() \rightarrow X\), we can think of it as being a function \(i \rightarrow X[i]\), ie. the function which takes as input an index \(i\) as input, and returns the i'th output of the thunk. These are two different views of the same object; in this post we will regularly bounce back and forth between these views.

For example, you could imagine taking a function that generates a STARK (a roughly 128-512 kB sized cryptographic proof) and turning it into a function that takes as input an index \(0 \le i \lt 2^{22}\), generates the STARK internally, and then outputs the i'th bit of the STARK. The goal of the XiO is that it's a gadget smaller than the STARK (though note: even a STARK may be too small to actually allow known XiO protocols to shrink it) that lets you generate that STARK, without being able to learn anything else about the underlying process that's generating it.

Now, we combine succinct FE and XiO into a primitive called a sublinear compact randomized encoding:

Let's walk through this carefully. The goal here is to create a "randomized encoding" of \(P()\) - that is, a gadget that enables executing \(P()\) - which is asymptotically smaller than the runtime of \(P\) and asymptotically smaller than its output, and which hides \(P\).

This is slightly different than the usual presentation of randomized encodings, which operate over \(P(x)\) and hide \(x\) but not \(P\) (eg. garbled circuits work this way), but this is what we need for our use case - obfuscation is all about hiding the function.

The primitives that we will be working with do not hide the function. To do this, we make the "outer" function that these primitives are operating over just be a virtual machine (or "Turing machine") - basically, a function \(VM\) which takes circuits as input, so \(VM(P, X) = P(x)\). The circuit evaluating \(VM\) is often called a universal circuit. This way, \(P\) becomes just another input, which can be made private.

One nuance in terms of asymptotic complexity: \(VM\) needs to be instantiated as a circuit, and the size of a circuit must be proportional to its runtime (because circuits do not allow looping). Hence, the size of the \(VM\) circuit must be proportional to the runtime of \(P\). Fortunately, succinct FE allows generation to be fast even if the circuit is large and deep. However, the output may have size and encryption cost that scales with the size of \(P\). This is the reason why \(P\) must be expressed in a language other than circuits (such as a Turing machine): its size must be fixed even if the computations get large.

We do a trusted setup of succinct functional encryption, where the creator publishes public params and decryption keys for \(VM\). This allows anyone to encrypt \((P, x)\) in such a way that anyone else can decrypt \(VM(P, x) = P(x)\), but without learning \(P\) or \(x\).

To generate this randomized encoding, the creator wraps the two primitives I mentioned above inside each other: they do an XiO of a succinct FE encryption of \(VM(P, x)\). The succinct FE encryption hides \((P, x)\), and it's fast to generate even if \(P\) takes a long time to run. But succinct FE is not output-compressing: if the output is big, the succinct FE encryption is even bigger (and this is an unavoidable limitation of that type of construction). The XiO cuts the size back down. The creator makes a circuit \(C_{sfe}(i)\) that outputs the i'th bit of the succinct FE encryption, and does XiO over that. Its size is now asymptotically smaller, and it's asymptotically faster to generate, than the program \(P\) itself (note "asymptotically": at small sizes, the overhead of the scheme dominates, the key property we want is that for sufficiently large circuits, \(xIO(sFE(VM(P, x))))\) becomes smaller than \((P, x)\) itself, both in byte size and in generation time.

Notice that here we already have our first glimpse of obfuscation: the creator themselves can both do the trusted setup and publish the xIO, and others can execute it without being able to learn the program being run. But we have a big problem: this only works for one pre-configured input.

Now, we get to the next part: the final obfuscation construction.

Essentially, we take the sublinear compact RE primitive, and we apply it recursively, recursing on the number of bits of input going into the function you are trying to obfuscate. At the base case (zero inputs), we just use the sublinear compact RE as above. To add one bit, we make a sublinear compact RE of the process of generating two obfuscations one level lower: obfuscate the function \(P_0(x)\) which executes \(P\) but with the first input bit fixed to \(0\), and \(P_1(x)\) which executes \(P\) but with the first input bit fixed to \(1\).

Remember the key properties of the two building blocks of sublinear compact RE:

XiO ensures that this obfuscation can be smaller than the two sub-obfuscations it is generating
Succinct FE ensures that it can be faster to generate than the two sub-obfuscations it is generating.

Both properties are needed to ensure the recursion avoids blowup.

To evaluate an obfuscated program, you evaluate the top-level RE to get your two new obfuscations (both with one fewer input bit required), then choose either left or right based on the first input bit, then keep recursing further down, until you bottom out with a thunk that generates \(P(x)\) for the \(x\) that is the full path you walked down.

Here's another equivalent way to look at what's going on:

For programs with an n-bit input, there is a depth-n, size \(2^n\) tree of all possible evaluations. However, most of this tree is never evaluated. The whole thing is a tree of "thunks", and so the exponential cost of creating the tree is never paid because all the thunks that are not on the path to the specific input you care about never actually get instantiated. The only thing that does get instantiated is the obfuscation at each step along with path going from the root to the input. And at a very high level, that's it, that's the core design.

Note that the original LPST15 and LPST16 papers describe what is going on slightly differently (but equivalently): they do not recurse on obfuscation directly, rather they recurse on a "node program". One nice property of that approach is that the node program captures the "bits of the input so far", so you don't have to manipulate or wrap \(P\) itself; \(P\) stays unchanged throughout the whole process in their version.

Another thing that the above description elided is that for obfuscation to be secure, it needs to be randomized. So actually, it's not \(iO(P)\), it's \(iO(P, seed)\), where the \(seed\) is a value that is hashed to generate (pseudo-)randomness needed by primitives like garbled circuits. Randomized encoding, XiO and succinct functional encryption also all take a \(seed\) as an argument. When one of these functions calls any other as a subroutine, it should generate the seed for the subroutine via a hash from its own seed, and take care to provide a unique different seed for each one. To make the security proofs clean, the hash must technically be a puncturable PRF, though with the interesting property that the punctured version of the hash needs to be possible to construct but never actually gets run.

How do you build a succinct FE scheme?

The bad news: the succinct FE scheme is a complicated tower of constructions, one stacked on top of the other.

The good news: these constructions are relatively well-understood, use very standard cryptographic assumptions ("just" lattices), and the pipeline to build succinct FE has been understood since roughly 2014.

Here's the diagram first:

Let's go through these one by one. The two most fundamental building blocks are garbled circuits and fully homomorphic encryption.

Garbled circuits

Here is a post where I explain garbled circuits in more detail. The basic summary is:

For each wire in a circuit you generate two labels, one representing 0 and one representing 1.
For each gate, you publish a table of which pair of input labels corresponds to which output label, stored sorted by label so it's not clear which input label corresponds to zero and which corresponds to one.
The output labels are not given "in the clear"; instead, they are typically XOR'd with the hash of the two inputs. This makes them calculable when needed, but prevents you from learning anything about the "branch" other than the one that you have both input labels for.
To execute, you need to obtain the labels corresponding to your input, and then you "walk down the circuit" to eventually determine the values of the wires at the end (which are actually values, and not just labels)

Garbled circuits can only safely be executed once. If you give someone the labels for two inputs, they are able to compute much more than two outputs. Garbled circuits were originally developed as a 2-of-2 multi-party-computation primitive: I have a circuit \(C\), you know \(x\), I send you a garbling of \(C\), for each input wire I help you learn one of the two input wires without learning which one using a technique called oblivious transfer, and then you walk the circuit to compute the output \(C(x)\).

A basic rough description of how oblivious transfer works: you send me two public keys \(A\) and \(B\) that have to add up to some random hash so only one of them can have a valid private key but I don't know which one, I encrypt one label with \(A\) and one with \(B\), you decrypt the label that you have a private key for, but I can't tell which one that is.

Here, however, we are not doing 2-of-2 computation. Rather, we are using garbled circuits as an ingredient to get the properties that we need for succinct FE. Garbled circuits have many valuable properties. Notably:

You can give someone labels for an input without revealing the input.
Because the per-gate work is parallel, generating a garbled circuit for \(C\) is low-depth, even if \(C\) itself is very high depth.

Both of these properties will be important.

Garbling hides some information about the circuit, but not enough to be meaningfully useful for anything where you want function-hiding. When function-hiding is required (eg. the 2-of-2 MPC use case), a typical solution is to garble a universal circuit (aka virtual machine) and have the circuit provider also provide the actual circuit \(C\) they want to run as labels that become part of the input. For our use case, we don't care about garbling leaking details of the function, in fact precisely because we are employing this "wrap it in a VM" trick.

Fully homomorphic encryption (FHE)

Here is a post where I talk about FHE in more detail. Because the details matter for the other constructions later in this post, I will provide a basic summary.

FHE is based on a cryptographic assumption called learning with errors (LWE). Basically, if you have an approximate solution to a system of linear equations modulo some number (ie. a matrix \(A\), vectors \(s\) and \(b\), modulus \(q\) such that \(b = (A*s + e) % q\) modulo \(q\), where \(e\) is a "small" (aka "low-norm") error vector, so each value in \(e\) is much smaller than \(q\)), then given \(b\) and \(A\) you can't extract \(s\). If you have an exact equation \(s * A = b\), then that's a system of linear equations (modulo q), which can be done reasonably cheaply with Gaussian elimination. But with errors added, doing this inversion is computationally infeasible.

There are many FHE algorithms built on top of this assumption that have different properties. The simplest idea involves randomly choosing a private key \(s\), generating a random matrix \(A\), and computing \(b = A * s + e + m * \frac{q}{2}\) modulo \(q\) as the encryption of a single bit \(m\) (here, adding a bit to a vector adds that value to every element of the vector).

The creator can then publish encryptions of \(0\) and \(1\) computed this way. To decrypt, you can compute \(b - A * s\) which equals \(e + m * \frac{q}{2}\); the higher-order bit encodes the message and the lower-order bits encode the error which can be thrown away. If you have two ciphertexts \(b_1\) and \(b_2\) that are valid encryptions of \(x_1\) and \(x_2\), then \(b_1 + b_2\) (again, all modulo \(q\)) is a valid encryption of \(x_1 + x_2\).

Notice also that the above design as written can support encrypting vectors of bits in the message slot, and not just single bits. But there's one critical thing that the above design does not actually support: multiplication.

Multiplication is trickier than addition, for a few reasons. First, you can't multiply two vectors by each other and get a vector; you can only do that to other structures like matrices or polynomials. Second, whereas adding two three-term values \(A * s + e + m * \frac{q}{2}\) gives you another three-term value, multiplying two three-term values gives you a nine-term value, so you're still getting something of a different "shape". Third, you end up multiplying the "small" error by "large" other things, which makes the error blow up after only one round of multiplication.

There is no one easy solution to this. There are several families of solutions, that all have different drawbacks. The ones that use polynomials are based off of a stronger assumption called Ring LWE; a commonly-used scheme is BFV. The scheme that is most convenient to FHE usage is based on matrices, and is called GSW. To show how it works, I will first show a simplified version that makes the basic arithmetic work, but does not deal with the error issue, so it breaks if you have nonzero error.

First, generate a secret \(s\). To encrypt a value \(m\), first generate an otherwise-random matrix \(A\) where \(s * A\) equals some "small" or "low-norm" error \(e\). One way to do this is to generate both in two parts:

Generate a random \(s_{prefix}\), and then set \(s = (-s_{prefix}\ |\ 1)\) ( \(|\) is concatenation)
Generate a random \(A_{prefix}\), and a random low-norm error \(e\), and then set \(A = ({{A_{prefix}} \atop {s_{prefix} * A_{prefix} + e}})\) (that's vertical concatenation)

Then, compute \(c = A + m * I\) (where \(I\) is the identity matrix). And to decrypt the ciphertext, read off \((s * C)[-1]\) (ie. the last value of \(s * C\)).

If you are a mathematician, you might notice that we are "hiding" \(m\) in an ~~igon value~~ eigenvalue of C: \(s\) is the eigenvector, so it satisfies \(s * C \approx s * m\). They are not quite an exact eigenvector and eigenvalue; they're eigenvectors and eigenvalues with errors. But first, we can set aside the errors, and note some properties that make eigenvalues special.

Adding ciphertexts is again trivial: if

\(C_1\) satisfies \((s * C_1)[-1] \approx m_1\)

\(C_2\) satisfies \((s * C_2)[-1] \approx m_2\)

then

\(C_1 + C_2\) satisfies \((s * (C_1 + C_2))[-1] \approx (m_1 + m_2)\)

But unlike before, we can multiply ciphertexts too: if

\(s * C_1 \approx s * m_1\)

\(s * C_2 \approx s * m_2\)

then

\(s * (C_1 * C_2)\)

\(= (s * C_1) * C_2\)

\(\approx m_1 * s * C_2\)

\(\approx m_1 * m_2 * s\)

Eigenvalues are basically the only attribute of a matrix that has both an additive and a multiplicative property like this, and even there we are restricting to matrices that have the same eigenvector.

Now, let's bring back error tolerance. The mechanism above, as written, has a fatal flaw: if you write out the full equations with error, you end up multiplying error by elements of \(C\) and \(s\), which can be anywhere in the full range \([0, q-1]\). Hence, this immediately blows up error to the maximum. Additionally, even decryption cannot tolerate error.

To plug the first hole (we'll get back to the second at the end), actual GSW relies on a gadget matrix mechanism.

Here is what a gadget matrix looks like:

On its own this looks nice but it does not make much sense. However, it is intended to be paired with an operation (not a matrix) that it cancels out: the matrix bit decomposition operation:

Every adjacent four values in the output of this operation (which we call "\(G^{-1}\)") is the (least-significant-bit first) binary encoding (aka. bit decomposition) of the correspondingly-placed value in the input.

The most important facts about \(G\) and \(G^{-1}\) are:

The output of \(G^{-1}\) is "low-norm" (all small values)
\(G * G^{-1}(x) = x\), for any \(x\)

You can see the latter intuitively. If you trace the highlighted bit-decomposition of 2 (0 1 0 0) in the above example through the computation, then it gets multiplied by 1 2 4 8 from the gadget, and the result is \((0 * 1) + (1 * 2) + (0 * 4) + (0 * 8) = 2\). That is, if you apply the gadget matrix to a bit decomposition, it "evaluates" the bit decomposition to get back the original value.

In those places where we used the identity matrix (\(I\)) in the simplified description above, in the actual protocol we use \(G\). That is, ciphertexts are of the form \(A + m * G\). We multiply two ciphertexts by computing \(C_1 * G^{-1}(C_2)\). The message is being hidden in something that's no longer quite an eigenvalue-with-errors, but we still preserve the needed properties. The math ends up working the same way, but because the error never gets multiplied by large numbers, multiplication actually works. Specifically:

Encryption: \(C = A + m * G\)
Decryption: \(\frac{(s * C)[-1]}{q / 2}\)
Adding: \(C_1 + C_2\) (duh)
Multiplying: \(C_1 * G^{-1}(C_2)\)

Note one subtlety in the decryption. The last column of \(G\) contains \(\frac{q}{2}\) as its only nonzero entry, and so multiplying by \(G\) actually gives us an object containing \(m * \frac{q}{2} + e\) (and not \(m + e\)). This is what gives us error tolerance in decryption. This is also why we are now dividing by \(\frac{q}{2}\) when we encrypt. Note that this method as written requires \(q\) to be a power of two. If you want, you can make \(q\) be something else, but this requires tweaking the decryption procedure slightly.

We can check the correctness of multiplication. Let's start with just the algebra. First for addition:

\(C_1 = A_1 + m_1 * G\)

\(C_2 = A_2 + m_2 * G\)

And then for multiplication:

\(C_1 * G^{-1}(C_2)\)

\(= (A_1 + m_1 * G) * G^{-1}(C_2)\)

\(= A_1 * G^{-1}(C_2) + m_1 * G * G^{-1}(C_2)\)

\(= A_1 * G^{-1}(C_2) + m_1 * C_2\)

\(= A_1 * G^{-1}(C_2) + m_1 * A_2 + m_1 * m_2 * G\)

Now, we have to show that the first two terms are both valid "pads", in the same way that the original \(A_1\) and \(A_2\) were valid pads. The core property we need is that multiplying \(s * A\) leaves only a small low-norm error.

The second term is easy: \(A_2\) is a valid pad (\(s * A_2\) is low-norm), and we multiply it by \(m_1\), which is small, so \(m_1 * A_2\) is also low norm and hence a valid pad.

For the first term, let us unpack \(s * A_1 * G^{-1}(C_2)\). \(s * A_1\) returns a "small" error. \(G^{-1}(C_2)\) returns a matrix of ones and zeroes. Hence, \(s * A_1 * G^{-1}(C_2)\) is a small error multiplied by a matrix of ones and zeroes, which is still a small error.

The error in the second pad blows up by roughly the size of the message space (so, 0 or 1). The error in the first pad blows up by roughly the size of the matrix. Hence, each multiplication blows up the error by a constant factor, and so we can predict how many multiplications the FHE can survive before the error gets too high. At that point, you would need to reset the error by bootstrapping: evaluate the FHE decryption circuit inside FHE.

Finally, one more nuance: to keep the error blowup bounded, we need to require messages to stay small; specifically, they must be 0 or 1 (we can also accept eg. -1 or 2). Addition and multiplication do not respect size limitations, and unlike BGV, here we do not have native "wraparound" modulo 2. So to make the above algorithm safe, you would need to use circuits with "logic gates" implemented like this:

\(AND(a, b) = ab\)
\(OR(a, b) = a + b - ab\)
\(XOR(a, b) = a + b - 2ab\)
\(NOT(a) = E_1 - a\) where \(E_1\) is an encryption of 1.

This is only one way to do it; it's not optimal; there are ways to do it more efficiently, and that is part of the art of FHE optimization.

Here is some python code that follows along this style of GSW, though note that it uses a slightly different technique where \(A * R\) is the "pad" instead of \(A\) (this has all of the properties above, plus an extra property that we will need later). https://gist.github.com/vbuterin/f0f8a9eb09633226ada20c21a98d537e

It's worth playing around with this kind of FHE, because it helps build intuitions for some of the other primitives that we are going to use below.

Now, let's get to the next one:

Attribute-based encryption (ABE)

Not this Abe.

And not this Abe.

And also not this Abe.

Here is the definition of attribute-based encryption:

Authority generates keys for a function with circuit \(C\) , they generate a "master public key" \(mpk\) (public), and a secret key \(sk_C\) (given to decryptor) that depends on \(C\)

Encryptor knows \(m\) (a message to encrypt), they know a \(tag\) for that message, they do not necessarily know \(C\) but they do know \(mpk\). They produce a ciphertext \(T\)

A decryptor given such a \(T\) and \(sk_C\) can decrypt to recover \(m\) if \(C(tag) = 0\), otherwise they cannot.

Usually, attribute-based encryption is justified by examples like: \(m\) might be medical records at some hospital \(H\), \(tag\) might be an object that represents "you work at \(H\) AND you have a medical degree", so it becomes easy to encrypt the medical records in such a way that only the intended doctors can see them. As far as I can tell, however, the number of use cases that map well to this paradigm and can't easily be solved with much simpler public key encryption is very small. And so ABE has not been used much in practice.

_.But here, we have very good news: ABE has some valuable properties that make it very useful in building functional encryption! (And hence, obfuscation) In particular, even for circuits that are large, encryption is cheap. This is the ultimate seed of the asymmetry that allows the "thunks" in the obfuscation to be faster to create than they are to execute, and hence makes the exponentially-sized tree possible at all.

Here is how ABE works. Here we will follow the BGG+14 construction, whose properties are ideal for the succinct FE and obfuscation use case.

We work over a circuit. To have an example in your head, take the circuit that we used in the garbled circuits example (it's a two-bit adder), but remove the "two labels per wire" piece:

Instead, we will have only one matrix \(B_w\) per wire, plus two randomly-generated global public matrices \(A\) and \(D\).

The initial \(B_i\) are generated randomly. To generate the \(B\) matrices for the wires further down, we walk down the circuit going through each gate:

If it is ADD, then compute \(B_{out} = B_{L} + B_{R}\)
If it is MUL, then compute \(B_{out} = B_R * G^{-1}(-B_L)\)

As in FHE, we need to build OR, XOR and AND from ADD and MUL in ways that preserve the wire values staying in \(\{0, 1\}\).

The authority publishes circuit-independent \(A\) and \(D\), and the \(B_i\) for the input wires for the circuit. They also provide the decryptor with a decryption key, a low-norm matrix \(R_f\) that satisfies \((A | B_f) * R_f = D\), where \(B_f\) is the \(B\) matrix for the wire that encodes the output (ie. \(C(tag)\)). Notice that the construction of \(B\) matrices is not dependent on the input to the circuit, which is why the authority is able to construct \(B_f\) in a way that is then valid for all inputs. To make the equation between \((A | B_f) * R_f = D\) correct, \(A\) needs to be constructed in a special way that gives it a "trapdoor"; we will return to this later.

First, let's go through the encryptor and decryptor's logic.

To encrypt, the encryptor chooses a random vector \(s\), and computes \(c_{out} = s * D + e_{out} + \frac{q}{2} * m\). They also provide the input encodings, \(c_i = s * (B_i + G * tag[i]) + e_i\), and \(c_A = s * A + e_A\). Notice how encrypting does not depend on the circuit itself; it only requires knowing and doing a few multiplications involving the input \(B_i\) and \(D\).

The decryptor's job will be to start with these \(c_i\) values, and walk down the circuit, doing a homomorphic-encryption-like operation at each gate to convert the \(c_L\) for the left input wire and \(c_R\) for the right input wire into \(c_{out}\) for the output wire.

Here, the ADD case is again trivial: \(c_{out} = c_L + c_R\). The MUL case is the harder one.

The formula is: \(c_{out} = x_R * c_L + c_R * G^{-1}(-B_L)\)

\(x_R\) here is the actual value on that wire in the computation. To see why this works, let's look at each component in turn. For ease of exposition, I'll write \(G^{-1}(-B_L)\) as \(R_{-L}\); just remember that because \(G\) and \(G^{-1}\) cancel out, \(G * R_{-L} = -B_L\):

\(x_R * c_L\)

\(= x_R * (s * (x_L * G + B_L) + e_L)\)

\(= x_R * s * (x_L * G + B_L) + x_R * e_L\)

\(= s * (x_L * x_R * G + x_R * B_L) + x_R * e_L\)

\(c_R * R_{-L}\)

\(= (s * (x_R * G + B_R) + e_R) * R_{-L}\)

\(= s * (x_R * G * R_{-L} + B_R * R_{-L}) + e_R * R_{-L}\)

\(= s * (-x_R * B_L + B_R * R_{-L}) + e_R * R_{-L}\)

If we add the two together, the \(x_R * B_L\) cancel out, and we get:

\(s * (x_L * x_R * G + B_R * R_{-L}) + x_R * e_L + e_R * R_{-L}\)

The \(B_R * R_{-L}\) on the left side is just \(B_R * G^{-1}(-B_L)\), which is the definition of \(B_{out}\) for multiplication that we gave above. And the right side is error multiplied by low-norm values, so it stays as low-norm error.

And so the decryptor has everything they need to walk their way to \(c_f\).

If \(C(tag) = 0\), then the \(s * x_f * G\) term drops off from the definition of \(c_f\), and we just have: \(c_f = s * B_f + e_f\)

Once they get to \(c_f\), they prepend \(c_A\) to get \(c'_f = (c_A\ |\ c_f)\), which satisfies \(c'_f = s * (A\ |\ B_f) + e'_f\). They then compute:

\(c_{out} - c'_f * R_f\)

\(= s * D + e_{out} + \frac{q}{2} * m - s * (A\ |\ B_f) * R_f - e'_f * R_f\)

\(= s * D + e_{out} + \frac{q}{2} * m - s * D - e'_f * R_f\)

\(= e_{out} + \frac{q}{2} * m - e'_f * R_f\)

If the error did not blow up too much (remember, \(R_f\) is also low-norm), the decryptor can now freely recover \(m\).

Now, let's get back to the trapdoor mechanism.

We do not construct \(A\) randomly. Instead, we construct it as \(A = (A_{prefix}\ |\ G - A_{prefix} * R)\), where \(R\) is a low-norm "trapdoor", and \(G\) is the gadget matrix from before. Similarly to the FHE ciphertexts we saw, under the LWE assumption this kind of construction is computationally infeasible to distinguish from random if you do not have the trapdoor yourself.

The goal is to satisfy a neat mathematical property:

\(A * {R \atop I}\)

\(= (A_{prefix}\ |\ G - A_{prefix} * R) * {R \atop I}\)

\(= A_{prefix} * R + (G - A_{prefix} * R) * I\)

\(= A_{prefix} * R + G - A_{prefix} * R\)

\(= G\)

Or graphically:

We've constructed a matrix \(A\), where in some sense \(R \atop I\) is a sort of "inverse" of A (remember: \(G\) often plays the role of "pretending" to be the identity matrix in these types of LWE constructions).

The goal here is that we want to create matrices for which, for any known vector \(u\), only the creator can find a low-norm vector \(r\) where \(A*r = u\). That is, the trapdoor lets us "solve" the short integer solution (SIS) problem, which is closely related to LWE and is the basis for this type of cryptography working.

Here is the algorithm:

Binary-decompose \(u\) (ie. apply good old \(G^{-1}\) to it), let the output be \(z\)
Output \(r = {R \atop I} * z\)

\(R\), \(I\) and \(z\) are all low-norm, so this is also low-norm. To see algebraically why this solves the equation, compute:

\(A * r \\ = A * {R \atop I} * z \\ = G * z \\ = u\)

To guarantee that this is safe (in the sense of not leaking \(R\)), real-world implementations add additional error at this step; see MP12 for more details on maximally efficient ways to do this.

Now, let's go back to the still-unsolved puzzle from above: computing \(R_f\).

The equation we are solving for is:

\((A\ |\ B_f) * R_f = D\)

We will interpret \(R_f\) as \(R_{top} \atop R_{bottom}\), so we get:

\(A * R_{top} + B_f * R_{bottom} = D\)

We sample a random low-norm error for \(R_{bottom}\), so now we only need to solve:

\(A * R_{top} = D - B_f * R_{bottom}\)

Here, we just use the method above row-by-row to compute each row of \(R_{top}\). And we're done.

An astute reader may notice a vague resemblance to ring signatures, at least for the 1-of-2 case: you compute one of the two terms "the easy way", and you compute the other value to match (which you're "not supposed to be able to do") using a trapdoor.

For clarity, note that the \(R\) in the trapdoor is never given out, not even to the decryptor. \(R_f = {R_{top} \atop R_{bottom}}\) is what's given to the decryptor.

Now, let's recap what we have:

Two global public matrices \(A\) and \(D\), one random and the other computed via a trapdoor.
For each wire, a matrix \(B_w\); the output wire's matrix is \(B_f\)
A decryption key \(R_f\) satisfying \((A\ |\ B_f) * R_f = D\)
An encryption is \(c_i\) values, one for each input wire, and a \(c_A\) corresponding to \(A\) and a \(c_{out}\) corresponding to \(B_f\) but with the message mixed in.
The decryptor walks to \(c_f\), and computes \(c_{out} - (c_A\ |\ c_f)\) to obtain the message. But this works ONLY if \(C(tag) = 0\); if it is not, then there is a \(s * x_f * G\) term mixed in that throws off everything.
The most important amazing property of this is that you're able to generate a ciphertext whose decryptability depends on a long computation, without yourself ever running that computation (though an "authority" - think, trusted setup - does need to also run it once)

The main limitation of this scheme is on circuit depth: each multiplication increases error by a constant amount, and so the deeper the circuits, the bigger the values and the matrices need to be to compensate. You can do arbitrarily deep circuits in theory, but if you try, the overhead of making the ciphertexts big enough to handle the error blows up. This is what leads to the depth limitation in basic succinct FE, which the wrapper we will discuss later removes.

And that's ABE.

By the way, it's also not this Abe.

Functional encryption for fixed-depth circuits

Surprisingly, at least for this bit, we're now done the hard math. The next section is going to be just stitching together lego bricks.

First, the definition of functional encryption:

Authority generates keys for a function with circuit \(C\) , they generate circuit-independent public params (including a public key) as well a decryption key which depends on \(C\)

Encryptor encrypts \(x\) with the public key

Decryptor learns \(C(x)\) and nothing else

A particular property that we'll be after is succinctness: like we saw with ABE, encrypting must be cheap, even if the circuit is large.

Here's the process (following the GKP+13 construction). First the authority's work. The authority chooses the circuit \(C\) that they are generating parameters for.

Let the length in bits of an FHE ciphertext be \(w\)
Generate ABE public params and \(2*w\) ABE keys
- The ABE circuit is "take an FHE pubkey \(FHE.pk\) and the FHE-encrypted input \(E_x\), compute \(FHE.eval(C, E_x)\), output the i'th bit"
- "The i'th bit" has w options. Also, we make one set of params that output the i'th bit directly, and another that flip it before outputting (ie. the decryption is conditional on the output being 1). This contributes two options per bit. Hence, \(2*w\) sets of params total.
Publish the public params of the ABE, and give the \(2*w\) ABE decryption keys to the decryptor.

An alternative to issuing \(2*w\) ABE encryptions is to modify the ABE so that it simultaneously encrypts two messages, one decryptable only if \(C(tag) = 0\) and the other only if \(C(tag) = 1\). This adds some more complexity, but saves 2x overhead.

Now, the encryptor's work. The encryptor has a value \(x\) that they want to encrypt.

Generate an FHE keypair \((FHE.sk, FHE.pk)\), encrypt \(x\) to get \(E_x\)
Generate a garbling of the circuit that takes in a ciphertext \(E_y\) and outputs \(FHE.dec(E_y, FHE.sk)\) (note that \(FHE.sk\) is hardcoded into the circuit here)
This garbled circuit has input labels \(L_{\{index, bit\}}\) and then more labels for each intermediate gate and outputs. Encrypt \(L_{\{index, bit\}}\) under the (index, bit)'th ABE key, using \(E_x\) as the tag.
Output: \(FHE.pk\), \(E_x = FHE.enc(x, FHE.pk)\), and the full garbling, but with the input labels encrypted.

The decryptor does the ABE decryption procedure on the encrypted \(L_{in}\) labels using \(E_x\) as the tag, and thereby learn the input labels. They run the garbling and get the output.

Now, let's recap what we did here:

We kept the ABE property that the encryptor does not have to run the circuit; only the authority and the decryptor do.
We changed the logic: instead of the circuit \(C\) conditioning whether or not you can decrypt some unrelated message \(m\), the circuit is the only evaluation of \(x\) that you can decrypt
The core idea is that the ABE's property (only decrypt is you're "supposed to", as defined by some circuit \(C\)) ensures that the decryptor only gets the input labels to the garbling that they are supposed to. The garbling allows them to FHE decrypt only that value. If they try to run some different computation, then the \(c_f\) in the ABE decryption will have an extra dangling \(s * G\), they won't be able to finish decrypting to obtain valid input labels, and so they won't be able to run the garbling.
We inherit the main limitation of ABE: because the thing done inside the ABE is FHE evaluation of the circuit, which is as deep as the circuit multiplied by a significant blowup factor, it only works well for circuits of limited depth.

Functional encryption for unlimited-depth circuits

This one is a one-liner. Instead of ABE-encrypting to \(C\), you ABE-encrypt to the function that takes as input \(x\) and generates a garbled circuit for \(C(x)\) (using eg. \(hash(x)\) as the randomness for the garbling). Garbled circuit generation is parallelizable, so this is low-depth even if the circuit is high-depth. The decryptor runs low-depth FE to obtain the garbled circuit including the appropriate input labels, and then walks the garbled circuit in the clear to obtain the outputs.

And, we're done succinct functional encryption! To recap the whole thing, here's our original diagram again:

How do you build an XiO scheme?

Let us first remember the XiO definition:

Generator chooses parameters for a hidden function with circuit \(C\) that has a single-bit output. They generate an encoding of that function. This step is allowed to take as long as evaluating \(C\) on all possible inputs or even longer, but the output must be smaller than the truth table (the set of outputs for all possible inputs)

Evaluator evaluates the encoding to learn \(C(x)\) for any input \(x\).

"XiO" technically stands for "exponentially-(in)efficient indistinguishability obfuscation", but it's best not to think of it as being obfuscation at all. Instead, it's a way to compress a piece of data, in the special case where the data is generated by a program that is much smaller than the data, and where you want to hide the program. It's designed for "thunks" (zero-input functions) that output \(n\) bits, not what you might intuitively think of as "actual functions".

Most known constructions of XiO to date build upon a primitive called split-FHE. Here's a definition of split-FHE:

Generator is assumed to have the FHE encryption and decryption keys, and a specific ciphertext that they have in mind. We assume an encryption scheme that can support efficiently encrypting vectors (and not just single bits). Given a specific ciphertext \(C\) encoding a specific vector \(v\), they generate a hint \(h\), which is much smaller than \(v\).

Decryptor, given \(C\) and \(h\), can use the hint to decrypt \(C\) to get \(v\), but cannot use it to decrypt any other ciphertext or learn anything else about the FHE keys or the computation that \(C\) was derived from.

Here is how XiO built on split FHE works:

If we are encoding a thunk with output size \(n\), we treat it as a function \(Q\) that takes an index \(i \in \{1...n\}\), and outputs the i'th bit produced by the thunk.

We split up the input space into \(\sqrt{n}\) buckets each of size \(\sqrt{n}\). For each bucket, we FHE-encrypt (using GSW) each index inside that bucket (eg. \(1...\sqrt{n}\) for the first, \(\sqrt{n}+1 ... 2*\sqrt{n}\) for the second), and run \(Q\) on each index. Then, we use a special technique to convert the \(\sqrt{n}\) ciphertexts into a packed ciphertext that holds all values. And finally, we release a constant-size hint for it. The XiO is the set of all \(\sqrt{n}\) constant-size hints.

There are three technical pieces here to understand:

Some changes to GSW
How the packing works
How the hint mechanism works

In reality, the three are connected. We don't ever use the packing in any context other than hint generation and hint-based decryption, and the changes to GSW make it unable to decrypt in any way other than hint-based decryption, so we may as well treat it all as one algorithm. Let's go through these in turn, following the WW21 protocol.

Changes to GSW

Remember plain GSW: \(C = A + \frac{q}{2} * m * G\), where the freshly-generated-per-message \(A\) satisfies \(s * A = e\) for low-norm \(e\).

Here, we modify the GSW to: \(C = A * R + \frac{q}{2} * m * G + E\) , where \(A\) is a fixed "public key" matrix, and \(R\) is randomly chosen freshly each time but is low-norm. \(E\) is a new error introduced to protect \(R\) from being revealed, also randomly chosen freshly each time but low-norm.

Importantly, \(A\) is tall and narrow, and \(R\) is wide and short. The small dimensions of the two need to match. This is important, because this dimension is what determines the size of the hint. The hint can't just be one value, it needs to have some width to preserve LWE security, but it can be much smaller than the sizes of the matrices, and hence the vectors it will be helping to decrypt.

Also importantly, the multiply-by-s trick used to decrypt gets blown up by the new error matrix, so there is no longer a working "decryption key". The ONLY thing you can do to ciphertexts, other than evaluating them even more, is something like the hint-based decryption mechanism. For this reason, the authors are clear to call it a homomorphic commitment scheme, and not homomorphic encryption.

Packing and hints

Suppose you have several ciphertexts of the above form: \(C_1\), \(C_2\) ... \(C_k\). They all share the same \(A\), but have different \(R_i\) matrices, and different \(m_i\) (messages).

Here is how we combine them all into one vector:

Let \(u_i\) to be the vector that has the same width as \(R\) and \(G\), and that is all zeroes, except it's a 1 at the \(log(q) * i\) position (eg. if \(q=16\), for \(i=1\), it's \([\ 0\ 0\ 0\ 1\ |\ 0\ 0\ 0\ 0\ |\ 0\ 0\ ...\ ]\) (separators added for clarity). Notice that the \(log(q) * i\) position is exactly where the i'th row of the \(G\) matrix has its highest value.
- In plain terms, think of \(u_i\) as the "keep only the \(log(q) * i\)'th column, discard everything else" operator.
- In terms of its role, think of \(u_i\) as meaning "extract the most useful thing about the ciphertext (the place where \(m\) is stored in highest-significant-bit form), and put it into row \(i\) of the output.
Compute \(c_f = C_1 * u_1 + C_2 * u_2 + ... + C_k * u_k\)

If \(q\) is not a power of 2, you have to do something slightly more complicated: instead of the i'th block of \(log(q)\) being many zeroes followed by a one, it must be the least-significant-bit-first binary encoding of \(\frac{q}{2}\). The former is the \(q = 2^k\) special case of the latter.

That's it. To compute the raw hint, we "track" the \(R\) matrices through the additions and multiplications in the FHE along with the ciphertexts (exercise to the reader: go through the "track the \(R\)" algebra for ADD and MUL yourself; it follows closely the same \(G\) and \(G^{-1}\) based approach used with \(C\) to keep error low-norm), and then apply the exact same procedure to \(R\): \(r_f = R_1 * u_1 + R_2 * u_2 + ... + R_k * u_k\)

Now, remember that \(C_i = A * R_i + m * G + E_i\), so if you subtract \(A * R_i\) from \(C_i\), you get \(m * G + E_i\), which has \(m\) in \(log(q)\) adjacent positions on each row of the matrix. Here, though, we are dealing with many different \(R_i\)_' values, and we want to have a single operation that reveals all \(m_i\)'s at once. To do this, we are employing a trick: in the i'th row, we are using \(R_i\). So if you subtract \(c_f - A * r_f\), the i'th row of the result will correctly cancel out \(A * R_i\)_, and give you the value \(m_i * q/2\) that you need (plus an error).

There is only one remaining problem: the raw hint \(r_f\) on its own is not a secure hint. It reveals too much information about the \(R_i\) matrices, which are meant to be private. To deal with this problem, we mask it:

Generate a fresh secret \(s\) (same size as the hint)
Release \(d_f = r_f + s\) instead of \(r_f\) on its own
Also release a "sample", \(b = A * s + e\)

The verifier knows \(c_f\), so they can then compute:

\(c_f + b - A * d_f\)

\(= (A * r_f + m * G + e_1) + (A * s + e_2) - (A * r_f + A * s)\)

\(= m * G + e_1 - e_2\)

For security, the error in the sample must be much larger than the other errors, so as to fully hide information about them. So we have two layers of scale separation: the sample error is large enough to smudge the other error, and the message is large enough to avoid being corrupted by the sample error.

This second version of the scheme is secure, but it has a new big problem: the sample \(b\) is full-width, not short like the hint. So we make one more modification: instead of releasing \(b\) directly, we release a gadget that creates it.

Here is what we do:

The generator makes a fresh encryption of a \(seed\) (as in, a value that you generate hashes from), and releases an FHE encryption of the seed: \(A * R_i + seed[i] * G + E_i\) for each bit in the seed
For each bucket \(\beta\), the generator and the decryptor can both do the following computation:
- Compute \(A * hash(seed, \beta) + e_\beta\) inside FHE (where \(e\) itself is error generated by taking another \(hash_2(seed, \beta)\)), and get ciphertexts: \(C_i = A * R_i + m_i * \frac{q}{2} + E_i\), where \(m_i\) is the i'th bit of \(A * hash(seed, \beta) + e_\beta\).
- Apply the packing procedure we described above, but with a \(u_i\) for each of the \(k * log(q)\) columns, not just the \(k\) columns at multiples of \(log(q)\) minus one. For clarity, we'll treat this as a sum with two indices: the within-block index \(0 \le t \lt log(q)\), and the block index \(0 \le j \lt k\). We'll treat everything as zero-indexed here to make the math cleaner.
  - Remember: the original packing procedure gave you \(c_f = \sum_i (A * r_i + \frac{q}{2} * m_i)\). Here, you get \(c_g = \sum_{j,t}(A * r_{t + log(q) * j} + 2^t * m_{t + log(q) * j})\).
  - We are assuming that within the circuit, the j'th block of \(log(q)\) ciphertexts is representing the binary encoding of the j'th number in \(A * hash(seed, \beta) + e_\beta\).
  - Hence, within the j'th block, \(\sum_t 2^t * m_{t + log(q) * j}\), we are "evaluating" the binary decomposition of the j'th entry of \(A * hash(seed, \beta) + e_\beta\), and because of the structure of \(G\), all of these values get sent to the j'th row of the output.
- The output of the packing procedure is \(c_g = A * r_g + A * hash(seed, \beta) + e_\beta + e_g\). Notice that \(A * hash(seed, \beta) + e_\beta\) lives in the place where normally \(m * \frac{q}{2}\) would live. This is a direct result of the way that we modified the packing procedure above. The expression for \(c_g\) can be rewritten as \(c_g = A * (r_g + hash(seed,\beta)) + e\)
- The generator can compute \(r_g + hash(seed,\beta)\) in the clear, the evaluator can't. So the generator uses this as the \(s\) value that they mix in to the hint.
The decryptor's calculation is the same as before, except that they don't receive \(c_f\) from the generator; they generate it themselves using the method above. The math for why it cancels is also exactly the same.

The papers use \(prf\) ("pseudorandom function") to refer to the hash. This just means that the hash needs to have strong randomness properties (much stronger than collision resistance, but weaker than random oracle); if you trust hashes to satisfy the random oracle model, then you can safely think \(prf = hash\) everywhere.

And that's it. The XiO is:

The public parameters of the system (eg. encryption keys, the encrypted circuit to evaluate on)
The hint \(r_f + s\) for each bucket (where \(s = r_g + hash(seed,\beta)\))
The FHE encryption of the \(seed\) to enable computing samples that let you cancel out \(s\)

Evaluation requires re-doing the execution just for one particular bucket, then using the hint to decrypt the final vector, and finally taking out the value(s) you need from that bucket (in the obfuscation use case, the decryptor is taking out all values from all buckets).

There is one further optimization to maximize efficiency while guaranteeing provable security. The XiO is \(O(\sqrt{n})\) in the size \(n\) of the thunk it's encoding, but it requires publishing trusted setup parameters of size \(n\) (these exist for a technical reason: the function that generates the \(A * s + e\) samples that compensate for the hint needs to have a never-used path that accepts and uses fully random values, this allows for the security proof). This trusted setup is length \(n\). If you split the XiO into \(\approx n^{\frac{1}{3}}\) separate XiOs, then the size of the XiO goes up to \(\approx n^{\frac{2}{3}}\) but the trusted setup (which can be reused by all of them) goes down to \(\approx n^{\frac{2}{3}}\) too.

XiO and security assumptions

All parts of the obfuscation pipeline other than XiO can be justified with fairly "standard" lattice-based assumptions. Ultimately, all cryptography is just a guess that some mathematical problem is "actually hard" and it's not just that we've been too dumb to solve it so far but we might figure it out next year. But some guesses are more well-founded than others. Among all quantum-safe primitives, the basic LWE assumption underlying lattices is typically considered less robust than hashes, but more robust than almost anything else (including more complex lattice assumptions).

But with all existing XiO primitives there is a problem. The design above is not strictly provable under the lattice assumption. It is made provable under the assumption of the form "if you publish \(A * s + e\)-style ‘samples', and also publish a little bit of extra information, then we're still fine". But there's a problem: such assumptions have been broken before, and ultimately got broken here.

In 2013, an early candidate for multilinear maps (a very strong primitive that, if we had it, would make obfuscation far easier) was published in CLT13. In 2015, however, it was broken.

The object that CLT13 released was \(p_{zt} \approx z * \frac{h}{g}\). Ciphertexts are of the form \(c = \frac{a + gr}{z}\). If you multiply a ciphertext of \(a=0\) with \(p_{zt}\), you get \(c * p_{zt} = \frac{gr}{z} * z * \frac{h}{g} = r * h\). \(r\) and \(h\) were both very "small" values (much smaller than the modulus), and so the two multiplied together will also be much smaller than the modulus. Thus \(p_{zt}\) functions as a "zero check" key, which is what allows you to do pairing-check-style operations on lattice ciphertexts (which unlike pairings, can multiply to arbitrarily high depths if you scale the parameters).

The problem: \(h * r\) reveals an exact expression in the underlying secrets of the scheme over the integers, an so if you have enough such values, you can perform some matrix operations and recover the secrets.

The HPLS conjecture, the core new assumption in WW21, has what is on surface also a way to leak what is "just the error":

The adversary knows \(d_f = r_f + s\) because it is published
If the adversary has \(r_f\), then with enough of these values they can solve linear equations to recover the secrets (this on its own is ok: after all, the whole point of the \(s\) is to mask the \(r_f\))
If the adversary has \(s = r_g + hash(seed,\beta)\), then notice that that value satisfies \(c_g = A * (r_g + hash(seed,\beta)) + e\), and so if they multiply it by \(A\) and then subtract that from \(c_g\) (which they can also calculate), then they get an "error" value \(e\) in the clear

Notice that even if extracting \(e\) is fatal, WW21's security still has another line of defense: the adversary doesn't learn either one of the fatal secrets (\(r_f\) and \(s\)), they learn the sum of the two, and maybe there aren't any useful combination attacks that start from that sum. But even extracting \(e\) probably is fine. The attack in HJL21, which broke the HPLS assumption, managed to do one specific thing: they figured out how to adversarially construct circuits in such a way that the lowest-order bit of the error (which survives "smudging" tricks that normally protect schemes like this) encodes a variable that outputs whether the execution is "real" or a "simulated" execution used in the proving process. To summarize:

The attack requires the adversary to craft the circuit; no legitimate actor would design circuits in the way that the adversary would
The attack only leaks a variable used inside the machinery to prove the scheme's correctness, it doesn't actually leak any secrets
The attack requires knowing \(s\), and actually the adversary only knows \(d = s + r_f\)

Hence, even if the assumption is broken in limited cases, the security of WW21 is probably fine.

A new protocol was published in 2025, which adds some extra complexity to the "help the decryptor discover samples of the offset to the hint" piece of the WW21 technique to make it depend on lattice assumptions that are more limited than HPLS, but are still more aggressive than plain LWE. In particular, the assumption that HJL25 uses falls into a framework of "only publish objects where the secret-bearing part gets mod-q wrapped" (which excludes both making available "raw error" and CLT13's \(r * h\)). This is not as conservative as plain LWE, but it is a meaningful improvement. The space between "plain LWE" and "the secret-bearing part must be mod-q wrapped" contains assumptions like circular security, as well as the assumption in HJL25. There are counterexamples to assumptions of this type, but they are highly contrived, and real protocols based on these types of assumptions have proven secure so far.

One analogy you might have is to other "idealized black box" assumptions:

The random oracle model of hash functions
The generic-group model of elliptic curves.

The ROM has so far proven fine in practice, but researchers have designed signature schemes that are provably secure in the ROM but easily attackable if instantiated with any real hash algorithm. And the generic-group model is known to be false in a few cases, most notably order-finding via Schoof's algorithm, and pairings. Elliptic curve security rests on a shaky foundation that you can violate the generic-group model in those two ways, but not any more harmful ways.

The one family of protocols that proves iO entirely from "standard" assumptions is JLS20 and descendants, but these are not quantum-safe.

Summary: cryptographic assumptions are a complicated zoo, and there are no absolutes, only different comfort levels.

Another important note is that if we want to be provably reducible to these security assumptions, then the security proof of the obfuscation of a \(k\)-bit-input program needs to "walk over" all possible \(2^k\) inputs. For each input, you consume a little bit of security margin. Thus, the proof is consuming security margin potentially way more than \(2^{128}\) times. For this reason, all of these obfuscation protocols require the LWE parameters to be subexponentially secure, which means the ciphertexts are many thousands of times larger than regular LWE ciphertexts (and because of the stacked nature of the protocol, this "many thousands" itself compounds on itself several times).

If you wish, you can "yolo and ignore this", the same way that recursive STARK protocols yolo and ignore the fact that they require the hash function to simultaneously be a random oracle and be a concrete thing that can be instantiated by a circuit. Doing that is plausibly okay, but if you do enough plausibly okay things for efficiency, you run into the philosophical question: why even bother to fight so hard to make obfuscation provably secure under standard assumptions, instead of adopting one of the other approaches (eg. diamond iO) that are provably secure under aggressive assumptions but have far better efficiency?

In summary, the answer to "what cryptographic assumptions do you need to make a working obfuscation protocol" is:

A lot more robust than the 2010s, when obfuscation was built on matrix branching programs that relied on lattice-based multilinear maps - which, so far, have not survived cryptographic scrutiny and have frequently been broken in fatal ways.
But still far from perfect

XiO and trusted setups

Finally, another obvious but important thing to highlight: the security of obfuscated programs depends on trusted setups in a whole bunch of places, and so we are very far from being able to do fully trustless obfuscation (ie. generate programs with internal secrets in such a way that even the obfuscator does not know the secret). The best that we can do is replace the single party trusted setup with an N-party trusted setup.

Here is one natural way to do this.

We define a generator \(G(H, s)\) which returns a two-input program \(F(\beta, x)\) with the following behavior:

If \(\beta = 0\), it interprets its second input as a program \(P\), and returns \(Obf(Q)\) where \(Q(input) = P(input, hash(s, P, 0))\). That is, it takes as input a program \(P\) that has a slot for a secret, and outputs an obfuscated version of \(P\) with its internal secret hardcoded in.
If \(\beta = 1\), it interprets its second input as randomness, and it outputs \(Obf(H(H, hash(s, x, 1)))\).

By convention, we will say, \(G(G, x) = G_x\) (note: this is using a trick similar to the Y-combinator) and \(hash(s_i, r_{i+1},1) = s_{i+1}\).

Here's how a chain of N participants would look:

Alice runs \(G(G, s_1)\) to generate \(G_{s_1}\) (note that the raw \(G\) is unobfuscated, but the output of this step and all future steps will be obfuscated)
Bob calls \(G_{s_1}(1, r_2)\) which outputs \(G(G, hash(s_1, r_2, 1)) = G_{s_2}\)
Charlie calls \(G_{s_2}(1, r_3)\) which outputs \(G(G, hash(s_2, r_3, 1)) = G_{s_3}\)

Each participant would also run their output \(G_{s_k}\) with \(\beta = 0\) on \(P\) to obfuscate \(P\) with \(s_k\) (the mix of all previous secrets) as the secret, and publish this obfuscation. Only the last participant's obfuscation will actually mix in everyone's secrets, but in our model, we are assuming that participants come in one after the other, and none of them knows if they will be the last.

This way, the setup is universal and updateable, the gold standard for trusted setups.

And to verify the correctness of each step, each participant would need to make a recursive STARK proving that (i) the previous step was recursive-STARKed, and (ii) they added their randomness correctly.

However, this method is extremely expensive: it requires a STARK of obfuscation of obfuscation. An open challenge is to figure out the ideal way to do universal and updateable trusted setups for obfuscation.

So what are the runtimes?

Perhaps the most famous fact about these obfuscation protocols is that their runtimes are technically polynomial, but in practice galactic. To see why, we just need to look at the full tower again:

To decrypt FE, you need to run ABE decrypt of FHE eval of GC generation. Each one of those replaces each bit in the computation with a cryptographic object - either a hash or a matrix. FHE also requires a bootstrapping step that requires running FHE decryption inside FHE. ABE alone requires an overhead of \((log\ q)^5\), where \(log\ q\) must be significantly above the security parameter \(\lambda \approx 120\). And you're stacking these three things on top of each other. So FE alone requires an overhead factor somewhere in the trillions of trillions.
To evaluate sublinear RE, you need to evaluate an XiO of a succinct FE encryption. of a VM of the underlying program. XiO is mostly FHE; fortunately, you can reuse intermediate computations between the different bits that you're encoding, so for functions where most of the work is in generating a single thing, and extracting individual bits is relatively trivial, XiO overhead ~= FHE overhead, although unfortunately it's a very inefficient type of FHE. Also fortunately, the XiO is of the procedure that generates a succinct FE encryption, which does not care about what's underneath. But when you're evaluating, you then also have to, inside FE, generate the XiO of the next round.

The total theoretical overhead here is thus roughly: \(ABE * FHE * GC * XiO * sFE_{enc}\), where \(XiO\) is mostly FHE. All that per bit in the input.

Now, on top of all that, the parameter choices for each individual scheme have to be very conservative for subexponential security, adding over 1000x extra overhead at each LWE layer.

This is why these schemes are far from practical today; expected runtimes are longer than the lifetime of the universe.

Where do we go from here?

There are three routes:

Someone (or some bot) very smart discovers a way to optimize and/or simplify the above tower, using similar assumptions to what we have today. Perhaps there are ways to collapse ABE and XiO together into one single gadget - maybe find a construction that both has ciphertext-independent hints (like ABE) and is truly fully homomorphic so you can compute multiplications without having the cleartext wire values (like the split FHE used in XiO).
We figure out a way to do obfuscation with more aggressive cryptographic lattice assumptions, and thus create a different and much simpler tower.
We figure out a way to do obfuscation in some completely totally different way, not using lattices at all. Maybe this means inventing a new category of assumptions.

Routes (2) and especially (3) also imply that we get better at battle-testing such assumptions, again perhaps with AI help.

The next posts in this series will talk about what is perhaps the leading construction attempting to do (2), diamond iO, and what is perhaps the leading construction attempting to do (3), local mixing obfuscation.

If we succeed in either path, the reward is high: there is a real sense in which we will have "solved cryptography": any protocol that can be described using an idealized trusted third party, provided the adversary is allowed to rewind the clock, will be implementable securely. But getting there is still a formidable challenge.