Obfuscation: building the final boss of cryptography (Part I)
2026 Jun 29
See all posts
Obfuscation: building the final boss of cryptography (Part I)
Special thanks to Sora Suegami, Janmajaya Mall, Aayush
Jain and Fun Killer for feedback and review.
The most powerful primitive that has been conceived in cryptography
is obfuscation. Obfuscation lets you convert a program
\(P\) into an
"encrypted program" \(Obf(P)\) such that you can run \(Obf(P)\)
on cleartext inputs and get the same cleartext outputs that \(P\) gives
you, but the internal workings of \(P\) are hidden. The precise
formalism typically used, indistinguishability
obfuscation (iO), says that if you are given obfuscations
of two different programs that have the same functionality, you can't
tell which is which. Effectively, it's hiding the code, not the
data.
Obfuscation is powerful because it comes very close to the
theoretical ideal of a universal "trustless trusted third party":

Source: The
God Protocols (Nick Szabo), 1997
Cryptography protocols are often described by first imagining a
protocol that relies on a trusted third party who sees everyone's
messages and responds honestly, and then figuring out some way to do the
same thing without the trust.
- Encryption is simple: the "trusted third party" is effectively a
postage system that accepts instructions saying "I want [recipient] to
see [message]" and passes along the message to the recipient.
- Zero knowledge proofs replace a trusted third party who receives
your data, checks it, and then confirms to anyone who asks that the data
is in some sense correct
Obfuscation (technically, obfuscation plus hashes) lets you make a
simulated trusted third party for basically any protocol, so it
can replace both of the above and much much more. There is only one
major exception: an obfuscated program can't prevent itself from being
copied, so it can't do "stateful" things like money - and that's exactly
the gap that blockchains are well-placed to fill.
And so if you have obfuscation and a blockchain, you can do
some pretty magical things. Like, say, a secure, private and
collusion-resistant voting
system that has almost no trust assumption at all - no
M-of-N threshold committee required. Or basically anything from this
list from 2014, without any M-of-N trust assumption.

A fairly general-purpose way to combine obfuscation and
blockchains to make something very close to a "trustless trusted third
party"
So what's the catch? Well, it turns out that making a
secure form of obfuscation is really really hard.
There has been a decades-long tradition of insecure
obfuscation: people shuffling around the logic in compiled programs to
make it harder to
see what's going on - admittedly, often to prevent users from
modifying proprietary programs like games. This is the equivalent of
things like the Caesar cipher for
encryption - and, like Caesar-style ciphers, it regularly gets
broken.
As a result, there has also been a decades-long tradition of trying
to create an obfuscation protocol that we can mathematically prove is
secure. But almost from the beginning, this ran into a problem. In 2001,
we got a
famous result that creating an ideal form of obfuscation - obfuscate
\(P\) in such
a way that running \(Obf(P)\) reveals nothing beyond what
you can learn by querying an API that gives \(P(x)\) for any user-supplied \(x\) - is
impossible. The core idea is that a code instantiation of \(P\) always
reveals at least something beyond its outputs to user-supplied inputs:
at the very least, you can learn things by applying \(P\) to
its own code.
From that point, researchers shifted to trying to prove the
second-best target: indistinguishability
obfuscation (iO). This has been a twenty-year project, with many
failed attempts, many constructions of protocols that build on top of an
ingredient that does not yet exist, many people trying to build
that ingredient, failed attempts of that, and so on.
But in the last few years, we finally have some good news: we know how
to achieve iO under reasonable security assumptions.
But within the good news, there is bad news: the run time is
literally galactic. It's technically polynomial, but
it involves stacking many layers of "take one thing that's vaguely like
fully homomorphic encryption, now put the circuit for evaluating that
into another thing that's vaguely like fully homomorphic
encryption, run that in plain old regular fully homomorphic encryption
once for each bit generating an intermediate multi-megabyte
value, and oh yeah, did I mention that you have to put that whole thing
into another thing that's vaguely like fully homomorphic
encryption, and then run all that once for each bit in the
input?" As a result, runtimes for these "sort of provably-secure iO
schemes" are somewhere over λ10 (where λ is the "security
parameter", ie. the logarithm of how long it takes to break the scheme;
it's standard to say λ = 100 or 120)
There are two hopeful stories that you can tell here. One is that
this is similar to where SNARKs
were in 2010, and now that we know it's possible, smart people (and
bots) will start coming up with clever workarounds to each bottleneck,
and chopping off orders of magnitude from the runtime one after the
other, and eventually we'll get to something that "only" takes a day on
a heavy GPU to run (that may still sound prohibitive, but it's actually
enough for many interesting applications). Another is that we will see
more work on a different strategy toward the same goal: getting much
better at developing new cryptographic assumptions, and getting
better at telling which new assumptions are likely to be actually
safe.
If you add in the more heuristic approaches, there are roughly three
(not-yet-dead) families of obfuscation protocols so far, and you can
place them on a "tradeoff frontier" of efficiency vs bravery on security
assumptions:

This post will describe in detail the most galactic, but also the
most rigorous, family so far: the one that's in blue in the diagram
above.
Just a warning: there will be a lot of math.
Obfuscation is hard because it basically requires stacking almost every
primitive that cryptographers have invented in the past twenty years,
except for the primitives that you already know about if you're
a blockchain developer, such as SNARKs
and STARKs.
The underlying math will also be different: whereas SNARKs and STARKs
tend to have a lot of polynomials, hashes and elliptic curves,
obfuscation will have a lot of lattices, vectors and
matrices.

Notation notes
- The descriptions in this post are based on a combination of
approaches from different papers, they do not completely follow any
single one of them.
- In some cases choice of notation and vocabulary will differ from any
individual underlying post
- Vectors are lowercase, matrices are uppercase
- In the diagrams, the greyed text and dotted lines informally mean "X
depends on Y, but you don't have to actually pass Y into X, so the (size
/ runtime) of X may be much smaller than the (size / runtime) of Y"
- "iO" = "indistinguishability obfuscation", as opposed to "IO" =
"input/output", and both as opposed to the British Indian Ocean
Territory, which is what the .io TLD is named after
The standard pipeline
The "reasonably provably secure" obfuscation protocols that are built
today are built on top of a decade-old tower of constructions: the AJ15 / BV15 / LPST15 / LPST16 lineage.
AJ15 and BV15 are two roughly simultaneous papers that both
discovered roughly the same way to build obfuscation on top of a
primitive called functional encryption: an
"authority" publishes keys tied to a function \(F\), and
once those keys exist, anyone with the encryption key can encrypt \(x%\) in
such a way that anyone with the decryption key can recover \(F(x)\).
LPST15 came up with something similar. But the functional encryption
required needed to have very strong properties which were not yet
available. Fortunately, a year later, LPST16 discovered a way to build
obfuscation on top of something similar, sublinear compact
randomized encoding, and then built it on top of an already-known
"succinct-but-not-compact" functional encryption scheme plus a new
primitive called XiO - obfuscation that is only slightly
smaller in size than publishing a table of the outputs of the function
on all possible inputs. The bulk of the work since then has been
figuring out ways to actually implement XiO (though some protocols take
other routes, eg. JLS20).
We will split this description in three parts:
- Assuming you have a succinct FE scheme and an XiO scheme, how do you
build an obfuscation protocol?
- How do you build a succinct FE scheme?
- How do you build an XiO scheme?
Assuming
you have a succinct FE scheme and an XiO scheme, how do you build an
obfuscation protocol?
There are several ways to build an obfuscation protocol on top of
succinct FE and XiO, and they are all roughly equal in their high-level
principles and their properties. So I will stick to a simplified version
of the LPST15 design.
Note that in this post, we will often use "function" and "circuit"
interchangeably. A circuit is a set of AND, OR, NOT, etc gates with
wires linking them that can be used to evaluate a function. To get a
better intuition of circuits, I recommend this
post explaining garbled circuits (a primitive that we will need
later anyway).
First, let us define succinct FE and XiO, so we understand the
properties of these gadgets.
Here is succinct FE:
Authority generates circuit-independent public
parameters, and circuit-specific decryption keys for a publicly-known
function with a single-bit output with circuit \(C\).
Encryptor can use the encryption keys to encrypt
\(x\). The
runtime of this step does not increase (or increases only slightly) with
the circuit size of \(C\), though it does scale linearly
with input length.
Decryptor can use the decryption keys to learn \(C(x)\) and
nothing else about \(x\). The runtime of this step
does scale with the circuit size of \(C\).
In other words: someone does a trusted
setup, I encrypt \(x\), you can decrypt \(C(x)\)
(different from fully homomorphic encryption: in FHE, anyone who can
decrypt \(C(x)\) can
also decrypt \(x\) or any other function of \(x\))
Functional encryption is not on its own sufficient for obfuscation,
because (i) it doesn't hide the function, and (ii) each published
encryption is bound to one input and can only decrypt the function
evaluated on that one input. But it gets you a lot of the way there.
Note that the more common definition of succinct FE requires the FE
to tolerate high circuit size, not high circuit depth.
In our usage, however, we do need to tolerate high depth.
Fortunately, depth-independent succinct FE is a fairly simple wrapper on
top of depth-bounded succinct FE (we will get into this later), so in
this post we will just assume that the succinct FE that we use is also
depth-independent.
Now, here is XiO:
Generator chooses parameters for a hidden function
with circuit \(C\) that has a single-bit output.
They generate an encoding of that function. This step is allowed to take
as long as evaluating \(C\) on all possible inputs or even
longer, but the output must be smaller than the truth table (the set of
outputs for all possible inputs)
Evaluator evaluates the encoding to learn \(C(x)\) for
any input \(x\).
The class of functions that XiO is designed for is functions that are
really best understood as zero-input functions (aka
"thunks")
that produce a large but manageably-sized output. Given a thunk \(() \rightarrow
X\), we can think of it as being a function \(i \rightarrow
X[i]\), ie. the function which takes as input an index
\(i\) as
input, and returns the i'th output of the thunk. These are two different
views of the same object; in this post we will regularly bounce back and
forth between these views.
For example, you could imagine taking a function that generates a STARK
(a roughly 128-512 kB sized cryptographic proof) and turning it into a
function that takes as input an index \(0 \le i \lt
2^{22}\), generates the STARK internally, and then outputs
the i'th bit of the STARK. The goal of the XiO is that it's a gadget
smaller than the STARK (though note: even a STARK may be too small to
actually allow known XiO protocols to shrink it) that lets you
generate that STARK, without being able to learn anything else about the
underlying process that's generating it.
Now, we combine succinct FE and XiO into a primitive called a
sublinear compact randomized encoding:

Let's walk through this carefully. The goal here is to create a
"randomized encoding" of \(P()\) - that is, a gadget that
enables executing \(P()\) - which is asymptotically
smaller than the runtime of \(P\) and asymptotically smaller than
its output, and which hides \(P\).
This is slightly different than the usual presentation of
randomized encodings, which operate over \(P(x)\) and
hide \(x\) but not
\(P\) (eg.
garbled circuits work this way), but this is what we need for our use
case - obfuscation is all about hiding the function.
The primitives that we will be working with do not hide the function.
To do this, we make the "outer" function that these primitives are
operating over just be a virtual machine (or "Turing machine") -
basically, a function \(VM\) which takes circuits as input,
so \(VM(P,
X) = P(x)\). The circuit evaluating \(VM\) is
often called a universal circuit. This way, \(P\) becomes
just another input, which can be made private.
One nuance in terms of asymptotic complexity: \(VM\) needs
to be instantiated as a circuit, and the size of a circuit must
be proportional to its runtime (because circuits do not allow looping).
Hence, the size of the \(VM\) circuit must be proportional to
the runtime of \(P\). Fortunately, succinct FE allows
generation to be fast even if the circuit is large and deep. However,
the output may have size and encryption cost that scales with
the size of \(P\). This is the reason why \(P\) must be
expressed in a language other than circuits (such as a Turing machine):
its size must be fixed even if the computations get large.
We do a trusted
setup of succinct functional encryption, where the creator publishes
public params and decryption keys for \(VM\). This allows anyone to encrypt
\((P, x)\) in
such a way that anyone else can decrypt \(VM(P, x) = P(x)\), but without
learning \(P\) or
\(x\).
To generate this randomized encoding, the creator wraps the two
primitives I mentioned above inside each other: they do an XiO of a
succinct FE encryption of \(VM(P, x)\). The succinct FE
encryption hides \((P,
x)\), and it's fast to generate even if \(P\) takes a
long time to run. But succinct FE is not output-compressing: if the
output is big, the succinct FE encryption is even bigger (and this is an unavoidable
limitation of that type of construction). The XiO cuts the size back
down. The creator makes a circuit \(C_{sfe}(i)\) that outputs the i'th
bit of the succinct FE encryption, and does XiO over that. Its size is
now asymptotically smaller, and it's asymptotically faster to generate,
than the program \(P\) itself (note "asymptotically":
at small sizes, the overhead of the scheme dominates, the key property
we want is that for sufficiently large circuits, \(xIO(sFE(VM(P, x))))\) becomes
smaller than \((P,
x)\) itself, both in byte size and in generation time.
Notice that here we already have our first glimpse of obfuscation:
the creator themselves can both do the trusted setup and publish the
xIO, and others can execute it without being able to learn the program
being run. But we have a big problem: this only works for one
pre-configured input.
Now, we get to the next part: the final obfuscation construction.

Essentially, we take the sublinear compact RE primitive, and we apply
it recursively, recursing on the number of bits of input going into the
function you are trying to obfuscate. At the base case (zero inputs), we
just use the sublinear compact RE as above. To add one bit, we make a
sublinear compact RE of the process of generating two obfuscations one
level lower: obfuscate the function \(P_0(x)\) which executes \(P\) but
with the first input bit fixed to \(0\), and \(P_1(x)\)
which executes \(P\) but with the first input bit
fixed to \(1\).
Remember the key properties of the two building blocks of sublinear
compact RE:
- XiO ensures that this obfuscation can be smaller than the
two sub-obfuscations it is generating
- Succinct FE ensures that it can be faster to generate than
the two sub-obfuscations it is generating.
Both properties are needed to ensure the recursion avoids blowup.
To evaluate an obfuscated program, you evaluate the top-level RE to
get your two new obfuscations (both with one fewer input bit required),
then choose either left or right based on the first input bit, then keep
recursing further down, until you bottom out with a thunk that generates
\(P(x)\) for
the \(x\) that is
the full path you walked down.
Here's another equivalent way to look at what's going on:

For programs with an n-bit input, there is a depth-n, size \(2^n\) tree
of all possible evaluations. However, most of this tree is never
evaluated. The whole thing is a tree of "thunks", and so the exponential
cost of creating the tree is never paid because all the thunks that are
not on the path to the specific input you care about never actually get
instantiated. The only thing that does get instantiated is the
obfuscation at each step along with path going from the root to the
input. And at a very high level, that's it, that's the core design.
Note that the original LPST15 and LPST16 papers describe what
is going on slightly differently (but equivalently): they do not recurse
on obfuscation directly, rather they recurse on a "node
program". One nice property of that approach is that the node
program captures the "bits of the input so far", so you don't have to
manipulate or wrap \(P\) itself; \(P\) stays
unchanged throughout the whole process in their version.
Another thing that the above description elided is that for
obfuscation to be secure, it needs to be randomized. So actually, it's
not \(iO(P)\),
it's \(iO(P,
seed)\), where the \(seed\) is a value that is hashed to
generate (pseudo-)randomness needed by primitives like garbled circuits.
Randomized encoding, XiO and succinct functional encryption also all
take a \(seed\) as
an argument. When one of these functions calls any other as a
subroutine, it should generate the seed for the subroutine via a hash
from its own seed, and take care to provide a unique different seed for
each one. To make the security proofs clean, the hash must technically
be a puncturable
PRF, though with the interesting property that the punctured version
of the hash needs to be possible to construct but never actually gets
run.
How do you build a
succinct FE scheme?
The bad news: the succinct FE scheme is a complicated tower of
constructions, one stacked on top of the other.
The good news: these constructions are relatively well-understood,
use very standard cryptographic assumptions ("just" lattices),
and the pipeline to build succinct FE has been understood since roughly
2014.
Here's the diagram first:

Let's go through these one by one. The two most fundamental building
blocks are garbled circuits and fully homomorphic encryption.
Garbled circuits
Here
is a post where I explain garbled circuits in more detail. The basic
summary is:
- For each wire in a circuit you generate two labels, one representing
0 and one representing 1.
- For each gate, you publish a table of which pair of input labels
corresponds to which output label, stored sorted by label so it's not
clear which input label corresponds to zero and which corresponds to
one.
- The output labels are not given "in the clear"; instead, they are
typically XOR'd with the hash of the two inputs. This makes them
calculable when needed, but prevents you from learning anything about
the "branch" other than the one that you have both input labels
for.
- To execute, you need to obtain the labels corresponding to your
input, and then you "walk down the circuit" to eventually determine the
values of the wires at the end (which are actually values, and not just
labels)

Garbled circuits can only safely be executed once. If you give
someone the labels for two inputs, they are able to compute much more
than two outputs. Garbled circuits were originally developed as a 2-of-2
multi-party-computation primitive: I have a circuit \(C\), you
know \(x\), I send
you a garbling of \(C\), for each input wire I help you
learn one of the two input wires without learning which one using a
technique called oblivious
transfer, and then you walk the circuit to compute the output
\(C(x)\).
A basic rough description of how oblivious transfer works: you send
me two public keys \(A\) and \(B\) that have to add up to some
random hash so only one of them can have a valid private key but I don't
know which one, I encrypt one label with \(A\) and one with \(B\), you
decrypt the label that you have a private key for, but I can't tell
which one that is.
Here, however, we are not doing 2-of-2 computation. Rather, we are
using garbled circuits as an ingredient to get the properties
that we need for succinct FE. Garbled circuits have many valuable
properties. Notably:
- You can give someone labels for an input without revealing the
input.
- Because the per-gate work is parallel, generating a garbled circuit
for \(C\) is
low-depth, even if \(C\) itself is very high depth.
Both of these properties will be important.
Garbling hides some information about the circuit, but not
enough to be meaningfully useful for anything where you want
function-hiding. When function-hiding is required (eg. the 2-of-2 MPC
use case), a typical solution is to garble a universal circuit (aka
virtual machine) and have the circuit provider also provide the actual
circuit \(C\) they
want to run as labels that become part of the input. For our
use case, we don't care about garbling leaking details of the function,
in fact precisely because we are employing this "wrap it in a VM"
trick.
Fully homomorphic encryption
(FHE)
Here
is a post where I talk about FHE in more detail. Because the details
matter for the other constructions later in this post, I will provide a
basic summary.
FHE is based on a cryptographic assumption called learning with
errors (LWE). Basically, if you have an approximate
solution to a system of linear equations modulo some number (ie. a
matrix \(A\),
vectors \(s\) and
\(b\),
modulus \(q\) such
that \(b =
(A*s + e) % q\) modulo \(q\), where \(e\) is a
"small" (aka "low-norm") error vector, so each value in \(e\) is much
smaller than \(q\)), then given \(b\) and
\(A\) you
can't extract \(s\). If you have an exact
equation \(s * A =
b\), then that's a system of linear equations (modulo q),
which can be done reasonably cheaply with Gaussian
elimination. But with errors added, doing this inversion is
computationally infeasible.
There are many FHE algorithms built on top of this assumption that
have different properties. The simplest idea involves randomly choosing
a private key \(s\), generating a random matrix
\(A\), and
computing \(b = A * s + e + m *
\frac{q}{2}\) modulo \(q\) as the encryption of a single
bit \(m\) (here,
adding a bit to a vector adds that value to every element of the
vector).

The creator can then publish encryptions of \(0\) and
\(1\)
computed this way. To decrypt, you can compute \(b - A * s\)
which equals \(e + m *
\frac{q}{2}\); the higher-order bit encodes the message
and the lower-order bits encode the error which can be thrown away. If
you have two ciphertexts \(b_1\) and \(b_2\) that
are valid encryptions of \(x_1\) and \(x_2\), then
\(b_1 + b_2\)
(again, all modulo \(q\)) is a valid encryption of \(x_1 +
x_2\).
Notice also that the above design as written can support encrypting
vectors of bits in the message slot, and not just single bits.
But there's one critical thing that the above design does not actually
support: multiplication.
Multiplication is trickier than addition, for a few reasons. First,
you can't multiply two vectors by each other and get a vector; you can
only do that to other structures like matrices or polynomials. Second,
whereas adding two three-term values \(A * s + e + m *
\frac{q}{2}\) gives you another three-term value,
multiplying two three-term values gives you a nine-term value, so you're
still getting something of a different "shape". Third, you end up
multiplying the "small" error by "large" other things, which makes the
error blow up after only one round of multiplication.
There is no one easy solution to this. There are several families of
solutions, that all have different drawbacks. The ones that use
polynomials are based off of a stronger assumption called Ring
LWE; a commonly-used scheme is BFV.
The scheme that is most convenient to FHE usage is based on matrices,
and is called GSW. To
show how it works, I will first show a simplified version that makes the
basic arithmetic work, but does not deal with the error issue, so it
breaks if you have nonzero error.
First, generate a secret \(s\). To encrypt a value \(m\), first
generate an otherwise-random matrix \(A\) where \(s * A\)
equals some "small" or "low-norm" error \(e\). One way to do this is to
generate both in two parts:
- Generate a random \(s_{prefix}\), and then set \(s =
(-s_{prefix}\ |\ 1)\) ( \(|\) is concatenation)
- Generate a random \(A_{prefix}\), and a random low-norm
error \(e\), and
then set \(A =
({{A_{prefix}} \atop {s_{prefix} * A_{prefix} + e}})\)
(that's vertical concatenation)

Then, compute \(c = A +
m * I\) (where \(I\) is the identity matrix). And to
decrypt the ciphertext, read off \((s * C)[-1]\) (ie. the last value of
\(s *
C\)).

If you are a mathematician, you might notice that we are "hiding"
\(m\) in an
igon
value eigenvalue of C: \(s\) is the eigenvector, so it
satisfies \(s * C
\approx s * m\). They are not quite an exact
eigenvector and eigenvalue; they're eigenvectors and eigenvalues with
errors. But first, we can set aside the errors, and note some properties
that make eigenvalues special.
Adding ciphertexts is again trivial: if
\(C_1\)
satisfies \((s * C_1)[-1] \approx
m_1\)
\(C_2\)
satisfies \((s * C_2)[-1] \approx
m_2\)
then
\(C_1 + C_2\)
satisfies \((s * (C_1 + C_2))[-1]
\approx (m_1 + m_2)\)
But unlike before, we can multiply ciphertexts too: if
\(s * C_1
\approx s * m_1\)
\(s * C_2
\approx s * m_2\)
then
\(s *
(C_1 * C_2)\)
\(= (s *
C_1) * C_2\)
\(\approx
m_1 * s * C_2\)
\(\approx
m_1 * m_2 * s\)
Eigenvalues are basically the only attribute of a matrix that has
both an additive and a multiplicative property like this, and even there
we are restricting to matrices that have the same eigenvector.
Now, let's bring back error tolerance. The mechanism
above, as written, has a fatal flaw: if you write out the full equations
with error, you end up multiplying error by elements of \(C\) and
\(s\), which
can be anywhere in the full range \([0, q-1]\). Hence, this immediately
blows up error to the maximum. Additionally, even decryption cannot
tolerate error.
To plug the first hole (we'll get back to the second at the end),
actual GSW relies on a gadget matrix mechanism.
Here is what a gadget matrix looks like:

On its own this looks nice but it does not make much sense. However,
it is intended to be paired with an operation (not a matrix) that it
cancels out: the matrix bit decomposition operation:

Every adjacent four values in the output of this operation (which we
call "\(G^{-1}\)")
is the (least-significant-bit first) binary encoding (aka. bit
decomposition) of the correspondingly-placed value in the input.
The most important facts about \(G\) and \(G^{-1}\) are:
- The output of \(G^{-1}\) is
"low-norm" (all small values)
- \(G *
G^{-1}(x) = x\), for any \(x\)
You can see the latter intuitively. If you trace the highlighted
bit-decomposition of 2 (0 1 0 0) in the above example
through the computation, then it gets multiplied by 1 2 4 8
from the gadget, and the result is \((0 * 1) + (1 * 2) + (0 *
4) + (0 * 8) = 2\). That is, if you apply the gadget
matrix to a bit decomposition, it "evaluates" the bit decomposition to
get back the original value.
In those places where we used the identity matrix (\(I\)) in the
simplified description above, in the actual protocol we use
\(G\). That
is, ciphertexts are of the form \(A + m * G\). We multiply two
ciphertexts by computing \(C_1 *
G^{-1}(C_2)\). The message is being hidden in something
that's no longer quite an eigenvalue-with-errors, but we still
preserve the needed properties. The math ends up working the same way,
but because the error never gets multiplied by large numbers,
multiplication actually works. Specifically:
- Encryption: \(C = A +
m * G\)
- Decryption: \(\frac{(s * C)[-1]}{q /
2}\)
- Adding: \(C_1 +
C_2\) (duh)
- Multiplying: \(C_1 *
G^{-1}(C_2)\)
Note one subtlety in the decryption. The last column of \(G\)
contains \(\frac{q}{2}\) as its only nonzero
entry, and so multiplying by \(G\) actually gives us an object
containing \(m *
\frac{q}{2} + e\) (and not \(m + e\)). This is what gives us
error tolerance in decryption. This is also why we are now dividing by
\(\frac{q}{2}\) when we encrypt. Note
that this method as written requires \(q\) to be a power of two. If you
want, you can make \(q\) be something else, but this
requires tweaking the decryption procedure slightly.
We can check the correctness of multiplication. Let's start with just
the algebra. First for addition:
\(C_1 =
A_1 + m_1 * G\)
\(C_2 =
A_2 + m_2 * G\)
And then for multiplication:
\(C_1 *
G^{-1}(C_2)\)
\(= (A_1 + m_1 * G) *
G^{-1}(C_2)\)
\(= A_1 *
G^{-1}(C_2) + m_1 * G * G^{-1}(C_2)\)
\(= A_1 * G^{-1}(C_2) + m_1
* C_2\)
\(= A_1 *
G^{-1}(C_2) + m_1 * A_2 + m_1 * m_2 * G\)
Now, we have to show that the first two terms are both valid "pads",
in the same way that the original \(A_1\) and \(A_2\) were
valid pads. The core property we need is that multiplying \(s * A\)
leaves only a small low-norm error.
The second term is easy: \(A_2\) is a valid pad (\(s * A_2\)
is low-norm), and we multiply it by \(m_1\), which is small, so \(m_1 * A_2\)
is also low norm and hence a valid pad.
For the first term, let us unpack \(s * A_1 *
G^{-1}(C_2)\). \(s * A_1\) returns a "small" error.
\(G^{-1}(C_2)\) returns a matrix of
ones and zeroes. Hence, \(s * A_1 *
G^{-1}(C_2)\) is a small error multiplied by a matrix of
ones and zeroes, which is still a small error.
The error in the second pad blows up by roughly the size of the
message space (so, 0 or 1). The error in the first pad blows up by
roughly the size of the matrix. Hence, each multiplication blows up the
error by a constant factor, and so we can predict how many
multiplications the FHE can survive before the error gets too high. At
that point, you would need to reset the error by bootstrapping: evaluate
the FHE decryption circuit inside FHE.
Finally, one more nuance: to keep the error blowup bounded, we need
to require messages to stay small; specifically, they must be 0 or 1 (we
can also accept eg. -1 or 2). Addition and multiplication do
not respect size limitations, and unlike BGV, here we do not have native
"wraparound" modulo 2. So to make the above algorithm safe, you would
need to use circuits with "logic gates" implemented like this:
- \(AND(a,
b) = ab\)
- \(OR(a,
b) = a + b - ab\)
- \(XOR(a,
b) = a + b - 2ab\)
- \(NOT(a)
= E_1 - a\) where \(E_1\) is an encryption of 1.
This is only one way to do it; it's not optimal; there are ways to do
it more efficiently, and that is part of the art of FHE
optimization.
Here is some python code that follows along this style of GSW, though
note that it uses a slightly different technique where \(A * R\) is
the "pad" instead of \(A\) (this has all of the properties
above, plus an extra property that we will need later). https://gist.github.com/vbuterin/f0f8a9eb09633226ada20c21a98d537e
It's worth playing around with this kind of FHE, because it helps
build intuitions for some of the other primitives that we are going to
use below.
Now, let's get to the next one:
Attribute-based encryption
(ABE)

Not this Abe.

And not this Abe.

And also not this Abe.
Here is the definition of attribute-based encryption:
Authority generates keys for a function with circuit
\(C\) , they
generate a "master public key" \(mpk\) (public), and a secret key
\(sk_C\)
(given to decryptor) that depends on \(C\)
Encryptor knows \(m\) (a message to encrypt), they
know a \(tag\) for
that message, they do not necessarily know \(C\) but they do know \(mpk\). They
produce a ciphertext \(T\)
A decryptor given such a \(T\) and
\(sk_C\) can
decrypt to recover \(m\) if \(C(tag) = 0\), otherwise they
cannot.
Usually, attribute-based encryption is justified by examples like:
\(m\) might
be medical records at some hospital \(H\), \(tag\) might be an object that
represents "you work at \(H\) AND you have a medical degree",
so it becomes easy to encrypt the medical records in such a way that
only the intended doctors can see them. As far as I can tell, however,
the number of use cases that map well to this paradigm and can't easily
be solved with much simpler public key encryption is very small. And so
ABE has not been used much in practice.
.But here, we have very good news: ABE has some valuable
properties that make it very useful in building functional encryption!
(And hence, obfuscation) In particular, even for circuits that
are large, encryption is cheap. This is the ultimate seed of
the asymmetry that allows the "thunks" in the obfuscation to be faster
to create than they are to execute, and hence makes the
exponentially-sized tree possible at all.
Here is how ABE works. Here we will follow the BGG+14 construction,
whose properties are ideal for the succinct FE and obfuscation use
case.
We work over a circuit. To have an example in your head, take the
circuit that we used in the garbled circuits example (it's a two-bit
adder), but remove the "two labels per wire" piece:

Instead, we will have only one matrix \(B_w\) per wire, plus two
randomly-generated global public matrices \(A\) and \(D\).
The initial \(B_i\) are generated randomly. To
generate the \(B\) matrices for the wires further
down, we walk down the circuit going through each gate:
- If it is ADD, then compute \(B_{out} = B_{L} +
B_{R}\)
- If it is MUL, then compute \(B_{out} = B_R *
G^{-1}(-B_L)\)
As in FHE, we need to build OR, XOR and AND from ADD and MUL in ways
that preserve the wire values staying in \(\{0, 1\}\).
The authority publishes circuit-independent \(A\) and
\(D\), and
the \(B_i\) for
the input wires for the circuit. They also provide the
decryptor with a decryption key, a low-norm matrix \(R_f\) that
satisfies \((A |
B_f) * R_f = D\), where \(B_f\) is the \(B\) matrix
for the wire that encodes the output (ie. \(C(tag)\)). Notice that the
construction of \(B\) matrices is not dependent on the
input to the circuit, which is why the authority is able to
construct \(B_f\) in a way that is then valid
for all inputs. To make the equation between \((A |
B_f) * R_f = D\) correct, \(A\) needs to be constructed in a
special way that gives it a "trapdoor"; we will return to this
later.
First, let's go through the encryptor and decryptor's logic.
To encrypt, the encryptor chooses a random vector \(s\), and
computes \(c_{out}
​= s * D + e_{out} ​+ \frac{q}{2} * m\). They also provide
the input encodings, \(c_i = s * (B_i + G *
tag[i]) + e_i\), and \(c_A = s * A +
e_A\). Notice how encrypting does not depend on the
circuit itself; it only requires knowing and doing a few multiplications
involving the input \(B_i\) and \(D\).
The decryptor's job will be to start with these \(c_i\)
values, and walk down the circuit, doing a homomorphic-encryption-like
operation at each gate to convert the \(c_L\) for the left input wire and
\(c_R\) for
the right input wire into \(c_{out}\) for the output wire.
Here, the ADD case is again trivial: \(c_{out} = c_L +
c_R\). The MUL case is the harder one.
The formula is: \(c_{out} = x_R * ​c_L ​+ c_R
* ​G^{-1}(-B_L)\)
\(x_R\) here
is the actual value on that wire in the computation. To see why this
works, let's look at each component in turn. For ease of exposition,
I'll write \(G^{-1}(-B_L)\) as \(R_{-L}\);
just remember that because \(G\) and \(G^{-1}\) cancel out, \(G * R_{-L} =
-B_L\):
\(x_R *
c_L\)
\(= x_R * (s * (x_L * G +
B_L) + e_L)\)
\(= x_R * s * (x_L * G +
B_L) + x_R * e_L\)
\(= s *
(x_L * x_R * G + x_R * B_L) + x_R * e_L\)
\(c_R *
R_{-L}\)
\(= (s * (x_R * G + B_R) +
e_R) * R_{-L}\)
\(= s *
(x_R * G * R_{-L} + B_R * R_{-L}) + e_R * R_{-L}\)
\(= s *
(-x_R * B_L + B_R * R_{-L}) + e_R * R_{-L}\)
If we add the two together, the \(x_R * B_L\) cancel out, and we
get:
\(s *
(x_L * x_R * G + B_R * R_{-L}) + x_R * e_L + e_R *
R_{-L}\)
The \(B_R *
R_{-L}\) on the left side is just \(B_R *
G^{-1}(-B_L)\), which is the definition of \(B_{out}\)
for multiplication that we gave above. And the right side is error
multiplied by low-norm values, so it stays as low-norm error.
And so the decryptor has everything they need to walk their way to
\(c_f\).
If \(C(tag)
= 0\), then the \(s * x_f * G\) term drops off from
the definition of \(c_f\), and we just have: \(c_f = s
* B_f + e_f\)
Once they get to \(c_f\), they prepend \(c_A\) to
get \(c'_f =
(c_A\ |\ c_f)\), which satisfies \(c'_f =
s * (A\ |\ B_f) + e'_f\). They then compute:
\(c_{out}
- c'_f * R_f\)
\(= s * D
+ e_{out} + \frac{q}{2} * m - s * (A\ |\ B_f) * R_f - e'_f *
R_f\)
\(= s * D
+ e_{out} + \frac{q}{2} * m - s * D - e'_f * R_f\)
\(= e_{out} + \frac{q}{2} *
m - e'_f * R_f\)
If the error did not blow up too much (remember, \(R_f\) is
also low-norm), the decryptor can now freely recover \(m\).

Now, let's get back to the trapdoor mechanism.
We do not construct \(A\) randomly. Instead, we construct
it as \(A = (A_{prefix}\ |\ G -
A_{prefix} * R)\), where \(R\) is a low-norm "trapdoor", and
\(G\) is the
gadget matrix from before. Similarly to the FHE ciphertexts we saw,
under the LWE assumption this kind of construction is computationally
infeasible to distinguish from random if you do not have the trapdoor
yourself.
The goal is to satisfy a neat mathematical property:
\(A * {R
\atop I}\)
\(=
(A_{prefix}\ |\ G - A_{prefix} * R) * {R \atop I}\)
\(=
A_{prefix} * R + (G - A_{prefix} * R) * I\)
\(= A_{prefix} * R + G -
A_{prefix} * R\)
\(= G\)
Or graphically:

We've constructed a matrix \(A\), where in some sense \(R \atop I\)
is a sort of "inverse" of A (remember: \(G\) often plays the role of
"pretending" to be the identity matrix in these types of LWE
constructions).
The goal here is that we want to create matrices for which, for any
known vector \(u\), only the creator can find a
low-norm vector \(r\) where \(A*r = u\).
That is, the trapdoor lets us "solve" the short
integer solution (SIS) problem, which is closely related to LWE and
is the basis for this type of cryptography working.
Here is the algorithm:
- Binary-decompose \(u\) (ie. apply good old \(G^{-1}\) to
it), let the output be \(z\)
- Output \(r = {R
\atop I} * z\)
\(R\), \(I\) and
\(z\) are all
low-norm, so this is also low-norm. To see algebraically why this solves
the equation, compute:
\(A * r
\\ = A * {R \atop I} * z \\ = G * z \\ = u\)
To guarantee that this is safe (in the sense of not leaking \(R\)),
real-world implementations add additional error at this step; see MP12
for more details on maximally efficient ways to do this.
Now, let's go back to the still-unsolved puzzle from above: computing
\(R_f\).
The equation we are solving for is:
\((A\ |\
B_f) * R_f = D\)
We will interpret \(R_f\) as \(R_{top}
\atop R_{bottom}\), so we get:
\(A * R_{top} + B_f *
R_{bottom} = D\)
We sample a random low-norm error for \(R_{bottom}\), so now we only need to
solve:
\(A * R_{top} = D - B_f *
R_{bottom}\)
Here, we just use the method above row-by-row to compute each row of
\(R_{top}\).
And we're done.
An astute reader may notice a vague resemblance to ring signatures,
at least for the 1-of-2 case: you compute one of the two terms "the easy
way", and you compute the other value to match (which you're "not
supposed to be able to do") using a trapdoor.
For clarity, note that the \(R\) in the trapdoor is never given
out, not even to the decryptor. \(R_f = {R_{top} \atop
R_{bottom}}\) is what's given to the decryptor.
Now, let's recap what we have:
- Two global public matrices \(A\) and \(D\), one random and the other
computed via a trapdoor.
- For each wire, a matrix \(B_w\); the output wire's matrix is
\(B_f\)
- A decryption key \(R_f\) satisfying \((A\ |\
B_f) * R_f = D\)
- An encryption is \(c_i\) values, one for each input
wire, and a \(c_A\) corresponding to \(A\) and a
\(c_{out}\)
corresponding to \(B_f\) but with the message mixed
in.
- The decryptor walks to \(c_f\), and computes \(c_{out}
- (c_A\ |\ c_f)\) to obtain the message. But this works
ONLY if \(C(tag)
= 0\); if it is not, then there is a \(s * x_f *
G\) term mixed in that throws off everything.
- The most important amazing property of this is that you're
able to generate a ciphertext whose decryptability depends on a long
computation, without yourself ever running that computation
(though an "authority" - think, trusted setup - does need to also run it
once)
The main limitation of this scheme is on circuit depth: each
multiplication increases error by a constant amount, and so the deeper
the circuits, the bigger the values and the matrices need to be to
compensate. You can do arbitrarily deep circuits in theory, but
if you try, the overhead of making the ciphertexts big enough to handle
the error blows up. This is what leads to the depth limitation in basic
succinct FE, which the wrapper we will discuss later removes.
And that's ABE.

By the way, it's also not this Abe.
Functional
encryption for fixed-depth circuits
Surprisingly, at least for this bit, we're now done the hard math.
The next section is going to be just stitching together lego bricks.
First, the definition of functional encryption:
Authority generates keys for a function with circuit
\(C\) , they
generate circuit-independent public params (including a public key) as
well a decryption key which depends on \(C\)
Encryptor encrypts \(x\) with the public key
Decryptor learns \(C(x)\) and nothing else
A particular property that we'll be after is succinctness:
like we saw with ABE, encrypting must be cheap, even if the
circuit is large.
Here's the process (following the GKP+13
construction). First the authority's work. The authority chooses the
circuit \(C\) that
they are generating parameters for.
- Let the length in bits of an FHE ciphertext be \(w\)
- Generate ABE public params and \(2*w\) ABE keys
- The ABE circuit is "take an FHE pubkey \(FHE.pk\) and the FHE-encrypted input
\(E_x\),
compute \(FHE.eval(C, E_x)\), output the i'th
bit"
- "The i'th bit" has w options. Also, we make one set of params that
output the i'th bit directly, and another that flip it before outputting
(ie. the decryption is conditional on the output being 1). This
contributes two options per bit. Hence, \(2*w\) sets of params total.
- Publish the public params of the ABE, and give the \(2*w\) ABE
decryption keys to the decryptor.
An alternative to issuing \(2*w\) ABE encryptions is to modify
the ABE so that it simultaneously encrypts two messages, one decryptable
only if \(C(tag)
= 0\) and the other only if \(C(tag) = 1\). This adds some more
complexity, but saves 2x overhead.
Now, the encryptor's work. The encryptor has a value \(x\) that
they want to encrypt.
- Generate an FHE keypair \((FHE.sk, FHE.pk)\), encrypt \(x\) to get
\(E_x\)
- Generate a garbling of the circuit that takes in a ciphertext \(E_y\) and
outputs \(FHE.dec(E_y, FHE.sk)\) (note that
\(FHE.sk\) is
hardcoded into the circuit here)
- This garbled circuit has input labels \(L_{\{index,
bit\}}\) and then more labels for each intermediate gate
and outputs. Encrypt \(L_{\{index,
bit\}}\) under the (index, bit)'th ABE key, using \(E_x\) as
the tag.
- Output: \(FHE.pk\), \(E_x =
FHE.enc(x, FHE.pk)\), and the full garbling, but with the
input labels encrypted.
The decryptor does the ABE decryption procedure on the encrypted
\(L_{in}\)
labels using \(E_x\) as the tag, and thereby learn
the input labels. They run the garbling and get the output.

Now, let's recap what we did here:
- We kept the ABE property that the encryptor does not have to
run the circuit; only the authority and the decryptor do.
- We changed the logic: instead of the circuit \(C\)
conditioning whether or not you can decrypt some unrelated
message \(m\), the
circuit is the only evaluation of \(x\) that you can decrypt
- The core idea is that the ABE's property (only decrypt is you're
"supposed to", as defined by some circuit \(C\)) ensures that the decryptor only
gets the input labels to the garbling that they are supposed to. The
garbling allows them to FHE decrypt only that value. If they try to run
some different computation, then the \(c_f\) in the ABE decryption will
have an extra dangling \(s *
G\), they won't be able to finish decrypting to obtain
valid input labels, and so they won't be able to run the garbling.
- We inherit the main limitation of ABE: because the thing done inside
the ABE is FHE evaluation of the circuit, which is as deep as the
circuit multiplied by a significant blowup factor, it only works well
for circuits of limited depth.
Functional
encryption for unlimited-depth circuits
This one is a one-liner. Instead of ABE-encrypting to \(C\), you
ABE-encrypt to the function that takes as input \(x\) and
generates a garbled circuit for \(C(x)\) (using eg. \(hash(x)\)
as the randomness for the garbling). Garbled circuit generation is
parallelizable, so this is low-depth even if the circuit is high-depth.
The decryptor runs low-depth FE to obtain the garbled circuit including
the appropriate input labels, and then walks the garbled circuit in the
clear to obtain the outputs.
And, we're done succinct functional encryption! To recap the whole
thing, here's our original diagram again:

How do you build an XiO
scheme?
Let us first remember the XiO definition:
Generator chooses parameters for a hidden function
with circuit \(C\) that has a single-bit output.
They generate an encoding of that function. This step is allowed to take
as long as evaluating \(C\) on all possible inputs or even
longer, but the output must be smaller than the truth table (the set of
outputs for all possible inputs)
Evaluator evaluates the encoding to learn \(C(x)\) for
any input \(x\).
"XiO" technically stands for "exponentially-(in)efficient
indistinguishability obfuscation", but it's best not to think of it as
being obfuscation at all. Instead, it's a way to compress a piece of
data, in the special case where the data is generated by a program that
is much smaller than the data, and where you want to hide the program.
It's designed for "thunks" (zero-input functions) that output \(n\) bits,
not what you might intuitively think of as "actual functions".
Most known constructions of XiO to date build upon a primitive called
split-FHE. Here's a definition of split-FHE:
Generator is assumed to have the FHE encryption and
decryption keys, and a specific ciphertext that they have in mind. We
assume an encryption scheme that can support efficiently encrypting
vectors (and not just single bits). Given a specific ciphertext \(C\)
encoding a specific vector \(v\), they generate a hint \(h\), which
is much smaller than \(v\).
Decryptor, given \(C\) and \(h\), can use the hint to decrypt
\(C\) to get
\(v\), but
cannot use it to decrypt any other ciphertext or learn anything else
about the FHE keys or the computation that \(C\) was derived from.
Here is how XiO built on split FHE works:

If we are encoding a thunk with output size \(n\), we
treat it as a function \(Q\) that takes an index \(i \in
\{1...n\}\), and outputs the i'th bit produced by the
thunk.
We split up the input space into \(\sqrt{n}\) buckets each of size
\(\sqrt{n}\).
For each bucket, we FHE-encrypt (using GSW) each index inside that
bucket (eg. \(1...\sqrt{n}\) for the first, \(\sqrt{n}+1 ... 2*\sqrt{n}\) for the
second), and run \(Q\) on each index. Then, we use a
special technique to convert the \(\sqrt{n}\) ciphertexts into a packed
ciphertext that holds all values. And finally, we release a
constant-size hint for it. The XiO is the set of all \(\sqrt{n}\)
constant-size hints.
There are three technical pieces here to understand:
- Some changes to GSW
- How the packing works
- How the hint mechanism works
In reality, the three are connected. We don't ever use the packing in
any context other than hint generation and hint-based
decryption, and the changes to GSW make it unable to decrypt in any way
other than hint-based decryption, so we may as well treat it
all as one algorithm. Let's go through these in turn, following the WW21 protocol.
Changes to GSW
Remember plain GSW: \(C = A + \frac{q}{2} * m *
G\), where the freshly-generated-per-message \(A\)
satisfies \(s * A =
e\) for low-norm \(e\).
Here, we modify the GSW to: \(C = A * R + \frac{q}{2} *
m * G + E\) , where \(A\) is a fixed "public key" matrix,
and \(R\) is
randomly chosen freshly each time but is low-norm. \(E\) is a
new error introduced to protect \(R\) from being revealed, also
randomly chosen freshly each time but low-norm.

Importantly, \(A\) is tall and narrow, and \(R\) is wide
and short. The small dimensions of the two need to match. This is
important, because this dimension is what determines the size of the
hint. The hint can't just be one value, it needs to have some width to
preserve LWE security, but it can be much smaller than the sizes of the
matrices, and hence the vectors it will be helping to decrypt.
Also importantly, the multiply-by-s trick used to decrypt
gets blown up by the new error matrix, so there is no longer a working
"decryption key". The ONLY thing you can do to ciphertexts,
other than evaluating them even more, is something like the hint-based
decryption mechanism. For this reason, the authors are clear to call it
a homomorphic commitment scheme, and not homomorphic
encryption.
Packing and hints
Suppose you have several ciphertexts of the above form: \(C_1\),
\(C_2\) ...
\(C_k\). They
all share the same \(A\), but have different \(R_i\)
matrices, and different \(m_i\) (messages).
Here is how we combine them all into one vector:
Let \(u_i\) to be the vector that has the
same width as \(R\) and \(G\), and that is all zeroes, except
it's a 1 at the \(log(q)
* i\) position (eg. if \(q=16\), for \(i=1\), it's
\([\ 0\
0\ 0\ 1\ |\ 0\ 0\ 0\ 0\ |\ 0\ 0\ ...\ ]\) (separators added
for clarity). Notice that the \(log(q) * i\) position is exactly
where the i'th row of the \(G\) matrix has its highest
value.
- In plain terms, think of \(u_i\) as the "keep only the \(log(q) *
i\)'th column, discard everything else" operator.
- In terms of its role, think of \(u_i\) as meaning "extract the most
useful thing about the ciphertext (the place where \(m\) is
stored in highest-significant-bit form), and put it into row \(i\) of the
output.
Compute \(c_f =
C_1 * u_1 + C_2 * u_2 + ... + C_k * u_k\)
If \(q\) is not
a power of 2, you have to do something slightly more complicated:
instead of the i'th block of \(log(q)\) being many zeroes followed
by a one, it must be the least-significant-bit-first binary encoding of
\(\frac{q}{2}\). The former is the
\(q = 2^k\)
special case of the latter.
That's it. To compute the raw hint, we "track" the \(R\)
matrices through the additions and multiplications in the FHE along with
the ciphertexts (exercise to the reader: go through the "track the \(R\)"
algebra for ADD and MUL yourself; it follows closely the same \(G\) and
\(G^{-1}\)
based approach used with \(C\) to keep error low-norm), and
then apply the exact same procedure to \(R\): \(r_f =
R_1 * u_1 + R_2 * u_2 + ... + R_k * u_k\)

Now, remember that \(C_i = A * R_i + m * G +
E_i\), so if you subtract \(A * R_i\)
from \(C_i\), you
get \(m * G +
E_i\), which has \(m\) in \(log(q)\) adjacent positions on each
row of the matrix. Here, though, we are dealing with many different
\(R_i\)' values, and we
want to have a single operation that reveals all \(m_i\)'s at
once. To do this, we are employing a trick: in the i'th row, we are
using \(R_i\). So
if you subtract \(c_f - A
* r_f\), the i'th row of the result will correctly cancel
out \(A *
R_i\), and give you the value \(m_i * q/2\)
that you need (plus an error).
There is only one remaining problem: the raw hint \(r_f\) on
its own is not a secure hint. It reveals too much information
about the \(R_i\) matrices, which are meant to
be private. To deal with this problem, we mask it:
- Generate a fresh secret \(s\) (same size as the hint)
- Release \(d_f =
r_f + s\) instead of \(r_f\) on its own
- Also release a "sample", \(b = A * s + e\)
The verifier knows \(c_f\), so they can then compute:
\(c_f + b
- A * d_f\)
\(= (A *
r_f + m * G + e_1) + (A * s + e_2) - (A * r_f + A *
s)\)
\(= m * G
+ e_1 - e_2\)
For security, the error in the sample must be much larger than the
other errors, so as to fully hide information about them. So we have two
layers of scale separation: the sample error is large enough to smudge
the other error, and the message is large enough to avoid being
corrupted by the sample error.
This second version of the scheme is secure, but it has a new big
problem: the sample \(b\) is full-width, not short like
the hint. So we make one more modification: instead of releasing \(b\)
directly, we release a gadget that creates it.
Here is what we do:
The generator makes a fresh encryption of a \(seed\) (as
in, a value that you generate hashes from), and releases an FHE
encryption of the seed: \(A * R_i + seed[i] * G +
E_i\) for each bit in the seed
For each bucket \(\beta\), the generator and the
decryptor can both do the following computation:
Compute \(A * hash(seed, \beta) +
e_\beta\) inside FHE (where \(e\) itself is error generated by
taking another \(hash_2(seed,
\beta)\)), and get ciphertexts: \(C_i = A * R_i + m_i *
\frac{q}{2} + E_i\), where \(m_i\) is the i'th bit of \(A *
hash(seed, \beta) + e_\beta\).
Apply the packing procedure we described above, but with a \(u_i\) for
each of the \(k * log(q)\) columns, not just the
\(k\) columns
at multiples of \(log(q)\) minus one. For clarity,
we'll treat this as a sum with two indices: the within-block index \(0 \le t \lt
log(q)\), and the block index \(0 \le j \lt k\). We'll treat
everything as zero-indexed here to make the math cleaner.
- Remember: the original packing procedure gave you \(c_f =
\sum_i (A * r_i + \frac{q}{2} * m_i)\). Here, you get
\(c_g =
\sum_{j,t}(A * r_{t + log(q) * j} + 2^t * m_{t + log(q) *
j})\).
- We are assuming that within the circuit, the j'th block of \(log(q)\)
ciphertexts is representing the binary encoding of the j'th number in
\(A * hash(seed, \beta) +
e_\beta\).
- Hence, within the j'th block, \(\sum_t 2^t * m_{t +
log(q) * j}\), we are "evaluating" the binary
decomposition of the j'th entry of \(A * hash(seed, \beta) +
e_\beta\), and because of the structure of \(G\), all of
these values get sent to the j'th row of the output.
The output of the packing procedure is \(c_g =
A * r_g + A * hash(seed, \beta) + e_\beta + e_g\). Notice
that \(A * hash(seed, \beta) +
e_\beta\) lives in the place where normally \(m *
\frac{q}{2}\) would live. This is a direct result of the
way that we modified the packing procedure above. The expression for
\(c_g\) can
be rewritten as \(c_g = A * (r_g +
hash(seed,\beta)) + e\)
The generator can compute \(r_g +
hash(seed,\beta)\) in the clear, the evaluator can't. So
the generator uses this as the \(s\) value that they mix in to the
hint.
The decryptor's calculation is the same as before, except that
they don't receive \(c_f\) from the generator; they
generate it themselves using the method above. The math for why it
cancels is also exactly the same.
The papers use \(prf\) ("pseudorandom function") to
refer to the hash. This just means that the hash needs to have strong
randomness properties (much stronger than collision resistance, but
weaker than random oracle); if you trust hashes to satisfy the random
oracle model, then you can safely think \(prf = hash\) everywhere.
And that's it. The XiO is:
- The public parameters of the system (eg. encryption keys, the
encrypted circuit to evaluate on)
- The hint \(r_f +
s\) for each bucket (where \(s = r_g +
hash(seed,\beta)\))
- The FHE encryption of the \(seed\) to enable computing samples
that let you cancel out \(s\)
Evaluation requires re-doing the execution just for one particular
bucket, then using the hint to decrypt the final vector, and finally
taking out the value(s) you need from that bucket (in the obfuscation
use case, the decryptor is taking out all values from all buckets).
There is one further optimization to maximize efficiency while
guaranteeing provable security. The XiO is \(O(\sqrt{n})\) in the size \(n\) of the
thunk it's encoding, but it requires publishing trusted setup parameters
of size \(n\) (these
exist for a technical reason: the function that generates the \(A * s + e\)
samples that compensate for the hint needs to have a never-used path
that accepts and uses fully random values, this allows for the security
proof). This trusted setup is length \(n\). If you split the XiO into \(\approx
n^{\frac{1}{3}}\) separate XiOs, then the size of the XiO
goes up to \(\approx
n^{\frac{2}{3}}\) but the trusted setup (which can be
reused by all of them) goes down to \(\approx
n^{\frac{2}{3}}\) too.

XiO and security assumptions
All parts of the obfuscation pipeline other than XiO can be
justified with fairly "standard" lattice-based assumptions. Ultimately,
all cryptography is just a guess that some mathematical problem is
"actually hard" and it's not just that we've been too dumb to solve it
so far but we might figure it out next year. But some guesses are more
well-founded than others. Among all quantum-safe primitives, the basic
LWE assumption underlying lattices is typically considered less robust
than hashes, but more robust than almost anything else (including more
complex lattice assumptions).
But with all existing XiO primitives there is a problem. The design
above is not strictly provable under the lattice assumption. It
is made provable under the assumption of the form "if you
publish \(A * s +
e\)-style ‘samples', and also publish a little bit of
extra information, then we're still fine". But there's a problem:
such assumptions have been broken before, and ultimately got broken
here.
In 2013, an early candidate for multilinear maps (a very strong
primitive that, if we had it, would make obfuscation far
easier) was published in
CLT13. In 2015, however, it was broken.
The object that CLT13 released was \(p_{zt} \approx z *
\frac{h}{g}\). Ciphertexts are of the form \(c =
\frac{a + gr}{z}\). If you multiply a ciphertext of \(a=0\) with
\(p_{zt}\),
you get \(c *
p_{zt} = \frac{gr}{z} * z * \frac{h}{g} = r * h\). \(r\) and
\(h\) were
both very "small" values (much smaller than the modulus), and so the two
multiplied together will also be much smaller than the modulus. Thus
\(p_{zt}\)
functions as a "zero check" key, which is what allows you to do pairing-check-style
operations on lattice ciphertexts (which unlike pairings, can multiply
to arbitrarily high depths if you scale the parameters).
The problem: \(h *
r\) reveals an exact expression in the underlying secrets
of the scheme over the integers, an so if you have enough such values,
you can perform some matrix operations and recover the secrets.
The HPLS conjecture, the core new assumption in WW21, has what is on
surface also a way to leak what is "just the error":
- The adversary knows \(d_f = r_f + s\) because it is
published
- If the adversary has \(r_f\), then with enough of these
values they can solve linear equations to recover the secrets (this on
its own is ok: after all, the whole point of the \(s\) is to
mask the \(r_f\))
- If the adversary has \(s = r_g +
hash(seed,\beta)\), then notice that that value satisfies
\(c_g = A * (r_g +
hash(seed,\beta)) + e\), and so if they multiply it by
\(A\) and
then subtract that from \(c_g\) (which they can also
calculate), then they get an "error" value \(e\) in the clear
Notice that even if extracting \(e\) is fatal, WW21's security still
has another line of defense: the adversary doesn't learn either one of
the fatal secrets (\(r_f\) and \(s\)), they
learn the sum of the two, and maybe there aren't any useful
combination attacks that start from that sum. But even extracting \(e\)
probably is fine. The attack
in HJL21, which broke the HPLS assumption, managed to do one
specific thing: they figured out how to adversarially construct circuits
in such a way that the lowest-order bit of the error (which
survives "smudging" tricks that normally protect schemes like this)
encodes a variable that outputs whether the execution is "real" or a
"simulated" execution used in the proving process. To summarize:
- The attack requires the adversary to craft the circuit; no
legitimate actor would design circuits in the way that the adversary
would
- The attack only leaks a variable used inside the machinery to prove
the scheme's correctness, it doesn't actually leak any secrets
- The attack requires knowing \(s\), and actually the adversary only
knows \(d = s +
r_f\)
Hence, even if the assumption is broken in limited cases, the
security of WW21 is probably fine.
A new protocol was
published in 2025, which adds some extra complexity to the "help the
decryptor discover samples of the offset to the hint" piece of the WW21
technique to make it depend on lattice assumptions that are more limited
than HPLS, but are still more aggressive than plain LWE. In particular,
the assumption that HJL25 uses falls into a framework of "only publish
objects where the secret-bearing part gets mod-q wrapped" (which
excludes both making available "raw error" and CLT13's \(r * h\)).
This is not as conservative as plain LWE, but it is a meaningful
improvement. The space between "plain LWE" and "the secret-bearing part
must be mod-q wrapped" contains assumptions like circular security, as
well as the assumption in HJL25. There are counterexamples
to assumptions of this type, but they are highly contrived, and real
protocols based on these types of assumptions have proven secure so
far.
One analogy you might have is to other "idealized black box"
assumptions:
The ROM has so far proven fine in practice, but researchers have
designed signature schemes that are provably secure in the ROM but easily
attackable if instantiated with any real hash algorithm. And the
generic-group model is known to be false in a few cases, most notably order-finding
via Schoof's algorithm, and pairings.
Elliptic curve security rests on a shaky foundation that you can violate
the generic-group model in those two ways, but not any more harmful
ways.
The one family of protocols that proves iO entirely from "standard"
assumptions is JLS20
and descendants, but these are not quantum-safe.
Summary: cryptographic assumptions are a complicated zoo, and there
are no absolutes, only different comfort levels.
Another important note is that if we want to be provably reducible to
these security assumptions, then the security proof of the obfuscation
of a \(k\)-bit-input program needs to "walk
over" all possible \(2^k\) inputs. For each input, you
consume a little bit of security margin. Thus, the proof is consuming
security margin potentially way more than \(2^{128}\) times. For this reason,
all of these obfuscation protocols require the LWE parameters to be
subexponentially secure, which means the ciphertexts are many
thousands of times larger than regular LWE ciphertexts (and because of
the stacked nature of the protocol, this "many thousands" itself
compounds on itself several times).
If you wish, you can "yolo and ignore this", the same way that
recursive STARK protocols yolo and ignore the fact that they require the
hash function to simultaneously be a random oracle and be a concrete
thing that can be instantiated by a circuit. Doing that is
plausibly okay, but if you do enough plausibly okay
things for efficiency, you run into the philosophical question: why even
bother to fight so hard to make obfuscation provably secure under
standard assumptions, instead of adopting one of the other approaches
(eg. diamond iO) that are
provably secure under aggressive assumptions but have far better
efficiency?
In summary, the answer to "what cryptographic assumptions do you need
to make a working obfuscation protocol" is:
XiO and trusted setups
Finally, another obvious but important thing to highlight: the
security of obfuscated programs depends on trusted setups in a whole
bunch of places, and so we are very far from being able to do fully
trustless obfuscation (ie. generate programs with internal secrets
in such a way that even the obfuscator does not know the secret). The
best that we can do is replace the single party trusted setup with an
N-party trusted setup.
Here is one natural way to do this.
We define a generator \(G(H, s)\) which returns a two-input
program \(F(\beta, x)\) with the following
behavior:
- If \(\beta =
0\), it interprets its second input as a program \(P\), and
returns \(Obf(Q)\) where \(Q(input) = P(input, hash(s, P,
0))\). That is, it takes as input a program \(P\) that
has a slot for a secret, and outputs an obfuscated version of \(P\) with
its internal secret hardcoded in.
- If \(\beta =
1\), it interprets its second input as randomness, and it
outputs \(Obf(H(H, hash(s, x,
1)))\).
By convention, we will say, \(G(G, x) = G_x\) (note: this is using
a trick similar to the Y-combinator)
and \(hash(s_i, r_{i+1},1) =
s_{i+1}\).
Here's how a chain of N participants would look:
- Alice runs \(G(G,
s_1)\) to generate \(G_{s_1}\) (note that the raw \(G\) is
unobfuscated, but the output of this step and all future steps will be
obfuscated)
- Bob calls \(G_{s_1}(1, r_2)\) which outputs
\(G(G, hash(s_1, r_2, 1)) =
G_{s_2}\)
- Charlie calls \(G_{s_2}(1, r_3)\) which outputs
\(G(G, hash(s_2, r_3, 1)) =
G_{s_3}\)
Each participant would also run their output \(G_{s_k}\)
with \(\beta =
0\) on \(P\) to obfuscate \(P\) with
\(s_k\) (the
mix of all previous secrets) as the secret, and publish this
obfuscation. Only the last participant's obfuscation will actually mix
in everyone's secrets, but in our model, we are assuming that
participants come in one after the other, and none of them knows if they
will be the last.

This way, the setup is universal and updateable, the
gold standard for trusted setups.
And to verify the correctness of each step, each participant would
need to make a recursive STARK proving that (i) the previous step was
recursive-STARKed, and (ii) they added their randomness correctly.
However, this method is extremely expensive: it requires a STARK
of obfuscation of obfuscation. An open challenge is to figure out
the ideal way to do universal and updateable trusted setups for
obfuscation.
So what are the runtimes?
Perhaps the most famous fact about these obfuscation protocols is
that their runtimes are technically polynomial, but in practice
galactic. To see why, we just need to look at the full tower again:

- To decrypt FE, you need to run ABE decrypt of FHE eval of GC
generation. Each one of those replaces each bit in the computation with
a cryptographic object - either a hash or a matrix. FHE also requires a
bootstrapping step that requires running FHE decryption inside
FHE. ABE alone requires an overhead of \((log\ q)^5\), where \(log\ q\)
must be significantly above the security parameter \(\lambda
\approx 120\). And you're stacking these three things on
top of each other. So FE alone requires an overhead factor somewhere in
the trillions of trillions.
- To evaluate sublinear RE, you need to evaluate an XiO of a succinct
FE encryption. of a VM of the underlying program. XiO is mostly FHE;
fortunately, you can reuse intermediate computations between the
different bits that you're encoding, so for functions where most of the
work is in generating a single thing, and extracting individual bits is
relatively trivial, XiO overhead ~= FHE overhead, although unfortunately
it's a very inefficient type of FHE. Also fortunately, the XiO is of the
procedure that generates a succinct FE encryption, which does
not care about what's underneath. But when you're evaluating, you then
also have to, inside FE, generate the XiO of the next
round.
The total theoretical overhead here is thus roughly: \(ABE *
FHE * GC * XiO * sFE_{enc}\), where \(XiO\) is
mostly FHE. All that per bit in the input.
Now, on top of all that, the parameter choices for each
individual scheme have to be very conservative for subexponential
security, adding over 1000x extra overhead at each LWE
layer.
This is why these schemes are far from practical today; expected
runtimes are longer than the lifetime of the universe.
Where do we go from here?
There are three routes:
- Someone (or some bot) very smart discovers a way to optimize and/or
simplify the above tower, using similar assumptions to what we have
today. Perhaps there are ways to collapse ABE and XiO together into one
single gadget - maybe find a construction that both has
ciphertext-independent hints (like ABE) and is truly fully
homomorphic so you can compute multiplications without having the
cleartext wire values (like the split FHE used in XiO).
- We figure out a way to do obfuscation with more aggressive
cryptographic lattice assumptions, and thus create a different and much
simpler tower.
- We figure out a way to do obfuscation in some completely totally
different way, not using lattices at all. Maybe this means inventing a
new category of assumptions.
Routes (2) and especially (3) also imply that we get better at
battle-testing such assumptions, again perhaps with AI help.
The next posts in this series will talk about what is perhaps the
leading construction attempting to do (2), diamond iO, and what is
perhaps the leading construction attempting to do (3), local mixing
obfuscation.
If we succeed in either path, the reward is high: there is a real
sense in which we will have "solved cryptography": any protocol that can
be described using an idealized trusted third party, provided the
adversary is allowed to rewind the clock, will be implementable
securely. But getting there is still a formidable challenge.
Obfuscation: building the final boss of cryptography (Part I)
2026 Jun 29 See all postsSpecial thanks to Sora Suegami, Janmajaya Mall, Aayush Jain and Fun Killer for feedback and review.
The most powerful primitive that has been conceived in cryptography is obfuscation. Obfuscation lets you convert a program \(P\) into an "encrypted program" \(Obf(P)\) such that you can run \(Obf(P)\) on cleartext inputs and get the same cleartext outputs that \(P\) gives you, but the internal workings of \(P\) are hidden. The precise formalism typically used, indistinguishability obfuscation (iO), says that if you are given obfuscations of two different programs that have the same functionality, you can't tell which is which. Effectively, it's hiding the code, not the data.
Obfuscation is powerful because it comes very close to the theoretical ideal of a universal "trustless trusted third party":
Source: The God Protocols (Nick Szabo), 1997
Cryptography protocols are often described by first imagining a protocol that relies on a trusted third party who sees everyone's messages and responds honestly, and then figuring out some way to do the same thing without the trust.
Obfuscation (technically, obfuscation plus hashes) lets you make a simulated trusted third party for basically any protocol, so it can replace both of the above and much much more. There is only one major exception: an obfuscated program can't prevent itself from being copied, so it can't do "stateful" things like money - and that's exactly the gap that blockchains are well-placed to fill.
And so if you have obfuscation and a blockchain, you can do some pretty magical things. Like, say, a secure, private and collusion-resistant voting system that has almost no trust assumption at all - no M-of-N threshold committee required. Or basically anything from this list from 2014, without any M-of-N trust assumption.
A fairly general-purpose way to combine obfuscation and blockchains to make something very close to a "trustless trusted third party"
So what's the catch? Well, it turns out that making a secure form of obfuscation is really really hard.
There has been a decades-long tradition of insecure obfuscation: people shuffling around the logic in compiled programs to make it harder to see what's going on - admittedly, often to prevent users from modifying proprietary programs like games. This is the equivalent of things like the Caesar cipher for encryption - and, like Caesar-style ciphers, it regularly gets broken.
As a result, there has also been a decades-long tradition of trying to create an obfuscation protocol that we can mathematically prove is secure. But almost from the beginning, this ran into a problem. In 2001, we got a famous result that creating an ideal form of obfuscation - obfuscate \(P\) in such a way that running \(Obf(P)\) reveals nothing beyond what you can learn by querying an API that gives \(P(x)\) for any user-supplied \(x\) - is impossible. The core idea is that a code instantiation of \(P\) always reveals at least something beyond its outputs to user-supplied inputs: at the very least, you can learn things by applying \(P\) to its own code.
From that point, researchers shifted to trying to prove the second-best target: indistinguishability obfuscation (iO). This has been a twenty-year project, with many failed attempts, many constructions of protocols that build on top of an ingredient that does not yet exist, many people trying to build that ingredient, failed attempts of that, and so on. But in the last few years, we finally have some good news: we know how to achieve iO under reasonable security assumptions.
But within the good news, there is bad news: the run time is literally galactic. It's technically polynomial, but it involves stacking many layers of "take one thing that's vaguely like fully homomorphic encryption, now put the circuit for evaluating that into another thing that's vaguely like fully homomorphic encryption, run that in plain old regular fully homomorphic encryption once for each bit generating an intermediate multi-megabyte value, and oh yeah, did I mention that you have to put that whole thing into another thing that's vaguely like fully homomorphic encryption, and then run all that once for each bit in the input?" As a result, runtimes for these "sort of provably-secure iO schemes" are somewhere over λ10 (where λ is the "security parameter", ie. the logarithm of how long it takes to break the scheme; it's standard to say λ = 100 or 120)
There are two hopeful stories that you can tell here. One is that this is similar to where SNARKs were in 2010, and now that we know it's possible, smart people (and bots) will start coming up with clever workarounds to each bottleneck, and chopping off orders of magnitude from the runtime one after the other, and eventually we'll get to something that "only" takes a day on a heavy GPU to run (that may still sound prohibitive, but it's actually enough for many interesting applications). Another is that we will see more work on a different strategy toward the same goal: getting much better at developing new cryptographic assumptions, and getting better at telling which new assumptions are likely to be actually safe.
If you add in the more heuristic approaches, there are roughly three (not-yet-dead) families of obfuscation protocols so far, and you can place them on a "tradeoff frontier" of efficiency vs bravery on security assumptions:
This post will describe in detail the most galactic, but also the most rigorous, family so far: the one that's in blue in the diagram above.
Just a warning: there will be a lot of math. Obfuscation is hard because it basically requires stacking almost every primitive that cryptographers have invented in the past twenty years, except for the primitives that you already know about if you're a blockchain developer, such as SNARKs and STARKs. The underlying math will also be different: whereas SNARKs and STARKs tend to have a lot of polynomials, hashes and elliptic curves, obfuscation will have a lot of lattices, vectors and matrices.
Notation notes
The standard pipeline
The "reasonably provably secure" obfuscation protocols that are built today are built on top of a decade-old tower of constructions: the AJ15 / BV15 / LPST15 / LPST16 lineage.
AJ15 and BV15 are two roughly simultaneous papers that both discovered roughly the same way to build obfuscation on top of a primitive called functional encryption: an "authority" publishes keys tied to a function \(F\), and once those keys exist, anyone with the encryption key can encrypt \(x%\) in such a way that anyone with the decryption key can recover \(F(x)\). LPST15 came up with something similar. But the functional encryption required needed to have very strong properties which were not yet available. Fortunately, a year later, LPST16 discovered a way to build obfuscation on top of something similar, sublinear compact randomized encoding, and then built it on top of an already-known "succinct-but-not-compact" functional encryption scheme plus a new primitive called XiO - obfuscation that is only slightly smaller in size than publishing a table of the outputs of the function on all possible inputs. The bulk of the work since then has been figuring out ways to actually implement XiO (though some protocols take other routes, eg. JLS20).
We will split this description in three parts:
Assuming you have a succinct FE scheme and an XiO scheme, how do you build an obfuscation protocol?
There are several ways to build an obfuscation protocol on top of succinct FE and XiO, and they are all roughly equal in their high-level principles and their properties. So I will stick to a simplified version of the LPST15 design.
Note that in this post, we will often use "function" and "circuit" interchangeably. A circuit is a set of AND, OR, NOT, etc gates with wires linking them that can be used to evaluate a function. To get a better intuition of circuits, I recommend this post explaining garbled circuits (a primitive that we will need later anyway).
First, let us define succinct FE and XiO, so we understand the properties of these gadgets.
Here is succinct FE:
In other words: someone does a trusted setup, I encrypt \(x\), you can decrypt \(C(x)\) (different from fully homomorphic encryption: in FHE, anyone who can decrypt \(C(x)\) can also decrypt \(x\) or any other function of \(x\))
Functional encryption is not on its own sufficient for obfuscation, because (i) it doesn't hide the function, and (ii) each published encryption is bound to one input and can only decrypt the function evaluated on that one input. But it gets you a lot of the way there.
Note that the more common definition of succinct FE requires the FE to tolerate high circuit size, not high circuit depth. In our usage, however, we do need to tolerate high depth. Fortunately, depth-independent succinct FE is a fairly simple wrapper on top of depth-bounded succinct FE (we will get into this later), so in this post we will just assume that the succinct FE that we use is also depth-independent.
Now, here is XiO:
The class of functions that XiO is designed for is functions that are really best understood as zero-input functions (aka "thunks") that produce a large but manageably-sized output. Given a thunk \(() \rightarrow X\), we can think of it as being a function \(i \rightarrow X[i]\), ie. the function which takes as input an index \(i\) as input, and returns the i'th output of the thunk. These are two different views of the same object; in this post we will regularly bounce back and forth between these views.
For example, you could imagine taking a function that generates a STARK (a roughly 128-512 kB sized cryptographic proof) and turning it into a function that takes as input an index \(0 \le i \lt 2^{22}\), generates the STARK internally, and then outputs the i'th bit of the STARK. The goal of the XiO is that it's a gadget smaller than the STARK (though note: even a STARK may be too small to actually allow known XiO protocols to shrink it) that lets you generate that STARK, without being able to learn anything else about the underlying process that's generating it.
Now, we combine succinct FE and XiO into a primitive called a sublinear compact randomized encoding:
Let's walk through this carefully. The goal here is to create a "randomized encoding" of \(P()\) - that is, a gadget that enables executing \(P()\) - which is asymptotically smaller than the runtime of \(P\) and asymptotically smaller than its output, and which hides \(P\).
This is slightly different than the usual presentation of randomized encodings, which operate over \(P(x)\) and hide \(x\) but not \(P\) (eg. garbled circuits work this way), but this is what we need for our use case - obfuscation is all about hiding the function.
The primitives that we will be working with do not hide the function. To do this, we make the "outer" function that these primitives are operating over just be a virtual machine (or "Turing machine") - basically, a function \(VM\) which takes circuits as input, so \(VM(P, X) = P(x)\). The circuit evaluating \(VM\) is often called a universal circuit. This way, \(P\) becomes just another input, which can be made private.
One nuance in terms of asymptotic complexity: \(VM\) needs to be instantiated as a circuit, and the size of a circuit must be proportional to its runtime (because circuits do not allow looping). Hence, the size of the \(VM\) circuit must be proportional to the runtime of \(P\). Fortunately, succinct FE allows generation to be fast even if the circuit is large and deep. However, the output may have size and encryption cost that scales with the size of \(P\). This is the reason why \(P\) must be expressed in a language other than circuits (such as a Turing machine): its size must be fixed even if the computations get large.
We do a trusted setup of succinct functional encryption, where the creator publishes public params and decryption keys for \(VM\). This allows anyone to encrypt \((P, x)\) in such a way that anyone else can decrypt \(VM(P, x) = P(x)\), but without learning \(P\) or \(x\).
To generate this randomized encoding, the creator wraps the two primitives I mentioned above inside each other: they do an XiO of a succinct FE encryption of \(VM(P, x)\). The succinct FE encryption hides \((P, x)\), and it's fast to generate even if \(P\) takes a long time to run. But succinct FE is not output-compressing: if the output is big, the succinct FE encryption is even bigger (and this is an unavoidable limitation of that type of construction). The XiO cuts the size back down. The creator makes a circuit \(C_{sfe}(i)\) that outputs the i'th bit of the succinct FE encryption, and does XiO over that. Its size is now asymptotically smaller, and it's asymptotically faster to generate, than the program \(P\) itself (note "asymptotically": at small sizes, the overhead of the scheme dominates, the key property we want is that for sufficiently large circuits, \(xIO(sFE(VM(P, x))))\) becomes smaller than \((P, x)\) itself, both in byte size and in generation time.
Notice that here we already have our first glimpse of obfuscation: the creator themselves can both do the trusted setup and publish the xIO, and others can execute it without being able to learn the program being run. But we have a big problem: this only works for one pre-configured input.
Now, we get to the next part: the final obfuscation construction.
Essentially, we take the sublinear compact RE primitive, and we apply it recursively, recursing on the number of bits of input going into the function you are trying to obfuscate. At the base case (zero inputs), we just use the sublinear compact RE as above. To add one bit, we make a sublinear compact RE of the process of generating two obfuscations one level lower: obfuscate the function \(P_0(x)\) which executes \(P\) but with the first input bit fixed to \(0\), and \(P_1(x)\) which executes \(P\) but with the first input bit fixed to \(1\).
Remember the key properties of the two building blocks of sublinear compact RE:
Both properties are needed to ensure the recursion avoids blowup.
To evaluate an obfuscated program, you evaluate the top-level RE to get your two new obfuscations (both with one fewer input bit required), then choose either left or right based on the first input bit, then keep recursing further down, until you bottom out with a thunk that generates \(P(x)\) for the \(x\) that is the full path you walked down.
Here's another equivalent way to look at what's going on:
For programs with an n-bit input, there is a depth-n, size \(2^n\) tree of all possible evaluations. However, most of this tree is never evaluated. The whole thing is a tree of "thunks", and so the exponential cost of creating the tree is never paid because all the thunks that are not on the path to the specific input you care about never actually get instantiated. The only thing that does get instantiated is the obfuscation at each step along with path going from the root to the input. And at a very high level, that's it, that's the core design.
Note that the original LPST15 and LPST16 papers describe what is going on slightly differently (but equivalently): they do not recurse on obfuscation directly, rather they recurse on a "node program". One nice property of that approach is that the node program captures the "bits of the input so far", so you don't have to manipulate or wrap \(P\) itself; \(P\) stays unchanged throughout the whole process in their version.
Another thing that the above description elided is that for obfuscation to be secure, it needs to be randomized. So actually, it's not \(iO(P)\), it's \(iO(P, seed)\), where the \(seed\) is a value that is hashed to generate (pseudo-)randomness needed by primitives like garbled circuits. Randomized encoding, XiO and succinct functional encryption also all take a \(seed\) as an argument. When one of these functions calls any other as a subroutine, it should generate the seed for the subroutine via a hash from its own seed, and take care to provide a unique different seed for each one. To make the security proofs clean, the hash must technically be a puncturable PRF, though with the interesting property that the punctured version of the hash needs to be possible to construct but never actually gets run.
How do you build a succinct FE scheme?
The bad news: the succinct FE scheme is a complicated tower of constructions, one stacked on top of the other.
The good news: these constructions are relatively well-understood, use very standard cryptographic assumptions ("just" lattices), and the pipeline to build succinct FE has been understood since roughly 2014.
Here's the diagram first:
Let's go through these one by one. The two most fundamental building blocks are garbled circuits and fully homomorphic encryption.
Garbled circuits
Here is a post where I explain garbled circuits in more detail. The basic summary is:
Garbled circuits can only safely be executed once. If you give someone the labels for two inputs, they are able to compute much more than two outputs. Garbled circuits were originally developed as a 2-of-2 multi-party-computation primitive: I have a circuit \(C\), you know \(x\), I send you a garbling of \(C\), for each input wire I help you learn one of the two input wires without learning which one using a technique called oblivious transfer, and then you walk the circuit to compute the output \(C(x)\).
A basic rough description of how oblivious transfer works: you send me two public keys \(A\) and \(B\) that have to add up to some random hash so only one of them can have a valid private key but I don't know which one, I encrypt one label with \(A\) and one with \(B\), you decrypt the label that you have a private key for, but I can't tell which one that is.
Here, however, we are not doing 2-of-2 computation. Rather, we are using garbled circuits as an ingredient to get the properties that we need for succinct FE. Garbled circuits have many valuable properties. Notably:
Both of these properties will be important.
Garbling hides some information about the circuit, but not enough to be meaningfully useful for anything where you want function-hiding. When function-hiding is required (eg. the 2-of-2 MPC use case), a typical solution is to garble a universal circuit (aka virtual machine) and have the circuit provider also provide the actual circuit \(C\) they want to run as labels that become part of the input. For our use case, we don't care about garbling leaking details of the function, in fact precisely because we are employing this "wrap it in a VM" trick.
Fully homomorphic encryption (FHE)
Here is a post where I talk about FHE in more detail. Because the details matter for the other constructions later in this post, I will provide a basic summary.
FHE is based on a cryptographic assumption called learning with errors (LWE). Basically, if you have an approximate solution to a system of linear equations modulo some number (ie. a matrix \(A\), vectors \(s\) and \(b\), modulus \(q\) such that \(b = (A*s + e) % q\) modulo \(q\), where \(e\) is a "small" (aka "low-norm") error vector, so each value in \(e\) is much smaller than \(q\)), then given \(b\) and \(A\) you can't extract \(s\). If you have an exact equation \(s * A = b\), then that's a system of linear equations (modulo q), which can be done reasonably cheaply with Gaussian elimination. But with errors added, doing this inversion is computationally infeasible.
There are many FHE algorithms built on top of this assumption that have different properties. The simplest idea involves randomly choosing a private key \(s\), generating a random matrix \(A\), and computing \(b = A * s + e + m * \frac{q}{2}\) modulo \(q\) as the encryption of a single bit \(m\) (here, adding a bit to a vector adds that value to every element of the vector).
The creator can then publish encryptions of \(0\) and \(1\) computed this way. To decrypt, you can compute \(b - A * s\) which equals \(e + m * \frac{q}{2}\); the higher-order bit encodes the message and the lower-order bits encode the error which can be thrown away. If you have two ciphertexts \(b_1\) and \(b_2\) that are valid encryptions of \(x_1\) and \(x_2\), then \(b_1 + b_2\) (again, all modulo \(q\)) is a valid encryption of \(x_1 + x_2\).
Notice also that the above design as written can support encrypting vectors of bits in the message slot, and not just single bits. But there's one critical thing that the above design does not actually support: multiplication.
Multiplication is trickier than addition, for a few reasons. First, you can't multiply two vectors by each other and get a vector; you can only do that to other structures like matrices or polynomials. Second, whereas adding two three-term values \(A * s + e + m * \frac{q}{2}\) gives you another three-term value, multiplying two three-term values gives you a nine-term value, so you're still getting something of a different "shape". Third, you end up multiplying the "small" error by "large" other things, which makes the error blow up after only one round of multiplication.
There is no one easy solution to this. There are several families of solutions, that all have different drawbacks. The ones that use polynomials are based off of a stronger assumption called Ring LWE; a commonly-used scheme is BFV. The scheme that is most convenient to FHE usage is based on matrices, and is called GSW. To show how it works, I will first show a simplified version that makes the basic arithmetic work, but does not deal with the error issue, so it breaks if you have nonzero error.
First, generate a secret \(s\). To encrypt a value \(m\), first generate an otherwise-random matrix \(A\) where \(s * A\) equals some "small" or "low-norm" error \(e\). One way to do this is to generate both in two parts:
Then, compute \(c = A + m * I\) (where \(I\) is the identity matrix). And to decrypt the ciphertext, read off \((s * C)[-1]\) (ie. the last value of \(s * C\)).
If you are a mathematician, you might notice that we are "hiding" \(m\) in an
igon valueeigenvalue of C: \(s\) is the eigenvector, so it satisfies \(s * C \approx s * m\). They are not quite an exact eigenvector and eigenvalue; they're eigenvectors and eigenvalues with errors. But first, we can set aside the errors, and note some properties that make eigenvalues special.Adding ciphertexts is again trivial: if
\(C_1\) satisfies \((s * C_1)[-1] \approx m_1\)
\(C_2\) satisfies \((s * C_2)[-1] \approx m_2\)
then
\(C_1 + C_2\) satisfies \((s * (C_1 + C_2))[-1] \approx (m_1 + m_2)\)
But unlike before, we can multiply ciphertexts too: if
\(s * C_1 \approx s * m_1\)
\(s * C_2 \approx s * m_2\)
then
\(s * (C_1 * C_2)\)
\(= (s * C_1) * C_2\)
\(\approx m_1 * s * C_2\)
\(\approx m_1 * m_2 * s\)
Eigenvalues are basically the only attribute of a matrix that has both an additive and a multiplicative property like this, and even there we are restricting to matrices that have the same eigenvector.
Now, let's bring back error tolerance. The mechanism above, as written, has a fatal flaw: if you write out the full equations with error, you end up multiplying error by elements of \(C\) and \(s\), which can be anywhere in the full range \([0, q-1]\). Hence, this immediately blows up error to the maximum. Additionally, even decryption cannot tolerate error.
To plug the first hole (we'll get back to the second at the end), actual GSW relies on a gadget matrix mechanism.
Here is what a gadget matrix looks like:
On its own this looks nice but it does not make much sense. However, it is intended to be paired with an operation (not a matrix) that it cancels out: the matrix bit decomposition operation:
Every adjacent four values in the output of this operation (which we call "\(G^{-1}\)") is the (least-significant-bit first) binary encoding (aka. bit decomposition) of the correspondingly-placed value in the input.
The most important facts about \(G\) and \(G^{-1}\) are:
You can see the latter intuitively. If you trace the highlighted bit-decomposition of 2 (
0 1 0 0) in the above example through the computation, then it gets multiplied by1 2 4 8from the gadget, and the result is \((0 * 1) + (1 * 2) + (0 * 4) + (0 * 8) = 2\). That is, if you apply the gadget matrix to a bit decomposition, it "evaluates" the bit decomposition to get back the original value.In those places where we used the identity matrix (\(I\)) in the simplified description above, in the actual protocol we use \(G\). That is, ciphertexts are of the form \(A + m * G\). We multiply two ciphertexts by computing \(C_1 * G^{-1}(C_2)\). The message is being hidden in something that's no longer quite an eigenvalue-with-errors, but we still preserve the needed properties. The math ends up working the same way, but because the error never gets multiplied by large numbers, multiplication actually works. Specifically:
Note one subtlety in the decryption. The last column of \(G\) contains \(\frac{q}{2}\) as its only nonzero entry, and so multiplying by \(G\) actually gives us an object containing \(m * \frac{q}{2} + e\) (and not \(m + e\)). This is what gives us error tolerance in decryption. This is also why we are now dividing by \(\frac{q}{2}\) when we encrypt. Note that this method as written requires \(q\) to be a power of two. If you want, you can make \(q\) be something else, but this requires tweaking the decryption procedure slightly.
We can check the correctness of multiplication. Let's start with just the algebra. First for addition:
\(C_1 = A_1 + m_1 * G\)
\(C_2 = A_2 + m_2 * G\)
And then for multiplication:
\(C_1 * G^{-1}(C_2)\)
\(= (A_1 + m_1 * G) * G^{-1}(C_2)\)
\(= A_1 * G^{-1}(C_2) + m_1 * G * G^{-1}(C_2)\)
\(= A_1 * G^{-1}(C_2) + m_1 * C_2\)
\(= A_1 * G^{-1}(C_2) + m_1 * A_2 + m_1 * m_2 * G\)
Now, we have to show that the first two terms are both valid "pads", in the same way that the original \(A_1\) and \(A_2\) were valid pads. The core property we need is that multiplying \(s * A\) leaves only a small low-norm error.
The second term is easy: \(A_2\) is a valid pad (\(s * A_2\) is low-norm), and we multiply it by \(m_1\), which is small, so \(m_1 * A_2\) is also low norm and hence a valid pad.
For the first term, let us unpack \(s * A_1 * G^{-1}(C_2)\). \(s * A_1\) returns a "small" error. \(G^{-1}(C_2)\) returns a matrix of ones and zeroes. Hence, \(s * A_1 * G^{-1}(C_2)\) is a small error multiplied by a matrix of ones and zeroes, which is still a small error.
The error in the second pad blows up by roughly the size of the message space (so, 0 or 1). The error in the first pad blows up by roughly the size of the matrix. Hence, each multiplication blows up the error by a constant factor, and so we can predict how many multiplications the FHE can survive before the error gets too high. At that point, you would need to reset the error by bootstrapping: evaluate the FHE decryption circuit inside FHE.
Finally, one more nuance: to keep the error blowup bounded, we need to require messages to stay small; specifically, they must be 0 or 1 (we can also accept eg. -1 or 2). Addition and multiplication do not respect size limitations, and unlike BGV, here we do not have native "wraparound" modulo 2. So to make the above algorithm safe, you would need to use circuits with "logic gates" implemented like this:
This is only one way to do it; it's not optimal; there are ways to do it more efficiently, and that is part of the art of FHE optimization.
Here is some python code that follows along this style of GSW, though note that it uses a slightly different technique where \(A * R\) is the "pad" instead of \(A\) (this has all of the properties above, plus an extra property that we will need later). https://gist.github.com/vbuterin/f0f8a9eb09633226ada20c21a98d537e
It's worth playing around with this kind of FHE, because it helps build intuitions for some of the other primitives that we are going to use below.
Now, let's get to the next one:
Attribute-based encryption (ABE)
Not this Abe.
And not this Abe.
And also not this Abe.
Here is the definition of attribute-based encryption:
Usually, attribute-based encryption is justified by examples like: \(m\) might be medical records at some hospital \(H\), \(tag\) might be an object that represents "you work at \(H\) AND you have a medical degree", so it becomes easy to encrypt the medical records in such a way that only the intended doctors can see them. As far as I can tell, however, the number of use cases that map well to this paradigm and can't easily be solved with much simpler public key encryption is very small. And so ABE has not been used much in practice.
.But here, we have very good news: ABE has some valuable properties that make it very useful in building functional encryption! (And hence, obfuscation) In particular, even for circuits that are large, encryption is cheap. This is the ultimate seed of the asymmetry that allows the "thunks" in the obfuscation to be faster to create than they are to execute, and hence makes the exponentially-sized tree possible at all.
Here is how ABE works. Here we will follow the BGG+14 construction, whose properties are ideal for the succinct FE and obfuscation use case.
We work over a circuit. To have an example in your head, take the circuit that we used in the garbled circuits example (it's a two-bit adder), but remove the "two labels per wire" piece:
Instead, we will have only one matrix \(B_w\) per wire, plus two randomly-generated global public matrices \(A\) and \(D\).
The initial \(B_i\) are generated randomly. To generate the \(B\) matrices for the wires further down, we walk down the circuit going through each gate:
As in FHE, we need to build OR, XOR and AND from ADD and MUL in ways that preserve the wire values staying in \(\{0, 1\}\).
The authority publishes circuit-independent \(A\) and \(D\), and the \(B_i\) for the input wires for the circuit. They also provide the decryptor with a decryption key, a low-norm matrix \(R_f\) that satisfies \((A | B_f) * R_f = D\), where \(B_f\) is the \(B\) matrix for the wire that encodes the output (ie. \(C(tag)\)). Notice that the construction of \(B\) matrices is not dependent on the input to the circuit, which is why the authority is able to construct \(B_f\) in a way that is then valid for all inputs. To make the equation between \((A | B_f) * R_f = D\) correct, \(A\) needs to be constructed in a special way that gives it a "trapdoor"; we will return to this later.
First, let's go through the encryptor and decryptor's logic.
To encrypt, the encryptor chooses a random vector \(s\), and computes \(c_{out} ​= s * D + e_{out} ​+ \frac{q}{2} * m\). They also provide the input encodings, \(c_i = s * (B_i + G * tag[i]) + e_i\), and \(c_A = s * A + e_A\). Notice how encrypting does not depend on the circuit itself; it only requires knowing and doing a few multiplications involving the input \(B_i\) and \(D\).
The decryptor's job will be to start with these \(c_i\) values, and walk down the circuit, doing a homomorphic-encryption-like operation at each gate to convert the \(c_L\) for the left input wire and \(c_R\) for the right input wire into \(c_{out}\) for the output wire.
Here, the ADD case is again trivial: \(c_{out} = c_L + c_R\). The MUL case is the harder one.
The formula is: \(c_{out} = x_R * ​c_L ​+ c_R * ​G^{-1}(-B_L)\)
\(x_R\) here is the actual value on that wire in the computation. To see why this works, let's look at each component in turn. For ease of exposition, I'll write \(G^{-1}(-B_L)\) as \(R_{-L}\); just remember that because \(G\) and \(G^{-1}\) cancel out, \(G * R_{-L} = -B_L\):
\(x_R * c_L\)
\(= x_R * (s * (x_L * G + B_L) + e_L)\)
\(= x_R * s * (x_L * G + B_L) + x_R * e_L\)
\(= s * (x_L * x_R * G + x_R * B_L) + x_R * e_L\)
\(c_R * R_{-L}\)
\(= (s * (x_R * G + B_R) + e_R) * R_{-L}\)
\(= s * (x_R * G * R_{-L} + B_R * R_{-L}) + e_R * R_{-L}\)
\(= s * (-x_R * B_L + B_R * R_{-L}) + e_R * R_{-L}\)
If we add the two together, the \(x_R * B_L\) cancel out, and we get:
\(s * (x_L * x_R * G + B_R * R_{-L}) + x_R * e_L + e_R * R_{-L}\)
The \(B_R * R_{-L}\) on the left side is just \(B_R * G^{-1}(-B_L)\), which is the definition of \(B_{out}\) for multiplication that we gave above. And the right side is error multiplied by low-norm values, so it stays as low-norm error.
And so the decryptor has everything they need to walk their way to \(c_f\).
If \(C(tag) = 0\), then the \(s * x_f * G\) term drops off from the definition of \(c_f\), and we just have: \(c_f = s * B_f + e_f\)
Once they get to \(c_f\), they prepend \(c_A\) to get \(c'_f = (c_A\ |\ c_f)\), which satisfies \(c'_f = s * (A\ |\ B_f) + e'_f\). They then compute:
\(c_{out} - c'_f * R_f\)
\(= s * D + e_{out} + \frac{q}{2} * m - s * (A\ |\ B_f) * R_f - e'_f * R_f\)
\(= s * D + e_{out} + \frac{q}{2} * m - s * D - e'_f * R_f\)
\(= e_{out} + \frac{q}{2} * m - e'_f * R_f\)
If the error did not blow up too much (remember, \(R_f\) is also low-norm), the decryptor can now freely recover \(m\).
Now, let's get back to the trapdoor mechanism.
We do not construct \(A\) randomly. Instead, we construct it as \(A = (A_{prefix}\ |\ G - A_{prefix} * R)\), where \(R\) is a low-norm "trapdoor", and \(G\) is the gadget matrix from before. Similarly to the FHE ciphertexts we saw, under the LWE assumption this kind of construction is computationally infeasible to distinguish from random if you do not have the trapdoor yourself.
The goal is to satisfy a neat mathematical property:
\(A * {R \atop I}\)
\(= (A_{prefix}\ |\ G - A_{prefix} * R) * {R \atop I}\)
\(= A_{prefix} * R + (G - A_{prefix} * R) * I\)
\(= A_{prefix} * R + G - A_{prefix} * R\)
\(= G\)
Or graphically:
We've constructed a matrix \(A\), where in some sense \(R \atop I\) is a sort of "inverse" of A (remember: \(G\) often plays the role of "pretending" to be the identity matrix in these types of LWE constructions).
The goal here is that we want to create matrices for which, for any known vector \(u\), only the creator can find a low-norm vector \(r\) where \(A*r = u\). That is, the trapdoor lets us "solve" the short integer solution (SIS) problem, which is closely related to LWE and is the basis for this type of cryptography working.
Here is the algorithm:
\(R\), \(I\) and \(z\) are all low-norm, so this is also low-norm. To see algebraically why this solves the equation, compute:
\(A * r \\ = A * {R \atop I} * z \\ = G * z \\ = u\)
To guarantee that this is safe (in the sense of not leaking \(R\)), real-world implementations add additional error at this step; see MP12 for more details on maximally efficient ways to do this.
Now, let's go back to the still-unsolved puzzle from above: computing \(R_f\).
The equation we are solving for is:
\((A\ |\ B_f) * R_f = D\)
We will interpret \(R_f\) as \(R_{top} \atop R_{bottom}\), so we get:
\(A * R_{top} + B_f * R_{bottom} = D\)
We sample a random low-norm error for \(R_{bottom}\), so now we only need to solve:
\(A * R_{top} = D - B_f * R_{bottom}\)
Here, we just use the method above row-by-row to compute each row of \(R_{top}\). And we're done.
An astute reader may notice a vague resemblance to ring signatures, at least for the 1-of-2 case: you compute one of the two terms "the easy way", and you compute the other value to match (which you're "not supposed to be able to do") using a trapdoor.
For clarity, note that the \(R\) in the trapdoor is never given out, not even to the decryptor. \(R_f = {R_{top} \atop R_{bottom}}\) is what's given to the decryptor.
Now, let's recap what we have:
The main limitation of this scheme is on circuit depth: each multiplication increases error by a constant amount, and so the deeper the circuits, the bigger the values and the matrices need to be to compensate. You can do arbitrarily deep circuits in theory, but if you try, the overhead of making the ciphertexts big enough to handle the error blows up. This is what leads to the depth limitation in basic succinct FE, which the wrapper we will discuss later removes.
And that's ABE.
By the way, it's also not this Abe.
Functional encryption for fixed-depth circuits
Surprisingly, at least for this bit, we're now done the hard math. The next section is going to be just stitching together lego bricks.
First, the definition of functional encryption:
A particular property that we'll be after is succinctness: like we saw with ABE, encrypting must be cheap, even if the circuit is large.
Here's the process (following the GKP+13 construction). First the authority's work. The authority chooses the circuit \(C\) that they are generating parameters for.
An alternative to issuing \(2*w\) ABE encryptions is to modify the ABE so that it simultaneously encrypts two messages, one decryptable only if \(C(tag) = 0\) and the other only if \(C(tag) = 1\). This adds some more complexity, but saves 2x overhead.
Now, the encryptor's work. The encryptor has a value \(x\) that they want to encrypt.
The decryptor does the ABE decryption procedure on the encrypted \(L_{in}\) labels using \(E_x\) as the tag, and thereby learn the input labels. They run the garbling and get the output.
Now, let's recap what we did here:
Functional encryption for unlimited-depth circuits
This one is a one-liner. Instead of ABE-encrypting to \(C\), you ABE-encrypt to the function that takes as input \(x\) and generates a garbled circuit for \(C(x)\) (using eg. \(hash(x)\) as the randomness for the garbling). Garbled circuit generation is parallelizable, so this is low-depth even if the circuit is high-depth. The decryptor runs low-depth FE to obtain the garbled circuit including the appropriate input labels, and then walks the garbled circuit in the clear to obtain the outputs.
And, we're done succinct functional encryption! To recap the whole thing, here's our original diagram again:
How do you build an XiO scheme?
Let us first remember the XiO definition:
"XiO" technically stands for "exponentially-(in)efficient indistinguishability obfuscation", but it's best not to think of it as being obfuscation at all. Instead, it's a way to compress a piece of data, in the special case where the data is generated by a program that is much smaller than the data, and where you want to hide the program. It's designed for "thunks" (zero-input functions) that output \(n\) bits, not what you might intuitively think of as "actual functions".
Most known constructions of XiO to date build upon a primitive called split-FHE. Here's a definition of split-FHE:
Here is how XiO built on split FHE works:
If we are encoding a thunk with output size \(n\), we treat it as a function \(Q\) that takes an index \(i \in \{1...n\}\), and outputs the i'th bit produced by the thunk.
We split up the input space into \(\sqrt{n}\) buckets each of size \(\sqrt{n}\). For each bucket, we FHE-encrypt (using GSW) each index inside that bucket (eg. \(1...\sqrt{n}\) for the first, \(\sqrt{n}+1 ... 2*\sqrt{n}\) for the second), and run \(Q\) on each index. Then, we use a special technique to convert the \(\sqrt{n}\) ciphertexts into a packed ciphertext that holds all values. And finally, we release a constant-size hint for it. The XiO is the set of all \(\sqrt{n}\) constant-size hints.
There are three technical pieces here to understand:
In reality, the three are connected. We don't ever use the packing in any context other than hint generation and hint-based decryption, and the changes to GSW make it unable to decrypt in any way other than hint-based decryption, so we may as well treat it all as one algorithm. Let's go through these in turn, following the WW21 protocol.
Changes to GSW
Remember plain GSW: \(C = A + \frac{q}{2} * m * G\), where the freshly-generated-per-message \(A\) satisfies \(s * A = e\) for low-norm \(e\).
Here, we modify the GSW to: \(C = A * R + \frac{q}{2} * m * G + E\) , where \(A\) is a fixed "public key" matrix, and \(R\) is randomly chosen freshly each time but is low-norm. \(E\) is a new error introduced to protect \(R\) from being revealed, also randomly chosen freshly each time but low-norm.
Importantly, \(A\) is tall and narrow, and \(R\) is wide and short. The small dimensions of the two need to match. This is important, because this dimension is what determines the size of the hint. The hint can't just be one value, it needs to have some width to preserve LWE security, but it can be much smaller than the sizes of the matrices, and hence the vectors it will be helping to decrypt.
Also importantly, the multiply-by-s trick used to decrypt gets blown up by the new error matrix, so there is no longer a working "decryption key". The ONLY thing you can do to ciphertexts, other than evaluating them even more, is something like the hint-based decryption mechanism. For this reason, the authors are clear to call it a homomorphic commitment scheme, and not homomorphic encryption.
Packing and hints
Suppose you have several ciphertexts of the above form: \(C_1\), \(C_2\) ... \(C_k\). They all share the same \(A\), but have different \(R_i\) matrices, and different \(m_i\) (messages).
Here is how we combine them all into one vector:
Let \(u_i\) to be the vector that has the same width as \(R\) and \(G\), and that is all zeroes, except it's a 1 at the \(log(q) * i\) position (eg. if \(q=16\), for \(i=1\), it's \([\ 0\ 0\ 0\ 1\ |\ 0\ 0\ 0\ 0\ |\ 0\ 0\ ...\ ]\) (separators added for clarity). Notice that the \(log(q) * i\) position is exactly where the i'th row of the \(G\) matrix has its highest value.
Compute \(c_f = C_1 * u_1 + C_2 * u_2 + ... + C_k * u_k\)
If \(q\) is not a power of 2, you have to do something slightly more complicated: instead of the i'th block of \(log(q)\) being many zeroes followed by a one, it must be the least-significant-bit-first binary encoding of \(\frac{q}{2}\). The former is the \(q = 2^k\) special case of the latter.
That's it. To compute the raw hint, we "track" the \(R\) matrices through the additions and multiplications in the FHE along with the ciphertexts (exercise to the reader: go through the "track the \(R\)" algebra for ADD and MUL yourself; it follows closely the same \(G\) and \(G^{-1}\) based approach used with \(C\) to keep error low-norm), and then apply the exact same procedure to \(R\): \(r_f = R_1 * u_1 + R_2 * u_2 + ... + R_k * u_k\)
Now, remember that \(C_i = A * R_i + m * G + E_i\), so if you subtract \(A * R_i\) from \(C_i\), you get \(m * G + E_i\), which has \(m\) in \(log(q)\) adjacent positions on each row of the matrix. Here, though, we are dealing with many different \(R_i\)' values, and we want to have a single operation that reveals all \(m_i\)'s at once. To do this, we are employing a trick: in the i'th row, we are using \(R_i\). So if you subtract \(c_f - A * r_f\), the i'th row of the result will correctly cancel out \(A * R_i\), and give you the value \(m_i * q/2\) that you need (plus an error).
There is only one remaining problem: the raw hint \(r_f\) on its own is not a secure hint. It reveals too much information about the \(R_i\) matrices, which are meant to be private. To deal with this problem, we mask it:
The verifier knows \(c_f\), so they can then compute:
\(c_f + b - A * d_f\)
\(= (A * r_f + m * G + e_1) + (A * s + e_2) - (A * r_f + A * s)\)
\(= m * G + e_1 - e_2\)
For security, the error in the sample must be much larger than the other errors, so as to fully hide information about them. So we have two layers of scale separation: the sample error is large enough to smudge the other error, and the message is large enough to avoid being corrupted by the sample error.
This second version of the scheme is secure, but it has a new big problem: the sample \(b\) is full-width, not short like the hint. So we make one more modification: instead of releasing \(b\) directly, we release a gadget that creates it.
Here is what we do:
The generator makes a fresh encryption of a \(seed\) (as in, a value that you generate hashes from), and releases an FHE encryption of the seed: \(A * R_i + seed[i] * G + E_i\) for each bit in the seed
For each bucket \(\beta\), the generator and the decryptor can both do the following computation:
Compute \(A * hash(seed, \beta) + e_\beta\) inside FHE (where \(e\) itself is error generated by taking another \(hash_2(seed, \beta)\)), and get ciphertexts: \(C_i = A * R_i + m_i * \frac{q}{2} + E_i\), where \(m_i\) is the i'th bit of \(A * hash(seed, \beta) + e_\beta\).
Apply the packing procedure we described above, but with a \(u_i\) for each of the \(k * log(q)\) columns, not just the \(k\) columns at multiples of \(log(q)\) minus one. For clarity, we'll treat this as a sum with two indices: the within-block index \(0 \le t \lt log(q)\), and the block index \(0 \le j \lt k\). We'll treat everything as zero-indexed here to make the math cleaner.
The output of the packing procedure is \(c_g = A * r_g + A * hash(seed, \beta) + e_\beta + e_g\). Notice that \(A * hash(seed, \beta) + e_\beta\) lives in the place where normally \(m * \frac{q}{2}\) would live. This is a direct result of the way that we modified the packing procedure above. The expression for \(c_g\) can be rewritten as \(c_g = A * (r_g + hash(seed,\beta)) + e\)
The generator can compute \(r_g + hash(seed,\beta)\) in the clear, the evaluator can't. So the generator uses this as the \(s\) value that they mix in to the hint.
The decryptor's calculation is the same as before, except that they don't receive \(c_f\) from the generator; they generate it themselves using the method above. The math for why it cancels is also exactly the same.
The papers use \(prf\) ("pseudorandom function") to refer to the hash. This just means that the hash needs to have strong randomness properties (much stronger than collision resistance, but weaker than random oracle); if you trust hashes to satisfy the random oracle model, then you can safely think \(prf = hash\) everywhere.
And that's it. The XiO is:
Evaluation requires re-doing the execution just for one particular bucket, then using the hint to decrypt the final vector, and finally taking out the value(s) you need from that bucket (in the obfuscation use case, the decryptor is taking out all values from all buckets).
There is one further optimization to maximize efficiency while guaranteeing provable security. The XiO is \(O(\sqrt{n})\) in the size \(n\) of the thunk it's encoding, but it requires publishing trusted setup parameters of size \(n\) (these exist for a technical reason: the function that generates the \(A * s + e\) samples that compensate for the hint needs to have a never-used path that accepts and uses fully random values, this allows for the security proof). This trusted setup is length \(n\). If you split the XiO into \(\approx n^{\frac{1}{3}}\) separate XiOs, then the size of the XiO goes up to \(\approx n^{\frac{2}{3}}\) but the trusted setup (which can be reused by all of them) goes down to \(\approx n^{\frac{2}{3}}\) too.
XiO and security assumptions
All parts of the obfuscation pipeline other than XiO can be justified with fairly "standard" lattice-based assumptions. Ultimately, all cryptography is just a guess that some mathematical problem is "actually hard" and it's not just that we've been too dumb to solve it so far but we might figure it out next year. But some guesses are more well-founded than others. Among all quantum-safe primitives, the basic LWE assumption underlying lattices is typically considered less robust than hashes, but more robust than almost anything else (including more complex lattice assumptions).
But with all existing XiO primitives there is a problem. The design above is not strictly provable under the lattice assumption. It is made provable under the assumption of the form "if you publish \(A * s + e\)-style ‘samples', and also publish a little bit of extra information, then we're still fine". But there's a problem: such assumptions have been broken before, and ultimately got broken here.
In 2013, an early candidate for multilinear maps (a very strong primitive that, if we had it, would make obfuscation far easier) was published in CLT13. In 2015, however, it was broken.
The object that CLT13 released was \(p_{zt} \approx z * \frac{h}{g}\). Ciphertexts are of the form \(c = \frac{a + gr}{z}\). If you multiply a ciphertext of \(a=0\) with \(p_{zt}\), you get \(c * p_{zt} = \frac{gr}{z} * z * \frac{h}{g} = r * h\). \(r\) and \(h\) were both very "small" values (much smaller than the modulus), and so the two multiplied together will also be much smaller than the modulus. Thus \(p_{zt}\) functions as a "zero check" key, which is what allows you to do pairing-check-style operations on lattice ciphertexts (which unlike pairings, can multiply to arbitrarily high depths if you scale the parameters).
The problem: \(h * r\) reveals an exact expression in the underlying secrets of the scheme over the integers, an so if you have enough such values, you can perform some matrix operations and recover the secrets.
The HPLS conjecture, the core new assumption in WW21, has what is on surface also a way to leak what is "just the error":
Notice that even if extracting \(e\) is fatal, WW21's security still has another line of defense: the adversary doesn't learn either one of the fatal secrets (\(r_f\) and \(s\)), they learn the sum of the two, and maybe there aren't any useful combination attacks that start from that sum. But even extracting \(e\) probably is fine. The attack in HJL21, which broke the HPLS assumption, managed to do one specific thing: they figured out how to adversarially construct circuits in such a way that the lowest-order bit of the error (which survives "smudging" tricks that normally protect schemes like this) encodes a variable that outputs whether the execution is "real" or a "simulated" execution used in the proving process. To summarize:
Hence, even if the assumption is broken in limited cases, the security of WW21 is probably fine.
A new protocol was published in 2025, which adds some extra complexity to the "help the decryptor discover samples of the offset to the hint" piece of the WW21 technique to make it depend on lattice assumptions that are more limited than HPLS, but are still more aggressive than plain LWE. In particular, the assumption that HJL25 uses falls into a framework of "only publish objects where the secret-bearing part gets mod-q wrapped" (which excludes both making available "raw error" and CLT13's \(r * h\)). This is not as conservative as plain LWE, but it is a meaningful improvement. The space between "plain LWE" and "the secret-bearing part must be mod-q wrapped" contains assumptions like circular security, as well as the assumption in HJL25. There are counterexamples to assumptions of this type, but they are highly contrived, and real protocols based on these types of assumptions have proven secure so far.
One analogy you might have is to other "idealized black box" assumptions:
The ROM has so far proven fine in practice, but researchers have designed signature schemes that are provably secure in the ROM but easily attackable if instantiated with any real hash algorithm. And the generic-group model is known to be false in a few cases, most notably order-finding via Schoof's algorithm, and pairings. Elliptic curve security rests on a shaky foundation that you can violate the generic-group model in those two ways, but not any more harmful ways.
The one family of protocols that proves iO entirely from "standard" assumptions is JLS20 and descendants, but these are not quantum-safe.
Summary: cryptographic assumptions are a complicated zoo, and there are no absolutes, only different comfort levels.
Another important note is that if we want to be provably reducible to these security assumptions, then the security proof of the obfuscation of a \(k\)-bit-input program needs to "walk over" all possible \(2^k\) inputs. For each input, you consume a little bit of security margin. Thus, the proof is consuming security margin potentially way more than \(2^{128}\) times. For this reason, all of these obfuscation protocols require the LWE parameters to be subexponentially secure, which means the ciphertexts are many thousands of times larger than regular LWE ciphertexts (and because of the stacked nature of the protocol, this "many thousands" itself compounds on itself several times).
If you wish, you can "yolo and ignore this", the same way that recursive STARK protocols yolo and ignore the fact that they require the hash function to simultaneously be a random oracle and be a concrete thing that can be instantiated by a circuit. Doing that is plausibly okay, but if you do enough plausibly okay things for efficiency, you run into the philosophical question: why even bother to fight so hard to make obfuscation provably secure under standard assumptions, instead of adopting one of the other approaches (eg. diamond iO) that are provably secure under aggressive assumptions but have far better efficiency?
In summary, the answer to "what cryptographic assumptions do you need to make a working obfuscation protocol" is:
XiO and trusted setups
Finally, another obvious but important thing to highlight: the security of obfuscated programs depends on trusted setups in a whole bunch of places, and so we are very far from being able to do fully trustless obfuscation (ie. generate programs with internal secrets in such a way that even the obfuscator does not know the secret). The best that we can do is replace the single party trusted setup with an N-party trusted setup.
Here is one natural way to do this.
We define a generator \(G(H, s)\) which returns a two-input program \(F(\beta, x)\) with the following behavior:
By convention, we will say, \(G(G, x) = G_x\) (note: this is using a trick similar to the Y-combinator) and \(hash(s_i, r_{i+1},1) = s_{i+1}\).
Here's how a chain of N participants would look:
Each participant would also run their output \(G_{s_k}\) with \(\beta = 0\) on \(P\) to obfuscate \(P\) with \(s_k\) (the mix of all previous secrets) as the secret, and publish this obfuscation. Only the last participant's obfuscation will actually mix in everyone's secrets, but in our model, we are assuming that participants come in one after the other, and none of them knows if they will be the last.
This way, the setup is universal and updateable, the gold standard for trusted setups.
And to verify the correctness of each step, each participant would need to make a recursive STARK proving that (i) the previous step was recursive-STARKed, and (ii) they added their randomness correctly.
However, this method is extremely expensive: it requires a STARK of obfuscation of obfuscation. An open challenge is to figure out the ideal way to do universal and updateable trusted setups for obfuscation.
So what are the runtimes?
Perhaps the most famous fact about these obfuscation protocols is that their runtimes are technically polynomial, but in practice galactic. To see why, we just need to look at the full tower again:
The total theoretical overhead here is thus roughly: \(ABE * FHE * GC * XiO * sFE_{enc}\), where \(XiO\) is mostly FHE. All that per bit in the input.
Now, on top of all that, the parameter choices for each individual scheme have to be very conservative for subexponential security, adding over 1000x extra overhead at each LWE layer.
This is why these schemes are far from practical today; expected runtimes are longer than the lifetime of the universe.
Where do we go from here?
There are three routes:
Routes (2) and especially (3) also imply that we get better at battle-testing such assumptions, again perhaps with AI help.
The next posts in this series will talk about what is perhaps the leading construction attempting to do (2), diamond iO, and what is perhaps the leading construction attempting to do (3), local mixing obfuscation.
If we succeed in either path, the reward is high: there is a real sense in which we will have "solved cryptography": any protocol that can be described using an idealized trusted third party, provided the adversary is allowed to rewind the clock, will be implementable securely. But getting there is still a formidable challenge.