How do we show arithmetic theories undecidable?

Or, how do we know when we don't know?

Hera Brown

Department of Computer Science, University of Oxford, UK

In computer science, we like to show that things are decidable, and indeed the method to show that a problem is decidable is easy enough; simply give an algorithm that decides your problem. But we would also like to show that some things are undecidable (generally so we can stop trying to find algorithms for them).

The trouble is, the way to do this seems far less obvious than with proving a problem decidable. How can we show that there exist no algorithms that decide a theory? There are infinitely many possible algorithms, and so it’s not as though we can go through and show that none of them work. We need to use another method.

This is a particularly interesting problem when considering arithmetic theories, since even the algorithms for decidable theories can be very complex. There are some subtle tricks that can be used, and some interesting connections to other areas of computer science and mathematics can be made. With that in mind, I’ll focus on the (un)decidability of arithmetic theories here.

In this blog I’ll consider methods that we can use to show a theory undecidable. I’ll start by giving Church’s proof for the undecidability of first-order logic (and, by extension, the theory $\pa$), and then give a few examples of reductions that we can do to show decision problems undecidable.

The first method: directly

How do we show that a problem is undecidable if we’ve never shown that anything is undecidable before? Well, we have to prove undecidability directly. This is precisely what was done in the 1930s, by Church and Turing and Kleene, amongst others, and so let’s see how they got their undecidability results back then.

If there’s any arithmetic theory that’s canonically undecidable, it’s the theory of the natural numbers with order, addition and multiplication, $\theory{\Nat; <, +, \times}$. We’d expect to be able to prove this undecidable, and indeed we can.

At this point one might ask why we’re invoking Church, Turing and Kleene when Gödel was already working with the theory $\pa$ in 1931. But note that Gödel’s incompleteness theorems don’t show, by themselves, that the theory $\pa$ is undecidable. They only show that any consistent, effectively axiomatisable formal system capable of proving basic facts of arithmetic is incomplete. This is different from undecidability.

Nevertheless, the theory $\theory{\Nat; <, +, \times}$ is undecidable. Here I’ll focus on Church’s proof of the undecidability of first-order logic generally, and then apply that proof to show that the theory $\pa$ is undecidable; I feel Turing’s proof is sufficiently well known at this point (and if it remains unknown to the reader, then do read his paper [10]!).

For a more thorough exposition and exploration of the decision problem, do read Börger, Grädel and Gurevich’s book The Classical Decision Problem [4].

Church’s Proof

As a first note, Church’s paper [6] is a delight to read. It’s clear, and the main formal argument at the end of the paper is sandwiched by two informal accounts of the proof that clarify exactly what all the fraktur symbols are doing. I hope one day to write papers as well as Church does.

But onto the proof itself. The idea of Church’s proof is that we can capture the notion of computability in the $\lambda$-terms (lambda-terms) he uses. Said $\lambda$-terms come in three forms: a variable $x$ standing on its own; an application $(s)(t)$ of one term $s$ to another term $t$; and an abstraction $\lambda x . s$, which binds the variable $x$ in the term $s$.

We’ll denote generic $\lambda$-terms with the small roman letters $s, t, u$ and, when we start to talk about functions, we’ll denote $\lambda$-terms using a boldface font.

These $\lambda$-terms can reduce to one another by renaming and substituting subterms in specific applications. More precisely, there are two ways in which terms can reduce to other terms: we can rename a bound variable throughout an abstraction (what we’d now call $\alpha$-conversion), and we can replace an application $(\lambda x . s)(t)$ by the term $s$ with $t$ substituted for every bound occurrence of $x$ (what we’d now call $\beta$-reduction).

Thinking about these definitions, we can see that some terms can’t reduce to anything else (as a simple example, we have $x$ by itself); we say that these terms are in normal form, and that a term has a normal form if it reduces to a term in normal form. As Church notes, we can prove that if a term has a normal form then that normal form is unique (up to renaming of bound variables) and can be reached in a finite number of reductions. The intuition for this is that we don’t really lose any information we’d later need by doing a reduction, and so we’ll always end up at the same normal form if we reach a normal form at all.
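
For a concrete example of a term with no normal form (the standard example, rather than one from Church’s paper), consider

\[ (\lambda x . (x)(x))(\lambda x . (x)(x)), \]

which $\beta$-reduces only to itself, and so never reaches a term that can’t be reduced further.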

We can start to talk about mathematics using lambda-terms by introducing numerals, which are $\lambda$-terms that stand for natural numbers. Following Church, we define these numerals $\mathbf{1}, \mathbf{2}, \dots$ by the following scheme:

\[ \mathbf{1} \defeq \lambda a b \,.\, (a)(b), \qquad \mathbf{2} \defeq \lambda a b \,.\, (a)((a)(b)), \qquad \mathbf{3} \defeq \lambda a b \,.\, (a)((a)((a)(b))), \]

and so on.

We can then start to talk about functions on numerals using $\lambda$-terms, starting from basic functions such as the identity function, and building up from there. This leads us onto a notion of $\lambda$-definability: we say that a function $f$ on the naturals is $\lambda$-definable by a lambda-term $\mathbf{f}$ just when, whenever $f(n) = m$, the normal form of $(\mathbf{f})(\mathbf{n})$ is $\mathbf{m}$.
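
As a quick illustration (not necessarily the exact term Church uses), the successor function is $\lambda$-defined by

\[ \mathbf{succ} \defeq \lambda c\, a\, b \,.\, (a)(((c)(a))(b)), \]

since, for instance, $(\mathbf{succ})(\mathbf{1})$ reduces to $\lambda a b \,.\, (a)((a)(b))$, which is exactly $\mathbf{2}$.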

It’s at this point that Church introduces the notion of recursiveness, explains what it means for a function to be recursive, and argues that the recursive functions are precisely the $\lambda$-definable functions. He notes that the work of Kleene [7] lets us do without this detour (and, conversely, we could run the undecidability proof using recursiveness alone and avoid $\lambda$-definability!), so we’ll simply assume that the class of recursive functions and the class of $\lambda$-definable functions are the same class. We’ll take recursiveness, $\lambda$-definability, and decidability to all be equivalent notions.

Church introduces all this to get to the point of the argument: he argues that the set of all $\lambda$-terms that have a normal form isn’t recursive; that is, there’s no recursive function $f$ such that $f(s) = 1$ if $s$ has a normal form and $f(s) = 0$ otherwise. Since we’re equating recursiveness with decidability, it follows that there’s a set for which it’s undecidable, in general, whether a given term $s$ is a member or not.

Now, to do this, since recursive functions are defined only on the natural numbers, we need a way to talk about $\lambda$-terms as if they were natural numbers. We can do this using Gödel numbering, where each $\lambda$-term is assigned a unique natural number according to some somewhat arbitrary numbering system. The particular numbering may vary, but here we’ll stick to a scheme in the style of Church’s: assign each symbol of a lambda-term a number, and take as the Gödel number of a term the product of successive primes raised to the numbers of its symbols.

With Gödel numbering in hand, we can then talk about $\lambda$-terms using other $\lambda$-terms; we define functions on the numerals which represent Gödel numbers of terms. This can get a little confusing, because there are lots of different things floating around that may be terms and that may be numbers. The upshot of it all, though, is that computable functions on $\lambda$-terms, viewed through their Gödel numbers, are themselves $\lambda$-definable.

We now show that the set of $\lambda$-terms with a normal form isn’t recursive, using a diagonalisation argument. Assume for contradiction that this set is recursive. Since we’ve assumed that recursiveness and $\lambda$-definability are equivalent, we get a $\lambda$-term, call it $F$, that, when applied to the numeral for the Gödel number of a lambda-term, tells us whether that lambda-term has a normal form.

Using this, we can then find the normal form of a given term $s$, if it has one. To do this we first use $F$ to check that $s$ has a normal form at all, and then simply enumerate the terms that $s$ reduces to. Since every term with a normal form reduces to that normal form in a finite number of steps, we can find the normal form of any given term in finite time, using $\lambda$-definable functions.
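
To make the ‘enumerate the reducts until a normal form appears’ step concrete, here’s a minimal sketch of the search in Python. It runs over a toy string-rewriting system rather than real $\lambda$-terms (so `one_step` below is only a stand-in for one-step reduction), but the shape of the search is the same:

```python
# A sketch of the "enumerate the reducts" search. Terms here are plain strings
# and one_step applies a toy rewrite rule ("ab" becomes "b"); it stands in for
# the real one-step reduction on lambda-terms.
from collections import deque

def one_step(term):
    """All strings reachable from `term` by rewriting one occurrence of "ab" to "b"."""
    return [term[:i] + term[i + 1:] for i in range(len(term) - 1)
            if term[i:i + 2] == "ab"]

def normal_form(term):
    """Breadth-first search through the reducts of `term`; returns the first
    irreducible term found. May run forever if no normal form is reachable."""
    queue, seen = deque([term]), {term}
    while queue:
        t = queue.popleft()
        reducts = one_step(t)
        if not reducts:
            return t
        queue.extend(r for r in reducts if r not in seen)
        seen.update(reducts)
    return None

print(normal_form("aabb"))  # prints "bb"
```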

We then set up a diagonalisation. Let $s_1, s_2, \dots$ be an enumeration of all $\lambda$-terms that have a normal form. Define a function $E$ on the naturals as follows: given $n$, find the normal form of $(s_n)(\mathbf{n})$; if that normal form is a numeral $\mathbf{m}$, set $E(n) = m + 1$, and otherwise set $E(n) = 1$. Either way, $E(n)$ is a natural number whose numeral differs from the normal form of $(s_n)(\mathbf{n})$.

Such a function, by all our assumptions thus far, is $\lambda$-definable, and so we’ll say that the $\lambda$-term $\mathbf{E}$ defines it.

It must be the case that $\mathbf{E}$ has a normal form, since $(\mathbf{E})(\mathbf{1})$ has a normal form (recall the definition of $\mathbf{1}$ as $\lambda a b . (a)(b)$; in Church’s calculus, every well-formed part of a term with a normal form itself has a normal form). But it can’t be that $\mathbf{E}$ is in the enumeration $s_1, s_2, \dots$, since we know that for every natural $n$ the term $(\mathbf{E})(\mathbf{n})$ isn’t convertible into $(s_n)(\mathbf{n})$.

We assumed at the start of this argument that the enumeration $s_1, s_2, \dots$ contained all $\lambda$-terms that have normal forms, and we’ve just shown that there’s a $\lambda$-term with a normal form that’s not a member of $s_1, s_2, \dots$. This is a contradiction. So the set of $\lambda$-terms with a normal form isn’t recursive.

Thus we have a non-recursive set of natural numbers.

How this applies to our theory

This is all well and good, but nowhere in the above does the theory $\theory{\Nat; +, \times}$ appear. But if we can define Church’s non-recursive set using the theory $\theory{\Nat; +, \times}$, then we’ve shown that the theory must be undecidable. Since we’re equating the notion of decidability with that of recursiveness, the theory $\pa$ can’t be decidable if it defines a non-recursive set: if it were decidable, we could decide membership of that set by plugging the numeral for a given number into the defining formula and asking whether the resulting sentence holds, and so the set wouldn’t be non-recursive after all.

To show that Church’s set is definable by the theory $\pa$, we recall the Gödel numbering we chose for our $\lambda$-terms. The beauty of Gödel numbering is that it translates syntactic facts about $\lambda$-terms into number-theoretic statements that we can manipulate in the language of $\pa$.

For instance, suppose that we’ve assigned the symbol $\lambda$ to the number $1$. Recall that a term is an abstraction just when it’s of the form $\lambda x . s$; in particular, a term is an abstraction iff it begins with a $\lambda$ symbol. We can define our Gödel numbering such that the term $s = \tau_1 \tau_2 \dots \tau_n$ has Gödel number $2^{\gnum{\tau_1}} \times 3^{\gnum{\tau_2}} \times \dots \times p_n^{\gnum{\tau_n}}$, where $\gnum{\tau}$ denotes the number assigned to the symbol $\tau$, using the fundamental theorem of arithmetic to guarantee that each Gödel number corresponds to only one term. To check whether a term is an abstraction, we just need to check whether its first symbol is a $\lambda$, and so we just need to check whether its Gödel number is divisible by $2$ but not by $4$.
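
To make the numbering concrete, here’s a small Python sketch. The symbol codes are an arbitrary, hypothetical choice (with $\lambda$ getting $1$, as above), and the divisibility test at the end is the ‘is it an abstraction?’ check just described:

```python
# A toy version of the Goedel numbering described above: give each symbol a
# number, and encode a term t1 t2 ... tn as 2^code(t1) * 3^code(t2) * ... .
from itertools import count

def primes():
    """Yield 2, 3, 5, 7, ... by trial division (plenty for a toy example)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n

SYMBOL_CODE = {"λ": 1, ".": 2, "(": 3, ")": 4, "x": 5, "y": 6}  # hypothetical codes

def goedel_number(symbols):
    """Product of successive primes raised to the codes of the symbols."""
    g = 1
    for p, s in zip(primes(), symbols):
        g *= p ** SYMBOL_CODE[s]
    return g

def is_abstraction(g):
    """The first symbol is a lambda (code 1) iff the Goedel number is divisible
    by 2 but not by 4 -- a test we can phrase with just addition and multiplication."""
    return g % 2 == 0 and g % 4 != 0

g = goedel_number(["λ", "x", ".", "x"])   # the symbol string of λx.x
print(g, is_abstraction(g))
```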

Building up in this way, we can define sets of Gödel numbers of terms with specific properties in the theory $\pa$, express that one term reduces into another, and so on. Doing so, we get sufficient machinery to run Church’s proof that there’s a non-recursive set definable this time by the theory $\pa$, and we get exactly what we want. Thus the theory $\pa$ is undecidable.

The second method: reductively

Church’s proof, like most of the original undecidability proofs, is a very impressive piece of metamathematics. Defining the syntax of the $\lambda$ calculus inside the theory $\pa$ shows just how expressively powerful that theory is. But we’d like to avoid having to run a whole metamathematical proof every time we want to prove another theory undecidable.

Thankfully, there’s an easier way to show that a theory is undecidable: we reduce a theory we already know to be undecidable to it. Simply put, if we have a theory of the form $\theory{\Nat; f}$ for some operator (or set of operators) $f$, then we show that we can define both addition and multiplication in that theory. Note that we have to define both addition and multiplication, since the theory $\theory{\Nat; \times}$ on its own is decidable [3]! Once we’ve defined addition and multiplication in our theory, we have all the ingredients to emulate the $\lambda$ calculus and so define Church’s non-recursive set.

The undecidability of the theory of addition and squares

As a first example, let’s show that the theory $\theory{\Nat; +, x \mapsto x^2}$, where $x \mapsto x^2$ is the squaring function, is undecidable. At first glance $x \mapsto x^2$ looks too similar to multiplication to give us a decidable theory, and indeed we do get an undecidable theory.

In particular, recall that $(a + b)^2 = a^2 + 2ab + b^2$, i.e.\ that $(a+b)^2 - a^2 - b^2 = 2ab$. So if we introduce a variable $c$ and require that $c + c = 2ab$, then $c$ is forced to equal $ab$.

We can define all this in our theory as

\[ \mu(a,b,c) \defeq (a+b)^2 = a^2 + b^2 + c + c \]

and thus get a formula $\mu(a,b,c)$ that holds iff $ab = c$ (we’ve rearranged the identity so as to avoid subtraction, which isn’t available in our signature). We’ve therefore defined multiplication in our theory, and thus shown that the theory $\theory{\Nat; +, x \mapsto x^2}$ is undecidable. And we didn’t even have to do any metamathematics.
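
As a quick sanity check (just the identity above, in code), we can confirm over small naturals that $\mu$ picks out exactly the multiplication triples:

```python
# Check that (a+b)^2 == a^2 + b^2 + c + c holds exactly when c == a*b,
# for small natural numbers.

def mu(a, b, c):
    return (a + b) ** 2 == a ** 2 + b ** 2 + c + c

assert all(
    mu(a, b, c) == (c == a * b)
    for a in range(30) for b in range(30) for c in range(900)
)
print("mu(a, b, c) holds iff a * b == c on the tested range")
```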

Note that the undecidability of this theory appears first to have been noticed by Tarski [9], though most sources in the literature cite Putnam [8] (who in turn cites Tarski for the result), and sometimes even [5], despite the latter proving the far more general result that, if $P$ is the image of any polynomial of degree $\geq 2$, then the theory $\theory{\Nat; +, P}$ is undecidable. It could be that someone earlier than Tarski noticed the result; in any case the citations can be chased further back, and it’s interesting to see who first came up with seemingly obvious results.

The undecidability of the theory of addition and Hardy field functions

We can extend this idea of images of polynomial functions giving undecidable theories, and argue that adding anything that looks at least a little bit like a polynomial gives us an undecidable theory. We can formalise this notion, and ask whether the theory $\theory{\Nat; <, +, f}$ is undecidable when $f$ is a Hardy field function. This was the topic of a paper I worked on recently [2].

The formal definition of a Hardy field function is technical (if you’re interested, refer to [1]), but suffice it to say that, when restricted to polynomial rates of growth, they nicely capture the intuition of functions that look like polynomials. Indeed, every real logarithmico-exponential function (that is, every function built up from real polynomials, exponentiation, and the taking of logarithms) is a Hardy field function.

Of course, since the universe of our logical theory is just $\Nat$, we can’t in general interpret real-valued functions in our theory directly. We therefore round them to the nearest integer using the $\nint{\cdot}$ operation, which does what it says on the tin, and consider the theory $\theory{\Nat; <, +, \nint{f}}$, where $\nint{f}(x)$ is defined as $\nint{f(x)}$. So, for example, we may be dealing with $\nint{x^{5/2}}$, which over the naturals $0,1,2,3,4,\dots$ takes the values $0, 1, 6, 16, 32, \dots$ and does look like it could be a polynomial.
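
To see the rounding in action, a one-line check reproduces the values quoted above (Python’s round agrees with $\nint{\cdot}$ here, since no exact halves arise):

```python
# Rounded values of f(x) = x^(5/2) at x = 0, 1, 2, 3, 4.
print([round(x ** 2.5) for x in range(5)])  # [0, 1, 6, 16, 32]
```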

Now, we prove that such a theory is undecidable in three different cases: when $f$ is super-linear, when $f$ is near-linear, and when $f$ is sub-linear.

We treat each of these cases in turn.

The super-linear case

If $f$ grows quickly enough, then any Taylor polynomial (i.e. any Taylor expansion of $f$ about some point or other) that approximates $f$ closely enough will grow quickly enough to be problematic.

But not just any Taylor polynomial will do. We want, given a function $f$, a single formula $\mu(a,b,c)$ that defines multiplication. If we had a polynomial function, we could use a similar idea to the squaring case above: given a Taylor polynomial of degree $d$ for $d \geq 2$, we take its discrete derivative $d-1$ times and get out some function of $a$ and $b$ equal to $kab$ for some constant $k$. The trouble is that we want a fixed formula, and so need to fix $d$. But if we want a good approximation of $f$, then that approximation may have arbitrarily high degree $d$. So it seems we’re stuck.
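
To see the differencing step in its simplest, degree-two instance (this is just an illustration, not the general degree-$d$ argument), here’s a quick check that for any integer quadratic $p$ the combination $p(a+b) - p(a) - p(b) + p(0)$ collapses to a fixed multiple of $ab$:

```python
# For any quadratic p(x) = alpha*x^2 + beta*x + gamma, the combination
# p(a+b) - p(a) - p(b) + p(0) equals 2*alpha*a*b: a fixed multiple of a*b.
ALPHA, BETA, GAMMA = 3, 7, 11   # an arbitrary example quadratic

def p(x):
    return ALPHA * x * x + BETA * x + GAMMA

assert all(
    p(a + b) - p(a) - p(b) + p(0) == 2 * ALPHA * a * b
    for a in range(50) for b in range(50)
)
print("the difference combination equals 2*alpha*a*b on the tested range")
```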

We’re saved, however, by the somewhat unintuitive fact that if $f$ grows faster than the polynomial $x^d$ then we can always find Taylor polynomials of degree $d+1$ that approximate $f$ over arbitrarily long intervals. For our example of $f = x^{5/2}$ above, this means that there’s a cubic Taylor polynomial that differs from $f$ by at most, say, $\epsilon = 0.01$ over some interval of length, say, $1\ 000\ 000$. The function $f$ therefore looks like a degree $d+1$ polynomial over arbitrarily long intervals.

This is a little confusing at first, given that when learning about Taylor expansions we’re usually told that, for better approximations, we should take higher-degree expansions. It’s even more confusing when we remember that even high-degree expansions start to fail over relatively short intervals. But the trick here is to centre the expansion at points far from the origin: the further we are from the origin, the longer the interval over which a fixed-degree Taylor polynomial tracks $f$ closely, and so we get arbitrarily good polynomial approximations of $f$ of fixed degree.

A further nice result is that, whenever a Hardy field function can be approximated arbitrarily well by polynomials in this way (which it can), it can also be approximated arbitrarily well by integer polynomials (i.e.\ polynomials with integer coefficients). Since we can only talk about integers in our theory, this means we can safely differentiate our function and get out multiplication without having to worry about information being lost by rounding non-integer values.

In short, then: we find a fixed degree at which Taylor polynomials approximate $f$ closely over arbitrarily long intervals; we replace those approximations with nearby integer polynomials; and we take discrete derivatives to collapse the polynomial down to a fixed multiple of $ab$, which is enough to define multiplication.

We’ve defined multiplication in our theory, and we already have addition by assumption, so we have an undecidable theory.

The near-linear case

If $f$ outgrows every linear function, but too slowly for the Taylor-polynomial trick above to apply, then we can’t just differentiate it: the differencing trick needs an approximating polynomial of degree at least $2$ before it yields a product of two variables. What we do know about functions in this near-linear case is that they’re close to linear functions (hence the name), yet they still have arbitrarily large gradients. We use this to our advantage.

We can define multiplication in the near-linear case by finding sufficiently long multiplicative intervals, which are regions of the function where the gradient remains constant. If we have one of these regions, which perhaps looks something like $10, 16, 22, 28, 34$, then we can define a limited form of multiplication over it. In the above example, there’s a difference of $6$ between successive terms. If we want to know what $6 \times 3$ is, then, we just have to look at the third value of the sequence (counting from zero) and take off the initial value of the sequence. In this case, we get that $6 \times 3$ is equal to $28 - 10$, which is $18$, as we’d want.
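
Here’s the same toy computation in code; the sequence is the example interval from the text, and everything else is just lookups and subtraction:

```python
# The interval trick: over a multiplicative interval the rounded function
# climbs by a constant step, so step * k can be read off as seq[k] - seq[0]
# for any k below the length of the interval.
seq = [10, 16, 22, 28, 34]      # the example interval, with step 6
step = seq[1] - seq[0]

def times_step(k):
    """Multiply the step by k using only a lookup and a subtraction."""
    return seq[k] - seq[0]

assert times_step(3) == step * 3    # 28 - 10 == 18
print(times_step(3))
```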

There is a problem here, though: such intervals have finite length, and since the gradient of $f$ keeps growing, once we’re past an interval we’ll never see an interval with that step again. In the above example, we can’t multiply $6$ by $10$, since there’s no tenth value of the sequence to look at.

We avoid this problem, however, by taking difference sets. More specifically, notice that $a \times b = c$ just when, for some other integers, we have $a' \times b = c'$ and $a'' \times b = c''$ with both $a = a'' - a'$ and $c = c'' - c'$. All of $a', a'', c'$ and $c''$ can be as large as we want them to be, and this is just what we need to solve our problem above.
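
Spelled out, the identity behind these difference sets is nothing more than distributivity:

\[ a \times b = (a'' - a') \times b = a'' \times b - a' \times b = c'' - c' . \]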

Thankfully, we have enough intervals of arbitrarily large lengths to be able to use these difference sets properly, and so we can define multiplication using them.

In short, then: we find arbitrarily long multiplicative intervals, over which the rounded function climbs by a constant step; we use each such interval to multiply its step by any number up to the interval’s length; and we stitch these limited multiplications together into full multiplication using difference sets.

We’ve again defined multiplication in our theory, and so we have an undecidable theory again.

The sub-linear case

If $f$ grows more slowly than a linear function, we can’t rely on it having multiplicative intervals with arbitrarily large steps, since the first derivative of $f$ is decreasing. What we do know, however, is that the inverse of a function that grows more slowly than $x$ ends up growing faster than $x$; for instance, the inverse of $\sqrt{x}$ is $x^2$. We’ve already shown above that functions that grow faster than $x$ give us undecidable theories, and so we just need to show that we can define the inverse of $f$ in our theory to get our undecidability result.

When defining the inverse of a function, it helps to know that if $f$ is a strictly increasing Hardy field function, then $\inv{f}(x)$ is just the least value $y$ such that $f(y) \geq x$. Now, Hardy field functions aren’t in general strictly increasing, but, as it happens, every Hardy field function is eventually monotonic: eventually increasing or eventually decreasing. We can use this to our advantage, and shift our Hardy field function by however much we need to get a strictly increasing function.
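
As a small sketch of the least-preimage idea (with the rounded square root standing in for a sub-linear $f$; in the theory itself the ‘least $y$ such that $f(y) \geq x$’ is a first-order definition, not a search):

```python
# Inverse by least preimage: for an (eventually) increasing f, the inverse of x
# is the least y with f(y) >= x. A linear search is enough for a toy example.
import math

def f(y):
    return round(math.sqrt(y))     # a sub-linear example: the rounded square root

def f_inv(x):
    y = 0
    while f(y) < x:
        y += 1
    return y

print([f_inv(x) for x in range(6)])  # [0, 1, 3, 7, 13, 21]
```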

Using this, we can define the inverse of the Hardy field function $f$. The minimum of a set of values can be defined in Presburger arithmetic, so long as the set itself is definable, and the set of all $y$ with $f(y) \geq x$ certainly is, using the $<$ and $\nint{f}$ symbols in our signature. So we have all the tools we need to define the inverse of a Hardy field function within our augmented Presburger arithmetic, and so we have an undecidable theory.

In short, we: shift $f$ so that it becomes strictly increasing; define its inverse as the least $y$ with $f(y) \geq x$; and note that this inverse grows faster than $x$, so that one of the earlier cases applies to it and gives us multiplication.

Concluding Remarks

With all this said and done, it’s interesting to see what tools we use when proving the undecidability of a theory. We can start from a metamathematical proof that only briefly touches on numbers at all, and end up doing proofs of a number-theoretic flavour by the power of reductions. We haven’t even mentioned automata-theoretic or geometric decidability and undecidability results, and these are again possible because, in logic, everything is connected.

These connections aren’t only present in undecidability proofs either; they can be used to provide decidability results, and to relate the complexities of various decision problems using reductions similar to the ones we’ve used here. There’s a lot more to be said on the subject of how we show theories undecidable, then, but here we’ve looked at a few of the ways one might do so.

Bibliography