Epistemic status: unhinged, mostly subjective, yet fun.


Motivations

The main objective of this post is to expose the molten core of mathematics and figure out from first principles what it means to “do mathematics”. There is no better stage for this than Set Theory. What follows is not the classical introduction to Sets. This part is heavily inspired by Tom Leinster’s Rethinking Set Theory.

This is an attempt to expand on his paper. For further details, visit his website and read his other mind-bending papers.

The guidelines we’ll follow:

  • Start blind: every definition, property or inference step must be designed from first principles.
  • Mathematical objects/entities embody ideas: each entity instantiates a distilled intuition.
  • Relationships first: treat entities as black boxes; prescribe only their interactions.
  • Results shape the entities.
  • Layered abstractions: relationships can be minted as “higher” entities.

The journey ahead:

Truth → Logic → Sets → Elements → Structures

Proto-Mathematics

First we need to set up a clean “pseudo-code” to slowly build the primitives we need to actually do mathematics.

A formal language is a collection of symbols with which we can construct “well-formed” formulæ or propositions, according to a chosen collection of meta-linguistic rules.

The word collection here is not to be confused with the word set. We cannot speak about sets yet. For this section we rely on the natural-language substrate and assign mathematical meaning to all of this after the fact.

It looks weird at first sight that we are loosely defining things that appear to have no foundations. My take is that this is both natural and necessary. We should look at this process like creating a new tool. You build a crappy hammer only to build a better one with it. The difference here is that there is no concrete object to design—we are designing a language. Once we have the formal language of set theory, we’ll have a formal way to think about all of this. But we need to get there first, somehow.

What is true?

Without truth, there is no context. Without context, nothing is distinguishable from the background noise of possibility. Mathematics needs ground to stand on.

But how do we speak about truth without already knowing what truth is? This is where Tarski’s insight rescues us from circularity. He doesn’t define truth—he constrains it. Any notion of truth worth having must satisfy two properties:

  1. Non-circularity: Truth cannot define itself.
  2. Material adequacy: Truth must bridge language and reality.

The second constraint seems almost trivial until you see what it demands. Consider:

“Today it snowed” is true if and only if today it snowed.

This looks like a tautology, but it’s not. The left side is syntax—chicken scratches on paper, symbols in our formal system. The right side is semantics—actual frozen water falling from the sky. Truth is the bridge between symbol and reality.

When every well-formed formula in our language has a definite truth value—when every statement can be decisively evaluated—we have a fully interpreted formal language.

We’ve specified how truth ought to behave; we haven’t implemented it yet. Truth will be instantiated in different forms at different levels: as “Membership” in Set Theory, as “Satisfaction” in model theory, as “Provability” in formal systems. For now, we need only the promise that such a bridge can be built.

What we’re constructing mirrors the stack of computational abstraction:

Mathematical layer ↔ computational analog:

  • Propositional logic ↔ machine code
  • Predicate logic ↔ assembly language
  • Set theory ↔ higher-level language

Each layer provides the primitives for the next. Each makes possible what was previously inexpressible.

We begin with the atoms.

Propositional logic: Composing truths

We have our formal language: propositions P, Q, R… things that are either true or false. Isolated statements about the world.

The main goal is to be able to reason, meaning we want to connect and derive propositions from other propositions. Reasoning isn’t just about isolated facts—it’s about relationships between facts. If I tell you “it’s raining” and separately “the streets are wet,” you naturally want to connect them: “it’s raining AND the streets are wet.”

The truth of this compound statement is completely determined by the truth of its parts. It contains no new information—it’s a way of structuring information we already have.

This is the main principle of compositionality: the truth of complex statements is mechanistically determined by the truth of simpler ones.

We need operations that let us build these compound structures:

  • AND (∧): Both must hold. P ∧ Q is true only when both P and Q are true.
  • OR (∨): At least one must hold. P ∨ Q is true when either P is true, or Q is true, or both.
  • NOT (¬): Negation. ¬P is true when P is false, false when P is true.
  • IMPLIES (→): If P then Q. P → Q is false only when P is true and Q is false.

These aren’t arbitrary choices—they’re the natural joints where thought articulates. The key insight: these operators don’t add new propositional atoms to our language. Given propositions P and Q, the compound “P ∧ Q” is a derived statement whose truth value is fully determined by P and Q.

P ∧ Q is true ⟺ both P is true AND Q is true

This is the essence of compositional reasoning: complex structures built from simple parts, with no ambiguity about how truth propagates upward.
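To make the compositional picture concrete, here is a minimal Python sketch (an illustration only, not part of the theory): the connectives are ordinary boolean functions, and the truth value of a compound is computed mechanically from the truth values of its parts.

```python
# Hedged illustration: propositional connectives as plain boolean functions.
from itertools import product

def AND(p, q): return p and q
def OR(p, q):  return p or q
def NOT(p):    return not p
def IMPLIES(p, q): return (not p) or q   # false only when p is true and q is false

# Truth table for P -> (P or Q): every row is fully determined by the inputs;
# the compound statement adds no new information.
for P, Q in product([True, False], repeat=2):
    print(P, Q, IMPLIES(P, OR(P, Q)))
```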

But propositional logic has limitations. We need another layer.

Predicates: Abstracting properties

Propositional logic limits us to statements about specific things: “my apple is red”—true or false. But we want to describe redness itself, not just this particular apple’s color.

We introduce a parameter, shifting from:

  • P = “My apple is red” (a complete proposition)

to:

  • P(x) = “x is red” (a predicate with a free variable x)

By refusing to specify what x is, we’ve created a template. P(x) isn’t a statement about any particular object—it’s a condition waiting to be applied. Think of it as a correspondence:

\[P: \text{Objects} \to \{\text{true}, \text{false}\}\]

When we write P(x), we’re describing the condition an object must satisfy to possess the property. The property becomes primary; objects are whatever happens to satisfy it.

This is a profound inversion. Instead of “this apple has redness”, we’re saying “redness is a condition, and this apple satisfies it.” Properties become first-class citizens manipulable independently of any particular object.

This move—from cataloging objects to characterizing conditions—is the engine of mathematical abstraction.
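As a small illustration (the objects and the `is_red` test below are invented for the example), a predicate is just a correspondence from objects to truth values, waiting for an argument:

```python
# Hedged sketch: a predicate P(x) = "x is red" as a function Objects -> {True, False}.
def is_red(x):
    return x.get("color") == "red"

apple = {"name": "my apple", "color": "red"}
grass = {"name": "grass", "color": "green"}

# The predicate has no truth value of its own; it only yields one when applied.
print(is_red(apple))   # True
print(is_red(grass))   # False
```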

The suspension problem

But what is the truth value of P(x)? It has none. “x is red” hangs suspended—neither true nor false—until we specify what x is. This suspension is what allows properties to exist independently of their instances.

Eventually we need to collapse predicates back into propositions—statements with definite truth values. We need to resolve the free variable.

Quantifiers: Binding the free

We introduce two operations that bind free variables:

∃ (exists): “At least one object satisfies the predicate”

∀ (for all): “Every object in the domain satisfies the predicate”

These transform suspended predicates into concrete propositions:

  • ∃x: P(x) — “Something is red” (has a truth value)
  • ∀x: P(x) — “Everything is red” (has a truth value)

The binding collapses the suspension. This is the machinery behind every mathematical theorem. When we write “For all ε > 0, there exists δ > 0 such that…” we’re binding variables to create propositions about entire domains.
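Over a finite domain this binding is literally Python’s `any` and `all`; the toy domain below is an assumption made only for the example:

```python
# Hedged sketch: quantifiers as any()/all() over a finite, made-up domain.
def is_red(x):                      # the suspended predicate P(x)
    return x["color"] == "red"

domain = [
    {"name": "apple", "color": "red"},
    {"name": "grass", "color": "green"},
    {"name": "brick", "color": "red"},
]

exists_red = any(is_red(x) for x in domain)   # ∃x: P(x)
forall_red = all(is_red(x) for x in domain)   # ∀x: P(x)
print(exists_red, forall_red)                 # True False
```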

A formula with all variables bound is a sentence—it has achieved a truth value. A formula’s arity counts its free variables before binding.

First-order logic quantifies over objects. Second-order logic quantifies over properties themselves. The pattern recurses: bind the free, collapse to the evaluable.

The missing foundation

We now have logical operations and quantification machinery. But quantifiers presuppose domains—universes of discourse over which ∃x and ∀x range. We need a theory of these collections themselves.

We need sets.

Set Theory

The Primitives

In this first-order theory we allow two basic primitives:

  • Things called Sets, denoted A, B, C…
  • Things called Arrows or Functions from a Source set to a Target set, denoted f, g, h,…

This choice is deliberate. We could have started with sets and membership (∈), as in Zermelo-Fraenkel. Instead, we start with sets and functions, treating both as undefined primitives whose nature emerges from their behavior.

Why arrows instead of elements? Because assuming elements as primitives—like ZFC does with ∈—makes a metaphysical commitment we don’t need. It’s like assuming atoms exist and never “discovering” them through their interactions. When you start with ∈, you’re saying “sets are bags of stuff (other sets), and the stuff exists independently.” But that’s backwards!

By starting with arrows, we’re being radically honest: we only know things through their relationships. A thing with no relationships to anything else might as well not exist. When elements emerge later from our construction, they’ll come with built-in consistency guarantees that ZFC needs separate axioms to ensure.

The very idea of “collections” in set theory is just a model for what we’re trying to describe at a higher level of abstraction. This is powerful because the bare-bone ideas at the higher levels are almost always the same. With few modifications you could describe anything else. Much like the DNA code of human thought.

We have these “entities”—sets—and these “arrows” from one set to another. We think of functions as channels through which information flows. Once we know how the information flows, the rest follows.
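A toy way to picture the two primitives (purely an illustration of this reading, not a definition): a finite set is a bag of labels, and an arrow records its source, its target, and how information flows between them.

```python
# Hedged sketch: "sets" as frozensets of labels, "arrows" as source/target plus a rule.
from dataclasses import dataclass

@dataclass(frozen=True)
class Arrow:
    source: frozenset
    target: frozenset
    mapping: tuple                 # pairs (a, image of a), one per point of the source

    def __call__(self, a):
        return dict(self.mapping)[a]

A = frozenset({"x", "y"})
B = frozenset({0, 1, 2})

f = Arrow(A, B, (("x", 0), ("y", 2)))    # a channel through which information flows
print(f("x"), f("y"))                    # 0 2
```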

Basic properties of the arrows

Identity

Can the Source and Target be the same set?

Yes, and this is crucial. Being able to describe transformations from a set to itself allows our entities to have inner workings. The fact that information can move around the same object hints that there are internal structures—these we will call elements.

Once we can describe “transformations” from a set to itself, there is something “natural” we could think of: a transformation that does nothing. This is another instantiation of the concept of zero—a reference point.

We call these special arrows “identities”: for every set \(A\) there is an arrow \(\mathrm{id}_A: A \to A\) that does nothing to \(A\).

Mathematics is dumb, i.e., mathematics is the art of pointing out trivialities and taking them seriously.

In this theory, you exist only if there is an arrow that points at you. The identity is that minimal presence: “this thing can at least be picked out and referred to again.”
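In the toy dict picture used above, the identity is simply the arrow that sends every piece of information to itself; a minimal sketch:

```python
# Hedged sketch: the "do nothing" arrow id_A : A -> A.
def identity(A):
    return {a: a for a in A}

A = {"x", "y", "z"}
id_A = identity(A)
print(all(id_A[a] == a for a in A))   # True: nothing has moved
```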

Composition

If I give you two arrows \(f: A \to B\) and \(g: B \to C\), what is the most natural thing to do?

Composition. There is no other way around it.

Composition takes two arrows and produces one arrow, written \(g \circ f\), that is mechanically specified by first flowing along \(f\) and then along \(g\). Eugenia Cheng has the most salient example: in your family, supposing your mother has a brother, you call this relative “uncle.” But specifying the familial relationship is independent of whether you call it “uncle” or “the brother of your mother.” The composition (“uncle”) does not yield something new. Composition is syntactic sugar.

This is extremely important. Composition has closure—composing two arrows never takes you outside the theory; the result is just another arrow between sets you already had. You can always compose, as long as the target of the first is the same as the source of the second. And composing with an identity changes nothing: \(f \circ \mathrm{id}_A = f = \mathrm{id}_B \circ f\) for any \(f: A \to B\).

Associativity: For all sets \(A, B, C, D\) and functions \(f: A \to B\), \(g: B \to C\), \(h: C \to D\):

We have: \((h \circ g) \circ f = h \circ (g \circ f)\)

This might be deceptively obvious, but it isn’t. It is clearer if we strip the notation to its bare minimum. The recipe \((h \circ g) \circ f\), “compose \(g\) with \(h\), then precompose with \(f\)”,

is semantically different from

the recipe \(h \circ (g \circ f)\), “compose \(f\) with \(g\), then postcompose with \(h\)”. The axiom says the two recipes always yield the same arrow.

This associativity condition on our theory means that our knowledge of the system is complete. We possess knowledge of all possible compositions, and from all possible compositions we can retrieve their component functions and therefore source and target sets. In this theory, ignorance is not allowed.

Associativity is what allows us to think linearly about processes. It’s why we can write “ABCD” without specifying whether we mean ((AB)C)D or A(B(CD)). This coherence is what separates meaningful structure from noise.
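A quick check of closure and associativity in the same toy representation (arrows as dicts, sources and targets left implicit for brevity):

```python
# Hedged sketch: composition of dict-valued arrows and an associativity check.
def compose(g, f):
    """(g ∘ f)(a) = g(f(a)): first flow along f, then along g."""
    return {a: g[f[a]] for a in f}

f = {"a": 1, "b": 2}            # f: A -> B
g = {1: "p", 2: "q"}            # g: B -> C
h = {"p": True, "q": False}     # h: C -> D

lhs = compose(compose(h, g), f)   # (h ∘ g) ∘ f
rhs = compose(h, compose(g, f))   # h ∘ (g ∘ f)
print(lhs == rhs)                 # True: both bracketings give the same arrow
```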

Isomorphism

Let us play with what we have built. Consider two arrows \(f: A \to B\) and \(g: B \to A\) with

\[g \circ f = \mathrm{id}_A\]

as if we could break the identity of \(A\) into two arrows \(f\) and \(g\). What does this mean?

The composition of the transformation from \(A\) to \(B\) and back to \(A\) is the transformation that leaves \(A\) unchanged. Here we stress that nothing has changed within \(A\).

The language codes for what we call “inner workings.” If something happens to \(A\) but we should still call it \(A\), then there are transformations that do something to \(A\) but don’t change how we understand \(A\) as an entity.

We’ve seen that \(g\) must undo whatever \(f\) does. We could do the same for \(B\), asking for a pair of arrows whose composite is \(\mathrm{id}_B\).

But what if we choose the same arrows for both cases?

  • \(g \circ f = \mathrm{id}_A\)
  • \(f \circ g = \mathrm{id}_B\)

\(A\) and \(B\) are linked together profoundly. For any arrow \(m: A \to X\), we can always slip in the identity:

\[m = m \circ \mathrm{id}_A\]

Since \(g \circ f = \mathrm{id}_A\), we rewrite:

\[m = m \circ (g \circ f)\]

Composing with associativity gives:

\[m = (m \circ g) \circ f, \qquad m \circ g: B \to X\]

This works for any arrow from \(A\). Every arrow from \(A\) to any \(X\) gives rise to an arrow from \(B\) to \(X\), and vice versa. The two objects are interchangeable from the perspective of arrow structure.

Since you cannot avoid the identity silently sitting there, you cannot avoid having \(B\) whenever \(A\) is there, and vice versa. In our theory, they are the same object wearing different masks.

When we have such a pair \(f: A \to B\) and \(g: B \to A\) with \(g \circ f = \mathrm{id}_A\) and \(f \circ g = \mathrm{id}_B\), we call them isomorphisms, and say \(A\) and \(B\) are isomorphic: \(A \cong B\).
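In the toy dict model, checking that two finite sets are isomorphic amounts to finding a pair of arrows that undo each other in both directions; a brute-force sketch, illustrative only:

```python
# Hedged sketch: searching for an isomorphism f: A -> B with a two-sided inverse g.
from itertools import permutations

def compose(g, f):
    return {a: g[f[a]] for a in f}

def is_iso_pair(f, g, A, B):
    return (compose(g, f) == {a: a for a in A} and    # g ∘ f = id_A
            compose(f, g) == {b: b for b in B})       # f ∘ g = id_B

A = {"x", "y"}
B = {0, 1}

found = False
for image in permutations(B):
    f = dict(zip(sorted(A), image))     # a candidate arrow A -> B
    g = {v: k for k, v in f.items()}    # its only possible inverse
    if is_iso_pair(f, g, A, B):
        found = True
print(found)   # True: A ≅ B, both have exactly two elements
```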

The Terminal

We can compose arrows and have added properties to this operation. There is a trend in mathematics:

Constraining the theory by limiting it to a subset of possible choices gives rise to more structure on the theory itself.

An elliptic curve has more structure than the space it is embedded in. Change my mind.

The main idea is to posit the existence of a reference, a conceptual sink: a place where objects can always go. How would it look, as a property expressed only in terms of arrows?

A Terminal Set is a Set \(\mathbf{1}\) such that for every other Set \(X\), there always exists a unique arrow:

\[!_X: X \to \mathbf{1}\]

The exclamation mark signifies “the unique arrow.” There’s only one way to get to the terminal from anywhere.

What if we have two terminal objects? Call them \(\mathbf{1}\) and \(\mathbf{1}'\). By definition:

  • There’s a unique arrow \(f: \mathbf{1} \to \mathbf{1}'\)
  • There’s a unique arrow \(g: \mathbf{1}' \to \mathbf{1}\)
  • There’s a unique arrow \(\mathbf{1} \to \mathbf{1}\) (which must be \(\mathrm{id}_{\mathbf{1}}\))
  • There’s a unique arrow \(\mathbf{1}' \to \mathbf{1}'\) (which must be \(\mathrm{id}_{\mathbf{1}'}\))

The composition \(g \circ f: \mathbf{1} \to \mathbf{1}\) is an arrow from \(\mathbf{1}\) to itself. By uniqueness, \(g \circ f = \mathrm{id}_{\mathbf{1}}\). Similarly, \(f \circ g = \mathrm{id}_{\mathbf{1}'}\).

They’re isomorphic! Any two terminal objects are necessarily the same from our theory’s perspective. The terminal object, if it exists, is unique up to isomorphism. We speak of the terminal object.

This is our first major result: uniqueness up to isomorphism. When you specify something by a universal property (like “unique arrow from everything”), the thing you’ve specified is essentially unique—any two versions are isomorphic.
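A one-point set plays the terminal role in the toy model: from any set there is exactly one way to map into it, the constant map. A minimal sketch:

```python
# Hedged sketch: the terminal set as a one-point set; the unique arrow !_X : X -> 1.
TERMINAL = {"*"}                 # the label of the single point is arbitrary

def to_terminal(X):
    return {x: "*" for x in X}   # there is never any choice to make

print(to_terminal({1, 2, 3}))    # {1: '*', 2: '*', 3: '*'}
print(to_terminal({"a", "b"}))   # {'a': '*', 'b': '*'}
```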

Elements

We’ve been hinting at “inner workings” and something inside our black boxes. Now we open them—not by breaking them apart, but by using the structure we’ve built.

Remember \(\mathbf{1}\)? Every set has exactly one arrow going to it. But what about arrows coming from it?

An arrow \(x: \mathbf{1} \to A\) is special. It starts from the simplest possible source (the terminal object has no internal structure—there’s only one way to get there from anywhere) and picks out something specific in \(A\).

If \(\mathbf{1}\) is the “one-point set” in classical understanding, then an arrow from \(\mathbf{1}\) to \(A\) is like pointing at a specific location in \(A\). It’s selecting, choosing, picking out.

An element of \(A\) is an arrow from the terminal object to \(A\): \(x: \mathbf{1} \to A\).

We’ve discovered what “elements” are without ever opening the black box. The elements of \(A\) are precisely the arrows \(\mathbf{1} \to A\). We denote them as \(x: \mathbf{1} \to A\), or sometimes \(x \in A\) when feeling classical.

In ZFC, elements are primitive—you declare “x ∈ A” as a basic notion. Here, they are a consequence of more basic structural facts. Because our elements emerged from structure rather than being assumed, they automatically satisfy coherence conditions that ZFC needs separate axioms to ensure.

Two arrows are the same if they have the same effect on everything, and now “everything” includes all elements:

Two arrows \(f, g: A \to B\) are equal if and only if for every element \(x: \mathbf{1} \to A\), we have \(f \circ x = g \circ x\).

We can check if two functions are the same by testing them on elements. Global behavior (the function) is completely determined by local behavior (what it does to each element). This principle—functions are determined by their action on elements—seems obvious in retrospect. But we just derived it from pure arrow composition.

This property is called well-pointedness; equivalently, \(\mathbf{1}\) is a generator. It validates all element-wise reasoning in mathematics. The “inner workings” we kept hinting at? They’re the collection of arrows from \(\mathbf{1}\). The internal structure of a set is completely captured by how the terminal object sees it.
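In the toy model, an element of \(A\) is literally an arrow from the one-point set into \(A\), and two arrows out of \(A\) agree exactly when they agree on all such elements; a small sketch:

```python
# Hedged sketch: elements as arrows 1 -> A, and element-wise equality of arrows.
ONE = {"*"}

def elements(A):
    """Arrows 1 -> A: one for each point of A."""
    return [{"*": a} for a in A]

def compose(g, f):
    return {a: g[f[a]] for a in f}

A = {1, 2}
f = {1: "odd", 2: "even"}
g = {1: "odd", 2: "even"}

# f and g agree on every element x: 1 -> A, so they are the same arrow.
print(all(compose(f, x) == compose(g, x) for x in elements(A)))   # True
```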

Conclusion: The First Layer

We started with nothing—just arrows and composition. We added:

  • Identity arrows (minimal self-reference)
  • Associativity (coherent sequencing)
  • A terminal object (a conceptual point)

From these sparse ingredients, the entire notion of “element” emerged naturally. Not because we put it there, but because it was always there, waiting to be discovered.

The key insight: structure emerges from relationships.

In traditional set theory (ZFC), you start with elements as primitive. You declare “x ∈ A” and build from there. But this makes a metaphysical commitment—you’re saying elements exist independently of any structure.

Our approach inverts this. Elements are not primitive. They’re a consequence of having a terminal object and considering arrows from it. The membership relation x ∈ A is shorthand for “there exists an arrow \(\mathbf{1} \to A\) that we call x.”

This isn’t just philosophical sleight of hand. It has consequences:

  • Extensionality (sets determined by elements) becomes a theorem, not an axiom
  • Function equality becomes testable (check on all elements)
  • The “inner workings” of sets are precisely characterized (arrows from \(\mathbf{1}\))

We’ve built a foundation where relationships are primary and objects are secondary. Sets exist because they relate to other sets through functions. Elements exist because the terminal object relates to other sets through selections.

What’s next

We’ve bootstrapped the notion of elements from pure structure. But we haven’t built much structure yet. We can’t form pairs, we can’t talk about subsets, we can’t internalize functions as objects, and we have no notion of infinity.

[In Part 2], we’ll build the machinery:

  • Products (ordered pairs without Kuratowski coding)
  • Exponentials (functions as objects, currying, higher-order functions)
  • Subobject classifier (truth values, subsets, power sets, logic internalized)
  • Natural numbers (induction and infinity from universal properties)
  • Initial objects and coproducts (duality, sum types, case analysis)

Each construction will follow the same pattern: specify behavior through universal properties, let implementation emerge. By the end, we’ll have recovered all of classical mathematics from nothing but arrows.

The journey continues.

References

  • Tom Leinster, “Rethinking Set Theory” (arXiv:1212.6543)
  • F. William Lawvere, “An Elementary Theory of the Category of Sets”
  • Saunders Mac Lane, “Categories for the Working Mathematician”
  • Eugenia Cheng, TED talk on category theory and analogies

Next: [Part 2 - Building the Machinery]