Thinking Along Axes

Playing with Knives

To grasp the world around us, we slice it into categories using the knife of language.

A word has the power to split the universe in two. Everything either is, or is not a blobfish. Everything is, or is not sentient. Everything is, or is not a muffin [A].

This is binary thinking, and it is only the simplest example of a more general model of language, which I call thinking along axes [B].

Binary thinking can be made more nuanced in two ways. The first is to use a continuous spectrum rather than discrete categories. A well-known example is the political spectrum. Some view the right and left as distinct, opposing groups: everyone is either friend or enemy. But most people use a continuous line. American variations usually include the following landmarks, in order: communism, progressivism, centrism, conservatism, and fascism.

The continuous model better combats polarization because it acknowledges that there is middle ground. Drawing non-overlapping in- and out-groups is not possible, because not all members of a group have identical beliefs.

The second way to improve nuance of thought is to graduate to additional dimensions. A good example is the Myers-Briggs Type Indicator, which categorizes people along four different dimensions: energy, information, decisions, and organization. This makes for a much more detailed understanding of personality. But it is still binary along each axis, with each dimension having only two options.

The most nuanced models have multiple continuous axes. Examples include Wait But Why’s Thinking, In 3D and Paul Graham's Four Quadrants of Conformism [C]. These types of models can be said to have high fidelity: they represent reality with more detail.

A Mathematical Metaphor

When thinking about thinking along axes, I imagine the following [D]:

Everything that can be described exists as a point in an infinite-dimensional space.

Each dimension represents a property (e.g. size, hue, kindness, speed, or any other adjective imaginable, ad infinitum).

Words are vectors in this space.

Word vectors can be added to approximate anything.

We can only exactly describe what we ourselves have defined. No matter how many vectors you use, the span of those vectors only includes abstract objects (human constructs). You can never capture every single coordinate of something real by using a finite number of vectors, because real objects have an infinite number of non-zero entries. And you cannot use an infinite number of vectors. Therefore, exactly understanding a concrete object as anything but its whole is impossible.

Choosing axes for a model is akin to deciding on a basis for a subspace within the overall space. You are stating that projections onto that subspace are meaningful simplifications and that we should bring attention to them in certain situations.
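
To make "models are projections" concrete, here is a minimal sketch in Python. It assumes a toy space of only four properties (the space imagined above is infinite-dimensional), and the names `thing`, `model_axes`, and `project` are hypothetical, chosen just for illustration. The model keeps two axes; everything else is what the model ignores.

```python
def dot(u, v):
    """Dot product of two equal-length lists of numbers."""
    return sum(a * b for a, b in zip(u, v))


def project(point, basis):
    """Project `point` onto the subspace spanned by an orthonormal `basis`."""
    projection = [0.0] * len(point)
    for axis in basis:
        weight = dot(point, axis)  # coordinate of the point along this axis
        projection = [p + weight * a for p, a in zip(projection, axis)]
    return projection


# A "real" object, truncated to four made-up properties (an assumption).
thing = [3.0, 1.0, 4.0, 2.0]

# The model's chosen axes: it only looks at the first two properties.
model_axes = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]

simplified = project(thing, model_axes)               # what the model sees
ignored = [t - s for t, s in zip(thing, simplified)]  # what it leaves out

print(simplified)  # [3.0, 1.0, 0.0, 0.0]
print(ignored)     # [0.0, 0.0, 4.0, 2.0]
```

The ignored part is perpendicular to everything the model can express, which is one way to picture why a model can be useful and still incomplete.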

Adding an axis to your model that makes the set of vectors linearly dependent is redundant and inefficient.

A basis of orthogonal vectors creates axes without correlation. A basis of normalized (unit-length) vectors creates axes of equal scale. Both of these properties are extremely important.

Gram-Schmidt orthonormalization is analogous to making a model less convoluted by eliminating correlation and differences in variation between axes. It produces a basis with the same span, but with orthogonal, unit-length vectors.
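
For readers who want to see the procedure itself, the following is a small sketch of Gram-Schmidt in plain Python (the "modified" variant, which subtracts components one at a time). The example axes are invented for illustration, and the tolerance `tol` is an assumed cutoff for deciding that a vector adds nothing new.

```python
import math


def dot(u, v):
    """Dot product of two equal-length lists of numbers."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis with the same span as `vectors`.

    Vectors that are linearly dependent on earlier ones are dropped.
    """
    basis = []
    for v in vectors:
        w = list(v)
        # Remove the part of w that earlier axes already explain.
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # keep only axes that add something new
            basis.append([wi / norm for wi in w])  # rescale to unit length
    return basis


# Three made-up axes in a two-property space.
axes = [
    [2.0, 0.0],  # longer than the others
    [1.0, 1.0],  # partly overlaps with the first
    [3.0, 3.0],  # redundant: a multiple of the second
]

basis = gram_schmidt(axes)
print(basis)  # [[1.0, 0.0], [0.0, 1.0]]
```

The three original axes collapse to two: the overlap is subtracted out, the lengths are equalized, and the redundant axis disappears.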

The most important takeaways from this formality are that models are projections and that orthonormalization is extremely important. The basis you choose for a model is like a toolbox, the vectors the tools.

Near-equal variation is important because otherwise one of the axes becomes much more important than the others. If that is the case, then it would probably be more efficient to use just that single axis. Why lug around a tool you never use?

Low correlation between all axes is even more important. Each tool should serve a single function. Would you rather have a great hammer and a great wrench, or two different hybrids, which can do both jobs, but neither well? When axes are at 90° angles, their meanings are unique and each adds information the others don't.

In case this is confusing, an example: you are making a model to understand what gives an object high kinetic energy. You could put mass on one axis and momentum on the other. But this is redundant because momentum is the product of mass and velocity, so mass is counted twice. A better choice would be mass on one axis and velocity on the other. The axes aren't correlated, so they each give distinct information [E].
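
The redundancy can be checked numerically. The sketch below is only an illustration under stated assumptions: masses and velocities are drawn independently and uniformly from an arbitrary range (1 to 10, in no particular units), and `pearson` is a hand-written helper for the Pearson correlation coefficient.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed toy population: mass and velocity chosen independently.
masses = [rng.uniform(1.0, 10.0) for _ in range(10_000)]
velocities = [rng.uniform(1.0, 10.0) for _ in range(10_000)]
momenta = [m * v for m, v in zip(masses, velocities)]  # p = m * v

r_mass_momentum = pearson(masses, momenta)     # clearly positive
r_mass_velocity = pearson(masses, velocities)  # close to zero

print(f"mass vs momentum: {r_mass_momentum:.2f}")
print(f"mass vs velocity: {r_mass_velocity:.2f}")
```

In this toy population, mass and momentum move together because one is built from the other, while mass and velocity do not. Real data is rarely this tidy, which is the point of note [E].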

The very best models are built on orthonormal bases because they have low correlation and little difference in variation between axes. They use multiple continuous axes so as to have high fidelity and create projections that are close approximations to reality.

Approaching Reality

"All models are wrong, many are useful, some are deadly." -Nassim Nicholas Taleb

Fidelity should never be mistaken for actuality. Even the most complex models are approximations. They can only approach reality. But after getting sufficiently close, diminishing returns set in: this is called overthinking.

Overthinking can be deadly. This is probably why the least nuanced stereotypes are evoked by reactive emotions like fear. Either there is, or there is not a lion chasing you. Not much of a spectrum is necessary, and extra dimensions would get you chewed alive. So extra fidelity is not always good.

But in our normal everyday lives, the instincts to make quick, fear-based decisions are usually vestigial. Just as deadly as overthinking is oversimplification. For example, racial police violence stems from using the axis of skin color, and using nothing else, to make judgements.

The ideal way of thinking combines simplification and detail to converge on models that are efficient. That is, they use the fewest axes needed to accurately describe what's relevant.

Humans are innately driven toward efficient explanation, led there by two forms of beauty: the first is incompressible beauty. When people fall in love, they don't fall in love with a single trait of that person. Rather it is the person as a whole, irreducible to any one part. When we see the beauty in biology, it is not because of simplicity but extreme complexity. Incompressibility gives the impression of design, a numinous feeling, because whenever we cannot understand something, there is a goal to strive for; beyond our grasp, yet within our sight [F].

Sometimes we are content to bask in the mysteries of the world, but based on the progress of science it is clear that humans are perhaps more driven by the beauty of compression. David Perell, in his essay Expression is Compression, talks about how the ultimate act of creation is to distill an idea to its essential form. In art, physics, writing, and more, we remove the fat of some experience and deliver it in an abstract, yet essentially representative way for others to consume. He is spot on in saying that compression is the work of a master creator. We are amazed at the complex properties that can emerge from an elegant equation, or the intense emotions that can be conveyed in abstract artwork. We are driven to invent efficient explanations.

So beauty comes in two forms, which work together in creating purpose. The beauty of nature and the uncompressed is what drives us to understand the world. And as we understand it, we compress and create our own form of beauty. As one fuels the other, we approach reality.

Holism and Reductionism

To be truly nuanced takes an understanding that no matter how far you break something down, your model will always be just a convenience. It will never exactly be the system, and it will never describe a person in all of their complexities. While it would be impossible to comprehend the world without cutting it, in the end there should be a respect for the fact that we are always only an infinitesimal way toward a complete breakdown. The only way to truly see the world is to simply understand it as an unanalyzable whole.

This is not an original thought. While science has encouraged further cuts, many Eastern religions have taught that enlightenment is to see that the world is one, and that attempts to understand it further are in vain. Hegel advanced a similar idea, holism, in contrast to its opposite, reductionism. You cannot know anything unless you know everything.

Holism and reductionism are not enemies. They are two strategies for seeing the world, and the best of humanity comes from knowing which to apply.

Sometimes, incompressible beauty is better enjoyed uncompressed. Whenever a joke is explained, it instantly ceases to be funny. Or, imagine you are contentedly enjoying a delicious Ballpark hotdog when someone decides to explain its origins. Some situations are "ruined" when reduced.

But much of the time explanations don't work like this. If you take a music theory class, you find out that the chills you get from moving songs are largely due to a chord progression, often essentially the same one used across a great deal of music. If you take a neuroscience class, you will learn that your emotions are the result of neurotransmitter concentrations. And yet these reductions, at least for me, seem not to ruin the joke, but to enhance it. Knowing the source generates beauty not just by compressing the incompressible, but by allowing us to recreate it. Here lies the beauty of engineering, of art, of becoming a master at anything: learning patterns allows you to generate the seemingly incompressible.

I'll say it again: Holism and reductionism are not enemies. They are two strategies for seeing the world, and the best of humanity comes from knowing which to apply.

We ought to fight for high-fidelity, efficient models—for continuous axes, multiple axes, and orthogonal axes. And, at the same time, we ought to admit their ultimate limitation. We ought to admire the beauty in complexity, yet strive to reduce it to our own form of beauty. We ought to embrace both compression and the incompressible.

To do all of this is to think along axes.

Notes

[A] Although, admittedly, sometimes the lines begin to blur.

[B] Axes, not axes or Axes.

[C] Thinking along axes is itself an example. One dimension is the continuity of the graph and the other dimension is the number of dimensions (this isn't completely continuous, but isn't binary).

[D] Using math as a metaphor shouldn’t give the impression that this is all easily quantifiable. Rather, mathematical objects can be used to understand the relations and patterns among concepts, which would exist regardless of any specific values.

[E] This example was used because it is very exact. In real life, it is much harder to determine whether two axes are correlated, which is what makes great models hard to come by.

[F] More examples: God, consciousness, the size of the universe. All incompressible things that lead to numinosity.
