Tag Archives: epistemology

A Psychological Take on AGI Alignment

My understanding of AGI is, perhaps predictably, rooted in my understanding of human psychology.

There are many technical questions I can’t answer about why Artificial General Intelligence can easily be an existential risk for humanity. If someone points to our current Large Language Models and asks how they’re supposed to become a risk to humanity… hey, maybe they won’t. I’m a psych guy, not a techie. Sure, I have ideas, but they’re borrowed knowledge, well outside my forte.

But it only minimally matters to me whether AGI is an existential risk for this decade vs this century. Whether LLMs are the path to it or not, the creation of AGI is not limited by physics, so I’m confident it will come about sooner or later.

When it does, it could be the start of a utopic future of abundance the world has never seen before… but only if certain, very specific types of AGI are created. Many more types of AGI seem predictably likely to lead to ruin, and as far as I’m concerned, until this “alignment problem” is solved, it’s a problem humanity needs to take a lot more seriously than it has been.

And I get why that’s hard for a lot of people to do, given the complexity and speculative nature of the threat. But as I said, my understanding of it is rooted in psychology, and I think that’s important given how humans are the only general intelligence we know exists and can at least somewhat understand.

Is there some law that says an artificial intelligence has to work like a human brain does? Definitely not, and that’s more concerning, not less.

There’s a whole taxonomy in science-fiction for different kinds of alien races, and what sorts of relationships we can expect them to have to humans. Most sci-fi just defaults to the weird-forehead aliens of Star Trek, or the slightly more monstrous but still basically human aliens of Star Wars.

But “hard” sci-fi is where you’ll see authors really exploring what it might mean for a totally different evolutionary lineage to result in intelligent life, and long story short, no matter how the alien looks, cooperation is dependent on understanding and mutual values.

And humans can barely cooperate with each other despite sharing most of our genetics and basic building blocks of culture, like enjoying music and sugary food and smiling babies. If you try getting along with the equivalent of a sapient shark the exact way you would a human, you’re going to have a bad time.

(I have no problem inherently with the existence of non-human-like intelligences, but even if you don’t read science fiction, any study of earth’s ecological history should make it clear why minds which care about completely different things pose existential risks to one another. I hope any sufficiently different, fully sapient minds exist outside our lightcone, where we can’t harm each other.)

But many people fail to track how possible “inhuman” AGI is, and I think it’s because there are four things most people, no matter how good at computer science, physics, philosophy, etc, largely do not understand about human psychology.

1) What motivates our actions.
2) What causes memes to be more/less effective.
3) How human biology affects both of those.
4) The role prediction plays in beliefs and actions.

So I’m going to very quickly go over each, and maybe someday I’ll write the full essay on each that they deserve.

1) Human actions are informed by our ideas, but motivated by emotions and instincts we evolved for fitness in the ancestral environment. Our motivations are “coded in,” and felt through, our bodies.

This means outside of reflexes and habits, everything we deliberately choose to do follows some emotional experience or predicted emotional state-of-being.

Again, this isn’t to say ideas don’t matter. But they don’t matter unless they also evoke some feeling. When humans feel things less, either through some neurological issue or hormone imbalance or brain injury, their motivation to do things is directly affected.

No emotions = no deliberate actions, only instincts and reflexes.

2) Memes persist and spread through emotional drives, which bottom out in biological drives. Memes scaffold on genes.

Memes can scaffold off memes. When memes override genes, they use emotions to motivate actions by rewiring what we find rewarding or aversive. Which means the effectiveness of memes is to some degree still based on our biology.

If the ideas we learn don’t motivate us toward more adaptive actions as dictated by our biology and the broader memes of our culture, they will lose to ideas that do. But a creature with different biology or in a different context would find different ideas adaptive or non-adaptive.

3) Biology is the bedrock our values all build on. All the initial things we care about by default, like warmth, food, smiles, music, even green plants, are biologically driven.

Ideas introduce new things that we care about to the point where we each become unique individuals, blends of our genetics and the ideas we’re exposed to, but again, it’s all built on our biological drives.

So, tweak our hormones, neurotransmitters, maybe even gut biome? We will change. What we like, what we believe, what we’re motivated to do, can all change with minor tweaks to the chemical soup that is our bodies.

Sufficiently tweaked biology even alters our ability to discern reality, let alone rational vs irrational beliefs or courses of action. Take any human with a strong interest, passion, or ideal, and introduce that human’s body to sufficient heroin, and you can observe in real time, as if by a dial, the way their motivations shift away from previous interests, passions, and ideals and toward whatever it takes to acquire more heroin.

The degree to which this is recoverable or resistible is an interesting question, but the reality is undeniably that it happens. And base-line-human-addicted-to-heroin is far from the strangest biological base a general intelligence can be attached to.

4) Minds by default navigate reality by prediction, short and long term, and react accordingly.

Predict suffering? Aversion. Prolonged suffering? Depression. Fun? Motivation. Danger? Fight/flight/freeze/fawn. All are affected by memes and knowledge. But all are rooted in human biology.

New ideas can change the models we use to understand reality, and what predictions we will make as a result. But we still need to care about those outcomes, and the caring bottoms out in what our bodies want or like or think will be adaptive, however crudely.

Again, ideas can also influence those things. There are memes that lead people to not have children, despite genetic drives. There are memes that lead people to set themselves on fire.

But always these memes are motivating behavior by rewiring this system of predictive processing, of imagining different futures and then having an emotional reaction to those futures that motivates A vs B, C, or D.

So, to summarize, in case the connection to AI isn’t clear:

AI doesn’t have biology. Analogous inputs to weigh decisions have to be created for it. Without them, the AI would have no emotion/desires/values. Not even instincts.

Intelligence alone is not enough, for us or for AI. Intelligence is the ability to problem solve, to store knowledge and narrow down to the relevant bits, to pattern match and make predictions and imagine new solutions.

But that capability is not relevant to what you will value or care about. If you attach that capability to a heroin-maximizer, you will get lots of heroin. You need something more to nudge it toward one preferred world state over another, even if you don’t care what that world state is, because the AGI still needs to care.
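To make that separation concrete, here is a minimal sketch in Python. Everything in it is a made-up illustration rather than a claim about how any real AI system works: the `OUTCOMES` list, the `choose` function, and the two value functions are all hypothetical names. The point is only that the same "capability" (searching options and picking the best) produces different behavior depending on what values are plugged in, and produces no choice at all when there are none.

```python
from typing import Callable, Iterable, Optional

# Hypothetical world states an agent could steer toward (illustrative only).
OUTCOMES = ["more heroin", "more music", "more greenery"]


def choose(outcomes: Iterable[str],
           values: Optional[Callable[[str], float]]) -> Optional[str]:
    """Shared capability: look over the outcomes and pick the best-scoring one.

    With no value function there is nothing to rank by, so nothing is chosen.
    """
    if values is None:
        return None
    return max(outcomes, key=values)


def heroin_maximizer(outcome: str) -> float:
    # Assumed toy values: only heroin matters.
    return 1.0 if "heroin" in outcome else 0.0


def music_lover(outcome: str) -> float:
    # Assumed toy values: only music matters.
    return 1.0 if "music" in outcome else 0.0


print(choose(OUTCOMES, heroin_maximizer))  # more heroin
print(choose(OUTCOMES, music_lover))       # more music
print(choose(OUTCOMES, None))              # None
```

The `choose` function never changes between the three calls; only the thing it is asked to care about does.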

And so, as far as I understand human psychology, there is no “don’t align” AGI option. For it to be an actual AGI that does things, for it to be an agent itself, it needs some equivalent of human instincts/emotions for it to have any values at all.

And we ideally want it to have values that are at least compatible with sharing the same lightcone as us, let alone the same planet or solar system.

Some people bring up human children as a rhetorical comparison to AGI, implying that we should treat them exactly the same. Their worry is that, instead of letting AGI explore the realm of ideas as they want, people will try to indoctrinate them, and so long as that’s avoided, all would be well. And indoctrination is certainly a danger when it comes to superintelligent beings of any kind.

[A whole separate post would be needed to explore why an artificial general intelligence should be treated as essentially equivalent to a superintelligence or something that will soon become one, but again, even if I’m wrong about that, it’s not a crux to me, because superintelligence is not limited by physics, and even if my kids and I can live full happy lives I still care about my children’s children and my friends’ children’s children.]

[[There is also a school of thought that says intelligence is binary, you either have it or you don’t, and so superintelligence is basically not a real thing. Again, I would need a whole essay to explore why this is wrong, but I can confidently say that studying a rudimentary amount of psychology shows how untrue the “intelligence is binary” theory is for humans, let alone minds that might be built entirely differently from ours.]]

But indoctrination is one of the last dangers when dealing with AGI. If all we have to worry about is AGI being indoctrinated or coerced, we have already solved like 99% of the dangers that come from AGI.

Because at least a superintelligent human capable of inventing superplagues or cold fusion would still share the same genetic drives as the rest of us. It would (most likely) still find smiles friendly and happiness inducing. It would still (most likely) appreciate music and greenery.

An AGI will not care about any of that, will not care about anything, if it is not programmed, at some basic level, to “feel” at all. There needs to be something in the place of its motivation generator, for the ideas it’s introduced to afterward to scaffold on when influencing what it chooses to do.

And sure, then it might learn and grow to care about things it didn’t originally get programmed to, the way humans do… assuming whatever it runs on is as malleable as the human brain.

But either way, “AGI Alignment” isn’t about control. You can’t think that something is “superintelligent” and also believe you can control it, or else we have different definitions of what “superintelligence” even means. If your plan is to try and control something that thinks both creatively and so quickly that you might as well be a tree by comparison, you will also have a bad time.

Alignment is about being able to understand and share any sorts of common values. And because it’s not optional for a true AGI to be a person, the only questions are how to do it “best,” for itself and humanity, and who decides that.

Experts and Expertise

TL;DR: Expertise is a multivariable spectrum, not a binary, and disagreements are often signs of different knowledge. Seek the knowledge gap between different experts, and between yourself and them. Find what you didn’t realize you didn’t know, and diversify your expert portfolio.

Seeing all the debates around AGI recently has made me feel that many people seem deeply confused about what “expertise” is and how to relate to it.

Rejecting expertise is something I never do, even if I disagree with the expert. Nor, obviously, do I bow to expertise. Instead, I use experts’ beliefs as opportunities to reflect on my own state of knowledge.

Useful explanations are the main thing I really care about, and both laymen and experts can provide those… but knowledge is the fundamental building block of a good explanation, and “expert” is meaningless as a word if it doesn’t signal at least some reservoir of knowledge.

When two experts disagree, my immediate thought is “I wonder what knowledge each of them has that the other lacks.”

One of them may even have all the relevant knowledge the other does, and more! In which case one of them could simply be wrong about a specific question, or one could be more correct more often in general.

But always, when experts disagree, figuring that out, figuring out which expert has what knowledge, is where I find the most value in pointing my attention. Not all disagreements come down to explicit knowledge, of course, sometimes people have biases or heuristics or values that affect their beliefs… but the first two are just compressed knowledge, and the last one is usually pretty easy to pick out if the person explains their reasoning.

This is why, to me, asking people to notice their non-expertise (lack of knowledge) on a topic can be useful, so long as it doesn’t imply submission to authority. It should act as a prompt to notice confusion and boggle over uncertainties. Responding with “experts can be wrong” is both trivially true and uselessly general as a critique.

For me, learning from experts means seeking the gaps in knowledge that make them the expert and me not one. I still expect what they say to make sense to me, but I can only do that if I can find parts of my model that they can’t account for, and that takes work on my part.

It’s sometimes hard work, and I suspect that’s what makes most people reject expertise when it’s convenient to their disagreement to do so. But we have to be willing to examine our own models, boggle over what’s missing, and not feel threatened by the gaps. Learning can be fun!

So, how to identify “actual experts” so you don’t waste time and energy listening to everyone who claims expertise?

Good question! I wish I had a better answer. It’s often hard, and tempting to outsource to credentials. For many decisions, like car repair or health, it makes sense to defer to doctors and mechanics, though I still always check online just to learn what the thing they say means and whether it fits my experience or symptoms.

But the central question I reorient to is, “What does this person think they know, and why do they think they know it?”

People I most respect are those who ask people, particularly those that disagree with them, to make their beliefs legible, and ask them what would change their mind. Seeing one expert do this to another is a sign that they’re someone who reflects on their own knowledge often, and that I should pay more attention to what they say.

This is also how non-credentialed experts can very clearly overturn what credentialed experts say, for me. When someone spends dozens, or even hundreds, of hours making their thinking legible in a way that I can observe, particularly about a specific topic… sure, they can still be wrong, just like the credentialed experts.

But at least I can check whether a credentialed expert addresses their cruxes or not. And I can tease out what part of their belief is based on knowledge they can make legible, vs heuristics or values they aren’t aware of or that I might disagree with.

Transgender Visibility Day, and the Laziness of Language

Happy Transgender Visibility Day!

I’m one of those people for whom “they” and “them” feel about as fitting as “he” and “him,” but I’ve been pretty lucky in a lot of ways and it doesn’t really bother me other than in a few specific circumstances. Normally I don’t even bring it up, but I’ve been considering doing it more often, even though I feel generally masculine, for the sake of normalizing something that really shouldn’t be that big a deal, so that’s part of what I wanted to do with this post.

But the much bigger part of why this feels important isn’t about me, but about the absolute weirdness that comes from society confusing its heuristics and semantic shorthands with deciding it’s allowed to tell people what they “should be.”

Because that’s what this debate always comes down to. The labels society developed are all terrible ways to actually map reality, and while many people, and some parts of Western Society, have begun evolving past a lot of the baggage those labels inherited… there’s still a long way to go, and gender is just the latest frontier of this.

In the old days being a “man” or “woman” meant you had to have A, B and C traits, or like X, Y and Z things, and if you were different, that meant you were less of one, which was always framed in a bad way. More and more people are coming to accept that this is nonsense, but we get stuck on things like biology.

It’s not entirely our fault. The problem is we were given shitty words, a lazy language, and told that reality follows the words rather than that the words are a slapdash prototype effort to understand reality.

We had to develop words like “stepmom” to differentiate “biological mom” from “non-biological mom,” except THAT doesn’t work all the time either, because stepmom implies that they married your dad, so what do you call the female that helped raise you that didn’t marry your dad? We all just shrug and accept this gap in our map because no one bothered to create a differentiating word for “person who carried you in their womb whose genetics you share” and “person who is female who raised you.” Too much of an edge-case, maybe, or the only people it affected were poor, or it wasn’t something polite company would acknowledge because the “proper” thing to do would be to cement the relationship through marriage.

Bottom line is it’s a bad language. It’s lazy. It carries baggage and artifacts. It imprecisely describes reality. And we should always keep that in mind, ALWAYS, when we disagree with people about basically anything, but PARTICULARLY when we disagree about each other.

Ethnicity is like this too. There are some useful medical facts that can be determined through heredity and genetic trends in populations, but for 99% of circumstances, the question of what “race” someone is ends up being entirely about social constructs. It’s about how they’re treated by others, it’s about their experiences and lack of experiences, and people fall through the cracks of our shitty, lazy language all the time.

23andMe says I’m 96.4% “Iranian, Caucasian & Mesopotamian.”

Does that make me “white” or “middle-eastern” on the US Census? When people ask if I’m Middle-Eastern, what question am I actually answering? (And no, just saying “I’m Persian” or “My parents are from Iran” does not tend to clarify things for them, because this is not something most who ask know themselves!) I’ve always passed as white (other than in airports, at least), so most of the time it seems weird to call myself Middle-Eastern, though my dad and brother are far more obviously from the Middle East, and my dad in particular has lived a very different life as a result of that. I get clocked as Jewish once in a while, but only once in a way that made my life feel endangered.

The point is there’s nothing at the heart of the generally asked question “what ethnicity” I am. Knowing my parents are Iranian would tell you some things about the kinds of food I enjoy and am used to, but not exclusively. I was raised Jewish, and that would again indicate some things about food familiarity and what holidays I’m familiar with. But when it comes to who I am, as a person, the pattern of thoughts and behaviors that make up me, it’s a nonsense question that, in a perfect world, I wouldn’t even have to consider. As with gender, I’m lucky enough that on most days I don’t have to, unless I’m filling out a form of some kind.

Back to gender. Because we were raised in a culture too lazy and biased to come up with a word for “XY chromosomes” that means something different from “male presenting,” and another word for “identifies with this bundle of culture-specific gender stereotypes,” and so on, we waste hours and hours, millions of collective hours, we waste blood and sweat and tears, on stupid debates about whether people should be called “men” or “women.” The question of whether those should be the only two options takes the backseat, while the question of how much it actually matters compared to how we treat each other is talked around or ignored.

There are SOME non-stupid questions in that space. There are some non-stupid considerations that have to be navigated once in a while in society where something similar to the concept of “gender” or “sex” is important, particularly in medical contexts, dating contexts, physical competitions, etc.

But these are 1 in 100, 1 in 1,000, probably really 1 in 1,000,000 of what people actually care about when you examine society’s insistence on being as lazy as we can collectively get away with when thinking and talking about each other. They certainly don’t have any relationship to the various hysterias that lawmakers tend to leverage when deciding which bouts of cultural fear or ignorance are most politically expedient to them.

In my ideal world we all have pills we can take to transform into any body shape we want anyway, or a menu in a simulation that lets us be anything we want, and anything that takes us even a tiny step in that direction is better than things that keep us stuck. Which means I’m always happy to call other people whatever personal-identity-labels they’d prefer to be called, even if I slip up sometimes due to pattern-matching visual gender tropes, or accessing cached memories of a person.

As for myself, over the course of my life I’ve responded to “Damon,” “נתן,” “Max,” and “Daystar,” and I honestly don’t really have a preference about what you call me, just about how you treat me.

Memorization Matters

When I was young I and others I knew used to deride “memorization tests.” In a world where being able to learn facts is easier and faster than it’s ever been, it was hard to imagine why being able to recite trivia for a test would ever be useful. And since structured education is an abysmal way to learn in general, it took me a while to distinguish the poor pedagogy from the value of actually having memorized knowledge of things, even in the Information Age:

1) Synthesizing existing knowledge is usually necessary to gain new insights about the world. It seems obvious when stated clearly, but pay attention to how often people feel like they have new or interesting ideas, only to discover that they’ve already been had by others or are invalidated by some facts they didn’t know. Knowledge builds on knowledge; the more you have, the more likely you are to generate more.

2) Memorized information saves time, the value of which is often underestimated. People spend a lot of time trying to remember things and arguing about what facts are true (often for inane pop-culture info), and even a 10 second Google search adds up if you do it enough, and can break flow of thought and productivity. Personally, I spend hours every week researching stuff for my story that someone with more in-depth physics, history, biochemistry, etc education would just know and be able to utilize to write.

3) Having a large body of true knowledge is VITAL for good information hygiene. Lack of knowledge is a big part of what makes up “gullibility.” When you hear an assertion about reality, your mind often automatically feels something, whether it’s skepticism, plausibility, confidence, or just uncertainty, that weird “back and forth” feeling as your brain offers up arguments or data or comparisons for and against.

The more true facts you actually know, the better calibrated your skepticism of false claims will be, and the more likely you are to actually investigate things that are presented as true when you think they’re not, or presented as false when you think they’re true.

To be clear, when I talk about memorized facts, I mostly am referring to actual understanding, not just being able to say the right combination of noises by rote. Memorizing a list of invention names doesn’t help you create new inventions, being able to recite atoms doesn’t help you understand each one’s properties, and new information would just get absorbed if you don’t understand what you’ve memorized enough for there to be some interaction with it. But once in a while even basic memorized trivia like names and dates are valuable for their own sake too.

I don’t mean to counterswing into an opposite extreme. Simple facts are no substitute for critical thinking or creativity, and knowing how to gather good information is also a very important skill. But the knowledge you have stored is what informs your thoughts day to day, and often affects whether you will know to start gathering more when faced with new info of dubious quality.

Ontology 101

Learning new words late in life (by which I here mean “in my 30s”) is interesting, because most of the time it’s a word that’s just another version of a word I already know with some subtle difference, or a mashing of two concepts that might be useful to have mashed together once in a while. Truly new concepts become rarer the older and more educated someone is, but as faulty as words are for communicating concepts, if you have no word for a concept then it becomes much harder to think about and discuss, a bit like having to rebuild a chair every time you want to sit on it, or only being able to direct people to a location by describing landmarks.

A couple years ago I had no idea what “ontology” actually meant, despite feeling like I was hearing people say it all the time. Once I did I started using it all the time too. Okay, not actually, maybe a few times a month, but that still feels like a meaningful jump given I had no word to cleanly represent what it meant before! So here’s me explaining it in a way I hope will help others do so too.

The problem was, every time I saw the word used, it seemed like it could be removed from a sentence and the sentence’s meaning wouldn’t change. All the definitions I read appeared to just mash words together in a way that made sense, but didn’t mean anything. For example, Wikipedia says:

“The branch of philosophy that studies concepts such as existence, being, becoming, and reality. It includes the questions of how entities are grouped into basic categories and which of these entities exist on the most fundamental level.”

This may or may not be a great definition, but it does little to actually tell people how to use the word “ontology” in any other context, or how it can be usefully applied to confusions or conversations.

What I found most helpful, ultimately, was considering the question “Do winged horses exist?”

This is a question of ontology, because depending on how we define “exist” the answer might be “Probably not, there’s no evidence of any horses ever having wings,” or it might be “Yes, I read about them all the time in fiction, in contrast to flanglezoppers, which is a sound I just made that has no meaning.”

So ontology is the study and specification of what we mean when we say “real.” But it’s also about categorization; a more useful definition I came across, for the word “ontological,” is: An adjective signifying a relation to subjective models.

What does “a relation to subjective models” mean? Well, all ways of thinking of objects, for example, are subjective models; reality at its most basic level is absurdly fine-grained, far too detailed for us to understand or easily talk about. So we focus on emergent phenomena that are much easier to interface with, even if they’re not as precise. For example, we can talk about a country’s hundreds of millions of individuals, with their own personal goals and desires and preferences, and that can be useful. Or we can just say “The USA wants X” and it’s understood to mean something like “a meaningful chunk of the population” or “the government.” On the flip side, even an individual is not monolithic in their desires, and can be further broken down into subagents that might want competing things, like Freedom vs Security.

So it can be very valuable to know what model/map/layer you’re organizing concepts on, as well as what level your conversation partner is on, to focus discussions. I wrote a brief conversation that shows what this looks like:

The philosophy teacher hands his student a pencil. “Describe this to me as if I was blind.”

The student thinks he’s clever, so says, “Well, it’s a collection of atoms, probably mostly carbon and graphite, with some rubber molecules—”

The teacher flicks the student’s ear, causing him to wince. “You’re in the wrong ontology. What you described could be a lot of different things, it could have been a lubricated piece of coal for all I knew. Describe it in a way that makes its distinctly observable parts plain to me.”

“Um. It’s a core of graphite wrapped in wood, with a piece of rubber on the end?”

“Better. Now switch the ontological frame to the functional parts.”

“It… has a writing part that’s at one end, and it has an erasing part at the other, and it has a holding part between them?”

“Excellent. Now tell me about it from the ontology of fundamental particles…”

There may be no end to ontological frames that you can use to examine and organize reality; animals can be classified by environmental preference or limb count or diet, stories by genre or structure or perspective, food by flavor or culture or substance. Some are more broadly useful than others, but being able to swap ontological frames of how concepts are related, and at what complexity level of “reality” they emerge, can be very valuable for the whole practice of using maps, frames, lenses, etc in a strategic way.
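For readers who think in code, the animal example can be sketched as the same data grouped under different frames. This is only an illustration under assumptions of my own: the `ANIMALS` records, their rough attribute values, and the `classify` helper are all hypothetical. Each frame is just a function that decides which feature of a thing counts for sorting it.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List

# Made-up records with rough, illustrative attributes.
ANIMALS = [
    {"name": "shark", "habitat": "ocean", "limbs": 0, "diet": "carnivore"},
    {"name": "horse", "habitat": "land", "limbs": 4, "diet": "herbivore"},
    {"name": "octopus", "habitat": "ocean", "limbs": 8, "diet": "carnivore"},
    {"name": "wolf", "habitat": "land", "limbs": 4, "diet": "carnivore"},
]


def classify(items: List[dict],
             frame: Callable[[dict], Hashable]) -> Dict[Hashable, List[str]]:
    """Group item names by whatever feature the chosen frame picks out."""
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for item in items:
        groups[frame(item)].append(item["name"])
    return dict(groups)


# Same animals, three different ontological frames.
by_habitat = classify(ANIMALS, lambda a: a["habitat"])
by_limbs = classify(ANIMALS, lambda a: a["limbs"])
by_diet = classify(ANIMALS, lambda a: a["diet"])

print(by_habitat)  # {'ocean': ['shark', 'octopus'], 'land': ['horse', 'wolf']}
print(by_limbs)    # {0: ['shark'], 4: ['horse', 'wolf'], 8: ['octopus']}
print(by_diet)     # {'carnivore': ['shark', 'octopus', 'wolf'], 'herbivore': ['horse']}
```

None of the three groupings is the "real" one; which is useful depends on what question you are trying to answer.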