My attempts to sensemake AI risk
Making sense of AI risk is very hard for me. I've tried to write down, in no particular order, points that feel relevant for influencing how I'm reasoning about this, stream-of-consciousness style, with emphasis on how my emotions impact my models.
Knowingless is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
I joined the rationalist community in 2015, and somehow managed to get all the way to 2021 without hearing serious discussion about AI risk. It was in the water supply, but it felt kind of background, like everyone was doing their own research somewhere I couldn't see, and alluded to it with each other, but I rarely heard actual discussion about it like I did about other things.
I initially completely misunderstood what people meant when they mentioned the term "AI risk"; I thought people were saying something stupid about consciousness. Once I got a clarification that no, general intelligence didn't mean like... awareness, or at least it wasn't relevant, then this didn't bother me.
When I was a very religious 13-year-old in a very religious community that believed in the end times and the rapture, I saw a documentary about supposed secret hidden codes in the bible that had predicted a bunch of uncannily accurate stuff already, and that the end of the world was supposed to happen in 2012. I knew the end times would take seven years, so I reasoned the rapture might happen in 2005. It didn't happen, and I felt stupid. There was a lot of knowledge I didn't have around "detecting meaningful patterns" and "how people can misrepresent findings" and "priors around how dramatic tv documentaries don't have a strong loyalty to truth". I didn't know I didn't have this knowledge. And my prediction here hinged upon a ton of moving parts (assuming truth of Christianity, that the end times would take 7 years, that the documentary was being honest), and this made it really vulnerable - only one of those joints had to fail for me to be wrong. Things can feel really convincing, but if you're not strongly familiar with the failures of the relevant fields around the claims, it's easy to get deceived.
People in AI tell me I don't really need to know much technically about how machine learning works in order to do reasoning about AI risk. On one hand this is kind of true - like, during the beginning of covid my friend came to me worried about food shortages and me, knowing nothing about food shortages, managed to correctly deduce that this was not going to be an issue based on a lot of priors around how incentives and money and laws work. But on the other hand... the bible code. When I try to start seriously thinking about AI, a little voice in my head is constantly wary, like "you're way out of your depth", like the risk of me making some error in the reasoning is high.
But this doesn't mean I shouldn't try to reason about it! Fear of not knowing what I don't know is a bad reason to stop making a serious attempt at thinking carefully.
There's lots of social incentive here, and I think that's making me need to be even more slow and careful. I am surrounded by people with strong opinions, many of my close friends work directly in AI, and the rationalist community, dear to my heart, was founded by one of the most pessimistic AI predictors alive.
I suspect I'm pretty susceptible to subconsciously believing opinions that my peers believe in order to be more liked and respected. I have some sense that I need to believe AI is a serious existential risk in order to be respected in my communities. This freaks me out, because some part of my brain is doing "Of course we're going to end up believing in AI risk, and the only question is how you can end up reasoning yourself into it while preserving the illusion to yourself that you are in fact doing this independently"
But I can't just notice this and decide to discard AI risk. In a world where AI risk was very serious, we would expect to find me in this situation anyway. This isn't an indicator that distinguishes between possible worlds. But it is another stumbling block of distrust.
I also have some ego around figuring things out early on my own. I am a little mad I didn't go sit in my room at the age of seventeen or whatever and then realize AI was a huge existential risk. I suspect this is why I managed to get so far without looking into AI risk; some "if I can't reason it from uninfluenced first principles then I don't want it" vibe.
A lot of these self-doubt things, the "unknown unknowns fear", the "social belief influence," the "ego" bit, also serve a function for me as giving me a sense of control over my own mind. If I have unflattering narratives about myself, at least I have narratives, and I'm cool for being so introspective and honest. I'm incentivized to find the worst, most self-doubting parts of myself; this functions as a really good excuse for surrendering the authority of my reason to others. Actually being your own nuclear reactor of true opinion is terrifying, and much of my life has been spent masking my cowardice as humility, both to others and myself. And so, on some level, I don't trust the self-doubt. I must notice it, gently put it aside, and continue to stare into the scary abyss to find out what I actually think about this. It's just, it's an extra process to run in my brain, another ball to juggle.
So, what about the basic arguments themselves? I've been carefully titrating my exposure to them, so I can keep a close eye on my own response, so this is an in-progress, slow thing. My exposure, mostly: I've read maybe 5-7 essays on AI risk, have a bunch of friends who work in AI, haven't read any of the debates, had approximately 3 dedicated entry-level rationalist group discussions about AI risk, and have had a few smart people at parties casually try to convince me there's no big deal. I think I’m familiar with most of the basic arguments, but I’m not sure.
But right now... it seems pretty convincing? It seems clear to me that there's nothing stopping artificial general intelligence from getting way, way more intelligent than humans. We weren't optimized explicitly for intelligence, what happens when you have something optimized for it and barely constrained by time or scale? This seems pretty inevitable, assuming humanity doesn't crush itself first. My intuition is that this will be much faster than most people predict, here. I think most people are a bit blinded by the 'specialness' of humans, failure to think clearly about how intelligence works, and underestimations of the potential for exponential growth.
Part of this intuition comes from watching a bunch of people update towards shorter timelines (meaning the length of time they predict until AGI gets here) after cool new AI stuff like Dall-E came out. Dall-E-and-stuff arriving wasn't surprising to me, and if it was surprising to others, that meant they were failing to do some modeling correctly. This meant that people with shorter timeline predictions had better models to begin with.
I am mostly convinced that AGI won't be aligned with humans, at least not in the way that will prevent it from doing bug-squashing at us. It seems clearly super hard to program the right kind of goals into an AGI, for like a billion reasons. I have maybe more weight than others right now on "there might be some aspect about superintelligence that naturally results in cooperation", but it's still comparatively low.
It's still low cause I am pretty overwhelmed by the vastness of mindspace. I have often found myself completely humbled when trying to use my own mind to predict others, and am regularly astonished by how many tiny aspects of the human experience that seem like the default playground are in fact not default, can be changed. LSD really hammered this in for me; it was one specific drug, and threw me into one category of mindspace, but that category was unspeakably vast, and before it I couldn't even comprehend a mental state like that was possible. I am super wary about any intuitive assumption about the way intelligence must be. It seems plausible that AGI will be several times more alien to us than DMT is to our sober state.
(and, of course - if you have a superintelligent, unaligned AI, this would likely turn out pretty extinctiony for us, kinda the way we got extinctiony about smallpox)
The rest of the arguments feel less relevant, to me? Like, if you're granting a superintelligent AGI and you still think it won't be able to get out of the researcher's box (like, it’s on a computer disconnected from the internet and wants you to connect it to the internet, or something), then I don't think you're properly imagining superintelligence. Maybe this is a bit silly, but for my own calibration I've often imagined a bunch of five-year-olds who've been strictly instructed not to pass the key through your prison door slot, and you have to convince them to do it. The intelligence gap between you and five-year-olds is probably much smaller than the gap between you and an AGI, but probably you could convince the five-year-olds to let you out. People arguing they just wouldn't let an AGI take any sort of control of anything strikes me as being as silly as the five-year-olds swearing they won't let the adult out no matter what. Most other arguments around human beings controlling the AGI in any way once it happens feel equally silly. You just can’t properly comprehend a thing vastly smarter than you!
I have more confusion around if it's gonna foom into that level of intelligence so fast. Maybe it'll be more incremental? Some people argue it can self modify, or do research on itself to make itself faster. Here I feel like my intuitions falter a lot, and like I'd need more information about how AI research works on a more practical scale. There have been a whole lot of extensive debates about this in my communities that I’ve been dutifully avoiding.
But are my “intuitions faltering” just me being cowardly in deferring to people with more knowledge? Maybe? Okay, I'm going to pretend I have opinions here. It does seem likely that, if an organization with a lot of funding could use a moderately-intelligent AI to work on smarter AI, this would probably happen. There's lots of incentive for this to happen, because the smarter your AI, the more money you can make or wars you can win or whatever. But how fast will using-AI-to-work-on-AI work? The argument that "intelligence can't create higher intelligence than its own" seems outright ridiculous to me (we’re already doing this without that much work, in specific domains like Go), so I think it's at least possible. Probably it would work pretty fast? Maybe the more intelligent an AI gets, the more compute is required to train it, exponentially?
It's interesting, as I'm trying to think about this it feels like I'm working through mud. My brain keeps screaming at me that I'm not allowed to think about this, that I'm speculating far away from solid ground, and that I'm thinking dumb things about something I don't know anything about, and the smart people I respect who've debated this a thousand times in detail will feel frustrated reading it. I have to spend a good amount of attention continually bringing myself back to the object level.
But I’ve heard researchers express frustration that people aren’t taking AI risk very seriously, so if there’s a small chance that me slowly combing through my reactions to my very entry-level exposure to this might help, that’s why I’m laying it out.
But anyway: maybe there are exponential pressures against superintelligence? That we'll have a lot of stages where it's just not financially worth it to keep improving the AI, vs. using the currently-existing AI to do lots of profitable stuff, and all these slow stages will give us the time and information to figure out how to make the AI less scary? But if there's not exponential pressure, I think the change from "human-level AI" to "superintelligent AI" will go really fast. Maybe this is a crux for me here.
I am pretty confused about how long this will take. I do think stuff is moving faster than most people think, and interestingly I feel much braver/solid about that opinion. But here I feel like I end up relying on people more familiar with the industry than me, and people seem to be predicting anywhere from 3-20 years.
Am I being cowardly about having opinions again? Maybe. Fine, let's assume I must have my own opinions here. Fuck, having opinions is so scary. Okay - wild guesses here, the current AI capacity that I know of seems to be human-level good at lots of specific things like making art and poetry and essays and math and recognizing objects, and got this far mostly over the past... ten years? How much of our own brains does this make up? 25%? This would imply another 30 years before human-level AGI (which is presumably the thing we ctrl-c ctrl-v a lot and point it at itself to make superintelligence pretty shortly after that). But AI research itself is speeding up a lot, with way more funding, and my guess is a lot of the things to build are compounding, like how 50% of my difficulty with python was installing the console and after that the rest was much easier and more replicable. So... 10 years? That seems like a reasonable guess to me. It's also pretty squarely within prediction ranges of most of my peer groups, so I'm a little sus about it, but whatever. I’m pulling this prediction out of my ass, but at least it’s my ass.
Maybe this is more cowardice but I'm not gonna bother thinking too hard about the safety side right now. Maybe figuring out how to make the AI safe and nice to us will be easy, maybe hard, but right now my question is more how much should people (me? oh god oh no) be working on the safety at all. Regardless of the speed of transition from normal chill AGI to terrifyingly superintelligent AGI, or even if there’s a lot of stuff I’m failing to consider, even a 1% risk of human extinction is way too high and we should be pouring a massive amount of resources into it, and we are clearly acting like AI is less risky than it is right now. Even at 1% risk we should all be screaming.
To reiterate, a sheen over all of this is the fact that I'm in a particularly doomist section of AI culture. This makes me more suspicious, but cultural influence isn't itself an object-level argument, and I'm trying desperately to stick to those in order to avoid a self-doubt spiral.
When I was Christian, I was aware that lots of other people were devoutly into other religions, and I was like okay, how do I know I'm the right one? Do I just believe this because I was born into Christianity, the same way Muslims or Hindus or whatever do? And then I was like "well, I've got to trust the arguments themselves then. I can't trust my own feelings, I have to be genuinely open to being wrong and have to actually lose my faith if it stops making sense. This is what I would want the Muslims and Hindus and etc. to do, so it's only fair that I commit to it too." I then proceeded to stay in Christianity for a long time (evidence that ‘trusting the arguments themselves’ isn’t very effective) before finally leaving it (evidence that they’re at least kind of effective).
I've heard people say there are alarmists in every decade, and AI risk is just the most recent fad. They're not wrong, exactly? But this feels a little bit silly. For one, I think the average IQ/expertise of people worried about AI risk is significantly higher than that of those worried about most other world-ending things that never came to pass. And there were doomsday worries that were valid - the cold war was actually a near miss! And I'm sure there were many people in the past who worried about actual catastrophes before they happened, and were mocked for it. The question is, does the world where AI riskers are a silly doomsday cult look meaningfully different than the world where AI riskers are actually predicting something scary, and if so, what are the differences? From the inside, how can I tell these apart? Telling me which world I must be in without giving me concrete evidence for how to tell the worlds apart is not a convincing argument to me!
Each individual point in the AI risk arguments feels really salient and vivid to me, but once I step away from them they feel like they fall apart, like some part of my brain can't really connect them in a way that touches me. Okay, let's calibrate a bit:
If I were going to live, would I put money on the AI apocalypse happening in 10 years? I... think I would. How much? Probably... 35% of my current savings. I don't know what that means.
Do I want to go work in AI safety? No, not really? It makes me feel more generally supportive of people working in AI safety, but I don't feel any movement under my feet. This confuses me a bit, and (warning flag) gives me a little guilt/shame. "Don't you care about all of humanity?" the little voice asks. Well, I do. But maybe some part of me doesn't really believe? Or is it just I don't have any sense of making a difference?
In thinking about this, I get flashes of a vision where everyone I love dies, where I have children and they're annihilated. It feels like I lifted the lid on a box and glimpsed inside the mass of all human grief, like the substance of it condensed into this pure wailing thing, like whenever any person on the planet for all of history has hurt, this was the sacred substance that was bleeding into their soul, and in confronting AI risk I'm supposed to go swallow all of it at once. So I slam the box shut. I can't do it yet, in this moment I'm not strong enough.
But this is also kind of weird, because I think compared to most people I'm unusually good at facing and experiencing massive amounts of pain. Maybe I'm lying to myself about the degree? But I still think I'm better at accepting pain than most people who are in fact working in AI safety. Maybe flinching away from visions is more motivating to other people, and my own sense of motivation is more touchy? Maybe I just need to do LSD and meditate on everyone I love dying.
It almost feels like I'm compartmentalizing this extremely well, like I can simultaneously believe and not believe it at once. Is my mind doing what it needs to do to preserve function? Like the second I stop directly looking at AI risk, I forget it's there, and don't remember to start looking at it again, like I’m constantly waking up from a dream and struggling to remember what it was about. I do think I am uniquely avoidant in a way that rhymes with this; I'm very good at forgetting unpleasant things, and avoiding tasks by forgetting about them. Maybe my brain has a high resting set point of happiness and just can't intake anything that would disrupt it too badly?
I wonder to what degree the stillness around this comes from all of the other things in my brain - that I'm not allowed to have opinions about this, that I don't know what's going on, that I'm just trying to do a social thing. I wonder if this would feel more powerful if I were more easily clear on my own opinions.
I do feel afraid of AI, I can feel it in my body. I just don't know what to do. If I were a really good programmer, would I go do things about it? I think there's a decent chance, yeah.
Oo wave of helplessness there, huh
But in general, the heart of the matter feels like I’m looking at a teetering argument in the distance, like a really tall wobbly tower, and I’m like ‘looks sus.’ I then come up close and examine one joint at a time, they’re all well sealed and solid and I can’t find any flaws. But when I step away again, it just… looks sus all at once. This makes me really confused about all of it, like my perspective on its strength changes depending on where I happen to be standing that day. It’s also sus because lots of other bad frameworks have this quality - an abusive partner arguing you into staying might make perfectly reasonable, individual points, and you’ll probably only stay if you ignore how much the overall vibe makes you feel bad. I feel deceivable, and the “looks sus” from far away feels like an important signal of something, I just am not sure about what.
It feels like thinking about this isn’t just thinking about a concept, it feels like I’m trying to reason with someone who’s asking things of me. The conclusions around this feel very much like they should strongly impact my life, and some part of me hates that. “What do you want from me?” I feel like I’m yelling at AI risk, angrily. It feels like a bully of an idea, like I’m being asked to carry something too heavy for my fragile body, and the only way I’ll risk breaking myself beneath it is if I have zero doubt, if all this is clear and steady. I don’t want to believe that my life and the lives of everyone I love are at risk, especially when I feel financially insecure and the best skill I have is marketing sex work. And if this kind of attitude, at scale, ends up killing us, some part of me feels like then it’s worth dying.
I don’t know what I can do! I feel like I’m being asked to join a religion, but the religion isn’t telling me what to do, and I’m like, have I even joined the religion if nothing about my behavior changes? And there’s various small things I could maybe do, like earn to give or go volunteer free sexual labor for AI researchers (which i mean tbf already kinda doing) or go learn more and then try to debate fancy people, or go work as an assistant to some safety org, but I don’t want to do any of that.
And I know AI researchers themselves aren’t asking something of me specifically, and I’ve heard people talk about “live your lives well if you can’t help” or something, but I can’t just make this not relevant to me or my actions. Some anger comes up around this, like a part of me is mad at the people who told me about AI risk because now I have to live knowing about the lion in the cave in the back yard of my house when before I could have just lived my life and then died suddenly in peace, maybe.
Maybe I’m also imagining that people are judging me for not working in AI safety, like I’m bad at having priorities or my emotions in check or being smart enough to think clearly about it all. I’m also pretty afraid of publishing this piece, because it feels all so basic and childlike to me, like writing a ‘philosophy 101’ paper when much of your audience consists of philosophy professors.
But not wanting to be basic or be emotionally clouded doesn’t mean I’m not basic and emotionally clouded, and if looking directly at how AI risk emotionally clouds people helps, then maybe that’s something I can do.
And I wonder if all my hesitation and anger and selfishness here just comes from the fact that some big part of me isn’t fully convinced about AI risk, like if I actually believed it all then everything else would fall into place. Maybe I’m just spending a lot of effort trying to justify why I’m not doing a thing all my peers are doing because I don’t want to admit to myself that I don’t actually believe in AI risk, because not believing is low status. I don’t know! I can’t tell!
Overall, I still feel very confused! I’m really quite in flux and open to having my mind changed. I feel a general discomfort that I haven't yet seriously sat down with people who are familiar with the AI risk arguments, don't buy them, and want to try to reason me out of them. I feel as though I'm waiting to hold final opinions until I get some experience arguing for AI risk against people who are really against it, and find that their arguments don't feel convincing. There's a big earnest part of me that would really like to be convinced out of this!