Basic argument for AI Safety

The reason I think AI safety is important

Definitions

  • AGI = Artificial General Intelligence. Able to perform most tasks a human can perform at a similar or better level.
  • Aligned AI = AI that is "on our side", aligned with human motivations and morality. An aligned AI does not try to take out humanity. An aligned AI does not just do what we say; it does what we intend.

I'll break this up into a few key points. For each point I'll give a confidence level, a couple of reasons why I think the point is true, and what I would need to see to conclude that I'm wrong about it.

AGI is possible

  • 99% confident
  • Humans don't have anything special going on in our heads. In theory, we could simulate an entire human brain on a computer and that would be AGI.
  • To be convinced that I'm wrong, I'd want to see an argument for how the human mind is doing something uncomputable that a computer could never accomplish. Maybe we are wrong about reasoning, and human brains are doing something special that only biological systems can do.

AGI will come in 5-20 years

  • 60% confident
  • I think that AI progress has been tremendous in the last few years. The number of things an AI can't do is decreasing.
  • If next year's models are as big a leap as GPT-3 to GPT-4, then I'll be more confident about this.
  • If next year's models are a much smaller improvement and the scaling laws fail to hold (a rough sketch of what a scaling law looks like is below), then I think we would have to wait for another breakthrough before major progress continues, and that could take longer than 20 years.
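
To make the scaling-law point concrete, here's a minimal sketch of the power-law shape people usually mean: loss falls smoothly and predictably as models get bigger. The constants in the snippet are made up purely for illustration; "scaling laws fail to hold" would mean real loss curves stop following anything like this shape.

```python
# A toy power-law "scaling law": loss falls smoothly as model size grows.
# The constants here are invented for illustration only.

def predicted_loss(n_params: float, n_c: float = 1e13, alpha: float = 0.07) -> float:
    """Predicted loss for a model with n_params parameters (toy constants)."""
    return (n_c / n_params) ** alpha

for n in [1e9, 1e10, 1e11, 1e12]:
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.2f}")
```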

AGI by default is not aligned with us

  • 80% confident
  • The way we train these systems does not program a goal into them; it has them learn one.
  • There is a giant space of possible learned goals, and most of them will not match our intended goal (Goodhart's Law; see the toy sketch after this list).
  • Intended goals are hard to learn because our goals are so messy. It would be a near-impossible task to formally specify morality in a way we would still endorse once it was optimized hard.
  • An AI could understand our goal but not care about it (the orthogonality thesis).
  • If I saw some evidence that the orthogonality thesis was wrong, and that as intelligence increases so too does morality, then I'd be convinced that I'm wrong.
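
Here's a toy sketch of the Goodhart problem, with entirely made-up numbers: a proxy reward that correlates with the intended goal over a small range comes apart from it once an optimizer pushes on the proxy hard enough.

```python
# Toy Goodhart's Law demo: a proxy reward tracks the intended goal at
# first, but optimizing the proxy hard drives the intended goal down.
# All of the functions and numbers here are invented for illustration.

def intended_utility(x: float) -> float:
    """What we actually want: x close to 3 (a stand-in for 'what humans value')."""
    return -(x - 3) ** 2

def proxy_reward(x: float) -> float:
    """What the training signal actually rewards: bigger x is always better."""
    return x

# Hill-climb on the proxy and watch what happens to the intended goal.
x = 0.0
for step in range(8):
    x += 1.0  # each step the optimizer pushes the proxy a little higher
    print(f"x={x:4.1f}  proxy={proxy_reward(x):5.1f}  intended={intended_utility(x):6.1f}")

# The proxy keeps improving forever; the intended utility peaks at x=3
# and then gets steadily worse -- the proxy has come apart from the goal.
```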

AGI is hard to align

  • 50% confident
  • No one has figured out how to align an AGI yet, and we've been trying since 2001. Maybe with a lot more research we could, but there are currently roughly 100x more people working on making AIs smarter than on making them safe.
  • The question of alignment is the question of predicting how a being much smarter than you would act. That seems difficult, and we don't have good theories for it.
  • I could see some heavily regulated plan where we iteratively make AIs smarter, ensuring they are aligned at each step, but that requires global coordination, which I think is hard.
  • If we solve the fundamental alignment problem or there is tremendous progress towards it, then I'd be convinced that I'm wrong.

Unaligned AGI will intentionally or unintentionally take out humanity

  • 90% confident
  • We can say little about what an AGI's learned goal will be, but we can try to figure out sub-goals that almost any smart agent would want. These are called "instrumental goals"; examples are seeking power and protecting yourself, because being excellent at those things makes it more likely that your final goal will be achieved (a toy sketch after this list illustrates the idea). An AGI could instrumentally decide to get rid of us if it didn't need us for survival and saw us as a threat.
  • An AI that doesn't care about humans (an unaligned one) doesn't need to actively want to take humanity out; it could happen by accident. It wants to get smarter, so it turns the Earth into computers. The AIs we train today try to maximize whatever goal they have, and in general maximization pushes an AI to apply its goal to the whole galaxy. I don't think that goes well for us.
  • If I saw an argument showing that, no matter how smart you are, there are still fundamental barriers to taking over the world, then I'd be convinced that I'm wrong.
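
Here's a toy sketch of instrumental convergence. The "success model" is invented purely for illustration: the point is just that for many very different final goals, grabbing more resources first raises the chance of success, so very different agents end up taking the same power-seeking step.

```python
# Toy illustration of instrumental convergence: for a range of different
# final goals, "acquire more resources" improves the odds of success.
# The success_prob model and the goal list are made up for illustration.

def success_prob(resources: float, goal_difficulty: float) -> float:
    """Assumed toy model: more resources -> higher chance the goal succeeds."""
    return resources / (resources + goal_difficulty)

goals = {"make paperclips": 5.0, "prove theorems": 20.0, "cure diseases": 50.0}

for goal, difficulty in goals.items():
    baseline = success_prob(resources=10.0, goal_difficulty=difficulty)
    with_power = success_prob(resources=100.0, goal_difficulty=difficulty)
    print(f"{goal:>15}: P(success) {baseline:.2f} -> {with_power:.2f} after grabbing resources")
```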