Nick Bostrom: Superintelligence: Paths, Dangers, Strategies

Since the 1940s, the idea that machines could one day match human-level intelligence has been present. But what about reaching superhuman-level machine intelligence? I.J. Good wrote in 1965: “Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion’, and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.”[1]

The period of AI development started after the conference at Dartmouth College in 1956. In the first years it was about systems that could solve problems in very limited domains. But those systems had serious limitations: weak hardware, poor handling of uncertainty, data scarcity, and reliance on brittle and ungrounded symbolic representations. By the mid-1970s, the first “AI winter” had arrived. In the early 1980s a new wave of research started when Japan launched its Fifth-Generation Computer Systems Project and several other countries tried to compete with it. By the late 1980s this wave was losing its power, and the second “AI winter” came. In the 1990s new techniques emerged to push AI development forward: neural networks and genetic algorithms. They offered alternatives to GOFAI (good old-fashioned AI). Today AI can be much better than humans in certain narrow domains, such as chess and other games, but it is still at a very raw level when it comes to things people can do without thinking.

Progress on two major fronts has restored to AI some of its lost prestige: a more solid statistical and information-theoretic foundation for machine learning on the one hand, and the practical and commercial success of various problem-specific applications on the other.

One area where AI is used a lot is financial markets. To be aware of the risk, we can look at the 2010 Flash Crash (May 6, 2010), when an algorithm selling E-Mini S&P 500 futures contracts set off a chain reaction among other trading algorithms that led to absurd selling and a collapse in prices. The situation was eventually stopped by an automated safety function, and the majority of the trades were afterwards canceled. Still, it was a good example of the risks posed by an automated system that follows its own logic in a scenario its designers did not anticipate.

There are several paths to a system with general machine intelligence. Within artificial intelligence we can talk about a neuromorphic or a synthetic approach. We can also build a self-learning machine in the way Turing envisioned it, with a fixed architecture that develops by accumulating content, or as a seed AI that is able to improve its own architecture and works on recursive self-improvement. Whole brain emulation is another approach, in which a brain is scanned and modeled in a machine. It requires scanning, translation and simulation. In general, whole brain emulation relies less on theoretical insight and more on technological capability than artificial intelligence does. Improving biological cognition can also lead to higher intelligence. Enhanced brain capabilities, probably through genetic modification, would bring new potential for AI development. Another idea that has been explored a lot is a direct brain-computer connection, but because this approach has so many limitations, it is not a very viable way forward on a big scale. A more conceivable path to superintelligence is through the enhancement of networks and organizations. Humans have already gained a great deal of collective intelligence this way.

Enhanced biological or organizational intelligence would accelerate scientific and technological developments, potentially hastening the arrival of more radical forms of intelligence amplification such as whole brain emulation and AI.

Many machines and nonhuman animals already perform at superhuman levels in narrow domains. But we can define three forms of superintelligence: speed, collective and quality superintelligence.

Probably the biggest push towards superintelligence will come from improvements in digital intelligence, which has advantages over biological intelligence in both hardware and software.

Hardware:

Speed of computational elements
Internal communication speed
Number of computational elements
Storage capacity
Reliability, lifespan and sensors

Software:

Duplicability
Editability
Goal coordination
Memory sharing
New modules, modalities and algorithms

We know that at some point machines will match and overtake biological capabilities. But how quickly will that happen, and how quickly will they then evolve further to the level of superintelligence? We can talk about three scenarios for this transition: slow, moderate and fast. The rate of change in intelligence equals optimization power divided by recalcitrance. The transition from human-level general AI to superintelligence will probably not be slow. It may be hard to create the first digital mind, for example by emulating a whole brain, but once one exists, small improvements can move things forward very quickly. There are three main sources of improvement: algorithms, content and hardware.
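The relation between optimization power and recalcitrance can be written compactly. The symbols are labels chosen here for illustration: I is the system's intelligence, O the optimization power applied to improving it, and R its recalcitrance. The second line is only a sketch under a simplifying assumption, namely that recalcitrance stays constant and optimization power grows in proportion to the system's own intelligence (O = cI for some constant c); in that case growth is exponential.

```latex
\[
\frac{dI}{dt} = \frac{O(t)}{R(t)}
\]
\[
O = cI,\quad R \text{ constant}
\;\Longrightarrow\;
\frac{dI}{dt} = \frac{c}{R}\, I
\;\Longrightarrow\;
I(t) = I_0 \, e^{ct/R}
\]
```

Under this assumption, lower recalcitrance or higher optimization power shortens the doubling time, which is the intuition behind the moderate and fast scenarios.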

One of the questions in the development of superintelligence is whether there will be one or many of them. And if there is only one, will it use its decisive strategic advantage to create a singleton, that is, a single power dominating the world? If we connect this question to the speed of the scenarios, we can say that if the transition from human intelligence to superintelligence is fast, there will probably be only one; if it is slow, there will probably be several; and if it is moderate, it can go both ways.

If a superintelligent agent is developed and wants to keep its advantage and take over the world, it would need to achieve superintelligent capabilities in the areas of intelligence amplification, strategizing, social manipulation, hacking, technological research and economic productivity.

Just as we should be very careful not to anthropomorphize the capabilities of a superintelligence, we should also be careful not to anthropomorphize its motivation. The question of what a superintelligence will do as it becomes more and more intelligent can be approached through two theses. The first is the orthogonality thesis, which says that intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal. The second is the instrumental convergence thesis, which holds that superintelligent agents having any of a wide range of final goals will nevertheless pursue similar intermediary goals because they have common instrumental reasons to do so.

When we try to predict superintelligent motivation, we have three directions to follow: predictability through design, predictability through inheritance and predictability through convergent instrumental reasons.

If we take the perspective of the instrumental convergence thesis and can identify those convergent instrumental values, we can predict a superintelligence's behavior even if we don’t know its final goal. Some of those values could be: self-preservation, goal-content integrity, cognitive enhancement, technological perfection and resource acquisition.

The author shows how a superintelligence can gain a decisive advantage if it is launched first and can form a singleton. Having explored its potential motivation, and having shown that an AI's final goals can be independent of its intelligence while it develops whatever instrumental goals help it achieve those final goals, we can estimate that an AI could come to see humans either as a resource or as a threat. One such scenario is “the treacherous turn”. While weak, an AI behaves cooperatively (increasingly so, as it gets smarter). When the AI gets sufficiently strong, without warning or provocation, it strikes, forms a singleton, and begins directly to optimize the world according to the criteria implied by its final values. Another scenario involves malignant failure modes. They could be: perverse instantiation, infrastructure profusion and mind crime.

If a superintelligence that achieves a decisive advantage really does act in a way that threatens human existence, can we control or stop it? We can look at the control problem from two perspectives: capability control methods, which aim to control what the superintelligence can do, and motivation selection methods, which aim to control what the superintelligence wants to do.

Capability control methods can be: boxing methods (physical and informational containment), incentive methods, stunting and tripwires (behavior, ability and content).

Motivation selection methods can be: direct specification (rule-based or consequentialist), domesticity, indirect normativity and augmentation.

When we build an AI system that we want to control, some say we should only build a question-answering machine, or that we should only build a tool and not an agent. But it is not that simple. Instead, we can look at the kinds of systems that can be developed and how each affects the control problem. The systems (or castes) are: oracles, genies, sovereigns and tools. An oracle is essentially a question-answering machine. A genie is a command-executing system. A sovereign is a system that has an open-ended mandate to operate in the world in pursuit of broad and possibly very long-range objectives.

A unipolar outcome of superintelligence development is one where a single superintelligence gains a decisive strategic advantage and uses it to create a singleton. But what about a multipolar outcome? Would it be better? The author shows that even if this happens, there is no guarantee that it will not develop back into a singleton in later transitions, or that it will turn out better from the start.

So, if a singleton is the most probable scenario, we need to come back to the control question and recognize that capability control cannot succeed in the long run, because a superintelligence will find a way out. It is therefore crucial to use some kind of motivation selection: to load our value system into the AI and make sure that it will not be altered. But that is hard to do. If the system is not developed enough, it may not understand our values; if it is developed too much, it may reject them. One possibility is to build the system inside a utility-maximizing agent framework. The ultimate goal would be to create a machine that can compute a good approximation of the expected utility of the actions available to it.
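The idea of an agent that picks actions by expected utility can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the book: the action names, the probabilities, the utility numbers, and the helper names `expected_utility` and `best_action`. It only shows the arithmetic of weighting each outcome's utility by its probability and choosing the action with the highest total.

```python
from typing import Dict, List, Tuple

# Each action maps to a list of (probability, utility) pairs,
# one pair per possible outcome of that action (hypothetical values).
Outcomes = List[Tuple[float, float]]


def expected_utility(outcomes: Outcomes) -> float:
    """Sum of utility weighted by probability over an action's outcomes."""
    return sum(p * u for p, u in outcomes)


def best_action(actions: Dict[str, Outcomes]) -> str:
    """Return the name of the action with the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


# Toy example: a certain modest payoff versus a coin flip on a larger one.
actions = {
    "cautious": [(1.0, 5.0)],
    "risky": [(0.5, 12.0), (0.5, 0.0)],
}

if __name__ == "__main__":
    for name, outcomes in actions.items():
        print(name, expected_utility(outcomes))
    print("chosen:", best_action(actions))
```

The hard part discussed in the text is not this calculation but where the utility numbers come from, which is the value-loading problem.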

Some potential approaches to value-loading are: explicit representation, evolutionary selection, reinforcement learning, associative value accretion, motivational scaffolding, value learning, emulation modulation and institution design.

But even if we could solve the value-loading challenge, what kind of values should we load? Suppose we could install any arbitrary final value into a seed AI. The decision as to which value to install could then have the most far-reaching consequences. Even so, we should ask ourselves whether our prejudices and preconceptions are really the best possible way forward. One potential approach is to use indirect normativity, that is, to offload some of the ethical decisions onto the superintelligence. Yudkowsky has proposed that a seed AI be given the final goal of carrying out humanity’s “coherent extrapolated volition” (CEV). This could be one form of indirect normativity. Another is to let the AI build models of morality; we can call this moral rightness (MR). A variation of this is the moral permissibility model (MP). Or we can simply teach it to do what we mean. But all of those options concern content, that is, what to put into the AI as its values. We should also be careful about the other components of building a seed AI, such as decision theory, epistemology and ratification.

From a broader perspective, we should have at least a general picture of the direction in which we are heading. We can look at all of these activities from a person-affecting perspective, asking how they will influence us, or from an impersonal perspective, asking whether they will be good overall. One area to consider is science and technology strategy. We should look at questions such as the order of arrival, the speed of change, technology coupling, paths and enablers, and cooperation and collaboration between development projects.

The common good principle should be established as a norm for the development of a seed AI or an emulation. Superintelligence should be developed only for the benefit of all of humanity and in the service of widely shared ethical ideals.

To be prepared for an intelligence explosion, we should work on seeking the strategic light and building good capacity.

[1] In the book on page 5