Everything we do involves forecasts about how the future will unfold. The problem is, we’re not very good at it. In a landmark, twenty-year study, Wharton professor Philip Tetlock showed that even the average expert was only slightly better at predicting the future than random guesswork.
An Optimistic Skeptic
WE ARE ALL forecasters. When we think about changing jobs, getting married, buying a home, making an investment, launching a product, or retiring, we decide based on how we expect the future will unfold. These expectations are forecasts.
Forecasting is not a “you have it or you don’t” talent. It is a skill that can be cultivated.
Our desire to reach into the future will always exceed our grasp.
I believe it is possible to see into the future, at least in some situations and to some extent, and that any intelligent, open-minded, and hardworking person can cultivate the requisite skills. Call me an “optimistic skeptic.”
The meteorologist Edward Lorenz shifted scientific opinion toward the view that there are hard limits on predictability, a question that had long been treated as deeply philosophical.
In 1814 the French mathematician and astronomer Pierre-Simon Laplace wrote: “We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes.” Laplace called his imaginary entity a “demon.” If it knew everything about the present, Laplace thought, it could predict everything about the future. It would be omniscient.
If the clock symbolizes perfect Laplacean predictability, its opposite is the Lorenzian cloud.
How predictable something is depends on what we are trying to predict, how far into the future, and under what circumstances. It is generally true that the further we try to look into the future, the harder it is to see.
You might think the goal of forecasting is to foresee the future accurately, but that’s often not the goal, or at least not the sole goal. Sometimes forecasts are meant to entertain.
- Sometimes forecasts are used to advance political agendas and galvanize action.
- There is also dress-to-impress forecasting.
- And some forecasts are meant to comfort.
This jumble of goals is seldom acknowledged, which makes it difficult to even start working toward measurement and progress.
What makes these superforecasters so good? It’s not really who they are. It is what they do. Foresight isn’t a mysterious gift bestowed at birth. It is the product of particular ways of thinking, of gathering information, of updating beliefs. These habits of thought can be learned and cultivated by any intelligent, thoughtful, determined person.
The difference between heavyweights and amateurs, said the poker pro Annie Duke, is that the heavyweights know the difference between a 60/40 bet and a 40/60 bet.
Superforecasting does require minimum levels of intelligence, numeracy, and knowledge of the world, but anyone who reads serious books about psychological research probably has those prerequisites.
Superforecasting demands thinking that is open-minded, careful, curious, and — above all — self-critical. It also demands focus. The kind of thinking that produces superior judgment does not come effortlessly. Only the determined can deliver it reasonably consistently, which is why our analyses have consistently found commitment to self-improvement to be the strongest predictor of performance.
When you have a well-validated statistical algorithm, use it. This insight was never a threat to the reign of subjective judgment because we so rarely have well-validated algorithms for the problem at hand.
Watson’s chief engineer, David Ferrucci, thinks machines may get better at “mimicking human meaning,” and thereby better at predicting human behavior, but “there’s a difference between mimicking and reflecting meaning and originating meaning,” Ferrucci said.
Ferrucci sees light at the end of this long dark tunnel: “I think it’s going to get stranger and stranger” for people to listen to the advice of experts whose views are informed only by their subjective judgment. Human thought is beset by psychological pitfalls, a fact that has only become widely recognized in the last decade or two. “So what I want is that human expert paired with a computer to overcome the human cognitive limitations and biases.”
Illusions of Knowledge
We have all been too quick to make up our minds and too slow to change them. And if we don’t examine how we make these mistakes, we will keep making them. This stagnation can go on for years. Or a lifetime.
In 1747, a British ship’s doctor named James Lind took twelve sailors suffering from scurvy, divided them into pairs, and gave each pair a different treatment: vinegar, cider, sulfuric acid, seawater, a bark paste, and citrus fruit. It was an experiment born of desperation. Lind took six shots in the dark — and one hit. The two sailors given the citrus recovered quickly.
Not until the twentieth century did the idea of randomized trial experiments, careful measurement, and statistical power take hold.
It was cargo cult science, a term of mockery coined much later by the physicist Richard Feynman to describe what happened after American World War II airbases were withdrawn from remote South Pacific islands: islanders built imitation runways and control towers in the hope that the cargo planes would return. Cargo cult science has the outward form of science but lacks what makes it truly scientific.
“Doubt is not a fearful thing,” Feynman observed, “but a thing of very great value.”
The rate of the development of science is not the rate at which you make observations alone but, much more important, the rate at which you create new things to test.
If a question is asked and you instantly know the answer, it sprang from System 1. System 2 is charged with interrogating that answer. The standard routine in decision making is this: first System 1 delivers an answer, and only then can System 2 get involved, starting with an examination of what System 1 decided.
A defining feature of intuitive judgment is its insensitivity to the quality of the evidence on which the judgment is based.
The explanatory urge is mostly a good thing. Indeed, it is the propulsive force behind all human efforts to comprehend reality. The problem is that we move too fast from confusion and uncertainty (“I have no idea why my hand is pointed at a picture of a shovel”) to a clear and confident conclusion (“Oh, that’s simple”) without spending any time in between (“This is one possible explanation but there are others”).
Scientists must be able to answer the question “What would convince me I am wrong?” If they can’t, it’s a sign they have grown too attached to their beliefs.
The key is doubt. Scientists can feel just as strongly as anyone else that they know The Truth. But they know they must set that feeling aside and replace it with finely measured degrees of doubt — doubt that can be reduced (although never to zero) by better evidence from better studies.
Bait and switch: when faced with a hard question, we often surreptitiously replace it with an easy one. So the availability heuristic — like Kahneman’s other heuristics — is essentially a bait-and-switch maneuver. And just as the availability heuristic is usually an unconscious System 1 activity, so too is bait and switch.
The instant we wake up and look past the tip of our nose, sights and sounds flow to the brain and System 1 is engaged. This perspective is subjective, unique to each of us. Only you can see the world from the tip of your own nose. So let’s call it the tip-of-your-nose perspective.
Popular books often draw a dichotomy between intuition and analysis — “blink” versus “think” — and pick one or the other as the way to go.
But blink-think is another false dichotomy. The choice isn’t either/or, it is how to blend them in evolving situations.
Consider Magnus Carlsen, the world chess champion and the highest-ranked player in history: “If I study a position for an hour then I am usually going in loops and I’m probably not going to come up with something useful. I usually know what I am going to do after 10 seconds; the rest is double-checking.”
All too often, forecasting in the twenty-first century looks too much like nineteenth-century medicine. There are theories, assertions, and arguments.
Keeping Score
Bringing the rigor of measurement to forecasting might seem easy: collect forecasts, judge their accuracy, add up the numbers. That’s it. In no time, we’ll know how good Tom Friedman really is.
At lunch one day in 1988, my then-Berkeley colleague Daniel Kahneman tossed out a testable idea that proved prescient. He speculated that intelligence and knowledge would improve forecasting but the benefits would taper off fast.
Take the problem of timelines. Obviously, a forecast without a time frame is absurd. And yet, forecasters routinely make them.
That’s why forecasts without timelines don’t appear absurd when they are made. But as time passes, memories fade, and tacit time frames that once seemed obvious to all become less so. The result is often a tedious dispute about the “real” meaning of the forecast.
This problem alone renders many everyday forecasts untestable. Similarly, forecasts often rely on implicit understandings of key terms rather than explicit definitions.
These are among the smaller obstacles to judging forecasts. Probability is a much bigger one. Some forecasts are easy to judge because they claim unequivocally that something will or won’t happen.
In intelligence circles, Sherman Kent is a legend.
The key word in Kent’s work is estimate. As Kent wrote, “estimating is what you do when you do not know.”
Forecasting is all about estimating the likelihood of something happening.
People liked clarity and precision in principle but when it came time to make clear and precise forecasts, they weren’t so keen on numbers.
A more serious objection — then and now — is that expressing a probability estimate with a number may imply to the reader that it is an objective fact, not the subjective judgment it is. That is a danger.
Study after study showed that people attach very different meanings to probabilistic language like “could,” “might,” and “likely.”
We cannot rerun history, so we cannot judge a single probabilistic forecast — but everything changes when we have many of them. If events you assign a 70% chance really do happen about 70% of the time, your forecasts are well calibrated. This is called calibration.
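As a concrete illustration (my own sketch, not from the book), calibration can be checked by grouping many forecasts by their stated probability and comparing each group's stated probability with the frequency at which those events actually occurred:

```python
from collections import defaultdict

def calibration_table(forecasts):
    """forecasts: list of (stated_probability, outcome) pairs,
    where outcome is 1 if the event happened and 0 if it did not."""
    bins = defaultdict(list)
    for p, outcome in forecasts:
        bins[round(p, 1)].append(outcome)  # bucket forecasts in 10% increments
    table = {}
    for p, outcomes in sorted(bins.items()):
        observed = sum(outcomes) / len(outcomes)
        table[p] = (observed, len(outcomes))
    return table

# A forecaster who says "70%" should see those events occur roughly 70% of the time.
sample = [(0.7, 1), (0.7, 1), (0.7, 0), (0.3, 0), (0.3, 1), (0.3, 0)]
for stated, (observed, n) in calibration_table(sample).items():
    print(f"stated {stated:.0%}  observed {observed:.0%}  (n={n})")
```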
This method works well for weather forecasts because there is new weather every day and forecasts stack up fast. But it works less well for events like presidential elections because it would take centuries — undisturbed by wars, plagues, and other shocks that perturb the true underlying causes — to pile up enough forecasts to make the statistics work.
Important as calibration is, it’s not the whole story because “perfect calibration” isn’t what we think of when we imagine perfect forecasting accuracy. Perfection is godlike omniscience. It’s saying “this will happen” and it does, or “this won’t happen” and it doesn’t. The technical term for this is “resolution.”
When we combine calibration and resolution, we get a scoring system that fully captures our sense of what good forecasters should do. Someone who says there is a 70% chance of X should do fairly well if X happens. But someone who says there is a 90% chance of X should do better. And someone bold enough to correctly predict X with 100% confidence gets top marks. But hubris must be punished. The forecaster who says X is a slam dunk should take a big hit if X does not happen. How big a hit is debatable, but it’s reasonable to think of it in betting terms.
The math behind this system was developed by Glenn W. Brier in 1950, hence results are called Brier scores. In effect, Brier scores measure the distance between what you forecast and what actually happened.
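Here is a minimal sketch of that calculation, assuming the original two-category Brier (1950) formulation described in these notes, where 0 is perfect, 0.5 is what unwavering fifty-fifty guessing earns, and 2.0 is the mirror image of reality:

```python
def brier_score(forecasts):
    """forecasts: list of (probability_event_happens, outcome) pairs, outcome 1 or 0.
    Two-category Brier: 0 = perfect, 0.5 = always saying 50%, 2.0 = perfectly wrong."""
    total = 0.0
    for p, o in forecasts:
        total += (p - o) ** 2 + ((1 - p) - (1 - o)) ** 2  # equivalent to 2 * (p - o) ** 2
    return total / len(forecasts)

print(brier_score([(1.0, 1)]))   # 0.0  -- confident and right
print(brier_score([(0.5, 1)]))   # 0.5  -- hedging at fifty-fifty
print(brier_score([(1.0, 0)]))   # 2.0  -- confident and wrong
```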
So, we’ve come a long way. We have forecasting questions with clearly defined terms and timelines. We have lots of forecasts with numbers, and the math to calculate scores. We have squeezed out as much ambiguity as appears humanly possible.
The whole point of this exercise is to judge the accuracy of forecasts so we can then figure out what works in forecasting and what doesn’t. To do that, we have to interpret the meaning of the Brier scores, which requires two more things: benchmarks and comparability.
What a Brier score means depends on what’s being forecast.
Another key benchmark is other forecasters. Who can beat everyone else? Who can beat the consensus forecast? How do they pull it off? Answering these questions requires comparing Brier scores, which, in turn, requires a level playing field.
The final results appeared in 2005 — twenty-one years, six presidential elections, and three wars after I sat on the National Research Council panel that got me thinking about forecasting. I published them in the academic treatise Expert Political Judgment: How Good Is It? How Can We Know? To keep things simple, I’ll call this whole research program “EPJ.”
If you didn’t know the punch line of EPJ before you read this book, you do now: the average expert was roughly as accurate as a dart-throwing chimpanzee.
So why did one group do better than the other? It wasn’t whether they had PhDs or access to classified information. Nor was it what they thought — whether they were liberals or conservatives, optimists or pessimists. The critical factor was how they thought. One group tended to organize their thinking around Big Ideas, although they didn’t agree on which Big Ideas were true or false.
As ideologically diverse as they were, they were united by the fact that their thinking was so ideological. They sought to squeeze complex problems into the preferred cause-effect templates and treated what did not fit as irrelevant distractions.
They were unusually confident and likelier to declare things “impossible” or “certain.”
The other group consisted of more pragmatic experts who drew on many analytical tools, with the choice of tool hinging on the particular problem they faced. These experts gathered as much information from as many sources as they could. When thinking, they often shifted mental gears, sprinkling their speech with transition markers such as “however,” “but,” “although,” and “on the other hand.” They talked about possibilities and probabilities, not certainties.
Decades ago, the philosopher Isaiah Berlin wrote a much-acclaimed but rarely read essay that compared the styles of thinking of great authors through the ages. To organize his observations, he drew on a scrap of 2,500-year-old Greek poetry attributed to the warrior-poet Archilochus: “The fox knows many things but the hedgehog knows one big thing.”
I dubbed the Big Idea experts “hedgehogs” and the more eclectic experts “foxes.” Foxes beat hedgehogs. And the foxes didn’t just win by acting like chickens, playing it safe with 60% and 70% forecasts where hedgehogs boldly went with 90% and 100%. Foxes beat hedgehogs on both calibration and resolution. Foxes had real foresight. Hedgehogs didn’t.
Like all of us, hedgehog forecasters first see things from the tip-of-your-nose perspective. That’s natural enough. But the hedgehog also “knows one big thing,” the Big Idea he uses over and over when trying to figure out what will happen next. Think of that Big Idea like a pair of glasses that the hedgehog never takes off. The hedgehog sees everything through those glasses.
The simplicity and confidence of the hedgehog impairs foresight, but it calms nerves — which is good for the careers of hedgehogs.
The Wisdom of Crowds: aggregating the judgment of many consistently beats the accuracy of the average member of the group, and is often as startlingly accurate as Galton’s weight-guessers. The collective judgment isn’t always more accurate than any individual guess, however. In fact, in any group there are likely to be individuals who beat the group. But those bull’s-eye guesses typically say more about the power of luck — chimps who throw a lot of darts will get occasional bull’s-eyes — than about the skill of the guesser.
Beating the average consistently requires rare skill.
Hundreds of people added valid information, creating a collective pool far greater than any one of them possessed. Of course they also contributed myths and mistakes, creating a pool of misleading clues as big as the pool of useful clues.
How well aggregation works depends on what you are aggregating. Aggregating the judgments of many people who know nothing produces a lot of nothing.
Aggregations of aggregations can also yield impressive results.
Now look at how foxes approach forecasting. They deploy not one analytical idea but many and seek out information not from one source but many. Then they synthesize it all into a single conclusion. In a word, they aggregate.
Like us, dragonflies have two eyes, but theirs are constructed very differently. Each eye is an enormous, bulging sphere, the surface of which is covered with tiny lenses. Depending on the species, there may be as many as thirty thousand of these lenses on a single eye, each one occupying a physical space slightly different from those of the adjacent lenses, giving it a unique perspective. Information from these thousands of unique perspectives flows into the dragonfly’s brain where it is synthesized into vision so superb that the dragonfly can see in almost every direction simultaneously, with the clarity and precision it needs to pick off flying insects at high speed.
A fox with the bulging eyes of a dragonfly is an ugly mixed metaphor but it captures a key reason why the foresight of foxes is superior to that of hedgehogs with their green-tinted glasses. Foxes aggregate perspectives.
Stepping outside ourselves and really getting a different view of reality is a struggle. But foxes are likelier to give it a try.
Remember the old reflexivity-paradox joke. There are two types of people in the world: those who think there are two types and those who don’t. I’m of the second type. My fox/hedgehog model is not a dichotomy. It is a spectrum.
People can and do think differently in different circumstances — cool and calculating at work, perhaps, but intuitive and impulsive when shopping. And our thinking habits are not immutable. Sometimes they evolve without our awareness of the change. But we can also, with effort, choose to shift gears from one mode to another.
“All models are wrong,” the statistician George Box observed, “but some are useful.” The fox/hedgehog model is a starting point, not the end.
Superforecasters
This particular bait and switch — replacing “Was it a good decision?” with “Did it have a good outcome?” — is both popular and pernicious. Savvy poker players see this mistake as a beginner’s blunder. A novice may overestimate the probability that the next card will win her the hand, bet big, get lucky, and win, but winning doesn’t retroactively make her foolish bet wise.
In 2006 the Intelligence Advanced Research Projects Activity (IARPA) was created. Its mission is to fund cutting-edge research with the potential to make the intelligence community smarter and more effective. As its name suggests, IARPA was modeled after DARPA, the famous defense agency whose military-related research has had a huge influence on the modern world. DARPA’s work even contributed to the invention of the Internet.
To have accountability for process but not accuracy is like ensuring that physicians wash their hands, examine the patient, and consider all the symptoms, but never checking to see whether the treatment works.
Doug Lorch doesn’t look like a threat to anyone. He looks like a computer programmer, which he was, for IBM.
Doug’s accuracy was as impressive as his volume. At the end of the first year, Doug’s overall Brier score was 0.22, putting him in fifth spot among the 2,800 competitors in the Good Judgment Project. Remember that the Brier score measures the gap between forecasts and reality, where 2.0 is the result if your forecasts are the perfect opposite of reality, 0.5 is what you would get by random guessing, and 0 is the center of the bull’s-eye.
In year 2, Doug joined a superforecaster team and did even better, with a final Brier score of 0.14, making him the best forecaster of the 2,800 GJP volunteers.
Regular forecasters needed to triple their foresight to see as far as superforecasters.
The psychologist Ellen Langer has shown how poorly we grasp randomness in a series of experiments.
Most things in life involve skill and luck, in varying proportions. The mix may be almost all luck and a little skill, or almost all skill and a little luck, or it could be one of a thousand other possible variations. That complexity makes it hard to figure out what to chalk up to skill and what to luck — a subject probed in depth by Michael Mauboussin, a global financial strategist, in his book The Success Equation.
Regression to the mean is an indispensable tool for testing the role of luck in performance: Mauboussin notes that slow regression is more often seen in activities dominated by skill, while faster regression is more associated with chance.
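To make Mauboussin's point concrete, here is a small simulation sketch (illustrative parameters of my own choosing): when performance is mostly skill, scores in one period strongly predict scores in the next, so there is little regression to the mean; when performance is mostly luck, the correlation collapses and regression is fast.

```python
import random

def correlation_across_periods(skill_weight, n=10_000):
    """Simulate two performance periods where each score blends a stable skill
    component with fresh luck. High correlation = slow regression to the mean."""
    skills = [random.gauss(0, 1) for _ in range(n)]
    def period():
        return [skill_weight * s + (1 - skill_weight) * random.gauss(0, 1) for s in skills]
    a, b = period(), period()
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    return cov / (var_a * var_b) ** 0.5

print(correlation_across_periods(0.9))  # mostly skill: correlation near 1, little regression
print(correlation_across_periods(0.1))  # mostly luck: correlation near 0, fast regression
```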
Two key conclusions. One we should not treat the superstars of any given year as infallible, not even Doug Lorch. Luck plays a role and it is only to be expected that the superstars will occasionally have a bad year and produce ordinary results — just as superstar athletes occasionally look less than stellar. But more basically, and more hopefully, we can conclude that the superforecasters were not just lucky. Mostly, their results reflected skill.
Supersmart?
Knowledge is something we can all increase, but only slowly. People who haven’t stayed mentally active have little hope of catching up to lifelong learners. Intelligence feels like an even more daunting obstacle.
Regular forecasters scored higher on intelligence and knowledge tests than about 70% of the population. Superforecasters did better, placing higher than about 80% of the population.
It seems intelligence and knowledge help but they add little beyond a certain threshold — so superforecasting does not require a Harvard PhD and the ability to speak five languages.
How many piano tuners are there in Chicago? The Italian American physicist Enrico Fermi — a central figure in the invention of the atomic bomb — concocted this little brainteaser decades before the invention of the Internet. Fermi knew people could do much better than blind guessing, and that the key was to break down the question with more questions.
What Fermi understood is that by breaking down the question, we can better separate the knowable and the unknowable. So, guessing — pulling a number out of the black box — isn’t eliminated. But we have brought our guessing process out into the light of day where we can inspect it. And the net result tends to be a more accurate estimate than whatever number happened to pop out of the black box when we first read the question.
Fermi-izing dares us to be wrong.
Fermi was renowned for his estimates. With little or no information at his disposal, he would often do back-of-the-envelope calculations like this to come up with a number that subsequent measurement revealed to be impressively accurate.
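Here is a hedged sketch of Fermi-izing the piano-tuner question. Every number below is a rough guess of my own, made purely for illustration; that is the point: each guess is out in the open, where it can be inspected and improved.

```python
# Fermi-izing "How many piano tuners are there in Chicago?"
# Every input is an explicit, inspectable guess (mine, purely illustrative).
chicago_population         = 2_500_000   # guess
people_per_household       = 2.5         # guess
households_with_a_piano    = 1 / 20      # guess: about 5% of households
tunings_per_piano_per_year = 1           # guess
tunings_per_tuner_per_year = 1_000       # guess: ~4 per working day, ~250 days

households = chicago_population / people_per_household
pianos = households * households_with_a_piano
tunings_demanded = pianos * tunings_per_piano_per_year
tuners = tunings_demanded / tunings_per_tuner_per_year
print(round(tuners))  # on the order of 50 tuners
```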
Statisticians call that kind of number the base rate — how common something is within a broader class. Daniel Kahneman has a much more evocative visual term for it. He calls it the “outside view” — in contrast to the “inside view,” which is the specifics of the particular case.
When we make estimates, we tend to start with some number and adjust. The number we start with is called the anchor. It’s important because we typically underadjust, which means a bad anchor can easily produce a bad estimate.
Coming up with an outside view, an inside view, and a synthesis of the two isn’t the end. It’s a good beginning. Superforecasters constantly look for other views they can synthesize into their own. There are many different ways to obtain new perspectives. What do other forecasters think? What outside and inside views have they come up with? What are experts saying? You can even train yourself to generate different perspectives.
Researchers have found that merely asking people to assume their initial judgment is wrong, to seriously consider why that might be, and then make another judgment, produces a second estimate which, when combined with the first, improves accuracy almost as much as getting a second estimate from another person.
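A toy sketch of how these pieces can be combined; the probabilities and the simple "adjust, then average" arithmetic below are invented for illustration, not a formula from the book:

```python
# Illustrative synthesis: outside view as anchor, inside view as adjustment,
# then averaging with an independent second estimate (all numbers invented).
outside_view = 0.30          # base rate: how often events in this class happen
inside_adjustment = +0.10    # case-specific evidence nudges the estimate up
first_estimate = outside_view + inside_adjustment

second_estimate = 0.25       # e.g. from "assume you're wrong, estimate again",
                             # or from another forecaster entirely
combined = (first_estimate + second_estimate) / 2
print(f"first {first_estimate:.0%}, second {second_estimate:.0%}, combined {combined:.0%}")
```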
Superforecasters pursue point-counterpoint discussions routinely, and they keep at them long past the point where most people would succumb to migraines.
Active open-mindedness (AOM) is a concept coined by the psychologist Jonathan Baron, who has an office next to mine at the University of Pennsylvania. Baron’s test for AOM asks whether you agree or disagree with statements like:
- People should take into consideration evidence that goes against their beliefs.
- It is more useful to pay attention to those who disagree with you than to pay attention to those who agree.
- Changing your mind is a sign of weakness.
- Intuition is the best guide in making decisions.
- It is important to persevere in your beliefs even when evidence is brought to bear against them.
For superforecasters, beliefs are hypotheses to be tested, not treasures to be guarded.
Superquants?
I have yet to find a superforecaster who isn’t comfortable with numbers and most are more than capable of putting them to practical use.
On Wall Street, math wizards are called quants, and the math they use can get a lot more esoteric than Monte Carlo models. Given superforecasters’ affinity for data it would be reasonable to suspect that it explains their superb results.
Superior numeracy does help superforecasters, but not because it lets them tap into arcane math models that divine the future. The truth is simpler, subtler, and much more interesting.
A smart executive will not expect universal agreement, and will treat its appearance as a warning flag that groupthink has taken hold.
A simple averaging would be a good start. Or he could do a weighted averaging — so that those whose judgment he most respects get more say in the collective conclusion. Either way, it is dragonfly eye at work.
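A quick sketch of both options, with hypothetical advisers, probabilities, and trust weights:

```python
# Aggregating a team's probability estimates (hypothetical advisers and numbers).
estimates = {"adviser_a": 0.60, "adviser_b": 0.75, "adviser_c": 0.40, "adviser_d": 0.95}

simple_average = sum(estimates.values()) / len(estimates)

# Weighted average: give more say to those whose judgment has earned more trust.
trust = {"adviser_a": 2.0, "adviser_b": 1.0, "adviser_c": 1.0, "adviser_d": 0.5}
weighted_average = (
    sum(p * trust[name] for name, p in estimates.items()) / sum(trust.values())
)

print(f"simple {simple_average:.0%}, weighted {weighted_average:.0%}")
```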
The real Leon Panetta (analogy to movie Zero Dark Thirty) understands process-outcome paradoxes like this. And he is much less keen on certainty than the fictional Leon Panetta. “Nothing is one hundred percent,” he said several times during our interview. The real Leon Panetta thinks like a superforecaster.
But as researchers have shown, people who use “50%” or “fifty-fifty” often do not mean it literally. They mean “I’m not sure” or “it’s uncertain” — or more simply “maybe.”
In dealing with probabilities, Amos Tversky said, most people have only three settings: “gonna happen,” “not gonna happen,” and “maybe.”
It was remarkably late in history — arguably as late as the 1713 publication of Jakob Bernoulli’s Ars Conjectandi — before the best minds started to think seriously about probability.
Our ancestors couldn’t maintain a state of constant alert. The cognitive cost would have been too great. They needed worry-free zones. The solution? Ignore small chances and use the two-setting dial as much as possible. Either it is a lion or it isn’t. Only when something undeniably falls between those two settings — only when we are compelled — do we turn the mental dial to maybe.
People equate confidence and competence, which makes the forecaster who says something has a middling probability of happening less worthy of respect.
This sort of primal thinking goes a long way to explaining why so many people have a poor grasp of probability.
Scientists come at probability in a radically different way. They relish uncertainty, or at least accept it, because in scientific models of reality, certainty is illusory.
In the popular mind, scientists generate facts and chisel them into granite tablets. This collection of facts is what we call “science.” As the work of accumulating facts proceeds, uncertainty is pushed back. The ultimate goal of science is uncertainty’s total eradication. But that is a very nineteenth-century view of science. One of twentieth-century science’s great accomplishments has been to show that uncertainty is an ineradicable element of reality. “Uncertainty is real,” Byers writes. “It is the dream of total certainty that is an illusion.”
The finer grained the better, as long as the granularity captures real distinctions — meaning that outcomes you say have an 11% chance of happening really do occur 1% less often than your 12% outcomes and 1% more often than your 10% outcomes. This complex mental dial is the basis of probabilistic thinking.
An awareness of irreducible uncertainty is the core of probabilistic thinking, but it’s a tricky thing to measure.
To do that, we took advantage of a distinction that philosophers have proposed between “epistemic” and “aleatory” uncertainty. Epistemic uncertainty is something you don’t know but is, at least in theory, knowable.
Aleatory uncertainty is something you not only don’t know; it is unknowable.
We should expect frequent users of 50% to be less accurate.
Superforecasters were much more granular.
How can we know that the granularity we see among superforecasters is meaningful?
Most people never attempt to be as precise as Brian, preferring to stick with what they know, which is the two- or three-setting mental model. That is a serious mistake. As the legendary investor Charlie Munger sagely observed, “If you don’t get this elementary, but mildly unnatural, mathematics of elementary probability into your repertoire, then you go through a long life like a one-legged man in an ass-kicking contest.”
When something unlikely and important happens it’s deeply human to ask “Why?”
Oprah Winfrey said in a commencement address at Harvard University that “there is no such thing as failure.” Failure, she said, is just life trying to move us in another direction.
Meaning is a basic human need. As much research shows, the ability to find it is a marker of a healthy, resilient mind.
Science doesn’t tackle “why” questions about the purpose of life. It sticks to “how” questions that focus on causation and probabilities.
“Maybe” suggests that, contra Einstein, God does play dice with the cosmos. Thus, probabilistic thinking and divine-order thinking are in tension.
A probabilistic thinker will be less distracted by “why” questions and focus on “how.” This is no semantic quibble. “Why?” directs us to metaphysics; “How?” sticks with physics.
If it’s true that probabilistic thinking is essential to accurate forecasting, and it-was-meant-to-happen thinking undermines probabilistic thinking, we should expect superforecasters to be much less inclined to see things as fated.
Supernewsjunkies?
SUPERFORECASTING ISN’T A paint-by-numbers method, but superforecasters often tackle questions in a roughly similar way — one that any of us can follow:
- Unpack the question into components.
- Distinguish as sharply as you can between the known and unknown and leave no assumptions unscrutinized.
- Adopt the outside view and put the problem into a comparative perspective that downplays its uniqueness and treats it as a special case of a wider class of phenomena.
- Then adopt the inside view that plays up the uniqueness of the problem.
- Also explore the similarities and differences between your views and those of others — and pay special attention to prediction markets and other methods of extracting wisdom from crowds.
- Synthesize all these different views into a single vision as acute as that of a dragonfly.
- Finally, express your judgment as precisely as you can, using a finely grained scale of probability.
Superforecasters update much more frequently, on average, than regular forecasters.
An updated forecast is likely to be a better-informed forecast and therefore a more accurate forecast.
So, there are two dangers a forecaster faces after making the initial call. One is not giving enough weight to new information. That’s underreaction. The other danger is overreacting to new information, seeing it as more meaningful than it is, and adjusting a forecast too radically. Both under- and overreaction can diminish accuracy. Both can also, in extreme cases, destroy a perfectly good forecast.
Commitment can come in many forms, but a useful way to think of it is to visualize the children’s game Jenga, which starts with building blocks stacked one on top of another to form a little tower. Players take turns removing building blocks until someone removes the block that topples the tower. Our beliefs about ourselves and the world are built on each other in a Jenga-like fashion.
When a block is at the very base of the tower, there’s no way to remove it without bringing everything crashing down.
This suggests that superforecasters may have a surprising advantage: they’re not experts or professionals, so they have little ego invested in each forecast.
People base their estimate on what they think is a useful tidbit of information. Then they encounter clearly irrelevant information — meaningless noise — which they indisputably should ignore. But they don’t. They sway in the wind, at the mercy of the next random gust of irrelevant information. Such swaying is overreaction, a common and costly mistake.
Given superforecasters’ modest commitment to their forecasts, we would expect overreactions. And yet, superforecasters often manage to avoid both errors.
Forecasters should feel the same about under- and overreaction to new information, the Scylla and Charybdis of forecasting. Good updating is all about finding the middle passage.
Superforecasters not only update more often than other forecasters, they update in smaller increments.
Why this works is no mystery. A forecaster who doesn’t adjust her views in light of new information won’t capture the value of that information, while a forecaster who is so impressed by the new information that he bases his forecast entirely on it will lose the value of the old information that underpinned his prior forecast. But the forecaster who carefully balances old and new captures the value in both — and puts it into her new forecast. The best way to do that is by updating often but bit by bit.
A Presbyterian minister educated in logic, Bayes was born in 1701, so he lived at the dawn of modern probability theory, a subject to which he contributed with “An Essay Towards Solving a Problem in the Doctrine of Chances.” That essay, in combination with the work of Bayes’ friend Richard Price, who published Bayes’ essay posthumously in 1763, and the insights of the great French mathematician Pierre-Simon Laplace, ultimately produced Bayes’ theorem. In odds form it looks like this: P(H|D) / P(¬H|D) = [P(D|H) / P(D|¬H)] × [P(H) / P(¬H)], that is, Posterior Odds = Likelihood Ratio × Prior Odds. This is the Bayesian belief-updating equation.
In simple terms, the theorem says that your new belief should depend on two things — your prior belief (and all the knowledge that informed it) multiplied by the “diagnostic value” of the new information.
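In code, the odds form of the update is only a few lines; the prior and the likelihood ratio below are invented for illustration:

```python
def bayes_update(prior_probability, likelihood_ratio):
    """Posterior odds = likelihood ratio * prior odds, then convert back to a probability."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)

# Start at 25% and see new evidence that is 3x more likely if the hypothesis is true.
print(f"{bayes_update(0.25, 3.0):.0%}")  # 50%
```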
Minto is a Bayesian who does not use Bayes’ theorem. That paradoxical description applies to most superforecasters.
Perpetual Beta
Keynes was breathtakingly intelligent and energetic. “There is no harm in being sometimes wrong, especially if one is promptly found out,” he wrote in 1933. For Keynes, failure was an opportunity to learn — to identify mistakes, spot new alternatives, and try again.
The one consistent belief of the “consistently inconsistent” John Maynard Keynes was that he could do better. Failure did not mean he had reached the limits of his ability. It meant he had to think hard and give it another go. Try, fail, analyze, adjust, try again: Keynes cycled through those steps ceaselessly.
Research on calibration — how closely your confidence matches your accuracy — routinely finds people are too confident.
Consider the Forer effect, named for the psychologist Bertram Forer, who asked some students to complete a personality test, then gave them individual personality profiles based on the results and asked how well the test captured their individual personalities. People were impressed by the test, giving it an average rating of 4.2 out of 5 — which was remarkable because Forer had actually taken vague statements like “you have a great need for people to like and admire you” from a book on astrology, assembled them into a profile, and given the same profile to everyone.
Vague language is elastic language.
Whenever a question closes, it’s obvious that superforecasters — in sharp contrast to Carol Dweck’s fixed-mindset study subjects — are as keen to know how they can do better as they are to know how they did.
Even with a growth mindset, the forecaster who wants to improve has to have a lot of what my colleague Angela Duckworth dubbed “grit.”
Computer programmers have a wonderful term for a program that is not intended to be released in a final version but will instead be used, analyzed, and improved without end. It is “perpetual beta.” Superforecasters are perpetual beta.
We have learned a lot about superforecasters, from their lives to their test scores to their work habits. Taking stock, we can now sketch a rough composite portrait of the modal superforecaster. In philosophic outlook, they tend to be:
- CAUTIOUS: Nothing is certain
- HUMBLE: Reality is infinitely complex
- NONDETERMINISTIC: What happens is not meant to be and does not have to happen.
In their abilities and thinking styles, they tend to be:
- ACTIVELY OPEN-MINDED: Beliefs are hypotheses to be tested, not treasures to be protected
- INTELLIGENT AND KNOWLEDGEABLE, WITH A “NEED FOR COGNITION”: Intellectually curious, enjoy puzzles and mental challenges
- REFLECTIVE: Introspective and self-critical
- NUMERATE: Comfortable with numbers.
In their methods of forecasting they tend to be:
- PRAGMATIC: Not wedded to any idea or agenda
- ANALYTICAL: Capable of stepping back from the tip-of-your-nose perspective and considering other views
- DRAGONFLY-EYED: Value diverse views and synthesize them into their own
- PROBABILISTIC: Judge using many grades of maybe
- THOUGHTFUL UPDATERS: When facts change, they change their minds
- GOOD INTUITIVE PSYCHOLOGISTS: Aware of the value of checking thinking for cognitive and emotional biases.
In their work ethic, they tend to have:
- A GROWTH MINDSET: Believe it’s possible to get better
- GRIT: Determined to keep at it however long it takes
To paraphrase Thomas Edison, superforecasting appears to be roughly 75% perspiration, 25% inspiration.
Superteams
In his 1972 classic, Victims of Groupthink, the psychologist Irving Janis explored the decision making that went into both the Bay of Pigs invasion and the Cuban missile crisis. Today, everyone has heard of groupthink, although few have read the book that coined the term or know that Janis meant something more precise than the vague catchphrase groupthink has become today. In Janis’s hypothesis, “members of any small cohesive group tend to maintain esprit de corps by unconsciously developing a number of shared illusions and related norms that interfere with critical thinking and reality testing.”
We can’t all be wrong, can we?
Teams can cause terrible mistakes. They can also sharpen judgment and accomplish together what cannot be done alone. Managers tend to focus on the negative or the positive but they need to see both. As mentioned earlier, the term “wisdom of crowds” comes from James Surowiecki’s 2004 bestseller of the same name, but Surowiecki’s title was itself a play on the title of a classic 1841 book, Extraordinary Popular Delusions and the Madness of Crowds, which chronicled a litany of collective folly. Groups can be wise, or mad, or both. What makes the difference isn’t just who is in the group, as Kennedy’s circle of advisers demonstrated. The group is its own animal.
If forecasters can keep questioning themselves and their teammates, and welcome vigorous debate, the group can become more than the sum of its parts.
Since Socrates, good teachers have practiced precision questioning, but still it’s often not used when it’s needed most.
Teams were 23% more accurate than individuals.
Experience helped. Seeing this “dancing around,” people realized that excessive politeness was hindering the critical examination of views, so they made special efforts to assure others that criticism was welcome.
“The team is so much more effective at gathering information than one person could ever be.”
On average, when a forecaster did well enough in year 1 to become a superforecaster, and was put on a superforecaster team in year 2, that person became 50% more accurate.
Teams of ordinary forecasters beat the wisdom of the crowd by about 10%. Prediction markets beat ordinary teams by about 20%. And superteams beat prediction markets by 15% to 30%.
How the group thinks collectively is an emergent property of the group itself, a property of communication patterns among group members, not just the thought processes inside each member.
All this brings us to the final feature of winning teams: the fostering of a culture of sharing.
My Wharton colleague Adam Grant categorizes people as “givers,” “matchers,” and “takers.” Givers are those who contribute more to others than they receive in return; matchers give as much as they get; takers give less than they take.
The aggregation of different perspectives is a potent way to improve judgment, but the key word is different. Combining uniform perspectives only produces more of the same, while slight variation will produce slight improvement. It is the diversity of the perspectives that makes the magic work.
The Leader’s Dilemma
LEADERS MUST DECIDE, and to do that they must make and use forecasts. The more accurate those forecasts are, the better, so the lessons of superforecasting should be of intense interest to them. But leaders must also act and achieve their goals. In a word, they must lead. And anyone who has led people may have doubts about how useful the lessons of superforecasting really are for leaders.
How can leaders be confident, and inspire confidence, if they see nothing as certain? How can they be decisive and avoid “analysis paralysis” if their thinking is so slow, complex, and self-critical? How can they act with relentless determination if they readily adjust their thinking in light of new information or even conclude they were wrong? And underlying superforecasting is a spirit of humility — a sense that the complexity of reality is staggering, our ability to comprehend limited, and mistakes inevitable. No one ever described Winston Churchill, Steve Jobs, or any other great leader as “humble.” Well, maybe Gandhi. But try to name a second and a third.
Leaders must be forecasters and leaders but it seems that what is required to succeed at one role may undermine the other.
The superforecaster model can help make good leaders superb and the organizations they lead smart, adaptable, and effective. The key is an approach to leadership and organization first articulated by a nineteenth-century Prussian general, perfected by the German army of World War II, made foundational doctrine by the modern American military, and deployed by many successful corporations today.
“In war, everything is uncertain,” wrote Helmuth von Moltke.
His writings on war — which were themselves influenced by the great theorist Carl von Clausewitz — profoundly shaped the German military that fought the two world wars.
“It is impossible to lay down binding rules” that apply in all circumstances, he wrote. In war, “two cases never will be exactly the same.” Improvisation is essential.
So, a leader must possess unwavering determination to overcome obstacles and accomplish his goals — while remaining open to the possibility that he may have to throw out the plan and try something else.
What ties all of this together — from “nothing is certain” to “unwavering determination” — is the command principle of Auftragstaktik. Usually translated today as “mission command,” the basic idea is simple.
Decision-making power must be pushed down the hierarchy so that those on the ground — the first to encounter surprises on the evolving battlefield — can respond quickly.
Auftragstaktik blended strategic coherence and decentralized decision making with a simple principle: commanders were to tell subordinates what their goal is but not how to achieve it.
In 1982 “mission command” became part of official American doctrine.
I talked to David Petraeus about his philosophy of leadership and it was easy to hear echoes of Moltke. He even invoked the mantras “no plan survives contact with the enemy” and “nothing is certain.”
To develop flexible thinking, Petraeus pushes people out of their “intellectual comfort zone.”
According to Petraeus, it was a huge challenge to develop exercises that were reasonably safe but also forced officers to deal with surprises. But they figured it out because “that’s how you develop flexible leaders who can deal with uncertainty.”
But Petraeus sees the divide between doers and thinkers as a false dichotomy. Leaders must be both. “The bold move is the right move except when it’s the wrong move,” he says. A leader “needs to figure out what’s the right move and then execute it boldly.”
“Have backbone; disagree and commit” is one of Jeff Bezos’s fourteen leadership principles drilled into every new employee at Amazon.
The humility required for good judgment is not self-doubt — the sense that you are untalented, unintelligent, or unworthy. It is intellectual humility. It is a recognition that reality is profoundly complex, that seeing things clearly is a constant struggle, when it can be done at all, and that human judgment must therefore be riddled with mistakes. This is true for fools and geniuses alike. So, it’s quite possible to think highly of yourself and be intellectually humble. In fact, this combination can be wonderfully fruitful. Intellectual humility compels the careful reflection necessary for good judgment; confidence in one’s abilities inspires determined action.
Coping with dissonance is hard. “The test of a first-rate intelligence is the ability to hold two opposed ideas in mind at the same time and still retain the ability to function,” F. Scott Fitzgerald observed in “The Crack-Up.”
Are They Really So Super?
People can, in principle, use conscious System 2 reflection to catch mistakes arising from rapid, unconscious System 1 operations. Superforecasters put enormous effort into doing just that. But the continuous self-scrutiny is exhausting, and the feeling of knowing is seductive. Surely even the best of us will inevitably slip back into easier, intuitive modes of thinking.
At work in each case was Kahneman’s WYSIATI — What You See Is All There Is — the mother of all cognitive illusions, the egocentric worldview that prevents us from seeing any world beyond the one visible from the tips of our noses.
Superforecasters are always just a System 2 slipup away from a blown forecast and a nasty tumble down the rankings. Kahneman and I agree about that.
Although Kahneman officially retired long ago, he still practices adversarial collaboration, his commitment as a scientist to finding common ground with those who hold different views.
He worked with Barbara Mellers to explore the capacity of superforecasters to resist a bias of particularly deep relevance to forecasting: scope insensitivity.
We asked one randomly selected group of superforecasters, “How likely is it that the Assad regime will fall in the next three months?” Another group was asked how likely it was in the next six months. We did the same experiment with regular forecasters. Kahneman predicted widespread “scope insensitivity.” Unconsciously, they would do a bait and switch, ducking the hard question that requires calibrating the probability to the time frame and tackling the easier question about the relative weight of the arguments for and against the regime’s downfall.
Regular forecasters said there was a 40% chance Assad’s regime would fall over three months and a 41% chance it would fall over six months. But the superforecasters did much better: they put the probability of Assad’s fall at 15% over three months and 24% over six months.
It suggests that the superforecasters not only paid attention to the time frame in the question but also thought about other possible time frames — and thereby shook off a hard-to-shake bias.
My sense is that some superforecasters are so well practiced in System 2 corrections — such as stepping back to take the outside view — that these techniques have become habitual. In effect, they are now part of their System 1.
Before black swans were sighted in Australia, Europeans had assumed all swans were white. The “black swan” is therefore a brilliant metaphor for an event so far outside experience we can’t even imagine it until it happens. But Taleb isn’t interested only in surprise. A black swan must also be impactful.
If forecasters make hundreds of forecasts that look out only a few months, we will soon have enough data to judge how well calibrated they are. But by definition, “highly improbable” events almost never happen. If we take “highly improbable” to mean a 1% or 0.1% or 0.0001% chance of an event, it may take decades or centuries or millennia to pile up enough data. And if these events have to be not only highly improbable but also impactful, the difficulty multiplies.
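A back-of-the-envelope sketch of why (my own arithmetic, not the book's): the expected number of "hits" among rare-event forecasts grows very slowly relative to its statistical noise, so huge samples are needed before calibration at the 1% level can even be assessed.

```python
# How many independent "1% chance" forecasts before we can tell 1% from, say, 2%?
# Rough rule: the expected count of hits (n*p) should exceed its own noise
# (roughly sqrt(n*p)) by a comfortable margin. Purely illustrative arithmetic.
p = 0.01
for n in (100, 1_000, 10_000):
    expected_hits = n * p
    noise = (n * p * (1 - p)) ** 0.5  # binomial standard deviation
    print(f"n={n:>6}: expect {expected_hits:.0f} hits, +/- {noise:.1f}")
```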
History does sometimes jump. But it also crawls, and slow, incremental change can be profoundly important.
I see Kahneman’s and Taleb’s critiques as the strongest challenges to the notion of superforecasting. We are far enough apart empirically and close enough philosophically to make communication, even collaboration, possible.
Taleb, Kahneman, and I agree there is no evidence that geopolitical or economic forecasters can predict anything ten years out beyond the excruciatingly obvious — “there will be conflicts” — and the odd lucky hits that are inevitable whenever lots of forecasters make lots of forecasts.
If you have to plan for a future beyond the forecasting horizon, plan for surprise. That means, as Danzig advises, planning for adaptability and resilience.
Kahneman and other pioneers of modern psychology have revealed that our minds crave certainty and when they don’t find it, they impose it. In forecasting, hindsight bias is the cardinal sin.
What’s Next
We can count and test like never before. And we are. Seen in this broader perspective, an evidence-based forecasting movement would not be a startling change springing up out of nothing. It would be another manifestation of a broad and deep shift away from decision making based on experience, intuition, and authority — “Do this because I think it will work and I’m an expert” — toward quantification and analysis.
If someone had asked me a decade ago to list the organizations that most needed to get serious about forecasting but were least likely to do so, the intelligence community would have been at the top. Why? Kto-kogo, Russian shorthand for “who does what to whom.” Evidence-based forecasting will improve their work in the long run, but it’s dangerous in the short run.
Far too many people treat numbers like sacred totems offering divine insight. The truly numerate know that numbers are tools, nothing more, and their quality can range from wretched to superb.
Brier scoring of forecast accuracy is a work in progress. One problem is that Brier scores treat false alarms the same as misses. But when it comes to things like terrorist attacks, people are far more concerned about misses than false alarms. Fortunately, adjusting the scoring to capture this concern is easy. Forecasters just have to be told in advance what the ground rules are — “False positives will cost you one-tenth as much as false negatives” — so they can adjust their judgments accordingly.
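One way such a ground rule might be wired into the scoring; the asymmetric penalty function below is my own illustration, using the one-tenth cost ratio from the example above, not an official scoring rule:

```python
def cost_weighted_brier(forecasts, false_negative_cost=1.0, false_positive_cost=0.1):
    """Like a Brier score, but errors in the two directions are priced differently.
    A miss (event happens, forecast was low) is penalized at false_negative_cost;
    a false alarm (event doesn't happen, forecast was high) at false_positive_cost."""
    total = 0.0
    for p, outcome in forecasts:  # p = stated probability the event happens; outcome 1 or 0
        cost = false_negative_cost if outcome == 1 else false_positive_cost
        total += cost * (p - outcome) ** 2
    return total / len(forecasts)

# Underestimating an attack that happens hurts ten times more than crying wolf.
print(cost_weighted_brier([(0.1, 1)]))  # costly miss: 0.81
print(cost_weighted_brier([(0.9, 0)]))  # cheaper false alarm: 0.081
```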
Another way to think of it is to imagine a painter using the technique called pointillism. It consists of dabbing tiny dots on the canvas, nothing more. Each dot alone adds little. But as the dots collect, patterns emerge. With enough dots, an artist can produce anything from a vivid portrait to a sweeping landscape.
Another critical dimension of good judgment is asking good questions.
What qualifies as a good question? It’s one that gets us thinking about something worth thinking about. So one way to identify a good question is what I call the smack-the-forehead test: when you read the question after time has passed, you smack your forehead and say, “If only I had thought of that before!”
Ten Commandments for Aspiring Superforecasters
- Triage. Focus on questions where your hard work is likely to pay off. Don’t waste time either on easy “clocklike” questions or on impenetrable “cloud-like” questions.
- Break seemingly intractable problems into tractable sub-problems. Channel the playful but disciplined spirit of Enrico Fermi. Decompose the problem into its knowable and unknowable parts. Flush ignorance into the open. Expose and examine your assumptions. Dare to be wrong by making your best guesses. Better to discover errors quickly than to hide them behind vague verbiage.
- Strike the right balance between inside and outside views. Superforecasters know that there is nothing new under the sun. Nothing is 100% “unique.”
- Strike the right balance between under- and overreacting to evidence. Belief updating is to good forecasting as brushing and flossing are to good dental hygiene. It can be boring, occasionally uncomfortable, but it pays off in the long term. Yet superforecasters also know how to jump, to move their probability estimates fast in response to diagnostic signals. Superforecasters are not perfect Bayesian updaters but they are better than most of us. And that is largely because they value this skill and work hard at cultivating it.
- Look for the clashing causal forces at work in each problem. For every good policy argument, there is typically a counterargument that is at least worth acknowledging.
- Strive to distinguish as many degrees of doubt as the problem permits but no more. Few things are either certain or impossible. And “maybe” isn’t all that informative. So, your uncertainty dial needs more than three settings. Nuance matters. The more degrees of uncertainty you can distinguish, the better a forecaster you are likely to be.
- Strike the right balance between under- and overconfidence, between prudence and decisiveness. Superforecasters understand the risks both of rushing to judgment and of dawdling too long near “maybe.” They routinely manage the trade-off between the need to take decisive stands (who wants to listen to a waffler?) and the need to qualify their stands (who wants to listen to a blowhard?).
- Look for the errors behind your mistakes but beware of rearview-mirror hindsight biases. Don’t try to justify or excuse your failures. Own them! Conduct unflinching postmortems: Where exactly did I go wrong?
- Bring out the best in others and let others bring out the best in you. Master the fine arts of team management, especially perspective taking (understanding the arguments of the other side so well that you can reproduce them to the other’s satisfaction), precision questioning (helping others to clarify their arguments so they are not misunderstood), and constructive confrontation (learning to disagree without being disagreeable).
- Master the error-balancing bicycle. Implementing each commandment requires balancing opposing errors. Just as you can’t learn to ride a bicycle by reading a physics textbook, you can’t become a superforecaster by reading training manuals. Learning requires doing, with good feedback that leaves no ambiguity about whether you are succeeding — “I’m rolling along smoothly!” — or whether you are failing — “crash!”
- Don’t treat commandments as commandments. “It is impossible to lay down binding rules,” Helmuth von Moltke warned, “because two cases will never be exactly the same.” Guidelines are the best we can do in a world where nothing is certain or exactly repeatable.

