Predicting the future has always been a concern for positivist scientists. The theories and models they constructed claimed not only to represent general laws but also to forecast the outcomes of long-term processes. Take for example Auguste Comte, a founding figure of sociology, who tried to explain the past development of humanity and predict its future course. Even Karl Marx, who criticized the limited conception of cause underlying positivist natural laws, developed a theory of history for Western Europe that saw socialism and communism as the final stages after epochs of slavery, feudalism and capitalism. In addition to description and explanation, predictive power was and remains crucial to the scientific worth of a theory.
Predictions about the future often rely on a combination of historical data and interpretations informed by current theories. A review of 65 estimates of how many people the earth can support, for instance, shows how widely these differ: “The estimates have varied from <1 billion to >1000 billion”. The review also shows the different methods used for estimating human carrying capacity. The first estimate, made in the 17th century, arrived at 13.4 billion people by extrapolating the population of Holland to the earth’s inhabited land area. Estimates from the 20th century are based on food and water supply and on individual requirements for both.
Recent estimates rely on computer models that integrate data and theories related to growth. These models produce different scenarios in which the global population peaks at about 9 billion people in the 21st century and then either collapses or adapts smoothly to the carrying capacity of the earth.
Estimates of this kind should be viewed with caution, because the information they rest on is incomplete. We might have some idea about the desirable level of material well-being and the physical environments we want to live in, but we cannot foresee the technologies, economic arrangements or political institutions that will be in place in fifty or eighty years. These factors do not operate independently but interact and produce feedback loops. Awareness of dangers and risks alone won’t necessarily change predominant policies. Human behavior and the underlying fashions, tastes and values (on family size, equality, stability and sustainability) are too complex to be predicted accurately.
Let’s try a more modest example, then! What about predicting the potential outbreak of a disease? Google Flu Trends was a program that aimed for better influenza forecasting than the U.S. Centers for Disease Control and Prevention. From 2008 onwards, internet searches for information on symptoms, stages and remedies were analyzed in order to predict where and how severely the flu would strike next. The program failed. Big data inconsistencies and human errors in interpreting the data are held responsible for its failure to predict the flu outbreak in the United States in 2013, the worst outbreak of influenza in ten years. Another recent example is the Ebola epidemic in West Africa in 2014. The U.S. Centers for Disease Control and Prevention published a worst-case prediction with 1.4 million people infected. The World Health Organization predicted a 90% death rate from the disease; in retrospect, the rate is about 70%. The data and the model based on initial outbreak conditions turned out to be inadequate for projections. Disease conditions and human behavior changed too quickly for humans and algorithms to keep up.
OK, then how about sales forecasting, a comparatively easy task? Mass-scale historical data has helped eBay and other companies measure the benefit of search advertising. In a simple predictive model, clicks were counted to predict sales: “Although a click on an eBay ad was a strong predictor of a sale – consumers typically purchased right after clicking – the experiment revealed that a click did not have nearly as large a causal effect, because the consumers who clicked were likely to purchase, anyway”.
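The gap between a strong predictor and a small causal effect can be made concrete with a toy simulation. The following is only a sketch under invented assumptions: the function `simulate`, the share of high-intent shoppers, and all click and purchase rates are hypothetical and are not taken from the eBay study. Ads are shown to a random half of the users, so comparing the two halves mimics an experiment, while comparing clickers with non-clickers mimics the naive click-counting model.

```python
import random


def simulate(n_users=200_000, seed=42):
    """Return (naive_lift, experimental_lift) for a toy ad campaign.

    All rates are made up for illustration; none come from real data.
    """
    rng = random.Random(seed)
    p_high_intent = 0.20                     # share of users who already want to buy
    base_buy = {True: 0.50, False: 0.02}     # purchase probability without any ad
    click_rate = {True: 0.60, False: 0.05}   # high-intent users click far more often
    ad_effect = 0.02                         # true causal effect of seeing the ad

    # For each group: [number of users, number of buyers]
    counts = {g: [0, 0] for g in ("ad", "no_ad", "click", "no_click")}

    for _ in range(n_users):
        high_intent = rng.random() < p_high_intent
        sees_ad = rng.random() < 0.5         # random assignment, as in an experiment
        clicks = sees_ad and rng.random() < click_rate[high_intent]
        p_buy = base_buy[high_intent] + (ad_effect if sees_ad else 0.0)
        buys = rng.random() < p_buy

        groups = ["ad" if sees_ad else "no_ad"]
        if sees_ad:
            groups.append("click" if clicks else "no_click")
        for g in groups:
            counts[g][0] += 1
            counts[g][1] += buys

    def rate(group):
        users, buyers = counts[group]
        return buyers / users

    naive_lift = rate("click") - rate("no_click")      # what click counting suggests
    experimental_lift = rate("ad") - rate("no_ad")     # what randomization reveals
    return naive_lift, experimental_lift


if __name__ == "__main__":
    naive, experimental = simulate()
    print(f"lift suggested by clicks:     {naive:.3f}")
    print(f"lift measured by experiment:  {experimental:.3f}")
```

In this toy setup the clickers buy far more often than the non-clickers, yet almost all of that difference comes from who clicks, not from what the ad does. Only the randomized comparison recovers the small effect that was built into the simulation.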
This shows us that data alone are not enough for prediction; one also needs to know about causal effects and context. Additionally, purely data-driven approaches tend to produce models and algorithms that are overfit to the idiosyncrasies of particular circumstances. What theories and models can deliver is not knowledge of the future but, at best, the ability to rule out a range of futures as unrealistic.
Featured image was taken from: http://www.bigpicexplorer.com/idealworld/population.htm