“The Trouble with Big Data”. New Book published by the KPLEX Project

One of the major terminological forces driving ICT development today is that of ‘big data.’ While the phrase may sound inclusive and integrative, big data approaches are in fact highly selective, excluding any input that cannot be effectively structured, represented, or, indeed, digitised. The Trouble with Big Data explores the challenges society faces with big data through the lens of culture rather than social, political or economic trends, as demonstrated in the words we use, the values that underpin our interactions and the biases and assumptions that drive us.

Evolving from research undertaken in the Knowledge Complexity (KPLEX) project, in which Trinity College Dublin, the Data Archiving and Networked Services (DANS) of the Koninklijke Nederlandse Akademie van Wetenschappen, and Freie Universität Berlin were partners, this book focuses on areas such as data and language, data and sensemaking, data and power, data and invisibility, and big data aggregation. How cultural practices are displaced by, and yet simultaneously resist, mass datafication can be instructive for the critical observation of big data research and innovation.

This book is available as open access through the Bloomsbury Open programme at www.bloomsburycollections.com. It is funded by Trinity College Dublin, DARIAH-EU and the European Commission.

No Surprises, Please

As researchers of the social, we are often pre-occupied with ways in which knowledge is governed and controlled in order not to upset hegemonic narratives, but we are reminded every day that anyone can produce knowledge. Inspirational stories of unorthodox investigators and inventors making surprising discoveries abound. They solve problems with one weird trick. Trainers hate them. When particular methods of knowledge creation catch the popular imagination, they stir us to wonder at the achievements of human enquiry and the possibilities of collective endeavour. Citizen-scientists’ efforts to map the universe offer a welcome break in headlines reminding us of humanity’s penchant for self-destruction, as well as evoking a sense of awe at the scale of achievement possible when a critical mass of committed, anonymous volunteers chip away at raw material to carve out a work of staggering complexity.

The allure of stumbling upon a breakthrough that puts experts’ ‘persistent plodding’ (Wang 1963: 93) to shame fosters fervour for emergent tools like the Ngram Viewer, which led Rosenberg to comment that ‘briefly, it seemed that everyone was ngramming’ (2013: 24). When the means of knowledge production are seemingly in the hands of the people, we are tantalised by a fantasy of taking power from those elites who would otherwise govern it, but of course, we are still using the master’s tools. Whether the master is Google – whose Google Books captures about a fifth of the world’s library, and which made a third of that available through the Ngram Viewer – or any other mediator, the citizen-researcher should be wary of the black box, and the weird tricks they conjure from it.

[Image: An Ngram]


Seasoned researchers may consider themselves hyper-aware of dominant discourses, and no-one takes up a position thinking they’ve been duped into the values they hold dear. When virtue-signalling brands like Innocent and Lush cute-bomb us with faux-naïve descriptions of their purity and messages from their workforce of dedicated artisans, we all like to think we can see through their studied informality to the processes of mass production. Borgman (2015) writes of the ‘magic hands’ of specialised, local, expert knowledge production and that which can be replicated on an industrial scale. Both have their place and there is an enduring belief that there are some areas for which the small-scale artisan’s skills are irreplaceable, but which end of the spectrum is most likely to throw up surprises that challenge accepted thinking?

We like to think there is a difference between human, humane craft and computerised, robotic task fulfilment. We all had a good laugh when that police robot fell down those stairs to its watery grave, because we like to think the human perspective adds a special je ne sais quoi beyond the competence of machines. So even where hundreds of artisanal citizen-data-harvesters come together to produce a multi-perspective synth of Venice’s Piazza San Marco, the inherent complexity of this technological mediation cannot be equated with the singular, human perspective of Canaletto’s artistic rendering.

We are wary of the mediation of technology. We therefore allow technology to serve us, to answer the questions we had conceived it to answer, but we are still uncomfortable with the implications of allowing it to suggest new questions. Uricchio points out that journalistic pronouncements on the potentially dystopic applications of new technology have become a trope. The Algorithm, referred to as a synecdoche for various black boxes, evokes a vision of a merciless god to be feared and worshipped:

The recent explosion of headlines where the term ‘algorithm’ figures prominently and often apocalyptically suggests that we are re-enacting a familiar ritual in which ‘new’ technologies appear in the regalia of disruption. But the emerging algorithmic regime is more than ‘just another’ temporarily unruly new technology. (Uricchio, 2017: 125)

[Image: The Data Deluge]

So could the right mix of data and algorithms disrupt our looping endlessly on the same track, elevating us above the Matthew Effect to a higher plane of enlightenment? Is this our era-defining opportunity to emerge from the data deluge with a trove of knowledge, the munificence of knowing exactly where to look? Well, possibly, but only if that possibility is already within us, or at least, within those of us creating the algorithms. As Bowker reminds us:

Our knowledge professionals see selfish genes because that’s the way that we look at ourselves as social beings—if the same amount of energy had been applied to the universality of parasitism/symbiosis as has been applied to rampant individualistic analysis, we would see the natural and social worlds very differently. However, scientists tend to get inspired by and garner funding for concepts that sit “naturally” with our views of ourselves. The social, then, is other than the natural and should/must be modeled on it; and yet the natural is always already social. (Bowker 2013: 168)

Uricchio (2017: 126) is also sceptical, noting that the ‘dyad of big data and algorithms can enable new cultural and social forms, or they can be made to reinforce the most egregious aspects of our present social order’; yet he is more hopeful that the computational turn has the power to surprise us:

The new era has yet to be defined, and it is impossible to know how future historians will inscribe our trajectory. Of course, the ‘newness’ of this regime comes with the danger that it will be retrofitted to sustain the excesses and contradictions of the fast-aging modern, to empower particular individual points of view, to control and stabilize a master narrative. But it also offers an opportunity for critical thinking and an imaginative embrace of the era’s new affordances. And for these opportunities to be realized, we need to develop critical perspectives, to develop analytical categories relevant to these developments and our place in them. (Uricchio, 2017: 136)

Our critical capacity is therefore our indemnity against the seduction of surprising discoveries, helping us to judge and accept that which is novel and valid. By interrogating the implications of new areas of inquiry from the start, we can avoid the danger of our creations escaping our control and serving undesirable ends. If we use the computational turn as an opportunity to strengthen critical thought, we might just find our way through the new complexities of knowledge without any nasty surprises.



Borgman, C. L. (2015). Big Data, Little Data, No Data: Scholarship in the Networked World. Cambridge, Massachusetts: MIT Press.

Bowker, G. (2013). Data Flakes: An Afterword to “Raw Data” Is an Oxymoron. In Gitelman, L. (2013). “Raw Data” Is an Oxymoron. Cambridge, Massachusetts: MIT Press.

Rosenberg, D. (2013). Data Before the Fact. In Gitelman, L. (2013). “Raw Data” Is an Oxymoron. Cambridge, Massachusetts: MIT Press.

Uricchio, W. (2017). Data, Culture and the Ambivalence of Algorithms. In Schäfer, M.T. and van Es, K. (2017). The Datafied Society: Studying Culture through Data. Amsterdam: Amsterdam University Press.

Wang, H. (1963). Toward Mechanical Mathematics. In The Modelling of Mind, ed. K.M. Sayre & F.J. Crosson. South Bend, IN: Notre Dame University Press.