One of the major terminological forces driving ICT development today is that of ‘big data.’ While the phrase may sound inclusive and integrative, in fact, ‘big data’ approaches are highly selective, excluding any input that cannot be effectively structured, represented, or, indeed, digitised.

Data of this messy, dirty sort is precisely the kind that humanities and cultural researchers deal with best. It will therefore be the contribution of the K-PLEX project to investigate these elements of humanities and cultural data, and the strategies researchers have developed to deal with them. In doing so it will remain at the margins of ICT so as to better shed light on the gap between analogue or augmented digital practices and fully computational ones. As such, it will expand our awareness of the risks inherent in big data and suggest ways in which phenomena that resist datafication can still be represented (if only by their absence) in knowledge creation approaches reliant upon the interrogation of large data corpora.

K-PLEX approaches this challenge in a comparative, multidisciplinary and multisectoral fashion, focusing on three key challenges to the knowledge creation capacity of big data approaches: the manner in which data that are not digitised or shared become ‘hidden’ from aggregation systems; the fact that data are human-created and lack the objectivity often ascribed to the term; and the subtle ways in which data that are complex almost always become simplified before they can be aggregated. It will approach these questions from a humanities research perspective, but will use social science research tools to look at both the humanistic and computer science approaches to the term ‘data’ and its many possible meanings and implications.

As such, the K-PLEX project defines and describes key aspects of data that are at risk of being left out of our knowledge creation processes in a system where large-scale data aggregation is becoming ever more the gold standard.

Recent Posts

Patina pleases

Everything new is blank. New things display their integrity and an undestroyed, immaculate surface. Design objects are icons of youth and timelessness. They demand attention in a special way, because their newness is always at risk. Traces like scratches rob them of their stainlessness and indicate a loss of aura. The ageing of these objects …

Veracity and Value: Two more “V” of Big Data

So far we have learnt about the three most popular criteria of big data: volume, velocity and variety. Jennifer Edmond suggested adding voluptuousness as a fourth criterion of (cultural) big data. I will now discuss two more “Vs” of big data that are often mentioned: veracity and value. Veracity refers to source reliability, information credibility and …

Why Messiness Matters

As always in a field where different conceptions are present, there exist differing understandings of what ‘tidy’ or ‘clean’ data on the one hand, and ‘messy’ data on the other might be. On a very basic level, and coming from the notion of data arranged in tables, Hadley Wickham has defined ‘tidy data’ as those where …
