Patina pleases

Everything new is spotless. New things display their integrity and an intact, immaculate surface. Design objects are icons of youth and timelessness. They demand attention in a special way, because their newness is always at risk. Traces such as scratches rob them of their spotlessness and indicate a loss of aura. The ageing of these objects triggers a certain horror in their owners, because it signals the passing of newness and a loss of control. Consumer electronics are a particularly instructive example here, despite a miraculous ability of electronic content: the ability NOT to age.

But ageing can also be precious. Aged objects, such as those found at flea markets, in museums or in antique shops, are often highly valued. They represent the past, time, and history, and we appreciate the peculiarities of their surface, which has grown over decades. We are not disturbed by dust, grease, grime, wax, scratches, or cracks; on the contrary, the patina of objects represents their depth and their ability to exhibit their own ageing process. The observer’s attention focuses on the materiality of the object: the patina of the surface and the deformations acquired over time mark its singularity and individuality. To be precise, the observer’s fascination is directed more at the signs of ageing than at the object itself.


Schoolbag from a flea market

In the digital world, we don’t find comparable qualities. Ageing here would mean that data have become corrupt, unreadable, unusable and therefore worthless. For objects digitized by cultural heritage institutions, this is a catastrophe; it means loss. Consumer electronics are similar: a smartphone is devalued by scratches, and no gain in singularity is noted. With software, it is even worse. Software ageing is a phenomenon in which the software itself keeps its integrity and functionality, but when the environment in which it runs, or the operating system, changes, the software starts to cause problems and errors, and sooner or later it becomes dysfunctional. Ageing here means an inability to adapt to the digital environment, and, surprisingly, this happens without wear or deterioration, since it is not data corruption that causes this ageing. In this respect, software ageing can be compared to the ageing of human beings: they fall behind the times or become uncoupled from the social world they live in; they lose connectivity, the ability to update themselves, and the skill to interact with their environment.

With analogue objects, this is not the case. They provoke sensual pleasures without reminding the observer of the negative aspects of human ageing. Even if they have become dysfunctional and useless, they keep the dignity and aura of time inscribed into their body and surface. They keep their observers at a beneficial distance, which opens up space for imagination and empathy. Observers are free to imagine the history of these objects and their capacity to endure across long stretches of time without vanishing, certainly a faculty that human beings do not possess. What remains are the characteristics of dignified ageing. While the nasty implications of ageing are buried in oblivion, analogue objects evoke a beauty of ageing.

Veracity and Value: Two more “V” of Big Data

So far we have learnt about the three most popular criteria of big data: volume, velocity and variety. Jennifer Edmond suggested adding voluptuousness as a fourth criterion of (cultural) big data.

I will now discuss two more “V”s of big data that are often mentioned: veracity and value. Veracity refers to source reliability, information credibility and content validity. In the book chapter “Data before the Fact”, Daniel Rosenberg (2013: 37) argued: “Data has no truth. Even today, when we speak of data, we make no assumptions at all about veracity”. Many other scholars agree with this; see: Data before the (alternative) facts.

Yet the veracity that has been questioned for “ordinary” data seems to be taken for granted when it comes to big data. Is this because big data is thought to comprise data on the entire statistical population, not just on a sample? Does the assumed totality of data reveal a previously hidden truth? Instead of relying on a model or on probability distributions, we could now assess and analyse data on the entire population. But apart from the implications for statistical analysis (a higher chance of false positives, the need for stricter significance levels, etc.), there are even more fundamental problems with the veracity of big data.
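The statistical side-note about false positives can be made concrete with a little arithmetic. The sketch below assumes m independent significance tests, all run on hypotheses that are in fact null, each at level alpha; the Bonferroni correction it shows is one standard remedy among several and not necessarily what the author has in mind. With alpha = 0.05 and only 20 tests, the chance of at least one false positive is already about 64%.

```python
def family_wise_error_rate(alpha, m):
    """Chance of at least one false positive among m independent tests
    when every null hypothesis is true: 1 - (1 - alpha) ** m."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test significance level that keeps the family-wise error rate at most alpha."""
    return alpha / m


# The more variables a "total" dataset lets us test, the stricter each test must be.
for m in (1, 20, 1000):
    print(m, round(family_wise_error_rate(0.05, m), 3), bonferroni_threshold(0.05, m))
```

The per-test threshold shrinks in proportion to the number of comparisons, which is one reason why having "all the data" does not by itself make findings more trustworthy.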



Take the case of Facebook emoji reactions. They were introduced in February 2016 to give users the opportunity to react to a post by tapping Like, Love, Haha, Wow, Sad or Angry. Not only is the choice of affective states very limited and the expression of mixed emotions impossible, but the ambiguity in using these expressions is itself problematic. Although Facebook reminds its users: “It’s important to use Reactions in the way it was originally intended for Facebook — as a quick and easy way to express how you feel. […] Don’t associate a Reaction with something that doesn’t match its emotional intent (ex: ‘choose angry if you like the cute kitten’)”, we do know that human perceptions, as well as motives and ways of acting and reacting, are manifold. Emojis can be used to add emotional or situational meaning, to adjust tone, to make a message more engaging to the recipient, to manage conversations or to maintain relationships. The social and linguistic functions of emojis are complex and varied. In the case of Facebook emoji reactions, then, big data seems to be as pre-factual and rhetorical as “ordinary” data.

Value, in turn, refers to the social and economic value that big data might create. When reading documents like the European Big Data Value Strategic Research Innovation Agenda, one gets the impression that economic value dominates. The focus is directed to “fuelling innovation, driving new business models, and supporting increased productivity and competitiveness”, “increase business opportunities through business intelligence and analytics” as well as to the “creation of value from Big Data for increased productivity, optimised production, more efficient logistics”. In this view, the value of Big Data is no longer speculative: “Data-driven paradigms will emerge where information is proactively extracted through data discovery techniques and systems are anticipating the user’s information needs. […] Content and information will find organisations and consumers, rather than vice versa, with a seamless content experience”.

Facebook emoji reactions are just one example of this trend. Analysing users’ reactions not only allows Facebook to “better filter the News Feed to show more things that Wow us” but probably also to change consumer behaviour and to sell individualized products and services.

Featured image was taken from Flickr.

Why Messiness Matters

As in any field where different conceptions coexist, there are differing understandings of what ‘tidy’ or ‘clean’ data, on the one hand, and ‘messy’ data, on the other, might be.

On a very basic level, and coming from the notion of data arranged in tables, Hadley Wickham has defined ‘tidy data’ as data in which each variable is a column, each observation is a row, and each type of observational unit is a table. Data that do not follow these rules are considered messy. Moreover, data cleaning is a routine procedure that statisticians and private companies apply to the data they gather: judging the number and importance of missing values, correcting obvious inconsistencies and errors, normalization, deduplication, etc. Common examples are non-existent or incorrect postal codes, an age that contradicts the recorded date of birth, or the completion of names from “Mr. Smith” to “John Smith”.
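Wickham’s rules and the routine cleaning steps just listed can be illustrated with a minimal standard-library Python sketch. The table, column names, country labels and figures are all hypothetical. `tidy` moves a variable hidden in the column headers (the year) into a column of its own, so that each row holds exactly one observation; `clean` drops duplicates once case and whitespace are normalised and flags records whose stated age contradicts the date of birth.

```python
from datetime import date

# Hypothetical "messy" table: the variable "year" is hidden in the column
# headers, so each row holds several observations.
MESSY = [
    {"country": "A", "1999": 10, "2000": 12},
    {"country": "B", "1999": 30, "2000": 35},
]


def tidy(rows, id_col, var_name, value_name):
    """Melt wide rows into long form: one column per variable, one row per observation."""
    out = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue
            out.append({id_col: row[id_col], var_name: int(key), value_name: value})
    return out


def age_on(birth, on):
    """Completed years between a birth date and a reference date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def clean(records, as_of):
    """Deduplicate on a normalised (name, birth date) key and flag records
    whose stated age contradicts their birth date."""
    seen, kept, flagged = set(), [], []
    for r in records:
        key = (" ".join(r["name"].lower().split()), r["birth"])
        if key in seen:
            continue  # duplicate once case and whitespace are normalised
        seen.add(key)
        if r["age"] == age_on(r["birth"], as_of):
            kept.append(r)
        else:
            flagged.append(r)
    return kept, flagged
```

In practice one would usually reach for a dedicated tool (for instance tidyr in R or pandas in Python) rather than hand-written loops; the loops are kept here only to make each rule visible.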

In the sciences, the understanding of ‘messy’ data largely depends on what is understood as ‘signal’ and what is seen as ‘noise’. Instruments collect a lot of data, and only the initiated are able to distinguish between the two. Cleaning data, and thus extracting the relevant information in order to arrive at scientific facts, is a standard procedure here as well. In settings where the data structure is human-made rather than technically determined, ‘messy’ data remind scientists to reassess whether the relationship between the (research) question and the object of research is appropriate; whether the classifications used to form variables are properly conceived or whether they inappropriately limit the phenomenon under question; and what kind of data can be found in the garbage category of the ‘other’.

In the humanities, this distinction is not as easy to draw as it is in the sciences. Two recent publications provide examples. Julia Silge and David Robinson’s book “Text Mining with R” (2017) bears the subtitle “A tidy approach”. Very much like Wickham, they define ‘tidy text’ as “a table with one-token-per-row”. In their “Introduction to Text Analysis” (2016), Brandon Walsh and Sarah Horowitz present a more differentiated account of what ‘cleaning’ ‘messy’ data might look like. They introduce their readers to the usual problems of cleaning dirty OCR; the standardization and disambiguation of names (their well-chosen example is Arthur Conan Doyle, who was born Arthur Doyle but used one of his given names as an addendum to his last name); and the challenges posed by metadata standards. All of this seems easy at first glance, but think of Gothic typefaces (more than 3,000 of them exist), pseudonyms or heteronyms, or camouflaged prints published during the Inquisition or under political repression, and you can imagine how hard it can be to keep your data ‘clean’.
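The ‘one-token-per-row’ idea and the problem of name standardization can both be sketched in a few lines of Python. This is a minimal illustration, not Silge and Robinson’s R tooling: the function names, the tiny authority list and the naive tokenizer (lowercase ASCII letters and apostrophes only) are assumptions made for the example.

```python
import re

# Hypothetical authority list: variants of one author's name mapped to a
# single standard form.
NAME_AUTHORITY = {
    "arthur doyle": "Arthur Conan Doyle",
    "a. conan doyle": "Arthur Conan Doyle",
    "conan doyle": "Arthur Conan Doyle",
}


def normalise_author(name):
    """Collapse case and whitespace, then look the name up in the authority list."""
    key = " ".join(name.lower().split())
    return NAME_AUTHORITY.get(key, name.strip())


def one_token_per_row(docs):
    """Turn {doc_id: text} into 'tidy text': one row per (document, token)."""
    rows = []
    for doc_id, text in docs.items():
        # Naive tokenizer: lowercase runs of ASCII letters and apostrophes only.
        for position, token in enumerate(re.findall(r"[a-z']+", text.lower())):
            rows.append({"doc": doc_id, "position": position, "token": token})
    return rows
```

A tokenizer this simple would already stumble over dirty OCR, accented characters or the long s of older prints, which is precisely why keeping humanities data ‘clean’ is harder than the table format suggests.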

And there is, last but not least, another conception of ‘messiness’ in the humanities. It lies in the specific cultural richness, polysemy, or voluptuousness of the data (or the sources, the research material) in question: a text, an image, a video, a theatre play or any other object of research can be interpreted from a range of viewpoints. Humanists are well aware that the choice of a theory or a methodological approach (the ‘grid’ which brings order to the chaos of what is being examined) never provides an exhaustive interpretation. It is the ‘messiness’ of the data under consideration which provides the foundation for alternative approaches and research results, which is responsible for the resistance to interpretation (and, with Paul de Man, to theory), and which continuously demands an openness towards seeing things in another way.

From analogue to proto-digital databases

Databases as collections of data are not a new phenomenon. Centuries ago, collections began to emerge all over the world, as the manuscript collections of Timbuktu (in medieval times a centre of Islamic scholarship) demonstrate. The number of these manuscripts is estimated at about 300,000, covering domains such as Qur’anic exegesis, Arabic language and rhetoric, law and politics, astronomy and medicine, trade reports, etc.

Most people’s memory, however, does not reach back that far. They tend to associate today’s databases with the efforts to establish universalizing classification systems that began in the nineteenth century.

The transition to digital databases took place only very recently, which explains why many databases are still in the process of being digitized.

I will present the database eHRAF World Cultures to illustrate this point. This online database originated as the “Collection of Ethnography” of the research programme “Human Relations Area Files”, which started back in the 1940s at Yale University. The original aim of anthropologist George Peter Murdock was to allow for global comparisons in terms of human behaviour, social life, customs, material culture, and human-ecological environments. To implement this research endeavour it was thought necessary “to have a complete list of the world’s cultures – the Outline of World Cultures, which has about 2,000 described cultures – and include in the HRAF Collection of Ethnography (then available on paper) about ¼ of the world’s cultures. The available literature was much smaller then, so the million or so pages collected could have been about ¼ of the existing literature at that time”.

From the 1960s onwards, the contents of this collection of monographs, journal articles, dissertations, manuscripts, etc. were converted to microfiche, and in 1994 the digitization of the database was launched. The first online version of the database, “eHRAF World Cultures”, became available in 1997. The digitization process is far from complete: an additional 15,000 pages are still converted from the microfiche collection and integrated into the online database every year. Currently the database contains data about more than 300 cultures worldwide.


So what makes this database proto-digital, then?

First of all, there is the search function. The subject indexing, at the paragraph level (!), was done manually. The standard that provided the guidelines for what to index in the texts, and how, is called the Outline of Cultural Materials, and it was very elaborate for its time. It comprises more than 700 topic identifiers, clustered into more than 90 subject groups.

The three-digit numbers (e.g. 850 for the subject group “Infancy and Childhood” or 855 for the subject “Child Care”) are meant to facilitate the search for concepts and to retrieve data in languages other than English. And although Boolean searches allow combinations of subject categories with keywords, cultures, countries or regions, one has to adopt the logic of this ethnographic classification system in order to carry out purposeful searches. The organisation of the database was obviously conceptualised hierarchically: if you want to find a particular piece of information, you look up the superordinate concept and decide which subjects of this group you need to apply to your research to get the expected results.
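How such a hierarchical, code-based index shapes searching can be sketched in Python. Everything here is hypothetical apart from the codes 850 and 855 taken from the text: the paragraphs, the cultures, the placeholder code 590, and the assumption that a subject-group code such as 850 covers the codes 850 to 859 (as the 850/855 pair suggests). The search is a plain Boolean AND over codes, keywords and an optional culture filter; it is not eHRAF’s actual query engine.

```python
# Hypothetical miniature of a paragraph-level subject index. Only 850
# ("Infancy and Childhood") and 855 ("Child Care") come from the text;
# code 590 and all paragraphs are invented placeholders.
GROUPS = {850: "Infancy and Childhood"}

PARAGRAPHS = [
    {"id": 1, "culture": "Culture X", "codes": {855},
     "text": "Grandmothers look after infants while mothers work in the fields."},
    {"id": 2, "culture": "Culture Y", "codes": {855, 590},
     "text": "Older siblings carry infants on their backs."},
    {"id": 3, "culture": "Culture X", "codes": {590},
     "text": "Households gather for the harvest."},
]


def expand(code):
    """Assumption: a group code such as 850 covers every code from 850 to 859."""
    if code in GROUPS:
        return set(range(code, code + 10))
    return {code}


def search(paragraphs, codes=(), keywords=(), culture=None):
    """Boolean AND over subject codes, keywords and an optional culture filter."""
    hits = []
    for p in paragraphs:
        if culture is not None and p["culture"] != culture:
            continue
        if not all(p["codes"] & expand(c) for c in codes):
            continue
        if not all(k.lower() in p["text"].lower() for k in keywords):
            continue
        hits.append(p["id"])
    return hits
```

A query for group 850 returns both child-care paragraphs although neither contains the word ‘childhood’; conversely, a user who does not know that child care sits under 85x has no way to formulate that query, which is the sense in which one must adopt the system’s logic.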

Secondly, although the “Outline of Cultural Materials” thesaurus is continually being extended, there is no system for providing continuous updates. A new list of subjects and subject groups is published only once a year (online, in PDF and in print).

Thirdly, data that would help to locate cultural groups more precisely, such as GIS data (latitude and longitude coordinates), are not available in eHRAF.

Finally, users can print or email search results and selected paragraphs or pages from documents, but there is no feature for exporting data from eHRAF into (qualitative) data analysis software. The “eHRAF World Cultures” database is also not compatible with OpenURL.

The way from analogue to digital databases is apparently a long and difficult one. The curatorial agency of the database structure and the still discernible influence of the people who assigned the subjects to the database materials should now be a bit clearer.


Critical Data Studies: A dialog on data and space

This article was published by Craig M. Dalton, Linnet Taylor, and Jim Thatcher: ‘Critical Data Studies: A dialog on data and space’. In: Big Data & Society, Vol. 3 (2016), Issue 1, pp. 1–9.

In light of recent technological innovations and discourses around data and algorithmic analytics, scholars of many stripes are attempting to develop critical agendas and responses to these developments (Boyd and Crawford 2012). In this mutual interview, three scholars discuss the stakes, ideas, responsibilities, and possibilities of critical data studies. The resulting dialog seeks to explore what kinds of critical approaches to these topics, in theory and practice, could open and make available such approaches to a broader audience.

The article is available online at:

On digital oblivion

Knowledge is made by oblivion.
Sir Thomas Browne; in: Sir Thomas Browne’s Works: Including His Life and Correspondence, vol.2, p.177.

Remembrance, memory and oblivion have a peculiar relationship to each other. Remembrance is the internal realisation of the past, memory (in the sense of memorials, monuments and archives) its exteriorised form. Oblivion joins these two to form a trinity, in which memory and oblivion work as complementary modes of remembrance.

The formation of remembrance can be seen as an elementary function in the development of personal and cultural identity; oblivion, on the other hand, ‘befalls’ us: it happens, it is unintentional, and it can therefore be seen as a threat to identity. Cultural heritage institutions, such as galleries, libraries, archives, and museums (GLAM), are thus not only places where objects are collected, preserved, and organized; they also form bodies of memory, invaluable for our collective identity.

There is a direct line in these cultural heritage institutions from analogue documentation to digital practice: online catalogues and digitized finding aids present metadata in a publicly accessible way. But apart from huge collections of digitized books, the material in question is mostly not available in digital formats. This is why cultural heritage, and especially unique copies like the material stored in archives, can be seen as ‘hidden data’. What can be accessed are metadata: the most formal description of what is available in cultural heritage institutions. This structure works towards oblivion in two ways. On the one hand, the content of archives and museums is present and existing, but not in digital formats and thus ‘invisible’. On the other hand, the centuries-long practice of documentation in the form of catalogues and finding aids has been carried over into digital information architectures; but even though these metadata are accessible, they hide more than they reveal if the content they refer to is not available online. We all have to rely on the information given and ordered by cultural heritage institutions, their classifications, taxonomies, and ontologies, to be able to access our heritage and explore what has formed our identities. Is Thomas Browne right in pointing to the structured knowledge gained from oblivion?

This depends on our attitude to the past and the formation of identity. It is possible to appreciate oblivion as a productive force. In antiquity, amnesty was the Siamese twin of amnesia; the word amnesty is derived from the Greek word ἀμνηστία (amnestia), or “forgetfulness, passing over”. Here it is oblivion which works for the generation of identity and unity: let bygones be bygones. In more recent times, it was Nietzsche who underlined the necessity to relieve oneself from the burdens of the past. It was Freud who identified the inability to forget as a mental disorder, earlier called melancholia, nowadays depression. And it was also Freud who introduced the differentiation between benign oblivion and malign repression.

But it is certainly not the task of GLAM institutions to provide for oblivion. Their function is the provision of memory. Monuments, memorials, and the contents of archives serve a double function: they keep objects in memory, and at the same time this exteriorisation neutralizes and serves oblivion insofar as it relieves us of the affect of mourning. To erect monuments and memorials and to preserve the past in archives is in this sense a cultural technique for eliminating meaning. Letting go of what is no longer present by preserving what is available: in this relation the complementarity of memory and oblivion becomes visible; they don’t work against each other, but jointly. From this point of view, remembrance (the internal realization of the past) is the task of the individual.


Detail of the front of the Jewish Museum Berlin
By No machine-readable author provided. Stephan Herz assumed (based on copyright claims). [Public domain], via Wikimedia Commons

It is not the individual who decides what should sink into oblivion. Societies and cultures decide, in ways not yet explored, which events, persons, or places (the lieux de mémoire) are kept in memory and which are not. If it is too big a task to change the documentation practices of GLAM institutions, their information architecture, and the metadata they produce, the actual usage of archival content could help answer the question of what is of interest to a society and what is not: the digital records of what is searched for in online catalogues and digitized finding aids, as well as of which material is ordered in archives, clearly indicate users’ preferences. Collected as data and analysed with algorithms, this documentation could teach us not only about what is kept in memory but also about what falls into oblivion. And that is a kind of information historians rarely have at their disposal.
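The proposal to read oblivion off usage data can be sketched as a comparison between a catalogue and request logs from two periods. The shelfmarks, the two-period split and the very availability of such logs are assumptions made purely for illustration; the function simply reports which catalogued items nobody requested and which were requested earlier but no longer are.

```python
from collections import Counter


def usage_profile(catalogue, early_requests, recent_requests):
    """Compare requests for catalogue entries across two periods.

    Returns (never_requested, fading): entries nobody asked for in either
    period, and entries asked for earlier but not in the recent period.
    Requests for identifiers outside the catalogue are ignored.
    """
    early = Counter(r for r in early_requests if r in catalogue)
    recent = Counter(r for r in recent_requests if r in catalogue)
    never = sorted(set(catalogue) - set(early) - set(recent))
    fading = sorted(item for item in early if recent[item] == 0)
    return never, fading
```

Such a comparison can only see what has been catalogued in the first place; material that never received metadata stays invisible to it, which echoes the point about ‘hidden data’.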

Harry Potter and the Philosopher’s DIKW

DIKW = Data. Information. Knowledge. Wisdom. DIKW!

That’s the way things flow. Or more specifically, that’s the way “the knowledge pyramid” says they flow.

From data we gather information, which we develop into knowledge, which leads to wisdom.


But is it really that straightforward? Are our thought processes that streamlined, hierarchical and, let’s face it, uncomplicated? Or is DIKW simply a nice-sounding but somewhat reductive acronym to be used when waxing lyrical about the philosophy of knowledge, information systems, information management, or pedagogy?

I for one am not all that convinced by DIKW. And I’m not the only one: the pyramid is widely criticised. But why? Where and how, exactly, does DIKW misrepresent how we think about and manage data and information? Today we’re going to explore the DIKW pyramid: how “data” gets transformed into “wisdom,” what happens to it along the way, and how a different approach to cleaning or processing that “data” can lead us to very different conclusions, and thus to very different states of “wisdom”. And to facilitate this philosophising on the nuances of DIKW and its vulnerability to corruption, I’m going to talk about Harry Potter and the “is Snape good or bad” plot that runs through all seven Harry Potter novels. Because, why not? Specifically, I’m going to use Snape’s complexity as a character to highlight DIKW’s shortcomings, in particular how DIKW can be corrupted depending on how the data you collect is processed and interpreted.

As we all know, Snape looks kinda evil, acts kinda evil, hates Harry, and has a pretty dodgy past in which he was aligned with Voldemort, the wizard responsible for Harry’s parents’ deaths. He has a fondness for the “Dark Arts,” and, as head of Slytherin, an unhealthy interest in eugenics and so-called “blood purity” (never a good trait in a person). And he is played to absolute perfection by the unrivaled Alan Rickman, sadly now deceased.

Rowling maintains a near-constant back and forth throughout the series, with the characters forever pursuing the idea that Snape is bad, being thwarted in their pursuit of this idea, or being thrown off their suspicions by Dumbledore, who always reaffirms his strong faith in Snape. The dampening of any suspicion regarding Snape’s motives generally comes at the conclusion of a given book, only for these suspicions to be re-ignited at the start of the next book and the next adventure.

And just when this continual “is he or isn’t he a bad guy” threatens to get monotonous, with the well-trained reader now six books in and attuned to expect the usual (“Snape’s being shifty, ergo…he must be bad!” / “Nope, he’s actually good, Dumbledore says so.” / “Oh okay, let’s talk about this again in the next book.”), Rowling bucks our expectations spectacularly, and all of these hints and suspicions about Snape are seemingly verified in book six, Harry Potter and the Half-Blood Prince, when Snape goes and kills Dumbledore, the one man who trusted and protected him absolutely: a most heinous crime, and one committed using “Avada Kedavra,” the Killing Curse and one of the three Unforgivable Curses.

Let’s take a look at Snape’s first appearance, way back in book one, Harry Potter and the Philosopher’s Stone, or, as it’s known in the US, Harry Potter and the Sorcerer’s Stone:

    (J. K. Rowling, Harry Potter and the Philosopher’s Stone, Bloomsbury 1999, 126.)

What do we get on Snape here?

He’s unhealthy looking, pale and yellowish. He could probably do with a good shampoo. Oh, and he has a hooked nose.

Now, this description is controversial. Snape’s portrait could be considered “Jewish-coded” or even anti-Semitic; certainly it can be seen as having uneasy intertextual echoes of overtly anti-Semitic portraits in classic (and classically anti-Semitic) canonical English-language texts such as Charles Dickens’s Oliver Twist or Shakespeare’s The Merchant of Venice, in which Fagin and Shylock respectively (both pictured below) are presented with the pronounced hooked noses that were considered characteristic of the so-called “Stage Jew.”

The “Stage Jew” is basically the Jewish equivalent of blackface, a crude form of racial stereotyping that was particularly popular during the Elizabethan period and thereafter. Much like blackface, these racist stereotypes of Jews were not confined to the realms of theatre and literature; Hitler and the Nazis also made full use of racist caricatures in their propaganda.

Most of this will (thankfully) be lost on a younger audience, but the question remains: does Rowling engage with this sadly all-too-familiar visual trope deliberately? Knowing Snape’s story, his full story, as she claims she did all the way back in book one, does she proffer such material with a view to ultimately showing it up as complete rubbish?

Snape’s appearance screams evil, irrespective of whether you want to connect that with the tradition of the Stage Jew or with less racially charged narratives that seek to represent a person’s character in or through their appearance. It’s a frequent visual trope in superhero(ine) movies, for example: the good guys look good, the bad guys look bad. And when the good guy turns bad (as they are wont to do on occasion), their nascent badness is represented visually through some change in their appearance. Again, in relation to the Spider-Man 3 poster below, we can ask why the “evil” Spider-Man is coded “black,” particularly when Spider-Man hails from a country whose law enforcement has a well-established track record of subjecting African American males to racial profiling, but we had better stay on topic.

To return to Snape, to his appearance, and to how our reading of his appearance can change once we get to the crux of his character: Snape’s character certainly leads us to question many of the things we read into other people and take for granted, such as their demeanour, their silences, their unreadability, their appearance, their complexity. Snape looks bad, but he is not bad; he is good. We have misread him and are guilty of superficially associating his morality with his appearance. This is not to say that readers of Harry Potter are guilty of racially profiling Snape, not at all, but simply that this tendency to create a link between appearance and morality has a long history in English literature, and one that unfortunately happens to have rather unpleasant racist roots that contemporary readers may not be aware of. This is just one small part of why Rowling is such a good author, and why the Harry Potter books are so rich and rewarding for growing minds in particular. But it’s also an example of how the DIKW pyramid can be drastically altered or corrupted depending on how you read the data at the bottom, the data being the material that proffers the opportunity to reach a state of wisdom regarding that particular material or phenomenon.

In other words, if you have seen even a very small selection of Marvel superhero movies, you have been trained to seek out information on a person’s moral character based on their appearance, and you may have taken facets of Snape’s malevolent appearance as “data” on Snape. This “data” would allow you to garner “information” on his character, because most figures who look like this in literature or superhero movies are bad or evil characters. Then, thanks to your experience of encountering Snape throughout the novels, this data would lead you to a position of “knowledge” or even to a position of “wisdom” regarding Snape (and people like Snape). This is where what we believe to be “wisdom” can start to cause problems.

(Pause here to note that the backstories of many villains in superhero movies are frequently tragic and often not all that different from the backstories of the heroes or heroines. Similar data, different narratives, different superhero/villain costume.)

This is also an example of how data is pre-epistemic. That means data has no truth; it just is. Other people come along and clean it up, interpret it, narrativise it, assign it truth-values. Throughout the series Snape just is: he does not explain or excuse himself, and he is as true to his love of Lily Potter at the beginning as he is at the end. We just happen not to be privy to his backstory. But Dumbledore is, and Rowling (naturally) is. Without the backstory, readers are left to run amok regarding the character of Snape, often hating him, feeling anxious in his presence, or fearing for the safety of the characters around him. But how much of that comes from Snape himself, and how much comes from other people’s interactions with him and their reading of him? Remember, many of the characters in the Potter-verse have already decided for themselves that Snape is evil and not to be trusted; the narratives we read are fuelled by tainted interpretations of his data. Everything from his stare to his tone of voice is presented with adjectives that encourage us to read him as malevolent. The state of “wisdom” (also, in my case, near-hysterical despair) we arrive at in book six when Snape kills Dumbledore is fuelled by cumulative knowledge and information that stems from misinterpreted data. The Snape-data has been read one way and one way alone, and there is no alternative narrative available, especially not once Dumbledore dies. There is no counter-argument to the data that seems to paint Snape as an absolute villain. None, at least, until in his own death scene he provides the memory (and narrative) that allows Harry to access the “truth value” of Snape’s history and his lifelong love for Lily.

Armed with this poignant insight into Snape’s childhood and history (his personal narrative, his personal account of the data-traces he has left on the Potter-verse), our DIKW pyramid is rewritten from the bottom up, but, interestingly, much of it remains the same. The data stays data; what changes are the narratives we use to form information from that data, to create knowledge from that information, and eventually to arrive at a sage-like state of sad wisdom regarding Snape’s sad fate. Snape still says the things he says and does the things he does; we are just newly wise to his motives. We now know he is a good man. Same data, completely different DIKW.

We already know the DIKW model is problematic and oversimplified. As Christine Borgman notes, the “tripartite division of data, information, and knowledge […] oversimplifies the relationships among these complex constructs.”[1] Data is reinterpretable. And this is key. For the majority of the series, Snape is continually heard, seen, and spoken about by characters in the texts who use adjectives that assign morally dubious traits to his character. The –IKW part of Snape’s DIKW is unbalanced, because we do not get Snape’s personal narrative until the very end of his story. And it is only through this narrative that we can reassess the data on him we have collected, discarding some of it (such as the tendency to dramatise his appearance into one akin to a stage villain) as absolute rubbish, and reassessing what remains.

While Snape may carry the visual appearance (visual data) that makes it easy for us to suspect or infer that he is a “bad character,” and while he may even carry the physiognomical hallmarks that hark back to racist character profiling in English literature, he is essentially a good person. What Rowling is saying here, in the most epic Rowling-esque fashion, is: do not judge a person based on their appearance. Judge them on what they do and why they do it. This is Rowling’s real trump card against the “evil is as evil looks” camp of Snape-hating Potter fans. And arguably it is also Rowling’s way of redressing this unpleasant facet of English literary history, which presents race through the face and, sadly, treats race or racial stereotypes as a measure of a character’s moral compass. Rowling writes back against this tradition by giving Snape the same facial features as these similarly maligned “villainous” figures, features that past readers or audiences would have taken as crude indications of his untrustworthiness. Yet instead of being “untrustworthy,” this same hook-nosed figure turns out to be one of the bravest, strongest, truest characters in the series. Take that, Shakespeare and Dickens. Whup-ah.

So, we come back then to the problem of DIKW. Models such as DIKW create misleading impressions about the supposed distinctions between data, information, knowledge, and wisdom. DIKW also obscures the central role narrative plays in all of this; narrative is the conveyor of information, knowledge, and wisdom. It is how we articulate and spread our opinions on data. And data is the foundation of DIKW, so depending on how that data is narrativised, the other elements in this hierarchy can be drastically different. One narrative sees Snape as evil and reads this wickedness in and into his every scene, up until his death. The other asks us to think carefully about what we do with our data, and the narratives we create from it, because even when we are wholly convinced of the veracity and justifiability of our “wisdom,” we could be totally wrong, as we were with Snape.

[1] Christine L. Borgman, Big Data, Little Data, No Data: Scholarship in the Networked World (Cambridge, MA: MIT Press, 2015), 17, accessed April 7, 2017.



Whose context?

When Christine Borgman (2015) mentions the term “native data” she is referring to data in its rawest form, with context information such as communication artefacts included. In terms of NASA’s EOSDIS Data Processing Levels, “native data” even precede Level 0, meaning that no cleaning has been performed at all. Scientists who begin their analysis at this stage face no uncertainty about what this context information is: it is simply the recordings and output of the instruments, predetermined by the instruments’ configuration. NASA researchers may therefore count themselves lucky to obtain this kind of reliable context information.

Humanists’ and social scientists’ point of departure is quite different. Anthropologists, for example, would probably use the term “emic” for their field research data. “Emic” here stands in contrast to “etic” and has been derived from the distinction in linguistics between phonemics and phonetics: “The etic viewpoint studies behavior as from outside of a particular system, and as an essential initial approach to an alien system. The emic viewpoint results from studying behavior as from inside the system” (Pike 1967: 37). An example of the emic viewpoint might be the correspondences between pulses and organs in Chinese medical theory (see picture below) or the relation of masculinity to maleness in a particular cultural setting (MacInnes 1998).

Chinese woodcut: Correspondences between pulses and organs (L0038821)

The emic context for anthropologists, then, depends on the particular cultural background of their research participants. Dissociated from this cultural background and transferred into an etic context, data may become incomprehensible. Take, for example, Kosovo: a sovereign state from an emic point of view, but recognized by only 111 UN member states. In this transition from emic to etic context, the etic context clearly becomes an imposed context.

Applied to libraries, archives, museums and galleries, it might equally be important to know the provenance and original use, or, so to speak, the emic context, of the resources. What functions did the materials have for their author or creator? Knowing the “experience-near” and not only the “experience-distant” meanings of materials would increase their information content and transparency. One could also say that providing this additional “emic” metadata enables traceability to the source context and guarantees the credibility of the data. From an operational viewpoint, however, this would recreate the problem of standards and of making data findable.

If we move up to the next level, the metadata of each GLAM-institution could be said to be emic, reflecting how the curators in that institution understand the data structure. Currently over a hundred different metadata standards are in use. Again, aggregating several metadata standards into a unified standard creates the same problem: a transfer from an emic metadata standard (an institution’s inherent one) into an etic one.

So what is the solution? Unless GLAM-institutions are willing to accept an imposed standard there remains only the possibility of a mutual convergence and ultimately an inter-institutional consensus.

Borgman, Christine L. (2015) Big Data, Little Data, No Data. Scholarship in the Networked World. Cambridge: MIT Press.
MacInnes, John (1998) The End of Masculinity. The Confusion of Sexual Genesis and Sexual Difference in Modern Society. Buckingham: Open University Press.
Pike, Kenneth L. (1967) Language in Relation to a Unified Theory of the Structure of Human Behavior. The Hague: Mouton.


Promise and Paradox: Accessing Open Data in Archaeology

This article was published as: Huggett, Jeremy. ‘Promise and Paradox: Accessing Open Data in Archaeology’. In: Clare Mills, Michael Pidd and Esther Ward (eds), Proceedings of the Digital Humanities Congress 2012. Studies in the Digital Humanities. Sheffield: HRI Online Publications, 2014.

Increasing access to open data raises challenges, amongst the most important of which is the need to understand the context of the data that are delivered to the screen. Data are situated, contingent, and incomplete: they have histories which relate to their origins, their purpose, and their modification. These histories have implications for the subsequent appropriate use of those data but are rarely available to the data consumer. This paper argues that just as data need metadata to make them discoverable, so they also need provenance metadata as a means of seeking to capture their underlying theory-laden, purpose-laden and process-laden character.


The data (and narrative) behind a good lie.

In 2003 James Frey, a thirty-something American and recovering alcoholic and drug addict, published his memoir A Million Little Pieces. The memoir recounts his experience of addiction and rehabilitation, often in brutalist and unsparingly graphic detail. Two scenes in particular have stayed with me to this day: one involving a trip to the dentist for un-anaesthetised root-canal treatment (the horror), the other something altogether worse involving a woman and the things people in the grip of an addiction will do for drugs.

In 2005 Frey hit the big time: A Million Little Pieces was selected for the Holy Grail of book clubs, Oprah’s Book Club, an act that virtually guaranteed massive sales. Not long after, the book topped the New York Times best-seller list, as Oprah’s Book Club choices tend to do, and there it remained for some 15 weeks, selling close to four million copies. Not bad for a former addict, eh?

Except Frey wasn’t all he claimed to be, and his memoir wasn’t entirely factual. To give but one example: instead of being in jail for a respectable 87 days, Frey was in jail for just a few hours. The Smoking Gun published an article titled “A Million Little Lies” that opened with the amusingly melodramatic statement “Oprah Winfrey’s been had” (gasp!) and went on to outline how Frey’s book had circumvented the truth and taken Winfrey along for the ride:

Police reports, court records, interviews with law enforcement personnel, and other sources have put the lie to many key sections of Frey’s book. The 36-year-old author, these documents and interviews show, wholly fabricated or wildly embellished details of his purported criminal career, jail terms, and status as an outlaw “wanted in three states.”

In an odd reversal, Frey’s lack of jail time and status as a not-so-delinquent former delinquent led to outrage and culminated in a massively dramatic face-off with Winfrey, one that Winfrey herself has described as among her most controversial interviews:

Oprah: James Frey is here and I have to say it is difficult for me to talk to you because I feel really duped. But more importantly, I feel that you betrayed millions of readers. I think it’s such a gift to have millions of people to read your work and that bothers me greatly. So now, as I sit here today I don’t know what is true and I don’t know what isn’t. So first of all, I wanted to start with The Smoking Gun report titled, “The Man Who Conned Oprah” and I want to know—were they right?

So Frey’s memoir was fictional or, if we are being very (very) generous, semi-autobiographical. He lied and blurred the lines between autobiography and fiction, using narrative to distort not only his personal data but also the lives of others. It made for a pretty interesting (if graphic) novel, but the added awareness that the figures in it (many of whom were now dead) had, on top of struggling with their own addictions, been chewed up and spat out by Frey was abhorrent. In particular, I have always been repulsed by his crude appropriation and exploitation of the altogether dire narrative of a vulnerable female companion and love interest called Lilly, whose eventual suicide Frey “tweaks” in terms of both how and when it was enacted, presenting himself as a proto-saviour for Lilly, one who tragically arrives a few hours too late (this didn’t happen: Lilly’s suicide did not coincide with the day Frey arrived to “save her”).

What was marketed as a memoir narrative, that is to say a narrative wherein the data correlates with and can be backed up by real-world events, was in fact a fictional narrative, that is, a narrative wherein the data is used to intimate real-world events. As Jesse Rosenthal points out in “Narrative Against Data” (2017), in these situations data is exploited to give the impression of truth or believability, and “the narrative relies for its coherence on our unexamined belief that a pre-existing series of events underlies it.” Frey’s manipulation of the rules of genre, then, together with his exploitation of the co-dependence of data and narrative, was what was so unforgivable to Oprah and her readers. A Million Little Pieces would have been fine as fiction, but as a memoir it was manipulative, rhetorical with the truth, and deeply exploitative. It was not the narrative that undid Frey’s web of manipulations, then, but the data that should have served as its foundation, the data that should have been there to back his story up.

Frey’s hardly the first to do this. Philip Roth’s been skirting the lines that separate fiction and autobiography for decades. Dave Eggers’s 2000 work A Heartbreaking Work of Staggering Genius set the marker for contemporary US “good-guy” memoir-lit. More recently, Karl Ove Knausgård’s six-part “fictional” novel series, provocatively titled My Struggle (in Norwegian, Min kamp), establishes the same tenuous, flirtatious relationship with the facts and fictions surrounding the life of its author. But while Knausgård’s work has certainly caused controversy in Sweden and his native Norway for the way it exposes the lives of his friends and family, this geographically localised controversy has served rather to fuel ongoing international interest in and fascination with the novel series and its apparent real-life subject, Karl Ove himself, and his relentless mining of his own life to feed these supposedly fictional narratives.

But the reaction to Frey’s twisting of the truth was altogether different. Why?

It seems you can alter facets of your memoirs if you label them as fictional to begin with (as Knausgård has done, an act that in itself is no doubt infuriating to the so-called “fictional” people exposed by his writings), or if your modifications are relatively benign and made for stylistic reasons or dramatic effect (as Eggers has admitted to doing, and indeed as Barack Obama has done in his autobiographies). But certain narratives, if declared truthful, must be verifiable by facts, evidence, and data. This extends beyond the realm of memoir and into present-day assertions of self. Take the intense push for President Trump to testify before Congress: the implicit understanding (or hope; this is Trump, after all) is that, much like an autobiographical account of self, a narrative given under oath must correlate to something tangible, to something real. It cannot simply turn out to be fictional scaffolding designed to dupe and distort the relationship between fact, fiction, narrative, and data.