Ways of Being in a Digital Age.

I’m just back from a few days in Liverpool, where I attended the “Ways of Being in a Digital Age” review conference at the University of Liverpool. This was the TCD working group’s first trip abroad for KPLEX dissemination, and my first ever trip to Liverpool.

UoL’s “Ways of Being in a Digital Age” was a massive scoping review on how digital technology mediates our lives, and how technology and social change co-evolve and impact on each other. This conference came at the conclusion of the project, and invited participation in the form of papers that addressed any of the seven project domains: Citizenship and politics, Communities and identities, Communication and relationships, Health and wellbeing, Economy and sustainability, Data and representation, Governance and security. Naturally we hopped right into the “Data and Representation” category.

I was presenting a paper I co-wrote with Jennifer (Edmond, KPLEX’s PI) and, like most of my KPLEX activities thus far, I also used the platform as an opportunity to include as many funny data memes as I could reasonably fit into a 20-minute PowerPoint presentation. Which, by the way, is A LOT.

Our paper was titled “Digitising Cultural Complexity: Representing Rich Cultural Data in a Big Data environment,” and in it we drew on many of the issues we’ve discussed on the blog thus far: data definitions; the problems brought about by having so many different types of data, all classified using the same term (data); data and surveillance; data and the humanities; and the “aura” of big data, and how the possibilities of big data are manipulated and inflated so that it seems like an infallible “cure-all,” when in fact it is anything but. And, most importantly, we addressed why complexity matters, and what happens when we allow alternative facts to take precedence over data.

The most exciting thing (from my perspective) was that we got to talk about some of our initial findings: findings based on interviews I conducted with a group of computer scientists who very generously gave me some of their time over the summer, and on a more recent data-mining project that is still underway, but that is producing some really exciting results. After all this desk research and talk about data over the last 9 months or so, the KPLEX team as a unit are in the midst of collecting data of our own, and that’s exciting. Below is a photo montage of my experience of the WOBDA conference, which mainly consists of all the different types of coffee I drank while there, along with some colossal pancakes I had for breakfast. I also acquired a new moniker, from the hipster barista in a very lovely coffee shop that I frequented twice during my two-day stay. On the receipt, the note she wrote to find me was “Glasses lady.” 🙂


Harry Potter and the Philosopher’s DIKW.

DIKW= Data. Information. Knowledge. Wisdom. DIKW!

That’s the way things flow. Or more specifically, that’s the way “the knowledge pyramid” says they flow.

From data we gather information, which we develop into knowledge, which leads to wisdom.

Apparently.

But is it really that straightforward? Are our thought processes that streamlined, hierarchical and, let’s face it, uncomplicated? Or is DIKW simply a nice-sounding but somewhat reductive acronym to be used when waxing lyrical about the philosophy of knowledge, information systems, information management, or pedagogy?

I for one am not all that convinced by DIKW. And I’m not the only one: the pyramid is widely criticised. But why? Where and how, exactly, does DIKW misrepresent how we think about and manage data and information? Today we’re going to explore the DIKW pyramid: specifically, how “data” gets transformed into “wisdom,” what happens to it along the way, and how a different approach to cleaning or processing that “data” can lead us to very different conclusions, and thus to very different states of “wisdom.” And to facilitate this philosophising on the nuances of DIKW and its vulnerability to corruption, I’m going to talk about Harry Potter and the “is Snape good or bad?” plot that runs through all seven Harry Potter novels. Because, why not? Specifically, I’m going to use Snape’s complexity as a character to highlight DIKW’s shortcomings, and in particular how DIKW can be corrupted depending on how the data you collect is processed and interpreted.

As we all know, Snape looks kinda evil, acts kinda evil, hates Harry, and has a pretty dodgy past in which he was aligned with Voldemort, the wizard responsible for Harry’s parents’ deaths. He has a fondness for the “Dark Arts,” and, as head of Slytherin, an unhealthy interest in eugenics and so-called “blood purity” (never a good trait in a person). And he is played to absolute perfection by the unrivaled Alan Rickman, sadly now deceased.

Rowling maintains a near-constant back and forth throughout the series, with the characters forever pursuing the idea that Snape is bad, being thwarted in their pursuit of this idea, or thrown off their suspicions by Dumbledore who always reaffirms his strong faith in Snape. The dampening of any suspicion regarding Snape’s motives generally comes at the conclusion of any given book, only for these suspicions to be re-ignited at the start of the next book and the next adventure.

And just when this continual “is he or isn’t he a bad guy” threatens to get monotonous, with the well-trained reader now five books in and attuned to expect the usual (“Snape’s being shifty, ergo…he must be bad!” / “Nope, he’s actually good, Dumbledore says so.” / “Oh okay, let’s talk about this again in the next book.”), Rowling bucks our expectations spectacularly, and all of these hints and suspicions about Snape are seemingly verified in book six, Harry Potter and the Half-Blood Prince, when Snape goes and kills Dumbledore, the one man who trusted and protected him absolutely: a most heinous crime, and one committed using “Avada Kedavra,” one of the three Unforgivable Curses.

Let’s take a look at Snape’s first appearance, way back in book one, Harry Potter and the Philosopher’s Stone, or as it’s known in the US, Harry Potter and the Sorcerer’s Stone:

    (J. K. Rowling, Harry Potter and the Philosopher’s Stone, Bloomsbury 1999, 126.)

What do we get on Snape here?

He’s unhealthy looking, pale and yellowish. He could probably do with a good shampoo. Oh, and he has a hooked nose.

Now, this description is controversial. Snape’s portrait could be considered “Jewish-coded” or even anti-Semitic; certainly it can be seen as having uneasy intertextual chimes with overtly anti-Semitic portraits in classic (and classically anti-Semitic) canonical English-language texts such as Charles Dickens’s Oliver Twist and Shakespeare’s The Merchant of Venice, where Fagin and Shylock respectively (both pictured below) are presented with the overtly hooked noses that were considered characteristic of the so-called “Stage Jew.”

The “Stage Jew” is basically the Jewish equivalent of blackface, a crude form of racial stereotyping that was particularly popular during the Elizabethan period and thereafter. Much like blackface, these racist Jewish stereotypes were not confined to the realms of theatre and literature: Hitler and the Nazis also made full use of racist caricatures in their propaganda.

Most of this will (thankfully) be lost on a younger audience, but the question remains: does Rowling engage with this sadly all-too-familiar visual trope deliberately? Knowing Snape’s story, his full story, as she claims she did all the way back in book one, does she proffer such material with a view to ultimately showing it up to be complete rubbish?

Snape’s appearance screams evil, irrespective of whether you want to connect that with the tradition of the Stage Jew, or with less racially charged narratives that seek to represent a person’s character in or through their appearance. It’s a frequent visual trope in superhero(ine) movies, for example: the good guys look good, the bad guys look bad. And when the good guy turns bad (as they are wont to do on occasion), their nascent badness is represented visually through some change in their appearance. Again, in relation to the Spider-Man 3 poster below, we can ask why the “evil” Spider-Man is coded “black,” particularly when Spider-Man hails from a country whose law enforcement has a well-established track record of subjecting African American males to racial profiling, but we had better stay on topic.

To return to Snape, to his appearance, and to how our reading of his appearance can change once we get to the crux of his character: certainly Snape’s character leads us to question many of the facets of other people’s appearances that we read and take for granted: their demeanour, their silences, their unreadability, their complexity. Snape looks bad, but he is not bad; he is good. We have misread him and are guilty of superficially associating his morality with his appearance. This is not to say that readers of Harry Potter are guilty of racially profiling Snape, not at all, but simply that this tendency to create a link between appearance and morality has a long history in English literature, and one that unfortunately happens to have rather unpleasant racist roots that contemporary readers may not be aware of. This is just one small part of why Rowling is such a good author, and why the Harry Potter books are so rich and rewarding for growing minds in particular. But it’s also an example of how the DIKW pyramid can be drastically altered or corrupted depending on how you read the data at the bottom, the data being the material that proffers the opportunity to reach a state of wisdom regarding that particular material or phenomenon.

In other words, if you have seen even a small selection of Marvel superhero movies, you have been trained to seek out information on a person’s moral character based on their appearance, and you may have taken facets of Snape’s malevolent appearance as “data” on Snape. This “data” would allow you to garner “information” on his character, because most figures who look like this in literature or superhero movies are bad or evil characters. Then, thanks to your experience of encountering Snape throughout the novels, this data would lead you to a position of “knowledge,” or even of “wisdom,” regarding Snape (and people like Snape). This is where what we believe to be “wisdom” can start to cause problems.

(Pause here to note that the backstories of many of the villains in superhero movies are frequently tragic, and often not all that different from the backstories of the heroes or heroines. Similar data, different narratives, different superhero/villain costume.)

This is also an example of how data is pre-epistemic. That means data has no truth; it just is. Other people come along and clean it up, interpret it, narrativise it, assign it truth-values. Throughout the series Snape just is: he does not explain or excuse himself, and he is as true to his love of Lily Potter at the beginning as he is at the end. We just happen not to be privy to his backstory. But Dumbledore is, and Rowling (naturally) is. Without the backstory, readers are left to run amok regarding the character of Snape, often hating him, feeling anxious in his presence, or fearing for the safety of the characters around him. But how much of that comes from Snape himself, and how much comes from other people’s interactions with him and their reading of him? Remember, many of the characters in the Potter-verse have already decided for themselves that Snape is evil and not to be trusted; the narratives we read are fuelled by tainted interpretations of his data. Everything from his stare to his tone of voice is presented with adjectives that encourage us to read him as malevolent. The state of “wisdom” (also, in my case, near-hysterical despair) we arrive at in book six when Snape kills Dumbledore is fuelled by cumulative knowledge and information that stem from misinterpreted data. The Snape-data has been read one way and one way alone, and there is no alternative narrative available, especially not once Dumbledore dies. There is no counter-argument to the data that seems to paint Snape as an absolute villain. None, at least, until in his own death scene he provides the memory (and narrative) that allows Harry to access the “truth value” of Snape’s history and his lifelong love for Lily.

Armed with this poignant insight into Snape’s childhood and history (his personal narrative, his personal account for the data-traces he has left on the Potter-verse), our DIKW pyramid is rewritten from the bottom up, but, interestingly, much of it remains the same. The data stays data, but the narratives that we use to form information from that data, to create knowledge from that information, and eventually to arrive at a sage-like state of sad wisdom regarding Snape’s sad fate: these narratives change. Snape still says the things he says, and does the things he does; we are just newly wise to his motives. We now know he is a good man. Same data, completely different DIKW.

We already know the DIKW model is problematic and oversimplified. As Christine Borgman notes, the “tripartite division of data, information, and knowledge […] oversimplifies the relationships among these complex constructs.”[1] Data is reinterpretable. And this is key. For the majority of the series Snape is continually heard, seen, and spoken about by characters in the texts using adjectives that assign morally dubious traits to his character. The –IKW part of Snape’s DIKW is unbalanced, because we do not get Snape’s personal narrative until the very end of his story. And it is only through this narrative that we can reassess the data on him we have collected, discarding some of it (such as the tendency to dramatise his appearance into one akin to a stage villain) as absolute rubbish, and reassessing what remains.

While Snape may carry the visual appearance (visual data) that makes it easy for us to suspect or infer that he is a “bad character,” and while he may even carry the physiognomical hallmarks that hark back to racist character profiling in English literature, he is essentially a good person. What Rowling is saying here, in the most epic Rowling-esque fashion, is: do not judge a person based on their appearance. Judge them on what they do and why they do it. This is Rowling’s real trump card against the “evil is as evil looks” camp of Snape-hating Potter fans. And arguably it is also Rowling’s way of redressing this unpleasant facet of English literary history, which sees race presented through face, and race or racial stereotypes sadly being presented as a measure of a character’s moral compass. Rowling writes back against this tradition by having Snape carry the same facial features as these similarly maligned “villainous” figures, features past readers or audiences would have taken as crude indications of his untrustworthiness. Yet instead of being “untrustworthy,” this same hook-nosed figure turns out to be one of the bravest, strongest, truest characters in the series. Take that, Shakespeare and Dickens. Whup-ah.

So we come back, then, to the problem of DIKW. Models such as DIKW create misleading impressions about the supposed distinctions between their various tiers. DIKW also obscures the central role narrative plays in all of this: narrative is the conveyor of information, knowledge, and wisdom. It is how we articulate and spread our opinions on data. And data is the foundation of DIKW, so depending on how that data is narrativised, the other elements in this hierarchy can be drastically different. One narrative sees Snape as evil, and reads this wickedness in and into his every scene, up until his death. The other asks us to think carefully about what we do with our data, and the narratives we create from it, because even when we are wholly convinced of the veracity and justifiability of our “wisdom,” we could be totally wrong, as we were with Snape.

[1] Christine L. Borgman, Big Data, Little Data, No Data (MIT Press), 17, accessed April 7, 2017, https://mitpress.mit.edu/big-data-little-data-no-data.

The data (and narrative) behind a good lie.

In 2003 James Frey, an American writer and recovering alcoholic and drug addict, published his memoir A Million Little Pieces. The memoir recounts his experience of addiction and rehabilitation, often in brutal and unsparingly graphic detail. Two scenes in particular have stayed with me to this day: one involving a trip to the dentist for un-anaesthetised root-canal treatment (the horror), the other something altogether worse, involving a woman and the things people in the grip of an addiction will do for drugs.

In 2005 Frey hit the big time and A Million Little Pieces was selected for the Holy Grail of book clubs, Oprah’s Book Club, an act that virtually guaranteed massive sales. Not long after, the book topped the New York Times best-seller list, as Oprah’s Book Club choices tend to do, and there it remained for some 15 weeks, selling close to four million copies. Not bad for a former addict, eh?

Except Frey wasn’t all he claimed to be. And his memoir wasn’t entirely factual. To give but one example: instead of being in jail for a respectable 87 days, Frey was in jail for just a few hours. The Smoking Gun published an article titled “A Million Little Lies” that opened with the amusingly melodramatic statement “Oprah Winfrey’s been had” (gasp!) and went on to outline how Frey’s book had circumvented the truth and taken Winfrey along for the ride:

Police reports, court records, interviews with law enforcement personnel, and other sources have put the lie to many key sections of Frey’s book. The 36-year-old author, these documents and interviews show, wholly fabricated or wildly embellished details of his purported criminal career, jail terms, and status as an outlaw “wanted in three states.”

In an odd reversal, Frey’s lack of jail time and status as a not-so-delinquent former delinquent led to outrage and culminated in a massively dramatic face-off with Winfrey that is referred to by Winfrey herself as among her most controversial interviews:

Oprah: James Frey is here and I have to say it is difficult for me to talk to you because I feel really duped. But more importantly, I feel that you betrayed millions of readers. I think it’s such a gift to have millions of people to read your work and that bothers me greatly. So now, as I sit here today I don’t know what is true and I don’t know what isn’t. So first of all, I wanted to start with The Smoking Gun report titled, “The Man Who Conned Oprah” and I want to know—were they right?

So Frey’s memoir was fictional, or, if we are being very (very) generous, semi-autobiographical. He lied and blurred the lines between autobiography and fiction, using narrative to distort not only his personal data but also the lives of others. It made for a pretty interesting (if graphic) novel, but the awareness that the figures in this novel (many of whom were now dead), on top of struggling with their own addictions, had now been chewed up and spat out by Frey was abhorrent. In particular, I have always been repulsed by his crude appropriation and exploitation of the altogether dire narrative of a vulnerable female companion and love interest called Lilly, whose eventual suicide Frey “tweaks” both in terms of how and when it was enacted, presenting himself as a proto-saviour for Lilly, one who tragically arrives a few hours too late (this didn’t happen: Lilly’s suicide did not coincide with the day Frey arrived to “save her”).

What was marketed as a memoir narrative, that is to say a narrative wherein the data correlates with and can be backed up by real-world events, was in fact a fictional narrative, that is, a narrative wherein the data is used to intimate real-world events. As Jesse Rosenthal points out in “Narrative Against Data” (2017), in these situations data is exploited to give the impression of truth or believability, and “the narrative relies for its coherence on our unexamined belief that a pre-existing series of events underlies it.” Frey’s manipulation of the rules of genre, then, together with his exploitation of the co-dependence of data and narrative, was what was so unforgivable to Oprah and her readers. A Million Little Pieces would have been fine as fiction, but as a memoir it was manipulative, economical with the truth, and deeply exploitative. It was not the narrative that undid Frey’s web of manipulations, then, but the data that should have served as its foundation, that should have been there to back his story up.

Frey’s hardly the first to do this. Philip Roth’s been skirting the lines that separate fiction and autobiography for decades. Dave Eggers’s 2000 work A Heartbreaking Work of Staggering Genius set the marker for contemporary US “good-guy” memoir-lit. More recently, Karl Ove Knausgård’s six-part “fictional” novel series, provocatively titled My Struggle (in Norwegian: Min kamp), establishes the same tenuous, flirtatious relationship with the fictions and facts surrounding the life of its author. But while Knausgård’s work has certainly caused controversy in Sweden and his native Norway for the way it exposes the lives of his friends and family, this geographically localised controversy has served rather to fuel ongoing international interest and fascination with the novel series and its apparent real-life subject, Karl Ove himself, and his relentless mining of his autobiography to fuel these supposedly fictional narratives.

But the reaction to Frey’s twisting of the truth was altogether different. Why?

It seems you can alter facets of your memoirs if you label them as fictional to begin with (as Knausgård has done, an act that in itself is no doubt infuriating to the so-called “fictional” people exposed by his writings), or if your modifications are relatively benign and done for stylistic reasons or dramatic effect (as Eggers has admitted to doing, and indeed as Barack Obama has done in his autobiographies). But certain narratives, if declared truthful, must be verifiable by facts, evidence, and data. This extends beyond the realm of memoir and into present-day assertions of self. Take the intense push for President Trump to testify before Congress: the implicit understanding (or hope; this is Trump, after all) is that, much like an autobiographical account of self, a narrative under oath must correlate to something tangible, to something real. It cannot simply turn out to be fictional scaffolding designed to dupe and distort the relationship between fact, fiction, narrative, and data.

The green’s out: finding narrative in data, mining data for narrative.

When does blue stop being blue and start becoming less blue and more green? Could you pinpoint the exact point where a pigment stops being one colour and starts becoming another? And would that decision be objective or subjective? Take the Pantone colour bridge below: what is blue and what is green here?

Is there a fully blue blue? Is there a fully green green? And if there is an uber blue and an uber green, are they mutually exclusive? Can they blend or mix? Of course they can; everyone who has played with paint knows this. But if and when they blend, how do we represent that complex phenomenon? Bluegreen? Greenblue? BLgrUEeen? GREblENue? GbREENlue? And once again, is it possible to do this objectively? Is my blue the same colour blue as your blue? Are these terms in and of themselves sufficient to narrativise the colours we see before us? The modernists didn’t think so. Writing in 1921, Virginia Woolf composed the following two pieces, titled “Blue & Green,” and it’s with these two short pieces that I want to initiate this week’s post about data and narrative.

Green

THE POINTED fingers of glass hang downwards. The light slides down the glass, and drops a pool of green. All day long the ten fingers of the lustre drop green upon the marble. The feathers of parakeets—their harsh cries—sharp blades of palm trees—green, too; green needles glittering in the sun. But the hard glass drips on to the marble; the pools hover above the desert sand; the camels lurch through them; the pools settle on the marble; rushes edge them; weeds clog them; here and there a white blossom; the frog flops over; at night the stars are set there unbroken. Evening comes, and the shadow sweeps the green over the mantelpiece; the ruffled surface of ocean. No ships come; the aimless waves sway beneath the empty sky. It’s night; the needles drip blots of blue. The green’s out.

Blue

The snub-nosed monster rises to the surface and spouts through his blunt nostrils two columns of water, which, fiery-white in the centre, spray off into a fringe of blue beads. Strokes of blue line the black tarpaulin of his hide. Slushing the water through mouth and nostrils he sings, heavy with water, and the blue closes over him dowsing the polished pebbles of his eyes. Thrown upon the beach he lies, blunt, obtuse, shedding dry blue scales. Their metallic blue stains the rusty iron on the beach. Blue are the ribs of the wrecked rowing boat. A wave rolls beneath the blue bells. But the cathedral’s different, cold, incense laden, faint blue with the veils of madonnas.[1]

So here in these pieces we see Woolf exploring the concept of blue and green, attempting to evoke the “feeling” of blue and green, the “experience” of blue and green. Because for Woolf, the adjectives alone were not enough. Blue may signify blue or blueness (just as green signifies green or greenness), but there is more to blue than the word “blue,” and there is more to green than “green.” After all, what, exactly, is so blue about blue or so green about green that it cannot be described by any other term or terms?

This provides us with the foundations for yet another narrative/data puzzle. When it comes to complex and subjective experiential phenomena such as colour or emotion, what is data and what is narrative? And how can our assertions about either be considered in any way objective? Even the most banal of assertions, such as an assignation of colour (“blue”) or an emotion (“sad”), are innately tied to our subjective experiences of that colour or emotion. If I am colour blind I may see a different shade of blue to you, but that facet of my experience of reality (my dataset of colours and of the colour blue specifically) will be no less “true” or representative of my reality than yours. Similarly, if I am autistic, my understanding or interpretation of the facial contortions that signify someone in my vicinity is “sad” or “happy” will be different to yours. They will be fed by different experiences and different stimuli, but are no less valid an interpretation of reality than the weird faces you find in your textbook neuro-typical example of a “sad” face.

In the above fictional pieces by Woolf, we encounter narrativised, impressionistic versions of blue and green; but what data or datum could we extract from the pieces? The title, “Blue & Green,” does not really do the respective narratives justice, because there is something more than blue and green here; or is there? Do the narratives elaborate on the data, create narrative from the data, or do the narratives reveal the latent richness inherent in the data? Woolf gave us facets of blue and green-ness we did not know existed within the colours; activating the speculative value of the data, presenting facets of blue that were already present in blue, just waiting for the right person to unlock its richness or full value.

In this way her work is analogous to Picasso’s Blue Period, or to Monet’s studies of light and colour, in which both artists investigated the nature of colours such as blue. We can place Woolf’s “Blue & Green” alongside representative pieces such as Picasso’s The Blue Room, or any one of Monet’s many water lily studies (where he also explored yellow, but let’s not overload the KPLEX colour palette). To return to the issue of subjectivity, it is significant that in later life Monet’s colour choices were greatly affected by sight problems; and so, again, the blue I see, the blue you see, is not necessarily the blue he saw, which was itself not the same as the blue he saw following surgery to remove his cataracts in 1923.

To capture the true depth of blue’s blueness, or the full-Irish forty shades of green, we have to admit elements that are not simply blue or green. Taken together, these narratives paint a picture of blue and green that is far too complex for the adjectives alone; perhaps the Pantone colour bridge introduced earlier, or facets of the colour experiments of the Impressionists, and later of Picasso and the Cubists (among many, many others), provide us with a more complete dataset of the Woolfian blue and green.

So once again, is the colour blue when I write it the same shade of blue as the colour blue you imagine when you read this sentence? And if we cannot agree on a standardised version of something as commonplace as blue, how can we possibly agree on terms that carry more weight or exert control over us and our surroundings? Covfefe, anyone?

[1] Virginia Woolf, “Blue & Green,” Monday or Tuesday, 1921.

Ode to Spot: We need to talk about Data (and narrative).

We need to talk about data. And narrative. In fact, data and narrative need to talk to each other, work some issues out, attend relationship counselling, try to recapture that “spark,” that “special something” that kept bringing them together, that has made them, at times, seem inseparable, but also led to some pretty fiery clashes.

So, what’s the deal? What is the relationship between data and narrative? What role does narrative play in our use of data? What role does data play in our fashioning of narrative? How much of what we have to say about each is determined by pre-established notions we have about either one of these entities? Why did I instinctively opt, for example, while writing the previous sentences, to refer to data as something that is “used” and narrative as something that is “fashioned”? And, further still, is it correct to refer to them as wholly distinct? Can we have a narrative that is bereft of data? And are data or datasets wholly bereft of narrative?

Data and narrative are presented by some as being irreconcilable or antithetical. Lev Manovich presents them as “natural enemies,”[1] whereas Jesse Rosenthal, speaking in the context of the relationship between fictional narratives and data, observes how “the novel form itself is consistently adept at expressing the discomfort that data can produce: the uncertainty in the face of a central part of modern existence that seems to resist being brought forward into understanding.”[2] Todd Presner argues that data and narrative exist in a proto-antagonistic relationship wherein narrative begets data, and data begets narrative. I use antagonistic here in the sense of musculature, with the relationship between narrative and data being analogous to why you’re not able to flex your biceps and your triceps at the same time: for one to flex, the other must relax or straighten.

Presner situates database and narrative as being at odds or somehow irreconcilable

because databases are not narratives […] they are formed from data (such as keywords) arranged in relational tables which can be queried, sorted, and viewed in relation to tables of other data. The relationships are foremost paradigmatic or associative relations […] since they involve rules that govern the selection or suitability of terms, rather than the syntagmatic, or combinatory elements, that give rise to narrative. Database queries are, by definition, algorithms to select data according to a set of parameters.[3]
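To make Presner’s distinction a little more concrete, here is a minimal sketch using Python’s standard-library sqlite3 module. Everything in it is hypothetical and invented for illustration: the table names, the keywords, and the three sentence fragments do not come from Presner or from any real archive. The point is only that a query selects fragments by a parameter (a keyword), not by their position in a sequence, so it returns associated items while skipping whatever came between them in the original telling.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
# "segments" holds fragments of a story in their original order;
# "keywords" is an index that associates each fragment with terms.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE segments (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE keywords (segment_id INTEGER REFERENCES segments(id), keyword TEXT);
""")

segments = [
    (1, "She packed a single suitcase and left the house."),
    (2, "The train was late, and the platform was crowded."),
    (3, "Years later she returned to the house."),
]
keywords = [(1, "home"), (1, "departure"), (2, "travel"), (3, "home"), (3, "return")]
conn.executemany("INSERT INTO segments VALUES (?, ?)", segments)
conn.executemany("INSERT INTO keywords VALUES (?, ?)", keywords)


def select_by_keyword(keyword):
    """Return (id, text) pairs for every segment indexed under `keyword`.

    The selection is associative: segments are chosen because they share
    a term, not because one follows another in the story.
    """
    rows = conn.execute(
        "SELECT s.id, s.text FROM segments AS s "
        "JOIN keywords AS k ON k.segment_id = s.id "
        "WHERE k.keyword = ? ORDER BY s.id",
        (keyword,),
    )
    return rows.fetchall()


print(select_by_keyword("home"))
```

Querying for "home" brings the first and third fragments together and silently drops the second: the relation the query produces is one of shared terms, not the before-and-after that would make the fragments a story.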

So databases are not narratives, and while narratives can contain (or be built on) data, they are not data proper. This means there is a continual transference between data and narrative in either direction, a transference that is all the more explicit and controversial in the transition from an analogue to a digital environment. This transition, the extraction of data from narrative or the injection of data into narrative, is a process that has significant ethical and epistemological implications:

The effect is to turn the narrative into data amenable to computational processing. Significantly, this process is exactly the opposite of what historians usually do, namely to create narratives from data by employing source material, evidence, and established facts into a narrative.[4]
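To make Presner’s distinction a little more concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It assumes a hypothetical four-sentence narrative (invented purely for illustration, not drawn from any archive) that an indexer has split into rows and tagged with keywords; the table names, the tags, and the select_by_term helper are likewise hypothetical. The query selects rows by a single parameter, the paradigmatic or associative relation Presner describes, rather than by the syntagmatic sequence that makes the sentences a story.

```python
import sqlite3

# Hypothetical example: a four-sentence narrative, invented for illustration.
NARRATIVE = [
    "She set out for the market at dawn.",
    "On the road she met a stranger.",
    "The stranger asked her the way to the river.",
    "She never reached the market.",
]

# Hypothetical keyword tags an indexer might assign to each sentence.
TAGS = {
    0: ["market", "journey"],
    1: ["encounter", "stranger"],
    2: ["stranger", "river"],
    3: ["market", "loss"],
}


def build_index(sentences, tags):
    """Load the narrative into two relational tables: segments and keywords."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE segment (id INTEGER PRIMARY KEY, text TEXT)")
    conn.execute("CREATE TABLE keyword (segment_id INTEGER, term TEXT)")
    conn.executemany("INSERT INTO segment VALUES (?, ?)", enumerate(sentences))
    conn.executemany(
        "INSERT INTO keyword VALUES (?, ?)",
        [(sid, term) for sid, terms in tags.items() for term in terms],
    )
    return conn


def select_by_term(conn, term):
    """Select segments by one parameter (a keyword), as a database query does."""
    return conn.execute(
        "SELECT s.id, s.text FROM segment AS s "
        "JOIN keyword AS k ON k.segment_id = s.id "
        "WHERE k.term = ? ORDER BY s.id",
        (term,),
    ).fetchall()


conn = build_index(NARRATIVE, TAGS)
for seg_id, text in select_by_term(conn, "market"):
    print(seg_id, text)
# 0 She set out for the market at dawn.
# 3 She never reached the market.
```

Querying for “market” returns the first and last sentences and silently drops the encounter that connects them: the database answers the question it was asked, but the narrative chain is no longer there to be read.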

Rosenthal also presents data and narrative as operating in this interrelated but tiered manner, with narrative being built on data, or data serving as the building blocks of narrative. And while Rosenthal focuses on fictional narratives, this is the case irrespective of whether the narrative in question is fictional or non-fictional because, after all, non-fictional narrative is still narrative.[5] Whereas Presner focuses on the complications surrounding the relationship between narrative and data in digital environments, Rosenthal is more open to, and more willing to acknowledge, the established and dynamic nature of that relationship in literature: “Narrative and data play off against each other, informing each other and broadening our understanding of each.”[6]

Data and narrative could be said, then, to exist in a dynamic, dyadic relationship. Indeed, N. Katherine Hayles argues that data and narrative are symbiotic and should be seen as “natural symbionts.”[7] So their relationship is symbiotic rather than antagonistic; they intermingle, and their relationship is mutually beneficial, with data perhaps adding credence to narrative (fictional or otherwise) and narrative helping us understand data by making clear to us what the data is saying, or has the capacity to say (in the eyes of the person working with it). That said, if they are symbionts, what is the ratio of their intermingling? Is it possible for a narrative to become data-heavy or data-saturated? Does this impede the narrative from being narrative? Would a data-driven narrative read something along the lines of Data’s poem to his pet cat Spot from Star Trek: The Next Generation (TNG), Season Six, Episode Five?

Ode To Spot

Felis catus is your taxonomic nomenclature,

An endothermic quadruped, carnivorous by nature;

Your visual, olfactory, and auditory senses

Contribute to your hunting skills and natural defenses.

I find myself intrigued by your subvocal oscillations,

A singular development of cat communications

That obviates your basic hedonistic predilection

For a rhythmic stroking of your fur to demonstrate affection.

A tail is quite essential for your acrobatic talents;

You would not be so agile if you lacked its counterbalance.

And when not being utilized to aid in locomotion,

It often serves to illustrate the state of your emotion.

O Spot, the complex levels of behaviour you display

Connote a fairly well-developed cognitive array.

And though you are not sentient, Spot, and do not comprehend,

I nonetheless consider you a true and valued friend.

This “ode” is an example of a piece of writing so data-laden[8] that the momentum of the narrative is hampered; or rather, the lyricism necessary to bring Data’s sweet ode to his cat into Shakespeare territory is seriously lacking. And what do I mean by lyricism? Well, the answer to that is relatively simple: just take a look at Shakespeare’s “Sonnet 18”:

Shall I compare thee to a summer’s day?

Thou art more lovely and more temperate.

Rough winds do shake the darling buds of May,

And summer’s lease hath all too short a date.

Sometime too hot the eye of heaven shines,

And often is his gold complexion dimmed;

And every fair from fair sometime declines,

By chance, or nature’s changing course, untrimmed;

But thy eternal summer shall not fade,

Nor lose possession of that fair thou ow’st,

Nor shall death brag thou wand’rest in his shade,

When in eternal lines to Time thou grow’st.

     So long as men can breathe, or eyes can see,

     So long lives this, and this gives life to thee.

In contrast to the Data-ode, and to the lyrical Shakespearean sonnet, would a narrative that is almost entirely bereft of data (and arguably also bereft of narrative, but let’s not go there) read something like a Trump speech?

A few days ago I called the fake news the enemy of the people. And they are. They are the enemy of the people.

(APPLAUSE)

Because they have no sources, they just make ’em up when there are none. I saw one story recently where they said, “Nine people have confirmed.” There’re no nine people. I don’t believe there was one or two people. Nine people.

And I said, “Give me a break.” Because I know the people, I know who they talk to. There were no nine people.

But they say “nine people.” And somebody reads it and they think, “Oh, nine people. They have nine sources.” They make up sources.

They’re very dishonest people. In fact, in covering my comments, the dishonest media did not explain that I called the fake news the enemy of the people. The fake news. They dropped off the word “fake.” And all of a sudden the story became the media is the enemy.

They take the word “fake” out. And now I’m saying, “Oh no, this is no good.” But that’s the way they are.

So I’m not against the media, I’m not against the press. I don’t mind bad stories if I deserve them.

And I tell ya, I love good stories, but we don’t go…

(LAUGHTER)

I don’t get too many of them.

But I am only against the fake news, media or press. Fake, fake. They have to leave that word.

I’m against the people that make up stories and make up sources.

They shouldn’t be allowed to use sources unless they use somebody’s name. Let their name be put out there. Let their name be put out.

(APPLAUSE)

“A source says that Donald Trump is a horrible, horrible human being.” Let ’em say it to my face.

(APPLAUSE)

Let there be no more sources.[9]

And now, by way of apology and for some brief respite, I offer you a meme of Data.

But are these our only options? Are narrative and data really at odds in this way? Is there a way to reconcile narrative and the database? Perhaps it is time to stop thinking of data and narrative as being at odds with each other; perhaps it is necessary to break down this dyad and facilitate better integration?

Traditionally, narrative-driven criticism took the form of “retelling,” what Rosenthal calls an “artful,” or “opinionated reshaping” of the underlying evidence (aka the data), whereas more contemporary data-driven criticism largely takes the form of visualisations that attempt to, as Rosenthal puts it, “let the data speak for itself, without mediation.”[10] This turn to the visual is driven by a hermeneutic belief akin to Ellen Gruber Garvey’s assertion that “Data will out.”[11] But this is something of a contradiction in terms, considering that elsewhere we are told (largely by Daniel Rosenberg) that data has a “pre-analytical, pre-factual status,”[12] that data is an entity “that resists analysis,”[13] but can also be “rhetorical,”[14] that “False data is data nonetheless,”[15] and that “Data has no truth. Even today, when we speak about data, we make no assumptions about veracity.”[16] Borgman elaborates, stating that “Data are neither truth nor reality. They may be facts, sources of evidence, or principles of an argument that are used to assert truth or reality.”[17] That’s a lot of different data on data.

Fictional narratives can be built on supposedly reputable data; this helps the reader to suspend their disbelief and “believe in” the fictions they encounter within the narrative. Supposedly non-fictional narratives, such as presidential speeches, can be based on tenuously obtained or fabricated data, or can make reference to data that is not proffered and may not even exist, rather like dressing a corgi up in a suit and asking it for a political manifesto.

What we’ve looked at today is the evolving discomfort surrounding our attempts to outline the relationship between narrative and data. In the interplay between analogue and digital, different factions emerge regarding that relationship: narrative and data are variously presented as anathematic, antagonistic, or symbiotic; data is presented as something one can be either “for” or “against”; and distinct preferences for one or the other (either narrative or data) are shown at a discipline-specific, researcher-specific, and author-specific level. At the same time, irrespective of which of these positions you adopt, it is clear that data and narrative are intricately linked and deeply necessary to each other. The question, then, is how can one best facilitate and elucidate the other in a digital environment?

[1] Lev Manovich, The Language of New Media, 2002, 228.

[2] Jesse Rosenthal, “Introduction: ‘Narrative against Data,’” Genre 50, no. 1 (April 1, 2017): 2, doi:10.1215/00166928-3761312.

[3] Presner, in Claudio Fogu, Wulf Kansteiner, and Todd Presner, eds., Probing the Ethics of Holocaust Culture, History Unlimited (Cambridge: Harvard University Press, 2015), http://www.hup.harvard.edu/catalog.php?isbn=9780674970519.

[4] Presner, in ibid.

[5] “Yet the narrative relies for its coherence on our unexamined belief that a preexisting series of events underlies it. While data depends on a sense of irreducibility, narrative relies on a fiction that it is a retelling of something more objective. […] The coherence of the novel form, then, depends on making us believe that there is something more fundamental than narrative.” Rosenthal, “Introduction,” 2–3.

[6] Ibid., 4.

[7] N. Katherine Hayles, “Narrative and Database: Natural Symbionts,” PMLA 122, no. 5 (2007): 1603.

[8] And of course it’s data-laden, it was composed by Data, a Soong-type android, so basically a walking computer, albeit a mega-advanced one.

[9] “Transcript of President Trump’s CPAC Speech,” http://time.com/4682023/cpac-donald-trump-speech-transcript/

[10] Rosenthal, “Introduction,” 4.

[11] Ellen Gruber Garvey, “‘facts and FACTS:’ Abolitionists’ Database Innovations,” in Gitelman, “Raw Data” Is an Oxymoron, 90.

[12] Rosenberg, “Data before the Fact,” in Gitelman, “Raw Data” Is an Oxymoron, 18.

[13] Rosenthal, “Introduction,” 1.

[14] Rosenberg, “Data before the Fact,” in Gitelman, “Raw Data” Is an Oxymoron, 18.

[15] Rosenberg, “Data before the Fact,” in ibid.

[16] Rosenberg, “Data before the Fact,” in ibid., 37.

[17] Christine L. Borgman, Big Data, Little Data, No Data (MIT Press, 2015), 17, accessed April 7, 2017, https://mitpress.mit.edu/big-data-little-data-no-data.

The Revolution of the McWord; or, why difference and complexity is necessary.

“It’s a beautiful thing, the destruction of words.”—George Orwell, 1984.

One of my earliest memories of being reprimanded dates from when I was in Junior or Senior Infants at primary school. During a French lesson I needed to use the bathroom (tiny humans often do). We had been told we could only communicate in French, so I sat there attempting to gather and translate my toilet-related thoughts into something suitably Francophone. Eventually I put up my hand, got the teacher’s attention, pointed at my chest and said “Moi,” pointed at the door that led to the bathrooms and said “toilette?” The teacher snapped and said, “No, Georgina, you are not a toilet!”

A little harsh perhaps, especially considering I was a four-year-old, three-foot-high mini-human, but still, I haven’t forgotten it, and now I’m fluent in French. My effort at breaking down a language barrier caused someone to snap and (it seems) to be insulted by my tiny human attempt at French.

So I certainly agree with Jennifer’s point from her recent blog article here that “Building intimacy (for this is what I take the phrase “brings you closer” to mean) is not about having a rough idea of what someone is saying, it is about understanding the nuance of every gesture, every reference and resonance.” This is one of the (many) reasons why some people on the autism spectrum, for example, find social interaction so difficult: they can struggle to read the very gestural nuances that are so central to human communication. And this lack of understanding often brings with it frustration, isolation, loneliness, and pain. The point is: it’s not just about the words; it’s about how they are said, the tone, the gesture, the contexts. These are things a translation program cannot understand or impart, and it is arrogant to suggest that such facets of communication are bypassable or expendable when so many people struggle with them on a day-to-day basis. Moreover, they are facets of human communication that cannot be erased or eliminated from speech exchanges with a view to making these exchanges “simpler” or “doubleplusgood.” That brings us right into 1984 territory.

Eugene Jolas wrote the “Revolution of the Word” manifesto, published in transition, a modernist periodical active in Paris throughout the 1920s and 1930s whose contributors included the likes of James Joyce, Gertrude Stein, and Samuel Beckett. From Jolas’s perspective, language was not complex enough:

Tired of the spectacle of short stories, novels, poems and plays still under the hegemony of the banal word, monotonous syntax, static psychology, descriptive naturalism, and desirous of crystallizing a viewpoint… Narrative is not mere anecdote, but the projection of a metamorphosis of reality. […] The literary creator has the right to disintegrate the primal matter of words imposed on him by textbooks and dictionaries.[1]

So, language, or rather languages (Jolas was fluent in several and often wrote in an amalgamation that, he felt, better reflected his hybridic Franco-German (Alsatian) and American identity[2]), was not complex enough to fully express the totality of reality.

Mark Zuckerberg proposes something of a devolution, a de-creation, a simplifying of difference, a reintegration and amalgamation of the facets that distinguish us from others. But while it might appear useful (Esperanto, anyone?), is the experience going to lead to richer conversations? To a demolition of barriers? Or will it result in something akin to Point It, the highly successful so-called “Travellers Language Kit” that contains no language at all, but rather an assortment of pictures that allows you to leaf through the book and point at the item you want?

So, with a sensitive interlocutor, one could perhaps intuit that “Me *points at* Coca-Cola” means “Hi, I would like a Coca-Cola.” Or that “Me *points at* toilet” would likely mean “Hi, I desperately need to use your bathroom, could you be so kind as to point me in the right direction?”

After my childhood French toilet incident, I myself could never overcome the mortification involved in using Point It. But even if I did, would the success of an exchange wherein “Me *points at* Coca-Cola” results in my being handed a Coca-Cola give me the same satisfaction as my first successful exchange with someone in another language did: the first time I managed to say something in French to a French person in France and was met with a response in French, as opposed to a confused look or (worse) a response in English? Would Zuckerberg’s own much-lauded trip to China in 2014, where he was interviewed and responded in Mandarin, have received as much positive press if he had worked through an interpreter, or used a pioneering neural network translation platform? I don’t think so.

Reducing or eliminating language difference also creates hierarchies, and this is dangerous. What language will we agree to communicate in? Why one language and not another? What facets of my individuality are accented in my native language that are perhaps left out or lost in another?

In short, there is an ethical element to this, and one that must be acknowledged and addressed. It’s similar to the argument Todd Presner articulates in “The Ethics of the Algorithm” when he notes the negative effect of reducing human experience (in this case, the testimonies of Holocaust survivors) to keywords so that their experiences become “searchable”: “it abstracts and reduces the human complexity of the victims’ lives to quantized units and structured data. In a word, it appears to be de-humanizing.”[3]

We have to resist what Presner calls “the impulse to quantify, modularize, distantiate, technify, and bureaucratize the subjective individuality of human experience,”[4] even if this impulse is driven by a desire to facilitate communications across perceived borders. Finding, maintaining, and celebrating the individual matters in an era that is putting increasing pressure on us to separate the “in-” from the “-dividual” for the sake of facility. That pressure will lead (and has perhaps already led, if we can refer back to Don DeLillo’s 1985 observation that “You are the sum total of your data”) to the era of the “dividual”[5]: an era where, instead of championing individuality, people are reduced to their component data sets, or rather to the facets of their personhood that can be assigned to data sets, with the rest (the enigmatic “in-” that makes up an individual) deemed unnecessary, a “barrier” to facile communications.

Rather than working to fractalise language, as Eugene Jolas did, universal translation (itself a misnomer: all translations are, to a degree, inexact, and entail a degree of intuition or creativity to render one word in or through another) simplifies that which cannot, and should not, be simplified. This would be doubleplusungood.

Complexity matters.

KPLEX matters.

[1] Eugene Jolas, “Revolution of the Word,” transition 16/17, 1929.

[2] Born in New Jersey, Jolas moved as a young child to the bilingual Alsace-Lorraine region in Europe, and later spent key formative years in the United States.

[3] Presner, “The Ethics of the Algorithm: Close and Distant Listening to the Shoah Foundation Visual History Archive,” in Claudio Fogu, Wulf Kansteiner, and Todd Presner, eds., Probing the Ethics of Holocaust Culture.

[4] Presner, in ibid.

[5] Deleuze, “Postscript on the Societies of Control.”

Data before the (alternative) facts.

Ambiguity and uncertainty, cultural richness, idiosyncrasy and subjectivity are par for the course when it comes to humanities research, but one mistake that can be made when we approach computational technologies that proffer access to greatly increased quanta of data is to assume that these technologies are objective. They are not. Just like regular old-fashioned analogue research, they are subject to subjectivities, prejudice, and error, and can display or reflect the human prejudices of the user, the software technician, or the organisation responsible for designing the information architecture; that is, for designing this supposedly “objective” machine or piece of software for everyday use by regular everyday humans.

In “Raw Data” Is an Oxymoron, Daniel Rosenberg argues that data is rhetorical.[1] Christine Borgman elaborates on this, stating that “Data are neither truth nor reality. They may be facts, sources of evidence, or principles of an argument that are used to assert truth or reality.”[2] But one person’s truth is, or can be, in certain cases, antithetical to another person’s reality. We saw an example of this in Trump aide Kellyanne Conway’s assertion that the Trump White House was offering “alternative facts” regarding the size of the crowd that attended Trump’s inauguration. Conway’s comments referred to the “alternative facts” offered by White House Press Secretary Sean Spicer, who performed an incredible feat of rhetorical manipulation (of data) by arguing that “this was the largest audience to ever witness an inauguration, period.”

While the phrase “alternative facts” may have been coined by Conway, it’s not necessarily that new as a concept,[3] at least not when it comes to data and how big data is, or can be, used and manipulated. So what exactly am I getting at? Large datasets can be manipulated and interpreted to reflect different users’ interpretations of “truth.” This means that data is, in a sense, vulnerable. This is particularly applicable when it comes to the way we categorise data, and so it concerns the epistemological implications of architecture such as the “search” function, and of the categories we decide best assist the user in translating the data into information, knowledge and the all-important wisdom (à la the “data, information, knowledge and wisdom” (DIKW) hierarchy). Johanna Drucker noted this back in 2011, and her observations are still being cited:

It is important to recognize that when we organize data into categories (according to population, gender, nation, etc.), these categories tend to be treated as if they were discrete and fixed, when in fact they are interpretive expressions (Drucker 2011).[4]

This correlates with Rosenberg’s idea of data as rhetorical, and also with Rita Raley’s argument that data is “performative.”[5] Like rhetoric and performance, then, data can be fashioned to reflect the intentions of the user: it can be used to falsify truth, or to insist that perceived truths are in fact false; that news is “fake news.” Data provided the foundations for the arguments voiced by each side over the ratio of Trump inauguration attendees to Obama inauguration attendees, with each side insisting that its interpretation of the data allowed it to present facts.

This is one of the major concerns surrounding big data research, and Rosenthal uses it to draw a nice distinction between notions of falsity as they are presented in the sciences and the humanities respectively: the former maintain a high awareness of what are referred to as “false-positive results,”[6] whereas in the humanities, so-called “false positives,” or theories that are later disproven or refuted (insofar as it is possible to disprove a theory in the humanities), are not discarded, but rather incorporated into the critical canon, becoming part of the bibliography of critical works on a given topic: “So while scientific truth is falsifiable, truth in the humanities never really cancels out what came before.”[7]

But perhaps a solution of sorts is to be found in the realm of the visual, and in the evolution of the visual (as opposed to the linguistic, which is vulnerable to rhetorical manipulation) as a medium through which we can attempt to study data. Rosenthal argues that visual experiences of data representations trigger different cognitive responses:

When confronted with the visual instead of the linguistic, our thought somehow becomes more innocent, less tempered by experience. If data is, as Rosenberg suggests, a rhetorical function that marks the end of the chain of analysis, then in data-driven literary criticism the visualization allows us to engage with that irreducible element in a way that language would not. As Edward R. Tufte (2001, 14), a statistician and author on data visualization, puts it, in almost Heideggerian language, ‘Graphics reveal data.’[8]

So “Graphics reveal data,” and it was to graphics that people turned in response to Conway and Spicer’s assertions regarding the “alternative facts” surrounding Trump’s inauguration crowd. Arguably, these graphics (below) express the counterargument to Conway and Spicer’s assertions in a more innately understandable way than any linguistic rebuttal, irrespective of how eloquent that rebuttal may be.

But things don’t end there (and it’s also important to remember that visualisations can themselves be manipulated). Faced with graphic “alternative-alternative facts” in the aftermath of both his own assertions and Conway’s defense of these assertions, Spicer countered the visual sparsity of the graphic image by retroactively clarifying that this “largest audience” assertion incorporated material external to the datasets used by the news media to compile their figures, taking in viewers on YouTube, Facebook, and those watching over the internet the world over. The “largest audience to ever witness an inauguration, period” is, apparently, “unquestionable,” and it was obtained by manipulating data.

[1] Daniel Rosenberg, “Data before the Fact,” in Gitelman, “Raw Data” Is an Oxymoron, 18.

[2] Borgman, “Big Data, Little Data, No Data,” 17.

[3] We can of course refer to Orwell’s 1984 and the “2+2=5” paradigm here.

[4] Karin van Es, Nicolàs López Coombs & Thomas Boeschoten, “Towards a Reflexive Digital Data Analysis,” in Mirko Tobias Schäfer and Karin van Es, eds., The Datafied Society: Studying Culture through Data, 176.

[5] Rita Raley, “Dataveillance and Countervailance,” in Gitelman, “Raw Data” Is an Oxymoron, 128.

[6] Rosenthal, “Introduction: Narrative Against Data in the Victorian Novel,” 6.

[7] Ibid., 7.

[8] Ibid., 5.

The rule and question of the eggs: how do we decide what data is the right data?

The rule and question of the eggs.

A young maiden beareth eggs to the market for to sell and her meetheth a young man that would play with her in so much that he overthroweth and breaketh the eggs every one, and will not pay for them. The maid doth him to be called afore the judge. The judge condemneth him to pay for the eggs/ but the judge knoweth not how many eggs there were. And that he demandeth of the maid/ she answereth that she is but young, and cannot well count.[1]

“The rule and question of the eggs” is an arithmetic problem contained within the earliest English-language printed treatise on arithmetic, An Introduction for to Learn to Reckon with the Pen.[2] In addition to being flat-out joyous to read, “The rule and question of the eggs” (along with the other examples Travis D. Williams discusses in his brilliant essay on early modern arithmetic in “Raw Data” Is an Oxymoron) raises some important issues that are relevant on a multidisciplinary level in terms of how we approach the archives of our respective fields of study with a view to making them available on digital platforms. Furthermore, the “rule and question of the eggs” raises interesting questions about how nuanced cultural artefacts (of which this is but one example) are to be satisfactorily yolked together (see what I did there?) and represented en masse in the form of big data.

It is not easy to define what facets of this piece of hella dodgy arithmetic merit particular attention, or to anticipate what facets of the piece other readers (present-day or future readers) may find interesting. If we were to go about entering this into an online archive or database, what keywords would we use? What aspects of this problem are of particular importance? What information is essential, what is inessential? For practitioners of maths, is it the bizarre, non-existent, non-workable formula that seemingly asks the student to figure out how many eggs the girl was carrying in her basket, despite the volume of the basket being left out and the girl herself professing not to know how to count? For historians of mathematics and of early modern approaches to the pedagogy of mathematics, is it the language used to frame the problem? For sociologists, feminists, or historians (etc.), is it the unsettling reference to a “young man that would play with” a young woman? Is it the fact that the crime being reported is the destruction of eggs as opposed to the public assault? Or is it the fact that the young girl seemingly countered his unwelcome advances by committing fraud, claiming she had been carrying 721 (!) eggs?[3] Simply put, where or what is the data here? And is the data you pull from this piece the same as the data I pull from it, or the same as the data a reader in fifty years’ time will pull from it?

Houston, we have a problem: if we set out to categorise this example, we risk leaving out any one of the many essential facets that make the “rule and question of the eggs” such a rich historical document. But categorise it we must. And, just like moving between languages (English-to-French or vice versa), when we move from rich, nuanced, ambiguous and complex linguistic narratives to the language of the database, the dataset, the encoded set of assigned unambiguous values readable to computers, we expose the material to a translation act that imposes delimiting interpretations on that material by creating datasets that drastically simplify the material. Someone decides what is worth translating, what is incidental, and what should be left out here. Someone converts the information as it stands in one language (linguistic narrative) into supposedly comparable or equivalent information in another language (computer narrative).

Further still, by translating this early modern piece into information that is workable within the sphere of contemporary digital archival practices, we create a scenario wherein the material is no longer read as it was intended to be read. We impose our thinking and understanding about maths (or women’s rights, say, because the “play with” part really irks me) onto their text by taking it and making of it a series of functional datasets relevant to our particular scholarly interests. As Williams points out, “a data set is already interpreted by the fact that it is a set: some elements are privileged by inclusion; while others are denied relevance through exclusion.”[4]
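Williams’s point about inclusion and exclusion can be sketched in a few lines of Python. This is a hypothetical illustration only: the facet labels and the two disciplinary “schemas” below are invented from the questions raised above, not taken from Williams or from any real archive’s metadata standard, and build_record, excluded, and search are made-up helper names. Each schema keeps some facets of the eggs problem as index terms; whatever it leaves out simply cannot be found by searching that index.

```python
# Hypothetical facets of "The rule and question of the eggs",
# paraphrasing the questions raised above (not a real metadata standard).
FACETS = {
    "arithmetic": "how many eggs were in the basket?",
    "pedagogy": "a counting exercise framed as a courtroom story",
    "gender": "a young man 'that would play with' a young maid",
    "law": "the judge orders payment for the eggs, not for the assault",
    "fraud": "the recorded answer is 721 eggs",
}

# Hypothetical indexing schemas: the facets each discipline chooses to keep.
SCHEMAS = {
    "history of mathematics": {"arithmetic", "pedagogy"},
    "gender history": {"gender", "law", "fraud"},
}


def build_record(schema_name):
    """Return the catalogue record a given schema would produce."""
    kept = SCHEMAS[schema_name]
    return {facet: FACETS[facet] for facet in sorted(kept)}


def excluded(schema_name):
    """Facets the schema denies relevance to by leaving them out."""
    return sorted(set(FACETS) - SCHEMAS[schema_name])


def search(record, term):
    """A keyword search can only find facets the record kept."""
    return term in record


maths_record = build_record("history of mathematics")
print(sorted(maths_record))                # ['arithmetic', 'pedagogy']
print(excluded("history of mathematics"))  # ['fraud', 'gender', 'law']
print(search(maths_record, "gender"))      # False
```

Neither record is wrong as such; each is simply a set, and, as Williams notes, the set is already an interpretation. Swapping one schema for the other changes which questions the archive can answer.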

In analysing these pieces, Williams establishes four terms: “our reading and their reading, our rigor and their rigor.”[5] He elaborates:

Our reading is a practice of interpretation that seeks to understand the appearance and function of texts within their original historical and cultural milieus. Our reading thus incorporates the need to understand with nuance their reading: why and how contemporaneous readers would read the texts produced by their cultures.[6]

That’s all well and good when approached from an analogue-dependent research environment where one is tackling these early modern maths problems one by one. After all, this is merely one maths problem within an entire book containing maths problems. But what if we were to take it to a Borgesian level, to a big data level wherein this “rule of the eggs” is merely one maths problem within an entire book containing maths problems, a book contained within a library containing books that contain only maths problems; a library that is in fact the ur-library of maths books, containing every maths book and every maths problem ever written?

When we amp up the scale to the realm of big data and this one tiny problem becomes one tiny problem within an entire ur-library of information, how do we stay cognisant of the fact that every entry in a given dataset, no matter how seemingly incidental or minute, could be as detailed and nuanced as our enigmatic rule and question of the eggs?

[1] Quoted in Travis D. Williams, “Procrustean Marxism and Subjective Rigor,” in Gitelman, “Raw Data” Is an Oxymoron, 45.

[2] I am indebted to Travis D. Williams’s essay “Procrustean Marxism and Subjective Rigor: Early Modern Arithmetic and Its Readers” (to be found in “Raw Data” Is an Oxymoron (2013)) for bringing these incredible examples to light.

[3] Williams notes that the “correct” answer (or rather the answer recorded in the arithmetic book as the “correct” answer) is 721 eggs. But this would mean that the young maiden was carrying roughly 36 kilos (yes, I’ve done the math) of eggs, which seems unlikely. Williams, “Procrustean Marxism and Subjective Rigor,” ibid.

[4] Travis D. Williams, “Procrustean Marxism and Subjective Rigor,” 41.

[5] Travis D. Williams, “Procrustean Marxism and Subjective Rigor,” in Gitelman, “Raw Data” Is an Oxymoron, 42.

[6] Travis D. Williams, “Procrustean Marxism and Subjective Rigor,” ibid.

Featured image was taken from http://www.flickr.com

Tinfoil hats, dataveillance, and panopticons.

When I started my work with KPLEX, I was not expecting to encounter so many references to literature. Specifically, to works of fiction I have read in my capacity as an erstwhile undergraduate and graduate student of literature who had (and still has) a devout personal interest in the very particular, paranoid postmodern fictions that crawled out of the Americas (North and South) like twitchy angst-ridden spiders in the mid-to-latter half of the 20th century. The George Orwell references did not surprise me all that much; after all, everyone loves to reference 1984. But Jorge Luis Borges, Thomas Pynchon, and Don DeLillo? These guys produced (the latter two are still producing) the kind of paranoiac post-Orwellian literature that could be nicely summed up by the Nirvana line “Just because you’re paranoid/ Don’t mean they’re not after you,” which is itself a slightly modified lift straight out of Joseph Heller’s Catch-22.

It seems, however, that when it comes to outlining, theorising, and speculating over the state, uses, and value of data in 21st-century society, the paranoid, tinfoil-hat-wearing Americans and their close predecessor, the Argentinian master of the labyrinth, got there first.

We are all by now familiar with, or have at least likely heard reference to, the surveillance system in operation in 1984: a two-way screen that captures image and sound so that the inhabitants of Orwell’s world are always potentially being watched and listened to. In a post-Snowden era this all-seeing, all-hearing panoptic Orwellian entity has already been referenced to death, and indeed, as Rita Raley points out, Orwell’s two-way screen has long been considered inferior to the “disciplinary and control practice of monitoring, aggregating, and sorting data.”[1] In other words, inferior to the practice of “dataveillance.”[2] But Don DeLillo’s vision of the role data would play in our future was somewhat different, more nuanced, and, most importantly, less overtly classifiable as dystopian; in fact, it reads rather like a description of an assiduous Google Search, yet it is to be found in the pages of a book first published in 1985:

It’s what we call a massive data-base tally. Gladney, J.A.K. I punch in the name, the substance, the exposure time and then I tap into your computer history. Your genetics, your personals, your medicals, your psychologicals, your police-and-hospitals. It comes back pulsing stars. This doesn’t mean anything is going to happen to you as such; at least not today or tomorrow. It just means you are the sum total of your data. No man escapes that.[3]

Dataveillance is interesting because its function is not just to record and monitor, but also to speculate, to predict, and maybe even to prescribe. As a result, as Raley points out, its future value is speculative: “it awaits the query that would produce its value.”[4] By value Raley is referring to the economic value this data may have in terms of its potential to influence people to buy and sell things; and so, we have a scenario wherein data is traded in a manner akin to shares or currency, where “data is the new oil of the internet”:[5]

Data speculation means amassing data so as to produce patterns, as opposed to having an idea for which one needs to collect supporting data. Raw data is the material for informational patterns to come, its value unknown or uncertain until it is converted into the currency of information. And a robust data exchange, with so-termed data handlers and data brokers, has emerged to perform precisely this work of speculation. An illustrative example is BlueKai, “a marketplace where buyers and sellers trade high-quality targeting data like stocks,” more specifically, an auction for the near-instant circulation of user intent data (keyword searches, price searching and product comparison, destination cities from travel sites, activity on loan calculators).[6]

This environment of highly sophisticated, near-constant amassing of data leads us back to DeLillo and his observation, made back in 1985, that “you are the sum total of your data.” And this is perhaps the very environment that leads Geoffrey Bowker to declare, in his provocative afterword to the collection of essays ‘Raw Data’ is an Oxymoron (2013), that we as humans are “entering into”, are “being entered” into, “the dataverse.”[7] Within this dataverse, Bowker—who is being self-consciously hyperbolic—claims it is possible to “take the unnecessary human out of the equation,” envisioning a scenario wherein “our interaction with the world and each other is being rendered epiphenomenal to these data-program-data cycles” and one where, ultimately, “if you are not data, you don’t exist.”[8] But this is precisely where we must be most cautious, particularly when it comes to the nascent dataverse of humanities researchers. Because while we might tentatively make the claim to be within a societal dataverse now, the alignment of data with existence and experience is still far from total. We cannot yet fully capture the entirety of the white noise of selfhood.

And this is where things start to get interesting, because what is perhaps dystopian from a contemporaneous perspective (that is, the presence somewhere out there of near-infinitesimal quanta of data pertaining to you, your preferences, your activities), a scenario that might reasonably lead us to reach for those tinfoil hats, is, conversely, a desirable one from the perspective of historians and other humanities researchers. A data sublime, a “single database fantasy”[9] wherein one could access everything, where nothing is hidden, and where the raw data, whose intellectual, historical, and cultural value is always speculative, always potentially of use to the researcher, is amassed and maintained with the same reverence associated with high-value data traded today on platforms such as BlueKai. Because as it is, the amassing of big data for humanities researchers, particularly when it comes to converting extant analogue archives and collections, subjects the material to a hierarchising process wherein items of potential future value (speculative value) are left out or hidden, greatly diminishing their accessibility and altering the richness or fertility of the research landscape available to future scholars. After all, “if you are not data, you don’t exist.”[10] But if you don’t exist then, to paraphrase Raley, you cannot be subjected to the search or query of future scholars and researchers, the search or query that would determine your value.

As we move towards these data sublime scenarios, it is important not to lose sight of the fact that that which is considered data now, this steadily accumulating catalogue of material pertaining to us as individuals or humans en masse, still does not capture everything. And if this is true now then it is doubly true (ability to resist Orwellian doublespeak at this stage in blogpost = zero) of our past selves and the analogue records that constitute the body of humanities research. How do we incorporate the “not data” in an environment where data is currency?

Happy Day of DH!

[1] Raley, “Dataveillance and Countervailance” in Gitelman ed., “Raw Data” is an Oxymoron, 124.

[2] Roger Clarke, quoted in ibid.

[3] Don DeLillo, White Noise, quoted in Gitelman ed., “Raw Data” is an Oxymoron, 121, emphasis in original.

[4] Raley, “Dataveillance and Countervailance” in ibid., 123–4.

[5] Julia Angwin, “The Web’s New Gold Mine: Your Secrets,” quoted in ibid., 123.

[6] Raley, “Dataveillance and Countervailance” in ibid., 123.

[7] Geoffrey Bowker, “Data Flakes: An Afterword to ‘Raw Data’ Is an Oxymoron” in ibid., 167.

[8] Bowker, in ibid., 170.

[9] Raley, “Dataveillance and Countervailance” in ibid., 128.

[10] Bowker, in ibid., 170.

The featured image is a still taken from the film version of 1984.

Big Data, Little Data, Fabulous Data.

Suzanne Briet, in What Is Documentation?, says that “Documentography is the enumeration and description of diverse documents.”[1] Slightly modified, and paired with a nice little neologism (who can resist neologisms?), I could describe the work I am doing at this stage in the KPLEX Project as “Datamentography.” Datamentography, of course, meaning the enumeration and description of diverse data. I’m working to establish what it is we talk about when we talk about data: the established conceptions of what data is among different communities, and the why, how, and where that led to the development of the various understandings and conceptions of data active today. Once this has been established, we can use these findings to move towards a new conceptualisation of data.

One of the other passages in Briet that I really like (because it is quite poetic and this is perhaps a little unexpected in a text about documentation standards) is as follows:

Is a star a document? Is a pebble rolled by a torrent a document? Is a living animal a document? No. But the photographs and the catalogues of stars, the stones in a museum of mineralogy, and the animals that are catalogued and shown in a zoo, are documents.[2]

So, this nice descriptive passage outlines a key distinction: the thing itself is not a document, but the material traces of its interactions with humans are: the photos, the specimens, the catalogues, the records (visual, audio, and so on). But how do we capture the richness of these items in a computerised environment? A pebble is one thing, as is a stone in a museum of mineralogy. But how do we capture the pebble rolling in a torrent? And how do we do so in a manner that does not subject the material to an interpretative act that alters how future scholars and researchers approach these records? If all the future scholar can “see” in the online repository is what the person responsible for compiling the repository considered to be important (or codeable), then their interpretative sphere is corrupted (if that is not too dramatic a word) from the outset.

Choice emerges as an implicit facet in this distinction; irrespective of how objective we think we are being, the act of collating information is implicitly subjective. What one person identifies as important (as worthy of documenting, as data) may appear wholly unimportant to someone else, and vice versa. Taste and preference are fine in day-to-day life (“You say tomato, I say tomahto… You say potato, I say vodka…”), but when these inherently human, and therefore unavoidable, subjective tendencies are let loose on humanities repositories, a hierarchy is imposed on knowledge that reflects the subjective choices of whoever classified or codified it.

Further still, encoding the thing-ness of things is difficult. In a society that increasingly values and prioritises codified data, if what is readily codeable is prioritised without concordant measures taken to account for the facets of human records and experiences that do not lend themselves so readily to codification, we encounter a scenario wherein that which is not as readily codeable is left out, neglected, or even forgotten.

Now, people have gone about defining data in a number of different ways, and almost all are at least a little problematic. Christine Borgman, in her book chapter “What Are Data?” from Big Data, Little Data, No Data, uses an example from the great Argentinian writer Jorge Luis Borges to explain why defining data by example is unsatisfactory. In his essay “The Analytical Language of John Wilkins,” Borges presents us with a taxonomy of animals attributed to a Chinese encyclopedia, the Celestial Emporium of Benevolent Knowledge. In this taxonomy we encounter the following classifications:

a) belonging to the emperor, b) embalmed, c) tame, d) sucking pigs, e) sirens, f) fabulous, g) stray dogs, h) included in the present classification, i) frenzied, j) innumerable, k) drawn with a very fine camelhair brush, l) et cetera, m) having just broken the water pitcher, n) that from a long way off look like flies.

Of course, this list is somewhat absurd, and its absurdity is what makes it funny and what makes Borges so brilliant; but this absurdity should not obscure the critique of taxonomic practices that lies at the heart of this so-called “emporium of benevolent knowledge.” Let’s take a closer look.

Embalmed animals are included because someone once identified them as worthy of embalming, and because the act of being embalmed somehow signified something worth documenting (in the form of putting the sad creature in a jar of formaldehyde; or rather, of making a record of the fact that this creature has been stored in formaldehyde). Similarly, some animals are included merely because they are already in the system (“included in the present classification”), that is, simply because they are already there and it is easy to carry them over and keep them incorporated. In this way long-established practices are maintained because they are long established, and not necessarily because they are effective (hello metadata, you cheeky old fox).

In What Is Documentation?, Briet charts the sad odyssey of an “antelope of a new kind […] encountered in Africa by an explorer who has succeeded in capturing an individual that is then brought back to Europe for our Botanical Garden”[3]:

A press release makes the event known by newspaper, by radio, and by newsreels. The discovery becomes the topic of an announcement at the Academy of Sciences. A professor at the Museum discusses it in his courses. The living animal is placed in a cage and catalogued (zoological garden). Once it is dead, it will be stuffed and preserved (in the Museum). It is loaned to an Exposition. It is played on a soundtrack at the cinema. Its voice is recorded on a disk. The first monograph serves to establish part of a treatise with plates, then a special encyclopaedia (zoological), then a general encyclopaedia. The works are catalogued in a library; after having been announced at publication [et cetera].[4]

So here we have the journey of the thing itself, a “pure or natural object with an essence of [its] own,”[5] from its capture and discovery through to its death, when it is stuffed (akin to Borges’s embalming) and a process is initiated wherein it is catalogued and subjected to extensive “documentography” according to the established taxonomy of “Homo documentator.”[6] “Homo documentator” creates detailed portraits of the creature (perhaps using a very fine camelhair brush, as in Borges’s encyclopaedia) for inclusion in the classificatory system, and records its unique markings, the sound of its voice, whatever aspects of the creature’s essence can be readily captured. Once in the system, whether by means of artistic plates outlining specifics of the species, or in the form of photographs and sound recordings, it becomes a de facto document (de facto data), and its documentability is exhausted only when the taxonomical system employed by the documentalist has itself been exhausted.

But who has designed this taxonomical system? Who is responsible for deciding which facets of the antelope are important and which are not? Are items perhaps ever considered important solely because they are easy to document? And who is to say that this same sad and now stuffed antelope could not also be classified as fabulous (or, once fabulous, had you encountered it in the wild)? Further still, surely this creature, like all creatures when viewed from a certain perspective, could be included in Borges’s category for animals that “from a long way off look like flies.” The point is that the system of taxonomy is not objective, and our conceptions of which facets are important or unimportant can be, and have been, influenced by the hierarchies imposed upon them by the person or persons responsible for compiling them.

Borgman, in “What Are Data?”, refers us to the Open Archival Information System (OAIS) reference model for a definition of data that, once again, uses examples:

Data: A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing. Examples of data include a sequence of bits, a table of numbers, the characters on a page, the recording of sounds made by a person speaking, or a moon rock specimen.[7]

This, like most definitions of data, seems relatively reasonable at first: naturally the characters on a page are going to qualify as data, and if they do, they are or can be encoded as such. But what about the page itself? What about the materials on the page that do not qualify as characters? What about doodles? Pen-tests? Scribbles, drawings, additions, and other contextually specific paralipomena? How do we encode these? And, if we decide not to, why do we decide not to, and who, if anyone, holds us accountable for that decision? Because such a decision, inconsequential though it may seem at first, could affect and limit the work of future scholars.

And this is what I am attempting to tease out as part of my contribution to the KPLEX Project: why and how did certain conceptions of data become acceptable or dominant in certain circles, and, going forward, as we move towards bigger data infrastructures for the humanities, is there a way for us to ensure that the thing itself, in all its complex idiosyncratic fabulousness, remains visible, and available to the researcher?

[1] Briet et al., What Is Documentation?, 24.

[2] Ibid., 10.

[3] Ibid., 10.

[4] Ibid.

[5] Borgman, “Big Data, Little Data, No Data,” 18.

[6] Briet et al., What Is Documentation?, 29.

[7] Borgman, “Big Data, Little Data, No Data,” 20, emphasis in original.

The Featured Image was borrowed from Natascha Schwarz’s illustrated edition of Borges’s Book of Imaginary Beings: https://www.behance.net/gallery/10823485/Jorge-Luis-Borges-Book-of-Imaginary-Beings