#data #bigdata #digitalhumanities

Harry Potter and the Philosopher’s DIKW.

June 28, 2017June 28, 2017 by georginanugentfolan

DIKW= Data. Information. Knowledge. Wisdom. DIKW!

That’s the way things flow. Or more specifically, that’s the way “the knowledge pyramid” says they flow.

From data we gather information we develop into knowledge which leads to wisdom.

Apparently.

But is it really that straightforward? Are our thought processes that streamlined, hierarchical and, let’s face it, uncomplicated? Or is DIKW simply a nice sounding but somewhat reductive anagram to be used when waxing lyrical about the philosophy of knowledge, information systems, information management, or pedagogy?

I for one am not all that convinced by DIKW. And I’m not the only one: the pyramid is widely criticised. But why? Where and how, exactly, does DIKW misrepresent how we think about and manage data and information? Today we’re going to explore the DIKW pyramid; specifically how exactly “data” gets transformed into “wisdom,” what exactly happens to it, and how a different approach to cleaning or processing that “data” can lead us to come to very different conclusions, and thus to very different states of “wisdom”. And to facilitate this philosophising on the nuances of DIKW and its vulnerability to corruption, I’m going to talk about Harry Potter and the “is Snape good or bad” plot that runs through all seven Harry Potter novels. Because, why not? Specifically I’m going to use Snape’s complexity as a character to highlight DIKW’s shortcomings and in particular how DIKW can be corrupted depending on how the data you collect is processed and interpreted.

As we all know, Snape looks kinda evil, acts kinda evil, hates Harry, and has a pretty dodgy past in which he was aligned with Voldemort, the wizard responsible for Harry’s parents’ deaths. He has a fondness for the “Dark Arts,” and, as head of Slytherin, an unhealthy interest in eugenics and so-called “blood purity” (never a good trait in a person). And he is played to absolute perfection by the unrivaled Alan Rickman, sadly now deceased.

Rowling maintains a near-constant back and forth throughout the series, with the characters forever pursuing the idea that Snape is bad, being thwarted in their pursuit of this idea, or thrown off their suspicions by Dumbledore who always reaffirms his strong faith in Snape. The dampening of any suspicion regarding Snape’s motives generally comes at the conclusion of any given book, only for these suspicions to be re-ignited at the start of the next book and the next adventure.

And just when this continual “is he or isn’t he a bad guy” threatens to get monotonous, with the well-trained reader now six books in and attuned to expect the usual — “Snape’s being shifty, ergo…he must be bad!” / “Nope he’s actually good, Dumbledore says so.” / “Oh okay let’s talk about this again in the next book.” — Rowling bucks our expectations spectacularly, and all of these hints and suspicions about Snape are seemingly verified in book six, Harry Potter and the Half-Blood Prince, when Snape goes and kills Dumbledore, the one man who trusted and protected him absolutely; a most heinous crime, and one done using “Avada Kedavra,” the unforgivable curse.

Lets take a look at Snape’s first appearance, way back in book one, Harry Potter and the Philosopher’s Stone, or as it’s known in the US, Harry Potter and the Sorcerer’s Stone:

(J. K. Rowling, Harry Potter and the Philosopher’s Stone, Bloomsbury 1999, 126.)

What do we get on Snape here?

He’s unhealthy looking, pale and yellowish. He could probably do with a good shampoo. Oh, and he has a hooked nose.

Now this description is controversial. Snape’s portrait could be considered to be “Jewish-coded” or even anti-Semitic; certainly it can be seen as having uneasy inter-textual chimes with overtly anti-Semitic portraits in classic (and classically anti-Semitic) canonical English-language texts such as Charles Dickens’s Oliver Twist or Shakespeare’s The Merchant of Venice where both Fagin and Shylock respectively (both pictured below) are presented as having the overt hooked noses that were considered characteristic of the so-called “Stage Jew.”

The “Stage Jew” is basically the Jewish equivalent of blackface, a crude form of racial stereotyping that was particularly popular during the Elizabethan period and thereafter. Much like blackface, these racist Jewish stereotypes were not just confined to the realms of theatre and literature, Hitler and the Nazi’s also made full use of racist caricatures in their propaganda.

Most all of this will (thankfully) be lost on a younger audience, but the question remains, does Rowling engage with this sadly all-too familiar visual trope deliberately? Knowing Snape’s story, his full story, as she claims she did all the way back in book one, does she proffer such material with a view to ultimately showing it up to be complete rubbish?

Snape’s appearance screams evil, irrespective of whether you want to connect that with the tradition of the Stage Jew, or with less racially charged narratives that seek to represent a person’s character in or through their appearance. It’s a frequent visual trope in Superhero(ine) movies, for example; the good guys look good, the bad guys look bad. And when the good guy turns bad (as they are wont to do on occasion), their nascent badness is represented visually through some change in their appearance. Again, in relation to the Spiderman 3 poster below, we can ask why the “evil” Spiderman is coded “black,” particularly when Spiderman hails from a country whose law enforcement has a well established track record of subjecting African American males to racial profiling, but we better stay on topic.

To return to Snape, to his appearance, and to how our reading of his appearance can change once we get to the crux of his character: Certainly Snape’s character leads us to question many of the facets of other peoples’ appearances we read and take for granted, their demeanour, their silences, their unreadability, their appearance, their complexity. Snape looks bad, but he is not bad, he is good. We have misread him and are guilty of superficially associating his morality with his appearance. This is not to say that readers of Harry Potter are guilty of racially profiling Snape, not at all, but simply that this tendency to create a link between appearance and morality has a long history in English literature, and one that unfortunately happens to have rather unpleasant racist roots that contemporary readers may not be aware of. This is just one small part of why Rowling is such a good author, and why the Harry Potter books are so rich and rewarding for growing minds in particular. But it’s also an example of how the DIKW pyramid can be drastically altered or corrupted depending on how you read the data at the bottom; the data being the material that proffers the opportunity to reach a state of wisdom regarding that particular material or phenomenon.

In other words, even if you’d only seen a very small selection of Marvel superhero movies, you have been trained to seek out information on a person’s moral character based off of their appearance, you may have taken facets of Snape’s malevolent appearance as “data” on Snape. This “data” would allow you to garner “information” on his character, because most figures that look like this in literature or superhero movies are bad or evil characters. Then, thanks to your experience of encountering Snape throughout the novels, this data would lead you to a position of “knowledge” or even to a position of “wisdom” regarding Snape (and peoples like Snape). This is where what we believe to be “wisdom” can start to cause problems.

—Pause here to note that the backstory of many of the villains in superhero movies are frequently tragic and often not all that different from the backstory of the hero or heroines. Similar data, different narratives, different superhero/ villain costume.—

This is also an example of how data is pre-epistemic. That means data has no truth, it just is, other people come along and clean it up, interpret it, narrativise it, assign it truth-values. Throughout the series Snape just is, he does not explain or excuse himself and he is as true to his love of Lily Potter at the beginning as he is at the end. We just happen not to be privy to his backstory. But Dumbledore is, and Rowling (naturally) is. Without the backstory readers are let run amok regarding the character of Snape, often hating him, feeling anxious in his presence, or fearing for the safety of the characters around him. But how much of that comes from Snape himself, or how much comes from other peoples’ interactions with him and their reading of him? Remember, many of the characters in the Potter-verse have already decided for themselves that Snape is evil and not to be trusted; the narratives we read are fueled by tainted interpretations of his data. Everything from his stare to his tone of voice is presented with adjectives that encourage us to read him as malevolent. The state of “Wisdom” (also, in my case, near-hysterical despair) we arrive at in book six when Snape kills Dumbledore is fuelled by cumulative knowledge and information that stems from misinterpreted data. The Snape-data has been read one way and one way alone, and there is no alternative narrative available, especially not once Dumbledore dies. There is no counter-argument to the data that seems to paint Snape as an absolute villain. None at least until, in his own death scene, he provides the memory (and narrative) that allows Harry to access the “truth value” of Snape’s history and his lifelong love for Lily.

Armed with this poignant insight into Snape’s childhood and history (his personal narrative, his personal account for the data-traces he has left on the Potter-verse), our DIKW pyramid is rewritten from the bottom up, but interestingly, much of it remains the same. The data stays data, but the narratives that we use form information from that data, to create knowledge from that information, and to eventually arrive at a sage-like state of sad wisdom regarding Snape’s sad fate, these narratives change. Snape still says the things he says, and does the things he does, we are just newly wise to his motives. We now know he is a good man. Same data, completely different DIKW.

We already known the DIKW model is problematic and oversimplified. As Christine Borgman notes, the “tripartite division of data, information, and knowledge […] oversimplifies the relationships among these complex constructs.”[1] Data is reinterpretable. And this is key. For the majority of the series Snape is continually heard, seen, and spoken about by characters in the texts using adjectives that assign morally dubious traits to his character. The –IKW part of Snape’s DIKW is unbalanced, because we do not get Snape’s personal narrative until the very end of his story. And it is only through this narrative that we can reassess the data on him we have collected, discarding some of it (such as the tendency to dramatise his appearance into one akin to a stage villain) as absolute rubbish, and reassessing what remains.

While Snape may carry the visual appearance (visual data) that makes it easy for us to suspect or infer that he is a “bad character,” while he may even carry the physiognomical hallmarks that hark back to racist character profiling in English literature, he is essentially a good person. What Rowling is saying here, in the most epic Rowling-esque fashion, is do not judge a person based on their appearance. Judge them on what they do and why they do it. This is Rowling’s real trump card to the “evil is as evil looks” camp of Snape-hating Potter fans. And arguably it is also Rowling’s way of redressing this unpleasant facet of English literary history, which sees race presented through face, and race or racial stereotypes sadly being presented as a measure of a character’s moral compass. Rowling writes back against this tradition by having Snape carry the same facial features of these similarly maligned “villainous” figures, features past readers or audiences would have taken as crude indications of his untrustworthiness. Yet instead of being “untrustworthy,” this same hook-nosed figure turns out to be one of the bravest, strongest, truest characters in the series. Take that Shakespeare and Dickens. Whup-ah.

So, we come back then to the problem of DIKW. Models such as DIKW create misleading and misrepresentative impressions about the supposed distinctions between the various facets of DIKW. DIKW also belies the central role narrative plays in all of this; narrative is the conveyor of information, knowledge, and wisdom. It is how we articulate and spread our opinions on data. And data is the foundation of DIKW, so depending on how that data is narrativised, the other elements in this hierarchy can be drastically different. One sees Snape as evil, and reads this wickedness in and into his every scene, up until his death. The other asks us to think carefully about what we do with our data, and the narratives we create from it, because even when we are wholly convinced in the veracity and justifiableness of our “wisdom,” we could be totally wrong, as we were with Snape.

[1] Christine L. Borgman, “Big Data, Little Data, No Data,” MIT Press, 17, accessed April 7, 2017, https://mitpress.mit.edu/big-data-little-data-no-data, 17.

The Revolution of the McWord; or, why difference and complexity is necessary.

May 12, 2017May 15, 2017 by georginanugentfolan

“It’s a beautiful thing, the destruction of words.”—George Orwell, 1984.

One of my earliest memories of being reprimanded happened when I was in Junior or Senior Infants at Primary School. During a French lesson I needed to use the bathroom (tiny humans often do). We had been told we could only communicate in French, so I sat there attempting to gather and translate my toilet related thoughts into something suitably Francophone. Eventually I put up my hand, got the teacher’s attention, pointed at my chest and said “Moi,” pointed at the door that lead to the bathrooms and said “toilette?” The teacher snapped and said “No Georgina you are not a toilet!”

A little harsh perhaps, especially considering I was a four-year old three-foot high mini-human, but still, I haven’t forgotten it, and now I’m fluent in French. My effort at breaking down a language barrier caused someone to snap and (it seems) be insulted by my tiny human attempt at French.

So certainly I agree with Jennifer’s point from her recent blog article here that “Building intimacy (for this is what I take the phrase “brings you closer” to mean) is not about having a rough idea of what someone is saying, it is about understanding the nuance of every gesture, every reference and resonance.” This is part of the (many) reason(s) why people on the autism spectrum, for example, find social interaction so difficult, because of a difficulty understanding these very gestural nuances that are so central to human communications. And this lack of understanding often brings with it frustration, isolation, loneliness, and pain. The point is: it’s not just about the words; it’s about how they are said, the tone, the gesture, the contexts. These are things a translation program cannot understand or impart, and it is arrogant to suggest that such facets of communication are by-passable or expendable when so many people struggle with them on a day-to-day basis. Moreover, they are facets of human communication that cannot be erased or eliminated from speech-exchanges with a view to making these exchanges “simpler” or “doubleplusgood.” That brings us right into 1984 territory.

From the perspective of Eugene Jolas, author of the “Revolution of the Word” manifesto published in transition magazine, a modernist periodical active in Paris throughout the 1920s and 1930s whose contributors included the likes of James Joyce, Gertrude Stein, and Samuel Beckett, language was not complex enough:

Tired of the spectacle of short stories, novels, poems and plays still under the hegemony of the banal word, monotonous syntax, static psychology, descriptive naturalism, and desirous of crystallizing a viewpoint… Narrative is not mere anecdote, but the projection of a metamorphosis of reality” and that “The literary creator has the right to disintegrate the primal matter of words imposed on him by textbooks and dictionaries.”[1]

So, language, or rather languages (Jolas was fluent in several and often wrote in an amalgamation that, he felt, better reflected his hybridic Franco-German (Alsatian) and American identity[2]), was not complex enough to fully express the totality of reality.

Mark Zuckerburg proposes something of a devolution, a de-creation, a simplifying of difference, a reintegration and amalgamation of the facets that distinguish us from others. But while it might appear useful (Esperanto, anyone?), is the experience going to lead to richer conversations? To a demolition of barriers? Or will it result in something akin to Point It, the highly successful so-called “Travellers Language Kit” that contains no language at all, but rather an assortment of pictures that allow one to leaf through the book and point at the item you want.

So, with a sensitive interlocutor, one could perhaps intuit that “Me *points at* Coca-Cola” means “Hi, I would like a Coca-Cola.” Or that “Me *points at* toilet” would likely mean “Hi, I desperately need to use your bathroom, could you be so kind as to point me in the right direction?”

After my childhood French toilet incident, I myself could never overcome the mortification involved in using Point It. But even if I did, would the success of an exchange wherein “Me *points at* Coca-Cola” results in my being handed a Coca-Cola give me the same satisfaction as my first successful exchange with someone in another language did? The first time I managed to say something in French to a French person in France and be met with a response in French as opposed to a confused look or (worse) a response in English. Would Zuckerberg’s own much-lauded trip to China in 2014 where he was interviewed and responded in Mandarin have received as much positive press if he had worked through an interpreter, or used a pioneering neural network translation platform? I don’t think so.

Reducing or eliminating language difference also creates hierarchies, and this is dangerous. What language will we agree to communicate in? Why one language and not another? What facets of my individuality are accented in my native language that are perhaps left out or lost in another?

In short, there is an ethical element to this, and one that must be acknowledged and addressed. It’s similar to the argument Todd Presner articulates in “The Ethics of the Algorithm” when he notes the negative affect of reducing human experience (in this case, the testimonies of Holocaust survisors) to keywords so that their experiences become “searchable”: “it abstracts and reduces the human complexity of the victims’ lives to quantized units and structured data. In a word, it appears to be de-humanizing”[3]

We have to resist what Presner calls “the impulse to quantify, modularize, distantiate, technify, and bureaucratize the subjective individuality of human experience,”[4] even if this impulse is driven by a desire to facilitate communications across perceived borders. Finding, maintaining, and celebrating the individual in an era that is putting increasing pressure to separate the “in-” from the “-dividual” for the sake of facility will lead (and has perhaps already lead, if we can refer back to Don DeLillo’s 1985 observation that “You are the sum total of your data.”) to the era of the “dividua”[5]; where instead of championing individuality, people are reduced to their component data sets, or rather the facets of their personhood that can be assigned to data sets, with the rest—the enigmatic “in-” that makes up an individual—deemed unnecessary, a “barrier” to facile communications.

Rather that working to fractalise language, as Eugene Jolas did, universal translation (which is itself a misnomer, all translations are, to a degree, inexact and entail a degree of intuition or creativity to render one word in or through another word) simplifies that which cannot, and should not, be simplified. This would be doubleplusungood.

Complexity matters.

KPLEX matters.

[1] Eugene Jolas, “Revolution of the Word,” transition 16/17, 1929.

[2] Born in the New Jersey, Jolas moved to Europe the bilingual Alsace-Lorraine region as a young child, and later spent key formative years in the United States.

[3] Presner, in “The Ethics of the Algorithm: Close and Distant Listening to the Shoah Foundation Visual History Archive” in Fogu, Claudio, Kansteiner, Wulf, and Presner, Todd, Probing the Ethics of Holocaust Culture..

[4] Presner, in ibid.

[5] Deleuze, “Postscript on the Societies of Control.”

“Voluptuousness:” The Fourth “V” of Big Data?

April 3, 2017 by Jennifer Edmond

The KPLEX project is founded upon a recognition that definitions of the word ‘data’ tend to vary according to the perspective of the person using the word. It is therefore useful, at least, to have a standard definition of ‘big’ data.

Big Data is commonly understood as meeting 3 criteria, each conveniently able to be described with a word beginning with the letter V: Volume, Velocity and Variety.

Volume is perhaps the easiest to understand, but a lot of data really means a lot. Facebook, for example, stores more than 250 billion image . ‘Nuf said.

Velocity is how fast the data grows. That >250 billion images figure is estimated to be growing by a further 300-900 million per day (depending on what source you look at). Yeah.

Variety refers to the different formats, structures etc. you have in any data set.

Now, from a humanities data point of view, these vectors are interesting. Very few humanities data sets would be recognised as meeting criteria 1 or 2, though some (like the Shoah Foundation Video History Archive) come close. But the comparatively low number of high volume digital cultural datasets is related to the question of velocity: the fact that so many of these information sources have existed for hundreds of years or longer in analogue formats means that recasting them as digital is a highly expensive process, and born digital data is only just proving its value for the humanist researcher.

But Variety? Now you are talking. If nothing else, we do have huge heterogeneity in the data, even before we consider the analogue as well as the digital forms.

Cultural data makes us consider another vector as well, however: if it must start with V, I will call it “voluptuousness.” Cultural data can be steeped in meaning, referring to a massive further body of cultural information stored outside of the dataset itself. This interconnectedness means that some data can be exceptionally dense, or exceptionally rich, without being large. Think “to be or not to be;” think the Mona Lisa; think of a Bashō haiku. These are the ultimate big data, which, while tiny in terms of their footprint of 1s and 0s, sit at the centre of huge nets of referents, referents we can easily trace through the resonance of the words and images across people, cultures, and time.

Will the voluptuousness of data be the next big computational challenge?