“Digital History” vs “Data Science for History”
Any discussion of the intersection of history and new technologies (new to the field of history, at least) will inevitably run into a morass of terminology. Academic writing isn’t exactly known for being clear and concise in the first place, and with novel terms constantly being coined to describe each advance, anyone foolhardy enough to try writing about it finds themself in a minefield of changing, often-debated usage. I am apparently that foolhardy, and I think some clarification of terms is necessary before going any further.
Digital history is a sub-field of digital humanities, the current buzzword in humanities academia. Exactly what is and is not digital history is somewhat nebulous and up for debate, but the basic definition is fairly simple. According to Douglas Seefeldt and William G. Thomas in their aptly named 2009 article “What is Digital History?”,
“Digital history might be understood broadly as an approach to examining and representing the past that works with the new communication technologies of the computer, the internet network, and software systems.”
For those of us looking for a more succinct if perhaps less precise definition, digital history can be summed up as “history with computers.” As one might imagine, something that vague comprises a pretty big tent.
As digital history continues to develop, further sub-disciplines within it will doubtless become apparent, but for now it is only possible to distinguish with confidence between the two main functions of digital history: representative and interpretive. The representative function of digital history involves gathering materials and placing them in a digital context where they can be explored in light of one another, presented as something more engaging than a shelf of archival boxes. The interpretive function of digital history, on the other hand, involves transforming materials by digital means in order to gain a new understanding of them. It is worthwhile to note that digitization alone is not digital history, only the raw material for it. The actual core of digital history is in the selection, interpretation, and presentation of material with regard to a historical question.
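To make the interpretive function a little more concrete, here is a minimal sketch in Python of one very simple way to transform materials by digital means: counting how often each word appears across a small collection of texts. The two sample "letters," their names, and the helper functions word_counts and corpus_counts are all invented for illustration; a real project would start from actual digitized sources and would likely need more careful text cleaning.

```python
from collections import Counter
import re

# Invented sample texts standing in for digitized sources (not real letters).
documents = {
    "letter_1861": "The regiment marched at dawn. The men were tired, but the regiment held.",
    "letter_1862": "Rain again. The road was mud and the men were hungry.",
}

def word_counts(text):
    """Lowercase the text, pull out runs of letters/apostrophes, and count them."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

def corpus_counts(docs):
    """Add up the per-document word counts across the whole collection."""
    total = Counter()
    for text in docs.values():
        total.update(word_counts(text))
    return total

totals = corpus_counts(documents)

# Show the five most frequent words in the collection.
print(totals.most_common(5))
```

On its own, a table of word counts is not history. It only becomes part of the interpretive function once it is read against a historical question, such as which concerns dominate a soldier's letters in one year compared to the next.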
Data science, like digital history, is a current buzzword and a relatively recent phrase, but unlike digital history, the practice of data science long predates the term. While the definition is again somewhat nebulous, data science is generally agreed to be the practice of applying the scientific method to extract insights from data in order to generate predictions, drive actions, and guide further inquiry. Humans have been using quantitative methods to gather insights from data for centuries, but only in the last few decades has the rise of computers made data wrangling an essential and commonplace part of life. Data is now generated in exponentially greater quantities, and the tools to work with it have improved apace. As such, data science is now complex enough and necessary enough that it is recognized as a field in its own right. In the public consciousness, data science is often conflated with “big data” and artificial intelligence, but these are simply the most visible and attractive facets of a far broader field.
With those two definitions in mind, let’s turn to the distinction between “digital history” and “data science for history.” At first glance, the two phrases may seem synonymous, or perhaps like one is a subset of the other, but neither is actually the case. Digital history requires, by definition, the involvement of computers, but does not require any quantitative component. Data science, on the other hand, exists independently of computers (though I wouldn’t recommend doing it by hand unless you’re Katherine Johnson) and necessarily includes a quantitative component. In practice, the two often coincide, but you could apply data science to a historical question and publish the results as a traditional history paper just as easily as you could create an excellent digital history project using no data science at all. It is therefore improper to consider either dependent on the other. A more accurate statement is that data science for history is a powerful historical tool, often used to augment the interpretive function of digital history.
Similar confusion will occur in any scenario where methods are combined from widely disparate fields. Each field brings its own lexicon, and what’s seen as a minor facet in one may be an entire branch of another. For instance, a concept in the digital humanities called “distant reading” long predates the computer age under other names, and in data science it would be considered a form of natural language processing (NLP). I’ll do my best to pick apart these linguistic snarls in future blog posts, but it’s worthwhile to keep in mind that in developing fields, terms often experience changing and conflicting usage. Stay alert to it, and if worst comes to worst, please forgive the occasional gaffe by your local data science/history blogger. He’s trying his best!