On Covering the Gap between Computation and Humanities

Alexander Mehler
Goethe University Frankfurt, DE

Andy Lücking
Goethe University Frankfurt, DE

October 23, 2014

Since digital or computational humanities (CH) began its triumphant advance through the research landscape of the humanities, it is advisable to take a closer look at its methodological and epistemological range. To this end, we look at CH from the point of view of preprocessing, machine learning, and the general philosophy of science and experimental methodology. From these perspectives, a number of gaps between CH on the one hand and the classical humanities on the other can be identified. These gaps open up when considering: (i) the status of preprocessing in CH, its logical workflow, and the evaluation of its results compared to the needs and terminological apparatus of the humanities. Most importantly, corpus preprocessing is often carried out before hypothesis formation and the corresponding model selection, which turns the logically as well as methodologically required workflow upside down. (ii) the predominant role of functional explanations in CH applications vs. the predominant role of intentional explanations in the humanities. While computational processes can at most be evaluated functionally, hypotheses in the humanities are usually embedded in contexts of justification that draw on some intentional statement. (iii) the possibilities of falsifying CH hypotheses and hypotheses in the humanities. Given the different typical patterns of explanation (see (ii) above), neither the results of computations nor those of the humanities can be subjected to falsification, the powerful methodology known from the natural, experimental sciences. This leaves open questions about the validity of these results. (iv) the use of big data in CH vs. the use of deep data in the humanities. Analyses in the humanities usually involve the interpretation and rational reconstruction of their objects. This hermeneutic procedure goes beyond the mere preprocessing and parsing of those objects, which is what typically lies within reach of CH applications. When interpreted and preprocessed data are gathered into corpora (which, admittedly, is rarely done in the humanities), the two approaches yield different kinds of resources that may be of only marginal benefit to the respective other party. (v) the lack of experimental methods in both CH and the humanities. In order to implement a notion of falsification in CH, one first needs to devise CH-specific experimental settings that give rise to test procedures.

Based on these assessments, we argue that there are at least five interrelated gaps between computation and the humanities, namely (1) an epistemological gap regarding the kind of evaluation mainly addressed by computational models in contrast to the kind of explanation addressed in the humanities; (2) a data-related gap regarding the build-up of ever-growing text corpora in computer science in contrast to the need for controlled as well as deeply annotated data in the humanities; (3) a semiotic gap regarding signs as strings in CH in contrast to the rich sign-theoretical notions employed in the humanities; (4) a methodological gap concerning humanities scholars' understanding of how the methods of computer science function; and (5) an interpretation gap regarding the grounding of statistical findings in the theoretical terms of the humanities disciplines involved. Having diagnosed these gaps, we outline two steps that could narrow (some of) them: first, the understanding of CH technologies should be fostered by making them part of the curriculum. Second, we should develop hybrid algorithmic methods, that is, methods that involve humanist expertise at crucial branching points from the outset and may thereby pave the way towards “hermeneutic technologies” as a special kind of human-based evolutionary computing.