The Challenge for the Computational Sciences in Digital Humanities: Establishing a Common Meta-Methodological Framework

Jonas Kuhn, University of Stuttgart (jonas.kuhn@ims.uni-stuttgart.de)
August 2014

In my view, the major challenge for the branch of Digital Humanities that aims to exploit the potential of scalable computational models is to go beyond the status quo, in which the relatively few studies that go to greater analytical depth typically involve custom-tailored technological solutions, developed and implemented in direct tandem collaborations, whose portability to other studies and disciplines is often unclear. Non-trivial solutions typically have to draw on a complex collection of resources that require appropriate adjustment and adequate combination techniques. Very often, the working assumptions of certain tool components will fail to be met in a different domain, which can cause unexpected and hard-to-predict system behavior even when, formally, all interfaces adhere to the required exchange formats. The lack of portability and scalability (at moderate cost) is a serious issue, since it can lead to disappointment at various points:

  • Engineering efficiency. Funding sources investing money, as well as humanities partners investing considerable time in interface specifications, guidelines and (test) annotation, will expect that future projects can re-use major parts of earlier efforts.
  • Theoretical flexibility. Scholars applying computational modeling frameworks will hope to be able to factor out theory-specific assumptions from a prior empirical study while still taking advantage of the algorithmic facilities.
  • Scientific rigor. Ill-informed re-use of modeling solutions outside their intended scope is likely to lead to unnoticed artifacts in the system predictions; thus there is a considerable risk that apparent breakthroughs from DH methods will not stand the test of critical assessment by other scholars from the disciplines, or will turn out to be impossible to replicate. This may discredit the use of such techniques.

One way to respond to these issues is to restrict DH support to generic exploration tools that avoid going into greater analytical depth. But this would sacrifice most of the goal of opening up innovative avenues of research in the digital humanities (besides the fact that it is probably an illusion that there are safe generic analytical steps, given the diversity of data resources): generic support tools often lead to scholars’ disappointment, since the effort of incorporating them into one’s working methodology does not pay off. So much manual work is left that it seems more efficient to leave out the analytical support step completely.
I am convinced that effective use of computational models in the humanities can be a success. There is great potential in the modeling frameworks that have been developed over the last decades (for instance in computational linguistics and language technology). But to exploit them, the field needs a more systematic methodology that breaks down analytical processes into building blocks whose "deeper" functionality is transparent to the users in the humanities, so that they are in a position to make their own critical assessment of the reliability of a particular component or component chain and to arrange for adjustments as necessary. Crucially, the meta-architecture to be established should also include best practices for the non-computational intermediate steps that are required to bridge the methodological gap between data-based empirical results and higher-level disciplinary research questions. Ultimately, digital humanities scholars should feel fully competent to draw upon a flexible methodological toolbox, so that they can try backing up partial results from one component with evidence obtained from other sources, make informed adjustments to the components, or attempt an entirely different way of approaching the available information sources.
In other words, the mid- to long-term goal should not be to have IT specialists optimize a tool chain for fully automatic analysis so as to achieve the best possible performance on some specified task (which is bound to be imperfect for any non-trivial question anyway, and thus requires responsible integration into higher-level research questions).
The digital humanities should rather aim to create transparency within a complex multi-purpose system of interacting information sources of variable quality and reliability (as a straightforward extension of the classical competences that humanities scholars have always had regarding approaches to their object of study). Contrary to the assumptions one can make about the typical users in a standard web-oriented application scenario of language technology and visual analytics (where users rarely have any philological or other meta-level attachment to the texts from which they are seeking information), humanities scholars have far-reaching competences and intuitions about their objects of study and their sources. This makes the goal of developing an interactive framework for a network of knowledge sources a promising endeavor, drawing on techniques for aggregation, diagnostic and explorative visualization, quantitative analysis, and linking back to data instances and (re-)annotation tools. In particular, this means that computer science is not in the role of a mere service provider: the challenges that the use of computational models in the humanities poses are genuine scientific challenges for Computer Science (or Informatics) in its own right.
My hope is that a comprehensive and truly interdisciplinary digital humanities methodology will provide an effective choice of flexibly adjustable tools that are accessible to critical reflection and compatible with classical disciplinary methods, and will thus lead to a breakthrough that overcomes the still widespread limitations on the types of higher-level, subject-specific research questions actively addressed in DH work, despite the availability of growing amounts of digitized resources.
The most critical issue in practice, I believe, will be to trade off the optimization criteria mentioned above: Engineering efficiency, Theoretical flexibility and Scientific rigor. For establishing innovative computational methods in multiple disciplines, it is important for the partners in the computational sciences to showcase effective ways of porting and scaling up solutions without excessive cost and effort. Otherwise, the established cores of the humanities disciplines will not see the merit of a paradigm shift, and DH would remain a special community that is not anchored in the relevant fields. During some transitional period, it may hence be strategically justified to place slightly less emphasis on rigorously checking all prerequisite assumptions, since such checking might slow down imaginative exploration of newly established approaches. However, this means walking a thin line. Since the scholar loses direct control over the data when using computational models, it is crucial to establish standardized evaluation techniques. Otherwise, artifacts in computational modeling results are highly likely to give rise to misinterpretations, which would discredit the approaches in the core disciplines. In the long run, the digital humanities should place utmost importance on the highest scientific standards in their methodology, to avoid any impression that computational models can be used to bring about effects that are not warranted empirically.
So far, the main emphasis in this short contribution has been on the viewpoint of the computational sciences, asking in which ways they may have to adjust in order to facilitate effective interdisciplinary work. Of course, success also depends on some degree of flexibility on the side of the humanities partners. From my own and reported experience, the most critical obstacle to establishing an effective collaboration is the fundamentally different role that a meta-level understanding of methodological challenges plays on the two sides. For computational scientists, the ultimate driving force behind scholarly work, so self-evident that it need not even be mentioned, is finding a general solution to a class of similar problems; hence the expectation that it is both necessary and helpful to switch back and forth between problems that are related or similar along certain dimensions. For the humanities scholar, on the other hand, the connection to other work that is considered most important often lies at a level of disciplinary content and theory inaccessible to a computational scientist, whereas the methodologically most similar DH work lies in a more distant content domain or sub-discipline that the humanist hesitates to relate to. It is very important for interdisciplinary success to overcome this dilemma: the solution may lie in an even more multi-disciplinary build-up of research teams and the involvement of “mediators”, i.e., methodologically aware DH scholars who have a high-level background in a number of disciplinary areas and can relay content questions to computational scientists and methodological constraints to humanist specialists.
As a side note, I believe that the field of digital humanities can learn a lot from the experience of computational linguistics, which has faced many of the same technical and methodological issues over the past three decades.
Across the many strands of research in computational linguistics, formal linguistics, corpus linguistics and data-driven language technology, various methodological ways of bringing together apparently incompatible modeling assumptions have been explored in considerable depth (and the issues of empirical evaluation methods in particular have been debated for a long time). (This is not to say that the computationally ambitious branch of digital humanities should or could simply take over solutions one-to-one from computational linguistics; there may be mistakes that the digital humanities can avoid, and the broader spectrum of research questions and objects of study addressed obviously poses additional challenges. But it would be rather unreasonable not to take advantage of the many years of practical experiments in the sociology of science under similar conditions…)