Linguistic data is inherently multidimensional: complex interactions between different linguistic features and structures are the norm rather than the exception. Historical language change is typically the result of such complex interactions. The core remit of historical linguistics is to identify a language change and to understand how the relevant factors have interacted with each other over time to bring it about. Statistical calculations are an extremely useful means of quantifying changes in individual linguistic structures, but they are not per se suited to uncovering and understanding the complex feature interactions involved in a change. The central issue is that statistical tests are designed to test a given hypothesis, whereas the precise factors leading to a language change are often unknown or at least highly debated among researchers. Without a priori knowledge about the interactions potentially contained in the data, it is hardly feasible to formulate the hypotheses that statistical testing requires. Moreover, data sparsity is a relevant issue, since historical linguists are limited to working with those texts that have happened to survive.

In my thesis, I used methods from the field of visual analytics to investigate historical change, illustrating the power and efficacy of data visualization and visual analytics for historical linguistic research. Visual analytics tools and techniques are generally suited to information synthesis and are designed to help derive insights from large amounts of dynamic, ambiguous, and often conflicting data, properties that are also characteristic of linguistic data. For historical linguistics, this facilitates the identification of language change and of the relevant interacting factors, furthering our understanding of the historical data.

More specifically, I investigated the factors conditioning the occurrence of dative subjects in the history of Icelandic on the basis of data from the Icelandic Parsed Historical Corpus (IcePaHC), using two different types of visualization. I chose to examine dative subjects in the history of Icelandic because the Icelandic case system presents an interesting and challenging linguistic puzzle. In general, languages tend to signal grammatical relations (subject and object) by word order, case marking, agreement, or a combination of these. Icelandic is atypical in this respect: it has a rather rigid word order, but has also retained a rich morphological case system over the centuries. Moreover, the language has non-nominative subjects; in particular, the synchronic existence of dative subjects is well established. From a historical perspective, dative subjects have also attracted a good deal of research, specifically with respect to the question of whether they are a common Proto-Indo-European feature or a more recent historical innovation.

The first visualization tool employed for data analysis in my thesis is a glyph visualization, which I developed together with Dominik Sacha, a former member of Daniel Keim’s Data Analysis and Visualization group at the University of Konstanz. I used the glyph visualization to investigate the interaction between dative subjects, lexical semantics, event structure and voice, i.e., factors which the existing literature has suggested as relevant for the diachrony of dative subjects in Icelandic.

Figure 1: Glyph representations available in the glyph visualization showing interactions between dative subjects, different event structure verb classes, and voice in IcePaHC

Using the glyph visualization greatly facilitated the linguistic analysis, since it provides an overview while also enabling interactive, exploratory access to multiple interactions at different levels of detail (see Figure 1). By means of the glyph visualization, I was able to show that dative case marking on subjects indeed correlates with particular event structure configurations. In addition, I found that the distribution of dative subjects changed over the history of Icelandic with respect to lexical semantics and event structure. Furthermore, I found that middle formation is the driving force behind the observed changes, an interrelation that had previously been unknown.

The second corpus investigation in my thesis examined the interrelation between dative subjects and word order in IcePaHC more closely. To this end, I visualized the diachronic interaction of subject case marking and word order, in particular subject position and verb placement, using the HistoBankVis visualization system. HistoBankVis was developed together with Frederik Dennig, Michael Blumenschein and Daniel Keim as part of the collaboration between projects A03 and D02 of the SFB-TRR 161. HistoBankVis allows for the sophisticated and flexible investigation of multiple interacting features from different linguistic dimensions via different visualization components, supporting the researcher in both testing and generating hypotheses. In particular, the dimension interaction visualization component, which is based on Parallel Sets, proved to be a powerful tool for investigating the feature interactions that typically underlie historical change.
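To make the Parallel Sets idea concrete: each dimension becomes an axis, each category a segment on that axis, and the ribbons connecting segments across axes are drawn with widths proportional to the number of sentences sharing that combination of categories. The minimal Python sketch below computes those counts and widths from annotated records. It is an illustration under assumptions, not the HistoBankVis implementation; the dimension names (`voice`, `subject_position`), the category labels and the toy sentences are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence, Tuple

Combination = Tuple[str, ...]

def parallel_sets_counts(
    records: Iterable[Mapping[str, str]],
    dimensions: Sequence[str],
) -> Counter:
    """Count how many records share each combination of category values.

    In a Parallel Sets layout, every dimension is an axis, every category a
    segment on that axis, and the ribbon tracing one combination across the
    axes is drawn with a width proportional to this count.
    """
    return Counter(tuple(r[d] for d in dimensions) for r in records)

def axis_segments(records: Iterable[Mapping[str, str]], dimension: str) -> Counter:
    """Segment sizes on a single axis: the marginal count of each category."""
    return Counter(r[dimension] for r in records)

def ribbon_widths(counts: Counter) -> Dict[Combination, float]:
    """Normalize counts so that all ribbons together span the full axis width."""
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}

# Invented toy annotations for illustration only (not IcePaHC data);
# the dimension names and category labels are hypothetical.
toy_sentences = [
    {"voice": "middle", "subject_position": "initial"},
    {"voice": "middle", "subject_position": "initial"},
    {"voice": "active", "subject_position": "non-initial"},
    {"voice": "active", "subject_position": "initial"},
]

counts = parallel_sets_counts(toy_sentences, ["voice", "subject_position"])
widths = ribbon_widths(counts)
print(counts[("middle", "initial")])  # 2
print(axis_segments(toy_sentences, "voice"))
```

The order of the `dimensions` argument corresponds to the order of the axes: reordering changes which pairs of axes are adjacent in the drawing, but not the counts of the full category combinations.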

Figure 2: Dimension interaction visualization for voice and word order in sentences with a dative subject from IcePaHC from 1750-1899 (top) and 1900-2008 (bottom)

Via HistoBankVis, I showed that subjects in Icelandic are increasingly realized in clause-initial position over time. Moreover, the dimension interactions showed that dative subjects have a weaker tendency than subjects overall to be realized in a particular structural position. Only from 1900 onwards, which is also the period in which dative subjects show the largest change with respect to event structure and middle formation, do dative subjects follow suit and occur mainly clause-initially. The dimension interactions additionally revealed a correlation between the increase of dative subjects in middle voice and the increasing realization of dative subjects in clause-initial position (see Figure 2).
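The comparison above rests on relative frequencies: for each period, the share of subjects that are clause-initial, computed separately for dative subjects and for subjects overall. The following minimal Python sketch illustrates that arithmetic under stated assumptions: the `Subject` record, its labels and the period binning are hypothetical, the toy records are invented rather than taken from IcePaHC, and this is not a description of how HistoBankVis computes its views.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

Key = Tuple[str, str]  # (period, subject case)

@dataclass(frozen=True)
class Subject:
    year: int
    case: str      # e.g. "NOM", "DAT" (hypothetical labels)
    position: str  # "initial" or "non-initial" (hypothetical labels)

def period_of(year: int) -> str:
    """Hypothetical binning into the two periods compared in Figure 2.

    Assumes every year lies within 1750-2008; earlier texts would need more bins.
    """
    return "1750-1899" if year < 1900 else "1900-2008"

def clause_initial_shares(
    subjects: Iterable[Subject],
    bin_year: Callable[[int], str] = period_of,
) -> Dict[Key, float]:
    """Share of clause-initial subjects per (period, case).

    Each subject is counted under its own case and under "ALL", so dative
    subjects can be compared directly with subjects overall.
    """
    totals: Dict[Key, int] = defaultdict(int)
    initial: Dict[Key, int] = defaultdict(int)
    for s in subjects:
        period = bin_year(s.year)
        for case in (s.case, "ALL"):
            totals[(period, case)] += 1
            if s.position == "initial":
                initial[(period, case)] += 1
    return {key: initial[key] / totals[key] for key in totals}

# Invented toy records for illustration only (not IcePaHC counts).
toy_subjects = [
    Subject(1820, "DAT", "initial"),
    Subject(1820, "DAT", "non-initial"),
    Subject(1850, "NOM", "initial"),
    Subject(1950, "DAT", "initial"),
    Subject(1980, "NOM", "initial"),
]

shares = clause_initial_shares(toy_subjects)
```

In this toy data, half of the dative subjects before 1900 are clause-initial versus two thirds of all subjects, mimicking the kind of contrast described above; the numbers themselves carry no empirical weight.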

In sum, the investigations presented in my thesis showed that dative subjects are part of a complex, interlinked, but changing system in which case, word order, grammatical relations, lexical semantics and event structure interact in the mapping (or linking) of arguments to grammatical relations. It is thus rather unlikely that dative subjects are a Proto-Indo-European inheritance. In addition, to account for the complexity of the Icelandic linking system and the diachronic developments evidenced by the corpus investigations, I developed a novel linking theory couched in the Lexical-Functional Grammar framework. This theory factors in the features relevant for licensing case and grammatical relations in Icelandic, making a theoretical linguistic contribution informed by the visual analysis of the historical corpus data.

Dative Subjects: Historical Change Visualized

Christin Beck (formerly Schätzle) is a postdoctoral researcher in Miriam Butt’s computational linguistics group at the Department of Linguistics, University of Konstanz, working in Project D02 of the SFB-TRR 161. Her research interests are visual analytics for historical linguistics, computational methods for investigating language change in historical corpora, and computational lexical semantic resources.
