Authors:Marieke Meelen, David Willis Pages: 1 - 5 Abstract: This Special Issue derives from a workshop ‘Creating annotated corpora for historical languages’, held at Selwyn College, Cambridge on 26–27 September 2019. The workshop formed part of a wider project ‘Developing a Welsh Historical Treebank’, funded by the British Academy and Leverhulme Trust, which aimed to develop conventions and procedures that might form the basis for a fully parsed representative corpus of historical Welsh texts. The workshop was designed to share experience of building annotated historical corpora, focusing in particular on the technical issues involved. PubDate: 2022-06-27 DOI: 10.18148/hs/2022.v6i4-11.164 Issue No:Vol. 6, No. 4-11 (2022)
Authors:Marieke Meelen, David Willis Pages: 1 - 32 Abstract: This article examines various issues involved in constructing a parsed Penn-style representative historical corpus of Middle and Modern Welsh. Specifically, it focuses on what structures to adopt for constituency-based structural descriptions in three case studies: (i) whether to adopt relatively more or less hierarchical structures at the phrasal level and above; (ii) how to deal with complex prepositional phrases, typically containing a grammaticalizing or grammaticalized noun as one of their elements; and (iii) how to deal with coordination of main clauses and omission of elements shared between clauses. In each case, we see how conventions need to be adopted that facilitate maximal ease of searching for potential users of the corpus; that are robust across many centuries of language change; and that permit efficient and consistent parsing by a team of annotators. PubDate: 2022-06-27 DOI: 10.18148/hs/2022.v6i4-11.135 Issue No:Vol. 6, No. 4-11 (2022)
Authors:Melissa Farasyn, Anne-Sophie Ghyselen, Jacques Van Keymeulen, Anne Breitbarth Pages: 1 - 36 Abstract: This paper reports on the construction of a tagged and parsed pilot corpus of the southern Dutch dialects. The corpus aims to facilitate diachronic research into the syntax of Dutch, as its dialects have retained many interesting (morpho)syntactic features which can often be traced back to changes starting in or characteristics retained from older stages of historical Dutch. The discussion mainly focuses on initial test results achieved by applying existing NLP tools which have been developed or optimised for POS tagging and parsing standard Dutch. We report on initial tests on our data with Frog, TreeTagger and Alpino. We discuss both the general challenges we have encountered in working with spoken, unstandardised language and the specific (morpho)syntactic problems that the southern Dutch dialects pose for POS tagging and parsing. The challenges and solutions we present in this pilot study will inform our choices for the NLP tools we will use or adapt for the development of a more extensive annotated corpus. PubDate: 2022-06-27 DOI: 10.18148/hs/2022.v6i4-11.92 Issue No:Vol. 6, No. 4-11 (2022)
Authors:Nilo Pedrazzini Pages: 1 - 40 Abstract: This paper addresses some of the challenges of carrying out corpus-based linguistic analyses on historical corpora of different sizes and annotation depths. Data from the TOROT Treebank is collected to carry out a case study on Early Slavic dative absolutes, showing the extent to which methodology and results may change depending on the amount of data and the levels of linguistic annotation available. The analysis indicates that deeply-annotated treebanks of limited size can be exploited to establish a solid guideline to analyze a phenomenon in shallowly-annotated corpora and even new, unannotated texts. This is particularly encouraging for historical languages, such as Early Slavic, showing very high diatopic and diachronic variation, which significantly undermines corpus-annotation automation and therefore calls for alternative strategies to counteract data scarcity. PubDate: 2022-06-27 DOI: 10.18148/hs/2022.v6i4-11.96 Issue No:Vol. 6, No. 4-11 (2022)
Authors:Hanne Martine Eckhoff Pages: 1 - 40 Abstract: This article uses extensive treebank data from the PROIEL and TOROT treebanks to track the much-debated rise of the animacy category in Russian, from definiteness-driven differential object marking in Old Church Slavonic via constructionally conditioned variation in Old East Slavonic to fully fledged animacy subgender marking in late Middle Russian. The change is interesting from a methodological point of view as well, since it requires us to annotate data through an ongoing change, and also since conventional treebank annotation is not enough to capture the conditions of the observed variation and change: annotation for semantics and information structure is necessary too. The article describes and defends a conservative approach to annotation in the face of change: the analysis that fits the first attested stage of a change is retained as long as possible. PubDate: 2022-06-27 DOI: 10.18148/hs/2022.v6i4-11.110 Issue No:Vol. 6, No. 4-11 (2022)
Authors:Erich Poppe Pages: 1 - 44 Abstract: This paper focusses on uses of finite and nonfinite verb forms in Early Modern Welsh subordinate clauses in which two or more verbal events are coordinated. In such clauses, three different constructions are already attested in Middle Welsh; one of these was described as the norm in the language of sixteenth-century Welsh Biblical texts by a nineteenth-century grammarian, Thomas Jones Hughes. On the basis of a micro-study of data from these texts, the paper will review his claim and survey the distribution of the relevant syntactic patterns, thereby assessing the potential of the coordination of verbal events in subordinate clauses as a promising area of research in historical syntax and typological linguistics. Based on a comparison of Welsh, Hebrew, and Greek parallel passages, it argues that translational equivalents can be seen to exist specifically between a Welsh construction with a nonfinite form in the second coordinand and formally different constructions in the Hebrew and Greek source texts. PubDate: 2022-06-27 DOI: 10.18148/hs/2022.v6i4-11.94 Issue No:Vol. 6, No. 4-11 (2022)
Authors:Elena Parina Pages: 1 - 23 Abstract: This study investigates the function of overt relative markers (yr hwn etc.) in a sample of the 16th-century Welsh translation of Gesta Romanorum. Comparison with previous findings from a collection of 14th-century texts yields the following results: (1) The relative frequency of the construction significantly increases in this text compared to the earlier period, which points to the expansion of this construction. (2) The data from both the 14th-century sample and the Gesta Romanorum demonstrate that this construction is used to mark non-restrictive relative clauses. (3) Moreover, in Gesta Romanorum, another usage of this construction is found frequently, where overt marking is used in presentative relative clauses. This testifies that the category proposed by Lambrecht (2000) for French is valid for other languages. PubDate: 2022-06-27 DOI: 10.18148/hs/2022.v6i4-11.106 Issue No:Vol. 6, No. 4-11 (2022)
Authors:Raphael Sackmann Pages: 1 - 46 Abstract: In the Welsh language, constructions with nonfinite verb forms, traditionally called ‘verbal nouns’, are found frequently at all periods. Subjects of these forms can be marked in various ways. The frequency and distribution of certain subject markers differ drastically between Middle and Modern Welsh. Subject marking in Early Modern texts is highly variable, but has so far been little researched. This article presents a first micro-study analysing the distribution of different subject markers in nonfinite clauses in one text, Perl mewn Adfyd (1595), a religious treatise translated from English. Somewhat surprisingly, the data from this text already largely correspond to the Modern Welsh system, especially with regard to nonfinite adverbial and complement clauses. Taking into account examples from other texts, and including auxiliary constructions, formally less expected structures are tentatively related to semantic factors. PubDate: 2022-06-27 DOI: 10.18148/hs/2022.v6i4-11.97 Issue No:Vol. 6, No. 4-11 (2022)