
HUMANITIES (280 journals)

Showing 1 - 71 of 71 Journals sorted alphabetically
Aboriginal and Islander Health Worker Journal     Full-text available via subscription   (Followers: 14)
Aboriginal Child at School     Full-text available via subscription   (Followers: 5)
About Performance     Full-text available via subscription   (Followers: 12)
Access     Full-text available via subscription   (Followers: 26)
ACCESS: Critical Perspectives on Communication, Cultural & Policy Studies     Full-text available via subscription   (Followers: 10)
Acta Academica     Full-text available via subscription   (Followers: 6)
Acta Universitaria     Open Access   (Followers: 4)
Adeptus     Open Access   (Followers: 1)
Advocate: Newsletter of the National Tertiary Education Union     Full-text available via subscription   (Followers: 1)
African and Black Diaspora: An International Journal     Hybrid Journal   (Followers: 11)
African Historical Review     Hybrid Journal   (Followers: 17)
AFRREV IJAH : An International Journal of Arts and Humanities     Open Access   (Followers: 2)
Agriculture and Human Values     Hybrid Journal   (Followers: 13)
Akademika : Journal of Southeast Asia Social Sciences and Humanities     Open Access   (Followers: 5)
Aldébaran     Open Access   (Followers: 3)
Alterstice : Revue internationale de la recherche interculturelle     Open Access  
Altre Modernità     Open Access   (Followers: 4)
Amaltea. Revista de mitocrítica     Open Access   (Followers: 1)
American Imago     Full-text available via subscription   (Followers: 3)
American Journal of Humanities and Social Sciences     Open Access   (Followers: 10)
American Review of Canadian Studies     Hybrid Journal   (Followers: 7)
Anabases     Open Access  
Analyse & Kritik. Zeitschrift f     Full-text available via subscription   (Followers: 1)
Angelaki: Journal of Theoretical Humanities     Hybrid Journal   (Followers: 17)
Antik Tanulmányok     Full-text available via subscription  
Antipode     Hybrid Journal   (Followers: 50)
Anuario Americanista Europeo     Open Access  
Arbutus Review     Open Access  
Argumentation et analyse du discours     Open Access   (Followers: 6)
Ars & Humanitas     Open Access   (Followers: 9)
Arts and Humanities in Higher Education     Hybrid Journal   (Followers: 34)
Asia Europe Journal     Hybrid Journal   (Followers: 5)
Australasian Journal of Popular Culture, The     Hybrid Journal   (Followers: 2)
Behaviour & Information Technology     Hybrid Journal   (Followers: 52)
Behemoth     Open Access   (Followers: 3)
Bereavement Care     Hybrid Journal   (Followers: 11)
Cahiers de praxématique     Open Access   (Followers: 1)
Carl Beck Papers in Russian and East European Studies     Full-text available via subscription   (Followers: 5)
Child Care     Full-text available via subscription   (Followers: 7)
Choreographic Practices     Hybrid Journal   (Followers: 1)
Chronicle of Philanthropy     Full-text available via subscription   (Followers: 2)
Ciencias Sociales y Humanidades     Open Access   (Followers: 1)
Claroscuro     Open Access   (Followers: 1)
Co-herencia     Open Access  
Coaching: An International Journal of Theory, Research and Practice     Hybrid Journal   (Followers: 10)
Cogent Arts & Humanities     Open Access   (Followers: 3)
Colloquia Humanistica     Open Access  
Communication and Critical/Cultural Studies     Hybrid Journal   (Followers: 26)
Comprehensive Therapy     Hybrid Journal   (Followers: 3)
Congenital Anomalies     Hybrid Journal   (Followers: 1)
Conjunctions. Transdisciplinary Journal of Cultural Participation     Open Access   (Followers: 3)
Conservation Science in Cultural Heritage     Open Access   (Followers: 10)
Cornish Studies     Hybrid Journal   (Followers: 2)
Creative Industries Journal     Hybrid Journal   (Followers: 9)
Critical Arts : South-North Cultural and Media Studies     Hybrid Journal   (Followers: 11)
Crossing the Border : International Journal of Interdisciplinary Studies     Open Access   (Followers: 4)
Cuadernos de historia de España     Open Access   (Followers: 4)
Cultural History     Hybrid Journal   (Followers: 25)
Cultural Studies     Hybrid Journal   (Followers: 51)
Culturas     Open Access   (Followers: 1)
Culture, Theory and Critique     Hybrid Journal   (Followers: 27)
Daedalus     Hybrid Journal   (Followers: 21)
Dandelion : Postgraduate Arts Journal & Research Network     Open Access   (Followers: 2)
Death Studies     Hybrid Journal   (Followers: 18)
Debatte: Journal of Contemporary Central and Eastern Europe     Hybrid Journal   (Followers: 5)
Digital Humanities Quarterly     Open Access   (Followers: 58)
Diogenes     Hybrid Journal   (Followers: 8)
Doct-Us Journal     Open Access  
Dorsal Revista de Estudios Foucaultianos     Open Access  
e-Hum : Revista das Áreas de Humanidade do Centro Universitário de Belo Horizonte     Open Access   (Followers: 1)
Early Modern Culture Online     Open Access   (Followers: 40)
Égypte - Monde arabe     Open Access   (Followers: 4)
Eighteenth-Century Fiction     Full-text available via subscription   (Followers: 20)
Éire-Ireland     Full-text available via subscription   (Followers: 9)
En-Claves del pensamiento     Open Access   (Followers: 1)
Ethiopian Journal of the Social Sciences and Humanities     Full-text available via subscription   (Followers: 8)
Études arméniennes contemporaines     Open Access   (Followers: 1)
Études canadiennes / Canadian Studies     Open Access   (Followers: 1)
Études de lettres     Open Access   (Followers: 3)
European Journal of Cultural Studies     Hybrid Journal   (Followers: 27)
European Journal of Social Theory     Hybrid Journal   (Followers: 17)
Expositions     Full-text available via subscription  
Fronteras : Revista de Ciencias Sociales y Humanidades     Open Access   (Followers: 2)
Frontiers in Digital Humanities     Open Access   (Followers: 1)
Fudan Journal of the Humanities and Social Sciences     Hybrid Journal  
GAIA - Ecological Perspectives for Science and Society     Full-text available via subscription   (Followers: 4)
German Research     Hybrid Journal   (Followers: 1)
German Studies Review     Full-text available via subscription   (Followers: 27)
Germanic Review, The     Hybrid Journal   (Followers: 5)
Globalizations     Hybrid Journal   (Followers: 8)
Gothic Studies     Full-text available via subscription   (Followers: 15)
Gruppendynamik und Organisationsberatung     Hybrid Journal   (Followers: 1)
Habitat International     Hybrid Journal   (Followers: 5)
Hacettepe Üniversitesi Edebiyat Fakültesi Dergisi     Open Access   (Followers: 1)
Harvard Journal of Asiatic Studies     Full-text available via subscription   (Followers: 14)
Heritage & Society     Hybrid Journal   (Followers: 17)
History of Humanities     Full-text available via subscription   (Followers: 5)
Hopscotch: A Cultural Review     Full-text available via subscription  
Human Affairs     Open Access   (Followers: 1)
Human and Ecological Risk Assessment: An International Journal     Hybrid Journal   (Followers: 4)
Human Nature     Hybrid Journal   (Followers: 19)
Human Performance     Hybrid Journal   (Followers: 5)
Human Remains and Violence : An Interdisciplinary Journal     Full-text available via subscription  
Human Studies     Hybrid Journal   (Followers: 11)
humanidades     Open Access  
Humanitaire     Open Access   (Followers: 2)
Humanities     Open Access   (Followers: 11)
Hungarian Cultural Studies     Open Access  
Hungarian Studies     Full-text available via subscription  
Ibadan Journal of Humanistic Studies     Full-text available via subscription  
Inkanyiso : Journal of Humanities and Social Sciences     Open Access   (Followers: 1)
Inter Faculty     Open Access  
Interim : Interdisciplinary Journal     Open Access   (Followers: 3)
International Journal for History, Culture and Modernity     Open Access   (Followers: 7)
International Journal of Arab Culture, Management and Sustainable Development     Hybrid Journal   (Followers: 8)
International Journal of Cultural Studies     Hybrid Journal   (Followers: 26)
International Journal of Heritage Studies     Hybrid Journal   (Followers: 18)
International Journal of Humanities and Arts Computing     Hybrid Journal   (Followers: 13)
International Journal of Humanities and Cultural Studies     Open Access   (Followers: 6)
International Journal of Humanities of the Islamic Republic of Iran     Open Access   (Followers: 11)
International Journal of Listening     Hybrid Journal   (Followers: 4)
International Journal of the Classical Tradition     Hybrid Journal   (Followers: 12)
Interventions : International Journal of Postcolonial Studies     Hybrid Journal   (Followers: 16)
ÍSTMICA. Revista de la Facultad de Filosofía y Letras     Open Access   (Followers: 1)
Jangwa Pana     Open Access  
Jewish Culture and History     Hybrid Journal   (Followers: 19)
Journal de la Société des Américanistes     Open Access  
Journal des africanistes     Open Access   (Followers: 1)
Journal for Cultural Research     Hybrid Journal   (Followers: 12)
Journal for General Philosophy of Science     Hybrid Journal   (Followers: 7)
Journal for Learning Through the Arts     Open Access   (Followers: 7)
Journal for New Generation Sciences     Open Access   (Followers: 2)
Journal for Research into Freemasonry and Fraternalism     Hybrid Journal  
Journal for Semitics     Full-text available via subscription   (Followers: 5)
Journal Of Advances In Humanities     Open Access   (Followers: 3)
Journal of Aesthetics & Culture     Open Access   (Followers: 22)
Journal of African American Studies     Hybrid Journal   (Followers: 8)
Journal of African Cultural Studies     Hybrid Journal   (Followers: 5)
Journal of African Elections     Full-text available via subscription  
Journal of Arts & Communities     Hybrid Journal   (Followers: 5)
Journal of Arts and Humanities     Open Access   (Followers: 20)
Journal of Bioethical Inquiry     Hybrid Journal   (Followers: 3)
Journal of Cultural Economy     Hybrid Journal   (Followers: 9)
Journal of Cultural Geography     Hybrid Journal   (Followers: 22)
Journal of Data Mining and Digital Humanities     Open Access   (Followers: 29)
Journal of Developing Societies     Hybrid Journal   (Followers: 2)
Journal of Family Theory & Review     Hybrid Journal   (Followers: 3)
Journal of Franco-Irish Studies     Open Access   (Followers: 1)
Journal of Happiness Studies     Hybrid Journal   (Followers: 27)
Journal of Interactive Humanities     Open Access   (Followers: 3)
Journal of Intercultural Communication Research     Hybrid Journal   (Followers: 16)
Journal of Intercultural Studies     Hybrid Journal   (Followers: 12)
Journal of Interdisciplinary History     Hybrid Journal   (Followers: 25)
Journal of Labor Research     Hybrid Journal   (Followers: 20)
Journal of Medical Humanities     Hybrid Journal   (Followers: 22)
Journal of Medieval and Early Modern Studies     Full-text available via subscription   (Followers: 34)
Journal of Modern Greek Studies     Full-text available via subscription   (Followers: 4)
Journal of Modern Jewish Studies     Hybrid Journal   (Followers: 13)
Journal of Open Humanities Data     Open Access   (Followers: 1)
Journal of Semantics     Hybrid Journal   (Followers: 11)
Journal of the Musical Arts in Africa     Hybrid Journal   (Followers: 1)
Journal of Visual Culture     Hybrid Journal   (Followers: 33)
Journal Sampurasun : Interdisciplinary Studies for Cultural Heritage     Open Access  
Jurisprudence     Hybrid Journal   (Followers: 19)
Jurnal Sosial Humaniora     Open Access   (Followers: 2)
L'Orientation scolaire et professionnelle     Open Access   (Followers: 1)
La lettre du Collège de France     Open Access   (Followers: 1)
La Revue pour l’histoire du CNRS     Open Access   (Followers: 2)
Lagos Notes and Records     Full-text available via subscription  
Language and Intercultural Communication     Hybrid Journal   (Followers: 21)
Language Resources and Evaluation     Hybrid Journal   (Followers: 7)
Law and Humanities     Hybrid Journal   (Followers: 7)
Law, Culture and the Humanities     Hybrid Journal   (Followers: 12)
Le Portique     Open Access   (Followers: 1)
Leadership     Hybrid Journal   (Followers: 33)
Legal Ethics     Hybrid Journal   (Followers: 13)
Legon Journal of the Humanities     Full-text available via subscription  
Letras : Órgano de la Facultad de Letras y Ciencias Humanas     Open Access  
Literary and Linguistic Computing     Hybrid Journal   (Followers: 5)
Litnet Akademies : 'n Joernaal vir die Geesteswetenskappe, Natuurwetenskappe, Regte en Godsdienswetenskappe     Open Access  
Lwati : A Journal of Contemporary Research     Full-text available via subscription  
Measurement     Hybrid Journal   (Followers: 2)
Medical Humanities     Full-text available via subscription   (Followers: 22)
Medieval Encounters     Hybrid Journal   (Followers: 9)
Médiévales     Open Access   (Followers: 5)
Mélanges de la Casa de Velázquez     Partially Free   (Followers: 1)
Memory Studies     Hybrid Journal   (Followers: 35)
Mens : revue d'histoire intellectuelle et culturelle     Full-text available via subscription  
Messages, Sages and Ages     Open Access  
Mind and Matter     Full-text available via subscription   (Followers: 3)
Mneme - Revista de Humanidades     Open Access  
Modern Italy     Hybrid Journal   (Followers: 8)
Motivation Science     Full-text available via subscription   (Followers: 2)
Mouseion     Open Access   (Followers: 1)
Mouseion: Journal of the Classical Association of Canada     Full-text available via subscription   (Followers: 14)
Museum International Edition Francaise     Hybrid Journal   (Followers: 4)
National Academy Science Letters     Hybrid Journal   (Followers: 5)
Nationalities Papers     Hybrid Journal   (Followers: 7)
Natures Sciences Sociétés     Full-text available via subscription  
Neophilologus     Hybrid Journal   (Followers: 8)


Language Resources and Evaluation
  [SJR: 0.915]   [H-I: 31]   [7 followers]
   Hybrid journal (it can contain Open Access articles)
   ISSN (Print) 1574-020X - ISSN (Online) 1574-0218
   Published by Springer-Verlag  [2355 journals]
  • The Spoken Wikipedia Corpus collection: Harvesting, alignment and an application to hyperlistening
    • Authors: Timo Baumann; Arne Köhn; Felix Hennig
      Abstract: Spoken corpora are important for speech research, but are expensive to create and do not necessarily reflect (read or spontaneous) speech ‘in the wild’. We report on our conversion of the preexisting and freely available Spoken Wikipedia into a speech resource. The Spoken Wikipedia project unites volunteer readers of Wikipedia articles. There are initiatives to create and sustain Spoken Wikipedia versions in many languages and hence the available data grows over time. Thousands of spoken articles are available to users who prefer a spoken over the written version. We turn these semi-structured collections into structured and time-aligned corpora, keeping the exact correspondence with the original hypertext as well as all available metadata. Thus, we make the Spoken Wikipedia accessible for sustainable research. We present our open-source software pipeline that downloads, extracts, normalizes and text–speech aligns the Spoken Wikipedia. Additional language versions can be exploited by adapting configuration files or extending the software if necessary for language peculiarities. We also present and analyze the resulting corpora for German, English, and Dutch, which presently total 1005 h and grow at an estimated 87 h per year. The corpora, together with our software, are available online. As a prototype usage of the time-aligned corpus, we describe an experiment about the preferred modalities for interacting with information-rich read-out hypertext. We find alignments to help improve user experience and factual information access by enabling targeted interaction.
      PubDate: 2018-01-09
      DOI: 10.1007/s10579-017-9410-y
  • Introduction to the special issue
    • Authors: Laurette Pretorius; Claudia Soria
      Pages: 891 - 895
      PubDate: 2017-12-01
      DOI: 10.1007/s10579-017-9405-8
      Issue No: Vol. 51, No. 4 (2017)
  • Nenek: a cloud-based collaboration platform for the management of Amerindian language resources
    • Authors: J. L. Gonzalez; Anuschka van’t Hooft; Jesus Carretero; Victor J. Sosa-Sosa
      Pages: 897 - 925
      Abstract: This article presents Nenek: A cloud-based collaboration platform for language documentation of underresourced languages. Nenek is based on a crowdsourcing scheme that supports native speakers, indigenous associations, government agencies and researchers in the creation of virtual communities of minority language speakers on the Internet. Nenek includes a set of web tools that enables users to work collaboratively on language documentation tasks, build lexicographic assets and produce new language resources. This platform includes a three-stage management model to control the acquisition of existent language resources, the manufacturing of new resources, as well as their distribution within the virtual community and to the general public. In the acquisition stage, existent language resources are either automatically extracted from the web by a crawler or received through donations from users who participate in a monolingual social network. In the manufacturing stage, lexicographic and collaborative tools enable users to build new resources while the acquired and manufactured resources are published in the diffusion stage, either within the virtual community or publicly. We present a life cycle mapping scheme that registers the transformations of the language resources at each of the three stages of language resource management. This scheme also traces the utilization and diffusion of each resource produced by the virtual community. The paper includes a case study in which we present the use of the Nenek platform in a language documentation project of a Mayan language spoken in Mexico's Gulf coast region called Huastec. This case study reveals Nenek's efficiency in terms of acquisition, annotation, manufacturing and diffusion of language resources; it also discusses the participation of the members of the virtual community.
      PubDate: 2017-12-01
      DOI: 10.1007/s10579-016-9361-8
      Issue No: Vol. 51, No. 4 (2017)
  • Reassessing the value of resources for cross-lingual transfer of POS tagging models
    • Authors: Nicolas Pécheux; Guillaume Wisniewski; François Yvon
      Pages: 927 - 960
      Abstract: When linguistically annotated data is scarce, as is the case for many under-resourced languages, one has to resort to less complete forms of annotations obtained from crawled dictionaries and/or through cross-lingual transfer. Several recent works have shown that learning from such partially supervised data can be effective in many practical situations. In this work, we review two existing proposals for learning with ambiguous labels which extend conventional learners to the weakly supervised setting: a history-based model using a variant of the perceptron, on the one hand; an extension of the Conditional Random Fields model on the other hand. Focusing on the part-of-speech tagging task, but considering a large set of ten languages, we show (a) that good performance can be achieved even in the presence of ambiguity, provided however that both monolingual and bilingual resources are available; (b) that our two learners exploit different characteristics of the training set, and are successful in different situations; (c) that in addition to the choice of an adequate learning algorithm, many other factors are critical for achieving good performance in a cross-lingual transfer setting.
      PubDate: 2017-12-01
      DOI: 10.1007/s10579-016-9362-7
      Issue No: Vol. 51, No. 4 (2017)
  • Modeling under-resourced languages for speech recognition
    • Authors: Mikko Kurimo; Seppo Enarvi; Ottokar Tilk; Matti Varjokallio; André Mansikkaniemi; Tanel Alumäe
      Pages: 961 - 987
      Abstract: One particular problem in large vocabulary continuous speech recognition for low-resourced languages is finding relevant training data for the statistical language models. A large amount of data is required, because models should estimate the probability for all possible word sequences. For Finnish, Estonian and the other Finno-Ugric languages a special problem with the data is the huge number of different word forms that are common in normal speech. The same problem also exists in other language technology applications such as machine translation and information retrieval, and to some extent also in other morphologically rich languages. In this paper we present methods and evaluations in four recent language modeling topics: selecting conversational data from the Internet, adapting models for foreign words, multi-domain and adapted neural network language modeling, and decoding with subword units. Our evaluations show that the same methods work in more than one language and that they scale down to smaller data resources.
      PubDate: 2017-12-01
      DOI: 10.1007/s10579-016-9336-9
      Issue No: Vol. 51, No. 4 (2017)
  • Assisting non-expert speakers of under-resourced languages in assigning stems and inflectional paradigms to new word entries of morphological
    • Authors: Miquel Esplà-Gomis; Rafael C. Carrasco; Víctor M. Sánchez-Cartagena; Mikel L. Forcada; Felipe Sánchez-Martínez; Juan Antonio Pérez-Ortiz
      Pages: 989 - 1017
      Abstract: This paper presents a new method with which to assist individuals with no background in linguistics to create monolingual dictionaries such as those used by the morphological analysers of many natural language processing applications. The involvement of non-expert users is especially critical for under-resourced languages which either lack or cannot afford the recruitment of a skilled workforce. Adding a word to a morphological dictionary usually requires identifying its stem along with the inflection paradigm that can be used in order to generate all the word forms of the new entry. Our method works under the assumption that the average speakers of a language can successfully answer the polar question “is x a valid form of the word w to be inserted?”, where x represents tentative alternative (inflected) forms of the new word w. The experiments show that with a small number of polar questions the correct stem and paradigm can be obtained from non-experts with high success rates. We study the impact of different heuristic and probabilistic approaches on the actual number of questions.
      PubDate: 2017-12-01
      DOI: 10.1007/s10579-016-9360-9
      Issue No: Vol. 51, No. 4 (2017)
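
The polar-question idea in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `PARADIGMS` table, the helper names `candidates` and `identify`, and the greedy rule of asking about the form that splits the remaining candidates most evenly. The abstract only states that different heuristic and probabilistic approaches were studied.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical toy paradigms (suffix sets); a real dictionary defines many more.
PARADIGMS: Dict[str, Tuple[str, ...]] = {
    "noun_s": ("", "s"),
    "noun_es": ("", "es"),
    "verb_regular": ("", "s", "ed", "ing"),
}

# (stem, paradigm name, every form that stem + paradigm would generate)
Candidate = Tuple[str, str, FrozenSet[str]]


def candidates(word: str) -> List[Candidate]:
    """List every stem/paradigm pair whose generated forms include `word`."""
    found: List[Candidate] = []
    for name, suffixes in PARADIGMS.items():
        for suffix in suffixes:
            if word.endswith(suffix) and len(word) > len(suffix):
                stem = word[: len(word) - len(suffix)]
                forms = frozenset(stem + s for s in suffixes)
                found.append((stem, name, forms))
    return found


def identify(
    word: str, is_valid_form: Callable[[str], bool]
) -> Tuple[List[Tuple[str, str]], int]:
    """Narrow the candidates by asking yes/no questions about single forms.

    Returns the surviving (stem, paradigm) pairs and the number of questions.
    """
    remaining = candidates(word)
    questions = 0
    while True:
        all_forms = set().union(*(c[2] for c in remaining))
        # A form is informative if some, but not all, candidates generate it.
        informative = [
            f for f in sorted(all_forms)
            if 0 < sum(f in c[2] for c in remaining) < len(remaining)
        ]
        if not informative:
            break
        # Greedy choice: the form closest to an even split of the candidates.
        best = min(
            informative,
            key=lambda f: abs(2 * sum(f in c[2] for c in remaining) - len(remaining)),
        )
        questions += 1
        answer = is_valid_form(best)
        remaining = [c for c in remaining if (best in c[2]) == answer]
    return [(stem, name) for stem, name, _ in remaining], questions


# A simulated speaker who knows the verb "walk".
known_forms = {"walk", "walks", "walked", "walking"}
result, asked = identify("walks", lambda form: form in known_forms)
print(result, asked)
```

The simulated speaker stands in for the non-expert user; in an interactive setting `is_valid_form` would prompt a person instead of checking a set.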
  • Crawl and crowd to bring machine translation to under-resourced languages
    • Authors: Antonio Toral; Miquel Esplá-Gomis; Filip Klubička; Nikola Ljubešić; Vassilis Papavassiliou; Prokopis Prokopidis; Raphael Rubino; Andy Way
      Pages: 1019 - 1051
      Abstract: We present a widely applicable methodology to bring machine translation (MT) to under-resourced languages in a cost-effective and rapid manner. Our proposal relies on web crawling to automatically acquire parallel data to train statistical MT systems if any such data can be found for the language pair and domain of interest. If that is not the case, we resort to (1) crowdsourcing to translate small amounts of text (hundreds of sentences), which are then used to tune statistical MT models, and (2) web crawling of vast amounts of monolingual data (millions of sentences), which are then used to build language models for MT. We apply these to two respective use-cases for Croatian, an under-resourced language that has gained relevance since it recently attained official status in the European Union. The first use-case regards tourism, given the importance of this sector to Croatia’s economy, while the second has to do with tweets, due to the growing importance of social media. For tourism, we crawl parallel data from 20 web domains using two state-of-the-art crawlers and explore how to combine the crawled data with bigger amounts of general-domain data. Our domain-adapted system is evaluated on a set of three additional tourism web domains and it outperforms the baseline in terms of automatic metrics and/or vocabulary coverage. In the social media use-case, we deal with tweets from the 2014 edition of the soccer World Cup. We build domain-adapted systems by (1) translating small amounts of tweets to be used for tuning by means of crowdsourcing and (2) crawling vast amounts of monolingual tweets. These systems outperform the baseline (Microsoft Bing) by 7.94 BLEU points (5.11 TER) for Croatian-to-English and by 2.17 points (1.94 TER) for English-to-Croatian on a test set translated by means of crowdsourcing. A complementary manual analysis sheds further light on these results.
      PubDate: 2017-12-01
      DOI: 10.1007/s10579-016-9363-6
      Issue No: Vol. 51, No. 4 (2017)
  • Ebaluatoia : crowd evaluation for English–Basque machine translation
    • Authors: Nora Aranberri; Gorka Labaka; Arantza Díaz de Ilarraza; Kepa Sarasola
      Pages: 1053 - 1084
      Abstract: This work explores the feasibility of a crowd-based pair-wise comparison evaluation to get feedback on machine translation progress for under-resourced languages. Specifically, we propose a task based on simple work units to compare the outputs of five English-to-Basque systems, which we implement in a web application. In our design, we put forward two key aspects that we believe community collaboration initiatives should consider in order to attract and maintain participants, that is, providing both a community challenge and a personal challenge. We describe how these aspects can comply with a strict methodology to ensure research validity. In particular, we consider the evaluation set size and the characteristics of the test sentences, the number of evaluators per comparison pair, and a mechanism to identify dishonest participation (or participants with insufficient linguistic knowledge). We also describe our dissemination effort, which targeted both general users and interest groups. Over 500 people participated actively in the Ebaluatoia campaign and we were able to collect over 35,000 evaluations in a short period of 10 days. From the results, we complete the ranking of the systems under evaluation and establish whether the difference in quality between the systems is significant.
      PubDate: 2017-12-01
      DOI: 10.1007/s10579-016-9335-x
      Issue No: Vol. 51, No. 4 (2017)
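
As a rough illustration of how pair-wise judgments like those collected in the campaign above could be turned into a ranking with a significance check, here is a small sketch. The abstract does not say which ranking rule or statistical test the authors used, so the win-share ranking and the two-sided sign test below are assumptions, as are the function names and the toy data.

```python
from collections import Counter
from math import comb
from typing import Iterable, List, Tuple


def rank_systems(judgments: Iterable[Tuple[str, str]]) -> List[str]:
    """Rank systems by the share of comparisons they won.

    Each judgment is a (winner, loser) pair from one evaluator decision.
    Ties in win share are broken alphabetically.
    """
    wins: Counter = Counter()
    comparisons: Counter = Counter()
    for winner, loser in judgments:
        wins[winner] += 1
        comparisons[winner] += 1
        comparisons[loser] += 1
    return sorted(comparisons, key=lambda s: (-wins[s] / comparisons[s], s))


def sign_test_p(wins_a: int, wins_b: int) -> float:
    """Two-sided sign test: p-value under the hypothesis that A and B are equal."""
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    # Probability of a result at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data: three hypothetical systems, ten judgments per pair.
toy = (
    [("A", "B")] * 8 + [("B", "A")] * 2
    + [("B", "C")] * 6 + [("C", "B")] * 4
    + [("A", "C")] * 9 + [("C", "A")] * 1
)
print(rank_systems(toy))
print(sign_test_p(9, 1))
```

With only ten judgments per pair the test has little power, which is one reason the evaluation set size and the number of evaluators per pair matter in a design like the one described.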
  • The GUM corpus: creating multilayer resources in the classroom
    • Authors: Amir Zeldes
      Pages: 581 - 612
      Abstract: This paper presents the methodology, design principles and detailed evaluation of a new freely available multilayer corpus, collected and edited via classroom annotation using collaborative software. After briefly discussing corpus design for open, extensible corpora, five classroom annotation projects are presented, covering structural markup in TEI XML, multiple part of speech tagging, constituent and dependency parsing, information structural and coreference annotation, and Rhetorical Structure Theory analysis. Layers are inspected for annotation quality and together they coalesce to form a richly annotated corpus that can be used to study the interactions between different levels of linguistic description. The evaluation gives an indication of the expected quality of a corpus created by students with relatively little training. A multifactorial example study on lexical NP coreference likelihood is also presented, which illustrates some applications of the corpus. The results of this project show that high quality, richly annotated resources can be created effectively as part of a linguistics curriculum, opening new possibilities not just for research, but also for corpora in linguistics pedagogy.
      PubDate: 2017-09-01
      DOI: 10.1007/s10579-016-9343-x
      Issue No: Vol. 51, No. 3 (2017)
  • Accurate and efficient general-purpose boilerplate detection for crawled web corpora
    • Authors: Roland Schäfer
      Pages: 873 - 889
      Abstract: Removal of boilerplate is one of the essential tasks in web corpus construction and web indexing. Boilerplate (redundant and automatically inserted material like menus, copyright notices, navigational elements, etc.) is usually considered to be linguistically unattractive for inclusion in a web corpus. Also, search engines should not index such material because it can lead to spurious results for search terms if these terms appear in boilerplate regions of the web page. In this paper, I present and evaluate a supervised machine-learning approach to general-purpose boilerplate detection for languages based on Latin alphabets using Multi-Layer Perceptrons (MLPs). It is both very efficient and very accurate (between 95% and 99% correct classifications, depending on the input language). I show that language-specific classifiers greatly improve the accuracy of boilerplate detectors. The single features used for the classification are evaluated with regard to the merit they contribute to the classification. Furthermore, I show that the accuracy of the MLP is on a par with that of a wide range of other classifiers. My approach has been implemented in the open-source texrex web page cleaning software, and large corpora constructed using it are available from the COW initiative, including the CommonCOW corpora created from CommonCrawl datasets.
      PubDate: 2017-09-01
      DOI: 10.1007/s10579-016-9359-2
      Issue No: Vol. 51, No. 3 (2017)
  • ATR4S: toolkit with state-of-the-art automatic terms recognition methods in Scala
    • Authors: Nikita Astrakhantsev
      Abstract: Automatically recognized terminology is widely used for various domain-specific text processing tasks, such as machine translation, information retrieval or ontology construction. However, there is still no agreement on which methods are best suited for particular settings and, moreover, there is no reliable comparison of already developed methods. We believe that one of the main reasons is the lack of state-of-the-art method implementations, which are usually non-trivial to recreate, mostly in terms of software engineering effort. In order to address these issues, we present ATR4S, open-source software written in Scala that comprises 13 state-of-the-art methods for automatic terminology recognition (ATR) and implements the whole pipeline from text document preprocessing, to term candidate collection, term candidate scoring, and finally, term candidate ranking. It is a highly scalable, modular and configurable tool with support for automatic caching. We also compare 13 state-of-the-art methods on 7 open datasets by average precision and processing time. Experimental comparison reveals that no single method demonstrates the best average precision for all datasets and that other available tools for ATR do not contain the best methods.
      PubDate: 2017-12-21
      DOI: 10.1007/s10579-017-9409-4
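
The abstract above compares methods by average precision over ranked term candidates. The sketch below shows one common way to compute that metric. It is written in Python for illustration and is not ATR4S code (which is in Scala); the function names, the toy scores and the choice to normalize by the number of gold terms actually found in the ranking are all assumptions.

```python
from typing import Dict, List, Set


def rank_candidates(scores: Dict[str, float]) -> List[str]:
    """Order term candidates by descending score (ties broken alphabetically)."""
    return sorted(scores, key=lambda term: (-scores[term], term))


def average_precision(ranked: List[str], gold: Set[str]) -> float:
    """Mean of precision@k taken at every rank k that holds a gold term.

    Normalizing by the number of hits (rather than by the size of the gold
    set) is an assumption made here so that truncated rankings stay comparable.
    """
    hits = 0
    precision_sum = 0.0
    for rank, term in enumerate(ranked, start=1):
        if term in gold:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy scores from a hypothetical scoring method.
scores = {
    "neural network": 0.9,
    "the paper": 0.7,
    "language model": 0.6,
    "many cases": 0.2,
}
gold_terms = {"neural network", "language model"}
ranking = rank_candidates(scores)
print(ranking)
print(average_precision(ranking, gold_terms))
```

Because the metric rewards gold terms that appear early in the list, two scoring methods that recognize the same terms can still differ in average precision.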
  • Building the Galician wordnet: methods and applications
    • Authors: Xavier Gómez Guinovart; Miguel Anxo Solla Portela
      Abstract: This paper presents the different methodologies and resources used to build Galnet, the Galician version of WordNet. It reviews the different extraction processes and the lexicographical and textual sources used to develop this resource, and describes some of its applications in ontology research and terminology processing.
      PubDate: 2017-11-29
      DOI: 10.1007/s10579-017-9408-5
  • The corpus of Basque simplified texts (CBST)
    • Authors: Itziar Gonzalez-Dios; María Jesús Aranzabe; Arantza Díaz de Ilarraza
      Abstract: In this paper we present the corpus of Basque simplified texts. This corpus compiles 227 original sentences from the science popularisation domain and two simplified versions of each sentence. The simplified versions have been created following different approaches: the structural, by a court translator who considers easy-to-read guidelines, and the intuitive, by a teacher based on her experience. The aim of this corpus is to make a comparative analysis of simplified text. To that end, we also present the annotation scheme we have created to annotate the corpus. The annotation scheme is divided into eight macro-operations: delete, merge, split, transformation, insert, reordering, no operation and other. These macro-operations can be classified into different operations. We also relate our work and results to other languages. This corpus will be used to corroborate the decisions taken and to improve the design of the automatic text simplification system for Basque.
      PubDate: 2017-11-18
      DOI: 10.1007/s10579-017-9407-6
  • Using semantic roles to improve text classification in the requirements
    • Authors: Alejandro Rago; Claudia Marcos; J. Andres Diaz-Pace
      Abstract: Engineering activities often produce considerable documentation as a by-product of the development process. Due to their complexity, technical analysts can benefit from text processing techniques able to identify concepts of interest and analyze deficiencies of the documents in an automated fashion. In practice, text sentences from the documentation are usually transformed to a vector space model, which is suitable for traditional machine learning classifiers. However, such transformations suffer from problems of synonyms and ambiguity that cause classification mistakes. For alleviating these problems, there has been a growing interest in the semantic enrichment of text. Unfortunately, using general-purpose thesauri and encyclopedias to enrich technical documents belonging to a given domain (e.g. requirements engineering) often introduces noise and does not improve classification. In this work, we aim at boosting text classification by exploiting information about semantic roles. We have explored this approach when building a multi-label classifier for identifying special concepts, called domain actions, in textual software requirements. After evaluating various combinations of semantic roles and text classification algorithms, we found that this kind of semantically-enriched data leads to improvements of up to 18% in both precision and recall, when compared to non-enriched data. Our enrichment strategy based on semantic roles also allowed classifiers to reach acceptable accuracy levels with small training sets. Moreover, semantic roles outperformed Wikipedia- and WordNet-based enrichments, which failed to boost requirements classification with several techniques. These results drove the development of two requirements tools, which we successfully applied in the processing of textual use cases.
      PubDate: 2017-11-11
      DOI: 10.1007/s10579-017-9406-7
  • A semi-automatic annotation tool for unobtrusive gesture analysis
    • Authors: Stijn De Beugher; Geert Brône; Toon Goedemé
      Abstract: In a variety of research fields, including linguistics, human–computer interaction research, psychology, sociology and behavioral studies, there is a growing interest in the role of gestural behavior related to speech and other modalities. The analysis of multimodal communication requires high-quality video data and detailed annotation of the different semiotic resources under scrutiny. In the majority of cases, the annotation of hand position, hand motion, gesture type, etc. is done manually, which is a time-consuming enterprise requiring multiple annotators and substantial resources. In this paper we present a semi-automatic alternative, in which the focus lies on minimizing the manual workload while guaranteeing highly accurate annotations. First, we discuss our approach, which consists of several processing steps such as identifying the hands in images, calculating motion of the hands, segmenting the recording into gesture and non-gesture events, etc. Second, we validate our approach against existing corpora in terms of accuracy and usefulness. The proposed approach is designed to provide annotations according to the McNeill (Hand and mind: what gestures reveal about thought, University of Chicago Press, Chicago, 1992) gesture space and the output is compatible with annotation tools such as ELAN or ANVIL.
      PubDate: 2017-11-07
      DOI: 10.1007/s10579-017-9404-9
  • Investigating the cross-lingual translatability of VerbNet-style
    • Authors: Olga Majewska; Ivan Vulić; Diana McCarthy; Yan Huang; Akira Murakami; Veronika Laippala; Anna Korhonen
      Abstract: VerbNet, the most extensive online verb lexicon currently available for English, has proved useful in supporting a variety of NLP tasks. However, its exploitation in multilingual NLP has been limited by the fact that such classifications are available for only a few languages. Since manual development of VerbNet is a major undertaking, researchers have recently translated VerbNet classes from English to other languages. However, no systematic investigation has been conducted into the applicability and accuracy of such a translation approach across different, typologically diverse languages. Our study is aimed at filling this gap. We develop a systematic method for translation of VerbNet classes from English to other languages which we first apply to Polish and subsequently to Croatian, Mandarin, Japanese, Italian, and Finnish. Our results on Polish demonstrate high translatability with all the classes (96% of English member verbs successfully translated into Polish) and strong inter-annotator agreement, revealing a promising degree of overlap in the resultant classifications. The results on other languages are equally promising. This demonstrates that VerbNet classes have strong cross-lingual potential and the proposed method could be applied to obtain gold standards for automatic verb classification in different languages. We make our annotation guidelines and the six language-specific verb classifications available with this paper.
      PubDate: 2017-10-20
      DOI: 10.1007/s10579-017-9403-x
  • Automatic speech recognition system for Tunisian dialect
    • Authors: Abir Masmoudi; Fethi Bougares; Mariem Ellouze; Yannick Estève; Lamia Belguith
      Abstract: Although Modern Standard Arabic is taught in schools and used in written communication and TV/radio broadcasts, all informal communication is typically carried out in dialectal Arabic. In this work, we focus on the design of speech tools and resources required for the development of an Automatic Speech Recognition system for the Tunisian dialect. The development of such a system faces the challenges of the lack of annotated resources and tools, apart from the lack of standardization at all linguistic levels (phonological, morphological, syntactic and lexical) together with the mispronunciation dictionary needed for ASR development. In this paper, we present a historical overview of the Tunisian dialect and its linguistic characteristics. We also describe and evaluate our rule-based phonetic tool. Next, we go deeper into the details of Tunisian dialect corpus creation. This corpus is finally approved and used to build the first ASR system for Tunisian dialect with a Word Error Rate of 22.6%.
      PubDate: 2017-09-22
      DOI: 10.1007/s10579-017-9402-y
  • A longitudinal database of Irish political speech with annotations of speaker ability
    • Authors: Ailbhe Cullen; Naomi Harte
      Abstract: This paper presents the Irish Political Speech Database, an English-language database collected from Irish political recordings. The database is collected with automated indexing and content retrieval in mind, and thus is gathered from real-world recordings (such as television interviews and election rallies) which represent the nature and quality of recordings which will be encountered in practical applications. The database is labelled for six speaker attributes: boring; charismatic; enthusiastic; inspiring; likeable; and persuasive. Each of these traits is linked to the perceived ability or appeal of the speaker, and as such is relevant to a range of content retrieval and speech analysis tasks. The six base attributes are combined to form a metric of Overall Speaker Appeal. A set of baseline experiments is presented, which demonstrate the potential of this database for affective computing studies. Classification accuracies of up to 76% are achieved, with little feature or system optimisation.
      PubDate: 2017-09-20
      DOI: 10.1007/s10579-017-9401-z
  • BLARK for multi-dialect languages: towards the Kurdish BLARK
    • Authors: Hossein Hassani
      Abstract: In this paper we introduce the Kurdish BLARK (Basic Language Resource Kit). The original BLARK has not considered multi-dialect characteristics and generally has targeted reasonably well-resourced languages. To consider these two features, we extended BLARK and applied the proposed extension to Kurdish. The Kurdish language not only faces a paucity of resources, but also embraces several dialects within a complex linguistic context. This paper presents the Kurdish BLARK and shows that from natural language processing and computational linguistics perspectives the revised BLARK provides a more applicable view of languages with similar characteristics to Kurdish.
      PubDate: 2017-09-11
      DOI: 10.1007/s10579-017-9400-0
  • The challenging task of summary evaluation: an overview
    • Authors: Elena Lloret; Laura Plaza; Ahmet Aker
      Abstract: Evaluation is crucial in the research and development of automatic summarization applications, in order to determine the appropriateness of a summary based on different criteria, such as the content it contains, and the way it is presented. To perform an adequate evaluation is of great relevance to ensure that automatic summaries can be useful for the context and/or application they are generated for. To this end, researchers must be aware of the evaluation metrics, approaches, and datasets that are available, in order to decide which of them would be the most suitable to use, or to be able to propose new ones, overcoming the possible limitations that existing methods may present. In this article, a critical and historical analysis of evaluation metrics, methods, and datasets for automatic summarization systems is presented, where the strengths and weaknesses of evaluation efforts are discussed and the major challenges to solve are identified. Therefore, a clear up-to-date overview of the evolution and progress of summarization evaluation is provided, giving the reader useful insights into the past, present and latest trends in the automatic evaluation of summaries.
      PubDate: 2017-09-02
      DOI: 10.1007/s10579-017-9399-2