
HUMANITIES (286 journals)

Aboriginal and Islander Health Worker Journal     Full-text available via subscription   (Followers: 14)
Aboriginal Child at School     Full-text available via subscription   (Followers: 5)
About Performance     Full-text available via subscription   (Followers: 11)
Access     Full-text available via subscription   (Followers: 25)
ACCESS: Critical Perspectives on Communication, Cultural & Policy Studies     Full-text available via subscription   (Followers: 9)
Acta Academica     Full-text available via subscription   (Followers: 6)
Acta Universitaria     Open Access   (Followers: 5)
Adeptus     Open Access   (Followers: 1)
Advocate: Newsletter of the National Tertiary Education Union     Full-text available via subscription   (Followers: 1)
African and Black Diaspora: An International Journal     Hybrid Journal   (Followers: 11)
African Historical Review     Hybrid Journal   (Followers: 16)
AFRREV IJAH : An International Journal of Arts and Humanities     Open Access   (Followers: 4)
Agriculture and Human Values     Hybrid Journal   (Followers: 14)
Akademika : Journal of Southeast Asia Social Sciences and Humanities     Open Access   (Followers: 6)
Aldébaran     Open Access   (Followers: 3)
Alterstice : Revue internationale de la recherche interculturelle     Open Access  
Altre Modernità     Open Access   (Followers: 3)
Amaltea. Revista de mitocrítica     Open Access   (Followers: 1)
American Imago     Full-text available via subscription   (Followers: 3)
American Journal of Humanities and Social Sciences     Open Access   (Followers: 10)
American Review of Canadian Studies     Hybrid Journal   (Followers: 7)
Anabases     Open Access  
Analyse & Kritik. Zeitschrift f     Full-text available via subscription   (Followers: 1)
Angelaki: Journal of Theoretical Humanities     Hybrid Journal   (Followers: 15)
Anglo-Saxon England     Hybrid Journal   (Followers: 33)
Antik Tanulmányok     Full-text available via subscription  
Antipode     Hybrid Journal   (Followers: 55)
Anuario Americanista Europeo     Open Access  
Arbutus Review     Open Access   (Followers: 1)
Argumentation et analyse du discours     Open Access   (Followers: 5)
Ars & Humanitas     Open Access   (Followers: 12)
Artes Humanae     Open Access  
Arts and Humanities in Higher Education     Hybrid Journal   (Followers: 35)
Asia Europe Journal     Hybrid Journal   (Followers: 5)
Australasian Journal of Popular Culture, The     Hybrid Journal   (Followers: 2)
Behaviour & Information Technology     Hybrid Journal   (Followers: 52)
Behemoth     Open Access   (Followers: 3)
Belin Lecture Series     Open Access  
Bereavement Care     Hybrid Journal   (Followers: 12)
Bulletin of the School of Oriental and African Studies     Hybrid Journal   (Followers: 18)
Cahiers de praxématique     Open Access   (Followers: 1)
Carl Beck Papers in Russian and East European Studies     Full-text available via subscription   (Followers: 5)
Child Care     Full-text available via subscription   (Followers: 7)
Choreographic Practices     Hybrid Journal   (Followers: 1)
Chronicle of Philanthropy     Full-text available via subscription   (Followers: 2)
Ciencias Sociales y Humanidades     Open Access   (Followers: 2)
Claroscuro     Open Access   (Followers: 1)
Co-herencia     Open Access  
Coaching: An International Journal of Theory, Research and Practice     Hybrid Journal   (Followers: 8)
Cogent Arts & Humanities     Open Access   (Followers: 3)
Colloquia Humanistica     Open Access  
Communication and Critical/Cultural Studies     Hybrid Journal   (Followers: 27)
Comprehensive Therapy     Hybrid Journal   (Followers: 3)
Congenital Anomalies     Hybrid Journal   (Followers: 1)
Conjunctions. Transdisciplinary Journal of Cultural Participation     Open Access   (Followers: 4)
Conservation Science in Cultural Heritage     Open Access   (Followers: 9)
Cornish Studies     Hybrid Journal   (Followers: 2)
Creative Industries Journal     Hybrid Journal   (Followers: 8)
Critical Arts : South-North Cultural and Media Studies     Hybrid Journal   (Followers: 11)
Crossing the Border : International Journal of Interdisciplinary Studies     Open Access   (Followers: 4)
Cuadernos de historia de España     Open Access   (Followers: 3)
Cultural History     Hybrid Journal   (Followers: 25)
Cultural Studies     Hybrid Journal   (Followers: 52)
Culturas     Open Access   (Followers: 1)
Culture, Theory and Critique     Hybrid Journal   (Followers: 27)
Daedalus     Hybrid Journal   (Followers: 21)
Dandelion : Postgraduate Arts Journal & Research Network     Open Access   (Followers: 4)
Death Studies     Hybrid Journal   (Followers: 19)
Debatte: Journal of Contemporary Central and Eastern Europe     Hybrid Journal   (Followers: 4)
Digital Humanities Quarterly     Open Access   (Followers: 54)
Diogenes     Hybrid Journal   (Followers: 8)
Doct-Us Journal     Open Access  
Dorsal Revista de Estudios Foucaultianos     Open Access  
e-Hum : Revista das Áreas de Humanidade do Centro Universitário de Belo Horizonte     Open Access   (Followers: 2)
Early Modern Culture Online     Open Access   (Followers: 35)
Égypte - Monde arabe     Open Access   (Followers: 5)
Eighteenth-Century Fiction     Full-text available via subscription   (Followers: 17)
Éire-Ireland     Full-text available via subscription   (Followers: 7)
En-Claves del pensamiento     Open Access   (Followers: 1)
Ethiopian Journal of the Social Sciences and Humanities     Full-text available via subscription   (Followers: 8)
Études arméniennes contemporaines     Open Access   (Followers: 1)
Études canadiennes / Canadian Studies     Open Access   (Followers: 1)
Études de lettres     Open Access   (Followers: 3)
European Journal of Cultural Studies     Hybrid Journal   (Followers: 27)
European Journal of Social Theory     Hybrid Journal   (Followers: 18)
Expositions     Full-text available via subscription  
Fronteras : Revista de Ciencias Sociales y Humanidades     Open Access   (Followers: 2)
Frontiers in Digital Humanities     Open Access   (Followers: 1)
Fudan Journal of the Humanities and Social Sciences     Hybrid Journal  
GAIA - Ecological Perspectives for Science and Society     Full-text available via subscription   (Followers: 2)
German Research     Hybrid Journal   (Followers: 1)
German Studies Review     Full-text available via subscription   (Followers: 27)
Germanic Review, The     Hybrid Journal   (Followers: 5)
Globalizations     Hybrid Journal   (Followers: 8)
Gothic Studies     Full-text available via subscription   (Followers: 15)
Gruppendynamik und Organisationsberatung     Hybrid Journal   (Followers: 2)
Habitat International     Hybrid Journal   (Followers: 5)
Hacettepe Üniversitesi Edebiyat Fakültesi Dergisi     Open Access   (Followers: 2)
Harvard Journal of Asiatic Studies     Full-text available via subscription   (Followers: 14)
Heritage & Society     Hybrid Journal   (Followers: 16)
History of Humanities     Full-text available via subscription   (Followers: 5)
Hopscotch: A Cultural Review     Full-text available via subscription   (Followers: 1)
Human Affairs     Open Access   (Followers: 1)
Human and Ecological Risk Assessment: An International Journal     Hybrid Journal   (Followers: 4)
Human Nature     Hybrid Journal   (Followers: 20)
Human Performance     Hybrid Journal   (Followers: 5)
Human Remains and Violence : An Interdisciplinary Journal     Full-text available via subscription  
Human Studies     Hybrid Journal   (Followers: 9)
humanidades     Open Access  
Humanitaire     Open Access   (Followers: 2)
Humanities     Open Access   (Followers: 12)
Humanities Diliman : A Philippine Journal of Humanities     Open Access  
Hungarian Cultural Studies     Open Access  
Hungarian Studies     Full-text available via subscription  
Ibadan Journal of Humanistic Studies     Full-text available via subscription  
Inkanyiso : Journal of Humanities and Social Sciences     Open Access   (Followers: 1)
Inter Faculty     Open Access  
Interim : Interdisciplinary Journal     Open Access   (Followers: 3)
International Journal for History, Culture and Modernity     Open Access   (Followers: 7)
International Journal of Arab Culture, Management and Sustainable Development     Hybrid Journal   (Followers: 7)
International Journal of Cultural Studies     Hybrid Journal   (Followers: 26)
International Journal of Heritage Studies     Hybrid Journal   (Followers: 18)
International Journal of Humanities and Arts Computing     Hybrid Journal   (Followers: 13)
International Journal of Humanities and Cultural Studies     Open Access   (Followers: 7)
International Journal of Humanities of the Islamic Republic of Iran     Open Access   (Followers: 10)
International Journal of Listening     Hybrid Journal   (Followers: 4)
International Journal of the Classical Tradition     Hybrid Journal   (Followers: 12)
Interventions : International Journal of Postcolonial Studies     Hybrid Journal   (Followers: 16)
ÍSTMICA. Revista de la Facultad de Filosofía y Letras     Open Access   (Followers: 1)
Jangwa Pana     Open Access  
Jewish Culture and History     Hybrid Journal   (Followers: 19)
Journal de la Société des Américanistes     Open Access  
Journal des africanistes     Open Access   (Followers: 1)
Journal for Cultural Research     Hybrid Journal   (Followers: 11)
Journal for General Philosophy of Science     Hybrid Journal   (Followers: 7)
Journal for Learning Through the Arts     Open Access   (Followers: 7)
Journal for New Generation Sciences     Open Access   (Followers: 3)
Journal for Research into Freemasonry and Fraternalism     Hybrid Journal  
Journal for Semitics     Full-text available via subscription   (Followers: 7)
Journal Of Advances In Humanities     Open Access   (Followers: 3)
Journal of Aesthetics & Culture     Open Access   (Followers: 22)
Journal of African American Studies     Hybrid Journal   (Followers: 9)
Journal of African Cultural Studies     Hybrid Journal   (Followers: 5)
Journal of African Elections     Full-text available via subscription  
Journal of Arts & Communities     Hybrid Journal   (Followers: 6)
Journal of Arts and Humanities     Open Access   (Followers: 20)
Journal of Bioethical Inquiry     Hybrid Journal   (Followers: 3)
Journal of Cultural Economy     Hybrid Journal   (Followers: 9)
Journal of Cultural Geography     Hybrid Journal   (Followers: 21)
Journal of Data Mining and Digital Humanities     Open Access   (Followers: 30)
Journal of Developing Societies     Hybrid Journal   (Followers: 1)
Journal of Family Theory & Review     Hybrid Journal   (Followers: 3)
Journal of Franco-Irish Studies     Open Access   (Followers: 1)
Journal of Happiness Studies     Hybrid Journal   (Followers: 26)
Journal of Interactive Humanities     Open Access   (Followers: 3)
Journal of Intercultural Communication Research     Hybrid Journal   (Followers: 15)
Journal of Intercultural Studies     Hybrid Journal   (Followers: 10)
Journal of Interdisciplinary History     Hybrid Journal   (Followers: 22)
Journal of Labor Research     Hybrid Journal   (Followers: 19)
Journal of Medical Humanities     Hybrid Journal   (Followers: 22)
Journal of Medieval and Early Modern Studies     Full-text available via subscription   (Followers: 31)
Journal of Modern Greek Studies     Full-text available via subscription   (Followers: 4)
Journal of Modern Jewish Studies     Hybrid Journal   (Followers: 14)
Journal of Open Humanities Data     Open Access   (Followers: 1)
Journal of Semantics     Hybrid Journal   (Followers: 12)
Journal of the Musical Arts in Africa     Hybrid Journal   (Followers: 1)
Journal of Visual Culture     Hybrid Journal   (Followers: 32)
Journal Sampurasun : Interdisciplinary Studies for Cultural Heritage     Open Access  
Jurisprudence     Hybrid Journal   (Followers: 18)
Jurnal Ilmu Sosial dan Humaniora     Open Access  
Jurnal Pendidikan Humaniora : Journal of Humanities Education     Open Access   (Followers: 1)
Jurnal Sosial Humaniora     Open Access   (Followers: 2)
L'Orientation scolaire et professionnelle     Open Access   (Followers: 1)
La lettre du Collège de France     Open Access   (Followers: 1)
La Revue pour l’histoire du CNRS     Open Access  
Lagos Notes and Records     Full-text available via subscription  
Language and Intercultural Communication     Hybrid Journal   (Followers: 19)
Language Resources and Evaluation     Hybrid Journal   (Followers: 4)
Law and Humanities     Hybrid Journal   (Followers: 6)
Law, Culture and the Humanities     Hybrid Journal   (Followers: 10)
Le Portique     Open Access   (Followers: 1)
Leadership     Hybrid Journal   (Followers: 35)
Legal Ethics     Hybrid Journal   (Followers: 13)
Legon Journal of the Humanities     Full-text available via subscription  
Letras : Órgano de la Facultad de Letras y Ciencias Humanas     Open Access   (Followers: 1)
Literary and Linguistic Computing     Hybrid Journal   (Followers: 5)
Litnet Akademies : 'n Joernaal vir die Geesteswetenskappe, Natuurwetenskappe, Regte en Godsdienswetenskappe     Open Access  
Lwati : A Journal of Contemporary Research     Full-text available via subscription   (Followers: 1)
Measurement     Hybrid Journal   (Followers: 2)
Medical Humanities     Full-text available via subscription   (Followers: 21)
Medieval Encounters     Hybrid Journal   (Followers: 7)
Médiévales     Open Access   (Followers: 3)
Mélanges de la Casa de Velázquez     Partially Free  
Memory Studies     Hybrid Journal   (Followers: 33)
Mens : revue d'histoire intellectuelle et culturelle     Full-text available via subscription  
Messages, Sages and Ages     Open Access  
Mind and Matter     Full-text available via subscription   (Followers: 3)
Mneme - Revista de Humanidades     Open Access   (Followers: 1)
Modern Italy     Hybrid Journal   (Followers: 7)
Motivation Science     Full-text available via subscription   (Followers: 2)


Language Resources and Evaluation
  [SJR: 0.915]   [H-I: 31]   [4 followers]
   Hybrid Journal (it can contain Open Access articles)
   ISSN (Print) 1574-0218 - ISSN (Online) 1574-020X
   Published by Springer-Verlag  [2349 journals]
  • Creation of an annotated corpus of Old and Middle Hungarian court records and private correspondence
    • Authors: Attila Novák; Katalin Gugán; Mónika Varga; Adrienne Dömötör
      Pages: 1 - 28
      Abstract: The paper introduces a novel annotated corpus of Old and Middle Hungarian (16th–18th century), the texts of which were selected in order to approximate the vernacular of the given historical periods as closely as possible. The corpus consists of testimonies of witnesses in trials and samples of private correspondence. The texts are not only analyzed morphologically, but each file contains metadata that would also facilitate sociolinguistic research. The texts were segmented into clauses, manually normalized and morphosyntactically annotated using an annotation system consisting of the PurePos PoS tagger and the Hungarian morphological analyzer HuMor originally developed for Modern Hungarian but adapted to analyze Old and Middle Hungarian morphological constructions. The automatically disambiguated morphological annotation was manually checked and corrected using an easy-to-use web-based manual disambiguation interface. The normalization process and the manual validation of the annotation required extensive teamwork and provided continuous feedback for the refinement of the computational morphology and iterative retraining of the statistical models of the tagger. The paper discusses some of the typical problems that occurred during the normalization procedure and their tentative solutions. Besides, we also describe the automatic annotation tools, the process of semi-automatic disambiguation, and the query interface, a special function of which also makes correction of the annotation possible. Displaying the original, the normalized and the parsed versions of the selected texts, the beta version of the first fully normalized and annotated historical corpus of Hungarian is freely accessible online.
      PubDate: 2018-03-01
      DOI: 10.1007/s10579-017-9393-8
      Issue No: Vol. 52, No. 1 (2018)
  • The PROIEL treebank family: a standard for early attestations of Indo-European languages
    • Authors: Hanne Eckhoff; Kristin Bech; Gerlof Bouma; Kristine Eide; Dag Haug; Odd Einar Haugen; Marius Jøhndal
      Pages: 29 - 65
      Abstract: This article describes a family of dependency treebanks of early attestations of Indo-European languages originating in the parallel treebank built by the members of the project pragmatic resources in old Indo-European languages. The treebanks all share a set of open-source software tools, including a web annotation interface, and a set of annotation schemes and guidelines developed especially for the project languages. The treebanks use an enriched dependency grammar scheme complemented by detailed morphological tags, which have proved sufficient to give detailed descriptions of these richly inflected languages, and which have been easy to adapt to new languages. We describe the tools and annotation schemes and discuss some challenges posed by the various languages that have been annotated. We also discuss problems with tokenisation, sentence division and lemmatisation, commonly encountered in ancient and mediaeval texts, and challenges associated with low levels of standardisation and ongoing morphological and syntactic change.
      PubDate: 2018-03-01
      DOI: 10.1007/s10579-017-9388-5
      Issue No: Vol. 52, No. 1 (2018)
  • Hindi CCGbank: A CCG treebank from the Hindi dependency treebank
    • Authors: Bharat Ram Ambati; Tejaswini Deoskar; Mark Steedman
      Pages: 67 - 100
      Abstract: In this paper, we present an approach for automatically creating a combinatory categorial grammar (CCG) treebank from a dependency treebank for the subject–object–verb language Hindi. Rather than a direct conversion from dependency trees to CCG trees, we propose a two stage approach: a language independent generic algorithm first extracts a CCG lexicon from the dependency treebank. An exhaustive CCG parser then creates a treebank of CCG derivations. We also discuss special cases of this generic algorithm to handle linguistic phenomena specific to Hindi. In doing so we extract different constructions with long-range dependencies like coordinate constructions and non-projective dependencies resulting from constructions like relative clauses, noun elaboration and verbal modifiers.
      PubDate: 2018-03-01
      DOI: 10.1007/s10579-017-9379-6
      Issue No: Vol. 52, No. 1 (2018)
  • RST Signalling Corpus: a corpus of signals of coherence relations
    • Authors: Debopam Das; Maite Taboada
      Pages: 149 - 184
      Abstract: We present the RST Signalling Corpus (Das et al. in RST signalling corpus, LDC2015T10., 2015), a corpus annotated for signals of coherence relations. The corpus is developed over the RST Discourse Treebank (Carlson et al. in RST Discourse Treebank, LDC2002T07., 2002) which is annotated for coherence relations. In the RST Signalling Corpus, these relations are further annotated with signalling information. The corpus includes annotation not only for discourse markers which are considered to be the most typical (or sometimes the only type of) signals in discourse, but also for a wide array of other signals such as reference, lexical, semantic, syntactic, graphical and genre features as potential indicators of coherence relations. We describe the research underlying the development of the corpus and the annotation process, and provide details of the corpus. We also present the results of an inter-annotator agreement study, illustrating the validity and reproducibility of the annotation. The corpus is available through the Linguistic Data Consortium, and can be used to investigate the psycholinguistic mechanisms behind the interpretation of relations through signalling, and also to develop discourse-specific computational systems such as discourse parsing applications.
      PubDate: 2018-03-01
      DOI: 10.1007/s10579-017-9383-x
      Issue No: Vol. 52, No. 1 (2018)
  • A flexible text analyzer based on ontologies: an application for detecting discriminatory language
    • Authors: Alberto Salguero; Macarena Espinilla
      Pages: 185 - 215
      Abstract: Language can be a tool to marginalize certain groups due to the fact that it may reflect a negative mentality caused by mental barriers or historical delays. In order to prevent misuse of language, several agents have carried out campaigns against discriminatory language, criticizing the use of some terms and phrases. However, there is an important gap in detecting discriminatory text in documents because language is very flexible and, usually, contains hidden features or relations. Furthermore, the adaptation of approaches and methodologies proposed in the literature for text analysis is complex due to the fact that these proposals are too rigid to be adapted to different purposes for which they were intended. The main novelty of the methodology is the use of ontologies to implement the rules that are used by the developed text analyzer, providing a great flexibility for the development of text analyzers and exploiting the ability to infer knowledge of the ontologies. A set of rules for detecting discriminatory language relevant to gender and people with disabilities is also presented in order to show how to extend the functionality of the text analyzer to different discriminatory text areas.
      PubDate: 2018-03-01
      DOI: 10.1007/s10579-017-9387-6
      Issue No: Vol. 52, No. 1 (2018)
  • Ensuring annotation consistency and accuracy for Vietnamese treebank
    • Authors: Quy T. Nguyen; Yusuke Miyao; Ha T. T. Le; Nhung T. H. Nguyen
      Pages: 269 - 315
      Abstract: Treebanks are important resources for researchers in natural language processing. They provide training and testing materials so that different algorithms can be compared. However, it is not a trivial task to construct high-quality treebanks. We have not yet had a proper treebank for such a low-resource language as Vietnamese, which has probably lowered the performance of Vietnamese language processing. We have been building a consistent and accurate Vietnamese treebank to alleviate such situations. Our treebank is annotated with three layers: word segmentation, part-of-speech tagging, and bracketing. We developed detailed annotation guidelines for each layer by presenting Vietnamese linguistic issues as well as methods of addressing them. Here, we also describe approaches to controlling annotation quality while ensuring a reasonable annotation speed. We specifically designed an appropriate annotation process and an effective process to train annotators. In addition, we implemented several support tools to improve annotation speed and to control the consistency of the treebank. The results from experiments revealed that both inter-annotator agreement and accuracy were higher than 90%, which indicated that the treebank is reliable.
      PubDate: 2018-03-01
      DOI: 10.1007/s10579-017-9398-3
      Issue No: Vol. 52, No. 1 (2018)
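
The abstract above reports inter-annotator agreement above 90% but does not say which measure was used. As a minimal sketch, assuming two annotators labelled the same tokens with one tag each, the following computes raw (observed) agreement and Cohen's kappa. The function names and the toy tag sequences are illustrative and do not come from the paper.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_observed = observed_agreement(labels_a, labels_b)
    n = len(labels_a)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same label
    # if each labels independently according to their own label distribution.
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical part-of-speech tags from two annotators for four tokens.
    annotator_1 = ["N", "V", "N", "A"]
    annotator_2 = ["N", "V", "A", "A"]
    print(observed_agreement(annotator_1, annotator_2))
    print(cohen_kappa(annotator_1, annotator_2))
```

Raw agreement is easy to interpret but can look high when one label dominates; kappa discounts that chance component, which is why the two are often reported together.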
  • Cross-language transfer of semantic annotation via targeted crowdsourcing: task design and evaluation
    • Authors: Evgeny A. Stepanov; Shammur Absar Chowdhury; Ali Orkan Bayer; Arindam Ghosh; Ioannis Klasinas; Marcos Calvo; Emilio Sanchis; Giuseppe Riccardi
      Pages: 341 - 364
      Abstract: Modern data-driven spoken language systems (SLS) require manual semantic annotation for training spoken language understanding parsers. Multilingual porting of SLS demands significant manual effort and language resources, as this manual annotation has to be replicated. Crowdsourcing is an accessible and cost-effective alternative to traditional methods of collecting and annotating data. The application of crowdsourcing to simple tasks has been well investigated. However, complex tasks, like cross-language semantic annotation transfer, may generate low judgment agreement and/or poor performance. The most serious issue in cross-language porting is the absence of reference annotations in the target language; thus, crowd quality control and the evaluation of the collected annotations is difficult. In this paper we investigate targeted crowdsourcing for semantic annotation transfer that delegates to crowds a complex task such as segmenting and labeling of concepts taken from a domain ontology; and evaluation using source language annotation. To test the applicability and effectiveness of the crowdsourced annotation transfer we have considered the case of close and distant language pairs: Italian–Spanish and Italian–Greek. The corpora annotated via crowdsourcing are evaluated against source and target language expert annotations. We demonstrate that the two evaluation references (source and target) highly correlate with each other; thus, drastically reduce the need for the target language reference annotations.
      PubDate: 2018-03-01
      DOI: 10.1007/s10579-017-9396-5
      Issue No: Vol. 52, No. 1 (2018)
  • Exploring the fine-grained analysis and automatic detection of irony on
    • Authors: Cynthia Van Hee; Els Lefever; Véronique Hoste
      Abstract: To push the state of the art in text mining applications, research in natural language processing has increasingly been investigating automatic irony detection, but manually annotated irony corpora are scarce. We present the construction of a manually annotated irony corpus based on a fine-grained annotation scheme that allows for identification of different types of irony. We conduct a series of binary classification experiments for automatic irony recognition using a support vector machine (SVM) that exploits a varied feature set and compare this method to a deep learning approach that is based on an LSTM network and (pre-trained) word embeddings. Evaluation on a held-out corpus shows that the SVM model outperforms the neural network approach and benefits from combining lexical, semantic and syntactic information sources. A qualitative analysis of the classification output reveals that the classifier performance may be further enhanced by integrating implicit sentiment information and context- and user-based features.
      PubDate: 2018-02-26
      DOI: 10.1007/s10579-018-9414-2
  • The Talk of Norway: a richly annotated corpus of the Norwegian parliament,
    • Authors: Emanuele Lapponi; Martin G. Søyland; Erik Velldal; Stephan Oepen
      Abstract: In this work we present the Talk of Norway (ToN) data set, a collection of Norwegian Parliament speeches from 1998 to 2016. Every speech is richly annotated with metadata harvested from different sources, and augmented with language type, sentence, token, lemma, part-of-speech, and morphological feature annotations. We also present a pilot study on party classification in the Norwegian Parliament, carried out in the context of a cross-faculty collaboration involving researchers from both Political Science and Computer Science. Our initial experiments demonstrate how the linguistic and institutional annotations in ToN can be used to gather insights on how different aspects of the political process affect classification.
      PubDate: 2018-02-13
      DOI: 10.1007/s10579-018-9411-5
  • Annotated news corpora and a lexicon for sentiment analysis in Slovene
    • Authors: Jože Bučar; Martin Žnidaršič; Janez Povh
      Abstract: In this study, we introduce Slovene web-crawled news corpora with sentiment annotation on three levels of granularity: sentence, paragraph and document levels. We describe the methodology and tools that were required for their construction. The corpora contain more than 250,000 documents with political, business, economic and financial content from five Slovene media resources on the web. More than 10,000 of them were manually annotated as negative, neutral or positive. All corpora are publicly available under a Creative Commons copyright license. We used the annotated documents to construct a Slovene sentiment lexicon, which is the first of its kind for Slovene, and to assess the sentiment classification approaches used. The constructed corpora were also utilised to monitor within-the-document sentiment dynamics, its changes over time and relations with news topics. We show that sentiment is, on average, more explicit at the beginning of documents, and it loses sharpness towards the end of documents.
      PubDate: 2018-02-06
      DOI: 10.1007/s10579-018-9413-3
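
The abstract above states that sentiment is, on average, more explicit at the beginning of documents. One way such a within-document trend could be measured is sketched below, assuming each document is already represented as a list of per-sentence sentiment scores in [-1, 1]. The scoring scale, the binning by relative position, and the function name are assumptions for illustration, not the authors' procedure.

```python
def explicitness_by_position(documents, n_bins=4):
    """Average absolute sentence sentiment per relative-position bin.

    documents: list of documents, each a list of per-sentence scores in [-1, 1].
    Returns a list of n_bins averages (None for bins that received no sentences).
    """
    totals = [0.0] * n_bins
    counts = [0] * n_bins
    for scores in documents:
        for index, score in enumerate(scores):
            # Map the sentence index to a bin by its relative position in the document.
            bin_index = min(index * n_bins // len(scores), n_bins - 1)
            totals[bin_index] += abs(score)
            counts[bin_index] += 1
    return [total / count if count else None for total, count in zip(totals, counts)]


if __name__ == "__main__":
    # Two toy documents: strong sentiment early, neutral sentences later.
    docs = [[1.0, -1.0, 0.0, 0.0], [-1.0, 0.0]]
    print(explicitness_by_position(docs, n_bins=2))
```

Using the absolute value treats strongly negative and strongly positive sentences as equally explicit, so the profile reflects sharpness of sentiment and not its polarity.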
  • TermFinder: log-likelihood comparison and phrase-based statistical machine translation models for bilingual terminology extraction
    • Authors: Rejwanul Haque; Sergio Penkale; Andy Way
      Abstract: Bilingual termbanks are important for many natural language processing applications, especially in translation workflows in industrial settings. In this paper, we apply a log-likelihood comparison method to extract monolingual terminology from the source and target sides of a parallel corpus. The initial candidate terminology list is prepared by taking all arbitrary n-gram word sequences from the corpus. Then, a well-known statistical measure (the Dice coefficient) is employed in order to remove any multi-word terms with weak associations from the candidate term list. Thereafter, the log-likelihood comparison method is applied to rank the phrasal candidate term list. Then, using a phrase-based statistical machine translation model, we create a bilingual terminology with the extracted monolingual term lists. We integrate an external knowledge source—the Wikipedia cross-language link databases—into the terminology extraction (TE) model to assist two processes: (a) the ranking of the extracted terminology list, and (b) the selection of appropriate target terms for a source term. First, we report the performance of our monolingual TE model compared to a number of the state-of-the-art TE models on English-to-Turkish and English-to-Hindi data sets. Then, we evaluate our novel bilingual TE model on an English-to-Turkish data set, and report the automatic evaluation results. We also manually evaluate our novel TE model on English-to-Spanish and English-to-Hindi data sets, and observe excellent performance for all domains.
      PubDate: 2018-02-03
      DOI: 10.1007/s10579-018-9412-4
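
The monolingual steps named in the abstract above (n-gram candidates, a Dice-coefficient filter, log-likelihood ranking) can be sketched for two-word candidates as follows. This is a simplified illustration and not the TermFinder implementation: it assumes the common two-corpus log-likelihood formulation that compares a term's frequency in a domain corpus with its frequency in a reference corpus, and the threshold `min_dice`, the function names, and the toy corpora are hypothetical.

```python
import math
from collections import Counter


def dice(pair_count, left_count, right_count):
    """Dice coefficient of a two-word candidate: 2 * f(xy) / (f(x) + f(y))."""
    return 2.0 * pair_count / (left_count + right_count)


def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood for one term.

    a, b: term frequency in the domain and reference corpus.
    c, d: sizes of the domain and reference corpus.
    """
    expected_a = c * (a + b) / (c + d)
    expected_b = d * (a + b) / (c + d)
    total = 0.0
    if a > 0:
        total += a * math.log(a / expected_a)
    if b > 0:
        total += b * math.log(b / expected_b)
    return 2.0 * total


def rank_bigrams(domain_tokens, reference_tokens, min_dice=0.1):
    """Filter bigram candidates by Dice, then rank them by log-likelihood."""
    unigrams = Counter(domain_tokens)
    bigrams = Counter(zip(domain_tokens, domain_tokens[1:]))
    reference_bigrams = Counter(zip(reference_tokens, reference_tokens[1:]))
    n_domain = max(len(domain_tokens) - 1, 1)
    n_reference = max(len(reference_tokens) - 1, 1)
    scored = []
    for (left, right), freq in bigrams.items():
        # Drop weakly associated word pairs before ranking.
        if dice(freq, unigrams[left], unigrams[right]) < min_dice:
            continue
        score = log_likelihood(freq, reference_bigrams[(left, right)], n_domain, n_reference)
        scored.append(((left, right), score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    domain = "machine translation model uses machine translation data".split()
    reference = "the model uses the data".split()
    for candidate, score in rank_bigrams(domain, reference):
        print(candidate, round(score, 3))
```

Note that this log-likelihood score is symmetric: it grows whenever the two frequencies diverge, so a fuller version would also check that the candidate is relatively more frequent in the domain corpus.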
  • The Spoken Wikipedia Corpus collection: Harvesting, alignment and an application to hyperlistening
    • Authors: Timo Baumann; Arne Köhn; Felix Hennig
      Abstract: Spoken corpora are important for speech research, but are expensive to create and do not necessarily reflect (read or spontaneous) speech ‘in the wild’. We report on our conversion of the preexisting and freely available Spoken Wikipedia into a speech resource. The Spoken Wikipedia project unites volunteer readers of Wikipedia articles. There are initiatives to create and sustain Spoken Wikipedia versions in many languages and hence the available data grows over time. Thousands of spoken articles are available to users who prefer a spoken over the written version. We turn these semi-structured collections into structured and time-aligned corpora, keeping the exact correspondence with the original hypertext as well as all available metadata. Thus, we make the Spoken Wikipedia accessible for sustainable research. We present our open-source software pipeline that downloads, extracts, normalizes and text–speech aligns the Spoken Wikipedia. Additional language versions can be exploited by adapting configuration files or extending the software if necessary for language peculiarities. We also present and analyze the resulting corpora for German, English, and Dutch, which presently total 1005 h and grow at an estimated 87 h per year. The corpora, together with our software, are available online. As a prototype usage of the time-aligned corpus, we describe an experiment about the preferred modalities for interacting with information-rich read-out hypertext. We find alignments to help improve user experience and factual information access by enabling targeted interaction.
      PubDate: 2018-01-09
      DOI: 10.1007/s10579-017-9410-y
  • Introduction to the special issue
    • Authors: Laurette Pretorius; Claudia Soria
      Pages: 891 - 895
      PubDate: 2017-12-01
      DOI: 10.1007/s10579-017-9405-8
      Issue No: Vol. 51, No. 4 (2017)
  • Nenek: a cloud-based collaboration platform for the management of Amerindian language resources
    • Authors: J. L. Gonzalez; Anuschka van’t Hooft; Jesus Carretero; Victor J. Sosa-Sosa
      Pages: 897 - 925
      Abstract: This article presents Nenek, a cloud-based collaboration platform for language documentation of under-resourced languages. Nenek is based on a crowdsourcing scheme that supports native speakers, indigenous associations, government agencies and researchers in the creation of virtual communities of minority language speakers on the Internet. Nenek includes a set of web tools that enable users to work collaboratively on language documentation tasks, build lexicographic assets and produce new language resources. The platform includes a three-stage management model to control the acquisition of existing language resources, the manufacturing of new resources, and their distribution within the virtual community and to the general public. In the acquisition stage, existing language resources are either automatically extracted from the web by a crawler or received through donations from users who participate in a monolingual social network. In the manufacturing stage, lexicographic and collaborative tools enable users to build new resources. In the diffusion stage, the acquired and manufactured resources are published, either within the virtual community or publicly. We present a life cycle mapping scheme that registers the transformations of the language resources at each of the three stages of language resource management. This scheme also traces the utilization and diffusion of each resource produced by the virtual community. The paper includes a case study in which we present the use of the Nenek platform in a documentation project for Huastec, a Mayan language spoken in Mexico's Gulf coast region. This case study reveals Nenek's efficiency in terms of acquisition, annotation, manufacturing and diffusion of language resources; it also discusses the participation of the members of the virtual community.
      PubDate: 2017-12-01
      DOI: 10.1007/s10579-016-9361-8
      Issue No: Vol. 51, No. 4 (2017)
  • Crawl and crowd to bring machine translation to under-resourced languages
    • Authors: Antonio Toral; Miquel Esplá-Gomis; Filip Klubička; Nikola Ljubešić; Vassilis Papavassiliou; Prokopis Prokopidis; Raphael Rubino; Andy Way
      Pages: 1019 - 1051
      Abstract: We present a widely applicable methodology to bring machine translation (MT) to under-resourced languages in a cost-effective and rapid manner. Our proposal relies on web crawling to automatically acquire parallel data to train statistical MT systems if any such data can be found for the language pair and domain of interest. If that is not the case, we resort to (1) crowdsourcing to translate small amounts of text (hundreds of sentences), which are then used to tune statistical MT models, and (2) web crawling of vast amounts of monolingual data (millions of sentences), which are then used to build language models for MT. We apply these to two respective use-cases for Croatian, an under-resourced language that has gained relevance since it recently attained official status in the European Union. The first use-case regards tourism, given the importance of this sector to Croatia’s economy, while the second has to do with tweets, due to the growing importance of social media. For tourism, we crawl parallel data from 20 web domains using two state-of-the-art crawlers and explore how to combine the crawled data with bigger amounts of general-domain data. Our domain-adapted system is evaluated on a set of three additional tourism web domains and it outperforms the baseline in terms of automatic metrics and/or vocabulary coverage. In the social media use-case, we deal with tweets from the 2014 edition of the soccer World Cup. We build domain-adapted systems by (1) translating small amounts of tweets to be used for tuning by means of crowdsourcing and (2) crawling vast amounts of monolingual tweets. These systems outperform the baseline (Microsoft Bing) by 7.94 BLEU points (5.11 TER) for Croatian-to-English and by 2.17 points (1.94 TER) for English-to-Croatian on a test set translated by means of crowdsourcing. A complementary manual analysis sheds further light on these results.
      PubDate: 2017-12-01
      DOI: 10.1007/s10579-016-9363-6
      Issue No: Vol. 51, No. 4 (2017)
  • ATR4S: toolkit with state-of-the-art automatic terms recognition methods in Scala
    • Authors: Nikita Astrakhantsev
      Abstract: Automatically recognized terminology is widely used for various domain-specific text processing tasks, such as machine translation, information retrieval or ontology construction. However, there is still no agreement on which methods are best suited for particular settings and, moreover, there is no reliable comparison of already developed methods. We believe that one of the main reasons is the lack of state-of-the-art method implementations, which are usually non-trivial to recreate, mostly in terms of software engineering effort. In order to address these issues, we present ATR4S, open-source software written in Scala that comprises 13 state-of-the-art methods for automatic terminology recognition (ATR) and implements the whole pipeline from text document preprocessing to term candidate collection, term candidate scoring and, finally, term candidate ranking. It is a highly scalable, modular and configurable tool with support for automatic caching. We also compare 13 state-of-the-art methods on 7 open datasets by average precision and processing time. The experimental comparison reveals that no single method demonstrates the best average precision on all datasets and that other available tools for ATR do not contain the best methods.
      PubDate: 2017-12-21
      DOI: 10.1007/s10579-017-9409-4
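The abstract above compares methods by average precision. As a minimal sketch of how such a score can be computed for a ranked list of term candidates, the Python function below averages precision at each rank where a gold term appears. It is written in Python rather than Scala and is not taken from ATR4S. The normalisation (dividing by the number of gold terms actually found in the list), the assumption that the ranked list has no duplicates, and the toy data are all choices made for this illustration.

```python
def average_precision(ranked_terms, gold_terms):
    """Average of precision@k over the ranks k at which a gold term is found.

    Assumes ranked_terms contains no duplicates. Normalises by the number of
    gold terms found in the list, which is one of several common variants.
    """
    gold = set(gold_terms)
    hits = 0
    precision_sum = 0.0
    for rank, term in enumerate(ranked_terms, start=1):
        if term in gold:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


# Hypothetical output of a term recogniser, best candidate first.
ranked = ["neural network", "the", "gradient descent", "of", "loss function"]
gold = {"neural network", "gradient descent", "loss function"}
ap = average_precision(ranked, gold)
```

Gold terms here sit at ranks 1, 3 and 5, so the score is the mean of 1/1, 2/3 and 3/5. Pushing the non-terms "the" and "of" to the bottom of the list would raise the score to 1.0.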
  • Building the Galician wordnet: methods and applications
    • Authors: Xavier Gómez Guinovart; Miguel Anxo Solla Portela
      Abstract: This paper presents the different methodologies and resources used to build Galnet, the Galician version of WordNet. It reviews the different extraction processes and the lexicographical and textual sources used to develop this resource, and describes some of its applications in ontology research and terminology processing.
      PubDate: 2017-11-29
      DOI: 10.1007/s10579-017-9408-5
  • The corpus of Basque simplified texts (CBST)
    • Authors: Itziar Gonzalez-Dios; María Jesús Aranzabe; Arantza Díaz de Ilarraza
      Abstract: In this paper we present the corpus of Basque simplified texts. This corpus compiles 227 original sentences from the science popularisation domain and two simplified versions of each sentence. The simplified versions have been created following two different approaches: the structural approach, by a court translator who considers easy-to-read guidelines, and the intuitive approach, by a teacher based on her experience. The aim of this corpus is to make a comparative analysis of simplified text. To that end, we also present the annotation scheme we have created to annotate the corpus. The annotation scheme is divided into eight macro-operations: delete, merge, split, transformation, insert, reordering, no operation and other. These macro-operations can be further classified into different operations. We also relate our work and results to other languages. This corpus will be used to corroborate the decisions taken and to improve the design of the automatic text simplification system for Basque.
      PubDate: 2017-11-18
      DOI: 10.1007/s10579-017-9407-6
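The eight macro-operations named in the abstract above lend themselves to a small data-structure sketch. The Python snippet below represents them as an enumeration and counts how often each one occurs in a list of annotations, as one might do when comparing the two simplification approaches. The class and function names and the sample annotations are hypothetical; they do not reflect the corpus's actual file format or contents.

```python
from collections import Counter
from enum import Enum


class MacroOperation(Enum):
    """The eight macro-operations listed in the abstract."""
    DELETE = "delete"
    MERGE = "merge"
    SPLIT = "split"
    TRANSFORMATION = "transformation"
    INSERT = "insert"
    REORDERING = "reordering"
    NO_OPERATION = "no operation"
    OTHER = "other"


def operation_profile(annotations):
    """Count occurrences of every macro-operation, including those never used."""
    counts = Counter(annotations)
    return {op: counts.get(op, 0) for op in MacroOperation}


# Invented annotations for one sentence under each approach.
structural = [MacroOperation.SPLIT, MacroOperation.DELETE, MacroOperation.SPLIT]
intuitive = [MacroOperation.TRANSFORMATION, MacroOperation.DELETE]

structural_profile = operation_profile(structural)
intuitive_profile = operation_profile(intuitive)
```

Keeping zero counts in the profile makes the two approaches directly comparable operation by operation.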
  • Automatic speech recognition system for Tunisian dialect
    • Authors: Abir Masmoudi; Fethi Bougares; Mariem Ellouze; Yannick Estève; Lamia Belguith
      Abstract: Although Modern Standard Arabic is taught in schools and used in written communication and TV/radio broadcasts, all informal communication is typically carried out in dialectal Arabic. In this work, we focus on the design of speech tools and resources required for the development of an Automatic Speech Recognition system for the Tunisian dialect. The development of such a system faces the challenges of the lack of annotated resources and tools, apart from the lack of standardization at all linguistic levels (phonological, morphological, syntactic and lexical) together with the mispronunciation dictionary needed for ASR development. In this paper, we present a historical overview of the Tunisian dialect and its linguistic characteristics. We also describe and evaluate our rule-based phonetic tool. Next, we go deeper into the details of Tunisian dialect corpus creation. This corpus is finally approved and used to build the first ASR system for Tunisian dialect with a Word Error Rate of 22.6%.
      PubDate: 2017-09-22
      DOI: 10.1007/s10579-017-9402-y
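The Word Error Rate reported in the abstract above is conventionally the word-level edit distance between a reference transcript and the recogniser's hypothesis, divided by the number of reference words. A minimal Python sketch of that standard definition follows. The function name, the whitespace tokenisation and the English example sentences are assumptions for illustration, and the authors' evaluation setup may differ in normalisation details.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

The example needs one substitution ("sat" to "sit") and one deletion ("the") against six reference words, giving a rate of 2/6. Because insertions are counted too, the rate can exceed 1.0 for very poor hypotheses.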
  • The challenging task of summary evaluation: an overview
    • Authors: Elena Lloret; Laura Plaza; Ahmet Aker
      Abstract: Evaluation is crucial in the research and development of automatic summarization applications, in order to determine the appropriateness of a summary based on different criteria, such as the content it contains, and the way it is presented. To perform an adequate evaluation is of great relevance to ensure that automatic summaries can be useful for the context and/or application they are generated for. To this end, researchers must be aware of the evaluation metrics, approaches, and datasets that are available, in order to decide which of them would be the most suitable to use, or to be able to propose new ones, overcoming the possible limitations that existing methods may present. In this article, a critical and historical analysis of evaluation metrics, methods, and datasets for automatic summarization systems is presented, where the strengths and weaknesses of evaluation efforts are discussed and the major challenges to solve are identified. Therefore, a clear up-to-date overview of the evolution and progress of summarization evaluation is provided, giving the reader useful insights into the past, present and latest trends in the automatic evaluation of summaries.
      PubDate: 2017-09-02
      DOI: 10.1007/s10579-017-9399-2