International Journal on Digital Libraries
Journal Prestige (SJR): 0.441
Citation Impact (CiteScore): 2
Number of Followers: 710  
 
  Hybrid journal (may contain Open Access articles)
ISSN (Print) 1432-5012 - ISSN (Online) 1432-1300
Published by Springer-Verlag
  • Neural ParsCit: a deep learning-based reference string parser
    • Authors: Animesh Prasad; Manpreet Kaur; Min-Yen Kan
      Pages: 323 - 337
      Abstract: We present a deep learning approach for the core digital libraries task of parsing bibliographic reference strings. We deploy the state-of-the-art long short-term memory (LSTM) neural network architecture, a variant of a recurrent neural network, to capture long-range dependencies in reference strings. We explore word embeddings and character-based word embeddings as an alternative to handcrafted features. We incrementally experiment with features, architectural configurations, and the diversity of the dataset. Our final model is an LSTM-based architecture, which layers a linear chain conditional random field (CRF) over the LSTM output. In extensive experiments on both English in-domain (computer science) and out-of-domain (humanities) test cases, as well as multilingual data, our results show a significant gain (\(p<0.01\)) over the reported state-of-the-art CRF-only-based parser.
      PubDate: 2018-11-01
      DOI: 10.1007/s00799-018-0242-1
      Issue No: Vol. 19, No. 4 (2018)
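The handcrafted features that the paper's neural model is designed to replace can be illustrated with a minimal rule-based tagger. This is a sketch under assumptions: the label set and regex heuristics below are illustrative inventions, not the paper's actual feature templates or output schema.

```python
import re

def tag_reference_tokens(ref):
    """Assign coarse field labels to the tokens of a bibliographic
    reference string using simple handcrafted heuristics (the kind of
    feature engineering that embedding-based models aim to avoid)."""
    labels = []
    for tok in ref.split():
        if re.fullmatch(r"\(?(19|20)\d{2}\)?[.,]?", tok):
            labels.append("YEAR")          # e.g. "(2018)" or "2018,"
        elif re.fullmatch(r"\d+\s*[-–]\s*\d+[.,]?", tok):
            labels.append("PAGES")         # e.g. "323-337."
        elif tok[:1].isupper() and tok.rstrip(".,").isalpha():
            labels.append("NAME-OR-TITLE") # capitalized word
        else:
            labels.append("OTHER")
    return list(zip(ref.split(), labels))

tagged = tag_reference_tokens("Kan, M. (2018) Neural ParsCit. IJDL 19, 323-337.")
```

A sequence model such as the paper's LSTM-CRF would instead learn these field boundaries from embeddings, without any of the regexes above.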
       
  • Promoting user engagement with digital cultural heritage collections
    • Authors: Maristella Agosti; Nicola Orio; Chiara Ponchia
      Pages: 353 - 366
      Abstract: In the context of cooperating in a project whose central aim has been the production of a corpus agnostic research environment supporting access to and exploitation of digital cultural heritage collections, we have worked towards promoting user engagement with the collections. The aim of this paper is to present the methods and the solutions that have been envisaged and implemented to engage a diversified range of users with digital collections. Innovative solutions to stimulate and enhance user engagement have been achieved through a sustained and broad-based involvement of different cohorts of users. In particular, we propose the use of narratives to support and guide users within the collection and present them the main available tools. In moving beyond the specialized, search-based and stereotyped norm, the environment that we have contributed to developing offers a new approach for accessing and interacting with cultural heritage collections. It shows the value of an adaptive interface that dynamically responds to support the user, whatever his or her level of experience with digital environments or familiarity with the content may be.
      PubDate: 2018-11-01
      DOI: 10.1007/s00799-018-0245-y
      Issue No: Vol. 19, No. 4 (2018)
       
  • Introduction to the special issue on bibliometric-enhanced information
           retrieval and natural language processing for digital libraries (BIRNDL)
    • Authors: Philipp Mayr; Ingo Frommholz; Guillaume Cabanac; Muthu Kumar Chandrasekaran; Kokil Jaidka; Min-Yen Kan; Dietmar Wolfram
      Pages: 107 - 111
      Abstract: The large scale of scholarly publications poses a challenge for scholars in information seeking and sensemaking. Bibliometric, information retrieval (IR), text mining, and natural language processing techniques can help address this challenge but have yet to be widely used in digital libraries (DL). This special issue on bibliometric-enhanced information retrieval and natural language processing for digital libraries (BIRNDL) was compiled after the first joint BIRNDL workshop, held at the joint conference on digital libraries (JCDL 2016) in Newark, New Jersey, USA. It brought together IR and DL researchers and professionals to elaborate on new approaches in natural language processing, information retrieval, scientometric, and recommendation techniques that can advance the state of the art in scholarly document understanding, analysis, and retrieval at scale. This special issue includes 14 papers: four extended papers originating from the first BIRNDL workshop 2016 and the BIR workshop at ECIR 2016, four extended system reports of the CL-SciSumm Shared Task 2016, and six original research papers submitted via the open call for papers.
      PubDate: 2018-09-01
      DOI: 10.1007/s00799-017-0230-x
      Issue No: Vol. 19, No. 2-3 (2018)
       
  • Insights from CL-SciSumm 2016: the faceted scientific document
           summarization Shared Task
    • Authors: Kokil Jaidka; Muthu Kumar Chandrasekaran; Sajal Rustagi; Min-Yen Kan
      Pages: 163 - 171
      Abstract: We describe the participation and the official results of the 2nd Computational Linguistics Scientific Summarization Shared Task (CL-SciSumm), held as a part of the BIRNDL workshop at the Joint Conference on Digital Libraries 2016 in Newark, New Jersey. CL-SciSumm is the first medium-scale Shared Task on scientific document summarization in the computational linguistics (CL) domain. Participants were provided a training corpus of 30 topics, each comprising a reference paper (RP) and 10 or more citing papers, all of which cite the RP. For each citation, the text spans (i.e., citances) that pertain to the RP have been identified. Participants solved three sub-tasks in automatic research paper summarization using this text corpus. Fifteen teams from six countries registered for the Shared Task, of which ten teams ultimately submitted and presented their results. The annotated corpus comprised 30 target papers—currently the largest available corpus of its kind. The corpus is available for free download and use at https://github.com/WING-NUS/scisumm-corpus.
      PubDate: 2018-09-01
      DOI: 10.1007/s00799-017-0221-y
      Issue No: Vol. 19, No. 2-3 (2018)
       
  • Identifying reference spans: topic modeling and word embeddings help IR
    • Authors: Luis Moraes; Shahryar Baki; Rakesh Verma; Daniel Lee
      Pages: 191 - 202
      Abstract: The CL-SciSumm 2016 shared task introduced an interesting problem: given a document D and a piece of text that cites D, how do we identify the text spans of D being referenced by the piece of text? The shared task provided the first annotated dataset for studying this problem. We present an analysis of our continued work in improving our system’s performance on this task. We demonstrate how topic models and word embeddings can be used to surpass the previously best performing system.
      PubDate: 2018-09-01
      DOI: 10.1007/s00799-017-0220-z
      Issue No: Vol. 19, No. 2-3 (2018)
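The span-matching problem described above can be framed as ranking the reference paper's sentences by similarity to the citance. A minimal sketch, using plain term-frequency cosine similarity as a stand-in for the topic-model and word-embedding representations the authors actually employ; the toy sentences are invented for illustration:

```python
import math
import re
from collections import Counter

def tf_vector(text):
    """Lowercased term-frequency vector over alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_reference_span(citance, ref_sentences):
    """Return the reference-paper sentence most similar to the citance."""
    cv = tf_vector(citance)
    return max(ref_sentences, key=lambda s: cosine(cv, tf_vector(s)))

spans = [
    "We introduce a corpus of annotated citations.",
    "Our parser uses a conditional random field.",
    "Results are reported on the test set.",
]
match = best_reference_span("They built an annotated citation corpus.", spans)
```

Replacing the raw term counts with topic-model or embedding vectors, as in the paper, keeps exactly this ranking structure while capturing matches between non-identical words.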
       
  • An MEI-based standard encoding for hierarchical music analyses
    • Authors: David Rizo; Alan Marsden
      Abstract: We propose a standard representation for hierarchical musical analyses as an extension to the Music Encoding Initiative (MEI) representation for music. Analyses of music need to be represented in digital form for the same reasons as music: preservation, sharing of data, data linking, and digital processing. Systems exist for representing sequential information, but many music analyses are hierarchical, whether represented explicitly in trees or graphs or not. Features of MEI allow the representation of an analysis to be directly associated with the elements of the music analyzed. MEI’s basis in TEI (Text Encoding Initiative) allows us to design a scheme which reuses some of the elements of TEI for the representation of trees and graphs. In order to capture both the information specific to a type of music analysis and the underlying form of an analysis as a tree or graph, we propose related “semantic” encodings, which capture the detailed information, and generic “non-semantic” encodings, which expose the tree or graph structure. We illustrate this with examples of representations of a range of different kinds of analysis.
      PubDate: 2018-12-08
      DOI: 10.1007/s00799-018-0262-x
       
  • A framework for modelling and visualizing the US Constitutional Convention
           of 1787
    • Authors: Nicholas Cole; Alfie Abdul-Rahman; Grace Mallon
      Abstract: This paper describes a new approach to the presentation of records relating to formal negotiations and the texts that they create. It describes the architecture of a model, platform, and web interface (https://www.quillproject.net) that can be used by domain experts to convert the records typical of formal negotiations into a model of decision-making (with minimal training). This model has implications for both research and teaching, by allowing for better qualitative and quantitative analysis of negotiations. The platform emphasizes the reconstruction as closely as possible of the context within which proposals and decisions are made. The usability and benefits of a generic platform are illustrated by a presentation of the records relating to the 1787 Constitutional Convention that wrote the Constitution of the USA.
      PubDate: 2018-11-26
      DOI: 10.1007/s00799-018-0263-9
       
  • Recent applications of Knowledge Organization Systems: introduction to a
           special issue
    • Authors: Koraljka Golub; Rudi Schmiede; Douglas Tudhope
      PubDate: 2018-11-21
      DOI: 10.1007/s00799-018-0264-8
       
  • An empirically validated, onomasiologically structured, and linguistically
           motivated online terminology
    • Authors: Karolina Suchowolec; Christian Lang; Roman Schneider
      Abstract: Terminological resources play a central role in the organization and retrieval of scientific texts. Both simple keyword lists and advanced modelings of relationships between terminological concepts can make a most valuable contribution to the analysis, classification, and finding of appropriate digital documents, either on the web or within local repositories. This seems especially true for long-established scientific fields with elusive theoretical and historical branches, where the use of terminology within documents from different origins is often far from being consistent. In this paper, we report on the progress of a linguistically motivated project on the onomasiological re-modeling of the terminological resources for the grammatical information system grammis. We present the design principles and the results of their application. In particular, we focus on new features for the authoring backend and discuss how these innovations help to evaluate existing, loosely structured terminological content, as well as to efficiently deal with automatic term extraction. Furthermore, we introduce a transformation to a future SKOS representation. We conclude with a positioning of our resources with regard to the Knowledge Organization discourse and discuss how a highly complex information environment like grammis benefits from the re-designed terminological KOS.
      PubDate: 2018-11-17
      DOI: 10.1007/s00799-018-0254-x
       
  • A pragmatic approach to hierarchical categorization of research expertise
           in the presence of scarce information
    • Authors: Gustavo Oliveira de Siqueira; Sérgio Canuto; Marcos André Gonçalves; Alberto H. F. Laender
      Abstract: Throughout the history of science, different knowledge areas have collaborated to overcome major research challenges. The task of associating a researcher with such areas makes a series of tasks feasible, such as the organization of digital repositories, expertise recommendation, and the formation of research groups for complex problems. In this article, we propose a simple yet effective automatic classification model that is capable of categorizing research expertise according to a knowledge area classification scheme. Our proposal relies on discriminatory evidence provided by the title of academic works, which is the minimum information capable of relating a researcher to his or her knowledge area. Our experiments show that using supervised machine learning methods trained with manually labeled information, it is possible to produce effective classification models.
      PubDate: 2018-11-16
      DOI: 10.1007/s00799-018-0260-z
       
  • Automated identification of media bias in news articles: an
           interdisciplinary literature review
    • Authors: Felix Hamborg; Karsten Donnay; Bela Gipp
      Abstract: Media bias, i.e., slanted news coverage, can strongly impact the public perception of the reported topics. In the social sciences, research over the past decades has developed comprehensive models to describe media bias and effective, yet often manual and thus cumbersome, methods for analysis. In contrast, computer science offers fast, automated, and scalable methods, but few approaches systematically analyze media bias. The models used to analyze media bias in computer science tend to be simpler compared to models established in the social sciences, and despite technically superior approaches, they do not necessarily address the most pressing substantive questions. Computer science research on media bias thus stands to profit from a closer integration of models for the study of media bias developed in the social sciences with automated methods from computer science. This article first establishes a shared conceptual understanding by mapping the state of the art from the social sciences to a framework which can be targeted by approaches from computer science. Next, we investigate different forms of media bias and review how each form is analyzed in the social sciences. For each form, we then discuss methods from computer science suitable to (semi-)automate the corresponding analysis. Our review suggests that suitable, automated methods from computer science, primarily in the realm of natural language processing, are already available for each of the discussed forms of media bias, opening multiple directions for promising further research in computer science in this area.
      PubDate: 2018-11-16
      DOI: 10.1007/s00799-018-0261-y
       
  • Anatomy of scholarly information behavior patterns in the wake of academic
           social media platforms
    • Authors: Hamed Alhoori; Mohammed Samaka; Richard Furuta; Edward A. Fox
      Abstract: As more scholarly content is born digital or converted to a digital format, digital libraries are becoming increasingly vital to researchers seeking to leverage scholarly big data for scientific discovery. Although scholarly products are available in abundance—especially in environments created by the advent of social networking services—little is known about international scholarly information needs, information-seeking behavior, or information use. The purpose of this paper is to address these gaps via an in-depth analysis of the information needs and information-seeking behavior of researchers, both students and faculty, at two universities, one in the USA and the other in Qatar. Based on this analysis, the study identifies and describes new behavior patterns on the part of researchers as they engage in the information-seeking process. The analysis reveals that the use of academic social networks has notable effects on various scholarly activities. Further, this study identifies differences between students and faculty members in regard to their use of academic social networks, and it identifies differences between researchers according to discipline. Although the researchers who participated in the present study represent a range of disciplinary and cultural backgrounds, the study reports a number of similarities in terms of the researchers’ scholarly activities.
      PubDate: 2018-11-03
      DOI: 10.1007/s00799-018-0255-9
       
  • Assessing plausibility of scientific claims to support high-quality
           content in digital collections
    • Authors: José María González Pinto; Wolf-Tilo Balke
      Abstract: This paper presents a formalization and extension of a novel approach to support high-quality content in digital libraries. Building on the concept of plausibility used in cognitive sciences, we aim at judging the plausibility of new scientific papers in light of prior knowledge. In particular, our work proposes a novel assessment of scientific papers to qualitatively support the work of reviewers. To do this, our approach focuses on the key aspect of scientific papers: claims. Claims are sentences found in empirical scientific papers that state statistical associations between entities and correspond to the core contributions of the papers. We can find these types of claims, for instance, in medicine, chemistry, and biology, where the consumption of a drug, a substance, or a product causes an effect on some other type of entity such as a disease, or another drug or substance. To operationalize the notion of plausibility, we promote claims as first-class citizens for scientific digital libraries and exploit state-of-the-art neural embedding representations of text and topic models. As a proof of concept of the potential usefulness of this notion of plausibility, we study and report extensive experiments on documents with scientific papers from the PubMed digital library.
      PubDate: 2018-10-28
      DOI: 10.1007/s00799-018-0256-8
       
  • Towards extracting event-centric collections from Web archives
    • Authors: Gerhard Gossen; Thomas Risse; Elena Demidova
      Abstract: Web archives constitute an increasingly important source of information for computer scientists, humanities researchers and journalists interested in studying past events. However, currently there are no access methods that help Web archive users to efficiently access event-centric information in large-scale archives that go beyond the retrieval of individual disconnected documents. In this article, we tackle the novel problem of extracting interlinked event-centric document collections from large-scale Web archives to facilitate an efficient and intuitive access to information regarding past events. We address this problem by: (1) enabling users to define event-centric document collections in an intuitive way through a Collection Specification; (2) developing a specialised extraction method that adapts focused crawling techniques to the Web archive setting; and (3) defining a function to judge the relevance of the archived documents with respect to the Collection Specification, taking into account the topical and temporal relevance of the documents. Our extended experiments on the German Web archive (covering a time period of 19 years) demonstrate that our method enables efficient extraction of event-centric collections for different event types.
      PubDate: 2018-10-27
      DOI: 10.1007/s00799-018-0258-6
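The relevance function described above combines topical and temporal relevance. One plausible shape for such a combination is a weighted mix of a topical similarity score and an exponential decay around the event date; the weighting, half-life, and function names below are illustrative assumptions, not the authors' actual formula:

```python
import math

def relevance(topical_sim, days_from_event, half_life_days=30.0, alpha=0.7):
    """Combine topical similarity (in [0, 1]) with an exponential
    temporal decay centred on the event date. alpha weights the
    topical component against the temporal one."""
    temporal = math.exp(-math.log(2) * abs(days_from_event) / half_life_days)
    return alpha * topical_sim + (1 - alpha) * temporal

# An equally on-topic page crawled near the event outranks one
# archived a year later.
near = relevance(0.8, days_from_event=5)
far = relevance(0.8, days_from_event=365)
```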
       
  • Cultural heritage metadata aggregation using web technologies: IIIF,
           Sitemaps and Schema.org
    • Authors: Nuno Freire; Glen Robson; John B. Howard; Hugo Manguinhas; Antoine Isaac
      Abstract: In the World Wide Web, a very large number of resources are made available through digital libraries. We (Europeana and data providers) report on case studies that tested the application of some of the most promising Web technologies, exploring several solutions based on the International Image Interoperability Framework (IIIF) and Sitemaps. We also describe an analysis of the Schema.org vocabulary for application in the context of cultural heritage and metadata aggregation. The solutions were tested successfully and leveraged existing technology and knowledge in cultural heritage, with low implementation barriers. The future challenges lie in choosing among the several possibilities and standardizing solution(s). Europeana will proceed with recommendations for its network and is actively working within the IIIF community to achieve this goal.
      PubDate: 2018-10-26
      DOI: 10.1007/s00799-018-0259-5
       
  • Tracking the history and evolution of entities: entity-centric temporal
           analysis of large social media archives
    • Authors: Pavlos Fafalios; Vasileios Iosifidis; Kostas Stefanidis; Eirini Ntoutsi
      Abstract: How did the popularity of the Greek Prime Minister evolve in 2015? How did the predominant sentiment about him vary during that period? Were there any controversial sub-periods? What other entities were related to him during these periods? To answer these questions, one needs to analyze archived documents and data about the query entities, such as old news articles or social media archives. In particular, user-generated content posted in social networks, like Twitter and Facebook, can be seen as a comprehensive documentation of our society, and thus, meaningful analysis methods over such archived data are of immense value for sociologists, historians, and other interested parties who want to study the history and evolution of entities and events. To this end, in this paper we propose an entity-centric approach to analyze social media archives and we define measures that allow studying how entities were reflected in social media in different time periods and under different aspects, like popularity, attitude, controversiality, and connectedness with other entities. A case study using a large Twitter archive of 4 years illustrates the insights that can be gained by such an entity-centric and multi-aspect analysis.
      PubDate: 2018-10-26
      DOI: 10.1007/s00799-018-0257-7
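The simplest of the measures described above, an entity's popularity over time, amounts to bucketing the entity's mentions by time period. A minimal sketch with invented toy data (the paper's actual measures are defined over a large Twitter archive and cover further aspects such as attitude and connectedness):

```python
from collections import Counter
from datetime import date

def popularity_by_month(mention_dates):
    """Count mentions of an entity per (year, month) bucket."""
    buckets = Counter()
    for d in mention_dates:
        buckets[(d.year, d.month)] += 1
    return dict(buckets)

# Toy mention timestamps for a single entity.
mentions = [date(2015, 1, 5), date(2015, 1, 20), date(2015, 7, 1)]
pop = popularity_by_month(mentions)
```

In practice such raw counts would typically be normalized by the total archive volume in each period, so that growth of the platform itself is not mistaken for growing popularity of the entity.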
       
  • Designing an ontology for managing the diets of hypertensive individuals
    • Authors: Julaine Clunis
      Abstract: This paper describes the development of an ontology which could act as a recommendation system for hypertensive individuals. The author has conceptualized and developed an ontology which describes recipes, nutrients in foods and the interactions between nutrients and prescribed drugs, disease and general health. The paper begins with a review of the literature on several ontology designs. The previous ontology models guide the development of classes, properties and restrictions built into the hypertensive diet ontology. The model is constructed following the Ontology 101 methodology. The ontology was validated using proto-personas to create competency questions which were used to test the ontology. The findings show that the ontology may be used to provide information with the goal of assisting individuals in making sense of complex effects of diet on health and outcomes. It is concluded that the ontology can be used to provide support to patients as they seek to manage chronic illnesses such as hypertension. The study has relevance for creators of knowledge organization systems and ontologies in the healthcare field.
      PubDate: 2018-10-22
      DOI: 10.1007/s00799-018-0253-y
       
  • From subtitles to substantial metadata: examining characteristics of named
           entities and their role in indexing
    • Authors: Anne-Stine Ruud Husevåg
      Abstract: This paper explores the possible role of named entities extracted from text in subtitles in automatic indexing of TV programs. This is done by analyzing entity types, name density and name frequencies in subtitles and metadata records from different genres of TV programs. The name density in metadata records is much higher than the name density in subtitles, and named entities with high frequencies in the subtitles are more likely to be mentioned in the metadata records. Further analysis of the metadata records indicates an increase in use of named entities in metadata in accordance with the frequency the entities have in the subtitles. The most substantial difference was between a frequency of one or two, where the named entities with a frequency of two in the subtitles were twice as likely to be present in the metadata records. Personal names, geographical names and names of organizations were the most prominent entity types in both the news subtitles and news metadata, while persons, creative works and locations were the most prominent in culture programs. It is not possible to extract all the named entities in the manually created metadata records by applying named entity recognition to the subtitles for the same programs, but it is possible to find a large subset of named entities for some categories in certain genres. The results reported in this paper show that subtitles are a good source for personal names for all the genres covered in our study, and for creative works in literature programs. In total, it was possible to find 38% of the named entities in metadata records for news programs and 32% for literature programs, while 21% of the named entities in metadata records for talk shows were also present in the subtitles for the programs.
      PubDate: 2018-10-16
      DOI: 10.1007/s00799-018-0252-z
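The coverage figures quoted above (e.g. 38% for news programs) amount to measuring what share of a record's named entities also occur in the program's subtitles. A minimal sketch of that measurement; the entity names below are invented toy data, and the real study of course obtains the subtitle entities via named entity recognition rather than a hand-written set:

```python
def metadata_coverage(metadata_entities, subtitle_entities):
    """Fraction of the metadata record's named entities that are
    also found among the entities extracted from the subtitles."""
    metadata_entities = set(metadata_entities)
    if not metadata_entities:
        return 0.0
    found = metadata_entities & set(subtitle_entities)
    return len(found) / len(metadata_entities)

cov = metadata_coverage(
    {"Oslo", "NRK", "Jo Nesbø", "Norway"},   # from a metadata record
    {"Oslo", "Jo Nesbø", "Bergen"},          # extracted from subtitles
)
```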
       
  • Image libraries and their scholarly use in the field of art and
           architectural history
    • Authors: Sander Münster; Christina Kamposiori; Kristina Friedrichs; Cindy Kröber
      Abstract: The use of image libraries in the field of art and architectural history has been the subject of numerous research studies over the years. However, since previous investigations have focused, primarily, either on user behavior or reviewed repositories, our aim is to bring together both approaches. Against this background, this paper identifies the main characteristics of research and information behavior of art and architectural history scholars and students in the UK and Germany and presents a structured overview of currently available scholarly image libraries. Finally, the implications for a user-centered design of information resources and, in particular, image libraries are provided.
      PubDate: 2018-07-07
      DOI: 10.1007/s00799-018-0250-1
       
  • Open information extraction as an intermediate semantic structure for
           Persian text summarization
    • Authors: Mahmoud Rahat; Alireza Talebpour
      Abstract: Semantic applications typically exploit structures such as dependency parse trees, phrase-chunking, semantic role labeling or open information extraction. In this paper, we introduce a novel application of Open IE as an intermediate layer for text summarization. Text summarization is an important method for providing relevant information in large digital libraries. Open IE refers to the process of extracting machine-understandable structural propositions from text. We use these propositions as a building block to shorten the sentence and generate a summary of the text. The proposed system offers a new form of summarization that is able to break the structure of the sentence and extract the most significant sub-sentential elements. Other advantages include the ability to identify and eliminate less important sections of the sentence (such as adverbs, adjectives, appositions or dependent clauses), or duplicate pieces of sentences, which in turn opens up space for entering more sentences in the summary to enhance its coverage and coherence. The proposed system is localized for the Persian language; however, it can be adapted to other languages. Experiments performed on a standard data set, “Pasokh”, with a standard comparison tool showed promising results for the proposed approach. We used summaries produced by the system in a real-world application in the virtual library of Shahid Beheshti University and received positive feedback from users.
      PubDate: 2018-06-28
      DOI: 10.1007/s00799-018-0244-z
       
 
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
Fax: +00 44 (0)131 4513327
 