International Journal on Digital Libraries
  [SJR: 0.375]   [H-I: 28]
   Hybrid journal (it can contain Open Access articles)
   ISSN (Print) 1432-1300 - ISSN (Online) 1432-5012
   Published by Springer-Verlag  [2353 journals]
  • Scholarly Ontology: modelling scholarly practices
    • Authors: Vayianos Pertsas; Panos Constantopoulos
      Pages: 173 - 190
      Abstract: In this paper we present the Scholarly Ontology (SO), an ontology for modelling scholarly practices, inspired by business process modelling and Cultural-Historical Activity Theory. The SO is based on empirical research and earlier models and is designed so as to incorporate related works through a modular structure. The SO is an elaboration of the domain-independent core part of the NeDiMAH Methods Ontology addressing the scholarly ecosystem of Digital Humanities. It thus provides a basis for developing domain-specific scholarly work ontologies springing from a common root. We define the basic concepts of the model and their semantic relations through four complementary perspectives on scholarly work: activity, procedure, resource and agency. As a use case we present a modelling example and argue for the purpose of the model by presenting indicative SPARQL and SQWRL queries that highlight the benefits of its serialization in RDFS. The SO includes an explicit treatment of intentionality and its interplay with functionality, captured by different parts of the model. We discuss the role of types as the semantic bridge between those two parts and explore several patterns that can be exploited in designing reusable access structures and conformance rules. Related taxonomies and ontologies and their possible reuse within the framework of SO are reviewed.
      PubDate: 2017-09-01
      DOI: 10.1007/s00799-016-0169-3
      Issue No: Vol. 18, No. 3 (2017)
  • The evolution of web archiving
    • Authors: Miguel Costa; Daniel Gomes; Mário J. Silva
      Pages: 191 - 205
      Abstract: Web archives preserve information published on the web or digitized from printed publications. Much of this information is unique and historically valuable. However, the lack of knowledge about the global status of web archiving initiatives hampers their improvement and collaboration. To overcome this problem, we conducted two surveys, in 2010 and 2014, which provide a comprehensive characterization of web archiving initiatives and their evolution. We identified several patterns and trends that highlight challenges and opportunities. We discuss these patterns and trends, which make it possible to define strategies, estimate resources and provide guidelines for research and development of better technology. Our results show that in recent years there was significant growth in the number of initiatives and of countries hosting them, in the volume of data and in the number of contents preserved. While this indicates that the web archiving community is dedicating a growing effort to preserving digital information, other results presented throughout the paper raise concerns, such as the small amount of archived data in comparison with the amount of data that is being published online.
      PubDate: 2017-09-01
      DOI: 10.1007/s00799-016-0171-9
      Issue No: Vol. 18, No. 3 (2017)
  • Inheriting library cards to Babel and Alexandria: contemporary metaphors
           for the digital library
    • Authors: Paul Gooding; Melissa Terras
      Pages: 207 - 222
      Abstract: Librarians have been consciously adopting metaphors to describe library concepts since the nineteenth century, helping us to structure our understanding of new technologies. As a profession, we have drawn extensively on these figurative frameworks to explore issues surrounding the digital library, yet very little has been written to date which interrogates how these metaphors have developed over the years. Previous studies have explored library metaphors, using either textual analysis or ethnographic methods to investigate their usage. However, this is to our knowledge the first study to use bibliographic data, corpus analysis, qualitative sentiment weighting and close reading to study particular metaphors in detail. It draws on a corpus of over 450 articles to study the use of the metaphors of the Library of Alexandria and Babel, concluding that both have been extremely useful as framing metaphors for the digital library. However, their longstanding use has seen them become stretched as metaphors, meaning that the field’s figurative framework now fails to represent the changing technologies which underpin contemporary digital libraries.
      PubDate: 2017-09-01
      DOI: 10.1007/s00799-016-0194-2
      Issue No: Vol. 18, No. 3 (2017)
  • Documenting archaeological science with CIDOC CRM
    • Authors: Franco Niccolucci
      Pages: 223 - 231
      Abstract: The paper proposes to use CIDOC CRM and its extensions CRMsci and CRMdig to document the scientific experiments involved in archaeological investigations. The nature of such experiments is analysed and ways to document their important aspects are provided using existing classes and properties from the CRM or from the above-mentioned schemas, together with newly defined ones, forming an extension of the CRM called CRMas.
      PubDate: 2017-09-01
      DOI: 10.1007/s00799-016-0199-x
      Issue No: Vol. 18, No. 3 (2017)
  • Tape music archives: from preservation to access
    • Authors: Carlo Fantozzi; Federica Bressan; Niccolò Pretto; Sergio Canazza
      Pages: 233 - 249
      Abstract: This article presents a methodology for the active preservation of, and the access to, magnetic tapes of audio archives. The methodology has been defined and implemented by a multidisciplinary team involving engineers as well as musicians, composers and archivists. The strong point of the methodology is the philological awareness that influenced the development of digital tools, which consider the critical questions in the historian and musicologist’s approach: the secondary information and the history of transmission of an audio document.
      PubDate: 2017-09-01
      DOI: 10.1007/s00799-017-0208-8
      Issue No: Vol. 18, No. 3 (2017)
  • On research data publishing
    • Authors: Leonardo Candela; Donatella Castelli; Paolo Manghi; Sarah Callaghan
      Pages: 73 - 75
      PubDate: 2017-06-01
      DOI: 10.1007/s00799-017-0213-y
      Issue No: Vol. 18, No. 2 (2017)
  • Automating data sharing through authoring tools
    • Authors: John R. Kitchin; Ana E. Van Gulick; Lisa D. Zilinski
      Pages: 93 - 98
      Abstract: In the current scientific publishing landscape, there is a need for an authoring workflow that easily integrates data and code into manuscripts and that enables the data and code to be published in reusable form. Automated embedding of data and code into published output will enable superior communication and data archiving. In this work, we demonstrate a proof of concept for a workflow, org-mode, which successfully provides this authoring capability and workflow integration. We illustrate this concept in a series of examples for potential uses of this workflow. First, we use data on citation counts to compute the h-index of an author, and show two code examples for calculating the h-index. The source for each example is automatically embedded in the PDF during the export of the document. We demonstrate how data can be embedded in image files, which themselves are embedded in the document. Finally, metadata about the embedded files can be automatically included in the exported PDF, and accessed by computer programs. In our customized export, we embedded metadata about the attached files in the PDF in an Info field. A computer program could parse this output to get a list of embedded files and carry out analyses on them. Authoring tools such as Emacs + org-mode can greatly facilitate the integration of data and code into technical writing. These tools can also automate the embedding of data into document formats intended for consumption.
      PubDate: 2017-06-01
      DOI: 10.1007/s00799-016-0173-7
      Issue No: Vol. 18, No. 2 (2017)
  • Experiences in integrated data and research object publishing using GigaDB
    • Authors: Scott C Edmunds; Peter Li; Christopher I Hunter; Si Zhe Xiao; Robert L Davidson; Nicole Nogoy; Laurie Goodman
      Pages: 99 - 111
      Abstract: In the era of computation and data-driven research, traditional methods of disseminating research are no longer fit for purpose. New approaches for disseminating data, methods and results are required to maximize knowledge discovery. The “long tail” of small, unstructured datasets is well catered for by a number of general-purpose repositories, but there has been less support for “big data”. Outlined here are our experiences in attempting to tackle the gaps in publishing large-scale, computationally intensive research. GigaScience is an open-access, open-data journal aiming to revolutionize large-scale biological data dissemination, organization and re-use. Through use of the data handling infrastructure of the genomics centre BGI, GigaScience links standard manuscript publication with an integrated database (GigaDB) that hosts all associated data, and provides additional data analysis tools and computing resources. Furthermore, the supporting workflows and methods are also integrated to make published articles more transparent and open. GigaDB has released many new and previously unpublished datasets and data types, including urgently needed data to tackle infectious disease outbreaks, cancer and the growing food crisis. Other “executable” research objects, such as workflows, virtual machines and software from several GigaScience articles have been archived and shared in reproducible, transparent and usable formats. With data citation producing evidence of, and credit for, its use in the wider research community, GigaScience demonstrates a move towards more executable publications. Here data analyses can be reproduced and built upon by users without coding backgrounds or heavy computational infrastructure in a more democratized manner.
      PubDate: 2017-06-01
      DOI: 10.1007/s00799-016-0174-6
      Issue No: Vol. 18, No. 2 (2017)
  • Advancing research data publishing practices for the social sciences: from
           archive activity to empowering researchers
    • Authors: Veerle Van den Eynden; Louise Corti
      Pages: 113 - 121
      Abstract: Sharing and publishing social science research data have a long history in the UK, through long-standing agreements with government agencies for sharing survey data and the data policy, infrastructure, and data services supported by the Economic and Social Research Council. The UK Data Service and its predecessors developed data management, documentation, and publishing procedures and protocols that stand today as robust templates for data publishing. As the ESRC research data policy requires grant holders to submit their research data to the UK Data Service after a grant ends, setting standards and promoting them has been essential in raising the quality of the resulting research data being published. In the past, received data were all processed, documented, and published for reuse in-house. Recent investments have focused on guiding and training researchers in good data management practices and skills for creating shareable data, as well as a self-publishing repository system, ReShare. ReShare also receives data sets described in published data papers and achieves scientific quality assurance through peer review of submitted data sets before publication. Social science data are reused for research, to inform policy, in teaching and for methods learning. Over a 10-year period, responsive developments in system workflows, access control options, persistent identifiers, templates, and checks, together with targeted guidance for researchers, have helped raise the standard of self-publishing social science data. Lessons learned and developments in shifting the publishing of social science data from an archivist responsibility to a researcher process are showcased, as inspiration for institutions setting up a data repository.
      PubDate: 2017-06-01
      DOI: 10.1007/s00799-016-0177-3
      Issue No: Vol. 18, No. 2 (2017)
  • Semantic representation and enrichment of information retrieval
           experimental data
    • Authors: Gianmaria Silvello; Georgeta Bordea; Nicola Ferro; Paul Buitelaar; Toine Bogers
      Pages: 145 - 172
      Abstract: Experimental evaluation carried out in international large-scale campaigns is a fundamental pillar of the scientific and technological advancement of information retrieval (IR) systems. Such evaluation activities produce a large quantity of scientific and experimental data, which are the foundation for all the subsequent scientific production and development of new systems. In this work, we discuss how to semantically annotate and interlink these data, with the goal of enhancing their interpretation, sharing, and reuse. We discuss the underlying evaluation workflow and propose a Resource Description Framework (RDF) model for those workflow parts. We use expertise retrieval as a case study to demonstrate the benefits of our semantic representation approach. We employ this model as a means for exposing experimental data as linked open data (LOD) on the Web and as a basis for enriching and automatically connecting these data with expertise topics and expert profiles. In this context, a topic-centric approach for expert search is proposed, addressing the extraction of expertise topics, their semantic grounding with the LOD cloud, and their connection to IR experimental data. Several methods for expert profiling and expert finding are analysed and evaluated. Our results show that it is possible to construct expert profiles starting from automatically extracted expertise topics and that topic-centric approaches outperform state-of-the-art language modelling approaches for expert finding.
      PubDate: 2017-06-01
      DOI: 10.1007/s00799-016-0172-8
      Issue No: Vol. 18, No. 2 (2017)
  • Documenting a song culture: the Dutch Song Database as a resource for
           musicological research
    • Authors: Peter van Kranenburg; Martine de Bruin; Anja Volk
      Abstract: The Dutch Song Database is a digital repository documenting Dutch song culture in past and present. It contains more than 173 thousand references to song occurrences in the Dutch and Flemish language, from the Middle Ages up to the present, as well as over 18 thousand descriptions of song sources, such as song books, manuscripts and field recordings, all adhering to high quality standards. In this paper, we present the history and functionality of the database, and we demonstrate how the Dutch Song Database facilitates and enables musicological research by presenting its contents and search functionalities in a number of exemplary cases. We discuss difficulties and impediments that were encountered during the development of the database, and we sketch a future prospect for further development in the European context.
      PubDate: 2017-09-13
      DOI: 10.1007/s00799-017-0228-4
  • On providing semantic alignment and unified access to music library
    • Authors: David M. Weigl; David Lewis; Tim Crawford; Ian Knopke; Kevin R. Page
      Abstract: A variety of digital data sources (including institutional and formal digital libraries, crowd-sourced community resources, and data feeds provided by media organisations such as the BBC) expose information of musicological interest, describing works, composers, performers, and wider historical and cultural contexts. Aggregated access across such datasets is desirable as these sources provide complementary information on shared real-world entities. Where datasets do not share identifiers, an alignment process is required, but this process is fraught with ambiguity and difficult to automate, whereas manual alignment may be time-consuming and error-prone. We address this problem through the application of a Linked Data model and framework to assist domain experts in this process. Candidate alignment suggestions are generated automatically based on textual and on contextual similarity. The latter is determined according to user-configurable weighted graph traversals. Match decisions confirming or disputing the candidate suggestions are obtained in conjunction with user insight and expertise. These decisions are integrated into the knowledge base, enabling further iterative alignment, and simplifying the creation of unified viewing interfaces. Provenance of the musicologist’s judgement is captured and published, supporting scholarly discourse and counter-proposals. We present our implementation and evaluation of this framework, conducting a user study with eight musicologists. We further demonstrate the value of our approach through a case study providing aligned access to catalogue metadata and digitised score images from the British Library and other sources, and broadcast data from the BBC Radio 3 Early Music Show.
      PubDate: 2017-08-28
      DOI: 10.1007/s00799-017-0223-9
  • Investigating exploratory search activities based on the stratagem level
           in digital libraries
    • Authors: Zeljko Carevic; Maria Lusky; Wilko van Hoek; Philipp Mayr
      Abstract: In this paper, we present the results of a user study on exploratory search activities in a social science digital library. We conducted a user study with 32 participants with a social sciences background (16 postdoctoral researchers and 16 students) who were asked to solve a task that involved searching for work related to a given topic. The exploratory search task was performed in a 10-min time slot. The use of certain search activities is measured and compared to gaze data recorded with an eye tracking device. We use a novel tree graph representation to visualise the users’ search patterns and introduce a way to combine multiple search session trees. The tree graph representation is capable of creating one single tree for multiple users and identifying common search patterns. In addition, the information behaviour of students and postdoctoral researchers is compared. The results show that search activities on the stratagem level are frequently utilised by both user groups. The most heavily used search activities were keyword search, followed by browsing through references and citations, and author searching. The eye tracking results showed an intense examination of document metadata, especially on the level of citations and references. When comparing the group of students and postdoctoral researchers, we found significant differences regarding gaze data on the area of the journal name of the seed document. In general, we found a tendency of the postdoctoral researchers to examine the metadata records more intensively with regard to dwell time and the number of fixations. By creating combined session trees and deriving subtrees from those, we were able to identify common patterns like economic (explorative) and exhaustive (navigational) behaviour. Our results show that participants utilised multiple search strategies starting from the seed document, which means that they examined different paths to find related publications.
      PubDate: 2017-08-14
      DOI: 10.1007/s00799-017-0226-6
  • Extracting discourse elements and annotating scientific documents using
           the SciAnnotDoc model: a use case in gender documents
    • Authors: Hélène de Ribaupierre; Gilles Falquet
      Abstract: When scientists are searching for information, they generally have a precise objective in mind. Instead of looking for documents “about a topic T”, they try to answer specific questions such as finding the definition of a concept, finding results for a particular problem, checking whether an idea has already been tested, or comparing the scientific conclusions of two articles. Answering these precise or complex queries on a corpus of scientific documents requires precise modelling of the full content of the documents. In particular, each document element must be characterised by its discourse type (hypothesis, definition, result, method, etc.). In this paper, we present a scientific document model (SciAnnotDoc ontology), developed from an empirical study conducted with scientists, that models the discourse types. We developed an automated process that analyses documents, effectively identifying the discourse types of each element. Using syntactic rules (patterns), we evaluated the process output in terms of precision and recall using a previously annotated corpus in Gender Studies. We chose to annotate documents in Humanities, as these documents are well known to be less formalised than those in “hard science”. The process output has been used to create a SciAnnotDoc representation of the corpus on top of which we built a faceted search interface. Experiments with users show that searches using this interface clearly outperform standard keyword searches for precise or complex queries.
      PubDate: 2017-08-09
      DOI: 10.1007/s00799-017-0227-5
  • The context of multiple in-text references and their signification
    • Authors: Marc Bertin; Iana Atanassova
      Abstract: In this paper, we consider sentences that contain multiple in-text references (MIR) and their position in the rhetorical structure of articles. We carry out the analysis of MIR in a large-scale dataset of about 80,000 research articles published by the Public Library of Science in 7 journals. We analyze two major characteristics of MIR: their positions in the IMRaD structure of articles and the number of in-text references that make up a MIR in the different journals. We show that MIR are rather frequent in all sections of the rhetorical structure. In the Introduction section, sentences containing MIR account for more than half of the sentences with references. We examine the syntactic patterns that are most used in the contexts of both multiple and single in-text references and show that they are composed, for the most part, of noun groups. We point out the specificity of the Methods section in this respect.
      PubDate: 2017-07-21
      DOI: 10.1007/s00799-017-0225-7
  • Retrieval by recommendation: using LOD technologies to improve digital
           library search
    • Authors: Lisa Wenige; Johannes Ruhland
      Abstract: This paper investigates how Linked Open Data (LOD) can be used for recommendations and information retrieval within digital libraries. While numerous studies on both research paper recommender systems and Linked Data-enabled recommender systems have been conducted, no previous attempt has been undertaken to explore opportunities of LOD in the context of search and discovery interfaces. We identify central advantages of Linked Open Data with regard to scientific search and propose two novel recommendation strategies, namely flexible similarity detection and constraint-based recommendations. These strategies take advantage of key characteristics of data that adheres to LOD principles. The viability of Linked Data recommendations was extensively evaluated within the scope of a web-based user experiment in the domain of economics. Findings indicate that the proposed methods are well suited to enhance established search functionalities and thus offer novel ways of resource access. In addition, RDF triples from LOD repositories can complement local bibliographic records that are sparse or of poor quality.
      PubDate: 2017-07-19
      DOI: 10.1007/s00799-017-0224-8
  • Discovering the structure and impact of the digital library evaluation
    • Authors: Leonidas Papachristopoulos; Giannis Tsakonas; Moses Boudourides; Michalis Sfakakis; Nikos Kleidis; Sergios Lenis; Christos Papatheodorou
      Abstract: The multidimensional nature of the digital library evaluation domain poses several challenges to the research communities that intend to assess criteria, methods, products and tools, and also to put them into practice. The amount of scientific production published in the domain hinders and disorients interested researchers. These researchers need guidance to exploit effectively the considerable amount of data and the diversity of methods, as well as to identify new research goals and develop their plans for future studies. This paper proposes a methodological pathway to investigate the core topics that structure the digital library evaluation domain and their impact. Beyond exploring these topical entities, this study also investigates the researchers who contribute substantially to key topics, their communities and their relationships. The proposed methodology exploits topic modeling and network analysis in combination with citation and altmetrics analysis on a corpus consisting of the digital library evaluation papers presented at the JCDL, ECDL/TPDL and ICADL conferences in the period 2001–2013.
      PubDate: 2017-06-24
      DOI: 10.1007/s00799-017-0222-x
  • Identifying reference spans: topic modeling and word embeddings help IR
    • Authors: Luis Moraes; Shahryar Baki; Rakesh Verma; Daniel Lee
      Abstract: The CL-SciSumm 2016 shared task introduced an interesting problem: given a document D and a piece of text that cites D, how do we identify the text spans of D being referenced by the piece of text? The shared task provided the first annotated dataset for studying this problem. We present an analysis of our continued work in improving our system’s performance on this task. We demonstrate how topic models and word embeddings can be used to surpass the previously best performing system.
      PubDate: 2017-06-20
      DOI: 10.1007/s00799-017-0220-z
  • Insights from CL-SciSumm 2016: the faceted scientific document
           summarization Shared Task
    • Authors: Kokil Jaidka; Muthu Kumar Chandrasekaran; Sajal Rustagi; Min-Yen Kan
      Abstract: We describe the participation and the official results of the 2nd Computational Linguistics Scientific Summarization Shared Task (CL-SciSumm), held as a part of the BIRNDL workshop at the Joint Conference on Digital Libraries 2016 in Newark, New Jersey. CL-SciSumm is the first medium-scale Shared Task on scientific document summarization in the computational linguistics (CL) domain. Participants were provided a training corpus of 30 topics, each comprising a reference paper (RP) and 10 or more citing papers, all of which cite the RP. For each citation, the text spans (i.e., citances) that pertain to the RP have been identified. Participants solved three sub-tasks in automatic research paper summarization using this text corpus. Fifteen teams from six countries registered for the Shared Task, of which ten teams ultimately submitted and presented their results. The annotated corpus comprised 30 target papers, currently the largest available corpus of its kind. The corpus is available for free download and use at
      PubDate: 2017-06-14
      DOI: 10.1007/s00799-017-0221-y
  • Computational linguistics literature and citations oriented citation
           linkage, classification and summarization
    • Authors: Lei Li; Liyuan Mao; Yazhao Zhang; Junqi Chi; Taiwen Huang; Xiaoyue Cong; Heng Peng
      Abstract: Scientific literature is currently the most important resource for scholars, and its citations have provided researchers with a powerful latent way to analyze scientific trends, influences and relationships of works and authors. This paper focuses on automatic citation analysis and summarization for the scientific literature of computational linguistics, which are also the shared tasks in the 2016 workshop of the 2nd Computational Linguistics Scientific Document Summarization at BIRNDL 2016 (The Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries). Each citation linkage between a citation and the spans of text in the reference paper is recognized according to their content similarities via various computational methods. Then the cited text span is classified into five pre-defined facets, i.e., Hypothesis, Implication, Aim, Results and Method, based on various features of lexicons and rules via Support Vector Machine and Voting Method. Finally, a summary of the reference paper of at most 250 words is generated from the cited text spans. The hLDA (hierarchical Latent Dirichlet Allocation) topic model is adopted for content modeling, which provides knowledge about sentence clustering (subtopic) and word distributions (abstractiveness) for summarization. We combine hLDA knowledge with several other classical features using different weights and proportions to evaluate the sentences in the reference paper. Our systems were ranked first and second according to the evaluation results published by BIRNDL 2016, which verifies the effectiveness of our methods.
      PubDate: 2017-06-13
      DOI: 10.1007/s00799-017-0219-5
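
The abstract of "Automating data sharing through authoring tools" above mentions computing an author's h-index from citation counts. As a minimal sketch of that calculation (not the authors' code, which is not reproduced in this listing), the following Python function applies the standard definition: the largest h such that h papers have at least h citations each. The function name `h_index` and the sample counts are illustrative assumptions.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Return the largest h such that at least h papers have >= h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The paper at this rank still has enough citations to support h = rank.
            h = rank
        else:
            # Counts only decrease from here, so no larger h is possible.
            break
    return h


# Hypothetical citation counts for one author (illustrative values only).
example_counts = [10, 8, 5, 4, 3]
print(h_index(example_counts))
```

Sorting first keeps the logic easy to check by hand: with the sample counts, the fourth-ranked paper has 4 citations and the fifth has only 3, so the loop stops at h = 4.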