International Journal on Digital Libraries
Journal Prestige (SJR): 0.441
Citation Impact (citeScore): 2
 
  Hybrid journal (can contain Open Access articles)
ISSN (Print) 1432-1300 - ISSN (Online) 1432-5012
Published by Springer-Verlag
  • Time-focused analysis of connectivity and popularity of historical persons
           in Wikipedia
    • Abstract: Wikipedia contains large amounts of content related to history. It is used extensively for many knowledge-intensive tasks within computer science, digital humanities and related fields. In this paper, we examine Wikipedia articles on historical people to study their link-related temporal features. Our study sheds new light on the characteristics of information about historical people recorded in the English Wikipedia and quantifies user interest in such data. We propose a novel style of analysis in which we use signals derived from the hyperlink structure of Wikipedia as well as from article view logs, and we overlay them on the temporal dimension to understand relations between time periods, link structure and article popularity. In the latter part of the paper, we also demonstrate several ways of estimating person importance based on the temporal aspects of the link structure, as well as a method for ranking cities using the computed importance scores of their related persons.
      PubDate: 2019-12-01
       
  • Comparing published scientific journal articles to their pre-print
           versions
    • Abstract: Academic publishers claim that they add value to scholarly communications by coordinating reviews and by contributing to and enhancing text during publication. These contributions come at a considerable cost: US academic libraries paid $1.7 billion for serial subscriptions in 2008 alone. Library budgets, in contrast, are flat and not able to keep pace with serial price inflation. We have investigated the publishers’ value proposition by conducting a comparative study of pre-print papers from two distinct science, technology, and medicine corpora and their final published counterparts. This comparison had two working assumptions: (1) if the publishers’ argument is valid, the text of a pre-print paper should vary measurably from its corresponding final published version, and (2) by applying standard similarity measures, we should be able to detect and quantify such differences. Our analysis revealed that the text contents of the scientific papers generally changed very little from their pre-print to final published versions. These findings contribute empirical indicators to discussions of the added value of commercial publishers and therefore should influence libraries’ economic decisions regarding access to scholarly publications.
      PubDate: 2019-12-01
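
The abstract above refers to "standard similarity measures" without naming them in this listing. As a rough illustration only, the sketch below compares two short texts with two common choices that are assumptions here, not necessarily the measures used in the paper: an order-sensitive sequence ratio from Python's `difflib` and an order-insensitive Jaccard overlap of word sets. The function names and sample sentences are hypothetical.

```python
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def sequence_similarity(a: str, b: str) -> float:
    """Order-sensitive similarity in [0, 1] computed over word sequences."""
    return SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


def jaccard_similarity(a: str, b: str) -> float:
    """Order-insensitive overlap of the two vocabularies, in [0, 1]."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical sample texts standing in for a pre-print and its published version.
preprint = "we measure how much the text changes during publication"
published = "we measure how much the text changed during publication"

if __name__ == "__main__":
    print(sequence_similarity(preprint, published))
    print(jaccard_similarity(preprint, published))
```

Values close to 1 on both measures would indicate that the two versions differ very little; a real study would need full-text extraction and normalization steps that this sketch leaves out.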
       
  • Capisco: low-cost concept-based access to digital libraries
    • Abstract: In this article, we present the conceptual design and report on the implementation of Capisco, a low-cost approach to concept-based access to digital libraries. Capisco avoids the need for complete semantic document markup using ontologies by leveraging an automatically generated Concept-in-Context (CiC) network. The network is seeded by a priori analysis of Wikipedia texts and identification of semantic metadata. Our Capisco system disambiguates the semantics of terms in the documents by their semantics and context and identifies the relevant CiC concepts. Supplementary to this, the disambiguation of search queries is done interactively, to fully utilize the domain knowledge of the scholar. For established digital library systems, completely replacing, or even making significant changes to, the document retrieval mechanism (document analysis, indexing strategy, query processing, and query interface) would require major technological effort and would most likely be disruptive. In addition to presenting Capisco, we describe ways to harness the results of our semantic analysis and disambiguation while retaining the existing keyword-based search and lexicographic index. We engineer this so that the output of semantic analysis (performed off-line) is suitable for import directly into existing digital library metadata and index structures, and can thus be incorporated without the need for architecture modifications.
      PubDate: 2019-12-01
       
  • Anatomy of scholarly information behavior patterns in the wake of academic
           social media platforms
    • Abstract: As more scholarly content is born digital or converted to a digital format, digital libraries are becoming increasingly vital to researchers seeking to leverage scholarly big data for scientific discovery. Although scholarly products are available in abundance, especially in environments created by the advent of social networking services, little is known about international scholarly information needs, information-seeking behavior, or information use. The purpose of this paper is to address these gaps via an in-depth analysis of the information needs and information-seeking behavior of researchers, both students and faculty, at two universities, one in the USA and the other in Qatar. Based on this analysis, the study identifies and describes new behavior patterns on the part of researchers as they engage in the information-seeking process. The analysis reveals that the use of academic social networks has notable effects on various scholarly activities. Further, this study identifies differences between students and faculty members in regard to their use of academic social networks, and it identifies differences between researchers according to discipline. Although the researchers who participated in the present study represent a range of disciplinary and cultural backgrounds, the study reports a number of similarities in terms of the researchers’ scholarly activities.
      PubDate: 2019-12-01
       
  • Automated identification of media bias in news articles: an
           interdisciplinary literature review
    • Abstract: Media bias, i.e., slanted news coverage, can strongly impact the public perception of the reported topics. In the social sciences, research over the past decades has developed comprehensive models to describe media bias and effective, yet often manual and thus cumbersome, methods for analysis. In contrast, in computer science, fast, automated, and scalable methods are available, but few approaches systematically analyze media bias. The models used to analyze media bias in computer science tend to be simpler than the models established in the social sciences, and do not necessarily address the most pressing substantive questions, despite technically superior approaches. Computer science research on media bias thus stands to profit from a closer integration of models for the study of media bias developed in the social sciences with automated methods from computer science. This article first establishes a shared conceptual understanding by mapping the state of the art from the social sciences to a framework that can be targeted by approaches from computer science. Next, we investigate different forms of media bias and review how each form is analyzed in the social sciences. For each form, we then discuss methods from computer science suitable to (semi-)automate the corresponding analysis. Our review suggests that suitable, automated methods from computer science, primarily in the realm of natural language processing, are already available for each of the discussed forms of media bias, opening multiple directions for promising further research in computer science in this area.
      PubDate: 2019-12-01
       
  • A Wikidata-based tool for building and visualising narratives
    • Abstract: In this paper we present a semi-automatic tool for constructing and visualising narratives, intended as networks of events related to each other by semantic relations. The tool conforms to an ontology for narratives that we developed. It retrieves and assigns internationalised resource identifiers to the instances of the classes of the ontology using Wikidata as an external knowledge base, and also facilitates the construction and contextualisation of events and their linking to form the narratives. The knowledge collected by the tool is automatically saved as a Web Ontology Language (OWL) graph. The tool also allows the visualisation of the knowledge included in the graph in simple formats such as tables, network graphs and timelines. We have carried out an initial qualitative evaluation of the tool. As a case study, a historian from the University of Pisa used the tool to build the narrative of Dante Alighieri’s life. The evaluation addressed the effectiveness of the tool and the satisfaction of the users’ requirements.
      PubDate: 2019-12-01
       
  • Introduction to the focused issue on the 2016 ACM/IEEE-CS Joint Conference
           on Digital Libraries JCDL 2016
    • PubDate: 2019-11-12
       
  • Recent applications of Knowledge Organization Systems: introduction to a
           special issue
    • PubDate: 2019-09-01
       
  • Knowledge Organization Systems (KOS) in the Semantic Web: a
           multi-dimensional review
    • Abstract: Since the Simple Knowledge Organization System (SKOS) specification and its SKOS eXtension for Labels (SKOS-XL) became formal W3C recommendations in 2009, a significant number of conventional Knowledge Organization Systems (KOS) (including thesauri, classification schemes, name authorities, and lists of codes and terms, produced before the arrival of the ontology-wave) have made their journeys to join the Semantic Web mainstream. This paper uses “LOD KOS” as an umbrella term to refer to all of the value vocabularies and lightweight ontologies within the Semantic Web framework. The paper provides an overview of what the LOD KOS movement has brought to various communities and users. These are not limited to the colonies of the value vocabulary constructors and providers, nor the catalogers and indexers who have a long history of applying the vocabularies to their products. The LOD dataset producers and LOD service providers, the information architects and interface designers, and researchers in sciences and humanities, are also direct beneficiaries of LOD KOS. The paper examines a set of the collected cases (experimental or in real applications) and aims to find the usages of LOD KOS in order to share the practices and ideas among communities and users. Through the viewpoints of a number of different user groups, the functions of LOD KOS are examined from multiple dimensions. This paper focuses on the LOD dataset producers, vocabulary producers, and researchers (as end-users of KOS).
      PubDate: 2019-09-01
       
  • Designing an ontology for managing the diets of hypertensive individuals
    • Abstract: This paper describes the development of an ontology which could act as a recommendation system for hypertensive individuals. The author has conceptualized and developed an ontology which describes recipes, nutrients in foods and the interactions between nutrients and prescribed drugs, disease and general health. The paper begins with a review of the literature on several ontology designs. The previous ontology models guide the development of classes, properties and restrictions built into the hypertensive diet ontology. The model is constructed following the Ontology 101 methodology. The ontology was validated using proto-personas to create competency questions which were used to test the ontology. The findings show that the ontology may be used to provide information with the goal of assisting individuals in making sense of complex effects of diet on health and outcomes. It is concluded that the ontology can be used to provide support to patients as they seek to manage chronic illnesses such as hypertension. The study has relevance for creators of knowledge organization systems and ontologies in the healthcare field.
      PubDate: 2019-09-01
       
  • An empirically validated, onomasiologically structured, and linguistically
           motivated online terminology
    • Abstract: Terminological resources play a central role in the organization and retrieval of scientific texts. Both simple keyword lists and advanced modelings of relationships between terminological concepts can make a most valuable contribution to the analysis, classification, and finding of appropriate digital documents, either on the web or within local repositories. This seems especially true for long-established scientific fields with elusive theoretical and historical branches, where the use of terminology within documents from different origins is often far from being consistent. In this paper, we report on the progress of a linguistically motivated project on the onomasiological re-modeling of the terminological resources for the grammatical information system grammis. We present the design principles and the results of their application. In particular, we focus on new features for the authoring backend and discuss how these innovations help to evaluate existing, loosely structured terminological content, as well as to efficiently deal with automatic term extraction. Furthermore, we introduce a transformation to a future SKOS representation. We conclude with a positioning of our resources with regard to the Knowledge Organization discourse and discuss how a highly complex information environment like grammis benefits from the re-designed terminological KOS.
      PubDate: 2019-09-01
       
  • From subtitles to substantial metadata: examining characteristics of named
           entities and their role in indexing
    • Abstract: This paper explores the possible role of named entities extracted from text in subtitles in automatic indexing of TV programs. This is done by analyzing entity types, name density and name frequencies in subtitles and metadata records from different genres of TV programs. The name density in metadata records is much higher than the name density in subtitles, and named entities with high frequencies in the subtitles are more likely to be mentioned in the metadata records. Further analysis of the metadata records indicates an increase in use of named entities in metadata in accordance with the frequency the entities have in the subtitles. The most substantial difference was between a frequency of one or two, where the named entities with a frequency of two in the subtitles were twice as likely to be present in the metadata records. Personal names, geographical names and names of organizations were the most prominent entity types in both the news subtitles and news metadata, while persons, creative works and locations are the most prominent in culture programs. It is not possible to extract all the named entities in the manually created metadata records by applying named entity recognition to the subtitles for the same programs, but it is possible to find a large subset of named entities for some categories in certain genres. The results reported in this paper show that subtitles are a good source for personal names for all the genres covered in our study, and for creative works in literature programs. In total, it was possible to find 38% of the named entities in metadata records for news programs, 32% for literature programs, while 21% of the named entities in metadata records for talk shows were also present in the subtitles for the programs.
      PubDate: 2019-09-01
       
  • Analyzing the network structure and gender differences among the members
           of the Networked Knowledge Organization Systems (NKOS) community
    • Abstract: In this paper, we analyze a major part of the research output of the Networked Knowledge Organization Systems (NKOS) community in the period 2000–2016 from a network analytical perspective. We focus on the papers presented at the European and US NKOS workshops and, in addition, four special issues on NKOS in the last 16 years. For this purpose, we have generated an open dataset, the “NKOS bibliography”, which covers the bibliographic information of the research output. We analyze the co-authorship network of this community, which comprises 123 papers with a total of 256 distinct authors. We use standard network analytic measures such as degree, betweenness and closeness centrality to describe the co-authorship network of the NKOS dataset. First, we investigate global properties of the network over time. Second, we analyze the centrality of the authors in the NKOS network. Lastly, we investigate gender differences in collaboration behavior in this community. Our results show that, apart from differences in centrality measures of the scholars, they have a higher tendency to collaborate with those in the same institution or in geographic proximity. We also find that homophily is higher among women in this community. Apart from small differences in closeness and clustering among men and women, we do not find any significant dissimilarities with respect to other centralities.
      PubDate: 2019-09-01
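
To make the network measures named in the abstract above more concrete, here is a minimal sketch of how a co-authorship graph can be built from author lists and how degree and closeness centrality can be computed on it. It is an illustration under stated assumptions, not the authors' code: the author names and papers are invented, the function names are hypothetical, closeness is computed only over reachable nodes, and betweenness centrality is omitted for brevity.

```python
from collections import deque
from itertools import combinations


def build_coauthor_graph(papers):
    """Link every pair of authors who share at least one paper."""
    graph = {}
    for authors in papers:
        for author in authors:
            graph.setdefault(author, set())
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Number of co-authors divided by the maximum possible (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def closeness_centrality(graph, source):
    """Reachable nodes divided by the sum of shortest-path distances to them."""
    distances = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest paths in an unweighted graph
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in distances:
                distances[nbr] = distances[node] + 1
                queue.append(nbr)
    total = sum(distances.values())
    return (len(distances) - 1) / total if total else 0.0


# Invented author lists, one list per paper.
papers = [["Ann", "Bo"], ["Bo", "Cy"], ["Bo", "Dee", "Cy"]]
graph = build_coauthor_graph(papers)

if __name__ == "__main__":
    print(degree_centrality(graph))
    print({author: closeness_centrality(graph, author) for author in graph})
```

In this toy network the author who appears on every paper is both the best connected and the closest to everyone else; on a real bibliography a graph library would be the more practical choice.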
       
  • A way to express the reliability of archaeological data: data traceability
           at the Laboratoire Archéologie et Territoires (Tours, France)
    • Abstract: In order to respect the good practices in archaeology disseminated by the MASA Consortium (Archaeologists and Archaeological Sites Memories), the Laboratoire Archéologie et Territoires (Tours, France) wished to evaluate the progress of the ArSol database (Soil Archives), its field data management database, with regard to the FAIR principles or the Five-Star Linked Open Data. The work undertaken to achieve compliance with these precepts has shown that it is also necessary to ensure the relevance and reliability of the published data. For data to be reusable, it seems essential to ensure traceability. Various tools set up for the ArSol database make it possible to ensure this traceability from the field recording, through its exploitation, to the publication of the results of the excavation. The traceability of data, to ensure their quality in terms of reliability and relevance, is an aspect that is fortunately already taken into account in the ArSol database and that satisfactorily complements the requirements of the FAIR principles.
      PubDate: 2019-08-16
       
  • Identification of tweets that mention books
    • Abstract: We address the task of identifying tweets that mention books from amongst tweets that contain the same strings as book titles. Assuming the existence of a comprehensive list of book titles, this task can be defined as text classification targeting tweets that contain the same string as book titles. In carrying out the task, we need to exclude two types of tweets. The first is automatically posted, spam-like tweets that promote book sales or post recommendations (bot tweets). This type of tweet is excluded because we are developing an online surrogate to book exposure embedded within human communication on social media, and the results of the present task are to be used in this system. The second is tweets that contain the same string as book titles but are not about books (noise tweets). We propose a two-step, machine learning-based pipeline consisting of bot filtering followed by noise reduction. Evaluation of experiments showed that our proposed method achieved an F1-score of 0.76, which is comparable to the best performance reported in similar tasks and sufficient as a first step for use in practical applications. We also analysed the detailed performance and errors, which suggested that the proposed method maintained an appropriate balance between precision and recall, and can be further improved by increasing the data size and taking into account word senses.
      PubDate: 2019-08-05
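
The abstract above reports an F1-score and a balance between precision and recall. As a brief reminder of how these quantities relate, the sketch below computes all three from confusion counts. The function name and the counts are hypothetical and chosen only for illustration; they are not data from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented counts: 8 book tweets found, 2 false alarms, 4 book tweets missed.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)

if __name__ == "__main__":
    print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is a common single-number summary for classifiers of this kind.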
       
  • Assessing the quality of answers autonomously in community
           question–answering
    • Abstract: Community question–answering (CQA) has become a popular method of online information seeking. Within these services, peers ask questions and create answers to those questions. For some time, content repositories created through CQA sites have widely supported general-purpose tasks; however, they can also be used as online digital libraries that satisfy specific needs related to education. Horizontal CQA services, such as Yahoo! Answers, and vertical CQA services, such as Brainly, aim to help students improve their learning process via Q&A exchanges. In addition, Stack Overflow, another vertical CQA, serves a similar purpose but specifically focuses on topics relevant to programmers. Receiving high-quality answer(s) to a posed CQA query is a critical factor to both user satisfaction and supported learning in these services. This process can be impeded when experts do not answer questions and/or askers do not have the knowledge and skills needed to evaluate the quality of the answers they receive. Such circumstances may cause learners to construct a faulty knowledge base by applying inaccurate information acquired from online sources. Though site moderators could alleviate this problem by surveying answer quality, their subjective assessments may cause evaluations to be inconsistent. Another potential solution lies in human assessors, though they may also be insufficient due to the large amount of content available on a CQA site. The following study addresses these issues by proposing a framework for automatically assessing answer quality. We accomplish this by integrating different groups of features (personal, community-based, textual, and contextual) to build a classification model and determine what constitutes answer quality. We collected more than 10 million educational answers posted by more than 3 million users on Brainly and 7.7 million answers on Stack Overflow to test this evaluation framework. The experiments conducted on these data sets show that the model using random forest achieves high accuracy in identifying high-quality answers. Findings also indicate that personal and community-based features have more prediction power in assessing answer quality. Additionally, other key metrics such as F1-score and area under ROC curve achieve high values with our approach. The work reported here can be useful in many other contexts that strive to provide automatic quality assessment in a digital repository.
      PubDate: 2019-08-05
       
  • Heritage Science and Cultural Heritage: standards and tools for
           establishing cross-domain data interoperability
    • Abstract: This paper describes a system for documenting scientific data produced in Heritage Sciences. The system is built around a general meta-model, flexible enough to provide descriptions, in a formal language, of the datasets produced by scientific research. Resulting metadata can be re-encoded and published in multiple formats. The underlying metadata schema is inspired by CIDOC CRM principles for data modelling and maintains full compatibility with the CIDOC CRM ontology to capture provenance and foster interoperability with Cultural Heritage information. The use of a wide set of thesauri and controlled vocabularies guarantees internal coherence at data and metadata level. Applicability tests are underway at different institutions and a set of user interfaces has been designed to simplify and speed up the process of data gathering and metadata definition.
      PubDate: 2019-08-03
       
  • Content-based video retrieval in historical collections of the German
           Broadcasting Archive
    • Abstract: The German Broadcasting Archive maintains the cultural heritage of radio and television broadcasts of the former German Democratic Republic (GDR). The uniqueness and importance of the video material fosters a large scientific interest in the video content. In this paper, we present a system for automatic video content analysis and retrieval to facilitate search in historical collections of GDR television recordings. It relies on a distributed, service-oriented architecture and includes video analysis algorithms for shot boundary detection, concept classification, person recognition, text recognition and similarity search. The combination of different search modalities allows users to obtain answers for a wide range of queries, leading to satisfactory results in a short time. The performance of the system is evaluated using 2500 hours of GDR television recordings.
      PubDate: 2019-06-01
       
  • Expressiveness and machine processability of Knowledge Organization
           Systems (KOS): an analysis of concepts and relations
    • Abstract: This study considers the expressiveness (that is, the expressive power or expressivity) of different types of Knowledge Organization Systems (KOS) and discusses its potential to be machine-processable in the context of the semantic web. For this purpose, the theoretical foundations of KOS are reviewed based on conceptualizations introduced by the Functional Requirements for Subject Authority Data (FRSAD) and the Simple Knowledge Organization System (SKOS); natural language processing techniques are also implemented. Applying a comparative analysis, the dataset comprises a thesaurus (Eurovoc), a subject headings system (LCSH) and a classification scheme (DDC). These are compared with an ontology (CIDOC-CRM) by focusing on how they define and handle concepts and relations. It was observed that LCSH and DDC focus on the formalism of character strings (nomens) rather than on the modelling of semantics; their definition of what constitutes a concept is quite fuzzy, and they comprise a large number of complex concepts. By contrast, thesauri have a coherent definition of what constitutes a concept, and apply a systematic approach to the modelling of relations. Ontologies explicitly define diverse types of relations, and are by their nature machine-processable. The paper concludes that the potential of both the expressiveness and machine processability of each KOS is extensively regulated by its structural rules. It is harder to represent subject headings and classification schemes as semantic networks with nodes and arcs, while thesauri are more suitable for such a representation. In addition, a paradigm shift is revealed which focuses on the modelling of relations between concepts, rather than the concepts themselves.
      PubDate: 2019-04-12
       
  • Introduction to the focused issue on the 20th International Conference on
           Theory and Practice of Digital Libraries (TPDL 2016)
    • Authors: Norbert Fuhr; László Kovács; Thomas Risse; Wolfgang Nejdl
      PubDate: 2019-02-01
      DOI: 10.1007/s00799-019-00265-4
       
 
 