Journal Cover
International Journal on Digital Libraries
Journal Prestige (SJR): 0.441
Citation Impact (citeScore): 2
Number of Followers: 763  
  Hybrid journal (may contain Open Access articles)
ISSN (Print) 1432-5012 - ISSN (Online) 1432-1300
Published by Springer-Verlag
  • Historical document layout analysis using anisotropic diffusion and
           geometric features
    • Abstract: There are several digital libraries worldwide that maintain valuable historical manuscripts. Usually, digital copies of these manuscripts are offered to researchers and readers in raster-image format. These images carry several document degradations that may hinder automatic information retrieval tasks such as manuscript indexing, categorization, and retrieval by content. In this paper, we propose a learning-free, hybrid document layout analysis method for handwritten historical manuscripts. It has two main phases: page characterization and segmentation. First, the proposed method locates the main content using top-down whitespace analysis, employing anisotropic diffusion filtering to find whitespaces. It then extracts template features representing the manuscript authors' writing behavior. Finally, moving windows scan the manuscript page to define the main-content boundaries more precisely. We evaluated the proposed method on two datasets: a publicly available set of 38 historical manuscript pages, and a set of 51 historical manuscript pages collected from the online Harvard Library. Experiments on both datasets show promising results, with main-content segmentation success rates of up to 98.5%.
      PubDate: 2020-01-23
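The anisotropic diffusion filtering mentioned in this abstract can be illustrated with a minimal one-dimensional Perona-Malik-style sketch. This is an illustration of the general technique, not the authors' implementation; the `kappa` and `step` parameters and the exponential conduction function are assumed defaults.

```python
import math

def diffuse(signal, kappa=2.0, step=0.2, iterations=20):
    """One-dimensional Perona-Malik-style anisotropic diffusion.

    Smooths small fluctuations while preserving strong edges: the
    conduction coefficient g(d) = exp(-(d/kappa)^2) vanishes where
    the local gradient d is large, so sharp transitions survive.
    """
    def g(d):
        return math.exp(-(d / kappa) ** 2)

    s = list(signal)
    for _ in range(iterations):
        nxt = s[:]
        for i in range(1, len(s) - 1):
            left = s[i - 1] - s[i]
            right = s[i + 1] - s[i]
            nxt[i] = s[i] + step * (g(left) * left + g(right) * right)
        s = nxt
    return s

# A noisy step edge: the plateaus are smoothed, the jump is kept.
noisy = [0.0, 0.3, 0.1, 0.2, 10.0, 10.2, 9.9, 10.1]
smoothed = diffuse(noisy)
```

The edge-preserving behavior is what makes this kind of filter useful for locating whitespace regions: text areas are smoothed while the boundary between text and margin stays sharp.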
  • The HathiTrust Digital Library’s potential for musicology research
    • Abstract: The HathiTrust Digital Library (HTDL) is one of the largest digital libraries in the world, containing seventeen million volumes from the collections of major academic and research libraries. In this paper, we discuss the HTDL's potential for musicology research by providing a bibliometric analysis of the collection as a whole, and of the music materials in particular. A series of case studies illustrates the kinds of musicological research that may be conducted using the HTDL. We highlight several opportunities for improvement and discuss promising future directions for new knowledge creation through the processing and analysis of large amounts of retrospective data. The HTDL presents significant new opportunities to the study of music that will continue to expand as data, metadata and collection enhancements are introduced.
      PubDate: 2020-01-23
  • Content selection criteria for news multi-video summarization based on
           human strategies
    • Abstract: In recent years, the volume of multimedia data produced and available for access has increased continuously and quickly, notably video content. This growth has aggravated the information overload problem: finding content of interest among a huge number of available options. Efficient schemes for content access are therefore needed, and automatic video summarization is a research field that deals with this problem. Furthermore, current multimedia systems make available several videos related to the same topic, each carrying a piece of unique complementary information. This fact highlights the need for multi-video summarization, which addresses users' interest in being informed about a subject from a set of videos without having to watch the whole set. However, an analysis of the literature shows that human strategies are not considered when defining the criteria used to automatically select the video segments that compose a summary; instead, techniques have focused on identifying information common to different videos. In this work, we investigate human strategies for news multi-video summarization. The results of a study with real users uncover relevant criteria for developing summaries, with the potential to increase their semantics and bring them closer to users' perception.
      PubDate: 2020-01-23
  • FAIR data for prehistoric mining archaeology
    • Abstract: This paper presents an approach to creating FAIR data for prehistoric mining archaeology, based on the CIDOC CRM ontology and semantic web standards. The interdisciplinary Research Centre HiMAT (History of Mining Activities in the Tyrol and Adjacent Areas, University of Innsbruck) investigates mining history from prehistoric to modern times. One of the projects carried out at the research centre is the multinational DACH project “Prehistoric copper production in the eastern and central Alps”. For a specific geographical region of the project, the transformation of the data into open and re-usable form is investigated in a separate Open Research Data pilot project. The methodological approach follows the FAIR principles, making data Findable, Accessible, Interoperable and Re-usable. Every archaeological investigation in Austria has to be documented according to the requirements of the Austrian Federal Monuments Office. This documentation is deposited in ZENODO, the CERN-based, EU-supported research data repository. For each deposited file, metadata are created by applying the conceptual metadata schema CIDOC CRM, an ISO standard for cultural heritage information that was adopted by ARIADNE, the European Union research infrastructure for archaeological resources. Concepts specific to mining archaeology are organized with the DARIAH Back Bone Thesaurus, a model for sustainable, interoperable thesaurus maintenance developed in the European Union Digital Research Infrastructure for the Arts and Humanities. Metadata are created by extracting information from the documentation and transforming it into a knowledge graph using semantic web standards. To facilitate usage, graph data are exported to hierarchical and tabular formats representing sites and objects with their geographic locations, temporal and typological assignments, and links to the research activities and documents. Metadata are deposited together with the documentation in the repository.
      PubDate: 2020-01-23
  • A fuzzy approach to evaluate the attributions reliability in the
           archaeological sources
    • Abstract: This paper presents a case study of data management and processing of archaeological information through a relational database. The unusual typology of the ‘small finds’ that were archaeologically analyzed and the specific history of the excavations at Phaistos and Ayia Triada (Crete, Greece) prompted our consideration of issues regarding data integrity. We sought to address the problem surrounding the relevance of archaeological sources by applying a reliability index to the subjective interpretations of archaeological data, which ultimately led to the implementation of a fuzzy method to determine the degree of uncertainty of attributions associated with function. The resulting database represents a ‘container of memories’ that allows the processing of all the typological and functional attributions from any source, without having to necessarily simplify or dilute the information in order to render it manageable. The concept of ‘probability of belonging’ and multi-assignment of source attributions seem to represent plausible methodological pathways to determining the reliability of archaeological data, thus warranting the research presented herein.
      PubDate: 2020-01-21
  • Time-focused analysis of connectivity and popularity of historical persons
           in Wikipedia
    • Abstract: Wikipedia contains large amounts of content related to history and is used extensively for many knowledge-intensive tasks within computer science, the digital humanities and related fields. In this paper, we study link-related temporal features of Wikipedia articles on historical people. Our study sheds new light on the characteristics of information about historical people recorded in the English Wikipedia and quantifies user interest in such data. We propose a novel style of analysis in which we use signals derived from the hyperlink structure of Wikipedia as well as from article view logs, overlaying them on a temporal dimension to understand the relations between time periods, link structure and article popularity. In the latter part of the paper, we also demonstrate several ways to estimate person importance based on the temporal aspects of the link structure, as well as a method for ranking cities using the computed importance scores of their related persons.
      PubDate: 2019-12-01
  • Comparing published scientific journal articles to their pre-print
    • Abstract: Academic publishers claim that they add value to scholarly communications by coordinating reviews and contributing and enhancing text during publication. These contributions come at a considerable cost: US academic libraries paid $1.7 billion for serial subscriptions in 2008 alone. Library budgets, in contrast, are flat and not able to keep pace with serial price inflation. We have investigated the publishers' value proposition by conducting a comparative study of pre-print papers from two distinct science, technology, and medicine corpora and their final published counterparts. This comparison had two working assumptions: (1) if the publishers' argument is valid, the text of a pre-print paper should vary measurably from its corresponding final published version, and (2) by applying standard similarity measures, we should be able to detect and quantify such differences. Our analysis revealed that the text contents of the scientific papers generally changed very little from their pre-print to final published versions. These findings contribute empirical indicators to discussions of the added value of commercial publishers and therefore should influence libraries' economic decisions regarding access to scholarly publications.
      PubDate: 2019-12-01
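The "standard similarity measures" this abstract refers to can be as simple as a normalized matching-ratio comparison of the two texts. A minimal sketch using Python's standard library follows; it is an illustration of the general idea, not the authors' actual measures or corpora, and the sample sentences are made up.

```python
from difflib import SequenceMatcher

def text_similarity(preprint: str, published: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means the texts are identical."""
    return SequenceMatcher(None, preprint, published).ratio()

# Near-identical texts (a copy-edit pass) score close to 1.0.
pre = "We investigate the publishers value proposition."
pub = "We investigated the publishers' value proposition."
score = text_similarity(pre, pub)
```

In practice such measures are applied at scale to pre-print/published pairs; a distribution of scores concentrated near 1.0 is what supports the paper's conclusion that texts change little between versions.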
  • Capisco: low-cost concept-based access to digital libraries
    • Abstract: In this article, we present the conceptual design and report on the implementation of Capisco, a low-cost approach to concept-based access to digital libraries. Capisco avoids the need for complete semantic document markup using ontologies by leveraging an automatically generated Concept-in-Context (CiC) network. The network is seeded by an a priori analysis of Wikipedia texts and identification of semantic metadata. Our Capisco system disambiguates terms in the documents by their context and identifies the relevant CiC concepts. Supplementary to this, the disambiguation of search queries is done interactively, to fully utilize the domain knowledge of the scholar. For established digital library systems, completely replacing, or even making significant changes to, the document retrieval mechanism (document analysis, indexing strategy, query processing, and query interface) would require major technological effort and would most likely be disruptive. In addition to presenting Capisco, we describe ways to harness the results of our semantic analysis and disambiguation while retaining the existing keyword-based search and lexicographic index. We engineer this so that the output of semantic analysis (performed off-line) is suitable for direct import into existing digital library metadata and index structures, and can thus be incorporated without the need for architecture modifications.
      PubDate: 2019-12-01
  • Anatomy of scholarly information behavior patterns in the wake of academic
           social media platforms
    • Abstract: As more scholarly content is born digital or converted to a digital format, digital libraries are becoming increasingly vital to researchers seeking to leverage scholarly big data for scientific discovery. Although scholarly products are available in abundance, especially in environments created by the advent of social networking services, little is known about international scholarly information needs, information-seeking behavior, or information use. The purpose of this paper is to address these gaps via an in-depth analysis of the information needs and information-seeking behavior of researchers, both students and faculty, at two universities, one in the USA and the other in Qatar. Based on this analysis, the study identifies and describes new behavior patterns on the part of researchers as they engage in the information-seeking process. The analysis reveals that the use of academic social networks has notable effects on various scholarly activities. Further, this study identifies differences between students and faculty members in regard to their use of academic social networks, and it identifies differences between researchers according to discipline. Although the researchers who participated in the present study represent a range of disciplinary and cultural backgrounds, the study reports a number of similarities in terms of the researchers' scholarly activities.
      PubDate: 2019-12-01
  • Automated identification of media bias in news articles: an
           interdisciplinary literature review
    • Abstract: Media bias, i.e., slanted news coverage, can strongly impact the public perception of the reported topics. In the social sciences, research over the past decades has developed comprehensive models to describe media bias, along with effective, yet often manual and thus cumbersome, methods for analysis. In contrast, computer science offers fast, automated, and scalable methods, but few approaches systematically analyze media bias. Despite technically superior approaches, the models used to analyze media bias in computer science tend to be simpler than those established in the social sciences and do not necessarily address the most pressing substantive questions. Computer science research on media bias thus stands to profit from a closer integration of the models for the study of media bias developed in the social sciences with automated methods from computer science. This article first establishes a shared conceptual understanding by mapping the state of the art from the social sciences to a framework that can be targeted by approaches from computer science. Next, we investigate different forms of media bias and review how each form is analyzed in the social sciences. For each form, we then discuss methods from computer science suitable to (semi-)automate the corresponding analysis. Our review suggests that suitable automated methods from computer science, primarily in the realm of natural language processing, are already available for each of the discussed forms of media bias, opening multiple directions for promising further research in computer science in this area.
      PubDate: 2019-12-01
  • A Wikidata-based tool for building and visualising narratives
    • Abstract: In this paper we present a semi-automatic tool for constructing and visualising narratives, understood as networks of events related to each other by semantic relations. The tool conforms to an ontology for narratives that we developed. It retrieves and assigns internationalised resource identifiers to the instances of the classes of the ontology, using Wikidata as an external knowledge base, and also facilitates the construction and contextualisation of events and their linking to form narratives. The knowledge collected by the tool is automatically saved as a Web Ontology Language (OWL) graph. The tool also allows the visualisation of the knowledge included in the graph in simple formats like tables, network graphs and timelines. We have carried out an initial qualitative evaluation of the tool: as a case study, a historian from the University of Pisa used it to build the narrative of Dante Alighieri's life. The evaluation addressed the effectiveness of the tool and the satisfaction of the users' requirements.
      PubDate: 2019-12-01
  • Introduction to the focused issue on the 2016 ACM/IEEE-CS Joint Conference
           on Digital Libraries JCDL 2016
    • PubDate: 2019-11-12
  • Recent applications of Knowledge Organization Systems: introduction to a
           special issue
    • PubDate: 2019-09-01
  • Knowledge Organization Systems (KOS) in the Semantic Web: a
           multi-dimensional review
    • Abstract: Since the Simple Knowledge Organization System (SKOS) specification and its SKOS eXtension for Labels (SKOS-XL) became formal W3C recommendations in 2009, a significant number of conventional Knowledge Organization Systems (KOS), including thesauri, classification schemes, name authorities, and lists of codes and terms produced before the arrival of the ontology wave, have made their journeys to join the Semantic Web mainstream. This paper uses “LOD KOS” as an umbrella term to refer to all of the value vocabularies and lightweight ontologies within the Semantic Web framework. The paper provides an overview of what the LOD KOS movement has brought to various communities and users. These are not limited to the communities of value vocabulary constructors and providers, nor to the catalogers and indexers who have a long history of applying the vocabularies to their products. LOD dataset producers and LOD service providers, information architects and interface designers, and researchers in the sciences and humanities are also direct beneficiaries of LOD KOS. The paper examines a set of collected cases (experimental or in real applications) and aims to identify the usages of LOD KOS in order to share practices and ideas among communities and users. Through the viewpoints of a number of different user groups, the functions of LOD KOS are examined from multiple dimensions. This paper focuses on LOD dataset producers, vocabulary producers, and researchers (as end-users of KOS).
      PubDate: 2019-09-01
  • Designing an ontology for managing the diets of hypertensive individuals
    • Abstract: This paper describes the development of an ontology which could act as a recommendation system for hypertensive individuals. The author has conceptualized and developed an ontology which describes recipes, nutrients in foods and the interactions between nutrients and prescribed drugs, disease and general health. The paper begins with a review of the literature on several ontology designs. The previous ontology models guide the development of classes, properties and restrictions built into the hypertensive diet ontology. The model is constructed following the Ontology 101 methodology. The ontology was validated using proto-personas to create competency questions which were used to test the ontology. The findings show that the ontology may be used to provide information with the goal of assisting individuals in making sense of complex effects of diet on health and outcomes. It is concluded that the ontology can be used to provide support to patients as they seek to manage chronic illnesses such as hypertension. The study has relevance for creators of knowledge organization systems and ontologies in the healthcare field.
      PubDate: 2019-09-01
  • An empirically validated, onomasiologically structured, and linguistically
           motivated online terminology
    • Abstract: Terminological resources play a central role in the organization and retrieval of scientific texts. Both simple keyword lists and advanced modelings of relationships between terminological concepts can make a most valuable contribution to the analysis, classification, and finding of appropriate digital documents, either on the web or within local repositories. This seems especially true for long-established scientific fields with elusive theoretical and historical branches, where the use of terminology within documents from different origins is often far from being consistent. In this paper, we report on the progress of a linguistically motivated project on the onomasiological re-modeling of the terminological resources for the grammatical information system grammis. We present the design principles and the results of their application. In particular, we focus on new features for the authoring backend and discuss how these innovations help to evaluate existing, loosely structured terminological content, as well as to efficiently deal with automatic term extraction. Furthermore, we introduce a transformation to a future SKOS representation. We conclude with a positioning of our resources with regard to the Knowledge Organization discourse and discuss how a highly complex information environment like grammis benefits from the re-designed terminological KOS.
      PubDate: 2019-09-01
  • From subtitles to substantial metadata: examining characteristics of named
           entities and their role in indexing
    • Abstract: This paper explores the possible role of named entities extracted from subtitle text in the automatic indexing of TV programs. This is done by analyzing entity types, name density and name frequencies in subtitles and metadata records from different genres of TV programs. The name density in metadata records is much higher than the name density in subtitles, and named entities with high frequencies in the subtitles are more likely to be mentioned in the metadata records. Further analysis of the metadata records indicates that the use of named entities in metadata increases with the frequency the entities have in the subtitles. The most substantial difference was between a frequency of one and a frequency of two: named entities occurring twice in the subtitles were twice as likely to be present in the metadata records. Personal names, geographical names and names of organizations were the most prominent entity types in both the news subtitles and news metadata, while persons, creative works and locations were the most prominent in culture programs. It is not possible to extract all the named entities in the manually created metadata records by applying named entity recognition to the subtitles for the same programs, but it is possible to find a large subset of named entities for some categories in certain genres. The results reported in this paper show that subtitles are a good source of personal names for all the genres covered in our study, and of creative works in literature programs. In total, it was possible to find 38% of the named entities in metadata records for news programs and 32% for literature programs, while 21% of the named entities in metadata records for talk shows were also present in the subtitles for the programs.
      PubDate: 2019-09-01
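The frequency analysis this abstract describes, checking whether entities that occur more often in subtitles are likelier to appear in the metadata record, can be sketched as a small frequency-stratified overlap computation. Entity extraction is assumed to have already happened, and the sample names below are made up for illustration.

```python
from collections import Counter

def frequency_overlap(subtitle_entities, metadata_entities):
    """For each subtitle frequency, the fraction of entities also in metadata."""
    freq = Counter(subtitle_entities)
    meta = set(metadata_entities)
    by_freq = {}  # frequency -> (entities found in metadata, total entities)
    for entity, n in freq.items():
        hits, total = by_freq.get(n, (0, 0))
        by_freq[n] = (hits + (entity in meta), total + 1)
    return {n: hits / total for n, (hits, total) in by_freq.items()}

# Hypothetical subtitle entity mentions and a metadata record.
subs = ["Oslo", "Oslo", "NRK", "Ibsen", "Ibsen", "Bergen"]
meta = ["Oslo", "Ibsen"]
overlap = frequency_overlap(subs, meta)  # {2: 1.0, 1: 0.0}
```

Comparing these fractions across frequencies is one simple way to quantify the paper's finding that a frequency of two roughly doubles the chance of a subtitle entity reaching the metadata record.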
  • Analyzing the network structure and gender differences among the members
           of the Networked Knowledge Organization Systems (NKOS) community
    • Abstract: In this paper, we analyze a major part of the research output of the Networked Knowledge Organization Systems (NKOS) community in the period 2000–2016 from a network-analytical perspective. We focus on the papers presented at the European and US NKOS workshops and, in addition, four special issues on NKOS in the last 16 years. For this purpose, we have generated an open dataset, the “NKOS bibliography”, which covers the bibliographic information of this research output: 123 papers by 256 distinct authors. We use standard network-analytic measures such as degree, betweenness and closeness centrality to describe the co-authorship network of the NKOS dataset. First, we investigate global properties of the network over time. Second, we analyze the centrality of the authors in the NKOS network. Lastly, we investigate gender differences in collaboration behavior in this community. Our results show that, apart from differences in centrality measures, scholars have a higher tendency to collaborate with those in the same institution or in close geographic proximity. We also find that homophily is higher among women in this community. Apart from small differences in closeness and clustering between men and women, we do not find any significant dissimilarities with respect to other centralities.
      PubDate: 2019-09-01
  • Assessing the quality of answers autonomously in community question–answering
    • Abstract: Community question–answering (CQA) has become a popular method of online information seeking. Within these services, peers ask questions and create answers to those questions. For some time, content repositories created through CQA sites have widely supported general-purpose tasks; however, they can also be used as online digital libraries that satisfy specific needs related to education. Horizontal CQA services, such as Yahoo! Answers, and vertical CQA services, such as Brainly, aim to help students improve their learning process via Q&A exchanges. In addition, Stack Overflow, another vertical CQA service, serves a similar purpose but specifically focuses on topics relevant to programmers. Receiving high-quality answers to a posed CQA query is a critical factor in both user satisfaction and supported learning in these services. This process can be impeded when experts do not answer questions and/or askers do not have the knowledge and skills needed to evaluate the quality of the answers they receive. Such circumstances may cause learners to construct a faulty knowledge base by applying inaccurate information acquired from online sources. Though site moderators could alleviate this problem by surveying answer quality, their subjective assessments may cause evaluations to be inconsistent. Another potential solution lies in human assessors, though they may also be insufficient due to the large amount of content available on a CQA site. The following study addresses these issues by proposing a framework for automatically assessing answer quality. We accomplish this by integrating different groups of features (personal, community-based, textual, and contextual) to build a classification model and determine what constitutes answer quality. We collected more than 10 million educational answers posted by more than 3 million users on Brainly and 7.7 million answers on Stack Overflow to test this evaluation framework. The experiments conducted on these data sets show that a model using random forest achieves high accuracy in identifying high-quality answers. Findings also indicate that personal and community-based features have more predictive power in assessing answer quality. Additionally, other key metrics such as F1-score and area under the ROC curve achieve high values with our approach. The work reported here can be useful in many other contexts that strive to provide automatic quality assessment in a digital repository.
      PubDate: 2019-08-05
  • Expressiveness and machine processability of Knowledge Organization
           Systems (KOS): an analysis of concepts and relations
    • Abstract: This study considers the expressiveness (that is, the expressive power or expressivity) of different types of Knowledge Organization Systems (KOS) and discusses its potential to be machine-processable in the context of the semantic web. For this purpose, the theoretical foundations of KOS are reviewed based on conceptualizations introduced by the Functional Requirements for Subject Authority Data (FRSAD) and the Simple Knowledge Organization System (SKOS); natural language processing techniques are also implemented. Applying a comparative analysis, the dataset comprises a thesaurus (Eurovoc), a subject headings system (LCSH) and a classification scheme (DDC). These are compared with an ontology (CIDOC-CRM) by focusing on how they define and handle concepts and relations. It was observed that LCSH and DDC focus on the formalism of character strings (nomens) rather than on the modelling of semantics; their definition of what constitutes a concept is quite fuzzy, and they comprise a large number of complex concepts. By contrast, thesauri have a coherent definition of what constitutes a concept, and apply a systematic approach to the modelling of relations. Ontologies explicitly define diverse types of relations, and are by their nature machine-processable. The paper concludes that the potential of both the expressiveness and machine processability of each KOS is extensively regulated by its structural rules. It is harder to represent subject headings and classification schemes as semantic networks with nodes and arcs, while thesauri are more suitable for such a representation. In addition, a paradigm shift is revealed which focuses on the modelling of relations between concepts, rather than the concepts themselves.
      PubDate: 2019-04-12
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK