International Journal on Digital Libraries
  [SJR: 0.203]   [H-I: 24]
   Hybrid journal (may contain Open Access articles)
   ISSN (Print) 1432-5012 - ISSN (Online) 1432-1300
   Published by Springer-Verlag
  • A quantitative approach to evaluate Website Archivability using the CLEAR+
           method
    • Abstract: Website Archivability (WA) is a notion established to capture the core aspects of a website, crucial in diagnosing whether it has the potential to be archived with completeness and accuracy. In this work, aiming at measuring WA, we introduce and elaborate on all aspects of CLEAR+, an extended version of the Credible Live Evaluation Method for Archive Readiness (CLEAR) method. We use a systematic approach to evaluate WA from multiple different perspectives, which we call Website Archivability Facets. We then analyse a web application we created as the reference implementation of CLEAR+, and discuss the implementation of the evaluation workflow. Finally, we conduct thorough evaluations of all aspects of WA to support the validity, the reliability and the benefits of our method using real-world web data.
      PubDate: 2016-06-01
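The facet-based evaluation described above can be pictured as combining per-facet ratings into one archivability score. The sketch below is a minimal illustration of that idea; the facet names and equal default weights are assumptions for demonstration, not the published CLEAR+ metric.

```python
# Sketch of a CLEAR+-style archivability score: each facet is rated in
# [0, 1] and the overall score is a weighted average. Facet names and
# weights here are illustrative, not the published ones.

def archivability_score(facet_ratings, weights=None):
    """Combine per-facet ratings (dict of name -> 0..1) into one WA score."""
    if weights is None:
        weights = {name: 1.0 for name in facet_ratings}
    total = sum(weights[name] for name in facet_ratings)
    return sum(facet_ratings[name] * weights[name]
               for name in facet_ratings) / total

ratings = {"accessibility": 0.9, "standards_compliance": 0.7,
           "cohesion": 1.0, "metadata": 0.5}
score = archivability_score(ratings)  # a single WA figure in [0, 1]
```

Unequal weights would let an evaluator emphasize, say, standards compliance over metadata richness.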
  • Scalable continual quality control of formative assessment items in an
           educational digital library: an empirical study
    • Abstract: An essential component of any library of online learning objects is assessment items, for example, homework, quizzes, and self-study questions. As opposed to exams, these items are formative in nature, as they help the learner to assess his or her own progress through the material. When it comes to quality control of these items, their formative nature poses additional challenges: e.g., there is no particular time interval in which learners interact with these items, learners come to these items with very different levels of preparation and seriousness, guessing generates noise in the data, and the numbers of items and learners can be several orders of magnitude larger than in summative settings. This empirical study aims to find a highly scalable mechanism for continual quality control of this class of digital content with a minimal amount of additional metadata and transactional data, while also taking into account characteristics of the learners. In a subsequent evaluation of the model on a limited set of transactions, we find that taking into account the learner characteristic of ability improves the quality of item metadata, and in a comparison to Item Response Theory (IRT), we find that the developed model in fact performs slightly better in terms of predicting the outcome of formative assessment transactions, while never matching the performance of IRT on predicting the outcome of summative assessment.
      PubDate: 2016-06-01
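The IRT baseline the study compares against can be illustrated with the simplest one-parameter (Rasch) form: the probability that a learner of a given ability answers an item of a given difficulty correctly. This is a generic textbook sketch, not the study's model; parameter values are illustrative.

```python
import math

# One-parameter (Rasch) IRT model: P(correct) depends only on the gap
# between learner ability theta and item difficulty b, both on a logit
# scale. At theta == b the probability is exactly 0.5.

def p_correct(theta, b):
    """Probability a learner with ability theta answers item b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A stronger learner has a higher chance on the same item:
weak, strong = p_correct(-1.0, 0.0), p_correct(1.5, 0.0)
```

Fitting such a model to noisy formative-assessment transactions (guessing, varying seriousness) is exactly where the scalability challenges discussed above arise.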
  • A locality-aware similar information searching scheme
    • Abstract: In a database, a similar information search means finding data records which contain the majority of search keywords. Due to the rapid accumulation of information nowadays, the size of databases has increased dramatically. An efficient information searching scheme can speed up information searching and retrieve all relevant records. This paper proposes a Hilbert curve-based similarity searching scheme (HCS). HCS considers a database to be a multidimensional space and each data record to be a point in the multidimensional space. By using a Hilbert space filling curve, each point is projected from a high-dimensional space to a low-dimensional space, so that the points close to each other in the high-dimensional space are gathered together in the low-dimensional space. Because the database is divided into many clusters of close points, a query is mapped to a certain cluster instead of searching the entire database. Experimental results prove that HCS dramatically reduces the search time latency and exhibits high effectiveness in retrieving similar information.
      PubDate: 2016-06-01
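The core mapping HCS relies on can be sketched with the classic iterative Hilbert-index algorithm for a 2-D grid: nearby cells receive nearby curve indices, so records can be clustered by index range. This is the standard textbook construction in two dimensions, not the paper's actual (higher-dimensional) implementation; the grid size n must be a power of two.

```python
# Hilbert curve index of cell (x, y) on an n-by-n grid (n a power of 2).
# Consecutive indices always belong to adjacent cells, which is the
# locality property a Hilbert-curve-based search scheme exploits.

def hilbert_index(n, x, y):
    """Distance of cell (x, y) along the Hilbert curve of an n-by-n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:               # rotate the quadrant so recursion lines up
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Sorting records by this index gathers spatial neighbours into contiguous index ranges, so a query needs to scan only the cluster around its own index.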
  • The impact of JavaScript on archivability
    • Abstract: As web technologies evolve, web archivists work to adapt so that digital history is preserved. Recent advances in web technologies have introduced client-side executed scripts (Ajax) that, for example, load data without a change in the top-level Uniform Resource Identifier (URI) or require user interaction (e.g., content loading via Ajax when the page has scrolled). These advances have made automated methods for capturing web pages more difficult. In an effort to understand why mementos (archived versions of live resources) in today’s archives vary in completeness and sometimes pull content from the live web, we present a study of web resources and archival tools. We used a collection of URIs shared over Twitter and a collection of URIs curated by Archive-It in our investigation. We created local archived versions of the URIs from the Twitter and Archive-It sets using WebCite, wget, and the Heritrix crawler. We found that only 4.2% of the Twitter collection is perfectly archived by all of these tools, while 34.2% of the Archive-It collection is perfectly archived. After studying the quality of these mementos, we identified the practice of loading resources via JavaScript (Ajax) as the source of archival difficulty. Further, we show that resources are increasing their use of JavaScript to load embedded resources. By 2012, over half (54.5%) of pages used JavaScript to load embedded resources. The number of embedded resources loaded via JavaScript increased by 12.0% from 2005 to 2012. We also show that JavaScript is responsible for 33.2% more missing resources in 2012 than in 2005. This shows that JavaScript is responsible for an increasing proportion of the embedded resources unsuccessfully loaded by mementos. JavaScript is also responsible for 52.7% of all missing embedded resources in our study.
      PubDate: 2016-06-01
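The "perfectly archived" judgement above boils down to comparing the embedded resources a live page loads against those its memento actually captured. A minimal version of that completeness measure, with made-up resource paths:

```python
# Fraction of a live page's embedded resources (images, scripts,
# stylesheets) that were successfully captured in the memento.
# A score of 1.0 corresponds to "perfectly archived".

def memento_completeness(live_resources, archived_resources):
    live = set(live_resources)
    if not live:
        return 1.0          # nothing to capture counts as complete
    return len(live & set(archived_resources)) / len(live)

live = {"/logo.png", "/app.js", "/style.css", "/feed.json"}
captured = {"/logo.png", "/style.css"}   # Ajax-loaded items often missed
half = memento_completeness(live, captured)  # 0.5
```

Resources fetched by JavaScript after page load are exactly the ones crawlers tend to miss, which is why the missing set here skews toward script- and data-type resources.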
  • The evolution of web archiving
    • Abstract: Web archives preserve information published on the web or digitized from printed publications. Much of this information is unique and historically valuable. However, the lack of knowledge about the global status of web archiving initiatives hampers their improvement and collaboration. To overcome this problem, we conducted two surveys, in 2010 and 2014, which provide a comprehensive characterization of web archiving initiatives and their evolution. We identified several patterns and trends that highlight challenges and opportunities. We discuss these patterns and trends, which make it possible to define strategies, estimate resources and provide guidelines for research and development of better technology. Our results show that in recent years there was significant growth in initiatives and in the countries hosting them, in the volume of data and in the amount of content preserved. While this indicates that the web archiving community is dedicating a growing effort to preserving digital information, other results presented throughout the paper raise concerns, such as the small amount of archived data in comparison with the amount of data being published online.
      PubDate: 2016-05-09
  • Scholarly Ontology: modelling scholarly practices
    • Abstract: In this paper we present the Scholarly Ontology (SO), an ontology for modelling scholarly practices, inspired by business process modelling and Cultural-Historical Activity Theory. The SO is based on empirical research and earlier models and is designed to incorporate related works through a modular structure. The SO is an elaboration of the domain-independent core part of the NeDiMAH Methods Ontology addressing the scholarly ecosystem of Digital Humanities. It thus provides a basis for developing domain-specific scholarly work ontologies springing from a common root. We define the basic concepts of the model and their semantic relations through four complementary perspectives on scholarly work: activity, procedure, resource and agency. As a use case we present a modelling example and discuss the intended uses of the model through the presentation of indicative SPARQL and SQWRL queries that highlight the benefits of its serialization in RDFS. The SO includes an explicit treatment of intentionality and its interplay with functionality, captured by different parts of the model. We discuss the role of types as the semantic bridge between those two parts and explore several patterns that can be exploited in designing reusable access structures and conformance rules. Related taxonomies and ontologies and their possible reuse within the framework of the SO are reviewed.
      PubDate: 2016-05-04
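The SPARQL queries mentioned above amount to pattern matching over RDF triples. As a language-neutral stand-in, the sketch below matches a triple pattern over a tiny hand-made graph; all entity and property names (`ex:`, `so:` prefixes included) are hypothetical, not taken from the Scholarly Ontology itself.

```python
# A toy triple store and a (subject, predicate, object) pattern matcher,
# standing in for a SPARQL query over an RDFS serialization of an
# ontology. None acts as a wildcard variable. All names are invented.

triples = [
    ("ex:AnnotationActivity", "so:employs", "ex:TaggingMethod"),
    ("ex:AnnotationActivity", "so:hasParticipant", "ex:ResearcherA"),
    ("ex:EditionActivity", "so:employs", "ex:CollationMethod"),
]

def match(pattern, graph):
    """Return all triples in graph matching an (s, p, o) pattern."""
    s, p, o = pattern
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Which methods are employed by some activity?"
methods = [o for (_, _, o) in match((None, "so:employs", None), triples)]
```

A real SPARQL engine adds joins, filters and inference on top, but the activity/procedure/resource/agency perspectives of the model are all queried through patterns of this shape.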
  • Supporting academic search tasks through citation visualization and
           exploration
    • Abstract: Despite ongoing advances in information retrieval algorithms, people continue to experience difficulties when conducting online searches within digital libraries. Because their information-seeking goals are often complex, searchers may experience difficulty in precisely describing what they are seeking. Current search interfaces provide limited support for navigating and exploring among the search results and helping searchers to more accurately describe what they are looking for. In this paper, we present a novel visual library search interface, designed with the goal of providing interactive support for common library search tasks and behaviours. This system takes advantage of the rich metadata available in academic collections and employs information visualization techniques to support search results evaluation, forward and backward citation exploration, and interactive query refinement.
      PubDate: 2016-04-26
  • Font attributes enrich knowledge maps and information retrieval
    • Abstract: Typography is overlooked in knowledge maps (KM) and information retrieval (IR), and some deficiencies in these systems can potentially be improved by encoding information into font attributes. A review of font use across domains is used to itemize font attributes and information visualization theory is used to characterize each attribute. Tasks associated with KM and IR, such as skimming, opinion analysis, character analysis, topic modelling and sentiment analysis can be aided through the use of novel representations using font attributes such as skim formatting, proportional encoding, textual stem and leaf plots and multi-attribute labels.
      PubDate: 2016-02-08
  • Location-triggered mobile access to a digital library of audio books using
           Tipple
    • Abstract: This paper explores the role of audio as a means to access books while being at locations referred to within the books, through a mobile app called Tipple. The books are sourced from a digital library—either self-contained on the mobile phone, or else over the network—and can either be accompanied by pre-recorded audio or synthesized using text-to-speech. The paper details the functional requirements, design and implementation of Tipple. The developed concept was explored and evaluated through three field studies.
      PubDate: 2015-10-19
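A location-triggered app of this kind needs, at minimum, a proximity check between the reader's position and a location mentioned in the book. The sketch below uses the standard haversine great-circle distance; the trigger radius and coordinates are illustrative, not Tipple's actual parameters.

```python
import math

# Great-circle (haversine) distance between two (lat, lon) points in
# degrees, returned in metres, plus a simple geofence trigger of the
# kind a location-triggered audio app needs.

def haversine_m(lat1, lon1, lat2, lon2):
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def should_play(reader_pos, location_pos, radius_m=50.0):
    """Trigger audio when the reader is within radius_m of the location."""
    return haversine_m(*reader_pos, *location_pos) <= radius_m
```

In practice GPS jitter argues for some hysteresis (a larger exit radius than entry radius) so playback does not flap at the boundary.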
  • Recent applications of Knowledge Organization Systems: introduction to a
           special issue
    • Abstract: This special issue of the International Journal on Digital Libraries evolved from the 13th Networked Knowledge Organization Systems (NKOS) workshop held at the joint Digital Libraries conference 2014 in London. The focus of the workshop was ‘Mapping between Linked Data vocabularies of KOS’ and ‘Meaningful Concept Display and Meaningful Visualization of KOS’. The issue presents six papers covering both conceptual aspects and technical implementations of NKOS. We dedicate this special issue to our long-term colleague and friend Johan De Smedt, who died in June 2015 while we were editing the special issue.
      PubDate: 2015-09-04
  • Alignment of conceptual structures in controlled vocabularies in the
           domain of Chinese art: a discussion of issues and patterns
    • PubDate: 2015-09-03
  • Guest editors’ introduction to the special issue on the digital
           libraries conference 2014
    • PubDate: 2015-09-01
  • Improving interoperability using vocabulary linked data
    • Abstract: The concept of Linked Data has been an emerging theme within the computing and digital heritage areas in recent years. The growth and scale of Linked Data has underlined the need for greater commonality in concept referencing, to avoid local redefinition and duplication of reference resources. Achieving domain-wide agreement on common vocabularies would be an unreasonable expectation; however, datasets often already have local vocabulary resources defined, and so the prospects for large-scale interoperability can be substantially improved by creating alignment links from these local vocabularies out to common external reference resources. The ARIADNE project is undertaking large-scale integration of archaeology dataset metadata records, to create a cross-searchable research repository resource. Key to enabling this cross search will be the ‘subject’ metadata originating from multiple data providers, containing terms from multiple multilingual controlled vocabularies. This paper discusses various aspects of vocabulary mapping. Experience from the previous SENESCHAL project in the publication of controlled vocabularies as Linked Open Data is discussed, emphasizing the importance of unique URI identifiers for vocabulary concepts. There is a need to align legacy indexing data to the uniquely defined concepts and examples are discussed of SENESCHAL data alignment work. A case study for the ARIADNE project presents work on mapping between vocabularies, based on the Getty Art and Architecture Thesaurus as a central hub and employing an interactive vocabulary mapping tool developed for the project, which generates SKOS mapping relationships in JSON and other formats. The potential use of such vocabulary mappings to assist cross search over archaeological datasets from different countries is illustrated in a pilot experiment. The results demonstrate the enhanced opportunities for interoperability and cross searching that the approach offers.
      PubDate: 2015-08-27
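The abstract mentions a tool that generates SKOS mapping relationships in JSON. A hedged sketch of what one such record could look like follows; the JSON shape and the `example.org` concept URIs are assumptions for illustration, not the ARIADNE tool's actual output format (the SKOS mapping property names themselves are standard).

```python
import json

# Build one SKOS mapping record linking a local vocabulary concept to a
# hub-thesaurus concept. Only the relation names come from the SKOS
# standard; the record shape and URIs are invented for this sketch.

SKOS_MAPPING_RELATIONS = {
    "skos:exactMatch", "skos:closeMatch",
    "skos:broadMatch", "skos:narrowMatch", "skos:relatedMatch",
}

def skos_mapping(source_uri, relation, target_uri):
    if relation not in SKOS_MAPPING_RELATIONS:
        raise ValueError("not a SKOS mapping property: " + relation)
    return {"source": source_uri, "relation": relation, "target": target_uri}

mapping = skos_mapping(
    "http://example.org/local-vocab/brooch",      # hypothetical local concept
    "skos:closeMatch",
    "http://example.org/hub-thesaurus/brooch",    # hypothetical hub concept
)
record = json.dumps(mapping)
```

Accumulating many such records per provider vocabulary is what lets a cross-search service expand a query term into its aligned concepts in other datasets.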
  • On the composition of ISO 25964 hierarchical relations (BTG, BTP, BTI)
    • Abstract: Knowledge organization systems (KOS) can use different types of hierarchical relations: broader generic (BTG), broader partitive (BTP), and broader instantial (BTI). The latest ISO standard on thesauri (ISO 25964) has formalized these relations in a corresponding OWL ontology (De Smedt et al., ISO 25964 part 1: thesauri for information retrieval: RDF/OWL vocabulary, extension of SKOS and SKOS-XL, 2013) and expressed them as properties: broaderGeneric, broaderPartitive, and broaderInstantial, respectively. These relations are used in actual thesaurus data. The compositionality of these types of hierarchical relations has not been investigated systematically yet. They all contribute to the general broader (BT) thesaurus relation and its transitive generalization broader transitive defined in the SKOS model for representing KOS. But specialized relationship types cannot be arbitrarily combined to produce new statements that have the same semantic precision, leading to cases where inference of broader transitive relationships may be misleading. We define Extended properties (BTGE, BTPE, BTIE) and analyze which compositions of the original “one-step” properties and the Extended properties are appropriate. This enables providing the new properties with valuable semantics usable, e.g., for fine-grained information retrieval purposes. In addition, we relax some of the constraints assigned to the ISO properties, namely the fact that hierarchical relationships apply to SKOS concepts only. This allows us to apply them to the Getty Art and Architecture Thesaurus (AAT), where they are also used for non-concepts (facets, hierarchy names, guide terms). In this paper, we present extensive examples derived from the recent publication of AAT as linked open data.
      PubDate: 2015-08-20
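The compositionality question above can be made concrete with a small rule table: some chains of hierarchical links keep a precise semantics (yielding the extended BTGE/BTPE/BTIE relations), while others support only the generic broader-transitive inference. The particular rules below are an illustrative subset chosen for this sketch, not the paper's full analysis.

```python
# Illustrative composition rules for ISO 25964 hierarchical link types.
# A missing entry means the chain carries no precise extended semantics
# (only the generic SKOS broader-transitive relation holds).
# This table is a simplification for demonstration purposes.

RULES = {
    ("BTG", "BTG"): "BTGE",  # generic chains stay generic
    ("BTP", "BTP"): "BTPE",  # part-of chains stay partitive
    ("BTI", "BTG"): "BTIE",  # an instance of X is an instance of
                             # anything generically broader than X
}

def compose(rel1, rel2):
    """Extended relation for a two-step chain, or None if imprecise."""
    return RULES.get((rel1, rel2))
```

For example, chaining "piston BTP engine" with "engine BTP car" still supports a partitive reading (a piston is part of a car), whereas mixing types in the wrong order can license misleading inferences, which is exactly the hazard the paper analyses.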
  • A sharing-oriented design strategy for networked knowledge organization
           systems
    • Abstract: Designers of networked knowledge organization systems often follow a service-oriented design strategy, assuming an organizational model where one party outsources clearly delineated business processes to another party. But the logic of outsourcing is a poor fit for some knowledge organization practices. When knowledge organization is understood as a process of exchange among peers, a sharing-oriented design strategy makes more sense. As an example of a sharing-oriented strategy for designing networked knowledge organization systems, we describe the design of the PeriodO period gazetteer. We analyze the PeriodO data model, its representation using JavaScript Object Notation for Linked Data (JSON-LD), and the management of changes to the PeriodO dataset. We conclude by discussing why a sharing-oriented design strategy is appropriate for organizing scholarly knowledge.
      PubDate: 2015-08-14
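A JSON-LD representation of a period definition, of the general kind PeriodO works with, can be sketched as a plain dictionary serialized to JSON. This is a deliberately simplified illustration: the field names and values below are invented and do not reproduce the actual PeriodO schema.

```python
import json

# A toy JSON-LD-style period definition: a label plus the spatial
# coverage and temporal bounds that make the assertion citable and
# comparable. Field names and values are illustrative only.

period = {
    "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
    "type": "PeriodDefinition",
    "label": "Early Bronze Age",
    "spatialCoverage": [{"label": "Crete"}],
    "start": {"label": "3000 B.C.", "year": -3000},
    "stop": {"label": "2000 B.C.", "year": -2000},
}

doc = json.dumps(period, indent=2)   # what a peer would publish/share
```

Because each record is a self-contained document, peers can exchange, diff and merge period assertions without routing them through a central service, which is the crux of the sharing-oriented strategy.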
  • CRMba: a CRM extension for the documentation of standing buildings
    • Abstract: Exploring the connections between successive phases and overlapping layers from different ages in an ancient building is paramount for its understanding and study. Archaeologists and cultural heritage experts are always eager to unveil the hidden relations of an archaeological building in order to reconstruct its history and support its interpretation. This paper presents CRMba, a CIDOC CRM extension developed to facilitate the discovery and interpretation of archaeological resources through the definition of new concepts required to describe the complexity of historic buildings. The CRMba contributes to solving the dataset interoperability issue by exploiting the CIDOC CRM to overcome data fragmentation, and to investigate the semantics of building components, of functional spaces and of the construction phases of historic buildings and complexes, making their physical and topological relations through time and space explicit. The approach used for the development of the CRMba makes the model valid for the documentation of different kinds of buildings, across periods, styles and conservation states.
      PubDate: 2015-08-04
  • Knowledge infrastructures in science: data, diversity, and digital
           libraries
    • Abstract: Digital libraries can be deployed at many points throughout the life cycles of scientific research projects from their inception through data collection, analysis, documentation, publication, curation, preservation, and stewardship. Requirements for digital libraries to manage research data vary along many dimensions, including life cycle, scale, research domain, and types and degrees of openness. This article addresses the role of digital libraries in knowledge infrastructures for science, presenting evidence from long-term studies of four research sites. Findings are based on interviews (n = 208), ethnographic fieldwork, document analysis, and historical archival research about scientific data practices, conducted over the course of more than a decade. The Transformation of Knowledge, Culture, and Practice in Data-Driven Science: A Knowledge Infrastructures Perspective project is based on a 2 × 2 design, comparing two “big science” astronomy sites with two “little science” sites that span physical sciences, life sciences, and engineering, and on dimensions of project scale and temporal stage of life cycle. The two astronomy sites invested in digital libraries for data management as part of their initial research design, whereas the smaller sites made smaller investments at later stages. Role specialization varies along the same lines, with the larger projects investing in information professionals, and smaller teams carrying out their own activities internally. Sites making the largest investments in digital libraries appear to view their datasets as their primary scientific legacy, while other sites stake their legacy elsewhere. Those investing in digital libraries are more concerned with the release and reuse of data; types and degrees of openness vary accordingly. The need for expertise in digital libraries, data science, and data stewardship is apparent throughout all four sites. Examples are presented of the challenges in designing digital libraries and knowledge infrastructures to manage and steward research data.
      PubDate: 2015-07-25
  • Representing gazetteers and period thesauri in four-dimensional
           space–time
    • Abstract: Gazetteers, i.e., lists of place names, enable a global vision of places of interest through the assignment of a point, or a region, to a place name. However, identifying the location corresponding to a place name is often a difficult task. There is no one-to-one correspondence between the two sets, places and names, because of name variants, different names for the same place and homonymy; the location corresponding to a place name may vary in time, changing its extension or even its position; and, in general, there is the imprecision deriving from the association of a concept belonging to language (the place name) with a precise concept (the spatial location). The situation is similar for named time periods, e.g., the early Bronze Age, which are in current use in archaeology: they depend on the location to which they refer, as the same period may have different time-spans in different locations. The present paper avails itself of a recent extension of the CIDOC CRM called CRMgeo, which embeds events in a spatio-temporal four-dimensional framework. The paper uses concepts from CRMgeo and introduces extensions to model gazetteers and period thesauri. This approach enables dealing with time-varying location appellations as well as with space-varying period appellations on a robust basis. For this purpose a refinement/extension of CRMgeo is proposed and a discretization of space and time is used to approximate the real space–time extents occupied by events. Such an approach solves the problem and suggests further investigations in various directions.
      PubDate: 2015-07-21
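The discretization idea mentioned above can be sketched directly: approximate a space-time extent as a set of (cell, time-bin) identifiers, so that questions like "could these two events overlap?" reduce to set intersection. Cell and bin sizes below are arbitrary illustrative choices, not values from the paper.

```python
# Approximate a space-time extent by discretizing space into grid cells
# and time into bins. Two extents may overlap iff their discretized
# cell sets intersect. All sizes and coordinates are illustrative.

def discretize(points, cell_size, bin_size):
    """Map (x, y, t) samples to a set of (cell_x, cell_y, time_bin) ids."""
    return {(int(x // cell_size), int(y // cell_size), int(t // bin_size))
            for (x, y, t) in points}

def may_overlap(extent_a, extent_b):
    """Conservative overlap test on discretized extents."""
    return bool(extent_a & extent_b)

a = discretize([(10.0, 20.0, 100.0)], cell_size=5, bin_size=50)
b = discretize([(12.0, 22.0, 120.0)], cell_size=5, bin_size=50)
```

Coarser cells make the test cheaper but more conservative (more false "may overlap" answers); the right granularity depends on the precision of the underlying gazetteer and period data.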
  • When should I make preservation copies of myself?
    • Abstract: We investigate how different replication policies, ranging from least aggressive to most aggressive, affect the level of preservation achieved by autonomic processes used by web objects (WOs). Based on simulations of small-world graphs of WOs created by the Unsupervised Small-World algorithm, we report quantitative and qualitative results for graphs ranging in order from 10 to 5000 WOs. Our results show that a moderately aggressive replication policy makes the best use of distributed host resources by not causing spikes in CPU resources or network activity while meeting preservation goals. We examine different approaches by which WOs can communicate with each other and determine how long it would take for a message from one WO to reach a specific WO, or all WOs.
      PubDate: 2015-06-21
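The trade-off the study explores can be illustrated with a toy model: an aggressive policy reaches its preservation target in fewer steps, but each step performs more copies at once (burstier CPU and network load), which is why a moderate policy can be the best compromise. The numbers below are illustrative, not the paper's experimental settings.

```python
# Toy replication model: a web object (WO) starts as a single copy and
# makes copies_per_step new copies each step until target_copies exist.
# Aggressiveness = copies_per_step. Illustrative only.

def steps_to_target(copies_per_step, target_copies):
    """Steps needed to reach target_copies; per-step load grows with
    copies_per_step, which is the burstiness an aggressive policy pays."""
    copies, steps = 1, 0
    while copies < target_copies:
        copies += copies_per_step
        steps += 1
    return steps

lazy = steps_to_target(copies_per_step=1, target_copies=10)        # 9 steps
aggressive = steps_to_target(copies_per_step=5, target_copies=10)  # 2 steps
```

A moderate policy sits between the two: it meets the target well before the lazy policy does, without the per-step resource spikes of the aggressive one.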
  • Bridging the gap between real world repositories and scalable preservation
           environments
    • Abstract: Integrating large-scale processing environments, such as Hadoop, with traditional repository systems, such as Fedora Commons 3, has long proved to be a daunting task. In this paper, we show how this integration can be achieved using software developed in the scalable preservation environments (SCAPE) project, and also how it can be achieved using a more direct local implementation at the Danish State and University Library inspired by the SCAPE project. Both allow full use of the Hadoop system for massively distributed processing without causing excessive load on the repository. We present a proof-of-concept SCAPE integration and an in-production local integration based on repository systems at the Danish State and University Library and the Hadoop execution environment. Both use data from the Newspaper Digitisation Project, a collection that will grow to more than 32 million JP2 images. The use case for the SCAPE integration is to perform feature extraction and validation of the JP2 images. The validation is done against an institutional preservation policy expressed in the machine-readable SCAPE Control Policy vocabulary. The feature extraction is done using the Jpylyzer tool. We perform an experiment with various-sized sets of JP2 images to test the scalability and correctness of the solution. The first use case considered for the local Danish State and University Library integration is also feature extraction and validation of the JP2 images, this time using Jpylyzer and Schematron requirements translated from the project specification by hand. We further look at two other use cases: generation of histograms of the tonal distributions of the images, and generation of dissemination copies. We discuss the challenges and benefits of the two integration approaches when having to perform preservation actions on massive collections stored in traditional digital repositories.
      PubDate: 2015-05-29
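The validation step described above amounts to checking extracted image features against a machine-readable policy. The sketch below shows that shape of computation; the property names, thresholds, and feature dictionary are invented for illustration and are not the SCAPE Control Policy vocabulary or Jpylyzer's actual output fields.

```python
# Check extracted per-image features against a preservation policy
# expressed as named rules. Property names and thresholds are invented
# stand-ins for a real policy vocabulary; in the distributed setting,
# this check would run per record inside a Hadoop map task.

POLICY = {
    "format": lambda v: v == "JP2",
    "is_valid": lambda v: v is True,
    "compression_ratio": lambda v: v <= 8.0,
}

def check(features):
    """Return the names of policy rules the record violates."""
    return [name for name, rule in POLICY.items()
            if name not in features or not rule(features[name])]

ok = check({"format": "JP2", "is_valid": True, "compression_ratio": 4.2})
bad = check({"format": "TIFF", "is_valid": True, "compression_ratio": 12.0})
```

Keeping the policy as data (rather than hard-coded logic) is what lets an institution update its requirements without touching the processing pipeline.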
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK

JournalTOCs © 2009-2015