for Journals by Title or ISSN
for Articles by Keywords
Followed Journals
Journal you Follow: 0
Sign Up to follow journals, search in your chosen journals and, optionally, receive Email Alerts when new issues of your Followed Jurnals are published.
Already have an account? Sign In to see the journals you follow.
Journal Cover   International Journal on Digital Libraries
  [SJR: 0.203]   [H-I: 24]   [673 followers]  Follow
   Hybrid Journal Hybrid journal (It can contain Open Access articles)
   ISSN (Print) 1432-1300 - ISSN (Online) 1432-5012
   Published by Springer-Verlag Homepage  [2302 journals]
  • A linked open data architecture for the historical archives of the Getulio
           Vargas Foundation
    • Abstract: This paper presents an architecture for historical archives maintenance based on Open Linked Data technologies and open source distributed development model and tools. The proposed architecture is being implemented for the archives of the Centro de Pesquisa e Documentação de História Contemporânea do Brasil (Center for Research and Documentation of Brazilian Contemporary History) of the Fundação Getulio Vargas (Getulio Vargas Foundation). We discuss the benefits of this initiative and suggest ways of implementing it, as well as describing the preliminary milestones already achieved. We also present some of the possibilities for extending the accessibility and usefulness of the data archives information using semantic web technologies, natural language processing, image analysis tools, and audio–textual alignment, both in progress and planned.
      PubDate: 2015-03-19
  • Introduction to the focussed issue on Semantic Digital Archives
    • PubDate: 2015-03-12
  • A quantitative approach to evaluate Website Archivability using the CLEAR+
    • Abstract: Website Archivability (WA) is a notion established to capture the core aspects of a website, crucial in diagnosing whether it has the potential to be archived with completeness and accuracy. In this work, aiming at measuring WA, we introduce and elaborate on all aspects of CLEAR+, an extended version of the Credible Live Evaluation Method for Archive Readiness (CLEAR) method. We use a systematic approach to evaluate WA from multiple different perspectives, which we call Website Archivability Facets. We then analyse, a web application we created as the reference implementation of CLEAR+, and discuss the implementation of the evaluation workflow. Finally, we conduct thorough evaluations of all aspects of WA to support the validity, the reliability and the benefits of our method using real-world web data.
      PubDate: 2015-03-12
  • Scalable continual quality control of formative assessment items in an
           educational digital library: an empirical study
    • Abstract: An essential component of any library of online learning objects is assessment items, for example, homework, quizzes, and self-study questions. As opposed to exams, these items are formative in nature, as they help the learner to assess his or her own progress through the material. When it comes to quality control of these items, their formative nature poses additional challenges. e.g., there is no particular time interval in which learners interact with these items, learners come to these items with very different levels of preparation and seriousness, guessing generates noise in the data, and the numbers of items and learners can be several orders of magnitude larger than in summative settings. This empirical study aims to find a highly scalable mechanism for continual quality control of this class of digital content with a minimalist amount of additional metadata and transactional data, while taking into account also characteristics of the learners. In a subsequent evaluation of the model on a limited set of transactions, we find that taking into account the learner characteristic of ability improves the quality of item metadata, and in a comparison to Item Response Theory (IRT), we find that the developed model in fact performs slightly better in terms of predicting the outcome of formative assessment transactions, while never matching the performance of IRT on predicting the outcome of summative assessment.
      PubDate: 2015-03-11
  • A generalized topic modeling approach for automatic document annotation
    • Abstract: Ecological and environmental sciences have become more advanced and complex, requiring observational and experimental data from multiple places, times, and thematic scales to verify their hypotheses. Over time, such data have not only increased in amount, but also in diversity and heterogeneity of the data sources that spread throughout the world. This heterogeneity poses a huge challenge for scientists who have to manually search for desired data. ONEMercury has recently been implemented as part of the DataONE project to alleviate such problems and to serve as a portal for accessing environmental and observational data across the globe. ONEMercury harvests metadata records from multiple archives and repositories, and makes them searchable. However, harvested metadata records sometimes are poorly annotated or lacking meaningful keywords, which could impede effective retrieval. We propose a methodology that learns the annotation from well-annotated collections of metadata records to automatically annotate poorly annotated ones. The problem is first transformed into the tag recommendation problem with a controlled tag library. Then, two variants of an algorithm for automatic tag recommendation are presented. The experiments on four datasets of environmental science metadata records show that our methods perform well and also shed light on the natures of different datasets. We also discuss relevant topics such as using topical coherence to fine-tune parameters and experiments on cross-archive annotation.
      PubDate: 2015-03-07
  • Using ontologies to capture the semantics of a (business) process for
           digital preservation
    • Abstract: IT-supported business processes and computationally intensive science (called e-science) have become increasingly ubiquitous in the last decades. Along with this trend comes the need to make at least the most important of these processes available for the long term, to allow later analysis of their execution, or even a re-execution. As such, the preservation of scientific experiments and their results enables others to reproduce and verify the results as well as build on the result of earlier work. All but the simplest processes require to be described by a multitude of information objects, as well as their interconnections and relations, to be successfully preserved. To enable a semantic description of these objects in a structured manner, we developed a formal meta-model that can be utilised in the digital preservation of a process. The meta-model describes classes of elements and their relations, in the form of ontologies, with a core ontology describing the generic concepts, and extension mechanisms to map supplementary ontologies describing more specific aspects. In this paper, we present the overall architecture and individual ontologies, and motivate their usefulness via the application to use cases from different domains.
      PubDate: 2015-03-04
  • The new knowledge infrastructure
    • PubDate: 2015-02-27
  • Introduction to the special issue on digital scholarship
    • PubDate: 2015-02-24
  • Visions and open challenges for a knowledge-based culturomics
    • Abstract: The concept of culturomics was born out of the availability of massive amounts of textual data and the interest to make sense of cultural and language phenomena over time. Thus far however, culturomics has only made use of, and shown the great potential of, statistical methods. In this paper, we present a vision for a knowledge-based culturomics that complements traditional culturomics. We discuss the possibilities and challenges of combining knowledge-based methods with statistical methods and address major challenges that arise due to the nature of the data; diversity of sources, changes in language over time as well as temporal dynamics of information in general. We address all layers needed for knowledge-based culturomics, from natural language processing and relations to summaries and opinions.
      PubDate: 2015-02-18
  • What lies beneath?: Knowledge infrastructures in the subseafloor
           biosphere and beyond
    • Abstract: We present preliminary findings from a three-year research project comprised of longitudinal qualitative case studies of data practices in four large, distributed, highly multidisciplinary scientific collaborations. This project follows a 2 \(\times \) 2 research design: two of the collaborations are big science while two are little science, two have completed data collection activities while two are ramping up data collection. This paper is centered on one of these collaborations, a project bringing together scientists to study subseafloor microbial life. This collaboration is little science, characterized by small teams, using small amounts of data, to address specific questions. Our case study employs participant observation in a laboratory, interviews ( \(n=49\) to date) with scientists in the collaboration, and document analysis. We present a data workflow that is typical for many of the scientists working in the observed laboratory. In particular, we show that, although this workflow results in datasets apparently similar in form, nevertheless a large degree of heterogeneity exists across scientists in this laboratory in terms of the methods they employ to produce these datasets—even between scientists working on adjacent benches. To date, most studies of data in little science focus on heterogeneity in terms of the types of data produced: this paper adds another dimension of heterogeneity to existing knowledge about data in little science. This additional dimension makes more complex the task of management and curation of data for subsequent reuse. Furthermore, the nature of the factors that contribute to heterogeneity of methods suggest that this dimension of heterogeneity is a persistent and unavoidable feature of little science.
      PubDate: 2015-02-15
  • A metadata model and mapping approach for facilitating access to
           heterogeneous cultural heritage assets
    • Abstract: In the last decade, Europe has put a tremendous effort into making cultural, educational and scientific resources publicly available. Based on national or thematic aggregators, initiatives like Europeana nowadays provide a plethora of cultural resources for people worldwide. Although such massive amounts of rich cultural heritage content are available, the potential of its use for educational and scientific purposes still remains largely untapped. Much valuable content is only available in the so-called long tail, i.e. in niche resources such as specifically themed cultural heritage collections, and are difficult to access from the mainstream hubs like major search engines, social networks or online encyclopaedias. The vision of the EEXCESS project is to push high-quality content from the long tail to platforms and devices which are used every day. The realisation of such use cases requires as a basis (and in addition to the functional components) a common metadata representation and tools for mapping between the data sources’ specific data models and this common representation. In this paper, we propose a data model for such a system that combines federated search results from different cultural heritage data sources. We then propose an approach for metadata mapping, with a focus on easy configurability of mappings, which—once properly configured—can then be executed on the fly by an automatic service. We demonstrate the approach using a real-world example.
      PubDate: 2015-01-29
  • The impact of JavaScript on archivability
    • Abstract: As web technologies evolve, web archivists work to adapt so that digital history is preserved. Recent advances in web technologies have introduced client-side executed scripts (Ajax) that, for example, load data without a change in top level Universal Resource Identifier (URI) or require user interaction (e.g., content loading via Ajax when the page has scrolled). These advances have made automating methods for capturing web pages more difficult. In an effort to understand why mementos (archived versions of live resources) in today’s archives vary in completeness and sometimes pull content from the live web, we present a study of web resources and archival tools. We used a collection of URIs shared over Twitter and a collection of URIs curated by Archive-It in our investigation. We created local archived versions of the URIs from the Twitter and Archive-It sets using WebCite, wget, and the Heritrix crawler. We found that only 4.2 % of the Twitter collection is perfectly archived by all of these tools, while 34.2 % of the Archive-It collection is perfectly archived. After studying the quality of these mementos, we identified the practice of loading resources via JavaScript (Ajax) as the source of archival difficulty. Further, we show that resources are increasing their use of JavaScript to load embedded resources. By 2012, over half (54.5 %) of pages use JavaScript to load embedded resources. The number of embedded resources loaded via JavaScript has increased by 12.0 % from 2005 to 2012. We also show that JavaScript is responsible for 33.2 % more missing resources in 2012 than in 2005. This shows that JavaScript is responsible for an increasing proportion of the embedded resources unsuccessfully loaded by mementos. JavaScript is also responsible for 52.7 % of all missing embedded resources in our study.
      PubDate: 2015-01-25
    • Abstract: In this article, we present PREMIS OWL. This is a semantic formalisation of the PREMIS 2.2 data dictionary of the Library of Congress. PREMIS 2.2 are metadata implementation guidelines for digitally archiving information for the long term. Nowadays, the need for digital preservation is growing. A lot of the digital information produced merely a decade ago is in danger of getting lost as technologies are changing and getting obsolete. This also threatens a lot of information from heritage institutions. PREMIS OWL is a semantic long-term preservation schema. Preservation metadata are actually a mixture of provenance information, technical information on the digital objects to be preserved and rights information. PREMIS OWL is an OWL schema that can be used as data model supporting digital archives. It can be used for dissemination of the preservation metadata as Linked Open Data on the Web and, at the same time, for supporting semantic web technologies in the preservation processes. The model incorporates 24 preservation vocabularies, published by the LOC as SKOS vocabularies. Via these vocabularies, PREMIS descriptions from different institutions become highly interoperable. The schema is approved and now managed by the Library of Congress. The PREMIS OWL schema is published at
      PubDate: 2015-01-11
  • Named entity evolution recognition on the Blogosphere
    • Abstract: Advancements in technology and culture lead to changes in our language. These changes create a gap between the language known by users and the language stored in digital archives. It affects user’s possibility to firstly find content and secondly interpret that content. In a previous work, we introduced our approach for named entity evolution recognition (NEER) in newspaper collections. Lately, increasing efforts in Web preservation have led to increased availability of Web archives covering longer time spans. However, language on the Web is more dynamic than in traditional media and many of the basic assumptions from the newspaper domain do not hold for Web data. In this paper we discuss the limitations of existing methodology for NEER. We approach these by adapting an existing NEER method to work on noisy data like the Web and the Blogosphere in particular. We develop novel filters that reduce the noise and make use of Semantic Web resources to obtain more information about terms. Our evaluation shows the potentials of the proposed approach.
      PubDate: 2014-12-23
  • VisInfo: a digital library system for time series research data based on
           exploratory search—a user-centered design approach
    • Abstract: To this day, data-driven science is a widely accepted concept in the digital library (DL) context (Hey et al. in The fourth paradigm: data-intensive scientific discovery. Microsoft Research, 2009). In the same way, domain knowledge from information visualization, visual analytics, and exploratory search has found its way into the DL workflow. This trend is expected to continue, considering future DL challenges such as content-based access to new document types, visual search, and exploration for information landscapes, or big data in general. To cope with these challenges, DL actors need to collaborate with external specialists from different domains to complement each other and succeed in given tasks such as making research data publicly available. Through these interdisciplinary approaches, the DL ecosystem may contribute to applications focused on data-driven science and digital scholarship. In this work, we present VisInfo (2014) , a web-based digital library system (DLS) with the goal to provide visual access to time series research data. Based on an exploratory search (ES) concept (White and Roth in Synth Lect Inf Concepts Retr Serv 1(1):1–98, 2009), VisInfo at first provides a content-based overview visualization of large amounts of time series research data. Further, the system enables the user to define visual queries by example or by sketch. Finally, VisInfo presents visual-interactive capability for the exploration of search results. The development process of VisInfo was based on the user-centered design principle. Experts from computer science, a scientific digital library, usability engineering, and scientists from the earth, and environmental sciences were involved in an interdisciplinary approach. We report on comprehensive user studies in the requirement analysis phase based on paper prototyping, user interviews, screen casts, and user questionnaires. Heuristic evaluations and two usability testing rounds were applied during the system implementation and the deployment phase and certify measurable improvements for our DLS. Based on the lessons learned in VisInfo, we suggest a generalized project workflow that may be applied in related, prospective approaches.
      PubDate: 2014-12-03
  • A pipeline for digital restoration of deteriorating photographic negatives
    • Abstract: Extending work presented at the second International Workshop on Historical Document Imaging and Processing, we demonstrate a digitization pipeline to capture and restore negatives in low-dynamic range file formats. The majority of early photographs were captured on acetate-based film. However, it has been determined that these negatives will deteriorate beyond repair even with proper conservation and no suitable restoration method is available without physically altering each negative. In this paper, we present an automatic method to remove various non-linear illumination distortions caused by deteriorating photographic support material. First, using a high-dynamic range structured-light scanning method, a 2D Gaussian model for light transmission is estimated for each pixel of the negative image. Estimated amplitude at each pixel provides an accurate model of light transmission, but also includes regions of lower transmission caused by damaged areas. Principal component analysis is then used to estimate the photometric error and effectively restore the original illumination information of the negative. A novel tone mapping approach is then used to produce the final restored image. Using both the shift in the Gaussian light stripes between pixels and their variations in standard deviation, a 3D surface estimate is calculated. Experiments of real historical negatives show promising results for widespread implementation in memory institutions.
      PubDate: 2014-11-08
  • Assisting digital interoperability and preservation through advanced
           dependency reasoning
    • Abstract: Digital material has to be preserved not only against loss or corruption, but also against changes in its ecosystem. A quite general view of the digital preservation problem is to approach it from a dependency management point of view. In this paper, we present a rule-based approach for dependency management which can model also converters and emulators. We show that this modeling approach enables the automatic reasoning needed for reducing the human effort required for checking (and monitoring) whether a task on a digital object is performable. We provide examples demonstrating how real-world converters and emulators can be modeled, and show how the preservation services can be implemented. Subsequently, we detail an implementation based on semantic web technologies, describe the prototype system Epimenides which demonstrates the feasibility of the approach, and finally report various promising evaluation results.
      PubDate: 2014-10-29
  • Evaluating a digital humanities research environment: the CULTURA approach
    • Abstract: Digital humanities initiatives play an important role in making cultural heritage collections accessible to the global community of researchers and general public for the first time. Further work is needed to provide useful and usable tools to support users in working with those digital contents in virtual environments. The CULTURA project has developed a corpus agnostic research environment integrating innovative services that guide, assist and empower a broad spectrum of users in their interaction with cultural artefacts. This article presents (1) the CULTURA system and services and the two collections that have been used for testing and deploying the digital humanities research environment, and (2) an evaluation methodology and formative evaluation study with apprentice researchers. An evaluation model was developed which has served as a common ground for systematic evaluations of the CULTURA environment with user communities around the two test bed collections. The evaluation method has proven to be suitable for accommodating different evaluation strategies and allows meaningful consolidation of evaluation results. The evaluation outcomes indicate a positive perception of CULTURA. A range of useful suggestions for future improvement has been collected and fed back into the development of the next release of the research environment.
      PubDate: 2014-09-16
  • A case study on propagating and updating provenance information using the
           CIDOC CRM
    • Abstract: Provenance information of digital objects maintained by digital libraries and archives is crucial for authenticity assessment, reproducibility and accountability. Such information is commonly stored on metadata placed in various Metadata Repositories (MRs) or Knowledge Bases (KBs). Nevertheless, in various settings it is prohibitive to store the provenance of each digital object due to the high storage space requirements that are needed for having complete provenance. In this paper, we introduce provenance-based inference rules as a means to complete the provenance information, to reduce the amount of provenance information that has to be stored, and to ease quality control (e.g., corrections). Roughly, we show how provenance information can be propagated by identifying a number of basic inference rules over a core conceptual model for representing provenance. The propagation of provenance concerns fundamental modelling concepts such as actors, activities, events, devices and information objects, and their associations. However, since a MR/KB is not static but changes over time due to several factors, the question that arises is how we can satisfy update requests while still supporting the aforementioned inference rules. Towards this end, we elaborate on the specification of the required add/delete operations, consider two different semantics for deletion of information, and provide the corresponding update algorithms. Finally, we report extensive comparative results for different repository policies regarding the derivation of new knowledge, in datasets containing up to one million RDF triples. The results allow us to understand the tradeoffs related to the use of inference rules on storage space and performance of queries and updates.
      PubDate: 2014-08-29
  • How to assess image quality within a workflow chain: an overview
    • Abstract: Image quality assessment (IQA) is a multi-dimensional research problem and an active and evolving research area. This paper aims to provide an overview of the state of the art of the IQA methods, putting in evidence their applicability and limitations in different application domains. We outline the relationship between the image workflow chain and the IQA approaches reviewing the literature on IQA methods, classifying and summarizing the available metrics. We present general guidelines for three workflow chains in which IQA policies are required. The three workflow chains refer to: high-quality image archives, biometric system and consumer collections of personal photos. Finally, we illustrate a real case study referring to a printing workflow chain, where we suggest and actually evaluate the performance of a set of specific IQA methods.
      PubDate: 2014-08-15
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Tel: +00 44 (0)131 4513762
Fax: +00 44 (0)131 4513327
About JournalTOCs
News (blog, publications)
JournalTOCs on Twitter   JournalTOCs on Facebook

JournalTOCs © 2009-2015