International Journal on Digital Libraries
  [SJR: 0.203]   [H-I: 24]   [538 followers]
   Hybrid journal (may contain Open Access articles)
   ISSN (Print) 1432-5012 - ISSN (Online) 1432-1300
   Published by Springer-Verlag  [2335 journals]
  • Using a file history graph to keep track of personal resources across
           devices and services
    • Abstract: Personal digital resources now tend to be stored, managed and shared using a variety of devices and online services. As a result, different versions of resources are often stored in different places, and it has become increasingly difficult for users to keep track of them. We introduce the concept of a file history graph that can be used to provide users with a global view of resource provenance and enable them to track specific versions across devices and services. We describe how this has been used to realise a version-aware environment, called Memsy, and report on a lab study used to evaluate the proposed workflow. We also describe how reconciliation services can be used to fill in missing links in the file history graph and present a detailed study for the case of images as a proof of concept.
      PubDate: 2016-07-07
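The file history graph described above can be pictured as a small provenance structure: versions are nodes, derivation links are edges, and divergent "heads" are the current copies a user would want to reconcile across devices and services. The sketch below is a minimal illustration of that idea; the class and field names are our own invention, not Memsy's API.

```python
from collections import defaultdict

class FileHistoryGraph:
    """Minimal file history graph: each version records where it was seen
    (device/service and path); edges link a version to versions derived
    from it. Names here are hypothetical, not the Memsy implementation."""
    def __init__(self):
        self.location = {}                   # version -> (device, path)
        self.successors = defaultdict(list)  # version -> derived versions

    def add_version(self, version, device, path, parent=None):
        self.location[version] = (device, path)
        if parent is not None:
            self.successors[parent].append(version)

    def heads(self):
        """Versions with no derived version: the divergent current copies
        a reconciliation service would try to link or merge."""
        return [v for v in self.location if not self.successors[v]]

g = FileHistoryGraph()
g.add_version("v1", "laptop", "/home/report.doc")
g.add_version("v2", "dropbox", "/report.doc", parent="v1")
g.add_version("v3", "phone", "/docs/report.doc", parent="v1")
print(sorted(g.heads()))  # -> ['v2', 'v3']
```

Tracking a specific version then reduces to walking the graph from its node, and a "global view" is simply the union of graphs gathered from all devices and services.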
  • A semantic architecture for preserving and interpreting the information
           contained in Irish historical vital records
    • Abstract: Irish Record Linkage 1864–1913 is a multi-disciplinary project that started in 2014 aiming to create a platform for analyzing events captured in historical birth, marriage, and death records by applying semantic technologies for annotating, storing, and inferring information from the data contained in those records. This enables researchers to, among other things, investigate to what extent maternal and infant mortality rates were underreported. We report on the semantic architecture, provide motivation for the adoption of RDF and Linked Data principles, and elaborate on the ontology construction process, which was influenced by the requirements of both the digital archivists and the historians. Concerns of digital archivists include the preservation of the archival record and following best practices in preservation, cataloguing, and data protection. The historians in this project wish to discover certain patterns in those vital records. An important aspect of the semantic architecture is the clear separation of concerns that reflects those distinct requirements—the transcription and archival authenticity of the register pages and the interpretation of the transcribed data—which led to the creation of two distinct ontologies and knowledge bases. The advantage of this clear separation is that the transcription of register pages resulted in a reusable data set fit for other research purposes. These transcriptions were enriched with metadata according to best practices in archiving for ingestion into suitable long-term digital preservation platforms.
      PubDate: 2016-07-01
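The separation of concerns described above (one knowledge base for what a register page literally says, another for what historians infer from it) can be sketched as two independent triple sets, with the interpretation layer always traceable back to the transcription. All predicate and entity names below are illustrative, not taken from the project's ontologies.

```python
# Two deliberately separate knowledge bases, mirroring the split between
# archival transcription and historical interpretation. All identifiers
# are hypothetical.
transcription_kb = [
    ("register:death/1882/p3/e17", "says:ageAtDeath", "2 mo."),
    ("register:death/1882/p3/e17", "says:cause", "debility"),
]
interpretation_kb = [
    ("person:infant_42", "infers:ageAtDeathDays", 60),
    ("person:infant_42", "infers:derivedFrom", "register:death/1882/p3/e17"),
]

def objects(kb, subject, predicate):
    """Return all objects of (subject, predicate, ?) triples in a KB."""
    return [o for s, p, o in kb if s == subject and p == predicate]

# An inferred statement can be traced back to the exact register entry:
source = objects(interpretation_kb, "person:infant_42", "infers:derivedFrom")[0]
print(objects(transcription_kb, source, "says:ageAtDeath"))  # -> ['2 mo.']
```

Because the transcription set never mixes in interpretations, it remains a reusable data set in its own right, which is exactly the benefit the authors claim for the two-ontology design.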
  • Evaluating unsupervised thesaurus-based labeling of audiovisual content in
           an archive production environment
    • Abstract: In this paper we report on a two-stage evaluation of unsupervised labeling of audiovisual content using collateral text data sources, investigating how such an approach can provide acceptable results for given requirements with respect to archival quality, authority and service levels to external users. We conclude that with parameter settings that are optimized using a rigorous evaluation of precision and accuracy, the quality of automatic term suggestion is sufficiently high. We furthermore provide an analysis of the term extraction after being taken into production, where we focus on performance variation with respect to term types and television programs. Having implemented the procedure in our production workflow allows us to gradually develop the system further and also to assess the effect of the transformation from manual to automatic annotation from an end-user perspective. Additional future work will be on deploying different information sources, including annotations based on multimodal video analysis such as speaker recognition and computer vision.
      PubDate: 2016-06-23
  • Key components of data publishing: using current best practices to develop
           a reference model for data publishing
    • Abstract: The availability of workflows for data publishing could have an enormous impact on researchers, research practices and publishing paradigms, as well as on funding strategies and career and research evaluations. We present the generic components of such workflows to provide a reference model for these stakeholders. The RDA-WDS Data Publishing Workflows group set out to study the current data-publishing workflow landscape across disciplines and institutions. A diverse set of workflows was examined to identify common components and standard practices, including basic self-publishing services, institutional data repositories, long-term projects, curated data repositories, and joint data journal and repository arrangements. The results of this examination have been used to derive a data-publishing reference model comprising generic components. From an assessment of the current data-publishing landscape, we highlight important gaps and challenges to consider, especially when dealing with more complex workflows and their integration into wider community frameworks. It is clear that the data-publishing landscape is varied and dynamic and that there are important gaps and challenges. The different components of a data-publishing system need to work, to the greatest extent possible, in a seamless and integrated way to support the evolution of commonly understood and utilized standards and—eventually—to increase reproducibility. We therefore advocate the implementation of existing standards for repositories and all parts of the data-publishing process, and the development of new standards where necessary. Effective and trustworthy data publishing should be embedded in documented workflows. As more research communities seek to publish the data associated with their research, they can build on one or more of the components identified in this reference model.
      PubDate: 2016-06-20
  • Implementation of a workflow for publishing citeable environmental data:
           successes, challenges and opportunities from a data centre perspective
    • Abstract: In recent years, the development and implementation of a robust way to cite data have encouraged many previously sceptical environmental researchers to publish the data they create, thus ensuring that more data than ever are now open and available for re-use within and between research communities. Here, we describe a workflow for publishing citeable data in the context of the environmental sciences—an area spanning many domains and generating a vast array of heterogeneous data products. The processes and tools we have developed have enabled rapid publication of quality data products including datasets, models and model outputs which can be accessed, re-used and subsequently cited. However, there are still many challenges that need to be addressed before researchers in the environmental sciences fully accept the notion that datasets are valued outputs and time should be spent in properly describing, storing and citing them. Here, we identify current challenges such as citation of dynamic datasets and issues of recording and presenting citation metrics. In conclusion, whilst data centres may have the infrastructure, tools, resources and processes available to publish citeable datasets, further work is required before large-scale uptake of the services offered is achieved. We believe that once current challenges are met, data resources will be viewed similarly to journal publications as valued outputs in a researcher’s portfolio, and therefore both the quality and quantity of data published will increase.
      PubDate: 2016-06-18
  • Automating data sharing through authoring tools
    • Abstract: In the current scientific publishing landscape, there is a need for an authoring workflow that easily integrates data and code into manuscripts and that enables the data and code to be published in reusable form. Automated embedding of data and code into published output will enable superior communication and data archiving. In this work, we demonstrate a proof of concept for a workflow, org-mode, which successfully provides this authoring capability and workflow integration. We illustrate this concept in a series of examples for potential uses of this workflow. First, we use data on citation counts to compute the h-index of an author, and show two code examples for calculating the h-index. The source for each example is automatically embedded in the PDF during the export of the document. We demonstrate how data can be embedded in image files, which themselves are embedded in the document. Finally, metadata about the embedded files can be automatically included in the exported PDF, and accessed by computer programs. In our customized export, we embedded metadata about the attached files in the PDF in an Info field. A computer program could parse this output to get a list of embedded files and carry out analyses on them. Authoring tools such as Emacs + org-mode can greatly facilitate the integration of data and code into technical writing. These tools can also automate the embedding of data into document formats intended for consumption.
      PubDate: 2016-06-11
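The paper's first worked example, computing an author's h-index from citation counts, is compact enough to show here. This is a generic sketch of the calculation, not the authors' embedded source code.

```python
def h_index(citations):
    """h-index: the largest h such that at least h papers
    have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # this paper still supports a larger h
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # -> 4
```

In the org-mode workflow described above, a block like this would live inside the manuscript itself, with its output and source automatically carried into the exported PDF.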
  • X3ML mapping framework for information integration in cultural heritage
           and beyond
    • Abstract: The aggregation of heterogeneous data from different institutions in cultural heritage and e-science has the potential to create rich data resources useful for a range of different purposes, from research to education and public interests. In this paper, we present the X3ML framework, a framework for information integration that effectively and efficiently handles the steps involved in schema mapping, uniform resource identifier (URI) definition and generation, data transformation, provision and aggregation. The framework is based on the X3ML mapping definition language for describing both schema mappings and URI generation policies and offers several advantages compared with other relevant frameworks. We describe the architecture of the framework as well as details of the various available components. Usability aspects are discussed and performance metrics are demonstrated. The impact of our work is evidenced by the increasing number of international projects that adopt and use this framework.
      PubDate: 2016-06-06
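The combination the abstract describes, a declarative schema mapping plus a URI generation policy applied to source records, can be sketched as data-driven Python. The mapping entries, URI templates, and prefixes below are invented for illustration and do not follow the X3ML language's actual syntax.

```python
def make_uri(template, record):
    """URI generation policy: fill a template from record fields.
    Template shapes here are hypothetical examples."""
    return template.format(**record)

# Declarative mapping: (source field, target property, optional URI template)
MAPPING = [
    ("title", "dc:title", None),
    ("creator_id", "dc:creator", "http://example.org/person/{creator_id}"),
]

def transform(record, subject_template):
    """Apply the mapping to one source record, yielding target triples."""
    subject = make_uri(subject_template, record)
    triples = []
    for field, prop, template in MAPPING:
        obj = make_uri(template, record) if template else record[field]
        triples.append((subject, prop, obj))
    return triples

rec = {"id": "obj1", "title": "Amphora", "creator_id": "p7"}
triples = transform(rec, "http://example.org/object/{id}")
print(triples)
```

The point of keeping the mapping declarative, as X3ML does, is that domain experts can author and review it without touching the transformation engine.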
  • A quantitative approach to evaluate Website Archivability using the CLEAR+ method
    • Abstract: Website Archivability (WA) is a notion established to capture the core aspects of a website, crucial in diagnosing whether it has the potential to be archived with completeness and accuracy. In this work, aiming at measuring WA, we introduce and elaborate on all aspects of CLEAR+, an extended version of the Credible Live Evaluation Method for Archive Readiness (CLEAR) method. We use a systematic approach to evaluate WA from multiple different perspectives, which we call Website Archivability Facets. We then analyse a web application we created as the reference implementation of CLEAR+ and discuss the implementation of the evaluation workflow. Finally, we conduct thorough evaluations of all aspects of WA to support the validity, the reliability and the benefits of our method using real-world web data.
      PubDate: 2016-06-01
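CLEAR+ combines per-facet measurements into a single Website Archivability score; a weighted average over facet scores is one plausible reading of that aggregation. The facet names, scores, and uniform weights below are hypothetical placeholders, not the CLEAR+ definitions.

```python
def website_archivability(facet_scores, weights):
    """Weighted mean of per-facet scores in [0, 1].
    A sketch of facet aggregation, not the CLEAR+ formula itself."""
    total_weight = sum(weights.values())
    return sum(facet_scores[f] * w for f, w in weights.items()) / total_weight

# Hypothetical facet measurements for one website:
facets = {"accessibility": 0.9, "standards_compliance": 0.7,
          "cohesion": 1.0, "metadata_usage": 0.4}
weights = {name: 1.0 for name in facets}  # equal weighting, for illustration

print(round(website_archivability(facets, weights), 2))  # -> 0.75
```

Reporting per-facet scores alongside the aggregate is what makes such a method diagnostic: a low metadata score points a webmaster at a concrete fix before the site is crawled.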
  • Scalable continual quality control of formative assessment items in an
           educational digital library: an empirical study
    • Abstract: An essential component of any library of online learning objects is assessment items, for example, homework, quizzes, and self-study questions. As opposed to exams, these items are formative in nature, as they help the learner to assess his or her own progress through the material. When it comes to quality control of these items, their formative nature poses additional challenges: e.g., there is no particular time interval in which learners interact with these items, learners come to these items with very different levels of preparation and seriousness, guessing generates noise in the data, and the numbers of items and learners can be several orders of magnitude larger than in summative settings. This empirical study aims to find a highly scalable mechanism for continual quality control of this class of digital content with a minimal amount of additional metadata and transactional data, while also taking into account characteristics of the learners. In a subsequent evaluation of the model on a limited set of transactions, we find that taking into account the learner characteristic of ability improves the quality of item metadata, and in a comparison to Item Response Theory (IRT), we find that the developed model in fact performs slightly better in terms of predicting the outcome of formative assessment transactions, while never matching the performance of IRT on predicting the outcome of summative assessment.
      PubDate: 2016-06-01
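The IRT baseline such studies compare against is commonly the one-parameter logistic (Rasch) model, in which the probability of a correct response depends only on the difference between learner ability and item difficulty. The sketch below shows that standard model; whether the paper used the 1PL variant specifically is an assumption here.

```python
import math

def rasch_p(ability, difficulty):
    """1PL (Rasch) model: probability that a learner with the given
    ability answers an item of the given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

print(round(rasch_p(0.0, 0.0), 2))  # -> 0.5
```

With millions of formative transactions, fitting per-learner ability and per-item difficulty parameters is exactly where the scalability concern raised in the abstract comes from, which motivates the lighter-weight model the authors develop.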
  • A locality-aware similar information searching scheme
    • Abstract: In a database, a similar information search means finding data records which contain the majority of search keywords. Due to the rapid accumulation of information nowadays, the size of databases has increased dramatically. An efficient information searching scheme can speed up information searching and retrieve all relevant records. This paper proposes a Hilbert curve-based similarity searching scheme (HCS). HCS considers a database to be a multidimensional space and each data record to be a point in the multidimensional space. By using a Hilbert space filling curve, each point is projected from a high-dimensional space to a low-dimensional space, so that the points close to each other in the high-dimensional space are gathered together in the low-dimensional space. Because the database is divided into many clusters of close points, a query is mapped to a certain cluster instead of searching the entire database. Experimental results show that HCS dramatically reduces search latency and exhibits high effectiveness in retrieving similar information.
      PubDate: 2016-06-01
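The core of HCS, projecting points onto a Hilbert curve so that nearby points receive nearby curve indices, can be illustrated in two dimensions with the standard point-to-distance algorithm. HCS itself works on higher-dimensional record spaces, so this is only a sketch of the principle.

```python
def xy2d(n, x, y):
    """Distance along the Hilbert curve of point (x, y) in an n x n grid,
    where n is a power of two. Standard iterative algorithm."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the pattern recurses
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Sorting records by their curve index gathers spatially close points
# into contiguous runs, so a query scans one cluster, not the whole table.
points = [(0, 0), (7, 7), (0, 1), (6, 7)]
print(sorted(points, key=lambda p: xy2d(8, *p)))  # -> [(0, 0), (0, 1), (6, 7), (7, 7)]
```

Note the ordering: the two points near the origin and the two points near (7, 7) end up adjacent in curve order, which is the locality property the scheme exploits.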
  • The impact of JavaScript on archivability
    • Abstract: As web technologies evolve, web archivists work to adapt so that digital history is preserved. Recent advances in web technologies have introduced client-side executed scripts (Ajax) that, for example, load data without a change in the top-level Uniform Resource Identifier (URI) or require user interaction (e.g., content loading via Ajax when the page has scrolled). These advances have made automating methods for capturing web pages more difficult. In an effort to understand why mementos (archived versions of live resources) in today’s archives vary in completeness and sometimes pull content from the live web, we present a study of web resources and archival tools. We used a collection of URIs shared over Twitter and a collection of URIs curated by Archive-It in our investigation. We created local archived versions of the URIs from the Twitter and Archive-It sets using WebCite, wget, and the Heritrix crawler. We found that only 4.2 % of the Twitter collection is perfectly archived by all of these tools, while 34.2 % of the Archive-It collection is perfectly archived. After studying the quality of these mementos, we identified the practice of loading resources via JavaScript (Ajax) as the source of archival difficulty. Further, we show that resources are increasing their use of JavaScript to load embedded resources. By 2012, over half (54.5 %) of pages use JavaScript to load embedded resources. The number of embedded resources loaded via JavaScript has increased by 12.0 % from 2005 to 2012. We also show that JavaScript is responsible for 33.2 % more missing resources in 2012 than in 2005. This shows that JavaScript is responsible for an increasing proportion of the embedded resources unsuccessfully loaded by mementos. JavaScript is also responsible for 52.7 % of all missing embedded resources in our study.
      PubDate: 2016-06-01
  • Semantic representation and enrichment of information retrieval
           experimental data
    • Abstract: Experimental evaluation carried out in international large-scale campaigns is a fundamental pillar of the scientific and technological advancement of information retrieval (IR) systems. Such evaluation activities produce a large quantity of scientific and experimental data, which are the foundation for all the subsequent scientific production and development of new systems. In this work, we discuss how to semantically annotate and interlink this data, with the goal of enhancing their interpretation, sharing, and reuse. We discuss the underlying evaluation workflow and propose a Resource Description Framework (RDF) model for those workflow parts. We use expertise retrieval as a case study to demonstrate the benefits of our semantic representation approach. We employ this model as a means for exposing experimental data as linked open data (LOD) on the Web and as a basis for enriching and automatically connecting this data with expertise topics and expert profiles. In this context, a topic-centric approach for expert search is proposed, addressing the extraction of expertise topics, their semantic grounding with the LOD cloud, and their connection to IR experimental data. Several methods for expert profiling and expert finding are analysed and evaluated. Our results show that it is possible to construct expert profiles starting from automatically extracted expertise topics and that topic-centric approaches outperform state-of-the-art language modelling approaches for expert finding.
      PubDate: 2016-05-28
  • Meeting the challenge of environmental data publication: an operational
           infrastructure and workflow for publishing data
    • Abstract: Here we describe the defined workflow and its supporting infrastructure, which are used by the Natural Environment Research Council’s (NERC) Environmental Information Data Centre (EIDC) to enable publication of environmental data in the fields of ecology and hydrology. The methods employed and issues discussed are also relevant to publication in other domains. By utilising a clearly defined workflow for data publication, we operate a fully auditable, quality controlled series of steps permitting publication of environmental data. The described methodology meets the needs of both data producers and data users, whose requirements are not always aligned. A stable, logically created infrastructure supporting data publication allows the process to occur in a well-managed and secure fashion, while remaining flexible enough to deal with a range of data types and user requirements. We discuss the primary issues arising from data publication, and describe how many of them have been resolved by the methods we have employed, with demonstrable results. In conclusion, we expand on future directions we wish to develop to aid data publication by both solving problems for data generators and improving the end-user experience.
      PubDate: 2016-05-27
  • Experiences in integrated data and research object publishing using GigaDB
    • Abstract: In the era of computation and data-driven research, traditional methods of disseminating research are no longer fit for purpose. New approaches for disseminating data, methods and results are required to maximize knowledge discovery. The “long tail” of small, unstructured datasets is well catered for by a number of general-purpose repositories, but there has been less support for “big data”. Outlined here are our experiences in attempting to tackle the gaps in publishing large-scale, computationally intensive research. GigaScience is an open-access, open-data journal aiming to revolutionize large-scale biological data dissemination, organization and re-use. Through use of the data handling infrastructure of the genomics centre BGI, GigaScience links standard manuscript publication with an integrated database (GigaDB) that hosts all associated data, and provides additional data analysis tools and computing resources. Furthermore, the supporting workflows and methods are also integrated to make published articles more transparent and open. GigaDB has released many new and previously unpublished datasets and data types, including urgently needed data to tackle infectious disease outbreaks, cancer and the growing food crisis. Other “executable” research objects, such as workflows, virtual machines and software from several GigaScience articles have been archived and shared in reproducible, transparent and usable formats. With data citation producing evidence of, and credit for, its use in the wider research community, GigaScience demonstrates a move towards more executable publications. Here data analyses can be reproduced and built upon by users without coding backgrounds or heavy computational infrastructure in a more democratized manner.
      PubDate: 2016-05-27
  • Advancing research data publishing practices for the social sciences: from
           archive activity to empowering researchers
    • Abstract: Sharing and publishing social science research data have a long history in the UK, through long-standing agreements with government agencies for sharing survey data and the data policy, infrastructure, and data services supported by the Economic and Social Research Council. The UK Data Service and its predecessors developed data management, documentation, and publishing procedures and protocols that stand today as robust templates for data publishing. As the ESRC research data policy requires grant holders to submit their research data to the UK Data Service after a grant ends, setting standards and promoting them has been essential in raising the quality of the resulting research data being published. In the past, received data were all processed, documented, and published for reuse in-house. Recent investments have focused on guiding and training researchers in good data management practices and skills for creating shareable data, as well as a self-publishing repository system, ReShare. ReShare also receives data sets described in published data papers and achieves scientific quality assurance through peer review of submitted data sets before publication. Social science data are reused for research, to inform policy, in teaching and for methods learning. Over a 10-year period, responsive developments in system workflows, access control options, persistent identifiers, templates, and checks, together with targeted guidance for researchers, have helped raise the standard of self-publishing social science data. Lessons learned and developments in shifting the publishing of social science data from an archivist responsibility to a researcher-led process are showcased, as inspiration for institutions setting up a data repository.
      PubDate: 2016-05-25
  • The evolution of web archiving
    • Abstract: Web archives preserve information published on the web or digitized from printed publications. Much of this information is unique and historically valuable. However, the lack of knowledge about the global status of web archiving initiatives hampers their improvement and collaboration. To overcome this problem, we conducted two surveys, in 2010 and 2014, which provide a comprehensive characterization of web archiving initiatives and their evolution. We identified several patterns and trends that highlight challenges and opportunities. We discuss these patterns and trends, which make it possible to define strategies, estimate resources and provide guidelines for research and development of better technology. Our results show that in recent years there has been significant growth in initiatives and countries hosting these initiatives, volume of data and number of contents preserved. While this indicates that the web archiving community is dedicating growing effort to preserving digital information, other results presented throughout the paper raise concerns, such as the small amount of archived data in comparison with the amount of data that is being published online.
      PubDate: 2016-05-09
  • Scholarly Ontology: modelling scholarly practices
    • Abstract: In this paper we present the Scholarly Ontology (SO), an ontology for modelling scholarly practices, inspired by business process modelling and Cultural-Historical Activity Theory. The SO is based on empirical research and earlier models and is designed so as to incorporate related works through a modular structure. The SO is an elaboration of the domain-independent core part of the NeDiMAH Methods Ontology addressing the scholarly ecosystem of Digital Humanities. It thus provides a basis for developing domain-specific scholarly work ontologies springing from a common root. We define the basic concepts of the model and their semantic relations through four complementary perspectives on scholarly work: activity, procedure, resource and agency. As a use case we present a modelling example and illustrate the intended use of the model through indicative SPARQL and SQWRL queries that highlight the benefits of its serialization in RDFS. The SO includes an explicit treatment of intentionality and its interplay with functionality, captured by different parts of the model. We discuss the role of types as the semantic bridge between those two parts and explore several patterns that can be exploited in designing reusable access structures and conformance rules. Related taxonomies and ontologies and their possible reuse within the framework of SO are reviewed.
      PubDate: 2016-05-04
  • Supporting academic search tasks through citation visualization and
    • Abstract: Despite ongoing advances in information retrieval algorithms, people continue to experience difficulties when conducting online searches within digital libraries. Because their information-seeking goals are often complex, searchers may experience difficulty in precisely describing what they are seeking. Current search interfaces provide limited support for navigating and exploring among the search results and helping searchers to more accurately describe what they are looking for. In this paper, we present a novel visual library search interface, designed with the goal of providing interactive support for common library search tasks and behaviours. This system takes advantage of the rich metadata available in academic collections and employs information visualization techniques to support search results evaluation, forward and backward citation exploration, and interactive query refinement.
      PubDate: 2016-04-26
  • Recent applications of Knowledge Organization Systems: introduction to a
           special issue
    • Abstract: This special issue of the International Journal on Digital Libraries evolved from the 13th Networked Knowledge Organization Systems (NKOS) workshop held at the joint Digital Libraries conference 2014 in London. The focus of the workshop was ‘Mapping between Linked Data vocabularies of KOS’ and ‘Meaningful Concept Display and Meaningful Visualization of KOS’. The issue presents six papers addressing both conceptual aspects and technical implementation of NKOS. We dedicate this special issue to our long-term colleague and friend Johan De Smedt who died in June 2015 while we were editing the special issue.
      PubDate: 2015-09-04
  • Representing gazetteers and period thesauri in four-dimensional space–time
    • Abstract: Gazetteers, i.e., lists of place names, provide a global view of places of interest through the assignment of a point, or a region, to a place name. However, such identification of the location corresponding to a place name is often a difficult task. There is no one-to-one correspondence between the two sets, places and names, because of name variants, different names for the same place and homonymy; the location corresponding to a place name may vary in time, changing its extension or even its position; and, in general, there is the imprecision deriving from the association of a concept belonging to language (the place name) with a precise concept (the spatial location). Also for named time periods, e.g., early Bronze Age, which are of current use in archaeology, the situation is similar: they depend on the location to which they refer, as the same period may have different time-spans in different locations. The present paper builds on a recent extension of the CIDOC CRM called CRMgeo, which embeds events in a spatio-temporal 4-dimensional framework. The paper uses concepts from CRMgeo and introduces extensions to model gazetteers and period thesauri. This approach enables dealing with time-varying location appellations as well as with space-varying period appellations on a robust basis. For this purpose a refinement/extension of CRMgeo is proposed, and a discretization of space and time is used to approximate real space–time extents occupied by events. Such an approach addresses these problems and suggests further investigations in various directions.
      PubDate: 2015-07-21
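The time-varying place names described above amount to a lookup keyed on both name and date: the same appellation resolves to different footprints (or to nothing) depending on the year. The sketch below illustrates that idea with a flat structure; the entries and field names are hypothetical and far simpler than the CRMgeo event-based model.

```python
from dataclasses import dataclass

@dataclass
class Extent:
    start_year: int  # inclusive; negative = BCE
    end_year: int    # inclusive
    footprint: str   # placeholder for an actual geometry

# Hypothetical gazetteer: one appellation, several dated extents.
gazetteer = {
    "Byzantium": [Extent(-667, 329, "point A")],
    "Constantinople": [Extent(330, 1929, "polygon B")],
}

def locate(name, year):
    """Resolve a place name to its footprint at a given year, or None
    if the name was not in use (or not recorded) at that time."""
    for extent in gazetteer.get(name, []):
        if extent.start_year <= year <= extent.end_year:
            return extent.footprint
    return None

print(locate("Constantinople", 1000))  # -> polygon B
print(locate("Byzantium", 1000))       # -> None
```

In the CRMgeo approach, the dated extents would instead be derived from events situated in four-dimensional space–time, so the same machinery also answers the symmetric question of which time-span a named period covers at a given location.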
JournalTOCs © 2009-2016