International Journal on Digital Libraries
  [SJR: 0.203]   [H-I: 24]   [533 followers]
   Hybrid journal (may contain Open Access articles)
   ISSN (Print) 1432-5012 - ISSN (Online) 1432-1300
   Published by Springer-Verlag
  • A quantitative approach to evaluate Website Archivability using the CLEAR+
           method
    • Authors: Vangelis Banos; Yannis Manolopoulos
      Pages: 119 - 141
      Abstract: Website Archivability (WA) is a notion established to capture the core aspects of a website that are crucial in diagnosing whether it has the potential to be archived with completeness and accuracy. In this work, aiming at measuring WA, we introduce and elaborate on all aspects of CLEAR+, an extended version of the Credible Live Evaluation Method for Archive Readiness (CLEAR) method. We use a systematic approach to evaluate WA from multiple perspectives, which we call Website Archivability Facets. We then analyse archiveready.com, a web application we created as the reference implementation of CLEAR+, and discuss the implementation of the evaluation workflow. Finally, we conduct thorough evaluations of all aspects of WA to support the validity, the reliability and the benefits of our method using real-world web data.
      PubDate: 2016-06-01
      DOI: 10.1007/s00799-015-0144-4
      Issue No: Vol. 17, No. 2 (2016)
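
      CLEAR+ scores a site against several Website Archivability Facets and combines the per-facet results into a single WA value. A minimal sketch of such a weighted combination follows; the facet names and equal weights are illustrative assumptions, not the paper's calibrated values.

        # Facet names and weights are illustrative only.
        FACET_WEIGHTS = {
            "accessibility": 0.25,
            "standards_compliance": 0.25,
            "cohesion": 0.25,
            "metadata_usage": 0.25,
        }

        def wa_score(facet_scores: dict) -> float:
            # Weighted average of per-facet scores in [0.0, 1.0].
            total = sum(w * facet_scores[f] for f, w in FACET_WEIGHTS.items())
            return total / sum(FACET_WEIGHTS.values())

        print(wa_score({"accessibility": 0.9, "standards_compliance": 0.7,
                        "cohesion": 0.8, "metadata_usage": 0.5}))  # 0.725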
       
  • Scalable continual quality control of formative assessment items in an
           educational digital library: an empirical study
    • Authors: Gerd Kortemeyer
      Pages: 143 - 155
      Abstract: An essential component of any library of online learning objects is assessment items, for example, homework, quizzes, and self-study questions. As opposed to exams, these items are formative in nature, as they help the learner to assess his or her own progress through the material. When it comes to quality control of these items, their formative nature poses additional challenges: there is no particular time interval in which learners interact with these items, learners come to these items with very different levels of preparation and seriousness, guessing generates noise in the data, and the numbers of items and learners can be several orders of magnitude larger than in summative settings. This empirical study aims to find a highly scalable mechanism for continual quality control of this class of digital content with a minimal amount of additional metadata and transactional data, while also taking into account characteristics of the learners. In a subsequent evaluation of the model on a limited set of transactions, we find that taking into account the learner characteristic of ability improves the quality of item metadata, and in a comparison to Item Response Theory (IRT), we find that the developed model in fact performs slightly better at predicting the outcome of formative assessment transactions, while never matching the performance of IRT at predicting the outcome of summative assessment.
      PubDate: 2016-06-01
      DOI: 10.1007/s00799-015-0145-3
      Issue No: Vol. 17, No. 2 (2016)
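
      For context, the Item Response Theory baseline mentioned above models the probability of a correct response as a function of learner ability and item difficulty. A minimal sketch of the simplest (Rasch, one-parameter) form follows; it is shown only as the standard reference point, not as the model developed in the paper.

        import math

        def rasch_p_correct(ability: float, difficulty: float) -> float:
            # Rasch (1PL) model: P(correct) grows with ability - difficulty.
            return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

        # A learner exactly matched to the item difficulty succeeds 50% of the time:
        print(rasch_p_correct(0.0, 0.0))   # 0.5
        print(rasch_p_correct(1.5, 0.0))   # ~0.82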
       
  • Harmonizing the CRMba and CRMarchaeo models
    • Authors: Paola Ronzino
      Abstract: This work presents initial thoughts towards the harmonization of the CRMba and CRMarchaeo models, two extensions of the CIDOC CRM: the former was developed to model the complexity of a built structure from the perspective of buildings archaeology, while the latter was developed to model the processes involved in the investigation of subsurface archaeological deposits. The paper describes the modelling principles of CRMba and CRMarchaeo, and identifies common concepts that will allow the two ontological models to be merged.
      PubDate: 2016-08-19
      DOI: 10.1007/s00799-016-0193-3
       
  • CRMgeo: A spatiotemporal extension of CIDOC-CRM
    • Authors: Gerald Hiebel; Martin Doerr; Øyvind Eide
      Abstract: CRMgeo is a formal ontology intended to be used as a global schema for integrating spatiotemporal properties of temporal entities and persistent items. Its primary purpose is to provide a schema consistent with the CIDOC CRM to integrate geoinformation using the conceptualizations, formal definitions, encoding standards and topological relations defined by the Open Geospatial Consortium in GeoSPARQL. To build the ontology, the same ontology engineering methodology was used as in the CIDOC CRM. CRMgeo first introduced the concept of Spacetime volume, which was subsequently included in the CIDOC CRM, and provides a differentiation between phenomenal and declarative Spacetime volume, Place and Time-Span. Phenomenal classes derive their identity from real-world phenomena like events or things; declarative classes derive their identity from human declarations like dates or coordinates. This differentiation is an essential conceptual background for linking the CIDOC CRM to the classes, topological relations and encodings provided by GeoSPARQL, thus allowing the spatiotemporal analysis offered by geoinformation systems to build on the semantic distinctions of the CIDOC CRM. CRMgeo introduces the classes and relations necessary to model the spatiotemporal properties of real-world phenomena and their topological and semantic relations to spatiotemporal information about these phenomena derived from historic sources, maps, observations or measurements. It is able to model the full chain of approximating and re-locating a phenomenal place, like the actual site of a shipwreck, by a declarative place, like a mark on a sea chart.
      PubDate: 2016-08-13
      DOI: 10.1007/s00799-016-0192-4
       
  • What’s news? Encounters with news in everyday life: a study of
           behaviours and attitudes
    • Authors: Sally Jo Cunningham; David M. Nichols; Annika Hinze; Judy Bowen
      Abstract: As the news landscape changes, for many users the nature of news itself is changing as well. Insights into the changing news behaviour of users can inform the design of access tools and news archives. We analysed a set of 35 autoethnographies of news encounters, created by students in New Zealand. These comprise rich descriptions of the news sources, modalities, topics of interest, and news ‘routines’ by which the students keep in touch with friends and maintain awareness of personal, local, national, and international events. We explore the implications of these insights into news behaviour for digital news systems.
      PubDate: 2016-08-10
      DOI: 10.1007/s00799-016-0187-1
       
  • Off-the-shelf CRM with Drupal: a case study of documenting decorated
           papers
    • Authors: Athanasios Velios; Aurelie Martin
      Abstract: We present a method of setting up a website using the Drupal CMS to publish CRM data. Our setup requires only basic technical expertise from researchers, who are then able to publish their records both in a human-accessible way through HTML and in a machine-friendly format through RDFa. We begin by examining previous work on Drupal and the CRM and identifying useful patterns. We present the Drupal modules that are required by our setup and explain why these are sustainable. We continue by giving guidelines for setting up Drupal to serve CRM data easily, and we describe a specific installation for our case study on decorated papers, alongside our CRM mapping. We finish by highlighting the benefits of our method (i.e. speed and user-friendliness) and we refer to a number of issues which require further work (i.e. automatic validation, UI improvements and the provision of SPARQL endpoints).
      PubDate: 2016-08-08
      DOI: 10.1007/s00799-016-0191-5
       
  • Editorial for the TPDL 2015 special issue
    • Authors: Sarantos Kapidakis; Cezary Mazurek; Marcin Werla
      PubDate: 2016-07-26
      DOI: 10.1007/s00799-016-0190-6
       
  • WW1LOD: an application of CIDOC-CRM to World War 1 linked data
    • Authors: Eetu Mäkelä; Juha Törnroos; Thea Lindquist; Eero Hyvönen
      Abstract: The CIDOC-CRM standard indicates that common events, actors, places and timeframes are important in linking together cultural material, and provides a framework for describing them. However, merely describing entities in this way in two datasets does not yet interlink them. To do that, the identities of instances still need either to be reconciled or to be based on a shared vocabulary. The WW1LOD dataset presented in this paper was created to facilitate both of these approaches for collections dealing with the First World War. For this purpose, the dataset includes events, places, agents, times, keywords, and themes related to the war, based on over ten different authoritative data sources from providers such as the Imperial War Museum. The content is harmonized into RDF and published as a Linked Open Data service. While generally based on CIDOC-CRM, some modelling choices deviate from it where our experience dictated. In the article, these deviations are discussed in the hope that they may serve as examples where CIDOC-CRM itself may warrant further examination. As a demonstration of use, the dataset and online service have been used to create a contextual reader application that is able to link together and pull in information related to WW1 from, e.g., 1914–1918 Online, Wikipedia, WW1 Discovery, Europeana and the Digital Public Library of America.
      PubDate: 2016-07-26
      DOI: 10.1007/s00799-016-0186-2
       
  • Scripta manent: a CIDOC CRM semiotic reading of ancient texts
    • Authors: Achille Felicetti; Francesca Murano
      Abstract: This paper identifies the most important concepts involved in the study of ancient texts and proposes the use of CIDOC CRM to encode them and to model the related process of scientific investigation, in order to foster integration with other cultural heritage research fields. After identifying the key concepts, assessing the available technologies and analysing the entities provided by CIDOC CRM and by its extensions, we introduce more specific classes to be used as the basis for creating a new extension, CRMtex, which is more responsive to the specific needs of the various disciplines involved (including papyrology, palaeography, codicology and epigraphy).
      PubDate: 2016-07-22
      DOI: 10.1007/s00799-016-0189-z
       
  • Process, concept or thing? Some initial considerations in the
           ontological modelling of architecture
    • Authors: Anais Guillem; George Bruseker; Paola Ronzino
      Abstract: Architectural knowledge, representing an understanding of our built environment and how it functions, is a domain of research of high interest to laypeople as much as to architects themselves, researchers in cultural heritage in general and formal ontologists. In this work, we aim to provide an initial approach to the question of how to model architectural data in a formal ontology structure and consider some of the problems involved. This question is challenging for the inherent difficulties of the discourse to be modelled, for the lack of available structured data sources that would distinctly represent the architectural perspective proper, and for the contentious nature of the definition of architecture itself. We therefore explore in broad strokes the possible approaches to architecture, tracing the notion of architecture as idea, process or thing from the literature. On the basis of this enquiry, we propose a model of some top-level referents of architecture using FRBRoo, an extension of CIDOC CRM that can be used to model creative processes. We argue that with the addition of only four classes to this model, to capture certain architecturally specific concepts and activities, we are able to provide an adequate high-level first approach to this problem. Further, by connecting this work to the existing extension CRMba, which models built work as a system of relations of filled and unfilled spaces, there is a sufficient high-level ontological structure to begin to test its utility for exploring the relation between architecture as idea, process and thing.
      PubDate: 2016-07-22
      DOI: 10.1007/s00799-016-0188-0
       
  • Characteristics of social media stories
    • Authors: Yasmin AlNoamany; Michele C. Weigle; Michael L. Nelson
      Abstract: An emerging trend in social media is for users to create and publish “stories”, or curated lists of Web resources, with the purpose of creating a particular narrative of interest to the user. While some stories on the Web are automatically generated, such as Facebook’s “Year in Review”, one of the most popular storytelling services is “Storify”, which provides users with curation tools to select, arrange, and annotate stories with content from social media and the Web at large. We would like to use tools such as Storify to present (semi-)automatically created summaries of archival collections. To support automatic story creation, we need to better understand, as a baseline, the structural characteristics of popular (i.e., receiving the most views) human-generated stories. We investigated 14,568 stories from Storify, comprising 1,251,160 individual resources, and found that popular stories (i.e., the top 25 % of views normalized by time available on the Web) have the following characteristics: a minimum of 2, a median of 28, and a maximum of 1950 elements; a median of 12 multimedia resources (e.g., images, video); 38 % receive continuing edits; and 11 % of their elements are missing from the live Web. We also examined the population of Archive-It collections (3109 collections comprising 305,522 seed URIs) to better understand the characteristics of the collections that we intend to summarize. We found that the resources in human-generated stories are different from the resources in Archive-It collections. In summarizing a collection, we can only choose from what is archived (e.g., twitter.com is popular in Storify, but rare in Archive-It). However, some other characteristics of human-generated stories will be applicable, such as the number of resources.
      PubDate: 2016-07-21
      DOI: 10.1007/s00799-016-0185-3
       
  • Detecting off-topic pages within TimeMaps in Web archives
    • Authors: Yasmin AlNoamany; Michele C. Weigle; Michael L. Nelson
      Abstract: Web archives have become a significant repository of our recent history and cultural heritage. Archival integrity and accuracy is a precondition for future cultural research. Currently, there are no quantitative or content-based tools that allow archivists to judge the quality of Web archive captures. In this paper, we address the problem of detecting when a particular page in a Web archive collection has gone off-topic relative to its first archived copy. We do not delete off-topic pages (they remain part of the collection), but they are flagged as off-topic so they can be excluded from consideration for downstream services, such as collection summarization and thumbnail generation. We propose different methods (cosine similarity, Jaccard similarity, intersection of the 20 most frequent terms, Web-based kernel function, and the change in size using the number of words and content length) to detect when a page has gone off-topic. The predicted off-topic pages will be presented to the collection’s curator for possible elimination from the collection or cessation of crawling. We created a gold standard data set from three Archive-It collections to evaluate the proposed methods at different thresholds. We found that combining cosine similarity at threshold 0.10 and change in size using word count at threshold −0.85 performs best, with accuracy = 0.987, F1 score = 0.906, and AUC = 0.968. We evaluated the performance of the proposed method on several Archive-It collections; the average precision of detecting off-topic pages in these collections is 0.89.
      PubDate: 2016-07-18
      DOI: 10.1007/s00799-016-0183-5
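
      A minimal sketch of the best-performing combination reported above: flag a memento as off-topic when its cosine similarity to the first capture falls below 0.10 or its word count shrinks by more than 85 %. The crude whitespace tokenisation and the rule for combining the two signals are assumptions for illustration, not the paper's exact procedure.

        import math
        from collections import Counter

        def cosine(a: Counter, b: Counter) -> float:
            # Cosine similarity between two term-frequency vectors.
            dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
            norm = (math.sqrt(sum(v * v for v in a.values()))
                    * math.sqrt(sum(v * v for v in b.values())))
            return dot / norm if norm else 0.0

        def is_off_topic(first_text: str, later_text: str,
                         cos_threshold: float = 0.10,
                         size_threshold: float = -0.85) -> bool:
            first, later = first_text.split(), later_text.split()
            sim = cosine(Counter(first), Counter(later))
            change = (len(later) - len(first)) / len(first) if first else 0.0
            return sim < cos_threshold or change < size_threshold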
       
  • Web archive profiling through CDX summarization
    • Authors: Sawood Alam; Michael L. Nelson; Herbert Van de Sompel; Lyudmila L. Balakireva; Harihar Shankar; David S. H. Rosenthal
      Abstract: With the proliferation of public web archives, it is becoming more important to better profile their contents, both to understand their immense holdings and to support routing of requests in the Memento aggregator. To save time, the Memento aggregator should only poll the archives that are likely to have a copy of the requested URI. Using the CDX index files produced after crawling, we can generate profiles of the archives that summarize their holdings and can be used to inform routing of the Memento aggregator’s URI requests. Previous work in profiling ranged from using full URIs (no false positives, but with large profiles) to using only top-level domains (TLDs) (smaller profiles, but with many false positives). This work explores strategies between these two extremes. In our experiments, we correctly identified the presence or absence in the archive of about 78 % of requested URIs at less than 1 % relative cost compared to the complete knowledge profile, and of 94 % of URIs at less than 10 % relative cost, without any false negatives. With respect to the TLD-only profile, the registered-domain profile doubled the routing precision, while the complete hostname plus one path segment gave a tenfold increase in routing precision.
      PubDate: 2016-07-16
      DOI: 10.1007/s00799-016-0184-4
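
      The profiling policies explored sit between those two extremes. A minimal sketch of how a single URI could be reduced to lookup keys of increasing specificity; the naive registered-domain split below ignores public-suffix rules (e.g. co.uk), so it is an approximation only.

        from urllib.parse import urlsplit

        def profile_keys(uri: str) -> dict:
            # Reduce a URI to profile keys of increasing specificity.
            parts = urlsplit(uri)
            host = parts.hostname or ""
            labels = host.split(".")
            segs = [s for s in parts.path.split("/") if s]
            return {
                "tld": labels[-1] if labels else "",
                "registered_domain": ".".join(labels[-2:]),  # naive, no PSL
                "hostname": host,
                "host_plus_1path": host + ("/" + segs[0] if segs else ""),
            }

        print(profile_keys("http://news.example.org/2016/story.html"))
        # {'tld': 'org', 'registered_domain': 'example.org',
        #  'hostname': 'news.example.org', 'host_plus_1path': 'news.example.org/2016'}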
       
  • Using a file history graph to keep track of personal resources across
           devices and services
    • Authors: Matthias Geel; Moira C. Norrie
      Abstract: Personal digital resources now tend to be stored, managed and shared using a variety of devices and online services. As a result, different versions of resources are often stored in different places, and it has become increasingly difficult for users to keep track of them. We introduce the concept of a file history graph that can be used to provide users with a global view of resource provenance and enable them to track specific versions across devices and services. We describe how this has been used to realise a version-aware environment, called Memsy, and report on a lab study used to evaluate the proposed workflow. We also describe how reconciliation services can be used to fill in missing links in the file history graph and present a detailed study for the case of images as a proof of concept.
      PubDate: 2016-07-07
      DOI: 10.1007/s00799-016-0181-7
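
      A file history graph of this kind can be modelled as a directed graph whose nodes are concrete file versions (content hash plus location) and whose edges record that one version was derived from another. A minimal sketch of that structure; the attribute names are assumptions for illustration, not Memsy's actual schema.

        from dataclasses import dataclass, field

        @dataclass(frozen=True)
        class Version:
            content_hash: str   # e.g. SHA-1 of the file contents
            location: str       # device or service holding this copy

        @dataclass
        class FileHistoryGraph:
            # child version -> set of versions it was derived from
            parents: dict = field(default_factory=dict)

            def add_derivation(self, child: Version, parent: Version) -> None:
                self.parents.setdefault(child, set()).add(parent)

            def lineage(self, v: Version) -> set:
                # All ancestors of a version, i.e. its provenance.
                seen, stack = set(), list(self.parents.get(v, ()))
                while stack:
                    p = stack.pop()
                    if p not in seen:
                        seen.add(p)
                        stack.extend(self.parents.get(p, ()))
                return seen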
       
  • A semantic architecture for preserving and interpreting the information
           contained in Irish historical vital records
    • Authors: Christophe Debruyne; Oya Deniz Beyan; Rebecca Grant; Sandra Collins; Stefan Decker; Natalie Harrower
      Abstract: Irish Record Linkage 1864–1913 is a multi-disciplinary project that started in 2014, aiming to create a platform for analyzing events captured in historical birth, marriage, and death records by applying semantic technologies for annotating, storing, and inferring information from the data contained in those records. This enables researchers to, among other things, investigate to what extent maternal and infant mortality rates were underreported. We report on the semantic architecture, provide motivation for the adoption of RDF and Linked Data principles, and elaborate on the ontology construction process, which was influenced by the requirements of both the digital archivists and the historians. Concerns of the digital archivists include the preservation of the archival record and following best practices in preservation, cataloguing, and data protection. The historians in this project wish to discover certain patterns in those vital records. An important aspect of the semantic architecture is a clear separation of concerns that reflects those distinct requirements—the transcription and archival authenticity of the register pages on the one hand and the interpretation of the transcribed data on the other—which led to the creation of two distinct ontologies and knowledge bases. The advantage of this clear separation is that the transcription of register pages resulted in a reusable data set fit for other research purposes. These transcriptions were enriched with metadata according to best practices in archiving for ingestion into suitable long-term digital preservation platforms.
      PubDate: 2016-07-01
      DOI: 10.1007/s00799-016-0180-8
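
      The separation of concerns described above maps naturally onto two distinct RDF graphs, linked by a shared identifier: one recording what a register page literally says, the other recording what historians infer from it. A minimal sketch using rdflib; every namespace URI and term below is an invented placeholder, not the project's actual ontologies.

        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import RDF

        TRANS = Namespace("http://example.org/transcription/")    # placeholder
        INTERP = Namespace("http://example.org/interpretation/")  # placeholder
        entry = URIRef("http://example.org/record/death/1882/123")

        transcription = Graph()  # the register page as written
        transcription.add((entry, RDF.type, TRANS.RegisterEntry))
        transcription.add((entry, TRANS.causeOfDeathText, Literal("debility")))

        interpretation = Graph()  # the historians' reading of it
        interpretation.add((entry, RDF.type, INTERP.DeathEvent))
        interpretation.add((entry, INTERP.suspectedMaternalDeath, Literal(True)))

        print(transcription.serialize(format="turtle"))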
       
  • Evaluating unsupervised thesaurus-based labeling of audiovisual content in
           an archive production environment
    • Authors: Victor de Boer; Roeland J. F. Ordelman; Josefien Schuurman
      Abstract: In this paper we report on a two-stage evaluation of unsupervised labeling of audiovisual content using collateral text data sources, to investigate how such an approach can provide acceptable results for given requirements with respect to archival quality, authority and service levels to external users. We conclude that with parameter settings that are optimized using a rigorous evaluation of precision and accuracy, the quality of automatic term suggestion is sufficiently high. We furthermore provide an analysis of the term extraction after being taken into production, where we focus on performance variation with respect to term types and television programs. Having implemented the procedure in our production workflow allows us to develop the system further gradually and also to assess the effect of the transformation from manual to automatic annotation from an end-user perspective. Future work will focus on deploying different information sources, including annotations based on multimodal video analysis such as speaker recognition and computer vision.
      PubDate: 2016-06-23
      DOI: 10.1007/s00799-016-0182-6
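
      The first evaluation stage rests on comparing automatically suggested thesaurus terms against manually assigned ones. A minimal sketch of that comparison; the per-program grouping and term typing discussed in the paper are omitted.

        def precision_recall(suggested: set, gold: set) -> tuple:
            # Compare auto-suggested thesaurus terms with manual annotations.
            hits = len(suggested & gold)
            precision = hits / len(suggested) if suggested else 0.0
            recall = hits / len(gold) if gold else 0.0
            return precision, recall

        p, r = precision_recall({"news", "politics", "elections"},
                                {"news", "elections", "interviews"})
        print(round(p, 2), round(r, 2))  # 0.67 0.67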
       
  • Key components of data publishing: using current best practices to develop
           a reference model for data publishing
    • Authors: Claire C. Austin; Theodora Bloom; Sünje Dallmeier-Tiessen; Varsha K. Khodiyar; Fiona Murphy; Amy Nurnberger; Lisa Raymond; Martina Stockhause; Jonathan Tedds; Mary Vardigan; Angus Whyte
      Abstract: The availability of workflows for data publishing could have an enormous impact on researchers, research practices and publishing paradigms, as well as on funding strategies and career and research evaluations. We present the generic components of such workflows to provide a reference model for these stakeholders. The RDA-WDS Data Publishing Workflows group set out to study the current data-publishing workflow landscape across disciplines and institutions. A diverse set of workflows was examined to identify common components and standard practices, including basic self-publishing services, institutional data repositories, long-term projects, curated data repositories, and joint data journal and repository arrangements. The results of this examination have been used to derive a data-publishing reference model comprising generic components. From an assessment of the current data-publishing landscape, we highlight important gaps and challenges to consider, especially when dealing with more complex workflows and their integration into wider community frameworks. It is clear that the data-publishing landscape is varied and dynamic, and that there are important gaps and challenges. The different components of a data-publishing system need to work, to the greatest extent possible, in a seamless and integrated way to support the evolution of commonly understood and utilized standards and, eventually, increased reproducibility. We therefore advocate the implementation of existing standards for repositories and all parts of the data-publishing process, and the development of new standards where necessary. Effective and trustworthy data publishing should be embedded in documented workflows. As more research communities seek to publish the data associated with their research, they can build on one or more of the components identified in this reference model.
      PubDate: 2016-06-20
      DOI: 10.1007/s00799-016-0178-2
       
  • Implementation of a workflow for publishing citeable environmental data:
           successes, challenges and opportunities from a data centre perspective
    • Authors: Kathryn A. Harrison; Daniel G. Wright; Philip Trembath
      Abstract: In recent years, the development and implementation of a robust way to cite data have encouraged many previously sceptical environmental researchers to publish the data they create, thus ensuring that more data than ever are now open and available for re-use within and between research communities. Here, we describe a workflow for publishing citeable data in the context of the environmental sciences—an area spanning many domains and generating a vast array of heterogeneous data products. The processes and tools we have developed have enabled rapid publication of quality data products, including datasets, models and model outputs, which can be accessed, re-used and subsequently cited. However, there are still many challenges that need to be addressed before researchers in the environmental sciences fully accept the notion that datasets are valued outputs and that time should be spent on properly describing, storing and citing them. We identify current challenges such as the citation of dynamic datasets and issues of recording and presenting citation metrics. In conclusion, whilst data centres may have the infrastructure, tools, resources and processes available to publish citeable datasets, further work is required before large-scale uptake of the services offered is achieved. We believe that once current challenges are met, data resources will be viewed similarly to journal publications as valued outputs in a researcher’s portfolio, and therefore both the quality and quantity of data published will increase.
      PubDate: 2016-06-18
      DOI: 10.1007/s00799-016-0175-5
       
  • Automating data sharing through authoring tools
    • Authors: John R. Kitchin; Ana E. Van Gulick; Lisa D. Zilinski
      Abstract: In the current scientific publishing landscape, there is a need for an authoring workflow that easily integrates data and code into manuscripts and that enables the data and code to be published in reusable form. Automated embedding of data and code into published output will enable superior communication and data archiving. In this work, we demonstrate a proof of concept for such a workflow, based on org-mode, which successfully provides this authoring capability and workflow integration. We illustrate this concept in a series of examples for potential uses of this workflow. First, we use data on citation counts to compute the h-index of an author, and show two code examples for calculating the h-index. The source for each example is automatically embedded in the PDF during the export of the document. We demonstrate how data can be embedded in image files, which themselves are embedded in the document. Finally, metadata about the embedded files can be automatically included in the exported PDF and accessed by computer programs. In our customized export, we embedded metadata about the attached files in an Info field of the PDF. A computer program could parse this output to get a list of embedded files and carry out analyses on them. Authoring tools such as Emacs + org-mode can greatly facilitate the integration of data and code into technical writing. These tools can also automate the embedding of data into document formats intended for consumption.
      PubDate: 2016-06-11
      DOI: 10.1007/s00799-016-0173-7
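
      The h-index computation used as the first example above is small enough to sketch in full: a researcher has index h if h of their papers have at least h citations each. A minimal version follows; the paper's own org-mode source blocks may of course differ.

        def h_index(citations: list) -> int:
            # Largest h such that at least h papers have >= h citations.
            h = 0
            for rank, count in enumerate(sorted(citations, reverse=True), start=1):
                if count >= rank:
                    h = rank
                else:
                    break
            return h

        print(h_index([10, 8, 5, 4, 3]))  # 4: four papers with >= 4 citations each
        print(h_index([25, 8, 5, 3, 3]))  # 3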
       
  • X3ML mapping framework for information integration in cultural heritage
           and beyond
    • Authors: Yannis Marketakis; Nikos Minadakis; Haridimos Kondylakis; Konstantina Konsolaki; Georgios Samaritakis; Maria Theodoridou; Giorgos Flouris; Martin Doerr
      Abstract: The aggregation of heterogeneous data from different institutions in cultural heritage and e-science has the potential to create rich data resources useful for a range of different purposes, from research to education and public interest. In this paper, we present the X3ML framework, a framework for information integration that effectively and efficiently handles the steps involved in schema mapping, uniform resource identifier (URI) definition and generation, data transformation, provision and aggregation. The framework is based on the X3ML mapping definition language for describing both schema mappings and URI generation policies, and it has several advantages over other relevant frameworks. We describe the architecture of the framework as well as details of the various available components. Usability aspects are discussed and performance metrics are demonstrated. The high impact of our work is verified by the increasing number of international projects that adopt and use this framework.
      PubDate: 2016-06-06
      DOI: 10.1007/s00799-016-0179-1
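
      One of the steps the framework separates out is URI generation policy: producing stable, deterministic identifiers for source records so that repeated aggregation runs yield the same URIs. A minimal sketch of what such a policy can look like; the template and the use of UUIDv5 are illustrative assumptions, not X3ML's own notation.

        import uuid

        def record_uri(base: str, record_type: str, source_id: str) -> str:
            # Deterministic URI: identical inputs always map to the same UUID.
            name = record_type + "/" + source_id
            return base + record_type + "/" + str(uuid.uuid5(uuid.NAMESPACE_URL, name))

        print(record_uri("http://example.org/resource/", "artefact", "MUS-1234"))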
       
 
 