International Journal on Digital Libraries
  [SJR: 0.375]   [H-I: 28]   [552 followers]
   Hybrid journal (may contain Open Access articles)
   ISSN (Print) 1432-5012 - ISSN (Online) 1432-1300
   Published by Springer-Verlag
  • Focused crawler for events
    • Authors: Mohamed M. G. Farag; Sunshin Lee; Edward A. Fox
      Abstract: There is a need for an Integrated Event Focused Crawling system to collect Web data about key events. When a disaster or other significant event occurs, many users try to locate the most up-to-date information about that event. Yet, there is little systematic collecting and archiving anywhere of event information. We propose intelligent event focused crawling for automatic event tracking and archiving, ultimately leading to effective access. We developed an event model that can capture key event information, and incorporated that model into a focused crawling algorithm. For the focused crawler to leverage the event model in predicting webpage relevance, we developed a function that measures the similarity between two event representations. We then conducted two series of experiments to evaluate our system on two recent events: the California shooting and the Brussels attack. The first experiment series evaluated the effectiveness of our proposed event model representation when assessing the relevance of webpages. Our event model-based representation outperformed the baseline method (topic-only); it showed better results in precision, recall, and F1-score with an improvement of 20% in F1-score. The second experiment series evaluated the effectiveness of the event model-based focused crawler for collecting relevant webpages from the WWW. Our event model-based focused crawler outperformed the state-of-the-art baseline focused crawler (best-first); it showed better results in harvest ratio with an average improvement of 40%.
      PubDate: 2017-01-07
      DOI: 10.1007/s00799-016-0207-1
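The entry above says the crawler scores webpage relevance with a similarity function between two event representations, but does not give its concrete form. The sketch below is a hypothetical illustration: events are modeled as topic, location, and date term-weight vectors combined by weighted cosine similarity. The facet split, the weights, and all example terms are assumptions, not taken from the paper.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def event_similarity(e1, e2, weights=(0.5, 0.3, 0.2)):
    """Weighted combination of per-facet similarities (topic, location, date)."""
    wt, wl, wd = weights
    return (wt * cosine(e1["topic"], e2["topic"])
            + wl * cosine(e1["location"], e2["location"])
            + wd * cosine(e1["date"], e2["date"]))

# Hypothetical event representation for a tracked event:
event = {"topic": {"shooting": 1.0, "police": 0.6},
         "location": {"california": 1.0, "san": 0.5, "bernardino": 0.5},
         "date": {"2015-12-02": 1.0}}
# Representation extracted from a candidate webpage:
page = {"topic": {"shooting": 0.8, "victims": 0.4},
        "location": {"california": 0.9},
        "date": {"2015-12-02": 0.7}}
score = event_similarity(event, page)  # higher = more relevant to the event
```

A focused crawler would compute such a score for each frontier URL's page text and follow only links from pages above a relevance threshold.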
  • Applications of RISM data in digital libraries and digital musicology
    • Authors: Klaus Keil; Jennifer A. Ward
      Abstract: Information about manuscripts and printed music indexed in RISM (Répertoire International des Sources Musicales), a large, international project that records and describes musical sources, was for decades available solely through book publications, CD-ROMs, or subscription services. Recent initiatives to make the data available on a wider scale have resulted in, most significantly, a freely accessible online database and the availability of its data as open data and linked open data. Previously, the task of increasing the amount of data was primarily carried out by RISM national groups and the Zentralredaktion (Central Office). The current opportunities available by linking to other freely accessible databases and importing data from other resources open new perspectives and prospects. This paper describes the RISM data and their applications for digital libraries and digital musicological projects. We discuss the possibilities and challenges in making available a large and growing quantity of data and how the data have been utilized in external library and musicological projects. Interactive functions in the RISM OPAC are planned for the future, as is closer collaboration with the projects that use RISM data. Ultimately, RISM would like to arrange a “take and give” system in which the RISM data are used in external projects, enhanced by the project participants, and then delivered back to the RISM Zentralredaktion.
      PubDate: 2017-01-06
      DOI: 10.1007/s00799-016-0205-3
  • Results of a digital library curriculum field test
    • Authors: Sanghee Oh; Seungwon Yang; Jeffrey P. Pomerantz; Barbara M. Wildemuth; Edward A. Fox
      Pages: 273 - 286
      Abstract: The DL Curriculum Development project was launched in 2006, responding to an urgent need for consensus on DL curriculum across the fields of computer science and information and library science. Over the course of several years, 13 modules of a digital libraries (DL) curriculum were developed and were ready for field testing. The modules were evaluated in DL courses in real classroom environments in 37 classes by 15 instructors and their students. Interviews with instructors and questionnaires completed by their students were used to collect evaluative feedback. Findings indicate that the modules have been well designed to educate students on important topics and issues in DLs, in general. Suggestions to improve the modules based on the interviews and questionnaires were discussed as well. After the field test, module development has continued, not only for the DL community but also for related areas such as information retrieval, big data, and multimedia. Currently, 56 modules are readily available for use through the project website or the Wikiversity site.
      PubDate: 2016-11-01
      DOI: 10.1007/s00799-015-0151-5
      Issue No: Vol. 17, No. 4 (2016)
  • Systems integration of heterogeneous cultural heritage information systems
           in museums: a case study of the National Palace Museum
    • Authors: Shao-Chun Wu
      Pages: 287 - 304
      Abstract: This study addresses the process of information systems integration in museums. Research emphasis has concentrated on systems integration in the business community after restructuring of commercial enterprises. Museums fundamentally differ from commercial enterprises and thus cannot wholly rely on the business model for systems integration. A case study of the National Palace Museum in Taiwan was conducted to investigate its systems integration of five legacy systems into one information system for museum and public use. Participatory observation methods were used to collect data for inductive analysis. The results suggested that museums are motivated to integrate their systems by internal cultural and administrative operations, external cultural and creative industries, public expectations, and information technology attributes. Four factors related to the success of the systems integration project: (1) the unique attributes of a museum’s artifacts, (2) the attributes and needs of a system’s users, (3) the unique demands of museum work, and (4) the attributes of existing information technology resources within a museum. The results provide useful reference data for other museums when they carry out systems integration.
      PubDate: 2016-11-01
      DOI: 10.1007/s00799-015-0154-2
      Issue No: Vol. 17, No. 4 (2016)
  • Research-paper recommender systems: a literature survey
    • Authors: Joeran Beel; Bela Gipp; Stefan Langer; Corinna Breitinger
      Pages: 305 - 338
      Abstract: In the last 16 years, more than 200 research articles were published about research-paper recommender systems. We reviewed these articles and present some descriptive statistics in this paper, as well as a discussion about the major advancements and shortcomings and an overview of the most common recommendation concepts and approaches. We found that more than half of the recommendation approaches applied content-based filtering (55 %). Collaborative filtering was applied by only 18 % of the reviewed approaches, and graph-based recommendations by 16 %. Other recommendation concepts included stereotyping, item-centric recommendations, and hybrid recommendations. The content-based filtering approaches mainly utilized papers that the users had authored, tagged, browsed, or downloaded. TF-IDF was the most frequently applied weighting scheme. In addition to simple terms, n-grams, topics, and citations were utilized to model users’ information needs. Our review revealed some shortcomings of the current research. First, it remains unclear which recommendation concepts and approaches are the most promising. For instance, researchers reported different results on the performance of content-based and collaborative filtering. Sometimes content-based filtering performed better than collaborative filtering and sometimes it performed worse. We identified three potential reasons for the ambiguity of the results. (A) Several evaluations had limitations. They were based on strongly pruned datasets, few participants in user studies, or did not use appropriate baselines. (B) Some authors provided little information about their algorithms, which makes it difficult to re-implement the approaches. Consequently, researchers use different implementations of the same recommendation approaches, which might lead to variations in the results. (C) We speculated that minor variations in datasets, algorithms, or user populations inevitably lead to strong variations in the performance of the approaches. Hence, finding the most promising approaches is a challenge. As a second limitation, we noted that many authors neglected to take into account factors other than accuracy, for example overall user satisfaction. In addition, most approaches (81 %) neglected the user-modeling process and did not infer information automatically but let users provide keywords, text snippets, or a single paper as input. Information on runtime was provided for 10 % of the approaches. Finally, few research papers had an impact on research-paper recommender systems in practice. We also identified a lack of authority and long-term research interest in the field: 73 % of the authors published no more than one paper on research-paper recommender systems, and there was little cooperation among different co-author groups. We concluded that several actions could improve the research landscape: developing a common evaluation framework, agreement on the information to include in research papers, a stronger focus on non-accuracy aspects and user modeling, a platform for researchers to exchange information, and an open-source framework that bundles the available recommendation approaches.
      PubDate: 2016-11-01
      DOI: 10.1007/s00799-015-0156-0
      Issue No: Vol. 17, No. 4 (2016)
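The survey notes that TF-IDF was the most common term-weighting scheme in content-based approaches, typically modeling a user from papers they had authored, tagged, browsed, or downloaded. A minimal sketch of that idea follows; the toy corpus and the plain tf × log(N/df) variant are assumptions for illustration, and surveyed systems differ in the details.

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF weights for a list of tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))            # document frequency per term
    profiles = []
    for doc in docs:
        tf = Counter(doc)
        profiles.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return profiles

# Toy "user model": tokens from papers the user authored or downloaded
papers = [["digital", "library", "search"],
          ["library", "recommender", "systems"],
          ["recommender", "evaluation"]]
profiles = tf_idf(papers)
```

Terms that occur in fewer of the user's papers (e.g., "digital") receive higher weights than terms shared across many of them (e.g., "library"), which is what makes TF-IDF useful for profiling a user's distinctive interests.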
  • Location-triggered mobile access to a digital library of audio books using Tipple
    • Authors: Annika Hinze; David Bainbridge
      Pages: 339 - 365
      Abstract: This paper explores the role of audio as a means to access books while being at locations referred to within the books, through a mobile app, called Tipple. The books are sourced from a digital library—either self-contained on the mobile phone, or else over the network—and can either be accompanied by pre-recorded audio or synthesized using text-to-speech. The paper details the functional requirements, design and implementation of Tipple. The developed concept was explored and evaluated through three field studies.
      PubDate: 2016-11-01
      DOI: 10.1007/s00799-015-0165-z
      Issue No: Vol. 17, No. 4 (2016)
  • API-based social media collecting as a form of web archiving
    • Authors: Justin Littman; Daniel Chudnov; Daniel Kerchner; Christie Peterson; Yecheng Tan; Rachel Trent; Rajat Vij; Laura Wrubel
      Abstract: Social media is increasingly a topic of study across a range of disciplines. Despite this popularity, current practices and open source tools for social media collecting do not adequately support today’s scholars or support building robust collections for future researchers. We are continuing to develop and improve Social Feed Manager (SFM), an open source application assisting scholars collecting data from Twitter’s API for their research. Based on our experience with SFM to date and the viewpoints of archivists and researchers, we are reconsidering assumptions about API-based social media collecting and identifying requirements to guide the application’s further development. We suggest that aligning social media collecting with web archiving practices and tools addresses many of the most pressing needs of current and future scholars conducting quality social media research. In this paper, we consider the basis for these new requirements, describe in depth an alignment between social media collecting and web archiving, outline a technical approach for effecting this alignment, and show how the technical approach has been implemented in SFM.
      PubDate: 2016-12-28
      DOI: 10.1007/s00799-016-0201-7
  • Avoiding spoilers: wiki time travel with Sheldon Cooper
    • Authors: Shawn M. Jones; Michael L. Nelson; Herbert Van de Sompel
      Abstract: A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if fans are behind in their viewing they run the risk of encountering “spoilers”—information that gives away key plot points before the intended time of the show’s writers. Because the wiki history is indexed by revisions, finding specific dates can be tedious, especially for pages with hundreds or thousands of edits. A wiki’s history interface does not permit browsing across historic pages without visiting current ones, thus revealing spoilers in the current page. Enterprising fans can resort to web archives and navigate there across wiki pages that were live prior to a specific episode date. In this paper, we explore the use of Memento with the Internet Archive as a means of avoiding spoilers in fan wikis. We conduct two experiments: one to determine the probability of encountering a spoiler when using Memento with the Internet Archive for a given wiki page, and a second to determine which date prior to an episode to choose when trying to avoid spoilers for that specific episode. Our results indicate that the Internet Archive is not safe for avoiding spoilers, and therefore we highlight the inherent capability of fan wikis to address the spoiler problem internally using existing, off-the-shelf technology. We use the spoiler use case to define and analyze different ways of discovering the best past version of a resource to avoid spoilers. We propose Memento as a structural solution to the problem, distinguishing it from prior content-based solutions to the spoiler problem. This research promotes the idea that content management systems can benefit from exposing their version information in the standardized Memento way used by other archives. We support the idea that there are use cases for which specific prior versions of web resources are invaluable.
      PubDate: 2016-12-21
      DOI: 10.1007/s00799-016-0200-8
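The selection policy the paper studies — given an episode air date, pick the most recent archived snapshot captured before it — can be sketched offline as below. In practice Memento clients send an Accept-Datetime header to a TimeGate, which may return a memento captured after the requested date; that behavior is exactly the spoiler risk the paper measures. The snapshot dates here are invented for illustration.

```python
from datetime import datetime

def best_memento(memento_datetimes, episode_airdate):
    """Most recent snapshot captured strictly before the air date
    (spoiler-safe choice), or None if nothing predates the episode."""
    safe = [m for m in memento_datetimes if m < episode_airdate]
    return max(safe) if safe else None

# Invented capture dates for a fan-wiki page:
snapshots = [datetime(2013, 1, 5), datetime(2013, 3, 20), datetime(2013, 9, 1)]
airdate = datetime(2013, 4, 1)   # episode the viewer has not yet seen
pick = best_memento(snapshots, airdate)
```

The strict `<` comparison encodes the "spoiler-safe" rule: a snapshot taken on the air date itself might already describe the episode, so only earlier captures qualify.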
  • The colors of the national Web: visual data analysis of the historical
           Yugoslav Web domain
    • Authors: Anat Ben-David; Adam Amram; Ron Bekkerman
      Abstract: This study examines the use of visual data analytics as a method for historical investigation of national Webs, using Web archives. It empirically analyzes all graphically designed (non-photographic) images extracted from Websites hosted in the historical .yu domain and archived by the Internet Archive between 1997 and 2000, to assess the utility and value of visual data analytics as a measure of nationality of a Web domain. First, we report that only 23.5 % of Websites hosted in the .yu domain over the studied years had their graphically designed images properly archived. Second, we detect significant differences between the color palettes of .yu sub-domains (commercial, organizational, academic, and governmental), as well as between Montenegrin and Serbian Websites. Third, we show that the similarity of the domains’ colors to the colors of the Yugoslav national flag decreases over time. However, there are spikes in the use of Yugoslav national colors that correlate with major developments on the Kosovo frontier.
      PubDate: 2016-12-18
      DOI: 10.1007/s00799-016-0202-6
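The flag-similarity measure is not specified in the abstract; one plausible sketch scores a site's palette by the mean distance from each palette color to its nearest flag color. The RGB values chosen for the flag, the Euclidean color distance, and the sample palette are all assumptions for illustration.

```python
def rgb_distance(c1, c2):
    """Euclidean distance in RGB space (a simplification; a perceptual
    space such as CIELAB would be more faithful)."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5

# Assumed RGB values for the Yugoslav flag's blue, white, and red:
FLAG = [(0, 0, 255), (255, 255, 255), (255, 0, 0)]

def flag_similarity(palette):
    """Mean distance from each palette color to its nearest flag color;
    lower values mean a more 'national' palette."""
    return sum(min(rgb_distance(c, f) for f in FLAG) for c in palette) / len(palette)

site_palette = [(10, 10, 240), (250, 250, 250)]   # invented site colors
score = flag_similarity(site_palette)
```

Tracking such a score per year across archived .yu sites would yield the kind of time series the study reports (declining flag similarity with event-correlated spikes).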
  • Mapping metadata to DDC classification structures for searching and browsing
    • Authors: Xia Lin; Michael Khoo; Jae-Wook Ahn; Doug Tudhope; Ceri Binding; Diana Massam; Hilary Jones
      Abstract: In this paper, we introduce a metadata visual interface based on metadata aggregation and automatic classification mapping. We demonstrate that it is possible to aggregate metadata records from multiple unrelated repositories, enhance them through automatic classification, and present them in a unified visual interface. The main features of the interface include dynamic querying using DDC classes as filters, interactive visual views of search results and related DDC classes, and drill-down options for searching and browsing in different levels of details. The interface was tested in a user study of 30 subjects. A comparison was done on three modules of the interface, namely ‘search interface’, ‘hierarchical interface’, and ‘visual interface.’ The results indicate that subjects performed well with all three interfaces, and they had more positive experience with the hierarchical interface than with the search interface and the visual interface.
      PubDate: 2016-12-07
      DOI: 10.1007/s00799-016-0197-z
  • Documenting archaeological science with CIDOC CRM
    • Authors: Franco Niccolucci
      Abstract: The paper proposes to use CIDOC CRM and its extensions CRMsci and CRMdig to document the scientific experiments involved in archaeological investigations. The nature of such experiments is analysed and ways to document their important aspects are provided using existing classes and properties from the CRM or from the above-mentioned schemas, together with newly defined ones, forming an extension of the CRM called CRMas.
      PubDate: 2016-11-30
      DOI: 10.1007/s00799-016-0199-x
  • Creating knowledge maps using Memory Island
    • Authors: Bin Yang; Jean-Gabriel Ganascia
      Abstract: Knowledge maps are useful tools, now beginning to be widely applied to the management and sharing of large-scale hierarchical knowledge. In this paper, we discuss how knowledge maps can be generated using Memory Island. Memory Island is our cartographic visualization technique, which was inspired by the ancient “Art of Memory”. It consists of automatically creating the spatial cartographic representation of a given hierarchical knowledge structure (e.g., an ontology). With the help of its interactive functions, users can navigate through an artificial landscape, to learn and retrieve information from the represented knowledge. We also present some preliminary results of representing different hierarchical knowledge structures to show how the knowledge maps created by our technique work.
      PubDate: 2016-10-21
      DOI: 10.1007/s00799-016-0196-0
  • Expressing reliability with CIDOC CRM
    • Authors: Franco Niccolucci; Sorin Hermon
      Abstract: The paper addresses the issue of documenting and communicating the reliability of evidence interpretation in archaeology and, in general, in heritage science. It is proposed to express reliability with fuzzy logic, and model it using an extension of CIDOC CRM classes and properties. This proposed extension is compared with other CRM extensions.
      PubDate: 2016-10-07
      DOI: 10.1007/s00799-016-0195-1
  • Inheriting library cards to Babel and Alexandria: contemporary metaphors
           for the digital library
    • Authors: Paul Gooding; Melissa Terras
      Abstract: Librarians have been consciously adopting metaphors to describe library concepts since the nineteenth century, helping us to structure our understanding of new technologies. As a profession, we have drawn extensively on these figurative frameworks to explore issues surrounding the digital library, yet very little has been written to date which interrogates how these metaphors have developed over the years. Previous studies have explored library metaphors, using either textual analysis or ethnographic methods to investigate their usage. However, this is to our knowledge the first study to use bibliographic data, corpus analysis, qualitative sentiment weighting and close reading to study particular metaphors in detail. It draws on a corpus of over 450 articles to study the use of the metaphors of the Library of Alexandria and Babel, concluding that both have been extremely useful as framing metaphors for the digital library. However, their longstanding use has seen them become stretched as metaphors, meaning that the field’s figurative framework now fails to represent the changing technologies which underpin contemporary digital libraries.
      PubDate: 2016-09-22
      DOI: 10.1007/s00799-016-0194-2
  • What’s news? Encounters with news in everyday life: a study of
           behaviours and attitudes
    • Authors: Sally Jo Cunningham; David M. Nichols; Annika Hinze; Judy Bowen
      Abstract: As the news landscape changes, for many users the nature of news itself is changing as well. Insights into the changing news behaviour of users can inform the design of access tools and news archives. We analysed a set of 35 autoethnographies of news encounters, created by students in New Zealand. These comprise rich descriptions of the news sources, modalities, topics of interest, and news ‘routines’ by which the students keep in touch with friends and maintain awareness of personal, local, national, and international events. We explore the implications of these insights into news behaviour for digital news systems.
      PubDate: 2016-08-10
      DOI: 10.1007/s00799-016-0187-1
  • Editorial for the TPDL 2015 special issue
    • Authors: Sarantos Kapidakis; Cezary Mazurek; Marcin Werla
      PubDate: 2016-07-26
      DOI: 10.1007/s00799-016-0190-6
  • Characteristics of social media stories
    • Authors: Yasmin AlNoamany; Michele C. Weigle; Michael L. Nelson
      Abstract: An emerging trend in social media is for users to create and publish “stories”, or curated lists of Web resources, with the purpose of creating a particular narrative of interest to the user. While some stories on the Web are automatically generated, such as Facebook’s “Year in Review”, one of the most popular storytelling services is “Storify”, which provides users with curation tools to select, arrange, and annotate stories with content from social media and the Web at large. We would like to use tools, such as Storify, to present (semi-)automatically created summaries of archival collections. To support automatic story creation, we need to better understand as a baseline the structural characteristics of popular (i.e., receiving the most views) human-generated stories. We investigated 14,568 stories from Storify, comprising 1,251,160 individual resources, and found that popular stories (i.e., top 25 % of views normalized by time available on the Web) have the following characteristics: a minimum of 2, a median of 28, and a maximum of 1950 elements, a median of 12 multimedia resources (e.g., images, video), 38 % receive continuing edits, and 11 % of their elements are missing from the live Web. We also checked the population of Archive-It collections (3109 collections comprising 305,522 seed URIs) for better understanding the characteristics of the collections that we intend to summarize. We found that the resources in human-generated stories are different from the resources in Archive-It collections. In summarizing a collection, we can only choose from what is archived (e.g., is popular in Storify, but rare in Archive-It). However, some other characteristics of human-generated stories will be applicable, such as the number of resources.
      PubDate: 2016-07-21
      DOI: 10.1007/s00799-016-0185-3
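The story-length summary reported above (minimum 2, median 28, maximum 1950 elements) is a standard order-statistics summary. The list of story lengths below is invented so that the computed summary reproduces those figures.

```python
from statistics import median

# Invented element counts per story, chosen so the min/median/max
# summary matches the paper's reported 2/28/1950 figures:
story_lengths = [2, 9, 12, 28, 31, 55, 1950]
summary = (min(story_lengths), int(median(story_lengths)), max(story_lengths))
```

The wide min-to-max spread with a small median is typical of such skewed distributions, which is why the paper reports all three statistics rather than a mean.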
  • Web archive profiling through CDX summarization
    • Authors: Sawood Alam; Michael L. Nelson; Herbert Van de Sompel; Lyudmila L. Balakireva; Harihar Shankar; David S. H. Rosenthal
      Abstract: With the proliferation of public web archives, it is becoming more important to better profile their contents, both to understand their immense holdings as well as to support routing of requests in the Memento aggregator. To save time, the Memento aggregator should only poll the archives that are likely to have a copy of the requested URI. Using the crawler index files produced after crawling, we can generate profiles of the archives that summarize their holdings and can be used to inform routing of the Memento aggregator’s URI requests. Previous work in profiling ranged from using full URIs (no false positives, but with large profiles) to using only top-level domains (TLDs) (smaller profiles, but with many false positives). This work explores strategies in between these two extremes. In our experiments, we correctly identified about 78 % of the URIs that were present or not present in the archive with less than 1 % relative cost as compared to the complete knowledge profile, and 94 % of URIs with less than 10 % relative cost without any false negatives. With respect to the TLD-only profile, the registered domain profile doubled the routing precision, while complete hostname and one path segment gave a tenfold increase in the routing precision.
      PubDate: 2016-07-16
      DOI: 10.1007/s00799-016-0184-4
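The profiling policies compared above (TLD-only, registered domain, hostname plus one path segment) each reduce an archived URI to a coarser key; an archive's profile is then the set of keys it holds. A rough sketch follows — the last-two-labels "registered domain" rule is a naive assumption, and real profilers use SURT canonicalization and the Public Suffix List.

```python
from urllib.parse import urlsplit

def profile_key(uri, policy):
    """Reduce a URI to a profile key at the chosen granularity."""
    parts = urlsplit(uri)
    host = parts.hostname or ""
    if policy == "tld":
        return host.rsplit(".", 1)[-1]
    if policy == "registered_domain":      # naive: last two host labels
        return ".".join(host.split(".")[-2:])
    if policy == "host_path1":             # hostname + one path segment
        segs = [s for s in parts.path.split("/") if s]
        return host + ("/" + segs[0] if segs else "")
    return uri                             # full-URI fallback

uri = "http://www.example.co/archives/2016/page.html"
keys = {p: profile_key(uri, p) for p in ("tld", "registered_domain", "host_path1")}
```

Coarser keys shrink the profile but admit more false positives when the aggregator tests a requested URI against it, which is the trade-off the experiments quantify.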
  • Using a file history graph to keep track of personal resources across
           devices and services
    • Authors: Matthias Geel; Moira C. Norrie
      Abstract: Personal digital resources now tend to be stored, managed and shared using a variety of devices and online services. As a result, different versions of resources are often stored in different places, and it has become increasingly difficult for users to keep track of them. We introduce the concept of a file history graph that can be used to provide users with a global view of resource provenance and enable them to track specific versions across devices and services. We describe how this has been used to realise a version-aware environment, called Memsy, and report on a lab study used to evaluate the proposed workflow. We also describe how reconciliation services can be used to fill in missing links in the file history graph and present a detailed study for the case of images as a proof of concept.
      PubDate: 2016-07-07
      DOI: 10.1007/s00799-016-0181-7
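The abstract describes the file history graph only at a high level; the sketch below is a hypothetical minimal version in which each file version is a node recording where it was seen, with an edge to its predecessor, so provenance can be walked back across devices and services. All identifiers and locations are invented.

```python
class FileHistoryGraph:
    """Minimal sketch: versions keyed by id (e.g., a content hash),
    each pointing at its predecessor version."""
    def __init__(self):
        self.parent = {}    # version id -> predecessor id (or None)
        self.location = {}  # version id -> device/service where seen

    def add_version(self, vid, seen_at, parent=None):
        self.parent[vid] = parent
        self.location[vid] = seen_at

    def provenance(self, vid):
        """Walk back from a version to its origin, newest first."""
        chain = []
        while vid is not None:
            chain.append((vid, self.location[vid]))
            vid = self.parent[vid]
        return chain

g = FileHistoryGraph()
g.add_version("v1", "laptop")
g.add_version("v2", "dropbox", parent="v1")
g.add_version("v3", "phone", parent="v2")
trail = g.provenance("v3")
```

A reconciliation service, in these terms, would detect that two orphan nodes are versions of the same resource and insert the missing parent link between them.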
  • Evaluating unsupervised thesaurus-based labeling of audiovisual content in
           an archive production environment
    • Authors: Victor de Boer; Roeland J. F. Ordelman; Josefien Schuurman
      Abstract: In this paper we report on a two-stage evaluation of unsupervised labeling of audiovisual content using collateral text data sources to investigate how such an approach can provide acceptable results for given requirements with respect to archival quality, authority and service levels to external users. We conclude that with parameter settings that are optimized using a rigorous evaluation of precision and accuracy, the quality of automatic term suggestion is sufficiently high. We furthermore provide an analysis of the term extraction after being taken into production, where we focus on performance variation with respect to term types and television programs. Having implemented the procedure in our production workflow allows us to gradually develop the system further and to also assess the effect of the transformation from manual to automatic annotation from an end-user perspective. Additional future work will be on deploying different information sources including annotations based on multimodal video analysis such as speaker recognition and computer vision.
      PubDate: 2016-06-23
      DOI: 10.1007/s00799-016-0182-6
JournalTOCs © 2009-2016