
Research Ideas and Outcomes

  This is an Open Access journal
ISSN (Online) 2367-7163
Published by Pensoft  [58 journals]
  • Guidelines for Research Ethics and Research Integrity in Citizen Science

    • Abstract: Research Ideas and Outcomes 8: e97122
      DOI : 10.3897/rio.8.e97122
      Authors : Eglė Ozolinčiūtė, William Bülow, Sonja Bjelobaba, Inga Gaižauskaitė, Veronika Krásničan, Dita Dlabolová, Julija Umbrasaitė : Students and researchers might have diverse ideas about and motivations for citizen science (CS) projects. To prevent uncertainty, we address ethical concerns emerging in CS projects and in CS in general, specifically, the transferability of the ethical skills and knowledge gained within academia (e.g. through studying and research conduct). We dedicate these Guidelines for Research Ethics and Research Integrity in Citizen Science primarily to Master's and doctoral students and their supervisors, to facilitate CS-related research activities (i.e. mainstream CS) in line with the values of academic integrity. Using a pool of 85 papers, we identified nine topics covering 22 customised guidelines and supplemented them with further readings to build more in-depth knowledge. HTML XML PDF
      PubDate: Wed, 30 Nov 2022 12:32:19 +0200
       
  • Sharing the Recipe: Reproducibility and Replicability in Research Across
           Disciplines

    • Abstract: Research Ideas and Outcomes 8: e89980
      DOI : 10.3897/rio.8.e89980
      Authors : Rima-Maria Rahal, Hanjo Hamann, Hilmar Brohmer, Florian Pethig : The open and transparent documentation of scientific processes has been established as a core antecedent of free knowledge. This also holds for generating robust insights in the scope of research projects. To convince academic peers and the public, the research process must be understandable and retraceable (reproducible), and repeatable (replicable) by others, precluding the inclusion of fluke findings into the canon of insights. In this contribution, we outline what reproducibility and replicability (R&R) could mean in the scope of different disciplines and traditions of research and which significance R&R has for generating insights in these fields. We draw on projects conducted in the scope of the Wikimedia "Open Science Fellows Program" (Fellowship Freies Wissen), an interdisciplinary, long-running funding scheme for projects contributing to open research practices. We identify twelve implemented projects from different disciplines which primarily focused on R&R, and multiple additional projects also touching on R&R. From these projects, we identify patterns and synthesize them into a roadmap of how research projects can achieve R&R across different disciplines. We further outline the ground covered by these projects and propose ways forward. HTML XML PDF
      PubDate: Tue, 22 Nov 2022 12:19:45 +0200
       
  • From Theoretical Debates to Lived Experiences: Autoethnographic Insights
           into Open Educational Practices in German Higher Education

    • Abstract: Research Ideas and Outcomes 8: e86663
      DOI : 10.3897/rio.8.e86663
      Authors : Sigrid Fahrer, Tamara Heck, Ronny Röwert, Naomi Truan : The Open Science Fellow Program built a community where researchers learned how to work openly. Within this environment, questions emerged on what it means to teach openly, i.e. which practices represent open learning and teaching and which examples can be shared amongst colleagues and peers? Different concepts of open educational practices (OEP) aim to give answers to open learning and teaching. At the same time, OEP still lack a common definition and many discussions on the topic only give minimal or implicit guidance on concrete approaches to being open, despite the creation and sharing of open educational resources. Investigating how we as practitioners implement concepts of OEP in the classroom was the starting point for the autoethnographic study we describe in this paper. We conducted a literature review to map explicit concepts of OEP, and we reflected on those concepts with regard to their adaptation in our own teaching and our experiences with OER-based and other open teaching concepts. We discuss four research papers and our respective position as practitioners in higher education in Germany. We reflect on the current state of ideas of OEP and their practical adaptation and implementation in learning and teaching scenarios. HTML XML PDF
      PubDate: Thu, 17 Nov 2022 13:31:44 +0200
       
  • Producing Open Data

    • Abstract: Research Ideas and Outcomes 8: e86384
      DOI : 10.3897/rio.8.e86384
      Authors : Caroline Fischer, Simon David Hirsbrunner, Vanessa Teckentrup : Open data offer the opportunity to economically combine data into large-scale datasets, fostering collaboration and re-use in the interest of treating researchers’ resources as well as study participants with care. Whereas advantages of utilising open data might be self-evident, the production of open datasets also challenges individual researchers. This is especially true for open data that include personal data, for which higher requirements have been legislated. Mainly building on our own experience as scholars from different research traditions (life sciences, social sciences and humanities), we describe best-practice approaches for opening up research data. We reflect on common barriers and strategies to overcome them, condensed into a step-by-step guide focused on actionable advice in order to mitigate the costs and promote the benefit of open data on three levels at once: society, the disciplines and individual researchers. Our contribution may prevent researchers and research units from re-inventing the wheel when opening data and enable them to learn from our experience. HTML XML PDF
      PubDate: Thu, 17 Nov 2022 12:46:39 +0200
       
  • Knowledge Equity and Open Science in qualitative research –
           Practical research considerations

    • Abstract: Research Ideas and Outcomes 8: e86387
      DOI : 10.3897/rio.8.e86387
      Authors : Isabel Steinhardt, Felicitas Kruschick : How can Knowledge In/Equity be addressed in qualitative research by taking the idea of Open Science into account? Two projects from the Open Science Fellows Programme by Wikimedia Deutschland will be used to illustrate how Open Science practices can succeed in qualitative research, thereby reducing In/Equity. In this context, In/Equity is considered as a fair and equal representation of people, their knowledge and insights, and takes as guidance questions about how epistemic, structural, institutional and personal biases generate and shape knowledge. Three questions guide this approach: firstly, what do we understand by In/Equity in the context of knowledge production in these projects? Secondly, who will be involved in knowledge generation and to what extent will they be valued or unvalued? Thirdly, how can data be made accessible for re-use to enable true participation and sharing? HTML XML PDF
      PubDate: Thu, 17 Nov 2022 09:08:50 +0200
       
  • Recommendations for use of annotations and persistent identifiers in
           taxonomy and biodiversity publishing

    • Abstract: Research Ideas and Outcomes 8: e97374
      DOI : 10.3897/rio.8.e97374
      Authors : Donat Agosti, Laurence Benichou, Wouter Addink, Christos Arvanitidis, Terence Catapano, Guy Cochrane, Mathias Dillen, Markus Döring, Teodor Georgiev, Isabelle Gérard, Quentin Groom, Puneet Kishor, Andreas Kroh, Jiří Kvaček, Patricia Mergen, Daniel Mietchen, Joana Pauperio, Guido Sautter, Lyubomir Penev : The paper summarises many years of discussions and experience of biodiversity publishers, organisations, research projects and individual researchers, and proposes recommendations for implementation of persistent identifiers for article metadata, structural elements (sections, subsections, figures, tables, references, supplementary materials and others) and data specific to biodiversity (taxonomic treatments, treatment citations, taxon names, material citations, gene sequences, specimens, scientific collections) in taxonomy and biodiversity publishing. The paper proposes best practices on how identifiers should be used in the different cases and on how they can be minted, cited, and expressed in the backend article XML to facilitate conversion to and further re-use of the article content as FAIR data. The paper also discusses several specific routes for post-publication re-use of semantically enhanced content through large biodiversity data aggregators such as the Global Biodiversity Information Facility (GBIF), the International Nucleotide Sequence Database Collaboration (INSDC) and others, and proposes specifications of both identifiers and XML tags to be used for that purpose. A summary table provides an account and overview of the recommendations. The guidelines are supported with examples from the existing publishing practices. HTML XML PDF
      PubDate: Wed, 16 Nov 2022 12:00:00 +0200
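      To make the flavour of these recommendations concrete, here is a purely illustrative sketch of a treatment element carrying persistent identifiers in backend article XML; the element and attribute names loosely imitate the TaxPub vocabulary used in biodiversity publishing, and the identifiers are invented, so treat all of it as an assumption rather than the paper's specification:

      import xml.etree.ElementTree as ET

      # A treatment element with its own persistent identifier plus a hook for a
      # material (specimen) citation identifier, as the recommendations envisage.
      treatment = ET.Element("tp:taxon-treatment", {"id": "T1"})
      ET.SubElement(treatment, "object-id",
                    {"content-type": "doi"}).text = "10.3897/example.treatment.T1"  # hypothetical PID
      ET.SubElement(treatment, "tp:material-citation",
                    {"specimen-code": "NHM-0001"})  # hypothetical specimen identifier
      print(ET.tostring(treatment, encoding="unicode"))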
       
  • Marine Animal Forests of the World: Definition and Characteristics

    • Abstract: Research Ideas and Outcomes 8: e96274
      DOI : 10.3897/rio.8.e96274
      Authors : Covadonga Orejas, Marina Carreiro-Silva, Christian Mohn, James Reimer, Toufiek Samaai, A. Louise Allcock, Sergio Rossi : The term Marine Animal Forest (MAF) was first described by Alfred Russel Wallace in his book “The Malay Archipelago” in 1869. The term was much later re-introduced and various descriptions of MAFs were presented in great detail as part of a book series. The international research and conservation communities have advocated for the future protection of MAFs and their integration into spatial plans and, in response, there are plans to include the characteristics of MAFs in national policies and international directives and conventions (i.e. IUCN, CBD, OSPAR, HELCOM, Barcelona Convention, European directives, ABNJ policies etc.). Some MAF ecosystems are already included in international and national conservation and management initiatives, for instance, shallow-water coral reefs (ICRI, ICRAN) or cold-water coral reefs and gardens and sponge aggregations (classified as Vulnerable Marine Ecosystems, VMEs), but not as a group together with other ecosystems with similar ecological roles. Marine Animal Forests can be found in all oceans, from shallow to deep waters. They are composed of megabenthic communities dominated by sessile suspension feeders (such as sponges, corals and bivalves) capable of producing three-dimensional frameworks with structural complexity that provide refuge for other species. MAFs are diverse and often harbour highly endemic communities. Marine animal forests face direct anthropogenic threats and they are not protected in many regions, particularly in deep-sea environments. Even though MAFs have already been described in detail, there are still fundamental knowledge gaps regarding their geographical distribution and functioning. A workshop was dedicated to clarifying the definition of MAFs and characterising their structure and functioning, including delineating the ecosystem services that they provide and the threats upon them. The workshop was organised by Working Group 2 of the EU-COST Action “MAF-WORLD” (hereafter WG2), which is responsible for collating and promoting research on mapping, biogeography and biodiversity of MAFs, to identify and reduce these knowledge gaps. Herein, we report on this workshop and its outputs. HTML XML PDF
      PubDate: Fri, 11 Nov 2022 16:17:15 +0200
       
  • The scope and scale of the life sciences (‘Nature’s
           envelope’)

    • Abstract: Research Ideas and Outcomes 8: e96132
      DOI : 10.3897/rio.8.e96132
      Authors : David Patterson : The extension of biology with a more data-centric component offers new opportunities for discovery. To enable investigations that rely on third-party data, the infrastructure that retains data and allows their re-use should, arguably, enable transactions that relate to any and all biological processes. The assembly of such a service-oriented and enabling infrastructure is challenging. Part of the challenge is to factor in the scope and scale of biological processes. From this foundation can emerge an estimate of the number of discipline-specific centres which will gather data in their given area of interest and prepare them for a path that will lead to trusted, persistent data repositories which will make fit-for-purpose data available for re-use. A simple model is presented for the scope and scale of the life sciences. It can accommodate all known processes conducted by or caused by any and all organisms. It is depicted on a grid, the axes of which are (x) the durations of the processes and (y) the sizes of participants involved. Both axes are presented in log10 scales, and the grid is divided into decadal blocks with tenfold increments of time and size. Processes range in duration from 10^-17 seconds to 3.5 billion years or more, and the sizes of participants range from 10^-15 to 1.3 × 10^7 metres. Examples are given to illustrate the diversity of biological processes and their often inexact character. About half of the blocks within the grid do not contain known processes. The blocks that include biological processes amount to ‘Nature’s envelope’, a valuable rhetorical device onto which subdisciplines and existing initiatives may be mapped, and from which can be derived some key requirements for a comprehensive data infrastructure. HTML XML PDF
      PubDate: Thu, 10 Nov 2022 12:17:10 +0200
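      Because the envelope is just decadal log10 binning of process duration and participant size, placing a process on the grid is mechanical; a minimal sketch (the function and example values are ours, not the paper's):

      import math

      def envelope_block(duration_s, size_m):
          # x: floor of log10 duration in seconds; y: floor of log10 participant size in metres
          return math.floor(math.log10(duration_s)), math.floor(math.log10(size_m))

      # Hypothetical placements using the ranges stated in the abstract:
      print(envelope_block(1e-15, 1e-9))                    # a femtosecond molecular event -> (-15, -9)
      print(envelope_block(3.5e9 * 365.25 * 86400, 1.3e7))  # ~3.5 billion years, 1.3 x 10^7 m -> (17, 7)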
       
  • Proposal for detecting coconut rhinoceros beetle breeding sites using
           harmonic radar

    • Abstract: Research Ideas and Outcomes 8: e86422
      DOI : 10.3897/rio.8.e86422
      Authors : Aubrey Moore, Matthew Siderhurst : Coconut rhinoceros beetle (CRB), a major pest of coconut and oil palms, is causing severe economic and environmental damage following recent invasions of several Pacific islands. Population suppression and eradication of this pest require location and destruction of active and potential breeding sites, where all life stages aggregate. Three search tactics for discovering breeding sites have been used with limited success: visual search by humans, search with assistance from detector dogs and search by tracking CRB adults fitted with radio transmitters. Here, we suggest a fourth search tactic: releasing CRB adults fitted with harmonic radar tags to locate breeding sites. Our idea is to find static end points for tags which accumulate at breeding sites, rather than actively tracking individual beetles. We plan to use commercially available hand-held harmonic radar devices. If we are successful, this technique may be useful for locating other insects which aggregate, such as hornets and other social insects. HTML XML PDF
      PubDate: Thu, 27 Oct 2022 08:46:43 +0300
       
  • Recommendations for interoperability among infrastructures

    • Abstract: Research Ideas and Outcomes 8: e96180
      DOI : 10.3897/rio.8.e96180
      Authors : Sofie Meeus, Wouter Addink, Donat Agosti, Christos Arvanitidis, Bachir Balech, Mathias Dillen, Mariya Dimitrova, Juan Miguel González-Aranda, Jörg Holetschek, Sharif Islam, Thomas Jeppesen, Daniel Mietchen, Nicky Nicolson, Lyubomir Penev, Tim Robertson, Patrick Ruch, Maarten Trekels, Quentin Groom : The BiCIKL project is born from a vision that biodiversity data are most useful if they are presented as a network of data that can be integrated and viewed from different starting points. BiCIKL's goal is to realise that vision by linking biodiversity data infrastructures, particularly for literature, molecular sequences, specimens, nomenclature and analytics. To make those links we need to better understand the existing infrastructures, their limitations, the nature of the data they hold, the services they provide and, particularly, how they can interoperate. In light of those aims, in the autumn of 2021, 74 people from the biodiversity data community engaged in a total of twelve hackathon topics with the aim to assess the current state of interoperability between infrastructures holding biodiversity data. These topics examined interoperability from several angles. Some were research subjects that required interoperability to get results, some examined modalities of access and the use and implementation of standards, while others tested technologies and workflows to improve linkage of different data types. These topics, and the issues in regard to interoperability uncovered by the hackathon participants, inspired the formulation of the following recommendations for infrastructures, related to (1) the use of data brokers, (2) building communities and trust, (3) cloud computing as a collaborative tool, (4) standards and (5) multiple modalities of access:
      • If direct linking cannot be supported between infrastructures, explore using data brokers to store links (a sketch of such a broker record follows this entry).
      • Cooperate with open linkage brokers to provide a simple way to allow two-way links between infrastructures, without having to co-organize between many different organisations.
      • Facilitate and encourage the external reporting of issues related to their infrastructure and its interoperability.
      • Facilitate and encourage requests for new features related to their infrastructure and its interoperability.
      • Provide development roadmaps openly.
      • Provide a mechanism for anyone to ask for help.
      • Discuss issues in an open forum.
      • Provide cloud-based environments to allow external participants to contribute and test changes to features.
      • Consider the opportunities that cloud computing brings as a means to enable shared management of the infrastructure.
      • Promote the sharing of knowledge around big data technologies amongst partners, using cloud computing as a training environment.
      • Invest in standards compliance and work with standards organisations to develop new, and extend existing, standards.
      • Report on and review standards compliance within an infrastructure, with metrics that give credit for work on standards compliance and development.
      • Provide as many different modalities of access as possible.
      • Avoid requiring personal contacts to download data.
      • Provide a full description of an API and the data it serves.
      Finally, the hackathons were an ideal meeting opportunity to build, diversify and extend the BiCIKL community further, and to ensure the alignment of the community with a common vision on how best to link data from specimens, samples, sequences, taxonomic names and taxonomic literature. HTML XML PDF
      PubDate: Fri, 14 Oct 2022 08:31:55 +0300
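      The data-broker recommendation above can be made concrete with a small sketch: a hypothetical broker-side link record and its inverse, showing how two-way links could exist without either infrastructure storing the other's identifiers. All field names and identifiers are invented for illustration:

      # A neutral third party stores typed links between records held by
      # different infrastructures.
      link = {
          "subject":   {"infrastructure": "GBIF", "id": "occurrence/1234567"},  # invented id
          "predicate": "cites-sequence",
          "object":    {"infrastructure": "ENA",  "id": "XX123456"},            # invented accession
      }

      def inverse(link):
          # The broker can serve every link in both directions on request.
          return {"subject": link["object"],
                  "predicate": "inverse-of:" + link["predicate"],
                  "object": link["subject"]}

      print(inverse(link)["subject"])  # -> the ENA side, now usable as the starting point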
       
  • On the Nature of Information: How FAIR Digital Objects are Building-up
           Semantic Space

    • Abstract: Research Ideas and Outcomes 8: e95119
      DOI : 10.3897/rio.8.e95119
      Authors : Hans-Günther Döbereiner : In this paper, we are concerned with the nature of information and how to gather and compose data with the help of so-called FAIR Digital Objects (FDOs) in order to transform them into knowledge. FDOs are digital surrogates of real objects. The nature of information is intrinsically linked to the kind of questions one is asking. One might not ask a question or get philosophical about it. Answers depend on the data different disciplines gather about their objects of study. In Statistical Physics, classical Shannon entropy measures system order, which in equilibrium just equals the heat exchanged with the environment. In cell biology, each protein carries certain functions which create specific information. Cognitive science describes how organisms perceive their environment via functional sensors and control behavior accordingly. Note that one can have function and control without meaning. In contrast, psychology is concerned with the assessment of our perceptions by assigning meaning and ensuing actions. Finally, philosophy builds logical constructs and formulates principles, in effect transforming facts into complex knowledge. All these statements make sense, but there is an even more concise way. Indeed, Luciano Floridi provides a precise and thorough classification of information in his central oeuvre On the Philosophy of Information (Floridi 2013). In particular, he performs a sequential construction to develop the attributes which data need to have in order to count as knowledge. Semantic information is necessarily well-formed, meaningful and truthful. Well-formed data become meaningful through the action-based semantics of an autonomous agent solving the symbol grounding problem (Taddeo and Floridi 2005) while interacting with the environment. Knowledge is then created by being informed through relevant data, duly accounted for. We notice that the notion of agency is crucial for defining meaning. The apparent gap between the Sciences and the Humanities (Bawden and Robinson 2020) is created by the very existence of meaning. Further, meaning depends on interactions and connotations which are commensurate with the effective complexity of the environment of a particular agent, resulting in an array of possible definitions. In his classic paper More is different, Anderson (1972) discussed the hierarchical nature of science. Each level is made of, and obeys the laws of, its constituents from one level below, with the higher level exhibiting emergent properties, like the wetness of water, assignable only to the whole system. As we rise through the hierarchies, there is a branch of science for each level of complexity; on each complexity level there are objects for which it is appropriate and fitting to build up a vocabulary for the respective level of description, leading to the formation of disciplinary languages. It is the central idea of causal emergence that on each level there is an optimal degree of coarse graining to define those objects in such a way that causality becomes maximal between them. This means there is emergence of informative higher scales in complex materials, extending to biological systems and into the brain, with its neural networks representing our thoughts in a hierarchy of neural correlates. A computational toolkit for optimal level prediction and control has been developed (Hoel and Levin 2020), which was conceptually extended to the integrated information theory of consciousness (Albantakis et al. 2019).
The large gap between the sciences and the humanities discussed above exhibits itself in a series of small gaps connected to the emergence of informative higher scales. It has been suggested that the origin of life may be identified as a transition in causal structure and information flow (Walker 2014). Integrated information measures globally how much the causal mechanisms of a system reduce the uncertainty about the possible causes for a given state. A measure of “information flow” that accurately captures causal effects has been proposed (Ay and Polani 2008). The state of the art is presented in Ay et al. (2022), where the link between information and complexity is discussed. Ay et al. single out hierarchical systems and interlevel causation. Going even further, Rosas et al. (2020) reconcile conflicting views of emergence via an exact information-theoretic approach to identify causal emergence in multivariate data. As information becomes differentially richer, one eventually needs complexity measures beyond R^n. One may define generalized metrics on these spaces (Pirró 2009), measuring information complexity on ever higher hierarchical levels of information. As one rises through hierarchies, information on a higher scale is usually gained by coarse graining to arrive at an effective, nevertheless exact, description on the higher scale. It is repeated coarse graining of syntactically well-ordered information layers which eventually leads to semantic information, in a process which I conjecture to be reminiscent of renormalization group flow, leading to a universal classification scheme. Thus, we identify scientific disciplines and their corresponding data sets as dual universality classes of physical and epistemic structure formation, respectively. Above the semantic gap, we may call this process quantification of the qualitative by semantic metrics. Indeed, Kolchinsky and Wolpert (2018) explored for the first time quantitative semantic concepts in Physics in their seminal 2018 paper entitled Semantic information, autonomous agency and non-equilibrium statistical physics. Their measures are numeric variants of entropy. Semantic information is identified with ‘the information that a physical system has about its environment that is causally necessary for the system to maintain its own existence over time’. FDOs ...
      PubDate: Wed, 12 Oct 2022 17:30:00 +0300
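      The Kolchinsky-Wolpert measure referred to at the end admits a compact statement; the following is a simplified sketch of their construction (notation ours, details elided):

      % Viability of system X at time t: negative Shannon entropy of its state distribution
      V(t) = -H\!\left(p_{X_t}\right)
      % Syntactic information: mutual information between system and environment at t = 0
      I_{\mathrm{syn}} = I(X_0 ; E_0)
      % Semantic information: the part of I_syn whose scrambling (destroying the X-E
      % correlations while preserving marginals) degrades the system's later viability
      S = V(\tau) - V_{\mathrm{scr}}(\tau)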
       
  • The ecology of FAIR Digital Objects, with special attention to
           roundtripping and benchmarking across the research ecosystem

    • Abstract: Research Ideas and Outcomes 8: e96117
      DOI : 10.3897/rio.8.e96117
      Authors : Daniel Mietchen : The more findable, accessible, interoperable and reusable (i.e. aligned with the FAIR Principles outlined by Wilkinson et al. 2016) a Digital Object is, the more likely it is to interact with other entities in the research ecosystem and beyond. As long as the interoperability of these entities is not perfect (and it rarely is), a variety of interactions with a given Digital Object (e.g. split, merge, aggregation, transformation, backup, upload, download, or updates of content, metadata, storage or permissions) will mean a variety of representations of it, with some closer to the original than others. This has consequences for how the information about Digital Objects or contained in them can move around the research ecosystem. In some contexts, multiple representations of a given original (or aspects of it) might exist, creating the need to assess similarities, differences and relationships and to include them in curation, management, education, dissemination and preservation workflows. In other contexts, the sole copy of a Digital Object might exist on a legacy system with limited alignment to the FAIR Principles, which creates the need to generate more readily accessible backup copies and to adapt some of them for inclusion in contemporary workflows. In this presentation, we will look at the suitability of sets of FAIR Digital Objects to serve as indicators for several aspects of FAIRness across different elements of the research ecosystem. These sets could involve existing FAIR Digital Objects (e.g. data management plans, as per Mietchen 2021) as well as new or hypothetical ones, and inclusion or exclusion with respect to a given set could be defined using a wide range of criteria pertaining to the ecosystem elements of interest. Taking inspiration from tracing, monitoring, benchmarking and roundtripping activities in various research fields, we will then explore how far, how well and how quickly such sets - or their content - can travel through multiple elements of the research ecosystem (e.g. different databases or software pipelines, or different stages of the research cycle) and what this means in terms of potential improvements to the FAIR Digital Objects themselves, to the sets and their contents, to the way the FAIR assessments are implemented or to relevant elements of the research ecosystem. HTML XML PDF
      PubDate: Wed, 12 Oct 2022 17:30:00 +0300
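      One way to operationalise the "how close to the original" comparison sketched above is a canonical fingerprint of an object's metadata; a minimal sketch (the canonicalisation rule, PID and fields are invented for illustration):

      import hashlib
      import json

      def fingerprint(metadata):
          # Canonicalise (sorted keys, fixed separators) and hash, so two
          # representations of the same Digital Object can be compared after a
          # roundtrip through another system.
          canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
          return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

      original     = {"pid": "21.T11148/example", "title": "Survey dataset", "license": "CC-BY-4.0"}
      roundtripped = {"title": "Survey dataset", "license": "CC-BY-4.0", "pid": "21.T11148/example"}
      assert fingerprint(original) == fingerprint(roundtripped)  # key order does not matter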
       
  • FAIRifying the dependencies of FAIR Digital Objects within and beyond the
           research ecosystem

    • Abstract: Research Ideas and Outcomes 8: e96118
      DOI : 10.3897/rio.8.e96118
      Authors : Daniel Mietchen : As the FAIR Principles about findability, accessibility, interoperability and reusability of research (Wilkinson et al. 2016) reach further and deeper into the research ecosystem, they are increasingly reflected in research policies, research infrastructures, data management plans and other elements of the research landscape. Yet many of these elements are themselves limited in their FAIRness, which hinders the FAIRification - adaptation to the FAIR Principles - of elements that depend on them, e.g. datasets, software, reviews, replication attempts or research evaluation. This can cause friction in alignment with current practices, thereby leading to missed educational and community engagement opportunities and hampering efficient monitoring of compliance or systematic identification of potential improvements. This poster looks at how the FAIRness of FAIR Digital Objects is affected by the FAIRness of their dependencies, focusing on two types of examples - research data policies and research ethics workflows. In the first part, the poster explores how the role of research data-related policies and regulations would change if they increasingly involved FAIR Digital Objects, e.g. if policies and their key stipulations had persistent identifiers linked to well-defined and machine-actionable schemas. These explorations will touch upon both technical and social aspects: what mechanisms are available and already used to increase the FAIRness of policies? Does it help or hinder if certain aspects of the transition to a FAIRer ecosystem are shared in a more or less FAIR way, or with shorter or longer delays? Does having more FAIR policies provide funders, institutions, publishers or other organizations with more of an edge or a handicap in terms of assisting their respective communities in the transition towards more FAIRness in their respective corner of the research ecosystem? How can the design of FAIR policy elements be tailored to optimize learning opportunities for specific stakeholder groups pertaining to specific types of collections of FAIR Digital Objects? In the second part, the poster explores what the benefits and risks would be of making more use of FAIR Digital Objects in research ethics workflows (Hegde et al. 2022). The components considered include the circumstances suggesting or even requiring an ethical review, the types of information that need to be exchanged during the process, the types of communications set up to convey said information, the stakeholders involved in any part of the process, the ways in which metadata about the process is stored and shared, and the rules that govern any of these aspects and related matters. These questions will be discussed from the perspectives of several stakeholder groups, e.g. researchers, research subjects, research administrators, reviewers (on ethics committees or during manuscript or grant proposal review), data stewards, tool developers, science journalists, ethics educators and others.
Another aspect considered is the potential of a more FAIR ethics process to reduce the burden on the stakeholders involved and to make their participation more meaningful, while raising compliance with applicable regulations, increasing the speed and transparency of the process and improving documentation and standardization.Generalizing based on these two examples, the poster concludes with a depiction of how to include dependencies of research-related FAIR Digital Objects in FAIR Digital workflows and assessments or reuses thereof. HTML XML PDF
      PubDate: Wed, 12 Oct 2022 17:30:00 +0300
       
  • FAIR-IMPACT

    • Abstract: Research Ideas and Outcomes 8: e96144
      DOI : 10.3897/rio.8.e96144
      Authors : Jessica Parland-von Essen, Ingrid Dillo : In this poster we present the FAIR-IMPACT project, “Expanding FAIR solutions across EOSC”, which is funded by the European Commission Horizon Europe programme. The acronym FAIR stands for Findable, Accessible, Interoperable and Reusable. The project is coordinated by DANS and supported by 27 additional partners from 11 countries. FAIR-IMPACT will build on the successful practices, policies, tools and technical specifications arising from FAIRsFAIR, other H2020 projects and initiatives, and from the FAIR and other relevant Working Groups of the former European Open Science Cloud (EOSC) Executive Board. The FAIR-IMPACT project is active between June 2022 and May 2025. Cascading grants will be available to support uptake of FAIR solutions and practices. The overall objective of FAIR-IMPACT is to realise a FAIR EOSC, that is, an EOSC of FAIR data and services, by supporting the implementation of FAIR-enabling practices across scientific communities and research outputs at European, national and international levels. Advancing the findability, accessibility, interoperability and reusability (“FAIRness”) of data and other research objects is at the core of this project, which closely collaborates with the FAIRCORE4EOSC project (https://faircore4eosc.eu/). We will coordinate the implementation of frameworks and the alignment of FAIR data practices on metadata and persistent identifiers (PIDs) in order to achieve wide uptake of and compliance with the FAIR principles by national and European research data and metadata providers and repositories. In the poster we present our work on the implementation of the FAIR principles and practices. Among other things, we aim at a coherent implementation of PIDs and more exact data citation, as services are then able to support better data quality with suitable PID solutions. A broader and more targeted use of PIDs, based on end-user needs, can support trust and risk management, and requires collaboration among PID service providers and the development of PID policies. Special attention will be paid to reproducibility and to sensitive data. As semantic artefacts are an important element in creating semantic interoperability, they are also regarded as digital objects with their own recommendations for FAIR implementation, as is software. Working together with the FAIRCORE4EOSC project, we can address things like kernel information profiles that are highly relevant for FAIR digital objects. The project will also focus on increasing data accessibility through enhancing interoperability on all levels, with specific steps taken to address recommendations outlined in the EOSC Interoperability Framework. Of relevance here are the validation of core interoperability components through metadata mechanisms across scientific disciplines, and the fostering of interoperability alignment with the nine European Data Spaces and the DAMA framework (DAMA-DMBoK) within the EOSC ecosystem. Metrics and FAIR assessment are also addressed in this project, which will extend and adapt the FAIRsFAIR data object assessment metrics and the F-UJI tool to be more disciplinary-context aware and to include more discipline-specific tests. The FAIR-IMPACT project will work closely on how the FAIR principles are implemented within the EOSC and how digital objects can be identified and managed. HTML XML PDF
      PubDate: Wed, 12 Oct 2022 17:30:00 +0300
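      The F-UJI assessment tool mentioned above exposes a small REST API; a hedged sketch of invoking it against a published dataset follows. The endpoint path, port, demo credentials and payload fields are assumptions based on the tool's documentation and may differ between versions:

      import requests

      # Hypothetical call against a locally running F-UJI server.
      resp = requests.post(
          "http://localhost:1071/fuji/api/v1/evaluate",
          json={"object_identifier": "https://doi.org/10.3897/rio.8.e96144"},
          auth=("marvel", "wonderwoman"),  # the docs' example credentials; replace locally
          timeout=120,
      )
      print(resp.json().get("summary"))  # per-principle scores (F, A, I, R), if present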
       
  • Agile Research Data Management with FDOs using LinkAhead

    • Abstract: Research Ideas and Outcomes 8: e96075
      DOI : 10.3897/rio.8.e96075
      Authors : Timm Fitschen, Henrik tom Wörden, Alexander Schlemmer, Florian Spreckelsen, Daniel Hornung : One essential question with regard to the implementation of FAIR (Wilkinson et al. 2016) Digital Objects (FDOs) in everyday research is the following: How is data that is acquired in some way transformed into FDOs? Creating FDOs from data is a two-fold problem: "FAIR principles are policies, whereas the digital objects are technical abstractions" (Schwardmann 2020). Regarding the technical side, in order to become FDOs, raw data stored in files and databases have to be bundled with their metadata and PIDs have to be assigned. With good tools at hand, sharing data as an FDO with others might only be a matter of a few mouse clicks -- if the metadata is readily available. However, the process of collecting metadata comes with significant challenges of its own. While sometimes necessary, manual annotation with metadata is error-prone and time-consuming. Due to resource constraints and time pressure, researchers might skip this task whenever it does not have any direct benefits for their work in the time frame of their current project. The experience with existing data repositories tells us that adding metadata at a late stage in the research data life cycle (for example just before publication) delays the problem in the best case. In the worst case, important information has already been lost at that stage. Furthermore, there are the FAIR principles which, for researchers, mean more rules to follow and thus more time spent on data management. Research data should be enriched with FAIR metadata as early as possible to ensure that the research data is FDO-ready when needed. In order to do this, researchers need tools that assist them with the task of making their data FDO-ready, and those tools must not hinder the research process but in the best case even promote it. This means that the drawbacks of making data FDO-ready need to be mitigated and compensated by direct benefits to researchers. In this contribution, we present how early-on FDO-readiness can be achieved with the open-source research data management toolkit LinkAhead and how researchers profit from the FDO-readiness directly in their work. LinkAhead, a CaosDB (Fitschen et al. 2019) distribution, assists its users from the very first steps of data acquisition to the completion of FDOs and data publication by means of a semantic data model, metadata annotation of raw data and powerful search capabilities. Why would researchers do what is necessary to make their data FDO-ready, early on? With LinkAhead, FDO-readiness is a welcome side effect for users. Even though LinkAhead cannot magically generate all relevant metadata and make data FAIR, LinkAhead allows the automation of the process where possible and assists users elsewhere. The inevitable additional work for researchers is reduced as well as compensated with new possibilities for users to work with their data. Thus, users are nudged into storing their data in clear and understandable structures and into annotating their data with high-quality metadata. We will highlight in the following how users benefit from early FDO-readiness in their daily work due to those characteristics of LinkAhead. LinkAhead adapts to the changing needs of the researchers. It thus allows research data management to be an agile process and ensures that researchers can efficiently conduct their daily work.
At the same time, it supports the development, documentation and observance of standards, which is vital for the commensurability, reusability and reproducibility of research findings. LinkAhead is designed to be the first tool after data acquisition and the last tool before the publication of data. It can be fed with data from LIMS, ELNs, and simulation and analysis software, helps with the automation of workflows, and manages raw data in files. Which direct benefit can LinkAhead offer to its users if they do what is necessary to make FDOs from data? When searching for data in general, or FDOs in particular, researchers can employ metadata and the connections among data in order to find what they are looking for. Thereby, browsing the data, for example in the LinkAhead web interface, can be very targeted. Additionally, these search capabilities can be used within analysis workflows in order to create the correct basis of data, using FDOs directly, for the question at hand. Client libraries, like the Python client, allow this to be included in automated analyses. Since manual data insertion is inefficient in many research environments, LinkAhead not only offers the insertion of data via web forms, but also encourages the usage of automatic processes like the LinkAhead Crawler. While metadata should be added already during this insertion step if possible, LinkAhead assists in completing metadata after the initial insertion in order to strike a balance between interrupting the research workflow and running into the above-mentioned challenges when adding metadata too late. This automatic data insertion process is highly customizable and allows data to be complemented as soon as possible such that FDOs are constituted. The semantic data model of LinkAhead allows researchers to use ontologies of their domain but also to extend those where necessary for the work at hand. This allows an agile adaptation to changed requirements or new challenges, and LinkAhead assures compatibility with old data where possible. The data model can capture both relations within an FDO (among metadata, data and possibly files) and references to other FDOs, either directly within LinkAhead or stored elsewhere using PIDs. The semantic data model and additional constraints facilitate the creation and validation of FDOs. LinkAhead allows the FDO concept to be seamlessly integrated into the workflows of its users. Thereby collecting information ...
      PubDate: Wed, 12 Oct 2022 17:30:00 +0300
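      For orientation, a minimal sketch of the early metadata capture described above, using the caosdb Python client that LinkAhead builds on; the record type, properties and query are invented, and the call names follow the caosdb-pylib documentation (treat details as assumptions):

      import caosdb as db  # LinkAhead is a CaosDB distribution

      # Annotate an "Experiment" record at acquisition time, so the metadata a
      # later FDO needs already lives in the semantic data model.
      rec = db.Record().add_parent(name="Experiment")
      rec.add_property(name="date", value="2022-10-12")
      rec.add_property(name="operator", value="jdoe")
      rec.insert()

      # The same metadata drives targeted search when assembling data for analysis:
      results = db.execute_query("FIND Experiment WITH date = '2022-10-12'")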
       
  • FAIR Begins at home: Implementing FAIR via the Community Data Driven
           Insights

    • Abstract: Research Ideas and Outcomes 8: e96082
      DOI : 10.3897/rio.8.e96082
      Authors : Carlos Utrilla Guerrero, Maria Vivas Romero : Arguments for the FAIR (Findable, Accessible, Interoperable and Reusable) principles of science have mostly been based on appeals to values. However, the work of onboarding diverse researchers to make efficient and effective implementations of FAIR requires different appeals. In our recent effort to transform the institution into a FAIR University by 2025, here we report on the experiences of the Community of Data Driven Insights (CDDI), an interfaculty initiative in which all university-wide research data service providers are joined together to support researchers and research groups with all aspects concerning research data management. CDDI aims to turn all digital objects within Maastricht University (UM) into FAIR Digital Objects (FDOs) and, by disclosing the progress and challenges of implementing FDOs (e.g. see the CDDI OSF repo: https://osf.io/398cz/), we hope to shed light on the process in a way that might be useful for other institutions in Europe and elsewhere. We initially identified five challenges for FDO implementation. These challenges were first a matter of reshaping the culture of science-making practices to fit the FAIR principles. Additionally, they required educational awareness within the scientific communities and, finally, financial and technical tools to actually facilitate the transition to FAIR practices of science-making. These perspectives show the complex dimensions of FAIR principles and FDO implementation for researchers across disciplines in a single university. HTML XML PDF
      PubDate: Wed, 12 Oct 2022 17:30:00 +0300
       
  • Connecting research-related FAIR Digital Objects with communities of
           stakeholders

    • Abstract: Research Ideas and Outcomes 8: e96119
      DOI : 10.3897/rio.8.e96119
      Authors : Daniel Mietchen : The last few years have seen considerable progress in terms of integrating individual elements of the research ecosystem with the so-called FAIR Principles (Wilkinson et al. 2016), a set of guidelines to make research-related resources more findable, accessible, interoperable and reusable (FAIR). This integration process has lots of technical as well as social components and ramifications, some of which resulted in dedicated terms like that of a FAIR Digital Object (FDO), which stands for research objects (e.g. datasets, software, specimens, publications) having at least a minimum level of compliance with the FAIR Principles. As the volume, breadth and depth of FAIR data and the variety of FAIR Digital Objects as well as their use and reuse continue to grow, there is ample opportunity for multi-dimensional interactions between generators, managers, curators, users and reusers of data, and the scope of data quality issues is diversifying accordingly. This poster looks at two ways in which individual collections of FAIR Digital Objects interact with the wider FAIR research landscape. First, it considers communities that curate, generate or use data, metadata or other resources pertaining to individual collections of FAIR Digital Objects. Specifically, which of these community activities are affected by higher or lower compliance of a collection's FDOs with the FAIR Principles? Second, we will consider the case of communities that overlap across FAIR collections - i.e. when some community members are engaged with several collections, possibly through multiple platforms - and what this means in terms of challenges and opportunities for enhancing findability, accessibility, interoperability and reusability between and across FAIR silos. HTML XML PDF
      PubDate: Wed, 12 Oct 2022 17:30:00 +0300
       
  • Two Examples on How FDO Types can Support Machine and Human Readability

    • Abstract: Research Ideas and Outcomes 8: e96014
      DOI : 10.3897/rio.8.e96014
      Authors : Ulrich Schwardmann, Tibor Kálmán : FAIR Digital Objects (FDOs) are typed by a well-described set of attributes, where attributes are key-value pairs whose key refers to a syntactic description of the value. The description of the set of attributes is often also called a profile. The exact description of the attribute keys is obviously crucial for machine actionability on the one hand. On the other hand, an exact description of attributes can also be a way to allow human readability of the used keys. Furthermore, it can often integrate legacy attribute sets that are provided inside repositories for the description of their digital objects.
      In the following we show two examples of FDO types with their attributes from different viewpoints: the Persistent Identifiers (PID) for Instruments example and the DARIAH (see https://de.dariah.eu) use case. In both cases the Handle system is used for the persistent identifiers, the FDO record is provided by the Handle record of the PID, and the attributes can be found there as type-data pairs in the phrasing of the Handle system.
      1. Example 1: PID for Instruments
      The PID for Instruments example goes back to the development of kernel metadata, which is seen as minimally required to reference and describe scientific instruments (Stocker et al. 2020). The value space for the attributes here often contains hierarchical objects and can also be lists of attributes. An example of such an attribute definition is that of a single manufacturer of an instrument, which occurs in a list of manufacturers here: http://dtr-test.pidconsortium.eu/#objects/21.T11148/7adfcd13b3b01de0d875.
      1.1. The Handle Record for a Full PID for Instruments
      In this case one uses the references to the attribute definitions as keys for the values, which are often lists or objects. The Handle record for a full attribute list of a PID for Instruments can be obtained from the Handle proxy with https://hdl.handle.net/21.T11998/0000-001A-3905-1?noredirect. The structure of this FDO record is defined as a type definition at the ePIC Data Type Registry (Schwardmann 2020) with http://dtr-test.pidconsortium.eu/objects/21.T11148/17ce618137e697852ea6. This way we can also refer to this structure definition in a qualified key-value pair like TYPE/0.TYPE and then use as keys in the FDO record the names as they are given for keys in this structure. This way an FDO record becomes human readable without losing any machine readability. For further details see: https://hdl.handle.net/21.T11998/0000-001A-3905-8?noredirect. In both cases the full instrument descriptions are completely stored in the Handle database of the Handle PID service. The PID itself is a metadata object and can be seen as an FDO of its own.
      1.2. Type for a PID4Inst based on Attributes
      The type for such FDOs is given via the proxy https://hdl.handle.net/21.T11148/17ce618137e697852ea6 in the ePIC DTR.
      1.3. PID4Inst in a Repository
      Another option is to store the metadata objects of instrument descriptions in repositories. In this case a schema is needed to describe the metadata elements that are needed for the description. The existing attribute definitions could be bundled here into a single complex definition, which is syntactically almost identical to the type definition for instruments. From such a complex definition one could derive a schema for the repository entries. In this case the schema was directly derived from the type, which is conceptually different from attribute definitions, but syntactically similar and therefore exploitable by the same services. The result of the schema derivation can then be directly fed into the ingest module of repositories, for example into the CORDRA schema module for the definition of attribute types: https://hdl.handle.net/21.T11148/c2c8c452912d57a44117. An example of such a PID for an instrument object in a repository is given at https://vm11.pid.gwdg.de:8445/objects/21.11145/8fefa88dea40956037ec.
      2. Example 2: The DARIAH Use Case
      This example evolved in the humanities in the DARIAH project about five years ago with the DARIAH repository (Kálmán et al. 2016, DARIAH-DE 2016). The Handle record structure was created far before FDO records were discussed. It uses key-value pairs with human-readable keys as the type and provides relatively atomic values. For humans the key here is a description of the value space that can be expected: https://hdl.handle.net/21.11113/0000-000B-CA4C-D?noredirect. The use of human-readable keys does, however, not match the goal of machine readability of this description. Additionally, it carries the risk of uncertainty and ambiguity.
      2.1. Attribute Definitions
      In order to make these attributes machine readable, attribute definitions for the allowed value spaces were needed and can be found in the ePIC data type registries. The following basic information type for an email address, for instance, can be used as the reference key for the value space given for the 'RESPONSIBLE' type above: https://dtr-test.pidconsortium.eu/#objects/21.T11148/e117a4a29bfd07438c1e. Attribute definitions for all attributes used in the DARIAH example are given in the ePIC data type registries. This way one is able to define a type for the DARIAH Handle records.
      2.2. An FDO Type for Legacy Repository Records
      Such a type definition is given at: https://dtr-test.pidconsortium.eu/#objects/21.T11148/f1eea855587d8b1f66da. If this type is the known type of all objects in the DARIAH repository, the references to the keys are named very similarly to the human-readable form of the Handle record. Usually, as we have seen in the previous PID4Inst example, the type of the FDO would be another attribute of the FDO. This would require an adaptation of the attributes of all digital objects of the DARIAH repository. But since all digital objects of the DARIAH repository f...
      PubDate: Wed, 12 Oct 2022 17:30:00 +0300
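      Since the FDO record here is simply the Handle record of the PID, its type-data pairs can be listed programmatically; a small sketch against the Handle proxy's REST API, using the instrument PID from the text (the API path follows the Handle.net proxy documentation; treat response details as assumptions):

      import requests

      # GET on the proxy's REST API returns the record's type-data pairs, i.e.
      # the attribute key-value pairs described in the abstract.
      r = requests.get("https://hdl.handle.net/api/handles/21.T11998/0000-001A-3905-1",
                       timeout=30)
      for entry in r.json().get("values", []):
          print(entry["type"], "->", entry["data"]["value"])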
       
  • Current Developments in the Research Data Repository RADAR

    • Abstract: Research Ideas and Outcomes 8: e96005
      DOI : 10.3897/rio.8.e96005
      Authors : Felix Bach, Kerstin Soltau, Sandra Göller, Christian Bonatto Minella : RADAR is a cross-disciplinary internet-based service for long-term and format-independent archiving and publishing of digital research data from scientific studies and projects. The focus is on data from disciplines that are not yet supported by specific research data management infrastructures. The repository aims to ensure access and long-term availability of deposited datasets according to FAIR criteria (Wilkinson et al. 2016) for the benefit of the scientific community. Published datasets are retained for at least 25 years; for archived datasets, the retention period can be flexibly selected up to 15 years. The RADAR Cloud service was developed as a cooperation project funded by the DFG (2013-2016) and started operations in 2017. It is operated by FIZ Karlsruhe - Leibniz-Institute for Information Infrastructure. As a distributed, multilayer application, RADAR is structured into a multitude of services and interfaces. The system architecture is modular and consists of a user interface (frontend), a management layer (backend) and a storage layer (archive), which communicate with each other via application programming interfaces (APIs). This open structure and external access to the APIs allow RADAR to be integrated into existing systems and work processes, e.g. for automated upload of metadata from other applications using the RADAR API. RADAR's storage layer is encapsulated via the Data Center API. This approach guarantees independence from a specific storage technology and makes it possible to integrate alternative archives for the bitstream preservation of the research data. The data transfer to RADAR takes place in two steps. In the first step, the data is transferred to a temporary work storage. The ingest service accepts individual files and packed archives, optionally unpacks them while retaining the original directory structure, and creates a dataset. For each file found, the MIME type (see the Multipurpose Internet Mail Extensions specification) is analysed and stored in the technical metadata. When archiving and publishing, a dataset is created in the second step. The structure of this dataset - the AIP (archival information package) in the sense of the OAIS standard - corresponds to the BagIt standard. It contains, in addition to the actual research data in original order, technical and descriptive metadata (if created) for each file or directory, as well as a manifest, within one single TAR ("tape archive", a unix archiving format and utility) file as an entity in one place. This TAR file is stored permanently on magnetic tape, redundantly in three copies at different locations in two academic computing centres. The FAIR Principles are currently being given special importance in the research community. They define measures that ensure the optimal processing of research data, accessibility for both humans and machines, as well as reusability for further research. RADAR also promotes the implementation of the FAIR Principles with different measures and functional features, amongst others:
      • Descriptive metadata are recorded using the internal RADAR Metadata Schema (based on DataCite Metadata Schema 4.0), which supports 10 mandatory and 13 optional metadata fields. Annotations can be made on the dataset level and on the individual files and folders level.
      • A user licence, which rules re-use of the data, must be defined for each dataset.
      • Each published dataset receives a DOI which is registered with DataCite.
      • RADAR metadata uses a combination of controlled lists and free-text entries. Author identification is ensured by using an ORCID iD and funder identification by the CrossRef Open Funder Registry. More interfacing options, e.g. ROR and the Integrated Authority File (GND), are currently being implemented.
      • Datasets can be easily linked with other digital resources (e.g. text publications) via a “related identifier”.
      • To maximise data dissemination and discoverability, the metadata of published datasets are indexed in various formats (e.g. DataCite and DublinCore) and offered for public metadata harvesting, e.g. via an OAI provider.
      These measures are - to our minds - undoubtedly already significant, but not yet sufficient in the medium to long term. Especially in terms of interoperability, we see development potential for RADAR. The FAIR Digital Object (FDO) Framework seems to offer a promising concept, especially to further promote data interoperability and to close respective gaps in the current infrastructure and repository landscape. RADAR aims to participate in this community-driven approach also in its role within the National Research Data Infrastructure (NFDI). As part of the NFDI, RADAR already plays a relevant role as a generic infrastructure service in several NFDI consortia (e.g. NFDI4Culture and NFDI4Chem). With RADAR4Chem and RADAR4Culture, FIZ Karlsruhe for example offers researchers from chemistry and the cultural sciences low-threshold data publication services based on RADAR. We successively develop these services further according to the needs of the communities, e.g. by integrating and linking them with subject-specific terminologies, by providing annotation options with subject-specific metadata or by enabling selective reading or previewing options for individual files in existing datasets. In our presentation, we would like to describe the present and future functionality of RADAR and its current level of FAIRness as possible starting points for further discussion with the FDO community with regard to the implementation of the FDO framework for our service. HTML XML PDF
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • An Introduction to Cordra

    • Abstract: Research Ideas and Outcomes 8: e95966
      DOI : 10.3897/rio.8.e95966
      Authors : Robert Tupelo-Schneck : Cordra is a digital object server that can function as a key infrastructural piece in FAIR DO (findable, accessible, interoperable and reusable digital object) implementations. Cordra manages JSON records and payloads as typed digital objects identified by handles. Cordra is neither a database nor an indexer, but it integrates the two and provides a unified interface. Cordra is intended to support both quick prototyping and production systems.
      For prototyping, Cordra makes it easy to get up and running rapidly with a digital object server. A potential Cordra administrator can download Cordra and very quickly have a server which supports creation, search, and retrieval of digital objects with resolvable identifiers. The server supports the Digital Object Interface Protocol (DOIP) and HTTP APIs out of the box, as well as an immediately usable prototype user interface. Cordra saves substantial development time as it comes with ready-made functionality ranging from user authentication and access control to information validation, enrichment, storing, and indexing. By default, Cordra is configured to store objects on the local file system of the machine and to use embedded Apache Lucene for indexing. Simply by editing type definitions in Cordra's user interface, the administrator can start changing the behavior of the APIs and user interface in real time for experimentation, including adding custom operations.
      For production use, Cordra allows intensive extension and customization of the processes underlying the digital object server: how digital objects are stored and indexed, how they are validated and enriched, how users authenticate, when and to whom to give access to objects, and what custom operations can be performed. In production, Cordra is run at scale, supporting high reliability and performance; among other options, Cordra supports MongoDB and Amazon S3 for storage, and Elasticsearch and Apache Solr for indexing. By definition of the underlying types and operations, Cordra is intended to serve directly as the API backend for a production application.
      This talk will cover basic Cordra features as well as customization/configuration basics. Examples of current use will be shown, including the use of the Digital Object Interface Protocol (DOIP), for which Cordra serves as a reference implementation. Current users of Cordra include the Derivatives Service Bureau (DSB), which uses Cordra as part of its backend to manage the automated generation of International Securities Identification Numbers (ISINs) for OTC derivatives in the financial services sector; and the British Standards Institution (BSI), whose Identify service for construction product manufacturers aims to assign a Universal Persistent Identification Number (UPIN) "for every product that is specified and incorporated in a building structure". The DSB, a Cordra user since 2017, has a production system with over 80 million identified digital objects which receives millions of searches each month. BSI.Identify has a system where Cordra's DOIP interface is directly accessible as the service's public API. HTML XML PDF
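      As a rough illustration of the out-of-the-box HTTP API described above, a sketch against a local test instance; the /objects endpoint shape, the type name and the credentials are assumptions to be adapted to a real deployment.

          # Create and retrieve a typed digital object via Cordra's HTTP API
          # (endpoint layout, type name and credentials are assumed placeholders).
          import requests

          BASE = "https://localhost:8443"

          resp = requests.post(
              f"{BASE}/objects",
              params={"type": "Document"},                  # 'Document' is illustrative
              json={"name": "example", "description": "created via HTTP API"},
              auth=("admin", "password"),
              verify=False,                                 # local self-signed certificate
          )
          obj = resp.json()
          print(obj.get("id"))                              # handle assigned by the server

          # Retrieve the object back by its identifier
          print(requests.get(f"{BASE}/objects/{obj['id']}", verify=False).json())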
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Starting FDO in the Cradle -- ROcrating Live Data

    • Abstract: Research Ideas and Outcomes 8: e95972
      DOI : 10.3897/rio.8.e95972
      Authors : Guido Aben, Juri Hößelbarth : This talk discusses the use of FAIR Digital Objects (FDOs for short) for a democratised approach to FAIRness, that is, adherence to the Findable/Accessible/Interoperable/Reusable set of requirements, collectively called FAIR. This capability is being built for the CS3MESH4EOSC project.
      CS3MESH4EOSC is a 3-year EU-funded project in the EOSC context (we started Feb 2020) that addresses the challenges of the fragmentation of file and application services, digital sovereignty and the application of FAIR principles in the everyday practice of researchers. Initially, 7 major data services will be combined into ScienceMesh - a federated service mesh providing a frictionless collaboration platform for hundreds of thousands of users (researchers, engineers, students and staff). The service will offer easy access to data across institutional and geographical boundaries. The infrastructure will be gradually expanded and offered to the entire education and research community in Europe and beyond. The initial service will connect services in NL (SURFdrive), PL (PSNCBox), AU (CloudStor), DE (Sciebo), CZ (owncloud@CESNET), CH (SWITCHdrive) and DK (ScienceData), as well as domain data stores at CERN (CERNBox) and the EU's own Joint Research Centre's Copernicus (earth observation) datastore. The CS3MESH4EOSC project is busy designing, building and deploying the necessary technology to achieve this.
      CS3MESH4EOSC grew out of the grassroots "CS3" community, which started out as a self-help forum of infrastructure builders and providers from the academic sector who look after rapidly growing datastores of the "synch-'n'-share" paradigm (Dropbox is a commercial equivalent); this type of store is growing rapidly more popular as a basic building block for live data storage and collaboration in research.
      The mission of the CS3MESH4EOSC project is to improve scientific collaboration across the entire mesh (essentially an interoperating federation of data stores), and to ensure that data sharing across this resulting mesh is done according to FAIR principles. This puts CS3MESH4EOSC in a unique position: we need to bring FAIR tooling in front of a broad audience of research users (not just "FAIR literate" ones), and convince them that FAIRness is relevant at the point where live data is being collected, not just when data has congealed into collections. We recognise two main obstacles. First, FAIR-aware infrastructure needs to be simply available, right in front of every user's face, and be so usable that it gets broad uptake; as a rule of thumb, every additional step required sheds half of the user base you start out with. Second, research communities need to be motivated, trained and assisted to use the FAIR infrastructure; it needs to make their lives easier. Without relevant infrastructure in place, there is no point in mounting FAIRness awareness campaigns.
      Therefore, CS3MESH4EOSC's approach to FAIR uptake is to start from the Science Mesh of datastores as described in the first paragraph, already in widespread use by researchers. We add a FAIR Description Service to these stores, available for any researcher of the system to use (an instance of the "Describo" tool, https://arkisto-platform.github.io/tools/description/describo-online/). Thus they can create FAIR Digital Object packages from their own data (using the RO-Crate standard) and also manage the deposition process, initially targeting the open access Zenodo and Dataverse repository services and the Open Science Framework (OSF) science workflow portal. The resulting system of metadata annotation and user guidance wizards that facilitate the process is called "ScieboRDS" (https://www.research-data-services.org/page/about/).
      By thus adding metadata awareness and annotation capabilities to this mesh, which already has several hundreds of thousands of users and tens of petabytes of live data on it, we end up with a democratised, low-barrier-of-entry approach to FAIR. Allowing researchers to generate FDOs from their live data (no onerous upload / collection steps) will help create more FAIR data supply. The capabilities described thus far are already available and are being deployed to users starting Q3 2022. Further development is underway that allows better capability negotiation between the live data store ("the Science Mesh") and the backend repository, such that users can rely on the relevant schema being autoprovisioned and ontologies being agreed upon before the FDO is packaged, thus improving metadata quality. HTML XML PDF
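      For readers unfamiliar with the format: an RO-Crate is a directory carrying an ro-crate-metadata.json descriptor alongside the data. A minimal example of the kind of descriptor a tool like Describo produces (entity names and values illustrative), written out from Python:

          # Writes a minimal RO-Crate 1.1 descriptor; names/values are illustrative.
          import json

          metadata = {
              "@context": "https://w3id.org/ro/crate/1.1/context",
              "@graph": [
                  {
                      "@id": "ro-crate-metadata.json",
                      "@type": "CreativeWork",
                      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                      "about": {"@id": "./"},
                  },
                  {
                      "@id": "./",
                      "@type": "Dataset",
                      "name": "Live measurement series (example)",
                      "hasPart": [{"@id": "measurements.csv"}],
                  },
                  {"@id": "measurements.csv", "@type": "File",
                   "encodingFormat": "text/csv"},
              ],
          }

          with open("ro-crate-metadata.json", "w") as f:
              json.dump(metadata, f, indent=2)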
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Are Fair Digital Objects and Digital Twins the same thing?

    • Abstract: Research Ideas and Outcomes 8: e95975
      DOI : 10.3897/rio.8.e95975
      Authors : Mark Wharton : Semantically-defined Digital Twins (DTs) and FAIR Digital Objects (FDOs) are similar in concept. They both adopt the "find first" philosophy and they both point to data about some entity in the real world (where "entity" is a very loose definition of any asset, real or imaginary). Are there any parallels that we can draw? Can we use a digital twin platform to host FDOs? Are FAIR Digital Objects and Digital Twins the same thing?
      The objectives of the GO FAIR organisation, namely that data should be Findable, Accessible, Interoperable and Reusable, were originally designed to help human beings find and understand data. There has always been a side-helping of desire for computers and machine intelligences to find and use that same data. The FDOs that people normally think of are datasets, probably time-series data collected over a considerable period, amounting to a large file or a database of records.
      What if you took the concept of FDOs and compared it to Digital Twins? If you define a Digital Twin as a virtual representation of a real-world object, entity or concept, in that its identity, metadata and data are stored “in the cloud”, then it’s not a big jump from that to an FDO. If you approach it from the opposite direction - reduce the size of an FDO until it’s an individual entity, not a collection - then you arrive at the same destination. In our proposed talk we will outline the ways that FDOs and semantically-defined digital twins overlap. We will show real-world examples of Digital Twins in action and argue that an FDO of sufficient granularity would serve a very similar purpose.
      The Venn diagram of FDOs and Digital Twins is not a circle. Digital twins have attributes that FDOs do not have. For example, they might have real-time feeds of data (such as current temperature, fuel consumption, heart-rate… ), or they might have a control interface to change their state (turn on or off, raise a drawbridge, etc.). These dynamic activities hint at one big difference between the two - digital twins have behaviour. They do things in real time. There’s nothing in the FDO specification that suggests that’s important or required, but we would argue that’s from lack of imagination rather than lack of desirability. The other lobe of the Venn diagram - the properties that FDOs have that Digital Twins do not - is mainly concerned with historical data. Digital twins are about the “now” of an entity, while an FDO is a historical collection of data, possibly about many entities over a period of time. This mismatch can be overcome by using the Digital Twin concept to define the dataset and using the “data bypass” pattern to allow the dataset to be transferred “out of band” to the requestor, given sufficient access permissions.
      Perhaps the strict equivalence that FDOs are Digital Twins is going too far. They certainly share many common features and posit a world where machines and humans can cooperate using the same data. Maybe the answer is to have an admixture of the two? HTML XML PDF
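      A conceptual toy (ours, not from the talk) of the overlap being argued: a digital twin as an FDO-like identified record plus live state and behaviour.

          # Toy contrast between a static FDO-like record and a digital twin
          # that adds live readings and a control interface; values invented.
          from dataclasses import dataclass, field

          @dataclass
          class FDORecord:
              pid: str            # persistent identifier
              metadata: dict      # descriptive metadata
              data_ref: str       # where the historical dataset lives

          @dataclass
          class DigitalTwin(FDORecord):
              state: dict = field(default_factory=dict)   # the "now" of the entity

              def ingest_reading(self, key, value):
                  self.state[key] = value                 # e.g. temperature, heart rate

              def command(self, action):
                  return self.pid + ": executing " + action   # e.g. raise a drawbridge

          bridge = DigitalTwin("hdl:21.x/bridge-1", {"type": "drawbridge"}, "s3://logs/bridge-1")
          bridge.ingest_reading("deck_angle_deg", 12.5)
          print(bridge.command("raise"))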
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Explainability Using Bayesian Networks for Bias Detection: FAIRness with
           FDO

    • Abstract: Research Ideas and Outcomes 8: e95953
      DOI : 10.3897/rio.8.e95953
      Authors : Ronit Purian, Natan Katz, Batya Feldman : In this paper we aim to provide an implementation of the FAIR Data Points (FDP) spec that will apply our bias detection algorithm and automatically calculate a FAIRness score (FNS). FAIR metrics would themselves be represented as FDOs, could be presented via a visual dashboard, and would be machine accessible (Mons 2020, Wilkinson et al. 2016). This will enable dataset owners to monitor the level of FAIRness of their data. This is a step forward in making data FAIR, i.e., Findable, Accessible, Interoperable, and Reusable; or simply, Fully AI Ready data.
      First we may discuss the context of this topic with respect to Deep Learning (DL) problems. Why are Bayesian Networks (BN, explained below) beneficial for such issues?
      Explainability – Obtaining a directed acyclic graph (DAG) from BN training provides coherent information about the independence relations among variables in the database. In a generic DL problem, features are functions of these variables. Thus, one can derive which variables are dominant in our system. When customers or business units are interested in the cause of a neural net outcome, this DAG structure can be a source both to provide importance and to clarify the model.
      Dimension Reduction – BN provides the joint distribution of our variables and their associations. The latter may play a role in reducing the features that we feed to the DL engine: if we know that for random variables X, Y the conditional entropy of X given Y is low, we may omit X, since Y provides nearly all of its information. We have, therefore, a tool that can statistically exclude redundant variables.
      Tagging Behavior – This section can be less evident for those who work in domains such as vision or voice. In some frameworks, labeling can be an obscure task (to illustrate, consider a sentiment problem with many categories that may overlap). When we tag the data, we may rely on some features within the datasets and generate conditional probability. Training a BN, when we initialize an empty DAG, may provide outcomes in which the target is a parent of other nodes. Observing several tested examples, these outcomes reflect these “taggers’ manners”. We can therefore use DAGs not merely for the purpose of model development in machine learning but mainly for learning taggers' policy and improving it if needed.
      The conjunction of DL and Causal Inference – Causal inference is a highly developed domain in data analytics. It offers tools to resolve questions that, on the one hand, DL models commonly do not address and that, on the other hand, the real world raises. There is a need to find a framework in which these tools work in conjunction. Indeed, such frameworks already exist (e.g., GNN), but a mechanism that merges typical DL problems with causality is less common. We believe that the flow, as described in this paper, is a good step in the direction of achieving benefits from this conjunction.
      Fairness and Bias – Bayesian networks, in their essence, are not a tool for bias detection, but they reveal which of the columns (or which of the data items) is dominant and modifies other variables. When we discuss noise and bias, we attribute these faults to the column, and not to the model or to the entire database. However, assume we have a set of tools to measure bias (Purian et al. 2022). Bayesian networks can provide information about the prominence of these columns (as they are “cause” or “effect” in the data), thus allowing us to assess the overall bias in the database.
      What are Bayesian Networks? The motivation for using Bayesian Networks (BN) is to learn the dependencies within a set of random variables. The networks themselves are directed acyclic graphs (DAG), which mimic the joint distribution of the random variables (e.g., Perrier et al. 2008). The graph structure follows the probabilistic dependency factorization of the joint distribution: a node V depends only on its parents (a random variable X independent of the other nodes will be presented as a parent-free node).
      Real-World Example – In this paper we present a way of using the DL engine's tabular data with the Python package bnlearn. Since this project is commercial, the variable names were masked; thus, they will have meaningless names.
      Constructing Our DAG – We begin by finding our optimal DAG:
          import bnlearn as bn
          DAG = bn.structure_learning.fit(dataframe)
      We now have a DAG. It has a set of nodes and an adjacency matrix that can be inspected as follows:
          print(DAG['adjmat'])
      The outcome has the form shown in Fig. 1a, where rows are sources (namely, the direction of the arc is from the left column to the elements in the row) and columns are targets (i.e., the header of the column receives the arcs). When we begin drawing the obtained DAG, we get for one set of variables the image in Fig. 1b. We can see that the target node in the rectangle is a source for many nodes, and that it still points arrows itself to two nodes. We will return to this in the discussion (cf. Rauber 2021). We have more variables; therefore, we increased the number of nodes. Adding the information provided a new source for the target (i.e., its entire row is “False”). The obtained graph is shown in Fig. 1c.
      So, we know how to construct a DAG. Now we need to train its parameters. Code-wise we perform this as follows:
          model_mle = bn.parameter_learning.fit(DAG, dataframe, methodtype='maximumlikelihood')
      We can replace 'maximumlikelihood' with 'bayes', as described below. The outcome of this training is a set of factorized conditional distributions that reflect the DAG's structure. It has the form shown in Fig. 1d for a given variable. The code to create the DAG presentation is provided in Fig. 2.
      Discussion – In this paper we have presented some of the theoretical concepts of Bayesian Networks and the usa...
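      The snippets above assume the paper's masked, commercial dataframe. For a runnable end-to-end version, a sketch using bnlearn's bundled 'sprinkler' toy dataset (our substitution, standing in for the real data; the methodtype strings follow the paper's usage):

          # Self-contained variant of the workflow above, using bnlearn's
          # bundled 'sprinkler' toy data instead of the paper's masked dataframe.
          import bnlearn as bn

          dataframe = bn.import_example('sprinkler')   # small discrete example dataset

          # Step 1: learn the DAG structure from the data
          DAG = bn.structure_learning.fit(dataframe)
          print(DAG['adjmat'])                         # rows = sources, columns = targets

          # Step 2: learn the conditional probability tables on that DAG
          model_mle = bn.parameter_learning.fit(DAG, dataframe,
                                                methodtype='maximumlikelihood')
          # methodtype='bayes' switches to Bayesian parameter estimation

          # Visualize the learned network (cf. Fig. 1b/1c)
          bn.plot(model_mle)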
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Computable phenotypes for cohort identification: core content for a new
           class of FAIR Digital Objects

    • Abstract: Research Ideas and Outcomes 8: e95856
      DOI : 10.3897/rio.8.e95856
      Authors : Marisa Conte, Allen Flynn, Peter Boisvert, Zach Landis-Lewis, Rachel Richesson, Charles Friedman : Introduction – We present current work to develop and define a class of digital objects that facilitates patient cohort identification for clinical studies, such that these objects are Findable, Accessible, Interoperable, and Reusable (FAIR) (Wilkinson et al. 2016). Developing this class of FAIR Digital Objects (FDOs) builds on several years of work to develop the Knowledge Grid (https://kgrid.org/), which facilitates the development, description and implementation of biomedical knowledge packaged in machine-readable and machine-executable formats (Flynn et al. 2018). Additionally, this work aligns with the goals of the Mobilizing Computable Biomedical Knowledge (MCBK) community (https://mobilizecbk.med.umich.edu/) (Mobilizing Computable Biomedical Knowledge 2018). In this abstract, we describe our work to develop an FDO carrying a computable phenotype.
      Defining computable phenotypes – In biomedical informatics, 'phenotyping' describes a data-driven approach to identifying a group of individuals sharing observable characteristics of interest, generally related to a disease or condition, and a 'computable phenotype' (CP) is a machine-processable expression of a phenotypic pattern of these characteristics (Hripcsak and Albers 2018). For the purposes of this work, we are interested in CPs derived from data contained in electronic health record (EHR) systems. This includes both structured data, e.g. codes for diseases, diagnoses, procedures, or laboratory tests, and unstructured data, e.g. free text including patient histories, clinical observations, discharge summaries, and reports. Thus, we define computable phenotype FDOs (CP-FDOs) as a class of FDO that packages an executable EHR-derived CP together with the documentation needed to implement and use it effectively for creating cohorts of individuals with similar observable characteristics from EHR data sets.
      Importance of portable and FAIR CPs – There is tremendous excitement for using real-world EHR data to discover important findings about human health and well-being. However, for discovery to happen, researchers need mechanisms like CPs to identify study cohorts for analysis. Beginning in the early 2010s, a growing literature explores various methods for the secondary use of EHR data for patient phenotyping to arrive at consistent study cohorts (Shivade et al. 2014, Banda et al. 2018). The heterogeneous nature of EHR data has inspired a wide variety of phenotyping methods, from those which rely solely on documented codes linked to terms in existing vocabularies to those which combine such codes with other concepts extracted from free text using natural language processing. Our current focus is on packaging CPs inside FDOs for classifying patients as having or not having a phenotype of interest. This can be done within an individual health system, or at scale across a clinical data research network. Using CPs for cohort identification can reduce the time and expense of traditional data set building and clinical trial recruitment, and expand the potential scope of a study population (Boland et al. 2013).
      Creating and validating CPs requires time, resources, and both clinical and technical expertise. One estimate is that it can take 6-10 months to develop and validate a CP (Shang et al. 2019). And, as there is no standard data model within EHRs in the United States, many CPs are designed for performance at a single site, rather than for portability, which is understood as the ability to implement a phenotype at a different site with similar performance (Shang et al. 2019). While portability is increasingly recognized as an important element of phenotyping, and there have been recent efforts to develop more portable CPs, many of these processes still require significant technical expertise at the implementation site to adapt the phenotype for use on local data. There may also be significant advantages to making CPs FAIR. These include transparency in cohort selection and better generalizability of results. FAIR CPs may also increase the potential for robust comparisons of data from related studies, leading to better evidence synthesis to improve delivery of care and ultimately human health.
      Defining a new class of FDOs to hold and convey CPs – We believe that packaging validated CPs inside digital objects may alleviate many of the pressures mentioned above, and contributes to making both the processes and products of clinical research more FAIR. To this end, our current work focuses on packaging a validated CP inside a machine-processable FDO. The phenotype of interest identifies pediatric and adult patients with a rare disease (Oliverio et al. 2021), and has several features which make it ideal for transformation to an executable FDO. First, the phenotype utilizes standards to define the clinical characteristics of interest, and is based on a common data model; these features increase the potential for both interoperability and reuse. Additionally, because the phenotype has been validated across three sites, its portability has already been demonstrated. Finally, the full computable phenotype has been shared as a series of SQL queries, including scripts for patient identification, deriving statistics, and validation, which have been annotated with instructions for implementation at other sites.
      The goals of this work are: to develop CPs as executable DOs, leveraging previous work on executable Knowledge Objects (KOs) (Flynn et al. 2018); and to advance our understanding of how to define computable phenotypes as a class of FDO, including what is needed to meet the requirements of binding, abstraction, and encapsulation (Wittenburg et al. 2019).
      Conclusion – Computable phenotypes, packaged as FDOs, may increase the potential both for the portability of a ph...
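      The abstract does not fix a serialization for CP-FDOs, so the following is a purely hypothetical sketch (every field name is ours, not the authors') of how a validated CP's SQL scripts and implementation notes might be bundled into one machine-processable package:

          # Hypothetical CP-FDO manifest; field names are illustrative only.
          import json

          cp_fdo = {
              "identifier": "hdl:example/cp-rare-disease-001",   # placeholder PID
              "type": "ComputablePhenotype",
              "dataModel": "a common data model, per the abstract",
              "payloads": {                                      # the shared SQL series
                  "identify_cohort.sql": "-- patient identification query (elided)",
                  "derive_statistics.sql": "-- statistics derivation query (elided)",
                  "validate.sql": "-- validation queries (elided)",
              },
              "documentation": "Annotated instructions for implementation at other sites.",
              "validatedSites": 3,
          }
          print(json.dumps(cp_fdo, indent=2))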
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Challenges for FAIR Digital Object Assessment

    • Abstract: Research Ideas and Outcomes 8: e95943
      DOI : 10.3897/rio.8.e95943
      Authors : Esteban Gonzalez, Daniel Garijo, Oscar Corcho : A Digital Object (DO) "is a sequence of bits, incorporating a work or portion of a work or other information in which a party has rights or interests, or in which there is value". DOs should have persistent identifiers and metadata, and be readable by both humans and machines. A FAIR Digital Object is a DO able to interact with automated data processing systems (De Smedt et al. 2020) while following the FAIR principles (Findable, Accessible, Interoperable and Reusable) (Wilkinson et al. 2016). Although FAIR was originally targeted towards data artifacts, new initiatives have emerged to adapt it to other digital research resources such as software (Katz et al. 2021, Lamprecht et al. 2020), ontologies (Poveda-Villalón et al. 2020), virtual research environments and even DOs (Collins et al. 2018). In this paper, we describe the challenges of assessing the level of compliance of a DO with the FAIR principles (i.e., its FAIRness), assuming that a DO contains multiple resources and captures their relationships. We explore different methods to calculate an evaluation score, and we discuss the challenges and importance of providing explanations and guidelines for authors.
      FAIR assessment tools – There is a growing number of tools used to assess the FAIRness of DOs. Community groups like FAIRassist.org have compiled lists of guidelines and tools for assessing the FAIRness of digital resources. These range from self-assessment tools like questionnaires and checklists to semi-automated validators (Devaraju et al. 2021). Examples of automated validation tools include the F-UJI Automated FAIR Data Assessment Tool (Devaraju and Huber 2020), FAIR Evaluator and FAIR Checker for datasets or individual DOs; HowFairIs (Spaaks et al. 2021) for code repositories; and FOOPS (Garijo et al. 2021) for assessing ontologies. When it comes to assessing FDOs, we find two main challenges. Resource score discrepancy: different FAIR assessment tools for the same type of resource produce different scores. For example, a recent study over datasets showcases differences in scores for the same resource due to how the FAIR principles are interpreted by different authors (Dumontier 2022). Heterogeneous FDO metadata: validators include tests that explore the metadata of the digital object; however, there is no agreed metadata schema to represent FDO metadata, which complicates this operation. In addition, metadata may be specific to a certain domain (De Smedt et al. 2020). To address this challenge, we need i) to agree on a minimum common set of metadata to measure the FAIRness of DOs and ii) to propose a framework to describe extensions for specialized digital objects (datasets, software, ontologies, VREs, etc.). In Wilkinson et al. (2019), the authors propose a community-driven framework to assess the FAIRness of individual digital objects. This framework is based on: a collection of maturity indicators; principle compliance tests; and a module to apply those tests to digital resources. The proposed indicators may be a starting point to define which tests are needed for each type of resource (de Miranda Azevedo and Dumontier 2020).
      Aggregation of FAIR metrics – Another challenge is the best way to produce an assessment score for an FDO, independently of the tests that are run to assess it. For example, each of the four dimensions of FAIR (Findable, Accessible, Interoperable and Reusable) usually has a different number of associated assessment tests. If the final score is based on the number of tests, then by default some dimensions may have more importance than others. Similarly, not all tests may have the same importance for some specific resources (e.g., in some cases having a license on a resource may be considered more important than having its full documentation). In our work we consider an FDO as an aggregation of resources, and therefore we face the additional challenge of creating an aggregated FAIRness score for the whole FDO. We consider the following aggregation scores. Global score: calculated by formula (see Fig. 1-1); it represents the percentage of total passed tests and does not take into account the principle to which a test belongs. FAIR average score: calculated by formula (see Fig. 1-2); it represents the average of the passed-test ratios for each principle, plus the ratio of passed tests used to evaluate the Research Object itself. Both metrics are agnostic to the kind of resource analyzed; the scores they produce range from 0 to 100.
      Discussion – An FDO has metadata records that describe it. Some records are common to all DOs, and others are specific to a DO. This makes it difficult to assess some FAIR principles, like F2 ("data are described with rich metadata"). Therefore, we believe a discussion of a minimal set of FAIR metadata should be addressed by the community. In addition, a FAIR assessment score can change significantly depending on the formula used for aggregating all metrics. Therefore, it is key to explain to users the method and provenance used to produce such a score. Different communities should agree on the best scoring mechanism for their FDOs, e.g., by adding a weight to each principle and figuring out the right number of tests for each principle, since otherwise principles with more tests may receive more importance. We believe that the objective of a FAIR scoring system should not be to produce a ranking, but to become a mechanism to improve the FAIRness of an FDO. HTML XML PDF
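      The two formulas themselves appear only in the paper's Fig. 1; from the verbal descriptions above they plausibly take the following form (a reconstruction, not the authors' exact notation), where t_p counts tests for principle p and t_RO the tests applied to the Research Object itself:

          % Global score: percentage of all passed tests, principle-agnostic
          S_{\mathrm{global}} = 100 \cdot \frac{\sum_{p \in \{F,A,I,R\}} t_p^{\mathrm{passed}} + t_{RO}^{\mathrm{passed}}}{\sum_{p \in \{F,A,I,R\}} t_p^{\mathrm{total}} + t_{RO}^{\mathrm{total}}}

          % FAIR average score: mean of per-principle pass ratios, plus the
          % pass ratio of the tests on the Research Object itself
          S_{\mathrm{avg}} = \frac{100}{5} \left( \sum_{p \in \{F,A,I,R\}} \frac{t_p^{\mathrm{passed}}}{t_p^{\mathrm{total}}} + \frac{t_{RO}^{\mathrm{passed}}}{t_{RO}^{\mathrm{total}}} \right)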
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • How does software fit into the FDO landscape?

    • Abstract: Research Ideas and Outcomes 8: e95724
      DOI : 10.3897/rio.8.e95724
      Authors : Carlos Martinez-Ortiz, Carole Goble, Daniel Katz, Tom Honeyman, Paula Martinez, Michelle Barker, Leyla Jael Castro, Neil Chue Hong, Morane Gruenpeter, Jennifer Harrow, Anna-Lena Lamprecht, Fotis Psomopoulos : In academic research virtually every field has increased its use of digital and computational technology, leading to new scientific discoveries, and this trend is likely to continue. Reliable and efficient scholarly research requires researchers to be able to validate and extend previously generated research results. In the digital era, this implies that the digital objects (Kahn and Wilensky 2006) used in research should be Findable, Accessible, Interoperable and Reusable (FAIR). These objects include (but are not limited to) data, software, models (for example, machine learning models), representations of physical objects, virtual research environments, workflows, etc. Leaving any of these digital objects out of the FAIR process may result in a loss of academic rigor and may have severe long-term consequences for the field, such as a reproducibility crisis. In this extended abstract, we focus on research software as a FAIR digital object (FDO).
      The FDO framework (De Smedt et al. 2020) describes FDOs as actionable units of knowledge, which can be aggregated, analyzed, and processed by different types of algorithms. Such algorithms must be implemented by software in one form or another. The framework also describes large software stacks supporting FDOs, enabling responsible data science and increasing reproducibility. This implies that software is a key ingredient of the FDO framework and should adhere to the FAIR principles. Software plays multiple roles: it is a DO itself, it is responsible for creating new FDOs (e.g., data), and it helps to make them available to the public (e.g., via repositories and registries). However, there is a need to specify in more detail how non-data DOs, in particular software, fit into this framework.
      Different classes of digital objects have different intrinsic properties and ways of relating to other DOs. This means that while they are, in principle, subject to the high-level FAIR principles, there are also differences depending on their type and properties, requiring an adaptation so that FAIR implementations are more aligned to the digital object itself. This holds true in particular for software. Software has intrinsic properties (executability, composite nature, development practices, continuous evolution and versioning, and packaging and distribution) and specific needs that must be considered by the FDO framework. For example, open source software is typically developed in the open on social coding platforms, where releases are distributed through package management systems, unlike data, which is typically published in archival repositories. These social coding platforms do not provide long-term archiving, permanent identifiers, or metadata; and package management systems, while somewhat better, similarly do not make a commitment to long-term archiving, do not use identifiers that fit the scholarly publication system well, and provide metadata that may be missing key elements. The FAIR for Research Software (FAIR4RS; Chue Hong et al. 2021) working group has dedicated significant effort to building a community consensus around developing FAIR principles that are customized for research software, providing methods for researchers to understand and address these gaps.
      In this presentation we will highlight the importance of software for the FAIR landscape and why different (but related) FAIR principles are needed for software (vs those originally developed for data). Our goal here is to contribute to building an FDO landscape together, where we consider all the different types of digital objects that are essential in today's research, and we are enthusiastic about contributing our expertise on research software to helping shape this landscape. HTML XML PDF
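      FAIR4RS does not prescribe a metadata format, but CodeMeta is a widely used convention for exactly the software-specific metadata gaps described above; a minimal sketch with illustrative values:

          # Writes a minimal CodeMeta description for a piece of research software;
          # all values are placeholders, not a real project's records.
          import json

          codemeta = {
              "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
              "@type": "SoftwareSourceCode",
              "name": "example-analysis-tool",
              "version": "1.2.0",
              "license": "https://spdx.org/licenses/Apache-2.0",
              "codeRepository": "https://github.com/example/example-analysis-tool",
              "identifier": "https://doi.org/10.5281/zenodo.0000000",  # archived-release DOI (placeholder)
          }

          with open("codemeta.json", "w") as f:
              json.dump(codemeta, f, indent=2)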
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Facing the Challenges in simulation-based Earth System Sciences and the
           Role of FAIR Digital Objects

    • Abstract: Research Ideas and Outcomes 8: e95817
      DOI : 10.3897/rio.8.e95817
      Authors : Ivonne Anders, Karsten Peters-von Gehlen, Martin Bergemann, Hannes Thiemann : Motivation – Results of simulations with climate models form the most important basis for research and statements about possible changes in the future global, regional and local climate. These output volumes are increasing at an exponential rate (Balaji et al. 2018, Stevens et al. 2019). Efficiently handling these amounts of data is a challenge for researchers, mainly because the development of novel data and workflow handling approaches has not proceeded at the same rate as data volumes have been increasing. This problem will only become more pronounced with the ever increasing performance of the High Performance Computing (HPC) systems used to perform weather and climate simulations (Lawrence et al. 2018). For example, in the framework of the European Commission's Destination Earth programme, the Digital Twins (Bauer et al. 2021) are expected to produce hundreds of terabytes of model output data every day at the EuroHPC computing sites.
      The described data challenge can be dissected into several aspects, two of which we will focus on in this contribution. Available data in the Earth System Sciences (ESS) are increasingly made openly accessible by various institutions, such as universities, research centres and government agencies, in addition to subject-specific repositories. Further, the exploitability of weather and climate simulation output beyond the expert community, by humans and automated agents (as described by the FAIR data principles: F-Findable, A-Accessible, I-Interoperable, R-Reusable; Wilkinson et al. 2016), is currently very limited if not impossible due to disorganized metadata or incomplete provenance information. Additionally, developments regarding globally available and FAIR workflows in the spirit of the FAIR Digital Object (FDO) framework (Schultes and Wittenburg 2019, Schwardmann 2020) are just at the beginning.
      Cultural Change – In order to address the data challenges mentioned above, current efforts at DKRZ (German Climate Computing Center) are aimed at a complete restructuring of the way research is performed in simulation-based climate research (Anders et al. 2022, Mozaffari et al. 2022, Weigel et al. 2020). DKRZ is perfectly suited for this endeavor, because researchers have the resources and services available to conduct the entire suite of their data-intensive workflows - ranging from planning and setting up model simulations, through analyzing the model output and reusing existing large-volume datasets, to data publication and long-term archival. At the moment, DKRZ users do not have the possibility to orchestrate their workflows via a central service, but rather use a plethora of different tools to piece them together.
      Framework Environment Freva – The central element of the new workflow environment at DKRZ shall be the Freva (Free Evaluation System Framework) software infrastructure, which offers standardized data and tool solutions in ESS and is optimized for use on high-performance computer systems (Kadow et al. 2021). Freva is designed to be very well suited to the use of the FDO framework. The crucial aspects here are: the standardisation of data objects as input for analysis and processing; the already implemented remote access to data via a Persistent Identifier (PID); the currently still system-internal capture of analysis provenance; and the possibility of sharing results, but also workflows, within research groups and up to large communities.
      It is planned to extend the functionality of Freva so that the system automatically determines the data required for a specific analysis from a researcher’s research question (provided to the system via some interface), queries available databases (local disk or tape, cloud or federated resources) for those data and retrieves the data if possible. If the data are not available (yet), Freva shall be able to automatically configure, set up and submit model simulations to the HPC system, so that the required data are created and become available (cf. Fig. 1; a simplified sketch of this flow is given below). These data will in turn be ingested into Freva’s data catalog for reuse. Next, Freva shall orchestrate and document the analysis performed. Results will be provided either as numerical fields, images or animations, depending on the researcher’s needs. As a final step, the applied workflow and/or underlying data are published in accordance with the FAIR data guiding principles.
      FDOs - towards a global integrated Data Space – To make the process sketched out above a reality, application of the FDO concept is essential (Schwardmann 2020, Schultes and Wittenburg 2019). There is a long tradition in the ESS community of global dissemination and reuse of large-volume climate data sets. Community standards like those developed and applied in the framework of internationally coordinated model intercomparison studies (CMIP) allow for low-barrier reuse of data (Balaji et al. 2018). Globally resolvable PIDs are provided on a regular basis. Current community ESS standards and workflows are already close to being compatible with implementing FDOs; however, we now also have to work on open points in the FDO concept, which are: the clear definition of community-specific FDO requirements, including PID Kernel Type specifications; the operation of data type registries; and the technical implementation requirements for global access to FDOs. With these in place and implemented in Freva following standardized implementation recommendations, automated data queries across spatially distributed or different types of local databases become possible. We introduce the concept of implementations in Freva and also use it to highlight the challenges we face. Using an example, we show the vision of the work of a scientist in earth system science. HTML
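      The orchestration just described is a vision, not a published API, so the following is deliberately hypothetical: stub functions stand in for Freva's catalog search and the automatic HPC simulation step the text envisions.

          # Illustrative pseudo-flow only; NOT Freva's actual interface.
          def search_data_catalog(query):
              """Stub: would query Freva's data catalog (disk, tape, cloud, federated)."""
              return []

          def run_simulation(query):
              """Stub: would configure, set up and submit a model run on the HPC system."""
              return ["/work/output/" + query["variable"] + ".nc"]

          def find_or_produce(query):
              hits = search_data_catalog(query)
              if not hits:                      # data not available yet: create it
                  hits = run_simulation(query)  # new output is ingested for reuse
              return hits

          print(find_or_produce({"project": "cmip6", "variable": "tas"}))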
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • FDO Project for Germany

    • Abstract: Research Ideas and Outcomes 8: e95806
      DOI : 10.3897/rio.8.e95806
      Authors : Peter Wittenburg, Hans Günther Döbereiner, Stefan Weisgerber, Giovanna Morigi : In Germany there is much agreement on a necessary step towards convergence in the domain of digital entities across sectors, given the increasing number of emerging data spaces in research, industry and public services. Therefore a group of FAIR Digital Object (FDO) experts is working on a proposal that will: demonstrate the functionality of FDOs, their added value and the scalability of their components; establish an active FDO community across sectors and disciplines collaborating beyond the project; establish a network of key persons promoting and advancing the FDO standard and its applications in collaboration with the FDO Forum; implement a set of FDO applications for selected use cases from economy and applied research; advance the further development of FDO specifications and their transformation to international standards; and support the international FDO Forum initiative.
      In addition to standard activities such as PR, outreach, organising meetings, management etc., the project is designed to support a few major pillars. A basic infrastructure for FDOs will be made professional and usable by everyone interested, based on what has already been developed in the realm of the FDO Forum. Three use cases will be implemented to serve as demonstrators of FDOs: (a) a testbed of a set of repositories from research and industry, including those that are applying standards developed in industrial initiatives such as Industry 4.0 and the Int. Data Space Association; (b) processes in time will be modelled with FDOs to demonstrate the secure mechanisms provided by FDOs; (c) time will be spent on implementing methods where FDOs help to organise the huge data space as it will emerge in future. These use cases, which will act as demonstrators across borders, need to be worked out in the 3-year project. Three research-motivated use cases will be tackled as well. The collaborating experts believe that it is important to introduce quantum computing already now, with its possible impact on data spaces. The group of experts wants not only to implement FDO applications but also to investigate the foundations of FDOs. In addition, two concrete cases have been selected to demonstrate the value of FDOs (a cancer database and tomography imaging).
      A Thinktank is planned to discuss many upcoming aspects related to FDOs and these large data spaces, such as legal and ethical aspects, philosophy of information, social impact of digital transformation, codification of roles and usage scenarios, etc. Other goals of the project are to further develop the FDO specifications in close collaboration with the FDO Forum based on the insights from the implementations, to transform the specifications into international standards, and to set up certification mechanisms. These activities will be led by standardisation organisations (DIN, DKE) which are embedded in ISO/IEC groups.
      It is an explicit wish of the funding institution for FDOs to help create a global integrated data space, which requires close collaboration with existing industrial initiatives such as the Big Data Value Association, Industry 4.0, the Int. Data Space Association, Gaia-X etc. Therefore, the project partners are currently in discussion with these initiatives to bring in their expertise and components wherever that makes sense. Industry, for example, is busy formalising “roles” and “usages”, which is not a topic of the FDO Forum. The project partners are in close contact with the German FDO experts already contributing to the FDO Forum discussions, to integrate their expertise where possible into the project. Although the project will focus on activities in Germany, we will seek to reach out to other European countries and beyond. It is obvious that the project will need to invest in training from the beginning, leveraging the already existing knowledge in the FDO Forum, DONA and ePIC, for example. We are currently in discussion with the funding agency on shaping this project, with the intention of starting it already in 2022. Should this project be granted, we hope to be able to advance the international attempts to achieve a higher degree of convergence and thus efficiency in the domain of digital objects. HTML XML PDF
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • A portal for indexing distributed FAIR digital objects for catalysis
           research

    • Abstract: Research Ideas and Outcomes 8: e95770
      DOI : 10.3897/rio.8.e95770
      Authors : Abraham Nieva de la Hidalga, Josephine Goodall, Richard Catlow, Corinne Anyka, Brian Matthews : A research object (RO) is defined as a semantically rich aggregation of (potentially distributed) resources that provide a layer of structure on top of information delivered as linked data (Bechhofer et al. 2013, Soiland-Reyes et al. 2022). An RO provides a container for the aggregation of resources, produced and consumed by common services and shareable within and across organisational boundaries. This work sees research digital objects as composites which may consist of objects hosted in different repositories.
      In catalysis research, the characterisation of a sample may require analysing experimental data obtained from an instrument, data from a computer model, and/or comparing to data from a specialized database. Additionally, data may need to be reduced and cleaned before analysis, resulting in intermediate data. In this scenario, the composite research object is integrated from all these data objects and their corresponding metadata. UK Catalysis Hub (UKCH) researchers perform these tasks as part of their day-to-day work. However, most of the time they need to manually collect, catalogue, and preserve all these data assets.
      The UKCH aims to support researchers with tools and services for the management and processing of data, through the development of the Catalysis Data Infrastructure (CDI; Nieva de la Hidalga et al. 2022a) and the Catalysis Research Workbench (CRW; Nieva de la Hidalga et al. 2022b). This work is integrated in the context of the Physical Sciences Data Infrastructure (PSDI; Coles and Knight 2022). The PSDI aims to provide a layer that enables transparent access to existing resources whilst ensuring that they remain dedicated to their specific applications. The intention is to explore the concept of the composite research digital object and the services required to facilitate both human and programmatic interactions with those objects to browse, review, retrieve, and use digital objects in the context of the research produced by UKCH scientists. The CDI will act as a thematic portal presenting data managed through the PSDI and serve as an example for the development of similar portals targeting specific research domains.
      The CDI is in the process of being redesigned with a semantic metadata model. The basic ontologies being considered for this model are: DCAT (Albertoni et al. 2022) to encode the metadata of digital objects; PROV-O (Belhajjame et al. 2013) to track the generation of digital objects; SPAR (Peroni and Shotton 2018) to encode publications data; SCHOLIX to encode the links between publications and data objects (Burton et al. 2017); FOAF (Brickley and Miller 2014) to encode researcher information; the Organization Ontology (ORG; Reynolds 2014) to encode institution information; EXPO (Soldatova et al. 2006) to encode experiment information; and various domain-specific ontologies for adding metadata about experiments, for instance CHEBI (Hastings et al. 2011), CHEMINF (Hastings et al. 2015), and FIX (Chebi-Administrators 2005). The implementation of the CDI using these ontologies will provide a roadmap for the integration of FAIR data object repositories with a service infrastructure which supports reproducibility, reuse of data, reuse of processing tools and implementation of advanced processing tools.
      The integration of the CDI and CRW with existing and new infrastructures will further support the work of catalysis scientists. In this context, a researcher can access the CDI to look for publications, see if there are data objects linked to them, and then look for processing tools which can be used to reproduce the results. An experiment for an early use case demonstrated the feasibility of reproducing published results using data and metadata linked to existing publications (Nieva de la Hidalga et al. 2022b). In the experiment, papers citing processable data were used to retrieve, process, and reproduce published results without needing to contact the authors. Fig. 1 presents a view of the experiment performed. The current practices of publishing catalysis research data can be seen as aligned to the FAIR data principles; for instance, Fig. 1 can also be seen as Fig. 2.
      Reproducing results required several human-centered activities, partly due to the encoding of the metadata as text documents. The challenge is to accelerate and automate these processes. It is important to highlight the role of cataloguing interfaces, such as the CDI, containing DO crates with only metadata and links to the different data assets that constitute the composite digital objects. The users of these interfaces will in turn rely on transparent services which do not require them to manually track the location and formats of the data assets they want to retrieve and use. HTML XML PDF
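      As a small illustration of the proposed model (all identifiers invented for the example), a DCAT dataset record linked to its generating activity via PROV-O, using the rdflib package:

          # Sketch of the CDI's proposed encoding: DCAT for the dataset record,
          # PROV-O for its generation; URIs are illustrative, not UKCH's.
          from rdflib import Graph, Literal, URIRef
          from rdflib.namespace import DCAT, DCTERMS, PROV, RDF

          g = Graph()
          dataset = URIRef("https://example.org/ukch/dataset/sample-42")
          activity = URIRef("https://example.org/ukch/activity/beamline-run-7")

          g.add((dataset, RDF.type, DCAT.Dataset))
          g.add((dataset, DCTERMS.title, Literal("Characterisation of catalyst sample 42")))
          g.add((dataset, PROV.wasGeneratedBy, activity))
          g.add((activity, RDF.type, PROV.Activity))

          print(g.serialize(format="turtle"))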
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Visualization of Materials Science Topics in Publications of Institutional
           Repository using Natural Language Processing

    • Abstract: Research Ideas and Outcomes 8: e95679
      DOI : 10.3897/rio.8.e95679
      Authors : Sae Dieb, Keitaro Sodeyama, Mikiko Tanifuji : SAMURAI (NIMS 2022), a directory service for researchers of the National Institute for Materials Science (NIMS) in Japan, was launched in 2009 following the development of the NIMS institutional repository (Tanifuji et al. 2019). The concept is to synchronize the profile information of researchers with their publications, which are self-archived in the repository system. SAMURAI was renewed in 2017 with interoperable functions with ORCID. SAMURAI supports various links not only to individual articles and patents, but also to databases such as KAKEN (the database of Grants-in-Aid for Scientific Research by NII). The service has yielded fully identified authors of journal articles from research members of NIMS by implementing a unique ResearcherID. Through this directory, NIMS is promoting materials research, supporting the management of its researchers' activities, and introducing NIMS researchers and their work to the public.
      In this work, we present an application that automatically describes each researcher's output topics from the research papers archived in the repository, by implementing the materials-science-specific natural language processing developed in our study (Dieb et al. 2021), and that visualizes the research trend of each SAMURAI researcher. The approach can maximize information absorbance for a general audience and fully corresponds to open science policy.
      A list of publications' digital object identifiers (DOIs; DOI 2022) for each researcher was constructed from their profile in SAMURAI (in SAMURAI, the DOIs are stored in a PostgreSQL database). Using the DOIs, recent publications were retrieved from the NIMS text data mining platform (TDM-PF) in their XML format, which is mainly available from 2003 onwards. Representative topic terms related to materials science and engineering were extracted from the research publications. We utilize term frequency analysis and automatic extraction of materials names to extract these necessary informative terms. Additionally, domain knowledge resources such as dictionaries were used. Data were preprocessed using noise reduction, such as removing general English stop words and filtering physical units, since such words do not have significance on their own. A word cloud approach was used for visualization (Fig. 1).
      This work brings us an opportunity to apply our NLP experience to mine information from research papers for public knowledge, as a step towards data-driven materials science. HTML XML PDF
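      The word-cloud step itself is straightforward to reproduce with the Python wordcloud package; a minimal sketch with toy text standing in for the TDM-PF corpus (which is not public), and with the stop-word set extended as the paper describes:

          # Stop-word filtering plus word-cloud rendering; toy text replaces
          # the NIMS corpus, and STOPWORDS is extended with example filtered terms.
          from wordcloud import WordCloud, STOPWORDS
          import matplotlib.pyplot as plt

          text = ("lithium battery electrolyte conductivity lithium cathode "
                  "solid electrolyte interface battery capacity electrolyte")

          stopwords = STOPWORDS | {"nm", "mol"}   # e.g. physical units to filter out

          wc = WordCloud(width=800, height=400, background_color="white",
                         stopwords=stopwords).generate(text)

          plt.imshow(wc, interpolation="bilinear")
          plt.axis("off")
          plt.show()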
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • The Vision of the FAIR Digital Object Machine and Ubiquitous FDO Services

    • Abstract: Research Ideas and Outcomes 8: e95268
      DOI : 10.3897/rio.8.e95268
      Authors : Peter Wittenburg, Christophe Blanchi, Claus Weiland, Ivonne Anders, Karsten Peters, Ulrich Schwardmann, George Strawn : In addition to the previous intensive discussion on the “Data Deluge” with respect to the enormous increase of available research data, the 2022 Internet-of-Things conference confirmed that in the near future there will be billions if not trillions of smart IoT devices in a very wide range of applications and locations, many of them with computational capacities. This large number of distributed IoT devices will create continuous streams of data that will require a global framework to facilitate their integration into the Internet and to enable controlled access to their data and services, to name but a few aspects. This framework would enable tracking of these IoT devices to measure their resource usage, for instance to globally address the UN Sustainable Development Goals. Additionally, policy makers are committed to defining regulations to break data monopolies and increase sharing. The result will be an increasingly huge domain of accessible digital data, which allows addressing new challenges, especially cross-sector ones. A key prerequisite for this is to find the right data across domain boundaries for a specific task.
      Digitisation is already being called the fourth industrial revolution, and the emerging data and information are the 21st century's new resource. Currently this vision is mostly unrealised due to the inability of existing data and digital resources to be findable, accessible, interoperable, and reusable, despite the progress in providing thematic catalogs. As a result, the capacity of this new resource is latent and mostly underutilized. No Internet-level infrastructure currently exists to facilitate the process by which all data and digital resources are made consistently and globally accessible. There are patchworks of localized and limited access to trusted data on the Internet, created by specific communities that have been funded or directed to collaborate. To turn digital information into a commodity, the description, access to, validation, and processing of data need to become part of the Internet infrastructure; we call this the Global Integrated Data Space (GIDS). The main pillars of this approach require that data and services be globally identified and consistently accessed, with predictive descriptions and access control to make them globally findable.
      Currently researchers partly rely on informal knowledge, such as knowing the labs and persons, to maximize the chance of accessing trustworthy data, but this method limits the use of suitable data. In the future data scenario, other mechanisms will become possible. In the public information space, Google-like searches using specific key terms have become an accepted solution to find documents for human consumption. This approach, however, does not work in the GIDS, with large numbers of data contributors from a wide range of institutions and from millions of IoT devices worldwide, and where a wide range of data types and automatic data processing procedures dominate. Indeed, successful labs that apply complex models describing digital surrogates can automatically leverage data and data processing procedures from other labs. This makes the manual stitching of data and operations, as currently often applied in operations, too costly in both time and resources to be a competitive option. A researcher looking for specific brain imaging data for a specific study has a few options: rely on a network of colleagues; execute Google-like searches in known registries looking for appropriate departments and researchers; execute Google-like searches for suitable data; or engage an agent to execute profile matching in suitable sub-spaces.
      We assume that data creators will have the capability and interest to create detailed metadata of different types, and that researchers looking for specific data will be able to specify precise profiles for the data they seek. Two of the key characteristics of the future data space will be operations that can carry out profile matching at ultra-high speeds (a toy illustration is sketched below) and that will lead to various subspaces, organised according to some facets, using self-organizing mechanisms. Of course, this poses high requirements on the quality of the metadata being used, and creators and potential consumers must share knowledge about the semantic space in which they operate and about the available semantic mappings, used by brokers or self-provided. Metadata must be highly detailed, and suitable schemas have already been developed in many communities. In addition to the usual metadata, potential users will need to specify their credentials in the form of trusted roles and their usage purposes to indicate access opportunities. Changing current metadata practices to yield richer metadata, as prescribed by the FAIR principles, will not be simple, especially since we seem to be far away from formalizing roles and usage purposes in a broadly accepted way, but the pressure to create rich and standardized metadata will increase. It should be noted, of course, that for data streams created by IoT sensors, defining proper metadata is an action that is only required once or a few times.
      Why are FDOs special in this automatic profile matching scenario? FDOs bundle all information required for automatic profile matching in a secure way: all metadata are available via the globally unique, resolvable and persistent identifier (PID) of the FDO, and the PID security mechanisms form the basis for establishing trust. FDOs will be provided with a secure method capable of computing efficiently coded profiles representing all properties of an FDO relevant for profile matching. This would speed up profile matching enormously. We will address two major questions characteri...
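      The matching operation itself is left abstract in the talk; as a toy illustration only (no claim about the authors' mechanism, and all PIDs invented), profile matching over FDO metadata records could look like this:

          # Toy profile matcher: a query profile is a set of metadata assertions;
          # records are ranked by how many assertions they satisfy.
          def profile_match(query, records):
              scored = [
                  (sum(1 for k, v in query.items() if rec.get(k) == v), rec["pid"])
                  for rec in records
              ]
              return sorted(scored, reverse=True)

          records = [
              {"pid": "21.T11148/abc", "modality": "MRI", "organ": "brain", "license": "CC-BY"},
              {"pid": "21.T11148/def", "modality": "CT", "organ": "lung", "license": "CC0"},
          ]
          print(profile_match({"modality": "MRI", "organ": "brain"}, records))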
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Enhancing RDM in Galaxy by integrating RO-Crate

    • Abstract: Research Ideas and Outcomes 8: e95164
      DOI : 10.3897/rio.8.e95164
      Authors : Paul De Geest, Frederik Coppens, Stian Soiland-Reyes, Ignacio Eguinoa, Simone Leo : We introduce how the Galaxy research environment (Jalili et al. 2020) integrates with RO-Crate as an implementation of Findable, Accessible, Interoperable, Reusable Digital Objects (FAIR Digital Objects / FDO) (Wilkinson et al. 2016, Schultes and Wittenburg 2018) and how using RO-Crate as an exchange mechanism for workflows and their execution history helps integrate Galaxy with the wider ecosystem of ELIXIR (Harrow et al. 2021) and the European Open Science Cloud (EOSC-Life) to enable FAIR and reproducible data analysis.
RO-Crate (Soiland-Reyes et al. 2022) is a generic packaging format containing datasets and their description using standards for FAIR Linked Data. The format is based on schema.org (Guha et al. 2016) annotations in JSON-LD, which allows for rich metadata representation. The RO-Crate effort aims to make best practice in formal metadata description accessible and practical for use in a wide variety of situations, from an individual researcher working with a folder of data to large data-intensive computational research environments.
The RO-Crate community brings together practitioners from very different backgrounds, and with different motivations and use cases. Among the core target users are:
researchers engaged with computation and data-intensive, workflow-driven analysis;
digital repository managers and infrastructure providers;
individual researchers looking for a straightforward tool or how-to guide to “FAIRify” their data;
data stewards supporting research projects in creating and curating datasets.
Given the wide applicability of RO-Crate and the lack of practical implementations of FDOs, ELIXIR (Harrow et al. 2021) co-opted this initiative as the project to define a common format for research data exchange and repository entries. Thus, during the last year it has been implemented in a wide range of services: WorkflowHub (Goble et al. 2021), a registry for describing, sharing and publishing scientific computational workflows, uses RO-Crates as an exchange format to improve reproducibility of computational workflows that follow the Workflow RO-Crate profile (Bacall et al. 2022); LifeMonitor (Leo et al. 2022), a service to support the sustainability of computational workflows being developed as part of the EOSC-Life project, uses RO-Crate as an exchange format for describing test suites associated with workflows. Tools have been developed to aid these use cases and to increase the general usability of RO-Crates by providing user-friendly programmatic libraries for consuming and producing RO-Crates (ro-crate-py De Geest et al. 2022, ro-crate-ruby Bacall and Whitwell 2022, ro-crate-js Lynch et al. 2021).
The Galaxy project provides a research environment with data analysis and data management functionalities as a multi-user platform, aiming to make computational biology accessible to research scientists who do not have computer programming or systems administration experience. As such, it stores not just analysis-related data but also the complete analytical workflow, including its metadata. The internal data model involves the history entity, including all steps performed in a specific analysis, and the workflow entity, defining the structure of an analytical pipeline. 
From the start, Galaxy has aimed to enable reproducible analyses by providing capabilities to export (and import) all the analysis history details and workflow data and metadata in a FAIR way. As such it helps its users with their daily research data management. The Galaxy community is continuously improving and adding features; the integration of the FAIR Digital Object principles is a natural next step in this. To support these FDOs, Galaxy leverages the RO-Crate Python client library (De Geest et al. 2022) and provides multiple entry points to import and export different research data objects representing its internal entities and associated metadata. These objects include:
a workflow definition, which is used to share/publish the details of an analysis pipeline, including the graph of tools that need to be executed, and metadata about the data types required;
individual data files or a collection of datasets related to an analysis history;
a compressed archive of the entire analysis history including the metadata associated with it, such as the tools used, their versions, the parameters chosen, workflow invocation related metadata, inputs, outputs, license, author, a CWLProv description (Khan et al. 2019) of the workflow, contextual references in the form of Digital Object Identifiers (DOIs), ‘EMBRACE Data And Methods’ ontology (EDAM) terms (Ison et al. 2013), etc.
The adoption of RO-Crate by Galaxy allows a standardised exchange of FDOs with other platforms in the ELIXIR Tools ecosystem, such as WorkflowHub and LifeMonitor. Integrating RO-Crate deeply into Galaxy and offering import and export options for various Galaxy objects such as Research Objects allows for increased standardisation, improved Research Data Management (RDM) functionalities, a smoother user experience (UX) as well as improved interoperability with other systems. The integration in a platform used by biologists to do data-intensive analysis facilitates the publication of workflows and workflow invocations for all skill levels and democratises the ability to perform Open Science. HTML XML PDF
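As an illustration of the export path described above, the following sketch packages a workflow file and an input dataset as an RO-Crate with the ro-crate-py library. The file names and properties are invented for the example and do not reflect Galaxy's actual export layout; the exact write method may vary with the library version.

```python
# Sketch: build a small RO-Crate around a (hypothetical) Galaxy workflow export
# using ro-crate-py (De Geest et al. 2022).
from rocrate.rocrate import ROCrate

crate = ROCrate()

# Add the workflow definition and declare it as the crate's main entity.
workflow = crate.add_file("my_analysis.ga", properties={
    "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
    "name": "Example Galaxy workflow",
})
crate.mainEntity = workflow

# Add an input dataset with descriptive metadata.
crate.add_file("inputs/reads.fastq", properties={
    "name": "Sequencing reads used by the workflow",
    "encodingFormat": "text/plain",
})

# Write ro-crate-metadata.json plus the payload files to a directory.
crate.write("my_analysis_crate")
```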
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Connecting Repositories to one Integrated Domain

    • Abstract: Research Ideas and Outcomes 8: e95153
      DOI : 10.3897/rio.8.e95153
      Authors : Peter Wittenburg, Christophe Blanchi, Daan Broeder : Information is the new commodity in the global economy and trustworthy digital repositories will be the key pillars within this new ecosystem. The value of this digital information will only be realised if these repositories can be interacted with in a consistent manner and their data made accessible and understandable globally. Establishing a data interoperability layer is the goal of the emerging domain of Digital Objects. When considering how to proceed with designing this interoperability layer, it is important to state that repositories need to be considered from two different perspectives:
Repositories are a reflection of the institutions that make them operational (quality of service, skilled experts, accessibility over many years, appropriate data management procedures).
Repositories are computational services that provide a specific set of functions.
Complicating the effort to make repositories accessible and interoperable across the globe is that many existing repositories have been developed over the past decades using a wide range of heterogeneous technologies and organisations of data and functionality. Many of these repositories are data silos and not interoperable. It is important to realise that much money has been invested to build these repositories, and therefore we cannot expect that they will make large changes without great incentives and funding. This heterogeneity is the core of the challenge in making digital information the new commodity in the emerging global domain of digital objects.
This paper will focus on the functional aspects of repositories and proposes the FAIR Digital Object model as a core data model for describing digital information, and the use of the Digital Object Interface Protocol (DOIP) to establish interoperable communication with all repositories independently of the respective technical choices. It is the conviction of this paper's authors that this integration of the FDO model and DOIP with existing repositories can be performed with minimal effort, and we will present examples that document this claim. We will present three examples of existing integration in this paper:
an integration of B2SHARE,
a CORDRA repository,
an integration of the DOBES archive.
B2SHARE is a repository that has assigned Persistent Identifiers (PIDs) (Handles) to all of its digital files. It allows users to add metadata according to a unified schema, but also has the possibility for user communities to extend this schema. The API allows one to specify a Handle which then gives access to the metadata and/or the bit sequences of the DO. It should be noted that B2SHARE allows one to include a set of bit-sequences linked with the Handle. The integration consists of building a proxy that provides a DOIP interface to B2SHARE to streamline the integration of the data and metadata into a single DO. The development of the proxy was relatively simple and did not require any changes on behalf of the B2SHARE repository. CORDRA is a CNRI repository/registry/registration system that manages DOs, assigns Handles to all its DOs and is accessible through DOIP. For all intents and purposes, it implements many of the features of the Digital Object Architecture.
The integration of the two repositories enables copying files or moving digital objects. In the case of copying files (metadata and bit sequences) from B2SHARE to CORDRA, for example, all functionality of the CORDRA service, such as searching, would become possible. Importantly, in this case the PID record identifying the digital object in the B2SHARE repository would have to be extended to point to the alternative path, and the API of B2SHARE would have to offer the alternative access paths to a client. This latter aspect has not been implemented. Moving a DO from B2SHARE to CORDRA would result in changing the ownership of the PID and adding the updated information about the DO.
The adaptation of the DOBES archive has not been done yet, but since this archive has some special functionalities, it is interesting to discuss the way of adaptation that could be chosen. In the DOBES archive each bundle of closely related digital objects is assigned a Handle, and metadata is also treated as a digital object, i.e., it has a separate Handle. For management reasons, and especially to enable different contributors to maintain control of access rights, a tree structure was developed to allow contributors to organise their data according to specific criteria and to allow users to browse the archive in addition to executing searches on the metadata.
While accessing archival objects is comparatively simple, the ingest/upload feature is more complex. It should be noted that the archive supports establishing a canonical tree of resources to define scopes for authorisation (defining who has the right to grant access permissions, etc.) and to facilitate lookup by supporting browsing according to understandable criteria. Therefore, depositors need to specify where in the tree the new resources should be integrated and which initial rights are associated with them. After uploading the gathered information into a workspace, the archive carries out many checks in a micro-workflow: metadata is checked against vocabularies and partly curated, types of bit-sequences are checked and aligned with the information in the metadata, etc. An operation called the gatekeeper has been developed to ensure a highly consistent archive despite the many (remote) people contributing to its content. Thus, the archive requires a set of 4 information units to be specified:
the set of bit-sequences to be uploaded,
the metadata describing the bundle,
the node to be used to organise the resources and
the initial rights, where the default would be “open”.
Adapting this archive to ...
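To give a feel for the interoperable communication the paper proposes, the sketch below sends a basic DOIP v2 Retrieve operation through a hypothetical DOIP-over-HTTP gateway of the kind the B2SHARE proxy provides. The gateway URL and target Handle are placeholders; only the operation name comes from the DOIP v2 specification.

```python
# Sketch: invoke the DOIP v2 "Retrieve" operation on a digital object via an
# assumed DOIP-over-HTTP gateway in front of a repository.
import json
import requests

GATEWAY = "https://doip.example.org/doip"         # hypothetical gateway endpoint

params = {
    "targetId": "21.T11148/0000-example-object",  # hypothetical Handle of the DO
    "operationId": "0.DOIP/Op.Retrieve",          # basic operation from the DOIP v2 spec
}
response = requests.post(GATEWAY, params=params, timeout=30)
digital_object = response.json()                  # metadata (and element references) of the DO
print(json.dumps(digital_object, indent=2))
```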
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Interacting FDOs for Secure Processes

    • Abstract: Research Ideas and Outcomes 8: e95152
      DOI : 10.3897/rio.8.e95152
      Authors : Peter Wittenburg, Christophe Blanchi : In modern industry, administration and research there are many processes that involve distributed actors needing to securely create, update and manage information. Typical examples of such processes are supply chains in the production industry and treatments in the medical area. Such a process can be characterised by a few key properties:
they are driven by discrete events in time that need to be recorded,
they allow different authenticated actors to contribute to state information,
they must guarantee that existing information cannot be overwritten,
they are characterised by a high degree of automation.
Not all applications will require that all properties be met; there are also workflow processes in the research domain, for example. In this paper we will discuss the use case where a FAIR Digital Object (FDO) is used as a digital surrogate for a physical product, specifically to act as a Digital Product Pass (DPP), an electronic document that fully describes the properties of a given product and has its own unique global identifier. Each digital object surrogate can then be represented by rendering its ID as a QR code which can easily be scanned by a client to access information about the object or to interact with it. To constrain the scope of our example, we will only discuss what happens when a product leaves the factory, is put on a truck together with other products and is shipped to a destination. The requirement in our case is to adapt the DPP so it includes the greenhouse gas emissions incurred by the product during its shipment. In this process we basically have the following events:
the product is identified and its manufacturing details specified,
the product enters the truck and is detected, and
the product leaves the truck.
In all three events some interactions and information updates need to be executed automatically, i.e. we assume that the product is associated with an identity which can be read by a sensor coupled with an IoT edge device on the truck.
In the general case, our model describes interactions between FDOs where any FDO can potentially interact with any other FDO, as their physical objects interact in the physical world. Any FDO that can authenticate itself using a Public Key Infrastructure challenge and has the proper credentials will be able to add to the state of another FDO. Whenever two FDOs interact, each FDO can register the interaction as an event FDO that is recorded at a location specified within each FDO. The ability to register an event can require a different sort of authentication and access control, but a validated digital signature from the creator of the event is a simple yet effective way to control access.
Our example includes 3 entities: the factory (F), the truck company (TC) and a third party that acts as a trusted entity (TE) to manage shared information. Each entity is represented as an FDO containing a public key that it can use to authenticate itself, as well as a certificate for that key from a trusted entity. The factory instantiates a Product FDO (FDO-Px) for each product and, based on an agreement with the trusted entity, a DPP for that product (FDO-Dx). The truck company also instantiates a Truck FDO (FDO-Ty). Each FDO has a public key and a certificate. 
This certificate would reflect the agreement between the factory and the truck company that authorises each of them to create event FDOs (FDO-Ez) used to record each encounter between their FDOs, and potentially to extend the DPP FDO (FDO-Dx). Each FDO also has its own set of methods which can be executed, and which make use of secure communication and exchange their public keys.
The first interaction is triggered when the product enters the truck and is detected by the truck's edge device. This edge device is configured to cause the FDO-Ty to register an event by invoking a pre-determined method and passing the ID of the product it detected. FDO-Ty has a few methods that allow it to inform FDO-Fx about the event and will probably have access to create some information in the truck company's database. FDO-Px will have methods to update the appropriate database in the factory so that the factory can trace what happened. FDO-Ty will also be able to create an event FDO, FDO-Ex, using the FDO-Px event method and trigger a clock to wait on a message from FDO-Px. When both FDOs have informed the event FDO that a specific event type happened, FDO-Ex will use a method to update its event table and the event is signed by both keys.
The second interaction happens when the product leaves the truck and the truck's edge device sensors notice this action. The same procedure will happen again with one extension: (x1) now the truck FDO-Ty will do some computations, according to some algorithm instantiated by the truck company, about the additional GHG emissions associated with the transport of the product, and (x2) this will cause the DPP FDO, FDO-Dx, to update a data structure maintained by a trusted party.
The benefits of this method are as follows:
All digital surrogates are FDOs and provide a standardized access method.
All structures are encapsulated and can only be manipulated by tested methods embedded in the corresponding FDOs.
Methods are extensible and are themselves defined as FDOs.
All events will be signed by the keys of both parties involved, making them authenticated and traceable.
The systematic use of PIDs makes it possible to follow each action by appropriate analysis functions that have the right to read using methods in the corresponding FDOs.
The system can be easily extended to different scenarios and different numbers of actors involved. HTML XML
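A minimal sketch of the dual signing underpinning this trust model is shown below, assuming each party holds an Ed25519 key pair (an illustrative algorithm choice; the abstract only requires a PKI). The event fields and identifiers are invented for the example.

```python
# Sketch: both parties sign the same canonicalised event payload; the event FDO
# stores the payload with both signatures, so it cannot later be altered without
# invalidating at least one signature.
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

factory_key = Ed25519PrivateKey.generate()   # stands in for the factory's PKI key
truck_key = Ed25519PrivateKey.generate()     # stands in for the truck company's PKI key

event = {
    "type": "product-entered-truck",         # hypothetical event type
    "product": "FDO-Px-0001",                # hypothetical PIDs
    "truck": "FDO-Ty-0001",
    "timestamp": "2022-10-12T17:30:00Z",
}
payload = json.dumps(event, sort_keys=True).encode()

signatures = {
    "factory": factory_key.sign(payload).hex(),
    "truck": truck_key.sign(payload).hex(),
}
event_fdo = {"event": event, "signatures": signatures}
```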
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Collaborative Metadata Definition using Controlled Vocabularies, and
           Ontologies

    • Abstract: Research Ideas and Outcomes 8: e94931
      DOI : 10.3897/rio.8.e94931
      Authors : Ilia Bagov, Christian Greiner, Nikolay Garabedian : Data's role in a variety of technical and research areas is undeniably growing. This can be seen, for example, in the increased investments in the development of data-intensive analytical methods such as artificial intelligence (Zhang 2022), as well as in the rising rate of data generation which is expected to continue into the near future (Rydning and Shirer 2021). Academic research is one of the areas where data is the lifeblood of generating hypotheses, creating new knowledge, and reporting results. Unlike proprietary industry data, academic research data is often subjected to stricter requirements regarding transparency and accessibility. This is in part due to the public funding which many research institutions receive. One way to fulfil these requirements is by observing the FAIR (Findability, Accessibility, Interoperability, Reusability) principles for scientific data (Wilkinson et al. 2016). These introduce a variety of benefits, such as increased research reproducibility, a more transparent use of public funding, and environmental sustainability. A way of implementing the FAIR principles in practice is with the help of FAIR Digital Objects (FDOs) (European Commission: Directorate-General for Research and Innovation 2018). An FDO consists of data, an accompanying Persistent Identifier (PID), and rich metadata which describes the context of the data. Additionally, the data format contained in an FDO should be widely used, and ideally open. Our presentation is focused on the third of the FDO components mentioned previously – metadata. It outlines the concept for a framework which enables the collaborative definition of metadata fields which can be used to annotate FDO-encapsulated data for a given domain of research.
The first component of the presented framework is a controlled vocabulary of the domain related to the data which needs to be annotated. A controlled vocabulary is a collective term that denotes a controlled list of terms, their definitions, and the relations between them. In the framework presented in this contribution, the terms correspond to the metadata fields used in the data annotation process. Formally, the type of controlled vocabulary used in the framework is a thesaurus (National Information Standards Organization 2010). Thesauri consist not only of the elements mentioned previously, but also allow for the inclusion of synonyms for every defined term. This eliminates the ambiguity which can occur when using terms with similar definitions. Additionally, thesauri specify simple hierarchical relations between the terms in the vocabulary, which can provide an explicit structure to the set of defined metadata fields. The most important feature of our framework, however, is that the controlled vocabularies can be developed in a collaborative fashion by the domain experts of a given research field. Specifically, people are able to propose term definitions and edits, as well as cast votes on the appropriateness of terms which have already been proposed.
Despite their advantages, one limitation of thesauri is their limited capability to relate metadata fields to each other in a semantically rich fashion. This motivated the use of the second component of the framework, namely ontologies. An ontology can be defined as “a specification of a conceptualization” (Gruber 1995). More precisely, it is a data structure which represents entities in a given domain, as well as various relations between them. 
After a set of metadata fields has been defined within a controlled vocabulary, that vocabulary can be transformed into an ontology which contains additional relations between the fields. These can extend beyond the hierarchical structure of a thesaurus and can contain domain-specific information about the metadata fields. For example, one such relation can denote the data type of the value which a given field must take. Furthermore, ontologies can be used to link not only metadata, but also data, as well as individual FDOs themselves. This can contribute to the Reusability aspect of FAIR Digital Objects. For example, an FDO generated by a research group in a given domain can be linked to an existing domain ontology. Afterwards, the FDO can be reused more easily by researchers from the same scientific field, because the ontology will have already specified the FDO's relation to the subject area. Additionally, cross-domain ontologies can be combined with each other, which can increase the reusability of FDOs beyond their domain boundaries.
The components described above are being implemented in the form of multiple software tools related to the framework. The first one, a controlled vocabulary editor written as a Python-based web application called VocPopuli, is the entry point for domain experts who want to develop a metadata vocabulary for their field of research or lab. The software, whose first version is already being tested internally, enables the collaborative definition and editing of metadata terms. Additionally, it annotates each term, as well as the entire vocabulary, with the help of the PROV Data Model (PROV-DM) (Moreau and Missier 2013) - a schema used to describe the provenance of a given object. Finally, it assigns a PID to each term in the vocabulary, as well as to the vocabulary itself. It is worth noting that the generated vocabularies themselves can be seen through the prism of FDOs: they contain data (the defined terms) which is annotated with metadata (e.g., the terms' authors) and provided with a PID.
The second software solution will facilitate the transformation of the vocabularies developed with the help of VocPopuli into ontologies. It will handle two distinct use cases – the from-scratch conversion of vocabularies into ontologies, and the augmentation of existing ontologies wi...
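The following sketch illustrates the two framework components side by side: a thesaurus-style term expressed in SKOS with rdflib, then enriched with an ontology-level relation (here, the expected datatype of the metadata field). The namespace and term names are invented for the example and are not VocPopuli's actual output.

```python
# Sketch: a collaboratively defined vocabulary term in SKOS, plus one
# ontology-style relation that a plain thesaurus cannot express.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS, XSD

VOC = Namespace("https://vocab.example.org/tribology/")  # hypothetical namespace
g = Graph()
g.bind("skos", SKOS)

term = VOC["slidingSpeed"]
g.add((term, RDF.type, SKOS.Concept))
g.add((term, SKOS.prefLabel, Literal("sliding speed", lang="en")))
g.add((term, SKOS.altLabel, Literal("sliding velocity", lang="en")))  # synonym
g.add((term, SKOS.broader, VOC["experimentParameter"]))               # thesaurus hierarchy

# Ontology-level enrichment: the datatype a value for this field must take.
g.add((term, VOC["expectedDatatype"], XSD.double))

print(g.serialize(format="turtle"))
```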
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Towards an extensible FAIRness assessment of FAIR Digital Objects

    • Abstract: Research Ideas and Outcomes 8: e94988
      DOI : 10.3897/rio.8.e94988
      Authors : Vincent Emonet, Remzi Çelebi, Jinzhou Yang, Michel Dumontier : The objective of the FAIR Digital Objects Framework (FDOF) is for objects published in a digital environment to comply with a set of requirements, such as identifiability and the use of a rich metadata record (Santos 2021, Schultes and Wittenburg 2019, Schwardmann 2020). With the increasing prevalence of the FAIR (Findable, Accessible, Interoperable, Reusable) principles and FAIR Digital Objects (FDOs) used within different communities and domains (Wise et al. 2019), there will be a need to evaluate whether an FDO meets the requirements of the ecosystem in which it is used.
Without a dedicated framework, communities will develop isolated assessment systems from the ground up (Sun et al. 2022, Bahim et al. 2020), which will cost them time and lead to FAIRness assessments with limited interoperability and comparability.
Previous work from the FAIR Metrics working group defined a framework for deploying individual FAIR metrics tests as separate service endpoints (Wilkinson et al. 2018, Wilkinson et al. 2019). To work in accordance with this framework, each test should take a subject URL as input and return a score (either 0 or 1), a test version, and the test execution logs. A central service can then be used to assess the FAIRness of digital objects using collections of individual assessments. Such a framework could be easily extended, but there are currently no guidelines or tools to implement and publish new FAIRness assessments complying with this framework.
To address this problem, we published the fair-test library in Python, along with its documentation, which helps with developing and deploying individual FAIRness assessments. With this library, developers define their metric tests using custom Python objects, which guide them to provide all required metadata for their test as attributes and to implement the test evaluation logic as a function. The library also provides additional helper functions for common tasks, such as retrieving metadata from a URL or testing a metric test.
These tests can then be deployed as a web API and registered in a central FAIR evaluation service supporting the FAIR metrics working group framework, such as FAIR enough or the FAIR evaluator. Finally, users of the evaluation services will be able to group the registered metrics tests in collections used to assess the quality of publicly available digital objects.
There are currently as many as 47 tests that have been defined to assess compliance with various FAIR metrics, of which 25 have been defined using the fair-test library, including tests assessing whether the identifier used is persistent, or whether the metadata record attached to a digital object complies with a specific schema. This presentation introduces a user-friendly and extensible tool, which can assess whether specific requirements are met for a digital resource. 
      Our contributions are:
developing and publishing the fair-test library to make the development and deployment of independent FAIRness assessment tests easier;
developing and publishing tests in Python for existing FAIR metrics: 23 generic tests covering most of the FAIR metrics, and 2 domain-specific tests for the Rare Disease research community.
We aim to engage with the FDO community to explore potential use-cases for an extensible tool to evaluate FDOs, and to discuss their expectations related to the evaluation of digital objects. Insights and guidelines from the FDO community would contribute to further improving the fair-test ecosystem. Among the improvements currently under consideration, we can cite improving the collaborative aspect of metadata extraction and adding new metadata to be returned by the tests. HTML XML PDF
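For orientation, the sketch below shows what a metric test built with fair-test can look like, following the pattern in the library's documentation. The metric path, the ORCID and the persistence check are illustrative assumptions, and attribute names should be verified against the current fair-test documentation.

```python
# Sketch: an individual FAIRness metric test, deployable as a web API endpoint
# and registrable in a central FAIR evaluation service.
from fair_test import FairTest, FairTestEvaluation


class MetricTest(FairTest):
    metric_path = "f1-identifier-persistent"   # path under which the test is served
    applies_to_principle = "F1"
    title = "Check if the identifier is persistent"
    description = "Tests whether the subject URL uses a known persistent identifier scheme."
    author = "https://orcid.org/0000-0000-0000-0000"  # placeholder ORCID
    metric_version = "0.1.0"

    def evaluate(self, eval: FairTestEvaluation):
        # Pass if the subject starts with a persistence-guaranteeing prefix.
        persistent_prefixes = ("https://doi.org/", "https://w3id.org/", "http://purl.org/")
        if str(eval.subject).startswith(persistent_prefixes):
            eval.success("The subject uses a persistent identifier scheme.")
        else:
            eval.failure("The subject does not use a known persistent identifier scheme.")
        return eval.response()
```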
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • The FAIR extension: A web browser extension to evaluate Digital Object
           FAIRness

    • Abstract: Research Ideas and Outcomes 8: e95006
      DOI : 10.3897/rio.8.e95006
      Authors : Pedro Hernandez Serrano, Vincent Emonet : The scientific community's efforts have increased regarding the application and assessment of the FAIR principles on Digital Objects (DOs) such as publications, datasets, or research software. Consequently, openly available automated FAIR assessment services have been working on standardization, such as FAIR enough, the FAIR evaluator or FAIRsFAIR's F-UJI. Digital Competence Centers such as university libraries have been paramount in this process by facilitating a range of activities, such as awareness campaigns, trainings, or systematic support. However, in practice, using the FAIR assessment tools is still an intricate process for the average researcher. It involves a steep learning curve, since it requires performing a series of manual processes and acquiring specific knowledge of the frameworks, disengaging some researchers in the process.
We aim to use technology to close this gap and make this process more accessible by bringing the FAIR assessment to researchers' profiles. We will develop "The FAIR extension", an open-source, user-friendly web browser extension that allows researchers to run FAIR assessments directly at the web source. Web browser extensions have been an accessible digital tool for libraries supporting scholarship (De Sarkar 2015). A remarkable example is the lightweight version of reference managers deployed as a browser service (Ferguson 2019). Moreover, it has been demonstrated that they can be a vehicle for open access, such as the Lean Library Browser Extension.
The FAIR extension is a service that builds on top of the community-accepted FAIR evaluator APIs, i.e. it does not intend to create yet another FAIR assessment framework from scratch. The objective of the FAIR Digital Objects Framework (FDOF) is for objects published in a digital environment to comply with a set of requirements, such as identifiability and the use of a rich metadata record (Santos 2021, Schultes and Wittenburg 2019). The FAIR extension will connect via REST-like operations to individual FAIR metrics test endpoints, according to Wilkinson et al. (2018), Wilkinson et al. (2019), and ultimately display the FAIR metrics on the client side (Fig. 1). Ultimately, the user will get FAIR scores of articles, datasets and other DOs in real-time on a web source, such as a scholarly platform or DO repository, with the possibility of creating simple reports of the assessment.
It is acknowledged that the development of web-based tools carries some constraints regarding platform version releases, e.g. the Chromium Development Calendar. Nevertheless, we are optimistic about the potential use cases. For example:
A student wanting to make use of a DO (e.g. a software package), but who doesn't know which to choose. The FAIR extension will indicate which one is more FAIR and aid the decision-making process.
A data steward recommending sources.
A researcher who wants to display all FAIR metrics of her DOs on a research profile.
A PI who wants to evaluate an aggregated metric for a project.
These use cases can be the means to bring the open source community and FAIR DO interest groups to work together. HTML XML PDF
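Under the hood, a call from the extension to one of these endpoints could look like the sketch below, following the FAIR Metrics working group convention of posting a subject URL and receiving a score back. The endpoint URL is a hypothetical placeholder.

```python
# Sketch: client-side call to an individual FAIR metrics test endpoint.
import requests

TEST_ENDPOINT = "https://fair-enough.example.org/tests/f1-identifier-persistent"

result = requests.post(
    TEST_ENDPOINT,
    json={"subject": "https://doi.org/10.3897/rio.8.e95006"},  # DO to assess
    headers={"Accept": "application/json"},
    timeout=30,
)
assessment = result.json()
print(assessment)  # expected to contain a 0/1 score, a test version and execution logs
```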
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Advancing caching and automation with FDO

    • Abstract: Research Ideas and Outcomes 8: e94856
      DOI : 10.3897/rio.8.e94856
      Authors : Amirpasha Mozaffari, Niklas Selke, Martin Schultz : Introduction: Geosciences are utilising big data that is constantly updated, modified, and changed by an ever-growing stream of newly measured, modelled and accumulated data (Reichstein et al. 2019). Many of these data reside in databases and are frequently revised and recalculated with new data, corrections, or recalibrations. Thus, versioning the data is a known challenge for the Earth system science (ESS) community. To ensure reusability of the data and traceability of associated information, it is crucial to document this stream of changes in an efficient, human-readable, and machine-actionable manner. As previous studies (Klump et al. 2021, Schultes et al. 2022) and community-driven efforts have shown, we believe the FAIR Digital Object (FDO) (De Smedt et al. 2020) concept could provide a neat solution by encapsulating the data, metadata, data version, and associated information and identifying it with a persistent identifier (PID) (Philipson 2019). In addition, FDO could be a path to avoid rerunning expensive, energy-demanding computations and duplicating data, as PIDs and metadata could enable a detailed search and cataloguing of available statistical aggregations and other products. In this conceptual work, we want to explore the FDO capability for data versioning combined with a state-of-the-art caching system for relational databases to provide reusable and mutable data products. Such an FDO-enabled caching system would enable us to identify recurring access patterns to data and store them as FDOs. Moreover, we believe such a concept could be integrated into an automated workflow where highly anticipated computations or user requests that require intensive computation are generated and submitted to High-Performance Computing (HPC) systems.
Case study: TOAR database: The Tropospheric Ozone Assessment Report (TOAR) database (Schultz 2017) consists of an extensive collection of global air quality measurements focusing on ground-level ozone. We use PostgreSQL (PostgreSQL 2022), a widely used database for relational data, and provide the database schema and all related code via a free and open-source git repository (Jülich Supercomputing Centre 2022). A vital asset of the TOAR database is the associated REST API, which has been implemented with Python/FastAPI and includes a module for statistical analysis of TOAR data. Users can request one or several of over 30 different statistical aggregates and define one of 5 target temporal resolutions from daily to annual, and the API will trigger online calculations based on the original data, which are stored at hourly time resolution. As there is an apparent demand from the scientific community to expand these analysis capabilities and allow for multi-variable and multi-station analysis (for example, to evaluate numerical model simulations), it will be necessary to design new parallel workflows to enable such calculations in a reasonable time.
Two specific challenges to overcome in the design of an automated workflow with an FDO-enabled caching system are ensuring that the query stays connected to the correct data and establishing a schedule for pre-calculating the most frequently used statistical aggregates. In the following, we discuss these two challenges in more detail. Caching system: There are other data providers in the field, but they commonly focus on archiving the measurement data. We want an analysis tool with the fastest possible response times for the users. 
Furthermore, we want to look into FDOs to ensure the preservation of queries and make them reusable and traceable. For the caching itself, it is very important that the cache key created for a query allows for verification that the data used in computing the query the first time did not change when trying to reuse the cached result. In our conceptual work, we want to develop a concept and a demonstrator of an atmospheric data analysis cache. This includes choices for the underlying technical solution (e.g. PostgreSQL, MongoDB, Redis...), the definition of data structures and hash codes, the design of a mechanism for triggering re-calculations, the definition of a schedule for automated cache updates, and various aspects related to query documentation and reproducibility of results. Technical obstacles caused by the expected size of up to 0.8 Terabytes for the TOAR database and the complicated scalability issues that can arise should be considered for a possible solution. Ideally, the caching system should be agnostic of the underlying database/server choice to enhance portability. Automated workflow: The second challenge to address here is to combine the envisioned caching system with a flexible workflow scheme. Such a workflow setup enables pre-compiling and calculating the most frequently used statistical aggregates ahead of user demand. Queries can either be triggered by a user (demand-driven) or by an automatic system which computes commonly used queries without a user having to trigger them (provider-driven), so that as many query results as possible are ready to go for users and they do not have to wait for the results after they have sent a request. User requests are categorised according to the availability of the statistical products and the required computation effort. Some might have already been calculated and stored as FDOs and can be quickly reloaded and processed further. Some queries might be new but still possible to calculate on the fly, with responses delivered on a near real-time basis. In contrast, some more intensive statistical aggregations require HPC. We believe an automated FDO-enabled caching system will utilise the metadata and FDOs to serve on-demand data requests and reduce repetitive computation. It paves the way for intelligent computation that can be scheduled at d...
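One way to meet the requirement that a cache key stays bound to the data it was computed from is sketched below: hash the canonicalised query together with a version marker of the underlying data, so that any revision of the data automatically invalidates the cached result. The field names and the version marker are illustrative assumptions.

```python
# Sketch: derive a deterministic cache key from a query plus a data-version marker.
import hashlib
import json

def cache_key(query: dict, data_version: str) -> str:
    canonical = json.dumps(
        {"query": query, "data_version": data_version},
        sort_keys=True,            # canonical ordering: equal queries hash equally
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

key = cache_key(
    {"station": "DEBW107", "metric": "dma8epa", "resolution": "annual"},
    data_version="2022-10-01",     # e.g. last revision timestamp of the hourly series
)
print(key)
```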
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Challenges for Implementing FAIR Digital Objects with High Performance
           Workflows

    • Abstract: Research Ideas and Outcomes 8: e94835
      DOI : 10.3897/rio.8.e94835
      Authors : Line Pouchard, Tanzima Islam, Bogdan Nicolae : New types of workflows are being used in science that couple traditional distributed and high-performance computing (HPC) with data-intensive approaches, and orchestrate ensembles of numerical simulations and artificial intelligence (AI) models. Such workflows may use AI models to supplement computation where numerical simulations may be too computationally expensive, to automate trivial yet time-consuming operations, and to perform preliminary selections among intractable numbers of combinations in domains as diverse as protein binding, fine-grid climate simulations, and drug discovery. They offer renewed opportunities for scientific research but exhibit high computational, storage and communications requirements [Goble et al. 2020, Al-Saadi et al. 2021, da Silva et al. 2021]. These workflows can be orchestrated by workflow management systems (WMS) and built upon composable blocks that facilitate task placement and resource allocation for parallel executions on high performance systems [Lee et al. 2021, Merzky et al. 2021].
The scientific computing communities running these kinds of workflows have been slow to adopt Findable, Accessible, Interoperable, and Reusable (FAIR) principles, in part due to the complexity of workflow life cycles, the numerous WMS, and the specificity of HPC systems with rapidly evolving architectures and software stacks, and execution modes that require resource managers and batch schedulers [Plale et al. 2021]. FAIR Digital Objects (FDOs) that encapsulate bit sequences of data, metadata, types and persistent identifiers (PIDs) can help promote the adoption of FAIR, enable knowledge extraction and dissemination, and contribute to re-use [De Smedt et al. 2020]. As workflows typically use data and software during planning and execution, FDOs are particularly suited to enable re-use [Wittenburg et al. 2020]. But the benefits of FDOs, such as automated data processing and actionable DO collections, cannot be realized without the main components of FAIR, rich metadata and clear identifiers, being universally adopted in the community. These components are still elusive for HPC digital objects. Some metadata are added after results have been produced, are not described by controlled vocabularies, and are typically left unconstrained, resulting in inefficient processes and loss of knowledge. Persistent identifiers are added at the time of publication to data supporting conclusions, so only a very small amount of data is being shared outside a small community of researchers “in the know”. In this conceptual work, one can distinguish several kinds of FDOs for HPC workflows that present both common and specific challenges to the development of canonical DO infrastructure and the implementation of FDO workflows, which we discuss below:
result FDOs that represent computational results obtained when program execution completes,
performance FDOs that contain performance measures and results from code optimization on parallel, heterogeneous architectures,
intermediate FDOs from intermediate states of workflow execution, obtained from HPC checkpointing.
All these FDOs for HPC workflows should include the computing environment and system specifications on which code was executed, for metadata rich enough to enable re-usability [Pouchard et al. 2019]. Containers are often used to capture dependencies between underlying libraries and versions in the execution environment for the installation and re-use of software code [Lofstead et al. 2015, Olaya et al. 2020]. But containers published in code repositories are made available without identifiers registered with resolvers. For instance, to attribute a Digital Object Identifier to software shared in github, one must perform the additional step of registering the code into Zenodo. FDOs extracted and built in the context of a canonical workflow framework including collections will help with the attribution of persistent identifiers and the linking of the execution environment with data and workflow.
Computational results may include machine learning predictions resulting from stochastic training of non-deterministic models. Neural networks and deep learning models present specific challenges for result FDOs related to provenance and the selection of quantities that need to be included in an FDO for the re-use of results. What information needs to be included in a FAIR Digital Object encapsulating deep learning results to make it persistent and re-usable? The description of method, data and experiment recommended in [Gundersen and Kjensmo 2018] can be instantiated in an FDO collection. To make it re-usable, it should include the model architecture, the machine learning platform and its version, a submission script that contains hyperparameters, the loss function, batch size and number of epochs [Pouchard et al. 2020]. Challenges specific to digital objects containing performance measures for HPC workflows are those related to size, selection and reduction. Performance data at scale tend to be very large, thus a principled approach to selection is needed to determine which execution counters must be included in FDOs for performance reproducibility of an application [Patki et al. 2019]. Performance FDOs should include the variables selected to show their impact on performance and the methods used for selection: do such variables represent outliers in performance metrics? What methods and thresholds are used to qualify them as outliers, and what impact do these outliers have on the overall performance of an execution? A key contributor to the failure to capture important information in HPC workflows is that metadata and provenance capture is often “bolted on” after the fact and in a piecemeal, cumbersome, inefficient manner that impedes further analysis. An FDO approach...
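As a rough illustration of the quantities listed above, a result FDO for a deep-learning run might carry metadata like the sketch below. Every key and value is an invented example; the point is that architecture, platform, hyperparameters and execution environment travel with the result.

```python
# Sketch: metadata a result FDO for a deep-learning experiment could encapsulate.
import json
import platform

result_fdo_metadata = {
    "pid": "21.T11148/0000-example-result",         # hypothetical PID
    "type": "result-fdo",
    "model_architecture": "ResNet-50",
    "ml_platform": {"name": "PyTorch", "version": "1.12.1"},
    "hyperparameters": {
        "loss_function": "cross_entropy",
        "batch_size": 256,
        "epochs": 90,
        "learning_rate": 0.1,
    },
    "execution_environment": {
        "system": platform.platform(),              # host system specification
        "python": platform.python_version(),
        "container": "docker://example/train:1.0",  # hypothetical container image
    },
}
print(json.dumps(result_fdo_metadata, indent=2))
```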
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Realizing FAIR Digital Objects for the German Helmholtz Association of
           Research Centres

    • Abstract: Research Ideas and Outcomes 8: e94758
      DOI : 10.3897/rio.8.e94758
      Authors : Thomas Jejkal, Andreas Pfeil, Jan Schweikert, Anton Pirogov, Pedro Barranco, Florian Krebs, Christian Koch, Gerrit Guenther, Constanze Curdt, Martin Weinelt : The Helmholtz Association (Anonymous 2022d), the largest association of large-scale research centres in Germany, covers a wide range of research fields employing more than 43,000 researchers. In 2019, the Helmholtz Metadata Collaboration (HMC) Platform (Anonymous 2022f) was started as a joint endeavor across all research areas of the Helmholtz Association to make the depth and breadth of research data produced by Helmholtz Centres findable, accessible, interoperable, and reusable (FAIR) for the whole science community. To reach this goal, the concept of FAIR Digital Objects (FAIR DOs) has been chosen as the top-level commonality for existing and future infrastructures of all research fields.
In doing so, HMC follows the original approach of realizing FAIR DOs based on globally unique Persistent Identifiers (PIDs), e.g., provided by https://handle.net/, machine-actionable PID Records and strong typing using Data Types like https://dtr-test.pidconsortium.eu/#objects/21.T11148/1c699a5d1b4ad3ba4956 registered in a Data Type Registry, e.g., http://dtr-test.pidconsortium.eu/. In all these areas, HMC can build on the great groundwork of the Research Data Alliance and the FDO Forum. However, when it comes to realization, there are still some gaps that will have to be addressed during our work and will be raised in this presentation. For single FAIR DO components like PIDs and Data Types, existing infrastructures are already available. Here, the Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG) (Anonymous 2022e) provides strong support with their many years of experience in this field. Within the framework of the ePIC consortium (Anonymous 2022c), the GWDG offers PID prefixes based on a sustainable business model on the one hand, and on the other hand is very active in providing base services required for realizing FAIR DOs, e.g., different instances of Data Type Registries for accessing, creating, and managing Data Types required by FAIR DOs. Besides that, in the context of HMC we developed a couple of technical components to support the creation and management of FAIR DOs: the Typed PID Maker (Pfeil 2022b), providing machine-actionable interfaces for creating, validating, and managing PIDs with machine-actionable metadata stored in their PID record, and the FAIR DO testbed, currently evolving into the FAIR DO Lab (Pfeil 2022a), serving as a reference implementation for setting up a FAIR DO ecosystem. However, introducing FAIR DOs is not only about providing technical services, but also requires the definition of and agreement on interfaces, policies, and processes.
A first step in this direction was made in the context of HMC by agreeing on a Helmholtz Kernel Information Profile (http://dtr-test.pidconsortium.eu/#objects/21.T11148/b9b76f887845e32d29f7). In the concept of FAIR DOs, PID Kernel Information as defined by Weigel et al. (2018) is key to the machine actionability of digital content. Strongly relying on Data Types and stored in the PID record directly at the PID resolution service, PID Kernel Information can be used by machines for fast decision making. 
The Helmholtz Kernel Information Profile is an attempt to introduce a top-level commonality across all digital assets produced within the Helmholtz Association and beyond, to establish a basis for FAIR research data based on FAIR DOs. Hereby, the Helmholtz Kernel Information Profile integrates the recommendations of the RDA PID Kernel Information Working Group (Anonymous 2022b) as far as possible. By extending the draft Kernel Information Profile (Weigel et al. 2018) with additional, mostly optional attributes, the Helmholtz Kernel Information Profile allows adding contextual information to FAIR DOs, e.g., research topic or contact information, which is then available for machine decisions. Furthermore, additional properties for representing relationships, e.g., hasMetadata and isMetadataFor, were introduced to allow mutual relations between FAIR DOs.
Currently, a demonstrator is being implemented that integrates the above components and services, i.e., PID Service, Data Type Registry, and Typed PID Maker. Fig. 1 outlines the architecture of the first version of the demonstrator. In this first version, in a semi-automatic workflow, a user enters a Zenodo (Anonymous 2022a) PID in a graphical Web frontend. A mapping component tries to automatically fill at least the properties required by the Helmholtz Kernel Information Profile using the obtained Zenodo metadata record. In a manual validation loop, the user may add or update certain properties before they are sent to an instance of the Typed PID Maker, validated against the Helmholtz Kernel Information Profile, and stored in the record of a newly registered PID using the services of the ePIC consortium. In addition, registered PID records are made searchable via the graphical frontend on top of a search index, e.g., realized using https://www.elastic.co/.
After implementing this generic workflow, additional mappers supporting other repository platforms will be implemented based on the lessons learned, which will lead to a growing number of FAIR DOs and holds potential for providing significant benefits to scientists, e.g., a central point of contact for research data sets stored in different repositories, machine-actionable identification of relevant datasets, and the creation of knowledge graphs representing relationships between data sets, repository platforms, researchers and research organizations. Furthermore, the gathered experience and its documentation will help others to apply the FAIR DO concept more easily, which will lead to an ever-growing collec...
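To indicate what the semi-automatic workflow produces, the sketch below registers a PID record with kernel-information attributes through a Typed-PID-Maker-style REST call. The endpoint path, payload structure and most attribute keys are assumptions made for illustration and must be checked against the actual Typed PID Maker API; only the profile PID cited above is taken from the text.

```python
# Sketch: register a PID record carrying kernel-information attributes.
import requests

TYPED_PID_MAKER = "https://pid.example.org/api/v1/pit/pid"  # hypothetical endpoint

record = {
    "entries": {
        "kernelInformationProfile": "21.T11148/b9b76f887845e32d29f7",  # Helmholtz profile
        "digitalObjectLocation": "https://zenodo.org/record/0000000",  # hypothetical record
        "dateCreated": "2022-10-12",
        "contact": "https://orcid.org/0000-0000-0000-0000",            # placeholder
        "isMetadataFor": "21.T11148/0000-example-data",                # relation attribute
    }
}
response = requests.post(TYPED_PID_MAKER, json=record, timeout=30)
print(response.status_code, response.json())
```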
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • A FAIRification roadmap for ELIXIR Software Management Plans

    • Abstract: Research Ideas and Outcomes 8: e94608
      DOI : 10.3897/rio.8.e94608
      Authors : Olga Giraldo, Renato Alves, Dimitrios Bampalikis, Jose Fernandez, Eva Martin del Pico, Fotis Psomopoulos, Allegra Via, Leyla Jael Castro : Academic research requires careful handling of data plus any means to collect, transform and publish it, activities commonly supported by research software (from scripts to end-user applications). Data Management Plans (DMPs) are nowadays commonly requested by funders as part of good research practices. A DMP describes the data management lifecycle for the data corresponding to a research project, covering activities from collection to publication and preservation. To support and improve transparency, open science, reproducibility (and other *ilities), data needs to be accompanied by the software transforming it. Similar to DMPs, Software Management Plans (SMPs) can help formalize a set of structures and goals ensuring that the software is accessible and reusable in the short, medium and long term. DMPs and SMPs can be presented as text-based documents, guided by a set of questions corresponding to key points related to the lifecycle of either data or software.
A step forward for DMPs is the machine-actionable DMP (maDMP) proposed by the Research Data Alliance DMP Common Standards Working Group. A maDMP corresponds to a structured representation of the most common elements present in a DMP (Miksa et al. 2020b), overcoming some obstacles linked to text-based representation. Such a structured representation makes it easier for DMPs to become readable and reusable for both humans and machines alike. The DMP Common Standard ontology (DCSO) (Cardoso et al. 2022) further supports maDMPs, as it makes it easier to extend the original maDMP application profile to cover additional elements related to, for instance, SMPs or specific requirements from funders. maDMPs can be combined with the notion of Research Object Crates (RO-Crates) to automate and ease the management of research data (Miksa et al. 2020a). An RO-Crate (Soiland-Reyes et al. 2022) is an open, community-driven, and lightweight approach based on schema.org (Guha et al. 2016) annotations in JSON-LD to package research data (or any other research digital object) together with its metadata in a machine-readable manner.
The ELIXIR SMP has been developed by the ELIXIR Software Development Best Practices Group in the ELIXIR Tools Platform to support researchers in life sciences (Alves et al. 2021). The ELIXIR SMP aims at making it easier to follow research software good practices aligned to the findable, accessible, interoperable and reusable principles for research software (FAIR4RS) (Chue Hong et al. 2022) while dealing with the lifecycle of research software. Its primary goal is encouraging a wider adoption by life science researchers, and being as inclusive as possible to the various levels of technical expertise. Here we present a roadmap for ELIXIR SMPs to become FAIR digital objects (FDOs) (Schultes and Wittenburg 2019) based on the extension of maDMPs and DCSO and the use of RO-Crates. FDOs have been proposed as a way to package digital objects together with their metadata, types, identifiers and operations, so they become more machine-actionable and self-contained.
The current version of the ELIXIR SMP includes seven sections: accessibility and licensing, documentation, testing, interoperability, versioning, reproducibility, and recognition. 
Each section includes questions guiding and supporting researchers so they cover key aspects of the software lifecycle relevant to their own case. To lower the barrier and make it easier for researchers, most questions are Yes/No, with a few offering a set of options. In some cases, a URL is also requested, for instance regarding the location of the documentation for end-users. Our roadmap for ELIXIR SMPs to move from a text-based questionnaire to an FDO comprises four main steps:
creating a maSMP application profile,
extending DCSO,
mapping to schema.org, and
using RO-Crates.
Our maSMP application profile will include the semantic representation of the structured metadata that comes from the ELIXIR SMP. We will add granularity to the current root of the DCSO (dcso:DMP) by proposing the term SMP. In addition, we will propose the term ResearchSoftware as a dcso:Dataset. Terminology related to documentation, such as “Objective”, will also be considered. The objective is the why of the research software, which is crucial for its comprehensibility. We will propose the term DatasetObjective as the reason for the creation of a dataset. SourceCodeRepository and SourceCodeTesting are also good candidates to be part of the DCSO extension.
We will extend DCSO with new classes and properties as necessary to include the software-related elements mentioned in the maSMP application profile. As the ELIXIR SMP targets the life science community, we will analyze the need to add links from DCSO to ontologies describing common operations, activities, and types in this domain. One important aspect is the creation of a mapping from DCSO to schema.org. Schema.org has become a popular choice to add lightweight semantics to web pages but can also be used on its own to provide metadata describing all sorts of objects. In life sciences, Bioschemas (Gray et al. 2017) offers guidelines on how to use some of the schema.org types aligned to this domain. Bioschemas includes a set of profiles, including minimum, recommended and optional properties, that have been agreed to and adopted by the community; for instance, the ComputationalTool profile provides a way to describe software tools and applications. Bioschemas promotes its adoption by key resources in Life Sciences and the development of tools such as the Bioschemas Markup Scraper and Extractor (BMUSE) used for the harvesting of the data (Gray et al. 2022).
Our final step for ELIXIR SMPs to become an FDO is using RO-Crates to packag...
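To hint at what the schema.org mapping step could yield, the sketch below renders SMP answers about a piece of research software as schema.org markup in JSON-LD, using the real SoftwareSourceCode type and its properties; the concrete values are invented for the example.

```python
# Sketch: schema.org (JSON-LD) markup for software described by an SMP.
import json

smp_markup = {
    "@context": "https://schema.org",
    "@type": "SoftwareSourceCode",
    "name": "example-analysis-tool",
    "codeRepository": "https://github.com/example/example-analysis-tool",
    "programmingLanguage": "Python",
    "license": "https://spdx.org/licenses/MIT",
    "description": "Why the software exists, answering the SMP objective question.",
}
print(json.dumps(smp_markup, indent=2))
```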
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Updating Linked Data practices for FAIR Digital Object principles

    • Abstract: Research Ideas and Outcomes 8: e94501
      DOI : 10.3897/rio.8.e94501
      Authors : Stian Soiland-Reyes, Leyla Jael Castro, Daniel Garijo, Marc Portier, Carole Goble, Paul Groth : Background: The FAIR principles (Wilkinson et al. 2016) are fundamental for data discovery, sharing, consumption and reuse; however, their broad interpretation and the many ways to implement them can lead to inconsistencies and incompatibility (Jacobsen et al. 2020). The European Open Science Cloud (EOSC) has been instrumental in maturing and encouraging FAIR practices across a wide range of research areas. Linked Data in the form of RDF (Resource Description Framework) is the common way to implement machine-readability in FAIR; however, the principles do not prescribe RDF or any particular technology (Mons et al. 2017).
      FAIR Digital Object: The FAIR Digital Object (FDO) (Schultes and Wittenburg 2019) has been proposed to improve researchers' access to digital objects through formalising their metadata, types and identifiers and exposing their computational operations, making them actionable FAIR objects rather than passive data sources. FDO is a set of principles (Bonino et al. 2019), implementable in multiple ways. Current realisations mostly use the Digital Object Interface Protocol (DOIPv2) (DONA Foundation 2018), with the main implementation CORDRA. We can consider DOIPv2 a simplified combination of object-oriented (CORBA, SOAP) and document-based (HTTP, FTP) approaches. More recently, the FDO Forum has prepared detailed recommendations, currently open for comments, including a DOIP endorsement and updated FDO requirements. These point out Linked Data as another possible technology stack, which is the focus of this work.
      Linked Data: Linked Data standards (LD), based on the Web architecture, are commonplace in sciences like bioinformatics, chemistry and medical informatics – in particular to publish Open Data as machine-readable resources. LD has become ubiquitous on the general Web: the schema.org vocabulary is used by over 10 million sites for indexing by search engines, and 43% of all websites use JSON-LD. Although LD practices align with FAIR (Hasnain and Rebholz-Schuhmann 2018), they do not fully encompass the active aspects of FDOs. The HTTP protocol is used heavily for applications (e.g. mobile apps and cloud services), with REST APIs of customised JSON structures. Approaches that merge the LD and REST worlds include the Linked Data Platform (LDP), Hydra and Web Payments.
      Meeting FDO principles using Linked Data standards: Considering the potential of FDOs when combined with the mature technology stack of LD, here we briefly discuss how the FDO principles in Bonino et al. (2019) can be achieved using existing standards. The general principles (G1–G9) apply well: these are open standards, with HTTP stable for 30 years, JSON-LD widely used, RDF the main choice of FAIR practitioners, and a clear abstraction of the RDF model with stable bindings available in multiple serialisations. However, when considering the specific principles (FDOF1–FDOF12), we find that additional constraints and best practices need to be established – arbitrary LD resources cannot be assumed to follow FDO principles. This is equivalent to how existing use of DOIP is not FDO-compliant without additional constraints. Namely, persistent identifiers (PIDs) (McMurry et al. 2017) (FDOF1) are common in the LD world (e.g. using http://purl.org/ or https://w3id.org/); however, they do not always have a declared type (FDOF2), or the PID may not even appear in the metadata. URL-based PIDs are resolvable (FDOF3), typically over HTTP using redirections and content negotiation. One great advantage of RDF is that all attributes are defined semantic artefacts with PIDs (FDOF4), and attributes can be reused across vocabularies. While CRUD operations (FDOF6) are supported by native HTTP operations (GET/PUT/POST/DELETE) as in LDP, there is little consistency on how to define operation interfaces in LD (FDOF5). Existing REST approaches like OpenAPI and URI templates are mature and good candidates, and should be related to defined types to support machine-actionable composition (FDOF7). HTTP error code 410 Gone is used in tombstone pages for removed resources (FDOF12), although 404 Not Found is more frequent. Metadata is resolved to HTTP documents with their own URIs, but these frequently do not have their own PID (FDOF8). RDF-Star and nanopublications (Kuhn et al. 2021) give ways to identify and trace the provenance of individual assertions. Different metadata levels (FDOF9) are frequently developed for LD vocabularies across different communities (FDOF10), such as FHIR for health data, Bioschemas for bioinformatics and >1000 more specific bio-ontologies. Increased declaration and navigation of profiles is therefore essential for machine-actionability and consistent consumption across FAIR endpoints. Several standards exist for rich collections (FDOF11), e.g. OAI-ORE, DCAT, RO-Crate, LDP. These are used and extended heterogeneously across the Web, but consistent machine-actionable FDOs will need specific choices of core standards and vocabularies. Another challenge is when multiple PIDs refer to “almost the same” concept in different collections – significant efforts have created manual and automated semantic mappings (Baker et al. 2013, de Mello et al. 2022). Currently, the FDO Forum has suggested the use of LDP as a possible alternative for implementing FAIR Digital Objects (Bonino da Silva Santos 2021), which proposes a novel approach of content negotiation with custom media types.
      Discussion: The Linked Data stack provides a set of specifications, tools and guidelines to help the FDO principles become a reality. This mature approach can accelerate uptake of FDO by scholars and existing research infrastructures such as the Europea...
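      As a minimal sketch of the PID resolution pattern described above (FDOF3: resolvable URL-based PIDs using HTTP redirections and content negotiation), the Python fragment below dereferences a hypothetical identifier and asks for JSON-LD metadata; the PID and the media types a real service offers are assumptions:

        import requests

        pid = "https://w3id.org/example/dataset/42"  # hypothetical URL-based PID

        # Follow redirections and negotiate a machine-readable representation.
        response = requests.get(
            pid,
            headers={"Accept": "application/ld+json"},  # ask for JSON-LD metadata
            allow_redirects=True,
            timeout=30,
        )
        response.raise_for_status()
        print(response.url)                        # final location after redirects
        print(response.headers.get("Content-Type"))
        metadata = response.json()                 # parsed JSON-LD metadata document

      In a fully FDO-compliant Linked Data resource, the negotiated document would itself declare the PID and its type, addressing FDOF1 and FDOF2.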
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • FAIR Digital Objects in Official Statistics

    • Abstract: Research Ideas and Outcomes 8: e94485
      DOI : 10.3897/rio.8.e94485
      Authors : Olav ten Bosch, Edwin de Jonge, Henk Laloli, Christine Laaboudi-Spoiden : Introduction*1: Statistical offices at the national and international scale provide statistics on demography, labour, income, society, economy, environment and other domains. Their collective output is usually referred to as ‘Official Statistics’. These offices have a long tradition of publishing data fairly and openly, which is often part of their mission statement. For decades they have been providing websites with articles, press releases, graphs and tables of data for free, for research, for policy-making, and for common understanding. However, it is often not easy for users to find the data they need, to (re-)use it in data-driven work or to refer to the right (sub)set of data in a sustainable way. Therefore, in this article we take a closer look at Official Statistics from a findability, accessibility, interoperability, and reusability (FAIR) perspective.
      Digital Objects in Statistics: Digital objects in official statistics can be identified on multiple levels. The core concept is the statistical fact: a number describing a certain estimate on a certain phenomenon in a certain population over a certain period of time. For example, the estimated number of elderly inhabitants in the Province of Friesland (the Netherlands) on Jan 1, 2020, and the inflation in Belgium for fruits in 2021 are both statistical facts. Each of these statistical facts is uniquely defined and published as a digital object in the online statistical databases of Statistics Netherlands and Eurostat, respectively. Statistical facts may have a production status (provisional, final, revised) and are typically visualized as a number in a table cell or in a chart. Data without metadata are without meaning. A statistical fact refers to metadata (region, time, subject, population, uncertainty, quality etc.) which are essential to understand the context of the fact. We make a distinction here between structural or conceptual metadata, i.e. the structure and definitions of concepts, dimensions and types of data used, and referential metadata, i.e. descriptive information on the dataset. The metadata are of utmost importance for the data consumer to understand the data. Metadata have their own dynamics, e.g. classifications change over time. They are published as digital objects too, for example the statistical classification of economic activities (NACE). Statistical facts and their metadata form the foundation for higher-level statistics products. News releases and thematic articles that explain statistics in a broader context are examples. This higher-level content can be seen as digital objects too, as it is usually the main entry level for the general public and search engines and enables their findability and accessibility.
      Standards and FAIR: Each digital object in official statistics has its own structure, dynamics, dissemination channels and standards. This can sometimes make it hard to work with data from official statistics. Statistical databases differ among statistical organizations, both technically and in the metadata and the APIs that they offer for automated access. Main standards in this field are the Statistical Data and Metadata eXchange (SDMX), JSON-stat, OData, or simple formats such as CSV. Commonly agreed structural metadata is organized into SDMX registries (global registry, Eurostat registry), which provide automated access to statistical metadata, which is good for accessibility. The SDMX standard is, however, targeted at statistical and financial data, which may hinder wider reusability. Therefore, some statistical offices are moving to semantic standards. One example is the vocabularies and classifications published as linked open data by Statistics Netherlands. Publishing metadata this way makes it possible to reuse and link data across organizations and gives a semantic structure that is machine-readable. Another example is from the statistical office of the European Union, Eurostat, which is converting the statistical classifications and correspondence tables from their current metadata system into Linked Open Data in the EU Vocabularies website. The representation is based on XKOS, an ontology for modelling statistical classifications, offering machine-readable access for reusing objects as well as facilitating linking among classifications at the national, EU or international level. Yet another initiative comes from the United Nations Economic Commission for Europe (UNECE), where statistical organizations collectively develop a Core Ontology for Official Statistics (COOS) describing the statistical production process. All in all, for structural metadata, statistical organizations are increasingly moving towards linked data standards to better align with non-statistical communities. In the field of referential metadata, the Single Integrated Metadata Structure (SIMS) is used. It offers machine-readable descriptive metadata such as unit of measure, reference period, confidentiality, quality, accuracy etc. Some of the elements are also covered in the widely used RDF-based Data Catalog Vocabulary (DCAT) and its statistical variant (STAT-DCAT), which raises the question whether a further integration of these could improve the FAIRness of statistical referential metadata. With respect to higher-level digital objects, such as statistical articles, semantic web ontologies such as schema.org and Dublin Core are increasingly being used for annotating statistical output in common terms. The use of Digital Object Identifiers (DOIs) where applicable makes it easier to refer to statistical output. From the above we can see that the use of different standards at different levels creates various ways to identify statistical content, such as Uniform Resource Names (URNs), SDMX identifiers, Digital Object Identifiers (DOI
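      As a hedged sketch of the linked-open-data direction described above, the fragment below models one level of an invented statistical classification using SKOS (the vocabulary that XKOS extends); all URIs and labels are placeholders, not an actual NACE publication:

        from rdflib import Graph, Literal, Namespace
        from rdflib.namespace import RDF, SKOS

        EX = Namespace("https://example.org/classification/")  # invented namespace

        g = Graph()
        g.bind("skos", SKOS)

        scheme = EX["economic-activities-v1"]  # invented classification scheme
        concept = EX["A"]                      # invented top-level category

        g.add((scheme, RDF.type, SKOS.ConceptScheme))
        g.add((scheme, SKOS.prefLabel, Literal("Example activity classification", lang="en")))
        g.add((concept, RDF.type, SKOS.Concept))
        g.add((concept, SKOS.inScheme, scheme))
        g.add((concept, SKOS.notation, Literal("A")))
        g.add((concept, SKOS.prefLabel, Literal("Agriculture, forestry and fishing", lang="en")))

        print(g.serialize(format="turtle"))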
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • FIPs and Practice

    • Abstract: Research Ideas and Outcomes 8: e94451
      DOI : 10.3897/rio.8.e94451
      Authors : Barbara Magagna, Erik Schultes, Marek Suchánek, Tobias Kuhn : There is no doubt that FAIR Data and Services (GO FAIR Foundation 2019b) are needed to enable data-intensive research and innovation. While the FAIR Guiding Principles (Wilkinson et al. 2016) specify the expected behaviors of digital artifacts, they do not specify the technology choices actualizing these behaviors, leaving maximum freedom to operate for communities of practice. In recent years, EU research funding, including projects associated with the European Open Science Cloud (European Commission 2020), has been coupled to the FAIRness of the actual data and service landscape and promotes the development of FAIR implementation roadmaps. Beginning in 2019, ENVRI-FAIR (ENVRI Community 2019) used a text-based questionnaire approach with more than 50 questions aiming to capture details on how the FAIR Principles were implemented in the Research Infrastructure data architectures. The responses, however, were not directly usable for downstream analysis without substantial post-processing and harmonization (Magagna et al. 2020). At the same time, the GO FAIR Foundation (GO FAIR Foundation 2022) was leading the development of the FAIR Implementation Profile (FIP) concept. A FAIR Implementation Profile is a list of declared technology choices intended to implement each of the FAIR Guiding Principles, made as a collective decision by the members of a particular community of practice (Sustkova et al. 2020, Schultes et al. 2020). The FIP provided a more structured and efficient way of cataloging FAIR-related implementations, but to scale the approach it was necessary to develop an ontology linked to machine-actionable questionnaires. In a collaborative effort, the FIP ontology (Kuhn et al. 2020) was developed to specify questions prompting FAIR Implementation Communities (FICs) to explicitly declare, for each of the FAIR Principles, the FAIR Enabling Resources (FERs) that the community uses to implement them. FERs are defined as digital objects that provide functions needed to achieve some aspect of FAIRness. Twelve different types of FERs were identified (Magagna and Schultes 2022), such as the identifier service type (FAIR Principle F1), the lookup-service type (F4) or the structured vocabulary type (I2). The FIP as a whole is interpreted as community-specific metadata and as such is suitable for addressing the directive to reuse "domain-relevant community standards" given in Principle R1.3. To meet a requirement explicitly expressed by the ENVRI communities, the FER concept has been extended to include its status: already available for deployment or under development; currently deployed or planned to be deployed in the future; or planned to be replaced by another FER in the future. The Data Stewardship Wizard (DSW, ds-wizard.org), initially developed as a data management planning (DMP) tool in ELIXIR (Hooft et al. 2016), was designed with all the required features to provide a machine-actionable questionnaire (Pergl et al. 2019). Using knowledge models (KMs, Codevence 2020b) and Jinja2-based document templates (Anonymous 2007), it provides a versatile way to define decision trees in the form of smart questionnaires, as well as their transformation into practically any textual or machine-readable document, for example JSON, RDF or even FAIR Digital Object-like nanopublications. These capabilities were leveraged to build the FAIR Implementation Profile Wizard (FIP Wizard, Codevence 2020), where different KMs were used to create nanopublications within the Wizard environment for 1) FAIR Enabling Resources, 2) FAIR Implementation Communities, 3) Metadata Longevity Plans (the FER type for Principle A2), and 4) the FAIR Implementation Profiles themselves. Users open a questionnaire, fill it in, and can then preview and publish a nanopublication. The relationships between these FIP-related nanopublications are made by linking the Uniform Resource Identifiers (URIs, W3C 2001) of the nanopublications via drop-down lists and autocomplete features in the FIP Wizard questionnaire interface. This functionality is achieved using real-time API calls to a nanopublication querying service as the user types. As curatorial feedback, the user can see various metadata related to the matching nanopublications (e.g. description, timestamp, or approval badge). A custom submission service allows the submission of the nanopubs to a FIP triple store (Kuhn 2020). SPARQL queries can then help to produce the FIP matrix, a cross table of FERs and communities illuminating convergence opportunities created by FER reuse across the technological and community landscapes. In this way, FIPs are published by the FIP Wizard as FAIR (machine-readable) and Open data, which can then serve as a reference for practical FAIR data stewardship activities conducted by members of that community and others. A FIP Wizard module linking the FIP to the DMP, based on a mapping of the underlying KMs, is under development and will translate the FIP into clear community-specific directives that data stewards can subsequently implement. FIP publication also encourages FIP reuse and repurposing by other communities, which saves time ‘reinventing the wheel’ and simultaneously drives convergence on FAIR implementation choices. Over time, FIPs need to be updated to fit the purposes of the community and to accommodate the ongoing development of FAIR technologies, such as FAIR Digital Objects. The FIP Wizard supports systematic versioning, and it is anticipated that this revision legacy can later be mined for insights into FAIR-related technology trends. The FAIR Implementation Profiles and Practice (FIPP) working group (GO FAIR Foundation 2019) focuses on the role of FAIR Implementation Profiles (FIP) in the FAIR Digital Object (FDO) space. FIPs impact the FDO development in 2 pr...
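      A rough sketch of the kind of SPARQL query that could produce such a FIP matrix from harvested nanopublications; the file name and the predicates (ex:community, ex:declaresFER) are invented stand-ins, not the actual FIP ontology terms:

        from rdflib import Dataset

        # default_union=True lets the query see all named graphs at once.
        ds = Dataset(default_union=True)
        ds.parse("fip-nanopublications.trig", format="trig")  # hypothetical dump

        # Cross-tabulate communities against the FAIR Enabling Resources they
        # declare; ex:community and ex:declaresFER are invented predicates.
        query = """
        PREFIX ex: <https://example.org/fip#>
        SELECT ?community ?fer (COUNT(*) AS ?uses)
        WHERE {
            ?fip ex:community ?community ;
                 ex:declaresFER ?fer .
        }
        GROUP BY ?community ?fer
        """
        for row in ds.query(query):
            print(row.community, row.fer, row.uses)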
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • A FAIR Digital Object Lab Software Stack

    • Abstract: Research Ideas and Outcomes 8: e94408
      DOI : 10.3897/rio.8.e94408
      Authors : Andreas Pfeil, Thomas Jejkal, Sabrine Chelbi, Nicolas Blumenröhr : Preprocessing data for research, like finding, accessing, unifying or converting it, consumes a large share of research time (Wittenburg and Strawn 2018). The FAIR (Findability, Accessibility, Interoperability, Reusability) principles (Wilkinson et al. 2016) aim to support and facilitate the (re)use of data, and will contribute to alleviating this problem. A FAIR Digital Object (FAIR DO) captures research data resources of all kinds (raw data, metadata, software, ...) in order to align them with the FAIR principles. FAIR Digital Objects are expressive, machine-actionable pointers to research data (De Smedt et al. 2020). As such, each FAIR DO points to one research data object. Additionally, they may link to other FAIR DOs, explaining their relations. The FAIR Digital Object Lab (Pfeil et al. 2022) is an extendable and adjustable architecture (a software stack) for generic FAIR Digital Object tasks. It consists of a set of interacting components with services and tools for creation, validation, discovery, curation, and more. In this talk, we will present our plans for the FAIR DO Lab and explain our decisions, which are mostly based on the experience gained in previous developments. The creation and maintenance of FAIR DOs is not trivial, as their persistent identifiers (PIDs) contain typed record information. When creating or maintaining PID records of FAIR DOs, the required information has to be validated, involving calls to a public Data Type Registry (DTR) (Lannom et al. 2015). After successful validation, the information has to be transformed into a representation of a PID service. After a FAIR DO has been registered successfully, the PID should be documented locally and disseminated. Using these PIDs as a starting point, tools may use the machine-actionability of FAIR DOs to maintain search indexes or to create collections. This enables researchers to look up PIDs by searching for record information or timestamps. We are developing a set of services supporting these use cases, which we call the FAIR DO Lab. Its goal is to be a production-ready and configurable software stack, easing the development of FAIR-DO-aware tools and services by offering at least the described use cases. We have already gained experience with its predecessor, the FAIR DO Testbed (Pfeil et al. 2021a), which was introduced at the Research Data Alliance (RDA) Virtual Plenary 17 Poster Session (Pfeil et al. 2021b). The Lab will be configurable similarly to the Testbed, as each service can be omitted or replaced to satisfy specific needs while integrating the Lab on top of existing research infrastructures. The FAIR DO Lab enables PID record management and validation using the Typed PID Maker (Pfeil and Jejkal 2021), following the RDA PID Information Types (PIT) Working Group Recommendations (Weigel et al. 2015), and an external Data Type Registry (DTR), following the RDA Data Type Registry Working Group Recommendations (Lannom et al. 2015). The DTR stores profiles and types, enabling typed, machine-actionable PID records. The Typed PID Maker uses this information for the validation of PID records, and stores and disseminates PIDs after their creation. All created or modified PIDs are communicated to a message broker. This way, other services can be notified about such activities. Our first service making use of this will be an advanced indexing service. It will ingest the PIDs and their record information into a search index, but will also try to extract information from the bit-sequence of the digital object itself. In a second step, we are considering the automated creation of collections utilizing our production-ready Collection Registry (Chelbi and Jejkal 2020), which the Testbed already includes. This will require a set of rules and a process to use those rules in order to place new PIDs in the correct collection. The Collection Registry is an implementation of the Collection API specification (Weigel et al. 2017), which was published by the corresponding RDA Research Data Collections Working Group. On the conceptual side, we hope to gain more insight into the required structure of PID records. There are ongoing discussions about this structure and the degree to which standardization is required. Major discussion points are the concepts of Digital Object Types (Lannom et al. 2015) and Kernel Information Profiles (Weigel et al. 2018). Working on the Lab and its predecessor, we recognized that there are large gaps regarding the structure of FAIR Digital Objects and the roles of the object types and profiles. To bring FAIR DOs into reality, research software will need to use them. But as FAIR DOs point to diverse kinds of research data, the software needs to make decisions: to what extent can the software use a specific FAIR DO? We observed that too much flexibility makes automated decisions harder. Our suggestion is therefore to consider FAIR DOs less from the infrastructure point of view and more from the machine's point of view, to improve machine-actionability. We expect the development process to yield insights into feasibility, easing the development of further FAIR-aware tools for research, particularly given that specialized tools already exist and are in use; it will not be feasible to write every tool from scratch. On the practical side, the Lab will have a stronger focus on interactive tools with user interfaces, in order to provide an easy-to-use Lab for research. We consider our current work on granular base services for research data management to be solid ground for such developments. These tools cannot, of course, replace specialized tools, but will make the generic services in the Lab easy to use. We still expect that specialized tools will benefit from the integration of such se...
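      A minimal sketch of the record-validation idea described above: a PID record as key-value pairs checked against a Kernel Information Profile; the attribute names and the profile itself are invented for illustration and do not reproduce the Typed PID Maker's actual interface:

        # Hypothetical Kernel Information Profile: attribute name -> required?
        PROFILE = {
            "digitalObjectType": True,
            "digitalObjectLocation": True,
            "checksum": True,
            "dateCreated": False,
        }

        def validate_record(record: dict) -> list:
            """Return a list of validation errors for a PID record (empty if valid)."""
            errors = [f"missing required attribute: {key}"
                      for key, required in PROFILE.items()
                      if required and key not in record]
            errors += [f"unknown attribute: {key}"
                       for key in record if key not in PROFILE]
            return errors

        record = {
            "digitalObjectType": "https://example.org/types/imageDataset",  # invented
            "digitalObjectLocation": "https://example.org/data/42",
            "checksum": "sha256:0123abcd",
        }
        print(validate_record(record) or "record is valid")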
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Biospecimens in FDO world

    • Abstract: Research Ideas and Outcomes 8: e94544
      DOI : 10.3897/rio.8.e94544
      Authors : Sara El-Gebali, Rory Macneil, Rorie Edmunds, Parul Tewatia, Jens Klump : With the advent of technological advances in research settings, scientific collections including sample material have become on par with big data. Consequently, there is a widespread need to highlight and recognise the inherent value of samples, coupled with efforts to unlock sample potential as a resource for new scientific discovery. Samples with informative metadata can be more easily discovered, shared and reused, allowing reanalysis of associated datasets, avoiding duplicate efforts, and enabling meta-analyses that yield considerably enhanced insight. Metadata provides the framework for a consistent, systematic and standardized collection of sample information, enabling users to identify the availability of research output from the samples and its relevance to their intended use, and providing a way to conveniently identify sample material as well as access provenance information related to the physical samples. Researchers need this essential information to aid their decision-making on the quality, usability and accessibility of the samples and associated datasets. We propose to explore the practical implementation of FAIR Digital Objects (FDOs) for biological life science physical samples, and practically how to create an FDO framework centred on biospecimen samples, linked datasets, sample information and PIDs (Persistent Identifiers) (Klump et al. 2021). This effort is highly relevant to enhancing the portability of sample information between multiple repositories and other kinds of resources (e.g. e-infrastructures). In this session we would like to present our current work in order to mobilize the community to define the FAIR Digital Object Architecture for biospecimens in the life sciences, including all infrastructure components, e.g. metadata, PIDs and their integration with technical solutions. To that end, in our community of practice we aim to do the following.
      What:
      Identify the minimum set of attributes required for describing biospecimens in the biological life sciences (Minimal Information About a Biological Sample, MIABS), with ontological mapping for semantic unambiguity and machine actionability.
      Identify the attributes required for registering PIDs for biospecimens and how that will operate in an FDO ecosystem. This will pave the way for a framework coupling the descriptive metadata to the digital object in a FAIR and comprehensive manner.
      How:
      Define a semantic FDO model for biospecimens.
      Define the role of biospecimen PID registration information and kernel attributes, and how that translates into machine actionability and programmatic decisions.
      Define the implementation specifics for integrating biospecimen FDOs with operational infrastructure, e.g. e-infrastructures, repositories and machines.
      Relevant technologies include RO-Crate, persistent identifiers, and metadata schemas. The recent partnership between IGSN and DataCite described below is a catalyst in this call to action to the FDO community to build a Community of Practice (CoP) specifically focused on biospecimen samples.
      Community of practice: IGSN e.V. announced a partnership with DataCite, in which DataCite’s registration services and supporting technology for Digital Object Identifiers (another type of PID) are now being leveraged to register IGSN IDs, and thus ensure the ongoing sustainability of the IGSN ID infrastructure. Importantly, the two organizations are also focusing the community’s efforts on advocacy of PIDs for physical samples and expanding the global sample ecosystem. Assisted by the DataCite Samples Community Manager, the IGSN e.V. is establishing working groups (Communities of Practice) within different research domains to support the development and promotion of standardized methods for identifying, citing, and locating physical samples. In particular, the partnership wishes to work with the biosamples community to elaborate the necessary information (metadata) such that those within the community have a full understanding of a physical sample when its descriptive webpage is accessed via its PID. HTML XML PDF
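      As a hedged illustration of pairing a physical-sample PID with descriptive metadata, the sketch below composes a DataCite-style metadata payload for a hypothetical biospecimen; the field names approximate the DataCite metadata schema, and all values are invented:

        import json

        # Field names approximate the DataCite metadata schema; values invented.
        sample_metadata = {
            "identifier": {"identifier": "10.12345/example-igsn-1",
                           "identifierType": "DOI"},
            "creators": [{"name": "Example Lab"}],
            "titles": [{"title": "Liver biopsy specimen, study XYZ"}],
            "publisher": "Example Biobank",
            "publicationYear": "2022",
            "resourceType": {"resourceTypeGeneral": "PhysicalObject",
                             "resourceType": "Biospecimen"},
        }
        print(json.dumps(sample_metadata, indent=2))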
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Towards FAIR Data Access

    • Abstract: Research Ideas and Outcomes 8: e94386
      DOI : 10.3897/rio.8.e94386
      Authors : Daan Broeder, Willem Elbers, Michal Gawor, Cesare Concordia, Nicolas Larrousse, Dieter Van Uytvanck : Background: In the past decade, many different national, EU and global projects have been successful in raising awareness about Open Science and the importance of making data findable and accessible, as stated in the FAIR principles (Wilkinson et al. 2016). In this respect, there have been many advances with respect to options for discovering data. A multitude of either thematic or general catalogues provide faceted browsing interfaces for humans and Application Programming Interfaces (APIs) for use by machines; similarly, data citations in publications offer references to resources hosted by repositories. However, using such catalogues and data citations, researchers are not guaranteed to obtain access to the data itself. Mostly, the resource link in the catalogue (and also in the metadata) or citation is a “landing page”: a description of the resource meant for human consumption. The landing page may contain instructions on how to access or download the resource itself, but it is usually difficult for machines to parse.
      FAIR data access: Thus, the approach sketched above does not meet the requirements of scenarios where applications need assured and quick access to data. Also, the FAIR principles interpretation from GO FAIR states*1 that these “emphasise machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with none or minimal human intervention) because humans increasingly rely on computational support to deal with data as a result of the increase in volume, complexity, and creation speed of data.” The requirement to provide a Persistent Identifier (PID) for a resource*2 is mostly interpreted as meaning a PID for the resource’s metadata or landing page only. Note that we ignore the need for user authentication and authorization prior to accessing data; here we only consider data that is ‘freely’ accessible. To improve the situation with respect to machine data accessibility, a number of technologies and approaches that have been discussed in the CLARIN and Social Sciences and Humanities (SSH) infrastructure domain can be useful. We present some and comment on their suitability.
      Signposting: Signposting*3 is a technology proposed by van de Sompel (Sompel and Nelson 2015) to expose relevant technical and bibliographical attributes of a resource URI. It is well described and uses the HTTP protocol to provide additional information via HTTP Link headers*4. Alternatively, for HTML-type resources, the information may also be provided in HTML Link elements. In the CLARIN community the signposting concept was accepted, but its proposed implementation deviated from van de Sompel's and made it less dependent on the HTTP protocol (Arnold et al. 2021). On the downside, however, the signposting information is embedded in the CLARIN-specific Component Metadata (CMDI) (Broeder et al. 2012), which makes it CLARIN-specific, or at least requires clients to have specific knowledge about CMDI.
      CLARIN Digital Object Gateway (DOG): One approach currently being worked on for the CLARIN research infrastructure is the creation of a DOG library*5 and (later) a service that provides a proxy gateway from the resource PID to the actual data. DOG uses implicit knowledge about the different repository solutions used by the CLARIN B-type centres*6 and some repositories outside the CLARIN infrastructure. DOG works in two steps: first, obtaining metadata from the resource PID and, second, extracting resource links from the metadata. Each of the repositories registered within DOG has a minimal configuration specifying how to parse fields of interest from the resource's metadata. For B-type CLARIN centres, DOG uses content negotiation as the primary way of obtaining the metadata in CMDI format. For repositories outside the CLARIN infrastructure, DOG primarily relies on the API provided by the repository in order to access metadata and data resources. The DOG solution does have scalability problems, but within the limited domain of CLARIN centres it can offer a solution until a better one becomes available.
      Limited PID kernel information: The (limited) PID kernel information approach assumes that for every Digital Object (DO) (Berg-Cross et al. 2015) and its metadata a Handle-type PID (CNRI 2020) is issued, and that the Handle information record can be used to store and associate additional important information with the (meta)data PID using Handle value types, such as a checksum and references to the data or metadata. This is a simplification of the architecture proposed in work done in the RDA context: the PID Kernel Information recommendations (Weigel et al. 2018). Consistent use of Handle information records could solve the data access problem but, just as for the signposting strategy, it requires strong discipline to maintain the additional information source. Examples from smaller projects and repositories exist that do manage this information in the Handle record, e.g. the DARIAH-DE repository*7.
      FAIR Digital Objects (FDO): FDOs*8 attempt to overcome the data management challenges posed by the heterogeneity and complexity of data using a combination of abstraction, virtualization and encapsulation (Schwardmann 2020). In practice, in the context of our data access problem, the FDO solution can be seen as both a generalization and an upgrade of the PID kernel information approach. The key characteristics here are the (conceptual) encapsulation of data objects with data structure and services that allow aware applications to recognize the data object's metadata and bitstream format, and process it as intended by the programmer. Eligible data processing services, either general ones from communities,...
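      A minimal sketch of consuming Signposting information as described above, assuming a hypothetical landing page that exposes HTTP Link headers; the URL and the link relations any given repository actually offers are assumptions:

        import requests

        landing_page = "https://example.org/record/42"  # hypothetical landing page

        response = requests.head(landing_page, allow_redirects=True, timeout=30)
        response.raise_for_status()

        # requests parses the HTTP Link header into response.links, keyed by the
        # link relation type (Signposting uses e.g. "describedby" and "item").
        metadata_link = response.links.get("describedby", {}).get("url")
        content_link = response.links.get("item", {}).get("url")
        print("metadata:", metadata_link)
        print("content: ", content_link)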
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • The Comparative Anatomy of Nanopublications and FAIR Digital Objects

    • Abstract: Research Ideas and Outcomes 8: e94150
      DOI : 10.3897/rio.8.e94150
      Authors : Erik Schultes, Barbara Magagna, Tobias Kuhn, Marek Suchánek, Luiz Bonino da Silva Santos, Barend Mons : Beginning in 1995, early Internet pioneers proposed Digital Objects as encapsulations of data and metadata made accessible through persistent identifier resolution services (Kahn and Wilensky 2006). In recent years, this Digital Object Architecture has been extended to include the FAIR Guiding Principles (Wilkinson et al. 2016), resulting in the concept of a FAIR Digital Object (FDO): a minimal, uniform container making any digital resource machine-actionable. Intense effort is currently underway by a global community of experts to clarify definitions around an FDO Framework (FDOF) and to provide technical specifications regarding their potential implementation (FAIR DO group 2020, FAIR Digital Object Forum 2020, Bonino da Silva Santos 2021). Beginning in 2009, nanopublications were independently conceived (Groth et al. 2010) as a minimal, uniform container making individual semantic assertions and their associated provenance metadata machine-actionable. They represent minimal units of structured data as citable entities (Mons and Velterop 2009). A nanopublication consists of an assertion, the provenance of the assertion, and the provenance of the nanopublication (publication info). Nanopublications are implemented in and aligned with Semantic Web technologies such as RDF, OWL, and SPARQL (World Wide Web Consortium (W3C) 2015) and can be permanently and uniquely identified using resolvable Trusty URIs (Groth et al. 2021). The existing Nanopublication Server Network provides vital services orchestrating nanopublications (Kuhn et al. 2021), including identifier resolution, storage, search and access. Nanopublications can be used to expose quantitative and qualitative data, as well as hypotheses, claims, negative results, and opinions that are typically unavailable as structured data or go unpublished altogether. The first practical application of nanopublications occurred in 2014, with the publication of millions of nanopublications as part of the FANTOM5 Project (The FANTOM Consortium and the RIKEN PMI and CLST (DGT) 2014, Lizio et al. 2015). Since then, millions of real-world examples spanning diverse knowledge domains have become available on the nanopublication server network. Like nanopublications, the FDOF also posits an ultra-minimal approach to structured, self-contained, machine-readable data and metadata. An FDO consists of: the object itself (subsequently referred to here as the resource, to avoid confusion with other meanings of the term “object”); the metadata describing the resource; and a globally unique and persistent identifier with predictable resolution behaviors. These two technologies share the same vision of a data infrastructure and act as instances of Machine-Actionable Containers (MACs) that make use of minimal uniform standards to enable FAIR operations. Here, we compare the structure and computational behaviors of the existing nanopublication infrastructure to those in the proposed FAIR Digital Object Framework. Although developed independently, there are clear parallels between the vision and the approach of nanopublications and the FDOF. Each aspires to minimal standards for the encapsulation of digital information into free-standing, publishable (citable, referenceable) entities. The minimal standards involve globally unique and persistent identifiers that resolve to standardized, semantically enabled metadata descriptions that include machine-actionable paths to the resource itself. At the same time, there are also differences. The scope of nanopublications is limited to the assertional data type and, as the name suggests, nanopublications should remain small in size (limited to single assertions as individual triples or small RDF graphs). In contrast, FDOs are unlimited in their scope, accommodating digital resources of arbitrarily large size, type and complexity, so long as their type can be ontologically described. Furthermore, whereas nanopublications represent a moderately mature technology, the FDOF is a specification still under development. If it were possible to formally draw points of contact between the two approaches, then it would be possible to leverage the vast practical experience gained in the nanopublishing of assertions for the FDO community. Here, inspired by recent applications of nanopublications in the FIP Wizard tool (Schultes et al. 2020), and their extension to research claims (Kuhn 2022, McNamara 2022) and data (Schultes 2022a, Schultes 2022b), we attempt a point-by-point comparison of the specifications of nanopublications and FDOs. We find a remarkable congruence between the currently proposed FDO requirements and the existing nanopublication infrastructure, including several FDO-like qualities already embodied in the nanopublication ecosystem. HTML XML PDF
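      A hedged sketch of the nanopublication anatomy described above (assertion, provenance and publication info as named graphs), written as TriG and parsed with rdflib; all URIs are invented placeholders rather than resolvable Trusty URIs:

        from rdflib import Dataset

        # Invented URIs; real nanopublications use resolvable Trusty URIs.
        trig = """
        @prefix np: <http://www.nanopub.org/nschema#> .
        @prefix prov: <http://www.w3.org/ns/prov#> .
        @prefix ex: <https://example.org/np1#> .

        ex:head {
            ex:np a np:Nanopublication ;
                np:hasAssertion ex:assertion ;
                np:hasProvenance ex:provenance ;
                np:hasPublicationInfo ex:pubinfo .
        }
        ex:assertion { ex:geneA ex:isExpressedIn ex:liver . }
        ex:provenance { ex:assertion prov:wasDerivedFrom ex:experiment42 . }
        ex:pubinfo { ex:np prov:wasAttributedTo ex:alice . }
        """

        ds = Dataset()
        ds.parse(data=trig, format="trig")
        print(len(list(ds.quads((None, None, None, None)))), "statements parsed")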
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Incrementally building FAIR Digital Objects with Specimen Data Refinery
           workflows

    • Abstract: Research Ideas and Outcomes 8: e94349
      DOI : 10.3897/rio.8.e94349
      Authors : Oliver Woolland, Paul Brack, Stian Soiland-Reyes, Ben Scott, Laurence Livermore : Specimen Data Refinery (SDR) is a developing platform for automating the transcription of specimens from natural history collections (Hardisty et al. 2022). SDR is based on computational workflows and digital twins using FAIR Digital Objects. We show our recent experiences with building SDR using the Galaxy workflow system, combining two FDO methodologies: open digital specimens (openDS) and RO-Crate data packaging. We suggest FDO improvements for the incremental building of digital objects in computational workflows.
      SDR workflows: SDR is realised as the Galaxy workflow system (Afgan et al. 2018) with SDR tools installed. An Open Research challenge is that some tools have machine learning models with a commercial licence. This complicates publishing to the Galaxy ToolShed; however, we created Ansible scripts to install equivalent Galaxy servers, including tools and dependencies, accounts and workflows. SDR workflows are published in WorkflowHub as FDOs. We implemented the use case De novo digitization in Galaxy (Brack et al. 2022). As shown in Fig. 1, the workflow steps exchange openDS JSON (Hardisty et al. 2019) for the incremental completion of a digital specimen. Initial stages build a template openDS from a CSV with metadata and image references – subsequent analysis completes the rest of the JSON with regions of interest, text digitised from handwriting, and recognized named entities. Galaxy can visualise the outputs of each step (Fig. 2), which is important to make the FDOs understandable by domain experts and to verify the accuracy of SDR. We are adding workflows for partial stages, e.g. detection of regions (Livermore and Woolland 2022a) and hand-written text recognition (Livermore and Woolland 2022b), which we will combine with scalability testing and wider testing by project users. Additional workflows will enhance existing FDOs and use new tools such as barcode detection of museums' internal identifiers. We are now ready to publish digital specimens as FAIR Digital Objects, with registration into DiSSCo repositories, PID assignment and workflow provenance. However, even at this early stage we have identified several challenges that need to be addressed.
      FDO lessons: We highlight the De novo use case because this workflow exchanges partial FDOs – openDS objects which are not yet complete and have not yet been assigned persistent identifiers. openDS schemas are still in development; therefore, SDR uses a more flexible JSON schema where only the initial metadata (populated from the CSV) are required. Each step validates the partial FDO before passing it to the underlying command line tool. Although workflow steps exchange openDS objects, they cannot be combined in any order. For instance, named entity recognition requires digitised text in the FDO. We can consider these intermediate steps as sub-profiles of an FDO Type. Unlike hierarchical subclasses, these FDO profiles are more like duck typing. For instance, a text detection step may only require the regions key, but semantically there is no requirement for an OpenDSWithText to be a subclass of OpenDSWithRegion, as text can also be transcribed manually without regions. Similarly, we found that some steps can be executed in parallel, but this requires merging of partial FDOs. This can be achieved by combining JSON queries and JSON Schemas, but it indicates that it may be more beneficial to have FDO fragments as separate objects. Adding openDS fragment steps would, however, complicate workflows. Several of our tools process the referenced images, currently https URLs in openDS. We added a caching layer to avoid repeated image downloading, coupled with local file-path wiring in the workflow. A similar challenge occurs when accessing image data using DOIP, which, unlike HTTP, has no caching mechanisms.
      RO-Crate lessons: Galaxy is developing support for importing and exporting Workflow Run Crates, a profile of RO-Crate (Soiland-Reyes et al. 2022b) that captures the execution history of a workflow, including its definition and intermediate data (De Geest et al. 2022). SDR is adopting this support to combine openDS FDOs with workflow provenance, as envisioned by Walton et al. (2020). Our prototype de novo workflow returns results as a ZIP file of openDS objects. End-users should also get copies of the referenced images and generated visualisations, along with workflow execution metadata. We are investigating ways to embed the preliminary Galaxy workflow history before the final step, so that this result can be an enriched RO-Crate.
      Conclusions: SDR is an example of machine-assisted construction of FDOs, which highlights the need for intermediate digital objects that are not yet FDO-compliant. The passing of such “local FDOs” is beneficial not just for efficiency and visual inspection, but also to simplify workflow composition from canonical workflow building blocks. At the same time, we see that it is insufficient to pass FDOs only as JSON objects, as they also reference other data such as images, which should not need to be re-downloaded. Further work will investigate the use of RO-Crate as a wrapper for partial FDOs, but this needs to be coupled with more flexible FDO types as profiles, in order to restrict “impossible” orderings of steps that depend on particular inner FDO fragments. A distinction needs to be made between open digital specimens that are in a “draft” state and those that can be pushed to DiSSCo registries. We are experimenting with changing the SDR components into Canonical Workflow Building Blocks (Soiland-Reyes et al. 2022a) using the Common Workflow Language (Crusoe et al. 2022). This gives flexibility to scalably execute SDR workflows on different compute backends such as HPC or a local cluster, without the additional setup of Gal...
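      A minimal sketch of validating a partial openDS-like object against such a sub-profile, using the jsonschema library; the schema and the keys (regions, text) are invented stand-ins for the still-evolving openDS schemas:

        import jsonschema

        # Invented sub-profile: this step requires regions of interest, not text.
        with_regions_profile = {
            "type": "object",
            "required": ["id", "regions"],
            "properties": {
                "id": {"type": "string"},
                "regions": {"type": "array", "items": {"type": "object"}},
                "text": {"type": "string"},  # optional at this stage
            },
        }

        partial_fdo = {
            "id": "urn:example:specimen:42",  # no persistent identifier assigned yet
            "regions": [{"x": 10, "y": 20, "w": 100, "h": 50}],
        }

        # Raises jsonschema.ValidationError if the partial FDO does not conform.
        jsonschema.validate(instance=partial_fdo, schema=with_regions_profile)
        print("partial FDO conforms to the 'with regions' sub-profile")

      A parallel "with text" sub-profile could then be required by the named entity recognition step, making the ordering constraints between steps machine-checkable.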
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Software Citation Workshop Results

    • Abstract: Research Ideas and Outcomes 8: e94250
      DOI : 10.3897/rio.8.e94250
      Authors : Daina Bouquin, Daniel Chivvis : Software is a copyrightable creative work that is foundationally important to the future of scholarly research, yet software citation is not ubiquitous. Despite increasing acceptance of general software citation principles (Smith et al. 2016), challenges exist that make their implementation difficult. As a result, metadata to facilitate software citation goes unrecorded, software goes uncited (Bouquin et al. 2020), and software authors continue to be divorced from their contributions to science and human cultural heritage. If this status quo persists, uncited software will become increasingly difficult to find, access, and build upon, which will prevent software from being a “Findable, Accessible, Interoperable, and Reusable” (FAIR) Digital Object in the future. Addressing the challenges that currently impact software citation implementation requires action from the scholarly communication ecosystem and digital preservation landscape, along with cooperation from software authors and users. This situation presents an opportunity to enable a future for FAIR software through software citation by leveraging intersections between digital challenges in libraries and archives and the work of experts in other disciplines to advance theory and practice. To achieve this end, the Harvard-Smithsonian Center for Astrophysics (CfA) has partnered with the Institute of Museum and Library Services (IMLS) to bring together stakeholders and experts representing the many forms of labor and expertise required to address specific questions about software citation that have so far gone unresolved, such as: How can we enable software citation when the software is part of a larger deposit of content in an archival repository? How can we enable the adoption of metadata standards that support software citation in repositories? To address the above questions, the Software Citation Workshop (SCW) was created. Although the meeting organizers originally planned for a fully in-person meeting, the constantly shifting pandemic landscape necessitated a new approach. Instead of one large meeting, SCW '22 started with a series of three online focus groups, held from June 28th to June 30th, 2022. The goal of the focus groups was to help the meeting organizers better understand how various stakeholders view problems associated with software citation metadata availability in repositories and perceived barriers to implementing software citation metadata standards. The focus groups also helped to identify how these people think software citation problems should be prioritized, by addressing specific challenges such as:
      What is the status of software citation adoption in your domain?
      What barriers and incentives influence the availability of software citation metadata in repositories? (e.g., policies, curatorial practices, user-generated information)
      From your perspective, how should people cite software that is part of a larger deposit? (e.g., Jupyter notebooks deposited with a paper) What could we do to make software citation metadata available in these larger deposits?
      What influences the adoption of metadata standards that support software citation in repositories?
      Are curators aware of the differences between the specific software citation metadata formats? (i.e., Citation File Format (CFF) (Druskat 2022) vs. CodeMeta (Jones et al. 2020))
      Are software authors aware of the different formats and when/how to use them?
      What have your experiences been with either? Should we be encouraging one over the other?
      In August 2022, an in-person workshop will summarize the answers given by the online focus groups into an expanded and revised version of the Software Citation Implementation Challenges (Katz et al. 2019) document. In-person meeting attendees will be encouraged to brainstorm interventions to tackle and prioritize these newly defined software citation problems, and to lay out a series of mutually supporting approaches to address them. Attendees will also be given the opportunity to synthesize plans of action and other deliverables that can be shared widely. This presentation aims to promote these yet-to-be-determined plans of action by introducing attendees to the lessons learned and outputs produced by the SCW '22 meetings. Through open collaboration and feedback, this presentation will also introduce attendees to ways they can help ensure the future of software as a FAIR Digital Object. HTML XML PDF
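      As a brief illustration of one of the formats under discussion, the Citation File Format: a minimal sketch that writes a hypothetical CITATION.cff with PyYAML; the title, author and DOI values are invented:

        import yaml  # PyYAML

        # Minimal CFF 1.2.0 metadata for a hypothetical software project.
        citation = {
            "cff-version": "1.2.0",
            "message": "If you use this software, please cite it as below.",
            "title": "example-tool",
            "version": "1.0.0",
            "doi": "10.5281/zenodo.0000000",  # invented DOI
            "date-released": "2022-10-12",
            "authors": [
                {"family-names": "Doe", "given-names": "Jane"},
            ],
        }

        with open("CITATION.cff", "w") as f:
            yaml.safe_dump(citation, f, sort_keys=False)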
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • FAIR Points: From sdo:LearningResource to FAIR Digital Object

    • Abstract: Research Ideas and Outcomes 8: e94244
      DOI : 10.3897/rio.8.e94244
      Authors : Sara El-Gebali, Christopher Erdmann, Donald Winston : The FAIRPoints organization, co-founded by the authors, aims to provide a platform for conversations around realistic and pragmatic implementations of the FAIR (Findable, Accessible, Interoperable, Reusable) principles. The uniqueness of the FAIRPoints effort stems from an additional aim: to capture conversation contributions in the form of “bite-sized” objects – “points” – in a way that facilitates dynamic composition by instructors for the delivery of audience-customized training experiences. Thus, FAIRPoints aims to cultivate pragmatic learning resources to help realize the FAIR principles in practice, both through (1) inviting speakers to prime and lead discussions focused on choices/challenges regarding FAIR, and (2) amplifying downstream value potential by serializing “points” made during such events as FAIR resources. Currently, event outcomes are serialized as LearningResource-typed JSON-LD objects in the schema.org sense, i.e. sdo:LearningResource, where @prefix sdo: <https://schema.org/>, and conform to the bioschemas.org TrainingMaterial profile. However, any differences in participant perspectives must be reconciled, via git revision control, towards a single “view” of a sdo:LearningResource. This situation is at odds with other explicit aims of the FAIRPoints organization, such as including diverse voices and collecting heterogeneous input from a global perspective. Using the FAIR Digital Object (FDO) approach, a FAIRPoints sdo:LearningResource instance may be the Object to which an Identifier points, through an FDO Identifier Record, and sdo:LearningResource may be the FDO Type. Crucially, there may be a multiplicity of Metadata records pointed to by an FDO Identifier Record, and thus a formal mechanism to cultivate and publish diverse perspectives. This presentation will outline FAIRPoints’ approach to FDO implementation for learning resources and its relation to published practice. Specifically, in relation to the FAIR Digital Twins approach*1, our approach may be seen as the stewardship of a “fluid graph” of learning-resource “knowlets” with support for “qua” projection in service of, e.g., an instructor’s dynamic composition of training material for a targeted workshop. HTML XML PDF
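      A minimal sketch of such a LearningResource-typed JSON-LD object; the identifier and property values are invented, and the profile link assumes the bioschemas.org TrainingMaterial profile page:

        import json

        learning_resource = {
            "@context": ["https://schema.org",
                         {"dct": "http://purl.org/dc/terms/"}],
            "@type": "LearningResource",
            "@id": "https://example.org/fairpoints/points/1",  # hypothetical id
            "name": "Choosing a persistent identifier service",
            "description": "A bite-sized point captured during a FAIRPoints event.",
            # Declares conformance to the Bioschemas TrainingMaterial profile.
            "dct:conformsTo": "https://bioschemas.org/profiles/TrainingMaterial",
        }
        print(json.dumps(learning_resource, indent=2))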
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Emerging FAIR Ecosystem(s): A Practical Perspective

    • Abstract: Research Ideas and Outcomes 8: e94149
      DOI : 10.3897/rio.8.e94149
      Authors : Erik Schultes, Arofan Gregory, Barbara Magagna : There is broad acceptance that FAIR data (Wilkinson et al. 2016) reuse is desirable, with considerable interest and energy being devoted to its realization, but many questions remain on the part of prospective implementers. Fundamental to explaining how best to implement FAIR is an overview of how the many FAIR models (and the technologies that support them) fit together into a coherent "FAIR ecosystem". How do FAIR Implementation Profiles (FIPs) (Schultes et al. 2020) relate to FAIR Data Points (FDPs) (Bonino da Silva Santo et al. 2022), and how do these relate to the concept of FAIR Digital Objects (FDOs) (Hudson et al. 2018)? What is their relationship to other diverse FAIR resources and digital assets (metadata, datasets, repositories, and the complex web of services that run them)? How are these novel and legacy systems intended to interoperate? These questions are often encountered by those involved in the growing number of projects looking at FAIR implementation (ENVRI-FAIR 2019, CODATA 2022, SciDataCon 2022, Seoul Korea 2022b, Health-Holland Project 2021, Health-Holland Project 2022, Swiss Personalized Health Network 2022, Maxwell 2021, VODAN 2020). Such an overview could also inform the development of specifications for the different models involved in a FAIR ecosystem, such as FIPs, FDPs, and the description of digital resources (data and services) at various levels. With an agreed picture of the FAIR-reuse ecosystem, the points of contact and "hand-off" would be easier to describe and coordinate. This presentation looks at questions from FAIR implementation across various settings and proposes a view of the overall ecosystem which could be agreed upon and communicated to prospective implementers. It suggests the relationship between various artefacts being discussed in the FAIR community today (FIPs, FDPs, FDOs, and other digital assets) and looks at how these can be connected to the business layer to support the development of services and applications within the envisioned FAIR ecosystem. Notably, this includes how the Cross-Domain Interoperability Framework (CDIF) being developed through the WorldFAIR project can connect to the underlying FAIR ecosystem in practical terms (Weise et al. 2022). The presentation will address high-level considerations around the major technology components of a FAIR ecosystem, their roles within a range of common user scenarios (often involving unavoidable legacy technology), and their relationship to each other and to the set of models needed to provide practical services for FAIR interoperation at the business level. Three basic scenarios are examined in order to understand the practical requirements of different communities. The first scenario is one which has received a good deal of attention during initial efforts to implement the FAIR principles: a domain or user community without a strong pre-existing culture of data sharing and reuse wishes to become FAIR. The second scenario is one where a community with a strong existing culture of data sharing and reuse is looking to integrate its current approaches with those advocated by the FAIR community. The third scenario looks at FAIR from the perspective of the implementer of FAIR services, from an “industrial” perspective: how does FAIR provide the kind of market which is needed to support full-scale services and application development? Each of these scenarios provides a valid, but different, view of what it will take to implement FAIR in practical terms. In order to understand them, we can draw parallels with other large-scale data-sharing efforts in other communities – the Internet and the Web itself can be understood as a useful example of how vision, standards, and implementation combine to provide successful infrastructure at this scale. Indeed, the concept of Digital Objects (and now FAIR Digital Objects) has its roots in this analogy (Kahn and Wilensky 2006). Other, smaller examples also exist, which focus more specifically on the exchange of data and metadata: for example the Statistical Data and Metadata Exchange (SDMX) Initiative (SDMX community 2022) or the emerging Cross-Domain Interoperability Framework (SciDataCon 2022, Seoul Korea 2022a). Although implemented within targeted communities, these efforts exchange a wide range of data and metadata not entirely dissimilar to what is envisioned in FAIR. There is currently no single exact parallel for FAIR ecosystems, but there are examples from which we can learn in terms of making large-scale data reuse a practical reality. Core to these is a vision of all of the component pieces and how they can act in concert to provide a scalable infrastructure which will address the needs of the many different communities of users. Such a common vision may be implicitly agreed among those working on FAIR implementation today, but in the interests of clear communication it is time to document it – and in keeping with FAIR, this documentation should itself be machine-actionable. As we move toward the specification of the many components of the FAIR ecosystem, it seems only common sense to have an agreed roadmap. HTML XML PDF
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • FAIR Digital Object Application Case for Composing Machine Learning
           Training Data

    • Abstract: Research Ideas and Outcomes 8: e94113
      DOI : 10.3897/rio.8.e94113
      Authors : Nicolas Blumenröhr, Thomas Jejkal, Andreas Pfeil, Rainer Stotzka : The application case for implementing and using the FAIR Digital Object (FAIR DO) concept (Schultes and Wittenburg 2019) aims to simplify access to label information for composing Machine Learning (ML) (Awad and Khanna 2015) training data.
Data sets curated by different domain experts usually have non-identical label terms. This prevents images with similar labels from being easily assigned to the same category. Therefore, using them collectively as training data in ML comes with the cost of laborious relabeling. The data needs to be machine-interpretable and -actionable to automate this process. This is enabled by applying the FAIR DO concept. A FAIR DO is a representation of scientific data and requires at least a globally unique Persistent Identifier (PID) (Schultes and Wittenburg 2019), mandatory metadata, and a digital object type.
Storing typed information in the PID record demands a prior selection of that information. This includes mandatory metadata and a digital object type to enable machine interpretability and subsequent actionability. The information provided in the PID record refers to its PID Kernel Information Profile (PIDKIP), defined or selected by the creator of the FAIR DO. A PIDKIP is a standard that facilitates the definition and validation of the mandatory metadata attributes in the PID record. This information acts as a basis for a machine to decide if the digital object is reusable for a particular application. Part of that is also the digital object type, which enables a machine to work with the data represented by the FAIR DO. If more information is required, the data itself or other associated FAIR DOs need to be accessed through references in the PID record.
Specifying the granularity of the data representation, and the granularity of the metadata in the information record, is not a fixed task but depends on the objective. Here, the FAIR DO concept is used for representing image data sets with their label metadata. Each data set contains multiple images, which refer to the same label term. One data set associated with a particular label is represented as one FAIR DO. A type that provides information about this entity covers the packaged format of the images and the image format itself. Further information about the label term and other metadata associated with the data set is provided or accessed through references in the PID record. For the PIDKIP, the Helmholtz KIP was chosen, following the RDA Working Group recommendations on PID Kernel Information (RDA 2013). This profile includes mandatory metadata attributes, used for machine-actionable decisions required for relabeling. Information about the data labels is not directly provided in its PID record, but in another PID record of an associated image label FAIR DO. The latter represents a metadata document containing label information about the data set. Its PID record is based on the same PIDKIP, i.e. the Helmholtz KIP. Both FAIR DOs point to each other. Thus, the image label FAIR DO is accessed via the reference in the PID record of the data set FAIR DO and vice versa. Its PID record contains information about the labels, which are relevant to the relabeling task. Accessing data label information that way means the user does not have to look up each data set, analyze its content and search for its labels (Fig. 1).
The automated procedure for relabeling then looks as follows: a specialized client that can work with PIDs resolves the PID of a FAIR DO which represents an image data set, and fetches its record. Analyzing its type, the client validates the data's usability for composing a ML training data set. Furthermore, the referenced PID of the image label FAIR DO in the record is resolved the same way. By analyzing its PID record, the client identifies that it is relevant for getting information about the labels. The document represented by the image label FAIR DO is accessed via its location path provided in the PID record. To work with its content, a specialized tool is required that is compatible with its format and schema, i.e. its type. This tool identifies and analyzes the label term of the data set for mapping it to corresponding label terms of other image data sets.
This specification of FAIR DOs enables the relabeling of entire image data sets for application in ML. However, the current granularity of data representation is insufficient for other machine-based decisions and actions on single images. Another aspect in this regard is to increase the information in the PID record to enable more machine-actionable decisions. This requires reconsideration of the granularity of metadata in the PID record and needs to be balanced with the aim of fast record processing. Changing the content of the PID record also leads to deriving a new PIDKIP, or extending existing ones. Metadata tools that are applied in conjunction with the FAIR DO concept and use the label information in the documents of the metadata FAIR DOs need further specification. One requirement for their implementation is a standardized data description for the metadata document, using schemas and vocabularies.
Using the machine actionability of FAIR DOs described above enables automated relabeling of data sets. This leaves more time for the ML user to concentrate on model training and optimization. Software development of FAIR DO-specific clients and metadata mapping tools is the subject of current research. The next step is to implement such software, for carrying out the proposed concept on a large scale.
This work has been supported by the research program 'Engineering Digital Futures' of the Helmholtz Association of German Research Centers and the Helmholtz Metadata Collaboration Platform (Helmholtz-Gemeinschaft Deutscher Forschungszentren 1995). ...
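As an illustration of the client procedure described above, here is a minimal Python sketch, assuming the FAIR DOs are Handles resolvable through the public Handle proxy REST API; the record keys digitalObjectType, hasMetadata and digitalObjectLocation are illustrative stand-ins rather than the actual Helmholtz KIP attribute names:

    import requests

    HANDLE_PROXY = "https://hdl.handle.net/api/handles/"  # public Handle proxy REST API

    def fetch_pid_record(pid):
        """Resolve a PID and return its record as a {type: value} dictionary."""
        response = requests.get(HANDLE_PROXY + pid, timeout=30)
        response.raise_for_status()
        return {entry["type"]: entry["data"]["value"]
                for entry in response.json()["values"]}

    def labels_for_dataset(dataset_pid, expected_type):
        """Follow a data set FAIR DO to its image label FAIR DO and fetch the label document."""
        record = fetch_pid_record(dataset_pid)
        # machine-actionable decision: is this data set usable for the training task?
        if record.get("digitalObjectType") != expected_type:
            raise ValueError("data set is not usable for this ML training task")
        label_record = fetch_pid_record(record["hasMetadata"])  # the image label FAIR DO
        # the label document itself is fetched from the location given in its PID record
        return requests.get(label_record["digitalObjectLocation"], timeout=30).json()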
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Creating lightweight FAIR Digital Objects with RO-Crate

    • Abstract: Research Ideas and Outcomes 8: e93937
      DOI : 10.3897/rio.8.e93937
      Authors : Stian Soiland-Reyes, Peter Sefton, Leyla Jael Castro, Frederik Coppens, Daniel Garijo, Simone Leo, Marc Portier, Paul Groth : RO-Crate (Soiland-Reyes et al. 2022) is a lightweight method to package research outputs along with their metadata, based on Linked Data principles (Bizer et al. 2009) and W3C standards. RO-Crate provides a flexible mechanism for researchers to archive and publish rich data packages (or any other research outcome) by capturing their dependencies and context. However, additional measures should be taken to ensure that a crate also follows the FAIR principles (Wilkinson 2016), including consistent use of persistent identifiers, provenance, community standards, clear machine/human-readable licensing for metadata and data, and Web publication of RO-Crates.
The FAIR Digital Object (FDO) approach (De Smedt et al. 2020) gives a set of recommendations that aim to improve findability, accessibility, interoperability and reproducibility for any digital object, allowing implementation through different protocols or standards.
Here we present how we have followed the FDO recommendations and turned research outcomes into FDOs by publishing RO-Crates on the Web using HTTP, following best practices for Linked Data. We highlight challenges and advantages of the FDO approach, and reflect on what is required for an FDO profile to achieve FAIR RO-Crates.
The implementation allows for a broad range of use cases, across scientific domains. A minimal RO-Crate may be represented as a persistent URI resolving to a summary website describing the outputs of a scientific investigation (e.g. https://w3id.org/dgarijo/ro/sepln2022, with links to the used datasets along with software). One of the advantages of RO-Crate is flexibility, particularly regarding the metadata accompanying the actual research outcome. RO-Crate extends schema.org, a popular vocabulary for describing resources on the Web (Guha et al. 2016). A generic RO-Crate is not required to be typed beyond Dataset*1. In practice, RO-Crates declare conformance to particular profiles, allowing processing based on the specific needs and assumptions of a community or usage scenario. This, effectively, makes RO-Crates typed and thus machine-actionable. RO-Crate profiles serve as metadata templates, making it easier for communities to agree and build upon their own metadata needs.
RO-Crates have been combined with machine-actionable Data Management Plans (maDMPs) to automate and facilitate management of research data (Miksa et al. 2020). This mapping allows RO-Crates to be generated out of maDMPs and vice versa. The ELIXIR Software Management Plans project (Alves et al. 2021) is planning to move its questionnaire to a machine-actionable format with RO-Crate. The ELIXIR Biohackathon 2022 will explore integration of RO-Crate and the Data Stewardship Wizard (Pergl et al. 2019) with Galaxy, which can automate FDO creation that also follows data management plans.
A tailored RO-Crate profile has been defined to represent Electronic Lab Notebook (ELN) protocols bundled together with metadata and related datasets. Schröder et al. (2022) use RO-Crates to encode provenance information at different levels, including researchers, manufacturers, biological and chemical resources, activities, measurements, and resulting research data. The use of RO-Crates makes it easier to programmatically query information related to the protocols, for instance the activities, resources and equipment used to create data. 
Another example is WorkflowHub (Goble et al. 2021), which defines the Workflow RO-Crate profile (Bacall et al. 2022), imposing additional constraints such as the presence of a main workflow and a license. It also specifies which entity types and properties must be used to provide such information, implicitly defining a set of operations (e.g., get the main workflow and its language) that are valid on all complying crates. The workflow system Galaxy (The Galaxy Community 2022) retrieves such Workflow Crates using the GA4GH TRS API.
The workflow profile has been further extended (with OOP-like inheritance) in Workflow Testing RO-Crate, adding formal workflow testing components: this adds operations such as getting remote test instances and test definitions, used by the LifeMonitor service to keep track of the health status of multiple published workflows. While RO-Crates use Web technologies, they are also self-contained, moving data along with their metadata. This is a powerful construct for interoperability across FAIR repositories, but it raises some challenges with regard to mutability and persistence of crates.
To illustrate how such challenges can be handled, we detail how the WorkflowHub repository follows several FDO principles:
Workflow entries must be frozen for editing and have complete kernel metadata (title, authors, license, description) [FDOF4] before they can be assigned a persistent identifier, e.g. https://doi.org/10.48546/workflowhub.workflow.255.1 [FDOF1]
Computational workflows can be composed of multiple files used as a whole, e.g. CWL files in a GitHub repository. These are snapshotted as a single RO-Crate ZIP, indicating the main workflow. [FDOF11]
PID resolution can content-negotiate to DataCite’s PID metadata [FDOF2] or use FAIR Signposting to find an RO-Crate containing the workflow [FDOF3] and richer JSON-LD metadata resources [FDOF5,FDOF8], see Fig. 1
Metadata uses schema.org [FDOF7] following the community-developed Bioschemas ComputationalWorkflow profile [FDOF10].
Workflows are discovered using the GA4GH TRS API [FDOF5,FDOF6,FDOF11] and created/modified using CRUD operations [FDOF6]
The RO-Crate profile, effectively the FDO Type [FDOF7], is declared as https://w3id.org/workflowhub/workflow-ro-crate/1.0; the workflow language (e.g. https://w3id.org/workflowhub/workflow-ro-crate#galaxy) is defined in metadata o...
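For readers unfamiliar with the format, the following minimal sketch shows the shape of an RO-Crate's ro-crate-metadata.json descriptor as defined by the RO-Crate 1.1 specification; the file name workflow.cwl and the crate contents are placeholders, not taken from the paper:

    import json

    # Minimal RO-Crate 1.1 descriptor: a root Dataset plus one workflow file.
    crate = {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                "@id": "./",
                "@type": "Dataset",
                "name": "Example research outputs",
                "license": {"@id": "https://spdx.org/licenses/CC-BY-4.0"},
                "hasPart": [{"@id": "workflow.cwl"}],
            },
            {
                # typing beyond Dataset/File is what profiles such as Workflow RO-Crate add
                "@id": "workflow.cwl",
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": "Main workflow",
            },
        ],
    }

    with open("ro-crate-metadata.json", "w") as f:
        json.dump(crate, f, indent=2)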
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Implementation of a Photovoltaic System Model Using FAIR Digital Objects

    • Abstract: Research Ideas and Outcomes 8: e93929
      DOI : 10.3897/rio.8.e93929
      Authors : Jan Schweikert, Karl-Uwe Stucky, Mohamed Koubaa, Wolfgang Süß, Veit Hagenmeyer : The energy transition is an urgent and challenging subject in research and for society. Transitioning to renewable energies is one key element of this transition, but renewable energies have drawbacks with regard to their reliability of power supply. The weather as well as day and night periods induce a very high volatility that requires multiscale coordination of power supply and demand. Therefore, the power grid will undergo a drastic change from a purely demand-driven towards a supply- and demand-driven network. The operation of these highly coordinated future smart grids will create huge data volumes which need to be managed appropriately. In the scientific domain as well as in operations, FAIR Digital Objects (FDOs) are envisaged to be a key technology for this task. FDOs provide access to metadata, allowing applications to automatically retrieve the referenced data, interpret it semantically and support energy researchers and operational staff in handling it in a more sustainable way.
In Schweikert et al. (2022) a concept is proposed that allows different aspects of an object to be described. Arbitrary schemas and ontologies can be utilized and a coherent object graph is provided by using FDOs. As a conceptual example, the authors employed a photovoltaic system (PV system). The data of this system is divided into two kinds: static data and dynamic data. Static data is further divided into structural composition metadata, which describes the entities of the PV system and their relations to each other, and master data, which is used to describe the properties of the different entities (comparable to a technical information data sheet). Dynamic data is the measurement data which is acquired at several locations within the PV system. The structural composition metadata is described by using an in-house developed ontology called PV Ontology, based on the Web Ontology Language*1. The master data is described using the standards IEC 61850*2, GeoJSON*3 and SensorML*4 (for the master data of the sensors). The dynamic data - the structure of the measurement data and its geo-position - is also described with SensorML.
By describing every entity in the PV system using the mentioned schemas, many description objects (instances of the schemas) are created. For instance, a PV module is represented by three description objects: one written in IEC 61850 (technical data sheet), another in GeoJSON (geo-position and dimensions), and the third is the ontology instance where the PV module is represented as a node in the ontology instance graph. How can these three objects be linked with each other to make clear that the information they contain is about this PV module? Schweikert et al. (2022) uses FDOs to create these associations. The profile used in the FDOs is the Helmholtz Kernel Information Profile (KIP) (Pfeil et al. 2022), which is an extension of the Research Data Alliance (RDA)*5 KIP (Weigel et al. 2018). It introduces several new properties, inter alia the property hasMetadata. This property allows referencing further FDOs that provide metadata for the current one. Using the Helmholtz KIP, Schweikert et al. (2022) constructs the description of the PV system as follows: An FDO is created for the ontology instance (hereinafter referred to as ONT-FDO). For every entity of the system with description objects an FDO (Bridge-FDO) is created. 
The digitalObjectLocation of these FDOs references the ONT-FDO, adding the ID of the corresponding entity in the ontology instance as a fragment identifier. This bridges the border between the ontology and the FDOs and allows unambiguous referencing of ontology graph nodes. An application using the PV Ontology and a given Bridge-FDO can infer the position of the entity in the PV system. For every description object an FDO is created and linked to its entity by using the hasMetadata property on the Bridge-FDO. Alternatively, if one wants to reduce the number of FDOs, a collection (e.g. using RDA's Collection Recommendations (Weigel et al. 2017)) containing all description objects can be created and referenced through the Bridge-FDO by creating an FDO for the collection. Lastly, a final Bridge-FDO can be created pointing with its digitalObjectLocation to the PV system root node and referencing all other Bridge-FDOs with its hasMetadata property.
Schweikert et al. (2022) did not implement the discussed concept. In the present work we introduce for the first time an implementation of the concept, in which we develop an application that allows users to browse and visualize all the data (structural design, technical information of the components and measurement data) of a PV system. A user provides a persistent identifier (PID) to the application and the application starts to resolve all the data associated with the PID. Three possible cases of a given PID can occur: First, if the PID of the entire PV system is entered, the application retrieves the data for all components of the system and presents them to the user. Second, if the PID of one component is entered, the application retrieves all the data of this component and presents them to the user, and an option to browse the complete system is offered. In these two cases it is assumed that the entered PID belongs to a Bridge-FDO. In the last case, the PID of a description object is entered, and the application behaves as in the second case. The implementation verifies the concept, in which any information about the PV system can be obtained from any starting node in the object graph spanned by the FDOs. It also provides insight into the applicability in a real-world use case, uncovering possible problems and pitfalls. It also gives energy researchers and operational staff a useful tool to browse and visualize inform...
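A minimal sketch of the linking pattern described above (all identifiers are placeholders; the key names follow the Helmholtz KIP properties named in the abstract, but the record layout is simplified for illustration):

    # One Bridge-FDO for a single PV module, linking its three description objects.
    ont_fdo_pid = "prefix/ont-fdo"  # FDO of the ontology instance

    bridge_fdo = {
        "pid": "prefix/pv-module-17",
        # fragment identifier = ID of this module's node in the ontology instance graph
        "digitalObjectLocation": f"https://hdl.handle.net/{ont_fdo_pid}#PVModule_17",
        "hasMetadata": [
            "prefix/pv-module-17-iec61850",  # technical data sheet (IEC 61850)
            "prefix/pv-module-17-geojson",   # geo-position and dimensions (GeoJSON)
        ],
    }

    def collect_description_objects(start_pid, resolve):
        """Walk hasMetadata references from any starting FDO; `resolve` maps PID -> record."""
        seen, stack = set(), [start_pid]
        while stack:
            pid = stack.pop()
            if pid in seen:
                continue
            seen.add(pid)
            stack.extend(resolve(pid).get("hasMetadata", []))
        return seen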
      PubDate: Wed, 12 Oct 2022 17:30:00 +030
       
  • Using a Collection Health Index to prioritise access and activities in the
           New Zealand Arthropod Collection

    • Abstract: Research Ideas and Outcomes 8: e93841
      DOI : 10.3897/rio.8.e93841
      Authors : Darren Ward, Svetlana Malysheva : A Collection Health Index (CHI) is a useful approach to help scope new activities, prioritise curation and accelerate digitisation within taxonomic collections. We use a Collection Health Index (CHI), based on McGinley (1993), to profile the curation levels in the New Zealand Arthropod Collection for major insect groups. There are several highly curated and well-known groups (Hemiptera, Lepidoptera, ‘Other Insects’). However, three major issues were identified: 1) curation becoming increasingly outdated in sections with large numbers of, particularly older, specimens (Coleoptera, Diptera); 2) historically poorer curation, with no resident expertise or resource (Diptera); and 3) high levels of family- and genus-only material that needs further identification and a significant amount of alpha-level taxonomy (parts of Coleoptera, parts of Diptera and Hymenoptera). Assessment using the CHI is simple and fast, allows future planning and is based on common issues for collection management, such as care, accessibility, organisation and data capture. HTML XML PDF
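The abstract does not give the index formula, so the calculation below is purely illustrative: a McGinley-style assessment assigns each storage unit a curation level, and a collection profile is then the distribution of units across levels (the summary index shown is a hypothetical weighted mean, not the authors' method):

    # Hypothetical counts of storage units per curation level for one insect group.
    drawer_levels = {3: 42, 4: 18, 5: 9, 7: 25, 8: 6}

    total = sum(drawer_levels.values())
    profile = {level: round(100 * n / total, 1)
               for level, n in sorted(drawer_levels.items())}
    chi = sum(level * n for level, n in drawer_levels.items()) / total

    print("profile (% of units per level):", profile)
    print("hypothetical summary index:", round(chi, 2))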
      PubDate: Fri, 7 Oct 2022 15:16:20 +0300
       
  • Ant-plant symbioses trade-offs and its role in forest restoration projects

    • Abstract: Research Ideas and Outcomes 8: e94784
      DOI : 10.3897/rio.8.e94784
      Authors : Sze Huei Yek : Ant-plant symbioses are complex between-species interactions found only in tropical environments. Typically, in such symbioses, plants provide housing structures and food to their ant symbionts. In return, the ants protect their host plants against herbivore attack and provide additional nutrients to help with the plants' growth. These win-win interactions range from facultative to obligate mutualism. This proposal aims to test the three main mechanisms - (1) by-product benefits, (2) partner fidelity feedback and (3) partner choice - in stabilising the ant-plant mutualism. Understanding these mechanisms is crucial as they form the foundation of ant-plant distribution and growth, in other words - the health of the myrmecophyte (ant-loving) trees in the forest ecosystem. Hence, ant-plant symbioses are an ideal model system for investigating the effects of anthropogenic changes, such as deforestation and climate change, on the outcome of ant-plant mutualistic interactions. This project attempts to identify the mechanisms regulating the mutualistic interactions and, in particular, to identify the context in which such mutualistic interactions evolved and adapt to the changing environment. We hypothesise that there will be a higher diversity of obligate mutualistic ant-plant interactions in undisturbed environments compared to degraded habitats. Furthermore, we expect there are different complexities of symbioses, involving multiple partners (ants-hemipteran insects-bacteria-fungi-plants), that deepen our understanding of how such symbioses can be stabilised. Finally, deforestation combined with climate change in Southeast Asia will have a detrimental effect on ant-plant symbioses, causing breakdown of mutualistic partnerships and invasion of cheater ant species that do not confer a protective advantage to their host plants. HTML XML PDF
      PubDate: Tue, 27 Sep 2022 09:01:31 +030
       
  • Scholia for Software

    • Abstract: Research Ideas and Outcomes 8: e94771
      DOI : 10.3897/rio.8.e94771
      Authors : Lane Rasberry, Daniel Mietchen : Scholia for Software is a project to add software profiling features to Scholia, a scholarly profiling service from the Wikimedia ecosystem that is integrated with Wikipedia and Wikidata. This document is an adaptation of the funded grant proposal. We are sharing it for several reasons, including research transparency, our wish to encourage the sharing of research proposals for reuse and remixing in general, to assist others specifically in making proposals that would complement our activities, and because sharing this proposal helps us to tell the story of the project to community stakeholders.
A "scholarly profiling service" is a tool which assists the user in accessing data on some aspect of scholarship, usually in relation to research. Typical features of such services include returning the bibliography of academic publications for any given researcher, or providing a list of publications by topic. Scholia already exists as a Wikimedia platform tool built upon Wikidata and capable of serving these functions. This project will additionally add software-related data to Wikidata, develop Scholia's own code, and address some ethical issues in diversity and representation around these activities. The end result will be that Scholia will have the ability to report what software a given researcher has described using in their publications, what software is most used among authors publishing on a given topic or in a given journal, what papers describe projects which use some given software, and what software is most often co-used in projects which use a given software. HTML XML PDF
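The kind of query the finished project envisages can be sketched against the Wikidata SPARQL endpoint; this is an illustration, not Scholia's code, and the property P4510 ("describes a project that uses") is an assumption to verify before relying on it (P50 is "author"):

    import requests

    ENDPOINT = "https://query.wikidata.org/sparql"
    QUERY = """
    SELECT ?software ?softwareLabel (COUNT(?work) AS ?uses) WHERE {
      ?work wdt:P50 wd:%s ;        # works authored by the researcher
            wdt:P4510 ?software .  # software those works describe using (assumed property)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    GROUP BY ?software ?softwareLabel
    ORDER BY DESC(?uses)
    """

    def software_used_by(author_qid):
        """Return (software label, number of works) pairs for a researcher's Wikidata QID."""
        response = requests.get(ENDPOINT, timeout=60,
                                params={"query": QUERY % author_qid, "format": "json"})
        response.raise_for_status()
        return [(b["softwareLabel"]["value"], int(b["uses"]["value"]))
                for b in response.json()["results"]["bindings"]]

    # Example with a placeholder QID:
    # print(software_used_by("Q42"))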
      PubDate: Thu, 15 Sep 2022 16:16:40 +030
       
  • Applications for zoosporic parasites in aquatic systems (ParAqua)

    • Abstract:
      DOI : 10.3897/arphapreprints.e94590
      Authors : Serena Rasconi, Hans-Peter Grossart, Alena Gsell, Bastiaan Willem Ibelings, Dedmer van de Waal, Ramsy Agha, Ariola Bacu, Maija Balode, Meryem Beklioğlu, Maja Berden Zrimec, Florina Botez, Tom Butler, Slawomir Cerbin, Angela Cortina, Michael Cunliffe, Thijs Frenken, Esther Garcés, Laura Gjyli, Yonatan Golan, Tiago Guerra, Ayis Iacovides, Antonio Idà, Maiko Kagami, Veljo Kisand, Jovica Leshoski, Pini Marco, Natasa Mazalica, Takeshi Miki, Maria Iasmina Moza, Sigrid Neuhauser, Deniz Özkundakci, Kristel Panksep, Suzana Patcheva, Branka Pestoric, Maya Petrova Stoyneva, Diogo Pinto, Juergen Polle, Carmen Postolache, Joaquín Pozo Dengra, Albert Reñé, Pavel Rychtecky, Dirk S. Schmeller, Bettina Scholz, Géza Selmeczy, Télesphore Sime-Ngando, Kálmán Tapolczai, Orhideja Tasevska, Ivana Trbojevic, Blagoy Uzunov, Silke Van den Wyngaert, Ellen van Donk, Marieke Vanthoor, Elizabeta Veljanoska Sarafiloska, Susie Wood, Petr Znachor : Zoosporic parasites (i.e. fungi and fungi-like aquatic microorganisms) constitute important drivers of natural populations, causing severe host mortality. Economic impacts of parasitic diseases are notable in the microalgae biotech industry, affecting production of food ingredients, biofuels, pharma- and nutraceuticals.
While scientific research on this topic is gaining traction, with an increasing number of studies elucidating the functional role of zoosporic parasites in natural ecosystems, we are currently lacking integrated and interdisciplinary efforts for effectively detecting and controlling parasites in the microalgae industry. To fill this gap, we propose to establish an innovative, dynamic European network connecting scientists, industries and stakeholders to optimize information exchange, equalize access to resources and develop a joint research agenda. ParAqua aims to compile and make available all information on the occurrence of zoosporic parasites and their relationships with hosts, to elucidate drivers and to evaluate impacts of parasitism in natural and man-made aquatic environments. We aim to implement new tools for monitoring and prevention of infections, and to create protocols and a Decision Support Tool for detecting and controlling parasites in microalgae biotech production. Applied knowledge on zoosporic parasites can feed back from industry to ecology, and we therefore will explore whether the developed tools can be applied for monitoring lakes and reservoirs. Short-Term Scientific Missions and Training Schools will be organised specifically for early-stage scientists and managers – with a specific focus on ITC – with the aim to share and integrate both scientific and applied expertise and increase exchange between basic and applied researchers and stakeholders. HTML XML PDF
      PubDate: Fri, 9 Sep 2022 17:40:00 +0300
       
  • Joint statement on best practices for the citation of authorities of
           scientific names in taxonomy by CETAF, SPNHC and BHL

    • Abstract: Research Ideas and Outcomes 8: e94338
      DOI : 10.3897/rio.8.e94338
      Authors : Laurence Benichou, Jutta Buschbom, Mariel Campbell, Elisa Hermann, Jiří Kvaček, Patricia Mergen, Lorna Mitchell, Constance Rinaldo, Donat Agosti : This joint statement aims at encouraging all authors, publishers and editors involved in scientific publishing to give the bibliographic source of the authorities of taxonomic names. This initiative, written by members of the three communities, has been approved by the executive boards of the SPNHC (Society for the Preservation of Natural History Collections), CETAF (Consortium of European Taxonomic Facilities) and BHL (Biodiversity Heritage Library). HTML XML PDF
      PubDate: Fri, 9 Sep 2022 10:15:03 +0300
       
  • Illuminating biodiversity changes in the ‘Black Box’

    • Abstract: Research Ideas and Outcomes 8: e87143
      DOI : 10.3897/rio.8.e87143
      Authors : Helen Phillips, Erin Cameron, Nico Eisenhauer : Soil is often described as a ‘black box’, as surprisingly little is known about the high levels of biodiversity that reside there. For aboveground organisms, we have good knowledge of the distribution of species and how they might change under future human impacts. Yet despite the fact that soil organisms provide a wide variety of ecosystem functions, we have very limited knowledge of their distribution and how their diversity might change in the future. In order to create accurate and generalisable models of biodiversity, the underlying data need to be representative of the entire globe. Yet even with our recently compiled global earthworm dataset of over 11,000 sites, there are gaps across large regions. These gaps are consistent across many other datasets of both above- and belowground diversity. In order to fill these gaps, we propose a sampling network (SoilFaUNa) to create a comprehensive database of soil macrofauna diversity and soil functions (e.g. decomposition rates). Building on the existing dataset of earthworm diversity and early data from the SoilFaUNa project, we will investigate changes in earthworm diversity. From our current work, we know that both climate and land use are main drivers in predicting earthworm diversity, but both will change under future scenarios and may alter ecosystem functions. We will, using space-for-time substitution models, estimate how earthworm diversity and earthworm functions might change in the future, modelling earthworm diversity as a function of climate, land use and soil properties and predicting based on future scenarios. Previous studies of aboveground diversity changes over time using time-series analysis have found no net loss in richness, but these analyses have drawn criticism. We aim to use time-series data on earthworms to move this debate forward, by using data and statistical methods that address those criticisms, whilst increasing our knowledge of this understudied soil group. Field experiments and micro-/mesocosm experiments have been used to investigate the link between a number of soil organisms and ecosystem functions under a limited range of environmental conditions. Meta-analyses, which can produce generalisable results, can only answer questions for which there are data. Thus, we have been lacking information on the link between the entire community of soil fauna and ecosystem functions, and on the impact of changes to the soil fauna community across environmental contexts. Using data collected from the SoilFaUNa project, we will, for the first time, synthesise globally distributed, specifically-sampled data to model how changes in the community composition of soil macrofauna (due to changes in land use, climate or soil properties) impact ecosystem functions in the soil. HTML XML PDF
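The space-for-time idea at the core of the modelling plan can be sketched as follows (toy data, with a random forest standing in for whatever model the project ultimately uses; this is not the authors' code):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    # toy site covariates: [temperature, precipitation, land-use intensity, soil pH]
    X_today = rng.random((500, 4))
    richness = 3 + 2 * X_today[:, 1] - X_today[:, 2] + rng.normal(0, 0.2, 500)

    # fit the diversity~environment relationship across sites sampled today...
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_today, richness)

    # ...then substitute future covariates, e.g. a warming scenario
    X_future = X_today.copy()
    X_future[:, 0] += 0.1
    change = model.predict(X_future) - model.predict(X_today)
    print("projected mean change in richness:", change.mean().round(3))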
      PubDate: Wed, 31 Aug 2022 08:15:39 +030
       
  • Taxonomic Treatments as Open FAIR Digital Objects

    • Abstract: Research Ideas and Outcomes 8: e93709
      DOI : 10.3897/rio.8.e93709
      Authors : Donat Agosti, Alexandros Ioannidis-Pantopikos : Taxonomy is the science of charting and describing the world's biodiversity. Organisms are grouped into taxa, which are assigned a rank, building the taxonomic hierarchy. The taxa are described in taxonomic treatments, well-defined sections of scientific publications (Catapano 2019). They include a nomenclatural section and one or more sections including descriptions, material citations referring to studied specimens, or notes on ecology and behavior. In case the treatment does not describe a newly discovered taxon, previous treatments are cited in the form of treatment citations. Such a citation can refer to a previous treatment and add additional data, or it can be a statement synonymizing the taxon with another taxon. This allows building a citation network, and ultimately is a constituent part of the Catalogue of Life. Thus treatments play an important role in understanding the diversity of life on Earth by providing the scientific argument why a group of organisms is a new species, or a synonym, and the data provided will increasingly be important to analyze and compare whole genomes of individual organisms.
Treatments have been extracted by Plazi since 2008 (Agosti and Egloff 2009), and the TaxPub schema has been described by Catapano (Catapano 2019) to complement existing vocabularies, to allow annotation of legacy literature and to produce new publications including the respective annotations (Penev et al. 2010). Today, more than 750,000 treatments have been annotated by Plazi’s TreatmentBank and over 400,000 have been made FAIR digital objects in the Biodiversity Literature Repository in a collaboration of Plazi, Zenodo and Pensoft (Ioannidis-Pantopikos and Agosti 2021, Agosti et al. 2019), and are reused by the Global Biodiversity Information Facility (GBIF), Global Biotic Interactions (GloBI), and the Library System of the Swiss Institute of Bioinformatics (SIBiLS).
Each treatment on the Zenodo repository is findable through its rich metadata. The insertion of custom metadata in Zenodo provides metadata referring to domain-specific vocabularies such as Darwin Core (Ioannidis-Pantopikos and Agosti 2021). The treatments are accessible through their DataCite Digital Object Identifier (DOI), with the taxonomic treatment typed as a subtype of a publication. The data is interoperable through a machine-actionable JSON version of the treatment. A license is provided to ensure it is reusable.
The richness of data and citations within a treatment provides a stepping stone to add treatments not only to knowledge systems such as Wikidata or OpenBiodiv, but also to provide links to many of the cited objects, such as specimens through the material citations, and thus a well-curated assemblage of links. Being FAIR digital objects, treatments can be cited and should ultimately be linked to from a taxonomic name used in an identification of an organism. HTML XML PDF
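Because each treatment is deposited as a record on Zenodo, the standard Zenodo REST API is enough to find treatments programmatically; a minimal sketch, assuming the Biodiversity Literature Repository community identifier is "biosyslit" (verify before use):

    import requests

    response = requests.get(
        "https://zenodo.org/api/records",
        params={"communities": "biosyslit", "q": "Carabus", "size": 5},
        timeout=60,
    )
    response.raise_for_status()
    for hit in response.json()["hits"]["hits"]:
        print(hit["doi"], "-", hit["metadata"]["title"])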
      PubDate: Thu, 25 Aug 2022 12:09:04 +030
       
  • From Green Deal to Cultural Heritage: FAIR Digital Objects and European
           Common Data Spaces

    • Abstract: Research Ideas and Outcomes 8: e93815
      DOI : 10.3897/rio.8.e93815
      Authors : Sharif Islam, Andreas Weber, Erzsébet Tóth-Czifra : This talk outlines a vision for Common European Data Spaces, proposed by the European Commission, where FAIR principles (Wilkinson et al. 2016) and FAIR Digital Objects (FDOs) (De Smedt et al. 2020, Schwardmann 2020) can play a role in bringing together research infrastructures, data aggregators and other stakeholders working with curated objects in museums, herbaria, libraries and archives. The organisations and stakeholders involved represent a wide range of disciplines and data types including biodiversity, ecology, anthropology, archaeology, cultural history, digital storytelling, art conservation, and history of science among others (ICEDIG 2020, Ortolja-Baird and Nyhan 2021). The context and the history of the curated objects also span the natural sciences and cultural heritage domains (Nadim 2021, Weber 2021). Despite this heterogeneity, various common themes in the areas of digital curation, open access, and data usage (Tasovac et al. 2020) appear, where FDOs and Common European Data Spaces can be a useful venue for supporting the European Strategy for Data. In particular, FDOs, as an abstraction mechanism to structure and describe digital artefacts from a specific domain yet at the same time provide interoperability (De Smedt et al. 2020), can help realise the vision behind a common data space to “bring together relevant data infrastructures and governance frameworks in order to facilitate data pooling and sharing” (European Commission 2022:2).
A May 2022 report on the challenges and opportunities of European Common Data Spaces highlights the following points:
Open data holders have extensive experience in data publishing, metadata management, data quality, dataset discovery, data federation, as well as tried-and-tested standards (e.g. DCAT) and technologies. There seems to be very little knowledge/technology transfer from the open data community to the data spaces community, which is a missed opportunity. Data space implementations should not reinvent wheels that the open data community has already developed, tested, and used extensively.
Whether the data is private, shared, or open, using data from multiple sources requires interoperability at several levels, from identifiers to vocabularies. The question of which data intermediaries will act as neutral agents to ensure interoperability is underexplored in the data space context. Public administrations, building on their experience of publishing open data, are best placed to take on such roles.
Building on previous conversations facilitated by DiSSCo, DARIAH, Europeana, and Archives Portal Europe Foundation (Europeana Conference 2021, DARIAH Annual Event 2022), this talk will address the above points from the perspective of bringing together the domains of natural history museums, cultural heritage, and digital humanities. Within our collaboration, we have identified several common areas such as data discoverability, linking, and providing contextual information, which align with the goal of FDO implementation. DiSSCo and DARIAH as European infrastructures, on the one hand, and Europeana and Archives Portal as data aggregators, on the other hand, are involved in improving access to data and the researchers' capacity to work with heterogeneous data sources. 
One of the biggest shared challenges across the diverse workflows in the arts and humanities and natural history domains is that the data curation processes form a natural continuum between a range of different actors working either in cultural heritage institutions or in academia. In reality, these different layers of curation, enrichment and analysis are separated by legal, institutional, infrastructural and even funding silos (as in many countries, these institutions belong to different ministries and fall under different legislative frameworks). How can this continuum, from a scholarly point of view, be supported within a common data space and FDO framework? At the same time, implementing a common data space requires not just interoperability but also stewardship and a strategy for sharing resources (Keller 2021).
The data infrastructure and FAIR-related activities explored in our collaboration are of strategic importance to help Europe and the rest of the world deal with important societal issues. Therefore, bringing this collaboration within the context of FDO provides an ideal avenue to explore potential data, policy, and implementation matters, in order to address the two gaps outlined above for Common Data Spaces. Furthermore, the ideas expressed in the Common European Data Space for Cultural Heritage (with Europeana as the core stakeholder) and the Green Deal Data Spaces need further clarification concerning implementation planning and, most importantly, how multiple commons would work together. With DARIAH coming from the humanities and DiSSCo from the natural sciences side, such collaborations and synergy should align with the Common Data Spaces vision. The philosophy and ideas behind data and digital commons are not new (Fuchs 2020, Kashwan et al. 2021). However, it is crucial to contextualise the implementation strategy and benefits within data-intensive, multidisciplinary research and FAIR principles.
Given that curated objects are informational resources for researchers, but can also provide contexts and make visible the relationships between artefacts, people, publications, organisations, provenance, and events, it is important to think of them as much more than just records in a database. Additionally, FDOs as the digital representations of the curated objects have the potential of fostering cross-disciplinary collaborations (such as between biology, history, art or anthropology) and of providing a wider lens for understanding materiality and the role of data (Ribes 2019). As interdisciplin...
      PubDate: Thu, 25 Aug 2022 12:08:54 +030
       
  • From data pipelines to FAIR data infrastructures: A vision for the new
           horizons of bio- and geodiversity data for scientific research

    • Abstract: Research Ideas and Outcomes 8: e93816
      DOI : 10.3897/rio.8.e93816
      Authors : Sharif Islam, Claus Weiland, Wouter Addink : Natural science collections are vast repositories of bio- and geodiversity specimens. These collections, originating from natural history cabinets or expeditions, are increasingly becoming unparalleled sources of data facilitating multidisciplinary research (Meineke et al. 2018, Heberling et al. 2019, Cook et al. 2020, Thompson et al. 2021). Due to various global data mobilization and digitisation efforts (Blagoderov et al. 2012, Nelson and Ellis 2018), this digitised information about specimens includes database records along with two/three-dimensional images, sonograms, sound or video recordings, computerised tomography scans, machine-readable texts from labels on the specimens, as well as media items and notes related to the discovery sites and acquisition (Hedrick et al. 2020, Phillipson 2022).
The scope and practice of specimen gathering are also evolving. The term extended specimen was coined to refer to the specimen and associated data extending beyond the singular physical object to other physical or digital entities such as chemical composition, genetic sequence data or species data. Thus the specimen becomes an interconnected network of data resources that has incredible potential to enhance integrative and data-driven research (Webster 2017, Lendemer et al. 2019, Hardisty et al. 2022). These practices also reflect the role of data and the curatorial data life-cycle, starting from the initial material sampling process to the downstream analysis. We are also seeing growing acknowledgement that disparate and domain-specific data elements prevent interdisciplinarity, which is crucial for a holistic understanding of biodiversity and the climate crisis (Hicks et al. 2010, Craven et al. 2019, Folk and Siniscalchi 2021). Thus the data elements are not just records or rows in a database or data pipelines going from one repository to another. They have the potential to become self-describing digital artefacts that can revolutionise how machines interpret and work with specimen data.
Within this context, the Distributed System of Scientific Collections (DiSSCo), a new European Research Infrastructure for natural science collections, envisions an infrastructure based on FAIR Digital Objects (FDOs) that can unify more than 170 European natural science collections under common and FAIR-compliant (Findable, Accessible, Interoperable, Reusable) (Wilkinson et al. 2016) access and curation policies and practices. DiSSCo’s key element in achieving FAIR is the implementation of the Digital Specimen (a domain-specific FDO) that closely aligns with the extended specimen practices. The idea behind the Digital Specimen – an FDO that acts as a digital surrogate for a specific physical specimen in a natural science collection – was influenced by global conversations around the implementation of the Digital Object Architecture for biodiversity data (De Smedt et al. 2020, Islam et al. 2020, Hardisty et al. 2020). The main purpose of this talk is to explain the vision of how FAIR and FDO can create a data infrastructure that can not only take advantage of existing databases and repositories but at the same time provide support for innovative services such as AI and digital twinning. With scientific use cases in mind, the talk will highlight a few key FAIR and FDO components (persistent identifiers, metadata, ontologies) within the collaborative modelling activity of the Digital Specimen specification. 
These components provide the template for specifying how a Digital Specimen should look, so that DiSSCo can build a FAIR service ecosystem based on FDOs (Addink et al. 2021). We will also give examples of envisioned services that can help with image feature extraction and model training (Grieb et al. 2021, Hardisty et al. 2022) and digital twinning (Schultes et al. 2022). We believe this is an exciting new paradigm powered by FAIR and FDO that can help both humans and machines to accelerate the use of specimen data. From physical objects curated over hundreds of years, we have developed data pipelines, aggregators and repositories (Barberousse 2021). Now is the time to look for solutions where these data records can become FAIR Digital Objects, to enable wider access and multidisciplinary research. HTML XML PDF
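To make the idea of a Digital Specimen concrete, here is an illustrative sketch only - the attribute names are placeholders and not the DiSSCo specification:

    # A minimal digital surrogate for one physical specimen; all identifiers are fake.
    digital_specimen = {
        "id": "prefix/ds-0001",                    # persistent identifier of the FDO
        "type": "DigitalSpecimen",                 # the FDO type drives machine actionability
        "physicalSpecimenId": "MUSEUM:ent:12345",  # hypothetical catalogue number
        "institution": "https://ror.org/xxxxxxx",  # placeholder organisation identifier
        # "extended specimen" links to related physical and digital entities
        "extendedData": {
            "sequences": ["https://example.org/sequence/XX000000"],
            "images": ["https://example.org/media/ds-0001.jpg"],
        },
    }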
      PubDate: Thu, 25 Aug 2022 12:08:41 +030
       
  • FAIR Research Objects for realizing Open Science with RELIANCE EOSC
           project

    • Abstract: Research Ideas and Outcomes 8: e93940
      DOI : 10.3897/rio.8.e93940
      Authors : Anne Fouilloux, Federica Foglini, Elisa Trasatti : The H2020 Reliance project delivers a suite of innovative and interconnected services that extend the capabilities of the European Open Science Cloud (EOSC) to support the management of the research lifecycle within Earth Science communities and Copernicus users. The project has delivered three complementary technologies: Research Objects (ROs), Data Cubes and AI-based Text Mining.
RoHub is a Research Object management platform that implements these three technologies and enables researchers to collaboratively manage, share and preserve their research work.
RoHub implements the full RO model and paradigm: resources associated with a particular research work are aggregated into a single FAIR digital object, and metadata relevant for understanding and interpreting the content is represented as semantic metadata that are user- and machine-readable. The development of RoHub is co-designed and validated through multidisciplinary and thematic real-life use cases led by three different Earth Science communities: the Geohazards, Sea Monitoring and Climate Change communities. A RO commonly starts its life as an empty Live RO. ROs aggregate new objects through their whole lifecycle. This means a RO is filled incrementally by aggregating relevant resources such as workflows, datasets and documents, according to its typology, as they are created, reused or repurposed. These resources can be modified at any point in time.
We can copy and keep ROs in time through snapshots, which reflect their status at a given point in time. Snapshots can have their own identifiers (DOIs), which facilitates tracking the evolution of a research work. At some point in time, a RO can be published and archived (a so-called Archived RO) with a permanent identifier (DOI). New Live ROs can be derived from an existing Archived RO, for instance by forking it. To guide researchers, different types of Research Objects can be created:
Bibliography-centric: includes manuals, anonymous interviews, publications, multimedia (video, songs) and/or other material that support research;
Data-centric: refers to datasets which can be indexed, discovered and manipulated;
Executable: includes the code, data and computational environment along with a description of the research object and in some cases a workflow. This type of RO can be executed and is often used for scripts and/or Jupyter Notebooks;
Software-centric: also known as “Code as a Research Object”. Software-centric ROs include source code and associated documentation. They often include sample datasets for running tests;
Workflow-centric: contains workflow specifications, provenance logs generated when executing the workflows, information about the evolution of the workflow (versions) and its component elements, and additional annotations for the workflow as a whole;
Basic: can contain anything and is used when the other types do not cover the need.
To ease the understanding and the reuse of ROs, each type of RO (except the Basic RO) has a template folder structure that we recommend researchers to adopt. For instance, an executable RO has four folders:
'biblio', where researchers can aggregate documentation and scientific papers that led to the development of the software/tool that is aggregated in the tool folder;
'input', where all the input datasets required for executing the RO are aggregated;
'output', where some or all the results generated by executing the RO are aggregated;
'tool', where the executable tool is aggregated. Typically, we aggregate Jupyter Notebooks and/or executable workflows (Galaxy or Snakemake workflows).
In addition to the different types of ROs and associated template structures, researchers can select the type of resource that constitutes the main entity of a RO: for instance, a Jupyter Notebook can be selected as the main entity of an executable RO. As shown in Fig. 1, this additional metadata is then visible to everyone (and machine-readable) to ease reuse. Examples of Bibliography-centric and Data-centric Research Objects are shown in Fig. 2: the overall overview of any type of Research Object is always the same, with mandatory metadata information such as the title, description, authors & collaborators, sketch (featured plots/images), and the content of the RO (with different structures depending on the type of RO). Additional information is displayed on the right panel, such as the number of downloads, additional discovered metadata (automatically discovered from the Reliance text enrichment service), free keywords (added by end-users) and citation. The 'toolbox' and 'share' sections allow end-users to download, snapshot and archive the RO and/or share it.
Any Research Object in RoHub is a FAIR digital object that is, for instance, findable in OpenAIRE, including Live ROs.
In our presentation, we will showcase different types of ROs for the three Earth Science communities represented in Reliance to highlight how the scientists in our respective disciplines changed their working methodology towards Open Science. HTML XML PDF
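A small sketch of the executable RO template described above, scaffolding the four recommended folders (the README content is illustrative):

    from pathlib import Path

    def scaffold_executable_ro(root="my-executable-ro"):
        """Create the biblio/input/output/tool folder template of an executable RO."""
        for folder in ("biblio", "input", "output", "tool"):
            Path(root, folder).mkdir(parents=True, exist_ok=True)
        Path(root, "README.md").write_text(
            "Executable RO: put the notebook or workflow in tool/\n")

    scaffold_executable_ro()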
      PubDate: Thu, 25 Aug 2022 12:08:31 +030
       
  • A Multi-omics Data Analysis Workflow Packaged as a FAIR Digital Object

    • Abstract: Research Ideas and Outcomes 8: e94042
      DOI : 10.3897/rio.8.e94042
      Authors : Anna Niehues, Casper de Visser, Fiona Hagenbeek, Naama Karu, Alida Kindt, Purva Kulkarni, René Pool, Dorret Boomsma, Jenny van Dongen, Alain van Gool, Peter 't Hoen : In current biomedical and complex trait research, increasing numbers of large molecular profiling (omics) data sets are being generated. At the same time, many studies fail to be reproduced (Baker 2016, Kim 2018). In order to improve study reproducibility and data reuse, including integration of data sets of different types and origins, it is imperative to work with omics data that is findable, accessible, interoperable, and reusable (FAIR, Wilkinson 2016) at the source. The data analysis, integration and stewardship pillar of the Netherlands X-omics Initiative aims to facilitate multi-omics research by providing tools to create, analyze and integrate FAIR omics data. We here report a joint activity of X-omics and the Netherlands Twin Register demonstrating the FAIRification of a multi-omics data set and the development of a FAIR multi-omics data analysis workflow.
The implementation of FAIR principles (Wilkinson 2016) can improve scientific transparency and facilitate data reuse. However, Kim (2018) showed in a case study that the availability of data and code is required but not sufficient to reproduce data analyses. They highlighted the importance of interoperable and open formats, and of structured metadata. In order to increase research reproducibility on the data analysis level, additional practices such as version control, code licensing, and documentation have been proposed. These include recommendations for FAIR software by the Netherlands eScience Center and the Dutch Data Archiving and Networked Services (DANS), and FAIR principles for research software proposed by the Research Data Alliance (Chue Hong 2022). Data analysis in biomedical research usually comprises multiple steps, often resulting in complex data analysis workflows and requiring additional practices, such as containerization, to ensure transparency and reproducibility (Goble 2020, Stoudt 2021).
We apply these practices to a multi-omics data set that comprises genome-wide DNA methylation profiles, targeted metabolomics, and behavioral data of two cohorts that participated in the ACTION Biomarker Study (ACTION, Aggression in Children: Unraveling gene-environment interplay to inform Treatment and InterventiON strategies, see consortium members in Suppl. material 1) (Boomsma 2015, Bartels 2018, Hagenbeek 2020, van Dongen 2021, Hagenbeek 2022). The ACTION-NTR cohort consists of twins that are either longitudinally concordant or discordant for childhood aggression. The ACTION-Curium-LUMC cohort consists of children referred to the Dutch LUMC Curium academic center for child and youth psychiatry. With the joint analysis of multi-omics data and behavioral data, we aim to identify substructures in the ACTION-NTR cohort and link them to aggressive behavior. First, the individuals are clustered using Similarity Network Fusion (SNF, Wang 2014), and latent feature dimensions are uncovered using different unsupervised methods including Multi-Omics Factor Analysis (MOFA) (Argelaguet 2018) and Multiple Correspondence Analysis (MCA, Lê 2008, Husson 2017). In a second step, we determine correlations between omics and phenotype dimensions, and use them to explain the subgroups of individuals from the ACTION-NTR cohort. 
In order to validate the results, we project data of the ACTION-Curium-LUMC cohort onto the latent dimensions and determine whether the correlations between omics and phenotype data can be reproduced.
Integration of data across cohorts and across data types requires interoperability. We applied different practices to make the data FAIR, including conversion of files to community-standard formats, and capturing experimental metadata using the ISA (Investigation, Study, Assay) metadata framework (Johnson 2021) and ontology-based annotations. All data analysis steps, including pre-processing of different omics data types, were implemented in either R or Python and combined in a modular Nextflow (Di Tommaso 2017) workflow, where the environment for each step is provided as a Singularity (Kurtzer 2017) container. The analysis workflow is packaged in a Research Object Crate (RO-Crate) (Soiland-Reyes 2022). The RO-Crate is a FAIR digital object that contains the Nextflow workflow including ontology-based annotations of each analysis step. Since omics data is considered to be potentially personally identifiable, the packaged workflow contains a minimal synthetic data set resembling the original data structure. Finally, the code is made available on GitHub and the workflow is registered at WorkflowHub (Goble 2021). Since our Nextflow workflow is set up in a modular manner, the individual analysis steps can be reused in other workflows. We demonstrate this replicability by applying different sub-workflows to data from two different cohorts. HTML XML PDF
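The projection-based validation step can be sketched in a few lines, with PCA standing in for MOFA/MCA (a deliberate substitution; the authors use MOFA and MCA) and synthetic matrices in place of the cohort data:

    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    omics_c1, omics_c2 = rng.random((200, 50)), rng.random((80, 50))  # toy omics matrices
    pheno_c1, pheno_c2 = rng.random(200), rng.random(80)              # toy phenotype scores

    # learn latent dimensions on cohort 1, project cohort 2 onto the same dimensions
    pca = PCA(n_components=5).fit(omics_c1)
    z1, z2 = pca.transform(omics_c1), pca.transform(omics_c2)

    # do dimension-phenotype correlations reproduce in the second cohort?
    for k in range(5):
        r1, _ = pearsonr(z1[:, k], pheno_c1)
        r2, _ = pearsonr(z2[:, k], pheno_c2)
        print(f"dimension {k}: r(cohort 1) = {r1:+.2f}, r(cohort 2) = {r2:+.2f}")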
      PubDate: Thu, 25 Aug 2022 12:08:02 +030
       
  • Essential Biodiversity Variables: extracting plant phenological data from
           specimen labels using machine learning

    • Abstract: Research Ideas and Outcomes 8: e86012
      DOI : 10.3897/rio.8.e86012
      Authors : Maria Mora-Cross, Adriana Morales-Carmiol, Te Chen-Huang, María Barquero-Pérez : Essential Biodiversity Variables (EBVs) make it possible to evaluate and monitor the state of biodiversity over time at different spatial scales. Their development is led by the Group on Earth Observations Biodiversity Observation Network (GEO BON) to harmonize, consolidate and standardize biodiversity data from varied biodiversity sources. This document presents a mechanism to obtain baseline data to feed the Species Traits Variable Phenology, or other biodiversity indicators, by extracting species characters and structure names from morphological descriptions of specimens and classifying such descriptions using machine learning (ML).
A workflow that performs Named Entity Recognition (NER) and classification of morphological descriptions using ML algorithms was evaluated with excellent results. It was implemented using Python, PyTorch, scikit-learn, Pomegranate, python-crfsuite, and other libraries, applied to 106,804 herbarium records from the National Biodiversity Institute of Costa Rica (INBio). The text classification results were almost excellent (F1 score between 96% and 99%) using three traditional ML methods: Multinomial Naive Bayes (NB), Linear Support Vector Classification (SVC), and Logistic Regression (LR). Furthermore, results extracting names of species morphological structures (e.g., leaves, trichomes, flowers, petals, sepals) and character names (e.g., length, width, pigmentation patterns, and smell) using NER algorithms were competitive (F1 score between 95% and 98%) using Hidden Markov Models (HMM), Conditional Random Fields (CRFs), and Bidirectional Long Short-Term Memory Networks with CRF (BI-LSTM-CRF). HTML XML PDF
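One of the traditional classifiers named above can be assembled in a few lines with scikit-learn; the snippet below uses toy sentences, not the INBio corpus:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # toy morphological description fragments with hypothetical category labels
    texts = ["Leaves 4-7 cm long, glabrous",
             "Flowers white, fragrant",
             "Petals 5, pink, obovate"]
    labels = ["leaf", "flower", "flower"]

    # TF-IDF features + Multinomial Naive Bayes, one of the three methods evaluated
    classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
    classifier.fit(texts, labels)
    print(classifier.predict(["Leaves opposite, 3 cm wide"]))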
      PubDate: Tue, 23 Aug 2022 11:16:05 +030
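      As a hedged illustration of the text-classification step (not the authors' actual pipeline), a minimal scikit-learn sketch using one of the traditional ML methods named above; the two-example dataset and class labels are invented:

        # Minimal sketch: classifying morphological descriptions with
        # TF-IDF features and Logistic Regression.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import Pipeline

        descriptions = [
            "Leaves alternate, 4-9 cm long, margins serrate",
            "Flowers white, petals 5, sepals pubescent",
        ]
        labels = ["leaf", "flower"]  # invented structure classes

        clf = Pipeline([
            ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
            ("lr", LogisticRegression(max_iter=1000)),
        ])
        clf.fit(descriptions, labels)
        print(clf.predict(["Petals pink, 6 mm wide"]))  # expected: ['flower']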
       
  • Deliverable D4.10 Plan for Exploitation and Dissemination of SHOWCASE
           results

    • Abstract:
      DOI : 10.3897/arphapreprints.e93509
      Authors : Anna Sapundzhieva, Alexandra Korcheva, Nikol Yovcheva : Communication, dissemination and exploitation play a vital role within SHOWCASE as the main means of ensuring knowledge transfer and uptake of results during the project lifetime and after the project is concluded. The project’s strategic objectives and target groups, as well as the key messages and narratives that the project aims to communicate, serve as an orientation for the project’s actions in the relevant field. The current Plan for Exploitation and Dissemination of Results (PEDR) has been developed to define target-specific objectives and outline concrete implementation actions. The SHOWCASE PEDR is a document that aims to guide the communication and dissemination efforts to target project-relevant audiences, to convey clear, understandable, coordinated and effective messages, and to deliver project results to all interested parties within the various stakeholder groups. The plan presents the different communication and dissemination tools, structured in an implementation plan according to the different target groups and the different stages of development of the project. It also provides a list of tailored key performance indicators (KPIs) for the project’s outreach activities, which aim to provide a means to quantitatively monitor the effectiveness of dissemination activities. An indicative time schedule for implementation and updates is provided. In addition, this document identifies key project results which will be the subject of exploitation. HTML XML PDF
      PubDate: Mon, 15 Aug 2022 17:30:00 +030
       
  • Deliverable D4.8 Data Management Plan

    • Abstract:
      DOI : 10.3897/arphapreprints.e93508
      Authors : Alexandra Korcheva, Anna Sapundzhieva, Ignasi Bartomeus : The SHOWCASE DMP is structured into five sections, which aim to establish the scope and terms of use of research data within the project in accordance with the Horizon 2020 requirements for data management. The first section provides an introduction to the plan, which outlines the main data management practices that SHOWCASE will implement throughout the five-year project duration, as well as aspects of the sustainable management of results and data after the conclusion of the project period. The second section of the document provides an overview of the commitments that SHOWCASE has made in relation to handling data in a controlled and transparent way and ensuring open access to research data and results in line with the EU’s Open Research Data Pilot and FAIR data management. The third section describes the details of data management within the project, focusing on different aspects of the process - from data collection, through data processing, to storage and access provision. The section features information on personal data protection in accordance with the General Data Protection Regulation (GDPR), as well as a breakdown of the research data usage by project work package. Recommendations for relevant data management practices are also described in this section. The fourth section includes an overview of the specific data management details for the project work packages: the specific data formats and data management requirements of each work package are described. The fifth section of the DMP features concluding remarks on the data management strategy adopted by the project, and it outlines future updates and additions to the plan, which will be presented at a later stage of the project’s development. HTML XML PDF
      PubDate: Mon, 15 Aug 2022 17:30:00 +030
       
  • Deliverable D1.1 Network of EBAs established across Europe

    • Abstract:
      DOI : 10.3897/arphapreprints.e93505
      Authors : Vincent Bretagnolle, Sabrina Gaba, Amelia Hood, Simon Potts : SHOWCASE’s first step is to create a European network of local Experimental Biodiversity Areas (EBAs) that will be used to co-develop (though to varying degrees) and test successful strategies for better integrating biodiversity into farming. EBAs are located across a wide range of agro-ecosystems and represent farming systems undergoing both intensification and agricultural abandonment. Rather than creating new sites for the network, the approach in SHOWCASE was to develop EBAs mostly from existing collaborations between scientists and practitioners. The first work package of SHOWCASE, WP1, has built an experimental and knowledge exchange network in agricultural landscapes across 10 countries in Europe. Existing collaborations include LTSER platforms from eLTER RI, farmer cooperatives, farming research organisations and conservation organisations. These are well-established multi-actor networks already undertaking knowledge exchange, participatory research and innovation activities. Building on them, participatory approaches with farmers, administrators and other stakeholders are defining operational biodiversity targets at field/farm/regional level by discussing the types and extents of biodiversity indicators that should be used. WP1 is thus building our EBA network, with each EBA serving both as a local testbed for developing and implementing novel interventions and as a knowledge exchange hub. The result is a pan-European network of Experimental Biodiversity Areas in which multi-actor communities (growers, extension workers, researchers, NGOs, citizens etc.) work together to co-develop, co-manage, co-monitor and co-evaluate biodiversity innovations to enhance farm production, wildlife protection, livelihood quality and the resilience of social-ecological production systems. These multi-actor communities will i) identify and prioritise local or regional challenges of biodiversity-agricultural production trade-offs, and ii) co-formulate and test potential solutions. However, to add value at the European level and allow up-scaling and out-scaling of solutions, it is essential to have a common framework and a set of core standardised methodologies and measures used by all EBAs. EBAs are expected to be somewhat representative of Europe in terms of biogeography, farming system and agricultural intensification/abandonment; however, all EBAs are starting from different points. One main target was therefore to develop the network of EBAs on a common core approach that remains place-based, in order to provide local solutions to local challenges. A conceptual representation of an EBA is given below, illustrating how each EBA will be the fundamental base and operational platform integrating the various Tasks of WP1. HTML XML PDF
      PubDate: Mon, 15 Aug 2022 17:30:00 +030
       
  • Deliverable D4.11 EIP abstract on the literature review of Task 2.1

    • Abstract:
      DOI : 10.3897/arphapreprints.e93510
      Authors : Lena Luise Schaller, Verena Scherfranz, Kati Häfner, Fabian Klebl, Jabier Ruiz, Jochen Kantelhardt, Annette Piorr : Regulatory and incentive instruments for biodiversity management on farms (Short summary for practitioners) HTML XML PDF
      PubDate: Mon, 15 Aug 2022 17:30:00 +030
       
  • Deliverable D2.1 Overview of regulatory and incentive instruments for
           biodiversity management on farms

    • Abstract:
      DOI : 10.3897/arphapreprints.e93506
      Authors : Lena Luise Schaller, Verena Scherfranz, Kati Häfner, Fabian Klebl, Jabier Ruiz, Jochen Kantelhardt, Annette Piorr : This document represents Deliverable 2.1 “Overview of regulatory and incentive instruments for biodiversity management on farms” within WP2 „Identifying incentives to promote biodiversity and ecosystem services in agricultural landscapes“ of the EU Horizon 2020 project SHOWCASE. It reports the outcomes of WP2 Task 2.1 “Evaluating regulatory and incentive instruments for biodiversity management on farms”. In the 1st and 2nd chapters, the report gives a short introduction to the deliverable’s objectives, the tasks addressed, the report’s outline and the main focus of the literature review. Chapter 3 gives an overview of the main laws governing biodiversity protection in the European Union. The main elements of the Birds and Habitats Directives are presented, alongside other biodiversity laws and policies, with a focus on the obligations and requirements they set on agriculture in order to protect European native wildlife. Chapter 3 also covers the features of the EU’s Common Agricultural Policy that operate as a regulatory baseline for all beneficiaries of farm subsidies, i.e., cross-compliance and greening requirements under the current CAP and the new conditionality in the CAP 2023-2027. Chapter 4 gives an overview of economic and non-economic approaches potentially promoting farmers’ pro-biodiversity behaviour. Whereas economically oriented approaches imply positive or negative monetary flows – compensation payments or rewards vs. penalties – to motivate farmers to implement biodiversity-friendly management practices or to prevent them from harming biodiversity, partnerships and networks steer farmers’ behaviour through agreeing on a common goal and working towards it by sharing resources, skills and risk. With regard to the agricultural focus of SHOWCASE, Chapter 4 looks in more detail at the incentives provided by the Common Agricultural Policy (CAP) of the European Union. This covers both the current and future CAP, with an overview of how the novel eco-schemes can provide new incentives for farmers to adopt biodiversity-friendly practices. Chapter 5 looks into how the combination of regulatory frameworks and incentives operates in practice for farmers in the EU. To this end, grey literature and European Commission publications related to farming for biodiversity have been reviewed. A specific focus is set on biodiversity-friendly farming in Natura 2000 sites, as central exemplary areas of continuous and long-lasting efforts in biodiversity conservation. This is followed by a review of some of the main conclusions from very recent grey literature assessing the successes and failures of the CAP in relation to biodiversity. Chapter 6 provides an overview of approaches that have already been implemented to incentivize farmers’ pro-biodiversity behaviour. Based on grey literature, various types of approaches – i.e. focusing on plot or farm level, land tenure or the entire value chain, building on organic farming or including market-based, value-based or measure-based mechanisms – were identified within the EBA countries, further EU member states and selected western countries outside the EU.
In sum, 62 examples of pro-biodiversity schemes were included in the further analysis, representing highly divergent incentivizing mechanisms and the most important agricultural systems of the EBAs, and consequently serving as an information platform for further EBA scheme design activities. Based on the preceding chapters and their focus on result-based approaches, Chapter 7 casts a critical eye on their suitability with regard to various regulatory, policy, social and administrative contexts, also considering potential national differences. At the international level, WTO requirements such as Green Box rules are a limiting factor with regard to result-based payment modalities and thus scheme design. At the national and regional level, issues to be considered include the long-term availability of funding, guaranteeing additionality where requested, stakeholders’ and decision-makers’ attitudes towards agri-environment-climate measures in general as well as towards result-oriented approaches specifically, the availability of suitable indicators and IT systems, access to extension services, and profound know-how of farmers and public authorities regarding the interlinkages between biodiversity and farming practices. At the individual level, farmers’ trust in the institutions involved and their willingness to participate are additionally discussed as highly relevant factors affecting the suitability of result-based approaches. In Chapter 8, a structured overview of factors influencing farmers’ willingness to promote biodiversity by implementing voluntary biodiversity measures is presented. Based on the review of scientific literature, the chapter describes several determinants identified along three scales, i.e. 1) society, community and landscape, 2) farm scale, and 3) farmers’ intrinsic factors. The main influencing factors at the first scale range from the design of policies, to economic aspects, to socio-cultural norms. The second scale encompasses relevant farm characteristics, ranging from farm type and size to field conditions. Among farmers’ intrinsic factors, age, education, experience, and self-identity play an important role. However, it is important to make a distinction between farmers’ willingness to participate in schemes and their actual behaviour, because the latter is determined by ...
      PubDate: Mon, 15 Aug 2022 17:30:00 +030
       
  • Deliverable D4.9 Project logo, marketing starter pack and website
           running

    • Abstract:
      DOI : 10.3897/arphapreprints.e93511
      Authors : Anna Sapundzhieva, Alexandra Korcheva, Georgi Zhelezov : The following report presents the initial project branding and marketing products that showcase the project’s visual identity and overall corporate appearance. As a foundation for future effective communication activities, it is crucial that a sound set of working dissemination tools and materials be established within the first months of the project. A project logo, project promotional materials, an overall visual identity package, and a public website (www.showcase-project.eu) were developed in the first 4 months of the project in order to form the main tools of project public visibility and internal communication. The project logo has been communicated to and coordinated with all project partners. Dissemination materials such as the SHOWCASE brochure and poster were produced for raising awareness and engaging stakeholders at events. A project brand manual was created and circulated among project partners in order to provide a consistent visual representation of the project. A set of corporate templates was also produced and made available to the consortium partners to facilitate future dissemination and reporting activities such as letters, milestone and deliverable reports, PowerPoint presentations, etc. The project website was developed as the main dissemination channel. The longer-term impact of the project's results will be secured by maintaining the website for a minimum of 5 years after the end of the project. HTML XML PDF
      PubDate: Mon, 15 Aug 2022 17:30:00 +030
       
  • Deliverable D3.8 A review of existing citizen science approaches to
           monitoring farmland biodiversity

    • Abstract:
      DOI : 10.3897/arphapreprints.e93507
      Authors : Andrew Ruck, Erik Öckinger, Rene van der Wal, Alice Mauchline, Amelia Hood, Simon Potts, Michiel Wallis De Vries, Sabrina Gaba, Vincent Bretagnolle : This report was researched and written between April and December 2021 by researchers at the Swedish University of Agricultural Sciences (SLU), with support from partners at the University of Reading (UK), De Vlinderstichting (Netherlands), and Centre National de la Recherche Scientifique (CNRS, France). The report consists of a review of existing ‘citizen science’ approaches to monitoring biodiversity on farmland, in which we introduce a typology of five different types of approach and highlight the strengths and weaknesses of each. This forms part of the project “SHOWCASing synergies between agriculture, biodiversity and Ecosystem services to help farmers capitalising on native biodiversity” (SHOWCASE). SHOWCASE aims to encourage the widespread uptake of biodiversity-friendly farming practices across Europe, both through identifying effective incentives for farmers and through gathering further evidence of the ecosystem services provided by increased levels of biodiversity. The project receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 862480. In particular, this report fulfils Deliverable 3.8 within SHOWCASE: “A review of existing citizen science approaches to monitoring farmland biodiversity, including an overview of the different statistical approaches to handling citizen science data”. We at SLU are grateful to all SHOWCASE partners for their contributions. HTML XML PDF
      PubDate: Mon, 15 Aug 2022 17:30:00 +030
       
  • Bio-photogrammetry: digitally archiving coloured 3D morphology data of
           creatures and associated challenges

    • Abstract: Research Ideas and Outcomes 8: e86985
      DOI : 10.3897/rio.8.e86985
      Authors : Yuichi Kano : Morphological data of life forms are fundamental for documenting and understanding biodiversity. I developed a photogrammetry technique for reconstructing the outer coloured morphology of various creatures and published more than 1000 models online (https://sketchfab.com/ffishAsia-and-floraZia). By suspending a specimen with nylon fishing line(s), taking digital photos from multiple angles and analysing the photos with photogrammetry software, we can obtain a fine 3-dimensional (3D) model of a creature. I believe this approach could contribute to various fields, such as taxonomy, museology, morphology, anatomy, ecology, education, artificial intelligence, virtual reality, the metaverse and, eventually, open/citizen science. Herein, I report the idea and achievement, which I have termed “bio-photogrammetry.” HTML XML PDF
      PubDate: Mon, 8 Aug 2022 10:31:51 +0300
       
  • Situating social work within disaster governance. Assessing the agency of
           social work as a bridging agent and its professionalization in disaster
           governance

    • Abstract: Research Ideas and Outcomes 8: e81568
      DOI : 10.3897/rio.8.e81568
      Authors : Pia Hollenbach, Monika Goetzoe, Malith de Silva : The SARS-CoV-2 pandemic came as a serious shock and surprise to existing disaster governance mechanisms. Even the most advanced disaster governance systems in the world struggled to govern, respond, communicate risk and build resilience against the pandemic. The overall management – locally and globally – showed that relevant stakeholders, such as social workers who work on the front line but also within fields relevant to disaster management, were neither heard nor were their potentials and knowledge taken into consideration in sustainably setting up a disaster management and response strategy. Applying a comparative multi-sited ethnographic approach, the study aims to highlight the potential agency of social work as a bridging agent to enhance the efficiency and effectiveness of the existing disaster governance and communication architecture and to improve the resilience of communities to cope with the socio-ecological complexity of future disasters similar to SARS-CoV-2. Impact will be created in four main areas: (1) Actors in disaster governance will be educated using the new knowledge produced on contextualized disaster governance and communication strategies and their impacts on community resilience; (2) Enhanced capacity and awareness of professional social work practitioners of their role/s as bridging agents within the disaster governance architecture to enhance disaster risk communication and community resilience; (3) Improved capacity for decision and policy-making and strengthened agency of social work in the field of disaster governance through the introduction of professional development training and the ToolKit SW2BRIDGE; and (4) Improved social work education at the university level through the introduction of a post-graduate programme on the application of social work in disasters. HTML XML PDF
      PubDate: Mon, 1 Aug 2022 09:31:34 +0300
       
  • Language evolution is not limited to speech acquisition: a large study of
           language development in children with language deficits highlights the
           importance of the voluntary imagination component of language

    • Abstract: Research Ideas and Outcomes 8: e86401
      DOI : 10.3897/rio.8.e86401
      Authors : Andrey Vyshedskiy : Did the boy bite the cat or was it the other way around? When processing a sentence with several objects, one has to establish ‘who did what to whom’. When a sentence cannot be interpreted by recalling an image from memory, we rely on a special type of voluntary constructive imagination called Prefrontal synthesis (PFS). PFS is defined as the ability to juxtapose mental visuospatial objects at will. We hypothesised that PFS is of fundamental importance for language acquisition. To test this hypothesis, we designed a PFS-targeting intervention and administered it to 6,454 children with language deficiencies (ages 2 to 12 years). The results from the three-year-long study demonstrated that children who engaged with the PFS intervention showed a 2.2-fold improvement in combinatorial language comprehension compared to children with similar initial evaluations. These findings suggest that language can be improved by training PFS and expose the importance of the visuospatial component of language. This manuscript reflects on the experimental findings from the point of view of human language evolution. When used as a proxy for evolutionary language acquisition, the study results suggest a dichotomy of language evolution, with its speech component and its visuospatial component developing in parallel. The study highlights the radical idea that the evolutionary acquisition of language was driven primarily by improvements in voluntary imagination rather than by improvements in the speech apparatus. HTML XML PDF
      PubDate: Thu, 14 Jul 2022 11:59:19 +030
       
  • Anthropocenic Objects. Collecting Practices for the Age of Humans

    • Abstract: Research Ideas and Outcomes 8: e89446
      DOI : 10.3897/rio.8.e89446
      Authors : Ulrike Sturm, Elisabeth Heyne, Elisa Herrmann, Bergit Arends, Anna-Lisa Dieter, Eric Dorfman, Frank Drauschke, Nicole Heller, Rebecca Kahn, Katja Kaiser, Gerda Koch, Nicolas Kramar, Alicia Mansilla Sánchez, Franz Mauelshagen, Tahani Nadim, Richard Pell, Mareike Petersen, Katharina Schmidt-Loske, Henning Scholz, Colin Sterling, Helmuth Trischler, Sarah Wagner : The knowledge needed to tackle future environmental and societal challenges can only be generated through exchange between science and society. The conventional distinction made between natural and cultural heritage in museums and other institutions is no longer appropriate in the Anthropocene. Museums must rethink the social and cultural dimensions of existing museum collections and reinvent the organization of knowledge production for our present. In three workshops at the Museum für Naturkunde Berlin, practitioners and interdisciplinary theorists discussed the concept of “Anthropocenic objects” and considered how they create opportunities for the emergence of new collecting practices involving participatory research and open exchange between research, society, and conservation institutions. HTML XML PDF
      PubDate: Tue, 12 Jul 2022 14:31:59 +030
       
  • Showcasing synergies between agriculture, biodiversity and ecosystem
           services to help farmers capitalising on native biodiversity (SHOWCASE)

    • Abstract: Research Ideas and Outcomes 8: e90079
      DOI : 10.3897/rio.8.e90079
      Authors : David Kleijn, Simon Potts, Erik Öckinger, Felix Herzog, Lena Luise Schaller, Ignasi Bartomeus, Kati Häfner, Vincent Bretagnolle, Anna Sapundzhieva : The slow adoption by the agricultural sector of practices to promote biodiversity is thought to originate from three interrelated issues. First, we know little about which incentives effectively motivate farmers to integrate biodiversity into daily farm management. Second, few studies so far have produced evidence that biodiversity-based approaches produce benefits in terms of key variables for farmers (yield, profit). Third, there is a large communication gap between the scientists investigating biodiversity-based farming practices and the farmers who have to implement them. To overcome these barriers, SHOWCASE will review and test the effectiveness of a range of economic and societal incentives to implement biodiversity management in farming operations and examine farmer and public acceptance. The focus will be on three promising approaches: (i) result-based incentives, (ii) involvement in citizen science biodiversity monitoring and (iii) biodiversity-based business models. SHOWCASE will co-produce, together with stakeholders, solid interdisciplinary evidence for the agro-ecological and socio-economic benefits of biodiversity management in 10 contrasting farming systems across Europe. SHOWCASE will also design communication strategies that are tailor-made to farmers and other key stakeholders operating under different socio-economic and environmental conditions. SHOWCASE will develop a multi-actor network of 10 Experimental Biodiversity Areas in contrasting European farming systems that will be used for in-situ research on biodiversity incentives and evidence for benefits, as well as for knowledge exchange. This network will be used to identify and test biodiversity indicators and targets relevant to all stakeholders and to use them in a learning-by-doing approach to improve the benefits of biodiversity management on farms, both within the network and beyond. HTML XML PDF
      PubDate: Mon, 11 Jul 2022 10:16:31 +030
       
  • The Animal Landscape and Man Simulation System (ALMaSS): a history,
           design, and philosophy

    • Abstract: Research Ideas and Outcomes 8: e89919
      DOI : 10.3897/rio.8.e89919
      Authors : Christopher John Topping : This is the first article in the new topical RIO journal collection for ALMaSS. This editorial introduces ALMaSS, its history, component parts and philosophy, and forms a first access point for those interested in knowing more. It is written from my own personal perspective as the instigator and main developer of the system, effectively as the ‘father’ of ALMaSS. HTML XML PDF
      PubDate: Thu, 7 Jul 2022 15:31:36 +0300
       
  • People-Powered Research and Experiential Learning: Unravelling Hidden
           Biodiversity

    • Abstract: Research Ideas and Outcomes 8: e83853
      DOI : 10.3897/rio.8.e83853
      Authors : Melanie Pivarski, Matt von Konrat, Thomas Campbell, Ayesha Qazi-Lampert, Laura Trouille, Heaven Wade, Aimee Davis, Selma Aburahmeh, Joseph Aguilar, Cosmin Alb, Ken Alferes, Ella Barker, Karl Bitikofer, Kelli Boulware, Carla Bruton, Sicong Cao, Arturo Corona Jr., Christine Christian, Kaltra Demiri, Daniel Evans, Nkosi Evans, Connor Flavin, Jasmine Gillis, Victoria Gogol, Elizabeth Heublein, Edward Huang, Jake Hutchinson, Cyrus Jackson, Odaliz Jackson, Lauren Johnson, Michi Kirihara, Henry Kivarkis, Annette Kowalczyk, Alex Labontu, Briajia Levi, Ian Lyu, Sylvie Martin-Eberhardt, Gaby Mata, Joann Martinec, Beth McDonald, Mariola Mira, Minh Nguyen, Pansy Nguyen, Sarah Nolimal, Victoria Reese, Will Ritchie, Joannie Rodriguez, Yarency Rodriguez, Jacob Shuler, Jasmine Silvestre, Glenn Simpson, Gabriel Somarriba, Rogers Ssozi, Tomomi Suwa, Cheyenne Syring, Nidhi Thirthamattur, Keith Thompson, Caitlin Vaughn, Mario Viramontes, Chak Shing Wong, Lauren Wszolek : Globally, thousands of institutions house nearly three billion scientific collections offering unparalleled resources that contribute to both science and society. For herbaria alone - facilities housing dried plant collections - there are over 3,000 herbaria worldwide with an estimated 350 million specimens that have been collected over the past four centuries. Digitisation has greatly enhanced the use of herbarium data in scientific research, impacting diverse research areas, including biodiversity informatics, global climate change, analyses using next-generation sequencing technologies and many others. Despite the entrance of herbaria into a new era with enhanced scientific, educational and societal relevance, museum specimens remain underused. Natural history museums can enhance learning and engagement in science, particularly for school-age and undergraduate students. Here, we outline a novel approach of a natural history museum using touchscreen technology that formed part of an interactive kiosk in a temporary museum exhibit on biological specimens. We provide some preliminary analysis investigating the efficacy of the tool, based on the Zooniverse platform, in an exhibit environment to engage patrons in the collection of biological data. We conclude there is great potential in using crowd-sourced science, coupled with online technology, to unlock data and information from digital images of natural history specimens themselves. Sixty percent of the records generated by community scientists (citizen scientists) were of high enough quality to be utilised by researchers. All age groups produced valid, high-quality data that could be used by researchers, including children (10 and under), teens and adults. Significantly, the paper outlines the implementation of experiential learning through an undergraduate mathematics course that focuses on projects with actual data to gain a deep, practical knowledge of the subject, including observations, the collection of data, analysis and problem solving. We here promote an intergenerational model including children, high school students, undergraduate students, early career scientists and senior scientists, combining experiential learning, museum patrons, researchers and data derived from natural history collections. Natural history museums with their dual remit of education and collections-based research can play a significant role in the field of community engagement and people-powered research. There also remains much to investigate regarding the use of interactive displays to help learners interpret and appreciate authentic research. We conclude with a brief insight into the next phase of our ongoing people-powered research activities developed and designed by high school students using the Zooniverse platform. HTML XML PDF
      PubDate: Mon, 27 Jun 2022 17:01:37 +030
       
  • A price tag on species

    • Abstract: Research Ideas and Outcomes 8: e86741
      DOI : 10.3897/rio.8.e86741
      Authors : Urmas Kõljalg, R. Henrik Nilsson, Arnold Tobias Jansson, Allan Zirk, Kessy Abarenkov : Species have intrinsic value but also partake in a wide range of ecosystem services of major economic value to humans. These values have proved hard to quantify precisely, making it all too easy to dismiss them altogether. We outline the concept of the species stock market (SSM), a system to provide a unified basis for the valuation of all living species. The SSM amalgamates digitized information from natural history collections, occurrence data, and molecular sequence databases to quantify our knowledge of each species from scientific, societal, and economic points of view. The conceptual trading system will necessarily be very unlike that of the regular stock market, but the looming biodiversity crisis implores us to finally put an open and transparent price tag on symbiosis, deforestation, and pollution. HTML XML PDF
      PubDate: Fri, 17 Jun 2022 13:46:14 +030
       
  • Biotic Interactions as Mediators of Context-Dependent
           Biodiversity-Ecosystem Functioning Relationships

    • Abstract: Research Ideas and Outcomes 8: e85873
      DOI : 10.3897/rio.8.e85873
      Authors : Nico Eisenhauer, Paola Bonfante, François Buscot, Simone Cesarz, Carlos Guerra, Anna Heintz-Buschart, Jes Hines, Guillaume Patoine, Matthias Rillig, Bernhard Schmid, Kris Verheyen, Christian Wirth, Olga Ferlian : Biodiversity drives the maintenance and stability of ecosystem functioning as well as many of nature’s benefits to people, yet people cause substantial biodiversity change. Despite broad consensus about a positive relationship between biodiversity and ecosystem functioning (BEF), the underlying mechanisms and their context-dependencies are not well understood. This proposal, submitted to the European Research Council (ERC), aims at filling this knowledge gap by providing a novel conceptual framework for integrating biotic interactions across guilds of organisms, i.e. plants and mycorrhizal fungi, to explain the ecosystem consequences of biodiversity change. The overarching hypothesis is that ecosystem functioning (EF) increases when more tree species associate with functionally dissimilar mycorrhizal fungi. Taking a whole-ecosystem perspective, we propose to explore the role of tree-mycorrhiza interactions in driving BEF across environmental contexts and how this relates to nutrient dynamics. Given the significant role that mycorrhizae play in soil nutrient and water uptake, BEF relationships will be investigated under normal and drought conditions. Resulting ecosystem consequences will be explored by studying main energy channels and ecosystem multifunctionality using food web energy fluxes and by assessing carbon storage. Synthesising drivers of biotic interactions will allow us to understand context-dependent BEF relationships. This interdisciplinary and integrative project spans the whole gradient from local-scale process assessments to global relationships by building on unique experimental infrastructures like the MyDiv Experiment, iDiv Ecotron and the global network TreeDivNet, to link ecological mechanisms to reforestation initiatives. This innovative combination of basic scientific research with real-world interventions links trait-based community ecology, global change research and ecosystem ecology, pioneering a new generation of BEF research and represents a significant step towards implementing BEF theory for human needs. HTML XML PDF
      PubDate: Tue, 7 Jun 2022 14:31:01 +0300
       
  • Current cave monitoring practices, their variation and recommendations for
           future improvement in Europe: A synopsis from the 6th EuroSpeleo
           Protection Symposium

    • Abstract: Research Ideas and Outcomes 8: e85859
      DOI : 10.3897/rio.8.e85859
      Authors : Alexander Weigand, Szilárd-Lehel Bücs, Stanimira Deleva, Lada Lukić Bilela, Pierrette Nyssen, Kaloust Paragamian, Axel Ssymank, Hannah Weigand, Valerija Zakšek, Maja Zagmajster, Gergely Balázs, Shalva Barjadze, Katharina Bürger, William Burn, Didier Cailhol, Amélie Decrolière, Ferdinando Didonna, Azdren Doli, Tvrtko Drazina, Joerg Dreybrodt, Lana Ðud, Csaba Egri, Markus Erhard, Sašo Finžgar, Dominik Fröhlich, Grant Gartrell, Suren Gazaryan, Michel Georges, Jean-Francois Godeau, Ralf Grunewald, John Gunn, Jeff Hajenga, Peter Hofmann, Lee Knight, Hannes Köble, Nikolina Kuharic, Christian Lüthi, Cristian Munteanu, Rudjer Novak, Dainis Ozols, Matija Petkovic, Fabio Stoch, Bärbel Vogel, Ines Vukovic, Meredith Hall Weberg, Christian Zaenker, Stefan Zaenker, Ute Feit, Jean-Claude Thies : This manuscript summarizes the outcomes of the 6th EuroSpeleo Protection Symposium. Special emphasis was laid on presenting and discussing monitoring activities under the umbrella of the Habitats Directive (EU Council Directive 92/43/EEC) for habitat type 8310 "Caves not open to the public" and the Emerald Network. The discussions revealed a high level of variation in the underground monitoring activities currently conducted: there is no uniform definition of what kind of underground environments the "cave" habitat should cover, how often a specific cave has to be monitored, or what parameters should be measured to evaluate the conservation status. The variation in spatial dimensions in national definitions of caves further affects the number of catalogued caves in a country and the number of caves to be monitored. Participants are not always aware of the complete national monitoring process, nor that data sets should be freely available or easily accessible. The discussions further showed an inherent dilemma between an anticipated uniform monitoring approach with a coherent assessment methodology and, on the contrary, the uniqueness of caves and subterranean biota to be assessed – combined with profound knowledge gaps and a lack of resources. Nevertheless, some good practices for future cave monitoring activities were identified by the participants: (1) Cave monitoring should focus on bio- and geodiversity elements alike; (2) Local communities should be involved, and formal agreements envisaged; (3) Caves must be understood as windows into the subterranean realm; (4) Touristic caves should not be excluded ad hoc from regular monitoring; (5) New digital tools and open FAIR data infrastructures should be implemented; (6) Cave biomonitoring should focus on a large(r) biological diversity; and (7) DNA-based tools should be integrated. Finally, the importance of the 'forgotten' Recommendation No. 36 from the Bern Convention as a guiding legal European document was highlighted. HTML XML PDF
      PubDate: Wed, 4 May 2022 12:46:42 +0300
       
  • An idea on Smart Farming: IoT monitoring of water production from
           dihydrogen combustion

    • Abstract: Research Ideas and Outcomes 8: e82995
      DOI : 10.3897/rio.8.e82995
      Authors : Radia Belkeziz : Smart Farming is a concept that is developing rapidly and gaining momentum. The management of livestock and farm products is automated thanks to IoT technology. The large field of data at hand offers the possibility of analysis for a better understanding of issues and more efficient decision-making. The management of water consumption is one of the most relevant Smart Farming use cases. In the event of drought, the pressure on water resources becomes increasingly strong. What if we produced water, then? The idea of no longer worrying about the consequences of drought on agricultural production is an attractive one. One of the first experiments taught in a chemistry class is that the combustion of dihydrogen produces water. However, this experiment must be monitored closely because of the risk of explosion. Dihydrogen can be produced by the gasification of (agricultural) biomass. Here, the technology takes over by means of a supervising IoT system. This system will manage the overall process from biomass production, through dihydrogen production (biomass-to-hydrogen), to water production (dihydrogen-to-water). If the idea proves to be viable on a large scale, the result would be valuable in reducing water scarcity in times of drought in agricultural areas, and even in allowing energy autonomy on farms. HTML XML PDF
      PubDate: Mon, 18 Apr 2022 09:08:57 +030
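      The chemistry behind the idea is fixed by stoichiometry (2 H2 + O2 -> 2 H2O), so the attainable water yield can be sanity-checked in a few lines of Python; the values below are standard molar masses, not project data:

        # Water obtainable from complete combustion of dihydrogen.
        M_H2 = 2.016    # g/mol, molar mass of H2
        M_H2O = 18.015  # g/mol, molar mass of H2O

        def water_from_hydrogen(kg_h2: float) -> float:
            """Mass of water (kg) produced by burning kg_h2 kilograms of H2."""
            return kg_h2 * M_H2O / M_H2

        print(water_from_hydrogen(1.0))  # ~8.94 kg of water per kg of H2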
       
  • FID Civil Engineering, Architecture and Urbanism digital - A platform for
           science (BAUdigital)

    • Abstract: Research Ideas and Outcomes 8: e82563
      DOI : 10.3897/rio.8.e82563
      Authors : Susanne Arndt, Anna Beer, Ina Blümel, Carsten Elsner, Christian Hauschke, Dagmar Holste, Benjamin Kampe, Micky Lindlar, Gelareh Mofakhamsanie, Andreas Noback, Hedda Saemann, Stephan Tittel, Friedmar Voormann, Katja Wermbter, Roger Winkler : University Library Braunschweig (UB Braunschweig), University and State Library Darmstadt (ULB Darmstadt), TIB – Leibniz Information Centre for Technology and Natural Sciences and the Fraunhofer Information Centre for Planning and Building (Fraunhofer IRB) are jointly establishing a specialised information service (FID, "Fachinformationsdienst") for the disciplines of civil engineering, architecture and urbanism. The FID BAUdigital, which is funded by the German Research Foundation (DFG, "Deutsche Forschungsgemeinschaft"), will provide researchers working on digital design, planning and production methods in construction engineering with a joint information, networking and data exchange platform and support them with innovative services for documentation, archiving and publication in their data-based research. HTML XML PDF
      PubDate: Wed, 6 Apr 2022 08:10:41 +0300
       
  • Europa Biodiversity Observation Network: User and Policy Needs Assessment

    • Abstract: Research Ideas and Outcomes 8: e84480
      DOI : 10.3897/arphapreprints.e84517
      Authors : Hannah Moersberger, Juliette G. C. Martin, Jessi Junker, Ivelina Georgieva, Silke Bauer, Pedro Beja, Tom Breeze, Lluís Brotons, Helge Bruelheide, Néstor Fernández, Miguel Fernandez, Ute Jandt, Christian Langer, Anne Lyche Solheim, Joachim Maes, Francisco Moreira, Guy Pe'er, Joana Santana, Judy Shamoun-Baranes, Bruno Smets, Jose Valdez, Ian McCallum, Henrique M. Pereira, Aletta Bonn : In this report, we present the analysis of the different available biodiversity data streams at the EU and national level, both baseline biodiversity data and monitoring data. We assess how these biodiversity data inform and trigger policy action and identify the related challenges the different European countries and relevant EU agencies face and the solutions to overcome them. To do this, we consulted with more than 350 expert stakeholders from policy, research and practice. The assessment identified a fragmented biodiversity data landscape that cannot currently easily answer all relevant policy questions. Quantity and quality of biodiversity baseline datasets differ for the different countries, ranging from non-existent biodiversity monitoring due to capacity issues, to regular monitoring of ecosystem processes and state. By engaging stakeholders and experts in both member states and non-member states and from several EU bodies, we identified key challenges and ways to address these with targeted solutions towards building a joint European Biodiversity Monitoring Network. Solutions include focussing on cooperation and coordination, enhanced data standardisation and sharing, as well as the use of models and new technologies. These solutions can however only be realised with dedicated funding and capacity building, in coordination with all stakeholders in partnership. HTML XML PDF
      PubDate: Wed, 30 Mar 2022 07:57:25 +030
       
  • The Ecological Observing System of the Adriatic Sea (ECOAdS): structure
           and perspectives within the main European biodiversity and environmental
           strategies

    • Abstract: Research Ideas and Outcomes 8: e82597
      DOI : 10.3897/rio.8.e82597
      Authors : Alessandra Pugnetti, Elisabetta Manea, Ivica Vilibić, Alessandro Sarretta, Lucilla Capotondi, Bruno Cataletto, Elisabeth De Maio, Carlo Franzosini, Ivana Golec, Marco Gottardi, Jelena Kurtović Mrčelić, Hrvoje Mihanovic, Alessandro Oggioni, Grgur Pleslic, Mariangela Ravaioli, Silvia Rova, Andrea Valentini, Caterina Bergami : This Policy Brief succinctly presents the Ecological Observing System of the Adriatic Sea (ECOAdS), which aims to integrate the ecological and oceanographic dimensions within the conservation strategy of the Natura 2000 network, and proposes a way forward for its future development and maintenance. After a definition of marine ecological observatories, we describe the current structure of ECOAdS, its key components and its potential relevance to the main European strategies for biodiversity and marine observation for the next decade. Finally, we suggest some actions that could be undertaken for the future development of ECOAdS, targeting possible perspectives in different regional, macro-regional, national and European strategic contexts. This Policy Brief is one of the outcomes of the Interreg Italy-Croatia Project ECOSS (ECological Observing System in the Adriatic Sea: oceanographic observations for biodiversity; https://www.italy-croatia.eu/web/ecoss), whose main purpose was to design and carry out the first steps for the establishment of ECOAdS. HTML XML PDF
      PubDate: Fri, 25 Mar 2022 09:08:20 +020
       
  • B-GOOD: Giving Beekeeping Guidance by cOmputatiOnal-assisted Decision
           making

    • Abstract: Research Ideas and Outcomes 8: e84129
      DOI : 10.3897/rio.8.e84129
      Authors : Dirk de Graaf, Martin Bencsik, Lina De Smet, Peter Neumann, Marten Schoonman, José Paulo Sousa, Christopher Topping, Wim Verbeke, James Williams, Coby van Dooremalen : A key to healthy beekeeping is the Health Status Index (HSI), inspired by EFSA’s Healthy-B toolbox, which we will make fully operational, with the active collaboration of beekeepers, by facilitating the coordinated and harmonised flow of data from various sources and by testing and validating each component thoroughly. We envisage a step-by-step expansion of participating apiaries and will eventually cover all EU biogeographic regions. The key to sustainable beekeeping is a better understanding of its socio-economics, particularly within local value chains, its relationship with bee health and the human-ecosystem equilibrium of the beekeeping sector, and the implementation of these insights into data processing and decision making. We will fully integrate socio-economic analyses, identify viable business models tailored to different contexts for European beekeeping and determine the carrying capacity of the landscape. In close cooperation with the EU Bee Partnership, an EU-wide bee health and management data platform and affiliated project website will be created to enable the sharing of knowledge and learning between scientists and stakeholders within and outside the consortium. We will utilise and further expand the classification of the open-source IT application for digital beekeeping, BEEP, to streamline the flow of data related to beekeeping management, the beehive and its environment (landscape, agricultural practices, weather and climate) from various sources. The dynamic bee health and management data platform will allow us to identify correlative relationships among factors impacting the HSI, assess the risk of emerging pests and predators, and enable beekeepers to develop adaptive management strategies that account for local and EU-wide issues. Reinforcing and establishing, where necessary, new multi-actor networks of collaboration will engender a lasting learning and innovation system to ensure socially and ecologically resilient, sustainable beekeeping. HTML XML PDF
      PubDate: Wed, 23 Mar 2022 08:49:02 +020
       
  • SKG4EOSC - Scholarly Knowledge Graphs for EOSC: Establishing a backbone of
           knowledge graphs for FAIR Scholarly Information in EOSC

    • Abstract: Research Ideas and Outcomes 8: e83789
      DOI : 10.3897/rio.8.e83789
      Authors : Markus Stocker, Tina Heger, Artur Schweidtmann, Hanna Ćwiek-Kupczyńska, Lyubomir Penev, Milan Dojchinovski, Egon Willighagen, Maria-Esther Vidal, Houcemeddine Turki, Daniel Balliet, Ilaria Tiddi, Tobias Kuhn, Daniel Mietchen, Oliver Karras, Lars Vogt, Sebastian Hellmann, Jonathan Jeschke, Paweł Krajewski, Sören Auer : In the age of advanced information systems powering fast-paced knowledge economies that face global societal challenges, it is no longer adequate to express scholarly information - an essential resource for modern economies - primarily as article narratives in document form. Despite being a well-established tradition in scholarly communication, PDF-based text publishing is hindering scientific progress as it buries scholarly information in non-machine-readable formats. The key objective of SKG4EOSC is to improve science productivity through the development and implementation of services for text and data conversion, and for the production, curation, and re-use of FAIR scholarly information. This will be achieved by (1) establishing the Open Research Knowledge Graph (ORKG, orkg.org), a service operated by the SKG4EOSC coordinator, as a Hub for access to FAIR scholarly information in the EOSC; (2) lifting numerous and heterogeneous domain-specific research infrastructures to the EOSC through the ORKG Hub’s harmonized access facilities; and (3) leveraging the Hub to support cross-disciplinary research and policy decisions addressing societal challenges. SKG4EOSC will pilot the devised approaches and technologies in four research domains: biodiversity crisis, precision oncology, circular processes, and human cooperation. With the aim of improving machine-based scholarly information use, SKG4EOSC addresses an important current and future need of researchers. It extends the application of the FAIR data principles to scholarly communication practices, hence providing more comprehensive coverage of the entire research lifecycle. Through explicit, machine-actionable provenance links between FAIR scholarly information, primary data and contextual entities, it will substantially contribute to reproducibility, validation and trust in science. The resulting advanced machine support will catalyse new discoveries in basic research and solutions in key application areas. HTML XML PDF
      PubDate: Tue, 15 Mar 2022 14:47:20 +020
       
  • The WIO Regional Benthic Imagery Workshop: Lessons from past IIOE-2
           expeditions

    • Abstract: Research Ideas and Outcomes 8: e81563
      DOI : 10.3897/rio.8.e81563
      Authors : Tanya Haupt, Jamie Ceasar, Paris Stefanoudis, Charles von der Meden, Robyn Payne, Luther Adams, Darrell Anders, Anthony Bernard, Willem Coetzer, Wayne Florence, Liesl Janson, Ashley Johnson, Roxanne Juby, Alison Kock, Daniel Langenkämper, Ahmed Nadjim, Denham Parker, Toufiek Samaai, Laurenne Snyders, Leshia Upfold, Grant van der Heever, Lauren Williams : Originating from the Second International Indian Ocean Expedition (IIOE-2), the main goal of the Western Indian Ocean (WIO) Regional Benthic Imagery Workshop was to provide information and training on the use of various underwater imagery platforms in benthic research. To date, attempts to explore the bottom of the ocean have ranged from simple diving bells to more advanced camera systems, and the rapidly expanding field of underwater image-based research has supported marine exploration in many forms, from biodiversity surveys, spatial analyses and temporal studies to monitoring schemes. Alongside the increasing use of underwater camera systems worldwide, there is an evident need to improve training in, and access to, these techniques for students and researchers from institutes within the WIO. The week-long virtual event was conducted between 30 August and 3 September 2021 with 266 participants. Sessions consisted of lessons, practical demonstrations and interactive discussions which covered the steps required to conduct underwater imagery surveys, taking participants through elements of sampling design, data acquisition and processing, considerations for statistical analysis, and effective management of data. The session recordings from the workshop are available online as a teaching aid that has the potential to reach marine researchers both regionally and globally. It is crucial that we build on this momentum by continuing to develop and strengthen the network established through this initiative for standardised benthic-image-based research within the WIO. HTML XML PDF
      PubDate: Tue, 8 Mar 2022 14:14:19 +0200
       
  • BridgeDb and Wikidata: a powerful combination generating interoperable
           open research (BridgeDb)

    • Abstract: Research Ideas and Outcomes 8: e83031
      DOI : 10.3897/rio.8.e83031
      Authors : Egon Willighagen, Martina Kutmon, Marvin Martens, Denise Slenter : Just as humans have a unique social security number and different phone numbers from various providers, proteins and metabolites have a unique structure but different identifiers in various databases. BridgeDb is an interoperability platform that allows combining these databases by matching database-specific identifiers. These matches are called identifier mappings, and they are indispensable when combining experimental (omics) data with knowledge in reference databases. BridgeDb takes care of this interoperability between gene, protein, metabolite, and other databases, thus enabling seamless integration of many knowledge bases and wet-lab results. Because the underlying databases get updated continuously, the Open Science BridgeDb project must be updated continuously as well. HTML XML PDF
      PubDate: Mon, 7 Mar 2022 11:50:37 +0200
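      A hedged sketch of an identifier-mapping lookup against BridgeDb's public REST service; the URL pattern, the system code "L" (Entrez Gene) and the example identifier are assumptions based on the service's documented conventions, not code from the project:

        import requests

        def map_identifier(organism, system_code, identifier):
            # Query the BridgeDb webservice for all known cross-references.
            url = f"https://webservice.bridgedb.org/{organism}/xrefs/{system_code}/{identifier}"
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            # Each TSV line of the reply is "mapped_identifier<TAB>datasource".
            return [line.split("\t") for line in response.text.splitlines() if line]

        for mapped_id, source in map_identifier("Human", "L", "1234"):
            print(source, mapped_id)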
       
  • Sharing taxonomic expertise between natural history collections using
           image recognition

    • Abstract: Research Ideas and Outcomes 8: e79187
      DOI : 10.3897/rio.8.e79187
      Authors : Michael Greeff, Max Caspers, Vincent Kalkman, Luc Willemse, Barry Sunderland, Olaf Bánki, Laurens Hogeweg : Natural history collections play a vital role in biodiversity research and conservation by providing a window to the past. The usefulness of the vast amount of historical data depends on their quality, with correct taxonomic identifications being the most critical. The identification of many of the objects of natural history collections, however, is wanting, doubtful or outdated. Providing correct identifications is difficult given the sheer number of objects and the scarcity of expertise. Here we outline the construction of an ecosystem for the collaborative development and exchange of image recognition algorithms designed to support the identification of objects. Such an ecosystem will facilitate sharing taxonomic expertise among institutions by offering image datasets that are correctly identified by their in-house taxonomic experts. Together with openly accessible machine learning algorithms and easy to use workbenches, this will allow other institutes to train image recognition algorithms and thereby compensate for the lacking expertise. HTML XML PDF
      PubDate: Tue, 1 Mar 2022 10:02:21 +0200
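      As a hedged sketch of the kind of training such an ecosystem could support (not the consortium's actual workbench), an institute might fine-tune a pretrained classifier on its expert-identified specimen images using torchvision; the dataset path and folder-per-taxon layout are invented:

        import torch
        import torch.nn as nn
        from torchvision import datasets, models, transforms

        transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ])
        # Expects one folder per taxon, e.g. specimens/Apis_mellifera/img001.jpg
        data = datasets.ImageFolder("specimens", transform=transform)
        loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

        model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        model.fc = nn.Linear(model.fc.in_features, len(data.classes))  # new head

        optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()
        for images, targets in loader:  # a single epoch, for brevity
            optimizer.zero_grad()
            loss = loss_fn(model(images), targets)
            loss.backward()
            optimizer.step()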
       
  • D6.1 Analysis of needs and capacity of different audiences including
           policy makers, expert practitioners and other modellers

    • Abstract:
      DOI : 10.3897/arphapreprints.e82715
      Authors : Milica Trajković, Dajana Vujaklija, Guy Ziv, Arjan Gosal, Jiaqi Ge, Jodi Gunning, Birgit Mueller, Annabelle Williams, Elisabet Nadeu : This document has five main sections. The first, “Developing the needs assessment protocol”, explains how we approached different stakeholders in order to define and analyse their needs and capacities; the second section reports on the interviews conducted by RISE and presents the needs of policy makers; section three explains the needs of expert practitioners identified during the online workshop (14th and 15th of July 2020); section four presents the needs of the biophysical modelling community; and section five explains the needs of ABM modellers identified from recent scholarly workshops. The results of this analysis will be taken into consideration in the co-design and co-development processes. HTML XML PDF
      PubDate: Fri, 25 Feb 2022 16:30:00 +020
       
  • Deliverable D2.2 BESTMAP Conceptual Framework Design &
           Architecture 

    • Abstract:
      DOI : 10.3897/arphapreprints.e82404
      Authors : Guy Ziv, Jodi Gunning, Tomáš Václavík, Michael Beckmann, Anne Paulus, Birgit Mueller, Meike Will, Anna Cord, Stephanie Roilo, James Bullock, Paul Evans, Cristina Domingo-Marimon, Joan Masó Pau : This deliverable provides a General Framework for the BESTMAP Policy Impact Assessment Modelling (BESTMAP-PIAM) toolset. The BESTMAP-PIAM is based on the notion of (a) defining a typology of agricultural systems, with one (or more) representative case study (CS) in each major system; (b) mapping all individual farms within the case study to a Farm System Archetype (FSA) typology; (c) modelling the adoption of agri-environmental schemes (AES) within the spatially-mapped FSA population using Agent Based Models (ABM), based on literature and a survey with a sufficiently representative sample in each FSA of each CS, to elucidate the non-monetary drivers underpinning AES adoption and the relative importance of financial and non-financial/social/identity drivers; (d) linking AES adoption to a set of biophysical, ecological and socio-economic impact models; (e) upscaling the CS-level results to the EU scale; (f) linking the outputs of these models to indicators developed for the post-2020 CAP output, result and impact reports; and (g) visualizing outputs and providing a dashboard for policy makers to explore a range of policy scenarios, focusing on the cost-effectiveness of different AES. HTML XML PDF
      PubDate: Fri, 25 Feb 2022 15:30:00 +020
       
  • Use of Worksheet events in Excel to save solver objective cell value from
           each iteration

    • Abstract: Research Ideas and Outcomes 8: e79006
      DOI : 10.3897/rio.8.e79006
      Authors : Prasanth Sambaraju : Solver is a Microsoft Excel add-in program that is used to find an optimal value for a formula in the objective cell. Solver accomplishes this by maximizing, minimizing or setting the objective cell value to a specific value. The article presents the utility of built-in worksheet events in Excel VBA to save the value of the objective cell from each iteration when Solver is used for optimization. HTML XML PDF
      PubDate: Fri, 25 Feb 2022 08:43:38 +0200
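      The article's technique is specific to Excel VBA: a worksheet event such as Worksheet_Calculate fires as Solver updates the sheet, letting a macro append the objective cell's value to a log range. Since no VBA is reproduced here, the sketch below shows the analogous pattern in Python with scipy.optimize.minimize, whose per-iteration callback serves the same logging purpose; the objective function and solver method are arbitrary choices, not taken from the article.

      # Log the objective value once per solver iteration via a callback
      # (the Python analogue of the article's worksheet-event logging).
      import numpy as np
      from scipy.optimize import minimize

      def objective(x):
          # Arbitrary test objective (Rosenbrock); not from the article.
          return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

      history = []  # objective value recorded after each iteration

      def record(xk):
          history.append(objective(xk))

      result = minimize(objective, x0=np.array([-1.2, 1.0]),
                        method="BFGS", callback=record)

      for i, value in enumerate(history, start=1):
          print(f"iteration {i}: objective = {value:.6g}")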
       
  • Unifying approaches to Functional Marine Connectivity for improved marine
           resource management: the European SEA-UNICORN COST Action

    • Abstract: Research Ideas and Outcomes 8: e80223
      DOI : 10.3897/rio.8.e80223
      Authors : Audrey Darnaude, Sophie Arnaud-Haond, Ewan Hunter, Oscar Gaggiotti, Anna Sturrock, Maria Beger, Filip Volckaert, Angel Pérez-Ruzafa, Lucía López-López, Susanne E. Tanner, Cemal Turan, Servet Ahmet Doğdu, Stelios Katsanevakis, Federica Costantini : Truly sustainable development in a human-altered, fragmented marine environment subject to unprecedented climate change demands informed planning strategies in order to be successful. Beyond a simple understanding of the distribution of marine species, data describing how variations in spatio-temporal dynamics impact ecosystem functioning and the evolution of species are required. Marine Functional Connectivity (MFC) characterizes the flows of matter, genes and energy produced by organism movements and migrations across the seascape. As such, MFC determines the ecological and evolutionary interdependency of populations, and ultimately the fate of species and ecosystems. Gathering effective MFC knowledge can therefore improve predictions of the impacts of environmental change and help to refine management and conservation strategies for the seas and oceans. Gathering these data is challenging, however, as accessing and surveying marine ecosystems still presents significant difficulties. Over 50 European institutions currently investigate aspects of MFC using complementary methods across multiple research fields to understand the ecology and evolution of marine species. The aim of SEA-UNICORN, a COST Action within the European Union Horizon 2020 framework programme, is to bring together this research effort, unite the multiple approaches to MFC, and integrate these under a common conceptual and analytical framework. The consortium brings together a diverse group of scientists to collate existing MFC data, identify knowledge gaps, enhance complementarity among disciplines, and devise common approaches to MFC. SEA-UNICORN will promote co-working between connectivity practitioners and ecosystem modelers to facilitate the incorporation of MFC data into the predictive models used to identify marine conservation priorities (a minimal connectivity-matrix sketch follows this entry). Ultimately, SEA-UNICORN will forge strong forward-working links between scientists, policy-makers and stakeholders to facilitate the integration of MFC knowledge into decision support tools for marine management and environmental policies. HTML XML PDF
      PubDate: Tue, 22 Feb 2022 14:49:46 +0200
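      MFC data are often summarised as a connectivity matrix of exchange probabilities between sites, one of the basic structures the predictive models mentioned above consume. The Python sketch below builds a small matrix and derives two common quantities, local retention and a one-generation projection; the site names and every number are invented for illustration, not drawn from SEA-UNICORN.

      # Minimal connectivity-matrix sketch; all values are hypothetical.
      import numpy as np

      sites = ["estuary", "reef", "offshore_bank"]  # invented site names
      # C[i, j]: probability that an individual produced at site j settles
      # at site i; each column sums to 1 (every propagule settles somewhere).
      C = np.array([
          [0.60, 0.25, 0.05],
          [0.30, 0.55, 0.15],
          [0.10, 0.20, 0.80],
      ])

      retention = np.diag(C)  # local retention: offspring that stay home

      n0 = np.array([1000.0, 500.0, 200.0])  # invented production per site
      n1 = C @ n0                            # one-generation projection

      for site, r, n in zip(sites, retention, n1):
          print(f"{site}: retention={r:.2f}, arrivals next generation={n:.0f}")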
       
  • Deliverable D3.4 Summaries of data, obstacles and challenges from
           interview campaigns

    • Abstract:
      DOI : 10.3897/arphapreprints.e81787
      Authors : Felix Wittstock, David Hötten, Sofia Biffi, Cristina Domingo, Bořivoj Šarapatka, Marek Bednář, Minučer Mesaroš : This deliverable presents summaries of data, obstacles and challenges from the interview campaigns of the H2020 BESTMAP project. It covers a detailed description of the methodology, reporting on the concrete steps taken to collect and analyze interview data. It also discusses obstacles and challenges encountered in the BESTMAP interview campaigns. Finally, the deliverable presents the main qualitative and quantitative findings of the interview analysis, with a focus on qualitative content analysis of open interview questions. HTML XML PDF
      PubDate: Mon, 7 Feb 2022 10:15:00 +0200
       
  • D1.3 Guidelines and protocols harmonizing activities across case
           studies

    • Abstract:
      DOI : 10.3897/arphapreprints.e81337
      Authors : Tomáš Václavík, Fanny Langerwisch, Guy Ziv, Jodi Gunning, Arjan Gosal, Michael Beckmann, Anne Paulus, Felix Wittstock, Anna Cord, Stephanie Roilo, Cristina Domingo-Marimon, Anabel Sanchez, Annelies Broekman, Dajana Vujaklija : This document is the first version of the Guidelines and protocols harmonizing activities across case studies of the H2020 BESTMAP project. It is intended to be updated in month 40 (D1.8). HTML XML PDF
      PubDate: Fri, 28 Jan 2022 10:00:00 +0200
       
 