Subjects -> BIOLOGY (Total: 3134 journals)
    - BIOCHEMISTRY (239 journals)
    - BIOENGINEERING (143 journals)
    - BIOLOGY (1491 journals)
    - BIOPHYSICS (53 journals)
    - BIOTECHNOLOGY (243 journals)
    - BOTANY (220 journals)
    - CYTOLOGY AND HISTOLOGY (32 journals)
    - ENTOMOLOGY (67 journals)
    - GENETICS (152 journals)
    - MICROBIOLOGY (265 journals)
    - MICROSCOPY (13 journals)
    - ORNITHOLOGY (26 journals)
    - PHYSIOLOGY (73 journals)
    - ZOOLOGY (117 journals)

BIOLOGY (1491 journals)

Showing 1 - 200 of 1720 Journals sorted alphabetically
AAPS Journal     Hybrid Journal   (Followers: 29)
ACS Pharmacology & Translational Science     Hybrid Journal   (Followers: 3)
ACS Synthetic Biology     Hybrid Journal   (Followers: 39)
Acta Biologica Hungarica     Full-text available via subscription   (Followers: 6)
Acta Biologica Marisiensis     Open Access   (Followers: 5)
Acta Biologica Sibirica     Open Access   (Followers: 2)
Acta Biologica Turcica     Open Access   (Followers: 2)
Acta Biomaterialia     Hybrid Journal   (Followers: 32)
Acta Biotheoretica     Hybrid Journal   (Followers: 3)
Acta Chiropterologica     Full-text available via subscription   (Followers: 6)
acta ethologica     Hybrid Journal   (Followers: 7)
Acta Fytotechnica et Zootechnica     Open Access   (Followers: 3)
Acta Ichthyologica et Piscatoria     Open Access   (Followers: 5)
Acta Médica Costarricense     Open Access   (Followers: 2)
Acta Scientiarum. Biological Sciences     Open Access   (Followers: 2)
Acta Scientifica Naturalis     Open Access   (Followers: 4)
Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis     Open Access   (Followers: 2)
Actualidades Biológicas     Open Access   (Followers: 1)
Advanced Biology     Hybrid Journal   (Followers: 1)
Advanced Health Care Technologies     Open Access   (Followers: 12)
Advanced Journal of Graduate Research     Open Access   (Followers: 2)
Advanced Membranes     Open Access   (Followers: 9)
Advanced Quantum Technologies     Hybrid Journal   (Followers: 5)
Advances in Biological Regulation     Hybrid Journal   (Followers: 4)
Advances in Biology     Open Access   (Followers: 16)
Advances in Biomarker Sciences and Technology     Open Access   (Followers: 2)
Advances in Biosensors and Bioelectronics     Open Access   (Followers: 8)
Advances in Cell Biology/ Medical Journal of Cell Biology     Open Access   (Followers: 28)
Advances in Ecological Research     Full-text available via subscription   (Followers: 47)
Advances in Environmental Sciences - International Journal of the Bioflux Society     Open Access   (Followers: 17)
Advances in Enzyme Research     Open Access   (Followers: 11)
Advances in High Energy Physics     Open Access   (Followers: 27)
Advances in Life Science and Technology     Open Access   (Followers: 14)
Advances in Life Sciences     Open Access   (Followers: 6)
Advances in Marine Biology     Full-text available via subscription   (Followers: 29)
Advances in Virus Research     Full-text available via subscription   (Followers: 8)
Adversity and Resilience Science : Journal of Research and Practice     Hybrid Journal   (Followers: 4)
African Journal of Ecology     Hybrid Journal   (Followers: 18)
African Journal of Range & Forage Science     Hybrid Journal   (Followers: 12)
AFRREV STECH : An International Journal of Science and Technology     Open Access   (Followers: 3)
Ageing Research Reviews     Hybrid Journal   (Followers: 13)
Aggregate     Open Access   (Followers: 3)
Aging Cell     Open Access   (Followers: 23)
Agrokémia és Talajtan     Full-text available via subscription   (Followers: 2)
AJP Cell Physiology     Hybrid Journal   (Followers: 13)
AJP Endocrinology and Metabolism     Hybrid Journal   (Followers: 14)
AJP Lung Cellular and Molecular Physiology     Hybrid Journal   (Followers: 3)
Al-Kauniyah : Jurnal Biologi     Open Access  
Alasbimn Journal     Open Access   (Followers: 1)
Alces : A Journal Devoted to the Biology and Management of Moose     Open Access  
Alfarama Journal of Basic & Applied Sciences     Open Access   (Followers: 12)
All Life     Open Access   (Followers: 2)
AMB Express     Open Access   (Followers: 1)
Ambix     Hybrid Journal   (Followers: 3)
American Journal of Agricultural and Biological Sciences     Open Access   (Followers: 7)
American Journal of Bioethics     Hybrid Journal   (Followers: 17)
American Journal of Human Biology     Hybrid Journal   (Followers: 19)
American Journal of Plant Sciences     Open Access   (Followers: 24)
American Journal of Primatology     Hybrid Journal   (Followers: 17)
American Naturalist     Full-text available via subscription   (Followers: 82)
Amphibia-Reptilia     Hybrid Journal   (Followers: 5)
Anaerobe     Hybrid Journal   (Followers: 3)
Analytical Methods     Hybrid Journal   (Followers: 7)
Analytical Science Advances     Open Access   (Followers: 2)
Anatomia     Open Access   (Followers: 16)
Anatomical Science International     Hybrid Journal   (Followers: 3)
Animal Cells and Systems     Hybrid Journal   (Followers: 4)
Animal Microbiome     Open Access   (Followers: 7)
Animal Models and Experimental Medicine     Open Access  
Annales françaises d'Oto-rhino-laryngologie et de Pathologie Cervico-faciale     Full-text available via subscription   (Followers: 2)
Annales Henri Poincaré     Hybrid Journal   (Followers: 2)
Annales Universitatis Mariae Curie-Sklodowska, sectio C – Biologia     Open Access   (Followers: 1)
Annals of Applied Biology     Hybrid Journal   (Followers: 7)
Annals of Biomedical Engineering     Hybrid Journal   (Followers: 18)
Annals of Human Biology     Hybrid Journal   (Followers: 6)
Annals of Science and Technology     Open Access   (Followers: 2)
Annual Research & Review in Biology     Open Access   (Followers: 1)
Annual Review of Biomedical Engineering     Full-text available via subscription   (Followers: 19)
Annual Review of Cell and Developmental Biology     Full-text available via subscription   (Followers: 40)
Annual Review of Food Science and Technology     Full-text available via subscription   (Followers: 13)
Annual Review of Genomics and Human Genetics     Full-text available via subscription   (Followers: 32)
Antibiotics     Open Access   (Followers: 12)
Antioxidants     Open Access   (Followers: 4)
Antonie van Leeuwenhoek     Hybrid Journal   (Followers: 3)
Anzeiger für Schädlingskunde     Hybrid Journal   (Followers: 1)
Apidologie     Hybrid Journal   (Followers: 4)
Apmis     Hybrid Journal   (Followers: 1)
APOPTOSIS     Hybrid Journal   (Followers: 5)
Applied Biology     Open Access  
Applied Bionics and Biomechanics     Open Access   (Followers: 4)
Applied Phycology     Open Access   (Followers: 1)
Applied Vegetation Science     Full-text available via subscription   (Followers: 9)
Aquaculture Environment Interactions     Open Access   (Followers: 7)
Aquaculture International     Hybrid Journal   (Followers: 25)
Aquaculture Reports     Open Access   (Followers: 3)
Aquaculture, Aquarium, Conservation & Legislation - International Journal of the Bioflux Society     Open Access   (Followers: 9)
Aquatic Biology     Open Access   (Followers: 9)
Aquatic Ecology     Hybrid Journal   (Followers: 45)
Aquatic Ecosystem Health & Management     Hybrid Journal   (Followers: 16)
Aquatic Science and Technology     Open Access   (Followers: 4)
Aquatic Toxicology     Hybrid Journal   (Followers: 26)
Arabian Journal of Scientific Research / المجلة العربية للبحث العلمي     Open Access  
Archaea     Open Access   (Followers: 3)
Archiv für Molluskenkunde: International Journal of Malacology     Full-text available via subscription   (Followers: 1)
Archives of Biological Sciences     Open Access  
Archives of Microbiology     Hybrid Journal   (Followers: 9)
Archives of Natural History     Hybrid Journal   (Followers: 8)
Archives of Oral Biology     Hybrid Journal   (Followers: 2)
Archives of Virology     Hybrid Journal   (Followers: 6)
Archivum Immunologiae et Therapiae Experimentalis     Hybrid Journal   (Followers: 2)
Arid Ecosystems     Hybrid Journal   (Followers: 2)
Arquivos do Museu Dinâmico Interdisciplinar     Open Access  
Arthropod Structure & Development     Hybrid Journal   (Followers: 1)
Arthropod Systematics & Phylogeny     Open Access   (Followers: 13)
Artificial DNA: PNA & XNA     Hybrid Journal   (Followers: 2)
Artificial Intelligence in the Life Sciences     Open Access   (Followers: 1)
Asian Bioethics Review     Full-text available via subscription   (Followers: 2)
Asian Journal of Biological Sciences     Open Access   (Followers: 2)
Asian Journal of Biology     Open Access  
Asian Journal of Biotechnology and Bioresource Technology     Open Access  
Asian Journal of Cell Biology     Open Access   (Followers: 4)
Asian Journal of Developmental Biology     Open Access   (Followers: 1)
Asian Journal of Medical and Biological Research     Open Access   (Followers: 3)
Asian Journal of Nematology     Open Access   (Followers: 4)
Asian Journal of Poultry Science     Open Access   (Followers: 3)
Atti della Accademia Peloritana dei Pericolanti - Classe di Scienze Medico-Biologiche     Open Access  
Australian Life Scientist     Full-text available via subscription   (Followers: 2)
Australian Mammalogy     Hybrid Journal   (Followers: 8)
Autophagy     Hybrid Journal   (Followers: 8)
Avian Biology Research     Hybrid Journal   (Followers: 4)
Avian Conservation and Ecology     Open Access   (Followers: 19)
Bacterial Empire     Open Access   (Followers: 1)
Bacteriology Journal     Open Access   (Followers: 2)
Bacteriophage     Full-text available via subscription   (Followers: 2)
Bangladesh Journal of Bioethics     Open Access  
Bangladesh Journal of Scientific Research     Open Access  
Between the Species     Open Access   (Followers: 2)
BIO Web of Conferences     Open Access  
BIO-SITE : Biologi dan Sains Terapan     Open Access  
Biocatalysis and Biotransformation     Hybrid Journal   (Followers: 4)
BioCentury Innovations     Full-text available via subscription   (Followers: 2)
Biochemistry and Cell Biology     Hybrid Journal   (Followers: 18)
Biochimie     Hybrid Journal   (Followers: 2)
BioControl     Hybrid Journal   (Followers: 2)
Biocontrol Science and Technology     Hybrid Journal   (Followers: 5)
Biodemography and Social Biology     Hybrid Journal   (Followers: 1)
BIODIK : Jurnal Ilmiah Pendidikan Biologi     Open Access  
BioDiscovery     Open Access   (Followers: 2)
Biodiversity : Research and Conservation     Open Access   (Followers: 30)
Biodiversity Data Journal     Open Access   (Followers: 7)
Biodiversity Informatics     Open Access   (Followers: 3)
Biodiversity Information Science and Standards     Open Access   (Followers: 3)
Biodiversity Observations     Open Access   (Followers: 2)
Bioeksperimen : Jurnal Penelitian Biologi     Open Access  
Bioelectrochemistry     Hybrid Journal   (Followers: 1)
Bioelectromagnetics     Hybrid Journal   (Followers: 1)
Bioenergy Research     Hybrid Journal   (Followers: 3)
Bioengineering and Bioscience     Open Access   (Followers: 1)
BioEssays     Hybrid Journal   (Followers: 10)
Bioethics     Hybrid Journal   (Followers: 20)
BioéthiqueOnline     Open Access   (Followers: 1)
Biogeographia : The Journal of Integrative Biogeography     Open Access   (Followers: 2)
Biogeosciences (BG)     Open Access   (Followers: 19)
Biogeosciences Discussions (BGD)     Open Access   (Followers: 3)
Bioinformatics     Hybrid Journal   (Followers: 307)
Bioinformatics Advances : Journal of the International Society for Computational Biology     Open Access   (Followers: 4)
Bioinformatics and Biology Insights     Open Access   (Followers: 14)
Biointerphases     Open Access   (Followers: 1)
Biojournal of Science and Technology     Open Access  
Biologia     Hybrid Journal   (Followers: 1)
Biologia Futura     Hybrid Journal  
Biologia on-line : Revista de divulgació de la Facultat de Biologia     Open Access  
Biological Bulletin     Partially Free   (Followers: 6)
Biological Control     Hybrid Journal   (Followers: 6)
Biological Invasions     Hybrid Journal   (Followers: 24)
Biological Journal of the Linnean Society     Hybrid Journal   (Followers: 18)
Biological Procedures Online     Open Access  
Biological Psychiatry     Hybrid Journal   (Followers: 59)
Biological Psychology     Hybrid Journal   (Followers: 5)
Biological Research     Open Access   (Followers: 1)
Biological Rhythm Research     Hybrid Journal  
Biological Theory     Hybrid Journal   (Followers: 3)
Biological Trace Element Research     Hybrid Journal  
Biologicals     Full-text available via subscription   (Followers: 5)
Biologics: Targets & Therapy     Open Access   (Followers: 1)
Biologie Aujourd'hui     Full-text available via subscription  
Biologie in Unserer Zeit (Biuz)     Hybrid Journal   (Followers: 2)
Biologija     Open Access  
Biology     Open Access   (Followers: 5)
Biology and Philosophy     Hybrid Journal   (Followers: 19)
Biology Bulletin     Hybrid Journal   (Followers: 1)
Biology Bulletin Reviews     Hybrid Journal  
Biology Direct     Open Access   (Followers: 9)
Biology Methods and Protocols     Open Access  
Biology of Sex Differences     Open Access   (Followers: 1)
Biology of the Cell     Full-text available via subscription   (Followers: 8)
Biology, Medicine, & Natural Product Chemistry     Open Access   (Followers: 2)
Biomacromolecules     Hybrid Journal   (Followers: 21)
Biomarker Insights     Open Access   (Followers: 1)
Biomarkers     Hybrid Journal   (Followers: 5)


Similar Journals
Biodiversity Information Science and Standards
Number of Followers: 3  

  This is an Open Access journal
ISSN (Online): 2535-0897
Published by Pensoft  [58 journals]
  • Meeting Report for the Phenoscape TraitFest 2023 with Comments on
           Organising Interdisciplinary Meetings

    • Abstract: Biodiversity Information Science and Standards 8: e115232
      DOI: 10.3897/biss.8.115232
      Authors: Jennifer C. Girón Duque, Meghan Balk, Wasila Dahdul, Hilmar Lapp, István Mikó, Elie Alhajjar, Brenen Wynd, Sergei Tarasov, Christopher Lawrence, Basanta Khakurel, Arthur Porto, Lin Yan, Isadora E Fluck, Diego Porto, Joseph Keating, Israel Borokini, Katja Seltmann, Giulio Montanaro, Paula Mabee: The Phenoscape project has developed ontology-based tools and a knowledge base that enable the integration and discovery of phenotypes across species from the scientific literature. The Phenoscape TraitFest 2023 event aimed to promote innovative applications that adopt the capabilities supported by the data in the Phenoscape Knowledgebase and its corresponding semantics-enabled tools, algorithms and infrastructure. The event brought together 26 participants, including domain experts in biodiversity informatics, taxonomy and phylogenetics, and software developers from various life-sciences programming toolkits and phylogenetic software projects, for an intense four-day collaborative software coding event. The event was designed as a hands-on workshop, based on the Open Space Technology methodology, in which participants self-organise into subgroups to collaboratively plan and work on their shared research interests. We describe how the workshop was organised, the projects developed and the outcomes resulting from the workshop, as well as the challenges in bringing together a diverse group of participants to engage productively in a collaborative environment.
      PubDate: Wed, 6 Mar 2024 09:30:43 +0200
       
  • Implementation Experience Report for the Developing Latimer Core Standard:
           The DiSSCo Flanders use-case

    • Abstract: Biodiversity Information Science and Standards 7: e113766
      DOI: 10.3897/biss.7.113766
      Authors: Lissa Breugelmans, Maarten Trekels
      PubDate: Wed, 29 Nov 2023 16:18:28 +0200
       
  • The Future of Natural History Transcription: Navigating AI advancements
           with VoucherVision and the Specimen Label Transcription Project (SLTP)

    • Abstract: Biodiversity Information Science and Standards 7: e113067
      DOI: 10.3897/biss.7.113067
      Authors: William Weaver, Kyle Lough, Stephen Smith, Brad Ruhfel: Natural history collections are critical reservoirs of biodiversity information, but collections staff are constantly grappling with substantial backlogs and limited resources. The task of transcribing specimen label text into searchable databases requires a significant amount of time, manual labor, and funding. To address this challenge, we introduce VoucherVision, a tool harnessing the capabilities of several Large Language Models (LLMs; Naveed et al. 2023) to augment specimen label transcription. The VoucherVision tool automates laborious components of the transcription process, leveraging an Optical Character Recognition (OCR) system and LLMs to convert unstructured label text into appropriate data formats compatible with database ingestion. VoucherVision uses a combination of structured output parsers and recursive re-prompting strategies to ensure the consistency and quality of the LLM-formatted text, significantly reducing errors. Integration of VoucherVision with the University of Michigan Herbarium's transcription workflow resulted in a significant reduction in per-image transcription time, suggesting substantial potential advantages for collections workflows. VoucherVision offers promising strides towards efficient digitization, with curatorial staff playing critical roles in data quality assurance and process oversight. Emphasizing the importance of knowledge sharing, the University of Michigan Herbarium is backing the Specimen Label Transcription Project (SLTP), which will provide open access to benchmarking datasets, fine-tuned models, and validation tools to rank the performance of different methodologies, LLMs, and prompting strategies. In the rapidly evolving landscape of Artificial Intelligence (AI) development, we recognize the profound potential of diverse contributions and innovative methodologies to redefine and advance the transformation of curatorial practices, catalyzing an era of accelerated digitization in natural history collections. An early, public version of VoucherVision is available to try here: https://vouchervision.azurewebsites.net/
      PubDate: Thu, 21 Sep 2023 09:25:29 +0300
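      A minimal Python sketch of the structured-output-plus-re-prompting loop the abstract outlines. The field list is illustrative and `call_llm(prompt) -> str` is a hypothetical stand-in for any LLM client; this is not VoucherVision's actual API.

        import json

        REQUIRED_FIELDS = {"scientificName", "country", "recordedBy", "eventDate"}

        def build_prompt(ocr_text, error_hint=""):
            # Ask for strict JSON so the reply can be parsed mechanically.
            return (
                "Convert this specimen label text into JSON with keys "
                f"{sorted(REQUIRED_FIELDS)}. Reply with JSON only.\n"
                f"{error_hint}Label text:\n{ocr_text}"
            )

        def transcribe(ocr_text, call_llm, max_retries=3):
            """Parse the LLM reply; on failure, re-prompt with the reason."""
            hint = ""
            for _ in range(max_retries):
                reply = call_llm(build_prompt(ocr_text, hint))
                try:
                    record = json.loads(reply)
                except json.JSONDecodeError as exc:
                    hint = f"Your previous reply was not valid JSON ({exc}). "
                    continue
                if not isinstance(record, dict):
                    hint = "Your previous reply was not a JSON object. "
                    continue
                missing = REQUIRED_FIELDS - record.keys()
                if not missing:
                    return record
                hint = f"Your previous reply omitted the keys {sorted(missing)}. "
            raise RuntimeError("no valid structured transcription after retries")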
       
  • Comparative Study: Evaluating the effects of class balancing on
           transformer performance in the PlantNet-300k image dataset

    • Abstract: Biodiversity Information Science and Standards 7: e113057
      DOI: 10.3897/biss.7.113057
      Authors: José Chavarría Madriz, Maria Mora-Cross, William Ulate: Image-based identification of plant specimens plays a crucial role in various fields such as agriculture, ecology, and biodiversity conservation. The growing interest in deep learning has led to remarkable advancements in image classification techniques, particularly with the utilization of convolutional neural networks (CNNs). Since 2015, in the context of the PlantCLEF (Conference and Labs of the Evaluation Forum) challenge (Joly et al. 2015), deep learning models, specifically CNNs, have consistently achieved the most impressive results in this field (Carranza-Rojas 2018). However, recent developments have introduced transformer-based models, such as ViT (Vision Transformer) (Dosovitskiy et al. 2020) and CvT (Convolutional vision Transformer) (Wu et al. 2021), as a promising alternative for image classification tasks. Transformers offer unique advantages such as capturing global context and handling long-range dependencies (Vaswani et al. 2017), which make them suitable for complex recognition tasks like plant identification. In this study, we focus on the image classification task using the PlantNet-300k dataset (Garcin et al. 2021a). The dataset consists of a large collection of 306,146 plant images representing 1,081 distinct species. These images were selected from the Pl@ntNet citizen observatory database. The dataset has two prominent characteristics that pose challenges for classification. First, there is a significant class imbalance, meaning that a small subset of species dominates the majority of the images. This imbalance creates bias and affects the accuracy of classification models. Second, many species exhibit visual similarities, making it tough, even for experts, to accurately identify them. These characteristics are referred to by the dataset authors as long-tailed distribution and high intrinsic ambiguity, respectively (Garcin et al. 2021b). In order to address the inherent challenges of the PlantNet-300k dataset, we employed a two-fold approach. Firstly, we leveraged transformer-based models to tackle the dataset's intrinsic ambiguity and effectively capture the complex visual patterns present in plant images. Secondly, we focused on mitigating the class imbalance issue through various data preprocessing techniques, specifically class balancing methods. By implementing these techniques, we aimed to ensure fair representation of all plant species in order to improve the overall performance of image classification models. Our objective is to assess the effects of data preprocessing techniques, specifically class balancing, on the classification performance of the PlantNet-300k dataset. By exploring different preprocessing methods, we addressed the class imbalance issue and, through careful evaluation, compared the performance of transformer-based models with and without class balancing techniques. Through these efforts, our ultimate goal is to determine whether these techniques allow us to achieve more accurate and reliable classification results, particularly for underrepresented species in the dataset. In our experiment, we compared the performance of two transformer-based models, ViT and CvT, using two versions of the PlantNet-300k dataset: one with class balancing and the other without, yielding a total of four sets of metrics for evaluation. To assess the classification performance, we utilized a wide range of commonly used metrics including recall, precision, accuracy, AUC (Area Under the Curve), ROC (Receiver Operating Characteristic), and others. These metrics provide insight into each model's ability to correctly classify plant species, identify false positives and negatives, measure overall accuracy, and assess the models' discriminatory power. By conducting this comparative study, we seek to contribute to the advancement of plant identification research by providing empirical evidence of the benefits and effectiveness of class balancing techniques in improving the performance of transformer-based models on the PlantNet-300k dataset and similar ones.
      PubDate: Thu, 21 Sep 2023 09:20:35 +0300
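      The abstract does not name the specific balancing methods used, so as a hedged illustration only: one common technique is inverse-frequency oversampling, sketched here with PyTorch (an assumption; the study's own tooling is not stated).

        from collections import Counter

        import torch
        from torch.utils.data import DataLoader, WeightedRandomSampler

        def balanced_loader(dataset, labels, batch_size=64):
            """Oversample rare classes so every class is drawn roughly equally often.
            `labels` holds the integer class label of each item in `dataset`."""
            counts = Counter(labels)
            # Weight each sample by the inverse frequency of its class.
            weights = torch.tensor([1.0 / counts[y] for y in labels], dtype=torch.double)
            sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
            return DataLoader(dataset, batch_size=batch_size, sampler=sampler)

      The usual alternative is to leave the data untouched and weight the loss per class instead; both are standard responses to a long-tailed distribution.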
       
  • Structuring Information from Plant Morphological Descriptions using
           Open Information Extraction

    • Abstract: Biodiversity Information Science and Standards 7: e113055
      DOI: 10.3897/biss.7.113055
      Authors: Maria Mora-Cross, William Ulate, Brandon Retana Chacón, María Biarreta Portillo, Josué David Castro Ramírez, Jose Chavarria Madriz: Taxonomic literature keeps records of the planet's biodiversity and gives access to the knowledge needed for research and sustainable management. The number of publications generated is quite large: the corpus of biodiversity literature includes tens of millions of figures and taxonomic treatments. Unfortunately, most of the taxonomic descriptions are from scientific publications in text format. With more than 61 million digitized pages in the Biodiversity Heritage Library (BHL), only 467,265 taxonomic treatments are available in the Biodiversity Literature Repository. Obtaining highly structured text from digitized text has been shown to be complex and very expensive (Cui et al. 2021). The scientific community has described over 1.2 million species, but studies suggest that 86% of existing species on Earth and 91% of species in the ocean still await description (Mora et al. 2011). The published descriptions synthesize observations made by taxonomists over centuries of research and include detailed morphological aspects (i.e., shape and structure) of species useful to identify specimens, to improve information search mechanisms, to perform data analysis of species having particular characteristics, and to compare species descriptions. To take full advantage of this information and to work towards integrating it with repositories of biodiversity knowledge, the biodiversity informatics community first needs to convert plain text into a machine-processable format. More precisely, there is a need to identify structures and substructure names and the characters that describe them (Fig. 1). Open information extraction (OIE) is a research area of Natural Language Processing (NLP) that aims to automatically extract structured, machine-readable representations of data available in unstructured text; usually the result is handled as n-ary propositions, for instance triples (Shen et al. 2022). OIE is continuously evolving with advancements in NLP and machine learning techniques. The state of the art in OIE involves the use of neural approaches, pre-trained language models, and the integration of dependency parsing and semantic role labeling. Neural solutions mainly formulate OIE as a sequence tagging problem or a sequence generation problem. Ongoing research focuses on improving extraction accuracy; handling complex linguistic phenomena, for instance, addressing challenges like coreference resolution; and more open information extraction, because most existing neural solutions work on English texts (Zhou et al. 2022). The main objective of this project is to evaluate and compare the results of automatic data extraction from plant morphological descriptions using pre-trained language models (PLM) and a language model trained on data from plant morphological descriptions written in Spanish. The research data for this study were sourced from the species records database of the National Biodiversity Institute of Costa Rica (INBio). Specifically, the project focused on selecting records of morphological descriptions of plant species written in Spanish. The system processes the morphological descriptions using a workflow that includes the following phases (Fig. 2 shows the general workflow used in this research):
      - Pre-processing and annotation: descriptions were standardized by removing special characters like double and single quotes, replacing abbreviations, tokenizing text, and other transformations. Some records of the dataset were annotated with the ground-truth structured information in the form of triples extracted from each paragraph. Additionally, structured data from the project carried out by Mora and Araya (Mora and Araya 2018) were included in the dataset.
      - Feature extraction: token vectorization was done using word embeddings directly from the language models.
      - Test PLM: the evaluation of the PLM models used the zero-shot approach and involved applying the models to the test dataset, extracting information, and comparing it to the annotated ground truth.
      - Local language model training: the annotated data were split into 80% training data and 20% test data. Using the training data, a language model based on the Transformers architecture was trained.
      - Evaluate results: evaluation metrics such as precision, recall, and F1 (a measure of the model's accuracy) were calculated by comparing the extracted information with the ground truth. The results were analyzed to understand the models' performance, identify strengths and weaknesses, and gain insights into their ability to extract accurate and relevant information. Based on the analysis, the evaluation process iteratively improved the models' results.
      The main contributions of this project are: a Transformers-based language model to extract information from morphological descriptions of plants written in Spanish, available on the project website*1; a corpus of morphological descriptions of plants, written in Spanish, labeled for information extraction and made available on the project website; and the results of the project, the first of its kind applied to morphological descriptions of plants written in Spanish, published on the project website.
      PubDate: Thu, 21 Sep 2023 09:16:01 +0300
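      The evaluation step (precision, recall, and F1 over extracted triples) can be made concrete with a self-contained sketch; the example triples below are invented, not taken from the corpus.

        def triple_scores(predicted, gold):
            """Exact-match precision/recall/F1 over sets of (subject, relation, object) triples."""
            tp = len(predicted & gold)
            precision = tp / len(predicted) if predicted else 0.0
            recall = tp / len(gold) if gold else 0.0
            f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
            return precision, recall, f1

        gold = {("leaf", "shape", "ovate"), ("petal", "color", "white")}
        pred = {("leaf", "shape", "ovate"), ("petal", "color", "yellow")}
        print(triple_scores(pred, gold))  # (0.5, 0.5, 0.5)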
       
  • No Pain No Gain: Standards mapping in Latimer Core development

    • Abstract: Biodiversity Information Science and Standards 7: e113053
      DOI: 10.3897/biss.7.113053
      Authors: Matt Woodburn, Jutta Buschbom, Sharon Grant, Janeen Jones, Ben Norton, Maarten Trekels, Sarah Vincent, Kate Webbink: Latimer Core (LtC) is a newly proposed Biodiversity Information Standards (TDWG) data standard that supports the representation and discovery of natural science collections by structuring data about the groups of objects that those collections and their subcomponents encompass (Woodburn et al. 2022). It is designed to be applicable to a range of use cases that include high-level collection registries, rich textual narratives and semantic networks of collections, as well as more granular, quantitative breakdowns of collections to aid collection discovery and digitisation planning. As a standard that is (in this first version) focused on natural science collections, LtC has significant intersections with existing data standards and models (Fig. 1) that represent individual natural science objects and occurrences and their associated data (e.g., Darwin Core (DwC), Access to Biological Collection Data (ABCD), Conceptual Reference Model of the International Committee on Documentation (CIDOC-CRM)). LtC's scope also overlaps with standards for more generic concepts like metadata, organisations, people and activities (i.e., Dublin Core, World Wide Web Consortium (W3C) ORG Ontology and PROV Ontology, Schema.org). LtC represents just an element of this extended network of data standards for the natural sciences and related concepts. Mapping between LtC and intersecting standards is therefore crucial for avoiding duplication of effort in the standard development process, and for ensuring that data stored using the different standards are as interoperable as possible, in alignment with FAIR (Findable, Accessible, Interoperable, Reusable) principles. In particular, it is vital to make robust associations between records representing groups of objects in LtC and records (where available) that represent the objects within those groups. During LtC development, efforts were made to identify and align with relevant standards and vocabularies, and to adopt existing terms from them where possible. During expert review, a more structured approach was proposed and implemented using the Simple Knowledge Organization System (SKOS) mappingRelation vocabulary. This exercise helped to better describe the nature of the mappings between new LtC terms and related terms in other standards, and to validate decisions around the borrowing of existing terms for LtC. A further exercise also used elements of the Simple Standard for Sharing Ontological Mappings (SSSOM) to start developing a more comprehensive set of metadata around these mappings. At present, these mappings (Suppl. material 1 and Suppl. material 2) are provisional and not considered comprehensive, but they should be further refined and expanded over time. Even with the support provided by the SKOS and SSSOM standards, the LtC experience has proven the mapping process to be far from straightforward. Different standards vary in how they are structured: for example, DwC is a ‘bag of terms’ with informal classes and no structural constraints, while more structured standards and ontologies like ABCD and PROV employ different approaches to how structure is defined and documented. The various standards use different metadata schemas and serialisations (e.g., Resource Description Framework (RDF), XML) for their documentation, and different approaches to providing persistent, resolvable identifiers for their terms. There are also many subtle nuances involved in assessing the alignment between the concepts that the source and target terms represent, particularly when assessing whether a match is exact enough to allow the existing term to be adopted. These factors make the mapping process quite manual and labour-intensive. Approaches and tools, such as developing decision trees (Fig. 2) to represent the logic involved and further exploration of the SSSOM standard, could help to streamline this process. In this presentation, we will discuss the LtC experience of the standards mapping process, the challenges faced and methods used, and the potential to contribute this experience to collaborative standards mapping within the anticipated TDWG Standards Mapping Interest Group.
      PubDate: Thu, 21 Sep 2023 09:14:15 +0300
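      To make the SKOS/SSSOM machinery concrete, here is a hedged Python sketch that writes a couple of SSSOM-style mapping rows as TSV. The term pairings are invented for illustration; the paper's actual mappings live in its supplementary materials.

        import csv

        # Illustrative rows only; real SSSOM files carry additional metadata columns.
        mappings = [
            {"subject_id": "ltc:ObjectGroup", "predicate_id": "skos:narrowMatch",
             "object_id": "dwc:Occurrence", "mapping_justification": "semapv:ManualMappingCuration"},
            {"subject_id": "ltc:personName", "predicate_id": "skos:exactMatch",
             "object_id": "foaf:name", "mapping_justification": "semapv:ManualMappingCuration"},
        ]

        with open("ltc_mappings.sssom.tsv", "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(mappings[0]), delimiter="\t")
            writer.writeheader()
            writer.writerows(mappings)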
       
  • Filling Gaps in Earthworm Digital Diversity in Northern Eurasia from
           Russian-language Literature

    • Abstract: Biodiversity Information Science and Standards 7: e112957
      DOI: 10.3897/biss.7.112957
      Authors: Maxim Shashkov, Natalya Ivanova, Sergey Ermolov: Data availability for certain groups of organisms (ecosystem engineers, invasive or protected species, etc.) is important for monitoring and making predictions in changing environments. One of the most promising directions for research on the impact of changes is species distribution modelling. Such technologies are highly dependent on occurrence data of high quality (Van Eupen et al. 2021). Earthworms (order Crassiclitellata) are a key group of organisms (Lavelle 2014), but their distribution around the globe is underrepresented in digital resources. Dozens of earthworm species, both widespread and endemic, inhabit the territory of Northern Eurasia (Perel 1979), but extremely limited data on them are available through global biodiversity repositories (Cameron 2018). There are two main obstacles to data mobilisation. Firstly, studies of the diversity of earthworms in Northern Eurasia have a long history (since the end of the nineteenth century) and were conducted by several generations of Soviet and Russian researchers. Most of the collected data have been published in "grey literature", now stored in only a few libraries. Until recently, most of these remained largely undigitised, and some are probably irretrievably lost. The second problem is the difference in the taxonomic checklists used by Soviet and European researchers. Not all species and synonyms are included in the GBIF (Global Biodiversity Information Facility) Backbone Taxonomy. As a result, existing earthworm species distribution models (Phillips 2019) potentially miss a significant amount of data, may underestimate biodiversity, and may predict distributions inaccurately. To fill this gap, we collected occurrence data from the Russian-language literature (published by Soviet and Russian researchers) and digitised species checklists, keeping the original scientific names. To find relevant literature, we conducted a keyword search for "earthworms" and "Lumbricidae" through the Russian national scientific online library eLibrary and screened reference lists from the monographs of the leading Soviet and Russian soil zoologist Tamara Perel (Vsevolodova-Perel 1997, Perel 1979). As a result, about 1,000 references were collected, of which 330 papers had titles indicating the potential to contain data on earthworm occurrences. Among these, 219 were found as PDF files or printed papers. For dataset compilation, 159 papers were used; the others had no exact location data or duplicated data contained in other papers. Most of the sources were peer-reviewed articles (Table 1). A reference list is available through Zenodo (Ivanova et al. 2023). The earliest publication we could find dates back to 1899, by Wilhelm Michaelsen; the most recent is from 2023. About a third of the sources were written by the systematists Iosif Malevich and Tamara Perel. Occurrence data were extracted and structured according to the Darwin Core standard (Wieczorek et al. 2012). During the data digitisation process, we tried to include as much primary information as possible. Only one-tenth of the literature occurrences contained the geographic coordinates of locations provided by the authors. The remaining occurrences were manually georeferenced using the point-radius method (Wieczorek et al. 2010). The resulting occurrence dataset, Earthworm occurrences from Russian-language literature (Shashkov et al. 2023), was published through the Global Biodiversity Information Facility portal. It contains 5,304 occurrences of 117 species from 27 countries (Fig. 1). To improve the GBIF Backbone Taxonomy, we digitised two catalogues of earthworm species published for the USSR (Perel 1979) and the Russian Federation (Vsevolodova-Perel 1997) by Tamara Perel. Based on these monographs, three checklist datasets were published through GBIF (Shashkov 2023b, 124 records; Shashkov 2023c, 87 records; Shashkov 2023a, 95 records). We are now working to include these names in the GBIF Backbone so that all species names can be matched and recorded exactly as mentioned in papers published by Soviet and Russian researchers.
      PubDate: Wed, 20 Sep 2023 09:39:26 +0300
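      A literature occurrence georeferenced with the point-radius method, expressed in Darwin Core terms, might look like the sketch below. The terms are real Darwin Core; all values are invented for illustration and do not come from the dataset.

        # Point-radius georeferencing: a centre point plus an uncertainty radius
        # large enough to contain the whole verbatim locality.
        occurrence = {
            "scientificName": "Lumbricus terrestris Linnaeus, 1758",
            "verbatimLocality": "vicinity of Moscow",
            "decimalLatitude": 55.75,
            "decimalLongitude": 37.62,
            "coordinateUncertaintyInMeters": 25000,
            "georeferenceProtocol": "point-radius method (Wieczorek et al. 2010)",
            "basisOfRecord": "MaterialCitation",
        }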
       
  • Robot-in-the-loop: Prototyping robotic digitisation at the Natural History
           Museum

    • Abstract: Biodiversity Information Science and Standards 7: e112947
      DOI: 10.3897/biss.7.112947
      Authors: Ben Scott, Arianna Salili-James, Vincent Smith: The Natural History Museum, London (NHM) is home to an impressive collection of over 80 million specimens, of which just 5.5 million have been digitised. As in all similar collections, digitisation of these specimens is very labour-intensive, requiring time-consuming manual handling: each specimen is extracted from its curatorial unit and placed for imaging, its labels are manually manipulated, and it is then returned to storage. Thanks to the NHM's team of digitisers, workflows are becoming more efficient as they are refined. However, many of these workflows are highly repetitive and ideally suited to automation, and the museum is now exploring integrating robots into the digitisation process. The NHM has purchased a Techman TM5 900 robotic arm, equipped with integrated Artificial Intelligence (AI) software and additional features such as custom grippers and a 3D scanner. This robotic arm combines advanced imaging technologies, machine learning algorithms, and robotic manipulation capabilities to capture high-quality specimen data, making it possible to digitise vast collections efficiently (Fig. 1). We showcase the NHM's application of robotics for digitisation, outlining the use cases developed for implementation and the prototypical workflows already in place at the museum. We will explore our invasive and non-invasive digitisation experiments, the many challenges, and the initial results of our early experiments with this transformative technology.
      PubDate: Wed, 20 Sep 2023 09:33:18 +0300
       
  • What Can You Do With 200 Million Newspaper Articles: Exploring
           GLAM data in the Humanities

    • Abstract: Biodiversity Information Science and Standards 7: e112935
      DOI: 10.3897/biss.7.112935
      Authors: Tim Sherratt: I'm a historian who works with data from the GLAM sector (galleries, libraries, archives and museums). When I talk about GLAM data, I'm usually talking about things like newspapers, government documents, photographs, letters, websites, and books. Some of it is well-described, structured, and easily accessible, and some is not. All of it offers us the chance to ask new questions of our past, to see things differently. But what tools, what examples, what documentation, and what support are needed to encourage researchers to explore these possibilities, to engage with collections as data? In this talk, I'll be describing some of my own adventures amidst GLAM data, before focusing on questions of access, infrastructure, and skills development. In particular, I'll be introducing the GLAM Workbench, a collection of tools, tutorials, examples, and hacks aimed at helping humanities researchers navigate the world of data. What pathways do we need, and how can we build them?
      PubDate: Tue, 19 Sep 2023 10:36:14 +0300
       
  • Using ChatGPT with Confidence for Biodiversity-Related Information Tasks

    • Abstract: Biodiversity Information Science and Standards 7: e112926
      DOI: 10.3897/biss.7.112926
      Authors: Michael Elliott, José Fortes: Recent advancements in conversational Artificial Intelligence (AI), such as OpenAI's Chat Generative Pre-Trained Transformer (ChatGPT), present the possibility of using large language models (LLMs) as tools for retrieving, analyzing, and transforming scientific information. We have found that ChatGPT (GPT 3.5) can provide accurate biodiversity knowledge in response to questions about species descriptions, occurrences, and taxonomy, as well as structure information according to data sharing standards such as Darwin Core. A rigorous evaluation of ChatGPT's capabilities in biodiversity-related tasks may help to inform viable use cases for today's LLMs in research and information workflows. In this work, we test the extent of ChatGPT's biodiversity knowledge, characterize its mistakes, and suggest how LLM-based systems might be designed to complete knowledge-based tasks with confidence. To test ChatGPT's biodiversity knowledge, we compiled a question-and-answer test set derived from Darwin Core records available in Integrated Digitized Biocollections (iDigBio). Each question focuses on one or more Darwin Core terms to test the model's ability to recall species occurrence information and its understanding of the standard. The test set covers a range of locations, taxonomic groups, and both common and rare species (defined by the number of records in iDigBio). The results of the tests will be presented. We also tested ChatGPT on generative tasks, such as creating species occurrence maps. A visual comparison of the maps with iDigBio data shows that, for some species, ChatGPT can generate fairly accurate representations of their geographic ranges (Fig. 1). ChatGPT's incorrect responses in our tests show several patterns of mistakes. First, responses can be self-conflicting. For example, when asked "Does Acer saccharum naturally occur in Benton, Oregon?", ChatGPT responded "YES, Acer saccharum DOES NOT naturally occur in Benton, Oregon". ChatGPT can also be misled by semantics in species names. For Rafinesquia neomexicana, the word "neomexicana" leads ChatGPT to believe that the species primarily occurs in New Mexico, USA. ChatGPT may also confuse species, such as when attempting to describe a lesser-known species (e.g., a rare bee) within the same genus as a better-known species. Other causes of mistakes include hallucination (Ji et al. 2023), memorization (Chang and Bergen 2023), and user deception (Li et al. 2023). Some mistakes may be avoided by prompt engineering, e.g., few-shot prompting (Chang and Bergen 2023) and chain-of-thought prompting (Wei et al. 2022). These techniques assist LLMs by clarifying expectations or by guiding recollection. However, such methods cannot help when LLMs lack the required knowledge. In these cases, alternative approaches are needed. A desired reliability can be theoretically guaranteed if responses that contain mistakes are discarded or corrected. This requires either detecting or predicting mistakes. Sometimes mistakes can be ruled out by verifying responses against a trusted source; for example, a trusted specimen record might be found that corroborates the response. The difficulty, however, is finding such records programmatically: using the iDigBio and Global Biodiversity Information Facility (GBIF) search Application Programming Interfaces (APIs) requires specifying indexed terms that might not appear in an LLM's response, a secondary problem for which LLMs may themselves be well suited. Note that with presence-only data, it can be difficult to disprove presence claims or prove absence claims. Besides verification, mistakes may be predicted using probabilistic methods. Formulating mistake probabilities often relies on heuristics; for example, variability in a model's responses to a repeated query can be a sign of hallucination (Manakul et al. 2023). In practice, both probabilistic and verification methods may be needed to reach a desired reliability. LLM outputs that can be verified may be directly accepted (or discarded), while others are judged by estimating mistake probabilities. We will consider a set of heuristics and verification methods, and report empirical assessments of their impact on ChatGPT's reliability.
      PubDate: Tue, 19 Sep 2023 10:31:18 +0300
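      The two strategies the authors describe, a variability heuristic plus verification against trusted records, could be combined along these lines. `ask` is a stand-in for any LLM client, and the iDigBio query fields and response key are assumptions based on its public search API, so treat this as a sketch rather than working integration code.

        import json
        from collections import Counter
        from urllib.parse import urlencode
        from urllib.request import urlopen

        def self_consistency(ask, question, n=5):
            """Answer variability across repeated queries as a hallucination
            heuristic (cf. Manakul et al. 2023)."""
            answers = Counter(ask(question) for _ in range(n))
            answer, votes = answers.most_common(1)[0]
            return answer, votes / n  # agreement ratio as a crude confidence score

        def idigbio_has_record(scientificname, stateprovince):
            """Look for a trusted specimen record corroborating a presence claim.
            With presence-only data, zero hits cannot prove absence."""
            rq = json.dumps({"scientificname": scientificname, "stateprovince": stateprovince})
            url = "https://search.idigbio.org/v2/search/records?" + urlencode({"rq": rq, "limit": 1})
            with urlopen(url) as resp:
                return json.load(resp)["itemCount"] > 0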
       
  • NBN Atlas: Our transformation and re-alignment with the Living Atlas
           community

    • Abstract: Biodiversity Information Science and Standards 7: e112813
      DOI: 10.3897/biss.7.112813
      Authors: Helen Manders-Jones, Keith Raven: The National Biodiversity Network (NBN) Atlas is the largest repository of publicly available biodiversity data in the United Kingdom (UK). Built on the open-source Atlas of Living Australia (ALA) platform, it was launched in 2017 and is part of a global network of over 20 Living Atlases (live or in development). Notably, the NBN Atlas is the largest, with almost twice as many records as the Atlas of Living Australia. In order to meet the needs of the UK biological recording community, the NBN Atlas was considerably customised. Regrettably, these customisations were applied directly to the platform code, resulting in divergence from the parent ALA platform and creating major obstacles to upgrading. To address these challenges, we initiated the Fit for the Future Project. We will outline our journey to decouple the customisations, realign with the ALA, upgrade the NBN Atlas, regain control of the infrastructure, and modernise DevOps practices. Each of these steps played a crucial role in our overall transformation. Additionally, we will discuss a new project that will allow data providers to set the public resolution of all records in a dataset and give individuals and organisations access to the supplied location information. We will also highlight our efforts to leverage contributions from volunteer developers.
      PubDate: Mon, 18 Sep 2023 08:42:57 +0300
       
  • AI-Accelerated Digitisation of Insect Collections: The next generation of
           Angled Label Image Capture Equipment (ALICE)

    • Abstract: Biodiversity Information Science and Standards 7: e112742
      DOI: 10.3897/biss.7.112742
      Authors: Arianna Salili-James, Ben Scott, Laurence Livermore, Ben Price, Steen Dupont, Helen Hardy, Vincent Smith: The digitisation of natural science specimens is a shared ambition of many of the largest collections, but the scale of these collections, estimated to comprise at least 1.1 billion specimens (Johnson et al. 2023), continues to challenge even the most resource-rich organisations. The Natural History Museum, London (NHM) has been pioneering work to accelerate the digitisation of its 80 million specimens. Since the inception of the NHM Digital Collection Programme in 2014, more than 5.5 million specimen records have been made digitally accessible. This has enabled the museum to deliver a tenfold increase in digitisation compared to when rates were first measured by the NHM in 2008. Even with this investment, it will take circa 150 years to digitise the remaining collections, leading the museum to pursue technology-led solutions alongside increased funding to deliver the next increase in digitisation rate. Insects comprise approximately half of all described species and represent more than one-third (c. 30 million specimens) of the NHM's overall collection. Their most common preservation method, attached to a pin alongside a series of labels with metadata, makes insect specimens challenging to digitise. Early Artificial Intelligence (AI)-led innovations (Price et al. 2018) resulted in the development of ALICE, the museum's Angled Label Image Capture Equipment, in which a pinned specimen is placed inside a multi-camera setup that captures a series of partial views of the specimen and its labels. Centred around the pin, these images can be digitally combined and reconstructed, using the accompanying ALICE software, to provide a clean image of each label. To do this, a Convolutional Neural Network (CNN) model is incorporated to locate all labels within the images. This is followed by various image processing tools to transform the labels into a two-dimensional viewpoint, align the associated label images together, and merge them into one label. This allows users to extract label data from the processed label images manually or computationally (e.g., using Optical Character Recognition [OCR] tools) (Salili-James et al. 2022). With the ALICE setup, a user can image an average of 800 specimens per day and, exceptionally, up to 1,300. This compares with a daily average of 250 specimens or fewer using more traditional methods that involve separating the labels and photographing them off the pin. Despite this, our original version of ALICE was only suited to a small subset of the collection: in situations where the specimen is very large, there are too many labels, or the labels are too close together, ALICE fails (Dupont and Price 2019). Using a combination of updated AI processing tools, we hereby present ALICE version 2. This new version of ALICE provides faster rates, improved software accuracy, and a more streamlined pipeline. It includes the following updates:
      - Hardware: after conducting various tests, we have optimised the camera setup. Further hardware updates include a Light-Emitting Diode (LED) ring light, as well as modifications to the camera mounting.
      - Software: our latest software incorporates machine learning and other computer vision tools to segment labels from ALICE images and stitch them together more quickly and with a higher level of accuracy, significantly reducing the image processing failure rate. These processed label images can be combined with the latest OCR tools for automatic transcription and data segmentation.
      - Buildkit: we aim to provide a toolkit that any individual or institution can incorporate into their digitisation pipeline. This includes hardware instructions, an extensive guide detailing the pipeline, and new software code accessible via GitHub.
      We provide test data and workflows to demonstrate the potential of ALICE version 2 as an effective, accessible, and cost-saving solution for digitising pinned insect specimens. We also describe potential modifications enabling it to work with other types of specimens.
      PubDate: Fri, 15 Sep 2023 09:01:59 +0300
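      The step that transforms a detected label into a two-dimensional viewpoint is typically a perspective warp. A minimal OpenCV sketch of that one step, not ALICE's actual code:

        import cv2
        import numpy as np

        def flatten_label(image, corners, out_w=400, out_h=200):
            """Warp one label to a flat view. `corners` are the four corner points
            (top-left, top-right, bottom-right, bottom-left) from the label detector."""
            src = np.array(corners, dtype=np.float32)
            dst = np.array([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]], dtype=np.float32)
            matrix = cv2.getPerspectiveTransform(src, dst)
            return cv2.warpPerspective(image, matrix, (out_w, out_h))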
       
  • Mapping between Darwin Core and the Australian Biodiversity Information
           Standard: A linked data example

    • Abstract: Biodiversity Information Science and Standards 7: e112722
      DOI: 10.3897/biss.7.112722
      Authors: Mieke Strong, Piers Higgs: The Australian Biodiversity Information Standard (ABIS) is a data standard developed to represent and exchange biodiversity data expressed using the Resource Description Framework (RDF). ABIS has at its core the TERN Ontology, a conceptual information model that represents plot-based ecological surveys. The RDF linked-data structure is self-describing and composed of "triples"; this format is quite different from tabular data. During the Australian federal government's Biodiversity Data Repository pilot project, occurrence data in tabular Darwin Core format were converted into ABIS linked data. This lightning talk will describe the approach taken, the challenges that arose, and the ways in which data using Darwin Core terms can be represented differently using linked data technologies.
      PubDate: Fri, 15 Sep 2023 08:46:01 +0300
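      The tabular-to-triples conversion can be illustrated with rdflib. This hedged sketch uses Darwin Core term URIs and an invented row for brevity; real ABIS output is shaped by the TERN Ontology instead.

        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import RDF

        DWC = Namespace("http://rs.tdwg.org/dwc/terms/")

        row = {"occurrenceID": "urn:example:occurrence:1", "scientificName": "Eucalyptus regnans",
               "decimalLatitude": "-37.8", "decimalLongitude": "146.0"}

        g = Graph()
        s = URIRef(row["occurrenceID"])
        g.add((s, RDF.type, DWC.Occurrence))
        for term in ("scientificName", "decimalLatitude", "decimalLongitude"):
            # Each table cell becomes one self-describing triple.
            g.add((s, DWC[term], Literal(row[term])))
        print(g.serialize(format="turtle"))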
       
  • I Know Something You Don’t Know: The annotation saga
           continues…

    • Abstract: Biodiversity Information Science and Standards 7: e112715
      DOI: 10.3897/biss.7.112715
      Authors: James Macklin, David Shorthouse, Falko Glöckler: Over the past 20 years, the biodiversity informatics community has pursued components of the digital annotation landscape with varying degrees of success. We will provide a historical overview of the theory and the advancements made through a few key projects, and will identify some of the ongoing challenges and opportunities. The fundamental principles remain unchanged since annotations were first proposed. Someone (or something): (1) has an enhancement to make somewhere other than the source where the original data or information were generated or transcribed; (2) wishes to broadcast these statements to the originator and to others who may benefit; and (3) expects persistence, discoverability, and attribution for their contributions alongside the source. The Filtered Push project (Morris et al. 2013) considered several use cases and pioneered the development of services based on the technology of the day. Exchanging data between parties in a universally consistent way necessitated a novel draft standard for data annotations, developed as an extension of the World Wide Web Consortium's Web Annotation Working Group standard (Sanderson et al. 2013), that is sufficiently informative for a data curator to confidently make a decision. Figure 2 from Morris et al. (2013), reproduced here as Fig. 1, outlines the composition of an annotation data package for a taxonomic identification. The package contains the data object(s) associated with an occurrence, an expression of the motivation(s) for updating, some evidence for an assertion, and a stated expectation for how the receiving entity should take action. The Filtered Push and AnnoSys (Tschöpe et al. 2013) projects also considered implementation strategies involving collection management systems (e.g., Symbiota) and portals (e.g., the European Distributed Institute of Taxonomy, EDIT). However, there remain technological barriers to these systems operating at scale, not least the absence of globally unique, persistent, resolvable identifiers for shared objects and concepts. Major aggregation infrastructures like the Global Biodiversity Information Facility (GBIF) and the Distributed System of Scientific Collections (DiSSCo) rely on data enhancement to improve the quality of their resources and have annotation services in their work plans. More recently, the Digital Extended Specimen (DES) concept (Hardisty et al. 2022) will rely on annotation services as key components of the proposed infrastructure. Recent work on annotation services more generally has considered various new forms of packaging and delivery, such as Frictionless Data (Fowler et al. 2018), Journal Article Tag Suite XML (Agosti et al. 2022), and nanopublications (Kuhn et al. 2018). There is a risk of fragmentation of this landscape, and of disenfranchisement of both biological collections and the wider research community, if we fail to align the purpose, content, and structure of these packages, or if they fail to remain aligned with FAIR principles. Institutional collection management systems currently represent the canonical data store that provides data to researchers and data aggregators. It is critical that information and/or feedback about the data they release be round-tripped back to them for consideration. However, the sheer volume of annotations that could be generated by both human and machine curation processes will overwhelm local data curators and the systems supporting them. One solution is to create a central annotation store with write and discovery services that best support the needs of all stewards of data. This will require an international consortium of parties with a governance and technical model to assure its sustainability.
      PubDate: Thu, 14 Sep 2023 16:20:39 +0300
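      An annotation data package of the kind described can be expressed with the W3C Web Annotation Data Model. The field names below follow WADM; the identification scenario and all identifiers are invented for illustration.

        import json

        annotation = {
            "@context": "http://www.w3.org/ns/anno.jsonld",
            "type": "Annotation",
            "motivation": "editing",  # why the annotator acted
            "creator": "https://orcid.org/0000-0000-0000-0000",
            "body": {
                "type": "TextualBody",
                "value": "Re-identified as Apis mellifera Linnaeus, 1758; "
                         "wing venation contradicts the original determination.",
            },
            "target": "https://example.org/occurrence/ABC123",  # record being annotated
        }
        print(json.dumps(annotation, indent=2))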
       
  • Lognom, Assisting in the Decision-Making and Management of Zoological
           Nomenclature

    • Abstract: Biodiversity Information Science and Standards 7: e112710
      DOI : 10.3897/biss.7.112710
      Authors : Elie Saliba, Régine Vignes Lebbe, Annemarie Ohler : Nomenclature is the discipline of taxonomy responsible for managing the scientific names of groups of organisms. It ensures continuity in the transmission of all kinds of data and knowledge accumulated about taxa. Zoologists use the International Code of Zoological Nomenclature (International Commission on Zoological Nomenclature 1999), currently in its fourth edition. The Code contains the rules that allow the correct understanding and application of nomenclature, e.g., how to choose between two names applying to the same taxon. Nomenclature became more complex over the centuries, as rules appeared, disappeared, or evolved to adapt to scientific and technological changes (e.g., the inclusion of digital media) (International Commission on Zoological Nomenclature 2012).By adhering to nomenclatural rules, taxonomic databases, such as the Catalogue of Life (Bánki et al. 2023), can maintain the integrity and accuracy of taxon names, preventing confusion and ambiguity. Nomenclature also facilitates the linkage and integration of data across different databases, allowing for seamless collaboration and information exchange among researchers.However, unlike its final result, which is also called a nomenclature, the discipline itself has remained relatively impervious to computerization, until now.Lognom*1 is a free web application based on algorithms that facilitate decision-making in zoological nomenclature. It is not based on a pre-existing database, but instead provides an answer based on the user input, and relies on interactive form-based queries. This software aims to help taxonomists determine whether a name or work is available, whether spelling rules have been correctly applied, and whether all the relevant rules have been respected before a new name or work is published. Lognom also allows the user to obtain the valid name between several pre-registered candidate names, including the list of synonyms and the reason for their synonymy. It also includes tools for answering various nomenclatural questions, such as determining if two different species names with the same derivation and meaning should be treated as homonyms; if a name should be treated as a nomen oblitum under Art. 23.9 of the Code; and another tool to determine a genus-series name's grammatical gender.Lognom includes most of the rules regarding availability and validity, with the exception of those needing human interpretation, usually pertaining to Latin grammar. At this point of its development, homonymy is not completely included in the web app, nor are the rules linked to the management of type-specimens (e.g., lectotypification, neotypification), outside of their use in determining the availability of a name.With enough data entered by the users, Lognom should be able to model a modification of the rules and calculate its impact on the potential availability or spelling of existing names. Other prospectives include the possibility of working simultaneously on common projects, which should lead to dynamic lists of available names, as well as automatically extracting nomenclatural data from pre-existing databases, where relevant information is disseminated. A link to attach semantic web labels to names throughout Zoonom (Saliba et al. 2021) or NOMEN (Yoder et al. 2017) is also under consideration. HTML XML PDF
      PubDate: Thu, 14 Sep 2023 16:15:06 +030
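
      The form-based rule logic described above can be pictured with a small sketch. The Python code below encodes only the two quantitative conditions of Art. 23.9.1 for reversal of precedence (nomen oblitum / nomen protectum); it is a hypothetical re-implementation for illustration, not Lognom's actual algorithm.

        from dataclasses import dataclass

        @dataclass
        class Usage:
            year: int
            author: str
            as_valid: bool  # was the name used as the presumed valid name?

        def reversal_of_precedence(senior_usages, junior_usages, current_year=2023):
            """May the junior synonym be conserved (nomen protectum) and the senior
            set aside (nomen oblitum)? Encodes only the two conditions of Art. 23.9.1."""
            # Art. 23.9.1.1: the senior name has not been used as valid after 1899
            cond1 = not any(u.as_valid and u.year > 1899 for u in senior_usages)
            # Art. 23.9.1.2: the junior name was used as presumed valid in at least
            # 25 works by at least 10 authors in the immediately preceding 50 years,
            # spanning not less than 10 years
            recent = [u for u in junior_usages
                      if u.as_valid and current_year - 50 <= u.year <= current_year]
            years = [u.year for u in recent]
            cond2 = (len(recent) >= 25
                     and len({u.author for u in recent}) >= 10
                     and bool(years) and max(years) - min(years) >= 10)
            return cond1 and cond2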
       
  • Progress with Repository-based Annotation Infrastructure for Biodiversity
           Applications

    • Abstract: Biodiversity Information Science and Standards 7: e112707
      DOI : 10.3897/biss.7.112707
      Authors : Peter Cornwell : Rapid development since the 1980s of technologies for analysing texts has led not only to widespread employment of text 'mining', but also to now-pervasive large language model artificial intelligence (AI) applications. However, building new, concise data resources from historic as well as contemporary scientific literature, resources that can be employed efficiently at scale by automation and that have long-term value for the research community, has proved more elusive. Efforts at codifying analyses, such as the Text Encoding Initiative (TEI), date from the early 1990s and were initially driven by the social sciences and humanities (SSH) and linguistics communities, and extended with multiple XML-based tagging schemes, including in biodiversity (Miller et al. 2012). In 2010, the Bio-Ontologies Special Interest Group (of the International Society for Computational Biology) presented its Annotation Ontology (AO), incorporating JavaScript Object Notation and broadening previous XML-based approaches (Ciccarese et al. 2011). From 2011, the Open Annotation Data Model (OADM) (Sanderson et al. 2013) focused on cross-domain standards with utility for Web 3.0, leading to the W3C Web Annotation Data Model (WADM) Recommendation in February 2017*1 and the potential for unifying the multiplicity of already-in-use tagging approaches. This continual evolution has made the preservation of investment in annotation methods, and in particular of the connections between annotations and their context in source literature, particularly challenging. Infrastructure that entered service during the intervening years does not yet support WADM, and has only recently started to address the parallel emergence of page imagery-based standards such as the International Image Interoperability Framework (IIIF). Notably, IIIF instruments such as Mirador-2, which has been employed widely for manual creation and editing of annotations in SSH, continue to employ the now-deprecated OADM. Although multiple efforts now address combining IIIF and TEI text coordinate systems, they are currently fundamentally incompatible. However, emerging repository technologies enable preservation of annotation investment to be accomplished comprehensively for the first time. Native IIIF support enables interactive previewing of annotations within repository graphical user interfaces, and dynamic serialisation technologies provide compatibility with existing XML-based infrastructures. Repository access controls can permit experts to trace annotation sources in original texts even if the literature is not publicly accessible, e.g., due to copyright restriction. This is of paramount importance, not only because surrounding context can be crucial to qualify formal terms that have been annotated, such as collecting country, but also because contemporary automated text mining—essential for operation at the scale of known biodiversity literature—is not 100% accurate, so manual checking of uncertainties is currently essential. On-going improvement of language analysis tools through AI integration offers significant future gains from reprocessing literature and updating annotation data resources. Nevertheless, without effective preservation of digitized literature, as well as annotations, this enrichment will not be possible—and today's investments in gathering together, as well as analysing, scientific literature will be devalued or lost. We report new functionality included in the InvenioRDM*2 Free and Open Source Software (FOSS) repository platform, which natively supports IIIF and WADM. InvenioRDM development and maintenance is funded and managed by an international consortium. From late 2023, the InvenioRDM-based ZenodoRDM update*3 will display annotations on biodiversity literature interactively. Significantly, the Biodiversity Literature Repository (BLR) is a Zenodo Community: BLR automatically notifies the Global Biodiversity Information Facility (GBIF) of new taxonomic data, which GBIF downloads and integrates into its service. Moreover, an annotation service based on the WADM-native Mirador-3 FOSS IIIF viewer has now been developed and will enter service with ZenodoRDM. This enables editing of biodiversity annotations from within the repository interface, as well as automated updating of taxonomic information products provided to other major infrastructures such as GBIF. Two aspects of this ZenodoRDM annotation service are presented: (1) dynamic transformation of (preservable) WADM annotations for consumption by contemporary IIIF-compliant applications such as Mirador-3, as well as for Plazi TreatmentBank/GBIF compatibility; and (2) authentication and task organization permitting management of groups of expert contributors performing annotation enrichment tasks directly through the ZenodoRDM graphical user interface (GUI). Workflows for editing existing biodiversity annotations, as well as origination of new annotations, need to be tailored for specific tasks—e.g., unifying geographic collecting location definitions in historic reports—via configurable dialogs for contributors and controlled vocabularies. Selectively populating workflows with annotations according to a task definition is also important, to avoid cluttering the editing GUI with non-essential information. Updated annotations are integrated into a new annotation collection upon completion of a task, before updating repository records. Current work on annotation workflows for SSH applications is also reported. The ZenodoRDM biodiversity annotation service implements a generic repository micro-service API, and the implementation of similar services for other repository software platforms is discussed.
      PubDate: Thu, 14 Sep 2023 16:10:39 +030
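
      The WADM serialization at the heart of this work is compact enough to show directly. The sketch below builds a minimal Web Annotation targeting a rectangular region of a IIIF canvas; all URIs and the annotation body are invented placeholders.

        import json

        # Minimal W3C Web Annotation Data Model (WADM) annotation targeting a
        # rectangular region of a IIIF canvas; all URIs are placeholders.
        annotation = {
            "@context": "http://www.w3.org/ns/anno.jsonld",
            "id": "https://example.org/anno/1",
            "type": "Annotation",
            "motivation": "tagging",
            "body": {
                "type": "TextualBody",
                "value": "collecting country: Brazil",
                "format": "text/plain",
            },
            "target": {
                "source": "https://example.org/iiif/page-12/canvas",
                "selector": {
                    "type": "FragmentSelector",
                    "conformsTo": "http://www.w3.org/TR/media-frags/",
                    "value": "xywh=220,310,480,60",  # x, y, width, height in pixels
                },
            },
        }

        print(json.dumps(annotation, indent=2))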
       
  • Combining Ecological and Socio-Environmental Data and Networks to Achieve
           Sustainability

    • Abstract: Biodiversity Information Science and Standards 7: e112703
      DOI : 10.3897/biss.7.112703
      Authors : Laure Berti-Equille, Rafael L. G. Raimundo : Environmental degradation in Brazil has recently been amplified by the expansion of agribusiness, livestock and mining activities, with dramatic repercussions on ecosystem functions and services. The anthropogenic degradation of landscapes has substantial impacts on indigenous peoples and small organic farmers whose lifestyles are intimately linked to diverse and functional ecosystems. Understanding how we can apply science and technology to benefit from biodiversity and promote socio-ecological transitions ensuring equitable and sustainable use of common natural resources is a critical challenge brought on by the Anthropocene. We present our approach to combining biodiversity and environmental data, supported by two funded research projects: DATAPB (Data of Paraíba), which develops tools for FAIR (Findable, Accessible, Interoperable and Reusable) data sharing for governance and educational projects, and the International Joint Laboratory IDEAL (artificial Intelligence, Data analytics, and Earth observation applied to sustAinability Lab), launched in 2023 by the French Institute for Sustainable Development (IRD, Institut de Recherche pour le Développement) and co-coordinated by the authors, with 50 researchers in 11 Brazilian and French institutions working on artificial intelligence and socio-ecological research in four Brazilian Northeast states: Paraíba, Rio Grande do Norte, Pernambuco, and Ceará (Berti-Equille and Raimundo 2023). As the keystone of these transdisciplinary projects, the concept-paradigm of socio-ecological coviability (Barrière et al. 2019) proposes that we should explore the multiple ways by which relationships between humans and nonhumans (fauna, flora, natural resources) can reach functional and persistent states. Transdisciplinary approaches to agroecological transitions are urgently needed to address questions such as: How can researchers, local communities, and policymakers co-produce participatory diagnoses that depict the coviability of a territory? How can we conserve biodiversity and ecosystem functions, promote social inclusion, value traditional knowledge, and strengthen bioeconomies at local and regional scales? How can biodiversity, social and environmental data, and networks help local communities in shaping adaptation pathways towards sustainable agroecological practices? These questions require transdisciplinary approaches and effective collaboration among environmental, social, and computer scientists, with the involvement of local stakeholders (Biggs et al. 2012). As such, our methodology relies on two approaches. First, a large-scale study of socio-ecological determinants of coviability over nine states and 1794 municipalities in Northeast Brazil combines multiple data sources from IBGE (Instituto Brasileiro de Geografia e Estatística), IPEA (Instituto de Pesquisa Econômica Aplicada), MapBiomas, Brazil Data Cube, and our partners GBIF (Global Biodiversity Information Facility), INCT Odisseia (Observatory of the dynamics of the interactions between societies and their environments), and ICMBio (Instituto Chico Mendes de Conservação da Biodiversidade) to enable the computation of proxies and indicators of biodiversity structure, ecosystem functions, and socio-economic organization at different scales; we will perform exploratory data analysis and use artificial intelligence (Rolnick et al. 2022) to identify proxies for adaptability, resilience, and vulnerabilities. Second, a multilayer network approach for modeling the interplay between socio-ecological and governance systems will be designed and tested using adaptive network modeling (Raimundo et al. 2018). Beyond multilayer networks to model socio-ecological dynamics (Keyes et al. 2021), we will incorporate the evolution of governance systems at the landscape scale and apply Latin Hypercube methods to explore the parameter space (Raimundo et al. 2014), obtaining a broad characterization of the model dynamics with insights into how the interplay of coupled adaptive systems influences socio-ecological resilience under multiple ecological and socio-economic scenarios. The overall methodology and study case scenarios will be presented.
      PubDate: Thu, 14 Sep 2023 16:05:59 +030
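
      The Latin Hypercube exploration of parameter space mentioned above can be sketched with SciPy's quasi-Monte Carlo module; the three parameters and their bounds below are invented for illustration.

        from scipy.stats import qmc

        # Three hypothetical model parameters and their bounds (illustrative only)
        lower = [0.0, 0.1, 1.0]   # e.g., dispersal rate, mutualism strength, governance lag
        upper = [1.0, 0.9, 50.0]

        sampler = qmc.LatinHypercube(d=3, seed=42)
        unit_sample = sampler.random(n=200)            # 200 space-filling points in [0, 1)^3
        params = qmc.scale(unit_sample, lower, upper)  # rescale to parameter bounds

        for dispersal, mutualism, lag in params[:3]:
            print(f"run model with dispersal={dispersal:.3f}, "
                  f"mutualism={mutualism:.3f}, lag={lag:.1f}")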
       
  • Migration of the Catalogue of Afrotropical Bees into TaxonWorks

    • Abstract: Biodiversity Information Science and Standards 7: e112702
      DOI : 10.3897/biss.7.112702
      Authors : Dmitry Dmitriev, Connal Eardley, Willem Coetzer : The Catalogue of Afrotropical Bees provides a comprehensive checklist of the species of bees known from Sub-Saharan Africa and the western Indian Ocean islands, excluding the honey bee (Apis mellifera Linnaeus) (Eardley and Urban 2010). The checklist has a detailed bibliography of the group, distribution records, and biological associations (visited flowers, host plants, plants used as nests, as well as parasitoids). The database, which was originally built in Microsoft Access and later managed using Specify Software, was recently migrated to TaxonWorks. TaxonWorks is an integrated, web-based platform designed specifically for the needs of practicing taxonomists and biodiversity scientists, and maintained by the Species File Group. TaxonWorks has a variety of tools designed to help import, manage, validate, and package data for future exports (e.g., in the Darwin Core Archive (DwC-A; GBIF 2021) or Catalogue of Life Data Package (COL-DP) formats). Although TaxonWorks has batch upload functionality (e.g., in Darwin Core Archive or BibTeX format), the complexity of the original dataset (Fig. 1) required special handling, and a custom migration was built to transfer the data from the original format. TaxonWorks can now be used to produce a paper-style catalogue or share the data via the TaxonWorks public interface, TaxonPages.
      PubDate: Thu, 14 Sep 2023 16:00:19 +030
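
      Custom migrations of this kind typically reduce to mapping source fields onto Darwin Core terms before packaging the data. A minimal sketch, assuming a hypothetical source record from the original database (the source column names are invented; the target terms are standard Darwin Core):

        import csv

        # Hypothetical source record from the original Access/Specify database
        source = {
            "genus_species": "Ceratina nigriceps",
            "country_txt": "South Africa",
            "flower_visited": "Aloe ferox",
        }

        # Map source columns onto standard Darwin Core terms
        DWC_MAP = {
            "genus_species": "scientificName",
            "country_txt": "country",
            "flower_visited": "associatedTaxa",
        }

        record = {dwc: source[src] for src, dwc in DWC_MAP.items()}
        record["basisOfRecord"] = "PreservedSpecimen"

        with open("occurrence.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=record.keys())
            writer.writeheader()
            writer.writerow(record)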
       
  • Leveraging AI in Biodiversity Informatics: Ethics, privacy, and broader
           impacts

    • Abstract: Biodiversity Information Science and Standards 7: e112701
      DOI : 10.3897/biss.7.112701
      Authors : Kristen "Kit" Lewers : Artificial Intelligence (AI) has been heralded as a hero by some and rejected as a harbinger of destruction by others. While many in the community are excited about the functionality and promise AI brings to the field of biodiversity informatics, others have reservations regarding its widespread use. This talk will specifically address Large Language Models (LLMs), highlighting both the pros and cons of their use. Like any tool, LLMs are neither good nor bad in and of themselves, but AI does need to be used properly and within the appropriate scope of its abilities. Topics to be covered include model opacity (Franzoni 2023), privacy concerns (Wu et al. 2023), potential for algorithmic harm (Marjanovic et al. 2021) and model bias (Wang et al. 2020) in the context of generative AI, along with how these topics differ from similar concerns when using traditional Machine Learning (ML) applications. Potential for implementation and training to ensure the most fair environment when leveraging AI, keeping FAIR (Findable, Accessible, Interoperable, and Reusable) principles in mind, will also be discussed. The topics covered will be mainly framed through the Biodiversity Information Standards (TDWG) community, focusing on sociotechnical aspects and implications of implementing LLMs and generative AI. Finally, this talk will explore the potential applicability of TDWG standards pertaining to a uniform prompting vocabulary when using generative AI and employing it as a tool for biodiversity informatics.
      PubDate: Thu, 14 Sep 2023 15:55:09 +030
       
  • How Reproducible are the Results Gained with the Help of Deep Learning
           Methods in Biodiversity Research?

    • Abstract: Biodiversity Information Science and Standards 7: e112698
      DOI : 10.3897/biss.7.112698
      Authors : Waqas Ahmed, Vamsi Krishna Kommineni, Birgitta Koenig-Ries, Sheeba Samuel : In recent years, deep learning methods in the biodiversity domain have gained significant attention due to their ability to handle the complexity of biological data and to make processing of large volumes of data feasible. However, these methods are not easy to interpret, and this opacity makes the resulting research and discoveries harder to trust. Reproducibility is a fundamental aspect of scientific research that enables the validation and advancement of methods and results; if results obtained with the help of deep learning methods were reproducible, this would increase their trustworthiness. In this study, we investigate the state of reproducibility of deep learning methods in biodiversity research. We propose a pipeline to investigate the reproducibility of deep learning methods in the biodiversity domain. In our preliminary work, we systematically mined the existing literature from Google Scholar to identify publications that employ deep learning techniques for biodiversity research. By carefully curating a dataset of relevant publications, we manually extracted reproducibility-related variables, such as the availability of datasets and code, for 61 publications; these variables serve as fundamental criteria for reproducibility assessment. Moreover, we extended our analysis to include advanced reproducibility variables, such as the specific deep learning methods, models, and hyperparameters employed in the studies. To facilitate the automatic extraction of information from publications, we plan to leverage the capabilities of large language models (LLMs). By using the latest natural language processing (NLP) techniques, we aim to identify and extract relevant information pertaining to the reproducibility of deep learning methods in the biodiversity domain. This study seeks to contribute to the establishment of robust and reliable research practices. The findings will not only aid in validating existing methods but also guide the development of future approaches, ultimately fostering transparency and trust in the application of deep learning techniques in biodiversity research.
      PubDate: Thu, 14 Sep 2023 15:50:23 +030
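
      The basic "availability of datasets and code" variables can be approximated with simple pattern matching before any LLM is involved. A minimal heuristic screen, with cue phrases chosen purely for illustration:

        import re

        # Illustrative cue patterns for code and data availability statements
        CODE_CUES = re.compile(
            r"github\.com|gitlab\.com|zenodo\.org|code (?:is|are) available",
            re.IGNORECASE)
        DATA_CUES = re.compile(
            r"dryad|figshare|zenodo\.org|data availability|gbif\.org",
            re.IGNORECASE)

        def screen_publication(fulltext: str) -> dict:
            """Rough reproducibility screen of one publication's full text."""
            return {
                "code_available": bool(CODE_CUES.search(fulltext)),
                "data_available": bool(DATA_CUES.search(fulltext)),
            }

        print(screen_publication(
            "All scripts are on github.com/example/repo; occurrences were "
            "downloaded from GBIF.org."
        ))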
       
  • A Simple Recipe for Cooking your AI-assisted Dish to Serve it in the
           International Digital Specimen Architecture

    • Abstract: Biodiversity Information Science and Standards 7: e112678
      DOI : 10.3897/biss.7.112678
      Authors : Wouter Addink, Sam Leeflang, Sharif Islam : With the rise of Artificial Intelligence (AI), a large set of new tools and services is emerging that supports specimen data mapping, standards alignment, quality enhancement and enrichment of the data. These tools currently operate in isolation, targeted to individual collections, collection management systems and institutional datasets. To address this challenge, DiSSCo, the Distributed System of Scientific Collections, is developing a new infrastructure for digital specimens, transforming them into actionable information objects. This infrastructure incorporates a framework for annotation and curation that allows the objects to be enriched or enhanced by both experts and machines. This creates the unique possibility to plug in AI-assisted services that can then leverage digital specimens through this infrastructure, which serves as a harmonised Findable, Accessible, Interoperable and Reusable (FAIR) abstraction layer on top of individual institutional systems or datasets. An early example of such services are those developed in the Specimen Data Refinery workflow (Hardisty et al. 2022). The new architecture, the Digital Specimen Architecture (DS Arch), is built on the concept of FAIR Digital Objects (FDO) (Islam et al. 2020). All digital specimens and related objects are served with persistent identifiers and machine-readable FDO records carrying information for machines about the object, together with a pointer to its machine-readable type description. The type describes the structure of the object and its attributes, and defines the allowed operations. The digital specimen type and specimen media type are based on existing Biodiversity Information Standards (TDWG) such as Darwin Core, the Access to Biological Collection Data (ABCD) Schema and the Audiovisual Core Multimedia Resources Metadata Schema, and include support for annotation operations based on the World Wide Web Consortium (W3C) Web Annotation Data Model. This enables AI-assisted services registered with DS Arch to autonomously discover digital specimen objects and determine the actions they are authorised to perform. AI-assisted services can facilitate various tasks such as digitisation, extracting new information from specimen images, creating relations with other objects, or standardising data. These operations can be done autonomously, upon user request, or in tandem with expert validation. AI-assisted services registered with DS Arch can interact in the same way with all digital specimens worldwide when served through DS Arch with their uniform FDO representation, even if the content richness, level of standardisation and scope of the specimens differ. DS Arch has been designed to serve digital specimens for living and preserved specimens, and for preserved environmental, earth system and astrogeology samples. With the AI-assisted services, data can be annotated with new data, alternative values, corrections, and new entity relationships. As a result, the digital specimens become Digital Extended Specimens, enabling new science and applications (Webster et al. 2021). With the implementation of a sophisticated trust model in DS Arch for community acceptance, these annotations will become part of the data itself and can be made available for inclusion in source systems such as collection management systems and in aggregators such as the Global Biodiversity Information Facility (GBIF), the Geoscience Collections Access Service (GeoCASe) and the Catalogue of Life. We aim to demonstrate in the session how AI-assisted services can be registered and used to annotate specimen data. Although DS Arch is still in development and planned to become operational in 2025, we already have a sandbox environment available in which the concept can be tested and AI-assisted services can be piloted to act on digital specimen data. For testing purposes, operations are currently limited to individual specimens and open data; batch operations will, however, be possible in the future production environment.
      PubDate: Thu, 14 Sep 2023 15:45:12 +030
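
      An annotation round trip against such an infrastructure could look roughly like the sketch below, which POSTs a W3C-style annotation proposing a corrected value for one field of a digital specimen. The endpoint, route, token and the 'ods:locality' field name are invented placeholders, not the documented DS Arch API.

        import requests

        # Hypothetical sandbox endpoint and token -- placeholders only
        DS_ARCH = "https://sandbox.dissco.example/api/v1"
        TOKEN = "example-token"

        annotation = {
            "type": "Annotation",
            "motivation": "editing",
            "target": {
                "id": "https://hdl.handle.net/20.5000.1025/ABC-123",  # example specimen PID
                "selector": {"field": "ods:locality"},
            },
            "body": {"value": "Tafelberg, Western Cape"},
        }

        resp = requests.post(f"{DS_ARCH}/annotations", json=annotation,
                             headers={"Authorization": f"Bearer {TOKEN}"},
                             timeout=30)
        resp.raise_for_status()
        print(resp.json())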
       
  • Application of Fuzzy Measures to Move Towards Cyber-Taxonomy

    • Abstract: Biodiversity Information Science and Standards 7: e112677
      DOI : 10.3897/biss.7.112677
      Authors : Richardson Ciguene, Aurélien Miralles, Francis Clément : The species inventory of global biodiversity is constantly revised and refined by taxonomic research, through the addition of newly discovered species and the reclassification of known species. This almost three-century-old project provides essential knowledge for humankind; in particular, knowledge of biodiversity establishes a foundation for developing appropriate conservation strategies. An accurate global inventory of species relies on the study of millions of specimens housed all around the world in natural history collections. For the last two decades, biological taxonomy has generated an increasing amount of data every year and, notably through the digitization of collection specimens, has gradually been transformed into a big data science. In recognition of this trend, the French National Museum of Natural History has embarked on a major research and engineering challenge within its information system: the adoption of cyber-taxonomic practices that require easy access to data on specimens housed in natural history collections all over the world. To this end, an important step is to automatically complete and reconcile the heterogeneous classification data usually associated with specimens managed in different collection databases. We describe here a new fuzzy approach to reconciling the classifications in multiple databases, enabling more accurate taxonomic retrieval of specimen data across databases.
      PubDate: Thu, 14 Sep 2023 15:40:35 +030
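
      The paper's fuzzy measures are not reproduced here, but the general idea of fuzzy reconciliation can be sketched with a normalized string-similarity score from the Python standard library, applied rank by rank to two classification paths; the threshold and example names are illustrative.

        from difflib import SequenceMatcher

        def similarity(a: str, b: str) -> float:
            """Normalized similarity in [0, 1] between two classification strings."""
            return SequenceMatcher(None, a.lower(), b.lower()).ratio()

        def reconcile(path_a, path_b, threshold=0.85):
            """Fuzzy-match two higher-classification paths rank by rank."""
            scores = [similarity(x, y) for x, y in zip(path_a, path_b)]
            overall = sum(scores) / len(scores)
            return overall >= threshold, overall

        # Two databases spelling the same classification slightly differently
        a = ["Squamata", "Gekkonidae", "Hemidactylus"]
        b = ["Squamata", "Geckonidae", "Hemidactylus"]  # variant spelling
        print(reconcile(a, b))  # -> (True, ~0.97)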
       
  • Mapping across Standards to Calculate the MIDS Level of Digitisation of
           Natural Science Collections

    • Abstract: Biodiversity Information Science and Standards 7: e112672
      DOI : 10.3897/biss.7.112672
      Authors : Elspeth Haston, Mathias Dillen, Sam Leeflang, Wouter Addink, Claus Weiland, Dagmar Triebel, Eirik Rindal, Anke Penzlin, Rachel Walcott, Josh Humphries, Caitlin Chapman : The Minimum Information about a Digital Specimen (MIDS) standard is being developed within Biodiversity Information Standards (TDWG) to provide a framework for organisations, communities and infrastructures to define, measure, monitor and prioritise the digitisation of specimen data to achieve increased accessibility and scientific use. MIDS levels indicate different degrees of completeness in digitisation, ranging from Level 0 (not yet meeting the minimal information requirements for scientific use) to Level 3 (fulfilling the requirements for Digital Extended Specimens (Hardisty et al. 2022) by including persistent identifiers (PIDs) that connect the specimen with derived and related data). MIDS Levels 0–2 are generic for all specimens; from MIDS Level 2 onwards, a distinction is made between biological, geological and palaeontological specimens. While MIDS represents a minimum specification, defining and publishing more extensive sets of information elements (extensions) is readily feasible and explicitly recommended. The MIDS level of a digital specimen can be calculated based on the availability of certain information elements. The MIDS standard applies to published data, so the ability to map from, to and between TDWG standards is key to measuring the MIDS level of digitised specimens. Each MIDS term is being mapped across TDWG standards including Darwin Core (DwC), the Access to Biological Collections Data (ABCD) Schema and Latimer Core (LtC; Woodburn et al. 2022), using mapping properties provided by the Simple Knowledge Organization System (SKOS) ontology. In this presentation, we will show selected case studies that demonstrate the implementation of the MIDS standard, supplemented by MIDS mappings to ABCD, to LtC, and to the Distributed System of Scientific Collections' (DiSSCo) Open Digital Specimen specification. The studies show the mapping exercise in practice, with the aim of enabling fully automated and accurate calculations. To provide a reliable indicator of the level of digitisation completeness, it is important that calculations are done consistently across all implementations.
      PubDate: Thu, 14 Sep 2023 15:35:45 +030
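
      Once each MIDS term is mapped to a source field, the level calculation itself is a containment check. A minimal sketch, with illustrative (not normative) element sets per level:

        # Illustrative (not normative) MIDS information elements per level
        MIDS_LEVELS = {
            0: {"physicalSpecimenId"},
            1: {"physicalSpecimenId", "name", "institution"},
            2: {"physicalSpecimenId", "name", "institution", "collectingDate",
                "country"},
            3: {"physicalSpecimenId", "name", "institution", "collectingDate",
                "country", "relatedResourcePid"},
        }

        def mids_level(record: dict) -> int:
            """Highest MIDS level whose required elements all have non-empty values."""
            present = {k for k, v in record.items() if v not in (None, "")}
            achieved = -1
            for level in sorted(MIDS_LEVELS):
                if MIDS_LEVELS[level] <= present:
                    achieved = level
                else:
                    break
            return achieved

        print(mids_level({"physicalSpecimenId": "B-123",
                          "name": "Erica abietina",
                          "institution": "E"}))  # -> 1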
       
  • Leveraging Multimodality for Biodiversity Data: Exploring joint
           representations of species descriptions and specimen images using CLIP

    • Abstract: Biodiversity Information Science and Standards 7: e112666
      DOI : 10.3897/biss.7.112666
      Authors : Maya Sahraoui, Youcef Sklab, Marc Pignal, Régine Vignes Lebbe, Vincent Guigue : In recent years, the field of biodiversity data analysis has witnessed significant advancements, with a number of models emerging to process and extract valuable insights from various data sources. One notable area of progress lies in the analysis of species descriptions, where structured knowledge extraction techniques have gained prominence. These techniques aim to automatically extract relevant information from unstructured text, such as taxonomic classifications and morphological traits (Sahraoui et al. 2022, Sahraoui et al. 2023). By applying natural language processing (NLP) and machine learning methods, structured knowledge extraction enables the conversion of textual species descriptions into a structured format, facilitating easier integration, searchability, and analysis of biodiversity data. Furthermore, object detection on specimen images has emerged as a powerful tool in biodiversity research. By leveraging computer vision algorithms (Triki et al. 2020, Triki et al. 2021, Ott et al. 2020), researchers can automatically identify and classify objects of interest within specimen images, such as organs, anatomical features, or specific taxa. Object detection techniques allow for the efficient and accurate extraction of valuable information, contributing to tasks like species identification, morphological trait analysis, and biodiversity monitoring. These advancements have been particularly significant in the context of herbarium collections and digitization efforts, where large volumes of specimen images need to be processed and analyzed. On the other hand, multimodal learning, an emerging field in artificial intelligence (AI), focuses on developing models that can effectively process and learn from multiple modalities, such as text and images (Li et al. 2020, Li et al. 2021, Li et al. 2019, Radford et al. 2021, Sun et al. 2021, Chen et al. 2022). By incorporating information from different modalities, multimodal learning aims to capture the rich and complementary characteristics present in diverse data sources. This approach enables a model to leverage the strengths of each modality, leading to enhanced understanding, improved performance, and more comprehensive representations. Structured knowledge extraction from species descriptions and object detection on specimen images thus synergistically enhance biodiversity data analysis: the integration leverages the strengths of textual and visual data to gain deeper insights. Structured information extracted from descriptions improves the search, classification, and correlation of biodiversity data, while object detection enriches textual descriptions, providing visual evidence for the verification and validation of species characteristics. To tackle the challenges posed by the massive volume of specimen images available at the Herbarium of the National Museum of Natural History in Paris, we have chosen to implement the CLIP (Contrastive Language-Image Pretraining) model (Radford et al. 2021) developed by OpenAI. CLIP utilizes a contrastive learning framework to learn joint representations of text and images. The model is trained on a large-scale dataset consisting of text-image pairs from the internet, enabling it to understand the semantic relationships between textual descriptions and visual content. Fine-tuning the CLIP model on our dataset of species descriptions and specimen images is crucial for adapting it to our domain. By exposing the model to our data, we enhance its ability to understand and represent biodiversity characteristics. This involves training the model on our labeled dataset, allowing it to refine its knowledge and adapt to biodiversity patterns. Using the fine-tuned CLIP model, we aim to develop an efficient search engine for the Herbarium's vast collection. Users can query the engine with morphological keywords, and it will match textual descriptions with specimen images to provide relevant results. This research aligns with the current AI trajectory for biodiversity data, paving the way for innovative approaches to address the conservation and understanding of our planet's biodiversity.
      PubDate: Thu, 14 Sep 2023 15:31:00 +030
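
      Even before fine-tuning, the off-the-shelf CLIP checkpoint can score description-image pairs. The sketch below uses the public Hugging Face transformers implementation of the OpenAI checkpoint; the image path and the candidate descriptions are placeholders.

        from PIL import Image
        import torch
        from transformers import CLIPModel, CLIPProcessor

        model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
        processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

        image = Image.open("specimen_sheet.jpg")  # placeholder path
        descriptions = [                          # invented candidate descriptions
            "leaves opposite, lanceolate, margins entire",
            "leaves alternate, pinnately compound",
        ]

        inputs = processor(text=descriptions, images=image,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            logits = model(**inputs).logits_per_image  # image-to-text similarity
        for text, p in zip(descriptions, logits.softmax(dim=1)[0].tolist()):
            print(f"{p:.3f}  {text}")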
       
  • Amplifying the Power of eDNA by Making it FAIR

    • Abstract: Biodiversity Information Science and Standards 7: e112553
      DOI : 10.3897/biss.7.112553
      Authors : Miwa Takahashi, Oliver Berry : Environmental DNA (eDNA) is a fast-growing biomonitoring approach for detecting species and mapping their distributions, with the number of eDNA publications increasing exponentially in the past decade. While millions of DNA sequences are often generated and assigned to taxa in each publication, these records are stored in numerous locations (e.g., supplementary materials on journals’ servers, open data publishing platforms such as Dryad) and in various formats, which makes it difficult to find, access, re-use and integrate datasets. Making eDNA data FAIR (findable, accessible, interoperable, re-usable) has vast potential to improve how the biological environment is measured and how change is detected and understood. For instance, it would allow biomonitoring and species distribution modelling studies across extended spatial and temporal scales, which is logistically difficult or impossible for individual projects. It would also shed light on “dark” (unassigned) DNA sequences by facilitating their storage and re-analysis against ever-growing DNA reference databases. Several challenges are associated with making eDNA FAIR, including standardising data formats and bioinformatics workflows, and simplifying post-publication data archiving so that eDNA practitioners find it acceptable to adopt. Over the next three years, we plan to work closely with biodiversity data platforms such as the Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA), eDNA science journals, and eDNA practitioners to solve these challenges and enable eDNA to achieve its revolutionary potential as a unified source of information that supports environmental management.
      PubDate: Wed, 13 Sep 2023 09:01:34 +030
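
      One concrete step towards the standardised formats discussed above is expressing a single eDNA detection as a Darwin Core occurrence row plus a row in GBIF's DNA-derived-data extension. A minimal sketch with invented values; the extension term names used here should be verified against the published schema.

        import csv

        occurrence = {
            "occurrenceID": "urn:example:edna:run42:asv7",  # invented identifier
            "basisOfRecord": "MaterialSample",
            "scientificName": "Galaxias maculatus",
            "eventDate": "2023-02-11",
            "decimalLatitude": -42.88,
            "decimalLongitude": 147.33,
            "organismQuantity": 1832,
            "organismQuantityType": "DNA sequence reads",
        }
        dna = {
            "occurrenceID": occurrence["occurrenceID"],  # join key to the core row
            "target_gene": "12S rRNA",
            "DNA_sequence": "ACCTGGTATTGGTATAGTGGGTACT",  # invented sequence
        }

        for name, row in [("occurrence.txt", occurrence),
                          ("dna_derived_data.txt", dna)]:
            with open(name, "w", newline="") as f:
                w = csv.DictWriter(f, fieldnames=row.keys(), delimiter="\t")
                w.writeheader()
                w.writerow(row)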
       
  • Implementing the CARE Principles for Indigenous Data Governance in
           Biodiversity Data Management

    • Abstract: Biodiversity Information Science and Standards 7: e112615
      DOI : 10.3897/biss.7.112615
      Authors : Riley Taitingfong, Stephanie Carroll : Indigenous data governance is a critical aspect of upholding Indigenous rights and fostering equitable partnerships in biodiversity research and data management. An estimated 80% of the planet’s biodiversity exists on Indigenous lands (Sobrevila 2008), and the majority of Indigenous data derived from specimens taken from Indigenous lands are held by non-Indigenous entities and institutions. The CARE Principles (Collective benefit, Authority to control, Responsibility, and Ethics) are designed to guide the inclusion of Indigenous peoples in data governance, and to increase their access to and benefit from data (Carroll et al. 2020). This talk will share emerging tools and resources that can be leveraged to implement the CARE Principles within repositories and institutions that hold Indigenous data. It highlights two primary tools to promote Indigenous data governance in repositories: (1) a phased framework to guide third-party holders of Indigenous data through foundational learning and concrete steps to apply the CARE Principles in their respective settings; and (2) the CARE criteria, an assessment tool with which researchers and institutions can evaluate the maturity of CARE implementation and identify areas for improvement, and which allows other entities such as funders and publishers to evaluate CARE compliance.
      PubDate: Tue, 12 Sep 2023 10:41:17 +030
       
  • Recognising Indigenous Provenance in Biodiversity Records

    • Abstract: Biodiversity Information Science and Standards 7: e112610
      DOI : 10.3897/biss.7.112610
      Authors : Maui Hudson, Jane Anderson, Riley Taitingfong, Andrew Martinez, Stephanie Carroll : The advent of data-driven technologies and the increasing demand for data have brought about unique challenges for Indigenous data governance. The CARE Principles emphasize Collective Benefit, Authority to Control, Responsibility, and Ethics as essential pillars for ensuring that Indigenous data rights are upheld, Indigenous knowledge is protected, and Indigenous Peoples are active participants in data governance processes (Carroll et al. 2020, Carroll et al. 2021). Identifying tangible activities and providing guidance to centre Indigenous perspectives provide a comprehensive approach to addressing the complexities of Indigenous data governance in a rapidly evolving data landscape (Gupta et al. 2023, Jennings et al. 2023, Sterner and Elliott 2023). Biodiversity research has increasingly recognized the intertwined relationship between biological diversity and cultural practices, leading to discussions about how research can strengthen the evidence base, build trust, enhance legitimacy for decision making (Alexander et al. 2021) and explore requirements for Indigenous metadata (Jennings et al. 2023). An Indigenous Metadata Bundle Communiqué, produced following an Indigenous Metadata Symposium, recommended the initial categories of Governance, Provenance, Lands & Waters, Protocols, and Local Contexts Notices & Labels. Traditional Knowledge (TK) and Biocultural (BC) Labels have emerged as essential tools for recognising and maintaining Indigenous provenance, protocols and permissions in records for both natural ecosystems and cultural heritage (Anderson et al. 2020, Liggins et al. 2021), emphasizing the importance of Indigenous Peoples and local knowledge systems in research and digital management. Biocultural Labels acknowledge the intricate links between biodiversity and cultural diversity, emphasizing the role of Indigenous communities in preserving biodiversity through their traditional practices (Hudson et al. 2021). By recognizing the intrinsic value of these relationships, TK and BC Labels not only contribute to a more holistic understanding of biodiversity but also promote ethical considerations and mutual respect between researchers and local communities, fostering collaborative partnerships for research and conservation initiatives (McCartney et al. 2023). Addressing the CARE Principles for Indigenous Data Governance in biodiversity research introduces several challenges and opportunities. Ethical concerns regarding the recognition of Indigenous rights and interests in data (Hudson et al. 2023), intellectual property rights, cultural appropriation, and equitable benefit sharing must be navigated sensitively (Carroll et al. 2022b, Golan et al. 2022). Moreover, fostering effective communication between researchers and communities is paramount for ensuring the accuracy and authenticity of Indigenous metadata and of protocols for appropriate use (Carroll et al. 2022a). These challenges are offset, however, by the potential for enriching scientific knowledge, enhancing policy frameworks, and strengthening community-based conservation efforts.
      PubDate: Tue, 12 Sep 2023 10:31:26 +030
       
  • Community Curation of Nomenclatural and Taxonomic Information in the
           Context of the Collection Management System JACQ

    • Abstract: Biodiversity Information Science and Standards 7: e112571
      DOI : 10.3897/biss.7.112571
      Authors : Heimo Rainer, Andreas Berger, Tanja Schuster, Johannes Walter, Dieter Reich, Kurt Zernig, Jiří Danihelka, Hana Galušková, Patrik Mráz, Natalia Tkach, Jörn Hentschel, Jochen Müller, Sarah Wagner, Walter Berendsohn, Robert Lücking, Robert Vogt, Lia Pignotti, Francesco Roma-Marzio, Lorenzo Peruzzi : Nomenclatural and taxonomic information is crucial for curating botanical collections. As methods for systematic and taxonomic studies evolved, classification systems changed considerably over time (Dalla Torre and Harms 1900, Durand and Bentham 1888, Endlicher 1836, Angiosperm Phylogeny Group et al. 2016). Various approaches to storing preserved material have been implemented, most of them based on scientific names (e.g., families, genera, species), often in combination with other criteria such as geographic provenance or collectors. The collection management system JACQ was established in the early 2000s and subsequently developed to support multiple institutions. It features centralised data storage (with mirror sites) and access via the Internet; participating collections can download their data at any time in comma-separated values (CSV) format. From the beginning, JACQ was conceived as a collaboration platform for objects housed in botanical collections, i.e., plant, fungal and algal groups. For these groups, various sources of taxonomic reference exist, and nowadays online resources are preferred, e.g., Catalogue of Life, AlgaeBase, Index Fungorum, MycoBank, Tropicos, Plants of the World Online, International Plant Names Index (IPNI), World Flora Online, Euro+Med, Anthos, Flora of North America, REFLORA, Flora of China, Flora of Cuba, and the Australian Virtual Herbarium (AVH). Implementation and (re)use of PIDs: Persistent identifiers (PIDs) for names (at any taxonomic rank), as distinct from PIDs for taxa, are essential to allow and support reliable referencing across institutions and thematic research networks (Agosti et al. 2022). For this purpose, we have integrated referencing to several of the above-mentioned resources and populate the names used inside JACQ with those external PIDs. For example, Salix rosmarinifolia is accepted in Plants of the World Online, while Euro+Med PlantBase considers it a synonym of Salix repens subsp. rosmarinifolia; either one can be the identification of a specimen in the JACQ database. Retrieval of collection material: One strong use case is the curation of material in historic collections. Because outdated taxon concepts were applied to the material in the past, "old" synonyms are omnipresent in historical collections, and in order to retrieve all material of a given taxon it is necessary to know all relevant names. Future outlook: In combination with the capabilities of Linked Data and IIIF (International Image Interoperability Framework) technology, these PIDs serve as crucial elements for the integration of decentralized information systems and the reuse of (global) taxonomic backbones in combination with collection management systems (Gamer and Kreyenbühl 2022, Hyam 2022, Loh 2017).
      PubDate: Tue, 12 Sep 2023 10:20:35 +030
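
      Populating names with external PIDs reduces to calling a name-resolution service and storing the returned identifier. The sketch below uses GBIF's public match API as an illustrative stand-in for the resources named above (POWO, Euro+Med, IPNI, and others expose comparable services).

        import requests

        # Resolve a name string to a stable identifier in an external backbone.
        resp = requests.get("https://api.gbif.org/v1/species/match",
                            params={"name": "Salix rosmarinifolia"}, timeout=30)
        resp.raise_for_status()
        match = resp.json()
        # usageKey is the identifier one would store alongside the JACQ name
        print(match.get("scientificName"), match.get("usageKey"),
              match.get("status"))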
       
  • The Role of the CLIP Model in Analysing Herbarium Specimen Images

    • Abstract: Biodiversity Information Science and Standards 7: e112566
      DOI : 10.3897/biss.7.112566
      Authors : Vamsi Krishna Kommineni, Jens Kattge, Jitendra Gaikwad, Susanne Tautenhahn, Birgitta Koenig-Ries : The number of openly accessible digital plant specimen images is growing tremendously and is available through data aggregators: the Global Biodiversity Information Facility (GBIF) contains 43.2 million images and Integrated Digitized Biocollections (iDigBio) contains 32.4 million images (accessed 29 June 2023). All these images hold rich ecological (morphological, phenological, taxonomic, etc.) information, which has the potential to facilitate large-scale analyses. However, extracting this information from the images and making it available to analysis tools remains challenging and requires advanced computer vision algorithms. With the latest advancements in the natural language processing field, it is becoming possible to analyse images with text prompts. For example, with the Contrastive Language-Image Pre-Training (CLIP) model, which was trained on 400 million image-text pairs, it is feasible to classify everyday images by providing different text prompts and an image as input; the model then predicts the most suitable text prompt for the input image. We explored the feasibility of using the CLIP model to analyse digital plant specimen images. A particular focus of this study was the generation of appropriate text prompts, which is important because the prompt has a large influence on the results of the model. We experimented with three different methods: a) an automatic text prompt based on metadata of the specific image or other datasets, b) an automatic generic text prompt describing what is in the image, and c) a manual text prompt produced by annotating the image. We investigated the suitability of these prompts with an experiment in which we tested whether the CLIP model could recognize a herbarium specimen image, using digital plant specimen images and semantically disparate text prompts. Our ultimate goal is to filter digital plant specimen images based on the availability of intact leaves and a measurement scale, in order to reduce the number of specimens that reach the downstream pipeline, for instance the segmentation task in the leaf trait extraction process. To achieve this goal, we are fine-tuning the CLIP model with a dataset of around 20,000 digital plant specimen image-text prompt pairs, where the text prompts were generated using different datasets, metadata and the generic text prompt method. Since the text prompts can be created automatically, it is possible to eliminate the laborious manual annotation process. In conclusion, we present our experimental testing of the CLIP model on digital plant specimen images under varied settings and show how the CLIP model can act as a potential filtering tool. In future, we plan to investigate the possibility of using text prompts for instance segmentation to extract leaf trait information using Large Language Models (LLMs).
      PubDate: Tue, 12 Sep 2023 10:18:01 +030
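
      The intended filtering step can be pictured as zero-shot scoring of a sheet against a pair of prompts. The prompts and threshold below are invented for illustration, and the calls mirror the public transformers API rather than the authors' exact setup.

        from PIL import Image
        import torch
        from transformers import CLIPModel, CLIPProcessor

        model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
        processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

        # Invented prompts and threshold; both would be tuned on labelled sheets
        prompts = [
            "a herbarium sheet with intact leaves and a measurement scale",
            "a herbarium sheet without intact leaves",
        ]

        def keep_for_segmentation(path: str, threshold: float = 0.6) -> bool:
            """True if the 'intact leaves and scale' prompt wins clearly."""
            inputs = processor(text=prompts, images=Image.open(path),
                               return_tensors="pt", padding=True)
            with torch.no_grad():
                probs = model(**inputs).logits_per_image.softmax(dim=1)[0]
            return probs[0].item() >= threshold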
       
  • Next Steps Towards Better Living Atlas Deployments and Maintenance

    • Abstract: Biodiversity Information Science and Standards 7: e112560
      DOI : 10.3897/biss.7.112560
      Authors : David Martin, Vicente Ruiz Jurado : The Living Atlases project, facilitated by the Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA), has operated successfully for more than a decade, establishing collaborations with over 30 countries that have implemented ALA components within their respective environments. Over this period, technological advancements and the prevalence of cloud platforms have transformed the landscape of infrastructure management. In this presentation, we will explore innovative approaches to streamlining the installation of ALA, capitalizing on the benefits offered by cloud platforms and cutting-edge technologies. Furthermore, ALA has maintained a strong collaborative partnership with GBIF over the past four years, focusing on data ingestion pipelines and, more recently, engaging in shared user interface (UI) development. These improvements aim to enhance the maintainability of ALA modules, enabling organizations to leverage the advantages provided by cloud-based solutions and novel technologies.
      PubDate: Tue, 12 Sep 2023 10:10:24 +030
       
  • Synergizing Digital, Biological, and Participatory Sciences for Global
           Plant Species Identification: Enabling access to a worldwide
           identification service

    • Abstract: Biodiversity Information Science and Standards 7: e112545
      DOI : 10.3897/biss.7.112545
      Authors : Pierre Bonnet, Antoine Affouard, Jean-Christophe Lombardo, Mathias Chouet, Hugo Gresse, Vanessa Hequet, Remi Palard, Maxime Fromholtz, Vincent Espitalier, Hervé Goëau, Benjamin Deneu, Christophe Botella, Joaquim Estopinan, César Leblanc, Maximilien Servajean, François Munoz, Alexis Joly : Human activities have a growing impact on global biodiversity. While our understanding of biodiversity worldwide is not yet comprehensive, it is crucial to explore effective means of characterizing it in order to mitigate these impacts. Advancements in data storage and exchange capabilities, together with the increasing availability of extensive taxonomic, ecological, and environmental databases, offer possibilities for implementing new approaches that can address knowledge gaps regarding species and habitats. This enhanced knowledge will, in turn, facilitate improved management practices and enable better local governance of territories. Meeting these requirements necessitates the development of innovative tools and methods. Citizen science platforms have emerged as valuable resources for generating large amounts of biodiversity data, thanks to their visibility and attractiveness to individuals involved in territorial management and education. These platforms present new opportunities to train deep learning models for automated species recognition, leveraging the substantial volumes of multimedia data they accumulate. However, effectively managing, curating, and disseminating the data and services generated by these platforms remains a significant challenge that hinders the achievement of their objectives. In line with this, the GUARDEN and MAMBO European projects aim to use the Pl@ntNet participatory science platform (Affouard et al. 2021) to develop and implement novel computational services enabling the widespread creation of floristic inventories. In pursuit of this goal, various standards and reference datasets have been employed, such as the POWO (Plants of the World Online) world checklist and the WGSRPD (World Geographical Scheme for Recording Plant Distributions) standard, to establish the foundation for a global service that aids plant identification through visual analysis. This service relies on the ArangoDB NoSQL (Not Only Structured Query Language) data management system, utilizes state-of-the-art automated visual classification models (vision transformers), and operates on a distributed IT (Information Technology) infrastructure that leverages the capabilities of collaborating stakeholders interested in supporting this initiative. Global-scale automated workflows have been established specifically for the collection, analysis, and dissemination of illustrated occurrences of plant species. These workflows now enable the development of new IT tools that facilitate the description and monitoring of species and habitat conservation statuses. A comprehensive presentation of the advancements achieved will be provided, to share the lessons learned during development and to ensure the widespread adoption of this service within the scientific community.
      PubDate: Tue, 12 Sep 2023 10:04:09 +030
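
      A minimal sketch of the ArangoDB layer described above, using the python-arango driver; the database, collection and credential names and the document fields are invented placeholders.

        from arango import ArangoClient

        # Names and credentials below are invented placeholders
        client = ArangoClient(hosts="http://localhost:8529")
        db = client.db("plantnet", username="reader", password="secret")

        # AQL: illustrated occurrences of one species in a WGSRPD level-2 region
        query = """
        FOR occ IN occurrences
            FILTER occ.species == @species AND occ.wgsrpd == @region
            LIMIT 10
            RETURN { image: occ.image_url, date: occ.observed_on }
        """
        cursor = db.aql.execute(query, bind_vars={"species": "Quercus ilex",
                                                  "region": "12"})  # 12 = Southwestern Europe
        for doc in cursor:
            print(doc)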
       
  • Demonstration of Taxonomic Name Data Services through ChecklistBank

    • Abstract: Biodiversity Information Science and Standards 7: e112544
      DOI : 10.3897/biss.7.112544
      Authors : Olaf Bánki, Markus Döring, Thomas Jeppesen, Donald Hobern : ChecklistBank is a publishing platform and open data repository focused on taxonomic and nomenclatural datasets. It was launched at the end of 2020 and is a joint development of the Catalogue of Life (COL) and the Global Biodiversity Information Facility (GBIF). Close to 50,000 datasets, mostly originating from published literature mediated through Plazi's TreatmentBank, Pensoft Publishers and the European Journal of Taxonomy, are openly accessible through ChecklistBank. Datasets also include sources with (molecular) operational taxonomic units, such as those from UNITE / PlutoF, the National Center for Biotechnology Information Taxonomy / European Nucleotide Archive, and the International Barcode of Life / BOLD. In addition to various taxonomic datasets (also from regional and national levels, e.g., shared through GBIF) and nomenclatural datasets (e.g., ZooBank, International Plant Names Index), ChecklistBank links out to the websites of the original initiatives (e.g., World Register of Marine Species, Integrated Taxonomic Information System, COL China, Species Files). ChecklistBank also holds all the tooling needed to assemble the COL Checklist, the authoritative global species list of all described organisms. The COL Checklist 2023 (Bánki et al. 2023), containing more than 2.1 million accepted species, is assembled from 164 global taxonomic data sources mediated through ChecklistBank. The COL Checklist contains name usage identifiers, and each checklist version and its underpinning data sources are issued with digital object identifiers. After the launch of ChecklistBank, the EU-funded Biodiversity Community Integrated Knowledge Library (BiCIKL) project contributed additional improvements to ChecklistBank. These added functionalities include, among others, name usage search, name matching, and taxonomic data comparison. The tooling used to assemble the COL Checklist has been generalised through a ChecklistBank 'project functionality' supporting the assembly of a species list. During the demonstration, several of the functionalities developed in the context of the EU BiCIKL project will be highlighted.
      PubDate: Tue, 12 Sep 2023 09:35:05 +030
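
      A name usage search of the kind demonstrated can be scripted against the public ChecklistBank API. The base URL, the "3LR" alias for the latest COL release, and the response shape assumed below should be checked against the API documentation.

        import requests

        BASE = "https://api.checklistbank.org"  # assumed public API base
        resp = requests.get(f"{BASE}/dataset/3LR/nameusage/search",
                            params={"q": "Puma concolor", "limit": 3}, timeout=30)
        resp.raise_for_status()
        for hit in resp.json().get("result", []):
            usage = hit.get("usage", hit)  # tolerate nested or flat result items
            print(usage.get("id"), usage.get("label"), usage.get("status"))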
       
  • Extracting Reproductive Condition and Habitat Information
           from Text Using a Transformer-based Information Extraction Pipeline

    • Abstract: Biodiversity Information Science and Standards 7: e112505
      DOI : 10.3897/biss.7.112505
      Authors : Roselyn Gabud, Nelson Pampolina, Vladimir Mariano, Riza Batista-Navarro : Understanding the biology underpinning the natural regeneration of plant species, in order to make plans for effective reforestation, is a complex task. It can be aided by access to databases that contain long-term and wide-scale geographical information on species distribution, habitat, and reproduction. Although widely used biodiversity databases exist that contain structured information on species and their occurrences, such as the Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA), the bulk of knowledge about biodiversity still remains embedded in textual documents. Unstructured information can be made more accessible and useful for large-scale studies if there are tools and services that automatically extract meaningful information from text and store it in structured formats, e.g., open biodiversity databases, ready to be consumed for analysis (Thessen et al. 2022). We aim to enrich biodiversity occurrence databases with information on species reproductive condition and habitat derived from text. In previous work, we developed unsupervised approaches to extract related habitats and their locations, and related reproductive conditions and temporal expressions (Gabud and Batista-Navarro 2018). We then built a new unsupervised hybrid approach for relation extraction (RE), combining classical rule-based pattern-matching methods with transformer-based language models, framing our RE task as a natural language inference (NLI) task. Using this hybrid approach, we were able to extract related biodiversity entities from text even without a large training dataset. In this work, we implement an information extraction (IE) pipeline comprising a named entity recognition (NER) tool and our hybrid RE tool. The NER tool is a transformer-based language model that was pretrained on scientific text and then fine-tuned on COPIOUS (Conserving Philippine Biodiversity by Understanding big data; Nguyen et al. 2019), a gold-standard corpus containing named entities relevant to species occurrence. We applied the NER tool to automatically annotate geographical location, temporal expression and habitat information within sentences. A dictionary-based approach is then used to identify mentions of reproductive conditions in text (e.g., phrases such as "fruited heavily" and "mass flowering"). We then use our hybrid RE tool to extract reproductive condition–temporal expression and habitat–geographical location entity pairs. We test our IE pipeline on the forestry compendium available in the CABI (Centre for Agricultural and Biosciences International) Digital Library, and show that our work enables the enrichment of descriptive information on the reproductive and habitat conditions of species. This work is a step towards enhancing a biodiversity database with habitat and reproductive condition information extracted from text.
      PubDate: Mon, 11 Sep 2023 09:57:21 +030
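
      The NLI framing of relation extraction can be approximated with an off-the-shelf zero-shot classifier. The model choice, hypothesis template and example sentence below are illustrative, not the authors' exact setup.

        from transformers import pipeline

        # Zero-shot NLI classifier as a stand-in for the hybrid RE step
        nli = pipeline("zero-shot-classification",
                       model="facebook/bart-large-mnli")

        sentence = ("Mass flowering of Shorea contorta was observed in April 1998 "
                    "in ridge forest on Luzon.")  # invented example sentence
        candidates = [
            "flowering occurred in April 1998",      # reproductive condition - time
            "the habitat is ridge forest on Luzon",  # habitat - location
        ]
        result = nli(sentence, candidate_labels=candidates,
                     hypothesis_template="This text entails that {}.")
        for label, score in zip(result["labels"], result["scores"]):
            print(f"{score:.2f}  {label}")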
       
  • Improving Biological Collections Data through Human-AI Collaboration

    • Abstract: Biodiversity Information Science and Standards 7: e112488
      DOI : 10.3897/biss.7.112488
      Authors : Alan Stenhouse, Nicole Fisher, Brendan Lepschi, Alexander Schmidt-Lebuhn, Juanita Rodriguez, Federica Turco, Emma Toms, Andrew Reeson, Cécile Paris, Pete Thrall : Biological collections play a crucial role in our understanding of biodiversity and inform research in areas such as biosecurity, conservation, human health and climate change. In recent years, the digitisation of biological specimen collections has emerged as a vital mechanism for preserving and facilitating access to these invaluable scientific datasets. However, the growing volume of specimens and associated data presents significant challenges for curation and data management. By leveraging human-Artificial Intelligence (AI) collaboration, we aim to transform the way biological collections are curated and managed, unlocking their full potential for addressing global challenges. We present our initial contribution to this field: a software prototype that improves metadata extraction from digital specimen images in biological collections. The prototype provides an easy-to-use platform for collaborating with web-based AI services, such as Google Vision and OpenAI Generative Pre-trained Transformer (GPT) Large Language Models (LLMs), and we demonstrate its effectiveness when applied to herbarium and insect specimen images. Machine-human collaboration may occur at various points within the workflows and can significantly affect outcomes. Initial trials suggest that the visual display of AI model uncertainty could be useful during expert data curation. While much work remains to be done, our results indicate that collaboration between humans and AI models can significantly improve the digitisation rate of biological specimens and thereby enable faster global access to these vital data. Finally, we introduce our broader vision for improving biological collection curation and management using human-AI collaborative methods. We explore the rationale behind this approach and the potential benefits of adding AI-based assistants to collection teams. We also examine future possibilities and the concept of creating 'digital colleagues' for seamless collaboration between human and digital curators. This ‘collaborative intelligence’ will enable us to make better use of both human and machine capabilities to achieve the goal of unlocking, and improving our use of, these vital biodiversity data to tackle real-world problems.
      PubDate: Mon, 11 Sep 2023 09:49:04 +0300
       
  • Practice, Pathways and Lessons Learned from Building a Digital Data Flow
           with Tools: Focusing on alien invasive species, from occurrence via
           measures to documentation

    • Abstract: Biodiversity Information Science and Standards 7: e112337
      DOI : 10.3897/biss.7.112337
      Authors : Mora Aronsson, Malin Strand, Holger Dettki, Hanna Illander, Johan Olsson : The SLU Swedish Species Information Centre (SSIC, SLU Artdatabanken) accumulates, analyses and disseminates information concerning species and habitats occurring in Sweden. The SSIC provides an open-access biodiversity reporting and analysis infrastructure including the Swedish Species Observation System, the Swedish taxonomic backbone Dyntaxa, and tools for species information covering traits, terminology, quality assurance and species identification.*1 The content is available to scientists, conservationists and the public. All systems, databases, APIs and web applications rely on recognized standards to ensure interoperability. The SSIC is a leading partner within the Swedish Biodiversity Data Infrastructure (SBDI). Here we present a data flow (Fig. 1) that exemplifies the strengthening of cooperation and the transfer of experiences between research communities, non-governmental organizations (NGOs), citizen science and governmental agencies, and that also offers solutions to current data challenges (e.g., data fragmentation, taxonomic issues or platform relations). This data flow aims to facilitate the evaluation and understanding of the distribution and spread of species (e.g., invasive alien species). It provides Findable, Accessible, Interoperable and Reusable (FAIR) data and links related information between different parties such as universities, NGOs, county administrative boards (CABs) and environmental protection agencies (EPAs). The digital structure is built on the national Swedish taxonomic backbone Dyntaxa, which prevents data fragmentation due to taxonomic issues and acts as a common standard for all users. The chain of information contains systems, tools and a linked data flow for reporting observations and verification procedures, and it can work as an early warning system for surveillance of certain species. After an observation is reported, an alert can be activated, field checks can be carried out, and, if necessary, eradication measures can be initiated. The verification tool, traditionally focused on the quality of species identification, has been improved to also verify geographic precision, which is as important for eradication actions as species accuracy. A digital catalogue of eradication methods is in use by the CABs, there are also recommendations on methods for 'public' use, and collaboration between Invasive Alien Species (IAS) coordinators in regional CABs is currently being developed. The CABs have a separate tool for documentation of eradication measures and, if/when measures are carried out (by CABs), this information can be fed back from the CAB tool into the database at the SSIC, where it is possible to search for, and visualize, this information. Taxonomic integrity over time should remain intact, anchored to the taxon identifier (ID) provided by Dyntaxa, while metadata such as geographic position, date, verification status and mitigation results will be fully used when reporting under the IAS Regulation 1143/2014 (EU). The development of the digital structure is a collaboration with the Swedish Environmental Protection Agency (Naturvårdsverket) and the Swedish Agency for Marine and Water Management (Havs- och Vattenmyndigheten).
      PubDate: Mon, 11 Sep 2023 09:40:20 +0300
       
  • High Throughput Information Extraction of Printed Specimen Labels from
           Large-Scale Digitization of Entomological Collections using a
           Semi-Automated Pipeline

    • Abstract: Biodiversity Information Science and Standards 7: e112466
      DOI : 10.3897/biss.7.112466
      Authors : Margot Belot, Leonardo Preuss, Joël Tuberosa, Magdalena Claessen, Olha Svezhentseva, Franziska Schuster, Christian Bölling, Théo Léger : Insects account for half of the total described living organisms on Earth, with a vast number of species awaiting description. Insects play a major role in ecosystems but are threatened by habitat destruction, intensive farming, and climate change. Museum collections around the world house millions of insect specimens, and large-scale digitization initiatives, such as the digitization street digitize! at the Museum für Naturkunde, have been undertaken recently to unlock this data. Accurate and efficient extraction of insect specimen label information is vital for building comprehensive databases and facilitating scientific investigations, the sustainability of the collected data, and efficient knowledge transfer. Despite advancements in high-throughput imaging techniques for specimens and their labels, the process of transcribing label information remains mostly manual and lags behind the pace of digitization efforts. To address this issue, we propose a three-step semi-automated pipeline that focuses on extracting and processing information from individual insect labels. Our solution is primarily designed for printed insect labels, as OCR (optical character recognition) technology performs well on printed text, while handwritten text still yields mixed results. The pipeline incorporates computer vision (CV) techniques, OCR, and a clustering algorithm. The initial stage involves image analysis using a convolutional neural network (CNN) model, trained on 2100 images from three distinct insect label datasets: AntWeb (ant specimen labels from various collections), Bees & Bytes (bee specimen labels from the Museum für Naturkunde), and LEP_PHIL (Lepidoptera specimen labels from the Museum für Naturkunde). This first model identifies and isolates single labels within an image, effectively segmenting each label region from the rest of the image, and crops them into multiple new, single-label image files. It also assigns the labels to different classes, i.e., printed or handwritten, with handwritten labels sorted out from the printed ones. In the second step, labels classified as "printed" are parsed by an OCR engine to extract their text. Both the Tesseract and Google Vision OCR engines were tested to assess their performance: while Google Vision OCR is a cloud-based service with limited configurability, Tesseract provides the flexibility to fine-tune settings and enhance its performance for our specific use cases. In the third step, the OCR outputs are aggregated by similarity using a clustering algorithm. This step identifies and forms clusters of labels sharing identical or highly similar content. Ultimately, these clusters are compared against a curated database of labels and are either assigned to a known label or highlighted as new and manually added to the database. To assess the efficiency of our pipeline, we performed benchmarking experiments using a set of images similar to those the models were trained on, as well as additional image sets obtained from various museum collections. Our pipeline offers several advantages, streamlining the data entry process and reducing manual extraction time and effort, while also minimizing potential human errors and inconsistencies in label transcription. The pipeline holds the promise of accelerating metadata extraction from insect specimens, promoting scientific research and enabling large-scale analyses to achieve a more profound understanding of the collections.
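      A minimal sketch of the third step, under the assumption (not stated in the abstract) that clustering operates on TF-IDF character n-grams, which tolerate OCR noise; the thresholds and example strings are illustrative.

          # Sketch: group OCR transcriptions so near-identical labels cluster.
          from sklearn.cluster import DBSCAN
          from sklearn.feature_extraction.text import TfidfVectorizer

          ocr_outputs = [
              "Philippinen, Luzon, leg. Semper 1886",
              "Philippinen, Luzon. leg Semper 1886",   # near-duplicate, OCR noise
              "Coll. Museum fuer Naturkunde, Berlin",
          ]

          # Character n-grams are more robust to OCR errors than word tokens.
          vectors = TfidfVectorizer(analyzer="char_wb",
                                    ngram_range=(2, 4)).fit_transform(ocr_outputs)
          labels = DBSCAN(eps=0.5, metric="cosine",
                          min_samples=1).fit_predict(vectors)
          print(labels)  # e.g., [0 0 1]: the two Semper labels share a cluster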
      PubDate: Mon, 11 Sep 2023 09:36:45 +0300
       
  • An AI-based Wild Animal Detection System and Its Application

    • Abstract: Biodiversity Information Science and Standards 7: e112456
      DOI : 10.3897/biss.7.112456
      Authors : Congtian Lin, Jiangning Wang, Liqiang Ji : The rapid accumulation of biodiversity data and the development of deep learning methods bring opportunities for detecting and identifying wild animals automatically, based on artificial intelligence. In this paper, we introduce an AI-based wild animal detection system. It is composed of acoustic and image sensors, network infrastructure, species recognition models, and a data storage and visualization platform, following a technical chain adapted from the Internet of Things (IoT) and applied to biodiversity monitoring. The workflow of the system is as follows:
      1. Deploying sensors for different detection targets. The acoustic sensor is composed of two microphones, which pick up sounds from the environment, and an edge computing box, which judges and sends back the sound files. It is suitable for monitoring birds, mammals, chirping insects and frogs. The image sensor is composed of a high-performance camera that can be controlled to record its surroundings automatically and a video analysis edge box running a model for detecting and recording animals. It is suitable for monitoring waterbirds in locations without visual obstructions.
      2. Adopting different networks according to signal availability. Network infrastructure is critical for the detection system and the task of transferring data collected by sensors. We use the existing network where 4G/5G signals are available, and build dedicated networks using Mesh Networking technology in areas without signals. Multiple network strategies lower the cost of monitoring.
      3. Recognizing species from sounds, images or videos. AI plays a key role in our system. We have trained acoustic models for more than 800 Chinese birds and some common chirping insects and frogs, which can be identified from sound files recorded by the acoustic sensors. For video and image data, we have trained models for recognizing 1300 Chinese birds and 400 mammals, which help to discover and count animals captured by the image sensors. Moreover, we propose a method for detecting species through features of voices, images and the niche features of animals: a flexible framework that adapts to different combinations of acoustic and image sensors. All models were trained with labeled voices, images and distribution data from the Chinese species database ESPECIES.
      4. Saving and displaying machine observations. The original sound, image and video files, together with identification results, are stored in a data platform deployed on the cloud for extensible computing and storage. We have developed visualization modules in the platform for displaying sensors on maps using WebGIS, showing curves of the number of records and species per day, real-time alerts from sensors capturing animals, and other parameters.
      For storing and exchanging records of machine observations, information about sensors, models and key network nodes, we have proposed a collection of data fields extended from Darwin Core and built a data model to represent where, when and which sensors observe which species. The system has been applied in several projects since last year. For example, we have deployed 50 sensors across the city of Beijing for detecting birds; they have so far harvested more than 300 million records and detected 320 species, effectively filling gaps in the data on Beijing's birds, from taxonomic coverage to the time dimension. Next steps will focus on improving the AI models to identify species with higher accuracy, popularizing the system in biodiversity monitoring, and building a mechanism for sharing and publishing machine observations.
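      A hedged sketch of what one such machine-observation record could look like, using standard Darwin Core terms; the sensor fields at the end are hypothetical stand-ins for the extended fields the authors propose, whose actual names are not given in the abstract.

          # Sketch of a Darwin Core-based machine-observation record.
          import json

          record = {
              "basisOfRecord": "MachineObservation",  # standard Darwin Core value
              "scientificName": "Pica pica",
              "eventDate": "2023-05-14T06:12:00+08:00",
              "decimalLatitude": 39.9042,
              "decimalLongitude": 116.4074,
              "identificationRemarks": "acoustic model, confidence 0.91",
              # Hypothetical extension fields for the sensor network:
              "sensorID": "BJ-ACOUSTIC-017",
              "sensorType": "acoustic",
          }
          print(json.dumps(record, indent=2))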
      PubDate: Mon, 11 Sep 2023 09:32:25 +0300
       
  • Application of AI-Helped Image Classification of Fish Images: An iDigBio
           dataset example

    • Abstract: Biodiversity Information Science and Standards 7: e112438
      DOI : 10.3897/biss.7.112438
      Authors : Bahadir Altintas, Yasin Bakış, Xiojun Wang, Henry Bart : Artificial Intelligence (AI) is becoming more prevalent in data science as well as in areas of computational science. Commonly used classification methods in AI can also be applied to unorganized databases, if a proper model is trained. Most classification work is done on image data, for purposes such as object detection and face recognition. If an object is well detected in an image, classification can be used to organize the image data. In this work, we try to identify images from an Integrated Digitized Biocollections (iDigBio) dataset and to classify these images to generate metadata for use as an AI-ready dataset in the future. The main problems with museum image datasets are the lack of metadata on images, wrong categorization, and poor image quality. By using AI, it may be possible to overcome these problems: automatic tools can help find, eliminate or fix them. For our example, we trained a model for 10 classes (e.g., complete fish, photograph, notes/labels, X-ray, CT (computerized tomography) scan, partial fish, fossil, skeleton) using a manually tagged iDigBio image dataset. After training a model for each class, we reclassified the dataset using these trained models. Some of the results are given in Table 1. As can be seen in the table, even manually classified images can be identified as different classes, and some classes are visually very similar to each other, such as CT scans and X-rays, or fossils and skeletons. Such similarities are confusing for the human eye as well as for AI models.
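      The snippet below is a rough sketch of the reclassification step with a fine-tuned CNN; the architecture, checkpoint file and padded class list are assumptions for illustration (the abstract names only eight of the ten classes).

          # Sketch: reclassify museum images with a fine-tuned CNN.
          import torch
          from PIL import Image
          from torchvision import models, transforms

          CLASSES = ["complete fish", "photograph", "notes/labels", "X-ray",
                     "CT scan", "partial fish", "fossil", "skeleton",
                     "other", "multiple specimens"]  # last two are assumptions

          model = models.resnet50()
          model.fc = torch.nn.Linear(model.fc.in_features, len(CLASSES))
          model.load_state_dict(torch.load("fish_classifier.pt"))  # hypothetical
          model.eval()

          preprocess = transforms.Compose([
              transforms.Resize(256), transforms.CenterCrop(224),
              transforms.ToTensor(),
          ])

          img = preprocess(Image.open("idigbio_example.jpg").convert("RGB"))
          with torch.no_grad():
              pred = model(img.unsqueeze(0)).softmax(dim=1).argmax(dim=1).item()
          print(CLASSES[pred])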
      PubDate: Mon, 11 Sep 2023 09:27:01 +0300
       
  • Unearthing the Past for a Sustainable Future: Extracting and transforming
           data in the Biodiversity Heritage Library for climate action

    • Abstract: Biodiversity Information Science and Standards 7: e112436
      DOI : 10.3897/biss.7.112436
      Authors : JJ Dearborn, Mike Lichtenberg, Joel Richard, Joseph deVeer, Michael Trizna, Katie Mika : As the urgency to address the climate crisis intensifies, the availability of accurate and comprehensive biodiversity data has become crucial for informing climate change studies, tracking key environmental indicators, and building global biodiversity monitoring platforms. The Biodiversity Heritage Library (BHL) plays a vital role in the core biodiversity infrastructure, housing over 60 million pages of digitized literature about life on Earth. Recognizing the value of over 500 years of data in BHL, a global network of BHL staff is working to establish a scalable data pipeline to provide actionable occurrence data from BHL's vast and diverse collections. However, transforming textual content into FAIR (findable, accessible, interoperable, reusable) data poses challenges due to missing descriptive metadata and error-ridden unstructured outputs from commercial text engines (Fig. 1). Despite the wealth of knowledge in BHL now available to global audiences, the underutilization of the biodiversity and climate data contained in BHL's textual corpus hinders scientific research, hampers informed decision-making for conservation efforts, and limits our understanding of biodiversity patterns crucial for addressing the climate crisis. By leveraging recent advancements in text recognition engines, along with cutting-edge AI (Artificial Intelligence) models like OpenAI's CLIP (Contrastive Language-Image Pre-Training) and nascent features in transcription platforms, BHL staff are beginning to process vast amounts of textual and image data and transform centuries' worth of data from BHL collections into computationally usable formats. Recent technological breakthroughs now offer a transformative opportunity to empower the global biodiversity community with prescient insights from our shared past and facilitate the integration of historical knowledge into climate action initiatives. To bridge gaps in the historical record and unlock the potential of the Biodiversity Heritage Library (BHL), a multi-pronged effort utilizing innovative cross-disciplinary approaches is being piloted. These technical approaches were selected for their efficiency and ability to generate rapid results that can be applied across the diverse range of materials in BHL (Fig. 2). Piloting a data pipeline that is scalable to 60 million pages requires considerable investigation, experimentation, and resources, but will have an appreciable impact on global conservation efforts by informing and establishing historic baselines deeper into time. This presentation will focus on the identification, extraction, and transformation of OCR into structured data outputs in BHL. Approaches include:
      - upgrading legacy OCR text using the Tesseract OCR engine to improve data quality by 20% and openly publish 40 GB of textual data as FAIR data;
      - evaluating handwritten text recognition (HTR) engines (Microsoft Azure Computer Vision, Google Cloud Vision API (Application Programming Interface), and Amazon Textract) to improve scientific name-finding in BHL's handwritten archival materials using algorithms developed by Global Names Architecture;
      - extracting data from collecting events using HTR coordinate outputs with the Python library pandas (DataFrame) to create structured data;
      - classifying BHL page-level images with OpenAI's CLIP, a neural network model, to accurately identify the handwritten sub-corpus of primary source materials in BHL (see the sketch after this list);
      - running an A/B test to evaluate the efficiency and accuracy of human-keyed transcription data extraction to provide high-quality, human-vetted datasets that can be deposited with data aggregators.
      The ongoing development of a scalable data pipeline for BHL's relevant biodiversity and climate-related datasets requires sustained support and partnership with the biodiversity community. Initial results demonstrate that liberating data from archival and handwritten field notes is arduous but feasible. Extending these methodologies to the broader scientific literature presents new research opportunities. Extracting and normalizing data from unstructured textual sources can significantly advance biodiversity research and inform environmental policy. The Biodiversity Heritage Library staff are committed to building multiple scalable data pipelines with the ultimate goal of erecting a global biodiversity knowledge graph, rich in interconnected data and semantic meaning, enabling informed decisions for the preservation and sustainable management of Earth's biodiversity.
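      As a hedged sketch of the CLIP-based classification listed above (the public checkpoint and prompts are assumptions; BHL's production setup may differ):

          # Sketch: zero-shot handwritten-vs-printed page triage with CLIP.
          from PIL import Image
          from transformers import CLIPModel, CLIPProcessor

          model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
          processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

          prompts = ["a page of handwritten field notes", "a page of printed text"]
          image = Image.open("bhl_page.jpg")  # hypothetical page image

          inputs = processor(text=prompts, images=image,
                             return_tensors="pt", padding=True)
          probs = model(**inputs).logits_per_image.softmax(dim=1)
          print(dict(zip(prompts, probs[0].tolist())))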
      PubDate: Mon, 11 Sep 2023 09:24:04 +0300
       
  • Documenting Biodiversity in Underrepresented Languages using Crowdsourcing

    • Abstract: Biodiversity Information Science and Standards 7: e112431
      DOI : 10.3897/biss.7.112431
      Authors : Mohammed Kamal-Deen Fuseini, Agnes Abah, Andra Waagmeester : Biodiversity is the variety of life on Earth, and it is essential for our planet's health and well-being. Language is also a powerful medium for documenting and preserving cultural heritage, including knowledge about biodiversity. However, many indigenous and underrepresented languages are at risk of disappearing, taking with them valuable information about local ecosystems. Likewise, many species are at risk of extinction, and much of our knowledge about biodiversity is held in underrepresented languages (Cardoso et al. 2019). This can make it challenging to document and protect biodiversity, as well as to share this knowledge with others. Crowdsourcing is a way to collect information from a large number of people, and it can be a valuable tool for documenting biodiversity in underrepresented languages. By crowdsourcing, leveraging the iNaturalist platform, and working with volunteer contributors in the open movement, including the Dagbani*1 and Igbo*2 Wikimedian communities, we can reach people who have knowledge about local biodiversity but who may not have been able to share this knowledge before. For instance, the Dagbani and Igbo Wikimedia contributors did not have enough content on biodiversity data until they received education about the need. This can help us to fill in the gaps in our knowledge about biodiversity, and to protect species that are at risk of extinction. In this presentation, we will discuss the use of crowdsourcing to document biodiversity in underrepresented languages, the challenges and opportunities of using crowdsourcing for this purpose, and some examples of successful projects. We will also discuss the importance of sharing knowledge about biodiversity with others and share some ideas on how to do this. We believe that crowdsourcing has the potential to be a powerful tool for documenting biodiversity in underrepresented languages. By working together, we can help protect our planet's biodiversity and ensure that this knowledge is available to future generations.
      PubDate: Mon, 11 Sep 2023 09:17:02 +0300
       
  • Safeguarding Access to 500 Years of Biodiversity Data: Sustainability
           planning for the Biodiversity Heritage Library

    • Abstract: Biodiversity Information Science and Standards 7: e112430
      DOI : 10.3897/biss.7.112430
      Authors : Martin Kalfatovic, Bianca Crowley, JJ Dearborn, Colleen Funkhouser, David Iggulden, Kelli Trei, Elisa Herrmann, Kevin Merriman : The Biodiversity Heritage Library (BHL) is the world's largest open access digital library for biodiversity literature and archives. Headquartered at Smithsonian Libraries and Archives (SLA), BHL is a global consortium of research institutions working together to build and maintain a critical piece of biodiversity data infrastructure. BHL provides free access to over 60 million pages of biodiversity content from the 15th–21st centuries. BHL works with the biodiversity community to develop tools and services to facilitate greater access, interoperability, and reuse of content and data. Through taxonomic intelligence tools developed by Global Names Architecture, BHL has indexed more than 230 million instances of taxonomic names throughout its collection, allowing researchers to locate publications about specific taxa. BHL also works to bring historical literature into the modern network of scholarly research by retroactively assigning DOIs (digital object identifiers) and making this historical content more discoverable and trackable. Biodiversity databases such as the Catalogue of Life, International Plant Names Index, Tropicos, World Register of Marine Species, and iNaturalist rely on literature housed in BHL. Locked within its 60 million pages are valuable species occurrence data and observations from expeditions. To make these data FAIR (findable, accessible, interoperable, and reusable), BHL and its partners are working on a data pipeline to transform textual content into actionable data that can be deposited into data aggregators such as the Global Biodiversity Information Facility (GBIF). BHL's shared vision began in 2006 among a small community of passionate librarians, technologists, and biodiversity researchers. Uniting as a consortium, BHL received grant funding to build and launch its digital library. BHL partners received additional grant funding for further technical development and targeted digitization projects. When initial grant funding ended in 2012, BHL established an annual dues model for its Members and Affiliates to help support central BHL operating expenses and technical development. This dues model continues today, along with in-kind contributions of staff time from Members and Affiliates. Significant funding is also provided by the Smithsonian, in the form of an annual U.S. federal allocation, endowment funds, and SLA cost subvention, to host the technical infrastructure and Secretariat staff. BHL also relies on user donations to support its program. Though BHL has diversified funding streams over the years, it relies heavily on a few key institutions to cover operating costs. Though these institutions have overarching open access, research, and sustainability goals, priorities and resources to achieve these goals shift over time. Without long-term commitments, institutions may choose to prioritize new projects over established programs. Many BHL contributors have experienced funding loss for digitization projects, reducing the rate at which new content is added to BHL. Further loss of funding for central staff and technical infrastructure would reduce BHL from a data-rich technology project to an unsupported and deprecated platform. Without a long-term commitment to maintain and improve the technical infrastructure, BHL's termination would result in countless broken links from biodiversity databases, library catalogs, Wikidata, and other aggregators across the web; detrimental impact on existing third-party projects relying on BHL citation and species data; and the elimination of more equitable and free access to biodiversity knowledge. To continue its mission, BHL must increase and improve its data integration with the wider biodiversity infrastructure and secure a sustainable future. Securing that future will require external expertise to diversify funding sources, re-engage support from existing partners, and identify new stakeholders for support. During the founding discussions of BHL, stakeholders agreed that the only way to do biodiversity science globally is through collaboration; one institution could not lead alone. Going forward, this imperative must also include collaborative funding models. Partnering with initiatives such as the Global Biodata Coalition (GBC) can lead to a stronger and more resilient biodiversity infrastructure. With ongoing collaboration, innovation, and an unwavering commitment to open access, BHL will continue to transform research on a global scale and provide researchers with the tools they need to study, explore, and conserve life on Earth.
      PubDate: Mon, 11 Sep 2023 09:06:32 +0300
       
  • Swedish Biodiversity Data Infrastructure (SBDI): Insights from the Swedish
           ALA installation

    • Abstract: Biodiversity Information Science and Standards 7: e112429
      DOI : 10.3897/biss.7.112429
      Authors : Margret Steinthorsdottir, Veronika Johansson, Manash Shah : The Swedish Biodiversity Data Infrastructure (SBDI) is a biodiversity informatics infrastructure and the key national resource for data-driven biodiversity and ecosystems research. SBDI rests on three pillars:
      - mobilisation of and access to biodiversity data;
      - development and operation of tools for analysing these data; and
      - user support.
      SBDI is funded by the Swedish Research Council (VR) and eleven of Sweden's major universities and government research authorities (Fig. 1). SBDI was formed in early 2021 and represents the final step in an amalgamation of national infrastructures for biodiversity and ecosystems research. SBDI includes the Swedish node of the Global Biodiversity Information Facility (GBIF), the key international infrastructure for sharing biodiversity data. SBDI's predecessor, Biodiversity Atlas Sweden (BAS), was an early adopter of the Atlas of Living Australia (ALA) platform. SBDI pioneered container-based deployment of the platform using Docker and Docker Swarm. This container-based approach simplifies deployment of the platform, which is characterised by a microservice architecture with loosely coupled services. This enables scalability, modularity, integration of services, and new technology insertions. SBDI has customised the BioCollect module to remove region-specific constraints so that it can be more readily used for environmental monitoring in Sweden. To further support this, there are plans to develop services for the distribution of terrestrial map layers, which will provide important habitat information for artificial intelligence and machine learning research projects. The Amplicon Sequence Variants (ASVs) portal, an interface to sequence-based observations, is an example of integration and new technology insertion. The portal, developed in SBDI and seamlessly integrated with the ALA platform, provides basic functionalities for searching ASVs and occurrence records, using the Basic Local Alignment Search Tool (BLAST) or filters on sequencing details and taxonomy, and for submitting metabarcoding datasets (Fig. 2). Future developments for SBDI include a continued focus on eDNA and monitoring data as well as the implementation of procedures for handling sensitive data.
      PubDate: Mon, 11 Sep 2023 09:01:11 +0300
       
  • "Publish First": A Rapid, GPT-4 Based Digitisation System for Small
           Institutes with Minimal Resources

    • Abstract: Biodiversity Information Science and Standards 7: e112428
      DOI : 10.3897/biss.7.112428
      Authors : Rukaya Johaadien, Michal Torma : We present a streamlined technical solution ("Publish First") designed to assist smaller, resource-constrained herbaria in rapidly publishing their specimens to the Global Biodiversity Information Facility (GBIF). Specimen data from smaller herbaria, particularly those in biodiversity-rich regions of the world, provide a valuable and often unique contribution to the global pool of biodiversity knowledge (Marsico et al. 2020). However, these institutions often face challenges not applicable to larger herbaria, including a lack of staff with technical skills, limited staff hours for digitization work, inadequate financial resources for specialized scanning equipment, cameras, lights, and imaging stands, limited (or no) access to computers and collection management software, and unreliable internet connections. Data-scarce, biodiversity-rich countries are also often linguistically diverse (Gorenflo et al. 2012), and staff may not have English skills, which means pre-existing online data publication resources and guides are of limited use. The "Publish First" method we are trialing addresses several of these issues: it drastically simplifies the publication process so technical skills are not necessary; it minimizes administrative tasks, saving time; it uses simple, cheap and easily available hardware; it does not require any specialized software; and the process is so simple that there is little to no need for written instructions. "Publish First" requires staff to attach QR-code labels containing identifiers to herbarium specimen sheets, scan these sheets using a document scanner costing around €300, then drag and drop the files into an S3 bucket (a cloud container that specialises in storing files). Subsequently, the images are automatically processed by an Optical Character Recognition (OCR) service to extract text, which is then passed to OpenAI's Generative Pre-trained Transformer 4 (GPT-4) Application Programming Interface (API) for standardization. The standardized data is integrated into a Darwin Core Archive file that is automatically published through GBIF's Integrated Publishing Toolkit (IPT) (GBIF 2021). The most technically challenging aspect of this project has been the standardization of OCR data to Darwin Core using the GPT-4 API, particularly in crafting precise prompts to address the inherent inconsistency and lack of reliability of these Large Language Models (LLMs). Despite this, GPT-4 outperformed our manual scraping efforts. Our choice of GPT-4 as a model was a naive one: we implemented the workflow on some pre-digitized specimens from previously published Norwegian collections, compared the published data on GBIF with GPT-4's Darwin Core standardized output, and found the results satisfactory. Moving forward, we plan to undertake more rigorous additional research to compare the effectiveness and cost-efficiency of different LLMs as Darwin Core standardization engines. We are also particularly interested in exploring the new "function calling" feature added to the GPT-4 API, as it promises to allow us to retrieve standardized data in a more consistent and structured format. This workflow is currently under trial in Tajikistan, and may possibly be used in Uzbekistan, Armenia and Italy in the near future.
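      A minimal sketch of the standardisation step, assuming a simple prompt and the generally available chat API; the label text is invented and, as the authors note, real prompts need careful engineering to keep the output consistent.

          # Sketch: prompt a GPT model to map OCR text onto Darwin Core terms.
          import json
          from openai import OpenAI

          client = OpenAI()  # reads OPENAI_API_KEY from the environment
          ocr_text = "Tajikistan, Pamir Mts., 3200 m, 12 VII 1987"  # invented

          response = client.chat.completions.create(
              model="gpt-4",
              messages=[{
                  "role": "user",
                  "content": ("Return only JSON with the Darwin Core terms "
                              "country, locality, minimumElevationInMeters and "
                              f"eventDate for this label text:\n{ocr_text}"),
              }],
          )
          # Will raise if the model strays from pure JSON; a sketch, not production.
          print(json.loads(response.choices[0].message.content))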
      PubDate: Mon, 11 Sep 2023 08:55:41 +0300
       
  • Data Standards and Interoperability Challenges for Biodiversity Digital
           Twin: A novel and transformative approach to biodiversity research and
           application

    • Abstract: Biodiversity Information Science and Standards 7: e112373
      DOI : 10.3897/biss.7.112373
      Authors : Sharif Islam, Hanna Koivula, Dag Endresen, Erik Kusch, Dmitry Schigel, Wouter Addink : The Biodiversity Digital Twin (BioDT) project (2022-2025) aims to create prototypes that integrate various data sets, models, and expert domain knowledge, enabling prediction capabilities and decision-making support for critical issues in biodiversity dynamics. While digital twin concepts have been applied in industry for continuous monitoring of physical phenomena, their application in biodiversity and environmental sciences presents novel challenges (Bauer et al. 2021, de Koning et al. 2023). In addition, successfully developing digital twins for biodiversity requires addressing interoperability challenges in data standards. BioDT is developing prototype digital twins based on use cases that span various data complexities, from point occurrence data to bioacoustics, covering nationwide forest states to specific communities and individual species. The project relies on the FAIR principles (Findable, Accessible, Interoperable, and Reusable) and FAIR-enabling resources like standards and vocabularies (Schultes et al. 2020) to enable the exchange, sharing, and reuse of biodiversity information, fostering collaboration among the participating research infrastructures (DiSSCo, eLTER, GBIF, and LifeWatch) and data providers. It also involves creating a harmonised abstraction layer using Persistent Identifiers (PIDs) and FAIR Digital Object (FDO) records, alongside semantic mapping and crosswalk techniques, to provide machine-actionable metadata (Schultes and Wittenburg 2019, Schwardmann 2020). Governance and engagement with research infrastructure stakeholders play crucial roles in this regard, with a focus on aligning technical and data standards discussions. In addition to data, models and workflows are key elements in BioDT. Models in the BioDT context are formal representations of problems or processes, implemented through equations, algorithms, or a combination of both, which can be executed by machine entities. The current twin prototypes consider both statistical and mechanistic models, introducing significant variations in (1) data requirements, (2) modelling approaches and philosophy, and (3) model output. The BioDT consortium will develop guidelines and protocols for how to describe these models, what metadata to include, and how they will interact with the diverse datasets. While discussions on this topic exist within the broader context of biodiversity and ecological sciences (Jeltsch et al. 2013, Fer et al. 2020), the BioDT project is strongly committed to finding a solution within its scope. In the twinning context, data and models need to be executed within a computing infrastructure and also need to adhere to FAIR principles. Software within BioDT includes a suite of tools that facilitate data acquisition, storage, processing, and analysis. While some of these tools already exist, the challenge lies in integrating them within the digital twinning framework. One approach to achieving integration is through workflow representation, encompassing standardised procedures and protocols that guide the acquisition, packaging, processing, and analysis of data; the project is exploring a Research Object Crate (RO-Crate) implementation for this (Soiland-Reyes et al. 2022). Implementing workflows can ensure reproducibility, scalability, and transparency in research practices, enabling scientists to validate and replicate findings. The BioDT project offers a novel and transformative approach to biodiversity research and application. By leveraging collaborative research infrastructures and adhering to data standards, BioDT aims to harness the power of data, software, supercomputers, models, and expertise to provide new insights. The foundation provided by data standards, including those of Biodiversity Information Standards (TDWG), is crucial in realising the full potential of digital twins, facilitating the seamless integration of diverse data sources and combinations with models.
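      As a rough illustration of the RO-Crate packaging being explored (file names and descriptions are hypothetical; a real crate would carry richer, FAIR-aligned metadata):

          # Sketch: write a minimal RO-Crate metadata file for a twin run.
          import json

          crate = {
              "@context": "https://w3id.org/ro/crate/1.1/context",
              "@graph": [
                  {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
                   "about": {"@id": "./"},
                   "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"}},
                  {"@id": "./", "@type": "Dataset",
                   "name": "Example BioDT twin run",
                   "hasPart": [{"@id": "occurrences.csv"},
                               {"@id": "model_output.nc"}]},
                  {"@id": "occurrences.csv", "@type": "File",
                   "description": "Input occurrence records (hypothetical)"},
                  {"@id": "model_output.nc", "@type": "File",
                   "description": "Model output (hypothetical)"},
              ],
          }
          with open("ro-crate-metadata.json", "w") as f:
              json.dump(crate, f, indent=2)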
      PubDate: Mon, 11 Sep 2023 08:50:30 +0300
       
  • Combining Camera Trap Data and Environmental Data to Estimate the Effects
           of Environmental Gradients on Abundance of the Asian Elephant Elephas
           maximus in Cambodia

    • Abstract: Biodiversity Information Science and Standards 7: e112100
      DOI : 10.3897/biss.7.112100
      Authors : Ret Thaung, Jackson Frechette, Matthew Luskin, Zachary Amir : Asian elephant (Elephas maximus) populations in Cambodia are currently declining, and the effect of environmental degradation on the abundance and health of elephants is poorly understood. We used camera trap data from 42 locations, collected between 2016 and 2020 in the southern Cardamom Mountains, to investigate the impact of environmental degradation on the abundance and condition of Asian elephants. Camera trap data were organized using CameraSweet software to retrieve both the number of individuals and their condition. To count individuals, we defined independent captures spatially and temporally. To assess condition, we created a visual scoring system based on past research (Wemmer et al. 2006, Fernando et al. 2009, Morfeld et al. 2014, Wijeyamohan et al. 2014, Morfeld et al. 2016, Schiffmann et al. 2020). This scoring system relies on visual assessment of the muscle and fat in relation to the pelvis, ribs, and backbone. To validate this subjective scoring system, two scorers reviewed elephant captures using 10 reference photos and then reviewed each other's assessments of the first five images showing the elephant's body condition, minimizing subjectivity between the two scorers. Environmental variables (Suppl. material 1) such as distance to forest edge, forest integrity index, elevation, global human settlements, distance to road, distance to river, night lights and forest cover were obtained and then reclassified in ArcGIS to a common 1 km grid. We implemented hierarchical N-mixture models to investigate the impacts of environmental variables on abundance, and used cumulative link models to investigate the impact of the same environmental variables on condition. We found that Asian elephant abundance exhibited a significant positive relationship with distance to forest edge: abundance was greater further away from a forest edge. Body condition score exhibited a relationship with forest cover and the Forest Landscape Integrity Index, suggesting that grassland and less dense forest support better condition. Moreover, males exhibited significantly higher body condition scores than females, while babies, juveniles, and subadults all exhibited lower body condition scores compared to adults. The significantly lower body condition of young elephants is concerning and suggests that conservation managers in the region should prioritize environmental conditions that support young elephant health. Our results identify key environmental variables that appear to promote Asian elephant abundance and health in the Cardamom Mountains, thus informing relevant conservation actions to support this endangered species in Cambodia and beyond.
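      For readers unfamiliar with cumulative link models, the sketch below fits one in Python on toy data; the variable names and values are invented, and the authors' actual analysis (including the N-mixture models) may well have been run in other software.

          # Sketch: ordinal (cumulative link) regression of body condition.
          import pandas as pd
          from statsmodels.miscmodels.ordinal_model import OrderedModel

          df = pd.DataFrame({
              "condition": pd.Categorical([1, 2, 3, 2, 3, 1, 2],
                                          categories=[1, 2, 3], ordered=True),
              "forest_cover":     [0.9, 0.6, 0.3, 0.7, 0.2, 0.95, 0.5],
              "forest_integrity": [8.1, 6.0, 3.2, 6.5, 2.9, 8.8, 5.1],
          })  # toy data, invented for illustration

          model = OrderedModel(df["condition"],
                               df[["forest_cover", "forest_integrity"]],
                               distr="logit")
          print(model.fit(method="bfgs", disp=False).summary())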
      PubDate: Mon, 11 Sep 2023 08:46:32 +0300
       
  • From the Shadows to the Spotlight: Unveiling Nepal's hidden kingdom of
           mushrooms and lichens through digitization

    • Abstract: Biodiversity Information Science and Standards 7: e112376
      DOI : 10.3897/biss.7.112376
      Authors : Shiva Devkota : The digitization of herbarium collections has brought forth a transformative journey, transitioning Nepal's hidden kingdom of mushrooms and lichens from the shadows into the spotlight. Through collaborative work within the framework of the Global Biodiversity Information Facility's Biodiversity Information Fund for Asia (GBIF-BIFA), involving two herbaria (KATH: Nepal's National Herbarium and Plant Laboratories, and TUCH: Natural History Museum, Tribhuvan University, Nepal) and the research institute Global Institute for Interdisciplinary Studies (GIIS), Nepal's mycological treasures have been successfully unveiled through digital means. A comprehensive effort has resulted in the complete digitization of 3,971 mushroom specimens and 2,462 lichen specimens, illuminating a wealth of information for researchers, citizen scientists, and the general public. GBIF and the online database maintained by Nepal's National Herbarium and Plant Laboratories, Department of Plant Resources, serve as the gateways to this work (KATH 2021). Prior to this work, the specimens resided in the shadows, lacking the recognition they deserved. Through meticulous collection management, sorting, curation, and labeling, their secrets were unveiled and their stories brought to our fingertips. These previously obscured specimens now possess registered individual catalogue numbers, allowing the quantification of Nepal's fungal wealth within the participating institutions. This project serves as a testament to the vital role of capturing available field-level data, preserving specimens, and harnessing the power of digitization to showcase Nepal's mycological and lichenological wonders to a global audience. It has also emphasized the significance of sharing this knowledge and fostering appreciation for the overlooked world of mushrooms and lichens.
      PubDate: Fri, 8 Sep 2023 12:49:26 +0300
       
  • Connecting the Dots: Aligning human capacity through networks toward a
           globally interoperable Digital Extended Specimen (DES) infrastructure

    • Abstract: Biodiversity Information Science and Standards 7: e112390
      DOI : 10.3897/biss.7.112390
      Authors : Elizabeth R. Ellwood, Wouter Addink, John Bates, Andrew Bentley, Jutta Buschbom, Alina Freire-Fierro, Jose Fortes, David Jennings, Kerstin Lehnert, Bertram Ludäscher, Keping Ma, James Macklin, Austin Mast, Joe Miller, Gil Nelson, Nicky Nicolson, Jyotsna Pandey, Deborah Paul, Sinlan Poo, Richard Rabeler, Pamela S. Soltis, Elycia Wallis, Michael Webster, Andrew Young, Breda Zimkus : Thanks to substantial support for biodiversity data mobilization in recent decades, billions of occurrence records are openly available, documenting life on Earth and enabling timely research, awareness raising, and policy-making. Initiatives across local to global scales have been separately funded to serve different, yet often overlapping audiences of data users, and have developed a variety of platforms and infrastructures to meet the needs of these audiences. The independent progress of biodiversity data providers has led to innovations as well as challenges for the community at large as we move towards connecting and linking a diversity of information from disparate sources as Digital Extended Specimens (DES). Recognizing a need for deeper and more frequent opportunities for communication and collaboration across the globe, an ad-hoc group of representatives of various international, national, and regional organizations have been meeting virtually since 2020 to provide a forum for updates, announcements, and shared progress. This group is provisionally named International Partners for the Digital Extended Specimen (IPDES), and is guided by these four concepts: Biodiversity, Connection, Knowledge and Agency. Participants in IPDES include representatives of the Global Biodiversity Information Facility (GBIF), Integrated Digitized Biocollections (iDigBio), American Institute of Biological Sciences (AIBS), Biodiversity Collections Network (BCoN), Natural Science Collections Alliance (NSCA), Distributed System of Scientific Collections (DiSSCo), Atlas of Living Australia (ALA), Biodiversity Information Standards (TDWG), Society for the Preservation of Natural History Collections (SPNHC), National Specimen Information Infrastructure of China (NSII), and South African National Biodiversity Institute (SANBI), as well as individuals involved with biodiversity informatics initiatives, natural science collections, museums, herbaria, and universities. Our global partners group strives to increase representation from around the globe as we aim to enable research that contributes to novel discoveries and addresses the societal challenges leading to the biodiversity crisis. Our overarching mission is to expand on the community-driven successes to connect biodiversity data and knowledge through coordination of a globally integrated network of stakeholders to enable an extensible technical and social infrastructure of data, tools, and working practices in support of our vision. The main work of our group thus far includes publishing a paper on the Digital Extended Specimen (Hardisty et al. 2022), organizing and hosting an array of activities at conferences, and asynchronous online work and forum-based exchanges. We aim to advance discussion on topics of broad interest to our community, such as social and technical capacity building, broadening participation, expanding social and data networks, improving data models and building a backbone for the DES, and identifying international funding solutions. This presentation will highlight some of these activities and detail progress towards a roadmap for the development of the human network and technical infrastructure necessary to support the DES. It provides an opportunity for feedback from and engagement by stakeholder communities such as TDWG and other initiatives with a focus on data standards and biodiversity informatics, as we solidify our plans for the future in support of integrated and interconnected biodiversity data and credit for those doing the work.
      PubDate: Fri, 8 Sep 2023 10:08:58 +0300
       
  • GBIF-Compliant Data Pipeline for the Management and Publication of a
           Global Taxonomic Reference List of Pests in Natural History Collections

    • Abstract: Biodiversity Information Science and Standards 7: e112391
      DOI : 10.3897/biss.7.112391
      Authors : Carla Novoa Sepúlveda, Stephan Biebl, Nadja Pöllath, Stefan Seifert, Markus Weiss, Tanja Weibulat, Dagmar Triebel : There is a growing demand for monitoring pests in natural history collections (NHCs) and establishing integrated pest management (IPM) solutions (Crossman and Ryde 2022). In this context, up-to-date taxonomic reference lists and controlled vocabularies following standard schemes are crucial, as they facilitate recording the organisms detected in collections. The data pipeline described here results in the publication of a taxon reference list based on information from online resources and standard IPM literature. Most of the over 140 pest taxa, at species level and above, are insects; the rest belong to other animal groups and fungi. The complete taxon names, synonyms, English and German common names, and the hierarchical classification (parent-child relationships) are organised in a client-server installation of DiversityTaxonNames (DTN) at the Bavarian Natural History Collections (SNSB). DTN is a Microsoft Structured Query Language (MS SQL) database tool of the Diversity Workbench (DWB) framework with a published Entity Relation (ER) diagram (Hagedorn et al. 2019). Management uses the Global Biodiversity Information Facility (GBIF) backbone taxonomy as an external name resource, with linkage to the respective Wikidata Q item ID as an external persistent identifier (PID). Moreover, information on pest occurrence in NHCs is given, distinguishing the Consortium of European Taxonomic Facilities (CETAF) major NHC collection types affected (i.e., heritage sciences, life sciences and earth sciences) and the object categories, e.g., natural objects/specimens damaged. Data management in DTN enables long-term curation by the list curators. The generic data pipeline for the management and publication of a Global Taxonomic Reference List of Pests in NHCs is based on the DTN taxon lists concept and architecture and is described under About "Taxon list of pest organisms for IPM at natural history collections compiled at the SNSB". It includes four steps (A–D), each with significant results for best practices of data processing (Fig. 1).
      A. The data is managed and processed for publication by list curators in the database DiversityTaxonNames (DTN). As a result, the list can be kept up to date and is, without transformation, ready to be used for IPM solutions at any NHC with a DiversityCollection installation and as part of the DWB cloud services.
      B. The up-to-date data is publicly available via the DTN REST Webservice for Taxon Lists with a machine-readable Application Programming Interface (API). As a result, the dynamic list publication service can be used as a reference backbone for establishing IPM solutions for pest monitoring at any NHC.
      C. The data is provided via the GBIF checklist data publication pipeline of the SNSB, through GBIF validation tools, as a Darwin Core Archive (DwC-A, zip format) for GBIF. As a result, the checklist information becomes part of the GBIF network with GBIF ChecklistBank and the GBIF Global Taxonomy. This ensures future compliance of the data with the Findability, Accessibility, Interoperability, and Reuse (FAIR) guiding principles.
      D. The DTN REST Web service for Taxon Lists (currently 60 lists) is registered and accessible through the German Federation for Biological Data (GFBio) Terminology service. As a result, the lists, with external PIDs and other information, are available as a service (see the DTN lists overview).
      In the upcoming Research Data Commons of the German National Research Data Infrastructure (NFDI) Initiative (Diepenbroek et al. 2021), this will be part of a standardized layer of APIs with an agreed interface scheme for improved accessibility. The provided tools, API and data are part of the upcoming NFDI4Biodiversity service portfolio. Future scenarios include the use of the list items and properties as classes for diagnostic purposes with DiversityNaviKey (Triebel et al. 2021), including the publication of images for identifying pests.
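      A hypothetical sketch of how an IPM tool might consume such a list over REST; the endpoint path and response fields below are placeholders, not the documented DTN API.

          # Sketch: fetch a pest taxon list from a REST service (placeholder URL).
          import requests

          BASE = "https://example.org/dtn/api"  # placeholder, not the real service

          resp = requests.get(f"{BASE}/taxonlists/pests/taxa", timeout=30)
          resp.raise_for_status()
          for taxon in resp.json():  # assumed field names
              print(taxon.get("acceptedName"), "-", taxon.get("commonNameEN"))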
      PubDate: Fri, 8 Sep 2023 10:00:28 +0300
       
  • Towards Improved Data Flows for the Management of Invasive Alien Species
           and Wildlife: A LIFE RIPARIAS use case

    • Abstract: Biodiversity Information Science and Standards 7: e112386
      DOI : 10.3897/biss.7.112386
      Authors : Lien Reyserhove, Pieter Huybrechts, Jasmijn Hillaert, Tim Adriaens, Bram D'hondt, Damiano Oldoni : Invasive alien species (IAS) are recognised as a major threat to biodiversity. To prevent the introduction and spread of IAS, European Union Regulation (EU) 1143/2014 imposes an obligation on Member States to both develop management strategies for IAS of Union Concern and report on those interventions. For this, we need to collect and combine management data and streamline management actions. This is still a major challenge: the landscape of IAS management is diverse and includes different authorities, managers, businesses and non-governmental organizations. Some organizations have developed their own specific software applications for recording management actions; for others, such a software system is lacking, and their management data are scattered, not harmonized, and often not openly available. For EU reporting, a workflow is needed to centralize all information about the applied management method, management effort, cost, effectiveness, and the impact of the performed actions on other biota or the environment. At this moment, such a workflow is lacking in Belgium. One of the aims of the LIFE RIPARIAS project is to set up a workflow for harmonizing IAS management data in Belgium. Based on input from the IAS management community in Belgium, we were able to:
      - draft a community-driven data model and exchange format called manIAS (MANagement of Invasive Alien Species, Reyserhove et al. 2022), and
      - identify the minimal requirements a software application should have to be successfully used in the field (Hillaert et al. 2022).
      In this presentation, we will explore both outputs, the lessons learned and the way forward. With our work, we aim to facilitate coordination and the transfer of information between the different actors involved in IAS and wildlife management, not only at the Belgian scale but also in an international context.
      PubDate: Fri, 8 Sep 2023 09:55:27 +0300
       
  • EMODnet Biology: Unlocking European marine biodiversity data

    • Abstract: Biodiversity Information Science and Standards 7: e112147
      DOI : 10.3897/biss.7.112147
      Authors : Ruben Perez Perez, Joana Beja, Leen Vandepitte, Marina Lipizer, Benjamin Weigel, Bart Vanhoorne : EMODnet Biology (hosted and coordinated by the Flanders Marine Institute (VLIZ)) is one of the seven themes within the European Marine Observation and Data network (EMODnet). The EMODnet Biology consortium aims to facilitate the accessibility and usage of marine biodiversity data. With the principle of "collect once, use many times" at its core, EMODnet Biology fosters collaboration across various sectors, including research, policy-making, industry, and individual citizens, to enhance knowledge sharing and inform decision-making. EMODnet Biology focuses on providing free and open access to comprehensive historical and recent data on the occurrence of marine species and their traits in all European regional seas. It achieves this through partnerships and collaboration with diverse international initiatives, such as the World Register of Marine Species (WoRMS), Marine Regions, and the European node of the Ocean Biodiversity Information System (EurOBIS), among others. By promoting the usage of the Darwin Core standard (Wieczorek et al. 2012), EMODnet Biology fosters data interoperability and ensures seamless integration with wider networks such as the Global Biodiversity Information Facility (GBIF) and the Ocean Biodiversity Information System (OBIS); it is a significant data provider for the latter, responsible for most of its data generated in Europe. Since its inception, EMODnet Biology has undertaken actions covering various areas, including:
      - providing access to marine biological data with spatio-temporal, taxonomic, environmental- and sampling-related information, among others;
      - developing an exhaustive data quality control tool based on the Darwin Core standard, the British Oceanographic Data Centre and Natural Environment Research Council Vocabulary Server (BODC NVS2) parameters, and other controlled vocabularies;
      - creating and providing training courses to guide data providers;
      - performing gap analyses to identify data quality and coverage shortcomings;
      - creating and publishing marine biological distribution maps for various species or species groups; and
      - interacting with international and European initiatives, projects and organizations.
      Furthermore, EMODnet Biology contributes to the overall EMODnet initiative, which covers multidisciplinary data and products. Thanks to the use of standard protocols and tools across disciplines, EMODnet Biology products can contribute to multidisciplinary analyses of pressures and impacts on key marine species and habitats and, lastly, support better management and planning of the maritime space. In conclusion, EMODnet Biology plays a pivotal role in biodiversity informatics by providing users with a wealth of accessible and reusable marine biodiversity data and products. Its collaborative approach, extensive partnerships, and adherence to the FAIR (Findable, Accessible, Interoperable, Reusable) data principles (Wilkinson et al. 2016), the Infrastructure for Spatial Information in Europe (INSPIRE) metadata technical guidelines (European Commission Joint Research Centre 2013), and the Open Geospatial Consortium (OGC) standards make it a valuable resource for advancing knowledge, informing policies, and supporting sustainable management of marine ecosystems.
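      A small sketch of the standards-based interoperability described above: resolving a name against the WoRMS REST service to obtain the LSID commonly used as scientificNameID in Darwin Core records. The endpoint follows the public WoRMS API, but treat the exact path and fields as assumptions to verify.

          # Sketch: resolve a taxon name via the WoRMS REST service.
          import requests

          name = "Mytilus edulis"
          url = f"https://www.marinespecies.org/rest/AphiaRecordsByName/{name}"
          resp = requests.get(url, params={"like": "false"}, timeout=30)
          resp.raise_for_status()  # note: WoRMS returns 204 for no match

          best = resp.json()[0]
          print(best["scientificname"], best["lsid"])  # LSID -> scientificNameID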
      PubDate: Fri, 8 Sep 2023 09:40:32 +0300
       
  • Celebrating BHL Australia through the Eye of the (Tasmanian) Tiger

    • Abstract: Biodiversity Information Science and Standards 7: e112352
      DOI : 10.3897/biss.7.112352
      Authors : Nicole Kearney : BHL Australia, the Australian branch of the Biodiversity Heritage Library (BHL), was launched in 2010 and began operation with a single organisation, Museums Victoria in Melbourne. Since then, it has grown considerably. Funded by the Atlas of Living Australia, BHL Australia now digitises biodiversity literature on behalf of 42 organisations across the country. These organisations include museums, herbaria, state libraries, royal societies, government agencies, field naturalist clubs and natural history publishers, many of whom lack the resources to do this work themselves. BHL Australia’s national consortium model, which makes biodiversity literature accessible on behalf of so many organisations, is unique amongst the BHL global community. Most BHL operations digitise material on behalf of a single organisation.
      BHL Australia has now made over 530,000 pages of Australia’s biodiversity knowledge freely accessible online. The BHL Australia Collection includes both published works (books and journals) and unpublished material (collection registers, field diaries and correspondence). The pages of these works are filled with species descriptions, references to historically significant people and, most importantly, scientific data that is critical to ongoing research and conservation efforts. Providing access to materials published as far back as the 1600s and as recently as the current year, the collection chronicles the scientific discovery and understanding of Australia’s biodiversity.
      BHL Australia also leads the global initiative to bring the world's historic biodiversity and taxonomic literature into the modern linked network of scholarly research by incorporating article data into BHL and retrospectively assigning DOIs (Digital Object Identifiers) (Kearney et al. 2021). BHL has now assigned more than 162,000 DOIs to historic publications, making them persistently citable and trackable, both within BHL and beyond. This paper will celebrate the achievements of BHL Australia by journeying through the (now accessible, discoverable and DOI'd) Tasmanian Tiger literature. It will showcase:
      - previously elusive descriptions (and beautiful illustrations) of Thylacines, including those by Gerhard Krefft (1871) https://doi.org/10.5962/p.314741, and John Gould (1863) https://doi.org/10.5962/p.312790;
      - the invaluable creation of links to open access versions from paywalled publications that should be in the public domain, such as the first description of the Thylacine (Harris 1808): open access on BHL, paywalled by Oxford Academic;
      - the many citations of historic taxonomic descriptions that are now appearing as clickable DOI links in modern scholarly articles, taxonomic databases, social media, and Wikipedia (Kearney and Page 2022); and
      - the efforts being made to encourage more authors to cite the authoritative source of taxonomic names (Benichou 2022).
      The extinction of the Thylacine is a stark reminder of the irreversible consequences that arise from a lack of understanding and appreciation of the natural world. Similarly, a lack of access and/or the inability to find biodiversity knowledge hinders our capacity to learn from the past, impeding scientific progress and conservation efforts. The Biodiversity Heritage Library was created “to address a major obstacle to scientific research: lack of access to natural history literature” (BHL 2019). BHL Australia has made a substantial contribution to this global mission and has played a significant role in BHL’s transition to a fully searchable, persistently linkable component of the biodiversity knowledge graph (Kearney 2020, Page 2016).
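As a brief aside, the retrospectively assigned DOIs above are resolvable by machines as well as humans. A minimal sketch using one DOI cited in this abstract, assuming network access and that the registration agency supports standard DOI content negotiation:

```python
# Resolve a BHL-assigned DOI cited above and request machine-readable
# citation metadata via DOI content negotiation (a standard mechanism).
import urllib.request

doi = "10.5962/p.314741"  # Krefft (1871), as cited in the abstract
req = urllib.request.Request(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))  # CSL JSON: title, authors, year, ...
```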
      PubDate: Fri, 8 Sep 2023 09:35:14 +0300
       
  • Implementing CARE Principles to Link Noongar Language and Knowledge to
           Western Science through the Atlas of Living Australia

    • Abstract: Biodiversity Information Science and Standards 7: e112349
      DOI : 10.3897/biss.7.112349
      Authors : Nat Raisbeck-Brown, Denise Smith-Ali : The Atlas of Living Australia (ALA), Australia's national online biodiversity database, is partnering with the Noongar Boodjar Language Centre (NBALC) to promote Indigenous language and knowledge by including Noongar names for plants and animals in the ALA. Names are included in the ALA species page for each plant and animal, and knowledge is built into the Noongar Plant and Animal online Encyclopedia, hosted in the ALA. We demonstrate the use of CARE principles (Collective Benefit, Authority to Control, Responsibility, and Ethics (Carroll et al. 2020)) to engage, support, and deliver the project and outcomes to the Noongar people and communities working with us. The ALA addresses the FAIR principles (Wilkinson et al. 2016) for data management and stewardship, ensuring data are findable, accessible, interoperable, and reusable. The ALA is partnering with NBALC in Perth to ensure all sharing of Noongar data is on Noongar terms. NBALC and ALA have been working with Noongar-Wadjari, a southern clan from the Fitzgerald River area in Western Australia, to collect, protect and share their language and traditional knowledge for local species.*1
      The Noongar Encyclopedia project exhibits Collective Benefit because it is a co-innovation project that was co-designed by NBALC and ALA. The project’s activities were designed by the Community-endorsed representatives, the Knowledge Holders. The aims and aspirations of the Community were included in the project design to ensure equitable outcomes. NBALC’s more-than-25-year relationship with the Community, and the fact that its staff are Noongar people themselves, meant it had a good understanding of what the Community might want from the project. These assumptions were tested and refined during the first Community consultation, before the project plan was finalised. The Community are keen for their traditional knowledge to be shared and freely available to their Community. The ALA only shares knowledge that has passed through strict consent processes. It is seen as a safe and stable digital environment, now and into the future, where the traditional knowledge can be accessed freely and easily. The link to western science knowledge is secondary to knowledge sharing for most of the Aboriginal and Torres Strait Islander Communities that the ALA are working with, although the benefits of scientists having access to both knowledge systems are seen as a positive step in care for Country into the future.
      The Noongar Encyclopedia project ensures Noongar Authority to Control these data because NBALC, as an Aboriginal organisation led by Noongar people, understands the rights and interests of the Communities we are working with. Protection of these rights and inclusion of Community interests are written into the project methodology as part of the project co-design. It is important to ensure the project is working with the right people within the Community. NBALC facilitates this by finding people who hold traditional knowledge and can trace the stories back to their source. The appropriate governance of data is ensured because all collected data are stored and managed by NBALC. Project design includes rolling consent from Knowledge Holders, who review all data collected, add or edit as needed, and give or deny consent for knowledge to be shared publicly through the ALA. The Noongar Encyclopedia project design ensures we understand the Responsibilities (CARE "R") involved in Indigenous data collection, protection, management and sharing.
      Through the partnership with the ALA, the NBALC is expanding its capabilities and capacity for digital data collection and management. The Community is building its capabilities and capacity for working with linguists and scientists. Including the Noongar language and traditional knowledge in the ALA shows non-Indigenous users of the ALA that there is another way to name, look at, talk about and record knowledge about species. This view differs from Western science. The Noongar view everything as connected and group things based on their use and connectivity. Western science tends to classify species based on their physical attributes. Language is the key to this alternate world view. The ALA now publishes the scientific name, the English name and the Noongar word/s. The ALA links to the alternate science view of these species through the Noongar Encyclopedia, and two other Ecological Knowledge Encyclopedias (Kamilaroi and South East Arnhem Land).
      The Noongar Encyclopedia project is constantly subjected to Ethical assessment by the Community and through stringent Western ethical assessments and reviews. The Community ethical assessment included the project undergoing a number of evaluations before it started. The projects are co-designed with NBALC to ensure they are within protocol and community expectations. The ALA are then introduced to the Community. The Community decide if they are interested in the project, if it meets their aspirations, and whether they feel comfortable working with the ALA and, potentially, other scientists. Contributing scientists or academics are introduced to the Community by the NBALC. The Community maintains the right to decline to work with any introduced scientist or academic. All contributors are informed of this protocol before they are introduced to the Community. The Noongar Plant and Animal Encyclopedia was published in September 2021 (NBALC 2021).
      PubDate: Fri, 8 Sep 2023 09:30:45 +0300
       
  • Bidirectional Linking: Benefits, challenges, pitfalls, and solutions

    • Abstract: Biodiversity Information Science and Standards 7: e112344
      DOI : 10.3897/biss.7.112344
      Authors : Guido Sautter, Donat Agosti : Taxonomy, and biodiversity science in general, mainly revolve around four types of entities, which are available digitally in ever-increasing numbers from different services: (1) Physical specimens (kept in museums and other collections around the world) and observations are available digitally via the Global Biodiversity Information Facility (GBIF). (2) DNA sequences (often derived from preserved specimens) are available from the European Nucleotide Archive (ENA) and the National Center for Biotechnology Information (NCBI), with accession numbers as their primary means of citation. (3) Taxa, identified by taxon names, are increasingly registered to nomenclatural reference databases (ZooBank, International Plant Names Index (IPNI)) and aggregated in the Catalogue of Life (CoL). (4) Taxonomic treatments combine the former three: they define taxa, express scientific opinions about existing taxa based upon specimens as well as DNA sequences derived from them, and coin respective names; they are available from TreatmentBank (as well as from Zenodo/Biodiversity Literature Repository (BLR), the Swiss Institute of Bioinformatics Literature Services (SIBiLS), and GBIF).
      Traditionally, treatments cite specimens, taxa, and other treatments in mainly human-centric ways, describing where to find the cited object, but they are not immediately actionable in a digital sense. Specimen citations use institution and collection codes and catalog numbers (often combined with geographical and environmental data). Taxon names are a type of self-citing entity, especially when given in combination with their (bibliographic) authorship, as they represent a historical approach to human-readable taxon identifiers. Citations of treatments are very similar to those of taxon names, adding (bibliographic) information of subsequent name usages as needed. Accession numbers for DNA sequences are the closest to modern digital identifiers. However, none of these means of citation, as usually found in literature, are readily machine-actionable, which makes them hard to process at scale and analyze programmatically. Identifiers coined by the various data providers, in combination with APIs to resolve them, alleviate this problem and enable computational navigation of such links. However, this alone only defers the problem, as actionable identifiers (e.g., HTTP URIs) at some point still need to be inferred from the information given in the traditional means of citation where the latter occur in data.
      Recent projects, like BiCIKL, aim to add machine-navigable links to the various entities (or respective data records) at scale, in pursuit of (ideally) fully intermeshed records, connecting (1) treatments to subject taxon names and concepts, cited specimens and DNA sequences, as well as cited treatments (with explicit nomenclatural implications, e.g., taxon name synonymies or rebuttals thereof), (2) (digital) specimens to assigned taxon names, citing treatments, and any derived DNA sequences, (3) DNA sequences to source specimens (or their digital counterparts), where applicable, assigned taxon names, and citing treatments, and (4) taxon names to defining and synonymizing treatments, associated (digital) specimens, and any derived DNA sequences. This removes possible issues with transitive dependencies in a sequence of links, as an intermediate point of failure; all major data providers have been doing this to various degrees for some time, which provides a great starting point, but several challenges and pitfalls remain. For valid technical reasons, the systems of the individual data providers are (and need to be) self-contained, which comes at the cost of a certain amount of duplication (e.g., GBIF and ENA/NCBI backbone taxonomies). This is unproblematic per se, but slows down update proliferation and can incur some discrepancies. Further, traditional human-readable identifiers can be somewhat ambiguous: (1) some institution and collection codes are not unique, or authors use them in non-standard ways (some codes in the Global Registry of Scientific Collections (GrSciColl) point to half a dozen different institutions, for instance); (2) certain catalog numbers of museum specimens are also valid (resolvable) accession numbers, with actual semantics only emerging from context; (3) absence of the latter renders the semantics of data presented in tables especially hard to infer; (4) none of the providers has complete data coverage, so linking is not even technically possible in all cases at any given point, and some links can only be added over time, as coverage and thus overlap between data increases (newly published names cannot possibly be in CoL when the defining treatment gets digitized, for instance); (5) occasional full re-computation or re-processing is impractical and wasteful at best.
      In this presentation, we discuss various ways of overcoming the outlined challenges and avoiding the described pitfalls, and also make related suggestions for APIs to better support respective mechanisms.
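To make the contrast with human-centric citations concrete, here is a minimal sketch of machine-actionable resolution against two of the services named above; the identifier values are placeholders, and only the services' documented public endpoints are assumed:

```python
# Sketch of machine-actionable linking: resolve records at two of the
# services named above via their public REST APIs. Identifier values are
# placeholders invented for illustration.
import json, urllib.request

def fetch_json(url: str) -> dict:
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# 1) A GBIF occurrence record, by its GBIF occurrence key (placeholder value).
occ = fetch_json("https://api.gbif.org/v1/occurrence/1258202889")
print(occ.get("scientificName"), occ.get("institutionCode"))

# 2) An ENA sequence record, by accession number (placeholder value),
#    in EMBL flat-file format via the ENA Browser API.
with urllib.request.urlopen(
        "https://www.ebi.ac.uk/ena/browser/api/embl/AB123456") as resp:
    print(resp.read(200).decode())  # first bytes of the EMBL entry
```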
      PubDate: Fri, 8 Sep 2023 09:25:47 +0300
       
  • The Role of the OLS Program in the Development of echinopscis (an
           Extensible Notebook for Open Science on Specimens)

    • Abstract: Biodiversity Information Science and Standards 7: e112318
      DOI : 10.3897/biss.7.112318
      Authors : Nicky Nicolson, Eve Lucas : Starting in early 2022, biodiversity informatics researchers at Kew have been developing echinopscis: an "extensible notebook for open science on specimens". This aims to build on the early experiments that our community conducted with "e-taxonomy": the development of tools and techniques to enable taxonomic research to be conducted online. Early e-taxonomic tools (e.g., Scratchpads; Smith et al. 2011) had to perform a wide range of functions, but in the past decade or so the move towards open science has built better support for generic functionality, such as reference management (Zotero) and document production (pandoc), skills development in automation and revision control to support reproducible science, as documented by the Turing Way (The Turing Way Community 2022), and an awareness of the importance of community building. We have developed echinopscis at Kew via a cross-departmental collaboration between researchers in biodiversity informatics and accelerated taxonomy. We have also benefitted from valuable input and advice from our many colleagues in associated projects and organisations around the world.
      OLS (originally Open Life Sciences) is a training and mentoring program for Open Science leaders with a focus on community building. The name was recently (2023) made more generic, "Open Seeds", whilst retaining the well-known acronym "OLS"*1. OLS is a 16-week cohort-based mentoring program. Participants apply to join a cohort with a project that is developed through the 16 weeks. Each week of the syllabus alternates between time with a dedicated Open Science mentor and cohort calls, which are used to develop skills in project design, community building, open development and licensing, and inclusivity. Over 500 practitioners, experts and learners have participated across the seven completed cohorts of OLS' Open Seeds training and mentoring. Through this programme, over 300 researchers and open leaders from across six continents have designed, launched and supported 200 projects from different disciplines worldwide. The next cohort will run between September 2023 and January 2024, and will be the eighth iteration of the program.
      This talk will briefly outline the work that we have done to set up and experiment with echinopscis, but will focus on the impact that the OLS program has had in its development. We will also describe the use of techniques learned through OLS in other biodiversity informatics projects. OLS acknowledges that their program receives relatively few applications from project leads in biodiversity, and we hope that this talk will be informative for Biodiversity Information Standards (TDWG) participants and can be used to build productive links between these communities.
      PubDate: Fri, 8 Sep 2023 09:20:29 +0300
       
  • Integration of Ecosystem Services and Habitats into the Biodiversity Atlas
           Austria

    • Abstract: Biodiversity Information Science and Standards 7: e112315
      DOI : 10.3897/biss.7.112315
      Authors : Tanja Lumetsberger, Georg Neubauer, Reinhardt Wenzina : The Biodiversity Atlas Austria (“Biodiversitäts-Atlas Österreich”) is a data portal to explore Austria’s biodiversity. It is based on the open-source infrastructure of the Atlas of Living Australia (ALA) and was launched with the support of the Living Atlas (LA) community in late 2019 by the Biodiversity Hub of the University of Continuing Education Krems, funded by the Government of Lower Austria. At present, it stores more than 8.5 million species occurrence records from various data partners and institutions and is available in both English and German. The Atlas runs on two virtual machines with 4 TB of storage and hosts many of the ALA-developed tools and services, such as collectory, biocache, biodiversity information explorer, regions, spatial portal, sensitive data service, lists, images, and dashboard.
      In the project “ÖKOLEITA” (2021-2023), two new tools were developed within the existing LA infrastructure and will be launched in late 2023 to allow users to deal with ecosystem services and habitat data.
      The “ecosys” tool will allow management, visualization, and analysis of ecosystem services by uploading different (raster or vector) TIFF files containing mapped ecosystem services to the GeoServer. Users will be able to inspect various ecosystem services at a specific geolocation, or compare different geolocations or a transect with respect to their ecosystem service potential. The ecosystem service values are presented on the one hand as pictograms, where the value is transformed into quintiles following the work of Schreder et al. (2018), and on the other as a bar chart showing the true values.
      The “habitat” tool will store and manage datasets of habitat mappings (shapefiles) and allow users to spatially explore those various habitat mappings on a map. Users will be able to search for specific habitats across all datasets or a specific one, and get all occurrences of this habitat type returned. Through linkage to the biocache, a click on a specific area reveals the list of species found within that habitat recording, as well as all the species occurrences within that area stored in the database. A “habitat backbone” of the most widely used habitat classifications in Austria will make it possible to deal with habitat mappings that use different classifications.
      Both tools are integrated into the Living Atlases infrastructure and communicate with the other tools and services of the Biodiversity Atlas Austria (Fig. 1). They share a common administration back-end but have different front-ends, where the users can explore the ecosystem services and habitats spatially and in connection with species occurrence records and other contextual information.
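A minimal sketch of the quintile transformation used for the pictograms, with invented values and break points (the ÖKOLEITA implementation follows Schreder et al. 2018):

```python
# Map raw ecosystem service values onto classes 1-5 for pictogram display.
# Sample values are illustrative assumptions, not the ÖKOLEITA data.
import numpy as np

values = np.array([0.2, 1.1, 3.4, 0.9, 2.2, 4.8, 1.7, 0.4, 2.9, 3.9])
breaks = np.percentile(values, [20, 40, 60, 80])   # quintile boundaries
classes = np.digitize(values, breaks) + 1          # classes 1..5
print(list(zip(values, classes)))
```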
      PubDate: Fri, 8 Sep 2023 09:15:54 +0300
       
  • The Global Biodata Coalition: Towards a sustainable biodata infrastructure

    • Abstract: Biodiversity Information Science and Standards 7: e112303
      DOI : 10.3897/biss.7.112303
      Authors : Chuck Cook, Guy Cochrane : Progress in life and biomedical sciences depends absolutely on biodata resources—databases comprising biological data and services around those databases. Supporting scientists in data operations, and spanning management, analysis and publication of newly generated data and access to pre-existing reference data, these biodata resources together comprise a critical infrastructure for life science and biomedical research. Familiar scientific infrastructures, for example the Conseil Européen pour la Recherche Nucléaire (CERN) or the Square Kilometre Array, are distinct, constructed, physical entities that are centrally funded and managed at one or more identifiable locations. By contrast, the primary infrastructure of the life sciences—comprising databases and other biological data resources—is globally distributed, virtually connected, funded from multiple sources, and is not managed as a coordinated entity. While this configuration supports innovation, it lends itself poorly to the long-term sustainability of individual biodata resources and of the infrastructure as a whole. The Global Biodata Coalition (GBC) brings together life science research funding organisations that recognise these challenges, acknowledge the threat that the lack of sustainability poses, and agree to work together to find ways to improve sustainability.
      In the presentation, we will provide an overview of the global biodata resource infrastructure, focusing in particular on the challenges of providing sustained long-term funding to the resources that comprise the infrastructure. This will provide a global context to other presentations in the session, which focus on biodata resources in Australia.
      Covering some of the work that GBC has carried out to understand and classify biodata resources and the entire biodata resource infrastructure, we will outline the Global Core Biodata Resource programme and Inventory project, and also introduce the stakeholder consultation processes around approaches to sustainability and open data. Finally, we will lay out the path GBC is taking to engage researchers, informaticians, funding organisations and other stakeholders in moving towards greater sustainability for these critical resources.
      PubDate: Fri, 8 Sep 2023 09:10:13 +0300
       
  • FAIR but not Necessarily Open: Sensitive data in the domain of
           biodiversity

    • Abstract: Biodiversity Information Science and Standards 7: e112296
      DOI : 10.3897/biss.7.112296
      Authors : Patricia Mergen, Sofie Meeus, Frederik Leliaert : In the framework of implementing the European Open Science Cloud (EOSC), there is still confusion between the concept of data FAIRness (Findable, Accessible, Interoperable and Re-usable; Wilkinson et al. 2016) and the idea of open and freely accessible data, which are not necessarily the same. Data can indeed comply with the requirements of FAIRness even if their access is moderated or behind a paywall. Therefore, the motto of EOSC is actually “As open as possible, as closed as necessary”. This confusion or misinterpretation of definitions has raised concerns among potential data providers, who fear being obligated to make sensitive data openly accessible and freely available even if there are valid reasons for restrictions, or being forced to forgo charges or profit-making if the data generate revenue. As a result, there has been some reluctance to fully engage in the activities related to FAIR data and the EOSC.
      When addressing sensitive data, what comes to mind are personal data governed by the General Data Protection Regulation (GDPR), as well as clinical, security, military, or commercially valuable data protected by patents. In the domain of biodiversity or natural history collections, it is often reported that these issues surrounding sensitive data regulations have less impact, especially when contributors are properly cited and embargo periods are respected. However, there are cases in this domain where sensitive data must be considered for legal or ethical purposes. Examples include protected or endangered species, where the exact geographic coordinates might not be shared openly to avoid poaching; cases of Access and Benefit Sharing (ABS), depending on the country of origin of the species; respect for traditional knowledge; and a desire to limit the commercial exploitation of the data. The requirements of the Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization to the Convention on Biological Diversity, as well as the upcoming Digital Sequence Information (DSI) regulations, play an important role here. The Digital Services Act (DSA), which sets the interoperability requirements for operators of data spaces, was recently adopted with the aim of protecting the digital space against the spread of illegal content. This raises questions about the actual definition of data spaces and how they would be affected by this new European legislation, which, though European, has a worldwide impact on widely used social media and content platforms such as Google and YouTube.
      During the implementation and updating activities in projects and initiatives like the Biodiversity Community Integrated Knowledge Library (BiCIKL), it became clear that there is a need to offer a secure data repository and management system that can deal with both open and non-open data, in order to effectively include all potential data providers and mobilise their content while adhering to FAIR requirements.
      In this talk, after a general introduction about sensitive data, we will provide several examples in the biodiversity and natural sciences domains of how to deal with sensitive data and their management, as recommended by GBIF. Last, but not least, we will highlight how important it is to use internationally accepted standards, such as those from Biodiversity Information Standards (TDWG), to achieve such developments in the context of the Biodiversity Knowledge Hub (BKH) implemented by BiCIKL. Notably, providing clear metadata about terms of use, citation requirements and licensing makes actual re-use of the data possible, both legally and efficiently.
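One widely recommended handling of the protected-species case above is coordinate generalisation before open publication. A minimal sketch of the idea, with an illustrative grid size rather than any prescribed value:

```python
# Generalise coordinates of a sensitive occurrence record before open
# publication, a common approach for protected species. The 0.1-degree
# grid used here is an illustrative choice, not a prescribed value.
import math

def generalise(lat: float, lon: float, grid: float = 0.1) -> tuple[float, float]:
    """Snap coordinates to the centre of a coarse grid cell."""
    snap = lambda x: math.floor(x / grid) * grid + grid / 2
    return round(snap(lat), 4), round(snap(lon), 4)

print(generalise(-33.86785, 151.20732))  # -> (-33.85, 151.25)
```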
      PubDate: Fri, 8 Sep 2023 09:06:56 +0300
       
  • The Impossible Museum: A national infrastructure to digitise the UK’s
           natural science collections

    • Abstract: Biodiversity Information Science and Standards 7: e112294
      DOI : 10.3897/biss.7.112294
      Authors : Vincent Smith, Helen Hardy, Laurence Livermore, Lisa French, Tara Wainwright, Josh Humphries : The Distributed System of Scientific Collections UK (DiSSCo United Kingdom, Smith et al. 2022) is a proposal to the UK Research and Innovation (UKRI) Infrastructure Programme to revolutionise how we manage, share and use the UK’s natural science collections, creating a distributed network that provides a step change in research infrastructure for the UK. While the physical integration of such a collection would be almost inconceivable, its digital integration is within reach. Building on the UK Natural History Museum’s (NHM) digitisation programme and in partnership with more than 90 collection-holding institutions across the length and breadth of the UK, DiSSCo UK seeks to unlock the full scientific, economic and social benefits of the UK’s natural science collections, which are presently constrained by the limits of physical access. With just 8% of the UK’s 137 million specimens currently available digitally, their role in the emerging biodiversity data revolution is diminished. Through nationally coordinated action, DiSSCo UK seeks to massively accelerate the digitisation of these collections and the impact of these data. Five options to digitise UK collections are presently under consideration. These options vary in the collection groups covered, the number and type of institutions involved, the level of technical infrastructure, and the degree of "catalysis" applied to capitalise on the benefits of unlocking data and accelerating data production. Subject to UKRI approval, the full business case for a preferred option will go through an 18–24 month approval process starting November 2023, unlocking tens to hundreds of millions of pounds of investment in UK collections. We will outline the strategic case, options and operational model for DiSSCo UK, providing updates on our coordination, digitisation and catalysis activities.
      PubDate: Thu, 7 Sep 2023 11:36:36 +0300
       
  • Towards a Distributed System for Essential Variables for the Southern
           Ocean

    • Abstract: Biodiversity Information Science and Standards 7: e112289
      DOI : 10.3897/biss.7.112289
      Authors : Anton P. van de Putte, Yi-Ming Gan, Alyce Hancock, Ben Raymond : The Southern Ocean (SO), delineated to the north by the Antarctic convergence, is a unique environment that experiences rapid change in some areas while remaining relatively untouched by human activities. At the same time, these ecosystems are under severe threat from climate change and other stressors. While our understanding of SO biological processes (e.g., species distributions, feeding ecology, reproduction) has greatly improved in recent years, biological data for the region remain patchy, sparse, and unstandardised, depending on the taxonomic group (Griffiths et al. 2014).
      Due to the scarcity of standardised observations and data, it is difficult to model and predict SO ecosystem responses to climate change, which is often accompanied by other anthropogenic pressures, such as fishing and tourism. Understanding the dynamics and change in the SO necessitates a comprehensive system of observations, data management, scientific analysis, and ensuing policy recommendations. It should be built as much as feasible from current platforms and standards, and it should be visible, verifiable and shared in accordance with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles (Van de Putte and Griffiths 2021). For this we need to identify the stakeholders' needs, the sources of data, the algorithms for analysing the data and the infrastructure on which to run the algorithms (Benson and Brooks 2018). Existing synergistic methods for identifying selected variables for (life) monitoring include Essential Biodiversity Variables (EBVs; Pereira and Ferrier 2013), Essential Ocean Variables (EOVs; Miloslavich and Bax 2018), Essential Climate Variables (ECVs; Bojinski and Verstraete 2014), and ecosystem Essential Ocean Variables (eEOVs; Constable and Costa 2016). (For an overview see Muller-Karger and Miloslavich 2018.) These variables can be integrated into the Southern Ocean Observing System (SOOS) and SOOSmap, but also into national or global systems (e.g., the Group on Earth Observations Biodiversity Observation Network (GEO BON)). The resulting data products can in turn be used to inform policy makers.
      The use of Essential Variables (EVs) marks a significant step forward in the monitoring and assessment of SO ecosystems. However, these EVs will necessitate prioritising certain variables and data collection. Here we present the outcomes of a workshop organised in August 2023 that aimed to outline the set of Essential Variables and workflows required for a distributed system that can translate biodiversity data (and environmental data) into policy-relevant data products. The goals of the workshop were to:
      - create an inventory of EVs relevant for the Southern Ocean, based on existing efforts by GEO BON and the Marine Biodiversity Observation Network (MBON);
      - identify data requirements and data gaps for calculating such EVs, and prioritise EVs to work on;
      - identify existing workflows and tools; and
      - develop a framework for building the workflows required to turn public biodiversity data into relevant EVs.
      PubDate: Thu, 7 Sep 2023 11:33:25 +0300
       
  • A Novel Part in the Swiss Army Knife for Linking Biodiversity Data: The
           digital specimen identifier service

    • Abstract: Biodiversity Information Science and Standards 7: e112283
      DOI : 10.3897/biss.7.112283
      Authors : Wouter Addink, Soulaine Theocharides, Sharif Islam : Digital specimens are new information objects on the internet, which act as digital surrogates of the physical objects they represent. They are designed to be extended with data derived from the specimen, such as genetic, morphological and chemical data, and with data that puts the specimen in the context of its gathering event and the environment it was derived from. This requires linking the digital specimens and their related entities to information about agents, locations, publications, taxa and environmental information. To establish reliable links and (re-)connect data to specimens, a new framework is needed that creates persistent identifiers (PIDs) for the digital specimen and its related entities. These PIDs should be actionable by machines but also usable by humans for data citation and communication purposes.
      The framework that enables this is a new PID infrastructure, produced by the European Commission-funded BiCIKL project (Biodiversity Community Integrated Knowledge Library), which creates persistent and actionable identifiers. It is a generic PID infrastructure that will be used by the Distributed System of Scientific Collections research infrastructure (DiSSCo), but it can also be used by other infrastructures and institutions. PIDs minted by DiSSCo will be linked to the digital specimens and samples provided through DiSSCo. The new PIDs are a key element in enabling the concept of Digital Extended Specimens (Webster et al. 2021) and provide unique and resolvable references to enable bidirectional linking. DiSSCo has done extensive work to select the most appropriate PID scheme (Hardisty et al. 2021) and to design a PID infrastructure for the pan-European specimens. The draft design has been discussed with technical specialists in the joint DiSSCo and Consortium of European Taxonomic Facilities (CETAF) community and with international stakeholders like the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio), and was discussed at the 2022 conference of the Society for the Preservation of Natural History Collections (SPNHC). A first implementation was demonstrated at the Biodiversity Information Standards (TDWG) annual conference in 2022 and illustrated key elements of the design.
      To be able to provide digital specimen identifiers as DOIs (Digital Object Identifiers), a pilot project was started in 2023 with DataCite to investigate whether Digital Specimen DOIs in the new PID infrastructure can be created using the DataCite service. The pilot aim was to create metadata crosswalks to the DataCite schema in consultation with the DataCite Metadata Working Group, to evaluate synergies with the IGSN (International Generic Sample Number) metadata schema, to develop and test PID kernel metadata registration, and to evaluate performance and the impact of using DataCite services. There are around two billion specimens, and creating PIDs for them as DOIs requires creating DOIs at an unprecedented scale. PID kernel metadata registration is also new for DOIs. The included metadata for specimens will complement existing biodiversity information standards such as Darwin Core, and support the new MIDS (Minimum Information about a Digital Specimen) standard that is under development.
      The design, development and testing of the new PID infrastructure is being done as part of the BiCIKL project, which aims to foster collaboration between infrastructures and develop bidirectional connections (Penev et al. 2022). In the session, we will demonstrate the results of developing the PID infrastructure as part of the BiCIKL toolbox for linking biodiversity data, and discuss the progress with creating digital specimen DOIs.
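For illustration only, a minimal sketch of minting a draft DOI through DataCite's public REST API (test environment); the endpoint and JSON:API envelope follow DataCite's documentation, while the credentials, prefix and metadata values are placeholders, and the pilot's actual crosswalked metadata is far richer:

```python
# Register a draft DOI for a digital specimen via the DataCite REST API
# (test environment). Credentials, prefix and metadata values below are
# placeholders for illustration.
import base64, json, urllib.request

payload = {
    "data": {
        "type": "dois",
        "attributes": {
            "prefix": "10.82621",  # placeholder prefix
            "titles": [{"title": "Digital specimen (placeholder title)"}],
            "creators": [{"name": "Example Institution"}],
            "publisher": "DiSSCo",
            "publicationYear": 2023,
            "types": {"resourceTypeGeneral": "PhysicalObject"},
        },
    }
}
req = urllib.request.Request(
    "https://api.test.datacite.org/dois",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/vnd.api+json",
        "Authorization": "Basic "
        + base64.b64encode(b"REPO_ID:PASSWORD").decode(),  # placeholders
    },
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["data"]["id"])  # the newly minted draft DOI
```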
      PubDate: Thu, 7 Sep 2023 11:30:21 +0300
       
  • Biodata Infrastructure within Australia and Beyond: Landscapes and
           horizons

    • Abstract: Biodiversity Information Science and Standards 7: e112274
      DOI : 10.3897/biss.7.112274
      Authors : Jeff Christiansen, Kathryn Hall : In current life science practice, digital data are associated with all parts of the research lifecycle. Data generation and management are planned for during project conception; data are collected from numerous instruments or existing sources, prepared for analysis and analysed to generate new knowledge and information, and then (hopefully) preserved so that they may be found, shared and re-used by others when appropriate.
      This session will begin with a scan of the biodata and biodata infrastructure landscape within Australia. We will explore which organisations fund biodata generation, where data are processed and stored, and how data are made available for reuse by others. Important global and complementary data resources that are hosted offshore will also be discussed. To guarantee reproducibility and integrity for life sciences research, it is critical that each of these infrastructures (whether hosted on- or off-shore) is maintained for the long term.
      As an example of a resource that utilises a mixture of existing on- and off-shore data infrastructures to underpin a critical research need, the Australian Reference Genome Atlas (ARGA) will be discussed. ARGA is solving the problem of genomics data obscurity for Australian-relevant species by creating an online platform where life sciences researchers can comprehensively and confidently search for genomic data for taxa relevant to Australian research. Publicly available genomics (and genetics) data are aggregated and indexed from multiple sources (both on- and off-shore), and then integrated with occurrence records and the taxonomic frameworks of the Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA) to enrich the genomic data and make them searchable using taxonomy, location, ecological characteristics and selected phenotypic data.
      The presentation sets the scene for a subsequent talk by members of the Global Biodata Coalition (GBC), who will outline the challenges in sustaining the types of distributed infrastructure discussed, and the GBC’s work with the funders who support many of these resources to ensure long-term funding for existing infrastructure, while also channelling support to underpin future growth in data volumes and new technologies.
      PubDate: Thu, 7 Sep 2023 11:21:44 +0300
       
  • Retiring TDWG Standards and How Mapping Standards Could Support
           Agility

    • Abstract: Biodiversity Information Science and Standards 7: e112258
      DOI : 10.3897/biss.7.112258
      Authors : Kristen "Kit" Lewers : Since its genesis in September 1985, TDWG (formerly the Taxonomic Databases Working Group, now Biodiversity Information Standards) has become the steward of standards for the biodiversity informatics community; however, there is not yet a process for retiring standards. This talk will educate community members on the TDWG standard categories of "Current Standard", "Prior Standard" and "2005 Standard", and the history and context of how these categories came to be. It will also report on the progress the TAG (Technical Architecture Group) has made on moving towards creating a process for retiring standards through auditing, community participation, and other methods. Mapping TDWG standards can provide the agility to address overlaps, gaps, contradictions and/or inconsistencies between standards in a proactive manner. Mapping standards' relationships provides infrastructure to support decision-making, combat information overload, and give context to the community as it continues to progress at a rapid pace. More specifically for TDWG, it gives a clear picture of how updating, ratifying and/or retiring a single standard impacts the greater TDWG information ecosystem, and how to update adjacent standards to preserve clarity and consistency for the community as a whole.
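A toy sketch of the mapping idea: represent relationships between standards as a small directed graph and ask which standards would be affected by retiring one. The edges below are hypothetical, purely to illustrate the mechanism:

```python
# Hypothetical dependency graph between standards; an edge A -> B means
# "A builds on B". Used to find what a retirement decision would impact.
from collections import deque

depends_on = {
    "Standard A": ["Core Standard"],   # invented example edges
    "Standard B": ["Core Standard"],
    "Guide C":    ["Standard A"],
}

def impacted_by(retired: str) -> set[str]:
    """All standards that transitively build on the retired one."""
    reverse: dict[str, list[str]] = {}
    for src, targets in depends_on.items():
        for t in targets:
            reverse.setdefault(t, []).append(src)
    seen, queue = set(), deque([retired])
    while queue:
        for nxt in reverse.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(impacted_by("Core Standard"))  # {'Standard A', 'Standard B', 'Guide C'}
```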
      PubDate: Thu, 7 Sep 2023 11:18:18 +0300
       
  • Elevating the Fitness of Use of GBIF Occurrence Datasets: A proposal for
           peer review

    • Abstract: Biodiversity Information Science and Standards 7: e112237
      DOI : 10.3897/biss.7.112237
      Authors : Vijay Barve : Biodiversity data plays a pivotal role in understanding and conserving our natural world. As the largest occurrence data aggregator, the Global Biodiversity Information Facility (GBIF) serves as a valuable platform for researchers and practitioners to access and analyze biodiversity information from across the globe (Ball-Damerow et al. 2019). However, ensuring the quality of GBIF datasets remains a critical challenge (Chapman 2005).
      The community emphasizes the importance of data quality and its direct impact on the fitness of use for biodiversity research and conservation efforts (Chapman et al. 2020). While GBIF continues to grow in terms of the quantity of data it provides, the quality of these datasets varies significantly (Zizka et al. 2020). The biodiversity informatics community has been working diligently to ensure data quality at every step of data creation, curation, publication (Waller et al. 2021), and end-use (Gueta et al. 2019) by employing automated tools and flagging systems to identify and address issues. However, there is still more work to be done to effectively address data quality problems and enhance the fitness of use for GBIF-mediated data.
      I highlight a missing component in GBIF's data publication process: the absence of formal peer reviews. Despite GBIF encompassing the essential elements of a data paper, including detailed metadata, data accessibility, and robust data citation mechanisms, the lack of peer review hinders the credibility and reliability of the datasets mobilized through GBIF.
      To bridge this gap, I propose the implementation of a comprehensive peer review system within GBIF. Peer reviews would involve subjecting GBIF datasets to rigorous evaluation by domain experts and data scientists, ensuring the accuracy, completeness, and consistency of the data. This process would enhance the trustworthiness and usability of datasets, enabling researchers and policymakers to make informed decisions based on reliable biodiversity information.
      Furthermore, the establishment of a peer review system within GBIF would foster collaboration and knowledge exchange among the biodiversity community, as experts provide constructive feedback to dataset authors. This iterative process would not only improve data quality but also encourage data contributors to adhere to best practices, thereby elevating the overall standards of biodiversity data mobilization through GBIF.
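The automated flagging the abstract refers to is already visible in GBIF's occurrence API. A small sketch that tallies the quality flags on records of one dataset; the dataset key is a placeholder:

```python
# Inspect the automated quality flags GBIF attaches to occurrence records.
# The "issues" field is part of GBIF's public occurrence API; the dataset
# key below is a placeholder, not a real dataset.
import json, urllib.request
from collections import Counter

url = ("https://api.gbif.org/v1/occurrence/search?"
       "datasetKey=PLACEHOLDER-DATASET-KEY&limit=100")
with urllib.request.urlopen(url) as resp:
    results = json.load(resp)["results"]

flag_counts = Counter(issue for rec in results for issue in rec.get("issues", []))
print(flag_counts.most_common(10))  # e.g., COORDINATE_ROUNDED, ...
```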
      PubDate: Thu, 7 Sep 2023 11:16:52 +0300
       
  • Semantic Mapping of the Geologic Time Scale: A temporal reference

    • Abstract: Biodiversity Information Science and Standards 7: e112232
      DOI : 10.3897/biss.7.112232
      Authors : Susan Edelstein, Ben Norton : The Geologic Time Scale is an ordered hierarchical set of terms representing specific time intervals in Earth's history. The hierarchical structure is correlated to the geologic record and major geologic events in Earth’s history (Gradstein et al. 2020). In the absence of quantitative numeric age values from absolute dating methods, the relative time intervals in the geologic time scale provide us with the vocabulary needed for deciphering Earth’s history and chronological reconstruction. This temporal frame of reference is critical to establishing correlations between specimens and how they fit within the Earth’s 4.567 Ga (giga annum) history.
      Due to spatial and temporal variations in the stratigraphic record, the terminology used in conjunction with geologic time scales is largely inconsistent. For a detailed discussion regarding term use in geologic timescales, see Cohen et al. (2013). As a result, published values for geologic timescale terms are often ambiguous and highly variable, limiting interoperability and hindering temporal correlations among specimens. A solution is to map verbatim geologic timescale values to a controlled vocabulary, constructing a single temporal frame of reference. The harmonization process is governed by an established set of business rules that can ultimately become fully or partially automated.
      In this study, we examined the Global Biodiversity Information Facility’s (GBIF) published distinct verbatim values for Darwin Core terms in the GeologicalContext class of Darwin Core to assess the use of chronostratigraphic terms, a process highlighted in Sahdev et al. (2017). Preservation of these verbatim values, the initial unmapped set of published values, is important. Many are derived directly from primary source material and possess special historical and regional significance. These include land mammal ages (e.g., Lindsay (2003)), biostratigraphic zones, regional terms, and terms with higher granularity than the International Commission on Stratigraphy’s (ICS) timescale allows (e.g., subages/substages). For the purposes of this study, we selected the 2023/6 version of the ICS chronostratigraphic timescale as the controlled vocabulary (Cohen et al. 2023). The ICS timescale is the most widely adopted, comprising the most generalized and universally applicable intervals of geologic time.
      After semantic analysis of the verbatim values (see Table 1 for comparative statistics), we established a comprehensive set of business rules to map to the ICS timescale controlled vocabulary. This process yielded a collection of documented procedures to transform the heterogeneous collection of published terms into a semantically consistent dataset. The end result is a single temporal frame of reference for published geologic and paleontological specimens, improving temporal correlation among specimens globally through data interoperability. This talk will highlight the process of harmonizing a heterogeneous collection of published verbatim Geologic Time Scale values with an established controlled vocabulary through semantic mapping.
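A minimal sketch of the rule-based harmonisation described above: normalise a verbatim value, then look it up against the controlled vocabulary. The tiny synonym table is an illustrative assumption; the study's actual business rules are far more extensive:

```python
# Map verbatim geologic time values to ICS terms. The mapping table is a
# tiny illustrative assumption, not the study's full rule set.
ICS_SYNONYMS = {
    "upper cretaceous": "Late Cretaceous",   # position -> age convention
    "lower jurassic":   "Early Jurassic",
    "pennsylvanian":    "Pennsylvanian",     # already an ICS term
}

def map_to_ics(verbatim: str) -> str | None:
    """Return the ICS chronostratigraphic term, or None if unmappable."""
    key = " ".join(verbatim.strip().lower().replace("-", " ").split())
    return ICS_SYNONYMS.get(key)

for v in ["Upper  Cretaceous", "LOWER JURASSIC", "Clarendonian"]:
    print(v, "->", map_to_ics(v))  # last one: regional term, left unmapped
```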
      PubDate: Thu, 7 Sep 2023 10:02:24 +0300
       
  • Want to Describe and Share Biodiversity Inventory and Monitoring Data? The
           Humboldt Extension for Ecological Inventories Can Help!

    • Abstract: Biodiversity Information Science and Standards 7: e112229
      DOI : 10.3897/biss.7.112229
      Authors : Yanina Sica, Wesley Hochachka, Yi-Ming Gan, Kate Ingenloff, Dmitry Schigel, Robert Stevenson, Steven Baskauf, Peter Brenton, Anahita Kazem, John Wieczorek : Access to high-quality ecological data is critical to assessing and modeling biodiversity and its changes through space and time. The Darwin Core standard has proven to be immensely helpful in sharing species occurrence data (see Wieczorek et al. 2012, Global Biodiversity Information Facility, GBIF) and promoting biodiversity research following the FAIR principles of findability, accessibility, interoperability and reusability (Wilkinson et al. 2016). However, it is limited in its ability to fully accommodate inventory data (i.e., linked records of multiple taxa at a specific place and time). Information about the inventory processes is often either unreported or described in an unstructured manner, limiting its potential re-use for larger-scale analyses. Two key aspects that are not yet captured in a structured manner are: i) information about the species that were not detected during an inventory, and ii) ancillary information about sampling effort and completeness.
      Non-detections (i.e., reported counts of zero) potentially enable more accurate and precise estimates of distribution, abundance, and changes in abundance. This becomes possible when variation in effort is used to estimate the likelihood that a non-detection represents a true absence of that taxon during the inventory. Currently, ecological inventory data, when shared at all, are typically discoverable through dataset catalogs (e.g., governmental data repositories) and supplementary materials to publications. With few exceptions, indexing of such data with the detail and structure needed has not been attempted at broad temporal and spatial scales, despite the potentially high value resulting from making inventory data more readily accessible.
      To address these limitations in documenting inventory data using the Darwin Core, Guralnick et al. (2018) proposed the Humboldt Core. Subsequent discussions within the biodiversity standards community made it clear that greater integration could be achieved by creating an extension of the Darwin Core, rather than developing a new standard in isolation. Extension design work began in 2021, and progress has been reported by Brenton (2021) and Sica et al. (2022). Over the last year, the Humboldt Extension Task Group has sought advice from data providers and aggregators and updated its vocabulary terms. A challenging aspect has been creating terminology for the parent-child relationships (see Properties of Hierarchical Events) needed to describe surveys that may be as simple as a collection of checklists (one level of hierarchy) or as complex as species records from traps within plots along transects across habitats over multiple years (at least four levels of hierarchy). The Task Group has committed to completing a User Guide for the Humboldt Extension. Group members who contributed to the Darwin Core (Darwin Core Task Group 2009) and the Vocabulary Maintenance Specification (Vocabulary Maintenance Specification Task Group 2017) have provided valuable expertise on term refinement and process.
      Through ratification of the Humboldt Extension as a Darwin Core Event extension, we expect to provide the community with a usable solution, tied to well-established data publication mechanisms, for sharing and using inventory data. This effort promises to overcome a key bottleneck in the sharing of critically important ecological data, enhancing data discoverability, interoperability and re-use while lowering reporting burden and data and metadata heterogeneity. Global data aggregation initiatives, such as GBIF, will benefit from this development as they develop their data models and the range of standards and extensions they support. We anticipate that the Humboldt Extension will be attractive both to data publishers and data users, by facilitating the representation and indexing of data in richer, more meaningful ways. Despite the data-intensive nature of fundamental ecological research and applied monitoring for management and policy, ecological data have remained one of the FAIR data frontiers. We anticipate that the Humboldt Extension will address most data exchange needs of the professional communities involved.
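To make the parent-child Event idea concrete, a minimal sketch using the existing Darwin Core terms eventID and parentEventID; the identifiers and the three-level transect/plot/trap hierarchy are invented for illustration:

```python
# Hierarchical sampling events expressed with Darwin Core's eventID and
# parentEventID terms. Identifier values and protocols are invented.
events = [
    {"eventID": "transect-1", "parentEventID": None,
     "samplingProtocol": "line transect"},
    {"eventID": "transect-1:plot-A", "parentEventID": "transect-1",
     "samplingProtocol": "fixed plot"},
    {"eventID": "transect-1:plot-A:trap-3",
     "parentEventID": "transect-1:plot-A",
     "samplingProtocol": "pitfall trap"},
]

def depth(event_id: str, by_id: dict) -> int:
    """Hierarchy level of an event (0 = top-level survey)."""
    parent = by_id[event_id]["parentEventID"]
    return 0 if parent is None else 1 + depth(parent, by_id)

by_id = {e["eventID"]: e for e in events}
for e in events:
    print(depth(e["eventID"], by_id), e["eventID"])
```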
      PubDate: Thu, 7 Sep 2023 09:55:24 +0300
       
  • An Australian Model of Cooperative Data Publishing to OBIS and GBIF

    • Abstract: Biodiversity Information Science and Standards 7: e112228
      DOI : 10.3897/biss.7.112228
      Authors : Katherine Tattersall, Peggy Newman, Sachit Rajbhandari, Dave Watts, Mahmoud Sadeghi : The Australian Commonwealth Scientific and Industrial Research Organisation (CSIRO) hosts both the Australian Ocean Biodiversity Information System (OBIS) node and the Australian Global Biodiversity Information Facility (GBIF) node within the National Collections and Marine Infrastructure (NCMI) business unit. OBIS-AU is led by the NCMI Information and Data Centre and publishes marine biodiversity data in the Darwin Core (DwC) standard via an Integrated Publishing Toolkit (IPT), with over 450 marine datasets at present. The Australian GBIF node is hosted by a separate team at the Atlas of Living Australia (ALA), a national-scale biodiversity analytical and knowledge delivery portal. The ALA aggregates and publishes over 800 terrestrial and marine datasets from a wide variety of research institutes, museums and collections, governments and citizen science agencies, including OBIS-AU. Many OBIS-AU published datasets are harvested and republished by ALA, and vice versa.
      OBIS-AU identifies, performs quality control on, and formats marine biodiversity and observation data, then publishes it directly to the OBIS international data repository and portal, using GBIF IPT technology. The ALA data processing pipeline harvests, aggregates and enhances datasets from many sources with authoritative taxonomic and spatial reference data before passing the data on to GBIF. OBIS-AU and ALA are working together to ensure that the publication pathways for any datasets managed by both (with potential for duplication of records and incomplete metadata harvests) are rationalised, and that a single collaborative workflow across both units is followed for publication to GBIF. Recently, the data management groups have established an agreement to cooperatively publish marine data and eDNA data. OBIS-AU have commenced publishing datasets directly to GBIF with ALA endorsement.
      We present the convergent evolution of OBIS and GBIF data publishing in Australia, adaptive data workflows to maintain data and metadata integrity, challenges encountered, how domain expertise ensures data quality, and the benefits of sharing data skills and code, especially in publishing eDNA data types in DwC (using the DNA-derived data extension) and exploring the new CamTrap Data Package using Frictionless Data. We also present the work that both data groups are doing toward adopting GBIF's new Unified Data Model for publishing data. This Australian case study demonstrates the strengths of collaborative data publishing and offers a model that minimises replication of data in global aggregators through the development of regional integrated data publishing pipelines.
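A toy sketch of the rationalisation problem mentioned above: records published through both pathways can be spotted by comparing stable identifiers. The identifier sets are invented; the real workflow also reconciles metadata and dataset versions:

```python
# Detect records at risk of duplication across two publication pathways
# by comparing stable identifiers (occurrenceID). Contents are invented.
obis_au = {"urn:ds1:occ:1", "urn:ds1:occ:2", "urn:ds2:occ:9"}
ala     = {"urn:ds1:occ:2", "urn:ds2:occ:9", "urn:ds3:occ:4"}

both = obis_au & ala
print(f"{len(both)} record(s) published via both pathways:", sorted(both))
```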
      PubDate: Thu, 7 Sep 2023 09:48:13 +0300
       
  • On the Long Tails of Specimen Data

    • Abstract: Biodiversity Information Science and Standards 7: e112151
      DOI : 10.3897/biss.7.112151
      Authors : Arturo H. Ariño : A recent article by K.R. Johnson and I.F.P. Owens in Science (Johnson and Owens 2023) suggested that the 73 main natural history museums around the world collectively hold over 1 billion records of accessioned "specimens" (taken as collection units), a result remarkably close to, but obtained through a completely different method from, research published a decade earlier by A.H. Ariño in Biodiversity Informatics (Ariño 2010). Both sets of approaches have benefitted from information available at the Global Biodiversity Information Facility (GBIF), which in the intervening years has grown by an order of magnitude, although mostly through observation-based occurrences rather than through accretion of specimen records in collections. When comparing the estimated size of collections and the amount of digital data from those collections, there is still a huge gap, as there was then. Digitization efforts have been progressing, but they are still far from reaching the goal of bringing information about all specimens into the digital domain.
      While the larger institutions may doubtlessly have greater overall resources to try to make their data available than smaller institutions, how do they compare in terms of data mobilization and sharing? Not surprisingly, the distribution of collection sizes shows a long tail of small institutions that, nonetheless, are also embarking on digitization efforts. Will this long tail of science actually manage to have all their biodiversity data available sooner than the larger institutions? It is becoming more widely recognized that data usability is predicated on data becoming findable, accessible, interoperable and reusable (FAIR; Wilkinson et al. 2016). What could be the consequences of a data availability bias towards having many tiny collections available for ready use, rather than a much smaller (although surely very significant) fraction of larger collections of a comparable type?
      This presentation explores and compares the distribution of potential versus readily available data in 2010 and in 2023, examines what trends might exist in the race to universal specimen data availability, and asks whether the digitization efforts might be better targeted to achieve greater overall scientific benefit.
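A back-of-envelope sketch of why the long tail matters, with invented collection sizes (not the paper's figures): even when a few giant collections dominate individually, thousands of small ones can hold a comparable share of all specimens:

```python
# Cumulative share of specimens held by the largest collections vs. the
# long tail. Collection sizes are invented, purely illustrative.
sizes = sorted([30e6, 20e6, 12e6, 8e6] + [5e4] * 2000, reverse=True)
total = sum(sizes)

top4 = sum(sizes[:4]) / total
print(f"top 4 collections: {top4:.0%} of specimens")          # ~41%
print(f"long tail (2000 small collections): {1 - top4:.0%}")  # ~59%
```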
      PubDate: Thu, 7 Sep 2023 09:44:40 +0300
       
  • Making Schemas and Mappings Available and FAIR: A metadata and schema
           crosswalk registry from the FAIRCORE4EOSC project

    • Abstract: Biodiversity Information Science and Standards 7: e112223
      DOI : 10.3897/biss.7.112223
      Authors : Tommi Suominen, Joonas Kesäniemi, Hanna Koivula : Community standards like the Darwin Core (Darwin Core Task Group 2009), together with semantic artefacts (controlled vocabularies, ontologies, thesauri, and other knowledge organisation systems), are key building blocks for the implementation of the FAIR (Findable, Accessible, Interoperable, Reusable) principles (Wilkinson et al. 2016), specifically as emphasized in the Interoperability principle I2: “(Meta)data use vocabularies that follow FAIR principles”. However, most of these artefacts are actually not FAIR themselves (Le Franc et al. 2020). To address this, the FAIRCORE4EOSC project (2022-25) is developing a Metadata Schema and Crosswalk Registry (MSCR) that will allow registered users and communities to create, register and version schemas and crosswalks that all have persistent identifiers (PIDs). The published content can then be searched, browsed and downloaded without restrictions. The MSCR will also provide an API to facilitate the transformation of data from one schema to another via registered crosswalks. It will give projects and individual researchers the possibility of managing their metadata schemas and/or relevant metadata schema crosswalks. The schemas and crosswalks will be shared with the community for reuse and extension, supported by a proper versioning mechanism.
      The registry tool will facilitate better interoperability between resource catalogues and information systems using different (metadata) schemas, and encourage organisations, and especially researchers, to publish the metadata crosswalks used in their workflows, which are currently not visible (FAIRification). An easy-to-use graphical user interface (GUI) for creating crosswalks should also attract users currently relying on project-specific solutions.
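What a registered crosswalk does can be sketched in a few lines: a mapping from source-schema terms to target-schema terms, applied to a record. The two-term mapping below is invented, and the MSCR API itself is not modelled here:

```python
# Apply a schema crosswalk: rename fields of a record from a source schema
# to a target schema. The mapping and record values are invented examples.
crosswalk = {
    "dc:title":   "citation_title",
    "dc:creator": "citation_author",
}

def apply_crosswalk(record: dict, mapping: dict) -> dict:
    """Rename mapped fields; drop fields the crosswalk does not cover."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}

src = {"dc:title": "Sampling fishes", "dc:creator": "A. Author", "dc:extra": "x"}
print(apply_crosswalk(src, crosswalk))
```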
      PubDate: Thu, 7 Sep 2023 09:06:42 +0300
       
  • Mobilising Long-Term Natural Environment and Biodiversity Data and
           Exposing it for Federated, Semantic Queries

    • Abstract: Biodiversity Information Science and Standards 7: e112221
      DOI : 10.3897/biss.7.112221
      Authors : Hanna Koivula, Christoph Wohner, Barbara Magagna, Paolo Tagliolato Acquaviva d'Aragona, Alessandro Oggioni : Biodiversity and ecosystems cannot be studied without assessing the impacts of changing environmental conditions. Since the 1980s, the U.S. National Science Foundation's Long Term Ecological Research (LTER) Network has been a major force in the field of ecology to better understand ecosystems. In Europe, the LTER developments are led by the Integrated European Long-Term Ecosystem, critical zone and socio-ecological system Research Infrastructure (eLTER RI), a currently project-based infrastructure initiative with the aim of facilitating high-impact research and catalysing new insights about the compounded impacts of climate change, biodiversity loss, soil degradation, pollution, and unsustainable resource use on a range of European ecosystems and socio-ecological systems. The European LTER network, which forms the basis for the upcoming eLTER RI, is active in 26 countries and has 500 registered sites that provide legacy data, e.g., historical time-series data about the environment (not only biodiversity). Its site information and dataset metadata, with the measured variables, can be searched in the Dynamic Ecological Information Management System - Site and dataset registry (DEIMS-SDR, Wohner et al. 2019). While DEIMS-SDR data models utilize parts of the Ecological Metadata Language (EML) schema 2.0.0, location information follows the European INSPIRE specification. The future eLTER data is planned to consist of site-based, long-term time series of ecological data. The eLTER projects have defined eLTER Standard Observations (SOs), which will include the minimum set of variables, as well as the associated method protocols, that can adequately characterise the state and future trends of the Earth's systems (Masó et al. 2020, Reyers et al. 2017). The current eLTER network consists of sites that differ in infrastructure maturity and environment type; they may focus on one or several of the future SOs, or may not yet be executing any holistic monitoring scheme. The main objective is to convert the eLTER site network into a distributed research infrastructure that incorporates a clearly outlined mandatory monitoring program. Essential to this effort are the suggested variables for eLTER SOs and the corresponding methods and protocols for relevant habitat types, according to the European Nature Information System (EUNIS), in each domain. eLTER variables are described using the eLTER thesaurus "EnvThes". These descriptions are currently being enhanced with the InteroperAble Descriptions of Observable Property Terminology (I-ADOPT, Magagna et al. 2022) framework to provide the level of detail required for seamless data discovery and integration. Variables and their associated methods and protocols will be formalised to enable automatic site classifications, building on existing observation representations such as the Extensible Observation Ontology (OBOE), the Open Geospatial Consortium's Observations and Measurements, and the future eLTER Standard Observation ontology. DEIMS-SDR will continue to be used as a core service, with an RDF representation of its assets (sites, sensors, activities, people) currently being implemented. This action is synced with the Biodiversity Digital Twin (BioDT) project to ensure maximum findability, accessibility, interoperability and re-usability (FAIRness; Wilkinson et al. 2016) of data through FAIR Digital Objects (FDOs). Other (digital) assets, such as datasets, models and analytical workflows, will be documented in the Digital Asset Register (DAR) alongside semantic mapping and crosswalk techniques, to provide machine-actionable metadata (Schultes and Wittenburg 2019, Schwardmann 2020). The BioDT project is bringing together biodiversity and natural environment data from seven thematic use cases for modeling. BioDT prototypes rely on openly available data that come from multiple heterogeneous sources using a multitude of standards and formats. In the pilot phase, merging data requires "hand picking" from selected sources, and automation of workflows would still require many additional steps. There are ongoing efforts in both the BioDT and eLTER projects to find the best ways and practices to bring the raw data together using suitable standards, but also to harmonise the other environment variables by referring to vocabularies and possibly expressing the data as FDOs. Currently, both the EML schema and the Darwin Core standard (Darwin Core Task Group 2009; with registered extensions) allow referring to external schemas and vocabularies, which gives flexibility but may still prove too narrow for the multitude of data types and formats that natural environment data require. We welcome discussion about how to create good practices for enriching and harmonising natural environment data and species occurrence data in a meaningful way. GBIF's new data model, and enriching the raw data with semantic artefacts, may prove to be the way to provide thematic data products that combine data from multiple sources. HTML XML PDF
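      As a rough illustration of the I-ADOPT-style variable descriptions mentioned above, the fragment below composes a variable from a property, a matrix, and a constraint using rdflib. The URIs and the example variable are illustrative assumptions, not the published EnvThes/I-ADOPT encoding of any eLTER Standard Observation.

      from rdflib import Graph, Namespace, Literal
      from rdflib.namespace import RDF, RDFS

      # Illustrative namespaces; real eLTER/I-ADOPT URIs may differ.
      IOP = Namespace("https://w3id.org/iadopt/ont/")
      EX = Namespace("https://example.org/eLTER/")

      g = Graph()
      g.bind("iop", IOP)

      var = EX["soil_temperature_at_10cm"]
      g.add((var, RDF.type, IOP.Variable))
      g.add((var, RDFS.label, Literal("Soil temperature at 10 cm depth")))
      # I-ADOPT decomposes a variable into an observed property...
      g.add((var, IOP.hasProperty, EX["temperature"]))
      # ...within a matrix (the entity in which the property is observed)...
      g.add((var, IOP.hasMatrix, EX["soil"]))
      # ...optionally narrowed by a constraint (here: depth).
      g.add((var, IOP.hasConstraint, EX["depth_10cm"]))

      print(g.serialize(format="turtle"))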
      PubDate: Thu, 7 Sep 2023 09:02:47 +0300
       
  • Regional Data Platform of West and Central African Herbaria

    • Abstract: Biodiversity Information Science and Standards 7: e112180
      DOI : 10.3897/biss.7.112180
      Authors : Alice Ainsa, Sophie Pamerlon, Anne-Sophie Archambeau, Rémi Beauvieux, Raoufou Radji, Hervé Chevillotte : In April 2021, a Biodiversity Information for Development (BID) project was launched to deliver a regional data platform for West and Central African herbaria; the project concluded in April 2023. A dataset containing 168,545 herbarium specimens from six countries (Togo, Gabon, Ivory Coast, Benin, Guinea Conakry and Cameroon) is now visible on the Global Biodiversity Information Facility (GBIF) website and will be regularly updated. A checklist dataset (Radji 2023a) and an occurrence dataset (Radji 2023b) obtained from herbarium parts are also available on GBIF. In addition, a Living Atlases portal for herbaria in West and Central Africa has been created to allow users to search, display, filter, and download these data. This application reuses open-source modules developed by the Atlas of Living Australia (ALA) community (Morin et al. 2021). Complementing these tools, the RIHA platform (Réseau Informatique des Herbiers d'Afrique / Digital Network of African Herbaria) enables herbarium administrators to manage their own data. Thanks to all these tools, the workflow (Fig. 1) for data publication on GBIF is carried out regularly and easily, and new member herbaria from West and Central Africa can be incorporated with little effort. HTML XML PDF
      PubDate: Thu, 7 Sep 2023 08:55:18 +0300
       
  • OpenObs: Living Atlases platform for French biodiversity
           data

    • Abstract: Biodiversity Information Science and Standards 7: e112179
      DOI : 10.3897/biss.7.112179
      Authors : Alice Ainsa, Sophie Pamerlon, Anne-Sophie Archambeau, Solène Robert, Rémi Beauvieux : The OpenObs project, led by PatriNat, was launched in September 2017, and the first version of the tool was released in October 2020. OpenObs is based on the Atlas of Living Australia platform, supported by the Global Biodiversity Information Facility (GBIF) community, particularly the Living Atlases (LA) collective. OpenObs enables the visualization and downloading of species observation data available in the National Inventory of Natural Heritage (INPN), the national platform of the SINP (Information System for the Inventory of Natural Heritage). It provides open access to non-sensitive public data and includes all available observations, whether occurrence or synthesis data. As of July 2023, OpenObs holds 134,922,015 observation records, and new data are regularly added (at least twice a year). Furthermore, the project is constantly evolving, with new developments planned, such as a user validation interface and new cartographic tools. We will present the architecture of this LA-based national biodiversity portal (Fig. 1), as well as its planned new functionality and development roadmap. HTML XML PDF
      PubDate: Thu, 7 Sep 2023 08:49:30 +0300
       
  • On Image Quality Metadata, FAIR in ML, AI-Readiness and Reproducibility:
           Fish-AIR example

    • Abstract: Biodiversity Information Science and Standards 7: e112178
      DOI : 10.3897/biss.7.112178
      Authors : Yasin Bakış, Xiaojun Wang, Bahadır Altıntaş, Dom Jebbia, Henry Bart Jr. : A new science discipline has emerged within the last decade at the intersection of informatics, computer science and biology: Imageomics. Like most other -omics fields, Imageomics uses emerging technologies to analyze biological data, in this case data derived from images. One of the most widely applied data analysis methods for image datasets is Machine Learning (ML). In 2019, we started working on a United States National Science Foundation (NSF) funded project, known as Biology Guided Neural Networks (BGNN), with the purpose of extracting information about biology by using neural networks and biological guidance such as species descriptions, identifications, phylogenetic trees and morphological annotations (Bart et al. 2021). Even though the variety and abundance of biological data are satisfactory for some ML analyses, and the data are openly accessible, researchers still spend up to 80% of their time preparing data into a usable, AI-ready format, leaving only 20% for exploration and modeling (Long and Romanoff 2023). For this reason, we have built a dataset composed of digitized fish specimens, taken either directly from collections or from specialized repositories. The range of digital representations we cover is broad and growing, from photographs and radiographs, to CT scans, and even illustrations. We have added new groups of vocabularies to the dataset management system, including image quality metadata, extended image metadata and batch metadata. With the image quality metadata and extended image metadata, we aim to extract information from the digital objects that can help ML scientists in their research with filtering, image processing and object recognition routines. Image quality metadata provide information about objects contained in the image, features and condition of the specimen, and some basic visual properties of the image, while extended image metadata provide information about technical properties of the digital file and the digital multimedia object (Bakış et al. 2021, Karnani et al. 2022, Leipzig et al. 2021, Pepper et al. 2021, Wang et al. 2021) (see details on the Fish-AIR vocabulary web page). Batch metadata are used for separating different datasets and facilitate downloading and uploading data in batches, with additional batch information and supplementary files. Additional flexibility, built into the database infrastructure using an RDF framework, will enable the system to host different taxonomic groups, which might require new metadata features (Jebbia et al. 2023). By combining these features with FAIR (Findable, Accessible, Interoperable, Reusable) principles and reproducibility, we provide Artificial Intelligence Readiness (AIR; Long and Romanoff 2023) for the dataset. Fish-AIR provides an easy-to-access, filtered, annotated and cleaned biological dataset for researchers from different backgrounds and facilitates the integration of biological knowledge based on digitized preserved specimens into ML pipelines. Because of the flexible database infrastructure and the addition of new datasets, researchers will also be able to access additional types of data, such as landmarks, specimen outlines, annotated parts, and quality scores, in the near future. Already, the dataset is the largest and most detailed AI-ready fish image dataset with an integrated Image Quality Management System (Jebbia et al. 2023, Wang et al. 2021). HTML XML PDF
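      As a concrete (if simplified) picture of how image quality metadata supports AI-readiness, the sketch below filters an export down to training-ready images with pandas. The file name and column names (specimen_visible, parts_overlapping, blur_score) are hypothetical stand-ins for Fish-AIR vocabulary terms, not the actual vocabulary.

      import pandas as pd

      # Load a (hypothetical) Fish-AIR export; real column names come from
      # the Fish-AIR vocabulary, these are illustrative placeholders.
      records = pd.read_csv("fish_air_export.csv")

      # Keep images where the whole specimen is visible, parts do not
      # overlap, and the image is sharp enough for downstream routines.
      ai_ready = records[
          (records["specimen_visible"] == "whole")
          & (records["parts_overlapping"] == False)
          & (records["blur_score"] < 0.2)        # lower = sharper (assumed)
      ]

      print(f"{len(ai_ready)} of {len(records)} images pass the quality filter")
      ai_ready.to_csv("training_set.csv", index=False)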
      PubDate: Thu, 7 Sep 2023 08:46:01 +0300
       
  • Will a Local Portal using Global Data Encourage the Mainstreaming of
            Biodiversity Informatics in Asia? In Taiwan, We Say Yes

    • Abstract: Biodiversity Information Science and Standards 7: e112176
      DOI : 10.3897/biss.7.112176
      Authors : Jerome Chie-Jen Ko, Huiling Chang, Yihong Chang, You-Cheng Yu, Min-Hsuan Ni, Jun-Yi Wu, You Zhen Chen : Five years ago, the value of biodiversity open data was scarcely recognized in Taiwan. This posed a significant challenge to the Taiwan Biodiversity Information Facility (TaiBIF), our national node of the Global Biodiversity Information Facility (GBIF), in its sustained efforts to enhance data publishing capacities. Notably, non-academic entities, both governmental and industrial, were reluctant to invest resources in data management and publication, questioning the benefits beyond purely research-oriented returns. At the time, Taiwan had fewer than a million records published domestically, while GBIF held around 3 million occurrence records for Taiwan, largely unused by local users. We speculated that this discrepancy in data usage stemmed from three factors: (1) lack of species names in the local language within the occurrence data, (2) missing locally important species attributes, such as conservation status and national red list categories, and (3) absence of a culturally relatable local portal promoting biodiversity data usage. To address these issues, we launched the Taiwan Biodiversity Network (TBN) website in 2018, localizing global data from GBIF and integrating missing information from local data sources. Collaborating with wildlife illustrators, we designed a user-friendly data interface to lower the system's technical and academic barriers. This effort led to a doubling of website visitors and data download requests annually, and in recent years biodiversity open data has become a vital component in environmental impact assessments. This upward trend heightened recognition of the value of biodiversity open data, prompting organizations, such as initially data-conservative government agencies and private-sector actors with no obligatory data-sharing, to invest in data management and mobilization. This advancement also catalyzed the formation of the Taiwan Biodiversity Information Alliance (TBIA), which actively promotes cross-organizational collaboration on data integration. Today, Taiwan offers more than 19 million globally accessible occurrence records and data for more than 28,000 species. While the surge in data volume can certainly be credited to the active local citizen science community, we believe the expanded coverage of species and data types is the result of a growing community supportive of biodiversity open data, made possible by a local portal that effectively bridges the gap between global data and local needs. We hope our experience will motivate other Asian countries to create analogous local portals using global open data sources like GBIF, illustrating the value of biodiversity open data to decision-makers and overcoming the resource limitations that impede investments in biodiversity informatics. HTML XML PDF
      PubDate: Wed, 6 Sep 2023 17:59:53 +0300
       
  • DiversityIndia Meets: Pioneering citizen science through collaborative
           data mobilization

    • Abstract: Biodiversity Information Science and Standards 7: e112163
      DOI : 10.3897/biss.7.112163
      Authors : Vijay Barve, Nandita Barman, Arjan Basu Roy, Amol Patwardhan, Purab Chowdhury : DiversityIndia, founded in 2001, is an online community dedicated to promoting meaningful discussions and facilitating the exchange of diverse perspectives on lesser-known taxonomic groups, including butterflies, moths, dragonflies, spiders, and more. The core idea behind DiversityIndia is to establish a network of like-minded individuals who possess a deep passion for these subjects and actively participate in various aspects of biodiversity observation and research. Initially, the taxonomic focus of DiversityIndia centered around butterflies, which led to the creation of the ButterflyIndia Yahoo email group. The group quickly gained recognition for its significant contributions in sharing valuable insights about butterflies, including information about their habitats and lesser-known species. ButterflyIndia also played a vital role in facilitating connections among scientists and researchers dedicated to studying Lepidoptera. As a result of its collaborative efforts, the group actively contributed to major book projects and web portals, further enhancing the knowledge and resources available to the butterfly research community. As time progressed, the group expanded its presence to various social media platforms such as Orkut, Facebook and Flickr, thereby expanding its influence and reach. The realization of a significant need for empirical research on butterflies, requiring the involvement of both specialists and enthusiasts across diverse habitats, led to the first ButterflyIndia Meet in 2004 at Shendurney, Kerala. This pioneering concept garnered immense success, attracting participants from diverse regions of the country and from many backgrounds. Since then, several ButterflyIndia Meets have been organized, resulting in the documentation of numerous butterfly species. Building upon this success, DragonflyIndia and SpiderIndia were established with similar objectives and have successfully coordinated multiple gatherings (Fig. 1). One of the most notable DiversityIndia Meets occurred in April 2022 in Sundarbans, West Bengal. This meet marked a significant milestone: the documented dataset, comprising information on all taxonomic groups observed during the event, was published through the Global Biodiversity Information Facility (GBIF) (Roy et al. 2022). This publication allowed for wider accessibility and utilization of the valuable biodiversity data collected during the meet. In addition to the Sundarbans meet, efforts are currently underway to gather occurrence data from all the previous meetings conducted by DiversityIndia (Table 1). The aim is to compile and mobilize these data on GBIF as datasets, with active participation from the members who attended the meetings. This endeavor seeks to maximize the availability and usefulness of the biodiversity information gathered through the various DiversityIndia Meets over time. According to the records published so far (Global Biodiversity Information Facility 2023), there are 1663 documented occurrences of 859 taxa belonging to 14 taxonomic classes, covering various biogeographies in India. These records provide valuable insights into the biodiversity of the country. DiversityIndia has played a pioneering role in online citizen science in India, and its origins can be traced back to the original Yahoo groups. By bringing together specialists and enthusiasts alike, this community has successfully contributed to the comprehensive documentation and understanding of India's rich biodiversity. The collective efforts of its members serve as a testament to the enduring impact of citizen science in pushing the boundaries of our knowledge of the natural world. HTML XML PDF
      PubDate: Wed, 6 Sep 2023 17:55:54 +0300
       
  • Extracting Masks from Herbarium Specimen Images Based on Object Detection
           and Image Segmentation Techniques

    • Abstract: Biodiversity Information Science and Standards 7: e112161
      DOI : 10.3897/biss.7.112161
      Authors : Hanane Ariouat, Youcef Sklab, Marc Pignal, Régine Vignes Lebbe, Jean-Daniel Zucker, Edi Prifti, Eric Chenin : Herbarium specimen scans constitute a valuable source of raw data. Herbarium collections are gaining interest in the scientific community, as their exploration can lead to understanding serious threats to biodiversity. Data derived from scanned specimen images can be analyzed to answer important questions such as how plants respond to climate change, how different species respond to biotic and abiotic influences, or what role a species plays within an ecosystem. However, exploiting such large collections is challenging and requires automatic processing. A promising solution lies in the use of computer-based processing techniques, such as Deep Learning (DL). But herbarium specimens can be difficult to process and analyze, as they contain several kinds of visual noise, including information labels, scale bars, color palettes, envelopes containing seeds or other organs, collection-specific barcodes, stamps, and other notes that are placed on the mounting sheet. Moreover, the paper on which the specimens are mounted can degrade over time for multiple reasons; often the paper's color darkens and, in some cases, approaches the color of the plants. Neural network models are well-suited to the analysis of herbarium specimens and can, in principle, abstract away such visual noise. However, in some cases the model can focus on these elements, which eventually can lead to poor generalization when analyzing new data in which these visual elements are not present (White et al. 2020). It is important to remove the noise from specimen scans before using them in model training and testing, to improve model performance. Studies have used basic cropping techniques (Younis et al. 2018), but these do not guarantee that the visual noise is removed from the cropped image. For instance, labels are frequently placed at random positions on the scans, resulting in cropped images that still contain noise. White et al. (2020) used the Otsu binarization method followed by manual post-processing and a blurring step to adjust the pixels that should have been assigned to black during segmentation. Hussein et al. (2020) used an image labeler application, followed by a median filtering method to reduce the noise. However, both White et al. (2020) and Hussein et al. (2020) consider only two organs: stems and leaves. Triki et al. (2022) used a polygon-based deep learning object detection algorithm, but in addition to being laborious and difficult, this approach does not give good results when it comes to fully identifying specimens. In this work, we aim to create clean masks at the same resolution as the original images. These masks can be used by other models for a variety of purposes, for instance to distinguish the different plant organs. Here, we proceed by combining object detection and image segmentation techniques, using a dataset of scanned herbarium specimens. We propose an algorithm that identifies and retains the pixels belonging to the plant specimen, and removes the other pixels that are part of non-plant elements considered as noise. A removed pixel is set to zero (black). Fig. 1 illustrates the complete masking pipeline in two main stages: object detection and image segmentation. In the first stage, we manually annotated the images in a dataset of 950 images using bounding boxes, identifying (Fig. 2) the visual elements considered to be noise (e.g., scale bar, barcode, stamp, text box, color palette, envelope). We then trained the model to automatically detect these noise elements, dividing the dataset into 80% training, 10% validation and 10% test sets. We ultimately achieved a precision score of 98.2%, a 3% improvement over the baseline. The results of this stage were used as input for image segmentation, which aims to generate the final mask. We blacken the pixels covered by the detected noise elements, then use HSV (Hue, Saturation, Value) color segmentation to select only the pixels with values in a range that corresponds mostly to plant colors. Finally, we apply the morphological opening operation, which removes noise and separates objects, and the closing operation, which fills gaps, as described in Bhutada et al. (2022), to remove the remaining noise. The output is a generated mask that retains only the pixels belonging to the plant. Unlike other proposed approaches, which focus essentially on leaves and stems, our approach covers all the plant organs (Fig. 3). Our approach removes the background noise from herbarium scans and extracts clean plant images; it is an important step before using these images in different deep learning models. However, the quality of the extractions varies depending on the quality of the scans, the condition of the specimens, and the paper used. For example, extractions made from samples where the color of the plant differs from the color of the background were more accurate than extractions where the two are close. To overcome this limitation, we aim to use some of the obtained extractions to create a training dataset, followed by the development and training of a generative deep learning model to generate masks that delimit plants.  HTML XML PDF
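      The segmentation stage described above can be sketched with standard OpenCV operations: HSV thresholding to keep plant-coloured pixels, followed by morphological opening and closing. The HSV bounds below are illustrative assumptions that would need tuning per collection; this is a sketch of the technique, not the authors' exact implementation.

      import cv2
      import numpy as np

      # Load a scan in which detected noise elements (labels, barcodes, etc.)
      # have already been blacked out by the object-detection stage.
      img = cv2.imread("specimen_scan.jpg")
      hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

      # Keep pixels whose hue/saturation/value fall in a "plant-like" range.
      # These bounds are assumptions and need tuning per herbarium.
      lower = np.array([20, 30, 30])     # greenish-brown hues
      upper = np.array([90, 255, 255])
      mask = cv2.inRange(hsv, lower, upper)

      # Morphological opening removes small speckle noise; closing fills gaps.
      kernel = np.ones((5, 5), np.uint8)
      mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
      mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

      # Zero-out everything outside the mask, keeping only plant pixels.
      result = cv2.bitwise_and(img, img, mask=mask)
      cv2.imwrite("specimen_mask.png", result)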
      PubDate: Wed, 6 Sep 2023 17:49:35 +0300
       
  • Towards the Atlas of Living Flanders, a Challenging Path

    • Abstract: Biodiversity Information Science and Standards 7: e112155
      DOI : 10.3897/biss.7.112155
      Authors : Dimitri Brosens, Sten Migerode, Aaike De Wever : In Belgium, a federal country in the heart of Europe, the competencies for nature conservation and nature policy lie with the regions. The Research Institute for Nature and Forest (INBO) is an independent research institute, funded by the Flemish regional government, which underpins and evaluates biodiversity policy and management through applied scientific research and the sharing of data and knowledge. One of the 12 strategic goals in the 2009-2015 INBO strategic plan was that 'INBO manages data and makes them accessible. It looks into appropriate data gathering methods and means by which to disseminate data and make them readily available'. Since 2009, INBO has steadily evolved into a research institute with a strong emphasis on open data and open science. In 2010, INBO became a data publisher for the Global Biodiversity Information Facility (GBIF) and adopted an open data and open access policy; it is now known as an open science institute in Flanders, Belgium. In 2021, a question arose from the council of ministers about the possibility and availability of a public portal for biodiversity data. The goal of this portal would be to ensure findability, availability, and optimal usability of biodiversity data, initially for policy makers, but also for the wider public. With the Living Atlas project already high on our radar, an analysis project funded by the Flemish government started in December 2021. All the entities in the Department of Environment contributed to a requirements and feasibility study, a proof-of-concept (POC) Living Atlas for Flanders was set up, and the required budget was calculated. During the requirements and feasibility study, with the help of the professional survey agency IPSOS, we questioned the Agency for Nature and Forest (ANB), the Flanders Environment Agency (VMM), the Flemish Land Agency (VLM) and the Department of Environment on the policy relevance of a Flemish biodiversity portal, the need for high-resolution data (in geographical and temporal scale), and the availability of biodiversity data in Flanders, focusing on key species, protected species and other Flemish priority species. During the technical proof of concept, we tested the Living Atlases (LA) software suite as the most mature candidate for a Flemish Living Atlas. We checked how we could set up an LA installation in our own Amazon Web Services (AWS) environment, evaluated all the technologies used, and estimated the maintenance and infrastructure costs, the required staff profiles, and the number of full-time-equivalent personnel needed to run a performant Atlas of Living Flanders. The goal of this talk is to inform the audience about the steps we took, the hurdles we encountered, and how we are trying to convince our policy makers of the benefits of an Atlas of Living Flanders. HTML XML PDF
      PubDate: Wed, 6 Sep 2023 17:40:52 +0300
       
  • The Journey to a TDWG Mappings Task Group and its Plans for the Future

    • Abstract: Biodiversity Information Science and Standards 7: e112148
      DOI : 10.3897/biss.7.112148
      Authors : David Fichtmueller : Some Biodiversity Information Standards (TDWG) standards have had mappings to other standards for years or even decades. However, each standard uses its own approach to documenting those mappings; some mappings are incomplete, and they are often hard to find. There is no TDWG-recommended approach for how mappings should be documented, in the way the Standards Documentation Standard (SDS) prescribes documentation for the standards themselves. During TDWG 2022 in Sofia, Bulgaria, the topic of mapping between standards was mentioned several times throughout the conference, which led to an impromptu discussion about standards mappings in the Unconference slot on the last conference day. Afterwards, a dedicated channel within the TDWG Slack workspace (#mappings-between-standards) was created to continue the conversation. During further discussions, both within the Technical Architecture Group (TAG) of TDWG and in separate video conferences on the topic, it was decided to form a dedicated task group under the umbrella of the TAG. This task group is still in the process of formation. The goal of the group is to review the current state of mappings for TDWG standards, align the approaches of the different standards to foster interoperability, and give recommendations for current and future standards on how to specify mappings. Further work to define the strategy and scope for achieving these goals is needed, particularly to gain community input and acceptance. Consideration has been given to a range of possible types of mappings, which serve different use cases and expectations, such as machine actionability and improved documentation of the TDWG standards landscape to aid user understanding and implementation. In this talk we will show the work that has already been done, outline our planned steps and invite the community to give input on our process. HTML XML PDF
      PubDate: Wed, 6 Sep 2023 17:38:25 +0300
       
  • Harmonised Data is Actionable Data: DiSSCo’s solution to data
           mapping

    • Abstract: Biodiversity Information Science and Standards 7: e112137
      DOI : 10.3897/biss.7.112137
      Authors : Sam Leeflang, Wouter Addink : Predictability is one of the core requirements for creating machine-actionable data. The more predictable the data, the more generic the services acting on it can be; and the more generic the service, the easier we can exchange ideas, collaborate on initiatives and leverage machines to do the work. Predictability is essential for implementing the FAIR principles (Findable, Accessible, Interoperable, Reusable), as it provides the "I" for Interoperability (Jacobsen et al. 2020). The FAIR principles emphasise machine actionability because the amount of data generated is far too large for humans to handle. While Biodiversity Information Standards (TDWG) standards have massively improved the standardisation of biodiversity data, there is still room for improvement. Within the Distributed System of Scientific Collections (DiSSCo), we aim to harmonise all scientific data derived from European specimen collections, including geological specimens, into a single data specification. We call this data specification the open Digital Specimen (openDS). It is being built on top of existing and developing biodiversity information standards such as Darwin Core (DwC), Minimal Information Digital Specimen (MIDS), Latimer Core, the Access to Biological Collection Data (ABCD) Schema with its Extension for Geosciences (EFG), and also the new Global Biodiversity Information Facility (GBIF) Unified Model. In openDS we leverage the existing standards of the TDWG community but combine them with stricter constraints and controlled vocabularies, with the aim of improving the FAIRness of the data. This will not only make the data easier to use, but will also increase its quality and machine actionability. As a first step towards this, the harmonisation of terms, we make sure that similar values use the same standard term as their key. This enables the next step, in which we harmonise the values themselves: free-text values can be transformed into standardised or controlled vocabularies. For example, instead of using the names J. Doe, John Doe and J. Doe sr. for a collector, we aim to standardise these to J. Doe, with a person identifier that connects the name with more information about the collector. Biodiversity information standards such as DwC were developed to lower the bar for data sharing. The downside of minimal constraints and high flexibility is room for ambiguity, leading to multiple ways of interpretation; this limits interoperability and hampers machine actionability. In DiSSCo, data will come from different sources that use different biodiversity information standards. To handle this, we need to harmonise terms between these standards. To complicate things further, different serialisation methods are used for data exchange: Darwin Core Archives (DwC-A; GBIF 2021) use comma-separated values (CSV) files, ABCD(EFG) exposed through the Biological Collection Access Service (BioCASe) uses XML, and most custom formats use JavaScript Object Notation (JSON). In this lightning talk, we will dive into DiSSCo's technical implementation of the harmonisation process. DiSSCo currently supports two biodiversity information standards, DwC and ABCD(EFG), and maps the data to our openDS specification on a record-by-record basis. We will highlight some of the more problematic mappings, but also show how a harmonised model massively simplifies generic actions, such as the calculation of MIDS levels, which provide information about the digitisation completeness of a specimen. We will conclude with a quick look at the next steps and hope to start a discussion about controlled vocabularies. The development of high-quality, standardised data based on a strict specification with controlled vocabularies, rooted in community-accepted standards, can have a huge impact on biodiversity research and is an essential step towards scaling up research with computational support. HTML XML PDF
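      As a toy illustration of the record-by-record harmonisation described in this talk, the sketch below maps a few Darwin Core terms into a nested openDS-like structure and splits a free-text collector field. The ods:-prefixed field names here are hypothetical placeholders rather than the actual openDS specification.

      # Minimal sketch of record-by-record harmonisation; the openDS-like
      # field names are assumptions, not the real specification.
      def dwc_to_opends(dwc: dict) -> dict:
          return {
              "ods:specimenName": dwc.get("dwc:scientificName"),
              "ods:collectionCode": dwc.get("dwc:collectionCode"),
              "ods:agents": [
                  # Free-text collector strings would later be resolved to
                  # person identifiers (e.g., ORCID/Wikidata) for true
                  # harmonisation of values, not just of keys.
                  {"name": name.strip(), "role": "collector"}
                  for name in (dwc.get("dwc:recordedBy") or "").split("|")
                  if name.strip()
              ],
          }

      record = {
          "dwc:scientificName": "Quercus robur L.",
          "dwc:collectionCode": "BOT",
          "dwc:recordedBy": "J. Doe | John Doe",
      }
      print(dwc_to_opends(record))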
      PubDate: Wed, 6 Sep 2023 17:31:59 +0300
       
  • The Australian Reference Genome Atlas (ARGA): Finding, sharing and reusing
           Australian genomics data in an occurrence-driven context

    • Abstract: Biodiversity Information Science and Standards 7: e112129
      DOI : 10.3897/biss.7.112129
      Authors : Kathryn Hall, Matt Andrews, Keeva Connolly, Yasima Kankanamge, Christopher Mangion, Winnie Mok, Lars Nauheimer, Goran Sterjov, Nigel Ward, Peter Brenton : Fundamental to the capacity of Australia's 15,000 biosciences researchers to answer questions in taxonomy, phylogeny, evolution, conservation, and applied fields like crop improvement and biosecurity, is access to trusted genomics (and genetics) datasets. Historically, researchers turned to single points of origin, like GenBank (part of the United States' National Center for Biotechnology Information), to find the reference or comparative data they needed. But the rapidity of data generation using next-generation sequencing methods, and the enormous size and diversity of the resulting datasets, mean that single databases no longer contain all data of a specific class attributable to individual taxa, nor the full breadth of data types relevant to a taxon. Comprehensively searching for taxonomically relevant data, and indeed for data of the types germane to the research question, is a significant challenge for researchers. Data are openly available online, but they may be stored under synonyms or indexed via unconventional taxonomies. Data repositories are largely disconnected, and researchers must visit multiple sites to have confidence that their searches have been exhaustive. Databases may focus on single data types and not store or reference other data assets, even though those may be relevant for the taxon of interest. Additionally, our survey of the genomics community indicated that researchers are less likely to trust data with inadequately evidenced provenance metadata. This means that genomics data are hard to find and often untrusted. Moreover, even once found, the data are in formats that do not interoperate with occurrence and ecological datasets, such as those housed in the Atlas of Living Australia. We built the Australian Reference Genome Atlas (ARGA) to overcome the barriers researchers face in finding and collating genomics data for Australia's species, and we have built it so that researchers can search for data within taxonomically accepted contexts and within defined intersections and conjunctions with verified, expert ecological datasets. Using a series of ingestion scripts, the ARGA data team has implemented new and customised data mappings that effectively integrate genomics data, ecological traits, and occurrence data within an extended Darwin Core Event framework (GBIF 2018). Here, we will demonstrate how the architecture we derived for the ARGA application works, and how it can be extended as new data sources emerge. We then demonstrate how our flexible model can be used to: locate genomics data for taxa of interest; explore data within an ecological context; and calculate metrics for data availability for provincial bioregions. HTML XML PDF
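      A rough sketch of what an event-anchored integration of this kind can look like: a genomics record joined to an occurrence through a shared Darwin Core eventID, so a taxon query returns sequence data together with its ecological context. All identifiers and values are invented for illustration and do not reflect ARGA's actual schema.

      # Illustrative only: joining a genomics record to an occurrence via a
      # shared Darwin Core eventID; not ARGA's actual ingestion schema.
      event = {
          "eventID": "arga-event-0001",
          "eventDate": "2021-09-14",
          "locality": "Kosciuszko National Park, NSW",
      }
      occurrence = {
          "occurrenceID": "arga-occ-0001",
          "eventID": "arga-event-0001",
          "scientificName": "Burramys parvus",
          "basisOfRecord": "MaterialSample",
      }
      genomic_extension = {
          "occurrenceID": "arga-occ-0001",   # links back to the occurrence
          "associatedSequences": "https://example.org/assembly/GCA_0000000",
          "dataType": "whole-genome assembly",
      }

      # A query for the taxon can now return the sequence together with the
      # ecological context (place, date) carried by the event record.
      if genomic_extension["occurrenceID"] == occurrence["occurrenceID"]:
          print(occurrence["scientificName"], "->",
                genomic_extension["dataType"], "@", event["locality"])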
      PubDate: Wed, 6 Sep 2023 17:25:31 +0300
       
  •  Invasions, Plagues, and Epidemics: The Atlas of Living Australia’s
           deep dive into biosecurity

    • Abstract: Biodiversity Information Science and Standards 7: e112127
      DOI : 10.3897/biss.7.112127
      Authors : Andrew Turley, Erin Roger : Early detection of new incursions of species of biosecurity concern is crucial to protecting Australia's environment, agriculture, and cultural heritage. As Australia's largest biodiversity data repository, the Atlas of Living Australia (ALA) is often the first platform where new species incursions are recorded. The ALA holds records of more than 2,380 exotic species and over 1.9 million occurrences of pests, weeds, and diseases, many of which are reported through citizen science. However, until recently there has been no systematic mechanism for notifying relevant biosecurity authorities of potential biosecurity threats. To address this, the ALA partnered with the (Australian) Commonwealth Department of Agriculture, Fisheries and Forestry to develop the Biosecurity Alerts System. Two years on, the project has demonstrated the benefits of biosecurity alerts, but significant barriers remain as we work to expand the system to State and Territory biosecurity agencies and seek new sources of biosecurity data. In our presentation, we discuss a brief history of invasive alien species in Australia, the Biosecurity Alerts System, and how we are approaching issues of taxonomy, data standards, and cultural sensitivity in aggregating biosecurity data. We conclude by detailing our progress in expanding the alerts system and tackling systemic issues to help elevate Australia's biosecurity system. HTML XML PDF
      PubDate: Wed, 6 Sep 2023 17:20:13 +0300
       
  • Can Biodiversity Data Scientists Document Volunteer and Professional
            Collaborations and Contributions in the Biodiversity Data Enterprise?

    • Abstract: Biodiversity Information Science and Standards 7: e112126
      DOI : 10.3897/biss.7.112126
      Authors : Robert Stevenson, Elizabeth R. Ellwood, Peter Brenton, Paul Flemons, Jeff Gerbracht, Wesley Hochachka, Scott Loarie, Carrie Seltzer : The collection, archiving and use of biodiversity data depend on a network of pipelines, herein called the Biodiversity Data Enterprise (BDE), best understood globally through the work of the Global Biodiversity Information Facility (GBIF). Efforts to sustain and grow the BDE require information about the data pipeline and the infrastructure that supports it. A host of metrics from GBIF, including institutional participation (member countries, institutional contributors, data publishers), biodiversity coverage (occurrence records, species, geographic extent, data sets) and data usage (records downloaded, published papers using the data) (Miller 2021), document the rapid growth and successes of the BDE (GBIF Secretariat 2022). Heberling et al. (2021) make a convincing case that the data integration process is working. The Biodiversity Information Standards (TDWG) Basis of Record term provides information about the underlying infrastructure. It categorizes the kinds of processes*1 that teams undertake to capture biodiversity information, and GBIF quantifies their contributions*2 (Table 1). Currently, 83.4% of observations come from human observations, of which 63% are of birds. Preserved museum specimens account for 9.5% of records. In both cases, volunteers (who make observations, collect specimens, digitize specimens, and transcribe specimen labels) and professionals work together to make records available. To better understand how the BDE is working, we suggest that it would be valuable to know the number of contributions and contributors, and their hours of engagement, for each dataset. This can help the community address questions such as "How many volunteers do we need to document birds in a given area?" or "How much professional support is required to run a camera trap network?" For example, millions of observations were made by tens of thousands of observers in two recent BioBlitz events: one called Big Day, focusing on birds, sponsored by the Cornell Laboratory of Ornithology, and the other called the City Nature Challenge, addressing all taxa, sponsored jointly by the California Academy of Sciences and the Natural History Museums of Los Angeles County (Table 2). In our presentation we will suggest approaches to deriving metrics that could be used to document the collaborations and contributions of volunteers and staff, using examples from both Human Observation (eBird, iNaturalist) and Preserved Specimen (DigiVol, Notes from Nature) record types. The goal of the exercise is to start a conversation about how such metrics can further the development of the BDE. HTML XML PDF
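      The per-dataset metrics suggested above could be derived from an occurrence export along these lines. This is a minimal sketch: the input file name is a placeholder, recordedBy is assumed to identify one observer per record, and hours of engagement would additionally require platform logs that this sketch does not model.

      import pandas as pd

      # Occurrence export with (at least) dataset and observer columns.
      occ = pd.read_csv("occurrences.csv", usecols=["datasetName", "recordedBy"])

      metrics = (
          occ.groupby("datasetName")
          .agg(
              contributions=("recordedBy", "size"),     # records per dataset
              contributors=("recordedBy", "nunique"),   # distinct observers
          )
          .assign(records_per_contributor=lambda df:
                  (df.contributions / df.contributors).round(1))
      )
      print(metrics.sort_values("contributions", ascending=False).head())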
      PubDate: Wed, 6 Sep 2023 17:16:24 +0300
       
  • Measuring Habitat Restoration using the Darwin and "Event" Cores:
           Australian examples powered by BioCollect

    • Abstract: Biodiversity Information Science and Standards 7: e112083
      DOI : 10.3897/biss.7.112083
      Authors : Peter Brenton, Peggy Eby, Robert Stevenson, Elizabeth Ellwood : Habitat decline and fragmentation are major factors in biodiversity loss across the globe and can be difficult to measure, particularly at landscape scale (Brooks et al. 2002, Fahrig 2003, Ritchie and Roser 2019). In Australia, rural, coastal and urban communities have been undertaking habitat restoration activities since the mid-1980s to protect and restore ecological balance on private land and in local shared and natural spaces. Much of the restoration effort has centered around hands-on activities as a mechanism for building community with environmental benefits. Over such a time span, thousands of locations throughout the country have been transformed from degraded and highly disturbed landscapes into resemblances of more-or-less natural areas. However, collecting and analysing data about these activities was given little attention until quite recently, as governments, philanthropists and other investors have become increasingly interested in measuring the value and outcomes of their investment. To measure the effectiveness of restoration effort, it is essential to benchmark the environmental state and species composition before restoration begins, but, surprisingly or not, this is rarely done (Hale et al. 2019). Responding to this call for better documentation of restoration outcomes, over 30 groups have been using the Atlas of Living Australia's BioCollect platform to capture complex information about current and past restoration work. The BioCollect platform enables each type of monitoring, establishment, and follow-up activity to have its own data collection schema and associated metadata, structured around a hierarchy of sampling events based on the Event class in the Darwin Core standard, which allows relationships between types of event records to be specified. When event records are created through an activity-based template, each occurrence of a species is also parsed and configured as a Darwin Core occurrence record. Standard templates have been created for a range of activities, such as benchmarking assessments, site establishment, follow-up interventions and monitoring over time, and are being used by many different groups over large areas of the landscape. This allows each group to operate independently, yet collect standardised data that can be easily aggregated at larger temporal and spatial scales, quantifying change over time. The relationships between occurrences and the event context in which they were collected are also preserved and navigable. Here we present how Darwin Core and Event Core have been implemented in the BioCollect platform to enable these important data to be collected and stored in their full richness and resolution. HTML XML PDF
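      To make the event hierarchy concrete, here is a tiny sketch of parent and child sampling events linked in the Darwin Core style via eventID/parentEventID, with an occurrence attached to a monitoring visit. All identifiers and values are invented; BioCollect's internal representation is richer than this.

      # Illustrative Darwin Core-style event hierarchy: a restoration site
      # with a benchmarking visit and a follow-up monitoring visit.
      events = [
          {"eventID": "site-42", "parentEventID": None,
           "eventType": "Site establishment", "eventDate": "2019-05-01"},
          {"eventID": "site-42-benchmark", "parentEventID": "site-42",
           "eventType": "Benchmarking assessment", "eventDate": "2019-05-01"},
          {"eventID": "site-42-monitor-2023", "parentEventID": "site-42",
           "eventType": "Monitoring", "eventDate": "2023-04-12"},
      ]
      occurrences = [
          {"occurrenceID": "occ-1", "eventID": "site-42-monitor-2023",
           "scientificName": "Eucalyptus camaldulensis", "individualCount": 14},
      ]

      # Walk up the hierarchy to recover the full event context
      # of an occurrence record.
      by_id = {e["eventID"]: e for e in events}
      def context(event_id):
          while event_id:
              e = by_id[event_id]
              yield e["eventType"]
              event_id = e["parentEventID"]

      print(list(context(occurrences[0]["eventID"])))
      # ['Monitoring', 'Site establishment']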
      PubDate: Wed, 6 Sep 2023 17:10:34 +0300
       
  • Ten Years and a Million Links: Building a global taxonomic
           library connecting persistent identifiers for names (LSIDs), publications
           (DOIs), and people (ORCIDs)

    • Abstract: Biodiversity Information Science and Standards 7: e112053
      DOI : 10.3897/biss.7.112053
      Authors : Roderic Page : One thing the field of biodiversity informatics has been very good at is creating databases. However, this success in creation has not been matched by equivalent success in creating deep links between records in those databases. Instead, we create an ever-growing number of silos. An obvious route to "silo-breaking" is the shared use of the same persistent identifiers for the same entities across those databases. For example, we have minted millions of Life Science Identifiers (LSIDs) for taxonomic names (which can be resolved at lsid.io), and a growing number of taxonomic papers have Digital Object Identifiers (DOIs), but we lack connections between these two identifiers. In this talk I describe work over the last decade to make these connections between LSIDs and DOIs across three large taxonomic databases: Index Fungorum, the International Plant Names Index (IPNI), and the Index to Organism Names (ION) (Page 2023). Over a million names have been matched to DOIs or other persistent identifiers for taxonomic publications (Fig. 1 shows the coverage of publications for animal names). This represents approximately 36% of animal, plant or fungal names for which publication data are available. The mappings between LSIDs and publication persistent identifiers (PIDs), such as DOIs and Wikidata item identifiers, are made available through ChecklistBank (datasets 129659, 164203, 128415), and also archived in Zenodo. By combining these LSID and DOI links with Open Researcher and Contributor IDs (ORCIDs) for taxonomists, we can potentially gain insight into who is doing taxonomic research, where they work, and how they are funded. Possible applications of these data are discussed, including a tool to discover the citation for a species name (Species Cite, Fig. 2), using DOI-to-ORCID links to discover who is doing taxonomic research, and creating a linked data version of the Catalogue of Life. HTML XML PDF
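      Matching a name's original-publication string to a DOI can be approximated by querying a bibliographic search API. Below is a minimal sketch against the public CrossRef works endpoint, offered as a generic stand-in rather than the matching pipeline actually used in this work; candidate hits would still need scoring and vetting.

      import requests

      def find_doi(citation: str):
          """Query CrossRef for the best-matching work for a citation string.
          A generic illustration, not the matching pipeline from the talk."""
          resp = requests.get(
              "https://api.crossref.org/works",
              params={"query.bibliographic": citation, "rows": 1},
              timeout=30,
          )
          resp.raise_for_status()
          items = resp.json()["message"]["items"]
          return items[0]["DOI"] if items else None  # top hit needs vetting

      # Original-publication string attached to a taxonomic name (LSID side).
      citation = "Page, R. D. M. 2023. Ten years and a million links. BISS 7."
      print(find_doi(citation))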
      PubDate: Tue, 5 Sep 2023 09:31:13 +0300
       
  • Improved Sharing and Linkage of Taxonomic Data with the Taxon Concept
           Standard (TCS)

    • Abstract: Biodiversity Information Science and Standards 7: e112045
      DOI : 10.3897/biss.7.112045
      Authors : Niels Klazenga : The term 'taxonomic backbone' is often used to indicate the compromise taxonomies that underpin systems like the Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA). However, the term can also be read in a broader sense, as the entire expansive and continually evolving body of taxonomic work that underpins all biodiversity data and the linkage of all the different concepts used in various parts of the world and by various groups of people. The Taxon Concept Schema (TCS; Hyam and Kennedy 2006), which was ratified as a TDWG standard in 2005, came forth from the need of providers of taxonomic information for a mechanism to exchange data with other providers and users, as well as from the knowledge that taxon names make poor identifiers for taxa and that more than names is needed for effective sharing and linking of biodiversity data. The same name can be associated with multiple taxon concepts or definitions, especially when a name has been around for a long time or is used in a heavily revised group. In order for others to know what a name means, people who use a name should also indicate which concept of that name is being used. Traditionally, the Latin 'sensu' or 'sec.' have been used for this purpose; in TCS, an 'according to' property is used. The taxon concept, along with a language for relating different concepts, which is also part of TCS, was later introduced to a systematics audience in an article by Franz and Peet (2009). Unfortunately, TCS has never enjoyed wide adoption, and since Darwin Core (Wieczorek et al. 2012) was ratified in 2009, sharing of taxonomic information has mostly been done with the Darwin Core Taxon class. However, various issues with the Darwin Core Taxon class (e.g., Darwin Core and RDF/OWL Task Groups 2015) have made us look at TCS again, and in 2020 the Taxonomic Names and Concepts Interest Group was formally renamed the TCS Maintenance Group. In 2021, a TCS 2 Task Group was established with the goal of updating TCS to a Vocabulary Standard (like Darwin Core) that can be maintained under the TDWG Vocabulary Maintenance Specification (Vocabulary Maintenance Specification Task Group 2017). As it currently stands, TCS 2 (TCS 2 Task Group 2023) has two classes for dealing with taxonomy, the Taxon Concept and Taxon Relationship classes, and two classes for dealing with nomenclature, the Taxon Name and Nomenclatural Type classes. TCS 2 describes objects that are present and known in the domain and uses terms that are used in the domain (e.g., Greuter et al. 2011, Hawksworth 2010), so it is easily understood by practitioners and other users of taxonomic information, as well as by data specialists and developers. At the same time, it is in accordance with the OpenBiodiv Ontology (Senderov et al. 2018) and the Simple Knowledge Organization System (SKOS; Miles and Bechhofer 2009). TCS 2 can be used to mark up taxon concepts of any type, including taxonomic treatments, checklists and field guides, as well as systems like the Catalogue of Life and AviBase. Once marked up as TCS, concepts of all types look the same, and therefore a small standard of under 40 terms can be used to share and link all taxonomic information and to link to other types of biodiversity data, for example occurrence data or descriptive data. HTML XML PDF
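      To make the name-versus-concept distinction tangible, the snippet below models two concepts of one name, each with an 'according to' reference, plus a relationship between them. The field names are loose paraphrases of TCS 2 classes, not the ratified vocabulary terms.

      # Loose paraphrase of TCS 2 classes; field names are illustrative.
      name = {"id": "name:1", "scientificName": "Aus bus Smith, 1900"}

      concept_a = {  # the name as circumscribed in a 1950 revision
          "id": "tc:1", "taxonName": "name:1",
          "accordingTo": "Jones (1950), Revision of Aus",
      }
      concept_b = {  # a narrower circumscription after a later split
          "id": "tc:2", "taxonName": "name:1",
          "accordingTo": "Lee (2010), Monograph of Aus",
      }

      # A Taxon Relationship records how the circumscriptions relate
      # (RCC-5-style operators: congruent, includes, overlaps, ...).
      relationship = {
          "subject": "tc:2", "object": "tc:1",
          "type": "isProperPartOf",  # Lee's concept is narrower than Jones's
      }
      print(f'{name["scientificName"]} sec. {concept_b["accordingTo"]}')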
      PubDate: Tue, 5 Sep 2023 09:24:01 +0300
       
  • Quality Control/Quality Assurance within the Integrated Taxonomic
           Information System (ITIS)

    • Abstract: Biodiversity Information Science and Standards 7: e112043
      DOI : 10.3897/biss.7.112043
      Authors : Thomas Orrell : The Integrated Taxonomic Information System*1 (ITIS) and Species 2000*2 have worked together for two decades, after signing an agreement to form the Catalogue of Life*3 (COL), striving to provide current and complete Global Species Databases (GSDs) from various sources. ITIS has provided many such GSDs to the COL. In this presentation, we will demonstrate in detail the nine aspects of ITIS' approach to quality assurance/quality control: People, Process, Rules, Standards, Automation, Control, Assurance, Publication, and Feedback. All of these aspects are important and deserve consideration in the creation and maintenance of ITIS' high-quality GSDs. ITIS has also developed a new web-based 'Taxonomic Workbench' (TWB) that allows new levels of cooperative effort beyond what ITIS was able to attain with the desktop version of the software, which has been in use for twenty-five years. Some key aspects of these tools and what they will enable are discussed in the last half of the presentation. HTML XML PDF
      PubDate: Tue, 5 Sep 2023 09:16:00 +0300
       
  • TaxonWorks in its 10th Year: What's new, what's next?

    • Abstract: Biodiversity Information Science and Standards 7: e112040
      DOI : 10.3897/biss.7.112040
      Authors : Deborah Paul, Matthew Yoder : The Species File Group (SFG) endeavors to build tools and community structures that empower researchers and collections staff in their long-term collective efforts to gather, share, and learn from biodiversity data. One such tool is TaxonWorks, now in its 10th year of development. TaxonWorks provides a collaborative workbench where scientists, collection managers, students, and volunteers capture and build on the key data and concepts we use to Describe Life (TaxonWorks motto). It provides a growing number of ways to share descriptions, from Darwin Core Archives, to NeXML-formatted observations and keys, to checklists, and bibliographies.What’s New' We have expanded the data model of TaxonWorks, added new tools and functions, and some Companion software, that is, new stand-alone code-bases.Two major additions, Unified Filters and Cached Maps, provide developers and users (and users who are developers) the ability to run complex queries across TaxonWorks' rich data model and to display quickly computed maps for datasets of notable size, 100K or more specimen and literature-based records. For example, Cached Maps can superimpose the asserted distribution and georeferenced literature and specimen records to create interactive searchable maps (Fig. 1). In TaxonWorks, we aim to empower those working with the data with tools that help them visualize and curate information. To be able to model taxon concept relationships over time to reflect different taxonomic opinions, we added RCC-5 (Region Connection Calculus; Thau et al. 2008), which will make it possible to visualize these relationships. Similarly, we built a new visual editor (Fig. 2) for displaying, editing, and citing biological associations as recorded among specimens or taxa (or both).Querying and enhancing data in a given database can be complex. We have worked on harmonizing the look-feel-function of the data filtering interfaces. With our Unified Filters, one can pass the results of one search to another filter (e.g., query for specimens for a given taxonomic group and then ask for the distinct collecting events for those specimens). Then, once you filter to a given dataset, you can use our new Stepwise tasks to enhance and edit that information en-masse.Companions code-bases extend what one can do with the data in TaxonWorks, but are also available for use with other software. For example, using our new TaxonPages code, our users can produce their own web pages for taxa (Fig. 1). TaxonPages will be used by SFG groups to make available well over 100K pages this year. They include basic Bioschema integration, links to JSON-formatted data behind every panel, and the option to download any occurrence data present, expressed as Darwin Core attributes, formatted as a CSV file. TaxonPages can be set up in minutes and served on resources like GitHub pages and our user community can customize their content.Finally, the TaxonWorks external API has added a huge number of new parameters across multiple new conceptual endpoints.What’s Next'With ten years of development, we see a maturing functionality surrounding the core concepts in TaxonWorks, like observations (e.g., traits, phylogenetic data), biological associations (e.g., host-parasite relationships), images, sources (citation management), specimens, collecting events, and collection management.Currently, we are focusing on integration with other external services. 
We have produced multiple new API wrappers, notably Colrapi (wrapping the Catalogue of Life ChecklistBank API) and BellPepper (wrapping the new Biodiversity Enhanced Location Services (BELS) georeference API). These wrappers and ongoing integration with the Global Names framework give our users the power to improve data quality, e.g., linking to external vocabularies, finding and updating out-of-date nomenclature, and visualizing what TaxonWorks collection object data look like in the context of external aggregators like the Global Biodiversity Information Facility (GBIF) using our gbifference tool (as in the "GBIF difference"). The TaxonWorks community continues to grow, and with it the diversity of the projects using it. Some of this diversity reflects the stage of projects: new projects need to rapidly create and stub new records, mid-life projects need to seek and add diverse data from a wide range of external resources, and mature projects need tools to identify and resolve outliers. For these data-continuum scenarios, we foresee Stepwise tasks customized for managing these data-maturity-stage differences. Imagine capturing verbatim specimen determination data for medium-sized digitization projects and then parsing linkages to People, Times, and Taxa by the 10s, 100s, or 1000s at a time. Some of the growing diversity behind the TaxonWorks community is a result of the end-of-life of similar tools. For example, the SFG was asked to look into moving data from Scratchpads into TaxonWorks. We are in the process of moving one Scratchpad instance and will make the scripts we used to do this publicly available for further development. In August 2023, we migrated 16 projects from legacy SFG software to TaxonWorks, bringing new communities that can now join their expertise with others. As we move forward, we continue to work on distilling, synchronizing, and sharing our experiences and knowledge via our community collective TaxonWorks Docs, embracing cultural change in support of the power of shared knowledge management. Finally, TaxonWorks is committed to serving the needs of those describing species. We expect to see it produce new treatments based on extremely atomized, yet linked, data, recognizable by humans as the format serving those in the field for over 200 years. Fully formatted nomenclatural histories, descriptions, m...
      PubDate: Tue, 5 Sep 2023 09:08:05 +0300
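      As an illustration of the expanded external API mentioned in the entry above, the following minimal Python sketch queries a TaxonWorks instance for OTUs matching a name. The base URL, endpoint path, and parameter names (name, token, project_token) are assumptions modelled on the public TaxonWorks API documentation, not code from the project itself.

```python
# Minimal sketch: querying a TaxonWorks instance's REST API for OTUs by name.
# The base URL, endpoint path, and parameter names here are assumptions;
# consult your instance's /api/v1 documentation before relying on them.
import requests

BASE = "https://sandbox.taxonworks.org/api/v1"   # hypothetical instance
params = {
    "name": "Drosophila",          # filter value (assumed parameter name)
    "token": "YOUR_API_TOKEN",     # per-user token (assumed)
    "project_token": "PROJECT",    # per-project token (assumed)
}

resp = requests.get(f"{BASE}/otus", params=params, timeout=30)
resp.raise_for_status()
for otu in resp.json():           # assuming a JSON array response
    print(otu.get("id"), otu.get("object_label"))
```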
       
  • Growth and Evolution of the Symbiota Portal Network

    • Abstract: Biodiversity Information Science and Standards 7: e112028
      DOI : 10.3897/biss.7.112028
      Authors : Katie Pearson, Ed Gilbert, K. Samanta Orellana, Greg Post, Lindsay Walker, Jenn Yost, Nico Franz : Symbiota is empowering biodiversity collections communities across the globe to efficiently manage and mobilize their data. Beginning with only a handful of collections in two major portals in the early 2010s (Gries et al. 2014), Symbiota now acts as the primary content management system for over 1,000 collections in more than 50 portals. Over 1,800 collections share data through Symbiota portals, constituting more than 90 million records and 42 million images. The iDigBio Symbiota Support Hub, a team and cyberinfrastructure based at Arizona State University and supported by the United States (U.S.) National Science Foundation, hosts 52 Symbiota portals and provides daily help and resources to all Symbiota user communities. The Symbiota codebase is being actively developed in collaboration with several funded projects, including the U.S. National Ecological Observatory Network (NEON), to support new data types and connections, such as between Symbiota portals and other collections management systems, and to other resources (e.g., Index Fungorum, Global Registry of Scientific Collections, Bionomia, Environmental Data Initiative). Because the Symbiota codebase is open source and shared among portals, new developments in any portal empower the entire network. Here we describe recent expansions of the Symbiota network, including new portals, collaborations, functionalities, and sustainability actions. We look forward to building further collaborations with diverse, international collections data communities.
      PubDate: Tue, 5 Sep 2023 09:01:02 +0300
       
  • Promoting High-Quality Data in OBIS: Insights from the OBIS Data
           Quality Assessment and Enhancement Project Team 

    • Abstract: Biodiversity Information Science and Standards 7: e112018
      DOI : 10.3897/biss.7.112018
      Authors : Yi-Ming Gan, Ruben Perez Perez, Pieter Provoost, Abigail Benson, Ana Carolina Peralta Brichtova, Elizabeth Lawrence, John Nicholls, Johnny Konjarla, Georgia Sarafidou, Hanieh Saeedi, Dan Lear, Anke Penzlin, Nina Wambiji, Ward Appeltans : The Ocean Biodiversity Information System (OBIS) (Klein et al. 2019) is a global database of marine biodiversity and associated environmental data, which provides critical information to researchers and policymakers worldwide. Ensuring the accuracy and consistency of the data in OBIS is essential for its usefulness and value, not only to the scientific community but also to the science-policy interface. The OBIS Data Quality Assessment and Enhancement Project Team (QCPT), formed in 2019 by the OBIS steering group, aims to assess and enhance data quality. It has been working on three categories of activities for this purpose. (1) Data quality enhancement and management: The OBIS QCPT organized data laundry events to identify and address data quality issues in published OBIS datasets. Furthermore, individual OBIS nodes were invited to present their data-processing workflows at the monthly meetings to foster knowledge sharing and collaborative problem-solving focused on data quality. Data quality issues and solutions highlighted in the presentations and data laundry events were documented in a dedicated GitHub repository as GitHub issues. The solutions for data quality issues, and marine-specific pre-publication quality control tools designed to identify such issues, were provided as feedback to the OBIS Capacity Development Task Team; these inputs were used to create training resources (see the OBIS manual and the upcoming OBIS training course hosted on OceanTeacher Global Academy) aimed at preventing these issues. (2) Standardization of the OBIS data processing pipeline: As OBIS uses the Darwin Core standard (Wieczorek et al. 2012), the use of standardized tests and assertions in the data processing pipeline is encouraged. To achieve this, the OBIS QCPT aligned OBIS quality checks with a subset of the core tests and assertions developed by the Biodiversity Information Standards (TDWG) Biodiversity Data Quality (BDQ) Task Group 2 (TG2) (Chapman et al. 2020), as tracked in a dedicated GitHub issue. Not all default parameters of the core tests and assertions are optimal for marine biodiversity data, so the OBIS QCPT met monthly to determine suitable parameters for customizing the tests. The pipeline produces a data quality report for each dataset with quality flags that indicate potential data quality issues, enabling node managers and data providers to review the flagged records. (3) Community engagement: The OBIS QCPT led a survey among data users to gather insights into OBIS data quality issues and bridge the gap between the current implementation and user expectations. The survey findings enabled OBIS to prioritize issues to be addressed, as summarized in Section 2.2.2 of the 11th OBIS Steering Group meeting report. In addition to engaging with data users, the OBIS QCPT also served as a platform to discuss questions from the nodes related to the use of Darwin Core and provided feedback for the term discussions. In summary, the OBIS QCPT improves marine species data reliability and usability through transparent and participatory approaches, fostering continuous improvement. Collaborative efforts, standardized procedures, and knowledge sharing advance OBIS' mission of providing high-quality biodiversity data for research, conservation, and ocean management.
      PubDate: Tue, 5 Sep 2023 08:52:00 +0300
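      The pipeline described above attaches quality flags to records that fail standardized tests. As a minimal sketch of what one such BDQ-style assertion can look like (an illustration, not the actual OBIS implementation), here is a coordinate-range check over a Darwin Core occurrence record:

```python
# Minimal sketch of a BDQ-style validation: flag records whose coordinates
# are missing, suspicious, or out of range. Flag names are illustrative.

def validate_coordinates(record: dict) -> list[str]:
    """Return a list of quality flags for one Darwin Core occurrence record."""
    flags = []
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, TypeError, ValueError):
        return ["COORDINATES_MISSING_OR_UNPARSABLE"]
    if lat == 0 and lon == 0:
        flags.append("COORDINATES_ZERO")        # suspicious 0,0 point
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        flags.append("COORDINATES_OUT_OF_RANGE")
    return flags

print(validate_coordinates({"decimalLatitude": "91.2", "decimalLongitude": "4.5"}))
# -> ['COORDINATES_OUT_OF_RANGE']
```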
       
  • Optimizing the Monitoring of Urban Fruit-Bearing Flora with Citizen
           Science: An Overview of the Pomar Urbano Initiative

    • Abstract: Biodiversity Information Science and Standards 7: e112009
      DOI : 10.3897/biss.7.112009
      Authors : Filipi Soares, Luís Ferreira Pires, Maria Carolina Garcia, Aline de Carvalho, Sheina Koffler, Natalia Ghilardi-Lopes, Rubens Silva, Benildes Maculan, Ana Maria Bertolini, Gabriela Rigote, Lidio Coradin, Uiara Montedo, Debora Drucker, Raquel Santiago, Maria Clara de Carvalho, Ana Carolina da Silva Lima, Karoline Reis de Almeida, Stephanie Gabriele Mendonça de França, Hillary Dandara Elias Gabriel, Bárbara Junqueira dos Santos, Antonio Saraiva : The "Pomar Urbano" (Urban Orchard) project focuses on the collaborative monitoring of fruit-bearing plant species in urban areas throughout Brazil. The project compiled a list of 411 fruit-bearing plant species (Soares et al. 2023), both native and exotic varieties found in Brazil. This list was selected from two main sources: the book Brazilian Fruits and Cultivated Exotics (Lorenzi et al. 2006) and the book series Plants for the Future, which includes volumes specifically dedicated to species of economic value in different regions of Brazil, namely the South (Coradin et al. 2011), Midwest (Vieira et al. 2016), Northeast (Coradin et al. 2018) and North (Coradin et al. 2022). To ensure broad geographic coverage, the project spans all 27 state capitals of Brazil. The data collection process relies on the iNaturalist Umbrella and Collection projects. Each state capital has a single collection project that includes the fruit-bearing plant species list and a locality restriction to that specific city. For example, the collection project Pomar Paulistano gathers data from the city of São Paulo. The umbrella project Urban Orchard was set up to track data from all 27 collection projects. We firmly believe that these fruit-bearing plant species possess multifaceted value that extends beyond mere consumption. As such, we have assembled a dynamic and multidisciplinary team comprising professionals from various institutions across Brazil in a collaborative effort that encompasses different dimensions of biodiversity value exploration and monitoring, especially phenological data. One facet of our team is focused on creating products inspired by the diverse array of Brazilian fruit-bearing plants. Their work spans sectors of the creative industry, including fashion, painting, and graphic design, infusing these natural elements into innovative and sustainable designs (Fig. 1 and Fig. 2). A group of nutrition and health scientists, in conjunction with communication and marketing professionals, is working to produce engaging media content centered around food recipes that incorporate Brazilian fruits (Fig. 3). These recipes primarily feature the fruit-bearing plants most frequently observed on iNaturalist in the city of São Paulo, allowing us to showcase the local biodiversity while promoting culinary diversity. Some of these recipes are based on the book Brazilian Biodiversity: Flavors and Aromas (Santiago and Coradin 2018), an extensive compendium of food recipes that use fruits derived from native Brazilian species.
      PubDate: Tue, 5 Sep 2023 08:41:42 +0300
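      Data from the 27 collection projects described above can be retrieved programmatically through the public iNaturalist API; the sketch below pulls a few observations for one project. The project slug "pomar-paulistano" is illustrative; substitute the real project identifier.

```python
# Sketch: pulling recent observations from one of the project's iNaturalist
# collection projects via the public iNaturalist API (api.inaturalist.org).
import requests

resp = requests.get(
    "https://api.inaturalist.org/v1/observations",
    params={
        "project_id": "pomar-paulistano",  # assumed project slug
        "per_page": 5,
        "order_by": "observed_on",
    },
    timeout=30,
)
resp.raise_for_status()
for obs in resp.json()["results"]:
    taxon = (obs.get("taxon") or {}).get("name", "unknown")
    print(obs["id"], taxon, obs.get("observed_on"))
```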
       
  • Building the Australian National Species List

    • Abstract: Biodiversity Information Science and Standards 7: e111986
      DOI : 10.3897/biss.7.111986
      Authors : Endymion Cooper, Greg Whitbread, Anne Fuchs : The Australian National Species List (AuNSL) is a unified, nationally accepted taxonomy for the native and naturalised biota of Australia. It is derived from a set of taxon-focussed resources including the Australian Plant Name Index and Australian Plant Census, the Australian Faunal Directory, and similar lists of fungi, lichens and bryophytes. These resources share a common infrastructure and contribute to the single national taxonomy (AuNSL), but retain their independent curation practices and online presentation. The AuNSL is now the core national infrastructure providing names and taxonomy for significant biodiversity data infrastructures including the Atlas of Living Australia, the Terrestrial Ecosystem Research Network, the Biodiversity Data Repository, and the Species Profile and Threats Database. As the go-to resource for names and taxonomy for Australia's unique biodiversity, the AuNSL must be constantly updated to reflect taxonomic and nomenclatural change. For some taxonomic groups, the AuNSL is substantially complete, and the incorporation of new taxa and other novelties occurs with little time lag. For other taxonomic groups the data are patchy and updates sporadic. Like similar projects, the AuNSL would benefit from improvements to taxonomic data publishing and sharing. Such improvements have the potential to enable automated, real-time ingestion of new taxonomic and nomenclatural data, allowing curator time to be re-directed to backfilling the historical data from a dispersed and complex literature. Ideally, the AuNSL will be able to benefit from advances in automated approaches to processing the historical data, including via the sharing of standardised representations of such data. Here we outline the AuNSL data model and editor functionality, and describe our approach to sharing our data via existing and emerging standards such as Darwin Core and the Taxon Concept Schema (TCS2). We then describe what we, as consumers of taxonomic data from published works, really need from publishers of new, and reprocessed historical, data. In brief, we need structured taxonomic data conforming to an adequate standard.
      PubDate: Fri, 1 Sep 2023 09:17:26 +0300
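      The structured taxonomic data the authors ask publishers for can be pictured as records built from Darwin Core Taxon terms. A minimal illustrative sketch follows; the values are examples, not AuNSL content.

```python
# Sketch: a single structured taxon record expressed with Darwin Core Taxon
# terms, the kind of standard-conformant input the AuNSL asks publishers for.
# All values below are illustrative.
taxon_record = {
    "dwc:scientificName": "Eucalyptus globulus Labill.",
    "dwc:scientificNameAuthorship": "Labill.",
    "dwc:taxonRank": "species",
    "dwc:taxonomicStatus": "accepted",
    "dwc:nomenclaturalCode": "ICN",
}

for term, value in taxon_record.items():
    print(f"{term}: {value}")
```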
       
  • Collections from Colonial Australia in Berlin's Museum für Naturkunde and
           the Challenges of Data Accessibility

    • Abstract: Biodiversity Information Science and Standards 7: e111980
      DOI : 10.3897/biss.7.111980
      Authors : Anja Schwarz, Fiona Möhrle, Sabine von Mering : German-speaking naturalists working in southeastern Australia in the mid-19th century relied heavily on the expertise of First Nations intermediaries who acted as guides, collectors, traders and translators (Clarke 2008, Olsen and Russell 2019). Many of these naturalists went to Australia because of the research opportunities offered by the British Empire at a time when the German nation states did not have colonies of their own. Others sought to escape political upheaval at home. They were welcome employees for colonial government agencies due to their training in the emerging research-oriented natural sciences that the reformed German universities offered at a time when British universities were still providing a broad general education (Home 1995, Kirchberger 2000). Wilhelm von Blandowski (1822–1878) and Gerard Krefft (1830–1881), who both worked in colonial Victoria and New South Wales, are among this group. Throughout their work, they corresponded extensively with naturalists in Berlin, exchanging specimens and ideas. But the preserved Australian animals, plants and rock samples, as well as the written and drawn records of animals and landscapes now held at the Museum für Naturkunde Berlin (MfN), are much more than objects of scientific interest. They also contain information about Australia's First Nations. The collections provide evidence of their role in collecting as well as their knowledge of the natural world, which has long been overlooked and, at least in part deliberately, made invisible by Western knowledge systems (e.g., Das and Lowe 2018, Ashby 2020). People data have been recognised as crucial for linking such collection objects with expeditions, publications, archival material and correspondence (Groom et al. 2020, Groom et al. 2022). Such data can thus potentially help reconstruct invisibilized Indigenous histories and knowledge. However, while the MfN keeps information about European collectors and other non-indigenous agents associated with their specimens in internal catalogues, databases and wikis, Indigenous actors remain largely absent from these repositories, which reproduce the colonial archive 'along the archival grain' (Stoler 2009). With this in mind, we discuss in our presentation the complexities of using persistent identifiers and tools, such as Wikidata, to improve the integration and linkage of people data in the work currently being undertaken by the MfN and the Berlin's Australian Archive project to digitise and make accessible the museum's collections. Drawing upon the guidance provided by the FAIR*1 and CARE*2 principles for data (Wilkinson et al. 2016, Carroll et al. 2020), and learning from the 2012 ATSILIRN Protocols for Libraries, Archives and Information Services*3, the 2019 Tandanya Adelaide Declaration and the 2020 AIATSIS Code of Ethics*4, we address the potential of these efforts in terms of collection accessibility, and also highlight the challenges and limitations of this approach in the context of colonial collections.
      PubDate: Fri, 1 Sep 2023 09:11:14 +0300
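      Persistent identifiers in Wikidata, as discussed in the entry above, can be resolved programmatically; the sketch below looks up a person record for Gerard Krefft via the public Wikidata SPARQL endpoint. The query is illustrative, not project code.

```python
# Sketch: resolving a collector name against Wikidata with SPARQL, the kind
# of people-data linkage discussed in the abstract above.
import requests

QUERY = """
SELECT ?person ?personLabel ?birth ?death WHERE {
  ?person wdt:P31 wd:Q5 ;                    # instance of: human
          rdfs:label "Gerard Krefft"@en .
  OPTIONAL { ?person wdt:P569 ?birth . }     # date of birth
  OPTIONAL { ?person wdt:P570 ?death . }     # date of death
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "people-data-demo/0.1"},  # polite identification
    timeout=60,
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["person"]["value"], row.get("birth", {}).get("value"))
```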
       
  • Mapping.bio: Piloting FAIR semantic mappings for biodiversity digital
           twins

    • Abstract: Biodiversity Information Science and Standards 7: e111979
      DOI : 10.3897/biss.7.111979
      Authors : Alexander Wolodkin, Claus Weiland, Jonas Grieb : Biodiversity research has a strong focus on the links between environment and functional traits, e.g., to assess how anthropogenic drivers of change impact ecological systems (Díaz et al. 2013). Interoperable exchange and integration of such data are enabled through the use of ontologies that provide "meaning" to data and enable downstream processing involving learning and inference over graph-structured models of these data (Kulmanov et al. 2020). However, the development of thematically similar semantic artifacts, e.g., the Environment Ontology (ENVO, Buttigieg et al. 2016) and the Semantic Web for Earth and Environment Technology Ontology (SWEET, DiGiuseppe et al. 2014), in biodiversity-related disciplines (e.g., environmental genomics and earth observation) can introduce substantial conceptual overlap, highlighting the need for bridging technologies to facilitate reuse of biodiversity data across those knowledge fields (Karam et al. 2020). A recent design study, funded by the European Open Science Cloud (EOSC), proposes a framework to create, document, and publish mappings and crosswalks linking different semantic artifacts within a particular scientific community and across scientific domains, under the label "Flexible Semantic Mapping Framework" (SEMAF, Broeder et al. 2021). SEMAF puts a strong emphasis on so-called pragmatic mappings, i.e., mappings driven by specific interoperability goals such as translations between specific observation measurements (e.g., sensor configurations) and metadata descriptions. Within the Horizon Europe project "Biodiversity Digital Twin for Advanced Modelling, Simulation and Prediction Capabilities" (BioDT), a mapping tool leveraging SEMAF is currently under development: Mapping.bio provides a lightweight web service to read semantic artifacts, visualize them, add mappings as graphical connections, and store the mappings as FAIR (Findable, Accessible, Interoperable, Reusable) Digital Objects (FDOs, De Smedt et al. 2020) in a repository. To foster reusability, sustainability, and long-term availability of digital objects, mapping.bio features mappings compliant with the Simple Standard for Sharing Ontological Mappings (SSSOM, Matentzoglu et al. 2022), a machine-interpretable and extensible vocabulary enabling the self-contained exploration and processing of annotated mappings by machines (machine actionability, Jacobsen et al. 2020).
      PubDate: Fri, 1 Sep 2023 09:09:52 +0300
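      SSSOM mappings of the kind mapping.bio stores are, at their simplest, rows in a TSV whose columns are fixed by the SSSOM specification. A minimal sketch follows; the term IRIs are illustrative, and note that a complete SSSOM file also carries a YAML metadata header (CURIE map, license, and so on), which is omitted here.

```python
# Sketch: writing one mapping between an ENVO and a SWEET term as an SSSOM
# TSV row. Column names follow the SSSOM spec; term IDs/labels are examples.
import csv

COLUMNS = ["subject_id", "subject_label", "predicate_id",
           "object_id", "object_label", "mapping_justification"]

rows = [{
    "subject_id": "ENVO:00000020",                     # illustrative term
    "subject_label": "lake",
    "predicate_id": "skos:exactMatch",
    "object_id": "sweet:Lake",                         # illustrative term
    "object_label": "Lake",
    "mapping_justification": "semapv:ManualMappingCuration",
}]

with open("mappings.sssom.tsv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS, delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)
```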
       
  • Systema Dipterorum

    • Abstract: Biodiversity Information Science and Standards 7: e111959
      DOI : 10.3897/biss.7.111959
      Authors : Thomas Pape, Neal Evenhuis : From its birth as the 'Biosystematic Database of World Diptera' in 1984, the 'Systema Dipterorum' (Evenhuis and Pape 2023) has grown into one of the largest databases currently maintained for the taxonomy and nomenclature of a single order of insects. Systema Dipterorum covers all two-winged insects (Diptera), and with almost a quarter of a million names representing more than 170,000 valid species distributed in some 13,000 valid genera, we cover about 10% of the described and named Animalia. About 1,000 new nominal species are described annually within Diptera. Data are entered in FileMaker Pro (database) and served through an online portal*1, with an updated version currently provided every two months. Names are harvested and reviewed through a four-tier quality assurance hierarchy, with entries eventually reaching taxonomic and nomenclatural standards equivalent to being published online. The nomenclatural status of each name is shown using 50 different codes, and at this moment a published authority source is linked to more than 70% of the entries. Universally Unique Identifiers (UUIDs) are automatically generated for every record of names and the more than 35,000 references. Names are made available for the Catalogue of Life, and we envision a web portal for seamless harvesting of new names and literature as well as for updating of both nomenclature and taxonomy by making changes and correcting errors with explicit reference to published authority sources. We envision the future for Systema Dipterorum to be a one-stop website, where clicking on a name resulting from a search may call up links to, e.g., its nomenclatural registry in ZooBank, the original description through the Biodiversity Heritage Library, taxonomic treatments from Plazi, images from Morphbank, occurrence data through the Global Biodiversity Information Facility (GBIF), molecular sequence data from GenBank, Barcode Index Numbers (BINs) from Barcode of Life, and additional data from many other sources.
      PubDate: Fri, 1 Sep 2023 09:01:53 +0300
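      The abstract above notes that UUIDs are generated automatically for every name and reference record. Whether random or namespace-based UUIDs are used is not stated; both patterns are shown in the minimal sketch below, and the namespace string is an assumption for illustration only.

```python
# Sketch: two common ways to mint UUIDs for database records.
import uuid

# Random (version 4): collision-resistant, no inputs needed.
record_uuid = uuid.uuid4()
print(record_uuid)

# Namespace-based (version 5): the same input always yields the same UUID,
# useful when identifiers must be reproducible. Namespace string is assumed.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "diptera.org")
stable_uuid = uuid.uuid5(NAMESPACE, "Musca domestica Linnaeus, 1758")
print(stable_uuid)
```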
       
  • An Excel Template Generator for Darwin Core

    • Abstract: Biodiversity Information Science and Standards 7: e111907
      DOI : 10.3897/biss.7.111907
      Authors : Luke Marsden, Olaf Schneider : Scientific data is diverse and can be complex, potentially including biotic or abiotic measurements, material samples, DNA-derived data and more. Especially for researchers who are new to the Darwin Core standard (Darwin Core Task Group 2009), it is not always obvious what the best practice is for creating a Darwin Core Archive (GBIF 2021) for their data. Which core and extensions should they select? Which Darwin Core terms*1 should they include? We present the 'Learnings from Nansen Legacy template generator' (Marsden and Schneider 2023), a spreadsheet template generator to simplify the creation of Darwin Core Archives. It enables users to create a single Microsoft Excel file that includes one sheet per core or extension using a graphical user interface. The user can select from a complete list of Darwin Core terms to use as column headers. There are requirements and recommendations for which terms are selected for each core and extension. Descriptions of all terms are displayed when one hovers over a Darwin Core term, and are stored as notes in the template when one selects a relevant cell. The generated template includes cell restrictions to prevent one from inputting data in an incorrect format. A separate configuration is also available to aid researchers in creating CF-NetCDF*2 files for physical data, which are also compliant with the FAIR (Findable, Accessible, Interoperable, Reusable)*3 data management principles. The Learnings from Nansen Legacy template generator is published (Marsden and Schneider 2023) and can be installed for use on your website or computer by following the instructions on the software's GitHub repository*4. The template generator can be tested where it is currently hosted, by SIOS*5 (Svalbard Integrated Arctic Earth Observing System). One can also refer to a YouTube tutorial*6 on how the template generator works.
      PubDate: Thu, 31 Aug 2023 09:16:27 +0300
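      A minimal analogue of what the template generator described above produces, written with openpyxl, follows; this is not the project's own code, just a sketch of the same ideas: one named sheet, Darwin Core terms as headers, term definitions attached as cell notes, and a controlled-vocabulary cell restriction.

```python
# Sketch: generating a tiny Darwin Core spreadsheet template with openpyxl.
from openpyxl import Workbook
from openpyxl.comments import Comment
from openpyxl.worksheet.datavalidation import DataValidation

TERMS = {  # term -> definition shown as a cell note (definitions abridged)
    "occurrenceID": "An identifier for the Occurrence.",
    "basisOfRecord": "The specific nature of the data record.",
}

wb = Workbook()
ws = wb.active
ws.title = "Occurrence"                       # one sheet per core/extension

for col, (term, definition) in enumerate(TERMS.items(), start=1):
    cell = ws.cell(row=1, column=col, value=term)
    cell.comment = Comment(definition, "template-generator")

# Restrict basisOfRecord to a controlled vocabulary, mimicking the template's
# cell restrictions that prevent incorrectly formatted input.
dv = DataValidation(type="list",
                    formula1='"PreservedSpecimen,HumanObservation,MaterialSample"')
ws.add_data_validation(dv)
dv.add("B2:B1000")

wb.save("dwc_template.xlsx")
```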
       
  • Japan Biodiversity Information Initiative (JBIF)'s Efforts to Collect and
           Publish Biodiversity Information from Japan

    • Abstract: Biodiversity Information Science and Standards 7: e111893
      DOI : 10.3897/biss.7.111893
      Authors : Showtaro Kakizoe, Aino Ota, Tsuyoshi Hosoya, Utsugi Jinbo : The Japan Biodiversity Information Initiative (JBIF) was originally established in 2007 as the Global Biodiversity Information Facility (GBIF) Japan National Node to aggregate biodiversity data in Japan and publish them through GBIF. JBIF was later renamed after Japan became a GBIF observer, but activities including data publication through GBIF have continued to the present. JBIF operates with the support of the National BioResource Project (NBRP) of the Ministry of Education, Culture, Sports, Science and Technology (MEXT), with collaboration from three institutions: the National Institute of Genetics (NIG), the National Institute for Environmental Studies (NIES), and the National Museum of Nature and Science (NMNS). The NBRP is a project that focuses on the collection, preservation, provision, and enhancement of bioresources. JBIF collects both observation and specimen data and publishes them through GBIF. For domestic data use, a search system for data published by JBIF is available on the JBIF website. Moreover, NMNS manages a museum network called the Science Museum Net (S-Net), and bilingual (Japanese and English) specimen data collected by S-Net are also available via the S-Net portal site. We are working to promote the biodiversity informatics field in Japan through translation of GBIF resources, including the website and important documents such as the GBIF Science Review, and by organizing workshops and conferences, primarily targeting students, researchers, museum curators, and local government officials, to facilitate the sharing of information and exchange of opinions on biodiversity information. To date, Japan has published 564 datasets and over 12 million occurrences to GBIF, making it the third-largest contributor of data to GBIF in Asia, following India and Taiwan. Moreover, regarding specimen-based occurrence data, Japan is the largest contributor in Asia. In this presentation, we will introduce JBIF's initiatives and future activities.
      PubDate: Thu, 31 Aug 2023 09:14:03 +0300
       
  • FAIR Principles and TDWG Standards: The case of morphological description
           of taxa and specimens

    • Abstract: Biodiversity Information Science and Standards 7: e111859
      DOI : 10.3897/biss.7.111859
      Authors : Régine Vignes Lebbe : Sharing data is crucial in biodiversity research, as in all scientific domains. Biodiversity Information Standards (TDWG) validates and makes available a set of standards to facilitate the sharing of biodiversity data. Of the 23 standards listed in alphabetical order, each has a status, a category, and a short description. But these standards are designed for very different purposes, which we will discuss by linking them to the FAIR principles (Findable, Accessible, Interoperable, and Reusable). The FAIR principles (Wilkinson et al. 2016) focus on the ability of machines to automatically find and use digital data. It is therefore crucial that software for editing, acquiring, and using data shares defined standards that are made available to all. TDWG has been working in this direction for over 30 years. Pioneers in biodiversity informatics, such as Richard Pankhurst (Pankhurst 1970), Mike Dallwitz (Dallwitz 1974, Dallwitz 1980) and Jacques Lebbe (Lebbe et al. 1987), worked specifically on taxon identification with computers and on how to represent morphological descriptions of taxa and specimens. Some TDWG standards, such as ABCD (Access to Biological Collection Data; Access to Biological Collections Data Task Group 2005), TCS (Taxonomic Concept Transfer Schema; Taxonomic Names Subgroup 2006) or SDD (Structured Descriptive Data; Structure of Descriptive Data (SDD) Subgroup 2006), are expressed by an XML schema covering a formal data model. Other standards, such as Floristic Regions of the World (Takhtajan 1986) or the Vocabulary Maintenance Standard (VMS; Vocabulary Maintenance Specification Task Group 2017), concern vocabularies or collections of standardized terms. The Plant Occurrence and Status Scheme (POSS; World Conservation Monitoring Centre 1995) provides both a list of accepted terms and a data model (a list of fields). In the case of morpho-anatomical data describing taxa or specimens, TDWG offers two standards: DELTA (DEscription Language for TAxonomy; Dallwitz 2006) and SDD (Structured Descriptive Data; Hagedorn 2007). In order to further the discussion on morphological description data sharing, we would like to clarify what is meant by the term standard. We will look at the concepts of guidelines, rules, defined formats, referential lists of terms, data schemas, models, metamodels, and protocols, which are all terms linked to this notion of standard and to the FAIR principles. Perhaps this reflection will lead us to propose criteria for better classifying TDWG standards.
      PubDate: Thu, 31 Aug 2023 09:11:06 +0300
       
  • Towards FAIR Principles in Biodiversity Research: Enabling computable
           taxonomic descriptions and ecological data with Phenoscript

    • Abstract: Biodiversity Information Science and Standards 7: e111862
      DOI : 10.3897/biss.7.111862
      Authors : Sergei Tarasov, Giulio Montanaro, Federica Losacco, Diego Porto : Taxonomic descriptions hold immense phenotypic data, but their natural language (NL) format poses challenges for computer analysis. In this talk, we will present Phenoscript, a user-friendly computer language enabling computer-readable species descriptions and automated phenotype comparisons, in accordance with FAIR (Findable, Accessible, Interoperable, Reusable) principles. Phenoscript facilitates the creation of semantic species descriptions that represent a knowledge graph composed of terms from predefined biological ontologies. A Phenoscript description resembles an NL description but follows a specific language grammar. We have developed the Phenospy package, a Python-based Phenoscript toolkit. Phenospy converts Phenoscript descriptions into both NL format, facilitating scientific publication, and the Web Ontology Language (OWL) format, enabling downstream analysis and computable phenotypic comparisons. OWL is a standard for sharing semantic data on the Web. While initially designed for phenotypes, Phenoscript can be extended to create semantic ecological data encompassing environmental traits, functional traits, and species interactions. We will discuss the integration of species and ecological traits encoded in Phenoscript into downstream analyses, highlighting its potential for phenomic-level research in biology.
      PubDate: Wed, 30 Aug 2023 17:24:57 +0300
       
  • Improving FAIRness of eDNA and Metabarcoding Data: Standards and tools for
           European Nucleotide Archive data deposition

    • Abstract: Biodiversity Information Science and Standards 7: e111835
      DOI : 10.3897/biss.7.111835
      Authors : Joana Paupério, Vikas Gupta, Josephine Burgin, Suran Jayathilaka, Jerry Lanfear, Kessy Abarenkov, Urmas Kõljalg, Lyubomir Penev, Guy Cochrane : The advancements in sequencing technologies have promoted the generation of molecular data for cataloguing and describing biodiversity. The analysis of environmental DNA (eDNA) through the application of metabarcoding techniques enables comprehensive descriptions of communities and their function, and is fundamental for understanding and preserving biodiversity. Metabarcoding is becoming widely used, and standard methods are being generated for a growing range of applications with high scalability. The generated data can be made available in unprocessed form as raw data (the sequenced reads) or as interpreted data, including sets of sequences derived after bioinformatic processing (Amplicon Sequence Variants (ASVs) or Operational Taxonomic Units (OTUs)) and occurrence tables (tables that describe the occurrences and abundances of species or OTUs/ASVs). However, for these data to be Findable, Accessible, Interoperable and Reusable (FAIR), and therefore fully available for meaningful interpretation, they need to be deposited in public repositories together with enriched sample metadata, protocols and analysis workflows (ten Hoopen et al. 2017). Metabarcoding raw data and associated sample metadata are often stored and made available through the International Nucleotide Sequence Database Collaboration (INSDC) archives (Arita et al. 2020), of which the European Nucleotide Archive (ENA; Burgin et al. 2022) is the European member, but they are often deposited with minimal information, which hinders data reusability. Within the scope of the Horizon 2020 project Biodiversity Community Integrated Knowledge Library (BiCIKL), which is building a community of interconnected data for biodiversity research (Penev et al. 2022), we are working towards improving the standards for molecular ecology data sharing, developing tools to facilitate data deposition and retrieval, and linking between data types. Here we will present the ENA data model, showcasing how metabarcoding data can be shared with enriched metadata, and how these data link with existing data in other research infrastructures in the biodiversity domain, such as the Global Biodiversity Information Facility (GBIF), where data are deposited following the guidelines published in Abarenkov et al. (2023). We will also present the results of our recent discussions on standards for this data type and discuss future plans for continuing to improve data sharing and interoperability for molecular ecology.
      PubDate: Wed, 30 Aug 2023 17:17:23 +0300
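      The enriched sample metadata discussed above is, in practice, tabular: one row of checklist fields per sample. The sketch below assembles such a row; the field names loosely follow MIxS/ENA checklist conventions and are assumptions to be checked against the relevant ENA sample checklist before any real submission.

```python
# Sketch: assembling enriched metadata for one eDNA sample prior to ENA
# submission. Field names and values are illustrative, not an ENA schema.
import csv

sample = {
    "sample_alias": "eDNA_stationA_2023_06",
    "taxon_id": "408172",                      # NCBI taxid for "marine metagenome" (verify)
    "geographic location (latitude)": "54.32",
    "geographic location (longitude)": "10.12",
    "collection date": "2023-06-14",
    "environmental medium": "sea water",
    "target gene": "18S rRNA",
}

with open("ena_samples.tsv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=sample.keys(), delimiter="\t")
    writer.writeheader()
    writer.writerow(sample)
```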
       
  • Reaching Further with Earth Science Data

    • Abstract: Biodiversity Information Science and Standards 7: e111768
      DOI : 10.3897/biss.7.111768
      Authors : Rachel Walcott, Kerstin Lehnert : Earth Sciences cover a broad spectrum of research fields, such as petrology, sedimentology, structural geology, seismology, and geomorphology, which aim to understand interrelated processes on the surface and in the interior of our planet. Many of the research questions studied in the Earth sciences, such as understanding past climates, the human impact on the Critical Zone*1, or the co-evolution of the geo- and biosphere, require interdisciplinary approaches, employing methodologies and integrating observations from diverse subdisciplines such as geochemistry, mineralogy, geophysics, and paleontology, and increasingly from other disciplines, specifically biology and genomics. The use of data and samples in multi- and interdisciplinary research is substantially facilitated by consistent metadata, including schemas and vocabularies, data formats, and data exchange protocols. The association of Biodiversity Information Standards (TDWG) with the Earth Science part of the natural sciences has, until recently, been largely limited to palaeontology, primarily due to its close affinity with biology. Standards have been developed that support the taxonomy of fossils and include terms for "deep time" intervals (chrono- and bio-stratigraphy) and "paleo-surface environment" (sedimentology). Other important and highly relevant fields such as mineralogy, geochemistry, meteoritics, and soil sciences have so far not been included or coordinated with. Although some progress is being made with the ongoing development of the Mineral Extension for Darwin Core and the inclusion of some geological terms in Latimer Core (Woodburn et al. 2022), more effort should be made to link and align with existing and emerging data standards in these Earth Science communities. These communities are very active, building their own domain-specific FAIR (Findable, Accessible, Interoperable, Reusable) data standards and infrastructures, with data systems such as EarthChem, the Astromaterials Data System, Mindat, and WoSIS. We will introduce the users, data systems, and community standards and best practices, including metadata, vocabularies, and persistent identifiers for data and samples, being used and developed in the subdisciplines of the Earth Sciences mentioned above. We advocate for collaboration and coordination with these fields to make TDWG more inclusive, expand accessible resources, and provide an opportunity for TDWG to link to these other communities. Broader networking within an interdisciplinary research framework has the potential to address more fundamental and over-arching issues of both scientific and public relevance.
      PubDate: Tue, 29 Aug 2023 09:01:01 +0300
       
  • Knowledge Base on Species Life Traits: A Spanish/French Plinian Core
           implementation use case

    • Abstract: Biodiversity Information Science and Standards 7: e111784
      DOI : 10.3897/biss.7.111784
      Authors : Sophie Pamerlon, Anne-Sophie Archambeau, Francisco Pando de la Hoz, Olivier Gargominy, Franck Michel, Sandrine Tercerie, Eva Rodinson, Noëlie Maurel, Gloria Martínez-Sagarra, Adeline Kerner, Régine Vignes Lebbe, Bertrand Schatz, Pascal Dupont : The French "Traits" working group was created in 2021 to support the development of the national knowledge base on species life traits managed by the PatriNat department*1, and to identify and implement a suitable standard for managing and sharing species life traits (including interactions) at the national, then international, level. Its core members are part of several PatriNat teams (Species Knowledge, Dissemination & Mediation, Coordination of Information Systems), as well as other French research units*2 working on the topic of traits and ontologies. The Plinian Core (Plinian Core Task Group 2021) was first discussed in 2004 and its development began in 2005–2006, when the first version was deployed as a collaboration between InBIO*3 (Costa Rica) and GBIF Spain*4. It reuses and extends the Darwin Core vocabulary (Wieczorek et al. 2012, Darwin Core Maintenance Interest Group 2014) to describe different aspects of biological species information, that is, all kinds of properties or traits related to taxa, including biological and non-biological species traits. The Plinian Core was discussed with Dr Pando (convener of the TDWG Plinian Core Task Group*5) during one of the Traits working group meetings and was found to be relevant to the French species life traits database (currently in development). The Traits working group's future work will follow the example of the Plinian Core-based EIDOS database*6 (Spanish Ministry for the Ecological Transition), which allows for detailed species pages with distinct information sections (e.g., interactions, taxonomy, legal status, conservation). This collaboration resulted in a Capacity Enhancement Support Programme (CESP) project submission (GBIF 2023) between French and Spanish partners, allowing for the consolidation of both the infrastructure and the sharing process of species life traits for taxa found in all French territories, as well as European Union territories. Additionally, this is an opportunity to provide information to GBIF (Global Biodiversity Information Facility) through a new update of the TAXREF (Gargominy 2022) national checklist, one of the core constituents of the GBIF Backbone Taxonomy (GBIF 2022). Species life traits and interactions will be added thanks to the new Plinian Core extension implemented on the GBIF Integrated Publishing Toolkit (IPT)*7, and an Atlas of Living Australia architecture BIE (Biodiversity Information Explorer) module*8 developed by Costa Rica in the context of a CESP project carried out with SiBBr*9 (GBIF Brasil).
      PubDate: Tue, 29 Aug 2023 08:53:49 +0300
       
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
 



JournalTOCs © 2009-