  Subjects -> SCIENCES: COMPREHENSIVE WORKS (Total: 374 journals)
Showing 1 - 200 of 265 Journals sorted alphabetically
Accountability in Research: Policies and Quality Assurance     Hybrid Journal   (Followers: 19)
Acta Nova     Open Access   (Followers: 2)
Acta Scientifica Malaysia     Open Access   (Followers: 1)
Acta Scientifica Naturalis     Open Access   (Followers: 4)
Adıyaman University Journal of Science     Open Access  
Advanced Science     Open Access   (Followers: 16)
Advanced Science, Engineering and Medicine     Partially Free   (Followers: 8)
Advanced Theory and Simulations     Hybrid Journal   (Followers: 4)
Advances in Research     Open Access  
Advances in Science and Technology     Full-text available via subscription   (Followers: 18)
African Journal of Science, Technology, Innovation and Development     Hybrid Journal   (Followers: 8)
Afrique Science : Revue Internationale des Sciences et Technologie     Open Access   (Followers: 1)
AFRREV STECH : An International Journal of Science and Technology     Open Access   (Followers: 3)
Alfarama Journal of Basic & Applied Sciences     Open Access   (Followers: 12)
American Academic & Scholarly Research Journal     Open Access   (Followers: 4)
American Journal of Applied Sciences     Open Access   (Followers: 22)
American Journal of Humanities and Social Sciences     Open Access   (Followers: 14)
Anales del Instituto de la Patagonia     Open Access  
Applied Mathematics and Nonlinear Sciences     Open Access   (Followers: 2)
Arab Journal of Basic and Applied Sciences     Open Access  
Arabian Journal for Science and Engineering     Hybrid Journal   (Followers: 1)
Archives Internationales d'Histoire des Sciences     Partially Free   (Followers: 5)
Archives of Current Research International     Open Access  
ARPHA Conference Abstracts     Open Access   (Followers: 1)
ARPHA Proceedings     Open Access  
Asian Journal of Advanced Research and Reports     Open Access  
Asian Journal of Scientific Research     Open Access   (Followers: 2)
Asian Journal of Technology Innovation     Hybrid Journal   (Followers: 5)
Australian Field Ornithology     Full-text available via subscription   (Followers: 1)
Australian Journal of Social Issues     Hybrid Journal   (Followers: 6)
Bangladesh Journal of Scientific Research     Open Access  
Beni-Suef University Journal of Basic and Applied Sciences     Open Access   (Followers: 1)
Berichte Zur Wissenschaftsgeschichte     Hybrid Journal   (Followers: 11)
Bilge International Journal of Science and Technology Research     Open Access  
Bioethics Research Notes     Full-text available via subscription   (Followers: 15)
BJHS Themes     Open Access   (Followers: 6)
Bulletin de la Société Royale des Sciences de Liège     Open Access  
Bulletin of the National Research Centre     Open Access  
Chain Reaction     Full-text available via subscription  
Ciencia Amazónica (Iquitos)     Open Access  
Ciencia en su PC     Open Access   (Followers: 1)
Ciencia Ergo Sum     Open Access  
Ciência ET Praxis     Open Access  
Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering     Open Access  
Comunicata Scientiae     Open Access  
Conference Papers in Science     Open Access  
Configurations     Full-text available via subscription   (Followers: 11)
COSMOS     Hybrid Journal   (Followers: 1)
Crea Ciencia Revista Científica     Open Access  
Current Issues in Criminal Justice     Hybrid Journal   (Followers: 14)
Current Research in Geoscience     Open Access   (Followers: 6)
Data     Open Access   (Followers: 4)
Dhaka University Journal of Science     Open Access  
Discover Sustainability     Open Access   (Followers: 4)
Einstein (São Paulo)     Open Access  
Ekaia : EHUko Zientzia eta Teknologia aldizkaria     Open Access  
Emergent Scientist     Open Access  
Enhancing Learning in the Social Sciences     Open Access   (Followers: 8)
Enseñanza de las Ciencias : Revista de Investigación y Experiencias Didácticas     Open Access  
Entramado     Open Access  
Entre Ciencia e Ingeniería     Open Access  
Epiphany     Open Access   (Followers: 1)
Ethiopian Journal of Education and Sciences     Open Access   (Followers: 5)
European Online Journal of Natural and Social Sciences     Open Access   (Followers: 4)
European Scientific Journal     Open Access   (Followers: 11)
Evidência - Ciência e Biotecnologia - Interdisciplinar     Open Access  
Exchanges : the Warwick Research Journal     Open Access   (Followers: 1)
Experimental Results     Open Access   (Followers: 2)
Fides et Ratio : Revista de Difusión Cultural y Científica     Open Access  
Fontanus     Open Access   (Followers: 1)
Forensic Science Policy & Management: An International Journal     Hybrid Journal   (Followers: 286)
Frontiers in Climate     Open Access   (Followers: 5)
Frontiers in Science     Open Access   (Followers: 1)
Fundamental Research     Open Access  
Futures & Foresight Science     Hybrid Journal   (Followers: 1)
Gaudium Sciendi     Open Access  
Ghana Studies     Full-text available via subscription   (Followers: 15)
Global Journal of Pure and Applied Sciences     Full-text available via subscription  
Globe, The     Full-text available via subscription   (Followers: 4)
HardwareX     Open Access  
Heidelberger Jahrbücher Online     Open Access  
Heliyon     Open Access   (Followers: 1)
History of Science and Technology     Open Access   (Followers: 5)
Hoosier Science Teacher     Open Access  
Indian Journal of History of Science     Hybrid Journal   (Followers: 2)
Instruments     Open Access  
Interciencia     Open Access  
International Annals of Science     Open Access  
International Journal of Advanced Multidisciplinary Research and Review     Open Access  
International Journal of Applied Science     Open Access  
International Journal of Engineering, Science and Technology     Open Access  
International Journal of Network Science     Hybrid Journal   (Followers: 3)
International Journal of Social Sciences and Management     Open Access   (Followers: 3)
International Journal of Technology Policy and Law     Hybrid Journal   (Followers: 10)
International Science and Technology Journal of Namibia     Open Access   (Followers: 2)
International Scientific and Vocational Studies Journal     Open Access  
Investiga : TEC     Open Access  
Investigación Joven     Open Access  
Investigacion y Ciencia     Open Access   (Followers: 1)
Iranian Journal of Science and Technology, Transactions A : Science     Hybrid Journal  
iScience     Open Access   (Followers: 2)
Issues in Science & Technology     Free   (Followers: 9)
Ithaca : Viaggio nella Scienza     Open Access  
J : Multidisciplinary Scientific Journal     Open Access  
Jaunujų mokslininkų darbai     Open Access   (Followers: 3)
Journal de la Recherche Scientifique de l'Universite de Lome     Full-text available via subscription  
Journal of Chromatography & Separation Techniques     Open Access   (Followers: 9)
Journal of Advanced Research     Open Access   (Followers: 2)
Journal of Analytical Science & Technology     Open Access   (Followers: 5)
Journal of Applied Science and Technology     Full-text available via subscription   (Followers: 1)
Journal of Applied Sciences and Environmental Management     Open Access   (Followers: 1)
Journal of Big History     Open Access   (Followers: 4)
Journal of Composites Science     Open Access   (Followers: 4)
Journal of Diversity Management     Open Access   (Followers: 4)
Journal of Indian Council of Philosophical Research     Hybrid Journal  
Journal of Institute of Science and Technology     Open Access  
Journal of King Saud University - Science     Open Access  
Journal of Mathematical and Fundamental Sciences     Open Access  
Journal of Negative and No Positive Results     Open Access  
Journal of Responsible Technology     Open Access  
Journal of Science and Technology     Open Access   (Followers: 2)
Journal of Science and Technology     Open Access   (Followers: 1)
Journal of Science and Technology (Ghana)     Open Access   (Followers: 3)
Journal of Science and Technology Policy Management     Hybrid Journal   (Followers: 1)
Journal of Science Foundation     Open Access   (Followers: 1)
Journal of Scientific Research and Reports     Open Access   (Followers: 1)
Journal of Shanghai Jiaotong University (Science)     Hybrid Journal  
Journal of Social Science Research     Open Access   (Followers: 2)
Journal of Taibah University for Science     Open Access  
Journal of the Ghana Science Association     Full-text available via subscription   (Followers: 3)
Journal of the History of Ideas     Full-text available via subscription   (Followers: 196)
Journal of the Indian Institute of Science     Hybrid Journal   (Followers: 4)
Journal of the Royal Society of New Zealand     Hybrid Journal   (Followers: 49)
Journal of the South Carolina Academy of Science     Open Access  
Journal of Unsolved Questions     Open Access  
Jurnal Sains Dasar     Open Access  
Jurnal Teknosains     Open Access  
Karaelmas Science and Engineering Journal     Open Access  
Karbala International Journal of Modern Science     Open Access  
Kennedy Institute of Ethics Journal     Full-text available via subscription   (Followers: 10)
Logo STI Science, Technology and Innovation     Open Access   (Followers: 14)
Malawi Journal of Science and Technology     Open Access   (Followers: 6)
Maskana     Open Access  
MethodsX     Open Access  
Mètode Science Studies Journal : Annual Review     Open Access  
Modern Applied Science     Open Access   (Followers: 1)
Momona Ethiopian Journal of Science     Open Access   (Followers: 5)
National Academy Science Letters     Hybrid Journal   (Followers: 3)
National Science Review     Hybrid Journal   (Followers: 1)
Natural Sciences     Open Access  
Natural Sciences Education     Hybrid Journal   (Followers: 1)
Naturen     Full-text available via subscription  
Nepal Journal of Science and Technology     Open Access  
Network Science     Hybrid Journal   (Followers: 4)
Nordic Journal of Science and Technology     Open Access   (Followers: 2)
Nordic Studies in Science Education     Open Access   (Followers: 4)
Nova     Open Access  
Open Conference Proceedings Journal     Open Access  
Open Journal of Applied Sciences     Open Access  
Orbis Cógnita : Revista Científica     Open Access   (Followers: 1)
Patterns     Open Access   (Followers: 9)
People and Nature     Open Access   (Followers: 4)
Población y Desarrollo - Argonautas y caminantes     Open Access  
Politique et Sociétés     Full-text available via subscription   (Followers: 1)
Portal de la Ciencia     Open Access  
Proceedings of the Indian National Science Academy     Full-text available via subscription   (Followers: 4)
Proceedings of the Linnean Society of New South Wales     Full-text available via subscription   (Followers: 2)
Proceedings of the Royal Society of Queensland, The     Full-text available via subscription  
QScience Connect     Open Access  
Quantum Science and Technology     Hybrid Journal   (Followers: 15)
Rafidain Journal of Science     Open Access  
Rehabilitation Research, Policy, and Education     Hybrid Journal   (Followers: 2)
Reportes Científicos de la FaCEN     Open Access  
Reports in Advances of Physical Sciences     Open Access  
Research Ideas and Outcomes     Open Access  
Research Integrity and Peer Review     Open Access   (Followers: 1)
Research Policy : X     Open Access   (Followers: 3)
Respuestas     Open Access  
Revista Bases de la Ciencia     Open Access  
Revista Cientifica Guillermo de Ockham     Open Access  
Revista Conhecimento Online     Open Access  
Revista Crítica de Ciências Sociais     Open Access  
Revista de Ciencia y Tecnología     Open Access  
Revista de la Academia Colombiana de Ciencias Exactas, Físicas y Naturales     Open Access  
Revista de la Universidad del Zulia     Open Access  
Revista Politécnica     Open Access  
Revista Tecnológica     Open Access  
Revista UniVap     Open Access  
SAINSTIS     Open Access  
Sainteknol : Jurnal Sains dan Teknologi     Open Access  
Sci     Open Access  
Science     Full-text available via subscription   (Followers: 5392)
Science & Diplomacy     Free   (Followers: 3)
Science Advances     Free   (Followers: 45)
Science and Technology     Open Access   (Followers: 2)
Science Heritage Journal     Open Access  
Science World Journal     Open Access  
Science, Technology and Arts Research Journal     Open Access   (Followers: 1)
ScienceRise     Open Access  
Sciences du jeu     Open Access  

Data
Number of Followers: 4  

  This is an Open Access journal
ISSN (Online) 2306-5729
Published by MDPI  [258 journals]
  • Data, Vol. 10, Pages 27: SAPEx-D: A Comprehensive Dataset for Predictive
           Analytics in Personalized Education Using Machine Learning

    • Authors: Muhammad Adnan Aslam, Fiza Murtaza, Muhammad Ehatisham Ul Haq, Amanullah Yasin, Numan Ali
      First page: 27
      Abstract: Education is crucial for leading a productive life and obtaining necessary resources. Higher education institutions are progressively incorporating artificial intelligence into conventional teaching methods as a result of innovations in technology. As a high academic record raises a university’s ranking and increases student career chances, predicting learning success has been a central focus in education. Both performance analysis and providing high-quality instruction are challenges faced by modern schools. Maintaining high academic standards, juggling life and academics, and adjusting to technology are problems that students must overcome. In this study, we present a comprehensive dataset, SAPEx-D (Student Academic Performance Exploration), designed to predict student performance, encompassing a wide array of personal, familial, academic, and behavioral factors. Our data collection effort at Air University, Islamabad, Pakistan, involved both online and paper questionnaires completed by students across multiple departments, ensuring diverse representation. After meticulous preprocessing to remove duplicates and entries with significant missing values, we retained 494 valid responses. The dataset includes detailed attributes such as demographic information, parental education and occupation, study habits, reading frequencies, and transportation modes. To facilitate robust analysis, we encoded ordinal attributes using label encoding and nominal attributes using one-hot encoding, expanding our dataset from 38 to 88 attributes. Feature scaling was performed to standardize the range and distribution of data, using a normalization technique. Our analysis revealed that factors such as degree major, parental education, reading frequency, and scholarship type significantly influence student performance. 
The machine learning models applied to this dataset, including Gradient Boosting and Random Forest, demonstrated high accuracy and robustness, underscoring the dataset’s potential for insightful academic performance prediction. In terms of model performance, Gradient Boosting achieved an accuracy of 68.7% and an F1-score of 68% for the eight-class classification task. For the three-class classification, Random Forest outperformed other models, reaching an accuracy of 80.8% and an F1-score of 78%. These findings highlight the importance of comprehensive data in understanding and predicting academic outcomes, paving the way for more personalized and effective educational strategies.
      Citation: Data
      PubDate: 2025-02-20
      DOI: 10.3390/data10030027
      Issue No: Vol. 10, No. 3 (2025)
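
      The encoding and scaling steps named in the abstract above can be illustrated with a minimal sketch. It assumes plain Python lists and min-max normalization; the column names, category values, and helper functions are hypothetical and are not taken from SAPEx-D or the authors' code.

```python
def label_encode(values, order):
    """Map ordinal categories to integers following an explicit order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]


def one_hot_encode(values):
    """Expand a nominal column into one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def min_max_scale(values):
    """Rescale numbers to [0, 1]; one common normalization (an assumption here)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy columns; not real SAPEx-D records.
reading = ["never", "sometimes", "often", "sometimes"]   # ordinal attribute
transport = ["bus", "car", "bus", "walk"]                # nominal attribute

reading_codes = label_encode(reading, order=["never", "sometimes", "often"])
transport_cols, transport_rows = one_hot_encode(transport)
reading_scaled = min_max_scale(reading_codes)
```

      One-hot encoding replaces a single nominal column with one column per category, which is how an attribute count can grow in the way the abstract reports (38 to 88).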
       
  • Data, Vol. 10, Pages 28: A Directory of Datasets for Mining Software
           Repositories

    • Authors: Themistoklis Diamantopoulos, Andreas L. Symeonidis
      First page: 28
      Abstract: The amount of software engineering data is constantly growing, as more and more developers employ online services to store their code, keep track of bugs, or even discuss issues. The data residing in these services can be mined to address different research challenges; therefore, certain initiatives have been established to encourage the sharing of research datasets that collect them. In this work, we investigate the effect of such an initiative; we create a directory that includes the papers and the corresponding datasets of the data track of the Mining Software Repositories (MSR) conference. Specifically, our directory includes metadata and citation information for the papers of all data tracks, throughout the last twelve years. We also annotate the datasets according to the data source and further assess their compliance with the FAIR principles. Using our directory, researchers can find useful datasets for their research, or even design methodologies for assessing their quality, especially in the software engineering domain. Moreover, the directory can be used for analyzing the citations of data papers, especially with regard to different data categories, as well as for examining their FAIRness score throughout the years, along with its effect on the usage/citation of the datasets.
      Citation: Data
      PubDate: 2025-02-20
      DOI: 10.3390/data10030028
      Issue No: Vol. 10, No. 3 (2025)
       
  • Data, Vol. 10, Pages 29: HOSPI Application to Portuguese
           Hospitals’ Websites

    • Authors: Delfina Soares, Joana Carvalho, Dimitrios Sarantis
      First page: 29
      Abstract: The Health Online Service Provision Index (HOSPI) is an instrument to assess and monitor hospitals’ websites. The index comprises four criteria—Content, Services, Community Interaction and Technology Features—each with a subset of indicators and sub-indicators. HOSPI was applied to the Portuguese hospitals’ websites in 2023, originating the dataset described in this article. The article also provides a detailed account of the data collection process, which involved direct observation of the websites and specific treatment methods, ensuring the reliability and validity of the dataset. It underscores the relevance of having this data available and how it can improve service provision online in health facilities and support policymaking.
      Citation: Data
      PubDate: 2025-02-21
      DOI: 10.3390/data10030029
      Issue No: Vol. 10, No. 3 (2025)
       
  • Data, Vol. 10, Pages 30: Open Georeferenced Field Data on Forest Types and
           Species for Biodiversity Assessment and Remote Sensing Applications

    • Authors: Patrizia Gasparini, Lucio Di Cosmo, Antonio Floris, Federica Murgia, Maria Rizzo
      First page: 30
      Abstract: Forest ecosystems are important for biodiversity conservation, climate regulation and climate change mitigation, soil and water protection, and the recreation and provision of raw materials. This paper presents a dataset on forest type and tree species composition for 934 georeferenced plots located in Italy. The forest type is classified in the field consistently with the Italian National Forest Inventory (NFI) based on the dominant tree species or species group. Tree species composition is provided by the percent crown cover of the main five species in the plot. Additional data on conifer and broadleaves pure/mixed condition, total tree and shrub cover, forest structure, silvicultural system, development stage, and local land position are provided. The surveyed plots are distributed in the central–eastern Alps, in the central Apennines, and in the southern Apennines; they represent a wide range of species composition, ecological conditions, and silvicultural practices. Data were collected as part of a project aimed at developing a classification algorithm based on hyperspectral data. The dataset was made publicly available as it refers to forest types and species widespread in many countries of Central and Southern Europe and is potentially useful to other researchers for the study of forest biodiversity or for remote sensing applications.
      Citation: Data
      PubDate: 2025-02-21
      DOI: 10.3390/data10030030
      Issue No: Vol. 10, No. 3 (2025)
       
  • Data, Vol. 10, Pages 31: Using Weather Data for Improved Analysis of
           Vehicle Energy Efficiency

    • Authors: Reno Filla
      First page: 31
      Abstract: In moving vehicles, the dominating energy losses are due to interactions with the environment: air resistance and rolling resistance. It is known that weather has a significant impact, yet there is a lack of literature showing how the wealth of openly available data from professional weather observations can be used in this context. This article will give an overview of how such data are structured and how they can be accessed in order to augment logs gained during vehicle operation or simulated trips. Two efficient algorithms for such data extraction and augmentation are discussed and several examples for use are provided, also demonstrating that some caveats do exist with respect to the source of weather data.
      Citation: Data
      PubDate: 2025-02-24
      DOI: 10.3390/data10030031
      Issue No: Vol. 10, No. 3 (2025)
       
  • Data, Vol. 10, Pages 32: Spatial Dataset of Climate Robust and High-Yield
           Agricultural Areas in Brandenburg: Results of a Classification Framework
           Using Bio-Economic Climate Simulations

    • Authors: Hannah Jona von Czettritz, Sandra Uthes, Johannes Schuler, Kurt-Christian Kersebaum, Peter Zander
      First page: 32
      Abstract: Coherent spatial data are crucial for informed land use and regional planning decisions, particularly in the context of securing a crisis-proof food supply and adapting to climate change. This dataset provides spatial information on climate-robust and high-yield agricultural arable land in Brandenburg, Germany, based on the results of a classification using bio-economic climate simulations. The dataset is intended to support regional planning and policy makers in zoning decisions (e.g., photovoltaic power plants) by identifying climate-robust arable land with high current and stable future production potential that should be reserved for agricultural use. The classification method used to generate the dataset includes a wide range of indicators, including established approaches, such as a soil quality index, drought, water, and wind erosion risk, as well as a dynamic approach, using bio-economic simulations, which determine the production potential under future climate scenarios. The dataset is a valuable resource for spatial planning and climate change adaptation, contributing to long-term food security especially in dry areas such as the state of Brandenburg facing increased production risk under future climatic conditions, thereby serving globally as an example for land use planning challenges related to climate change.
      Citation: Data
      PubDate: 2025-02-25
      DOI: 10.3390/data10030032
      Issue No: Vol. 10, No. 3 (2025)
       
  • Data, Vol. 10, Pages 33: Data Quality Tools to Enhance a Network Anomaly
           Detection Benchmark

    • Authors: José Camacho, Rafael A. Rodríguez-Gómez
      First page: 33
      Abstract: Network traffic datasets are essential for the construction of traffic models, often using machine learning (ML) techniques. Among other applications, these models can be employed to solve complex optimization problems or to identify anomalous behaviors, i.e., behaviors that deviate from the established model. However, the performance of the ML model depends, among other factors, on the quality of the data used to train it. Benchmark datasets, with a profound impact on research findings, are often assumed to be of good quality by default. In this paper, we derive four variants of a benchmark dataset in network anomaly detection (UGR’16, a flow-based real-world traffic dataset designed for anomaly detection), and show that the choice among variants has a larger impact on model performance than the ML technique used to build the model. To analyze this phenomenon, we propose a methodology to investigate the causes of these differences and to assess the quality of the data labeling. Our results underline the importance of paying more attention to data quality assessment in network anomaly detection.
      Citation: Data
      PubDate: 2025-02-25
      DOI: 10.3390/data10030033
      Issue No: Vol. 10, No. 3 (2025)
       
  • Data, Vol. 10, Pages 34: Draft Genome Sequence Data of the Ensifer sp.
           P24N7, a Symbiotic Bacteria Isolated from Nodules of Phaseolus vulgaris
           Grown in Mining Tailings from Huautla, Morelos, Mexico

    • Authors: José Augusto Ramírez-Trujillo, Maria Guadalupe Castillo-Texta, Mario Ramírez-Yáñez, Ramón Suárez-Rodríguez
      First page: 34
      Abstract: In this work, we report the draft genome sequence of Ensifer sp. P24N7, a symbiotic nitrogen-fixing bacterium isolated from nodules of Phaseolus vulgaris var. Negro Jamapa planted in pots that contained mining tailings from Huautla, Morelos, México. The genomic DNA was sequenced by an Illumina NovaSeq 6000 using the 250 bp paired-end protocol, obtaining 1,188,899 reads. An assembly generated with SPAdes v. 3.15.4 resulted in a genome length of 7,165,722 bp composed of 181 contigs with an N50 of 323,467 bp, a coverage of 76X, and a GC content of 61.96%. The genome was annotated with the NCBI Prokaryotic Genome Annotation Pipeline and contains 6631 protein-coding sequences, 3 complete rRNAs, 52 tRNAs, and 4 non-coding RNAs. The Ensifer sp. P24N7 genome has 59 genes related to heavy metal tolerance predicted by the RAST server. These data may be useful to the scientific community because they can be used as a reference for other works related to heavy metals, including works in Huautla, Morelos.
      Citation: Data
      PubDate: 2025-02-27
      DOI: 10.3390/data10030034
      Issue No: Vol. 10, No. 3 (2025)
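
      The N50 statistic quoted in the abstract above can be illustrated with a short sketch. It uses the common definition (the contig length at which the largest contigs, summed in descending order, first cover at least half of the assembly); the function name and the toy contig lengths are hypothetical and are not data from the Ensifer sp. P24N7 assembly.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the summed
    length of all contigs.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length


# Hypothetical contig lengths in bp, for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)
```

      For the toy list the total is 300 bp, and the two longest contigs (80 + 70) already reach half of it, so the N50 is 70.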
       
  • Data, Vol. 10, Pages 35: A Comprehensive Indoor Environment Dataset from
           Single-Family Houses in the US

    • Authors: Sheik Murad Hassan Anik, Xinghua Gao, Na Meng
      First page: 35
      Abstract: The paper describes a dataset comprising indoor environmental factors such as temperature, humidity, air quality, and noise levels. The data were collected from 10 sensing devices installed in various locations within three single-family houses in Virginia, USA. The objective of the data collection was to study the indoor environmental conditions of the houses over time. The data were collected at a frequency of one record per minute for a year, for a total of over 2.5 million records. The paper provides actual floor plans with sensor placements to aid researchers and practitioners in creating reliable building performance models. The techniques used to collect and verify the data are also explained in the paper. The resulting dataset can be employed to enhance models for building energy consumption, occupant behavior, predictive maintenance, and other relevant purposes.
      Citation: Data
      PubDate: 2025-03-05
      DOI: 10.3390/data10030035
      Issue No: Vol. 10, No. 3 (2025)
       
  • Data, Vol. 10, Pages 36: KRID: A Large-Scale Nationwide Korean Road
           Infrastructure Dataset for Comprehensive Road Facility Recognition

    • Authors: Hyeongbok Kim, Eunbi Kim, Sanghoon Ahn, Beomjin Kim, Sung Jin Kim, Tae Kyung Sung, Lingling Zhao, Xiaohong Su, Gilmu Dong
      First page: 36
      Abstract: Comprehensive datasets are crucial for developing advanced AI solutions in road infrastructure, yet most existing resources focus narrowly on vehicles or a limited set of object categories. To address this gap, we introduce the Korean Road Infrastructure Dataset (KRID), a large-scale dataset designed for real-world road maintenance and safety applications. Our dataset covers highways, national roads, and local roads in both city and non-city areas, comprising 34 distinct types of road infrastructure—from common elements (e.g., traffic signals, gaze-directed poles) to specialized structures (e.g., tunnels, guardrails). Each instance is annotated with either bounding boxes or polygon segmentation masks under stringent quality control and privacy protocols. To demonstrate the utility of this resource, we conducted object detection and segmentation experiments using YOLO-based models, focusing on guardrail damage detection and traffic sign recognition. Preliminary results confirm its suitability for complex, safety-critical scenarios in intelligent transportation systems. Our main contributions include: (1) a broader range of infrastructure classes than conventional “driving perception” datasets, (2) high-resolution, privacy-compliant annotations across diverse road conditions, and (3) open-access availability through AI Hub and GitHub. By highlighting critical yet often overlooked infrastructure elements, this dataset paves the way for AI-driven maintenance workflows, hazard detection, and further innovations in road safety.
      Citation: Data
      PubDate: 2025-03-14
      DOI: 10.3390/data10030036
      Issue No: Vol. 10, No. 3 (2025)
       
  • Data, Vol. 10, Pages 11: RNA Sequencing Dataset of Drosophila Nociceptor
           Translatomic Response to Injury

    • Authors: Christine M. Hale, Kyle J. Beauchemin, Courtney L. Brann, Julie K. Moulton, Ramaz Geguchadze, Benjamin J. Harrison, Geoffrey K. Ganter
      First page: 11
      Abstract: To prepare to address the mechanisms of injury-induced nociceptor sensitization, we sequenced the translatome of the nociceptors of injured Drosophila larvae and those of uninjured larvae. Third-instar larvae expressing a green fluorescent protein (GFP)-tagged ribosomal subunit specifically in Class 4 dendritic arborization neurons, recognized as pickpocket-expressing primary nociceptors, via the GAL4/UAS method, were injured by ultraviolet light or sham-injured. Larvae were subjected to translating ribosome affinity purification for the GFP tag and nociceptor-specific ribosome-bound RNA was sequenced.
      Citation: Data
      PubDate: 2025-01-21
      DOI: 10.3390/data10020011
      Issue No: Vol. 10, No. 2 (2025)
       
  • Data, Vol. 10, Pages 12: Portable Analyses of Strategic Metal-Rich
           Minerals Using pXRF and pLIBS: Methodology and Database Development

    • Authors: Marjolène Jatteau, Jean Cauzid, Cécile Fabre, Panagiotis Voudouris, Georgios Soulamidis, Alexandre Tarantola
      First page: 12
      Abstract: Strategic metals are indispensable for meeting the needs of modern society. It is therefore necessary to reassess the potential of such metals in Europe. For the exploration of strategic metals, portable XRF (X-Ray Fluorescence) and LIBS (Laser Induced Breakdown Spectroscopy) are powerful techniques allowing their multi-elementary analysis. This paper presents a database providing more than 2000 pXRF data and more than 4000 pLIBS spectra acquired on minerals from the Mineralogy and Petrology Museum of National and Kapodistrian University of Athens (NKUA), selected based on their potential in bearing strategic metals. The combination of these two portable techniques, along with an expanding dataset on strategic metal-rich minerals, provides valuable insights into strategic metal affinities and demonstrates the effectiveness of portable tools for exploring strategic raw materials. Indeed, such a database makes it possible to strengthen the knowledge on strategic metals by producing statistical and chemometric analyses (e.g., boxplot, PCA, PLS) on their distribution.
      Citation: Data
      PubDate: 2025-01-27
      DOI: 10.3390/data10020012
      Issue No: Vol. 10, No. 2 (2025)
       
  • Data, Vol. 10, Pages 13: A Multimodal Dataset of Fact-Checked News from
           Chile’s Constitutional Processes: Collection, Processing, and
           Analysis

    • Authors: Ignacio Molina, Brian Keith, Mauricio Matus
      First page: 13
      Abstract: This paper presents a multimodal dataset capturing fact-checked news coverage of Chile’s constitutional processes from 2019–2023. The collection comprises 300 articles from three sources: Fast Check, Fact Checking UC, and BioBioChile, containing 242,687 words of text and visual content in 168 entries. The dataset implements advanced natural language processing through RoBERTa and computer vision techniques via EfficientNet, with unified multimodal analysis using the CLIP model. Technical validation through clustering analysis and expert review demonstrates the dataset’s effectiveness in identifying narrative patterns within constitutional process coverage. The structured format includes verification metadata, precomputed embeddings, and documented relationships between textual and visual elements. This enables research into how misinformation propagates through multiple channels during significant political events. This paper details the dataset’s composition, collection methodology, and validation while acknowledging specific limitations. This contribution addresses a gap in current research resources by providing verified multimodal content spanning two constitutional processes, supporting investigations in computational social science and misinformation studies.
      Citation: Data
      PubDate: 2025-01-28
      DOI: 10.3390/data10020013
      Issue No: Vol. 10, No. 2 (2025)
       
  • Data, Vol. 10, Pages 14: Data on Stark Broadening of Sn II Spectral Lines

    • Authors: Milan S. Dimitrijević, Magdalena D. Christova, Cristina Yubero, Sylvie Sahal-Bréchot
      First page: 14
      Abstract: Data on spectral line widths and shifts broadened by interactions with charged particles, for 44 lines in the spectrum of ionized tin, for collisions with electrons and H II and He II ions, are presented as online available tables. We obtained them by employing the semiclassical perturbation theory for temperatures, T, within the 5000–100,000 K range, and for a grid of perturber densities from 10^14 cm^-3 to 10^20 cm^-3. The presented Stark broadening data are of interest for the analysis and synthesis of ionized tin lines in the spectra of hot and dense stars, such as white dwarfs and hot subdwarfs, and for the modelling of their atmospheres. They are also useful for the diagnostics of laser-induced plasmas for high-order harmonics generation in ablated materials.
      Citation: Data
      PubDate: 2025-01-28
      DOI: 10.3390/data10020014
      Issue No: Vol. 10, No. 2 (2025)
       
  • Data, Vol. 10, Pages 15: Global Dataset of Extreme Sea Levels and Coastal
           Flood Impacts over the 21st Century

    • Authors: Ebru Kirezci, Ian Young, Roshanka Ranasinghe, Yiqun Chen, Yibo Zhang, Abbas Rajabifard
      First page: 15
      Abstract: A global database of coastal flooding impacts resulting from extreme sea levels is developed for the present day and for the years 2050 and 2100. The database consists of three sub-datasets: the extreme sea levels, the coastal areas flooded by these extreme sea levels, and the resulting socioeconomic implications. The extreme sea levels consider the processes of storm surge, tide levels, breaking wave setup and relative sea level rise. The socioeconomic implications are expressed in terms of Expected Annual Population Affected (EAPA) and Expected Annual Damage (EAD), and presented at the global, regional and national scales. The EAPA and EAD are determined both for existing coastal defence levels and assuming two plausible adaptation scenarios, along with socioeconomic development narratives. All the sub-datasets can be visualized with a Digital Twin platform based on a GIS-based mapping host. This publicly available database provides a first-pass assessment, enabling users to extract and identify global and national coastal hotspots under different projections of sea level rise and socioeconomic developments.
      Citation: Data
      PubDate: 2025-01-28
      DOI: 10.3390/data10020015
      Issue No: Vol. 10, No. 2 (2025)
       
  • Data, Vol. 10, Pages 16: Data-Driven Scheduling Optimization for SMT Lines
           Using SMD Reel Commonality

    • Authors: Jorge Quijano, Nohemi Torres Cruz, Leslie Quijano-Quian, Eduardo Rafael Poblano-Ojinaga, Salvador Anacleto Noriega Morales
      First page: 16
      Abstract: Optimizing production efficiency in Surface-Mount Technology (SMT) manufacturing is a critical challenge, particularly in high-mix environments where frequent product changeovers can lead to significant downtime. This study presents a scheduling algorithm that minimizes changeover times on SMT lines by leveraging the commonality of Surface-Mount Device (SMD) reel part numbers across product Bills of Materials (BOMs). The algorithm’s capabilities were demonstrated through both simulated datasets and practical validation trials, providing a comprehensive evaluation framework. In the practical implementation, the algorithm successfully aligned predicted and measured changeover times, highlighting its applicability and accuracy in operational settings. The proposed approach integrates heuristic and optimization techniques to identify scheduling strategies that not only minimize reel changes but also support production scalability and operational flexibility. This framework offers a robust solution for optimizing SMT workflows, enhancing productivity, and reducing resource inefficiencies in both greenfield projects and established manufacturing environments.
      Citation: Data
      PubDate: 2025-01-29
      DOI: 10.3390/data10020016
      Issue No: Vol. 10, No. 2 (2025)
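
The abstract above describes ordering products so that consecutive jobs share SMD reel part numbers. The sketch below illustrates that idea only; it is not the authors' algorithm. It assumes a simple greedy nearest-neighbour heuristic, a cost model in which a changeover costs one unit per reel not already mounted for the previous product, and made-up BOMs. The names `changeover_cost`, `greedy_schedule`, and `total_changes` are hypothetical.

```python
def changeover_cost(current_reels, next_reels):
    """Reels needed by the next product that are not already mounted (assumed cost model)."""
    return len(set(next_reels) - set(current_reels))


def greedy_schedule(boms, start):
    """Order products so that each next product needs the fewest new reels.

    boms:  dict mapping product name -> set of reel part numbers (hypothetical data).
    start: product to run first.
    """
    order = [start]
    remaining = set(boms) - {start}
    while remaining:
        current = boms[order[-1]]
        # sorted() makes tie-breaking deterministic (alphabetical).
        nxt = min(sorted(remaining), key=lambda p: changeover_cost(current, boms[p]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def total_changes(boms, order):
    """Total reel loads summed over all changeovers in the given order."""
    return sum(changeover_cost(boms[a], boms[b]) for a, b in zip(order, order[1:]))


# Made-up bills of materials for four products.
boms = {
    "A": {"R1", "R2", "R3"},
    "B": {"R7", "R8", "R9"},
    "C": {"R1", "R2", "R4"},
    "D": {"R7", "R8", "R4"},
}
order = greedy_schedule(boms, "A")
print(order, total_changes(boms, order))
```

On this toy input the greedy order needs fewer reel loads than running the products alphabetically. A greedy pass is not guaranteed to be optimal, and a real line would also have to account for feeder capacity and due dates.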
       
  • Data, Vol. 10, Pages 17: Rainfall Intensity–Duration–Frequency
           Curves Dataset for Brazil

    • Authors: Ivana Patente Torres, Roberto Avelino Cecílio, Laura Thebit de Almeida, Marcel Carvalho Abreu, Demetrius David da Silva, Sidney Sara Zanetti, Alexandre Cândido Xavier
      First page: 17
      Abstract: This is a database containing rainfall intensity–duration–frequency equations (IDF equations) for 6550 pluviographic and pluviometric stations in Brazil. The database was compiled from 370 different publications and contains the following information: station identification, geographic position, size and period of the rainfall series used, parameters of the IDF equations, and literature references. The database is available on Mendeley Data (DOI: 10.17632/378bdcmnc8.1) in the form of spreadsheets and vector files. Since the launch of the Pluvio 2.1 software in 2006, which included 549 IDF equations obtained in the country, this is the largest and most accessible database of IDF equations in Brazil. The data provided may be useful, among other purposes, for designing hydraulic structures, controlling water erosion, planning land use, and water resource planning and management.
      Citation: Data
      PubDate: 2025-01-29
      Issue No: Vol. 10, No. 2 (2025)
       
  • Data, Vol. 10, Pages 18: Statistical Approach in Personalized Nutrition
           Exemplified by Reanalysis of Public Datasets

    • Authors: Paola G. Ferrario, Maik Döring, Christian Ritz
      First page: 18
      Abstract: In clinical nutrition, it is regularly observed that individuals respond differently to a dietary treatment. Personalized nutrition aims to consider such variability in response by delivering personalized nutritional recommendations. Ideally, the optimal treatment for each individual will be selected and then dispensed according to the specific individual’s characteristics. The aim of this paper is to discuss and apply existing statistical methods, which can be adequately used in the context of personalized nutrition. We discuss the estimation of individualized treatment rules (ITRs) as we wish to favor one out of two interventions. The applicability of the methods is demonstrated by reusing two public datasets: one in the context of a parallel group design and one in the context of a crossover design. The bias of the estimator of the ITRs underlying parameters is evaluated in a simulation study.
      Citation: Data
      PubDate: 2025-01-30
      DOI: 10.3390/data10020018
      Issue No: Vol. 10, No. 2 (2025)
       
  • Data, Vol. 10, Pages 19: Impact of Various Land Cover Transformations on
           Climate Change: Insights from a Spatial Panel Analysis

    • Authors: Mohsen Khezri
      First page: 19
      Abstract: This study introduces an innovative empirical methodology by integrating spatial panel models with satellite imagery data from 1970 to 2019. This approach illuminates the effects of greenhouse gas emissions, deforestation, and various global variables on regional temperature shifts and the environmental repercussions of land-use alterations, establishing a substantial empirical basis for climate change. The results revealed that global variables such as sunspot activity, the length of day (LOD), and the Global Mean Sea Level (GMSL) have negligible impacts on global temperature variations. This model uncovers the nuanced effect of deforestation on global temperatures, highlighting a decrease in temperature following deforestation above 40° N latitude, contrary to the warming effect observed in lower latitudes. Exceptionally, deforestation within the 10° N to 10° S tropical bands results in a temperature decrease, challenging the established theories. The results suggest that converting forests to grass/shrublands and croplands plays a significant role in these temperature dynamics.
      Citation: Data
      PubDate: 2025-01-31
      DOI: 10.3390/data10020019
      Issue No: Vol. 10, No. 2 (2025)
       
  • Data, Vol. 10, Pages 20: Seaweed-Based Bioplastics: Data Mining
           Ingredient–Property Relations from the Scientific Literature

    • Authors: Fernanda Véliz, Thulasi Bikku, Davor Ibarra-Pérez, Valentina Hernández-Muñoz, Alysia Garmulewicz, Felipe Herrera
      First page: 20
      Abstract: Automated analysis of the scientific literature using natural language processing (NLP) can accelerate the identification of potentially unexplored formulations that enable innovations in materials engineering with fewer experimentation and testing cycles. This strategy has been successful for specific classes of inorganic materials, but their general application in broader material domains such as bioplastics remains challenging. To begin addressing this gap, we explore correlations between the ingredients and physicochemical properties of seaweed-based biofilms from a corpus of 2000 article abstracts from the scientific literature since 1958, using a supervised word co-occurrence analysis and an unsupervised approach based on the language model MatBERT without fine-tuning. Using known relations between ingredients and properties for test scenarios, we discuss the potential and limitations of these NLP approaches for identifying novel combinations of polysaccharides, plasticizers, and additives that are related to the functionality of seaweed biofilms. The model demonstrates a valuable predictive ability to identify ingredients associated with increased water vapor permeability, suggesting its potential utility in optimizing formulations for future research. Using the model further revealed alternative combinations that are underrepresented in the literature. This automated method facilitates the mapping of relationships between ingredients and properties, guiding the development of seaweed bioplastic formulations. The unstructured and heterogeneous nature of the literature on bioplastics represents a particular challenge that demands ad hoc fine-tuning strategies for state-of-the-art language models for advancing the field of seaweed bioplastics.
      Citation: Data
      PubDate: 2025-02-01
      DOI: 10.3390/data10020020
      Issue No: Vol. 10, No. 2 (2025)
       
  • Data, Vol. 10, Pages 21: An Open Database of the Internal and Surface
           Temperatures of a Reinforced-Concrete Slab-on-I-Beam Section

    • Authors: Pedro Cavadia, José M. Benjumea, Oscar Begambre, Edison Osorio, María A. Mantilla
      First page: 21
      Abstract: Due to climate change, the temperature monitoring of reinforced-concrete (RC) structures is becoming critical for preventive maintenance and extending their lifespan. Significant temperature variations in RC elements can affect their natural frequencies and modulus of elasticity or generate abnormal stress levels, potentially leading to structural damage. Data from thermal monitoring systems are invaluable for testing and validating numerical methodologies for estimating internal thermal responses and aiding in prevention/maintenance decision making. Despite its importance, few experimental outdoor data on the internal and external temperatures of concrete structures are available. This study presents a comprehensive dataset from a 120-day temperature-monitoring campaign on a 1.2 m long reinforced-concrete slab-on-I-beam model under tropical conditions in Bucaramanga, Colombia. The monitoring system measured the internal temperatures at 40 points using embedded thermocouples, while the surface temperatures were recorded with handheld and drone-mounted thermal cameras. Simultaneously, the ambient temperature, solar radiation, rainfall, wind velocity, and other parameters were monitored using a weather station. The instrumentation ensured the synchronization and high spatial resolution of the thermal data. The data, collected at 30 min intervals, are openly available in CSV format, offering valuable resources for validating numerical models, studying thermal gradients, and enhancing structural health-monitoring frameworks.
      Citation: Data
      PubDate: 2025-02-04
      DOI: 10.3390/data10020021
      Issue No: Vol. 10, No. 2 (2025)
       
  • Data, Vol. 10, Pages 22: Stress Factors in Higher Education: A Data
           Analysis Case

    • Authors: Rodolfo Bojorque, Fernando Moscoso, Fernando Pesántez, Ángela Flores
      First page: 22
      Abstract: This study investigates stressors in higher education, focusing on their impact on students and faculty at Universidad Politécnica Salesiana (UPS) and using eight years of comprehensive data. Employing data mining techniques, the research analyzed enrollment, retention, graduation, employability, socioeconomic status, academic performance, and faculty workload to uncover patterns affecting academic outcomes. The study found that UPS exhibits a stable educational system, maintaining consistent metrics across student success indicators. However, the COVID-19 pandemic presented unique stressors, evidenced by a paradoxical increase in student grades during heightened faculty stress levels. This anomaly suggests a potential link between academic rigor and faculty well-being during systemic disruptions. Stressors affecting students directly correlated with reduced academic performance, highlighting the importance of early detection and intervention. Conversely, faculty stress was reflected in adjustments to grading practices, raising questions about institutional pressures and faculty motivation. These findings emphasize the value of proactive data analytics in identifying stress-induced anomalies to support student success and faculty well-being. The study advocates for further research on faculty burnout, motivation, and institutional strategies to mitigate stressors, underscoring the potential of data-driven approaches to enhance the quality and sustainability of higher education ecosystems.
      Citation: Data
      PubDate: 2025-02-07
      DOI: 10.3390/data10020022
      Issue No: Vol. 10, No. 2 (2025)
       
  • Data, Vol. 10, Pages 23: A Bayesian State-Space Approach to Dynamic
           Hierarchical Logistic Regression for Evolving Student Risk in Educational
           Analytics

    • Authors: Moeketsi Mosia
      First page: 23
      Abstract: Early detection of academically at-risk students is crucial for designing timely interventions that improve educational outcomes. However, many existing approaches either ignore the temporal evolution of student performance or rely on “black box” models that sacrifice interpretability. In this study, we develop a dynamic hierarchical logistic regression model in a fully Bayesian framework to address these shortcomings. Our method leverages partial pooling across students and employs a state-space formulation, allowing each student’s log-odds of failure to evolve over multiple assessments. By using Markov chain Monte Carlo for inference, we obtain robust posterior estimates and credible intervals for both population-level and individual-specific effects, while posterior predictive checks ensure model adequacy and calibration. Results from simulated and real-world datasets indicate that the proposed approach more accurately tracks fluctuations in student risk compared to static logistic regression, and it yields interpretable insights into how engagement patterns and demographic factors influence failure probability. We conclude that a Bayesian dynamic hierarchical model not only enhances prediction of at-risk students but also provides actionable feedback for instructors and administrators seeking evidence-based interventions.
      Citation: Data
      PubDate: 2025-02-07
      DOI: 10.3390/data10020023
      Issue No: Vol. 10, No. 2 (2025)
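
The abstract above describes a state-space formulation in which each student's log-odds of failure evolves across assessments, with partial pooling across students. One common way to write such a model is sketched below. It is a generic random-walk form given under stated assumptions, not necessarily the paper's specification. All symbols are introduced here for illustration: $y_{i,t}$ is the failure indicator for student $i$ at assessment $t$, $\theta_{i,t}$ is the latent log-odds, $\sigma$ is the innovation scale, and $\mu$ and $\tau$ are the population-level mean and spread that provide the partial pooling.

```latex
\begin{aligned}
y_{i,t} \mid \theta_{i,t} &\sim \operatorname{Bernoulli}\!\left(\frac{1}{1 + e^{-\theta_{i,t}}}\right), \\
\theta_{i,t} &= \theta_{i,t-1} + \eta_{i,t}, \qquad \eta_{i,t} \sim \mathcal{N}(0, \sigma^{2}), \\
\theta_{i,0} &\sim \mathcal{N}(\mu, \tau^{2}).
\end{aligned}
```

In a fully Bayesian treatment, priors are also placed on $\mu$, $\tau$, and $\sigma$, and the joint posterior is sampled with Markov chain Monte Carlo, as the abstract indicates.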
       
  • Data, Vol. 10, Pages 24: Visual Footprint of Separation Through Membrane
           Distillation on YouTube

    • Authors: Ersin Aytaç, Mohamed Khayet
      First page: 24
      Abstract: Social media has revolutionized the dissemination of information, enabling the rapid and widespread sharing of news, concepts, technologies, and ideas. YouTube is one of the most important online video sharing platforms of our time. In this research, we investigate the trace of separation through membrane distillation (MD) on YouTube using statistical methods and natural language processing. The dataset collected on 04.01.2024 included 212 videos with key characteristics such as durations, views, subscribers, number of comments, likes, etc. The results show that the number of videos is not sufficient, but there is an increasing trend, especially since 2019. The high number of channels offering information about MD technology in countries such as the USA, India, and Canada indicates that these countries recognized the practical benefits of this technology, especially in areas such as water treatment, desalination, and industrial applications. This suggests that MD could play a pivotal role in finding solutions to global water challenges. Word cloud analysis showed that terms such as “water”, “treatment”, “desalination”, and “separation” were prominent, indicating that the videos focused mainly on the principles and applications of MD. The sentiment of the comments is mostly positive, and the dominant emotion is neutral, revealing that viewers generally have a positive attitude towards MD. The narrative intensity metric evaluates the information transfer efficiency of the videos and provides a guide for effective content creation strategies. The results of the analyses revealed that social media awareness about MD technology is still not sufficient and that content development and sharing strategies should focus on bringing the technology to a wider audience.
      Citation: Data
      PubDate: 2025-02-08
      DOI: 10.3390/data10020024
      Issue No: Vol. 10, No. 2 (2025)
       
  • Data, Vol. 10, Pages 25: CropsDisNet: An AI-Based Platform for Disease
           Detection and Advancing On-Farm Privacy Solutions

    • Authors: Mohammad Badhruddouza Khan, Salwa Tamkin, Jinat Ara, Mobashwer Alam, Hanif Bhuiyan
      First page: 25
      Abstract: Crop failure is defined as crop production that is significantly lower than anticipated, resulting from plants that are harmed, diseased, destroyed, or influenced by climatic circumstances. With the rise in global food security concern, the earliest detection of crop diseases has proven to be pivotal in agriculture industries to address the needs of the global food crisis and on-farm data protection, which can be met with a privacy-preserving deep learning model. However, deep learning seems to be a largely complex black box to interpret, necessitating a prerequisite for the groundwork of the model’s interpretability. Considering this, the aim of this study was to follow up on the establishment of a robust deep learning custom model named CropsDisNet, evaluated on a large-scale dataset named “New Bangladeshi Crop Disease Dataset (corn, potato and wheat)”, which contains a total of 8946 images. The integration of a differential privacy algorithm into our CropsDisNet model could establish the benefits of automated crop disease classification without compromising on-farm data privacy by reducing training data leakage. To classify corn, potato, and wheat leaf diseases, we used three representative CNN models for image classification (VGG16, Inception Resnet V2, Inception V3) along with our custom model, and the classification accuracy for these three different crops varied from 92.09% to 98.29%. In addition, demonstration of the model’s interpretability gave us insight into our model’s decision making and classification results, which can allow farmers to understand and take appropriate precautions in the event of early widespread harvest failure and food crises.
      Citation: Data
      PubDate: 2025-02-18
      DOI: 10.3390/data10020025
      Issue No: Vol. 10, No. 2 (2025)
       
  • Data, Vol. 10, Pages 26: Consistency and Stability in Feature Selection
           for High-Dimensional Microarray Survival Data in Diffuse Large B-Cell
           Lymphoma Cancer

    • Authors: Kazeem A. Dauda, Rasheed K. Lamidi
      First page: 26
      Abstract: High-dimensional survival data, such as microarray datasets, present significant challenges in variable selection and model performance due to their complexity and dimensionality. Identifying important genes and understanding how these genes influence the survival of patients with cancer are of great interest and a major challenge to biomedical scientists, healthcare practitioners, and oncologists. Therefore, this study combined the strengths of two complementary feature selection methodologies: a filtering (correlation-based) approach and a wrapper method based on Iterative Bayesian Model Averaging (IBMA). This new approach, termed Correlation-Based IBMA, offers a highly efficient and effective means of selecting the most important and influential genes for predicting the survival of patients with cancer. The efficiency and consistency of the method were demonstrated using diffuse large B-cell lymphoma cancer data. The results revealed that the 15 most important genes out of 3835 gene features were consistently selected at a threshold p-value of 0.001, with genes with posterior probabilities below 1% being removed. The influence of these 15 genes on patient survival was assessed using the Cox Proportional Hazards (Cox-PH) Model. The results further revealed that eight genes were highly associated with patient survival at a 0.05 level of significance. Finally, these findings underscore the importance of integrating feature selection with robust modeling approaches to enhance accuracy and interpretability in high-dimensional survival data analysis.
      Citation: Data
      PubDate: 2025-02-18
      DOI: 10.3390/data10020026
      Issue No: Vol. 10, No. 2 (2025)
       
  • Data, Vol. 10, Pages 4: Optimizing Parkinson’s Disease Prediction: A
           Comparative Analysis of Data Aggregation Methods Using Multiple Voice
           Recordings via an Automated Artificial Intelligence Pipeline

    • Authors: Zhengxiao Yang, Hao Zhou, Sudesh Srivastav, Jeffrey G. Shaffer, Kuukua E. Abraham, Samuel M. Naandam, Samuel Kakraba
      First page: 4
      Abstract: Patient-level grouped data are prevalent in public health and medical fields, and multiple instance learning (MIL) offers a framework to address the challenges associated with this type of data structure. This study compares four data aggregation methods designed to tackle the grouped structure in classification tasks: post-mean, post-max, post-min, and pre-mean aggregation. We developed a customized AI pipeline that incorporates twelve machine learning algorithms along with the four aggregation methods to detect Parkinson’s disease (PD) using multiple voice recordings from individuals available in the UCI Machine Learning Repository, which includes 756 voice recordings from 188 PD patients and 64 healthy individuals. Seven performance metrics—accuracy, precision, sensitivity, specificity, F1 score, AUC, and MCC—were utilized for model evaluation. Various techniques, such as Bag Over-Sampling (BOS), cross-validation, and grid search, were implemented to enhance classification performance. Among the four aggregation methods, post-mean aggregation combined with XGBoost achieved the highest accuracy (0.880), F1 score (0.922), and MCC (0.672). Furthermore, we identified potential trends in selecting aggregation methods that are suitable for imbalanced data, particularly based on their differences in sensitivity and specificity. These findings provide meaningful implications for the further exploration of grouped imbalanced data.
      Citation: Data
      PubDate: 2025-01-02
      DOI: 10.3390/data10010004
      Issue No: Vol. 10, No. 1 (2025)
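
The abstract above compares four ways of combining several recordings per patient. The sketch below shows one reading of those names, which is an assumption: "post-" methods score each recording and then combine the per-recording probabilities, while "pre-mean" averages the feature vectors first and scores once. The `score` function is a hypothetical stand-in for a trained classifier, and the weights and recordings are made up.

```python
import math
from statistics import mean


def score(features, weights=(1.5, -2.0), bias=0.0):
    """Hypothetical stand-in for a trained classifier: logistic score in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def post_aggregate(recordings, how):
    """Score every recording, then combine the per-recording probabilities."""
    probs = [score(r) for r in recordings]
    return {"mean": mean, "max": max, "min": min}[how](probs)


def pre_mean(recordings):
    """Average the feature vectors first, then score the single averaged vector."""
    averaged = [mean(column) for column in zip(*recordings)]
    return score(averaged)


# Three made-up recordings (two features each) for one patient.
patient = [(0.9, 0.1), (0.2, 0.4), (0.6, 0.3)]
results = {
    "post-mean": post_aggregate(patient, "mean"),
    "post-max": post_aggregate(patient, "max"),
    "post-min": post_aggregate(patient, "min"),
    "pre-mean": pre_mean(patient),
}
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Because the scoring function is nonlinear, pre-mean and post-mean generally give different patient-level probabilities. Post-max favours sensitivity and post-min favours specificity, which is consistent with the trade-off the abstract mentions for imbalanced data.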
       
  • Data, Vol. 10, Pages 5: The EDI Multi-Modal Simultaneous Localization and
           Mapping Dataset (EDI-SLAM)

    • Authors: Racinskis, Krasnikovs, Arents, Greitans
      First page: 5
      Abstract: This paper accompanies the initial public release of the EDI multi-modal SLAM dataset, a collection of long tracks recorded with a portable sensor package. These include two global shutter RGB camera feeds, LiDAR scans, as well as inertial and GNSS data from an RTK-enabled IMU-GNSS positioning module—both as satellite fixes and internally fused interpolated pose estimates. The tracks are formatted as ROS1 and ROS2 bags, with separately available calibration and ground truth data. In addition to the filtered positioning module outputs, a second form of sparse ground truth pose annotation is provided using independently surveyed visual fiducial markers as a reference. This enables the meaningful evaluation of systems that directly incorporate data from the positioning module into their localization estimates, and serves as an alternative when the GNSS reference is disrupted by intermittent signals or multipath scattering. In this paper, we describe the methods used to collect the dataset, its contents, and its intended use.
      Citation: Data
      PubDate: 2025-01-07
      DOI: 10.3390/data10010005
      Issue No: Vol. 10, No. 1 (2025)
       
  • Data, Vol. 10, Pages 6: Self-Reported Data for Sustainable Development
           from People Living in Rural and Remote Areas

    • Authors: Salem Ahmed Alabdali, Salvatore Flavio Pileggi, Gnana Bharathy
      First page: 6
      Abstract: This paper describes a dataset for the Sustainable Development of remote and rural areas. Version 1.0 includes self-reported data, with a total of 212 valid responses collected in 2024 across different sectors (education, healthcare, and business) from people living in rural and remote areas in Saudi Arabia. The structured survey is understood to support research endeavors and policy making, looking at the peculiar characteristics of those regions. The 40 core questions, in addition to the detailed demographic questions, aim to capture different perspectives and perceptions on innovative and sustainable solutions. Overall, the dataset offers valuable strategic insights to be integrated with other sources of information, as well as the opportunity to incrementally generate extensive and diverse knowledge in the field. The major limitation is inherently related to the local context, as data comes from the most educated persons with access to digital resources. Additionally, the dataset may be considered as relatively small, and there is some gender imbalance due to cultural factors.
      Citation: Data
      PubDate: 2025-01-08
      DOI: 10.3390/data10010006
      Issue No: Vol. 10, No. 1 (2025)
       
  • Data, Vol. 10, Pages 7: Cholec80-Boxes: Bounding Box Labelling Data for
           Surgical Tools in Cholecystectomy Images

    • Authors: Tamer Abdulbaki Alshirbaji, Nour Aldeen Jalal, Herag Arabian, Alberto Battistel, Paul David Docherty, Hisham ElMoaqet, Thomas Neumuth, Knut Moeller
      First page: 7
      Abstract: Surgical data analysis is crucial for developing and integrating context-aware systems (CAS) in advanced operating rooms. Automatic detection of surgical tools is an essential component in CAS, as it enables the recognition of surgical activities and understanding the contextual status of the procedure. Acquiring surgical data is challenging due to ethical constraints and the complexity of establishing data recording infrastructures. For machine learning tasks, there is also the large burden of data labelling. Although a relatively large dataset, namely the Cholec80, is publicly available, it is limited to the binary label data corresponding to the surgical tool presence. In this work, 15,691 frames from five videos from the dataset have been labelled with bounding boxes for surgical tool localisation. These newly labelled data support future research in developing and evaluating object detection models, particularly in the laparoscopic image data analysis domain.
      Citation: Data
      PubDate: 2025-01-08
      DOI: 10.3390/data10010007
      Issue No: Vol. 10, No. 1 (2025)
       
  • Data, Vol. 10, Pages 8: Application of Google Earth Engine to Monitor
           Greenhouse Gases: A Review

    • Authors: Damar David Wilson, Gebrekidan Worku Tefera, Ram L. Ray
      First page: 8
      Abstract: Google Earth Engine (GEE) is a cloud-based platform revolutionizing geospatial analysis by providing access to vast satellite datasets and computational capabilities for monitoring environmental and societal issues. It incorporates machine learning (ML) techniques and algorithms as part of its tools for analyzing and processing large geospatial data. This review explores the diverse applications of GEE in monitoring and mitigating greenhouse gas emissions and uptakes. GEE is a cloud-based platform built on Google’s infrastructure for analyzing and visualizing large-scale geospatial datasets. It offers large datasets for monitoring greenhouse gas (GHG) emissions and understanding their environmental impact. By leveraging GEE’s capabilities, researchers have developed tools and algorithms to analyze remotely sensed data and accurately quantify GHG emissions and uptakes. This review examines progress and trends in GEE applications, focusing on monitoring carbon dioxide (CO2), methane (CH4), and nitrous oxide/nitrogen dioxide (N2O/NO2) emissions. It discusses the integration of GEE with different machine learning methods and the challenges and opportunities in optimizing algorithms and ensuring data interoperability. Furthermore, it highlights GEE’s role in pinpointing emission hotspots, as demonstrated in studies monitoring uptakes. By providing insights into GEE’s capabilities for precise monitoring and mapping of GHGs, this review aims to advance environmental research and decision-making processes in mitigating climate change.
      Citation: Data
      PubDate: 2025-01-11
      DOI: 10.3390/data10010008
      Issue No: Vol. 10, No. 1 (2025)
       
  • Data, Vol. 10, Pages 9: Credit Evaluation of Technology-Based Small and
           Micro Enterprises: An Innovative Weighting Method Based on Machine
           Learning and AHP

    • Authors: Bingya Wu, Zhihui Hu, Zhouyi Gu, Yuxi Zheng, Jiayan Lv
      First page: 9
      Abstract: Technology-based small and micro enterprises play a crucial role in national economic and social development. Managing their credit risk effectively is key to ensuring their healthy growth. This study is based on corporate credit management theory and Wu’s three-dimensional credit theory. It clarifies the credit concept and measurement logic of these enterprises, considering their unique development characteristics in China. A credit evaluation system is constructed, and an innovative method combining machine learning with comprehensive evaluation is proposed. This approach aims to assess the credit status of technology-based small and micro enterprises in a thorough and objective manner. The study finds that, first, the credit level of these enterprises is currently moderate, with little variation. Second, financial information remains a key factor in credit evaluation. Third, the ML-AHP (Machine Learning-Analytic Hierarchy Process) combined weighting method effectively integrates subjective experience with objective data, providing a more rational assessment. The findings provide theoretical references and practical guidance for the healthy development of technology-based small and micro enterprises, early credit risk warning, and improved financing efficiency.
      Citation: Data
      PubDate: 2025-01-14
      DOI: 10.3390/data10010009
      Issue No: Vol. 10, No. 1 (2025)
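
The combined weighting idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' ML-AHP method. It assumes the AHP weights come from the row geometric-mean approximation, that the objective (machine-learning) weights are already available as a list, and that the two are blended by a simple convex combination. The function names and the `alpha` parameter are hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method.

    `pairwise` is a square reciprocal comparison matrix (list of lists).
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def combine_weights(subjective, objective, alpha=0.5):
    """Blend subjective and objective weights (hypothetical convex rule).

    alpha = 1 keeps only the subjective weights, alpha = 0 only the
    objective ones. The result is renormalized to sum to 1.
    """
    blended = [alpha * s + (1 - alpha) * o for s, o in zip(subjective, objective)]
    total = sum(blended)
    return [b / total for b in blended]


# Hypothetical example: three indicators compared pairwise by an expert.
expert_matrix = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
subjective = ahp_weights(expert_matrix)
objective = [0.2, 0.5, 0.3]  # stand-in for data-driven feature importances
print(combine_weights(subjective, objective, alpha=0.5))
```

The blending rule is only one of several possible choices. A multiplicative combination or a data-driven choice of `alpha` would fit the same description.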
       
  • Data, Vol. 10, Pages 10: A Comprehensive Parcel-Level Dataset on Farmland
           Assessment: Addressing Grid-Cell Data Bias Estimation

    • Authors: Wai Yan Siu, Man Li, Arthur J. Caplan
      First page: 10
      Abstract: Grid-cell data are increasingly used in research due to the growing availability and accessibility of remote sensing products. However, grid-cell data often fail to represent the actual decision-making unit, leading to biased estimates in socio-economic analysis. To this end, this paper presents a comprehensive parcel-level dataset for Salt Lake County, Utah, spanning from 2008 to 2018. This dataset combines detailed spatial and temporal data on land ownership, land use, and preferential farmland tax assessments under the Greenbelt program. Compiled from multiple geospatial sources, the dataset includes nearly 200,000 parcel-year observations, providing valuable insights into landowner decision-making and the impact of tax abatement incentives at the decision-making level. This resource is beneficial for researchers, educators, and practitioners in sustainable development, environmental studies, and farmland conservation.
      Citation: Data
      PubDate: 2025-01-17
      DOI: 10.3390/data10010010
      Issue No: Vol. 10, No. 1 (2025)
       
  • Data, Vol. 9, Pages 102: An Expected Goals on Target (xGOT) Metric as a
           New Metric for Analyzing Elite Soccer Player Performance

    • Authors: Ruiz-de-Alarcón-Quintero, De-la-Cruz-Torres
      First page: 102
      Abstract: Introduction. Football analysis is an applied research area that has seen a huge upsurge in recent years. More complex analysis is required to understand the performance of soccer players and teams during matches. The objective of this study was to prove the usefulness of the expected goals on target (xGOT) metric as a good indicator of a soccer team’s performance in professional Spanish football leagues, in both the women’s and men’s categories. Method. The data for the Spanish teams were collected from the statistical website Football Reference (https://www.fbref.com). The 2023/24 season was analyzed for Spanish leagues, in both the women’s and men’s categories (LigaF and LaLiga, respectively). For all teams, the following variables were calculated: goals, possession value (PV), expected goals (xG) and xGOT. All data obtained for each variable were normalized by match (90 min). A descriptive and correlational statistical analysis was carried out. Results. In the men’s league, this study found a high correlation between goals per match and xGOT (R² = 0.9248), while in the women’s league, there was a high correlation between goals per match and xG (R² = 0.9820) and between goals per match and xGOT (R² = 0.9574). Conclusions. In LaLiga, xGOT was the metric that best represented the match result, while in LigaF, xG and xGOT were the metrics that best represented the match score.
      Citation: Data
      PubDate: 2024-08-28
      DOI: 10.3390/data9090102
      Issue No: Vol. 9, No. 9 (2024)
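
As a rough illustration of the correlational analysis described in the abstract above, the sketch below normalizes season totals per match and computes the coefficient of determination (R²) for a simple linear relationship as the squared Pearson correlation. The team figures are invented placeholders, not data from the study, and the helper names are hypothetical.

```python
from statistics import mean


def per_match(total, matches_played):
    """Normalize a season total to a per-match (90 min) value."""
    return total / matches_played


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Placeholder season totals for four hypothetical teams (38 matches each).
matches = 38
goals_total = [87, 61, 45, 33]
xgot_total = [82.4, 63.0, 47.9, 36.1]

goals_pm = [per_match(g, matches) for g in goals_total]
xgot_pm = [per_match(v, matches) for v in xgot_total]
print(f"R^2 between goals per match and xGOT per match: {r_squared(goals_pm, xgot_pm):.4f}")
```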
       
  • Data, Vol. 9, Pages 103: TM–IoV: A First-of-Its-Kind Multilabeled
           Trust Parameter Dataset for Evaluating Trust in the Internet of Vehicles

    • Authors: Yingxun Wang, Adnan Mahmood, Mohamad Faizrizwan Mohd Sabri, Hushairi Zen
      First page: 103
      Abstract: The emerging and promising paradigm of the Internet of Vehicles (IoV) employs vehicle-to-everything communication to enable vehicles to communicate not only with one another but also with the supporting roadside infrastructure, vulnerable pedestrians, and the backbone network in a bid to primarily address a number of safety-critical vehicular applications. Nevertheless, owing to the inherent characteristics of IoV networks, in particular, of being (a) highly dynamic in nature, which results in a continual change in the network topology, and (b) non-deterministic owing to the intricate nature of their entities and their interrelationships, they are susceptible to a number of malicious attacks. Such attacks, if and when they materialize, jeopardize the entire IoV network, thereby putting human lives at risk. Whilst cryptographic-based mechanisms are capable of mitigating external attacks, internal attacks are extremely hard to tackle. Trust, therefore, is an indispensable tool since it facilitates the timely identification and eradication of malicious entities responsible for launching internal attacks in an IoV network. To date, there is no dataset pertinent to trust management in the context of IoV networks, which has proven to be a bottleneck for conducting in-depth research in this domain. The manuscript-at-hand, accordingly, presents a first-of-its-kind trust-based IoV dataset encompassing 96,707 interactions amongst 79 vehicles at different time instances. The dataset involves nine salient trust parameters, i.e., packet delivery ratio, similarity, external similarity, internal similarity, familiarity, external familiarity, internal familiarity, reward/punishment, and context, which play a considerable role in ascertaining the trust of a vehicle within an IoV network.
      Citation: Data
      PubDate: 2024-08-31
      DOI: 10.3390/data9090103
      Issue No: Vol. 9, No. 9 (2024)
       
  • Data, Vol. 9, Pages 104: Interruption Audio & Transcript: Derived from
           Group Affect and Performance Dataset

    • Authors: Daniel Doyle, Ovidiu Şerban
      First page: 104
      Abstract: Despite the widespread development and use of chatbots, there is a lack of audio-based interruption datasets. This study provides a dataset of 200 manually annotated interruptions from a broader set of 355 data points of overlapping utterances. The dataset is derived from the Group Affect and Performance dataset managed by the University of the Fraser Valley, Canada. It includes both audio files and transcripts, allowing for multi-modal analysis. Given the extensive literature and the varied definitions of interruptions, it was necessary to establish precise definitions. The study aims to provide a comprehensive dataset for researchers to build and improve interruption prediction models. The findings demonstrate that classification models can generalize well to identify interruptions based on this dataset’s audio. This opens up research avenues with respect to interruption-related topics, ranging from multi-modal interruption classification using text and audio modalities to the analysis of group dynamics.
      Citation: Data
      PubDate: 2024-08-31
      DOI: 10.3390/data9090104
      Issue No: Vol. 9, No. 9 (2024)
       
  • Data, Vol. 9, Pages 105: Experimental Data in a Greenhouse with and
           without Cultivation of Stringless Blue Lake Beans

    • Authors: Sebastian-Camilo Vanegas-Ayala, Julio Barón-Velandia, Oscar-Mauricio Garcia-Chavez, Adrian Romero-Palencia, Daniel-David Leal-Lara
      First page: 105
      Abstract: Greenhouse cultivation is one of the current strategies to address the challenges of food production, sustainability, and food quality. Similarly, the use of technological tools to automate greenhouse environments through a set of sensors and actuators allows for the control and improvement of processes within this environment. This document presents data collected from the sensors and actuators of two identical greenhouse environments, one with the cultivation of stringless blue lake beans and the other without cultivation. The aim is that this dataset will provide a broader characterization of the behavior of climatic variables inside greenhouse environments and how they are impacted by control actions, subsequently contributing to the development of new research on implementations of or improvements to control, supervision, management, and automation actions in greenhouse environments.
      Citation: Data
      PubDate: 2024-09-04
      DOI: 10.3390/data9090105
      Issue No: Vol. 9, No. 9 (2024)
       
  • Data, Vol. 9, Pages 106: Analysis of Split-System Air Conditioner Faults
           through Electrical Measurement Data

    • Authors: Anderson Carlos de Oliveira, Abel Cavalcante Lima Filho, Francisco Antonio Belo, André Victor Oliveira Cadena
      First page: 106
      Abstract: This work presents an electrical measurement dataset from a split-system air conditioner in normal operating conditions and with specific faults, such as incrustation in the condenser and evaporator air inlet with different levels of blocking, which often occurs in this type of equipment. We also added compressor capacitor degradation, which is a very common fault in this type of equipment, although it is scarcely addressed in research. The data were obtained through a non-invasive current sensor and a grain-oriented voltage sensor containing the values of the current and voltage of equipment that was installed in the field and tested at different levels for these fault conditions. This work not only explains how the entire data collection process was carried out but also presents two examples of fast Fourier transform (FFT) applications for the detection and diagnosis of faults through the electrical measurements analyzed in our studies, which had good effectiveness.
      Citation: Data
      PubDate: 2024-09-13
      DOI: 10.3390/data9090106
      Issue No: Vol. 9, No. 9 (2024)
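
The FFT-based inspection of electrical measurements mentioned in the abstract above can be sketched as follows. This is an illustration on a synthetic current waveform, not the authors' procedure or data. The 60 Hz fundamental, the 180 Hz harmonic, the sampling rate, and the function names are all assumptions made for the example.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT (length must be a power of two)."""
    n = len(samples)
    if n & (n - 1) != 0 or n == 0:
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def amplitude_spectrum(samples, sample_rate):
    """Return (frequency_hz, amplitude) pairs for bins 1 .. n/2 - 1."""
    n = len(samples)
    spectrum = fft(samples)
    return [(k * sample_rate / n, 2 * abs(spectrum[k]) / n) for k in range(1, n // 2)]


# Synthetic "current" signal: assumed 60 Hz fundamental plus a weaker
# 180 Hz component standing in for a fault-related harmonic.
sample_rate = 1024  # Hz, gives 1 Hz bins with 1024 samples
current = [
    math.sin(2 * math.pi * 60 * t / sample_rate)
    + 0.2 * math.sin(2 * math.pi * 180 * t / sample_rate)
    for t in range(1024)
]

bins = amplitude_spectrum(current, sample_rate)
dominant_freq, dominant_amp = max(bins, key=lambda pair: pair[1])
print(f"dominant component: {dominant_freq:.1f} Hz, amplitude {dominant_amp:.3f}")
```

In a diagnostic setting, one would compare the amplitudes of selected bins between normal and faulty recordings. Which bins matter for a given fault is a question the dataset itself would have to answer.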
       
  • Data, Vol. 9, Pages 107: OSBA: An Open Neonatal Neuroimaging Atlas and
           Template for Spina Bifida Aperta

    • Authors: Anna Speckert, Hui Ji, Kelly Payette, Patrice Grehten, Raimund Kottke, Samuel Ackermann, Beth Padden, Luca Mazzone, Ueli Moehrlen, Spina Bifida Study Group Zurich, Andras Jakab
      First page: 107
      Abstract: We present the Open Spina Bifida Aperta (OSBA) atlas, an open atlas and set of neuroimaging templates for spina bifida aperta (SBA). Traditional brain atlases may not adequately capture anatomical variations present in pediatric or disease-specific cohorts. The OSBA atlas fills this gap by representing the computationally averaged anatomy of the neonatal brain with SBA after fetal surgical repair. The OSBA atlas was constructed using structural T2-weighted and diffusion tensor MRIs of 28 newborns with SBA who underwent prenatal surgical correction. The corrected gestational age at MRI was 38.1 ± 1.1 weeks (mean ± SD). The OSBA atlas consists of T2-weighted and fractional anisotropy templates, along with nine tissue prior maps and region of interest (ROI) delineations. The OSBA atlas offers a standardized reference space for spatial normalization and anatomical ROI definition. Our image segmentation and cortical ribbon definition are based on a human-in-the-loop approach, which includes manual segmentation. The precise alignment of the ROIs was achieved by a combination of manual image alignment and automated, non-linear image registration. From the clinical and neuroimaging perspective, the OSBA atlas enables more accurate spatial standardization and ROI-based analyses and supports advanced analyses such as diffusion tractography and connectomic studies in newborns affected by this condition.
      Citation: Data
      PubDate: 2024-09-17
      DOI: 10.3390/data9090107
      Issue No: Vol. 9, No. 9 (2024)
       
  • Data, Vol. 9, Pages 108: Dataset on the Validation and Standardization of
           the Questionnaire for the Self-Assessment of Service-Learning Experiences
           in Higher Education (QaSLu)

    • Authors: Roberto Sánchez-Cabrero, Elena López-de-Arana Prado, Pilar Aramburuzabala, Rosario Cerrillo
      First page: 108
      Abstract: This dataset shows the original validation and standardization of the Questionnaire for the Self-Assessment of Service-Learning Experiences in Higher Education (QaSLu). The QaSLu is the first instrument to measure university service-learning (USL), validated following a strict qualitative and quantitative process by a sample of experts in USL and generating rating scales for different profiles of professors. The Delphi method was used for the qualitative validation by 16 academic experts, who evaluated the relevance and clarity of the items. After two consultation rounds, 45 items were qualitatively validated, generating the QaSLu-45. Then, 118 instructors from 43 universities took part as the sample in the quantitative validation procedure. Quantitative validation was carried out through goodness-of-fit measures using confirmatory factor analysis and the final configuration optimized using one-factor robust exploratory factor analysis, determining the most optimal version of the questionnaire under the law of parsimony, the QaSLu-27, with only 27 items and better psychometric properties. Finally, rating scales were calculated to compare different profiles of USL professors. These findings offer a valid, strong, and trustworthy instrument. The QaSLu-27 may be helpful for the design of USL experiences, in addition to facilitating the assessment of such programs to enhance teaching and learning processes.
      Citation: Data
      PubDate: 2024-09-19
      DOI: 10.3390/data9090108
      Issue No: Vol. 9, No. 9 (2024)
       
  • Data, Vol. 9, Pages 109: Data on Economic Analysis: 2017 Social Accounting
           Matrices (SAMs) for South Africa

    • Authors: Pfunzo, Bahta, Jordaan
      First page: 109
      Abstract: The purpose of the Social Accounting Matrix (SAM) is to improve the quality of the database for modelling, including, but not limited to, policy analysis, multiplier analysis, price analysis, and Computable General Equilibrium. This article contributes to constructing the 2017 national SAM for South Africa, incorporating regional accounts. Only in Limpopo Province of South Africa are agricultural industries, labour, and households captured at the district level, while agricultural industry, labour, and household accounts in other provinces remain unchanged. The main data sources for constructing the SAM include Supply and Use Tables, National Accounts, Census of Commercial Agriculture, Quarterly Labour Force Survey, South African Revenue Service, Global Insight (regional explorer), and South African Reserve Bank. The dataset recorded that land returns for irrigation agriculture were highest (18.2%) in the Northern Cape Province of South Africa compared to other provinces, whereas rainfed agriculture in the Free State Province of South Africa had the largest share (22%) of payment to land. Regarding intermediate inputs, rainfed agriculture in the Western Cape, Free State, and KwaZulu-Natal Provinces paid approximately 0.4% for using intermediate inputs. In terms of the districts, land returns for irrigation were highest in the Vhembe district of Limpopo Province of South Africa with 0.3%. Despite Mopani district of Limpopo Province of South Africa having the lowest land returns for irrigation agriculture, it has the highest share (1.6%) of payment to land from rainfed agriculture. The manufacturing and community service sectors had a trade deficit, whereas other sectors experienced a trade surplus. The main challenges found in developing a SAM are scarcity of data to attain the information needed for disaggregation for the sub-matrices and insufficient information from different data sources for estimating missing information to ensure the row and column totals of the SAM are consistent and complete.
      Citation: Data
      PubDate: 2024-09-20
      DOI: 10.3390/data9090109
      Issue No: Vol. 9, No. 9 (2024)
       
  • Data, Vol. 9, Pages 90: SaBi3d—A LiDAR Point Cloud Data Set of
           Car-to-Bicycle Overtaking Maneuvers

    • Authors: Christian Odenwald, Moritz Beeking
      First page: 90
      Abstract: While cycling presents environmental benefits and promotes a healthy lifestyle, the risks associated with overtaking maneuvers by motorized vehicles represent a significant barrier for many potential cyclists. A large-scale analysis of overtaking maneuvers could inform traffic researchers and city planners how to reduce these risks by better understanding these maneuvers. Drawing from the fields of sensor-based cycling research and from LiDAR-based traffic data sets, this paper provides a step towards addressing these safety concerns by introducing the Salzburg Bicycle 3d (SaBi3d) data set, which consists of LiDAR point clouds capturing car-to-bicycle overtaking maneuvers. The data set, collected using a LiDAR-equipped bicycle, facilitates the detailed analysis of a large quantity of overtaking maneuvers without the need for manual annotation through enabling automatic labeling by a neural network. Additionally, a benchmark result for 3D object detection using a competitive neural network is provided as a baseline for future research. The SaBi3d data set is structured identically to the nuScenes data set, and therefore offers compatibility with numerous existing object detection systems. This work provides valuable resources for future researchers to better understand cycling infrastructure and mitigate risks, thus promoting cycling as a viable mode of transportation.
      Citation: Data
      PubDate: 2024-07-24
      DOI: 10.3390/data9080090
      Issue No: Vol. 9, No. 8 (2024)
       
  • Data, Vol. 9, Pages 91: Data Descriptor of Snakebites in Brazil from 2007
           to 2020

    • Authors: Alexandre Vilhena Silva-Neto, Gabriel Santos Mouta, Antônio Alcirley Silva Balieiro, Jady Shayenne Mota Cordeiro, Patricia Carvalho Silva Balieiro, Tatyana Costa Amorin Ramos, Djane Clarys Baia-da-Silva, Élisson Silva Rocha, Patricia Takako Endo, Theo Lynn, Wuelton Marcelo Monteiro, Vanderson Souza Sampaio
      First page: 91
      Abstract: Snakebite envenomations (SBE) are a significant global public health threat due to their morbidity and mortality. This is a neglected public health issue in many tropical and subtropical countries. Brazil is among the ten countries most affected by SBE, with 32,160 cases reported in 2020 alone, posing a high burden on its population. In this paper, we describe the data structure of snakebite records from 2007 to 2020 in the Notifiable Disease Information System (SINAN), made available by the Brazilian Ministry of Health (MoH). In addition, we provide R scripts that allow quick and automatic updating of data from the SINAN according to its availability. The data presented in this work are related to clinical and demographic information on SBE cases. Data on outcomes, laboratory results, and treatment are also available. The dataset is available and freely accessible; however, preprocessing, adjustments, and standardization are necessary due to incompleteness and inconsistencies. Regardless of these limitations, it provides a solid basis for assessing different aspects and the national burden of envenoming.
      Citation: Data
      PubDate: 2024-07-24
      DOI: 10.3390/data9080091
      Issue No: Vol. 9, No. 8 (2024)
       
  • Data, Vol. 9, Pages 92: BELMASK—An Audiovisual Dataset of Adversely
           Produced Speech for Auditory Cognition Research

    • Authors: Cleopatra Christina Moshona, Frederic Rudawski, André Fiebig, Ennes Sarradj
      First page: 92
      Abstract: In this article, we introduce the Berlin Dataset of Lombard and Masked Speech (BELMASK), a phonetically controlled audiovisual dataset of speech produced in adverse speaking conditions, and describe the development of the related speech task. The dataset contains a total of 128 min of audio and video recordings of 10 German native speakers (4 female, 6 male) with a mean age of 30.2 years (SD: 6.3 years), uttering matrix sentences in cued, uninstructed speech in four conditions: (i) with a Filtering Facepiece P2 (FFP2) mask in silence, (ii) without an FFP2 mask in silence, (iii) with an FFP2 mask while exposed to noise, (iv) without an FFP2 mask while exposed to noise. Noise consisted of mixed-gender six-talker babble played over headphones to the speakers, triggering the Lombard effect. All conditions are readily available in face-and-voice and voice-only formats. The speech material is annotated, employing a multi-layer architecture, and was originally conceptualized to be used for the administration of a working memory task. The dataset is stored in a restricted-access Zenodo repository and is available for academic research in the area of speech communication, acoustics, psychology and related disciplines upon request, after signing an End User License Agreement (EULA).
      Citation: Data
      PubDate: 2024-07-24
      DOI: 10.3390/data9080092
      Issue No: Vol. 9, No. 8 (2024)
       
  • Data, Vol. 9, Pages 93: Optimizing Database Performance in Complex Event
           Processing through Indexing Strategies

    • Authors: Maryam Abbasi, Marco V. Bernardo, Paulo Váz, José Silva, Pedro Martins
      First page: 93
      Abstract: Complex event processing (CEP) systems have gained significant importance in various domains, such as finance, logistics, and security, where the real-time analysis of event streams is crucial. However, as the volume and complexity of event data continue to grow, optimizing the performance of CEP systems becomes a critical challenge. This paper investigates the impact of indexing strategies on the performance of databases handling complex event processing. We propose a novel indexing technique, called Hierarchical Temporal Indexing (HTI), specifically designed for the efficient processing of complex event queries. HTI leverages the temporal nature of event data and employs a multi-level indexing approach to optimize query execution. By combining temporal indexing with spatial- and attribute-based indexing, HTI aims to accelerate the retrieval and processing of relevant events, thereby improving overall query performance. In this study, we evaluate the effectiveness of HTI by implementing complex event queries on various CEP systems with different indexing strategies. We conduct a comprehensive performance analysis, measuring the query execution times and resource utilization (CPU, memory, etc.), and analyzing the execution plans and query optimization techniques employed by each system. Our experimental results demonstrate that the proposed HTI indexing strategy outperforms traditional indexing approaches, particularly for complex event queries involving temporal constraints and multi-dimensional event attributes. We provide insights into the strengths and weaknesses of each indexing strategy, identifying the factors that influence performance, such as data volume, query complexity, and event characteristics. Furthermore, we discuss the implications of our findings for the design and optimization of CEP systems, offering recommendations for indexing strategy selection based on the specific requirements and workload characteristics. Finally, we outline the potential limitations of our study and suggest future research directions in this domain.
      Citation: Data
      PubDate: 2024-07-24
      DOI: 10.3390/data9080093
      Issue No: Vol. 9, No. 8 (2024)
       
  • Data, Vol. 9, Pages 94: SparrKULee: A Speech-Evoked Auditory Response
           Repository from KU Leuven, Containing the EEG of 85 Participants

    • Authors: Bernd Accou, Lies Bollens, Marlies Gillis, Wendy Verheijen, Hugo Van hamme, Tom Francart
      First page: 94
      Abstract: Researchers investigating the neural mechanisms underlying speech perception often employ electroencephalography (EEG) to record brain activity while participants listen to spoken language. The high temporal resolution of EEG enables the study of neural responses to fast and dynamic speech signals. Previous studies have successfully extracted speech characteristics from EEG data and, conversely, predicted EEG activity from speech features. Machine learning techniques are generally employed to construct encoding and decoding models, which necessitate a substantial quantity of data. We present SparrKULee, a Speech-evoked Auditory Repository of EEG data, measured at KU Leuven, comprising 64-channel EEG recordings from 85 young individuals with normal hearing, each of whom listened to 90–150 min of natural speech. This dataset is more extensive than any currently available dataset in terms of both the number of participants and the quantity of data per participant. It is suitable for training larger machine learning models. We evaluate the dataset using linear and state-of-the-art non-linear models in a speech encoding/decoding and match/mismatch paradigm, providing benchmark scores for future research.
      Citation: Data
      PubDate: 2024-07-26
      DOI: 10.3390/data9080094
      Issue No: Vol. 9, No. 8 (2024)
       
  • Data, Vol. 9, Pages 95: Bootstrap Method as a Tool for Analyzing Data with
           Atypical Distributions Deviating from Parametric Assumptions: Critique and
           Effectiveness Evaluation

    • Authors: Joanna Kostanek, Kamil Karolczak, Wiktor Kuliczkowski, Cezary Watala
      First page: 95
      Abstract: In today’s research environment characterized by exponential data growth and increasing complexity, the selection of appropriate statistical tests, tailored to research objectives and data distributions, is paramount for rigorous analysis and accurate interpretation. This article explores the growing prominence of bootstrapping, an advanced statistical technique for multiple comparisons analysis, offering flexibility and customization by estimating sample distributions without assuming population distributions, thus serving as a valuable alternative to traditional methods in various data scenarios. Computer simulations were conducted using data from cardiovascular disease patients. Two approaches, spontaneous partly controlled simulation and fully constrained simulation using self-written R scripts, were utilized to generate datasets with specified distributions and analyze the data using tests for comparing more than two groups. The utilization of the bootstrap method greatly improves statistical analysis, especially in overcoming the constraints of conventional parametric tests. Our research showcased its effectiveness in comparing multiple scenarios, yielding strong findings across diverse distributions, even with minor inflation in p values. Serving as a valuable substitute for parametric approaches, bootstrap promotes careful consideration when rejecting hypotheses, thus fostering a deeper understanding of statistical nuances and bolstering analytical rigor.
      Citation: Data
      PubDate: 2024-07-26
      DOI: 10.3390/data9080095
      Issue No: Vol. 9, No. 8 (2024)
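
To make the bootstrap idea in the abstract above concrete, here is a minimal sketch of a bootstrap test for comparing more than two groups. It is not the authors' R code. The test statistic (the range of the group means), the pooled resampling scheme under the null hypothesis, and the sample values are all assumptions chosen for brevity.

```python
import random
from statistics import mean


def spread_of_means(groups):
    """Test statistic: difference between the largest and smallest group mean."""
    means = [mean(g) for g in groups]
    return max(means) - min(means)


def bootstrap_p_value(groups, n_resamples=2000, seed=0):
    """Bootstrap p-value for the null hypothesis that all groups share one distribution.

    Under the null, group labels carry no information, so each bootstrap
    replicate redraws every group (with replacement) from the pooled data.
    """
    rng = random.Random(seed)
    observed = spread_of_means(groups)
    pooled = [value for g in groups for value in g]
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        replicate = [rng.choices(pooled, k=len(g)) for g in groups]
        if spread_of_means(replicate) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Invented, skewed measurements for three hypothetical patient groups.
group_a = [1.1, 1.3, 0.9, 4.8, 1.0, 1.2]
group_b = [2.9, 3.4, 3.1, 9.7, 3.0, 3.3]
group_c = [1.0, 1.4, 1.2, 1.1, 5.2, 0.8]
print("bootstrap p-value:", bootstrap_p_value([group_a, group_b, group_c]))
```

No distributional form is assumed for the populations. The only inputs are the observed samples, which is what makes the approach attractive for atypical distributions.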
       
  • Data, Vol. 9, Pages 96: Data on the Land Cover Transition, Subsequent
           Landscape Degradation, and Improvement in Semi-Arid Rainfed Agricultural
           Land in North–West Tunisia

    • Authors: Zahra Shiri, Aymen Frija, Hichem Rejeb, Hassen Ouerghemmi, Quang Bao Le
      First page: 96
      Abstract: Understanding past landscape changes is crucial to promote agroecological landscape transitions. This study analyzes past land cover changes (LCCs) alongside subsequent degradation and improvements in the study area. The input land cover (LC) data were taken from ESRI’s ArcGIS Living Atlas of the World and then assessed for accuracy using ground truth data points randomly selected from high-resolution images on the Google Earth Engine. The LCC analyses were performed on QGIS 3.28.15 using the Semi-Automatic Classification Plugin (SCP) to generate LCC data. The degradation or improvement derived from the analyzed data was subsequently assessed using the UNCCD Good Practice Guidance to generate land cover degradation data. Using the Landscape Ecology Statistics (LecoS) plugin in QGIS, the input LC data were processed to provide landscape metrics. The data presented in this article show that the studied landscape is not static, even over a short-term time horizon (2017–2022). The transition from one LC class to another had an impact on the ecosystem and induced different states of degradation. For the three main LC classes (forest, crops, and rangeland) representing 98.9% of the total area in 2022, the landscape metrics, especially the number of patches, reflected a 105% increase in landscape fragmentation between 2017 and 2022.
      Citation: Data
      PubDate: 2024-07-29
      DOI: 10.3390/data9080096
      Issue No: Vol. 9, No. 8 (2024)
       
  • Data, Vol. 9, Pages 97: Genomic Insights into Bacillus thuringiensis
           V-CO3.3: Unveiling Its Genetic Potential against Nematodes

    • Authors: Leopoldo Palma, Yolanda Bel, Baltasar Escriche
      First page: 97
      Abstract: Bacillus thuringiensis (Bt) is a Gram-positive, spore-forming, and ubiquitous bacterium harboring plasmids encoding a variety of proteins with insecticidal activity, but also with activity against nematodes. The aim of this work was to perform the genome sequencing and analysis of a native Bt strain showing bipyramidal parasporal crystals and designated V-CO3.3, which was isolated from the dust of a grain storehouse in Córdoba (Spain). Its genome comprised 99 high-quality assembled contigs accounting for a total size of 5.2 Mb and 35.1% G + C. Phylogenetic analyses suggested that this strain should be renamed as Bacillus cereus s.s. biovar Thuringiensis. Gene annotation revealed a total of 5495 genes, one of which was identified as encoding a Cry5Ba homolog protein with well-documented toxicity against nematodes. These results suggest that this Bt strain has interesting potential for nematode biocontrol.
      Citation: Data
      PubDate: 2024-07-29
      DOI: 10.3390/data9080097
      Issue No: Vol. 9, No. 8 (2024)
       
  • Data, Vol. 9, Pages 98: Arabic Lexical Substitution: AraLexSubD Dataset
           and AraLexSub Pipeline

    • Authors: Eman Naser-Karajah, Nabil Arman
      First page: 98
      Abstract: Lexical substitution aims to generate a list of equivalent substitutions (i.e., synonyms) to a sentence’s target word or phrase while preserving the sentence’s meaning to improve writing, enhance language understanding, improve natural language processing models, and handle ambiguity. This task has recently attracted much attention in many languages. Despite the richness of Arabic vocabulary, limited research has been performed on the lexical substitution task due to the lack of annotated data. To bridge this gap, we present the first Arabic lexical substitution benchmark dataset AraLexSubD for benchmarking lexical substitution pipelines. AraLexSubD is manually built by eight native Arabic speakers and linguists (six linguist annotators, a doctor, and an economist) who annotate the 630 sentences. AraLexSubD covers three domains: general, finance, and medical. It encompasses 2476 substitution candidates ranked according to their semantic relatedness. We also present the first Arabic lexical substitution pipeline, AraLexSub, which uses the AraBERT pre-trained language model. The pipeline consists of several modules: substitute generation, substitute filtering, and candidate ranking. The filtering step shows its effectiveness by achieving an increase of 1.6 in the F1 score on the entire AraLexSubD dataset. Additionally, an error analysis of the experiment is reported. To our knowledge, this is the first study on Arabic lexical substitution.
      Citation: Data
      PubDate: 2024-07-30
      DOI: 10.3390/data9080098
      Issue No: Vol. 9, No. 8 (2024)
       
  • Data, Vol. 9, Pages 99: A Performance Analysis of Hybrid and Columnar
           Cloud Databases for Efficient Schema Design in Distributed Data Warehouse
           as a Service

    • Authors: Fred Eduardo Revoredo Rabelo Ferreira, Robson do Nascimento Fidalgo
      First page: 99
      Abstract: A Data Warehouse (DW) is a centralized database that stores large volumes of historical data for analysis and reporting. In a world where enterprise data grows exponentially, new architectures are being investigated to overcome the deficiencies of traditional Database Management Systems (DBMSs), driving a shift towards more modern, cloud-based solutions that provide resources such as distributed processing, columnar storage, and horizontal scalability without the overhead of physical hardware management, i.e., a Database as a Service (DBaaS). Choosing the appropriate class of DBMS is a critical decision for organizations, and there are important differences that impact data volume and query performance (e.g., architecture, data models, and storage) to support analytics in a distributed cloud environment efficiently. In this sense, we carry out an experimental evaluation to analyze the performance of several DBaaS and the impact of data modeling, specifically the usage of a partially normalized Star Schema and a fully denormalized Flat Table Schema, to further comprehend their behavior in different configurations and designs in terms of data schema, storage form, memory availability, and cluster size. The analysis is done on two volumes of data generated by a well-established benchmark, comparing the performance of the DW in terms of average execution time, memory usage, data volume, and loading time. Our results provide guidelines for efficient DW design, showing, for example, that the denormalization of the schema does not guarantee improved performance, as solutions performed differently depending on their architecture. We also show that a Hybrid Processing (HTAP) NewSQL solution can outperform solutions that support only Online Analytical Processing (OLAP) in terms of overall execution time, but that the performance of each query is deeply influenced by its selectivity and by the number of join functions.
      Citation: Data
      PubDate: 2024-08-05
      DOI: 10.3390/data9080099
      Issue No: Vol. 9, No. 8 (2024)
       
  • Data, Vol. 9, Pages 100: Dataset of Registered Hematoxylin–Eosin and
           Ki67 Histopathological Image Pairs Complemented by a Registration
           Algorithm

    • Authors: Dominika Petríková, Ivan Cimrák, Katarína Tobiášová, Lukáš Plank
      First page: 100
      Abstract: In this work, we describe a dataset suitable for analyzing the extent to which hematoxylin–eosin (HE)-stained tissue contains information about the expression of Ki67 in immunohistochemistry staining. The dataset provides images of corresponding pairs of HE and Ki67 stainings and is complemented by algorithms for computing the Ki67 index. We introduce a dataset of high-resolution histological images of testicular seminoma tissue. The dataset comprises digitized histology slides from 77 conventional testicular seminoma patients, obtained via surgical resection. For each patient, two physically adjacent tissue sections are stained: one with hematoxylin and eosin, and one with Ki67 immunohistochemistry staining. This results in a total of 154 high-resolution images. The images are provided in PNG format, facilitating ease of use for image analysis compared to the original scanner output formats. Each image contains enough tissue to generate thousands of non-overlapping 224 × 224 pixel patches. This shows the potential to generate more than 50,000 pairs of patches, one with HE staining and a corresponding Ki67 patch that depicts a very similar part of the tissue. Finally, we present the results of applying a ResNet neural network for the classification of HE patches into categories according to their Ki67 label.
      Citation: Data
      PubDate: 2024-08-07
      DOI: 10.3390/data9080100
      Issue No: Vol. 9, No. 8 (2024)
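
The abstract above mentions cutting each image into non-overlapping 224 × 224 pixel patches. As a rough illustration of that tiling arithmetic, here is a minimal sketch; the function names and the example image size are hypothetical and are not taken from the dataset or its accompanying algorithms.

```python
def count_patches(height: int, width: int, patch: int = 224) -> int:
    """Number of full, non-overlapping patch x patch tiles that fit in an image."""
    return (height // patch) * (width // patch)


def patch_origins(height: int, width: int, patch: int = 224):
    """Yield the (row, col) top-left corner of every full tile, row by row."""
    for row in range(0, height - patch + 1, patch):
        for col in range(0, width - patch + 1, patch):
            yield row, col


# Hypothetical image size, chosen only for illustration.
example_height, example_width = 448, 672
origins = list(patch_origins(example_height, example_width))
```

Partial tiles at the right and bottom edges are simply dropped in this sketch; a real pipeline might also discard tiles that contain mostly background rather than tissue.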
       
  • Data, Vol. 9, Pages 101: Viral Targets in the Human Interactome with
           Comprehensive Centrality Analysis: SARS-CoV-2, a Case Study

    • Authors: Nilesh Kumar, M. Shahid Mukhtar
      First page: 101
      Abstract: Network centrality analyses have proven to be successful in identifying important nodes in diverse host–pathogen interactomes. The current study presents a comprehensive investigation of the human interactome and SARS-CoV-2 host targets. We first constructed a comprehensive human interactome by compiling experimentally validated protein–protein interactions (PPIs) from eight distinct sources. Additionally, we compiled a comprehensive list of 1449 SARS-CoV-2 host proteins and analyzed their interactions within the human interactome, which identified enriched biological processes and pathways. Seven diverse topological features were employed to reveal the enrichment of the SARS-CoV-2 targets in the human interactome, with closeness centrality emerging as the most effective metric. Furthermore, a novel approach called CentralityCosDist was employed to predict SARS-CoV-2 targets, which proved to be effective in expanding the pool of predicted targets. Pathway enrichment analyses further elucidated the functional roles and potential mechanisms associated with predicted targets. Overall, this study provides valuable insights into the complex interplay between SARS-CoV-2 and the host’s cellular machinery, contributing to a deeper understanding of viral infection and immune response modulation.
      Citation: Data
      PubDate: 2024-08-20
      DOI: 10.3390/data9080101
      Issue No: Vol. 9, No. 8 (2024)
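
The abstract above singles out closeness centrality as the most effective of the topological features tested. For readers unfamiliar with the metric, the following is a minimal sketch on a toy undirected graph. The graph, node names, and normalization (reachable nodes divided by the sum of shortest-path lengths) are assumptions for illustration, and the paper's exact definition may differ.

```python
from collections import deque


def closeness_centrality(graph: dict, node: str) -> float:
    """Closeness of `node`: reachable nodes / sum of shortest-path lengths."""
    # Breadth-first search gives shortest-path lengths in an unweighted graph.
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


# Toy star-shaped network (hypothetical): B is the hub.
toy = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
scores = {n: closeness_centrality(toy, n) for n in toy}
```

In this toy network the hub B scores highest because it is one step from every other node, which is the intuition behind using closeness to rank proteins in an interactome.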
       
  • Data, Vol. 9, Pages 89: Literature-Based Inventory of Chemical Substance
           Concentrations Measured in Organic Food Consumed in Europe

    • Authors: Joanna Choueiri, Pascal Petit, Franck Balducci, Dominique J. Bicout, Christine Demeilliers
      First page: 89
      Abstract: Populations are exposed daily to numerous environmental pollutants, particularly through food. To address environmental issues, many agricultural production methods have been developed, including organic farming. To date, there is no exhaustive inventory of the contamination of organic foods as there is for conventional foods. The main objective of this work was to construct a growing and updatable database on chemical substances and their levels in organic foods consumed in Europe. To this end, a literature search was conducted, resulting in a total of 1207 concentration values from 823 food–substance pairs involving 166 food matrices and 209 chemical substances, among which 95% were not authorized in organic farming and 80% were pesticides. The most encountered substance groups are “inorganic contaminants” and “organophosphate”, and the most studied food groups are “fruit used as fruit” and “Cereals and cereal primary derivatives”. Further studies are needed to continue updating the database with robust and comprehensive data on organic food contamination. This database could be used to study the health risks associated with these contaminants.
      Citation: Data
      PubDate: 2024-07-03
      DOI: 10.3390/data9070089
      Issue No: Vol. 9, No. 7 (2024)
       
  • Data, Vol. 9, Pages 138: CARE to Compare: A Real-World Benchmark Dataset
           for Early Fault Detection in Wind Turbine Data

    • Authors: Christian Gück, Cyriana M. A. Roelofs, Stefan Faulstich
      First page: 138
      Abstract: Early fault detection plays a crucial role in the field of predictive maintenance for wind turbines, yet the comparison of different algorithms poses a difficult task because domain-specific public datasets are scarce. Many comparisons of different approaches either use benchmarks composed of data from many different domains, inaccessible data, or one of the few publicly available datasets that lack detailed information about the faults. Moreover, many publications highlight a couple of case studies where fault detection was successful. With this paper, we publish a high-quality dataset that contains data from 36 wind turbines across 3 different wind farms as well as the most detailed fault information of any public wind turbine dataset as far as we know. The new dataset contains 89 years' worth of real-world operating data of wind turbines, distributed across 44 labeled time frames for anomalies that led up to faults, as well as 51 time series representing normal behavior. Additionally, the quality of training data is ensured by turbine-status-based labels for each data point. Furthermore, we propose a new scoring method, called CARE (Coverage, Accuracy, Reliability and Earliness), which takes advantage of the information depth that is present in the dataset to identify good early fault detection models for wind turbines. This score considers the anomaly detection performance, the ability to recognize normal behavior properly, and the capability to raise as few false alarms as possible while simultaneously detecting anomalies early.
      Citation: Data
      PubDate: 2024-11-23
      DOI: 10.3390/data9120138
      Issue No: Vol. 9, No. 12 (2024)
       
  • Data, Vol. 9, Pages 139: Detective Gadget: Generic Iterative Entity
           Resolution over Dirty Data

    • Authors: Marcello Buoncristiano, Giansalvatore Mecca, Donatello Santoro, Enzo Veltri
      First page: 139
      Abstract: In the era of Big Data, entity resolution (ER), i.e., the process of identifying which records refer to the same entity in the real world, plays a critical role in data-integration tasks, especially in mission-critical applications where accuracy is mandatory, since we want to avoid integrating different entities or missing matches. However, existing approaches struggle with the challenges posed by rapidly changing data and the presence of dirtiness, which requires iterative refinement over time. We present Detective Gadget, a novel system for iterative ER that seamlessly integrates data-cleaning into the ER workflow. Detective Gadget employs an alias-based hashing mechanism for fast and scalable matching, check functions to detect and correct mismatches, and a human-in-the-loop framework to refine results through expert feedback. The system iteratively improves data quality and matching accuracy by leveraging evidence from both automated and manual decisions. Extensive experiments across diverse real-world scenarios demonstrate its effectiveness, achieving high accuracy and efficiency while adapting to evolving datasets.
      Citation: Data
      PubDate: 2024-11-25
      DOI: 10.3390/data9120139
      Issue No: Vol. 9, No. 12 (2024)
       
  • Data, Vol. 9, Pages 140: Algorithm for Trajectory Simplification Based on
           Multi-Point Construction in Preselected Area and Noise Smoothing
           Processing

    • Authors: Simin Huang, Zhiying Yang
      First page: 140
      Abstract: Simplifying trajectory data can improve the efficiency of trajectory data analysis and query and reduce the communication cost and computational overhead of trajectory data. In this paper, a real-time trajectory simplification algorithm (SSFI) based on the spatio-temporal feature information of implicit trajectory points is proposed. The algorithm constructs the preselected area through the error measurement method based on the feature information of implicit trajectory points (IEDs) proposed in this paper, predicts the falling point of trajectory points, and realizes a one-way error-bounded trajectory simplification algorithm. Experiments show that the simplification algorithm improves markedly in three respects: running speed, compression accuracy, and simplification rate. When the trajectory data scale is large, the performance of the algorithm is much better than that of other line segment simplification algorithms. GPS error cannot be avoided, but smoothing the trajectory with a Kalman filter can effectively eliminate the influence of noise and significantly improve the performance of the simplification algorithm. According to the characteristics of the trajectory data, this paper constructs a mathematical model to describe the motion state of objects, so that the Kalman filter performs better than other filters when smoothing trajectory data. The trajectory data smoothing experiment is carried out by adding random Gaussian noise to the trajectory data. The experiment shows that the Kalman filter’s performance under the mathematical model is better than that of other filters.
      Citation: Data
      PubDate: 2024-11-29
      DOI: 10.3390/data9120140
      Issue No: Vol. 9, No. 12 (2024)
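
The abstract above relies on a Kalman filter to suppress GPS noise before simplification. The sketch below shows the predict and update cycle on a single coordinate. It assumes a scalar random-walk state model, which is simpler than the motion model the paper constructs; the function name and the variance values are illustrative assumptions only.

```python
def kalman_filter_1d(measurements, process_var=1e-3, measurement_var=1.0):
    """Scalar Kalman filter: returns one filtered estimate per measurement."""
    estimate = measurements[0]  # initialise the state from the first reading
    error_var = 1.0             # initial uncertainty of that state (assumed)
    filtered = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed unchanged, so only uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement according to the gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        filtered.append(estimate)
    return filtered


# Deterministic toy input: a constant position of 5.0 with alternating +/-1 error.
noisy = [5.0 + (1.0 if i % 2 == 0 else -1.0) for i in range(50)]
smoothed = kalman_filter_1d(noisy)
```

A smaller `process_var` makes the filter trust its own prediction more and smooth harder; a full trajectory model would track position and velocity in two dimensions rather than a single scalar.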
       
  • Data, Vol. 9, Pages 141: A Dataset of Plant Species Richness in Chinese
           National Nature Reserves

    • Authors: Chunjing Wang, Wuxian Yan, Jizhong Wan
      First page: 141
      Abstract: This comprehensive dataset on the number of plant species, genera, and families in 383 national nature reserves in China has been compiled based on the available literature. Heilongjiang Province and the Guangxi Zhuang Autonomous Region have the highest number of nature reserves. Species richness is relatively high in the Jinfoshan, Dabashan, Wenshan, Hupingshan, and Shennongjia Nature Reserves. This dataset provides important baseline information on plant species richness, coupled with genus and family numbers, in Chinese national nature reserves and should help researchers and environmentalists understand the dynamic species changes in various nature reserves. This detailed and reliable information may serve as the foundation for future plant research in Chinese nature reserves and play a positive role in promoting more effective natural protection, biological distribution, and biodiversity conservation in these areas.
      Citation: Data
      PubDate: 2024-11-30
      DOI: 10.3390/data9120141
      Issue No: Vol. 9, No. 12 (2024)
       
  • Data, Vol. 9, Pages 142: Nearest-Better Network-Assisted Fitness Landscape
           Analysis of Contaminant Source Identification in Water Distribution
           Network

    • Authors: Yiya Diao, Changhe Li, Sanyou Zeng, Shengxiang Yang
      First page: 142
      Abstract: Contaminant Source Identification in Water Distribution Network (CSWIDN) is critical for ensuring public health, and optimization algorithms are commonly used to solve this complex problem. However, these algorithms are highly sensitive to the problem’s landscape features, which has limited their effectiveness in practice. Despite this, there has been little experimental analysis of the fitness landscape for CSWIDN, particularly given its mixed-encoding nature. This study addresses this gap by conducting a comprehensive fitness landscape analysis of CSWIDN using the Nearest-Better Network (NBN), the only applicable method for mixed-encoding problems. Our analysis reveals for the first time that CSWIDN exhibits landscape features including neutrality, ruggedness, modality, dynamic change, and separability. These findings not only deepen our understanding of the problem’s inherent landscape features but also provide quantitative insights into how these features influence algorithm performance. Additionally, based on these insights, we propose specific algorithm design recommendations that are better suited to the unique challenges of the CSWIDN problem. This work advances the knowledge of CSWIDN optimization by both qualitatively characterizing its landscape and quantitatively linking these features to algorithms’ behaviors.
      Citation: Data
      PubDate: 2024-12-06
      DOI: 10.3390/data9120142
      Issue No: Vol. 9, No. 12 (2024)
       
  • Data, Vol. 9, Pages 143: A Data Storage, Analysis, and Project
           Administration Engine (TMFdw) for Small- to Medium-Size Interdisciplinary
           Ecological Research Programs with Full Raster Data Capabilities

    • Authors: Paulina Grigusova, Christian Beilschmidt, Maik Dobbermann, Johannes Drönner, Michael Mattig, Pablo Sanchez, Nina Farwig, Jörg Bendix
      First page: 143
      Abstract: Over almost 20 years, a data storage, analysis, and project administration engine (TMFdw) has been continuously developed in a series of several consecutive interdisciplinary research projects on functional biodiversity of the southern Andes of Ecuador. Starting as a “working database”, the system now includes program management modules and literature databases, which are all accessible via a web interface. Originally designed to manage data in the ecological Research Unit 816 (SE Ecuador), the open software is now being used in several other environmental research programs, demonstrating its broad applicability. While the system was mainly developed for abiotic and biotic tabular data in the beginning, the new research program demands full capabilities to work with area-wide and high-resolution big models and remote sensing raster data. Thus, a raster engine was recently implemented based on the Geo Engine technology. The great variety of pre-implemented desktop GIS-like analysis options for raster point and vector data is an important incentive for researchers to use the system. A second incentive is to implement use cases prioritized by the researchers. As an example, we present machine learning models to generate high-resolution (30 m) microclimate raster layers for the study area in different temporal aggregation levels for the most important variables of air temperature, humidity, precipitation, and solar radiation. The models implemented as use cases outperform similar models developed in other research programs.
      Citation: Data
      PubDate: 2024-12-06
      DOI: 10.3390/data9120143
      Issue No: Vol. 9, No. 12 (2024)
       
  • Data, Vol. 9, Pages 144: Multi-Modal Dataset of Human Activities of Daily
           Living with Ambient Audio, Vibration, and Environmental Data

    • Authors: Thomas Pfitzinger, Marcel Koch, Fabian Schlenke, Hendrik Wöhrle
      First page: 144
      Abstract: The detection of human activities is an important step in automated systems to understand the context of given situations. It can be useful for applications like healthcare monitoring, smart homes, and energy management systems for buildings. To achieve this, a sufficient data basis is required. The presented dataset contains labeled recordings of 25 different activities of daily living performed individually by 14 participants. The data were captured by five multisensors in supervised sessions in which a participant repeated each activity several times. Flawed recordings were removed, and the different data types were synchronized to provide multi-modal data for each activity instance. Apart from this, the data are presented in raw form, and no further filtering was performed. The dataset comprises ambient audio and vibration, as well as infrared array data, light color and environmental measurements. Overall, 8615 activity instances are included, each captured by the five multisensor devices. These multi-modal and multi-channel data allow various machine learning approaches to the recognition of human activities, for example, federated learning and sensor fusion.
      Citation: Data
      PubDate: 2024-12-09
      DOI: 10.3390/data9120144
      Issue No: Vol. 9, No. 12 (2024)
       
  • Data, Vol. 9, Pages 145: Formalization for Subsequent Computer Processing
           of Kara Sea Coastline Data

    • Authors: Daria Bogatova, Stanislav Ogorodov
      First page: 145
      Abstract: This study aimed to develop a methodological framework for predicting shoreline dynamics using machine learning techniques, focusing on analyzing generalized data without distinguishing areas with higher or lower retreat rates. Three sites along the southwestern Kara Sea coast were selected for this investigation. The study analyzed key coastal features, including lithology, permafrost, and geomorphology, using a combination of field studies and remote sensing data. Essential datasets were compiled and formatted for computer-based analysis. These datasets included information on permafrost and the geomorphological characteristics of the coastal zone, climatic factors influencing the shoreline, and measurements of bluff top positions and retreat rates over defined time periods. The positions of the bluff tops were determined through a combination of imagery with varying resolutions and field measurements. A novel aspect of the study involved employing geostatistical methods to analyze erosion rates, providing new insights into the shoreline dynamics. The data analysis allowed us to identify coastal areas experiencing the most significant changes. By continually refining neural network models with these datasets, we can improve our understanding of the complex interactions between natural factors and shoreline evolution, ultimately aiding in developing effective coastal management strategies.
      Citation: Data
      PubDate: 2024-12-09
      DOI: 10.3390/data9120145
      Issue No: Vol. 9, No. 12 (2024)
       
  • Data, Vol. 9, Pages 146: Data Decomposition Modeling Based on Improved
           Dung Beetle Optimization Algorithm for Wind Power Prediction

    • Authors: Jiajian Ke, Tian Chen
      First page: 146
      Abstract: Accurate wind power forecasting is essential for maintaining the stability of a power system and enhancing scheduling efficiency in the power sector. To enhance prediction accuracy, this paper presents a hybrid wind power prediction model that integrates the improved complementary ensemble empirical mode decomposition (ICEEMDAN), the RIME optimization algorithm (RIME), sample entropy (SE), the improved dung beetle optimization (IDBO) algorithm, the bidirectional long short-term memory (BiLSTM) network, and multi-head attention (MHA). In this model, RIME is utilized to improve the parameters of ICEEMDAN, reducing data decomposition complexity and effectively capturing the original data information. The IDBO algorithm is then utilized to improve the hyperparameters of the MHA-BiLSTM model. The proposed RIME-ICEEMDAN-IDBO-MHA-BiLSTM model is contrasted with ten others in ablation experiments to validate its performance. The experimental findings prove that the proposed model achieves MAPE values of 5.2%, 6.3%, 8.3%, and 5.8% across four datasets, confirming its superior predictive performance and higher accuracy.
      Citation: Data
      PubDate: 2024-12-09
      DOI: 10.3390/data9120146
      Issue No: Vol. 9, No. 12 (2024)
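
The abstract above reports accuracy as MAPE (mean absolute percentage error). As a reminder of what that figure measures, here is a minimal sketch of the standard definition; the function name and sample values are made up for illustration and are not the paper's data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # MAPE divides by the actual value, so it is undefined at zero output.
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical power readings and forecasts, each forecast 10% off.
example_error = mape([100.0, 200.0], [90.0, 220.0])
```

Because the error is divided by the actual value, periods of very low wind power inflate MAPE, which is worth keeping in mind when comparing figures across datasets.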
       
  • Data, Vol. 9, Pages 147: Parallel Simplex, an Alternative to Classical
           Experimentation: A Case Study

    • Authors: Francisco Zorrilla Briones, Inocente Yuliana Meléndez Pastrana, Manuel Alonso Rodríguez Morachis, José Luís Anaya Carrasco
      First page: 147
      Abstract: Experimentation is a strong methodology that improves and optimizes processes. Nevertheless, in many cases, real-life dynamics of production demands and other restrictions inhibit the use of these methodologies because their use implies stopping production, generating scrap, jeopardizing demand accomplishments, and other problems. Proposed here is an alternative methodology to search for the best process variable levels and optimize the response of the process without the need to stop production. This algorithm is based on the principles of the Variable Simplex developed by Nelder and Mead and the continuous iterative process of EVOPS developed by Box, which is then modified as a simplex by Spendley. It is named parallel simplex because it searches for the best response with three independent Simplexes searching for the same response at the same time. The algorithm was designed for three simplexes of two input variables each. The case study documented shows that it is efficient and effective.
      Citation: Data
      PubDate: 2024-12-10
      DOI: 10.3390/data9120147
      Issue No: Vol. 9, No. 12 (2024)
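
The abstract above builds on simplex search with two input variables. The sketch below shows only the basic move common to simplex methods of this kind: reflecting the worst vertex of a triangle through the midpoint of the other two. It is not the paper's parallel algorithm, and the function name, vertex coordinates, and response values are hypothetical.

```python
def reflect_worst(simplex, responses, maximize=True):
    """Return (index of worst vertex, its reflection) for a 3-vertex simplex.

    simplex:   list of three (x1, x2) process settings
    responses: measured response at each vertex
    """
    pick = min if maximize else max
    worst = pick(range(3), key=lambda i: responses[i])
    others = [v for i, v in enumerate(simplex) if i != worst]
    # Centroid of the two retained vertices (the midpoint of their edge).
    cx = (others[0][0] + others[1][0]) / 2
    cy = (others[0][1] + others[1][1]) / 2
    wx, wy = simplex[worst]
    # Mirror the worst vertex through that centroid.
    return worst, (2 * cx - wx, 2 * cy - wy)


# Hypothetical settings and responses; vertex 0 has the lowest response.
worst_index, new_vertex = reflect_worst([(0, 0), (1, 0), (0, 1)], [1.0, 5.0, 4.0])
```

Each reflection proposes one new process setting to run, so the search can advance in small steps during normal production rather than requiring a full designed experiment.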
       
  • Data, Vol. 9, Pages 148: Teal-WCA: A Climate Services Platform for
           Planning Solar Photovoltaic and Wind Energy Resources in West and Central
           Africa in the Context of Climate Change

    • Authors: Salomon Obahoundje, Arona Diedhiou, Alberto Troccoli, Penny Boorman, Taofic Abdel Fabrice Alabi, Sandrine Anquetin, Louise Crochemore, Wanignon Ferdinand Fassinou, Benoit Hingray, Daouda Koné, Chérif Mamadou, Fatogoma Sorho
      First page: 148
      Abstract: To address the growing electricity demand driven by population growth and economic development while mitigating climate change, West and Central African countries are increasingly prioritizing renewable energy as part of their Nationally Determined Contributions (NDCs). This study evaluates the implications of climate change on renewable energy potential using ten downscaled and bias-adjusted CMIP6 models (CDFt method). Key climate variables (temperature, solar radiation, and wind speed) were analyzed and integrated into the Teal-WCA platform to aid in energy resource planning. Projected temperature increases of 0.5–2.7 °C (2040–2069) and 0.7–5.2 °C (2070–2099) relative to 1985–2014 underscore the need for strategies to manage the rising demand for cooling. Solar radiation reductions (~15 W/m²) may lower photovoltaic (PV) efficiency by 1–8.75%, particularly in high-emission scenarios, requiring a focus on system optimization and diversification. Conversely, wind speeds are expected to increase, especially in coastal regions, enhancing wind power potential by 12–50% across most countries and by 25–100% in coastal nations. These findings highlight the necessity of integrating climate-resilient energy policies that leverage wind energy growth while mitigating challenges posed by reduced solar radiation. By providing a nuanced understanding of the renewable energy potential under changing climatic conditions, this study offers actionable insights for sustainable energy planning in West and Central Africa.
      Citation: Data
      PubDate: 2024-12-10
      DOI: 10.3390/data9120148
      Issue No: Vol. 9, No. 12 (2024)
       
  • Data, Vol. 9, Pages 149: Unlocking New Opportunities for Spatial Analysis
           of Farms’ Income and Business Activities in Italy: The Agricultural
           Regions in Shapefile Format

    • Authors: Sara Quaresima, Pasquale Nino, Concetta Cardillo, Arianna Di Paola
      First page: 149
      Abstract: Italy is divided into 773 Agricultural Regions (ARs) based on shared physical and agronomic characteristics. These regions offer a valuable tool for analyzing various geographical, socio-economic, and environmental aspects of agriculture, including the climate. However, the ARs have lacked geospatial data, limiting their analytical potential. This study introduces the “Italian ARs Dataset”, a georeferenced shapefile defining the boundaries of each AR. This dataset facilitates geographical assessments of Italy’s complex agricultural sector. It also unlocks the potential for integrating AR data with other datasets like the Farm Accounting Data Network (FADN) dataset, in Italy represented by the Rete di Informazione Contabile Agricola (RICA), which samples hundreds of thousands of farms annually. To demonstrate the dataset’s utility, a large sample of RICA data encompassing 179 irrigated crops from 2011 to 2021, covering all of Italy, was retrieved. Validation confirmed successful assignment of all ARs present in the RICA sample to the corresponding shapefile. Additionally, to encourage the use of the ARs Dataset with gridded data, different spatial-scale resolutions are tested to identify a suitable threshold. The minimal spatial scale identified is 0.11 degrees, a commonly adopted scale by several climate datasets within the EURO-CORDEX and COPERNICUS programs.
      Citation: Data
      PubDate: 2024-12-13
      DOI: 10.3390/data9120149
      Issue No: Vol. 9, No. 12 (2024)
       
  • Data, Vol. 9, Pages 150: Genome-Scale DNA Methylome and Transcriptome
           Profiles of Prostate Cancer Recurrence After Prostatectomy

    • Authors: Jim Smith, Priyadarshana Ajithkumar, Emma J. Wilkinson, Atreyi Dutta, Sai Shyam Vasantharajan, Angela Yee, Gregory Gimenez, Rathan M. Subramaniam, Michael Lau, Amir D. Zarrabi, Euan J. Rodger, Aniruddha Chatterjee
      First page: 150
      Abstract: Prostate cancer (PCa) is a major health burden worldwide, and despite early treatment, many patients present with biochemical recurrence (BCR) post-treatment, reflected by a rise in prostate-specific antigen (PSA) over a clinical threshold. Novel transcriptomic and epigenomic biomarkers can provide powerful tools for the clinical management of PCa. Here, we provide matched RNA sequencing and array-based genome-wide DNA methylome data of PCa patients (n = 17) with or without evidence of BCR following radical prostatectomy. Formalin-fixed paraffin-embedded (FFPE) tissues were used to generate these data, which included technical replicates to provide further validity of the data. We describe the sample features, experimental design, methods and bioinformatic pipelines for processing these multi-omic data. Importantly, comprehensive clinical, histopathological, and follow-up data for each patient were provided to enable the correlation of transcriptome and methylome features with clinical features. Our data will contribute towards the efforts of developing epigenomic and transcriptomic markers for BCR and also facilitate a deeper understanding of the molecular basis of PCa recurrence.
      Citation: Data
      PubDate: 2024-12-16
      DOI: 10.3390/data9120150
      Issue No: Vol. 9, No. 12 (2024)
       
  • Data, Vol. 9, Pages 151: A Framework for Current and New Data Quality
           Dimensions: An Overview

    • Authors: Russell Miller, Harvey Whelan, Michael Chrubasik, David Whittaker, Paul Duncan, João Gregório
      First page: 151
      Abstract: This paper presents a comprehensive exploration of data quality terminology, revealing a significant lack of standardisation in the field. The goal of this work was to conduct a comparative analysis of data quality terminology across different domains and structure it into a hierarchical data model. We propose a novel approach for aggregating disparate data quality terms used to describe the multiple facets of data quality under common umbrella terms with a focus on the ISO 25012 standard. We introduce four additional data quality dimensions: governance, usefulness, quantity, and semantics. These dimensions enhance specificity, complementing the framework established by the ISO 25012 standard, as well as contribute to a broad understanding of data quality aspects. The ISO 25012 standard, a general standard for managing the data quality in information systems, offers a foundation for the development of our proposed Data Quality Data Model. This is due to the prevalent nature of digital systems across a multitude of domains. In contrast, frameworks such as ALCOA+, which were originally developed for specific regulated industries, can be applied more broadly but may not always be generalisable. Ultimately, the model we propose aggregates and classifies data quality terminology, facilitating seamless communication of the data quality between different domains when collaboration is required to tackle cross-domain projects or challenges. By establishing this hierarchical model, we aim to improve understanding and implementation of data quality practices, thereby addressing critical issues in various domains.
      Citation: Data
      PubDate: 2024-12-18
      DOI: 10.3390/data9120151
      Issue No: Vol. 9, No. 12 (2024)
       
  • Data, Vol. 9, Pages 152: Advanced Methodology for Emulating Local
           Operating Conditions in Proton Exchange Membrane Fuel Cells

    • Authors: Marine Cornet, Arnaud Morin, Jean-Philippe Poirot-Crouvezier, Yann Bultel
      First page: 152
      Abstract: This work focuses on the study of operating heterogeneities over the active surface area of a large membrane electrode assembly (MEA) in a proton exchange membrane fuel cell (PEMFC) stack. An advanced methodology is developed, aiming at the prediction of local operating conditions such as temperature, relative humidity and species concentration. A physics-based Pseudo-3D model developed under COMSOL Multiphysics allows for the observation of heterogeneities over the entire active surface area. Once predicted, these local operating conditions are experimentally emulated, thanks to a differential cell, to provide the local polarization curves and electrochemical impedance spectra. Coupling simulation and experiment, thirty-seven local operating conditions are emulated to examine the degree of correlation between local operating conditions and PEMFC performance. Researchers and engineers can use the polarization curves and Electrochemical Impedance Spectroscopy diagrams to fit the variables of an empirical model or to validate the results of a theoretical model.
      Citation: Data
      PubDate: 2024-12-20
      DOI: 10.3390/data9120152
      Issue No: Vol. 9, No. 12 (2024)
       
  • Data, Vol. 9, Pages 122: Towards a Taxonomy Machine: A Training Set of 5.6
           Million Arthropod Images

    • Authors: Dirk Steinke, Sujeevan Ratnasingham, Jireh Agda, Hamzah Ait Boutou, Isaiah C. H. Box, Mary Boyle, Dean Chan, Corey Feng, Scott C. Lowe, Jaclyn T. A. McKeown, Joschka McLeod, Alan Sanchez, Ian Smith, Spencer Walker, Catherine Y.-Y. Wei, Paul D. N. Hebert
      First page: 122
      Abstract: The taxonomic identification of organisms from images is an active research area within the machine learning community. Current algorithms are very effective for object recognition and discrimination, but they require extensive training datasets to generate reliable assignments. This study releases 5.6 million images with representatives from 10 arthropod classes and 26 insect orders. All images were taken using a Keyence VHX-7000 Digital Microscope system with an automatic stage to permit high-resolution (4K) microphotography. Providing phenotypic data for 324,000 species derived from 48 countries, this release represents, by far, the largest dataset of standardized arthropod images. As such, this dataset is well suited for testing the efficacy of machine learning algorithms for identifying specimens into higher taxonomic categories.
      Citation: Data
      PubDate: 2024-10-25
      DOI: 10.3390/data9110122
      Issue No: Vol. 9, No. 11 (2024)
       
  • Data, Vol. 9, Pages 123: Sustainable Transportation Characteristics
           Diary—Example of Older (50+) Cyclists

    • Authors: Sreten Jevremović, Carol Kachadoorian, Filip Arnaut, Aleksandra Kolarski, Vladimir A. Srećković
      First page: 123
      Abstract: Cycling is a sustainable and healthy form of transportation that is gradually becoming the primary means of transportation over shorter distances in many countries. This paper describes the dataset used to determine the cycling characteristics of seniors in the USA and Canada. For these purposes, a specially created questionnaire was used in a survey conducted from August 2021 to July 2022. The questionnaire contained sections related to the general socio-demographic characteristics of the respondents, general characteristics of cycling (type of bicycle, cycle time, mileage, etc.), and specific characteristics of cycling (riding in night conditions, termination of cycling, motivating and demotivating factors for cycling, etc.). The total sample consisted of 5096 respondents (50+ years old). This database is particularly significant because it represents the first set of publicly available data related to the cycling characteristics of older adults. The database can be used by various researchers dealing with this topic, but also by the decision-makers who want to design a sustainable and accessible cycling infrastructure, respecting the requirements of this category of users. Finally, this dataset can serve as an adequate basis in the process of determining the specificities and understanding the needs of older cyclists in traffic.
      Citation: Data
      PubDate: 2024-10-25
      DOI: 10.3390/data9110123
      Issue No: Vol. 9, No. 11 (2024)
       
  • Data, Vol. 9, Pages 124: Curated Polyoxometalate Formula Dataset

    • Authors: Aleksandar Kondinski, Nadiia Gumerova, Annette Rompel
      First page: 124
      Abstract: Reticular and cluster materials often feature complex formulas, making a comprehensive overview challenging due to the need to consult various resources. While datasets have been collected for metal-organic frameworks (MOFs), covalent organic frameworks (COFs), and zeolites, among others, there remains a gap in systematically organized information for polyoxometalates. This paper introduces a carefully curated dataset of 1984 polyoxometalate (POM) and related cluster metal oxide formula instances, currently connecting over 2500 POM material instances. These POM instances incorporate 75 different chemical elements, with compositions ranging from binary to octonary element clusters. This dataset not only enhances accessibility to polyoxometalate data but also aims to facilitate further research and development in the study of these complex inorganic compounds.
      Citation: Data
      PubDate: 2024-10-29
      DOI: 10.3390/data9110124
      Issue No: Vol. 9, No. 11 (2024)
       
  • Data, Vol. 9, Pages 125: Enhancing Access Across Europe for Documents
           Published According to Freedom of Information Act: Applying Woogle Design
           and Technique to Estonian Public Information Act Document

    • Authors: Gerda Viira, Maarten Marx
      First page: 125
      Abstract: In the Netherlands, the Open Government Act (Wet openbare overheid or Woo/Wob in Dutch) is in effect, with the primary objective of ensuring a more transparent government. In line with the legislation, a search engine named Woogle has been designed and developed to centralize documents published under the Open Government Act. The Estonian Public Information Act serves a similar purpose and requires all public institutions to publish information generated during official duties, fostering transparency and public oversight. Currently, Estonia’s document repositories are decentralized, and content search is not supported, which hinders people’s ability to efficiently locate information. This study aims to assess public information accessibility in Estonia and to apply Woogle’s design and techniques to Estonia’s document repositories, thereby evaluating its potential for broader European implementation. The methodology involved web scraping data and documents from 57 Estonian public institutions’ document repositories. The results indicate that Woogle’s design and techniques can be implemented in Estonia. From a technical perspective, the alignment of the fields was successful, while it was found that content-wise, the Estonian data present challenges due to inconsistencies and lack of comprehensive categorization. The findings suggest potential scalability across European countries, pointing to a broader applicability of the Woogle model for creating a corpus of Freedom of Information Act documents in Europe. The collected data are available as a dataset.
      Citation: Data
      PubDate: 2024-10-29
      DOI: 10.3390/data9110125
      Issue No: Vol. 9, No. 11 (2024)
       
  • Data, Vol. 9, Pages 126: Long-Term Outdoor Cultivation of Nannochloropsis
           in California, Hawaii, and New Mexico

    • Authors: Alina A. Corcoran, Marcela Saracco Alvarez, Taryn Cornell, Isidora Echenique-Subiabre, Julia Gerber, Stephanie Getto, Ahlem Jebali, Heather Martinez, Jakob O. Nalley, Charles J. O’Kelly, Aidan Ryan, Jonathan B. Shurin, Shawn R. Starkenburg
      First page: 126
      Abstract: The project “Optimizing Selection Pressures and Pest Management to Maximize Cultivation Yield” (OSPREY, award #DE-EE08902) was undertaken to enhance the annual productivity, stability, and quality of algal production strains for biofuels and bioproducts. The foundation of this project was the year-round cultivation of a Nannochloropsis strain across three outdoor systems in California, Hawaii, and New Mexico. We aimed to leverage environmental selection pressures to drive strain improvement and use metagenomic techniques to inform pest management tools. The resulting dataset includes environmental and biological parameters from these cultivation campaigns, captured in a single CSV file. This dataset aims to serve a wide range of end users, from biologists to algal farmers, addressing the scarcity of publicly available data on algae cultivation. Further data releases will include 16S rRNA amplicon sequencing and shotgun sequencing datasets.
      Citation: Data
      PubDate: 2024-10-29
      DOI: 10.3390/data9110126
      Issue No: Vol. 9, No. 11 (2024)
       
  • Data, Vol. 9, Pages 127: Thermal Transmittance Limits Dataset for New and
           Existing Buildings Across EU Regulations

    • Authors: Paolo Maria Congedo, Cristina Baglivo, Delia D’Agostino, Paola Maria Albanese
      First page: 127
      Abstract: Building energy regulations are essential for reducing energy consumption in the European Union (EU) and achieving climate neutrality goals. This data article supplements the “Overview of EU Building Envelope Energy Requirement for Climate Neutrality” by presenting a detailed dataset on building regulations across all 27 EU member states, with a focus on building envelope efficiency. The data include thermal transmittance limits for windows, walls, floors, and roofs, offering insights into regulatory differences and potential opportunities for harmonization. Information was sourced from the Energy Performance of Buildings Directive (EPBD) database, national reports, and scientific literature to ensure comprehensive coverage. Key aspects of each country’s regulations are summarized in tables, covering both new constructions and renovations. The inclusion of Köppen–Geiger climate classifications allows for climate-specific analyses, providing valuable context for researchers, policymakers, and construction professionals. This dataset enables comparative studies, helping to identify best practices and inform policy interventions aimed at enhancing energy efficiency across Europe. It also supports the development of tailored strategies to improve building performance in different environmental conditions, ultimately contributing to the EU’s energy and climate targets.
      Citation: Data
      PubDate: 2024-10-31
      DOI: 10.3390/data9110127
      Issue No: Vol. 9, No. 11 (2024)
       
  • Data, Vol. 9, Pages 128: Towards a Dataset of Digitalized Historical
           German VET and CVET Regulations

    • Authors: Thomas Reiser, Jens Dörpinghaus, Petra Steiner, Michael Tiemann
      First page: 128
      Abstract: The digitization of historical documents has gained particular interest in recent years in the digital humanities. The goal is to digitize historical documents by extracting and structuring text from scanned images. Here, we focus on the processing of historical German VET (vocational education and training) and CVET (continuing vocational education and training) regulations to support educational research. This dataset contains data from 1908 to the present and includes 2125 documents as PDF, 983 fully converted XML documents, and additional metadata for 7090 documents from the archive. We present an overview of the historical background and the challenges of processing different historical documents from three different federal states.
      Citation: Data
      PubDate: 2024-11-03
      DOI: 10.3390/data9110128
      Issue No: Vol. 9, No. 11 (2024)
       
  • Data, Vol. 9, Pages 129: Data Hub for Life Cycle Assessment of Climate
           Change Solutions—Hydrogen Case Study

    • Authors: Shiva Zargar, Miyuru Kannangara, Giovanna Gonzales-Calienes, Jianjun Yang, Jalil Shadbahr, Cyrille Decès-Petit, Farid Bensebaa
      First page: 129
      Abstract: Life cycle assessment, which evaluates the complete life cycle of a product, is considered the standard methodological framework to evaluate the environmental performance of climate change solutions. However, significant challenges exist related to datasets used to quantify these environmental indicators. Although extensive research and commercial data on climate change technologies, pathways, and facilities exist, they are not readily available to practitioners of life cycle assessment in the right format and structure using an open platform. In this study, we propose a new open data hub platform for life cycle assessment, considering a hierarchical data flow starting with raw data collected on climate change technologies at laboratory, pilot, demonstration, or commercial scales to provide the information required for policy and decision-making. This platform makes data accessible at multiple levels for practitioners of life cycle assessment, while making data interoperable across platforms. The proposed data hub platform and workflow are explained through the polymer electrolyte membrane electrolysis hydrogen production as a case study. The climate change environment impact of 1.17 ± 0.03 kg CO2 eq./kg H2 was calculated for the case study. The current data hub platform is limited to evaluating environmental impacts; however, future additions of economic and social aspects are envisaged.
      Citation: Data
      PubDate: 2024-11-05
      DOI: 10.3390/data9110129
      Issue No: Vol. 9, No. 11 (2024)
       
  • Data, Vol. 9, Pages 130: Non-Destructive Wood Analysis Dataset: Comparing
           X-Ray and Terahertz Imaging Techniques

    • Authors: Caroline Marc, Bertrand Marcon, Louis Denaud, Stéphane Girardon
      First page: 130
      Abstract: Wood density measurement plays a crucial role in assessing wood quality and predicting its mechanical performance. This dataset was collected to compare the accuracy and reliability of two non-destructive techniques, X-rays and terahertz waves, for measuring wood density. While X-rays have been commonly used in the industry due to their effectiveness, they pose health risks due to ionizing radiation. Terahertz waves, on the other hand, are non-ionizing and offer high spatial resolution. This article presents a database of wood sample measurements obtained using both techniques on the same 110 samples, with precisely located measuring points, covering a wide range of wood species (tropical and temperate) and densities, from 111 kg·m⁻³ to 1086 kg·m⁻³. The database includes X-ray and terahertz scans, sample dimensions, moisture content, and color photographs.
      Citation: Data
      PubDate: 2024-11-05
      DOI: 10.3390/data9110130
      Issue No: Vol. 9, No. 11 (2024)
       
  • Data, Vol. 9, Pages 131: Influence of Temperature Variability on the
           Efficacy of Negative Ions in Removing Particulate Matter and Pollutants:
           An Experimental Database

    • Authors: Paola M. Ortiz-Grisales, Leidy Gutiérrez-León, Carlos D. Zuluaga-Ríos
      First page: 131
      Abstract: Cities globally must make urgent decisions to ensure a sustainable future as rising pollution, particularly PM2.5, poses severe health risks like respiratory and heart diseases. PM2.5’s harmful composition also impacts vegetation and the environment. Immediate government intervention is necessary to mitigate these effects. This study tackles the urgent problem of reducing PM2.5 levels in Medellín’s urban and indoor environments, where pollution presents serious health risks. To explore effective solutions, this research provides new data on the interaction between particulate matter from various pollutants and negative ions under different temperature conditions, offering valuable insights into air quality improvement strategies. Using a high-voltage system, ions bind to pollutants, accelerating their removal. Experiments measured temperature, humidity, formaldehyde, volatile organic compounds, negative ions, and PM2.5 in a 40 cm³ chamber across various conditions. Pollutants tested included cigarette smoke, incense, charcoal, and gasoline at two voltage levels and three temperature ranges. The data, available in CSV format, were based on 36,000 samples and repeated tests for reliability. This resource is designed to support studies investigating particulate matter control in urban and indoor environments, as well as to improve our understanding of negative ion-based air purification processes. The data are publicly available and structured in formats compatible with leading data analysis platforms.
      Citation: Data
      PubDate: 2024-11-08
      DOI: 10.3390/data9110131
      Issue No: Vol. 9, No. 11 (2024)
       
  • Data, Vol. 9, Pages 132: The VNF Cybersecurity Dataset for Research
           (VNFCYBERDATA)

    • Authors: Believe Ayodele, Victor Buttigieg
      First page: 132
      Abstract: Virtualisation has received widespread adoption and deployment across a wide range of enterprises and industries throughout the years. Network Function Virtualisation (NFV) is a technical concept that presents a method for dynamically delivering virtualised network functions as virtualised or software components. Virtualised Network Function (VNF) has distinct advantages, but it also faces serious security challenges. Cyberattacks such as Denial of Service (DoS), malware/rootkit injection, port scanning, and so on can target VNF appliances just like any other network infrastructure. To create exceptional training exercises for machine or deep learning (ML/DL) models to combat cyberattacks in VNF, a suitable dataset (VNFCYBERDATA) exhibiting an actual reflection, or one that is reasonably close to an actual reflection, of the problem that the ML/DL model could address is required. This article describes a real VNF dataset that contains over seven million data points and twenty-five cyberattacks generated from five VNF appliances. To facilitate a realistic examination of VNF traffic, the dataset includes both benign and malicious traffic.
      Citation: Data
      PubDate: 2024-11-08
      DOI: 10.3390/data9110132
      Issue No: Vol. 9, No. 11 (2024)
       
  • Data, Vol. 9, Pages 133: Additions to Space Physics Data Facility and
           pysatNASA: Increasing Mars Global Surveyor and Mars Atmosphere and
           Volatile EvolutioN Dataset Utility

    • Authors: Teresa M. Esman, Alexa J. Halford, Jeff Klenzing, Angeline G. Burrell
      First page: 133
      Abstract: The Space Physics Data Facility (SPDF) is a digital archive of space physics data and is useful for the storage, analysis, and dissemination of data. We discuss the process used to create an amended dataset and store it on the SPDF. The operational software that generates the archival data uses the open-source Python package pysat, and an end-user module has been added to the pysatNASA module. The result is the addition of data products to the Mars Global Surveyor (MGS) magnetometer (MAG) dataset, its archival location on SPDF, and pysat compatibility. The format of the primary data and metadata increases the convenience and efficiency for users of the MGS MAG data. The storage of planetary and heliophysics data in one location supports the use of data throughout the solar system for comparison, while pysat compatibility enables loading data in an identical format for ease of processing. We encourage the use of the outlined process for past, present, and future space science missions of all sizes and funding levels, from balloons to Flagship-class missions.
      Citation: Data
      PubDate: 2024-11-08
      DOI: 10.3390/data9110133
      Issue No: Vol. 9, No. 11 (2024)
       
  • Data, Vol. 9, Pages 134: The Design of a Script Identification Algorithm
           and Its Application in Constructing a Text Language Identification Dataset
           

    • Authors: Mamtimin Qasim, Wushour Silamu, Minghui Qiu
      First page: 134
      Abstract: Script identification (SI) is easier to implement than language identification, and its identification rate is very high. The fewer languages a language identification algorithm has to distinguish, the higher its identification rate. However, no systematic study on SI involving multiple languages and determining how to construct relevant language identification datasets has been conducted. Therefore, in this paper, we discuss and design a script identification algorithm and the construction of a language identification dataset based on script groups. The data sources in this paper comprise text corpora in 261 different languages from the Leipzig Corpora Collection, which are grouped into 23 different script groups. In the Unicode encoding scheme, different scripts are arranged into different code regions. Based on this feature, we propose a written script identification algorithm based on regular expression matching, the micro F-score of which reaches 0.9929 in sentence-level script identification experiments. To reduce noise when constructing the language identification dataset for each script, the script identification algorithm is used to filter out other-script content in each text.
      Citation: Data
      PubDate: 2024-11-11
      DOI: 10.3390/data9110134
      Issue No: Vol. 9, No. 11 (2024)
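
      Note: the abstract above describes script identification by regular expression matching over Unicode code regions. A minimal sketch of that idea follows. It is not the authors' implementation: the four script groups, their code ranges, and the names SCRIPT_PATTERNS, identify_script, and filter_other_script are illustrative assumptions, and the paper's 23 script groups are not reproduced.

```python
import re

# Illustrative subset of Unicode code regions (assumption: the paper's
# actual 23 script groups and their exact ranges are not given here).
SCRIPT_PATTERNS = {
    "Latin": re.compile(r"[A-Za-z\u00C0-\u024F]"),
    "Cyrillic": re.compile(r"[\u0400-\u04FF]"),
    "Arabic": re.compile(r"[\u0600-\u06FF]"),
    "Han": re.compile(r"[\u4E00-\u9FFF]"),
}


def identify_script(sentence):
    """Return the script with the most matching characters, or None."""
    counts = {
        name: len(pattern.findall(sentence))
        for name, pattern in SCRIPT_PATTERNS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def filter_other_script(text, script):
    """Keep only the lines of `text` whose dominant script is `script`."""
    return [line for line in text.splitlines() if identify_script(line) == script]
```

      In this sketch, filter_other_script plays the role of the noise-reduction step: lines whose dominant script differs from the target script are dropped before a language identification dataset is assembled.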
       
  • Data, Vol. 9, Pages 135: Dataset to Quantify Spillover Effects Among
           Concurrent Green Initiatives

    • Authors: Rong Zhang, Qi Zhang, Conghe Song, Li An
      First page: 135
      Abstract: Green initiatives are popular mechanisms globally to enhance environmental and human wellbeing. However, multiple green initiatives, when overlapping geographically and targeting the same participants, may interact with each other, giving rise to what is termed “spillover effects”, where one initiative and its outcomes influence another. This study examines the spillover effects among four major concurrent initiatives in the United States (U.S.) and China using a comprehensive dataset. In the U.S., we analysed county-level data in 2018 for the Conservation Reserve Program (CRP) and the Environmental Quality Incentives Program (EQIP), both operational for over 25 years. In China, data from Fanjingshan and Tianma National Nature Reserves (2014–2015) were used to evaluate the Grain-to-Green Program (GTGP) and the Forest Ecological Benefit Compensation (FEBC) program. The dataset comprises 3106 records for the U.S. and 711 plots for China, including several socio-economic variables. The results of multivariate linear regression indicate that there exist significant spillover effects between CRP & EQIP and GTGP & FEBC, with one initiative potentially enhancing or offsetting another’s impacts by 22% to 100%. This dataset provides valuable insights for researchers and policymakers to optimize the effectiveness and resilience of concurrent green initiatives.
      Citation: Data
      PubDate: 2024-11-13
      DOI: 10.3390/data9110135
      Issue No: Vol. 9, No. 11 (2024)
       
  • Data, Vol. 9, Pages 136: Two Datasets over South Tyrol and Tyrol Areas to
           Understand and Characterize Water Resource Dynamics in Mountain Regions

    • Authors: Ludovica De Gregorio, Giovanni Cuozzo, Riccardo Barella, Francisco Corvalán, Felix Greifeneder, Peter Grosse, Abraham Mejia-Aguilar, Georg Niedrist, Valentina Premier, Paul Schattan, Alessandro Zandonai, Claudia Notarnicola
      First page: 136
      Abstract: In this work, we present two datasets for specific areas of the Alpine arc that can be used to monitor and understand water resource dynamics in mountain regions. The aim is to provide the reader with information about the different sources of water supply in five defined alpine test areas in South Tyrol (Italy) and Tyrol (Austria). The snow cover fraction (SCF) and soil moisture content (SMC) datasets are derived from machine learning algorithms based on remote sensing data. Both products have a spatial resolution of 20 m; SCF is provided for the period from October 2020 to May 2023, covering winter seasons, and SMC for the period from October 2019 to September 2022, covering spring–summer seasons. For SCF maps, validation against very high-resolution images shows high correlation coefficients of around 0.9. The SMC products were originally produced with an algorithm validated at a global scale, but here, to obtain more insights into the specific alpine mountain environment, the values estimated from the maps are compared with ground measurements from automatic stations located at different altitudes and characterized by different aspects in the Val Mazia catchment in South Tyrol (Italy). In this case, an MAE between 0.05 and 0.08 and an unbiased RMSE between 0.05 and 0.09 m³·m⁻³ were achieved. The datasets presented can be used as input for hydrological models and to hydrologically characterize the alpine study area starting from different sources of information.
      Citation: Data
      PubDate: 2024-11-16
      DOI: 10.3390/data9110136
      Issue No: Vol. 9, No. 11 (2024)
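
      Note: the abstract above reports a mean absolute error (MAE) and an unbiased RMSE for soil moisture. The sketch below uses the definitions commonly applied in soil moisture validation, where the unbiased RMSE is the RMSE after the mean error (bias) has been removed. The abstract does not state its formulas, so this is an assumption, and the function names are illustrative.

```python
import math


def mae(estimated, observed):
    """Mean absolute error between paired estimates and observations."""
    return sum(abs(e - o) for e, o in zip(estimated, observed)) / len(estimated)


def unbiased_rmse(estimated, observed):
    """RMSE of the errors after subtracting their mean (the bias)."""
    errors = [e - o for e, o in zip(estimated, observed)]
    bias = sum(errors) / len(errors)
    return math.sqrt(sum((err - bias) ** 2 for err in errors) / len(errors))
```

      With these definitions, a constant offset between map values and station measurements contributes to the MAE but not to the unbiased RMSE.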
       
  • Data, Vol. 9, Pages 137: Dual Transcriptome of Post-Germinating Mutant
           Lines of Arabidopsis thaliana Infected by Alternaria brassicicola

    • Authors: Mailen Ortega-Cuadros, Laurine Chir, Sophie Aligon, Nubia Velasquez, Tatiana Arias, Jerome Verdier, Philippe Grappin
      First page: 137
      Abstract: Alternaria brassicicola is a seed-borne pathogen that causes black spot disease in Brassica crops, yet the seed defense mechanisms against this fungus remain poorly understood. Building upon recent reports that highlighted the involvement of indole pathways in seeds infected by Alternaria, this study provides transcriptomic resources to further elucidate the role of these metabolic pathways during the interaction between seeds and fungal pathogens. Using RNA sequencing, we examined the gene expression of glucosinolate-deficient mutant lines (cyp79B2/cyp79B3 and qko) and a camalexin-deficient line (pad3), generating a dataset from 14 samples. These samples were inoculated with Alternaria or water, and collected at 3, 6, and 10 days after sowing to extract total RNA. Sequencing was performed using DNBseq™ technology, followed by bioinformatics analyses with tools such as FastQC (version 0.11.9), multiQC (version 1.13), Venny (version 2.0), Salmon software (version 0.14.1), and R packages DESeq2 (version 1.36.0), ClusterProfiler (version 4.12.6) and ggplot2 (version 3.4.0). By providing this valuable dataset, we aim to contribute to a deeper understanding of seed defense mechanisms against Alternaria, leveraging RNA-seq for various analyses, including differential gene expression and co-expression correlation. This work serves as a foundation for a more comprehensive grasp of the interactions during seed infection and highlights potential targets for enhancing crop protection and management.
      Citation: Data
      PubDate: 2024-11-18
      DOI: 10.3390/data9110137
      Issue No: Vol. 9, No. 11 (2024)
       
  • Data, Vol. 9, Pages 110: Comprehensive Overview of Long-Term Ecosystem
           Research Datasets at LTER Site Oberes Stubachtal

    • Authors: Bernhard Zagel, Hans Wiesenegger, Robert R. Junker, Gerhard Ehgartner
      First page: 110
      Abstract: This article provides a comprehensive overview of all currently available datasets of the Long-term Ecosystem Research (LTER) site Oberes Stubachtal. The site is located in the Hohe Tauern mountain range (Eastern Alps, Austria) and includes both protected areas (Hohe Tauern National Park) and unprotected areas (Stubach valley). While the main research focus of the site is on high mountains, glaciology, glacial hydrology, and biodiversity, the eLTER Whole-System Approach (WAILS) was used for data selection. This approach involves a systematic screening of all available data to assess their suitability as eLTER Standard Observations (SOs). This includes the geosphere, atmosphere, hydrosphere, biosphere, and sociosphere. These SOs are fundamental to the development of a comprehensive long-term ecosystem research framework. In total, more than 40 datasets have been collated for the LTER site Oberes Stubachtal and included in the Dynamic Ecological Information Management System—Site and Data Registry (DEIMS-SDR), the eLTER’s data platform. This paper provides a detailed inventory of the datasets and their primary attributes, evaluates them against the WAILS-required observation data, and offers insights into strategies for future initiatives. All datasets are made available through dedicated repositories for FAIR (findable, accessible, interoperable, reusable) use.
      Citation: Data
      PubDate: 2024-09-25
      DOI: 10.3390/data9100110
      Issue No: Vol. 9, No. 10 (2024)
       
  • Data, Vol. 9, Pages 111: Non-Linear Relationship between MiRNA Regulatory
           Activity and Binding Site Counts on Target mRNAs

    • Authors: Shuangmei Tian, Ziyu Zhao, Beibei Ren, Degeng Wang
      First page: 111
      Abstract: MicroRNAs (miRNAs) exert regulatory actions via base pairing with their binding sites on target mRNAs. Cooperative binding, i.e., synergism, among binding sites on an mRNA is biochemically well characterized. We studied whether this synergism is reflected in the global relationship between miRNA-mediated regulatory activity and miRNA binding site count on the target mRNAs, i.e., whether it leads to a non-linear relationship between the two. Recently, using our own and public datasets, we have enquired into miRNA regulatory actions: first, we analyzed the power-law distribution pattern of miRNA binding sites; second, we found that, strikingly, mRNAs for core miRNA regulatory apparatus proteins have extraordinarily high binding site counts, forming self-feedback-control loops; third, we revealed that tumor suppressor mRNAs generally have more sites than oncogene mRNAs; and fourth, we characterized enrichment of miRNA-targeted mRNAs in translationally less active polysomes relative to more active polysomes. In these four studies, we qualitatively observed an obvious positive correlation between the extent to which an mRNA is miRNA-regulated and its binding site count. This paper summarizes the datasets used. We also quantitatively analyzed the correlation by comparative linear and non-linear regression analyses. Non-linear relationships, i.e., an accelerating rise of regulatory activity as binding site count increases, fit the data much better, conceivably a transcriptome-level reflection of cooperative binding among miRNA binding sites on a target mRNA. This observation is potentially a guide for integrative quantitative modeling of the miRNA regulatory system.
      Citation: Data
      PubDate: 2024-09-25
      DOI: 10.3390/data9100111
      Issue No: Vol. 9, No. 10 (2024)
       
  • Data, Vol. 9, Pages 112: Fundamentals of Analysis of Health Data for
           Non-Physicians

    • Authors: Carlos Hernández-Nava, Miguel-Félix Mata-Rivera, Sergio Flores-Hernández
      First page: 112
      Abstract: The increasing prevalence of diabetes worldwide, including in Mexico, presents significant challenges to healthcare systems. This has a notable impact on hospital admissions, as diabetes is considered an ambulatory care-sensitive condition, meaning that hospitalizations could be avoided. This is just one example of many challenges faced in the medical and public health fields. Traditional healthcare methods have been effective in managing diabetes and preventing complications. However, they often encounter limitations when it comes to analyzing large amounts of health data to effectively identify and address diseases. This paper aims to bridge this gap by outlining a comprehensive methodology for non-physicians, particularly data scientists, working in healthcare. As a case study, this paper utilizes hospital diabetes discharge records from 2010 to 2023, totaling 36,665,793 records from medical units under the Ministry of Health of Mexico. We aim to highlight the importance for data scientists to understand the problem and its implications. By doing so, insights can be generated to inform policy decisions and reduce the burden of avoidable hospitalizations. The approach primarily relies on stratification and standardization to uncover rates based on sex and age groups. This study provides a foundation for data scientists to approach health data in a new way.
      Citation: Data
      PubDate: 2024-09-27
      DOI: 10.3390/data9100112
      Issue No: Vol. 9, No. 10 (2024)
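
Stratification and standardization, as mentioned in the abstract above, can be sketched with direct age standardization. The figures below are made up, and the function names and the three age bands are hypothetical; they are not taken from the paper or from the Mexican discharge records.

```python
def stratum_rates(cases, population, per=100_000):
    """Stratum-specific rate per `per` people for each age group."""
    return {group: cases[group] / population[group] * per for group in cases}

def crude_rate(cases, population, per=100_000):
    """Overall rate, ignoring the age structure of the population."""
    return sum(cases.values()) / sum(population.values()) * per

def directly_standardized_rate(cases, population, standard_population, per=100_000):
    """Weight each stratum rate by that stratum's share of a standard population."""
    rates = stratum_rates(cases, population, per)
    total_standard = sum(standard_population.values())
    return sum(rates[g] * standard_population[g] / total_standard for g in rates)

# Illustrative numbers only (assumption): hospital discharges and population by age group.
cases = {"0-39": 50, "40-64": 400, "65+": 600}
population = {"0-39": 500_000, "40-64": 200_000, "65+": 100_000}
standard = {"0-39": 600_000, "40-64": 300_000, "65+": 100_000}

print("stratum rates:", stratum_rates(cases, population))
print("crude rate:", crude_rate(cases, population))
print("standardized rate:", directly_standardized_rate(cases, population, standard))
```

The standardized rate answers the question "what would the overall rate be if this population had the age structure of the standard population?", which is what makes rates comparable across regions, years, or sexes with different age profiles.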
       
  • Data, Vol. 9, Pages 113: Dataset for Machine Learning: Explicit All-Sky
           Image Features to Enhance Solar Irradiance Prediction

    • Authors: Joylan Nunes Maciel, Jorge Javier Gimenez Ledesma, Oswaldo Hideo Ando Junior
      First page: 113
      Abstract: Prediction of solar irradiance is crucial for photovoltaic energy generation, as it helps mitigate intermittencies caused by atmospheric fluctuations such as clouds, wind, and temperature. Numerous studies have applied machine learning and deep learning techniques from artificial intelligence to address this challenge. Based on the recently proposed Hybrid Prediction Method (HPM), this paper presents an original and comprehensive dataset with nine attributes extracted from all-sky images developed using image processing techniques. This dataset and analysis of its attributes offer new avenues for research into solar irradiance forecasting. To ensure reproducibility, the data processing workflow and the standardized dataset have been meticulously detailed and made available to the scientific community to promote further research into prediction methods for photovoltaic energy generation.
      Citation: Data
      PubDate: 2024-09-29
      DOI: 10.3390/data9100113
      Issue No: Vol. 9, No. 10 (2024)
       
  • Data, Vol. 9, Pages 114: Open and Collaborative Dataset for the
           Classification of Operational Transconductance Amplifiers for
           Switched-Capacitor Applications

    • Authors: Francesco Gagliardi, Michele Dei
      First page: 114
      Abstract: This study introduces a collaborative and open dataset designed to classify operational transconductance amplifiers (OTAs) in switched-capacitor applications. The dataset comprises a diverse collection of OTA designs sourced from the literature, facilitating benchmarking, analysis and innovation in analog and mixed-signal integrated circuit design. Various evaluation methodologies, implemented through a companion Python notebook script, are discussed to assess OTA performances across different operating conditions and specifications. Several Figures of Merit (FoMs) are utilized as performance metrics to achieve significant performance classification. This study also uncovers intriguing behaviors and correlations among FoMs, providing valuable insights into OTA design considerations. By making the dataset openly available on platforms like GitHub, this work encourages collaboration and knowledge sharing within the integrated circuit design community, thereby enhancing transparency, reproducibility and innovation in OTA design research.
      Citation: Data
      PubDate: 2024-10-03
      DOI: 10.3390/data9100114
      Issue No: Vol. 9, No. 10 (2024)
       
  • Data, Vol. 9, Pages 115: Characterization and Dataset Compilation of
           Torque–Angle Curve Behavior for M2/M3 Screws

    • Authors: Iván Juan Carlos Pérez-Olguín, Consuelo Catalina Fernández-Gaxiola, Luis Alberto Rodríguez-Picón, Luis Carlos Méndez-González
      First page: 115
      Abstract: This research explores the torque–angle behavior of M2/M3 screws in automotive applications, focusing on ensuring component reliability and manufacturing precision within the recommended assembly specification limits. M2/M3 screws, often used in tight spaces, are susceptible to issues like stripped threads and inconsistent torque, which can compromise safety and performance. The study’s primary objective is to develop a comprehensive dataset of torque–angle measurements for these screws, facilitating the analysis of key parameters such as torque-to-seat, torque-to-fail, and process windows. By applying Gaussian curve fitting and Gaussian process regression, the research models and simulates torque behavior to understand torque dynamics in small fasteners and remarks on the potential of statistical methods in torque analysis, offering insights for improving manufacturing practices. As a result, it can be concluded that the proposed stochastic methodologies offer the benefit of fail-to-seat ratio improvement, allow inference, reduce the sample size needed in incoming test studies, and minimize the number of destructive test samples needed.
      Citation: Data
      PubDate: 2024-10-06
      DOI: 10.3390/data9100115
      Issue No: Vol. 9, No. 10 (2024)
       
  • Data, Vol. 9, Pages 116: Data Descriptor for “Understanding and
           Perception of Automated Text Generation among the Public: Two Surveys with
           Representative Samples in Germany”

    • Authors: Angelica Lermann Henestrosa, Joachim Kimmerle
      First page: 116
      Abstract: With the release of ChatGPT, text-generating AI became accessible to the general public virtually overnight, and automated text generation (ATG) became the focus of public debate. Previously, however, little attention had been paid to this area of AI, resulting in a gap in the research on people’s attitudes and perceptions of this technology. Therefore, two representative surveys among the German population were conducted before (March 2022) and after (July 2023) the release of ChatGPT to investigate people’s attitudes, concepts, and knowledge on ATG in detail. This data descriptor depicts the structure of the two datasets, the measures collected, and potential analysis approaches beyond the existing research paper. Other researchers are encouraged to take up these data sets and explore them further as suggested or as they deem appropriate.
      Citation: Data
      PubDate: 2024-10-11
      DOI: 10.3390/data9100116
      Issue No: Vol. 9, No. 10 (2024)
       
  • Data, Vol. 9, Pages 117: Perception and Reuse of Open Data in the Spanish
           University Teaching and Research Community

    • Authors: Christian Vidal-Cabo, Enrique Alfonso Sánchez-Pérez, Antonia Ferrer-Sapena
      First page: 117
      Abstract: Introduction. Open Government is a form of public policy based on the pillars of collaboration and citizen participation, transparency and the right of access to public information. With the help of information and communication technologies, governments and administrations carry out open data initiatives, making reusable datasets available to all citizens. The academic community, as highly qualified personnel, can become potential reusers of this data, which would lead to its use for scientific research, generating knowledge, and for teaching, improving the training of university students and promoting the reuse of open data in the future. Method. This study was developed using a quantitative research methodology: a survey of 30 questions, organised into one context block and six technical blocks, was distributed by email. The data collection period was between 15 March and 10 May 2021. Analysis. The data obtained through this quantitative methodology were processed, normalised, and analysed. Results. A total of 783 responses were obtained, from 34 Spanish provinces. The researchers come from 47 Spanish universities and 21 research centres, and 19 research areas of the State Research Agency are represented. In addition, a platform was developed with the data for the purpose of visualising the results of the survey. Conclusions. The sample thus obtained is representative and the conclusions can be extrapolated to the rest of the Spanish university teaching staff. In terms of gender, the study is balanced between men and women (41.76% women vs. 56.58% men). In general, researchers responding to the survey know what open data is (79.31%) but only 50.57% reuse open data. The main conclusion is that open government data prove to be useful sources of information for science, especially in areas such as Social Sciences, Industrial Production, Engineering and Engineering for Society, Information and Communication Technologies, Economics and Environmental Sciences.
      Citation: Data
      PubDate: 2024-10-11
      DOI: 10.3390/data9100117
      Issue No: Vol. 9, No. 10 (2024)
       
  • Data, Vol. 9, Pages 118: A Dataset of Two-Dimensional XBeach Model Set-Up
           Files for Northern California

    • Authors: Andrea C. O’Neill, Kees Nederhoff, Li H. Erikson, Jennifer A. Thomas, Patrick L. Barnard
      First page: 118
      Abstract: Here, we describe a dataset of two-dimensional (2D) XBeach model files that were developed for the Coastal Storm Modeling System (CoSMoS) in northern California as an update to an earlier CoSMoS implementation that relied on one-dimensional (1D) modeling methods. We provide details on the data and their application, such that they might be useful to end-users for other coastal studies. Modeling methods and outputs are presented for Humboldt Bay, California, in which we compare output from a nested 1D modeling approach to 2D model results, demonstrating that the 2D method, while more computationally expensive, results in a more cohesive and directly mappable flood hazard result.
      Citation: Data
      PubDate: 2024-10-11
      DOI: 10.3390/data9100118
      Issue No: Vol. 9, No. 10 (2024)
       
  • Data, Vol. 9, Pages 119: Data Mining Approach for Evil Twin Attack
           Identification in Wi-Fi Networks

    • Authors: Roman Banakh, Elena Nyemkova, Connie Justice, Andrian Piskozub, Yuriy Lakh
      First page: 119
      Abstract: Recent cyber security solutions for wireless networks during internet open access have become critically important for personal data security. The newest WPA3 network security protocol has been used to maximize this protection; however, attackers can use an Evil Twin attack to replace a legitimate access point. The article is devoted to solving the problem of intrusion detection at the OSI model’s physical layers. To solve this, a hardware–software complex has been developed to collect information about the signal strength from Wi-Fi access points using wireless sensor networks. The collected data were supplemented with a generative algorithm considering all possible combinations of signal strength. The k-nearest neighbor model was trained on the obtained data to distinguish the signal strength of legitimate from illegitimate access points. To verify the authenticity of the data, an Evil Twin attack was physically simulated, and a machine learning model analyzed the data from the sensors. As a result, the Evil Twin attack was successfully identified based on the signal strength in the radio spectrum. The proposed model can be used in open access points as well as in large corporate and home Wi-Fi networks to detect intrusions aimed at substituting devices in the radio spectrum where IEEE 802.11 networking equipment operates.
      Citation: Data
      PubDate: 2024-10-14
      DOI: 10.3390/data9100119
      Issue No: Vol. 9, No. 10 (2024)
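
The k-nearest neighbor classification on signal strength described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the RSSI readings (in dBm, from three hypothetical sensors) are invented, and `knn_predict` is a hypothetical helper rather than the authors' model.

```python
from collections import Counter
import math

def knn_predict(training_data, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `training_data` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    neighbours = sorted(training_data, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Invented RSSI vectors (dBm) as seen by three sensors (assumption).
training_data = [
    ((-40, -55, -60), "legitimate"),
    ((-42, -54, -61), "legitimate"),
    ((-41, -56, -59), "legitimate"),
    ((-60, -45, -50), "illegitimate"),
    ((-62, -44, -52), "illegitimate"),
    ((-61, -46, -51), "illegitimate"),
]

print(knn_predict(training_data, (-43, -55, -60)))
print(knn_predict(training_data, (-59, -45, -49)))
```

The idea is that an access point at a fixed location produces a characteristic signal-strength pattern across several sensors, so a device broadcasting the same network name from a different position tends to land near a different cluster of readings.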
       
  • Data, Vol. 9, Pages 120: Rainfall Erosivity over Brazil: A Large National
           Database

    • Authors: Mariza P. Oliveira-Roza, Roberto A. Cecílio, David B. S. Teixeira, Michel C. Moreira, André Q. Almeida, Alexandre C. Xavier, Sidney S. Zanetti
      First page: 120
      Abstract: Rainfall erosivity (RE) represents the potential of rainfall to cause soil erosion, and understanding its impact is essential for the adoption of soil and water conservation practices. Although several studies have estimated RE for Brazil, currently, no single reliable and easily accessible database exists for the country. To fill this gap, this work aimed to review the research and generate a rainfall erosivity database for Brazil. Data were gathered from studies that determined rainfall erosivity from observed rainfall records and synthetic rainfall series. Monthly and annual rainfall erosivity values were organized on a spreadsheet and in the shapefile format. In total, 54 studies from 1990 to 2023 were analyzed, resulting in the compilation of 5516 erosivity values for Brazil, of which 6.3% were pluviographic, and 93.7% were synthetic. The regions with the highest availability of information were the Northeast (35.6%), Southeast (30.1%), South (19.9%), Central-West (7.7%), and North (6.7%). The database, which can be accessed on the Mendeley Data platform, can aid professionals and researchers in adopting public policies and carrying out studies aimed at environmental conservation and management basin development.
      Citation: Data
      PubDate: 2024-10-14
      DOI: 10.3390/data9100120
      Issue No: Vol. 9, No. 10 (2024)
       
  • Data, Vol. 9, Pages 121: Computing the Commonalities of Clusters in
           Resource Description Framework: Computational Aspects

    • Authors: Simona Colucci, Francesco Maria Donini, Eugenio Di Sciascio
      First page: 121
      Abstract: Clustering is a very common means of analysis of the data present in large datasets, with the aims of understanding and summarizing the data and discovering similarities, among other goals. However, despite the present success of the use of subsymbolic methods for data clustering, a description of the obtained clusters cannot rely on the intricacies of the subsymbolic processing. For clusters of data expressed in a Resource Description Framework (RDF), we extend and implement an optimized, previously proposed, logic-based methodology that computes a structure, called a Common Subsumer, describing the commonalities among all resources. We tested our implementation with two open, and very different, datasets: one devoted to public procurement, and the other devoted to drugs in pharmacology. For both datasets, we were able to provide reasonably concise and readable descriptions of clusters with up to 1800 resources. Our analysis shows the viability of our methodology and computation, and paves the way for general cluster explanations to be provided to lay users.
      Citation: Data
      PubDate: 2024-10-20
      DOI: 10.3390/data9100121
      Issue No: Vol. 9, No. 10 (2024)
       
  • Data, Vol. 10, Pages 1: Artificial Intelligence and Ontologies for the
           Management of Heritage Digital Twins Data

    • Authors: Achille Felicetti, Franco Niccolucci
      First page: 1
      Abstract: This study builds upon the Reactive Heritage Digital Twin paradigm established in prior research, exploring the role of artificial intelligence in expanding and enhancing its capabilities. After providing an overview of the ontological model underlying the RHDT paradigm, this paper investigates the application of AI to improve data analysis and predictive capabilities of Heritage Digital Twins in synergy with the previously defined RHDTO semantic model. The structured nature of ontologies is highlighted as essential for enabling AIs to operate transparently, minimising hallucinations and other errors that are characteristic challenges of these technologies. New classes and properties within RHDTO are introduced to represent the AI-enhanced functions. Finally, some case studies are provided to illustrate how integrating AI within the RHDT framework can contribute to enriching the understanding of cultural information through interconnected data and facilitate real-time monitoring and preservation of cultural objects.
      Citation: Data
      PubDate: 2024-12-26
      DOI: 10.3390/data10010001
      Issue No: Vol. 10, No. 1 (2024)
       
  • Data, Vol. 10, Pages 2: Minisatellite Isolation and Minisatellite
           Molecular Marker Development in Citrus limon (L.) Osbeck

    • Authors: Oleg S. Alexandrov, Dmitry V. Romanov
      First page: 2
      Abstract: Minisatellites are widespread tandem DNA repeats in the genome with a monomer length of 10 to 100 bp. The high variability of minisatellite loci makes them attractive for the development of molecular markers. Minisatellites are used as markers according to three strategies: marking of digested genomic DNA with minisatellite-based probes; amplification with primers based on the sequences of the minisatellites themselves; amplification with primers designed for borders upstream and downstream of the minisatellite locus. In this study, a minisatellite dataset was obtained from the analysis of the Citrus limon (L.) Osbeck genome using Tandem Repeat Finder (TRF) and GMATA software. The minisatellite loci found were used to develop molecular markers that were tested in GMATA using electronic PCR (e-PCR). The obtained dataset includes sequences of extracted minisatellites and their characteristics (start and end nucleotide positions on the chromosome, length of monomer, number of repetitions and length of array), as well as sequences of developed primers, expected lengths of amplicons, and e-PCR results. The presented dataset can be used for the marking of lemon samples according to any of the three strategies. It provides a useful basis for lemon variety certification, identification of samples, verification of collections, lemon genome mapping, saturation of already created maps, studying of the lemon genome architecture, etc.
      Citation: Data
      PubDate: 2024-12-28
      DOI: 10.3390/data10010002
      Issue No: Vol. 10, No. 1 (2024)
       
  • Data, Vol. 10, Pages 3: Synthetic Dataset for Analyzing Geometry-Dependent
           Optical Properties of All-Pass Micro-Ring Resonators

    • Authors: Sebastian Valencia-Garzon, Esteban Gonzalez-Valencia, Nelson Gómez-Cardona, Andres Calvo-Salcedo, J. A. Jaramillo-Villegas, Jorge Montoya-Cardona, Erick Reyes-Vera
      First page: 3
      Abstract: This study focuses on the analysis of the spectral response of all-pass micro-ring resonators (MRRs), which are essential in photonic device applications such as telecommunications, sensing, and optical frequency comb generation. The aim of this work is to generate a synthetic dataset that explores the spectral characteristics of the expected transmission spectra of MRRs by varying their structural parameters. Using numerical simulations, the dataset will allow the optimization of MRR performance metrics such as free spectral range (FSR), full width at half maximum (FWHM), and quality factor (Q-factor). The results confirm that variations in geometric configurations can significantly affect MRR performance, and the dataset provides valuable insights into the optimization process. Furthermore, machine learning techniques can be applied to the dataset to automate and improve the design process, reducing simulation times and increasing accuracy. This work contributes to the development of photonic devices by providing a broad dataset for further analysis and optimization.
      Citation: Data
      PubDate: 2024-12-30
      DOI: 10.3390/data10010003
      Issue No: Vol. 10, No. 1 (2024)
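
The dependence of the performance metrics named in the abstract above on ring geometry can be sketched with two standard textbook approximations: FSR ≈ λ² / (n_g · L), with round-trip length L = 2πR, and Q ≈ λ_res / FWHM. The abstract does not state these formulas, and the wavelength, group index, radii, and linewidth below are assumed values for illustration, not entries from the dataset.

```python
import math

def free_spectral_range(wavelength_m, group_index, ring_radius_m):
    """Approximate FSR in metres: lambda^2 / (n_g * L), with L = 2*pi*R."""
    round_trip_length = 2 * math.pi * ring_radius_m
    return wavelength_m ** 2 / (group_index * round_trip_length)

def quality_factor(resonance_wavelength_m, fwhm_m):
    """Q-factor as resonance wavelength divided by the resonance linewidth (FWHM)."""
    return resonance_wavelength_m / fwhm_m

# Assumed values for illustration only.
wavelength = 1550e-9   # operating wavelength in metres
group_index = 4.2      # group index of the waveguide mode
fwhm = 0.1e-9          # resonance linewidth in metres

for radius_um in (5, 10, 20):
    fsr = free_spectral_range(wavelength, group_index, radius_um * 1e-6)
    print(f"R = {radius_um} um -> FSR = {fsr * 1e9:.2f} nm")

print(f"Q = {quality_factor(wavelength, fwhm):.0f}")
```

In this approximation the FSR scales inversely with the ring radius, which is one simple way that geometry affects the spectral response; a dataset of simulated spectra would capture further effects (coupling gap, waveguide width, loss) that this sketch ignores.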
       
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
 


 
