  Subjects -> SCIENCES: COMPREHENSIVE WORKS (Total: 374 journals)
Showing 1 - 200 of 265 Journals sorted alphabetically
Accountability in Research: Policies and Quality Assurance     Hybrid Journal   (Followers: 19)
Acta Nova     Open Access   (Followers: 2)
Acta Scientifica Malaysia     Open Access   (Followers: 1)
Acta Scientifica Naturalis     Open Access   (Followers: 4)
Adıyaman University Journal of Science     Open Access  
Advanced Science     Open Access   (Followers: 16)
Advanced Science, Engineering and Medicine     Partially Free   (Followers: 8)
Advanced Theory and Simulations     Hybrid Journal   (Followers: 5)
Advances in Research     Open Access  
Advances in Science and Technology     Full-text available via subscription   (Followers: 18)
African Journal of Science, Technology, Innovation and Development     Hybrid Journal   (Followers: 7)
Afrique Science : Revue Internationale des Sciences et Technologie     Open Access   (Followers: 1)
AFRREV STECH : An International Journal of Science and Technology     Open Access   (Followers: 3)
Alfarama Journal of Basic & Applied Sciences     Open Access   (Followers: 12)
American Academic & Scholarly Research Journal     Open Access   (Followers: 4)
American Journal of Applied Sciences     Open Access   (Followers: 22)
American Journal of Humanities and Social Sciences     Open Access   (Followers: 13)
Anales del Instituto de la Patagonia     Open Access  
Applied Mathematics and Nonlinear Sciences     Open Access   (Followers: 2)
Arab Journal of Basic and Applied Sciences     Open Access  
Arabian Journal for Science and Engineering     Hybrid Journal   (Followers: 1)
Archives Internationales d'Histoire des Sciences     Partially Free   (Followers: 5)
Archives of Current Research International     Open Access  
ARPHA Conference Abstracts     Open Access   (Followers: 1)
ARPHA Proceedings     Open Access  
Asian Journal of Advanced Research and Reports     Open Access  
Asian Journal of Scientific Research     Open Access   (Followers: 2)
Asian Journal of Technology Innovation     Hybrid Journal   (Followers: 5)
Australian Field Ornithology     Full-text available via subscription   (Followers: 1)
Australian Journal of Social Issues     Hybrid Journal   (Followers: 6)
Bangladesh Journal of Scientific Research     Open Access  
Beni-Suef University Journal of Basic and Applied Sciences     Open Access   (Followers: 1)
Berichte Zur Wissenschaftsgeschichte     Hybrid Journal   (Followers: 11)
Bilge International Journal of Science and Technology Research     Open Access  
Bioethics Research Notes     Full-text available via subscription   (Followers: 15)
BJHS Themes     Open Access   (Followers: 1)
Bulletin de la Société Royale des Sciences de Liège     Open Access  
Bulletin of the National Research Centre     Open Access  
Chain Reaction     Full-text available via subscription  
Ciencia Amazónica (Iquitos)     Open Access  
Ciencia en su PC     Open Access   (Followers: 1)
Ciencia Ergo Sum     Open Access  
Ciência ET Praxis     Open Access  
Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering     Open Access  
Comunicata Scientiae     Open Access  
Conference Papers in Science     Open Access  
Configurations     Full-text available via subscription   (Followers: 11)
COSMOS     Hybrid Journal   (Followers: 1)
Crea Ciencia Revista Científica     Open Access  
Current Issues in Criminal Justice     Hybrid Journal   (Followers: 14)
Current Research in Geoscience     Open Access   (Followers: 6)
Data     Open Access   (Followers: 4)
Dhaka University Journal of Science     Open Access  
Discover Sustainability     Open Access   (Followers: 5)
Einstein (São Paulo)     Open Access  
Ekaia : EHUko Zientzia eta Teknologia aldizkaria     Open Access  
Emergent Scientist     Open Access  
Enhancing Learning in the Social Sciences     Open Access   (Followers: 7)
Enseñanza de las Ciencias : Revista de Investigación y Experiencias Didácticas     Open Access  
Entramado     Open Access  
Entre Ciencia e Ingeniería     Open Access  
Epiphany     Open Access   (Followers: 1)
Ethiopian Journal of Education and Sciences     Open Access   (Followers: 5)
European Online Journal of Natural and Social Sciences     Open Access   (Followers: 4)
European Scientific Journal     Open Access   (Followers: 7)
Evidência - Ciência e Biotecnologia - Interdisciplinar     Open Access  
Exchanges : the Warwick Research Journal     Open Access   (Followers: 1)
Experimental Results     Open Access   (Followers: 2)
Fides et Ratio : Revista de Difusión Cultural y Científica     Open Access  
Fontanus     Open Access   (Followers: 1)
Forensic Science Policy & Management: An International Journal     Hybrid Journal   (Followers: 252)
Frontiers in Climate     Open Access   (Followers: 5)
Frontiers in Science     Open Access   (Followers: 1)
Fundamental Research     Open Access  
Futures & Foresight Science     Hybrid Journal   (Followers: 1)
Gaudium Sciendi     Open Access  
Ghana Studies     Full-text available via subscription   (Followers: 15)
Global Journal of Pure and Applied Sciences     Full-text available via subscription  
Globe, The     Full-text available via subscription   (Followers: 4)
HardwareX     Open Access  
Heidelberger Jahrbücher Online     Open Access  
Heliyon     Open Access   (Followers: 1)
History of Science and Technology     Open Access   (Followers: 6)
Hoosier Science Teacher     Open Access  
Indian Journal of History of Science     Hybrid Journal   (Followers: 3)
Instruments     Open Access  
Interciencia     Open Access  
International Annals of Science     Open Access  
International Journal of Advanced Multidisciplinary Research and Review     Open Access  
International Journal of Applied Science     Open Access  
International Journal of Engineering, Science and Technology     Open Access  
International Journal of Network Science     Hybrid Journal   (Followers: 3)
International Journal of Social Sciences and Management     Open Access   (Followers: 2)
International Journal of Technology Policy and Law     Hybrid Journal   (Followers: 10)
International Science and Technology Journal of Namibia     Open Access   (Followers: 2)
International Scientific and Vocational Studies Journal     Open Access  
Investiga : TEC     Open Access  
Investigación Joven     Open Access  
Investigacion y Ciencia     Open Access   (Followers: 1)
Iranian Journal of Science and Technology, Transactions A : Science     Hybrid Journal  
iScience     Open Access   (Followers: 2)
Issues in Science & Technology     Free   (Followers: 8)
Ithaca : Viaggio nella Scienza     Open Access  
J : Multidisciplinary Scientific Journal     Open Access  
Jaunujų mokslininkų darbai     Open Access   (Followers: 3)
Journal de la Recherche Scientifique de l'Universite de Lome     Full-text available via subscription  
Journal of Chromatography & Separation Techniques     Open Access   (Followers: 9)
Journal of Advanced Research     Open Access   (Followers: 2)
Journal of Analytical Science & Technology     Open Access   (Followers: 5)
Journal of Applied Science and Technology     Full-text available via subscription   (Followers: 1)
Journal of Applied Sciences and Environmental Management     Open Access   (Followers: 1)
Journal of Big History     Open Access   (Followers: 4)
Journal of Composites Science     Open Access   (Followers: 4)
Journal of Diversity Management     Open Access   (Followers: 4)
Journal of Indian Council of Philosophical Research     Hybrid Journal  
Journal of Institute of Science and Technology     Open Access  
Journal of King Saud University - Science     Open Access  
Journal of Mathematical and Fundamental Sciences     Open Access  
Journal of Negative and No Positive Results     Open Access  
Journal of Responsible Technology     Open Access  
Journal of Science and Technology     Open Access   (Followers: 2)
Journal of Science and Technology     Open Access   (Followers: 1)
Journal of Science and Technology (Ghana)     Open Access   (Followers: 3)
Journal of Science and Technology Policy Management     Hybrid Journal   (Followers: 1)
Journal of Science Foundation     Open Access   (Followers: 1)
Journal of Scientific Research and Reports     Open Access   (Followers: 1)
Journal of Shanghai Jiaotong University (Science)     Hybrid Journal  
Journal of Social Science Research     Open Access   (Followers: 2)
Journal of Taibah University for Science     Open Access  
Journal of the Ghana Science Association     Full-text available via subscription   (Followers: 3)
Journal of the History of Ideas     Full-text available via subscription   (Followers: 168)
Journal of the Indian Institute of Science     Hybrid Journal   (Followers: 4)
Journal of the Royal Society of New Zealand     Hybrid Journal   (Followers: 49)
Journal of the South Carolina Academy of Science     Open Access  
Journal of Unsolved Questions     Open Access  
Jurnal Sains Dasar     Open Access  
Jurnal Teknosains     Open Access  
Karaelmas Science and Engineering Journal     Open Access  
Karbala International Journal of Modern Science     Open Access  
Kennedy Institute of Ethics Journal     Full-text available via subscription   (Followers: 10)
Logo STI Science, Technology and Innovation     Open Access   (Followers: 14)
Malawi Journal of Science and Technology     Open Access   (Followers: 6)
Maskana     Open Access  
MethodsX     Open Access  
Mètode Science Studies Journal : Annual Review     Open Access  
Modern Applied Science     Open Access   (Followers: 1)
Momona Ethiopian Journal of Science     Open Access   (Followers: 5)
National Academy Science Letters     Hybrid Journal   (Followers: 3)
National Science Review     Hybrid Journal   (Followers: 1)
Natural Sciences     Open Access  
Natural Sciences Education     Hybrid Journal   (Followers: 1)
Naturen     Full-text available via subscription  
Nepal Journal of Science and Technology     Open Access  
Network Science     Hybrid Journal   (Followers: 4)
Nordic Journal of Science and Technology     Open Access   (Followers: 2)
Nordic Studies in Science Education     Open Access   (Followers: 3)
Nova     Open Access  
Open Conference Proceedings Journal     Open Access  
Open Journal of Applied Sciences     Open Access  
Orbis Cógnita : Revista Científica     Open Access   (Followers: 2)
Patterns     Open Access   (Followers: 9)
People and Nature     Open Access   (Followers: 4)
Población y Desarrollo - Argonautas y caminantes     Open Access  
Politique et Sociétés     Full-text available via subscription   (Followers: 1)
Portal de la Ciencia     Open Access  
Proceedings of the Indian National Science Academy     Full-text available via subscription   (Followers: 5)
Proceedings of the Linnean Society of New South Wales     Full-text available via subscription   (Followers: 2)
Proceedings of the Royal Society of Queensland, The     Full-text available via subscription  
QScience Connect     Open Access  
Quantum Science and Technology     Hybrid Journal   (Followers: 15)
Rafidain Journal of Science     Open Access  
Rehabilitation Research, Policy, and Education     Hybrid Journal   (Followers: 2)
Reportes Científicos de la FaCEN     Open Access  
Reports in Advances of Physical Sciences     Open Access  
Research Ideas and Outcomes     Open Access  
Research Integrity and Peer Review     Open Access  
Research Policy : X     Open Access   (Followers: 3)
Respuestas     Open Access  
Revista Bases de la Ciencia     Open Access  
Revista Cientifica Guillermo de Ockham     Open Access  
Revista Conhecimento Online     Open Access  
Revista Crítica de Ciências Sociais     Open Access  
Revista de Ciencia y Tecnología     Open Access  
Revista de la Academia Colombiana de Ciencias Exactas, Físicas y Naturales     Open Access  
Revista de la Universidad del Zulia     Open Access  
Revista Politécnica     Open Access  
Revista Tecnológica     Open Access  
Revista UniVap     Open Access  
SAINSTIS     Open Access  
Sainteknol : Jurnal Sains dan Teknologi     Open Access  
Sci     Open Access  
Science     Full-text available via subscription   (Followers: 5082)
Science & Diplomacy     Free   (Followers: 3)
Science Advances     Free   (Followers: 44)
Science and Technology     Open Access   (Followers: 2)
Science Heritage Journal     Open Access  
Science World Journal     Open Access  
Science, Technology and Arts Research Journal     Open Access   (Followers: 1)
ScienceRise     Open Access  
Sciences du jeu     Open Access  

Data
Number of Followers: 4  

  This is an Open Access journal
ISSN (Online) 2306-5729
Published by MDPI  [258 journals]
  • Data, Vol. 9, Pages 83: Leveraging Sports Analytics and Association Rule
           Mining to Uncover Recovery and Economic Impacts in NBA Basketball

    • Authors: Vangelis Sarlis, George Papageorgiou, Christos Tjortjis
      First page: 83
      Abstract: This study examines the multifaceted field of injuries and their impacts on performance in the National Basketball Association (NBA), leveraging a blend of Data Science, Data Mining, and Sports Analytics. Our research is driven by three pivotal questions: Firstly, we explore how Association Rule Mining can elucidate the complex interplay between players’ salaries, physical attributes, and health conditions and their influence on team performance, including team losses and recovery times. Secondly, we investigate the relationship between players’ recovery times and their teams’ financial performance, probing interdependencies with players’ salaries and career trajectories. Lastly, we examine how insights gleaned from Data Mining and Sports Analytics on player recovery times and financial influence can inform strategic financial management and salary negotiations in basketball. Harnessing extensive datasets detailing player demographics, injuries, and contracts, we employ advanced analytic techniques to categorize injuries and transform contract data into a format conducive to deep analytical scrutiny. Our anomaly detection methodologies, an ensemble combination of DBSCAN, isolation forest, and Z-score algorithms, spotlight patterns and outliers in recovery times, unveiling the intricate dance between player health, performance, and financial outcomes. This nuanced understanding emphasizes the economic stakes of sports injuries. The findings of this study provide a rich, data-driven foundation for teams and stakeholders, advocating for more effective injury management and strategic planning. By addressing these research questions, our work not only contributes to the academic discourse in Sports Analytics but also offers practical frameworks for enhancing player welfare and team financial health, thereby shaping the future of strategic decisions in professional sports.
      Citation: Data
      PubDate: 2024-06-24
      DOI: 10.3390/data9070083
      Issue No: Vol. 9, No. 7 (2024)
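
The abstract above names a Z-score step as one member of its anomaly detection ensemble. As a rough illustration of that single component only (not the authors' pipeline, and without the DBSCAN or isolation forest members), a minimal sketch might look like this. The function name `zscore_outliers`, the threshold, and the recovery-time values are all hypothetical.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return (index, value, z) for points whose |z| exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing can be an outlier
    flagged = []
    for i, v in enumerate(values):
        z = (v - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, v, z))
    return flagged


# Hypothetical recovery times in days; the last value is an obvious outlier.
recovery_days = [7, 10, 9, 12, 8, 11, 10, 9, 180]
flagged = zscore_outliers(recovery_days, threshold=2.0)
print(flagged)
```

With only nine points, a single extreme value cannot reach a Z-score of 3, which is why this sketch uses a lower threshold; a real analysis would choose the cutoff from the data.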
       
  • Data, Vol. 9, Pages 84: Gender Distribution of Scientific Prizes Is
           Associated with Naming of Awards after Men, Women or Neutral

    • Authors: Gehmlich, Krause
      First page: 84
      Abstract: Woman scientists have for long been under-represented as recipients of academic prizes. The reasons for this lack of recognition are manifold, including potential gender bias amongst award panels and nomination practices. This dataset of the gender distribution of 8747 recipients of 345 scientific medals and prizes awarded by 11 General Scientific Societies as well as subject-specific societies in the Earth and Environmental Sciences and in Cardiology between 1731 and 2021 explores the magnitude, temporal trends and potential drivers of observed gender imbalances. Our analysis revealed women were particularly underrepresented in awards named after men with awards not named after a person or named after a woman being more frequently awarded to woman scientists. Time-series analysis confirmed persisting trends that are only starting to change since the early 2000s, indicating that a lot remains to be accomplished to achieve true equity. We encourage the scientific community to extend our data and analysis, as they represent important evidence of the recognition of academic achievements towards other under-represented groups and including also nomination information.
      Citation: Data
      PubDate: 2024-06-25
      DOI: 10.3390/data9070084
      Issue No: Vol. 9, No. 7 (2024)
       
  • Data, Vol. 9, Pages 85: Evaluation of Online Inquiry Competencies of
           Chilean Elementary School Students: A Dataset

    • Authors: Chourio-Acevedo, González-Ibañez
      First page: 85
      Abstract: In the age of abundant digital content, children and adolescents face the challenge of developing new information literacy competencies, particularly those pertaining to online inquiry, in order to thrive academically and personally. This article addresses the challenge encountered by Chilean students in developing online inquiry competencies (OICs) essential for completing school assignments, particularly in natural science education. A diagnostic study was conducted with 279 elementary school students (from fourth to eighth grade) from four educational institutions in Chile, representing diverse socioeconomic backgrounds. An instrument aligned with the national curriculum, featuring questions related to natural sciences, was administered through a game named NEURONE-Trivia, which integrates a search engine and a logging component to record students’ search behavior. The primary outcome of this study is a dataset comprising demographic information, self-perception, and information-seeking behaviors data collected during students’ online search sessions for natural science research tasks. This dataset serves as a valuable resource for researchers, educators, and practitioners interested in investigating the interplay between demographic characteristics, self-perception, and information-seeking behaviors among elementary students within the context of OIC development. Furthermore, it enables further examination of students’ search behaviors concerning source evaluation, information retrieval, and information utilization.
      Citation: Data
      PubDate: 2024-06-25
      DOI: 10.3390/data9070085
      Issue No: Vol. 9, No. 7 (2024)
       
  • Data, Vol. 9, Pages 86: Tuning Data Mining Models to Predict Secondary
           School Academic Performance

    • Authors: William Hoyos, Isaac Caicedo-Castro
      First page: 86
      Abstract: In recent years, educational data mining has emerged as a growing discipline focused on developing models for predicting academic performance. The primary objective of this research was to tune classification models to predict academic performance in secondary school. The dataset employed for this study encompassed information from 19,545 high school students. We used descriptive statistics to characterise information contained in personal, school, and socioeconomic variables. We implemented two data mining techniques, namely artificial neural networks (ANN) and support vector machines (SVM). Parameter optimisation was conducted through five–fold cross–validation, and model performance was assessed using accuracy and F1–Score. The results indicate a functional dependence between predictor variables and academic performance. The algorithms demonstrated an average performance exceeding 80% accuracy. Notably, ANN outperformed SVM in the dataset analysed. This type of methodology could help educational institutions to predict academic underachievement and thus generate strategies to improve students’ academic performance.
      Citation: Data
      PubDate: 2024-06-26
      DOI: 10.3390/data9070086
      Issue No: Vol. 9, No. 7 (2024)
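
The abstract above evaluates models with five-fold cross-validation, accuracy, and F1-score. The sketch below shows how those three pieces can be computed in plain Python under simple assumptions: binary labels, a single positive class, and contiguous folds without shuffling. The helper names and the toy labels are hypothetical and do not come from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds of (train, test) index lists."""
    indices = list(range(n))
    folds = []
    start = 0
    for f in range(k):
        # Spread any remainder over the first n % k folds.
        size = n // k + (1 if f < n % k else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


# Toy labels, purely illustrative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
folds = k_fold_indices(len(y_true), k=5)
print(acc, f1, [test for _, test in folds])
```

In practice each fold's training indices would be used to fit a model and the test indices to score it, with the scores averaged across the five folds.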
       
  • Data, Vol. 9, Pages 87: A Point Cloud Dataset of Vehicles Passing through
           a Toll Station for Use in Training Classification Algorithms

    • Authors: Alexander Campo-Ramírez, Eduardo F. Caicedo-Bravo, Eval B. Bacca-Cortes
      First page: 87
      Abstract: This work presents a point cloud dataset of vehicles passing through a toll station in Colombia to be used to train artificial vision and computational intelligence algorithms. This article details the process of creating the dataset, covering initial data acquisition, range information preprocessing, point cloud validation, and vehicle labeling. Additionally, a detailed description of the structure and content of the dataset is provided, along with some potential applications of its use. The dataset consists of 36,026 total objects divided into 6 classes: 31,432 cars, campers, vans and 2-axle trucks with a single tire on the rear axle, 452 minibuses with a single tire on the rear axle, 1158 buses, 1179 2-axle small trucks, 797 2-axle large trucks, and 1008 trucks with 3 or more axles. The point clouds were captured using a LiDAR sensor and Doppler effect speed sensors. The dataset can be used to train and evaluate algorithms for range data processing, vehicle classification, vehicle counting, and traffic flow analysis. The dataset can also be used to develop new applications for intelligent transportation systems.
      Citation: Data
      PubDate: 2024-06-27
      DOI: 10.3390/data9070087
      Issue No: Vol. 9, No. 7 (2024)
       
  • Data, Vol. 9, Pages 88: Multi-Scale Earthquake Damaged Building Feature
           Set

    • Authors: Guorui Gao, Futao Wang, Zhenqing Wang, Qing Zhao, Litao Wang, Jinfeng Zhu, Wenliang Liu, Gang Qin, Yanfang Hou
      First page: 88
      Abstract: Earthquake disasters are marked by their unpredictability and potential for extreme destructiveness. Accurate information on building damage, captured in post-earthquake remote sensing images, is critical for an effective post-disaster emergency response. The foundational features within these images are essential for the accurate extraction of building damage data following seismic events. Presently, the availability of publicly accessible datasets tailored specifically to earthquake-damaged buildings is limited, and existing collections of post-earthquake building damage characteristics are insufficient. To address this gap and foster research advancement in this domain, this paper introduces a new, large-scale, publicly available dataset named the Major Earthquake Damage Building Feature Set (MEDBFS). This dataset comprises image data sourced from five significant global earthquakes and captured by various optical remote sensing satellites, featuring diverse scale characteristics and multiple spatial resolutions. It includes over 7000 images of buildings pre- and post-disaster, each subjected to stringent quality control and expert validation. The images are categorized into three primary groups: intact/slightly damaged, severely damaged, and completely collapsed. This paper develops a comprehensive feature set encompassing five dimensions: spectral, texture, edge detection, building index, and temporal sequencing, resulting in 16 distinct classes of feature images. This dataset is poised to significantly enhance the capabilities for data-driven identification and analysis of earthquake-induced building damage, thereby supporting the advancement of scientific and technological efforts for emergency earthquake response.
      Citation: Data
      PubDate: 2024-06-28
      DOI: 10.3390/data9070088
      Issue No: Vol. 9, No. 7 (2024)
       
  • Data, Vol. 9, Pages 89: Literature-Based Inventory of Chemical Substance
           Concentrations Measured in Organic Food Consumed in Europe

    • Authors: Joanna Choueiri, Pascal Petit, Franck Balducci, Dominique J. Bicout, Christine Demeilliers
      First page: 89
      Abstract: Populations are exposed daily to numerous environmental pollutants, particularly through food. To address environmental issues, many agricultural production methods have been developed, including organic farming. To date, there is no exhaustive inventory of the contamination of organic foods as there is for conventional foods. The main objective of this work was to construct a growing and updatable database on chemical substances and their levels in organic foods consumed in Europe. To this end, a literature search was conducted, resulting in a total of 1207 concentration values from 823 food–substances pairs involving 166 food matrices and 209 chemical substances, among which 95% were not authorized in organic farming and 80% were pesticides. The most encountered substance groups are “inorganic contaminants” and “organophosphate”, and the most studied food groups are “fruit used as fruit” and “Cereals and cereal primary derivatives”. Further studies are needed to continue updating the database with robust and comprehensive data on organic food contamination. This database could be used to study the health risks associated with these contaminants.
      Citation: Data
      PubDate: 2024-07-03
      DOI: 10.3390/data9070089
      Issue No: Vol. 9, No. 7 (2024)
       
  • Data, Vol. 9, Pages 74: CVs Classification Using Neural Network Approaches
           Combined with BERT and Gensim: CVs of Moroccan Engineering Students

    • Authors: Aniss Qostal, Aniss Moumen, Younes Lakhrissi
      First page: 74
      Abstract: Deep learning (DL)-oriented document processing is widely used in different fields for extraction, recognition, and classification processes from raw corpus of data. The article examines the application of deep learning approaches, based on different neural network methods, including Gated Recurrent Unit (GRU), long short-term memory (LSTM), and convolutional neural networks (CNNs). The compared models were combined with two different word embedding techniques, namely: Bidirectional Encoder Representations from Transformers (BERT) and Gensim Word2Vec. The models are designed to evaluate the performance of architectures based on neural network techniques for the classification of CVs of Moroccan engineering students at ENSAK (National School of Applied Sciences of Kenitra, Ibn Tofail University). The used dataset included CVs collected from engineering students at ENSAK in 2023 for a project on the employability of Moroccan engineers in which new approaches were applied, especially machine learning, deep learning, and big data. Accordingly, 867 resumes were collected from five specialties of study (Electrical Engineering (ELE), Networks and Systems Telecommunications (NST), Computer Engineering (CE), Automotive Mechatronics Engineering (AutoMec), Industrial Engineering (Indus)). The results showed that the proposed models based on the BERT embedding approach had more accuracy compared to models based on the Gensim Word2Vec embedding approach. Accordingly, the CNN-GRU/BERT model achieved slightly better accuracy with 0.9351 compared to other hybrid models. On the other hand, single learning models also have good metrics, especially based on BERT embedding architectures, where CNN has the best accuracy with 0.9188.
      Citation: Data
      PubDate: 2024-05-24
      DOI: 10.3390/data9060074
      Issue No: Vol. 9, No. 6 (2024)
       
  • Data, Vol. 9, Pages 75: De-Anonymizing Users across Rating Datasets via
           Record Linkage and Quasi-Identifier Attacks

    • Authors: Nicolás Torres, Patricio Olivares
      First page: 75
      Abstract: The widespread availability of pseudonymized user datasets has enabled personalized recommendation systems. However, recent studies have shown that users can be de-anonymized by exploiting the uniqueness of their data patterns, raising significant privacy concerns. This paper presents a novel approach that tackles the challenging task of linking user identities across multiple rating datasets from diverse domains, such as movies, books, and music, by leveraging the consistency of users’ rating patterns as high-dimensional quasi-identifiers. The proposed method combines probabilistic record linkage techniques with quasi-identifier attacks, employing the Fellegi–Sunter model to compute the likelihood of two records referring to the same user based on the similarity of their rating vectors. Through extensive experiments on three publicly available rating datasets, we demonstrate the effectiveness of the proposed approach in achieving high precision and recall in cross-dataset de-anonymization tasks, outperforming existing techniques, with F1-scores ranging from 0.72 to 0.79 for pairwise de-anonymization tasks. The novelty of this research lies in the unique integration of record linkage techniques with quasi-identifier attacks, enabling the effective exploitation of the uniqueness of rating patterns as high-dimensional quasi-identifiers to link user identities across diverse datasets, addressing a limitation of existing methodologies. We thoroughly investigate the impact of various factors, including similarity metrics, dataset combinations, data sparsity, and user demographics, on the de-anonymization performance. This work highlights the potential privacy risks associated with the release of anonymized user data across diverse contexts and underscores the critical need for stronger anonymization techniques and tailored privacy-preserving mechanisms for rating datasets and recommender systems.
      Citation: Data
      PubDate: 2024-05-27
      DOI: 10.3390/data9060075
      Issue No: Vol. 9, No. 6 (2024)
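
The abstract above relies on the Fellegi–Sunter model to score whether two records refer to the same user. A minimal sketch of the classic match-weight calculation follows, assuming binary agree/disagree comparisons per field and known m- and u-probabilities. The paper works with similarities of rating vectors, so this is a simplified stand-in rather than the authors' method; the function name and the probabilities are hypothetical.

```python
import math


def match_weight(agreements, m_probs, u_probs):
    """Sum of log2 likelihood ratios over the compared fields.

    m = P(field agrees | records are a true match)
    u = P(field agrees | records are not a match)
    """
    total = 0.0
    for agree, m, u in zip(agreements, m_probs, u_probs):
        if agree:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


# Hypothetical probabilities for two compared fields.
m_probs = [0.9, 0.8]
u_probs = [0.1, 0.2]

w_all_agree = match_weight([True, True], m_probs, u_probs)
w_none_agree = match_weight([False, False], m_probs, u_probs)
print(w_all_agree, w_none_agree)
```

A positive total weight favours a match and a negative one favours a non-match; the decision thresholds between "match", "possible match", and "non-match" would have to be chosen for the data at hand.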
       
  • Data, Vol. 9, Pages 76: The China Historical Christian Database: A Dataset
           Quantifying Christianity in China from 1550 to 1950

    • Authors: Alex Mayfield, Margaret Frei, Daryl Ireland, Eugenio Menegon
      First page: 76
      Abstract: The era of digitization is revolutionizing traditional humanities research, presenting both novel methodologies and challenges. This field harnesses quantitative techniques to yield groundbreaking insights, contingent upon comprehensive datasets on historical subjects. The China Historical Christian Database (CHCD) exemplifies this trend, furnishing researchers with a rich repository of historical, relational, and geographical data about Christianity in China from 1550 to 1950. The study of Christianity in China confronts formidable obstacles, including the mobility of historical agents, fluctuating relational networks, and linguistic disparities among scattered sources. The CHCD addresses these challenges by curating an open-access database built in neo4j that records information about Christian institutions in China and those that worked inside of them. Drawing on historical sources, the CHCD contains temporal, relational, and geographic data. The database currently has over 40,000 nodes and 200,000 relationships, and continues to grow. Beyond its utility for religious studies, the CHCD encompasses broader interdisciplinary inquiries including social network analysis, geospatial visualization, and economic modeling. This article introduces the CHCD’s structure, and explains the data collection and curation process.
      Citation: Data
      PubDate: 2024-05-29
      DOI: 10.3390/data9060076
      Issue No: Vol. 9, No. 6 (2024)
       
  • Data, Vol. 9, Pages 77: Data on Stark Broadening of N VI Spectral Lines

    • Authors: Milan S. Dimitrijević, Magdalena D. Christova, Sylvie Sahal-Bréchot
      First page: 77
      Abstract: Data on Stark broadening parameters, spectral line widths, and shifts for 15 multiplets of N VI, whose spectral lines are broadened by collisions with electrons, protons, alpha particles (He III) and B III, B IV, B V and B VI ions, are presented. They have been calculated using the semiclassical perturbation theory, for temperatures from 50,000 K to 2,000,000 K, and perturber densities from 10¹⁶ cm⁻³ up to 10²⁴ cm⁻³. The data for e, p and He III are of particular interest for the analysis and modelling of atmospheres of hot and dense stars, as, e.g., white dwarfs, and for investigation of their spectra, and data for boron ions are used for analysis and modelling of laser-driven plasma in proton–boron fusion research.
      Citation: Data
      PubDate: 2024-05-29
      DOI: 10.3390/data9060077
      Issue No: Vol. 9, No. 6 (2024)
       
  • Data, Vol. 9, Pages 78: In Vivo and In Vitro Electrochemical Impedance
           Spectroscopy of Acute and Chronic Intracranial Electrodes

    • Authors: Kyle P. O’Sullivan, Brian J. Philip, Jonathan L. Baker, John D. Rolston, Mark E. Orazem, Kevin J. Otto, Christopher R. Butson
      First page: 78
      Abstract: Invasive intracranial electrodes are used in both clinical and research applications for recording and stimulation of brain tissue, providing essential data in acute and chronic contexts. The impedance characteristics of the electrode–tissue interface (ETI) evolve over time and can change dramatically relative to pre-implantation baseline. Understanding how ETI properties contribute to the recording and stimulation characteristics of an electrode can provide valuable insights for users who often do not have access to complex impedance characterizations of their devices. In contrast to the typical method of characterizing electrical impedance at a single frequency, we demonstrate a method for using electrochemical impedance spectroscopy (EIS) to investigate complex characteristics of the ETI of several commonly used acute and chronic electrodes. We also describe precise modeling strategies for verifying the accuracy of our instrumentation and understanding device–solution interactions, both in vivo and in vitro. Included with this publication is a dataset containing both in vitro and in vivo device characterizations, as well as some examples of modeling and error structure analysis results. These data can be used for more detailed interpretation of neural recordings performed on common electrode types, providing a more complete picture of their properties than is often available to users.
      Citation: Data
      PubDate: 2024-06-06
      DOI: 10.3390/data9060078
      Issue No: Vol. 9, No. 6 (2024)
       
  • Data, Vol. 9, Pages 79: CrazyPAD: A Dataset for Assessing the Impact of
           Structural Defects on Nano-Quadcopter Performance

    • Authors: Kamil Masalimov, Tagir Muslimov, Evgeny Kozlov, Rustem Munasypov
      First page: 79
      Abstract: This article presents a novel dataset focused on structural damage in quadcopters, addressing a significant gap in unmanned aerial vehicle (UAV or drone) research. The dataset is called CrazyPAD (Crazyflie Propeller Anomaly Data), after the Crazyflie 2.1 nano-quadcopter used to collect the data. Despite the existence of datasets on UAV anomalies and behavior, none of them covers structural damage specifically in nano-quadcopters. Our dataset, therefore, provides critical data for developing predictive models for defect detection in nano-quadcopters. This work details the data collection methodology, involving rigorous simulations of structural damages and their effects on UAV performance. The ultimate goal is to enhance UAV safety by enabling accurate defect diagnosis and predictive maintenance, contributing substantially to the field of UAV technology and its practical applications.
      Citation: Data
      PubDate: 2024-06-13
      DOI: 10.3390/data9060079
      Issue No: Vol. 9, No. 6 (2024)
       
  • Data, Vol. 9, Pages 80: Data for Optimal Estimation of Under-Frequency
           Load Shedding Scheme Parameters by Considering Virtual Inertia Injection

    • Authors: Santiago Bustamante-Mesa, Jorge W. Gonzalez-Sanchez, Sergio D. Saldarriaga-Zuluaga, Jesús M. López-Lezama, Nicolás Muñoz-Galeano
      First page: 80
      Abstract: The data presented in this paper are related to the paper entitled “Optimal Estimation of Under-Frequency Load Shedding Scheme Parameters by Considering Virtual Inertia Injection”, available in the Energies journal. Here, data are included to show the results of an Under-Frequency Load Shedding (UFLS) scheme that considers the injection of virtual inertia by a VSC-HVDC link. The data obtained in six cases which were considered and analyzed are shown. In this paper, each case represents a different frequency response configuration in the event of generation loss, taking into account the presence or absence of a VSC-HVDC link, traditional and optimized UFLS schemes, as well as the injection of virtual inertia by the VSC-HVDC link. Data for each example contain the state of the relay, threshold, position in every delay, load shed, and relay configuration parameters. Data were obtained through Digsilent Power Factory and Python simulations. The purpose of this dataset is so that other researchers can reproduce the results reported in our paper.
      Citation: Data
      PubDate: 2024-06-13
      DOI: 10.3390/data9060080
      Issue No: Vol. 9, No. 6 (2024)
       
  • Data, Vol. 9, Pages 81: Beyond the Classroom: An Analysis of Internal and
           External Factors Related to Students’ Love of Learning and
           Educational Outcomes

    • Authors: Charles M. Burke, Lori P. Montross, Vera G. Dianova
      First page: 81
      Abstract: This study explores the multifaceted factors influencing student learning motivations and educational outcomes. Utilizing a diverse student body from Franklin University Switzerland, the study emphasizes the impact of internal factors, such as the psychological state of flow and a self-reported love of learning, alongside GPA and student cohort influences like year of study, academic discipline, country of origin, and academic travel. Through a cross-sectional survey of 112 students, the study evaluates how these factors correlate with and diverge from each other and student GPAs, aiming to dissect the influences of intrinsic motivations, demographic variables, and educational experiences. Our analysis revealed significant correlations between students’ self-reported love of learning, experiences of flow, and academic performance. Conversely, academic travel did not show a significant direct impact, suggesting that while such experiences are enriching, they do not necessarily translate into a greater love of learning, flow, or higher academic achievement in the short term. However, demographic factors, particularly discipline of study and country of origin, significantly influenced the students’ love of learning, indicating varied motivational drives across different cultural and educational backgrounds. This study provides valuable insights for educational policymakers and institutions aiming to cultivate more engaging and fulfilling learning environments.
      Citation: Data
      PubDate: 2024-06-16
      DOI: 10.3390/data9060081
      Issue No: Vol. 9, No. 6 (2024)
       
  • Data, Vol. 9, Pages 82: Hardware Trojan Dataset of RISC-V and Web3
           Generated with ChatGPT-4

    • Authors: Victor Takashi Hayashi, Wilson Vicente Ruggiero
      First page: 82
      Abstract: Although hardware trojans pose a relevant threat to the hardware security of RISC-V and Web3 applications, existing datasets have a limited set of examples, as the most famous hardware trojan dataset, TrustHub, has 106 different trojans. RISC-V specifically has case studies of three and four different hardware trojans, and no research was found regarding Web3 hardware trojans in modules such as a hardware wallet. This research presents a dataset of 290 Verilog examples generated with the ChatGPT-4 Large Language Model (LLM) based on 29 golden models and the TrustHub taxonomy. It is expected that this dataset will support future research on defense mechanisms against hardware trojans in RISC-V, hardware wallets, and hardware Proof of Work (PoW) miners.
      Citation: Data
      PubDate: 2024-06-19
      DOI: 10.3390/data9060082
      Issue No: Vol. 9, No. 6 (2024)
       
  • Data, Vol. 9, Pages 61: Training Datasets for Epilepsy Analysis:
           Preprocessing and Feature Extraction from Electroencephalography Time
           Series

    • Authors: Christian Riccio, Angelo Martone, Gaetano Zazzaro, Luigi Pavone
      First page: 61
      Abstract: We describe 20 datasets derived through signal filtering and feature extraction steps applied to the raw time series EEG data of 20 epileptic patients, as well as the methods we used to derive them. Background: Epilepsy is a complex neurological disorder which has seizures as its hallmark. Electroencephalography plays a crucial role in epilepsy assessment, offering insights into the brain’s electrical activity and advancing our understanding of seizures. The availability of tagged training sets covering all seizure phases—inter-ictal, pre-ictal, ictal, and post-ictal—is crucial for data-driven epilepsy analyses. Methods: Using the sliding window technique with a two-second window length and a one-second time slip, we extract multiple features from the preprocessed EEG time series of 20 patients from the Freiburg Seizure Prediction Database. In addition, we assign a class label to each instance to specify its corresponding seizure phase. All these operations are made through a software application we developed, which is named Training Builder. Results: The 20 tagged training datasets each contain 1080 univariate and bivariate features, and are openly and publicly available. Conclusions: The datasets support the training of data-driven models for seizure detection, prediction, and clustering, based on features engineering.
      Citation: Data
      PubDate: 2024-04-26
      DOI: 10.3390/data9050061
      Issue No: Vol. 9, No. 5 (2024)
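
The sliding-window step described in the abstract above (a two-second window advanced in one-second steps) might be sketched roughly as follows. This is only an illustration, not the authors' Training Builder code: the function names, the sampling rate `fs`, and the toy mean feature are all assumptions introduced here.

```python
from typing import List, Sequence


def sliding_windows(signal: Sequence[float], fs: int,
                    window_s: float = 2.0,
                    step_s: float = 1.0) -> List[Sequence[float]]:
    """Split a 1-D signal into overlapping windows.

    fs is the sampling rate in Hz (an assumed parameter; the abstract
    does not state it). Windows that would run past the end of the
    signal are dropped.
    """
    win = int(round(window_s * fs))    # samples per window
    step = int(round(step_s * fs))     # samples between window starts
    if win <= 0 or step <= 0:
        raise ValueError("window and step must span at least one sample")
    return [signal[start:start + win]
            for start in range(0, len(signal) - win + 1, step)]


def mean_feature(window: Sequence[float]) -> float:
    """Toy univariate feature (hypothetical): the window mean."""
    return sum(window) / len(window)


if __name__ == "__main__":
    fs = 4                              # hypothetical sampling rate
    eeg = [float(i) for i in range(40)]  # 10 s of synthetic samples
    windows = sliding_windows(eeg, fs)
    features = [mean_feature(w) for w in windows]
    print(len(windows), features[0])
```

With a one-second step and a two-second window, consecutive windows overlap by half, so each instance shares one second of signal with its neighbour.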
       
  • Data, Vol. 9, Pages 62: Stimulated Microcontroller Dataset for New IoT
           Device Identification Schemes through On-Chip Sensor Monitoring

    • Authors: Alberto Ramos, Honorio Martín, Carmen Cámara, Pedro Peris-Lopez
      First page: 62
      Abstract: Legitimate identification of devices is crucial to ensure the security of present and future IoT ecosystems. In this regard, AI-based systems that exploit intrinsic hardware variations have gained notable relevance. Within this context, on-chip sensors included for monitoring purposes in a wide range of SoCs remain almost unexplored, despite their potential as a valuable source of both information and variability. In this work, we introduce and release a dataset comprising data collected from the on-chip temperature and voltage sensors of 20 microcontroller-based boards from the STM32L family. These boards were stimulated with five different algorithms, as workloads to elicit diverse responses. The dataset consists of five acquisitions (1.3 billion readouts) that are spaced over time and were obtained under different configurations using an automated platform. The raw dataset is publicly available, along with metadata and scripts developed to generate pre-processed T–V sequence sets. Finally, a proof of concept consisting of training a simple model is presented to demonstrate the feasibility of the identification system based on these data.
      Citation: Data
      PubDate: 2024-04-28
      DOI: 10.3390/data9050062
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 63: Detailed Landslide Traces Database of Hancheng
           County, China, Based on High-Resolution Satellite Images Available on the
           Google Earth Platform

    • Authors: Zhao, Xu, Huang
      First page: 63
      Abstract: Hancheng is located in the eastern part of China’s Shaanxi Province, near the west bank of the Yellow River. It is located at the junction of the active geological structure area. The rock layer is relatively fragmented, and landslide disasters are frequent. The occurrence of landslide disasters often causes a large number of casualties along with economic losses in the local area, seriously restricting local economic development. Although risk assessment and deformation mechanism analysis for single landslides have been performed for landslide disasters in the Hancheng area, this area lacks a landslide traces database. A complete landslide database comprises the basic data required for the study of landslide disasters and is an important requirement for subsequent landslide-related research. Therefore, this study used multi-temporal high-resolution optical images and human-computer interaction visual interpretation methods of the Google Earth platform to construct a landslide traces database in Hancheng County. The results showed that at least 6785 landslides had occurred in the study area. The total area of the landslides was about 95.38 km², accounting for 5.88% of the study area. The average landslide area was 1406.04 m², the largest landslide area was 377,841 m², and the smallest landslide area was 202.96 m². The results of this study provide an important basis for understanding the spatial distribution of landslides in Hancheng County, the evaluation of landslide susceptibility, and local disaster prevention and mitigation work.
      Citation: Data
      PubDate: 2024-04-29
      DOI: 10.3390/data9050063
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 64: A Comprehensive Dataset of the Aerodynamic and
           Geometric Coefficients of Airfoils in the Public Domain

    • Authors: Kanak Agarwal, Vedant Vijaykrishnan, Dyutit Mohanty, Manikandan Murugaiah
      First page: 64
      Abstract: This study presents an extensive collection of data on the aerodynamic behavior at a low Reynolds number and geometric coefficients for 2900 airfoils obtained through the class shape transformation (CST) method. By employing a verified OpenFOAM-based CFD simulation framework, lift and drag coefficients were determined at a Reynolds number of 10⁵. Considering the limited availability of data on low Reynolds number airfoils, this dataset is invaluable for a wide range of applications, including unmanned aerial vehicles (UAVs) and wind turbines. Additionally, the study offers a method for automating CFD simulations that could be applied to obtain aerodynamic coefficients at higher Reynolds numbers. The breadth of this dataset also supports the enhancement and creation of machine learning (ML) models, further advancing research into the aerodynamics of airfoils and lifting surfaces.
      Citation: Data
      PubDate: 2024-04-30
      DOI: 10.3390/data9050064
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 65: Spectral Library of Plant Species from Montesinho
           Natural Park in Portugal

    • Authors: Isabel Pôças, Cátia Rodrigues de Almeida, Salvador Arenas-Castro, João C. Campos, Nuno Garcia, João Alírio, Neftalí Sillero, Ana C. Teodoro
      First page: 65
      Abstract: In this work, we present and describe a spectral library (SL) with 15 vascular plant species from Montesinho Natural Park (MNP), a protected area in Northeast Portugal. We selected species from the vascular plants that are characteristic of the habitats in the MNP, based on their prevalence, and also included one invasive species: Alnus glutinosa (L.) Gaertn, Castanea sativa Mill., Cistus ladanifer L., Crataegus monogyna Jacq., Frangula alnus Mill., Fraxinus angustifolia Vahl, Quercus pyrenaica Willd., Quercus rotundifolia Lam., Trifolium repens L., Arbutus unedo L., Dactylis glomerata L., Genista falcata Brot., Cytisus multiflorus (L’Hér.) Sweet, Erica arborea L., and Acacia dealbata Link. We collected spectra (300–2500 nm) from five records per leaf and leaf side, which resulted in 538 spectra compiled in the SL. Additionally, we computed five vegetation indices from spectral data and analysed them to highlight specific characteristics and differences among the sampled species. We detail the data repository information and its organisation for a better understanding of the data and to facilitate its use. The SL structure can add valuable information about the selected plant species in MNP, contributing to conservation purposes. This plant species SL is publicly available on the Zenodo platform.
      Citation: Data
      PubDate: 2024-04-30
      DOI: 10.3390/data9050065
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 66: A Series Production Data Set for Five-Axis CNC
           Milling

    • Authors: Anna-Maria Schmitt, Bastian Engelmann
      First page: 66
      Abstract: The described data set contains features from the machine control of a five-axis milling machine. The features were recorded during thirteen series productions. Each series production includes a changeover process in which the machine was set up for the production of a different product. In addition to the timestamps and the twenty recorded features derived from Numerical Control (NC) variables, the data set also contains labels for the different production phases. For this purpose, up to 23 phases were assigned, which are based on a generalized milling process. The data set consists of thirteen .csv files, each representing a series production. The data set was recorded in a production company in the contract manufacturing sector for components with real series orders in ongoing industrial production.
      Citation: Data
      PubDate: 2024-04-30
      DOI: 10.3390/data9050066
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 67: Unveiling University Groupings: A Clustering
           Analysis for Academic Rankings

    • Authors: George Matlis, Nikos Dimokas, Petros Karvelis
      First page: 67
      Abstract: The evaluation and ranking of educational institutions are of paramount importance to a wide range of stakeholders, including students, faculty members, funding organizations, and the institutions themselves. Traditional ranking systems, such as those provided by QS, ARWU, and THE, have offered valuable insights into university performance by employing a variety of indicators to reflect institutional excellence across research, teaching, international outlook, and more. However, these linear rankings may not fully capture the multifaceted nature of university performance. This study introduces a novel clustering analysis that complements existing rankings by grouping universities with similar characteristics, providing a multidimensional perspective on global higher education landscapes. Utilizing a range of clustering algorithms—K-Means, GMM, Agglomerative, and Fuzzy C-Means—and incorporating both traditional and unique indicators, our approach seeks to highlight the commonalities and shared strengths within clusters of universities. This analysis does not aim to supplant existing ranking systems but to augment them by offering stakeholders an alternative lens through which to view and assess university performance. By focusing on group similarities rather than ordinal positions, our method encourages a more nuanced understanding of institutional excellence and facilitates peer learning among universities with similar profiles. While acknowledging the limitations inherent in any methodological approach, including the selection of indicators and clustering algorithms, this study underscores the value of complementary analyses in enriching our understanding of higher educational institutions’ performance.
      Citation: Data
      PubDate: 2024-05-11
      DOI: 10.3390/data9050067
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 68: EEG and Physiological Signals Dataset from
           Participants during Traditional and Partially Immersive Learning
           Experiences in Humanities

    • Authors: Rebeca Romo-De León, Mei Li L. Cham-Pérez, Verónica Andrea Elizondo-Villegas, Alejandro Villarreal-Villarreal, Alexandro Antonio Ortiz-Espinoza, Carol Stefany Vélez-Saboyá, Jorge de Jesús Lozoya-Santos, Manuel Cebral-Loureda, Mauricio A. Ramírez-Moreno
      First page: 68
      Abstract: The relevance of the interaction between Humanities-enhanced learning using immersive environments and simultaneous physiological signal analysis contributes to the development of Neurohumanities and advancements in applications of Digital Humanities. The present dataset consists of recordings from 24 participants divided in two groups (12 participants in each group) engaging in simulated learning scenarios, traditional learning, and partially immersive learning experiences. Data recordings from each participant contain recordings of physiological signals and psychometric data collected from applied questionnaires. Physiological signals include electroencephalography, real-time engagement and emotion recognition calculation by a Python EEG acquisition code, head acceleration, electrodermal activity, blood volume pressure, inter-beat interval, and temperature. Before the acquisition of physiological signals, participants were asked to fill out the General Health Questionnaire and Trait Meta-Mood Scale. In between recording sessions, participants were asked to fill out Likert-scale questionnaires regarding their experience and a Self-Assessment Manikin. At the end of the recording session, participants filled out the ITC Sense of Presence Inventory questionnaire for user experience. The dataset can be used to explore differences in physiological patterns observed between different learning modalities in the Humanities.
      Citation: Data
      PubDate: 2024-05-15
      DOI: 10.3390/data9050068
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 69: Review of Data Processing Methods Used in
           Predictive Maintenance for Next Generation Heavy Machinery

    • Authors: Ietezaz Ul Hassan, Krishna Panduru, Joseph Walsh
      First page: 69
      Abstract: Vibration-based condition monitoring plays an important role in maintaining reliable and effective heavy machinery in various sectors. Heavy machinery involves major investments and is frequently subjected to extreme operating conditions. Therefore, prompt fault identification and preventive maintenance are important for reducing costly breakdowns and maintaining operational safety. In this review, we look at different methods of vibration data processing in the context of vibration-based condition monitoring for heavy machinery. We divided primary approaches related to vibration data processing into three categories: signal processing methods, preprocessing-based techniques and artificial intelligence-based methods. We highlight the importance of these methods in improving the reliability and effectiveness of heavy machinery condition monitoring systems, and the need for precise and automated fault detection systems. To improve machinery performance and operational efficiency, this review aims to provide information on current developments and future directions in vibration-based condition monitoring by addressing issues like imbalanced data and integrating cutting-edge techniques like anomaly detection algorithms.
      Citation: Data
      PubDate: 2024-05-15
      DOI: 10.3390/data9050069
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 70: Continuous Wave Measurements Collected in
           Intermediate Depth throughout the North Sea Storm Season during the
           RealDune/REFLEX Experiments

    • Authors: Jantien Rutten, Marion Tissier, Paul van Wiechen, Xinyi Zhang, Sierd de Vries, Ad Reniers, Jan-Willem Mol
      First page: 70
      Abstract: High-resolution wave measurements at intermediate water depth are required to improve coastal impact modeling. Specifically, such data sets are desired to calibrate and validate models, and broaden the insight on the boundary conditions that force models. Here, we present a wave data set collected in the North Sea at three stations in intermediate water depth (6–14 m) during the 2021/2022 storm season as part of the RealDune/REFLEX experiments. Continuous measurements of synchronized surface elevation, velocity and pressure were recorded at 2–4 Hz by Acoustic Doppler Profilers and an Acoustic Doppler Velocimeter for a 5-month duration. Time series were quality-controlled, directional-frequency energy spectra were calculated and common bulk parameters were derived. Measured wave conditions vary from calm to energetic with 0.1–5.0 m sea-swell wave height, 5–16 s mean wave period and W-NNW direction. Nine storms, i.e., wave height beyond 2.5 m for at least six hours, were recorded including the triple storms Dudley, Eunice and Franklin. This unique data set can be used to investigate wave transformation, wave nonlinearity and wave directionality for higher and lower frequencies (e.g., sea-swell and infragravity waves) to compare with theoretical and empirical descriptions. Furthermore, the data can serve to force, calibrate and validate models during storm conditions.
      Citation: Data
      PubDate: 2024-05-17
      DOI: 10.3390/data9050070
      Issue No: Vol. 9, No. 5 (2024)
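
The storm criterion quoted in the abstract above (wave height beyond 2.5 m for at least six hours) could be applied to a wave-height time series along these lines. This is a minimal sketch under stated assumptions, not the authors' processing code: the function name, the evenly spaced sampling interval `dt_hours`, and the sample values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def find_storms(heights: Sequence[float], dt_hours: float,
                threshold: float = 2.5,
                min_duration_h: float = 6.0) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of storm events.

    A storm is taken to be a run of consecutive samples above
    `threshold` metres whose total length is at least `min_duration_h`
    hours. Samples are assumed to be evenly spaced, `dt_hours` apart.
    """
    storms: List[Tuple[int, int]] = []
    run_start = None
    # A sentinel below any threshold closes a run that reaches the end.
    for i, h in enumerate(list(heights) + [float("-inf")]):
        if h > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) * dt_hours >= min_duration_h:
                storms.append((run_start, i))
            run_start = None
    return storms


if __name__ == "__main__":
    # Hypothetical hourly wave heights in metres.
    hs = [1.0, 3.0, 3.1, 3.4, 3.2, 2.9, 2.7, 1.5, 2.8, 2.6, 1.2]
    print(find_storms(hs, dt_hours=1.0))
```

Short exceedances, such as the two-hour run near the end of the sample series, are discarded because they do not meet the six-hour minimum.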
       
  • Data, Vol. 9, Pages 71: Neural Architecture Comparison for Bibliographic
           Reference Segmentation: An Empirical Study

    • Authors: Rodrigo Cuéllar Hidalgo, Raúl Pinto Elías, Juan Manuel Torres Moreno, Osslan Osiris Vergara Villegas, Gerardo Reyes Salgado, Andrea Magadán Salazar
      First page: 71
      Abstract: In the realm of digital libraries, efficiently managing and accessing scientific publications necessitates automated bibliographic reference segmentation. This study addresses the challenge of accurately segmenting bibliographic references, a task complicated by the varied formats and styles of references. Focusing on the empirical evaluation of Conditional Random Fields (CRF), Bidirectional Long Short-Term Memory with CRF (BiLSTM + CRF), and Transformer Encoder with CRF (Transformer + CRF) architectures, this research employs Byte Pair Encoding and Character Embeddings for vector representation. The models underwent training on the extensive Giant corpus and subsequent evaluation on the Cora Corpus to ensure a balanced and rigorous comparison, maintaining uniformity across embedding layers, normalization techniques, and Dropout strategies. Results indicate that the BiLSTM + CRF architecture outperforms its counterparts by adeptly handling the syntactic structures prevalent in bibliographic data, achieving an F1-Score of 0.96. This outcome highlights the necessity of aligning model architecture with the specific syntactic demands of bibliographic reference segmentation tasks. Consequently, the study establishes the BiLSTM + CRF model as a superior approach within the current state-of-the-art, offering a robust solution for the challenges faced in digital library management and scholarly communication.
      Citation: Data
      PubDate: 2024-05-18
      DOI: 10.3390/data9050071
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 72: A Benchmark Data Set for Long-Term Monitoring in
           the eLTER Site Gesäuse-Johnsbachtal

    • Authors: Florian Lippl, Alexander Maringer, Margit Kurka, Jakob Abermann, Wolfgang Schöner, Manuela Hirschmugl
      First page: 72
      Abstract: This paper gives an overview of all currently available data sets for the European Long-term Ecosystem Research (eLTER) monitoring site Gesäuse-Johnsbachtal. The site is part of the LTSER platform Eisenwurzen in the Alps of the province of Styria, Austria. It contains both protected (National Park Gesäuse) and non-protected areas (Johnsbachtal). Although the main research focus of the eLTER monitoring site Gesäuse-Johnsbachtal is on inland surface running waters, forests and other wooded land, the eLTER whole system (WAILS) approach was followed in regard to the data selection, systematically screening all available data for their suitability as eLTER’s Standard Observations (SOs). Thus, data from all system strata were included, incorporating Geosphere, Atmosphere, Hydrosphere, Biosphere and Sociosphere. In the WAILS approach these SOs are key data for a whole system approach towards long term ecosystem research. Altogether, 54 data sets have been collected for the eLTER monitoring site Gesäuse-Johnsbachtal and included in the Dynamical Ecological Information Management System – Site and Data Registry (DEIMS-SDR), which is the eLTER data platform. The presented work provides all these data sets through dedicated data repositories for FAIR use. This paper gives an overview of all compiled data sets and their main properties. Additionally, the available data are evaluated in a concluding gap analysis with regard to the needed observation data according to WAILS, followed by an outlook on how to fill these gaps.
      Citation: Data
      PubDate: 2024-05-18
      DOI: 10.3390/data9050072
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 73: Comparative Analysis of the Predictive Performance
           of an ANN and Logistic Regression for the Acceptability of Eco-Mobility
           Using the Belgrade Data Set

    • Authors: Jelica Komarica, Draženko Glavić, Snežana Kaplanović
      First page: 73
      Abstract: To solve the problem of environmental pollution caused by road traffic, alternatives to vehicles with internal combustion engines are often proposed. As such, eco-mobility microvehicles have significant potential in the fight against environmental pollution, but only on the condition that they are widely accepted and that they replace the vehicles that predominantly pollute the environment. With this in mind, this study aims to elucidate the main variables that influence the acceptability of these vehicles, using prediction models based on binary logistic regression and a multilayer artificial neural network—a multilayer perceptron (ANN). The data of a random sample obtained via an online questionnaire, answered by 503 inhabitants of Belgrade (Serbia), were used for training and testing the model. A multilayer perceptron with 9 and 7 neurons in two hidden layers, a hyperbolic tangent activation function in the hidden layer, and an identity function in the output layer performed slightly better than the binary logistic regression model. With an accuracy of 85%, a precision of 79%, a recall of 81%, and an area under the ROC curve of 0.9, the multilayer perceptron model recognized the influential variables in predicting acceptability. The results of the model indicate that a respondent’s relationship to their current environmental pollution, the frequency of their use of modes of transport such as bicycles and motorcycles, their mileage for commuting, and their personal income have the greatest influence on the acceptability of using eco-mobility vehicles.
      Citation: Data
      PubDate: 2024-05-19
      DOI: 10.3390/data9050073
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 47: An EEG Dataset of Subject Pairs during
           

    • Authors: María A. Hernández-Mustieles, Yoshua E. Lima-Carmona, Axel A. Mendoza-Armenta, Ximena Hernandez-Machain, Diego A. Garza-Vélez, Aranza Carrillo-Márquez, Diana C. Rodríguez-Alvarado, Jorge de J. Lozoya-Santos, Mauricio A. Ramírez-Moreno
      First page: 47
      Abstract: This dataset was acquired during collaboration and competition tasks performed by sixteen subject pairs (N = 32) of one female and one male under different (face-to-face and online) modalities. The collaborative task corresponds to cooperating to put together a 100-piece puzzle, while the competition task refers to playing against each other in a one-on-one classic 28-piece dominoes game. In the face-to-face modality, all interactions between the pair occurred in person. On the other hand, in the online modality, participants were physically separated, and interaction was only allowed through Zoom software with an active microphone and camera. Electroencephalography data of the two subjects were acquired simultaneously while performing the tasks. This article describes the experimental setup, the process of the data streams acquired during the tasks, and the assessment of data quality.
      Citation: Data
      PubDate: 2024-03-27
      DOI: 10.3390/data9040047
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 48: Luxury Car Data Analysis: A Literature Review

    • Authors: Pegah Barakati, Flavio Bertini, Emanuele Corsi, Maurizio Gabbrielli, Danilo Montesi
      First page: 48
      Abstract: The concept of luxury, considering it a rare and exclusive attribute, is evolving due to technological advances and the increasing influence of consumers in the market. Luxury cars have always symbolized wealth, social status, and sophistication. Recently, as technology progresses, the ability and interest to gather, store, and analyze data from these elegant vehicles has also increased. In recent years, the analysis of luxury car data has emerged as a significant area of research, highlighting researchers’ exploration of various aspects that may differentiate luxury cars from ordinary ones. For instance, researchers study factors such as economic impact, technological advancements, customer preferences and demographics, environmental implications, brand reputation, security, and performance. Although the percentage of individuals purchasing luxury cars is lower than that of ordinary cars, the significance of analyzing luxury car data lies in its impact on various aspects of the automotive industry and society. This literature review aims to provide an overview of the current state of the art in luxury car data analysis.
      Citation: Data
      PubDate: 2024-03-30
      DOI: 10.3390/data9040048
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 49: Analysis of a Bluetooth Traffic Dataset Obtained
           during University Examination Sessions

    • Authors: Radu Bouaru, Adrian Peculea, Bogdan Iancu, Sorin Buzura, Emil Cebuc, Vasile Dadarlat
      First page: 49
      Abstract: In academic environments, students take exams simultaneously in campus examination classrooms. Due to recent advancements in technology, examination rooms are flooded with Bluetooth data traffic generated by personal devices (smartphones, smartwatches, etc.). The work presented in this article proposes a method for collecting Bluetooth traffic in an academic examination setting. The desired data were collected during several examination sessions using an Ubertooth One device, and then an in-depth post-processing analysis was performed on the collected dataset. The devices generating traffic were precisely located within the examination room, and areas with heightened data traffic were highlighted. Additionally, another goal of the current research was to provide a unique type of dataset to the academic community, facilitating its utilization in further research endeavors.
      Citation: Data
      PubDate: 2024-03-30
      DOI: 10.3390/data9040049
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 50: DNA of Music: Identifying Relationships among
           Different Versions of the Composition Sadhukarn from Thailand, Laos, and
           Cambodia Using Multivariate Statistics

    • Authors: Sumetus Eambangyung, Gretel Schwörer-Kohl, Witoon Purahong
      First page: 50
      Abstract: Sadhukarn, a sacred music composition performed ritually to salute and invite divine powers to open a ceremony or feast, is played in Thailand, Cambodia, and Laos. Different countries have unique versions, arranged based on musicians’ skills and en vogue styles. This study presents the results of multivariate statistical analyses of 26 different versions of Sadhukarn main melodies using non-metric multidimensional scaling (NMDS) and cluster analysis. The objective was to identify the optimal number of parameters for identifying the origin and relationships among Sadhukarn versions, including rhyme structures, pillar tone, rhythmic and melodic patterns, intervals, pitches, and combinations of these parameters. The data were analyzed using both full and normalized datasets (32 phrases) to avoid biases due to differences in phrases among versions. Overall, the combination of six parameters is the best approach for data analysis in both full and normalized datasets. The analysis of the ‘full version’ shows the separation of Sadhukarn versions from different countries of origin, while the analysis of the ‘normalized version’ reveals the rhyme structure, rhythmic structure, and pitch as crucial parameters for identifying Sadhukarn versions. We conclude that multivariate statistics are powerful tools for identifying relationships among different versions of Sadhukarn compositions from Thailand, Laos, and Cambodia and within the same countries of origin.
      Citation: Data
      PubDate: 2024-03-30
      DOI: 10.3390/data9040050
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 51: Longitudinal Patterns of Online Activity and
           Social Feedback Are Associated with Current and Perceived Changes in
           Quality of Life in Adult Facebook Users

    • Authors: Davide Marengo, Michele Settanni
      First page: 51
      Abstract: The present study explored how sharing verbal status updates on Facebook and receiving Likes, as a form of positive social feedback, correlate with current and perceived changes in Quality of Life (QoL). Utilizing the Facebook Graph API, we collected a longitudinal dataset comprising status updates and Likes received by 1577 adult Facebook users over a 12-month period. Two monthly indicators were calculated: the percentage of verbal status updates and the average number of Likes per post. Participants were administered a survey to assess current and perceived changes in QoL. Confirmatory Factor Analysis (CFA) and the Auto-Regressive Latent Trajectory Model with Structured Residuals (ALT-SRs) were used to model longitudinal patterns emerging from the objective recordings of Facebook activity and explore their correlation with QoL measures. Findings indicated a positive correlation between the percentage of verbal status updates on Facebook and current QoL. Online positive social feedback, measured through received Likes, was associated with both current QoL and perceived improvements in QoL. Of note, perceived improvements in QoL correlated with an increase in received Likes over time. Results highlight the relevance of collecting and modeling longitudinal Facebook data for the investigation of the association between activity on social media and individual well-being.
      Citation: Data
      PubDate: 2024-03-31
      DOI: 10.3390/data9040051
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 52: Natural Language Processing Patents Landscape
           Analysis

    • Authors: Hend S. Al-Khalifa, Taif AlOmar, Ghala AlOlyyan
      First page: 52
      Abstract: Understanding NLP patents provides valuable insights into innovation trends and competitive dynamics in artificial intelligence. This study uses the Lens patent database to investigate the landscape of NLP patents. Global patent output in the NLP field has grown rapidly over the past decade, indicating rising research and commercial interest in applying NLP techniques. By analyzing patent assignees, technology categories, and geographic distribution, we identify leading innovators as well as research hotspots in applying NLP. The patent landscape reflects intensifying competition between technology giants and research institutions. This research aims to synthesize key patterns and developments in NLP innovation revealed through patent data analysis, highlighting implications for firms and policymakers. A detailed understanding of NLP patenting activity can inform intellectual property strategy and technology investment decisions in this burgeoning AI domain.
      Citation: Data
      PubDate: 2024-03-31
      DOI: 10.3390/data9040052
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 53: Wearable Device Bluetooth/BLE Physical Layer
           Dataset

    • Authors: Artis Rusins, Deniss Tiscenko, Eriks Dobelis, Eduards Blumbergs, Krisjanis Nesenbergs, Peteris Paikens
      First page: 53
      Abstract: Wearable devices, such as headsets and activity trackers, rely heavily on the Bluetooth and/or the Bluetooth Low Energy wireless communication standard to exchange data with smartphones or other peripherals. Since these devices collect personal health and activity data, ensuring the privacy and security of the transmitted data is crucial. Therefore, we present a dataset that captures complete Bluetooth communications—including advertising, connection, data exchange, and disconnection—in an RF isolated environment using software-defined radio. We were able to successfully decode the captured Bluetooth packets using existing tools. This dataset provides researchers with the ability to fully analyze Bluetooth traffic and gain insight into communication patterns and potential security vulnerabilities.
      Citation: Data
      PubDate: 2024-04-03
      DOI: 10.3390/data9040053
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 54: Illumina 16S rRNA Gene Sequencing Dataset of
           Bacterial Communities of Soil Associated with Ironwood Trees (Casuarina
           equisetifolia) in Guam

    • Authors: Tao Jin, Robert L. Schlub, Claudia Husseneder
      First page: 54
      Abstract: Ironwood trees, which are of great importance for the economy and environment of tropical areas, were first discovered to suffer from a slow progressive dieback in Guam in 2002, later referred to as ironwood tree decline (IWTD). A variety of biotic factors have been shown to be associated with IWTD, including putative bacterial pathogens Ralstonia solanacearum and Klebsiella species (K. variicola and K. oxytoca), the fungus Ganoderma australe, and termites. Due to the soilborne nature of these pathogens, soil microbiomes have been suggested to be a significant factor influencing tree health. In this project, we sequenced the microbiome in the soil collected from the root region of healthy ironwood trees and those showing signs of IWTD to evaluate the association between the bacterial community in soil and IWTD. This dataset contains 4,782,728 raw sequencing reads present in soil samples collected from thirty-nine ironwood trees with varying scales of decline severity in Guam obtained via sequencing the V1–V3 region of the 16S rRNA gene on the Illumina NovaSeq (2 × 250 bp) platform. Sequences were taxonomically assigned in QIIME2 using the SILVA 132 database. Firmicutes and Actinobacteria were the most dominant phyla in soil. Differences in soil microbiomes were detected between limestone and sand soil parent materials. No putative plant pathogens of the genera Ralstonia or Klebsiella were found in the samples. Bacterial diversity was not linked to parameters of IWTD. The dataset has been made publicly available through NCBI GenBank under BioProject ID PRJNA883256. This dataset can be used to compare the bacterial taxa present in soil associated with ironwood trees in Guam to bacterial communities of other geographical locations to identify microbial signatures of IWTD. In addition, this dataset can also be used to investigate the relationship between soil microbiomes and the microbiomes of ironwood trees as well as those of the termites which attack ironwood trees.
      Citation: Data
      PubDate: 2024-04-07
      DOI: 10.3390/data9040054
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 55: Learning from Conect4children: A Collaborative
           Approach towards Standardisation of Disease-Specific Paediatric Research
           Data

    • Authors: Anando Sen, Victoria Hedley, Eva Degraeuwe, Steven Hirschfeld, Ronald Cornet, Ramona Walls, John Owen, Peter N. Robinson, Edward G. Neilan, Thomas Liener, Giovanni Nisato, Neena Modi, Simon Woodworth, Avril Palmeri, Ricarda Gaentzsch, Melissa Walsh, Teresa Berkery, Joanne Lee, Laura Persijn, Kasey Baker, Kristina An Haack, Sonia Segovia Simon, Julius O. B. Jacobsen, Giorgio Reggiardo, Melissa A. Kirwin, Jessie Trueman, Claudia Pansieri, Donato Bonifazi, Sinéad Nally, Fedele Bonifazi, Rebecca Leary, Volker Straub
      First page: 55
      Abstract: The conect4children (c4c) initiative was established to facilitate the development of new drugs and other therapies for paediatric patients. It is widely recognised that there are not enough medicines tested for all relevant ages of the paediatric population. To overcome this, it is imperative that clinical data from different sources are interoperable and can be pooled for larger post hoc studies. c4c has collaborated with the Clinical Data Interchange Standards Consortium (CDISC) to develop cross-cutting data resources that build on existing CDISC standards in an effort to standardise paediatric data. The natural next step was an extension to disease-specific data items. c4c brought together several existing initiatives and resources relevant to disease-specific data and analysed their use for standardising disease-specific data in clinical trials. Several case studies that combined disease-specific data from multiple trials have demonstrated the need for disease-specific data standardisation. We identified three relevant initiatives. These include European Reference Networks, European Joint Programme on Rare Diseases, and Pistoia Alliance. Other resources reviewed were National Cancer Institute Enterprise Vocabulary Services, CDISC standards, pharmaceutical company-specific data dictionaries, Human Phenotype Ontology, Phenopackets, Unified Registry for Inherited Metabolic Disorders, Orphacodes, Rare Disease Cures Accelerator-Data and Analytics Platform (RDCA-DAP), and Observational Medical Outcomes Partnership. The collaborative partners associated with these resources were also reviewed briefly. A plan of action focussed on collaboration was generated for standardising disease-specific paediatric clinical trial data. A paediatric data standards multistakeholder and multi-project user group was established to guide the remaining actions: FAIRification of metadata, a Phenopackets pilot with RDCA-DAP, applying Orphacodes to case report forms of clinical trials, introducing CDISC standards into European Reference Networks, testing of the CDISC Pediatric User Guide using data from the mentioned resources, and organisation of further workshops and educational materials.
      Citation: Data
      PubDate: 2024-04-08
      DOI: 10.3390/data9040055
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 56: A Dataset for Studying the Relationship between
           Human and Smart Devices

    • Authors: Francesco Lelli, Heidi Toivonen
      First page: 56
      Abstract: This dataset reports the responses to a survey designed for investigating the relationship that humans have with their smart devices. The dataset was collected between May and July 2020 and is a sample of over 500 respondents of various ethnicities and backgrounds. These data were used for modeling the ways that people relate to their devices using the notion of agency. However, the data can be used for complementing any study that intends to investigate a tool-mediated communication from the perspective of users, applying a variety of beliefs, attitudes, and expectations that users have in relation to their devices and themselves. This article presents the survey items as well as some preliminary data insights. The collected data were in English and the responses were anonymized to ensure GDPR compliance. The data were stored in a .csv file containing the respondents’ answers to the questions.
      Citation: Data
      PubDate: 2024-04-11
      DOI: 10.3390/data9040056
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 57: Experimental Data on Maximum Swelling Pressure of
           Clayey Soils and Related Soil Properties

    • Authors: Reza Taherdangkoo, Muntasir Shehab, Thomas Nagel, Faramarz Doulati Ardejani, Christoph Butscher
      First page: 57
      Abstract: Clayey soils exhibit significant volumetric changes in response to variations in water content. The swelling pressure of clayey soils is a critical parameter for evaluating the stability and performance of structures built on them, facilitating the development of appropriate design methodologies and mitigation strategies to ensure their long-term integrity and safety. We present a dataset comprising maximum swelling pressure values from 759 compacted soil samples, compiled from 16 articles published between 1994 and 2022. The dataset is classified into two main groups: 463 samples of natural clays and 296 samples of bentonite and bentonite mixtures, providing data on various types of soils and their properties. Different swelling test methods, including zero swelling, swell consolidation, restrained swell, double oedometer, free swelling, constant volume oedometer, UPC isochoric cell, isochoric oedometer and consolidometer, were employed to measure the maximum swelling pressure. The comprehensive nature of the dataset enhances its applicability for geotechnical projects. The dataset is a valuable resource for understanding the complex interactions between soil properties and swelling behavior, contributing to advancements in soil mechanics and geotechnical engineering.
      Citation: Data
      PubDate: 2024-04-16
      DOI: 10.3390/data9040057
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 58: Introduction to Reproducible Geospatial Analysis
           and Figures in R: A Tutorial Article

    • Authors: Philippe Maesen, Edouard Salingros
      First page: 58
      Abstract: The present article is intended to serve an educational purpose for data scientists and students who already have experience with the R language and wish to start using it for geospatial analysis and map creation. The basic concepts of raster data, vector data, CRS and datum are first presented along with a basic workflow to conduct reproducible geospatial research in R. Examples of important types of maps (scatter, bubble, choropleth, hexbin and faceted) created from open-source environmental data are illustrated and their practical implementation in R is discussed. Through these examples, essential manipulations on geospatial vector data are demonstrated (reading, transforming CRS, creating geometries from scratch, buffer zones around existing geometries and intersections between geometries).
      Citation: Data
      PubDate: 2024-04-20
      DOI: 10.3390/data9040058
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 59: Mapping of Data-Sharing Repositories for
           Paediatric Clinical Research—A Rapid Review

    • Authors: Mariagrazia Felisi, Fedele Bonifazi, Maddalena Toma, Claudia Pansieri, Rebecca Leary, Victoria Hedley, Ronald Cornet, Giorgio Reggiardo, Annalisa Landi, Annunziata D’Ercole, Salma Malik, Sinéad Nally, Anando Sen, Avril Palmeri, Donato Bonifazi, Adriana Ceci
      First page: 59
      Abstract: The reuse of paediatric individual patient data (IPD) from clinical trials (CTs) is essential to overcome specific ethical, regulatory, methodological, and economic issues that hinder the progress of paediatric research. Sharing data through repositories enables the aggregation and dissemination of clinical information, fosters collaboration between researchers, and promotes transparency. This work aims to identify and describe existing data-sharing repositories (DSRs) developed to store, share, and reuse paediatric IPD from CTs. A rapid review of platforms providing access to electronic DSRs was conducted. A two-stage process was used to characterize DSRs: a first step of identification, followed by a second step of analysis using a set of eight purpose-built indicators. From an initial set of forty-five publicly available DSRs, twenty-one DSRs were identified as meeting the eligibility criteria. Only two DSRs were found to be totally focused on the paediatric population. Despite an increased awareness of the importance of data sharing, the results of this study show that paediatrics remains an area in which targeted efforts are still needed. Promoting initiatives to raise awareness of these DSRs and creating ad hoc measures and common standards for the sharing of paediatric CT data could help to bridge this gap in paediatric research.
      Citation: Data
      PubDate: 2024-04-20
      DOI: 10.3390/data9040059
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 60: Predicting Academic Success of College Students
           Using Machine Learning Techniques

    • Authors: Jorge Humberto Guanin-Fajardo, Javier Guaña-Moya, Jorge Casillas
      First page: 60
      Abstract: College context and academic performance are important determinants of academic success; using students’ prior experience with machine learning techniques to predict academic success before the end of the first year reinforces college self-efficacy. Dropout prediction is related to student retention and has been studied extensively in recent work; however, there is little literature on predicting academic success using educational machine learning. For this reason, CRISP-DM methodology was applied to extract relevant knowledge and features from the data. The dataset examined consists of 6690 records and 21 variables with academic and socioeconomic information. Preprocessing techniques and classification algorithms were analyzed. The area under the curve was used to measure the effectiveness of the algorithm; XGBoost achieved an AUC of 87.75% and correctly classified eight out of ten cases, while the decision tree improved interpretation with ten rules in seven out of ten cases. Recognizing the gaps in the study and that on-time completion of college consolidates college self-efficacy, creating intervention and support strategies to retain students is a priority for decision makers. Assessing the fairness and discrimination of the algorithms was the main limitation of this work. In the future, we intend to apply the extracted knowledge and learn about its influence on university management.
      Citation: Data
      PubDate: 2024-04-22
      DOI: 10.3390/data9040060
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 39: CybAttT: A Dataset of Cyberattack News Tweets for
           Enhanced Threat Intelligence

    • Authors: Huda Lughbi, Mourad Mars, Khaled Almotairi
      First page: 39
      Abstract: The continuous developments in information technologies have resulted in a significant rise in security concerns, including cybercrimes, unauthorized access, and cyberattacks. Recently, researchers have increasingly turned to social media platforms like X to investigate cyberattacks. Analyzing and collecting news about cyberattacks from tweets can efficiently provide crucial insights into the attacks themselves, including their impacts, occurrence regions, and potential mitigation strategies. However, there is a shortage of labeled datasets related to cyberattacks. This paper describes CybAttT, a dataset of 36,071 English cyberattack-related tweets. These tweets are manually labeled into three classes: high-risk news, normal news, and not news. Our final overall inter-annotator agreement was 0.99 (Fleiss kappa), which represents high agreement. To ensure dataset reliability and accuracy, we conducted rigorous experiments using different supervised machine learning algorithms and various fine-tuned language models to assess its quality and suitability for its intended purpose. A high F1-score of 87.6% achieved using the CybAttT dataset not only demonstrates the potential of our approach but also validates the high quality and thoroughness of its annotations. We have made our CybAttT dataset accessible to the public for research purposes.
      Citation: Data
      PubDate: 2024-02-23
      DOI: 10.3390/data9030039
      Issue No: Vol. 9, No. 3 (2024)
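      Note: the abstract above reports annotator agreement as a Fleiss kappa. As a rough illustration of how that statistic is computed from a table of label counts, a minimal sketch follows. It is not the authors' code; the function name `fleiss_kappa` and the toy count table are assumptions for this example, and every item is assumed to be labeled by the same number of annotators (at least two).

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    annotators who assigned item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two annotators per item are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of annotations")
    n_cats = len(counts[0])

    # Overall share of annotations that fall into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Observed agreement on each item: fraction of annotator pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    # Agreement expected by chance, given the category shares.
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        # All annotations fall in a single category; kappa is undefined.
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical table: 4 items, 2 annotators, 2 categories.
    table = [[2, 0], [0, 2], [1, 1], [2, 0]]
    print(round(fleiss_kappa(table), 4))
```

      Values near 1 indicate agreement well above chance, and values at or below 0 indicate agreement no better than chance.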
       
  • Data, Vol. 9, Pages 40: Draft Genome Sequence of Bacillus thuringiensis
           INTA 103-23 Reveals Its Insecticidal Properties: Insights from the Genomic
           Sequence

    • Authors: Leopoldo Palma, Leila Ortiz, José Niz, Marcelo Berretta, Diego Sauka
      First page: 40
      Abstract: The genome of Bacillus thuringiensis strain INTA 103-23 was sequenced, revealing a high-quality draft assembly comprising 243 contigs with a total size of 6.30 Mb and a completeness of 99%. Phylogenetic analysis classified INTA 103-23 within the Bacillus cereus sensu stricto cluster. Genome annotation identified 6993 genes, including 2476 hypothetical proteins. Screening for pesticidal proteins unveiled 10 coding sequences with significant similarity to known pesticidal proteins, showcasing a potential efficacy against various insect orders. AntiSMASH analysis predicted 13 biosynthetic gene clusters (BGCs), including clusters with 100% similarity to petrobactin and anabaenopeptin NZ857/nostamide A. Notably, fengycin exhibited a 40% similarity within the identified clusters. Further exploration involved a comparative genomic analysis with ten phylogenetically closest genomes. The ANI values, calculated using fastANI, confirmed the closest relationships with strains classified under Bacillus cereus sensu stricto. This comprehensive genomic analysis of B. thuringiensis INTA 103-23 provides valuable insights into its genetic makeup, potential pesticidal activity, and biosynthetic capabilities. The identified BGCs and pesticidal proteins contribute to our understanding of the strain’s biocontrol potential against diverse agricultural pests.
      Citation: Data
      PubDate: 2024-02-28
      DOI: 10.3390/data9030040
      Issue No: Vol. 9, No. 3 (2024)
       
  • Data, Vol. 9, Pages 41: Defining the Balearic Islands’ Tourism Data
           Space: An Approach to Functional and Data Requirements

    • Authors: Dolores Ordóñez-Martínez, Joana M. Seguí-Pons, Maurici Ruiz-Pérez
      First page: 41
      Abstract: The definition of a tourism data space (TDS) in the Balearic Islands is a complex process that involves identifying the types of questions to be addressed, including analytical tools, and determining the type of information to be incorporated. This study delves into the functional requirements of a Balearic Islands’ TDS based on the study of scientific research carried out in the field of tourism in the Balearic Islands and drawing comparisons with international scientific research in the field of tourism information. Utilizing a bibliometric analysis of the scientific literature, this study identifies the scientific requirements that should be met for the development of a robust, rigorous, and efficient TDS. The goal is to support excellent scientific research in tourism and facilitate the transfer of research results to the productive sector to maintain and improve the competitiveness of the Balearic Islands as a tourist destination. The results of the analysis provide a structured framework for the construction of the Balearic Islands’ TDS, outlining objectives, methods to be implemented, and information to be considered.
      Citation: Data
      PubDate: 2024-02-29
      DOI: 10.3390/data9030041
      Issue No: Vol. 9, No. 3 (2024)
       
  • Data, Vol. 9, Pages 42: A Set of Ground Penetrating Radar Measures from
           Quarries

    • Authors: Stefano Bonduà, André Monteiro Klen, Massimiliano Pilone, Laurentiu Asimopolos, Natalia-Silvia Asimopolos
      First page: 42
      Abstract: This paper presents a set of Ground Penetrating Radar (GPR) data obtained from in situ measurements conducted in four ornamental stone quarries located in Italy (Botticino quarry) and Romania (Ruschita, Carpinis, and Pietroasa quarries). The GPR is a Non-Destructive Testing (NDT) technique that enables the detection and localization of fractures without damage to the surface, among other capabilities. In this study, two instruments of ground-coupled GPR were used to detect and locate the fractures, discontinuities, or weakened zones. The GPR data contains radargrams for discontinuities and fracture detection, besides the geographic location of the measures. For each measurement site, a set of radargrams has been acquired in two orthogonal directions, allowing for a 3D reconstruction of the investigated site.
      Citation: Data
      PubDate: 2024-03-03
      DOI: 10.3390/data9030042
      Issue No: Vol. 9, No. 3 (2024)
       
  • Data, Vol. 9, Pages 43: Pupil Data Upon Stimulation by Auditory Stimuli

    • Authors: Davide La Rosa, Luca Bruschini, Maria Paola Tramonti Fantozzi, Paolo Orsini, Mario Milazzo, Antonino Crivello
      First page: 43
      Abstract: Evaluating hearing in newborns and uncooperative patients can pose a considerable challenge. One potential solution might be to employ the Pupil Dilation Response (PDR) as an objective physiological metric. In this dataset descriptor paper, we present a collection of data showing changes in pupil dimension and shape upon presentation of auditory stimuli. In particular, we collected pupil data from 16 subjects, with no known hearing loss, under different lighting conditions, measured in response to a series of 60–100 audible tones, all of the same frequency and amplitude, which may serve to further investigate any relationship between hearing capabilities and PDRs.
      Citation: Data
      PubDate: 2024-03-05
      DOI: 10.3390/data9030043
      Issue No: Vol. 9, No. 3 (2024)
       
  • Data, Vol. 9, Pages 44: Subjective Well-Being and Mental Health among
           College Students: Two Datasets for Diagnosis and Program Evaluation

    • Authors: Lina Martínez, Esteban Robles, Valeria Trofimoff, Nicolás Vidal, Andrés David Espada, Nayith Mosquera, Bryan Franco, Víctor Sarmiento, María Isabel Zafra
      First page: 44
      Abstract: This paper presents two datasets about college students’ subjective well-being and mental health in a developing country. The first dataset of this report offers a diagnosis of the prevalence of self-reported symptoms associated with stress, anxiety, depression, and overall evaluation of subjective well-being. The study uses validated scales to measure self-reported symptoms related to mental health conditions: the Perceived Stress Scale (PSS-10) to measure stress, the 7-item Generalized Anxiety Disorder Scale (GAD-7) to measure symptoms associated with anxiety, and the 9-item Patient Health Questionnaire (PHQ-9) to measure symptoms associated with depression. This diagnosis was collected in a college student sample of 3052 undergrad students in 2022 at a medium-sized university in Colombia. The second dataset reports the evaluation of a positive education intervention implemented in the same university. The Colombian Minister of Science and Technology financed the intervention to promote strategies to mitigate the consequences on college students’ well-being and mental health after the pandemic. The program evaluation data cover two years (2020–2022) with 193 college students in the treatment group (students enrolled in a class teaching evidence-based interventions to promote well-being and mental health awareness) and 135 students in the control group. Data for evaluation include a broad array of variables of life satisfaction, happiness, negative emotions, COVID-19 effects, relationship valuations, and habits, and the measurement of three scales: the Satisfaction with Life Scale (SWLS), a brief measurement of depressive symptomatology (CESD-7), and the Brief Strengths Scale (BSS).
      Citation: Data
      PubDate: 2024-03-06
      DOI: 10.3390/data9030044
      Issue No: Vol. 9, No. 3 (2024)
       
  • Data, Vol. 9, Pages 45: A Dataset of Benthic Species from Mesophotic
           Bioconstructions on the Apulian Coast (Southeastern Italy, Mediterranean
           Sea)

    • Authors: Maria Mercurio, Guadalupe Giménez, Giorgio Bavestrello, Frine Cardone, Giuseppe Corriero, Jacopo Giampaoletti, Maria Flavia Gravina, Cataldo Pierri, Caterina Longo, Adriana Giangrande, Carlotta Nonnis Marzano
      First page: 45
      Abstract: Marine bioconstructions are complex habitats that represent a hotspot of biodiversity. Among Mediterranean bioconstructions, those thriving on mesophotic bottoms on southeastern Italian coasts are of particular interest due to their horizontal and vertical extension. In general, the communities that develop in the Mediterranean twilight zone encompassed within the first 30 m of depth are better known, while relatively few data are available on those at greater depths. By further investigating the diversity and structure of mesophotic bioconstructions in the southern Adriatic, we can improve our understanding of Mediterranean biodiversity while developing effective conservation strategies to preserve these habitats of particular interest. The dataset reported here comprises records of benthic marine taxa from algae and invertebrate mesophotic bioconstructions investigated at six sites along the southern Adriatic coast of Italy, at depths between approximately 25 and 65 m. The dataset contains a total of 1718 records, covering 11 phyla and 648 benthic taxa, of which 580 were recognized at the species level. These data could provide a reference point for further investigations with descriptive or management purposes, including the possible assessment of mesophotic bioconstructions as refuges for shallow-water species.
      Citation: Data
      PubDate: 2024-03-08
      DOI: 10.3390/data9030045
      Issue No: Vol. 9, No. 3 (2024)
       
  • Data, Vol. 9, Pages 46: WEA-Acceptance Data—A Dataset of Acoustic,
           Meteorological, and Operational Wind Turbine Measurements

    • Authors: Daphne Schössow, Stephan Preihs, Jürgen Peissig
      First page: 46
      Abstract: This article describes a dataset that combines wind turbine supervisory control and data acquisition (SCADA), meteorological, and acoustical data and thus gives a detailed description of a wind farm and its atmospheric and acoustic environment. The data were collected during different seasons for several weeks at a time, such that a multitude of environmental and operational conditions are covered. Across five measurement campaigns, a total of three different locations with similar surroundings were captured. The raw data were enhanced with derived values such as atmospheric stability or direction of sound propagation. One month of data, including all time series measurements as well as monophonic audio recordings, is now published. The dataset also contains three exemplary use cases along with documents that describe the data pre-processing.
      Citation: Data
      PubDate: 2024-03-15
      DOI: 10.3390/data9030046
      Issue No: Vol. 9, No. 3 (2024)
       
  • Data, Vol. 9, Pages 18: Can Data and Machine Learning Change the Future of
           Basic Income Models? A Bayesian Belief Networks Approach

    • Authors: Hamed Khalili
      First page: 18
      Abstract: Appeals to governments for implementing basic income are contemporary. The theoretical backgrounds of the basic income notion only prescribe transferring equal amounts to individuals irrespective of their specific attributes. However, the most recent basic income initiatives all around the world are attached to certain rules with regard to the attributes of the households. This approach is facing significant challenges to appropriately recognize vulnerable groups. A possible alternative for setting rules with regard to the welfare attributes of the households is to employ artificial intelligence algorithms that can process unprecedented amounts of data. Can integrating machine learning change the future of basic income by predicting households vulnerable to future poverty? In this paper, we utilize multidimensional and longitudinal welfare data comprising one and a half million individuals’ data and a Bayesian belief network approach to examine the feasibility of predicting households’ vulnerability to future poverty based on the existing households’ welfare attributes.
      Citation: Data
      PubDate: 2024-01-23
      DOI: 10.3390/data9020018
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 19: Draft Genome Sequence of the Commercial Strain
           Rhizobium ruizarguesonis bv. viciae RCAM1022

    • Authors: Olga A. Kulaeva, Evgeny A. Zorin, Anton S. Sulima, Gulnar A. Akhtemova, Vladimir A. Zhukov
      First page: 19
      Abstract: Legume plants enter a symbiosis with soil nitrogen-fixing bacteria (rhizobia), thereby gaining access to assimilable atmospheric nitrogen. Since this symbiosis is important for agriculture, biofertilizers with effective strains of rhizobia are created for crop legumes to increase their yield and minimize the amounts of mineral fertilizers required. In this work, we sequenced and characterized the genome of Rhizobium ruizarguesonis bv. viciae strain RCAM1022, a component of the ‘Rhizotorfin’ biofertilizer produced in Russia and used for pea (Pisum sativum L.).
      Citation: Data
      PubDate: 2024-01-23
      DOI: 10.3390/data9020019
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 20: An Optimized Hybrid Approach for Feature Selection
           Based on Chi-Square and Particle Swarm Optimization Algorithms

    • Authors: Amani Abdo, Rasha Mostafa, Laila Abdel-Hamid
      First page: 20
      Abstract: Feature selection is a significant issue in the machine learning process. Most datasets include features that are not needed for the problem being studied. These irrelevant features reduce both the efficiency and accuracy of the algorithm. It is possible to think about feature selection as an optimization problem. Swarm intelligence algorithms are promising techniques for solving this problem. This research paper presents a hybrid approach for tackling the problem of feature selection. A filter method (chi-square) and two wrapper swarm intelligence algorithms (grey wolf optimization (GWO) and particle swarm optimization (PSO)) are used in two different techniques to improve feature selection accuracy and system execution time. The performance of the two phases of the proposed approach is assessed using two distinct datasets. The results show that PSOGWO yields a maximum accuracy boost of 95.3%, while chi2-PSOGWO yields a maximum accuracy improvement of 95.961% for feature selection. The experimental results show that the proposed approach performs better than the compared approaches.
      Citation: Data
      PubDate: 2024-01-25
      DOI: 10.3390/data9020020
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 21: MHAiR: A Dataset of Audio-Image Representations
           for Multimodal Human Actions

    • Authors: Muhammad Bilal Shaikh, Douglas Chai, Syed Mohammed Shamsul Islam, Naveed Akhtar
      First page: 21
      Abstract: The audio-image representations for multimodal human actions (MHAiR) dataset contains six different image representations of audio signals that capture the temporal dynamics of the actions in a very compact and informative way. The dataset was extracted from the audio recordings of an existing video dataset, i.e., UCF101. Each data sample is approximately 10 s long, and the overall dataset was split into 4893 training samples and 1944 testing samples. The resulting feature sequences were then converted into images, which can be used as a benchmark dataset for evaluating the performance of machine learning models for human action recognition and related tasks. These audio-image representations could be suitable for a wide range of applications, such as surveillance, healthcare monitoring, and robotics. The dataset can also be used for transfer learning, where pre-trained models can be fine-tuned on a specific task using specific audio images. Thus, this dataset can facilitate the development of new techniques and approaches for improving the accuracy of human action-related tasks and also serve as a standard benchmark for testing the performance of different machine learning models and algorithms.
      Citation: Data
      PubDate: 2024-01-25
      DOI: 10.3390/data9020021
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 22: Genomic Epidemiology Dataset for the Important
           Nosocomial Pathogenic Bacterium Acinetobacter baumannii

    • Authors: Andrey Shelenkov, Yulia Mikhaylova, Vasiliy Akimkin
      First page: 22
      Abstract: The infections caused by various bacterial pathogens both in clinical and community settings represent a significant threat to public healthcare worldwide. The growing resistance to antimicrobial drugs acquired by bacterial species causing healthcare-associated infections has already become a life-threatening danger noticed by the World Health Organization. Several groups or lineages of bacterial isolates, usually called ‘the clones of high risk’, often drive the spread of resistance within particular species. Thus, it is vitally important to reveal and track the spread of such clones and the mechanisms by which they acquire antibiotic resistance and enhance their survival skills. Currently, the analysis of whole-genome sequences for bacterial isolates of interest is increasingly used for these purposes, including epidemiological surveillance and the development of spread prevention measures. However, the availability and uniformity of the data derived from genomic sequences often represent a bottleneck for such investigations. With this dataset, we present the results of a genomic epidemiology analysis of 17,546 genomes of a dangerous bacterial pathogen, Acinetobacter baumannii. Important typing information, including multilocus sequence typing (MLST)-based sequence types (STs), intrinsic blaOXA-51-like gene variants, capsular (KL) and oligosaccharide (OCL) types, CRISPR-Cas systems, and cgMLST profiles are presented, as well as the assignment of particular isolates to nine known international clones of high risk. The presence of antimicrobial resistance genes within the genomes is also reported. These data will be useful for researchers in the field of A. baumannii genomic epidemiology, resistance analysis, and prevention measure development.
      Citation: Data
      PubDate: 2024-01-26
      DOI: 10.3390/data9020022
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 23: Comprehensive Dataset on Pre-SARS-CoV-2 Infection
           Sports-Related Physical Activity Levels, Disease Severity, and Treatment
           Outcomes: Insights and Implications for COVID-19 Management

    • Authors: Dimitrios I. Bourdas, Panteleimon Bakirtzoglou, Antonios K. Travlos, Vasileios Andrianopoulos, Emmanouil Zacharakis
      First page: 23
      Abstract: This dataset aimed to explore associations between pre-SARS-CoV-2 infection exercise and sports-related physical activity (PA) levels and disease severity, along with treatments administered following the most recent SARS-CoV-2 infection. A comprehensive analysis investigated the relationships between PA categories (“Inactive”, “Low PA”, “Moderate PA”, “High PA”), disease severity (“Sporadic”, “Episodic”, “Recurrent”, “Frequent”, “Persistent”), and treatments post-SARS-CoV-2 infection (“No treatment”, “Home remedies”, “Prescribed medication”, “Hospital admission”, “Intensive care unit admission”) within a sample population (n = 5829) from the Hellenic territory. Utilizing the Active-Q questionnaire, data were collected from February to March 2023, capturing PA habits, participant characteristics, medical history, vaccination status, and illness experiences. Findings revealed no statistically significant association between preinfection PA levels and disease severity (χ² = 9.097, df = 12, p = 0.695). Additionally, a statistical dependency emerged between PA levels and illness treatment categories (χ² = 39.362, df = 12, p < 0.001), particularly linking inactive PA with home remedies treatment. These results highlight the potential influence of preinfection PA on disease severity and treatment choices following SARS-CoV-2 infection. The dataset offers valuable insights into the interplay between PA, disease outcomes, and treatment decisions, aiding future research in shaping targeted interventions and public health strategies related to COVID-19 management.
      Citation: Data
      PubDate: 2024-01-26
      DOI: 10.3390/data9020023
      Issue No: Vol. 9, No. 2 (2024)
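
      Note: The test statistics quoted in the abstract above can be sanity-checked with a few lines of code. The sketch below is only an illustration, not the authors' analysis: it assumes a Pearson chi-square test on a 4 × 5 contingency table (four PA categories against five severity or treatment categories, as listed in the abstract), and the helper name `chi2_sf_even_df` is hypothetical. It uses the closed-form upper-tail probability that holds for even degrees of freedom, so no third-party library is needed.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    k = df // 2
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))


# Degrees of freedom for an r x c contingency table: (r - 1) * (c - 1).
n_pa_levels = 4       # PA categories listed in the abstract
n_outcome_levels = 5  # severity (or treatment) categories listed in the abstract
df = (n_pa_levels - 1) * (n_outcome_levels - 1)

# Statistics as reported in the abstract (assumed to be Pearson chi-square values).
p_severity = chi2_sf_even_df(9.097, df)    # PA level vs. disease severity
p_treatment = chi2_sf_even_df(39.362, df)  # PA level vs. treatment category

print(df, p_severity, p_treatment)
```

      Under these assumptions the degrees of freedom and both p-values come out consistent with the figures reported in the abstract. For odd degrees of freedom, or to run the test from the raw table, a statistics library would be the more natural choice.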
       
  • Data, Vol. 9, Pages 24: Mapping Hierarchical File Structures to Semantic
           Data Models for Efficient Data Integration into Research Data Management
           Systems

    • Authors: Henrik tom Wörden, Florian Spreckelsen, Stefan Luther, Ulrich Parlitz, Alexander Schlemmer
      First page: 24
      Abstract: Although other methods exist to store and manage data in modern information technology, the standard solution is file systems. Therefore, keeping well-organized file structures and file system layouts can be key to a sustainable research data management infrastructure. However, file structures alone lack several important capabilities for FAIR data management, the two most significant being data visualization and adequate means of searching and obtaining an overview. Research data management systems (RDMSs) can fill this gap, but many do not support the simultaneous use of the file system and RDMS. This simultaneous use can have many benefits, but keeping data in RDMS in synchrony with the file structure is challenging. Here, we present concepts that allow for keeping file structures and semantic data models (in RDMS) synchronous. Furthermore, we propose a specification in YAML format that allows for a structured and extensible declaration and implementation of a mapping between the file system and data models used in semantic research data management. Implementing these concepts will facilitate the re-use of specifications for multiple use cases. Furthermore, the specification can serve as a machine-readable and, at the same time, human-readable documentation of specific file system structures. We demonstrate our work using the Open Source RDMS LinkAhead (previously named “CaosDB”).
      Citation: Data
      PubDate: 2024-01-26
      DOI: 10.3390/data9020024
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 25: Curating, Collecting, and Cataloguing Global
           COVID-19 Datasets for the Aim of Predicting Personalized Risk

    • Authors: Sepehr Golriz Khatami, Astghik Sargsyan, Maria Francesca Russo, Daniel Domingo-Fernández, Andrea Zaliani, Abish Kaladharan, Priya Sethumadhavan, Sarah Mubeen, Yojana Gadiya, Reagon Karki, Stephan Gebel, Ram Kumar Ruppa Surulinathan, Vanessa Lage-Rupprecht, Saulius Archipovas, Geltrude Mingrone, Marc Jacobs, Carsten Claussen, Martin Hofmann-Apitius, Alpha Tom Kodamullil
      First page: 25
      Abstract: Although hundreds of datasets have been published since the beginning of the coronavirus pandemic, there is a lack of centralized resources where these datasets are listed and harmonized to facilitate their applicability and uptake by predictive modeling approaches. Firstly, such a centralized resource provides information about data owners to researchers who are searching for datasets to develop their predictive models. Secondly, the harmonization of the datasets supports simultaneously taking advantage of several similar datasets. This, in turn, not only eases the imperative external validation of data-driven models but can also be used for virtual cohort generation, which helps to overcome data sharing impediments. Here, we present the COVID-19 data catalogue, a repository that provides a landscape view of COVID-19 studies and datasets as a putative source to enable researchers to develop personalized COVID-19 predictive risk models. The COVID-19 data catalogue currently contains over 400 studies and their relevant information collected from a wide range of global sources such as global initiatives, clinical trial repositories, publications, and data repositories. Further, the curated content stored in this data catalogue is complemented by a web application, providing visualizations of these studies, including their references, relevant information such as measured variables, and the geographical locations where these studies were performed. This resource is one of the first to capture, organize, and store studies, datasets, and metadata related to COVID-19 in a comprehensive repository. We believe that our work will facilitate future research and development of personalized predictive risk models for COVID-19.
      Citation: Data
      PubDate: 2024-01-29
      DOI: 10.3390/data9020025
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 26: Dataset for Electronics and Plasmonics in
           Graphene, Silicene, and Germanene Nanostrips

    • Authors: Talia Tene, Nataly Bonilla García, Miguel Ángel Sáez Paguay, John Vera, Marco Guevara, Cristian Vacacela Gomez, Stefano Bellucci
      First page: 26
      Abstract: The quest for novel materials with extraordinary electronic and plasmonic properties is an ongoing pursuit in the field of materials science. The dataset provides the results of a computational study that used ab initio and semi-analytical computations to model freestanding nanosystems. We delve into the world of ribbon-like materials, specifically graphene nanoribbons, silicene nanoribbons, and germanene nanoribbons, comparing their electronic and plasmonic characteristics. Our research reveals a myriad of insights, from the tunability of band structures and the influence of atomic number on electronic properties to the adaptability of nanoribbons for optoelectronic applications. Further, we uncover the promise of these materials for biosensing, demonstrating their plasmon frequency tunability based on charge density and Fermi velocity modification. Our findings not only expand the understanding of these quasi-1D materials but also open new avenues for the development of cutting-edge devices and technologies. This data presentation holds immense potential for future advancements in electronics, optics, and molecular sensing.
      Citation: Data
      PubDate: 2024-01-30
      DOI: 10.3390/data9020026
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 27: Understanding Data Breach from a Global
           Perspective: Incident Visualization and Data Protection Law Review

    • Authors: Gabriel Arquelau Pimenta Rodrigues, André Luiz Marques Serrano, Amanda Nunes Lopes Espiñeira Lemos, Edna Dias Canedo, Fábio Lúcio Lopes de Mendonça, Robson de Oliveira Albuquerque, Ana Lucila Sandoval Orozco, Luis Javier García Villalba
      First page: 27
      Abstract: Data breaches result in data loss, including personal, health, and financial information that is crucial, sensitive, and private. A breach is a security incident in which personal and sensitive data are exposed to unauthorized individuals, with the potential to raise several privacy concerns. As an example, the French newspaper Le Figaro breached approximately 7.4 billion records that included full names, passwords, and e-mail and physical addresses. To reduce the likelihood and impact of such breaches, it is fundamental to strengthen the security efforts against this type of incident and, for that, it is first necessary to identify patterns of its occurrence, primarily related to the number of data records leaked, the affected geographical region, and its regulatory aspects. To advance the discussion in this regard, we study a dataset comprising 428 worldwide data breaches between 2018 and 2019, providing a visualization of the related statistics, such as the most affected countries, the predominant economic sector targeted in different countries, and the median number of records leaked per incident in different countries, regions, and sectors. We then discuss the data protection regulation in effect in each country included in the dataset, correlating key elements of the legislation with the statistical findings. As a result, we have identified an extensive disclosure of medical records in India and government data in Brazil in the time range. Based on the analysis and visualization, we find some interesting insights that researchers have seldom focused on before, and it is apparent that the real dangers of data leaks are beyond the ordinary imagination. Finally, this paper contributes to the discussion regarding data protection laws and compliance regarding data breaches, supporting, for example, the decision process of data storage location in the cloud.
      Citation: Data
      PubDate: 2024-01-31
      DOI: 10.3390/data9020027
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 28: Organ-On-A-Chip (OOC) Image Dataset for Machine
           Learning and Tissue Model Evaluation

    • Authors: Valērija Movčana, Arnis Strods, Karīna Narbute, Fēlikss Rūmnieks, Roberts Rimša, Gatis Mozoļevskis, Maksims Ivanovs, Roberts Kadiķis, Kārlis Gustavs Zviedris, Laura Leja, Anastasija Zujeva, Tamāra Laimiņa, Arturs Abols
      First page: 28
      Abstract: Organ-on-a-chip (OOC) technology has emerged as a groundbreaking approach for emulating the physiological environment, revolutionizing biomedical research, drug development, and personalized medicine. OOC platforms offer more physiologically relevant microenvironments, enabling real-time monitoring of tissue, to develop functional tissue models. Imaging methods are the most common approach for daily monitoring of tissue development. Image-based machine learning serves as a valuable tool for enhancing and monitoring OOC models in real-time. This involves the classification of images generated through microscopy contributing to the refinement of model performance. This paper presents an image dataset, containing cell images generated from OOC setup with different cell types. There are 3072 images generated by an automated brightfield microscopy setup. For some images, parameters such as cell type, seeding density, time after seeding and flow rate are provided. These parameters along with predefined criteria can contribute to the evaluation of image quality and identification of potential artifacts. This dataset can be used as a basis for training machine learning classifiers for automated data analysis generated from an OOC setup providing more reliable tissue models, automated decision-making processes within the OOC framework and efficient research in the future.
      Citation: Data
      PubDate: 2024-02-01
      DOI: 10.3390/data9020028
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 29: A Comprehensive Data Pipeline for Comparing the
           Effects of Momentum on Sports Leagues

    • Authors: Jordan Truman Paul Noel, Vinicius Prado da Fonseca, Amilcar Soares
      First page: 29
      Abstract: Momentum has been a consistently studied aspect of sports science for decades. Among the established literature, there has, at times, been a discrepancy between conclusions. However, if momentum is indeed an actual phenomenon, it would affect all aspects of sports, from player evaluation to pre-game prediction and betting. Therefore, using momentum-based features that quantify a team’s linear trend of play, we develop a data pipeline that uses a small sample of recent games to assess teams’ quality of play and measure the predictive power of momentum-based features versus the predictive power of more traditional frequency-based features across several leagues using several machine learning techniques. More precisely, we use our pipeline to determine the differences in the predictive power of momentum-based features and standard statistical features for the National Hockey League (NHL), National Basketball Association (NBA), and five major first-division European football leagues. Our findings show little evidence that momentum has superior predictive power in the NBA. Still, we found some instances of the effects of momentum on the NHL that produced better pre-game predictors, and we see a similar trend in European football/soccer. Our results indicate that momentum-based features combined with frequency-based features could improve pre-game prediction models and that, in the future, momentum should be studied more from a feature/performance indicator point-of-view and less from the view of the dependence of sequential outcomes, thus attempting to distance momentum from the binary view of winning and losing.
      Citation: Data
      PubDate: 2024-02-01
      DOI: 10.3390/data9020029
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 30: Expanded Brain CT Dataset for the Development of
           AI Systems for Intracranial Hemorrhage Detection and Classification

    • Authors: Anna N. Khoruzhaya, Tatiana M. Bobrovskaya, Dmitriy V. Kozlov, Dmitriy Kuligovskiy, Vladimir P. Novik, Kirill M. Arzamasov, Elena I. Kremneva
      First page: 30
      Abstract: Intracranial hemorrhage (ICH) is a dangerous life-threatening condition leading to disability. Timely and high-quality diagnosis plays a huge role in the course and outcome of this disease. The gold standard in determining ICH is computed tomography. This method requires a prompt involvement of highly qualified personnel, which is not always possible, for example, in case of a staff shortage or increased workload. In such a situation, every minute counts, and time can be lost. The solution to this problem seems to be a set of diagnostic decisions, including the use of artificial intelligence, which will help to identify patients with ICH in a timely manner and provide prompt and quality medical care. However, the main obstacle to the development of artificial intelligence is a lack of high-quality datasets for training and testing. In this paper, we present a dataset including 800 brain CT scans consisting of multiple series of DICOM images with and without signs of ICH, enriched with clinical and technical parameters, as well as the methodology of its generation utilizing natural language processing tools. The dataset is publicly available, which contributes to increased competition in the development of artificial intelligence systems and their advancement and quality improvement.
      Citation: Data
      PubDate: 2024-02-06
      DOI: 10.3390/data9020030
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 31: The Yinshan Mountains Record over 10,000
           Landslides

    • Authors: Jingjing Sun, Chong Xu, Liye Feng, Lei Li, Xuewei Zhang, Wentao Yang
      First page: 31
      Abstract: China boasts a vast expanse of mountainous terrain, characterized by intricate geological conditions and structural features, resulting in frequent geological disasters. Among these, landslides, as prototypical geological hazards, pose significant threats to both lives and property. Consequently, conducting a comprehensive landslide inventory in mountainous regions is imperative for current research. This study concentrates on the Yinshan Mountains, an ancient fault-block mountain range spanning east–west in the central Inner Mongolia Autonomous Region, extending from Langshan Mountains in the west to Damaqun Mountains in the east, with the narrow sense Xiao–Yin Mountains District in between. Employing multi-temporal high-resolution remote sensing images from Google Earth, this study conducted visual interpretation, identifying 10,968 landslides in the Yinshan area, encompassing a total area of 308.94 km². The largest landslide occupies 2.95 km², while the smallest covers 84.47 m². Specifically, the Langshan area comprises 331 landslides with a total area of 11.96 km², the narrow sense Xiao–Yin Mountains include 3393 landslides covering 64.13 km², and the Manhan Mountains, Damaqun Mountains, and adjacent areas account for 7244 landslides over a total area of 232.85 km². This research not only contributes to global landslide cataloging initiatives but also serves as a robust foundation for future geohazard prevention and management efforts.
      Citation: Data
      PubDate: 2024-02-08
      DOI: 10.3390/data9020031
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 32: Data in Astrophysics and Geophysics: Novel
           Research and Applications

    • Authors: Vladimir A. Srećković, Milan S. Dimitrijević, Zoran R. Mijić
      First page: 32
      Abstract: Rapid development of communication technologies and constant technological improvements as a result of scientific discoveries require the establishment of specific databases [...]
      Citation: Data
      PubDate: 2024-02-08
      DOI: 10.3390/data9020032
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 33: Conflicting Marks Archive Dataset: A Dataset of
           Conflicting Marks from the Brazilian Intellectual Property Office

    • Authors: Igor Bezerra Reis, Rafael Ângelo Santos Leite, Mateus Miranda Torres, Alcides Gonçalves da Silva Neto, Francisco José da Silva e Silva, Ariel Soares Teles
      First page: 33
      Abstract: A registered trademark represents one of a company’s most valuable intellectual assets, acting as a safeguard against possible reputational damage and financial losses resulting from infringements of this intellectual property. To be registered, a mark must be unique and distinctive in relation to other trademarks which are already registered. In this paper, we describe the CMAD, an acronym for Conflicting Marks Archive Dataset. This dataset has been meticulously organized into pairs of marks (Number of pairs = 18,355) involved in copyright infringement across word, figurative and mixed marks. Organizations sought to register these marks with the National Institute of Industrial Property (INPI) in Brazil, and had their applications denied after analysis by intellectual property specialists. The robustness of this dataset is ensured by the intrinsic similarity of the conflicting marks, since the decisions were made by INPI specialists. This characteristic provides a reliable basis for the development and testing of tools designed to analyze similarity between marks, thus contributing to the evolution of practices and computer-based solutions in the field of intellectual property.
      Citation: Data
      PubDate: 2024-02-09
      DOI: 10.3390/data9020033
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 34: Draft Genome Sequencing of the Bacillus
           thuringiensis var. Thuringiensis Highly Insecticidal Strain 800/15

    • Authors: Anton E. Shikov, Iuliia A. Savina, Maria N. Romanenko, Anton A. Nizhnikov, Kirill S. Antonets
      First page: 34
      Abstract: The Bacillus thuringiensis serovar thuringiensis strain 800/15 has been actively used as an agent in biopreparations with high insecticidal activity against the larvae of the Colorado potato beetle Leptinotarsa decemlineata and gypsy moth Lymantria dispar. In the current study, we present the first draft genome of the 800/15 strain coupled with a comparative genomic analysis of its closest reference strains. The raw sequence data were obtained by Illumina technology on the HiSeq X platform and de novo assembled with the SPAdes v3.15.4 software. The genome reached 6,524,663 bp in size and carried 6771 coding sequences, 3 of which represented loci encoding insecticidal toxins, namely, Spp1Aa1, Cry1Ab9, and Cry1Ba8, active against the orders Lepidoptera, Blattodea, Hemiptera, Diptera, and Coleoptera. We also revealed the biosynthetic gene clusters responsible for the synthesis of secondary metabolites, including fengycin, bacillibactin, and petrobactin, with predicted antibacterial, fungicidal, and growth-promoting properties. Further comparative genomics suggested the strain is not enriched with genes linked with biological activities, implying that agriculturally important properties rely more on the composition of loci than on their abundance. The obtained genomic sequence of the strain with the experimental metadata could facilitate the computational prediction of bacterial isolates’ potency from genomic data.
      Citation: Data
      PubDate: 2024-02-10
      DOI: 10.3390/data9020034
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 35: COVID-19 Lockdown Effects on Sleep, Immune
           Fitness, Mood, Quality of Life, and Academic Functioning: Survey Data from
           Turkish University Students

    • Authors: Pauline A. Hendriksen, Sema Tan, Evi C. van Oostrom, Agnese Merlo, Hilal Bardakçi, Nilay Aksoy, Johan Garssen, Gillian Bruce, Joris C. Verster
      First page: 35
      Abstract: Previous studies from the Netherlands, Germany, and Argentina revealed that the 2019 coronavirus disease (COVID-19) pandemic and associated lockdown periods had a significant negative impact on the wellbeing and quality of life of students. The negative impact of lockdown periods on health correlates such as immune fitness, alcohol consumption, and mood was reflected in their academic functioning. As both the duration and intensity of lockdown measures differed between countries, it is important to replicate these findings in different countries and cultures. Therefore, the purpose of the current study was to examine the impact of the COVID-19 pandemic on immune fitness, mood, academic functioning, sleep, smoking, alcohol consumption, healthy diet, and quality of life among Turkish students. Turkish students in the age range of 18 to 30 years old were invited to complete an online survey. Data were collected from n = 307 participants and included retrospective assessments for six time periods: (1) BP (before the COVID-19 pandemic, 1 January 2020–10 March 2020), (2) NL1 (the first no lockdown period, 11 March 2020–28 April 2021), (3) the lockdown period (29 April 2021–17 May 2021), (4) NL2 (the second no lockdown period, 18 May 2021–31 December 2021), (5) NL3 (the third no lockdown period, 1 January 2022–December 2022), and (6) for the past month. In this data descriptor article, the content of the survey and the dataset are described.
      Citation: Data
      PubDate: 2024-02-10
      DOI: 10.3390/data9020035
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 36: AriAplBud: An Aerial Multi-Growth Stage Apple
           Flower Bud Dataset for Agricultural Object Detection Benchmarking

    • Authors: Wenan Yuan
      First page: 36
      Abstract: As one of the most important topics in contemporary computer vision research, object detection has received wide attention from the precision agriculture community for diverse applications. While state-of-the-art object detection frameworks are usually evaluated against large-scale public datasets containing mostly non-agricultural objects, a specialized dataset that reflects unique properties of plants would aid researchers in investigating the utility of newly developed object detectors within agricultural contexts. This article presents AriAplBud: a close-up apple flower bud image dataset created using an unmanned aerial vehicle (UAV)-based red–green–blue (RGB) camera. AriAplBud contains 3600 images of apple flower buds at six growth stages, with 110,467 manual bounding box annotations as positive samples and 2520 additional empty orchard images containing no apple flower bud as negative samples. AriAplBud can be directly deployed for developing object detection models that accept Darknet annotation format without additional preprocessing steps, serving as a potential benchmark for future agricultural object detection research. A demonstration of developing YOLOv8-based apple flower bud detectors is also presented in this article.
      Citation: Data
      PubDate: 2024-02-11
      DOI: 10.3390/data9020036
      Issue No: Vol. 9, No. 2 (2024)
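
The abstract above notes that AriAplBud ships its annotations in Darknet format. As a rough illustration of what that format implies, the sketch below parses Darknet-style label text, where each line holds a class index followed by a box centre, width, and height normalised to the image size, and converts it to pixel corner coordinates. The function name, the sample label text, and the image size are hypothetical and are not taken from the dataset itself.

```python
from typing import List, Tuple

# (class_id, x_min, y_min, x_max, y_max) in pixels
Box = Tuple[int, float, float, float, float]


def parse_darknet_labels(text: str, img_w: int, img_h: int) -> List[Box]:
    """Convert Darknet label lines to pixel-space corner boxes.

    Each non-empty line is expected to read:
        <class_id> <x_center> <y_center> <width> <height>
    with the four box values normalised to the range [0, 1].
    """
    boxes: List[Box] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        class_id = int(parts[0])
        xc, yc, w, h = (float(v) for v in parts[1:])
        boxes.append((
            class_id,
            (xc - w / 2) * img_w,
            (yc - h / 2) * img_h,
            (xc + w / 2) * img_w,
            (yc + h / 2) * img_h,
        ))
    return boxes


# Hypothetical label text and image size, for illustration only.
example = "0 0.5 0.5 0.2 0.4\n3 0.25 0.75 0.1 0.1\n"
boxes = parse_darknet_labels(example, img_w=1000, img_h=500)
```

An image with no objects would simply have an empty label file under this convention, so the parser returns an empty list for it.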
       
  • Data, Vol. 9, Pages 37: Digital Elevation Models and Orthomosaics of the
           Dutch Noordwest Natuurkern Foredune Restoration Project

    • Authors: Gerben Ruessink, Dick Groenendijk, Bas Arens
      First page: 37
      Abstract: Coastal dunes worldwide are increasingly under pressure from the adverse effects of human activities. Therefore, more and more restoration measures are being taken to create conditions that help disturbed coastal dune ecosystems regenerate or recover naturally. However, many projects lack the (open-access) monitoring observations needed to signal whether further actions are needed, and hence lack the opportunity to "learn by doing". This submission presents an open-access data set of 37 high-resolution digital elevation models and 24 orthomosaics collected before and after the excavation of five artificial foredune trough blowouts (“notches”) in winter 2012/2013 in the Dutch Zuid-Kennemerland National Park, one of the largest coastal dune restoration projects in northwest Europe. These high-resolution data provide a valuable resource for improving understanding of the biogeomorphic processes that determine the evolution of restored dune systems as well as developing guidelines to better design future restoration efforts with foredune notching.
      Citation: Data
      PubDate: 2024-02-15
      DOI: 10.3390/data9020037
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 38: Multimodal Hinglish Tweet Dataset for Deep
           Pragmatic Analysis

    • Authors: Pratibha, Amandeep Kaur, Meenu Khurana, Robertas Damaševičius
      First page: 38
      Abstract: Wars, conflicts, and peace efforts have become inherent characteristics of regions, and understanding the prevailing sentiments related to these issues is crucial for finding long-lasting solutions. Twitter/X, with its vast user base and real-time nature, provides a valuable source to assess the raw emotions and opinions of people regarding war, conflict, and peace. This paper focuses on collecting and curating Hinglish tweets specifically related to wars, conflicts, and associated taxonomy. The creation of this dataset addresses the existing gap in contemporary literature, which lacks comprehensive datasets capturing the emotions and sentiments expressed by individuals regarding wars, conflicts, and peace efforts. This dataset holds significant value and application in deep pragmatic analysis as it enables future researchers to identify the flow of sentiments, analyze the information architecture surrounding war, conflict, and peace efforts, and delve into the associated psychology in this context. To ensure the dataset’s quality and relevance, a meticulous selection process was employed, resulting in the inclusion of 500 carefully chosen, explainable search filters. The dataset currently has 10,040 tweets that have been validated by human expert review to make sure they are correct and accurate.
      Citation: Data
      PubDate: 2024-02-15
      DOI: 10.3390/data9020038
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 11: ADAS Simulation Result Dataset Processing Based on
           Improved BP Neural Network

    • Authors: Songyan Zhao, Lingshan Chen, Yongchao Huang
      First page: 11
      Abstract: The autonomous driving simulation field lacks evaluation and forecasting systems for simulation results. The data obtained from the simulation of target algorithms and vehicle models cannot be reasonably estimated. This problem affects subsequent vehicle improvement and parameter calibration. Relying on the simulation results of the AEB algorithm, we selected the BP neural network as the basis and improved it with a genetic algorithm optimized via a roulette algorithm. The regression evaluation indicators of the prediction results show that the GA-BP neural network has better prediction accuracy and generalization ability than the original BP neural network and other optimized BP neural networks. This GA-BP neural network also fills the gap in evaluation and prediction systems.
      Citation: Data
      PubDate: 2024-01-05
      DOI: 10.3390/data9010011
      Issue No: Vol. 9, No. 1 (2024)
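
The abstract above mentions a genetic algorithm that uses a roulette algorithm. For readers unfamiliar with the term, the sketch below shows generic roulette-wheel (fitness-proportionate) selection, in which an individual's chance of being picked as a parent is proportional to its fitness. This is the textbook technique only, not the authors' implementation; the function name and the toy population are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def roulette_select(population: Sequence[T],
                    fitness: Sequence[float],
                    rng: random.Random) -> T:
    """Pick one individual with probability proportional to its fitness.

    Fitness values must be non-negative and sum to a positive number.
    """
    if len(population) != len(fitness) or not population:
        raise ValueError("population and fitness must be non-empty and equal in length")
    total = sum(fitness)
    if total <= 0:
        raise ValueError("total fitness must be positive")
    pick = rng.random() * total  # a point on the wheel, in [0, total)
    running = 0.0
    for individual, fit in zip(population, fitness):
        running += fit
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off


# Toy example: "c" holds 70% of the wheel, so it should dominate the draws.
rng = random.Random(0)
draws: List[str] = [roulette_select(["a", "b", "c"], [1.0, 2.0, 7.0], rng)
                    for _ in range(10_000)]
share_c = draws.count("c") / len(draws)
```

In a GA-BP setting the fitness would typically be derived from the network's prediction error, but how the paper defines it is not stated in the abstract.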
       
  • Data, Vol. 9, Pages 12: DeepSpaceYoloDataset: Annotated Astronomical
           Images Captured with Smart Telescopes

    • Authors: Olivier Parisot
      First page: 12
      Abstract: Recent smart telescopes allow the automatic collection of a large quantity of data for specific portions of the night sky, with the goal of capturing images of deep sky objects (nebulae, galaxies, globular clusters). Nevertheless, human verification is still required afterwards to check whether celestial targets are effectively visible in the images produced by these instruments. Depending on the magnitude of deep sky objects, the observation conditions and the cumulative time of data acquisition, it is possible that only stars are present in the images. In addition, unfavorable external conditions (light pollution, bright moon, etc.) can make capture difficult. In this paper, we describe DeepSpaceYoloDataset, a set of 4696 RGB astronomical images captured by two smart telescopes and annotated with the positions of deep sky objects that are effectively in the images. This dataset can be used to train detection models on this type of image, enabling better control of the duration of capture sessions, but also to detect unexpected celestial events such as supernovae.
      Citation: Data
      PubDate: 2024-01-10
      DOI: 10.3390/data9010012
      Issue No: Vol. 9, No. 1 (2024)
       
  • Data, Vol. 9, Pages 13: Adaptive Forecasting in Energy Consumption: A
           Bibliometric Analysis and Review

    • Authors: Manuel Jaramillo, Wilson Pavón, Lisbeth Jaramillo
      First page: 13
      Abstract: This paper addresses the challenges in forecasting electrical energy in the current era of renewable energy integration. It reviews advanced adaptive forecasting methodologies while also analyzing the evolution of research in this field through bibliometric analysis. The review highlights the key contributions and limitations of current models with an emphasis on the challenges of traditional methods. The analysis reveals that Long Short-Term Memory (LSTM) networks, optimization techniques, and deep learning have the potential to model the dynamic nature of energy consumption, but they also have higher computational demands and data requirements. This review aims to offer a balanced view of current advancements and challenges in forecasting methods, guiding researchers, policymakers, and industry experts. It advocates for collaborative innovation in adaptive methodologies to enhance forecasting accuracy and support the development of resilient, sustainable energy systems.
      Citation: Data
      PubDate: 2024-01-11
      DOI: 10.3390/data9010013
      Issue No: Vol. 9, No. 1 (2024)
       
  • Data, Vol. 9, Pages 14: GeMSyD: Generic Framework for Synthetic Data
           Generation

    • Authors: Ramona Tolas, Raluca Portase, Rodica Potolea
      First page: 14
      Abstract: In the era of data-driven technologies, the need for diverse and high-quality datasets for training and testing machine learning models has become increasingly critical. In this article, we present a versatile methodology, the Generic Methodology for Constructing Synthetic Data Generation (GeMSyD), which addresses the challenge of synthetic data creation in the context of smart devices. GeMSyD provides a framework that enables the generation of synthetic datasets, aligning them closely with real-world data. To demonstrate the utility of GeMSyD, we instantiate the methodology by constructing a synthetic data generation framework tailored to the domain of event-based data modeling, specifically focusing on user interactions with smart devices. Our framework leverages GeMSyD to create synthetic datasets that faithfully emulate the dynamics of human–device interactions, including the temporal dependencies. Furthermore, we showcase how the synthetic data generated using our framework can serve as a valuable resource for machine learning practitioners. By employing these synthetic datasets, we perform a series of experiments to evaluate the performance of a neural-network-based prediction model in the domain of smart device interaction. Our results underscore the potential of synthetic data in facilitating model development and benchmarking.
      Citation: Data
      PubDate: 2024-01-11
      DOI: 10.3390/data9010014
      Issue No: Vol. 9, No. 1 (2024)
       
  • Data, Vol. 9, Pages 15: Proteomic and Metabolomic Analyses of the Blood
           Samples of Highly Trained Athletes

    • Authors: Kristina A. Malsagova, Arthur T. Kopylov, Vasiliy I. Pustovoyt, Evgenii I. Balakin, Ksenia A. Yurku, Alexander A. Stepanov, Liudmila I. Kulikova, Vladimir R. Rudnev, Anna L. Kaysheva
      First page: 15
      Abstract: High exercise loading causes intricate and ambiguous proteomic and metabolic changes. This study aims to describe the dataset on protein and metabolite contents in plasma samples collected from highly trained athletes across different sports disciplines. The proteomic and metabolomic analyses of the plasma samples of highly trained athletes engaged in sports disciplines of different intensities were carried out using HPLC-MS/MS. The results are reported as two datasets (proteomic data in a derived mgf-file and metabolomic data in processed format), each containing the findings obtained by analyzing 93 mass spectra. Variations in the protein and metabolite contents of the biological samples are observed, depending on the intensity of training load for different sports disciplines. Mass spectrometric proteomic and metabolomic studies can be used for classifying different athlete phenotypes according to the intensity of sports discipline and for the assessment of the efficiency of the recovery period.
      Citation: Data
      PubDate: 2024-01-16
      DOI: 10.3390/data9010015
      Issue No: Vol. 9, No. 1 (2024)
       
  • Data, Vol. 9, Pages 16: Elliott State Research Forest Timber Cruise,
           Oregon, 2015–2016

    • Authors: Todd West, Bogdan M. Strimbu
      First page: 16
      Abstract: The Elliott State Research Forest comprises 33,700 ha of temperate, Douglas-fir rainforest along North America’s Pacific Coast (Oregon, United States). In 2015, naturally regenerated stands at least 92 years old covered 49% of the research area and sawtimber plantations younger than 68 years another 50%. During the winter of 2015–2016, a forest-wide inventory sampled both naturally regenerated and plantation stands, recording 97,424 trees on 17,866 plots in 738 stands. The resulting dataset is atypical for the area as plot locations were not restricted to upland, commercially harvestable timber. Multiage stands and riparian areas were therefore documented along with plantations 2–61 years old and trees retained through clearcut harvests. This dataset constitutes the only open access, stand-based forest inventory currently available for a large area within the Oregon Coast Range. The dataset enables development of suites of models as well as many comparisons across stand ages and types, both at stand level and at the level of individual trees.
      Citation: Data
      PubDate: 2024-01-18
      DOI: 10.3390/data9010016
      Issue No: Vol. 9, No. 1 (2024)
       
  • Data, Vol. 9, Pages 17: Machine Learning Classification Workflow and
           Datasets for Ionospheric VLF Data Exclusion

    • Authors: Arnaut, Kolarski, Srećković
      First page: 17
      Abstract: Machine learning (ML) methods are commonly applied in the fields of extraterrestrial physics, space science, and plasma physics. In a prior publication, an ML classification technique, the Random Forest (RF) algorithm, was utilized to automatically identify and categorize erroneous signals, including instrument errors, noisy signals, outlier data points, and the impact of solar flares (SFs) on the ionosphere. This data communication includes the pre-processed dataset used in the aforementioned research, along with a workflow that utilizes the PyCaret library and a post-processing workflow. The code and data serve educational purposes in the interdisciplinary field of ML and ionospheric physics science, as well as being useful to other researchers for diverse objectives.
      Citation: Data
      PubDate: 2024-01-18
      DOI: 10.3390/data9010017
      Issue No: Vol. 9, No. 1 (2024)
       
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
 



JournalTOCs © 2009-
