  Subjects -> SCIENCES: COMPREHENSIVE WORKS (Total: 374 journals)
Showing 201 - 265 of 265 Journals sorted by number of followers
Quantum Science and Technology     Hybrid Journal   (Followers: 15)
Logo STI Science, Technology and Innovation     Open Access   (Followers: 14)
Alfarama Journal of Basic & Applied Sciences     Open Access   (Followers: 12)
RAC: Revista Angolana de Ciências     Open Access   (Followers: 11)
Patterns     Open Access   (Followers: 9)
The Innovation     Open Access   (Followers: 8)
Revista de la Sociedad Científica del Paraguay     Open Access   (Followers: 7)
Research     Open Access   (Followers: 6)
History of Science and Technology     Open Access   (Followers: 6)
Advanced Theory and Simulations     Hybrid Journal   (Followers: 5)
Frontiers in Climate     Open Access   (Followers: 5)
Discover Sustainability     Open Access   (Followers: 5)
Proceedings of the Indian National Science Academy     Full-text available via subscription   (Followers: 5)
International Journal of Culture and Modernity     Open Access   (Followers: 5)
Middle European Scientific Bulletin     Open Access   (Followers: 5)
Data     Open Access   (Followers: 4)
Science & Technology Studies     Open Access   (Followers: 4)
Journal of the Indian Institute of Science     Hybrid Journal   (Followers: 4)
Journal of Big History     Open Access   (Followers: 4)
MUST : Journal of Mathematics Education, Science and Technology     Open Access   (Followers: 4)
Journal of Composites Science     Open Access   (Followers: 4)
People and Nature     Open Access   (Followers: 4)
Citizen Science : Theory and Practice     Open Access   (Followers: 3)
Research Policy : X     Open Access   (Followers: 3)
Revista Saber Digital     Open Access   (Followers: 3)
Indian Journal of History of Science     Hybrid Journal   (Followers: 3)
Jaunujų mokslininkų darbai     Open Access   (Followers: 3)
Journal of Alasmarya University     Open Access   (Followers: 3)
iScience     Open Access   (Followers: 2)
Applied Mathematics and Nonlinear Sciences     Open Access   (Followers: 2)
Acta Nova     Open Access   (Followers: 2)
Indonesian Journal of Science and Mathematics Education     Open Access   (Followers: 2)
Rekayasa     Open Access   (Followers: 2)
Experimental Results     Open Access   (Followers: 2)
South American Sciences     Open Access   (Followers: 2)
BJHS Themes     Open Access   (Followers: 2)
Orbis Cógnita : Revista Científica     Open Access   (Followers: 2)
Revista Científica de la Universidad Nacional del Este     Open Access   (Followers: 2)
International Science and Technology Journal of Namibia     Open Access   (Followers: 2)
Scientific Bulletin     Open Access   (Followers: 1)
Global Journal of Science Frontier Research     Open Access   (Followers: 1)
Impact     Open Access   (Followers: 1)
International Journal of Research in Science     Open Access   (Followers: 1)
Journal of Science and Technology     Open Access   (Followers: 1)
Uluslararası Bilimsel Araştırmalar Dergisi (IBAD)     Open Access   (Followers: 1)
Acta Scientifica Malaysia     Open Access   (Followers: 1)
Scientonomy : Journal for the Science of Science     Open Access   (Followers: 1)
Revista Vivências em Ensino de Ciências     Open Access   (Followers: 1)
PENDIPA : Journal of Science Education     Open Access   (Followers: 1)
Journal of Science and Engineering     Open Access   (Followers: 1)
International Journal of Innovative Research and Scientific Studies     Open Access   (Followers: 1)
Futures & Foresight Science     Hybrid Journal   (Followers: 1)
Journal of Scientific Research and Reports     Open Access   (Followers: 1)
AAS Open Research     Open Access   (Followers: 1)
ARPHA Conference Abstracts     Open Access   (Followers: 1)
Rihan Journal for Scientific Publishing     Open Access   (Followers: 1)
Natural Sciences Education     Hybrid Journal   (Followers: 1)
Fundamental Research     Open Access  
Research Integrity and Peer Review     Open Access  
Journal of Responsible Technology     Open Access  
Natural Sciences     Open Access  
Türk Bilim ve Mühendislik Dergisi     Open Access  
ArtefaCToS : Revista de estudios sobre la ciencia y la tecnología     Open Access  
Ethiopian Journal of Sciences and Sustainable Development     Open Access  
Vilnius University Proceedings     Open Access  
Sciential     Open Access  
ARPHA Proceedings     Open Access  
Gaudium Sciendi     Open Access  
Crea Ciencia Revista Científica     Open Access  
Rafidain Journal of Science     Open Access  
Journal of Al-Qadisiyah for Pure Science     Open Access  
Revista Tecnológica     Open Access  
Himalayan Journal of Science and Technology     Open Access  
International Journal of Academic Research in Business, Arts & Science     Open Access  
Universidad, Ciencia y Tecnología     Open Access  
Fides et Ratio : Revista de Difusión Cultural y Científica     Open Access  
Revista de la Academia Colombiana de Ciencias Exactas, Físicas y Naturales     Open Access  
Entre Ciencia e Ingeniería     Open Access  
Revista Politécnica     Open Access  
Reportes Científicos de la FaCEN     Open Access  
Jurnal Ilmiah Ilmu Terapan Universitas Jambi : JIITUJ     Open Access  
Revista Eletrônica Ludus Scientiae     Open Access  
Emergent Scientist     Open Access  
Asian Journal of Advanced Research and Reports     Open Access  
Archives of Current Research International     Open Access  
Advances in Research     Open Access  
International Journal of Applied Science     Open Access  
Iranian Journal of Science and Technology, Transactions A : Science     Hybrid Journal  
J : Multidisciplinary Scientific Journal     Open Access  
Revista Binacional Brasil - Argentina: Diálogo entre as ciências     Open Access  
Revista Ciencia y Tecnología     Open Access  
Journal of Institute of Science and Technology     Open Access  
Journal of Science (JSc)     Open Access  
WikiJournal of Science     Open Access  
Acta Materialia Transilvanica     Open Access  
Integrated Research Advances     Open Access  
Open Conference Proceedings Journal     Open Access  
Naturen     Full-text available via subscription  
Ekaia : EHUko Zientzia eta Teknologia aldizkaria     Open Access  
Sci     Open Access  
Maskana     Open Access  
Hoosier Science Teacher     Open Access  
Reports in Advances of Physical Sciences     Open Access  
Facets     Open Access  
Adıyaman University Journal of Science     Open Access  
Revista Brasileira de Iniciação Científica     Open Access  
Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering     Open Access  
Scientific African     Open Access  
Scientific Journal of Mehmet Akif Ersoy University     Open Access  
Black Sea Journal of Engineering and Science     Open Access  
Fırat University Turkish Journal of Science & Technology     Open Access  
Gazi University Journal of Science     Open Access  
Middle East Journal of Science     Open Access  
International Journal of Computational and Experimental Science and Engineering (IJCESEN)     Open Access  
International Journal of Engineering, Technology and Natural Sciences     Open Access  
Bulletin of the National Research Centre     Open Access  
Uni-pluriversidad     Open Access  
ConCiencia     Open Access  
Ciencia y Tecnología     Open Access  
Revista Bases de la Ciencia     Open Access  
Elkawnie : Journal of Islamic Science and Technology     Open Access  
Ciência ET Praxis     Open Access  
Arab Journal of Basic and Applied Sciences     Open Access  
International Annals of Science     Open Access  
Science Heritage Journal     Open Access  
Bilge International Journal of Science and Technology Research     Open Access  
Avrasya Terim Dergisi     Open Access  
International Scientific and Vocational Studies Journal     Open Access  
TÜBAV Bilim Dergisi     Open Access  
LOGIKA Jurnal Ilmiah Lemlit Unswagati Cirebon     Open Access  
Dalat University Journal of Science     Open Access  
Investiga : TEC     Open Access  
Investigación Joven     Open Access  
Respuestas     Open Access  
Science Diliman     Open Access  
Instruments     Open Access  
Revista Científica y Tecnológica UPSE     Open Access  
HardwareX     Open Access  
Sultan Qaboos University Journal for Science     Open Access  
Borneo Journal of Resource Science and Technology     Open Access  
Sainstek : Jurnal Sains dan Teknologi     Open Access  
Revista de Información Científica     Open Access  
Indonesian Journal of Fundamental Sciences     Open Access  
Sainteknol : Jurnal Sains dan Teknologi     Open Access  
Jurnal Natural     Open Access  
Frontiers for Young Minds     Open Access  
Revista Ciência, Tecnologia & Ambiente     Open Access  
Journal of Indian Council of Philosophical Research     Hybrid Journal  
Journal of Negative and No Positive Results     Open Access  
Revista Conhecimento Online     Open Access  
Nova     Open Access  
CienciaUAT     Open Access  
Enseñanza de las Ciencias : Revista de Investigación y Experiencias Didácticas     Open Access  
Makara Journal of Science     Open Access  
Jurnal Sains Dasar     Open Access  
Indonesian Journal of Science and Technology     Open Access  
Ethiopian Journal of Science and Technology     Open Access  
Jurnal Matematika, Sains, Dan Teknologi     Open Access  
Heidelberger Jahrbücher Online     Open Access  
ARO. The Scientific Journal of Koya University     Open Access  
International Journal of Recent Contributions from Engineering, Science & IT     Open Access  
Estação Científica (UNIFAP)     Open Access  
The Winnower     Open Access  

Data
Number of Followers: 4  

  This is an Open Access journal
ISSN (Online) 2306-5729
Published by MDPI  [258 journals]
  • Data, Vol. 9, Pages 61: Training Datasets for Epilepsy Analysis:
           Preprocessing and Feature Extraction from Electroencephalography Time
           Series

    • Authors: Christian Riccio, Angelo Martone, Gaetano Zazzaro, Luigi Pavone
      First page: 61
      Abstract: We describe 20 datasets derived through signal filtering and feature extraction steps applied to the raw time series EEG data of 20 epileptic patients, as well as the methods we used to derive them. Background: Epilepsy is a complex neurological disorder which has seizures as its hallmark. Electroencephalography plays a crucial role in epilepsy assessment, offering insights into the brain’s electrical activity and advancing our understanding of seizures. The availability of tagged training sets covering all seizure phases—inter-ictal, pre-ictal, ictal, and post-ictal—is crucial for data-driven epilepsy analyses. Methods: Using the sliding window technique with a two-second window length and a one-second time slip, we extract multiple features from the preprocessed EEG time series of 20 patients from the Freiburg Seizure Prediction Database. In addition, we assign a class label to each instance to specify its corresponding seizure phase. All these operations are performed through a software application we developed, which is named Training Builder. Results: The 20 tagged training datasets each contain 1080 univariate and bivariate features, and are openly and publicly available. Conclusions: The datasets support the training of data-driven models for seizure detection, prediction, and clustering, based on feature engineering.
      Citation: Data
      PubDate: 2024-04-26
      DOI: 10.3390/data9050061
      Issue No: Vol. 9, No. 5 (2024)
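
The sliding-window step described in the abstract above (a two-second window advanced in one-second steps, with a class label attached to each instance) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' Training Builder software: the function names, the sampling rate passed as `fs`, and the two placeholder features (mean and variance) are hypothetical and stand in for the 1080 features of the published datasets.

```python
from statistics import fmean, pvariance


def sliding_windows(samples, fs, window_s=2.0, step_s=1.0):
    """Split a 1-D signal into overlapping windows.

    fs is the sampling rate in Hz (an assumed input; the abstract does not
    state it). Defaults follow the abstract: 2 s windows, 1 s step.
    """
    win = int(round(window_s * fs))
    step = int(round(step_s * fs))
    if win <= 0 or step <= 0:
        raise ValueError("window and step must span at least one sample")
    # Only full-length windows are kept; a trailing partial window is dropped.
    return [samples[i:i + win] for i in range(0, len(samples) - win + 1, step)]


def window_features(window):
    """Two placeholder univariate features (hypothetical, for illustration)."""
    return {"mean": fmean(window), "variance": pvariance(window)}


def build_instances(samples, fs, label):
    """Attach the same seizure-phase label to every window of one segment."""
    return [dict(window_features(w), label=label)
            for w in sliding_windows(samples, fs)]
```

With a toy signal of 40 samples at an assumed 4 Hz, `sliding_windows` yields nine windows of eight samples each, every window overlapping the previous one by half.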
       
  • Data, Vol. 9, Pages 62: Stimulated Microcontroller Dataset for New IoT
           Device Identification Schemes through On-Chip Sensor Monitoring

    • Authors: Alberto Ramos, Honorio Martín, Carmen Cámara, Pedro Peris-Lopez
      First page: 62
      Abstract: Legitimate identification of devices is crucial to ensure the security of present and future IoT ecosystems. In this regard, AI-based systems that exploit intrinsic hardware variations have gained notable relevance. Within this context, on-chip sensors included for monitoring purposes in a wide range of SoCs remain almost unexplored, despite their potential as a valuable source of both information and variability. In this work, we introduce and release a dataset comprising data collected from the on-chip temperature and voltage sensors of 20 microcontroller-based boards from the STM32L family. These boards were stimulated with five different algorithms, as workloads to elicit diverse responses. The dataset consists of five acquisitions (1.3 billion readouts) that are spaced over time and were obtained under different configurations using an automated platform. The raw dataset is publicly available, along with metadata and scripts developed to generate pre-processed T–V sequence sets. Finally, a proof of concept consisting of training a simple model is presented to demonstrate the feasibility of the identification system based on these data.
      Citation: Data
      PubDate: 2024-04-28
      DOI: 10.3390/data9050062
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 63: Detailed Landslide Traces Database of Hancheng
           County, China, Based on High-Resolution Satellite Images Available on the
           Google Earth Platform

    • Authors: Zhao, Xu, Huang
      First page: 63
      Abstract: Hancheng is located in the eastern part of China’s Shaanxi Province, near the west bank of the Yellow River. It is located at the junction of the active geological structure area. The rock layer is relatively fragmented, and landslide disasters are frequent. The occurrence of landslide disasters often causes a large number of casualties along with economic losses in the local area, seriously restricting local economic development. Although risk assessment and deformation mechanism analysis for single landslides have been performed for landslide disasters in the Hancheng area, this area lacks a landslide traces database. A complete landslide database comprises the basic data required for the study of landslide disasters and is an important requirement for subsequent landslide-related research. Therefore, this study used multi-temporal high-resolution optical images and human-computer interaction visual interpretation methods of the Google Earth platform to construct a landslide traces database in Hancheng County. The results showed that at least 6785 landslides had occurred in the study area. The total area of the landslides was about 95.38 km², accounting for 5.88% of the study area. The average landslide area was 1406.04 m², the largest landslide area was 377,841 m², and the smallest landslide area was 202.96 m². The results of this study provide an important basis for understanding the spatial distribution of landslides in Hancheng County, the evaluation of landslide susceptibility, and local disaster prevention and mitigation work.
      Citation: Data
      PubDate: 2024-04-29
      DOI: 10.3390/data9050063
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 64: A Comprehensive Dataset of the Aerodynamic and
           Geometric Coefficients of Airfoils in the Public Domain

    • Authors: Kanak Agarwal, Vedant Vijaykrishnan, Dyutit Mohanty, Manikandan Murugaiah
      First page: 64
      Abstract: This study presents an extensive collection of data on the aerodynamic behavior at a low Reynolds number and geometric coefficients for 2900 airfoils obtained through the class shape transformation (CST) method. By employing a verified OpenFOAM-based CFD simulation framework, lift and drag coefficients were determined at a Reynolds number of 10^5. Considering the limited availability of data on low Reynolds number airfoils, this dataset is invaluable for a wide range of applications, including unmanned aerial vehicles (UAVs) and wind turbines. Additionally, the study offers a method for automating CFD simulations that could be applied to obtain aerodynamic coefficients at higher Reynolds numbers. The breadth of this dataset also supports the enhancement and creation of machine learning (ML) models, further advancing research into the aerodynamics of airfoils and lifting surfaces.
      Citation: Data
      PubDate: 2024-04-30
      DOI: 10.3390/data9050064
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 65: Spectral Library of Plant Species from Montesinho
           Natural Park in Portugal

    • Authors: Isabel Pôças, Cátia Rodrigues de Almeida, Salvador Arenas-Castro, João C. Campos, Nuno Garcia, João Alírio, Neftalí Sillero, Ana C. Teodoro
      First page: 65
      Abstract: In this work, we present and describe a spectral library (SL) with 15 vascular plant species from Montesinho Natural Park (MNP), a protected area in Northeast Portugal. We selected species from the vascular plants that are characteristic of the habitats in the MNP, based on their prevalence, and also included one invasive species: Alnus glutinosa (L.) Gaertn, Castanea sativa Mill., Cistus ladanifer L., Crataegus monogyna Jacq., Frangula alnus Mill., Fraxinus angustifolia Vahl, Quercus pyrenaica Willd., Quercus rotundifolia Lam., Trifolium repens L., Arbutus unedo L., Dactylis glomerata L., Genista falcata Brot., Cytisus multiflorus (L’Hér.) Sweet, Erica arborea L., and Acacia dealbata Link. We collected spectra (300–2500 nm) from five records per leaf and leaf side, which resulted in 538 spectra compiled in the SL. Additionally, we computed five vegetation indices from spectral data and analysed them to highlight specific characteristics and differences among the sampled species. We detail the data repository information and its organisation for a better understanding of the data and to facilitate its use. The SL structure can add valuable information about the selected plant species in MNP, contributing to conservation purposes. This plant species SL is publicly available on the Zenodo platform.
      Citation: Data
      PubDate: 2024-04-30
      DOI: 10.3390/data9050065
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 66: A Series Production Data Set for Five-Axis CNC
           Milling

    • Authors: Anna-Maria Schmitt, Bastian Engelmann
      First page: 66
      Abstract: The described data set contains features from the machine control of a five-axis milling machine. The features were recorded during thirteen series productions. Each series production includes a changeover process in which the machine was set up for the production of a different product. In addition to the timestamps and the twenty recorded features derived from Numerical Control (NC) variables, the data set also contains labels for the different production phases. For this purpose, up to 23 phases were assigned, which are based on a generalized milling process. The data set consists of thirteen .csv files, each representing a series production. The data set was recorded in a production company in the contract manufacturing sector for components with real series orders in ongoing industrial production.
      Citation: Data
      PubDate: 2024-04-30
      DOI: 10.3390/data9050066
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 67: Unveiling University Groupings: A Clustering
           Analysis for Academic Rankings

    • Authors: George Matlis, Nikos Dimokas, Petros Karvelis
      First page: 67
      Abstract: The evaluation and ranking of educational institutions are of paramount importance to a wide range of stakeholders, including students, faculty members, funding organizations, and the institutions themselves. Traditional ranking systems, such as those provided by QS, ARWU, and THE, have offered valuable insights into university performance by employing a variety of indicators to reflect institutional excellence across research, teaching, international outlook, and more. However, these linear rankings may not fully capture the multifaceted nature of university performance. This study introduces a novel clustering analysis that complements existing rankings by grouping universities with similar characteristics, providing a multidimensional perspective on global higher education landscapes. Utilizing a range of clustering algorithms—K-Means, GMM, Agglomerative, and Fuzzy C-Means—and incorporating both traditional and unique indicators, our approach seeks to highlight the commonalities and shared strengths within clusters of universities. This analysis does not aim to supplant existing ranking systems but to augment them by offering stakeholders an alternative lens through which to view and assess university performance. By focusing on group similarities rather than ordinal positions, our method encourages a more nuanced understanding of institutional excellence and facilitates peer learning among universities with similar profiles. While acknowledging the limitations inherent in any methodological approach, including the selection of indicators and clustering algorithms, this study underscores the value of complementary analyses in enriching our understanding of higher educational institutions’ performance.
      Citation: Data
      PubDate: 2024-05-11
      DOI: 10.3390/data9050067
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 68: EEG and Physiological Signals Dataset from
           Participants during Traditional and Partially Immersive Learning
           Experiences in Humanities

    • Authors: Rebeca Romo-De León, Mei Li L. Cham-Pérez, Verónica Andrea Elizondo-Villegas, Alejandro Villarreal-Villarreal, Alexandro Antonio Ortiz-Espinoza, Carol Stefany Vélez-Saboyá, Jorge de Jesús Lozoya-Santos, Manuel Cebral-Loureda, Mauricio A. Ramírez-Moreno
      First page: 68
      Abstract: The relevance of the interaction between Humanities-enhanced learning using immersive environments and simultaneous physiological signal analysis contributes to the development of Neurohumanities and advancements in applications of Digital Humanities. The present dataset consists of recordings from 24 participants divided into two groups (12 participants in each group) engaging in simulated learning scenarios, traditional learning, and partially immersive learning experiences. Data recordings from each participant contain recordings of physiological signals and psychometric data collected from applied questionnaires. Physiological signals include electroencephalography, real-time engagement and emotion recognition calculation by a Python EEG acquisition code, head acceleration, electrodermal activity, blood volume pressure, inter-beat interval, and temperature. Before the acquisition of physiological signals, participants were asked to fill out the General Health Questionnaire and Trait Meta-Mood Scale. In between recording sessions, participants were asked to fill out Likert-scale questionnaires regarding their experience and a Self-Assessment Manikin. At the end of the recording session, participants filled out the ITC Sense of Presence Inventory questionnaire for user experience. The dataset can be used to explore differences in physiological patterns observed between different learning modalities in the Humanities.
      Citation: Data
      PubDate: 2024-05-15
      DOI: 10.3390/data9050068
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 69: Review of Data Processing Methods Used in
           Predictive Maintenance for Next Generation Heavy Machinery

    • Authors: Ietezaz Ul Hassan, Krishna Panduru, Joseph Walsh
      First page: 69
      Abstract: Vibration-based condition monitoring plays an important role in maintaining reliable and effective heavy machinery in various sectors. Heavy machinery involves major investments and is frequently subjected to extreme operating conditions. Therefore, prompt fault identification and preventive maintenance are important for reducing costly breakdowns and maintaining operational safety. In this review, we look at different methods of vibration data processing in the context of vibration-based condition monitoring for heavy machinery. We divided primary approaches related to vibration data processing into three categories–signal processing methods, preprocessing-based techniques and artificial intelligence-based methods. We highlight the importance of these methods in improving the reliability and effectiveness of heavy machinery condition monitoring systems, highlighting the importance of precise and automated fault detection systems. To improve machinery performance and operational efficiency, this review aims to provide information on current developments and future directions in vibration-based condition monitoring by addressing issues like imbalanced data and integrating cutting-edge techniques like anomaly detection algorithms.
      Citation: Data
      PubDate: 2024-05-15
      DOI: 10.3390/data9050069
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 70: Continuous Wave Measurements Collected in
           Intermediate Depth throughout the North Sea Storm Season during the
           RealDune/REFLEX Experiments

    • Authors: Jantien Rutten, Marion Tissier, Paul van Wiechen, Xinyi Zhang, Sierd de Vries, Ad Reniers, Jan-Willem Mol
      First page: 70
      Abstract: High-resolution wave measurements at intermediate water depth are required to improve coastal impact modeling. Specifically, such data sets are desired to calibrate and validate models, and broaden the insight on the boundary conditions that force models. Here, we present a wave data set collected in the North Sea at three stations in intermediate water depth (6–14 m) during the 2021/2022 storm season as part of the RealDune/REFLEX experiments. Continuous measurements of synchronized surface elevation, velocity and pressure were recorded at 2–4 Hz by Acoustic Doppler Profilers and an Acoustic Doppler Velocimeter for a 5-month duration. Time series were quality-controlled, directional-frequency energy spectra were calculated and common bulk parameters were derived. Measured wave conditions vary from calm to energetic with 0.1–5.0 m sea-swell wave height, 5–16 s mean wave period and W-NNW direction. Nine storms, i.e., wave height beyond 2.5 m for at least six hours, were recorded including the triple storms Dudley, Eunice and Franklin. This unique data set can be used to investigate wave transformation, wave nonlinearity and wave directionality for higher and lower frequencies (e.g., sea-swell and infragravity waves) to compare with theoretical and empirical descriptions. Furthermore, the data can serve to force, calibrate and validate models during storm conditions.
      Citation: Data
      PubDate: 2024-05-17
      DOI: 10.3390/data9050070
      Issue No: Vol. 9, No. 5 (2024)
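
The storm criterion quoted in the abstract above (wave height beyond 2.5 m for at least six hours) can be illustrated with a short sketch. It is a hypothetical helper, not the authors' processing code, and it assumes a regularly sampled series of wave heights in which each sample represents one interval of `dt_hours` (one hour by default, an assumption not taken from the abstract).

```python
def find_storms(wave_heights, dt_hours=1.0, threshold_m=2.5, min_duration_h=6.0):
    """Return (start, end) index pairs, end exclusive, of runs above threshold.

    A run counts as a storm when its sample count times dt_hours reaches
    min_duration_h. Threshold and duration follow the abstract; the
    regular sampling interval is an assumption.
    """
    storms = []
    start = None  # index where the current above-threshold run began
    for i, height in enumerate(wave_heights):
        if height > threshold_m:
            if start is None:
                start = i
        elif start is not None:
            # The run just ended at index i; keep it only if long enough.
            if (i - start) * dt_hours >= min_duration_h:
                storms.append((start, i))
            start = None
    # Handle a run that is still open at the end of the series.
    if start is not None and (len(wave_heights) - start) * dt_hours >= min_duration_h:
        storms.append((start, len(wave_heights)))
    return storms
```

A five-hour exceedance is ignored under these defaults, while a run of six or more consecutive hourly samples above 2.5 m is reported as one storm.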
       
  • Data, Vol. 9, Pages 71: Neural Architecture Comparison for Bibliographic
           Reference Segmentation: An Empirical Study

    • Authors: Rodrigo Cuéllar Hidalgo, Raúl Pinto Elías, Juan Manuel Torres Moreno, Osslan Osiris Vergara Villegas, Gerardo Reyes Salgado, Andrea Magadán Salazar
      First page: 71
      Abstract: In the realm of digital libraries, efficiently managing and accessing scientific publications necessitates automated bibliographic reference segmentation. This study addresses the challenge of accurately segmenting bibliographic references, a task complicated by the varied formats and styles of references. Focusing on the empirical evaluation of Conditional Random Fields (CRF), Bidirectional Long Short-Term Memory with CRF (BiLSTM + CRF), and Transformer Encoder with CRF (Transformer + CRF) architectures, this research employs Byte Pair Encoding and Character Embeddings for vector representation. The models underwent training on the extensive Giant corpus and subsequent evaluation on the Cora Corpus to ensure a balanced and rigorous comparison, maintaining uniformity across embedding layers, normalization techniques, and Dropout strategies. Results indicate that the BiLSTM + CRF architecture outperforms its counterparts by adeptly handling the syntactic structures prevalent in bibliographic data, achieving an F1-Score of 0.96. This outcome highlights the necessity of aligning model architecture with the specific syntactic demands of bibliographic reference segmentation tasks. Consequently, the study establishes the BiLSTM + CRF model as a superior approach within the current state-of-the-art, offering a robust solution for the challenges faced in digital library management and scholarly communication.
      Citation: Data
      PubDate: 2024-05-18
      DOI: 10.3390/data9050071
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 72: A Benchmark Data Set for Long-Term Monitoring in
           the eLTER Site Gesäuse-Johnsbachtal

    • Authors: Florian Lippl, Alexander Maringer, Margit Kurka, Jakob Abermann, Wolfgang Schöner, Manuela Hirschmugl
      First page: 72
      Abstract: This paper gives an overview of all currently available data sets for the European Long-term Ecosystem Research (eLTER) monitoring site Gesäuse-Johnsbachtal. The site is part of the LTSER platform Eisenwurzen in the Alps of the province of Styria, Austria. It contains both protected (National Park Gesäuse) and non-protected areas (Johnsbachtal). Although the main research focus of the eLTER monitoring site Gesäuse-Johnsbachtal is on inland surface running waters, forests and other wooded land, the eLTER whole system (WAILS) approach was followed in regard to the data selection, systematically screening all available data in regard to its suitability as eLTER’s Standard Observations (SOs). Thus, data from all system strata were included, incorporating Geosphere, Atmosphere, Hydrosphere, Biosphere and Sociosphere. In the WAILS approach these SOs are key data for a whole system approach towards long term ecosystem research. Altogether, 54 data sets have been collected for the eLTER monitoring site Gesäuse-Johnsbachtal and included in the Dynamical Ecological Information Management System – Site and Data Registry (DEIMS-SDR), which is the eLTER data platform. The presented work provides all these data sets through dedicated data repositories for FAIR use. This paper gives an overview of all compiled data sets and their main properties. Additionally, the available data are evaluated in a concluding gap analysis with regard to the needed observation data according to WAILS, followed by an outlook on how to fill these gaps.
      Citation: Data
      PubDate: 2024-05-18
      DOI: 10.3390/data9050072
      Issue No: Vol. 9, No. 5 (2024)
       
  • Data, Vol. 9, Pages 47: An EEG Dataset of Subject Pairs during
           

    • Authors: María A. Hernández-Mustieles, Yoshua E. Lima-Carmona, Axel A. Mendoza-Armenta, Ximena Hernandez-Machain, Diego A. Garza-Vélez, Aranza Carrillo-Márquez, Diana C. Rodríguez-Alvarado, Jorge de J. Lozoya-Santos, Mauricio A. Ramírez-Moreno
      First page: 47
      Abstract: This dataset was acquired during collaboration and competition tasks performed by sixteen subject pairs (N = 32) of one female and one male under different (face-to-face and online) modalities. The collaborative task corresponds to cooperating to put together a 100-piece puzzle, while the competition task refers to playing against each other in a one-on-one classic 28-piece dominoes game. In the face-to-face modality, all interactions between the pair occurred in person. On the other hand, in the online modality, participants were physically separated, and interaction was only allowed through Zoom software with an active microphone and camera. Electroencephalography data of the two subjects were acquired simultaneously while performing the tasks. This article describes the experimental setup, the process of the data streams acquired during the tasks, and the assessment of data quality.
      Citation: Data
      PubDate: 2024-03-27
      DOI: 10.3390/data9040047
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 48: Luxury Car Data Analysis: A Literature Review

    • Authors: Pegah Barakati, Flavio Bertini, Emanuele Corsi, Maurizio Gabbrielli, Danilo Montesi
      First page: 48
      Abstract: The concept of luxury, considering it a rare and exclusive attribute, is evolving due to technological advances and the increasing influence of consumers in the market. Luxury cars have always symbolized wealth, social status, and sophistication. Recently, as technology progresses, the ability and interest to gather, store, and analyze data from these elegant vehicles has also increased. In recent years, the analysis of luxury car data has emerged as a significant area of research, highlighting researchers’ exploration of various aspects that may differentiate luxury cars from ordinary ones. For instance, researchers study factors such as economic impact, technological advancements, customer preferences and demographics, environmental implications, brand reputation, security, and performance. Although the percentage of individuals purchasing luxury cars is lower than that of ordinary cars, the significance of analyzing luxury car data lies in its impact on various aspects of the automotive industry and society. This literature review aims to provide an overview of the current state of the art in luxury car data analysis.
      Citation: Data
      PubDate: 2024-03-30
      DOI: 10.3390/data9040048
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 49: Analysis of a Bluetooth Traffic Dataset Obtained
           during University Examination Sessions

    • Authors: Radu Bouaru, Adrian Peculea, Bogdan Iancu, Sorin Buzura, Emil Cebuc, Vasile Dadarlat
      First page: 49
      Abstract: In academic environments, students take exams simultaneously in campus examination classrooms. Due to recent advancements in technology, examination rooms are flooded with Bluetooth data traffic generated by personal devices (smartphones, smartwatches, etc.). The work presented in this article proposes a method for collecting Bluetooth traffic in an academic examination setting. The desired data were collected during several examination sessions using an Ubertooth One device, and then an in-depth post-processing analysis was performed on the collected dataset. The devices generating traffic were precisely located within the examination room, and areas with heightened data traffic were highlighted. Additionally, another goal of the current research was to provide a unique type of dataset to the academic community, facilitating its utilization in further research endeavors.
      Citation: Data
      PubDate: 2024-03-30
      DOI: 10.3390/data9040049
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 50: DNA of Music: Identifying Relationships among
           Different Versions of the Composition Sadhukarn from Thailand, Laos, and
           Cambodia Using Multivariate Statistics

    • Authors: Sumetus Eambangyung, Gretel Schwörer-Kohl, Witoon Purahong
      First page: 50
      Abstract: Sadhukarn, a sacred music composition performed ritually to salute and invite divine powers to open a ceremony or feast, is played in Thailand, Cambodia, and Laos. Different countries have unique versions, arranged based on musicians’ skills and en vogue styles. This study presents the results of multivariate statistical analyses of 26 different versions of Sadhukarn main melodies using non-metric multidimensional scaling (NMDS) and cluster analysis. The objective was to identify the optimal number of parameters for identifying the origin and relationships among Sadhukarn versions, including rhyme structures, pillar tone, rhythmic and melodic patterns, intervals, pitches, and combinations of these parameters. The data were analyzed using both full and normalized datasets (32 phrases) to avoid biases due to differences in phrases among versions. Overall, the combination of six parameters is the best approach for data analysis in both full and normalized datasets. The analysis of the ‘full version’ shows the separation of Sadhukarn versions from different countries of origin, while the analysis of the ‘normalized version’ reveals the rhyme structure, rhythmic structure, and pitch as crucial parameters for identifying Sadhukarn versions. We conclude that multivariate statistics are powerful tools for identifying relationships among different versions of Sadhukarn compositions from Thailand, Laos, and Cambodia and within the same countries of origin.
      Citation: Data
      PubDate: 2024-03-30
      DOI: 10.3390/data9040050
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 51: Longitudinal Patterns of Online Activity and
           Social Feedback Are Associated with Current and Perceived Changes in
           Quality of Life in Adult Facebook Users

    • Authors: Davide Marengo, Michele Settanni
      First page: 51
      Abstract: The present study explored how sharing verbal status updates on Facebook and receiving Likes, as a form of positive social feedback, correlate with current and perceived changes in Quality of Life (QoL). Utilizing the Facebook Graph API, we collected a longitudinal dataset comprising status updates and Likes received by 1577 adult Facebook users over a 12-month period. Two monthly indicators were calculated: the percentage of verbal status updates and the average number of Likes per post. Participants were administered a survey to assess current and perceived changes in QoL. Confirmatory Factor Analysis (CFA) and the Auto-Regressive Latent Trajectory Model with Structured Residuals (ALT-SRs) were used to model longitudinal patterns emerging from the objective recordings of Facebook activity and explore their correlation with QoL measures. Findings indicated a positive correlation between the percentage of verbal status updates on Facebook and current QoL. Online positive social feedback, measured through received Likes, was associated with both current QoL and perceived improvements in QoL. Of note, perceived improvements in QoL correlated with an increase in received Likes over time. Results highlight the relevance of collecting and modeling longitudinal Facebook data for the investigation of the association between activity on social media and individual well-being.
      Citation: Data
      PubDate: 2024-03-31
      DOI: 10.3390/data9040051
      Issue No: Vol. 9, No. 4 (2024)
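      The two monthly indicators named in the abstract above (percentage of verbal status updates and average number of Likes per post) can be illustrated with a minimal sketch. The record layout, the field names `month`, `is_verbal` and `likes`, and the choice of "all posts in the month" as the denominator are assumptions made for illustration; the abstract does not specify them, and this is not the authors' code.

```python
from collections import defaultdict


def monthly_indicators(posts):
    """Return {month: (pct_verbal, avg_likes)} for a list of post records.

    Each record is a hypothetical dict with the keys:
      "month"     - month label such as "2015-03"
      "is_verbal" - True if the post is a text status update
      "likes"     - number of Likes the post received
    """
    # Group the posts by month.
    by_month = defaultdict(list)
    for post in posts:
        by_month[post["month"]].append(post)

    result = {}
    for month, items in sorted(by_month.items()):
        n = len(items)
        # Assumed denominator: every post published in that month.
        pct_verbal = 100.0 * sum(1 for p in items if p["is_verbal"]) / n
        avg_likes = sum(p["likes"] for p in items) / n
        result[month] = (pct_verbal, avg_likes)
    return result


# Made-up example records, not data from the study.
example_posts = [
    {"month": "2015-01", "is_verbal": True, "likes": 4},
    {"month": "2015-01", "is_verbal": False, "likes": 2},
    {"month": "2015-02", "is_verbal": True, "likes": 1},
]
indicators = monthly_indicators(example_posts)
print(indicators)
```

      Grouping first and aggregating second keeps the per-user monthly series easy to pass on to a longitudinal model afterwards.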
       
  • Data, Vol. 9, Pages 52: Natural Language Processing Patents Landscape
           Analysis

    • Authors: Hend S. Al-Khalifa, Taif AlOmar, Ghala AlOlyyan
      First page: 52
      Abstract: Understanding NLP patents provides valuable insights into innovation trends and competitive dynamics in artificial intelligence. This study uses the Lens patent database to investigate the landscape of NLP patents. The overall patent output in the NLP field on a global scale has exhibited a rapid growth over the past decade, indicating rising research and commercial interests in applying NLP techniques. By analyzing patent assignees, technology categories, and geographic distribution, we identify leading innovators as well as research hotspots in applying NLP. The patent landscape reflects intensifying competition between technology giants and research institutions. This research aims to synthesize key patterns and developments in NLP innovation revealed through patent data analysis, highlighting implications for firms and policymakers. A detailed understanding of NLP patenting activity can inform intellectual property strategy and technology investment decisions in this burgeoning AI domain.
      Citation: Data
      PubDate: 2024-03-31
      DOI: 10.3390/data9040052
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 53: Wearable Device Bluetooth/BLE Physical Layer
           Dataset

    • Authors: Artis Rusins, Deniss Tiscenko, Eriks Dobelis, Eduards Blumbergs, Krisjanis Nesenbergs, Peteris Paikens
      First page: 53
      Abstract: Wearable devices, such as headsets and activity trackers, rely heavily on the Bluetooth and/or the Bluetooth Low Energy wireless communication standard to exchange data with smartphones or other peripherals. Since these devices collect personal health and activity data, ensuring the privacy and security of the transmitted data is crucial. Therefore, we present a dataset that captures complete Bluetooth communications—including advertising, connection, data exchange, and disconnection—in an RF isolated environment using software-defined radio. We were able to successfully decode the captured Bluetooth packets using existing tools. This dataset provides researchers with the ability to fully analyze Bluetooth traffic and gain insight into communication patterns and potential security vulnerabilities.
      Citation: Data
      PubDate: 2024-04-03
      DOI: 10.3390/data9040053
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 54: Illumina 16S rRNA Gene Sequencing Dataset of
           Bacterial Communities of Soil Associated with Ironwood Trees (Casuarina
           equisetifolia) in Guam

    • Authors: Tao Jin, Robert L. Schlub, Claudia Husseneder
      First page: 54
      Abstract: Ironwood trees, which are of great importance for the economy and environment of tropical areas, were first discovered to suffer from a slow progressive dieback in Guam in 2002, later referred to as ironwood tree decline (IWTD). A variety of biotic factors have been shown to be associated with IWTD, including putative bacterial pathogens Ralstonia solanacearum and Klebsiella species (K. variicola and K. oxytoca), the fungus Ganoderma australe, and termites. Due to the soilborne nature of these pathogens, soil microbiomes have been suggested to be a significant factor influencing tree health. In this project, we sequenced the microbiome in the soil collected from the root region of healthy ironwood trees and those showing signs of IWTD to evaluate the association between the bacterial community in soil and IWTD. This dataset contains 4,782,728 raw sequencing reads present in soil samples collected from thirty-nine ironwood trees with varying scales of decline severity in Guam obtained via sequencing the V1–V3 region of the 16S rRNA gene on the Illumina NovaSeq (2 × 250 bp) platform. Sequences were taxonomically assigned in QIIME2 using the SILVA 132 database. Firmicutes and Actinobacteria were the most dominant phyla in soil. Differences in soil microbiomes were detected between limestone and sand soil parent materials. No putative plant pathogens of the genera Ralstonia or Klebsiella were found in the samples. Bacterial diversity was not linked to parameters of IWTD. The dataset has been made publicly available through NCBI GenBank under BioProject ID PRJNA883256. This dataset can be used to compare the bacterial taxa present in soil associated with ironwood trees in Guam to bacteria communities of other geographical locations to identify microbial signatures of IWTD. 
      In addition, this dataset can be used to investigate the relationship between soil microbiomes and the microbiomes of ironwood trees as well as those of the termites which attack ironwood trees.
      Citation: Data
      PubDate: 2024-04-07
      DOI: 10.3390/data9040054
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 55: Learning from Conect4children: A Collaborative
           Approach towards Standardisation of Disease-Specific Paediatric Research
           Data

    • Authors: Anando Sen, Victoria Hedley, Eva Degraeuwe, Steven Hirschfeld, Ronald Cornet, Ramona Walls, John Owen, Peter N. Robinson, Edward G. Neilan, Thomas Liener, Giovanni Nisato, Neena Modi, Simon Woodworth, Avril Palmeri, Ricarda Gaentzsch, Melissa Walsh, Teresa Berkery, Joanne Lee, Laura Persijn, Kasey Baker, Kristina An Haack, Sonia Segovia Simon, Julius O. B. Jacobsen, Giorgio Reggiardo, Melissa A. Kirwin, Jessie Trueman, Claudia Pansieri, Donato Bonifazi, Sinéad Nally, Fedele Bonifazi, Rebecca Leary, Volker Straub
      First page: 55
      Abstract: The conect4children (c4c) initiative was established to facilitate the development of new drugs and other therapies for paediatric patients. It is widely recognised that there are not enough medicines tested for all relevant ages of the paediatric population. To overcome this, it is imperative that clinical data from different sources are interoperable and can be pooled for larger post hoc studies. c4c has collaborated with the Clinical Data Interchange Standards Consortium (CDISC) to develop cross-cutting data resources that build on existing CDISC standards in an effort to standardise paediatric data. The natural next step was an extension to disease-specific data items. c4c brought together several existing initiatives and resources relevant to disease-specific data and analysed their use for standardising disease-specific data in clinical trials. Several case studies that combined disease-specific data from multiple trials have demonstrated the need for disease-specific data standardisation. We identified three relevant initiatives. These include European Reference Networks, European Joint Programme on Rare Diseases, and Pistoia Alliance. Other resources reviewed were National Cancer Institute Enterprise Vocabulary Services, CDISC standards, pharmaceutical company-specific data dictionaries, Human Phenotype Ontology, Phenopackets, Unified Registry for Inherited Metabolic Disorders, Orphacodes, Rare Disease Cures Accelerator-Data and Analytics Platform (RDCA-DAP), and Observational Medical Outcomes Partnership. The collaborative partners associated with these resources were also reviewed briefly. A plan of action focussed on collaboration was generated for standardising disease-specific paediatric clinical trial data. 
      A paediatric data standards multistakeholder and multi-project user group was established to guide the remaining actions—FAIRification of metadata, a Phenopackets pilot with RDCA-DAP, applying Orphacodes to case report forms of clinical trials, introducing CDISC standards into European Reference Networks, testing of the CDISC Pediatric User Guide using data from the mentioned resources, and organisation of further workshops and educational materials.
      Citation: Data
      PubDate: 2024-04-08
      DOI: 10.3390/data9040055
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 56: A Dataset for Studying the Relationship between
           Human and Smart Devices

    • Authors: Francesco Lelli, Heidi Toivonen
      First page: 56
      Abstract: This dataset reports the responses to a survey designed for investigating the relationship that humans have with their smart devices. The dataset was collected between May and July 2020 and is a sample of over 500 respondents of various ethnicities and backgrounds. These data were used for modeling the ways that people relate to their devices using the notion of agency. However, the data can be used for complementing any study that intends to investigate a tool-mediated communication from the perspective of users, applying a variety of beliefs, attitudes, and expectations that users have in relation to their devices and themselves. This article presents the survey items as well as some preliminary data insights. The collected data were in English and the responses were anonymized to ensure GDPR compliance. The data were stored in a .csv file containing the respondents’ answers to the questions.
      Citation: Data
      PubDate: 2024-04-11
      DOI: 10.3390/data9040056
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 57: Experimental Data on Maximum Swelling Pressure of
           Clayey Soils and Related Soil Properties

    • Authors: Reza Taherdangkoo, Muntasir Shehab, Thomas Nagel, Faramarz Doulati Ardejani, Christoph Butscher
      First page: 57
      Abstract: Clayey soils exhibit significant volumetric changes in response to variations in water content. The swelling pressure of clayey soils is a critical parameter for evaluating the stability and performance of structures built on them, facilitating the development of appropriate design methodologies and mitigation strategies to ensure their long-term integrity and safety. We present a dataset comprising maximum swelling pressure values from 759 compacted soil samples, compiled from 16 articles published between 1994 and 2022. The dataset is classified into two main groups: 463 samples of natural clays and 296 samples of bentonite and bentonite mixtures, providing data on various types of soils and their properties. Different swelling test methods, including zero swelling, swell consolidation, restrained swell, double oedometer, free swelling, constant volume oedometer, UPC isochoric cell, isochoric oedometer and consolidometer, were employed to measure the maximum swelling pressure. The comprehensive nature of the dataset enhances its applicability for geotechnical projects. The dataset is a valuable resource for understanding the complex interactions between soil properties and swelling behavior, contributing to advancements in soil mechanics and geotechnical engineering.
      Citation: Data
      PubDate: 2024-04-16
      DOI: 10.3390/data9040057
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 58: Introduction to Reproducible Geospatial Analysis
           and Figures in R: A Tutorial Article

    • Authors: Philippe Maesen, Edouard Salingros
      First page: 58
      Abstract: The present article is intended to serve an educational purpose for data scientists and students who already have experience with the R language and wish to start using it for geospatial analysis and map creation. The basic concepts of raster data, vector data, CRS and datum are first presented along with a basic workflow to conduct reproducible geospatial research in R. Examples of important types of maps (scatter, bubble, choropleth, hexbin and faceted) created from open-source environmental data are illustrated and their practical implementation in R is discussed. Through these examples, essential manipulations on geospatial vector data are demonstrated (reading, transforming CRS, creating geometries from scratch, buffer zones around existing geometries and intersections between geometries).
      Citation: Data
      PubDate: 2024-04-20
      DOI: 10.3390/data9040058
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 59: Mapping of Data-Sharing Repositories for
           Paediatric Clinical Research—A Rapid Review

    • Authors: Mariagrazia Felisi, Fedele Bonifazi, Maddalena Toma, Claudia Pansieri, Rebecca Leary, Victoria Hedley, Ronald Cornet, Giorgio Reggiardo, Annalisa Landi, Annunziata D’Ercole, Salma Malik, Sinéad Nally, Anando Sen, Avril Palmeri, Donato Bonifazi, Adriana Ceci
      First page: 59
      Abstract: The reuse of paediatric individual patient data (IPD) from clinical trials (CTs) is essential to overcome specific ethical, regulatory, methodological, and economic issues that hinder the progress of paediatric research. Sharing data through repositories enables the aggregation and dissemination of clinical information, fosters collaboration between researchers, and promotes transparency. This work aims to identify and describe existing data-sharing repositories (DSRs) developed to store, share, and reuse paediatric IPD from CTs. A rapid review of platforms providing access to electronic DSRs was conducted. A two-stage process was used to characterize DSRs: a first step of identification, followed by a second step of analysis using a set of eight purpose-built indicators. From an initial set of forty-five publicly available DSRs, twenty-one DSRs were identified as meeting the eligibility criteria. Only two DSRs were found to be totally focused on the paediatric population. Despite an increased awareness of the importance of data sharing, the results of this study show that paediatrics remains an area in which targeted efforts are still needed. Promoting initiatives to raise awareness of these DSRs and creating ad hoc measures and common standards for the sharing of paediatric CT data could help to bridge this gap in paediatric research.
      Citation: Data
      PubDate: 2024-04-20
      DOI: 10.3390/data9040059
      Issue No: Vol. 9, No. 4 (2024)
       
  • Data, Vol. 9, Pages 60: Predicting Academic Success of College Students
           Using Machine Learning Techniques

    • Authors: Jorge Humberto Guanin-Fajardo, Javier Guaña-Moya, Jorge Casillas
      First page: 60
      Abstract: College context and academic performance are important determinants of academic success; using students’ prior experience with machine learning techniques to predict academic success before the end of the first year reinforces college self-efficacy. Dropout prediction is related to student retention and has been studied extensively in recent work; however, there is little literature on predicting academic success using educational machine learning. For this reason, CRISP-DM methodology was applied to extract relevant knowledge and features from the data. The dataset examined consists of 6690 records and 21 variables with academic and socioeconomic information. Preprocessing techniques and classification algorithms were analyzed. The area under the curve was used to measure the effectiveness of the algorithm; XGBoost achieved an AUC of 87.75% and correctly classified eight out of ten cases, while the decision tree improved interpretation with ten rules in seven out of ten cases. Recognizing the gaps in the study and that on-time completion of college consolidates college self-efficacy, creating intervention and support strategies to retain students is a priority for decision makers. Assessing the fairness and discrimination of the algorithms was the main limitation of this work. In the future, we intend to apply the extracted knowledge and learn about its influence on university management.
      Citation: Data
      PubDate: 2024-04-22
      DOI: 10.3390/data9040060
      Issue No: Vol. 9, No. 4 (2024)
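      The abstract above reports classifier quality as area under the ROC curve (AUC). As a hedged illustration of what that number measures, the sketch below computes AUC from its pairwise definition: the fraction of (positive, negative) pairs in which the positive case gets the higher score, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 class labels (1 = positive class)
    scores: iterable of classifier scores, higher = more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy data: three of the four positive/negative pairs are ordered correctly.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)  # 0.75
```

      The double loop is quadratic in the number of samples, which is fine for a demonstration; a rank-based formulation would be preferable for a dataset of several thousand records.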
       
  • Data, Vol. 9, Pages 39: CybAttT: A Dataset of Cyberattack News Tweets for
           Enhanced Threat Intelligence

    • Authors: Huda Lughbi, Mourad Mars, Khaled Almotairi
      First page: 39
      Abstract: The continuous developments in information technologies have resulted in a significant rise in security concerns, including cybercrimes, unauthorized access, and cyberattacks. Recently, researchers have increasingly turned to social media platforms like X to investigate cyberattacks. Collecting and analyzing news about cyberattacks from tweets can efficiently provide crucial insights into the attacks themselves, including their impacts, occurrence regions, and potential mitigation strategies. However, there is a shortage of labeled datasets related to cyberattacks. This paper describes CybAttT, a dataset of 36,071 English cyberattack-related tweets. These tweets are manually labeled into three classes: high-risk news, normal news, and not news. Our final overall inter-annotator agreement was 0.99 (Fleiss kappa), which represents high agreement. To ensure dataset reliability and accuracy, we conducted rigorous experiments using different supervised machine learning algorithms and various fine-tuned language models to assess its quality and suitability for its intended purpose. A high F1-score of 87.6% achieved using the CybAttT dataset not only demonstrates the potential of our approach but also validates the high quality and thoroughness of its annotations. We have made our CybAttT dataset accessible to the public for research purposes.
      Citation: Data
      PubDate: 2024-02-23
      DOI: 10.3390/data9030039
      Issue No: Vol. 9, No. 3 (2024)
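      The agreement figure quoted in the abstract above is a Fleiss kappa. A minimal sketch of the standard formula follows, assuming every item is labeled by the same number of annotators; the function name `fleiss_kappa` and the small count table are hypothetical and unrelated to the CybAttT annotations.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of category counts.

    counts[i][j] = number of annotators who put item i into category j.
    Every row must sum to the same number of annotators n (n >= 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of ratings")

    # Observed agreement: mean over items of the pairwise agreement rate.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    p_e = sum(
        (sum(row[j] for row in counts) / (n_items * n_raters)) ** 2
        for j in range(n_categories)
    )
    # Undefined (division by zero) if all ratings fall in one category.
    return (p_bar - p_e) / (1.0 - p_e)


# Three items, three annotators, three classes, full agreement on each item.
perfect = fleiss_kappa([[3, 0, 0], [0, 3, 0], [0, 0, 3]])
print(perfect)
```

      Values near 1 indicate agreement well above chance, values near 0 indicate agreement no better than chance, and negative values indicate less agreement than chance would produce.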
       
  • Data, Vol. 9, Pages 40: Draft Genome Sequence of Bacillus thuringiensis
           INTA 103-23 Reveals Its Insecticidal Properties: Insights from the Genomic
           Sequence

    • Authors: Leopoldo Palma, Leila Ortiz, José Niz, Marcelo Berretta, Diego Sauka
      First page: 40
      Abstract: The genome of Bacillus thuringiensis strain INTA 103-23 was sequenced, revealing a high-quality draft assembly comprising 243 contigs with a total size of 6.30 Mb and a completeness of 99%. Phylogenetic analysis classified INTA 103-23 within the Bacillus cereus sensu stricto cluster. Genome annotation identified 6993 genes, including 2476 hypothetical proteins. Screening for pesticidal proteins unveiled 10 coding sequences with significant similarity to known pesticidal proteins, showcasing a potential efficacy against various insect orders. AntiSMASH analysis predicted 13 biosynthetic gene clusters (BGCs), including clusters with 100% similarity to petrobactin and anabaenopeptin NZ857/nostamide A. Notably, fengycin exhibited a 40% similarity within the identified clusters. Further exploration involved a comparative genomic analysis with ten phylogenetically closest genomes. The ANI values, calculated using fastANI, confirmed the closest relationships with strains classified under Bacillus cereus sensu stricto. This comprehensive genomic analysis of B. thuringiensis INTA 103-23 provides valuable insights into its genetic makeup, potential pesticidal activity, and biosynthetic capabilities. The identified BGCs and pesticidal proteins contribute to our understanding of the strain’s biocontrol potential against diverse agricultural pests.
      Citation: Data
      PubDate: 2024-02-28
      DOI: 10.3390/data9030040
      Issue No: Vol. 9, No. 3 (2024)
       
  • Data, Vol. 9, Pages 41: Defining the Balearic Islands’ Tourism Data
           Space: An Approach to Functional and Data Requirements

    • Authors: Dolores Ordóñez-Martínez, Joana M. Seguí-Pons, Maurici Ruiz-Pérez
      First page: 41
      Abstract: The definition of a tourism data space (TDS) in the Balearic Islands is a complex process that involves identifying the types of questions to be addressed, including analytical tools, and determining the type of information to be incorporated. This study delves into the functional requirements of a Balearic Islands’ TDS based on the study of scientific research carried out in the field of tourism in the Balearic Islands and drawing comparisons with international scientific research in the field of tourism information. Utilizing a bibliometric analysis of the scientific literature, this study identifies the scientific requirements that should be met for the development of a robust, rigorous, and efficient TDS. The goal is to support excellent scientific research in tourism and facilitate the transfer of research results to the productive sector to maintain and improve the competitiveness of the Balearic Islands as a tourist destination. The results of the analysis provide a structured framework for the construction of the Balearic Islands’ TDS, outlining objectives, methods to be implemented, and information to be considered.
      Citation: Data
      PubDate: 2024-02-29
      DOI: 10.3390/data9030041
      Issue No: Vol. 9, No. 3 (2024)
       
  • Data, Vol. 9, Pages 42: A Set of Ground Penetrating Radar Measures from
           Quarries

    • Authors: Stefano Bonduà, André Monteiro Klen, Massimiliano Pilone, Laurentiu Asimopolos, Natalia-Silvia Asimopolos
      First page: 42
      Abstract: This paper presents a set of Ground Penetrating Radar (GPR) data obtained from in situ measurements conducted in four ornamental stone quarries located in Italy (Botticino quarry) and Romania (Ruschita, Carpinis, and Pietroasa quarries). GPR is a Non-Destructive Testing (NDT) technique that enables, among other capabilities, the detection and localization of fractures without damage to the surface. In this study, two ground-coupled GPR instruments were used to detect and locate fractures, discontinuities, and weakened zones. The GPR data contain radargrams for discontinuity and fracture detection, along with the geographic location of the measurements. For each measurement site, a set of radargrams has been acquired in two orthogonal directions, allowing for a 3D reconstruction of the investigated site.
      Citation: Data
      PubDate: 2024-03-03
      DOI: 10.3390/data9030042
      Issue No: Vol. 9, No. 3 (2024)
       
  • Data, Vol. 9, Pages 43: Pupil Data Upon Stimulation by Auditory Stimuli

    • Authors: Davide La Rosa, Luca Bruschini, Maria Paola Tramonti Fantozzi, Paolo Orsini, Mario Milazzo, Antonino Crivello
      First page: 43
      Abstract: Evaluating hearing in newborns and uncooperative patients can pose a considerable challenge. One potential solution might be to employ the Pupil Dilation Response (PDR) as an objective physiological metric. In this dataset descriptor paper, we present a collection of data showing changes in pupil dimension and shape upon presentation of auditory stimuli. In particular, we collected pupil data from 16 subjects with no known hearing loss, under different lighting conditions, measured in response to a series of 60–100 audible tones, all of the same frequency and amplitude. These data may serve to further investigate any relationship between hearing capabilities and PDRs.
      Citation: Data
      PubDate: 2024-03-05
      DOI: 10.3390/data9030043
      Issue No: Vol. 9, No. 3 (2024)
       
  • Data, Vol. 9, Pages 44: Subjective Well-Being and Mental Health among
           College Students: Two Datasets for Diagnosis and Program Evaluation

    • Authors: Lina Martínez, Esteban Robles, Valeria Trofimoff, Nicolás Vidal, Andrés David Espada, Nayith Mosquera, Bryan Franco, Víctor Sarmiento, María Isabel Zafra
      First page: 44
      Abstract: This paper presents two datasets about college students’ subjective well-being and mental health in a developing country. The first dataset offers a diagnosis of the prevalence of self-reported symptoms associated with stress, anxiety, depression, and overall evaluation of subjective well-being. The study uses validated scales to measure self-reported symptoms related to mental health conditions: the Perceived Stress Scale (PSS-10) to measure stress, the 7-item Generalized Anxiety Disorder Scale (GAD-7) to measure symptoms associated with anxiety, and the 9-item Patient Health Questionnaire (PHQ-9) to measure symptoms associated with depression. This diagnosis was collected in a sample of 3052 undergraduate students in 2022 at a medium-sized university in Colombia. The second dataset reports the evaluation of a positive education intervention implemented in the same university. The Colombian Minister of Science and Technology financed the intervention to promote strategies to mitigate the consequences on college students’ well-being and mental health after the pandemic. The program evaluation data cover two years (2020–2022) with 193 college students in the treatment group (students enrolled in a class teaching evidence-based interventions to promote well-being and mental health awareness) and 135 students in the control group. Data for evaluation include a broad array of variables of life satisfaction, happiness, negative emotions, COVID-19 effects, relationship valuations, and habits, and the measurement of three scales: the Satisfaction with Life Scale (SWLS), a brief measurement of depressive symptomatology (CESD-7), and the Brief Strengths Scale (BSS).
      Citation: Data
      PubDate: 2024-03-06
      DOI: 10.3390/data9030044
      Issue No: Vol. 9, No. 3 (2024)
       
  • Data, Vol. 9, Pages 45: A Dataset of Benthic Species from Mesophotic
           Bioconstructions on the Apulian Coast (Southeastern Italy, Mediterranean
           Sea)

    • Authors: Maria Mercurio, Guadalupe Giménez, Giorgio Bavestrello, Frine Cardone, Giuseppe Corriero, Jacopo Giampaoletti, Maria Flavia Gravina, Cataldo Pierri, Caterina Longo, Adriana Giangrande, Carlotta Nonnis Marzano
      First page: 45
      Abstract: Marine bioconstructions are complex habitats that represent a hotspot of biodiversity. Among Mediterranean bioconstructions, those thriving on mesophotic bottoms on southeastern Italian coasts are of particular interest due to their horizontal and vertical extension. In general, the communities that develop in the Mediterranean twilight zone encompassed within the first 30 m of depth are better known, while relatively few data are available on those at greater depths. By further investigating the diversity and structure of mesophotic bioconstructions in the southern Adriatic, we can improve our understanding of Mediterranean biodiversity while developing effective conservation strategies to preserve these habitats of particular interest. The dataset reported here comprises records of benthic marine taxa from algae and invertebrate mesophotic bioconstructions investigated at six sites along the southern Adriatic coast of Italy, at depths between approximately 25 and 65 m. The dataset contains a total of 1718 records, covering 11 phyla and 648 benthic taxa, of which 580 were recognized at the species level. These data could provide a reference point for further investigations with descriptive or management purposes, including the possible assessment of mesophotic bioconstructions as refuges for shallow-water species.
      Citation: Data
      PubDate: 2024-03-08
      DOI: 10.3390/data9030045
      Issue No: Vol. 9, No. 3 (2024)
       
  • Data, Vol. 9, Pages 46: WEA-Acceptance Data—A Dataset of Acoustic,
           Meteorological, and Operational Wind Turbine Measurements

    • Authors: Daphne Schössow, Stephan Preihs, Jürgen Peissig
      First page: 46
      Abstract: In this article, a dataset is described which combines wind turbine supervisory control and data acquisition (SCADA), meteorological and acoustical data and thus gives a detailed description of a wind farm and its atmospheric and acoustic environment. The data were collected during different seasons for several weeks at a time, such that a multitude of environmental and operational conditions are covered. In five measurement campaigns, in total three different locations with similar surroundings were captured. The raw data were enhanced with derived values such as atmospheric stability or direction of sound propagation. Data of one month including all time series measurements as well as monophonic audio recordings are now published. The dataset also contains three exemplary use cases along with documents that describe the data pre-processing.
      Citation: Data
      PubDate: 2024-03-15
      DOI: 10.3390/data9030046
      Issue No: Vol. 9, No. 3 (2024)
       
  • Data, Vol. 9, Pages 18: Can Data and Machine Learning Change the Future of
           Basic Income Models? A Bayesian Belief Networks Approach

    • Authors: Hamed Khalili
      First page: 18
      Abstract: Appeals to governments for implementing basic income are contemporary. The theoretical backgrounds of the basic income notion only prescribe transferring equal amounts to individuals irrespective of their specific attributes. However, the most recent basic income initiatives all around the world are attached to certain rules with regard to the attributes of the households. This approach is facing significant challenges to appropriately recognize vulnerable groups. A possible alternative for setting rules with regard to the welfare attributes of the households is to employ artificial intelligence algorithms that can process unprecedented amounts of data. Can integrating machine learning change the future of basic income by predicting households vulnerable to future poverty? In this paper, we utilize multidimensional and longitudinal welfare data comprising one and a half million individuals’ data and a Bayesian belief network approach to examine the feasibility of predicting households’ vulnerability to future poverty based on the existing households’ welfare attributes.
      Citation: Data
      PubDate: 2024-01-23
      DOI: 10.3390/data9020018
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 19: Draft Genome Sequence of the Commercial Strain
           Rhizobium ruizarguesonis bv. viciae RCAM1022

    • Authors: Olga A. Kulaeva, Evgeny A. Zorin, Anton S. Sulima, Gulnar A. Akhtemova, Vladimir A. Zhukov
      First page: 19
      Abstract: Legume plants enter a symbiosis with soil nitrogen-fixing bacteria (rhizobia), thereby gaining access to assimilable atmospheric nitrogen. Since this symbiosis is important for agriculture, biofertilizers with effective strains of rhizobia are created for crop legumes to increase their yield and minimize the amounts of mineral fertilizers required. In this work, we sequenced and characterized the genome of Rhizobium ruizarguesonis bv. viciae strain RCAM1022, a component of the ‘Rhizotorfin’ biofertilizer produced in Russia and used for pea (Pisum sativum L.).
      Citation: Data
      PubDate: 2024-01-23
      DOI: 10.3390/data9020019
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 20: An Optimized Hybrid Approach for Feature Selection
           Based on Chi-Square and Particle Swarm Optimization Algorithms

    • Authors: Amani Abdo, Rasha Mostafa, Laila Abdel-Hamid
      First page: 20
      Abstract: Feature selection is a significant issue in the machine learning process. Most datasets include features that are not needed for the problem being studied. These irrelevant features reduce both the efficiency and accuracy of the algorithm. It is possible to think about feature selection as an optimization problem. Swarm intelligence algorithms are promising techniques for solving this problem. This research paper presents a hybrid approach for tackling the problem of feature selection. A filter method (chi-square) and two wrapper swarm intelligence algorithms (grey wolf optimization (GWO) and particle swarm optimization (PSO)) are used in two different techniques to improve feature selection accuracy and system execution time. The performance of the two phases of the proposed approach is assessed using two distinct datasets. The results show that PSOGWO yields a maximum accuracy boost of 95.3%, while chi2-PSOGWO yields a maximum accuracy improvement of 95.961% for feature selection. The experimental results show that the proposed approach performs better than the compared approaches.
      Citation: Data
      PubDate: 2024-01-25
      DOI: 10.3390/data9020020
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 21: MHAiR: A Dataset of Audio-Image Representations
           for Multimodal Human Actions

    • Authors: Muhammad Bilal Shaikh, Douglas Chai, Syed Mohammed Shamsul Islam, Naveed Akhtar
      First page: 21
      Abstract: The audio-image representations for multimodal human actions (MHAiR) dataset contains six different image representations of audio signals that capture the temporal dynamics of the actions in a very compact and informative way. The dataset was extracted from audio recordings captured from an existing video dataset, i.e., UCF101. Each data sample is approximately 10 s long, and the overall dataset was split into 4893 training samples and 1944 testing samples. The resulting feature sequences were then converted into images, which can be used for human action recognition and other related tasks. These images can be used as a benchmark dataset for evaluating the performance of machine learning models for human action recognition and related tasks. These audio-image representations could be suitable for a wide range of applications, such as surveillance, healthcare monitoring, and robotics. The dataset can also be used for transfer learning, where pre-trained models can be fine-tuned on a specific task using specific audio images. Thus, this dataset can facilitate the development of new techniques and approaches for improving the accuracy of human action-related tasks and also serve as a standard benchmark for testing the performance of different machine learning models and algorithms.
      Citation: Data
      PubDate: 2024-01-25
      DOI: 10.3390/data9020021
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 22: Genomic Epidemiology Dataset for the Important
           Nosocomial Pathogenic Bacterium Acinetobacter baumannii

    • Authors: Andrey Shelenkov, Yulia Mikhaylova, Vasiliy Akimkin
      First page: 22
      Abstract: The infections caused by various bacterial pathogens both in clinical and community settings represent a significant threat to public healthcare worldwide. The growing resistance to antimicrobial drugs acquired by bacterial species causing healthcare-associated infections has already become a life-threatening danger noticed by the World Health Organization. Several groups or lineages of bacterial isolates, usually called ‘the clones of high risk’, often drive the spread of resistance within particular species. Thus, it is vitally important to reveal and track the spread of such clones and the mechanisms by which they acquire antibiotic resistance and enhance their survival skills. Currently, the analysis of whole-genome sequences for bacterial isolates of interest is increasingly used for these purposes, including epidemiological surveillance and the development of spread prevention measures. However, the availability and uniformity of the data derived from genomic sequences often represent a bottleneck for such investigations. With this dataset, we present the results of a genomic epidemiology analysis of 17,546 genomes of a dangerous bacterial pathogen, Acinetobacter baumannii. Important typing information, including multilocus sequence typing (MLST)-based sequence types (STs), intrinsic blaOXA-51-like gene variants, capsular (KL) and oligosaccharide (OCL) types, CRISPR-Cas systems, and cgMLST profiles are presented, as well as the assignment of particular isolates to nine known international clones of high risk. The presence of antimicrobial resistance genes within the genomes is also reported. These data will be useful for researchers in the field of A. baumannii genomic epidemiology, resistance analysis, and prevention measure development.
      Citation: Data
      PubDate: 2024-01-26
      DOI: 10.3390/data9020022
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 23: Comprehensive Dataset on Pre-SARS-CoV-2 Infection
           Sports-Related Physical Activity Levels, Disease Severity, and Treatment
           Outcomes: Insights and Implications for COVID-19 Management

    • Authors: Dimitrios I. Bourdas, Panteleimon Bakirtzoglou, Antonios K. Travlos, Vasileios Andrianopoulos, Emmanouil Zacharakis
      First page: 23
      Abstract: This dataset aimed to explore associations between pre-SARS-CoV-2 infection exercise and sports-related physical activity (PA) levels and disease severity, along with treatments administered following the most recent SARS-CoV-2 infection. A comprehensive analysis investigated the relationships between PA categories (“Inactive”, “Low PA”, “Moderate PA”, “High PA”), disease severity (“Sporadic”, “Episodic”, “Recurrent”, “Frequent”, “Persistent”), and treatments post-SARS-CoV-2 infection (“No treatment”, “Home remedies”, “Prescribed medication”, “Hospital admission”, “Intensive care unit admission”) within a sample population (n = 5829) from the Hellenic territory. Utilizing the Active-Q questionnaire, data were collected from February to March 2023, capturing PA habits, participant characteristics, medical history, vaccination status, and illness experiences. Findings revealed an independent relationship between preinfection PA levels and disease severity (χ² = 9.097, df = 12, p = 0.695). Additionally, a statistical dependency emerged between PA levels and illness treatment categories (χ² = 39.362, df = 12, p < 0.001), particularly linking inactive PA with home remedies treatment. These results highlight the potential influence of preinfection PA on disease severity and treatment choices following SARS-CoV-2 infection. The dataset offers valuable insights into the interplay between PA, disease outcomes, and treatment decisions, aiding future research in shaping targeted interventions and public health strategies related to COVID-19 management.
      Citation: Data
      PubDate: 2024-01-26
      DOI: 10.3390/data9020023
      Issue No: Vol. 9, No. 2 (2024)
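
The two chi-square results quoted in the abstract above can be sanity-checked from the reported statistics alone. The sketch below is not the authors' analysis: it assumes a standard test of independence on a 4 x 5 contingency table (four PA categories against five severity or treatment categories), which gives df = (4 - 1)(5 - 1) = 12, and it uses the closed-form upper-tail probability that holds for even degrees of freedom. The function name `chi2_sf_even_df` is a hypothetical helper introduced for illustration.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    half = x / 2.0
    term, total = 1.0, 1.0  # i = 0 term of the series
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Assumed table shape: 4 PA categories x 5 outcome categories.
df = (4 - 1) * (5 - 1)

# Test statistics as reported in the abstract.
p_severity = chi2_sf_even_df(9.097, df)
p_treatment = chi2_sf_even_df(39.362, df)

print(f"df = {df}")
print(f"PA vs. severity:  p = {p_severity:.3f}")
print(f"PA vs. treatment: p = {p_treatment:.2e}")
```

Under these assumptions the recomputed values agree with the reported p = 0.695 and p < 0.001; a library routine for the chi-square survival function would serve equally well where one is available.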
       
  • Data, Vol. 9, Pages 24: Mapping Hierarchical File Structures to Semantic
           Data Models for Efficient Data Integration into Research Data Management
           Systems

    • Authors: Henrik tom Wörden, Florian Spreckelsen, Stefan Luther, Ulrich Parlitz, Alexander Schlemmer
      First page: 24
      Abstract: Although other methods exist to store and manage data in modern information technology, the standard solution is file systems. Therefore, keeping well-organized file structures and file system layouts can be key to a sustainable research data management infrastructure. However, file structures alone lack several important capabilities for FAIR data management: the two most significant being insufficient visualization of data and inadequate possibilities for searching and obtaining an overview. Research data management systems (RDMSs) can fill this gap, but many do not support the simultaneous use of the file system and RDMS. This simultaneous use can have many benefits, but keeping data in RDMS in synchrony with the file structure is challenging. Here, we present concepts that allow for keeping file structures and semantic data models (in RDMS) synchronous. Furthermore, we propose a specification in yaml format that allows for a structured and extensible declaration and implementation of a mapping between the file system and data models used in semantic research data management. Implementing these concepts will facilitate the re-use of specifications for multiple use cases. Furthermore, the specification can serve as a machine-readable and, at the same time, human-readable documentation of specific file system structures. We demonstrate our work using the Open Source RDMS LinkAhead (previously named “CaosDB”).
      Citation: Data
      PubDate: 2024-01-26
      DOI: 10.3390/data9020024
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 25: Curating, Collecting, and Cataloguing Global
           COVID-19 Datasets for the Aim of Predicting Personalized Risk

    • Authors: Sepehr Golriz Khatami, Astghik Sargsyan, Maria Francesca Russo, Daniel Domingo-Fernández, Andrea Zaliani, Abish Kaladharan, Priya Sethumadhavan, Sarah Mubeen, Yojana Gadiya, Reagon Karki, Stephan Gebel, Ram Kumar Ruppa Surulinathan, Vanessa Lage-Rupprecht, Saulius Archipovas, Geltrude Mingrone, Marc Jacobs, Carsten Claussen, Martin Hofmann-Apitius, Alpha Tom Kodamullil
      First page: 25
      Abstract: Although hundreds of datasets have been published since the beginning of the coronavirus pandemic, there is a lack of centralized resources where these datasets are listed and harmonized to facilitate their applicability and uptake by predictive modeling approaches. Firstly, such a centralized resource provides information about data owners to researchers who are searching for datasets to develop their predictive models. Secondly, the harmonization of the datasets supports simultaneously taking advantage of several similar datasets. This, in turn, not only eases the imperative external validation of data-driven models but can also be used for virtual cohort generation, which helps to overcome data sharing impediments. Here, we present the COVID-19 data catalogue, a repository that provides a landscape view of COVID-19 studies and datasets as a putative source to enable researchers to develop personalized COVID-19 predictive risk models. The COVID-19 data catalogue currently contains over 400 studies and their relevant information collected from a wide range of global sources such as global initiatives, clinical trial repositories, publications, and data repositories. Further, the curated content stored in this data catalogue is complemented by a web application, providing visualizations of these studies, including their references, relevant information such as measured variables, and the geographical locations of where these studies were performed. This resource is one of the first to capture, organize, and store studies, datasets, and metadata related to COVID-19 in a comprehensive repository. We believe that our work will facilitate future research and development of personalized predictive risk models for COVID-19.
      Citation: Data
      PubDate: 2024-01-29
      DOI: 10.3390/data9020025
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 26: Dataset for Electronics and Plasmonics in
           Graphene, Silicene, and Germanene Nanostrips

    • Authors: Talia Tene, Nataly Bonilla García, Miguel Ángel Sáez Paguay, John Vera, Marco Guevara, Cristian Vacacela Gomez, Stefano Bellucci
      First page: 26
      Abstract: The quest for novel materials with extraordinary electronic and plasmonic properties is an ongoing pursuit in the field of materials science. The dataset provides the results of a computational study that used ab initio and semi-analytical computations to model freestanding nanosystems. We delve into the world of ribbon-like materials, specifically graphene nanoribbons, silicene nanoribbons, and germanene nanoribbons, comparing their electronic and plasmonic characteristics. Our research reveals a myriad of insights, from the tunability of band structures and the influence of an atomic number on electronic properties to the adaptability of nanoribbons for optoelectronic applications. Further, we uncover the promise of these materials for biosensing, demonstrating their plasmon frequency tunability based on charge density and Fermi velocity modification. Our findings not only expand the understanding of these quasi-1D materials but also open new avenues for the development of cutting-edge devices and technologies. This data presentation holds immense potential for future advancements in electronics, optics, and molecular sensing.
      Citation: Data
      PubDate: 2024-01-30
      DOI: 10.3390/data9020026
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 27: Understanding Data Breach from a Global
           Perspective: Incident Visualization and Data Protection Law Review

    • Authors: Gabriel Arquelau Pimenta Rodrigues, André Luiz Marques Serrano, Amanda Nunes Lopes Espiñeira Lemos, Edna Dias Canedo, Fábio Lúcio Lopes de Mendonça, Robson de Oliveira Albuquerque, Ana Lucila Sandoval Orozco, Luis Javier García Villalba
      First page: 27
      Abstract: Data breaches result in data loss, including personal, health, and financial information that is crucial, sensitive, and private. The breach is a security incident in which personal and sensitive data are exposed to unauthorized individuals, with the potential to incur several privacy concerns. As an example, the French newspaper Le Figaro breached approximately 7.4 billion records that included full names, passwords, and e-mail and physical addresses. To reduce the likelihood and impact of such breaches, it is fundamental to strengthen the security efforts against this type of incident and, for that, it is first necessary to identify patterns of its occurrence, primarily related to the number of data records leaked, the affected geographical region, and its regulatory aspects. To advance the discussion in this regard, we study a dataset comprising 428 worldwide data breaches between 2018 and 2019, providing a visualization of the related statistics, such as the most affected countries, the predominant economic sector targeted in different countries, and the median number of records leaked per incident in different countries, regions, and sectors. We then discuss the data protection regulation in effect in each country included in the dataset, correlating key elements of the legislation with the statistical findings. As a result, we have identified an extensive disclosure of medical records in India and government data in Brazil in the time range. Based on the analysis and visualization, we find some interesting insights that researchers have seldom focused on, and it is apparent that the real dangers of data leaks are beyond the ordinary imagination. Finally, this paper contributes to the discussion regarding data protection laws and compliance regarding data breaches, supporting, for example, the decision process of data storage location in the cloud.
      Citation: Data
      PubDate: 2024-01-31
      DOI: 10.3390/data9020027
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 28: Organ-On-A-Chip (OOC) Image Dataset for Machine
           Learning and Tissue Model Evaluation

    • Authors: Valērija Movčana, Arnis Strods, Karīna Narbute, Fēlikss Rūmnieks, Roberts Rimša, Gatis Mozoļevskis, Maksims Ivanovs, Roberts Kadiķis, Kārlis Gustavs Zviedris, Laura Leja, Anastasija Zujeva, Tamāra Laimiņa, Arturs Abols
      First page: 28
      Abstract: Organ-on-a-chip (OOC) technology has emerged as a groundbreaking approach for emulating the physiological environment, revolutionizing biomedical research, drug development, and personalized medicine. OOC platforms offer more physiologically relevant microenvironments, enabling real-time monitoring of tissue, to develop functional tissue models. Imaging methods are the most common approach for daily monitoring of tissue development. Image-based machine learning serves as a valuable tool for enhancing and monitoring OOC models in real-time. This involves the classification of images generated through microscopy, contributing to the refinement of model performance. This paper presents an image dataset containing cell images generated from an OOC setup with different cell types. There are 3072 images generated by an automated brightfield microscopy setup. For some images, parameters such as cell type, seeding density, time after seeding and flow rate are provided. These parameters along with predefined criteria can contribute to the evaluation of image quality and identification of potential artifacts. This dataset can be used as a basis for training machine learning classifiers for automated data analysis generated from an OOC setup providing more reliable tissue models, automated decision-making processes within the OOC framework and efficient research in the future.
      Citation: Data
      PubDate: 2024-02-01
      DOI: 10.3390/data9020028
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 29: A Comprehensive Data Pipeline for Comparing the
           Effects of Momentum on Sports Leagues

    • Authors: Jordan Truman Paul Noel, Vinicius Prado da Fonseca, Amilcar Soares
      First page: 29
      Abstract: Momentum has been a consistently studied aspect of sports science for decades. Among the established literature, there has, at times, been a discrepancy between conclusions. However, if momentum is indeed an actual phenomenon, it would affect all aspects of sports, from player evaluation to pre-game prediction and betting. Therefore, using momentum-based features that quantify a team’s linear trend of play, we develop a data pipeline that uses a small sample of recent games to assess teams’ quality of play and measure the predictive power of momentum-based features versus the predictive power of more traditional frequency-based features across several leagues using several machine learning techniques. More precisely, we use our pipeline to determine the differences in the predictive power of momentum-based features and standard statistical features for the National Hockey League (NHL), National Basketball Association (NBA), and five major first-division European football leagues. Our findings show little evidence that momentum has superior predictive power in the NBA. Still, we found some instances of the effects of momentum on the NHL that produced better pre-game predictors, whereas we view a similar trend in European football/soccer. Our results indicate that momentum-based features combined with frequency-based features could improve pre-game prediction models and that, in the future, momentum should be studied more from a feature/performance indicator point-of-view and less from the view of the dependence of sequential outcomes, thus attempting to distance momentum from the binary view of winning and losing.
      Citation: Data
      PubDate: 2024-02-01
      DOI: 10.3390/data9020029
      Issue No: Vol. 9, No. 2 (2024)
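
The abstract above contrasts momentum-based features, which quantify a team's linear trend of play over a small sample of recent games, with more traditional frequency-based features. The sketch below illustrates that distinction only; it is not the authors' pipeline. It assumes the trend is an ordinary least-squares slope over a fixed window and the frequency-based counterpart is the plain window average. The names `linear_trend`, `window_features`, and the goal-difference series are hypothetical.

```python
from statistics import mean


def linear_trend(values):
    """Ordinary least-squares slope of values against game index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two games to fit a trend")
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def window_features(values, window=5):
    """Return a momentum-style and a frequency-style feature for recent games."""
    recent = values[-window:]
    return {
        "trend": linear_trend(recent),  # direction and rate of change
        "average": mean(recent),        # level of play, ignoring order
    }


# Hypothetical per-game goal differences, oldest first.
goal_diff = [-2, -1, 0, 1, 2, 3]
features = window_features(goal_diff, window=5)
print(features)
```

Two teams with the same window average can have opposite slopes, which is the information a trend feature adds on top of an average in a pre-game prediction model.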
       
  • Data, Vol. 9, Pages 30: Expanded Brain CT Dataset for the Development of
           AI Systems for Intracranial Hemorrhage Detection and Classification

    • Authors: Anna N. Khoruzhaya, Tatiana M. Bobrovskaya, Dmitriy V. Kozlov, Dmitriy Kuligovskiy, Vladimir P. Novik, Kirill M. Arzamasov, Elena I. Kremneva
      First page: 30
      Abstract: Intracranial hemorrhage (ICH) is a dangerous life-threatening condition leading to disability. Timely and high-quality diagnosis plays a huge role in the course and outcome of this disease. The gold standard in determining ICH is computed tomography. This method requires a prompt involvement of highly qualified personnel, which is not always possible, for example, in case of a staff shortage or increased workload. In such a situation, every minute counts, and time can be lost. The solution to this problem seems to be a set of diagnostic decisions, including the use of artificial intelligence, which will help to identify patients with ICH in a timely manner and provide prompt and quality medical care. However, the main obstacle to the development of artificial intelligence is a lack of high-quality datasets for training and testing. In this paper, we present a dataset including 800 brain CT scans consisting of multiple series of DICOM images with and without signs of ICH, enriched with clinical and technical parameters, as well as the methodology of its generation utilizing natural language processing tools. The dataset is publicly available, which contributes to increased competition in the development of artificial intelligence systems and their advancement and quality improvement.
      Citation: Data
      PubDate: 2024-02-06
      DOI: 10.3390/data9020030
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 31: The Yinshan Mountains Record over 10,000
           Landslides

    • Authors: Jingjing Sun, Chong Xu, Liye Feng, Lei Li, Xuewei Zhang, Wentao Yang
      First page: 31
      Abstract: China boasts a vast expanse of mountainous terrain, characterized by intricate geological conditions and structural features, resulting in frequent geological disasters. Among these, landslides, as prototypical geological hazards, pose significant threats to both lives and property. Consequently, conducting a comprehensive landslide inventory in mountainous regions is imperative for current research. This study concentrates on the Yinshan Mountains, an ancient fault-block mountain range spanning east–west in the central Inner Mongolia Autonomous Region, extending from Langshan Mountains in the west to Damaqun Mountains in the east, with the narrow sense Xiao–Yin Mountains District in between. Employing multi-temporal high-resolution remote sensing images from Google Earth, this study conducted visual interpretation, identifying 10,968 landslides in the Yinshan area, encompassing a total area of 308.94 km². The largest landslide occupies 2.95 km², while the smallest covers 84.47 m². Specifically, the Langshan area comprises 331 landslides with a total area of 11.96 km², the narrow sense Xiao–Yin Mountains include 3393 landslides covering 64.13 km², and the Manhan Mountains, Damaqun Mountains, and adjacent areas account for 7244 landslides over a total area of 232.85 km². This research not only contributes to global landslide cataloging initiatives but also serves as a robust foundation for future geohazard prevention and management efforts.
      Citation: Data
      PubDate: 2024-02-08
      DOI: 10.3390/data9020031
      Issue No: Vol. 9, No. 2 (2024)
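
The regional figures quoted in the abstract above can be cross-checked against the stated totals. This is a small consistency sketch that uses only the numbers given in the abstract; the dictionary keys are shortened labels chosen for illustration, and `Decimal` is used so the two-decimal areas add up exactly.

```python
from decimal import Decimal

# (landslide count, total area in km^2) per sub-region, as quoted in the abstract.
regions = {
    "Langshan": (331, Decimal("11.96")),
    "Xiao-Yin Mountains (narrow sense)": (3393, Decimal("64.13")),
    "Manhan, Damaqun and adjacent areas": (7244, Decimal("232.85")),
}

total_count = sum(count for count, _ in regions.values())
total_area = sum(area for _, area in regions.values())

print(f"total landslides: {total_count}")
print(f"total area: {total_area} km^2")

# Share of the inventory contributed by each sub-region.
for name, (count, area) in regions.items():
    print(f"{name}: {count / total_count:.1%} of landslides, "
          f"{area / total_area:.1%} of area")
```

The sums reproduce the 10,968 landslides and 308.94 km² reported for the whole Yinshan area.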
       
  • Data, Vol. 9, Pages 32: Data in Astrophysics and Geophysics: Novel
           Research and Applications

    • Authors: Vladimir A. Srećković, Milan S. Dimitrijević, Zoran R. Mijić
      First page: 32
      Abstract: Rapid development of communication technologies and constant technological improvements as a result of scientific discoveries require the establishment of specific databases [...]
      Citation: Data
      PubDate: 2024-02-08
      DOI: 10.3390/data9020032
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 33: Conflicting Marks Archive Dataset: A Dataset of
           Conflicting Marks from the Brazilian Intellectual Property Office

    • Authors: Igor Bezerra Reis, Rafael Ângelo Santos Leite, Mateus Miranda Torres, Alcides Gonçalves da Silva Neto, Francisco José da Silva e Silva, Ariel Soares Teles
      First page: 33
      Abstract: A registered trademark represents one of a company’s most valuable intellectual assets, acting as a safeguard against possible reputational damage and financial losses resulting from infringements of this intellectual property. To be registered, a mark must be unique and distinctive in relation to other trademarks which are already registered. In this paper, we describe the CMAD, an acronym for Conflicting Marks Archive Dataset. This dataset has been meticulously organized into pairs of marks (Number of pairs = 18,355) involved in copyright infringement across word, figurative and mixed marks. Organizations sought to register these marks with the National Institute of Industrial Property (INPI) in Brazil, and had their applications denied after analysis by intellectual property specialists. The robustness of this dataset is ensured by the intrinsic similarity of the conflicting marks, since the decisions were made by INPI specialists. This characteristic provides a reliable basis for the development and testing of tools designed to analyze similarity between marks, thus contributing to the evolution of practices and computer-based solutions in the field of intellectual property.
      Citation: Data
      PubDate: 2024-02-09
      DOI: 10.3390/data9020033
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 34: Draft Genome Sequencing of the Bacillus
           thuringiensis var. Thuringiensis Highly Insecticidal Strain 800/15

    • Authors: Anton E. Shikov, Iuliia A. Savina, Maria N. Romanenko, Anton A. Nizhnikov, Kirill S. Antonets
      First page: 34
      Abstract: The Bacillus thuringiensis serovar thuringiensis strain 800/15 has been actively used as an agent in biopreparations with high insecticidal activity against the larvae of the Colorado potato beetle Leptinotarsa decemlineata and gypsy moth Lymantria dispar. In the current study, we present the first draft genome of the 800/15 strain coupled with a comparative genomic analysis of its closest reference strains. The raw sequence data were obtained by Illumina technology on the HiSeq X platform and de novo assembled with the SPAdes v3.15.4 software. The genome reached 6,524,663 bp in size and carried 6771 coding sequences, 3 of which represented loci encoding insecticidal toxins, namely, Spp1Aa1, Cry1Ab9, and Cry1Ba8, active against the orders Lepidoptera, Blattodea, Hemiptera, Diptera, and Coleoptera. We also revealed the biosynthetic gene clusters responsible for the synthesis of secondary metabolites, including fengycin, bacillibactin, and petrobactin, with predicted antibacterial, fungicidal, and growth-promoting properties. Further comparative genomics suggested the strain is not enriched with genes linked with biological activities, implying that agriculturally important properties rely more on the composition of loci than on their abundance. The obtained genomic sequence of the strain with the experimental metadata could facilitate the computational prediction of bacterial isolates’ potency from genomic data.
      Citation: Data
      PubDate: 2024-02-10
      DOI: 10.3390/data9020034
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 35: COVID-19 Lockdown Effects on Sleep, Immune
           Fitness, Mood, Quality of Life, and Academic Functioning: Survey Data from
           Turkish University Students

    • Authors: Pauline A. Hendriksen, Sema Tan, Evi C. van Oostrom, Agnese Merlo, Hilal Bardakçi, Nilay Aksoy, Johan Garssen, Gillian Bruce, Joris C. Verster
      First page: 35
      Abstract: Previous studies from the Netherlands, Germany, and Argentina revealed that the 2019 coronavirus disease (COVID-19) pandemic and associated lockdown periods had a significant negative impact on the wellbeing and quality of life of students. The negative impact of lockdown periods on health correlates such as immune fitness, alcohol consumption, and mood was reflected in their academic functioning. As both the duration and intensity of lockdown measures differed between countries, it is important to replicate these findings in different countries and cultures. Therefore, the purpose of the current study was to examine the impact of the COVID-19 pandemic on immune fitness, mood, academic functioning, sleep, smoking, alcohol consumption, healthy diet, and quality of life among Turkish students. Turkish students in the age range of 18 to 30 years old were invited to complete an online survey. Data were collected from n = 307 participants and included retrospective assessments for six time periods: (1) BP (before the COVID-19 pandemic, 1 January 2020–10 March 2020), (2) NL1 (the first no lockdown period, 11 March 2020–28 April 2021), (3) the lockdown period (29 April 2021–17 May 2021), (4) NL2 (the second no lockdown period, 18 May 2021–31 December 2021), (5) NL3 (the third no lockdown period, 1 January 2022–December 2022), and (6) the past month. In this data descriptor article, the content of the survey and the dataset are described.
      Citation: Data
      PubDate: 2024-02-10
      DOI: 10.3390/data9020035
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 36: AriAplBud: An Aerial Multi-Growth Stage Apple
           Flower Bud Dataset for Agricultural Object Detection Benchmarking

    • Authors: Wenan Yuan
      First page: 36
      Abstract: As one of the most important topics in contemporary computer vision research, object detection has received wide attention from the precision agriculture community for diverse applications. While state-of-the-art object detection frameworks are usually evaluated against large-scale public datasets containing mostly non-agricultural objects, a specialized dataset that reflects unique properties of plants would aid researchers in investigating the utility of newly developed object detectors within agricultural contexts. This article presents AriAplBud: a close-up apple flower bud image dataset created using an unmanned aerial vehicle (UAV)-based red–green–blue (RGB) camera. AriAplBud contains 3600 images of apple flower buds at six growth stages, with 110,467 manual bounding box annotations as positive samples and 2520 additional empty orchard images containing no apple flower bud as negative samples. AriAplBud can be directly deployed for developing object detection models that accept Darknet annotation format without additional preprocessing steps, serving as a potential benchmark for future agricultural object detection research. A demonstration of developing YOLOv8-based apple flower bud detectors is also presented in this article.
      Citation: Data
      PubDate: 2024-02-11
      DOI: 10.3390/data9020036
      Issue No: Vol. 9, No. 2 (2024)
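
      The abstract above notes that AriAplBud ships annotations in Darknet format. As a rough illustration only, the sketch below converts one label line in the commonly used Darknet/YOLO layout (class id, then box center and size normalized to the image dimensions) into pixel corner coordinates. The function name, the example line, and the image size are hypothetical and are not taken from the dataset.

```python
def darknet_to_pixel_box(line, img_w, img_h):
    """Convert one Darknet/YOLO label line to (class_id, x_min, y_min, x_max, y_max) in pixels.

    Assumes the usual layout: "<class_id> <x_center> <y_center> <width> <height>",
    with the four box values normalized to the range 0..1.
    """
    class_id, xc, yc, w, h = line.split()
    # Scale the normalized center and size back to pixel units.
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


# Hypothetical label line and image size, for illustration only.
print(darknet_to_pixel_box("2 0.5 0.5 0.25 0.5", 400, 200))
# -> (2, 150.0, 50.0, 250.0, 150.0)
```

      Detectors that read Darknet labels directly do not need this step; it is mainly useful for plotting or for converting to corner-based formats.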
       
  • Data, Vol. 9, Pages 37: Digital Elevation Models and Orthomosaics of the
           Dutch Noordwest Natuurkern Foredune Restoration Project

    • Authors: Gerben Ruessink, Dick Groenendijk, Bas Arens
      First page: 37
      Abstract: Coastal dunes worldwide are increasingly under pressure from the adverse effects of human activities. Therefore, more and more restoration measures are being taken to create conditions that help disturbed coastal dune ecosystems regenerate or recover naturally. However, many projects lack the (open-access) monitoring observations needed to signal whether further actions are needed, and hence lack the opportunity to "learn by doing". This submission presents an open-access data set of 37 high-resolution digital elevation models and 24 orthomosaics collected before and after the excavation of five artificial foredune trough blowouts (“notches”) in winter 2012/2013 in the Dutch Zuid-Kennemerland National Park, one of the largest coastal dune restoration projects in northwest Europe. These high-resolution data provide a valuable resource for improving understanding of the biogeomorphic processes that determine the evolution of restored dune systems as well as developing guidelines to better design future restoration efforts with foredune notching.
      Citation: Data
      PubDate: 2024-02-15
      DOI: 10.3390/data9020037
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 38: Multimodal Hinglish Tweet Dataset for Deep
           Pragmatic Analysis

    • Authors: Pratibha, Amandeep Kaur, Meenu Khurana, Robertas Damaševičius
      First page: 38
      Abstract: Wars, conflicts, and peace efforts have become inherent characteristics of regions, and understanding the prevailing sentiments related to these issues is crucial for finding long-lasting solutions. Twitter/X, with its vast user base and real-time nature, provides a valuable source to assess the raw emotions and opinions of people regarding war, conflict, and peace. This paper focuses on collecting and curating Hinglish tweets specifically related to wars, conflicts, and associated taxonomy. The creation of this dataset addresses the existing gap in contemporary literature, which lacks comprehensive datasets capturing the emotions and sentiments expressed by individuals regarding wars, conflicts, and peace efforts. This dataset holds significant value and application in deep pragmatic analysis as it enables future researchers to identify the flow of sentiments, analyze the information architecture surrounding war, conflict, and peace effects, and delve into the associated psychology in this context. To ensure the dataset’s quality and relevance, a meticulous selection process was employed, resulting in the inclusion of 500 carefully chosen, explainable search filters. The dataset currently has 10,040 tweets that have been validated through human expert review to make sure they are correct and accurate.
      Citation: Data
      PubDate: 2024-02-15
      DOI: 10.3390/data9020038
      Issue No: Vol. 9, No. 2 (2024)
       
  • Data, Vol. 9, Pages 11: ADAS Simulation Result Dataset Processing Based on
           Improved BP Neural Network

    • Authors: Songyan Zhao, Lingshan Chen, Yongchao Huang
      First page: 11
      Abstract: The autonomous driving simulation field lacks evaluation and forecasting systems for simulation results. The data obtained from the simulation of target algorithms and vehicle models cannot be reasonably estimated. This problem affects subsequent vehicle improvement and parameter calibration. We relied on the simulation results of the AEB algorithm. We selected the BP neural network as the basis and improved it with a genetic algorithm optimized via a roulette algorithm. The regression evaluation indicators of the prediction results show that the GA-BP neural network has better prediction accuracy and generalization ability than the original BP neural network and other optimized BP neural networks. This GA-BP neural network also fills the gap in evaluation and prediction systems.
      Citation: Data
      PubDate: 2024-01-05
      DOI: 10.3390/data9010011
      Issue No: Vol. 9, No. 1 (2024)
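
      The abstract above mentions a genetic algorithm tuned with a roulette algorithm but gives no details. As a point of reference, the sketch below shows standard roulette-wheel (fitness-proportionate) selection, in which each individual is picked with probability proportional to its fitness. It is an assumption that this is the variant meant; the function name and the toy population are hypothetical and do not reproduce the authors' implementation.

```python
import random


def roulette_select(population, fitness, rng=random):
    """Pick one individual with probability proportional to its non-negative fitness."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("fitness values must sum to a positive number")
    # Spin the wheel: a point in [0, total) falls into exactly one fitness slice.
    threshold = rng.random() * total
    running = 0.0
    for individual, score in zip(population, fitness):
        running += score
        if running > threshold:
            return individual
    return population[-1]  # guard against floating-point round-off


# Toy example: "b" has three times the fitness of "a", so it is drawn about 75% of the time.
rng = random.Random(0)
draws = [roulette_select(["a", "b"], [1.0, 3.0], rng) for _ in range(10_000)]
print(draws.count("b") / len(draws))
```

      In a GA-BP setup, the selected individuals would typically encode candidate network weights, with fitness derived from prediction error; that mapping is left out here.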
       
  • Data, Vol. 9, Pages 12: DeepSpaceYoloDataset: Annotated Astronomical
           Images Captured with Smart Telescopes

    • Authors: Olivier Parisot
      First page: 12
      Abstract: Recent smart telescopes allow the automatic collection of a large quantity of data for specific portions of the night sky, with the goal of capturing images of deep sky objects (nebulae, galaxies, globular clusters). Nevertheless, human verification is still required afterwards to check whether celestial targets are effectively visible in the images produced by these instruments. Depending on the magnitude of deep sky objects, the observation conditions and the cumulative time of data acquisition, it is possible that only stars are present in the images. In addition, unfavorable external conditions (light pollution, bright moon, etc.) can make capture difficult. In this paper, we describe DeepSpaceYoloDataset, a set of 4696 RGB astronomical images captured by two smart telescopes and annotated with the positions of deep sky objects that are effectively in the images. This dataset can be used to train detection models on this type of image, enabling better control of the duration of capture sessions, and also to detect unexpected celestial events such as supernovae.
      Citation: Data
      PubDate: 2024-01-10
      DOI: 10.3390/data9010012
      Issue No: Vol. 9, No. 1 (2024)
       
  • Data, Vol. 9, Pages 13: Adaptive Forecasting in Energy Consumption: A
           Bibliometric Analysis and Review

    • Authors: Manuel Jaramillo, Wilson Pavón, Lisbeth Jaramillo
      First page: 13
      Abstract: This paper addresses the challenges in forecasting electrical energy in the current era of renewable energy integration. It reviews advanced adaptive forecasting methodologies while also analyzing the evolution of research in this field through bibliometric analysis. The review highlights the key contributions and limitations of current models with an emphasis on the challenges of traditional methods. The analysis reveals that Long Short-Term Memory (LSTM) networks, optimization techniques, and deep learning have the potential to model the dynamic nature of energy consumption, but they also have higher computational demands and data requirements. This review aims to offer a balanced view of current advancements and challenges in forecasting methods, guiding researchers, policymakers, and industry experts. It advocates for collaborative innovation in adaptive methodologies to enhance forecasting accuracy and support the development of resilient, sustainable energy systems.
      Citation: Data
      PubDate: 2024-01-11
      DOI: 10.3390/data9010013
      Issue No: Vol. 9, No. 1 (2024)
       
  • Data, Vol. 9, Pages 14: GeMSyD: Generic Framework for Synthetic Data
           Generation

    • Authors: Ramona Tolas, Raluca Portase, Rodica Potolea
      First page: 14
      Abstract: In the era of data-driven technologies, the need for diverse and high-quality datasets for training and testing machine learning models has become increasingly critical. In this article, we present a versatile methodology, the Generic Methodology for Constructing Synthetic Data Generation (GeMSyD), which addresses the challenge of synthetic data creation in the context of smart devices. GeMSyD provides a framework that enables the generation of synthetic datasets, aligning them closely with real-world data. To demonstrate the utility of GeMSyD, we instantiate the methodology by constructing a synthetic data generation framework tailored to the domain of event-based data modeling, specifically focusing on user interactions with smart devices. Our framework leverages GeMSyD to create synthetic datasets that faithfully emulate the dynamics of human–device interactions, including the temporal dependencies. Furthermore, we showcase how the synthetic data generated using our framework can serve as a valuable resource for machine learning practitioners. By employing these synthetic datasets, we perform a series of experiments to evaluate the performance of a neural-network-based prediction model in the domain of smart device interaction. Our results underscore the potential of synthetic data in facilitating model development and benchmarking.
      Citation: Data
      PubDate: 2024-01-11
      DOI: 10.3390/data9010014
      Issue No: Vol. 9, No. 1 (2024)
       
  • Data, Vol. 9, Pages 15: Proteomic and Metabolomic Analyses of the Blood
           Samples of Highly Trained Athletes

    • Authors: Kristina A. Malsagova, Arthur T. Kopylov, Vasiliy I. Pustovoyt, Evgenii I. Balakin, Ksenia A. Yurku, Alexander A. Stepanov, Liudmila I. Kulikova, Vladimir R. Rudnev, Anna L. Kaysheva
      First page: 15
      Abstract: High exercise loading causes intricate and ambiguous proteomic and metabolic changes. This study aims to describe the dataset on protein and metabolite contents in plasma samples collected from highly trained athletes across different sports disciplines. The proteomic and metabolomic analyses of the plasma samples of highly trained athletes engaged in sports disciplines of different intensities were carried out using HPLC-MS/MS. The results are reported as two datasets (proteomic data in a derived mgf-file and metabolomic data in processed format), each containing the findings obtained by analyzing 93 mass spectra. Variations in the protein and metabolite contents of the biological samples are observed, depending on the intensity of training load for different sports disciplines. Mass spectrometric proteomic and metabolomic studies can be used for classifying different athlete phenotypes according to the intensity of sports discipline and for the assessment of the efficiency of the recovery period.
      Citation: Data
      PubDate: 2024-01-16
      DOI: 10.3390/data9010015
      Issue No: Vol. 9, No. 1 (2024)
       
  • Data, Vol. 9, Pages 16: Elliott State Research Forest Timber Cruise,
           Oregon, 2015–2016

    • Authors: Todd West, Bogdan M. Strimbu
      First page: 16
      Abstract: The Elliott State Research Forest comprises 33,700 ha of temperate, Douglas-fir rainforest along North America’s Pacific Coast (Oregon, United States). In 2015, naturally regenerated stands at least 92 years old covered 49% of the research area and sawtimber plantations younger than 68 years another 50%. During the winter of 2015–2016, a forest wide inventory sampled both naturally regenerated and plantation stands, recording 97,424 trees on 17,866 plots in 738 stands. The resulting dataset is atypical for the area as plot locations were not restricted to upland, commercially harvestable timber. Multiage stands and riparian areas were therefore documented along with plantations 2–61 years old and trees retained through clearcut harvests. This dataset constitutes the only open access, stand-based forest inventory currently available for a large area within the Oregon Coast Range. The dataset enables development of suites of models as well as many comparisons across stand ages and types, both at stand level and at the level of individual trees.
      Citation: Data
      PubDate: 2024-01-18
      DOI: 10.3390/data9010016
      Issue No: Vol. 9, No. 1 (2024)
       
  • Data, Vol. 9, Pages 17: Machine Learning Classification Workflow and
           Datasets for Ionospheric VLF Data Exclusion

    • Authors: Arnaut, Kolarski, Srećković
      First page: 17
      Abstract: Machine learning (ML) methods are commonly applied in the fields of extraterrestrial physics, space science, and plasma physics. In a prior publication, an ML classification technique, the Random Forest (RF) algorithm, was utilized to automatically identify and categorize erroneous signals, including instrument errors, noisy signals, outlier data points, and the impact of solar flares (SFs) on the ionosphere. This data communication includes the pre-processed dataset used in the aforementioned research, along with a workflow that utilizes the PyCaret library and a post-processing workflow. The code and data serve educational purposes in the interdisciplinary field of ML and ionospheric physics science, as well as being useful to other researchers for diverse objectives.
      Citation: Data
      PubDate: 2024-01-18
      DOI: 10.3390/data9010017
      Issue No: Vol. 9, No. 1 (2024)
       
  • Data, Vol. 9, Pages 1: Expert-Annotated Dataset to Study Cyberbullying in
           Polish Language

    • Authors: Michal Ptaszynski, Agata Pieciukiewicz, Pawel Dybala, Pawel Skrzek, Kamil Soliwoda, Marcin Fortuna, Gniewosz Leliwa, Michal Wroczynski
      First page: 1
      Abstract: We introduce the first dataset of harmful and offensive language collected from the Polish Internet. This dataset was meticulously curated to facilitate the exploration of harmful online phenomena such as cyberbullying and hate speech, which have exhibited a significant surge both within the Polish Internet as well as globally. The dataset was systematically collected and then annotated using two approaches. First, it was annotated by two proficient layperson volunteers, operating under the guidance of a specialist in the language of cyberbullying and hate speech. To enhance the precision of the annotations, a secondary round of annotations was carried out by a team of adept annotators with specialized long-term expertise in cyberbullying and hate speech annotations. This second phase was further overseen by an experienced annotator, acting as a super-annotator. In its initial application, the dataset was leveraged for the categorization of cyberbullying instances in the Polish language. Specifically, the dataset serves as the foundation for two distinct tasks: (1) a binary classification that segregates harmful and non-harmful messages and (2) a multi-class classification that distinguishes between two variations of harmful content (cyberbullying and hate speech), as well as a non-harmful category. Alongside the dataset itself, we also provide the models that showed satisfying classification performance. These models are made accessible for third-party use in constructing cyberbullying prevention systems.
      Citation: Data
      PubDate: 2023-12-20
      DOI: 10.3390/data9010001
      Issue No: Vol. 9, No. 1 (2023)
       
  • Data, Vol. 9, Pages 2: Medical Opinions Analysis about the Decrease of
           Autopsies Using Emerging Pattern Mining

    • Authors: Isaac Machorro-Cano, Ingrid Aylin Ríos-Méndez, José Antonio Palet-Guzmán, Nidia Rodríguez-Mazahua, Lisbeth Rodríguez-Mazahua, Giner Alor-Hernández, José Oscar Olmedo-Aguirre
      First page: 2
      Abstract: An autopsy is a widely recognized procedure to guarantee ongoing enhancements in medicine. It finds extensive application in legal, scientific, medical, and research domains. However, declining autopsy rates in hospitals constitute a worldwide concern. For example, the Regional Hospital of Rio Blanco in Veracruz, Mexico, has substantially reduced the number of autopsies at hospitals in recent years. Since there are no documented historical records of a decrease in the frequency of autopsy cases, it is crucial to establish a methodological framework to substantiate any actual trends in the data. Emerging pattern mining (EPM) allows for finding differences between classes or data sets because it builds a descriptive data model concerning some given remarkable property. Data set description has become a significant application area in various contexts in recent years. In this research study, various EPM (emerging pattern mining) algorithms were used to extract emergent patterns from a data set collected based on medical experts’ perspectives on reducing hospital autopsies. Notably, the top-performing EPM algorithms were iEPMiner, LCMine, SJEP-C, Top-k minimal SJEPs, and Tree-based JEP-C. Among these, iEPMiner and LCMine demonstrated faster performance and produced superior emergent patterns when considering metrics such as Confidence, Weighted Relative Accuracy Criteria (WRACC), False Positive Rate (FPR), and True Positive Rate (TPR).
      Citation: Data
      PubDate: 2023-12-21
      DOI: 10.3390/data9010002
      Issue No: Vol. 9, No. 1 (2023)
       
  • Data, Vol. 9, Pages 3: Unlocking Insights: Analysing COVID-19 Lockdown
           Policies and Mobility Data in Victoria, Australia, through a Data-Driven
           Machine Learning Approach

    • Authors: Shiyang Lyu, Oyelola Adegboye, Kiki Adhinugraha, Theophilus I Emeto, David Taniar
      First page: 3
      Abstract: The state of Victoria, Australia, implemented one of the world’s most prolonged cumulative lockdowns in 2020 and 2021. Although lockdowns have proven effective in managing COVID-19 worldwide, this approach faced challenges in containing the rising infection in Victoria. This study evaluates the effects of short-term (less than 60 days) and long-term (more than 60 days) lockdowns on public mobility and the effectiveness of various social restriction measures within these periods. The aim is to understand the complexities of pandemic management by examining various measures over different lockdown durations, thereby contributing to more effective COVID-19 containment methods. Using restriction policy, community mobility, and COVID-19 data, a machine-learning-based simulation model was proposed, incorporating analysis of correlation, infection doubling time, and effective lockdown date. The model result highlights the significant impact of public event cancellations in preventing COVID-19 infection during short- and long-term lockdowns and the importance of international travel controls in long-term lockdowns. The effectiveness of social restriction was found to decrease significantly with the transition from short to long lockdowns, characterised by increased visits to public places and increased use of public transport, which may be associated with an increase in the effective reproduction number (Rt) and infected cases.
      Citation: Data
      PubDate: 2023-12-21
      DOI: 10.3390/data9010003
      Issue No: Vol. 9, No. 1 (2023)
       
  • Data, Vol. 9, Pages 4: An Urban Traffic Dataset Composed of Visible Images
           and Their Semantic Segmentation Generated by the CARLA Simulator

    • Authors: Sergio Bemposta Rosende, David San José Gavilán, Javier Fernández-Andrés, Javier Sánchez-Soriano
      First page: 4
      Abstract: A dataset of aerial urban traffic images and their semantic segmentation is presented for training computer vision algorithms, among which those based on convolutional neural networks stand out. This article explains the process of creating the complete dataset, which includes the acquisition of the images, the labeling of vehicles, pedestrians, and pedestrian crossings, as well as a description of the structure and content of the dataset (which amounts to 8694 images including visible images and those corresponding to the semantic segmentation). The images were generated using the CARLA simulator (and resemble those that could be obtained with fixed aerial cameras or multi-copter drones) in the field of intelligent transportation management. The presented dataset is available and accessible to improve the performance of vision and road traffic management systems, especially for the detection of incorrect or dangerous maneuvers.
      Citation: Data
      PubDate: 2023-12-24
      DOI: 10.3390/data9010004
      Issue No: Vol. 9, No. 1 (2023)
       
  • Data, Vol. 9, Pages 5: Single-Nucleotide Variants in PADI2 and PADI4 and
           Ancestry Informative Markers in Interstitial Lung Disease and Rheumatoid
           Arthritis among a Mexican Mestizo Population

    • Authors: Karol J. Nava-Quiroz, Jorge Rojas-Serrano, Gloria Pérez-Rubio, Ivette Buendia-Roldan, Mayra Mejía, Juan Carlos Fernández-López, Espiridión Ramos-Martínez, Luis A. López-Flores, Alma D. Del Ángel-Pablo, Ramcés Falfán-Valencia
      First page: 5
      Abstract: Rheumatoid arthritis (RA) is an autoimmune disease mainly characterized by joint inflammation. It presents extra-articular manifestations, with the lungs being one of the affected areas. Among these, damage to the pulmonary interstitium (Interstitial Lung Disease—ILD) has been linked to proteins involved in the inflammatory process and related to extracellular matrix deposition and lung fibrosis establishment. Peptidyl arginine deiminase enzymes (PAD), which carry out protein citrullination, play a role in this context. A genetic association analysis was conducted on genes encoding two PAD isoforms: PAD2 and PAD4. This analysis also included ancestry informative markers and protein level determination in samples from patients with RA, RA-associated ILD, and clinically healthy controls. Significant single nucleotide variants (SNV) and one haplotype were identified as susceptibility factors for RA-ILD development. Elevated levels of PAD4 were found in RA-ILD cases, while PADI2 showed an association with RA susceptibility. This work presents data obtained from previously published research. Population variability has been noticed in genetic association studies. We present data for 14 SNVs that show geographical and genetic variation across the Mexican population, which provides highly informative content and greater intrapopulation genetic diversity. Further investigations in the field should be considered in addition to AIMs. The data presented in this study were analyzed in association with SNV genotypes in PADI2 and PADI4 to assess susceptibility to ILD in RA, as well as with changes in PAD2 and PAD4 protein levels according to carrier genotype, in addition to the use of covariates such as ancestry markers.
      Citation: Data
      PubDate: 2023-12-25
      DOI: 10.3390/data9010005
      Issue No: Vol. 9, No. 1 (2023)
       
  • Data, Vol. 9, Pages 6: A Profit Maximization Model for Data Consumers with
           Data Providers’ Incentives in Personal Data Trading Market

    • Authors: Hyojin Park, Hyeontaek Oh, Jun Kyun Choi
      First page: 6
      Abstract: This paper proposes a profit maximization model for a data consumer when it buys personal data from data providers (by obtaining consent) through data brokers and provides their new services to data providers (i.e., service consumers). To observe the behavioral models of data providers, the data consumer, and service consumers, this paper proposes the willingness-to-sell model of personal data of data providers (which is affected by data providers’ behavior related to explicit consent), the service quality model obtained by the collected personal data from the data consumer’s perspective, and the willingness-to-pay model of service consumers regarding provided new services from the data consumer. Particularly, this paper jointly considers the behavior of data providers and service users under a limited budget. With parameters inspired by real-world surveys on data providers, this paper shows various numerical results to check the feasibility of the proposed models.
      Citation: Data
      PubDate: 2023-12-25
      DOI: 10.3390/data9010006
      Issue No: Vol. 9, No. 1 (2023)
       
  • Data, Vol. 9, Pages 7: Data-Driven Analysis of MRI Scans: Exploring Brain
           Structure Variations in Colombian Adolescent Offenders

    • Authors: Germán Sánchez-Torres, Nallig Leal, Mariana Pino
      First page: 7
      Abstract: With the advancements in neuroimaging techniques, understanding the relationship between brain morphology and behavioral tendencies such as criminal behavior has garnered interest. This research addresses the investigation of disparities in neuroanatomical structures between adolescent offenders and non-offenders and considers the implications of such distinctions regarding offender behavior within adolescent populations. Employing data-driven methodologies, MRI scans of adolescents from Barranquilla, Colombia, were analyzed to explore morphological variations. Utilizing a 1.5 Tesla Siemens resonator (Siemens Healthineers, Erlangen, Germany), T1-weighted MPRAGE anatomical images were acquired and analyzed using a systematic five-step methodology including data acquisition, MRI pre-processing, feature selection, model selection, and model validation and evaluation. Participants, both offenders and non-offenders, were aged 14–18 and selected based on education, criminal history, and physical conditions. The research identified significant disparities in the volumes of 42 brain structures between adolescent offenders (AOs) and non-offenders (NOs), highlighting particular brain regions potentially associated with offending behavior. Additionally, a considerable proportion of AOs emanated from lower socioeconomic backgrounds and showcased marked substance use. The findings suggest that neuroanatomical disparities potentially correlate with criminal behavior among adolescents at a neurobiological level. Noticeable socio-environmental factors, such as lower socioeconomic status and substance abuse, were substantially prevalent among AOs. Particularly, neurobiological deviations in structures like ctx-lh-rostralmiddlefrontal and ctx-lh-caudalanteriorcingulate perhaps represent a link between neurological factors and external stimuli.
      Citation: Data
      PubDate: 2023-12-26
      DOI: 10.3390/data9010007
      Issue No: Vol. 9, No. 1 (2023)
       
  • Data, Vol. 9, Pages 8: DNA Methylome and Transcriptome Maps of Primary
           Colorectal Cancer and Matched Liver Metastasis

    • Authors: Ajithkumar, Gimenez, Stockwell, Almomani, Bowden, Leichter, Ahn, Pattison, Schmeier, Frizelle, Eccles, Purcell, Rodger, Chatterjee
      First page: 8
      Abstract: Sequencing-based genome-wide DNA methylation, gene expression studies and associated data on paired colorectal cancer (CRC) primary and liver metastasis are very limited. We have profiled the DNA methylome and transcriptome of matched primary CRC and liver metastasis samples from the same patients. Genome-scale methylation and expression levels were examined using Reduced Representation Bisulfite Sequencing (RRBS) and RNA-Seq, respectively. To investigate DNA methylation and expression patterns, we generated a total of 1.01 × 10^9 RRBS reads and 4.38 × 10^8 RNA-Seq reads from the matched cancer tissues. Here, we describe in detail the sample features, experimental design, methods and bioinformatic pipeline for these epigenetic data. We demonstrate the quality of both the samples and sequence data obtained from the paired samples. The sequencing data obtained from this study will serve as a valuable resource for studying underlying mechanisms of distant metastasis and the utility of epigenetic profiles in cancer metastasis.
      Citation: Data
      PubDate: 2023-12-29
      DOI: 10.3390/data9010008
      Issue No: Vol. 9, No. 1 (2023)
       
  • Data, Vol. 9, Pages 9: Wi-Gitation: Replica Wi-Fi CSI Dataset for Physical
           Agitation Activity Recognition

    • Authors: Nikita Sharma, Jeroen Klein Brinke, L. M. A. Braakman Jansen, Paul J. M. Havinga, Duc V. Le
      First page: 9
      Abstract: Agitation is a commonly found behavioral condition in persons with advanced dementia. It requires continuous monitoring to gain insights into agitation levels to assist caregivers in delivering adequate care. The available monitoring techniques use cameras and wearables which are distressful and intrusive and are thus often rejected by older adults. To enable continuous monitoring in older adult care, unobtrusive Wi-Fi channel state information (CSI) can be leveraged to monitor physical activities related to agitation. However, to the best of our knowledge, there are no realistic CSI datasets available for facilitating the classification of physical activities demonstrated during agitation scenarios such as disturbed walking, repetitive sitting–getting up, tapping on a surface, hand wringing, rubbing on a surface, flipping objects, and kicking. Therefore, in this paper, we present a public dataset named Wi-Gitation. For Wi-Gitation, the Wi-Fi CSI data were collected with twenty-three healthy participants depicting the aforementioned agitation-related physical activities at two different locations in a one-bedroom apartment with multiple receivers placed at different distances (0.5–8 m) from the participants. The validation results on the Wi-Gitation dataset indicate higher accuracies (F1-Scores ≥0.95) when employing mixed-data analysis, where the training and testing data share the same distribution. Conversely, in scenarios where the training and testing data differ in distribution (i.e., leave-one-out), the accuracies experienced a notable decline (F1-Scores ≤0.21). This dataset can be used for fundamental research on CSI signals and in the evaluation of advanced algorithms developed for tackling domain invariance in CSI-based human activity recognition.
      Citation: Data
      PubDate: 2023-12-30
      DOI: 10.3390/data9010009
      Issue No: Vol. 9, No. 1 (2023)
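      The two evaluation settings named in the Wi-Gitation abstract above (mixed-data versus leave-one-out) differ only in how recordings are assigned to training and test sets. The following is a minimal sketch of a leave-one-participant-out split, assuming that "leave-one-out" means holding out one participant per fold; the function name and the participant labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx), holding out one group per fold."""
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    for held_out in sorted(by_group):
        test_idx = list(by_group[held_out])
        # Training indices are every sample that does not belong to the held-out group.
        train_idx = sorted(
            i for group, idxs in by_group.items() if group != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical participant label for each of six recordings.
participants = ["p1", "p1", "p2", "p2", "p3", "p3"]
folds = list(leave_one_group_out(participants))
```

      In a mixed-data split, recordings from every participant would appear on both sides of the split, which is why the training and test distributions match in that setting.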
       
  • Data, Vol. 9, Pages 10: Experimental Dataset of Tunable Mode Converter
           Based on Long-Period Fiber Gratings Written in Few-Mode Fiber: Impacts of
           Thermal, Wavelength, and Polarization Variations

    • Authors: Juan Soto-Perdomo, Erick Reyes-Vera, Jorge Montoya-Cardona, Pedro Torres
      First page: 10
      Abstract: Mode division multiplexing (MDM) is currently one of the most attractive multiplexing techniques in optical communications, as it allows for an increase in the number of channels available for data transmission. Optical modal converters are one of the main devices used in this technique. Therefore, the characterization and improvement of these devices are of great current interest. In this work, we present a dataset of 49,736 near-field intensity images of a modal converter based on a long-period fiber grating (LPFG) written on a few-mode fiber (FMF). This characterization was performed experimentally at various wavelengths, polarizations, and temperature conditions when the device converted from LP01 mode to LP11 mode. The results show that the modal converter can be tuned by adjusting these parameters, and that its operation is optimal under specific circumstances which have a great impact on its performance. Additionally, the potential application of the database is validated in this work. A modal decomposition technique based on the particle swarm algorithm (PSO) was employed as a tool for determining the most effective combinations of modal weights and relative phases from the spatial distributions collected in the dataset. The proposed dataset can open up new opportunities for researchers working on image segmentation, detection, and classification problems related to MDM technology. In addition, we implement novel artificial intelligence techniques that can help in finding the optimal operating conditions for this type of device.
      Citation: Data
      PubDate: 2023-12-31
      DOI: 10.3390/data9010010
      Issue No: Vol. 9, No. 1 (2023)
       
  • Data, Vol. 8, Pages 174: Machine Learning Applications to Identify Young
           Offenders Using Data from Cognitive Function Tests

    • Authors: María Claudia Bonfante, Juan Contreras Montes, Mariana Pino, Ronald Ruiz, Gabriel González
      First page: 174
      Abstract: Machine learning techniques can be used to identify whether deficits in cognitive functions contribute to antisocial and aggressive behavior. This paper initially presents the results of tests conducted on delinquent and nondelinquent youths to assess their cognitive functions. The dataset extracted from these assessments, consisting of 37 predictor variables and one target, was used to train three algorithms which aim to predict whether the data correspond to those of a young offender or a nonoffending youth. Prior to this, statistical tests were conducted on the data to identify characteristics which exhibited significant differences in order to select the most relevant features and optimize the prediction results. Additionally, other feature selection methods, such as Boruta, RFE, and filter, were applied, and their effects on the accuracy of each of the three machine learning models used (SVM, RF, and KNN) were compared. In total, 80% of the data were utilized for training, while the remaining 20% were used for validation. The best result was achieved by the K-NN model, trained with 19 features selected by the Boruta method, followed by the SVM model, trained with 24 features selected by the filter method.
      Citation: Data
      PubDate: 2023-11-21
      DOI: 10.3390/data8120174
      Issue No: Vol. 8, No. 12 (2023)
       
  • Data, Vol. 8, Pages 175: Long-Term Spatiotemporal Oceanographic Data from
           the Northeast Pacific Ocean: 1980–2022 Reconstruction Based on the
           Korea Oceanographic Data Center (KODC) Dataset

    • Authors: Seong-Hyeon Kim, Hansoo Kim
      First page: 175
      Abstract: The Korea Oceanographic Data Center (KODC), overseen by the National Institute of Fisheries Science (NIFS), is a pivotal hub for collecting, processing, and disseminating marine science data. By digitizing and subjecting observational data to rigorous quality control, the KODC ensures accurate information in line with international standards. The center actively engages in global partnerships and fosters marine data exchange. A wide array of marine information is provided through the KODC website, including observational metadata, coastal oceanographic data, real-time buoy records, and fishery environmental data. Coastal oceanographic observational data from 207 stations across various sea regions have been collected biannually since 1961. This dataset covers 14 standard water depths; includes essential parameters, such as temperature, salinity, nutrients, and pH; serves as the foundation for news, reports, and analyses by the NIFS; and is widely employed to study seasonal and regional marine variations, with researchers supplementing the limited data for comprehensive insights. The dataset offers information for each water depth at a 1 m interval over 1980–2022, facilitating research across disciplines. Data processing, including interpolation and quality control, is based on MATLAB. These data are classified by region and accessible online; hence, researchers can easily explore spatiotemporal trends in marine environments.
      Citation: Data
      PubDate: 2023-11-23
      DOI: 10.3390/data8120175
      Issue No: Vol. 8, No. 12 (2023)
       
  • Data, Vol. 8, Pages 176: Model Design and Applied Methodology in
           Geothermal Simulations in Very Low Enthalpy for Big Data Applications

    • Authors: Roberto Arranz-Revenga, María Pilar Dorrego de Luxán, Juan Herrera Herbert, Luis Enrique García Cambronero
      First page: 176
      Abstract: Low-enthalpy geothermal installations for heating, air conditioning, and domestic hot water are gaining traction due to efforts towards energy decarbonization. This article is part of a broader research project aimed at employing artificial intelligence and big data techniques to develop a predictive system for the thermal behavior of the ground in very low-enthalpy geothermal applications. In this initial article, a summarized process is outlined to generate large quantities of synthetic data through a ground simulation method. The proposed theoretical model allows simulation of the soil’s thermal behavior using an electrical equivalent. The electrical circuit derived is loaded into a simulation program along with an input function representing the system’s thermal load pattern. The simulator responds with another function that calculates the values of the ground over time. Some examples of value conversion and the utility of the input function system to encode thermal loads during simulation are demonstrated. The model is not valid in the presence of underground water currents. Model validation is pending; once the validation approach is defined, a corresponding testing plan will be proposed.
      Citation: Data
      PubDate: 2023-11-23
      DOI: 10.3390/data8120176
      Issue No: Vol. 8, No. 12 (2023)
       
  • Data, Vol. 8, Pages 177: Dataset: Impact of β-galactosylceramidase
           Overexpression on the Protein Profile of Braf(V600E) Mutated Melanoma
           Cells

    • Authors: Davide Capoferri, Paola Chiodelli, Stefano Calza, Marcello Manfredi, Marco Presta
      First page: 177
      Abstract: β-Galactosylceramidase (GALC) is a lysosomal enzyme involved in sphingolipid metabolism by removing β-galactosyl moieties from β-galactosyl ceramide and β-galactosyl sphingosine. Previous observations have shown that GALC exerts a pro-oncogenic activity in human melanoma. Here, the impact of GALC overexpression on the proteomic landscape of BRAF-mutated A2058 and A375 human melanoma cell lines was investigated by liquid chromatography–tandem mass spectrometry analysis of the cell extracts. The results indicate that GALC overexpression causes the upregulation/downregulation of 172/99 proteins in GALC-transduced cells when compared to control cells. Gene ontology categorization of up/down-regulated proteins indicates that GALC may modulate the protein landscape in BRAF-mutated melanoma cells by affecting various biological processes, including RNA metabolism, cell organelle fate, and intracellular redox status. Overall, these data provide further insights into the pro-oncogenic functions of the sphingolipid metabolizing enzyme GALC in human melanoma.
      Citation: Data
      PubDate: 2023-11-24
      DOI: 10.3390/data8120177
      Issue No: Vol. 8, No. 12 (2023)
       
  • Data, Vol. 8, Pages 178: In Vivo Drug Testing during Embryonic Wound
           Healing: Establishing the Avian Model

    • Authors: Martin Bablok, Beate Brand-Saberi, Morris Gellisch, Gabriela Morosan-Puopolo
      First page: 178
      Abstract: The relevance of identifying pathological processes in the context of embryonic development is increasingly gaining attention in terms of professionalized prenatal care. To analyze local effects of prenatally administered drugs during embryonic development, the model organism of the chicken embryo can be used in a first exploratory approach. For the examination of local dexamethasone administration—as an exemplary drug—common bead implantation protocols have been adapted to serve as an in vivo technique for local drug testing during embryonic skin regeneration. For this, acrylic beads were soaked in a dexamethasone solution and implanted into skin incisional wounds of 4-day-old chicken embryos. After further incubation, the effects of the applied substance on the process of embryonic skin regeneration were analyzed using histological and molecular biological techniques. This data descriptor contains a detailed microsurgical protocol, a representative video demonstration, and exemplary results of local glucocorticoid-induced changes during embryonic wound healing. To conclude, this method allows for the analysis of the local effects of a particular substance on a cellular level and can be extended to serve as an in vivo technique for numerous other drugs to be tested on embryonic tissue.
      Citation: Data
      PubDate: 2023-11-25
      DOI: 10.3390/data8120178
      Issue No: Vol. 8, No. 12 (2023)
       
  • Data, Vol. 8, Pages 179: A Tourist-Based Framework for Developing Digital
           Marketing for Small and Medium-Sized Enterprises in the Tourism Sector in
           Saudi Arabia

    • Authors: Rishaa Abdulaziz Alnajim, Bahjat Fakieh
      First page: 179
      Abstract: Social media has become an essential tool for travel planning, with tourists increasingly using it to research destinations, book accommodation, and make travel arrangements. However, little is known about how tourists use social media for travel planning and what factors influence their intentions to use social media for this purpose. This thesis aims to understand tourists’ intentions to use social media for travel planning. Specifically, it investigates the factors influencing tourists’ intentions to use social media for planning travel to Saudi Arabia. It develops a machine learning (ML) classification model to assist Saudi tourism SMEs in creating effective digital marketing strategies for social media platforms. A survey was conducted with 573 tourists interested in visiting Saudi Arabia, using the Design Science Research (DSR) approach. The findings support the tourist-based theoretical framework, showing that perceived usefulness (PU), perceived ease of use (PEOU), satisfaction (SAT), marketing-generated content (MGC), and user-generated content (UGC) significantly impact tourists’ intentions to use social media for travel planning. Tourists’ characteristics and visit characteristics influenced their intentions to use MGC but not UGC. The tourist-based ML classification model, developed using the LinearSVC algorithm, achieved an accuracy of 99% when evaluated using the K-Fold Cross-Validation (KF-CV) technique. The findings of this study have several implications for Saudi tourism SMEs. First, the results suggest that SMEs should focus on developing social media content that is perceived as useful, easy to use, and satisfying. Second, the findings suggest that SMEs should focus on using MGC in their social media marketing campaigns. Third, the results suggest that SMEs should tailor their social media marketing campaigns to the characteristics of their target tourists. 
      This study contributes to the literature on tourism marketing and social media by providing a better understanding of how tourists use social media for travel planning. Saudi tourism SMEs can use the findings of this study to develop more effective digital marketing strategies for social media platforms.
      Citation: Data
      PubDate: 2023-11-28
      DOI: 10.3390/data8120179
      Issue No: Vol. 8, No. 12 (2023)
       
  • Data, Vol. 8, Pages 180: Public Perception of ChatGPT and Transfer
           Learning for Tweets Sentiment Analysis Using Wolfram Mathematica

    • Authors: Yankang Su, Zbigniew J. Kabala
      First page: 180
      Abstract: Understanding public opinion on ChatGPT is crucial for recognizing its strengths and areas of concern. By utilizing natural language processing (NLP), this study delves into tweets regarding ChatGPT to determine temporal patterns, content features, and topic modeling and perform a sentiment analysis. Analyzing a dataset of 500,000 tweets, our research shifts from conventional data science tools like Python and R to exploit Wolfram Mathematica’s robust capabilities. Additionally, with the aim of solving the problem of ignoring semantic information in the LDA model feature extraction, a synergistic methodology entwining LDA, GloVe embeddings, and K-Nearest Neighbors (KNN) clustering is proposed to categorize topics within ChatGPT-related tweets. This comprehensive strategy ensures semantic, syntactic, and topical congruence within classified groups by utilizing the strengths of probabilistic modeling, semantic embeddings, and similarity-based clustering. While built-in sentiment classifiers often fall short in accuracy, we introduce four transfer learning techniques from the Wolfram Neural Net Repository to address this gap. Two of these techniques involve transferring static word embeddings, “GloVe” and “ConceptNet”, which are further processed using an LSTM layer. The remaining techniques center on fine-tuning pre-trained models using scantily annotated data; one refines embeddings from language models (ELMo), while the other fine-tunes bidirectional encoder representations from transformers (BERT). Our experiments on the dataset underscore the effectiveness of the four methods for the sentiment analysis of tweets. This investigation augments our comprehension of user sentiment towards ChatGPT and emphasizes the continued significance of exploration in this domain. 
      Furthermore, this work serves as a pivotal reference for scholars who are accustomed to using Wolfram Mathematica in other research domains, aiding their efforts in text analytics on social media platforms.
      Citation: Data
      PubDate: 2023-11-28
      DOI: 10.3390/data8120180
      Issue No: Vol. 8, No. 12 (2023)
       
  • Data, Vol. 8, Pages 181: Internationalization in the Baltic Regional
           Accounts: A NUTS 3 Region Dataset

    • Authors: Rasmus Bøgh Holmen, Nicolas Gavoille, Jaan Masso, Arūnas Burinskas
      First page: 181
      Abstract: Features of internationalization, such as trade, foreign direct investments, and international migration, are crucial for understanding the economic developments of small and open economies. However, studying internationalization at the country level may obscure significant heterogeneity in its relationship with economic growth and other economic and social outcomes. Regional accounts provide insights into the geography of internationalization, but collections of such disaggregated statistics are rarely provided by statistical bureaus. The purpose of this paper is twofold. First, we demonstrate how regional account data, including internationalization indicators, can be constructed to obtain consistent and homogeneous regional-level series using a combination of micro and macro data sources. Second, our aim is to foster spatial research on internationalization and the spatial economy in the Baltics by providing comprehensive data collection of socio-economic variables at the NUTS 3 regional level over time. This collection encompasses trade, FDI, and migration, enabling the study of internationalization and other features of the Baltic economy. We present a series of key features, revealing noticeable correlation patterns between regional development and internationalization.
      Citation: Data
      PubDate: 2023-11-30
      DOI: 10.3390/data8120181
      Issue No: Vol. 8, No. 12 (2023)
       
  • Data, Vol. 8, Pages 182: An Automated Big Data Quality Anomaly Correction
           Framework Using Predictive Analysis

    • Authors: Widad Elouataoui, Saida El Mendili, Youssef Gahi
      First page: 182
      Abstract: Big data has emerged as a fundamental component in various domains, enabling organizations to extract valuable insights and make informed decisions. However, ensuring data quality is crucial for effectively using big data. Thus, big data quality has been gaining more attention in recent years from researchers and practitioners due to its significant impact on decision-making processes. However, existing studies addressing data quality anomalies often have a limited scope, concentrating on specific aspects such as outliers or inconsistencies. Moreover, many approaches are context-specific, lacking a generic solution applicable across different domains. To the best of our knowledge, no existing framework automatically addresses quality anomalies comprehensively and generically, considering all aspects of data quality. To fill this gap, we propose a framework that automatically corrects big data quality anomalies using an intelligent predictive model. The proposed framework comprehensively addresses the main aspects of data quality by considering six key quality dimensions: Accuracy, Completeness, Conformity, Uniqueness, Consistency, and Readability. Moreover, the framework is not tied to a specific field and is designed to be applicable across various areas, offering a generic approach to address data quality anomalies. The proposed framework was implemented on two datasets and achieved an accuracy of 98.22%. Moreover, the results show that the framework raised the data quality score to 99%, with an improvement rate of up to 14.76% of the quality score.
      Citation: Data
      PubDate: 2023-12-01
      DOI: 10.3390/data8120182
      Issue No: Vol. 8, No. 12 (2023)
       
  • Data, Vol. 8, Pages 183: Spectrogram Dataset of Korean Smartphone Audio
           Files Forged Using the “Mix Paste” Command

    • Authors: Yeongmin Son, Won Jun Kwak, Jae Wan Park
      First page: 183
      Abstract: This study focuses on the field of voice forgery detection, which is increasing in importance owing to the introduction of advanced voice editing technologies and the proliferation of smartphones. This study introduces a unique dataset that was built specifically to identify forgeries created using the “Mix Paste” technique. This editing technique can overlay audio segments from similar or different environments without creating a new timeframe, making it nearly infeasible to detect forgeries using traditional methods. The dataset consists of 4665 and 45,672 spectrogram images from 1555 original audio files and 15,224 forged audio files, respectively. The original audio was recorded using iPhone and Samsung Galaxy smartphones to ensure a realistic sampling environment. The forged files were created from these recordings and subsequently converted into spectrograms. The dataset also provides the metadata of the original voice files, offering additional context and information that can be used for analysis and detection. This dataset not only fills a gap in existing research but also provides valuable support for developing more efficient deep learning models for voice forgery detection. By addressing the “Mix Paste” technique, the dataset caters to a critical need in voice authentication and forensics, potentially contributing to enhancing security in society.
      Citation: Data
      PubDate: 2023-12-01
      DOI: 10.3390/data8120183
      Issue No: Vol. 8, No. 12 (2023)
       
  • Data, Vol. 8, Pages 184: An Urban Image Stimulus Set Generated from Social
           Media

    • Authors: Ardaman Kaur, André Leite Rodrigues, Sarah Hoogstraten, Diego Andrés Blanco-Mora, Bruno Miranda, Paulo Morgado, Dar Meshi
      First page: 184
      Abstract: Social media data, such as photos and status posts, can be tagged with location information (geotagging). This geotagged information can be used for urban spatial analysis to explore neighborhood characteristics or mobility patterns. With increasing rural-to-urban migration, there is a need for comprehensive data capturing the complexity of urban settings and their influence on human experiences. Here, we share an urban image stimulus set from the city of Lisbon that researchers can use in their experiments. The stimulus set consists of 160 geotagged urban space photographs extracted from the Flickr social media platform. We divided the city into 100 × 100 m cells to calculate the cell image density (number of images in each cell) and the cell green index (Normalized Difference Vegetation Index of each cell) and assigned these values to each geotagged image. In addition, we also computed the popularity of each image (normalized views on the social network). We also categorized these images into two putative groups by photographer status (residents and tourists), with 80 images belonging to each group. With the rise in data-driven decisions in urban planning, this stimulus set helps explore human–urban environment interaction patterns, especially if complemented with survey/neuroimaging measures or machine-learning analyses.
      Citation: Data
      PubDate: 2023-12-01
      DOI: 10.3390/data8120184
      Issue No: Vol. 8, No. 12 (2023)
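      The cell image density described in the abstract above (the number of images falling in each 100 × 100 m cell, assigned back to every image) can be illustrated with a short sketch. It assumes the photo locations have already been projected to planar coordinates in metres; the function names and the sample coordinates are hypothetical and do not come from the dataset.

```python
import math
from collections import Counter

CELL_SIZE_M = 100.0  # cell edge length stated in the abstract


def cell_of(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the grid cell containing a point."""
    return (math.floor(x_m / cell_size), math.floor(y_m / cell_size))


def cell_image_density(points, cell_size=CELL_SIZE_M):
    """Count how many points fall in each grid cell."""
    return Counter(cell_of(x, y, cell_size) for x, y in points)


# Hypothetical projected photo locations, in metres.
photos = [(12.0, 40.0), (95.5, 99.9), (100.0, 20.0), (250.0, 310.0)]
density = cell_image_density(photos)

# Assign each photo the density of the cell it falls in.
per_photo_density = [density[cell_of(x, y)] for x, y in photos]
```

      A per-cell green index could be attached in the same way, by looking up a value keyed on the cell index returned by cell_of.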
       
  • Data, Vol. 8, Pages 185: Land Cover Classification in the Antioquia Region
           of the Tropical Andes Using NICFI Satellite Data Program Imagery and
           Semantic Segmentation Techniques

    • Authors: Luisa F. Gomez-Ossa, German Sanchez-Torres, John W. Branch-Bedoya
      First page: 185
      Abstract: Land cover classification, generated from satellite imagery through semantic segmentation, has become fundamental for monitoring land use and land cover change (LULCC). The tropical Andes territory provides opportunities due to its significance in the provision of ecosystem services. However, the lack of reliable data for this region, coupled with challenges arising from its mountainous topography and diverse ecosystems, hinders the description of its coverage. Therefore, this research proposes the Tropical Andes Land Cover Dataset (TALANDCOVER). It is constructed from three sample strategies: aleatory, minimum 50%, and 70% of representation per class, which address imbalanced geographic data. Additionally, the U-Net deep learning model is applied for enhanced and tailored classification of land covers. Using high-resolution data from the NICFI program, our analysis focuses on the Department of Antioquia in Colombia. The TALANDCOVER dataset, presented in TIF format, comprises multiband R-G-B-NIR images paired with six labels (dense forest, grasslands, heterogeneous agricultural areas, bodies of water, built-up areas, and bare-degraded lands) with an estimated 0.76 F1 score compared to ground truth data by expert knowledge and surpassing the precision of existing global cover maps for the study area. To the best of our knowledge, this work is a pioneer in its release of open-source data for segmenting coverages with pixel-wise labeled NICFI imagery at a 4.77 m resolution. 
      The experiments with the three sample strategies yield F1 scores of 0.70, 0.72, and 0.74 for aleatory, balanced 50%, and balanced 70%, respectively, against the expert-segmented sample (ground truth). This suggests that the tailored application of our deep learning model, together with the TALANDCOVER dataset, facilitates the training of deep architectures for the classification of large-scale covers in complex areas such as the tropical Andes. This advance has significant potential for decision making, emphasizing sustainable land use and the conservation of natural resources.
      Citation: Data
      PubDate: 2023-12-04
      DOI: 10.3390/data8120185
      Issue No: Vol. 8, No. 12 (2023)
       
  • Data, Vol. 8, Pages 186: A Qualitative Dataset for Coffee Bio-Aggressors
           Detection Based on the Ancestral Knowledge of the Cauca Coffee Farmers in
           Colombia

    • Authors: Juan Felipe Valencia-Mosquera, David Griol, Mayra Solarte-Montoya, Cristhian Figueroa, Juan Carlos Corrales, David Camilo Corrales
      First page: 186
      Abstract: This paper describes a novel qualitative dataset regarding coffee pests based on the ancestral knowledge of coffee farmers in the Department of Cauca, Colombia. The dataset was obtained from a survey administered to coffee growers, with 432 records and 41 variables collected weekly from September 2020 to August 2021. The qualitative dataset includes climatic conditions, productive activities, external conditions, and coffee bio-aggressors. This dataset allows researchers to find patterns for coffee crop protection through ancestral knowledge that is not detected by real-time agricultural sensors. To the best of our knowledge, there are no comparable qualitative datasets that express the empirical knowledge of coffee farmers used to detect triggers of causal behaviors of pests and diseases in coffee crops.
      Citation: Data
      PubDate: 2023-12-08
      DOI: 10.3390/data8120186
      Issue No: Vol. 8, No. 12 (2023)
       
  • Data, Vol. 8, Pages 187: Genome Sequence of the Plant-Growth-Promoting
           Endophyte Curtobacterium flaccumfaciens Strain W004

    • Authors: Vladimir K. Chebotar, Maria S. Gancheva, Elena P. Chizhevskaya, Maria E. Baganova, Oksana V. Keleinikova, Kharon A. Husainov, Veronika N. Pishchik
      First page: 187
      Abstract: We report the whole-genome sequence of the endophyte Curtobacterium flaccumfaciens strain W004 isolated from the seeds of winter wheat, cv. Bezostaya 100. The genome was obtained using Oxford Nanopore MinION sequencing. The bacterium has a circular chromosome of 3.63 Mbp with a G+C content of 70.89%. We found that Curtobacterium flaccumfaciens strain W004 could promote the growth of spring wheat plants, resulting in an increase in grain yield of 54.3%. Sequencing the genome of this new strain can provide insights into its potential role in plant–microbe interactions.
      Citation: Data
      PubDate: 2023-12-09
      DOI: 10.3390/data8120187
      Issue No: Vol. 8, No. 12 (2023)
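      The G+C content quoted in the abstract above is the percentage of guanine and cytosine bases in a sequence. A minimal sketch of that calculation follows; the function name and the short example sequence are hypothetical, and skipping ambiguous bases such as N is an assumption of this sketch rather than the authors' procedure.

```python
def gc_content(sequence):
    """Return the percentage of G and C among the unambiguous bases (A, C, G, T)."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc_count = sum(1 for base in bases if base in "GC")
    return 100.0 * gc_count / len(bases)


# Hypothetical ten-base fragment: eight of the ten bases are G or C.
example_gc = gc_content("GCGCGCATGC")
```

      For a full genome, the same function could be applied to the concatenated sequence read from a FASTA file.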
       
  • Data, Vol. 8, Pages 159: DataPLAN: A Web-Based Data Management Plan
           Generator for the Plant Sciences

    • Authors: Xiao-Ran Zhou, Sebastian Beier, Dominik Brilhaus, Cristina Martins Rodrigues, Timo Mühlhaus, Dirk von Suchodoletz, Richard M. Twyman, Björn Usadel, Angela Kranz
      First page: 159
      Abstract: Research data management (RDM) combines a set of practices for the organization, storage and preservation of data from research projects. The RDM strategy of a project is usually formalized as a data management plan (DMP)—a document that sets out procedures to ensure data findability, accessibility, interoperability and reusability (FAIR-ness). Many aspects of RDM are standardized across disciplines so that data and metadata are reusable, but the components of DMPs in the plant sciences are often disconnected. The inability to reuse plant-specific DMP content across projects and funding sources requires additional time and effort to write unique DMPs for different settings. To address this issue, we developed DataPLAN—an open-source tool incorporating prewritten DMP content for the plant sciences that can be used online or offline to prepare multiple DMPs. The current version of DataPLAN supports Horizon 2020 and Horizon Europe projects, as well as projects funded by the German Research Foundation (DFG). Furthermore, DataPLAN offers the option for users to customize their own templates. Additional templates to accommodate other funding schemes will be added in the future. DataPLAN reduces the workload needed to create or update DMPs in the plant sciences by presenting standardized RDM practices optimized for different funding contexts.
      Citation: Data
      PubDate: 2023-10-24
      DOI: 10.3390/data8110159
      Issue No: Vol. 8, No. 11 (2023)
       
  • Data, Vol. 8, Pages 161: Dataset: Biodiversity of Ground Beetles
           (Coleoptera, Carabidae) of the Republic of Mordovia (Russia)

    • Authors: Leonid V. Egorov, Viktor V. Aleksanov, Sergei K. Alekseev, Alexander B. Ruchin, Oleg N. Artaev, Mikhail N. Esin, Sergei V. Lukiyanov, Evgeniy A. Lobachev, Gennadiy B. Semishin
      First page: 161
      Abstract: (1) Background: Carabidae is one of the most diverse families of Coleoptera. Many species of Carabidae are sensitive to anthropogenic impacts and are indicators of their environmental state. Some species of large beetles are on the verge of extinction. The aim of this research is to describe the Carabidae fauna of the Republic of Mordovia (central part of European Russia); (2) Methods: The research was carried out in April–September of 1979, 1987, 2000, 2001, 2005, and 2007–2022. Collections were performed using a variety of methods (light trapping, soil traps, window traps, etc.). For each observation, the coordinates of the sampling location, abundance, and dates were recorded; (3) Results: The dataset contains data on 251 species of Carabidae from 12 subfamilies and 4576 occurrences. A total of 66,378 specimens of Carabidae were studied. Another 29 species are additionally known from other publications. Also, twenty-two species were excluded from the fauna of the region, as they had previously been misidentified; (4) Conclusions: The Carabidae fauna of the Republic of Mordovia includes 280 species from 12 subfamilies. Four species (Agonum scitulum, Lebia scapularis, Bembidion humerale, and Bembidion tenellum) were identified for the first time in the Republic of Mordovia.
      Citation: Data
      PubDate: 2023-10-24
      DOI: 10.3390/data8110161
      Issue No: Vol. 8, No. 11 (2023)
       
  • Data, Vol. 8, Pages 162: The Development of a Water Resource Monitoring
           Ontology as a Research Tool for Sustainable Regional Development

    • Authors: Assel Ospan, Madina Mansurova, Vladimir Barakhnin, Aliya Nugumanova, Roman Titkov
      First page: 162
      Abstract: The development of knowledge graphs about water resources as a tool for studying the sustainable development of a region is currently an urgent task, because the growing deterioration of the state of water bodies affects the ecology, economy, and health of the population of the region. This study presents a new ontological approach to water resource monitoring in Kazakhstan, providing data integration from heterogeneous sources, semantic analysis, decision support, querying and searching, and the presentation of new knowledge in the field of water monitoring. The contribution of this work is the integration of table extraction and understanding, semantic web rule language, semantic sensor network, and time ontology methods, together with the inclusion of a module of socioeconomic indicators that reveal the impact of water quality on the quality of life of the population. Using machine learning methods, the study derived six ontological rules to establish new knowledge about water resource monitoring. The query results demonstrate the effectiveness of the proposed method and its potential to improve water monitoring practices, promote sustainable resource management, and support decision-making processes in Kazakhstan. The ontology can also be integrated into the ontology of water resources at the scale of Central Asia.
      Citation: Data
      PubDate: 2023-10-26
      DOI: 10.3390/data8110162
      Issue No: Vol. 8, No. 11 (2023)
       
  • Data, Vol. 8, Pages 163: A Large-Scale Dataset of Search Interests Related
           to Disease X Originating from Different Geographic Regions

    • Authors: Nirmalya Thakur, Shuqi Cui, Kesha A. Patel, Isabella Hall, Yuvraj Nihal Duggal
      First page: 163
      Abstract: The World Health Organization (WHO) added Disease X to their shortlist of blueprint priority diseases to represent a hypothetical, unknown pathogen that could cause a future epidemic. During different virus outbreaks of the past, such as COVID-19, Influenza, Lyme Disease, and Zika virus, researchers from various disciplines utilized Google Trends to mine multimodal components of web behavior to study, investigate, and analyze the global awareness, preparedness, and response associated with these respective virus outbreaks. As the world prepares for Disease X, a dataset on web behavior related to Disease X would be crucial to contribute towards the timely advancement of research in this field. Furthermore, none of the prior works in this field have focused on the development of a dataset to compile relevant web behavior data, which would help to prepare for Disease X. To address these research challenges, this work presents a dataset of web behavior related to Disease X, which emerged from different geographic regions of the world, between February 2018 and August 2023. Specifically, this dataset presents the search interests related to Disease X from 94 geographic regions. These regions were chosen for data mining as these regions recorded significant search interests related to Disease X during this timeframe. The dataset was developed by collecting data using Google Trends. The relevant search interests for all these regions for each month in this time range are available in this dataset. This paper also discusses the compliance of this dataset with the FAIR principles of scientific data management. Finally, an analysis of this dataset is presented to uphold the applicability, relevance, and usefulness of this dataset for the investigation of different research questions in the interrelated fields of Big Data, Data Mining, Healthcare, Epidemiology, and Data Analysis with a specific focus on Disease X.
      Citation: Data
      PubDate: 2023-10-26
      DOI: 10.3390/data8110163
      Issue No: Vol. 8, No. 11 (2023)
       
  • Data, Vol. 8, Pages 164: Information Competences and Academic Achievement:
           A Dataset

    • Authors: Jacqueline Köhler, Roberto González-Ibáñez
      First page: 164
      Abstract: Information literacy (IL) is becoming fundamental in the modern world. Although several IL standards and assessments have been developed for secondary and higher education, there is still no agreement about the possible associations between IL and both academic achievement and student dropout rates. In this article, we present a dataset including IL competences measurements, as well as academic achievement and socioeconomic indicators for 153 Chilean first- and second-year engineering students. The dataset is intended to allow researchers to use machine learning methods to study to what extent, if any, IL and academic achievement are related.
      Citation: Data
      PubDate: 2023-10-27
      DOI: 10.3390/data8110164
      Issue No: Vol. 8, No. 11 (2023)
       
  • Data, Vol. 8, Pages 165: Can We Mathematically Spot the Possible
           Manipulation of Results in Research Manuscripts Using Benford’s Law?

    • Authors: Teddy Lazebnik, Dan Gorlitsky
      First page: 165
      Abstract: The reproducibility of academic research has long been a persistent issue, contradicting one of the fundamental principles of science. Recently, there has been an increasing number of false claims found in academic manuscripts, casting doubt on the validity of reported results. In this paper, we utilize an adapted version of Benford’s law, a statistical phenomenon that describes the distribution of leading digits in naturally occurring datasets, to identify the potential manipulation of results in research manuscripts, solely using the aggregated data presented in those manuscripts rather than the commonly unavailable raw datasets. Our methodology applies the principles of Benford’s law to commonly employed analyses in academic manuscripts, thus reducing the need for the raw data itself. To validate our approach, we employed 100 open-source datasets and accurately predicted 79% of them using our rules. Moreover, we tested the proposed method on known retracted manuscripts, showing that around half (48.6%) can be detected using the proposed method. Additionally, we analyzed 100 manuscripts published in the last two years across ten prominent economic journals, with 10 manuscripts randomly sampled from each journal. Our analysis predicted a 3% occurrence of results manipulation with a 96% confidence level. Our findings show that Benford’s law, adapted for aggregated data, can be an initial tool for identifying data manipulation; however, it is not a silver bullet, requiring further investigation for each flagged manuscript due to the relatively low prediction accuracy.
      Citation: Data
      PubDate: 2023-10-31
      DOI: 10.3390/data8110165
      Issue No: Vol. 8, No. 11 (2023)
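
For readers unfamiliar with the law named in the abstract above, the following is a minimal sketch of the classic first-digit form of Benford's law, under which a leading digit d is expected with probability log10(1 + 1/d). It is not the adapted, aggregated-data method the authors describe; the function names and the powers-of-two sample are illustrative assumptions.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities under classic Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first significant decimal digit of a non-zero number."""
    if x == 0:
        raise ValueError("zero has no leading digit")
    # Scientific notation with 17 significant digits puts the leading
    # digit first, e.g. 0.00123 -> '1.2299999999999999e-03'.
    return int(f"{abs(x):.16e}"[0])


def leading_digit_frequencies(values):
    """Observed relative frequency of each leading digit 1..9."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def max_abs_deviation(values):
    """Largest gap between observed and expected digit frequencies."""
    expected = benford_expected()
    observed = leading_digit_frequencies(values)
    return max(abs(observed[d] - expected[d]) for d in range(1, 10))


# Hypothetical sample: powers of two are known to follow Benford's law closely.
sample = [2 ** n for n in range(1000)]
deviation = max_abs_deviation(sample)
```

A small deviation only means the sample is consistent with the first-digit law; as the abstract itself notes, a flagged result would still need case-by-case investigation.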
       
  • Data, Vol. 8, Pages 166: A Scalable Data Structure for Efficient Graph
           Analytics and In-Place Mutations

    • Authors: Soukaina Firmli, Dalila Chiadmi
      First page: 166
      Abstract: The graph model enables a broad range of analyses; thus, graph processing (GP) is an invaluable tool in data analytics. At the heart of every GP system lies a concurrent graph data structure that stores the graph. Such a data structure needs to be highly efficient for both graph algorithms and queries. Due to the continuous evolution, the sparsity, and the scale-free nature of real-world graphs, GP systems face the challenge of providing an appropriate graph data structure that enables both fast analytical workloads and fast, low-memory graph mutations. Existing graph structures offer a hard tradeoff among read-only performance, update friendliness, and memory consumption upon updates. In this paper, we introduce CSR++, a new graph data structure that removes these tradeoffs and enables both fast read-only analytics, and quick and memory-friendly mutations. CSR++ combines ideas from CSR, the fastest read-only data structure, and adjacency lists (ALs) to achieve the best of both worlds. We compare CSR++ to CSR, ALs from the Boost Graph Library (BGL), and the following state-of-the-art update-friendly graph structures: LLAMA, STINGER, GraphOne, and Teseo. In our evaluation, which is based on popular GP algorithms executed over real-world graphs, we show that CSR++ remains close to CSR in read-only concurrent performance (within 10% on average) while significantly outperforming CSR (by an order of magnitude) and LLAMA (by almost 2×) with frequent updates. We also show that both CSR++’s update throughput and analytics performance exceed those of several state-of-the-art graph structures while maintaining low memory consumption when the workload includes updates.
      Citation: Data
      PubDate: 2023-11-03
      DOI: 10.3390/data8110166
      Issue No: Vol. 8, No. 11 (2023)
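
As background for the abstract above, here is a minimal sketch of the classic compressed sparse row (CSR) layout that CSR++ is said to build on. It shows only plain, read-only CSR built from an edge list; it does not implement CSR++ or its update mechanism, and the function names and example graph are illustrative assumptions.

```python
def build_csr(num_vertices, edges):
    """Build a CSR representation (offsets, targets) from directed edges.

    offsets[v]..offsets[v + 1] delimits the slice of `targets` that holds
    the out-neighbors of vertex v.
    """
    offsets = [0] * (num_vertices + 1)
    # Count the out-degree of each source vertex.
    for src, _ in edges:
        offsets[src + 1] += 1
    # Prefix sum turns per-vertex counts into start offsets.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]
    targets = [0] * len(edges)
    cursor = offsets[:-1]  # slicing copies: next free slot per vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets, targets, v):
    """Return the out-neighbors of vertex v as a contiguous slice."""
    return targets[offsets[v]:offsets[v + 1]]


# Hypothetical 3-vertex graph.
example_edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
example_offsets, example_targets = build_csr(3, example_edges)
```

Because all neighbor lists live in one contiguous array, scans are fast, but inserting an edge in the middle requires shifting every later entry, which is the read-versus-update tradeoff the abstract refers to.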
       
  • Data, Vol. 8, Pages 167: Draft Genome Sequence Data of Lysinibacillus
           sphaericus Strain 1795 with Insecticidal Properties

    • Authors: Maria N. Romanenko, Maksim A. Nesterenko, Anton E. Shikov, Anton A. Nizhnikov, Kirill S. Antonets
      First page: 167
      Abstract: Lysinibacillus sphaericus holds a significant agricultural importance by being able to produce insecticidal toxins and chemical moieties of varying antibacterial and fungicidal activities. In this study, the genome of the L. sphaericus strain 1795 is presented. Illumina short reads sequenced on the HiSeq X platform were used to obtain the genome’s assembly by applying the SPAdes v3.15.4 software. The genome size based on a cumulative length of 23 contigs reached 4.74 Mb, with a respective N50 of 1.34 Mb. The assembled genome carried 4672 genes, including 4643 protein-encoding ones, 5 of which represented loci coding for insecticidal toxins active against the orders Diptera, Lepidoptera, and Blattodea. We also revealed biosynthetic gene clusters responsible for the synthesis of secondary metabolites with predicted antibacterial, fungicidal, and growth-promoting properties. The genomic data provided will be helpful for deepening our understanding of genetic markers determining the efficient application of the L. sphaericus strain 1795 primarily for biocontrol purposes in veterinary and medical applications against several groups of blood-sucking insects.
      Citation: Data
      PubDate: 2023-11-03
      DOI: 10.3390/data8110167
      Issue No: Vol. 8, No. 11 (2023)
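
The N50 statistic quoted in the abstract above is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal sketch of that calculation follows; the contig lengths are made-up illustration values, not the strain 1795 assembly.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units).
example_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```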
       
  • Data, Vol. 8, Pages 168: Applying Eye Tracking with Deep Learning
           Techniques for Early-Stage Detection of Autism Spectrum Disorders

    • Authors: Zeyad A. T. Ahmed, Eid Albalawi, Theyazn H. H. Aldhyani, Mukti E. Jadhav, Prachi Janrao, Mansour Ratib Mohammad Obeidat
      First page: 168
      Abstract: Autism spectrum disorder (ASD) poses a complex challenge to researchers and practitioners, with its multifaceted etiology and varied manifestations. Timely intervention is critical in enhancing the developmental outcomes of individuals with ASD. This paper underscores the paramount significance of early detection and diagnosis as a pivotal precursor to effective intervention. To this end, integrating advanced technological tools, specifically eye-tracking technology and deep learning algorithms, is investigated for its potential to discriminate between children with ASD and their typically developing (TD) peers. By employing these methods, the research aims to contribute to refining early detection strategies and support mechanisms. This study introduces innovative deep learning models grounded in convolutional neural network (CNN) and recurrent neural network (RNN) architectures, employing an eye-tracking dataset for training. Notably, the bidirectional long short-term memory (BiLSTM) model achieved an accuracy of 96.44%, the gated recurrent unit (GRU) 97.49%, the CNN-LSTM hybrid 97.94%, and the LSTM the highest accuracy of 98.33%. These outcomes underscore the efficacy of the applied methodologies and the potential of advanced computational frameworks in achieving substantial accuracy levels in ASD detection and classification.
      Citation: Data
      PubDate: 2023-11-03
      DOI: 10.3390/data8110168
      Issue No: Vol. 8, No. 11 (2023)
       
  • Data, Vol. 8, Pages 169: Machine Learning for Credit Risk Prediction: A
           Systematic Literature Review

    • Authors: Jomark Pablo Noriega, Luis Antonio Rivera, José Alfredo Herrera
      First page: 169
      Abstract: In this systematic review of the literature on using Machine Learning (ML) for credit risk prediction, we raise the need for financial institutions to use Artificial Intelligence (AI) and ML to assess credit risk, analyzing large volumes of information. We posed research questions about algorithms, metrics, results, datasets, variables, and related limitations in predicting credit risk. To answer them, we searched renowned databases and identified 52 relevant studies within the microfinance credit industry. Challenges and approaches in credit risk prediction using ML models were identified; difficulties with the implemented models include the black box model, the need for explanatory artificial intelligence, the importance of selecting relevant features, addressing multicollinearity, and the problem of the imbalance in the input data. By answering the inquiries, we identified that the Boosted Category is the most researched family of ML models; the most commonly used metrics for evaluation are Area Under Curve (AUC), Accuracy (ACC), Recall, precision measure F1 (F1), and Precision. Research mainly uses public datasets to compare models, and private ones to generate new knowledge when applied to the real world. The most significant limitation identified is the representativeness of reality, and the variables primarily used in the microcredit industry are data related to the Demographic, Operation, and Payment behavior. This study aims to guide developers of credit risk management tools and software towards the existing ability of ML methods, metrics, and techniques used to forecast it, thereby minimizing possible losses due to default and guiding risk appetite.
      Citation: Data
      PubDate: 2023-11-07
      DOI: 10.3390/data8110169
      Issue No: Vol. 8, No. 11 (2023)
       
  • Data, Vol. 8, Pages 170: Introducing DeReKoGram: A Novel Frequency Dataset
           with Lemma and Part-of-Speech Information for German

    • Authors: Sascha Wolfer, Alexander Koplenig, Marc Kupietz, Carolin Müller-Spitzer
      First page: 170
      Abstract: We introduce DeReKoGram, a novel frequency dataset containing lemma and part-of-speech (POS) information for 1-, 2-, and 3-grams from the German Reference Corpus. The dataset contains information based on a corpus of 43.2 billion tokens and is divided into 16 parts based on 16 corpus folds. We describe how the dataset was created and structured. By evaluating the distribution over the 16 folds, we show that it is possible to work with a subset of the folds in many use cases (e.g., to save computational resources). In a case study, we investigate the growth of vocabulary (as well as the number of hapax legomena) as an increasing number of folds are included in the analysis. We cross-combine this with the various cleaning stages of the dataset. We also give some guidance in the form of Python, R, and Stata markdown scripts on how to work with the resource.
      Citation: Data
      PubDate: 2023-11-10
      DOI: 10.3390/data8110170
      Issue No: Vol. 8, No. 11 (2023)
       
  • Data, Vol. 8, Pages 171: ChatGPT across Arabic Twitter: A Study of Topics,
           Sentiments, and Sarcasm

    • Authors: Shahad Al-Khalifa, Fatima Alhumaidhi, Hind Alotaibi, Hend S. Al-Khalifa
      First page: 171
      Abstract: While ChatGPT has gained global significance and widespread adoption, its exploration within specific cultural contexts, particularly within the Arab world, remains relatively limited. This study investigates the discussions among early Arab users in Arabic tweets related to ChatGPT, focusing on topics, sentiments, and the presence of sarcasm. Data analysis and topic-modeling techniques were employed to examine 34,760 Arabic tweets collected using specific keywords. This study revealed a strong interest within the Arabic-speaking community in ChatGPT technology, with prevalent discussions spanning various topics, including controversies, regional relevance, fake content, and sector-specific dialogues. Despite the enthusiasm, concerns regarding ethical risks and negative implications of ChatGPT’s emergence were highlighted, indicating apprehension toward advanced artificial intelligence (AI) technology in language generation. Region-specific discussions underscored the diverse adoption of AI applications and ChatGPT technology. Sentiment analysis of the tweets demonstrated a predominantly neutral sentiment distribution (92.8%), suggesting a focus on objectivity and factuality over emotional expression. The prevalence of neutral sentiments indicated a preference for evidence-based reasoning and logical arguments, fostering constructive discussions influenced by cultural norms. Sarcasm was found in 4% of the tweets, distributed across various topics but not dominating the conversation. This study’s implications include the need for AI developers to address ethical concerns and the importance of educating users about the technology’s ethical considerations and risks. Policymakers should consider the regional relevance and potential scams, emphasizing the necessity for ethical guidelines and regulations.
      Citation: Data
      PubDate: 2023-11-14
      DOI: 10.3390/data8110171
      Issue No: Vol. 8, No. 11 (2023)
       
  • Data, Vol. 8, Pages 172: Testate Amoebae (Amphitremida, Arcellinida,
           Euglyphida) in Sphagnum Bogs: The Dataset from Eastern Fennoscandia

    • Authors: Aleksandr Ivanovskii, Kirill Babeshko, Viktor Chernyshov, Anton Esaulov, Aleksandr Komarov, Elena Malysheva, Natalia Mazei, Diana Meskhadze, Damir Saldaev, Andrey N. Tsyganov, Yuri Mazei
      First page: 172
      Abstract: The paper describes a dataset comprising 236 surface moss samples and 143 testate amoeba taxa. The samples were collected in 11 Sphagnum-dominated bogs during frost-free seasons of 2004, 2007, 2009, 2017, and 2022. For the whole dataset, the sampling effort was sufficient in terms of observed species richness (143 species in total), though the regional species pool is deemed to be incompletely discovered (143 species is its lower 95% confidence limit using Chao’s estimator). The local community composition demonstrated high heterogeneity in a reduced ordination space. It supports the opinion that the high versatility of bog ecosystems should be taken into account during ecological studies.
      Citation: Data
      PubDate: 2023-11-15
      DOI: 10.3390/data8110172
      Issue No: Vol. 8, No. 11 (2023)
       
  • Data, Vol. 8, Pages 173: Biodiversity of Terrestrial Testate Amoebae in
           Western Siberia Lowland Peatlands

    • Authors: Damir Saldaev, Kirill Babeshko, Viktor Chernyshov, Anton Esaulov, Xiuyuan Gu, Nikita Kriuchkov, Natalia Mazei, Nailia Saldaeva, Jiahui Su, Andrey Tsyganov, Basil Yakimov, Svetlana Yushkovets, Yuri Mazei
      First page: 173
      Abstract: Testate amoebae are unicellular eukaryotic organisms covered with an external skeleton called a shell. They are an important component of many terrestrial ecosystems, especially peatlands, where they can be preserved in peat deposits and used as a proxy of surface wetness in paleoecological reconstructions. Here, we present a database from a vast but poorly studied region of the Western Siberia Lowland containing information on testate amoeba (TA) occurrences in relation to substrate moisture and water table depth (WTD). The dataset includes 88 species from 32 genera, with 2181 incidences and 21,562 counted individuals. All samples were collected in oligotrophic peatlands and prepared using the method of wet sieving with a subsequent sedimentation of aqueous suspensions. This database contributes to the understanding of the distribution of testate amoebae and can be further used in large-scale investigations.
      Citation: Data
      PubDate: 2023-11-17
      DOI: 10.3390/data8110173
      Issue No: Vol. 8, No. 11 (2023)
       
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
 


 
 

