  Subjects -> SCIENCES: COMPREHENSIVE WORKS (Total: 374 journals)
Showing 1 - 200 of 265 Journals sorted alphabetically
AAS Open Research     Open Access   (Followers: 1)
Accountability in Research: Policies and Quality Assurance     Hybrid Journal   (Followers: 18)
Acta Materialia Transilvanica     Open Access  
Acta Nova     Open Access   (Followers: 1)
Acta Scientifica Malaysia     Open Access   (Followers: 1)
Acta Scientifica Naturalis     Open Access   (Followers: 4)
Adıyaman University Journal of Science     Open Access  
Advanced Science     Open Access   (Followers: 13)
Advanced Science, Engineering and Medicine     Partially Free   (Followers: 6)
Advanced Theory and Simulations     Hybrid Journal   (Followers: 4)
Advances in Research     Open Access  
Advances in Science and Technology     Full-text available via subscription   (Followers: 18)
African Journal of Science, Technology, Innovation and Development     Hybrid Journal   (Followers: 7)
Afrique Science : Revue Internationale des Sciences et Technologie     Open Access   (Followers: 1)
AFRREV STECH : An International Journal of Science and Technology     Open Access   (Followers: 3)
Alfarama Journal of Basic & Applied Sciences     Open Access   (Followers: 4)
American Academic & Scholarly Research Journal     Open Access   (Followers: 4)
American Journal of Applied Sciences     Open Access   (Followers: 21)
American Journal of Humanities and Social Sciences     Open Access   (Followers: 13)
ANALES de la Universidad Central del Ecuador     Open Access   (Followers: 1)
Anales del Instituto de la Patagonia     Open Access  
Applied Mathematics and Nonlinear Sciences     Open Access   (Followers: 2)
Apuntes de Ciencia & Sociedad     Open Access  
Arab Journal of Basic and Applied Sciences     Open Access  
Arabian Journal for Science and Engineering     Hybrid Journal   (Followers: 1)
Archives Internationales d'Histoire des Sciences     Partially Free   (Followers: 5)
Archives of Current Research International     Open Access  
ARO. The Scientific Journal of Koya University     Open Access  
ARPHA Conference Abstracts     Open Access   (Followers: 1)
ARPHA Proceedings     Open Access  
ArtefaCToS : Revista de estudios sobre la ciencia y la tecnología     Open Access  
Asian Journal of Advanced Research and Reports     Open Access  
Asian Journal of Scientific Research     Open Access   (Followers: 2)
Asian Journal of Technology Innovation     Hybrid Journal   (Followers: 5)
Australian Field Ornithology     Full-text available via subscription   (Followers: 2)
Australian Journal of Social Issues     Hybrid Journal   (Followers: 6)
Avrasya Terim Dergisi     Open Access  
Bangladesh Journal of Scientific Research     Open Access  
Beni-Suef University Journal of Basic and Applied Sciences     Open Access   (Followers: 1)
Berichte Zur Wissenschaftsgeschichte     Hybrid Journal   (Followers: 11)
BIBECHANA     Open Access  
Bilge International Journal of Science and Technology Research     Open Access  
Bioethics Research Notes     Full-text available via subscription   (Followers: 16)
BJHS Themes     Open Access  
Black Sea Journal of Engineering and Science     Open Access  
Borneo Journal of Resource Science and Technology     Open Access  
Bulletin de la Société Royale des Sciences de Liège     Open Access  
Bulletin of the National Research Centre     Open Access  
Butlletí de la Institució Catalana d'Història Natural     Open Access  
Chain Reaction     Full-text available via subscription  
Ciencia Amazónica (Iquitos)     Open Access  
Ciencia en su PC     Open Access   (Followers: 1)
Ciencia Ergo Sum     Open Access  
Ciência ET Praxis     Open Access  
Ciencia y Tecnología     Open Access  
Ciencias Holguin     Open Access   (Followers: 1)
CienciaUAT     Open Access  
Citizen Science : Theory and Practice     Open Access   (Followers: 2)
Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering     Open Access  
Communications in Applied Sciences     Open Access  
Comunicata Scientiae     Open Access  
ConCiencia     Open Access  
Conference Papers in Science     Open Access  
Configurations     Full-text available via subscription   (Followers: 11)
COSMOS     Hybrid Journal   (Followers: 1)
Crea Ciencia Revista Científica     Open Access  
Cuadernos de Investigación UNED     Open Access  
Current Issues in Criminal Justice     Hybrid Journal   (Followers: 13)
Current Research in Geoscience     Open Access   (Followers: 5)
Dalat University Journal of Science     Open Access  
Data     Open Access   (Followers: 4)
Data Curation Profiles Directory     Open Access   (Followers: 8)
Dhaka University Journal of Science     Open Access  
Diálogos Interdisciplinares     Open Access  
Digithum     Open Access   (Followers: 2)
Discover Sustainability     Open Access   (Followers: 3)
Einstein (São Paulo)     Open Access  
Ekaia : EHUko Zientzia eta Teknologia aldizkaria     Open Access  
Elkawnie : Journal of Islamic Science and Technology     Open Access  
Emergent Scientist     Open Access  
Enhancing Learning in the Social Sciences     Open Access   (Followers: 7)
Enseñanza de las Ciencias : Revista de Investigación y Experiencias Didácticas     Open Access  
Entramado     Open Access  
Entre Ciencia e Ingeniería     Open Access  
Epiphany     Open Access   (Followers: 1)
Estação Científica (UNIFAP)     Open Access  
Ethiopian Journal of Education and Sciences     Open Access   (Followers: 5)
Ethiopian Journal of Science and Technology     Open Access  
Ethiopian Journal of Sciences and Sustainable Development     Open Access  
European Online Journal of Natural and Social Sciences     Open Access   (Followers: 4)
European Scientific Journal     Open Access   (Followers: 5)
Evidência - Ciência e Biotecnologia - Interdisciplinar     Open Access  
Exchanges : the Warwick Research Journal     Open Access   (Followers: 1)
Experimental Results     Open Access   (Followers: 1)
Facets     Open Access  
Fides et Ratio : Revista de Difusión Cultural y Científica     Open Access  
Fırat University Turkish Journal of Science & Technology     Open Access  
Fontanus     Open Access   (Followers: 1)
Forensic Science Policy & Management: An International Journal     Hybrid Journal   (Followers: 232)
Frontiers for Young Minds     Open Access  
Frontiers in Climate     Open Access   (Followers: 4)
Frontiers in Science     Open Access   (Followers: 1)
Fundamental Research     Open Access  
Futures & Foresight Science     Hybrid Journal   (Followers: 1)
Gaudium Sciendi     Open Access  
Gazi University Journal of Science     Open Access  
Ghana Studies     Full-text available via subscription   (Followers: 15)
Global Journal of Pure and Applied Sciences     Full-text available via subscription  
Global Journal of Science Frontier Research     Open Access   (Followers: 1)
Globe, The     Full-text available via subscription   (Followers: 4)
HardwareX     Open Access  
Heidelberger Jahrbücher Online     Open Access  
Heliyon     Open Access  
Himalayan Journal of Science and Technology     Open Access  
History of Science and Technology     Open Access   (Followers: 5)
Hoosier Science Teacher     Open Access  
Impact     Open Access   (Followers: 1)
Indian Journal of History of Science     Hybrid Journal  
Indonesian Journal of Fundamental Sciences     Open Access  
Indonesian Journal of Science and Mathematics Education     Open Access   (Followers: 2)
Indonesian Journal of Science and Technology     Open Access  
Ingenieria y Ciencia     Open Access   (Followers: 1)
Innovare : Revista de ciencia y tecnología     Open Access  
Instruments     Open Access  
Integrated Research Advances     Open Access  
Interciencia     Open Access  
Interface Focus     Full-text available via subscription  
International Annals of Science     Open Access  
International Archives of Science and Technology     Open Access  
International Journal of Academic Research in Business, Arts & Science     Open Access  
International Journal of Advanced Multidisciplinary Research and Review     Open Access  
International Journal of Applied Science     Open Access  
International Journal of Basic and Applied Sciences     Open Access   (Followers: 1)
International Journal of Computational and Experimental Science and Engineering (IJCESEN)     Open Access  
International Journal of Culture and Modernity     Open Access   (Followers: 2)
International Journal of Engineering, Science and Technology     Open Access  
International Journal of Engineering, Technology and Natural Sciences     Open Access  
International Journal of Innovation and Applied Studies     Open Access   (Followers: 4)
International Journal of Innovative Research and Scientific Studies     Open Access   (Followers: 1)
International Journal of Network Science     Hybrid Journal   (Followers: 3)
International Journal of Recent Contributions from Engineering, Science & IT     Open Access  
International Journal of Research in Science     Open Access   (Followers: 1)
International Journal of Social Sciences and Management     Open Access   (Followers: 2)
International Journal of Technology Policy and Law     Hybrid Journal   (Followers: 8)
International Letters of Social and Humanistic Sciences     Open Access  
International Science and Technology Journal of Namibia     Open Access   (Followers: 1)
International Scientific and Vocational Studies Journal     Open Access  
InterSciencePlace     Open Access  
Investiga : TEC     Open Access  
Investigación Joven     Open Access  
Investigacion y Ciencia     Open Access   (Followers: 1)
Iranian Journal of Science and Technology, Transactions A : Science     Hybrid Journal  
iScience     Open Access   (Followers: 2)
Issues in Science & Technology     Free   (Followers: 8)
Ithaca : Viaggio nella Scienza     Open Access  
J : Multidisciplinary Scientific Journal     Open Access  
Jaunujų mokslininkų darbai     Open Access   (Followers: 1)
Journal de la Recherche Scientifique de l'Universite de Lome     Full-text available via subscription  
Journal of Advanced Research     Open Access   (Followers: 2)
Journal of Al-Qadisiyah for Pure Science     Open Access  
Journal of Alasmarya University     Open Access  
Journal of Analytical Science & Technology     Open Access   (Followers: 4)
Journal of Applied Science and Technology     Full-text available via subscription   (Followers: 1)
Journal of Applied Sciences and Environmental Management     Open Access   (Followers: 1)
Journal of Big History     Open Access   (Followers: 4)
Journal of Composites Science     Open Access   (Followers: 4)
Journal of Diversity Management     Open Access   (Followers: 4)
Journal of Indian Council of Philosophical Research     Hybrid Journal  
Journal of Institute of Science and Technology     Open Access  
Journal of Integrated Science and Technology     Open Access  
Journal of King Saud University - Science     Open Access  
Journal of Mathematical and Fundamental Sciences     Open Access  
Journal of Natural Sciences Research     Open Access   (Followers: 2)
Journal of Negative and No Positive Results     Open Access  
Journal of Responsible Technology     Open Access  
Journal of Science (JSc)     Open Access  
Journal of Science and Engineering     Open Access   (Followers: 1)
Journal of Science and Technology     Open Access   (Followers: 2)
Journal of Science and Technology     Open Access   (Followers: 1)
Journal of Science and Technology (Ghana)     Open Access   (Followers: 3)
Journal of Science and Technology Policy Management     Hybrid Journal   (Followers: 1)
Journal of Science Foundation     Open Access   (Followers: 1)
Journal of Science of the University of Kelaniya Sri Lanka     Open Access  
Journal of Scientific Research     Open Access  
Journal of Scientific Research and Reports     Open Access  
Journal of Scientometric Research     Open Access   (Followers: 21)
Journal of Shanghai Jiaotong University (Science)     Hybrid Journal  
Journal of Social Science Research     Open Access   (Followers: 2)
Journal of Taibah University for Science     Open Access  
Journal of the Asiatic Society of Bangladesh, Science     Open Access  
Journal of the Ghana Science Association     Full-text available via subscription   (Followers: 3)
Journal of the History of Ideas     Full-text available via subscription   (Followers: 151)
Journal of the Indian Institute of Science     Hybrid Journal   (Followers: 4)
Journal of the National Science Foundation of Sri Lanka     Open Access  
Journal of the Royal Society of New Zealand     Hybrid Journal   (Followers: 48)
Journal of the South Carolina Academy of Science     Open Access  
Journal of Unsolved Questions     Open Access  
Jurnal Ilmiah Ilmu Terapan Universitas Jambi : JIITUJ     Open Access  
Jurnal Matematika, Sains, Dan Teknologi     Open Access  
Jurnal MIPA     Open Access  

Data
Number of Followers: 4  

  This is an Open Access journal
ISSN (Online) 2306-5729
Published by MDPI  [258 journals]
  • Data, Vol. 8, Pages 45: Multi-Level Analysis of Learning Management
           Systems’ User Acceptance Exemplified in Two System Case Studies

    • Authors: Parisa Shayan, Roberto Rondinelli, Menno van Zaanen, Martin Atzmueller
      First page: 45
      Abstract: There has recently been an increasing interest in Learning Management Systems (LMSs). It is currently unclear, however, exactly how these systems are perceived by their users. This article analyzes data on user acceptance for two LMSs (Blackboard and Canvas). The respective data are collected using a questionnaire modeled after the Technology Acceptance Model (TAM); it relates several variables that influence system acceptability, allowing for a detailed analysis of the system acceptance. We present analyses at two levels of the questionnaire data: questions and constructs (taken from TAM) as well as on different analysis levels using targeted methods. First, we investigate the differences between the above LMSs using statistical tests (t-test). Second, we provide results at the question level using descriptive indices, such as the mean and the Gini heterogeneity index, and apply methods for ordinal data using the Cumulative Link Mixed Model (CLMM). Next, we apply the same approach at the TAM construct level plus descriptive network analysis (degree centrality and bipartite motifs) to explore the variability of users’ answers and the degree of users’ satisfaction considering the extracted patterns. In the context of TAM, the statistical model is able to analyze LMS acceptance on the question level. As we are also very much interested in identifying LMS acceptance at the construct level, in this article, we provide both statistical analysis as well as network analysis to explore the connection between questionnaire data and relational data. A network analysis approach is particularly useful when analyzing LMS acceptance on the construct level, as this can take the structure of the users’ answers across questions per construct into account. Taken together, these results suggest a higher rate of user acceptance among Canvas users compared to Blackboard users at both the question and construct levels. Likewise, the descriptive network modeling indicates a slightly higher concordance among Canvas users than among Blackboard users at the construct level.
      Citation: Data
      PubDate: 2023-02-22
      DOI: 10.3390/data8030045
      Issue No: Vol. 8, No. 3 (2023)
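The Gini heterogeneity index named in the abstract above can be illustrated with a minimal sketch. It uses the standard definition G = 1 - Σ p_i², optionally normalized by its maximum (k-1)/k for k answer categories. The function name, the 5-point scale, and the sample answers are assumptions made for illustration; they are not taken from the article.

```python
from collections import Counter


def gini_heterogeneity(responses, categories, normalized=True):
    """Gini heterogeneity index of categorical (e.g. Likert) responses.

    G = 1 - sum(p_i^2), where p_i is the share of answers in category i.
    With normalized=True, G is divided by its maximum (k - 1) / k, so the
    result is 0 when all answers agree and 1 when they are spread evenly.
    """
    k = len(categories)
    n = len(responses)
    if n == 0 or k < 2:
        raise ValueError("need at least one response and two categories")
    if any(r not in categories for r in responses):
        raise ValueError("response outside the declared categories")
    counts = Counter(responses)
    g = 1.0 - sum((counts.get(c, 0) / n) ** 2 for c in categories)
    return g * k / (k - 1) if normalized else g


# Hypothetical answers to one questionnaire item on a 5-point scale.
scale = [1, 2, 3, 4, 5]
answers = [4, 4, 5, 4, 3, 4, 5, 4]
print(round(gini_heterogeneity(answers, scale), 3))
```

A value close to 0 indicates that respondents largely agree on the item; a value close to 1 indicates that answers are spread across the whole scale.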
       
  • Data, Vol. 8, Pages 46: Analysis of Government Policy Sentiment Regarding
           Vacation during the COVID-19 Pandemic Using the Bidirectional Encoder
           Representation from Transformers (BERT)

    • Authors: Intan Nurma Yulita, Victor Wijaya, Rudi Rosadi, Indra Sarathan, Yusa Djuyandi, Anton Satria Prabuwono
      First page: 46
      Abstract: To address the COVID-19 situation in Indonesia, the Indonesian government has adopted a number of policies. One of them is a vacation-related policy. Government measures with regard to this vacation policy have produced a wide range of viewpoints in society, which have been extensively shared on social media, including YouTube. However, there has not been any computerized system developed to date that can assess people’s social media reactions. Therefore, this paper provides a sentiment analysis application to this government policy by employing a bidirectional encoder representation from transformers (BERT) approach. The study method began with data collecting, data labeling, data preprocessing, BERT model training, and model evaluation. This study created a new dataset for this topic. The data were collected from the comments section of YouTube, and were categorized into three categories: positive, neutral, and negative. This research yielded an F-score of 84.33%. Another contribution from this study regards the methodology for processing sentiment analysis in Indonesian. In addition, the model was created as an application using the Python programming language and the Flask framework. The government can learn the extent to which the public accepts the policies that have been implemented by utilizing this research.
      Citation: Data
      PubDate: 2023-02-23
      DOI: 10.3390/data8030046
      Issue No: Vol. 8, No. 3 (2023)
       
  • Data, Vol. 8, Pages 47: A Dataset of Service Time and Related Patient
           Characteristics from an Outpatient Clinic

    • Authors: Haolin Feng, Yiwu Jia, Siyi Zhou, Hongyi Chen, Teng Huang
      First page: 47
      Abstract: Outpatient clinics’ productivity largely depends on their appointment scheduling systems. It is crucial for appointment scheduling to understand the intrinsic heterogeneity in patient and service types and act accordingly. This article describes an outpatient clinic dataset of consultation service time with heterogeneous characteristics. The dataset contains 6637 consultation records collected from 381 half-day sessions between 2018 and 2019. Each record includes encrypted session and patient IDs, consultation start and (approximated) end times, the month and day of the week, whether it was on a holiday, the patient’s visit count for a specific medical condition, gender, whether the consultation was cancer-related, and the distance from the patient’s mailing address to the clinic. These features can be used to classify patients into heterogeneous groups in studies of appointment scheduling. Therefore, this dataset with rich, heterogeneous patient characteristics provides a valuable opportunity for healthcare operations management researchers to develop, test, and benchmark the performance of their models and methods. It can also be used for studying appointment scheduling in other service industries. More generally, it provides pedagogical value in areas related to management science and operations research, applied statistics, and machine learning.
      Citation: Data
      PubDate: 2023-02-25
      DOI: 10.3390/data8030047
      Issue No: Vol. 8, No. 3 (2023)
       
  • Data, Vol. 8, Pages 48: Reconstructed River Water Temperature Dataset for
           Western Canada 1980–2018

    • Authors: Rajesh R. Shrestha, Jennifer C. Pesklevits
      First page: 48
      Abstract: Continuous water temperature data are important for understanding historical variability and trends of river thermal regime, as well as impacts of a warming climate on aquatic ecosystem health. We describe a reconstructed daily water temperature dataset that supplements sparse historical observations for 55 river stations across western Canada. We employed the air2stream model to reconstruct the water temperature dataset over the period 1980–2018, with air temperature and discharge data used as model inputs. The model was calibrated and validated by comparing with observed water temperature records, and the results indicate a reasonable statistical performance. We also present historical trends over the ice-free summer months from June to September using the reconstructed dataset, which indicate significantly increasing water temperature trends for most stations. Besides trend analysis, the dataset could be used for various applications, such as calculation of heat fluxes, calibration/validation of process-based water temperature models, establishment of baseline condition for future climate projections, and assessment of impacts on ecosystem health and water quality.
      Citation: Data
      PubDate: 2023-02-26
      DOI: 10.3390/data8030048
      Issue No: Vol. 8, No. 3 (2023)
       
  • Data, Vol. 8, Pages 49: Data Balancing Techniques for Predicting Student
           Dropout Using Machine Learning

    • Authors: Neema Mduma
      First page: 49
      Abstract: Predicting student dropout is a challenging problem in the education sector. This is due to an imbalance in student dropout data, mainly because the number of registered students is always higher than the number of dropout students. Developing a model without taking the data imbalance issue into account may lead to an ungeneralized model. In this study, different data balancing techniques were applied to improve prediction accuracy in the minority class while maintaining a satisfactory overall classification performance. Random Over Sampling, Random Under Sampling, Synthetic Minority Over Sampling, SMOTE with Edited Nearest Neighbor and SMOTE with Tomek links were tested, along with three popular classification models: Logistic Regression, Random Forest, and Multi-Layer Perceptron. Publicly accessible datasets from Tanzania and India were used to evaluate the effectiveness of balancing techniques and prediction models. The results indicate that SMOTE with Edited Nearest Neighbor achieved the best classification performance on the 10-fold holdout sample. Furthermore, Logistic Regression correctly classified the largest number of dropout students (57348 for the Uwezo dataset and 13430 for the India dataset) using the confusion matrix as the evaluation matrix. The applications of these models allow for the precise prediction of at-risk students and the reduction of dropout rates.
      Citation: Data
      PubDate: 2023-02-27
      DOI: 10.3390/data8030049
      Issue No: Vol. 8, No. 3 (2023)
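Random Over Sampling, the simplest of the balancing techniques listed in the abstract above, can be sketched in plain Python. This is an illustrative stand-in, not the author's code: the function name, the toy feature rows, and the labels are assumptions, and practical work would more likely use a dedicated library.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class. Returns new lists; the
    inputs are left unchanged."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Indices of the original rows that belong to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: label 1 (dropout) is the minority class.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Resampling is normally applied to the training split only, so that the evaluation data keep the original class distribution.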
       
  • Data, Vol. 8, Pages 50: Dataset of Partial Analytical Validation of the
           1,2-O-Dilauryl-Rac-Glycero-3-Glutaric Acid-(6′-Methylresorufin)
           Ester (DGGR) Lipase Assay in Equine Plasma

    • Authors: Laureen Michèle Peters, Judith Howard
      First page: 50
      Abstract: Laboratory assays require analytical validation to prove they are providing accurate results. This dataset describes the partial analytical validation of lipase activity, measured with the 1,2-o-dilauryl-rac-glycero-3-glutaric acid-(6′-methylresorufin) ester (DGGR) lipase assay in equine plasma. Samples with low (approx. 12 U/L), moderately increased (approx. 79 U/L), and markedly increased lipase activity (approx. 298 U/L) were chosen. Linearity was assessed in samples of ascending dilution prepared by mixing samples with low and high lipase activity in different proportions. Repeatability or intra-assay replication was evaluated by measuring each level in 25 replicates within the same run. Reproducibility or inter-assay replication was calculated by measuring each level in five replicates on five consecutive days. The assay was linear in the range of 12–298 U/L (R² = 0.9998) with a <2.3% deviation from the calculated value at any point. Within-run coefficients of variation were 4.43%, 0.69%, and 1.00% for the low, medium, and high samples, respectively. Between-run coefficients of variation were 3.57%, 1.42%, and 1.16%, respectively. To our knowledge, these are the first published data on the analytical validation of the DGGR lipase assay in horses, which may be of interest to veterinary clinical pathologists and equine clinicians measuring DGGR lipase in equine blood for diagnostic and research purposes.
      Citation: Data
      PubDate: 2023-02-28
      DOI: 10.3390/data8030050
      Issue No: Vol. 8, No. 3 (2023)
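The coefficients of variation reported in the abstract above follow the usual definition, CV = 100 × standard deviation / mean. A minimal sketch, assuming the sample standard deviation is used; the replicate values below are hypothetical and are not taken from the dataset.

```python
import statistics


def coefficient_of_variation(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical within-run replicates (U/L) of a low-activity sample.
replicates = [11.8, 12.4, 12.1, 11.5, 12.6]
print(f"CV = {coefficient_of_variation(replicates):.2f}%")
```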
       
  • Data, Vol. 8, Pages 51: Correction: Michel et al. SEN2VENµS, a
           Dataset for the Training of Sentinel-2 Super-Resolution Algorithms. Data
           2022, 7, 96

    • Authors: Julien Michel, Juan Vinasco-Salinas, Jordi Inglada, Olivier Hagolle
      First page: 51
      Abstract: There was an error in the original publication [...]
      Citation: Data
      PubDate: 2023-02-28
      DOI: 10.3390/data8030051
      Issue No: Vol. 8, No. 3 (2023)
       
  • Data, Vol. 8, Pages 52: Dataset on SCADA Data of an Urban Small Wind
           Turbine Operation in São Paulo, Brazil

    • Authors: Welson Bassi, Alcantaro Lemes Rodrigues, Ildo Luis Sauer
      First page: 52
      Abstract: Small wind turbines (SWTs) represent an opportunity to promote energy generation technologies from low-carbon renewable sources in cities. Tall buildings are inherently suitable for placing SWTs in urban environments. Thus, the Institute of Energy and Environment of the University of São Paulo (IEE-USP) has installed an SWT in an existing high-height High Voltage Laboratory building on its campus in São Paulo, Brazil. The dataset file contains data regarding the actual electrical and mechanical operational quantities and control parameters obtained and recorded by the internal inverter of a Skystream 3.7 SWT, with 1.8 kW rated power, from 2017 to 2022. The main electrical parameters are the generated energy, voltages, currents, and power frequency in the connection grid point. Rotation, referential wind speed, and temperatures measured in some points at the inverter and in the nacelle are also recorded. Several other parameters concerning the SWT inverter operation, including alarms and status codes, are also presented. This dataset can be helpful for reanalysis, to access information, such as capacity factor, and can also be used as overall input data of actual SWT operation quantities.
      Citation: Data
      PubDate: 2023-02-28
      DOI: 10.3390/data8030052
      Issue No: Vol. 8, No. 3 (2023)
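The capacity factor mentioned in the abstract above is the ratio of the energy actually generated to the energy the turbine would deliver if it ran at rated power for the whole period. A minimal sketch, using the 1.8 kW rated power stated in the abstract; the energy figure and the function name are hypothetical.

```python
def capacity_factor(energy_kwh, rated_power_kw, hours):
    """Capacity factor = generated energy / (rated power * elapsed hours)."""
    if rated_power_kw <= 0 or hours <= 0:
        raise ValueError("rated power and hours must be positive")
    return energy_kwh / (rated_power_kw * hours)


# Hypothetical example: 1576.8 kWh generated over one year (8760 h).
cf = capacity_factor(energy_kwh=1576.8, rated_power_kw=1.8, hours=8760)
print(f"capacity factor = {cf:.1%}")
```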
       
  • Data, Vol. 8, Pages 53: Toward a Spatially Segregated Urban Growth?
           Austerity, Poverty, and the Demographic Decline of Metropolitan Greece

    • Authors: Kostas Rontos, Enrico Maria Mosconi, Mattia Gianvincenzi, Simona Moretti, Luca Salvati
      First page: 53
      Abstract: Metropolitan decline in southern Europe was documented in few cases, being less intensively investigated than in other regions of the continent. Likely for the first time in recent history, the aftermath of the 2007 recession was a time period associated with economic and demographic decline in Mediterranean Europe. However, the impacts and consequences of the great crisis were occasionally verified and quantified, both in strictly urban contexts and in the surrounding rural areas. By exploiting official statistics, our study delineates sequential stages of demographic growth and decline in a large metropolitan region (Athens, Greece) as a response to economic expansion and stagnation. Having important implications for the extent and spatial direction of metropolitan cycles, the Athens’ case—taken as an example of urban cycles in Mediterranean Europe—indicates a possibly new dimension of urban shrinkage, with spatially varying population growth and decline along a geographical gradient of income and wealth. Heterogeneous dynamics led to a leapfrog urban expansion decoupled from agglomeration and scale, the factors most likely shaping long-term metropolitan expansion in advanced economies. Demographic decline in urban contexts was associated with multidimensional socioeconomic processes resulting in spatially complex demographic outcomes that require appropriate, and possibly more specific, regulation policies. By shedding further light on recession-driven metropolitan decline in advanced economies, the present study contributes to re-thinking short-term development mechanisms and medium-term demographic scenarios in Mediterranean Europe.
      Citation: Data
      PubDate: 2023-03-01
      DOI: 10.3390/data8030053
      Issue No: Vol. 8, No. 3 (2023)
       
  • Data, Vol. 8, Pages 54: Manual of GUI Program Governing ABAQUS Simulations
           of Bar Impact Test for Calibrating Bar Properties, Measured Strain, and
           Impact Velocity

    • Authors: Hyunho Shin
      First page: 54
      Abstract: Bar impact instruments, such as the (split) Hopkinson bars and direct impact Hopkinson bars, measure blast/impact waves or mechanical properties of materials at high strain rates. To effectively use such instruments, it is essential to know (i) the elastic properties of the bar, (ii) the correction factor of the measured strain, and (iii) information on impact velocity. This paper presents a graphic-user-interface (GUI) program prepared for solving these fundamental issues. We describe the directory structure of the program, roles and relations of associated files, GUI panels, algorithm, and execution procedure of the program. This program employs a separately measured bar density value and governs the ABAQUS simulations (explicit finite element analyses) of the bar impact test at a given impact velocity for a range of bar properties (elastic modulus and Poisson’s ratio) and two correction factors (in compression and tension) of the measured strain. The simulation is repeated until the predicted elastic wave profile in the bar is reasonably consistent with the experimental counterpart. The bar properties and correction factors are determined as the calibrated values when the two wave profiles are reasonably consistent. The program is also capable of impact velocity calibration with reference to a reliably measured bar strain wave. The quantities of a 19.1 mm diameter bar (maraging steel) were successfully calibrated using the presented GUI program. The GUI program, auxiliary programs, pre-processing files, and an example ABAQUS input file are available in a publicly accessible data repository.
      Citation: Data
      PubDate: 2023-03-01
      DOI: 10.3390/data8030054
      Issue No: Vol. 8, No. 3 (2023)
       
  • Data, Vol. 8, Pages 55: Dataset AqADAPT: Physicochemical Parameters,
           Vibrio Abundance, and Species Determination in Water Columns of Two
           Adriatic Sea Aquaculture Sites

    • Authors: Marija Purgar, Damir Kapetanović, Ana Gavrilović, Branimir K. Hackenberger, Božidar Kurtović, Ines Haberle, Jadranka Pečar Ilić, Sunčana Geček, Domagoj K. Hackenberger, Tamara Djerdj, Lav Bavčević, Jakov Žunić, Fran Barac, Zvjezdana Šoštarić Vulić, Tin Klanjšček
      First page: 55
      Abstract: Aquaculture provides more than 50% of all seafood for human consumption. This important industrial sector is already under pressure from climate-change-induced shifts in water column temperature, nutrient loads, precipitation patterns, microbial community composition, and ocean acidification, all affecting fish welfare. Disease-related risks are also shifting with important implications for risk from vibriosis, a disease that can lead to massive economic losses. Adaptation to these pressures poses numerous challenges for aquaculture producers, policy makers, and researchers. The dataset AqADAPT aims to help the development of management and adaptation tools by providing (i) measurements of physicochemical (temperature, salinity, total dissolved solids, pH, dissolved oxygen, conductivity, transparency, total nitrogen, ammonia, nitrate, nitrite, total phosphorus, total particulate matter, particulate organic matter, and particulate inorganic matter) and microbiological (heterotrophic (total) bacteria, fecal indicators, and Vibrio abundance) parameters of seawater and (ii) biochemical determination of culturable bacteria in two locations near floating cage fish farms in the Adriatic Sea. Water sampling was conducted seasonally between 2019 and 2021 at two fish farms (Cres and Vrgada) and corresponding reference (control) sites, in four vertical layers (the surface, 6 m, 12 m, and the bottom), for a total of 108 observations.
      Citation: Data
      PubDate: 2023-03-03
      DOI: 10.3390/data8030055
      Issue No: Vol. 8, No. 3 (2023)
       
  • Data, Vol. 8, Pages 56: Learned Sorted Table Search and Static Indexes in
           Small-Space Data Models

    • Authors: Domenico Amato, Raffaele Giancarlo, Giosué Lo Bosco
      First page: 56
      Abstract: Machine-learning techniques, properly combined with data structures, have resulted in Learned Static Indexes, innovative and powerful tools that speed up Binary Searches with the use of additional space with respect to the table being searched. Such space is devoted to the machine-learning models. Although in their infancy, these are methodologically and practically important, due to the pervasiveness of Sorted Table Search procedures. In modern applications, model space is a key factor, and a major open question concerning this area is to assess to what extent one can enjoy the speeding up of Binary Searches achieved by Learned Indexes while using constant or nearly constant-space models. In this paper, we investigate the mentioned question by (a) introducing two new models, i.e., the Learned k-ary Search Model and the Synoptic Recursive Model Index; and (b) systematically exploring the time–space trade-offs of a hierarchy of existing models, i.e., the ones in the reference software platform Searching on Sorted Data, together with the new ones proposed here. We document a novel and rather complex time–space trade-off picture, which is informative for users as well as designers of Learned Indexing data structures. By adhering to and extending the current benchmarking methodology, we experimentally show that the Learned k-ary Search Model is competitive in time with respect to Binary Search in constant additional space. Our second model, together with the bi-criteria Piece-wise Geometric Model Index, can achieve speeding up of Binary Search with a model space of 0.05% more than the one taken by the table, thereby being competitive in terms of the time–space trade-off with existing proposals. The Synoptic Recursive Model Index and the bi-criteria Piece-wise Geometric Model complement each other quite well across the various levels of the internal memory hierarchy. Finally, our findings stimulate research in this area since they highlight the need for further studies regarding the time–space relation in Learned Indexes.
      Citation: Data
      PubDate: 2023-03-03
      DOI: 10.3390/data8030056
      Issue No: Vol. 8, No. 3 (2023)
       
  • Data, Vol. 8, Pages 57: Dataset for Spectroscopic, Structural and Dynamic
           Analysis of Human Fe(II)/2OG-Dependent Dioxygenase ALKBH3

    • Authors: Lyubov Yu. Kanazhevskaya, Alexey A. Gorbunov, Polina V. Zhdanova, Vladimir V. Koval
      First page: 57
      Abstract: Fe(II)/2OG-dependent dioxygenases of the AlkB family catalyze the direct removal of alkylation damage in the course of DNA and RNA repair. ALKBH3, a human homolog of the E. coli AlkB protein, is able to hydroxylate N1-methyladenine, N3-methylcytosine, and N1-methylguanine in single-stranded DNA and RNA. Due to its contribution to antitumor drug resistance, this enzyme is considered a promising therapeutic target. The elucidation of ALKBH3’s structural peculiarities is important for establishing a detailed mechanism of damaged DNA recognition and processing, as well as for the development of specific inhibitors. This work presents new data on the wild-type ALKBH3 protein and its four mutant forms (Y143F, Y143A, L177A, and H191A) obtained by circular dichroism (CD) spectroscopy. The dataset includes the CD spectra of proteins measured at different temperatures and a 3D visualization of the ALKBH3–DNA complex where the mutated amino acid residues are marked. These results show how substitution of the key amino acids influences the secondary structure content of the protein.
      Citation: Data
      PubDate: 2023-03-03
      DOI: 10.3390/data8030057
      Issue No: Vol. 8, No. 3 (2023)
       
  • Data, Vol. 8, Pages 58: Home Comfort Dataset: Acquired from SGH

    • Authors: Mariana Santos, Mário Antunes, Diogo Gomes, Rui L. Aguiar
      First page: 58
      Abstract: In this work, we share the dataset collected during the Smart Green Homes (SGH) project. The project’s goal was to develop integrated products and technology solutions for households, as well as to improve the standards of comfort and user satisfaction. This was to be achieved while improving household energy efficiency and reducing the usage of gaseous pollutants, in response to the planet’s sustainability issues. One of the tasks executed within the project was the collection of data from volunteers’ homes, including environmental information and the level of comfort as perceived by the volunteers themselves. While used in the original project, the resulting dataset contains valuable information that could not be explored at the time. We now share this dataset with the community; it can be used in various scenarios, which may include heating appliance optimisation, presence detection and environmental prediction.
      Citation: Data
      PubDate: 2023-03-03
      DOI: 10.3390/data8030058
      Issue No: Vol. 8, No. 3 (2023)
       
  • Data, Vol. 8, Pages 59: WaRM: A Roof Material Spectral Library for
           Wallonia, Belgium

    • Authors: Coraline Wyard, Rodolphe Marion, Eric Hallot
      First page: 59
      Abstract: The exploitation of urban-material spectral properties is of increasing importance for a broad range of applications, such as urban climate-change modeling and mitigation or specific/dangerous roof-material detection and inventory. A new spectral library dedicated to the detection of roof material was created to reflect the regional diversity of materials employed in Wallonia, Belgium. The Walloon Roof Material (WaRM) spectral library accounts for 26 roof material spectra in the spectral range 350–2500 nm. Spectra were acquired using an ASD FieldSpec3 Hi-Res spectrometer in laboratory conditions, using a spectral sampling interval of 1 nm. The analysis of the spectra shows that spectral signatures are strongly influenced by the color of the roof materials, at least in the VIS spectral range. The SWIR spectral range is in general more relevant to distinguishing the different types of material. Exceptions are the similar properties and very close spectra of several black materials, meaning that their spectral signatures are not sufficiently different to distinguish them from each other. Although building materials can vary regionally due to different available construction materials, the WaRM spectral library can certainly be used for wider applications; Wallonia has always been strongly connected to the surrounding regions and has always encountered climatic conditions similar to all of Northwest Europe.
      Citation: Data
      PubDate: 2023-03-07
      DOI: 10.3390/data8030059
      Issue No: Vol. 8, No. 3 (2023)
       
  • Data, Vol. 8, Pages 60: Development of a Machine-Learning-Based Novel
           Framework for Travel Time Distribution Determination Using Probe Vehicle
           Data

    • Authors: Gurmesh Sihag, Praveen Kumar, Manoranjan Parida
      First page: 60
      Abstract: Investigating travel time variability is critical for pre-trip planning, reliable route selection, traffic management, and the development of control strategies to mitigate traffic congestion problems cost-effectively. Hence, a large number of studies are available in the literature which determine the most suitable distribution to fit the travel time data, but these studies recommend different distributions for the travel time data, and there is a disagreement on the best distribution option for fitting to the travel time data. The present study proposes a novel framework to determine the best distribution to represent the travel time data obtained from probe vehicles by using the modern machine learning technique. This study employs vast travel time data collected by fitting GPS tracking units on the probe vehicles and offers a comprehensive investigation of travel time distribution in different scenarios generated due to spatiotemporal variation of the travel time. The study also considers the effect of weather and uses the three most commonly used non-parametric goodness-of-fit tests (namely, Kolmogorov–Smirnov test, Anderson–Darling test, and chi-squared test) to fit and rank a comprehensive set of around 60 unimodal statistical distributions. The framework proposed in the study can determine the travel time distribution with 91% accuracy. Additionally, the distribution determined by the framework has an acceptance rate of 98.4%, which is better than the acceptance rates of the distributions recommended in existing studies. Because of its robustness and applicability in many different traffic situations, the proposed framework can also be used in developing countries with heterogeneous disordered traffic conditions to evaluate the road network’s performance in terms of travel time reliability.
      Citation: Data
      PubDate: 2023-03-14
      DOI: 10.3390/data8030060
      Issue No: Vol. 8, No. 3 (2023)
       
  • Data, Vol. 8, Pages 61: TKGQA Dataset: Using Question Answering to Guide
           and Validate the Evolution of Temporal Knowledge Graph

    • Authors: Ryan Ong, Jiahao Sun, Ovidiu Șerban, Yi-Ke Guo
      First page: 61
      Abstract: Temporal knowledge graphs can be used to represent the current state of the world and, as daily events happen, the need to update the temporal knowledge graph, in order to stay consistent with the state of the world, becomes very important. However, there is currently no reliable method to accurately validate the update and evolution of knowledge graphs. There has been a recent development in text summarisation, whereby question answering is used to both guide and fact-check summarisation quality. The same process can be applied to the temporal knowledge graph update process. To the best of our knowledge, there is currently no dataset that connects temporal knowledge graphs with documents and question–answer pairs. In this paper, we propose the TKGQA dataset, consisting of over 5000 financial news documents related to M&A. Each document has extracted facts, question–answer pairs, and before and after temporal knowledge graphs, to highlight the state of temporal knowledge and any changes caused by the facts extracted from the document. As we parse through each document, we use question answering to check and guide the update process of the temporal knowledge graph.
      Citation: Data
      PubDate: 2023-03-14
      DOI: 10.3390/data8030061
      Issue No: Vol. 8, No. 3 (2023)
       
  • Data, Vol. 8, Pages 62: Instance and Data Generation for the Offline
           Nanosatellite Task Scheduling Problem

    • Authors: Cezar Antônio Rigo, Edemar Morsch Filho, Laio Oriel Seman, Luís Loures, Valderi Reis Quietinho Leithardt
      First page: 62
      Abstract: This paper discusses several cases of the Offline Nanosatellite Task Scheduling (ONTS) optimization problem, which seeks to schedule the start and finish timings of payloads on a nanosatellite. Modeled after the FloripaSat-I mission, a nanosatellite, the examples were built expressly to test the performance of various solutions to the ONTS problem. Realistic input data for power harvesting calculations were used to generate the instances, and an instance creation procedure was employed to increase the instances’ difficulty. The instances are made accessible to the public to facilitate a fair comparison of various solutions and to aid in establishing a baseline for the ONTS problem. Additionally, the study discusses the various orbit types and their effects on energy harvesting and mission performance.
      Citation: Data
      PubDate: 2023-03-21
      DOI: 10.3390/data8030062
      Issue No: Vol. 8, No. 3 (2023)
       
  • Data, Vol. 8, Pages 22: Transcriptome Dataset of Strawberry (Fragaria x
           ananassa Duch.) Leaves Using Oxford Nanopore Sequencing under LED
           Irradiation and Application of Methyl Jasmonate and Methyl Salicylate
           Hormones Treatment

    • Authors: M. Adrian, Roedhy Poerwanto, Eiichi Inoue, Deden Matra
      First page: 22
      Abstract: This data descriptor introduces a transcriptome dataset of strawberry plant leaves exposed to an LED light treatment and the plant hormones methyl jasmonate (MeJA) and methyl salicylate (MeSA). These data consist of a transcriptome dataset (four libraries) obtained from the leaves of strawberry plants treated with LEDs of blue and red spectrums and the hormones MeJA and MeSA, which allowed us to conduct a further analysis of the growth and development processes of strawberry plants. In addition, we describe detailed procedures on how the plants were prepared and treated and how the data were generated and processed beforehand. Further analysis of these data will significantly help to improve our understanding of the molecular mechanisms of LED light and MeJA-MeSA in strawberry plants.
      Citation: Data
      PubDate: 2023-01-17
      DOI: 10.3390/data8020022
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 23: Acknowledgment to the Reviewers of Data in 2022

    • Authors: Data Editorial Office
      First page: 23
      Abstract: High-quality academic publishing is built on rigorous peer review [...]
      Citation: Data
      PubDate: 2023-01-19
      DOI: 10.3390/data8020023
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 24: Basic Input Data for Audiences’ Geotargeting
           by Destinations’ Partial Accessibility: Notes from Slovakia

    • Authors: Csaba Sidor, Branislav Kršák, Ľubomír Štrba
      First page: 24
      Abstract: The presented notes focus partially on two of the basic elements (accessibility and image) of any managed tourism destination from the perspective of basic ETL processes over open and third-party data. The specific case aims to investigate the usability of open government data on occupancy in combination with third-party data on online audiences’ engagement for DMOs’ potential seasonal geotargeting via utilizing Openrouteservice’s APIs. For the pilot case, a Slovak (Central Europe) destination’s data on occupancy, and the DMO’s website and social media engagement by origin were used to determine potential audiences’ accessibility by car. Testing of the pilot results on a sample of foreign markets indicates that by a partial mix of the means of transportation, the vast majority of audiences are within a 4 h long incoming trip. Although the preliminary tests indicate a linear correlation between the destination’s occupancy and online audiences’ share accessibility by car, for further extrapolation, the list of missing input remains long. The main addition to the field of tourism and destination management may be the partial reusability of developed techniques for data extraction, and transformation for further data overlays, which may save some time.
      Citation: Data
      PubDate: 2023-01-19
      DOI: 10.3390/data8020024
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 25: How to Reach Green Word of Mouth through Green
           Trust, Green Perceived Value and Green Satisfaction

    • Authors: Jose Antonio Román-Augusto, Camila Garrido-Lecca-Vera, Manuel Luis Lodeiros-Zubiria, Martin Mauricio-Andia
      First page: 25
      Abstract: The production and consumption of green food products have become hot topics in marketing. Companies are implementing marketing strategies such as green perceived value, green trust, and green satisfaction to guarantee green word of mouth. An online questionnaire distributed through social media was used to collect the data. The sample consists of 297 people. The 297 responses were coded and analysed with the Software Smart-PLS. The data described include the sample sociodemographic profile, the descriptive analysis of all items, the reliability and validity of the measures of the reflective model and the evaluation of the results of the structural model. Four hypotheses included in the PLS-SEM proposed were validated for a p-value of 0.001. The results confirmed the influence of green perceived value on green trust and green satisfaction. Moreover, the results highlight that green satisfaction and green trust influence green word of mouth.
      Citation: Data
      PubDate: 2023-01-19
      DOI: 10.3390/data8020025
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 26: C2C e-Marketplaces and How Their
           Micro-Segmentation Strategies Influence Their Customers

    • Authors: Sandra Castillo-Sotomayor, Nicholas Guimet-Cornejo, Manuel Luis Lodeiros-Zubiria
      First page: 26
      Abstract: The purpose of this study is to contribute to the literature by examining how the micro-segmentation strategies developed by C2C e-marketplaces influence customer satisfaction, brand loyalty, trust, and brand equity, proposing a PLS-SEM model with seven hypotheses. An online questionnaire was answered by a sample of 403 people. The results were edited, coded, transformed, and finally analysed with the software Smart-PLS 3.3.7. The results confirm that the reflective model shows good reliability and validity and that six of the seven hypotheses were accepted. Furthermore, micro-segmentation mostly influences customer satisfaction, followed by brand equity and trust. On the other hand, the results confirm that, apparently, customer satisfaction does not impact brand loyalty, and micro-segmentation is the more significant construct in reaching brand loyalty in the C2C e-marketplaces. It is worth noting that this research contributes to knowledge about two issues unexplored by academia: micro-segmentation and C2C e-marketplaces.
      Citation: Data
      PubDate: 2023-01-19
      DOI: 10.3390/data8020026
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 27: Challenges and Perspectives of Open Data in
           Modelling Infectious Diseases

    • Authors: Francesco Branda, Giorgia Lodi
      First page: 27
      Abstract: The pandemic challenged the scientific community and governments around the world, who were looking for real-time answers but lacked the data or evidence to guide decision-making [...]
      Citation: Data
      PubDate: 2023-01-26
      DOI: 10.3390/data8020027
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 28: A Drought Dataset Based on a Composite Index for
           the Sahelian Climate Zone of Niger

    • Authors: Issa Garba, Zakari Seybou Abdourahamane, Alisher Mirzabaev
      First page: 28
      Abstract: Agricultural drought monitoring in Niger is relevant for the implementation of effective early warning systems and for improving climate change adaptation strategies. However, the scarcity of in situ data hampers an efficient analysis of drought in the country. The present dataset was created for agricultural drought characterization in the Sahelian climate zone of Niger. The dataset comprises the three-month scale and monthly time series of a composite drought index (CDI) and their corresponding drought classes at a spatial resolution of 1 km² for the period 2000–2020. The CDI was generated from remote sensing data, namely CHIRPS (Climate Hazards Group InfraRed Precipitation with Stations), normalized difference vegetation index (NDVI) and land surface temperature (LST) from MODIS (Moderate Resolution Imaging Spectroradiometer). A weighting technique combining entropy and Euclidean distance was applied in the CDI derivation. From the present dataset, the extraction of the CDI time series can be performed for any location of the study area using its geographic coordinates. Therefore, seasonal drought characteristics, such as onset, end, duration, severity and frequency can be computed from the CDI time series using the theory of runs. The availability of the present dataset is relevant for the socio-economic assessment of drought impacts at small spatial scales, such as district and household level. This dataset is also important for the assessment of drought characteristics in remote areas or areas inaccessible due to civil insecurity in the country as it was entirely generated from remote sensing data. Finally, by including temperature data, the dataset enables drought modelling under global warming.
      Citation: Data
      PubDate: 2023-01-28
      DOI: 10.3390/data8020028
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 29: Retinal Fundus Multi-Disease Image Dataset (RFMiD)
           2.0: A Dataset of Frequently and Rarely Identified Diseases

    • Authors: Sachin Panchal, Ankita Naik, Manesh Kokare, Samiksha Pachade, Rushikesh Naigaonkar, Prerana Phadnis, Archana Bhange
      First page: 29
      Abstract: Irreversible vision loss is a worldwide threat. Developing a computer-aided diagnosis system to detect retinal fundus diseases is extremely useful and serviceable to ophthalmologists. Early detection, diagnosis, and correct treatment could save the eye’s vision. Nevertheless, an eye may be afflicted with several diseases if proper care is not taken. A single retinal fundus image might be linked to one or more diseases. Age-related macular degeneration, cataracts, diabetic retinopathy, Glaucoma, and uncorrected refractive errors are the leading causes of visual impairment. Our research team at the center of excellence lab has generated a new dataset called the Retinal Fundus Multi-Disease Image Dataset 2.0 (RFMiD2.0). This dataset includes around 860 retinal fundus images, annotated by three eye specialists, and is a multiclass, multilabel dataset. We gathered images from a research facility in Jalna and Nanded, where patients across Maharashtra come for preventative and therapeutic eye care. Our dataset would be the second publicly available dataset consisting of the most frequent diseases, along with some rarely identified diseases. This dataset is auxiliary to the previously published RFMiD dataset. This dataset would be significant for the research and development of artificial intelligence in ophthalmology.
      Citation: Data
      PubDate: 2023-01-28
      DOI: 10.3390/data8020029
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 30: Litterfall Production and Litter Decomposition
           Experiments: In Situ Datasets of Nutrient Fluxes in Two Bornean Lowland
           Rain Forests Associated with Acacia Invasion

    • Authors: Salwana Md. Jaafar, Rahayu Sukmaria Sukri, Faizah Metali, David F. R. P. Burslem
      First page: 30
      Abstract: It is increasingly recognized that invasion by alien plant species such as Acacia spp. can impact tropical forest ecosystems, although quantifications of nutrient fluxes for invaded lowland tropical rain forests in aseasonal climates remain understudied. This paper describes the methodology and presents data collected during a year-long study of litterfall production and leaf litter decomposition rates in two distinct tropical lowland forests in Borneo affected by Acacia invasion. The study is the first to present a comprehensive dataset on the impacts of invasive Acacia species on Bornean forests and can be further used for future research to assess the long-term impact of Acacia invasion in these forest ecosystems. Extensive studies of nutrient cycling processes in aseasonal tropical lowland rainforests occurring on different soil types remain limited. Therefore, this dataset improves understanding of nutrient cycling and ecosystem processes in tropical forests and can be utilized by the wider scientific community to examine ecosystem responses in tropical forests.
      Citation: Data
      PubDate: 2023-01-29
      DOI: 10.3390/data8020030
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 31: Runoff for Russia (RFR v1.0): The Large-Sample
           Dataset of Simulated Runoff and Its Characteristics

    • Authors: Georgy Ayzel
      First page: 31
      Abstract: Global warming challenges communities worldwide to develop new adaptation strategies that are required to be based on reliable data. As a vital component of life, river runoff comes into particular focus as a determining and limiting factor of water-related hazard assessment. Here, we present a dataset that makes it possible to estimate the influence of projected climate change on runoff and its characteristics. We utilize the HBV (in Swedish, Hydrologiska Byråns Vattenbalansavdelning) hydrological model and drive it with the ISIMIP (The Inter-Sectoral Impact Model Intercomparison Project) meteorological forcing data for both historical (1979–2016) and projected (2017–2099) periods to simulate runoff and the respective hydrological states and variables, i.e., state of the soil reservoir, snow water equivalent, and predicted amount of melted water, for 425 river basins across Russia. For the projected period, the bias-corrected outputs from four General Circulation Models (GCM) under three Representative Concentration Pathways (RCPs) are used, making it possible to assess the uncertainty of future projections. The simulated runoff formed the basis for calculating its characteristics (191 in total), representing the properties of water regime dynamics. The presented dataset also comprises two auxiliary parts to ensure the seamless assessment of inter-connected hydro-meteorological variables and characteristics: (1) meteorological forcing data and its characteristics and (2) geospatial data. The straightforward use of the presented dataset makes it possible for many interested parties to identify and further communicate water-related climate change issues in Russia on a national scale.
      Citation: Data
      PubDate: 2023-01-30
      DOI: 10.3390/data8020031
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 32: LGCM and PLS-SEM in Panel Survey Data: A
           Systematic Review and Bibliometric Analysis

    • Authors: Zulkifli Mohd Ghazali, Wan Fairos Wan Yaacob, Wan Marhaini Wan Omar
      First page: 32
      Abstract: The application of Latent Growth Curve Model (LGCM) and Partial Least Square Structural Equation Modeling (PLS-SEM) has gained much attention in panel survey studies. This study explores the distributions and trends of LGCM, and PLS-SEM used in panel survey data. It highlights the gaps in the current and existing approaches of PLS-SEM practiced by researchers in analyzing panel survey data. The integrated bibliometric analysis and systematic review were employed in this study. Based on the reviewed articles, the LGCM and PLS-SEM showed an increasing trend of publication in the panel survey data. Though the popularity of LGCM was more outstanding than PLS-SEM for the panel survey data, LGCM has several limitations such as statistical assumptions, reliable sample size, number of repeated measures, and missing data. This systematic review identified five different approaches of PLS-SEM in analyzing the panel survey data namely pre- and post-approach with different constructs, a path comparison approach, a cross-lagged approach, pre- and post-approach with the same constructs, and an evaluation approach practiced by researchers. None of the previous approaches used can establish one structural model to represent the whole changes in the repeated measure. Thus, the findings of this paper could help researchers choose a more appropriate approach to analyzing panel survey data.
      Citation: Data
      PubDate: 2023-01-30
      DOI: 10.3390/data8020032
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 33: Volatiles Emitted by Three Genovese Basil
           Cultivars in Different Growing Systems and Successive Harvests

    • Authors: Michele Ciriello, Luigi Formisano, Youssef Rouphael, Giandomenico Corrado
      First page: 33
      Abstract: The Genovese basil (Ocimum basilicum L.) is the essential ingredient in “pesto” sauce, and it has always had ample use in Mediterranean gastronomy. This horticultural type of basil is grown in the open field and harvested more than once during its cultivation cycle, but in recent decades it is increasingly grown using alternative cultivation methods (e.g., soilless cultivation) that guarantee higher and more uniform production. The dataset presented in this contribution refers to the analysis of the aroma profile by solid-phase microextraction and gas chromatography coupled with a mass spectrometer, of three different cultivars of Genovese basil (Aroma 2, Eleonora, and Italiano Classico) grown in the open field or floating raft system in two successive harvests. The data are a record of the variability of volatile organic compounds due to key agronomic factors, such as the genotype, the cultivation method, and the cut. They may be of interest for those concerned about the impact of different technical factors on the aroma and flavor of basil plants.
      Citation: Data
      PubDate: 2023-01-31
      DOI: 10.3390/data8020033
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 34: Neural Coreference Resolution for Dutch
           Parliamentary Documents with the DutchParliament Dataset

    • Authors: Ruben van Heusden, Jaap Kamps, Maarten Marx
      First page: 34
      Abstract: The task of coreference resolution concerns the clustering of words and phrases referring to the same entity in text, either in the same document or across multiple documents. The task is challenging, as it concerns elements of named entity recognition and reading comprehension, as well as others. In this paper, we introduce DutchParliament, a new Dutch coreference resolution dataset obtained through the manual annotation of 74 government debates, expanded with a domain-specific class. In contrast to existing datasets, which are often composed of news articles, blogs or other documents, the debates in DutchParliament are transcriptions of speech, and therefore offer a unique structure and way of referencing compared to other datasets. By constructing and releasing this dataset, we hope to facilitate the research on coreference resolution in niche domains, with different characteristics than traditional datasets. The DutchParliament dataset was compared to SoNaR-1 and RiddleCoref, two other existing Dutch coreference resolution corpora, to highlight its particularities and differences from existing datasets. Furthermore, two coreference resolution models for Dutch, the rule-based DutchCoref model and the neural e2eDutch model, were evaluated on the DutchParliament dataset to examine their performance. It was found that the characteristics of the DutchParliament dataset are quite different from those of the other two datasets, although the performance of the e2eDutch model does not seem to be significantly affected by this. Furthermore, experiments were conducted by utilizing the metadata present in the DutchParliament corpus to improve the performance of the e2eDutch model. The results indicate that the addition of available metadata about speakers has a beneficial effect on the performance of the model, although the addition of the gender of speakers seems to have a limited effect.
      Citation: Data
      PubDate: 2023-02-01
      DOI: 10.3390/data8020034
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 35: Accuracy Assessment of Machine Learning Algorithms
           Used to Predict Breast Cancer

    • Authors: Mohamed Ebrahim, Ahmed Ahmed Hesham Sedky, Saleh Mesbah
      First page: 35
      Abstract: Machine learning (ML) was used to develop classification models to predict individual tumor patients’ outcomes. Binary classification defined whether the tumor was malignant or benign. This paper presents a comparative analysis of machine learning algorithms used for breast cancer prediction. This study used a dataset obtained from the National Cancer Institute (NIH), USA, which contains 1.7 million data records. Classical and deep learning methods were included in the accuracy assessment. Classical decision tree (DT), linear discriminant (LD), logistic regression (LR), support vector machine (SVM), and ensemble techniques (ET) algorithms were used. Probabilistic neural network (PNN), deep neural network (DNN), and recurrent neural network (RNN) methods were used for comparison. Feature selection and its effect on accuracy were also investigated. The results showed that decision trees and ensemble techniques outperformed the other techniques, as they both achieved a 98.7% accuracy.
      Citation: Data
      PubDate: 2023-02-02
      DOI: 10.3390/data8020035
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 36: A Global Multiscale SPEI Dataset under an Ensemble
           Approach

    • Authors: Monia Santini, Sergio Noce, Marco Mancini, Luca Caporaso
      First page: 36
      Abstract: A new multiscale Standardized Precipitation Evapotranspiration Index (SPEI) dataset is provided for a reference period (1960–1999) and two future time horizons (2040–2079 and 2060–2099). The historical forcing is based on combined climate observations and reanalysis (WATer and global CHange Forcing Dataset), and the future projections are fed by the Fast Track experiment of the Inter-Sectoral Impact Model Intercomparison Project under representative concentration pathways (RCPs) 4.5 and 8.5 and by an additional Earth system model (CMCC-CESM) forced by RCP 8.5. To calculate the potential evapotranspiration (PET) input to the SPEI, the Hargreaves–Samani and Thornthwaite equations were adopted. This ensemble considers uncertainty due to different climate models, development pathways, and input formulations. The SPEI is provided for accumulation periods of potential moisture deficit from 1 to 18 months starting in each month of the year, with a focus on the within-period variability, excluding long-term warming effects on PET. In addition to supporting drought analyses, this dataset is also useful for assessing wetter-than-normal conditions spanning one or more months. The SPEI was calculated using the SPEIbase package.
      Citation: Data
      PubDate: 2023-02-05
      DOI: 10.3390/data8020036
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 37: Experimental Spectroscopic Data of SnO2 Films and
           Powder

    • Authors: Hawazin Alghamdi, Olasunbo Z. Farinre, Mathew L. Kelley, Adam J. Biacchi, Dipanjan Saha, Tehseen Adel, Kerry Siebein, Angela R. Hight Walker, Christina A. Hacker, Albert F. Rigosi, Prabhakar Misra
      First page: 37
      Abstract: Powders and films composed of tin dioxide (SnO2) are promising candidates for a variety of high-impact applications, and despite the material’s prevalence in such studies, it remains of high importance that commercially available materials meet the quality demands of the industries that these materials would most benefit. Imaging techniques, such as scanning electron microscopy (SEM) and atomic force microscopy (AFM), were used in conjunction with Raman spectroscopy and X-ray photoelectron spectroscopy (XPS) to assess the quality of a variety of samples, such as powder and thin film on quartz with thicknesses of 41 nm, 78 nm, 97 nm, 373 nm, and 908 nm. In this study, the dependencies of the corresponding Raman, XPS, and SEM analysis results on properties of the samples, such as the thickness and form (powder versus film), are determined. The outcomes achieved can be regarded as a guide for performing quality checks of such products, and as a reference for evaluating commercially available samples.
      Citation: Data
      PubDate: 2023-02-09
      DOI: 10.3390/data8020037
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 38: Datasets of Groundwater Level and Surface Water
           Budget in a Central Mediterranean Site (21 June 2017–1 October 2022)

    • Authors: Marco Delle Rose, Paolo Martano
      First page: 38
      Abstract: This note makes available five years of data gathered at a measurement site equipped with a micrometeorological station and two monitoring wells. Series of data of hydrological and atmospheric variables make it possible to estimate the flux of water across the atmosphere-land interface and to calculate the water budget, which are crucial topics in climate and environmental sciences. The water-table measurements began during 2017, one of the driest years of the whole instrumental period of climate history for the Central Mediterranean. Data from the micrometeorological station have been used to construct two more datasets of daily and monthly totals of different terms of the surface water budget, from which the net infiltration has been estimated. An apparent decreasing trend characterizes both the groundwater level and the estimated infiltration time series in the considered period.
      Citation: Data
      PubDate: 2023-02-09
      DOI: 10.3390/data8020038
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 39: Multi-Year On-Farm Trial Data on the Performance
           of Long- and Short-Duration Wheat Varieties against Sowing Dates in the
           Eastern Indo-Gangetic Plain of India

    • Authors: Anurag Ajay, Madhulika Singh, Subhajit Patra, Harshit Ranjan, Ajay Pundir, Shishpal Poonia, Anurag Kumar, Deepak K. Singh, Pankaj Kumar, Moben Ignatius, Prabhat Kumar, Sonam R. Sherpa, Ram K. Malik, Virender Kumar, Sudhanshu Singh, Peter Craufurd, Andrew J. McDonald
      First page: 39
      Abstract: Sub-optimal wheat productivity in the eastern Indo-Gangetic plain of India can largely be attributed to delayed sowing and the use of short-duration varieties. The second week of November is the ideal time for sowing wheat in eastern India, though farmers generally plant later. Late-sowing farmers tend to prefer short-duration varieties, leading to an additional yield penalty. To validate the effect of timely sowing and the comparative performance of long- and short-duration varieties, multi-location on-farm trials were conducted continuously over five years starting from 2016–2017. Ten districts were selected to ensure that all the agro-climatic zones of the region were covered. There were five treatments of sowing windows: (T1) 1–10 November, (T2) 11–20 November, (T3) 21–30 November, (T4) 1–15 December, and (T5) 16–31 December. Varietal performance was compared in T3, T4, and T5, as short-duration varieties are normally sown after 20 November. There is asymmetry in the distribution of samples within treatments and over the years due to the allocation of fields by farmers. Altogether, the trial was conducted at 3735 sites and captured 61 variables, including yield and yield-attributing traits. Findings suggested that grain yields of long-duration wheat varieties are better even under late-sown scenarios.
      Citation: Data
      PubDate: 2023-02-10
      DOI: 10.3390/data8020039
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 40: Whole-Slide Images and Patches of Clear Cell Renal
           Cell Carcinoma Tissue Sections Counterstained with Hoechst 33342, CD3, and
           CD8 Using Multiple Immunofluorescence

    • Authors: Georg Wölflein, In Hwa Um, David J. Harrison, Ognjen Arandjelović
      First page: 40
      Abstract: In recent years, there has been an increased effort to digitise whole-slide images of cancer tissue. This effort has opened up a range of new avenues for the application of deep learning in oncology. One such avenue is virtual staining, where a deep learning model is tasked with reproducing the appearance of stained tissue sections, conditioned on a different, often less expensive, input stain. However, data to train such models in a supervised manner, where the input and output stains are aligned on the same tissue sections, are scarce. In this work, we introduce a dataset of ten whole-slide images of clear cell renal cell carcinoma tissue sections counterstained with Hoechst 33342, CD3, and CD8 using multiple immunofluorescence. We also provide a set of over 600,000 patches of size 256 × 256 pixels extracted from these images together with cell segmentation masks in a format amenable to training deep learning models. It is our hope that this dataset will be used to further the development of deep learning methods for digital pathology by serving as a dataset for comparing and benchmarking virtual staining models.
      Citation: Data
      PubDate: 2023-02-15
      DOI: 10.3390/data8020040
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 41: VPAgs-Dataset4ML: A Dataset to Predict Viral
           Protective Antigens for Machine Learning-Based Reverse Vaccinology

    • Authors: Salod, Mahomed
      First page: 41
      Abstract: Reverse vaccinology (RV) is a computer-aided approach for vaccine development that identifies a subset of pathogen proteins as protective antigens (PAgs) or potential vaccine candidates. Machine learning (ML)-based RV is promising, but requires a dataset of PAgs (positives) and non-protective protein sequences (negatives). This study aimed to create an ML dataset, VPAgs-Dataset4ML, to predict viral PAgs based on PAgs obtained from Protegen. We performed seven steps to identify PAgs from the Protegen website and non-protective protein sequences from Universal Protein Resource (UniProt). The seven steps included downloading viral PAgs from Protegen, performing quality checks on PAgs using the standard BLASTp identity check ≤30% via MMseqs2, and computational steps running on Google Colaboratory and the Ubuntu terminal to retrieve and perform quality checks (similar to the PAgs) on non-protective protein sequences as negatives from UniProt. VPAgs-Dataset4ML contains 2,145 viral protein sequences, with 210 PAgs in positive.fasta and 1,935 non-protective protein sequences in negative.fasta. This dataset can be used to train ML models to predict antigens for various viral pathogens with the aim of developing effective vaccines.
      Citation: Data
      PubDate: 2023-02-17
      DOI: 10.3390/data8020041
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 42: Dataset of Public Objects in Uncontrolled
           Environment for Navigation Aiding

    • Authors: Teng-Lai Wong, Ka-Seng Chou, Kei-Long Wong, Su-Kit Tang
      First page: 42
      Abstract: Computer vision is a new approach to navigation aiding that assists visually impaired people to travel independently. A deep learning-based solution implemented on a portable device that uses a monocular camera to capture public objects could be a low-cost and handy navigation aid. By recognizing public objects in the street and estimating their distance from the user, visually impaired people are able to avoid obstacles in the outdoor environment and walk safely. In this paper, we created a dataset of public objects in an uncontrolled environment for navigation aiding. The dataset contains three classes of objects which commonly exist on pavements in the city. It was verified that the dataset was of high quality for object detection and distance estimation, and was ultimately utilized as a navigation aid solution.
      Citation: Data
      PubDate: 2023-02-20
      DOI: 10.3390/data8020042
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 43: Federated Learning for Data Analytics in Education

    • Authors: Christian Fachola, Agustín Tornaría, Paola Bermolen, Germán Capdehourat, Lorena Etcheverry, María Inés Fariello
      First page: 43
      Abstract: Federated learning techniques aim to train and build machine learning models based on distributed datasets across multiple devices while avoiding data leakage. The main idea is to perform training on remote devices or isolated data centers without transferring data to centralized repositories, thus mitigating privacy risks. Data analytics in education, in particular learning analytics, is a promising scenario to apply this approach to address the legal and ethical issues related to processing sensitive data. Indeed, given the nature of the data to be studied (personal data, educational outcomes, and data concerning minors), it is essential to ensure that the conduct of these studies and the publication of the results provide the necessary guarantees to protect the privacy of the individuals involved and the protection of their data. In addition, the application of quantitative techniques based on the exploitation of data on the use of educational platforms, student performance, use of devices, etc., can account for educational problems such as the determination of user profiles, personalized learning trajectories, or early dropout indicators and alerts, among others. This paper presents the application of federated learning techniques to a well-known learning analytics problem: student dropout prediction. The experiments allow us to conclude that the proposed solutions achieve comparable results from the performance point of view with the centralized versions, avoiding the concentration of all the data in a single place for training the models.
      Citation: Data
      PubDate: 2023-02-20
      DOI: 10.3390/data8020043
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 44: Deep Learning with Northern Australian Savanna
           Tree Species: A Novel Dataset

    • Authors: Andrew J. Jansen, Jaylen D. Nicholson, Andrew Esparon, Timothy Whiteside, Michael Welch, Matthew Tunstill, Harinandanan Paramjyothi, Varma Gadhiraju, Steve van Bodegraven, Renee E. Bartolo
      First page: 44
      Abstract: The classification of savanna woodland tree species from high-resolution Remotely Piloted Aircraft Systems (RPAS) imagery is a complex and challenging task. Difficulties for both traditional remote sensing algorithms and human observers arise due to low interspecies variability (species difficult to discriminate because they are morphologically similar), high intraspecies variability (individuals of the same species varying to the extent that they can be misclassified), and the loss of some taxonomic features commonly used for identification when observing trees from above. Deep neural networks are increasingly being used to overcome challenges in image recognition tasks. However, supervised deep learning algorithms require high-quality annotated and labelled training data that must be verified by subject matter experts. While training datasets for trees have been generated and made publicly available, they are mostly acquired in the Northern Hemisphere and lack species-level information. We present a training dataset of tropical Northern Australia savanna woodland tree species that was generated using RPAS and on-ground surveys to confirm species labels. RPAS-derived imagery was annotated, resulting in 2547 polygons representing 36 tree species. A baseline dataset was produced consisting of: (i) seven orthomosaics that were used for in-field labelling; (ii) a tiled dataset at 1024 × 1024 pixel size in Common Objects in Context (COCO) format that can be used for deep learning model training; and (iii) the annotations.
      Citation: Data
      PubDate: 2023-02-20
      DOI: 10.3390/data8020044
      Issue No: Vol. 8, No. 2 (2023)
       
  • Data, Vol. 8, Pages 14: UTMInDualSymFi: A Dual-Band Wi-Fi Dataset for
           Fingerprinting Positioning in Symmetric Indoor Environments

    • Authors: Asim Abdullah, Muhammad Haris, Omar Abdul Aziz, Rozeha A. Rashid, Ahmad Shahidan Abdullah
      First page: 14
      Abstract: Recent studies on indoor positioning using Wi-Fi fingerprinting are motivated by the ubiquity of Wi-Fi networks and their promising positioning accuracy. Machine learning algorithms are commonly leveraged in indoor positioning works. The performance of machine-learning-based solutions depends on the availability, volume, quality, and diversity of related data. Several public datasets have been published in order to foster advancements in Wi-Fi-based fingerprinting indoor positioning solutions. These datasets, however, lack dual-band Wi-Fi data within symmetric indoor environments. To fill this gap, this research work presents the UTMInDualSymFi dataset as a source of dual-band Wi-Fi data, acquired within multiple residential buildings with symmetric deployment of access points. UTMInDualSymFi comprises the recorded dual-band raw data, training and test datasets, radio maps and supporting metadata. Additionally, a statistical radio map construction algorithm is presented. Benchmark performance was evaluated by implementing a machine-learning-based positioning algorithm on the dataset. In general, higher accuracy was observed on the 5 GHz data scenarios. This systematically collected dataset enables the development and validation of future comprehensive solutions, inclusive of novel preprocessing, radio map construction, and positioning algorithms.
      Citation: Data
      PubDate: 2023-01-01
      DOI: 10.3390/data8010014
      Issue No: Vol. 8, No. 1 (2023)
       
  • Data, Vol. 8, Pages 15: Visual Lip Reading Dataset in Turkish

    • Authors: Ali Berkol, Talya Tümer-Sivri, Nergis Pervan-Akman, Melike Çolak, Hamit Erdem
      First page: 15
      Abstract: The promised dataset was obtained from daily Turkish words and phrases pronounced by various people in videos posted on YouTube. The purpose of compiling the dataset was to provide a method for the detection of the spoken word by recognizing patterns or classifying lip movements with supervised, unsupervised, and semi-supervised learning, and machine learning algorithms. Most of the datasets related to lip reading consist of people recorded on camera with fixed backgrounds and the same conditions, but the dataset presented here consists of images compatible with machine learning models developed for real-life challenges. It contains a total of 2335 instances taken from TV series, movies, vlogs, and song clips on YouTube. The images in the dataset vary due to factors such as the way people say words, accents, speaking rate, gender, and age. Furthermore, the instances in the dataset consist of videos with different angles, shadows, resolution, and brightness that are not created manually. The most important feature of our lip reading dataset is that we contribute to the non-synthetic Turkish dataset pool, which does not have wide dataset varieties. Machine learning studies can be carried out in many areas, such as education, security, and social life with this dataset.
      Citation: Data
      PubDate: 2023-01-05
      DOI: 10.3390/data8010015
      Issue No: Vol. 8, No. 1 (2023)
       
  • Data, Vol. 8, Pages 16: Traffic Sign Detection and Classification on the
           Austrian Highway Traffic Sign Data Set

    • Authors: Alexander Maletzky, Nikolaus Hofer, Stefan Thumfart, Karin Bruckmüller, Johannes Kasper
      First page: 16
      Abstract: Advanced Driver Assistance Systems rely on automated traffic sign recognition. Today, Deep Learning methods outperform other approaches in terms of accuracy and processing time; however, they require vast and well-curated data sets for training. In this paper, we present the Austrian Highway Traffic Sign Data Set (ATSD), a comprehensive annotated data set of images of almost all traffic signs on Austrian highways in 2014, and corresponding images of full traffic scenes they are contained in. Altogether, the data set consists of almost 7500 scene images with more than 28,000 detailed annotations of more than 100 distinct traffic sign classes. It covers diverse environments, ranging from urban to rural and mountainous areas, and includes many images recorded in tunnels. We further evaluate state-of-the-art traffic sign detectors and classifiers on ATSD to establish baselines for future experiments. The data set and our baseline models are freely available online.
      Citation: Data
      PubDate: 2023-01-09
      DOI: 10.3390/data8010016
      Issue No: Vol. 8, No. 1 (2023)
       
  • Data, Vol. 8, Pages 17: UIBVFED-Mask: A Dataset for Comparing Facial
           Expressions with and without Face Masks

    • Authors: Miquel Mascaró-Oliver, Ramon Mas-Sansó, Esperança Amengual-Alcover, Maria Francesca Roig-Maimó
      First page: 17
      Abstract: After the COVID-19 pandemic, the use of face masks has become a common practice in many situations. Partial occlusion of the face due to the use of masks poses new challenges for facial expression recognition because of the loss of significant facial information. Consequently, the identification and classification of facial expressions can be negatively affected, in particular when using neural networks. This paper presents a new dataset of virtual characters, with and without face masks, with identical geometric information and spatial location. This novelty should allow researchers to better characterize the information lost due to occlusion by the mask.
      Citation: Data
      PubDate: 2023-01-11
      DOI: 10.3390/data8010017
      Issue No: Vol. 8, No. 1 (2023)
       
  • Data, Vol. 8, Pages 18: Introducing UWF-ZeekData22: A Comprehensive
           Network Traffic Dataset Based on the MITRE ATT&CK Framework

    • Authors: Sikha S. Bagui, Dustin Mink, Subhash C. Bagui, Tirthankar Ghosh, Russel Plenkers, Tom McElroy, Stephan Dulaney, Sajida Shabanali
      First page: 18
      Abstract: With the rapid rate at which networking technologies are changing, there is a need to regularly update network activity datasets to accurately reflect the current state of network infrastructure/traffic. The uniqueness of this work is that it presents the first network dataset collected using Zeek and labelled using the MITRE ATT&CK framework. In addition to identifying attack traffic, the MITRE ATT&CK framework allows for the detection of adversary behavior leading to an attack. It can also be used to develop user profiles of groups intending to perform attacks. This paper also outlines how both the cyber range and Hadoop’s big data platform were used for creating this network traffic data repository. The data was collected using Security Onion in two formats: Zeek and PCAPs. Mission logs, which contained the MITRE ATT&CK data, were used to label the network attack data. The data was transferred daily from the Security Onion virtual machine running on a cyber range to the big-data platform, Hadoop’s distributed file system. This dataset, UWF-ZeekData22, is publicly available at datasets.uwf.edu.
      Citation: Data
      PubDate: 2023-01-11
      DOI: 10.3390/data8010018
      Issue No: Vol. 8, No. 1 (2023)
       
  • Data, Vol. 8, Pages 19: Airborne Spectral Reflectance Dataset of Submerged
           Plastic Targets in a Coastal Environment

    • Authors: Apostolos Papakonstantinou, Argyrios Moustakas, Polychronis Kolokoussis, Dimitris Papageorgiou, Robin de Vries, Konstantinos Topouzelis
      First page: 19
      Abstract: Among the emerging applications of remote sensing technologies, the remote detection of plastic litter has seen successful applications in recent years. However, while the number of studies and datasets for spectral characterization of plastic is growing, few studies address plastic litter submerged in natural seawater in an outdoor context. This study aims to investigate the feasibility of hyperspectral characterization of submerged plastic litter in less-than-ideal conditions. We present a hyperspectral dataset of eight different polymers in field conditions, taken by an unmanned aerial vehicle (UAV) on different days in a three-week period. The measurements were carried out off the coast of Mytilene, Greece. The team collected the dataset using a Bayspec OCI-F push broom sensor from 25 m and 40 m height above the water. For a contextual background, the dataset also contains optical (RGB) high-resolution orthomosaics.
      Citation: Data
      PubDate: 2023-01-11
      DOI: 10.3390/data8010019
      Issue No: Vol. 8, No. 1 (2023)
       
  • Data, Vol. 8, Pages 20: A Low-Resolution Used Electronic Parts Image
           Dataset for Sorting Application

    • Authors: Praneel Chand
      First page: 20
      Abstract: The accumulation of electronic waste (e-waste) is becoming a problem in society. Old parts and components are conveniently discarded instead of being recycled. Economic and environmental measures should be taken by individuals and organizations to enhance sustainability. This could include desoldering and reusing parts from electronic circuit boards. Hence, the dataset presented in this paper is intended for the classification of used electronic parts in linear voltage regulator power supply circuits. It comprises low-resolution (30 × 30 pixels) grayscale images of major reusable electronic parts from a typical adjustable regulated linear voltage power supply kitset. The three major reusable parts are capacitors, potentiometers, and voltage regulator ICs. These are typically the most expensive components. Data representing the parts are extracted from 960 × 720 pixel workspace images containing multiple parts. This permits the dataset to be used with multiple types of classifiers, such as lightweight shallow neural networks (SNNs), support vector machines (SVMs), or convolutional neural networks (CNNs). Classification accuracies of 93.5%, 94.9%, and 98.4% were achieved with SNNs, SVMs, and CNNs, respectively. Successful detection and classification of parts will permit a Niryo Ned robotic arm to pick and place parts in the desired locations. The dataset can be used by other academics and researchers working with the Niryo Ned robot and Matlab to handle electronic parts. It can be expanded to include relatively expensive components from other types of electronic circuit boards.
      Citation: Data
      PubDate: 2023-01-14
      DOI: 10.3390/data8010020
      Issue No: Vol. 8, No. 1 (2023)
       
  • Data, Vol. 8, Pages 21: Shapley Value as a Quality Control for Mass
           Spectra of Human Glioblastoma Tissues

    • Authors: Denis S. Zavorotnyuk, Anatoly A. Sorokin, Stanislav I. Pekov, Denis S. Bormotov, Vasiliy A. Eliferov, Konstantin V. Bocharov, Eugene N. Nikolaev, Igor A. Popov
      First page: 21
      Abstract: The automatic processing of high-dimensional mass spectrometry data is required for the clinical implementation of ambient ionization molecular profiling methods. However, complex algorithms required for the analysis of peak-rich spectra are sensitive to the quality of the input data. Therefore, an objective and quantitative indicator, insensitive to the conditions of the experiment, is currently in high demand for the automated treatment of mass spectrometric data. In this work, we demonstrate the utility of the Shapley value as an indicator of the quality of the individual mass spectrum in the classification task for human brain tumor tissue discrimination. The Shapley values are calculated on the training set of glioblastoma and nontumor pathological tissue spectra and used as feedback to create a random forest regression model to estimate the contributions for all spectra of each specimen. As a result, it is shown that the implementation of Shapley values significantly accelerates the analysis of negative mode mass spectrometry data while simultaneously improving the regression models’ accuracy.
      Citation: Data
      PubDate: 2023-01-16
      DOI: 10.3390/data8010021
      Issue No: Vol. 8, No. 1 (2023)
       
  • Data, Vol. 8, Pages 1: Gene Expression Datasets for Two Versions of the
           Saccharum spontaneum AP85-441 Genome

    • Authors: Nicolás López-Rozo, Mauricio Ramirez-Castrillon, Miguel Romero, Jorge Finke, Camilo Rocha
      First page: 1
      Abstract: Sugarcane is a species of tall grass with high biomass and sucrose production, and the world’s largest crop by production quantity. Its evolutionary environment adaptation and anthropogenic breeding response have resulted in a complex autopolyploid genome. Few efforts have been reported in the literature to document this organism’s gene co-expression and annotation, and, when available, use different gene identifiers that cannot be easily associated across studies. This data descriptor paper presents a dataset that consolidates expression matrices of two Saccharum spontaneum AP85-441 genome versions and an algorithm implemented in Python to mechanically obtain this dataset. The data are processed from the allele-level information of the two sources, with BLASTn used bidirectionally to suggest feasible mappings between the two sets of alleles, and a graph-matching optimization algorithm to maximize global identity and uniqueness of genes. Association tables are used to consolidate the expression values from alleles to genes. The contributed expression matrices comprise 96 experiments and 109,050 and 35,516 from the two genome versions. They can represent significant computational cost reduction for further research on, e.g., sugarcane co-expression network generation, functional annotation prediction, and stress-specific gene identification.
      Citation: Data
      PubDate: 2022-12-20
      DOI: 10.3390/data8010001
      Issue No: Vol. 8, No. 1 (2022)
       
  • Data, Vol. 8, Pages 2: Spectral Library of Maize Leaves under Nitrogen
           Deficiency Stress

    • Authors: Maria C. Torres-Madronero, Manuel Goez, Manuel A. Guzman, Tatiana Rondon, Pablo Carmona, Camilo Acevedo-Correa, Santiago Gomez-Ortega, Mariana Durango-Flórez, Smith V. López, July Galeano, Maria Casamitjana
      First page: 2
      Abstract: Maize crops occupy an important place in world food security. However, different conditions, such as abiotic stress factors, can affect the productivity of these crops, requiring technologies that facilitate their monitoring. One such technology is spectroscopy, which measures the energy reflected and emitted by a surface along the electromagnetic spectrum. Spectral data can help to identify abiotic factors in plants, since the spectral signature of vegetation has discriminating features associated with the plant’s health condition. This paper introduces a spectral library captured on maize crops under different nitrogen-deficiency stress levels. The datasets will be of potential interest to researchers, ecologists, and agronomists seeking to understand the spectral features of maize under nitrogen-deficiency stress. The library includes three datasets captured at different growth stages of 10 tropical maize genotypes. The spectral signatures collected were in the visible to near-infrared range (450–950 nm). The data were pre-processed to reduce noise and anomalous signatures. This study presents a spectral library of the effects of nitrogen deficiency on ten maize genotypes, highlighting that some genotypes show tolerance to this type of stress at different phenological stages. Most of the evaluated genotypes showed discriminate spectral features 4–6 weeks after sowing. Higher reflectance was obtained at approximately 550 nm for the lowest nitrogen fertilization treatments. Finally, we describe some potential applications of the spectral library of maize leaves under nitrogen-deficiency stress.
      Citation: Data
      PubDate: 2022-12-21
      DOI: 10.3390/data8010002
      Issue No: Vol. 8, No. 1 (2022)
       
  • Data, Vol. 8, Pages 3: Pilot Study of the Metabolomic Profile of an
           Athlete after Short-Term Physical Activity

    • Authors: Kristina A. Malsagova, Arthur T. Kopylov, Vasiliy I. Pustovoyt, Alexander A. Stepanov, Dmitry V. Enikeev, Natalia V. Potoldykova, Evgenii I. Balakin, Anna L. Kaysheva
      First page: 3
      Abstract: A comprehensive analysis of indicators of the state of the body between training and recovery allows a comprehensive evaluation of various aspects of health, athletic performance, and recovery. In this pilot study, an assessment of the metabolomic profile of athletes was performed, and the immunological reaction of the athlete’s body to food before exercise and 48 h after exercise was studied. As a result, 15 amino acids and 3 hormones were identified, the plasma levels of which differed between the training and recovery states. In addition, immunological reactions or hyperreactivity to food allergens were assessed using an enzyme immunoassay. It is likely that for the athletes in the study sample, 48 h is not enough time for the complete recovery of the body.
      Citation: Data
      PubDate: 2022-12-21
      DOI: 10.3390/data8010003
      Issue No: Vol. 8, No. 1 (2022)
       
  • Data, Vol. 8, Pages 4: LoRaWAN Path Loss Measurements in an Urban Scenario
           including Environmental Effects

    • Authors: Mauricio González-Palacio, Diana Tobón-Vallejo, Lina M. Sepúlveda-Cano, Santiago Rúa, Giovanni Pau, Long Bao Le
      First page: 4
      Abstract: LoRaWAN is a widespread protocol by which Internet of Things end nodes (ENs) can exchange information over long distances via their gateways. To deploy the ENs, it is mandatory to perform a link budget analysis, which allows for determining adequate radio parameters like path loss (PL). Thus, designers use PL models developed based on theoretical approaches or empirical data. Some previous measurement campaigns have been performed to characterize this phenomenon, primarily based on distance and frequency. However, previous works have shown that weather variations also impact PL, so using the conventional approaches and available datasets without capturing important environmental effects can lead to inaccurate predictions. Therefore, this paper delivers a data descriptor that includes a set of LoRaWAN measurements performed in Medellín, Colombia, including PL, distance, frequency, temperature, relative humidity, barometric pressure, particulate matter, and energy, among other things. This dataset can be used by designers who need to fit highly accurate PL models. As an example of the dataset usage, we provide some model fittings, including log-distance and multiple linear regression models with environmental effects. This analysis shows that including such variables improves path loss predictions, yielding an RMSE of 1.84 dB and an R² of 0.917.
      Citation: Data
      PubDate: 2022-12-22
      DOI: 10.3390/data8010004
      Issue No: Vol. 8, No. 1 (2022)
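
      Note: The log-distance fitting mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common form PL(d) = PL0 + 10·n·log10(d/d0) and uses synthetic, noise-free numbers; the function names, the reference distance, and the sample values are illustrative assumptions and are not taken from the dataset or from the authors' code.

```python
import numpy as np

def fit_log_distance(d_m, pl_db, d0_m=1.0):
    """Least-squares fit of PL(d) = PL0 + 10*n*log10(d/d0).

    Returns (PL0 in dB, path loss exponent n).
    """
    x = 10.0 * np.log10(np.asarray(d_m, dtype=float) / d0_m)
    # Design matrix: a column of ones (intercept PL0) and the log-distance term.
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, np.asarray(pl_db, dtype=float), rcond=None)
    return float(coef[0]), float(coef[1])

def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted path loss."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Synthetic example (assumed values, not measurements from the paper).
d = np.array([10.0, 100.0, 1000.0])
pl = 40.0 + 10.0 * 2.7 * np.log10(d)
pl0, n = fit_log_distance(d, pl)
pred = pl0 + 10.0 * n * np.log10(d)
```

      A multiple linear regression with environmental effects could follow the same pattern by appending further columns (for example temperature or relative humidity) to the design matrix.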
       
  • Data, Vol. 8, Pages 5: Thermal Data of Perfluorinated Carboxylic Acid
           Functionalized Aluminum Nanoparticles

    • Authors: Nathan J. Weeks, Bradley Martin, Enrique Gazmin, Scott T. Iacono
      First page: 5
      Abstract: Improving the performance of composite energetic materials composed of a solid metal fuel and a source of oxidizer (known as thermites) has long been pursued for applications such as pyrolant flares and rocket propellants. The performance of thermites involving aluminum as the fuel can be dramatically improved by utilizing nanometer-sized aluminum particles (nAl), leading to vastly higher reaction velocities owing to the high surface area of nAl. Despite the benefits of the increased surface area, there are still several problems inherent to nanoscale reactants, including particle aggregation and higher-viscosity composite materials. The higher viscosity of nAl composites is cumbersome for processing with inert polymer binder formulations, especially at the high mass loadings of metal fuel necessary for industry standards. In order to improve the viscosity of high mass loaded nAl energetics, the surface of the nAl was passivated with covalently bound monolayers of perfluorinated carboxylic acids (PFCAs) utilizing a novel fluorinated solvent washing technique. This work also details the quantitative binding of these monolayers using infrared spectroscopy, in addition to the energetic output from calorimetric and thermogravimetric analysis.
      Citation: Data
      PubDate: 2022-12-23
      DOI: 10.3390/data8010005
      Issue No: Vol. 8, No. 1 (2022)
       
  • Data, Vol. 8, Pages 6: A Large-Scale Dataset of Conservation and Deep
           Tillage in Mollisols, Northeast Plain, China

    • Authors: Fahui Jiang, Shangshu Huang, Yan Wu, Mahbub Ul Islam, Fangjin Dong, Zhen Cao, Guohui Chen, Yuming Guo
      First page: 6
      Abstract: One of the primary challenges of our time is to feed a growing and more demanding world population with degraded soil environments under more variable and extreme climate conditions. Conservation tillage (CS) and deep tillage (DT) have received strong international support to help address these challenges but are less used in major global food production in China. Hence, we conducted a large-scale literature search of English and Chinese publications to synthesize the current scientific evidence and evaluate the effects of CS and DT on soil protection and yield maintenance in the Northeast China Plain, which has the most fertile black soil (Mollisols) and is the main agricultural production area of China. We found that CS had higher soil bulk density, stronger soil penetration resistance, greater water contents, and lower soil temperature, and was well-suited for dry and wind erosion-sensitive regions, i.e., the southwest areas of the Northeast. Conversely, DT performed better in the middle belt of the Northeast China Plain, which has lower soil temperatures and humid areas. Finally, we created an original dataset from papers [dataset 1, including soil physicochemical parameters, such as soil water, bulk density, organic carbon, sand, silt, clay, pH, total and available nitrogen (N), phosphorus (P), and potassium (K), etc., on crop biomass and yield] by collecting data directly from publications, and two predicted datasets (dataset 2 and dataset 3) of crop yield changes by developing random forest models based on our data.
      Citation: Data
      PubDate: 2022-12-24
      DOI: 10.3390/data8010006
      Issue No: Vol. 8, No. 1 (2022)
       
  • Data, Vol. 8, Pages 7: Numerical and Experimental Data of the
           Implementation of Logic Gates in an Erbium-Doped Fiber Laser (EDFL)

    • Authors: Samuel Mardoqueo Afanador Delgado, José Luis Echenausía Monroy, Guillermo Huerta Cuellar, Juan Hugo García López, Rider Jaimes Reátegui
      First page: 7
      Abstract: In this article, the methods for obtaining time series from an erbium-doped fiber laser (EDFL) and its numerical simulation are described. In addition, the nature of the obtained files, the meaning of the changing file names, and the ways of accessing these files are described in detail. The response of the laser emission is controlled by the intensity of a digital signal added to the modulation, which allows for various logical operations. The numerical results are in good agreement with experimental observations. The authors provide all of the time series from an experimental implementation where various logic gates are obtained.
      Citation: Data
      PubDate: 2022-12-26
      DOI: 10.3390/data8010007
      Issue No: Vol. 8, No. 1 (2022)
       
  • Data, Vol. 8, Pages 8: Aggregation of Multimodal ICE-MS Data into Joint
           Classifier Increases Quality of Brain Cancer Tissue Classification

    • Authors: Anatoly A. Sorokin, Denis S. Bormotov, Denis S. Zavorotnyuk, Vasily A. Eliferov, Konstantin V. Bocharov, Stanislav I. Pekov, Evgeny N. Nikolaev, Igor A. Popov
      First page: 8
      Abstract: Mass spectrometry fingerprinting combined with multidimensional data analysis has been proposed in surgery to determine if a biopsy sample is a tumor. In the specific case of brain tumors, it is complicated to obtain control samples, leading to model overfitting due to unbalanced sample cohorts. Usually, classifiers are trained using a single measurement regime, most notably single ion polarity, but mass range and spectral resolution could also be varied. It is known that lipid groups differ significantly in their ability to produce positive or negative ions; hence, using only one polarity significantly restricts the chemical space available for sample discrimination purposes. In this work, we have developed an approach employing mass spectrometry data obtained under eight different measurement regimes simultaneously. Regime-specific classifiers are trained, then mixture-of-experts techniques based on voting or mean probability are used to aggregate the predictions of all trained classifiers and assign a class to the whole sample. The aggregated classifiers have shown a much better performance than any of the single-regime classifiers and help significantly reduce the effect of an unbalanced dataset without any augmentation.
      Citation: Data
      PubDate: 2022-12-27
      DOI: 10.3390/data8010008
      Issue No: Vol. 8, No. 1 (2022)
       
  • Data, Vol. 8, Pages 9: PERSIST: A Multimodal Dataset for the Prediction of
           Perceived Exertion during Resistance Training

    • Authors: Justin Amadeus Albert, Arne Herdick, Clemens Markus Brahms, Urs Granacher, Bert Arnrich
      First page: 9
      Abstract: Measuring and adjusting the training load is essential in resistance training, as training overload can increase the risk of injuries. At the same time, too little load does not deliver the desired training effects. Usually, external load is quantified using objective measurements, such as lifted weight distributed across sets and repetitions per exercise. Internal training load is usually assessed using questionnaires or ratings of perceived exertion (RPE). A standard RPE scale is the Borg scale, which ranges from 6 (no exertion) to 20 (the highest exertion ever experienced). Researchers have investigated predicting RPE for different sports using sensor modalities and machine learning methods, such as Support Vector Regression or Random Forests. This paper presents PERSIST, a novel dataset for predicting PERceived exertion during reSIStance Training. We recorded multiple sensor modalities simultaneously, including inertial measurement units (IMU), electrocardiography (ECG), and motion capture (MoCap). The MoCap data has been synchronized to the IMU and ECG data. We also provide heart rate variability (HRV) parameters obtained from the ECG signal. Our dataset contains data from twelve young and healthy male participants with at least one year of resistance training experience. Subjects performed twelve sets of squats on a Flywheel platform with twelve repetitions per set. After each set, subjects reported their current RPE. We chose the squat exercise as it involves the largest muscle group. This paper demonstrates how to access the dataset. We further present an exploratory data analysis and show how researchers can use IMU and ECG data to predict perceived exertion.
      Citation: Data
      PubDate: 2022-12-28
      DOI: 10.3390/data8010009
      Issue No: Vol. 8, No. 1 (2022)
       
  • Data, Vol. 8, Pages 10: Antimicrobial Susceptibility Data for Six Lactic
           Acid Bacteria Tested against Fifteen Antimicrobials

    • Authors: Ivana Nikodinoska, Jouni Heikkinen, Colm A. Moran
      First page: 10
      Abstract: Antimicrobial resistance is a rising threat in the agrifood sector. The misuse of antibiotics exerts selective pressure, driving resistance mechanisms in bacteria, which could ultimately spread through many routes and render treatments for infectious diseases inefficient in humans and animals. Herein, we report antimicrobial susceptibility data obtained for six lactic acid bacteria, the members of which are commonly used in the food and feed chain. Fifteen antimicrobials were considered for the phenotypic testing: ampicillin, gentamicin, kanamycin, tetracycline, erythromycin, clindamycin, chloramphenicol, streptomycin, vancomycin, quinupristin-dalfopristin, bacitracin, sulfamethoxazole, ciprofloxacin, linezolid, and rifampicin. The reported dataset could be used for the comparison, generation, and reconsideration of new and/or existing cut-off values when considering lactic acid bacteria, particularly lactobacilli and pediococci.
      Citation: Data
      PubDate: 2022-12-29
      DOI: 10.3390/data8010010
      Issue No: Vol. 8, No. 1 (2022)
       
  • Data, Vol. 8, Pages 11: Natural Language Processing to Extract Information
           from Portuguese-Language Medical Records

    • Authors: Naila Camila da Rocha, Abner Macola Pacheco Barbosa, Yaron Oliveira Schnr, Juliana Machado-Rugolo, Luis Gustavo Modelli de Andrade, José Eduardo Corrente, Liciana Vaz de Arruda Silveira
      First page: 11
      Abstract: Studies that use medical records are often impeded because the information is presented in narrative fields. However, recent studies have used artificial intelligence to extract and process secondary health data from electronic medical records. The aim of this study was to develop a neural network that uses data from unstructured medical records to capture information regarding symptoms, diagnoses, medications, conditions, exams, and treatment. Data from 30,000 medical records of patients hospitalized in the Clinical Hospital of the Botucatu Medical School (HCFMB), São Paulo, Brazil, were obtained, creating a corpus with 1200 clinical texts. A natural language algorithm for text extraction and convolutional neural networks for pattern recognition were used to evaluate the model with goodness-of-fit indices. The results showed good accuracy, considering the complexity of the model, with an F-score of 63.9% and a precision of 72.7%. The patient condition class reached a precision of 90.3% and the medication class reached 87.5%. The proposed neural network will facilitate the detection of relationships between diseases and symptoms, as well as prevalence and incidence, in addition to the identification of clinical conditions, disease evolution, and the effects of prescribed medications.
      Citation: Data
      PubDate: 2022-12-29
      DOI: 10.3390/data8010011
      Issue No: Vol. 8, No. 1 (2022)
       
  • Data, Vol. 8, Pages 12: Lying-People Pressure-Map Datasets: A Systematic
           Review

    • Authors: Luís Fonseca, Fernando Ribeiro, José Metrôlho
      First page: 12
      Abstract: Bedded or lying-people pressure-map datasets can be used to identify patients’ in-bed postures and can be very useful in numerous healthcare applications. However, the construction of these datasets is not always easy, and many researchers often resort to existing datasets to carry out their experiments and validate their solutions. This systematic review aimed to identify and characterise pressure-map datasets on lying-people- or bedded-people positions. We used a systematic approach to select nine studies, which were thoroughly reviewed and summarised considering methods of data collection, fields considered in the datasets, and results or their uses after collection. As a result of the review, six research questions were answered that allowed a characterisation of existing datasets regarding the types of data included, number and types of poses considered, participant characteristics and size of the dataset, and information on how the datasets were built. This study might represent an important basis for academics and researchers to understand the information collected in each pressure-map dataset, the possible uses of such datasets, or methods to build new datasets.
      Citation: Data
      PubDate: 2022-12-30
      DOI: 10.3390/data8010012
      Issue No: Vol. 8, No. 1 (2022)
       
  • Data, Vol. 8, Pages 13: A Consistent Land Cover Map Time Series at 2 m
           Spatial Resolution—The LifeWatch 2006-2015-2018-2019 Dataset for
           Wallonia

    • Authors: Julien Radoux, Axel Bourdouxhe, Thomas Coppée, Mathilde De Vroey, Marc Dufrêne, Pierre Defourny
      First page: 13
      Abstract: Ecosystem accounting is based on the definition of the extent and the status of an ecosystem. Land cover map extents are representative of several ecosystems and can therefore be used to support ecosystem accounting if reliable change information is available. The dataset described in this paper aims to provide land cover information (13 classes) for biodiversity monitoring, which has driven two key features. On the one hand, open areas were described in more detail (5 classes) than in the other maps available in the study area in order to increase their relevance for biodiversity models. On the other hand, monitoring means that the time series must consist of comparable layers. The time series integrates information from existing high-quality land cover maps that are not fully comparable, as well as thematic products (crop type, road network and forest type) and remote sensing data (25 cm orthophotos, 0.8 pts/m² LIDAR and Sentinel-1&2 data). Because of the high spatial resolution of the data and the fragmented landscape, boundary errors could cause a large proportion of false change detection if the maps are classified independently. Buildings and forests were therefore consolidated across time in order to build a time series where these changes can be trusted. Based on an independent validation, the overall accuracy was 93.1%, 92.6%, 94.8% and 93.9% +/− 1.3% for the years 2006, 2015, 2018 and 2019, respectively. The specific assessment of forest patch change highlighted a 98% +/− 2.7% user accuracy across the 4 years and 85% of forest cut detection. This time series will be completed and further consolidated with other dates using the same protocol and legend.
      Citation: Data
      PubDate: 2022-12-31
      DOI: 10.3390/data8010013
      Issue No: Vol. 8, No. 1 (2022)
       
  • Data, Vol. 7, Pages 168: Spectrogram Data Set for Deep-Learning-Based RF
           Frame Detection

    • Authors: Jakob Wicht, Ulf Wetzker, Vineeta Jain
      First page: 168
      Abstract: Automated spectrum analysis serves as a troubleshooting tool that helps to diagnose faults in wireless networks such as difficult signal propagation conditions and coexisting wireless networks. It provides a higher monitoring coverage while requiring less expertise compared with manual spectrum analysis. In this paper, we introduce a data set that can be used to train and evaluate deep learning models, capable of detecting frames from different wireless standards as well as interference between single frames. Since manually labeling a high variety of frames in different environments is too challenging, an artificial data generation pipeline was developed. The data set consists of 20,000 augmented signal segments, each containing a random number of different Wi-Fi and Bluetooth frames, their spectral image representations and labels that describe the position and type of frame within the spectrogram. The data set contains results of intermediate processing steps that enable the research or teaching community to create new data sets for specific requirements or to provide new interesting examination examples.
      Citation: Data
      PubDate: 2022-11-23
      DOI: 10.3390/data7120168
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 169: A Waveform Dataset in Continuous Mode of the
           Montefeltro Seismic Network (MF) in Central-Northern Italy from 2018 to
           2020

    • Authors: Antonella Megna, Giovanni Battista Cimini, Alessandro Marchetti, Nicola Mauro Pagliuca, Stefano Santini
      First page: 169
      Abstract: The Montefeltro seismic network (FDSN Network code: 1S) was deployed in the Apennines area of northern Marche and southern Emilia-Romagna regions (central Italy). A temporary network was set up in December 2018 and continues to operate, with an array consisting of stations equipped with dynamic digitizers and three-component short/extended/broad band seismometers (Guralp CMG/20s and 30s, Lennartz 3D/5s, Sara SS20 3D/0.5s sensors). The network records in continuous mode at 100 sps. The data are used to analyze the seismic activity and the spatiotemporal evolution of small seismic sequences occurring in the considered area and surrounding zones, strongly clustered in time and space. The dataset files are in miniSEED format and are organized in the following tree: (1) the dataset is divided by years; (2) each year is then subdivided by stations; (3) finally, within every station folder, the data are divided by days of the year.
      Citation: Data
      PubDate: 2022-11-26
      DOI: 10.3390/data7120169
      Issue No: Vol. 7, No. 12 (2022)
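
      Note: The year / station / day tree described in the abstract above can be navigated with a small helper such as the sketch below. The abstract only states the three levels of the hierarchy; the root folder name, the station code, and the zero-padded day-of-year folder naming used here are assumptions made for illustration.

```python
from datetime import date
from pathlib import Path

def day_folder(root, station, day):
    """Build root/<year>/<station>/<day-of-year> for a given calendar date.

    The three-digit day-of-year folder name is an assumed convention.
    """
    doy = day.timetuple().tm_yday  # 1 for 1 January, 32 for 1 February, ...
    return Path(root) / str(day.year) / station / f"{doy:03d}"

# "MF_dataset" and "STA01" are hypothetical names, not taken from the dataset.
p = day_folder("MF_dataset", "STA01", date(2019, 2, 1))
```

      The actual folder and file names should be checked against the dataset documentation before relying on this layout.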
       
  • Data, Vol. 7, Pages 170: Identifying and Classifying Urban Data Sources
           for Machine Learning-Based Sustainable Urban Planning and Decision Support
           Systems Development

    • Authors: Stéphane C. K. Tékouabou, Jérôme Chenal, Rida Azmi, Hamza Toulni, El Bachir Diop, Anastasija Nikiforova
      First page: 170
      Abstract: With the increase in the amount and variety of data that are constantly produced, collected, and exchanged between systems, the efficiency and accuracy of solutions/services that use data as input may suffer if an inappropriate or inaccurate technique, method, or tool is chosen to deal with them. This paper presents a global overview of urban data sources and structures used to train machine learning (ML) algorithms integrated into urban planning decision support systems (DSS). It contributes to a common understanding of choosing the right urban data for a given urban planning issue, i.e., their type, source and structure, for more efficient use in training ML models. For the purpose of this study, we conduct a systematic literature review (SLR) of all relevant peer-reviewed studies available in the Scopus database. More precisely, 248 papers were found to be relevant with their further analysis using a text-mining approach to determine (a) the main urban data sources used for ML modeling, (b) the most popular approaches used in relevant urban planning and urban problem-solving studies and their relationship to the type of data source used, and (c) the problems commonly encountered in their use. After classifying them, we identified the strengths and weaknesses of data sources depending on several predefined factors. We found that the data mainly come from two main categories of sources, namely (1) sensors and (2) statistical surveys, including social network data. They can be classified as (a) opportunistic or (b) non-opportunistic depending on the process of data acquisition, collection, and storage. Data sources are closely correlated with their structure and potential urban planning issues to be addressed. 
      Almost all urban data have an indexed structure and, in particular, either attribute tables for statistical survey data and data from simple sensors (e.g., climate and pollution sensors) or vectors, mostly obtained from satellite images after large-scale spatio-temporal analysis. The paper also provides a discussion of the potential opportunities, emerging issues, and challenges that urban data sources face and should overcome to better catalyze intelligent/smart planning. This should contribute to the general understanding of the data, their sources and the challenges to be faced and overcome by those seeking data and integrating them into smart applications and urban-planning processes.
      Citation: Data
      PubDate: 2022-11-28
      DOI: 10.3390/data7120170
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 171: Experimental and Nonlinear Finite Element
           Analysis Data for an Innovative Buckling Restrained Bracing System to
           Rehabilitate Seismically Deficient Structures

    • Authors: Abdul Saboor Karzad, Zaid A. Al-Sadoon, Abdullah Sagheer, Mohammad AlHamaydeh
      First page: 171
      Abstract: This article presents experimental data and nonlinear finite element analysis (NLFEA) modeling for an innovative buckling restrained bracing (BRB) system. The data were collected from qualification testing of the introduced BRBs per the AISC 341 test provision and from finite element modeling. The BRB is made of three parts: core bar, restraining unit, and end units, in which duplicates of three different core bar cross sections (i.e., fully threaded, threaded notched, and smooth shaved) were tested. The BRBs introduced in this research come with innovative end parts, so-called fingers. These fingers provide the longitudinal gap required in every BRB system and simultaneously prevent buckling of the core bar at the end regions at both ends of the BRB sample, thus facilitating an easy core replacement if it gets damaged in the event of an earthquake. The measured parameters were the applied cyclic load and the corresponding displacement. Analysis of the acquired data illustrated an almost symmetric hysteretic behavior with a slightly higher capacity under compression but a noticeable overall ductility of 4. Moreover, finite element modeling data for one type of core bar (fully threaded) were curated. The data presented in this paper will be valuable for fabricating BRBs in practice and further research on the topic considered.
      Citation: Data
      PubDate: 2022-11-28
      DOI: 10.3390/data7120171
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 172: Data from Zimbabwean College Students on the
           Measurement Invariance of the Entrepreneurship Goal and Implementation
           Intentions Scales

    • Authors: Takawira Munyaradzi Ndofirepi
      First page: 172
      Abstract: This article analyses primary data on the entrepreneurship intentions of selected Zimbabwean college students. The goal of this study was to examine the measurement invariance of the entrepreneurship goal and implementation intention scales across gender groups in a higher education setting. Entrepreneurship goal intentions (EGI) and entrepreneurship implementation intentions (EII) are examined as separate but related constructs. To address the research goal, a positivist philosophy and quantitative research approach were used. A cross-sectional survey was used to collect data from a convenience sample of 262 college students in Zimbabwe. A researcher-administered questionnaire, written in English, was distributed to the respondents and collected after completion. Multi-group confirmatory analysis was performed on the dataset using JASP computer software. The results obtained confirmed all four levels of measurement invariance, namely configural, metric, scalar, and strict invariance. The pattern of the results validates the consistency of the measurement properties of the entrepreneurial intention instruments designed in developed countries across different contexts of use. Researchers, entrepreneurship educators, and policymakers in Zimbabwe can use the results of this analysis to quantify potential entrepreneurs among young adults and to come up with intervention measures to support future entrepreneurship.
      Citation: Data
      PubDate: 2022-11-29
      DOI: 10.3390/data7120172
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 173: Digital Twins: A Systematic Literature Review
           Based on Data Analysis and Topic Modeling

    • Authors: Kuzma Kukushkin, Yury Ryabov, Alexey Borovkov
      First page: 173
      Abstract: The digital twin has recently become a popular topic in research related to manufacturing, such as Industry 4.0, the industrial internet of things, and cyber-physical systems. In addition, digital twins are the focus of several research areas: construction, urban management, digital transformation of the economy, medicine, virtual reality, software testing, and others. The concept is not yet fully defined, its scope seems unlimited, and the topic is relatively new; all this can present a barrier to research. The main goal of this paper is to develop a proper methodology for visualizing the digital-twin science landscape using modern bibliometric tools, text-mining and topic-modeling, based on machine learning models—Latent Dirichlet Allocation (LDA) and BERTopic (Bidirectional Encoder Representations from Transformers). The scope of the study includes 8693 publications on the topic selected from the Scopus database, published between January 1993 and September 2022. Keyword co-occurrence analysis and topic-modeling indicate that studies on digital twins are still in the early stage of development. At the same time, the core of the topic is growing, and some topic clusters are emerging. More than 100 topics can be identified; the most popular and fastest-growing topic is ‘digital twins of industrial robots, production lines and objects.’ Further efforts are needed to verify the proposed methodology, which can be achieved by analyzing other research fields.
      Citation: Data
      PubDate: 2022-11-30
      DOI: 10.3390/data7120173
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 174: Determination of Soil Behavior during Evaporation
           Using Geotechnical Datasets

    • Authors: Jared Suchan, Shahid Azam
      First page: 174
      Abstract: Evaporation from soils is critical for agricultural water management. This requires a clear understanding of the water retention and shrinkage behavior of soils during water escape and due to fertilizer usage. Based on laboratory testing, this paper provides a comprehensive dataset generated for the determination of the geotechnical properties of inert silty sand and active lean clay using distilled water and saline pore fluid under ambient conditions. The tests include fluid-independent general soil properties, fluid-dependent specific soil properties, low-demand evaporation as a baseline, and high-demand evaporation to capture summer conditions.
      Citation: Data
      PubDate: 2022-12-06
      DOI: 10.3390/data7120174
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 175: Convolutional-Based Encoder–Decoder Network
           for Time Series Anomaly Detection during the Milling of 16MnCr5

    • Authors: Tobias Schlagenhauf, Jan Wolf, Alexander Puchta
      First page: 175
      Abstract: Machine learning methods have been widely applied to detect anomalies in machine and cutting tool behavior during turning or milling. However, detecting anomalies in the workpiece itself has not received the same attention from researchers. In this article, the authors present a publicly available multivariate time series dataset which was recorded during the milling of 16MnCr5. Due to artificially introduced, realistic anomalies in the workpiece, the dataset can be applied for anomaly detection. By using a convolutional autoencoder as a first model, good results in detecting the location of the anomalies in the workpiece were achieved. Furthermore, milling tools with two different diameters were used, which led to a dataset eligible for transfer learning. The objective of this article is to provide researchers with a real-world time series dataset of the milling process which is suitable for modern machine learning research topics such as anomaly detection and transfer learning.
      Citation: Data
      PubDate: 2022-12-06
      DOI: 10.3390/data7120175
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 176: Semantic Representation of the Intersection of
           Criminal Law & Civil Tort

    • Authors: Alexandros Z. Spyropoulos, Angelos Kornilakis, Georgios C. Makris, Charalampos Bratsas, Vassilis Tsiantos, Ioannis Antoniou
      First page: 176
      Abstract: The more complex and globalized social structures become, the greater the need for new ways of exchanging information and knowledge. Legal science is a field that needs to be codified to allow interoperability between people and states, as well as between humans and machines. The objective of this work is to develop an ontology in order to describe two different pillars of codified law (civil and criminal) and be able to depict the interaction between them. To this end, we examine the Greek Criminal Law as depicted in the Greek Penal Code (ΠΚ) and the way its articles can be analyzed. Then we examine Tort as described in the Greek Civil Code (ΑΚ) and link the two codifications through the concepts of illegality and damage, both being prerequisites of tortious liability. Following that, a legal ontology is created in the OWL semantic language using the Protégé application. Finally, four articles of the Penal Code are codified in the ontology, and a presentation of their relation to the civil tort is requested from a reasoning algorithm.
      Citation: Data
      PubDate: 2022-12-09
      DOI: 10.3390/data7120176
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 177: A Blockchain-Based Regulatory Framework for
           mHealth

    • Authors: Dounia Marbouh, Mecit Can Emre Simsekler, Khaled Salah, Raja Jayaraman, Samer Ellahham
      First page: 177
      Abstract: Mobile health (mHealth) is playing a key role in facilitating health services for patients. Such services may include remote diagnostics and monitoring, chronic conditions management, preventive medicine, and health promotion. While mHealth has gained significant traction during the COVID-19 pandemic, it may pose safety risks to patients. This entails regulations and monitoring of shared data and management of potential safety risks of all mHealth applications continuously and systematically. In this study, we propose a blockchain-based framework for regulating mHealth apps and governing their safe use. We systematically identify the needs, stakeholders, and requirements of the current mHealth practices and regulations that may benefit from blockchain features. Further, we exemplify our framework on a diabetes mHealth app that supports safety risk assessment and incident reporting functions. Blockchain technology can offer a solution to achieve this goal by providing improved security, transparency, accountability, and traceability of data among stakeholders. Blockchain has the potential to alleviate existing mHealth problems related to data centralization, poor data quality, lack of trust, and the absence of robust governance. In the paper, we present a discussion on the security aspects of our proposed blockchain-based framework, including limitations and challenges.
      Citation: Data
      PubDate: 2022-12-11
      DOI: 10.3390/data7120177
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 178: Impacts of Data Synthesis: A Metric for
           Quantifiable Data Standards and Performances

    • Authors: Gunjan Chandra, Pekka Siirtola, Satu Tamminen, Mikael J. Knip, Riitta Veijola, Juha Röning
      First page: 178
      Abstract: Clinical data analysis could lead to breakthroughs. However, clinical data contain sensitive information about participants that could be utilized for unethical activities, such as blackmailing, identity theft, mass surveillance, or social engineering. Data anonymization is a standard step during data collection, before sharing, to overcome the risk of disclosure. However, conventional data anonymization techniques are not foolproof and also hinder the opportunity for personalized evaluations. Much research has been done for synthetic data generation using generative adversarial networks and many other machine learning methods; however, these methods are either not free to use or are limited in capacity. This study evaluates the performance of an emerging tool named synthpop, an R package producing synthetic data as an alternative approach for data anonymization. This paper establishes data standards derived from the original data set based on the utilities and quality of information and measures variations in the synthetic data set to evaluate the performance of the data synthesis process. The methods to assess the utility of the synthetic data set can be broadly divided into two approaches: general utility and specific utility. General utility assesses whether synthetic data have overall similarities in the statistical properties and multivariate relationships with the original data set. Simultaneously, the specific utility assesses the similarity of a fitted model’s performance on the synthetic data to its performance on the original data. The quality of information is assessed by comparing variations in entropy bits and mutual information to response variables within the original and synthetic data sets. 
The study reveals that synthetic data succeeded at all utility tests with a statistically non-significant difference and not only preserved the utilities but also preserved the complexity of the original data set according to the data standard established in this study. Therefore, synthpop fulfills all the necessities and unfolds a wide range of opportunities for the research community, including easy data sharing and information protection.
      Citation: Data
      PubDate: 2022-12-11
      DOI: 10.3390/data7120178
      Issue No: Vol. 7, No. 12 (2022)
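
The information-quality comparison described in the abstract above (entropy in bits and mutual information with a response variable, measured on the original and the synthetic data) can be illustrated with a minimal sketch. The column values below are hypothetical toy data, the helper names are invented for illustration, and this is not the authors' code or the synthpop API; it only shows the standard definitions for discrete variables.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information_bits(xs, ys):
    """Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    joint = list(zip(xs, ys))
    return entropy_bits(xs) + entropy_bits(ys) - entropy_bits(joint)


# Hypothetical toy columns: a feature and a response variable,
# once from an "original" table and once from a "synthetic" one.
original_feature = ["a", "a", "b", "b", "a", "b"]
original_response = [0, 0, 1, 1, 0, 1]
synthetic_feature = ["a", "b", "b", "a", "a", "b"]
synthetic_response = [0, 1, 1, 0, 1, 1]

for name, feature, response in [
    ("original", original_feature, original_response),
    ("synthetic", synthetic_feature, synthetic_response),
]:
    print(name,
          "H(feature) =", entropy_bits(feature),
          "I(feature; response) =", mutual_information_bits(feature, response))
```

Small differences between the two printed rows would suggest that the synthetic table keeps a similar amount of information about the response; continuous variables would need binning or a different estimator first.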
       
  • Data, Vol. 7, Pages 179: Two- and Three-Dimensional Benchmarks for
           Particle Detection from an Industrial Rotary Kiln Combustion Chamber Based
           on Light-Field-Camera Recording

    • Authors: Markus Vogelbacher, Miao Zhang, Krasimir Aleksandrov, Hans-Joachim Gehrmann, Jörg Matthes
      First page: 179
      Abstract: This paper describes a benchmark dataset for the detection of fuel particles in 2D and 3D image data in a rotary kiln combustion chamber. The specific challenges of detecting the small particles under demanding environmental conditions allow the performance of existing and new particle detection techniques to be evaluated. The data set includes a classification of burning and non-burning particles, which can be in the air but also on the rotary kiln wall. The light-field camera used for data generation offers the potential to develop and objectively evaluate new advanced particle detection methods due to the additional 3D information. Besides explanations of the data set and the contained ground truth, an evaluation procedure of the particle detection based on the ground truth and results for our own particle detection procedure on the data set are presented.
      Citation: Data
      PubDate: 2022-12-13
      DOI: 10.3390/data7120179
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 180: Data for Distribution of Vascular Plants
           (Tracheophytes) of Urban Forests and Floodplains in Tyumen City (Western
           Siberia)

    • Authors: Anatoliy A. Khapugin, Igor V. Kuzmin
      First page: 180
      Abstract: Tyumen City is a large city in Western Siberia. This territory has ecological problems that are typical of many cities around the world, including the loss of biodiversity and environment, habitat pollution, and others. This data paper presents for the first time the plant species composition of 11 natural forest and floodplain areas in Tyumen City. In a city, forests provide a refuge for both threatened plants and weeds (including alien species). In these ecosystems, unique communities are being formed, where both threatened and alien plants can co-occur. Within the city’s area, forests serve as separate green “islands” among urbanized landscapes. A total of 11 forest and floodplain areas have been studied based on field surveys conducted by the authors of the paper in 2020–2022. The obtained data (8742 observations representing 434 species, accepted subspecies, and hybrids belonging to 270 genera and 74 families) serve as a basis for the modern flora of Tyumen City, its conservation, and counteraction to the introduction of alien plants.
      Citation: Data
      PubDate: 2022-12-14
      DOI: 10.3390/data7120180
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 181: DriverSVT: Smartphone-Measured Vehicle Telemetry
           Data for Driver State Identification

    • Authors: Walaa Othman, Alexey Kashevnik, Batol Hamoud, Nikolay Shilov
      First page: 181
      Abstract: One of the key functions of driver monitoring systems is the evaluation of the driver’s state, which is a key factor in improving driving safety. Currently, such systems rely heavily on deep learning, which in turn requires corresponding high-quality datasets to achieve the required level of accuracy. In this paper, we introduce a dataset that includes information about the driver’s state synchronized with the vehicle telemetry data. The dataset contains more than 17.56 million entries obtained from 633 drivers with the following data: the driver drowsiness and distraction states, smartphone-measured vehicle speed and acceleration, data from magnetometer and gyroscope sensors, g-force, lighting level, and smartphone battery level. The proposed dataset can be used for analyzing driver behavior and detecting aggressive driving styles, which can help to reduce accidents and increase safety on the roads. In addition, we applied the K-means clustering algorithm based on the 11 least-correlated features to label the data. The elbow method showed that the optimal number of clusters could be either two or three. We chose to proceed with three clusters to label the data into three main scenarios: parking and starting driving, driving in the city, and driving on highways. The result of the clustering was then analyzed to see what the most frequent critical actions inside the cabin in each scenario were. According to our analysis, an unfastened seat belt was the most frequent critical case in the city driving scenario, while drowsiness was more frequent when driving on the highway.
      Citation: Data
      PubDate: 2022-12-15
      DOI: 10.3390/data7120181
      Issue No: Vol. 7, No. 12 (2022)
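
The elbow method mentioned in the abstract above can be sketched as follows. This is a minimal, standard-library illustration on hypothetical two-dimensional points, not the authors' pipeline (which used 11 telemetry features). The farthest-point initialization is an assumption made here to keep the run deterministic, and all function names are invented for the example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centers(points, k):
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans_inertia(points, k, iterations=20):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centers = init_centers(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Hypothetical toy data: three well-separated groups of points.
points = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
    (5.0, 5.1), (5.2, 5.0), (5.1, 5.2),
    (10.0, 0.1), (10.2, 0.0), (10.1, 0.2),
]

# The "elbow" is the k after which the inertia stops dropping sharply.
inertias = {k: kmeans_inertia(points, k) for k in range(1, 6)}
for k, inertia in inertias.items():
    print(k, round(inertia, 3))
```

On real telemetry the curve is rarely this clean, which is consistent with the abstract's remark that either two or three clusters could be justified.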
       
  • Data, Vol. 7, Pages 182: Blockchain for Patient Safety: Use Cases,
           Opportunities and Open Challenges

    • Authors: Dounia Marbouh, Mecit Can Emre Simsekler, Khaled Salah, Raja Jayaraman, Samer Ellahham
      First page: 182
      Abstract: Medical errors are recognized as major threats to patient safety worldwide. Lack of streamlined communication and an inability to share and exchange data are among the contributory factors affecting patient safety. To address these challenges, blockchain can be utilized to ensure a secure, transparent and decentralized data exchange among stakeholders. In this study, we discuss six use cases that can benefit from blockchain to gain operational effectiveness and efficiency in the patient safety context. The role of stakeholders, system requirements, opportunities and challenges are discussed in each use case in detail. Connecting stakeholders and data in complex healthcare systems, blockchain has the potential to provide an accountable and collaborative milieu for the delivery of safe care. By reviewing the potential of blockchain in six use cases, we suggest that blockchain provides several benefits, such as an immutable and transparent structure and decentralized architecture, which may help transform health care and enhance patient safety. While blockchain offers remarkable opportunities, it also presents open challenges in the form of trust, privacy, scalability and governance. Future research may benefit from including additional use cases and developing smart contracts to present a more comprehensive view on potential contributions and challenges to explore the feasibility of blockchain-based solutions in the patient safety context.
      Citation: Data
      PubDate: 2022-12-16
      DOI: 10.3390/data7120182
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 183: Reduction Data Obtained from Cyclic Voltammetry
           of Benzophenones and Copper-2-Hydroxyphenone Complexes

    • Authors: Emmie Chiyindiko, Ernst H. G. Langner, Jeanet Conradie
      First page: 183
      Abstract: This article provides detailed redox data on nine differently substituted benzophenones and ten square planar copper(II) complexes containing 2-hydroxyphenones obtained by cyclic voltammetry (CV) experiments. The information provided is related to the published full research articles “An electrochemical and computational chemistry study of substituted benzophenones” (Electrochim. Acta 2021, 373, 137894) and “Electrochemical behaviour of copper(II) complexes containing 2-hydroxyphenones” (Electrochim. Acta 2022, 424, 140629), where the CVs and electrochemical data at mainly one scan rate, namely at 0.100 V s−1, are reported. CVs and the related peak current and voltage values, not reported in the related research article, are provided in this article for nine differently substituted benzophenones and ten differently substituted copper-2-hydroxyphenone complexes at various scan rates over more than two orders of magnitude. The redox data presented are the first reported complete set of electrochemical data of nine 2-hydroxyphenones and ten copper(II) complexes containing 2-hydroxyphenone ligands.
      Citation: Data
      PubDate: 2022-12-19
      DOI: 10.3390/data7120183
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 146: Predicting Student Dropout and Academic Success

    • Authors: Valentim Realinho, Jorge Machado, Luís Baptista, Mónica V. Martins
      First page: 146
      Abstract: Higher education institutions record a significant amount of data about their students, representing a considerable potential to generate information, knowledge, and monitoring. Both school dropout and educational failure in higher education are an obstacle to economic growth, employment, competitiveness, and productivity, directly impacting the lives of students and their families, higher education institutions, and society as a whole. The dataset described here results from the aggregation of information from different disjointed data sources and includes demographic, socioeconomic, macroeconomic, and academic data on enrollment and academic performance at the end of the first and second semesters. The dataset is used to build machine learning models for predicting academic performance and dropout, which is part of a Learning Analytic tool developed at the Polytechnic Institute of Portalegre that provides information to the tutoring team with an estimate of the risk of dropout and failure. The dataset is useful for researchers who want to conduct comparative studies on student academic performance and also for training in the machine learning area.
      Citation: Data
      PubDate: 2022-10-28
      DOI: 10.3390/data7110146
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 147: Thematic Analysis of Indonesian Physics Education
           Research Literature Using Machine Learning

    • Authors: Purwoko Haryadi Santoso, Edi Istiyono, Haryanto, Wahyu Hidayatulloh
      First page: 147
      Abstract: Abundant physics education research (PER) literature has been disseminated through academic publications. Over the years, the growing body of literature challenges Indonesian PER scholars to understand how the research community has progressed and possible future work that should be encouraged. Nevertheless, the previous traditional method of thematic analysis possesses limitations when the amount of PER literature exponentially increases. In order to deal with this plethora of publications, one of the machine learning (ML) algorithms from natural language processing (NLP) studies was employed in this paper to automate a thematic analysis of Indonesian PER literature that still needs to be explored within the community. One of the well-known NLP algorithms, latent Dirichlet allocation (LDA), was used in this study to extract Indonesian PER topics and their evolution between 2014 and 2021. A total of 852 papers (~4 to 8 pages each) were collectively downloaded from five international conference proceedings organized, peer reviewed, and published by Indonesian PER researchers. Before their topics were modeled through the LDA algorithm, our data corpus was preprocessed through several common procedures of established NLP studies. The findings revealed that LDA had thematically quantified Indonesian PER topics and described their distinct development over a certain period. The identified topics from this study recommended that the Indonesian PER community establish robust development in eight distinct topics to the present. Here, we commenced with an initial interest focusing on research on physics laboratories and followed the research-based instruction in late 2015. For the past few years, the Indonesian PER scholars have mostly studied 21st century skills which have given way to a focus on developing relevant educational technologies and promoting the interdisciplinary aspects of physics education. 
We suggest there is room for Indonesian PER scholars to address the qualitative aspects of physics teaching and learning, which are still scant within the literature.
      Citation: Data
      PubDate: 2022-10-28
      DOI: 10.3390/data7110147
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 148: An Open Dataset of Connected Speech in Aphasia
           with Consensus Ratings of Auditory-Perceptual Features

    • Authors: Zoe Ezzes, Sarah M. Schneck, Marianne Casilio, Davida Fromm, Antje S. Mefferd, Michael de Riesthal, Stephen M. Wilson
      First page: 148
      Abstract: Auditory-perceptual rating of connected speech in aphasia (APROCSA) is a system in which trained listeners rate a variety of perceptual features of connected speech samples, representing the disruptions and abnormalities that commonly occur in aphasia. APROCSA has shown promise as an approach for quantifying expressive speech and language function in individuals with aphasia. The aim of this study was to acquire and share a set of audiovisual recordings of connected speech samples from a diverse group of individuals with aphasia, along with consensus ratings of APROCSA features, for future use as training materials to teach others how to use the APROCSA system. Connected speech samples were obtained from six individuals with chronic post-stroke aphasia. The first five minutes of participant speech were excerpted from each sample, and five researchers independently evaluated each sample using APROCSA, rating its 27 features on a five-point scale. The researchers then discussed each feature in turn to obtain consensus ratings. The dataset will provide a useful, freely accessible resource for researchers, clinicians, and students to learn how to evaluate aphasic speech with an auditory-perceptual approach.
      Citation: Data
      PubDate: 2022-10-30
      DOI: 10.3390/data7110148
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 149: Cryptocurrency Price Prediction with
           Convolutional Neural Network and Stacked Gated Recurrent Unit

    • Authors: Chuen Yik Kang, Chin Poo Lee, Kian Ming Lim
      First page: 149
      Abstract: Virtual currencies have been declared as one of the financial assets that are widely recognized as exchange currencies. Cryptocurrency trading has caught the attention of investors, as cryptocurrencies can be considered highly profitable investments. To optimize the profit of cryptocurrency investments, accurate price prediction is essential. Because price prediction is a time series task, a hybrid deep learning model is proposed to predict the future price of the cryptocurrency. The hybrid model integrates a 1-dimensional convolutional neural network and stacked gated recurrent unit (1DCNN-GRU). Given the cryptocurrency price data over time, the 1-dimensional convolutional neural network encodes the data into a high-level discriminative representation. Subsequently, the stacked gated recurrent unit captures the long-range dependencies of the representation. The proposed hybrid model was evaluated on three different cryptocurrency datasets, namely Bitcoin, Ethereum, and Ripple. Experimental results demonstrated that the proposed 1DCNN-GRU model outperformed the existing methods with the lowest RMSE values of 43.933 on the Bitcoin dataset, 3.511 on the Ethereum dataset, and 0.00128 on the Ripple dataset.
      Citation: Data
      PubDate: 2022-10-31
      DOI: 10.3390/data7110149
      Issue No: Vol. 7, No. 11 (2022)
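
The abstract above reports results as root-mean-square error (RMSE). As a reminder of what that metric measures, here is a minimal sketch using hypothetical prices and forecasts. None of the numbers come from the paper, and the function name is chosen for illustration.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical closing prices and model forecasts (illustrative only).
actual_prices = [100.0, 102.0, 101.0, 105.0]
predicted_prices = [101.0, 101.0, 103.0, 104.0]
print(rmse(actual_prices, predicted_prices))
```

RMSE is expressed in the units of the predicted quantity, so the values quoted for Bitcoin, Ethereum, and Ripple are not directly comparable with one another; each is meaningful only relative to other models on the same dataset.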
       
  • Data, Vol. 7, Pages 150: Manual Conversion of Sadhukarn to Thai and
           Western Music Notations and Their Translation into a Rhyme Structure for
           Music Analysis

    • Authors: Sumetus Eambangyung, Gretel Schwörer-Kohl, Witoon Purahong
      First page: 150
      Abstract: Sadhukarn plays an important role as the most sacred music composition in Thai, Cambodian, and Lao music cultural areas. Due to various versions of unverified Sadhukarn main melodies in three different countries, notating melodies in suitable formats with a systematic method is necessary. This work provides a data descriptor for music transcription related to 25 different versions of the Sadhukarn main melody collected in Thailand, Cambodia, and Laos. Furthermore, we introduce a new procedure of music analysis based on rhyme structure. The aims of the study are to (1) provide Thai/Western musical note comprehension in the forms of Western staff and Thai notation, and (2) describe the procedures for translating from musical note to rhyme structure. To generate a rhyme structure, we apply a Thai poetic and linguistic approach as the method establishment. Rhyme structure is composed of melodic structures, the pillar tones Look-Tok, and melodic rhyming outline.
      Citation: Data
      PubDate: 2022-10-31
      DOI: 10.3390/data7110150
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 151: Isochromatic-Art: A Computational Dataset for
           Digital Photoelasticity Studies

    • Authors: Juan-Carlos Briñez-De-Leon, Mateo Rico-Garcia, Alejandro Restrepo-Martínez
      First page: 151
      Abstract: The importance of evaluating the stress field of loaded structures lies in the need for identifying the forces which make them fail, redesigning their geometry to increase the mechanical resistance, or characterizing unstressed regions to remove material. In this line of work, digital photoelasticity stands out for its ability to reveal the stress information through isochromatic color fringes and to quantify it through inverse problem strategies. However, the absence of public data with a high variety of spatial fringe distributions has limited the development of new proposals that generalize the stress evaluation to a wider variety of industrial applications. This dataset shares a varied collection of stress maps and their respective representations as color fringe patterns. In this case, the data were generated following a computational strategy that emulates the circular polariscope in dark field, but assuming stress surfaces and patches derived from analytical stress models, 3D reconstructions, saliency maps, and superpositions of Gaussian surfaces. In total, two sets of 101,430 raw images were separately generated for stress maps and isochromatic color fringes, respectively. This dataset can be valuable for researchers interested in characterizing the mechanical response in loaded models, engineers in computer science interested in modeling inverse problems, and scientists who work in physical phenomena such as 3D reconstruction in visible light, bubble analysis, oil surfaces, and film thickness.
      Citation: Data
      PubDate: 2022-11-01
      DOI: 10.3390/data7110151
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 152: Arabic Twitter Conversation Dataset about the
           COVID-19 Vaccine

    • Authors: Huda Alhazmi
      First page: 152
      Abstract: The development and rollout of COVID-19 vaccination around the world offers hope for controlling the pandemic. People turned to social media such as Twitter to seek information or to voice their opinions. Therefore, mining such conversations can provide a rich source of data for different applications related to the COVID-19 vaccine. In this data article, we developed an Arabic Twitter dataset of 1.1 M Arabic posts regarding the COVID-19 vaccine. The dataset was streamed over one year, covering the period from January to December 2021. We considered a set of crawling keywords in the Arabic language related to the conversation about the vaccine. The dataset consists of seven databases that can be analyzed separately or merged for further analysis. The initial analysis depicts the embedded features within the posts, including hashtags, media, and the dynamics of replies and retweets. Further, the textual analysis reveals the most frequent words that can capture the trends of the discussions. The dataset was designed to facilitate research across different fields, such as social network analysis, information retrieval, health informatics, and social science.
      Citation: Data
      PubDate: 2022-11-04
      DOI: 10.3390/data7110152
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 153: Ground Truth Dataset: Objectionable Web Content

    • Authors: Hamza H. M. Altarturi, Nor Badrul Anuar
      First page: 153
      Abstract: Cyber parental control aims to filter objectionable web content and prevent children from being exposed to harmful content. Succeeding in detecting and blocking objectionable content depends heavily on the accuracy of the topic model. A reliable ground truth dataset is essential for building effective cyber parental control models and validation of new detection methods. The ground truth is the measurement for labeling objectionable and unobjectionable websites of the cyber parental control dataset. The lack of publicly accessible datasets with a reliable ground truth has prevented a fair and coherent comparison of different methods proposed in the field of cyber parental control. This paper presents a ground truth dataset that contains 8000 labelled websites with 4000 objectionable websites and 4000 unobjectionable websites. These websites consist of more than 2 million web pages. Creating a ground truth objectionable web content dataset involved a few phases, including data collection, extraction, and labeling. Finally, the presence of bias, using kappa coefficient measurement, is addressed. The ground truth dataset is available publicly in the Mendeley repository.
      Citation: Data
      PubDate: 2022-11-07
      DOI: 10.3390/data7110153
      Issue No: Vol. 7, No. 11 (2022)
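
The abstract above mentions a kappa coefficient for assessing labeling bias but does not say which variant was used. Assuming Cohen's kappa for two annotators, the computation can be sketched as below. The labels are hypothetical and the function name is invented for the example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each annotator's label rates.
    """
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two annotators for four websites.
annotator_1 = ["objectionable", "objectionable", "unobjectionable", "unobjectionable"]
annotator_2 = ["objectionable", "unobjectionable", "unobjectionable", "unobjectionable"]
print(cohen_kappa(annotator_1, annotator_2))
```

A value of 1 means perfect agreement and 0 means agreement no better than chance; with more than two annotators a different statistic, such as Fleiss' kappa, would be needed.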
       
  • Data, Vol. 7, Pages 154: Dataset on Force Myography for
           Human–Robot Interactions

    • Authors: Umme Zakia, Carlo Menon
      First page: 154
      Abstract: Force myography (FMG) is a contemporary, non-invasive, wearable technology that can read the underlying muscle volumetric changes during muscle contractions and expansions. The FMG technique can be used to recognize human-applied hand forces during physical human–robot interactions (pHRI) via data-driven models. Several FMG-based pHRI studies were conducted in 1D, 2D and 3D during dynamic interactions between a human participant and a robot to estimate human-applied forces in intended directions during certain tasks. Raw FMG signals were collected via 16-channel (forearm) and 32-channel (forearm and upper arm) FMG bands while interacting with a biaxial stage (linear robot) and a serial manipulator (Kuka robot). In this paper, we present the datasets and their structures, the pHRI environments, and the collaborative tasks performed during the studies. We believe these datasets can be useful in future studies on FMG biosignal-based pHRI control design.
      Citation: Data
      PubDate: 2022-11-08
      DOI: 10.3390/data7110154
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 155: Reference-Guided Draft Genome Assembly,
           

    • Authors: Richard Estrada, Flor-Anita Corredor, Deyanira Figueroa, Wilian Salazar, Carlos Quilcate, Héctor V. Vásquez, Jorge L. Maicelo, Jhony Gonzales, Carlos I. Arbizu
      First page: 155
      Abstract: The Peruvian creole cattle (PCC) is a neglected breed and an essential livestock resource in the Andean region of Peru. To develop a modern breeding program and conservation strategies for the PCC, a better understanding of the genetics of this breed is needed. We sequenced the whole genome of the PCC using a de novo assembly approach with a paired-end 150 strategy on the Illumina HiSeq 2500 platform, obtaining 320 GB of sequencing data. A reference scaffolding was used to improve the draft genome. The obtained genome size of the PCC was 2.81 Gb with a contig N50 of 108 Mb and 92.59% complete BUSCOs. This genome size is similar to the genome references of Bos taurus and B. indicus. In addition, we identified 40.22% of the genome assembly as repetitive DNA, of which retroelements occupy 32.39% of the total genome. A total of 19,803 protein-coding genes were annotated in the PCC genome. For SSR data mining, we detected similar statistics in comparison with other breeds. The PCC genome will contribute to a better understanding of the genetics of this species and its adaptation to tough conditions in the Andean ecosystem.
      Citation: Data
      PubDate: 2022-11-09
      DOI: 10.3390/data7110155
      Issue No: Vol. 7, No. 11 (2022)
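
The abstract above quotes a contig N50. For readers unfamiliar with the statistic, the sketch below shows the usual definition: the length L such that contigs of length at least L together cover at least half of the assembly. The contig lengths are hypothetical and the function name is chosen for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths in base pairs (illustrative only).
contig_lengths = [2, 3, 4, 5, 6]
print(n50(contig_lengths))
```

Because N50 depends only on the length distribution, it says nothing about correctness of the assembly, which is why completeness measures such as BUSCO are usually reported alongside it.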
       
  • Data, Vol. 7, Pages 156: Hybrid Wi-Fi and BLE Fingerprinting Dataset for
           Multi-Floor Indoor Environments with Different Layouts

    • Authors: Aina Nadhirah Nor Hisham, Yin Hoe Ng, Chee Keong Tan, David Chieng
      First page: 156
      Abstract: Indoor positioning has garnered significant interest over the last decade due to the rapidly growing demand for location-based services. As a result, a multitude of techniques has been proposed to localize objects and devices in indoor environments. Wireless fingerprinting, which leverages machine learning, has emerged as one of the most popular positioning approaches due to its low implementation cost. The prevailing fingerprinting-based positioning mainly utilizes wireless fidelity (Wi-Fi) and Bluetooth low energy (BLE) signals. However, the received signal strength (RSS) of Wi-Fi and BLE signals is very sensitive to the layout of the indoor environment. Thus, any change in the indoor layout could potentially lead to severe degradation in terms of localization performance. To foster the development of new positioning methods, several open-source location fingerprinting datasets have been made available to the research community. Unfortunately, none of these public datasets provides RSS measurements for indoor environments with different layouts. To fill this gap, this paper presents a new hybrid Wi-Fi and BLE fingerprinting dataset for multi-floor indoor environments with different layouts to facilitate the future development of new fingerprinting-based positioning systems that can provide adaptive positioning performance in dynamic indoor environments. Additionally, the effects of indoor layout change on the location fingerprint and localization performance are also investigated.
      Citation: Data
      PubDate: 2022-11-09
      DOI: 10.3390/data7110156
      Issue No: Vol. 7, No. 11 (2022)
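
Fingerprinting-based positioning, as described in the abstract above, matches an observed vector of signal strengths against a database of vectors recorded at known positions. The sketch below uses a plain k-nearest-neighbours match in signal space. The radio map, the access-point ordering, and the function name are hypothetical, and the paper's own localization method may differ.

```python
import math


def locate(fingerprints, observed_rss, k=3):
    """Estimate a position as the mean of the k reference points whose stored
    RSS vectors are closest (Euclidean distance) to the observed RSS vector.

    fingerprints: list of ((x, y), [rss_ap1, rss_ap2, ...]) reference entries.
    """
    ranked = sorted(fingerprints, key=lambda fp: math.dist(fp[1], observed_rss))
    nearest = ranked[:k]
    x = sum(position[0] for position, _ in nearest) / len(nearest)
    y = sum(position[1] for position, _ in nearest) / len(nearest)
    return (x, y)


# Hypothetical radio map: positions in metres, RSS in dBm for two transmitters.
radio_map = [
    ((0.0, 0.0), [-40.0, -70.0]),
    ((0.0, 5.0), [-55.0, -60.0]),
    ((5.0, 0.0), [-60.0, -50.0]),
    ((5.0, 5.0), [-75.0, -45.0]),
]

print(locate(radio_map, [-41.0, -69.0], k=1))
print(locate(radio_map, [-41.0, -69.0], k=2))
```

If the indoor layout changes after the radio map was recorded, the stored vectors no longer match what a device observes at the same position, which is the degradation the dataset is intended to help study.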
       
  • Data, Vol. 7, Pages 157: High-Resolution UAV RGB Imagery Dataset for
           Precision Agriculture and 3D Photogrammetric Reconstruction Captured over
           a Pistachio Orchard (Pistacia vera L.) in Spain

    • Authors: Sergio Vélez, Rubén Vacas, Hugo Martín, David Ruano-Rosa, Sara Álvarez
      First page: 157
      Abstract: A total of 248 UAV RGB images were taken in the summer of 2021 over a representative pistachio orchard in Spain (X: 341450.3, Y: 4589731.8; ETRS89/UTM zone 30N). It is a 2.03 ha plot, planted in 2016 with Pistacia vera L. cv. Kerman grafted on UCB rootstock, with a NE–SW orientation and a 7 × 6 m triangular planting pattern. The ground was kept free of any weeds that could affect image processing. The photos (provided in JPG format) were taken using a UAV DJI Phantom Advance quadcopter in two flight missions: one planned to take nadir images (β = 0°), and another to take oblique images (β = 30°), both at 55 metres above the ground. The aerial platform incorporates a DJI FC6310 RGB camera with a 20 megapixel sensor, a horizontal field of view of 84° and a mechanical shutter. In addition, GCPs (ground control points) were collected. Finally, a high-quality 3D photogrammetric reconstruction process was carried out to generate a 3D point cloud (provided in LAS, LAZ, OBJ and PLY formats), a DEM (digital elevation model) and an orthomosaic (both in TIF format). The interest in using remote sensing in precision agriculture is growing, but the availability of reliable, ready-to-work, downloadable datasets is limited. Therefore, this dataset could be useful for precision agriculture researchers interested in photogrammetric reconstruction who want to evaluate models for orthomosaic and 3D point cloud generation from UAV missions with changing flight parameters, such as camera angle.
      Citation: Data
      PubDate: 2022-11-10
      DOI: 10.3390/data7110157
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 158: Measuring and Validating the Factors Influenced
           the SME Business Growth in Germany—Descriptive Analysis and
           Construct Validation

    • Authors: Hosam Azat Elsaman, Nourhan El-Bayaa, Suriyakumaran Kousihan
      First page: 158
      Abstract: In Germany, the medical device industry constitutes a cornerstone of the health sector. In this study, we investigated the challenges and factors affecting the present-day performance of German SMEs concerned with medical devices. The research adopted a cross-sectional and correlational design with simple random sampling, using data obtained from 110 mid-level and senior managers in German SMEs through an online structured survey in August 2022. We statistically validated our study data using exploratory factor analysis (EFA), Kaiser–Meyer–Olkin (KMO) testing, and Bartlett’s test to assess the relationship between study variables and measure data adequacy, using the R4.1.1(21) software. We then carried out principal component analysis (PCA) with varimax factor loading and extracted six factors for use as research variables. We also applied descriptive data analysis techniques using SPSS.21. The main study variables were: (1) the business performance of small and medium businesses (SMP); (2) their financial situation (SMEF); and (3) their implementation of new medical device industry regulations (MDR). The results confirmed poorer business performance and lower anticipated growth amongst SMEs affected by MDR, over and above the impacts of the present-day economic situation. The data can be used by management information systems (MIS) and decision support system professionals for planning and developing practical models of how to cope with current industry challenges. We recommend further research involving inferential analysis and triangulation of these data through a semi-structured qualitative study covering a larger population and different sectors.
      Citation: Data
      PubDate: 2022-11-10
      DOI: 10.3390/data7110158
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 159: Stance Classification of Social Media Texts for
           Under-Resourced Scenarios in Social Sciences

    • Authors: Victoria Yantseva, Kostiantyn Kucher
      First page: 159
      Abstract: In this work, we explore the performance of supervised stance classification methods for social media texts in under-resourced languages, using limited amounts of labeled data. In particular, we focus on the possibilities and limitations of applying classic machine learning versus deep learning in the social sciences. To this end, we use a training dataset of 5.7K messages posted on Flashback Forum, a Swedish discussion platform, supplemented with the previously published ABSAbank-Imm annotated dataset, and evaluate various model parameters and configurations to achieve the best training results given the character of the data. Our experiments indicate that classic machine learning models achieve results that are on par with or even better than those of neural networks and, thus, could be given priority when considering machine learning approaches for similar knowledge domains, tasks, and data. At the same time, modern pre-trained language models provide useful and convenient pipelines for obtaining vectorized data representations that can be combined with classic machine learning algorithms. We discuss the implications of their use in such scenarios and outline directions for further research.
      Citation: Data
      PubDate: 2022-11-13
      DOI: 10.3390/data7110159
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 160: Explainable Machine Learning for Financial
           Distress Prediction: Evidence from Vietnam

    • Authors: Kim Long Tran, Hoang Anh Le, Thanh Hien Nguyen, Duc Trung Nguyen
      First page: 160
      Abstract: The past decade has witnessed the rapid development of machine learning applied in economics and finance. Recent evidence suggests that machine learning models have produced superior results to traditional statistical models and have become the driving force for dramatic improvement in the financial industry. However, a much-debated question is whether the prediction results from black box machine learning models can be interpreted. In this study, we compared the predictive power of machine learning algorithms and applied SHAP values to interpret the prediction results on the dataset of listed companies in Vietnam from 2010 to 2021. The results showed that the extreme gradient boosting and random forest models outperformed other models. In addition, based on Shapley values, we also found that long-term debts to equity, enterprise value to revenues, account payable to equity, and diluted EPS had greatly influenced the outputs. In terms of practical contributions, the study helps credit rating companies have a new method for predicting the possibility of default of bond issuers in the market. The study also provides an early warning tool for policymakers about the risks of public companies in order to develop measures to protect retail investors against the risk of bond default.
      Citation: Data
      PubDate: 2022-11-14
      DOI: 10.3390/data7110160
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 161: Dataset: Coleoptera (Insecta) Collected from Beer
           Traps in “Smolny” National Park (Russia)

    • Authors: Alexander B. Ruchin, Leonid V. Egorov, Oleg N. Artaev, Mikhail N. Esin
      First page: 161
      Abstract: Monitoring Coleoptera diversity in protected areas is part of the global ecological monitoring of the state of ecosystems. The purpose of this research is to describe the biodiversity of Coleoptera sampled with fermented-substrate baits in the European part of Russia (Smolny National Park). The research was conducted from April to August in 2018–2022. Samples were collected in traps of our own design. Beer or wine with the addition of sugar, honey, or jam was used for bait. A total of 194 traps were installed. The dataset contains 1254 occurrences. A total of 9226 Coleoptera specimens have been studied. The dataset contains information about 134 species from 24 Coleoptera families. The families with the largest numbers of species found in traps were Cerambycidae (30 species), Nitidulidae (14 species), Elateridae (12 species), and Curculionidae and Coccinellidae (10 species each). The numbers of individuals trapped per family were as follows: Cerambycidae, 1018 specimens; Nitidulidae, 5359; Staphylinidae, 241; Elateridae, 33; Curculionidae, 148; and Coccinellidae, 19. The 10 dominant species accounted for 90.7% of all specimens detected in the traps. The maximum species diversity and abundance of Coleoptera were recorded in 2021. Although 2022 had the largest number of traps and more diverse biotopes (64 traps), fewer species were caught than in 2021. New populations of the following rare Coleoptera were found: Calosoma sycophanta, Elater ferrugineus, Osmoderma barnabita, Protaetia speciosissima, and Protaetia fieberi.
      Citation: Data
      PubDate: 2022-11-15
      DOI: 10.3390/data7110161
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 162: Methodology for the Surveillance the Voltage
           Supply in Public Buildings Using the ITIC Curve and Python Programming

    • Authors: Javier Fernández-Morales, Juan-José González-de-la-Rosa, José-María Sierra-Fernández, Olivia Florencias-Oliveros, Paula Remigio-Carmona, Manuel-Jesús Espinosa-Gavira, Agustín Agüera-Pérez, José-Carlos Palomares-Salas
      First page: 162
      Abstract: This paper proposes an easy-to-implement method for detecting and assessing two of the most frequent power quality (PQ) problems: voltage sags and swells. These can affect sensitive equipment such as computers, programmable logic controllers, and contactors. It is therefore of great interest to implement the method in any laboratory, not only for protection but also as a safeguard for claims against the supply company. In the current context, in which large volumes of data can be managed and multiple devices can be connected through the Internet of Things (IoT), monitoring the voltage at specific points of the network is both feasible and valuable. This makes it possible to detect voltage sags and swells and to diagnose which points are more prone to this type of problem. To detect sags and swells, a program written in Python crawls all the files in the database and flags the RMS values that fall outside the established limits. LabVIEW might have been the most logical alternative, since the acquisition hardware comes from the same company (National Instruments); compared to it, however, Python has a higher computational performance and is free of charge. The libraries available in Python allow hardware control close to what is possible using LabVIEW. The ITIC (Information Technology Industry Council) power acceptability curve, implemented in MATLAB, reflects the impact of these power quality disturbances in electrical power systems. The results showed that the combined action of Python and MATLAB performed well on a conventional desktop computer.
      Citation: Data
      PubDate: 2022-11-17
      DOI: 10.3390/data7110162
      Issue No: Vol. 7, No. 11 (2022)
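
The detection step described in the abstract above (flagging RMS values that fall outside established limits) can be sketched as follows. This is a minimal illustration, not the authors' program: the nominal voltage, the 0.90 and 1.10 per-unit thresholds, and the function names `classify` and `find_events` are all assumptions made for the example, and the file crawling and ITIC curve evaluation are left out.

```python
from typing import List, Tuple

# All three constants are assumptions for illustration; the abstract does not
# state the nominal voltage or the limits the authors used.
NOMINAL_V = 230.0   # assumed nominal RMS voltage in volts
SAG_PU = 0.90       # assumed sag threshold (per unit of nominal)
SWELL_PU = 1.10     # assumed swell threshold (per unit of nominal)


def classify(rms: float) -> str:
    """Label one RMS sample as 'sag', 'swell', or 'normal'."""
    per_unit = rms / NOMINAL_V
    if per_unit < SAG_PU:
        return "sag"
    if per_unit > SWELL_PU:
        return "swell"
    return "normal"


def find_events(samples: List[float]) -> List[Tuple[str, int, int]]:
    """Group consecutive out-of-limit samples into (kind, first_index, last_index)."""
    events: List[Tuple[str, int, int]] = []
    current = None  # (kind, start_index) of the event in progress, if any
    for i, value in enumerate(samples):
        kind = classify(value)
        # Close the running event when the label changes.
        if current is not None and kind != current[0]:
            events.append((current[0], current[1], i - 1))
            current = None
        # Open a new event on the first out-of-limit sample.
        if kind != "normal" and current is None:
            current = (kind, i)
    if current is not None:
        events.append((current[0], current[1], len(samples) - 1))
    return events


if __name__ == "__main__":
    rms_series = [230.0, 229.0, 200.0, 198.0, 231.0, 260.0, 230.0]
    print(find_events(rms_series))
```

Grouping consecutive samples into events keeps the duration of each disturbance, which is the second coordinate (alongside magnitude) needed to place an event on a power acceptability curve such as ITIC.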
       
  • Data, Vol. 7, Pages 163: An Analysis by State on The Effect of Movement
           Control Order (MCO) 3.0 Due to COVID-19 on Malaysians’ Mental
           Health: Evidence from Google Trends

    • Authors: Nicholas Tze Ping Pang, Assis Kamu, Chong Mun Ho, Walton Wider, Mathias Wen Leh Tseu
      First page: 163
      Abstract: The significant social and economic upheavals brought on by the COVID-19 pandemic have caused a great deal of psychological distress. Google Trends data have been seen as a corollary measure to assess population-wide trends via observing trends in search results. Judicious analysis of Google Trends data can have both analytical and predictive capacities. This study aimed to compare nation-wide and inter-state trends in mental health before and after the Malaysian Movement Control Order 3.0 (MCO 3.0), which commenced on 12 May 2021. This was done by assessing two search terms, “stress” and “sleep”, in both Malay and English. Google Trends daily data between 6 March and 31 May in both 2019 and 2021 were obtained, and both series were re-scaled to be comparable. Searches before and after MCO 3.0 in 2021 were compared to searches before and after the same date in 2019, using the difference-in-differences (DiD) method. This ensured that seasonal variations between states were not the source of our findings. We found that the DiD estimates, β_3, for “sleep” and “stress” were not significantly different from zero, implying that MCO 3.0 had no effect on psychological distress in all states. Johor was the only state where the DiD estimates β_3 were significantly different from zero for the search topic ‘Tidur’. For the topic ‘Tekanan’, there were two states with significant DiD estimates, β_3, namely Penang and Sarawak. This study hence demonstrates that there are particular state-level differences in Google Trends search terms, which indicate the states in which to prioritise interventions and increase surveillance for mental health. In conclusion, Google Trends is a powerful tool for examining larger population-based trends, especially in monitoring public health parameters such as population-level psychological distress, which can facilitate interventions.
      Citation: Data
      PubDate: 2022-11-17
      DOI: 10.3390/data7110163
      Issue No: Vol. 7, No. 11 (2022)
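
The difference-in-differences comparison described in the abstract above can be illustrated with a small sketch. It assumes the standard two-group, two-period specification, which the abstract does not spell out: 2021 is treated as the MCO 3.0 year, 2019 as the comparison year, and β_3 is taken to be the coefficient on the interaction of year and period, which in that specification equals the difference between the two before-and-after changes in mean search interest. The function name and the sample values are hypothetical, and no significance test is included.

```python
from statistics import mean
from typing import Sequence


def did_estimate(
    pre_2019: Sequence[float],
    post_2019: Sequence[float],
    pre_2021: Sequence[float],
    post_2021: Sequence[float],
) -> float:
    """Point estimate of the DiD interaction coefficient from four group means.

    (post - pre) in the treated year minus (post - pre) in the comparison year.
    """
    change_2021 = mean(post_2021) - mean(pre_2021)  # change in the MCO 3.0 year
    change_2019 = mean(post_2019) - mean(pre_2019)  # seasonal change without MCO 3.0
    return change_2021 - change_2019


if __name__ == "__main__":
    # Hypothetical re-scaled daily search-interest values, not data from the study.
    estimate = did_estimate(
        pre_2019=[40, 42, 44],
        post_2019=[45, 47, 49],
        pre_2021=[50, 52, 54],
        post_2021=[60, 62, 64],
    )
    print(estimate)
```

Subtracting the 2019 change removes whatever seasonal shift occurs around the same calendar date in a year without the order, so only the additional change in 2021 remains in the estimate.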
       
  • Data, Vol. 7, Pages 164: CoviRx: A User-Friendly Interface for Systematic
           Down-Selection of Repurposed Drug Candidates for COVID-19

    • Authors: Hardik A. Jain, Vinti Agarwal, Chaarvi Bansal, Anupama Kumar, Faheem Faheem, Muzaffar-Ur-Rehman Mohammed, Sankaranarayanan Murugesan, Moana M. Simpson, Avinash V. Karpe, Rohitash Chandra, Christopher A. MacRaild, Ian K. Styles, Amanda L. Peterson, Matthew A. Cooper, Carl M. J. Kirkpatrick, Rohan M. Shah, Enzo A. Palombo, Natalie L. Trevaskis, Darren J. Creek, Seshadri S. Vasan
      First page: 164
      Abstract: Although various vaccines are now commercially available, they have not been able to stop the spread of COVID-19 infection completely. An excellent strategy to get safe, effective, and affordable COVID-19 treatments quickly is to repurpose drugs that are already approved for other diseases. The process of developing an accurate and standardized drug repurposing dataset requires considerable resources and expertise due to numerous commercially available drugs that could be potentially used to address the SARS-CoV-2 infection. To address this bottleneck, we created the CoviRx.org platform. CoviRx is a user-friendly interface that allows analysis and filtering of large quantities of data, which is onerous to curate manually for COVID-19 drug repurposing. Through CoviRx, the curated data have been made open source to help combat the ongoing pandemic and encourage users to submit their findings on the drugs they have evaluated, in a uniform format that can be validated and checked for integrity by authenticated volunteers. This article discusses the various features of CoviRx, its design principles, and how its functionality is independent of the data it displays. Thus, in the future, this platform can be extended to include any other disease beyond COVID-19.
      Citation: Data
      PubDate: 2022-11-18
      DOI: 10.3390/data7110164
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 165: Density-Based Unsupervised Learning Algorithm to
           Categorize College Students into Dropout Risk Levels

    • Authors: Miguel Angel Valles-Coral, Luis Salazar-Ramírez, Richard Injante, Edwin Augusto Hernandez-Torres, Juan Juárez-Díaz, Jorge Raul Navarro-Cabrera, Lloy Pinedo, Pierre Vidaurre-Rojas
      First page: 165
      Abstract: Compliance with the basic conditions of quality in higher education implies the design of strategies to reduce student dropout, and Information and Communication Technologies (ICT) in the educational field have made it possible to direct, reinforce, and consolidate the process of professional academic training. We propose an academic and emotional tracking model that uses data mining and machine learning to group university students according to their level of dropout risk. We worked with 670 students from a Peruvian public university, applied 5 valid and reliable psychological assessment questionnaires to them using a chatbot-based system, and then grouped them using 3 unsupervised learning algorithms: DBSCAN, K-Means, and HDBSCAN (DBSCAN and HDBSCAN are density-based; K-Means is centroid-based). The results showed that HDBSCAN was the most robust option, obtaining better validity levels in two of the three internal indices evaluated, with a Silhouette index of 0.6823, a Davies–Bouldin index of 0.6563, and a Calinski–Harabasz index of 369.6459. The best number of clusters according to the internal indices was five. For external validation, against answers from mental health professionals, we obtained high scores (F-measure: 90.9%, purity: 94.5%, V-measure: 86.9%, and ARI: 86.5%). This indicates the robustness of the proposed model, which allows us to categorize university students into five levels according to the risk of dropping out.
      Citation: Data
      PubDate: 2022-11-18
      DOI: 10.3390/data7110165
      Issue No: Vol. 7, No. 11 (2022)
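
Of the external indices reported in the abstract above, purity is the simplest to show in code. The sketch below uses the common definition of purity (each cluster is credited with its most frequent reference label); it is an illustration under that assumption, not the authors' evaluation code, and the function name and sample labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def purity(cluster_labels: Sequence[Hashable], reference_labels: Sequence[Hashable]) -> float:
    """Fraction of items whose reference label is the majority label of their cluster."""
    if len(cluster_labels) != len(reference_labels):
        raise ValueError("label sequences must have the same length")
    if not cluster_labels:
        raise ValueError("label sequences must not be empty")

    # Count reference labels inside each cluster.
    counts_by_cluster = defaultdict(Counter)
    for cluster, reference in zip(cluster_labels, reference_labels):
        counts_by_cluster[cluster][reference] += 1

    # Each cluster contributes the size of its largest reference group.
    majority_total = sum(max(counts.values()) for counts in counts_by_cluster.values())
    return majority_total / len(cluster_labels)


if __name__ == "__main__":
    # Hypothetical cluster assignments and expert-assigned risk levels.
    clusters = [0, 0, 0, 1, 1, 2]
    expert = ["low", "low", "high", "high", "high", "mid"]
    print(purity(clusters, expert))
```

Purity rises trivially as the number of clusters grows, which is one reason it is usually reported alongside measures such as the V-measure and ARI, as in the abstract.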
       
  • Data, Vol. 7, Pages 166: Forecasting Daily COVID-19 Case Counts Using
           Aggregate Mobility Statistics

    • Authors: Bulut Boru, M. Emre Gursoy
      First page: 166
      Abstract: The COVID-19 pandemic has impacted the whole world profoundly. For managing the pandemic, the ability to forecast daily COVID-19 case counts would bring considerable benefit to governments and policymakers. In this paper, we propose to leverage aggregate mobility statistics collected from Google’s Community Mobility Reports (CMRs) toward forecasting future COVID-19 case counts. We utilize features derived from the amount of daily activity in different location categories such as transit stations versus residential areas based on the time series in CMRs, as well as historical COVID-19 daily case and test counts, in forecasting future cases. Our method trains optimized regression models for different countries based on dynamic and data-driven selection of the feature set, regression type, and time period that best fit the country under consideration. The accuracy of our method is evaluated on 13 countries with diverse characteristics. Results show that our method’s forecasts are highly accurate when compared to the real COVID-19 case counts. Furthermore, visual analysis shows that the peaks, plateaus and general trends in case counts are also correctly predicted by our method.
      Citation: Data
      PubDate: 2022-11-20
      DOI: 10.3390/data7110166
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 167: Database of Metagenomes of Sediments from
           Estuarine Aquaculture Farms in Portugal—AquaRAM Project Collection

    • Authors: Teresa Nogueira, Daniel G. Silva, Susana Lopes, Ana Botelho
      First page: 167
      Abstract: Aquaculture farms and estuarine environments close to human activities play a critical role in the interaction between aquatic and terrestrial surroundings and animal and human health. The AquaRAM project aimed to study estuarine aquaculture farms in Portugal as a reservoir of antibiotic resistance genes and the potential of its spread due to mobile genetic elements. We have assembled a collection of metagenomic data from 30 sediment samples from oysters, mussels, and gilt-head sea bream aquaculture farms. This collection includes samples of the estuarine environment of three rivers and one lagoon located from the north to the south of Portugal, namely, the Lima River in Viana do Castelo, Aveiro Lagoon in Aveiro, Tagus River in Alcochete, and Sado River in Setúbal. Statistical data from the raw metagenome files, as well as the file sizes of the assembled nucleotide and protein sequences, are also presented. The link to the statistics and the download page for all the metagenomes is also listed below.
      Citation: Data
      PubDate: 2022-11-20
      DOI: 10.3390/data7110167
      Issue No: Vol. 7, No. 11 (2022)
       
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
 


JournalTOCs © 2009-