Data

  Open Access journal
ISSN (Online) 2306-5729
Published by MDPI
  • Data, Vol. 7, Pages 119: Student Dataset from Tecnologico de Monterrey in
           Mexico to Predict Dropout in Higher Education

    • Authors: Joanna Alvarado-Uribe, Paola Mejía-Almada, Ana Luisa Masetto Herrera, Roland Molontay, Isabel Hilliger, Vinayak Hegde, José Enrique Montemayor Gallegos, Renato Armando Ramírez Díaz, Hector G. Ceballos
      First page: 119
      Abstract: High dropout rates and delayed completion in higher education are associated with considerable personal and social costs. In Latin America, 50% of students drop out, and only 50% of the remaining ones graduate on time. Therefore, there is an urgent need to identify students at risk and understand the main factors of dropping out. Together with the emergence of efficient computational methods, the rich data accumulated in educational administrative systems have opened novel approaches to promote student persistence. In order to support research related to preventing student dropout, a dataset has been gathered and curated from Tecnologico de Monterrey students, consisting of 50 variables and 143,326 records. The dataset contains non-identifiable information of 121,584 High School and Undergraduate students belonging to the seven admission cohorts from August–December 2014 to 2020, covering two educational models. The variables included in this dataset consider factors mentioned in the literature, such as sociodemographic and academic information related to the student, as well as institution-specific variables, such as student life. This dataset provides researchers with the opportunity to test different types of models for dropout prediction, so as to inform timely interventions to support at-risk students.
      Citation: Data
      PubDate: 2022-08-25
      DOI: 10.3390/data7090119
      Issue No: Vol. 7, No. 9 (2022)
       
  • Data, Vol. 7, Pages 120: A Dataset for the Vietnamese Banking System
           (2002–2021)

    • Authors: Tu D. Q. Le, Tin H. Ho, Thanh Ngo, Dat T. Nguyen, Son H. Tran
      First page: 120
      Abstract: This data article describes a dataset of key statistics on the activities of 45 Vietnamese banks (e.g., deposits, loans, assets, and labor productivity) that operated during the 2002–2021 period, yielding a total of 644 bank-year observations. This is the first systematic compilation of data on the splits of state vs. private ownership, foreign vs. domestic banks, commercial vs. policy banks, and listed vs. nonlisted banks. The result is a unique set of variables and indicators that capture the development and performance of the Vietnamese banking sector over time along many different dimensions. The dataset can be valuable to financial analysts, researchers, and educators working on banking efficiency and performance, risk and profit/revenue management, machine learning, and other fields.
      Citation: Data
      PubDate: 2022-08-25
      DOI: 10.3390/data7090120
      Issue No: Vol. 7, No. 9 (2022)
       
  • Data, Vol. 7, Pages 121: Transcriptome Profiles of Circular RNAs in Common
           Wheat during Fusarium Head Blight Disease

    • Authors: Junliang Yin, Xiaowen Han, Yongxing Zhu, Zhengwu Fang, Derong Gao, Dongfang Ma
      First page: 121
      Abstract: Circular RNAs (circRNAs) are covalently closed RNA molecules that have been identified in many crops. However, few datasets of circRNA junctions are available for common wheat during Fusarium head blight disease. In the present study, we used RNA-seq to determine the changes in circRNAs among the control (CK) and 1, 3, and 5 days post-Fusarium graminearum inoculation (dpi) samples. More than one billion reads were produced from 12 libraries, and 99.99% of the reads were successfully mapped to a wheat reference genome. In total, 2091 high-confidence circRNAs, which had two or more junction reads and were supported by at least two circRNA identification algorithms, were detected. The completed expression profiling revealed a distinct expression pattern of circRNAs among the CK, 1dpi, 3dpi and 5dpi samples. This study provides a valuable resource for identifying F. graminearum infection-responsive circRNAs in wheat and for further functional characterization of circRNAs that participate in the Fusarium head blight disease response of wheat.
      Citation: Data
      PubDate: 2022-08-29
      DOI: 10.3390/data7090121
      Issue No: Vol. 7, No. 9 (2022)
       
  • Data, Vol. 7, Pages 122: Advances in Contextual Action Recognition:
           Automatic Cheating Detection Using Machine Learning Techniques

    • Authors: Fairouz Hussein, Ayat Al-Ahmad, Subhieh El-Salhi, Esra’a Alshdaifat, Mo’taz Al-Hami
      First page: 122
      Abstract: Teaching and exam proctoring represent key pillars of the education system. Human proctoring, which involves visually monitoring examinees throughout exams, is an important part of assessing the academic process. The capacity to proctor examinations is a critical component of educational scalability. However, such approaches are time-consuming and expensive. In this paper, we present a new framework for the learning and classification of cheating video sequences. This kind of study aids in the early detection of students’ cheating. Furthermore, we introduce a new dataset, “actions of student cheating in paper-based exams”. The dataset consists of suspicious actions in an exam environment. Five classes of cheating were performed by eight different actors. Each pair of subjects conducted five distinct cheating activities. To evaluate the performance of the proposed framework, we conducted experiments on action recognition tasks at the frame level using five types of well-known features. The findings from the experiments on the framework were impressive and substantial.
      Citation: Data
      PubDate: 2022-08-31
      DOI: 10.3390/data7090122
      Issue No: Vol. 7, No. 9 (2022)
       
  • Data, Vol. 7, Pages 123: Revealing the Complete Chloroplast Genome of an
           Andean Horticultural Crop, Sweet Cucumber (Solanum muricatum), and Its
           Comparison with Other Solanaceae Species

    • Authors: Carla L. Saldaña, Julio C. Chávez-Galarza, Germán De la Cruz, Jorge H. Jhoncon, Juan C. Guerrero-Abad, Héctor V. Vásquez, Jorge L. Maicelo, Carlos I. Arbizu
      First page: 123
      Abstract: Sweet cucumber (Solanum muricatum) sect. Basarthrum is a neglected horticultural crop native to the Andean region. It is naturally distributed very close to two other highly important Solanum crops, potato and tomato. To date, molecular tools for this crop are lacking. In this study, the complete sweet cucumber chloroplast (cp) genome was obtained and compared with those of seven Solanaceae species. The cp genome of S. muricatum was 155,681 bp in length and included a large single-copy (LSC) region of 86,182 bp and a small single-copy (SSC) region of 18,360 bp, separated by a pair of inverted repeat (IR) regions of 25,568 bp. The cp genome possessed 87 protein-coding genes (CDS), 37 transfer RNA (tRNA) genes, eight ribosomal RNA (rRNA) genes, and one pseudogene. Furthermore, 48 perfect microsatellites were identified. These repeats were mainly located in the noncoding regions. Whole cp genome comparative analysis revealed that the SSC and LSC regions showed more divergence than the IR regions. Similar to previous studies, our phylogenetic analysis showed that S. muricatum is a sister species to members of sections Petota + Lycopersicum + Etuberosum. We expect that this first sweet cucumber chloroplast genome will provide potential molecular markers and genomic resources to shed light on the genetic diversity and population studies of S. muricatum, which will allow us to identify varieties and ecotypes. Finally, the features and the structural differentiation will provide us with information about the genes of interest, generating tools for the most precise selection of the best individuals of sweet cucumber, in less time and with fewer resources.
      Citation: Data
      PubDate: 2022-09-01
      DOI: 10.3390/data7090123
      Issue No: Vol. 7, No. 9 (2022)
       
  • Data, Vol. 7, Pages 124: An Updated List of Rock Partridge (Alectoris
           graeca) Haplotypes from the Apennines—Central Italy

    • Authors: Leonardo Brustenga, Paolo Viola, Pedro Girotti, Andrea Amici, Alessandro Rossetti, Stefania Chiesa, Riccardo Primi, Luigi Esposito, Livia Lucentini
      First page: 124
      Abstract: We report an updated and expanded list of Rock Partridge (Alectoris graeca) haplotypes found in wild animals throughout the Apennines of central Italy. Samples were collected and identified during a monitoring program of autochthonous Galliformes and from a private collection. The haplotypes were identified on a longer fragment of the mitochondrial control region (D-loop), based on previously reported haplotypes. This new evidence, based on a wider sampling area and a higher number of analyzed specimens, will be relevant to both conservation projects and gamebird breeding for restocking, as required by the Italian Action Plan. Studying longer fragments can also be useful for phylogeographic analysis.
      Citation: Data
      PubDate: 2022-09-01
      DOI: 10.3390/data7090124
      Issue No: Vol. 7, No. 9 (2022)
       
  • Data, Vol. 7, Pages 125: COVID-19 Lockdown Effects on Mood, Alcohol
           Consumption, Academic Functioning, and Perceived Immune Fitness: Data from
           Young Adults in Germany

    • Authors: Anna Helin Koyun, Pauline A. Hendriksen, Pantea Kiani, Agnese Merlo, Jessica Balikji, Ann-Kathrin Stock, Joris C. Verster
      First page: 125
      Abstract: Recently, a study was conducted in the Netherlands to evaluate the impact of the coronavirus disease (COVID-19) pandemic and its associated lockdown periods on academic functioning, mood, and health correlates, such as alcohol consumption. The Dutch study revealed that lockdowns were associated with significantly poorer mood and reductions in perceived immune fitness. Overall, alcohol consumption decreased during lockdown periods. Academic functioning in terms of self-reported performance was unaffected. However, a significant reduction in interactions with other students and teachers was reported. Nevertheless, there was considerable variability among students: both increases and reductions in alcohol consumption were reported, as well as both improved and poorer academic functioning during periods of lockdown. The aim of the current online study was to replicate these findings in Germany. To achieve this, a slightly modified version of the survey was administered among young adults (aged 18 to 35 years) in Germany. The survey assessed possible changes in self-reported academic functioning, mood, and health correlates, such as smoking and alcohol consumption, perceived immune functioning, and sleep quality during periods of lockdown as compared to periods with no lockdowns. Retrospective assessments were made for five periods: (1) ‘BP’ (the period before the COVID-19 pandemic), (2) ‘L1’ (the first lockdown period, March–May 2020), (3) ‘NL1’ (the first no-lockdown period, summer 2020), (4) ‘L2’ (the second lockdown, November 2020 to May 2021), and (5) ‘NL2’ (the second no-lockdown period, summer 2021). This article describes the content of the survey and the corresponding dataset. The survey was completed by 371 participants.
      Citation: Data
      PubDate: 2022-09-03
      DOI: 10.3390/data7090125
      Issue No: Vol. 7, No. 9 (2022)
       
  • Data, Vol. 7, Pages 126: Using Transfer Learning to Train a Binary
           Classifier for Lorrca Ektacytometery Microscopic Images of Sickle Cells
           and Healthy Red Blood Cells

    • Authors: Marya Butt, Ander de Keijzer
      First page: 126
      Abstract: Multiple blood images of stressed and sheared cells, taken by a Lorrca Ektacytometery microscope, needed to be classified so that biomedical researchers could assess several treatment options for blood-related diseases. The study proposes a model capable of classifying these images with high accuracy as either healthy Red Blood Cell (RBC) or Sickle Cell (SC) images. The performance of five Deep Learning (DL) models with two different optimizers, namely Adam and Stochastic Gradient Descent (SGD), was compared. The first three models consisted of 1, 2 and 3 CNN blocks, respectively, and the last two models used a transfer learning approach to extract features. The dataset was first augmented and scaled, and then used to train the models. The performance of the models was evaluated by testing on new images and was illustrated by confusion matrices, performance metrics (accuracy, recall, precision and F1 score), a receiver operating characteristic (ROC) curve and the area under the curve (AUC) value. The first, second and third models with the Adam optimizer could not achieve training, validation or testing accuracy above 50%. However, the second and third models with the SGD optimizer showed good loss and accuracy scores during training and validation, but their testing accuracy did not exceed 51%. The fourth and fifth models used the pre-trained VGG16 and ResNet50 models for feature extraction, respectively. VGG16 performed better than ResNet50, scoring 98% accuracy and an AUC of 0.98 with both optimizers. The study suggests that transfer learning with the VGG16 model helped to extract features from images for the classification of healthy RBCs and SCs, yielding significantly better performance than the first, second, third and fifth models.
      Citation: Data
      PubDate: 2022-09-05
      DOI: 10.3390/data7090126
      Issue No: Vol. 7, No. 9 (2022)
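The metrics named in the Pages 126 abstract (accuracy, precision, recall, F1 score) follow their standard confusion-matrix definitions. The short sketch below shows those definitions for a binary healthy-RBC vs. sickle-cell classifier. The class names, the choice of "sickle cell" as the positive class, and the counts are all hypothetical illustrations, not the study's code or results.

```python
from dataclasses import dataclass


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


@dataclass
class ConfusionMatrix:
    """Binary confusion matrix; here 'positive' is assumed to mean 'sickle cell'."""
    tp: int  # sickle-cell images classified as sickle cells
    fp: int  # healthy-RBC images classified as sickle cells
    fn: int  # sickle-cell images classified as healthy RBCs
    tn: int  # healthy-RBC images classified as healthy RBCs

    def accuracy(self) -> float:
        return _safe_div(self.tp + self.tn, self.tp + self.fp + self.fn + self.tn)

    def precision(self) -> float:
        return _safe_div(self.tp, self.tp + self.fp)

    def recall(self) -> float:
        return _safe_div(self.tp, self.tp + self.fn)

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return _safe_div(2 * p * r, p + r)


# Hypothetical test-set counts, for illustration only (not the study's results).
cm = ConfusionMatrix(tp=45, fp=5, fn=3, tn=47)
print(f"accuracy={cm.accuracy():.2f} precision={cm.precision():.2f} "
      f"recall={cm.recall():.2f} f1={cm.f1():.2f}")
```

With these invented counts, F1 equals 2TP / (2TP + FP + FN), which is a handy cross-check on the harmonic-mean form.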
       
  • Data, Vol. 7, Pages 127: Are Source Code Metrics “Good Enough”
           in Predicting Security Vulnerabilities?

    • Authors: Sundarakrishnan Ganesh, Francis Palma, Tobias Olsson
      First page: 127
      Abstract: Modern systems produce and handle a large volume of sensitive enterprise data. Therefore, security vulnerabilities in software systems must be identified and resolved early to prevent security breaches and failures. Predicting security vulnerabilities is an alternative to identifying them as developers write code. In this study, we investigated the ability of several machine learning algorithms to predict security vulnerabilities. We created two datasets containing security vulnerability information from two open-source systems: (1) Apache Tomcat (versions 4.x and five 2.5.x minor versions). We also computed source code metrics for these versions of both systems. We examined four classifiers, namely Naive Bayes, Decision Tree, XGBoost Classifier, and Logistic Regression, to show their ability to predict security vulnerabilities. Moreover, an ensemble learner was introduced using a stacking classifier to see whether the prediction performance could be improved. We performed cross-version and cross-project predictions to assess the effectiveness of the best-performing model. Our results showed that the XGBoost classifier performed best compared to the other learners, with an average accuracy of 97% in both datasets. The stacking classifier achieved an average accuracy of 92% in Struts and 71% in Tomcat. Our best-performing model, XGBoost, could predict with an average accuracy of 87% in Tomcat and 99% in Struts in a cross-version setup.
      Citation: Data
      PubDate: 2022-09-07
      DOI: 10.3390/data7090127
      Issue No: Vol. 7, No. 9 (2022)
       
  • Data, Vol. 7, Pages 128: Geo-Locations and System Data of Renewable Energy
           Installations in Germany

    • Authors: David Manske, Lukas Grosch, Julius Schmiedt, Nora Mittelstädt, Daniela Thrän
      First page: 128
      Abstract: Information on the geo-locations of renewable energy installations is very useful for investigating spatial, social or environmental questions about their impact at the local and national levels. However, existing data sets do not provide a sufficiently accurate representation of these installations in Germany over space and time. This work presents a valid approach for creating a data set of wind power plants, photovoltaic field systems, bioenergy plants and hydropower plants for Germany, based on a data extract from the Core Energy Market Data Register (CEMDR) and publicly available data. Established methods were used (e.g., random forest, image recognition), but new techniques were also developed to fill data gaps or locate misplaced renewable energy installations. In this way, a substantial part of the CEMDR data could be corrected and processed so that it can be freely used in GIS software by any scientific or non-scientific discipline.
      Citation: Data
      PubDate: 2022-09-10
      DOI: 10.3390/data7090128
      Issue No: Vol. 7, No. 9 (2022)
       
  • Data, Vol. 7, Pages 129: Data for Photodissociation of Some Small
           Molecular Ions Relevant for Astrochemistry and Laboratory Investigation

    • Authors: Vladimir A. Srećković, Ljubinko M. Ignjatović, Aleksandra Kolarski, Zoran R. Mijić, Milan S. Dimitrijević, Veljko Vujčić
      First page: 129
      Abstract: Calculated photodissociation data for several small molecular ions are reported. The cross-sections and spectral rate coefficients have been studied using a quantum mechanical method. The plasma conditions cover temperatures from 1000 to 20,000 K and wavelengths in the EUV and UV regions. The influence of temperature and wavelength on the spectral coefficients of all the investigated species is discussed. The data could also be useful for the diagnostics and modelling of laboratory, astrophysical, and industrial plasmas.
      Citation: Data
      PubDate: 2022-09-11
      DOI: 10.3390/data7090129
      Issue No: Vol. 7, No. 9 (2022)
       
  • Data, Vol. 7, Pages 130: Redox Data of Tris(polypyridine)manganese(II)
           Complexes

    • Authors: Mtshali, von Eschwege, Conradie
      First page: 130
      Abstract: Very little cyclic voltammetry data is available in the literature for tris(polypyridine)manganese(II) complexes, [Mnᴵᴵ(N^N)₃]²⁺, where N^N is bipyridine (bpy), phenanthroline (phen), or a substituted bpy or phen ligand. Cyclic voltammograms were found only for tris(4,7-diphenyl-1,10-phenanthroline)manganese(II) perchlorate. Complementing our recently published related research article, the data presented here provide cyclic voltammograms and the corresponding voltage-current data obtained during the electrochemical oxidation and reduction of four [Mnᴵᴵ(N^N)₃]²⁺ complexes, using different scan rates and analyte concentrations. The results show that increasing the concentration and scan rate leads to higher Mn(II/III) peak oxidation potentials and larger separations between the oxidation and reduction peaks of the irreversible Mn(II/III) redox event. The average of the peak oxidation and peak reduction potentials of the Mn(II/III) redox events stayed constant within 0.01 V. Similarly, the average of the peak oxidation and reduction potentials of the ligand-based reduction events of [Mnᴵᴵ(N^N)₃]²⁺ was constant within 0.01 V.
      Citation: Data
      PubDate: 2022-09-13
      DOI: 10.3390/data7090130
      Issue No: Vol. 7, No. 9 (2022)
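Two quantities recur in the Pages 130 abstract: the average of the peak oxidation and peak reduction potentials, and the separation between those peaks. A minimal sketch of both, assuming the common cyclic-voltammetry definitions (mean = (E_pa + E_pc)/2, separation = E_pa − E_pc). The function names and the potentials below are hypothetical and are not values from the dataset; they are chosen only to mimic the reported pattern of a constant average with a growing separation.

```python
def mean_peak_potential(e_pa: float, e_pc: float) -> float:
    """Average of the anodic (oxidation) and cathodic (reduction) peak potentials, in V."""
    return (e_pa + e_pc) / 2


def peak_separation(e_pa: float, e_pc: float) -> float:
    """Separation between the oxidation and reduction peak potentials, in V."""
    return e_pa - e_pc


# Hypothetical (E_pa, E_pc) pairs in V, keyed by scan rate in V/s.
scans = {0.05: (0.80, 0.60), 0.10: (0.81, 0.59), 0.20: (0.82, 0.58)}
for rate, (e_pa, e_pc) in sorted(scans.items()):
    print(f"{rate:.2f} V/s: mean = {mean_peak_potential(e_pa, e_pc):.3f} V, "
          f"separation = {peak_separation(e_pa, e_pc):.3f} V")
```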
       
  • Data, Vol. 7, Pages 131: The COLIBAS Study—COVID-19 Lockdown Effects
           on Mood, Academic Functioning, Alcohol Consumption, and Perceived Immune
           Fitness: Data from Buenos Aires University Students

    • Authors: Pauline A. Hendriksen, Pantea Kiani, Agnese Merlo, Analia Karadayian, Analia Czerniczyniec, Silvia Lores-Arnaiz, Gillian Bruce, Joris C. Verster
      First page: 131
      Abstract: A recent study was conducted in the Netherlands to evaluate the impact of the coronavirus disease 2019 (COVID-19) pandemic and its associated lockdown periods on academic functioning, mood, and health correlates such as alcohol consumption. The study revealed that lockdowns were associated with significantly poorer mood and reduced perceived immune fitness. Overall, alcohol consumption decreased during the lockdown periods. Academic functioning in terms of performance was unaffected; however, a significant reduction in interactions with other students and teachers was reported. Nevertheless, there was great variability between students: both increases and reductions in alcohol consumption were reported, as well as both improved and poorer academic functioning. The aim of the current online study was to replicate these findings in Argentina. To this end, a modified version of the survey, adapted to the local lockdown measures, was conducted among students at the University of Buenos Aires. The survey assessed possible changes in self-reported academic functioning, mood, and health correlates, such as alcohol consumption, perceived immune functioning, and sleep quality compared to before the COVID-19 pandemic. Retrospective assessments were made for four periods: (1) the period before COVID-19, (2) the first lockdown period (March–December 2020), (3) summer 2021 (January–March 2021, no lockdown), and (4) the second lockdown (April 2021 to July 2021). This article describes the content of the survey and the corresponding dataset. The survey was completed by 508 participants.
      Citation: Data
      PubDate: 2022-09-14
      DOI: 10.3390/data7090131
      Issue No: Vol. 7, No. 9 (2022)
       
  • Data, Vol. 7, Pages 132: Dataset of Psychological Scales and Physiological
           Signals Collected for Anxiety Assessment Using a Portable Device

    • Authors: Mohamed Elgendi, Valeria Galli, Chakaveh Ahmadizadeh, Carlo Menon
      First page: 132
      Abstract: Portable and wearable devices are becoming increasingly common in our daily lives. In this study, we examined the impact of anxiety-inducing videos on biosignals, particularly electrocardiogram (ECG) and respiration (RES) signals, collected using a portable device. Two psychological scales (Beck Anxiety Inventory and Hamilton Anxiety Rating Scale) were used to assess overall anxiety before induction. The data were collected at Simon Fraser University from participants aged 18–56, all of whom were healthy at the time. The ECG and RES signals were recorded simultaneously at 500 Hz while participants continuously watched video clips depicting anxiety-inducing (negative experience) and non-anxiety-inducing (positive experience) events. The final dataset consisted of psychological scores and physiological signals from 19 participants (14 males and 5 females) who watched eight video clips. This dataset can be used to explore the instantaneous relationship between ECG and RES waveforms and anxiety-inducing video clips, and to uncover and evaluate the latent characteristic information contained in these biosignals.
      Citation: Data
      PubDate: 2022-09-14
      DOI: 10.3390/data7090132
      Issue No: Vol. 7, No. 9 (2022)
       
  • Data, Vol. 7, Pages 133: Prediction of Retention Indices and Response
           Factors of Oxygenates for GC-FID by Multilinear Regression

    • Authors: Nils Kretzschmar, Markus Seifert, Oliver Busse, Jan J. Weigand
      First page: 133
      Abstract: The replacement of fossil carbon sources with green bio-oils raises the importance of several hundred oxygenated hydrocarbons, which substantially increases the analytical effort in catalysis research. A multilinear regression is performed to correlate retention indices (RIs) and response factors (RFs) with structural properties. The model covers a variety of possible products formed during the hydrodeoxygenation of bio-oils with good accuracy (R² = 0.921 for RFs and R² = 0.975 for RIs). The GC parameters are related to the detailed hydrocarbon analysis (DHA) method, which is commonly used for non-oxygenated hydrocarbons. The RIs are determined from a paraffin standard (C5–C15), and the RFs are calculated with ethanol and 1,3,5-trimethylbenzene as internal standards. The method presented here can, therefore, be used together with the DHA method and be expanded further. In addition to the multilinear regression, an increment system has been developed for aromatic oxygenates, which further improves the prediction accuracy of the response factors with respect to the molecular constitution (R² = 0.958). Both predictive models are based exclusively on structural factors to ensure effortless application. All experimental RIs and RFs are determined under identical conditions. Moreover, a folded Plackett–Burman screening design demonstrates the general applicability of the datasets independent of method- or device-specific parameters.
      Citation: Data
      PubDate: 2022-09-14
      DOI: 10.3390/data7090133
      Issue No: Vol. 7, No. 9 (2022)
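The Pages 133 abstract states that RIs are determined from a C5–C15 paraffin standard but does not give the formula. A common choice for temperature-programmed GC is the linear retention index of van den Dool and Kratz, RI = 100 [n + (m − n)(t_x − t_n)/(t_m − t_n)], where C_n and C_m are the n-alkanes eluting just before and just after the analyte. The sketch below assumes that definition and assumes the alkanes elute in carbon-number order; the authors' DHA-based calculation may differ, and the function name and retention times are hypothetical.

```python
import bisect


def linear_retention_index(t_x: float, alkane_times: dict[int, float]) -> float:
    """Linear (van den Dool and Kratz) retention index of an analyte eluting at t_x.

    alkane_times maps n-alkane carbon number -> retention time (min), e.g. C5..C15.
    Assumes the alkanes elute in order of increasing carbon number.
    """
    carbons = sorted(alkane_times)
    times = [alkane_times[c] for c in carbons]
    if len(carbons) < 2:
        raise ValueError("need at least two n-alkane reference peaks")
    if not times[0] <= t_x <= times[-1]:
        raise ValueError("analyte elutes outside the paraffin standard range")
    # Position of the last alkane eluting at or before the analyte
    # (clamped so an analyte co-eluting with the last alkane still has a bracket).
    i = min(bisect.bisect_right(times, t_x) - 1, len(carbons) - 2)
    n, m = carbons[i], carbons[i + 1]
    t_n, t_m = times[i], times[i + 1]
    # Interpolate linearly between the bracketing alkanes (RI of C_n is 100 n).
    return 100 * (n + (m - n) * (t_x - t_n) / (t_m - t_n))


# Hypothetical retention times (min) for a short C5-C7 ladder.
ladder = {5: 2.0, 6: 4.0, 7: 6.0}
print(linear_retention_index(5.0, ladder))  # 650.0
```

By construction, each reference n-alkane receives an index of exactly 100 times its carbon number.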
       
  • Data, Vol. 7, Pages 134: Sheep Nocturnal Activity Dataset

    • Authors: António Monteiro, Pedro Gonçalves, Maria R. Marques, Ana T. Belo, Fernando Braz
      First page: 134
      Abstract: Monitoring sheep’s behavior is of paramount importance, because deviations from normal patterns may indicate nutritional, thermal or social stress, changes in reproductive status, health issues, or predator attacks. The night period, despite being a more restful period in which animals are theoretically sleeping and resting, represents approximately half of the life cycle of animals; therefore, its study is of immense interest. Wearable sensors have become a widely recognized technique for monitoring activity, both for their precision and the ease with which the sensorized data can be analyzed. The present dataset consists of data from the sensorization of 18 Serra da Estrela sheep, during the nocturnal period between 18 November 2021 and 16 February 2022. The data contain measurements taken by ultrasound and accelerometry of the height from neck to ground, as well as measurements taken by an accelerometer in the monitoring collar. Data were collected every 10 s when the animals were in the shelter. With the collection of data from various sensors, active and inactive periods can be identified throughout the night, quantifying the number and average time of those periods.
      Citation: Data
      PubDate: 2022-09-14
      DOI: 10.3390/data7090134
      Issue No: Vol. 7, No. 9 (2022)
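The last sentence of the Pages 134 abstract describes identifying active and inactive periods through the night and quantifying their number and average duration. A minimal sketch, assuming each 10 s sample has already been labelled active or inactive and that the samples form one contiguous sequence. The labelling rule itself is not given in the abstract and is left out here, and the function name and labels are hypothetical. A run-length pass over the labels yields both quantities.

```python
from itertools import groupby

SAMPLE_PERIOD_S = 10  # sampling interval stated in the abstract


def summarize_periods(active_flags: list[bool]) -> dict[str, tuple[int, float]]:
    """Return {'active': (count, mean_s), 'inactive': (count, mean_s)} over consecutive runs."""
    runs: dict[bool, list[int]] = {True: [], False: []}
    # groupby collapses consecutive identical labels into one run.
    for is_active, group in groupby(active_flags):
        runs[is_active].append(sum(1 for _ in group) * SAMPLE_PERIOD_S)

    def stats(durations: list[int]) -> tuple[int, float]:
        return len(durations), (sum(durations) / len(durations) if durations else 0.0)

    return {"active": stats(runs[True]), "inactive": stats(runs[False])}


# Hypothetical labels for a short stretch of the night.
flags = [False, False, False, True, True, False, False, True, True, True, True]
print(summarize_periods(flags))  # {'active': (2, 30.0), 'inactive': (2, 25.0)}
```

Gaps between shelter stays would need to split the sequence first, since this sketch treats the input as uninterrupted.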
       
  • Data, Vol. 7, Pages 102: Dataset of Indicators for the Assessment of
           Ecosystem Services Affected by Agricultural Soil Management

    • Authors: Paul, Donmez, Koeppe, Robinson, Barnickel
      First page: 102
      Abstract: Ecosystem services represent an important concept for assessing the sustainability of agricultural management. However, in practical applications, it can be difficult to find indicators suitable for specific services or specific spatial scales. In order to create a toolbox of indicators for assessing the actual or potential supply of ecosystem services in the context of agricultural land and soil management, we conducted a keyword-based literature review in the Web of Science Core Collection and SCOPUS, using the terms ecosystem service AND indicator AND agricultur*. The search was performed in January 2019 and was restricted to journal articles written in English. After eliminating duplicates, we identified 180 articles, of which 121 met our selection criteria. Through a full-text review, we extracted information on the ecosystem services and indicators addressed. Where studies used ecosystem service definitions other than the Common International Classification of Ecosystem Services (CICES V.5.1), indicators were assigned to the corresponding CICES class or classes. We used the information derived from the review to create factsheets for 37 ecosystem services. Each factsheet provides tables of available indicators applicable at multiple spatial scales, ranging from field to global, information on the type of input data required, and a reference to the article or articles from which the indicator was taken. The dataset provides a toolbox for researchers to find indicators that fit their respective research needs.
      Citation: Data
      PubDate: 2022-07-22
      DOI: 10.3390/data7080102
      Issue No: Vol. 7, No. 8 (2022)
       
  • Data, Vol. 7, Pages 103: Dataset for Estimated Closures of Scallop (Pecten
           maximus) Production Areas Due to Phycotoxin Contamination along the French
           Coasts of the Eastern English Channel

    • Authors: Sarra Chenouf, Mathieu Merzereaud, Pascal Raux, José Antonio Pérez Agúndez
      First page: 103
      Abstract: Commercial bans due to harmful algal blooms (HABs), which are natural events, call into question the sustainability of human activities in marine and coastal areas. A risk assessment of these bans is important to support decision-making to better manage and mitigate their impacts. However, data are sparse and difficult to collect. The dataset presented in this paper includes “estimated closures of scallop fishing areas” due to HAB toxicity along the French coasts of the English Channel. The closure data were simulated for each scallop (Pecten maximus) fishing area through an algorithm applied to the in situ dataset from the French monitoring network REPHYTOX. The methodology for producing the closure data consists of comparing phycotoxin concentrations in scallops to regulatory phycotoxin thresholds, and then simulating the number and duration of closures based on the monitoring strategies and closure mechanisms defined in the regulations. These data only cover closures related to regulatory threshold exceedances of phycotoxins in shellfish. Closures induced by the lack of sampling or other reasons (e.g., failures in toxin analysis) are not included in the dataset because of the lack of information. Data are produced for the scallop fishing season. Since no such closure database exists, owing to the lack of centralized management of local closure decrees, this dataset can be used to analyse the management strategies for dealing with HABs and to highlight the governance challenges related to these strategies. It is also useful for studying the link between the ecological and socioeconomic dimensions of HABs, and for describing how toxin concentrations in shellfish translate into socioeconomic impacts and management challenges. This methodology can be applied to other species, other areas and other economic activities.
      Citation: Data
      PubDate: 2022-07-27
      DOI: 10.3390/data7080103
      Issue No: Vol. 7, No. 8 (2022)
       
  • Data, Vol. 7, Pages 104: Three-Dimensional, Km-Scale Hyperspectral Data of
           Well-Exposed Zn–Pb Mineralization at Black Angel Mountain, Greenland
           

    • Authors: Sandra Lorenz, Sam T. Thiele, Moritz Kirsch, Gabriel Unger, Robert Zimmermann, Pierpaolo Guarnieri, Nigel Baker, Erik Vest Sørensen, Diogo Rosa, Richard Gloaguen
      First page: 104
      Abstract: Hyperspectral imaging is an innovative technology for non-invasive mapping, with increasing applications in many sectors. As with any novel technology, robust processing workflows are required to ensure wide use. We present an open-source hypercloud dataset capturing the complex but spectacularly well-exposed geology of the Black Angel Mountain in Maarmorilik, West Greenland, alongside a detailed and interactive tutorial documenting relevant processing workflows. This contribution relies on very recent progress in the correction, interpretation and integration of hyperspectral data in the earth sciences. The possibility of fusing hyperspectral scans with 3D point cloud representations (hyperclouds) has opened up new possibilities for mapping complex natural targets. Spectroscopic and machine learning tools allow for the rapid and accurate characterization of geological structures in a 3D environment. Potential users can use this exemplary dataset and the associated tools to train themselves or test new algorithms. As the data and the tools have a wide range of applications, we expect this contribution to benefit the scientific community at large.
      Citation: Data
      PubDate: 2022-07-28
      DOI: 10.3390/data7080104
      Issue No: Vol. 7, No. 8 (2022)
       
  • Data, Vol. 7, Pages 105: Populating the Data Space for Cultural Heritage
           with Heritage Digital Twins

    • Authors: Franco Niccolucci, Achille Felicetti, Sorin Hermon
      First page: 105
      Abstract: The present paper concerns the design of the semantic infrastructure of the data space for cultural heritage as envisaged by the European Commission in its recent documents. Due to the complexity of cultural heritage data and of their intrinsic inter-relationships, it is necessary to introduce a novel ontology that is nevertheless compliant with existing standards and interoperable with previous platforms used in this context, such as Europeana. The data space organization must be tailored to the methods and theory of cultural heritage, which are briefly summarized in the introduction. The new ontology is based on the Digital Twin concept, i.e., the digital counterpart of a cultural heritage asset incorporating all the digital information pertaining to it. This creates a Knowledge Base for the cultural heritage data space. The paper outlines the main features of the proposed Heritage Digital Twin ontology and provides some examples of its application. Future work will include completing the ontology in all its details and testing it in other real cases and with the various sectors of the cultural heritage community.
      Citation: Data
      PubDate: 2022-07-29
      DOI: 10.3390/data7080105
      Issue No: Vol. 7, No. 8 (2022)
       
  • Data, Vol. 7, Pages 106: An Evaluation of the OpenWeatherMap API versus
           INMET Using Weather Data from Two Brazilian Cities: Recife and Campina
           Grande

    • Authors: Anwar Musah, Livia Màrcia Mosso Dutra, Aisha Aldosery, Ella Browning, Tercio Ambrizzi, Iuri Valerio Graciano Borges, Merve Tunali, Selma Başibüyük, Orhan Yenigün, Giselle Machado Magalhaes Moreno, Ana Clara Gomes da Silva, Wellington Pinheiro dos Santos, Clarisse Lins de Lima, Tiago Massoni, Kate Elizabeth Jones, Luiza Cintra Campos, Patty Kostkova
      First page: 106
      Abstract: Certain weather conditions are associated with increased populations of various mosquito species. In order to predict the burden of mosquito populations in the Global South, it is imperative to integrate weather-related risk factors into predictive models. Many online open-source weather platforms provide historical and current weather data as well as forecasts that can be used for general predictions, and these electronic sources serve as an alternative source of weather data when physical weather stations are inaccessible (or inactive). Before using data from such online sources, it is important to assess their accuracy against a baseline measure. In this paper, we therefore evaluated the accuracy and suitability of OpenWeatherMap API (an online weather platform) forecasts of two parameters, temperature and humidity, by comparing them with actual measurements collected by Brazilian weather stations (INMET). The evaluation focused on two Brazilian cities, Recife and Campina Grande. The aim is to prepare an early warning model that will harness data from the OpenWeatherMap API for mosquito prediction.
      Citation: Data
      PubDate: 2022-07-30
      DOI: 10.3390/data7080106
      Issue No: Vol. 7, No. 8 (2022)
       
  • Data, Vol. 7, Pages 107: Mobility and Dissemination of COVID-19 in
           Portugal: Correlations and Estimates from Google’s Mobility Data

    • Authors: Nelson Mileu, Nuno M. Costa, Eduarda M. Costa, André Alves
      First page: 107
      Abstract: The spread of coronavirus disease 2019 (COVID-19) has important links with population mobility. Social interaction is a known determinant of human-to-human transmission of infectious diseases and, in turn, population mobility, as a proxy for interaction, is of paramount importance in analyzing COVID-19 diffusion. Using mobility data from Google’s Community Mobility Reports, this paper captures the association between changes in mobility patterns over time and the corresponding COVID-19 incidence, using a multi-scalar approach applied to mainland Portugal. Results demonstrate a strong relationship between mobility data and COVID-19 incidence, suggesting that more mobility is associated with more COVID-19 cases. The methodological procedures can be summarized as a multiple linear regression with a moving time window. Model validation demonstrates good forecast accuracy, particularly when the cumulative number of cases is considered. On this basis, it is possible to estimate and predict the future evolution of the number of COVID-19 cases using near real-time information on population mobility.
      Citation: Data
      PubDate: 2022-07-31
      DOI: 10.3390/data7080107
      Issue No: Vol. 7, No. 8 (2022)
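The mobility abstract above summarizes its method as a multiple linear regression with a moving time window. A minimal Python sketch of that general technique follows, assuming NumPy, a predictor matrix `X` (for example, hypothetical daily mobility indicators) and a target `y` (for example, case counts). The window length, the predictors and the function name `moving_window_forecast` are illustrative assumptions, not the authors' specification. At each time step the model is refitted by ordinary least squares on the preceding `window` observations only, then used to predict the next value.

```python
import numpy as np


def moving_window_forecast(X, y, window):
    """One-step-ahead predictions from OLS refitted on a sliding window.

    X: (n, k) predictors (a 1-D array is treated as a single predictor).
    y: (n,) target. Entries before index `window` are left as NaN.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    preds = np.full(len(y), np.nan)
    for t in range(window, len(y)):
        # Design matrix for the window: intercept column plus predictors.
        Xw = np.column_stack([np.ones(window), X[t - window:t]])
        coef, *_ = np.linalg.lstsq(Xw, y[t - window:t], rcond=None)
        # Apply the fitted coefficients to the predictors observed at time t.
        preds[t] = np.concatenate(([1.0], X[t])) @ coef
    return preds


# Hypothetical synthetic data: two mobility indicators driving a case series.
rng = np.random.default_rng(0)
mobility = rng.normal(size=(60, 2))
cases = 5 + mobility @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=60)
preds = moving_window_forecast(mobility, cases, window=14)
```

Refitting on a sliding window lets the coefficients drift as the relationship between mobility and incidence changes, at the cost of noisier estimates when the window is short.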
       
  • Data, Vol. 7, Pages 108: Go Wild for a While: A Bibliometric Analysis
           of Two Themes in Tourism Demand Forecasting from 1980 to 2021: Current
           Status and Development

    • Authors: Zhang, Choo, Abdul Aziz, Yee, Ho
      First page: 108
      Abstract: Although the concept of forecasting has emerged in the realm of tourism, studies of this sector have yet to provide a comprehensive overview of the evolution of tourism forecasting visualization. This research presents an analysis of the current state of the art in tourism demand forecasting (TDF) and combined tourism demand forecasting (CTDF) systems. Based on the Web of Science Core Collection database, this study built a framework for bibliometric analysis of these fields across three distinct phases (1980–2021). Furthermore, the VOSviewer analysis software was employed to give a clearer picture of the current status and developments in tourism forecasting research. Descriptive analysis and comprehensive knowledge network mappings using approaches such as co-citation analysis and cooperation networking were employed to identify trending research topics, the most important countries/regions, institutions, publications, and articles, and the most influential researchers. The results demonstrate that scientific output pertaining to TDF exceeds that pertaining to CTDF. However, there has been a substantial, exponential increase in both fields in recent years. In addition, the results indicated that tourism forecasting research has become increasingly diversified, with numerous combined methods presented. Furthermore, the most influential papers and authors were evaluated based on their citations, publications, network position, and relevance. Contemporary themes were also analyzed, and obstacles to the expansion of the literature were identified. This is the first study covering both topics to demonstrate the ways in which bibliometric visualization can assist researchers in gaining perspectives on the tourism forecasting field by effectively communicating key findings, facilitating data exploration, and providing valuable data for future research.
      Citation: Data
      PubDate: 2022-07-31
      DOI: 10.3390/data7080108
      Issue No: Vol. 7, No. 8 (2022)
       
  • Data, Vol. 7, Pages 109: A Large-Scale Dataset of Twitter Chatter about
           Online Learning during the Current COVID-19 Omicron Wave

    • Authors: Nirmalya Thakur
      First page: 109
      Abstract: The COVID-19 Omicron variant, reported to be the most immune-evasive variant of COVID-19, is resulting in a surge of COVID-19 cases globally. This has caused schools, colleges, and universities in different parts of the world to transition to online learning. As a result, social media platforms such as Twitter are seeing an increase in conversations related to online learning in the form of tweets. Mining such tweets to develop a dataset can provide a data resource for different applications and use cases related to the analysis of interest, views, opinions, perspectives, attitudes, and feedback towards online learning during the current surge of COVID-19 cases caused by the Omicron variant. Therefore, this work presents a large-scale, open-access Twitter dataset of conversations about online learning from different parts of the world since the first detected case of the COVID-19 Omicron variant in November 2021. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management. The paper also briefly outlines some potential applications in the fields of Big Data, Data Mining, Natural Language Processing, and related disciplines, with a specific focus on online learning during this Omicron wave, that may be studied, explored, and investigated using this dataset.
      Citation: Data
      PubDate: 2022-08-04
      DOI: 10.3390/data7080109
      Issue No: Vol. 7, No. 8 (2022)
       
  • Data, Vol. 7, Pages 110: Grapevine Plant Image Dataset for Pruning

    • Authors: Kyriakos D. Apostolidis, Theofanis Kalampokas, Theodore P. Pachidis, Vassilis G. Kaburlasos
      First page: 110
      Abstract: Grapevine pruning is conducted during winter, and it is a very important and expensive task for wine producers managing their vineyards. Every year, pruning should remove the previous year’s canes so that new canes can grow and produce grapes. It is a difficult procedure, and it is not yet fully automated, although the research community has made some attempts. According to the literature, grapevine pruning automation has been approached with computer vision and image processing methods. Despite these attempts, the task remains hard for these methods because of challenges such as overlapping canes and complex backgrounds. Additionally, there is no public image dataset for this problem, which makes it difficult for the research community to approach it. Motivated by the above facts, an image dataset is proposed for grapevine cane segmentation for a pruning task. An experimental analysis is also conducted on the proposed dataset, achieving a 67% IoU and a 78% F1 score in grapevine cane semantic segmentation with the U-net model.
      Citation: Data
      PubDate: 2022-08-09
      DOI: 10.3390/data7080110
      Issue No: Vol. 7, No. 8 (2022)
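The grapevine abstract reports segmentation quality as IoU and F1 score. For a single binary mask, both follow from the true-positive (TP), false-positive (FP) and false-negative (FN) pixel counts: IoU = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN). The Python sketch below, assuming NumPy and 0/1 masks of equal shape, shows this per-mask computation. The function name is hypothetical, the abstract does not state how the reported scores were averaged across images, and returning 1.0 when both masks are empty is a convention chosen here.

```python
import numpy as np


def iou_and_f1(pred, target):
    """IoU and F1 (Dice) for one pair of binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    tp = int(np.logical_and(pred, target).sum())   # cane predicted, cane present
    fp = int(np.logical_and(pred, ~target).sum())  # cane predicted, background present
    fn = int(np.logical_and(~pred, target).sum())  # background predicted, cane present
    if tp + fp + fn == 0:  # both masks empty: count as a perfect match
        return 1.0, 1.0
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1


# Toy 2x3 masks (1 = cane pixel): TP = 2, FP = 1, FN = 1.
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
iou, f1 = iou_and_f1(pred, target)  # iou = 0.5, f1 = 2/3
```

For one mask, F1 = 2·IoU / (1 + IoU), but scores averaged over many images or classes need not satisfy this identity exactly.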
       
  • Data, Vol. 7, Pages 111: Tropical Wood Species Recognition: A Dataset of
           Macroscopic Images

    • Authors: Daniel Alejandro Cano Saenz, Carlos Felipe Ordoñez Urbano, Holman Raul Gaitan Mesa, Rubiel Vargas-Cañas
      First page: 111
      Abstract: Forests are of incalculable value due to the ecosystem services they provide to humanity, such as carbon storage, climate regulation and participation in the hydrological cycle. The threat to forests grows as the population increases, along with the activities carried out in them, such as cattle rearing, illegal trafficking, deforestation and harvesting. Moreover, the environmental authorities do not have sufficient capacity to exercise strict control over wood production, due to the vast variety of timber species within countries, the lack of tools to verify timber species in the supply chain and the limited availability of labelled digital data on forest species. This paper presents a set of digital macroscopic images of eleven tropical forest species, which can be used as support at checkpoints and for studies and research based on macroscopic analysis of cross-sectional images of tree species, in fields such as dendrology and forestry, as well as for artificial intelligence algorithms. Images were acquired in wood warehouses with a digital magnifying glass, following a protocol used by the Colombian Ministry of Environment, as well as the US Forest Service and the International Association of Wood Anatomists. The dataset contains more than 8000 images with a resolution of 640 × 480 pixels at 3.9 microns per pixel, covering an area of 2.5 × 1.9 mm in which the anatomical features are exposed. The dataset is highly useful for academics and researchers in the forestry sector, wood anatomists and personnel who work with computational models, as well as for forest surveillance institutions such as the regional autonomous corporations and the Ministry of the Environment.
      Citation: Data
      PubDate: 2022-08-11
      DOI: 10.3390/data7080111
      Issue No: Vol. 7, No. 8 (2022)
       
  • Data, Vol. 7, Pages 112: MultimodalGasData: Multimodal Dataset for Gas
           Detection and Classification

    • Authors: Parag Narkhede, Rahee Walambe, Pulkit Chandel, Shruti Mandaokar, Ketan Kotecha
      First page: 112
      Abstract: The detection of gas leaks is crucial in chemical industries, coal mines, home applications, etc. Early detection and identification of the type of gas are required to avoid harm to human lives and the environment. The MultimodalGasData presented in this paper is a novel collection of simultaneous data samples taken using seven different gas-detecting sensors and a thermal imaging camera. Low-cost sensors are generally less sensitive and less reliable; hence, they are unable to detect gases from a longer distance. To overcome this drawback of relying only on the sensors, a thermal camera that senses temperature changes was also used while collecting the present multimodal dataset. This multimodal dataset has a total of 6400 samples, including 1600 samples per class for smoke, perfume, a mixture of smoke and perfume, and a neutral environment. The dataset can help researchers and system developers develop and train state-of-the-art artificial intelligence models and systems.
      Citation: Data
      PubDate: 2022-08-12
      DOI: 10.3390/data7080112
      Issue No: Vol. 7, No. 8 (2022)
       
  • Data, Vol. 7, Pages 113: Data Warehousing Process Modeling from Classical
           Approaches to New Trends: Main Features and Comparisons

    • Authors: Asma Dhaouadi, Khadija Bousselmi, Mohamed Mohsen Gammoudi, Sébastien Monnet, Slimane Hammoudi
      First page: 113
      Abstract: The extract, transform, and load (ETL) process is at the core of data warehousing architectures. As such, the success of data warehouse (DW) projects depends essentially on the proper modeling of the ETL process. As there is no standard model for the representation and design of this process, several researchers have proposed modeling methods based on different formalisms, such as unified modeling language (UML), ontology, model-driven architecture (MDA), model-driven development (MDD), and graphical flow, which includes business process model and notation (BPMN), colored Petri nets (CPN), Yet Another Workflow Language (YAWL), CommonCube, entity modeling diagram (EMD), and so on. With the emergence of Big Data, despite the multitude of relevant approaches proposed for modeling the ETL process in classical environments, part of the community has been motivated to provide new data warehousing methods that support Big Data specifications. In this paper, we present a summary of relevant works related to the modeling of data warehousing approaches, from classical ETL processes to ELT design approaches. A systematic literature review is conducted, and a detailed set of comparison criteria is defined to allow the reader to better understand the evolution of these processes. Our study paints a complete picture of ETL modeling approaches, from their advent to the era of Big Data, while comparing their main characteristics. This study allows for the identification of the main challenges and issues related to the design of Big Data warehousing systems, mainly involving the lack of a generic design model for data collection, storage, processing, querying, and analysis.
      Citation: Data
      PubDate: 2022-08-12
      DOI: 10.3390/data7080113
      Issue No: Vol. 7, No. 8 (2022)
       
  • Data, Vol. 7, Pages 114: An Inventory of Large-Scale Landslides in Baoji
           City, Shaanxi Province, China

    • Authors: Lei Li, Chong Xu, Zhiqiang Yang, Zhongjian Zhang, Mingsheng Lv
      First page: 114
      Abstract: Landslides are a typical geological hazard that endangers people’s lives and property in the Loess Plateau. The destructiveness of large-scale landslides, in particular, is incalculable; for example, landslides may cause traffic disruptions, river blockages, and house collapses. Thus, it is urgent to compile complete landslide inventories for specific regions. The study area is Baoji City, Shaanxi Province, China. Using multi-temporal high-resolution remote sensing images from Google Earth, we preliminarily catalogued the large-scale (area > 5000 m²) landslides in the study area through visual interpretation. The inventory was subsequently refined and supplemented by comparison with the existing literature and hazard records. We identified 3422 landslides with a total area of 360.7 km² and an average area of 105,400 m² per landslide. The largest landslide had an area of 1.71 km², while the smallest one was 6042 m². In previous studies, we analyzed these data without describing the data sources in detail. We now provide a shared dataset of each landslide in shapefile (shp) format, containing geographic location, boundary information, etc. The dataset is very useful for understanding the distribution characteristics of large-scale landslides in this region. Moreover, it can serve as basic data for the study of paleolandslide reactivation.
      Citation: Data
      PubDate: 2022-08-15
      DOI: 10.3390/data7080114
      Issue No: Vol. 7, No. 8 (2022)
       
  • Data, Vol. 7, Pages 115: Description and Use of Three-Dimensional
           Numerical Phantoms of Cardiac Computed Tomography Images

    • Authors: Miguel Vera, Antonio Bravo, Rubén Medina
      First page: 115
      Abstract: According to the World Health Organization, heart disease is the leading cause of death. Heart diseases can be detected using several imaging modalities, especially cardiac computed tomography (CT), whose images have imperfections associated with noise and certain artifacts. To minimize the impact of these imperfections on the quality of CT images, several researchers have developed digital image processing techniques (DIPT) and evaluated the resulting image quality using several metrics and databases (DBs), both real and simulated. This article describes the processes that made it possible to generate and use six three-dimensional synthetic cardiac DBs, or voxel-based numerical phantoms. An exhaustive analysis of the most relevant features of images of the left ventricle, belonging to a real CT DB of the human heart, was performed. These features are recreated in the synthetic DBs, generating a reference phantom or ground truth free of imperfections (DB1) and five phantoms in which Poisson noise (DB2), stair-step artifact (DB3), streak artifact (DB4), both artifacts (DB5) and all imperfections (DB6) are incorporated. These DBs can be used to determine the performance of DIPT aimed at decreasing the effect of these imperfections on the quality of cardiac images.
      Citation: Data
      PubDate: 2022-08-16
      DOI: 10.3390/data7080115
      Issue No: Vol. 7, No. 8 (2022)
       
  • Data, Vol. 7, Pages 116: Vertical Jump Data from Inertial and Optical
           Motion Tracking Systems

    • Authors: Mateo Rico-Garcia, Juan Botero-Valencia, Ruber Hernández-García
      First page: 116
      Abstract: Motion capture (MOCAP) is a widely used technique to record human, animal, and object movement for various applications such as animation, biomechanical assessment, and control systems. Different systems have been proposed based on diverse technologies, such as visible light cameras, infrared cameras with passive or active markers, inertial systems, or goniometer-based systems. Each system has pros and cons that make it suitable for different scenarios. This paper presents a dataset that combines optical motion capture and inertial systems, capturing a well-known sports movement, the vertical jump. As a reference system, the optical motion capture setup consists of six OptiTrack Flex 3 cameras operating at 100 FPS. In addition, we developed an inertial system consisting of seven custom-made devices based on the MPU-9250 IMU, which includes a three-axis magnetometer, accelerometer and gyroscope, and an embedded Digital Motion Processor (DMP); each IMU is attached to a Teensy 3.2 microcontroller board with an ARM Cortex-M4 processor and operates wirelessly via Bluetooth. The purpose of collecting IMU data with a low-cost, customized system is to enable applications that can be performed with similar hardware and adjusted to different areas. The developed measurement system is flexible, and the acquisition format and enclosure can be customized. The proposed dataset comprises eight jumps recorded from four healthy humans using both systems. Experimental results on the dataset show two usage examples: measuring joint angles and center of mass (COM) position. The proposed dataset is publicly available online and can be used in comparative algorithms, biomechanical studies, skeleton reconstruction, sensor fusion techniques, or machine learning models.
      Citation: Data
      PubDate: 2022-08-17
      DOI: 10.3390/data7080116
      Issue No: Vol. 7, No. 8 (2022)
       
  • Data, Vol. 7, Pages 117: Climate Dataset for South Africa by the
           Agricultural Research Council

    • Authors: Mokhele Edmond Moeletsi, Lindumusa Myeni, Ludwig Christian Kaempffer, Derick Vermaak, Gert de Nysschen, Chrisna Henningse, Irene Nel, Dudley Rowswell
      First page: 117
      Abstract: Long-term, reliable, continuous and real-time weather and climatic data are essential for efficient management and sustainable use of natural resources. This paper describes the weather station network (WSN) of the Agricultural Research Council (ARC) of South Africa, including information on instrumentation, data retrieval and processing protocols, calibration and maintenance protocols, as well as applications of the collected data. The ARC WSN consists of over 600 automatic weather stations distributed across the country to cover a wide range of agro-climatic zones. At each weather station, air temperature, rainfall, relative humidity, solar irradiance, and wind speed and direction are monitored and archived on an hourly basis. The main objective of this WSN is to archive climate information for South Africa and to supply the agricultural community with weather data to support decision-making.
      Citation: Data
      PubDate: 2022-08-17
      DOI: 10.3390/data7080117
      Issue No: Vol. 7, No. 8 (2022)
       
  • Data, Vol. 7, Pages 118: A European Approach to the Establishment of Data
           Spaces

    • Authors: Marco Minghini, Alexander Kotsev, Carlos Granell
      First page: 118
      Abstract: Within a context defined by the rapid increase in the availability of data, combined with the complexity of data sources, infrastructures, technologies and actors involved in data sharing flows, the European Union (EU) is devising approaches that can reap the benefits of data-driven innovation [...]
      Citation: Data
      PubDate: 2022-08-19
      DOI: 10.3390/data7080118
      Issue No: Vol. 7, No. 8 (2022)
       
  • Data, Vol. 7, Pages 83: Instagram-Based Benchmark Dataset for
           Cyberbullying Detection in Arabic Text

    • Authors: ALBayari, Abdallah
      First page: 83
      Abstract: (1) Background: the ability to use social media to communicate without revealing one’s real identity has created an attractive setting for cyberbullying. Several studies have targeted social media to collect datasets with the aim of automatically detecting offensive language. However, the majority of these datasets were in English, not in Arabic. Of the few Arabic datasets that have been collected, none focused on Instagram, despite it being a major social media platform in the Arab world. (2) Methods: we use the official Instagram APIs to collect our dataset. To establish the dataset as a benchmark, we use SPSS (Kappa statistic) to evaluate the inter-annotator agreement (IAA), and we examine and evaluate the performance of various learning models (LR, SVM, RFC, and MNB). (3) Results: in this research, we present the first Instagram Arabic corpus focusing on cyberbullying, annotated with multi-class sub-class categories. The dataset is primarily designed for detecting offensive language in texts. We end up with 200,000 comments, of which 46,898 were annotated by three human annotators. The results show that the SVM classifier outperforms the other classifiers, with an F1 score of 69% for bullying comments and 85% for positive comments.
      Citation: Data
      PubDate: 2022-06-22
      DOI: 10.3390/data7070083
      Issue No: Vol. 7, No. 7 (2022)
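The cyberbullying abstract evaluates inter-annotator agreement with a kappa statistic but does not say which variant. With three annotators, Fleiss' kappa is a common choice, so the plain-Python sketch below assumes it; the function name and the example labels are hypothetical. Each item is a list holding one label per annotator, and kappa compares the observed pairwise agreement with the agreement expected from the overall label frequencies.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for items that are each rated by the same number of annotators."""
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_items = len(ratings)
    counts = [Counter(item) for item in ratings]
    categories = {label for item in ratings for label in item}

    # Observed agreement: share of agreeing annotator pairs per item, averaged.
    p_bar = sum(
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_items

    # Expected agreement from the overall proportion of each label.
    p_e = sum(
        (sum(c[k] for c in counts) / (n_items * n_raters)) ** 2 for k in categories
    )
    if p_e == 1:  # every annotator used one single label: treat as perfect agreement
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical labels from three annotators for four comments.
ratings = [
    ["bullying", "bullying", "bullying"],
    ["positive", "positive", "bullying"],
    ["positive", "positive", "positive"],
    ["neutral", "positive", "neutral"],
]
print(round(fleiss_kappa(ratings), 3))
```

A value of 1 means perfect agreement and values near 0 mean agreement no better than chance; Cohen's kappa would be the usual choice if only two annotators were compared.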
       
  • Data, Vol. 7, Pages 84: Dataset: Fauna of Adult Ground Beetles
           (Coleoptera, Carabidae) of the National Park “Smolny” (Russia)
           

    • Authors: Alexander B. Ruchin, Sergei K. Alekseev, Oleg N. Artaev, Anatoliy A. Khapugin, Evgeniy A. Lobachev, Sergei V. Lukiyanov, Gennadiy B. Semishin
      First page: 84
      Abstract: (1) Background: Protected areas are “hotspots” of biodiversity in many countries. In such areas, ecological systems are preserved in their natural state, which allows animal populations to be protected. In several protected areas, Coleoptera biodiversity is studied as an integral part of the ecological monitoring of the ecosystem state. This study aimed to describe the Carabidae fauna in one of the largest protected areas of European Russia, namely National Park “Smolny”. (2) Methods: The study was conducted in April–September 2008, 2009, and 2017–2021. Beetles were collected using a variety of methods (by hand, light traps, pitfall traps, and others). Seasonal dynamics of beetle abundance were studied in various biotopes. Coordinates were recorded for each observation. (3) Results: The dataset contains 1994 occurrences. In total, 32,464 specimens of Carabidae were studied. The dataset contains information about 131 species of Carabidae beetles. In this study, we did not find two species (Carabus estreicheri and Calathus ambiguus) previously reported in the fauna of National Park “Smolny”. (4) Conclusions: The Carabidae diversity in the National Park “Smolny” is represented by 133 species from 10 subfamilies. Ten species (Carabus cancellatus, Harpalus laevipes, Carabus hortensis, Pterostichus niger, Poecilus versicolor, Pterostichus melanarius, Carabus glabratus, Carabus granulatus, Carabus arvensis baschkiricus, Pterostichus oblongopunctatus) constitute the majority of the Carabidae fauna. Beetle abundance peaks in spring, and the number of ground beetles in biotopes decreases by autumn.
      Citation: Data
      PubDate: 2022-06-23
      DOI: 10.3390/data7070084
      Issue No: Vol. 7, No. 7 (2022)
       
  • Data, Vol. 7, Pages 85: Collection and Processing of Behavioural Data of
           the Olive Fruit Fly, Bactrocera oleae, When Exposed to Olive Twigs Treated
           with Different Commercial Products

    • Authors: Elissa Daher, Elena Chierici, Nicola Cinosi, Gabriele Rondoni, Franco Famiani, Eric Conti
      First page: 85
      Abstract: The need to develop sustainable control methods for herbivorous insects means that new molecules are being proposed on the market. Among the different effects the new products may have on the target species, the alteration of insect oviposition behaviour might be considered. To this end, simple parallel behavioural assays can be conducted in arenas. Freely available software can be used to track observed events, but it often needs intensive customization to the specific experimental design. Hence, integrating such software with, e.g., the R environment can make protocol development for data collection and analysis much more effective. Here we present a dataset and protocol for processing data on the oviposition behaviour of the olive fruit fly, Bactrocera oleae, when exposed to olive twigs treated with different commercial products. Treatments were rock powder, propolis, a mixture of rock powder and propolis, copper oxychloride, copper sulphate, and water as the experimental control. JWatcher was used to simultaneously collect data from 12 arena assays, and ad hoc R code was used to process the raw data for analysis. The procedure described here is novel and represents a valuable and transferable protocol for analysing observational events in B. oleae, as well as in other biological systems.
      Citation: Data
      PubDate: 2022-06-24
      DOI: 10.3390/data7070085
      Issue No: Vol. 7, No. 7 (2022)
       
  • Data, Vol. 7, Pages 86: Event Forecasting for Thailand’s Car Sales
           during the COVID-19 Pandemic

    • Authors: Chartchai Leenawong, Thanrada Chaikajonwat
      First page: 86
      Abstract: The COVID-19 pandemic that started in 2020 has affected Thailand’s automotive industry, among many others. During the several stages of the pandemic, car sales figures fluctuated and were hence difficult to fit and forecast. Because of the trend present in the sales data, Holt’s forecasting method appears a reasonable choice. However, the pandemic, or more generally the “event”, requires a more nuanced method to handle this extra event component. This research proposes a forecasting method based on Holt’s method to better suit time-series data affected by large-scale events. In addition, when combined with seasonality adjustment, three modified Holt-based methods are proposed and implemented on Thailand’s monthly car sales covering the pandemic period. Different flags are carefully assigned to each sales data point to represent different stages of the pandemic. The results show that Holt’s method with seasonality and events yields the lowest MAPE of 8.64%, followed by 9.47% for Holt’s method with events. Compared to the 16.27% MAPE of the standard Holt’s method, the proposed methods proved highly effective for time-series data containing the event component.
      Citation: Data
      PubDate: 2022-06-25
      DOI: 10.3390/data7070086
      Issue No: Vol. 7, No. 7 (2022)
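The car-sales abstract builds on Holt's linear trend method and compares models by MAPE. The Python sketch below shows only the standard Holt recursion (level and trend smoothing with parameters `alpha` and `beta`) producing one-step-ahead forecasts, plus MAPE. It does not implement the authors' event flags or seasonality adjustment; the function names, the sales figures and the initialization (first value as level, first difference as trend) are assumptions made for illustration.

```python
def holt_one_step_forecasts(y, alpha, beta):
    """One-step-ahead forecasts for y[1:], using Holt's linear trend method.

    Initialization (an assumption): level = y[0], trend = y[1] - y[0].
    """
    if len(y) < 2:
        raise ValueError("need at least two observations")
    level, trend = float(y[0]), float(y[1] - y[0])
    forecasts = []
    for t in range(1, len(y)):
        forecasts.append(level + trend)  # forecast of y[t] made at time t-1
        prev_level = level
        # Smooth the level towards the new observation...
        level = alpha * y[t] + (1 - alpha) * (level + trend)
        # ...and the trend towards the latest change in level.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return forecasts


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    pairs = list(zip(actual, predicted))
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Hypothetical monthly sales figures, for illustration only.
sales = [100, 104, 109, 113, 95, 80, 92, 101, 108]
fc = holt_one_step_forecasts(sales, alpha=0.5, beta=0.3)
print(f"MAPE: {mape(sales[1:], fc):.2f}%")
```

With this initialization the first forecast equals y[1] exactly, so in practice one would score only the later forecasts or fit the initial values separately.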
       
  • Data, Vol. 7, Pages 87: Context Sensitive Verb Similarity Dataset for
           Legal Information Extraction

    • Authors: Gathika Ratnayaka, Nisansa de Silva, Amal Shehan Perera, Gayan Kavirathne, Thirasara Ariyarathna, Anjana Wijesinghe
      First page: 87
      Abstract: Existing literature demonstrates that verbs are pivotal in legal information extraction tasks due to their semantic and argumentative properties. However, enabling computers to interpret the meaning of a verb and its semantic properties in relation to a given context is challenging, mainly due to the polysemic and domain-specific behaviours of verbs. Therefore, developing mechanisms to identify the behaviours of verbs and to evaluate how artificial models detect the domain-specific and polysemic behaviours of verbs are tasks of significant importance. In this regard, a comprehensive dataset that can be used as an evaluation resource, as well as a training dataset, is a major requirement. In this paper, we introduce LeCoVe, a verb similarity dataset intended to facilitate the identification of verbs with similar meanings in a legal domain-specific context. Using the dataset, we evaluated both domain-specific and domain-generic embedding models, which were developed using state-of-the-art word representation and language modelling techniques. As part of the experiments carried out using the dataset, Sense2Vec and BERT models were trained on a corpus of legal opinion texts in order to capture domain-specific behaviours. In addition to LeCoVe, we demonstrate that a neural network model, developed by combining semantic, syntactic, and contextual features obtained from the outputs of embedding models, can perform comparatively well, even in a low-resource scenario.
      Citation: Data
      PubDate: 2022-06-28
      DOI: 10.3390/data7070087
      Issue No: Vol. 7, No. 7 (2022)
       
  • Data, Vol. 7, Pages 88: Daily Precipitation Data for the Mexico City
           Metropolitan Area from 1930 to 2015

    • Authors: Erika D. López-Espinoza, Oscar A. Fuentes-Mariles, Dulce R. Herrera-Moro, Octavio Gómez-Ramos, David A. Novelo-Casanova, Jorge Zavala-Hidalgo
      First page: 88
      Abstract: The Metropolitan Zone of Mexico City, together with its associated basin, includes the territory of Mexico City and some municipalities of the State of Mexico and the state of Hidalgo. It is also the most densely populated area in Mexico. The region is influenced by mid-latitude and tropical weather systems and is vulnerable to extreme hydrometeorological events. In this context, we developed a dataset from the records of 136 geolocated sites that includes daily precipitation data from the CLImate COMputing (CLICOM) project and the Mexico City Water System. The data span the period from 1930 to 2015 for the rainy months (June–October) from stations with records of 20 or more years. At each recording site, automatic and manual data quality control were performed to verify the consistency of the daily precipitation data. We believe that this high-density precipitation dataset will be useful for climate, trend, and extreme-event analysis. Additionally, the data can be used to validate simulations of numerical atmospheric models. The dataset is public, and it was previously used in other research to determine areas susceptible to flooding due to heavy rain events and to develop a web mapping application of daily precipitation data.
      Citation: Data
      PubDate: 2022-06-29
      DOI: 10.3390/data7070088
      Issue No: Vol. 7, No. 7 (2022)
       
  • Data, Vol. 7, Pages 89: Goat Kidding Dataset

    • Authors: Gonçalves, Marques, Belo, Monteiro, Braz
      First page: 89
      Abstract: Detecting kidding in production animals is of the utmost importance, given how often problems arise during the process and the fact that timely human help can safeguard the well-being of the mother and kid. Continuous human monitoring of the process is expensive, given the uncertainty about when kidding will occur, so an autonomous detection mechanism would make it possible to alert the person responsible, who could then intervene at the opportune moment. The present dataset consists of sensor data from 16 pregnant and two non-pregnant Charnequeira goats over a four-week period, the kidding period. The data include neck-to-floor height, measured by ultrasound, and accelerometry data from an accelerometer in the monitoring collar. Data were sampled continuously every 10 s throughout the experiment. The goats were monitored both in the goat shelter (day and night) and during the grazing period in the pasture. The births were also recorded, including the time at which they took place, details of how they took place, and the number of offspring, along with additional notes.
      Citation: Data
      PubDate: 2022-06-29
      DOI: 10.3390/data7070089
      Issue No: Vol. 7, No. 7 (2022)
       
  • Data, Vol. 7, Pages 90: TED-S: Twitter Event Data in Sports and Politics
           with Aggregated Sentiments

    • Authors: Hansi Hettiarachchi, Doaa Al-Turkey, Mariam Adedoyin-Olowe, Jagdev Bhogal, Mohamed Medhat Gaber
      First page: 90
      Abstract: Even though social media contain rich information on events and public opinions, manually filtering this information is impractical given the volume and dynamic nature of the data being generated. Thus, automated extraction mechanisms are invaluable to the community. Building and evaluating such systems requires real data with ground truth labels. Still, to the best of our knowledge, no available social media dataset covers continuous periods with both event and sentiment labels; existing datasets carry either event labels or sentiment labels. Datasets without time gaps are huge because of the high rate of data generation and require extensive effort for manual labelling. Previous research targeting such datasets has proposed different approaches, ranging from unsupervised to supervised. However, because of their generic nature, these approaches mostly fail to capture event-specific sentiment expressions, making them inappropriate for labelling event sentiments. To fill this gap, we propose in this paper a novel data annotation approach involving several neural networks. Our approach outperforms commonly used sentiment annotation models such as VADER and TextBlob. It also generates probability values for all sentiment categories in addition to a single category per tweet, supporting aggregated sentiment analyses. Using this approach, we annotate and release a dataset named TED-S, covering two diverse domains, sports and politics. TED-S contains complete subsets of Twitter data streams with both sub-event and sentiment labels, supporting event sentiment-based research.
      Citation: Data
      PubDate: 2022-06-30
      DOI: 10.3390/data7070090
      Issue No: Vol. 7, No. 7 (2022)
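The TED-S abstract notes that the annotation approach outputs a probability for every sentiment category, not just a single label, which is what makes aggregated sentiment analysis possible. As a minimal sketch of one way such probabilities might be aggregated, the Python below averages per-tweet probability vectors within each group (for example, a sub-event). The three-category label set, the grouping key, and the `aggregate_sentiment` helper are hypothetical assumptions for illustration; they are not the TED-S schema or the authors' method.

```python
from collections import defaultdict

# Hypothetical label set; TED-S's actual categories may differ.
CATEGORIES = ("negative", "neutral", "positive")


def aggregate_sentiment(tweets):
    """Average per-tweet sentiment probability vectors within each group.

    `tweets` is an iterable of (group, probs) pairs, where `probs` holds one
    probability per entry in CATEGORIES. Returns {group: {category: mean_prob}}.
    """
    sums = defaultdict(lambda: [0.0] * len(CATEGORIES))
    counts = defaultdict(int)
    for group, probs in tweets:
        if len(probs) != len(CATEGORIES):
            raise ValueError("probability vector length does not match CATEGORIES")
        totals = sums[group]
        for i, p in enumerate(probs):
            totals[i] += p
        counts[group] += 1
    # Divide each accumulated probability by the number of tweets in its group.
    return {
        group: {cat: sums[group][i] / counts[group] for i, cat in enumerate(CATEGORIES)}
        for group in sums
    }


if __name__ == "__main__":
    # Hypothetical per-tweet probabilities grouped by sub-event.
    sample = [
        ("goal", (0.1, 0.2, 0.7)),
        ("goal", (0.3, 0.2, 0.5)),
        ("red_card", (0.6, 0.3, 0.1)),
    ]
    for group, means in aggregate_sentiment(sample).items():
        print(group, {cat: round(p, 3) for cat, p in means.items()})
```

Averaging probabilities rather than counting the single most likely label keeps information about how confident each per-tweet prediction was: a tweet scored (0.4, 0.3, 0.3) contributes differently from one scored (0.9, 0.05, 0.05), even though both would be labelled negative.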
       
  • Data, Vol. 7, Pages 91: Unified, Labeled, and Semi-Structured Database of
           Pre-Processed Mexican Laws

    • Authors: Bella Martinez-Seis, Obdulia Pichardo-Lagunas, Harlan Koff, Miguel Equihua, Octavio Perez-Maqueo, Arturo Hernández-Huerta
      First page: 91
      Abstract: This paper presents a corpus of pre-processed Mexican laws for computational tasks. The main contributions are the proposed JSON structure and the methodology used to build the semi-structured corpus with the selected algorithms. Law PDF documents were transformed into plain text, unified through a deconstruction of the law–document structure, and labeled with natural language processing techniques considering part of speech (PoS); entity extraction was also performed. The corpus includes the Mexican constitution and the Mexican laws that were collected from the official site in PDF format repealed before 14 October 2021. The collection has 305 documents: the Mexican constitution, 289 laws, 8 federal codes, 3 regulations, 2 statutes, 1 decree, and 1 ordinance. The semi-structured database transforms the set of laws from PDF format into a digital representation in order to facilitate computational analysis. The documents were migrated to JSON files to represent internal hierarchical relations. In addition, basic natural language processing techniques were applied to the laws to identify parts of speech and named entities. The presented dataset is mainly useful for text analysis and data science. It could be used for various legislative analysis tasks, including comprehension, interpretation, translation, classification, accessibility, coherence, and searches. Finally, we present some statistics on the identified entities and an example of the usefulness of the corpus for environmental laws.
      Citation: Data
      PubDate: 2022-07-06
      DOI: 10.3390/data7070091
      Issue No: Vol. 7, No. 7 (2022)
       
  • Data, Vol. 7, Pages 92: A Database of Topo-Bathy Cross-Shore Profiles and
           Characteristics for U.S. Atlantic and Gulf of Mexico Sandy Coastlines

    • Authors: Rangley C. Mickey, Davina L. Passeri
      First page: 92
      Abstract: A database of seamless topographic and bathymetric cross-shore profiles, along with metrics of the associated morphological characteristics, was developed for U.S. Atlantic and Gulf of Mexico open-ocean sandy coastlines, based on the latest available lidar data (2011–2020) and bathymetry from the Continuously Updated Digital Elevation Model. Cross-shore resolution ranges from 2.5 m for topographic and nearshore portions to 10 m for offshore portions. Topographic morphological characteristics include foredune crest elevation, foredune toe elevation, foredune width, foredune volume, foredune relative height, beach width, beach volume, beach slope, and nearshore slope. This database was developed to serve as input for current and future morphological modeling studies aimed at providing real-time estimates of the magnitude of coastal change resulting from imminent tropical storm and hurricane landfall. Beyond this need for model inputs, the database of cross-shore profiles and characteristic metrics could serve as a tool for coastal scientists to visualize and analyze local, regional, and national variations in coastal morphology for various types of studies and projects related to Atlantic and Gulf of Mexico sandy coastline environments.
      Citation: Data
      PubDate: 2022-07-06
      DOI: 10.3390/data7070092
      Issue No: Vol. 7, No. 7 (2022)
       
  • Data, Vol. 7, Pages 93: The Role of Human Knowledge in Explainable AI

    • Authors: Andrea Tocchetti, Marco Brambilla
      First page: 93
      Abstract: As the performance and complexity of machine learning models have grown significantly in recent years, there has been an increasing need to develop methodologies to describe their behaviour. This need has mainly arisen from the widespread use of black-box models, i.e., high-performing models whose internal logic is challenging to describe and understand. Therefore, the machine learning and AI field is facing a new challenge: making models more explainable through appropriate techniques. The final goal of an explainability method is to faithfully describe the behaviour of a (black-box) model to users so that they can better understand its logic, thus increasing their trust in and acceptance of the system. Unfortunately, state-of-the-art explainability approaches may not be enough to guarantee that explanations are fully understandable from a human perspective. For this reason, human-in-the-loop methods have been widely employed to enhance and/or evaluate explanations of machine learning models. These approaches focus either on collecting human knowledge that AI systems can then employ or on involving humans to achieve their objectives (e.g., evaluating or improving the system). This article presents a literature overview on collecting and employing human knowledge to improve and evaluate the understandability of machine learning models through human-in-the-loop approaches. Furthermore, it discusses the challenges, state of the art, and future trends in explainability.
      Citation: Data
      PubDate: 2022-07-06
      DOI: 10.3390/data7070093
      Issue No: Vol. 7, No. 7 (2022)
       
  • Data, Vol. 7, Pages 94: A Systematic Review of Deep Knowledge Graph-Based
           Recommender Systems, with Focus on Explainable Embeddings

    • Authors: Ronky Francis Doh, Conghua Zhou, John Kingsley Arthur, Isaac Tawiah, Benjamin Doh
      First page: 94
      Abstract: Recommender systems (RS) have been developed to make personalized suggestions and enrich users’ preferences in various online applications to address the information explosion problem. However, traditional recommender systems act as black boxes, offering the user no insight into the system logic or the reasons for recommendations. Recently, generating explainable recommendations with deep knowledge graphs (DKG) has attracted significant attention. DKG is a subset of explainable artificial intelligence (XAI) that uses the strengths of deep learning (DL) algorithms to learn and provide high-quality predictions and to complement the weaknesses of knowledge graphs (KGs) in the explainability of recommendations. DKG-based models can provide more meaningful, insightful, and trustworthy justifications for recommended items and alleviate the information explosion problem. Although several studies have been carried out on RS, only a few papers have been published on DKG-based methodologies, and this new research direction has not yet been sufficiently reviewed. To fill this literature gap, this paper uses a systematic literature review framework to survey papers published from 2018 to 2022 in the landscape of DKG and XAI. We analyze how the methods proposed in these papers extract essential information from graph-based representations to improve the accuracy, explainability, and reliability of recommendations. Based on the knowledge-graph-related information they leverage and on how the knowledge-graph or path embeddings are learned and integrated with the DL methods, we select and classify these works into four main categories: Two-stage explainable learning methods, Joint-stage explainable learning methods, Path-embedding explainable learning methods, and Propagation explainable learning methods. We further summarize these works according to the characteristics of the approaches and the recommendation scenarios to make the literature easier to navigate. Finally, we discuss some open challenges left for future research in this vibrant field.
      Citation: Data
      PubDate: 2022-07-12
      DOI: 10.3390/data7070094
      Issue No: Vol. 7, No. 7 (2022)
       
  • Data, Vol. 7, Pages 95: Annotations of Lung Abnormalities in the Shenzhen
           Chest X-ray Dataset for Computer-Aided Screening of Pulmonary Diseases

    • Authors: Feng Yang, Pu Xuan Lu, Min Deng, Yì Xiáng J. Wáng, Sivaramakrishnan Rajaraman, Zhiyun Xue, Les R. Folio, Sameer K. Antani, Stefan Jaeger
      First page: 95
      Abstract: Developments in deep learning techniques have led to significant advances in automated abnormality detection in radiological images and paved the way for their potential use in computer-aided diagnosis (CAD) systems. However, the development of CAD systems for pulmonary tuberculosis (TB) diagnosis is hampered by the lack of training data of good visual and diagnostic quality, of sufficient size and variety, and, where relevant, with fine-region annotations. This study presents a collection of annotations/segmentations of pulmonary radiological manifestations that are consistent with TB in the publicly available and widely used Shenzhen chest X-ray (CXR) dataset, made available by the U.S. National Library of Medicine and obtained via a research collaboration with No. 3 People’s Hospital, Shenzhen, China. The goal of releasing these annotations is to advance the state of the art in image segmentation toward improving the fine-grained segmentation of TB-consistent findings in digital chest X-ray images. The annotation collection comprises the following: (1) annotation files in JavaScript Object Notation (JSON) format that indicate the locations and shapes of 19 lung pattern abnormalities for 336 TB patients; (2) mask files saved in PNG format for each abnormality per TB patient; and (3) a comma-separated values (CSV) file that summarizes lung abnormality types and numbers per TB patient. To the best of our knowledge, this is the first collection of pixel-level annotations of TB-consistent findings in CXRs.
      Citation: Data
      PubDate: 2022-07-13
      DOI: 10.3390/data7070095
      Issue No: Vol. 7, No. 7 (2022)
       
  • Data, Vol. 7, Pages 96: SEN2VENµS, a Dataset for the Training of
           Sentinel-2 Super-Resolution Algorithms

    • Authors: Julien Michel, Juan Vinasco-Salinas, Jordi Inglada, Olivier Hagolle
      First page: 96
      Abstract: Boosted by the progress in deep learning, Single Image Super-Resolution (SISR) has gained a lot of interest in the remote sensing community, which sees it as an opportunity to compensate for the limited spatial resolution of satellites relative to end users’ needs. This is especially true for Sentinel-2 because of its unique combination of resolution, revisit time, global coverage, and free and open data policy. While there has been a great deal of work on network architectures in recent years, deep-learning-based SISR in remote sensing is still limited by the availability of the large training sets it requires. The lack of publicly available large datasets with the required variability in terms of landscapes and seasons pushes researchers to simulate their own datasets by means of downsampling. This may impair the applicability of the trained model to real-world data at the target input resolution. This paper presents SEN2VENµS, an open-data licensed dataset composed of 10 m and 20 m cloud-free surface reflectance patches from Sentinel-2, with their reference spatially registered surface reflectance patches at 5 m resolution acquired on the same day by the VENµS satellite. This dataset covers 29 locations on Earth with a total of 132,955 patches of 256 × 256 pixels at 5 m resolution and can be used for the training and comparison of super-resolution algorithms to bring the spatial resolution of 8 of the Sentinel-2 bands up to 5 m.
      Citation: Data
      PubDate: 2022-07-13
      DOI: 10.3390/data7070096
      Issue No: Vol. 7, No. 7 (2022)
       
  • Data, Vol. 7, Pages 97: Dataset: Mobility Patterns of a Coastal Area Using
           Traffic Classification Radars

    • Authors: Joaquim Ferreira, Rui Aguiar, José A. Fonseca, João Almeida, João Barraca, Diogo Gomes, Rafael Oliveira, João Rufino, Fernando Braz, Pedro Gonçalves
      First page: 97
      Abstract: Monitoring road traffic is extremely important because it makes it possible to study the behavior of road users and road design and planning problems, and because it can be used to predict future traffic. Especially on highways that connect beaches and larger urban areas, traffic is characterized by peaks that are highly dependent on weather conditions and rest periods. This paper describes a dataset of mobility patterns of a coastal area in the Aveiro region, Portugal, fully covered with traffic classification radars, over a two-year period. The sensing infrastructure was deployed in the scope of the PASMO project, an open living lab for co-operative intelligent transportation systems. The data gathered include the speed of the detected objects, their position, and their type (heavy vehicle, light vehicle, two-wheeler, and pedestrian). The dataset includes 74,305 records, corresponding to the aggregation of road information at 10 min intervals. A brief analysis of the dataset shows the highly dynamic nature of traffic during the two-year period. In addition, meteorological records from nearby stations and daily records of COVID-19 infections make it possible to cross-reference information and study the influence of weather conditions and infections on traffic behavior.
      Citation: Data
      PubDate: 2022-07-13
      DOI: 10.3390/data7070097
      Issue No: Vol. 7, No. 7 (2022)
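The PASMO dataset is described as radar detections (speed, position, and object type) aggregated at 10 min intervals. The sketch below shows, under stated assumptions, how raw detections could be bucketed into such intervals with the Python standard library. The `(timestamp, object_type, speed_kmh)` record layout, the helper names `bucket_start` and `aggregate`, and the choice of count and mean speed as the aggregated quantities are hypothetical; the abstract does not specify the dataset's actual schema or aggregation rules.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 10-minute interval."""
    return ts - timedelta(minutes=ts.minute % 10, seconds=ts.second, microseconds=ts.microsecond)


def aggregate(detections):
    """Aggregate hypothetical (timestamp, object_type, speed_kmh) detections.

    Returns {(interval_start, object_type): (count, mean_speed_kmh)}.
    """
    counts = defaultdict(int)
    speed_sums = defaultdict(float)
    for ts, obj_type, speed in detections:
        key = (bucket_start(ts), obj_type)
        counts[key] += 1
        speed_sums[key] += speed
    return {key: (counts[key], speed_sums[key] / counts[key]) for key in counts}


if __name__ == "__main__":
    # Hypothetical detections; not taken from the PASMO data.
    sample = [
        (datetime(2021, 7, 1, 8, 3, 12), "light", 62.0),
        (datetime(2021, 7, 1, 8, 7, 45), "light", 58.0),
        (datetime(2021, 7, 1, 8, 12, 0), "heavy", 48.0),
    ]
    for (start, obj_type), (n, mean_speed) in sorted(aggregate(sample).items()):
        print(start.isoformat(), obj_type, n, mean_speed)
```

Flooring each timestamp to the start of its interval, rather than rounding it, keeps every detection in the interval in which it actually occurred.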
       
  • Data, Vol. 7, Pages 98: SBGTool v2.0: An Empirical Study on a
           Similarity-Based Grouping Tool for Students’ Learning Outcomes

    • Authors: Zeynab (Artemis) Mohseni, Rafael M. Martins, Italo Masiello
      First page: 98
      Abstract: Visual learning analytics (VLA) tools and technologies enable the meaningful exchange of information between educational data and teachers. This allows teachers to create meaningful groups of students based on possible collaboration and productive discussions. VLA tools also allow a better understanding of students’ educational demands. Finding similar samples in huge educational datasets, however, requires effective similarity measures that represent the teacher’s purpose. In this study, we conducted a user study and improved our web-based similarity-based grouping VLA tool (SBGTool) to help teachers categorize students into groups based on similar learning outcomes and activities. SBGTool v2.0 differs from SBGTool in its design changes made in response to teacher suggestions, the addition of sorting options to the dashboard table, the addition of a dropdown component to group the students into classrooms, and improvements to some visualizations. To accommodate color blindness, we also considered a number of color palettes. Using SBGTool v2.0, teachers can compare the outcomes of individual students within a classroom, determine which subjects are the most and least difficult over a week or an academic year, identify the numbers of correct and incorrect responses for the most difficult and easiest subjects, categorize students into groups based on their learning outcomes, discover the week with the most interactions to examine students’ engagement, and find the relationship between students’ activity and study success. To illustrate the tool’s efficacy, we used 10,000 random samples from the EdNet dataset, a large-scale hierarchical educational dataset consisting of student–system interactions from multiple platforms at the university level, collected over a two-year period. Finally, we present the outcomes of the user study that evaluated the tool’s effectiveness. The results revealed that even with limited training, the participants were able to complete the required analysis tasks. Additionally, the participants’ feedback showed that SBGTool v2.0 provided a good level of support for the given tasks and has the potential to help teachers enhance collaborative learning in their classrooms.
      Citation: Data
      PubDate: 2022-07-18
      DOI: 10.3390/data7070098
      Issue No: Vol. 7, No. 7 (2022)
       
  • Data, Vol. 7, Pages 99: A Cross-Sectional Study on Mental Health of School
           Students during the COVID-19 Pandemic in India

    • Authors: Deb, Kar, Deb, Biswas, Dar, Mukherjee
      First page: 99
      Abstract: The broad objective of the present study is to assess the levels of anxiety and depression of school students during the COVID-19 lockdown phase and their association with students’ background, stress, concerns, and social support. To this end, the study follows a novel two-stage approach. In the first phase, an empirical survey based on multivariate statistical analysis was carried out, in which a group of 273 school students participated voluntarily. In the second phase, a novel Picture Fuzzy FFA (PF-FFA) method was applied to understand the dynamics of facilitating and prohibiting factors for three categories of focus groups (FG), formed on the basis of attendance in online classes. Findings revealed a significant impact of anxiety and depression on mental health. Further, PF-FFA examined the impact of the driving forces that steered children to attend class, as contrasted with the impact of the restricting forces.
      Citation: Data
      PubDate: 2022-07-18
      DOI: 10.3390/data7070099
      Issue No: Vol. 7, No. 7 (2022)
       
  • Data, Vol. 7, Pages 100: Measured Indoor Environmental Data in a
           Retrofitted Multiapartment Building to Assess Energy Flexibility and
           Thermal Safety during Winter Power Outages

    • Authors: Silvia Erba, Alessandra Barbieri
      First page: 100
      Abstract: The article describes detailed measurements of indoor environmental parameters in a multiapartment housing block located in Milan, Italy, which has recently undergone a deep energy retrofit and is used as a thermal battery during the winter season. Two datasets are provided: one refers to a series of experimental tests conducted by the authors in an unoccupied flat, in which the thermal capacity of the building mass is exploited to act as an energy storage. The dataset reports, with a time step of 10 min, measurements of air temperature, globe temperature and surface temperatures in the analyzed room and data characterizing the adjacent spaces and the outdoor conditions. The second set of data refers to the air temperature monitoring carried out continuously in all the apartments of the apartment block, and hence also during two unplanned heating power outages. The analyzed data show the role of deep renovations in extending the time over which a building can remain in the thermal comfort range after an energy interruption and thus highlight the potential role of retrofitted buildings in delivering energy flexibility services to related stakeholders, such as the occupants, the building manager, the grid operator, and others. Furthermore, the dataset can be used to calibrate an energy simulation model to investigate different demand-side flexibility strategies and evaluate thermal safety under extreme weather events.
      Citation: Data
      PubDate: 2022-07-19
      DOI: 10.3390/data7070100
      Issue No: Vol. 7, No. 7 (2022)
       
  • Data, Vol. 7, Pages 101: First Draft Genome Assembly of Tropical Bed Bug,
           Cimex hemipterus (F.)

    • Authors: Li Lim, Abdul Hafiz Ab Majid
      First page: 101
      Abstract: Cimex hemipterus, a blood-feeding ectoparasite commonly found in tropical regions, is a notorious household pest. The draft genome assembly of C. hemipterus is presented in this study, generated using SPAdes software with Illumina short reads. The obtained genome size was 388.66 Mb with a contig N50 of 3503 bp. BUSCO assessment indicated that 96.71% of the expected Insecta lineage genes were complete in the genome assembly. Annotation of the C. hemipterus genome assembly identified 2.88% repetitive sequences and 17,254 protein-coding genes. Functional annotation showed that most gene families are involved in cellular processes and signaling. This first C. hemipterus genome will be helpful in further understanding bed bug genetics and evolution, and the annotated genome may also help in devising new strategies for bed bug management.
      Citation: Data
      PubDate: 2022-07-21
      DOI: 10.3390/data7070101
      Issue No: Vol. 7, No. 7 (2022)
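The bed bug assembly above is summarized by its contig N50 (3503 bp). N50 is the largest length L such that contigs of length L or longer together contain at least half of the assembled bases. As a minimal sketch of that definition, the Python below computes N50 from a list of contig lengths; the `n50` helper and the toy lengths in the example are hypothetical and are not taken from the C. hemipterus assembly.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly: the largest length L such that
    contigs of length >= L together cover at least half the total bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases until
    # at least half of the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    toy = [8000, 5000, 3500, 2000, 1000, 500]  # hypothetical contig lengths (bp)
    print(n50(toy))
```

In the toy example the total is 20,000 bp; the two longest contigs (8000 + 5000 bp) are the first to reach half of it, so the N50 is 5000 bp. Because N50 depends only on the length distribution, it describes contiguity rather than completeness, which is assessed separately (for example with BUSCO, as in the abstract).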
       
  • Data, Vol. 7, Pages 74: Handling Dataset with Geophysical and Geological
           Variables on the Bolivian Andes by the GMT Scripts

    • Authors: Polina Lemenkova
      First page: 74
      Abstract: In this paper, an integrated mapping of georeferenced data is presented using the QGIS and GMT scripting tool set. The study area encompasses the Bolivian Andes, South America, notable for complex geophysical and geological parameters and high seismicity. Data integration was performed for a detailed analysis of the geophysical and geological setting. The data included raster and vector datasets obtained from open sources: IRIS seismic data (2015 to 2021), geophysical data from satellite-derived gravity grids based on CryoSat, GEBCO topographic data, geoid undulation data from EGM-2008, and georeferenced geological vector data from the USGS. The data processing techniques included quantitative and qualitative evaluation of the seismicity and geophysical setting in Bolivia. The result is a series of thematic maps of the Bolivian Andes. Based on the data analysis, the western region was identified as the most seismically endangered area in Bolivia, with a high risk of earthquake hazards in the Cordillera Occidental, followed by the Altiplano and the Cordillera Real. Earthquake magnitudes here range from 1.8 to 7.6. The data analysis shows a tight correlation between gravity, geophysics, and topography in the Bolivian Andes. The cartographic scripts used for processing data in GMT are openly available in the author’s public GitHub repository via the provided link. Scripting cartographic techniques for geophysical and topographic data processing, combined with GIS spatial evaluation of the geological data, supported automated mapping, which is applicable to risk assessment and geological hazard mapping of the Bolivian Andes, South America.
      Citation: Data
      PubDate: 2022-06-01
      DOI: 10.3390/data7060074
      Issue No: Vol. 7, No. 6 (2022)
       
  • Data, Vol. 7, Pages 75: EndoNuke: Nuclei Detection Dataset for Estrogen
           and Progesterone Stained IHC Endometrium Scans

    • Authors: Anton Naumov, Egor Ushakov, Andrey Ivanov, Konstantin Midiber, Tatyana Khovanskaya, Alexandra Konyukova, Polina Vishnyakova, Sergei Nora, Liudmila Mikhaleva, Timur Fatkhudinov, Evgeny Karpulevich
      First page: 75
      Abstract: We present EndoNuke, an open dataset consisting of tiles from endometrium immunohistochemistry slides with the nuclei annotated as keypoints. Several experts with varying levels of experience annotated the dataset. Apart from gathering the data and creating the annotations, we performed an agreement study and analyzed the distribution of nuclei staining intensity.
      Citation: Data
      PubDate: 2022-06-01
      DOI: 10.3390/data7060075
      Issue No: Vol. 7, No. 6 (2022)
       
  • Data, Vol. 7, Pages 76: The Complete Mitochondrial Genome of a Neglected
           Breed, the Peruvian Creole Cattle (Bos taurus), and Its Phylogenetic
           Analysis

    • Authors: Carlos I. Arbizu, Rubén D. Ferro-Mauricio, Julio C. Chávez-Galarza, Héctor V. Vásquez, Jorge L. Maicelo, Carlos Poemape, Jhony Gonzales, Carlos Quilcate, Flor-Anita Corredor
      First page: 76
      Abstract: Cattle spread throughout the American continent during the colonization period, giving rise to creole breeds that adapted to a wide range of climate conditions. The population of creole cattle in Peru is decreasing, mainly due to the introduction of more productive breeds in recent years. During the last 15 years, there has been significant progress in cattle genomics. However, little is known about the genetics of the Peruvian creole cattle (PCC) despite its importance for (i) improving productivity in the Andean region, (ii) agricultural labor, and (iii) cultural traditions. In addition, the origin and phylogenetic relationships of the PCC are still unclear. To promote the conservation of the PCC, we sequenced, for the first time, the mitochondrial genome of a creole bull from the highlands of Arequipa, which also possessed exceptional fighting skills and was employed for agricultural tasks. The total mitochondrial genome sequence is 16,339 bp in length with a base composition of 31.43% A, 28.64% T, 26.81% C, and 13.12% G. It contains 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, and a control region. Among the 37 genes, 28 are positioned on the H-strand and 9 on the L-strand. The most frequently used codons were CUA (leucine), AUA (isoleucine), AUU (isoleucine), AUC (isoleucine), and ACA (threonine). Maximum likelihood reconstruction using complete mitochondrial genome sequences showed that the PCC is related to native African breeds. The annotated mitochondrial genome of PCC will serve as an important genetic dataset for further breeding work and conservation strategies.
      Citation: Data
      PubDate: 2022-06-06
      DOI: 10.3390/data7060076
      Issue No: Vol. 7, No. 6 (2022)
       
  • Data, Vol. 7, Pages 77: Statistical Dataset and Data Acquisition System
           for Monitoring the Voltage and Frequency of the Electrical Network in an
           Environment Based on Python and Grafana

    • Authors: Javier Fernández-Morales, Juan-José González-de-la-Rosa, José-María Sierra-Fernández, Manuel-Jesús Espinosa-Gavira, Olivia Florencias-Oliveros, Agustín Agüera-Pérez, José-Carlos Palomares-Salas, Paula Remigio-Carmona
      First page: 77
      Abstract: This article presents a unique dataset of voltage data from a public building, acquired using a hybrid measurement solution that combines Python™ for acquisition and Grafana™ for results representation. This study aims to benefit communities by demonstrating how to achieve more efficient energy management. The study outlines how to obtain a more realistic picture of supply quality, oriented toward monitoring the state of the network; this should allow for better understanding, which should in turn enable the optimization of the operation and maintenance of power systems. Our work focused on frequency and higher-order statistical estimators which, combined with exploratory data analysis techniques, improved the characterization of the shape of the voltage signal. These techniques and data, together with the acquisition and monitoring system, form a unique combination of low-cost measurement solutions, with the added benefit of contributing to industrial benchmarking. Our study proposes an effective and versatile system that performs acquisition, statistical analysis, database management, and results representation in less than a second. The system offers a wide variety of graphs to present the results of the analysis, so that the user can observe them and identify, with relative ease, any anomalies in the supply that could damage sensitive equipment in the corresponding installation. The system therefore not only provides information about power quality but also contributes significantly to the safety and maintenance of the installation. The system can be deployed wherever internet access is available.
      Citation: Data
      PubDate: 2022-06-06
      DOI: 10.3390/data7060077
      Issue No: Vol. 7, No. 6 (2022)
       
  • Data, Vol. 7, Pages 78: Deep Learning Dataset for Estimating Burned Areas:
           Case Study, Indonesia

    • Authors: Yudhi Prabowo, Anjar Dimara Sakti, Kuncoro Adi Pradono, Qonita Amriyah, Fadillah Halim Rasyidy, Irwan Bengkulah, Kurnia Ulfa, Danang Surya Candra, Muhammad Thufaili Imdad, Shadiq Ali
      First page: 78
      Abstract: Wildland fire is one of the main causes of deforestation, and it has an important impact on atmospheric emissions, notably CO₂. It occurs almost every year in Indonesia, especially during the dry season. It is therefore necessary to identify burned areas from remote sensing images in order to establish zoning maps of areas prone to wildland fires. Many methods have been developed for mapping burned areas from low- to medium-resolution satellite images. One popular approach for such mapping tasks is deep learning using the U-Net architecture; however, it needs a large amount of representative training data to develop the model. In this paper, we present a new dataset of burned areas in Indonesia for training or evaluating U-Net models. We delineated burned areas manually by visual interpretation of Landsat-8 satellite images. The dataset was collected from several regions in Indonesia and consists of 227 images of 512 × 512 pixels. Each image contains one or more burn scars, or only background, together with its labeled mask. The dataset can be used to train and evaluate deep learning models for image detection, segmentation, and classification tasks related to burned area mapping.
      Citation: Data
      PubDate: 2022-06-09
      DOI: 10.3390/data7060078
      Issue No: Vol. 7, No. 6 (2022)
       
  • Data, Vol. 7, Pages 79: UNIPD-BPE: Synchronized RGB-D and Inertial Data
           for Multimodal Body Pose Estimation and Tracking

    • Authors: Mattia Guidolin, Emanuele Menegatti, Monica Reggiani
      First page: 79
      Abstract: The ability to estimate human motion without requiring any external on-body sensor or marker is of paramount importance in a variety of fields, ranging from human–robot interaction and Industry 4.0 to surveillance and telerehabilitation. The recent development of portable, low-cost RGB-D cameras has improved the accuracy of markerless motion capture systems. However, despite the widespread use of such sensors, a dataset including complex scenes with multiple interacting people, recorded with a calibrated network of RGB-D cameras and an external system for assessing the pose estimation accuracy, is still missing. This paper presents the University of Padova Body Pose Estimation dataset (UNIPD-BPE), an extensive dataset for multi-sensor body pose estimation containing both single-person and multi-person sequences with up to 4 interacting people. A network of 5 Microsoft Azure Kinect RGB-D cameras is used to record synchronized high-definition RGB and depth data of the scene from multiple viewpoints, as well as to estimate the subjects’ poses using the Azure Kinect Body Tracking SDK. Simultaneously, full-body Xsens MVN Awinda inertial suits provide accurate poses and anatomical joint angles, as well as raw data from the 17 IMUs that each suit requires. This dataset aims to advance the development and validation of multi-camera markerless body pose estimation and tracking algorithms, as well as multimodal approaches that merge visual and inertial data.
      Citation: Data
      PubDate: 2022-06-09
      DOI: 10.3390/data7060079
      Issue No: Vol. 7, No. 6 (2022)
       
  • Data, Vol. 7, Pages 80: Multi-Resolution Discrete Cosine Transform Fusion
           Technique Face Recognition Model

    • Authors: Bader M. AlFawwaz, Atallah AL-Shatnawi, Faisal Al-Saqqar, Mohammad Nusir
      First page: 80
      Abstract: This work presents a Fusion Feature-Level Face Recognition Model (FFLFRM) based on a Multi-Resolution Discrete Cosine Transform (MDCT) fusion technique, comprising face detection, feature extraction, feature fusion, and face classification. It detects core facial characteristics as well as local and global features using Local Binary Pattern (LBP) and Principal Component Analysis (PCA) extraction. The MDCT fusion technique was then applied, followed by Artificial Neural Network (ANN) classification. Model testing used 10,000 faces derived from the Olivetti Research Laboratory (ORL) library. Model performance was compared with three state-of-the-art models based on Frequency Partition (FP), Laplacian Pyramid (LP), and Covariance Intersection (CI) fusion techniques, in terms of image features (low resolution and occlusion) and facial characteristics (pose and expression, both per se and in relation to illumination). The MDCT-based model yielded promising recognition results, with an accuracy of 97.70%, demonstrating effectiveness and robustness against these challenges. Furthermore, this work showed that the MDCT method used by the proposed FFLFRM is simpler, faster, and more accurate than the Discrete Fourier Transform (DFT), Fast Fourier Transform (FFT), and Discrete Wavelet Transform (DWT), and that it is effective for real-life face recognition applications.
      Citation: Data
      PubDate: 2022-06-15
      DOI: 10.3390/data7060080
      Issue No: Vol. 7, No. 6 (2022)
       
  • Data, Vol. 7, Pages 81: Indoor Temperature and Relative Humidity Dataset
           of Controlled and Uncontrolled Environments

    • Authors: Juan Botero-Valencia, Luis Castano-Londono, David Marquez-Viloria
      First page: 81
      Abstract: The large volume of data generated by the growing development of Internet of Things applications has encouraged a large number of works related to data management, wireless communication technologies, the deployment of resource-constrained sensor networks, and energy consumption. Various new and well-known algorithms have been used to process and analyze data acquired through sensor networks, with algorithms for compression, filtering, calibration, and variable analysis being common. Depending on the case, these works use databases available online, public government databases, data generated from sensor networks deployed by the authors themselves, or values generated by simulation. When the work focuses more on the algorithm than on the characteristics of the sensor network, these data sources may have limitations such as the availability of databases, the time required for data acquisition, the need to deploy a real sensor network, and the reliability or characteristics of the acquired data. The dataset in this article contains 4,164,267 values of timestamp, indoor temperature, and relative humidity acquired in October and November 2019 with twelve Xiaomi Mijia temperature and humidity sensors at the Control Systems and Robotics Laboratory and the De La Salle Museum of Natural Sciences, both of the Instituto Tecnológico Metropolitano, Medellín, Colombia. The devices were calibrated in a Metrology Laboratory accredited by the National Accreditation Body of Colombia (Organismo Nacional de Acreditación de Colombia, ONAC). The dataset is available in the Mendeley Data repository.
      Citation: Data
      PubDate: 2022-06-16
      DOI: 10.3390/data7060081
      Issue No: Vol. 7, No. 6 (2022)
       
  • Data, Vol. 7, Pages 82: Dataset for Detecting the Electrical Behavior of
           Photovoltaic Panels from RGB Images

    • Authors: Juan-Pablo Villegas-Ceballos, Mateo Rico-Garcia, Carlos Andres Ramos-Paja
      First page: 82
      Abstract: Dynamic reconfiguration and maximum power point tracking in large-scale photovoltaic (PV) systems require a large number of voltage and current sensors. In particular, the reconfiguration process requires a pair of voltage/current sensors for each panel, which increases cost and size and reduces the reliability of the installation. A suitable way to reduce the number of sensors is to adopt image-based solutions that estimate the electrical characteristics of the PV panels, but the lack of reliable data covering a wide diversity of irradiance and shading conditions is a major problem in this area. Therefore, this paper presents a dataset correlating RGB images and electrical data of PV panels under different irradiance and shading conditions; the dataset also provides complementary weather data and additional image characteristics to support the training of estimation models. In particular, the dataset was designed to support the design of image-based estimators of electrical data, which could be used to replace large arrays of sensors. The dataset was captured over 70 days distributed between 2020 and 2021, generating 5211 images and records. The paper also describes the measurement platform used to collect the data, which will help others replicate the experiments in different geographical locations.
      Citation: Data
      PubDate: 2022-06-17
      DOI: 10.3390/data7060082
      Issue No: Vol. 7, No. 6 (2022)
       
  • Data, Vol. 7, Pages 168: Spectrogram Data Set for Deep-Learning-Based RF
           Frame Detection

    • Authors: Jakob Wicht, Ulf Wetzker, Vineeta Jain
      First page: 168
      Abstract: Automated spectrum analysis serves as a troubleshooting tool that helps to diagnose faults in wireless networks, such as difficult signal propagation conditions and coexisting wireless networks. It provides higher monitoring coverage while requiring less expertise than manual spectrum analysis. In this paper, we introduce a data set that can be used to train and evaluate deep learning models capable of detecting frames from different wireless standards as well as interference between individual frames. Since manually labeling a wide variety of frames in different environments is too challenging, an artificial data generation pipeline was developed. The data set consists of 20,000 augmented signal segments, each containing a random number of different Wi-Fi and Bluetooth frames, together with their spectral image representations and labels that describe the position and type of each frame within the spectrogram. The data set also contains the results of intermediate processing steps, which enable the research and teaching communities to create new data sets for specific requirements or to derive new examples for study.
      Citation: Data
      PubDate: 2022-11-23
      DOI: 10.3390/data7120168
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 169: A Waveform Dataset in Continuous Mode of the
           Montefeltro Seismic Network (MF) in Central-Northern Italy from 2018 to
           2020

    • Authors: Antonella Megna, Giovanni Battista Cimini, Alessandro Marchetti, Nicola Mauro Pagliuca, Stefano Santini
      First page: 169
      Abstract: The Montefeltro seismic network (FDSN network code: 1S) was deployed in the Apennine area of the northern Marche and southern Emilia-Romagna regions (central Italy). The temporary network was set up in December 2018 and continues to operate, with an array of stations equipped with dynamic digitizers and three-component short-period, extended-period, and broadband seismometers (Guralp CMG/20s and 30s, Lennartz 3D/5s, Sara SS20 3D/0.5s sensors). The network records in continuous mode at 100 samples per second (sps). The data are used to analyze the seismic activity and the spatiotemporal evolution of small seismic sequences, strongly clustered in time and space, occurring in the study area and surrounding zones. The dataset files are in miniSEED format and organized in the following tree: (1) the dataset is divided by year; (2) each year is subdivided by station; (3) finally, each station folder contains the data divided by day of the year.
      Citation: Data
      PubDate: 2022-11-26
      DOI: 10.3390/data7120169
      Issue No: Vol. 7, No. 12 (2022)
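The year / station / day layout described in the Montefeltro abstract can be indexed with a short directory walk. The following is a minimal sketch, assuming only that three-level tree. The abstract does not give the real folder or file naming scheme, so names are collected as found rather than parsed, and the function name `index_dataset` is hypothetical. The constant records the arithmetic implied by the 100 sps rate: one complete day holds 100 × 86,400 samples per component.

```python
from pathlib import Path

# 100 samples per second (from the abstract) x 86,400 seconds per day:
# the number of samples in one complete day for a single component.
SAMPLES_PER_DAY = 100 * 86_400


def index_dataset(root):
    """Map (year, station) -> sorted list of day-file names.

    Hypothetical helper: it assumes the year/station/day tree described in
    the abstract. The actual naming scheme is not specified there, so folder
    and file names are taken as they are found rather than parsed.
    """
    index = {}
    for year_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for station_dir in sorted(p for p in year_dir.iterdir() if p.is_dir()):
            days = sorted(f.name for f in station_dir.iterdir() if f.is_file())
            index[(year_dir.name, station_dir.name)] = days
    return index
```

Each indexed day file could then be passed to a miniSEED reader, for example ObsPy's `obspy.read`. That step is left out here so the sketch runs with the standard library alone.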
       
  • Data, Vol. 7, Pages 170: Identifying and Classifying Urban Data Sources
           for Machine Learning-Based Sustainable Urban Planning and Decision Support
           Systems Development

    • Authors: Stéphane C. K. Tékouabou, Jérôme Chenal, Rida Azmi, Hamza Toulni, El Bachir Diop, Anastasija Nikiforova
      First page: 170
      Abstract: With the increasing amount and variety of data constantly produced, collected, and exchanged between systems, the efficiency and accuracy of solutions and services that use data as input may suffer if an inappropriate or inaccurate technique, method, or tool is chosen to handle them. This paper presents a global overview of the urban data sources and structures used to train machine learning (ML) algorithms integrated into urban planning decision support systems (DSS). It contributes to a common understanding of how to choose the right urban data (i.e., their type, source, and structure) for a given urban planning issue, for more efficient use in training ML models. For this purpose, we conducted a systematic literature review (SLR) of all relevant peer-reviewed studies available in the Scopus database. In total, 248 papers were found to be relevant and were further analyzed using a text-mining approach to determine (a) the main urban data sources used for ML modeling, (b) the most popular approaches used in relevant urban planning and urban problem-solving studies and their relationship to the type of data source used, and (c) the problems commonly encountered in their use. After classifying the sources, we identified their strengths and weaknesses according to several predefined factors. We found that the data mainly come from two categories of sources, namely (1) sensors and (2) statistical surveys, including social network data. They can be classified as (a) opportunistic or (b) non-opportunistic depending on the process of data acquisition, collection, and storage. Data sources are closely correlated with their structure and with the potential urban planning issues to be addressed. 
Almost all urban data have an indexed structure and, in particular, either attribute tables for statistical survey data and data from simple sensors (e.g., climate and pollution sensors) or vectors, mostly obtained from satellite images after large-scale spatio-temporal analysis. The paper also provides a discussion of the potential opportunities, emerging issues, and challenges that urban data sources face and should overcome to better catalyze intelligent/smart planning. This should contribute to the general understanding of the data, their sources and the challenges to be faced and overcome by those seeking data and integrating them into smart applications and urban-planning processes.
      Citation: Data
      PubDate: 2022-11-28
      DOI: 10.3390/data7120170
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 171: Experimental and Nonlinear Finite Element
           Analysis Data for an Innovative Buckling Restrained Bracing System to
           Rehabilitate Seismically Deficient Structures

    • Authors: Abdul Saboor Karzad, Zaid A. Al-Sadoon, Abdullah Sagheer, Mohammad AlHamaydeh
      First page: 171
      Abstract: This article presents experimental data and nonlinear finite element analysis (NLFEA) modeling for an innovative buckling restrained bracing (BRB) system. The data were collected from qualification testing of the introduced BRBs per the AISC 341 test provisions and from finite element modeling. The BRB is made of three parts: core bar, restraining unit, and end units; duplicate specimens of three different core bar cross sections (i.e., fully threaded, threaded notched, and smooth shaved) were tested. The BRBs introduced in this research come with innovative end parts, so-called fingers. These fingers provide the longitudinal gap required in every BRB system and simultaneously prevent buckling of the core bar at the end regions of the BRB sample, thus allowing easy core replacement if the core is damaged in an earthquake. The measured parameters were the applied cyclic load and the corresponding displacement. Analysis of the acquired data showed almost symmetric hysteretic behavior, with slightly higher capacity under compression and a notable overall ductility of 4. Moreover, finite element modeling data for one type of core bar (fully threaded) were curated. The data presented in this paper will be valuable for fabricating BRBs in practice and for further research on the topic.
      Citation: Data
      PubDate: 2022-11-28
      DOI: 10.3390/data7120171
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 172: Data from Zimbabwean College Students on the
           Measurement Invariance of the Entrepreneurship Goal and Implementation
           Intentions Scales

    • Authors: Takawira Munyaradzi Ndofirepi
      First page: 172
      Abstract: This article analyses primary data on the entrepreneurship intentions of selected Zimbabwean college students. The goal of this study was to examine the measurement invariance of the entrepreneurship goal and implementation intention scales across gender groups in a higher education setting. Entrepreneurship goal intentions (EGI) and entrepreneurship implementation intentions (EII) are examined as separate but related constructs. To address the research goal, a positivist philosophy and a quantitative research approach were used. A cross-sectional survey was used to collect data from a convenience sample of 262 college students in Zimbabwe. A researcher-administered questionnaire, written in English, was distributed to the respondents and collected after completion. Multi-group confirmatory factor analysis was performed on the dataset using the JASP software. The results confirmed all four levels of measurement invariance, namely configural, metric, scalar, and strict invariance. The pattern of results validates the consistency of the measurement properties of entrepreneurial intention instruments, designed in developed countries, across different contexts of use. Researchers, entrepreneurship educators, and policymakers in Zimbabwe can use the results of this analysis to quantify potential entrepreneurs among young adults and to design interventions that support future entrepreneurship.
      Citation: Data
      PubDate: 2022-11-29
      DOI: 10.3390/data7120172
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 173: Digital Twins: A Systematic Literature Review
           Based on Data Analysis and Topic Modeling

    • Authors: Kuzma Kukushkin, Yury Ryabov, Alexey Borovkov
      First page: 173
      Abstract: The digital twin has recently become a popular topic in manufacturing-related research, such as Industry 4.0, the industrial internet of things, and cyber-physical systems. In addition, digital twins are the focus of several other research areas: construction, urban management, digital transformation of the economy, medicine, virtual reality, software testing, and others. The concept is not yet fully defined, its scope seems unlimited, and the topic is relatively new; all of this can present a barrier to research. The main goal of this paper is to develop a proper methodology for visualizing the digital-twin science landscape using modern bibliometric tools, text mining, and topic modeling based on machine learning models: Latent Dirichlet Allocation (LDA) and BERTopic (built on Bidirectional Encoder Representations from Transformers, BERT). The scope of the study includes 8693 publications on the topic selected from the Scopus database, published between January 1993 and September 2022. Keyword co-occurrence analysis and topic modeling indicate that studies on digital twins are still at an early stage of development. At the same time, the core of the topic is growing, and some topic clusters are emerging. More than 100 topics can be identified; the most popular and fastest-growing topic is ‘digital twins of industrial robots, production lines and objects.’ Further efforts are needed to verify the proposed methodology, which can be achieved by analyzing other research fields.
      Citation: Data
      PubDate: 2022-11-30
      DOI: 10.3390/data7120173
      Issue No: Vol. 7, No. 12 (2022)
       
  • Data, Vol. 7, Pages 142: Data of National Dishes in the Developed and
           Developing Countries in the World, Their Similarity and Trade Flows

    • Authors: Anne C. Wunderlich, Andreas Kohler
      First page: 142
      Abstract: This paper presents a database that includes information on national recipes and their ingredients for 171 countries, measures of food taste similarity between all 171 countries, and bilateral migration and agro-food trade data for 5 years. The database can be used to analyze, e.g., the relation between food preferences and international trade, or between food preferences and health outcomes (e.g., obesity) across countries.
      Citation: Data
      PubDate: 2022-10-26
      DOI: 10.3390/data7110142
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 143: Assessing the Accuracy of Google Trends for
           Predicting Presidential Elections: The Case of Chile, 2006–2021

    • Authors: Francisco Vergara-Perucich
      First page: 143
      Abstract: This article presents the results of reviewing the predictive capacity of Google Trends for national elections in Chile. The results of the elections between Michelle Bachelet and Sebastián Piñera in 2006, Sebastián Piñera and Eduardo Frei in 2010, Michelle Bachelet and Evelyn Matthei in 2013, Sebastián Piñera and Alejandro Guillier in 2017, and Gabriel Boric and José Antonio Kast in 2021 were reviewed. The time series analyzed were organized on the basis of relative searches between the candidacies, using R software, mainly with the gtrendsR and forecast packages. With the series constructed, forecasts were made using the Auto Regressive Integrated Moving Average (ARIMA) technique to check the weight of one presidential option over the other. The ARIMA analyses were performed on three versions of the data: the linear series, the series transformed by a moving average, and the series transformed by the Hodrick–Prescott filter. The results indicate that the method offers optimal predictive ability.
      Citation: Data
      PubDate: 2022-10-27
      DOI: 10.3390/data7110143
      Issue No: Vol. 7, No. 11 (2022)
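The Chile study applies ARIMA to three versions of each series, two of which are transformed: a moving average and a Hodrick–Prescott (HP) trend. The authors worked in R, so the following is only a Python sketch of those two transforms, not their code. The window length and the HP smoothing parameter `lam` are placeholders, because the abstract does not report the values used. The value 1600 is the conventional choice for quarterly data. The HP trend uses the closed-form solution of its penalized least-squares objective.

```python
import numpy as np


def moving_average(y, window):
    """Simple moving average over each full window (mode='valid'),
    so the output has len(y) - window + 1 points."""
    y = np.asarray(y, dtype=float)
    return np.convolve(y, np.ones(window) / window, mode="valid")


def hp_trend(y, lam=1600.0):
    """Hodrick-Prescott trend component.

    Minimizes sum((y - tau)**2) + lam * sum((second difference of tau)**2).
    The closed-form solution is tau = (I + lam * D.T @ D)^(-1) y, where D is
    the (n-2) x n second-difference matrix. lam=1600 is only a placeholder.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)  # each row looks like [..., 1, -2, 1, ...]
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
```

The cyclical component is `y - hp_trend(y)`. The dense solve costs O(n³), which is acceptable for short weekly search series. For longer series, statsmodels provides `statsmodels.tsa.filters.hp_filter.hpfilter`, which returns the cycle and the trend.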
       
  • Data, Vol. 7, Pages 144: Technical Data of In Silico Analysis of the
           Interaction of Dietary Flavonoid Compounds against Spike-Glycoprotein and
           Proteases of SARS-CoV-2

    • Authors: Altu, Budiman, Razali, Mokhtar, Kamaruzaman
      First page: 144
      Abstract: The spike glycoprotein (S protein), 3-chymotrypsin-like protease (3CL-Pro), and papain-like protease (PL-Pro) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are widely targeted for the discovery of therapeutic compounds against this virus. Dietary flavonoid compounds have been proposed as candidates for safe therapy for COVID-19 patients. Nevertheless, wet-lab experiments for high-throughput screening of such compounds are undoubtedly time- and cost-consuming. This study aims to screen dietary flavonoid compounds that bind to the S protein, 3CL-Pro, and PL-Pro of SARS-CoV-2. For this purpose, protein structures of the receptor-binding domain (RBD) of the S protein (6M0J), 3CL-Pro (6LU7), and PL-Pro (6W9C) were retrieved from the RCSB Protein Data Bank (PDB). Twelve dietary flavonoid compounds were selected, and their binding affinity to the target proteins was studied by global and local docking. The docking and molecular dynamics (MD) simulations were performed using YASARA software. Out of the 12 compounds, hesperidin showed the highest binding scores in global docking against the RBD of the S protein (−9.98 kcal/mol), 3CL-Pro (−9.43 kcal/mol), and PL-Pro (−8.89 kcal/mol). Interestingly, MD simulation revealed that the hesperidin complexes with 3CL-Pro and the RBD of the S protein are more stable than the complex with PL-Pro. This study suggests that hesperidin might have versatile inhibitory properties against several essential proteins of SARS-CoV-2. These findings, however, remain to be confirmed through in vitro and in vivo assays.
      Citation: Data
      PubDate: 2022-10-27
      DOI: 10.3390/data7110144
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 145: In Vitro Major Arterial Cardiovascular Simulator
           to Generate Benchmark Data Sets for In Silico Model Validation

    • Authors: Michelle Wisotzki, Alexander Mair, Paul Schlett, Bernhard Lindner, Max Oberhardt, Stefan Bernhard
      First page: 145
      Abstract: Cardiovascular diseases are commonly caused by atherosclerosis, stenosis, and aneurysms. Understanding the influence of these pathological conditions on the circulatory mechanism is required to establish methods for early diagnosis. Different tools have been developed to simulate healthy and pathological conditions of blood flow. These simulations are often based on computational models that allow the generation of large data sets for further investigation. However, because computational models often lack some aspects of real-world data, hardware simulators are used to close this gap and generate data for model validation. The aim of this study is to develop and validate a hardware simulator to generate benchmark data sets of healthy and pathological conditions. The development process was guided by specific design criteria to allow flexible and physiological simulations. The in vitro hardware simulator includes the 33 major arteries and is driven by a ventricular assist device generating a parametrised inflow condition at the heart node. Physiologic flow conditions, including heart rate, systolic/diastolic pressure, peripheral resistance, and compliance, are adjustable across a wide range. The pressure and flow waves at 17+1 locations are measured by inverted fluid-resistant pressure transducers and one ultrasound flow transducer, supporting a detailed analysis of the measurement data even for in silico modelling applications. The pressure and flow waves were compared with in vivo measurements and reflect physiological conditions. The influence of the degree and location of stenoses on blood pressure and flow was also investigated. As expected, the results indicate decreasing translesional pressure and flow with an increasing degree of stenosis. The benchmark data set is made available to the research community for validating and comparing different types of computational models. 
It is hoped that the validation and improvement of computational simulation models will provide better clinical predictions.
      Citation: Data
      PubDate: 2022-10-27
      DOI: 10.3390/data7110145
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 146: Predicting Student Dropout and Academic Success

    • Authors: Valentim Realinho, Jorge Machado, Luís Baptista, Mónica V. Martins
      First page: 146
      Abstract: Higher education institutions record a significant amount of data about their students, representing considerable potential for generating information, knowledge, and monitoring. Both school dropout and educational failure in higher education are obstacles to economic growth, employment, competitiveness, and productivity, directly affecting the lives of students and their families, higher education institutions, and society as a whole. The dataset described here results from the aggregation of information from different disjoint data sources and includes demographic, socioeconomic, macroeconomic, and academic data on enrollment and academic performance at the end of the first and second semesters. The dataset is used to build machine learning models for predicting academic performance and dropout, as part of a learning analytics tool developed at the Polytechnic Institute of Portalegre that provides the tutoring team with an estimate of each student's risk of dropout and failure. The dataset is useful for researchers who want to conduct comparative studies on student academic performance and also for training in the machine learning area.
      Citation: Data
      PubDate: 2022-10-28
      DOI: 10.3390/data7110146
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 147: Thematic Analysis of Indonesian Physics Education
           Research Literature Using Machine Learning

    • Authors: Purwoko Haryadi Santoso, Edi Istiyono, Haryanto, Wahyu Hidayatulloh
      First page: 147
      Abstract: A large body of physics education research (PER) literature has been disseminated through academic publications. Over the years, this growing body of literature has challenged Indonesian PER scholars to understand how the research community has progressed and what future work should be encouraged. Nevertheless, traditional methods of thematic analysis have limitations when the amount of PER literature increases exponentially. To deal with this plethora of publications, a machine learning (ML) algorithm from natural language processing (NLP) was employed in this paper to automate a thematic analysis of Indonesian PER literature, which has yet to be explored within the community. A well-known NLP algorithm, latent Dirichlet allocation (LDA), was used to extract Indonesian PER topics and their evolution between 2014 and 2021. A total of 852 papers (~4 to 8 pages each) were downloaded from five international conference proceedings organized, peer reviewed, and published by Indonesian PER researchers. Before their topics were modeled with the LDA algorithm, the data corpus was preprocessed through several common procedures established in NLP studies. The findings revealed that LDA thematically quantified Indonesian PER topics and described their distinct development over time. The identified topics indicate that the Indonesian PER community has developed robustly across eight distinct topics to date. The community began with an initial focus on research on physics laboratories, followed by research-based instruction in late 2015. In the past few years, Indonesian PER scholars have mostly studied 21st-century skills, a focus that has given way to developing relevant educational technologies and promoting the interdisciplinary aspects of physics education. 
We suggest that there is room for Indonesian PER scholars to address the qualitative aspects of physics teaching and learning, which are still scant in the literature.
      Citation: Data
      PubDate: 2022-10-28
      DOI: 10.3390/data7110147
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 148: An Open Dataset of Connected Speech in Aphasia
           with Consensus Ratings of Auditory-Perceptual Features

    • Authors: Zoe Ezzes, Sarah M. Schneck, Marianne Casilio, Davida Fromm, Antje S. Mefferd, Michael de Riesthal, Stephen M. Wilson
      First page: 148
      Abstract: Auditory-perceptual rating of connected speech in aphasia (APROCSA) is a system in which trained listeners rate a variety of perceptual features of connected speech samples, representing the disruptions and abnormalities that commonly occur in aphasia. APROCSA has shown promise as an approach for quantifying expressive speech and language function in individuals with aphasia. The aim of this study was to acquire and share a set of audiovisual recordings of connected speech samples from a diverse group of individuals with aphasia, along with consensus ratings of APROCSA features, for future use as training materials to teach others how to use the APROCSA system. Connected speech samples were obtained from six individuals with chronic post-stroke aphasia. The first five minutes of participant speech were excerpted from each sample, and five researchers independently evaluated each sample using APROCSA, rating its 27 features on a five-point scale. The researchers then discussed each feature in turn to obtain consensus ratings. The dataset will provide a useful, freely accessible resource for researchers, clinicians, and students to learn how to evaluate aphasic speech with an auditory-perceptual approach.
      Citation: Data
      PubDate: 2022-10-30
      DOI: 10.3390/data7110148
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 149: Cryptocurrency Price Prediction with
           Convolutional Neural Network and Stacked Gated Recurrent Unit

    • Authors: Chuen Yik Kang, Chin Poo Lee, Kian Ming Lim
      First page: 149
      Abstract: Virtual currencies have been declared a type of financial asset widely recognized as a medium of exchange. Cryptocurrency trading has caught the attention of investors because cryptocurrencies can be highly profitable investments. To optimize the profit of cryptocurrency investments, accurate price prediction is essential. Since price prediction is a time series task, a hybrid deep learning model is proposed to predict future cryptocurrency prices. The hybrid model integrates a 1-dimensional convolutional neural network and a stacked gated recurrent unit (1DCNN-GRU). Given cryptocurrency price data over time, the 1-dimensional convolutional neural network encodes the data into a high-level discriminative representation. Subsequently, the stacked gated recurrent unit captures the long-range dependencies of the representation. The proposed hybrid model was evaluated on three cryptocurrency datasets, namely Bitcoin, Ethereum, and Ripple. Experimental results demonstrated that the proposed 1DCNN-GRU model outperformed existing methods, achieving the lowest RMSE values: 43.933 on the Bitcoin dataset, 3.511 on the Ethereum dataset, and 0.00128 on the Ripple dataset.
      Citation: Data
      PubDate: 2022-10-31
      DOI: 10.3390/data7110149
      Issue No: Vol. 7, No. 11 (2022)
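The results above are reported as RMSE (root-mean-square error). A minimal sketch of the metric follows; the `rmse` helper and the price series are hypothetical, made-up numbers for illustration, not data from the study.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length numeric sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical closing prices and one-step-ahead forecasts.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 101.0, 103.0, 104.0]
print(rmse(actual, predicted))
```

Because RMSE is expressed in the units of the target, the three reported values mainly reflect the very different price scales of Bitcoin, Ethereum, and Ripple; they are not directly comparable across datasets.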
       
  • Data, Vol. 7, Pages 150: Manual Conversion of Sadhukarn to Thai and
           Western Music Notations and Their Translation into a Rhyme Structure for
           Music Analysis

    • Authors: Sumetus Eambangyung, Gretel Schwörer-Kohl, Witoon Purahong
      First page: 150
      Abstract: Sadhukarn plays an important role as the most sacred music composition in the Thai, Cambodian, and Lao musical cultural areas. Because various unverified versions of the Sadhukarn main melody exist in these three countries, a systematic method for notating the melodies in suitable formats is needed. This work provides a data descriptor for the music transcription of 25 different versions of the Sadhukarn main melody collected in Thailand, Cambodia, and Laos. Furthermore, we introduce a new procedure for music analysis based on rhyme structure. The aims of the study are to (1) make the musical notes comprehensible in both Western staff notation and Thai notation, and (2) describe the procedures for translating musical notes into a rhyme structure. To generate a rhyme structure, we base the method on a Thai poetic and linguistic approach. The rhyme structure is composed of melodic structures, the pillar tones Look-Tok, and a melodic rhyming outline.
      Citation: Data
      PubDate: 2022-10-31
      DOI: 10.3390/data7110150
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 151: Isochromatic-Art: A Computational Dataset for
           Digital Photoelasticity Studies

    • Authors: Juan-Carlos Briñez-De-Leon, Mateo Rico-Garcia, Alejandro Restrepo-Martínez
      First page: 151
      Abstract: Evaluating the stress field of loaded structures is important for identifying the forces that make them fail, redesigning their geometry to increase mechanical resistance, or characterizing unstressed regions from which material can be removed. In this line of work, digital photoelasticity stands out for its ability to reveal stress information through isochromatic color fringes and to quantify it through inverse-problem strategies. However, the absence of public data with a wide variety of spatial fringe distributions has limited the development of new approaches that generalize stress evaluation to a wider range of industrial applications. This dataset shares a varied collection of stress maps and their respective representations as color fringe patterns. The data were generated with a computational strategy that emulates a dark-field circular polariscope, assuming stress surfaces and patches derived from analytical stress models, 3D reconstructions, saliency maps, and superpositions of Gaussian surfaces. In total, two sets of 101,430 raw images were generated separately, one for the stress maps and one for the isochromatic color fringes. This dataset can be valuable for researchers interested in characterizing the mechanical response of loaded models, computer science engineers interested in modeling inverse problems, and scientists working on physical phenomena such as 3D reconstruction in visible light, bubble analysis, oil surfaces, and film thickness.
      Citation: Data
      PubDate: 2022-11-01
      DOI: 10.3390/data7110151
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 152: Arabic Twitter Conversation Dataset about the
           COVID-19 Vaccine

    • Authors: Huda Alhazmi
      First page: 152
      Abstract: The development and rollout of COVID-19 vaccination around the world offer hope for controlling the pandemic. People turned to social media such as Twitter to seek information or voice their opinions. Therefore, mining such conversations can provide a rich source of data for different applications related to the COVID-19 vaccine. In this data article, we developed an Arabic Twitter dataset of 1.1 M Arabic posts about the COVID-19 vaccine. The dataset was streamed over one year, from January to December 2021. We used a set of Arabic crawling keywords related to the conversation about the vaccine. The dataset consists of seven databases that can be analyzed separately or merged for further analysis. The initial analysis describes the features embedded within the posts, including hashtags, media, and the dynamics of replies and retweets. Further, the textual analysis reveals the most frequent words, which capture the trends of the discussions. The dataset was designed to facilitate research across different fields, such as social network analysis, information retrieval, health informatics, and social science.
      Citation: Data
      PubDate: 2022-11-04
      DOI: 10.3390/data7110152
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 153: Ground Truth Dataset: Objectionable Web Content

    • Authors: Hamza H. M. Altarturi, Nor Badrul Anuar
      First page: 153
      Abstract: Cyber parental control aims to filter objectionable web content and prevent children from being exposed to harmful content. Successfully detecting and blocking objectionable content depends heavily on the accuracy of the topic model. A reliable ground truth dataset is essential for building effective cyber parental control models and validating new detection methods. The ground truth is the reference against which websites in the cyber parental control dataset are labeled as objectionable or unobjectionable. The lack of publicly accessible datasets with a reliable ground truth has prevented a fair and coherent comparison of the different methods proposed in the field of cyber parental control. This paper presents a ground truth dataset that contains 8000 labeled websites: 4000 objectionable and 4000 unobjectionable. These websites comprise more than 2 million web pages. Creating the dataset involved several phases, including data collection, extraction, and labeling. Finally, the presence of labeling bias is assessed using the kappa coefficient. The ground truth dataset is publicly available in the Mendeley repository.
      Citation: Data
      PubDate: 2022-11-07
      DOI: 10.3390/data7110153
      Issue No: Vol. 7, No. 11 (2022)
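The abstract mentions a kappa coefficient for checking labeling bias but does not say which variant was used. Assuming two annotators labeling the same websites, Cohen's kappa compares their observed agreement with the agreement expected by chance. The helper below and the label lists are hypothetical illustrations.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both raters used one identical category throughout; kappa is
        # undefined here, so this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two annotators for eight websites.
rater_a = ["obj", "obj", "obj", "unobj", "unobj", "unobj", "obj", "unobj"]
rater_b = ["obj", "obj", "unobj", "unobj", "unobj", "unobj", "obj", "obj"]
print(cohens_kappa(rater_a, rater_b))
```

Here the raters agree on 6 of 8 sites (0.75), chance agreement is 0.5, so kappa is 0.5. With more than two annotators, a multi-rater statistic such as Fleiss' kappa would be the usual substitute.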
       
  • Data, Vol. 7, Pages 154: Dataset on Force Myography for
           Human–Robot Interactions

    • Authors: Umme Zakia, Carlo Menon
      First page: 154
      Abstract: Force myography (FMG) is a contemporary, non-invasive, wearable technology that can read the volumetric changes of the underlying muscles during contraction and expansion. With data-driven models, the FMG technique can be used to recognize hand forces applied by humans during physical human–robot interaction (pHRI). Several FMG-based pHRI studies were conducted in 1D, 2D, and 3D during dynamic interactions between a human participant and a robot to estimate human-applied forces in intended directions during certain tasks. Raw FMG signals were collected via 16-channel (forearm) and 32-channel (forearm and upper arm) FMG bands while participants interacted with a biaxial stage (linear robot) and a serial manipulator (Kuka robot). In this paper, we present the datasets and their structures, the pHRI environments, and the collaborative tasks performed during the studies. We believe these datasets can be useful in future studies on FMG biosignal-based pHRI control design.
      Citation: Data
      PubDate: 2022-11-08
      DOI: 10.3390/data7110154
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 155: Reference-Guided Draft Genome Assembly,
           

    • Authors: Richard Estrada, Flor-Anita Corredor, Deyanira Figueroa, Wilian Salazar, Carlos Quilcate, Héctor V. Vásquez, Jorge L. Maicelo, Jhony Gonzales, Carlos I. Arbizu
      First page: 155
      Abstract: The Peruvian creole cattle (PCC) is a neglected breed and an essential livestock resource in the Andean region of Peru. To develop a modern breeding program and conservation strategies for the PCC, a better understanding of the genetics of this breed is needed. We sequenced the whole genome of the PCC using a de novo assembly approach with a 150 bp paired-end strategy on the Illumina HiSeq 2500 platform, obtaining 320 GB of sequencing data. Reference-guided scaffolding was used to improve the draft genome. The resulting PCC genome size was 2.81 Gb, with a contig N50 of 108 Mb and 92.59% complete BUSCOs. This genome size is similar to those of the Bos taurus and B. indicus reference genomes. In addition, repetitive DNA accounted for 40.22% of the genome assembly, with retroelements occupying 32.39% of the total genome. A total of 19,803 protein-coding genes were annotated in the PCC genome. SSR data mining yielded statistics similar to those of other breeds. The PCC genome will contribute to a better understanding of the genetics of this species and its adaptation to the harsh conditions of the Andean ecosystem.
      Citation: Data
      PubDate: 2022-11-09
      DOI: 10.3390/data7110155
      Issue No: Vol. 7, No. 11 (2022)
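Assembly contiguity above is summarized by the N50. To compute it, sort the sequence lengths from longest to shortest and accumulate them; the N50 is the length at which the running total first reaches half of the total assembly length. The same computation gives a contig N50 or a scaffold N50, depending on which sequences are supplied. A minimal sketch follows; the `n50` helper is hypothetical, and the lengths are the usual textbook toy example, not PCC data.

```python
def n50(lengths):
    """Length at which the cumulative sum of lengths, sorted longest first,
    first reaches at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid rounding issues.
        if 2 * running >= total:
            return length

# Toy contig lengths (total 54): 10 + 9 + 8 = 27 reaches half, so N50 = 8.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```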
       
  • Data, Vol. 7, Pages 156: Hybrid Wi-Fi and BLE Fingerprinting Dataset for
           Multi-Floor Indoor Environments with Different Layouts

    • Authors: Aina Nadhirah Nor Hisham, Yin Hoe Ng, Chee Keong Tan, David Chieng
      First page: 156
      Abstract: Indoor positioning has garnered significant interest over the last decade due to the rapidly growing demand for location-based services. As a result, a multitude of techniques has been proposed to localize objects and devices in indoor environments. Wireless fingerprinting, which leverages machine learning, has emerged as one of the most popular positioning approaches due to its low implementation cost. Prevailing fingerprinting-based positioning mainly uses wireless fidelity (Wi-Fi) and Bluetooth low energy (BLE) signals. However, the received signal strength (RSS) of Wi-Fi and BLE signals is very sensitive to the layout of the indoor environment. Thus, any change in the indoor layout could severely degrade localization performance. To foster the development of new positioning methods, several open-source location fingerprinting datasets have been made available to the research community. Unfortunately, none of these public datasets provides RSS measurements for indoor environments with different layouts. To fill this gap, this paper presents a new hybrid Wi-Fi and BLE fingerprinting dataset for multi-floor indoor environments with different layouts, to facilitate the development of fingerprinting-based positioning systems that provide adaptive performance in dynamic indoor environments. The effects of indoor layout changes on the location fingerprint and on localization performance are also investigated.
      Citation: Data
      PubDate: 2022-11-09
      DOI: 10.3390/data7110156
      Issue No: Vol. 7, No. 11 (2022)
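The abstract does not prescribe a positioning algorithm for the dataset. A common baseline for fingerprinting is k-nearest-neighbour (kNN) matching in RSS space: find the k reference points whose stored RSS vectors are closest to the live measurement, average their coordinates, and take a majority vote on the floor. The sketch below assumes a hypothetical data layout (a list of `((x, y, floor), {transmitter_id: rss_dbm})` pairs), a hypothetical −100 dBm placeholder for unheard transmitters, and k = 3. None of these choices come from the paper.

```python
import math
from collections import Counter

MISSING_RSS = -100.0  # assumed placeholder (dBm) for a transmitter not heard

def knn_locate(fingerprints, observed, k=3):
    """Estimate (x, y, floor) by k-nearest-neighbour matching in RSS space.

    fingerprints: list of ((x, y, floor), {transmitter_id: rss_dbm}) pairs.
    observed: {transmitter_id: rss_dbm} measured at the unknown position.
    """
    def distance(reference):
        # Euclidean distance over the union of Wi-Fi APs and BLE beacons seen.
        ids = set(reference) | set(observed)
        return math.sqrt(sum(
            (reference.get(i, MISSING_RSS) - observed.get(i, MISSING_RSS)) ** 2
            for i in ids
        ))

    nearest = sorted(fingerprints, key=lambda fp: distance(fp[1]))[:k]
    x = sum(pos[0] for pos, _ in nearest) / len(nearest)
    y = sum(pos[1] for pos, _ in nearest) / len(nearest)
    floor = Counter(pos[2] for pos, _ in nearest).most_common(1)[0][0]
    return x, y, floor

# Hypothetical radio map mixing Wi-Fi APs and BLE beacons on two floors.
fingerprints = [
    ((0.0, 0.0, 1), {"wifi-a": -40, "ble-1": -60}),
    ((2.0, 0.0, 1), {"wifi-a": -50, "ble-1": -55}),
    ((0.0, 2.0, 1), {"wifi-a": -55, "ble-1": -70}),
    ((0.0, 0.0, 2), {"wifi-b": -45, "ble-2": -65}),
]
print(knn_locate(fingerprints, {"wifi-a": -45, "ble-1": -58}))
```

A layout change shifts the stored RSS vectors, so a radio map recorded under one layout can produce systematically wrong neighbours under another. Multi-layout data like this dataset is meant to help expose that effect.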
       
  • Data, Vol. 7, Pages 157: High-Resolution UAV RGB Imagery Dataset for
           Precision Agriculture and 3D Photogrammetric Reconstruction Captured over
           a Pistachio Orchard (Pistacia vera L.) in Spain

    • Authors: Sergio Vélez, Rubén Vacas, Hugo Martín, David Ruano-Rosa, Sara Álvarez
      First page: 157
      Abstract: A total of 248 UAV RGB images were taken in the summer of 2021 over a representative pistachio orchard in Spain (X: 341450.3, Y: 4589731.8; ETRS89/UTM zone 30N). It is a 2.03 ha plot, planted in 2016 with Pistacia vera L. cv. Kerman grafted on UCB rootstock, with a NE–SW orientation and a 7 × 6 m triangular planting pattern. The ground was kept free of any weeds that could affect image processing. The photos (provided in JPG format) were taken using a UAV DJI Phantom Advance quadcopter in two flight missions: one planned to take nadir images (β = 0°), and another to take oblique images (β = 30°), both at 55 metres above the ground. The aerial platform incorporates a DJI FC6310 RGB camera with a 20 megapixel sensor, a horizontal field of view of 84° and a mechanical shutter. In addition, GCPs (ground control points) were collected. Finally, a high-quality 3D photogrammetric reconstruction process was carried out to generate a 3D point cloud (provided in LAS, LAZ, OBJ and PLY formats), a DEM (digital elevation model) and an orthomosaic (both in TIF format). The interest in using remote sensing in precision agriculture is growing, but the availability of reliable, ready-to-work, downloadable datasets is limited. Therefore, this dataset could be useful for precision agriculture researchers interested in photogrammetric reconstruction who want to evaluate models for orthomosaic and 3D point cloud generation from UAV missions with changing flight parameters, such as camera angle.
      Citation: Data
      PubDate: 2022-11-10
      DOI: 10.3390/data7110157
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 158: Measuring and Validating the Factors Influenced
           the SME Business Growth in Germany—Descriptive Analysis and
           Construct Validation

    • Authors: Hosam Azat Elsaman, Nourhan El-Bayaa, Suriyakumaran Kousihan
      First page: 158
      Abstract: In Germany, the medical device industry is a cornerstone of the health sector. In this study, we investigated the challenges and factors affecting the present-day performance of German SMEs concerned with medical devices. The study adopted a cross-sectional, correlational research design with simple random sampling; data were obtained from 110 mid-level and senior managers in German SMEs through an online structured survey in August 2022. We statistically validated the study data using exploratory factor analysis (EFA), the Kaiser–Meyer–Olkin (KMO) test, and Bartlett’s test to assess the relationships between study variables and measure data adequacy using the R4.1.1(21) software, and then carried out principal component analysis (PCA) with varimax rotation, extracting six factors for use as research variables. The researchers also applied descriptive data analysis techniques in SPSS 21. The main study variables were: (1) the business performance of small and medium businesses (SMP); (2) their financial situation (SMEF); and (3) their implementation of the new medical device industry regulations (MDR). These statistical analyses confirmed poorer business performance and lower anticipated growth among SMEs affected by the MDR, over and above the impacts of the present-day economic situation. The data can be used by management information systems (MIS) and decision support system professionals to plan and develop practical models for coping with current industry challenges. We recommend further research involving inferential analysis and triangulation of these data through a semi-structured qualitative study covering a larger population and different sectors.
      Citation: Data
      PubDate: 2022-11-10
      DOI: 10.3390/data7110158
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 159: Stance Classification of Social Media Texts for
           Under-Resourced Scenarios in Social Sciences

    • Authors: Victoria Yantseva, Kostiantyn Kucher
      First page: 159
      Abstract: In this work, we explore the performance of supervised stance classification methods for social media texts in under-resourced languages with limited amounts of labeled data. In particular, we focus on the possibilities and limitations of applying classic machine learning versus deep learning in the social sciences. To this end, we use a training dataset of 5.7K messages posted on Flashback Forum, a Swedish discussion platform, supplemented with the previously published ABSAbank-Imm annotated dataset, and evaluate various model parameters and configurations to achieve the best training results given the character of the data. Our experiments indicate that classic machine learning models achieve results on par with, or even better than, those of neural networks and could thus be given priority when considering machine learning approaches for similar knowledge domains, tasks, and data. At the same time, modern pre-trained language models provide useful and convenient pipelines for obtaining vectorized data representations that can be combined with classic machine learning algorithms. We discuss the implications of their use in such scenarios and outline directions for further research.
      Citation: Data
      PubDate: 2022-11-13
      DOI: 10.3390/data7110159
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 160: Explainable Machine Learning for Financial
           Distress Prediction: Evidence from Vietnam

    • Authors: Kim Long Tran, Hoang Anh Le, Thanh Hien Nguyen, Duc Trung Nguyen
      First page: 160
      Abstract: The past decade has witnessed the rapid development of machine learning applied to economics and finance. Recent evidence suggests that machine learning models have produced results superior to those of traditional statistical models and have become the driving force behind dramatic improvements in the financial industry. However, a much-debated question is whether the predictions of black-box machine learning models can be interpreted. In this study, we compared the predictive power of machine learning algorithms and applied SHAP values to interpret the predictions on a dataset of companies listed in Vietnam from 2010 to 2021. The results showed that the extreme gradient boosting and random forest models outperformed the other models. In addition, based on Shapley values, we found that long-term debt to equity, enterprise value to revenues, accounts payable to equity, and diluted EPS strongly influenced the outputs. In terms of practical contributions, the study offers credit rating companies a new method for predicting the likelihood of default of bond issuers in the market. The study also provides policymakers with an early warning tool for the risks of public companies, helping them develop measures to protect retail investors against the risk of bond default.
      Citation: Data
      PubDate: 2022-11-14
      DOI: 10.3390/data7110160
      Issue No: Vol. 7, No. 11 (2022)
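SHAP values are Shapley values from cooperative game theory. A feature's attribution is its marginal contribution to the prediction, averaged over all coalitions of the other features. The SHAP library approximates this efficiently and averages over a background dataset. The brute-force sketch below is a deliberate simplification: it uses a single baseline input, and features outside a coalition take their baseline values. It is exponential in the number of features, and the linear `score` model, its weights, and the input values are hypothetical illustrations, not the authors' model.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline input.

    Features outside a coalition are replaced by their baseline values
    (a simplifying assumption). Cost grows as 2**n, so this is only
    practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical linear distress score over four ratio features, e.g.
# [debt_to_equity, ev_to_revenue, payables_to_equity, diluted_eps].
def score(z):
    return 0.5 * z[0] + 0.2 * z[1] - 0.3 * z[2] + 0.1 * z[3]

x = [2.0, 1.0, 0.5, 3.0]
baseline = [1.0, 1.0, 1.0, 1.0]
print(shapley_values(score, x, baseline))
```

For a linear model, each attribution reduces to weight × (x − baseline). In every case, the attributions sum to `score(x) - score(baseline)`, which is the efficiency property that makes Shapley values attractive for explaining black-box predictions.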
       
  • Data, Vol. 7, Pages 161: Dataset: Coleoptera (Insecta) Collected from Beer
           Traps in “Smolny” National Park (Russia)

    • Authors: Alexander B. Ruchin, Leonid V. Egorov, Oleg N. Artaev, Mikhail N. Esin
      First page: 161
      Abstract: Monitoring Coleoptera diversity in protected areas is part of the global ecological monitoring of the state of ecosystems. The purpose of this research is to describe the biodiversity of Coleoptera sampled with baits based on fermented substrates in the European part of Russia (Smolny National Park). The research was conducted from April to August in 2018–2022. Samples were collected in traps of our own design. Beer or wine with added sugar, honey, or jam was used as bait. A total of 194 traps were installed. The dataset contains 1254 occurrences. A total of 9226 Coleoptera specimens were studied. The dataset contains information about 134 species from 24 Coleoptera families. The families with the largest numbers of species found in the traps were Cerambycidae (30 species), Nitidulidae (14 species), Elateridae (12 species), and Curculionidae and Coccinellidae (10 species each). The numbers of individuals caught per family were as follows: Cerambycidae, 1018 specimens; Nitidulidae, 5359; Staphylinidae, 241; Elateridae, 33; Curculionidae, 148; and Coccinellidae, 19. The 10 dominant species accounted for 90.7% of all specimens detected in the traps. The maximum species diversity and abundance of Coleoptera were recorded in 2021. Although the largest number of traps (64) was installed in 2022, in more diverse biotopes, fewer species were caught than in 2021. New populations of the following rare Coleoptera were found: Calosoma sycophanta, Elater ferrugineus, Osmoderma barnabita, Protaetia speciosissima, and Protaetia fieberi.
      Citation: Data
      PubDate: 2022-11-15
      DOI: 10.3390/data7110161
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 162: Methodology for the Surveillance the Voltage
           Supply in Public Buildings Using the ITIC Curve and Python Programming

    • Authors: Javier Fernández-Morales, Juan-José González-de-la-Rosa, José-María Sierra-Fernández, Olivia Florencias-Oliveros, Paula Remigio-Carmona, Manuel-Jesús Espinosa-Gavira, Agustín Agüera-Pérez, José-Carlos Palomares-Salas
      First page: 162
      Abstract: This paper proposes an easy-to-implement method for detecting and assessing two of the most frequent PQ (Power Quality) problems: voltage sags and swells. These can affect sensitive equipment such as computers, programmable logic controllers, and contactors. Therefore, it is of great interest to implement such a method in any laboratory, not only for protection but also as a safeguard for claims against the supply company. In the current context, in which it is possible to manage large volumes of data and connect multiple devices through the IoT (Internet of Things), it is feasible and of great interest to monitor the voltage at specific points of the network. This makes it possible to detect voltage sags and swells and to diagnose which points are more prone to these problems. To detect sags and swells, a program written in Python crawls all the files in the database and flags the RMS values that fall outside the established limits. LabVIEW might have been the most logical alternative, since the acquisition hardware comes from the same company (National Instruments); however, Python offers higher computational performance and, unlike LabVIEW, is free of charge. Python's libraries also allow hardware control close to what is possible with LabVIEW. The ITIC (Information Technology Industry Council) power acceptability curve, implemented in MATLAB, shows the impact of these power quality disturbances on electrical power systems. The results showed that the combination of Python and MATLAB performed well on a conventional desktop computer.
      Citation: Data
      PubDate: 2022-11-17
      DOI: 10.3390/data7110162
      Issue No: Vol. 7, No. 11 (2022)
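The detection step described above flags RMS values outside established limits, but the abstract does not give those limits or the file format. A minimal sketch of that step follows, under several assumptions. It computes the RMS value of each complete cycle of a sampled waveform (non-overlapping windows), converts it to per-unit of the nominal voltage, and labels values below 0.9 pu as sags and above 1.1 pu as swells. These are commonly used thresholds, assumed here rather than taken from the paper. The ITIC curve itself, which also weighs event duration, is not modeled. The helper names and the synthetic 230 V waveform are hypothetical.

```python
import math

def cycle_rms(samples, samples_per_cycle):
    """RMS value of each complete, non-overlapping cycle of a sampled waveform."""
    rms = []
    for start in range(0, len(samples) - samples_per_cycle + 1, samples_per_cycle):
        window = samples[start:start + samples_per_cycle]
        rms.append(math.sqrt(sum(v * v for v in window) / samples_per_cycle))
    return rms

def classify_events(rms_values, nominal, sag_limit=0.9, swell_limit=1.1):
    """Label each RMS value 'sag', 'swell' or 'normal' in per-unit terms.

    sag_limit and swell_limit are assumed thresholds, not the paper's limits.
    """
    labels = []
    for value in rms_values:
        pu = value / nominal
        if pu < sag_limit:
            labels.append("sag")
        elif pu > swell_limit:
            labels.append("swell")
        else:
            labels.append("normal")
    return labels

# Synthetic 230 V RMS sine, 64 samples per cycle, with one cycle scaled to
# 0.7 (a sag) and one to 1.2 (a swell).
N = 64
nominal = 230.0
peak = nominal * math.sqrt(2)
scales = [1.0, 1.0, 0.7, 1.2, 1.0]
wave = [s * peak * math.sin(2 * math.pi * n / N) for s in scales for n in range(N)]
rms = cycle_rms(wave, N)
print(classify_events(rms, nominal))  # ['normal', 'normal', 'sag', 'swell', 'normal']
```

In a crawling program like the one described, `cycle_rms` and `classify_events` would run over each stored file. The flagged events, with their durations, would then be plotted against the ITIC curve.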
       
  • Data, Vol. 7, Pages 163: An Analysis by State on The Effect of Movement
           Control Order (MCO) 3.0 Due to COVID-19 on Malaysians’ Mental
           Health: Evidence from Google Trends

    • Authors: Nicholas Tze Ping Pang, Assis Kamu, Chong Mun Ho, Walton Wider, Mathias Wen Leh Tseu
      First page: 163
      Abstract: The significant social and economic upheavals brought on by the COVID-19 pandemic have caused a great deal of psychological distress. Google Trends data have been used as a proxy measure of population-wide trends by observing trends in search results. Judicious analysis of Google Trends data can offer both analytical and predictive capacities. This study aimed to compare nationwide and inter-state trends in mental health before and after the Malaysian Movement Control Order 3.0 (MCO 3.0), which commenced on 12 May 2021. This was done by assessing two terms, “stress” and “sleep”, in both Malay and English. Google Trends daily data between 6 March and 31 May in both 2019 and 2021 were obtained, and both series were rescaled to be comparable. Searches before and after MCO 3.0 in 2021 were compared with searches before and after the same date in 2019 using the difference-in-differences (DiD) method. This ensured that seasonal variations between states were not the source of our findings. We found that the DiD estimates β_3 for “sleep” and “stress” were not significantly different from zero, implying that MCO 3.0 had no effect on psychological distress in most states. Johor was the only state where the DiD estimate β_3 was significantly different from zero for the search topic ‘Tidur’ (‘sleep’). For the topic ‘Tekanan’ (‘stress’), two states had significant DiD estimates β_3, namely Penang and Sarawak. This study hence demonstrates that there are particular state-level differences in Google Trends search terms, which indicate the states in which to prioritise interventions and increase mental health surveillance. In conclusion, Google Trends is a powerful tool for examining larger population-based trends, especially for monitoring public health parameters such as population-level psychological distress, which can facilitate interventions.
      Citation: Data
      PubDate: 2022-11-17
      DOI: 10.3390/data7110163
      Issue No: Vol. 7, No. 11 (2022)
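The DiD design above compares the before/after change in 2021 (the MCO 3.0 year) with the before/after change around the same calendar date in 2019. In the 2x2 case, the interaction coefficient β_3 of the regression y = β_0 + β_1·treated + β_2·post + β_3·(treated·post) + ε, fitted by OLS, equals the difference of the two mean changes. A minimal sketch follows; the search-interest values are made up, and no significance test is included (the study's tests would additionally need standard errors for β_3).

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences estimate of the treatment effect.

    Equals the interaction coefficient beta_3 in
    y = beta_0 + beta_1*treated + beta_2*post + beta_3*(treated*post) + error
    when that regression is fitted by OLS on the same four groups.
    """
    return ((mean(treated_post) - mean(treated_pre))
            - (mean(control_post) - mean(control_pre)))

# Hypothetical daily search interest for one state and one term.
# "treated" = 2021 series, "control" = 2019 series; "post" = after 12 May.
treated_pre, treated_post = [50, 52, 48], [60, 62, 58]
control_pre, control_post = [45, 47, 43], [50, 52, 48]
print(did_estimate(treated_pre, treated_post, control_pre, control_post))
```

Subtracting the 2019 change removes whatever seasonal shift occurs around mid-May every year. That is why the design guards against seasonal variation being mistaken for an MCO 3.0 effect.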
       
  • Data, Vol. 7, Pages 164: CoviRx: A User-Friendly Interface for Systematic
           Down-Selection of Repurposed Drug Candidates for COVID-19

    • Authors: Hardik A. Jain, Vinti Agarwal, Chaarvi Bansal, Anupama Kumar, Faheem Faheem, Muzaffar-Ur-Rehman Mohammed, Sankaranarayanan Murugesan, Moana M. Simpson, Avinash V. Karpe, Rohitash Chandra, Christopher A. MacRaild, Ian K. Styles, Amanda L. Peterson, Matthew A. Cooper, Carl M. J. Kirkpatrick, Rohan M. Shah, Enzo A. Palombo, Natalie L. Trevaskis, Darren J. Creek, Seshadri S. Vasan
      First page: 164
      Abstract: Although various vaccines are now commercially available, they have not been able to stop the spread of COVID-19 infection completely. An excellent strategy for obtaining safe, effective, and affordable COVID-19 treatments quickly is to repurpose drugs that are already approved for other diseases. Developing an accurate and standardized drug repurposing dataset requires considerable resources and expertise because of the numerous commercially available drugs that could potentially be used to address SARS-CoV-2 infection. To address this bottleneck, we created the CoviRx.org platform. CoviRx is a user-friendly interface that allows the analysis and filtering of large quantities of data that are onerous to curate manually for COVID-19 drug repurposing. Through CoviRx, the curated data have been made open source to help combat the ongoing pandemic, and users are encouraged to submit their findings on the drugs they have evaluated in a uniform format that can be validated and checked for integrity by authenticated volunteers. This article discusses the various features of CoviRx, its design principles, and how its functionality is independent of the data it displays. Thus, in the future, this platform can be extended to diseases other than COVID-19.
      Citation: Data
      PubDate: 2022-11-18
      DOI: 10.3390/data7110164
      Issue No: Vol. 7, No. 11 (2022)
       
  • Data, Vol. 7, Pages 165: Density-Based Unsupervised Learning Algorithm to
           Categorize College Students into Dropout Risk Levels

    • Authors: Miguel Angel Valles-Coral, Luis Salazar-Ramírez, Richard Injante, Edwin Augusto Hernandez-Torres, Juan Juárez-Díaz, Jorge Raul Navarro-Cabrera, Lloy Pinedo, Pierre Vidaurre-Rojas
      First page: 165
      Abstract: Compliance with the basic conditions of quality in higher education implies designing strategies to reduce student dropout, and Information and Communication Technologies (ICT) in the educational field have made it possible to direct, reinforce, and consolidate the process of professional academic training. We propose an academic and emotional tracking model that uses data mining and machine learning to group university students according to their level of dropout risk. We worked with 670 students from a Peruvian public university, administered 5 valid and reliable psychological assessment questionnaires to them through a chatbot-based system, and then grouped them using 3 unsupervised learning algorithms: the density-based DBSCAN and HDBSCAN and the centroid-based K-Means. The results showed that HDBSCAN was the most robust option, obtaining better scores on two of the three internal validity indices evaluated: a Silhouette index of 0.6823, a Davies–Bouldin index of 0.6563, and a Calinski–Harabasz index of 369.6459. The best number of clusters according to the internal indices was five. For validation with external indices, using answers from mental health professionals, we obtained high scores: F-measure, 90.9%; purity, 94.5%; V-measure, 86.9%; and ARI, 86.5%. These results indicate the robustness of the proposed model, which categorizes university students into five levels of dropout risk.
      Citation: Data
      PubDate: 2022-11-18
      DOI: 10.3390/data7110165
      Issue No: Vol. 7, No. 11 (2022)
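Of the internal indices reported above, the Silhouette index is the easiest to state. For each point, a is its mean distance to the other members of its own cluster, and b is the smallest mean distance to the members of any other cluster. The point's score is (b − a) / max(a, b), and the index is the mean over all points. Values near 1 indicate compact, well-separated clusters. A from-scratch sketch with Euclidean distance follows; the helper name and the toy points are hypothetical. How DBSCAN/HDBSCAN noise points (often labeled −1) should be treated is a choice the abstract does not specify, and this sketch simply treats every label as a cluster.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    Points in singleton clusters get s = 0, following the usual convention.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # Mean distance from p to every other member of each cluster.
        mean_dist = {}
        for c in clusters:
            members = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(math.dist(p, q) for q in members) / len(members)
        own = labels[i]
        if own not in mean_dist:  # p is alone in its cluster
            scores.append(0.0)
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Two tight, well-separated toy clusters give a score close to 1.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = [0, 0, 1, 1]
print(round(silhouette_score(points, labels), 4))
```

The pairwise loop is O(n²), which is fine for a cohort of a few hundred students such as the 670 in this study.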
       
  • Data, Vol. 7, Pages 166: Forecasting Daily COVID-19 Case Counts Using
           Aggregate Mobility Statistics

    • Authors: Bulut Boru, M. Emre Gursoy
      First page: 166
      Abstract: The COVID-19 pandemic has impacted the whole world profoundly. For managing the pandemic, the ability to forecast daily COVID-19 case counts would bring considerable benefit to governments and policymakers. In this paper, we propose to leverage aggregate mobility statistics collected from Google’s Community Mobility Reports (CMRs) toward forecasting future COVID-19 case counts. We utilize features derived from the amount of daily activity in different location categories such as transit stations versus residential areas based on the time series in CMRs, as well as historical COVID-19 daily case and test counts, in forecasting future cases. Our method trains optimized regression models for different countries based on dynamic and data-driven selection of the feature set, regression type, and time period that best fit the country under consideration. The accuracy of our method is evaluated on 13 countries with diverse characteristics. Results show that our method’s forecasts are highly accurate when compared to the real COVID-19 case counts. Furthermore, visual analysis shows that the peaks, plateaus and general trends in case counts are also correctly predicted by our method.
      Citation: Data
      PubDate: 2022-11-20
      DOI: 10.3390/data7110166
      Issue No: Vol. 7, No. 11 (2022)
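The abstract describes a data-driven, per-country selection of features, regression type, and time period. As a minimal sketch of just one ingredient of such a selection, the code below picks the delay between a single mobility feature and later case counts by fitting a simple least-squares line for each candidate lag and keeping the lag with the lowest error on held-out days. The function names, the synthetic series, and the use of plain one-variable OLS are assumptions for illustration, not the authors' method.

```python
from statistics import mean


def fit_ols(x, y):
    """Closed-form simple linear regression y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def lagged_pairs(feature, target, lag):
    """Pair feature[t] with target[t + lag]."""
    return feature[: len(feature) - lag], target[lag:]


def select_lag(feature, target, lags, n_test):
    """Return (lag, mae, intercept, slope) with the lowest mean absolute error on the last n_test pairs."""
    best = None
    for lag in lags:
        x, y = lagged_pairs(feature, target, lag)
        x_train, y_train = x[:-n_test], y[:-n_test]
        x_test, y_test = x[-n_test:], y[-n_test:]
        a, b = fit_ols(x_train, y_train)
        mae = mean(abs(a + b * xi - yi) for xi, yi in zip(x_test, y_test))
        if best is None or mae < best[1]:
            best = (lag, mae, a, b)
    return best


# Synthetic illustration: "cases" follow the mobility feature with a 3-day delay.
mobility = [(i * 7) % 11 + 0.5 * i for i in range(60)]
cases = [0.0] * 3 + [2 * mobility[t - 3] + 5 for t in range(3, 60)]
lag, mae, intercept, slope = select_lag(mobility, cases, range(0, 8), n_test=10)
```

Holding out the most recent days (rather than a random subset) keeps the evaluation honest for a forecasting task, since the model never sees data from after the period it is scored on.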
       
  • Data, Vol. 7, Pages 167: Database of Metagenomes of Sediments from
           Estuarine Aquaculture Farms in Portugal—AquaRAM Project Collection

    • Authors: Teresa Nogueira, Daniel G. Silva, Susana Lopes, Ana Botelho
      First page: 167
      Abstract: Aquaculture farms and estuarine environments close to human activities play a critical role in the interaction between aquatic and terrestrial surroundings and in animal and human health. The AquaRAM project aimed to study estuarine aquaculture farms in Portugal as reservoirs of antibiotic resistance genes and the potential spread of these genes via mobile genetic elements. We have assembled a collection of metagenomic data from 30 sediment samples from oyster, mussel, and gilt-head sea bream aquaculture farms. This collection includes samples from the estuarine environments of three rivers and one lagoon located from the north to the south of Portugal, namely, the Lima River in Viana do Castelo, Aveiro Lagoon in Aveiro, the Tagus River in Alcochete, and the Sado River in Setúbal. Statistics from the raw metagenome files, as well as the file sizes of the assembled nucleotide and protein sequences, are also presented. Links to the statistics and to the download page for all the metagenomes are also provided in the article.
      Citation: Data
      PubDate: 2022-11-20
      DOI: 10.3390/data7110167
      Issue No: Vol. 7, No. 11 (2022)
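The abstract mentions statistics computed from the raw metagenome files but this listing does not say which ones. As a sketch, assuming the raw reads are in the standard four-line FASTQ layout, the following computes a few commonly reported summaries (read count, total bases, read-length range, GC fraction). The function name and toy records are hypothetical.

```python
def fastq_stats(lines):
    """Summary statistics for FASTQ records given as an iterable of text lines.

    Assumes the standard four-line layout: header, sequence, '+', quality.
    """
    n_reads = total_bases = gc = 0
    min_len, max_len = None, 0
    for i, line in enumerate(lines):
        if i % 4 != 1:  # only the second line of each record holds the sequence
            continue
        seq = line.strip().upper()
        n_reads += 1
        total_bases += len(seq)
        gc += seq.count("G") + seq.count("C")
        min_len = len(seq) if min_len is None else min(min_len, len(seq))
        max_len = max(max_len, len(seq))
    return {
        "reads": n_reads,
        "bases": total_bases,
        "min_len": min_len,
        "max_len": max_len,
        "mean_len": total_bases / n_reads if n_reads else 0.0,
        "gc_fraction": gc / total_bases if total_bases else 0.0,
    }


# Two toy records; a real file could be streamed with open(path) or gzip.open(path, "rt").
example = ["@r1", "ACGTGC", "+", "IIIIII", "@r2", "AATT", "+", "IIII"]
stats = fastq_stats(example)
```

Because the function consumes lines one at a time, it can stream multi-gigabyte files without loading them into memory.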
       
  • Data, Vol. 7, Pages 135: RIFIS: A Novel Rice Field Sidewalk Detection
           Dataset for Walk-Behind Hand Tractor

    • Authors: Padma Nyoman Crisnapati, Dechrit Maneetham
      First page: 135
      Abstract: Rice field sidewalk (RIFIS) identification plays a crucial role in enhancing the performance of agricultural computer applications, especially for rice farming, by dividing the image into the rice field areas to be ploughed and the areas outside the rice fields. This division isolates the desired area and reduces the computational cost of RIFIS detection when automating field ploughing with hand tractors. Testing and evaluating the performance of RIFIS detection methods requires a collection of image data that covers the various features of the rice field environment. However, the available agricultural image datasets focus only on rice plants and their diseases; we found no dataset that explicitly provides RIFIS imagery. This study presents a RIFIS image dataset that addresses this gap by including the sidewalks' specific linear characteristics. Two geographically separated rice fields in Bali, Indonesia, were selected. The data were first recorded as several videos, which were then converted into image sequences, and RIFIS annotations were applied to the images manually. The resulting dataset consists of 970 high-definition RGB images (1920 × 1080 pixels) with corresponding annotations and covers a combination of 19 different features. Detection methods developed with the dataset can be applied not only at rice-planting time but also at harvest time, so the dataset can be used for a variety of applications throughout the entire year.
      Citation: Data
      PubDate: 2022-09-25
      DOI: 10.3390/data7100135
      Issue No: Vol. 7, No. 10 (2022)
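To give a sense of scale for 970 images at 1920 × 1080 pixels, the uncompressed size can be worked out directly. This assumes 8 bits per RGB channel; the listing does not state the file format, so actual file sizes on disk will differ. The helper name is hypothetical.

```python
def raw_rgb_bytes(width, height, n_images, channels=3, bytes_per_channel=1):
    """Uncompressed size in bytes of n_images frames (8-bit channels by default)."""
    return width * height * channels * bytes_per_channel * n_images


per_image = raw_rgb_bytes(1920, 1080, 1)  # 1920 * 1080 * 3
total = raw_rgb_bytes(1920, 1080, 970)
print(per_image)  # 6220800
print(total)      # 6034176000
```

That is about 6.2 MB per frame and about 6.03 GB for the full set before any compression, which is worth keeping in mind when deciding whether to load the whole dataset into memory for training.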
       
  • Data, Vol. 7, Pages 136: Full-Body Mobility Data to Validate Inertial
           Measurement Unit Algorithms in Healthy and Neurological Cohorts

    • Authors: Elke Warmerdam, Clint Hansen, Robbin Romijnders, Markus A. Hobert, Julius Welzel, Walter Maetzler
      First page: 136
      Abstract: Gait and balance dysfunctions are common in neurological disorders and have a negative effect on quality of life. Regular quantification of these mobility limitations can be used to measure disease progression and the effect of treatment, and this information can support more individualized treatment. Inertial measurement units (IMUs) can be used to quantify mobility in different contexts. However, algorithms are required to extract valuable parameters from the raw IMU data, and these algorithms need to be validated to confirm that they extract the intended features correctly. This validation should be performed per disease, since different mobility limitations or symptoms can influence the performance of an algorithm in different ways. Therefore, this dataset contains data from both healthy subjects and patients with neurological diseases (Parkinson’s disease, stroke, multiple sclerosis, chronic low back pain). The full bodies of 167 subjects were measured with IMUs and an optical motion capture (reference) system. Subjects performed multiple standardized mobility assessments and non-standardized activities of daily living. The data of 21 healthy subjects are shared online; data of the other subjects and patients can be obtained only after contacting the corresponding author and signing a data sharing agreement.
      Citation: Data
      PubDate: 2022-09-27
      DOI: 10.3390/data7100136
      Issue No: Vol. 7, No. 10 (2022)
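Because the dataset pairs IMU recordings with an optical motion capture reference, validating an algorithm usually comes down to comparing its output with the reference value for the same event. One common summary is the mean absolute error together with Bland–Altman bias and 95% limits of agreement. The sketch below assumes parallel lists of per-trial values; the function name and the stride-length numbers are made up for illustration and do not come from the dataset.

```python
from statistics import mean, stdev


def agreement(algorithm, reference):
    """Mean absolute error, bias, and 95% Bland-Altman limits of agreement."""
    diffs = [a - r for a, r in zip(algorithm, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return {
        "mae": mean(abs(d) for d in diffs),
        "bias": bias,
        "loa_low": bias - 1.96 * sd,
        "loa_high": bias + 1.96 * sd,
    }


# Illustrative stride lengths in metres: IMU algorithm vs. optical reference.
imu_estimate = [1.22, 1.31, 1.15, 1.40]
optical_reference = [1.20, 1.35, 1.10, 1.42]
result = agreement(imu_estimate, optical_reference)
```

Computing these summaries separately for each cohort (healthy, Parkinson’s disease, stroke, and so on) is one way to act on the abstract's point that validation should be performed per disease.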
       
  • Data, Vol. 7, Pages 137: Cheating, Trust and Social Norms: Data from
           Germany, Vietnam, China, Taiwan, and Japan

    • Authors: Toan L. D. Huynh, Marc Oliver Rieger, Mei Wang, David Berens, Duy-Linh Bui, Hung-Ling Chen, Tobias Peter Emering, Sen Geng, Yang Liu-Gerhards, Thomas Neumann, Thanh Dac Nguyen, Thong Trung Nguyen, Diefeng Peng, Thuy Chung Phan, Denis Reinhardt, Junyi Shen, Hiromasa Takahashi, Bodo Vogt
      First page: 137
      Abstract: The data presented here contain experimental measures of cheating behavior and self-reported general attitudes towards honesty-related social norms and trust, together with individual-level demographic variables. Our sample included 493 university students in five countries, namely, Germany, Vietnam, Taiwan, China, and Japan. Participants received monetary incentives based on their performance in a matrix task. The participants also answered a survey questionnaire. The dataset is valuable for academic researchers in sociology, psychology, and economics who are interested in honesty, norms, and cultural differences.
      Citation: Data
      PubDate: 2022-09-28
      DOI: 10.3390/data7100137
      Issue No: Vol. 7, No. 10 (2022)
       
  • Data, Vol. 7, Pages 138: Consumer Perceptions towards Unsolicited
           Advertisements on Social Media

    • Authors: Romano, Han
      First page: 138
      Abstract: The practice of unsolicited advertising on social media has become prevalent. This data article presents the perceptions of 837 US-based social media users regarding such advertisements. Understanding how consumers perceive unsolicited advertising is vital to developing effective digital marketing strategies. Data were collected via an online survey that adopted multi-item measures from existing studies to support reliability and validity. Cronbach’s alpha showed high internal consistency, and confirmatory factor analysis (CFA) found the measurement model valid. Goodness-of-fit indices showed a good fit to the data. Finally, convergent and discriminant validity were confirmed using composite reliability, average variance extracted (AVE), and correlations among constructs. Further research may apply inferential analysis techniques to these data to add to our understanding of consumer perceptions of unsolicited advertising on social media.
      Citation: Data
      PubDate: 2022-10-01
      DOI: 10.3390/data7100138
      Issue No: Vol. 7, No. 10 (2022)
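The reliability and validity checks named in this abstract have standard closed forms. The sketch below computes Cronbach's alpha from raw item scores, and composite reliability (CR) and AVE from standardized CFA loadings, assuming uncorrelated measurement errors. It also includes the Fornell–Larcker comparison, one common way of using AVE and inter-construct correlations for discriminant validity (the abstract does not name the exact criterion used). All names, scores, and loadings are illustrative.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, with respondents in the same order in each list."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def composite_reliability(loadings):
    """CR from standardized loadings; each error variance is taken as 1 - loading**2."""
    s = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + error)


def average_variance_extracted(loadings):
    """Mean squared standardized loading."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def fornell_larcker_ok(ave_a, ave_b, corr_ab):
    """Discriminant validity holds if sqrt(AVE) of both constructs exceeds their correlation."""
    return sqrt(ave_a) > abs(corr_ab) and sqrt(ave_b) > abs(corr_ab)


# Made-up 5-point Likert responses from five respondents to three items.
q1 = [4, 5, 3, 4, 2]
q2 = [4, 4, 3, 5, 2]
q3 = [5, 5, 2, 4, 3]
alpha = cronbach_alpha([q1, q2, q3])
cr = composite_reliability([0.8, 0.8, 0.8])
ave = average_variance_extracted([0.8, 0.8, 0.8])
```

In practice the loadings come from the CFA output; the functions only show how the reported statistics are derived from them.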
       
  • Data, Vol. 7, Pages 139: Technology Transfer from Nordic Capital Parenting
           Companies to Lithuanian and Estonian Subsidiaries or Joint Capital
           Companies: The Analysis of the Obtained Primary Data

    • Authors: Agnė Šimelytė, Manuela Tvaronavičienė
      First page: 139
      Abstract: Scientific literature describes various factors that influence knowledge transfer and its successful adoption, assimilation, transformation, and exploitation. These four components are mostly related to the absorptive capacity of the company. However, further factors influence both the development of innovations or patents and an inability to use external and internal information (knowledge). The use of external knowledge is often associated with previous experience, or even with the company's attitude towards investing in innovation or developing patents. Thus, companies might be divided into innovators and imitators. The research addresses several questions. What external factors influence knowledge transfer and the further development of innovation? What factors influence absorptive capacity? What factors are essential in cooperation and knowledge transfer to switch from a linear to a circular economy? Data were collected by computer-assisted telephone interviewing. The survey was addressed to subsidiaries, joint companies, Lithuanian-Nordic and Estonian-Nordic capital companies, and companies in close collaboration with the Nordic countries. A total of 158 companies from Estonia and Lithuania agreed to answer all the questions. The surveyed companies vary in size and age and come from different business sectors. Reliability was assessed by estimating Cronbach’s alpha. The Kaiser–Meyer–Olkin (KMO) test was used to check whether the data were suitable for principal component analysis (PCA), and PCA was then performed to reduce the variables to a smaller number of extracted components. Each row of the component matrix defines a linear composite of the component scores that gives the expected value of the associated variable. The dataset may be used to explore interlinkages among the research questions above, and the results of introducing innovation, company size, and company age might be used as control variables.
The article aims to analyze the factors that determine innovation development, and their interlinkages, when technology is transferred from Nordic parent companies to their subsidiaries. The article’s results contribute to the interdisciplinary field of knowledge transfer, innovation, and internationalization.
      Citation: Data
      PubDate: 2022-10-14
      DOI: 10.3390/data7100139
      Issue No: Vol. 7, No. 10 (2022)
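To make the PCA step more concrete, the sketch below builds a Pearson correlation matrix from survey items and extracts the first principal component by power iteration. The loadings are the eigenvector scaled by the square root of the eigenvalue, and the share of variance explained is the eigenvalue divided by the number of items. This is a bare-bones illustration under stated assumptions (no constant columns, a unique largest eigenvalue). A real analysis, like the one described, would use a statistics package, extract several components, and run the KMO check, which is not shown here. All names and item scores are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long, non-constant variable columns."""
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(v - m) / s for v in col])
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def first_component(matrix, iterations=500):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix by power iteration."""
    k = len(matrix)
    vec = [1.0] * k
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    eigenvalue = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k))
    return eigenvalue, vec


# Made-up scores of six companies on three survey items.
item_a = [1, 2, 3, 4, 5, 6]
item_b = [2, 3, 3, 5, 6, 6]
item_c = [1, 1, 2, 2, 3, 3]
corr = correlation_matrix([item_a, item_b, item_c])
eigenvalue, vector = first_component(corr)
loadings = [v * sqrt(eigenvalue) for v in vector]
explained = eigenvalue / len(corr)  # the trace of a correlation matrix equals the number of items
```

Working from the correlation matrix rather than the covariance matrix puts items measured on different scales on an equal footing, which is the usual choice for questionnaire data.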
       
  • Data, Vol. 7, Pages 140: Experimental Data on Solubility of the Two
           Calcium Sulfates Gypsum and Anhydrite in Aqueous Solutions

    • Authors: Reza Taherdangkoo, Miaomiao Tian, Ali Sadighi, Tao Meng, Huichen Yang, Christoph Butscher
      First page: 140
      Abstract: Calcium sulfate exists in three forms, namely dihydrate or gypsum (CaSO4·2H2O), anhydrite (CaSO4), and hemihydrate or bassanite (CaSO4·0.5H2O), depending on temperature, pressure, pH, and formation conditions. Calcium sulfates form widely in nature and in many engineering settings. Herein, a dataset containing experimental solubility data for the calcium sulfate minerals gypsum and anhydrite in aqueous solutions is presented. The compiled dataset contains calcium sulfate solubility values extracted from 42 papers published between 1906 and 2019. The dataset can be used for various scientific and engineering purposes, such as environmental applications (e.g., gas treatment, wastewater treatment, and chemical disposal), geotechnical applications (e.g., clay-sulfate rock swelling), separation processes (e.g., crystallization, extractive distillation, and seawater desalination), and electrochemical processes (e.g., corrosion and electrolysis).
      Citation: Data
      PubDate: 2022-10-16
      DOI: 10.3390/data7100140
      Issue No: Vol. 7, No. 10 (2022)
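When solubility values need to be put on a common basis, for example mass-based values converted to molality (the listing does not say which units the dataset uses), the molar masses of the three calcium sulfate forms are the key constants. The sketch below derives them from rounded standard atomic weights and converts an illustrative mass concentration to molality. The names and the 2.0 g value are assumptions for illustration only, not data from the dataset.

```python
# Rounded standard atomic weights in g/mol.
ATOMIC_WEIGHT = {"Ca": 40.078, "S": 32.06, "O": 15.999, "H": 1.008}


def molar_mass(ca, s, o, h):
    """Molar mass (g/mol) from atom counts of Ca, S, O, and H."""
    w = ATOMIC_WEIGHT
    return ca * w["Ca"] + s * w["S"] + o * w["O"] + h * w["H"]


M_ANHYDRITE = molar_mass(1, 1, 4, 0)    # CaSO4, about 136.13 g/mol
M_GYPSUM = molar_mass(1, 1, 6, 4)       # CaSO4·2H2O, about 172.16 g/mol
M_BASSANITE = molar_mass(1, 1, 4.5, 1)  # CaSO4·0.5H2O, about 145.14 g/mol


def grams_per_kg_to_molality(grams_per_kg_water, molar_mass_g_per_mol):
    """mol of CaSO4 per kg of water; ignores the small amount of water released by hydrated forms."""
    return grams_per_kg_water / molar_mass_g_per_mol


illustrative_g_per_kg = 2.0  # illustrative number only, not taken from the dataset
molality = grams_per_kg_to_molality(illustrative_g_per_kg, M_GYPSUM)
as_anhydrous = illustrative_g_per_kg * M_ANHYDRITE / M_GYPSUM  # same amount expressed as CaSO4 mass
```

Expressing everything in mol of CaSO4 per kg of water makes gypsum and anhydrite measurements directly comparable, since the same dissolved amount corresponds to different masses of the two solids.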
       
  • Data, Vol. 7, Pages 141: Heartprint: A Dataset of Multisession ECG Signal
           with Long Interval Captured from Fingers for Biometric Recognition

    • Authors: Md Saiful Islam, Haikel Alhichri, Yakoub Bazi, Nassim Ammour, Naif Alajlan, Rami M. Jomaa
      First page: 141
      Abstract: The electrocardiogram (ECG) signal produced by the human heart is an emerging biometric modality that, with the support of machine learning techniques, can play an important role in next-generation identity recognition. One of the major obstacles to progress in this modality is the lack of public datasets with a long interval between data acquisition sessions, which is needed to verify the uniqueness and permanence of the biometric signature of a subject's heart. To address this issue, we present Heartprint, a large biometric database of multisession ECG signals comprising 1539 records captured from the fingers of 199 healthy subjects. Each record is 15 s long, and recordings were made in resting and reading conditions. They were collected in multiple sessions over ten years, and the average interval between the first session (S1) and the third session (S3L) was 1572.2 days. The dataset also covers several demographic classes, such as genders, ethnicities, and age groups. The combination of raw ECG signals and demographic information makes the Heartprint dataset, which is publicly available online, a valuable resource for the development and evaluation of biometric recognition algorithms.
      Citation: Data
      PubDate: 2022-10-21
      DOI: 10.3390/data7100141
      Issue No: Vol. 7, No. 10 (2022)
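Because Heartprint is intended for evaluating biometric recognition algorithms, it may help to see one metric commonly used for verification: the equal error rate (EER), the operating point where the false acceptance rate equals the false rejection rate. The sketch below approximates the EER by sweeping thresholds over the observed match scores. It assumes that higher scores mean a better match. The scores and the function name are illustrative, and this is not the evaluation protocol of the Heartprint authors.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping thresholds over all observed scores.

    A comparison is accepted when score >= threshold.
    """
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine pairs wrongly rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostor pairs wrongly accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Illustrative similarity scores: same-subject pairs vs. different-subject pairs.
genuine_scores = [0.6, 0.7, 0.8, 0.9]
impostor_scores = [0.1, 0.2, 0.65, 0.75]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

With a long-interval dataset, a natural use is to compare the EER when enrolment and test records come from the same session with the EER when they come from sessions years apart. That comparison directly probes the permanence question the abstract raises.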
       
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
 



JournalTOCs © 2009-