  Subjects -> SCIENCES: COMPREHENSIVE WORKS (Total: 374 journals)
Showing 1 - 200 of 265 Journals sorted alphabetically
AAS Open Research     Open Access  
Accountability in Research: Policies and Quality Assurance     Hybrid Journal   (Followers: 17)
Acta Materialia Transilvanica     Open Access  
Acta Nova     Open Access  
Acta Scientifica Malaysia     Open Access   (Followers: 1)
Acta Scientifica Naturalis     Open Access   (Followers: 2)
Adıyaman University Journal of Science     Open Access  
Advanced Science     Open Access   (Followers: 12)
Advanced Science, Engineering and Medicine     Partially Free   (Followers: 4)
Advanced Theory and Simulations     Hybrid Journal   (Followers: 2)
Advances in Research     Open Access  
Advances in Science and Technology     Full-text available via subscription   (Followers: 16)
African Journal of Science, Technology, Innovation and Development     Hybrid Journal   (Followers: 7)
Afrique Science : Revue Internationale des Sciences et Technologie     Open Access   (Followers: 1)
AFRREV STECH : An International Journal of Science and Technology     Open Access   (Followers: 3)
American Academic & Scholarly Research Journal     Open Access   (Followers: 4)
American Journal of Applied Sciences     Open Access   (Followers: 22)
American Journal of Humanities and Social Sciences     Open Access   (Followers: 11)
ANALES de la Universidad Central del Ecuador     Open Access   (Followers: 1)
Anales del Instituto de la Patagonia     Open Access  
Applied Mathematics and Nonlinear Sciences     Open Access  
Apuntes de Ciencia & Sociedad     Open Access  
Arab Journal of Basic and Applied Sciences     Open Access  
Arabian Journal for Science and Engineering     Hybrid Journal   (Followers: 1)
Archives Internationales d'Histoire des Sciences     Partially Free   (Followers: 5)
Archives of Current Research International     Open Access  
ARO. The Scientific Journal of Koya University     Open Access  
ARPHA Conference Abstracts     Open Access   (Followers: 1)
ARPHA Proceedings     Open Access  
ArtefaCToS : Revista de estudios sobre la ciencia y la tecnología     Open Access  
Asian Journal of Advanced Research and Reports     Open Access  
Asian Journal of Scientific Research     Open Access   (Followers: 2)
Asian Journal of Technology Innovation     Hybrid Journal   (Followers: 5)
Australian Field Ornithology     Full-text available via subscription   (Followers: 2)
Australian Journal of Social Issues     Hybrid Journal   (Followers: 6)
Avrasya Terim Dergisi     Open Access  
Bangladesh Journal of Scientific Research     Open Access  
Beni-Suef University Journal of Basic and Applied Sciences     Open Access  
Berichte Zur Wissenschaftsgeschichte     Hybrid Journal   (Followers: 10)
BIBECHANA     Open Access  
Bilge International Journal of Science and Technology Research     Open Access   (Followers: 1)
Bioethics Research Notes     Full-text available via subscription   (Followers: 15)
BJHS Themes     Open Access  
Black Sea Journal of Engineering and Science     Open Access  
Borneo Journal of Resource Science and Technology     Open Access  
Bulletin de la Société Royale des Sciences de Liège     Open Access  
Bulletin of the National Research Centre     Open Access  
Butlletí de la Institució Catalana d'Història Natural     Open Access  
Chain Reaction     Full-text available via subscription  
Ciencia Amazónica (Iquitos)     Open Access  
Ciencia en su PC     Open Access   (Followers: 1)
Ciencia Ergo Sum     Open Access  
Ciência ET Praxis     Open Access  
Ciencia y Tecnología     Open Access  
Ciencias Holguin     Open Access   (Followers: 1)
CienciaUAT     Open Access  
Citizen Science : Theory and Practice     Open Access   (Followers: 1)
Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering     Open Access  
Communications in Applied Sciences     Open Access  
Comunicata Scientiae     Open Access  
ConCiencia     Open Access  
Conference Papers in Science     Open Access  
Configurations     Full-text available via subscription   (Followers: 11)
COSMOS     Hybrid Journal   (Followers: 1)
Crea Ciencia Revista Científica     Open Access  
Cuadernos de Investigación UNED     Open Access  
Current Issues in Criminal Justice     Hybrid Journal   (Followers: 15)
Current Research in Geoscience     Open Access   (Followers: 5)
Dalat University Journal of Science     Open Access  
Data     Open Access   (Followers: 4)
Data Curation Profiles Directory     Open Access   (Followers: 5)
Dhaka University Journal of Science     Open Access  
Diálogos Interdisciplinares     Open Access  
Digithum     Open Access   (Followers: 2)
Discover Sustainability     Open Access   (Followers: 3)
Einstein (São Paulo)     Open Access  
Ekaia : EHUko Zientzia eta Teknologia aldizkaria     Open Access  
Elkawnie : Journal of Islamic Science and Technology     Open Access  
Emergent Scientist     Open Access  
Enhancing Learning in the Social Sciences     Open Access   (Followers: 7)
Enseñanza de las Ciencias : Revista de Investigación y Experiencias Didácticas     Open Access  
Entramado     Open Access  
Entre Ciencia e Ingeniería     Open Access  
Epiphany     Open Access   (Followers: 1)
Estação Científica (UNIFAP)     Open Access  
Ethiopian Journal of Education and Sciences     Open Access   (Followers: 5)
Ethiopian Journal of Science and Technology     Open Access  
Ethiopian Journal of Sciences and Sustainable Development     Open Access  
European Online Journal of Natural and Social Sciences     Open Access   (Followers: 4)
European Scientific Journal     Open Access   (Followers: 1)
Evidência - Ciência e Biotecnologia - Interdisciplinar     Open Access  
Exchanges : the Warwick Research Journal     Open Access   (Followers: 1)
Experimental Results     Open Access   (Followers: 1)
Facets     Open Access  
Fides et Ratio : Revista de Difusión Cultural y Científica     Open Access  
Fırat University Turkish Journal of Science & Technology     Open Access  
Fontanus     Open Access  
Forensic Science Policy & Management: An International Journal     Hybrid Journal   (Followers: 227)
Frontiers for Young Minds     Open Access  
Frontiers in Climate     Open Access   (Followers: 4)
Frontiers in Science     Open Access   (Followers: 1)
Fundamental Research     Open Access  
Futures & Foresight Science     Hybrid Journal   (Followers: 1)
Gaudium Sciendi     Open Access  
Gazi University Journal of Science     Open Access  
Ghana Studies     Full-text available via subscription   (Followers: 15)
Global Journal of Pure and Applied Sciences     Full-text available via subscription  
Global Journal of Science Frontier Research     Open Access   (Followers: 1)
Globe, The     Full-text available via subscription   (Followers: 3)
HardwareX     Open Access  
Heidelberger Jahrbücher Online     Open Access  
Heliyon     Open Access  
Himalayan Journal of Science and Technology     Open Access  
History of Science and Technology     Open Access   (Followers: 2)
Hoosier Science Teacher     Open Access  
Impact     Open Access   (Followers: 1)
Indian Journal of History of Science     Hybrid Journal  
Indonesian Journal of Fundamental Sciences     Open Access  
Indonesian Journal of Science and Mathematics Education     Open Access   (Followers: 1)
Indonesian Journal of Science and Technology     Open Access  
Ingenieria y Ciencia     Open Access   (Followers: 1)
Innovare : Revista de ciencia y tecnología     Open Access  
Instruments     Open Access  
Integrated Research Advances     Open Access  
Interciencia     Open Access  
Interface Focus     Full-text available via subscription  
International Annals of Science     Open Access  
International Archives of Science and Technology     Open Access  
International Journal of Academic Research in Business, Arts & Science     Open Access  
International Journal of Advanced Multidisciplinary Research and Review     Open Access  
International Journal of Applied Science     Open Access  
International Journal of Basic and Applied Sciences     Open Access   (Followers: 1)
International Journal of Computational and Experimental Science and Engineering (IJCESEN)     Open Access  
International Journal of Culture and Modernity     Open Access  
International Journal of Engineering, Science and Technology     Open Access  
International Journal of Engineering, Technology and Natural Sciences     Open Access  
International Journal of Innovation and Applied Studies     Open Access   (Followers: 3)
International Journal of Innovative Research and Scientific Studies     Open Access   (Followers: 1)
International Journal of Network Science     Hybrid Journal   (Followers: 3)
International Journal of Recent Contributions from Engineering, Science & IT     Open Access  
International Journal of Social Sciences and Management     Open Access   (Followers: 2)
International Journal of Technology Policy and Law     Hybrid Journal   (Followers: 6)
International Letters of Social and Humanistic Sciences     Open Access  
International Scientific and Vocational Studies Journal     Open Access  
InterSciencePlace     Open Access  
Investiga : TEC     Open Access  
Investigación Joven     Open Access  
Investigacion y Ciencia     Open Access   (Followers: 1)
Iranian Journal of Science and Technology, Transactions A : Science     Hybrid Journal  
iScience     Open Access   (Followers: 1)
Issues in Science & Technology     Free   (Followers: 8)
Ithaca : Viaggio nella Scienza     Open Access  
J : Multidisciplinary Scientific Journal     Open Access  
Jaunujų mokslininkų darbai     Open Access  
Journal de la Recherche Scientifique de l'Universite de Lome     Full-text available via subscription  
Journal of Advanced Research     Open Access   (Followers: 2)
Journal of Al-Qadisiyah for Pure Science     Open Access  
Journal of Alasmarya University     Open Access  
Journal of Analytical Science & Technology     Open Access   (Followers: 4)
Journal of Applied Science and Technology     Full-text available via subscription   (Followers: 1)
Journal of Applied Sciences and Environmental Management     Open Access   (Followers: 1)
Journal of Big History     Open Access   (Followers: 3)
Journal of Composites Science     Open Access   (Followers: 3)
Journal of Diversity Management     Open Access   (Followers: 4)
Journal of Indian Council of Philosophical Research     Hybrid Journal  
Journal of Institute of Science and Technology     Open Access  
Journal of Integrated Science and Technology     Open Access  
Journal of King Saud University - Science     Open Access  
Journal of Mathematical and Fundamental Sciences     Open Access  
Journal of Natural Sciences Research     Open Access   (Followers: 1)
Journal of Negative and No Positive Results     Open Access  
Journal of Responsible Technology     Open Access  
Journal of Science (JSc)     Open Access  
Journal of Science and Engineering     Open Access  
Journal of Science and Technology     Open Access   (Followers: 2)
Journal of Science and Technology     Open Access   (Followers: 1)
Journal of Science and Technology (Ghana)     Open Access   (Followers: 3)
Journal of Science and Technology Policy Management     Hybrid Journal   (Followers: 1)
Journal of Science Foundation     Open Access   (Followers: 1)
Journal of Science of the University of Kelaniya Sri Lanka     Open Access  
Journal of Scientific Research     Open Access  
Journal of Scientific Research and Reports     Open Access  
Journal of Scientometric Research     Open Access   (Followers: 20)
Journal of Shanghai Jiaotong University (Science)     Hybrid Journal  
Journal of Social Science Research     Open Access   (Followers: 2)
Journal of Taibah University for Science     Open Access  
Journal of the Asiatic Society of Bangladesh, Science     Open Access  
Journal of the Ghana Science Association     Full-text available via subscription   (Followers: 3)
Journal of the History of Ideas     Full-text available via subscription   (Followers: 126)
Journal of the Indian Institute of Science     Hybrid Journal   (Followers: 3)
Journal of the National Science Foundation of Sri Lanka     Open Access  
Journal of the Royal Society of New Zealand     Hybrid Journal   (Followers: 48)
Journal of the South Carolina Academy of Science     Open Access  
Journal of Unsolved Questions     Open Access  
Jurnal Ilmiah Ilmu Terapan Universitas Jambi : JIITUJ     Open Access  
Jurnal Matematika, Sains, Dan Teknologi     Open Access  
Jurnal MIPA     Open Access  
Jurnal Natural     Open Access  
Jurnal Sains Dasar     Open Access  
Jurnal Teknosains     Open Access  

Data
Number of Followers: 4  

  This is an Open Access journal
ISSN (Online) 2306-5729
Published by MDPI  [84 journals]
  • Data, Vol. 7, Pages 51: A Hybrid Stock Price Prediction Model Based on PRE
           and Deep Neural Network

    • Authors: Srivinay, B. Manujakshi, Mohan Kabadi, Nagaraj Naik
      First page: 51
      Abstract: Stock prices are volatile because of factors such as geopolitical tension, company earnings, and commodity prices. Sometimes stock prices react to domestic uncertainty such as reserve bank policy, government policy, inflation, and global market uncertainty. Estimating stock volatility is one of the challenging tasks for traders. Accurate prediction of stock price helps investors to reduce the risk in a portfolio or investment. Stock prices are nonlinear. To deal with nonlinearity in data, we propose a hybrid stock prediction model using the prediction rule ensembles (PRE) technique and a deep neural network (DNN). First, stock technical indicators are considered to identify the uptrend in stock prices. We considered moving average technical indicators: moving average 20 days, moving average 50 days, and moving average 200 days. Second, using the PRE technique, we computed different rules for stock prediction and selected the rules with the lowest root mean square error (RMSE) score. Third, a three-layer DNN is considered for stock prediction. We have fine-tuned the hyperparameters of the DNN, such as the number of layers, learning rate, neurons, and number of epochs in the model. Fourth, the average results of the PRE and DNN prediction models are combined. The hybrid stock prediction model results are computed using the mean absolute error (MAE) and RMSE metrics. The performance of the hybrid stock prediction model is better than the single prediction models, namely DNN and ANN, with a 5% to 7% improvement in RMSE score. Indian stock price data are considered for the work.
      Citation: Data
      PubDate: 2022-04-20
      DOI: 10.3390/data7050051
      Issue No: Vol. 7, No. 5 (2022)
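
      Illustration (not from the article): the abstract above mentions moving-average indicators, averaging the outputs of two models, and scoring with RMSE and MAE. A minimal Python sketch of those three pieces follows. All function names and the toy numbers are hypothetical, and the two prediction lists merely stand in for PRE and DNN outputs; the authors' actual models are not reproduced here.

```python
import math


def moving_average(prices, window):
    """Simple moving average: one value for each full window of prices."""
    return [
        sum(prices[i - window:i]) / window
        for i in range(window, len(prices) + 1)
    ]


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def combine(pred_a, pred_b):
    """Element-wise average of two models' predictions (assumed combination rule)."""
    return [(a + b) / 2 for a, b in zip(pred_a, pred_b)]


# Toy data, invented purely for illustration.
actual = [100.0, 102.0, 101.0, 105.0]
pre_pred = [98.0, 104.0, 100.0, 107.0]   # stand-in for rule-ensemble output
dnn_pred = [101.0, 101.0, 103.0, 104.0]  # stand-in for neural-network output

hybrid = combine(pre_pred, dnn_pred)
for name, pred in [("PRE", pre_pred), ("DNN", dnn_pred), ("hybrid", hybrid)]:
    print(name, "RMSE:", round(rmse(actual, pred), 3), "MAE:", round(mae(actual, pred), 3))
```

      In practice the 20-, 50-, and 200-day windows named in the abstract would be passed as the window argument; the toy series here is too short for that.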
       
  • Data, Vol. 7, Pages 52: Measurements of User and Sensor Data from the
           Internet of Things (IoT) Devices

    • Authors: Aleksandr Ometov, Joaquín Torres-Sospedra
      First page: 52
      Abstract: The evolution of modern cyber-physical systems and the tremendous growth in the number of interconnected Internet of Things (IoT) devices are already paving new ways for the development of improved data collection and processing methods [...]
      Citation: Data
      PubDate: 2022-04-20
      DOI: 10.3390/data7050052
      Issue No: Vol. 7, No. 5 (2022)
       
  • Data, Vol. 7, Pages 53: Dataset: Traffic Images Captured from UAVs for Use
           in Training Machine Vision Algorithms for Traffic Management

    • Authors: Sergio Bemposta Rosende, Sergio Ghisler, Javier Fernández-Andrés, Javier Sánchez-Soriano
      First page: 53
      Abstract: A dataset of Spanish road traffic images taken from unmanned aerial vehicles (UAV) is presented with the purpose of being used to train artificial vision algorithms, among which those based on convolutional neural networks stand out. This article explains the process of creating the complete dataset, which involves the acquisition of the data and images, the labeling of the vehicles, anonymization, data validation by training a simple neural network model, and the description of the structure and contents of the dataset (which amounts to 15,070 images). The images were captured by drones (but would be similar to those that could be obtained by fixed cameras) in the field of intelligent vehicle management. The presented dataset is available and accessible to improve the performance of road traffic vision and management systems since there is a lack of resources in this specific domain.
      Citation: Data
      PubDate: 2022-04-25
      DOI: 10.3390/data7050053
      Issue No: Vol. 7, No. 5 (2022)
       
  • Data, Vol. 7, Pages 54: An Estimated-Travel-Time Data Scraping and
           Analysis Framework for Time-Dependent Route Planning

    • Authors: Hong-Le Tee, Soung-Yue Liew, Chee-Siang Wong, Boon-Yaik Ooi
      First page: 54
      Abstract: Generally, a courier company needs to employ a fleet of vehicles to travel through a number of locations in order to provide efficient parcel delivery services. The route planning of these vehicles can be formulated as a vehicle routing problem (VRP). Most existing VRP algorithms assume that the traveling durations between locations are time invariant; thus, they normally use only a set of estimated travel times (ETTs) to plan the vehicles’ routes; however, this is not realistic because the traffic pattern in a city varies over time. One solution to tackle the problem is to use different sets of ETTs for route planning in different time periods, and these data are collectively called the time-dependent estimated travel times (TD-ETTs). This paper focuses on a low-cost and robust solution to effectively scrape, process, clean, and analyze the TD-ETT data from free web-mapping services in order to gain the knowledge of the traffic pattern in a city in different time periods. To achieve the abovementioned goal, our proposed framework contains four phases, namely, (i) Full Data Scraping, (ii) Data Pre-Processing and Analysis, (iii) Fast Data Scraping, and (iv) Data Patching and Maintenance. In our experiment, we used the above framework to obtain the TD-ETT data across 68 locations in Penang, Malaysia, for six months. We then fed the data to a VRP algorithm for evaluation. We found that the performance of our low-cost approach is comparable with that of using the expensive paid data.
      Citation: Data
      PubDate: 2022-04-27
      DOI: 10.3390/data7050054
      Issue No: Vol. 7, No. 5 (2022)
       
  • Data, Vol. 7, Pages 55: Manual for Calibrating Sound Speed and
           Poisson’s Ratio of (Split) Hopkinson Bar via Dispersion Correction
           Using Excel® and Matlab® Templates

    • Authors: Hyunho Shin
      First page: 55
      Abstract: This manual presents a procedure to calibrate the one-dimensional sound speed (c_o) and Poisson’s ratio (ν) of a (split) Hopkinson bar using the open-source templates written in Excel® and MATLAB® for dispersion correction. The Excel® template carries out the Fourier synthesis and one-time dispersion correction of a traveling elastic pulse under a given set of c_o and ν. The MATLAB® template performs the Fourier synthesis and iterative dispersion correction of a traveling elastic pulse for a range of c_o and ν sets. In the case of the iterative dispersion correction, a set of c_o and ν is assumed at each iteration step, and the sound speed vs. frequency (c_dc vs. f_dc) relationship necessary for dispersion correction is obtained under the assumed set by solving the Pochhammer–Chree equation. Subsequently, dispersion correction is carried out by using the c_dc vs. f_dc relationship. The c_o and ν values of the bar are determined in the iteration process when the dispersion-corrected pulse profiles are reasonably consistent with the measured ones at two travel distances (2103 and 4000 mm) in the bar. In the case of the experimental profile considered herein, the ν and c_o values were calibrated to six and four decimal places, respectively. The calibration algorithm is described with the tips for using the open-source templates, which are available online in a publicly accessible repository.
      Citation: Data
      PubDate: 2022-04-28
      DOI: 10.3390/data7050055
      Issue No: Vol. 7, No. 5 (2022)
       
  • Data, Vol. 7, Pages 56: Microscopic Imaging and Labeling Dataset for the
           Detection of Pneumocystis jirovecii Using Methenamine Silver Staining
           Method

    • Authors: Erick Reyes-Vera, Juan S. Botero-Valencia, Karen Arango-Bustamante, Alejandra Zuluaga, Tonny W. Naranjo
      First page: 56
      Abstract: Pneumocystis jirovecii pneumonia is one of the diseases that most affects immunocompromised patients today, and under certain circumstances, it can be fatal. On the other hand, more and more automatic tools based on artificial intelligence are required every day to help diagnose diseases and thus optimize the resources of the healthcare system. It is therefore important to develop techniques and mechanisms that enable early diagnosis. One of the most widely used techniques in diagnostic laboratories for the detection of its etiological agent, Pneumocystis jirovecii, is optical microscopy. Therefore, an image dataset of 29 different patients is presented in this work, which can be used to detect whether a patient is positive or negative for this fungus. These images were taken in at least four random positions on the specimen holder. The dataset consists of a total of 137 RGB images. Likewise, it contains realistic, annotated, and high-quality microscope images. In addition, we provide image segmentation and labeling that can also be used in numerous studies based on artificial intelligence implementation. The labeling was also validated by an expert, allowing it to be used as a reference in the training of automatic algorithms with supervised learning methods and thus to develop diagnostic assistance systems. Therefore, the dataset will open new opportunities for researchers working in image segmentation, detection, and classification problems related to Pneumocystis jirovecii pneumonia diagnosis.
      Citation: Data
      PubDate: 2022-04-29
      DOI: 10.3390/data7050056
      Issue No: Vol. 7, No. 5 (2022)
       
  • Data, Vol. 7, Pages 57: Multi-Layer Web Services Discovery Using Word
           Embedding and Clustering Techniques

    • Authors: Waeal J. Obidallah, Bijan Raahemi, Waleed Rashideh
      First page: 57
      Abstract: We propose a multi-layer data mining architecture for web services discovery using word embedding and clustering techniques to improve the web service discovery process. The proposed architecture consists of five layers: web services description and data preprocessing; word embedding and representation; syntactic similarity; semantic similarity; and clustering. In the first layer, we identify the steps to parse and preprocess the web services documents. In the second layer, Bag of Words with Term Frequency–Inverse Document Frequency and three word-embedding models are employed for web services representation. In the third layer, four distance measures, namely, Cosine, Euclidean, Minkowski, and Word Mover, are considered to find the similarities between web services documents. In layer four, WordNet and Normalized Google Distance are employed to represent and find the similarity between web services documents. Finally, in the fifth layer, three clustering algorithms, namely, affinity propagation, K-means, and hierarchical agglomerative clustering, are investigated for clustering of web services based on observed similarities in documents. We demonstrate how each component of the five layers is employed in web services clustering using randomly selected web services documents. We conduct experimental analysis to cluster web services using a collected dataset consisting of web services documents and evaluate their clustering performances. Using a ground truth for evaluation purposes, we observe that clusters built based on the word embedding models performed better than those built using the Bag of Words with Term Frequency–Inverse Document Frequency model. Among the three word embedding models, the pre-trained Word2Vec’s skip-gram model reported higher performance in clustering web services. Among the three semantic similarity measures, path-based WordNet similarity reported higher clustering performance. By considering the different word representation models and syntactic and semantic similarity measures, we found that the affinity propagation clustering technique performed better in discovering similarities among web services.
      Citation: Data
      PubDate: 2022-05-04
      DOI: 10.3390/data7050057
      Issue No: Vol. 7, No. 5 (2022)
       
  • Data, Vol. 7, Pages 58: Fundamentals and Applications of Artificial Neural
           Network Modelling of Continuous Bifidobacteria Monoculture at a Low Flow
           Rate

    • Authors: Sergey Dudarov, Elena Guseva, Yury Lemetyuynen, Ilya Maklyaev, Boris Karetkin, Svetlana Evdokimova, Pavel Papaev, Natalia Menshutina, Victor Panfilov
      First page: 58
      Abstract: The application of artificial neural networks (ANNs) to mathematical modelling in microbiology and biotechnology has been a promising and convenient tool for over 30 years because ANNs make it possible to predict complex multiparametric dependencies. This article is devoted to the investigation and justification of ANN choice for modelling the growth of a probiotic strain of Bifidobacterium adolescentis in a continuous monoculture, at low flow rates, under different oligofructose (OF) concentrations, as a preliminary study for a predictive model of the behaviour of intestinal microbiota. We considered the possibility and effectiveness of various classes of ANN. Taking into account the specifics of the experimental data, we proposed two-layer perceptrons as a mathematical modelling tool trained on the basis of the error backpropagation algorithm. We proposed and tested the mechanisms for training, testing and tuning the perceptron on the basis of both the standard ratio between the training and test sample volumes and under the condition of limited training data, due to the high cost, duration and the complexity of the experiments. We developed and tested the specific ANN models (class, structure, training settings, weight coefficients) with new data. The validity of the model was confirmed using RMSE, which was from 4.24 to 980% for different concentrations. The results showed the high efficiency of ANNs in general and bilayer perceptrons in particular in solving modelling tasks in microbiology and biotechnology, making it possible to recommend this tool for further wider applications.
      Citation: Data
      PubDate: 2022-05-06
      DOI: 10.3390/data7050058
      Issue No: Vol. 7, No. 5 (2022)
       
  • Data, Vol. 7, Pages 59: A Comprehensive Dataset of the Spanish Research
           Output and Its Associated Social Media and Altmetric Mentions
           (2016–2020)

    • Authors: Wenceslao Arroyo-Machado, Nicolas Robinson-Garcia, Daniel Torres-Salinas
      First page: 59
      Abstract: This paper presents data on research publications authored by scientists affiliated with Spanish institutions between 2016 and 2020, along with their associated social media and altmetric mentions, and on researchers affiliated with Spanish institutions whose work is highly mentioned on social media and non-academic outlets. The first dataset contains 219,988 records and 24 attributes. Each observation represents a scientific publication (article, review or letter) extracted from the Web of Science database. For each record, we provide bibliographic metadata, its subject area and a battery of altmetric indicators extracted from Altmetric.com. The second dataset includes 4209 records and four attributes. Each record corresponds to a researcher. For each record, we include their full name, an author identifier (ORCID), their affiliation and their list of publications connecting to the first dataset.
      Citation: Data
      PubDate: 2022-05-07
      DOI: 10.3390/data7050059
      Issue No: Vol. 7, No. 5 (2022)
       
  • Data, Vol. 7, Pages 60: Formation of Dataset for Fuzzy Quantitative Risk
           Assessment of LNG Bunkering SIMOPs

    • Authors: Hongjun Fan, Hossein Enshaei, Shantha Gamini Jayasinghe
      First page: 60
      Abstract: New international regulations aimed at decarbonizing maritime transportation are drawing attention to the use of liquefied natural gas (LNG) as a ship fuel. Scaling up LNG-fueled ships is highly dependent on safe bunkering operations, particularly during simultaneous operations (SIMOPs); therefore, performing a quantitative risk assessment (QRA) is either mandated or highly recommended, and a dynamic quantitative risk assessment (DQRA) has been developed to make up for the deficiencies of the traditional QRA. The QRA and DQRA are both data-driven processes, and so far, the data of occurrence rates (ORs) of basic events (BEs) in LNG bunkering SIMOPs are unavailable. To fill this gap, this study identified a total of 41 BEs and employed the online questionnaire method, fuzzy set theory, and the Onisawa function to investigate the fuzzy ORs for the identified BEs. Purposive sampling was applied when selecting experts in the process of online data collection. The closed-ended structured questionnaire garnered responses from 137 experts from industry and academia. The questionnaire, the raw data and obtained ORs, and the process of data analysis are presented in this data descriptor. The obtained data can be used directly in QRAs and DQRAs. This dataset is the first of its kind and could be expanded further for research in the field of risk assessment of LNG bunkering.
      Citation: Data
      PubDate: 2022-05-08
      DOI: 10.3390/data7050060
      Issue No: Vol. 7, No. 5 (2022)
       
  • Data, Vol. 7, Pages 61: An Ensemble Model for Predicting Retail Banking
           Churn in the Youth Segment of Customers

    • Authors: Vijayakumar Bharathi S, Dhanya Pramod, Ramakrishnan Raman
      First page: 61
      Abstract: (1) This study aims to predict youth customers’ defection in retail banking. The sample comprised 602 young adult bank customers. (2) The study applied machine learning techniques, including ensembles, to predict the possibility of churn. (3) The absence of mobile banking, zero-interest personal loans, access to ATMs, and customer care and support were critical driving factors of churn. The ExtraTreeClassifier model resulted in an accuracy rate of 92%, and an AUC of 91.88% validated the findings. (4) Customer retention is one of the critical success factors for organizations seeking to enhance business value. It is imperative for banks to predict the drivers of churn among their young adult customers so as to create and deliver proactive, quality services.
      Citation: Data
      PubDate: 2022-05-09
      DOI: 10.3390/data7050061
      Issue No: Vol. 7, No. 5 (2022)
       
  • Data, Vol. 7, Pages 62: DriverMVT: In-Cabin Dataset for Driver Monitoring
           including Video and Vehicle Telemetry Information

    • Authors: Walaa Othman, Alexey Kashevnik, Ammar Ali, Nikolay Shilov
      First page: 62
      Abstract: Developing a driver monitoring system that can assess the driver’s state is a prerequisite and a key to improving road safety. With the success of deep learning, such systems can achieve high accuracy if corresponding high-quality datasets are available. In this paper, we introduce DriverMVT (Driver Monitoring dataset with Videos and Telemetry). The dataset contains information about the driver's head pose, heart rate, and behaviour inside the cabin, such as drowsiness and an unfastened belt. This dataset can be used to train and evaluate deep learning models to estimate the driver’s health state, mental state, concentration level, and activity in the cabin. Developing such systems that can alert the driver in case of drowsiness or distraction can reduce the number of accidents and increase safety on the road. The dataset contains 1506 videos of 9 different drivers (7 males and 2 females), with a total of 5119k frames and a total duration of over 36 h. In addition, we evaluated the dataset with the multi-task temporal shift convolutional attention network (MTTS-CAN) algorithm. The algorithm's mean average error on our dataset is 16.375 heartbeats per minute.
      Citation: Data
      PubDate: 2022-05-11
      DOI: 10.3390/data7050062
      Issue No: Vol. 7, No. 5 (2022)
       
  • Data, Vol. 7, Pages 63: Geomorphological Data from Detonation Craters in
           the Fehmarn Belt, German Baltic Sea

    • Authors: Svenja Papenmeier, Alexander Darr, Peter Feldens
      First page: 63
      Abstract: Military munitions from World War I and II dumped on the seafloor are a threat to the marine environment and its users. Decades of saltwater exposure make the explosives fragile and difficult to dispose of. If required, the munition is blasted in place. In August 2019, 42 ground mines were detonated in a controlled manner underwater during a NATO maneuver in the German Natura2000 Special Area of Conservation Fehmarn Belt in the Baltic Sea. In June 2020, four detonation craters were investigated with a multibeam echosounder for the first time. This dataset is represented here as maps of bathymetry, slope angle, and height difference relative to the surroundings. The circular craters were still clearly visible a year after the detonation. The diameter and depth of the structures were 7.5–12.6 m and 0.7–2.2 m, respectively. In total, about 321 m² of the seafloor was destroyed along the track line.
      Citation: Data
      PubDate: 2022-05-11
      DOI: 10.3390/data7050063
      Issue No: Vol. 7, No. 5 (2022)
       
  • Data, Vol. 7, Pages 64: Comprehensive Landscape of STEAP Family Members
           Expression in Human Cancers: Unraveling the Potential Usefulness in
           Clinical Practice Using Integrated Bioinformatics Analysis

    • Authors: Sandra M. Rocha, Sílvia Socorro, Luís A. Passarinha, Cláudio J. Maia
      First page: 64
      Abstract: The human Six-Transmembrane Epithelial Antigen of the Prostate (STEAP) family comprises STEAP1-4. Several studies have pointed out STEAP proteins as putative biomarkers, as well as therapeutic targets in several types of human cancers, particularly in prostate cancer. However, the relationships and significance of the expression pattern of STEAP1-4 in cancer cases are barely known. Herein, the Oncomine database and cBioPortal platform were selected to predict the differential expression levels of STEAP members and clinical prognosis. The most common expression pattern observed was the combination of the over- and underexpression of distinct STEAP genes, but cervical and gastric cancer and lymphoma showed overexpression of all STEAP genes. It was also found that STEAP genes’ expression levels were already deregulated in benign lesions. Regarding the prognostic value, it was found that STEAP1 (prostate), STEAP2 (brain and central nervous system), STEAP3 (kidney, leukemia and testicular) and STEAP4 (bladder, cervical, gastric) overexpression correlate with lower patient survival rate. However, in prostate cancer, overexpression of the STEAP4 gene was correlated with a higher survival rate. Overall, this study first showed that the expression levels of STEAP genes are highly variable in human cancers, which may be related to different patients’ outcomes.
      Citation: Data
      PubDate: 2022-05-11
      DOI: 10.3390/data7050064
      Issue No: Vol. 7, No. 5 (2022)
       
  • Data, Vol. 7, Pages 65: A Deep Learning Framework for Detection of
           COVID-19 Fake News on Social Media Platforms

    • Authors: Yahya Tashtoush, Balqis Alrababah, Omar Darwish, Majdi Maabreh, Nasser Alsaedi
      First page: 65
      Abstract: The fast growth of technology in online communication and social media platforms alleviated numerous difficulties during the COVID-19 epidemic. However, it was utilized to propagate falsehoods and misleading information about the disease and the vaccination. In this study, we investigate the ability of deep neural networks, namely, Long Short-Term Memory (LSTM), Bi-directional LSTM, Convolutional Neural Network (CNN), and a hybrid of CNN and LSTM networks, to automatically classify and identify fake news content related to the COVID-19 pandemic posted on social media platforms. These deep neural networks have been trained and tested using the “COVID-19 Fake News” dataset, which contains 21,379 real and fake news instances for the COVID-19 pandemic and its vaccines. The real news data were collected from independent and internationally reliable institutions on the web, such as the World Health Organization (WHO), the International Committee of the Red Cross (ICRC), the United Nations (UN), the United Nations Children’s Fund (UNICEF), and their official accounts on Twitter. The fake news data were collected from different fact-checking websites (such as Snopes, PolitiFact, and FactCheck). The evaluation results showed that the CNN model outperforms the other deep neural networks with the best accuracy of 94.2%.
      Citation: Data
      PubDate: 2022-05-13
      DOI: 10.3390/data7050065
      Issue No: Vol. 7, No. 5 (2022)
       
  • Data, Vol. 7, Pages 66: Datasets on Energy Simulations of Standard and
           Optimized Buildings under Current and Future Weather Conditions across
           Europe

    • Authors: Delia D’Agostino, Danny Parker, Ilenia Epifani, Dru Crawley, Linda Lawrie
      First page: 66
      Abstract: The building sector has a strategic role in the clean energy transition towards a fully decarbonized stock by mid-century. This data article investigates the use of different weather datasets in building energy simulations across Europe. It focuses on a standard performing building optimized to a nearly-zero level accounting for climate projections towards 2060. The provided data quantify the building energy performance in the current and future scenarios. The article investigates how heating and cooling loads change depending on the location and climate scenario. Hourly weather datasets frequently used in building energy simulations are analyzed to investigate how climatic conditions have changed over recent decades. The data give insight into the implications of the use of weather datasets on buildings in terms of energy consumption, efficiency measures (envelope, appliances, systems), costs, and renewable production. Due to the ongoing changing climate, basing building energy simulations and design optimization on obsolete weather data may produce inaccurate results and related building designs with an increased energy consumption in the coming decades. Energy efficiency will become more crucial in the future when cooling and overheating will have to be controlled with appropriate measures used in combination with renewable energy sources.
      Citation: Data
      PubDate: 2022-05-14
      DOI: 10.3390/data7050066
      Issue No: Vol. 7, No. 5 (2022)
       
  • Data, Vol. 7, Pages 38: Comprehensive Data via Spectroscopy and Molecular
           Dynamics of Chemically Treated Graphene Nanoplatelets

    • Authors: Olasunbo Z. Farinre, Hawazin Alghamdi, Swapnil M. Mhatre, Mathew L. Kelley, Adam J. Biacchi, Albert V. Davydov, Christina A. Hacker, Albert F. Rigosi, Prabhakar Misra
      First page: 38
      Abstract: Graphene nanoplatelets (GnPs) are promising candidates for gas sensing applications because they have a high surface area to volume ratio, high conductivity, and a high temperature stability. The information provided in this data article will cover the surface and structural properties of pure and chemically treated GnPs, specifically with carboxyl, ammonia, nitrogen, oxygen, fluorocarbon, and argon. Molecular dynamics and adsorption calculations are provided alongside characterization data, which was performed with Raman spectroscopy, X-ray photoelectron spectroscopy (XPS), and X-ray diffraction (XRD) to determine the functional groups present and effects of those groups on the structural and vibrational properties. Certain features in the observed Raman spectra are attributed to the variations in concentration of the chemically treated GnPs. XRD data show smaller crystallite sizes for chemically treated GnPs that agree with images acquired with scanning electron microscopy. A molecular dynamics simulation is also employed to gain a better understanding of the Raman and adsorption properties of pure GnPs.
      Citation: Data
      PubDate: 2022-03-29
      DOI: 10.3390/data7040038
      Issue No: Vol. 7, No. 4 (2022)
       
  • Data, Vol. 7, Pages 39: OpenStreetMap Contribution to Local Data
           Ecosystems in COVID-19 Times: Experiences and Reflections from the Italian
           Case

    • Authors: Marco Minghini, Alessandro Sarretta, Maurizio Napolitano
      First page: 39
      Abstract: Data and digital technologies have been at the core of the societal response to COVID-19 since the beginning of the pandemic. This work focuses on the specific contribution of the OpenStreetMap (OSM) project to address the early stage of the COVID-19 crisis (approximately from February to May 2020) in Italy. Several activities initiated by the Italian OSM community are described, including: mapping ‘red zones’ (the first municipalities affected by the emergency); updating OSM pharmacies based on the authoritative dataset from the Ministry of Health; adding information on delivery services of commercial activities during COVID-19 times; publishing web maps to offer COVID-19-specific information at the local level; and developing software tools to help collect new data. Those initiatives are analysed from a data ecosystem perspective, identifying the actors, data and data flows involved, and reflecting on the enablers and barriers for their success from a technical, organisational and legal point of view. The OSM project itself is then assessed in the wider European policy context, in particular against the objectives of the recent European strategy for data, highlighting opportunities and challenges for scaling successful approaches such as those to fight COVID-19 from the local to the national and European scales.
      Citation: Data
      PubDate: 2022-03-31
      DOI: 10.3390/data7040039
      Issue No: Vol. 7, No. 4 (2022)
       
  • Data, Vol. 7, Pages 40: Dataset of Annotated Virtual Detection Line for
           Road Traffic Monitoring

    • Authors: Ivars Namatēvs, Roberts Kadiķis, Anatolijs Zencovs, Laura Leja, Artis Dobrājs
      First page: 40
      Abstract: Monitoring, detection, and control of traffic is a serious problem in many cities and on roads around the world and poses a problem for effective and safe control and management of pedestrians with edge devices. Systems using the computer vision approach must ensure the safety of citizens and minimize the risk of traffic collisions. This approach is well suited for multiple object detection by automatic video surveillance cameras on roads, highways, and pedestrian walkways. A new Annotated Virtual Detection Line (AVDL) dataset is presented for multiple object detection, consisting of 74,108 data files and 74,108 manually annotated files divided into six classes: Vehicles, Trucks, Pedestrians, Bicycles, Motorcycles, and Scooters from the video. The data were captured from real road scenes using 50 video cameras from the leading video camera manufacturers at different road locations and under different meteorological conditions. The AVDL dataset consists of two directories, the Data directory and the Labels directory. Both directories provide the data as NumPy arrays. The dataset can be used to train and test deep neural network models for traffic and pedestrian detection, recognition, and counting.
      Citation: Data
      PubDate: 2022-03-31
      DOI: 10.3390/data7040040
      Issue No: Vol. 7, No. 4 (2022)
       
  • Data, Vol. 7, Pages 41: Dataset: Variable Message Signal Annotated Images
           for Object Detection

    • Authors: Enrique Puertas, Gonzalo De-Las-Heras, Javier Sánchez-Soriano, Javier Fernández-Andrés
      First page: 41
      Abstract: This publication presents a dataset consisting of Spanish road images taken from inside a vehicle, as well as annotations in XML files in PASCAL VOC format that indicate the location of Variable Message Signals within them. Additionally, a CSV file is attached with information regarding the geographic position, the folder where the image is located and the text in Spanish. This can be used to train supervised learning computer vision algorithms such as convolutional neural networks. Throughout this work, the process followed to obtain the dataset, image acquisition and labeling and its specifications are detailed. The dataset constitutes 1216 instances, 888 positives and 328 negatives, in 1152 jpg images with a resolution of 1280 × 720 pixels. These are divided into 756 real images and 756 images created from the data-augmentation technique. The purpose of this dataset is to help in road computer vision research since there is not one specifically for VMSs.
      Citation: Data
      PubDate: 2022-04-01
      DOI: 10.3390/data7040041
      Issue No: Vol. 7, No. 4 (2022)
       
  • Data, Vol. 7, Pages 42: Climate Data to Support the Adaptation of
           Buildings to Climate Change in Canada

    • Authors: Abhishek Gaur, Michael Lacasse
      First page: 42
      Abstract: Climate change will continue to bring about unprecedented climate extremes in the future, and buildings and infrastructure will be exposed to such conditions. To ensure that new and existing buildings deliver satisfactory performance over their design lives, their performance under current and future projected climates needs to be assessed by undertaking building simulations. This study prepares climate data needed for building simulations for 564 locations by bias-correcting the Canadian Regional Climate Model version 4 (CanRCM4) large ensemble (LE) simulations with reference to observations. Technical validation results show that bias-correction effectively reduces the bias associated with CanRCM4-LE simulations in terms of their marginal distributions and the inter-relationship between climate variables. To ensure that the range of projected climate change impacts are encompassed within these data sets, and to furthermore provide building moisture and energy reference years, the reference year files were prepared from bias-corrected CanRCM4-LE simulations and are comprised of a typical meteorological year for building energy applications, a typical and extreme moisture reference year, a typical downscaled year, an extreme warm year, and an extreme cold year.
      Citation: Data
      PubDate: 2022-04-06
      DOI: 10.3390/data7040042
      Issue No: Vol. 7, No. 4 (2022)
       
  • Data, Vol. 7, Pages 43: Data on Gastrointestinal and Claw Disorders as
           Possible Predictive Factors in Beef Cattle and Veal Calves’ Health
           and Welfare

    • Authors: Luisa Magrin, Barbara Contiero, Giulio Cozzi, Flaviana Gottardo
      First page: 43
      Abstract: Today, consumers have a growing concern about the welfare of beef cattle, and specific schemes have been proposed to assess their wellbeing during fattening. On-farm assessments can be integrated and partially replaced by animal-based measures recorded postmortem at the abattoir. Postmortem organ inspection data are of value, as several lesions can be reflective of subclinical diseases not easily detected in the live animal. The present data collection aimed to evaluate the slaughterhouse prevalence and location of hoof, gastric, hepatic, and liver lesions in beef cattle and veal calves and to retrospectively associate this information with the animals’ housing and feeding management systems. Individual data on gastrointestinal and claw disorders of beef cattle (bulls and heifers) and veal calves were collected through a postmortem inspection by trained veterinarians directly at the slaughter line. Around 15 animals/batch, belonging to 97 batches of young bulls, 56 batches of beef heifers, and 41 batches of veal calves, were inspected in three slaughterhouses located in Northern Italy during 30 sampling days, and information on the animals’ rearing systems was gathered a posteriori from farmer interviews. The implementation of this recording system should promote a continuous improvement of beef cattle management from a health and welfare perspective.
      Citation: Data
      PubDate: 2022-04-06
      DOI: 10.3390/data7040043
      Issue No: Vol. 7, No. 4 (2022)
       
  • Data, Vol. 7, Pages 44: Using Social Media to Detect Fake News Information
           Related to Product Marketing: The FakeAds Corpus

    • Authors: Noha Alnazzawi, Najlaa Alsaedi, Fahad Alharbi, Najla Alaswad
      First page: 44
      Abstract: Nowadays, an increasing portion of our lives is spent interacting online through social media platforms, thanks to the widespread adoption of the latest technology and the proliferation of smartphones. Obtaining news from social media platforms is fast, easy, and less expensive compared with other traditional media platforms, e.g., television and newspapers. Therefore, social media is now being exploited to disseminate fake news and false information. This research aims to build the FakeAds corpus, which consists of tweets for product advertisements. The aim of the FakeAds corpus is to study the impact of fake news and false information in advertising and marketing materials for specific products and which types of products (i.e., cosmetics, health, fashion, or electronics) are targeted most on Twitter to draw the attention of consumers. The corpus is unique and novel, in terms of the very specific topic (i.e., the role of Twitter in disseminating fake news related to product promotion and advertisement) and also in terms of its fine-grained annotations. The annotation guidelines were designed with guidance from a domain expert, and the annotation was performed by two domain experts, resulting in a high-quality annotation, with agreement rate F-scores as high as 0.815.
      Citation: Data
      PubDate: 2022-04-07
      DOI: 10.3390/data7040044
      Issue No: Vol. 7, No. 4 (2022)
       
  • Data, Vol. 7, Pages 45: Classification of Building Types in Germany: A
           Data-Driven Modeling Approach

    • Authors: Abhilash Bandam, Eedris Busari, Chloi Syranidou, Jochen Linssen, Detlef Stolten
      First page: 45
      Abstract: Details on building levels play an essential part in a number of real-world application models. Energy systems, telecommunications, disaster management, the internet-of-things, health care, and marketing are a few of the many applications that require building information. The essential variables that most of these models require are building type, house type, area of living space, and number of residents. In order to acquire some of this information, this paper introduces a methodology and generates corresponding data. The study was conducted for specific applications in energy system modeling. Nonetheless, these data can also be used in other applications. Building locations and some of their details are openly available in the form of map data from OpenStreetMap (OSM). However, data regarding building types (i.e., residential, industrial, office, single-family house, multi-family house, etc.) are only partially available in the OSM dataset. Therefore, a machine learning classification algorithm for predicting the building types on the basis of the OSM buildings’ data was introduced. Although the OSM dataset is the fundamental and most crucial one used for modeling, the machine learning algorithm’s training was performed on a dataset that was prepared by combining several features from three other datasets. The generated dataset consists of approximately 29 million buildings, of which about 19 million are residential, with 72% being single-family houses and the rest multi-family ones that include two-family houses and apartment buildings. Furthermore, the results were validated through a comparison with publicly available statistical data. The comparison of the resulting data with official statistics reveals that there is a percentage error of 3.64% for residential buildings, 13.14% for single-family houses, and −15.38% for multi-family houses classification. Nevertheless, by incorporating the building types, this dataset is able to complement existing building information in studies in which building type information is crucial.
      Citation: Data
      PubDate: 2022-04-09
      DOI: 10.3390/data7040045
      Issue No: Vol. 7, No. 4 (2022)
       
  • Data, Vol. 7, Pages 46: A Collection of 30 Multidimensional Functions for
           Global Optimization Benchmarking

    • Authors: Vagelis Plevris, German Solorzano
      First page: 46
      Abstract: A collection of thirty mathematical functions that can be used for optimization purposes is presented and investigated in detail. The functions are defined for any number of dimensions and can be used as benchmark functions for unconstrained multidimensional single-objective optimization problems. The functions feature a wide variability in terms of complexity. We investigate the performance of three optimization algorithms on the functions: two metaheuristic algorithms, namely Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), and one mathematical algorithm, Sequential Quadratic Programming (SQP). All implementations are done in MATLAB, with full source code availability. The study focuses on the objective functions, the optimization algorithms used, and their suitability for solving each problem. We use the three optimization methods to investigate the difficulty and complexity of each problem and to determine whether the problem is better suited for a metaheuristic approach or for a mathematical method, which is based on gradients. We also investigate how increasing the dimensionality affects the difficulty of each problem and the performance of the optimizers. There are functions that are extremely difficult to optimize efficiently, especially for higher dimensions. Such examples are the last two new objective functions, F29 and F30, which are very hard to optimize, although the optimum point is clearly visible, at least in the two-dimensional case.
      Citation: Data
      PubDate: 2022-04-11
      DOI: 10.3390/data7040046
      Issue No: Vol. 7, No. 4 (2022)
       
  • Data, Vol. 7, Pages 47: Dataset: Roundabout Aerial Images for Vehicle
           Detection

    • Authors: Enrique Puertas, Gonzalo De-Las-Heras, Javier Fernández-Andrés, Javier Sánchez-Soriano
      First page: 47
      Abstract: This publication presents a dataset of aerial images of Spanish roundabouts taken from a UAV, along with annotations in PASCAL VOC XML files that indicate the position of vehicles within them. Additionally, a CSV file is attached containing information related to the location and characteristics of the captured roundabouts. This work details the process followed to obtain them: image capture, processing, and labeling. The dataset consists of 985,260 total instances: 947,400 cars, 19,596 cycles, 2262 trucks, 7008 buses, and 2208 empty roundabouts in 61,896 1920 × 1080 px JPG images. These are divided into 15,474 extracted images from 8 roundabouts with different traffic flows and 46,422 images created using data augmentation techniques. The purpose of this dataset is to help research into computer vision on the road, as such labeled images are not abundant. It can be used to train supervised learning models, such as convolutional neural networks, which are very popular in object detection.
      Citation: Data
      PubDate: 2022-04-12
      DOI: 10.3390/data7040047
      Issue No: Vol. 7, No. 4 (2022)
       
  • Data, Vol. 7, Pages 48: A Dataset of Dropout Rates and Other School-Level
           Variables in Louisiana Public High Schools

    • Authors: Michael Stein, Michael Leitner, Jill C. Trepanier, Kory Konsoer
      First page: 48
      Abstract: Students dropping out of high school is a nationwide problem in the United States, plaguing communities and often greatly reducing the prospects of a quality life for those students who do not complete their high school education. The state of Louisiana consistently has among the highest public high school dropout rates in the United States and, often, the highest. This massive dataset of school variables covering a duration of five academic years (2014–2015 to 2018–2019) was originally compiled with the intention of identifying the factors that correlate with high school dropouts in Louisiana public high schools, specifically. However, it can be useful to any researchers interested in analyzing school-level data concerning a wide range of variables beyond merely dropout rates. This dataset also contains socioeconomic demographics, financial variables, class size, and much more. The correlation analyses ultimately revealed many intriguing insights into the relationships between the tested variables and the dropout rates.
      Citation: Data
      PubDate: 2022-04-12
      DOI: 10.3390/data7040048
      Issue No: Vol. 7, No. 4 (2022)
       
  • Data, Vol. 7, Pages 49: The Missing Case of Disinformation from the
           Cybersecurity Risk Continuum: A Comparative Assessment of Disinformation
           with Other Cyber Threats

    • Authors: Kevin Matthe Caramancion, Yueqi Li, Elisabeth Dubois, Ellie Seoe Jung
      First page: 49
      Abstract: This study examines the phenomenon of disinformation as a threat in the realm of cybersecurity. We have analyzed multiple authoritative cybersecurity standards, manuals, handbooks, and literary works. We present the unanimous meaning and construct of the term cyber threat. Our results reveal that although their definitions are mostly consistent, most of them lack the inclusion of disinformation in their list/glossary of cyber threats. We then proceeded to dissect the phenomenon of disinformation through the lens of cyber threat epistemology; it displays the presence of the necessary elements required (i.e., threat agent, attack vector, target, impact, defense) for its appropriate classification. In conjunction with this, we have also included an in-depth comparative analysis of disinformation and its similar nature and characteristics with the prevailing and existing cyber threats. We, therefore, argue for its recognition as an official and actual cyber threat. The significance of this paper, beyond the taxonomical correction it recommends, rests in the hope that it influences future policies and regulations in combatting disinformation and its propaganda.
      Citation: Data
      PubDate: 2022-04-12
      DOI: 10.3390/data7040049
      Issue No: Vol. 7, No. 4 (2022)
       
  • Data, Vol. 7, Pages 50: HAGDAVS: Height-Augmented Geo-Located Dataset for
           Detection and Semantic Segmentation of Vehicles in Drone Aerial
           Orthomosaics

    • Authors: John R. Ballesteros, German Sanchez-Torres, John W. Branch-Bedoya
      First page: 50
      Abstract: Detection and semantic segmentation of vehicles in drone aerial orthomosaics have applications in a variety of fields such as security, traffic and parking management, urban planning, logistics, and transportation, among many others. This paper presents the HAGDAVS dataset, which fuses the RGB spectral channels and a Digital Surface Model (DSM) for the detection and segmentation of vehicles from aerial drone images, including three vehicle classes: cars, motorcycles, and ghosts (motorcycle or car). We supply the DSM as an additional variable to be included in deep learning and computer vision models to increase their accuracy. RGB orthomosaic, RG-DSM fusion, and multi-label mask are provided in Tag Image File Format. Geo-located vehicle bounding boxes are provided in GeoJSON vector format. We also describe the acquisition of drone data, the derived products, and the workflow to produce the dataset. Researchers would benefit from using the proposed dataset to improve results in the case of vehicle occlusion, geo-location, and the need for cleaning ghost vehicles. As far as we know, this is the first openly available dataset for vehicle detection and segmentation comprising RG-DSM drone data fusion and different color masks for motorcycles, cars, and ghosts.
      Citation: Data
      PubDate: 2022-04-14
      DOI: 10.3390/data7040050
      Issue No: Vol. 7, No. 4 (2022)
       
  • Data, Vol. 7, Pages 26: Deceptive Content Labeling Survey Data from Two
           U.S. Midwestern Universities

    • Authors: Ryan Suttle, Scott Hogan, Rachel Aumaugher, Matthew Spradling, Zak Merrigan, Jeremy Straub
      First page: 26
      Abstract: Intentionally deceptive online content seeks to manipulate individuals in their roles as voters, consumers, and participants in society at large. While this problem is pronounced, techniques to combat it may exist. To analyze the problem and potential solutions, we conducted three surveys relating to how news consumption decisions are made and the impact of labels on decision making. This article describes these three surveys and the data that were collected by them.
      Citation: Data
      PubDate: 2022-02-22
      DOI: 10.3390/data7030026
      Issue No: Vol. 7, No. 3 (2022)
       
  • Data, Vol. 7, Pages 27: Impact of p53 Knockout on Protein Data Set of
           HaCaT Cells in Confluent and Subconfluent Conditions

    • Authors: Alexander L. Rusanov, Daniil D. Romashin, Peter M. Kozhin, Maxim N. Karagyaur, Dmitry S. Loginov, Olga V. Tikhonova, Victor G. Zgoda, Nataliya G. Luzgina
      First page: 27
      Abstract: The immortalized keratinocytes, HaCaT, are a popular model for skin research (toxicity, irritation, allergic reactions, or interaction of cells). They maintain a stable keratinocyte phenotype and respond to keratinocyte differentiation stimuli. However, programs of stratification and expression of differentiation markers in HaCaT keratinocytes are aberrant. In HaCaT cells, there are two mutant p53 alleles (i.e., R282Q and H179Y) that contain gain-of-function (GOF) mutations resulting from spontaneous immortalization (mutp53). At the same time, mutp53 acts as a transcription factor and also affects the interaction of p63 protein with its transcription targets. Proteins of the p53 family are crucial for regulation of proliferation and differentiation processes in human keratinocytes, although the involvement of mutp53 in these processes is not fully clear. We present data sets obtained as a result of high-performance proteomic analysis of immortalized HaCaT keratinocytes with p53 knockout in two different states, subconfluent and confluent, which are characterized by different intensities of cell differentiation processes. To obtain the proteomic profiles of the cells, we applied LC-MS/MS measurements processed with MaxQuant software (version 1.6.3.4).
      Citation: Data
      PubDate: 2022-02-23
      DOI: 10.3390/data7030027
      Issue No: Vol. 7, No. 3 (2022)
       
  • Data, Vol. 7, Pages 28: Artificial Intelligence Computing at the Quantum
           Level

    • Authors: Olawale Ayoade, Pablo Rivas, Javier Orduz
      First page: 28
      Abstract: The extraordinary advance in quantum computation leads us to believe that, in the not-too-distant future, quantum systems will surpass classical systems. Moreover, the field’s rapid growth has resulted in the development of many critical tools, including programmable machines (quantum computers) that execute quantum algorithms and the burgeoning field of quantum machine learning, which investigates the possibility of faster computation than traditional machine learning. In this paper, we provide a thorough examination of quantum computing from the perspective of a physicist. The purpose is to give laypeople and scientists a broad but in-depth understanding of the area. We also recommend charts that summarize the field’s diversions to put the whole field into context.
      Citation: Data
      PubDate: 2022-02-25
      DOI: 10.3390/data7030028
      Issue No: Vol. 7, No. 3 (2022)
       
  • Data, Vol. 7, Pages 29: H-Prop and H-Prop-News: Computational Propaganda
           Datasets in Hindi

    • Authors: Deptii Chaudhari, Ambika Vishal Pawar, Alberto Barrón-Cedeño
      First page: 29
      Abstract: In this digital era, people rely on the internet for their news consumption. As people are free to express their opinions on social media, much information shared on the internet is loaded with propaganda. Propagandist contents are intended to influence public opinion. In the mainstream media or prominent news agencies, the authors’ and news agencies’ own bias may affect the news content. Hence, it is necessary to detect such propaganda spread through news articles. Detection and classification of propagandist text require standard, high-quality, annotated datasets. A few datasets are available for propaganda classification. However, these datasets are mostly in English. Hindi is the most spoken language in India, and efforts are needed to detect its propagandist contents. This research work introduces two new datasets: H-Prop and H-Prop-News, which consist of news articles in Hindi annotated as propaganda or non-propaganda. The H-Prop dataset is generated by translating 28,630 news articles from the QProp dataset. The H-Prop-News dataset contains 5500 news articles collected from 32 prominent Hindi news websites. We experiment with the proposed datasets using four supervised machine learning models combined with different feature vectors and word embeddings. Our experiments achieve 87% accuracy using Logistic Regression with TF-IDF feature vectors. The datasets provide high-quality labeled news articles in Hindi and open new avenues for researchers to explore techniques for analyzing and classifying propaganda in Hindi text.
      Citation: Data
      PubDate: 2022-02-28
      DOI: 10.3390/data7030029
      Issue No: Vol. 7, No. 3 (2022)
       
  • Data, Vol. 7, Pages 30: Bihourly Subterranean Temperature and Relative
           Humidity Data from the Nullarbor Plain, Australia (Nov 2019–Mar
           2021)

    • Authors: Matej Lipar, Mateja Ferk
      First page: 30
      Abstract: This research provides bihourly temperature and relative humidity data from ten measuring locations in eight caves from one of the largest contiguous arid karst areas in the world, the Nullarbor Plain in south Australia. The current data span the period from November 2019 to March 2021, and represent the first continuous published monitoring of the subterranean features in this area. The data were recorded using ten TGP-4500 Tinytag Plus 2 self-contained temperature (resolution ±0.01 °C or better with a reading range from −25 °C to +85 °C) and relative humidity (resolution ±3.0% or better with a reading range from 0% to 100%) data loggers and are available in the form of a spreadsheet. The text also describes reported (but only occasional) visits to the caves, so that the data for those particular days and/or hours can be treated as anthropogenically influenced. The data have great potential to provide insight into underground karst processes, air mass movements, hydrogeology, speleothems and (palaeo)climate, current climatic changes, and biology.
      Citation: Data
      PubDate: 2022-03-01
      DOI: 10.3390/data7030030
      Issue No: Vol. 7, No. 3 (2022)
       
  • Data, Vol. 7, Pages 31: Monitoring a Bolted Vibrating Structure Using
           Multiple Acoustic Emission Sensors: A Benchmark

    • Authors: Emmanuel Ramasso, Benoît Verdin, Gaël Chevallier
      First page: 31
      Abstract: The dataset presented in this work, called ORION-AE, is made of raw AE data streams collected by three different AE sensors and a laser vibrometer during five campaigns of measurements by varying the tightening conditions of two bolted plates submitted to harmonic vibration tests. With seven different operating conditions, this dataset was designed to challenge supervised and unsupervised machine/deep learning as well as signal processing methods which are developed for material characterization or structural health monitoring (SHM). One motivation of this work was to create a common benchmark for comparing data-driven methods dedicated to AE data interpretation. The dataset is made of time series collected during an experiment designed to reproduce the loosening phenomenon observed in aeronautics, automotive, or civil engineering structures where parts are assembled together by means of bolted joints. Monitoring loosening in jointed structures during operation remains challenging because contact and friction in bolted joints induce a nonlinear stochastic behavior.
      Citation: Data
      PubDate: 2022-03-02
      DOI: 10.3390/data7030031
      Issue No: Vol. 7, No. 3 (2022)
       
  • Data, Vol. 7, Pages 32: Dataset Documenting the Interactions of Biochar
           with Manure, Soil, and Plants: Towards Improved Sustainability of Animal
           and Crop Agriculture

    • Authors: Bonds, Koziel, De, Chen, Singh, Licht
      First page: 32
      Abstract: Plant and animal agriculture is a part of a larger system where the environment, soil, water, and nutrient management interact. Biochar (a pyrolyzed biomass) has been shown to affect the single components of this complex system positively. Biochar is a soil amendment, which has been documented for its benefits as a soil enhancer particularly to increase soil carbon, improve soil fertility, and better nutrient retention. These effects have been documented in the literature. Still, there is a need for a broader examination of these single components and effects that aims at the complementarity and synergy attainable with biochar and the animal and crop-production system. Thus, we report a comprehensive dataset documenting the interactions of biochar with manure, soil, and plants. We evaluated three biochars mixed with manure alongside both manure and soil controls for improvement in soil quality, reduction in nutrient movement, and increase in plant nutrient availability. We explain the experiments and the dataset that contains the physicochemical properties of each biochar–manure mixture, the physicochemical properties of soil amended with each biochar–manure mixture, and the biomass and nutrient information of plants grown in biochar–manure mixture-amended soil. This dataset is useful for continued research examining both the short- and long-term effects of biochar–manure mixtures on both plant and soil systems. In addition, these data will be beneficial to extend the findings to field settings for practical and realized gains.
      Citation: Data
      PubDate: 2022-03-02
      DOI: 10.3390/data7030032
      Issue No: Vol. 7, No. 3 (2022)
       
  • Data, Vol. 7, Pages 33: A Data Resource for Prediction of Gas-Phase
           Thermodynamic Properties of Small Molecules

    • Authors: William Bains, Janusz Jurand Petkowski, Zhuchang Zhan, Sara Seager
      First page: 33
      Abstract: The thermodynamic properties of a substance are key to predicting its behavior in physical and chemical systems. Specifically, the enthalpy of formation and entropy of a substance can be used to predict whether reactions involving that substance will proceed spontaneously under conditions of constant temperature and pressure, and if they do, what the heat and work yield of those reactions would be. Prediction of enthalpy and entropy of substances is therefore of value for substances for which those parameters have not been experimentally measured. We developed a database of 2869 experimental values of enthalpy of formation and 1403 values for entropy for substances composed of stable small molecules, derived from the literature. We developed a model for predicting enthalpy of formation and entropy from semiempirical quantum mechanical calculations of energy and atom counts, and applied the model to a comprehensive database of 16,417 small molecules. The database of small-molecule thermodynamic properties will be useful for predicting the outcome of any process that might involve the generation or destruction of volatile products, such as atmospheric chemistry, volcanism, or waste pyrolysis. Additionally, the collected experimental thermodynamic values will be of value to others developing models to predict enthalpy and entropy.
      Citation: Data
      PubDate: 2022-03-11
      DOI: 10.3390/data7030033
      Issue No: Vol. 7, No. 3 (2022)
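
The abstract above notes that enthalpy and entropy can be used to predict whether a reaction proceeds spontaneously at constant temperature and pressure. The standard criterion is the Gibbs free energy change, ΔG = ΔH − TΔS, with ΔG < 0 indicating a spontaneous reaction. The following is a minimal sketch of that textbook relation only; it is not the authors' prediction model, and the function names and numeric inputs are hypothetical illustrations.

```python
def gibbs_free_energy(delta_h_kj_mol, delta_s_j_mol_k, temperature_k):
    """Return the Gibbs free energy change in kJ/mol.

    delta_h_kj_mol: reaction enthalpy change in kJ/mol
    delta_s_j_mol_k: reaction entropy change in J/(mol K)
    temperature_k: absolute temperature in kelvin
    """
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    # Convert entropy from J/(mol K) to kJ/(mol K) so the units match.
    return delta_h_kj_mol - temperature_k * (delta_s_j_mol_k / 1000.0)


def is_spontaneous(delta_h_kj_mol, delta_s_j_mol_k, temperature_k):
    """A reaction is spontaneous at constant T and p when delta G < 0."""
    return gibbs_free_energy(delta_h_kj_mol, delta_s_j_mol_k, temperature_k) < 0


# Hypothetical reaction values, chosen only to illustrate the sign change.
dg_room = gibbs_free_energy(-100.0, -200.0, 298.15)
dg_hot = gibbs_free_energy(-100.0, -200.0, 600.0)
```

With these made-up inputs, an exothermic reaction with a negative entropy change is spontaneous at room temperature but not at 600 K, because the −TΔS term grows with temperature.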
       
  • Data, Vol. 7, Pages 34: Large-Scale Dataset for the Analysis of
           Outdoor-to-Indoor Propagation for 5G Mid-Band Operational Networks

    • Authors: Usman Ali, Giuseppe Caso, Luca De Nardis, Konstantinos Kousias, Mohammad Rajiullah, Özgü Alay, Marco Neri, Anna Brunstrom, Maria-Gabriella Di Benedetto
      First page: 34
      Abstract: Understanding radio propagation characteristics and developing channel models is fundamental to building and operating wireless communication systems. Among other uses, channel characterization and modeling can be used for coverage and performance analysis and prediction. Within this context, this paper describes a comprehensive dataset of channel measurements performed to analyze outdoor-to-indoor propagation characteristics in the mid-band spectrum identified for the operation of 5th Generation (5G) cellular systems. Previous efforts to analyze outdoor-to-indoor propagation characteristics in this band were made by using measurements collected on dedicated, mostly single-link setups. Hence, measurements performed on deployed and operational 5G networks are still lacking in the literature. To fill this gap, this paper presents a dataset of measurements performed over commercial 5G networks. In particular, the dataset includes measurements of channel power delay profiles from two 5G networks in Band n78, i.e., 3.3–3.8 GHz. Such measurements were collected at multiple locations in a large office building in the city of Rome, Italy by using the Rohde & Schwarz (R&S) TSMA6 network scanner during several weeks in 2020 and 2021. A primary goal of the dataset is to provide an opportunity for researchers to investigate a large set of 5G channel measurements, aiming at analyzing the corresponding propagation characteristics toward the definition and refinement of empirical channel propagation models.
      Citation: Data
      PubDate: 2022-03-15
      DOI: 10.3390/data7030034
      Issue No: Vol. 7, No. 3 (2022)
       
  • Data, Vol. 7, Pages 35: Transcriptomic Response of Human Nosocomial
           Pathogen Pseudomonas aeruginosa Biofilms Following Continuous Exposure to
           Antibiotic-Impregnated Catheters

    • Authors: Kidon Sung, Dan Li, Jungwhan Chon, Ohgew Kweon, Minjae Kim, Joshua Xu, Miseon Park, Saeed A. Khan
      First page: 35
      Abstract: Biofilms are complex surface-attached bacterial communities that serve as a protective survival strategy to adapt to an environment. Bacterial contamination and biofilm formation on implantable medical devices pose a serious threat to human health, and these biofilms have become the most important source of nosocomial infections. Although antimicrobial-impregnated catheters have been employed to prevent bacterial infection, there have been concerns about the potential emergence of antibiotic resistance. To investigate the risk of developing resistance, we performed RNA-sequencing gene expression profiling of P. aeruginosa biofilms in response to chronic exposure to clindamycin and rifampicin eluted from antibiotic-coated catheters in a CDC biofilm bioreactor. There were 877 and 178 differentially expressed genes identified in planktonic and biofilm cells after growth for 144 h with control (without antibiotic-impregnation) and clindamycin/rifampicin-impregnated catheters, respectively. The differentially expressed genes were further analyzed by Clusters of Orthologous Groups (COGs) functional classification and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. The data are publicly available through the GEO database with accession number GSE153546.
      Citation: Data
      PubDate: 2022-03-17
      DOI: 10.3390/data7030035
      Issue No: Vol. 7, No. 3 (2022)
       
  • Data, Vol. 7, Pages 36: Land Cover Map for Multifunctional Landscapes of
           Taita Taveta County, Kenya, Based on Sentinel-1 Radar, Sentinel-2 Optical,
           and Topoclimatic Data

    • Authors: Temesgen Alemayehu Abera, Ilja Vuorinne, Martha Munyao, Petri K. E. Pellikka, Janne Heiskanen
      First page: 36
      Abstract: Taita Taveta County (TTC) is one of the world’s biodiversity hotspots in the highlands with some of the world’s megafaunas in the lowlands. Detailed mapping of the terrestrial ecosystem of the whole county is of global significance for biodiversity conservation. Here, we present a land cover map for 2020 based on satellite observations, a machine learning algorithm, and a reference database for accuracy assessment. For the land cover map production processing chain, temporal metrics from Sentinel-1 and Sentinel-2 (such as median, quantiles, and interquartile range), vegetation indices from Sentinel-2 (normalized difference vegetation index, tasseled cap greenness, and tasseled cap wetness), topographic metrics (elevation, slope, and aspect), and mean annual rainfall were used as predictors in the gradient tree boost classification model. Reference sample points which were collected in the field were used to guide the collection of additional reference sample points based on high spatial resolution imagery for training and validation of the model. The accuracy of the land cover map and uncertainty of area estimates at 95% confidence interval were assessed using sample-based statistical inference. The land cover map has an overall accuracy of 81 ± 2.3% and it is freely accessible for land use planners, conservation managers, and researchers.
      Citation: Data
      PubDate: 2022-03-17
      DOI: 10.3390/data7030036
      Issue No: Vol. 7, No. 3 (2022)
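
The abstract above lists the normalized difference vegetation index (NDVI) among its Sentinel-2 predictors. NDVI is conventionally defined as (NIR − Red) / (NIR + Red). The sketch below shows that standard formula for a single pixel; the function name and reflectance values are hypothetical, and the sketch does not reproduce the authors' processing chain.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel.

    nir, red: surface reflectance in the near-infrared and red bands
    (for Sentinel-2 these are commonly bands B8 and B4).
    """
    denominator = nir + red
    if denominator == 0:
        raise ValueError("NDVI is undefined when nir + red is zero")
    return (nir - red) / denominator


# Hypothetical reflectance values for a vegetated and a bare-soil pixel.
vegetated = ndvi(0.5, 0.1)
bare_soil = ndvi(0.3, 0.25)
```

Values near +1 indicate dense green vegetation, while values near zero are typical of bare soil or built-up surfaces.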
       
  • Data, Vol. 7, Pages 37: Towards a National-Scale Dataset of Geotechnical
           and Hydrological Soil Parameters for Shallow Landslide Modeling

    • Authors: Pietro Vannocci, Samuele Segoni, Elena Benedetta Masi, Francesco Cardi, Nicola Nocentini, Ascanio Rosi, Gabriele Bicocchi, Michele D’Ambrosio, Massimiliano Nocentini, Luca Lombardi, Veronica Tofani, Nicola Casagli, Filippo Catani
      First page: 37
      Abstract: One of the main constraints in assessing shallow landslide hazards through physically based models is the need to characterize the geotechnical parameters of the involved materials. Indeed, the quantity and quality of input data are closely related to the reliability of the results of every model used; therefore, data acquisition is a critical and time-consuming step in every research activity. In this perspective, we reviewed all official certificates of tests performed over 30 years at the Geotechnics Laboratory of the Earth Science Department (University of Firenze, Firenze, Italy), compiling a dataset in which 380 points are accurately geolocated and provide information about one or more geotechnical parameters used in slope stability modeling. All tests performed in the past (in the framework of previous research programs, agreements of cooperation, or to support didactic activities) were gathered, homogenized, digitalized, and geotagged. The dataset is based on both on-site tests and laboratory tests; it accounts for 40 attributes, among which 13 are descriptive (e.g., lithology or location) and 27 may be of direct interest in slope stability modeling as input parameters. The dataset is made openly available and can be useful for scientists or practitioners committed to landslide modeling.
      Citation: Data
      PubDate: 2022-03-21
      DOI: 10.3390/data7030037
      Issue No: Vol. 7, No. 3 (2022)
       
  • Data, Vol. 7, Pages 12: Linking and Sharing Technology: Partnerships for
           Data Innovations for Management of Agricultural Big Data

    • Authors: Tulsi P. Kharel, Amanda J. Ashworth, Phillip R. Owens
      First page: 12
      Abstract: Combining data into a centralized, searchable, and linked platform will provide a data exploration platform to agricultural stakeholders and researchers for better agricultural decision making, thus fully utilizing existing data and preventing redundant research. Such a data repository requires readiness to share data, knowledge, and skillsets and working with Big Data infrastructures. With the adoption of new technologies and increased data collection, agricultural workforces need to update their knowledge, skills, and abilities. The partnerships for data innovation (PDI) effort integrates agricultural data by efficiently capturing them from field, lab, and greenhouse studies using a variety of sensors, tools, and apps and provides a quick visualization and summary of statistics for real-time decision making. This paper aims to evaluate and provide examples of case studies currently using PDI and use its long-term continental US database (18 locations and 24 years) to test the cover crop and grazing effects on soil organic carbon (SOC) storage. The results show that legume and rye (Secale cereale L.) cover crops increased SOC storage by 36% and 50%, respectively, compared with oat (Avena sativa L.) and rye mixtures and low and high grazing intensities improving the upper SOC by 69–72% compared with a medium grazing intensity. This was likely due to legumes providing a more favorable substrate for SOC formation and high grazing intensity systems having continuous manure deposition. Overall, PDI can be used to democratize data regionally and nationally and therefore can address large-scale research questions aimed at addressing agricultural grand challenges.
      Citation: Data
      PubDate: 2022-01-20
      DOI: 10.3390/data7020012
      Issue No: Vol. 7, No. 2 (2022)
       
  • Data, Vol. 7, Pages 13: #PraCegoVer: A Large Dataset for Image Captioning
           in Portuguese

    • Authors: Gabriel Oliveira dos Santos, Esther Luna Colombini, Sandra Avila
      First page: 13
      Abstract: Automatically describing images using natural sentences is essential to visually impaired people’s inclusion on the Internet. This problem is known as Image Captioning. There are many datasets in the literature, but most contain only English captions, whereas datasets with captions described in other languages are scarce. We introduce #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese. In contrast to popular datasets, #PraCegoVer has only one reference per image, and both mean and variance of reference sentence length are significantly high, which makes our dataset challenging due to its linguistic aspect. We carry out a detailed analysis to find the main classes and topics in our data. We compare #PraCegoVer to the MS COCO dataset in terms of sentence length and word frequency. We hope that the #PraCegoVer dataset encourages more works addressing the automatic generation of descriptions in Portuguese.
      Citation: Data
      PubDate: 2022-01-21
      DOI: 10.3390/data7020013
      Issue No: Vol. 7, No. 2 (2022)
       
  • Data, Vol. 7, Pages 14: Analysing Computer Science Courses over Time

    • Authors: Renza Campagni, Donatella Merlini, Maria Cecilia Verri
      First page: 14
      Abstract: In this paper we consider courses of a Computer Science degree in an Italian university from the year 2011 up to 2020. For each course, we know the number of exams taken by students during a given calendar year and the corresponding average grade; we also know the average normalized value of the result obtained in the entrance test and the distribution of students according to the gender. By using classification and clustering techniques, we analyze different data sets obtained by pre-processing the original data with information about students and their exams, and highlight which courses show a significant deviation from the typical progression of the courses of the same teaching year, as time changes. Finally, we give heat maps showing the order in which exams were taken by graduated students. The paper shows a reproducible methodology that can be applied to any degree course with a similar organization, to identify courses that present critical issues over time. A strength of the work is to consider courses over time as variables of interest, instead of the more frequently used personal and academic data concerning students.
      Citation: Data
      PubDate: 2022-01-24
      DOI: 10.3390/data7020014
      Issue No: Vol. 7, No. 2 (2022)
       
  • Data, Vol. 7, Pages 15: Managing FAIR Tribological Data Using Kadi4Mat

    • Authors: Nico Brandt, Nikolay T. Garabedian, Ephraim Schoof, Paul J. Schreiber, Philipp Zschumme, Christian Greiner, Michael Selzer
      First page: 15
      Abstract: The ever-increasing amount of data generated from experiments and simulations in engineering sciences is relying more and more on data science applications to generate new knowledge. Comprehensive metadata descriptions and a suitable research data infrastructure are essential prerequisites for these tasks. Experimental tribology, in particular, presents some unique challenges in this regard due to the interdisciplinary nature of the field and the lack of existing standards. In this work, we demonstrate the versatility of the open source research data infrastructure Kadi4Mat by managing and producing FAIR tribological data. As a showcase example, a tribological experiment is conducted by an experimental group with a focus on comprehensiveness. The result is a FAIR data package containing all produced data as well as machine- and user-readable metadata. The close collaboration between tribologists and software developers shows a practical bottom-up approach and how such infrastructures are an essential part of our FAIR digital future.
      Citation: Data
      PubDate: 2022-01-25
      DOI: 10.3390/data7020015
      Issue No: Vol. 7, No. 2 (2022)
       
  • Data, Vol. 7, Pages 16: Regression-Based Approach to Test Missing Data
           Mechanisms

    • Authors: Serguei Rouzinov, André Berchtold
      First page: 16
      Abstract: Missing data occur in almost all surveys; in order to handle them correctly it is essential to know their type. Missing data are generally divided into three types (or generating mechanisms): missing completely at random, missing at random, and missing not at random. The first step to understand the type of missing data generally consists in testing whether the missing data are missing completely at random or not. Several tests have been developed for that purpose, but they have difficulties when dealing with non-continuous variables and data with a low quantity of missing data. Our approach checks whether the missing data are missing completely at random or missing at random using a regression model and a distribution test, and it can be applied to continuous and categorical data. The simulation results show that our regression-based approach tends to be more sensitive to the quantity and the type of missing data than the commonly used methods.
      Citation: Data
      PubDate: 2022-01-25
      DOI: 10.3390/data7020016
      Issue No: Vol. 7, No. 2 (2022)
       
  • Data, Vol. 7, Pages 17: VC-SLAM—A Handcrafted Data Corpus for the
           Construction of Semantic Models

    • Authors: Andreas Burgdorf, Alexander Paulus, André Pomp, Tobias Meisen
      First page: 17
      Abstract: Ontology-based data management and knowledge graphs have emerged in recent years as efficient approaches for managing and utilizing diverse and large data sets. In this regard, research on algorithms for automatic semantic labeling and modeling as a prerequisite for both has made steady progress in the form of new approaches. The range of algorithms varies in the type of information used (data schema, values, or metadata), as well as in the underlying methodology (e.g., use of different machine learning methods or external knowledge bases). Approaches that have been established over the years, however, still come with various weaknesses. Most approaches are evaluated on a few small data corpora specific to the approach. This reduces comparability and also limits statements about the general applicability and performance of those approaches. Other research areas, such as computer vision or natural language processing, solve this problem by providing unified data corpora for the evaluation of specific algorithms and tasks. In this paper, we present and publish VC-SLAM to lay the necessary foundation for future research. This corpus allows the evaluation and comparison of semantic labeling and modeling approaches across different methodologies, and it is the first corpus that additionally allows textual data documentation to be leveraged for semantic labeling and modeling. Each of the contained 101 data sets consists of labels, data and metadata, as well as corresponding semantic labels and a semantic model that were manually created by human experts using an ontology that was explicitly built for the corpus. We provide statistical information about the corpus as well as a critical discussion of its strengths and shortcomings, and test the corpus with existing methods for labeling and modeling.
      Citation: Data
      PubDate: 2022-01-25
      DOI: 10.3390/data7020017
      Issue No: Vol. 7, No. 2 (2022)
       
  • Data, Vol. 7, Pages 18: An Empirical Study on Data Validation Methods of
           Delphi and General Consensus

    • Authors: Puthearath Chan
      First page: 18
      Abstract: Data collection and review are the building blocks of academic research regardless of the discipline. The gathered and reviewed data, however, need to be validated in order to obtain accurate information. The Delphi consensus is known as a method for validating the data. However, several studies have shown that this method is time-consuming and requires a number of rounds to complete. Until now, there has been no clear evidence that validating data by a Delphi consensus is more significant than by a general consensus. In this regard, if data validation by the two methods is not significantly different, then just using a general consensus method is sufficient, easier, and less time-consuming. Hence, this study aims to find out whether or not data validation by a Delphi consensus method is more significant than by a general consensus method. This study firstly collected and reviewed the data of sustainable building criteria, secondly validated these data by applying each consensus method, and finally made a comparison between both consensus methods. The results showed that seventeen of the valid criteria obtained from the general consensus and reduced by the Delphi consensus were found to be inconsistent for sustainable building assessments in Cambodia. Therefore, this study concludes that using the Delphi consensus method is more significant in validating the gathered and reviewed data. This experiment contributes to the selection and application of consensus methods in validating data, information, or criteria, especially in engineering fields.
      Citation: Data
      PubDate: 2022-01-27
      DOI: 10.3390/data7020018
      Issue No: Vol. 7, No. 2 (2022)
       
  • Data, Vol. 7, Pages 19: Acknowledgment to Reviewers of Data in 2021

    • Authors: Data Editorial Office
      First page: 19
      Abstract: Rigorous peer-reviews are the basis of high-quality academic publishing [...]
      Citation: Data
      PubDate: 2022-01-28
      DOI: 10.3390/data7020019
      Issue No: Vol. 7, No. 2 (2022)
       
  • Data, Vol. 7, Pages 20: Collaborative Data Use between Private and Public
           Stakeholders—A Regional Case Study

    • Authors: Claire Jean-Quartier, Miguel Rey Mazón, Mario Lovrić, Sarah Stryeck
      First page: 20
      Abstract: Research and development are facilitated by sharing knowledge bases, and the innovation process benefits from collaborative efforts that involve the collective utilization of data. Until now, most companies and organizations have produced and collected various types of data, and stored them in data silos that still have to be integrated with one another in order to enable knowledge creation. For this to happen, both public and private actors must adopt a flexible approach to achieve the necessary transition to break data silos and create collaborative data sharing between data producers and users. In this paper, we investigate several factors influencing cooperative data usage and explore the challenges posed by the participation in cross-organizational data ecosystems by performing an interview study among stakeholders from private and public organizations in the context of the project IDE@S, which aims at fostering the cooperation in data science in the Austrian federal state of Styria. We highlight technological and organizational requirements of data infrastructure, expertise, and practises towards collaborative data usage.
      Citation: Data
      PubDate: 2022-01-28
      DOI: 10.3390/data7020020
      Issue No: Vol. 7, No. 2 (2022)
       
  • Data, Vol. 7, Pages 21: Development of a Web-Based Prediction System for
           Students’ Academic Performance

    • Authors: Dabiah Alboaneen, Modhe Almelihi, Rawan Alsubaie, Raneem Alghamdi, Lama Alshehri, Renad Alharthi
      First page: 21
      Abstract: Educational Data Mining (EDM) is used to extract and discover interesting patterns from educational institution datasets using Machine Learning (ML) algorithms. There is much academic information related to students available. Therefore, it is helpful to apply data mining to extract factors affecting students’ academic performance. In this paper, a web-based system for predicting academic performance and identifying students at risk of failure through academic and demographic factors is developed. The ML model is developed to predict the total score of a course at the early stages. Several ML algorithms are applied, namely: Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), Artificial Neural Network (ANN), and Linear Regression (LR). This model applies to the data of female students of the Computer Science Department at Imam Abdulrahman bin Faisal University (IAU). The dataset contains 842 instances for 168 students. Moreover, the results showed that the prediction’s Mean Absolute Percentage Error (MAPE) reached 6.34%, and the academic factors had a higher impact on students’ academic performance than the demographic factors, with the midterm exam score at the top. The developed web-based prediction system is available on an online server and can be used by tutors.
      Citation: Data
      PubDate: 2022-01-29
      DOI: 10.3390/data7020021
      Issue No: Vol. 7, No. 2 (2022)
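
The abstract above reports its result as a Mean Absolute Percentage Error (MAPE). As a reminder of what that metric measures, the sketch below implements the usual definition, the mean of |actual − predicted| / |actual| expressed in percent. The function name and the sample scores are hypothetical and are not taken from the paper's data.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, returned in percent."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # The percentage error is undefined for a true value of zero.
        raise ValueError("actual values must be non-zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical course scores and model predictions.
true_scores = [100, 200]
predicted_scores = [90, 220]
error_percent = mape(true_scores, predicted_scores)
```

Because each error is divided by the true value, MAPE is scale-independent but cannot be computed when any true value is zero, which is why the sketch rejects such inputs.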
       
  • Data, Vol. 7, Pages 22: The Comparison of Cybersecurity Datasets

    • Authors: Ahmed Alshaibi, Mustafa Al-Ani, Abeer Al-Azzawi, Anton Konev, Alexander Shelupanov
      First page: 22
      Abstract: Almost all industrial internet of things (IIoT) attacks happen at the data transmission layer according to a majority of the sources. In IIoT, different machine learning (ML) and deep learning (DL) techniques are used for building the intrusion detection system (IDS) and models to detect the attacks in any layer of its architecture. In this regard, minimizing the attacks could be the major objective of cybersecurity, while knowing that they cannot be fully avoided. The number of people resisting the attacks and protection system is less than those who prepare the attacks. Well-reasoned and learning-backed problems must be addressed by the cyber machine, using appropriate methods alongside quality datasets. The purpose of this paper is to describe the development of the cybersecurity datasets used to train the algorithms which are used for building IDS detection models, as well as analyzing and summarizing the different and famous internet of things (IoT) attacks. This is carried out by assessing the outlines of various studies presented in the literature and the many problems with IoT threat detection. Hybrid frameworks have shown good performance and high detection rates compared to standalone machine learning methods in a few experiments. It is the researchers’ recommendation to employ hybrid frameworks to identify IoT attacks for the foreseeable future.
      Citation: Data
      PubDate: 2022-01-29
      DOI: 10.3390/data7020022
      Issue No: Vol. 7, No. 2 (2022)
       
  • Data, Vol. 7, Pages 23: Dataset for the Heat-Up and Heat Transfer towards
           Single Particles and Synthetic Particle Clusters from Particle-Resolved
           CFD Simulations

    • Authors: Mario Pichler, Markus Bösenhofer, Michael Harasek
      First page: 23
      Abstract: Heat transfer to particles is a key aspect of thermo-chemical conversion of pulverized fuels. These fuels tend to agglomerate in some areas of turbulent flow and to form particle clusters. Heat transfer and drag of such clusters are significantly different from single-particle approximations commonly used in Euler–Lagrange models. This fact prompted a direct numerical investigation of the heat transfer and drag behavior of synthetic particle clusters consisting of 44 spheres of uniform diameter (60 μm). Particle-resolved computational fluid dynamic simulations were carried out to investigate the heat fluxes, the forces acting upon the particle cluster, and the heat-up times of particle clusters with multiple void fractions (0.477–0.999) and varying relative velocities (0.5–25 m/s). The integral heat fluxes and exact particle positions for each particle in the cluster, integral heat fluxes, and the total acting force, derived from steady-state simulations, are reported for 85 different cases. The heat-up times of individual particles and the particle clusters are provided for six cases (three cluster void fractions and two relative velocities each). Furthermore, the heat-up times of single particles with different commonly used representative particle diameters are presented. Depending on the case, the particle Reynolds number, the cluster void fraction, the Nusselt number, and the cluster drag coefficient are included in the secondary data.
      Citation: Data
      PubDate: 2022-02-14
      DOI: 10.3390/data7020023
      Issue No: Vol. 7, No. 2 (2022)
       
  • Data, Vol. 7, Pages 24: British Columbia’s Index of Multiple
           Deprivation for Community Health Service Areas

    • Authors: Sharon Relova, Yayuk Joffres, Drona Rasali, Li Rita Zhang, Geoffrey McKee, Naveed Janjua
      First page: 24
      Abstract: Area-based socio-economic indicators, such as the Canadian Index of Multiple Deprivation (CIMD), have been used in equity analyses to inform strategies to improve needs-based, timely, and effective patient care and public health services to communities. The CIMD comprises four dimensions of deprivation: residential instability, economic dependency, ethno-cultural composition, and situational vulnerability. Using the CIMD methodology, the British Columbia Index of Multiple Deprivation (BCIMD) was developed to create indexes at the Community Health Services Area (CHSA) level in British Columbia (BC). BCIMD indexes are reported by quintiles, where quintile 1 represents the least deprived (or ethno-culturally diverse), and quintile 5 is the most deprived (or diverse). Distinctive characteristics of a community can be captured using the BCIMD, where a given CHSA may have a high level of deprivation in one dimension and a low level of deprivation in another. The utility of this data as a surveillance tool to monitor population demography has been used to inform decision making in healthcare by stakeholders in the regional health authorities and governmental agencies. The data have also been linked to health care data, such as COVID-19 case incidence and vaccination coverage, to understand the epidemiology of disease burden through an equity lens.
      Citation: Data
      PubDate: 2022-02-21
      DOI: 10.3390/data7020024
      Issue No: Vol. 7, No. 2 (2022)
       
  • Data, Vol. 7, Pages 25: A Mixture Hidden Markov Model to Mine
           Students’ University Curricula

    • Authors: Silvia Bacci, Bruno Bertaccini
      First page: 25
      Abstract: In the context of higher education, the wide availability of data gathered by universities for administrative purposes or for recording the evolution of students’ learning processes makes novel data mining techniques particularly useful to tackle critical issues. In Italy, current academic regulations allow students to customize the chronological sequence of courses they have to attend to obtain the final degree. This leads to a variety of sequences of exams, with an average time taken to obtain the degree that may significantly differ from the time established by law. In this contribution, we propose a mixture hidden Markov model to classify students into groups that are homogenous in terms of university paths, with the aim of detecting bottlenecks in the academic career and improving students’ performance.
      Citation: Data
      PubDate: 2022-02-21
      DOI: 10.3390/data7020025
      Issue No: Vol. 7, No. 2 (2022)
       
  • Data, Vol. 7, Pages 5: Open Government Data Use in the Brazilian States
           and Federal District Public Administrations

    • Authors: Ilka Kawashita, Ana Alice Baptista, Delfina Soares
      First page: 5
      Abstract: This research investigates whether, why, and how open government data (OGD) is used and reused by Brazilian state and district public administrations. A new online questionnaire was developed and used to collect data from 26 of the 27 federation units between June and July 2021. The resulting dataset was cleaned and anonymized. It provides insight into 158 parameters for the 26 federation units explored. This article describes the questionnaire metadata and the methods applied to collect and treat data. The data file was divided into four sections: respondent profile (identifying the respondent and their workplace), OGD use/consumption, what OGD is used for by public administrations, and why OGD is used by public administrations (benefits, drivers, and barriers to OGD use/reuse). Results provide the state of play of OGD use/reuse in the federation units’ administrations. Therefore, they could be used to inform open data policy and decision-making processes. Furthermore, they could be the starting point for discussing how OGD could better support the digital transformation in the public sector.
      Citation: Data
      PubDate: 2022-01-05
      DOI: 10.3390/data7010005
      Issue No: Vol. 7, No. 1 (2022)
       
  • Data, Vol. 7, Pages 6: Multi-Temporal Surface Water Classification for
           Four Major Rivers from the Peruvian Amazon

    • Authors: Margaret Kalacska, J. Pablo Arroyo-Mora, Oliver T. Coomes, Yoshito Takasaki, Christian Abizaid
      First page: 6
      Abstract: We describe a new minimum extent, persistent surface water classification for reaches of four major rivers in the Peruvian Amazon (i.e., Amazon, Napo, Pastaza, Ucayali). These data were generated by the Peruvian Amazon Rural Livelihoods and Poverty (PARLAP) Project which aims to better understand the nexus between livelihoods (e.g., fishing, agriculture, forest use, trade), poverty, and conservation in the Peruvian Amazon over a 35,000 km river network. Previous surface water datasets do not adequately capture the temporal changes in the course of the rivers, nor discriminate between primary main channel and non-main channel (e.g., oxbow lakes) water. We generated the surface water classifications in Google Earth Engine from Landsat TM 5, 7 ETM+, and 8 OLI satellite imagery for time periods from circa 1989, 2000, and 2015 using a hierarchical logical binary classification predominantly based on a modified Normalized Difference Water Index (mNDWI) and shortwave infrared surface reflectance. We included surface reflectance in the blue band and brightness temperature to minimize misclassification. High accuracies were achieved for all time periods (>90%).
      Citation: Data
      PubDate: 2022-01-06
      DOI: 10.3390/data7010006
      Issue No: Vol. 7, No. 1 (2022)
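
      The classification above is described as predominantly based on a modified Normalized Difference Water Index (mNDWI). As a rough sketch, the index is commonly defined as (green − SWIR) / (green + SWIR) on surface reflectance. The function names, the sample reflectance values and the 0.0 threshold below are assumptions for illustration; the paper's hierarchical rules also use the blue band and brightness temperature, which this sketch omits:

```python
def mndwi(green, swir):
    """Modified Normalized Difference Water Index for one pixel.

    green and swir are surface reflectance values (hypothetical inputs).
    Returns None when the denominator is zero.
    """
    denominator = green + swir
    if denominator == 0:
        return None
    return (green - swir) / denominator


def looks_like_water(green, swir, threshold=0.0):
    """Single-threshold test; the threshold value is an assumption."""
    index = mndwi(green, swir)
    return index is not None and index > threshold


# Water absorbs strongly in the shortwave infrared, so the index is positive.
print(mndwi(0.10, 0.02), looks_like_water(0.10, 0.02))
# Vegetation and soil reflect more SWIR than green, so the index is negative.
print(mndwi(0.05, 0.20), looks_like_water(0.05, 0.20))
```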
       
  • Data, Vol. 7, Pages 7: Knowledge Management Model for Smart Campus in
           Indonesia

    • Authors: Deden Sumirat Hidayat, Dana Indra Sensuse
      First page: 7
      Abstract: The application of smart campuses (SC), especially at higher education institutions (HEI) in Indonesia, is very diverse, and does not yet have standards. As a result, SC practice is spread across various areas in an unstructured and uneven manner. Knowledge management (KM) is one of the critical components of SC. However, the use of KM to support SC is less clearly discussed. Most implementations and assumptions still consider the latest IT application as the SC component. As such, this study aims to identify the components of the KM model for SC. This study used a systematic literature review (SLR) technique with PRISMA procedures, an analytical hierarchy process (AHP), and expert interviews. SLR is used to identify the components of the conceptual model, and AHP is used for model priority component analysis. Interviews were used for validation and model development. The results show that KM, IoT, and big data have the highest trends. Governance, people, and smart education have the highest trends. IT is the highest priority component. The KM model for SC has five main layers grouped in phases of the system cycle. This cycle describes the organization’s intellectual ability to adapt in achieving SC indicators. The knowledge cycle at HEIs focuses on education, research, and community service.
      Citation: Data
      PubDate: 2022-01-10
      DOI: 10.3390/data7010007
      Issue No: Vol. 7, No. 1 (2022)
       
  • Data, Vol. 7, Pages 8: TBCOV: Two Billion Multilingual COVID-19 Tweets
           with Sentiment, Entity, Geo, and Gender Labels

    • Authors: Muhammad Imran, Umair Qazi, Ferda Ofli
      First page: 8
      Abstract: As the world struggles with several compounded challenges caused by the COVID-19 pandemic in the health, economic, and social domains, timely access to disaggregated national and sub-national data is important to understand the emergent situation but is difficult to obtain. The widespread usage of social networking sites, especially during mass convergence events, such as health emergencies, provides instant access to citizen-generated data offering rich information about public opinions, sentiments, and situational updates useful for authorities to gain insights. We offer a large-scale social sensing dataset comprising two billion multilingual tweets posted from 218 countries by 87 million users in 67 languages. We used state-of-the-art machine learning models to enrich the data with sentiment labels and named-entities. Additionally, a gender identification approach is proposed to segregate user gender. Furthermore, a geolocalization approach is devised to geotag tweets at country, state, county, and city granularities, enabling a myriad of data analysis tasks to understand real-world issues at national and sub-national levels. We believe this multilingual data with broader geographical and longer temporal coverage will be a cornerstone for researchers to study impacts of the ongoing global health catastrophe and to manage adverse consequences related to people’s health, livelihood, and social well-being.
      Citation: Data
      PubDate: 2022-01-10
      DOI: 10.3390/data7010008
      Issue No: Vol. 7, No. 1 (2022)
       
  • Data, Vol. 7, Pages 9: A Repertoire of Virtual-Reality, Occupational
           Therapy Exercises for Motor Rehabilitation Based on Action Observation

    • Authors: Emilia Scalona, Doriana De Marco, Maria Chiara Bazzini, Arturo Nuara, Adolfo Zilli, Elisa Taglione, Fabrizio Pasqualetti, Generoso Della Polla, Nicola Francesco Lopomo, Maddalena Fabbri-Destro, Pietro Avanzini
      First page: 9
      Abstract: There is a growing interest in action observation treatment (AOT), i.e., a rehabilitative procedure combining action observation, motor imagery, and action execution to promote the recovery, maintenance, and acquisition of motor abilities. AOT studies employed basic upper limb gestures as stimuli, but—in principle—the AOT approach can be effectively extended to more complex actions like occupational gestures. Here, we present a repertoire of virtual-reality (VR) stimuli depicting occupational therapy exercises intended for AOT, potentially suitable for occupational safety and injury prevention. We animated a humanoid avatar by fitting the kinematics recorded by a healthy subject performing the exercises. All the stimuli are available via a custom-made graphical user interface, which allows the user to adjust several visualization parameters like the viewpoint, the number of repetitions, and the observed movement’s speed. Beyond providing clinicians with a set of VR stimuli promoting via AOT the recovery of goal-oriented, occupational gestures, such a repertoire could extend the use of AOT to the field of occupational safety and injury prevention.
      Citation: Data
      PubDate: 2022-01-11
      DOI: 10.3390/data7010009
      Issue No: Vol. 7, No. 1 (2022)
       
  • Data, Vol. 7, Pages 10: The Impact of Global Structural Information in
           Graph Neural Networks Applications

    • Authors: Davide Buffelli, Fabio Vandin
      First page: 10
      Abstract: Graph Neural Networks (GNNs) rely on the graph structure to define an aggregation strategy where each node updates its representation by combining information from its neighbours. A known limitation of GNNs is that, as the number of layers increases, information gets smoothed and squashed and node embeddings become indistinguishable, negatively affecting performance. Therefore, practical GNN models employ few layers and only leverage the graph structure in terms of limited, small neighbourhoods around each node. Inevitably, practical GNNs do not capture information depending on the global structure of the graph. While there have been several works studying the limitations and expressivity of GNNs, the question of whether practical applications on graph structured data require global structural knowledge or not remains unanswered. In this work, we empirically address this question by giving access to global information to several GNN models, and observing the impact it has on downstream performance. Our results show that global information can in fact provide significant benefits for common graph-related tasks. We further identify a novel regularization strategy that leads to an average accuracy improvement of more than 5% on all considered tasks.
      Citation: Data
      PubDate: 2022-01-13
      DOI: 10.3390/data7010010
      Issue No: Vol. 7, No. 1 (2022)
       
  • Data, Vol. 7, Pages 11: An Efficient Spark-Based Hybrid Frequent Itemset
           Mining Algorithm for Big Data

    • Authors: Mohamed Reda Al-Bana, Marwa Salah Farhan, Nermin Abdelhakim Othman
      First page: 11
      Abstract: Frequent itemset mining (FIM) is a common approach for discovering hidden frequent patterns from transactional databases used in prediction, association rules, classification, etc. Apriori is an elementary, iterative FIM algorithm used to find the frequent itemsets. Apriori scans the dataset multiple times to generate frequent itemsets of different cardinalities. Apriori’s performance degrades as the data grows because of the multiple dataset scans needed to extract the frequent itemsets. Eclat is a scalable version of the Apriori algorithm that utilizes a vertical layout. The vertical layout has many advantages; it helps to solve the problem of multiple dataset scans and holds information that helps to find each itemset’s support. In a vertical layout, itemset support can be obtained by intersecting transaction ids (tidset/tids) and pruning irrelevant itemsets. However, when tids become too big for memory, the algorithm’s efficiency suffers. In this paper, we introduce SHFIM (spark-based hybrid frequent itemset mining), a three-phase algorithm that utilizes both horizontal and vertical layouts and uses diffsets instead of tidsets to keep track of the differences between transaction ids rather than the intersections. Moreover, some improvements are developed to decrease the number of candidate itemsets. SHFIM is implemented and tested over the Spark framework, which utilizes the RDD (resilient distributed datasets) concept and in-memory processing to address problems of the MapReduce framework. We compared the performance of SHFIM with Spark-based Eclat and dEclat algorithms on four benchmark datasets. Experimental results showed that SHFIM outperforms the Spark-based Eclat and dEclat algorithms on both dense and sparse datasets in terms of execution time.
      Citation: Data
      PubDate: 2022-01-14
      DOI: 10.3390/data7010011
      Issue No: Vol. 7, No. 1 (2022)
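
      The abstract above contrasts tidset intersection (Eclat) with diffsets (dEclat). The sketch below is not SHFIM itself and does not use Spark; it only illustrates, on a made-up five-transaction database, that both representations give the same support for a 2-itemset. All names and data are hypothetical:

```python
# Toy horizontal layout: transaction id -> items (made-up data).
TRANSACTIONS = {
    1: {"a", "b", "c"},
    2: {"a", "b"},
    3: {"a", "c"},
    4: {"b", "c"},
    5: {"a", "b", "c"},
}


def vertical_layout(transactions):
    """Convert to a vertical layout: item -> set of transaction ids (tidset)."""
    tidsets = {}
    for tid, items in transactions.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def support_by_tidset(tidsets, x, y):
    """Eclat style: support({x, y}) is the size of the tidset intersection."""
    return len(tidsets[x] & tidsets[y])


def support_by_diffset(tidsets, x, y):
    """dEclat style: keep only the tids that contain x but not y.

    support({x, y}) = support({x}) - |diffset|
    """
    diffset = tidsets[x] - tidsets[y]
    return len(tidsets[x]) - len(diffset)


tidsets = vertical_layout(TRANSACTIONS)
print(support_by_tidset(tidsets, "a", "b"))
print(support_by_diffset(tidsets, "a", "b"))
```

      When two items co-occur often, as in dense datasets, the diffset is much smaller than the intersection it replaces, which is the memory saving the diffset approach relies on.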
       
  • Data, Vol. 7, Pages 1: Datasets for the Determination of Evaporative Flux
           from Distilled Water and Saturated Brine Using Bench-Scale Atmospheric
           Simulators

    • Authors: Jared Suchan, Shahid Azam
      First page: 1
      Abstract: Evaporation from fresh water and saline water is critical for the estimation of water budget in the Canadian Prairies. Predictive models using empirical field-based data are subject to significant errors and uncertainty. Therefore, highly controlled test conditions and accurately measured experimental data are required to understand the relationship between atmospheric variables at water surfaces. This paper provides a comprehensive dataset generated for the determination of evaporative flux from distilled water and saturated brine using the bench-scale atmospheric simulator (BAS) and the subsequently improved design (BAS2). Analyses of the weather scenarios from atmospheric parameters and evaporative flux from the experimental data are provided.
      Citation: Data
      PubDate: 2021-12-22
      DOI: 10.3390/data7010001
      Issue No: Vol. 7, No. 1 (2021)
       
  • Data, Vol. 7, Pages 2: Business Intelligence for IT Governance of a
           Technology Company

    • Authors: Vittoria Biagi, Riccardo Patriarca, Giulio Di Gravio
      First page: 2
      Abstract: Managers are required to make fast, reliable, and fact-based decisions to encompass the dynamicity of modern business environments. Data visualization and reporting are thus crucial activities to ensure a systematic organizational intelligence especially for technological companies operating in a fast-moving context. As such, this paper presents case-study research for the definition of a business intelligence model and related Key Performance Indicators (KPIs) to support risk-related decision making. The study firstly comprises a literature review on approaches for governance management, which confirm a disconnection between theory and practice. It then progresses to mapping the main business areas and suggesting exemplary KPIs to fill this gap. Finally, it documents the design and usage of a BI dashboard, as emerged via a validation with four managers. This early application shows the advantages of BI for both business operators and governance managers.
      Citation: Data
      PubDate: 2021-12-27
      DOI: 10.3390/data7010002
      Issue No: Vol. 7, No. 1 (2021)
       
  • Data, Vol. 7, Pages 3: News Monitor: A Framework for Exploring News in
           Real-Time

    • Authors: Nikolaos Panagiotou, Antonia Saravanou, Dimitrios Gunopulos
      First page: 3
      Abstract: News articles generated by online media are a major source of information. In this work, we present News Monitor, a framework that automatically collects news articles from a wide variety of online news portals and performs various analysis tasks. The framework initially identifies fresh news (first stories) and clusters articles about the same incidents. For every story, it first extracts all of the corresponding triples and then creates a knowledge base (KB) using open information extraction techniques. This knowledge base is then used to create a summary for the user. News Monitor allows users to use it as a search engine, ask questions in natural language, and receive answers generated by the state-of-the-art framework BERT. In addition, News Monitor crawls the Twitter stream using a dynamic set of “trending” keywords in order to retrieve all messages relevant to the news. The framework is distributed, online and performs analysis in real-time. According to the evaluation results, the fake news detection techniques utilized by News Monitor allow for an F-measure of 82% in the rumor identification task and an accuracy of 92% in the stance detection tasks. The major contribution of this work can be summarized as a novel real-time and scalable architecture that combines various effective techniques under a news analysis framework.
      Citation: Data
      PubDate: 2021-12-27
      DOI: 10.3390/data7010003
      Issue No: Vol. 7, No. 1 (2021)
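
      The evaluation above is reported as an F-measure. As a small sketch of how that score is derived from a classifier's counts (the function name and the counts are invented for illustration and are not the paper's results):

```python
def f_measure(true_pos, false_pos, false_neg, beta=1.0):
    """F-beta score from raw counts; beta=1.0 gives the usual F1.

    Returns 0.0 when precision or recall cannot be computed.
    """
    if true_pos + false_pos == 0 or true_pos + false_neg == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical rumor-detection counts: 8 rumors found, 2 false alarms, 2 missed.
print(f_measure(true_pos=8, false_pos=2, false_neg=2))
```

      With precision and recall both at 0.8 in this toy case, the F1 score is also 0.8; the harmonic mean only drops below the two values when they differ.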
       
  • Data, Vol. 7, Pages 4: View VULMA: Data Set for Training a
           Machine-Learning Tool for a Fast Vulnerability Analysis of Existing
           Buildings

    • Authors: Angelo Cardellicchio, Sergio Ruggieri, Valeria Leggieri, Giuseppina Uva
      First page: 4
      Abstract: The paper presents View VULMA, a data set specifically designed for training machine-learning tools for fast vulnerability analysis of existing buildings. Such tools require supervised training via an extensive set of building imagery, for which several typological parameters should be defined, with a proper label assigned to each sample on a per-parameter basis. Thus, defining an adequate training data set plays a key role, and several aspects should be considered, such as data availability, preprocessing, augmentation and balancing according to the selected labels. In this paper, we highlight all these issues, describing the strategies pursued to build a reliable data set. In particular, a detailed description of both the requirements (e.g., scale and resolution of images, evaluation parameters and data heterogeneity) and the steps followed to define View VULMA is provided, starting from the data assessment (which reduced the initial sample of about 20,000 images to a subset of about 3,000 pictures), to achieve the goal of training a transfer-learning-based automated tool for fast estimation of the vulnerability of existing buildings from single pictures.
      Citation: Data
      PubDate: 2021-12-31
      DOI: 10.3390/data7010004
      Issue No: Vol. 7, No. 1 (2021)
       
  • Data, Vol. 6, Pages 123: Collection of Bacterial Community Associated with
           Size Fractionated Aerosols from Kuwait

    • Authors: Nazima Habibi, Saif Uddin, Fadila Al Salameen, Montaha Behbehani, Faiz Shirshikhar, Nasreem Abdul Razzack, Anisha Shajan, Farhana Zakir Hussain
      First page: 123
      Abstract: Airborne particles play a significant role in the spread of bacterial communities. The prevalence of both pathogenic and non-pathogenic forms in the inhalable fractions of aerosols is known. The abundance of microorganisms in the aerosols heightens the likely health hazards due to inhalation since they serve as carriers for pathogens and allergens, often acting as a vector for pulmonary/respiratory infections. Not much information is available on the occurrence and prevalence of bacterial communities in different size-fractionated aerosols in Kuwait. A high-volume air sampler with a six-stage cascade impactor was deployed for sample collection at two sites representing a remote and an urban site. A total volume of 815 ± 5 m³ of air was passed through the filters to trap the particulate matter ranging from 0.39 to >10.2 μm in size (Stage 1 to Stage 5 and base filter). Aeromonas dominated all the stages at the urban site and Stage 5 at the remote site, whereas Sphingobium was prevalent at Stages 2, 3 and 4 at the remote site. Brevundimonas were found at Stages 1 and 5, and the base filter at the remote site. These results show that the bacterial community is altered in different size fractions of aerosols. Stages 1–4 form the respirable fraction, whereas Stage 5 and particles on the base filter are the inhalable fractions. Many species of Aeromonas cause disease, and hence their presence in inhalable fractions is a health concern, meaning that species-level identification is warranted.
      Citation: Data
      PubDate: 2021-11-24
      DOI: 10.3390/data6120123
      Issue No: Vol. 6, No. 12 (2021)
       
  • Data, Vol. 6, Pages 124: Crystal Clear: Investigating Databases for
           Research, the Case of Drone Strikes

    • Authors: Giampiero Giacomello, Damiano Martinelli
      First page: 124
      Abstract: The availability of numerous online databases offers new and tremendous opportunities for social science research. Furthermore, databases based on news reports often allow scholars to investigate issues otherwise hard to tackle, such as, for example, the impact and consequences of drone strikes. Crucial to the campaign against terrorism, official data on drone strikes are classified, but news reports permit a certain degree of independent scrutiny. The quality of such research may be improved if scholars can rely on two (or more) databases independently reporting on the same issue (a solution akin to ‘data triangulation’). Given these conditions, such databases should be as reliable and valid as possible. This paper aimed to discuss the ‘validity and reliability’ of two such databases, as well as open up a debate on the evaluation of the quality, reliability and validity of research data on ‘problematic’ topics that have recently become more accessible thanks to online sources.
      Citation: Data
      PubDate: 2021-11-25
      DOI: 10.3390/data6120124
      Issue No: Vol. 6, No. 12 (2021)
       
  • Data, Vol. 6, Pages 125: Learning Interpretable Mixture of Weibull
           Distributions—Exploratory Analysis of How Economic Development
           Influences the Incidence of COVID-19 Deaths

    • Authors: Róbert Csalódi, Zoltán Birkner, János Abonyi
      First page: 125
      Abstract: This paper presents an algorithm for learning local Weibull models, whose operating regions are represented by fuzzy rules. The applicability of the proposed method is demonstrated in estimating the mortality rate of the COVID-19 pandemic. The reproducible results show that there is a significant difference between mortality rates of countries due to their economic situation, urbanization, and the state of the health sector. The proposed method is compared with the semi-parametric Cox proportional hazard regression method. The distribution functions of these two methods are close to each other, so the proposed method can estimate efficiently.
      Citation: Data
      PubDate: 2021-11-26
      DOI: 10.3390/data6120125
      Issue No: Vol. 6, No. 12 (2021)
       
  • Data, Vol. 6, Pages 126: Spatial Interpolation of Air Pollutant and
           Meteorological Variables in Central Amazonia

    • Authors: Renato Okabayashi Miyaji, Felipe Valencia de Almeida, Lucas de Oliveira Bauer, Victor Madureira Ferrari, Pedro Luiz Pizzigatti Corrêa, Luciana Varanda Rizzo, Giri Prakash
      First page: 126
      Abstract: The Amazon Rainforest is highlighted by the global community both for its extensive vegetation cover that constantly suffers the effects of anthropic action and for its substantial biodiversity. This dataset presents data of meteorological variables from the Amazon Rainforest region with a spatial resolution of 0.001° in latitude and longitude, resulting from an interpolation process. The original data were obtained from the GoAmazon 2014/5 project, in the Atmospheric Radiation Measurement (ARM) repository, and then processed through mathematical and statistical methods. The dataset presented here can be used in experiments in the field of Data Science, such as training models for predicting climate variables or modeling the distribution of species.
      Citation: Data
      PubDate: 2021-11-30
      DOI: 10.3390/data6120126
      Issue No: Vol. 6, No. 12 (2021)
       
  • Data, Vol. 6, Pages 127: Development of A Spatiotemporal Database for
           Evolution Analysis of the Moscow Backbone Power Grid

    • Authors: Andrey Karpachevskiy, German Titov, Oksana Filippova
      First page: 127
      Abstract: Currently in the field of transport geography, the spatial evolution of electrical networks remains globally understudied. Publicly available data sources, including remote sensing data, have made it possible to collect spatial data on electrical networks, but at the same time a suitable data structure for storing them has not been defined. The main purpose of this study was the collection and structuring of spatiotemporal data on electric networks with the possibility of their further processing and analysis. To collect data, we used publicly available remote sensing and geoinformation systems, archival schemes and maps, as well as other documents related to the Moscow power grid. Additionally, we developed a web service for data publication and visualization. We conducted a small morphological analysis of the evolution of the network to show the possibilities of working with the database using a Python script. For example, we found that the portion of new lines has been declining since the 1950s and that in the 2010s the portion of partial reconstruction reached its maximum. Thus, the developed data structure and the database itself provide ample opportunities for the analysis and interpretation of the spatiotemporal development of electric networks. This can be used as a basis to study other territories. The main results of the study are published on the web service, where the user can interactively choose a year and two forms of power line representation to visualize on a map.
      Citation: Data
      PubDate: 2021-11-30
      DOI: 10.3390/data6120127
      Issue No: Vol. 6, No. 12 (2021)
       
  • Data, Vol. 6, Pages 128: Geo-Questionnaire for Environmental Planning: The
           Case of Ecosystem Services Delivered by Trees in Poland

    • Authors: Patrycja Przewoźna, Adam Inglot, Marcin Mielewczyk, Krzysztof Mączka, Piotr Matczak, Piotr Wężyk
      First page: 128
      Abstract: Studies on society and the environment interface are often based on simple questionnaires that do not allow for an in-depth analysis. Research conducted with geo-questionnaires is an increasingly common method. However, even if data collected via a geo-questionnaire are available, the shared databases provide limited information due to personal data protection. In the article, we present open databases that overcome those limitations. They are the result of the iTre-es project concerning public opinion on the benefits provided by trees and shrubs in four different research areas. The databases provide information on the location of trees that are valuable to the residents, the distances from the respondents’ residence place, their attitude toward tree removal, socio-demographic variables, attachment to the place of life, and environmental attitudes. The presentation of all these aspects was possible thanks to the appropriate aggregation of the results. A method to anonymize the respondents is presented. We discuss the collected data and their possible areas of application.
      Citation: Data
      PubDate: 2021-12-01
      DOI: 10.3390/data6120128
      Issue No: Vol. 6, No. 12 (2021)
       
  • Data, Vol. 6, Pages 129: Shipping Accidents Dataset: Data-Driven
           Directions for Assessing Accident’s Impact and Improving Safety
           Onboard

    • Authors: Panagiotis Panagiotidis, Kyriakos Giannakis, Nikolaos Angelopoulos, Angelos Liapis
      First page: 129
      Abstract: Recent tragic marine incidents indicate that more efficient safety procedures and emergency management systems are needed. During the 2014–2019 period, 320 accidents cost 496 lives, and 5424 accidents caused 6210 injuries. Ideally, we need historical data from real accident cases of ships to develop data-driven solutions. According to the literature, the most critical factor to the post-incident management phase is human error. However, no structured datasets record the crew’s actions during an incident and the human factors that contributed to its occurrence. To overcome the limitations mentioned above, we decided to utilise the unstructured information from accident reports conducted by governmental organisations to create a new, well-structured dataset of maritime accidents and provide intuitions for its usage. Our dataset contains all the information that the majority of the marine datasets include, such as the place, the date, and the conditions during the post-incident phase, e.g., weather data. Additionally, the proposed dataset contains attributes related to each incident’s environmental/financial impact, as well as a concise description of the post-incident events, highlighting the crew’s actions and the human factors that contributed to the incident. We utilise this dataset to predict the incident’s impact and provide data-driven directions regarding the improvement of the post-incident safety procedures for specific types of ships.
      Citation: Data
      PubDate: 2021-12-03
      DOI: 10.3390/data6120129
      Issue No: Vol. 6, No. 12 (2021)
       
  • Data, Vol. 6, Pages 130: Mexican Emotional Speech Database Based on
           Semantic, Frequency, Familiarity, Concreteness, and Cultural Shaping of
           Affective Prosody

    • Authors: Mathilde Marie Duville, Luz María Alonso-Valerdi, David I. Ibarra-Zarate
      First page: 130
      Abstract: In this paper, the Mexican Emotional Speech Database (MESD) that contains single-word emotional utterances for anger, disgust, fear, happiness, neutral and sadness with adult (male and female) and child voices is described. To validate the emotional prosody of the uttered words, a cubic Support Vector Machines classifier was trained on the basis of prosodic, spectral and voice quality features for each case study: (1) male adult, (2) female adult and (3) child. In addition, cultural, semantic, and linguistic shaping of emotional expression was assessed by statistical analysis. This study was registered at BioMed Central and is part of the implementation of a published study protocol. Mean emotional classification accuracies yielded 93.3%, 89.4% and 83.3% for male, female and child utterances respectively. Statistical analysis emphasized the shaping of emotional prosodies by semantic and linguistic features. A cultural variation in emotional expression was highlighted by comparing the MESD with the INTERFACE for Castilian Spanish database. The MESD provides reliable content for linguistic emotional prosody shaped by the Mexican cultural environment. In order to facilitate further investigations, a corpus controlled for linguistic features and emotional semantics, as well as one containing words repeated across voices and emotions are provided. The MESD is made freely available.
      Citation: Data
      PubDate: 2021-12-06
      DOI: 10.3390/data6120130
      Issue No: Vol. 6, No. 12 (2021)
       
  • Data, Vol. 6, Pages 131: Panel Dataset to Assess Proactive Eco-Innovation
           in the Paradigm of Firm Financial Progression

    • Authors: Md Abu Toha, Satirenjit Kaur Johl
      First page: 131
      Abstract: Recently, eco-innovation has received a lot of attention in the academic and corporate world due to its potential to accelerate firm financial progression. To measure eco-innovation, mostly primary data and a reactive approach were employed. By emphasising the proactive approach and utilising a secondary panel dataset, this study fills the existing research gap. Data presented in this paper comprise 31 energy firms from Bursa Malaysia for the years between 2015 and 2019. Panel data associated with eco-innovation proactiveness and firm financial progression were collected from three different sources such as company websites, annual reports, and sustainability reports using content analysis. For data collection, an index was adapted comprising five dimensions of eco-innovation, named as product, process, technology, organizational, and marketing. In addition to that, Tobin’s Q was considered as a proxy dimension for firm financial progression because it considers both market value as well as book value. Following a unit root test, six specific data diagnostic tests were performed to ensure data reliability and validity for future potential usage. The results reveal that the panel dataset was organised and is eligible for further statistical model analysis.
      Citation: Data
      PubDate: 2021-12-10
      DOI: 10.3390/data6120131
      Issue No: Vol. 6, No. 12 (2021)
       
  • Data, Vol. 6, Pages 132: Lipid Profiles of Human Brain Tumors Obtained by
           High-Resolution Negative Mode Ambient Mass Spectrometry

    • Authors: Denis S. Zavorotnyuk, Stanislav I. Pekov, Anatoly A. Sorokin, Denis S. Bormotov, Nikita Levin, Evgeny Zhvansky, Savva Semenov, Polina Strelnikova, Konstantin V. Bocharov, Alexander Vorobiev, Alexey Kononikhin, Vsevolod Shurkhay, Eugene N. Nikolaev, Igor A. Popov
      First page: 132
      Abstract: Alterations in cell metabolism, including changes in lipid composition occurring during malignancy, are well characterized for various tumor types. However, a significant part of studies that deal with brain tumors have been performed using cell cultures and animal models. Here, we present a dataset of 124 high-resolution negative ionization mode lipid profiles of human brain tumors resected during neurosurgery. The dataset is supplemented with 38 non-tumor pathological brain tissue samples resected during elective surgery. The change in lipid composition alterations of brain tumors enables the possibility of discriminating between malignant and healthy tissues with the implementation of ambient mass spectrometry. On the other hand, the collection of clinical samples allows the comparison of the metabolism alteration patterns in animal models or in vitro models with natural tumor samples ex vivo. The presented dataset is intended to be a data sample for bioinformaticians to test various data analysis techniques with ambient mass spectrometry profiles, or to be a source of clinically relevant data for lipidomic research in oncology.
      Citation: Data
      PubDate: 2021-12-12
      DOI: 10.3390/data6120132
      Issue No: Vol. 6, No. 12 (2021)
       
  • Data, Vol. 6, Pages 133: Indoor Environment Dataset to Estimate Room
           Occupancy

    • Authors: Andreé Vela, Joanna Alvarado-Uribe, Hector G. Ceballos
      First page: 133
      Abstract: The estimation of occupancy is a crucial contribution to achieve improvements in energy efficiency. The drawback of data or incomplete data related to occupancy in enclosed spaces makes it challenging to develop new models focused on estimating occupancy with high accuracy. Furthermore, considerable variation in the monitored spaces also makes it difficult to compare the results of different approaches. This dataset comprises the indoor environmental information (pressure, altitude, humidity, and temperature) and the corresponding occupancy level for two different rooms: (1) a fitness gym and (2) a living room. The fitness gym data were collected for six days between 18 September and 2 October 2019, obtaining 10,125 objects with a 1 s resolution according to the following occupancy levels: low (2442 objects), medium (5325 objects), and high (2358 objects). The living room data were collected for 11 days between 14 May and 4 June 2020, obtaining 295,823 objects with a 1 s resolution, according to the following occupancy levels: empty (50,978 objects), low (202,613 objects), medium (35,410 objects), and high (6822 objects). Additionally, the number of fans turned on is provided for the living room data. The data are publicly available in the Mendeley Data repository. This dataset can be used to train and compare different machine learning, deep learning, and physical models for estimating occupancy at enclosed spaces.
      Citation: Data
      PubDate: 2021-12-13
      DOI: 10.3390/data6120133
      Issue No: Vol. 6, No. 12 (2021)
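
A quick consistency check on the counts quoted in the abstract above, written as a minimal Python sketch. The dictionary names and the `class_shares` helper are illustrative only and are not part of the published dataset; the numbers are copied from the abstract. The per-level counts are summed and converted to shares, which is a useful first look at class imbalance before training an occupancy model.

```python
# Occupancy-level counts as quoted in the abstract (1 s resolution objects).
gym = {"low": 2442, "medium": 5325, "high": 2358}
living_room = {"empty": 50978, "low": 202613, "medium": 35410, "high": 6822}


def class_shares(counts):
    """Return (total, {level: share of total}) for a dict of per-level counts."""
    total = sum(counts.values())
    return total, {level: n / total for level, n in counts.items()}


gym_total, gym_shares = class_shares(gym)
living_total, living_shares = class_shares(living_room)

# The totals can be compared with the 10,125 and 295,823 objects reported.
print("gym total:", gym_total)
print("living room total:", living_total)

for name, shares in (("gym", gym_shares), ("living room", living_shares)):
    print(name, {level: round(share, 3) for level, share in shares.items()})
```

Both sums agree with the totals stated in the abstract. The living room data are dominated by the "low" level, so accuracy alone may be a misleading metric for models trained on it.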
       
  • Data, Vol. 6, Pages 134: A Prototypical Network-Based Approach for
           Low-Resource Font Typeface Feature Extraction and Utilization

    • Authors: Kangying Li, Biligsaikhan Batjargal, Akira Maeda
      First page: 134
      Abstract: This paper introduces a framework for retrieving low-resource font typeface databases by handwritten input. A new deep learning model structure based on metric learning is proposed to extract the features of a character typeface and predict the category of handwritten input queries. Rather than using sufficient training data, we aim to utilize ancient character font typefaces with only one sample per category. Our research aims to achieve decent retrieval performances over more than 600 categories of handwritten characters automatically. We consider utilizing generic handcrafted features to train a model to help the voting classifier make the final prediction. The proposed method is implemented on the ‘Shirakawa font oracle bone script’ dataset as an isolated ancient-character-recognition system based on free ordering and connective strokes. We evaluate the proposed model on several standard character and symbol datasets. The experimental results showed that the proposed method provides good performance in extracting the features of symbols or characters’ font images necessary to perform further retrieval tasks. The demo system has been released, and it requires only one sample for each character to predict the user input. The extracted features have a better effect in finding the highest-ranked relevant item in retrieval tasks, can be utilized in various technical frameworks for ancient character recognition, and can be applied to educational application development.
      Citation: Data
      PubDate: 2021-12-16
      DOI: 10.3390/data6120134
      Issue No: Vol. 6, No. 12 (2021)
       
  • Data, Vol. 6, Pages 109: Neglected Theories of Business
           Cycle—Alternative Ways of Explaining Economic Fluctuations

    • Authors: Klára Čermáková, Michal Bejček, Jan Vorlíček, Helena Mitwallyová
      First page: 109
      Abstract: The business cycle is a frequent topic in economic research; however, the approach based on individual strategies often remains neglected. The aspiration of this study is to prove that the behavior of individuals can originate and fuel an economic cycle. For this purpose, we are using an algorithm based on a repeated dove–hawk game. The results reveal that the sum of output in a society is affected by the ratio of individual strategies. Cyclical changes in this ratio will be translated into fluctuations of the total product of society. We present game theory modelling of a strategic behavioral approach as a valid theoretical foundation for explaining economic fluctuations. This article offers an unusual insight into the business cycle’s causes and growth theories.
      Citation: Data
      PubDate: 2021-10-20
      DOI: 10.3390/data6110109
      Issue No: Vol. 6, No. 11 (2021)
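
The claim that total output depends on the ratio of individual strategies can be illustrated with the textbook hawk-dove payoff matrix. The sketch below is not the authors' algorithm: the resource value `V`, the conflict cost `C`, the random-pairing assumption, and the sinusoidal hawk share are all hypothetical choices made for illustration.

```python
import math


def hawk_dove_payoffs(V, C):
    """Textbook hawk-dove payoffs to the row player (V: resource, C: cost of a fight)."""
    return {
        ("H", "H"): (V - C) / 2,  # two hawks fight and split the net value
        ("H", "D"): V,            # hawk takes everything from a dove
        ("D", "H"): 0.0,          # dove retreats
        ("D", "D"): V / 2,        # two doves share
    }


def mean_output(p_hawk, V, C):
    """Average payoff per individual when a share p_hawk plays hawk and pairs are random."""
    pay = hawk_dove_payoffs(V, C)
    prob = {"H": p_hawk, "D": 1.0 - p_hawk}
    return sum(prob[a] * prob[b] * pay[(a, b)] for a in "HD" for b in "HD")


# A hypothetical cyclical hawk share translates into a cyclical average output.
V, C = 2.0, 4.0
for t in range(8):
    p = 0.5 + 0.4 * math.sin(2 * math.pi * t / 8)
    print(t, round(p, 3), round(mean_output(p, V, C), 3))
```

Under these assumptions the average payoff simplifies to V/2 - p²C/2, so output falls as the hawk share p rises, and any cycle in p shows up as a cycle in output.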
       
  • Data, Vol. 6, Pages 110: Dataset of Students’ Performance Using Student
           Information System, Moodle and the Mobile Application “eDify”

    • Authors: Raza Hasan, Sellappan Palaniappan, Salman Mahmood, Ali Abbas, Kamal Uddin Sarker
      First page: 110
      Abstract: The data presented in this article comprise an educational dataset collected from the student information system (SIS), the learning management system (LMS) called Moodle, and video interactions from the mobile application called “eDify.” The dataset, from the higher educational institution (HEI) in Sultanate of Oman, comprises five modules of data from Spring 2017 to Spring 2021. The dataset consists of 326 student records with 40 features in total, including the students’ academic information from SIS (which has 24 features), the students’ activities performed on Moodle within and outside the campus (comprising 10 features), and the students’ video interactions collected from eDify (consisting of six features). The dataset is useful for researchers who want to explore students’ academic performance in online learning environments, and will help them to model their educational datamining models. Moreover, it can serve as an input for predicting students’ academic performance within the module for educational datamining and learning analytics. Furthermore, researchers are highly recommended to refer to the original papers for more details.
      Citation: Data
      PubDate: 2021-10-22
      DOI: 10.3390/data6110110
      Issue No: Vol. 6, No. 11 (2021)
       
  • Data, Vol. 6, Pages 111: King Abdulaziz University Breast Cancer Mammogram
           Dataset (KAU-BCMD)

    • Authors: Asmaa S. Alsolami, Wafaa Shalash, Wafaa Alsaggaf, Sawsan Ashoor, Haneen Refaat, Mohammed Elmogy
      First page: 111
      Abstract: The current era is characterized by the rapidly increasing use of computer-aided diagnosis (CAD) systems in the medical field. These systems need a variety of datasets to help develop, evaluate, and compare their performances fairly. Physicians indicated that breast anatomy, especially dense ones, and the probability of breast cancer and tumor development, vary highly depending on race. Researchers reported that breast cancer risk factors are related to culture and society. Thus, there is a massive need for a local dataset representing breast cancer in our region to help develop and evaluate automatic breast cancer CAD systems. This paper presents a public mammogram dataset called King Abdulaziz University Breast Cancer Mammogram Dataset (KAU-BCMD) version 1. To our knowledge, KAU-BCMD is the first dataset in Saudi Arabia that deals with a large number of mammogram scans. The dataset was collected from the Sheikh Mohammed Hussein Al-Amoudi Center of Excellence in Breast Cancer at King Abdulaziz University. It contains 1416 cases. Each case has two views for both the right and left breasts, resulting in 5662 images based on the breast imaging reporting and data system. It also contains 205 ultrasound cases corresponding to a part of the mammogram cases, with 405 images in total. The dataset was annotated and reviewed by three different radiologists. Our dataset is a promising dataset that contains different imaging modalities for breast cancer with different cancer grades for Saudi women.
      Citation: Data
      PubDate: 2021-10-25
      DOI: 10.3390/data6110111
      Issue No: Vol. 6, No. 11 (2021)
       
  • Data, Vol. 6, Pages 112: Future Prediction of COVID-19 Vaccine Trends
           Using a Voting Classifier

    • Authors: Syed Ali Jafar Zaidi, Saad Tariq, Samir Brahim Belhaouari
      First page: 112
      Abstract: Machine learning (ML)-based prediction is considered an important technique for improving decision making during the planning process. Modern ML models are used for prediction, prioritization, and decision making. Multiple ML algorithms are used to improve decision-making at different aspects after forecasting. This study focuses on the future prediction of the effectiveness of the COVID-19 vaccine, which has been presented as a light in the dark. People bear several reservations, including concerns about the efficacy of the COVID-19 vaccine. Under these presumptions, the COVID-19 vaccine would either lower the risk of developing the malady after injection, or the vaccine would impose side effects, affecting their existing health condition. In this regard, people have publicly expressed their concerns regarding the vaccine. This study intends to estimate what perception the masses will establish about the role of the COVID-19 vaccine in the future. Specifically, this study exhibits people’s predilection toward the COVID-19 vaccine and its results based on the reviews. Five models, namely random forest (RF), a support vector machine (SVM), decision tree (DT), K-nearest neighbor (KNN), and an artificial neural network (ANN), were used for forecasting the overall predilection toward the COVID-19 vaccine. A voting classifier was used at the end of this study to determine the accuracy of all the classifiers. The results prove that the SVM produces the best forecasting results and that artificial neural networks (ANNs) produce the worst prediction toward the individual aptitude to be vaccinated by the COVID-19 vaccine. When using the voting classifier, the proposed system provided an overall accuracy of 89.9% for the random dataset and 45.7% for the date-wise dataset. Thus, the results show that the studied prediction technique is a promising and encouraging procedure for studying the future trends of the COVID-19 vaccine.
      Citation: Data
      PubDate: 2021-11-02
      DOI: 10.3390/data6110112
      Issue No: Vol. 6, No. 11 (2021)
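
For readers unfamiliar with voting classifiers, the following is a minimal pure-Python sketch of hard (majority) voting. The abstract does not say whether hard or soft voting was used, so majority voting is an assumption here, and the labels and per-model predictions are made-up toy values, not results from the study.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote across models; ties go to the label of the earliest model."""
    voted = []
    for labels in zip(*predictions):  # one tuple of model outputs per sample
        counts = Counter(labels)
        top = max(counts.values())
        voted.append(next(label for label in labels if counts[label] == top))
    return voted


def accuracy(y_true, y_pred):
    """Fraction of samples where the prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical sentiment labels and predictions from three base models.
y_true = ["pos", "neg", "pos", "neg"]
model_a = ["pos", "neg", "neg", "neg"]
model_b = ["pos", "pos", "pos", "neg"]
model_c = ["neg", "neg", "pos", "neg"]

ensemble = hard_vote([model_a, model_b, model_c])
print("ensemble:", ensemble)
print("accuracy:", accuracy(y_true, ensemble))
```

In this toy case each base model makes one mistake on a different sample, so the majority vote recovers every label. Real ensembles gain less when the base models' errors are correlated.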
       
  • Data, Vol. 6, Pages 113: Nowcasting India Economic Growth Using a
           Mixed-Data Sampling (MIDAS) Model (Empirical Study with Economic Policy
           Uncertainty–Consumer Prices Index)

    • Authors: Pradeep Mishra, Khder Alakkari, Mostafa Abotaleb, Pankaj Kumar Singh, Shilpi Singh, Monika Ray, Soumitra Sankar Das, Umme Habibah Rahman, Ali J. Othman, Nazirya Alexandrovna Ibragimova, Gulfishan Firdose Ahmed, Fozia Homa, Pushpika Tiwari, Ritisha Balloo
      First page: 113
      Abstract: Economics suffers from a blurred view of the economy due to the delay in the official publication of macroeconomic variables and, essentially, of the most important variable of real GDP. Therefore, this paper aimed at nowcasting GDP in India based on high-frequency data released early. Instead of using a large set of data, thus increasing statistical complexity, two main indicators of the Indian economy (economic policy uncertainty and consumer price index) were relied on. The paper followed the MIDAS–Almon (PDL) weighting approach, which allowed us to successfully capture structural breaks and predict Indian GDP for the second quarter of 2021, after evaluating the accuracy of the nowcasting and out-of-sample prediction. Our results indicated low values of the RMSE in the sample and when predicting the out-of-sample 1- and 4-quarter horizon, but RMSE increased when predicting the 10-quarter horizon. Due to the effect of the short-term structural break, we found that RMSE values decreased for the last prediction point.
      Citation: Data
      PubDate: 2021-11-02
      DOI: 10.3390/data6110113
      Issue No: Vol. 6, No. 11 (2021)
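
The Almon (polynomial distributed lag, PDL) weighting mentioned in the abstract can be sketched as follows. This is the generic textbook form, in which the weight on lag k is a low-order polynomial in k. The abstract does not give the authors' exact parameterisation, lag length, or data, so the function names, coefficients, and toy series below are assumptions.

```python
def almon_weights(theta, n_lags):
    """Unnormalised Almon (PDL) lag weights: w_k = sum_j theta[j] * k**j, k = 0..n_lags-1."""
    return [sum(t * k ** j for j, t in enumerate(theta)) for k in range(n_lags)]


def midas_aggregate(high_freq, theta, n_lags):
    """Collapse the latest n_lags high-frequency observations into one regressor.

    high_freq[-1] is taken to be the most recent observation (lag 0).
    """
    weights = almon_weights(theta, n_lags)
    recent_first = high_freq[-n_lags:][::-1]
    return sum(w * x for w, x in zip(weights, recent_first))


# Hypothetical monthly indicator and polynomial coefficients.
monthly = [10.0, 20.0, 30.0]
theta = [1.0, -0.5, 0.0]

print(almon_weights(theta, 3))
print(midas_aggregate(monthly, theta, 3))
```

The aggregated value would then enter a low-frequency (for example, quarterly) regression as an ordinary regressor. The appeal of the polynomial form is that many lag weights are governed by only a few coefficients.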
       
  • Data, Vol. 6, Pages 114: Metadata Schema for Managing Digital Data and
           Images of Thai Human Skulls

    • Authors: Satapon Yosakonkun, Panya Tuamsuk, Wirapong Chansanam, Kulthida Tuamsuk
      First page: 114
      Abstract: This research was aimed at developing metadata that meets international standards for the purpose of managing digital data and images of Thai human skulls for medical studies. The research was conducted by applying the Metadata Lifecycle Model of the Metadata Architecture and Application Team. The model comprises four steps: requirement assessment and content analysis, identification of metadata requirements, metadata schema development, and metadata service and evaluation. The research outcome was a metadata schema composed of four modules, seven data element sets, and 29 pieces of data, each of which had six sets of property descriptions. Metadata evaluation conducted by three specialists in the field of anatomy and forensic medicine and three experts in the field of information science and metadata through free retrieval based on the Continuum of Metadata Quality in four aspects revealed that the experts were satisfied with the quality of metadata at a very high level: 100% for completeness, accuracy, and accessibility, and 94% for conformance to expectations. The developed metadata contain details that can be used to describe the characteristics of human skulls, with consideration taken in the development of the language used, retrieval, access, data exchange, and sharing. Thus, this novel metadata schema can be of use in management of digital data and images of human skulls for the purposes of medical studies, i.e., human anatomy and forensic anthropology.
      Citation: Data
      PubDate: 2021-11-10
      DOI: 10.3390/data6110114
      Issue No: Vol. 6, No. 11 (2021)
       
  • Data, Vol. 6, Pages 115: Innovation Trajectories for a Society 5.0

    • Authors: Fabio De Felice, Marta Travaglioni, Antonella Petrillo
      First page: 115
      Abstract: Big Data, the Internet of Things, and robotic and augmented realities are just some of the technologies that belong to Industry 4.0. These technologies improve working conditions and increase productivity and the quality of industry production. However, they can also improve life and society as a whole. A new perspective is oriented towards social well-being and it is called Society 5.0. Industry 4.0 supports the transition to the new society, but other drivers are also needed. To guide the transition, it is necessary to identify the enabling factors that integrate Industry 4.0. A conceptual framework was developed in which these factors were identified through a literature review and the analytical hierarchy process (AHP) methodology. Furthermore, the way in which they relate was evaluated with the help of the interpretive structural modeling (ISM) methodology. The proposed framework fills a research gap, which has not yet consolidated a strategy that includes all aspects of Society 5.0. As a result, the main driver, in addition to technology, is international politics.
      Citation: Data
      PubDate: 2021-11-10
      DOI: 10.3390/data6110115
      Issue No: Vol. 6, No. 11 (2021)
       
  • Data, Vol. 6, Pages 116: A Comparative Analysis of Machine Learning Models
           for the Prediction of Insurance Uptake in Kenya

    • Authors: Nelson Kemboi Yego, Juma Kasozi, Joseph Nkurunziza
      First page: 116
      Abstract: The role of insurance in financial inclusion and economic growth, in general, is immense and is increasingly being recognized. However, low uptake impedes the growth of the sector, hence the need for a model that robustly predicts insurance uptake among potential clients. This study undertook a two-phase comparison of machine learning classifiers. Phase I had eight machine learning models compared for their performance in predicting the insurance uptake using 2016 Kenya FinAccess Household Survey data. Taking Phase I as a base in Phase II, random forest and XGBoost were compared with four deep learning classifiers using 2019 Kenya FinAccess Household Survey data. The random forest model trained on oversampled data showed the highest F1-score, accuracy, and precision. The area under the receiver operating characteristic curve was furthermore highest for random forest; hence, it could be construed as the most robust model for predicting the insurance uptake. Finally, the most important features in predicting insurance uptake as extracted from the random forest model were income, bank usage, and ability and willingness to support others. Hence, there is a need for a design and distribution of low-income-based products, and bancassurance could be said to be a plausible channel for the distribution of insurance products.
      Citation: Data
      PubDate: 2021-11-15
      DOI: 10.3390/data6110116
      Issue No: Vol. 6, No. 11 (2021)
       
  • Data, Vol. 6, Pages 117: Multi-Ideology ISIS/Jihadist White Supremacist
           (MIWS) Dataset for Multi-Class Extremism Text Classification

    • Authors: Mayur Gaikwad, Swati Ahirrao, Shraddha Phansalkar, Ketan Kotecha
      First page: 117
      Abstract: Social media platforms are a popular choice for extremist organizations to disseminate their perceptions, beliefs, and ideologies. This information is generally based on selective reporting and is subjective in content. However, the radical presentation of this disinformation and its outreach on social media leads to an increased number of susceptible audiences. Hence, detection of extremist text on social media platforms is a significant area of research. The unavailability of extremism text datasets is a challenge in online extremism research. The lack of emphasis on classifying extremism text into propaganda, radicalization, and recruitment classes is a challenge. The lack of data validation methods also challenges the accuracy of extremism detection. This research addresses these challenges and presents a seed dataset with a multi-ideology and multi-class extremism text dataset. This research presents the construction of a multi-ideology ISIS/Jihadist White supremacist (MIWS) dataset with recent tweets collected from Twitter. The presented dataset can be employed effectively and importantly to classify extremist text into popular types like propaganda, radicalization, and recruitment. Additionally, the seed dataset is statistically validated with a coherence score of Latent Dirichlet Allocation (LDA) and word mover’s distance using a pretrained Google News vector. The dataset shows effectiveness in its construction with good coherence scores within a topic and appropriate distance measures between topics. This dataset is the first publicly accessible multi-ideology, multi-class extremism text dataset to reinforce research on extremism text detection on social media platforms.
      Citation: Data
      PubDate: 2021-11-15
      DOI: 10.3390/data6110117
      Issue No: Vol. 6, No. 11 (2021)
       
  • Data, Vol. 6, Pages 118: Transmission Electron Microscopy Tilt-Series Data
           from In-Situ Chondrocyte Primary Cilia

    • Authors: Michael J. Jennings, Timothy C. A. Molteno, Robert J. Walker, Jennifer J. Bedford, John P. Leader, Tony Poole
      First page: 118
      Abstract: The primary cilium has recently become the focus of intensive investigations into understanding the physical structure and processes of eukaryotic cells. This paper describes two tilt-series image datasets, acquired by transmission electron microscopy, of in situ chick-embryo sternal-cartilage primary cilia. These data have been released under an open-access licence, and are well suited to tomographic reconstruction and modelling of the cilium.
      Citation: Data
      PubDate: 2021-11-15
      DOI: 10.3390/data6110118
      Issue No: Vol. 6, No. 11 (2021)
       
  • Data, Vol. 6, Pages 119: Deep Reinforcement Learning For Trading—A
           Critical Survey

    • Authors: Adrian Millea
      First page: 119
      Abstract: Deep reinforcement learning (DRL) has achieved significant results in many machine learning (ML) benchmarks. In this short survey, we provide an overview of DRL applied to trading on financial markets with the purpose of unravelling common structures used in the trading community using DRL, as well as discovering common issues and limitations of such approaches. We include also a short corpus summarization using Google Scholar. Moreover, we discuss how one can use hierarchy for dividing the problem space, as well as using model-based RL to learn a world model of the trading environment which can be used for prediction. In addition, multiple risk measures are defined and discussed, which not only provide a way of quantifying the performance of various algorithms, but they can also act as (dense) reward-shaping mechanisms for the agent. We discuss in detail the various state representations used for financial markets, which we consider critical for the success and efficiency of such DRL agents. The market in focus for this survey is the cryptocurrency market; the results of this survey are two-fold: firstly, to find the most promising directions for further research and secondly, to show how a lack of consistency in the community can significantly impede research and the development of DRL agents for trading.
      Citation: Data
      PubDate: 2021-11-16
      DOI: 10.3390/data6110119
      Issue No: Vol. 6, No. 11 (2021)
       
  • Data, Vol. 6, Pages 120: COVID-19 Lockdown Effects on Academic
           Functioning, Mood, and Health Correlates: Data from Dutch Pharmacy
           Students, PhD Candidates and Postdocs

    • Authors: Pauline A. Hendriksen, Agnese Merlo, Elisabeth Y. Bijlsma, Ferdi Engels, Johan Garssen, Gillian Bruce, Joris C. Verster
      First page: 120
      Abstract: Mixed results have been published on the impact of the 2019 coronavirus (COVID-19) pandemic and its associated lockdown periods on academic functioning, mood, and health correlates such as alcohol consumption. Whereas a number of students report an impaired academic performance and increased alcohol intake during lockdown periods, other students report no change or an improvement in academic functioning and a reduced alcohol consumption. This data descriptor article describes the dataset of a study investigating the impact of the COVID-19 pandemic on academic functioning. To investigate this, an online survey was conducted among Dutch pharmacy students, PhD candidates and postdoctoral researchers (postdocs) of Utrecht University, the Netherlands. Compared to before the COVID-19 pandemic, the survey assessed possible changes in self-reported academic functioning, mood and health correlates such as alcohol consumption, perceived immune functioning and sleep quality. Retrospective assessments were made for four periods, including (1) the year 2019 (the period before COVID-19), (2) the first lockdown period (15 March–11 May 2020), (3) summer 2020 (no lockdown) and (4) the second lockdown (November 2020–April 2021). This article describes the content of the survey and corresponding dataset. The survey had a response rate of 24.3% and was completed by 345 participants.
      Citation: Data
      PubDate: 2021-11-17
      DOI: 10.3390/data6110120
      Issue No: Vol. 6, No. 11 (2021)
       
  • Data, Vol. 6, Pages 121: Bicycle Mobility Data: Current Use and Future
           Potential. An International Survey of Domain Professionals

    • Authors: Christian Werner, Martin Loidl
      First page: 121
      Abstract: Active mobility, especially cycling, is an essential building block for sustainable urban mobility. Public and private stakeholders are striving to improve conditions for cycling and subsequently increase its modal share. Data are regarded as key for different measures to become efficient and targeted. There is extensive evidence for an increasing amount of mobility data, availability of new data sources and potential usage scenarios for such data. However, little is known about the current use of these data in policy making, planning and related fields. To the best of our knowledge, it has not been investigated yet to which degree professionals in the broader field of cycling promotion benefit from an increasing amount of cycling-related data. Thus, we conducted a multi-lingual online survey among domain professionals and acquired data on their perspectives on current data availability, use and suitability as well as the potential they see for the use of cycling data in the future. In total, we received 325 complete responses from 32 countries, with the vast majority of 241 valid responses originating from Germany, Austria and Italy. Key findings are: 84% of domain professionals attribute high importance to data, and 89% state that they currently cannot or only partly solve their tasks with the data available to them. Results emphasize the need for making more and better suited data available to professionals in cycling-related positions, in both the private and public sector.
      Citation: Data
      PubDate: 2021-11-18
      DOI: 10.3390/data6110121
      Issue No: Vol. 6, No. 11 (2021)
       
  • Data, Vol. 6, Pages 122: Determination of Specific IgG to Identify
           Possible Food Intolerance in Athletes Using ELISA

    • Authors: Kristina Malsagova, Alexander Stepanov, Alexandra A. Sinitsyna, Alexander Izotov, Mikhail S. Klyuchnikov, Arthur T. Kopylov, Anna L. Kaysheva
      First page: 122
      Abstract: Nutrition is considered one of the foundations of athletic performance, and post-workout nutritional recommendations are fundamental to the effectiveness of the recovery and adaptive processes. Therefore, at present, new directions in dietetics are being formed, focused on the creation of personalized diets. To identify the probable risk of somatic and allergic reactions upon contact with food antigens, we used the method of enzyme-linked immunosorbent assay (ELISA) for the quantitative determination of IgG antibodies in the blood plasma of athletes against protein–peptide antigens accommodated in food. The study enrolled 40 athletes of boating and fighting sport disciplines. We found that the majority of the studied participants were characterized by an elevated IgG level against one or two food allergens (barley, almond, strawberry, etc.). Comparative analysis of the semiquantitative levels of IgG antibodies in athletes engaged in boating and fighting did not reveal significant differences between these groups. As a result, foods that are likely to cause the most pronounced immune response amongst the studied participants can be identified, which may indicate the presence of food intolerances. An athlete’s diet is influenced by both external and internal factors that can reduce or worsen the symptoms of a food intolerance/allergy associated with exercise. The range of foods is wide, and the effectiveness of a diet depends on the time, the place, and environmental factors. Therefore, during the recovery period (the post-competition period), athletes are advised to follow the instructions of doctors and nutritionists. An effective, comprehensive recovery strategy during the recovery period may enhance the adaptive response to fatigue, improving muscle function and increasing exercise tolerance. The data obtained may be useful for guiding the development of a new personalized approach and dietary recommendations covering the composition of athletes’ diet and the prevalence of food intolerance.
      Citation: Data
      PubDate: 2021-11-21
      DOI: 10.3390/data6110122
      Issue No: Vol. 6, No. 11 (2021)
       
  • Data, Vol. 6, Pages 101: Long-Term Dataset of Tidal Residuals in New South
           Wales, Australia

    • Authors: Cristina N. A. Viola, Danielle C. Verdon-Kidd, David J. Hanslow, Sam Maddox, Hannah E. Power
      First page: 101
      Abstract: Continuous water level records are required to detect long-term trends and analyse the climatological mechanisms responsible for extreme events. This paper compiles nine ocean water level records from gauges located along the New South Wales (NSW) coast of Australia. These gauges represent the longest and most complete records of hourly—and in five cases 15-min—water level data for this region. The datasets were adjusted to the vertical Australian Height Datum (AHD) and had the rainfall-related peaks removed from the records. The Unified Tidal Analysis and Prediction (Utide) model was subsequently used to predict tides for datasets with at least 25 years of records to obtain the associated tidal residuals. Finally, we provide a series of examples of how this dataset can be used to analyse trends in tidal anomalies as well as extreme events and their causal processes.
      Citation: Data
      PubDate: 2021-09-23
      DOI: 10.3390/data6100101
      Issue No: Vol. 6, No. 10 (2021)
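
The abstract above describes tidal residuals as what remains after predicted tides are subtracted from the observed water levels. A minimal sketch of that subtraction follows. It is not the authors' UTide workflow: as a stand-in, it fits a single harmonic constituent by least squares, and the function names, the 12.42 h period, and the synthetic series are all assumptions made for illustration.

```python
import numpy as np


def fit_single_constituent(t_hours, level, period_hours):
    """Least-squares fit of mean level plus one harmonic (illustrative stand-in for a tide model)."""
    omega = 2.0 * np.pi / period_hours
    # Design matrix: constant offset, cosine term, sine term.
    design = np.column_stack([
        np.ones_like(t_hours),
        np.cos(omega * t_hours),
        np.sin(omega * t_hours),
    ])
    coef, *_ = np.linalg.lstsq(design, level, rcond=None)
    return design @ coef


def tidal_residual(observed, predicted):
    """Residual = observed water level minus predicted tide."""
    return np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)


# Synthetic hourly series (hypothetical values, not taken from the dataset).
t = np.arange(240.0)
period = 12.42  # assumed semidiurnal period in hours
tide = 0.1 + 0.5 * np.cos(2.0 * np.pi * t / period)
surge = np.where((t >= 20) & (t < 26), 0.3, 0.0)

# A pure tide is reproduced by the fit, so its residual is essentially zero.
residual_clean = tidal_residual(tide, fit_single_constituent(t, tide, period))

# With a short surge added, the residual isolates the non-tidal signal.
observed = tide + surge
residual_surge = tidal_residual(observed, fit_single_constituent(t, observed, period))
```

A real analysis would use many constituents and a long record, which is why the abstract restricts tide prediction to datasets with at least 25 years of data.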
       
  • Data, Vol. 6, Pages 102: Multiple Image Splicing Dataset (MISD): A Dataset
           for Multiple Splicing

    • Authors: Kalyani Dhananjay Kadam, Swati Ahirrao, Ketan Kotecha
      First page: 102
      Abstract: Image forgery has become more common due to easy access to abundant image editing software. Forged images can be so convincing that the manipulation is impossible to detect with the naked eye. Such images are used to spread misleading information in society through social media platforms such as Facebook and Twitter. Hence, there is an urgent need for effective forgery detection techniques. To validate the credibility of these techniques, publicly available and more credible standard datasets are required. A few datasets are available for image splicing, such as Columbia, Carvalho, and CASIA V1.0. However, these datasets are employed only for the detection of image splicing. There are also a few custom datasets, such as Modified CASIA and AbhAS, which are likewise employed for the detection of image splicing forgeries. A study of existing datasets used for the detection of image splicing reveals that they are limited to image splicing only and do not contain multiple spliced images. This research work presents the Multiple Image Splicing Dataset (MISD), which consists of a total of 300 multiple spliced images. We are the first to develop a publicly available Multiple Image Splicing Dataset containing high-quality, annotated, realistic multiple spliced images. In addition, we provide a ground truth mask for these images. This dataset will open up opportunities for researchers working in this significant area.
      Citation: Data
      PubDate: 2021-09-28
      DOI: 10.3390/data6100102
      Issue No: Vol. 6, No. 10 (2021)
       
  • Data, Vol. 6, Pages 103: Experimental Data of Bottom Pressure and Free
           Surface Elevation including Wave and Current Interactions

    • Authors: Roman Gabl, Samuel Draycott, Ajit C. Pillai, Thomas Davey
      First page: 103
      Abstract: Force plates are commonly used in tank testing to measure loads acting on the foundation of a structure. These targeted measurements are overlaid by the hydrostatic and dynamic pressure acting on the force plate induced by the waves and currents. This paper presents a dataset of bottom force measurements with a six degree-of-freedom force plate (AMTI OR6-7 1000, surface area 0.464 m × 0.508 m) combined with synchronised measurements of surface elevation and current velocity. The data cover wave frequencies between 0.2 and 0.7 Hz and wave directions between 0° and 180°. These variations are provided for current speeds of 0 and 0.2 m/s and a variation of the current in the absence of waves covering 0 to 0.45 m/s. The dataset can be utilised as a validation dataset for models predicting bottom pressure based on free surface elevation. Additionally, the dataset provides the wave- and current-induced load acting on the specific load cell at a fixed water depth of 2 m, which can subsequently be removed to obtain the often-desired measurement of structural loads.
      Citation: Data
      PubDate: 2021-09-30
      DOI: 10.3390/data6100103
      Issue No: Vol. 6, No. 10 (2021)
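
The abstract above mentions models that predict bottom pressure from free surface elevation. One common textbook model of this kind is linear (Airy) wave theory, sketched below. It is offered only as an illustration, not as the model the authors validate. The sketch ignores currents, assumes fresh water (density 1000 kg/m³), and uses hypothetical function names. It solves the dispersion relation ω² = g k tanh(k h) for the wavenumber k and applies the bed pressure response factor 1/cosh(k h).

```python
import math

G = 9.81      # gravitational acceleration in m/s^2
RHO = 1000.0  # water density in kg/m^3 (assumption: fresh water)


def wavenumber(freq_hz, depth_m, tol=1e-12, max_iter=100):
    """Solve omega^2 = g*k*tanh(k*h) for k with Newton's method."""
    omega = 2.0 * math.pi * freq_hz
    k = omega ** 2 / G  # deep-water wavenumber as the initial guess
    for _ in range(max_iter):
        f = G * k * math.tanh(k * depth_m) - omega ** 2
        df = G * math.tanh(k * depth_m) + G * k * depth_m / math.cosh(k * depth_m) ** 2
        k_new = k - f / df
        if abs(k_new - k) < tol:
            return k_new
        k = k_new
    return k


def bed_response_factor(freq_hz, depth_m):
    """Ratio of dynamic pressure head at the bed to surface elevation amplitude."""
    return 1.0 / math.cosh(wavenumber(freq_hz, depth_m) * depth_m)


def bed_pressure_amplitude(wave_amplitude_m, freq_hz, depth_m):
    """Dynamic pressure amplitude at the bed in Pa under linear theory, no current."""
    return RHO * G * wave_amplitude_m * bed_response_factor(freq_hz, depth_m)


# Frequency range and 2 m depth follow the abstract; the 0.05 m amplitude is hypothetical.
for f_hz in (0.2, 0.45, 0.7):
    print(f_hz, bed_response_factor(f_hz, 2.0), bed_pressure_amplitude(0.05, f_hz, 2.0))
```

The response factor falls as frequency rises, so shorter waves leave a weaker pressure signature at the bed. With a current present the dispersion relation is Doppler-shifted, which this sketch does not account for.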
       
  • Data, Vol. 6, Pages 104: Human Activity Vibrations

    • Authors: Sakdirat Kaewunruen, Jessada Sresakoolchai, Junhui Huang, Satoru Harada, Wisinee Wisetjindawat
      First page: 104
      Abstract: We present a unique, comprehensive dataset that provides the patterns of five activities: walking, cycling, and taking a train, a bus, or a taxi. The measurements are carried out by accelerometers embedded in smartphones. The dataset offers the dynamic responses of subjects carrying smartphones in varied styles as they perform the five activities, captured as vibrations acquired by the accelerometers. The dataset contains time stamps and the corresponding vibrations in three directions (longitudinal, horizontal, and vertical), stored in an Excel Macro-enabled Workbook (xlsm) format. It can be used to train an AI model in a smartphone, which has the potential to collect people’s vibration data and decide what movement is being conducted. Moreover, as more data are received, the database can be updated and used to train the model with a larger dataset. The prevalence of the smartphone opens the door to crowdsensing, which allows the patterns of people taking public transport to be understood. Furthermore, the time consumed in each activity is available in the dataset. Therefore, with a better understanding of people using public transport, services and schedules can be planned perceptively.
      Citation: Data
      PubDate: 2021-09-30
      DOI: 10.3390/data6100104
      Issue No: Vol. 6, No. 10 (2021)
       
  • Data, Vol. 6, Pages 105: Experimental Data of a Hexagonal Floating
           Structure under Waves

    • Authors: Roman Gabl, Robert Klar, Thomas Davey, David M. Ingram
      First page: 105
      Abstract: Floating structures have a wide range of applications and shapes. This experimental investigation observes a hexagonal floating structure under wave conditions for three different draft configurations. Regular waves as well as a range of white noise tests were conducted to quantify the response amplitude operator (RAO). Further irregular waves focused on the survivability of the floating structure. The presented dataset includes wave gauge data as well as a six degree of freedom motion measurement to quantify the response only restricted by a soft mooring system. Additional analyses include the measurement of the mass properties of the individual configurations, the natural frequency of the mooring system, and the comparison between requested and measured wave heights. This allows the provided dataset to be used as a validation experiment.
      Citation: Data
      PubDate: 2021-09-30
      DOI: 10.3390/data6100105
      Issue No: Vol. 6, No. 10 (2021)
       
  • Data, Vol. 6, Pages 106: Mobile Apps to Fight the COVID-19 Crisis

    • Authors: Chrisa Tsinaraki, Irena Mitton, Marco Minghini, Marina Micheli, Alexander Kotsev, Lorena Hernandez Quiros, Fabiano-Antonio Spinelli, Alessandro Dalla Benetta, Sven Schade
      First page: 106
      Abstract: The COVID-19 pandemic led to a multi-faceted global crisis, which triggered the diverse and quickly emerging use of old and new digital tools. We have developed a multi-channel approach for the monitoring and analysis of a subset of such tools, the COVID-19 related mobile applications (apps). Our approach builds on the information available in the two most prominent app stores (i.e., Google Play for Android-powered devices and Apple’s App Store for iOS-powered devices), as well as on relevant tweets and digital media outlets. The dataset presented here, one of the outcomes of this approach, uses the content of the app stores and enriches it, providing aggregated information about 837 mobile apps published across the world to fight the COVID-19 crisis. This information includes: (a) information available in the mobile app stores between 20 April 2020 and 2 August 2020; (b) complementary information obtained from manual analysis performed until mid-September 2020; and (c) status information about app availability on 28 February 2021, when we last collected data from the mobile app stores. We highlight our findings with a series of descriptive statistics, which depict both the activities in the app stores and the qualitative information that was revealed by the manual analysis.
      Citation: Data
      PubDate: 2021-10-08
      DOI: 10.3390/data6100106
      Issue No: Vol. 6, No. 10 (2021)
       
  • Data, Vol. 6, Pages 107: The Retreat of Mountain Glaciers since the Little
           Ice Age: A Spatially Explicit Database

    • Authors: Silvio Marta, Roberto Sergio Azzoni, Davide Fugazza, Levan Tielidze, Pritam Chand, Katrin Sieron, Peter Almond, Roberto Ambrosini, Fabien Anthelme, Pablo Alviz Gazitúa, Rakesh Bhambri, Aurélie Bonin, Marco Caccianiga, Sophie Cauvy-Fraunié, Jorge Luis Ceballos Lievano, John Clague, Justiniano Alejo Cochachín Rapre, Olivier Dangles, Philip Deline, Andre Eger, Rolando Cruz Encarnación, Sergey Erokhin, Andrea Franzetti, Ludovic Gielly, Fabrizio Gili, Mauro Gobbi, Alessia Guerrieri, Sigmund Hågvar, Norine Khedim, Rahab Kinyanjui, Erwan Messager, Marco Aurelio Morales-Martínez, Gwendolyn Peyre, Francesca Pittino, Jerome Poulenard, Roberto Seppi, Milap Chand Sharma, Nurai Urseitova, Blake Weissling, Yan Yang, Vitalii Zaginaev, Anaïs Zimmer, Guglielmina Adele Diolaiuti, Antoine Rabatel, Gentile Francesco Ficetola
      First page: 107
      Abstract: Most of the world’s mountain glaciers have been retreating for more than a century in response to climate change. Glacier retreat is evident on all continents, and the rate of retreat has accelerated during recent decades. Accurate, spatially explicit information on the position of glacier margins over time is useful for analyzing patterns of glacier retreat and measuring reductions in glacier surface area. This information is also essential for evaluating how mountain ecosystems are evolving due to climate warming and the attendant glacier retreat. Here, we present a non-comprehensive spatially explicit dataset showing multiple positions of glacier fronts since the Little Ice Age (LIA) maxima, including many data from the pre-satellite era. The dataset is based on multiple historical archival records including topographical maps; repeated photographs, paintings, and aerial or satellite images with a supplement of geochronology; and our own field data. We provide ESRI shapefiles showing 728 past positions of 94 glacier fronts from all continents, except Antarctica, covering the period between the Little Ice Age maxima and the present. On average, the time series span the past 190 years. From 2 to 46 past positions per glacier are depicted (on average: 7.8).
      Citation: Data
      PubDate: 2021-10-09
      DOI: 10.3390/data6100107
      Issue No: Vol. 6, No. 10 (2021)
       
  • Data, Vol. 6, Pages 108: A Principal Components Analysis-Based Method for
           the Detection of Cannabis Plants Using Representation Data by Remote
           Sensing

    • Authors: Carmine Gambardella, Rosaria Parente, Alessandro Ciambrone, Marialaura Casbarra
      First page: 108
      Abstract: Integrating the representation of the territory, through airborne remote sensing activities with hyperspectral and visible sensors, and managing complex data through dimensionality reduction for the identification of cannabis plantations in Albania is the focus of the research proposed by the multidisciplinary group of the Benecon University Consortium. In this study, principal components analysis (PCA) was used to remove redundant spectral information from multiband datasets. This makes it easier to identify the most prevalent spectral characteristics in most bands and those that are specific to only a few bands. The survey and airborne monitoring by hyperspectral sensors are carried out with an Itres CASI 1500 sensor owned by Benecon, characterized by a spectral range of 380–1050 nm and 288 configurable channels. The spectral configuration adopted for the research was developed specifically to maximize the spectral separability of cannabis. The ground resolution of the georeferenced cartographic data varies according to the flight planning, inserted in the aerial platform of an Italian Guardia di Finanza aircraft, in relation to the orography of the sites under investigation. The geodatabase, wherein the processing of hyperspectral and visible images converges, contains ancillary data such as digital aeronautical maps, digital terrain models, color orthophotos, topographic data and in any case a significant amount of data so that they can be processed synergistically. The goal is to create maps and predictive scenarios, through the application of the spectral angle mapper algorithm, of the cannabis plantations scattered throughout the area. The protocol consists of comparing the spectral data acquired with the CASI 1500 airborne sensor and the spectral signature of the cannabis leaves that have been acquired in the laboratory with ASD Fieldspec PRO FR spectrometers.
These scientific studies have demonstrated how it is possible to achieve ex ante control of the evolution of the phenomenon itself for monitoring the cultivation of cannabis plantations.
      Citation: Data
      PubDate: 2021-10-13
      DOI: 10.3390/data6100108
      Issue No: Vol. 6, No. 10 (2021)
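
The abstract above names the spectral angle mapper algorithm, which compares each pixel spectrum with a reference signature by the angle between them treated as vectors. A minimal sketch of that comparison follows. The function name, the three-band toy spectra, and the 0.1 rad threshold are assumptions for illustration and do not come from the study.

```python
import numpy as np


def spectral_angle(pixels, reference):
    """Angle in radians between each pixel spectrum and a reference spectrum.

    pixels has shape (..., bands); reference has shape (bands,).
    Spectra with zero norm are not handled in this sketch.
    """
    pixels = np.asarray(pixels, dtype=float)
    reference = np.asarray(reference, dtype=float)
    dot = pixels @ reference
    norms = np.linalg.norm(pixels, axis=-1) * np.linalg.norm(reference)
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(dot / norms, -1.0, 1.0))


# Toy three-band spectra (hypothetical values).
reference = np.array([0.1, 0.2, 0.3])
pixels = np.array([
    [0.2, 0.4, 0.6],  # same shape as the reference, twice as bright
    [0.3, 0.2, 0.1],  # different spectral shape
])

angles = spectral_angle(pixels, reference)
threshold_rad = 0.1  # hypothetical decision threshold
matches = angles < threshold_rad
```

Because the angle ignores vector length, a pixel that is uniformly brighter or darker than the reference still matches, which makes the measure fairly insensitive to illumination differences.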
       
 