COMPUTER SCIENCE (1305 journals)

Showing 1 - 200 of 872 Journals sorted alphabetically
3D Printing and Additive Manufacturing     Full-text available via subscription   (Followers: 27)
Abakós     Open Access   (Followers: 3)
ACM Computing Surveys     Hybrid Journal   (Followers: 29)
ACM Inroads     Full-text available via subscription   (Followers: 1)
ACM Journal of Computer Documentation     Free   (Followers: 4)
ACM Journal on Computing and Cultural Heritage     Hybrid Journal   (Followers: 5)
ACM Journal on Emerging Technologies in Computing Systems     Hybrid Journal   (Followers: 11)
ACM SIGACCESS Accessibility and Computing     Free   (Followers: 2)
ACM SIGAPP Applied Computing Review     Full-text available via subscription  
ACM SIGBioinformatics Record     Full-text available via subscription  
ACM SIGEVOlution     Full-text available via subscription  
ACM SIGHIT Record     Full-text available via subscription  
ACM SIGHPC Connect     Full-text available via subscription  
ACM SIGITE Newsletter     Open Access   (Followers: 1)
ACM SIGMIS Database: the DATABASE for Advances in Information Systems     Hybrid Journal  
ACM SIGUCCS plugged in     Full-text available via subscription  
ACM SIGWEB Newsletter     Full-text available via subscription   (Followers: 3)
ACM Transactions on Accessible Computing (TACCESS)     Hybrid Journal   (Followers: 3)
ACM Transactions on Algorithms (TALG)     Hybrid Journal   (Followers: 13)
ACM Transactions on Applied Perception (TAP)     Hybrid Journal   (Followers: 3)
ACM Transactions on Architecture and Code Optimization (TACO)     Hybrid Journal   (Followers: 9)
ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)     Hybrid Journal  
ACM Transactions on Autonomous and Adaptive Systems (TAAS)     Hybrid Journal   (Followers: 10)
ACM Transactions on Computation Theory (TOCT)     Hybrid Journal   (Followers: 11)
ACM Transactions on Computational Logic (TOCL)     Hybrid Journal   (Followers: 5)
ACM Transactions on Computer Systems (TOCS)     Hybrid Journal   (Followers: 19)
ACM Transactions on Computer-Human Interaction     Hybrid Journal   (Followers: 15)
ACM Transactions on Computing Education (TOCE)     Hybrid Journal   (Followers: 9)
ACM Transactions on Computing for Healthcare     Hybrid Journal  
ACM Transactions on Cyber-Physical Systems (TCPS)     Hybrid Journal   (Followers: 1)
ACM Transactions on Design Automation of Electronic Systems (TODAES)     Hybrid Journal   (Followers: 5)
ACM Transactions on Economics and Computation     Hybrid Journal  
ACM Transactions on Embedded Computing Systems (TECS)     Hybrid Journal   (Followers: 4)
ACM Transactions on Information Systems (TOIS)     Hybrid Journal   (Followers: 18)
ACM Transactions on Intelligent Systems and Technology (TIST)     Hybrid Journal   (Followers: 11)
ACM Transactions on Interactive Intelligent Systems (TiiS)     Hybrid Journal   (Followers: 6)
ACM Transactions on Internet of Things     Hybrid Journal   (Followers: 2)
ACM Transactions on Modeling and Performance Evaluation of Computing Systems (ToMPECS)     Hybrid Journal  
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)     Hybrid Journal   (Followers: 10)
ACM Transactions on Parallel Computing     Full-text available via subscription  
ACM Transactions on Reconfigurable Technology and Systems (TRETS)     Hybrid Journal   (Followers: 6)
ACM Transactions on Sensor Networks (TOSN)     Hybrid Journal   (Followers: 9)
ACM Transactions on Social Computing     Hybrid Journal  
ACM Transactions on Spatial Algorithms and Systems (TSAS)     Hybrid Journal   (Followers: 1)
ACM Transactions on Speech and Language Processing (TSLP)     Hybrid Journal   (Followers: 11)
ACM Transactions on Storage     Hybrid Journal  
ACS Applied Materials & Interfaces     Hybrid Journal   (Followers: 39)
Acta Informatica Malaysia     Open Access  
Acta Universitatis Cibiniensis. Technical Series     Open Access   (Followers: 1)
Ad Hoc Networks     Hybrid Journal   (Followers: 12)
Adaptive Behavior     Hybrid Journal   (Followers: 8)
Additive Manufacturing Letters     Open Access   (Followers: 3)
Advanced Engineering Materials     Hybrid Journal   (Followers: 32)
Advanced Science Letters     Full-text available via subscription   (Followers: 9)
Advances in Adaptive Data Analysis     Hybrid Journal   (Followers: 9)
Advances in Artificial Intelligence     Open Access   (Followers: 31)
Advances in Catalysis     Full-text available via subscription   (Followers: 7)
Advances in Computational Mathematics     Hybrid Journal   (Followers: 20)
Advances in Computer Engineering     Open Access   (Followers: 13)
Advances in Computer Science : an International Journal     Open Access   (Followers: 18)
Advances in Computing     Open Access   (Followers: 3)
Advances in Data Analysis and Classification     Hybrid Journal   (Followers: 52)
Advances in Engineering Software     Hybrid Journal   (Followers: 26)
Advances in Geosciences (ADGEO)     Open Access   (Followers: 19)
Advances in Human-Computer Interaction     Open Access   (Followers: 19)
Advances in Image and Video Processing     Open Access   (Followers: 20)
Advances in Materials Science     Open Access   (Followers: 19)
Advances in Multimedia     Open Access   (Followers: 1)
Advances in Operations Research     Open Access   (Followers: 13)
Advances in Remote Sensing     Open Access   (Followers: 59)
Advances in Science and Research (ASR)     Open Access   (Followers: 8)
Advances in Technology Innovation     Open Access   (Followers: 5)
AEU - International Journal of Electronics and Communications     Hybrid Journal   (Followers: 8)
African Journal of Information and Communication     Open Access   (Followers: 6)
African Journal of Mathematics and Computer Science Research     Open Access   (Followers: 5)
AI EDAM     Hybrid Journal   (Followers: 2)
Air, Soil & Water Research     Open Access   (Followers: 6)
AIS Transactions on Human-Computer Interaction     Open Access   (Followers: 5)
Al-Qadisiyah Journal for Computer Science and Mathematics     Open Access   (Followers: 2)
AL-Rafidain Journal of Computer Sciences and Mathematics     Open Access   (Followers: 3)
Algebras and Representation Theory     Hybrid Journal  
Algorithms     Open Access   (Followers: 13)
American Journal of Computational and Applied Mathematics     Open Access   (Followers: 8)
American Journal of Computational Mathematics     Open Access   (Followers: 6)
American Journal of Information Systems     Open Access   (Followers: 4)
American Journal of Sensor Technology     Open Access   (Followers: 2)
Analog Integrated Circuits and Signal Processing     Hybrid Journal   (Followers: 15)
Animation Practice, Process & Production     Hybrid Journal   (Followers: 4)
Annals of Combinatorics     Hybrid Journal   (Followers: 3)
Annals of Data Science     Hybrid Journal   (Followers: 14)
Annals of Mathematics and Artificial Intelligence     Hybrid Journal   (Followers: 16)
Annals of Pure and Applied Logic     Open Access   (Followers: 4)
Annals of Software Engineering     Hybrid Journal   (Followers: 12)
Annual Reviews in Control     Hybrid Journal   (Followers: 7)
Anuario Americanista Europeo     Open Access  
Applicable Algebra in Engineering, Communication and Computing     Hybrid Journal   (Followers: 3)
Applied and Computational Harmonic Analysis     Full-text available via subscription  
Applied Artificial Intelligence: An International Journal     Hybrid Journal   (Followers: 17)
Applied Categorical Structures     Hybrid Journal   (Followers: 4)
Applied Clinical Informatics     Hybrid Journal   (Followers: 4)
Applied Computational Intelligence and Soft Computing     Open Access   (Followers: 16)
Applied Computer Systems     Open Access   (Followers: 6)
Applied Computing and Geosciences     Open Access   (Followers: 3)
Applied Mathematics and Computation     Hybrid Journal   (Followers: 31)
Applied Medical Informatics     Open Access   (Followers: 11)
Applied Numerical Mathematics     Hybrid Journal   (Followers: 4)
Applied Soft Computing     Hybrid Journal   (Followers: 13)
Applied Spatial Analysis and Policy     Hybrid Journal   (Followers: 5)
Applied System Innovation     Open Access   (Followers: 1)
Archive of Applied Mechanics     Hybrid Journal   (Followers: 4)
Archive of Numerical Software     Open Access  
Archives and Museum Informatics     Hybrid Journal   (Followers: 97)
Archives of Computational Methods in Engineering     Hybrid Journal   (Followers: 5)
arq: Architectural Research Quarterly     Hybrid Journal   (Followers: 7)
Array     Open Access   (Followers: 1)
Artifact : Journal of Design Practice     Open Access   (Followers: 8)
Artificial Life     Hybrid Journal   (Followers: 7)
Asian Journal of Computer Science and Information Technology     Open Access   (Followers: 3)
Asian Journal of Control     Hybrid Journal  
Asian Journal of Research in Computer Science     Open Access   (Followers: 4)
Assembly Automation     Hybrid Journal   (Followers: 2)
Automatic Control and Computer Sciences     Hybrid Journal   (Followers: 6)
Automatic Documentation and Mathematical Linguistics     Hybrid Journal   (Followers: 5)
Automatica     Hybrid Journal   (Followers: 13)
Automatika : Journal for Control, Measurement, Electronics, Computing and Communications     Open Access  
Automation in Construction     Hybrid Journal   (Followers: 8)
Balkan Journal of Electrical and Computer Engineering     Open Access  
Basin Research     Hybrid Journal   (Followers: 7)
Behaviour & Information Technology     Hybrid Journal   (Followers: 32)
BenchCouncil Transactions on Benchmarks, Standards, and Evaluations     Open Access   (Followers: 3)
Big Data and Cognitive Computing     Open Access   (Followers: 5)
Big Data Mining and Analytics     Open Access   (Followers: 10)
Biodiversity Information Science and Standards     Open Access   (Followers: 1)
Bioinformatics     Hybrid Journal   (Followers: 216)
Bioinformatics Advances : Journal of the International Society for Computational Biology     Open Access   (Followers: 1)
Biomedical Engineering     Hybrid Journal   (Followers: 11)
Biomedical Engineering and Computational Biology     Open Access   (Followers: 11)
Briefings in Bioinformatics     Hybrid Journal   (Followers: 43)
British Journal of Educational Technology     Hybrid Journal   (Followers: 93)
Bulletin of Taras Shevchenko National University of Kyiv. Series: Physics and Mathematics     Open Access  
c't Magazin fuer Computertechnik     Full-text available via subscription   (Followers: 1)
Cadernos do IME : Série Informática     Open Access  
CALCOLO     Hybrid Journal  
CALICO Journal     Full-text available via subscription   (Followers: 1)
Calphad     Hybrid Journal  
Canadian Journal of Electrical and Computer Engineering     Full-text available via subscription   (Followers: 14)
Catalysis in Industry     Hybrid Journal  
CCF Transactions on High Performance Computing     Hybrid Journal  
CCF Transactions on Pervasive Computing and Interaction     Hybrid Journal  
CEAS Space Journal     Hybrid Journal   (Followers: 6)
Cell Communication and Signaling     Open Access   (Followers: 3)
Central European Journal of Computer Science     Hybrid Journal   (Followers: 4)
CERN IdeaSquare Journal of Experimental Innovation     Open Access  
Chaos, Solitons & Fractals     Hybrid Journal   (Followers: 1)
Chaos, Solitons & Fractals : X     Open Access   (Followers: 1)
Chemometrics and Intelligent Laboratory Systems     Hybrid Journal   (Followers: 13)
ChemSusChem     Hybrid Journal   (Followers: 7)
China Communications     Full-text available via subscription   (Followers: 8)
Chinese Journal of Catalysis     Full-text available via subscription   (Followers: 2)
Chip     Full-text available via subscription   (Followers: 2)
Ciencia     Open Access  
CIN : Computers Informatics Nursing     Hybrid Journal   (Followers: 11)
Circuits and Systems     Open Access   (Followers: 16)
CLEI Electronic Journal     Open Access  
Clin-Alert     Hybrid Journal   (Followers: 1)
Clinical eHealth     Open Access  
Cluster Computing     Hybrid Journal   (Followers: 1)
Cognitive Computation     Hybrid Journal   (Followers: 2)
Cognitive Computation and Systems     Open Access  
COMBINATORICA     Hybrid Journal  
Combinatorics, Probability and Computing     Hybrid Journal   (Followers: 4)
Combustion Theory and Modelling     Hybrid Journal   (Followers: 18)
Communication Methods and Measures     Hybrid Journal   (Followers: 12)
Communication Theory     Hybrid Journal   (Followers: 29)
Communications in Algebra     Hybrid Journal   (Followers: 1)
Communications in Partial Differential Equations     Hybrid Journal   (Followers: 2)
Communications of the ACM     Full-text available via subscription   (Followers: 59)
Communications of the Association for Information Systems     Open Access   (Followers: 15)
Communications on Applied Mathematics and Computation     Hybrid Journal   (Followers: 1)
COMPEL: The International Journal for Computation and Mathematics in Electrical and Electronic Engineering     Hybrid Journal   (Followers: 4)
Complex & Intelligent Systems     Open Access   (Followers: 1)
Complex Adaptive Systems Modeling     Open Access  
Complex Analysis and Operator Theory     Hybrid Journal   (Followers: 2)
Complexity     Hybrid Journal   (Followers: 8)
Computación y Sistemas     Open Access  
Computation     Open Access   (Followers: 1)
Computational and Applied Mathematics     Hybrid Journal   (Followers: 3)
Computational and Mathematical Methods     Hybrid Journal  
Computational and Mathematical Methods in Medicine     Open Access   (Followers: 2)
Computational and Mathematical Organization Theory     Hybrid Journal   (Followers: 1)
Computational and Structural Biotechnology Journal     Open Access   (Followers: 1)
Computational and Theoretical Chemistry     Hybrid Journal   (Followers: 11)
Computational Astrophysics and Cosmology     Open Access   (Followers: 6)
Computational Biology and Chemistry     Hybrid Journal   (Followers: 13)
Computational Biology Journal     Open Access   (Followers: 6)
Computational Brain & Behavior     Hybrid Journal   (Followers: 1)
Computational Chemistry     Open Access   (Followers: 3)
Computational Communication Research     Open Access   (Followers: 1)
Computational Complexity     Hybrid Journal   (Followers: 5)
Computational Condensed Matter     Open Access   (Followers: 1)

Big Data and Cognitive Computing
Number of Followers: 5  

  This is an Open Access journal
ISSN (Online) 2504-2289
Published by MDPI  [84 journals]
  • BDCC, Vol. 6, Pages 33: RoBERTaEns: Deep Bidirectional Encoder Ensemble Model for Fact Verification

    • Authors: Muchammad Naseer, Jauzak Hussaini Windiatmaja, Muhamad Asvial, Riri Fitri Sari
      First page: 33
      Abstract: The bidirectional encoder model has been widely applied to detect fake news because of its ability to provide factual verification with good results. Good fact verification requires a well-optimized and well-evaluated model so that news readers can trust the verification results as reliable and accurate. In this study, we evaluated the application of a homogeneous ensemble (HE) on RoBERTa to improve the accuracy of a model. We improve the HE method using a bagging ensemble from three types of RoBERTa models. Then, each prediction is combined to build a new model called RoBERTaEns. The FEVER dataset is used to train and test our model. The experimental results showed that the proposed method, RoBERTaEns, obtained a higher accuracy value, with an F1-Score of 84.2%, compared to the other RoBERTa models. In addition, RoBERTaEns has a smaller margin of error compared to the other models. Thus, the results show that applying HE increases the accuracy of a model and produces better values in handling various types of fact input in each fold.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-22
      DOI: 10.3390/bdcc6020033
      Issue No: Vol. 6, No. 2 (2022)
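
The abstract above says the predictions of three RoBERTa models are combined, but it does not say how. As a minimal sketch, assuming simple majority voting (one common way to combine a homogeneous ensemble, not necessarily the authors' method), the combination step could look like this. The function name `majority_vote` and the example labels are illustrative only.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per example.

    predictions_per_model: list of equally long label lists, one per model.
    Ties are resolved in favour of the label that appears first among the
    votes (an arbitrary choice made for this sketch).
    """
    combined = []
    for votes in zip(*predictions_per_model):
        # most_common(1) yields the (label, count) pair with the highest count.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions from three ensemble members on three claims.
model_a = ["SUPPORTS", "REFUTES", "NOT ENOUGH INFO"]
model_b = ["SUPPORTS", "SUPPORTS", "NOT ENOUGH INFO"]
model_c = ["REFUTES", "REFUTES", "SUPPORTS"]
ensemble = majority_vote([model_a, model_b, model_c])
```

With an odd number of members and few classes, voting rarely ties; averaging class probabilities is an alternative when the models expose them.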
  • BDCC, Vol. 6, Pages 34: Startups and Consumer Purchase Behavior: Application of Support Vector Machine Algorithm

    • Authors: Pejman Ebrahimi, Aidin Salamzadeh, Maryam Soleimani, Seyed Mohammad Khansari, Hadi Zarea, Maria Fekete-Farkas
      First page: 34
      Abstract: This study evaluated the impact of startup technology innovations and customer relationship management (CRM) performance on customer participation, value co-creation, and consumer purchase behavior (CPB). This analytical study empirically tested the proposed hypotheses using structural equation modeling (SEM) and SmartPLS 3 techniques. Moreover, we used a support vector machine (SVM) algorithm to verify the model’s accuracy. The SVM algorithm uses four different kernels to check the accuracy criterion, and we checked all of them. This research used the convenience sampling approach in gathering the data. We used the conventional bias test method. A total of 466 respondents completed the survey. Technological innovations of startups and CRM have a positive and significant effect on customer participation. Customer participation significantly affects the value of pleasure, economic value, and relationship value. Based on the importance-performance map analysis (IPMA) matrix results, “customer participation” with a score of 0.782 had the highest importance. If customers increase their participation performance by one unit during the COVID-19 epidemic, their overall CPB increases by 0.782. In addition, our results showed that the lowest performance is related to the technological innovations of startups, which indicates an excellent opportunity for development in this area. SVM results showed that the polynomial kernel, to a high degree, is the best kernel that confirms the model’s accuracy.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-25
      DOI: 10.3390/bdcc6020034
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 35: Social Networks Marketing and Consumer Purchase Behavior: The Combination of SEM and Unsupervised Machine Learning

    • Authors: Pejman Ebrahimi, Marjan Basirat, Ali Yousefi, Md. Nekmahmud, Abbas Gholampour, Maria Fekete-Farkas
      First page: 35
      Abstract: The purpose of this paper is to reveal how social network marketing (SNM) can affect consumers’ purchase behavior (CPB). We used the combination of structural equation modeling (SEM) and unsupervised machine learning approaches as an innovative method. The statistical population of the study comprised users who live in Hungary and use Facebook Marketplace. This research uses the convenience sampling approach to overcome bias. Out of 475 surveys distributed, a total of 466 respondents successfully filled out the entire survey, a response rate of 98.1%. The results showed that all dimensions of social network marketing, such as entertainment, customization, interaction, WoM and trend, positively and significantly influenced consumer purchase behavior (CPB) in Facebook Marketplace. Furthermore, we used hierarchical clustering and K-means unsupervised algorithms to cluster consumers. The results show that respondents of this research can be clustered into nine different groups based on behavior regarding demographic attributes. This means that distinctive strategies can be used for different clusters. Meanwhile, marketing managers can provide different options, products and services for each group. This study is of high importance in that it has adopted and used the plspm and Matrixpls packages in R to show the model’s predictive power. Meanwhile, we used unsupervised machine learning algorithms to cluster consumer behaviors.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-25
      DOI: 10.3390/bdcc6020035
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 36: Illusion of Truth: Analysing and Classifying COVID-19 Fake News in Brazilian Portuguese Language

    • Authors: Patricia Takako Endo, Guto Leoni Santos, Maria Eduarda de Lima Xavier, Gleyson Rhuan Nascimento Campos, Luciana Conceição de Lima, Ivanovitch Silva, Antonia Egli, Theo Lynn
      First page: 36
      Abstract: Public health interventions to counter the COVID-19 pandemic have accelerated and increased digital adoption and use of the Internet for sourcing health information. Unfortunately, there is evidence to suggest that it has also accelerated and increased the spread of false information relating to COVID-19. The consequences of misinformation, disinformation and misinterpretation of health information can interfere with attempts to curb the virus, delay or result in failure to seek or continue legitimate medical treatment and adherence to vaccination, as well as interfere with sound public health policy and attempts to disseminate public health messages. While there is a significant body of literature, datasets and tools to support countermeasures against the spread of false information online in resource-rich languages such as English and Chinese, there are few such resources to support Portuguese, and Brazilian Portuguese specifically. In this study, we explore the use of machine learning and deep learning techniques to identify fake news in online communications in the Brazilian Portuguese language relating to the COVID-19 pandemic. We build a dataset of 11,382 items comprising data from January 2020 to February 2021. Exploratory data analysis suggests that fake news about the COVID-19 vaccine was prevalent in Brazil, much of it related to government communications. To mitigate the adverse impact of fake news, we analyse the impact of machine learning to detect fake news based on stop words in communications. The results suggest that keeping stop words within the message improves the performance of the models. Random Forest was the machine learning model with the best results, achieving a precision of 97.91%, while Bi-GRU was the best deep learning model with an F1 score of 94.03%.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-01
      DOI: 10.3390/bdcc6020036
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 37: Operations with Nested Named Sets as a Tool for Artificial Intelligence

    • Authors: Mark Burgin
      First page: 37
      Abstract: Knowledge and data representations are important for artificial intelligence (AI), as well as for intelligence in general. Intelligent functioning presupposes efficient operation with knowledge and data representations in particular. At the same time, it has been demonstrated that named sets, which are also called fundamental triads, instantiate the most fundamental structure in general and for knowledge and data representations in particular. In this context, named sets allow for effective mathematical portrayal of the key phenomenon, called nesting. Nesting plays a weighty role in a variety of fields, such as mathematics and computer science. Computing tools of AI include nested levels of parentheses in arithmetical expressions; different types of recursion; nesting of several levels of subroutines; nesting in recursive calls; multilevel nesting in information hiding; a variety of nested data structures, such as records, objects, and classes; and nested blocks of imperative source code, such as nested repeat-until clauses, while clauses, if clauses, etc. In this paper, different operations with nested named sets are constructed and their properties obtained, reflecting different attributes of nesting. An AI system receives information in the form of data and knowledge and, in processing this information, performs operations with these data and knowledge. Thus, such a system needs various operations for these processes. Operations constructed in this paper perform processing of data and knowledge in the form of nested named sets. Knowing the properties of these operations can help to optimize the processing of data and knowledge in AI systems.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-01
      DOI: 10.3390/bdcc6020037
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 38: Spark Configurations to Optimize Decision Tree Classification on UNSW-NB15

    • Authors: Sikha Bagui, Mary Walauskis, Robert DeRush, Huyen Praviset, Shaunda Boucugnani
      First page: 38
      Abstract: This paper looks at the impact of changing Spark’s configuration parameters on machine learning algorithms using a large dataset, the UNSW-NB15 dataset. The environmental conditions that will optimize the classification process are studied. To build smart intrusion detection systems, a deep understanding of the environmental parameters is necessary. Specifically, the focus is on the following environmental parameters: the executor memory, number of executors, number of cores per executor, execution time, as well as the impact on statistical measures. Hence, the objective was to optimize resource usage and minimize processing time for Decision Tree classification, using Spark. This shows whether additional resources will increase performance, lower processing time, and optimize computing resources. The UNSW-NB15 dataset, being a large dataset, provides enough data and complexity to see the changes in computing resource configurations in Spark. Principal Component Analysis was used for preprocessing the dataset. Results indicated that a lack of executors and cores results in wasted resources and long processing time. Excessive resource allocation did not improve processing time. Environmental tuning has a noticeable impact.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-07
      DOI: 10.3390/bdcc6020038
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 39: PCB Component Detection Using Computer Vision for Hardware Assurance

    • Authors: Wenwei Zhao, Suprith Reddy Gurudu, Shayan Taheri, Shajib Ghosh, Mukhil Azhagan Mallaiyan Sathiaseelan, Navid Asadizanjani
      First page: 39
      Abstract: Printed circuit board (PCB) assurance in the optical domain is a crucial field of study. Though there are many existing PCB assurance methods using image processing, computer vision (CV), and machine learning (ML), the PCB field is complex and increasingly evolving, so new techniques are required to overcome the emerging problems. Existing ML-based methods outperform traditional CV methods; however, they often require more data, have low explainability, and can be difficult to adapt when a new technology arises. To overcome these challenges, CV methods can be used in tandem with ML methods. In particular, human-interpretable CV algorithms such as those that extract color, shape, and texture features increase PCB assurance explainability. This allows for incorporation of prior knowledge, which effectively reduces the number of trainable ML parameters and, thus, the amount of data needed to achieve high accuracy when training or retraining an ML model. Hence, this study explores the benefits and limitations of a variety of common computer vision-based features for the task of PCB component detection. The study results indicate that color features demonstrate promising performance for PCB component detection. The purpose of this paper is to facilitate collaboration between the hardware assurance, computer vision, and machine learning communities.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-08
      DOI: 10.3390/bdcc6020039
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 40: Breast and Lung Anticancer Peptides Classification Using N-Grams and Ensemble Learning Techniques

    • Authors: Ayad Rodhan Abbas, Bashar Saadoon Mahdi, Osamah Younus Fadhil
      First page: 40
      Abstract: Anticancer peptides (ACPs) are short protein sequences; they perform functions like some hormones and enzymes inside the body. The role of any protein or peptide is related to its structure and the sequence of amino acids that make it up. There are 20 types of amino acids in humans, and each of them has a particular characteristic according to its chemical structure. Current machine and deep learning models have been used to address ACP classification problems. However, these models have neglected Amino Acid Repeats (AARs), which play an essential role in the function and structure of peptides. Therefore, this paper extracts AARs based on N-Grams and k-mers from two peptide datasets as a promising route toward identifying novel anticancer peptides. These datasets, which target breast and lung cancer cells, were assembled and curated manually from the Cancer Peptide and Protein Database (CancerPPD). Each dataset consists of peptide sequences together with their synthesis and anticancer activity on breast and lung cancer cell lines. Five different feature selection methods were used in this paper to improve classification performance and reduce the experimental costs. After that, ACPs were classified using four classifiers, namely AdaBoost, Random Forest Tree (RFT), Multi-class Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP). These classifiers were evaluated by applying five well-known evaluation metrics. Experimental results showed that the breast and lung ACPs classification process reached an accuracy of 89.25% and 92.56%, respectively. In terms of AUC, it reached 95.35% and 96.92% for breast and lung ACPs, respectively. The proposed classifiers performed comparably in AUC, accuracy, precision, F-measures, and recall, except for Multi-class SVM-based feature selection, which showed superior performance. As a result, this paper significantly improved the predictive performance that can effectively distinguish ACPs as virtual inactive, experimental inactive, moderately active, and very active.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-12
      DOI: 10.3390/bdcc6020040
      Issue No: Vol. 6, No. 2 (2022)
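
The abstract above mentions extracting features from peptide sequences with N-Grams and k-mers. As a minimal sketch of what overlapping k-mer counting means (the function name `kmer_counts` and the toy sequence are assumptions for illustration, not taken from the paper), each window of k consecutive residues is counted:

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count overlapping k-mers (substrings of length k) in a sequence.

    For a peptide, each character is one amino-acid letter, so a 2-mer is a
    pair of adjacent residues. Sequences shorter than k yield no k-mers.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Toy peptide (hypothetical); the repeated "AA" pair shows up with count 2.
counts = kmer_counts("AAKKAA", 2)
```

The resulting counts can be turned into a fixed-length feature vector by fixing an ordering over all possible k-mers, which is what makes them usable as classifier input.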
  • BDCC, Vol. 6, Pages 41: Revisiting Gradient Boosting-Based Approaches for Learning Imbalanced Data: A Case of Anomaly Detection on Power Grids

    • Authors: Maya Hilda Lestari Louk, Bayu Adhi Tama
      First page: 41
      Abstract: Gradient boosting ensembles have been used in the cyber-security area for many years; nonetheless, their efficacy and accuracy for intrusion detection systems (IDSs) remain questionable, particularly when dealing with problems involving imbalanced data. This article fills the void in the existing body of knowledge by evaluating the performance of gradient boosting-based ensembles, including gradient boosting machine (GBM), extreme gradient boosting (XGBoost), LightGBM, and CatBoost. This paper assesses their performance on various imbalanced data sets using the Matthews correlation coefficient (MCC), area under the receiver operating characteristic curve (AUC), and F1 metrics. The article discusses an example of anomaly detection in an industrial control network and, more specifically, threat detection in a cyber-physical smart power grid. The tests’ results indicate that CatBoost surpassed its competitors, regardless of the imbalance ratio of the data sets. Moreover, LightGBM showed a much lower performance value and had more variability across the data sets.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-16
      DOI: 10.3390/bdcc6020041
      Issue No: Vol. 6, No. 2 (2022)
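For readers unfamiliar with the Matthews correlation coefficient named in the abstract above, the sketch below computes it from binary confusion-matrix counts using the standard formula. The function name `mcc` and the example counts are illustrative assumptions, not taken from the paper; returning 0.0 when the denominator is zero is a common convention, assumed here.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed): an undefined MCC is reported as 0.0.
    if denominator == 0:
        return 0.0
    return numerator / denominator


# A classifier that labels everything positive on a 90/10 imbalanced set
# has 90% accuracy, but its MCC is 0.0.
always_positive = mcc(tp=90, tn=0, fp=10, fn=0)
```

The example hints at why MCC is often preferred for imbalanced data: plain accuracy rewards predicting the majority class, while MCC does not.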
  • BDCC, Vol. 6, Pages 42: An Emergency Event Detection Ensemble Model Based
           on Big Data

    • Authors: Khalid Alfalqi, Martine Bellaiche
      First page: 42
      Abstract: Emergency events arise when a serious, unexpected, and often dangerous threat affects normal life. Hence, knowing what is occurring during and after emergency events is critical to mitigate the effect of the incident on human life, on the environment and our infrastructures, as well as the inherent financial consequences. Social network utilization in emergency event detection models can play an important role as information is shared and users’ status is updated once an emergency event occurs. Besides, big data proved its significance as a tool to assist and alleviate emergency events by processing an enormous amount of data over a short time interval. This paper shows that it is necessary to have an appropriate emergency event detection ensemble model (EEDEM) to respond quickly once such unfortunate events occur. Furthermore, it integrates Snapchat maps to propose a novel method to pinpoint the exact location of an emergency event. Moreover, merging social networks and big data can accelerate the emergency event detection system: social network data, such as those from Twitter and Snapchat, allow us to manage, monitor, analyze and detect emergency events. The main objective of this paper is to propose a novel and efficient big data-based EEDEM to pinpoint the exact location of emergency events by employing the collected data from social networks, such as “Twitter” and “Snapchat”, while integrating big data (BD) and machine learning (ML). Furthermore, this paper evaluates the performance of five ML base models and the proposed ensemble approach to detect emergency events. Results show that the proposed ensemble approach achieved a very high accuracy of 99.87%, outperforming the base models. Moreover, the base models yield a high level of accuracy: 99.72% and 99.70% for LSTM and decision tree, respectively, with an acceptable training time.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-16
      DOI: 10.3390/bdcc6020042
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 43: New Efficient Approach to Solve Big Data Systems
           Using Parallel Gauss–Seidel Algorithms

    • Authors: Shih Yu Chang, Hsiao-Chun Wu, Yifan Wang
      First page: 43
      Abstract: In order to perform big-data analytics, regression involving large matrices is often necessary. In particular, large scale regression problems are encountered when one wishes to extract semantic patterns for knowledge discovery and data mining. When a large matrix can be processed in its factorized form, advantages arise in terms of computation, implementation, and data-compression. In this work, we propose two new parallel iterative algorithms as extensions of the Gauss–Seidel algorithm (GSA) to solve regression problems involving many variables. The convergence study in terms of error-bounds of the proposed iterative algorithms is also performed, and the required computation resources, namely time- and memory-complexities, are evaluated to benchmark the efficiency of the proposed new algorithms. Finally, the numerical results from both Monte Carlo simulations and real-world datasets are presented to demonstrate the striking effectiveness of our proposed new methods.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-19
      DOI: 10.3390/bdcc6020043
      Issue No: Vol. 6, No. 2 (2022)
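The abstract above builds on the Gauss–Seidel algorithm. As background only, here is a minimal sketch of the classical sequential Gauss–Seidel iteration for a linear system Ax = b. It is not the parallel extension the paper proposes; the function name, tolerance, and example system are assumptions made for illustration.

```python
def gauss_seidel(A, b, iterations=100, tol=1e-10):
    """Classical Gauss-Seidel iteration for A x = b (A given as a list of rows).

    Convergence is guaranteed for, e.g., strictly diagonally dominant A.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        max_change = 0.0
        for i in range(n):
            # Use the newest available values of x (updated in place).
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new_xi = (b[i] - s) / A[i][i]
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:
            break
    return x


# Small diagonally dominant example: 4x + y = 1, 2x + 3y = 2.
solution = gauss_seidel([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0])
```

Because each component update depends on components already updated in the same sweep, the classical method is inherently sequential, which is one reason parallel variants are an active topic.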
  • BDCC, Vol. 6, Pages 44: Deep Learning Approaches for Video Compression: A
           Bibliometric Analysis

    • Authors: Bidwe, Mishra, Patil, Shaw, Vora, Kotecha, Zope
      First page: 44
      Abstract: Every kind of data needs a physical drive to store it. There has been an explosion in the volume of images, video, and other similar data types circulated over the internet. Users using the internet expect intelligible data, even under the pressure of multiple resource constraints such as bandwidth bottleneck and noisy channels. Therefore, data compression is becoming a fundamental problem in wider engineering communities. There has been some related work on data compression using neural networks. Various machine learning approaches are currently applied in data compression techniques and tested to obtain better lossy and lossless compression results. A wide variety of efficient research is already available for image compression. However, this is not the case for video compression. Because of the explosion of big data and the excess use of cameras in various places globally, around 82% of the data generated involve videos. Proposed approaches have used Deep Neural Networks (DNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), and various variants of Autoencoders (AEs). All newly proposed methods aim to increase performance (reducing bitrate up to 50% at the same data quality and complexity). This paper presents a bibliometric analysis and literature survey of all Deep Learning (DL) methods used in video compression in recent years. Scopus and Web of Science are well-known research databases. The results retrieved from them are used for this analytical study. Two types of analysis are performed on the extracted documents. They include quantitative and qualitative results. In quantitative analysis, records are analyzed based on their citations, keywords, source of publication, and country of publication. The qualitative analysis provides information on DL-based approaches for video compression, as well as the advantages, disadvantages, and challenges of using them.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-19
      DOI: 10.3390/bdcc6020044
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 45: Virtual Reality-Based Stimuli for Immersive Car
           Clinics: A Performance Evaluation Model

    • Authors: Alexandre Costa Henriques, Thiago Barros Murari, Jennifer Callans, Alexandre Maguino Pinheiro Silva, Antonio Lopes Apolinario, Ingrid Winkler
      First page: 45
      Abstract: This study proposes a model to evaluate the performance of virtual reality-based stimuli for immersive car clinics. The model considered Attribute Importance, Stimuli Efficacy and Stimuli Cost factors and the method was divided into three stages: we defined the importance of fourteen attributes relevant to a car clinic based on the perceptions of Marketing and Design experts; then we defined the efficacy of five virtual stimuli based on the perceptions of Product Development and Virtual Reality experts; and we used a cost factor to calculate the efficiency of the five virtual stimuli in relation to the physical. The Marketing and Design experts identified a new attribute, Scope; eleven of the fifteen attributes were rated as Important or Very Important, while four were removed from the model due to being considered irrelevant. According to our performance evaluation model, virtual stimuli have the same efficacy as physical stimuli. However, when cost is considered, virtual stimuli outperform physical stimuli, particularly virtual stimuli with glasses. We conclude that virtual stimuli have the potential to reduce the cost and time required to develop new stimuli in car clinics, but with concerns related to hardware, software, and other definitions.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-20
      DOI: 10.3390/bdcc6020045
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 46: A Non-Uniform Continuous Cellular Automata for
           Analyzing and Predicting the Spreading Patterns of COVID-19

    • Authors: Puspa Eosina, Aniati Murni Arymurthy, Adila Alfa Krisnadhi
      First page: 46
      Abstract: During the COVID-19 outbreak, modeling the spread of infectious diseases became a challenging research topic due to its rapid spread and high mortality rate. The main objective of a standard epidemiological model is to estimate the number of infected, suspected, and recovered from the illness by mathematical modeling. This model does not capture how the disease transmits between neighboring regions through interaction. A more general framework such as Cellular Automata (CA) is required to accommodate a more complex spatial interaction within the epidemiological model. The critical issue of modeling in the spread of diseases is how to reduce the prediction error. This research aims to formulate the influence of the interaction of a neighborhood on the spreading pattern of COVID-19 using a neighborhood frame model in a Cellular-Automata (CA) approach and obtain a predictive model for the COVID-19 spread with reduced error. We propose a non-uniform continuous CA (N-CCA) as our contribution to demonstrate the influence of interactions on the spread of COVID-19. The model has succeeded in demonstrating the influence of the interaction between regions on the COVID-19 spread, as represented by the coefficients obtained. These coefficients result from multiple regression models. The coefficient obtained represents the population’s behavior interacting with its neighborhood in a cell and influences the number of cases that occur the next day. The evaluation of the N-CCA model is conducted by root mean square error (RMSE) for the difference in the number of cases between prediction and real cases per cell in each region. This study demonstrates that this approach improves prediction accuracy for 14 days into the future using data points from the past 42 days, compared to a baseline model.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-24
      DOI: 10.3390/bdcc6020046
      Issue No: Vol. 6, No. 2 (2022)
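The evaluation metric named in the abstract above, root mean square error (RMSE), can be sketched in a few lines. The function name and the sample case counts below are hypothetical and are not data from the paper.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical daily case counts for one cell: predicted vs. observed.
error = rmse([12.0, 15.0, 20.0], [10.0, 15.0, 24.0])
```

A lower RMSE means the predicted counts sit closer to the observed ones, so comparing RMSE against a baseline is one way to quantify an improvement in prediction.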
  • BDCC, Vol. 6, Pages 47: Incentive Mechanisms for Smart Grid: State of the
           Art, Challenges, Open Issues, Future Directions

    • Authors: Sweta Bhattacharya, Rajeswari Chengoden, Gautam Srivastava, Mamoun Alazab, Abdul Rehman Javed, Nancy Victor, Praveen Kumar Reddy Maddikunta, Thippa Reddy Gadekallu
      First page: 47
      Abstract: Smart grids (SG) are electricity grids that communicate with each other, provide reliable information, and enable administrators to operate energy supplies across the country, ensuring optimized reliability and efficiency. The smart grid contains sensors that measure and transmit data to adjust the flow of electricity automatically based on supply/demand, and thus, responding to problems becomes quicker and easier. This also plays a crucial role in controlling carbon emissions, by avoiding energy losses during peak load hours and ensuring optimal energy management. The scope of big data analytics in smart grids is huge, as they collect information from raw data and derive intelligent information from the same. However, these benefits of the smart grid are dependent on the active and voluntary participation of the consumers in real-time. Consumers need to be motivated and conscious to avail themselves of the achievable benefits. Incentivizing the appropriate actor is an absolute necessity to encourage prosumers to generate renewable energy sources (RES) and motivate industries to establish plants that support sustainable and green-energy-based processes or products. The current study emphasizes similar aspects and presents a comprehensive survey of the state-of-the-art contributions pertinent to incentive mechanisms in smart grids, which can be used to optimize the power distribution during peak times and also reduce carbon emissions. The various technologies, such as game theory, blockchain, and artificial intelligence, used in implementing incentive mechanisms in smart grids are discussed, followed by different incentive projects being implemented across the globe. The lessons learnt, challenges faced in such implementations, and open issues such as data quality, privacy, security, and pricing related to incentive mechanisms in SG are identified to guide the future scope of research in this sector.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-27
      DOI: 10.3390/bdcc6020047
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 48: A New Ontology-Based Method for Arabic Sentiment

    • Authors: Safaa M. Khabour, Qasem A. Al-Radaideh, Dheya Mustafa
      First page: 48
      Abstract: Arabic sentiment analysis is a process that aims to extract the subjective opinions of different users about different subjects since these opinions and sentiments are used to recognize their perspectives and judgments in a particular domain. Few research studies addressed semantic-oriented approaches for Arabic sentiment analysis based on domain ontologies and features’ importance. In this paper, we built a semantic orientation approach for calculating overall polarity from the Arabic subjective texts based on built domain ontology and the available sentiment lexicon. We used the ontology concepts to extract and weight the semantic domain features by considering their levels in the ontology tree and their frequencies in the dataset to compute the overall polarity of a given textual review based on the importance of each domain feature. For evaluation, an Arabic dataset from the hotels’ domain was selected to build the domain ontology and to test the proposed approach. The overall accuracy and f-measure reach 79.20% and 78.75%, respectively. Results showed that the approach outperformed the other semantic orientation approaches, and it is an appealing approach to be used for Arabic sentiment analysis.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-29
      DOI: 10.3390/bdcc6020048
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 49: A Comparative Study of MongoDB and Document-Based
           MySQL for Big Data Application Data Management

    • Authors: Cornelia A. Győrödi, Diana V. Dumşe-Burescu, Doina R. Zmaranda, Robert Ş. Győrödi
      First page: 49
      Abstract: In the context of the heavy demands of Big Data, software developers have also begun to consider NoSQL data storage solutions. One of the important criteria when choosing a NoSQL database for an application is its performance in terms of speed of data accessing and processing, including response times to the most important CRUD operations (CREATE, READ, UPDATE, DELETE). In this paper, the behavior of two of the major document-based NoSQL databases, MongoDB and document-based MySQL, was analyzed in terms of the complexity and performance of CRUD operations, especially in query operations. The main objective of the paper is to make a comparative analysis of the impact that each specific database has on application performance when realizing CRUD requests. To perform this analysis, a case-study application was developed using the two document-based MongoDB and MySQL databases, which aim to model and streamline the activity of service providers that use a lot of data. The results obtained demonstrate the performance of both databases for different volumes of data; based on these, a detailed analysis and several conclusions were presented to support a decision for choosing an appropriate solution that could be used in a big-data application.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-05
      DOI: 10.3390/bdcc6020049
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 50: Gender Stereotypes in Hollywood Movies and Their
           Evolution over Time: Insights from Network Analysis

    • Authors: Arjun M. Kumar, Jasmine Y. Q. Goh, Tiffany H. H. Tan, Cynthia S. Q. Siew
      First page: 50
      Abstract: The present analysis of more than 180,000 sentences from movie plots across the period from 1940 to 2019 emphasizes how gender stereotypes are expressed through the cultural products of society. By applying a network analysis to the word co-occurrence networks of movie plots and using a novel method of identifying story tropes, we demonstrate that gender stereotypes exist in Hollywood movies. An analysis of specific paths in the network and the words reflecting various domains show the dynamic changes in some of these stereotypical associations. Our results suggest that gender stereotypes are complex and dynamic in nature. Specifically, whereas male characters appear to be associated with a diversity of themes in movies, female characters seem predominantly associated with the theme of romance. Although associations of female characters to physical beauty and marriage are declining over time, associations of female characters to sexual relationships and weddings are increasing. Our results demonstrate how the application of cognitive network science methods can enable a more nuanced investigation of gender stereotypes in textual data.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-06
      DOI: 10.3390/bdcc6020050
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 51: Robust Multi-Mode Synchronization of Chaotic
           Fractional Order Systems in the Presence of Disturbance, Time Delay and
           Uncertainty with Application in Secure Communications

    • Authors: Ali Akbar Kekha Javan, Assef Zare, Roohallah Alizadehsani, Saeed Balochian
      First page: 51
      Abstract: This paper investigates the robust adaptive synchronization of multi-mode fractional-order chaotic systems (MMFOCS). To that end, synchronization was performed with unknown parameters, unknown time delays, the presence of disturbance, and uncertainty with the unknown boundary. The convergence of the synchronization error to zero was guaranteed using the Lyapunov function. Additionally, the control rules were extracted as explicit continuous functions. An image encryption approach was proposed based on maps with time-dependent coding for secure communication. The simulations indicated the effectiveness of the proposed design regarding the suitability of the parameters, the convergence of errors, and robustness. Subsequently, the presented method was applied to fractional-order Chen systems and was encrypted using the chaotic masking of different benchmark images. The results indicated the desirable performance of the proposed method in encrypting the benchmark images.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-08
      DOI: 10.3390/bdcc6020051
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 52: Cognitive Networks Extract Insights on COVID-19
           Vaccines from English and Italian Popular Tweets: Anticipation, Logistics,
           Conspiracy and Loss of Trust

    • Authors: Massimo Stella, Michael S. Vitevitch, Federico Botta
      First page: 52
      Abstract: Monitoring social discourse about COVID-19 vaccines is key to understanding how large populations perceive vaccination campaigns. This work reconstructs how popular and trending posts framed semantically and emotionally COVID-19 vaccines on Twitter. We achieve this by merging natural language processing, cognitive network science and AI-based image analysis. We focus on 4765 unique popular tweets in English or Italian about COVID-19 vaccines between December 2020 and March 2021. One popular English tweet contained in our data set was liked around 495,000 times, highlighting how popular tweets could cognitively affect large parts of the population. We investigate both text and multimedia content in tweets and build a cognitive network of syntactic/semantic associations in messages, including emotional cues and pictures. This network representation indicates how online users linked ideas in social discourse and framed vaccines along specific semantic/emotional content. The English semantic frame of “vaccine” was highly polarised between trust/anticipation (towards the vaccine as a scientific asset saving lives) and anger/sadness (mentioning critical issues with dose administering). Semantic associations with “vaccine,” “hoax” and conspiratorial jargon indicated the persistence of conspiracy theories and vaccines in extremely popular English posts. Interestingly, these were absent in Italian messages. Popular tweets with images of people wearing face masks used language that lacked the trust and joy found in tweets showing people with no masks. This difference indicates a negative effect attributed to face-covering in social discourse. Behavioural analysis revealed a tendency for users to share content eliciting joy, sadness and disgust and to like sad messages less. Both patterns indicate an interplay between emotions and content diffusion beyond sentiment. After its suspension in mid-March 2021, “AstraZeneca” was associated with trustful language driven by experts. After the deaths of a small number of vaccinated people in mid-March, popular Italian tweets framed “vaccine” by crucially replacing earlier levels of trust with deep sadness. Our results stress how cognitive networks and innovative multimedia processing open new ways for reconstructing online perceptions about vaccines and trust.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-12
      DOI: 10.3390/bdcc6020052
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 53: Knowledge Modelling and Learning through Cognitive

    • Authors: Massimo Stella, Yoed N. Kenett
      First page: 53
      Abstract: Knowledge modelling is a growing field at the fringe of computer science, psychology and network science [...]
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-13
      DOI: 10.3390/bdcc6020053
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 54: A New Comparative Study of Dimensionality
           Reduction Methods in Large-Scale Image Retrieval

    • Authors: Mohammed Amin Belarbi, Saïd Mahmoudi, Ghalem Belalem, Sidi Ahmed Mahmoudi, Aurélie Cools
      First page: 54
      Abstract: Indexing images by content is one of the most used computer vision methods, where various techniques are used to extract visual characteristics from images. The deluge of data surrounding us, due to the high use of social and diverse media acquisition systems, has created a major challenge for classical multimedia processing systems. This problem is referred to as the ‘curse of dimensionality’. In the literature, several methods have been used to decrease the high dimension of features, including principal component analysis (PCA) and locality sensitive hashing (LSH). Some methods, such as VA-File or binary tree, can be used to accelerate the search phase. In this paper, we propose an efficient approach that exploits three particular methods, those being PCA and LSH for dimensionality reduction, and the VA-File method to accelerate the search phase. This combined approach is fast and can be used for high dimensionality features. Indeed, our method consists of three phases: (1) image indexing within SIFT and SURF algorithms, (2) compressing the data using LSH and PCA, and (3) finally launching the image retrieval process, which is accelerated by using a VA-File approach.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-13
      DOI: 10.3390/bdcc6020054
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 55: Virtual Reality Adaptation Using Electrodermal
           Activity to Support the User Experience

    • Authors: Francesco Chiossi, Robin Welsch, Steeven Villa, Lewis Chuang, Sven Mayer
      First page: 55
      Abstract: Virtual reality is increasingly used for tasks such as work and education. Thus, rendering scenarios that do not interfere with such goals and deplete user experience are becoming progressively more relevant. We present a physiologically adaptive system that optimizes the virtual environment based on physiological arousal, i.e., electrodermal activity. We investigated the usability of the adaptive system in a simulated social virtual reality scenario. Participants completed an n-back task (primary) and a visual detection (secondary) task. Here, we adapted the visual complexity of the secondary task in the form of the number of non-player characters of the secondary task to accomplish the primary task. We show that an adaptive virtual reality can improve users’ comfort by adapting to physiological arousal regarding the task complexity. Our findings suggest that physiologically adaptive virtual reality systems can improve users’ experience in a wide range of scenarios.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-13
      DOI: 10.3390/bdcc6020055
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 56: A Better Mechanistic Understanding of Big Data
           through an Order Search Using Causal Bayesian Networks

    • Authors: Changwon Yoo, Efrain Gonzalez, Zhenghua Gong, Deodutta Roy
      First page: 56
      Abstract: Every year, biomedical data is increasing at an alarming rate and is being collected from many different sources, such as hospitals (clinical Big Data), laboratories (genomic and proteomic Big Data), and the internet (online Big Data). This article presents and evaluates a practical causal discovery algorithm that uses modern statistical, machine learning, and informatics approaches that have been used in the learning of causal relationships from biomedical Big Data, which in turn integrates clinical, omics (genomic and proteomic), and environmental aspects. The learning of causal relationships from data using graphical models does not address the hidden (unknown or not measured) mechanisms that are inherent to most measurements and analyses. Also, many algorithms lack a practical usage since they do not incorporate current mechanistic knowledge. This paper proposes a practical causal discovery algorithm using causal Bayesian networks to gain a better understanding of the underlying mechanistic process that generated the data. The algorithm utilizes model averaging techniques such as searching through a relative order (e.g., if gene A is regulating gene B, then we can say that gene A is of a higher order than gene B) and incorporates relevant prior mechanistic knowledge to guide the Markov chain Monte Carlo search through the order. The algorithm was evaluated by testing its performance on datasets generated from the ALARM causal Bayesian network. Out of the 37 variables in the ALARM causal Bayesian network, two sets of nine were chosen and the observations for those variables were provided to the algorithm. The performance of the algorithm was evaluated by comparing its prediction with the generating causal mechanism. The 28 variables that were not in use are referred to as hidden variables and they allowed for the evaluation of the algorithm’s ability to predict hidden confounded causal relationships. The algorithm’s predicted performance was also compared with other causal discovery algorithms. The results show that incorporating order information provides a better mechanistic understanding even when hidden confounded causes are present. The prior mechanistic knowledge incorporated in the Markov chain Monte Carlo search led to the better discovery of causal relationships when hidden variables were involved in generating the simulated data.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-17
      DOI: 10.3390/bdcc6020056
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 57: Sentiment Analysis of Emirati Dialects

    • Authors: Al Shamsi, Abdallah
      First page: 57
      Abstract: Recently, extensive studies and research in the Arabic Natural Language Processing (ANLP) field have been conducted for text classification and sentiment analysis. Moreover, the number of studies that target Arabic dialects has also increased. In this research paper, we constructed the first manually annotated dataset of the Emirati dialect for the Instagram platform. The constructed dataset consisted of more than 70,000 comments, mostly written in the Emirati dialect. We annotated the comments in the dataset based on text polarity, dividing them into positive, negative, and neutral categories, and the number of annotated comments was 70,000. Moreover, the dataset was also annotated for the dialect type, categorized into the Emirati dialect, Arabic dialects, and MSA. Preprocessing and TF-IDF features extraction approaches were applied to the constructed Emirati dataset to prepare the dataset for the sentiment analysis experiment and improve its classification performance. The sentiment analysis experiment was carried out on both balanced and unbalanced datasets using several machine learning classifiers. The evaluation metrics of the sentiment analysis experiments were accuracy, recall, precision, and f-measure. The results reported that the best accuracy result was 80.80%, and it was achieved when the ensemble model was applied for the sentiment classification of the unbalanced dataset.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-17
      DOI: 10.3390/bdcc6020057
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 58: COVID-19 Tweets Classification Based on a Hybrid
           Word Embedding Method

    • Authors: Yosra Didi, Ahlam Walha, Ali Wali
      First page: 58
      Abstract: In March 2020, the World Health Organisation declared that COVID-19 was a new pandemic. This deadly virus spread and affected many countries in the world. During the outbreak, social media platforms such as Twitter contributed valuable and massive amounts of data to better assess health-related decision making. Therefore, we propose that users’ sentiments could be analysed with the application of effective supervised machine learning approaches to predict disease prevalence and provide early warnings. The collected tweets were preprocessed and categorised as negative, positive, or neutral. In the second phase, different features were extracted from the posts by applying several widely used techniques, such as TF-IDF, Word2Vec, GloVe, and FastText. The novelty of this study is based on hybrid feature extraction, where we combined syntactic features (TF-IDF) with semantic features (FastText and GloVe) to represent posts accurately, which helps in improving the classification process. Experimental results show that FastText combined with TF-IDF performed better with SVM than the other models. SVM outperformed the other models with an accuracy score of 88.72%, followed by XGBoost with 85.29%. This study shows that the hybrid methods proved their capability of extracting features from the tweets and increasing the performance of classification.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-18
      DOI: 10.3390/bdcc6020058
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 59: The Predictive Power of a Twitter User’s
           Profile on Cryptocurrency Popularity

    • Authors: Maria Trigka, Andreas Kanavos, Elias Dritsas, Gerasimos Vonitsanos, Phivos Mylonas
      First page: 59
      Abstract: Microblogging has become an extremely popular communication tool among Internet users worldwide. Millions of users daily share a huge amount of information related to various aspects of their lives, which makes the respective sites a very important source of data for analysis. Bitcoin (BTC) is a decentralized cryptographic currency and is equivalent to most recurrently known currencies in the way that it is influenced by socially developed conclusions, regardless of whether those conclusions are considered valid. This work aims to assess the importance of Twitter users’ profiles in predicting a cryptocurrency’s popularity. More specifically, our analysis focused on the user influence, captured by different Twitter features (such as the number of followers, retweets, lists) and tweet sentiment scores as the main components of measuring popularity. Moreover, the Spearman, Pearson, and Kendall Correlation Coefficients are applied as post-hoc procedures to support hypotheses about the correlation between a user influence and the aforementioned features. Tweets sentiment scoring (as positive or negative) was performed with the aid of Valence Aware Dictionary and Sentiment Reasoner (VADER) for a number of tweets fetched within a concrete time period. Finally, the Granger causality test was employed to evaluate the statistical significance of various features time series in popularity prediction to identify the most influential variable for predicting future values of the cryptocurrency popularity.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-20
      DOI: 10.3390/bdcc6020059
      Issue No: Vol. 6, No. 2 (2022)
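Two of the correlation coefficients named in the abstract above, Pearson and Spearman, can be sketched in a few lines. This is a minimal illustration, assuming paired numeric observations; the `followers` and `sentiment` values are invented, and the Kendall coefficient and the Granger causality test mentioned in the abstract are not covered here.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

# Hypothetical per-user values, for illustration only.
followers = [120, 450, 3000, 80, 15000]
sentiment = [0.1, 0.3, 0.2, -0.4, 0.6]
rho = spearman(followers, sentiment)
```

Spearman is often the more robust choice for heavy-tailed quantities such as follower counts, since it depends only on the ordering of the values.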
  • BDCC, Vol. 6, Pages 60: Earthquake Insurance in California, USA: What Does
           Community-Generated Big Data Reveal to Us?

    • Authors: Fabrizio Terenzio Gizzi, Maria Rosaria Potenza
      First page: 60
      Abstract: California has a high seismic hazard, as many historical and recent earthquakes remind us. To deal with potential future damaging earthquakes, a voluntary insurance system for residential properties is in force in the state. However, the insurance penetration rate is quite low. Bearing this in mind, the aim of this article is to ascertain whether Big Data can provide policymakers and stakeholders with useful information in view of future action plans on earthquake coverage. Therefore, we extracted and analyzed the online search interest in earthquake insurance over time (2004–2021) through Google Trends (GT), a website that explores the popularity of top search queries in Google Search across various regions and languages. We found that (1) the triggering of online searches stems primarily from the occurrence of earthquakes in California and neighboring areas as well as oversea regions, thus suggesting that the interest of users was guided by both direct and vicarious earthquake experiences. However, other natural hazards also come to people’s notice; (2) the length of the higher level of online attention spans from one day to one week, depending on the magnitude of the earthquakes, the place where they occur, the temporal proximity of other natural hazards, and so on; (3) users interested in earthquake insurance are also attentive to knowing the features of the policies, among which are first the price of coverage, and then their worth and practical benefits; (4) online interest in the time span analyzed fits fairly well with the real insurance policy underwritings recorded over the years. Based on the research outcomes, we can propose the establishment of an observatory to monitor the online behavior that is suitable for supporting well-timed and geographically targeted information and communication action plans.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-20
      DOI: 10.3390/bdcc6020060
      Issue No: Vol. 6, No. 2 (2022)
  • BDCC, Vol. 6, Pages 3: Analyzing Political Polarization on Social Media by
           Deleting Bot Spamming

    • Authors: Riccardo Cantini, Fabrizio Marozzo, Domenico Talia, Paolo Trunfio
      First page: 3
      Abstract: Social media platforms are part of everyday life, allowing the interconnection of people around the world in large discussion groups relating to every topic, including important social or political issues. Therefore, social media have become a valuable source of information-rich data, commonly referred to as Social Big Data, effectively exploitable to study the behavior of people, their opinions, moods, interests and activities. However, these powerful communication platforms can also be used to manipulate conversation, polluting online content and altering the popularity of users, through spamming activities and misinformation spreading. Recent studies have shown the use on social media of automatic entities, defined as social bots, that appear as legitimate users by imitating human behavior, with the aim of influencing discussions of any kind, including political issues. In this paper we present a new methodology, namely TIMBRE (Time-aware opInion Mining via Bot REmoval), aimed at discovering the polarity of social media users during election campaigns characterized by the rivalry of political factions. This methodology is temporally aware and relies on a keyword-based classification of posts and users. Moreover, it recognizes and filters out data produced by social media bots, which aim to alter public opinion about political candidates, thus avoiding heavily biased information. The proposed methodology has been applied to a case study that analyzes the polarization of a large number of Twitter users during the 2016 US presidential election. The achieved results show the benefits brought by both removing bots and taking into account temporal aspects in the forecasting process, revealing the high accuracy and effectiveness of the proposed approach. Finally, we investigated how the presence of social bots may affect political discussion by studying the 2016 US presidential election. Specifically, we analyzed the main differences between human and artificial political support, estimating also the influence of social bots on legitimate users.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-04
      DOI: 10.3390/bdcc6010003
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 4: Analyzing COVID-19 Medical Papers Using Artificial
           Intelligence: Insights for Researchers and Medical Professionals

    • Authors: Dmitry Soshnikov, Tatiana Petrova, Vickie Soshnikova, Andrey Grunin
      First page: 4
      Abstract: Since the beginning of the COVID-19 pandemic almost two years ago, there have been more than 700,000 scientific papers published on the subject. An individual researcher cannot possibly get acquainted with such a huge text corpus and, therefore, some help from artificial intelligence (AI) is highly needed. We propose an AI-based tool to help researchers navigate medical paper collections in a meaningful way and extract knowledge from scientific COVID-19 papers. The main idea of our approach is to get as much semi-structured information from the text corpus as possible, using named entity recognition (NER) with a model called PubMedBERT and the Text Analytics for Health service, then store the data in a NoSQL database for further fast processing and insights generation. Additionally, the contexts in which the entities were used (neutral or negative) are determined. Application of NLP and text-based emotion detection (TBED) methods to the COVID-19 text corpus allows us to gain insights on important issues of diagnosis and treatment (such as changes in medical treatment over time, joint treatment strategies using several medications, and the connection between signs and symptoms of coronavirus).
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-05
      DOI: 10.3390/bdcc6010004
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 5: A Hierarchical Hadoop Framework to Process
           Geo-Distributed Big Data

    • Authors: Giuseppe Di Modica, Orazio Tomarchio
      First page: 5
      Abstract: In the past twenty years, we have witnessed an unprecedented production of data worldwide that has generated a growing demand for computing resources and has stimulated the design of computing paradigms and software tools to efficiently and quickly obtain insights on such Big Data. State-of-the-art parallel computing techniques such as MapReduce guarantee high performance in scenarios where the involved computing nodes are equally sized and clustered via broadband network links, and the data are co-located with the cluster of nodes. Unfortunately, these techniques have proven ineffective in geographically distributed scenarios, i.e., computing contexts where nodes and data are geographically distributed across multiple distant data centers. In the literature, researchers have proposed variants of the MapReduce paradigm that are aware of the constraints imposed in those scenarios (such as the imbalance of the nodes’ computing power and of the interconnecting links) to enforce smart task scheduling strategies. We have designed a hierarchical computing framework in which a context-aware scheduler orchestrates computing tasks that leverage the potential of the vanilla Hadoop framework within each data center taking part in the computation. In this work, after presenting the features of the developed framework, we advocate the opportunity of fragmenting the data in a smart way so that the scheduler produces a fairer distribution of the workload among the computing tasks. To prove the concept, we implemented a software prototype of the framework and ran several experiments on a small-scale testbed. Test results are discussed in the last part of the paper.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-06
      DOI: 10.3390/bdcc6010005
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 6: On Developing Generic Models for Predicting Student
           Outcomes in Educational Data Mining

    • Authors: Gomathy Ramaswami, Teo Susnjak, Anuradha Mathrani
      First page: 6
      Abstract: Poor academic performance of students is a concern in the educational sector, especially if it leads to students being unable to meet minimum course requirements. However, with timely prediction of students’ performance, educators can detect at-risk students, thereby enabling early interventions for supporting these students in overcoming their learning difficulties. However, the majority of studies have taken the approach of developing individual models that target a single course while developing prediction models. These models are tailored to specific attributes of each course amongst a very diverse set of possibilities. While this approach can yield accurate models in some instances, this strategy is associated with limitations. In many cases, overfitting can take place when course data is small or when new courses are devised. Additionally, maintaining a large suite of models per course is a significant overhead. This issue can be tackled by developing a generic and course-agnostic predictive model that captures more abstract patterns and is able to operate across all courses, irrespective of their differences. This study demonstrates how a generic predictive model can be developed that identifies at-risk students across a wide variety of courses. Experiments were conducted using a range of algorithms, with the generic model producing an effective accuracy. The findings showed that the CatBoost algorithm performed the best on our dataset across the F-measure, ROC (receiver operating characteristic) curve and AUC scores; therefore, it is an excellent candidate algorithm for providing solutions on this domain given its capabilities to seamlessly handle categorical and missing data, which is frequently a feature in educational datasets.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-07
      DOI: 10.3390/bdcc6010006
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 7: Infusing Autopoietic and Cognitive Behaviors into

    • Authors: Rao Mikkilineni
      First page: 7
      Abstract: All living beings use autopoiesis and cognition to manage their “life” processes from birth through death. Autopoiesis enables them to use the specification in their genomes to instantiate themselves using matter and energy transformations. They reproduce, replicate, and manage their stability. Cognition allows them to process information into knowledge and use it to manage the interactions between the various constituent parts within the system and the system’s interaction with the environment. Currently, various attempts are underway to make modern computers mimic the resilience and intelligence of living beings using symbolic and sub-symbolic computing. We discuss here the limitations of classical computer science for implementing autopoietic and cognitive behaviors in digital machines. We propose a new architecture applying the general theory of information (GTI) and pave the path to make digital automata mimic living organisms by exhibiting autopoiesis and cognitive behaviors. The new science, based on GTI, asserts that information is a fundamental constituent of the physical world and that living beings convert information into knowledge using physical structures that use matter and energy. Our proposal uses the tools derived from GTI to provide a common knowledge representation from existing symbolic and sub-symbolic computing structures to implement autopoiesis and cognitive behaviors.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-10
      DOI: 10.3390/bdcc6010007
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 8: An Empirical Comparison of Portuguese and
           Multilingual BERT Models for Auto-Classification of NCM Codes in
           International Trade

    • Authors: Roberta Rodrigues de Lima, Anita M. R. Fernandes, James Roberto Bombasar, Bruno Alves da Silva, Paul Crocker, Valderi Reis Quietinho Leithardt
      First page: 8
      Abstract: Classification problems are common in many different domains, and supervised learning algorithms have shown great promise in these areas. The classification of goods in international trade in Brazil represents a real challenge due to the complexity involved in assigning the correct category codes to a good, especially considering the tax penalties and legal implications of a misclassification. This work focuses on the training process of a classifier based on bidirectional encoder representations from transformers (BERT) for tax classification of goods with MCN codes, which are the official classification system for import and export products in Brazil. In particular, this article presents results from using a specific Portuguese-language-pretrained BERT model, as well as results from using a multilingual-pretrained BERT model. Experimental results show that the Portuguese model had a slightly better performance than the multilingual model, achieving an MCC of 0.8491, and confirm that the classifiers could be used to improve specialists’ performance in the classification of goods.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-10
      DOI: 10.3390/bdcc6010008
      Issue No: Vol. 6, No. 1 (2022)
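Assuming that MCC in the abstract above denotes the Matthews correlation coefficient, the binary form of the metric can be sketched as follows. The paper's task is multi-class, so this two-class version is a simplification for illustration, and the confusion-matrix counts in the example are invented.

```python
import math

def mcc(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts, for illustration only.
score = mcc(tp=90, tn=85, fp=15, fn=10)
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it stays informative on imbalanced classes, where plain accuracy can be misleading.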
  • BDCC, Vol. 6, Pages 9: An Efficient Multi-Scale Anchor Box Approach to
           Detect Partial Faces from a Video Sequence

    • Authors: Dweepna Garg, Priyanka Jain, Ketan Kotecha, Parth Goel, Vijayakumar Varadarajan
      First page: 9
      Abstract: In recent years, face detection has received considerable attention in the field of computer vision using traditional machine learning techniques and deep learning techniques. Deep learning is used to build the most recent and powerful face detection algorithms. However, partial face detection has yet to achieve remarkable performance. Partial faces are occluded due to hair, hat, glasses, hands, mobile phones, and side-angle-captured images. Fewer facial features can be identified from such images. In this paper, we present a deep convolutional neural network face detection method using the anchor boxes selection strategy. We limited the number of anchor boxes and scales and chose only those relevant to the face shape. The proposed model was trained and tested on a popular and challenging face detection benchmark dataset, i.e., Face Detection Dataset and Benchmark (FDDB), and can also detect partially covered faces with better accuracy and precision. Extensive experiments were performed, with evaluation metrics including accuracy, precision, recall, F1 score, inference time, and FPS. The results show that the proposed model is able to detect the face in the image, including occluded features, more precisely than other state-of-the-art approaches, achieving 94.8% accuracy and 98.7% precision on the FDDB dataset at 21 frames per second (FPS).
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-11
      DOI: 10.3390/bdcc6010009
      Issue No: Vol. 6, No. 1 (2022)
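The anchor-box idea in the abstract above (a small set of scales and aspect ratios chosen to fit face shapes) can be sketched generically as follows. This is not the authors' configuration: the function names, the scales, and the width/height ratios are hypothetical, and intersection over union (IoU) is included only as the usual way of matching anchors to ground-truth boxes.

```python
import math

def make_anchors(cx, cy, scales, ratios):
    """Return (x1, y1, x2, y2) boxes centred at (cx, cy).

    Each scale is the square root of the box area; each ratio is width/height.
    """
    boxes = []
    for s in scales:
        for r in ratios:
            w = s * math.sqrt(r)
            h = s / math.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

# Hypothetical settings: two scales and two roughly face-shaped ratios.
anchors = make_anchors(50, 50, scales=[32, 64], ratios=[0.75, 1.0])
```

Restricting `scales` and `ratios` keeps the number of candidate boxes per location small, which is one plausible reason a limited anchor set helps inference speed.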
  • BDCC, Vol. 6, Pages 10: Extraction of the Relations among Significant
           Pharmacological Entities in Russian-Language Reviews of Internet Users on

    • Authors: Alexander Sboev, Anton Selivanov, Ivan Moloshnikov, Roman Rybka, Artem Gryaznov, Sanna Sboeva, Gleb Rylkov
      First page: 10
      Abstract: Nowadays, the analysis of digital media aimed at predicting society’s reaction to particular events and processes is a task of great significance. Internet sources contain a large amount of meaningful information for a set of domains, such as marketing, author profiling, social situation analysis, healthcare, etc. In the case of healthcare, this information is useful for pharmacovigilance purposes, including re-profiling of medications. The analysis of the mentioned sources requires the development of automatic natural language processing methods. These methods, in turn, require text datasets with complex annotation including information about named entities and relations between them. As the relevant literature analysis shows, there is a scarcity of datasets in the Russian language with annotated entity relations, and none have existed so far in the medical domain. This paper presents the first Russian-language textual corpus where entities have labels of different contexts within a single text, so that related entities share a common context. Therefore, this corpus is suitable for tasks belonging to the medical domain. Our second contribution is a method for the automated extraction of entity relations in Russian-language texts using the XLM-RoBERTa language model preliminarily trained on Russian drug review texts. A comparison with other machine learning methods is performed to estimate the efficiency of the proposed method. The method yields state-of-the-art accuracy of extracting the following relationship types: ADR–Drugname, Drugname–Diseasename, Drugname–SourceInfoDrug, Diseasename–Indication. As shown on the presented subcorpus from the Russian Drug Review Corpus, the method developed achieves a mean F1-score of 80.4% (estimated with cross-validation, averaged over the four relationship types). This result is 3.6% higher compared to the existing language model RuBERT, and 21.77% higher compared to basic ML classifiers.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-17
      DOI: 10.3390/bdcc6010010
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 11: Context-Aware Explainable Recommendation Based on
           Domain Knowledge Graph

    • Authors: Muzamil Hussain Syed, Tran Quoc Bao Huy, Sun-Tae Chung
      First page: 11
      Abstract: With the rapid growth of internet data, knowledge graphs (KGs) are considered an efficient form of knowledge representation that captures the semantics of web objects. In recent years, reasoning over KGs for various artificial intelligence tasks has received a great deal of research interest. Providing recommendations based on users’ natural language queries is an equally difficult undertaking. In this paper, we propose a novel, context-aware recommender system, based on a domain KG, to respond to user-defined natural queries. The proposed recommender system consists of three stages. First, we generate incomplete triples from user queries, which are then segmented using logical conjunction (∧) and disjunction (∨) operations. Then, we generate candidates by utilizing a KGE-based framework (Query2Box) for reasoning over segmented logical triples, with ∧, ∨, and ∃ operators; finally, the generated candidates are re-ranked using a neural collaborative filtering (NCF) model by exploiting contextual (auxiliary) information from GraphSAGE embedding. Our approach proves to be simple, yet efficient, at providing explainable recommendations on users’ queries, while leveraging user-item contextual information. Furthermore, our framework has been shown to be capable of handling logically complex queries by transforming them into a disjunctive normal form (DNF) of simple queries. In this work, we focus on the restaurant domain as an application domain and use the Yelp dataset to evaluate the system. Experiments demonstrate that the proposed recommender system generalizes well on candidate generation from logical queries and effectively re-ranks those candidates, compared to the matrix factorization model.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-20
      DOI: 10.3390/bdcc6010011
      Issue No: Vol. 6, No. 1 (2022)
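The DNF transformation mentioned in the abstract above can be sketched for queries built from conjunction and disjunction only. This is a minimal illustration, assuming a hypothetical nested-tuple query representation and made-up atoms such as "serves:pizza"; it does not reproduce the paper's triple generation, the ∃ operator, or the Query2Box reasoning step.

```python
from itertools import product

def to_dnf(query):
    """Return a list of conjunctions; each conjunction is a list of atoms.

    A query is either an atom (a string) or a tuple ("and" | "or", sub, sub, ...).
    """
    if isinstance(query, str):
        return [[query]]
    op, *args = query
    parts = [to_dnf(arg) for arg in args]
    if op == "or":
        # A disjunction is the union of its children's conjunctions.
        return [conj for part in parts for conj in part]
    if op == "and":
        # Distribute AND over OR: pick one conjunction per child and merge them.
        return [sum(combo, []) for combo in product(*parts)]
    raise ValueError(f"unknown operator: {op}")

# Hypothetical restaurant query: pizza AND (Toronto OR Montreal).
query = ("and", "serves:pizza", ("or", "city:Toronto", "city:Montreal"))
simple_queries = to_dnf(query)
```

Each inner list in `simple_queries` is a purely conjunctive query that can be answered on its own; the union of their answers is the answer to the original query.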
  • BDCC, Vol. 6, Pages 12: Scalable Extended Reality: A Future Research

    • Authors: Vera Marie Memmesheimer, Achim Ebert
      First page: 12
      Abstract: Extensive research has outlined the potential of augmented, mixed, and virtual reality applications. However, little attention has been paid to scalability enhancements fostering practical adoption. In this paper, we introduce the concept of scalable extended reality (XRS), i.e., spaces scaling between different displays and degrees of virtuality that can be entered by multiple, possibly distributed users. The development of such XRS spaces concerns several research fields. To provide bidirectional interaction and maintain consistency with the real environment, virtual reconstructions of physical scenes need to be segmented semantically and adapted dynamically. Moreover, scalable interaction techniques for selection, manipulation, and navigation as well as a world-stabilized rendering of 2D annotations in 3D space are needed to let users intuitively switch between handheld and head-mounted displays. Collaborative settings should further integrate access control and awareness cues indicating the collaborators’ locations and actions. While many of these topics were investigated by previous research, very few have considered their integration to enhance scalability. Addressing this gap, we review related previous research, list current barriers to the development of XRS spaces, and highlight dependencies between them.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-26
      DOI: 10.3390/bdcc6010012
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 13: Fuzzy Neural Network Expert System with an
           Improved Gini Index Random Forest-Based Feature Importance Measure
           Algorithm for Early Diagnosis of Breast Cancer in Saudi Arabia

    • Authors: Ebrahem A. Algehyne, Muhammad Lawan Jibril, Naseh A. Algehainy, Osama Abdulaziz Alamri, Abdullah K. Alzahrani
      First page: 13
      Abstract: Breast cancer is one of the most common malignancies among females in Saudi Arabia and has been ranked as the most prevalent and the number two killer disease in the country. However, the clinical diagnosis process of any disease such as breast cancer, coronary artery diseases, diabetes, COVID-19, among others, is often associated with uncertainty due to the complexity and fuzziness of the process. In this work, a fuzzy neural network expert system with an improved Gini index random forest-based feature importance measure algorithm for early diagnosis of breast cancer in Saudi Arabia was proposed to address the uncertainty and ambiguity associated with the diagnosis of breast cancer and also the heavier burden on the overlay of the network nodes of the fuzzy neural network system that often happens due to insignificant features that are used to predict or diagnose the disease. An Improved Gini Index Random Forest-Based Feature Importance Measure Algorithm was used to select the five fittest features of the Diagnostic Wisconsin Breast Cancer Database out of the 32 features of the dataset. The logistic regression, support vector machine, k-nearest neighbor, random forest, and Gaussian naïve Bayes learning algorithms were used to develop two sets of classification models: models with the full 32 features and models with the 5 fittest features. The two sets of classification models were evaluated, and the results of the evaluation were compared. The comparison shows that the models with the selected fittest features outperformed their counterparts with full features in terms of accuracy, sensitivity, and specificity. Therefore, a fuzzy neural network based expert system was developed with the five selected fittest features and the system achieved 99.33% accuracy, 99.41% sensitivity, and 99.24% specificity. Moreover, based on the comparison of the system developed in this work against previous works that used fuzzy neural networks or other applied artificial intelligence techniques on the same dataset for diagnosis of breast cancer, the system stands to be the best in terms of accuracy, sensitivity, and specificity. The z test was also conducted, and the test result shows that the accuracy achieved by the system for early diagnosis of breast cancer is significant.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-27
      DOI: 10.3390/bdcc6010013
      Issue No: Vol. 6, No. 1 (2022)
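The feature importance measure named in the abstract above builds on the Gini index. The sketch below shows only the standard Gini impurity and the impurity decrease of a single threshold split; the authors' improved variant is not described in the abstract and is not reproduced here. The function names, feature values, and the labels "B" and "M" are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Standard Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(values, labels, threshold):
    """Impurity decrease obtained by splitting one feature at a threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

# Made-up feature values and labels (B = benign, M = malignant).
values = [1, 2, 8, 9]
labels = ["B", "B", "M", "M"]
best = gini_decrease(values, labels, threshold=5)
```

In a standard random forest, a feature's importance is typically obtained by accumulating such decreases over every split that uses the feature and averaging across trees; features with a larger total rank as more important.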
  • BDCC, Vol. 6, Pages 14: Acknowledgment to Reviewers of BDCC in 2021

    • Authors: BDCC Editorial Office
      First page: 14
      Abstract: Rigorous peer-reviews are the basis of high-quality academic publishing [...]
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-27
      DOI: 10.3390/bdcc6010014
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 15: Google Street View Images as Predictors of Patient
           Health Outcomes, 2017–2019

    • Authors: Quynh C. Nguyen, Tom Belnap, Pallavi Dwivedi, Amir Hossein Nazem Deligani, Abhinav Kumar, Dapeng Li, Ross Whitaker, Jessica Keralis, Heran Mane, Xiaohe Yue, Thu T. Nguyen, Tolga Tasdizen, Kim D. Brunisholz
      First page: 15
      Abstract: Collecting neighborhood data can both be time- and resource-intensive, especially across broad geographies. In this study, we leveraged 1.4 million publicly available Google Street View (GSV) images from Utah to construct indicators of the neighborhood built environment and evaluate their associations with 2017–2019 health outcomes of approximately one-third of the population living in Utah. The use of electronic medical records allows for the assessment of associations between neighborhood characteristics and individual-level health outcomes while controlling for predisposing factors, which distinguishes this study from previous GSV studies that were ecological in nature. Among 938,085 adult patients, we found that individuals living in communities in the highest tertiles of green streets and non-single-family homes have 10–27% lower diabetes, uncontrolled diabetes, hypertension, and obesity, but higher substance use disorders—controlling for age, White race, Hispanic ethnicity, religion, marital status, health insurance, and area deprivation index. Conversely, the presence of visible utility wires overhead was associated with 5–10% more diabetes, uncontrolled diabetes, hypertension, obesity, and substance use disorders. Our study found that non-single-family and green streets were related to a lower prevalence of chronic conditions, while visible utility wires and single-lane roads were connected with a higher burden of chronic conditions. These contextual characteristics can better help healthcare organizations understand the drivers of their patients’ health by further considering patients’ residential environments, which present both risks and resources.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-27
      DOI: 10.3390/bdcc6010015
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 16: A Dataset for Emotion Recognition Using Virtual
           Reality and EEG (DER-VREEG): Emotional State Classification Using Low-Cost
           Wearable VR-EEG Headsets

    • Authors: Nazmi Sofian Suhaimi, James Mountstephens, Jason Teo
      First page: 16
      Abstract: Emotions are viewed as an important aspect of human interactions and conversations, and allow effective and logical decision making. Emotion recognition uses low-cost wearable electroencephalography (EEG) headsets to collect brainwave signals and interpret these signals to provide information on the mental state of a person. With the implementation of a virtual reality environment in different applications, the gap between human and computer interaction, as well as the understanding process, would shorten, providing an immediate response to an individual’s mental health. This study aims to use a virtual reality (VR) headset to induce four classes of emotions (happy, scared, calm, and bored), to collect brainwave samples using a low-cost wearable EEG headset, and to run popular classifiers to compare the most feasible ones that can be used for this particular setup. Firstly, we attempt to build an immersive VR database that is accessible to the public and that can potentially assist with emotion recognition studies using virtual reality stimuli. Secondly, we use a low-cost wearable EEG headset that is both compact and small, and can be attached to the scalp without any hindrance, allowing freedom of movement for participants to view their surroundings inside the immersive VR stimulus. Finally, we evaluate the emotion recognition system by using popular machine learning algorithms and compare them for both intra-subject and inter-subject classification. The results obtained here show that the prediction model for the four-class emotion classification performed well, including the more challenging inter-subject classification, with the support vector machine (SVM Class Weight kernel) obtaining 85.01% classification accuracy. This shows that using fewer electrode channels, combined with proper parameter tuning and feature selection, affects classification performance.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-28
      DOI: 10.3390/bdcc6010016
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 17: Big Data Analytics in Supply Chain Management: A
           Systematic Literature Review and Research Directions

    • Authors: In Lee, George Mangalaraj
      First page: 17
      Abstract: Big data analytics has been successfully used for various business functions, such as accounting, marketing, supply chain, and operations. Currently, along with the recent development in machine learning and computing infrastructure, big data analytics in the supply chain are surging in importance. In light of the great interest and evolving nature of big data analytics in supply chains, this study conducts a systematic review of existing studies in big data analytics. This study presents a framework of a systematic literature review from interdisciplinary perspectives. From the organizational perspective, this study examines the theoretical foundations and research models that explain the sustainability and performances achieved through the use of big data analytics. Then, from the technical perspective, this study analyzes types of big data analytics, techniques, algorithms, and features developed for enhanced supply chain functions. Finally, this study identifies the research gap and suggests future research directions.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-02-01
      DOI: 10.3390/bdcc6010017
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 18: Big Data in Construction: Current Applications and
           Future Opportunities

    • Authors: Munawar, Ullah, Qayyum, Shahzad
      First page: 18
      Abstract: Big data have become an integral part of various research fields due to the rapid advancements in the digital technologies available for dealing with data. The construction industry is no exception and has seen a spike in the data being generated due to the introduction of various digital disruptive technologies. However, despite the availability of data and the introduction of such technologies, the construction industry is lagging in harnessing big data. This paper critically explores literature published since 2010 to identify the data trends and how the construction industry can benefit from big data. The presence of tools such as computer-aided drawing (CAD) and building information modelling (BIM) provide a great opportunity for researchers in the construction industry to further improve how infrastructure can be developed, monitored, or improved in the future. The gaps in the existing research data have been explored and a detailed analysis was carried out to identify the different ways in which big data analysis and storage work in relevance to the construction industry. Big data engineering (BDE) and statistics are among the most crucial steps for integrating big data technology in construction. The results of this study suggest that while the existing research studies have set the stage for improving big data research, the integration of the associated digital technologies into the construction industry is not very clear. Among the future opportunities, big data research into construction safety, site management, heritage conservation, and project waste minimization and quality improvements are key areas.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-02-06
      DOI: 10.3390/bdcc6010018
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 19: The Next-Generation NIDS Platform: Cloud-Based
           Snort NIDS Using Containers and Big Data

    • Authors: Ferry Astika Saputra, Muhammad Salman, Jauari Akhmad Nur Hasim, Isbat Uzzin Nadhori, Kalamullah Ramli
      First page: 19
      Abstract: Snort is a well-known, signature-based network intrusion detection system (NIDS). The Snort sensor must be placed within the same physical network, and the defense centers in the typical NIDS architecture offer limited network coverage, especially for remote networks with a restricted bandwidth and network policy. Additionally, the growing number of sensor instances, followed by a quick increase in log data volume, has caused the present system to face big data challenges. This research paper proposes a novel design for a cloud-based Snort NIDS using containers and implementing big data in the defense center to overcome these problems. Our design consists of Docker as the sensor’s platform, Apache Kafka as the distributed messaging system, and big data technology orchestrated on lambda architecture. We conducted experiments to measure sensor deployment, optimum message delivery from the sensors to the defense center, aggregation speed, and efficiency in the data-processing performance of the defense center. We successfully developed a cloud-based Snort NIDS and found the optimum method for message delivery from the sensor to the defense center. We also succeeded in developing the dashboard and attack maps to display the attack statistics and visualize the attacks. Our first design is reported to implement the big data architecture, namely, lambda architecture, as the defense center and utilize rapid deployment of Snort NIDS using Docker technology as the network security monitoring platform.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-02-07
      DOI: 10.3390/bdcc6010019
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 20: Person Re-Identification via Pyramid Multipart
           Features and Multi-Attention Framework

    • Authors: Randa Mohamed Bayoumi, Elsayed E. Hemayed, Mohammad Ehab Ragab, Magda B. Fayek
      First page: 20
      Abstract: Video-based person re-identification has become quite attractive due to its importance in many vision surveillance problems. It is a challenging topic due to the inter/intra changes, occlusion, and pose variations involved. In this paper, we propose a pyramid-attentive framework that relies on multi-part features and multiple attention to aggregate features of multi-levels and learns attention-based representations of persons through various aspects. Self-attention is used to strengthen the most discriminative features in the spatial and channel domains and hence capture robust global information. We propose the use of part-relation attention between different multi-granularities of features’ representation to focus on learning appropriate local features. Temporal attention is used to aggregate temporal features. We integrate the most robust features in the global and multi-level views to build an effective convolution neural network (CNN) model. The proposed model outperforms the previous state-of-the art models on three datasets. Notably, using the proposed model enables the achievement of 98.9% (a relative improvement of 2.7% on the GRL) top1 accuracy and 99.3% mAP on the PRID2011, and 92.8% (a relative improvement of 2.4% relative to GRL) top1 accuracy on iLIDS-vid. We also explore the generalization ability of our model on a cross dataset.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-02-09
      DOI: 10.3390/bdcc6010020
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 21: Vec2Dynamics: A Temporal Word Embedding Approach
           to Exploring the Dynamics of Scientific Keywords—Machine Learning as
           a Case Study

    • Authors: Amna Dridi, Mohamed Medhat Gaber, Raja Muhammad Atif Azad, Jagdev Bhogal
      First page: 21
      Abstract: The study of the dynamics or the progress of science has been widely explored with descriptive and statistical analyses. This study has also attracted several computational approaches that are labelled together as the Computational History of Science, especially with the rise of data science and the development of increasingly powerful computers. Among these approaches, some works have studied dynamism in scientific literature by employing text analysis techniques that rely on topic models to study the dynamics of research topics. Unlike topic models that do not delve deeper into the content of scientific publications, for the first time, this paper uses temporal word embeddings to automatically track the dynamics of scientific keywords over time. To this end, we propose Vec2Dynamics, a neural-based computational history approach that reports the stability of the k-nearest neighbors of scientific keywords over time; the stability indicates whether the keywords are taking on new neighborhoods due to the evolution of scientific literature. To evaluate how Vec2Dynamics models such relationships in the domain of Machine Learning (ML), we constructed scientific corpora from the papers published in the Neural Information Processing Systems (NIPS; now abbreviated NeurIPS) conference between 1987 and 2016. The descriptive analysis that we performed in this paper verifies the efficacy of our proposed approach. In fact, we found a generally strong consistency between the obtained results and the Machine Learning timeline.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-02-21
      DOI: 10.3390/bdcc6010021
      Issue No: Vol. 6, No. 1 (2022)
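
The k-nearest-neighbor stability idea in the Vec2Dynamics abstract above can be illustrated with a minimal sketch. It assumes two small hypothetical embedding tables (one per time window) and uses the Jaccard overlap of the two neighbor sets as the stability score; the overlap measure, the function names, and the toy vectors are illustrative assumptions, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(embeddings, word, k):
    """Return the set of the k words most cosine-similar to `word`."""
    target = embeddings[word]
    others = [w for w in embeddings if w != word]
    others.sort(key=lambda w: cosine(target, embeddings[w]), reverse=True)
    return set(others[:k])

def knn_stability(emb_t0, emb_t1, word, k):
    """Jaccard overlap of a keyword's k-NN sets in two time windows (assumed measure)."""
    a = nearest_neighbors(emb_t0, word, k)
    b = nearest_neighbors(emb_t1, word, k)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical 2-D embeddings for two time windows.
emb_early = {"kernel": (1.0, 0.0), "svm": (0.9, 0.1), "margin": (0.8, 0.2), "neuron": (0.0, 1.0)}
emb_late = {"kernel": (1.0, 0.0), "svm": (0.9, 0.1), "margin": (0.0, 1.0), "neuron": (0.8, 0.2)}

stability = knn_stability(emb_early, emb_late, "kernel", k=2)
```

A score near 1 would suggest the keyword kept its neighborhood between the two windows, while a score near 0 would suggest it moved to a new one.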
  • BDCC, Vol. 6, Pages 22: LFA: A Lévy Walk and Firefly-Based Search
           Algorithm: Application to Multi-Target Search and Multi-Robot Foraging

    • Authors: Ouarda Zedadra, Antonio Guerrieri, Hamid Seridi
      First page: 22
      Abstract: In the literature, several exploration algorithms have been proposed so far. Among these, Lévy walk is commonly used since it has proved to be more efficient than the simple random-walk exploration. It is beneficial when targets are sparsely distributed in the search space. However, due to its super-diffusive behavior, some tuning is needed to improve its performance, specifically when targets are clustered. The firefly algorithm is a swarm intelligence-based algorithm useful for intensive search, but its exploration rate is very limited. An efficient and reliable search could be attained by combining the two algorithms since the first one enables exploration of the search space, and the second one encourages its exploitation. In this paper, we propose a swarm intelligence-based search algorithm called Lévy walk and Firefly-based Algorithm (LFA), which is a hybridization of the two aforementioned algorithms. The algorithm is applied to Multi-Target Search and Multi-Robot Foraging. Numerical experiments to test the performance are conducted on the robotic simulator ARGoS. A comparison with the original firefly algorithm shows the benefit of our contribution.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-02-21
      DOI: 10.3390/bdcc6010022
      Issue No: Vol. 6, No. 1 (2022)
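
For readers unfamiliar with the Lévy walk mentioned in the LFA abstract above, the following is a minimal sketch of the exploration component only: step lengths are drawn from a heavy-tailed power law and headings are uniform. The exponent `mu`, the `min_step` parameter, and the function name are assumptions for illustration; the firefly hybridization described in the paper is not reproduced here.

```python
import math
import random

def levy_walk(n_steps, mu=2.0, min_step=1.0, seed=None):
    """Generate a 2-D Levy walk path starting at the origin.

    Step lengths follow p(l) ~ l**(-mu) for l >= min_step, sampled with a
    Pareto distribution of shape mu - 1 (requires mu > 1). Headings are
    uniform on [0, 2*pi).
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        length = min_step * rng.paretovariate(mu - 1.0)  # heavy-tailed step length
        angle = rng.uniform(0.0, 2.0 * math.pi)          # isotropic heading
        x += length * math.cos(angle)
        y += length * math.sin(angle)
        path.append((x, y))
    return path

path = levy_walk(100, mu=2.0, seed=42)
```

Most steps are short, with occasional very long relocations; this is the super-diffusive behavior that helps with sparse targets and that, according to the abstract, needs tuning when targets are clustered.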
  • BDCC, Vol. 6, Pages 23: A Framework for Content-Based Search in Large
           Music Collections

    • Authors: Tiange Zhu, Raphaël Fournier-S’niehotta, Philippe Rigaux, Nicolas Travers
      First page: 23
      Abstract: We address the problem of scalable content-based search in large collections of music documents. Music content is highly complex and versatile and presents multiple facets that can be considered independently or in combination. Moreover, music documents can be digitally encoded in many ways. We propose a general framework for building a scalable search engine, based on (i) a music description language that represents music content independently from a specific encoding, (ii) an extendible list of feature-extraction functions, and (iii) indexing, searching, and ranking procedures designed to be integrated into the standard architecture of a text-oriented search engine. As a proof of concept, we also detail an actual implementation of the framework for searching in large collections of XML-encoded music scores, based on the popular ElasticSearch system. It is released as open-source in GitHub, and available as a ready-to-use Docker image for communities that manage large collections of digitized music documents.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-02-23
      DOI: 10.3390/bdcc6010023
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 24: Combination of Reduction Detection Using TOPSIS
           for Gene Expression Data Analysis

    • Authors: Jogeswar Tripathy, Rasmita Dash, Binod Kumar Pattanayak, Sambit Kumar Mishra, Tapas Kumar Mishra, Deepak Puthal
      First page: 24
      Abstract: In high-dimensional data analysis, Feature Selection (FS) is one of the most fundamental issues in machine learning and requires the attention of researchers. These datasets are characterized by huge space due to a high number of features, out of which only a few are significant for analysis. Thus, significant feature extraction is crucial. There are various techniques available for feature selection; among them, the filter techniques are significant in this community, as they can be used with any type of learning algorithm and drastically lower the running time of optimization algorithms and improve the performance of the model. Furthermore, the application of a filter approach depends on the characteristics of the dataset as well as on the machine learning model. Thus, to avoid these issues in this research, a combination of feature reduction (CFR) is considered designing a pipeline of filter approaches for high-dimensional microarray data classification. Considering four filter approaches, sixteen combinations of pipelines are generated. The feature subset is reduced in different levels, and ultimately, the significant feature set is evaluated. The pipelined filter techniques are Correlation-Based Feature Selection (CBFS), Chi-Square Test (CST), Information Gain (InG), and Relief Feature Selection (RFS), and the classification techniques are Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), and k-Nearest Neighbor (k-NN). The performance of CFR depends highly on the datasets as well as on the classifiers. Thereafter, the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) method is used for ranking all reduction combinations and evaluating the superior filter combination among all.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-02-23
      DOI: 10.3390/bdcc6010024
      Issue No: Vol. 6, No. 1 (2022)
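
The TOPSIS ranking step named in the abstract above can be sketched as follows. This is the standard textbook formulation (vector normalization, weighting, distances to the ideal best and ideal worst alternatives), written under the assumption that each filter pipeline is scored on a few criteria; the pipeline scores, weights, and criteria below are hypothetical and are not results from the paper.

```python
import math

def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each alternative (row).

    matrix  : rows are alternatives, columns are criteria
    weights : one weight per criterion
    benefit : True if larger is better for that criterion, False if smaller is better
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*weighted))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in weighted:
        d_best = math.sqrt(sum((v - b) ** 2 for v, b in zip(row, best)))
        d_worst = math.sqrt(sum((v - w) ** 2 for v, w in zip(row, worst)))
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores

# Hypothetical pipelines scored on (accuracy, number of retained features).
pipelines = ["pipeline A", "pipeline B", "pipeline C"]
scores = topsis(
    matrix=[[0.9, 10], [0.8, 50], [0.7, 10]],
    weights=[0.5, 0.5],
    benefit=[True, False],  # accuracy: higher is better; feature count: lower is better
)
ranking = sorted(zip(pipelines, scores), key=lambda item: item[1], reverse=True)
```

Sorting by closeness score in descending order gives the ranking; the alternative closest to the ideal best and farthest from the ideal worst comes first.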
  • BDCC, Vol. 6, Pages 25: Big Data in Criteria Selection and Identification
           in Managing Flood Disaster Events Based on Macro Domain PESTEL Analysis:
           Case Study of Malaysia Adaptation Index

    • Authors: Mohammad Fikry Abdullah, Zurina Zainol, Siaw Yin Thian, Noor Hisham Ab Ghani, Azman Mat Jusoh, Mohd Zaki Mat Amin, Nur Aiza Mohamad
      First page: 25
      Abstract: The impact of Big Data (BD) creates challenges in selecting relevant and significant data to be used as criteria to facilitate flood management plans. Studies on macro domain criteria expand the criteria selection, which is important for assessment in allowing a comprehensive understanding of the current situation, readiness, preparation, resources, and others for decision assessment and disaster events planning. This study aims to facilitate the criteria identification and selection from a macro domain perspective in improving flood management planning. The objectives of this study are (a) to explore and identify potential and possible criteria to be incorporated in the current flood management plan in the macro domain perspective; (b) to understand the type of flood measures and decision goals implemented to facilitate flood management planning decisions; and (c) to examine the possible structured mechanism for criteria selection based on the decision analysis technique. Based on a systematic literature review and thematic analysis using the PESTEL framework, the findings have identified and clustered domains and their criteria to be considered and applied in future flood management plans. The critical review on flood measures and decision goals would potentially equip stakeholders and policy makers for better decision making based on a disaster management plan. The decision analysis technique as a structured mechanism would significantly improve criteria identification and selection for comprehensive and collective decisions. The findings from this study could further improve Malaysia Adaptation Index (MAIN) criteria identification and selection, which could be the complementary and supporting reference in managing flood disaster management. A proposed framework from this study can be used as guidance in dealing with and optimising the criteria based on challenges and the current application of Big Data and criteria in managing disaster events.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-01
      DOI: 10.3390/bdcc6010025
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 26: A Combined System Metrics Approach to Cloud
           Service Reliability Using Artificial Intelligence

    • Authors: Tek Raj Chhetri, Chinmaya Kumar Dehury, Artjom Lind, Satish Narayana Srirama, Anna Fensel
      First page: 26
      Abstract: Identifying and anticipating potential failures in the cloud is an effective method for increasing cloud reliability and proactive failure management. Many studies have been conducted to predict potential failure, but none have combined SMART (self-monitoring, analysis, and reporting technology) hard drive metrics with other system metrics, such as central processing unit (CPU) utilisation. Therefore, we propose a combined system metrics approach for failure prediction based on artificial intelligence to improve reliability. We tested over 100 cloud servers’ data and four artificial intelligence algorithms: random forest, gradient boosting, long short-term memory, and gated recurrent unit, and also performed correlation analysis. Our correlation analysis sheds light on the relationships that exist between system metrics and failure, and the experimental results demonstrate the advantages of combining system metrics, outperforming the state-of-the-art.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-01
      DOI: 10.3390/bdcc6010026
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 27: Optimizations for Computing Relatedness in
           Biomedical Heterogeneous Information Networks: SemNet 2.0

    • Authors: Anna Kirkpatrick, Chidozie Onyeze, David Kartchner, Stephen Allegri, Davi Nakajima An, Kevin McCoy, Evie Davalbhakta, Cassie S. Mitchell
      First page: 27
      Abstract: Literature-based discovery (LBD) summarizes information and generates insight from large text corpuses. The SemNet framework utilizes a large heterogeneous information network or “knowledge graph” of nodes and edges to compute relatedness and rank concepts pertinent to a user-specified target. SemNet provides a way to perform multi-factorial and multi-scalar analysis of complex disease etiology and therapeutic identification using the 33+ million articles in PubMed. The present work improves the efficacy and efficiency of LBD for end users by augmenting SemNet to create SemNet 2.0. A custom Python data structure replaced reliance on Neo4j to improve knowledge graph query times by several orders of magnitude. Additionally, two randomized algorithms were built to optimize the HeteSim metric calculation for computing metapath similarity. The unsupervised learning algorithm for rank aggregation (ULARA), which ranks concepts with respect to the user-specified target, was reconstructed using derived mathematical proofs of correctness and probabilistic performance guarantees for optimization. The upgraded ULARA is generalizable to other rank aggregation problems outside of SemNet. In summary, SemNet 2.0 is a comprehensive open-source software for significantly faster, more effective, and user-friendly means of automated biomedical LBD. An example case is performed to rank relationships between Alzheimer’s disease and metabolic co-morbidities.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-01
      DOI: 10.3390/bdcc6010027
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 28: Comparison of Object Detection in Head-Mounted and
           Desktop Displays for Congruent and Incongruent Environments

    • Authors: René Reinhard, Erinchan Telatar, Shah Rukh Humayoun
      First page: 28
      Abstract: Virtual reality technologies, including head-mounted displays (HMD), can provide benefits to psychological research by combining high degrees of experimental control with improved ecological validity. This is due to the strong feeling of being in the displayed environment (presence) experienced by VR users. As of yet, it is not fully explored how using HMDs impacts basic perceptual tasks, such as object perception. In traditional display setups, the congruency between background environment and object category has been shown to impact response times in object perception tasks. In this study, we investigated whether this well-established effect is comparable when using desktop and HMD devices. In the study, 21 participants used both desktop and HMD setups to perform an object identification task and, subsequently, their subjective presence while experiencing two-distinct virtual environments (a beach and a home environment) was evaluated. Participants were quicker to identify objects in the HMD condition, independent of object-environment congruency, while congruency effects were not impacted. Furthermore, participants reported significantly higher presence in the HMD condition.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-07
      DOI: 10.3390/bdcc6010028
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 29: Radiology Imaging Scans for Early Diagnosis of
           Kidney Tumors: A Review of Data Analytics-Based Machine Learning and Deep
           Learning Approaches

    • Authors: Maha Gharaibeh, Dalia Alzu’bi, Malak Abdullah, Ismail Hmeidi, Mohammad Rustom Al Nasar, Laith Abualigah, Amir H. Gandomi
      First page: 29
      Abstract: Plenty of disease types exist in world communities that can be explained by humans’ lifestyles or the economic, social, genetic, and other factors of the country of residence. Recently, most research has focused on studying common diseases in the population to reduce death risks, take the best procedure for treatment, and enhance the healthcare level of the communities. Kidney Disease is one of the common diseases that have affected our societies. Particularly, Kidney Tumors (KT) are the 10th most prevalent tumor for men and women worldwide. Overall, the lifetime likelihood of developing a kidney tumor for males is about 1 in 466 (2.02 percent) and it is around 1 in 80 (1.03 percent) for females. Still, more research is needed on new diagnostic, early, and innovative methods regarding finding an appropriate treatment method for KT. Compared to the tedious and time-consuming traditional diagnosis, automatic detection algorithms of machine learning can save diagnosis time, improve test accuracy, and reduce costs. Previous studies have shown that deep learning can play a role in dealing with complex tasks, diagnosis and segmentation, and classification of Kidney Tumors, one of the most malignant tumors. The goals of this review article on deep learning in radiology imaging are to summarize what has already been accomplished, determine the techniques used by the researchers in previous years in diagnosing Kidney Tumors through medical imaging, and identify some promising future avenues, whether in terms of applications or technological developments, as well as identifying common problems, describing ways to expand the data set, summarizing the knowledge and best practices, and determining remaining challenges and future directions.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-08
      DOI: 10.3390/bdcc6010029
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 30: Big Data Management in Drug–Drug
           Interaction: A Modern Deep Learning Approach for Smart Healthcare

    • Authors: Muhammad Salman, Hafiz Suliman Munawar, Khalid Latif, Muhammad Waseem Akram, Sara Imran Khan, Fahim Ullah
      First page: 30
      Abstract: The detection and classification of drug–drug interactions (DDI) from existing data are of high importance because recent reports show that DDIs are among the major causes of hospital-acquired conditions and readmissions and are also necessary for smart healthcare. Therefore, to avoid adverse drug interactions, it is necessary to have an up-to-date knowledge of DDIs. This knowledge could be extracted by applying text-processing techniques to the medical literature published in the form of ‘Big Data’ because, whenever a drug interaction is investigated, it is typically reported and published in healthcare and clinical pharmacology journals. However, it is crucial to automate the extraction of the interactions taking place between drugs because the medical literature is being published in immense volumes, and it is impossible for healthcare professionals to read and collect all of the investigated DDI reports from these Big Data. To avoid this time-consuming procedure, the Information Extraction (IE) and Relationship Extraction (RE) techniques that have been studied in depth in Natural Language Processing (NLP) could be very promising. Since 2011, a lot of research has been reported in this particular area, and there are many approaches that have been implemented that can also be applied to biomedical texts to extract DDI-related information. A benchmark corpus is also publicly available for the advancement of DDI extraction tasks. The current state-of-the-art implementations for extracting DDIs from biomedical texts have employed Support Vector Machines (SVM) or other machine learning methods that work on manually defined features, which might be the cause of the low precision and recall that have been achieved in this domain so far. Modern deep learning techniques have also been applied for the automatic extraction of DDIs from the scientific literature and have proven to be very promising for the advancement of DDI extraction tasks. As such, it is pertinent to investigate deep learning techniques for the extraction and classification of DDIs in order for them to be used in the smart healthcare domain. We proposed a deep neural network-based method (SEV-DDI: Severity-Drug–Drug Interaction) with some further-integrated units/layers to achieve higher precision and accuracy. After successfully outperforming other methods in the DDI classification task, we moved a step further and utilized the methods in a sentiment analysis task to investigate the severity of an interaction. The ability to determine the severity of a DDI will be very helpful for clinical decision support systems in making more accurate and informed decisions, ensuring the safety of the patients.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-09
      DOI: 10.3390/bdcc6010030
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 31: Factors Influencing Citizens’ Intention to
           Use Open Government Data—A Case Study of Pakistan

    • Authors: Muhammad Mahboob Khurshid, Nor Hidayati Zakaria, Muhammad Irfanullah Arfeen, Ammar Rashid, Safi Ullah Nasir, Hafiz Muhammad Faisal Shehzad
      First page: 31
      Abstract: Open government data (OGD) has gained much attention worldwide; however, there is still an increasing demand for exploring research from the perspective of its adoption and diffusion. Policymakers expect that OGD will be used on a large scale by the public, which will result in a range of benefits, such as faith and trust in governments, innovation and development, and participatory governance. However, not much is known about which factors influence the citizens’ intention to use OGD. Therefore, this research aims at empirically investigating the factors that influence citizens’ intention to use OGD in a developing country using information systems theory. Improved knowledge and understanding of the influencing factors can assist policymakers in determining which policy initiatives they can take to increase the intention to widely use OGD. Upon conducting a survey and performing analysis, findings reveal that perceived usefulness, social approval, and enjoyment positively influence intention, whereas voluntariness of use negatively influences OGD use. Further, perceived usefulness is significantly affected by perceived ease of use, and OGD use is significantly affected by OGD use intention. However, surprisingly, the intention to use OGD is not significantly affected by perceived ease of use. The policymakers suggest increasing the intention to use OGD by considering significant factors.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-17
      DOI: 10.3390/bdcc6010031
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 32: Service Oriented R-ANN Knowledge Model for Social
           Internet of Things

    • Authors: Mohana S. D., S. P. Shiva Prakash, Kirill Krinkin
      First page: 32
      Abstract: Increase in technologies around the world requires adding intelligence to the objects, and making it a smart object in an environment leads to the Social Internet of Things (SIoT). These social objects are uniquely identifiable, transferable and share information from user-to-objects and objects-to-objects through interactions in a smart environment such as smart homes, smart cities and many more applications. SIoT faces certain challenges such as handling of heterogeneous objects, selection of generated data in objects, and missing values in data. Therefore, the discovery and communication of meaningful patterns in data are more important for every application. Thus, the analysis of data is essential in smarter decisions and qualifies performance of data for various applications. In a smart environment, social networks of intelligent objects are increasing services and decreasing the relationship in a reliable and efficient way of sharing resources and services. Hence, this work proposed the feature selection method based on proposed semantic rules and established the relationships to classify the services using relationship artificial neural networks (R-ANN). R-ANN is an inversely proportional relationship to the objects based on certain rules and conditions between the objects to objects and users to objects. It provides the service oriented knowledge model to make decisions in the proposed R-ANN model that produces service to the users. The proposed R-ANN provides an accuracy of 89.62% for various services namely weather, air quality, parking, light status, and people presence respectively in the SIoT environment compared to the existing model.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-18
      DOI: 10.3390/bdcc6010032
      Issue No: Vol. 6, No. 1 (2022)
  • BDCC, Vol. 6, Pages 1: AGR4BS: A Generic Multi-Agent Organizational Model
           for Blockchain Systems

    • Authors: Hector Roussille, Önder Gürcan, Fabien Michel
      First page: 1
      Abstract: Blockchain is a very attractive technology since it maintains a public, append-only, immutable and ordered log of transactions which guarantees an auditable ledger accessible by anyone. Blockchain systems are inherently interdisciplinary since they combine various fields such as cryptography, multi-agent systems, distributed systems, social systems, economy, and finance. Furthermore, they have a very active and dynamic ecosystem where new blockchain platforms and algorithms are developed continuously due to the interest of the public and the industries to the technology. Consequently, we anticipate a challenging and interdisciplinary research agenda in blockchain systems, built upon a methodology that strives to capture the rich process resulting from the interplay between the behavior of agents and the dynamic interactions among them. To be effective, however, modeling studies providing insights into blockchain systems, and appropriate description of agents paired with a generic understanding of their components are needed. Such studies will create a more unified field of blockchain systems that advances our understanding and leads to further insight. According to this perspective, in this study, we propose using a generic multi-agent organizational modeling for studying blockchain systems, namely AGR4BS. Concretely, we use the Agent/Group/Role (AGR) organizational modeling approach to identify and represent the generic entities which are common to blockchain systems. We show through four real case studies how this generic model can be used to model different blockchain systems. We also show briefly how it can be used for modeling three well-known attacks on blockchain systems.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-12-21
      DOI: 10.3390/bdcc6010001
      Issue No: Vol. 6, No. 1 (2021)
  • BDCC, Vol. 6, Pages 2: Early Diagnosis of Alzheimer’s Disease Using
           Cerebral Catheter Angiogram Neuroimaging: A Novel Model Based on Deep
           Learning Approaches

    • Authors: Maha Gharaibeh, Mothanna Almahmoud, Mostafa Z. Ali, Amer Al-Badarneh, Mwaffaq El-Heis, Laith Abualigah, Maryam Altalhi, Ahmad Alaiad, Amir H. Gandomi
      First page: 2
      Abstract: Neuroimaging refers to the techniques that provide efficient information about the neural structure of the human brain, which is utilized for diagnosis, treatment, and scientific research. The problem of classifying neuroimages is one of the most important steps that are needed by medical staff to diagnose their patients early by investigating the indicators of different neuroimaging types. Early diagnosis of Alzheimer’s disease is of great importance in preventing the deterioration of the patient’s situation. In this research, a novel approach was devised based on a digital subtracted angiogram scan that provides sufficient features of a new biomarker cerebral blood flow. The used dataset was acquired from the database of K.A.U.H hospital and contains digital subtracted angiograms of participants who were diagnosed with Alzheimer’s disease, besides samples of normal controls. Since each scan included multiple frames for the left and right ICAs, pre-processing steps were applied to prepare the dataset for the next stages of feature extraction and classification. The multiple frames of scans were transformed from real space into DCT space and averaged to remove noise. Then, the averaged image was transformed back to the real space, and both sides were filtered with Meijering and concatenated in a single image. The proposed model extracts the features using different pre-trained models: InceptionV3 and DenseNet201. Then, the PCA method was utilized to select the features with 0.99 explained variance ratio, where the combination of selected features from both pre-trained models is fed into machine learning classifiers. Overall, the obtained experimental results are at least as good as other state-of-the-art approaches in the literature and more efficient according to the recent medical standards with a 99.14% level of accuracy, considering the difference in dataset samples and the used cerebral blood flow biomarker.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-12-28
      DOI: 10.3390/bdcc6010002
      Issue No: Vol. 6, No. 1 (2021)
  • BDCC, Vol. 5, Pages 46: Uncovering Active Communities from Directed Graphs
           on Distributed Spark Frameworks, Case Study: Twitter Data

    • Authors: Moertini, Adithia
      First page: 46
      Abstract: Directed graphs can be prepared from big data containing people’s interaction information. In these graphs the vertices represent people, while the directed edges denote the interactions among them. The number of interactions at certain intervals can be included as the edges’ attribute. Thus, the larger the count, the more frequently the people (vertices) interact with each other. Subgraphs which have a count larger than a threshold value can be created from these graphs, and temporal active communities can then be mined from each of these subgraphs. Apache Spark has been recognized as a data processing framework that is fast and scalable for processing big data. It provides DataFrames, GraphFrames, and GraphX APIs which can be employed for analyzing big graphs. We propose three kinds of active communities, namely, Similar interest communities (SIC), Strong-interacting communities (SC), and Strong-interacting communities with their “inner circle” neighbors (SCIC), along with algorithms needed to uncover them. The algorithm design and implementation are based on these APIs. We conducted experiments on a Spark cluster using ten machines. The results show that our proposed algorithms are able to uncover active communities from public big graphs as well as from Twitter data collected using Spark structured streaming. In some cases, the execution time of the algorithms that are based on GraphFrames’ motif findings is faster.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-09-22
      DOI: 10.3390/bdcc5040046
      Issue No: Vol. 5, No. 4 (2021)
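The thresholding step described in the abstract above (keep only edges whose interaction count exceeds a threshold, then look for groups in the resulting subgraph) can be illustrated with a minimal sketch. This is plain Python rather than Spark DataFrames or GraphFrames, the edge data and function names are hypothetical, and grouping by weak connectivity is only a stand-in for the paper's SIC, SC, and SCIC algorithms, which the abstract does not specify.

```python
from collections import defaultdict

def active_subgraph(edges, threshold):
    """Keep only directed edges whose interaction count exceeds the threshold."""
    return [(src, dst, count) for src, dst, count in edges if count > threshold]

def weakly_connected_groups(edges):
    """Group vertices joined by the kept edges, ignoring edge direction."""
    neighbours = defaultdict(set)
    for src, dst, _ in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first traversal from an unvisited vertex
            vertex = stack.pop()
            if vertex in group:
                continue
            group.add(vertex)
            stack.extend(neighbours[vertex] - group)
        seen |= group
        groups.append(group)
    return groups

# Hypothetical interaction counts: (source, destination, count in the interval).
edges = [("ana", "budi", 12), ("budi", "ana", 9), ("budi", "citra", 1),
         ("dewi", "eka", 7), ("eka", "ana", 2)]
strong = active_subgraph(edges, threshold=5)
groups = weakly_connected_groups(strong)
```

With a threshold of 5, the two weak edges are dropped and the remaining vertices fall into two separate groups.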
  • BDCC, Vol. 5, Pages 47: Exploring How Phonotactic Knowledge Can Be
           Represented in Cognitive Networks

    • Authors: Michael S. Vitevitch, Leo Niehorster-Cook, Sasha Niehorster-Cook
      First page: 47
      Abstract: In Linguistics and Psycholinguistics, phonotactics refers to the constraints on individual sounds in a given language that restrict how those sounds can be ordered to form words in that language. Previous empirical work in Psycholinguistics demonstrated that phonotactic knowledge influenced how quickly and accurately listeners retrieved words from that part of memory known as the mental lexicon. In the present study, we used three computer simulations to explore how three different cognitive network architectures could account for the previously observed effects of phonotactics on processing. The results of Simulation 1 showed that some—but not all—effects of phonotactics could be accounted for in a network where nodes represent words and edges connect words that are phonologically related to each other. In Simulation 2, a different network architecture was used to again account for some—but not all—effects of phonotactics and phonological neighborhood density. A bipartite network was used in Simulation 3 to account for many of the previously observed effects of phonotactic knowledge on spoken word recognition. The value of using computer simulations to explore different network architectures is discussed.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-09-23
      DOI: 10.3390/bdcc5040047
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 48: Big Data Contribution in Desktop and Mobile
           Devices Comparison, Regarding Airlines’ Digital Brand Name Effect

    • Authors: Damianos P. Sakas, Nikolaos Th. Giannakopoulos
      First page: 48
      Abstract: Rising demand for optimized digital marketing strategies has led firms on a hunt to harvest every possible aspect indicating users’ experience and preference. People visit, regularly through the day, numerous websites using both desktop and mobile devices. It is extremely important for businesses to know each device’s usage rates. Thus, this research is focused on analyzing each device’s usage and their effect on airline firms’ digital brand name. In the first phase of the research, we gathered web data from 10 airline firms during an observation period of 180 days. We then proceeded to develop an exploratory model using Fuzzy Cognitive Mapping, as well as a predictive and simulation model using Agent-Based Modeling. We inferred that various factors of airlines’ digital brand name are affected by both desktop and mobile usage, with mobile usage having a slightly bigger impact on most of them, with gradually rising values. Desktop device usage also appeared to be quite significant, especially in traffic coming from referral sources. The paper’s contribution has been to provide a handful of time-accurate insights for marketeers, regarding airlines’ digital marketing strategies.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-09-26
      DOI: 10.3390/bdcc5040048
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 49: ADD: Attention-Based DeepFake Detection Approach

    • Authors: Aminollah Khormali, Jiann-Shiun Yuan
      First page: 49
      Abstract: Recent advancements in Generative Adversarial Networks (GANs) pose emerging yet serious privacy risks threatening digital media’s integrity and trustworthiness, specifically digital video, through synthesizing hyper-realistic images and videos, i.e., DeepFakes. The need for ascertaining the trustworthiness of digital media calls for automatic yet accurate DeepFake detection algorithms. This paper presents an attention-based DeepFake detection (ADD) method that exploits the fine-grained and spatial locality attributes of artificially synthesized videos for enhanced detection. The ADD framework is composed of two main components, including face close-up and face shut-off data augmentation methods, and is applicable to any classifier based on convolutional neural network architecture. ADD first locates potentially manipulated areas of the input image to extract representative features. Second, the detection model is forced to pay more attention to these forgery regions in the decision-making process through a particular focus on interpreting the sample in the learning phase. ADD’s performance is evaluated against two challenging datasets of DeepFake forensics, i.e., Celeb-DF (V2) and WildDeepFake. We demonstrated the generalization of ADD by evaluating four popular classifiers, namely VGGNet, ResNet, Xception, and MobileNet. The obtained results demonstrate that ADD can boost the detection performance of all four baseline classifiers significantly on both benchmark datasets. Particularly, ADD with ResNet backbone detects DeepFakes with more than 98.3% on Celeb-DF (V2), outperforming state-of-the-art DeepFake detection methods.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-09-27
      DOI: 10.3390/bdcc5040049
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 50: Advances in Convolution Neural Networks Based
           Crowd Counting and Density Estimation

    • Authors: Rafik Gouiaa, Moulay A. Akhloufi, Mozhdeh Shahbazi
      First page: 50
      Abstract: Automatically estimating the number of people in unconstrained scenes is a crucial yet challenging task in different real-world applications, including video surveillance, public safety, urban planning, and traffic monitoring. In addition, methods developed to estimate the number of people can be adapted and applied to related tasks in various fields, such as plant counting, vehicle counting, and cell microscopy. Crowd counting faces many challenges, including cluttered scenes, extreme occlusions, scale variation, and changes in camera perspective. Therefore, in the past few years, tremendous research efforts have been devoted to crowd counting, and numerous excellent techniques have been proposed. The significant progress in crowd counting methods in recent years is mostly attributed to advances in deep convolution neural networks (CNNs) as well as to public crowd counting datasets. In this work, we review the papers that have been published in the last decade and provide a comprehensive survey of the recent CNN-based crowd counting techniques. We briefly review detection-based, regression-based, and traditional density-estimation-based approaches. Then, we delve into detail regarding the deep-learning-based density estimation approaches and recently published datasets. In addition, we discuss the potential applications of crowd counting and in particular its applications using unmanned aerial vehicle (UAV) images.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-09-28
      DOI: 10.3390/bdcc5040050
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 51: A Study on Singapore’s Ageing Population in the
           Context of Eldercare Initiatives Using Machine Learning Algorithms

    • Authors: Easwaramoorthy Rangaswamy, Girija Periyasamy, Nishad Nawaz
      First page: 51
      Abstract: Ageing has always directly impacted the healthcare systems and, more specifically, the eldercare costs, as initiatives related to eldercare need to be addressed beyond the regular healthcare costs. This study aims to examine the general issues of eldercare in the Singapore context, as the population of the country is ageing rapidly. The main objective of the study is to examine the eldercare initiatives of the government and their likely impact on the ageing population. The methodology adopted in this study is Cross-Industry Standard Process for Data Mining (CRISP-DM). Reviews related to the impact of an ageing population on healthcare systems in the context of eldercare initiatives were studied. Analysis methods include correlation and machine learning algorithms, such as Decision Tree, Logistic Regression and Receiver Operating Characteristics curve analysis. Suggestions are provided for the initiatives and needs of the various healthcare and eldercare systems that must be transformed to cope with the ageing population.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-09-29
      DOI: 10.3390/bdcc5040051
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 52: Hardening the Security of Multi-Access Edge
           Computing through Bio-Inspired VM Introspection

    • Authors: Huseyn Huseynov, Tarek Saadawi, Kenichi Kourai
      First page: 52
      Abstract: The extreme bandwidth and performance of 5G mobile networks change the way we develop and utilize digital services. Within a few years, 5G will not only touch technology and applications, but dramatically change the economy, our society and individual life. One of the emerging technologies that enables the evolution to 5G by bringing cloud capabilities nearer to the end users is Edge Computing, also known as Multi-Access Edge Computing (MEC). This evolution also entails growth in the threat landscape and increased privacy concerns in different application areas; hence, security and privacy play a central role in the evolution towards 5G. Since MEC applications are instantiated in the virtualized infrastructure, in this paper we present a distributed application that aims to constantly introspect multiple virtual machines (VMs) in order to detect malicious activities based on their anomalous behavior. Once suspicious processes are detected, our IDS notifies the system administrator about the potential threat in real time. The developed software is able to detect keyloggers, rootkits, trojans, process hiding and other intrusion artifacts via agent-less operation, by operating remotely or directly from the host machine. Remote memory introspection means no software to install and no notice to malware to evacuate or destroy data. Experimental results of remote VMI on more than 50 different pieces of malicious code demonstrate an average anomaly detection rate close to 97%. We have established a wide testbed environment connecting the networks of two universities, Kyushu Institute of Technology and The City College of New York, through a secure GRE tunnel. Experiments conducted on this testbed deliver high response time of the proposed system.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-10-08
      DOI: 10.3390/bdcc5040052
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 53: Bag of Features (BoF) Based Deep Learning
           Framework for Bleached Corals Detection

    • Authors: Sonain Jamil, MuhibUr Rahman, Amir Haider
      First page: 53
      Abstract: Coral reefs are the sub-aqueous calcium carbonate structures collected by the invertebrates known as corals. The charm and beauty of coral reefs attract tourists, and they play a vital role in preserving biodiversity, halting coastal erosion, and promoting business trade. However, they are declining because of over-exploitation, damaging fishery, marine pollution, and global climate changes. Also, coral reefs help treat human immune-deficiency virus (HIV), heart disease, and coastal erosion. The corals of Australia’s Great Barrier Reef have started bleaching due to ocean acidification and global warming, which is an alarming threat to the earth’s ecosystem. Many techniques have been developed to address such issues. However, each method has a limitation due to the low resolution of images, diverse weather conditions, etc. In this paper, we propose a bag of features (BoF) based approach that can detect and localize the bleached corals before the safety measures are applied. The dataset contains images of bleached and unbleached corals, and various support vector machine kernels are used so that extracted features can be classified. The accuracy of handcrafted descriptors and deep convolutional neural networks is analyzed and provided in detail with comparison to the current method. Various handcrafted descriptors like local binary pattern, histogram of oriented gradients, locally encoded transform feature histogram, gray level co-occurrence matrix, and completed joint scale local binary pattern are used for feature extraction. Specific deep convolutional neural networks such as AlexNet, GoogLeNet, VGG-19, ResNet-50, Inception v3, and CoralNet are used for feature extraction. From experimental analysis and results, the proposed technique outperforms the current state-of-the-art methods. The proposed technique achieves 99.08% accuracy with a classification error of 0.92%. A novel bleached coral positioning algorithm is also proposed to locate bleached corals in the coral reef images.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-10-08
      DOI: 10.3390/bdcc5040053
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 54: Effects of Neuro-Cognitive Load on Learning
           Transfer Using a Virtual Reality-Based Driving System

    • Authors: Usman Alhaji Abdurrahman, Shih-Ching Yeh, Yunying Wong, Liang Wei
      First page: 54
      Abstract: Understanding the ways different people perceive and apply acquired knowledge, especially when driving, is an important area of study. This study introduced a novel virtual reality (VR)-based driving system to determine the effects of neuro-cognitive load on learning transfer. In the experiment, easy and difficult routes were introduced to the participants, and the VR system is capable of recording eye-gaze, pupil dilation, heart rate, as well as driving performance data. The main purpose here is to apply multimodal data fusion, several machine learning algorithms, and strategic analytic methods to measure neuro-cognitive load for user classification. A total of ninety-eight (98) university students participated in the experiment, of whom forty-nine (49) were male and forty-nine (49) were female. The results showed that data fusion methods achieved higher accuracy compared to other classification methods. These findings highlight the importance of physiological monitoring to measure mental workload during the process of learning transfer.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-10-13
      DOI: 10.3390/bdcc5040054
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 55: Unraveling the Impact of Land Cover Changes on
           Climate Using Machine Learning and Explainable Artificial Intelligence

    • Authors: Anastasiia Kolevatova, Michael A. Riegler, Francesco Cherubini, Xiangping Hu, Hugo L. Hammer
      First page: 55
      Abstract: A general issue in climate science is the handling of big data and running complex and computationally heavy simulations. In this paper, we explore the potential of using machine learning (ML) to spare computational time and optimize data usage. The paper analyzes the effects of changes in land cover (LC), such as deforestation or urbanization, on local climate. Along with greenhouse gas emissions, LC changes are known to be important causes of climate change. ML methods were trained to learn the relation between LC changes and temperature changes. The results showed that random forest (RF) outperformed other ML methods, and especially linear regression models representing current practice in the literature. Explainable artificial intelligence (XAI) was further used to interpret the RF method and analyze the impact of different LC changes on temperature. The results mainly agree with the climate science literature, but also reveal new and interesting findings, demonstrating that ML methods in combination with XAI can be useful in analyzing the climate effects of LC changes. All parts of the analysis pipeline are explained including data pre-processing, feature extraction, ML training, performance evaluation, and XAI.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-10-15
      DOI: 10.3390/bdcc5040055
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 56: 6G Cognitive Information Theory: A Mailbox

    • Authors: Yixue Hao, Yiming Miao, Min Chen, Hamid Gharavi, Victor C. M. Leung
      First page: 56
      Abstract: With the rapid development of 5G communications, enhanced mobile broadband, massive machine type communications and ultra-reliable low latency communications are widely supported. However, a 5G communication system is still based on Shannon’s information theory, while the meaning and value of information itself are not taken into account in the process of transmission. Therefore, it is difficult to meet the requirements of intelligence, customization, and value transmission of 6G networks. In order to solve the above challenges, we propose a 6G mailbox theory, namely a cognitive information carrier to enable distributed algorithm embedding for intelligence networking. Based on Mailbox, a 6G network will form an intelligent agent with self-organization, self-learning, self-adaptation, and continuous evolution capabilities. With the intelligent agent, redundant transmission of data can be reduced while the value transmission of information can be improved. Then, the features of the mailbox principle are introduced, including polarity, traceability, dynamics, convergence, figurability, and dependence. Furthermore, key technologies with which value transmission of information can be realized are introduced, including knowledge graph, distributed learning, and blockchain. Finally, we establish a cognitive communication system assisted by deep learning. The experimental results show that, compared with a traditional communication system, our communication system transmits less data and incurs fewer errors.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-10-16
      DOI: 10.3390/bdcc5040056
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 57: A Semantic Web Framework for Automated Smart
           Assistants: A Case Study for Public Health

    • Authors: Yusuf Sermet, Ibrahim Demir
      First page: 57
      Abstract: The COVID-19 pandemic elucidated that knowledge systems will be instrumental in cases where accurate information needs to be communicated to a substantial group of people with different backgrounds and technological resources. However, several challenges and obstacles hold back the wide adoption of virtual assistants by public health departments and organizations. This paper presents the Instant Expert, an open-source semantic web framework to build and integrate voice-enabled smart assistants (i.e., chatbots) for any web platform regardless of the underlying domain and technology. The component allows non-technical domain experts to effortlessly incorporate an operational assistant with voice recognition capability into their websites. Instant Expert is capable of automatically parsing, processing, and modeling Frequently Asked Questions pages as an information resource as well as communicating with an external knowledge engine for ontology-powered inference and dynamic data use. The presented framework uses advanced web technologies to ensure reusability and reliability, and an inference engine for natural-language understanding powered by deep learning and heuristic algorithms. A use case for creating an informatory assistant for COVID-19 based on the Centers for Disease Control and Prevention (CDC) data is presented to demonstrate the framework’s usage and benefits.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-10-18
      DOI: 10.3390/bdcc5040057
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 58: Preparing Datasets of Surface Roughness for
           Constructing Big Data from the Context of Smart Manufacturing and
           Cognitive Computing

    • Authors: Saman Fattahi, Takuya Okamoto, Sharifu Ura
      First page: 58
      Abstract: In smart manufacturing, human-cyber-physical systems host digital twins and IoT-based networks. The networks weave manufacturing enablers such as CNC machine tools, robots, CAD/CAM systems, process planning systems, enterprise resource planning systems, and human resources. The twins work as the brains of the enablers; that is, the twins supply the required knowledge and help enablers solve problems autonomously in real-time. Since surface roughness is a major concern of all manufacturing processes, twins to solve surface roughness-relevant problems are needed. The twins must machine-learn the required knowledge from the relevant datasets available in big data. Therefore, preparing surface roughness-relevant datasets to be included in the human-cyber-physical system-friendly big data is a critical issue. However, preparing such datasets is a challenge due to the lack of a steadfast procedure. This study sheds some light on this issue. A state-of-the-art method is proposed to prepare the said datasets for surface roughness, wherein each dataset consists of four segments: semantic annotation, roughness model, simulation algorithm, and simulation system. These segments provide input information for digital twins’ input, modeling, simulation, and validation modules. The semantic annotation segment boils down to a concept map. A human- and machine-readable concept map is thus developed where the information of other segments (roughness model, simulation algorithm, and simulation system) is integrated. The delay map of surface roughness profile heights plays a pivotal role in the proposed dataset preparation method. The successful preparation of datasets of surface roughness underlying milling, turning, grinding, electric discharge machining, and polishing shows the efficacy of the proposed method. The method will be extended to the manufacturing processes in the next phase of this study.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-10-25
      DOI: 10.3390/bdcc5040058
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 59: NERWS: Towards Improving Information Retrieval of
           Digital Library Management System Using Named Entity Recognition and Word

    • Authors: Ahmed Aliwy, Ayad Abbas, Ahmed Alkhayyat
      First page: 59
      Abstract: An information retrieval (IR) system is the core of many applications, including digital library management systems (DLMS). The IR-based DLMS depends on either the title with keywords or content as symbolic strings. In contrast, it ignores the meaning of the content or what it indicates. Many researchers tried to improve IR systems either using the named entity recognition (NER) technique or the words’ meaning (word sense) and implemented the improvements with a specific language. However, they did not test the IR system using NER and word sense disambiguation together to study the behavior of this system in the presence of these techniques. This paper aims to improve the information retrieval system used by the DLMS by adding the NER and word sense disambiguation (WSD) together for the English and Arabic languages. For NER, a voting technique was used among three completely different classifiers: rules-based, conditional random field (CRF), and bidirectional LSTM-CNN. For WSD, an examples-based method was used to implement it for the first time with the English language. For the IR system, a vector space model (VSM) was used to test the information retrieval system, and it was tested on samples from the library of the University of Kufa for the Arabic and English languages. The overall system results show that the precision, recall, and F-measures were increased from 70.9%, 74.2%, and 72.5% to 89.7%, 91.5%, and 90.6% for the English language and from 66.3%, 69.7%, and 68.0% to 89.3%, 87.1%, and 88.2% for the Arabic language.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-10-28
      DOI: 10.3390/bdcc5040059
      Issue No: Vol. 5, No. 4 (2021)
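The F-measure figures quoted in the NERWS abstract above can be cross-checked against the quoted precision and recall values. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function and variable names are illustrative only.

```python
def f_measure(precision, recall):
    """Balanced F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision %, recall %, reported F-measure %) as quoted in the abstract.
reported = {
    "English, before": (70.9, 74.2, 72.5),
    "English, after": (89.7, 91.5, 90.6),
    "Arabic, before": (66.3, 69.7, 68.0),
    "Arabic, after": (89.3, 87.1, 88.2),
}

# Recompute each F-measure to one decimal place for comparison.
recomputed = {
    label: round(f_measure(p, r), 1) for label, (p, r, _) in reported.items()
}

for label, (p, r, f_reported) in reported.items():
    print(f"{label}: reported {f_reported}, recomputed {recomputed[label]}")
```

Under the F1 assumption, all four recomputed values agree with the reported figures to one decimal place.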
  • BDCC, Vol. 5, Pages 60: Fine-Grained Algorithm for Improving KNN
           Computational Performance on Clinical Trials Text Classification

    • Authors: Jasmir Jasmir, Siti Nurmaini, Bambang Tutuko
      First page: 60
      Abstract: Text classification is an important component in many applications. Text classification has attracted the attention of researchers to continue to develop innovations and build new classification models that are sourced from clinical trial texts. In building classification models, many methods are used, including supervised learning. The purpose of this study is to improve the computational performance of one of the supervised learning methods, namely KNN, in building a clinical trial document text classification model by combining KNN and the fine-grained algorithm. This research improved the computational performance of KNN from 388,274 s to 260,641 s on a clinical trial text dataset with a total of 1,000,000 data items.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-10-28
      DOI: 10.3390/bdcc5040060
      Issue No: Vol. 5, No. 4 (2021)
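To put the two running times quoted in the KNN abstract above into relative terms, the short calculation below derives the speedup factor and the percentage reduction. It assumes both figures are in seconds with the comma acting as a thousands separator, as printed; the variable names are illustrative.

```python
# Running times as quoted in the abstract (assumed to be seconds).
baseline_seconds = 388_274   # KNN alone
improved_seconds = 260_641   # KNN combined with the fine-grained algorithm

# Speedup factor: how many times faster the improved run is.
speedup = baseline_seconds / improved_seconds

# Relative reduction in running time, as a percentage of the baseline.
reduction_percent = 100 * (baseline_seconds - improved_seconds) / baseline_seconds

print(f"speedup: {speedup:.2f}x, time saved: {reduction_percent:.1f}%")
```

Under that reading, the reported change corresponds to a speedup of roughly 1.49 times, or about a third less running time.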
  • BDCC, Vol. 5, Pages 61: Using Machine Learning in Business Process

    • Authors: Younis Al-Anqoudi, Abdullah Al-Hamdani, Mohamed Al-Badawi, Rachid Hedjam
      First page: 61
      Abstract: The value of business process re-engineering in improving business processes is undoubted. Nevertheless, it is incredibly complex, time-consuming and costly. This study aims to review available literature on the use of machine learning for business process re-engineering. The review investigates available literature on business process re-engineering frameworks, methodologies, tools, techniques, and machine-learning applications in automating business process re-engineering. The study covers 200+ research papers published between 2015 and 2020 in reputable scientific publication platforms: Scopus, Emerald, Science Direct, IEEE, and British Library. The results indicate that business process re-engineering is a well-established field with scientifically solid frameworks, methodologies, tools, and techniques, which support decision making by generating and analysing relevant data. The study indicates a wealth of data generated, analysed and utilised throughout business process re-engineering projects, thus making it a potential greenfield for innovative machine-learning applications aiming to reduce implementation costs and manage complexity by exploiting the data’s hidden patterns. This suggests that there were attempts towards applying machine learning in business process management and improvement in general. They address process discovery, process behaviour prediction, process improvement, and process optimisation. The review suggests that expanding the applications to business process re-engineering is promising. The study proposed a machine-learning model for automating business process re-engineering, inspired by the Lean Six Sigma principles of eliminating waste and variance in the business process.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-11-02
      DOI: 10.3390/bdcc5040061
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 62: Prediction of Cloud Fractional Cover Using Machine

    • Authors: Hanna Svennevik, Michael A. Riegler, Steven Hicks, Trude Storelvmo, Hugo L. Hammer
      First page: 62
      Abstract: Climate change is stated as one of the largest issues of our time, resulting in many unwanted effects on life on earth. Cloud fractional cover (CFC), the portion of the sky covered by clouds, might affect global warming and various other aspects of human society such as agriculture and solar energy production. It is therefore important to improve the projection of future CFC, which is usually projected using numerical climate methods. In this paper, we explore the potential of using machine learning as part of a statistical downscaling framework to project future CFC. We are not aware of any other research that has explored this. We evaluated the potential of two different methods, a convolutional long short-term memory model (ConvLSTM) and a multiple regression equation, to predict CFC from other environmental variables. The predictions were associated with much uncertainty, indicating that there might not be much information in the environmental variables used in the study to predict CFC. Overall, the regression equation performed the best, but the ConvLSTM was the better performing model along some coastal and mountain areas. All aspects of the research analyses are explained including data preparation, model development, ML training, performance evaluation and visualization.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-11-03
      DOI: 10.3390/bdcc5040062
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 63: GANs and Artificial Facial Expressions in
           Synthetic Portraits

    • Authors: Pilar Rosado, Rubén Fernández, Ferran Reverter
      First page: 63
      Abstract: Generative adversarial networks (GANs) provide powerful architectures for deep generative learning. GANs have enabled us to achieve an unprecedented degree of realism in the creation of synthetic images of human faces, landscapes, and buildings, among others. Not only image generation, but also image manipulation is possible with GANs. Generative deep learning models are inherently limited in their creative abilities because of a focus on learning for perfection. We investigated the potential of GANs’ latent spaces to encode human expressions, highlighting creative interest for suboptimal solutions rather than perfect reproductions, in pursuit of the artistic concept. We have trained Deep Convolutional GAN (DCGAN) and StyleGAN using a collection of portraits of detained persons, portraits of people who died of violent causes, and people whose portraits were taken during an orgasm. We present results which diverge from standard usage of GANs with the specific intention of producing portraits that may assist us in the representation and recognition of otherness in contemporary identity construction.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-11-04
      DOI: 10.3390/bdcc5040063
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 64: How Does Learning Analytics Contribute to Prevent
           Students’ Dropout in Higher Education: A Systematic Literature Review

    • Authors: Catarina Félix de Oliveira, Sónia Rolland Sobral, Maria João Ferreira, Fernando Moreira
      First page: 64
      Abstract: Retention and dropout of higher education students is a subject that must be analysed carefully. Learning analytics can be used to help prevent failure cases. The purpose of this paper is to analyse the scientific production in this area in higher education in journals indexed in Clarivate Analytics’ Web of Science and Elsevier’s Scopus. We use a bibliometric and systematic study to obtain in-depth knowledge of this scientific production. The information gathered allows us to perceive where, how, and in what ways learning analytics has been used in recent years. By analysing studies performed all over the world, we identify what kinds of data and techniques are used to approach the subject. We propose a feature classification into several categories and subcategories, regarding student and external features. Student features can be seen as personal or academic data, while external factors include information about the university, environment, and support offered to the students. To approach the problems, authors successfully use data mining applied to the identified educational data. We also identify some other concerns, such as privacy issues, that need to be considered in the studies.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-11-04
      DOI: 10.3390/bdcc5040064
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 65: An Enhanced Parallelisation Model for Performance
           Prediction of Apache Spark on a Multinode Hadoop Cluster

    • Authors: Nasim Ahmed, Andre L. C. Barczak, Mohammad A. Rashid, Teo Susnjak
      First page: 65
      Abstract: Big data frameworks play a vital role in storing, processing, and analysing large datasets. Apache Spark has been established as one of the most popular big data engines for its efficiency and reliability. However, one of the significant problems of the Spark system is performance prediction. Spark has more than 150 configurable parameters, and configuring so many parameters is a challenging task when determining the suitable parameters for the system. In this paper, we propose two distinct parallelisation models for performance prediction. Our insight is that each node in a Hadoop cluster can communicate with identical nodes, and a certain function of the non-parallelisable runtime can be estimated accordingly. Both models use simple equations that allow us to predict the runtime when the size of the job and the number of executables are known. The proposed models were evaluated based on five HiBench workloads: Kmeans, PageRank, Graph (NWeight), SVM, and WordCount. The workload’s empirical data were fitted with one of the two models meeting the accuracy requirements. Finally, the experimental findings show that the model can be a handy and helpful tool for scheduling and planning system deployment.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-11-05
      DOI: 10.3390/bdcc5040065
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 66: Kano Model Integration with Data Mining to Predict
           Customer Satisfaction

    • Authors: Khaled Al Rabaiei, Fady Alnajjar, Amir Ahmad
      First page: 66
      Abstract: The Kano model is one of the models that help determine which features must be included in a product or service to improve customer satisfaction. The model is focused on highlighting the most relevant attributes of a product or service along with customers’ estimation of how the presence of these attributes can be used to predict satisfaction about specific services or products. This research aims to develop a method to integrate the Kano model and data mining approaches to select relevant attributes that drive customer satisfaction, with a specific focus on higher education. The significant contribution of this research is to solve the problem of selecting features that are not methodically correlated to customer satisfaction, which could reduce the risk of investing in features that could ultimately be irrelevant to enhancing customer satisfaction. Questionnaire data were collected from 646 students from UAE University. The experiment suggests that XGBoost Regression and Decision Tree Regression produce the best results for this kind of problem. Based on the integration between the Kano model and the feature selection method, the number of features used to predict customer satisfaction is minimized to four features. It was found that integrating the ANOVA feature selection model with the Kano model gives higher Pearson correlation coefficients and higher R² values.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-11-11
      DOI: 10.3390/bdcc5040066
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 67: Spiking Neural Networks for Computational
           Intelligence: An Overview

    • Authors: Shirin Dora, Nikola Kasabov
      First page: 67
      Abstract: Deep neural networks with rate-based neurons have exhibited tremendous progress in the last decade. However, the same level of progress has not been observed in research on spiking neural networks (SNN), despite their capability to handle temporal data, energy-efficiency and low latency. This could be because the benchmarking techniques for SNNs are based on the methods used for evaluating deep neural networks, which do not provide a clear evaluation of the capabilities of SNNs. Particularly, the benchmarking of SNN approaches with regards to energy efficiency and latency requires realization in suitable hardware, which imposes additional temporal and resource constraints upon ongoing projects. This review aims to provide an overview of the current real-world applications of SNNs and identifies steps to accelerate research involving SNNs in the future.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-11-15
      DOI: 10.3390/bdcc5040067
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 68: The Impact of Big Data Adoption on SMEs’

    • Authors: Mahdi Nasrollahi, Javaneh Ramezani, Mahmoud Sadraei
      First page: 68
      Abstract: The notion of Industry 4.0 encompasses the adoption of new information technologies that enable an enormous amount of information to be digitally collected, analyzed, and exploited in organizations to make better decisions. Therefore, finding how organizations can adopt big data (BD) components to improve their performance becomes a relevant research area. This issue is becoming more pertinent for small and medium enterprises (SMEs), especially in developing countries that encounter limited resources and infrastructures. Due to the lack of empirical studies related to big data adoption (BDA) and BD’s business value, especially in SMEs, this study investigates the impact of BDA on SMEs’ performance by obtaining the required data from experts. The quantitative investigation followed a mixed approach, including survey data from 224 managers from Iranian SMEs, and a structural equation modeling (SEM) methodology for the data analysis. Results showed that 12 factors affected the BDA in SMEs. BDA can affect both operational performance and economic performance. There has been no support for the influence of BDA and economic performance on social performance. Finally, the study implications and findings are discussed alongside future research suggestions, as well as some limitations and unanswered questions.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-11-24
      DOI: 10.3390/bdcc5040068
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 69: Networks and Stories. Analyzing the Transmission
           of the Feminist Intangible Cultural Heritage on Twitter

    • Authors: Jordi Morales-i-Gras, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández
      First page: 69
      Abstract: Internet social media is a key space in which the memorial resources of social movements, including the stories and knowledge of previous generations, are organised, disseminated, and reinterpreted. This is especially important for movements such as feminism, which places great emphasis on the transmission of an intangible cultural legacy between its different generations or waves, which are conformed through these cultural transmissions. In this sense, several authors have highlighted the importance of social media and hashtivism in shaping the fourth wave of feminism that has been taking place in recent years (e.g., #metoo). The aim of this article is to present to the scientific community a hybrid methodological proposal for the network and content analysis of audiences and their interactions on Twitter: we will do so by describing and evaluating the results of different research we have carried out in the field of feminist hashtivism. Structural analysis methods such as social network analysis have demonstrated their capacity to be applied to the analysis of social media interactions as a mixed methodology, that is, both quantitative and qualitative. This article shows the potential of a specific methodological process that combines inductive and inferential reasoning with hypothetico-deductive approaches. By applying the methodology developed in the case studies included in the article, it is shown that these two modes of reasoning work best when they are used together.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-11-24
      DOI: 10.3390/bdcc5040069
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 70: Gambling Strategies and Prize-Pricing
           Recommendation in Sports Multi-Bets

    • Authors: Oz Pirvandy, Moti Fridman, Gur Yaari
      First page: 70
      Abstract: A sports multi-bet is a bet on the results of a set of N games. One type of multi-bet offered by the Israeli government is WINNER 16, where participants guess the results of a set of 16 soccer games. The prizes in WINNER 16 are determined by the accumulated profit in previous rounds, and are split among all winning forms. When the reward increases beyond a certain threshold, a profitable strategy can be devised. Here, we present a machine-learning algorithm scheme to play WINNER 16. Our proposed algorithm is marginally profitable on average in a range of hyper-parameters, indicating inefficiencies in this game. To make a better prize-pricing mechanism we suggest a generalization of the single-bet approach. We studied the expected profit and risk of WINNER 16 after applying our suggestion. Our proposal can make the game more fair and more appealing without reducing the profitability.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-11-29
      DOI: 10.3390/bdcc5040070
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 71: Customized Rule-Based Model to Identify At-Risk
           Students and Propose Rational Remedial Actions

    • Authors: Balqis Albreiki, Tetiana Habuza, Zaid Shuqfa, Mohamed Adel Serhani, Nazar Zaki, Saad Harous
      First page: 71
      Abstract: Detecting at-risk students provides advanced benefits for improving student retention rates, effective enrollment management, alumni engagement, targeted marketing improvement, and institutional effectiveness advancement. One of the success factors of educational institutes is based on accurate and timely identification and prioritization of the students requiring assistance. The main objective of this paper is to detect at-risk students as early as possible in order to take appropriate corrective measures, taking into consideration the most important and influential attributes in students’ data. This paper emphasizes the use of a customized rule-based system (RBS) to identify and visualize at-risk students in early stages throughout the course delivery using the Risk Flag (RF). Moreover, it can serve as a warning tool for instructors to identify those students that may struggle to grasp learning outcomes. The module allows the instructor to have a dashboard that graphically depicts the students’ performance in different coursework components. The at-risk student will be distinguished (flagged), and remedial actions will be communicated to the student, instructor, and stakeholders. The system suggests remedial actions based on the severity of the case and the time the student is flagged. It is expected to improve students’ achievement and success, and it could also have positive impacts on under-performing students, educators, and academic institutions in general.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-11-29
      DOI: 10.3390/bdcc5040071
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 72: Exploring Ensemble-Based Class Imbalance Learners
           for Intrusion Detection in Industrial Control Networks

    • Authors: Maya Hilda Lestari Louk, Bayu Adhi Tama
      First page: 72
      Abstract: Classifier ensembles have been utilized in the industrial cybersecurity sector for many years. However, their efficacy and reliability for intrusion detection systems remain questionable in current research, owing to the particularly imbalanced data issue. The purpose of this article is to address a gap in the literature by illustrating the benefits of ensemble-based models for identifying threats and attacks in a cyber-physical power grid. We provide a framework that compares nine cost-sensitive individual and ensemble models designed specifically for handling imbalanced data, including cost-sensitive C4.5, roughly balanced bagging, random oversampling bagging, random undersampling bagging, synthetic minority oversampling bagging, random undersampling boosting, synthetic minority oversampling boosting, AdaC2, and EasyEnsemble. Each ensemble’s performance is tested against a range of benchmarked power system datasets utilizing balanced accuracy, Kappa statistics, and AUC metrics. Our findings demonstrate that EasyEnsemble significantly outperformed its rivals across the board. Furthermore, undersampling and oversampling strategies were effective in a boosting-based ensemble but not in a bagging-based ensemble.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-12-06
      DOI: 10.3390/bdcc5040072
      Issue No: Vol. 5, No. 4 (2021)
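The abstract above names balanced accuracy and the Kappa statistic as evaluation metrics for imbalanced data. As a minimal sketch of why these are preferred over plain accuracy, the following standard-library Python computes both from label lists. The toy labels (`"normal"`, `"attack"`) and the always-majority classifier are illustrative assumptions, not data from the paper.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls (each class counts equally)."""
    recalls = []
    for c in sorted(set(y_true)):
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / support)
    return sum(recalls) / len(recalls)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance agreement.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical imbalanced toy data: 8 normal events, 2 attacks.
y_true = ["normal"] * 8 + ["attack"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["normal"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # looks good despite missing every attack
print(balanced_accuracy(y_true, y_pred))  # exposes the missed minority class
print(cohen_kappa(y_true, y_pred))        # no agreement beyond chance
```

On this toy input the plain accuracy is high even though no attack is detected, while balanced accuracy and kappa both signal a useless classifier.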
  • BDCC, Vol. 5, Pages 73: Explainable COVID-19 Detection on Chest X-rays
           Using an End-to-End Deep Convolutional Neural Network Architecture

    • Authors: Mohamed Chetoui, Moulay A. Akhloufi, Bardia Yousefi, El Mostafa Bouattane
      First page: 73
      Abstract: The coronavirus pandemic is spreading around the world. Medical imaging modalities such as radiography play an important role in the fight against COVID-19. Deep learning (DL) techniques have been able to improve medical imaging tools and help radiologists to make clinical decisions for the diagnosis, monitoring and prognosis of different diseases. Computer-Aided Diagnostic (CAD) systems can improve work efficiency by precisely delineating infections in chest X-ray (CXR) images, thus facilitating subsequent quantification. CAD can also help automate the scanning process and reshape the workflow with minimal patient contact, providing the best protection for imaging technicians. The objective of this study is to develop a deep learning algorithm to detect COVID-19, pneumonia and normal cases on CXR images. We propose two classification problems, (i) a binary classification to classify COVID-19 and normal cases and (ii) a multiclass classification for COVID-19, pneumonia and normal. Nine datasets and more than 3200 COVID-19 CXR images are used to assess the efficiency of the proposed technique. The model is trained on a subset of the National Institutes of Health (NIH) dataset using swish activation, thus improving the training accuracy to detect COVID-19 and other pneumonia. The models are tested on eight merged datasets and on individual test sets in order to confirm the degree of generalization of the proposed algorithms. An explainability algorithm is also developed to visually show the location of the lung-infected areas detected by the model. Moreover, we provide a detailed analysis of the misclassified images. The results show high performance, with an Area Under Curve (AUC) of 0.97 for multi-class classification (COVID-19 vs. other pneumonia vs. normal) and 0.98 for the binary model (COVID-19 vs. normal). The average sensitivity and specificity are 0.97 and 0.98, respectively. The sensitivity of the COVID-19 class reaches 0.99. The results outperformed the comparable state-of-the-art models for the detection of COVID-19 on CXR images. The explainability model shows that our model is able to efficiently identify the signs of COVID-19.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-12-07
      DOI: 10.3390/bdcc5040073
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 74: Fusion of Moment Invariant Method and Deep
           Learning Algorithm for COVID-19 Classification

    • Authors: Ervin Gubin Moung, Chong Joon Hou, Maisarah Mohd Sufian, Mohd Hanafi Ahmad Hijazi, Jamal Ahmad Dargham, Sigeru Omatu
      First page: 74
      Abstract: The COVID-19 pandemic has resulted in a global health crisis. The rapid spread of the virus has led to the infection of a significant population and millions of deaths worldwide. Therefore, the world is in urgent need of a fast and accurate COVID-19 screening. Numerous researchers have performed exceptionally well to design pioneering deep learning (DL) models for the automatic screening of COVID-19 based on computerised tomography (CT) scans; however, there is still a concern regarding the performance stability affected by tiny perturbations and structural changes in CT images. This paper proposes a fusion of a moment invariant (MI) method and a DL algorithm for feature extraction to address the instabilities in the existing COVID-19 classification models. The proposed method incorporates the MI-based features into the DL models using the cascade fusion method. It was found that the fusion of MI features with DL features has the potential to improve the sensitivity and accuracy of the COVID-19 classification. Based on the evaluation using the SARS-CoV-2 dataset, the fusion of VGG16 and Hu moments shows the best result with 90% sensitivity and 93% accuracy.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-12-08
      DOI: 10.3390/bdcc5040074
      Issue No: Vol. 5, No. 4 (2021)
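The abstract above relies on Hu moments as moment invariant (MI) features. As a minimal sketch of what "invariant" means here, the following standard-library Python computes only the first Hu invariant, the sum of the normalised central moments of orders (2,0) and (0,2), and checks that it is unchanged when a shape is shifted or rotated by 90 degrees. The tiny binary images are illustrative assumptions; the paper's CT pipeline and its fusion with VGG16 are not reproduced.

```python
def raw_moment(img, p, q):
    """Raw image moment M_pq = sum over pixels of x^p * y^q * intensity."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def hu_phi1(img):
    """First Hu invariant: eta_20 + eta_02 (normalised central moments)."""
    m00 = raw_moment(img, 0, 0)
    cx = raw_moment(img, 1, 0) / m00  # centroid x
    cy = raw_moment(img, 0, 1) / m00  # centroid y
    mu20 = sum(((x - cx) ** 2) * v
               for y, row in enumerate(img) for x, v in enumerate(row))
    mu02 = sum(((y - cy) ** 2) * v
               for y, row in enumerate(img) for x, v in enumerate(row))
    # eta_pq = mu_pq / m00 ** (1 + (p + q) / 2); the exponent is 2 when p + q = 2
    return mu20 / m00 ** 2 + mu02 / m00 ** 2


# Hypothetical 5x5 binary image containing an L-shaped blob.
shape = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
# The same blob moved one pixel right and one pixel down.
shifted = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
]
# The same blob rotated by 90 degrees.
rotated = [list(r) for r in zip(*shape[::-1])]

print(hu_phi1(shape), hu_phi1(shifted), hu_phi1(rotated))
```

All three printed values agree up to floating-point error, which is the property that makes such features robust to small positional changes in an image.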
  • BDCC, Vol. 5, Pages 75: Screening of Potential Indonesia Herbal Compounds
           Based on Multi-Label Classification for 2019 Coronavirus Disease

    • Authors: Aulia Fadli, Wisnu Ananta Kusuma, Annisa, Irmanida Batubara, Rudi Heryanto
      First page: 75
      Abstract: The coronavirus disease 2019 pandemic is spreading rapidly and requires an acceleration in the process of drug discovery. Drug repurposing can help accelerate the drug discovery process by identifying new efficacy for approved drugs, and it is considered an efficient and economical approach. Research in drug repurposing can be done by observing the interactions of drug compounds with proteins related to a disease (drug-target interactions, DTI), then predicting new drug-target interactions. This study conducted multilabel DTI prediction using the stack autoencoder-deep neural network (SAE-DNN) algorithm. Compound features were extracted using PubChem fingerprint, daylight fingerprint, MACCS fingerprint, and circular fingerprint. The results showed that the SAE-DNN model was able to predict DTI in COVID-19 cases with good performance. The SAE-DNN model with a circular fingerprint dataset produced the best average metrics with an accuracy of 0.831, recall of 0.918, precision of 0.888, and F-measure of 0.89. Herbal compound prediction using the SAE-DNN model with the circular, daylight, and PubChem fingerprint datasets identified 92, 65, and 79 herbal compounds contained in herbal plants in Indonesia, respectively.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-12-09
      DOI: 10.3390/bdcc5040075
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 76: GO-E-MON: A New Online Platform for Decentralized
           Cognitive Science

    • Authors: Satoshi Yazawa, Kikue Sakaguchi, Kazuo Hiraki
      First page: 76
      Abstract: Advances in web technology and the widespread use of smartphones and PCs have proven that it is possible to optimize various services using personal data, such as location information and search history. While considerations of personal privacy and legal aspects lead to situations where data are monopolized by individual services and companies, a replication crisis has been pointed out for the data of laboratory experiments, which is challenging to solve given the difficulty of data distribution. To ensure distribution of experimental data while guaranteeing security, an online experiment platform can be a game changer. Current online experiment platforms have not yet considered improving data distribution, and it is currently difficult to use the data obtained from one experiment for other purposes. In addition, various devices such as activity meters and consumer-grade electroencephalography meters are emerging, and if a platform that collects data from such devices and tasks online is to be realized, the platform will hold a large amount of sensitive data, making it even more important to ensure security. We propose GO-E-MON, a service that combines an online experimental environment with a distributed personal data store (PDS), and explain how GO-E-MON can realize the reuse of experimental data with the subject’s consent by connecting to a distributed PDS. We report the results of the experiment in a groupwork lecture for university students to verify whether this method works. By building an online experiment environment integrated with a distributed PDS, we present the possibility of integrating multiple experiments performed by different experimenters—with the consent of individual subjects—while solving the security issues.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-12-13
      DOI: 10.3390/bdcc5040076
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 77: DASentimental: Detecting Depression, Anxiety, and
           Stress in Texts via Emotional Recall, Cognitive Networks, and Machine

    • Authors: Asra Fatima, Ying Li, Thomas Trenholm Hills, Massimo Stella
      First page: 77
      Abstract: Most current affect scales and sentiment analysis on written text focus on quantifying valence/sentiment, the primary dimension of emotion. Distinguishing broader, more complex negative emotions of similar valence is key to evaluating mental health. We propose a semi-supervised machine learning model, DASentimental, to extract depression, anxiety, and stress from written text. We trained DASentimental to identify how N = 200 sequences of recalled emotional words correlate with recallers’ depression, anxiety, and stress from the Depression Anxiety Stress Scale (DASS-21). Using cognitive network science, we modeled every recall list as a bag-of-words (BOW) vector and as a walk over a network representation of semantic memory (in this case, free associations). This weights BOW entries according to their centrality (degree) in semantic memory and informs recalls using semantic network distances, thus embedding recalls in a cognitive representation. This embedding translated into state-of-the-art, cross-validated predictions for depression (R = 0.7), anxiety (R = 0.44), and stress (R = 0.52), equivalent to previous results employing additional human data. Powered by a multilayer perceptron neural network, DASentimental opens the door to probing the semantic organizations of emotional distress. We found that semantic distances between recalls (i.e., walk coverage) were key for estimating depression levels but redundant for anxiety and stress levels. Semantic distances from “fear” boosted anxiety predictions but were redundant when the “sad–happy” dyad was considered. We applied DASentimental to a clinical dataset of 142 suicide notes and found that the predicted depression and anxiety levels (high/low) corresponded to differences in valence and arousal as expected from a circumplex model of affect. We discuss key directions for future research enabled by artificial intelligence detecting stress, anxiety, and depression in texts.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-12-13
      DOI: 10.3390/bdcc5040077
      Issue No: Vol. 5, No. 4 (2021)
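The abstract above describes bag-of-words (BOW) entries weighted by each word's degree in a semantic network. As a rough sketch of that idea only, the following standard-library Python builds a degree table from an edge list and multiplies each word count by the word's degree. The toy network, the vocabulary, and the choice of plain multiplication are assumptions made for illustration; the paper's actual network, weighting scheme, and model are not reproduced.

```python
from collections import Counter

# Hypothetical toy free-association network given as undirected edges.
edges = [
    ("sad", "cry"),
    ("sad", "alone"),
    ("sad", "happy"),
    ("happy", "joy"),
    ("fear", "alone"),
]


def degree_table(edge_list):
    """Degree of each node: the number of edges touching it."""
    deg = Counter()
    for a, b in edge_list:
        deg[a] += 1
        deg[b] += 1
    return deg


def weighted_bow(recall, vocabulary, deg):
    """BOW vector over `vocabulary`, each count scaled by the word's degree.

    Words absent from the network get degree 0 (an assumption of this sketch).
    """
    counts = Counter(recall)
    return [counts[w] * deg[w] for w in vocabulary]


deg = degree_table(edges)
vocabulary = sorted(deg)            # fixed word order for the vector
recall = ["sad", "alone", "sad"]    # hypothetical emotional recall list

print(vocabulary)
print(weighted_bow(recall, vocabulary, deg))
```

Highly connected words such as "sad" contribute more to the vector than peripheral words recalled equally often, which is the intuition behind centrality weighting.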
  • BDCC, Vol. 5, Pages 78: Automatic Diagnosis of Epileptic Seizures in EEG
           Signals Using Fractal Dimension Features and Convolutional Autoencoder

    • Authors: Anis Malekzadeh, Assef Zare, Mahdi Yaghoobi, Roohallah Alizadehsani
      First page: 78
      Abstract: This paper proposes a new method for epileptic seizure detection in electroencephalography (EEG) signals using nonlinear features based on fractal dimension (FD) and a deep learning (DL) model. Firstly, Bonn and Freiburg datasets were used to perform experiments. The Bonn dataset consists of binary and multi-class classification problems, and the Freiburg dataset consists of two-class EEG classification problems. In the preprocessing step, all datasets were preprocessed using a Butterworth band pass filter with 0.5–60 Hz cut-off frequency. Then, the EEG signals of the datasets were segmented into different time windows. In this section, dual-tree complex wavelet transform (DT-CWT) was used to decompose the EEG signals into the different sub-bands. In the following section, various FD techniques were used for feature extraction, including Higuchi (HFD), Katz (KFD), Petrosian (PFD), Hurst exponent (HE), detrended fluctuation analysis (DFA), Sevcik, box counting (BC), multiresolution box-counting (MBC), Maragos-Sun (MSFD), multifractal DFA (MF-DFA), and recurrence quantification analysis (RQA). In the next step, the minimum redundancy maximum relevance (mRMR) technique was used for feature selection. Finally, the k-nearest neighbors (KNN), support vector machine (SVM), and convolutional autoencoder (CNN-AE) were used for the classification step. In the classification step, the K-fold cross-validation with k = 10 was employed to demonstrate the effectiveness of the classifier methods. The experiment results show that the proposed CNN-AE method achieved an accuracy of 99.736% and 99.176% for the Bonn and Freiburg datasets, respectively.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-12-13
      DOI: 10.3390/bdcc5040078
      Issue No: Vol. 5, No. 4 (2021)
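Among the fractal dimension features listed in the abstract above, the Petrosian fractal dimension (PFD) is simple enough to sketch in a few lines. The following standard-library Python uses one common variant, assumed here, in which the count N_delta is the number of sign changes in the first difference of the signal; other binarisation variants exist, and the paper's exact choice is not stated in the abstract. The test signals are synthetic, not EEG data.

```python
import math


def petrosian_fd(signal):
    """Petrosian fractal dimension of a 1-D sequence.

    PFD = log10(N) / (log10(N) + log10(N / (N + 0.4 * N_delta)))
    where N is the number of samples and N_delta counts sign changes
    in the first difference (the variant assumed in this sketch).
    """
    n = len(signal)
    diffs = [b - a for a, b in zip(signal, signal[1:])]
    n_delta = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    return math.log10(n) / (math.log10(n) + math.log10(n / (n + 0.4 * n_delta)))


ramp = list(range(10))               # smooth, monotone signal
zigzag = [i % 2 for i in range(10)]  # direction flips at every sample

print(petrosian_fd(ramp))    # no direction changes, so the ratio is exactly 1
print(petrosian_fd(zigzag))  # many direction changes push the value above 1
```

A smoother signal yields a value near 1 and a more irregular signal yields a larger value, which is why such features can help separate seizure from non-seizure segments.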
  • BDCC, Vol. 5, Pages 79: Spatial Sound in a 3D Virtual Environment: All
           Bark and No Bite?

    • Authors: Radha Nila Meghanathan, Patrick Ruediger-Flore, Felix Hekele, Jan Spilski, Achim Ebert, Thomas Lachmann
      First page: 79
      Abstract: Although the focus of Virtual Reality (VR) lies predominantly on the visual world, acoustic components enhance the functionality of a 3D environment. To study the interaction between visual and auditory modalities in a 3D environment, we investigated the effect of auditory cues on visual searches in 3D virtual environments with both visual and auditory noise. In an experiment, we asked participants to detect visual targets in a 360° video in conditions with and without environmental noise. Auditory cues indicating the target location were either absent or one of simple stereo or binaural audio, both of which assisted sound localization. To investigate the efficacy of these cues in distracting environments, we measured participant performance using a VR headset with an eye tracker. We found that the binaural cue outperformed both stereo and no auditory cues in terms of target detection irrespective of the environmental noise. We used two eye movement measures and two physiological measures to evaluate task dynamics and mental effort. We found that the absence of a cue increased target search duration and target search path, measured as time to fixation and gaze trajectory lengths, respectively. Our physiological measures of blink rate and pupil size showed no difference between the different stadium and cue conditions. Overall, our study provides evidence for the utility of binaural audio in a realistic, noisy and virtual environment for performing a target detection task, which is a crucial part of everyday behaviour—finding someone in a crowd.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-12-13
      DOI: 10.3390/bdcc5040079
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 80: Semantic Trajectory Analytics and Recommender
           Systems in Cultural Spaces

    • Authors: Sotiris Angelis, Konstantinos Kotis, Dimitris Spiliotopoulos
      First page: 80
      Abstract: Semantic trajectory analytics and personalised recommender systems that enhance user experience are modern research topics that are increasingly getting attention. Semantic trajectories can efficiently model human movement for further analysis and pattern recognition, while personalised recommender systems can adapt to constantly changing user needs and provide meaningful and optimised suggestions. This paper focuses on the investigation of open issues and challenges at the intersection of these two topics, emphasising semantic technologies and machine learning techniques. The goal of this paper is twofold: (a) to critically review related work on semantic trajectories and knowledge-based interactive recommender systems, and (b) to propose a high-level framework, by describing its requirements. The paper presents a system architecture design for the recognition of semantic trajectory patterns and for the inferencing of possible synthesis of visitor trajectories in cultural spaces, such as museums, making suggestions for new trajectories that optimise cultural experiences.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-12-16
      DOI: 10.3390/bdcc5040080
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 81: Clustering Algorithm to Measure Student Assessment
           Accuracy: A Double Study

    • Authors: Sónia Rolland Sobral, Catarina Félix de Oliveira
      First page: 81
      Abstract: Self-assessment is one of the strategies used in active teaching to engage students in the entire learning process, in the form of self-regulated academic learning. This study aims to assess the possibility of including self-evaluation in the student’s final grade, not just as a self-assessment that allows students to predict the grade obtained but also as something to weigh on the final grade. Two different curricular units are used, both from the first year of graduation, one from the international relations course (N = 29) and the other from the computer science and computer engineering courses (N = 50). Students were asked to self-assess at each of the two evaluation moments of each unit, after submitting their work/test and after knowing the correct answers. This study uses statistical analysis as well as a clustering algorithm (K-means) on the data to try to gain deeper knowledge and visual insights into the data and the patterns among them. It was verified that there are no differences between the obtained grade and the thought grade by gender and age variables, but a direct correlation was found between the thought grade averages and the grade level. The difference is less accentuated at the second moment of evaluation—which suggests that an improvement in the self-assessment skill occurs from the first to the second evaluation moment.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-12-18
      DOI: 10.3390/bdcc5040081
      Issue No: Vol. 5, No. 4 (2021)
  • BDCC, Vol. 5, Pages 42: Indoor Localization for Personalized Ambient
           Assisted Living of Multiple Users in Multi-Floor Smart Environments

    • Authors: Nirmalya Thakur, Chia Y. Han
      First page: 42
      Abstract: This paper presents a multifunctional interdisciplinary framework that makes four scientific contributions towards the development of personalized ambient assisted living (AAL), with a specific focus to address the different and dynamic needs of the diverse aging population in the future of smart living environments. First, it presents a probabilistic reasoning-based mathematical approach to model all possible forms of user interactions for any activity arising from user diversity of multiple users in such environments. Second, it presents a system that uses this approach with a machine learning method to model individual user-profiles and user-specific user interactions for detecting the dynamic indoor location of each specific user. Third, to address the need to develop highly accurate indoor localization systems for increased trust, reliance, and seamless user acceptance, the framework introduces a novel methodology where two boosting approaches, Gradient Boosting and the AdaBoost algorithm, are integrated and used on a decision tree-based learning model to perform indoor localization. Fourth, the framework introduces two novel functionalities to provide semantic context to indoor localization in terms of detecting each user’s floor-specific location as well as tracking whether a specific user was located inside or outside a given spatial region in a multi-floor-based indoor setting. These novel functionalities of the proposed framework were tested on a dataset of localization-related Big Data collected from 18 different users who navigated in 3 buildings consisting of 5 floors and 254 indoor spatial regions, with an aim to address the limitation in prior works in this field centered around the lack of training data from diverse users. The results show that this approach of indoor localization for personalized AAL that models each specific user always achieves higher accuracy as compared to the traditional approach of modeling an average user. The results further demonstrate that the proposed framework outperforms all prior works in this field in terms of functionalities, performance characteristics, and operational features.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-09-08
      DOI: 10.3390/bdcc5030042
      Issue No: Vol. 5, No. 3 (2021)
  • BDCC, Vol. 5, Pages 43: AI Based Emotion Detection for Textual Big Data:
           Techniques and Contribution

    • Authors: Sheetal Kusal, Shruti Patil, Ketan Kotecha, Rajanikanth Aluvalu, Vijayakumar Varadarajan
      First page: 43
      Abstract: Online Social Media (OSM) platforms like Facebook and Twitter have emerged as powerful tools for expressing, via text, people’s opinions and feelings about current events. Understanding the emotions at the fine-grained level of these expressed thoughts is important for system improvement. Such crucial insights cannot be completely obtained by doing AI-based big data sentiment analysis; hence, text-based emotion detection using AI in social media big data has become an upcoming area of Natural Language Processing research. It can be used in various fields such as understanding expressed emotions, human–computer interaction, data mining, online education, recommendation systems, and psychology. Even though research work is ongoing in this domain, it still lacks a formal study that can give a qualitative (techniques used) and quantitative (contributions) literature overview. This study has considered 827 Scopus and 83 Web of Science research papers from the years 2005–2020 for the analysis. The qualitative review represents different emotion models, datasets, algorithms, and application domains of text-based emotion detection. The quantitative bibliometric review of contributions presents research details such as publications, volume, co-authorship networks, citation analysis, and demographic research distribution. In the end, challenges and probable solutions are showcased, which can provide future research directions in this area.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-09-09
      DOI: 10.3390/bdcc5030043
      Issue No: Vol. 5, No. 3 (2021)
  • BDCC, Vol. 5, Pages 44: CrowDSL: Platform for Incidents Management in a
           Smart City Context

    • Authors: Darío Rodríguez-García, Vicente García-Díaz, Cristian González García
      First page: 44
      Abstract: The final objective of smart cities is to optimize services and improve the quality of life of their citizens, who can play important roles due to the information they can provide. This information can be used in order to enhance many sectors involved in city activity such as transport, energy or health. Crowd-sourcing initiatives focus their efforts on making cities safer places that are adapted to the population size they host. In this way, citizens are able to report the issues they identify to the relevant body so that they can be fixed and, at the same time, they can provide useful information to other citizens. There are several projects aimed at reporting incidents in a smart city context. In this paper, we propose the use of model-driven engineering by designing a graphical domain-specific language to abstract and improve the incident-reporting process. With the use of a domain-specific language, we can obtain several benefits in our research for users and cities. For instance, we can shorten the time for reporting the events by users and, at the same time, we gain an expressive power compared to other methodologies for incident reporting. In addition, it can be reused and is centered in this specific domain after being studied. Furthermore, we have evaluated the DSL with different users, obtaining a high satisfaction percentage.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-09-16
      DOI: 10.3390/bdcc5030044
      Issue No: Vol. 5, No. 3 (2021)
  • BDCC, Vol. 5, Pages 45: Diversification of Legislation Editing Open
           Software (LEOS) Using Software Agents—Transforming Parliamentary Control
           of the Hellenic Parliament into Big Open Legal Data

    • Authors: Sotiris Leventis, Fotios Fitsilis, Vasileios Anastasiou
      First page: 45
      Abstract: The accessibility and reuse of legal data is paramount for promoting transparency, accountability and, ultimately, trust towards governance institutions. The aggregation of structured and semi-structured legal data inevitably leads to the big data realm and a series of challenges for the generation, handling, and analysis of large datasets. When it comes to data generation, LEOS represents a legal informatics tool that is maturing quickly. Now in its third release, it effectively supports the drafting of legal documents using Akoma Ntoso compatible schemes. However, the tool, originally developed for cooperative legislative drafting, can be repurposed to draft parliamentary control documents. This is achieved through the use of actor-oriented software components, referred to as software agents, which enable system interoperability by interlinking the text editing system with parliamentary control datasets. A validated corpus of written questions from the Hellenic Parliament is used to evaluate the feasibility of the endeavour, and the feasibility of using it as an authoring tool for written parliamentary questions and generation of standardised, open, legislative data. Systemic integration not only proves the tool’s versatility, but also opens up new grounds in interoperability between formerly unrelated legal systems and data sources.
      Citation: Big Data and Cognitive Computing
      PubDate: 2021-09-18
      DOI: 10.3390/bdcc5030045
      Issue No: Vol. 5, No. 3 (2021)