Subjects -> COMPUTER SCIENCE (Total: 2313 journals)
    - ANIMATION AND SIMULATION (33 journals)
    - ARTIFICIAL INTELLIGENCE (133 journals)
    - AUTOMATION AND ROBOTICS (116 journals)
    - CLOUD COMPUTING AND NETWORKS (75 journals)
    - COMPUTER ARCHITECTURE (11 journals)
    - COMPUTER ENGINEERING (12 journals)
    - COMPUTER GAMES (23 journals)
    - COMPUTER PROGRAMMING (25 journals)
    - COMPUTER SCIENCE (1305 journals)
    - COMPUTER SECURITY (59 journals)
    - DATA BASE MANAGEMENT (21 journals)
    - DATA MINING (50 journals)
    - E-BUSINESS (21 journals)
    - E-LEARNING (30 journals)
    - ELECTRONIC DATA PROCESSING (23 journals)
    - IMAGE AND VIDEO PROCESSING (42 journals)
    - INFORMATION SYSTEMS (109 journals)
    - INTERNET (111 journals)
    - SOCIAL WEB (61 journals)
    - SOFTWARE (43 journals)
    - THEORY OF COMPUTING (10 journals)

COMPUTER SCIENCE (1305 journals)

Showing 1 - 200 of 872 Journals sorted alphabetically
3D Printing and Additive Manufacturing     Full-text available via subscription   (Followers: 27)
Abakós     Open Access   (Followers: 3)
ACM Computing Surveys     Hybrid Journal   (Followers: 29)
ACM Inroads     Full-text available via subscription   (Followers: 1)
ACM Journal of Computer Documentation     Free   (Followers: 4)
ACM Journal on Computing and Cultural Heritage     Hybrid Journal   (Followers: 5)
ACM Journal on Emerging Technologies in Computing Systems     Hybrid Journal   (Followers: 12)
ACM SIGACCESS Accessibility and Computing     Free   (Followers: 2)
ACM SIGAPP Applied Computing Review     Full-text available via subscription  
ACM SIGBioinformatics Record     Full-text available via subscription  
ACM SIGEVOlution     Full-text available via subscription  
ACM SIGHIT Record     Full-text available via subscription  
ACM SIGHPC Connect     Full-text available via subscription  
ACM SIGITE Newsletter     Open Access   (Followers: 1)
ACM SIGMIS Database: the DATABASE for Advances in Information Systems     Hybrid Journal  
ACM SIGUCCS plugged in     Full-text available via subscription  
ACM SIGWEB Newsletter     Full-text available via subscription   (Followers: 3)
ACM Transactions on Accessible Computing (TACCESS)     Hybrid Journal   (Followers: 3)
ACM Transactions on Algorithms (TALG)     Hybrid Journal   (Followers: 13)
ACM Transactions on Applied Perception (TAP)     Hybrid Journal   (Followers: 3)
ACM Transactions on Architecture and Code Optimization (TACO)     Hybrid Journal   (Followers: 9)
ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)     Hybrid Journal  
ACM Transactions on Autonomous and Adaptive Systems (TAAS)     Hybrid Journal   (Followers: 10)
ACM Transactions on Computation Theory (TOCT)     Hybrid Journal   (Followers: 11)
ACM Transactions on Computational Logic (TOCL)     Hybrid Journal   (Followers: 5)
ACM Transactions on Computer Systems (TOCS)     Hybrid Journal   (Followers: 19)
ACM Transactions on Computer-Human Interaction     Hybrid Journal   (Followers: 15)
ACM Transactions on Computing Education (TOCE)     Hybrid Journal   (Followers: 9)
ACM Transactions on Computing for Healthcare     Hybrid Journal  
ACM Transactions on Cyber-Physical Systems (TCPS)     Hybrid Journal   (Followers: 1)
ACM Transactions on Design Automation of Electronic Systems (TODAES)     Hybrid Journal   (Followers: 5)
ACM Transactions on Economics and Computation     Hybrid Journal  
ACM Transactions on Embedded Computing Systems (TECS)     Hybrid Journal   (Followers: 4)
ACM Transactions on Information Systems (TOIS)     Hybrid Journal   (Followers: 18)
ACM Transactions on Intelligent Systems and Technology (TIST)     Hybrid Journal   (Followers: 11)
ACM Transactions on Interactive Intelligent Systems (TiiS)     Hybrid Journal   (Followers: 6)
ACM Transactions on Internet of Things     Hybrid Journal   (Followers: 2)
ACM Transactions on Modeling and Performance Evaluation of Computing Systems (ToMPECS)     Hybrid Journal  
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)     Hybrid Journal   (Followers: 10)
ACM Transactions on Parallel Computing     Full-text available via subscription  
ACM Transactions on Reconfigurable Technology and Systems (TRETS)     Hybrid Journal   (Followers: 6)
ACM Transactions on Sensor Networks (TOSN)     Hybrid Journal   (Followers: 9)
ACM Transactions on Social Computing     Hybrid Journal  
ACM Transactions on Spatial Algorithms and Systems (TSAS)     Hybrid Journal   (Followers: 1)
ACM Transactions on Speech and Language Processing (TSLP)     Hybrid Journal   (Followers: 11)
ACM Transactions on Storage     Hybrid Journal  
ACS Applied Materials & Interfaces     Hybrid Journal   (Followers: 41)
Acta Informatica Malaysia     Open Access  
Acta Universitatis Cibiniensis. Technical Series     Open Access   (Followers: 1)
Ad Hoc Networks     Hybrid Journal   (Followers: 12)
Adaptive Behavior     Hybrid Journal   (Followers: 8)
Additive Manufacturing Letters     Open Access   (Followers: 3)
Advanced Engineering Materials     Hybrid Journal   (Followers: 32)
Advanced Science Letters     Full-text available via subscription   (Followers: 9)
Advances in Adaptive Data Analysis     Hybrid Journal   (Followers: 9)
Advances in Artificial Intelligence     Open Access   (Followers: 33)
Advances in Catalysis     Full-text available via subscription   (Followers: 7)
Advances in Computational Mathematics     Hybrid Journal   (Followers: 20)
Advances in Computer Engineering     Open Access   (Followers: 13)
Advances in Computer Science : an International Journal     Open Access   (Followers: 19)
Advances in Computing     Open Access   (Followers: 3)
Advances in Data Analysis and Classification     Hybrid Journal   (Followers: 52)
Advances in Engineering Software     Hybrid Journal   (Followers: 27)
Advances in Geosciences (ADGEO)     Open Access   (Followers: 19)
Advances in Human-Computer Interaction     Open Access   (Followers: 19)
Advances in Image and Video Processing     Open Access   (Followers: 20)
Advances in Materials Science     Open Access   (Followers: 20)
Advances in Multimedia     Open Access   (Followers: 1)
Advances in Operations Research     Open Access   (Followers: 13)
Advances in Remote Sensing     Open Access   (Followers: 59)
Advances in Science and Research (ASR)     Open Access   (Followers: 8)
Advances in Technology Innovation     Open Access   (Followers: 5)
AEU - International Journal of Electronics and Communications     Hybrid Journal   (Followers: 8)
African Journal of Information and Communication     Open Access   (Followers: 6)
African Journal of Mathematics and Computer Science Research     Open Access   (Followers: 5)
AI EDAM     Hybrid Journal   (Followers: 2)
Air, Soil & Water Research     Open Access   (Followers: 6)
AIS Transactions on Human-Computer Interaction     Open Access   (Followers: 5)
Al-Qadisiyah Journal for Computer Science and Mathematics     Open Access   (Followers: 2)
AL-Rafidain Journal of Computer Sciences and Mathematics     Open Access   (Followers: 3)
Algebras and Representation Theory     Hybrid Journal  
Algorithms     Open Access   (Followers: 13)
American Journal of Computational and Applied Mathematics     Open Access   (Followers: 8)
American Journal of Computational Mathematics     Open Access   (Followers: 6)
American Journal of Information Systems     Open Access   (Followers: 4)
American Journal of Sensor Technology     Open Access   (Followers: 2)
Analog Integrated Circuits and Signal Processing     Hybrid Journal   (Followers: 15)
Animation Practice, Process & Production     Hybrid Journal   (Followers: 4)
Annals of Combinatorics     Hybrid Journal   (Followers: 3)
Annals of Data Science     Hybrid Journal   (Followers: 14)
Annals of Mathematics and Artificial Intelligence     Hybrid Journal   (Followers: 16)
Annals of Pure and Applied Logic     Open Access   (Followers: 4)
Annals of Software Engineering     Hybrid Journal   (Followers: 12)
Annual Reviews in Control     Hybrid Journal   (Followers: 7)
Anuario Americanista Europeo     Open Access  
Applicable Algebra in Engineering, Communication and Computing     Hybrid Journal   (Followers: 3)
Applied and Computational Harmonic Analysis     Full-text available via subscription  
Applied Artificial Intelligence: An International Journal     Hybrid Journal   (Followers: 17)
Applied Categorical Structures     Hybrid Journal   (Followers: 4)
Applied Clinical Informatics     Hybrid Journal   (Followers: 4)
Applied Computational Intelligence and Soft Computing     Open Access   (Followers: 16)
Applied Computer Systems     Open Access   (Followers: 6)
Applied Computing and Geosciences     Open Access   (Followers: 3)
Applied Mathematics and Computation     Hybrid Journal   (Followers: 31)
Applied Medical Informatics     Open Access   (Followers: 11)
Applied Numerical Mathematics     Hybrid Journal   (Followers: 4)
Applied Soft Computing     Hybrid Journal   (Followers: 13)
Applied Spatial Analysis and Policy     Hybrid Journal   (Followers: 5)
Applied System Innovation     Open Access   (Followers: 1)
Archive of Applied Mechanics     Hybrid Journal   (Followers: 4)
Archive of Numerical Software     Open Access  
Archives and Museum Informatics     Hybrid Journal   (Followers: 97)
Archives of Computational Methods in Engineering     Hybrid Journal   (Followers: 5)
arq: Architectural Research Quarterly     Hybrid Journal   (Followers: 7)
Array     Open Access   (Followers: 1)
Artifact : Journal of Design Practice     Open Access   (Followers: 8)
Artificial Life     Hybrid Journal   (Followers: 7)
Asian Journal of Computer Science and Information Technology     Open Access   (Followers: 3)
Asian Journal of Control     Hybrid Journal  
Asian Journal of Research in Computer Science     Open Access   (Followers: 4)
Assembly Automation     Hybrid Journal   (Followers: 2)
Automatic Control and Computer Sciences     Hybrid Journal   (Followers: 6)
Automatic Documentation and Mathematical Linguistics     Hybrid Journal   (Followers: 5)
Automatica     Hybrid Journal   (Followers: 13)
Automatika : Journal for Control, Measurement, Electronics, Computing and Communications     Open Access  
Automation in Construction     Hybrid Journal   (Followers: 8)
Balkan Journal of Electrical and Computer Engineering     Open Access  
Basin Research     Hybrid Journal   (Followers: 7)
Behaviour & Information Technology     Hybrid Journal   (Followers: 32)
BenchCouncil Transactions on Benchmarks, Standards, and Evaluations     Open Access   (Followers: 7)
Big Data and Cognitive Computing     Open Access   (Followers: 5)
Big Data Mining and Analytics     Open Access   (Followers: 10)
Biodiversity Information Science and Standards     Open Access   (Followers: 2)
Bioinformatics     Hybrid Journal   (Followers: 226)
Bioinformatics Advances : Journal of the International Society for Computational Biology     Open Access   (Followers: 1)
Biomedical Engineering     Hybrid Journal   (Followers: 11)
Biomedical Engineering and Computational Biology     Open Access   (Followers: 11)
Briefings in Bioinformatics     Hybrid Journal   (Followers: 43)
British Journal of Educational Technology     Hybrid Journal   (Followers: 96)
Bulletin of Taras Shevchenko National University of Kyiv. Series: Physics and Mathematics     Open Access  
c't Magazin fuer Computertechnik     Full-text available via subscription   (Followers: 1)
Cadernos do IME : Série Informática     Open Access  
CALCOLO     Hybrid Journal  
CALICO Journal     Full-text available via subscription   (Followers: 3)
Calphad     Hybrid Journal  
Canadian Journal of Electrical and Computer Engineering     Full-text available via subscription   (Followers: 14)
Catalysis in Industry     Hybrid Journal  
CCF Transactions on High Performance Computing     Hybrid Journal  
CCF Transactions on Pervasive Computing and Interaction     Hybrid Journal  
CEAS Space Journal     Hybrid Journal   (Followers: 6)
Cell Communication and Signaling     Open Access   (Followers: 3)
Central European Journal of Computer Science     Hybrid Journal   (Followers: 5)
CERN IdeaSquare Journal of Experimental Innovation     Open Access  
Chaos, Solitons & Fractals     Hybrid Journal   (Followers: 1)
Chaos, Solitons & Fractals : X     Open Access   (Followers: 1)
Chemometrics and Intelligent Laboratory Systems     Hybrid Journal   (Followers: 13)
ChemSusChem     Hybrid Journal   (Followers: 7)
China Communications     Full-text available via subscription   (Followers: 8)
Chinese Journal of Catalysis     Full-text available via subscription   (Followers: 2)
Chip     Full-text available via subscription   (Followers: 5)
Ciencia     Open Access  
CIN : Computers Informatics Nursing     Hybrid Journal   (Followers: 11)
Circuits and Systems     Open Access   (Followers: 16)
CLEI Electronic Journal     Open Access  
Clin-Alert     Hybrid Journal   (Followers: 1)
Clinical eHealth     Open Access  
Cluster Computing     Hybrid Journal   (Followers: 1)
Cognitive Computation     Hybrid Journal   (Followers: 2)
Cognitive Computation and Systems     Open Access  
COMBINATORICA     Hybrid Journal  
Combinatorics, Probability and Computing     Hybrid Journal   (Followers: 4)
Combustion Theory and Modelling     Hybrid Journal   (Followers: 18)
Communication Methods and Measures     Hybrid Journal   (Followers: 12)
Communication Theory     Hybrid Journal   (Followers: 29)
Communications in Algebra     Hybrid Journal   (Followers: 1)
Communications in Partial Differential Equations     Hybrid Journal   (Followers: 2)
Communications of the ACM     Full-text available via subscription   (Followers: 59)
Communications of the Association for Information Systems     Open Access   (Followers: 15)
Communications on Applied Mathematics and Computation     Hybrid Journal   (Followers: 1)
COMPEL: The International Journal for Computation and Mathematics in Electrical and Electronic Engineering     Hybrid Journal   (Followers: 4)
Complex & Intelligent Systems     Open Access   (Followers: 1)
Complex Adaptive Systems Modeling     Open Access  
Complex Analysis and Operator Theory     Hybrid Journal   (Followers: 2)
Complexity     Hybrid Journal   (Followers: 8)
Computación y Sistemas     Open Access  
Computation     Open Access   (Followers: 1)
Computational and Applied Mathematics     Hybrid Journal   (Followers: 3)
Computational and Mathematical Methods     Hybrid Journal  
Computational and Mathematical Methods in Medicine     Open Access   (Followers: 2)
Computational and Mathematical Organization Theory     Hybrid Journal   (Followers: 1)
Computational and Structural Biotechnology Journal     Open Access   (Followers: 1)
Computational and Theoretical Chemistry     Hybrid Journal   (Followers: 11)
Computational Astrophysics and Cosmology     Open Access   (Followers: 7)
Computational Biology and Chemistry     Hybrid Journal   (Followers: 13)
Computational Biology Journal     Open Access   (Followers: 6)
Computational Brain & Behavior     Hybrid Journal   (Followers: 1)
Computational Chemistry     Open Access   (Followers: 3)
Computational Communication Research     Open Access   (Followers: 1)
Computational Complexity     Hybrid Journal   (Followers: 5)
Computational Condensed Matter     Open Access   (Followers: 1)


Similar Journals
Big Data and Cognitive Computing
Number of Followers: 5  

  This is an Open Access journal
ISSN (Online) 2504-2289
Published by MDPI
  • BDCC, Vol. 6, Pages 69: Comparative Analysis of Backbone Networks for Deep
           Knee MRI Classification Models

    • Authors: Nataliya Shakhovska, Pavlo Pukach
      First page: 69
      Abstract: This paper focuses on different types of backbone networks for machine learning architectures which perform classification of knee Magnetic Resonance Imaging (MRI) images. This paper aims to compare different types of feature extraction networks for the same classification task, in terms of accuracy and performance. Multiple variations of machine learning models were trained based on the MRNet architecture, choosing AlexNet, ResNet, VGG-11, VGG-16, and EfficientNet as the backbone. The models were evaluated on the MRNet validation dataset, computing Area Under the Receiver Operating Characteristics Curve (ROC-AUC), accuracy, F1 score, and Cohen’s Kappa as evaluation metrics. The MRNet-VGG16 model variant shows the best results for Anterior Cruciate Ligament (ACL) tear detection. For general abnormality detection, MRNet-VGG16 is dominated by MRNet-ResNet at confidence levels between 0.5 and 0.75 and by MRNet-VGG11 at confidence levels above 0.8. Due to the non-uniform nature of backbone network performance on different MRI planes, it is advisable to use a logistic regression (LR) ensemble of: VGG16 on the coronal plane for all classification tasks and on the axial plane for abnormality and ACL tear detection; AlexNet on the sagittal plane for abnormality detection and the axial plane for meniscal tear detection; and VGG11 on the sagittal plane for ACL tear detection. The results also indicate that the Cohen’s Kappa metric is valuable in model evaluation for the MRNet dataset, as it provides deeper insights into classification decisions.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-06-21
      DOI: 10.3390/bdcc6030069
      Issue No: Vol. 6, No. 3 (2022)
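
The evaluation protocol above is straightforward to reproduce with scikit-learn's metric functions. Below is a minimal, hypothetical sketch of scoring backbone variants with the four reported metrics; the labels and per-variant probabilities are invented stand-ins, not MRNet outputs.

```python
# Toy evaluation harness: score each backbone variant with ROC-AUC,
# accuracy, F1 and Cohen's kappa, as in the abstract above.
import numpy as np
from sklearn.metrics import (roc_auc_score, accuracy_score,
                             f1_score, cohen_kappa_score)

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])  # hypothetical validation labels
variants = {                                  # hypothetical predicted probabilities
    "MRNet-VGG16":  np.array([.2, .9, .8, .3, .7, .1, .6, .9]),
    "MRNet-ResNet": np.array([.4, .6, .7, .2, .6, .3, .7, .7]),
}
for name, prob in variants.items():
    pred = (prob >= 0.5).astype(int)          # threshold at 0.5 confidence
    print(f"{name}: AUC={roc_auc_score(y_true, prob):.3f} "
          f"acc={accuracy_score(y_true, pred):.3f} "
          f"F1={f1_score(y_true, pred):.3f} "
          f"kappa={cohen_kappa_score(y_true, pred):.3f}")
```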
       
  • BDCC, Vol. 6, Pages 33: RoBERTaEns: Deep Bidirectional Encoder Ensemble
           Model for Fact Verification

    • Authors: Muchammad Naseer, Jauzak Hussaini Windiatmaja, Muhamad Asvial, Riri Fitri Sari
      First page: 33
      Abstract: The bidirectional encoder model has been widely applied to fake news detection because of its ability to provide factual verification with good results. Good fact verification requires an optimal model and thorough evaluation, so that news readers can trust reliable and accurate verification results. In this study, we evaluated the application of a homogeneous ensemble (HE) on RoBERTa to improve the accuracy of a model. We improved the HE method using a bagging ensemble of three types of RoBERTa models; each model’s prediction is then combined to build a new model called RoBERTaEns. The FEVER dataset is used to train and test our model. The experimental results showed that the proposed method, RoBERTaEns, obtained a higher accuracy value, with an F1-Score of 84.2%, compared to the other RoBERTa models. In addition, RoBERTaEns has a smaller margin of error compared to the other models. This demonstrates that applying the HE method increases the accuracy of a model and produces better values in handling various types of fact input in each fold.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-22
      DOI: 10.3390/bdcc6020033
      Issue No: Vol. 6, No. 2 (2022)
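
At prediction time, the bagging step described above reduces to majority voting over the member models. A minimal sketch of that combination step follows; the three predictors are placeholders for fine-tuned RoBERTa variants, not the authors' code.

```python
# Majority-vote combination of three homogeneous classifiers, the core of
# a bagging ensemble such as the RoBERTaEns design described above.
from collections import Counter

def ensemble_predict(models, claim: str) -> str:
    votes = [m(claim) for m in models]            # each returns a FEVER-style label
    return Counter(votes).most_common(1)[0][0]    # majority label wins

# Placeholder members standing in for fine-tuned RoBERTa models:
members = [lambda c: "SUPPORTS", lambda c: "REFUTES", lambda c: "SUPPORTS"]
print(ensemble_predict(members, "Some claim to verify"))  # -> SUPPORTS
```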
       
  • BDCC, Vol. 6, Pages 34: Startups and Consumer Purchase Behavior:
           Application of Support Vector Machine Algorithm

    • Authors: Pejman Ebrahimi, Aidin Salamzadeh, Maryam Soleimani, Seyed Mohammad Khansari, Hadi Zarea, Maria Fekete-Farkas
      First page: 34
      Abstract: This study evaluated the impact of startup technology innovations and customer relationship management (CRM) performance on customer participation, value co-creation, and consumer purchase behavior (CPB). This analytical study empirically tested the proposed hypotheses using structural equation modeling (SEM) and SmartPLS 3 techniques. Moreover, we used a support vector machine (SVM) algorithm to verify the model’s accuracy; the SVM algorithm supports four different kernels, and we checked the accuracy criterion with all of them. This research used the convenience sampling approach to gather the data, together with the conventional bias test method, and a total of 466 completed responses were collected. Technological innovations of startups and CRM have a positive and significant effect on customer participation. Customer participation significantly affects the value of pleasure, economic value, and relationship value. Based on the importance-performance map analysis (IPMA) matrix results, “customer participation” had the highest importance, with a score of 0.782: if customers increase their participation performance by one unit during the COVID-19 epidemic, overall CPB increases by 0.782. In addition, our results showed that the lowest performance is related to the technological innovations of startups, which indicates an excellent opportunity for development in this area. SVM results showed that a high-degree polynomial kernel is the best kernel, confirming the model’s accuracy.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-25
      DOI: 10.3390/bdcc6020034
      Issue No: Vol. 6, No. 2 (2022)
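
Checking all four standard SVC kernels is a one-loop exercise in scikit-learn. The sketch below uses synthetic data in place of the survey features (the sample size mirrors the 466 responses, but nothing else is taken from the study).

```python
# Compare the four standard SVC kernels by cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=466, n_features=10, random_state=0)
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    acc = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
    print(f"{kernel:8s} mean accuracy = {acc:.3f}")
```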
       
  • BDCC, Vol. 6, Pages 35: Social Networks Marketing and Consumer Purchase
           Behavior: The Combination of SEM and Unsupervised Machine Learning
           Approaches

    • Authors: Pejman Ebrahimi, Marjan Basirat, Ali Yousefi, Md. Nekmahmud, Abbas Gholampour, Maria Fekete-Farkas
      First page: 35
      Abstract: The purpose of this paper is to reveal how social network marketing (SNM) can affect consumers’ purchase behavior (CPB). We used the combination of structural equation modeling (SEM) and unsupervised machine learning approaches as an innovative method. The statistical population of the study comprised users who live in Hungary and use Facebook Marketplace. This research uses the convenience sampling approach to overcome bias. Out of 475 surveys distributed, a total of 466 respondents successfully filled out the entire survey, a response rate of 98.1%. The results showed that all dimensions of social network marketing, such as entertainment, customization, interaction, WoM and trend, positively and significantly influenced consumer purchase behavior (CPB) on Facebook Marketplace. Furthermore, we used hierarchical clustering and K-means unsupervised algorithms to cluster consumers. The results show that the respondents of this research can be clustered into nine different groups based on behavior and demographic attributes, meaning that distinctive strategies can be used for different clusters; marketing managers can provide different options, products and services for each group. This study is notable in that it adopted the plspm and matrixpls packages in R to show the model’s predictive power, alongside unsupervised machine learning algorithms to cluster consumer behaviors.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-25
      DOI: 10.3390/bdcc6020035
      Issue No: Vol. 6, No. 2 (2022)
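
Both clustering steps mentioned above are available off the shelf. A toy sketch with synthetic respondent features (not the study's survey data):

```python
# Cluster synthetic "respondents" into nine groups with both K-means and
# hierarchical (agglomerative) clustering, as the study does.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering

X, _ = make_blobs(n_samples=466, centers=9, n_features=6, random_state=1)
kmeans = KMeans(n_clusters=9, n_init=10, random_state=1).fit(X)
hier = AgglomerativeClustering(n_clusters=9).fit(X)
print(kmeans.labels_[:10])   # cluster assignment per respondent
print(hier.labels_[:10])
```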
       
  • BDCC, Vol. 6, Pages 36: Illusion of Truth: Analysing and Classifying
           COVID-19 Fake News in Brazilian Portuguese Language

    • Authors: Patricia Takako Endo, Guto Leoni Santos, Maria Eduarda de Lima Xavier, Gleyson Rhuan Nascimento Campos, Luciana Conceição de Lima, Ivanovitch Silva, Antonia Egli, Theo Lynn
      First page: 36
      Abstract: Public health interventions to counter the COVID-19 pandemic have accelerated and increased digital adoption and use of the Internet for sourcing health information. Unfortunately, there is evidence to suggest that it has also accelerated and increased the spread of false information relating to COVID-19. The consequences of misinformation, disinformation and misinterpretation of health information can interfere with attempts to curb the virus, delay or result in failure to seek or continue legitimate medical treatment and adherence to vaccination, as well as interfere with sound public health policy and attempts to disseminate public health messages. While there is a significant body of literature, datasets and tools to support countermeasures against the spread of false information online in resource-rich languages such as English and Chinese, there are few such resources to support Portuguese, and Brazilian Portuguese specifically. In this study, we explore the use of machine learning and deep learning techniques to identify fake news in online communications in the Brazilian Portuguese language relating to the COVID-19 pandemic. We build a dataset of 11,382 items comprising data from January 2020 to February 2021. Exploratory data analysis suggests that fake news about the COVID-19 vaccine was prevalent in Brazil, much of it related to government communications. To mitigate the adverse impact of fake news, we analyse the effect on fake news detection of keeping or removing stop words in communications. The results suggest that keeping stop words within the message improves the performance of the models. Random Forest was the machine learning model with the best results, achieving 97.91% precision, while Bi-GRU was the best deep learning model with an F1 score of 94.03%.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-01
      DOI: 10.3390/bdcc6020036
      Issue No: Vol. 6, No. 2 (2022)
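
The stop-word comparison can be sketched as training the same classifier on TF-IDF features with stop words kept versus removed. The Portuguese texts, labels and stop list below are invented toy data, not items from the paper's corpus.

```python
# Same model, two featurizations: stop words kept vs. removed.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["a vacina é segura e eficaz",             # toy "real" item
         "o governo esconde a verdade da vacina",  # toy "fake" item
         "estudo confirma eficácia da vacina",
         "a vacina altera o seu DNA"]
labels = [0, 1, 0, 1]                              # 0 = real, 1 = fake
stop_pt = ["a", "o", "e", "de", "da", "seu"]       # tiny illustrative stop list

for stop in (None, stop_pt):
    X = TfidfVectorizer(stop_words=stop).fit_transform(texts)
    clf = RandomForestClassifier(random_state=0).fit(X, labels)
    tag = "removed" if stop else "kept"
    print(f"stop words {tag}: train accuracy = {clf.score(X, labels):.2f}")
```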
       
  • BDCC, Vol. 6, Pages 37: Operations with Nested Named Sets as a Tool for
           Artificial Intelligence

    • Authors: Mark Burgin
      First page: 37
      Abstract: Knowledge and data representations are important for artificial intelligence (AI), as well as for intelligence in general. Intelligent functioning presupposes efficient operation with knowledge and data representations in particular. At the same time, it has been demonstrated that named sets, which are also called fundamental triads, instantiate the most fundamental structure in general and for knowledge and data representations in particular. In this context, named sets allow for effective mathematical portrayal of the key phenomenon called nesting. Nesting plays a significant role in a variety of fields, such as mathematics and computer science. Computing tools of AI include nested levels of parentheses in arithmetical expressions; different types of recursion; nesting of several levels of subroutines; nesting in recursive calls; multilevel nesting in information hiding; a variety of nested data structures, such as records, objects, and classes; and nested blocks of imperative source code, such as nested repeat-until clauses, while clauses, if clauses, etc. In this paper, different operations with nested named sets are constructed and their properties are derived, reflecting different attributes of nesting. An AI system receives information in the form of data and knowledge and, in processing this information, performs operations with these data and knowledge; such a system therefore needs various operations for these processes. The operations constructed in this paper process data and knowledge in the form of nested named sets, and knowing the properties of these operations can help to optimize the processing of data and knowledge in AI systems.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-01
      DOI: 10.3390/bdcc6020037
      Issue No: Vol. 6, No. 2 (2022)
       
  • BDCC, Vol. 6, Pages 38: Spark Configurations to Optimize Decision Tree
           Classification on UNSW-NB15

    • Authors: Sikha Bagui, Mary Walauskis, Robert DeRush, Huyen Praviset, Shaunda Boucugnani
      First page: 38
      Abstract: This paper looks at the impact of changing Spark’s configuration parameters on machine learning algorithms using a large dataset, the UNSW-NB15 dataset. The environmental conditions that will optimize the classification process are studied. To build smart intrusion detection systems, a deep understanding of the environmental parameters is necessary. Specifically, the focus is on the following environmental parameters: the executor memory, number of executors, number of cores per executor, execution time, as well as the impact on statistical measures. Hence, the objective was to optimize resource usage and minimize processing time for Decision Tree classification, using Spark. This shows whether additional resources will increase performance, lower processing time, and optimize computing resources. The UNSW-NB15 dataset, being a large dataset, provides enough data and complexity to see the changes in computing resource configurations in Spark. Principal Component Analysis was used for preprocessing the dataset. Results indicated that a lack of executors and cores results in wasted resources and long processing times. Excessive resource allocation did not improve processing time. Environmental tuning has a noticeable impact.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-07
      DOI: 10.3390/bdcc6020038
      Issue No: Vol. 6, No. 2 (2022)
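
The environmental parameters named above map directly onto standard Spark configuration keys. The following is a hedged sketch of such a tuning loop; the grid values and the omitted load/train step are illustrative, not the paper's setup, and it assumes the script is launched with a Spark master configured (e.g. via spark-submit).

```python
# Vary executor memory / cores / instances around a PySpark job and time it.
import time
from pyspark.sql import SparkSession

for memory, cores, instances in [("4g", 2, 2), ("8g", 4, 4)]:
    spark = (SparkSession.builder.appName("dt-unsw-nb15")
             .config("spark.executor.memory", memory)
             .config("spark.executor.cores", str(cores))
             .config("spark.executor.instances", str(instances))
             .getOrCreate())
    t0 = time.time()
    # ... load UNSW-NB15, apply PCA, fit a pyspark.ml DecisionTreeClassifier ...
    print(f"{memory} / {cores} cores / {instances} executors: "
          f"{time.time() - t0:.1f}s")
    spark.stop()  # stop so the next configuration takes effect
```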
       
  • BDCC, Vol. 6, Pages 39: PCB Component Detection Using Computer Vision for
           Hardware Assurance

    • Authors: Wenwei Zhao, Suprith Reddy Gurudu, Shayan Taheri, Shajib Ghosh, Mukhil Azhagan Mallaiyan Sathiaseelan, Navid Asadizanjani
      First page: 39
      Abstract: Printed circuit board (PCB) assurance in the optical domain is a crucial field of study. Though there are many existing PCB assurance methods using image processing, computer vision (CV), and machine learning (ML), the PCB field is complex and rapidly evolving, so new techniques are required to overcome the emerging problems. Existing ML-based methods outperform traditional CV methods; however, they often require more data, have low explainability, and can be difficult to adapt when a new technology arises. To overcome these challenges, CV methods can be used in tandem with ML methods. In particular, human-interpretable CV algorithms such as those that extract color, shape, and texture features increase PCB assurance explainability. This allows for the incorporation of prior knowledge, which effectively reduces the number of trainable ML parameters and, thus, the amount of data needed to achieve high accuracy when training or retraining an ML model. Hence, this study explores the benefits and limitations of a variety of common computer vision-based features for the task of PCB component detection. The study results indicate that color features demonstrate promising performance for PCB component detection. The purpose of this paper is to facilitate collaboration between the hardware assurance, computer vision, and machine learning communities.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-08
      DOI: 10.3390/bdcc6020039
      Issue No: Vol. 6, No. 2 (2022)
       
  • BDCC, Vol. 6, Pages 40: Breast and Lung Anticancer Peptides Classification
           Using N-Grams and Ensemble Learning Techniques

    • Authors: Ayad Rodhan Abbas, Bashar Saadoon Mahdi, Osamah Younus Fadhil
      First page: 40
      Abstract: Anticancer peptides (ACPs) are short protein sequences; they perform functions like some hormones and enzymes inside the body. The role of any protein or peptide is related to its structure and the sequence of amino acids that make it up. There are 20 types of amino acids in humans, and each of them has particular characteristics according to its chemical structure. Current machine and deep learning models have been used to classify ACP problems; however, these models have neglected Amino Acid Repeats (AARs), which play an essential role in the function and structure of peptides. Therefore, in this paper, we pursue a promising route to classifying novel anticancer peptides by extracting AARs based on N-grams and k-mers, using two peptide datasets. These datasets, which target breast and lung cancer cells, were assembled and curated manually from the Cancer Peptide and Protein Database (CancerPPD). Each dataset consists of peptide sequences along with their synthesis and anticancer activity on breast and lung cancer cell lines. Five different feature selection methods were used in this paper to improve classification performance and reduce the experimental costs. After that, ACPs were classified using four classifiers, namely AdaBoost, Random Forest Tree (RFT), Multi-class Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP). These classifiers were evaluated by applying five well-known evaluation metrics. Experimental results showed that the breast and lung ACP classification processes reached accurate performances of 89.25% and 92.56%, respectively. In terms of AUC, they reached 95.35% and 96.92% for breast and lung ACPs, respectively. The proposed classifiers performed roughly equally well in AUC, accuracy, precision, F-measure, and recall, except for Multi-class SVM-based feature selection, which showed superior performance. As a result, this paper significantly improved predictive performance, effectively distinguishing ACPs as virtually inactive, experimentally inactive, moderately active, and very active.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-12
      DOI: 10.3390/bdcc6020040
      Issue No: Vol. 6, No. 2 (2022)
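
The AAR representation rests on counting overlapping amino-acid N-grams (k-mers). A minimal sketch of that extraction step, with a made-up peptide rather than a CancerPPD entry:

```python
# Count overlapping k-mers (amino-acid N-grams) in a peptide sequence.
from collections import Counter

def kmer_counts(seq: str, k: int) -> Counter:
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

peptide = "GLFDIVKKVVGALG"               # invented example sequence
print(kmer_counts(peptide, 2))           # dipeptide (2-gram) features
print(kmer_counts(peptide, 3).most_common(3))
```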
       
  • BDCC, Vol. 6, Pages 41: Revisiting Gradient Boosting-Based Approaches for
           Learning Imbalanced Data: A Case of Anomaly Detection on Power Grids

    • Authors: Maya Hilda Lestari Louk, Bayu Adhi Tama
      First page: 41
      Abstract: Gradient boosting ensembles have been used in the cyber-security area for many years; nonetheless, their efficacy and accuracy for intrusion detection systems (IDSs) remain questionable, particularly when dealing with problems involving imbalanced data. This article fills the void in the existing body of knowledge by evaluating the performance of gradient boosting-based ensembles, including gradient boosting machine (GBM), extreme gradient boosting (XGBoost), LightGBM, and CatBoost. This paper assesses their performance on various imbalanced data sets using the Matthews correlation coefficient (MCC), area under the receiver operating characteristic curve (AUC), and F1 metrics. The article discusses an example of anomaly detection in an industrial control network and, more specifically, threat detection in a cyber-physical smart power grid. The tests’ results indicate that CatBoost surpassed its competitors, regardless of the imbalance ratio of the data sets, while LightGBM showed a much lower performance value and had more variability across the data sets.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-16
      DOI: 10.3390/bdcc6020041
      Issue No: Vol. 6, No. 2 (2022)
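
The evaluation protocol (MCC, AUC and F1 on imbalanced data) looks like the sketch below. For brevity it scores scikit-learn's own gradient boosting on a synthetic 95/5 split rather than the four boosted libraries on the power-grid sets.

```python
# Score a gradient boosting model with the three metrics used above.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import matthews_corrcoef, roc_auc_score, f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(Xtr, ytr)
pred, prob = clf.predict(Xte), clf.predict_proba(Xte)[:, 1]
print(f"MCC={matthews_corrcoef(yte, pred):.3f} "
      f"AUC={roc_auc_score(yte, prob):.3f} "
      f"F1={f1_score(yte, pred):.3f}")
```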
       
  • BDCC, Vol. 6, Pages 42: An Emergency Event Detection Ensemble Model Based
           on Big Data

    • Authors: Khalid Alfalqi, Martine Bellaiche
      First page: 42
      Abstract: Emergency events arise when a serious, unexpected, and often dangerous threat affects normal life. Hence, knowing what is occurring during and after emergency events is critical to mitigate the effect of the incident on human life, on the environment and our infrastructures, as well as the inherent financial consequences. Social network utilization in emergency event detection models can play an important role, as information is shared and users’ statuses are updated once an emergency event occurs. Moreover, big data has proved its significance as a tool to assist and alleviate emergency events by processing an enormous amount of data over a short time interval. This paper shows that it is necessary to have an appropriate emergency event detection ensemble model (EEDEM) to respond quickly once such unfortunate events occur. Furthermore, it integrates Snapchat maps to propose a novel method to pinpoint the exact location of an emergency event. Merging social networks and big data can accelerate the emergency event detection system: social network data, such as those from Twitter and Snapchat, allow us to manage, monitor, analyze and detect emergency events. The main objective of this paper is to propose a novel and efficient big data-based EEDEM that pinpoints the exact location of emergency events by employing data collected from social networks, such as “Twitter” and “Snapchat”, while integrating big data (BD) and machine learning (ML). Furthermore, this paper evaluates the performance of five ML base models and the proposed ensemble approach to detect emergency events. Results show that the proposed ensemble approach achieved a very high accuracy of 99.87%, which outperforms the other base models. Moreover, the base models yield high accuracy: 99.72% and 99.70% for LSTM and decision tree, respectively, with acceptable training times.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-16
      DOI: 10.3390/bdcc6020042
      Issue No: Vol. 6, No. 2 (2022)
       
  • BDCC, Vol. 6, Pages 43: New Efficient Approach to Solve Big Data Systems
           Using Parallel Gauss–Seidel Algorithms

    • Authors: Shih Yu Chang, Hsiao-Chun Wu, Yifan Wang
      First page: 43
      Abstract: In order to perform big-data analytics, regression involving large matrices is often necessary. In particular, large-scale regression problems are encountered when one wishes to extract semantic patterns for knowledge discovery and data mining. When a large matrix can be processed in its factorized form, advantages arise in terms of computation, implementation, and data compression. In this work, we propose two new parallel iterative algorithms as extensions of the Gauss–Seidel algorithm (GSA) to solve regression problems involving many variables. The convergence study in terms of error-bounds of the proposed iterative algorithms is also performed, and the required computation resources, namely time- and memory-complexities, are evaluated to benchmark the efficiency of the proposed new algorithms. Finally, the numerical results from both Monte Carlo simulations and real-world datasets are presented to demonstrate the striking effectiveness of our proposed new methods.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-19
      DOI: 10.3390/bdcc6020043
      Issue No: Vol. 6, No. 2 (2022)
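
For reference, the serial Gauss–Seidel iteration that the proposed algorithms extend updates each unknown in place using the latest values of the others; it converges for diagonally dominant systems. A plain NumPy version:

```python
# Plain (serial) Gauss-Seidel iteration for A x = b.
import numpy as np

def gauss_seidel(A, b, iters=100):
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        for i in range(len(b)):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]   # update in place with newest values
    return x

A = np.array([[4.0, 1, 0], [1, 5, 2], [0, 2, 6]])   # diagonally dominant
b = np.array([1.0, 2, 3])
print(gauss_seidel(A, b))                # ~ np.linalg.solve(A, b)
```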
       
  • BDCC, Vol. 6, Pages 44: Deep Learning Approaches for Video Compression: A
           Bibliometric Analysis

    • Authors: Bidwe, Mishra, Patil, Shaw, Vora, Kotecha, Zope
      First page: 44
      Abstract: Every data and kind of data need a physical drive to store it. There has been an explosion in the volume of images, video, and other similar data types circulated over the internet. Users using the internet expect intelligible data, even under the pressure of multiple resource constraints such as bandwidth bottleneck and noisy channels. Therefore, data compression is becoming a fundamental problem in wider engineering communities. There has been some related work on data compression using neural networks. Various machine learning approaches are currently applied in data compression techniques and tested to obtain better lossy and lossless compression results. A very efficient and variety of research is already available for image compression. However, this is not the case for video compression. Because of the explosion of big data and the excess use of cameras in various places globally, around 82% of the data generated involve videos. Proposed approaches have used Deep Neural Networks (DNNs), Recurrent Neural Networks (RNNs), and Generative Adversarial Networks (GANs), and various variants of Autoencoders (AEs) are used in their approaches. All newly proposed methods aim to increase performance (reducing bitrate up to 50% at the same data quality and complexity). This paper presents a bibliometric analysis and literature survey of all Deep Learning (DL) methods used in video compression in recent years. Scopus and Web of Science are well-known research databases. The results retrieved from them are used for this analytical study. Two types of analysis are performed on the extracted documents. They include quantitative and qualitative results. In quantitative analysis, records are analyzed based on their citations, keywords, source of publication, and country of publication. The qualitative analysis provides information on DL-based approaches for video compression, as well as the advantages, disadvantages, and challenges of using them.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-19
      DOI: 10.3390/bdcc6020044
      Issue No: Vol. 6, No. 2 (2022)
       
  • BDCC, Vol. 6, Pages 45: Virtual Reality-Based Stimuli for Immersive Car
           Clinics: A Performance Evaluation Model

    • Authors: Alexandre Costa Henriques, Thiago Barros Murari, Jennifer Callans, Alexandre Maguino Pinheiro Silva, Antonio Lopes Apolinario, Ingrid Winkler
      First page: 45
      Abstract: This study proposes a model to evaluate the performance of virtual reality-based stimuli for immersive car clinics. The model considered Attribute Importance, Stimuli Efficacy and Stimuli Cost factors and the method was divided into three stages: we defined the importance of fourteen attributes relevant to a car clinic based on the perceptions of Marketing and Design experts; then we defined the efficacy of five virtual stimuli based on the perceptions of Product Development and Virtual Reality experts; and we used a cost factor to calculate the efficiency of the five virtual stimuli in relation to the physical. The Marketing and Design experts identified a new attribute, Scope; eleven of the fifteen attributes were rated as Important or Very Important, while four were removed from the model due to being considered irrelevant. According to our performance evaluation model, virtual stimuli have the same efficacy as physical stimuli. However, when cost is considered, virtual stimuli outperform physical stimuli, particularly virtual stimuli with glasses. We conclude that virtual stimuli have the potential to reduce the cost and time required to develop new stimuli in car clinics, but with concerns related to hardware, software, and other definitions.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-20
      DOI: 10.3390/bdcc6020045
      Issue No: Vol. 6, No. 2 (2022)
       
  • BDCC, Vol. 6, Pages 46: A Non-Uniform Continuous Cellular Automata for
           Analyzing and Predicting the Spreading Patterns of COVID-19

    • Authors: Puspa Eosina, Aniati Murni Arymurthy, Adila Alfa Krisnadhi
      First page: 46
      Abstract: During the COVID-19 outbreak, modeling the spread of infectious diseases became a challenging research topic due to the rapid spread and high mortality rate of the disease. The main objective of a standard epidemiological model is to estimate, by mathematical modeling, the number of people infected with, suspected of having, and recovered from the illness. Such a model does not capture how the disease transmits between neighboring regions through interaction. A more general framework, such as Cellular Automata (CA), is required to accommodate more complex spatial interaction within the epidemiological model. A critical issue in modeling the spread of diseases is how to reduce the prediction error. This research aims to formulate the influence of neighborhood interaction on the spreading pattern of COVID-19 using a neighborhood frame model in a Cellular Automata (CA) approach, and to obtain a predictive model for the COVID-19 spread with reduced error. We propose a non-uniform continuous CA (N-CCA) as our contribution, demonstrating the influence of interactions on the spread of COVID-19. The model has succeeded in demonstrating the influence of the interaction between regions on the COVID-19 spread, as represented by the coefficients obtained from multiple regression models. The coefficient obtained represents the behavior of the population in a cell interacting with its neighborhood, and it influences the number of cases that occur the next day. The N-CCA model is evaluated by the root mean square error (RMSE) of the difference between predicted and real case counts per cell in each region. This study demonstrates that the approach improves prediction accuracy for 14 days into the future using data points from the past 42 days, compared to a baseline model.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-24
      DOI: 10.3390/bdcc6020046
      Issue No: Vol. 6, No. 2 (2022)
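
The neighborhood-frame idea can be illustrated with a toy continuous CA update in which each cell mixes its own state with a weighted average of its neighbors. In the non-uniform model the mixing coefficients would vary per cell and come from the regression step; here they are fixed, invented values.

```python
# One update step of a toy continuous cellular automaton on a 2D grid.
import numpy as np

def step(grid, alpha=0.8, beta=0.2):
    neighbors = (np.roll(grid, 1, 0) + np.roll(grid, -1, 0) +
                 np.roll(grid, 1, 1) + np.roll(grid, -1, 1)) / 4.0
    return alpha * grid + beta * neighbors   # own state + neighborhood influence

cases = np.random.default_rng(0).random((5, 5))  # toy per-region case levels
print(step(cases).round(2))
```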
       
  • BDCC, Vol. 6, Pages 47: Incentive Mechanisms for Smart Grid: State of the
           Art, Challenges, Open Issues, Future Directions

    • Authors: Sweta Bhattacharya, Rajeswari Chengoden, Gautam Srivastava, Mamoun Alazab, Abdul Rehman Javed, Nancy Victor, Praveen Kumar Reddy Maddikunta, Thippa Reddy Gadekallu
      First page: 47
      Abstract: Smart grids (SG) are electricity grids that communicate with each other, provide reliable information, and enable administrators to operate energy supplies across the country, ensuring optimized reliability and efficiency. The smart grid contains sensors that measure and transmit data to adjust the flow of electricity automatically based on supply/demand, and thus, responding to problems becomes quicker and easier. This also plays a crucial role in controlling carbon emissions, by avoiding energy losses during peak load hours and ensuring optimal energy management. The scope of big data analytics in smart grids is huge, as they collect information from raw data and derive intelligent information from the same. However, these benefits of the smart grid are dependent on the active and voluntary participation of the consumers in real-time. Consumers need to be motivated and conscious to avail themselves of the achievable benefits. Incentivizing the appropriate actor is an absolute necessity to encourage prosumers to generate renewable energy sources (RES) and motivate industries to establish plants that support sustainable and green-energy-based processes or products. The current study emphasizes similar aspects and presents a comprehensive survey of the state-of-the-art contributions pertinent to incentive mechanisms in smart grids, which can be used in smart grids to optimize the power distribution during peak times and also reduce carbon emissions. The various technologies, such as game theory, blockchain, and artificial intelligence, used in implementing incentive mechanisms in smart grids are discussed, followed by different incentive projects being implemented across the globe. The lessons learnt, challenges faced in such implementations, and open issues such as data quality, privacy, security, and pricing related to incentive mechanisms in SG are identified to guide the future scope of research in this sector.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-27
      DOI: 10.3390/bdcc6020047
      Issue No: Vol. 6, No. 2 (2022)
       
  • BDCC, Vol. 6, Pages 48: A New Ontology-Based Method for Arabic Sentiment
           Analysis

    • Authors: Safaa M. Khabour, Qasem A. Al-Radaideh, Dheya Mustafa
      First page: 48
      Abstract: Arabic sentiment analysis is a process that aims to extract the subjective opinions of different users about different subjects, since these opinions and sentiments are used to recognize their perspectives and judgments in a particular domain. Few research studies have addressed semantic-oriented approaches for Arabic sentiment analysis based on domain ontologies and feature importance. In this paper, we built a semantic orientation approach for calculating overall polarity from Arabic subjective texts based on a built domain ontology and an available sentiment lexicon. We used the ontology concepts to extract and weight the semantic domain features by considering their levels in the ontology tree and their frequencies in the dataset, computing the overall polarity of a given textual review based on the importance of each domain feature. For evaluation, an Arabic dataset from the hotel domain was selected to build the domain ontology and to test the proposed approach. The overall accuracy and F-measure reached 79.20% and 78.75%, respectively. Results showed that the approach outperformed other semantic orientation approaches, and it is an appealing approach for Arabic sentiment analysis.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-04-29
      DOI: 10.3390/bdcc6020048
      Issue No: Vol. 6, No. 2 (2022)
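
One way to read the feature-weighting idea is as a weighted average of per-feature lexicon polarities, with weights derived from ontology level and corpus frequency. The sketch below is an interpretation with invented feature names and numbers, not the authors' formula.

```python
# Weighted overall polarity from per-feature sentiment scores.
features = {  # feature: (lexicon polarity, ontology level, frequency weight)
    "room":  (+0.6, 2, 0.9),
    "staff": (-0.4, 2, 0.7),
    "wifi":  (-0.8, 3, 0.4),
}

def overall_polarity(feats):
    num = sum(p * lvl * freq for p, lvl, freq in feats.values())
    den = sum(lvl * freq for _, lvl, freq in feats.values())
    return num / den

score = overall_polarity(features)
print(f"{score:+.3f} ->", "positive" if score > 0 else "negative")
```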
       
  • BDCC, Vol. 6, Pages 49: A Comparative Study of MongoDB and Document-Based
           MySQL for Big Data Application Data Management

    • Authors: Cornelia A. Győrödi, Diana V. Dumşe-Burescu, Doina R. Zmaranda, Robert Ş. Győrödi
      First page: 49
      Abstract: In the context of the heavy demands of Big Data, software developers have also begun to consider NoSQL data storage solutions. One of the important criteria when choosing a NoSQL database for an application is its performance in terms of speed of data accessing and processing, including response times to the most important CRUD operations (CREATE, READ, UPDATE, DELETE). In this paper, the behavior of two of the major document-based NoSQL databases, MongoDB and document-based MySQL, was analyzed in terms of the complexity and performance of CRUD operations, especially query operations. The main objective of the paper is to make a comparative analysis of the impact that each specific database has on application performance when realizing CRUD requests. To perform this analysis, a case-study application was developed using the two document-based databases, MongoDB and MySQL; the application aims to model and streamline the activity of service providers that use a lot of data. The results obtained demonstrate the performance of both databases for different volumes of data; based on these, a detailed analysis and several conclusions are presented to support a decision in choosing an appropriate solution for a big-data application.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-05
      DOI: 10.3390/bdcc6020049
      Issue No: Vol. 6, No. 2 (2022)
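
The MongoDB half of such a CRUD benchmark can be sketched with pymongo as below; the connection string, database name and volumes are placeholders, and the document-store MySQL side would mirror the same loop through its own driver.

```python
# Time bulk CREATE and READ operations against a local MongoDB.
import time
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["bench"]["docs"]
coll.drop()                                   # start from an empty collection

t0 = time.time()
for i in range(1000):
    coll.insert_one({"_id": i, "payload": "x" * 100})   # CREATE
print(f"insert_one x1000: {time.time() - t0:.2f}s")

t0 = time.time()
for i in range(1000):
    coll.find_one({"_id": i})                           # READ
print(f"find_one   x1000: {time.time() - t0:.2f}s")
```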
       
  • BDCC, Vol. 6, Pages 50: Gender Stereotypes in Hollywood Movies and Their
           Evolution over Time: Insights from Network Analysis

    • Authors: Arjun M. Kumar, Jasmine Y. Q. Goh, Tiffany H. H. Tan, Cynthia S. Q. Siew
      First page: 50
      Abstract: The present analysis of more than 180,000 sentences from movie plots across the period from 1940 to 2019 emphasizes how gender stereotypes are expressed through the cultural products of society. By applying a network analysis to the word co-occurrence networks of movie plots and using a novel method of identifying story tropes, we demonstrate that gender stereotypes exist in Hollywood movies. An analysis of specific paths in the network and of the words reflecting various domains shows the dynamic changes in some of these stereotypical associations. Our results suggest that gender stereotypes are complex and dynamic in nature. Specifically, whereas male characters appear to be associated with a diversity of themes in movies, female characters seem predominantly associated with the theme of romance. Although associations of female characters to physical beauty and marriage are declining over time, associations of female characters to sexual relationships and weddings are increasing. Our results demonstrate how the application of cognitive network science methods can enable a more nuanced investigation of gender stereotypes in textual data.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-06
      DOI: 10.3390/bdcc6020050
      Issue No: Vol. 6, No. 2 (2022)
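
A word co-occurrence network of the kind analysed above links words that appear in the same sentence. A toy construction with networkx (the plot fragments are invented):

```python
# Build a word co-occurrence graph from tokenized sentences.
import itertools
import networkx as nx

sentences = [["hero", "saves", "city"],      # toy plot fragments
             ["heroine", "falls", "love"],
             ["hero", "falls", "love"]]
G = nx.Graph()
for sent in sentences:
    G.add_edges_from(itertools.combinations(sorted(set(sent)), 2))
print(G.number_of_nodes(), "nodes /", G.number_of_edges(), "edges")
print(sorted(G.neighbors("love")))           # words co-occurring with "love"
```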
       
  • BDCC, Vol. 6, Pages 51: Robust Multi-Mode Synchronization of Chaotic
           Fractional Order Systems in the Presence of Disturbance, Time Delay and
           Uncertainty with Application in Secure Communications

    • Authors: Ali Akbar Kekha Javan, Assef Zare, Roohallah Alizadehsani, Saeed Balochian
      First page: 51
      Abstract: This paper investigates the robust adaptive synchronization of multi-mode fractional-order chaotic systems (MMFOCS). To that end, synchronization was performed with unknown parameters, unknown time delays, the presence of disturbance, and uncertainty with the unknown boundary. The convergence of the synchronization error to zero was guaranteed using the Lyapunov function. Additionally, the control rules were extracted as explicit continuous functions. An image encryption approach was proposed based on maps with time-dependent coding for secure communication. The simulations indicated the effectiveness of the proposed design regarding the suitability of the parameters, the convergence of errors, and robustness. Subsequently, the presented method was applied to fractional-order Chen systems and was encrypted using the chaotic masking of different benchmark images. The results indicated the desirable performance of the proposed method in encrypting the benchmark images.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-08
      DOI: 10.3390/bdcc6020051
      Issue No: Vol. 6, No. 2 (2022)
       
  • BDCC, Vol. 6, Pages 52: Cognitive Networks Extract Insights on COVID-19
           Vaccines from English and Italian Popular Tweets: Anticipation, Logistics,
           Conspiracy and Loss of Trust

    • Authors: Massimo Stella, Michael S. Vitevitch, Federico Botta
      First page: 52
      Abstract: Monitoring social discourse about COVID-19 vaccines is key to understanding how large populations perceive vaccination campaigns. This work reconstructs how popular and trending posts framed COVID-19 vaccines semantically and emotionally on Twitter. We achieve this by merging natural language processing, cognitive network science and AI-based image analysis. We focus on 4765 unique popular tweets in English or Italian about COVID-19 vaccines between December 2020 and March 2021. One popular English tweet contained in our data set was liked around 495,000 times, highlighting how popular tweets could cognitively affect large parts of the population. We investigate both text and multimedia content in tweets and build a cognitive network of syntactic/semantic associations in messages, including emotional cues and pictures. This network representation indicates how online users linked ideas in social discourse and framed vaccines along specific semantic/emotional content. The English semantic frame of “vaccine” was highly polarised between trust/anticipation (towards the vaccine as a scientific asset saving lives) and anger/sadness (mentioning critical issues with dose administering). Semantic associations between “vaccine,” “hoax” and conspiratorial jargon indicated the persistence of conspiracy theories about vaccines in extremely popular English posts; interestingly, these were absent in Italian messages. Popular tweets with images of people wearing face masks used language that lacked the trust and joy found in tweets showing people with no masks. This difference indicates a negative effect attributed to face-covering in social discourse. Behavioural analysis revealed a tendency for users to share content eliciting joy, sadness and disgust and to like sad messages less. Both patterns indicate an interplay between emotions and content diffusion beyond sentiment. After its suspension in mid-March 2021, “AstraZeneca” was associated with trustful language driven by experts. After the deaths of a small number of vaccinated people in mid-March, popular Italian tweets framed “vaccine” by crucially replacing earlier levels of trust with deep sadness. Our results stress how cognitive networks and innovative multimedia processing open new ways for reconstructing online perceptions about vaccines and trust.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-12
      DOI: 10.3390/bdcc6020052
      Issue No: Vol. 6, No. 2 (2022)
       
  • BDCC, Vol. 6, Pages 53: Knowledge Modelling and Learning through Cognitive
           Networks

    • Authors: Massimo Stella, Yoed N. Kenett
      First page: 53
      Abstract: Knowledge modelling is a growing field at the fringe of computer science, psychology and network science [...]
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-13
      DOI: 10.3390/bdcc6020053
      Issue No: Vol. 6, No. 2 (2022)
       
  • BDCC, Vol. 6, Pages 54: A New Comparative Study of Dimensionality
           Reduction Methods in Large-Scale Image Retrieval

    • Authors: Mohammed Amin Belarbi, Saïd Mahmoudi, Ghalem Belalem, Sidi Ahmed Mahmoudi, Aurélie Cools
      First page: 54
      Abstract: Indexing images by content is one of the most used computer vision methods, where various techniques are used to extract visual characteristics from images. The deluge of data surrounding us, due to the heavy use of social media and diverse media acquisition systems, has created a major challenge for classical multimedia processing systems. This problem is referred to as the ‘curse of dimensionality’. In the literature, several methods have been used to decrease the high dimension of features, including principal component analysis (PCA) and locality sensitive hashing (LSH). Some methods, such as the VA-File or binary trees, can be used to accelerate the search phase. In this paper, we propose an efficient approach that exploits three particular methods: PCA and LSH for dimensionality reduction, and the VA-File method to accelerate the search phase. This combined approach is fast and can be used for high-dimensionality features. Indeed, our method consists of three phases: (1) image indexing with the SIFT and SURF algorithms, (2) compressing the data using LSH and PCA, and (3) finally launching the image retrieval process, which is accelerated by using a VA-File approach.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-13
      DOI: 10.3390/bdcc6020054
      Issue No: Vol. 6, No. 2 (2022)
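
      A minimal sketch of the dimensionality-reduction side of the pipeline summarized above: descriptors are compressed with PCA and bucketed with a random-projection LSH so that a query only scans one bucket. SIFT/SURF extraction and the VA-File index are omitted, and the random matrix stands in for real image descriptors.

        # Compress descriptors with PCA, then bucket them with random-projection LSH.
        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        descriptors = rng.normal(size=(1000, 128))   # stand-in for SIFT descriptors

        pca = PCA(n_components=32)
        reduced = pca.fit_transform(descriptors)

        planes = rng.normal(size=(32, 16))           # 16 random hyperplanes for LSH
        codes = reduced @ planes > 0                 # sign pattern = binary hash

        buckets = {}
        for i, code in enumerate(codes):
            buckets.setdefault(tuple(code), []).append(i)

        # Query: reduce, hash, then scan only the matching bucket.
        qcode = tuple((pca.transform(descriptors[:1]) @ planes > 0)[0])
        candidates = buckets.get(qcode, [])
        print(len(candidates), "candidates instead of", len(descriptors))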
       
  • BDCC, Vol. 6, Pages 55: Virtual Reality Adaptation Using Electrodermal
           Activity to Support the User Experience

    • Authors: Francesco Chiossi, Robin Welsch, Steeven Villa, Lewis Chuang, Sven Mayer
      First page: 55
      Abstract: Virtual reality is increasingly used for tasks such as work and education. Thus, rendering scenarios that neither interfere with such goals nor degrade the user experience is becoming progressively more relevant. We present a physiologically adaptive system that optimizes the virtual environment based on physiological arousal, i.e., electrodermal activity. We investigated the usability of the adaptive system in a simulated social virtual reality scenario. Participants completed an n-back task (primary) and a visual detection (secondary) task. Here, we adapted the visual complexity of the secondary task, in the form of the number of non-player characters it displayed, so as not to hinder accomplishment of the primary task. We show that an adaptive virtual reality can improve users’ comfort by adapting task complexity to physiological arousal. Our findings suggest that physiologically adaptive virtual reality systems can improve users’ experience in a wide range of scenarios.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-13
      DOI: 10.3390/bdcc6020055
      Issue No: Vol. 6, No. 2 (2022)
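
      The adaptation principle in the abstract above can be caricatured as a feedback loop: when arousal leaves a target band, the number of non-player characters is adjusted. A toy sketch follows; the arousal band, the adjustment step, and the random stand-in for the EDA sensor are all invented for illustration and are not the study's parameters.

        # Toy control loop: adapt scene complexity (NPC count) to electrodermal activity.
        import random

        TARGET_LOW, TARGET_HIGH = 2.0, 4.0   # assumed arousal band (microsiemens)

        def read_eda():
            # Stand-in for a real EDA sensor stream.
            return random.uniform(1.0, 5.0)

        npcs = 10
        for step in range(20):
            eda = read_eda()
            if eda > TARGET_HIGH and npcs > 0:
                npcs -= 1        # over-aroused: simplify the secondary task
            elif eda < TARGET_LOW:
                npcs += 1        # under-aroused: allow more visual complexity
            print(f"step={step:2d} eda={eda:.2f} npcs={npcs}")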
       
  • BDCC, Vol. 6, Pages 56: A Better Mechanistic Understanding of Big Data
           through an Order Search Using Causal Bayesian Networks

    • Authors: Changwon Yoo, Efrain Gonzalez, Zhenghua Gong, Deodutta Roy
      First page: 56
      Abstract: Every year, biomedical data is increasing at an alarming rate and is being collected from many different sources, such as hospitals (clinical Big Data), laboratories (genomic and proteomic Big Data), and the internet (online Big Data). This article presents and evaluates a practical causal discovery algorithm that uses modern statistical, machine learning, and informatics approaches that have been used in the learning of causal relationships from biomedical Big Data, which in turn integrates clinical, omics (genomic and proteomic), and environmental aspects. The learning of causal relationships from data using graphical models does not address the hidden (unknown or not measured) mechanisms that are inherent to most measurements and analyses. Also, many algorithms lack practical applicability since they do not incorporate current mechanistic knowledge. This paper proposes a practical causal discovery algorithm using causal Bayesian networks to gain a better understanding of the underlying mechanistic process that generated the data. The algorithm utilizes model averaging techniques such as searching through a relative order (e.g., if gene A is regulating gene B, then we can say that gene A is of a higher order than gene B) and incorporates relevant prior mechanistic knowledge to guide the Markov chain Monte Carlo search through the order. The algorithm was evaluated by testing its performance on datasets generated from the ALARM causal Bayesian network. Out of the 37 variables in the ALARM causal Bayesian network, two sets of nine were chosen and the observations for those variables were provided to the algorithm. The performance of the algorithm was evaluated by comparing its prediction with the generating causal mechanism. The 28 variables that were not in use are referred to as hidden variables, and they allowed for the evaluation of the algorithm’s ability to predict hidden confounded causal relationships. The algorithm’s predictive performance was also compared with that of other causal discovery algorithms. The results show that incorporating order information provides a better mechanistic understanding even when hidden confounded causes are present. The prior mechanistic knowledge incorporated in the Markov chain Monte Carlo search led to the better discovery of causal relationships when hidden variables were involved in generating the simulated data.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-17
      DOI: 10.3390/bdcc6020056
      Issue No: Vol. 6, No. 2 (2022)
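
      The order-search idea above can be sketched as a Metropolis-Hastings walk over variable orders, with adjacent swaps as proposals; the score function below is only a placeholder for a real causal Bayesian network order score combined with prior mechanistic knowledge.

        # Metropolis-Hastings walk over variable orders (the score function is a
        # placeholder for a real order score plus mechanistic priors).
        import math
        import random

        variables = ["geneA", "geneB", "geneC", "geneD"]

        def order_score(order):
            # Placeholder: reward orders consistent with an assumed prior that
            # geneA regulates geneB (i.e., geneA appears before geneB).
            return 1.0 if order.index("geneA") < order.index("geneB") else 0.0

        order = variables[:]
        current = order_score(order)
        for _ in range(1000):
            i = random.randrange(len(order) - 1)
            proposal = order[:]
            proposal[i], proposal[i + 1] = proposal[i + 1], proposal[i]  # swap neighbours
            new = order_score(proposal)
            # Accept with Metropolis probability exp(new - current).
            if new >= current or random.random() < math.exp(new - current):
                order, current = proposal, new

        print(order)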
       
  • BDCC, Vol. 6, Pages 57: Sentiment Analysis of Emirati Dialects

    • Authors: Al Shamsi, Abdallah
      First page: 57
      Abstract: Recently, extensive studies and research in the Arabic Natural Language Processing (ANLP) field have been conducted for text classification and sentiment analysis. Moreover, the number of studies that target Arabic dialects has also increased. In this research paper, we constructed the first manually annotated dataset of the Emirati dialect for the Instagram platform. The constructed dataset consisted of more than 70,000 comments, mostly written in the Emirati dialect. We annotated the comments in the dataset based on text polarity, dividing them into positive, negative, and neutral categories; in total, 70,000 comments were annotated. Moreover, the dataset was also annotated for the dialect type, categorized into the Emirati dialect, Arabic dialects, and MSA. Preprocessing and TF-IDF feature extraction approaches were applied to the constructed Emirati dataset to prepare the dataset for the sentiment analysis experiment and improve its classification performance. The sentiment analysis experiment was carried out on both balanced and unbalanced datasets using several machine learning classifiers. The evaluation metrics of the sentiment analysis experiments were accuracy, recall, precision, and F-measure. The results reported that the best accuracy result was 80.80%, and it was achieved when the ensemble model was applied for the sentiment classification of the unbalanced dataset.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-17
      DOI: 10.3390/bdcc6020057
      Issue No: Vol. 6, No. 2 (2022)
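
      A minimal sketch of the classification setup described above: TF-IDF features feeding a soft-voting ensemble of classic scikit-learn classifiers; the four toy comments stand in for the annotated Emirati-dialect dataset.

        # TF-IDF features plus a soft-voting ensemble of classic classifiers.
        from sklearn.ensemble import RandomForestClassifier, VotingClassifier
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        comments = ["great product", "terrible service", "ok I guess", "love it"]
        labels = ["positive", "negative", "neutral", "positive"]  # toy annotations

        ensemble = VotingClassifier(
            estimators=[
                ("lr", LogisticRegression(max_iter=1000)),
                ("nb", MultinomialNB()),
                ("rf", RandomForestClassifier(n_estimators=100)),
            ],
            voting="soft",   # average predicted probabilities
        )
        model = make_pipeline(TfidfVectorizer(), ensemble)
        model.fit(comments, labels)
        print(model.predict(["really great service"]))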
       
  • BDCC, Vol. 6, Pages 58: COVID-19 Tweets Classification Based on a Hybrid
           Word Embedding Method

    • Authors: Yosra Didi, Ahlam Walha, Ali Wali
      First page: 58
      Abstract: In March 2020, the World Health Organisation declared that COVID-19 was a new pandemic. This deadly virus spread and affected many countries in the world. During the outbreak, social media platforms such as Twitter contributed valuable and massive amounts of data to better assess health-related decision making. Therefore, we propose that users’ sentiments could be analysed with the application of effective supervised machine learning approaches to predict disease prevalence and provide early warnings. The collected tweets were prepared for preprocessing and categorised into: negative, positive, and neutral. In the second phase, different features were extracted from the posts by applying several widely used techniques, such as TF-IDF, Word2Vec, GloVe, and FastText, to build feature datasets. The novelty of this study is based on hybrid feature extraction, where we combined syntactic features (TF-IDF) with semantic features (FastText and GloVe) to represent posts accurately, which helps in improving the classification process. Experimental results show that FastText combined with TF-IDF performed better with SVM than the other models. SVM outperformed the other models, achieving an accuracy of 88.72%, followed by XGBoost with an accuracy score of 85.29%. This study shows that the hybrid methods proved their capability to extract features from the tweets and increase classification performance.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-18
      DOI: 10.3390/bdcc6020058
      Issue No: Vol. 6, No. 2 (2022)
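
      A minimal sketch of the hybrid feature idea: each tweet is represented by its TF-IDF vector (syntactic) concatenated with the mean of its FastText word vectors (semantic), then an SVM is trained. Here FastText is trained on the toy corpus itself with gensim, whereas the study used pretrained embeddings.

        # Hybrid features: TF-IDF vector concatenated with mean FastText vector.
        import numpy as np
        from gensim.models import FastText
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.svm import SVC

        tweets = ["stay home stay safe", "hospitals are overwhelmed",
                  "vaccines bring hope", "cases rising again"]
        labels = ["neutral", "negative", "positive", "negative"]  # toy labels

        tokens = [t.split() for t in tweets]
        ft = FastText(sentences=tokens, vector_size=50, window=3,
                      min_count=1, epochs=20)

        tfidf = TfidfVectorizer()
        syntactic = tfidf.fit_transform(tweets).toarray()
        semantic = np.array([np.mean([ft.wv[w] for w in sent], axis=0)
                             for sent in tokens])

        features = np.hstack([syntactic, semantic])   # hybrid representation
        clf = SVC(kernel="linear").fit(features, labels)
        print(clf.predict(features[:1]))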
       
  • BDCC, Vol. 6, Pages 59: The Predictive Power of a Twitter User’s
           Profile on Cryptocurrency Popularity

    • Authors: Maria Trigka, Andreas Kanavos, Elias Dritsas, Gerasimos Vonitsanos, Phivos Mylonas
      First page: 59
      Abstract: Microblogging has become an extremely popular communication tool among Internet users worldwide. Millions of users daily share a huge amount of information related to various aspects of their lives, which makes the respective sites a very important source of data for analysis. Bitcoin (BTC) is a decentralized cryptographic currency and is comparable to most widely known currencies in that it is influenced by socially formed opinions, regardless of whether those opinions are considered valid. This work aims to assess the importance of Twitter users’ profiles in predicting a cryptocurrency’s popularity. More specifically, our analysis focused on the user influence, captured by different Twitter features (such as the number of followers, retweets, lists) and tweet sentiment scores as the main components of measuring popularity. Moreover, the Spearman, Pearson, and Kendall Correlation Coefficients are applied as post-hoc procedures to support hypotheses about the correlation between a user’s influence and the aforementioned features. Tweet sentiment scoring (as positive or negative) was performed with the aid of Valence Aware Dictionary and Sentiment Reasoner (VADER) for a number of tweets fetched within a concrete time period. Finally, the Granger causality test was employed to evaluate the statistical significance of various features’ time series in popularity prediction and to identify the most influential variable for predicting future values of cryptocurrency popularity.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-20
      DOI: 10.3390/bdcc6020059
      Issue No: Vol. 6, No. 2 (2022)
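
      Two steps of the analysis above, sketched with off-the-shelf libraries: VADER scores a tweet's sentiment, and statsmodels' Granger causality test checks whether a sentiment series helps predict a popularity series. The two series below are synthetic stand-ins, not the study's data.

        # VADER sentiment scoring plus a Granger causality test between series.
        import numpy as np
        import pandas as pd
        from statsmodels.tsa.stattools import grangercausalitytests
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

        analyzer = SentimentIntensityAnalyzer()
        print(analyzer.polarity_scores("BTC to the moon!"))  # compound in [-1, 1]

        # Synthetic daily series: does sentiment Granger-cause popularity?
        rng = np.random.default_rng(1)
        sentiment = rng.normal(size=100)
        popularity = np.roll(sentiment, 2) + rng.normal(scale=0.5, size=100)

        df = pd.DataFrame({"popularity": popularity, "sentiment": sentiment})
        # Tests whether the 2nd column helps predict the 1st, up to 3 lags.
        grangercausalitytests(df[["popularity", "sentiment"]], maxlag=3)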
       
  • BDCC, Vol. 6, Pages 60: Earthquake Insurance in California, USA: What Does
            Community-Generated Big Data Reveal to Us?

    • Authors: Fabrizio Terenzio Gizzi, Maria Rosaria Potenza
      First page: 60
      Abstract: California has a high seismic hazard, as many historical and recent earthquakes remind us. To deal with potential future damaging earthquakes, a voluntary insurance system for residential properties is in force in the state. However, the insurance penetration rate is quite low. Bearing this in mind, the aim of this article is to ascertain whether Big Data can provide policymakers and stakeholders with useful information in view of future action plans on earthquake coverage. Therefore, we extracted and analyzed the online search interest in earthquake insurance over time (2004–2021) through Google Trends (GT), a website that explores the popularity of top search queries in Google Search across various regions and languages. We found that (1) the triggering of online searches stems primarily from the occurrence of earthquakes in California and neighboring areas as well as overseas regions, thus suggesting that the interest of users was guided by both direct and vicarious earthquake experiences. However, other natural hazards also come to people’s notice; (2) the period of heightened online attention lasts from one day to one week, depending on the magnitude of the earthquakes, the place where they occur, the temporal proximity of other natural hazards, and so on; (3) users interested in earthquake insurance are also attentive to knowing the features of the policies, first among which is the price of coverage, followed by its worth and practical benefits; (4) online interest in the time span analyzed fits fairly well with the real insurance policy underwritings recorded over the years. Based on the research outcomes, we can propose the establishment of an observatory to monitor online behavior, suitable for supporting well-timed and geographically targeted information and communication action plans.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-20
      DOI: 10.3390/bdcc6020060
      Issue No: Vol. 6, No. 2 (2022)
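
      A minimal sketch of retrieving the same kind of search-interest series via the community pytrends client for Google Trends; the term, region, and timeframe mirror the study, but the library choice is an assumption rather than the authors' tooling, and Google may rate-limit requests.

        # Fetch Google Trends interest over time for "earthquake insurance" in California.
        from pytrends.request import TrendReq

        pytrends = TrendReq(hl="en-US")
        pytrends.build_payload(
            ["earthquake insurance"],
            geo="US-CA",
            timeframe="2004-01-01 2021-12-31",
        )
        interest = pytrends.interest_over_time()   # pandas DataFrame, 0-100 scale
        print(interest.head())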
       
  • BDCC, Vol. 6, Pages 61: A Novel Method of Exploring the Uncanny Valley in
           Avatar Gender(Sex) and Realism Using Electromyography

    • Authors: Jacqueline D. Bailey, Karen L. Blackmore
      First page: 61
      Abstract: Despite the variety of applications that use avatars (virtual humans), how end-users perceive avatars is not fully understood, and accurately measuring these perceptions remains a challenge. To measure end-user responses to avatars more accurately, this pilot study uses a novel methodology which aims to examine and categorize end-user facial electromyography (f-EMG) responses. These responses (n = 92) can be categorized as pleasant, unpleasant, and neutral using control images sourced from the International Affective Picture System (IAPS). This methodology can also account for variability between participant responses to avatars. The novel methodology adopted here can assist in comparisons of avatars, such as gender(sex)-based differences. To examine these gender(sex) differences, participant responses to an avatar can be categorized as either pleasant, unpleasant, neutral or a combination. Although other factors such as age may unconsciously affect the participant responses, age was not directly considered in this work. This method may allow avatar developers to better understand how end-users objectively perceive an avatar. The recommendation of this methodology is to aim for an avatar that returns a pleasant, neutral, or pleasant-neutral response, unless an unpleasant response is intended. This methodology demonstrates a novel and useful way forward to address some of the known variability issues found in f-EMG responses, and in responses to avatar realism and uncanniness, that can be used to examine gender(sex) perceptions.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-05-30
      DOI: 10.3390/bdcc6020061
      Issue No: Vol. 6, No. 2 (2022)
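
      The categorization logic can be caricatured as a nearest-centroid assignment: an avatar's f-EMG response is compared against the mean responses elicited by pleasant, unpleasant, and neutral IAPS control images, and the closest category wins. The single-feature simplification and all numbers below are illustrative assumptions, not the study's measurements.

        # Assign an avatar response to the closest control-image category.
        import numpy as np

        # Assumed mean f-EMG responses to IAPS control images (arbitrary units).
        controls = {
            "pleasant": np.array([0.8]),
            "unpleasant": np.array([-0.7]),
            "neutral": np.array([0.0]),
        }

        def categorize(response):
            return min(controls, key=lambda c: np.linalg.norm(response - controls[c]))

        avatar_response = np.array([0.15])
        print(categorize(avatar_response))   # -> "neutral"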
       
  • BDCC, Vol. 6, Pages 62: Synthesizing a Talking Child Avatar to Train
           Interviewers Working with Maltreated Children

    • Authors: Pegah Salehi, Syed Zohaib Hassan, Myrthe Lammerse, Saeed Shafiee Sabet, Ingvild Riiser, Ragnhild Klingenberg Røed, Miriam S. Johnson, Vajira Thambawita, Steven A. Hicks, Martine Powell, Michael E. Lamb, Gunn Astrid Baugerud, Pål Halvorsen, Michael A. Riegler
      First page: 62
      Abstract: When responding to allegations of child sexual, physical, and psychological abuse, Child Protection Service (CPS) workers and police personnel need to elicit detailed and accurate accounts of the abuse to assist in decision-making and prosecution. Current research emphasizes the importance of the interviewer’s ability to follow empirically based guidelines. In doing so, it is essential to implement economical and scientific training courses for interviewers. Due to recent advances in artificial intelligence, we propose to generate a realistic and interactive child avatar, aiming to mimic a child. Our ongoing research involves the integration and interaction of different components with each other, including how to handle the language, auditory, emotional, and visual components of the avatar. This paper presents three subjective studies that investigate and compare various state-of-the-art methods for implementing multiple aspects of the child avatar. The first user study evaluates the whole system and shows that the system is well received by the expert and highlights the importance of its realism. The second user study investigates the emotional component and how it can be integrated with video and audio, and the third user study investigates realism in the auditory and visual components of the avatar created by different methods. The insights and feedback from these studies have contributed to the refined and improved architecture of the child avatar system which we present here.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-06-01
      DOI: 10.3390/bdcc6020062
      Issue No: Vol. 6, No. 2 (2022)
       
  • BDCC, Vol. 6, Pages 63: Social Media Analytics as a Tool for Cultural
           Spaces—The Case of Twitter Trending Topics

    • Authors: Vassilis Poulopoulos, Manolis Wallace
      First page: 63
      Abstract: We are entering an era in which online personalities and personas will grow faster and faster. People are tending to use the Internet, and social media especially, more frequently and for a wider variety of purposes. In parallel, a number of cultural spaces have already decided to invest in marketing and message spreading through the web and the media. Growing their audience, or locating the appropriate group of people to share their information with, remains a tedious task within the chaotic environment of the Internet. The investment is mainly financial, usually large, and directed to advertisements. Still, there is much room for research and investment in analytics that can provide evidence on how word spreads and help locate groups of people interested in specific information, trending topics, and influencers. In this paper, we present a part of a national project that aims to perform an analysis of Twitter’s trending topics. The main scope of the analysis is to provide a basic ordering of the topics based on their “importance”. Based on this, we clarify how cultural institutions can benefit from such an analysis in order to empower their online presence.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-06-02
      DOI: 10.3390/bdcc6020063
      Issue No: Vol. 6, No. 2 (2022)
       
  • BDCC, Vol. 6, Pages 64: Decision-Making Using Big Data Relevant to
           Sustainable Development Goals (SDGs)

    • Authors: Saman Fattahi, Sharifu Ura, Muhammad Noor-E-Alam
      First page: 64
      Abstract: Policymakers, practitioners, and researchers around the globe have been acting in a coordinated manner, yet remaining independent, to achieve the seventeen Sustainable Development Goals (SDGs) defined by the United Nations. Remarkably, SDG-centric activities have manifested a huge information silo known as big data. In most cases, a relevant subset of big data is visualized using several two-dimensional plots. These plots are then used to decide a course of action for achieving the relevant SDGs, and the whole process remains rather informal. Consequently, the question of how to make a formal decision using big data-generated two-dimensional plots is a critical one. This article fills this gap by presenting a novel decision-making approach (method and tool). The approach formally makes decisions where the decision-relevant information is two-dimensional plots rather than numerical data. The efficacy of the proposed approach is demonstrated by conducting two case studies relevant to SDG 12 (responsible consumption and production). The first case study confirms whether or not the proposed decision-making approach produces reliable results. In this case study, datasets of wooden and polymeric materials regarding two eco-indicators (CO2 footprint and water usage) are represented using two two-dimensional plots. The plots show that wooden and polymeric materials are indifferent in water usage, whereas wooden materials are better than polymeric materials in terms of CO2 footprint. The proposed decision-making approach correctly captures this fact and correctly ranks the materials. For the other case study, three materials (mild steel, aluminum alloys, and magnesium alloys) are ranked using six criteria (strength, modulus of elasticity, cost, density, CO2 footprint, and water usage) and their relative weights. The datasets relevant to the six criteria are made available using three two-dimensional plots. The plots show the relative positions of mild steel, aluminum alloys, and magnesium alloys. The proposed decision-making approach correctly captures the decision-relevant information of these three plots and correctly ranks the materials. Thus, the outcomes of this article can help those who wish to develop pragmatic decision support systems leveraging the capacity of big data in fulfilling SDGs.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-06-05
      DOI: 10.3390/bdcc6020064
      Issue No: Vol. 6, No. 2 (2022)
       
  • BDCC, Vol. 6, Pages 65: Analysis and Prediction of User Sentiment on
           COVID-19 Pandemic Using Tweets

    • Authors: Nilufa Yeasmin, Nosin Ibna Mahbub, Mrinal Kanti Baowaly, Bikash Chandra Singh, Zulfikar Alom, Zeyar Aung, Mohammad Abdul Azim
      First page: 65
      Abstract: The novel coronavirus disease (COVID-19) has dramatically affected people’s daily lives worldwide. More specifically, since there is still insufficient access to vaccines and no straightforward, reliable treatment for COVID-19, every country has taken the appropriate precautions (such as physical separation, masking, and lockdown) to combat this extremely infectious disease. As a result, people spend much time on online social networking platforms (e.g., Facebook, Reddit, LinkedIn, and Twitter) and express their feelings and thoughts regarding COVID-19. Twitter is a popular social networking platform that enables anyone to share short messages known as tweets. This research used Twitter datasets to explore user sentiment from the COVID-19 perspective. We used a dataset of COVID-19 Twitter posts from nine states in the United States for fifteen days (from 1 April 2020, to 15 April 2020) to analyze user sentiment. We focus on exploiting machine learning (ML) and deep learning (DL) approaches to classify user sentiments regarding COVID-19. First, we labeled the dataset into three groups based on the sentiment values, namely positive, negative, and neutral, to train some popular ML algorithms and DL models to predict the user concern label on COVID-19. Additionally, we have compared traditional bag-of-words and term frequency-inverse document frequency (TF-IDF) for representing the text as numeric vectors in ML techniques. Furthermore, we have contrasted the encoding methodology and various word embedding schemes, such as the word to vector (Word2Vec) and global vectors for word representation (GloVe) versions, with three sets of dimensions (100, 200, and 300), for representing the text as numeric vectors for DL approaches. Finally, we compared COVID-19 infection cases and COVID-19-related tweets during the COVID-19 pandemic.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-06-10
      DOI: 10.3390/bdcc6020065
      Issue No: Vol. 6, No. 2 (2022)
       
  • BDCC, Vol. 6, Pages 66: CompositeView: A Network-Based Visualization Tool

    • Authors: Stephen A. Allegri, Kevin McCoy, Cassie S. Mitchell
      First page: 66
      Abstract: Large networks are quintessential to bioinformatics, knowledge graphs, social network analysis, and graph-based learning. CompositeView is a Python-based open-source application that improves interactive complex network visualization and extraction of actionable insight. CompositeView utilizes specifically formatted input data to calculate composite scores and display them using the Cytoscape component of Dash. Composite scores are defined representations of smaller sets of conceptually similar data that, when combined, generate a single score to reduce information overload. Visualized interactive results are user-refined via filtering elements such as node value and edge weight sliders and graph manipulation options (e.g., node color and layout spread). The primary difference between CompositeView and other network visualization tools is its ability to auto-calculate and auto-update composite scores as the user interactively filters or aggregates data. CompositeView was developed to visualize network relevance rankings, but it performs well with non-network data. Three disparate CompositeView use cases are shown: relevance rankings from SemNet 2.0, an open-source knowledge graph relationship ranking software for biomedical literature-based discovery; Human Development Index (HDI) data; and the Framingham cardiovascular study. CompositeView was stress tested to construct reference benchmarks that define breadth and size of data effectively visualized. Finally, CompositeView is compared to Excel, Tableau, Cytoscape, neo4j, NodeXL, and Gephi.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-06-14
      DOI: 10.3390/bdcc6020066
      Issue No: Vol. 6, No. 2 (2022)
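
      A minimal sketch of the Dash-plus-Cytoscape combination the tool builds on: a tiny app rendering a two-node network whose labels carry composite scores. This uses the generic dash-cytoscape API and is not CompositeView's own code.

        # Minimal Dash app rendering a network with the Cytoscape component.
        import dash
        import dash_cytoscape as cyto
        from dash import html

        app = dash.Dash(__name__)
        app.layout = html.Div([
            cyto.Cytoscape(
                id="network",
                layout={"name": "cose"},          # force-directed layout spread
                style={"width": "100%", "height": "400px"},
                elements=[
                    {"data": {"id": "a", "label": "concept A (score 0.9)"}},
                    {"data": {"id": "b", "label": "concept B (score 0.4)"}},
                    {"data": {"source": "a", "target": "b", "weight": 2}},
                ],
            )
        ])

        if __name__ == "__main__":
            app.run(debug=True)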
       
  • BDCC, Vol. 6, Pages 67: Iris Liveness Detection Using Multiple Deep
           Convolution Networks

    • Authors: Smita Khade, Shilpa Gite, Biswajeet Pradhan
      First page: 67
      Abstract: In the recent decade, comprehensive research has been carried out in terms of promising biometrics modalities regarding humans’ physical features for person recognition. This work focuses on iris characteristics and traits for person identification and iris liveness detection. This study used five pre-trained networks, including VGG-16, Inceptionv3, Resnet50, Densenet121, and EfficientNetB7, to recognize iris liveness using transfer learning techniques. These models are compared using three state-of-the-art biometric databases: the LivDet-Iris 2015 dataset, IIITD contact dataset, and ND Iris3D 2020 dataset. Validation accuracy, loss, precision, recall, F1-score, APCER (attack presentation classification error rate), NPCER (normal presentation classification error rate), and ACER (average classification error rate) were used to evaluate the performance of all pre-trained models. According to the observational data, these models have a considerable ability to transfer their experience to the field of iris recognition and to recognize the nanostructures within the iris region. Using the ND Iris3D 2020 dataset, the EfficientNetB7 model achieved 99.97% identification accuracy. Experiments show that pre-trained models outperform other current iris biometrics variants.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-06-15
      DOI: 10.3390/bdcc6020067
      Issue No: Vol. 6, No. 2 (2022)
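
      A minimal sketch of the transfer-learning setup: a pretrained ImageNet backbone is frozen and a fresh binary head is attached for live-versus-spoof iris classification. ResNet50 is shown as one of the five backbones; the input shape and hyperparameters are illustrative, not the paper's.

        # Transfer learning: frozen ImageNet backbone + new live/spoof head.
        import tensorflow as tf

        base = tf.keras.applications.ResNet50(
            weights="imagenet", include_top=False, input_shape=(224, 224, 3)
        )
        base.trainable = False                       # keep pretrained features

        model = tf.keras.Sequential([
            base,
            tf.keras.layers.GlobalAveragePooling2D(),
            tf.keras.layers.Dense(1, activation="sigmoid"),  # live vs. spoof
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])
        model.summary()
        # model.fit(train_ds, validation_data=val_ds, epochs=10)  # with an iris dataset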
       
  • BDCC, Vol. 6, Pages 68: Áika: A Distributed Edge System for AI
           Inference

    • Authors: Joakim Aalstad Alslie, Aril Bernhard Ovesen, Tor-Arne Schmidt Nordmo, Håvard Dagenborg Johansen, Pål Halvorsen, Michael Alexander Riegler, Dag Johansen
      First page: 68
      Abstract: Video monitoring and surveillance of commercial fisheries in the world’s oceans have been proposed by the governing bodies of several nations as a response to crimes such as overfishing. Traditional video monitoring systems may not be suitable due to limitations in the offshore fishing environment, including low bandwidth, unstable satellite network connections and issues of preserving the privacy of crew members. In this paper, we present Áika, a robust system for executing distributed Artificial Intelligence (AI) applications on the edge. Áika provides engineers and researchers with several building blocks in the form of Agents, which enable the expression of computation pipelines and distributed applications with robustness and privacy guarantees. Agents are continuously monitored by dedicated monitoring nodes, and provide applications with a distributed checkpointing and replication scheme. Áika is designed for monitoring and surveillance in privacy-sensitive and unstable offshore environments, where flexible access policies at the storage level can provide privacy guarantees for data transfer and access.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-06-17
      DOI: 10.3390/bdcc6020068
      Issue No: Vol. 6, No. 2 (2022)
       
  • BDCC, Vol. 6, Pages 3: Analyzing Political Polarization on Social Media by
           Deleting Bot Spamming

    • Authors: Riccardo Cantini, Fabrizio Marozzo, Domenico Talia, Paolo Trunfio
      First page: 3
      Abstract: Social media platforms are part of everyday life, allowing the interconnection of people around the world in large discussion groups relating to every topic, including important social or political issues. Therefore, social media have become a valuable source of information-rich data, commonly referred to as Social Big Data, effectively exploitable to study the behavior of people, their opinions, moods, interests and activities. However, these powerful communication platforms can be also used to manipulate conversation, polluting online content and altering the popularity of users, through spamming activities and misinformation spreading. Recent studies have shown the use on social media of automated entities, known as social bots, which pose as legitimate users by imitating human behavior with the aim of influencing discussions of any kind, including political issues. In this paper we present a new methodology, namely TIMBRE (Time-aware opInion Mining via Bot REmoval), aimed at discovering the polarity of social media users during election campaigns characterized by the rivalry of political factions. This methodology is temporally aware and relies on a keyword-based classification of posts and users. Moreover, it recognizes and filters out data produced by social media bots, which aim to alter public opinion about political candidates, thus avoiding heavily biased information. The proposed methodology has been applied to a case study that analyzes the polarization of a large number of Twitter users during the 2016 US presidential election. The achieved results show the benefits brought by both removing bots and taking into account temporal aspects in the forecasting process, revealing the high accuracy and effectiveness of the proposed approach. Finally, we investigated how the presence of social bots may affect political discussion by studying the 2016 US presidential election. Specifically, we analyzed the main differences between human and artificial political support, also estimating the influence of social bots on legitimate users.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-04
      DOI: 10.3390/bdcc6010003
      Issue No: Vol. 6, No. 1 (2022)
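
      A toy sketch of the keyword-based classification step TIMBRE relies on: each post is labeled by whichever faction's keywords dominate, and a user's polarity is the majority label over their posts. The keyword lists and posts below are invented for illustration and are not TIMBRE's actual dictionaries.

        # Keyword-based polarity: label posts by which faction's keywords dominate.
        from collections import Counter

        FACTION_KEYWORDS = {   # illustrative keyword lists
            "candidate_A": {"maga", "trump2016"},
            "candidate_B": {"imwithher", "clinton2016"},
        }

        def post_polarity(text):
            words = {w.strip("#") for w in text.lower().split()}
            hits = {f: len(words & kws) for f, kws in FACTION_KEYWORDS.items()}
            best = max(hits, key=hits.get)
            return best if hits[best] > 0 else None

        user_posts = ["Big rally today MAGA", "Vote early #imwithher", "MAGA forever"]
        votes = Counter(p for p in map(post_polarity, user_posts) if p)
        print(votes.most_common(1))   # the user's dominant polarity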
       
  • BDCC, Vol. 6, Pages 4: Analyzing COVID-19 Medical Papers Using Artificial
           Intelligence: Insights for Researchers and Medical Professionals

    • Authors: Dmitry Soshnikov, Tatiana Petrova, Vickie Soshnikova, Andrey Grunin
      First page: 4
      Abstract: Since the beginning of the COVID-19 pandemic almost two years ago, there have been more than 700,000 scientific papers published on the subject. An individual researcher cannot possibly get acquainted with such a huge text corpus and, therefore, some help from artificial intelligence (AI) is highly needed. We propose an AI-based tool to help researchers navigate the medical papers collections in a meaningful way and extract some knowledge from scientific COVID-19 papers. The main idea of our approach is to get as much semi-structured information from the text corpus as possible, using named entity recognition (NER) with a model called PubMedBERT and the Text Analytics for Health service, then store the data in a NoSQL database for further fast processing and insight generation. Additionally, the contexts in which the entities were used (neutral or negative) are determined. Application of NLP and text-based emotion detection (TBED) methods to the COVID-19 text corpus allows us to gain insights on important issues of diagnosis and treatment (such as changes in medical treatment over time, joint treatment strategies using several medications, and the connection between signs and symptoms of coronavirus).
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-05
      DOI: 10.3390/bdcc6010004
      Issue No: Vol. 6, No. 1 (2022)
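
      A minimal sketch of the NER step with the Hugging Face pipeline API; the checkpoint name below is a placeholder for a PubMedBERT variant fine-tuned for biomedical NER, not an actual published model name, while the pipeline call and output structure are the standard transformers ones.

        # Biomedical named entity recognition with a transformers pipeline.
        from transformers import pipeline

        # Placeholder checkpoint: substitute a PubMedBERT model fine-tuned for NER.
        ner = pipeline(
            "token-classification",
            model="some-org/pubmedbert-finetuned-ner",   # hypothetical name
            aggregation_strategy="simple",               # merge word pieces into spans
        )

        text = "Patients received hydroxychloroquine and reported fever and cough."
        for entity in ner(text):
            print(entity["entity_group"], entity["word"], round(entity["score"], 3))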
       
  • BDCC, Vol. 6, Pages 5: A Hierarchical Hadoop Framework to Process
           Geo-Distributed Big Data

    • Authors: Giuseppe Di Modica, Orazio Tomarchio
      First page: 5
      Abstract: In the past twenty years, we have witnessed an unprecedented production of data worldwide that has generated a growing demand for computing resources and has stimulated the design of computing paradigms and software tools to efficiently and quickly obtain insights from such Big Data. State-of-the-art parallel computing techniques such as MapReduce guarantee high performance in scenarios where involved computing nodes are equally sized and clustered via broadband network links, and the data are co-located with the cluster of nodes. Unfortunately, the mentioned techniques have proven ineffective in geographically distributed scenarios, i.e., computing contexts where nodes and data are geographically distributed across multiple distant data centers. In the literature, researchers have proposed variants of the MapReduce paradigm that obtain awareness of the constraints imposed in those scenarios (such as the imbalance of nodes’ computing power and of interconnecting links) to enforce smart task scheduling strategies. We have designed a hierarchical computing framework in which a context-aware scheduler orchestrates computing tasks that leverage the potential of the vanilla Hadoop framework within each data center taking part in the computation. In this work, after presenting the features of the developed framework, we advocate the opportunity of fragmenting the data in a smart way so that the scheduler produces a fairer distribution of the workload among the computing tasks. To prove the concept, we implemented a software prototype of the framework and ran several experiments on a small-scale testbed. Test results are discussed in the last part of the paper.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-06
      DOI: 10.3390/bdcc6010005
      Issue No: Vol. 6, No. 1 (2022)
       
  • BDCC, Vol. 6, Pages 6: On Developing Generic Models for Predicting Student
           Outcomes in Educational Data Mining

    • Authors: Gomathy Ramaswami, Teo Susnjak, Anuradha Mathrani
      First page: 6
      Abstract: Poor academic performance of students is a concern in the educational sector, especially if it leads to students being unable to meet minimum course requirements. However, with timely prediction of students’ performance, educators can detect at-risk students, thereby enabling early interventions for supporting these students in overcoming their learning difficulties. To date, the majority of studies have taken the approach of developing individual prediction models that each target a single course. These models are tailored to specific attributes of each course amongst a very diverse set of possibilities. While this approach can yield accurate models in some instances, this strategy is associated with limitations. In many cases, overfitting can take place when course data is small or when new courses are devised. Additionally, maintaining a large suite of models per course is a significant overhead. This issue can be tackled by developing a generic and course-agnostic predictive model that captures more abstract patterns and is able to operate across all courses, irrespective of their differences. This study demonstrates how a generic predictive model can be developed that identifies at-risk students across a wide variety of courses. Experiments were conducted using a range of algorithms, with the generic model producing effective accuracy. The findings showed that the CatBoost algorithm performed the best on our dataset across the F-measure, ROC (receiver operating characteristic) curve and AUC scores; therefore, it is an excellent candidate algorithm for providing solutions on this domain given its capabilities to seamlessly handle categorical and missing data, which is frequently a feature in educational datasets.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-07
      DOI: 10.3390/bdcc6010006
      Issue No: Vol. 6, No. 1 (2022)
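
      A minimal sketch of the property that made CatBoost attractive here: it accepts categorical columns and missing values directly, without manual encoding or imputation. The toy frame stands in for real course activity data.

        # CatBoost handles categorical features and missing values natively.
        import pandas as pd
        from catboost import CatBoostClassifier

        df = pd.DataFrame({
            "course": ["math", "bio", "math", "cs"],       # categorical column
            "quiz_avg": [0.55, None, 0.91, 0.40],          # missing value kept as-is
            "logins": [3, 25, 40, 2],
            "at_risk": [1, 0, 0, 1],
        })

        model = CatBoostClassifier(iterations=50, verbose=False)
        model.fit(df[["course", "quiz_avg", "logins"]], df["at_risk"],
                  cat_features=["course"])
        print(model.predict(df[["course", "quiz_avg", "logins"]]))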
       
  • BDCC, Vol. 6, Pages 7: Infusing Autopoietic and Cognitive Behaviors into
            Digital Automata

    • Authors: Rao Mikkilineni
      First page: 7
      Abstract: All living beings use autopoiesis and cognition to manage their “life” processes from birth through death. Autopoiesis enables them to use the specification in their genomes to instantiate themselves using matter and energy transformations. They reproduce, replicate, and manage their stability. Cognition allows them to process information into knowledge and use it to manage the interactions between the various constituent parts within the system and the system’s interaction with the environment. Currently, various attempts are underway to make modern computers mimic the resilience and intelligence of living beings using symbolic and sub-symbolic computing. We discuss here the limitations of classical computer science for implementing autopoietic and cognitive behaviors in digital machines. We propose a new architecture applying the general theory of information (GTI) and pave the way to make digital automata mimic living organisms by exhibiting autopoiesis and cognitive behaviors. The new science, based on GTI, asserts that information is a fundamental constituent of the physical world and that living beings convert information into knowledge using physical structures that use matter and energy. Our proposal uses the tools derived from GTI to provide a common knowledge representation from existing symbolic and sub-symbolic computing structures to implement autopoiesis and cognitive behaviors.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-10
      DOI: 10.3390/bdcc6010007
      Issue No: Vol. 6, No. 1 (2022)
       
  • BDCC, Vol. 6, Pages 8: An Empirical Comparison of Portuguese and
           Multilingual BERT Models for Auto-Classification of NCM Codes in
           International Trade

    • Authors: Roberta Rodrigues de Lima, Anita M. R. Fernandes, James Roberto Bombasar, Bruno Alves da Silva, Paul Crocker, Valderi Reis Quietinho Leithardt
      First page: 8
      Abstract: Classification problems are common activities in many different domains and supervised learning algorithms have shown great promise in these areas. The classification of goods in international trade in Brazil represents a real challenge due to the complexity involved in assigning the correct category codes to a good, especially considering the tax penalties and legal implications of a misclassification. This work focuses on the training process of a classifier based on bidirectional encoder representations from transformers (BERT) for tax classification of goods with NCM codes, which constitute the official classification system for import and export products in Brazil. In particular, this article presents results from using a specific Portuguese-language-pretrained BERT model, as well as results from using a multilingual-pretrained BERT model. Experimental results show that the Portuguese model performed slightly better than the multilingual model, achieving an MCC of 0.8491, and confirm that the classifiers could be used to improve specialists’ performance in the classification of goods.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-10
      DOI: 10.3390/bdcc6010008
      Issue No: Vol. 6, No. 1 (2022)
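
      A minimal sketch of the model-loading side of such a classifier; neuralmind/bert-base-portuguese-cased is a publicly available Portuguese BERT used here as an assumed stand-in for the paper's checkpoint, and the label count is purely illustrative.

        # Portuguese BERT for sequence classification (fine-tuning setup sketch).
        from transformers import AutoModelForSequenceClassification, AutoTokenizer

        name = "neuralmind/bert-base-portuguese-cased"   # assumed Portuguese BERT
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModelForSequenceClassification.from_pretrained(
            name, num_labels=10   # illustrative number of NCM code classes
        )

        batch = tokenizer(
            ["parafusos de aço inoxidável"],   # goods description to classify
            padding=True, truncation=True, return_tensors="pt",
        )
        logits = model(**batch).logits
        print(logits.argmax(dim=-1))   # predicted class index (head untrained here)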
       
  • BDCC, Vol. 6, Pages 9: An Efficient Multi-Scale Anchor Box Approach to
           Detect Partial Faces from a Video Sequence

    • Authors: Dweepna Garg, Priyanka Jain, Ketan Kotecha, Parth Goel, Vijayakumar Varadarajan
      First page: 9
      Abstract: In recent years, face detection has attracted considerable attention in the field of computer vision using traditional machine learning techniques and deep learning techniques. Deep learning is used to build the most recent and powerful face detection algorithms. However, partial face detection has yet to achieve remarkable performance. Partial faces are occluded due to hair, hats, glasses, hands, mobile phones, and side-angle-captured images. Fewer facial features can be identified from such images. In this paper, we present a deep convolutional neural network face detection method using an anchor box selection strategy. We limited the number of anchor boxes and scales, choosing only those relevant to the face shape. The proposed model was trained and tested on a popular and challenging face detection benchmark dataset, i.e., Face Detection Dataset and Benchmark (FDDB), and can also detect partially covered faces with better accuracy and precision. Extensive experiments were performed, with evaluation metrics including accuracy, precision, recall, F1 score, inference time, and FPS. The results show that the proposed model is able to detect the face in the image, including occluded features, more precisely than other state-of-the-art approaches, achieving 94.8% accuracy and 98.7% precision on the FDDB dataset at 21 frames per second (FPS).
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-11
      DOI: 10.3390/bdcc6010009
      Issue No: Vol. 6, No. 1 (2022)
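
      A minimal sketch of the anchor-generation idea: multi-scale anchor boxes restricted to roughly face-shaped aspect ratios, echoing the strategy of keeping only anchors relevant to the face shape. The scales, ratios, stride, and feature-map size below are illustrative, not the paper's settings.

        # Generate multi-scale anchor boxes with face-like aspect ratios only.
        import numpy as np

        def face_anchors(feature_size=8, stride=32, scales=(32, 64, 128),
                         ratios=(0.8, 1.0, 1.25)):   # near-square, face-like boxes
            boxes = []
            for y in range(feature_size):
                for x in range(feature_size):
                    cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
                    for s in scales:
                        for r in ratios:
                            w, h = s * np.sqrt(r), s / np.sqrt(r)
                            boxes.append([cx - w / 2, cy - h / 2,
                                          cx + w / 2, cy + h / 2])
            return np.array(boxes)

        anchors = face_anchors()
        print(anchors.shape)   # (8*8*3*3, 4) = (576, 4)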
       
  • BDCC, Vol. 6, Pages 10: Extraction of the Relations among Significant
           Pharmacological Entities in Russian-Language Reviews of Internet Users on
           Medications

    • Authors: Alexander Sboev, Anton Selivanov, Ivan Moloshnikov, Roman Rybka, Artem Gryaznov, Sanna Sboeva, Gleb Rylkov
      First page: 10
      Abstract: Nowadays, the analysis of digital media aimed at prediction of the society’s reaction to particular events and processes is a task of great significance. Internet sources contain a large amount of meaningful information for a set of domains, such as marketing, author profiling, social situation analysis, healthcare, etc. In the case of healthcare, this information is useful for pharmacovigilance purposes, including the re-profiling of medications. The analysis of the mentioned sources requires the development of automatic natural language processing methods. These methods, in turn, require text datasets with complex annotation including information about named entities and relations between them. As the relevant literature analysis shows, there is a scarcity of datasets in the Russian language with annotated entity relations, and none have existed so far in the medical domain. This paper presents the first Russian-language textual corpus where entities have labels of different contexts within a single text, so that related entities share a common context; this makes the corpus suitable for relation extraction tasks in the medical domain. Our second contribution is a method for the automated extraction of entity relations in Russian-language texts using the XLM-RoBERTa language model preliminarily trained on Russian drug review texts. A comparison with other machine learning methods is performed to estimate the efficiency of the proposed method. The method yields state-of-the-art accuracy in extracting the following relationship types: ADR–Drugname, Drugname–Diseasename, Drugname–SourceInfoDrug, Diseasename–Indication. As shown on the presented subcorpus from the Russian Drug Review Corpus, the method developed achieves a mean F1-score of 80.4% (estimated with cross-validation, averaged over the four relationship types). This result is 3.6% higher compared to the existing language model RuBERT, and 21.77% higher compared to basic ML classifiers.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-17
      DOI: 10.3390/bdcc6010010
      Issue No: Vol. 6, No. 1 (2022)
       
  • BDCC, Vol. 6, Pages 11: Context-Aware Explainable Recommendation Based on
           Domain Knowledge Graph

    • Authors: Muzamil Hussain Syed, Tran Quoc Bao Huy, Sun-Tae Chung
      First page: 11
      Abstract: With the rapid growth of internet data, knowledge graphs (KGs) are considered an efficient form of knowledge representation that captures the semantics of web objects. In recent years, reasoning over KGs for various artificial intelligence tasks has received a great deal of research interest. Providing recommendations based on users’ natural language queries is an equally difficult undertaking. In this paper, we propose a novel, context-aware recommender system, based on a domain KG, to respond to user-defined natural queries. The proposed recommender system consists of three stages. First, we generate incomplete triples from user queries, which are then segmented using logical conjunction (∧) and disjunction (∨) operations. Then, we generate candidates by utilizing a KGE-based framework (Query2Box) for reasoning over segmented logical triples, with ∧, ∨, and ∃ operators; finally, the generated candidates are re-ranked using a neural collaborative filtering (NCF) model by exploiting contextual (auxiliary) information from GraphSAGE embedding. Our approach proves to be simple yet efficient at providing explainable recommendations on users’ queries, while leveraging user-item contextual information. Furthermore, our framework has been shown to be capable of handling complex logical queries by transforming them into a disjunctive normal form (DNF) of simple queries. In this work, we focus on the restaurant domain as an application domain and use the Yelp dataset to evaluate the system. Experiments demonstrate that the proposed recommender system generalizes well on candidate generation from logical queries and effectively re-ranks those candidates, compared to the matrix factorization model.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-20
      DOI: 10.3390/bdcc6010011
      Issue No: Vol. 6, No. 1 (2022)
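
      The DNF transformation mentioned above can be illustrated with sympy: a complex query is flattened into a disjunction of simple conjunctive queries, each of which the KGE reasoner can then answer separately. The restaurant attributes below are invented for illustration.

        # Flatten a complex logical query into disjunctive normal form (DNF).
        from sympy import symbols
        from sympy.logic.boolalg import to_dnf

        # e.g. "(Italian OR Japanese) AND cheap" over restaurant attributes
        italian, japanese, cheap = symbols("italian japanese cheap")
        query = (italian | japanese) & cheap

        dnf = to_dnf(query, simplify=True)
        print(dnf)   # (cheap & italian) | (cheap & japanese): two simple queries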
       
  • BDCC, Vol. 6, Pages 12: Scalable Extended Reality: A Future Research
           Agenda

    • Authors: Vera Marie Memmesheimer, Achim Ebert
      First page: 12
      Abstract: Extensive research has outlined the potential of augmented, mixed, and virtual reality applications. However, little attention has been paid to scalability enhancements fostering practical adoption. In this paper, we introduce the concept of scalable extended reality (XRS), i.e., spaces scaling between different displays and degrees of virtuality that can be entered by multiple, possibly distributed users. The development of such XRS spaces concerns several research fields. To provide bidirectional interaction and maintain consistency with the real environment, virtual reconstructions of physical scenes need to be segmented semantically and adapted dynamically. Moreover, scalable interaction techniques for selection, manipulation, and navigation as well as a world-stabilized rendering of 2D annotations in 3D space are needed to let users intuitively switch between handheld and head-mounted displays. Collaborative settings should further integrate access control and awareness cues indicating the collaborators’ locations and actions. While many of these topics were investigated by previous research, very few have considered their integration to enhance scalability. Addressing this gap, we review related previous research, list current barriers to the development of XRS spaces, and highlight dependencies between them.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-26
      DOI: 10.3390/bdcc6010012
      Issue No: Vol. 6, No. 1 (2022)
       
  • BDCC, Vol. 6, Pages 13: Fuzzy Neural Network Expert System with an
           Improved Gini Index Random Forest-Based Feature Importance Measure
           Algorithm for Early Diagnosis of Breast Cancer in Saudi Arabia

    • Authors: Ebrahem A. Algehyne, Muhammad Lawan Jibril, Naseh A. Algehainy, Osama Abdulaziz Alamri, Abdullah K. Alzahrani
      First page: 13
      Abstract: Breast cancer is one of the common malignancies among females in Saudi Arabia and has also been ranked as the most prevalent and the number two killer disease in the country. However, the clinical diagnosis process of any disease such as breast cancer, coronary artery diseases, diabetes, COVID-19, among others, is often associated with uncertainty due to the complexity and fuzziness of the process. In this work, a fuzzy neural network expert system with an improved Gini index random forest-based feature importance measure algorithm for early diagnosis of breast cancer in Saudi Arabia was proposed to address the uncertainty and ambiguity associated with the diagnosis of breast cancer, and also the heavier burden on the overlay of the network nodes of the fuzzy neural network system that often happens due to insignificant features that are used to predict or diagnose the disease. The improved Gini index random forest-based feature importance measure algorithm was used to select the five fittest features of the Diagnostic Wisconsin Breast Cancer Database out of the 32 features of the dataset. The logistic regression, support vector machine, k-nearest neighbor, random forest, and Gaussian naïve Bayes learning algorithms were used to develop two sets of classification models: models with the full set of 32 features and models with the five fittest features. The two sets of classification models were evaluated, and the results of the evaluation were compared. The comparison shows that the models with the selected fittest features outperformed their counterparts with full features in terms of accuracy, sensitivity, and specificity. Therefore, a fuzzy neural network-based expert system was developed with the five selected fittest features, and the system achieved 99.33% accuracy, 99.41% sensitivity, and 99.24% specificity. Moreover, compared with previous works that applied fuzzy neural networks or other artificial intelligence techniques to the same dataset for the diagnosis of breast cancer, the system developed in this work stands as the best in terms of accuracy, sensitivity, and specificity. A z test was also conducted, and the result shows that the accuracy achieved by the system for early diagnosis of breast cancer is statistically significant.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-27
      DOI: 10.3390/bdcc6010013
      Issue No: Vol. 6, No. 1 (2022)
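
      A minimal sketch of Gini-based feature selection with a random forest on the Wisconsin data shipped with scikit-learn; feature_importances_ is the stock Gini importance, standing in here for the paper's improved variant of the measure.

        # Rank features by Gini importance and keep the five fittest.
        import numpy as np
        from sklearn.datasets import load_breast_cancer
        from sklearn.ensemble import RandomForestClassifier

        data = load_breast_cancer()                  # Diagnostic Wisconsin dataset
        forest = RandomForestClassifier(n_estimators=200, random_state=0)
        forest.fit(data.data, data.target)

        top5 = np.argsort(forest.feature_importances_)[::-1][:5]
        for i in top5:
            print(f"{data.feature_names[i]}: {forest.feature_importances_[i]:.3f}")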
       
  • BDCC, Vol. 6, Pages 14: Acknowledgment to Reviewers of BDCC in 2021

    • Authors: BDCC Editorial Office BDCC Editorial Office
      First page: 14
      Abstract: Rigorous peer-reviews are the basis of high-quality academic publishing [...]
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-27
      DOI: 10.3390/bdcc6010014
      Issue No: Vol. 6, No. 1 (2022)
       
  • BDCC, Vol. 6, Pages 15: Google Street View Images as Predictors of Patient
           Health Outcomes, 2017–2019

    • Authors: Quynh C. Nguyen, Tom Belnap, Pallavi Dwivedi, Amir Hossein Nazem Deligani, Abhinav Kumar, Dapeng Li, Ross Whitaker, Jessica Keralis, Heran Mane, Xiaohe Yue, Thu T. Nguyen, Tolga Tasdizen, Kim D. Brunisholz
      First page: 15
      Abstract: Collecting neighborhood data can be both time- and resource-intensive, especially across broad geographies. In this study, we leveraged 1.4 million publicly available Google Street View (GSV) images from Utah to construct indicators of the neighborhood built environment and evaluate their associations with 2017–2019 health outcomes of approximately one-third of the population living in Utah. The use of electronic medical records allows for the assessment of associations between neighborhood characteristics and individual-level health outcomes while controlling for predisposing factors, which distinguishes this study from previous GSV studies that were ecological in nature. Among 938,085 adult patients, we found that individuals living in communities in the highest tertiles of green streets and non-single-family homes have 10–27% lower diabetes, uncontrolled diabetes, hypertension, and obesity, but higher substance use disorders, controlling for age, White race, Hispanic ethnicity, religion, marital status, health insurance, and area deprivation index. Conversely, the presence of visible utility wires overhead was associated with 5–10% more diabetes, uncontrolled diabetes, hypertension, obesity, and substance use disorders. Our study found that non-single-family homes and green streets were related to a lower prevalence of chronic conditions, while visible utility wires and single-lane roads were connected with a higher burden of chronic conditions. These contextual characteristics can better help healthcare organizations understand the drivers of their patients’ health by further considering patients’ residential environments, which present both risks and resources.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-27
      DOI: 10.3390/bdcc6010015
      Issue No: Vol. 6, No. 1 (2022)
       
  • BDCC, Vol. 6, Pages 16: A Dataset for Emotion Recognition Using Virtual
           Reality and EEG (DER-VREEG): Emotional State Classification Using Low-Cost
           Wearable VR-EEG Headsets

    • Authors: Nazmi Sofian Suhaimi, James Mountstephens, Jason Teo
      First page: 16
      Abstract: Emotions are viewed as an important aspect of human interactions and conversations, and allow effective and logical decision making. Emotion recognition uses low-cost wearable electroencephalography (EEG) headsets to collect brainwave signals and interprets these signals to provide information on a person’s mental state. With the implementation of virtual reality environments in different applications, the gap between human and computer interaction, as well as the understanding process, would shorten, providing a more immediate view of an individual’s mental health. This study aims to use a virtual reality (VR) headset to induce four classes of emotions (happy, scared, calm, and bored), to collect brainwave samples using a low-cost wearable EEG headset, and to run popular classifiers to compare the most feasible ones that can be used for this particular setup. Firstly, we attempt to build an immersive VR database that is accessible to the public and that can potentially assist with emotion recognition studies using virtual reality stimuli. Secondly, we use a low-cost wearable EEG headset that is both compact and small, and can be attached to the scalp without any hindrance, allowing freedom of movement for participants to view their surroundings inside the immersive VR stimulus. Finally, we evaluate the emotion recognition system by using popular machine learning algorithms and compare them for both intra-subject and inter-subject classification. The results obtained here show that the prediction model for the four-class emotion classification performed well, including the more challenging inter-subject classification, with the support vector machine (SVM Class Weight kernel) obtaining 85.01% classification accuracy. This shows that using fewer electrode channels, but with proper parameter tuning and feature selection, can still yield strong classification performance.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-01-28
      DOI: 10.3390/bdcc6010016
      Issue No: Vol. 6, No. 1 (2022)
       
  • BDCC, Vol. 6, Pages 17: Big Data Analytics in Supply Chain Management: A
           Systematic Literature Review and Research Directions

    • Authors: In Lee, George Mangalaraj
      First page: 17
      Abstract: Big data analytics has been successfully used for various business functions, such as accounting, marketing, supply chain, and operations. Currently, along with the recent development in machine learning and computing infrastructure, big data analytics in the supply chain are surging in importance. In light of the great interest and evolving nature of big data analytics in supply chains, this study conducts a systematic review of existing studies in big data analytics. This study presents a framework of a systematic literature review from interdisciplinary perspectives. From the organizational perspective, this study examines the theoretical foundations and research models that explain the sustainability and performances achieved through the use of big data analytics. Then, from the technical perspective, this study analyzes types of big data analytics, techniques, algorithms, and features developed for enhanced supply chain functions. Finally, this study identifies the research gap and suggests future research directions.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-02-01
      DOI: 10.3390/bdcc6010017
      Issue No: Vol. 6, No. 1 (2022)
       
  • BDCC, Vol. 6, Pages 18: Big Data in Construction: Current Applications and
           Future Opportunities

    • Authors: Munawar, Ullah, Qayyum, Shahzad
      First page: 18
      Abstract: Big data have become an integral part of various research fields due to the rapid advancements in the digital technologies available for dealing with data. The construction industry is no exception and has seen a spike in the data being generated due to the introduction of various digital disruptive technologies. However, despite the availability of data and the introduction of such technologies, the construction industry is lagging in harnessing big data. This paper critically explores literature published since 2010 to identify the data trends and how the construction industry can benefit from big data. The presence of tools such as computer-aided drawing (CAD) and building information modelling (BIM) provides a great opportunity for researchers in the construction industry to further improve how infrastructure can be developed, monitored, or improved in the future. The gaps in the existing research data have been explored and a detailed analysis was carried out to identify the different ways in which big data analysis and storage work in relation to the construction industry. Big data engineering (BDE) and statistics are among the most crucial steps for integrating big data technology in construction. The results of this study suggest that while the existing research studies have set the stage for improving big data research, the integration of the associated digital technologies into the construction industry is not very clear. Among the future opportunities, big data research into construction safety, site management, heritage conservation, and project waste minimization and quality improvements are key areas.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-02-06
      DOI: 10.3390/bdcc6010018
      Issue No: Vol. 6, No. 1 (2022)
       
  • BDCC, Vol. 6, Pages 19: The Next-Generation NIDS Platform: Cloud-Based
           Snort NIDS Using Containers and Big Data

    • Authors: Ferry Astika Saputra, Muhammad Salman, Jauari Akhmad Nur Hasim, Isbat Uzzin Nadhori, Kalamullah Ramli
      First page: 19
      Abstract: Snort is a well-known, signature-based network intrusion detection system (NIDS). In the typical NIDS architecture, the Snort sensor must be placed within the same physical network as the defense center, which offers limited network coverage, especially for remote networks with restricted bandwidth and network policies. Additionally, the growing number of sensor instances, followed by a rapid increase in log data volume, has caused the present system to face big data challenges. This paper proposes a novel design for a cloud-based Snort NIDS using containers and implementing big data in the defense center to overcome these problems. Our design consists of Docker as the sensors’ platform, Apache Kafka as the distributed messaging system, and big data technology orchestrated on a lambda architecture. We conducted experiments to measure sensor deployment, optimum message delivery from the sensors to the defense center, aggregation speed, and efficiency in the data-processing performance of the defense center. We successfully developed a cloud-based Snort NIDS and found the optimum method for message delivery from the sensor to the defense center. We also succeeded in developing a dashboard and attack maps to display attack statistics and visualize the attacks. To our knowledge, this is the first reported design to implement a big data (lambda) architecture as the defense center and to utilize the rapid deployment of Snort NIDS using Docker technology as the network security monitoring platform. (A message-delivery sketch follows this entry.)
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-02-07
      DOI: 10.3390/bdcc6010019
      Issue No: Vol. 6, No. 1 (2022)
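
      As an illustration only, here is a minimal sketch of how a containerized sensor could ship alerts to a Kafka-based defense center, using the kafka-python client; the broker address, topic name, and alert fields are invented placeholders, not the paper’s implementation.

        import json
        from kafka import KafkaProducer  # pip install kafka-python

        # Hypothetical broker and topic; requires a reachable Kafka broker.
        producer = KafkaProducer(
            bootstrap_servers="defense-center:9092",
            value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        )
        alert = {"sensor": "snort-docker-01", "sig_id": 1000001,
                 "src": "10.0.0.5", "dst": "10.0.0.9", "msg": "example alert"}
        producer.send("snort-alerts", value=alert)  # fan-in to defense center
        producer.flush()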
       
  • BDCC, Vol. 6, Pages 20: Person Re-Identification via Pyramid Multipart
           Features and Multi-Attention Framework

    • Authors: Randa Mohamed Bayoumi, Elsayed E. Hemayed, Mohammad Ehab Ragab, Magda B. Fayek
      First page: 20
      Abstract: Video-based person re-identification has become quite attractive due to its importance in many vision surveillance problems. It is a challenging topic due to the inter- and intra-class changes, occlusion, and pose variations involved. In this paper, we propose a pyramid-attentive framework that relies on multi-part features and multiple attention mechanisms to aggregate features at multiple levels and learn attention-based representations of persons through various aspects. Self-attention is used to strengthen the most discriminative features in the spatial and channel domains and hence capture robust global information. We propose the use of part-relation attention between different multi-granularities of feature representations to focus on learning appropriate local features. Temporal attention is used to aggregate temporal features (see the sketch after this entry). We integrate the most robust features in the global and multi-level views to build an effective convolutional neural network (CNN) model. The proposed model outperforms the previous state-of-the-art models on three datasets. Notably, the proposed model achieves 98.9% top-1 accuracy (a relative improvement of 2.7% over GRL) and 99.3% mAP on PRID2011, and 92.8% top-1 accuracy (a relative improvement of 2.4% over GRL) on iLIDS-VID. We also explore the generalization ability of our model in a cross-dataset setting.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-02-09
      DOI: 10.3390/bdcc6010020
      Issue No: Vol. 6, No. 1 (2022)
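
      As an illustration of the temporal-attention aggregation mentioned above, here is a minimal numpy sketch in which per-frame features are pooled into one clip descriptor by softmax attention weights; the shapes and the scoring vector are placeholder assumptions, not the paper’s architecture.

        import numpy as np

        def softmax(x):
            e = np.exp(x - x.max())
            return e / e.sum()

        rng = np.random.default_rng(0)
        T, D = 8, 256                     # 8 frames, 256-dim features (assumed)
        frames = rng.normal(size=(T, D))  # per-frame CNN features

        w = rng.normal(size=D)            # stand-in for a learned scoring vector
        alpha = softmax(frames @ w)       # temporal attention weights, sum to 1
        clip_descriptor = alpha @ frames  # weighted sum over time -> (D,)
        print(clip_descriptor.shape)      # (256,)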
       
  • BDCC, Vol. 6, Pages 21: Vec2Dynamics: A Temporal Word Embedding Approach
           to Exploring the Dynamics of Scientific Keywords—Machine Learning as
           a Case Study

    • Authors: Amna Dridi, Mohamed Medhat Gaber, Raja Muhammad Atif Azad, Jagdev Bhogal
      First page: 21
      Abstract: The study of the dynamics, or progress, of science has been widely explored with descriptive and statistical analyses. With the rise of data science and the development of increasingly powerful computers, it has also attracted several computational approaches that are labelled together as the computational history of science. Among these approaches, some works have studied dynamism in the scientific literature by employing text analysis techniques that rely on topic models to study the dynamics of research topics. Unlike topic models, which do not delve deeper into the content of scientific publications, this paper uses, for the first time, temporal word embeddings to automatically track the dynamics of scientific keywords over time. To this end, we propose Vec2Dynamics, a neural-based computational history approach that reports the stability of the k-nearest neighbors of scientific keywords over time; the stability indicates whether keywords are acquiring new neighborhoods as the scientific literature evolves. To evaluate how Vec2Dynamics models such relationships in the domain of machine learning (ML), we constructed scientific corpora from the papers published at the Neural Information Processing Systems (NIPS, now abbreviated NeurIPS) conference between 1987 and 2016. The descriptive analysis performed in this paper verifies the efficacy of our proposed approach: we found a generally strong consistency between the obtained results and the machine learning timeline. (A neighbor-stability sketch follows this entry.)
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-02-21
      DOI: 10.3390/bdcc6010021
      Issue No: Vol. 6, No. 1 (2022)
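
      As an illustration of the neighborhood-stability idea described above, here is a minimal gensim sketch that trains word embeddings on two toy time slices and measures the overlap of a keyword’s k nearest neighbors across them; the corpora and keyword are invented placeholders.

        from gensim.models import Word2Vec  # pip install gensim

        # Toy corpora standing in for two time slices of conference papers.
        slice_a = [["neural", "network", "backpropagation", "gradient"],
                   ["kernel", "svm", "margin", "gradient"]] * 50
        slice_b = [["neural", "network", "deep", "gradient"],
                   ["attention", "transformer", "deep", "gradient"]] * 50

        def knn(corpus, word, k):
            model = Word2Vec(corpus, vector_size=32, min_count=1,
                             seed=1, workers=1)
            return {w for w, _ in model.wv.most_similar(word, topn=k)}

        k = 3
        nn_a, nn_b = knn(slice_a, "gradient", k), knn(slice_b, "gradient", k)
        stability = len(nn_a & nn_b) / k  # fraction of neighbors preserved
        print(nn_a, nn_b, stability)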
       
  • BDCC, Vol. 6, Pages 22: LFA: A Lévy Walk and Firefly-Based Search
           Algorithm: Application to Multi-Target Search and Multi-Robot Foraging

    • Authors: Ouarda Zedadra, Antonio Guerrieri, Hamid Seridi
      First page: 22
      Abstract: Several exploration algorithms have been proposed in the literature. Among these, the Lévy walk is commonly used, since it has been shown to be more efficient than simple random-walk exploration and is beneficial when targets are sparsely distributed in the search space. However, due to its super-diffusive behavior, some tuning is needed to improve its performance, specifically when targets are clustered. The firefly algorithm is a swarm intelligence-based algorithm useful for intensive search, but its exploration rate is very limited. An efficient and reliable search can be attained by combining the two algorithms, since the first favors exploration of the search space and the second encourages its exploitation. In this paper, we propose a swarm intelligence-based search algorithm called the Lévy walk and firefly-based algorithm (LFA), which is a hybridization of the two aforementioned algorithms. The algorithm is applied to multi-target search and multi-robot foraging. Numerical experiments to test its performance are conducted in the robotic simulator ARGoS. A comparison with the original firefly algorithm demonstrates the effectiveness of our contribution. (A Lévy-step sketch follows this entry.)
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-02-21
      DOI: 10.3390/bdcc6010022
      Issue No: Vol. 6, No. 1 (2022)
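
      As an illustration of the Lévy-walk component named above, here is a minimal numpy sketch that draws heavy-tailed step lengths by inverse-CDF sampling of a power law, so occasional long jumps interleave with many short ones; the exponent and bounds are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(0)

        def levy_steps(n, mu=1.5, l_min=1.0):
            # Inverse-CDF sampling of P(l) ~ l^(-mu), l >= l_min, 1 < mu <= 3.
            u = rng.random(n)
            return l_min * (1.0 - u) ** (-1.0 / (mu - 1.0))

        def levy_walk(n_steps):
            angles = rng.uniform(0, 2 * np.pi, n_steps)  # isotropic headings
            steps = levy_steps(n_steps)
            dx, dy = steps * np.cos(angles), steps * np.sin(angles)
            return np.cumsum(dx), np.cumsum(dy)          # trajectory coordinates

        x, y = levy_walk(1000)
        print("net displacement:", np.hypot(x[-1], y[-1]))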
       
  • BDCC, Vol. 6, Pages 23: A Framework for Content-Based Search in Large
           Music Collections

    • Authors: Tiange Zhu, Raphaël Fournier-S’niehotta, Philippe Rigaux, Nicolas Travers
      First page: 23
      Abstract: We address the problem of scalable content-based search in large collections of music documents. Music content is highly complex and versatile, presenting multiple facets that can be considered independently or in combination; moreover, music documents can be digitally encoded in many ways. We propose a general framework for building a scalable search engine based on (i) a music description language that represents music content independently of a specific encoding, (ii) an extendible list of feature-extraction functions, and (iii) indexing, searching, and ranking procedures designed to be integrated into the standard architecture of a text-oriented search engine (a toy indexing sketch follows this entry). As a proof of concept, we also detail an actual implementation of the framework for searching large collections of XML-encoded music scores, based on the popular ElasticSearch system. It is released as open source on GitHub and is available as a ready-to-use Docker image for communities that manage large collections of digitized music documents.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-02-23
      DOI: 10.3390/bdcc6010023
      Issue No: Vol. 6, No. 1 (2022)
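
      As an illustration of the framework’s core idea, here is a toy Python sketch in which a feature-extraction function maps encoding-independent music content (a plain pitch sequence) to text-like tokens that a standard inverted index can serve; the interval feature and the dict-based index are stand-ins for the paper’s feature list and ElasticSearch.

        from collections import defaultdict

        def interval_feature(pitches):
            # Encoding-independent melodic feature: successive semitone intervals.
            return [str(b - a) for a, b in zip(pitches, pitches[1:])]

        # Toy collection: MIDI pitch sequences standing in for parsed XML scores.
        scores = {
            "score-001": [60, 62, 64, 65, 67],   # C D E F G
            "score-002": [67, 65, 64, 62, 60],
        }

        index = defaultdict(set)                 # token -> matching documents
        for doc_id, pitches in scores.items():
            for token in interval_feature(pitches):
                index[token].add(doc_id)

        # Query: find scores containing an ascending whole tone (interval "2").
        print(index["2"])    # {'score-001'}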
       
  • BDCC, Vol. 6, Pages 24: Combination of Reduction Detection Using TOPSIS
           for Gene Expression Data Analysis

    • Authors: Jogeswar Tripathy, Rasmita Dash, Binod Kumar Pattanayak, Sambit Kumar Mishra, Tapas Kumar Mishra, Deepak Puthal
      First page: 24
      Abstract: In high-dimensional data analysis, feature selection (FS) is one of the most fundamental issues in machine learning and requires the attention of researchers. Such datasets are characterized by a huge feature space, of which only a few features are significant for analysis; extracting the significant features is therefore crucial. Various techniques are available for feature selection; among them, filter techniques are significant in this community, as they can be used with any type of learning algorithm, drastically lower the running time of optimization algorithms, and improve the performance of the model. Furthermore, the suitability of a filter approach depends on the characteristics of the dataset as well as on the machine learning model. To address these issues, this research considers combinations of feature reduction (CFR), designing pipelines of filter approaches for high-dimensional microarray data classification. From four filter approaches, sixteen pipeline combinations are generated; the feature subset is reduced at successive levels, and ultimately the significant feature set is evaluated. The pipelined filter techniques are Correlation-Based Feature Selection (CBFS), the Chi-Square Test (CST), Information Gain (InG), and Relief Feature Selection (RFS), and the classification techniques are Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), and k-Nearest Neighbor (k-NN). The performance of CFR depends highly on the dataset as well as on the classifier. Thereafter, the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) is used to rank all reduction combinations and identify the superior filter combination. (A TOPSIS sketch follows this entry.)
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-02-23
      DOI: 10.3390/bdcc6010024
      Issue No: Vol. 6, No. 1 (2022)
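
      As an illustration of the TOPSIS ranking step named above, here is a minimal numpy sketch that scores hypothetical pipeline combinations against benefit criteria; the decision matrix, weights, and criteria are invented placeholders.

        import numpy as np

        def topsis(matrix, weights, benefit):
            # matrix: alternatives x criteria; benefit[j] True if larger is better.
            M = matrix / np.linalg.norm(matrix, axis=0)  # vector-normalize columns
            V = M * weights                              # weighted normalized matrix
            ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
            anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
            d_pos = np.linalg.norm(V - ideal, axis=1)
            d_neg = np.linalg.norm(V - anti, axis=1)
            return d_neg / (d_pos + d_neg)               # closeness to ideal, in [0,1]

        # Hypothetical accuracies of 4 pipelines under 4 classifiers (DT, LR, RF, k-NN).
        acc = np.array([[0.91, 0.88, 0.93, 0.90],
                        [0.89, 0.90, 0.92, 0.88],
                        [0.94, 0.86, 0.95, 0.91],
                        [0.90, 0.89, 0.90, 0.89]])
        scores = topsis(acc, weights=np.full(4, 0.25), benefit=np.array([True] * 4))
        print("ranking (best first):", np.argsort(-scores))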
       
  • BDCC, Vol. 6, Pages 25: Big Data in Criteria Selection and Identification
           in Managing Flood Disaster Events Based on Macro Domain PESTEL Analysis:
           Case Study of Malaysia Adaptation Index

    • Authors: Mohammad Fikry Abdullah, Zurina Zainol, Siaw Yin Thian, Noor Hisham Ab Ghani, Azman Mat Jusoh, Mohd Zaki Mat Amin, Nur Aiza Mohamad
      First page: 25
      Abstract: The impact of Big Data (BD) creates challenges in selecting relevant and significant data to be used as criteria to facilitate flood management plans. Studies of macro domain criteria broaden the criteria selection, which is important for assessment, allowing a comprehensive understanding of the current situation, readiness, preparation, resources, and other factors for decision assessment and disaster event planning. This study aims to facilitate criteria identification and selection from a macro domain perspective to improve flood management planning. The objectives of this study are (a) to explore and identify potential criteria to be incorporated into the current flood management plan from a macro domain perspective; (b) to understand the types of flood measures and decision goals implemented to facilitate flood management planning decisions; and (c) to examine a possible structured mechanism for criteria selection based on decision analysis techniques. Based on a systematic literature review and thematic analysis using the PESTEL framework, the findings identify and cluster domains and their criteria to be considered and applied in future flood management plans. The critical review of flood measures and decision goals would potentially equip stakeholders and policymakers for better decision making based on a disaster management plan, and a decision analysis technique as a structured mechanism would significantly improve criteria identification and selection for comprehensive and collective decisions. The findings could further improve Malaysia Adaptation Index (MAIN) criteria identification and selection, serving as a complementary and supporting reference in managing flood disasters. A framework proposed in this study can be used as guidance for dealing with and optimising criteria, given the challenges and the current application of Big Data and criteria in managing disaster events.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-01
      DOI: 10.3390/bdcc6010025
      Issue No: Vol. 6, No. 1 (2022)
       
  • BDCC, Vol. 6, Pages 26: A Combined System Metrics Approach to Cloud
           Service Reliability Using Artificial Intelligence

    • Authors: Tek Raj Chhetri, Chinmaya Kumar Dehury, Artjom Lind, Satish Narayana Srirama, Anna Fensel
      First page: 26
      Abstract: Identifying and anticipating potential failures in the cloud is an effective method for increasing cloud reliability and enabling proactive failure management. Many studies have been conducted to predict potential failures, but none have combined SMART (self-monitoring, analysis, and reporting technology) hard drive metrics with other system metrics such as central processing unit (CPU) utilisation. Therefore, we propose a combined system metrics approach for failure prediction based on artificial intelligence to improve reliability. We tested data from over 100 cloud servers with four artificial intelligence algorithms (random forest, gradient boosting, long short-term memory, and gated recurrent unit) and also performed correlation analysis. Our correlation analysis sheds light on the relationships that exist between system metrics and failure, and the experimental results demonstrate the advantages of combining system metrics, outperforming the state of the art. (A combined-features sketch follows this entry.)
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-01
      DOI: 10.3390/bdcc6010026
      Issue No: Vol. 6, No. 1 (2022)
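
      As an illustration of the combined-metrics idea described above, here is a minimal scikit-learn sketch in which SMART drive attributes and system metrics such as CPU utilisation are concatenated into one feature vector for failure prediction; the features, data, and labels are synthetic placeholders.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import f1_score

        rng = np.random.default_rng(0)
        n = 1000
        # Hypothetical features: two SMART attributes plus two system metrics.
        smart = rng.normal(size=(n, 2))   # e.g., reallocated sectors, seek errors
        system = rng.normal(size=(n, 2))  # e.g., CPU and memory utilisation
        X = np.hstack([smart, system])    # the "combined system metrics" step
        y = (X.sum(axis=1) + rng.normal(scale=0.5, size=n) > 2).astype(int)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0,
                                                  stratify=y)
        clf = RandomForestClassifier(n_estimators=200,
                                     random_state=0).fit(X_tr, y_tr)
        print("F1:", f1_score(y_te, clf.predict(X_te)))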
       
  • BDCC, Vol. 6, Pages 27: Optimizations for Computing Relatedness in
           Biomedical Heterogeneous Information Networks: SemNet 2.0

    • Authors: Anna Kirkpatrick, Chidozie Onyeze, David Kartchner, Stephen Allegri, Davi Nakajima An, Kevin McCoy, Evie Davalbhakta, Cassie S. Mitchell
      First page: 27
      Abstract: Literature-based discovery (LBD) summarizes information and generates insight from large text corpora. The SemNet framework utilizes a large heterogeneous information network, or “knowledge graph”, of nodes and edges to compute relatedness and rank concepts pertinent to a user-specified target. SemNet provides a way to perform multi-factorial and multi-scalar analysis of complex disease etiology and therapeutic identification using the 33+ million articles in PubMed. The present work improves the efficacy and efficiency of LBD for end users by augmenting SemNet to create SemNet 2.0. A custom Python data structure replaced the reliance on Neo4j, improving knowledge graph query times by several orders of magnitude. Additionally, two randomized algorithms were built to optimize the HeteSim metric calculation for computing metapath similarity (a simplified sketch follows this entry). The unsupervised learning algorithm for rank aggregation (ULARA), which ranks concepts with respect to the user-specified target, was reconstructed using derived mathematical proofs of correctness and probabilistic performance guarantees for optimization. The upgraded ULARA is generalizable to other rank aggregation problems outside of SemNet. In summary, SemNet 2.0 is comprehensive open-source software for significantly faster, more effective, and more user-friendly automated biomedical LBD. An example case ranks relationships between Alzheimer’s disease and metabolic co-morbidities.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-01
      DOI: 10.3390/bdcc6010027
      Issue No: Vol. 6, No. 1 (2022)
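
      As a simplified illustration of HeteSim-style metapath relatedness, here is a numpy sketch for an even-length metapath split at its middle node type: each endpoint obtains a reachable-probability vector over middle-layer nodes, and relatedness is the cosine similarity of those vectors. The toy adjacency matrices are invented, and SemNet 2.0’s randomized optimizations are not reproduced.

        import numpy as np

        def row_normalize(M):
            s = M.sum(axis=1, keepdims=True)
            s[s == 0] = 1.0
            return M / s

        def hetesim_even(A_sm, A_tm):
            # A_sm: sources x middle nodes; A_tm: targets x middle nodes.
            P_s = row_normalize(A_sm.astype(float))  # reach probs, source side
            P_t = row_normalize(A_tm.astype(float))  # reach probs, target side
            num = P_s @ P_t.T
            denom = (np.linalg.norm(P_s, axis=1, keepdims=True)
                     @ np.linalg.norm(P_t, axis=1, keepdims=True).T)
            denom[denom == 0] = 1.0
            return num / denom                       # cosine per (s, t) pair

        # Toy graph: 2 diseases and 2 drugs connected through 3 genes.
        disease_gene = np.array([[1, 1, 0], [0, 1, 1]])
        drug_gene = np.array([[1, 0, 0], [0, 1, 1]])
        print(hetesim_even(disease_gene, drug_gene))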
       
  • BDCC, Vol. 6, Pages 28: Comparison of Object Detection in Head-Mounted and
           Desktop Displays for Congruent and Incongruent Environments

    • Authors: René Reinhard, Erinchan Telatar, Shah Rukh Humayoun
      First page: 28
      Abstract: Virtual reality technologies, including head-mounted displays (HMDs), can benefit psychological research by combining high degrees of experimental control with improved ecological validity. This is due to the strong feeling of being in the displayed environment (presence) experienced by VR users. As of yet, it is not fully explored how using HMDs impacts basic perceptual tasks, such as object perception. In traditional display setups, the congruency between the background environment and the object category has been shown to impact response times in object perception tasks. In this study, we investigated whether this well-established effect is comparable when using desktop and HMD devices. Twenty-one participants used both desktop and HMD setups to perform an object identification task and, subsequently, their subjective presence while experiencing two distinct virtual environments (a beach and a home environment) was evaluated. Participants were quicker to identify objects in the HMD condition, independent of object-environment congruency, while congruency effects were not impacted. Furthermore, participants reported significantly higher presence in the HMD condition.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-07
      DOI: 10.3390/bdcc6010028
      Issue No: Vol. 6, No. 1 (2022)
       
  • BDCC, Vol. 6, Pages 29: Radiology Imaging Scans for Early Diagnosis of
           Kidney Tumors: A Review of Data Analytics-Based Machine Learning and Deep
           Learning Approaches

    • Authors: Maha Gharaibeh, Dalia Alzu’bi, Malak Abdullah, Ismail Hmeidi, Mohammad Rustom Al Nasar, Laith Abualigah, Amir H. Gandomi
      First page: 29
      Abstract: Many disease types exist in communities worldwide, which can be explained by people’s lifestyles or by the economic, social, genetic, and other factors of the country of residence. Recently, most research has focused on studying common diseases in the population to reduce death risks, select the best treatment procedures, and enhance the healthcare level of communities. Kidney disease is one of these common diseases; particularly, kidney tumors (KT) are the 10th most prevalent tumor for men and women worldwide. Overall, the lifetime likelihood of developing a kidney tumor is about 1 in 466 (2.02 percent) for males and around 1 in 80 (1.03 percent) for females. Still, more research is needed into new, early, and innovative diagnostic methods and into finding appropriate treatment methods for KT. Compared to the tedious and time-consuming traditional diagnosis, automatic detection algorithms based on machine learning can save diagnosis time, improve test accuracy, and reduce costs. Previous studies have shown that deep learning can play a role in dealing with complex tasks, including the diagnosis, segmentation, and classification of kidney tumors, some of the most malignant tumors. The goals of this review of deep learning in radiology imaging are to summarize what has already been accomplished, determine the techniques used by researchers in previous years to diagnose kidney tumors through medical imaging, and identify promising future avenues, whether in terms of applications or technological developments, as well as to identify common problems, describe ways to expand the dataset, summarize the knowledge and best practices, and determine remaining challenges and future directions.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-08
      DOI: 10.3390/bdcc6010029
      Issue No: Vol. 6, No. 1 (2022)
       
  • BDCC, Vol. 6, Pages 30: Big Data Management in Drug–Drug
           Interaction: A Modern Deep Learning Approach for Smart Healthcare

    • Authors: Muhammad Salman, Hafiz Suliman Munawar, Khalid Latif, Muhammad Waseem Akram, Sara Imran Khan, Fahim Ullah
      First page: 30
      Abstract: The detection and classification of drug–drug interactions (DDI) from existing data are of high importance, because recent reports show that DDIs are among the major causes of hospital-acquired conditions and readmissions; they are also necessary for smart healthcare. Therefore, to avoid adverse drug interactions, it is necessary to have up-to-date knowledge of DDIs. This knowledge can be extracted by applying text-processing techniques to the medical literature published in the form of ‘Big Data’, because whenever a drug interaction is investigated, it is typically reported and published in healthcare and clinical pharmacology journals. However, it is crucial to automate the extraction of the interactions taking place between drugs, because the medical literature is published in immense volumes and it is impossible for healthcare professionals to read and collect all of the investigated DDI reports from these Big Data. To avoid this time-consuming procedure, the information extraction (IE) and relationship extraction (RE) techniques that have been studied in depth in natural language processing (NLP) are very promising. Since 2011, much research has been reported in this particular area, and many approaches have been implemented that can be applied to biomedical texts to extract DDI-related information; a benchmark corpus is also publicly available for the advancement of DDI extraction tasks. The current state-of-the-art implementations for extracting DDIs from biomedical texts have employed support vector machines (SVM) or other machine learning methods that work on manually defined features, which might be the cause of the low precision and recall achieved in this domain so far. Modern deep learning techniques have also been applied to the automatic extraction of DDIs from the scientific literature and have proven very promising. As such, it is pertinent to investigate deep learning techniques for the extraction and classification of DDIs so that they can be used in the smart healthcare domain. We propose a deep neural network-based method (SEV-DDI: Severity-Drug–Drug Interaction) with further integrated units/layers to achieve higher precision and accuracy. After outperforming other methods in the DDI classification task, we moved a step further and utilized the method in a sentiment analysis task to investigate the severity of an interaction. The ability to determine the severity of a DDI will be very helpful for clinical decision support systems in making more accurate and informed decisions, ensuring patient safety. (A baseline extraction sketch follows this entry.)
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-09
      DOI: 10.3390/bdcc6010030
      Issue No: Vol. 6, No. 1 (2022)
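
      The SEV-DDI network itself is not specified in enough detail here to reproduce; purely as a point of reference, this is a minimal sketch of the kind of feature-based baseline such deep models are compared against, treating DDI extraction as sentence classification with TF-IDF features and a linear model. The sentences and labels are invented placeholders.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Toy sentences standing in for a DDI benchmark corpus.
        sentences = [
            "Drug A increases the plasma concentration of drug B.",
            "Drug C showed no interaction with drug D.",
            "Co-administration of drug E potentiates the effect of drug F.",
            "Drug G and drug H were tolerated with no observed interaction.",
        ] * 10
        labels = [1, 0, 1, 0] * 10  # 1 = interaction reported, 0 = none

        clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                            LogisticRegression())
        clf.fit(sentences, labels)
        print(clf.predict(["Drug X potentiates drug Y."]))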
       
  • BDCC, Vol. 6, Pages 31: Factors Influencing Citizens’ Intention to
           Use Open Government Data—A Case Study of Pakistan

    • Authors: Muhammad Mahboob Khurshid, Nor Hidayati Zakaria, Muhammad Irfanullah Arfeen, Ammar Rashid, Safi Ullah Nasir, Hafiz Muhammad Faisal Shehzad
      First page: 31
      Abstract: Open government data (OGD) has gained much attention worldwide; however, there is still an increasing demand for research exploring its adoption and diffusion. Policymakers expect that OGD will be used on a large scale by the public, which will result in a range of benefits, such as faith and trust in governments, innovation and development, and participatory governance. However, not much is known about which factors influence citizens’ intention to use OGD. Therefore, this research empirically investigates the factors that influence citizens’ intention to use OGD in a developing country, using information systems theory. Improved knowledge and understanding of the influencing factors can assist policymakers in determining which policy initiatives to take to increase the intention to use OGD widely. A survey and subsequent analysis reveal that perceived usefulness, social approval, and enjoyment positively influence intention, whereas voluntariness of use negatively influences OGD use. Further, perceived usefulness is significantly affected by perceived ease of use, and OGD use is significantly affected by OGD use intention. Surprisingly, however, the intention to use OGD is not significantly affected by perceived ease of use. Policymakers are advised to consider these significant factors when seeking to increase the intention to use OGD.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-17
      DOI: 10.3390/bdcc6010031
      Issue No: Vol. 6, No. 1 (2022)
       
  • BDCC, Vol. 6, Pages 32: Service Oriented R-ANN Knowledge Model for Social
           Internet of Things

    • Authors: Mohana S. D., S. P. Shiva Prakash, Kirill Krinkin
      First page: 32
      Abstract: The worldwide increase in technologies requires adding intelligence to objects; making objects smart within an environment leads to the Social Internet of Things (SIoT). These social objects are uniquely identifiable and transferable, and they share information through user-to-object and object-to-object interactions in smart environments such as smart homes, smart cities, and many other applications. SIoT faces certain challenges, such as handling heterogeneous objects, selecting among the data generated by objects, and dealing with missing values in data. The discovery and communication of meaningful patterns in data are therefore important for every application, and data analysis is essential for making smarter decisions and qualifying the performance of data in various applications. In smart environments, social networks of intelligent objects are increasing the number of services while weakening relationships, so resources and services must be shared in a reliable and efficient way. Hence, this work proposes a feature selection method based on proposed semantic rules and establishes relationships to classify services using a relationship artificial neural network (R-ANN). R-ANN applies inversely proportional relationships between objects, based on certain rules and conditions governing object-to-object and user-to-object interactions, and provides a service-oriented knowledge model for decision making that delivers services to users. Compared to the existing model, the proposed R-ANN achieves an accuracy of 89.62% across various services, namely weather, air quality, parking, light status, and people presence, in the SIoT environment.
      Citation: Big Data and Cognitive Computing
      PubDate: 2022-03-18
      DOI: 10.3390/bdcc6010032
      Issue No: Vol. 6, No. 1 (2022)
       
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
 



JournalTOCs © 2009-