  Subjects -> ELECTRONICS (Total: 277 journals)
Showing 1 - 200 of 277 Journals sorted alphabetically
ACS Applied Electronic Materials     Open Access   (Followers: 1)
Acta Electronica Malaysia     Open Access  
Advanced Materials Technologies     Hybrid Journal   (Followers: 2)
Advances in Biosensors and Bioelectronics     Open Access   (Followers: 6)
Advances in Electrical and Electronic Engineering     Open Access   (Followers: 5)
Advances in Electronics     Open Access   (Followers: 122)
Advances in Microelectronic Engineering     Open Access   (Followers: 12)
Advances in Power Electronics     Open Access   (Followers: 56)
Advancing Microelectronics     Hybrid Journal   (Followers: 2)
American Journal of Electrical and Electronic Engineering     Open Access   (Followers: 26)
Annals of Telecommunications     Hybrid Journal   (Followers: 6)
APSIPA Transactions on Signal and Information Processing     Open Access   (Followers: 8)
Archives of Electrical Engineering     Open Access   (Followers: 14)
Australian Journal of Electrical and Electronics Engineering     Hybrid Journal  
Automatika : Journal for Control, Measurement, Electronics, Computing and Communications     Open Access  
Batteries     Open Access   (Followers: 8)
Batteries & Supercaps     Hybrid Journal   (Followers: 5)
Bell Labs Technical Journal     Hybrid Journal   (Followers: 27)
Bioelectronics in Medicine     Hybrid Journal  
Canadian Journal of Remote Sensing     Full-text available via subscription   (Followers: 50)
China Communications     Full-text available via subscription   (Followers: 8)
Chinese Journal of Electronics     Open Access  
Circuits and Systems     Open Access   (Followers: 16)
Control Systems     Hybrid Journal   (Followers: 236)
e-Prime : Advances in Electrical Engineering, Electronics and Energy     Open Access   (Followers: 2)
ECTI Transactions on Electrical Engineering, Electronics, and Communications     Open Access   (Followers: 1)
Edu Elektrika Journal     Open Access   (Followers: 1)
Electronic Design     Partially Free   (Followers: 125)
Electronic Markets     Hybrid Journal   (Followers: 6)
Electronic Materials Letters     Hybrid Journal   (Followers: 4)
Electronics     Open Access   (Followers: 125)
Electronics and Communications in Japan     Hybrid Journal   (Followers: 8)
Electronics For You     Partially Free   (Followers: 114)
Electronics Letters     Open Access   (Followers: 25)
Elektronika ir Elektrotechnika     Open Access  
Elkha : Jurnal Teknik Elektro     Open Access  
Emitor : Jurnal Teknik Elektro     Open Access  
Energy Storage     Hybrid Journal   (Followers: 2)
Energy Storage Materials     Full-text available via subscription   (Followers: 5)
EPE Journal : European Power Electronics and Drives     Hybrid Journal   (Followers: 3)
EPJ Quantum Technology     Open Access   (Followers: 2)
Facta Universitatis, Series : Electronics and Energetics     Open Access  
Foundations and Trends® in Communications and Information Theory     Full-text available via subscription   (Followers: 6)
Foundations and Trends® in Signal Processing     Full-text available via subscription   (Followers: 7)
Frontiers in Electronics     Open Access   (Followers: 1)
Frontiers of Optoelectronics     Hybrid Journal   (Followers: 1)
IACR Transactions on Symmetric Cryptology     Open Access  
IEEE Antennas and Propagation Magazine     Hybrid Journal   (Followers: 112)
IEEE Antennas and Wireless Propagation Letters     Hybrid Journal   (Followers: 88)
IEEE Embedded Systems Letters     Hybrid Journal   (Followers: 60)
IEEE Journal of Electromagnetics, RF and Microwaves in Medicine and Biology     Hybrid Journal  
IEEE Journal of Emerging and Selected Topics in Power Electronics     Hybrid Journal   (Followers: 52)
IEEE Journal of the Electron Devices Society     Open Access   (Followers: 8)
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits     Hybrid Journal   (Followers: 2)
IEEE Letters on Electromagnetic Compatibility Practice and Applications     Hybrid Journal   (Followers: 1)
IEEE Magnetics Letters     Hybrid Journal   (Followers: 7)
IEEE Nanotechnology Magazine     Hybrid Journal   (Followers: 45)
IEEE Open Journal of Circuits and Systems     Open Access  
IEEE Open Journal of Industry Applications     Open Access  
IEEE Open Journal of the Industrial Electronics Society     Open Access  
IEEE Power Electronics Magazine     Full-text available via subscription   (Followers: 90)
IEEE Pulse     Hybrid Journal   (Followers: 5)
IEEE Reviews in Biomedical Engineering     Hybrid Journal   (Followers: 19)
IEEE Solid-State Circuits Letters     Hybrid Journal  
IEEE Solid-State Circuits Magazine     Hybrid Journal   (Followers: 11)
IEEE Transactions on Aerospace and Electronic Systems     Hybrid Journal   (Followers: 281)
IEEE Transactions on Antennas and Propagation     Full-text available via subscription   (Followers: 79)
IEEE Transactions on Automatic Control     Hybrid Journal   (Followers: 65)
IEEE Transactions on Autonomous Mental Development     Hybrid Journal   (Followers: 8)
IEEE Transactions on Biomedical Engineering     Hybrid Journal   (Followers: 35)
IEEE Transactions on Broadcasting     Hybrid Journal   (Followers: 11)
IEEE Transactions on Circuits and Systems for Video Technology     Hybrid Journal   (Followers: 31)
IEEE Transactions on Consumer Electronics     Hybrid Journal   (Followers: 45)
IEEE Transactions on Electron Devices     Hybrid Journal   (Followers: 18)
IEEE Transactions on Geoscience and Remote Sensing     Hybrid Journal   (Followers: 174)
IEEE Transactions on Haptics     Hybrid Journal   (Followers: 4)
IEEE Transactions on Industrial Electronics     Hybrid Journal   (Followers: 85)
IEEE Transactions on Industry Applications     Hybrid Journal   (Followers: 57)
IEEE Transactions on Information Theory     Hybrid Journal   (Followers: 27)
IEEE Transactions on Learning Technologies     Full-text available via subscription   (Followers: 12)
IEEE Transactions on Power Electronics     Hybrid Journal   (Followers: 87)
IEEE Transactions on Services Computing     Hybrid Journal   (Followers: 5)
IEEE Transactions on Signal and Information Processing over Networks     Hybrid Journal   (Followers: 14)
IEEE Transactions on Software Engineering     Hybrid Journal   (Followers: 84)
IEEE Women in Engineering Magazine     Hybrid Journal   (Followers: 11)
IEEE/OSA Journal of Optical Communications and Networking     Hybrid Journal   (Followers: 19)
IEICE - Transactions on Electronics     Full-text available via subscription   (Followers: 11)
IEICE - Transactions on Information and Systems     Full-text available via subscription   (Followers: 5)
IET Cyber-Physical Systems : Theory & Applications     Open Access   (Followers: 1)
IET Energy Systems Integration     Open Access   (Followers: 1)
IET Microwaves, Antennas & Propagation     Open Access   (Followers: 35)
IET Nanodielectrics     Open Access  
IET Power Electronics     Open Access   (Followers: 76)
IET Smart Grid     Open Access   (Followers: 2)
IET Wireless Sensor Systems     Open Access   (Followers: 17)
IETE Journal of Education     Open Access   (Followers: 3)
IETE Journal of Research     Open Access   (Followers: 10)
IETE Technical Review     Open Access   (Followers: 9)
IJEIS (Indonesian Journal of Electronics and Instrumentation Systems)     Open Access   (Followers: 3)
Industrial Technology Research Journal Phranakhon Rajabhat University     Open Access  
Informatik-Spektrum     Hybrid Journal   (Followers: 3)
Intelligent Transportation Systems Magazine, IEEE     Full-text available via subscription   (Followers: 12)
International Journal of Advanced Electronics and Communication Systems     Open Access   (Followers: 10)
International Journal of Advanced Research in Computer Science and Electronics Engineering     Open Access   (Followers: 14)
International Journal of Advances in Telecommunications, Electrotechnics, Signals and Systems     Open Access   (Followers: 12)
International Journal of Aerospace Innovations     Full-text available via subscription   (Followers: 23)
International Journal of Antennas and Propagation     Open Access   (Followers: 10)
International Journal of Applied Electronics in Physics & Robotics     Open Access   (Followers: 3)
International Journal of Computational Vision and Robotics     Hybrid Journal   (Followers: 5)
International Journal of Control     Hybrid Journal   (Followers: 13)
International Journal of Electronics     Hybrid Journal   (Followers: 7)
International Journal of Electronics and Telecommunications     Open Access   (Followers: 8)
International Journal of Granular Computing, Rough Sets and Intelligent Systems     Hybrid Journal   (Followers: 1)
International Journal of High Speed Electronics and Systems     Hybrid Journal  
International Journal of Hybrid Intelligence     Hybrid Journal   (Followers: 1)
International Journal of Image, Graphics and Signal Processing     Open Access   (Followers: 22)
International Journal of Microwave and Wireless Technologies     Hybrid Journal   (Followers: 16)
International Journal of Nanoscience     Hybrid Journal  
International Journal of Numerical Modelling: Electronic Networks, Devices and Fields     Hybrid Journal   (Followers: 4)
International Journal of Power Electronics     Hybrid Journal   (Followers: 30)
International Journal of Review in Electronics & Communication Engineering     Open Access   (Followers: 2)
International Journal of Sensors, Wireless Communications and Control     Hybrid Journal   (Followers: 13)
International Journal of Systems, Control and Communications     Hybrid Journal   (Followers: 6)
International Journal of Wireless and Microwave Technologies     Open Access   (Followers: 12)
International Transaction of Electrical and Computer Engineers System     Open Access   (Followers: 2)
JAREE (Journal on Advanced Research in Electrical Engineering)     Open Access  
Journal of Advanced Dielectrics     Open Access   (Followers: 1)
Journal of Biosensors & Bioelectronics     Open Access   (Followers: 4)
Journal of Artificial Intelligence     Open Access   (Followers: 18)
Journal of Circuits, Systems, and Computers     Hybrid Journal   (Followers: 4)
Journal of Computational Intelligence and Electronic Systems     Full-text available via subscription   (Followers: 1)
Journal of Electrical and Electronics Engineering Research     Open Access   (Followers: 41)
Journal of Electrical Engineering & Electronic Technology     Hybrid Journal   (Followers: 4)
Journal of Electromagnetic Analysis and Applications     Open Access   (Followers: 6)
Journal of Electromagnetic Waves and Applications     Hybrid Journal   (Followers: 10)
Journal of Electronic Science and Technology     Open Access  
Journal of Electronics (China)     Hybrid Journal   (Followers: 5)
Journal of Energy Storage     Full-text available via subscription   (Followers: 4)
Journal of Engineered Fibers and Fabrics     Open Access  
Journal of Field Robotics     Hybrid Journal   (Followers: 5)
Journal of Guidance, Control, and Dynamics     Hybrid Journal   (Followers: 165)
Journal of Information and Telecommunication     Open Access   (Followers: 2)
Journal of Intelligent Procedures in Electrical Technology     Open Access   (Followers: 2)
Journal of Low Power Electronics     Full-text available via subscription   (Followers: 14)
Journal of Low Power Electronics and Applications     Open Access   (Followers: 9)
Journal of Microelectronics and Electronic Packaging     Hybrid Journal   (Followers: 2)
Journal of Microwave Power and Electromagnetic Energy     Hybrid Journal   (Followers: 8)
Journal of Nuclear Cardiology     Hybrid Journal   (Followers: 1)
Journal of Optoelectronics Engineering     Open Access   (Followers: 4)
Journal of Power Electronics     Hybrid Journal   (Followers: 8)
Journal of Power Electronics & Power Systems     Full-text available via subscription   (Followers: 19)
Journal of Sensors     Open Access   (Followers: 25)
Jurnal Rekayasa Elektrika     Open Access  
Jurnal Teknik Elektro     Open Access  
Jurnal Teknologi Elektro     Open Access  
Kinetik : Game Technology, Information System, Computer Network, Computing, Electronics, and Control     Open Access   (Followers: 5)
Machine Learning with Applications     Full-text available via subscription   (Followers: 3)
Majalah Ilmiah Teknologi Elektro : Journal of Electrical Technology     Open Access   (Followers: 1)
Metrology and Measurement Systems     Open Access   (Followers: 8)
Microelectronics and Solid State Electronics     Open Access   (Followers: 27)
Nanotechnology, Science and Applications     Open Access   (Followers: 7)
Nature Electronics     Hybrid Journal   (Followers: 3)
Networks: an International Journal     Hybrid Journal   (Followers: 4)
npj Flexible Electronics     Open Access  
Open Electrical & Electronic Engineering Journal     Open Access   (Followers: 1)
Open Journal of Antennas and Propagation     Open Access   (Followers: 8)
Power Electronics and Drives     Open Access   (Followers: 2)
Problemy Peredachi Informatsii     Full-text available via subscription  
Progress in Quantum Electronics     Full-text available via subscription   (Followers: 8)
Radiophysics and Quantum Electronics     Hybrid Journal   (Followers: 2)
Research & Reviews : Journal of Embedded System & Applications     Full-text available via subscription   (Followers: 6)
Security and Communication Networks     Hybrid Journal   (Followers: 2)
Selected Topics in Applied Earth Observations and Remote Sensing, IEEE Journal of     Hybrid Journal   (Followers: 62)
Semiconductors and Semimetals     Full-text available via subscription   (Followers: 1)
Sensing and Imaging : An International Journal     Hybrid Journal   (Followers: 2)
Sensors International     Open Access   (Followers: 3)
Solid State Electronics Letters     Open Access  
Solid-State Electronics     Hybrid Journal   (Followers: 7)
Superconductivity     Full-text available via subscription   (Followers: 4)
Synthesis Lectures on Power Electronics     Full-text available via subscription   (Followers: 4)
Technical Report Electronics and Computer Engineering     Open Access  
Telematique     Open Access  
TELKOMNIKA (Telecommunication, Computing, Electronics and Control)     Open Access   (Followers: 2)
Transactions on Cryptographic Hardware and Embedded Systems     Open Access   (Followers: 1)
Transactions on Electrical and Electronic Materials     Hybrid Journal   (Followers: 2)
Universal Journal of Electrical and Electronic Engineering     Open Access   (Followers: 7)
Ural Radio Engineering Journal     Open Access   (Followers: 1)
Visión Electrónica : algo más que un estado sólido     Open Access  
Wireless and Mobile Technologies     Open Access   (Followers: 4)
Електротехніка і Електромеханіка     Open Access   (Followers: 1)

IEEE Transactions on Circuits and Systems for Video Technology
Journal Prestige (SJR): 0.977
Citation Impact (CiteScore): 5
Number of Followers: 31

Hybrid Journal (it can contain Open Access articles)
ISSN (Print): 1051-8215
Published by IEEE
  • IEEE Transactions on Circuits and Systems for Video Technology publication
           information

      Abstract: Presents a listing of the editorial board, board of governors, current staff, committee members, and/or society editors for this issue of the publication.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Guest Editorial Special Section on Learning With Multimodal Data for
           Biomedical Informatics

      Authors: Zhangyang Wang;Vishal Patel;Bing Yao;Steve Jiang;Huimin Lu;Yang Shen;
      Pages: 2508 - 2511
      Abstract: In this Special Section of the IEEE Transactions on Circuits and Systems for Video Technology, it is our honor to present emerging advanced machine learning and data analytics algorithms aiming at catalyzing synergies among image/video processing, text/speech understanding, and multimodal learning in biomedical informatics. Our goals are to 1) introduce novel data-driven models to accelerate knowledge discovery in biomedicine through the seamless integration of medical data collected from imaging systems, laboratory and wearable devices, as well as other related medical devices; 2) promote the development of new multi-modal learning systems to enhance the healthcare quality and patient safety; and 3) promote new applications in biomedical informatics that can leverage or benefit from the integration of multi-modal data and machine learning.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Mutual Information-Based Graph Co-Attention Networks for Multimodal
           Prior-Guided Magnetic Resonance Imaging Segmentation

      Authors: Shaocong Mo;Ming Cai;Lanfen Lin;Ruofeng Tong;Qingqing Chen;Fang Wang;Hongjie Hu;Yutaro Iwamoto;Xian-Hua Han;Yen-Wei Chen;
      Pages: 2512 - 2526
      Abstract: Multimodal magnetic resonance imaging (MRI) provides complementary information about targets, and the segmentation of multimodal MRI is widely used as an essential preprocessing step for initial diagnosis, stage differentiation, and post-treatment efficacy evaluation in clinical situations. For the main modality or each of the modalities, it is important to enhance the visual information by modeling the connection and effectively fusing the features among them. However, the existing methods for multimodal segmentation have a drawback; they inadvertently drop information of individual modalities during the fusion process. Recently, graph learning-based methods have been applied in segmentation, and these methods have achieved considerable improvements by modeling the relationships across feature regions and reasoning using global information. In this paper, we propose a graph learning-based approach to efficiently extract modality-specific features and establish regional correspondence effectively among all modalities. In detail, after projecting features into a graph domain and employing graph convolution to propagate information across all regions for learning global modality-specific features, we propose a mutual information-based graph co-attention module to learn the weight coefficients of one bipartite graph constructed by the fully connected graphs having different modalities in the graph domain and by selectively fusing the node features. Based on the deformation diagram between the spatial-graph space and our proposed graph co-attention module, we present a multimodal prior-guided segmentation framework, which uses two strategies for two clinical situations: Modality-Specific Learning Strategy and Co-Modality Learning Strategy. Besides, the improved Co-Modality Learning Strategy is used with trainable weights in the multi-task loss for the optimization of the proposed framework. We validated our proposed modules and frameworks on two multimodal MRI datasets: our private liver lesion dataset and a public prostate zone dataset. Our experimental results on both datasets prove the superiority of our proposed approaches.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
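
The abstract above projects region features into a graph domain and uses graph convolution to propagate information across all regions. As a point of reference, here is a minimal, generic sketch of one symmetric-normalized graph-convolution step in NumPy; it illustrates the propagation idea only and is not the authors' implementation (the mutual-information co-attention module and bipartite-graph weighting are omitted).

```python
import numpy as np

def gcn_propagate(H, A, W):
    """One generic graph-convolution step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)

# toy usage: 5 feature regions with 8-dim features, projected to 4 dims
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))                        # region features
A = np.triu((rng.random((5, 5)) > 0.5).astype(float), 1)
A = A + A.T                                        # symmetric region adjacency
W = rng.normal(size=(8, 4))                        # weights (learned in practice)
print(gcn_propagate(H, A, W).shape)                # -> (5, 4)
```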
       
  • Multimodal Feature Fusion and Knowledge-Driven Learning via Experts
           Consult for Thyroid Nodule Classification

      Authors: Danilo Avola;Luigi Cinque;Alessio Fagioli;Sebastiano Filetti;Giorgio Grani;Emanuele Rodolà;
      Pages: 2527 - 2534
      Abstract: Computer-aided diagnosis (CAD) is becoming a prominent approach to assist clinicians across multiple fields. These automated systems take advantage of various computer vision (CV) procedures, as well as artificial intelligence (AI) techniques, to formulate a diagnosis of a given image, e.g., computed tomography and ultrasound. Advances in both areas (CV and AI) are enabling ever increasing performances of CAD systems, which can ultimately avoid performing invasive procedures such as fine-needle aspiration. In this study, a novel end-to-end knowledge-driven classification framework is presented. The system focuses on multimodal data generated by thyroid ultrasonography, and acts as a CAD system by providing a thyroid nodule classification into the benign and malignant categories. Specifically, the proposed system leverages cues provided by an ensemble of experts to guide the learning phase of a densely connected convolutional network (DenseNet). The ensemble is composed of various networks pretrained on ImageNet, including AlexNet, ResNet, VGG, and others. The previously computed multimodal feature parameters are used to create ultrasonography domain experts via transfer learning, which moreover decreases the number of samples required for training. To validate the proposed method, extensive experiments were performed, providing detailed performances for both the experts ensemble and the knowledge-driven DenseNet. As demonstrated by the results, the proposed system achieves relevant performance in terms of qualitative metrics for the thyroid nodule classification task, thus resulting in a great asset when formulating a diagnosis.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
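
The "experts consult" idea above has pretrained networks guide a DenseNet during training. One plausible reading, sketched below in PyTorch under stated assumptions: frozen ImageNet-pretrained experts with two-way heads supply averaged soft predictions, and the DenseNet student is trained with a cross-entropy plus distillation-style KL objective. The expert choice, temperature, and loss weighting are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

num_classes = 2  # benign vs. malignant

# Illustrative frozen "experts": pretrained backbones with 2-way heads.
alex = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
alex.classifier[6] = nn.Linear(4096, num_classes)
res = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
res.fc = nn.Linear(res.fc.in_features, num_classes)
experts = [alex.eval(), res.eval()]

# DenseNet student guided by the experts' averaged soft predictions.
student = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
student.classifier = nn.Linear(student.classifier.in_features, num_classes)

def consult_loss(x, y, T=2.0, alpha=0.5):
    """Cross-entropy on labels + KL toward the averaged expert predictions."""
    with torch.no_grad():
        soft = torch.stack([F.softmax(e(x) / T, dim=1) for e in experts]).mean(0)
    logits = student(x)
    ce = F.cross_entropy(logits, y)
    kl = F.kl_div(F.log_softmax(logits / T, dim=1), soft, reduction="batchmean")
    return alpha * ce + (1 - alpha) * (T * T) * kl

x, y = torch.randn(4, 3, 224, 224), torch.randint(0, num_classes, (4,))
print(consult_loss(x, y).item())
```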
       
  • Cohesive Multi-Modality Feature Learning and Fusion for COVID-19 Patient
           Severity Prediction

      Authors: Jinzhao Zhou;Xingming Zhang;Ziwei Zhu;Xiangyuan Lan;Lunkai Fu;Haoxiang Wang;Hanchun Wen;
      Pages: 2535 - 2549
      Abstract: The outbreak of coronavirus disease (COVID-19) has been a nightmare to citizens, hospitals, healthcare practitioners, and the economy in 2020. The overwhelming number of confirmed cases and suspected cases put forward an unprecedented challenge to the hospital’s capacity of management and medical resource distribution. To reduce the possibility of cross-infection and attend to a patient according to their severity level, expert diagnosis and sophisticated medical examinations are often required but hard to fulfil during a pandemic. To facilitate the assessment of a patient’s severity, this paper proposes a multi-modality feature learning and fusion model for end-to-end COVID-19 patient severity prediction using the blood test supported electronic medical record (EMR) and chest computerized tomography (CT) scan images. To evaluate a patient’s severity by the co-occurrence of salient clinical features, the High-order Factorization Network (HoFN) is proposed to learn the impact of a set of clinical features without tedious feature engineering. On the other hand, an attention-based deep convolutional neural network (CNN) using pre-trained parameters is used to process the lung CT images. Finally, to achieve cohesion of cross-modality representation, we design a loss function to shift the deep features of both modalities into the same feature space, which improves the model’s performance and robustness when one modality is absent. Experimental results demonstrate that the proposed multi-modality feature learning and fusion model achieves high performance in an authentic scenario.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
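
The abstract describes a loss that shifts deep features of both modalities into the same feature space so the model degrades gracefully when one modality is absent. A minimal sketch of that idea, assuming a simple MSE alignment term between the two modality embeddings (the paper's actual cohesion loss may differ); all layer sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CohesiveFusion(nn.Module):
    """Toy two-branch model: EMR vector branch + CT feature branch, shared space."""
    def __init__(self, emr_dim=32, ct_dim=256, d=64, n_classes=3):
        super().__init__()
        self.emr = nn.Sequential(nn.Linear(emr_dim, d), nn.ReLU(), nn.Linear(d, d))
        self.ct = nn.Sequential(nn.Linear(ct_dim, d), nn.ReLU(), nn.Linear(d, d))
        self.head = nn.Linear(d, n_classes)

    def forward(self, emr_x, ct_x):
        z_emr, z_ct = self.emr(emr_x), self.ct(ct_x)
        logits = self.head(0.5 * (z_emr + z_ct))   # fused severity prediction
        return logits, z_emr, z_ct

def loss_fn(logits, y, z_emr, z_ct, lam=0.1):
    # classification loss + alignment term pulling both modalities together;
    # a missing modality at test time can then fall back on the shared space
    return F.cross_entropy(logits, y) + lam * F.mse_loss(z_emr, z_ct)

model = CohesiveFusion()
logits, ze, zc = model(torch.randn(8, 32), torch.randn(8, 256))
print(loss_fn(logits, torch.randint(0, 3, (8,)), ze, zc).item())
```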
       
  • Hierarchical Deep CNN Feature Set-Based Representation Learning for Robust
           Cross-Resolution Face Recognition

      Authors: Guangwei Gao;Yi Yu;Jian Yang;Guo-Jun Qi;Meng Yang;
      Pages: 2550 - 2560
      Abstract: Cross-resolution face recognition (CRFR), which is important in intelligent surveillance and biometric forensics, refers to the problem of matching a low-resolution (LR) probe face image against high-resolution (HR) gallery face images. Existing shallow learning-based and deep learning-based methods focus on mapping the HR-LR face pairs into a joint feature space where the resolution discrepancy is mitigated. However, few works consider how to extract and utilize the intermediate discriminative features from the noisy LR query faces to further mitigate the resolution discrepancy due to the resolution limitations. In this study, we aim to fully exploit the multi-level deep convolutional neural network (CNN) feature set for robust CRFR. In particular, our contributions are threefold. (i) To learn more robust and discriminative features, we adaptively fuse the contextual features from different layers. (ii) To fully exploit these contextual features, we design a feature set-based representation learning (FSRL) scheme to collaboratively represent the hierarchical features for more accurate recognition. Moreover, FSRL utilizes the primitive form of feature maps to keep the latent structural information, especially in noisy cases. (iii) To further promote the recognition performance, we fuse the hierarchical recognition outputs from different stages. Meanwhile, the discriminability from different scales can also be fully integrated. By exploiting these advantages, the efficiency of the proposed method is delivered. Experimental results on several face datasets have verified the superiority of the presented algorithm over the other competitive CRFR approaches.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • A Spatio-Temporal Approach for Apathy Classification

      Authors: Abhijit Das;Xuesong Niu;Antitza Dantcheva;S. L. Happy;Hu Han;Radia Zeghari;Philippe Robert;Shiguang Shan;Francois Bremond;Xilin Chen;
      Pages: 2561 - 2573
      Abstract: Apathy is characterized by symptoms such as reduced emotional response, lack of motivation, and limited social interaction. Current methods for apathy diagnosis require the patient’s presence in a clinic and time-consuming clinical interviews, which are costly and inconvenient for both patients and clinical staff, hindering, among other things, large-scale diagnostics. In this work, we propose a novel spatio-temporal framework for apathy classification, which is streamlined to analyze facial dynamics and emotion in videos. Specifically, we divide the videos into smaller clips, and proceed to extract associated facial dynamics and emotion-based features. Statistical representations/descriptors based on each feature and clip serve as input of the proposed Gated Recurrent Unit (GRU) architecture. Temporal representations of individual features at the lower level of the proposed architecture are combined at deeper layers of the proposed GRU architecture, in order to obtain the final feature set for apathy classification. Based on extensive experiments, we show that fusion of characteristics such as emotion and facial dynamics in the proposed deep bi-directional GRU obtains an accuracy of 95.34% in apathy classification.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
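
Clip-level statistical descriptors feeding a bi-directional GRU, as described above, can be sketched in a few lines of PyTorch. This is a minimal stand-in, not the paper's multi-stream architecture (which fuses separate feature streams at deeper layers); feature and hidden sizes are illustrative.

```python
import torch
import torch.nn as nn

class BiGRUApathy(nn.Module):
    """Minimal bi-directional GRU over per-clip descriptors -> binary label."""
    def __init__(self, feat_dim=48, hidden=64):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)   # apathetic vs. non-apathetic

    def forward(self, clips):                  # clips: (batch, n_clips, feat_dim)
        out, _ = self.gru(clips)
        return self.head(out[:, -1])           # last step summarizes the video

model = BiGRUApathy()
logits = model(torch.randn(4, 10, 48))         # 4 videos, 10 clips each
print(logits.shape)                            # -> (4, 2)
```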
       
  • An OCR Post-Correction Approach Using Deep Learning for Processing Medical
           Reports

      Authors: Srinidhi Karthikeyan;Alba G. Seco de Herrera;Faiyaz Doctor;Asim Mirza;
      Pages: 2574 - 2581
      Abstract: According to a recent Deloitte study, the COVID-19 pandemic continues to place a huge strain on the global health care sector. COVID-19 has also catalysed digital transformation across the sector for improving operational efficiencies. As a result, the amount of digitally stored patient data such as discharge letters, scan images, test results or free text entries by doctors has grown significantly. In 2020, 2314 exabytes of medical data were generated globally. This medical data does not conform to a generic structure and is mostly in the form of unstructured digitally generated or scanned paper documents stored as part of a patient’s medical reports. This unstructured data is digitised using an Optical Character Recognition (OCR) process. A key challenge here is that the accuracy of the OCR process varies due to the inability of current OCR engines to correctly transcribe scanned or handwritten documents in which text may be skewed, obscured or illegible. This is compounded by the fact that processed text comprises specific medical terminology that does not necessarily form part of general language lexicons. The proposed work uses a deep neural network-based self-supervised pre-training technique: Robustly Optimized Bidirectional Encoder Representations from Transformers (RoBERTa) that can learn to predict hidden (masked) sections of text to fill in the gaps of non-transcribable parts of the documents being processed. Evaluating the proposed method on domain-specific datasets which include real medical documents shows a significantly reduced word error rate, demonstrating the effectiveness of the approach.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
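
The core mechanism above, RoBERTa predicting masked spans to repair non-transcribable OCR output, can be tried directly with the Hugging Face fill-mask pipeline. This sketch uses the generic roberta-base checkpoint; the paper additionally adapts the model to medical text, which is omitted here, and the sentence is a made-up example.

```python
from transformers import pipeline

# Generic roberta-base stands in for the paper's domain-adapted model.
fill = pipeline("fill-mask", model="roberta-base")

# Suppose the OCR engine failed on one token; mark it with the mask token.
ocr_line = f"The patient was prescribed 500 mg of {fill.tokenizer.mask_token} twice daily."
for cand in fill(ocr_line, top_k=3):
    print(f"{cand['token_str']!r}  score={cand['score']:.3f}")
```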
       
  • Characterization of Pulmonary Nodules in Computed Tomography Images Based
           on Pseudo-Labeling Using Radiology Reports

      Authors: Yohei Momoki;Akimichi Ichinose;Yutaro Shigeto;Ukyo Honda;Keigo Nakamura;Yuji Matsumoto;
      Pages: 2582 - 2591
      Abstract: A computer-aided diagnosis (CAD) system that characterizes nodules in medical images can help radiologists determine their malignancy. Preparing large volumes of labeled data for CAD systems, however, requires advanced medical knowledge. This makes it extremely difficult to develop such systems, despite their growing demand. In this paper, we propose a new training method to build an image classifier for characterization of nodules utilizing pseudo-labels, i.e., image labels automatically retrieved from radiology reports. A radiology report is a type of record in which radiologists present a summary of lesion characteristics and diagnosis. Labeling radiology reports is much easier than labeling radiology images, and can be done without high expertise. Using several thousand labeled reports, we constructed a hierarchical attention network-based text classifier to assign pseudo-labels of the characteristics of pulmonary nodules with high accuracy (macro F1-score of 0.941). Experimental results show that the image classifier trained with the pseudo-labels can achieve almost the same performance as the one trained with labels annotated by radiologists: AUC 0.848 for the model trained with the pseudo-labels on 3,000 computed tomography (CT) images and 0.847 for the model trained with the manual labels on 800 CT images.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
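
The training recipe above is: label reports, not images, then let a report classifier pseudo-label the paired images. A schematic sketch of the pipeline, with TF-IDF plus logistic regression standing in for the paper's hierarchical attention network and toy report strings replacing real data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# 1) Train a report classifier on labeled reports (TF-IDF + logistic
#    regression stands in for the paper's hierarchical attention network).
labeled_reports = [
    "spiculated solid nodule in the right upper lobe",
    "smooth calcified nodule, likely a benign granuloma",
    "part-solid nodule with a ground-glass component",
    "well-circumscribed nodule with smooth margins",
]  # toy examples, not real data
labels = ["spiculated", "calcified", "ground-glass", "smooth"]
text_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
text_clf.fit(labeled_reports, labels)

# 2) Pseudo-label unlabeled (report, CT image) pairs via their reports alone.
unlabeled_reports = ["irregular spiculated nodule noted in the left lower lobe"]
pseudo_labels = text_clf.predict(unlabeled_reports)
print(pseudo_labels)

# 3) The pseudo-labels would then supervise an image classifier on the
#    matching CT volumes (image model and training loop omitted here).
```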
       
  • Continuous Prediction of Lower-Limb Kinematics From Multi-Modal Biomedical
           Signals

      Authors: Chunzhi Yi;Feng Jiang;Shengping Zhang;Hao Guo;Chifu Yang;Zhen Ding;Baichun Wei;Xiangyuan Lan;Huiyu Zhou;
      Pages: 2592 - 2602
      Abstract: The fast-growing techniques of measuring and fusing multi-modal biomedical signals enable advanced motor intent decoding schemes of lower-limb exoskeletons, meeting the increasing demand for rehabilitative or assistive applications of take-home healthcare. Challenges of exoskeletons’ motor intent decoding schemes remain in making a continuous prediction to compensate for the hysteretic response caused by mechanical transmission. In this paper, we solve this problem by proposing an ahead-of-time continuous prediction of lower-limb kinematics, with the prediction of knee angles during level walking as a case study. Firstly, an end-to-end kinematics prediction network (KinPreNet), consisting of a feature extractor and an angle predictor, is proposed and experimentally compared with features and methods traditionally used in ahead-of-time prediction of gait phases. Secondly, inspired by the electromechanical delay (EMD), we further explore our algorithm’s capability of compensating for the response delay of mechanical transmission by validating the performance at different sections of prediction time, and we experimentally reveal the time boundary for compensating the hysteretic response. Thirdly, a comparison of employing EMG signals or not is performed to reveal the collaborative contributions of the EMG and kinematic signals to the continuous prediction. During the experiments, EMG signals of nine muscles and knee angles calculated from inertial measurement unit (IMU) signals are recorded from ten healthy subjects. Our algorithm can predict knee angles with an averaged RMSE of 3.98 deg, which is better than the 15.95-deg averaged RMSE of the traditional methods of ahead-of-time prediction. The best prediction time lies in the interval between 27 ms and 108 ms. To the best of our knowledge, this is the first study of continuously predicting lower-limb kinematics in an ahead-of-time manner based on the electromechanical delay (EMD).
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
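
The crux of an ahead-of-time prediction scheme like the one above is the supervision pairing: each window of past signal is matched with the joint angle some horizon after the window ends, so the model learns to lead the mechanical response. A sketch of building such pairs from synchronized streams; the 60 ms horizon (within the paper's reported 27-108 ms best interval), window length, stride, and sampling rate are illustrative.

```python
import numpy as np

def make_ahead_of_time_pairs(emg, angles, win=100, horizon_ms=60, fs=1000):
    """Pair each EMG window with the knee angle `horizon_ms` after it ends.

    emg: (T, n_channels) signal, angles: (T,) knee angle, fs: sampling rate (Hz).
    """
    h = int(horizon_ms * fs / 1000)            # horizon in samples
    X, y = [], []
    for t in range(0, len(emg) - win - h, 10): # stride 10 thins the toy dataset
        X.append(emg[t:t + win].ravel())       # flattened window as features
        y.append(angles[t + win + h])          # future angle = regression target
    return np.asarray(X), np.asarray(y)

# toy streams: 10 s at 1 kHz, 9 EMG channels (as in the study)
rng = np.random.default_rng(1)
emg = rng.normal(size=(10_000, 9))
angles = np.sin(np.linspace(0, 20 * np.pi, 10_000))
X, y = make_ahead_of_time_pairs(emg, angles)
print(X.shape, y.shape)
```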
       
  • An Ensemble Framework for Improving the Prediction of Deleterious
           Synonymous Mutation

      Authors: Na Cheng;Huadong Wang;Xi Tang;Tao Zhang;Jie Gui;Chun-Hou Zheng;Junfeng Xia;
      Pages: 2603 - 2611
      Abstract: In recent years, the association between synonymous mutations (SMs) and human diseases has been uncovered in many studies. Identifying deleterious SMs is a challenge in the field of medical genomics. Although several computational methods have been proposed in the past years, the precise prediction of deleterious SMs remains challenging. In this work, we proposed a predictor named EnDSM, which is an accurate method based on an ensemble framework. We explored multimodal features across four groups including functional score, conservation, splicing, and sequence features, and we then trained eight conceptually different machine learning classifiers for each of them, resulting in 32 base classification models. We further selected four base models according to their prediction performance, and the predictive probabilities of these base classification models were subsequently used as the input feature vectors of a logistic regression classifier to construct the ensemble learning model. The results suggest that EnDSM achieves better performance compared with other state-of-the-art predictors on the training and independent test datasets. We anticipate that our ensemble predictor EnDSM will become a valuable tool for deleterious SM prediction. The EnDSM server interface, along with the benchmarking data sets, is freely available at http://bioinfo.ahu.edu.cn/EnDSM.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
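
EnDSM's ensemble construction, base classifiers whose predicted probabilities feed a logistic-regression meta-learner, matches the classic stacking pattern. A minimal scikit-learn sketch on synthetic data; these four base models are stand-ins, not the four the paper selects by validation performance.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in feature matrix for the four feature groups (functional score,
# conservation, splicing, sequence); labels = deleterious vs. benign SM.
X, y = make_classification(n_samples=500, n_features=40, random_state=0)

# Four base models; their predicted probabilities feed a logistic-regression
# meta-learner, mirroring the ensemble scheme described in the abstract.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0)),
                ("nb", GaussianNB()),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
)
stack.fit(X, y)
print(stack.score(X, y))
```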
       
  • PhyDAA: Physiological Dataset Assessing Attention

      Authors: Victor Delvigne;Hazem Wannous;Thierry Dutoit;Laurence Ris;Jean-Philippe Vandeborre;
      Pages: 2612 - 2623
      Abstract: Attention Deficit Hyperactivity Disorder (ADHD) is the most prevalent neurodevelopmental disorder among children. It affects patients’ lives in many ways: inattention, difficulty with stimuli inhibition or motor function regulation. Different treatments exist today, but these can present side effects or are not effective for all subgroups. Neurofeedback (NF) is an innovative treatment consisting of displaying brain activity to the patient. NF training could consist of a virtual reality (VR) video-game in which the participant’s attention affects the game. Since attention is assessed through physiological signals, one of the main steps is to design an estimator of the attention state. We present a novel framework able to record physiological signals in specific attention states and to estimate the corresponding attention state. We propose a database composed of electroencephalography (EEG) and eye-tracking signals, labelled with a score representing the attention span, for 32 healthy participants. Different features are extracted from the signals and machine learning (ML) algorithms are proposed. Our approach exhibits high accuracy for attention estimation, which corroborates a correlation between attention state and physiological signals (i.e., EEG and eye-tracking signals). The dataset has been made publicly available to promote research in the domain, and we encourage other scientists to apply their own approaches for attention estimation.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
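
A common concrete instantiation of "features extracted from the signals plus ML algorithms" for EEG, sketched below under stated assumptions: Welch band powers per channel feeding a random-forest classifier. Bands, sampling rate, and trial shapes are illustrative, and the paper's actual feature set is richer and includes eye-tracking.

```python
import numpy as np
from scipy.signal import welch
from sklearn.ensemble import RandomForestClassifier

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}  # common choices

def band_powers(eeg, fs=256):
    """Per-channel Welch band powers. eeg: (n_channels, n_samples)."""
    f, psd = welch(eeg, fs=fs, nperseg=fs * 2)
    feats = [psd[:, (f >= lo) & (f < hi)].mean(axis=1) for lo, hi in BANDS.values()]
    return np.concatenate(feats)               # (n_channels * n_bands,)

# toy dataset: 40 trials, 8 channels, 4 s at 256 Hz, binary attention label
rng = np.random.default_rng(2)
X = np.stack([band_powers(rng.normal(size=(8, 1024))) for _ in range(40)])
y = rng.integers(0, 2, size=40)
clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.score(X, y))
```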
       
  • Hash Learning With Variable Quantization for Large-Scale Retrieval

      Authors: Yuan Cao;Sheng Chen;Jie Gui;Heng Qi;Zhiyang Li;Chao Liu;
      Pages: 2624 - 2637
      Abstract: Approximate Nearest Neighbor (ANN) search is the core problem in many large-scale machine learning and computer vision applications such as multimodal retrieval. Hashing is becoming increasingly popular, since it can provide efficient similarity search and compact data representations suitable for handling such large-scale ANN search problems. Most hashing algorithms concentrate on learning more effective projection functions. However, the accuracy loss in the quantization step has been ignored and barely studied. In this paper, we analyse the importance of the various projected dimensions, distribute them into several groups and quantize them with two types of values which can both better preserve the neighborhood structure among data. One is Variable Integer-based Quantization (VIQ) that quantizes each projected dimension with integer values. The other is Variable Codebook-based Quantization (VCQ) that quantizes each projected dimension with corresponding codebook values. We conduct experiments on five common public data sets containing up to one million vectors. The results show that the proposed VCQ and VIQ algorithms can both achieve much higher accuracy than state-of-the-art quantization methods. Furthermore, although VCQ performs better than VIQ, ANN search with VIQ provides much higher search efficiency.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
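
Variable Integer-based Quantization, as sketched in the abstract, assigns different numbers of integer levels to different projected dimensions. The toy NumPy version below uses per-dimension variance as the importance proxy and a two-tier bit allocation; both choices are illustrative, as the paper's allocation rule may differ.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 64))                # database vectors
P = rng.normal(size=(64, 16))                  # random projection to 16 dims
Z = X @ P

# Importance proxy: per-dimension variance -> more quantization levels.
var = Z.var(axis=0)
bits = np.where(var >= np.median(var), 3, 1)   # 8 vs. 2 levels (illustrative)

def variable_integer_quantize(Z, bits):
    """Quantize each column into 2**bits[j] integer levels over its own range."""
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    levels = 2 ** bits
    scaled = (Z - lo) / (hi - lo + 1e-12) * (levels - 1)
    return np.rint(scaled).astype(np.int32)

codes = variable_integer_quantize(Z, bits)
print(codes.shape, codes.max(axis=0)[:5])
```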
       
  • Feature Selection With Multi-Source Transfer

      Authors: Joey Tianyi Zhou;
      Pages: 2638 - 2646
      Abstract: Feature selection aims at choosing a subset of features to represent the original feature space. In practice, however, it is hard to achieve desirable performance due to limited training data. To alleviate this issue, we propose a novel problem named feature selection with multi-source transfer, where privileged information from another data source or modality, available only during the training phase, is exploited to improve the performance of feature selection. To be exact, we propose a novel objective function that formulates the privileged information into feature selection. Moreover, an efficient optimization algorithm is introduced to solve the proposed problem in high dimensions. Extensive experimental results demonstrate that the proposed algorithm significantly outperforms several popular algorithms, especially when the training data size and the selected feature size are small.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Survey on Mapping Human Hand Motion to Robotic Hands for Teleoperation

      Authors: Rui Li;Hongyu Wang;Zhenyu Liu;
      Pages: 2647 - 2665
      Abstract: Mapping human hand motion to robotic hands has great significance in a wide range of applications, such as teleoperation and imitation learning. The ultimate goal is to develop a device-independent control solution based on human hand synergies. Over the past twenty years, a considerable number of mapping methods have been proposed, but most of them use intrusive devices, such as the CyberGlove data gloves, to capture human hand motion. Only recently have a small number of mapping methods been proposed that build on vision-based human hand pose estimation. Traditionally, mapping methods and vision-based human hand pose estimation have been studied independently. To the best of our knowledge, no review has been conducted to summarize the achievements in haptic mapping methods or to explore the feasibility of applying off-the-shelf human hand pose estimation algorithms to teleoperation. To address this literature gap, we present the first survey on mapping human hand motion to robotic hands from a kinematic and algorithmic perspective. We discuss the realistic challenges, intuitively summarize recent mapping methods, analyze the theoretical solutions, and provide a teleoperation-oriented human hand pose estimation overview. As a preliminary exploration, a vision-based human hand pose estimation algorithm is introduced for robotic hand teleoperation.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Real-Time Tone Mapping: A Survey and Cross-Implementation Hardware
           Benchmark

      Authors: Yafei Ou;Prasoon Ambalathankandy;Shinya Takamaeda;Masato Motomura;Tetsuya Asai;Masayuki Ikebe;
      Pages: 2666 - 2686
      Abstract: The rising demand for high-quality displays has spurred active research in high dynamic range (HDR) imaging, which has the potential to replace standard dynamic range imaging. This is due to HDR’s features like accurate reproducibility of a scene with its entire spectrum of visible lighting and color depth. But this capability comes with expensive capture, display, storage and distribution resource requirements. Also, display of HDR images/video content on an ordinary display device with limited dynamic range requires some form of adaptation. Many adaptation algorithms, widely known as tone mapping (TM) operators, have been studied and proposed in the last few decades. In this article, we present a comprehensive survey of 60 TM algorithms that have been implemented on hardware for acceleration and real-time performance. In this state-of-the-art survey, we discuss those TM algorithms which have been implemented on GPU, FPGA, and ASIC in terms of their hardware specifications and performance. Output image quality is an important metric for TM algorithms. From our literature survey we found that various objective quality metrics have been used to demonstrate the quality of those algorithms’ hardware implementations. We have compiled the metrics used in this survey and analyzed the relationship between hardware cost, image quality and computational efficiency. Currently, machine learning-based (ML) algorithms have become an important tool for solving many image processing tasks, and this article concludes with a discussion on future research directions to realize ML-based TM operators on hardware.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • A Decade Survey of Content Based Image Retrieval Using Deep Learning

      Authors: Shiv Ram Dubey;
      Pages: 2687 - 2704
      Abstract: Content-based image retrieval aims to find similar images in a large-scale dataset given a query image. Generally, the similarity between the representative features of the query image and dataset images is used to rank the images for retrieval. In the early days, various hand-designed feature descriptors were investigated based on visual cues such as color, texture, and shape that represent the images. However, over the past decade, deep learning has emerged as a dominant alternative to hand-designed feature engineering; it learns the features automatically from the data. This paper presents a comprehensive survey of deep learning-based developments in the past decade for content-based image retrieval. The categorization of existing state-of-the-art methods from different perspectives is also performed for greater understanding of the progress. The taxonomy used in this survey covers different supervision settings, different networks, different descriptor types and different retrieval types. A performance analysis is also performed using the state-of-the-art methods. The insights are also presented for the benefit of researchers to observe the progress and to make the best choices. The survey presented in this paper will help in further research progress in image retrieval using deep learning.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Quadratic Terms Based Point-to-Surface 3D Representation for Deep Learning
           of Point Cloud

      Authors: Tiecheng Sun;Guanghui Liu;Ru Li;Shuaicheng Liu;Shuyuan Zhu;Bing Zeng;
      Pages: 2705 - 2718
      Abstract: In this paper, we introduce a novel point-to-surface representation for 3D point cloud learning. Unlike previous methods that mainly adopt voxel, mesh, or point coordinates, we propose to tackle this problem from a new perspective: learn a set of static and global reference surfaces based on quadratic terms to describe 3D shapes, such that the coordinates of a 3D point (x, y, z) can be extended to quadratic terms (xy, xz, yz, ...) and transformed into the relationship between the local point and the global reference surfaces. Then, the static surfaces are changed into dynamic surfaces by adaptive contribution weighting to improve the descriptive capability. Towards this end, we propose our point-to-surface representation, a new representation for 3D point cloud learning that has not been attempted before, which can assemble local and global geometric information effectively by building connections between the point cloud and the learned reference surfaces. Given 3D points, we show how the reference surfaces are constructed, and how they are inserted into the 3D learning pipeline for different tasks. The experimental results confirm the effectiveness of our new representation, which has outperformed the state-of-the-art methods on the tasks of 3D classification and segmentation.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
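
The representational move described above, extending (x, y, z) to quadratic terms and relating each point to global reference surfaces, can be made concrete in a few lines. In this sketch the surface coefficients are random stand-ins for the learned reference surfaces.

```python
import numpy as np

def quadric_basis(pts):
    """Lift (x, y, z) to the 10 quadric monomials [x, y, z, xy, xz, yz, x2, y2, z2, 1]."""
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    return np.stack([x, y, z, x*y, x*z, y*z, x*x, y*y, z*z, np.ones_like(x)], axis=1)

def point_to_surface_features(pts, surfaces):
    """Signed quadric values of each point w.r.t. each reference surface.

    surfaces: (n_surfaces, 10) coefficient rows; learned in the paper,
    random stand-ins here.
    """
    return quadric_basis(pts) @ surfaces.T     # (n_points, n_surfaces)

rng = np.random.default_rng(4)
cloud = rng.normal(size=(2048, 3))             # a toy point cloud
surfaces = rng.normal(size=(8, 10))            # 8 global reference quadrics
feats = point_to_surface_features(cloud, surfaces)
print(feats.shape)                             # -> (2048, 8)
```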
       
  • Image-Scale-Symmetric Cooperative Network for Defocus Blur Detection

      Authors: Fan Zhao;Huimin Lu;Wenda Zhao;Libo Yao;
      Pages: 2719 - 2731
      Abstract: Defocus blur detection (DBD) for natural images is a challenging vision task, especially in the presence of homogeneous regions and gradual boundaries. In this paper, we propose a novel image-scale-symmetric cooperative network (IS2CNet) for DBD. On one hand, as image scales go from large to small, IS2CNet gradually broadens its receptive coverage of the image content. Thus, the homogeneous region detection map can be optimized gradually. On the other hand, as image scales go from small to large, IS2CNet gradually perceives the high-resolution image content, thereby gradually refining transition region detection. In addition, we propose a hierarchical feature integration and bi-directional delivering mechanism to transfer the hierarchical features of the previous image-scale network to the input and tail of the current image-scale network, guiding the current image-scale network to better learn the residual. The proposed approach achieves state-of-the-art performance on existing datasets. Codes and results are available at: https://github.com/wdzhao123/IS2CNet.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • A Novel Video Salient Object Detection Method via Semisupervised Motion
           Quality Perception

      Authors: Chenglizhao Chen;Jia Song;Chong Peng;Guodong Wang;Yuming Fang;
      Pages: 2732 - 2745
      Abstract: Previous video salient object detection (VSOD) approaches have mainly focused on the perspective of network design for achieving performance improvements. However, with the recent slowdown in the development of deep learning techniques, it might become increasingly difficult to anticipate another breakthrough solely via complex networks. Therefore, this paper proposes a universal learning scheme to obtain a further 3% performance improvement for all state-of-the-art (SOTA) VSOD models. The major highlight of our method is that we propose ‘motion quality’, a new concept for mining video frames from the ‘buffered’ testing video stream to construct a fine-tuning set. Using our approach, the salient objects in all frames of this set are well detected by the ‘target SOTA model’, i.e., the one we want to improve. Thus, the VSOD results of the mined set, which were previously derived by the target SOTA model, can be directly applied as pseudolearning objectives to fine-tune a completely new spatial model that has been pretrained on the widely used DAVIS-TR set. Since some spatial scenes of the buffered testing video stream have already been seen, the fine-tuned spatial model can perform very well on the remaining unseen testing frames, outperforming the target SOTA model significantly. Although offline model fine-tuning requires additional time costs, the performance gain can still benefit scenarios without speed requirements. Moreover, its semisupervised methodology might have considerable potential to inspire the VSOD community in the future.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Fine-Grained Image Quality Assessment: A Revisit and Further Thinking

      Authors: Xinfeng Zhang;Weisi Lin;Qingming Huang;
      Pages: 2746 - 2759
      Abstract: Image quality assessment (IQA) plays a central role in many image processing algorithms and systems. Although many popular IQA models achieve high performance on existing released databases, they are still not well accepted in practical applications due to their not-always-satisfactory accuracy on real-world data and situations. In this paper, we revisit IQA research and point out an ignored but interesting problem in IQA: the coarse-grained statistical results evaluated on existing databases (i.e., where quality variation is sufficiently big, as in the setting of most IQA databases to date) mask fine-grained differentiation. Accordingly, we present a survey on image quality assessment from a new perspective: fine-grained image quality assessment (FG-IQA). Recent FG-IQA research on five major kinds of images is introduced, and some popular IQA methods are analyzed from the FG-IQA perspective. The potential problems of current IQA research based on existing coarse-grained databases are analyzed, and the necessity of more FG-IQA research is justified. Finally, we discuss some challenges and possible directions for future work in FG-IQA.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • TMS-GAN: A Twofold Multi-Scale Generative Adversarial Network for Single
           Image Dehazing

      Authors: Pengyu Wang;Hongqing Zhu;Hui Huang;Han Zhang;Nan Wang;
      Pages: 2760 - 2772
      Abstract: In recent years, learning-based single image dehazing networks have been comprehensively developed. However, performance improvement is limited due to the domain shift between the synthetic hazy images used in training and the real-world hazy images not seen in training. To alleviate this issue, this paper proposes a training scheme targeted at real-world dehazing, which nearly realizes paired real-world data training. As a result, a Twofold Multi-scale Generative Adversarial Network (TMS-GAN) consisting of a Haze-generation GAN (HgGAN) and a Haze-removal GAN (HrGAN) is designed. HgGAN attributes real haze properties to synthetic images, and HrGAN removes haze from both synthetic and generated fake realistic data under supervision. Thus, the proposed method can better adapt to real-world image dehazing using this cooperative training scheme. Meanwhile, several structural advances of TMS-GAN also improve dehazing performance. Specifically, a haze residual map based on the atmospheric scattering model is deduced in HgGAN for fake realistic data generation. The dual-branch generator in HrGAN devotes one branch to detail restoration alongside another color branch. A plug-and-play Multi-attention Progressive Fusion Module (MAPFM) is proposed and inserted in both HgGAN and HrGAN. MAPFM incorporates a multi-attention mechanism to guide multi-scale feature fusion in a progressive manner, in which the Adjacency-attention Block (AAB) can capture contributing features of each level and the Self-attention Block (SAB) can establish non-local dependencies in feature fusion. Experiments on mainstream benchmarks show that the proposed framework is superior among single image dehazing methods, especially on real-world hazy images.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Residual-Guided Multiscale Fusion Network for Bit-Depth Enhancement

      Authors: Jing Liu;Xin Wen;Weizhi Nie;Yuting Su;Peiguang Jing;Xiaokang Yang;
      Pages: 2773 - 2786
      Abstract: Bit-depth enhancement (BDE) is a challenging task due to stubborn false contour artifacts and disappeared detailed information. Given the mixture of structural distortions and real edges in low bit-depth (LBD) images, both large and small receptive fields (RFs) are critical for BDE tasks. However, even powerful state-of-the-art CNN-based methods can hardly capture sufficient LBD features under multiple RFs. This paper proposes a residual-guided multiscale fusion network (RMFNet) to explore multiscale features in a residual manner. We find that the shuffling operation provides desired multiscale inputs for effectively distinguishing false contours from real edges without any loss of information. Therefore, we shuffle LBD images to multiple scales and then fully extract residual features under different RFs with corresponding subnets. To facilitate interscale guidance from the global context to the local context, we progressively transfer the encoded residual features between adjacent subnets from top to bottom. We further propose a dual-branch depthwise group fusion (DDGF) module to fully capture inter- and inner correlations of multiscale features with fewer parameters. Finally, extensive experiments show that our algorithm achieves excellent performance improvement both quantitatively and qualitatively, verifying its effectiveness.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
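
The "shuffling" the abstract relies on is a lossless space-to-depth rearrangement: resolution drops but every pixel survives as a channel, so false-contour cues are preserved at each scale. A short demonstration with PyTorch's pixel_unshuffle (shown stand-alone; how RMFNet wires the scales into its subnets is not reproduced here).

```python
import torch
import torch.nn.functional as F

# A low bit-depth image batch: (N, C, H, W)
x = torch.randn(1, 3, 64, 64)

# Space-to-depth "shuffle": halve spatial size, quadruple channels. Unlike
# downsampling, no information is lost, so structural-distortion cues remain.
x_half = F.pixel_unshuffle(x, downscale_factor=2)     # (1, 12, 32, 32)
x_quarter = F.pixel_unshuffle(x, downscale_factor=4)  # (1, 48, 16, 16)

# pixel_shuffle inverts it exactly, as used when returning to full resolution.
assert torch.equal(F.pixel_shuffle(x_half, 2), x)
print(x_half.shape, x_quarter.shape)
```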
       
  • Structured Context Enhancement Network for Mouse Pose Estimation

      Authors: Feixiang Zhou;Zheheng Jiang;Zhihua Liu;Fang Chen;Long Chen;Lei Tong;Zhile Yang;Haikuan Wang;Minrui Fei;Ling Li;Huiyu Zhou;
      Pages: 2787 - 2801
      Abstract: Automated analysis of mouse behaviours is crucial for many applications in neuroscience. However, quantifying mouse behaviours from videos or images remains a challenging problem, where pose estimation plays an important role in describing mouse behaviours. Although deep learning based methods have made promising advances in human pose estimation, they cannot be directly applied to pose estimation of mice due to different physiological natures. Particularly, since the mouse body is highly deformable, it is a challenge to accurately locate different keypoints on the mouse body. In this paper, we propose a novel Hourglass network based model, namely the Graphical Model based Structured Context Enhancement Network (GM-SCENet), in which two effective modules, i.e., the Structured Context Mixer (SCM) and Cascaded Multi-level Supervision (CMLS), are implemented. SCM can adaptively learn and enhance the proposed structured context information of each mouse part through a novel graphical model that takes into account the motion difference between body parts. The CMLS module is then designed to jointly train the proposed SCM and the Hourglass network by generating multi-level information, increasing the robustness of the whole network. Using the multi-level prediction information from SCM and CMLS, we develop an inference method to ensure the accuracy of the localisation results. Finally, we evaluate our proposed approach against several baselines on our Parkinson’s Disease Mouse Behaviour (PDMB) and the standard DeepLabCut Mouse Pose datasets. The experimental results show that our method achieves better or competitive performance against the other state-of-the-art approaches.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Fuzzified Contrast Enhancement for Nearly Invisible Images

      Authors: Reman Kumar;Ashish Kumar Bhandari;
      Pages: 2802 - 2813
      Abstract: Image enhancement is a basic requirement for any computer vision application that further processes an image. A common limitation of most existing methods, when applied to nearly invisible images, is the loss of color details during the enhancement process. We therefore propose a fuzzy c-means clustering-based enhancement method that enhances a perceptually invisible image while preserving its color and naturalness. In this method, the image pixels are grouped into different clusters and assigned membership values to those clusters. Based on these membership values, each pixel's intensity level is modified in the spatial domain. Modifying the gray levels in proportion to the membership values stretches the image histogram while keeping its shape similar to the original histogram. The process results in a very small shift in the mean intensity, which preserves the color- and brightness-related information of the image. The method enhances the image contrast and maintains naturalness without introducing any artifacts. Simulation results on standard datasets show that the proposed algorithm is superior to many state-of-the-art and traditional methods for perceptually invisible images. (A toy illustration of membership-driven gray-level stretching follows this entry.)
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
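
      A toy illustration of the principle described above, where fuzzy c-means memberships of gray levels drive a proportional intensity stretch; the cluster centers, target levels, and data here are invented, and this is not the authors' algorithm:

          import numpy as np

          def fcm_memberships(x, centers, m=2.0):
              """Fuzzy c-means memberships of 1-D intensities x to the given centers."""
              d = np.abs(x[:, None] - centers[None, :]) + 1e-9   # (N, C) distances
              inv = d ** (-2.0 / (m - 1.0))
              return inv / inv.sum(axis=1, keepdims=True)        # rows sum to 1

          img = np.random.randint(0, 40, (64, 64)).astype(float) # nearly invisible image
          x = img.ravel()
          centers = np.array([x.min(), x.mean(), x.max()])       # fixed toy centers

          u = fcm_memberships(x, centers)                        # (N, 3) memberships
          # Move each pixel toward the cluster targets in proportion to its
          # memberships, which stretches the histogram while keeping its shape.
          targets = np.array([0.0, 127.5, 255.0])
          enhanced = (u * targets[None, :]).sum(axis=1).reshape(img.shape)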
       
  • Triple Adversarial Learning and Multi-View Imaginative Reasoning for
           Unsupervised Domain Adaptation Person Re-Identification

      Authors: Huafeng Li;Neng Dong;Zhengtao Yu;Dapeng Tao;Guanqiu Qi;
      Pages: 2814 - 2830
      Abstract: Due to its importance in practical applications, unsupervised domain adaptation (UDA) person re-identification (re-ID) has attracted increasing attention. However, most existing methods lack multi-view information reasoning and ignore the domain discrepancy of pedestrian images with the same identity, which constrains further improvement of recognition performance. This paper therefore proposes a triple adversarial learning and multi-view imaginative reasoning network (TAL-MIRN) for UDA person re-ID, which consists of a multi-view imaginative reasoning module (IRM) and a triple adversarial learning module (TALM). IRM makes the identity features extracted by a feature encoder from a single-view image consistent with the classification results of the aggregated multi-view pedestrian identity features, so that the feature encoder acquires a strong multi-view imaginative reasoning ability. TALM is composed of adversarial learning between the camera classifier and the feature encoder, adversarial learning for joint distribution alignment, and adversarial learning on the discrepancy between the two classifiers used in classification. In particular, camera-level domain-invariant features are guaranteed by the adversarial learning between the feature extractor and the camera classifier. The joint alignment of identity and domain is achieved by the competition between the feature extractor and a classifier that integrates identity and domain. The discriminability and robustness of the learned features are enhanced by playing a minimax game between two different identity classifiers. Furthermore, a simple normalization operation named cross normalization (CN) is proposed to increase both the modeling and the generalization capability of the proposed TAL-MIRN across multiple domains. The proposed TAL-MIRN is applied to five benchmark datasets, and the comparative experimental results confirm its superiority over state-of-the-art methods. The related source code is available at https://github.com/lhf12278/TALM-IRM.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • A New Dataset, Poisson GAN and AquaNet for Underwater Object Grabbing

      Authors: Chongwei Liu;Zhihui Wang;Shijie Wang;Tao Tang;Yulong Tao;Caifei Yang;Haojie Li;Xing Liu;Xin Fan;
      Pages: 2831 - 2844
      Abstract: To boost the object grabbing capability of underwater robots for open-sea farming, we propose a new dataset (UDD) consisting of three categories (seacucumber, seaurchin, and scallop) with 2,227 images. To the best of our knowledge, it is the first 4K HD dataset collected in a real open-sea farm. We also propose a novel Poisson-blending Generative Adversarial Network (Poisson GAN) and an efficient object detection network (AquaNet) to address two common issues in related datasets: the class-imbalance problem and the problem of masses of small objects, respectively. Specifically, Poisson GAN combines Poisson blending into its generator and employs a new loss called Dual Restriction loss (DR loss), which supervises both implicit space features and image-level features during training to generate more realistic images. By utilizing Poisson GAN, objects of minority classes like seacucumber or scallop can be added into an image naturally and annotated automatically, which increases the loss contribution of minority classes when training detectors and thus mitigates the class-imbalance problem. AquaNet is a high-efficiency detector that addresses the problem of detecting masses of small objects in turbid underwater images. Within it, we design two efficient components: a depth-wise-convolution-based Multi-scale Contextual Features Fusion (MFF) block and a Multi-scale Blursampling (MBP) module, which reduce the parameters of the network to 1.3 million. Both components provide multi-scale features of small objects under a short backbone configuration without any loss of accuracy. In addition, we construct a large-scale augmented dataset (AUDD) and a pre-training dataset via Poisson GAN from UDD. Extensive experiments show the effectiveness of the proposed Poisson GAN, AquaNet, UDD, AUDD, and pre-training dataset. (A toy Poisson-blending example follows this entry.)
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
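
      Poisson GAN builds Poisson blending into its generator; the classic blending step itself can be illustrated with OpenCV's seamlessClone. The file names and insertion point below are hypothetical:

          import cv2
          import numpy as np

          scene = cv2.imread("underwater_frame.jpg")       # host image (hypothetical file)
          patch = cv2.imread("scallop_crop.jpg")           # minority-class object crop

          mask = 255 * np.ones(patch.shape[:2], np.uint8)  # blend the whole crop
          center = (scene.shape[1] // 2, scene.shape[0] // 2)

          # Gradient-domain (Poisson) blending: the pasted object inherits the local
          # illumination of the scene, and its bounding box is known for free, so
          # the new sample can be annotated automatically.
          augmented = cv2.seamlessClone(patch, scene, mask, center, cv2.NORMAL_CLONE)
          cv2.imwrite("augmented_frame.jpg", augmented)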
       
  • TCDesc: Learning Topology Consistent Descriptors for Image Matching

      Authors: Honghu Pan;Yongyong Chen;Zhenyu He;Fanyang Meng;Nana Fan;
      Pages: 2845 - 2855
      Abstract: The triplet loss is widely used in learning local descriptors for image matching. However, existing triplet loss-based methods, like HardNet and DSM, employ a point-to-point distance metric, which neglects the neighborhood information of descriptors. Since the local neighborhood structures of matching descriptors should be similar under ideal conditions, this paper aims to learn neighborhood topology-consistent descriptors (TCDesc). To this end, we first propose the linear combination weights as topology weights to depict the neighborhood topology of each descriptor, where the difference between the center descriptor and the linear combination of its neighbors is minimized. For global comparison, we then define a global topology vector from the local topology weights. Next, beyond the Euclidean distance, we define a topology distance between the topology vectors to indicate the topological difference between matching descriptors. Furthermore, we propose an adaptive weighting strategy to jointly minimize the topology distance and the Euclidean distance in the triplet loss. Experimental results on four widely-used datasets, i.e., UBC PhotoTourism, HPatches, W1BS and Oxford, demonstrate that our method can effectively improve the performance of both HardNet and DSM. (A least-squares sketch of the topology weights follows this entry.)
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
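
      A hedged sketch of the topology weights described above: for a center descriptor, solve a least-squares problem for the linear-combination weights of its nearest neighbors. The paper may impose constraints omitted here, and all names are illustrative:

          import numpy as np

          def topology_weights(center, neighbors):
              """Solve min_w || center - neighbors^T w ||_2 over K neighbor descriptors."""
              # neighbors: (K, D) matrix, center: (D,) vector
              w, *_ = np.linalg.lstsq(neighbors.T, center, rcond=None)
              return w

          rng = np.random.default_rng(0)
          desc = rng.normal(size=128)              # a 128-D local descriptor
          knn = rng.normal(size=(8, 128))          # its 8 nearest neighbors
          w = topology_weights(desc, knn)          # (8,) topology signature

          # Matching descriptors should yield similar weight vectors, so a
          # "topology distance" can be taken between them, e.g.:
          # topo_dist = np.linalg.norm(w_anchor - w_positive)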
       
  • DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation

      Authors: Shuaicheng Liu;Haipeng Li;Zhengning Wang;Jue Wang;Shuyuan Zhu;Bing Zeng;
      Pages: 2856 - 2867
      Abstract: Images captured on mobile devices can be aligned using their gyroscope sensors. An optical image stabilizer (OIS) removes this possibility by adjusting the images during capture. In this work, we propose a deep network that compensates for the motions caused by the OIS, such that the gyroscopes can be used for image alignment on OIS cameras. To achieve this, we first record both videos and gyroscope readings with an OIS camera as training data and convert the gyroscope readings into motion fields. Second, we propose an Essential Mixtures motion model for rolling shutter cameras, where an array of rotations within a frame is extracted as the ground-truth guidance. Third, we train a convolutional neural network with gyroscope motions as input to compensate for the OIS motion. Once trained, the compensation network can be applied to other scenes, where the image alignment is based purely on gyroscopes with no need for image contents, delivering strong robustness. Experiments show that our results are comparable with those of non-OIS cameras and outperform image-based alignment results by a relatively large margin. Code and dataset are available at: https://github.com/lhaippp/DeepOIS.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Efficient PVO-Based Reversible Data Hiding by Selecting Blocks With
           Full-Enclosing Context

      Authors: Shijun Xiang;Guanqi Ruan;
      Pages: 2868 - 2880
      Abstract: In reversible data hiding (RDH) schemes, how to select smooth pixels, pixel pairs or pixel blocks in order to improve performance is an important issue. For pixel-value-ordering (PVO) based RDH schemes, two existing techniques appear to be deficient, since only two reference pixels in a block or the right and bottom neighbors of a block are exploited as the context for block selection, and their performance may be adequate only for smooth images; for rough images, the embedding performance degrades. In this paper, an efficient block selection method is proposed that computes a block’s smoothness from a full-enclosing context (FEC). The obtained results show that the proposed FEC strategy better estimates a block’s smoothness for PVO-based schemes. Furthermore, a more scalable pairing scheme is presented for the recently reported location-based PVO predictor. The proposed PVO scheme can be implemented by dividing the cover image and embedding bits into two different types of blocks, respectively. Experimental results show that the images marked by the proposed two-stage PVO scheme have higher visual quality, e.g., the average PSNR on the Kodak image database is 63.31 dB after embedding 10,000 bits, a gain of 0.16 dB over the best result in the literature. Compared with state-of-the-art RDH works, the superiority of the proposed algorithm has been verified in extensive experiments. (A compact sketch of classic PVO embedding follows this entry.)
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
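
      For context, a compact sketch of classic PVO embedding in a single block; the paper's contribution, full-enclosing-context block selection, sits on top of a predictor of this kind and is not reproduced here:

          import numpy as np

          def pvo_embed_max(block, bit):
              """Embed one bit into the largest pixel of a block (classic PVO)."""
              flat = block.ravel().astype(int)
              order = np.argsort(flat, kind="stable")   # ascending pixel order
              i_max, i_2nd = order[-1], order[-2]
              err = flat[i_max] - flat[i_2nd]           # prediction error of the max
              if err == 1:                              # expandable: carries the bit
                  flat[i_max] += bit
              elif err > 1:                             # shifted, to keep reversibility
                  flat[i_max] += 1
              return flat.reshape(block.shape)

          block = np.array([[52, 54], [55, 56]])
          marked = pvo_embed_max(block, bit=1)          # max 56 -> 57 (err == 1)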
       
  • MRS-Net+ for Enhancing Face Quality of Compressed Videos

      Authors: Tie Liu;Mai Xu;Shengxi Li;Rui Ding;Huaida Liu;
      Pages: 2881 - 2894
      Abstract: During the past few years, face videos, e.g., video conferences, interviews and variety shows, have grown explosively with millions of users over social media networks. Unfortunately, the compression algorithms applied to these videos to reduce bandwidth also bring annoying artifacts to face regions. This paper addresses the problem of face quality enhancement in compressed videos by reducing the artifacts in face regions. Specifically, we establish a compressed face video (CFV) database, which includes 196,337 faces in 214 high-quality video sequences and their corresponding 1,712 compressed sequences. We find that the faces of compressed videos exhibit tremendous scale variation and quality fluctuation. Motivated by scalable video coding, we propose a multi-scale recurrent scalable network (MRS-Net+) to enhance the quality of multi-scale faces in compressed videos. MRS-Net+ comprises one base and two refined enhancement levels, corresponding to the quality enhancement of small-, medium- and large-scale faces, respectively. In the multi-level architecture of our MRS-Net+, small-/medium-scale face quality enhancement serves as the basis for facilitating the quality enhancement of medium-/large-scale faces. We further develop a landmark-assisted pyramid alignment (LPA) subnet to align faces across consecutive frames, and then apply a mask-guided quality enhancement (QE) subnet for enhancing multi-scale faces. Finally, experimental results show that our MRS-Net+ method achieves an average peak signal-to-noise ratio (PSNR) improvement of 1.196 dB and a Bjøntegaard distortion-rate (BD-rate) saving of 23.54%, significantly outperforming other state-of-the-art methods.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Assessing Individual VR Sickness Through Deep Feature Fusion of VR Video
           and Physiological Response

      Authors: Sangmin Lee;Seongyeop Kim;Hak Gu Kim;Yong Man Ro;
      Pages: 2895 - 2907
      Abstract: VR sickness assessment for VR videos is in high demand in industry and research to address VR viewing safety issues. In particular, it is difficult to evaluate the VR sickness of individual viewers due to individual differences. To achieve this challenging goal, we focus on deep feature fusion of sickness-related information. In this paper, we propose a novel deep learning-based assessment framework that estimates the VR sickness of individual viewers from VR videos and the corresponding physiological responses. We design a content stimulus guider that imitates the phenomenon by which humans feel VR sickness. The content stimulus guider extracts a deep stimulus feature from a VR video to reflect VR sickness caused by VR videos. In addition, we devise a physiological response guider to encode the physiological responses acquired while humans experience VR videos. Each physiology sickness feature extractor (EEG, ECG, and GSR) in the physiological response guider is designed to suit the corresponding physiological characteristics. The extracted physiology sickness features are then fused into a deep physiology feature that comprehensively reflects individual deviations of VR sickness. Finally, the VR sickness predictor assesses individual VR sickness effectively by fusing the deep stimulus feature and the deep physiology feature. To validate the proposed method extensively, we built two benchmark datasets that contain 360-degree VR videos with physiological responses (EEG, ECG, and GSR) and SSQ scores. Experimental results show that the proposed method achieves meaningful correlations with human SSQ scores. Further, we validate the effectiveness of the proposed network designs by conducting analysis on feature fusion and visualization.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Target-Aware State Estimation for Visual Tracking

      Authors: Zikun Zhou;Xin Li;Nana Fan;Hongpeng Wang;Zhenyu He;
      Pages: 2908 - 2920
      Abstract: Trackers based on the IoU prediction network (IoU-Net), which refines a coarse bounding box into an accurate one by maximizing the IoU between the target and the coarse box, have shown superior performance. However, the traditional IoU-Net is less effective in exploiting the limited but crucial supervision information contained in the initial frame, including the discriminative information between the target and backgrounds and the structure information of the initial target. Missing such information makes the IoU-Net less robust to background distractors and diverse variations of the target appearance. To address this issue, we propose a target-aware state estimation network for visual tracking. A gradient-guided feature adjustment module is built on an online discriminative model to generate target-aware features for constructing the state estimation network; it conveys the online learned discriminative information into the offline trained state estimation network. In addition, we propose a structure-aware integration module and embed it into the state estimation network, enabling the tracker to explicitly model the structure information of the initial target. Extensive experimental results on the VOT2018, OTB2015, UAV123, NFS30, TC128, TrackingNet, LaSOT, and VOT2018-LT datasets demonstrate that the proposed approach performs favorably against state-of-the-art trackers.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Adversarial Camera Alignment Network for Unsupervised Cross-Camera Person
           Re-Identification

      Authors: Lei Qi;Lei Wang;Jing Huo;Yinghuan Shi;Xin Geng;Yang Gao;
      Pages: 2921 - 2936
      Abstract: In person re-identification (Re-ID), supervised methods usually need a large amount of expensive label information, while unsupervised ones are still unable to deliver satisfactory identification performance. In this paper, we introduce a novel person Re-ID task called unsupervised cross-camera person Re-ID, which only needs the within-camera (intra-camera) label information but not the cross-camera (inter-camera) labels, which are more expensive to obtain. In real-world applications, the intra-camera label information can be easily captured by tracking algorithms and a few manual annotations. In this situation, the main challenge becomes the distribution discrepancy across different camera views, caused by variations in body pose, occlusion, image resolution, illumination conditions, and background noise across cameras. To address this situation, we propose a novel Adversarial Camera Alignment Network (ACAN) for unsupervised cross-camera person Re-ID. It consists of a camera-alignment task and a supervised within-camera learning task. To achieve the camera alignment, we develop Multi-Camera Adversarial Learning (MCAL) to map images of different cameras into a shared subspace. In particular, we investigate two different schemes, the existing GRL (i.e., gradient reversal layer) scheme and a proposed scheme called “other camera equiprobability” (OCE), to conduct the multi-camera adversarial task. Based on this shared subspace, we then leverage the within-camera labels to train the network. Extensive experiments on five large-scale datasets demonstrate the superiority of ACAN over multiple state-of-the-art unsupervised methods that take advantage of labeled source domains and images generated by GAN-based models. In particular, we verify that the proposed multi-camera adversarial task contributes significantly to the improvement. (A minimal gradient reversal layer sketch follows this entry.)
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
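
      The GRL scheme mentioned above is the standard gradient reversal layer: identity in the forward pass, negated and scaled gradient in the backward pass, which lets the feature encoder play an adversarial game with the camera classifier. A minimal PyTorch version (the lambda scaling and the usage line are illustrative):

          import torch

          class GradReverse(torch.autograd.Function):
              @staticmethod
              def forward(ctx, x, lam):
                  ctx.lam = lam
                  return x.view_as(x)          # identity in the forward pass

              @staticmethod
              def backward(ctx, grad_output):
                  # Reverse (and scale) the gradient flowing into the encoder.
                  return -ctx.lam * grad_output, None

          def grad_reverse(x, lam=1.0):
              return GradReverse.apply(x, lam)

          # Usage sketch: camera_logits = camera_classifier(grad_reverse(features))
          # Minimizing the camera loss then *maximizes* camera confusion upstream.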
       
  • Gaussian Dynamic Convolution for Efficient Single-Image Segmentation

      Authors: Xin Sun;Changrui Chen;Xiaorui Wang;Junyu Dong;Huiyu Zhou;Sheng Chen;
      Pages: 2937 - 2948
      Abstract: Interactive single-image segmentation is ubiquitous in scientific and commercial imaging software. Lightweight neural networks are a practical and effective way to accomplish the single-image segmentation task. This work focuses on the single-image segmentation problem given only some seeds, such as scribbles. Inspired by the dynamic receptive field of the human visual system, we propose the Gaussian dynamic convolution (GDC) to quickly and efficiently aggregate contextual information for neural networks. The core idea is to randomly select the spatial sampling area according to Gaussian-distributed offsets. Our GDC can easily be used as a module to build lightweight or complex segmentation networks. We adopt the proposed GDC to address typical single-image segmentation tasks. Furthermore, we also build a Gaussian dynamic pyramid pooling module to show its potential and generality in common semantic segmentation. Experiments demonstrate that the GDC outperforms other existing convolutions on three benchmark segmentation datasets: Pascal-Context, Pascal-VOC 2012, and Cityscapes. Additional experiments illustrate that the GDC produces richer and more vivid features than other convolutions. In general, our GDC helps convolutional neural networks form an overall impression of the image. (A sampling-based sketch of the core idea follows this entry.)
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
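
      A sketch of the core idea stated above, sampling the spatial support at Gaussian-distributed offsets and aggregating; sigma, the sample count, and the bilinear-sampling realization are our assumptions, not the paper's exact operator:

          import torch
          import torch.nn.functional as F

          def gaussian_dynamic_pool(feat, sigma=0.1, num_samples=8):
              """Aggregate context at Gaussian-offset locations via bilinear sampling."""
              n, c, h, w = feat.shape
              # Base identity grid in [-1, 1] coordinates, shape (N, H, W, 2).
              ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                                      torch.linspace(-1, 1, w), indexing="ij")
              base = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
              out = 0
              for _ in range(num_samples):
                  offsets = torch.randn(n, h, w, 2) * sigma     # Gaussian offsets
                  sampled = F.grid_sample(feat, base + offsets,
                                          align_corners=True, padding_mode="border")
                  out = out + sampled
              return out / num_samples                          # averaged context

          ctx = gaussian_dynamic_pool(torch.rand(1, 64, 32, 32))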
       
  • CGFNet: Cross-Guided Fusion Network for RGB-T Salient Object Detection

      Authors: Jie Wang;Kechen Song;Yanqi Bao;Liming Huang;Yunhui Yan;
      Pages: 2949 - 2961
      Abstract: RGB salient object detection (SOD) has made great progress. However, the performance of single-modal salient object detection decreases significantly in challenging scenes, such as low light or darkness. To deal with these challenges, thermal infrared (T) images are introduced into salient object detection; the fused task is called RGB-T salient object detection. To achieve deep mining of the unique characteristics of each single modality and full integration of cross-modality information, a novel Cross-Guided Fusion Network (CGFNet) for RGB-T salient object detection is proposed. Specifically, a Cross-Scale Alternate Guiding Fusion (CSAGF) module is proposed to mine high-level semantic information and provide global context support. Subsequently, we design a Guidance Fusion Module (GFM) to achieve sufficient cross-modality fusion by using one modality as the main guidance and the other as auxiliary. Finally, the Cross-Guided Fusion Module (CGFM) is presented and serves as the main decoding block. Each decoding block consists of two parts, cross-shared Cross-Level Enhancement (CLE) and Global Auxiliary Enhancement (GAE), with each modality in turn providing the main guidance; the main difference between the two parts is that the GFM uses a different modality as the main guide. Comprehensive experimental results show that our method achieves better performance than state-of-the-art salient detection methods. The source code has been released at: https://github.com/wangjie0825/CGFNet.git.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Capsule Boundary Network With 3D Convolutional Dynamic Routing for
           Temporal Action Detection

      Authors: Yaosen Chen;Bing Guo;Yan Shen;Wei Wang;Weichen Lu;Xinhua Suo;
      Pages: 2962 - 2975
      Abstract: Temporal action detection is a challenging task in video understanding, since complex backgrounds and rich action content hinder the generation of high-quality temporal proposals in untrimmed videos. Capsule networks can avoid some limitations of convolutional neural networks, such as the invariance caused by pooling, and can better understand temporal relations for temporal action detection. However, because of their extremely expensive computation, capsule networks are difficult to apply to temporal action detection. To address this issue, this paper proposes a novel U-shaped capsule network framework with a k-Nearest Neighbor (k-NN) mechanism for 3D convolutional dynamic routing, which we name U-BlockConvCaps. Furthermore, we build a Capsule Boundary Network (CapsBoundNet) based on U-BlockConvCaps for dense temporal action proposal generation. Specifically, the first module is a 1D convolutional layer that fuses the two-stream RGB and optical flow video features. The sampling module further processes the fused features to generate 2D start-end action proposal feature maps. Then, the multi-scale U-Block convolutional capsule module with 3D convolutional dynamic routing processes the proposal feature maps. Finally, the feature maps generated by CapsBoundNet are used to predict starting, ending, action classification, and action regression score maps, which help to capture the boundary and intersection-over-union features. Our work innovatively improves the dynamic routing algorithm of capsule networks and, for the first time in the literature, extends the use of capsule networks to the temporal action detection task. Experimental results on the THUMOS14 benchmark show that the performance of CapsBoundNet clearly surpasses state-of-the-art methods, e.g., the mAP@tIoU = 0.3, 0.4, 0.5 scores on THUMOS14 are improved from 63.6% to 70.0%, 57.8% to 63.1%, and 51.3% to 52.9%, respectively. We also obtain competitive results on the ActivityNet1.3 action detection dataset.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Object Tracking via Spatial-Temporal Memory Network

      Authors: Zikun Zhou;Xin Li;Tianzhu Zhang;Hongpeng Wang;Zhenyu He;
      Pages: 2976 - 2989
      Abstract: Temporal and spatial contexts, characterizing target appearance variations and target-background differences, respectively, are crucial for improving the online adaptive ability and instance-level discriminative ability of object tracking. However, most existing trackers focus on either the temporal context or the spatial context during tracking and have not exploited these contexts simultaneously and effectively. In this paper, we propose a Spatial-TEmporal Memory (STEM) network to exploit these contexts jointly for object tracking. Specifically, we develop a key-value structured memory model equipped with a key-value index-based memory reading mechanism to model the spatial and temporal contexts simultaneously. To update the memory with new target states and ensure the diversity of the memory, we introduce a similarity-aware memory update scheme. In addition, we construct an entropy-guided ensemble strategy to fuse the prediction models based on these two contexts, such that these two contexts can be exploited to estimate the target state jointly. Extensive experimental results on eight challenging datasets, including OTB2015, TC128, UAV123, VOT2018, LaSOT, TrackingNet, GOT-10k, and OxUvA, demonstrate that the proposed method performs favorably against state-of-the-art trackers. (A minimal key-value memory read sketch follows this entry.)
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
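
      A minimal sketch of a key-value structured memory read of the kind described above: queries attend over stored keys and retrieve the associated values. The dimensions and the scaled-dot-product form are illustrative assumptions, not the authors' exact design:

          import torch
          import torch.nn.functional as F

          def memory_read(query, keys, values):
              """query: (N, Ck), keys: (M, Ck), values: (M, Cv) -> read-out (N, Cv)."""
              attn = F.softmax(query @ keys.t() / keys.shape[1] ** 0.5, dim=-1)
              return attn @ values                 # similarity-weighted retrieval

          q = torch.rand(100, 64)                  # per-location query features
          k = torch.rand(20, 64)                   # memory keys (stored contexts)
          v = torch.rand(20, 128)                  # memory values (target states)
          readout = memory_read(q, k, v)           # (100, 128)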
       
  • RPNet: Gait Recognition With Relationships Between Each Body-Parts

      Authors: Hao Qin;Zhenxue Chen;Qingqiang Guo;Q. M. Jonathan Wu;Mengxu Lu;
      Pages: 2990 - 3000
      Abstract: Many studies have shown that partitioning the gait sequence and its feature map can improve the accuracy of gait recognition. However, most models just cut the feature map at a single fixed scale, which loses the dependences between the various parts. Our paper therefore proposes a structure called the Part Feature Relationship Extractor (PFRE) to discover all of the relationships between body parts for gait recognition. The paper uses PFRE and a Convolutional Neural Network (CNN) to form RPNet. PFRE is divided into two parts. One part, which we call the Total-Partial Feature Extractor (TPFE), is used to extract the features of blocks at different scales, and the other part, called the Adjacent Feature Relation Extractor (AFRE), is used to find the relationships between blocks. At the same time, the paper adjusts the number of input frames during training in quantitative experiments and finds a consistent relationship between the number of input frames and the performance of the model. Our model is tested on three public gait datasets, CASIA-B, OU-LP and OU-MVLP. It exhibits a significant level of robustness to occlusion, and achieves accuracies of 92.82% and 80.26% on CASIA-B under the BG# and CL# conditions, respectively. The results show that our method reaches the top level among state-of-the-art methods.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Fingertips Detection With Nearest-Neighbor Pose Particles From a Single
           RGB Image

      Authors: Purnendu Mishra;Kishor Prabhakar Sarawadekar;
      Pages: 3001 - 3011
      Abstract: Vision-based detection of fingertips is useful for freehand Human-Computer Interaction (HCI), especially in virtual, augmented, and mixed reality, where a seamless experience is required. Estimating fingertip positions in an RGB image involves overcoming various challenges such as occlusion and appearance ambiguities. The general approach relies on a two-stage pipeline involving hand localization and detection of fingertips for a single hand. This paper presents an effective single-stage Convolutional Neural Network (CNN) for detecting the fingertips of both hands. We use a set of reference points, referred to as pose particles, and train a CNN model end-to-end to find the N nearest particles in the proximity of each fingertip. Moreover, the same CNN model computes the components of a position vector with reference to each of these N nearest neighbors. Finally, a fingertip position is estimated by computing the centroid of all the points given by these position vectors. With the proposed approach, it is possible to estimate fingertip positions for one or both hands, and no prior hand localization is required. We demonstrate the feasibility and effectiveness of the proposed methodology through experiments on three different datasets. (A toy centroid-of-votes example follows this entry.)
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
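
      The final fusion step is easy to illustrate: each of the N nearest pose particles casts a vote (its position plus its predicted position vector), and the fingertip estimate is the centroid of the votes. Toy numbers, not the paper's data:

          import numpy as np

          particles = np.array([[100., 120.], [104., 118.], [ 98., 125.]])   # (N, 2)
          pos_vectors = np.array([[ 3.,  -2.], [ -1.,   0.], [  5.,  -7.]])  # (N, 2)

          votes = particles + pos_vectors    # each particle's fingertip guess
          fingertip = votes.mean(axis=0)     # centroid of all votes
          print(fingertip)                   # -> [103. 118.]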
       
  • DEF-Net: A Face Aging Model by Using Different Emotional Learnings

      Authors: Mingxing Duan;Kenli Li;Qing Liao;Qi Tian;
      Pages: 3012 - 3022
      Abstract: Face aging has attracted widespread attention in recent years, but most studies assume the same emotional state. Is the same person’s aging the same in different emotional states? To resolve this question, this paper proposes a novel face aging model, DEF-Net, which consists of two parts: different emotional learnings (Emotion-Net) and face aging (Age-Net). Given a target emotion category, DEF-Net first uses Emotion-Net to make images from the original dataset learn the emotion features, and the generated dataset is used as the input of Age-Net. Multiple loss functions are used to ensure that the crucial information of the original image is not lost. Second, Age-Net, pre-trained on the original dataset, adopts the generated dataset to learn the aging distribution under different emotions. Dedicated loss functions ensure that the realistic target images generated by Age-Net do not lose the learned emotional characteristics. Finally, extensive experiments verify the performance of DEF-Net. Compared with other state-of-the-art methods: (1) DEF-Net can learn different facial emotions across different datasets and generate corresponding realistic aging images; (2) the results achieved by our DEF-Net are better than those of a model that performs face aging first and then learns different emotional characteristics.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Multi-Object Tracking With Spatial-Temporal Topology-Based Detector

      Authors: Sisi You;Hantao Yao;Changsheng Xu;
      Pages: 3023 - 3035
      Abstract: Multi-object tracking is a challenging task due to the occlusion of different targets. Existing methods focus on inferring robust and discriminative features for data association based on the targets generated by an existing detector. Unlike existing methods that consider each target independently when generating trajectories, we propose a novel Spatial-Temporal Topology-based Detector (STTD) algorithm that treats a target and its nearest neighbors as a cluster and introduces a topology structure to describe the dynamics of moving targets belonging to the same cluster. Given the public detections and the objects tracked in the previous frame, STTD first refines them via detector regression to obtain candidate proposals in the current frame. After that, a temporal topology constraint is proposed to recover missed objects by considering the continuity and consistency of the topological structure. Based on the assumption that targets belonging to the same topology should have consistent characteristics, a spatial topology constraint is proposed to remove inaccurate targets. We can then obtain new candidate objects and construct the cost matrix used for data association. Evaluations on three MOTChallenge benchmarks verify the effectiveness of the proposed method.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Graph-Based Object Semantic Refinement for Visual Emotion Recognition

      Authors: Jing Zhang;Xinyu Liu;Zhe Wang;Hai Yang;
      Pages: 3036 - 3049
      Abstract: The rich semantic information contained in images is an important clue for exploring visual emotions. Therefore, exploring the correlation between visual emotion and the semantic relationships of objects, and extracting more effective semantic features through explicit or implicit modeling, is very important for visual emotion analysis. In this paper, a novel Graph-based Object Semantic Refinement (GOSR) model is proposed to extract multi-level semantic features for visual emotion classification, in which graph structures are used to represent the object semantics and position relationships of an image, and Graph Convolutional Networks (GCNs) are used to refine object information by aggregating neighboring objects according to their position relationships. The features from different GCN convolutional layers are further fused by Gated Recurrent Unit (GRU) networks to obtain high-level semantic features. A two-branch framework that leverages visual and semantic information for visual sentiment analysis is then proposed, which uses convolutional neural networks to extract visual features from images and combines them with semantic features from the GOSR model to achieve better emotion recognition results. Besides, to alleviate potentially unreasonable predictions and promote collaboration between the models, a novel tendency loss function based on the correlations among emotion labels is proposed to adjust the output activation values other than that of the target label. Extensive experiments on four widely used benchmark datasets show that our proposed method achieves competitive performance and outperforms most state-of-the-art methods for visual emotion recognition.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Multi-Stream Interaction Networks for Human Action Recognition

      Authors: Haoran Wang;Baosheng Yu;Jiaqi Li;Linlin Zhang;Dongyue Chen;
      Pages: 3050 - 3060
      Abstract: Skeleton-based human action recognition has received extensive attention due to its efficiency and robustness to complex backgrounds. Though the human skeleton can accurately capture the dynamics of human poses, it fails to capture human actions induced by interactions between humans and objects, making it of great importance to further explore human-object interaction for action recognition. In this paper, we devise multi-stream interaction networks (MSIN) to simultaneously explore the dynamics of the human skeleton, of objects, and of the interaction between humans and objects. Specifically, apart from the traditional human skeleton stream, 1) the second stream explores the dynamics of object appearance from the objects surrounding the human body joints; and 2) the third stream captures the dynamics of object position in terms of the distance between the object and different human body joints. Experimental results on three popular skeleton-based human action recognition datasets, NTU RGB+D, NTU RGB+D 120, and SYSU, demonstrate the effectiveness of the proposed method, especially for recognizing human actions that involve human-object interactions.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • A Cross View Learning Approach for Skeleton-Based Action Recognition

      Authors: Hui Zheng;Xinming Zhang;
      Pages: 3061 - 3072
      Abstract: With the prevalence of accessible multi-modal sensors and the maturity of pose estimation algorithms, skeleton-based action recognition has gradually become the mainstream of human action recognition (HAR). The key issue is to mine the correlations and dependencies between different joints and bones. In this paper, we propose a cross-view learning approach. First, the static and dynamic representations of skeletons, from two different views (joints and bones), are calculated and aggregated respectively. Then, the integrated representations of these two views are used as parallel inputs to the cross-view learning model, which mainly includes two blocks: a multi-scale learning block and a multi-view fusion block. The former is used to mine discriminative and comprehensive intra-view features, and the latter is utilized to capture complementary inter-view representations. Finally, the fused representations are input to the classifier for action recognition. Experiments show that our proposed approach outperforms several state-of-the-art baseline methods and achieves very competitive performance.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Dense Semantics-Assisted Networks for Video Action Recognition

      Authors: Haonan Luo;Guosheng Lin;Yazhou Yao;Zhenmin Tang;Qingyao Wu;Xiansheng Hua;
      Pages: 3073 - 3084
      Abstract: Most existing action recognition approaches directly leverage video-level features to recognize human actions in videos. Although these methods have made remarkable progress, their accuracy is still unsatisfactory: when the test video involves complex backgrounds and activities, existing methods usually suffer from a significant drop in accuracy. Human action is inherently a high-level concept. Merely applying a video classification model without a detailed semantic understanding of the video content, e.g., objects, scene context, object motions, and object interactions, is inadequate to tackle the challenges of action recognition. Fine-level semantic understanding of videos generates elementary semantic concepts from the raw video data, such as the semantics of objects and background regions, and can be employed to bridge the gap between the raw video data and the high-level concept of human actions. In this work, we leverage dense semantic segmentation masks, which encode rich semantic details, provide extra information for network training, and improve the performance of action recognition. We propose a novel deep architecture named Dense Semantics-Assisted Convolutional Neural Networks (DSA-CNNs) to effectively utilize the dense semantic information of video, via bottom-up attention in the spatial stream and via branch fusion in the temporal stream. To verify the effectiveness of our approach, we conduct extensive experiments on publicly available datasets: UCF101, HMDB51, and Kinetics. The experimental results demonstrate that our approach substantially improves on existing methods and achieves very competitive performance. It also shows that our approach is superior to other related methods that utilize extra information for action recognition.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Bayesian Correlation Filter Learning With Gaussian Scale Mixture Model for
           Visual Tracking

      Authors: Yuan Cao;Guangming Shi;Tianzhu Zhang;Weisheng Dong;Jinjian Wu;Xuemei Xie;Xin Li;
      Pages: 3085 - 3098
      Abstract: Correlation filters (CF), a popular tool for visual tracking, suffer from unwanted boundary effects due to the periodic assumption needed for the FFT implementation. To address this issue, spatially regularized discriminative correlation filters (SRDCF) introduce a weighting matrix into the regularization term. However, existing designs of the spatial weighting matrix are often heuristic and non-adaptive. Inspired by recent advances in joint discrimination and reliability learning for correlation tracking, we propose a principled Bayesian correlation filter learning method using a Gaussian scale mixture (GSM) model. The key idea is to decompose each CF coefficient into the product of a positive scalar multiplier and a Gaussian random variable. Treating the positive multipliers as weighting coefficients, GSM-based modeling of CFs leads to a spatially adaptive regularization strategy with improved capability of handling various appearance-related uncertainty factors (e.g., scale variation, out-of-plane rotation, and motion blur). Moreover, by imposing a sparse prior over the multipliers, we can jointly learn the multipliers and CFs under a unified Bayesian estimation framework. The structured GSM model allows us to better exploit the spatial correlations among CFs and further improve tracking performance. Experimental results on OTB-2013, OTB-2015, Temple Color-128, VOT-2016, and VOT-2017 show that our tracking method performs favorably compared with current state-of-the-art methods. (A generic sketch of the GSM decomposition follows this entry.)
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
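
      A generic sketch of the Gaussian scale mixture decomposition described above, written in our own notation (the paper's exact priors and symbols may differ): each CF coefficient $h_i$ is the product of a positive multiplier and a standard Gaussian variable,

          h_i = \theta_i z_i, \qquad \theta_i > 0, \quad z_i \sim \mathcal{N}(0, 1),

      so that conditionally $p(h_i \mid \theta_i) = \mathcal{N}(h_i \mid 0, \theta_i^2)$ and marginally

          p(h) = \prod_i \int \mathcal{N}(h_i \mid 0, \theta_i^2)\, p(\theta_i)\, d\theta_i .

      Treating the multipliers $\theta_i$ as spatial weights yields an SRDCF-style regularizer whose weights are learned jointly with the filter rather than fixed by hand.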
       
  • Adaptive Disparity Candidates Prediction Network for Efficient Real-Time
           Stereo Matching

      Authors: He Dai;Xuchong Zhang;Yongli Zhao;Hongbin Sun;Nanning Zheng;
      Pages: 3099 - 3110
      Abstract: Efficient real-time disparity estimation is critical for the application of stereo vision systems in various areas. Recently, stereo networks based on the coarse-to-fine method have largely relieved the memory constraints and speed limitations of large-scale network models. Nevertheless, all previous coarse-to-fine designs employ constant offsets and three or more stages to progressively refine the coarse disparity map, still resulting in unsatisfactory accuracy and inference time when deployed on mobile devices. This paper argues that coarse matching errors can be corrected efficiently with fewer stages as long as more accurate disparity candidates are provided. Therefore, we propose a dynamic offset prediction module to meet the different correction requirements of diverse objects and design an efficient two-stage framework. In addition, a disparity-independent convolution is proposed to efficiently regularize the compact cost volume and further improve the overall performance. The disparity quality and efficiency of various stereo networks are evaluated on multiple datasets and platforms. Evaluation results demonstrate that the disparity error rates of the proposed network reach 2.66% and 2.71% on the KITTI 2012 and 2015 test sets respectively, while the computation speed is 2× faster than state-of-the-art lightweight models on both high-end and resource-constrained GPUs.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Efficient Context-Guided Stacked Refinement Network for RGB-T Salient
           Object Detection

      Authors: Fushuo Huo;Xuegui Zhu;Lei Zhang;Qifeng Liu;Yu Shu;
      Pages: 3111 - 3124
      Abstract: RGB-T salient object detection (SOD) aims to utilize the complementary cues of the RGB and thermal (T) modalities to detect and segment common objects. However, on one hand, existing methods simply fuse the features of the two modalities without fully considering the characteristics of RGB and T. On the other hand, the high computational cost of existing methods prevents them from real-world applications (e.g., autonomous driving, anomaly detection, person re-ID). To this end, we propose an efficient encoder-decoder network named the Context-guided Stacked Refinement Network (CSRNet). Specifically, we utilize a lightweight backbone and design efficient decoder parts, which greatly reduce the computational cost. To fuse the RGB and T modalities, we propose an efficient Context-guided Cross Modality Fusion (CCMF) module to filter noise and exploit the complementarity of the two modalities. Besides, the Stacked Refinement Network (SRN) progressively refines the features from top to bottom via the interaction of semantic and spatial information. Extensive experiments show that our method performs favorably against state-of-the-art algorithms on the RGB-T SOD task with a small model size (4.6M parameters), few FLOPs (4.2G), and real-time speed (38 fps). Our code is available at: https://github.com/huofushuo/CSRNet.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Double-Laplacian Mixture-Error Model-Based Supervised Group-Sparse Coding
           for Robust Palmprint Recognition

      Authors: Kunlei Jing;Xinman Zhang;Xuebin Xu;
      Pages: 3125 - 3140
      Abstract: Robustness enhancement and feature selection are the two crucial issues to be resolved in robust palmprint recognition. However, existing regression-based methods are insufficient for handling outliers and selecting significant features. From a statistical viewpoint, we present a general framework to intrinsically resolve the two issues. By investigating the role of outliers in the formation of coding errors, we devise a double-Laplacian mixture-error model to faithfully fit the error distribution. Additionally, we design a supervised group-sparse regularizer to enforce the locality and group sparsity of the codes. Integrating the two parts into the framework produces a nonconvex constrained problem, for which we develop an iteratively reweighted $l_1$-$l_1$ minimization algorithm by combining the majorization-minimization strategy with the alternating direction method of multipliers. The weight vector learned from the error model and the local group sparsity enforced by the regularizer enable our method to handle outliers and select significant features better than state-of-the-art methods. Extensive experimental results verify the flexibility and robustness of our method to various contaminations.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • FEXNet: Foreground Extraction Network for Human Action Recognition

      Authors: Zhongwei Shen;Xiao-Jun Wu;Tianyang Xu;
      Pages: 3141 - 3151
      Abstract: As most human actions in video sequences embody continuous interactions between foregrounds rather than the background scene, it is important to disentangle these foregrounds from the background for advanced action recognition systems. In this paper, we therefore propose a Foreground EXtraction (FEX) block to explicitly model foreground clues and achieve effective management of action subjects. The designed FEX block contains two components. The first is a Foreground Enhancement (FE) module, which highlights the potential feature channels related to action attributes, providing channel-level refinement for the subsequent spatiotemporal modeling. The second is a Scene Segregation (SS) module, which splits feature maps into foreground and background. Specifically, a temporal model with dynamic enhancement is constructed for the foreground part, reflecting the essential nature of the action category, while the background is modeled using simple spatial convolutions, mapping the inputs to a consistent feature space. FEX blocks can be inserted into existing 2D CNNs (denoted FEXNet) for spatiotemporal modeling, concentrating on foreground clues for effective action inference. Our experiments on Something-Something V1, V2 and Kinetics400 verify the effectiveness of the proposed method.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Text Region Conditional Generative Adversarial Network for Text
           Concealment in the Wild

      Authors: Prateek Keserwani;Partha Pratim Roy;
      Pages: 3152 - 3163
      Abstract: Textual information appearing in a captured image may contain personal information, and in various circumstances publishing such images in the public domain may create a risk of privacy leakage. To avoid these situations, we propose a text concealment method. To accomplish this task, we use a conditional generator: a text-region-conditioned concealment network. The text regions predicted by a detector network are used as the conditioning criterion for the concealment process. A text region predicted as a word-level bounding box contains stroke pixels as well as background pixels. To reduce the number of background pixels and properly condition the text concealment network, character-level annotation is used for the generator in place of word-level bounding box annotation, which helps the network focus more on strokes than on background pixels. A character-level symmetric line representation of text is proposed to obtain finer text region predictions than character-level bounding boxes. The proposed model is trainable end-to-end, and the text-region-conditioned generator is trained with the losses of global and local discriminators. The proposed method is validated on public scene text image datasets such as ICDAR 2015, COCO-Text and a synthesis dataset, and shows competitive results compared to other state-of-the-art approaches.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Median Stable Clustering and Global Distance Classification for
           Cross-Domain Person Re-Identification

      Authors: Zhiqi Pang;Jifeng Guo;Zhiqiang Ma;Wenbo Sun;Yanbang Xiao;
      Pages: 3164 - 3177
      Abstract: Person re-identification (ReID) methods in a single domain achieve appealing performance, but their reliance on label information greatly limits their extensibility. Therefore, unsupervised cross-domain ReID methods have received extensive attention. Their purpose is to optimize the model using a labelled source domain and an unlabelled target domain so that the model finally generalizes well in the target domain. We propose an unsupervised cross-domain ReID method based on median stable clustering (MSC) and global distance classification (GDC). Specifically, the measurement used by MSC comprehensively considers the similarity between clusters, the number of samples in a cluster, and the combined similarity within a cluster. Unlike methods based on the triplet loss, GDC can separate the distance distributions of positive and negative sample pairs at a global scope. In addition, considering that model performance is very sensitive to the probability parameters when source-domain memory is reconsolidated, we designed a dynamic memory reconsolidation (DMR) method to reduce the influence of these parameters on performance. Extensive experiments on large-scale datasets (Market-1501, DukeMTMC-reID and MSMT17) demonstrate the superior performance of MSC-GDC over state-of-the-art methods.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Learning Informative and Discriminative Features for Facial Expression
           Recognition in the Wild

      Authors: Yingjian Li;Yao Lu;Bingzhi Chen;Zheng Zhang;Jinxing Li;Guangming Lu;David Zhang;
      Pages: 3178 - 3189
      Abstract: The informativeness and discriminativeness of features jointly ensure high-accuracy Facial Expression Recognition (FER) in the wild. Most existing methods use a single-path deep convolutional neural network with softmax loss for basic FER, but they cannot deal with the challenging situations of compound FER in the wild because they fail to learn informative and discriminative features in a targeted manner. To this end, we present an Informative and Discriminative Feature Learning (IDFL) framework that consists of two key components, the Multi-Path Attention Convolutional Neural Network (MPACNN) and the Balanced Separate loss (BS loss), for high-accuracy recognition of both basic and compound expressions in the wild. Specifically, MPACNN leverages different paths to learn diverse features. These features are then adaptively fused into informative ones via an attention module, so that the model can adequately capture detailed information for both basic and compound FER. The BS loss maximizes the inter-class distance of features and minimizes the intra-class one, making the features discriminative enough for high-accuracy FER in the wild. In particular, the BS loss serves as the objective function of MPACNN, so the model learns informative and discriminative features at the same time, yielding better performance. Seven databases are used to evaluate the proposed method, and the results demonstrate that it achieves state-of-the-art performance on both basic and compound expressions with good generalization ability. Moreover, our model contains fewer parameters and can be trained faster than other related models. (A hedged sketch of a balanced inter/intra-class objective follows this entry.)
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
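
      A hedged sketch of a balanced inter/intra-class objective in the spirit of the BS loss described above (maximize inter-class distance, minimize intra-class distance); the exact formulation in the paper may differ, and the margin here is invented:

          import torch

          def separation_loss(features, labels, margin=1.0):
              """features: (B, D), labels: (B,) integer class ids."""
              classes = labels.unique()
              centers = torch.stack([features[labels == c].mean(0) for c in classes])
              # Intra-class: pull samples toward their class center.
              intra = torch.stack([
                  (features[labels == c] - centers[i]).pow(2).sum(1).mean()
                  for i, c in enumerate(classes)]).mean()
              # Inter-class: push class centers at least `margin` apart.
              d = torch.cdist(centers, centers)
              off_diag = d[~torch.eye(len(classes), dtype=torch.bool)]
              inter = torch.relu(margin - off_diag).mean()
              return intra + inter

          loss = separation_loss(torch.rand(16, 32), torch.randint(0, 4, (16,)))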
       
  • Self-Supervised Exclusive-Inclusive Interactive Learning for Multi-Label
           Facial Expression Recognition in the Wild

      Authors: Yingjian Li;Yingnan Gao;Bingzhi Chen;Zheng Zhang;Guangming Lu;David Zhang;
      Pages: 3190 - 3202
      Abstract: Facial Expression Recognition (FER) is a long-standing but challenging research problem in computer vision. Existing approaches mainly focus on single-label emotional prediction, which cannot handle the complex multi-label FER task because of the coupling behavior of multiple emotions on a single facial image. To this end, in this paper, we propose a novel Self-supervised Exclusive-Inclusive Interactive Learning (SEIIL) method to facilitate discriminative multi-label FER in the wild, which can effectively handle coupled multiple sentiments with limited unconstrained training data. Specifically, we construct an emotion disentangling module to capture the inclusive and exclusive characteristics of facial expressions, which can decouple the compound numerous emotions on an image. Moreover, an adaptively-weighted ensemble technique is conceived to aggregate category-level latent exclusive embeddings, and then a conditional adversarial interactive learning module is designed to fully leverage the complementarity between the inclusive and the formulated latent representations. Furthermore, to tackle the insufficiency of training data, we introduce a self-supervised learning strategy to augment the amount and diversity of facial images, which can endow the model with advanced generalization ability. Under this strategy, the proposed two modules can be concurrently utilized in our SEIIL to jointly handle coupled emotions and alleviate the overfitting problem. Extensive experimental results on six databases illustrate the superb performance of our method against state-of-the-art baselines.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
       
  • Overview of the Neural Network Compression and Representation (NNR)
           Standard

      Authors: Heiner Kirchhoffer;Paul Haase;Wojciech Samek;Karsten Müller;Hamed Rezazadegan-Tavakoli;Francesco Cricri;Emre B. Aksu;Miska M. Hannuksela;Wei Jiang;Wei Wang;Shan Liu;Swayambhoo Jain;Shahab Hamidi-Rad;Fabien Racapé;Werner Bailer;
      Pages: 3203 - 3216
      Abstract: Neural Network Coding and Representation (NNR) is the first international standard for efficient compression of neural networks (NNs). The standard is designed as a toolbox of compression methods that can be used to create coding pipelines. It can be used either as an independent coding framework (with its own bitstream format) or together with external neural network formats and frameworks. To provide the highest degree of flexibility, the network compression methods operate per parameter tensor in order to always ensure proper decoding, even if no structure information is provided. The NNR standard contains compression-efficient quantization and deep context-adaptive binary arithmetic coding (DeepCABAC) as its core encoding and decoding technologies, as well as neural network parameter pre-processing methods such as sparsification, pruning, low-rank decomposition, unification, local scaling, and batch-norm folding. NNR achieves a compression efficiency of more than 97% for transparent coding cases, i.e., without degrading classification quality such as top-1 or top-5 accuracy. This paper provides an overview of the technical features and characteristics of NNR. (A toy sparsification-and-quantization example follows this entry.)
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
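
      As a rough illustration of the per-tensor pre-processing and
      quantization pipeline that NNR standardizes, the NumPy sketch below
      sparsifies and uniformly quantizes one parameter tensor. The threshold
      and step size are illustrative assumptions, the DeepCABAC entropy
      coder is not reproduced, and counting nonzero symbols is only a crude
      proxy for the coded size.

        import numpy as np

        def sparsify_and_quantize(tensor, threshold=1e-2, step=1e-3):
            """Zero out small weights, then uniformly quantize the survivors."""
            t = np.where(np.abs(tensor) < threshold, 0.0, tensor)  # sparsification
            q = np.round(t / step).astype(np.int32)                # uniform quantization
            return q, step

        def dequantize(q, step):
            return q.astype(np.float32) * step

        w = np.random.randn(1000).astype(np.float32) * 0.05
        q, step = sparsify_and_quantize(w)
        w_hat = dequantize(q, step)
        # Entropy coding (DeepCABAC) would compress `q`; as a crude proxy,
        # count the nonzero symbols that actually need coding.
        print("nonzero symbols:", np.count_nonzero(q), "of", q.size)
        print("max reconstruction error:", np.abs(w - w_hat).max())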
       
  • Interlayer Restoration Deep Neural Network for Scalable High Efficiency
           Video Coding

      Authors: Gang He;Li Xu;Jie Lei;Weiying Xie;Yunsong Li;Yibo Fan;Jinjia Zhou;
      Pages: 3217 - 3234
      Abstract: This paper presents an interlayer restoration deep neural network (IRDNN) for scalable high efficiency video coding (SHVC) to improve visual quality and coding efficiency; it is the first work to combine a deep neural network (DNN) with SHVC. Considering the coding architecture of SHVC, we design a multi-frame, multi-layer neural network that restores the interlayer of SHVC by utilizing the adjacent reconstructed frames of both the base layer (BL) and the enhancement layer (EL). Moreover, we analyze the temporal motion relationship of frames within one layer and the compression-degradation relationship of frames between different layers, and propose a synergistic mechanism of motion restoration and compression restoration in IRDNN. The network generates an interlayer of higher quality that serves the EL coding and thus enhances coding efficiency. A large-scale dataset covering various quality degradations is constructed for the task of interlayer restoration in SHVC. The experimental results show that with our implementation on SHVC, the EL Bjøntegaard delta bit-rate (BD-BR) reduction is 9.291% and 6.007% in signal-to-noise-ratio scalability and spatial scalability, respectively. The code is available at https://github.com/icecherylXuli/IRDNN.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
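
      The BD-BR figures quoted above come from the standard Bjøntegaard
      metric: fit each rate-distortion curve with a cubic polynomial in
      (PSNR, log-rate) space and integrate the horizontal gap between the
      fits. A minimal NumPy version of that calculation is shown below; the
      sample R-D points are made up for demonstration.

        import numpy as np

        def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
            lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
            # cubic fit of log-rate as a function of PSNR
            pa = np.polyfit(psnr_anchor, lr_a, 3)
            pt = np.polyfit(psnr_test, lr_t, 3)
            lo = max(min(psnr_anchor), min(psnr_test))
            hi = min(max(psnr_anchor), max(psnr_test))
            ia, it = np.polyint(pa), np.polyint(pt)
            avg_a = (np.polyval(ia, hi) - np.polyval(ia, lo)) / (hi - lo)
            avg_t = (np.polyval(it, hi) - np.polyval(it, lo)) / (hi - lo)
            return (np.exp(avg_t - avg_a) - 1.0) * 100.0  # percent rate change

        # hypothetical R-D points (kbps, dB) for anchor and tested encoder
        anchor_r = [1000, 2000, 4000, 8000]; anchor_p = [34.0, 36.5, 38.8, 40.6]
        test_r   = [ 930, 1850, 3700, 7500]; test_p   = [34.1, 36.6, 38.9, 40.7]
        print("BD-BR: %.2f%%" % bd_rate(anchor_r, anchor_p, test_r, test_p))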
       
  • FastInter360: A Fast Inter Mode Decision for HEVC 360 Video Coding

      Authors: Iago Storch;Luciano Agostini;Bruno Zatt;Sergio Bampi;Daniel Palomino;
      Pages: 3235 - 3249
      Abstract: This paper presents FastInter360, a fast inter mode decision algorithm for accelerating the encoding of ERP 360 videos. The development of FastInter360 involved an in-depth and comprehensive set of evaluations performed to understand the differences in encoder behavior between 360 and conventional videos. These evaluations showed that, due to the texture distortions introduced by the projection, the encoder behaves in a specific way when encoding 360 videos and is more likely to use a recurrent set of encoding modes. Besides, coding efficiency is less sensitive to approximations in some encoding steps depending on the frame region. FastInter360 is then proposed to reduce the encoding complexity by exploiting these differences. It comprises three algorithms that accelerate the encoding by performing early SKIP-mode decision, reducing the integer motion estimation search range, and adjusting the fractional motion estimation precision. Furthermore, each of these algorithms adapts to the distortion intensity, performing greater complexity reduction in more distorted regions. Employed altogether, these algorithms compose FastInter360, which achieves an average complexity reduction of 22.84% with a coding efficiency loss of 0.652% BD-BR, on average, making it competitive with related works in the literature.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
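
      One way to picture the region-dependent behavior described above: in
      an ERP frame, rows far from the equator are increasingly stretched by
      the projection, so a fast encoder can shrink its motion-estimation
      search range there. The scaling rule below is an illustrative guess
      using the usual cos(latitude) ERP weight (as in WS-PSNR), not the
      paper's actual decision rules.

        import math

        def search_range_for_row(row, frame_height, base_range=64, min_range=8):
            # latitude in radians: 0 at the equator, +/- pi/2 at the poles
            lat = (row + 0.5) / frame_height * math.pi - math.pi / 2
            # cos(lat) is the usual ERP stretching weight (as in WS-PSNR)
            scaled = int(base_range * math.cos(lat))
            return max(min_range, scaled)

        H = 2048  # e.g. a 4096x2048 ERP frame
        for row in (0, H // 4, H // 2, 3 * H // 4, H - 1):
            print(f"row {row:4d}: search range {search_range_for_row(row, H)}")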
       
  • Immersive Video Coding: Should Geometry Information Be Transmitted as
           Depth Maps?

      Authors: Patrick Garus;Félix Henry;Joel Jung;Thomas Maugey;Christine Guillemot;
      Pages: 3250 - 3264
      Abstract: Immersive video often refers to multiple views with texture and scene geometry information, from which different viewports can be synthesized on the client side. To design efficient immersive video coding solutions, it is desirable to minimize bitrate, pixel rate and complexity. We investigate whether the classical approach of sending the geometry of a scene as depth maps is appropriate to serve this purpose. Previous work shows that bypassing depth transmission entirely and estimating depth at the client side improves the synthesis performance while saving bitrate and pixel rate. To understand whether encoder-side depth maps contain information that is worth transmitting, we first explore a hybrid approach that enables partial depth map transmission using a block-based rate-distortion decision in the depth coding process. This approach reveals that partial depth map transmission may improve the rendering performance but does not offer a good compromise in terms of compression efficiency. This led us to address the remaining drawbacks of decoder-side depth estimation: complexity and depth map inaccuracy. We propose a novel system that takes advantage of high-quality depth maps at the server side by encoding them into lightweight features that support the depth estimator at the client side. These features reduce the amount of data that has to be handled during decoder-side depth estimation by 88%, which significantly speeds up the cost computation and the energy minimization of the depth estimator. Furthermore, −46.0% and −37.9% average synthesis BD-rate gains are achieved compared to the classical approach with depth maps estimated at the encoder.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
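
      A toy illustration of why lightweight server-side features can speed
      up decoder-side depth estimation: if the transmitted features narrow
      the set of depth hypotheses a block must test, the client evaluates
      far fewer matching costs. The interval-hint model below is
      hypothetical; the paper encodes learned features derived from the
      encoder-side depth maps, not simple min/max intervals.

        import numpy as np

        depth_candidates = np.linspace(0.5, 10.0, 256)  # full sweep: 256 hypotheses

        def candidates_with_hint(hint_min, hint_max):
            # keep only the hypotheses inside the signaled interval
            mask = (depth_candidates >= hint_min) & (depth_candidates <= hint_max)
            return depth_candidates[mask]

        full = depth_candidates.size
        hinted = candidates_with_hint(2.0, 3.0).size  # hint narrows the interval
        print(f"cost evaluations per block: {hinted} instead of {full} "
              f"({100 * (1 - hinted / full):.0f}% fewer)")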
       
  • Joint Local Correlation and Global Contextual Information for Unsupervised
           3D Model Retrieval and Classification

      Authors: Wenhui Li;Zhenlan Zhao;An-An Liu;Zan Gao;Chenggang Yan;Zhendong Mao;Haipeng Chen;Weizhi Nie;
      Pages: 3265 - 3278
      Abstract: Unsupervised 3D model analysis has attracted tremendous attention owing to the rapid growth of 3D model data and the expense of extensive human annotation. Many effective methods have been designed to address 3D model analysis with labeled information, while few methods are devoted to unsupervised deep learning due to the difficulty of mining reliable information. In this paper, we propose a novel unsupervised deep learning method named Joint Local Correlation and Global Contextual information (LCGC) for 3D model retrieval and classification, which mines reliable triplet sets and uses a triplet loss to optimize the deep neural network. Our method proposes two schemes: 1) local self-correlation information learning, which adopts intra- and inter-view information to construct the view-level triplet set; and 2) global neighbor contextual information learning, which employs neighbor contextual information to explore the reliable relations among 3D models and construct the model-level triplet set. These schemes ensure that the selected triplet sets can be used to improve the discriminability of the learned features. Extensive evaluations on two large-scale datasets, ModelNet40 and ShapeNet55, demonstrate the effectiveness of our proposed method.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
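
      LCGC ultimately optimizes the network with a triplet loss over the
      mined (anchor, positive, negative) sets. A minimal PyTorch version of
      the standard triplet loss is shown below for reference; the paper's
      actual contribution, the view-level and model-level triplet mining,
      is not reproduced, and the margin value is an assumption.

        import torch
        import torch.nn.functional as F

        def triplet_loss(anchor, positive, negative, margin=0.3):
            # pull anchor toward positive, push it away from negative
            d_pos = F.pairwise_distance(anchor, positive)
            d_neg = F.pairwise_distance(anchor, negative)
            return F.relu(d_pos - d_neg + margin).mean()

        a, p, n = (torch.randn(16, 128, requires_grad=True) for _ in range(3))
        loss = triplet_loss(a, p, n)
        loss.backward()
        print(float(loss))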
       
  • Detection of Double JPEG Compression With the Same Quantization Matrix via
           Convergence Analysis

      Authors: Yakun Niu;Xiaolong Li;Yao Zhao;Rongrong Ni;
      Pages: 3279 - 3290
      Abstract: Detecting double JPEG compression with the same quantization matrix is a challenging task in image forensics. To address this problem, in this paper, a novel method is proposed by leveraging the component convergence that occurs during repeated JPEG compressions. First, an in-depth analysis of the pipeline of successive JPEG compressions is conducted, which reveals that the rounding/truncation errors as well as the JPEG coefficients tend to converge after multiple recompressions. Based on this fact, the backward quantization error (BQE) is defined, and we find that the ratio of non-zero BQE for single compression is larger than that for double compression. Moreover, to exploit the convergence property of the JPEG coefficients, a multi-threshold strategy is designed to capture statistics of the number of JPEG coefficients that differ between two sequential compressions. Finally, the statistical features of the two components are concatenated into a 15-D vector to detect double JPEG compression. Experimental results demonstrate the effectiveness of the proposed method, which outperforms several state-of-the-art schemes.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
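
      The convergence behavior the detector exploits is easy to demonstrate:
      recompressing a JPEG with the same settings changes fewer and fewer
      values with each round. The sketch below counts changed decoded pixels
      with Pillow as a cheap stand-in for the paper's JPEG-coefficient and
      BQE statistics; the quality setting and the random test image are
      arbitrary.

        import io
        import numpy as np
        from PIL import Image

        img = Image.fromarray(
            (np.random.rand(128, 128, 3) * 255).astype(np.uint8), "RGB")

        prev = None
        for round_idx in range(1, 6):
            buf = io.BytesIO()
            img.save(buf, "JPEG", quality=90)   # same quantization matrix each time
            buf.seek(0)
            img = Image.open(buf).convert("RGB")
            cur = np.asarray(img)
            if prev is not None:                # differences shrink toward convergence
                print(f"round {round_idx}: {np.count_nonzero(prev != cur)} changed values")
            prev = cur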
       
  • Modeling Two-Stream Correspondence for Visual Sound Separation

      Authors: Yixuan He;Xing Xu;Jingran Zhang;Fumin Shen;Yang Yang;Heng Tao Shen;
      Pages: 3291 - 3302
      Abstract: Visual sound separation (VSS) aims to recover each sound component from mixed audio signals under the guidance of visual information. Existing works mainly capture global-level audio-visual correspondence and exploit various visual features to enhance the appearance and motion features of the visual modality. However, they commonly neglect the intrinsic properties of the audio modality, resulting in less effective audio feature extraction and unbalanced audio-visual correspondence. To tackle this problem, we propose a novel end-to-end framework termed Modeling Two-Stream Correspondence (MTSC) for VSS, which explicitly extracts the timbre and content features of the audio modality. The proposed MTSC method employs a two-stream architecture to enhance audio-visual correspondence for both appearance-timbre and motion-content features. Moreover, with this two-stream pipeline, more lightweight appearance and motion features are exploited for the visual modality. Extensive experiments conducted on two benchmark musical instrument datasets demonstrate that our MTSC method remarkably outperforms seven state-of-the-art VSS approaches. The implementation code and extensive experimental results of the proposed MTSC method are provided at https://github.com/CFM-MSG/MTSC-VSS.
      PubDate: May 2022
      Issue No: Vol. 32, No. 5 (2022)
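
      The output stage common to this line of VSS work is spectrogram
      masking: predict a soft mask conditioned on visual features and
      multiply it with the mixture spectrogram. The PyTorch sketch below
      shows that generic masking step with a simple gating layer; it is not
      MTSC's actual network, and all layer names and sizes are assumptions.

        import torch
        import torch.nn as nn

        class MaskHead(nn.Module):
            def __init__(self, audio_ch=32, visual_dim=128):
                super().__init__()
                self.film = nn.Linear(visual_dim, audio_ch)  # condition on vision
                self.out = nn.Conv2d(audio_ch, 1, kernel_size=1)

            def forward(self, audio_feat, visual_feat):
                # audio_feat: (B, C, F, T) spectrogram features
                gate = self.film(visual_feat).unsqueeze(-1).unsqueeze(-1)
                return torch.sigmoid(self.out(audio_feat * gate))  # mask in [0, 1]

        mix_spec = torch.rand(2, 1, 256, 64)   # mixture magnitude spectrogram
        audio_feat = torch.rand(2, 32, 256, 64)
        visual_feat = torch.rand(2, 128)
        mask = MaskHead()(audio_feat, visual_feat)
        separated = mask * mix_spec            # estimated source spectrogram
        print(separated.shape)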
       
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +44 (0)131 451 3762
 

