Subjects -> ELECTRONICS (Total: 207 journals)
- IEEE Transactions on Circuits and Systems for Video Technology publication information
Pages: C2 - C2 PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- IEEE Transactions on Circuits and Systems for Video Technology publication information
Pages: C3 - C3 PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- MSNet: Multi-Resolution Synergistic Networks for Adaptive Inference
Authors:
Renlong Hang;Xuwei Qian;Qingshan Liu;
Pages: 2009 - 2018 Abstract: Adaptive inference with multiple networks has attracted much attention for resource-limited image classification. It assumes that a large portion of test samples can be correctly classified by small networks with fewer layers or channels, which poses a great challenge for these small networks. In this paper, we argue that large networks have the ability to help the small ones address this challenge if fully explored. To this end, we propose a multi-resolution synergistic network (MSNet) using two different kinds of fusion modules. The first one is a cross-branch aggregation module, which aims to transfer the high-resolution features to the low-resolution ones between neighboring branches. The other one is an adaptive distillation module, whose purpose is to feed the discriminative ability of the large network into the smaller ones. Via these two modules, the small networks become powerful enough to correctly classify large numbers of test samples, thus improving the classification accuracy and inference efficiency. We evaluate MSNet on three benchmark datasets: CIFAR-10, CIFAR-100, and ImageNet. Experimental results show that our network can obtain better results than several state-of-the-art networks in both anytime classification and budgeted batch classification settings. The code is available at https://github.com/bigdata-qian/MSNet-Pytorch. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
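The adaptive-inference setting described in the MSNet abstract above routes easy samples through small networks and escalates only the hard ones to larger networks. Below is a minimal, generic sketch of such a confidence-thresholded cascade; the network list, the softmax-confidence exit rule, and the 0.9 threshold are illustrative assumptions, not the authors' implementation (which additionally uses cross-branch aggregation and adaptive distillation).

```python
import torch
import torch.nn.functional as F

def cascade_predict(x, nets, threshold=0.9):
    """Route a batch through progressively larger classifiers.

    nets: list of classifiers ordered from smallest to largest.
    A sample exits at the first network whose softmax confidence
    exceeds `threshold`; leftovers fall through to the largest network.
    """
    preds = torch.full((x.size(0),), -1, dtype=torch.long)
    pending = torch.arange(x.size(0))
    for i, net in enumerate(nets):
        logits = net(x[pending])
        probs = F.softmax(logits, dim=1)
        conf, labels = probs.max(dim=1)
        if i == len(nets) - 1:
            done = torch.ones_like(conf, dtype=torch.bool)  # largest net decides the rest
        else:
            done = conf >= threshold
        preds[pending[done]] = labels[done]
        pending = pending[~done]
        if pending.numel() == 0:
            break
    return preds
```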
- CNN-Transformer Based Generative Adversarial Network for Copy-Move Source/Target Distinguishment
Authors:
Yulan Zhang;Guopu Zhu;Xing Wang;Xiangyang Luo;Yicong Zhou;Hongli Zhang;Ligang Wu;
Pages: 2019 - 2032 Abstract: Copy-move forgery can be used for hiding certain objects or duplicating meaningful objects in images. Although copy-move forgery detection has been studied extensively in recent years, it is still a challenging task to distinguish between the source and the target regions in copy-move forgery images. In this paper, a convolutional neural network-transformer based generative adversarial network (CNN-T GAN) is proposed to distinguish the source and target regions in a copy-move forged image. A generator is first utilized to generate a mask that is similar to the groundtruth mask. Then, a discriminator is trained to discriminate the true image pairs from the false ones. When the discriminator cannot discriminate the true/false image pairs accurately, the generator can be used to obtain the final localization maps of copy-move forgery. In the generator, convolutional neural network (CNN) and transformer are exploited to extract the local features and global representations in copy-move forgery images, respectively. In addition, feature coupling layers are designed to integrate the features in CNN branch and transformer branch in an interactive way. Finally, a new Pearson correlation layer is introduced to match the similarity features in source and target regions, which can improve the performance of copy-move forgery localization, especially the localization performance on source regions. To the best of our knowledge, this is the first work to utilize transformer for feature extraction in copy-move forgery localization. The proposed method can not only detect the copy-move regions, but also distinguish the source and target regions. Extensive experimental results on several commonly used copy-move datasets have shown that the proposed method outperforms the state-of-the-art methods for copy-move detection. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
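The abstract above introduces a Pearson correlation layer for matching similarity features between source and target regions. A minimal sketch of per-position Pearson correlation between two feature maps is shown below; the tensor layout and the choice to correlate the channel vector at each spatial location are assumptions for illustration, not the paper's exact layer.

```python
import torch

def pearson_correlation_map(feat_a, feat_b, eps=1e-6):
    """Pearson correlation between per-position feature vectors.

    feat_a, feat_b: tensors of shape (B, C, H, W).
    Returns a (B, H, W) correlation map in [-1, 1].
    """
    a = feat_a - feat_a.mean(dim=1, keepdim=True)   # center over channels
    b = feat_b - feat_b.mean(dim=1, keepdim=True)
    cov = (a * b).sum(dim=1)
    denom = a.norm(dim=1) * b.norm(dim=1) + eps
    return cov / denom
```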
- Toward Facial Expression Recognition in the Wild via Noise-Tolerant Network
Authors:
Yu Gu;Huan Yan;Xiang Zhang;Yantong Wang;Yusheng Ji;Fuji Ren;
Pages: 2033 - 2047 Abstract: Facial Expression Recognition (FER) has recently emerged as a crucial area in Human-Computer Interaction (HCI) system for understanding the user’s inner state and intention. However, feature- and label-noise constitute the major challenge for FER in the wild due to the ambiguity of facial expressions worsened by low-quality images. To deal with this problem, in this paper, we propose a simple but effective Facial Expression Noise-tolerant Network (FENN) which explores the inter-class correlations for mitigating ambiguity that usually happens between morphologically similar classes. Specifically, FENN leverages a multivariate normal distribution to model such correlations at the final hidden layer of the neural network to suppress the heteroscedastic uncertainty caused by inter-class label noise. Furthermore, the discriminative ability of deep features is weakened by the subtle differences between expressions and the presence of feature noise. FENN utilizes a feature-noise mitigation module to extract compact intra-class feature representations under feature noise while preserving the intrinsic inter-class relationships. We conduct extensive experiments to evaluate the effectiveness of FENN on both original annotated images and synthetic noisy annotated images from RAF-DB, AffectNet, and FERPlus in-the-wild facial expression datasets. The results show that FENN significantly outperforms state-of-the-art FER methods. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- Recurrent Interaction Network for Stereoscopic Image Super-Resolution
Authors:
Zhe Zhang;Bo Peng;Jianjun Lei;Haifeng Shen;Qingming Huang;
Pages: 2048 - 2060 Abstract: Recently, deep learning-based stereoscopic image super-resolution has attracted extensive attention and made great progress. However, existing methods have not adequately explored the inter-view dependency among two-view multi-level features. In this paper, a recurrent interaction network for stereoscopic image super-resolution (RISSRnet) is proposed to learn the inter-view dependency. To efficiently utilize the relationship between the two views, a recurrent interaction module is designed to achieve recurrent interaction among two-view multi-level features from the regrouped sequences, which are generated by a coupled queue-regroup mechanism. In addition, to recursively enhance features in the recurrent interaction module, an iterative propagation strategy is developed for sufficient interaction. Extensive experimental results demonstrate the effectiveness and superiority of the proposed RISSRnet. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- Motion Stimulation for Compositional Action Recognition
Authors:
Lei Ma;Yuhui Zheng;Zhao Zhang;Yazhou Yao;Xijian Fan;Qiaolin Ye;
Pages: 2061 - 2074 Abstract: Recognizing unseen combinations of actions and objects, namely (zero-shot) compositional action recognition, is extremely challenging for conventional action recognition algorithms in real-world applications. Previous methods focus on enhancing the dynamic clues of objects that appear in the scene by building region features or tracklet embeddings from ground-truths or detected bounding boxes. These methods rely heavily on manual annotation or the quality of detectors, which are inflexible for practical applications. In this work, we aim to mine the temporal clues from moving objects or hands without explicit supervision. Thus, we propose a novel Motion Stimulation (MS) block, which is specifically designed to mine dynamic clues of the local regions autonomously from adjacent frames. Furthermore, MS consists of the following three steps: motion feature extraction, motion feature recalibration, and action-centric excitation. The proposed MS block can be directly and conveniently integrated into existing video backbones to enhance the ability of compositional generalization for action recognition algorithms. Extensive experimental results on three action recognition datasets, the Something-Else, IKEA-Assembly and EPIC-KITCHENS datasets, indicate the effectiveness and interpretability of our MS block. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
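The Motion Stimulation block described above mines dynamic clues from adjacent frames without explicit supervision, starting with a motion feature extraction step. A hedged sketch of that first step, written here as smoothed temporal differences of per-frame features, is given below; the depth-wise smoothing convolution and tensor shapes are assumptions.

```python
import torch
import torch.nn as nn

class MotionFeature(nn.Module):
    """Extract coarse motion cues as smoothed differences of per-frame
    features: (B, T, C, H, W) -> (B, T-1, C, H, W)."""

    def __init__(self, channels):
        super().__init__()
        # depth-wise 3x3 conv to denoise the raw temporal difference
        self.smooth = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)

    def forward(self, feats):
        b, t, c, h, w = feats.shape
        diff = feats[:, 1:] - feats[:, :-1]          # temporal difference
        diff = diff.reshape(b * (t - 1), c, h, w)
        diff = self.smooth(diff)
        return diff.reshape(b, t - 1, c, h, w)
```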
- Deep Cross-Layer Collaborative Learning Network for Online Knowledge Distillation
Authors:
Tongtong Su;Qiyu Liang;Jinsong Zhang;Zhaoyang Yu;Ziyue Xu;Gang Wang;Xiaoguang Liu;
Pages: 2075 - 2087 Abstract: Recent online knowledge distillation (OKD) methods focus on capturing rich and useful intermediate information by performing multi-layer feature learning. Existing works only consider intermediate feature maps between the same layers and ignore valuable information across layers, which results in a lack of appropriate cross-layer supervision during the learning process. Besides, this manner provides insufficient supervision for the learning of the students, since it fails to construct a qualified teacher. In this work, we propose a Deep Cross-layer Collaborative Learning network (DCCL) for OKD, which efficiently exploits fruitful knowledge of peer student models by keeping appropriate intermediate cross-layer supervision. Specifically, each student gradually integrates its own features at different layers for feature matching, so as to effectively utilize features at low and high levels for learning more composite knowledge. Moreover, we adopt a collaborative knowledge learning strategy, in which a qualified teacher is established via fusing the features of the last convolution layers for enhancing high-level representation. In this way, all student models continuously transfer the rich teacher’s internal representation as well as capture its dynamic growth process, and in turn assist the learning of the fusion teacher to further supervise the students. In the experiments, our proposed DCCL has shown great generalization ability with various backbone models on CIFAR-100, Tiny ImageNet and ImageNet, and also demonstrated superior performance against mainstream OKD works. Our code is available here: https://github.com/nanxiaotong/DCCL. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
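DCCL, as summarized above, builds a fusion teacher from the peer students and distills it back to each of them. The sketch below shows only the logit-level flavor of such online collaborative distillation: peer logits are averaged into a teacher distribution and each student matches it with a KL loss. The averaging-based fusion and the temperature value are assumptions; the paper fuses last-layer convolutional features rather than logits.

```python
import torch
import torch.nn.functional as F

def collaborative_kd_loss(student_logits, temperature=3.0):
    """student_logits: list of (B, num_classes) tensors from peer students.

    A simple fusion teacher is built by averaging peers' logits; each
    student is trained to match the teacher's softened distribution.
    """
    teacher = torch.stack(student_logits).mean(dim=0).detach()
    t_prob = F.softmax(teacher / temperature, dim=1)
    loss = 0.0
    for logits in student_logits:
        log_s = F.log_softmax(logits / temperature, dim=1)
        loss = loss + F.kl_div(log_s, t_prob, reduction="batchmean") * temperature ** 2
    return loss / len(student_logits)
```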
- Point-and-Shoot All-in-Focus Photo Synthesis From Smartphone Camera Pair
Authors:
Xianrui Luo;Juewen Peng;Weiyue Zhao;Ke Xian;Hao Lu;Zhiguo Cao;
Pages: 2088 - 2101 Abstract: All-in-Focus (AIF) photography is expected to be a commercial selling point for modern smartphones. Standard AIF synthesis requires manual, time-consuming operations such as focal stack compositing, which is unfriendly to ordinary people. To achieve point-and-shoot AIF photography with a smartphone, we expect that an AIF photo can be generated from one shot of the scene, instead of from multiple photos captured by the same camera. Benefiting from the multi-camera module in modern smartphones, we introduce a new task of AIF synthesis from main (wide) and ultra-wide cameras. The goal is to recover sharp details from defocused regions in the main-camera photo with the help of the ultra-wide-camera one. The camera setting poses new challenges such as parallax-induced occlusions and inconsistent color between cameras. To overcome the challenges, we introduce a predict-and-refine network to mitigate occlusions and propose dynamic frequency-domain alignment for color correction. To enable effective training and evaluation, we also build an AIF dataset with 2686 unique scenes. Each scene includes two photos captured by the main camera, one photo captured by the ultra-wide camera, and a synthesized AIF photo. Results show that our solution, termed EasyAIF, can produce high-quality AIF photos and outperforms strong baselines quantitatively and qualitatively. For the first time, we demonstrate point-and-shoot AIF photo synthesis successfully from main and ultra-wide cameras. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- Quaternion-Valued Correlation Learning for Few-Shot Semantic Segmentation
Authors:
Zewen Zheng;Guoheng Huang;Xiaochen Yuan;Chi-Man Pun;Hongrui Liu;Wing-Kuen Ling;
Pages: 2102 - 2115 Abstract: Few-shot segmentation (FSS) aims to segment unseen classes given only a few annotated samples. Encouraging progress has been made for FSS by leveraging semantic features learned from base classes with sufficient training samples to represent novel classes. The correlation-based methods lack the ability to consider interaction of the two subspace matching scores due to the inherent nature of the real-valued 2D convolutions. In this paper, we introduce a quaternion perspective on correlation learning and propose a novel Quaternion-valued Correlation Learning Network (QCLNet), with the aim to alleviate the computational burden of high-dimensional correlation tensor and explore internal latent interaction between query and support images by leveraging operations defined by the established quaternion algebra. Specifically, our QCLNet is formulated as a hyper-complex valued network and represents correlation tensors in the quaternion domain, which uses quaternion-valued convolution to explore the external relations of query subspace when considering the hidden relationship of the support sub-dimension in the quaternion space. Extensive experiments on the PASCAL-$5^{i}$ and COCO-$20^{i}$ datasets demonstrate that our method outperforms the existing state-of-the-art methods effectively. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
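The quaternion-valued convolution mentioned in the QCLNet abstract above is commonly realized by splitting the channels into four components and combining four real convolutions through the Hamilton product. The sketch below shows that standard construction; it is not claimed to match QCLNet's exact layer.

```python
import torch
import torch.nn as nn

class QuaternionConv2d(nn.Module):
    """Quaternion-valued 2D convolution via the Hamilton product.

    Input channels are interpreted as four equal groups (r, i, j, k);
    each weight component is an ordinary real-valued Conv2d.
    """

    def __init__(self, in_channels, out_channels, kernel_size, padding=0):
        super().__init__()
        assert in_channels % 4 == 0 and out_channels % 4 == 0
        args = (in_channels // 4, out_channels // 4, kernel_size)
        self.wr = nn.Conv2d(*args, padding=padding, bias=False)
        self.wi = nn.Conv2d(*args, padding=padding, bias=False)
        self.wj = nn.Conv2d(*args, padding=padding, bias=False)
        self.wk = nn.Conv2d(*args, padding=padding, bias=False)

    def forward(self, x):
        r, i, j, k = torch.chunk(x, 4, dim=1)
        # Hamilton product of the weight quaternion with the input quaternion
        out_r = self.wr(r) - self.wi(i) - self.wj(j) - self.wk(k)
        out_i = self.wr(i) + self.wi(r) + self.wj(k) - self.wk(j)
        out_j = self.wr(j) - self.wi(k) + self.wj(r) + self.wk(i)
        out_k = self.wr(k) + self.wi(j) - self.wj(i) + self.wk(r)
        return torch.cat([out_r, out_i, out_j, out_k], dim=1)
```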
- Optical Flow Reusing for High-Efficiency Space-Time Video Super Resolution
Authors:
Yuantong Zhang;Huairui Wang;Han Zhu;Zhenzhong Chen;
Pages: 2116 - 2128 Abstract: In this paper, we consider the task of space-time video super-resolution (ST-VSR), which can increase the spatial resolution and frame rate for a given video simultaneously. Despite the remarkable progress of recent methods, most of them still suffer from high computational costs and inefficient long-range information usage. To alleviate these problems, we propose a Bidirectional Recurrence Network (BRN) with the optical-flow-reuse strategy to better use temporal knowledge from long-range neighboring frames for high-efficiency reconstruction. Specifically, an efficient and memory-saving multi-frame motion utilization strategy is proposed by reusing the intermediate flow of adjacent frames, which considerably reduces the computation burden of frame alignment compared with traditional LSTM-based designs. In addition, the proposed hidden state in BRN is updated by the reused optical flow and refined by the Feature Refinement Module (FRM) for further optimization. Moreover, by utilizing intermediate flow estimation, the proposed method can infer non-linear motion and restore details better. Extensive experiments demonstrate that our optical-flow-reuse-based bidirectional recurrent network (OFR-BRN) is superior to state-of-the-art methods in accuracy and efficiency. Code is available at https://github.com/hahazh/OFR-BRN PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- Motion Estimation for Complex Fluid Flows Using Helmholtz Decomposition
Authors:
Jun Chen;Hui Duan;Yuanxin Song;Zemin Cai;Guangguang Yang;Tianshu Liu;
Pages: 2129 - 2146 Abstract: In this paper, we propose a novel motion model with Helmholtz decomposition for complex fluid flows in a filtering-based optical flow framework, where the optimization of the regularization term is treated as a filtering process, and different motion patterns can be captured by finding appropriate filter kernels based on the designed filters. In this framework, we introduce a novel optical flow method with a joint spatial filter, which is based on the Helmholtz decomposition theorem that assumes a local motion field is composed of a curl field and a divergence field. By adjusting the scale of the weights in the filter kernels and combining a curl filter with a divergence filter in a certain ratio, it can simulate different motion patterns. In addition, if the correlation between the horizontal and vertical components of the optical flow field in the filter kernel is eliminated, it will be transformed into a linear motion model. Based on this linear motion model, we also develop a novel optical flow method with an adaptive guided filter. By finding an adaptive filter kernel driven by both the input image and the guided motion field for the designed filter, it can successfully capture different motion patterns, and yield an edge-preserving smoothing optical flow field. Most importantly, the proposed optical flow model provides a new way to design the regularizers for capturing different motion patterns in complex fluid flow. In particular, the designed optical flow method with an adaptive guided filter significantly outperforms the current state-of-the-art optical flow methods in predicting complex fluid flows. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
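The motion model above rests on the Helmholtz decomposition, which splits a motion field into a divergence (irrotational) part and a curl (solenoidal) part. In standard notation (not copied from the paper):

```latex
\mathbf{u}(x,y) \;=\; \underbrace{\nabla \phi}_{\text{irrotational (divergence) part}}
\;+\; \underbrace{\nabla \times \boldsymbol{\psi}}_{\text{solenoidal (curl) part}},
\qquad
\nabla \times (\nabla \phi) = \mathbf{0},
\qquad
\nabla \cdot (\nabla \times \boldsymbol{\psi}) = 0.
```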
- HRInversion: High-Resolution GAN Inversion for Cross-Domain Image Synthesis
Authors:
Peng Zhou;Lingxi Xie;Bingbing Ni;Lin Liu;Qi Tian;
Pages: 2147 - 2161 Abstract: We investigate GAN inversion problems of using pre-trained GANs to reconstruct real images. Recent methods for such problems typically employ a VGG perceptual loss to measure the difference between images. While the perceptual loss has achieved remarkable success in various computer vision tasks, it may cause unpleasant artifacts and is sensitive to changes in input scale. This paper delivers an important message that algorithm details are crucial for achieving satisfying performance. In particular, we propose two important but undervalued design principles: (i) not down-sampling the input of the perceptual loss to avoid high-frequency artifacts; and (ii) calculating the perceptual loss using convolutional features which are robust to scale. Integrating these designs derives the proposed framework, HRInversion, that achieves superior performance in reconstructing image details. We validate the effectiveness of HRInversion on a cross-domain image synthesis task and propose a post-processing approach named local style optimization (LSO) to synthesize clean and controllable stylized images. For the evaluation of the cross-domain images, we introduce a metric named ID retrieval which captures the similarity of face identities of stylized images to content images. We also test HRInversion on non-square images. Equipped with implicit neural representation, HRInversion applies to ultra-high resolution images with more than 10 million pixels. Furthermore, we show applications of style transfer and 3D-aware GAN inversion, paving the way for extending the application range of HRInversion. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
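Two of the design principles highlighted in the HRInversion abstract above are (i) not down-sampling the input of the perceptual loss and (ii) computing the loss on convolutional features. A hedged sketch with a VGG16 backbone follows; the chosen layer indices, the L1 distance, and the use of VGG16 itself are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn.functional as F
from torchvision import models

class ConvPerceptualLoss(torch.nn.Module):
    """Perceptual loss on convolutional feature maps, computed at the
    input's native resolution (no down-sampling of the loss input)."""

    def __init__(self, layer_ids=(3, 8, 15, 22)):  # ends of VGG16 conv blocks
        super().__init__()
        # load pretrained weights in practice; omitted here to stay version-agnostic
        self.vgg = models.vgg16().features.eval()
        self.layer_ids = set(layer_ids)
        for p in self.vgg.parameters():
            p.requires_grad_(False)

    def forward(self, x, y):
        loss, fx, fy = 0.0, x, y
        for idx, layer in enumerate(self.vgg):
            fx, fy = layer(fx), layer(fy)
            if idx in self.layer_ids:
                loss = loss + F.l1_loss(fx, fy)
            if idx >= max(self.layer_ids):
                break
        return loss
```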
- Sequential Learning for Ingredient Recognition From Images
Authors:
Mengyang Zhang;Guohui Tian;Ying Zhang;Hong Liu;
Pages: 2162 - 2175 Abstract: Incorporating cooking logic into ingredient recognition from food images is beneficial for food cognition. Compared with food categorization, ingredient recognition gives a better understanding of food cognition by providing crucial information on food compositions. However, there exist situations in which different foods are made of different ingredients; thus, it is necessary to incorporate cooking logic into ingredient recognition to achieve better food cognition. Based on this point, our paper proposes a sequential learning method to guide a neural network based (NN-based) model on producing ingredients following the corresponding cooking logic in recipes. Firstly, to make maximum use of visual features from images, a double-flow feature fusion module (DFFF) is proposed to obtain features from two image-based visual tasks (food name proposal and multi-label ingredient proposal). After that, fused features from DFFF, together with original image features, are fed into a bidirectional long short-term memory (Bi-LSTM) based ingredient generator to produce sequential ingredients. To guide the sequential ingredient generation process, reinforcement learning is employed by designing a hybrid loss related to both the common and individual traits of ingredients for optimizing the model’s ability to associate images with sequential ingredients. In addition, sequential ingredients are utilized in a backward flow by reconstructing food images, so that sequential ingredient generation can be further optimized in a complementary manner. In experiments, the results demonstrate the superiority of our method in driving the model to allocate more attention to the correlation between images and sequential ingredients, and the produced ingredients are comprehensive and logical. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- Pull & Push: Leveraging Differential Knowledge Distillation for Efficient Unsupervised Anomaly Detection and Localization
Authors:
Qihang Zhou;Shibo He;Haoyu Liu;Tao Chen;Jiming Chen;
Pages: 2176 - 2189 Abstract: Recently, much attention has been paid to segmenting subtle unknown defect regions by knowledge distillation in an unsupervised setting. Most previous studies concentrated on guiding the student network to learn the same representations on the normality, neglecting the different behaviors of the abnormality. This leads to a high probability of false detection of subtle defects. To address such an issue, we propose to push representations on abnormal areas of the teacher and student network as far as possible while pulling representations on normal areas as close as possible. Based on this idea, we design an efficient teacher-student model for anomaly detection and localization, which maximizes pixel-wise discrepancies for anomalous regions approximated by data augmentation and simultaneously minimizes discrepancies for pixel-wise normal regions between these two networks. The explicit differential knowledge distillation enlarges the margin between normal representations and abnormal ones in favour of discriminating them. Then, an appropriately small student network is not only efficient but, more importantly, helps inhibit the generalization to anomalous patterns when learning normal patterns, facilitating a precise decision boundary. The experimental results on the MVTec AD, Fashion-MNIST, and CIFAR-10 datasets demonstrate that our proposed method achieves better performance than current state-of-the-art (SOTA) approaches. In particular, for the MVTec AD dataset with high-resolution images, we achieve 98.1% AUROC and 93.6% AUPRO in anomaly localization, outperforming knowledge distillation based SOTA methods by 1.1% AUROC and 1.5% AUPRO with a lightweight model. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
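The pull & push idea above minimizes teacher-student feature discrepancy on normal pixels while maximizing it on synthetically augmented anomalous pixels. A minimal masked cosine-distance loss expressing that idea is sketched below; the mask convention and the margin value are assumptions.

```python
import torch
import torch.nn.functional as F

def pull_push_loss(t_feat, s_feat, anomaly_mask, margin=1.0):
    """t_feat, s_feat: (B, C, H, W) teacher / student features.
    anomaly_mask:      (B, 1, H, W), 1 on synthesized anomalous pixels.

    Normal pixels are pulled together (small cosine distance); anomalous
    pixels are pushed apart up to `margin`.
    """
    dist = 1.0 - F.cosine_similarity(t_feat, s_feat, dim=1, eps=1e-6)  # (B, H, W)
    mask = anomaly_mask.squeeze(1)
    pull = (dist * (1 - mask)).sum() / ((1 - mask).sum() + 1e-6)
    push = (F.relu(margin - dist) * mask).sum() / (mask.sum() + 1e-6)
    return pull + push
```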
- A Good Data Augmentation Policy is not All You Need: A Multi-Task Learning Perspective
Authors:
Linfeng Zhang;Kaisheng Ma;
Pages: 2190 - 2201 Abstract: Data augmentation, which improves the diversity of datasets by applying image transformations, has become one of the most effective techniques in visual representation learning. Usually, the design of augmentation policies faces a diversity-difficulty trade-off. On the one hand, a simple augmentation leads to a low training set diversity, which can not improve model performance significantly. On the other hand, an excessively hard augmentation has an overlarge regularization effect which harms model performance. Recently, automatic augmentation methods have been proposed to address this issue by searching the optimal data augmentation policy from a predefined searching space. However, these methods still suffer from heavy searching overhead or complex optimization objectives. In this paper, instead of searching the optimal augmentation policy, we propose to break the diversity-difficulty trade-off from a multi-task learning perspective. By formulating model learning on the augmented images and the original images as the auxiliary task and the primary task in multi-task learning respectively, the hard augmentation does not directly influence the training of the primary branch and thus its negative influence can be alleviated. Hence, neural networks can learn valuable semantic information even with a totally random augmentation policy. Experimental results on ten datasets for four tasks demonstrate the superiority of our method over the other twelve methods. Codes have been released in https://github.com/ArchipLab-LinfengZhang/data-augmentation-multi-task. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
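The multi-task view described above treats learning on the original images as the primary task and learning on the (possibly very hard) augmented images as an auxiliary task, so a harsh augmentation only influences training indirectly. A schematic of such a combined objective follows; the shared backbone with two heads and the 0.5 weighting are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def multitask_aug_loss(backbone, primary_head, aux_head,
                       x_clean, x_aug, labels, aux_weight=0.5):
    """Primary task: classify original images.
    Auxiliary task: classify augmented images through a separate head,
    so the regularization effect of hard augmentation stays indirect."""
    feat_clean = backbone(x_clean)
    feat_aug = backbone(x_aug)
    loss_primary = F.cross_entropy(primary_head(feat_clean), labels)
    loss_aux = F.cross_entropy(aux_head(feat_aug), labels)
    return loss_primary + aux_weight * loss_aux
```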
- Multi-Source Collaborative Contrastive Learning for Decentralized Domain Adaptation
Authors:
Yikang Wei;Liu Yang;Yahong Han;Qinghua Hu;
Pages: 2202 - 2216 Abstract: Unsupervised multi-source domain adaptation aims to obtain a model working well on the unlabeled target domain by reducing the domain gap between the labeled source domains and the unlabeled target domain. Considering the data privacy and storage cost, data from multiple source domains and target domain are isolated and decentralized. This data decentralization scenario brings the difficulty of domain alignment for reducing the domain gap between the decentralized source domains and target domain, respectively. For conducting domain alignment under the data decentralization scenario, we propose Multi-source Collaborative Contrastive learning for decentralized Domain Adaptation (MCC-DA). The models from other domains are used as the bridge to reduce the domain gap. On the source domains and target domain, we penalize the inconsistency of data features extracted from the source domain models and target domain model by contrastive alignment. With the collaboration of source domain models and target domain model, the domain gap between decentralized source domains and target domain is reduced without accessing the data from other domains. The experiment results on multiple benchmarks indicate that our method can reduce the domain gap effectively and outperform the state-of-the-art methods significantly. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- Anchor Assisted Experience Replay for Online Class-Incremental Learning
Authors:
Huiwei Lin;Shanshan Feng;Xutao Li;Wentao Li;Yunming Ye;
Pages: 2217 - 2232 Abstract: Online class-incremental learning (OCIL) studies the problem of mitigating the phenomenon of catastrophic forgetting while learning new classes from a continuously non-stationary data stream. Existing approaches mainly constrain the updating of parameters to prevent the drift of previous classes that reflects the movement of samples in the embedding space. Although this kind of drift can be relieved to some extent by existing approaches, it is usually inevitable. Therefore, only prevention of drift is not enough, and we also need to further compensate for it. To this end, for each previous class, we exploit the sample with the smallest loss value as its anchor, which can representatively characterize the corresponding class. Based on the assistance of anchors, we present a novel Anchor Assisted Experience Replay (AAER) method that not only prevents the drift but also compensates for the inevitable drift to overcome the catastrophic forgetting. Specifically, we design a Drift-Prevention with Anchor (DPA) operation, which plays a preventive role by reducing the drift implicitly as well as encouraging the samples with the same label cluster tightly. Moreover, we propose a Drift-Compensation with Anchor (DCA) operation that contains two remedy mechanisms: one is Forward-offset which keeps embedding of previous data but estimates new classification centers; the other is just the opposite named Backward-offset, which keeps the old classification centers unchanged but updates the embedding of previous data. We conduct extensive experiments on three real-world datasets, and empirical results consistently demonstrate the superior performance of AAER over various state-of-the-art methods. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- Concept-Enhanced Relation Network for Video Visual Relation Inference
Authors:
Qianwen Cao;Heyan Huang;Mucheng Ren;Changsen Yuan;
Pages: 2233 - 2244 Abstract: Video visual relation inference aims at extracting the relation triplets in the form of <subject-predicate-object> in videos. With the development of deep learning, existing approaches are designed based on data-driven neural networks. But the datasets are always biased in terms of objects and relation triplets, which makes relation inference challenging. Existing approaches often describe the relationships from visual, spatial, and semantic characteristics. The semantic description plays a key role in indicating the potential linguistic connections between objects, which are crucial to transfer knowledge across relationships, especially for the determination of novel relations. However, in these works, the semantic features are not emphasized, but simply obtained by mapping object labels, which cannot reflect sufficient linguistic meanings. To alleviate the above issues, we propose a novel network, termed Concept-Enhanced Relation Network (CERN), to facilitate video visual relation inference. Thanks to the attributes and linguistic contexts implied in concepts, the semantic representations aggregated with related concept knowledge of objects are of benefit to relation inference. To this end, we incorporate retrieved concepts with local semantics of objects via the gating mechanism to generate the concept-enhanced semantic representations. Extensive experimental results show that our approach has achieved state-of-the-art performance on two public datasets: ImageNet-VidVRD and VidOR. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
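CERN, as summarized above, incorporates retrieved concept knowledge with local object semantics via a gating mechanism. A minimal learned sigmoid gate interpolating between the object embedding and the concept embedding is sketched below; the concatenation-based gate and the shared dimensionality are assumptions.

```python
import torch
import torch.nn as nn

class ConceptGate(nn.Module):
    """Fuse an object's local semantic vector with a retrieved concept
    vector through an element-wise sigmoid gate."""

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, obj_emb, concept_emb):
        g = torch.sigmoid(self.gate(torch.cat([obj_emb, concept_emb], dim=-1)))
        return g * obj_emb + (1.0 - g) * concept_emb
```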
- Unsupervised Video-Based Action Recognition With Imagining Motion and Perceiving Appearance
Authors:
Wei Lin;Xiaoyu Liu;Yihong Zhuang;Xinghao Ding;Xiaotong Tu;Yue Huang;Huanqiang Zeng;
Pages: 2245 - 2258 Abstract: Video-based action recognition is a challenging task, which demands carefully considering the temporal property of videos in addition to the appearance attributes. Particularly, the temporal domain of raw videos usually contains significantly more redundant or irrelevant information than still images. For that, this paper proposes an unsupervised video-based action recognition approach with imagining motion and perceiving appearance, called IMPA, by comprehensively learning the spatio-temporal characteristics inherited in videos, with a particular emphasis on the moving object for action recognition. Specifically, a self-supervised Motion Extracting Block (MEB) is designed to extract the principal motion features by focusing on the large movement of the moving object, based on the observation that humans can infer complete motion trajectories from partial moving objects. To further take the indispensable appearance attribute in videos into account, an unsupervised Appearance Learning Block (ALB) is developed to perceive the static appearance, thus in combination with the MEB to recognize actions. Extensive validation experiments and ablation studies on multiple datasets demonstrate that our proposed IMPA approach obtains superior performance and surpasses other classical and state-of-the-art unsupervised action recognition methods. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- Hybrid Attention and Motion Constraint for Anomaly Detection in Crowded Scenes
Authors:
Xinfeng Zhang;Jinpeng Fang;Baoqing Yang;Shuhan Chen;Bin Li;
Pages: 2259 - 2274 Abstract: Crowds often appear in surveillance videos in public places, from which anomaly detection is of great importance to public safety. Since the abnormal cases are rare, variable and unpredictable, autoencoders with encoder and decoder structures using only normal samples have become a hot topic among various approaches for anomaly detection. However, since autoencoders have excessive generalization ability, they can sometimes still reconstruct abnormal cases very well. Recently, some researchers construct memory modules under normal conditions and use these normal memory items to reconstruct test samples during inference to increase the reconstruction errors for anomalies. However, in practice, the errors of reconstructing normal samples with the memory items often increase as well, which makes it still difficult to distinguish between normal and abnormal cases. In addition, the memory-based autoencoder is usually available only in the specific scene where the memory module is constructed and almost loses the prospect of cross-scene applications. We mitigate the overgeneralization of autoencoders from a different perspective, namely, by reducing the prediction errors for normal cases rather than increasing the prediction errors for abnormal cases. To this end, we propose an autoencoder based on hybrid attention and motion constraint for anomaly detection. The hybrid attention includes the channel attention used in the encoding process and spatial attention added to the skip connection between the encoder and decoder. The hybrid attention is introduced to reduce the weight of the feature channels and regions representing the background in the feature matrix, which makes the autoencoder features more focused on optimizing the representation of the normal targets during training. Furthermore, we introduce motion constraint to improve the autoencoder’s ability to predict normal activities in crowded scenes. We conduct experiments on real-world surveillance video datasets: UCSD, CUHK Avenue, and ShanghaiTech. The experimental results indicate that the prediction errors of the proposed method for frequent normal crowd activities are smaller than those of other approaches, which increases the gap between the prediction errors for normal frames and the prediction errors for abnormal frames. In addition, the proposed method does not depend on a specific scene. Therefore, it balances good anomaly detection performance and strong cross-scene capability. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
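The hybrid attention described above combines channel attention in the encoder with spatial attention on the skip connections so that background channels and regions are down-weighted. The sketch below uses a standard SE-style channel gate and a CBAM-style spatial gate as stand-ins; the paper's exact attention blocks may differ.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel gate applied in the encoding path."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        return x * w

class SpatialAttention(nn.Module):
    """Spatial gate applied to a skip connection between encoder and decoder."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, skip):
        pooled = torch.cat([skip.mean(dim=1, keepdim=True),
                            skip.max(dim=1, keepdim=True).values], dim=1)
        return skip * torch.sigmoid(self.conv(pooled))
```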
- Embedding Global Contrastive and Local Location in Self-Supervised Learning
Authors:
Wenyi Zhao;Chongyi Li;Weidong Zhang;Lu Yang;Peixian Zhuang;Lingqiao Li;Kefeng Fan;Huihua Yang;
Pages: 2275 - 2289 Abstract: Self-supervised representation learning (SSL) typically suffers from inadequate data utilization and feature-specificity due to the suboptimal sampling strategy and the monotonous optimization method. Existing contrastive-based methods alleviate these issues through exceedingly long training time and large batch size, resulting in non-negligible computational consumption and memory usage. In this paper, we present an efficient self-supervised framework, called GLNet. The key insights of this work are the novel sampling and ensemble learning strategies embedded in the self-supervised framework. We first propose a location-based sampling strategy to integrate the complementary advantages of semantic and spatial characteristics. Thereafter, a Siamese network with momentum update is introduced to generate representative vectors, which are used to optimize the feature extractor. Finally, we particularly embed global contrastive and local location tasks in the framework, which aims to leverage the complementarity between the high-level semantic features and low-level texture features. Such complementarity is significant for mitigating the feature-specificity and improving the generalizability, thus effectively improving the performance of downstream tasks. Extensive experiments on representative benchmark datasets demonstrate that GLNet performs favorably against the state-of-the-art SSL methods. Specifically, GLNet improves MoCo-v3 by 2.4% accuracy on the ImageNet dataset, while improving accuracy by 2% and consuming only 75% of the training time on the ImageNet-100 dataset. In addition, GLNet is appealing in its compatibility with popular SSL frameworks. Code is available at GLNet. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- A Dense-Aware Cross-splitNet for Object Detection and Recognition
Authors:
Sheng-Ye Wang;Zhong Qu;Cui-Jin Li;
Pages: 2290 - 2301 Abstract: Object detection and recognition are widely used in various fields and have become key technologies in computer vision. The distribution of objects in natural images can be roughly divided into densely stacked objects and scattered objects. Due to the incomplete attributes or features of some objects in densely stacked distributions, some object detectors miss local area details or suffer from low detection accuracy. In this paper, we propose Cross-splitNet, a novel cross-split method for dense object detection and recognition based on candidate box generation. First, an adaptive feature extraction network is constructed: different datasets are input into convolutional neural networks with various depths, which improves the generalization of the model. Then, the proposed cross-split algorithm is introduced to guide the different deep networks to learn features of images with various densities, according to intermediate object density classification results. Finally, we adopt a feature pyramid network (FPN) subnet to perform multi-scale feature extraction while retaining lower-layer object information and physical characteristics. The model was trained on the COCO 17, VOC 12, and VOC 07 datasets, which contain a large number of object categories. Our network was compared with several two-stage detectors, and the results show that our model achieved an average precision (AP) of 0.819 at 22.9 frames per second (FPS) on the VOC 07+12 dataset. The mean average precision (mAP) of the object detection model with R50+R2-101 backbones on the COCO dataset was increased by 1.9%. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- Weak-Boundary Sensitive Superpixel Segmentation Based on Local Adaptive Distance
Authors:
Limin Sun;Dongyang Ma;Xiao Pan;Yuanfeng Zhou;
Pages: 2302 - 2316 Abstract: Superpixel segmentation provides a way to capture object boundaries unsupervised and has benefited many computer vision applications. However, under-segmentation for weak boundaries and poor compatibility with image feature representations often limit its wide application. In this paper, we propose a new weak-boundary sensitive superpixel generation method and provide an all-in-one solution for images with different feature representations. We first design a local adaptive distance (LAD) to be more sensitive to feature changes in low-contrast regions. LAD leverages image local standard deviation as region contrast clues. It adaptively increases the feature distances in low-contrast regions to avoid feature space distances of weak boundaries being inundated by regularity constraints. LAD is scale-invariant and compatible with high bit-depth and multi-feature images. Then, based on LAD, we introduce a novel morphological contour evolution model to generate superpixels iteratively. Leveraging morphological dilation of superpixel shapes, the new model is more conducive to the boundary detection of irregular or slender objects. Extensive experiments demonstrate that our method favorably outperforms state-of-the-art methods, especially regarding the under-segmentation error and segmentation accuracy. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
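The local adaptive distance (LAD) above uses local standard deviation as a contrast cue and enlarges feature distances in low-contrast regions so weak boundaries are not swamped by the regularity constraint. A schematic SLIC-style distance with such a contrast-dependent gain is sketched below; the 1/std gain and the parameter values are assumptions, not the paper's actual formula.

```python
import numpy as np

def lad_distance(feat_p, feat_c, pos_p, pos_c, local_std,
                 compactness=10.0, grid_step=16.0, eps=1e-6):
    """SLIC-style distance between pixel p and superpixel center c,
    with the feature term adaptively amplified in low-contrast regions.

    feat_*: feature vectors (e.g. CIELAB color), pos_*: (x, y) positions,
    local_std: local standard deviation around pixel p (contrast cue).
    """
    d_feat = np.linalg.norm(np.asarray(feat_p, float) - np.asarray(feat_c, float))
    d_pos = np.linalg.norm(np.asarray(pos_p, float) - np.asarray(pos_c, float))
    # low contrast (small local_std) -> larger weight on the feature term
    adaptive_gain = 1.0 / (local_std + eps)
    return adaptive_gain * d_feat + (compactness / grid_step) * d_pos
```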
- Bridging Multi-Scale Context-Aware Representation for Object Detection
Authors:
Boying Wang;Ruyi Ji;Libo Zhang;Yanjun Wu;
Pages: 2317 - 2329 Abstract: Feature Pyramid Network (FPN) exploits multi-scale fusion representation to deal with scale variances in object detection. However, it ignores the context information gap across different levels. In this paper, we develop a plug-and-play detector, the multi-scale context-aware feature pyramid network to unleash the power of feature pyramid representation. Based on the dilated feature map at the highest level of the backbone, we propose the cross-scale context aggregation block to make full use of context information in the feature pyramid. Moreover, we extract discriminative features among different levels by the adaptive context aggregation block for robust object detection. Comprehensive experiments on MS-COCO demonstrate the effectiveness and efficiency of the proposed network, where about 1.0~3.0 AP improvements are achieved compared with existing FPN-based methods. In addition, we also conduct extensive experiments on pixel-level prediction tasks, i.e., instance segmentation, semantic segmentation, and panoptic segmentation, which further verify the effectiveness of the proposed method. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- Robust Tracking via Learning Model Update With Unsupervised Anomaly Detection Philosophy
Authors:
Jie Gao;Bineng Zhong;Yan Chen;
Pages: 2330 - 2341 Abstract: Template tracking is a typical paradigm to adaptively locate arbitrary objects in the tracking literature. Although existing works present diverse template updating approaches, one of the essential problems of template updating has not been solved effectively, i.e., when and how to update a template. In this work, we treat the updating time as an abnormal moment that indicates the previous template cannot depict the target accurately any more. Thus, we introduce an effective State-Edge Awareness (SEA) module that detects such abnormal moments via unsupervised anomaly detection. To be specific, by retaining multiple search frames of a video, SEA first analyzes the correlation features generated from the template and search images. Then, it estimates a measure of the abnormal degree, which is regarded as the signal for template updating. As a result, our method can not only capture the updating time automatically, but also update the templates effectively. Furthermore, the effectiveness of the proposed method has been verified on a representative CNN-based tracker and a representative Transformer-based tracker. The experimental results on five popular benchmarks show that our tracker can achieve state-of-the-art performance. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- AO2-DETR: Arbitrary-Oriented Object Detection Transformer
Authors:
Linhui Dai;Hong Liu;Hao Tang;Zhiwei Wu;Pinhao Song;
Pages: 2342 - 2356 Abstract: Arbitrary-oriented object detection (AOOD) is a challenging task to detect objects in the wild with arbitrary orientations and cluttered arrangements. Existing approaches are mainly based on anchor-based boxes or dense points, which rely on complicated hand-designed processing steps and inductive bias, such as anchor generation, transformation, and non-maximum suppression reasoning. Recently, the emerging transformer-based approaches view object detection as a direct set prediction problem that effectively removes the need for hand-designed components and inductive biases. In this paper, we propose an Arbitrary-Oriented Object DEtection TRansformer framework, termed AO2-DETR, which comprises three dedicated components. More precisely, an oriented proposal generation mechanism is proposed to explicitly generate oriented proposals, which provides better positional priors for pooling features to modulate the cross-attention in the transformer decoder. An adaptive oriented proposal refinement module is introduced to extract rotation-invariant region features and eliminate the misalignment between region features and objects. And a rotation-aware set matching loss is used to ensure the one-to-one matching process for direct set prediction without duplicate predictions. Our method considerably simplifies the overall pipeline and presents a new AOOD paradigm. Comprehensive experiments on several challenging datasets show that our method achieves superior performance on the AOOD task. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- Attention-Based Multi-View Feature Collaboration for Decoupled Few-Shot Learning
Authors:
Shuai Shao;Lei Xing;Yanjiang Wang;Baodi Liu;Weifeng Liu;Yicong Zhou;
Pages: 2357 - 2369 Abstract: Decoupled Few-shot learning (FSL) is an effective methodology that deals with the problem of data scarcity. Its standard paradigm includes two phases: (1) Pre-train. Generating a CNN-based feature extraction model (FEM) via base data. (2) Meta-test. Employing the frozen FEM to obtain the novel data features, then classifying them. Obviously, one crucial factor, the category gap, prevents the development of FSL, i.e., it is challenging for the pre-trained FEM to adapt to the novel classes flawlessly. Inspired by the common-sense observation that FEMs based on different strategies focus on different priorities, we attempt to address this problem from the multi-view feature collaboration (MVFC) perspective. Specifically, we first denoise the multi-view features by a subspace learning method, then design three attention blocks (loss-attention block, self-attention block and graph-attention block) to balance the representation between different views. The proposed method is evaluated on four benchmark datasets and achieves significant improvements of 0.9%-5.6% compared with SOTAs. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- Reinforced Adaptation Network for Partial Domain Adaptation
Authors:
Keyu Wu;Min Wu;Zhenghua Chen;Ruibing Jin;Wei Cui;Zhiguang Cao;Xiaoli Li;
Pages: 2370 - 2380 Abstract: Domain adaptation enables generalized learning in new environments by transferring knowledge from label-rich source domains to label-scarce target domains. As a more realistic extension, partial domain adaptation (PDA) relaxes the assumption of fully shared label space, and instead deals with the scenario where the target label space is a subset of the source label space. In this paper, we propose a Reinforced Adaptation Network (RAN) to address the challenging PDA problem. Specifically, a deep reinforcement learning model is proposed to learn source data selection policies. Meanwhile, a domain adaptation model is presented to simultaneously determine rewards and learn domain-invariant feature representations. By combining reinforcement learning and domain adaptation techniques, the proposed network alleviates negative transfer by automatically filtering out less relevant source data and promotes positive transfer by minimizing the distribution discrepancy across domains. Experiments on three benchmark datasets demonstrate that RAN consistently outperforms seventeen existing state-of-the-art methods by a large margin. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- Texture Brush for Fashion Inspiration Transfer: A Generative Adversarial Network With Heatmap-Guided Semantic Disentanglement
Authors:
Han Yan;Haijun Zhang;Jianyang Shi;Jianghong Ma;
Pages: 2381 - 2395 Abstract: Automatically accomplishing intelligent fashion design with certain ‘inspiration’ images can greatly facilitate a designer’s design process, as well as allow users to interactively participate in the process. In this research, we propose a generative adversarial network with heatmap-guided semantic disentanglement (HSD-GAN) to perform an ‘intelligent’ design with ‘inspiration’ transfer. Our model aims to learn how to integrate the feature representations, from the styles of both source fashion items and target fashion items, in an unsupervised manner. Specifically, a semantic disentanglement attention-based encoder is proposed to capture the most discriminative regions of different input fashion items and disentangle the features into two key factors: attribute and texture. A generator is then developed to synthesize mixed-style fashion items by utilizing the two factors. In addition, a heatmap-based patch loss is introduced to evaluate the visual-semantic matching degree between the texture of the generated fashion items and the input texture information. Extensive experimental results show that our proposed HSD-GAN consistently achieves superior performance, compared to other state-of-the-art methods. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
- Deep Intra Prediction by Jointly Exploiting Local and Non-Local Similarities
Authors:
Meng Lei;Jiaqi Zhang;Shiqi Wang;Shanshe Wang;Siwei Ma;
Pages: 2396 - 2409 Abstract: Intra prediction, which aims to remove the redundancies within a frame, has shown promising performance by simply projecting and interpolating samples along multiple angular directions. Recently, with numerous approaches devoted to learning nonlinear predictors with deep neural networks (DNN) based on local correlations, much less work has been dedicated to exploring non-local self-similarities in intra prediction. In this paper, we propose a unified prediction model that exploits both local and non-local correlations for intra prediction. The proposed model not only supports the nonlinear prediction using local reference samples as input, but also aggregates useful non-local information from a large reconstructed region with a Patch-level Non-local Attention Network (PNA-Net). More specifically, PNA-Net incorporates template matching with attention mechanism in feature domain to obtain the responses of all non-local features to the content to be predicted, leading to the prediction produced with weighted non-local patches. Finally, the predictions in the local and non-local manners are blended adaptively with a trainable network, ensuring the capability to handle a variety of contents. Experimental results on Versatile Video Coding (VVC) software VTM-11.0 show that the proposed model achieves on average 4.69% bit rate savings for natural scene sequences, and 4.24% bit rate savings for screen content sequences under the all intra configuration. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
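The patch-level non-local attention described above can be read as an attention-weighted aggregation of candidate patches from the reconstructed region. The sketch below shows that generic mechanism with scaled dot-product attention, under the assumption that patch features have already been extracted; it is not the PNA-Net architecture itself.

# Generic patch-level non-local attention sketch (assumed form, not the paper's network).
import torch
import torch.nn.functional as F

def nonlocal_patch_prediction(query_feat, ref_feats, ref_patches):
    # query_feat:  (B, D)      feature of the template around the block to predict
    # ref_feats:   (B, N, D)   features of N candidate patches in the reconstructed region
    # ref_patches: (B, N, P)   the corresponding pixel patches (P = flattened patch size)
    scores = torch.einsum('bd,bnd->bn', query_feat, ref_feats)   # similarity responses
    weights = F.softmax(scores / query_feat.shape[-1] ** 0.5, dim=-1)
    return torch.einsum('bn,bnp->bp', weights, ref_patches)      # weighted non-local prediction

# Toy usage: 1 block, 16 candidate patches, 32-dim features, 8x8 patches.
pred = nonlocal_patch_prediction(torch.rand(1, 32), torch.rand(1, 16, 32), torch.rand(1, 16, 64))
print(pred.shape)  # torch.Size([1, 64])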
- Advancing Learned Video Compression With In-Loop Frame Prediction
-
Authors:
Ren Yang;Radu Timofte;Luc Van Gool;
Pages: 2410 - 2423 Abstract: Recent years have witnessed an increasing interest in end-to-end learned video compression. Most previous works exploit temporal redundancy by detecting and compressing a motion map to warp the reference frame towards the target frame. Yet, they fail to adequately take advantage of the historical priors in the sequential reference frames. In this paper, we propose an Advanced Learned Video Compression (ALVC) approach with an in-loop frame prediction module, which is able to effectively predict the target frame from the previously compressed frames without consuming any bit-rate. The predicted frame can serve as a better reference than the previously compressed frame, and therefore benefits the compression performance. The proposed in-loop prediction module is a part of the end-to-end video compression pipeline and is jointly optimized in the whole framework. We propose recurrent and bi-directional in-loop prediction modules for compressing P-frames and B-frames, respectively. The experiments show the state-of-the-art performance of our ALVC approach in learned video compression. We also outperform the default hierarchical B mode of x265 in terms of PSNR and beat the slowest mode of the SSIM-tuned x265 on MS-SSIM. The project page: https://github.com/RenYang-home/ALVC. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
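At a high level, the in-loop prediction idea is to synthesize a reference from already-decoded frames at no bit cost and then code the target against it. A schematic sketch is below, with predict_frame and codec as placeholder callables; the actual ALVC modules are the recurrent and bi-directional networks described in the paper.

# Schematic in-loop frame prediction loop (placeholders, not the ALVC implementation).
from typing import Callable, List

def encode_sequence(frames: List, predict_frame: Callable, codec: Callable):
    decoded = [frames[0]]              # assume the first frame is intra-coded elsewhere
    bitstream = []
    for target in frames[1:]:
        # Predict the target from previously *decoded* frames, consuming no bits.
        reference = predict_frame(decoded)
        # Code the target against the predicted reference (motion + residual in ALVC).
        bits, reconstruction = codec(target, reference)
        bitstream.append(bits)
        decoded.append(reconstruction)  # closed loop: reuse the reconstruction, not the original
    return bitstream, decoded

# Toy usage with trivial placeholders.
frames = [0.0, 1.0, 2.0, 3.0]
bits, rec = encode_sequence(frames,
                            predict_frame=lambda d: d[-1],        # repeat last decoded frame
                            codec=lambda t, r: (abs(t - r), t))   # "bits" = residual magnitude
print(bits, rec)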
- Rate-Distortion Modeling for Bit Rate Constrained Point Cloud Compression
-
Authors:
Pan Gao;Shengzhou Luo;Manoranjan Paul;
Pages: 2424 - 2438 Abstract: As one of the main representation formats of the 3D real world, well suited to virtual reality and augmented reality applications, point clouds have gained considerable popularity. To reduce their huge data volume, a considerable amount of research on point cloud compression has been done. However, given a target bit rate, how to properly choose the color and geometry quantization parameters for compressing point clouds is still an open issue. In this paper, we propose a rate-distortion model based quantization parameter selection scheme for bit rate constrained point cloud compression. Firstly, to overcome the measurement uncertainty in evaluating the distortion of point clouds, we propose a unified model that combines the geometry distortion and color distortion. In this model, we take into account the correlation between the geometry and color variables of point clouds and derive a dimensionless quantity to represent the overall quality degradation. Then, we derive the relationships of the overall distortion and bit rate with the quantization parameters. Finally, we formulate bit rate constrained point cloud compression as a constrained minimization problem using the derived polynomial models and deduce the solution via an iterative numerical method. Experimental results show that the proposed algorithm can achieve optimal decoded point cloud quality at various target bit rates and substantially outperforms the video-rate-distortion model based point cloud compression scheme. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
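A minimal numerical sketch of the kind of constrained QP selection the abstract describes: polynomial rate and distortion models in the geometry and color quantization parameters, minimized under a bit-rate budget. The model forms and coefficients below are placeholders, and exhaustive grid search stands in for the paper's iterative numerical method.

# Hypothetical bit-rate-constrained QP selection via grid search over placeholder polynomial models.
import itertools

def rate(qg, qc):        # bits decrease as the quantization parameters grow (made-up coefficients)
    return 5000.0 / (qg ** 1.2) + 8000.0 / (qc ** 1.1)

def distortion(qg, qc):  # unified geometry+color degradation grows with the QPs (made-up coefficients)
    return 0.04 * qg ** 1.5 + 0.02 * qc ** 1.6

def select_qps(rate_budget, qg_range=range(10, 52), qc_range=range(10, 52)):
    best = None
    for qg, qc in itertools.product(qg_range, qc_range):
        if rate(qg, qc) <= rate_budget:                   # feasibility: stay within the target bit rate
            if best is None or distortion(qg, qc) < distortion(*best):
                best = (qg, qc)
    return best

print(select_qps(rate_budget=900.0))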
- NOMA-Based Uncoded Video Transmission With Optimization of Joint Resource
Allocation-
Authors:
Chaofan He;Shuyuan Zhu;Bing Zeng;
Pages: 2439 - 2450 Abstract: The non-orthogonal multiple access (NOMA) technique has demonstrated potential for the multicast of multiple videos. However, it simply multiplexes a limited number of signals in each single channel and cannot satisfy the different video quality requirements of multiple users. To resolve this problem, we construct a NOMA-based uncoded multi-user video transmission (NOMA-UMVT) system in which the allocation of power and channel resources to all users is jointly optimized to guarantee high video quality. Specifically, we first perform multi-user power allocation by decomposing it into inter-channel and intra-channel allocation sub-problems. After solving these sub-problems for power allocation, we assign channels with the proposed two-stage channel assignment algorithm. The simulation results demonstrate the superior performance of our proposed NOMA-UMVT system when it is applied to transmit videos to multiple users. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
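As background to the power-allocation problem above, the sketch below shows the textbook two-user power-domain NOMA superposition within a single channel, which is the kind of signal the intra-channel allocation sub-problem operates on. The split factor alpha and the toy signals are illustrative; they are not the optimized allocation derived in the paper.

# Textbook two-user power-domain NOMA superposition (illustrative, not the NOMA-UMVT solution).
import numpy as np

def superpose(strong_user_signal, weak_user_signal, total_power, alpha=0.2):
    # alpha: fraction of power given to the strong (good-channel) user; the weak user
    # gets the larger share so it can decode its signal directly, while the strong
    # user removes the weak user's signal via successive interference cancellation.
    return (np.sqrt(alpha * total_power) * strong_user_signal
            + np.sqrt((1 - alpha) * total_power) * weak_user_signal)

x = superpose(np.array([1.0, -1.0]), np.array([-1.0, -1.0]), total_power=1.0)
print(x)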
- Multi-Level Cascade Sparse Representation Learning for Small Data
Classification-
Authors:
Wenyuan Zhong;Huaxiong Li;Qinghua Hu;Yang Gao;Chunlin Chen;
Pages: 2451 - 2464 Abstract: Deep learning (DL) methods have recently captured much attention for image classification. However, such methods may lead to a suboptimal solution for small-scale data owing to the lack of training samples. Sparse representation stands out with its efficiency and interpretability, but its precision is not as competitive. We develop a Multi-Level Cascade Sparse Representation (ML-CSR) learning method to combine both advantages when processing small-scale data. ML-CSR uses a pyramid structure to expand the training data size. It adopts two core modules, the Error-To-Feature (ETF) module and the Generate-Adaptive-Weight (GAW) module, to further improve the precision. ML-CSR calculates inter-layer differences in the ETF module to increase the diversity of samples and obtains adaptive weights based on the layer accuracy in the GAW module. This helps ML-CSR learn more discriminative features. State-of-the-art results on benchmark face databases validate the effectiveness of the proposed ML-CSR. Ablation experiments demonstrate that the proposed pyramid structure, ETF, and GAW modules improve the performance of ML-CSR. The code is available at https://github.com/Zhongwenyuan98/ML-CSR. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
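One way to read the cascade described above is as a sequence of sparse-coding levels in which the reconstruction residual of one level (the Error-To-Feature idea) feeds the next, with per-level weights tied to how much each level explains (a stand-in for the accuracy-based weights of the GAW module). The sketch below follows that reading with scikit-learn's orthogonal matching pursuit; all structural choices are assumptions, not the authors' ML-CSR code.

# Hypothetical multi-level cascade sparse-coding sketch (assumed structure, not ML-CSR itself).
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def cascade_sparse_code(x, dictionaries, n_nonzero=5):
    # dictionaries: list of (dim, n_atoms) arrays, one per cascade level.
    residual, codes, weights = x.copy(), [], []
    for D in dictionaries:
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero).fit(D, residual)
        approx = D @ omp.coef_                       # reconstruction at this level
        codes.append(omp.coef_)
        weights.append(np.linalg.norm(approx) / (np.linalg.norm(x) + 1e-8))
        residual = residual - approx                 # pass the error on as the next level's input
    return codes, np.array(weights)

rng = np.random.default_rng(0)
codes, w = cascade_sparse_code(rng.normal(size=64),
                               [rng.normal(size=(64, 128)) for _ in range(3)])
print([c.shape for c in codes], w)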
- Image-Text Retrieval With Cross-Modal Semantic Importance Consistency
-
Authors:
Zejun Liu;Fanglin Chen;Jun Xu;Wenjie Pei;Guangming Lu;
Pages: 2465 - 2476 Abstract: Cross-modal image-text retrieval is an important Vision-and-Language task that models the similarity of image-text pairs by embedding features into a shared space for alignment. To bridge the heterogeneous gap between the two modalities, current approaches achieve inter-modal alignment and intra-modal semantic relationship modeling through complex weighted combinations between items. In both the intra-modal association and inter-modal interaction processes, higher-weight items contribute more to the global semantics. However, the same item often produces different contributions in the two processes, since most traditional approaches focus only on alignment. This usually results in semantic changes and misalignment. To address this issue, this paper proposes Cross-modal Semantic Importance Consistency (CSIC), which keeps the semantic importance of items invariant during alignment. The proposed technique measures the semantic importance of items obtained from intra-modal and inter-modal self-attention and learns a more reasonable representation vector by inter-calibrating the importance distributions to improve performance. We conducted extensive experiments on the Flickr30K and MS COCO datasets. The results show that our approach significantly improves retrieval performance, demonstrating its superiority and rationality. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
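A minimal sketch of the consistency idea: treat the intra-modal and inter-modal attention weights over the same set of items as two importance distributions and penalize their divergence. The symmetric-KL form and the names below are assumptions for illustration, not the paper's exact calibration loss.

# Assumed form of an importance-consistency penalty between two attention distributions.
import torch
import torch.nn.functional as F

def importance_consistency_loss(intra_attn_logits, inter_attn_logits):
    # Both inputs: (B, N) unnormalized importance scores over the same N items
    # (e.g., image regions or words), one from intra-modal self-attention and one
    # from inter-modal (cross) attention. Penalize disagreement with a symmetric KL.
    p = F.log_softmax(intra_attn_logits, dim=-1)
    q = F.log_softmax(inter_attn_logits, dim=-1)
    kl_pq = F.kl_div(q, p, log_target=True, reduction='batchmean')
    kl_qp = F.kl_div(p, q, log_target=True, reduction='batchmean')
    return 0.5 * (kl_pq + kl_qp)

print(importance_consistency_loss(torch.randn(4, 36), torch.randn(4, 36)).item())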
- Video File Allocation for Wear-Leveling in Distributed Storage Systems
With Heterogeneous Solid-State-Disks (SSDs)-
Authors:
Dayoung Lee;Joonho Lee;Minseok Song;
Pages: 2477 - 2490 Abstract: With the advent of new large-capacity solid-state disks (SSDs) such as quad-level cell (QLC) drives, SSD arrays can be effectively used in video storage systems that require large-capacity storage space. Typically, SSD manufacturers specify a drive-writes-per-day (DWPD) metric, the ratio of bytes written per day to the total capacity in bytes, to ensure an SSD’s specified lifetime; it is therefore important to limit the number of write operations by considering the DWPD of each SSD. We propose a new video file allocation technique to effectively manage the heterogeneous DWPD characteristics of SSDs in distributed storage systems. To express the degree of wear-leveling for heterogeneous SSDs, we first introduce the concept of ADWD, the actual number of bytes written per day relative to the DWPD. We then propose two algorithms for file placement and migration. The file placement algorithm places files greedily based on the bandwidth-to-space ratio (BSR) of each file and SSD to balance the bandwidth usage and storage of the SSDs. The file migration algorithm moves files from overloaded to underloaded SSDs to meet bandwidth requirements while minimizing the overall ADWD resulting from migration, and then migrates additional popular files to improve SSD bandwidth utilization. To use these algorithms in actual distributed file systems, we implemented a suite of tools for file placement and migration in the Hadoop distributed file system (HDFS). Experimental results show that the proposed algorithm reduces the mean ADWD by 35.44% and its standard deviation by 69.78% on average compared to the benchmark methods. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
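The placement rule lends itself to a compact sketch: place each file on a feasible SSD while keeping its projected daily writes low relative to the drive's DWPD budget. Everything below (the field names, the ADWD proxy, the greedy ordering by bandwidth-to-space ratio) is an illustrative reading of the abstract, not the tool suite the authors implemented for HDFS.

# Illustrative greedy BSR-based file placement with a DWPD-style write budget (not the authors' tool).
from dataclasses import dataclass
from typing import List

@dataclass
class Ssd:
    capacity_gb: float
    bandwidth_mbps: float
    dwpd: float                     # allowed drive writes per day
    used_gb: float = 0.0
    used_bw: float = 0.0
    written_gb_per_day: float = 0.0

    def adwd(self):                 # actual writes per day relative to the DWPD budget
        return (self.written_gb_per_day / self.capacity_gb) / self.dwpd

@dataclass
class VideoFile:
    size_gb: float
    bitrate_mbps: float
    daily_write_gb: float           # expected re-writes per day (e.g., updates, transcodes)

def place(files: List[VideoFile], ssds: List[Ssd]):
    placement = []
    # Greedy order by bandwidth-to-space ratio (BSR) of each file.
    for f in sorted(files, key=lambda f: f.bitrate_mbps / f.size_gb, reverse=True):
        feasible = [s for s in ssds
                    if s.used_gb + f.size_gb <= s.capacity_gb
                    and s.used_bw + f.bitrate_mbps <= s.bandwidth_mbps]
        # Pick the feasible SSD that stays lowest on ADWD after accepting the file.
        best = min(feasible, key=lambda s: (s.written_gb_per_day + f.daily_write_gb)
                                           / (s.capacity_gb * s.dwpd))
        best.used_gb += f.size_gb
        best.used_bw += f.bitrate_mbps
        best.written_gb_per_day += f.daily_write_gb
        placement.append((f, best))
    return placement

ssds = [Ssd(4000, 2000, 0.3), Ssd(8000, 1500, 0.1)]
files = [VideoFile(50, 25, 10), VideoFile(120, 8, 5), VideoFile(20, 40, 2)]
print([round(s.adwd(), 4) for _, s in place(files, ssds)])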
- Few-Shot Temporal Sentence Grounding via Memory-Guided Semantic Learning
-
Authors:
Daizong Liu;Pan Zhou;Zichuan Xu;Haozhao Wang;Ruixuan Li;
Pages: 2491 - 2505 Abstract: Temporal sentence grounding (TSG) is an important yet challenging task in video-based information retrieval. Given an untrimmed video input, it requires the machine to predict the video segment semantically related to a given sentence query. Most existing TSG methods train well-designed deep networks to align the semantics of video-query pairs for activity grounding with a large amount of data. However, we argue that these works easily capture the selection biases of video-query pairs in a dataset rather than exhibiting robust reasoning abilities for rarely appearing pairs (i.e., few-shot contents). To alleviate this limitation caused by the imbalanced data distribution during network training, in this paper, we propose a novel memory-augmented network called the Memory-Guided Semantic Learning Network (MGSL-Net) to handle the few-shot TSG task and enhance the model’s generalization ability. Specifically, given the matched video-query input, we first employ a graph attentive cross-modal interaction module to align their semantics in a cycle-consistent manner. Then, we develop memory modules in both the video and query domains to record the cross-modal shared semantic features in domain-specific persistent memory. Finally, a heterogeneous attention module is utilized to integrate the memory-enhanced multi-modal features in both domains with further feature calibration. During training, the memory modules are dynamically associated with both common and rare cases to memorize all appeared contents, alleviating the issue of forgetting the few-shot contents. Therefore, in testing, rare cases can be enhanced by retrieving the stored memories, improving the generalization ability of the model. Experimental results on three benchmarks (ActivityNet Caption, Charades-STA and TACoS) show the superiority of our method in both effectiveness and efficiency. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
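The memory modules described above can be pictured as an attention read over a persistent bank of shared semantic slots, which lets rare test-time inputs be enhanced by what was memorized during training. The sketch below shows that generic read step, with the slot count and feature size chosen arbitrarily; it is not the MGSL-Net module.

# Generic attention-based memory read (assumed mechanism, not the MGSL-Net implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryRead(nn.Module):
    def __init__(self, num_slots=128, dim=256):
        super().__init__()
        # Persistent memory bank of shared cross-modal semantic features.
        self.slots = nn.Parameter(torch.randn(num_slots, dim) * 0.02)

    def forward(self, features):
        # features: (B, T, dim) video or query features.
        attn = F.softmax(features @ self.slots.t() / features.shape[-1] ** 0.5, dim=-1)
        retrieved = attn @ self.slots           # (B, T, dim) memory-enhanced features
        return features + retrieved             # residual enhancement, useful for rare cases

mem = MemoryRead()
print(mem(torch.randn(2, 20, 256)).shape)       # torch.Size([2, 20, 256])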
- Image Encryption via Complementary Embedding Algorithm and New
Spatiotemporal Chaotic System-
Authors:
Pengbo Liu;Xingyuan Wang;Yining Su;
Pages: 2506 - 2519 Abstract: Although image encryption is developing rapidly, existing schemes are rarely tailored to specific scenarios. This paper proposes a complementary embedding encryption strategy. The strategy first identifies the airport area, then replaces the optimal similar area and embeds the airport image into a random position through the complementary embedding algorithm. In the encryption stage, to generate a better key stream, we propose an improved sine cross coupled mapping lattice (ISCCML). Comprehensive performance analysis shows that ISCCML has a larger parameter space and better chaotic cryptographic properties. Furthermore, a fractal disordered matrix (FDM) with iterative and out-of-order properties is presented for the simultaneous scrambling and diffusion of images. In particular, the encryption algorithm generalizes well and is also suitable for ordinary image encryption. Our results indicate that the proposed scheme can avoid repeated encryption while ensuring the security of important information; at the same time, the security analysis shows that our algorithm is secure, practical and scalable. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
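The improved sine cross coupled mapping lattice itself is defined in the paper; as a point of reference, the sketch below iterates a standard sine-map coupled map lattice with nearest-neighbour coupling, which is the family of spatiotemporal chaotic systems that ISCCML extends. The coupling form, parameters and byte quantization here are the generic textbook ones, not the authors' improved map.

# Generic sine-map coupled map lattice producing a key-stream-like state (not the ISCCML equations).
import numpy as np

def sine_map(x, mu=0.99):
    return mu * np.sin(np.pi * x)

def coupled_map_lattice(n_sites=64, n_steps=1000, eps=0.3, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.random(n_sites)
    for _ in range(n_steps):
        f = sine_map(x)
        # Nearest-neighbour coupling with periodic boundary conditions.
        x = (1 - eps) * f + 0.5 * eps * (np.roll(f, 1) + np.roll(f, -1))
    return x

state = coupled_map_lattice()
keystream = (state * 256).astype(np.uint8)   # quantize the lattice state into byte values
print(keystream[:8])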
- EIFNet: An Explicit and Implicit Feature Fusion Network for Finger Vein
Verification-
Authors:
Yizhuo Song;Pengyang Zhao;Wenming Yang;Qingmin Liao;Jie Zhou;
Pages: 2520 - 2532 Abstract: Finger vein recognition has received increasing attention in recent years due to its high security and promising development potential. However, extracting complete vein patterns and obtaining features from the original images is hampered by the low contrast of finger vein images, which severely restrains the performance of finger vein recognition algorithms. Motivated by this, we propose an explicit and implicit feature fusion network (EIFNet) for finger vein verification. It extracts more comprehensive and discriminative features by complementarily fusing the features extracted from binary vein masks and grayscale original images. We design a feature fusion module (FFM) acting as a bridge between the mask feature extraction module (MFEM) and the contextual feature extraction module (CFEM) to achieve the optimal fusion of features. To obtain more accurate vein masks, we develop a novel finger vein pattern extraction method and provide the first finger vein segmentation dataset, THUFVS. We address the difficulty of building finger vein segmentation datasets in a simple but effective way, and develop a complete process, referred to as the Mask Generation Module (MGM), encompassing dataset creation, data augmentation refinement and network design for the deep learning based finger vein pattern extraction method. Experimental results demonstrate the superior verification performance of EIFNet on three widely used datasets compared with other existing methods. PubDate:
May 2023
Issue No: Vol. 33, No. 5 (2023)
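The feature fusion module is described above only as a bridge between the mask (explicit) branch and the contextual (implicit) branch; a plausible minimal form is channel concatenation followed by a learned per-pixel gate, sketched below. This is an assumption for illustration, not the EIFNet FFM.

# Assumed minimal feature fusion module: concatenate mask and contextual features, then gate.
import torch
import torch.nn as nn

class SimpleFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, kernel_size=1), nn.Sigmoid())

    def forward(self, mask_feat, ctx_feat):
        # mask_feat: features from the binary vein mask branch (explicit features)
        # ctx_feat:  features from the grayscale image branch (implicit features)
        mixed = self.mix(torch.cat([mask_feat, ctx_feat], dim=1))
        g = self.gate(mixed)                     # per-pixel weighting of the two branches
        return g * mask_feat + (1 - g) * ctx_feat + mixed

fuse = SimpleFusion()
print(fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)).shape)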