Adaptive Educational Hypermedia (AEH) e-learning models aim to personalize educational content and learning resources to the needs of an individual learner. The Adaptive Hypermedia Architecture (AHA) is a specific implementation of the AEH model that exploits the cognitive characteristics of learner feedback to adapt resources accordingly. However, besides cognitive feedback, the learning realm generally includes the learner's affective and emotional feedback as well, which is often neglected in the design of e-learning models. This article explores the potential of applying affect and emotion recognition research to AEH models. The proposed framework is referred to as Multiple Kernel Learning Decision Tree Weighted Kernel Alignment (MKLDT-WFA). PubDate: Thu, 04 Jan 2018 00:00:00 GMT
In this article, we address the cross-domain (i.e., street and shop) clothing retrieval problem and investigate its real-world applications for online clothing shopping. It is a challenging problem due to the large discrepancy between street and shop domain images. We focus on learning an effective feature-embedding model to generate robust and discriminative feature representations across domains. Existing triplet embedding models achieve promising results by finding an embedding metric in which the distance between negative pairs is larger than the distance between positive pairs plus a margin. However, existing methods do not sufficiently address the challenges of the cross-domain clothing retrieval scenario. PubDate: Thu, 04 Jan 2018 00:00:00 GMT
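The margin condition described above can be sketched as a hinge-style triplet loss. The embedding vectors below are illustrative placeholders, not the paper's learned features:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss: pushes the negative pair to be farther
    apart than the positive pair by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# A street-photo embedding (anchor), its matching shop photo (positive),
# and a non-matching shop photo (negative); values are made up.
a = np.array([1.0, 0.0])
p = np.array([1.1, 0.1])   # close to the anchor
n = np.array([0.0, 1.0])   # far from the anchor
loss = triplet_loss(a, p, n)
```

When the margin constraint is already satisfied, as here, the hinge clips the loss to zero and the triplet contributes no gradient.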
Abstract: Sicong Liu, Silvestro Roberto Poccia, K. Selçuk Candan, Maria Luisa Sapino, Xiaolan Wang
Many applications generate and/or consume multi-variate temporal data, and experts often lack the means to adequately and systematically search for and interpret multi-variate observations. In this article, we first observe that multi-variate time series often carry localized multi-variate temporal features that are robust against noise. We then argue that these multi-variate temporal features can be extracted by simultaneously considering, at multiple scales, temporal characteristics of the time series along with external knowledge, including variate relationships that are known a priori. Relying on these observations, we develop data models and algorithms to detect robust multi-variate temporal (RMT) features that can be indexed for efficient and accurate retrieval and can be used for supporting data exploration and analysis tasks. PubDate: Thu, 04 Jan 2018 00:00:00 GMT
This article tackles the problem of joint estimation of human age and facial expression. This is an important yet challenging problem because expressions can alter face appearances in a manner similar to human aging. Different from previous approaches that deal with the two tasks independently, our approach trains a convolutional neural network (CNN) model that unifies ordinal regression and multi-class classification in a single framework. We demonstrate experimentally that our method compares favorably against state-of-the-art approaches. PubDate: Thu, 04 Jan 2018 00:00:00 GMT
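A minimal sketch of how ordinal regression (age) and multi-class classification (expression) can share one objective. The cumulative "is age > k?" threshold encoding and the loss weighting are common conventions assumed here, not the paper's exact formulation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def joint_loss(age_logits, age, expr_logits, expr, weight=1.0):
    """Combined loss: ordinal regression on age as K-1 cumulative binary
    tasks ("is age > k?"), plus cross-entropy on the expression class."""
    # Ordinal part: binary cross-entropy over the K-1 threshold tasks.
    targets = (np.arange(len(age_logits)) < age).astype(float)
    probs = 1.0 / (1.0 + np.exp(-age_logits))
    ordinal = -np.mean(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))
    # Classification part: cross-entropy over expression classes.
    ce = -np.log(softmax(expr_logits)[expr])
    return ordinal + weight * ce
```

In a CNN both heads would sit on a shared backbone, so one feature representation serves both tasks.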
Abstract: Zhaoqing Pan, Jianjun Lei, Yajuan Zhang, Fu Lee Wang
High-Efficiency Video Coding (HEVC) efficiently addresses the storage and transmission problems of high-definition videos, especially 4K videos. Variable-size Prediction Unit (PU)-based Motion Estimation (ME) contributes significantly to the compression rate of the HEVC encoder but also generates a huge computational load. This high encoding complexity prevents widespread adoption of the HEVC encoder in multimedia systems. In this article, an adaptive fractional-pixel ME skipping scheme is proposed for low-complexity HEVC ME. First, based on the properties of the variable-size PU-based ME process and the video content partition relationships among variable-size PUs, all inter-PU modes during a coding unit encoding process are classified into a root-type PU mode and children-type PU modes. PubDate: Thu, 04 Jan 2018 00:00:00 GMT
Abstract: Weiwei Sun, Jiantao Zhou, Shuyuan Zhu, Yuan Yan Tang
Sharing images online has become extremely easy and popular due to the ever-increasing adoption of mobile devices and online social networks (OSNs). The privacy issues arising from image sharing over OSNs have received significant attention in recent years. In this article, we consider the problem of designing a secure, robust, high-fidelity, storage-efficient image-sharing scheme over Facebook, a representative and widely accessed OSN. To accomplish this goal, we first conduct an in-depth investigation of the manipulations that Facebook performs on uploaded images. Assisted by such knowledge, we propose a DCT-domain image encryption/decryption framework that is robust against these lossy operations. PubDate: Thu, 04 Jan 2018 00:00:00 GMT
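As a toy illustration of DCT-domain processing (not the paper's actual scheme), keyed sign-flipping of DCT coefficients is involutive: applying it twice with the same key recovers the block exactly, since the orthonormal DCT is invertible and the sign mask squares to the identity.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II transform matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0] /= np.sqrt(2)
    return C

def encrypt(block, key):
    """Toy DCT-domain scrambling: transform an NxN block, flip the signs of
    a key-determined subset of coefficients, and transform back."""
    C = dct_matrix(block.shape[0])
    coeffs = C @ block @ C.T
    signs = np.where(np.random.default_rng(key).random(coeffs.shape) < 0.5, -1, 1)
    return C.T @ (coeffs * signs) @ C

decrypt = encrypt  # sign-flipping with the same key is its own inverse
```

The intuition for working in the DCT domain is that JPEG-style recompression also operates on DCT coefficients, so a scheme defined there can be made robust to those lossy operations.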
Learning-based hashing has been researched extensively in the past few years due to its great potential for fast and accurate similarity search among huge volumes of multimedia data. In this article, we present a novel multimedia hashing framework, called Label Preserving Multimedia Hashing (LPMH), for multimedia similarity search. In LPMH, a general optimization method is used to learn the joint binary codes of multiple media types by explicitly preserving semantic label information. Compared with existing hashing methods, which are typically developed under, and thus restricted to, specific objective functions, the proposed optimization strategy is not tied to any specific loss function and can easily incorporate bit-balance constraints to produce well-balanced binary codes. PubDate: Wed, 20 Dec 2017 00:00:00 GMT
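The bit-balance constraint mentioned above asks that each bit split the collection evenly, so every bit carries maximal information. A minimal illustration, using per-dimension median thresholding as an assumed stand-in for the learned projections:

```python
import numpy as np

def balanced_binary_codes(features):
    """Thresholds each feature dimension at its median, so each resulting
    bit is 0 for half the items and 1 for the other half (the bit-balance
    property), assuming continuous-valued features."""
    medians = np.median(features, axis=0)
    return (features > medians).astype(np.uint8)
```

With an even number of distinct continuous values per column, exactly half the items fall above the median, so every bit is perfectly balanced.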
Abstract: Rodrigo Ceballos, Beatrice Ionascu, Wanjoo Park, Mohamad Eid
Today, ubiquitous digital communication systems lack an intuitive, natural way of communicating emotion, which, in turn, limits the degree to which humans can emotionally connect and interact with one another. To address this problem, a more natural, intuitive, and implicit emotion communication system was designed that employs asymmetry-based EEG emotion classification to detect the emotional state of the sender and haptic feedback (in the form of tactile gestures) to display emotions to the receiver. Emotions are modeled in terms of valence (positive/negative emotions) and arousal (intensity of the emotion). Performance analysis shows that the proposed EEG subject-dependent emotion classification model with Free Asymmetry features allows for more flexible feature-generation schemes than other existing algorithms and attains an average accuracy of 92.5% for valence and 96.5% for arousal, outperforming previous-generation schemes in high feature space. PubDate: Wed, 20 Dec 2017 00:00:00 GMT
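EEG asymmetry features of the kind referenced here are conventionally computed as differences of log band power across a left/right electrode pair (e.g., F3/F4). The sketch below simplifies the band power estimate to mean squared amplitude and leaves electrode choice to the caller, both assumptions for illustration:

```python
import numpy as np

def asymmetry_feature(left, right):
    """Classic EEG asymmetry feature: difference of log powers between a
    left and a right electrode signal, commonly fed to a valence classifier."""
    power = lambda x: np.mean(np.asarray(x, dtype=float) ** 2)
    return np.log(power(left)) - np.log(power(right))
```

A positive value indicates greater left-hemisphere power for the chosen pair; a full pipeline would band-pass filter each signal first.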
Abstract: Dan Guo, Wengang Zhou, Houqiang Li, Meng Wang
In sign language recognition (SLR) with multimodal data, a sign word can be represented by multiple features, among which there exist intrinsic properties and mutually complementary relationships. To fully explore those relationships, we propose an online early-late fusion method based on an adaptive Hidden Markov Model (HMM). In terms of the intrinsic property, we observe that the inherent latent change states of each sign are related not only to the number of key gestures and body poses but also to their translation relationships. We propose an adaptive HMM method that obtains the hidden state number of each sign by affinity propagation clustering. PubDate: Wed, 20 Dec 2017 00:00:00 GMT
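Affinity propagation is a natural fit here because it chooses the number of exemplars automatically, so each sign can get its own hidden state count. Below is a minimal, self-contained sketch of the standard responsibility/availability message passing (not the authors' code); the similarity matrix and preference values are supplied by the caller:

```python
import numpy as np

def affinity_propagation_k(S, damping=0.5, iters=200):
    """Minimal affinity propagation: returns the number of exemplars found
    for similarity matrix S (diagonal = preferences), used here as the
    hidden state number of a sign's HMM."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities
    A = np.zeros((n, n))  # availabilities
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        col = Rp.sum(axis=0)
        A_new = np.minimum(0, col[None, :] - Rp)
        np.fill_diagonal(A_new, col - R.diagonal())
        A = damping * A + (1 - damping) * A_new
    # Exemplars are points whose self responsibility + availability is positive.
    return int((np.diag(A + R) > 0).sum())
```

Lower diagonal preferences yield fewer exemplars (hence fewer hidden states), so the preference acts as the model's complexity knob.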
Abstract: Jiyan Wu, Bo Cheng, Yuan Yang, Ming Wang, Junliang Chen
Cloud-assisted video streaming has emerged as a new paradigm to optimize multimedia content distribution over the Internet. This article investigates the problem of streaming cloud-assisted real-time video to multiple destinations (e.g., cloud video conferencing, multi-player cloud gaming, etc.) over lossy communication networks. User diversity and network dynamics result in delay differences among the destinations. This research proposes the Differentiated cloud-Assisted VIdeo Streaming (DAVIS) framework, which proactively leverages these delay differences in video coding and transmission optimization. First, we analytically formulate the joint coding and transmission optimization problem to maximize received video quality. Second, we develop a quality optimization framework that integrates video representation selection and FEC (Forward Error Correction) packet interleaving. PubDate: Wed, 13 Dec 2017 00:00:00 GMT
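Packet interleaving of the kind integrated above spreads a burst of consecutive losses across FEC blocks so each block sees at most a few losses it can correct. A minimal block interleaver sketch (the depth and framing are illustrative, not DAVIS parameters):

```python
def interleave(packets, depth):
    """Block interleaver: write packets row by row into `depth` rows, then
    read column by column. A burst of consecutive on-the-wire losses is
    thereby spread across different FEC blocks (rows)."""
    width = (len(packets) + depth - 1) // depth
    rows = [packets[i * width:(i + 1) * width] for i in range(depth)]
    out = []
    for c in range(width):
        for r in rows:
            if c < len(r):
                out.append(r[c])
    return out
```

For example, with depth 2 the wire order alternates between the two FEC blocks, so losing two adjacent packets costs each block only one packet.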
Content-based image retrieval (CBIR) is one of the most important applications of computer vision. In recent years, there have been many important advances in the development of CBIR systems, especially Convolutional Neural Networks (CNNs) and other deep-learning techniques. On the other hand, current CNN-based CBIR systems suffer from the high computational complexity of CNNs. This problem becomes more severe as mobile applications grow more popular. The current practice is to deploy the entire CBIR system on the server side, while the client side serves only as an image provider. This architecture increases the computational burden on the server side, which must process thousands of requests per second. PubDate: Wed, 13 Dec 2017 00:00:00 GMT
Abstract: Shao Huang, Weiqiang Wang, Shengfeng He, Rynson W. H. Lau
Egocentric videos, which mainly record the activities carried out by the users of wearable cameras, have drawn much research attention in recent years. Due to their lengthy content, a large number of ego-related applications have been developed to abstract the captured videos. As users are accustomed to interacting with target objects using their own hands, and their hands usually appear within their visual fields during the interaction, an egocentric hand detection step is involved in tasks such as gesture recognition, action recognition, and social interaction understanding. In this work, we propose a dynamic region-growing approach for hand region detection in egocentric videos that jointly considers hand-related motion and egocentric cues. PubDate: Wed, 13 Dec 2017 00:00:00 GMT
In this article, we revisit two popular convolutional neural networks in person re-identification (re-ID): verification and identification models. The two models have their respective advantages and limitations due to their different loss functions. Here, we shed light on how to combine the two models to learn more discriminative pedestrian descriptors. Specifically, we propose a Siamese network that simultaneously computes the identification loss and verification loss. Given a pair of training images, the network predicts the identities of the two input images and whether they belong to the same identity. Our network learns a discriminative embedding and a similarity measurement at the same time, thus making full use of the re-ID annotations. PubDate: Wed, 13 Dec 2017 00:00:00 GMT
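The combined objective can be sketched as below: a softmax identification loss on each image's embedding, plus a verification loss on the pair. The verification head here (a logistic prediction from the squared embedding difference with placeholder weights) is an illustrative assumption, not the paper's exact architecture:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def siamese_losses(f1, f2, W_id, id1, id2, same):
    """Combined objective sketch: identification (softmax cross-entropy on
    each embedding) plus verification (binary 'same identity?' loss on the
    squared difference of the two embeddings)."""
    # Identification: classify each embedding into an identity.
    ident = -np.log(softmax(W_id @ f1)[id1]) - np.log(softmax(W_id @ f2)[id2])
    # Verification: similar embeddings should predict 'same'.
    diff = (f1 - f2) ** 2
    w_v = np.ones_like(diff)  # placeholder verification weights
    p_same = 1.0 / (1.0 + np.exp(w_v @ diff))
    verif = -np.log(p_same if same else 1 - p_same)
    return ident + verif
```

Training both losses on the same embedding is what lets the network exploit the annotations twice: as class labels and as pairwise constraints.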