Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.
Real-time screen-sharing provides users with ubiquitous access to remote applications, such as computer games, movie players, and desktop applications (apps), anywhere and anytime. In this article, we study the performance of different screen-sharing technologies, which can be classified as native or clientless. Native technologies require users to install special-purpose software, while clientless ones run directly in web browsers. In particular, we conduct extensive experiments in three steps. First, we identify a suite of the most representative native and clientless screen-sharing technologies. Second, we propose a systematic measurement methodology for comparing screen-sharing technologies under diverse and dynamic network conditions using different performance metrics. PubDate: Fri, 28 May 2021 00:00:00 GMT
Abstract: Xin Yang, Zongliang Ma, Letian Yu, Ying Cao, Baocai Yin, Xiaopeng Wei, Qiang Zhang, Rynson W. H. Lau
In this article, we propose a fully automatic system for generating comic books from videos without any human intervention. Given an input video along with its subtitles, our approach first extracts informative keyframes by analyzing the subtitles and stylizes keyframes into comic-style images. Then, we propose a novel automatic multi-page layout framework that can allocate the images across multiple pages and synthesize visually interesting layouts based on the rich semantics of the images (e.g., importance and inter-image relation). Finally, as opposed to using the same type of balloon as in previous works, we propose an emotion-aware balloon generation method to create different types of word balloons by analyzing the emotion of subtitles and audio. PubDate: Fri, 28 May 2021 00:00:00 GMT
Abstract: Xiangyuan Lan, Zifei Yang, Wei Zhang, Pong C. Yuen
The development of multi-spectrum image sensing technology has brought great interest in exploiting the information of multiple modalities (e.g., RGB and infrared modalities) for solving computer vision problems. In this article, we investigate how to exploit information from RGB and infrared modalities to address two important issues in visual tracking: robustness and object re-detection. Although various algorithms that attempt to exploit multi-modality information in appearance modeling have been developed, they still face challenges that mainly come from the following aspects: (1) the lack of robustness to deal with large appearance changes and dynamic background, (2) failure in re-capturing the object when tracking loss happens, and (3) difficulty in determining the reliability of different modalities. PubDate: Fri, 28 May 2021 00:00:00 GMT
Brushstrokes are viewed as the artist’s “handwriting” in a painting. In many applications such as style learning and transfer, mimicking painting, and painting authentication, it is highly desirable to quantitatively and accurately identify brushstroke characteristics from old masters’ pieces using computer programs. However, because a painting contains hundreds or thousands of intermingling brushstrokes, this remains challenging. This article proposes an efficient algorithm for brush Stroke extraction based on a Deep neural network, i.e., DStroke. Compared to the state-of-the-art research, the main merit of the proposed DStroke is that it automatically and rapidly extracts brushstrokes from a painting without manual annotation, while accurately approximating the real brushstrokes with high reliability. PubDate: Mon, 10 May 2021 00:00:00 GMT
Abstract: Phuong-Anh Nguyen, Chong-Wah Ngo
This article conducts a user evaluation to study the performance difference between interactive and automatic search. In particular, the study aims to provide empirical insights into how the performance landscape of video search changes when tens of thousands of concept detectors are freely available to exploit for query formulation. We compare three types of search modes: free-to-play (i.e., search from scratch), non-free-to-play (i.e., search by inspecting results provided by automatic search), and automatic search including concept-free and concept-based retrieval paradigms. The study involves a total of 40 participants; each performs interactive search over 15 queries of various difficulty levels using two search modes on the IACC.3 dataset provided by TRECVid organizers. PubDate: Mon, 10 May 2021 00:00:00 GMT
Auto-encoders have been widely used to compress high-dimensional data such as images and videos. However, the traditional auto-encoder network needs to store a large number of parameters: when the input data is of dimension n, the number of parameters in an auto-encoder is in general O(n). In this article, we introduce a network structure called 3D Tensor Auto-Encoder (3DTAE). Unlike the traditional auto-encoder, in which a video is represented as a vector, our 3DTAE considers videos as 3D tensors and directly passes tensor objects through the network. The weights of each layer are represented by three small matrices, and thus the number of parameters in 3DTAE is just O(n^(1/3)). PubDate: Mon, 10 May 2021 00:00:00 GMT
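The parameter-count contrast in the abstract can be sketched with a quick back-of-the-envelope computation. This is only an illustration of the O(n) vs. O(n^(1/3)) scaling argument, not the 3DTAE architecture itself; the layer sizes below are hypothetical.

```python
# Rough parameter-count comparison (illustrative, not the paper's exact model).
# Input video flattened to n = d*d*d entries, hidden code of m = h*h*h entries.

def dense_params(n, m):
    # A conventional fully connected encoder layer stores one n x m weight matrix.
    return n * m

def mode_factorized_params(d, h):
    # Weights represented as three small matrices, one per tensor mode (d x h each),
    # so the count grows like O(n^(1/3)) in the input size n = d**3.
    return 3 * d * h

d, h = 64, 8                          # input tensor 64x64x64, hidden tensor 8x8x8
n, m = d**3, h**3
print(dense_params(n, m))             # 134217728 weights for the dense layer
print(mode_factorized_params(d, h))   # 1536 weights for the factorized layer
```

Even for this modest input size, the factorized layer stores roughly five orders of magnitude fewer weights than the dense one.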
Abstract: Abbas Mehrabi, Matti Siekkinen, Teemu Kämäräinen, Antti Ylä-Jääski
The availability of high-bandwidth, low-latency communication in 5G mobile networks enables remotely rendered real-time virtual reality (VR) applications. Remote rendering of VR graphics in a cloud removes the need for a local personal computer for graphics rendering and augments the weak graphics processing unit capacity of stand-alone VR headsets. However, to prevent the added network latency of remote rendering from ruining the user experience, it is necessary to render a locally navigable viewport that is larger than the field of view of the HMD. The size of the viewport required depends on latency: longer latency requires rendering a larger viewport and streaming more content. PubDate: Mon, 10 May 2021 00:00:00 GMT
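The latency-to-viewport relationship described above can be sketched as a simple worst-case calculation: however far the head can turn during one network round trip determines the extra margin that must be pre-rendered. The 120°/s head-rotation speed and the formula are illustrative assumptions, not values from the article.

```python
def extra_viewport_margin_deg(latency_ms, head_speed_deg_per_s):
    # Worst-case head rotation during one round trip determines how much
    # extra field of view must be pre-rendered beyond the HMD's viewport.
    return head_speed_deg_per_s * latency_ms / 1000.0

def viewport_size_deg(fov_deg, latency_ms, head_speed_deg_per_s=120.0):
    margin = extra_viewport_margin_deg(latency_ms, head_speed_deg_per_s)
    return fov_deg + 2 * margin       # margin on both sides of the viewport

print(viewport_size_deg(90, 20))      # 20 ms latency -> 94.8 degrees
print(viewport_size_deg(90, 50))      # 50 ms latency -> 102.0 degrees
```

The sketch makes the abstract's point concrete: more than doubling the latency noticeably inflates the viewport, and hence the amount of content that must be streamed.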
Abstract: Lu Sun, Hussein Al Osman, Jochen Lang
Our augmented reality online assistance platform enables an expert to specify 6DoF movements of a component and apply geometric and physical constraints in real time. We track the real components on the expert’s side to monitor the expert’s operations. We leverage a remote rendering technique that we proposed previously to relieve the rendering burden on the augmented reality end devices. By conducting a user study, we show that the proposed method outperforms conventional instructional videos and sketches. The answers to the questionnaires show that the proposed method is recommended more highly than sketching and, compared to conventional instructional videos, is outstanding in terms of instruction clarity, preference, recommendation, and confidence of task completion. PubDate: Mon, 10 May 2021 00:00:00 GMT
Abstract: Meiqi Zhao, Jianmin Zheng, Elvis S. Liu
In recent years, Massively Multiplayer Online Games (MMOGs) have become popular, partially due to their sophisticated graphics and broad virtual worlds, and cloud gaming is in greater demand than ever, especially for play on light, portable devices. This article considers the problem of server allocation for running MMOGs on the cloud, aiming to reduce the cost of the cloud gaming service while enhancing the quality of service. The problem is formulated as minimizing an objective function involving the cost of server rental, the cost of data transfer, and the network latency during the gaming time. A genetic algorithm is developed to solve the minimization problem, processing simultaneous server allocation for players who log into the system at the same time while many existing players are playing the same game. PubDate: Mon, 10 May 2021 00:00:00 GMT
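The genetic-algorithm formulation above can be sketched in miniature. The objective weights, latency matrix, and GA hyperparameters below are invented for illustration; the article's actual objective also includes data-transfer cost and its algorithm is certainly more elaborate than this toy.

```python
import random

random.seed(0)

# Toy objective: assign each of PLAYERS players to one of SERVERS servers,
# trading off rental cost (distinct servers used) against latency penalties.
PLAYERS, SERVERS = 8, 3
LAT = [[random.uniform(10, 100) for _ in range(SERVERS)] for _ in range(PLAYERS)]

def cost(assign):
    rental = 50 * len(set(assign))                    # cost of server rental
    latency = sum(LAT[p][s] for p, s in enumerate(assign))
    return rental + latency

def evolve(pop_size=30, gens=40, mut=0.1):
    # Standard GA loop: selection of the fittest half, one-point crossover,
    # and occasional random mutation of a player's server assignment.
    pop = [[random.randrange(SERVERS) for _ in range(PLAYERS)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=cost)
        parents = pop[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, PLAYERS)
            child = a[:cut] + b[cut:]
            if random.random() < mut:
                child[random.randrange(PLAYERS)] = random.randrange(SERVERS)
            children.append(child)
        pop = parents + children
    return min(pop, key=cost)

best = evolve()
print(cost(best))
```

The design choice worth noting is that a GA sidesteps the combinatorial explosion of the assignment space (SERVERS**PLAYERS candidates) at the price of returning an approximate rather than provably optimal allocation.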
Most existing image captioning methods use only the visual information of the image to guide caption generation, lacking the guidance of effective scene-semantic information; moreover, current visual attention mechanisms cannot adjust their focus intensity on the image. In this article, we first propose an improved visual attention model. At each timestep, we calculate the focus intensity coefficient of the attention mechanism from the context information of the model, then automatically adjust the focus intensity of the attention mechanism through this coefficient to extract more accurate visual information. In addition, we represent the scene-semantic knowledge of the image through topic words related to the image scene, then add them to the language model. PubDate: Mon, 10 May 2021 00:00:00 GMT
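One simple way to realize a "focus intensity coefficient" is as a temperature-like scaling inside the attention softmax, which is sketched below. This is an assumption about the mechanism for illustration only; the article's actual model computes the coefficient from the decoder's context.

```python
import math

def attention_weights(scores, intensity=1.0):
    # A larger focus-intensity coefficient sharpens the softmax over image
    # regions; a smaller one spreads attention more evenly across regions.
    exps = [math.exp(intensity * s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.5]                              # per-region relevance scores
print(attention_weights(scores, intensity=0.5))       # diffuse attention
print(attention_weights(scores, intensity=4.0))       # sharply peaked attention
```

With a low intensity the model attends broadly (useful for scene-level words); with a high intensity it concentrates on the single most relevant region (useful for object words).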
In this work, we address the task of scene recognition from image data. A scene is a spatially correlated arrangement of various visual semantic contents, also known as concepts, e.g., “chair,” “car,” “sky,” etc. Representation learning using visual semantic content can be regarded as one of the most natural ideas, as it mimics the human behavior of perceiving visual information. Semantic multinomial (SMN) representation is one such representation that captures semantic information using posterior probabilities of concepts. The core part of obtaining SMN representation is the building of concept models. Therefore, it is necessary to have ground-truth (true) concept labels for every concept present in an image. PubDate: Mon, 10 May 2021 00:00:00 GMT
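The SMN idea above can be sketched in a few lines: the representation is simply a probability vector over concepts. The per-concept scores below are hypothetical detector outputs, and real concept models produce calibrated posteriors rather than the naive normalization used here.

```python
# A semantic multinomial (SMN) represents an image as a multinomial
# distribution over concepts ("chair", "car", "sky", ...).
def smn(concept_scores):
    # Normalize raw concept-model scores into a probability vector.
    total = sum(concept_scores.values())
    return {c: s / total for c, s in concept_scores.items()}

scores = {"chair": 0.8, "car": 0.1, "sky": 0.3}   # hypothetical raw scores
rep = smn(scores)
print(rep["chair"])                                # ≈ 0.667
```

Because the representation lives on the probability simplex, images can then be compared by distribution distances rather than raw pixel features.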
Abstract: Prasen Kumar Sharma, Sujoy Ghosh, Arijit Sur
In this article, we address the problem of rain-streak removal in videos. Unlike image restoration, video restoration must ensure temporal consistency in addition to spatial enhancement. Researchers have proposed several effective methods for estimating de-noised videos with outstanding temporal consistency. However, such methods also incur high computational cost due to their large model size. Our analysis suggests that incorporating separate modules for spatial and temporal enhancement requires more computational resources, which motivates us to propose a unified architecture that directly estimates the de-rained frame with maximal visual quality and minimal computational cost. To this end, we present a deep learning-based Frame-recurrent Multi-contextual Adversarial Network for rain-streak removal in videos. PubDate: Mon, 10 May 2021 00:00:00 GMT
The technologies for real-time multimedia transmission and immersive 3D gaming applications are rapidly emerging, posing challenges in terms of performance, security, authentication, data privacy, and encoding. The communication channel for these multimedia applications must be secure and reliable against network attack vectors, and data content must employ strong encryption to preserve privacy and confidentiality. Towards delivering a secure multimedia application environment for 5G networks, we propose an SDN/NFV (Software-Defined-Networking/Network-Function-Virtualization) framework called STREK, which attempts to deliver highly adaptable Quality-of-Experience (QoE), security, and authentication functions for multi-domain cloud-to-edge networks. The STREK architecture consists of a holistic SDNFV dataplane, NFV service-chaining and network slicing, a lightweight adaptable hybrid cipher scheme called TREK, and an open RESTful API for applications to deploy custom policies at runtime for multimedia services. PubDate: Wed, 21 Apr 2021 00:00:00 GMT
Abstract: S. Sambath Kumar, M. Nandhini
Alzheimer’s Disease (AD) is an irreversible neurodegenerative disorder that causes progressive decline in memory and cognitive function and can be characterized using structural brain Magnetic Resonance Images (sMRI). In recent years, sMRI data has played a vital role in the evaluation of brain anatomical changes, leading to early detection of AD through deep networks. Existing AD detection approaches suffer from preprocessing complexity and unreliability, which are major concerns at present. To overcome these, a model (FEESCTL) is proposed that uses entropy-based slicing for feature extraction and transfer learning for classification. In the present study, the entropy-based image slicing method is used to select the most informative MRI slices during training. PubDate: Wed, 21 Apr 2021 00:00:00 GMT
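Entropy-based slice selection can be sketched as ranking slices by the Shannon entropy of their intensity histogram, keeping the most informative ones. The abstract does not give the FEESCTL details, so the scoring function and the toy slices below are illustrative assumptions.

```python
import math
from collections import Counter

def slice_entropy(pixels):
    # Shannon entropy (bits) of a slice's intensity histogram: slices with
    # richer anatomical structure tend to have higher-entropy histograms.
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

slices = {
    "slice_a": [0, 0, 0, 0, 0, 0, 0, 1],   # nearly uniform background
    "slice_b": [0, 1, 2, 3, 4, 5, 6, 7],   # varied structure
}
ranked = sorted(slices, key=lambda s: slice_entropy(slices[s]), reverse=True)
print(ranked[0])                            # slice_b
```

Selecting only high-entropy slices shrinks the training set to the slices most likely to carry discriminative anatomy, which also reduces preprocessing effort.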
The Internet of Vehicles (IoV) connects vehicles, roadside units (RSUs), and other intelligent objects, enabling data sharing among them and thereby improving the efficiency and safety of urban traffic. Currently, collections of multimedia content generated by multimedia surveillance equipment, vehicles, and so on are transmitted to edge servers for processing, because edge computing is a powerful paradigm for accommodating multimedia services with low-latency resource provisioning. However, the uneven or discrete distribution of the traffic flow covered by edge servers negatively affects their service performance (e.g., overload and underload) in multimedia IoV systems. Therefore, how to accurately schedule and dynamically reserve appropriate amounts of resources for multimedia services in edge servers remains challenging. PubDate: Wed, 21 Apr 2021 00:00:00 GMT
With the rapid advancement of video and image processing technologies in the Internet of Things, it is urgent to address the real-time performance, clarity, and reliability of image recognition for monitoring systems in foggy weather conditions. In this work, a fast defogging image recognition algorithm is proposed based on bilateral hybrid filtering. First, a mathematical model based on bilateral hybrid filtering is established. The dark channel is used for filtering and denoising the defogged image. Next, a bilateral hybrid filtering method is proposed that combines guided filtering and median filtering, as this can effectively improve the robustness and transmittance of defogged images. PubDate: Wed, 21 Apr 2021 00:00:00 GMT
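The dark channel mentioned above is a standard defogging building block: for each pixel, take the minimum intensity over a local patch and over the three color channels (fog-free regions tend to have a dark channel near zero). The patch size and toy image below are illustrative, and a real implementation would operate on full-size arrays.

```python
def dark_channel(img, patch=3):
    # img: 2D grid of (R, G, B) triples with values in [0, 1].
    h, w = len(img), len(img[0])
    r = patch // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = min(
                min(img[yy][xx])                      # min over RGB channels
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1)))
    return out

# A 2x2 toy "image" of (R, G, B) triples.
img = [[(0.9, 0.8, 0.7), (0.5, 0.6, 0.4)],
       [(0.3, 0.2, 0.9), (0.8, 0.9, 0.6)]]
print(dark_channel(img))   # every entry is 0.2: each 3x3 patch covers the image
```

In a defogging pipeline the dark channel feeds the transmittance estimate, which the guided/median filtering step then refines.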
Today, users of social platforms upload a large number of photos. These photos contain personal private information, including user identity information, which is easily gleaned by intelligent detection algorithms. To thwart this, in this work, we propose an intelligent algorithm to prevent deep neural network (DNN) detectors from detecting private information, especially human faces, while minimizing the impact on the visual quality of the image. More specifically, we design an image privacy protection algorithm that trains and generates a corresponding adversarial sample for each image to defend against DNN detectors. In addition, we propose an improved model that trains an adversarial perturbation generative network to generate perturbations, instead of training separately for each image. PubDate: Wed, 21 Apr 2021 00:00:00 GMT
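The adversarial-sample idea can be sketched with a sign-of-gradient step on a toy linear "detector". The article trains full adversarial samples against DNN detectors; this FGSM-style fragment only illustrates the core mechanism of lowering a detector's score with a small, bounded perturbation.

```python
def detector_score(x, w):
    # Toy linear detector: higher score means "private content detected".
    return sum(xi * wi for xi, wi in zip(x, w))

def fgsm_perturb(x, w, eps=0.05):
    # For a linear score the gradient w.r.t. x is just w; stepping against
    # its sign lowers the score while bounding the per-pixel change by eps.
    return [xi - eps * (1 if wi > 0 else -1 if wi < 0 else 0)
            for xi, wi in zip(x, w)]

x = [0.2, 0.7, 0.5]            # hypothetical pixel values
w = [1.0, -2.0, 0.5]           # hypothetical detector weights
adv = fgsm_perturb(x, w)
print(detector_score(adv, w) < detector_score(x, w))   # True
```

The eps bound is what keeps the visual quality close to the original: every value moves by at most 0.05, yet the detector's confidence drops.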
Abstract: Mythili K., Manish Narwaria
Quality assessment of audiovisual (AV) signals is important from the perspective of system design, optimization, and management of a modern multimedia communication system. However, automatic prediction of AV quality via the use of computational models remains challenging. In this context, machine learning (ML) appears to be an attractive alternative to the traditional approaches. This is especially true when such assessment needs to be made in a no-reference fashion (i.e., when the original signal is unavailable). While the development of ML-based quality predictors is desirable, we argue that proper assessment and validation of such predictors is also crucial before they can be deployed in practice. To this end, we raise some fundamental questions about the current approach of ML-based model development for AV quality assessment and signal processing for multimedia communication in general. PubDate: Wed, 21 Apr 2021 00:00:00 GMT
Abstract: Kenta Hama, Takashi Matsubara, Kuniaki Uehara, Jianfei Cai
With the significant development of black-box machine learning algorithms, particularly deep neural networks, the practical demand for reliability assessment is rapidly increasing. On the basis of the concept that “Bayesian deep learning knows what it does not know,” the uncertainty of deep neural network outputs has been investigated as a reliability measure for classification and regression tasks. By considering an embedding task as a regression task, several existing studies have quantified the uncertainty of embedded features and improved the retrieval performance of cutting-edge models by model averaging. However, in image-caption embedding-and-retrieval tasks, well-known samples are not always easy to retrieve. This study shows that the existing method has poor performance in reliability assessment and investigates another aspect of image-caption embedding-and-retrieval tasks. PubDate: Wed, 21 Apr 2021 00:00:00 GMT
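The notion of treating an embedded feature's uncertainty as a reliability measure can be sketched as the spread of the embedding across stochastic forward passes (as in MC dropout or model averaging). The embedding samples below are hypothetical values, not outputs of any model from the study.

```python
import statistics

def embedding_uncertainty(samples):
    # Mean per-dimension variance across sampled embeddings of one input:
    # a large spread suggests the model is unsure where the input belongs.
    dims = zip(*samples)
    return statistics.mean(statistics.pvariance(d) for d in dims)

# Three stochastic forward passes for two hypothetical inputs.
stable   = [[0.50, 0.10], [0.51, 0.11], [0.49, 0.09]]
unstable = [[0.50, 0.10], [0.90, -0.40], [0.10, 0.60]]
print(embedding_uncertainty(stable) < embedding_uncertainty(unstable))   # True
```

Under this reading, the study's finding is that such variance-based scores do not reliably predict which image-caption pairs will actually be retrieved correctly.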