Information Processing & Management
Journal Prestige (SJR): 0.92
Citation Impact (citeScore): 4
 
Hybrid journal (it can contain Open Access articles)
ISSN (Print) 0306-4573
Published by Elsevier [3184 journals]
  • Reposting negative information on microblogs: Do personality traits
           matter?
    • Abstract: Publication date: January 2020. Source: Information Processing & Management, Volume 57, Issue 1. Author(s): Chunxiao Yin, Xiaofei Zhang, Libo Liu. Abstract: Forwarding negative information on microblogs, termed reposting negative information (RNI) in this study, refers to publicly reposting negative, non-original information. RNI causes large-scale dissemination of bad news on microblogs, which in turn has detrimental consequences for organizations and society. However, previous research has concentrated on the sharing of original content (such as knowledge sharing and word-of-mouth) or on general information forwarding on social media without distinguishing between positive and negative information. To address this issue, this study develops a model to investigate the predictors of RNI on microblogs (negative emotions and issue involvement in this study) and explores the contingency role of personality. A scenario-based online survey was conducted to test the proposed model and hypotheses. The empirical results confirmed (1) the direct and positive effects of negative emotions and issue involvement, (2) the negative moderation effect of extraversion on the relationship between negative emotions and RNI, and (3) the positive moderation effects of conscientiousness and agreeableness on the relationship between issue involvement and RNI. The study contributes to the literature by revealing the predictors of RNI on microblogs and by investigating the contingency role of personality.
       
  • Consumer health information needs: A systematic review of measures
    • Abstract: Publication date: Available online 14 September 2019. Source: Information Processing & Management. Author(s): Wenjing Pian, Shijie Song, Yan Zhang. Abstract: Information needs motivate human information behavior. Knowledge of information needs is critical for user-centered information behavior research and system design. In consumer health information behavior research, there is a lack of understanding of how consumer health information needs (CHIN) are measured in empirical studies. This study is a systematic review of empirical quantitative studies on CHIN, with a focus on how CHIN is defined and operationalized. A search of six academic databases and citation tracking of relevant articles identified a total of 216 relevant articles. These articles were analyzed using the qualitative content analysis method. We found that few of the included articles explicitly defined either CHIN or information needs in general. When definitions were given, they were from a cognitive perspective and largely ignored the multidimensionality of the concept. Consistent with this cognitive-centered conceptualization, CHIN was operationalized primarily as information topics, with some articles also measuring several additional attributes, including level of importance, fulfilment, amount of information needed, and frequency of needs. These findings suggest that CHIN is undertheorized. To address this gap, future studies should attend to social and emotional dimensions of CHIN, such as motivations, goals, activities, and emotions. Further, more research is needed to understand how CHIN is related to consumer health information seeking behavior and to the social and environmental context in which the needs arise.
       
  • Focal elements of neural information retrieval models. An outlook through
           a reproducibility study
    • Abstract: Publication date: Available online 13 September 2019. Source: Information Processing & Management. Author(s): Stefano Marchesin, Alberto Purpura, Gianmaria Silvello. Abstract: This paper analyzes two state-of-the-art Neural Information Retrieval (NeuIR) models: the Deep Relevance Matching Model (DRMM) and the Neural Vector Space Model (NVSM). Our contributions include: (i) a reproducibility study of two state-of-the-art supervised and unsupervised NeuIR models, where we present the issues we encountered while reproducing them; (ii) a performance comparison with other lexical, semantic and state-of-the-art models, showing that traditional lexical models are still highly competitive with DRMM and NVSM; (iii) an application of DRMM and NVSM to collections from heterogeneous search domains and in different languages, which helped us to analyze the cases where DRMM and NVSM can be recommended; (iv) an evaluation of the impact of varying word embedding models on DRMM, showing how relevance-based representations generally outperform semantic-based ones; (v) a topic-by-topic evaluation of the selected NeuIR approaches, comparing their performance to the well-known BM25 lexical model, where we perform an in-depth analysis of the different cases where DRMM and NVSM outperform the BM25 model or fail to do so. We run an extensive experimental evaluation to check whether the improvements of NeuIR models, if any, over the selected baselines are statistically significant.
       
  • The effect of the perceived risk on the adoption of the sharing economy in
           the tourism industry: The case of Airbnb
    • Abstract: Publication date: Available online 10 September 2019. Source: Information Processing & Management. Author(s): Jisu Yi, Gao Yuan, Changsok Yoo. Abstract: Smart tourism and the sharing economy within it are transforming human lives and are considered a huge innovation in the industry. This change inevitably creates huge resistance, which has not received much attention. Thus, this study focuses on the risk aspects of the sharing economy, which have become a social issue. It investigates how risks affect the development and diffusion of the sharing economy, especially in the case of Airbnb. This study adopts the extended model of goal-directed behavior and depicts the decision-making process of potential Airbnb users to analyze the effect of risk. Results of structural equation modeling applied to 300 potential customers indicate that privacy and financial risks negatively affect the intention to use the sharing economy. However, physical and performance risks are positively related to behavioral intention or desire. This risk paradox can be explained by the disruptive innovation of the sharing economy and the characteristics of risk engagement in tourism. Implications for research and practice are discussed along with the findings of the study.
       
  • Evaluating the use of interactive virtual reality technology with older
           adults living in residential aged care
    • Abstract: Publication date: Available online 9 September 2019. Source: Information Processing & Management. Author(s): Steven Baker, Jenny Waycott, Elena Robertson, Romina Carrasco, Barbara Barbosa Neves, Ralph Hampson, Frank Vetere. Abstract: Background and objectives: As technologies gain traction within the aged care community, better understanding their impact becomes vital. This paper reports on a study that explored the deployment of virtual reality (VR) as a tool to engage older adults in Residential Aged Care Facilities (RACF). The paper has two aims: 1) to identify the benefits and challenges associated with using VR with residents in aged care settings, and 2) to gather the views of older adult residents in RACF about the potential uses of VR in aged care. Research design and methods: Five RACF residents and five RACF staff members took part in an intensive two-week evaluation of a VR system. Qualitative data was collected from multiple interviews and via researcher notes and video recordings made during the VR sessions. Results: Results highlight the usability issues that impacted on the aged care residents' ability to use interactive VR technology and the potential negative impact head mounted displays can have on those living with dementia; the role that VR can play in engaging residents who might otherwise self-isolate, and how this can extend to increased engagement with family and friends. Discussion and implications: We discuss the design challenges that will need to be met in order to ensure that interactive VR technology can be used by residents living in aged care, and the potential for VR to be used as a tool to improve the quality of life of some older residents, particularly those to whom traditional social activities do not appeal.
       
  • Immigrating after 60: Information experiences of older Chinese migrants to
           Australia and Canada
    • Abstract: Publication date: Available online 9 September 2019. Source: Information Processing & Management. Author(s): Nadia Caidi, Jia Tina Du, Lei Li, Junyue Mavis Shen, Qiaoling Sun. Abstract: While there is much research on migrant information behavior, the older population tends to be underrepresented in the literature. This article reports on a qualitative study with 16 Chinese older adults (aged 60 and over) who were recent immigrants to Australia and Canada. Migrating late in life presents some unique characteristics and challenges. In both countries, the discourse of “family reunification” frames the experiences of the participants, including their information activities as they learn to navigate the new environment. We used a parallel approach across the two countries to examine these older adults’ information practices as well as the transnational dimension of their settlement process. Findings point to a shared social imaginary as well as daily rituals and coping mechanisms of these late-life immigrants, along with associated information activities. We draw implications for our understanding of this under-studied migrant population, as well as for the design of information support for older migrants as part of their social inclusion in the host country.
       
  • Publisher's Note
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s):
       
  • Automatic classification of complaint letters according to service
           provider categories
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s): Yaakov HaCohen-Kerner, Rakefet Dilmon, Maor Hone, Matanya Aharon Ben-Basan. Abstract: In the technological age, the phenomenon of complaint letters published on the Internet is increasing. Therefore, it is important to automatically classify complaint letters according to various criteria, such as company categories. In this research, we investigated the automatic text classification of complaint letters written in Hebrew that were sent to various companies from a wide variety of categories. The classification was performed according to company categories such as insurance, cellular communication, and rental cars. We conducted an extensive set of experiments classifying complaint letters into seven, six, five, and four company categories. The classification experiments were performed using various sets of word unigrams, four machine learning methods, two feature filtering methods, and parameter tuning. The classification results are relatively high for all six measures: accuracy, precision, recall, F1, PRC-area, and ROC-area. The best accuracy results for seven, six, five, and four categories are 84.5%, 88.4%, 91.4%, and 93.8%, respectively. An analysis of the most frequently occurring words in the complaints about almost all categories revealed that the most significant issues were related to poor service and delayed delivery. An interesting result shows that only in the domain of hospitals was the subject of the domain itself (i.e., the patient, the medical treatment, the place of the treatment, and the medical staff) the most important issue. Another interesting finding is that the issue of “price” was of little or no importance to the complainants. These findings suggest that in their preoccupation with their bottom line of profitability, many service providers are blind to how paramount good service and timely delivery (and, in the case of hospitals, the domain itself) are to their clientele.
       
  • SLTFNet: A spatial and language-temporal tensor fusion network for video
           moment retrieval
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s): Bin Jiang, Xin Huang, Chao Yang, Junsong Yuan. Abstract: This paper focuses on temporal retrieval of activities in videos via sentence queries. Given a sentence query describing an activity, temporal moment retrieval aims at localizing the temporal segment within the video that best matches the textual query. This is a general yet challenging task, as it requires comprehension of both video and language. Existing research predominantly employs coarse frame-level features as the visual representation, obscuring the specific details (e.g., the desired objects “girl”, “cup” and action “pour”) within the video which may provide critical cues for localizing the desired moment. In this paper, we propose a novel Spatial and Language-Temporal Tensor Fusion (SLTF) approach to resolve those issues. Specifically, the SLTF method first takes advantage of object-level local features and attends to the most relevant local features (e.g., the local features “girl”, “cup”) by spatial attention. Then we encode the sequence of the local features on consecutive frames with an LSTM network, which can capture the motion information and interactions among these objects (e.g., the interaction “pour” involving these two objects). Meanwhile, language-temporal attention is utilized to emphasize the keywords based on moment context information. Thereafter, a tensor fusion network learns both the intra-modality and inter-modality dynamics, which can enhance the learning of the moment-query representation. As a result, our two proposed attention sub-networks can adaptively recognize the most relevant objects and interactions in the video, and simultaneously highlight the keywords in the query for retrieving the desired moment. Experimental results on three public benchmark datasets (obtained from TACOS, Charades-STA, and DiDeMo) show that the SLTF model significantly outperforms current state-of-the-art approaches, and demonstrate the benefits produced by the new technologies incorporated into SLTF.
       
  • Large-scale instance-level image retrieval
    • Abstract: Publication date: Available online 29 August 2019. Source: Information Processing & Management. Author(s): Giuseppe Amato, Fabio Carrara, Fabrizio Falchi, Claudio Gennaro, Lucia Vadicamo. Abstract: The great success of visual features learned from deep neural networks has led to a significant effort to develop efficient and scalable technologies for image retrieval. Nevertheless, the usage of these features in large-scale Web applications of content-based retrieval is still challenged by their high dimensionality. To overcome this issue, some image retrieval systems employ the product quantization method to learn a large-scale visual dictionary from a training set of global neural network features. These approaches are implemented in main memory, preventing their usage in big-data applications. The contribution of this work is mainly devoted to investigating some approaches to transform neural network features into text forms suitable for being indexed by a standard full-text retrieval engine such as Elasticsearch. The basic idea of our approaches relies on a transformation of neural network features with the twofold aim of promoting sparsity and avoiding the need for unsupervised pre-training. We validate our approach on a recent convolutional neural network feature, namely Regional Maximum Activations of Convolutions (R-MAC), which is a state-of-the-art descriptor for image retrieval. Its effectiveness has been proven through several instance-level retrieval benchmarks. An extensive experimental evaluation conducted on the standard benchmarks shows the effectiveness and efficiency of the proposed approach and how it compares to state-of-the-art main-memory indexes.
       
  • Deep ranking based cost-sensitive multi-label learning for distant
           supervision relation extraction
    • Abstract: Publication date: Available online 26 August 2019. Source: Information Processing & Management. Author(s): Hai Ye, Zhunchen Luo. Abstract: A knowledge base provides a potential way to improve the intelligence of information retrieval (IR) systems, because it contains numerous relations between entities that can help IR systems to conduct inference from one entity to another. Relation extraction is one of the fundamental techniques for constructing a knowledge base. Distant supervision is a semi-supervised learning method for relation extraction which learns with labeled and unlabeled data. However, this approach suffers from the problem of relation overlapping, in which one entity tuple may have multiple relation facts. We believe that relation types can have latent connections, which we call class ties, and that these can be exploited to enhance relation extraction. However, this property between relation classes has not been fully explored before. In this paper, to exploit class ties between relations to improve relation extraction, we propose a general ranking based multi-label learning framework combined with convolutional neural networks, in which ranking based loss functions with a regularization technique are introduced to learn the latent connections between relations. Furthermore, to deal with the problem of class imbalance in distant supervision relation extraction, we adopt cost-sensitive learning to rescale the costs from the positive and negative labels. Extensive experiments on a widely used dataset show the effectiveness of our model in exploiting class ties and in relieving the class imbalance problem.
       
  • Big data adoption: State of the art and research challenges
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s): Maria Ijaz Baig, Liyana Shuib, Elaheh Yadegaridehkordi. Abstract: Big data adoption is a process through which businesses find innovative ways to enhance productivity and predict risk in order to satisfy customers' needs more efficiently. Despite the increase in demand for and importance of big data adoption, there is still a lack of comprehensive review and classification of the existing studies in this area. This research aims to gain a comprehensive understanding of the current state of the art by highlighting theoretical models, the influencing factors, and the research challenges of big data adoption. By adopting a systematic selection process, twenty studies were identified in the domain of big data adoption and were reviewed in order to extract relevant information that answers a set of research questions. According to the findings, Technology–Organization–Environment and Diffusion of Innovations are the most popular theoretical models used for big data adoption in various domains. This research also revealed forty-two factors in technology, organization, environment, and innovation that have a significant influence on big data adoption. Finally, challenges found in the current research about big data adoption are presented, and future research directions are recommended. This study is helpful for researchers and stakeholders seeking to take initiatives that will alleviate the challenges and facilitate big data adoption in various fields.
       
  • Information need: Introduction to the special issue
    • Abstract: Publication date: Available online 23 August 2019. Source: Information Processing & Management. Author(s): Pia Borlund, Ian Ruthven
       
  • Dynamic attention-based explainable recommendation with textual and visual
           fusion
    • Abstract: Publication date: Available online 20 August 2019. Source: Information Processing & Management. Author(s): Peng Liu, Lemei Zhang, Jon Atle Gulla. Abstract: Explainable recommendation, which provides explanations about why an item is recommended, has attracted growing attention in both research and industry communities. However, most existing explainable recommendation methods cannot provide multi-modal explanations consisting of both textual and visual modalities or adaptive explanations tailored to the user’s dynamic preference, potentially leading to the degradation of customers’ satisfaction, confidence and trust in the recommender system. On the technical side, the Recurrent Neural Network (RNN) has become the most prevalent technique for modeling dynamic user preferences. Owing to the natural characteristics of the RNN, the hidden state is, to some degree, a combination of long-term dependency and short-term interest. But it works like a black box, and the monotonic temporal dependency of the RNN is not sufficient to capture the user’s short-term interest. In this paper, to deal with the above issues, we propose a novel Attentive Recurrent Neural Network (Ante-RNN) with textual and visual fusion for dynamic explainable recommendation. Specifically, our model jointly learns image representations with textual alignment and text representations with a topical attention mechanism in parallel. Then a novel dynamic contextual attention mechanism is incorporated into Ante-RNN for modelling the complicated correlations among recent items and strengthening the user’s short-term interests. By combining the full latent visual-semantic alignments and a hybrid attention mechanism including topical and contextual attentions, Ante-RNN makes the recommendation process more transparent and explainable. Extensive experimental results on two real-world datasets demonstrate the superior performance and explainability of our model.
       
  • Taylor's Q1 “Visceral” level of information need: What is it?
    • Abstract: Publication date: Available online 20 August 2019. Source: Information Processing & Management. Author(s): Charles Cole. Abstract: Taylor (1968) dramatically stated that information seekers/searchers do not use their real Q1-level of information need when formulating their query to the system. Instead, they use a compromised Q4-level form of their need. The article directly confronts what Taylor's (1968) Q1-level information need is: the “actual” or “real” information need of the searcher. The article conceptually and operationally defines what Taylor's Q1-level of information need is, using Belkin's (1980) ASK concept as a basis for designing a system intervention that shifts the searcher from representing the Q4-level compromised form of the need in her query to representing instead her Q1-level real information need. The article describes the Q1 Actualizing Intervention Model, which can be built into a system capable of actualizing the uncertainty distribution of the searcher's belief ASK so that information search is directed by the searcher's real Q1-level information need. The objective of the Q1 Actualizing Intervention Model is to enable in our Knowledge Age the introduction of intervention IR systems that are organic and human-centric, designed to initiate organic knowledge production processes in the searcher.
       
  • ATM: Adversarial-neural Topic Model
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s): Rui Wang, Deyu Zhou, Yulan He. Abstract: Topic models are widely used for thematic structure discovery in text. But traditional topic models often require dedicated inference procedures for the specific tasks at hand. Also, they are not designed to generate word-level semantic representations. To address these limitations, we propose a neural topic modeling approach based on Generative Adversarial Nets (GANs), called the Adversarial-neural Topic Model (ATM). To the best of our knowledge, this work is the first attempt to use adversarial training for topic modeling. The proposed ATM models topics with a Dirichlet prior and employs a generator network to capture the semantic patterns among latent topics. Meanwhile, the generator can also produce word-level semantic representations. In addition, to illustrate the feasibility of porting ATM to tasks other than topic modeling, we apply ATM to open domain event extraction. To validate the effectiveness of the proposed ATM, two topic modeling benchmark corpora and an event dataset are employed in the experiments. Our experimental results on the benchmark corpora show that ATM generates more coherent topics (considering five topic coherence measures), outperforming a number of competitive baselines. Moreover, the experiments on the event dataset also validate that the proposed approach is able to extract meaningful events from news articles.
       
  • The usefulness of multimedia surrogates for making relevance judgments
           about digital video objects
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s): Barbara M. Wildemuth, Gary Marchionini, Xin Fu, Jun Sung Oh, Meng Yang. Abstract: Large collections of digital video are increasingly accessible. The large volume and range of available video demands search tools that allow people to browse and query easily and to quickly make sense of the videos behind the result sets. This study focused on the usefulness of several multimedia surrogates, in terms of effectiveness, efficiency, and user satisfaction. Three surrogates were evaluated and compared: a storyboard, a 7-second segment, and a fast forward. Thirty-six experienced users of digital video conducted searches on each of four systems: three incorporated one of the surrogates each, and the fourth made all three surrogates available. Participants judged the relevance of at least 10 items for each search based on the surrogate(s) available, then re-judged the relevance of two of those items based on viewing the full video. Transaction logs and post-search and post-session questionnaires provided data on user interactions, including relevance judgments, and user perceptions. All of the surrogates provided a basis for accurate relevance judgments, though they varied (in expected ways) in terms of their efficiency. User perceptions favored the system with all three surrogates available, even though it took longer to use; they found it easier to learn and easier to use, and it gave them more confidence in their judgments. Based on these results, we can conclude that it's important for digital video collections to provide multiple surrogates, each providing a different view of the video.
       
  • Cascade embedding model for knowledge graph inference and retrieval
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s): Daifeng Li, Andrew Madden. Abstract: Knowledge graphs are widely used in retrieval systems, question answering (QA) systems, hypothesis generation systems, etc. Representation learning provides a way to mine knowledge graphs to detect missing relations, and translation-based embedding models are a popular form of representation model. Shortcomings of translation-based models, however, limit their practicability as knowledge completion algorithms. The proposed model helps to address some of these shortcomings. The similarity between the graph structural features of two entities was found to be correlated with the relations of those entities. This correlation can help to solve the problem caused by unbalanced relations and reciprocal relations. We used Node2vec, a graph embedding algorithm, to represent information related to an entity's graph structure, and we introduce a cascade model to incorporate graph embedding with knowledge embedding into a unified framework. The cascade model first refines feature representation in the first two stages (Local Optimization Stage), and then uses backward propagation to optimize the parameters of all the stages (Global Optimization Stage). This helps to enhance the knowledge representation of existing translation-based algorithms by taking into account both semantic features and graph features and fusing them to extract more useful information. In addition, different cascade structures are designed to find the optimal solution to the problem of knowledge inference and retrieval. The proposed model was verified using three mainstream knowledge graphs: WN18, FB15k and BioChem. Experimental results were validated using the hit@10 rate on the entity prediction task. The proposed model performed better than TransE, giving an average improvement of 2.7% on WN18, 2.3% on FB15k and 28% on BioChem. Improvements were particularly marked where there were problems with unbalanced relations and reciprocal relations. Furthermore, the stepwise-cascade structure proved to be more effective and significantly outperformed the other baselines.
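For readers unfamiliar with the hit@10 measure named in this abstract, the following is a minimal sketch of how a hits@k score is commonly computed for entity prediction with a translation-based scorer. It is not the authors' cascade model: the toy embeddings, the entity names, and the helper names `rank_tails` and `hits_at_k` are all hypothetical, and the scoring rule is the usual TransE-style L1 distance between `head + relation` and each candidate tail.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def rank_tails(head: Vector, relation: Vector,
               entities: Dict[str, Vector]) -> List[str]:
    """Rank candidate tail entities by L1 distance to head + relation.

    A smaller distance means a more plausible tail (TransE-style scoring).
    """
    target = [h + r for h, r in zip(head, relation)]

    def distance(name: str) -> float:
        return sum(abs(t - e) for t, e in zip(target, entities[name]))

    return sorted(entities, key=distance)


def hits_at_k(rankings: List[List[str]], gold: List[str], k: int = 10) -> float:
    """Fraction of queries whose gold entity appears in the top k candidates."""
    hits = sum(1 for ranked, g in zip(rankings, gold) if g in ranked[:k])
    return hits / len(gold)


# Hypothetical two-dimensional embeddings, for illustration only.
entities = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
relation = [1.0, 0.0]

# Two queries of the form (head, relation, ?).
rankings = [
    rank_tails(entities["a"], relation, entities),
    rank_tails(entities["b"], relation, entities),
]
gold_tails = ["b", "c"]

score_at_1 = hits_at_k(rankings, gold_tails, k=1)
score_at_3 = hits_at_k(rankings, gold_tails, k=3)
```

With ten or more candidate entities the same function gives the hit@10 rate by leaving `k` at its default.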
       
  • An image-text consistency driven multimodal sentiment analysis approach
           for social media
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s): Ziyuan Zhao, Huiying Zhu, Zehao Xue, Zhao Liu, Jing Tian, Matthew Chin Heng Chua, Maofu Liu. Abstract: Social media users are increasingly using both images and text to express their opinions and share their experiences, instead of using only text as in conventional social media. Consequently, conventional text-based sentiment analysis has evolved into more complicated studies of multimodal sentiment analysis. To tackle the challenge of how to effectively exploit the information from both the visual content and the textual content of image-text posts, this paper proposes a new image-text consistency driven multimodal sentiment analysis approach. The proposed approach explores the correlation between the image and the text, followed by a multimodal adaptive sentiment analysis method. To be more specific, the mid-level visual features extracted by the conventional SentiBank approach are used to represent visual concepts, with the integration of other features, including textual, visual and social features, to develop a machine learning sentiment analysis approach. Extensive experiments are conducted to demonstrate the superior performance of the proposed approach.
       
  • Sentence modeling via multiple word embeddings and multi-level comparison
           for semantic textual similarity
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s): Nguyen Huy Tien, Nguyen Minh Le, Yamasaki Tomohiro, Izuha Tatsuya. Abstract: Recently, using a pretrained word embedding to represent words has achieved success in many natural language processing tasks. Depending on their objective functions, different word embedding models capture different aspects of linguistic properties. However, the Semantic Textual Similarity task, which evaluates the similarity/relation between two sentences, requires taking these linguistic aspects into account. Therefore, this research aims to encode various characteristics from multiple sets of word embeddings into one embedding and then learn the similarity/relation between sentences via this novel embedding. Representing each word by multiple word embeddings, the proposed MaxLSTM-CNN encoder generates a novel sentence embedding. We then learn the similarity/relation between our sentence embeddings via multi-level comparison. Our method, M-MaxLSTM-CNN, consistently shows strong performance in several tasks (i.e., measuring textual similarity, identifying paraphrases, recognizing textual entailment). Our model does not use hand-crafted features (e.g., alignment features, n-gram overlaps, dependency features) and does not require the pre-trained word embeddings to have the same dimension.
       
  • Knowledge acquisition from parsing natural language expressions for
           humanoid robot action commands
    • Abstract: Publication date: Available online 9 August 2019. Source: Information Processing & Management. Author(s): Diego Reforgiato Recupero, Federico Spiga. Abstract: In this paper we propose an approach that allows the NAO humanoid robot to execute natural language commands spoken by the user. To provide the robot with knowledge, we have defined a robot action ontology. The ontology is fed to an NLP engine that performs a machine reading of the input text (in natural language) given by a user and tries to identify action commands for the robot to execute. The system can work in two modes: STATELESS and STATEFUL. In STATELESS mode, each human expression correctly interpreted by the robot as an action command is performed by NAO, which returns to its default posture afterwards. In STATEFUL mode, the robot has knowledge of its current posture and performs the command only if it is compatible with its current state. In this mode, the robot does not return to its default posture. For example, if the user has told the robot to stand on its right leg in a first command, the robot cannot perform a following command to stand on its left leg, as the two actions (raise left leg and raise right leg) are incompatible. For each action that the robot can perform, we modeled a corresponding element in the ontology that also includes a list of associated compatible and non-compatible actions. Our system also handles compound expressions (e.g., move your arms up) and multiple expressions (different commands within one sentence) that the robot understands and performs.
       
  • Textual keyword extraction and summarization: State-of-the-art
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s): Zara Nasar, Syed Waqar Jaffry, Muhammad Kamran Malik. Abstract: With the advent of Web 2.0, many online platforms, such as social networks, online blogs and magazines, produce massive amounts of textual data. This textual data carries information that can be used for the betterment of humanity. Hence, there is a dire need to extract potential information from it. This study presents an overview of approaches that can be applied to extract these valuable information nuggets from text and then present them in a brief, clear and concise way. In this regard, two major tasks, automatic keyword extraction and text summarization, are reviewed. To compile the literature, scientific articles were collected from major digital computing research repositories. In light of the acquired literature, the survey covers early approaches all the way up to recent advancements using machine learning solutions. The survey findings conclude that annotated benchmark datasets for various textual data generators, such as Twitter and social forums, are not available. This scarcity of datasets has resulted in relatively little progress in many domains. Also, applications of deep learning techniques to automatic keyword extraction are relatively unaddressed. Hence, the impact of various deep architectures stands as an open research direction. For the text summarization task, deep learning techniques have been applied since the advent of word vectors and currently represent the state of the art for abstractive summarization. Currently, one of the major challenges in these tasks is semantic-aware evaluation of generated results.
       
  • Vertical and horizontal relationships amongst task-based information needs
    • Abstract: Publication date: Available online 2 August 2019. Source: Information Processing & Management. Author(s): Katriina Byström, Sanna Kumpulainen. Abstract: In this article, we present a conceptual framework of information needs for task-based information studies. The framework accounts for both vertical and horizontal relationships between information needs as fluid activities in work-task performance. As part of task performance, pieces of information are gathered from various, heterogeneous sources, not primarily to fulfil any expressed formulation of information needs, but in order to make progress in the task. The vertical relationships pinpoint connections between general and specific, from the workplace context to the interaction with an information source, and the horizontal relationships between parallel information needs. These relationships enrich the conceptual understanding of information needs in information studies, which previously has focussed on sequential relationships. The sequential, vertical and horizontal relationships form an analytical network that allows a departure from the black-box depiction of information needs.
       
  • Understanding in-context interaction: An investigation into on-the-go
           mobile search
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s): Morgan Harvey, Matthew Pointon. Abstract: Recent years have seen a profound change in how most users interact with search engines: the majority of search requests now come from mobile devices, which are used in a number of distracting contexts. This use of mobile devices in various situational contexts away from a desk presents a range of novel challenges for users and, consequently, possibilities for interface improvements. However, there is at present a lack of work that evaluates interaction in such contexts to understand what effects context and mobility have on behaviour and errors and, ultimately, users’ search performance. Through a controlled study, in which we simulate walking conditions on a treadmill and obstacle course, we use a combination of interaction logs and multiple video streams to capture interaction behaviour as participants (n = 24) complete simple search tasks. Using a bespoke tagging tool to analyse these recordings, we investigate how situational context and distractions impact user behaviour and performance, contrasting this with users in a baseline, seated condition. Our findings provide insights into the issues these common contexts cause, how users adapt and how such interfaces could be improved.
       
  • “Nothing's available”: Young fathers’ experiences with unmet
           information needs and barriers to resolving them
    • Abstract: Publication date: Available online 25 July 2019. Source: Information Processing & Management. Author(s): C. Mniszak, H.L. O'Brien, D. Greyson, C. Chabot, J. Shoveller. Abstract: Young fathers, like all parents, have a range of information needs, such as learning how to introduce their babies to solid foods. Yet compared to young mothers and older parents, they have fewer resources available to them. To date, young fathers have not been identified as a priority population in need of parenting-related information and face unequal access to information resources. This inequality is in part related to gender stereotypes and social biases about young men who become parents at “too early” an age. Through interviews and field observation conducted during a longitudinal ethnographic study of young mothers, fathers, their parents, and service providers in two cities in British Columbia, Canada, we examined young fathers’ gendered experiences accessing parenting information and resources. Using an ecological model of information needs, we identified factors at different levels: micro (e.g., personal), meso (e.g., relational) and macro (e.g., access to city/provincial parenting programs and resources) that revealed information inequalities for young fathers. Our findings illustrate that young fathers often have unexpressed and unaddressed information needs due to barriers they encounter when accessing services, the stigma they experience as early age parents, and social pressures that result in avoiding asking for help in order to adhere to traditional masculine values.
       
  • Changing views: Persuasion modeling and argument extraction from online
           discussions
    • Abstract: Publication date: Available online 24 July 2019. Source: Information Processing & Management. Author(s): Subhabrata Dutta, Dipankar Das, Tanmoy Chakraborty. Abstract: Persuasion and argumentation are possibly among the most complex examples of the interplay between multiple human subjects. With the advent of the Internet, online forums provide wide platforms for people to share their opinions and reasonings around various diverse topics. In this work, we attempt to model persuasive interaction between users on Reddit, a popular online discussion forum. We propose a deep LSTM model to classify whether a conversation leads to a successful persuasion or not, and use this model to predict whether a certain chain of arguments can lead to persuasion. While learning persuasion dynamics, our model tends to identify argument facets implicitly, using an attention mechanism. We also propose a semi-supervised approach to extract argumentative components from discussion threads. Both these models provide useful insight into how people engage in argumentation on online discussion forums.
       
  • Vulnerable community identification using hate speech detection on social
           media
    • Abstract: Publication date: Available online 23 July 2019. Source: Information Processing & Management. Author(s): Zewdie Mossie, Jenq-Haur Wang. Abstract: With the rapid development of mobile computing and Web technologies, online hate speech has spread increasingly on social network platforms, where it is easy to post any opinion. Previous studies confirm that exposure to online hate speech has serious offline consequences for historically deprived communities. Thus, research on automated hate speech detection has attracted much attention. However, the role of social networks in identifying hate-related vulnerable communities is not well investigated. Hate speech can affect all population groups, but some are more vulnerable to its impact than others. For example, for ethnic groups whose languages have few computational resources, it is a challenge to automatically collect and process online texts, not to mention automatic hate speech detection on social media. In this paper, we propose a hate speech detection approach to identify hatred against vulnerable minority groups on social media. Firstly, in the Spark distributed processing framework, posts are automatically collected and pre-processed, and features are extracted using word n-grams and word embedding techniques such as Word2Vec. Secondly, deep learning algorithms for classification such as the Gated Recurrent Unit (GRU), a variant of Recurrent Neural Networks (RNNs), are used for hate speech detection. Finally, hate words are clustered with methods such as Word2Vec to predict the potential target ethnic group for hatred. In our experiments, we use the Amharic language in Ethiopia as an example. Since there was no publicly available dataset for Amharic texts, we crawled Facebook pages to prepare the corpus. Since data annotation could be biased by culture, we recruited annotators from different cultural backgrounds and achieved better inter-annotator agreement. 
In our experimental results, feature extraction using word embedding techniques such as Word2Vec performs better in both classical and deep learning-based classification algorithms for hate speech detection, among which GRU achieves the best result. Our proposed approach can successfully identify the Tigre ethnic group as the most vulnerable community in terms of hatred, compared with Amhara and Oromo. Identifying groups vulnerable to hatred is therefore vital to protecting them, by applying automatic hate speech detection models to remove content that aggravates psychological harm and physical conflict. This can also pave the way towards the development of policies, strategies, and tools to empower and protect vulnerable communities.
       
  • Propagating sentiment signals for estimating reputation polarity
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s): Anastasia Giachanou, Julio Gonzalo, Fabio Crestani. Abstract: The emergence of social media and the huge number of opinions that are posted every day have influenced online reputation management. Reputation experts need to filter and control what is posted online and, more importantly, determine whether an online post is going to have positive or negative implications for the entity of interest. This task is challenging, considering that there are posts that have implications for an entity's reputation but do not express any sentiment. In this paper, we propose two approaches for propagating sentiment signals to estimate the reputation polarity of tweets. The first approach is based on sentiment lexicon augmentation, whereas the second is based on direct propagation of sentiment signals to tweets that discuss the same topic. In addition, we present a polar fact filter that is able to differentiate between reputation-bearing and reputation-neutral tweets. Our experiments indicate that weakly supervised annotation of reputation polarity is feasible and that sentiment signals can be propagated to effectively estimate the reputation polarity of tweets. Finally, we show that learning PMI values from the training data is the most effective approach for reputation polarity analysis.
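The abstract above mentions learning PMI (pointwise mutual information) values from training data. The following is a minimal sketch of how term-label PMI can be estimated from labeled posts; it is not the authors' implementation. The function name `pmi_scores`, the document-level counting, and the toy posts are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_scores(docs):
    """Estimate PMI(term, label) = log2( p(term, label) / (p(term) * p(label)) ).

    docs: list of (tokens, label) pairs. Probabilities are document-level
    relative frequencies (a term counts once per document). Only pairs that
    actually co-occur receive a score.
    """
    n = len(docs)
    term_count = Counter()
    label_count = Counter()
    joint_count = Counter()
    for tokens, label in docs:
        label_count[label] += 1
        for term in set(tokens):
            term_count[term] += 1
            joint_count[(term, label)] += 1
    return {
        (term, label): math.log2(
            (count / n) / ((term_count[term] / n) * (label_count[label] / n))
        )
        for (term, label), count in joint_count.items()
    }


# Hypothetical toy data: four labeled posts.
toy_docs = [
    (["great", "service"], "pos"),
    (["great", "phone"], "pos"),
    (["bad", "service"], "neg"),
    (["bad", "battery"], "neg"),
]
scores = pmi_scores(toy_docs)
```

In this toy data, "great" appears only in positive posts and so gets a positive PMI with the `pos` label, while "service" is split evenly across labels and gets a PMI of zero.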
       
  • HoAFM: A High-order Attentive Factorization Machine for CTR Prediction
    • Abstract: Publication date: Available online 22 July 2019. Source: Information Processing & Management. Author(s): Zhulin Tao, Xiang Wang, Xiangnan He, Xianglin Huang, Tat-Seng Chua. Abstract: Modeling feature interactions is of crucial importance for predicting click-through rate (CTR) in industrial recommender systems. However, manually crafting cross features usually requires extensive domain knowledge and labor-intensive feature engineering. To alleviate this problem, the factorization machine (FM) was proposed to model feature interactions from raw features automatically. In particular, it embeds each feature in a vector representation and models second-order interactions as the product of two feature representations. In order to learn nonlinear and complex patterns, recent works, such as NFM, PIN, and DeepFM, exploited deep learning techniques to capture higher-order feature interactions. These approaches lack guarantees about the effectiveness of high-order patterns, as they model feature interactions in a rather implicit way. To address this limitation, xDeepFM was recently proposed to generate high-order interactions of features in an explicit fashion, where multiple interaction networks are stacked. Nevertheless, xDeepFM suffers from rather high complexity, which easily leads to overfitting. In this paper, we develop a more expressive but lightweight solution based on FM, named High-order Attentive Factorization Machine (HoAFM), by accounting for the higher-order sparse feature interactions in an explicit manner. Beyond the linearity of FM, we devise a cross interaction layer, which updates a feature’s representation by aggregating the representations of other co-occurring features. In addition, we apply a bit-wise attention mechanism to determine the different importance of co-occurring features at the granularity of dimensions. 
By stacking multiple cross interaction layers, we can inject high-order feature interactions into feature representation learning, in order to establish expressive and informative cross features. Extensive experiments are performed on two benchmark datasets, Criteo and Avazu, to demonstrate the rationality and effectiveness of HoAFM. Empirical results suggest that HoAFM achieves significant improvement over other state-of-the-art methods, such as NFM and xDeepFM. We will make the code public upon acceptance of this paper.
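For readers unfamiliar with the plain factorization machine that this abstract builds on, the sketch below shows its second-order interaction term, computed both naively and with the standard linear-time reformulation. It illustrates the baseline FM only, not HoAFM's cross interaction layer or attention mechanism. The function names and the toy embedding values are assumptions.

```python
def fm_pairwise_naive(x, V):
    """Second-order FM term: sum over i < j of <V[i], V[j]> * x[i] * x[j].

    x: list of feature values; V: list of embedding vectors, one per feature.
    """
    total = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, V):
    """Same quantity in O(n * k) time, using the identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
    """
    k = len(V[0])
    total = 0.0
    for f in range(k):
        weighted = [V[i][f] * x[i] for i in range(len(x))]
        total += sum(weighted) ** 2 - sum(w * w for w in weighted)
    return 0.5 * total


# Hypothetical toy input: three features with 2-dimensional embeddings.
x_toy = [1.0, 0.0, 1.0]
V_toy = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
naive_value = fm_pairwise_naive(x_toy, V_toy)
fast_value = fm_pairwise_fast(x_toy, V_toy)
```

Both functions return the same value; the second form is what makes FM practical on sparse, high-dimensional CTR data, since it avoids enumerating all feature pairs.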
       
  • On detecting business event from the headlines and leads of massive online
           news articles
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s): Yu Qian, Xiongwen Deng, Qiongwei Ye, Baojun Ma, Hua Yuan. Abstract: Massive online news articles can be a good data resource for detecting information about business events, which may be useful in many real-world applications. In this paper, we propose a three-step “clustering-annotation-classification” strategy to extract high-quality information about business events from massive online news headlines and leads. To that end, we first use a word embedding method to represent all the terms in a corpus as word vectors, based on which we cluster the verbal terms into groups. Then, we have an expert annotate each group of terms with a corresponding business event. Finally, we use the extracted business event information as a classifier to detect potential events in online news headlines and leads. Evaluating our approach against several state-of-the-art classification algorithms, the results show that our approach offers competitive performance compared with the baselines in detecting business events from online news articles. Findings indicate that the verbal terms in the headlines of online news articles have a significant effect on identifying business events, improving the performance of our method on Recall and F-value. In contrast, the verbal terms in leads provide a more stable performance on Precision. As a result, combining the headline of an online news article with its lead is a viable option for detecting event information from massive online texts.
       
  • TDAM: A topic-dependent attention model for sentiment analysis
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s): Gabriele Pergola, Lin Gui, Yulan He. Abstract: We propose a topic-dependent attention model for sentiment classification and topic extraction. Our model assumes that a global topic embedding is shared across documents and employs an attention mechanism to derive local topic embedding for words and sentences. These are subsequently incorporated in a modified Gated Recurrent Unit (GRU) for sentiment classification and extraction of topics bearing different sentiment polarities. Those topics emerge from the words’ local topic embeddings learned by the internal attention of the GRU cells in the context of a multi-task learning framework. In this paper, we present the hierarchical architecture, the new GRU unit and the experiments conducted on users’ reviews, which demonstrate classification performance on a par with the state-of-the-art methodologies for sentiment classification and topic coherence outperforming the current approaches for supervised topic extraction. In addition, our model is able to extract coherent aspect-sentiment clusters despite using no aspect-level annotations for training.
       
  • Motivating scholars’ responses in academic social networking sites: An
           empirical study on ResearchGate Q&A behavior
    • Abstract: Publication date: Available online 18 July 2019. Source: Information Processing & Management. Author(s): Shengli Deng, Jingjing Tong, Yanqing Lin, Hongxiu Li, Yong Liu. Abstract: The advent of academic social networking sites (ASNS) has offered an unprecedented opportunity for scholars to obtain peer support online. However, little is known about the characteristics that make questions and answers popular among scholars on ASNS. Focusing on the statements embedded in questions and answers, this study explores the precursors that motivate scholars to respond, such as by reading, following, or recommending a question or an answer. We collected empirical data from ResearchGate and coded the data via the act4teams coding scheme. Our analysis revealed a threshold effect: when the length of a question description exceeds circa 150 words, scholars quickly lose interest and do not read the description. In addition, we found that questions that include positive action-oriented statements are more likely to attract subsequent reads from other scholars. Furthermore, scholars prefer to recommend an answer with positive procedural statements or negative action-oriented statements.
       
  • The reflection of offline activities on users’ online social
           behavior: An observational study
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s): Seyed Amin Mirlohi Falavarjani, Fattane Zarrinkalam, Jelena Jovanovic, Ebrahim Bagheri, Ali A. Ghorbani. Abstract: The ever increasing presence of online social networks in users’ daily lives has led to the interplay between users’ online and offline activities. There have already been several works that have studied the impact of users’ online activities on their offline behavior, e.g., the impact of interaction with friends on an exercise social network on the number of daily steps. In this paper, we consider the inverse to what has already been studied and report on our extensive study that explores the potential causal effects of users’ offline activities on their online social behavior. The objective of our work is to understand whether the activities that users are involved with in their real daily life, which place them within or away from social situations, have any direct causal impact on their behavior in online social networks. Our work is motivated by the theory of normative social influence, which argues that individuals may show behaviors or express opinions that conform to those of the community for the sake of being accepted or from fear of rejection or isolation. We have collected data from two online social networks, namely Twitter and Foursquare, and systematically aligned user content on both social networks. On this basis, we have performed a natural experiment that took the form of an interrupted time series with a comparison group design to study whether users’ socially situated offline activities exhibited through their Foursquare check-ins impact their online behavior captured through the content they share on Twitter. 
Our main findings can be summarised as follows: (1) a change in users’ offline behavior that affects the level of users’ exposure to social situations, e.g., starting to go to the gym or discontinuing frequenting bars, can have a causal impact on users’ online topical interests and sentiment; and (2) the causal relations between users’ socially situated offline activities and their online social behavior can be used to build effective predictive models of users’ online topical interests and sentiments.
       
  • Using weighted k-means to identify Chinese leading venture capital firms
           incorporating with centrality measures
    • Abstract: Publication date: Available online 16 July 2019. Source: Information Processing & Management. Author(s): Hu Yang, Jar-Der Luo, Ying Fan, Li Zhu. Abstract: Although identifying leading venture capital firms (VCs) is a meaningful challenge in the analysis of the Chinese investment market, this research topic is rarely mentioned in the relevant literature. Given the co-investment network of VCs, identifying leading VCs is equivalent to determining influential nodes in complex network analysis. Because single centrality measures and the multiple criteria decision analysis (MCDA) method each have disadvantages and limitations for identifying leading VCs, this paper incorporates several different centrality measures of the co-investment network of VCs and proposes a new approach based on weighted k-means to rank VCs at both group and individual levels and identify the leading VCs. The proposed approach not only shows alternative groupings based on multiple evaluation criteria, but also ranks them according to their comprehensive score, which is the weighted sum of these criteria. Empirical analysis shows the efficiency and practicability of the proposed approach for identifying leading Chinese VCs.
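The abstract above describes ranking by a comprehensive score that is a weighted sum of several centrality criteria. The sketch below illustrates only that scoring step, under stated assumptions: the min-max normalization, the criterion names, the weights, and the firm labels are all hypothetical, and the weighted k-means grouping itself is not shown.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_by_weighted_score(names, criteria, weights):
    """Rank items by the weighted sum of their normalized criteria.

    names: item labels; criteria: {criterion: [value per item]};
    weights: {criterion: weight}. Returns (name, score) pairs, best first.
    """
    normed = {c: min_max(vals) for c, vals in criteria.items()}
    scores = {
        name: sum(weights[c] * normed[c][i] for c in criteria)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical centrality values for three firms in a co-investment network.
firms = ["VC_A", "VC_B", "VC_C"]
centralities = {
    "degree": [10, 5, 0],
    "betweenness": [0.2, 0.4, 0.0],
}
criterion_weights = {"degree": 0.6, "betweenness": 0.4}
ranking = rank_by_weighted_score(firms, centralities, criterion_weights)
```

With these illustrative weights, the firm with the highest degree ranks first even though another firm has higher betweenness, which shows how the choice of weights drives the final ordering.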
       
  • Information needs of drug users on a local dark Web marketplace
    • Abstract: Publication date: Available online 15 July 2019. Source: Information Processing & Management. Author(s): Ari Haasio, J. Tuomas Harviainen, Reijo Savolainen. Abstract: This study examines the nature of context-sensitive information needs by focusing on the articulations of need for disnormative information among drug users. To this end, a sample of 9300 messages posted to Sipulitori, a Finnish dark web site, was examined by means of descriptive statistics and qualitative content analysis. The theoretical framework of the study was developed by drawing on Tom Wilson's idea of information need as a phenomenon fundamentally triggered by physiological, affective and cognitive factors indicating basic human needs. To examine the contextual features of needs for disnormative information, the study made use of Chatman’s theory of information poverty characteristic of small worlds and Savolainen's model for way of life. The findings indicate that about 72% of the information need topics related to drugs dealt with the usage, availability and price of narcotics. The articulations of drug-related information needs reflected the users' ways of life, dominated by the activities of buying, selling and using illegal narcotics. Drug-related information needs are typically triggered by physiological factors, because of the centrality of the physical dependence on drugs. Our study also revealed the simultaneous existence of physiological, affective and cognitive factors, especially in messages in which the information need was articulated in greater detail.
       
  • An interactive human centered data science approach towards crime pattern
           analysis
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s): Nadeem Qazi, B.L. William Wong. Abstract: Traditional machine learning systems lack a pathway for humans to integrate their domain knowledge into the underlying machine learning algorithms. The use of such systems in domains where decisions can have serious consequences (e.g. medical decision-making and crime analysis) requires the incorporation of human experts' domain knowledge. The challenge, however, is how to effectively combine domain expert knowledge with machine learning algorithms to develop effective models for better decision making. In crime analysis, the key challenge is to identify plausible linkages in unstructured crime reports for hypothesis formulation. Crime analysts painstakingly perform time-consuming searches of many different structured and unstructured databases to collate these associations without any proper visualization. To tackle these challenges and facilitate crime analysis, in this paper we examine unstructured crime reports through text mining to extract plausible associations. Specifically, we present an associative-questioning-based search model to elicit multi-level associations among crime entities. We couple this model with partition clustering to develop an interactive, human-assisted knowledge discovery and data mining scheme. The proposed human-centered knowledge discovery and data mining scheme for crime text mining is able to extract plausible associations between crimes, identify crime patterns, group similar crimes, and elicit co-offender networks and suspect lists based on spatial-temporal and behavioral similarity. These similarities are quantified by calculating Cosine, Jaccard, and Euclidean distances. Additionally, each suspect is ranked by a similarity score in the plausible suspect list. 
These associations are then visualized by creating a two-dimensional re-configurable crime cluster space along with a bipartite knowledge graph. The proposed scheme also examines the grand challenge of integrating effective human interaction with machine learning algorithms through a visualization feedback loop. It allows analysts to feed in their domain knowledge, including the choice of similarity functions for identifying associations, dynamic feature selection for interactive clustering of crimes, and the assignment of weights to each component of the crime pattern to rank suspects for an unsolved crime. We demonstrate the proposed scheme through a case study using an anonymized burglary dataset. The scheme is found to facilitate human reasoning and analytic discourse for intelligence analysis.
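The abstract above names cosine, Jaccard, and Euclidean measures for quantifying similarity between crimes. The following is a minimal sketch of those three standard measures; it is not the authors' scheme. The function names and the toy feature values (modus operandi tags and numeric vectors) are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Size of the intersection over size of the union of two collections."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here for two empty sets
    return len(set_a & set_b) / len(set_a | set_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical descriptors for two burglary reports.
tags_1 = {"rear_window", "night"}
tags_2 = {"rear_window", "daytime"}
tag_overlap = jaccard_similarity(tags_1, tags_2)
location_gap = euclidean_distance([0.0, 0.0], [3.0, 4.0])
profile_match = cosine_similarity([1.0, 2.0], [2.0, 4.0])
```

Jaccard suits set-valued attributes such as behavioral tags, while cosine and Euclidean measures suit numeric vectors such as coordinates or term weights; which one is appropriate for a given crime attribute is the kind of choice the abstract leaves to the analyst.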
       
  • Fast top-k similarity search in large dynamic attributed networks
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s): Zaiqiao Meng, Hong Shen. Abstract: In this paper, we study the problem of retrieving top-k nodes that are similar to a given query node in large dynamic attributed networks. To tackle this problem, we propose a fast Attribute augmented Single-source Path similarity algorithm (ASP). Our ASP constructs an attribute augmented network that integrates both node structure and attribute similarities through similarity scores computed by an efficient single-source path sampling scheme. It also contains simple and effective updating schemes to maintain similarity scores for dynamic edge insertions and deletions. We provide an upper bound of the sampling size of ASP for obtaining an ϵ-approximation estimation of similarity scores with probability at least 1−δ. We theoretically prove that the sampling size and computational complexity of ASP are significantly lower than those of the most effective currently known method, Panther. Implementation results from extensive experiments in both synthetic networks and real-world social networks show that our ASP achieves better convergence performance, runs faster than the baselines, and effectively captures both structure and attribute similarities of the nodes. Specifically, we show that our algorithm can return top-k similar nodes for any query node in the real-world network at least 44x faster than the state-of-the-art methods, and update similarity scores in about 1 second for a batch of network modifications in a network of size 1,000,000.
       
  • Tracking user-role evolution via topic modeling in community question
           answering
    • Abstract: Publication date: November 2019. Source: Information Processing & Management, Volume 56, Issue 6. Author(s): Chaogang Fu. Abstract: Community question answering (CQA) services that enable users to ask and answer questions are popular on the internet. Each user can simultaneously play the roles of asker and answerer. Some work has aimed to model the roles of users for potential applications in CQA. However, the dynamic characteristics of user roles have not been addressed. User roles vary over time. This paper explores user representation by tracking user-role evolution, which could enable several potential applications in CQA, such as question recommendation. We believe this paper is the first to track user-role evolution and investigate its influence on the performance of question recommendation in CQA. Moreover, we propose a time-aware role model (TRM) to effectively track user-role evolution. With different independence assumptions, two variants of TRM are developed. Finally, we present the TRM-based approach to question recommendation, which provides a mechanism to naturally integrate the user-role evolution with content relevance between the answerer and the question into a unified probabilistic framework. Experiments using real-world data from Stack Overflow show that (1) the TRM is valid for tracking user-role evolution, and (2) compared with baselines utilizing role-based methods, our TRM-based approach consistently and significantly improves the performance of question recommendation. Hence, our approach could enable several potential applications in CQA.
       
  • Information behavior and ICT use of Latina immigrants to the U.S. Midwest
    • Abstract: Publication date: Available online 13 July 2019Source: Information Processing & ManagementAuthor(s): Denice Adkins, Heather Moulaison SandyAbstractLatina immigrants to the U.S. Midwest are a vibrant, complex, and resilient population of women with intersectional identities stemming from their participation in at least three distinct but interrelated communities: (1) women [in a family-centric culture defined by strong gender roles], (2) immigrants [potentially with linguistic and socioeconomic status disadvantages] and (3) residents of the U.S. Midwest [a low-population/rural area with lesser access to resources and an increasingly xenophobic host community]. Given the potential for marginalization, Latina immigrants to the Midwest represent a population vulnerable to digital exclusion. The current research is the first to investigate systematically ICT use by immigrant Latinas to the U.S. Midwest. Specifically, as consumers and users of technology-mediated information, Latina immigrants to the U.S. Midwest navigate a complex and understudied social environment. To develop a strategy to begin to break down technology barriers for these women, first the complex and interconnected nature of their social environment and information practices needs to be understood; the current article presents that foundational research.
       
  • An extensive study on the evolution of context-aware personalized travel
           recommender systems
    • Abstract: Publication date: Available online 11 July 2019Source: Information Processing & ManagementAuthor(s): Shini Renjith, A. Sreekumar, M. JathavedanAbstractSince the beginning of civilization, travel for various purposes has been an essential part of human life, and so have travel recommendations, though the earliest recommendations were the accrued experiences shared by the community. Modern recommender systems evolved along with the growth of information technology and contribute to all industry and service segments, including travel and tourism. The journey started with generic recommender engines, which gave way to personalized recommender systems and further advanced to contextualized personalization with the advent of artificial intelligence. The current era is also witnessing a boom in social media usage, and social media big data acts as a critical input for various analytics, recommender systems included. This paper details a study of the evolution of travel recommender systems, their features, and their current limitations. We also discuss the key algorithms used for classification and recommendation, and the metrics that can be used to evaluate the performance of the algorithms and thereby the recommenders.
       
  • The interaction effects of online reviews and free samples on consumers’
           downloads: An empirical analysis
    • Abstract: Publication date: November 2019Source: Information Processing & Management, Volume 56, Issue 6Author(s): Li Shengli, Li FanAbstractConsumers’ software purchase decisions are influenced both by online reviews and by their experiences with free samples provided by firms. This paper empirically investigates the differential effects of online reviews (user and editor ratings) on consumers’ sample downloading behavior, using a dataset drawn from CNET.com, a large free software sampling website. Our findings extend the previous research by suggesting that the information disclosure levels of free samples (indicated by licenses) moderate the impacts of online reviews on consumers’ sample downloads. For samples that disclose a high level of information, higher user ratings can increase downloads; otherwise, higher user ratings fail to increase downloads. When both user and editor ratings are available to consumers, only user ratings can increase sample downloads. The findings can be explained by consumers’ two-stage information process, whereby consumers first refer to online reviews and then determine whether to sample software. This study provides practical implications for the design of information disclosure channels and offers suggestions for firms regarding how to select and apply sample licenses.
       
  • Exploratory study of cross-device search tasks
    • Abstract: Publication date: November 2019Source: Information Processing & Management, Volume 56, Issue 6Author(s): Dan Wu, Jing Dong, Chunxiang LiuAbstractCross-device search is an emerging subject in the study of information retrieval. This paper explores cross-device search behavior through the characteristics of cross-device search tasks. Unlike previous research on transaction log analysis, this paper extracted cross-device search tasks from descriptions of real-situation cross-device search experiences collected by a crowdsourcing survey targeting global users. A total of 343 valid responses were used for the content analysis, and the coding scheme was grounded in the Multiple Information Seeking Episodes (MISE) model, which was proposed for explaining successive multiple-episode search. Characteristics of cross-device search tasks were uncovered by coded categories of Topic, Type, Complexity of Knowledge Dimension, Complexity of Cognitive Dimension, Environment, Device Switch, and Switching Demand. The results show the most frequently searched topics are Arts, Shopping, Reference, and Computers. Task types focus on factual tasks, indicating a clear information need. Task complexity depends heavily on the user's cognition. Eight reasons for switching device are identified in understanding device switch demand. Finally, implications for designing cross-device search tasks are proposed based on the correlation among task attributes. Limitations on the degree to which respondents answered recall-based questions accurately have been acknowledged.
       
  • A Deep Look into neural ranking models for information retrieval
    • Abstract: Publication date: Available online 9 July 2019Source: Information Processing & ManagementAuthor(s): Jiafeng Guo, Yixing Fan, Liang Pang, Liu Yang, Qingyao Ai, Hamed Zamani, Chen Wu, W. Bruce Croft, Xueqi ChengAbstractRanking models lie at the heart of research on information retrieval (IR). During the past decades, different techniques have been proposed for constructing ranking models, from traditional heuristic methods, probabilistic methods, to modern machine learning methods. Recently, with the advance of deep learning technology, we have witnessed a growing body of work in applying shallow or deep neural networks to the ranking problem in IR, referred to as neural ranking models in this paper. The power of neural ranking models lies in the ability to learn from the raw text inputs for the ranking problem to avoid many limitations of hand-crafted features. Neural networks have sufficient capacity to model complicated tasks, which is needed to handle the complexity of relevance estimation in ranking. Since there have been a large variety of neural ranking models proposed, we believe it is the right time to summarize the current status, learn from existing methodologies, and gain some insights for future development. In contrast to existing reviews, in this survey, we will take a deep look into the neural ranking models from different dimensions to analyze their underlying assumptions, major design principles, and learning strategies. We compare these models through benchmark tasks to obtain a comprehensive empirical understanding of the existing techniques. We will also discuss what is missing in the current literature and what are the promising and desired future directions.
       
  • Class-aware tensor factorization for multi-relational classification
    • Abstract: Publication date: Available online 9 July 2019Source: Information Processing & ManagementAuthor(s): Georgios Katsimpras, Georgios PaliourasAbstractIn this paper, we propose a tensor factorization method, called CLASS-RESCAL, which associates the class labels of data samples with their latent representations. Specifically, we extend RESCAL to produce a semi-supervised factorization method that combines a classification error term with the standard factor optimization process. CLASS-RESCAL assimilates information from all the relations of the tensor, while also taking into account classification performance. This procedure forces the data samples within the same class to have similar latent representations. Experimental results on several real-world social network datasets indicate this is a promising approach for multi-relational classification tasks.
       
  • Language models and fusion for authorship attribution
    • Abstract: Publication date: November 2019Source: Information Processing & Management, Volume 56, Issue 6Author(s): Olga Fourkioti, Symeon Symeonidis, Avi ArampatzisAbstractWe deal with the task of authorship attribution, i.e., identifying the author of an unknown document, proposing the use of Part Of Speech (POS) tags as features for language modeling. The experimentation is carried out on corpora untypical for the task, i.e., with documents edited by non-professional writers, such as movie reviews or tweets. The former corpus is homogeneous with respect to the topic, making the task more challenging. The latter corpus puts language models into a framework of a continuously and rapidly evolving language, unique and noisy writing style, and limited length of social media messages. While we find that language models based on POS tags are competitive in only one of the corpora (movie reviews), they generally provide efficiency benefits and robustness against data sparsity. Furthermore, we experiment with model fusion, where language models based on different modalities are combined. By linearly combining three language models, based on characters, words, and POS trigrams, respectively, we achieve the best generalization accuracy of 96% on movie reviews, while the combination of language models based on characters and POS trigrams provides 54% accuracy on the Twitter corpus. In fusion, POS language models prove to be essential and effective components.
       
  • The evolution of argumentation mining: From models to social media and
           emerging tools
    • Abstract: Publication date: November 2019Source: Information Processing & Management, Volume 56, Issue 6Author(s): Anastasios Lytos, Thomas Lagkas, Panagiotis Sarigiannidis, Kalina BontchevaAbstractArgumentation mining is a rising subject in the computational linguistics domain focusing on extracting structured arguments from natural text, often from unstructured or noisy text. The initial approaches to modeling arguments aimed to identify a flawless argument in specific fields (Law, Scientific Papers) serving specific needs (completeness, effectiveness). With the emergence of Web 2.0 and the explosion in the use of social media, both the diffusion of the data and the argument structure have changed. In this survey article, we bridge the gap between theoretical approaches to argumentation mining and pragmatic schemes that satisfy the needs of social media generated data, recognizing the need for adopting more flexible and expandable schemes capable of adjusting to the argumentation conditions that exist in social media. We review, compare, and classify existing approaches, techniques and tools, identifying the positive outcome of combining tasks and features, and eventually propose a conceptual architecture framework. The proposed theoretical framework is an argumentation mining scheme able to identify the distinct sub-tasks and capture the needs of social media text, revealing the need for adopting more flexible and extensible frameworks.
       
  • On the negative impact of social influence in recommender systems: A study
           of bribery in collaborative hybrid algorithms
    • Abstract: Publication date: Available online 3 July 2019Source: Information Processing & ManagementAuthor(s): Guilherme Ramos, Ludovico Boratto, Carlos CaleiroAbstractRecommender systems are based on inherent forms of social influence. Indeed, suggestions are provided to the users based on the opinions of peers. Given the relevance that ratings have nowadays to push the sales of an item, sellers might decide to bribe users so that they rate or change the ratings given to items, thus increasing the sellers’ reputation. Hence, by exploiting the fact that influential users can lead an item to get recommended, bribing can become an effective way to negatively exploit social influence and introduce a bias in the recommendations. Given that bribing is forbidden but still employed by sellers, we propose a novel matrix completion algorithm that performs hybrid memory-based collaborative filtering using an approximation of Kolmogorov complexity. We also propose a framework to study the bribery effect and the bribery resistance of our approach. Our theoretical analysis, validated through experiments on real-world datasets, shows that our approach is an effective way to counter bribing while, with state-of-the-art algorithms, sellers can bribe a large part of the users.
       
  • Implicit information need as explicit problems, help, and behavioral
           signals
    • Abstract: Publication date: Available online 3 July 2019Source: Information Processing & ManagementAuthor(s): Shawon Sarkar, Matthew Mitsui, Jiqun Liu, Chirag ShahAbstractInformation need is one of the most fundamental aspects of information seeking, and is traditionally conceptualized as the initiation phase of an individual’s information seeking behavior. However, the very elusive and inexpressible nature of information need makes it hard to elicit from the information seeker or to extract through an automated process. One approach to understanding how a person realizes and expresses information need is to observe their seeking behaviors, to engage processes with information retrieval systems, and to focus on situated performative actions. Using Dervin’s Sense-Making theory and a conceptualization of information need based on existing studies, the work reported here tries to understand and explore the concept of information need from a fresh methodological perspective by examining users’ perceived barriers and desired help in different stages of information search episodes through the analysis of various implicit and explicit user search behaviors. In a controlled lab study, each participant performed three simulated online information search tasks. Participants’ implicit behaviors were collected through search logs, and explicit feedback was elicited through pre-task and post-task questionnaires. A total of 208 query segments were logged, along with users’ annotations on perceived problems and help. Data collected from the study were analyzed by applying both quantitative and qualitative methods. The findings identified several behaviors – such as the number of bookmarks, query length, number of unique queries, time spent on search results observed in the previous segment, the current segment, and throughout the session – strongly associated with participants’ perceived barriers and help needed. 
The findings also showed that it is possible to build accurate predictive models to infer perceived problems of articulation of queries, useless and irrelevant information, and unavailability of information from users’ previous segment, current segment, and whole session behaviors. The findings also demonstrated that by combining perceived problem(s) and search behavioral features, it was possible to infer users’ needed help(s) in search with a certain level of accuracy (78%).
       
  • Shaping the contours of fractured landscapes: Extending the layering of an
           information perspective on refugee resettlement
    • Abstract: Publication date: Available online 28 June 2019Source: Information Processing & ManagementAuthor(s): Annemaree LloydAbstractRefugee experience of resettlement into a third country is problematised by posing the question: what happens when an established information landscape fractures? Themes of disjuncture, intensification and liminality that have emerged from the author's research are described, using social theories as the analytical lens to shape the contours of fracture. Two other questions are posed: How is digital space implicated in rebuilding information landscapes that have become fractured? What is the role of technology in enabling or constraining the conditions for remaking place?
       
  • A multi-centrality index for graph-based keyword extraction
    • Abstract: Publication date: November 2019Source: Information Processing & Management, Volume 56, Issue 6Author(s): Didier A. Vega-Oliveros, Pedro Spoljaric Gomes, Evangelos E. Milios, Lilian BertonAbstractKeyword extraction aims to capture the main topics of a document and is an important step in natural language processing (NLP) applications. The use of different graph centrality measures has been proposed to extract automatic keywords. However, there is no consensus yet on how these measures compare in this task. Here, we present the multi-centrality index (MCI) approach, which aims to find the optimal combination of word rankings according to the selection of centrality measures. We analyze nine centrality measures (Betweenness, Clustering Coefficient, Closeness, Degree, Eccentricity, Eigenvector, K-Core, PageRank, Structural Holes) for identifying keywords in co-occurrence word-graphs representation of documents. We perform experiments on three datasets of documents and demonstrate that all individual centrality methods achieve similar statistical results, while the proposed MCI approach significantly outperforms the individual centralities, three clustering algorithms, and previously reported results in the literature.
       
  • A two phase investment game for competitive opinion dynamics in social
           networks
    • Abstract: Publication date: Available online 24 June 2019Source: Information Processing & ManagementAuthor(s): Swapnil Dhamal, Walid Ben-Ameur, Tijani Chahed, Eitan AltmanAbstractWe propose a setting for two-phase opinion dynamics in social networks, where a node’s final opinion in the first phase acts as its initial biased opinion in the second phase. In this setting, we study the problem of two camps aiming to maximize adoption of their respective opinions, by strategically investing on nodes in the two phases. A node’s initial opinion in the second phase naturally plays a key role in determining the final opinion of that node, and hence also of other nodes in the network due to its influence on them. However, more importantly, this bias also determines the effectiveness of a camp’s investment on that node in the second phase. In order to formalize this two-phase investment setting, we propose an extension of Friedkin–Johnsen model, and hence formulate the utility functions of the camps. We arrive at a decision parameter which can be interpreted as two-phase Katz centrality. There is a natural tradeoff while splitting the available budget between the two phases. A lower investment in the first phase results in worse initial biases in the network for the second phase. On the other hand, a higher investment in the first phase spares a lower available budget for the second phase, resulting in an inability to fully harness the influenced biases. We first analyze the non-competitive case where only one camp invests, for which we present a polynomial time algorithm for determining an optimal way to split the camp’s budget between the two phases. We then analyze the case of competing camps, where we show the existence of Nash equilibrium and that it can be computed in polynomial time under reasonable assumptions. 
We conclude our study with simulations on real-world network datasets, in order to quantify the effects of the initial biases and the weightage attributed by nodes to their initial biases, as well as that of a camp deviating from its equilibrium strategy. Our main conclusion is that, if nodes attribute high weightage to their initial biases, it is advantageous to have a high investment in the first phase, so as to effectively influence the biases to be harnessed in the second phase.
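
The Friedkin–Johnsen dynamics that the abstract above extends can be illustrated with a minimal Python sketch. It covers only the standard model, not the authors' two-phase investment extension: each node repeatedly mixes its fixed initial bias with the weighted average of its neighbours' current opinions. The function name, the `stubbornness` parameter (the weight a node gives to its initial bias), and the toy two-node network are assumptions made for illustration.

```python
def friedkin_johnsen(weights, initial, stubbornness, iterations=200):
    """Iterate the standard Friedkin-Johnsen opinion update.

    weights[i][j]   -- influence of node j on node i (rows assumed to sum to 1)
    initial[i]      -- initial (biased) opinion of node i
    stubbornness[i] -- weight in [0, 1] that node i gives to its initial opinion
    """
    n = len(initial)
    opinions = list(initial)
    for _ in range(iterations):
        opinions = [
            stubbornness[i] * initial[i]
            + (1.0 - stubbornness[i])
            * sum(weights[i][j] * opinions[j] for j in range(n))
            for i in range(n)
        ]
    return opinions


# Hypothetical toy network: two nodes that listen only to each other.
toy_weights = [[0.0, 1.0], [1.0, 0.0]]
toy_stubbornness = [0.5, 0.5]

# Phase one starts from the raw initial opinions.
phase_one = friedkin_johnsen(toy_weights, [1.0, -1.0], toy_stubbornness)
# Phase two reuses the phase-one outcome as the new initial bias.
phase_two = friedkin_johnsen(toy_weights, phase_one, toy_stubbornness)
```

Feeding the phase-one result back in as the initial bias mirrors the two-phase setting described in the abstract; the camps' investments, which the paper adds to the model, are omitted here.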
       
  • Crime base: Towards building a knowledge base for crime entities and their
           relationships from online news papers
    • Abstract: Publication date: November 2019Source: Information Processing & Management, Volume 56, Issue 6Author(s): Srinivasa K, P. Santhi ThilagamAbstractIn the current era of the internet, information related to crime is scattered across many sources, namely news media, social networks, blogs, video repositories, etc. Crime reports published in online newspapers are often considered reliable compared to crowdsourced data such as social media, and contain crime information not only in the form of unstructured text but also in the form of images. Given the volume and availability of crime-related information present in online newspapers, gathering and integrating crime entities from multiple modalities and representing them as a knowledge base in machine-readable form will be useful for law enforcement agencies to analyze and prevent criminal activities. Extant research on generating crime knowledge bases does not address the extraction of all non-redundant entities from text and image data present in multiple newspapers. Hence, this work proposes Crime Base, an entity-relationship-based system to extract and integrate crime-related text and image data from online newspapers, with a focus on reducing duplication and loss of information in the knowledge base. The proposed system uses a rule-based approach to extract the entities from text and image captions. The entities extracted from text data are correlated using contextual as well as semantic similarity measures, and image entities are correlated using low-level and high-level image features. The proposed system also presents an integrated view of these entities and their relations in the form of a knowledge base using OWL. The system is tested on a collection of crime-related articles from popular Indian online newspapers.
       
  • Evidential fine-grained event localization using Twitter
    • Abstract: Publication date: November 2019Source: Information Processing & Management, Volume 56, Issue 6Author(s): Zahra Khodabandeh Shahraki, Afsaneh Fatemi, Hadi Tabatabaee MalaziAbstractThe widespread popularity and worldwide application of social networks have raised interest in the analysis of content created on the networks. One such analytical application and aspect of social networks, including Twitter, is identifying the location of various political and social events, natural disasters and so on. The present study focuses on the localization of traffic accidents. Outdated and inaccurate information in user profiles, the absence of location data in tweet texts, and the limited number of geotagged posts are among the challenges tackled by location estimation. Adopting the Dempster–Shafer Evidence Theory, the present study estimates the location of accidents using a combination of user profiles, tweet texts, and the place attachments in tweets. The results indicate improved performance regarding error distance and average error distance compared to previously developed methods. The proposed method in this study resulted in a reduced error distance of 26%.
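
To make the evidence-combination idea in the abstract above concrete, here is a minimal Python sketch of Dempster's rule of combination, the core operation of Dempster–Shafer theory. It is not the authors' implementation: the function name, the place names, and the mass values are hypothetical. Each evidence source assigns mass to sets of candidate locations, and mass on the full set expresses ignorance.

```python
def combine_evidence(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass;
    the masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for set_a, mass_a in m1.items():
        for set_b, mass_b in m2.items():
            common = set_a & set_b
            if common:
                combined[common] = combined.get(common, 0.0) + mass_a * mass_b
            else:
                # Mass assigned to contradictory hypotheses.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("evidence sources are in total conflict")
    # Renormalise by the non-conflicting mass.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


# Hypothetical evidence about where an accident happened.
DOWNTOWN = frozenset({"downtown"})
AIRPORT = frozenset({"airport"})
EITHER = DOWNTOWN | AIRPORT

from_profile = {DOWNTOWN: 0.6, EITHER: 0.4}   # user profile location
from_text = {AIRPORT: 0.5, EITHER: 0.5}       # place names in the tweet text
fused = combine_evidence(from_profile, from_text)
```

A third source, such as the place attachment of a tweet, could be folded in by calling `combine_evidence` again on `fused`.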
       
  • Extended co-citation search: Graph-based document retrieval on a
           co-citation network containing citation context information
    • Abstract: Publication date: November 2019Source: Information Processing & Management, Volume 56, Issue 6Author(s): Masaki EtoAbstractThis study proposes a novel extended co-citation search technique, which is graph-based document retrieval on a co-citation network containing citation context information. The proposed search expands the scope of the target documents by repetitively spreading the relationship of co-citation in order to obtain relevant documents that are not identified by traditional co-citation searches. Specifically, this search technique is a combination of (a) applying a graph-based algorithm to compute the similarity score on a complicated network, and (b) incorporating co-citation contexts into the process of calculating similarity scores to reduce the negative effects of an increasing number of irrelevant documents. To evaluate the search performance of the proposed search, 10 proposed methods (five representative graph-based algorithms applied to co-citation networks weighted with/without contexts) are compared with two kinds of baselines (a traditional co-citation search with/without contexts) in information retrieval experiments based on two test collections (biomedicine and computer linguistic articles). The experimental results showed that the normalized discounted cumulative gain (nDCG@K) scores of the proposed methods using co-citation contexts tended to be higher than those of the baselines. In addition, the combination of the random walk with restart (RWR) algorithm and the network weighted with contexts achieved the best search performance among the 10 proposed methods. Thus, the results show that the combination of graph-based algorithms and co-citation contexts is effective in improving the performance of co-citation search techniques, and that the sole use of a graph-based algorithm is not enough to improve search performance over the baselines.
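
The random walk with restart (RWR) algorithm named in the abstract above can be sketched in a few lines of Python. This is a generic power-iteration version, not the paper's implementation: the function name, the restart probability of 0.15, and the toy co-citation graph are assumptions, and the edge weights stand in for whatever context-based weights the network would carry.

```python
def random_walk_with_restart(graph, seed, restart=0.15, iterations=100):
    """Score every node by its proximity to `seed`.

    graph[u][v] is the weight of the edge u -> v. Every node is assumed
    to appear as a key of `graph`, even if it has no outgoing edges.
    """
    scores = {node: 0.0 for node in graph}
    scores[seed] = 1.0
    for _ in range(iterations):
        updated = {node: 0.0 for node in graph}
        for node, neighbours in graph.items():
            total = sum(neighbours.values())
            if total == 0.0:
                # Dangling node: send its walking mass back to the seed.
                updated[seed] += (1.0 - restart) * scores[node]
                continue
            for target, weight in neighbours.items():
                updated[target] += (1.0 - restart) * scores[node] * weight / total
        # The scores sum to 1, so the restart step adds `restart` to the seed.
        updated[seed] += restart
        scores = updated
    return scores


# Hypothetical co-citation network: doc_a and doc_b are co-cited more
# strongly than doc_b and doc_c; doc_a and doc_c are never co-cited.
co_citations = {
    "doc_a": {"doc_b": 2.0},
    "doc_b": {"doc_a": 2.0, "doc_c": 1.0},
    "doc_c": {"doc_b": 1.0},
}
proximity = random_walk_with_restart(co_citations, seed="doc_a")
ranking = sorted((d for d in proximity if d != "doc_a"),
                 key=proximity.get, reverse=True)
```

In this toy graph, `doc_c` receives a non-zero score even though it is never directly co-cited with the query document, which is the kind of expansion beyond traditional co-citation search that the abstract describes.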
       
  • Fuzzy topic modeling approach for text mining over short text
    • Abstract: Publication date: November 2019Source: Information Processing & Management, Volume 56, Issue 6Author(s): Junaid Rashid, Syed Muhammad Adnan Shah, Aun IrtazaAbstractIn this era, the proliferating role of social media in our lives has popularized the posting of short texts. Short texts contain limited context and have unique characteristics, which makes them difficult to handle. Every day, billions of short texts are produced in the form of tags, keywords, tweets, phone messages, messenger conversations, social network posts, etc. The analysis of these short texts is imperative in the field of text mining and content analysis. The extraction of precise topics from large-scale short text documents is a critical and challenging task. The conventional approaches fail to obtain word co-occurrence patterns in topics due to the sparsity problem in short texts, such as text over the web, social media like Twitter, and news headlines. Therefore, in this paper, the sparsity problem is ameliorated by presenting a novel fuzzy topic modeling (FTM) approach for short text from a fuzzy perspective. In this research, the local and global term frequencies are computed through a bag-of-words (BOW) model. To remove the negative impact of high dimensionality on the global term weighting, principal component analysis is adopted; thereafter, the fuzzy c-means algorithm is employed to retrieve the semantically relevant topics from the documents. The experiments are conducted over three real-world short text datasets: the snippets dataset is a small dataset, whereas the other two datasets, Twitter and questions, are larger. Experimental results show that the proposed approach discovered the topics more precisely and performed better than other state-of-the-art baseline topic models such as GLTM, CSTM, LTM, LDA, Mix-gram, BTM, SATM, and DREx+LDA. 
The performance of FTM is also demonstrated in classification, clustering, topic coherence and execution time. FTM classification accuracy is 0.95, 0.94, 0.91, 0.89 and 0.87 on the snippets dataset with 50, 75, 100, 125 and 200 topics, respectively. The classification accuracy of FTM on the questions dataset is 0.73, 0.74, 0.70, 0.68 and 0.78 with 50, 75, 100, 125 and 200 topics, respectively. The classification accuracies of FTM on the snippets and questions datasets are higher than those of state-of-the-art baseline topic models.
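
The fuzzy c-means step mentioned in the abstract above assigns each document a graded membership in every cluster (topic) rather than a single label. The Python sketch below shows only the standard membership formula for one point, under the usual Euclidean-distance assumption; it is not the FTM pipeline, and the function name, the fuzzifier value of 2, and the toy coordinates are illustrative assumptions.

```python
import math


def fuzzy_memberships(point, centers, fuzzifier=2.0):
    """Return the fuzzy c-means membership of `point` in each cluster.

    Uses u_i = 1 / sum_k (d_i / d_k) ** (2 / (fuzzifier - 1)), where d_i is
    the Euclidean distance from the point to cluster center i.
    """
    distances = [math.dist(point, center) for center in centers]
    # A point sitting exactly on one or more centers belongs fully to them.
    on_center = [d == 0.0 for d in distances]
    if any(on_center):
        share = 1.0 / sum(on_center)
        return [share if hit else 0.0 for hit in on_center]
    exponent = 2.0 / (fuzzifier - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# Hypothetical document vector (e.g. after dimensionality reduction)
# and two hypothetical topic centers.
document = (0.0, 0.0)
topic_centers = [(1.0, 0.0), (3.0, 0.0)]
memberships = fuzzy_memberships(document, topic_centers)
```

The memberships always sum to 1, and the closer center receives the larger share, so a short text can contribute to several topics at once.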
       
  • Churn modeling with probabilistic meta paths-based representation learning
    • Abstract: Publication date: Available online 22 June 2019Source: Information Processing & ManagementAuthor(s): Sandra Mitrović, Jochen De WeerdtAbstractFinding structural and efficient ways of leveraging available data is not an easy task, especially when dealing with network data, as is the case in telco churn prediction. Several previous works have made advancements in this direction both from the perspective of churn prediction, by proposing augmented call graph architectures, and from the perspective of graph featurization, by proposing different graph representation learning methods, frequently exploiting random walks. However, both graph augmentation as well as representation learning-based featurization face drawbacks. In this work, we first shift the focus from a homogeneous to a heterogeneous perspective, by defining different probabilistic meta paths on augmented call graphs. Secondly, we focus on solutions for the usually significant number of random walks that graph representation learning methods require. To this end, we propose a sampling method for random walks based on a combination of most suitable random walk generation strategies, which we determine with the help of corresponding Markov models. In our experimental evaluation, we demonstrate the benefits of probabilistic meta path-based walk generation in terms of predictive power. In addition, this paper provides promising insights regarding the interplay of the type of meta path and the predictive outcome, as well as the potential of sampling random walks based on the meta path structure in order to alleviate the computational requirements of representation learning by reducing typically sizable required data input.
       
  • Compact group discovery in attributed graphs and social networks
    • Abstract: Publication date: Available online 22 June 2019Source: Information Processing & ManagementAuthor(s): Abeer Khan, Lukasz Golab, Mehdi Kargar, Jaroslaw Szlichta, Morteza ZihayatAbstractSocial networks and many other graphs are attributed, meaning that their nodes are labelled with textual information such as personal data, expertise or interests. In attributed graphs, a common data analysis task is to find subgraphs whose nodes contain a given set of keywords. In many applications, the size of the subgraph should be limited (i.e., a subgraph with thousands of nodes is not desired). In this work, we introduce the problem of compact attributed group (AG) discovery. Given a set of query keywords and a desired solution size, the task is to find subgraphs with the desired number of nodes, such that the nodes are closely connected and each node contains as many query keywords as possible. We prove that finding an optimal solution is NP-hard and we propose approximation algorithms with a guaranteed ratio of two. Since the number of qualifying AGs may be large, we also show how to find approximate top-k AGs with polynomial delay. Finally, we experimentally verify the effectiveness and efficiency of our techniques on real-world graphs.
       
  • Fine-grained tourism prediction: Impact of social and environmental
           features
    • Abstract: Publication date: Available online 21 June 2019Source: Information Processing & ManagementAuthor(s): Amir Khatibi, Fabiano Belém, Ana Paula Couto da Silva, Jussara M. Almeida, Marcos A. GonçalvesAbstractAccurate prediction of future events is essential in many areas, one of them being the tourism industry. Usually, cities and countries invest a huge amount of money in planning and preparation in order to welcome (and profit from) tourists. The success of many businesses depends largely or totally on the state of tourism demand. Estimation of tourism demand can be helpful to business planners in reducing the risk of decisions regarding the future, since tourism products are, generally speaking, perishable (gone if not used). Prior studies in this domain focus on forecasting for a whole country and not for fine-grained areas within a country (e.g., specific touristic attractions), mainly because of a lack of data. Our article tackles exactly this issue. With the rapid growth in popularity of social media applications, each year more people use online resources to plan and comment on their trips. Motivated by this observation, we suggest that accessible data in online social networks or travel websites, in addition to environmental data, can be used to support the inference of visitation counts for either indoor or outdoor touristic attractions. To test our hypothesis, we analyze visitation counts, environmental features and social media data related to 27 museums and galleries in the U.K. as well as 76 national parks in the U.S. Our experimental results reveal high accuracy levels (above 92%) for predicting tourism demand using features from both social media and environmental data. We also show that, for outdoor attractions, environmental features have better predictive power, while the opposite occurs for indoor attractions. In any case, the best results, in all scenarios, are obtained when using both types of features jointly. 
Finally, we perform a detailed failure analysis to inspect the cases in which the prediction results are not satisfactory.
       
  • User community detection via embedding of social network structure and
           temporal content
    • Abstract: Publication date: Available online 21 June 2019. Source: Information Processing & Management. Author(s): Hossein Fani, Eric Jiang, Ebrahim Bagheri, Feras Al-Obeidat, Weichang Du, Mehdi Kargar. Abstract: Identifying and extracting user communities is an important step towards understanding social network dynamics from a macro perspective. For this reason, the work in this paper explores various aspects related to the identification of user communities. To date, user community detection methods employ either explicit links between users (link analysis), or users’ topics of interest in posted content (content analysis), or both in tandem. Little work has considered temporal evolution when identifying user communities, so as to group together those users who share not only similar topical interests but also similar temporal behavior towards their topics of interest. In this paper, we identify user communities through multimodal feature learning (embeddings). Our core contributions are as follows: (a) we propose a new method for learning neural embeddings for users based on their temporal content similarity; (b) we learn user embeddings based on their social network connections (links) through neural graph embeddings; (c) we systematically interpolate temporal content-based embeddings and social link-based embeddings to capture both social network connections and temporal content evolution for representing users; and (d) we systematically evaluate the quality of each embedding type in isolation and also when interpolated together, and demonstrate their performance on a Twitter dataset under two different application scenarios, namely news recommendation and user prediction. We find that (1) content-based methods produce higher quality communities compared to link-based methods; (2) methods that consider temporal evolution of content, our proposed method in particular, show better performance compared to their non-temporal counterparts; (3) communities that are produced when time is explicitly incorporated in user vector representations have higher quality than the ones produced when time is incorporated into a generative process; and finally (4) while link-based methods are weaker than content-based methods, their interpolation with content-based methods leads to improved quality of the identified communities.
       
  • Boosted seed oversampling for local community ranking
    • Abstract: Publication date: Available online 18 June 2019. Source: Information Processing & Management. Author(s): Emmanouil Krasanakis, Emmanouil Schinas, Symeon Papadopoulos, Yiannis Kompatsiaris, Andreas Symeonidis. Abstract: Local community detection is an emerging topic in network analysis that aims to detect well-connected communities encompassing sets of previously known seed nodes. In this work, we explore the similar problem of ranking network nodes based on their relevance to the communities characterized by seed nodes. However, seed nodes may not be central enough or numerous enough to produce high quality ranks. To solve this problem, we introduce a methodology we call seed oversampling, which first runs a node ranking algorithm to discover more nodes that belong to the community and then reruns the same ranking algorithm for the new seed nodes. We formally discuss why this process improves the quality of calculated community ranks if the original set of seed nodes is small, and introduce a boosting scheme that iteratively repeats seed oversampling to further improve rank quality when certain ranking algorithm properties are met. Finally, we demonstrate the effectiveness of our methods in improving community relevance ranks given only a few random seed nodes of real-world network communities. In our experiments, boosted and simple seed oversampling yielded better rank quality than the previous neighborhood inflation heuristic, which adds the neighborhoods of original seed nodes to seeds.
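The two-pass idea described in this abstract (rank once, promote the top-ranked nodes to seeds, then rank again) can be sketched roughly as follows. This is only an illustration under assumptions: personalized PageRank is used here as the node ranking algorithm, and the function names and the `extra` parameter are hypothetical, not taken from the article. The boosting scheme is not shown.

```python
def personalized_pagerank(adj, seeds, alpha=0.85, iters=100):
    """Power-iteration personalized PageRank.

    adj maps each node to the set of its neighbors (undirected graph,
    every node assumed to have at least one neighbor).
    """
    nodes = list(adj)
    # Restart distribution: uniform over the seed nodes.
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {v: (1.0 - alpha) * restart[v] for v in nodes}
        for u in nodes:
            share = alpha * rank[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        rank = nxt
    return rank


def seed_oversampling(adj, seeds, extra=2):
    """Rank once, add the `extra` best non-seed nodes to the seeds, rank again.

    `extra` is a hypothetical knob for this sketch; the article's own
    selection rule may differ.
    """
    first = personalized_pagerank(adj, seeds)
    candidates = sorted(
        (v for v in adj if v not in seeds),
        key=lambda v: (-first[v], v),  # highest rank first, ties by label
    )
    expanded = set(seeds) | set(candidates[:extra])
    return personalized_pagerank(adj, expanded), expanded


if __name__ == "__main__":
    # Two 4-node cliques joined by the single edge 3-4.
    graph = {
        0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
        4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6},
    }
    ranks, expanded = seed_oversampling(graph, {0}, extra=2)
    print(sorted(expanded))
    print(sorted(ranks, key=ranks.get, reverse=True))
```

Starting from a single seed in the first clique, the expanded seed set stays inside that clique, so the second pass restarts from several community members instead of one.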
       
  • Eating healthier: Exploring nutrition information for healthier recipe
           recommendation
    • Abstract: Publication date: Available online 3 June 2019. Source: Information Processing & Management. Author(s): Meng Chen, Xiaoyi Jia, Elizabeth Gorbonos, Chnh T. Hong, Xiaohui Yu, Yang Liu. Abstract: With the boom in personalized recipe sharing networks (e.g., Yummly), a deluge of recipes from different cuisines can be obtained easily. In this paper, we aim to solve a problem that many home cooks encounter when searching for recipes online: finding recipes that best fit the ingredients at hand while also following healthy eating guidelines. This task is especially difficult since the lion’s share of online recipes have been shown to be unhealthy. We propose a novel framework named NutRec, which models the interactions between ingredients and their proportions within recipes for the purpose of offering healthy recommendations. Specifically, NutRec consists of three main components: 1) using an embedding-based ingredient predictor to predict the relevant ingredients from user-defined initial ingredients, 2) predicting the amounts of the relevant ingredients with a multi-layer perceptron-based network, and 3) creating a healthy pseudo-recipe with a list of ingredients and their amounts according to the nutritional information, and recommending the recipes most similar to the pseudo-recipe. We conduct experiments on two recipe datasets: Allrecipes with 36,429 recipes and Yummly with 89,413 recipes. The empirical results support the framework’s intuition and showcase its ability to retrieve healthier recipes.
       
  • Hierarchical neural query suggestion with an attention mechanism
    • Abstract: Publication date: Available online 18 May 2019. Source: Information Processing & Management. Author(s): Wanyu Chen, Fei Cai, Honghui Chen, Maarten de Rijke. Abstract: Query suggestions help users of a search engine to refine their queries. Previous work on query suggestion has mainly focused on incorporating directly observable features such as query co-occurrence and semantic similarity. The structure of such features is often set manually, as a result of which hidden dependencies between queries and users may be ignored. We propose an Attention-based Hierarchical Neural Query Suggestion (AHNQS) model that uses an attention mechanism to automatically capture user preferences. AHNQS combines a session-level neural network and a user-level neural network into a hierarchical structure to model the short- and long-term search history of a user. We quantify the improvements of AHNQS over state-of-the-art recurrent neural network-based query suggestion baselines on the AOL query log dataset, with improvements of up to 9.66% and 12.51% in terms of Recall@10 and MRR@10, respectively; improvements are especially obvious for short sessions and inactive users with few search sessions.
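For readers unfamiliar with the two metrics named in this abstract, the following is a minimal sketch of Recall@k and MRR@k under a common convention: each test case has exactly one target query and a ranked list of suggestions. That single-target setup and the function names are assumptions for illustration; the article's exact evaluation protocol may differ.

```python
def recall_at_k(ranked_lists, targets, k=10):
    """Fraction of cases whose target appears in the top-k suggestions."""
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return hits / len(targets)


def mrr_at_k(ranked_lists, targets, k=10):
    """Mean reciprocal rank of the target, counting 0 if it is not in the top k."""
    total = 0.0
    for ranked, t in zip(ranked_lists, targets):
        top = ranked[:k]
        if t in top:
            total += 1.0 / (top.index(t) + 1)  # ranks are 1-based
    return total / len(targets)


if __name__ == "__main__":
    # Toy data: three sessions, each with a ranked suggestion list and one target.
    suggestions = [["a", "b", "c"], ["x", "y", "z"], ["p", "q"]]
    targets = ["b", "z", "r"]
    print(recall_at_k(suggestions, targets, k=2))
    print(mrr_at_k(suggestions, targets, k=3))
```

Recall@k only records whether the target was retrieved, while MRR@k also rewards placing it near the top of the list, which is why the two are usually reported together.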
       
  • The rise and fall of network stars: Analyzing 2.5 million graphs to reveal
           how high-degree vertices emerge over time
    • Abstract: Publication date: Available online 16 May 2019. Source: Information Processing & Management. Author(s): Michael Fire, Carlos Guestrin. Abstract: Trends change rapidly in today’s world, prompting this key question: What is the mechanism behind the emergence of new trends? By representing real-world dynamic systems as complex networks, the emergence of new trends can be symbolized by vertices that “shine.” That is, at a specific time interval in a network’s life, certain vertices become increasingly connected to other vertices. This process creates new high-degree vertices, i.e., network stars. Thus, to study trends, we must look at how networks evolve over time and determine how the stars behave. In our research, we constructed the largest publicly available network evolution dataset to date, which contains 38,000 real-world networks and 2.5 million graphs. Then, we performed the first precise wide-scale analysis of the evolution of networks with various scales. Three primary observations resulted: (a) links are most prevalent among vertices that join a network at a similar time; (b) the rate at which new vertices join a network is a central factor in molding a network’s topology; and (c) the emergence of network stars (high-degree vertices) is correlated with fast-growing networks. We applied our learnings to develop a flexible network-generation model based on large-scale, real-world data. This model gives a better understanding of how stars rise and fall within networks, and is applicable to dynamic systems both in nature and society.
       
  • An evaluation of document clustering and topic modelling in two online
           social networks: Twitter and Reddit
    • Abstract: Publication date: Available online 17 April 2019. Source: Information Processing & Management. Author(s): Stephan A. Curiskis, Barry Drake, Thomas R. Osborn, Paul J. Kennedy. Abstract: Methods for document clustering and topic modelling in online social networks (OSNs) offer a means of categorising, annotating and making sense of large volumes of user generated content. Many techniques have been developed over the years, ranging from text mining and clustering methods to latent topic models and neural embedding approaches. However, many of these methods deliver poor results when applied to OSN data as such text is notoriously short and noisy, and often results are not comparable across studies. In this study we evaluate several techniques for document clustering and topic modelling on three datasets from Twitter and Reddit. We benchmark four different feature representations derived from term-frequency inverse-document-frequency (tf-idf) matrices and word embedding models combined with four clustering methods, and we include a Latent Dirichlet Allocation topic model for comparison. Several different evaluation measures are used in the literature, so we provide a discussion and recommendation for the most appropriate extrinsic measures for this task. We also demonstrate the performance of the methods over data sets with different document lengths. Our results show that clustering techniques applied to neural embedding feature representations delivered the best performance over all data sets using appropriate extrinsic evaluation measures. We also demonstrate a method for interpreting the clusters with a top-words based approach using tf-idf weights combined with embedding distance measures.
       
  • Neural opinion dynamics model for the prediction of user-level stance
           dynamics
    • Abstract: Publication date: Available online 29 March 2019. Source: Information Processing & Management. Author(s): Lixing Zhu, Yulan He, Deyu Zhou. Abstract: Social media platforms allow users to express their opinions on various topics online. Oftentimes, users’ opinions are not static, but may change over time under the influence of their neighbors in social networks, or be updated in response to arguments that undermine their beliefs. In this paper, we propose to use a Recurrent Neural Network (RNN) to model each user’s posting behaviors on Twitter and incorporate their neighbors’ topic-associated context as attention signals using an attention mechanism for user-level stance prediction. Moreover, our proposed model operates in an online setting in that its parameters are continuously updated with the Twitter stream data and can be used to predict a user’s topic-dependent stance. Detailed evaluation on two Twitter datasets, related to Brexit and the US General Election, demonstrates the superior performance of our neural opinion dynamics model over both static and dynamic alternatives for user-level stance prediction.
       
  • Detecting breaking news rumors of emerging topics in social media
    • Abstract: Publication date: Available online 28 February 2019. Source: Information Processing & Management. Author(s): Sarah A. Alkhodair, Steven H.H. Ding, Benjamin C.M. Fung, Junqiang Liu. Abstract: Users of social media websites tend to rapidly spread breaking news and trending stories without considering their truthfulness. This facilitates the spread of rumors through social networks. A rumor is a story or statement whose truthfulness has not been verified. Efficiently detecting and acting upon rumors throughout social networks is of high importance to minimizing their harmful effect. However, detecting them is not a trivial task, because they belong to unseen topics or events that are not covered in the training dataset. In this paper, we study the problem of detecting breaking news rumors, instead of long-lasting rumors, that spread in social media. We propose a new approach that jointly learns word embeddings and trains a recurrent neural network with two different objectives to automatically identify rumors. The proposed strategy is simple but effective in mitigating the topic shift issue. Emerging rumors do not have to be false at the time of detection; they can later be deemed true or false. However, most previous studies on rumor detection focus on long-standing rumors and assume that rumors are always false. In contrast, our experiment simulates a cross-topic emerging rumor detection scenario with a real-life rumor dataset. Experimental results suggest that our proposed model outperforms state-of-the-art methods in terms of precision, recall, and F1.
       
 
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
Fax: +00 44 (0)131 4513327
 