Information Processing & Management
Journal Prestige (SJR): 0.92
Citation Impact (CiteScore): 4
Number of Followers: 595
Hybrid journal (can contain Open Access articles)
ISSN: 0306-4573
Published by Elsevier
  • Heuristics for interesting class association rule mining in a colorectal
           cancer database
    • Abstract: Publication date: May 2020. Source: Information Processing & Management, Volume 57, Issue 3. Author(s): José A. Delgado-Osuna, Carlos García-Martínez, José Gómez-Barbadillo, Sebastián Ventura.
      Colorectal cancer affects many people and is one of the most frequent causes of cancer-related death in many countries. Professionals at the Reina Sofia University Hospital have maintained a database on this pathology, with 1516 patients and 126 attributes, for more than 10 years. Finding useful knowledge in it has proven to be a difficult endeavor. We present four heuristic operators and a complete methodology for searching for interesting rules that describe cases with complications and recurrences. Our proposal shows advantages over the well-known Apriori algorithm for class association rule mining, as well as over adaptations of three representative associative classification methods. Moreover, it has allowed us to identify rules of practical interest among the vast number of trivial and sporadic associations.
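To make the notion of class association rules concrete, here is a minimal, self-contained sketch of support/confidence-based rule mining over toy transactional data. The items and labels are hypothetical, and this brute-force enumeration stands in for, rather than reproduces, the authors' heuristic operators:

```python
from itertools import combinations

def mine_class_rules(records, labels, min_support=0.3, min_confidence=0.7):
    """Mine class association rules {items} -> label from transactional data.

    records: list of sets of attribute-value items; labels: parallel class labels.
    A rule is kept if its support and confidence clear the given thresholds.
    """
    n = len(records)
    rules = []
    items = sorted({i for r in records for i in r})
    for size in (1, 2):  # antecedents of one or two items, for brevity
        for antecedent in combinations(items, size):
            ant = set(antecedent)
            covered = [lab for r, lab in zip(records, labels) if ant <= r]
            if len(covered) / n < min_support:
                continue
            for label in set(covered):
                conf = covered.count(label) / len(covered)
                if conf >= min_confidence:
                    rules.append((ant, label, len(covered) / n, conf))
    return rules

# Toy example: attribute items -> complication class (hypothetical data)
records = [{"a", "b"}, {"a"}, {"a", "b"}, {"b"}, {"a", "b"}]
labels = ["yes", "no", "yes", "no", "yes"]
for ant, lab, sup, conf in mine_class_rules(records, labels):
    print(sorted(ant), "->", lab, round(sup, 2), round(conf, 2))
```

On this toy data the rule {a, b} -> yes reaches confidence 1.0, illustrating why thresholds alone still admit many trivial rules on real data.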
  • Efficient generation of spatiotemporal relationships from spatial data
           streams and static data
    • Abstract: Publication date: May 2020. Source: Information Processing & Management, Volume 57, Issue 3. Author(s): Sungkwang Eom, Xiongnan Jin, Kyong-Ho Lee.
      Recently, massive amounts of position-annotated data have been generated in a stream fashion, and massive amounts of static data including spatial features have been collected and made available. In Internet of Things (IoT) environments, various applications can benefit from utilizing spatial data streams together with static data. IoT applications therefore typically require stream processing and reasoning capabilities that extract information from low-level data. For sophisticated stream processing and reasoning in particular, spatiotemporal relationships (SRs) between spatial data streams and static data must be generated first. However, existing techniques mostly focus either on direct processing of sensing data or on generating spatial relationships from static data alone. In this paper, we first discuss the importance of SRs between spatial data streams and static data, and then propose an efficient approach for deriving SRs in real time. We design a novel R-tree-based index with Representative Rectangles (RRs), called the R3 index, and devise an algorithm that leverages the relationships and distances between RRs to generate SRs. Experiments on real-world datasets confirm the effectiveness and efficiency of the proposed approach.
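A spatial relationship between two representative rectangles ultimately reduces to topological tests on bounding boxes. A minimal sketch of such a test, with illustrative coordinates (not the paper's R3 index, which organizes these tests hierarchically):

```python
def rect_relation(a, b):
    """Coarse spatial relationship between axis-aligned rectangles given as
    (xmin, ymin, xmax, ymax): 'disjoint', 'contains', 'within', or 'overlaps'."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # No overlap on either axis means the rectangles are disjoint.
    if ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0:
        return "disjoint"
    # a fully encloses b, or vice versa.
    if ax0 <= bx0 and ay0 <= by0 and bx1 <= ax1 and by1 <= ay1:
        return "contains"
    if bx0 <= ax0 and by0 <= ay0 and ax1 <= bx1 and ay1 <= by1:
        return "within"
    return "overlaps"

# A static feature's bounding box tested against two stream objects.
park = (0, 0, 10, 10)
print(rect_relation(park, (2, 2, 4, 4)))      # contains
print(rect_relation(park, (20, 20, 30, 30)))  # disjoint
```

An index like the R3 index serves to prune most candidate pairs so that a test of this kind runs only on rectangles that can actually relate.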
  • ViSSa: Recognizing the appropriateness of videos on social media with
           on-demand crowdsourcing
    • Abstract: Publication date: May 2020. Source: Information Processing & Management, Volume 57, Issue 3. Author(s): Sankar Kumar Mridha, Braznev Sarkar, Sujoy Chatterjee, Malay Bhattacharyya.
      The recent significant growth of social media has drawn researchers' attention to monitoring the enormous amount of streaming data using real-time approaches. This data may appear in different forms, such as streaming text, images, audio, and videos. In this paper, we address the problem of deciding the appropriateness of streaming videos with the help of on-demand crowdsourcing. We propose a novel crowd-powered model, ViSSa, an open crowdsourcing platform that helps to automatically detect the appropriateness of videos uploaded online by employing the viewers of existing videos. The proposed model presents a unique approach that not only identifies unsafe videos but also detects the inappropriate portions (in terms of the platform's vulnerabilities). Our experiments with 47 crowd contributors demonstrate the effectiveness of the proposed approach. On the designed ViSSa platform, 18 safe videos were initially posted; after getting access, 20 new videos were added by different users. These videos were assessed (and marked as safe or unsafe) by users, and a consensus judgment was finally obtained through judgment analysis. The approach detects unsafe videos with high accuracy (95%) and points out the inappropriate portions. Interestingly, changing the mode of video segment allocation (homogeneous or heterogeneous) is found to have a significant impact on the viewers' feedback. Nevertheless, the proposed approach performs consistently well across different modes of viewing (with varying diversity of opinions), and with arbitrary video sizes and types. The users were found to be motivated by their sense of responsibility. This paper also highlights the importance of identifying spammers through such models.
  • Incremental focal loss GANs
    • Abstract: Publication date: May 2020. Source: Information Processing & Management, Volume 57, Issue 3. Author(s): Fei Gao, Jingjie Zhu, Hanliang Jiang, Zhenxing Niu, Weidong Han, Jun Yu.
      Generative Adversarial Networks (GANs) have achieved inspiring performance in both unsupervised image generation and conditional cross-modal image translation. However, generating quality images at an affordable cost remains challenging. We argue that it is the vast number of easy examples that disturbs the training of GANs, and propose to address this problem by down-weighting the losses assigned to easy examples. Our novel Incremental Focal Loss (IFL) progressively focuses training on hard examples and prevents easy examples from overwhelming the generator and discriminator during training. In addition, we propose an enhanced self-attention (ESA) mechanism to boost the representational capacity of the generator. We apply IFL and ESA to a number of unsupervised and conditional GANs, and conduct experiments on various tasks, including face photo-sketch synthesis, map↔aerial-photo translation, single-image super-resolution reconstruction, and image generation on CelebA, LSUN, and CIFAR-10. Results show that IFL improves the learning of GANs over existing loss functions. Moreover, both IFL and ESA enable GANs to produce quality images with realistic details in all these tasks, even when no task adaptation is involved.
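The down-weighting idea behind IFL can be illustrated with the standard focal loss, where a focusing parameter gamma shrinks the loss of easy (confidently classified) examples. The linearly growing gamma schedule below is a simplified, hypothetical stand-in for the paper's incremental formulation:

```python
import math

def focal_loss(p, gamma):
    """Focal loss for a correctly-labelled example with predicted prob p:
    -(1 - p)**gamma * log(p). Larger gamma shrinks the loss of easy
    (high-p) examples, focusing training on hard ones."""
    return -((1.0 - p) ** gamma) * math.log(p)

def gamma_at(step, total_steps, gamma_max=2.0):
    """An 'incremental' schedule: gamma grows over training, so easy
    examples are down-weighted more and more (hypothetical schedule)."""
    return gamma_max * step / total_steps

easy, hard = 0.95, 0.55
for step in (0, 50, 100):
    g = gamma_at(step, 100)
    print(f"step={step} gamma={g:.1f} "
          f"easy={focal_loss(easy, g):.4f} hard={focal_loss(hard, g):.4f}")
```

With gamma = 0 this is plain cross-entropy; as gamma grows, the hard-to-easy loss ratio widens sharply, which is the effect the abstract attributes to IFL.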
  • A multi-cascaded model with data augmentation for enhanced paraphrase
           detection in short texts
    • Abstract: Publication date: May 2020. Source: Information Processing & Management, Volume 57, Issue 3. Author(s): Muhammad Haroon Shakeel, Asim Karim, Imdadullah Khan.
      Paraphrase detection is an important task in text analytics with numerous applications such as plagiarism detection, duplicate question identification, and enhanced customer support helpdesks. Deep models have been proposed for representing and classifying paraphrases, but they require large quantities of human-labeled data, which is expensive to obtain. In this work, we present a data augmentation strategy and a multi-cascaded model for improved paraphrase detection in short texts. Our data augmentation strategy treats the notions of paraphrase and non-paraphrase as binary relations over the set of texts, and uses graph-theoretic concepts to efficiently generate additional paraphrase and non-paraphrase pairs in a sound manner. Our multi-cascaded model employs three supervised feature learners (cascades) based on CNN and LSTM networks, with and without soft attention. The learned features, together with hand-crafted linguistic features, are then forwarded to a discriminator network for final classification. Our model is both wide and deep, providing greater robustness across clean and noisy short texts. We evaluate our approach on three benchmark datasets and show that it produces comparable or state-of-the-art performance on all three.
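The graph-theoretic augmentation idea, treating paraphrase as a transitive relation and propagating labels across connected components, can be sketched as follows. The text IDs are toy placeholders and the logic is a simplification of the authors' strategy:

```python
from itertools import combinations

def augment(paraphrase_pairs, non_paraphrase_pairs):
    """Treat paraphrase as a transitive relation: texts in the same
    connected component of the paraphrase graph are all paraphrases,
    and a non-paraphrase edge between two components makes every
    cross-component pair a non-paraphrase (a sketch of the idea)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:          # union-find with path halving
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in paraphrase_pairs:
        parent[find(a)] = find(b)

    comps = {}
    for x in list(parent):
        comps.setdefault(find(x), set()).add(x)

    # All within-component pairs become positive examples.
    pos = {frozenset(p) for c in comps.values() for p in combinations(sorted(c), 2)}
    # A negative edge between components yields all cross-component pairs.
    neg = set()
    for a, b in non_paraphrase_pairs:
        for x in comps.get(find(a), {a}):
            for y in comps.get(find(b), {b}):
                neg.add(frozenset((x, y)))
    return pos, neg

pos, neg = augment([("t1", "t2"), ("t2", "t3")], [("t1", "t4")])
print(len(pos), len(neg))  # → 3 3
```

From two labeled paraphrase pairs and one non-paraphrase pair, the closure yields three positives and three negatives, which is the labeled-data amplification the abstract describes.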
  • Detecting breaking news rumors of emerging topics in social media
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Sarah A. Alkhodair, Steven H.H. Ding, Benjamin C.M. Fung, Junqiang Liu.
      Users of social media websites tend to rapidly spread breaking news and trending stories without considering their truthfulness, which facilitates the spread of rumors through social networks. A rumor is a story or statement whose truthfulness has not been verified. Efficiently detecting and acting upon rumors throughout social networks is highly important for minimizing their harmful effects, but detecting them is not trivial: they belong to unseen topics or events that are not covered in the training dataset. In this paper, we study the problem of detecting breaking-news rumors, rather than long-lasting rumors, spreading in social media. We propose a new approach that jointly learns word embeddings and trains a recurrent neural network with two different objectives to automatically identify rumors. The proposed strategy is simple but effective in mitigating topic-shift issues. Emerging rumors are not necessarily false at detection time; they may later be judged true or false. However, most previous studies on rumor detection focus on long-standing rumors and assume that rumors are always false. In contrast, our experiment simulates a cross-topic emerging rumor detection scenario with a real-life rumor dataset. Experimental results suggest that our proposed model outperforms state-of-the-art methods in terms of precision, recall, and F1.
  • Neural opinion dynamics model for the prediction of user-level stance
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Lixing Zhu, Yulan He, Deyu Zhou.
      Social media platforms allow users to express their opinions towards various topics online. Often, users' opinions are not static: they may change over time due to influence from neighbors in social networks, or be updated after encountering arguments that undermine their beliefs. In this paper, we propose to use a Recurrent Neural Network (RNN) to model each user's posting behavior on Twitter, and to incorporate their neighbors' topic-associated context as attention signals for user-level stance prediction. Moreover, the proposed model operates in an online setting: its parameters are continuously updated with the Twitter stream data and can be used to predict a user's topic-dependent stance. Detailed evaluation on two Twitter datasets, related to Brexit and the US General Election, demonstrates the superior performance of our neural opinion dynamics model over both static and dynamic alternatives for user-level stance prediction.
  • An evaluation of document clustering and topic modelling in two online
           social networks: Twitter and Reddit
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Stephan A. Curiskis, Barry Drake, Thomas R. Osborn, Paul J. Kennedy.
      Methods for document clustering and topic modelling in online social networks (OSNs) offer a means of categorising, annotating and making sense of large volumes of user-generated content. Many techniques have been developed over the years, ranging from text mining and clustering methods to latent topic models and neural embedding approaches. However, many of these methods deliver poor results when applied to OSN data, as such text is notoriously short and noisy, and results are often not comparable across studies. In this study we evaluate several techniques for document clustering and topic modelling on three datasets from Twitter and Reddit. We benchmark four feature representations derived from term-frequency inverse-document-frequency (tf-idf) matrices and word embedding models, combined with four clustering methods, and include a Latent Dirichlet Allocation topic model for comparison. Since several different evaluation measures are used in the literature, we provide a discussion and recommendation of the most appropriate extrinsic measures for this task. We also demonstrate the performance of the methods on datasets with different document lengths. Our results show that clustering techniques applied to neural embedding feature representations delivered the best performance over all datasets under appropriate extrinsic evaluation measures. We also demonstrate a method for interpreting the clusters with a top-words approach that combines tf-idf weights with embedding distance measures.
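As a concrete reference point for the tf-idf feature representations benchmarked above, here is a minimal sparse tf-idf and cosine-similarity sketch over toy tokenised documents (illustrative only, not the study's pipeline):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build tf-idf vectors (as sparse dicts) for tokenised documents."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))       # document frequency
    idf = {t: math.log(n / df[t]) for t in df}          # inverse doc frequency
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append({t: (tf[t] / len(d)) * idf[t] for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [["nba", "game", "score"], ["nba", "finals", "game"],
        ["election", "vote"], ["vote", "poll", "election"]]
v = tfidf_vectors(docs)
print(round(cosine(v[0], v[1]), 2), round(cosine(v[0], v[2]), 2))  # → 0.33 0.0
```

Documents on the same topic score noticeably higher than unrelated ones even on texts this short, which is the signal a clustering method over such vectors exploits.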
  • The rise and fall of network stars: Analyzing 2.5 million graphs to reveal
           how high-degree vertices emerge over time
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Michael Fire, Carlos Guestrin.
      Trends change rapidly in today's world, prompting this key question: What is the mechanism behind the emergence of new trends? By representing real-world dynamic systems as complex networks, the emergence of new trends can be symbolized by vertices that "shine": at a specific time interval in a network's life, certain vertices become increasingly connected to other vertices. This process creates new high-degree vertices, i.e., network stars. Thus, to study trends, we must look at how networks evolve over time and determine how the stars behave. In our research, we constructed the largest publicly available network evolution dataset to date, which contains 38,000 real-world networks and 2.5 million graphs. We then performed the first precise wide-scale analysis of the evolution of networks at various scales. Three primary observations resulted: (a) links are most prevalent among vertices that join a network at a similar time; (b) the rate at which new vertices join a network is a central factor in molding a network's topology; and (c) the emergence of network stars (high-degree vertices) is correlated with fast-growing networks. We applied these insights to develop a flexible network-generation model based on large-scale, real-world data. This model gives a better understanding of how stars rise and fall within networks, and is applicable to dynamic systems in both nature and society.
      Multimedia links: video, interactive data visualization, data, and code tutorials accompany this article.
  • Churn modeling with probabilistic meta paths-based representation learning
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Sandra Mitrović, Jochen De Weerdt.
      Finding structured and efficient ways of leveraging available data is not an easy task, especially when dealing with network data, as is the case in telco churn prediction. Several previous works have advanced this direction both from the perspective of churn prediction, by proposing augmented call graph architectures, and from the perspective of graph featurization, by proposing different graph representation learning methods that frequently exploit random walks. However, both graph augmentation and representation learning-based featurization face drawbacks. In this work, we first shift the focus from a homogeneous to a heterogeneous perspective by defining different probabilistic meta paths on augmented call graphs. Secondly, we address the usually significant number of random walks that graph representation learning methods require. To this end, we propose a sampling method that combines the most suitable random walk generation strategies, which we determine with the help of corresponding Markov models. In our experimental evaluation, we demonstrate the benefits of probabilistic meta path-based walk generation in terms of predictive power. In addition, this paper provides promising insights into the interplay between the type of meta path and the predictive outcome, as well as the potential of sampling random walks based on the meta path structure to alleviate the computational requirements of representation learning by reducing the typically sizable data input required.
  • Boosted seed oversampling for local community ranking
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Emmanouil Krasanakis, Emmanouil Schinas, Symeon Papadopoulos, Yiannis Kompatsiaris, Andreas Symeonidis.
      Local community detection is an emerging topic in network analysis that aims to detect well-connected communities encompassing sets of previously known seed nodes. In this work, we explore the related problem of ranking network nodes based on their relevance to the communities characterized by seed nodes. However, seed nodes may not be central enough, or sufficiently numerous, to produce high-quality ranks. To solve this problem, we introduce a methodology we call seed oversampling, which first runs a node ranking algorithm to discover more nodes that belong to the community and then reruns the same ranking algorithm for the new seed nodes. We formally discuss why this process improves the quality of calculated community ranks when the original set of seed nodes is small, and introduce a boosting scheme that iteratively repeats seed oversampling to further improve rank quality when certain properties of the ranking algorithm are met. Finally, we demonstrate the effectiveness of our methods in improving community relevance ranks given only a few random seed nodes of real-world network communities. In our experiments, both boosted and simple seed oversampling yielded better rank quality than the previous neighborhood inflation heuristic, which adds the neighborhoods of the original seed nodes to the seeds.
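Seed oversampling is agnostic to the underlying ranking algorithm. The sketch below instantiates it with personalised PageRank on a toy graph of two loosely connected triangles; the choice of ranking algorithm, graph, and parameters is illustrative, not taken from the paper:

```python
def personalized_pagerank(adj, seeds, alpha=0.85, iters=50):
    """Power-iteration personalised PageRank restarting at the seed set."""
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in adj}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {v: (1 - alpha) * restart[v] for v in adj}
        for v in adj:
            out = adj[v]
            for u in out:  # spread rank mass along outgoing edges
                nxt[u] += alpha * rank[v] / len(out)
        rank = nxt
    return rank

def seed_oversampling(adj, seeds, extra=2):
    """Rank once, promote the top `extra` non-seed nodes to seeds, re-rank."""
    first = personalized_pagerank(adj, seeds)
    new = sorted((v for v in adj if v not in seeds),
                 key=first.get, reverse=True)[:extra]
    return personalized_pagerank(adj, set(seeds) | set(new))

# Two loosely connected triangles; seed only one node of the left triangle.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
ranks = seed_oversampling(adj, {0})
print(max(ranks, key=ranks.get))
```

Starting from the single seed 0, the first pass promotes its triangle-mates to seeds, and the second pass ranks every member of the seed's community above every node of the other triangle, which is the quality improvement the methodology targets.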
  • Compact group discovery in attributed graphs and social networks
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Abeer Khan, Lukasz Golab, Mehdi Kargar, Jaroslaw Szlichta, Morteza Zihayat.
      Social networks and many other graphs are attributed, meaning that their nodes are labelled with textual information such as personal data, expertise or interests. In attributed graphs, a common data analysis task is to find subgraphs whose nodes contain a given set of keywords. In many applications, the size of the subgraph should be limited (i.e., a subgraph with thousands of nodes is not desired). In this work, we introduce the problem of compact attributed group (AG) discovery. Given a set of query keywords and a desired solution size, the task is to find subgraphs with the desired number of nodes, such that the nodes are closely connected and each node contains as many query keywords as possible. We prove that finding an optimal solution is NP-hard and propose approximation algorithms with a guaranteed ratio of two. Since the number of qualifying AGs may be large, we also show how to find approximate top-k AGs with polynomial delay. Finally, we experimentally verify the effectiveness and efficiency of our techniques on real-world graphs.
  • User community detection via embedding of social network structure and
           temporal content
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Hossein Fani, Eric Jiang, Ebrahim Bagheri, Feras Al-Obeidat, Weichang Du, Mehdi Kargar.
      Identifying and extracting user communities is an important step towards understanding social network dynamics from a macro perspective, and this paper explores various aspects of user community identification. To date, user community detection methods have employed either explicit links between users (link analysis), or users' topics of interest in posted content (content analysis), or both in tandem. Little work has considered temporal evolution when identifying user communities, in a way that groups together users who share not only similar topical interests but also similar temporal behavior towards those interests. In this paper, we identify user communities through multimodal feature learning (embeddings). Our core contributions are: (a) we propose a new method for learning neural embeddings for users based on their temporal content similarity; (b) we learn user embeddings based on their social network connections (links) through neural graph embeddings; (c) we systematically interpolate temporal content-based embeddings and social link-based embeddings to capture both social network connections and temporal content evolution when representing users; and (d) we systematically evaluate the quality of each embedding type in isolation and when interpolated together, and demonstrate their performance on a Twitter dataset under two application scenarios, namely news recommendation and user prediction.
      We find that (1) content-based methods produce higher-quality communities than link-based methods; (2) methods that consider the temporal evolution of content, our proposed method in particular, perform better than their non-temporal counterparts; (3) communities produced when time is explicitly incorporated in user vector representations have higher quality than those produced when time is incorporated into a generative process; and finally (4) while link-based methods are weaker than content-based methods, their interpolation with content-based methods improves the quality of the identified communities.
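The interpolation of temporal content-based and link-based user embeddings described above can be sketched as a convex combination of L2-normalised vectors. The mixing weight and the toy vectors below are illustrative assumptions, not the paper's learned values:

```python
import math

def interpolate(content_vec, link_vec, lam=0.5):
    """Interpolate a user's temporal-content embedding with their social-link
    embedding: lam * content + (1 - lam) * link, after L2-normalising each.
    lam is a tunable mixing weight; 0.5 is an arbitrary choice here."""
    def norm(v):
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]
    c, l = norm(content_vec), norm(link_vec)
    return [lam * a + (1 - lam) * b for a, b in zip(c, l)]

# Hypothetical 2-d embeddings for one user from each modality.
u = interpolate([3.0, 4.0], [0.0, 2.0], lam=0.5)
print([round(x, 2) for x in u])  # → [0.3, 0.9]
```

Normalising before mixing keeps either modality from dominating purely through vector magnitude, so lam alone controls the content-versus-links trade-off.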
  • Fine-grained tourism prediction: Impact of social and environmental features
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Amir Khatibi, Fabiano Belém, Ana Paula Couto da Silva, Jussara M. Almeida, Marcos A. Gonçalves.
      Accurate predictions about future events are essential in many areas, one of them being the tourism industry. Cities and countries invest large amounts of money in planning and preparation in order to welcome (and profit from) tourists, and the success of many businesses depends largely or totally on the state of tourism demand. Estimating tourism demand can help business planners reduce the risk of decisions about the future, since tourism products are, generally speaking, perishable (gone if not used). Prior studies in this domain focus on forecasting for a whole country rather than for fine-grained areas within a country (e.g., specific touristic attractions), mainly because of a lack of data. Our article tackles exactly this issue. With the rapid growth in popularity of social media applications, more people each year interact with online resources to plan and comment on their trips. Motivated by this observation, we suggest that accessible data from online social networks or travel websites, in addition to environmental data, can be used to support the inference of visitation counts for both indoor and outdoor touristic attractions. To test our hypothesis, we analyze visitation counts, environmental features and social media data related to 27 museums and galleries in the U.K. as well as 76 national parks in the U.S. Our experimental results reveal high accuracy (above 92%) for predicting tourism demand using features from both social media and environmental data. We also show that, for outdoor attractions, environmental features have better predictive power, while the opposite holds for indoor attractions. In all scenarios, the best results are obtained when both types of features are used jointly. Finally, we perform a detailed failure analysis to inspect the cases in which the prediction results are not satisfactory.
  • On the negative impact of social influence in recommender systems: A study
           of bribery in collaborative hybrid algorithms
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Guilherme Ramos, Ludovico Boratto, Carlos Caleiro.
      Recommender systems are based on inherent forms of social influence: suggestions are provided to users based on the opinions of their peers. Given the relevance that ratings have nowadays in pushing the sales of an item, sellers might decide to bribe users to rate items, or to change the ratings they have given, thus increasing the sellers' reputation. Hence, by exploiting the fact that influential users can lead an item to get recommended, bribing can become an effective way to negatively exploit social influence and introduce bias into the recommendations. Given that bribing is forbidden but still employed by sellers, we propose a novel matrix completion algorithm that performs hybrid memory-based collaborative filtering using an approximation of Kolmogorov complexity. We also propose a framework to study the bribery effect and the bribery resistance of our approach. Our theoretical analysis, validated through experiments on real-world datasets, shows that our approach is an effective way to counter bribing, whereas with state-of-the-art algorithms sellers can bribe a large share of the users.
  • A two phase investment game for competitive opinion dynamics in social
           networks
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Swapnil Dhamal, Walid Ben-Ameur, Tijani Chahed, Eitan Altman.
      We propose a setting for two-phase opinion dynamics in social networks, where a node's final opinion in the first phase acts as its initial biased opinion in the second phase. In this setting, we study the problem of two camps aiming to maximize adoption of their respective opinions by strategically investing in nodes across the two phases. A node's initial opinion in the second phase naturally plays a key role in determining its final opinion, and hence also the final opinions of other nodes in the network, due to its influence on them. More importantly, this bias also determines the effectiveness of a camp's investment in that node in the second phase. To formalize this two-phase investment setting, we propose an extension of the Friedkin–Johnsen model and formulate the utility functions of the camps. We arrive at a decision parameter which can be interpreted as a two-phase Katz centrality. There is a natural tradeoff when splitting the available budget between the two phases: a lower investment in the first phase results in worse initial biases for the second phase, while a higher investment in the first phase leaves a smaller budget for the second phase, resulting in an inability to fully harness the influenced biases. We first analyze the non-competitive case where only one camp invests, and present a polynomial-time algorithm for optimally splitting that camp's budget between the two phases. We then analyze the case of competing camps, where we show the existence of a Nash equilibrium and that it can be computed in polynomial time under reasonable assumptions. We conclude with simulations on real-world network datasets to quantify the effects of the initial biases and the weight nodes attribute to them, as well as the effect of a camp deviating from its equilibrium strategy. Our main conclusion is that, if nodes attribute high weight to their initial biases, a high investment in the first phase is advantageous, so as to effectively influence the biases to be harnessed in the second phase.
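The two-phase Katz centrality mentioned above generalises classical Katz centrality, which can be computed by fixed-point iteration. A single-phase sketch on a toy path graph (illustrative only, not the paper's two-phase variant):

```python
def katz_centrality(adj, alpha=0.1, beta=1.0, iters=100):
    """Katz centrality by fixed-point iteration of
    x_v = beta + alpha * sum of neighbours' scores; this converges
    when alpha < 1 / (largest eigenvalue of the adjacency matrix)."""
    x = {v: beta for v in adj}
    for _ in range(iters):
        x = {v: beta + alpha * sum(x[u] for u in adj[v]) for v in adj}
    return x

# Path graph 0 - 1 - 2: the middle node collects the most influence.
adj = {0: [1], 1: [0, 2], 2: [1]}
x = katz_centrality(adj)
print(max(x, key=x.get))  # → 1
```

The constant beta term plays a role loosely analogous to a node's guaranteed baseline, and alpha discounts influence arriving over longer paths, which is why a budget invested at well-placed nodes propagates through the network.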
  • Class-aware tensor factorization for multi-relational classification
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Georgios Katsimpras, Georgios Paliouras.
      In this paper, we propose a tensor factorization method, called CLASS-RESCAL, which associates the class labels of data samples with their latent representations. Specifically, we extend RESCAL into a semi-supervised factorization method that combines a classification error term with the standard factor optimization process. CLASS-RESCAL assimilates information from all the relations of the tensor while also taking classification performance into account. This procedure forces data samples within the same class to have similar latent representations. Experimental results on several real-world social network datasets indicate that this is a promising approach for multi-relational classification tasks.
  • Using weighted k-means to identify leading Chinese venture capital firms
           incorporating centrality measures
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Hu Yang, Jar-Der Luo, Ying Fan, Li Zhu.
      Although identifying leading venture capital firms (VCs) is a meaningful challenge in the analysis of the Chinese investment market, this research topic is rarely mentioned in the relevant literature. Given the co-investment network of VCs, identifying leading VCs is equivalent to determining influential nodes, a well-studied problem in complex network analysis. As single centrality measures and the multiple criteria decision analysis (MCDA) method have disadvantages and limitations for identifying leading VCs, this paper incorporates several different centrality measures of the co-investment network of VCs and proposes a new approach, based on weighted k-means, to rank VCs at both the group and individual levels and to identify the leading VCs. The proposed approach not only shows alternative groupings based on multiple evaluation criteria, but also ranks them according to a comprehensive score, the weighted sum of these criteria. Empirical analysis shows the efficiency and practicability of the proposed approach in identifying leading Chinese VCs.
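Combining several centrality measures into one ranking, as the weighted approach above does, can be sketched as a weighted sum of min-max-normalised measures. The VC names, measure values, and weights below are hypothetical:

```python
def composite_score(measures, weights):
    """Weighted sum of min-max-normalised centrality measures per node.
    measures: {measure_name: {node: value}}; weights: {measure_name: weight},
    with the weights summing to 1."""
    scores = {}
    for name, vals in measures.items():
        lo, hi = min(vals.values()), max(vals.values())
        span = (hi - lo) or 1.0  # guard against a constant measure
        for node, v in vals.items():
            scores[node] = scores.get(node, 0.0) + weights[name] * (v - lo) / span
    return scores

# Hypothetical co-investment-network centralities for three VC firms.
measures = {
    "degree":      {"VC1": 12, "VC2": 5, "VC3": 9},
    "betweenness": {"VC1": 0.4, "VC2": 0.9, "VC3": 0.1},
}
weights = {"degree": 0.6, "betweenness": 0.4}
scores = composite_score(measures, weights)
print(max(scores, key=scores.get))  # → VC1
```

Normalising each measure first puts degree counts and betweenness fractions on the same scale, so the weights alone decide each criterion's contribution to the final ranking.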
  • Implicit information need as explicit problems, help, and behavioral
           information
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Shawon Sarkar, Matthew Mitsui, Jiqun Liu, Chirag Shah.
      Information need is one of the most fundamental aspects of information seeking, traditionally conceptualized as the initiation phase of an individual's information seeking behavior. However, the elusive and inexpressible nature of information need makes it hard to elicit from the information seeker or to extract through an automated process. One approach to understanding how a person realizes and expresses an information need is to observe their seeking behaviors and their engagement with information retrieval systems, focusing on situated performative actions. Using Dervin's Sense-Making theory and a conceptualization of information need based on existing studies, the work reported here tries to understand and explore the concept of information need from a fresh methodological perspective, by examining users' perceived barriers and desired help at different stages of information search episodes through analyses of various implicit and explicit user search behaviors. In a controlled lab study, each participant performed three simulated online information search tasks. Participants' implicit behaviors were collected through search logs, and explicit feedback was elicited through pre-task and post-task questionnaires. A total of 208 query segments were logged, along with users' annotations on perceived problems and help. The collected data were analyzed using both quantitative and qualitative methods. The findings identified several behaviors, such as the number of bookmarks, query length, number of unique queries, and time spent on search results in the previous segment, the current segment, and throughout the session, that are strongly associated with participants' perceived barriers and the help they needed.
      The findings also showed that it is possible to build accurate predictive models that infer perceived problems, such as difficulty articulating queries, useless or irrelevant information, and unavailability of information, from users' previous-segment, current-segment, and whole-session behaviors. Finally, by combining perceived problems with search behavioral features, it was possible to infer the help users needed in search with a reasonable level of accuracy (78%).
  • Information needs of drug users on a local dark Web marketplace
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Ari Haasio, J. Tuomas Harviainen, Reijo Savolainen. Abstract: This study examines the nature of context-sensitive information needs by focusing on articulations of the need for disnormative information among drug users. To this end, a sample of 9300 messages posted to Sipulitori, a Finnish dark web site, was examined by means of descriptive statistics and qualitative content analysis. The theoretical framework of the study draws on Tom Wilson's idea of information need as a phenomenon fundamentally triggered by physiological, affective and cognitive factors indicating basic human needs. To examine the contextual features of needs for disnormative information, the study made use of Chatman's theory of information poverty characteristic of small worlds and Savolainen's model of way of life. The findings indicate that about 72% of the drug-related information need topics dealt with the usage, availability and price of narcotics. The articulations of drug-related information needs reflected the users' ways of life, dominated by the activities of buying, selling and using illegal narcotics. Drug-related information needs are typically triggered by physiological factors, because of the centrality of physical dependence on drugs. Our study also revealed the simultaneous presence of physiological, affective and cognitive factors, especially in messages in which the information need was articulated in greater detail.
  • Vulnerable community identification using hate speech detection on social
    • Abstract: Publication date: Available online 23 July 2019. Source: Information Processing & Management. Author(s): Zewdie Mossie, Jenq-Haur Wang. Abstract: With the rapid development of mobile computing and Web technologies, online hate speech has spread increasingly through social network platforms, since it is easy to post any opinion. Previous studies confirm that exposure to online hate speech has serious offline consequences for historically deprived communities. Thus, research on automated hate speech detection has attracted much attention. However, the role of social networks in identifying hate-related vulnerable communities is not well investigated. Hate speech can affect all population groups, but some are more vulnerable to its impact than others. For example, for ethnic groups whose languages have few computational resources, it is a challenge to automatically collect and process online texts, not to mention to detect hate speech on social media automatically. In this paper, we propose a hate speech detection approach to identify hatred against vulnerable minority groups on social media. Firstly, in the Spark distributed processing framework, posts are automatically collected and pre-processed, and features are extracted using word n-grams and word embedding techniques such as Word2Vec. Secondly, deep learning classification algorithms, such as the Gated Recurrent Unit (GRU), a variant of Recurrent Neural Networks (RNNs), are used for hate speech detection. Finally, hate words are clustered with methods such as Word2Vec to predict the potential target ethnic group of the hatred. In our experiments, we use the Amharic language of Ethiopia as an example. Since there was no publicly available dataset of Amharic texts, we crawled Facebook pages to prepare the corpus, and since data annotation can be biased by culture, we recruited annotators from different cultural backgrounds and achieved better inter-annotator agreement.
In our experimental results, feature extraction using word embedding techniques such as Word2Vec performs better with both classical and deep learning-based classification algorithms for hate speech detection, among which the GRU achieves the best result. Our proposed approach successfully identifies the Tigre ethnic group as the community most vulnerable to hatred, compared with the Amhara and Oromo. Identifying groups vulnerable to hatred is vital to protecting them: automatic hate speech detection models can be applied to remove content that aggravates psychological harm and physical conflict. This can also encourage the development of policies, strategies, and tools to empower and protect vulnerable communities.
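The abstract above names the GRU as its best-performing classifier. As a rough, self-contained illustration of what a single GRU cell computes over a sequence of token embeddings, here is a numpy sketch with random, untrained weights; the dimensions and data are invented and this is not the authors' trained model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU update: update gate z, reset gate r, candidate state h_tilde.
    W, U, b each stack the z, r, and candidate parameters along axis 0."""
    Wz, Wr, Wh = W
    Uz, Ur, Uh = U
    bz, br, bh = b
    z = sigmoid(x @ Wz + h @ Uz + bz)              # update gate
    r = sigmoid(x @ Wr + h @ Ur + br)              # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh + bh)  # candidate state
    return (1 - z) * h + z * h_tilde               # interpolate old and new

rng = np.random.default_rng(0)
d_in, d_h = 8, 4                      # hypothetical embedding and hidden sizes
W = rng.normal(size=(3, d_in, d_h))
U = rng.normal(size=(3, d_h, d_h))
b = np.zeros((3, d_h))
h = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):  # a 5-token "post" of random embeddings
    h = gru_step(x, h, W, U, b)       # final h would feed a classifier layer
```

Because the new state is a convex combination of the old state and a tanh candidate, the hidden activations stay bounded in (-1, 1), which is part of why GRUs train stably on long posts.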
  • HoAFM: A High-order Attentive Factorization Machine for CTR Prediction
    • Abstract: Publication date: Available online 22 July 2019. Source: Information Processing & Management. Author(s): Zhulin Tao, Xiang Wang, Xiangnan He, Xianglin Huang, Tat-Seng Chua. Abstract: Modeling feature interactions is of crucial importance for predicting click-through rate (CTR) in industrial recommender systems. However, manually crafting cross features usually requires extensive domain knowledge and labor-intensive feature engineering. To alleviate this problem, the factorization machine (FM) was proposed to model feature interactions from raw features automatically. In particular, it embeds each feature in a vector representation and models second-order interactions as the product of two feature representations. In order to learn nonlinear and complex patterns, recent works, such as NFM, PIN, and DeepFM, exploited deep learning techniques to capture higher-order feature interactions. These approaches lack guarantees about the effectiveness of high-order patterns, as they model feature interactions in a rather implicit way. To address this limitation, xDeepFM was recently proposed to generate high-order feature interactions in an explicit fashion by stacking multiple interaction networks. Nevertheless, xDeepFM suffers from rather high complexity, which easily leads to overfitting. In this paper, we develop a more expressive but lightweight solution based on FM, named High-order Attentive Factorization Machine (HoAFM), which accounts for higher-order sparse feature interactions in an explicit manner. Beyond the linearity of FM, we devise a cross interaction layer, which updates a feature's representation by aggregating the representations of other co-occurring features. In addition, we apply a bit-wise attention mechanism to determine the varying importance of co-occurring features at the granularity of individual dimensions.
By stacking multiple cross interaction layers, we can inject high-order feature interactions into feature representation learning, in order to establish expressive and informative cross features. Extensive experiments are performed on two benchmark datasets, Criteo and Avazu, to demonstrate the rationality and effectiveness of HoAFM. Empirical results suggest that HoAFM achieves significant improvements over other state-of-the-art methods, such as NFM and xDeepFM.
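For background, the second-order FM term that HoAFM builds on sums the inner products of embedding pairs weighted by feature values, and a well-known reformulation computes it in time linear in the number of features. A numpy sketch (feature count and embedding size are arbitrary toy choices):

```python
import numpy as np

def fm_pairwise(V, x):
    """Naive second-order FM term: sum over i < j of <v_i, v_j> * x_i * x_j."""
    n = len(x)
    return sum(V[i] @ V[j] * x[i] * x[j]
               for i in range(n) for j in range(i + 1, n))

def fm_fast(V, x):
    """Equivalent O(nk) form:
    0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2)."""
    s = V.T @ x                                   # per-dimension weighted sums
    return 0.5 * (s @ s - ((V ** 2).T @ (x ** 2)).sum())

rng = np.random.default_rng(1)
V = rng.normal(size=(6, 3))   # 6 features, embedding size 3 (hypothetical)
x = rng.normal(size=6)
```

The two forms agree to numerical precision; the linear-time version is what makes FMs practical on the sparse, high-dimensional inputs typical of CTR data.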
  • Information behavior and ICT use of Latina immigrants to the U.S. Midwest
    • Abstract: Publication date: Available online 13 July 2019. Source: Information Processing & Management. Author(s): Denice Adkins, Heather Moulaison Sandy. Abstract: Latina immigrants to the U.S. Midwest are a vibrant, complex, and resilient population of women with intersectional identities stemming from their participation in at least three distinct but interrelated communities: (1) women [in a family-centric culture defined by strong gender roles], (2) immigrants [potentially with linguistic and socioeconomic status disadvantages] and (3) residents of the U.S. Midwest [a low-population/rural area with lesser access to resources and an increasingly xenophobic host community]. Given the potential for marginalization, Latina immigrants to the Midwest represent a population vulnerable to digital exclusion. The current research is the first to systematically investigate ICT use by Latina immigrants to the U.S. Midwest. Specifically, as consumers and users of technology-mediated information, Latina immigrants to the U.S. Midwest navigate a complex and understudied social environment. To develop a strategy for beginning to break down technology barriers for these women, the complex and interconnected nature of their social environment and information practices first needs to be understood; the current article presents that foundational research.
  • A deep look into neural ranking models for information retrieval
    • Abstract: Publication date: Available online 9 July 2019. Source: Information Processing & Management. Author(s): Jiafeng Guo, Yixing Fan, Liang Pang, Liu Yang, Qingyao Ai, Hamed Zamani, Chen Wu, W. Bruce Croft, Xueqi Cheng. Abstract: Ranking models lie at the heart of research on information retrieval (IR). During the past decades, different techniques have been proposed for constructing ranking models, ranging from traditional heuristic and probabilistic methods to modern machine learning methods. Recently, with the advance of deep learning technology, we have witnessed a growing body of work applying shallow or deep neural networks to the ranking problem in IR, referred to as neural ranking models in this paper. The power of neural ranking models lies in their ability to learn from raw text inputs for the ranking problem, avoiding many limitations of hand-crafted features. Neural networks have sufficient capacity to model complicated tasks, which is needed to handle the complexity of relevance estimation in ranking. Since a large variety of neural ranking models has been proposed, we believe it is the right time to summarize the current status, learn from existing methodologies, and gain some insights for future development. In contrast to existing reviews, in this survey we take a deep look into the neural ranking models from different dimensions to analyze their underlying assumptions, major design principles, and learning strategies. We compare these models through benchmark tasks to obtain a comprehensive empirical understanding of the existing techniques. We also discuss what is missing in the current literature and what the promising future directions are.
  • Shaping the contours of fractured landscapes: Extending the layering of an
           information perspective on refugee resettlement
    • Abstract: Publication date: Available online 28 June 2019. Source: Information Processing & Management. Author(s): Annemaree Lloyd. Abstract: Refugee experience of resettlement into a third country is problematised by posing the question: what happens when an established information landscape fractures? Themes of disjuncture, intensification and liminality that have emerged from the author's research are described, using social theories as the analytical lens to shape the contours of fracture. Two other questions are posed: How is digital space implicated in rebuilding information landscapes that have become fractured? And what is the role of technology in enabling or constraining the conditions for remaking place?
  • Eating healthier: Exploring nutrition information for healthier recipe
    • Abstract: Publication date: Available online 3 June 2019. Source: Information Processing & Management. Author(s): Meng Chen, Xiaoyi Jia, Elizabeth Gorbonos, Chnh T. Hong, Xiaohui Yu, Yang Liu. Abstract: With the booming of personalized recipe sharing networks (e.g., Yummly), a deluge of recipes from different cuisines can be obtained easily. In this paper, we aim to solve a problem which many home cooks encounter when searching for recipes online: finding recipes that best fit a set of ingredients on hand while at the same time following healthy eating guidelines. This task is especially difficult since the lion's share of online recipes have been shown to be unhealthy. We propose a novel framework named NutRec, which models the interactions between ingredients and their proportions within recipes for the purpose of offering healthy recommendations. Specifically, NutRec consists of three main components: 1) an embedding-based ingredient predictor that predicts the ingredients relevant to user-defined initial ingredients, 2) a multi-layer perceptron-based network that predicts the amounts of the relevant ingredients, and 3) a module that creates a healthy pseudo-recipe from the list of ingredients and their amounts according to the nutritional information and recommends the recipes most similar to the pseudo-recipe. We conduct experiments on two recipe datasets, Allrecipes with 36,429 recipes and Yummly with 89,413 recipes. The empirical results support the framework's intuition and showcase its ability to retrieve healthier recipes.
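The final NutRec step, ranking real recipes by similarity to the healthy pseudo-recipe, can be sketched with ingredient-amount vectors and cosine similarity. The ingredient data below are invented, and cosine is an assumed similarity measure since the abstract does not specify one:

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse ingredient-amount dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_recipes(pseudo, recipes, top_k=2):
    """Rank real recipes by similarity of their ingredient-amount vectors
    to the healthy pseudo-recipe (hypothetical data, not NutRec itself)."""
    scored = sorted(recipes, key=lambda r: cosine(pseudo, recipes[r]),
                    reverse=True)
    return scored[:top_k]

pseudo = {"tomato": 200, "lentils": 150, "olive oil": 10}   # grams, invented
recipes = {
    "lentil stew": {"tomato": 180, "lentils": 160, "olive oil": 15},
    "fried bacon": {"bacon": 250, "butter": 40},
}
best = rank_recipes(pseudo, recipes, top_k=1)
```

Here the stew shares ingredients with the pseudo-recipe and ranks first, while the recipe with no overlap scores zero.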
  • Hierarchical neural query suggestion with an attention mechanism
    • Abstract: Publication date: Available online 18 May 2019. Source: Information Processing & Management. Author(s): Wanyu Chen, Fei Cai, Honghui Chen, Maarten de Rijke. Abstract: Query suggestions help users of a search engine to refine their queries. Previous work on query suggestion has mainly focused on incorporating directly observable features such as query co-occurrence and semantic similarity. The structure of such features is often set manually, as a result of which hidden dependencies between queries and users may be ignored. We propose an Attention-based Hierarchical Neural Query Suggestion (AHNQS) model that uses an attention mechanism to automatically capture user preferences. AHNQS combines a session-level neural network and a user-level neural network into a hierarchical structure to model the short- and long-term search history of a user. We quantify the improvements of AHNQS over state-of-the-art recurrent neural network-based query suggestion baselines on the AOL query log dataset, with improvements of up to 9.66% and 12.51% in terms of Recall@10 and MRR@10, respectively; the improvements are especially pronounced for short sessions and for inactive users with few search sessions.
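The Recall@10 and MRR@10 metrics reported above can be computed as follows; the two toy sessions and their clicked queries are hypothetical:

```python
def recall_at_k(suggestions, clicked, k=10):
    """Fraction of sessions whose clicked query appears in the top-k
    suggestions."""
    hits = sum(1 for s, c in zip(suggestions, clicked) if c in s[:k])
    return hits / len(clicked)

def mrr_at_k(suggestions, clicked, k=10):
    """Mean reciprocal rank of the clicked query within the top-k
    suggestions (0 contribution when it is absent)."""
    total = 0.0
    for s, c in zip(suggestions, clicked):
        if c in s[:k]:
            total += 1.0 / (s[:k].index(c) + 1)
    return total / len(clicked)

suggestions = [["q1", "q2", "q3"], ["q4", "q5", "q6"]]  # two toy sessions
clicked = ["q2", "q7"]   # q2 ranked second; q7 never suggested
```

With these toy sessions, Recall@10 is 0.5 (one hit out of two sessions) and MRR@10 is 0.25 (reciprocal rank 1/2 for the hit, 0 for the miss, averaged).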
  • An approach for measuring semantic similarity between Wikipedia concepts
           using multiple inheritances
    • Abstract: Publication date: May 2020. Source: Information Processing & Management, Volume 57, Issue 3. Author(s): Muhammad Jawad Hussain, Shahbaz Hassan Wasti, Guangjian Huang, Lina Wei, Yuncheng Jiang, Yong Tang. Abstract: Wikipedia provides a huge, collaboratively made, semi-structured taxonomy called the Wikipedia category graph (WCG), which can be utilized as a Knowledge Graph (KG) to measure the semantic similarity (SS) between Wikipedia concepts. Previously, several Most Informative Common Ancestor-based (MICA-based) SS methods have been proposed that intrinsically manipulate the taxonomic structure of the WCG. However, basic structural issues of the WCG, such as its huge size, branching factor and multiple inheritance relations, hamper the applicability of traditional MICA-based and multiple inheritance-based approaches to it. Therefore, in this paper, we propose a solution to handle these structural issues and present a new multiple inheritance-based SS approach, called Neighborhood Ancestor Semantic Contribution (NASC). In this approach, firstly, we define the neighborhood of a category (a taxonomic concept in the WCG) to delimit its semantic space. Secondly, we describe the semantic value of a category by aggregating the intrinsic information content (IC)-based semantic contribution weights of its semantically relevant multiple ancestors. Thirdly, based on our approach, we propose six different methods to compute the SS between Wikipedia concepts. Finally, we evaluate our methods on gold-standard word similarity benchmarks for the English, German, Spanish and French languages. The experimental evaluation demonstrates that the proposed NASC-based methods remarkably outperform traditional MICA-based and multiple inheritance-based approaches.
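For context, the MICA-based baselines that NASC is compared against score two concepts by the information content (IC) of their most informative common ancestor, in the spirit of Resnik's classic measure. A toy sketch, where both the mini-taxonomy and the IC values are invented stand-ins for the WCG (this is not NASC itself):

```python
# Toy taxonomy: child -> parents; multiple inheritance is allowed.
parents = {"dog": ["mammal"], "cat": ["mammal"], "mammal": ["animal"],
           "animal": [], "car": ["vehicle"], "vehicle": []}
# Assumed information-content values: rarer, deeper concepts score higher.
ic = {"dog": 3.0, "cat": 3.0, "mammal": 2.0, "animal": 1.0,
      "car": 3.0, "vehicle": 1.0}

def ancestors(c):
    """All concepts reachable upward from c, including c itself."""
    seen, stack = {c}, [c]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def mica_similarity(c1, c2):
    """IC of the most informative common ancestor (0 if none exists)."""
    common = ancestors(c1) & ancestors(c2)
    return max((ic[a] for a in common), default=0.0)
```

Here "dog" and "cat" share "mammal" as their most informative common ancestor (IC 2.0), while "dog" and "car" share no ancestor and score 0.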
  • Topical result caching in web search engines
    • Abstract: Publication date: May 2020. Source: Information Processing & Management, Volume 57, Issue 3. Author(s): Ida Mele, Nicola Tonellotto, Ophir Frieder, Raffaele Perego. Abstract: Caching search results is employed in information retrieval systems to expedite query processing and reduce back-end server workload. Motivated by the observation that queries belonging to different topics have different temporal-locality patterns, we investigate a novel caching model called STD (Static-Topic-Dynamic cache), a refinement of the traditional SDC (Static-Dynamic Cache) that stores the results of popular queries in a static cache and manages the dynamic cache with a replacement policy for intercepting the temporal variations in the query stream. Our proposed caching scheme includes another layer for topic-based caching, where the entries are allocated to different topics (e.g., weather, education). The results of queries characterized by a topic are kept in the fraction of the cache dedicated to it. This permits adapting the cache-space utilization to the temporal locality of the various topics and reduces cache misses due to queries that are neither sufficiently popular to sit in the static portion nor requested within short enough intervals to sit in the dynamic portion. We simulate different configurations of STD using two real-world query streams. Experiments demonstrate that our approach outperforms SDC, with an increase of up to 3% in hit rate and a reduction of up to 36% in the gap between SDC and the theoretically optimal caching algorithm.
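The SDC scheme that STD refines can be sketched as a pinned static set plus an LRU-managed dynamic portion; the topic layer is omitted here, and the sizes and query stream are hypothetical:

```python
from collections import OrderedDict

class SDCache:
    """Static-Dynamic Cache sketch: all-time popular queries are pinned
    in the static portion; the rest go through an LRU dynamic portion.
    This is an illustration, not the paper's implementation."""
    def __init__(self, static_queries, dynamic_size):
        self.static = set(static_queries)
        self.dynamic = OrderedDict()     # insertion order tracks recency
        self.dynamic_size = dynamic_size
        self.hits = self.misses = 0

    def lookup(self, q):
        if q in self.static:             # static entries are never evicted
            self.hits += 1
            return True
        if q in self.dynamic:
            self.dynamic.move_to_end(q)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.dynamic) >= self.dynamic_size:
            self.dynamic.popitem(last=False)   # evict least recently used
        self.dynamic[q] = None
        return False

cache = SDCache(static_queries={"weather"}, dynamic_size=2)
stream = ["weather", "news", "news", "sports", "tax", "news"]
results = [cache.lookup(q) for q in stream]
```

In this toy stream the static entry always hits, while "news" is eventually evicted from the small dynamic portion, which is exactly the kind of miss the topic layer of STD is designed to reduce.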
  • Exploring temporal representations by leveraging attention-based
           bidirectional LSTM-RNNs for multi-modal emotion recognition
    • Abstract: Publication date: May 2020. Source: Information Processing & Management, Volume 57, Issue 3. Author(s): Chao Li, Zhongtian Bao, Linhao Li, Ziping Zhao. Abstract: Emotion recognition helps to automatically perceive a user's emotional response to multimedia content through implicit annotation, which in turn benefits the establishment of effective user-centric services. Physiology-based approaches have increasingly attracted researchers' attention because of their objectivity in representing emotion. Conventional approaches to emotion recognition have mostly focused on extracting various kinds of hand-crafted features. However, hand-crafted features require domain knowledge for the specific task, and designing proper features can be time-consuming. Therefore, finding the most effective physiological temporal feature representation for emotion recognition has become the core problem of most works. In this paper, we propose a multimodal attention-based BLSTM network framework for efficient emotion recognition. Firstly, raw physiological signals from each channel are transformed into spectrogram images to capture their time and frequency information. Secondly, attention-based Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs) are utilized to automatically learn the best temporal features. The learned deep features are then fed into a deep neural network (DNN) to predict the probability of each emotional output per channel. Finally, a decision-level fusion strategy is utilized to predict the final emotion. The experimental results on the AMIGOS dataset show that our method outperforms other state-of-the-art methods.
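The first step above converts each raw physiological channel into a spectrogram. A minimal magnitude-STFT sketch in numpy; the window length, hop size, and the 128 Hz toy sinusoid are assumptions, not values from the paper:

```python
import numpy as np

def spectrogram(signal, win=64, hop=32):
    """Magnitude STFT: slice the signal into overlapping Hann-windowed
    frames and take the real FFT of each frame."""
    window = np.hanning(win)
    frames = [signal[i:i + win] * window
              for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))   # (n_frames, win // 2 + 1)

t = np.arange(512) / 128.0            # 4 s of a hypothetical 128 Hz channel
sig = np.sin(2 * np.pi * 10 * t)      # pure 10 Hz component
spec = spectrogram(sig)
```

With a 64-sample window at 128 Hz, each frequency bin spans 2 Hz, so the 10 Hz tone peaks in bin 5 of every frame; in the paper's pipeline such time-frequency images are what the attention-based BLSTM consumes.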
  • Unsupervised dialectal neural machine translation
    • Abstract: Publication date: May 2020. Source: Information Processing & Management, Volume 57, Issue 3. Author(s): Wael Farhan, Bashar Talafha, Analle Abuammar, Ruba Jaikat, Mahmoud Al-Ayyoub, Ahmad Bisher Tarakji, Anas Toma. Abstract: In this paper, we present the first work on unsupervised dialectal Neural Machine Translation (NMT), where the source dialect is not represented in the parallel training corpus. Two systems are proposed for this problem. The first is the Dialectal to Standard Language Translation (D2SLT) system, which is based on the standard attentional sequence-to-sequence model while introducing two novel ideas that leverage similarities among dialects: using common words as anchor points when learning word embeddings, and a decoder scoring mechanism that depends on cosine similarity and language models. The second system is based on the celebrated Google NMT (GNMT) system. We first evaluate these systems in a supervised setting (where training and testing are done using our parallel corpus of Jordanian dialect and Modern Standard Arabic (MSA)) before moving to the unsupervised setting (where we train each system once on a Saudi-MSA parallel corpus and once on an Egyptian-MSA parallel corpus, and test them on the Jordanian-MSA parallel corpus). The highest BLEU score obtained in the unsupervised setting is 32.14 (by D2SLT trained on Saudi-MSA data), which is remarkably high compared with the highest BLEU score obtained in the supervised setting, 48.25.
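The D2SLT decoder scoring mechanism is described as combining cosine similarity with language-model scores. A minimal sketch of that kind of interpolation; the weight, embedding vectors, and probabilities below are all invented, and this is not the authors' actual scoring function:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def score_candidates(dialect_vec, candidates, lm_prob, alpha=0.5):
    """Pick the MSA candidate maximizing an interpolation of embedding
    cosine similarity and a language-model probability (toy weights)."""
    return max(candidates,
               key=lambda w: alpha * cosine(dialect_vec, candidates[w])
                             + (1 - alpha) * lm_prob[w])

dialect_vec = [1.0, 0.0, 1.0]                       # invented embedding
candidates = {"كثيرا": [0.9, 0.1, 0.8],             # near the dialect word
              "قليلا": [0.0, 1.0, 0.0]}             # orthogonal to it
lm_prob = {"كثيرا": 0.4, "قليلا": 0.3}              # invented LM scores
best = score_candidates(dialect_vec, candidates, lm_prob)
```

The candidate that is both embedding-close and LM-plausible wins, which is the intuition behind anchoring dialect and MSA words in a shared embedding space.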
  • Social media video summarization using multi-visual features and Kohonen's
           Self Organizing Map
    • Abstract: Publication date: May 2020. Source: Information Processing & Management, Volume 57, Issue 3. Author(s): Seema Rani, Mukesh Kumar. Abstract: Social networking tools such as Facebook, YouTube, Twitter, and Instagram are becoming major platforms for communication. YouTube, one of the primary video sharing platforms, serves over 100 million distinct videos, and 300 hours of video are uploaded to YouTube every minute along with textual data. This massive amount of multimedia data needs to be managed with high efficiency; the irrelevant and redundant data need to be removed. Video summarization deals with the problem of redundant data in a video. A summarized video contains the most distinctive frames, which are termed key frames. Most of the research work on key frame extraction considers only a single visual feature, which is not sufficient for capturing the full pictorial detail and hence affects the quality of the generated video summary. There is thus a need to explore multiple visual features for key frame extraction. In this research work, a key frame extraction technique based on the fusion of four visual features, namely correlation of RGB color channels, color histogram, mutual information and moments of inertia, is proposed. A Kohonen Self-Organizing Map is used as a clustering approach to find the most representative frames from the list of frames produced by the fusion. Useless frames are discarded, and the frames having maximum Euclidean distance within a cluster are selected as the final key frames. The results of the proposed technique are compared with existing video summarization techniques, namely user-generated summaries, Video SUMMarization (VSUMM), and Video Key Frame Extraction through Dynamic Delaunay Clustering (VKEDDCSC), and show a considerable improvement in terms of fidelity and Shot Reconstruction Degree (SRD) score.
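One of the four fused features, the color histogram, together with a maximum-Euclidean-distance selection inside a cluster, can be sketched as follows. The frame data and bin count are invented, and using the cluster's mean histogram as the reference point is an assumption on our part:

```python
import numpy as np

def color_histogram(frame, bins=8):
    """Per-channel intensity histogram, concatenated and L1-normalised."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def most_distant(frames):
    """Index of the frame farthest (Euclidean) from the cluster's mean
    histogram, mirroring the max-distance selection described above."""
    feats = np.array([color_histogram(f) for f in frames])
    centre = feats.mean(axis=0)
    return int(np.linalg.norm(feats - centre, axis=1).argmax())

rng = np.random.default_rng(2)
cluster = [rng.integers(0, 256, size=(16, 16, 3)) for _ in range(4)]
cluster.append(np.zeros((16, 16, 3), dtype=int))   # a visually distinct frame
key_frame = most_distant(cluster)
```

The all-black frame's histogram mass sits entirely in the lowest bin of each channel, so it lies farthest from the cluster mean and is picked.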
  • Karcı summarization: A simple and effective approach for automatic text
           summarization using Karcı entropy
    • Abstract: Publication date: May 2020. Source: Information Processing & Management, Volume 57, Issue 3. Author(s): Cengiz Hark, Ali Karcı. Abstract: Increases in the amount of text resources available via the Internet have amplified the need for automated document summarization tools. However, further efforts are needed to improve the quality of the summarization tools currently available. The current study proposes Karcı Summarization, a novel methodology for extractive, generic summarization of text documents, in which Karcı Entropy is used for the first time in a document summarization method. An important feature of the proposed system is that it does not require any external information source or training data. At the stage of presenting the input text, a text processing tool known as KUSH (named after its authors: Karcı, Uçkan, Seyyarer, and Hark) is introduced and used to preserve semantic consistency between sentences. The Karcı Entropy-based solution chooses the most effective, generic and most informative sentences within a paragraph or unit of text. The Karcı Summarization approach was tested on the open-access Document Understanding Conference datasets (DUC-2002, DUC-2004). Its performance was measured with the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The experimental results showed that the proposed summarizer outperformed all current state-of-the-art methods on 200-word summaries in terms of ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-W-1.2. In addition, the proposed summarizer outperformed its nearest competitor by 6.4% in ROUGE-1 Recall on the DUC-2002 dataset. These results demonstrate that Karcı Summarization is a promising technique and can therefore be expected to attract interest from researchers in the field.
The approach also shows high potential for adoption. Moreover, the method proved quite insensitive to disordered and missing text thanks to its KUSH text processing module.
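The abstract does not give the formula for Karcı entropy, so as a stand-in the general idea of entropy-scored extractive summarization can be sketched with Shannon entropy over each sentence's word distribution; the scoring function and toy document are assumptions, not the paper's method:

```python
import math
from collections import Counter

def shannon_entropy(words):
    """Shannon entropy of the word distribution of one sentence.
    Karcı entropy (the paper's measure) differs; Shannon is a stand-in."""
    counts = Counter(words)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def summarize(sentences, k=1):
    """Extractive sketch: keep the k sentences with the highest entropy,
    i.e. the most varied, information-dense word distributions."""
    scored = sorted(sentences,
                    key=lambda s: shannon_entropy(s.lower().split()),
                    reverse=True)
    return scored[:k]

doc = ["the cat sat on the mat near the cat",
       "entropy measures average information content of a source"]
summary = summarize(doc, k=1)
```

The repetitive sentence has a skewed word distribution and thus lower entropy, so the more informative sentence is selected.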
  • Graph-based Arabic text semantic representation
    • Abstract: Publication date: May 2020. Source: Information Processing & Management, Volume 57, Issue 3. Author(s): Wael Etaiwi, Arafat Awajan. Abstract: Semantic representation reflects the meaning of a text as it may be understood by humans, and thus helps facilitate various automated language processing applications. Although semantic representation is very useful for several applications, few models have been proposed for the Arabic language. In that context, this paper proposes a graph-based semantic representation model for Arabic text. The proposed model aims to extract the semantic relations between Arabic words. Several tools and concepts are employed, such as dependency relations, part-of-speech tags, named entities, patterns, and predefined Arabic linguistic rules. The core idea of the proposed model is to represent the meaning of an Arabic sentence as a rooted acyclic graph. The textual entailment recognition challenge is used to evaluate the ability of the proposed model to enhance other Arabic NLP applications. The experiments were conducted using a benchmark Arabic textual entailment dataset, namely ArbTED. The results show that the proposed graph-based model is able to enhance the performance of the textual entailment recognition task in comparison to baseline models. On average, the proposed model achieved improvements of 8.6%, 30.2%, 5.3% and 16.2% in accuracy, recall, precision, and F-score, respectively.
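As a generic illustration of the rooted-graph idea (not the paper's Arabic pipeline or rule set), a semantic graph can be assembled from dependency-style (head, relation, dependent) triples; the toy English triples below are stand-ins for the Arabic example:

```python
def build_semantic_graph(triples, root):
    """Build a rooted graph from (head, relation, dependent) triples and
    collect the nodes reachable from the root (a toy sketch)."""
    graph = {}
    for head, rel, dep in triples:
        graph.setdefault(head, []).append((rel, dep))
    # Breadth-first traversal from the root of the sentence.
    seen, queue = {root}, [root]
    while queue:
        for _, dep in graph.get(queue.pop(0), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return graph, seen

triples = [("wrote", "subject", "author"),
           ("wrote", "object", "book"),
           ("book", "modifier", "new")]
graph, reachable = build_semantic_graph(triples, "wrote")
```

Every word is reachable from the root predicate, so the structure is a rooted acyclic graph of labeled semantic relations, the representation the paper builds for each sentence.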
  • Fine-grained image classification with factorized deep user click feature
    • Abstract: Publication date: May 2020. Source: Information Processing & Management, Volume 57, Issue 3. Author(s): Min Tan, Jian Zhou, Zhiyou Peng, Jun Yu, Fang Tang. Abstract: The advantages of user click data greatly inspire its wide application in fine-grained image classification tasks. In previous click data-based image classification approaches, each image is represented as a click frequency vector over a pre-defined query/word dictionary. However, this approach not only suffers from high dimensionality, but also ignores the part of speech (POS) of a specific word as well as the correlations between words. To address these issues, we devise factorized deep click features to represent images. We first represent images as factorized TF-IDF click feature vectors to discover word correlations, constructing several word dictionaries of different POS. Afterwards, we learn an end-to-end deep neural network on click feature tensors built from these factorized TF-IDF vectors. We evaluate our approach on the public Clickture-Dog dataset. The evaluation shows that: 1) the deep click features learned on click tensors perform much better than traditional click frequency vectors; and 2) compared with many state-of-the-art textual representations, the proposed deep click features are more discriminative and achieve higher classification accuracy.
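The TF-IDF click feature construction mentioned above can be sketched generically: TF is a word's click frequency for an image, and IDF discounts words clicked for many images. The toy click logs and vocabulary are hypothetical:

```python
import math
from collections import Counter

def tf_idf(click_words, vocab):
    """TF-IDF vectors for per-image click words: TF is the click count of
    a word for the image; IDF down-weights words clicked across images."""
    n = len(click_words)
    df = Counter(w for words in click_words for w in set(words))
    idf = {w: math.log(n / df[w]) for w in vocab}
    vectors = []
    for words in click_words:
        tf = Counter(words)
        vectors.append([tf[w] * idf[w] for w in vocab])
    return vectors

clicks = [["husky", "dog", "dog"],   # clicks leading to image 1 (invented)
          ["poodle", "dog"]]         # clicks leading to image 2 (invented)
vocab = ["dog", "husky", "poodle"]
vecs = tf_idf(clicks, vocab)
```

Note how "dog", clicked for both images, gets zero weight, while the breed-specific words that discriminate fine-grained classes are emphasized; the paper then factorizes such vectors by POS before deep learning.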
  • An end-to-end pseudo relevance feedback framework for neural document
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Le Wang, Ze Luo, Canjia Li, Ben He, Le Sun, Hao Yu, Yingfei Sun. Abstract: Pseudo relevance feedback (PRF) is commonly used to boost the performance of traditional information retrieval (IR) models by using top-ranked documents to identify and weight new query terms, thereby reducing the effect of query-document vocabulary mismatches. While neural retrieval models have recently demonstrated promising results for ad-hoc document retrieval, combining them with PRF is not straightforward due to incompatibilities between existing PRF approaches and neural architectures. To bridge this gap, we propose an end-to-end neural PRF framework, coined NPRF, that enriches the representation of the user's information need from a single query to multiple PRF documents. NPRF can be used with existing neural IR models by embedding different neural models as building blocks. Three state-of-the-art neural retrieval models, the unigram DRMM and KNRM models and the position-aware PACRR model, are used to instantiate the NPRF framework. Extensive experiments on two standard test collections, TREC1-3 and Robust04, confirm the effectiveness of the proposed NPRF framework in improving the performance of the three neural IR models. In addition, analysis shows that integrating the existing neural IR models within the NPRF framework results in reduced training and validation losses and, consequently, improved effectiveness of the learned ranking functions.
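As background, the classical PRF step that NPRF generalizes (re-weighting terms from top-ranked documents and expanding the query) can be sketched as follows. The uniform document weighting and toy collection are assumptions, and this is not NPRF itself:

```python
from collections import Counter

def prf_expand(query, top_docs, n_terms=2):
    """Classic pseudo relevance feedback sketch: score terms by their
    average relative frequency in the top-ranked documents, then append
    the best non-query terms to the query."""
    scores = Counter()
    for doc in top_docs:
        tf = Counter(doc)
        total = sum(tf.values())
        for term, count in tf.items():
            # Uniform weight over feedback documents (an assumption).
            scores[term] += count / total / len(top_docs)
    candidates = [(t, s) for t, s in scores.most_common() if t not in query]
    return query + [t for t, _ in candidates[:n_terms]]

query = ["neural", "ranking"]
top_docs = [["neural", "ranking", "retrieval", "model", "retrieval"],
            ["deep", "retrieval", "model", "model"]]
expanded = prf_expand(query, top_docs)
```

Terms frequent across the pseudo-relevant documents ("model", "retrieval") are pulled into the query, easing the vocabulary mismatch that both classical PRF and NPRF target.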
  • Image caption generation with dual attention mechanism
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Maofu Liu, Lingjun Li, Huijun Hu, Weili Guan, Jing Tian. Abstract: As a crossing domain of computer vision and natural language processing, image caption generation has been an active research topic in recent years; it contributes to multimodal social media translation from unstructured image data to structured text data. Previous research has proposed a series of image captioning methods, such as template-based, retrieval-based, and encoder-decoder approaches. Among these, the encoder-decoder framework is the most widely used for image caption generation: the encoder extracts image features with a Convolutional Neural Network (CNN), and the decoder adopts a Recurrent Neural Network (RNN) to generate the image description. The Neural Image Caption (NIC) model has achieved good performance in image captioning; however, some challenges remain to be addressed. To tackle the lack of image information and the deviation from the core content of the image, our proposed model explores visual attention to deepen the understanding of the image, incorporating image labels generated by a Fully Convolutional Network (FCN) into the generation of the image caption. Furthermore, our proposed model exploits textual attention to increase the integrity of the information. Finally, the label generation, attached to the textual attention mechanism, and the image caption generation are merged to form an end-to-end trainable framework. Extensive experiments have been carried out on the AIC-ICC image caption benchmark dataset, and the results show that our proposed model is effective and feasible for image caption generation.
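At the core of both the visual and the textual attention mechanisms is the same computation: score candidate features against a decoder state, normalize the scores with a softmax, and take a weighted sum. A generic dot-product sketch (dimensions and random data are hypothetical, and real models typically use learned scoring layers):

```python
import numpy as np

def attend(query, features):
    """Dot-product attention: score each feature vector against the query,
    softmax the scores, and return the weights and the context vector."""
    scores = features @ query
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights, weights @ features

rng = np.random.default_rng(3)
regions = rng.normal(size=(5, 8))   # 5 image regions, 8-dim features (toy)
state = rng.normal(size=8)          # decoder hidden state (toy)
w, context = attend(state, regions)
```

The weights form a probability distribution over regions (or words, for textual attention), so the context vector emphasizes whichever inputs best match the current decoding step.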
  • Search task success evaluation by exploiting multi-view active
           semi-supervised learning
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Ling Chen, Alin Fan, Hongyu Shi, Gencai Chen. Abstract: Search task success rate is an important indicator for measuring the performance of search engines. In contrast to most previous approaches, which rely on labeled search tasks provided by users or third-party editors, this paper attempts to improve the performance of search task success evaluation by exploiting the unlabeled search tasks that exist in search logs as well as a small amount of labeled ones. Concretely, the Multi-view Active Semi-Supervised Search task Success Evaluation (MA4SE) approach is proposed, which exploits both labeled and unlabeled data by integrating the advantages of semi-supervised learning and active learning with a multi-view mechanism. In the semi-supervised learning part of MA4SE, we employ a multi-view semi-supervised learning approach that utilizes different parameter configurations to achieve disagreement between base classifiers. The base classifiers are trained separately from the pre-defined action and time views. In the active learning part of MA4SE, each classifier obtained from semi-supervised learning is applied to unlabeled search tasks, and the search tasks that need to be manually annotated are selected based on both the degree of disagreement between base classifiers and a regional density measurement. We evaluate the proposed approach on open datasets with two different definitions of search task success. The experimental results show that MA4SE outperforms the state-of-the-art semi-supervised search task success evaluation approach.
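The active learning step above selects the unlabeled search tasks on which the base classifiers disagree most (MA4SE additionally combines this with a regional density measurement, omitted here). A minimal sketch of disagreement-based selection, with hypothetical function names:

```python
def disagreement(votes):
    """Fraction of classifier pairs whose predicted labels differ
    on a single unlabeled task."""
    pairs = [(a, b) for i, a in enumerate(votes) for b in votes[i + 1:]]
    return sum(a != b for a, b in pairs) / len(pairs)

def select_for_labeling(pool_votes, budget=1):
    """Return indices of the unlabeled tasks (one vote list per task)
    with the highest disagreement among base classifiers."""
    ranked = sorted(range(len(pool_votes)),
                    key=lambda i: disagreement(pool_votes[i]),
                    reverse=True)
    return ranked[:budget]
```

Tasks on which all views agree contribute nothing to the labeling budget; the budget is spent where the views conflict and a human label is most informative.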
  • Revealing the political affinity of online entities through their Twitter followers
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Giorgos Stamatelatos, Sotirios Gyftopoulos, George Drosatos, Pavlos S. Efraimidis. Abstract: In this work, we show that the structural features of the Twitter online social network can divulge valuable information about the political affinity of the participating nodes. More precisely, we show that Twitter followers can be used to predict the political affinity of prominent Nodes of Interest (NOIs) they opt to follow. We utilize a series of purely structure-based algorithmic approaches, such as modularity clustering, the minimum linear arrangement (MinLA) problem and the DeGroot opinion update model, in order to reveal diverse aspects of the NOIs’ political profile. Our methods are applied to a dataset containing the Twitter accounts of the members of the Greek Parliament as well as an enriched dataset that additionally contains popular news sources. The results confirm the viability of our approach and provide evidence that the political affinity of NOIs can be determined with high accuracy via the Twitter follower network. Moreover, the outcome of an independently performed expert study about the offline political scene confirms the effectiveness of our methods.
  • Answering recreational web searches with relevant things to do results
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Omar Alonso, Vasileios Kandylas, Serge-Eric Tremblay, Stewart Whiting. Abstract: Recreational queries from users searching for places to go and things to do or see are very common in web and mobile search. Users specify constraints for what they are looking for, like suitability for kids, romantic ambiance or budget. Queries like “restaurants in New York City” are currently served by static local results or the thumbnail carousel. More complex queries like “things to do in San Francisco with kids” or “romantic places to eat in Seattle” require the user to click on every element of the search engine result page and read articles from Yelp, TripAdvisor, or WikiTravel to satisfy their needs. Location data, an essential part of web search, is even more prevalent with location-based social networks (LBSNs) and offers new opportunities for satisfying information seeking scenarios. In this paper, we address the problem of recreational queries in information retrieval and propose a solution that combines search query logs with LBSN data to match user needs and possible options. At the core of our solution is a framework that combines social, geographical, and temporal information in a relevance model centered around the use of semantic annotations on Points of Interest, with the goal of addressing these recreational queries. A central part of the framework is a taxonomy derived from behavioral data that drives the modeling and user experience. We also describe in detail the complexity of assessing and evaluating Point of Interest data, a topic that is usually not covered in related work, and propose task design alternatives that work well. We demonstrate the feasibility and scalability of our methods using a dataset of 1B check-ins and a large sample of real-world queries.
Finally, we describe the integration of our techniques in a commercial search engine.
  • Unwanted advances in higher education: Uncovering sexual harassment
           experiences in academia with text mining
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Amir Karami, Cynthia Nicole White, Kayla Ford, Suzanne Swan, Melek Yildiz Spinel. Abstract: Sexual harassment in academia is often a hidden problem because victims are usually reluctant to report their experiences. Recently, a web survey was developed to provide an opportunity to share thousands of sexual harassment experiences in academia. Using an efficient approach, this study collected and investigated more than 2,000 sexual harassment experiences to better understand these unwanted advances in higher education. This paper utilized text mining to disclose hidden topics and explore their weight across three variables: harasser gender, institution type, and victim's field of study. We mapped the topics on five themes drawn from the sexual harassment literature and found that more than 50% of the topics were assigned to the unwanted sexual attention theme. Fourteen percent of the topics were in the gender harassment theme, in which insulting, sexist, or degrading comments or behavior was directed towards women. Five percent of the topics involved sexual coercion (a benefit is offered in exchange for sexual favors), 5% involved sex discrimination, and 7% of the topics discussed retaliation against the victim for reporting the harassment, or for simply not complying with the harasser. Findings highlight the power differential between faculty and students, and the toll on students when professors abuse their power. While some topics did differ based on type of institution, there were no differences between the topics based on gender of harasser or field of study. This research can be beneficial to researchers in further investigation of this paper's dataset, and to policymakers in improving existing policies to create a safe and supportive environment in academia.
  • Does the review deserve more helpfulness when its title resembles the
           content? Locating helpful reviews by text mining
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Yusheng Zhou, Shuiqing Yang, Yixiao Li, Yuangao Chen, Jianrong Yao, Atika Qazi. Abstract: Online review helpfulness has always sparked heated discussion among academics and practitioners. Although research has extensively examined the impacts of review title and content on perceptions of online review helpfulness, the underlying mechanism of how the similarity between a review's title and its content may affect review helpfulness has rarely been explored. Based on mere exposure theory, a research model reflecting the influences of title-content similarity and sentiment consistency on review helpfulness was developed and empirically examined using data collected from 127,547 product reviews. TF-IDF and cosine similarity were used to measure the text similarity between review title and review content, and the Tobit model was used for regression analysis. The results showed that title-content similarity positively affects review helpfulness. In addition, the positive effect of title-content similarity on review helpfulness increases when title-content sentiment consistency is high. The title sentiment also negatively moderates the impact of title-content similarity on review helpfulness. The present research can help online retailers identify the most helpful reviews and thus reduce consumers' search costs, as well as assist reviewers in contributing more valuable online reviews.
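The title-content similarity measure described above (TF-IDF weighting followed by cosine similarity) can be sketched in plain Python. This is an illustrative toy version; the exact IDF smoothing variant is our choice, not necessarily the paper's:

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF vectors (as sparse dicts) for a small corpus of strings."""
    tokens = [d.lower().split() for d in docs]
    df = Counter(t for doc in tokens for t in set(doc))  # document frequency
    n = len(docs)
    return [{t: (c / len(doc)) * (1.0 + math.log(n / df[t]))
             for t, c in Counter(doc).items()} for doc in tokens]

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

For a review, the first vector would come from its title and the second from its content; a high cosine score indicates a title that faithfully previews the content.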
  • Locally and multiply distorted image quality assessment via multi-stage CNNs
    • Abstract: Publication date: Available online 30 November 2019. Source: Information Processing & Management. Author(s): Yuan Yuan, Hai Su, Juhua Liu, Guoqiang Zeng. Abstract: The majority of existing objective Image Quality Assessment (IQA) methods are designed specifically for singly and globally distorted images, and are incapable of dealing with locally and multiply distorted images effectively. On the one hand, the artificially extracted features in traditional IQA methods are insufficient to represent quality variations in locally and multiply distorted images. On the other hand, IQA methods suitable for both locally and multiply distorted images are scarce. In view of this, an IQA method based on multi-stage deep Convolutional Neural Networks (CNNs) is proposed for locally and multiply distorted images in this paper. The method adopts a three-stage strategy consisting of distortion classification, quality prediction for single distortions, and comprehensive assessment. Firstly, three datasets of locally, multiply and singly distorted images are designed and established. Secondly, a local and multiple distortion classifier, a distortion type classifier and prediction models for single distortions are obtained based on CNN models in their corresponding stages. Thirdly, the predicted results for single distortions are weighted by the output confidence probabilities of the classifiers, thus obtaining the final comprehensive quality. Experimental results verified the advantages of the proposed method in measuring the quality of locally and multiply distorted images.
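The third stage described above combines the single-distortion quality predictions by weighting each one with the classifier's confidence probability for that distortion type. A minimal sketch of that weighted fusion (the function name and example values are ours):

```python
def fuse_quality(confidence, prediction):
    """Final comprehensive quality: each single-distortion prediction
    is weighted by the classifier's (normalized) confidence in that
    distortion type, then the weighted scores are summed."""
    total = sum(confidence.values())
    return sum((confidence[d] / total) * prediction[d] for d in confidence)
```

If the classifier is 80% confident the image is blurred and 20% confident it is noisy, the blur-specific quality score dominates the final estimate.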
  • Investigating the lack of diversity in user behavior: The case of musical
           content on online platforms
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Rémy Poulain, Fabien Tarissan. Abstract: Whether dealing with issues related to information ranking (e.g. search engines) or content recommendation (on social networks, for instance), algorithms are at the core of processes that select which information is made visible. Such algorithmic choices have a strong de facto impact on users’ activity, and therefore on their access to information. This raises the question of how to measure the quality of the choices algorithms make and their impact on users. As a first step in that direction, this paper presents a framework with which to analyze the diversity of information accessed by users in the context of musical content. The approach adopted centers on the representation of user activity through a tripartite graph that maps users to products and products to categories. In turn, conducting random walks in this structure makes it possible to analyze how categories catch users’ attention and how this attention is distributed. Building upon this distribution, we propose a new index referred to as the (calibrated) Herfindahl diversity, which is aimed at quantifying the extent to which this distribution is diverse and representative of existing categories. To the best of our knowledge, this paper is the first to connect the output of random walks on graphs with diversity indexes. We demonstrate the benefit of such an approach by applying our index to two datasets that record user activity on online platforms involving musical content. The results are threefold. First, we show that our index can discriminate between different user behaviors. Second, we shed some light on a saturation phenomenon in the diversity of users’ attention.
Finally, we show that the lack of diversity observed in the datasets derives from exogenous factors related to the heterogeneous popularity of music styles, as opposed to internal factors such as recurrent user behaviors.
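The index mentioned above builds on the inverse Herfindahl index of the attention distribution over categories; the paper additionally calibrates it against the set of existing categories, a step omitted in this minimal sketch:

```python
def herfindahl_diversity(attention):
    """Inverse Herfindahl index of an attention distribution:
    1.0 when all attention falls on a single category, rising to
    the number of categories when attention is spread evenly."""
    total = sum(attention)
    shares = [a / total for a in attention]
    return 1.0 / sum(s * s for s in shares)
```

The value can be read as an "effective number of categories" actually receiving the user's attention.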
  • Effectiveness evaluation without human relevance judgments: A systematic
           analysis of existing methods and of their combinations
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Kevin Roitero, Andrea Brunello, Giuseppe Serra, Stefano Mizzaro. Abstract: In test collection based evaluation of retrieval effectiveness, it has been suggested to completely avoid using human relevance judgments. Although several methods have been proposed, their accuracy is still limited. In this paper we present two overall contributions. First, we provide a systematic comparison of all the most widely adopted previous approaches on a large set of 14 TREC collections. We aim at analyzing the methods in a homogeneous and complete way, in terms of the accuracy measures used as well as in terms of the datasets selected, showing that considerably different results may be achieved considering different methods, datasets, and measures. Second, we study the combination of such methods, which, to the best of our knowledge, has not been investigated so far. Our experimental results show that simple combination strategies based on data fusion techniques are usually not effective and even harmful. However, some more sophisticated solutions, based on machine learning, are indeed effective and often outperform all individual methods. Moreover, they are more stable, as they show a smaller variation across datasets. Our results have the practical implication that, when trying to automatically evaluate retrieval effectiveness, researchers should not use a single method, but a (machine-learning based) combination of them.
  • Network measures: A new paradigm towards reliable novel word sense detection
    • Abstract: Publication date: Available online 28 November 2019. Source: Information Processing & Management. Author(s): Abhik Jana, Animesh Mukherjee, Pawan Goyal. Abstract: In this era of digitization, with the fast flow of information on the web, words are being used to denote newer meanings. Novel sense detection thus becomes a crucial and challenging task for building any natural language processing application that depends on the efficient semantic representation of words. With the recent availability of large amounts of digitized text, automated analysis of language evolution has become possible. Given corpora from two different time periods, the main focus of our work is to precisely detect the words that have evolved a novel sense. We pose this problem as a binary classification task: detecting whether a new sense of a target word has emerged. This paper presents a unique proposal based on network features to improve the precision of this task. For a candidate word where a new sense has been detected by comparing the sense clusters induced at two different time periods, we further compare the network properties of the subgraphs induced from the novel sense clusters across these two time periods. Using the mean fractional change in edge density, structural similarity and average path length as features in a Support Vector Machine (SVM) classifier, manual evaluation gives precision values of 0.86 and 0.74 for the task of new sense detection, when tested on 2 distinct time-point pairs, compared to precision values in the range of 0.23-0.32 when the proposed scheme is not used. The outlined method can therefore be used as a new post-hoc step to improve the precision of novel word sense detection in a robust and reliable way where the underlying framework uses a graph structure.
Another important observation is that even though our proposal is a post-hoc step, it can be used in isolation, and by itself achieves a respectable precision of 0.54-0.62. Finally, we also show that our method is able to detect well-known historical shifts in 80% of cases.
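The classifier features above are fractional changes in network properties of the novel-sense subgraphs between the two time periods. A minimal sketch of how such features might be assembled (the property names and dict layout are our illustration, not the paper's implementation):

```python
def fractional_change(old, new):
    """Fractional change of a graph property across two time periods."""
    return (new - old) / old

def sense_shift_features(props_t1, props_t2):
    """Feature vector of fractional changes for properties of the
    subgraphs induced from the novel sense clusters (structural
    similarity would be added analogously as a third feature)."""
    return [fractional_change(props_t1[k], props_t2[k])
            for k in ("edge_density", "avg_path_length")]
```

A large jump in edge density or a collapse in average path length between the two periods signals that the induced sense cluster has genuinely reorganized rather than drifted by noise.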
  • Best practices for conducting fieldwork with marginalized communities
    • Abstract: Publication date: Available online 27 November 2019. Source: Information Processing & Management. Author(s): Devendra Potnis, Bhakti Gala. Abstract: Fieldwork is indispensable for understanding, explaining, and predicting the role of information and communication technologies for development (ICT4D) of marginalized communities. Engaging with marginalized communities is at the heart of ICT4D fieldwork. However, vulnerabilities of marginalized communities can prevent them from participating in fieldwork. The goal of this paper is to report best practices for engaging with marginalized communities. The findings are grounded in our fieldwork that consisted of ten three-hour sessions, each including focus groups, surveys, and hands-on exercises, with 152 participants earning less than USD 2 a day, in their native language at ten rural and urban public libraries in India. We conclude that a combination of proactive planning around the vulnerabilities of marginalized communities, constant reflective monitoring of the vulnerabilities and resulting challenges during fieldwork, and appropriate responsive action to address the challenges is one of the best ways of conducting fieldwork, since it helped us manage a majority of the challenges resulting from geographic, temporal, technological, financial, educational, psychological, informational, infrastructural, social, and cultural vulnerabilities of the participants. Our organic and structured guidance in the form of lessons learned can help library and information science researchers and practitioners to customize fieldwork around the vulnerabilities of marginalized communities.
  • Identifying breakthrough scientific papers
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Pavel Savov, Adam Jatowt, Radoslaw Nielek. Abstract: Citation analysis does not tell the whole story about the innovativeness of scientific papers. Works by prominent authors tend to receive disproportionately many citations, while publications by less well-known researchers covering the same topics may not attract as much attention. In this paper we address the shortcomings of traditional scientometric approaches by proposing a novel method that utilizes a classifier for predicting publication years based on latent topic distributions. We then calculate real-number innovation scores used to identify potential breakthrough papers and turnaround years. The proposed approach can complement existing citation-based measures of article importance and author contribution analysis; it also opens a novel research direction for time-based, innovation-centered evaluation of scientific output. In our experiments, we focus on two corpora of research papers published over several decades at two well-established conferences: The World Wide Web Conference (WWW) and the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), containing around 3500 documents in total. We indicate significant years and demonstrate examples of highly-ranked papers, thus providing novel insight into the evolution of the two conferences. Finally, we compare our results to citation analysis and discuss how our approach may complement traditional scientometrics.
  • Foreword to the special issue on mining actionable insights from social networks
    • Abstract: Publication date: Available online 22 November 2019Source: Information Processing & ManagementAuthor(s): Marcelo G. Armentano, Ebrahim Bagheri, Frank W. Takes, Virginia D. Yannibelli
  • Explore instance similarity: An instance correlation based hashing method
           for multi-label cross-modal retrieval
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Chengkai Huang, Xuan Luo, Jiajia Zhang, Qing Liao, Xuan Wang, Zoe.L. Jiang, Shuhan Qi. Abstract: With the rapid growth of multimedia data, cross-media hashing has gained more and more attention. However, most existing cross-modal hashing methods ignore multi-label correlation and only apply binary similarity to measure the correlation between two instances. They therefore perform poorly in capturing the relevance between retrieval results and queries, since binary similarity has limited ability to discriminate minor differences among instances. To overcome this shortcoming, we introduce a novel notion of instance similarity, which evaluates the semantic correlation between two specific instances in the training data. Based on the instance similarity, we also propose a novel deep instance hashing network, which utilizes instance similarity and binary similarity simultaneously for multi-label cross-modal retrieval. The experimental results on two real datasets show the superiority of our proposed method, compared with a series of state-of-the-art cross-modal hashing methods in terms of several evaluation metrics.
  • Graph neural news recommendation with long-term and short-term interest
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Linmei Hu, Chen Li, Chuan Shi, Cheng Yang, Chao Shao. Abstract: With the information explosion of news articles, personalized news recommendation has become important for users to quickly find news that they are interested in. Existing methods for news recommendation mainly include collaborative filtering methods, which rely on direct user-item interactions, and content-based methods, which characterize the content of a user's reading history. Although these methods have achieved good performance, they still suffer from the data sparsity problem, since most of them fail to extensively exploit high-order structure information (similar users tend to read similar news articles) in news recommendation systems. In this paper, we propose to build a heterogeneous graph to explicitly model the interactions among users, news and latent topics. The incorporated topic information helps indicate a user’s interest and alleviates the sparsity of user-item interactions. We then take advantage of graph neural networks to learn user and news representations that encode high-order structure information by propagating embeddings over the graph. The learned user embeddings with complete historic user clicks capture the users’ long-term interests. We also consider a user’s short-term interest using the recent reading history with an attention-based LSTM model. Experimental results on real-world datasets show that our proposed model significantly outperforms state-of-the-art methods on news recommendation.
  • What motivates physicians to share free health information on online
           health platforms?
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Xiaofei Zhang, Feng Guo, Tianxue Xu, Yongli Li. Abstract: Online platforms enable physicians to share health and medical information with the public; however, little research has been conducted to explore why physicians share free health education information. Drawing on motivation theory, this study develops a theoretical model to explore the influences of material and professional motivation on free information sharing and the contingent roles of professional expertise and online expertise. The model is tested using a six-month panel data set of 61,326 physicians’ sharing experiences. The results indicate that in addition to material motivation, professional motivation also plays a primary role in inducing physicians to share free information. However, when a physician's professional and online expertise is at a high level, the effect of material motivation is weakened and professional motivation plays a more important role. This study contributes to the literature on knowledge sharing, online health behavior, and motivation theory, and provides implications for practice.
  • Multimodal joint learning for personal knowledge base construction from
           Twitter-based lifelogs
    • Abstract: Publication date: Available online 12 November 2019. Source: Information Processing & Management. Author(s): An-Zi Yen, Hen-Hsen Huang, Hsin-Hsi Chen. Abstract: People routinely log their lives on social media platforms. In this paper, we aim to extract life events by leveraging both the visual and textual information shared on Twitter and to construct personal knowledge bases of individuals. The issues to be tackled include (1) not all text descriptions are related to life events, (2) life events in a text description can be expressed explicitly or implicitly, (3) the predicates in implicit life events are often absent, and (4) the mapping from natural language predicates to knowledge base relations may be ambiguous. A multimodal joint learning approach trained on both text and images from social media posts shared on Twitter is proposed to detect life events in tweets and extract event components including subjects, predicates, objects, and time expressions. Finally, the extracted information is transformed into knowledge base facts. The evaluation is performed on a collection of lifelogs from 18 Twitter users. Experimental results show that our proposed system is effective in life event extraction, and the constructed personal knowledge bases are expected to be useful for memory recall applications.
  • Multi-Modal fusion with multi-level attention for Visual Dialog
    • Abstract: Publication date: Available online 11 November 2019. Source: Information Processing & Management. Author(s): Jingping Zhang, Qiang Wang, Yahong Han. Abstract: Given an input image, Visual Dialog is introduced to answer a sequence of questions in the form of a dialog. To generate accurate answers for the questions in a dialog, we need to consider all the information in the dialog history, the question, and the image. However, existing methods usually utilize the high-level semantic information of whole sentences directly for the dialog history and the question, while ignoring the low-level detailed information of the words in each sentence. Similarly, the detailed low-level region information of the image also needs to be considered for question answering. Therefore, we propose a novel visual dialog method that focuses on both high-level and low-level information of the dialog history, the question, and the image. In our approach, we introduce three low-level attention modules, whose goal is to enhance the representation of words in the sentences of the dialog history and the question based on word-to-word connections, and to enrich the region information of the image based on region-to-region relations. Besides, we design three high-level attention modules to select important words in the sentences of the dialog history and the question as a supplement to the detailed information for semantic understanding, as well as to select relevant regions in the image to provide targeted visual information for question answering. We evaluate the proposed approach on two datasets: VisDial v0.9 and VisDial v1.0. The experimental results demonstrate that utilizing both low-level and high-level information indeed enhances the representation of the inputs.
  • Heterogeneous graph-based joint representation learning for users and POIs
           in location-based social network
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Yaqiong Qiao, Xiangyang Luo, Chenliang Li, Hechan Tian, Jiangtao Ma. Abstract: Learning latent representations for users and points of interest (POIs) is an important task in location-based social networks (LBSN), which could largely benefit multiple location-based services, such as POI recommendation and social link prediction. Many contextual factors, like geographical influence, user social relationships and temporal information, are available in LBSN and would be useful for this task. However, incorporating all these contextual factors for user and POI representation learning in LBSN remains challenging, due to their heterogeneous nature. Although encouraging performance on POI recommendation and social link prediction has been reported, most existing representation learning methods for LBSN incorporate only one or two of these contextual factors. In this paper, we propose a novel joint representation learning framework for users and POIs in LBSN, named UP2VEC. In UP2VEC, we present a heterogeneous LBSN graph to incorporate all the aforementioned factors. Specifically, the transition probabilities between nodes inside the heterogeneous graph are derived by jointly considering these contextual factors. The latent representations of users and POIs are then learnt by matching the topological structure of the heterogeneous graph. To evaluate the effectiveness of UP2VEC, a series of experiments are conducted with two real-world datasets (Foursquare and Gowalla) in terms of POI recommendation and social link prediction. Experimental results demonstrate that the proposed UP2VEC significantly outperforms existing state-of-the-art alternatives. A further experiment shows the superiority of UP2VEC in handling the cold-start problem for POI recommendation.
  • A social-semantic recommender system for advertisements
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Francisco García-Sánchez, Ricardo Colomo-Palacios, Rafael Valencia-García. Abstract: Social applications foster the involvement of end users in Web content creation, as a result of which a new source of vast amounts of data about users and their likes and dislikes has become available. Having access to users’ contributions to social sites and gaining insights into consumers’ needs is of the utmost importance for marketing decision making in general, and for advertisement recommendation in particular. By analyzing this information, advertisement recommendation systems can attain a better understanding of users’ interests and preferences, thus allowing these solutions to provide more precise ad suggestions. However, in addition to the already complex challenges that hamper the performance of recommender systems (i.e., data sparsity, cold-start, diversity, accuracy and scalability), new issues have emerged from the need to deal with heterogeneous data gathered from disparate sources. The technologies surrounding Linked Data and the Semantic Web have proved effective for knowledge management and data integration. In this work, an ontology-based advertisement recommendation system that leverages the data produced by users in social networking sites is proposed, and this approach is substantiated by a shared ontology model with which to represent both users’ profiles and the content of advertisements. Both users and advertisements are represented by means of vectors generated using natural language processing techniques, which collect ontological entities from textual content. The ad recommender framework has been extensively validated in a simulated environment, obtaining an aggregated f-measure of 79.2% and a Mean Average Precision at 3 (MAP@3) of 85.6%.
  • Towards a model for spoken conversational search
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Johanne R. Trippas, Damiano Spina, Paul Thomas, Mark Sanderson, Hideo Joho, Lawrence Cavedon. Conversation is the natural mode for information exchange in daily life, so a spoken conversational interaction for search input and output is a logical format for information seeking. However, the conceptualisation of user–system interactions or information exchange in spoken conversational search (SCS) has not been explored. The first step in conceptualising SCS is to understand the conversational moves used in an audio-only communication channel for search. This paper explores conversational actions for the task of search. We define a qualitative methodology for creating conversational datasets, propose analysis protocols, and develop the SCSdata. Furthermore, we use the SCSdata to create the first annotation schema for SCS, the SCoSAS, enabling us to investigate interactivity in SCS. We further establish that SCS needs to incorporate interactivity and pro-activity to overcome the complexity that the information-seeking process in an audio-only channel poses. In summary, this exploratory study unpacks the breadth of SCS. Our results highlight the need for integrating discourse in future SCS models and contribute to the advancement of the formalisation of SCS models and the design of SCS systems.
  • node2hash: Graph aware deep semantic text hashing
    • Abstract: Publication date: Available online 2 November 2019. Source: Information Processing & Management. Author(s): Suthee Chaidaroon, Dae Hoon Park, Yi Chang, Yi Fang. Semantic hashing is an effective method for fast similarity search, which maps high-dimensional data to a compact binary code that preserves the semantic information of the original data. Most existing text hashing approaches treat each document separately and only learn the hash codes from the content of the documents. However, in reality, documents are related to each other either explicitly through an observed linkage such as citations or implicitly through unobserved connections such as adjacency in the original space. Document relationships are pervasive in the real world, yet they are largely ignored in prior semantic hashing work. In this paper, we propose node2hash, an unsupervised deep generative model for semantic text hashing that utilizes graph context. It is designed to incorporate both document content and connection information through a probabilistic formulation. Based on the deep generative modeling framework, node2hash employs deep neural networks to learn complex mappings from the original space to the hash space. Moreover, the probabilistic formulation enables a principled way to generate hash codes for unseen documents that do not have any connections with the existing documents. Besides, node2hash can go beyond the one-hop connections of directly linked documents by considering more global graph information. We conduct comprehensive experiments on seven datasets with explicit and implicit connections. The results demonstrate the effectiveness of node2hash over competitive baselines.
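Semantic hashing as described above maps documents to compact binary codes that are searched by Hamming distance. The sketch below is not node2hash itself (that is a deep generative model); it only illustrates the generic binarize-then-search step, with a simple per-dimension median threshold as an assumed binarization rule:

```python
import numpy as np

def binarize(embeddings):
    """Threshold each embedding dimension at its corpus-wide median,
    yielding roughly balanced binary codes (one assumed scheme)."""
    thresholds = np.median(embeddings, axis=0)
    return (embeddings > thresholds).astype(np.uint8)

def hamming_search(codes, query_code, top_k=2):
    """Rank documents by Hamming distance to the query's binary code."""
    dists = np.count_nonzero(codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")[:top_k]
```

Because the codes are binary, the distance computation reduces to bit counting, which is what makes the similarity search fast at scale.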
  • An effective approach to candidate retrieval for cross-language plagiarism
           detection: A fusion of conceptual and keyword-based schemes
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Meysam Roostaee, Mohammad Hadi Sadreddini, Seyed Mostafa Fakhrahmad. Due to the rapid growth of documents and manuscripts in various languages all over the world, plagiarism detection has become a challenging task, especially for cross-lingual cases. Because of this, in today's plagiarism detection systems a candidate retrieval process is developed as the first step, in order to reduce the set of documents for comparison to a reasonable number. The performance of the second step of plagiarism detection, which is devoted to a detailed analysis of the candidates, is tightly dependent on the candidate retrieval phase. Given its high importance, the present study focuses on the candidate retrieval task and aims to accurately extract the minimal set of the most promising source documents. The paper proposes a fusion of concept-based and keyword-based retrieval models for this purpose. A dynamic interpolation factor is used in the proposed scheme in order to combine the results of the conceptual and bag-of-words models. The effectiveness of the proposed model for cross-language candidate retrieval is also compared with state-of-the-art models over German-English and Spanish-English language partitions. The results show that the proposed candidate retrieval model outperforms the state-of-the-art models and can be considered a proper choice to be embedded in cross-language plagiarism detection systems.
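The fusion of conceptual and keyword-based scores via an interpolation factor can be sketched as a linear combination over the union of retrieved documents. The paper derives the factor dynamically per query; in this sketch it is simply a parameter, and the function name is hypothetical:

```python
def fuse_scores(concept_scores, keyword_scores, alpha):
    """Linearly interpolate two per-document score dictionaries.

    alpha weights the concept-based model; (1 - alpha) weights the
    keyword-based model. Missing documents score 0 in a model."""
    docs = set(concept_scores) | set(keyword_scores)
    return {d: alpha * concept_scores.get(d, 0.0)
               + (1 - alpha) * keyword_scores.get(d, 0.0)
            for d in docs}
```

Documents found by only one of the two models still receive a fused score, which is what lets the combination recover candidates that either model alone would miss.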
  • Information Technology (IT) enabled crowdsourcing: A conceptual framework
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Minoo Modaresnezhad, Lakshmi Iyer, Prashant Palvia, Vasyl Taras. IT-enabled crowdsourcing is defined as technology-enabled outsourcing of tasks through an open call to the masses via the internet. Crowdsourcing is an IT artifact that has gone beyond the traditional boundaries of an organization to a much broader context. Over the past decade, research and practice on crowdsourcing have continued to grow, evolve, and revolutionize the way work gets done. Although numerous studies have been conducted in this area, our understanding of the main components involved in crowdsourcing processes remains limited. The goal of the current study is to conduct a structured literature review and synthesize the available crowdsourcing literature and applications in one coherent conceptual framework. The framework identifies the main elements involved in the crowdsourcing process and its characteristics. This framework extends the field of Information Systems (IS) and would help us better understand this phenomenon. Furthermore, the results of this study could potentially fill the knowledge gap in the crowdsourcing literature by identifying the main characteristics of a crowdsourcing process as a legitimate, IT-enabled form of problem-solving. Our results would also help organizations to leverage crowdsourcing more efficiently.
  • Identifying crisis-related informative tweets using learning on distributions
    • Abstract: Publication date: March 2020. Source: Information Processing & Management, Volume 57, Issue 2. Author(s): Seyed Hossein Ghafarian, Hadi Sadoghi Yazdi. Social networks like Twitter are good means for people to express themselves and ask for help in times of crisis. However, to provide help, authorities need to identify informative posts on the network from the vast amount of non-informative ones to better know what is actually happening. Traditional methods for identifying informative posts put emphasis on the presence or absence of certain words, which has limitations for classifying these posts. In contrast, in this paper, we propose to consider the overall distribution of words in the post. To do this, based on the distributional hypothesis in linguistics, we assume that each tweet is a distribution from which we have drawn a sample of words. Building on recent developments in learning methods, namely learning on distributions, we propose an approach that identifies informative tweets using this distributional assumption. Extensive experiments have been performed on Twitter data from more than 20 crisis incidents covering nearly all types of incidents. These experiments show the superiority of the proposed approach in a number of real crisis incidents. This implies that better modelling of the content of a tweet, based on recent advances in estimating distributions, together with domain-specific knowledge for various types of crisis incidents such as floods or earthquakes, may help to achieve higher accuracy in the task.
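Under the distributional assumption above, a tweet is treated as a sample of words drawn from an underlying distribution. A common, very simple summary of such a sample is the empirical mean of the word embeddings; this is only a rough stand-in for the paper's learning-on-distributions machinery, not the method itself:

```python
import numpy as np

def tweet_embedding(tokens, word_vectors):
    """Summarize a tweet, viewed as a word sample from a distribution,
    by its empirical mean embedding; out-of-vocabulary words are skipped."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        dim = len(next(iter(word_vectors.values())))
        return np.zeros(dim)  # no known words: fall back to the zero vector
    return np.mean(vecs, axis=0)
```

Any standard classifier could then be trained on these fixed-length summaries to separate informative from non-informative tweets.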
  • On Black Wikipedians: Motivations behind content contribution
    • Abstract: Publication date: Available online 23 October 2019. Source: Information Processing & Management. Author(s): Brenton Stewart, Boryung Ju. Despite a growing body of scholarship on Wikipedia, very few studies focus on the motivational factors that compel underrepresented populations to contribute to the online encyclopedia. This study examines the information behavior of Black Wikipedians through a proposed research model that considers black altruism and perceptions of information quality among the factors that drive content contribution. Data were collected from an online survey of 318 Black-identified Wikipedia contributors and analyzed using Partial Least Squares (PLS), a Structural Equation Modeling (SEM) technique, to measure the degree of association among factors. Results show that content contribution among Black Wikipedians is driven by self-interest and contributors’ perceptions of information quality on Wikipedia. Social presence and black altruism also have significant but indirect influences on Black Wikipedians’ content contribution. Our proposed research model for content contribution is statistically validated.
  • Content-based characterization of online social communities
    • Abstract: Publication date: Available online 15 October 2019. Source: Information Processing & Management. Author(s): Giorgia Ramponi, Marco Brambilla, Stefano Ceri, Florian Daniel, Marco Di Giovanni. Nowadays, social networks are becoming an essential ingredient of our life and the fastest way to share ideas and influence people. Interaction within social networks tends to take place within communities, sets of social accounts which share friendships, ideas, interests and passions; detecting digital communities is of increasing relevance from a social and economic point of view. In this paper, we analyze the problem of community detection from a content analysis perspective: we argue that the content produced in social interaction is a very distinctive feature of a community, hence it can be effectively used for community detection. We analyze the problem from a textual perspective using only syntactic and semantic features, including high-level latent features that we denote as topics. We show that, by inspecting the content of tweets, we can achieve very effective classifiers and predictors of account membership within a given community. We describe the features that best constitute a vocabulary, provide their comparative evaluation and select the best features for the task, and finally illustrate an application of our approach to some concrete community detection scenarios, such as Italian politics and targeted advertising.
  • Label consistent locally linear embedding based cross-modal hashing
    • Abstract: Publication date: Available online 7 October 2019. Source: Information Processing & Management. Author(s): Hui Zeng, Huaxiang Zhang, Lei Zhu. Hashing methods have gained widespread attention in cross-modal retrieval applications due to their efficiency and effectiveness. Many methods have been proposed, but they fail to capture feature-based similarity consistency or the discriminative semantics of label consistency. In addition, most of them suffer from large quantization loss, resulting in low retrieval performance. To address these issues, we propose a novel cross-modal hashing method named Label Consistent Locally Linear Embedding based Cross-modal Hashing (LCLCH). LCLCH preserves the non-linear manifold structure of different modality data via Locally Linear Embedding, and transforms heterogeneous data into a latent common semantic space to reduce the semantic gap and support cross-modal retrieval tasks. Therefore, it not only discovers the potential correlation of heterogeneous cross-modal data but also maintains label consistency. To further ensure the effectiveness of hash code learning, we utilize an iterative quantization method to handle the discrete optimization task and obtain the hash codes directly. We compare LCLCH with several advanced supervised and unsupervised methods on three datasets to evaluate its effectiveness.
  • Building a morpho-semantic knowledge graph for Arabic information retrieval
    • Abstract: Publication date: Available online 25 September 2019. Source: Information Processing & Management. Author(s): Ibrahim Bounhas, Nadia Soudani, Yahya Slimani. In this paper, we propose to build a morpho-semantic knowledge graph from Arabic vocalized corpora. Our work focuses on classical Arabic, as it has not been deeply investigated in related works. We use a tool suite which allows analyzing and disambiguating Arabic texts, taking into account short diacritics to reduce ambiguities. At the morphological level, we combine the Ghwanmeh stemmer and MADAMIRA, which are adapted to extract a multi-level lexicon from Arabic vocalized corpora. At the semantic level, we infer semantic dependencies between tokens by exploiting contextual knowledge extracted by a concordancer. Both morphological and semantic links are represented through compressed graphs, which are accessed through lazy methods. These graphs are mined using a measure inspired by BM25 to compute one-to-many similarity. We then evaluate the morpho-semantic knowledge graph in the context of Arabic Information Retrieval (IR). Several scenarios of document indexing and query expansion are assessed. That is, we vary indexing units for Arabic IR based on different levels of morphological knowledge, a challenging issue which has not yet been resolved in previous research. We also experiment with several combinations of morpho-semantic query expansion. This permits us to validate our resource and to study its impact on IR based on state-of-the-art evaluation metrics.
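The BM25 measure that inspires the graph-mining similarity above has a standard form. As a reference sketch of plain BM25 over tokenized documents (not the paper's adapted one-to-many variant), with the usual k1 and b parameters:

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Classic BM25 score of one tokenized document against a query.

    `corpus` is the list of all tokenized documents; it supplies the
    document frequencies and the average document length."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue  # unseen term contributes nothing
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        tf = doc_terms.count(term)
        norm = tf + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * tf * (k1 + 1) / norm
    return score
```

The saturation in the tf term and the length normalization controlled by b are the two properties that typically make BM25-style weighting attractive for similarity over graph neighborhoods as well.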
  • Conceptualising misinformation in the context of asylum seekers
    • Abstract: Publication date: Available online 19 September 2019. Source: Information Processing & Management. Author(s): Hilda Ruokolainen, Gunilla Widén. This conceptual paper focuses on misinformation in the context of asylum seekers. We conducted a literature review on the concept of misinformation, which showed that a more nuanced understanding of information and misinformation is needed. To understand and study different viewpoints when it comes to the perception of the accuracy of information, we introduce two new concepts: perceived misinformation and normative misinformation. The concepts are especially helpful when marginalised and vulnerable groups are studied, as these groups may perceive information differently compared to majority populations. Our literature review on the information practices of asylum seekers shows that asylum seekers come across different types of misinformation. These include official information that is inadequate or presented inadequately, outdated information, misinformation via gatekeepers and other mediators, information giving false hope or unrealistic expectations, rumours and distorted information. The diversity of misinformation in their lives shows that there is a need to understand information in general in a broad and more nuanced way. Based on this idea, we propose a Social Information Perception model (SIP), which shows that different social, cultural and historical aspects, as well as situation and context, are involved in the mental process which determines whether people perceive information as accurate information, misinformation or disinformation. The model, as well as the concepts of perceived and normative misinformation, are helpful when the information practices of marginalised and vulnerable groups are studied, giving a holistic view of their information situation. Understanding the information practices more holistically enables different actors to give trustworthy information in an understandable and culturally meaningful way to the asylum seekers.
  • Enhancing usability of digital libraries: Designing help features to
           support blind and visually impaired users
    • Abstract: Publication date: Available online 16 September 2019. Source: Information Processing & Management. Author(s): Iris Xie, Rakesh Babu, Tae Hee Lee, Melissa Davey Castillo, Sukjin You, Ann M Hanlon. Blind and visually impaired (BVI) users experience vulnerabilities in digital library (DL) environments largely due to limitations in DL design that prevent them from effectively interacting with DL content and features. Existing research has not adequately examined how BVI users interact with DLs, nor the typical problems encountered during interactions. This is the first study conducted to test whether implementing help features corresponding to BVI users’ needs can reduce five critical help-seeking situations they typically encounter, with the goal of further enhancing the usability of DLs. Multiple data collection methods, including pre-questionnaires, think-aloud protocols, transaction logs, and pre- and post-search interviews, were employed in an experimental design. Forty subjects were divided into two groups with similar demographic data based on data generated from pre-questionnaires. The findings of this study show that the experimental group encountered fewer help-seeking situations than the control group when interacting with the experimental and baseline versions of a DL. Moreover, the experimental group outperformed the control group on perceived usefulness of the DL features, ease of use of the DL, and DL satisfaction. This study provides theoretical and practical contributions to the field of library and information science. Theoretically, this study frames vulnerabilities of BVI users within the social model of disability, in which improper DL design impairs their ability to effectively access and use DLs. Practically, this study takes into account BVI users’ critical help-seeking situations and further translates these into the design of help features to improve the usability of DLs.
  • Consumer health information needs: A systematic review of measures
    • Abstract: Publication date: Available online 14 September 2019. Source: Information Processing & Management. Author(s): Wenjing Pian, Shijie Song, Yan Zhang. Information needs motivate human information behavior. Knowledge of information needs is critical for user-centered information behavior research and system design. In consumer health information behavior research, there is a lack of understanding of how consumer health information needs (CHIN) is measured in empirical studies. This study is a systematic review of empirical quantitative studies on CHIN, with a focus on how CHIN is defined and operationalized. A search of six academic databases and citation tracking of relevant articles identified a total of 216 relevant articles. These articles were analyzed using the qualitative content analysis method. We found that few included articles explicitly defined either CHIN or information needs in general. When definitions were given, they were from a cognitive perspective and largely ignored the multidimensionality of the concept. Consistent with this cognitive-centered conceptualization, CHIN was operationalized primarily as information topics, with some articles also measuring several additional attributes, including level of importance, fulfilment, amount of information needed, and frequency of needs. These findings suggest that CHIN is undertheorized. To address this gap, future studies should attend to the social and emotional dimensions of CHIN, such as motivations, goals, activities, and emotions. Further, more research is needed to understand how CHIN is related to consumer health information seeking behavior and to the social and environmental context in which the needs arise.
  • Focal elements of neural information retrieval models. An outlook through
           a reproducibility study
    • Abstract: Publication date: Available online 13 September 2019. Source: Information Processing & Management. Author(s): Stefano Marchesin, Alberto Purpura, Gianmaria Silvello. This paper analyzes two state-of-the-art Neural Information Retrieval (NeuIR) models: the Deep Relevance Matching Model (DRMM) and the Neural Vector Space Model (NVSM). Our contributions include: (i) a reproducibility study of two state-of-the-art supervised and unsupervised NeuIR models, where we present the issues we encountered during their reproduction; (ii) a performance comparison with other lexical, semantic and state-of-the-art models, showing that traditional lexical models are still highly competitive with DRMM and NVSM; (iii) an application of DRMM and NVSM to collections from heterogeneous search domains and in different languages, which helped us to analyze the cases where DRMM and NVSM can be recommended; (iv) an evaluation of the impact of varying word embedding models on DRMM, showing how relevance-based representations generally outperform semantic-based ones; (v) a topic-by-topic evaluation of the selected NeuIR approaches, comparing their performance to the well-known BM25 lexical model, where we perform an in-depth analysis of the different cases where DRMM and NVSM outperform the BM25 model or fail to do so. We run an extensive experimental evaluation to check whether the improvements of NeuIR models, if any, over the selected baselines are statistically significant.
  • Evaluating the use of interactive virtual reality technology with older
           adults living in residential aged care
    • Abstract: Publication date: Available online 9 September 2019. Source: Information Processing & Management. Author(s): Steven Baker, Jenny Waycott, Elena Robertson, Romina Carrasco, Barbara Barbosa Neves, Ralph Hampson, Frank Vetere. Background and objectives: As technologies gain traction within the aged care community, better understanding their impact becomes vital. This paper reports on a study that explored the deployment of virtual reality (VR) as a tool to engage older adults in Residential Aged Care Facilities (RACF). The paper has two aims: 1) to identify the benefits and challenges associated with using VR with residents in aged care settings, and 2) to gather the views of older adult residents in RACF about the potential uses of VR in aged care. Research design and methods: Five RACF residents and five RACF staff members took part in an intensive two-week evaluation of a VR system. Qualitative data was collected from multiple interviews and via researcher notes and video recordings made during the VR sessions. Results: The results highlight the usability issues that impacted on the aged care residents' ability to use interactive VR technology and the potential negative impact head-mounted displays can have on those living with dementia; the role that VR can play in engaging residents who might otherwise self-isolate; and how this can extend to increased engagement with family and friends. Discussion and implications: We discuss the design challenges that will need to be met in order to ensure that interactive VR technology can be used by residents living in aged care, and the potential for VR to be used as a tool to improve the quality of life of some older residents, particularly those for whom traditional social activities do not appeal.
  • Immigrating after 60: Information experiences of older Chinese migrants to
           Australia and Canada
    • Abstract: Publication date: Available online 9 September 2019. Source: Information Processing & Management. Author(s): Nadia Caidi, Jia Tina Du, Lei Li, Junyue Mavis Shen, Qiaoling Sun. While there is much research on migrant information behavior, the older population tends to be underrepresented in the literature. This article reports on a qualitative study with 16 Chinese older adults (aged 60 and over) who were recent immigrants to Australia and Canada. Migrating late in life presents some unique characteristics and challenges. In both countries, the discourse of “family reunification” frames the experiences of the participants, including their information activities as they learn to navigate the new environment. We used a parallel approach across the two countries to examine these older adults’ information practices as well as the transnational dimension of their settlement process. Findings point to a shared social imaginary as well as daily rituals and coping mechanisms of these late-life immigrants, along with associated information activities. We draw implications for our understanding of this under-studied migrant population, as well as for the design of information support for older migrants as part of their social inclusion in the host country.
  • Large-scale instance-level image retrieval
    • Abstract: Publication date: Available online 29 August 2019. Source: Information Processing & Management. Author(s): Giuseppe Amato, Fabio Carrara, Fabrizio Falchi, Claudio Gennaro, Lucia Vadicamo. The great success of visual features learned from deep neural networks has led to a significant effort to develop efficient and scalable technologies for image retrieval. Nevertheless, the use of such features in large-scale Web applications of content-based retrieval is still challenged by their high dimensionality. To overcome this issue, some image retrieval systems employ the product quantization method to learn a large-scale visual dictionary from a training set of global neural network features. These approaches are implemented in main memory, preventing their usage in big-data applications. This work mainly investigates approaches that transform neural network features into text forms suitable for being indexed by a standard full-text retrieval engine such as Elasticsearch. The basic idea of our approaches relies on a transformation of neural network features with the twofold aim of promoting sparsity without the need for unsupervised pre-training. We validate our approach on a recent convolutional neural network feature, namely Regional Maximum Activations of Convolutions (R-MAC), which is a state-of-the-art descriptor for image retrieval. Its effectiveness has been proved through several instance-level retrieval benchmarks. An extensive experimental evaluation conducted on the standard benchmarks shows the effectiveness and efficiency of the proposed approach and how it compares to state-of-the-art main-memory indexes.
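One family of techniques for indexing neural features in a full-text engine such as Elasticsearch is the "surrogate text" idea: vector components become repeated synthetic tokens so that term frequencies mirror component magnitudes, and the engine's TF-based scoring then approximates the inner product. A minimal sketch (the quantization scheme and token names are assumptions, not necessarily the paper's exact transformation):

```python
def feature_to_surrogate_text(feature, quantization=10):
    """Turn a sparse, non-negative feature vector into a surrogate text:
    component i of magnitude v is emitted as token 'f<i>' repeated
    round(v * quantization) times, so term frequency tracks magnitude."""
    tokens = []
    for i, value in enumerate(feature):
        repeats = int(round(value * quantization))
        tokens.extend([f"f{i}"] * repeats)  # zero components emit nothing
    return " ".join(tokens)
```

The resulting strings can be fed to any inverted-index engine unchanged, which is what makes the approach attractive for big-data settings where a main-memory index does not fit.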
  • Deep ranking based cost-sensitive multi-label learning for distant
           supervision relation extraction
    • Abstract: Publication date: Available online 26 August 2019. Source: Information Processing & Management. Author(s): Hai Ye, Zhunchen Luo. Knowledge bases provide a potential way to improve the intelligence of information retrieval (IR) systems, because a knowledge base contains numerous relations between entities that can help IR systems conduct inference from one entity to another. Relation extraction is one of the fundamental techniques for constructing a knowledge base. Distant supervision is a semi-supervised learning method for relation extraction which learns from labeled and unlabeled data. However, this approach suffers from the problem of relation overlapping, in which one entity tuple may have multiple relation facts. We believe that relation types can have latent connections, which we call class ties, and that these can be exploited to enhance relation extraction. However, this property of relation classes has not been fully explored before. In this paper, to exploit class ties between relations to improve relation extraction, we propose a general ranking-based multi-label learning framework combined with convolutional neural networks, in which ranking-based loss functions with a regularization technique are introduced to learn the latent connections between relations. Furthermore, to deal with the problem of class imbalance in distant supervision relation extraction, we adopt cost-sensitive learning to rescale the costs from the positive and negative labels. Extensive experiments on a widely used dataset show the effectiveness of our model in exploiting class ties and relieving the class imbalance problem.
  • Information need: Introduction to the special issue
    • Abstract: Publication date: Available online 23 August 2019. Source: Information Processing & Management. Author(s): Pia Borlund, Ian Ruthven.
  • Dynamic attention-based explainable recommendation with textual and visual fusion
    • Abstract: Publication date: Available online 20 August 2019. Source: Information Processing & Management. Author(s): Peng Liu, Lemei Zhang, Jon Atle Gulla. Explainable recommendation, which provides explanations about why an item is recommended, has attracted growing attention in both the research and industry communities. However, most existing explainable recommendation methods cannot provide multi-modal explanations consisting of both textual and visual modalities, or adaptive explanations tailored to the user’s dynamic preferences, potentially leading to the degradation of customers’ satisfaction, confidence and trust in the recommender system. On the technical side, the Recurrent Neural Network (RNN) has become the most prevalent technique for modeling dynamic user preferences. Benefiting from the natural characteristics of RNNs, the hidden state is a combination of long-term dependency and short-term interest to some degree. However, it works like a black box, and the monotonic temporal dependency of RNNs is not sufficient to capture the user’s short-term interest. In this paper, to deal with the above issues, we propose a novel Attentive Recurrent Neural Network (Ante-RNN) with textual and visual fusion for dynamic explainable recommendation. Specifically, our model jointly learns image representations with textual alignment and text representations with a topical attention mechanism in a parallel way. Then a novel dynamic contextual attention mechanism is incorporated into Ante-RNN to model the complicated correlations among recent items and strengthen the user’s short-term interests. By combining full latent visual-semantic alignments with a hybrid attention mechanism including topical and contextual attentions, Ante-RNN makes the recommendation process more transparent and explainable. Extensive experimental results on two real-world datasets demonstrate the superior performance and explainability of our model.
  • Taylor's Q1 “Visceral” level of information need: What is it?
    • Abstract: Publication date: Available online 20 August 2019. Source: Information Processing & Management. Author(s): Charles Cole. Taylor (1968) dramatically stated that information seekers/searchers do not use their real Q1-level information need when formulating their query to the system. Instead, they use a compromised Q4-level form of their need. The article directly confronts what Taylor's (1968) Q1-level information need is: the “actual” or “real” information need of the searcher. The article conceptually and operationally defines Taylor's Q1-level of information need using Belkin's (1980) ASK concept as a basis for designing a system intervention that shifts the searcher from representing the Q4-level compromised form of the need in her query to representing instead her Q1-level real information need. The article describes the Q1 Actualizing Intervention Model, which can be built into a system capable of actualizing the uncertainty distribution of the searcher's belief ASK so that information search is directed by the searcher's real Q1-level information need. The objective of the Q1 Actualizing Intervention Model is to enable, in our Knowledge Age, the introduction of intervention IR systems that are organic and human-centric, designed to initiate organic knowledge production processes in the searcher.
  • Knowledge acquisition from parsing natural language expressions for
           humanoid robot action commands
    • Abstract: Publication date: Available online 9 August 2019
      Source: Information Processing & Management
      Author(s): Diego Reforgiato Recupero, Federico Spiga
      Abstract: In this paper we propose an approach that allows the NAO humanoid robot to execute natural language commands spoken by the user. To provide the robot with knowledge, we have defined a robot action ontology. The ontology is fed to an NLP engine that performs a machine reading of the input text (in natural language) given by a user and tries to identify action commands for the robot to execute. The system can work in two modes: STATELESS and STATEFUL. In STATELESS mode, each human expression correctly interpreted by the robot as an action command is performed by NAO, which returns to its default posture afterwards. In STATEFUL mode, the robot has knowledge of its current posture and performs a command only if it is compatible with its current state; in this mode, the robot does not return to its default posture. For example, if the user first told the robot to stand on its right leg, the robot cannot perform a subsequent command to stand on its left leg, as the two actions (raising the left leg and raising the right leg) are incompatible. For each action that the robot can perform, we modelled a corresponding element in the ontology that also includes a list of associated compatible and incompatible actions. Our system also handles compound expressions (e.g., "move your arms up") and multiple expressions (different commands within one sentence) that the robot understands and performs.
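The STATEFUL compatibility check described in this abstract can be illustrated with a minimal sketch; the action names and the dictionary encoding are hypothetical stand-ins for the paper's ontology and its compatible/incompatible action lists:

```python
# Hypothetical stand-in for the action ontology: each action maps to the
# set of actions it is incompatible with.
INCOMPATIBLE = {
    "raise_left_leg": {"raise_right_leg"},
    "raise_right_leg": {"raise_left_leg"},
    "move_arms_up": set(),
}

def execute(command, state, stateful=True):
    """In STATEFUL mode, refuse a command that conflicts with the current
    posture and keep the posture afterwards; in STATELESS mode, every known
    command runs and the robot returns to its default posture."""
    if stateful and any(a in INCOMPATIBLE.get(command, set()) for a in state):
        return state, False                  # command rejected: incompatible
    if stateful:
        state = state | {command}            # posture persists across commands
    else:
        state = set()                        # back to the default posture
    return state, True

# The abstract's example: stand on the right leg, then try the left leg.
state, ok1 = execute("raise_right_leg", set())
state, ok2 = execute("raise_left_leg", state)
print(ok1, ok2)  # True False
```

In STATELESS mode the same second command would succeed, since no posture is carried over between commands.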
  • Vertical and horizontal relationships amongst task-based information needs
    • Abstract: Publication date: Available online 2 August 2019
      Source: Information Processing & Management
      Author(s): Katriina Byström, Sanna Kumpulainen
      Abstract: In this article, we present a conceptual framework of information needs for task-based information studies. The framework accounts for both vertical and horizontal relationships between information needs as fluid activities in work-task performance. As part of task performance, pieces of information are gathered from various heterogeneous sources, not primarily to fulfil any expressed formulation of information needs, but in order to make progress in the task. The vertical relationships pinpoint connections between the general and the specific, from the workplace context to the interaction with an information source; the horizontal relationships connect parallel information needs. These relationships enrich the conceptual understanding of information needs in information studies, which has previously focused on sequential relationships. The sequential, vertical and horizontal relationships form an analytical network that allows a departure from the black-box depiction of information needs.
  • “Nothing's available”: Young fathers’ experiences with unmet
           information needs and barriers to resolving them
    • Abstract: Publication date: Available online 25 July 2019
      Source: Information Processing & Management
      Author(s): C. Mniszak, H.L. O'Brien, D. Greyson, C. Chabot, J. Shoveller
      Abstract: Young fathers, like all parents, have a range of information needs, such as learning how to introduce their babies to solid foods. Yet compared to young mothers and older parents, they have fewer resources available to them. To date, young fathers have not been identified as a priority population in need of parenting-related information, and they face unequal access to information resources. This inequality is in part related to gender stereotypes and social biases about young men who become parents at "too early" an age. Through interviews and field observation conducted during a longitudinal ethnographic study of young mothers, fathers, their parents, and service providers in two cities in British Columbia, Canada, we examined young fathers’ gendered experiences accessing parenting information and resources. Using an ecological model of information needs, we identified factors at the micro (e.g., personal), meso (e.g., relational) and macro (e.g., access to city/provincial parenting programs and resources) levels that revealed information inequalities for young fathers. Our findings illustrate that young fathers often have unexpressed and unaddressed information needs due to the barriers they encounter when accessing services, the stigma they experience as early-age parents, and social pressures that lead them to avoid asking for help in order to adhere to traditional masculine values.
  • Changing views: Persuasion modeling and argument extraction from online
    • Abstract: Publication date: Available online 24 July 2019
      Source: Information Processing & Management
      Author(s): Subhabrata Dutta, Dipankar Das, Tanmoy Chakraborty
      Abstract: Persuasion and argumentation are possibly among the most complex examples of the interplay between multiple human subjects. With the advent of the Internet, online forums provide wide platforms for people to share their opinions and reasoning on diverse topics. In this work, we attempt to model persuasive interaction between users on Reddit, a popular online discussion forum. We propose a deep LSTM model to classify whether or not a conversation leads to successful persuasion, and use this model to predict whether a certain chain of arguments can lead to persuasion. While learning persuasion dynamics, our model tends to identify argument facets implicitly, using an attention mechanism. We also propose a semi-supervised approach to extracting argumentative components from discussion threads. Both models provide useful insight into how people engage in argumentation on online discussion forums.
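As a reminder of the core recurrence such a deep LSTM model builds on (the paper's actual architecture and hyperparameters are not given here), a single LSTM cell step can be written in plain NumPy; the weight layout is an illustrative assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM cell step: input, forget and output gates computed from the
    current input and previous hidden state update the cell state and emit
    a new hidden state."""
    z = W @ np.concatenate([x, h_prev]) + b       # all four gate pre-activations
    d = h_prev.size
    i, f, o = (sigmoid(z[k * d:(k + 1) * d]) for k in range(3))
    g = np.tanh(z[3 * d:])                        # candidate cell update
    c = f * c_prev + i * g                        # new cell state
    h = o * np.tanh(c)                            # new hidden state
    return h, c

rng = np.random.default_rng(1)
x_dim, h_dim = 5, 4
W = rng.normal(size=(4 * h_dim, x_dim + h_dim))
b = np.zeros(4 * h_dim)
h, c = lstm_step(rng.normal(size=x_dim), np.zeros(h_dim), np.zeros(h_dim), W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

A sequence classifier of the kind the abstract describes would run this step over each utterance in the conversation and feed the final hidden state to a classification layer.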