Information Retrieval
Journal Prestige (SJR): 0.352
Citation Impact (citeScore): 2
Hybrid journal (it can contain Open Access articles)
ISSN (Print) 1386-4564 - ISSN (Online) 1573-7659
Published by Springer-Verlag [2351 journals]
  • How do interval scales help us with better understanding IR evaluation
    • Abstract: Evaluation measures are the basis for quantifying the performance of IR systems and the way in which their values can be processed to perform statistical analyses depends on the scales on which these measures are defined. For example, mean and variance should be computed only when relying on interval scales. In our previous work we defined a theory of IR evaluation measures, based on the representational theory of measurement, which allowed us to determine whether and when IR measures are interval scales. We found that common set-based retrieval measures—namely precision, recall, and F-measure—always are interval scales in the case of binary relevance while this does not happen in the multi-graded relevance case. In the case of rank-based retrieval measures—namely AP, gRBP, DCG, and ERR—only gRBP is an interval scale when we choose a specific value of the parameter p and define a specific total order among systems while all the other IR measures are not interval scales. In this work, we build on our previous findings and we carry out an extensive evaluation, based on standard TREC collections, to study how our theoretical findings impact on the experimental ones. In particular, we conduct a correlation analysis to study the relationship among the above-mentioned state-of-the-art evaluation measures and their scales. We study how the scales of evaluation measures impact on non parametric and parametric statistical tests for multiple comparisons of IR system performance. Finally, we analyse how incomplete information and pool downsampling affect different scales and evaluation measures.
      PubDate: 2019-09-04
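The set-based measures named in the abstract above (precision, recall, and F-measure under binary relevance) can be illustrated with a minimal sketch. The function name and the toy document identifiers are hypothetical and not taken from the paper; the sketch only shows the standard definitions, not the paper's interval-scale analysis.

```python
def precision_recall_f1(retrieved, relevant):
    """Standard set-based measures for binary relevance.

    `retrieved` and `relevant` are collections of document ids
    (hypothetical toy inputs, not data from the paper).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F-measure (F1): harmonic mean of precision and recall
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
    print(p, r, f)
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents are retrieved, so precision is 2/4 and recall is 2/3.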
  • Evaluating sentence-level relevance feedback for high-recall information
    • Abstract: This study uses a novel simulation framework to evaluate whether the time and effort necessary to achieve high recall using active learning is reduced by presenting the reviewer with isolated sentences, as opposed to full documents, for relevance feedback. Under the weak assumption that more time and effort is required to review an entire document than a single sentence, simulation results indicate that the use of isolated sentences for relevance feedback can yield comparable accuracy and higher efficiency, relative to the state-of-the-art baseline model implementation (BMI) of the AutoTAR continuous active learning (“CAL”) method employed in the TREC 2015 and 2016 Total Recall Track.
      PubDate: 2019-08-13
  • Deep cross-platform product matching in e-commerce
    • Abstract: Online shopping has become increasingly popular in recent years, which has led online platforms to prosper. Generally, identical products are offered by many sellers on multiple platforms. Thus, comparing products across multiple platforms has become a basic demand for both consumers and sellers. However, identifying identical products on multiple platforms is difficult because the descriptions of a given product can vary. In this work, we propose a novel neural matching model to solve this problem. Two kinds of descriptions (i.e. product titles and attributes), which are widely provided on online platforms, are considered in our method. We conduct experiments on a real-world data set which contains thousands of products on two online e-commerce platforms. The experimental results show that our method can make use of the product information contained in both titles and attributes and significantly outperforms the state-of-the-art matching models.
      PubDate: 2019-08-13
  • Payoffs and pitfalls in using knowledge-bases for consumer health search
    • Abstract: Consumer health search (CHS) is a challenging domain, with vocabulary mismatch and the considerable domain expertise required hampering people’s ability to formulate effective queries. We posit that using knowledge bases for query reformulation may help alleviate this problem. How to exploit knowledge bases for effective CHS is nontrivial, involving a swathe of key choices and design decisions (many of which are not explored in the literature). Here we rigorously empirically evaluate the impact these different choices have on retrieval effectiveness. A state-of-the-art knowledge-base retrieval model—the Entity Query Feature Expansion model—was used to evaluate these choices, which include: which knowledge base to use (specialised vs. general purpose), how to construct the knowledge base, how to extract entities from queries and map them to entities in the knowledge base, what part of the knowledge base to use for query expansion, and whether to augment the knowledge base search process with relevance feedback. While knowledge base retrieval has been proposed as a solution for CHS, this paper delves into the finer details of doing this effectively, highlighting both payoffs and pitfalls. It aims to provide some lessons to others in advancing the state-of-the-art in CHS.
      PubDate: 2019-08-01
  • Neural variational entity set expansion for automatically populated
           knowledge graphs
    • Abstract: We propose Neural variational set expansion to extract actionable information from a noisy knowledge graph (KG) and propose a general approach for increasing the interpretability of recommendation systems. We demonstrate the usefulness of applying a variational autoencoder to the Entity set expansion task based on a realistic automatically generated KG.
      PubDate: 2019-08-01
  • Overcoming low-utility facets for complex answer retrieval
    • Abstract: Many questions cannot be answered simply; their answers must include numerous nuanced details and context. Complex Answer Retrieval (CAR) is the retrieval of answers to such questions. These questions can be constructed from a topic entity (e.g., ‘cheese’) and a facet (e.g., ‘health effects’). While topic matching has been thoroughly explored, we observe that some facets use general language that is unlikely to appear verbatim in answers, exhibiting low utility. In this work, we present an approach to CAR that identifies and addresses low-utility facets. First, we propose two estimators of facet utility: the hierarchical structure of CAR queries, and facet frequency information from training data. Then, to improve the retrieval performance on low-utility headings, we include entity similarity scores using embeddings trained from a CAR knowledge graph, which captures the context of facets. We show that our methods are effective by applying them to two leading neural ranking techniques, and evaluating them on the TREC CAR dataset. We find that our approach performs significantly better than the unmodified neural ranker and other leading CAR techniques, yielding state-of-the-art results. We also provide a detailed analysis of our results, verifying that low-utility facets are indeed difficult to match and that our approach improves the performance for these difficult queries.
      PubDate: 2019-08-01
  • Abstraction of query auto completion logs for anonymity-preserving
    • Abstract: Query auto completion (QAC) is used in search interfaces to interactively offer a list of suggestions to users as they enter queries. The suggested completions are updated each time the user modifies their partial query, as they either add further keystrokes or interact directly with completions that have been offered. In this work we use a state model to capture the possible interactions that can occur in a QAC environment. Using this model, we show how an abstract QAC log can be derived from a sequence of QAC interactions; this log does not contain the actual characters entered, but records only the sequence of types of interaction, thus preserving user anonymity with extremely high confidence. To validate the usefulness of the approach, we use a large scale abstract QAC log collected from a popular commercial search engine to demonstrate how previous and new knowledge about QAC behavior can be inferred without knowledge of the queries being entered. An interaction model is then derived from this log to demonstrate its validity, and we report observations on user behavior with QAC systems based on the interaction model that is proposed.
      PubDate: 2019-06-06
  • The impact of result diversification on search behaviour and performance
    • Abstract: Result diversification aims to provide searchers with a broader view of a given topic while attempting to maximise the chances of retrieving relevant material. Diversifying results also aims to reduce search bias by increasing the coverage over different aspects of the topic. As such, searchers should learn more about the given topic in general. Despite diversification algorithms being introduced over two decades ago, little research has explicitly examined their impact on search behaviour and performance in the context of Interactive Information Retrieval (IIR). In this paper, we explore the impact of diversification when searchers undertake complex search tasks that require learning about different aspects of a topic (aspectual retrieval). We hypothesise that by diversifying search results, searchers will be exposed to a greater number of aspects. In turn, this will maximise their coverage of the topic (and thus reduce possible search bias). As a consequence, diversification should lead to performance benefits, regardless of the task, but how does diversification affect search behaviours and search satisfaction? Based on Information Foraging Theory (IFT), we infer two hypotheses regarding search behaviours due to diversification, namely that (i) it will lead to searchers examining fewer documents per query, and (ii) it will also mean searchers will issue more queries overall. To this end, we performed a within-subjects user study using the TREC AQUAINT collection with 51 participants, examining the differences in search performance and behaviour when using (i) a non-diversified system (BM25) versus (ii) a diversified system (BM25 + xQuAD) when the search task is either (a) ad-hoc or (b) aspectual. Our results show a number of notable findings in terms of search behaviour: participants on the diversified system issued more queries and examined fewer documents per query when performing the aspectual search task. Furthermore, we showed that when using the diversified system, participants were more successful in marking relevant documents and obtained a greater awareness of the topics (i.e. identified relevant documents containing more novel aspects). These findings show that search behaviour is influenced by diversification and task complexity. They also motivate further research into complex search tasks such as aspectual retrieval—and how diversity can play an important role in improving the search experience, by providing greater coverage of a topic and mitigating potential bias in search results.
      PubDate: 2019-05-16
  • Fewer topics? A million topics? Both?! On topics subsets in
           test collections
    • Abstract: When evaluating IR run effectiveness using a test collection, a key question is: What search topics should be used? We explore what happens to measurement accuracy when the number of topics in a test collection is reduced, using the Million Query 2007, TeraByte 2006, and Robust 2004 TREC collections, which all feature more than 50 topics, something that has not been examined in past work. Our analysis finds that a subset of topics can be found that is as accurate as the full topic set at ranking runs. Further, we show that the size of the subset, relative to the full topic set, can be substantially smaller than was shown in past work. We also study the topic subsets in the context of the power of statistical significance tests. We find that there is a trade-off with using such sets in that significant results may be missed, but the loss of statistical significance is much smaller than when selecting random subsets. We also find topic subsets that can result in a low accuracy test collection, even when the number of queries in the subset is quite large. These negatively correlated subsets suggest we still lack good methodologies which provide stability guarantees on topic selection in new collections. Finally, we examine whether clustering of topics is an appropriate strategy to find and characterize good topic subsets. Our results contribute to the understanding of information retrieval effectiveness evaluation, and offer insights for the construction of test collections.
      PubDate: 2019-05-08
  • Low-cost, bottom-up measures for evaluating search result diversification
    • Abstract: Search result diversification aims at covering different user intents by returning a diversified document list. Most existing diversity measures require a predefined set of intents for a given query, where it is assumed that there is no relationship across these intents. However, studies have shown that modeling a hierarchy of intents has some benefits over the standard measure of using a flat list of intents. Intuitively, having more layers in the intent hierarchy seems to imply that we can consider more intricate relationships between intents and thereby identify subtle differences between documents that cover different intents. On the other hand, manually building a rich intent hierarchy imposes extra cost and is probably not very practical. In light of these considerations, we first propose a measure to build a hierarchy of intents from a given set of flat intents by clustering per-intent relevant documents and thereby identifying subintents. Furthermore, in our second measure, we consider a variant of our first measure that clusters per-topic relevant documents rather than per-intent ones, which is also intent-free. In addition, we propose our third measure, a simple, completely intent-free measure for search result diversity evaluation, which leverages document similarities. Our experiments based on TREC Web Track 2009–2013 test collections show that our proposed measures have advantages over existing diversity measures despite their low annotation costs.
      PubDate: 2019-04-20
  • User interest prediction over future unobserved topics on social networks
    • Abstract: The accurate prediction of users’ future interests on social networks allows one to perform future planning by studying how users will react if certain topics emerge in the future. It can improve areas such as targeted advertising and the efficient delivery of services. Despite the importance of predicting users’ future interests on social networks, existing work mainly focuses on identifying users’ current interests, and little work has been done on predicting users’ potential future interests. Some works attempt to identify a user’s future interests; however, they cannot predict user interests with regard to new topics, since these topics have never received any feedback from users in the past. In this paper, we propose a framework that works on the basis of the temporal evolution of user interests and utilizes semantic information from knowledge bases such as Wikipedia to predict users’ future interests and overcome the cold item problem. Through extensive experiments on a real-world Twitter dataset, we demonstrate the effectiveness of our approach in predicting future interests of users compared to state-of-the-art baselines. Moreover, we show that the impact of our work is especially meaningful in the case of cold items.
      PubDate: 2019-04-01
  • Influence me! Predicting links to influential users
    • Abstract: In addition to being in contact with friends, online social networks are commonly used as a source of information, suggestions and recommendations from members of the community. Whenever we accept a suggestion or perform any action because it was recommended by a “friend”, we are being influenced by him/her. For this reason, it is useful for users seeking interesting information to identify and connect to influential users of this kind. In this context, we propose an approach to predict links to influential users. Compared to approaches that identify general influential users in a network, our approach seeks to identify users who might have some kind of influence on individual (target) users. To carry out this goal, we adapted an influence maximization algorithm to find new influential users from the set of current influential users of the target user. Moreover, we compared the results obtained with different metrics for link prediction and analyzed in which context these metrics obtained better results.
      PubDate: 2019-04-01
  • A topic recommender for journalists
    • Abstract: The way in which people gather information about events and form their own opinion on them has changed dramatically with the advent of social media. For many readers, the news gathered from online sources has become an opportunity to share points of view and information within micro-blogging platforms such as Twitter, mainly aimed at satisfying their communication needs. Furthermore, the need to deepen the aspects related to news stimulates a demand for additional information which is often met through online encyclopedias, such as Wikipedia. This behaviour has also influenced the way in which journalists write their articles, requiring a careful assessment of what actually interests the readers. The goal of this paper is to present a recommender system, What to Write and Why, capable of suggesting to a journalist, for a given event, the aspects still uncovered in news articles on which the readers focus their interest. The basic idea is to characterize an event according to the echo it receives in online news sources and associate it with the corresponding readers’ communicative and informative patterns, detected through the analysis of Twitter and Wikipedia, respectively. Our methodology temporally aligns the results of this analysis and recommends the concepts that emerge as topics of interest from Twitter and Wikipedia, either not covered or poorly covered in the published news articles.
      PubDate: 2019-04-01
  • Determining the interests of social media users: two approaches
    • Abstract: Although social media platforms serve diverse purposes, from social and professional networking to photo sharing and blogging, people frequently use them to share their thoughts and opinions and, most importantly, their interests (e.g., politics, economy, sports). Understanding the interests of social media users is key to many applications that need to characterize them to recommend some services and find other individuals with similar interests. In this paper, we propose two approaches to the automatic determination of the interests of social media users. The first, which we named Frisk, is an unsupervised multilingual approach that determines the interests of a user from the explicit meaning of the words that occur in the user’s posts. The second, which we termed Ascertain, is a supervised approach that resorts to the hidden dimensions of the words that several studies indicated to be capable of revealing some of the psychological processes and personality traits of a person. In our evaluation, which we performed on two datasets obtained from Twitter, we show that Frisk is capable of inferring the interests in a multilingual context with good accuracy and that the psychological dimensions used by Ascertain are also good predictors of a user’s interests.
      PubDate: 2019-04-01
  • Special issue on knowledge graphs and semantics in text analysis and
    • PubDate: 2019-03-04
  • Guest editorial: social media for personalization and search
    • Authors: Ludovico Boratto; Andreas Kaltenbrunner; Giovanni Stilo
      PubDate: 2019-02-21
      DOI: 10.1007/s10791-019-09352-1
  • Neural architecture for question answering using a knowledge graph and web
    • Abstract: In Web search, entity-seeking queries often trigger a special question answering (QA) system. It may use a parser to translate the question into a structured query, execute that on a knowledge graph (KG), and return direct entity responses. QA systems based on precise parsing tend to be brittle: minor syntax variations may dramatically change the response. Moreover, KG coverage is patchy. At the other extreme, a large corpus may provide broader coverage, but in an unstructured, unreliable form. We present AQQUCN, a QA system that gracefully combines KG and corpus evidence. AQQUCN accepts a broad spectrum of query syntax, from well-formed questions to short “telegraphic” keyword sequences. In the face of inherent query ambiguities, AQQUCN aggregates signals from KGs and large corpora to directly rank KG entities, rather than commit to one semantic interpretation of the query. AQQUCN models the ideal interpretation as an unobservable or latent variable. Interpretations and candidate entity responses are scored as pairs, by combining signals from multiple convolutional networks that operate collectively on the query, KG and corpus. On four public query workloads, amounting to over 8000 queries with diverse query syntax, we see 5–16% absolute improvement in mean average precision (MAP), compared to the entity ranking performance of recent systems. Our system is also competitive at entity set retrieval, almost doubling F1 scores for challenging short queries.
      PubDate: 2019-01-07
      DOI: 10.1007/s10791-018-9348-8
  • Automated assessment of knowledge hierarchy evolution: comparing directed
           acyclic graphs
    • Authors: Guruprasad Nayak; Sourav Dutta; Deepak Ajwani; Patrick Nicholson; Alessandra Sala
      Abstract: Automated construction of knowledge hierarchies from huge data corpora has been gaining increasing attention in recent years, in order to tackle the infeasibility of manually extracting and semantically linking millions of concepts. As a knowledge hierarchy evolves with these automated techniques, there is a need for measures to assess its temporal evolution, quantifying the similarities between different versions and identifying the relative growth of different subgraphs in the knowledge hierarchy. In this paper, we focus on measures that leverage structural properties of the knowledge hierarchy graph to assess the temporal changes. We propose a principled and scalable similarity measure, based on Katz similarity between concept nodes, for comparing different versions of a knowledge hierarchy, modeled as a generic directed acyclic graph. We present a theoretical analysis showing that the proposed measure accurately captures the salient properties of taxonomic hierarchies and assesses changes in the ordering of nodes, along with the logical subsumption of relationships among concepts. We also present a linear time variant of the measure, and show that our measures, unlike previous approaches, are tunable to cater to diverse application needs. We further show that our measure provides interpretability, thereby identifying the key structural and logical differences in the hierarchies. Experiments on a real DBpedia and biological knowledge hierarchy showcase that our measures accurately capture structural similarity, while providing enhanced scalability and tunability. Also, we demonstrate that the temporal evolution of different subgraphs in this knowledge hierarchy, as captured purely by our structural measure, corresponds well with the known disruptions in the related subject areas.
      PubDate: 2018-12-17
      DOI: 10.1007/s10791-018-9345-y
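The abstract above builds on Katz similarity between nodes of a directed acyclic graph. As a rough illustration of that underlying index only (not the paper's similarity measure or its linear-time variant), the sketch below sums an attenuation factor `beta` raised to the path length over all directed paths between two nodes. The function name, the adjacency-dict representation, and the toy graph are assumptions made for illustration.

```python
from functools import lru_cache


def katz_index(dag, beta):
    """Return katz(u, v) for a DAG given as {node: [successors]}.

    katz(u, v) = sum over all directed paths from u to v of beta ** length.
    Because the graph is acyclic, the number of paths is finite and the
    recursion below terminates. (Illustrative helper, not the paper's code.)
    """

    @lru_cache(maxsize=None)
    def katz(u, v):
        total = 0.0
        for w in dag.get(u, ()):
            # One edge u -> w costs a factor beta; either w is the target
            # (a path ends here) or the path continues from w.
            total += beta * ((1.0 if w == v else 0.0) + katz(w, v))
        return total

    return katz


if __name__ == "__main__":
    toy = {"a": ["b", "c"], "b": ["c"], "c": []}  # hypothetical toy hierarchy
    katz = katz_index(toy, beta=0.5)
    print(katz("a", "c"))
```

In the toy graph there are two paths from `a` to `c`: the direct edge (length 1) and the path through `b` (length 2), so the index is 0.5 + 0.25.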
  • Identifying and exploiting target entity type information for ad hoc
           entity retrieval
    • Authors: Darío Garigliotti; Faegheh Hasibi; Krisztian Balog
      Abstract: Today, the practice of returning entities from a knowledge base in response to search queries has become widespread. One of the distinctive characteristics of entities is that they are typed, i.e., assigned to some hierarchically organized type system (type taxonomy). The primary objective of this paper is to gain a better understanding of how entity type information can be utilized in entity retrieval. We perform this investigation in two settings: firstly, in an idealized “oracle” setting, assuming that we know the distribution of target types of the relevant entities for a given query; and secondly, in a realistic scenario, where target entity types are identified automatically based on the keyword query. We perform a thorough analysis of three main aspects: (i) the choice of type taxonomy, (ii) the representation of hierarchical type information, and (iii) the combination of type-based and term-based similarity in the retrieval model. Using a standard entity search test collection based on DBpedia, we show that type information can significantly and substantially improve retrieval performance, yielding up to 67% relative improvement in terms of NDCG@10 over a strong text-only baseline in an oracle setting. We further show that using automatic target type detection, we can outperform the text-only baseline by 44% in terms of NDCG@10. This is as good as, and sometimes even better than, what is attainable by using explicit target type information provided by humans. These results indicate that identifying target entity types of queries is challenging even for humans and attest to the effectiveness of our proposed automatic approach.
      PubDate: 2018-12-05
      DOI: 10.1007/s10791-018-9346-x
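The results above are reported as relative improvements in NDCG@10. A minimal sketch of how such figures are typically computed follows; it assumes linear gains with a log2 rank discount and, as a simplification, derives the ideal ranking from the gains of the returned list only. The function names and sample values are hypothetical and are not the paper's data or evaluation code.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top-k graded relevance values."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k=10):
    """NDCG@k; the ideal ranking is the same gains sorted in descending order.

    Assumption: a full evaluation would build the ideal ranking from all
    judged documents for the query, not only those in the returned list.
    """
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def relative_improvement(new, baseline):
    """Relative improvement of `new` over `baseline`, e.g. 0.25 means +25%."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    print(ndcg_at_k([0, 1, 3]))            # imperfect ordering, below 1.0
    print(relative_improvement(0.5, 0.4))  # hypothetical scores
```

A relative improvement is measured against the baseline score itself, so it differs from an absolute difference in NDCG points.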
  • Those were the days: learning to rank social media posts for reminiscence
    • Authors: Kaweh Djafari Naini; Ricardo Kawase; Nattiya Kanhabua; Claudia Niederée; Ismail Sengor Altingovde
      Abstract: Social media posts are a great source for life summaries aggregating activities, events, interactions and thoughts of the last months or years. They can be used for personal reminiscence as well as for keeping track of developments in the lives of not-so-close friends. One of the core challenges of automatically creating such summaries is to decide which posts are memorable, i.e., should be considered for retention, and which ones to forget. To address this challenge, we design and conduct user evaluation studies and construct a corpus that captures human expectations towards content retention. We analyze this corpus to identify a small set of seed features that are most likely to characterize memorable posts. Next, we compile a broader set of features that are leveraged to build general and personalized machine-learning models to rank posts for retention. By applying feature selection, we identify a compact yet effective subset of these features. The models trained with the presented feature sets outperform the baseline models exploiting an intuitive set of temporal and social features.
      PubDate: 2018-08-11
      DOI: 10.1007/s10791-018-9339-9