Followed Journals
Journal you Follow: 0
Sign Up to follow journals, search in your chosen journals and, optionally, receive Email Alerts when new issues of your Followed Journals are published.
Already have an account? Sign In to see the journals you follow.
Similar Journals
Journal Cover
Information Retrieval
Journal Prestige (SJR): 0.352
Citation Impact (citeScore): 2
Number of Followers: 734  
  Hybrid Journal Hybrid journal (It can contain Open Access articles)
ISSN (Print) 1573-7659 - ISSN (Online) 1386-4564
Published by Springer-Verlag Homepage  [2626 journals]
  • Deep cross-platform product matching in e-commerce
    • Abstract: Online shopping has become more and more popular in recent years, which leads to a prosperity on online platforms. Generally, the identical products are provided by many sellers on multiple platforms. Thus the comparison between products on multiple platforms becomes a basic demand for both consumers and sellers. However, identifying identical products on multiple platforms is difficult because the description for a certain product can be various. In this work, we propose a novel neural matching model to solve this problem. Two kinds of descriptions (i.e. product titles and attributes), which are widely provided on online platforms, are considered in our method. We conduct experiments on a real-world data set which contains thousands of products on two online e-commerce platforms. The experimental results show that our method can take use of the product information contained in both titles and attributes and significantly outperform the state-of-the-art matching models.
      PubDate: 2020-04-01
  • Offline evaluation options for recommender systems
    • Abstract: We undertake a detailed examination of the steps that make up offline experiments for recommender system evaluation, including the manner in which the available ratings are filtered and split into training and test; the selection of a subset of the available users for the evaluation; the choice of strategy to handle the background effects that arise when the system is unable to provide scores for some items or users; the use of either full or condensed output lists for the purposes of scoring; scoring methods themselves, including alternative top-weighted mechanisms for condensed rankings; and the application of statistical testing on a weighted-by-user or weighted-by-volume basis as a mechanism for providing confidence in measured outcomes. We carry out experiments that illustrate the impact that each of these choice points can have on the usefulness of an end-to-end system evaluation, and provide examples of possible pitfalls. In particular, we show that varying the split between training and test data, or changing the evaluation metric, or how target items are selected, or how empty recommendations are dealt with, can give rise to comparisons that are vulnerable to misinterpretation, and may lead to different or even opposite outcomes, depending on the exact combination of settings used.
      PubDate: 2020-03-18
  • A passage-based approach to learning to rank documents
    • Abstract: According to common relevance-judgments regimes, such as TREC’s, a document can be deemed relevant to a query even if it contains a very short passage of text with pertinent information. This fact has motivated work on passage-based document retrieval: document ranking methods that induce information from the document’s passages. However, the main source of passage-based information utilized was passage-query similarities. In this paper, we address the challenge of utilizing richer sources of passage-based information to improve document retrieval effectiveness. Specifically, we devise a suite of learning-to-rank-based document retrieval methods that utilize an effective ranking of passages produced in response to the query. Some of the methods quantify the ranking of the passages of a document. Others utilize the feature-based representation of the document’s passages. Empirical evaluation attests to the clear merits of our methods with respect to highly effective baselines. Our best performing method is based on learning a document ranking function using document-query features and passage-query features of the document’s passage most highly ranked; the passage-query features are those used to learn a highly effective passage ranker.
      PubDate: 2020-03-06
  • Introduction to special issue on eCommerce search and recommendation
    • PubDate: 2020-02-28
  • Low-cost, bottom-up measures for evaluating search result diversification
    • Abstract: Search result diversification aims at covering different user intents by returning a diversified document list. Most existing diversity measures require a predefined set of intents for a given query, where it is assumed that there is no relationship across these intents. However, studies have shown that modeling a hierarchy of intents has some benefits over the standard measure of using a flat list of intents. Intuitively, having more layers in the intent hierarchy seems to imply that we can consider more intricate relationships between intents and thereby identify subtle differences between documents that cover different intents. On the other hand, manually building a rich intent hierarchy imposes extra cost and is probably not very practical. In light of these considerations, we first propose a measure to build a hierarchy of intents from a given set of flat intents by clustering per-intent relevant documents and thereby identifying subintents. Furthermore, in our second measure, we consider a variant of our first measure that clusters per-topic relevance documents rather than per-intent ones, which is also intent-free. In addition, we propose our third measure, a simple, completely intent-free measure to search result diversity evaluation, which leverages document similarities. Our experiments based on TREC Web Track 2009–2013 test collections show that our proposed measures have advantages over existing diversity measures despite their low annotation costs.
      PubDate: 2020-02-01
  • Fewer topics' A million topics' Both'! On topics subsets in
           test collections
    • Abstract: When evaluating IR run effectiveness using a test collection, a key question is: What search topics should be used' We explore what happens to measurement accuracy when the number of topics in a test collection is reduced, using the Million Query 2007, TeraByte 2006, and Robust 2004 TREC collections, which all feature more than 50 topics, something that has not been examined in past work. Our analysis finds that a subset of topics can be found that is as accurate as the full topic set at ranking runs. Further, we show that the size of the subset, relative to the full topic set, can be substantially smaller than was shown in past work. We also study the topic subsets in the context of the power of statistical significance tests. We find that there is a trade off with using such sets in that significant results may be missed, but the loss of statistical significance is much smaller than when selecting random subsets. We also find topic subsets that can result in a low accuracy test collection, even when the number of queries in the subset is quite large. These negatively correlated subsets suggest we still lack good methodologies which provide stability guarantees on topic selection in new collections. Finally, we examine whether clustering of topics is an appropriate strategy to find and characterize good topic subsets. Our results contribute to the understanding of information retrieval effectiveness evaluation, and offer insights for the construction of test collections.
      PubDate: 2020-02-01
  • Demographic differences in search engine use with implications for cohort
    • Abstract: The correlation between the demographics of users and the text they write has been investigated through literary texts and, more recently, social media. However, differences pertaining to language use in search engines has not been thoroughly analyzed, especially for age and gender differences. Such differences are important especially due to the growing use of search engine data in the study of human health, where queries are used to identify patient populations. Using three datasets comprising of queries from multiple general-purpose Internet search engines we investigate the correlation between demography (age, gender, and income) and the text of queries submitted to search engines. Our results show that females and younger people use longer queries. This difference is such that females make approximately 25% more queries with 10 or more words. In the case of queries which identify users as having specific medical conditions we find that females make 53% more queries than expected, and that this results in patient cohorts which are highly skewed in gender and age, compared to known patient populations. We show that methods for cohort selection which use additional information beyond queries where users indicate their condition are less skewed. Finally, we show that biased training cohorts can lead to differential performance of models designed to detect disease from search engine queries. Our results indicate that studies where demographic representation is important, such as in the study of health aspect of users or when search engines are evaluated for fairness, care should be taken in the selection of search engine data so as to create a representative dataset.
      PubDate: 2019-12-01
  • Beyond word embeddings: learning entity and concept representations from
           large scale knowledge bases
    • Abstract: Text representations using neural word embeddings have proven effective in many NLP applications. Recent researches adapt the traditional word embedding models to learn vectors of multiword expressions (concepts/entities). However, these methods are limited to textual knowledge bases (e.g., Wikipedia). In this paper, we propose a novel and simple technique for integrating the knowledge about concepts from two large scale knowledge bases of different structure (Wikipedia and Probase) in order to learn concept representations. We adapt the efficient skip-gram model to seamlessly learn from the knowledge in Wikipedia text and Probase concept graph. We evaluate our concept embedding models on two tasks: (1) analogical reasoning, where we achieve a state-of-the-art performance of 91% on semantic analogies, (2) concept categorization, where we achieve a state-of-the-art performance on two benchmark datasets achieving categorization accuracy of 100% on one and 98% on the other. Additionally, we present a case study to evaluate our model on unsupervised argument type identification for neural semantic parsing. We demonstrate the competitive accuracy of our unsupervised method and its ability to better generalize to out of vocabulary entity mentions compared to the tedious and error prone methods which depend on gazetteers and regular expressions.
      PubDate: 2019-12-01
  • A comparison of filtering evaluation metrics based on formal constraints
    • Abstract: Although document filtering is simple to define, there is a wide range of different evaluation measures that have been proposed in the literature, all of which have been subject to criticism. Our goal is to compare metrics from a formal point of view, in order to understand whether each metric is appropriate, why and when, in order to achieve a better understanding of the similarities and differences between metrics. Our formal study leads to a typology of measures for document filtering which is based on (1) a formal constraint that must be satisfied by any suitable evaluation measure, and (2) a set of three (mutually exclusive) formal properties which help to understand the fundamental differences between measures and determining which ones are more appropriate depending on the application scenario. As far as we know, this is the first in-depth study on how filtering metrics can be categorized according to their appropriateness for different scenarios. Two main findings derive from our study. First, not every measure satisfies the basic constraint; but problematic measures can be adapted using smoothing techniques that and makes them compliant with the basic constraint while preserving their original properties. Our second finding is that all metrics (except one) can be grouped in three families, each satisfying one out of three formal properties which are mutually exclusive. In cases where the application scenario is clearly defined, this classification of metrics should help choosing an adequate evaluation measure. The exception is the Reliability/Sensitivity metric pair, which does not fit into any of the three families, but has two valuable empirical properties: it is strict (i.e. a good result according to reliability/sensitivity ensures a good result according to all other metrics) and has more robustness that all other measures considered in our study.
      PubDate: 2019-12-01
  • A selective approach to index term weighting for robust information
           retrieval based on the frequency distributions of query terms
    • Abstract: A typical information retrieval (IR) system applies a single retrieval strategy to every information need of users. However, the results of the past IR experiments show that a particular retrieval strategy is in general good at fulfilling some type of information needs while failing to fulfil some other type, i.e., high variation in retrieval effectiveness across information needs. On the other hand, the same results also show that an information need that a particular retrieval strategy failed to fulfil could be fulfilled by one of the other existing retrieval strategies. The challenge in here is therefore to determine in advance what retrieval strategy should be applied to which information need. This challenge is related to the robustness of IR systems in retrieval effectiveness. For an IR system, robustness can be defined as fulfilling every information need of users with an acceptable level of satisfaction. Maintaining robustness in retrieval effectiveness is a long-standing challenge and in this article we propose a simple but powerful method as a remedy. The method is a selective approach to index term weighting and for any given query (i.e., information need) it predicts the “best” term weighting model amongst a set of alternatives, on the basis of the frequency distributions of query terms on a target document collection. To predict the best term weighting model, the method uses the Chi-square statistic, the statistic of the Chi-square goodness-of-fit test. The results of the experiments, performed using the official query sets of the TREC Web track and the Million Query track, reveal in general that the frequency distributions of query terms provide relevant information on the retrieval effectiveness of term weighting models. In particular, the results show that the selective approach proposed in this article is, on average, more effective and more robust than the most effective single term weighting model.
      PubDate: 2019-12-01
  • Informational, transactional, and navigational need of information:
           relevance of search intention in search engine advertising
    • Abstract: This study investigates the impact of search query intention when evaluating and managing search engine advertising. Specifically, we study whether the performance of a search engine advertising campaign depends on the informational, transactional, and navigational search intentions and also consider the appearance of an organic result alongside a search engine advertisement on the same search engine results page. Both, search intention and organic presence significantly affect some performance indicators of search engine advertising. Advertising ranking as well as click and conversion metrics are influenced by search intention and organic presence. Advertisers may consequently assign advertising budgets according to the dominant search intention in line with their advertising objectives. With the help of search engine optimization, advertisers can also influence the organic presence on the search engine results pages. In summary, theory and practice need to include search intention and organic presence in search engine advertising management.
      PubDate: 2019-11-26
  • Preference-based interactive multi-document summarisation
    • Abstract: Interactive NLP is a promising paradigm to close the gap between automatic NLP systems and the human upper bound. Preference-based interactive learning has been successfully applied, but the existing methods require several thousand interaction rounds even in simulations with perfect user feedback. In this paper, we study preference-based interactive summarisation. To reduce the number of interaction rounds, we propose the Active Preference-based ReInforcement Learning (APRIL) framework. APRIL uses active learning to query the user, preference learning to learn a summary ranking function from the preferences, and neural Reinforcement learning to efficiently search for the (near-)optimal summary. Our results show that users can easily provide reliable preferences over summaries and that APRIL outperforms the state-of-the-art preference-based interactive method in both simulation and real-user experiments.
      PubDate: 2019-11-19
  • Boosting learning to rank with user dynamics and continuation methods
    • Abstract: Learning to rank (LtR) techniques leverage assessed samples of query-document relevance to learn effective ranking functions able to exploit the noisy signals hidden in the features used to represent queries and documents. In this paper we explore how to enhance the state-of-the-art LambdaMart LtR algorithm by integrating in the training process an explicit knowledge of the underlying user-interaction model and the possibility of targeting different objective functions that can effectively drive the algorithm towards promising areas of the search space. We enrich the iterative process followed by the learning algorithm in two ways: (1) by considering complex query-based user dynamics instead than simply discounting the gain by the rank position; (2) by designing a learning path across different loss functions that can capture different signals in the training data. Our extensive experiments, conducted on publicly available datasets, show that the proposed solution permits to improve various ranking quality measures by statistically significant margins.
      PubDate: 2019-11-05
  • Optimizing the recency-relevance-diversity trade-offs in non-personalized
           news recommendations
    • Abstract: Online news media sites are emerging as the primary source of news for a large number of users. Due to a large number of stories being published in these media sites, users usually rely on news recommendation systems to find important news. In this work, we focus on automatically recommending news stories to all users of such media websites, where the selection is not influenced by a particular user’s news reading habit. When recommending news stories in such non-personalized manner, there are three basic metrics of interest—recency, importance (analogous to relevance in personalized recommendation) and diversity of the recommended news. Ideally, recommender systems should recommend the most important stories soon after they are published. However, the importance of a story only becomes evident as the story ages, thereby creating a tension between recency and importance. A systematic analysis of popular recommendation strategies in use today reveals that they lead to poor trade-offs between recency and importance in practice. So, in this paper, we propose a new recommendation strategy (called Highest Future-Impact) which attempts to optimize on both the axes. To implement our proposed strategy in practice, we propose two approaches to predict the future-impact of news stories, by using crowd-sourced popularity signals and by observing editorial selection in past news data. Finally, we propose approaches to inculcate diversity in recommended news which can maintain a balanced proportion of news from different news sections. Evaluations over real-world news datasets show that our implementations achieve good performance in recommending news stories.
      PubDate: 2019-10-01
  • On the impact of group size on collaborative search effectiveness
    • Abstract: While today’s web search engines are designed for single-user search, over the years research efforts have shown that complex information needs—which are explorative, open-ended and multi-faceted—can be answered more efficiently and effectively when searching in collaboration. Collaborative search (and sensemaking) research has investigated techniques, algorithms and interface affordances to gain insights and improve the collaborative search process. It is not hard to imagine that the size of the group collaborating on a search task significantly influences the group’s behaviour and search effectiveness. However, a common denominator across almost all existing studies is a fixed group size—usually either pairs, triads or in a few cases four users collaborating. Investigations into larger group sizes and the impact of group size dynamics on users’ behaviour and search metrics have so far rarely been considered—and when, then only in a simulation setup. In this work, we investigate in a large-scale user experiment to what extent those simulation results carry over to the real world. To this end, we extended our collaborative search framework SearchX with algorithmic mediation features and ran a large-scale experiment with more than 300 crowd-workers. We consider the collaboration group size as a dependent variable, and investigate collaborations between groups of up to six people. We find that most prior simulation-based results on the impact of collaboration group size on behaviour and search effectiveness cannot be reproduced in our user experiment.
      PubDate: 2019-10-01
  • ReBoost: a retrieval-boosted sequence-to-sequence model for neural
           response generation
    • Abstract: Human–computer conversation is an active research topic in natural language processing. One of the representative methods to build conversation systems uses the sequence-to-sequence (Seq2seq) model through neural networks. However, with limited input information, the Seq2seq model tends to generate meaningless and trivial responses. It can be greatly enhanced if more supplementary information is provided in the generation process. In this work, we propose to utilize retrieved responses to boost the Seq2seq model for generating more informative replies. Our method, called ReBoost, incorporates retrieved results in the Seq2seq model by a hierarchical structure. The input message and retrieved results can influence the generation process jointly. Experiments on two benchmark datasets demonstrate that our model is able to generate more informative responses in both automatic and human evaluations and outperforms the state-of-the-art response generation models.
      PubDate: 2019-09-23
  • Special issue on de-personalisation, diversification, filter bubbles and
    • PubDate: 2019-09-18
  • Evaluating sentence-level relevance feedback for high-recall information
    • Abstract: This study uses a novel simulation framework to evaluate whether the time and effort necessary to achieve high recall using active learning is reduced by presenting the reviewer with isolated sentences, as opposed to full documents, for relevance feedback. Under the weak assumption that more time and effort is required to review an entire document than a single sentence, simulation results indicate that the use of isolated sentences for relevance feedback can yield comparable accuracy and higher efficiency, relative to the state-of-the-art baseline model implementation (BMI) of the AutoTAR continuous active learning (“CAL”) method employed in the TREC 2015 and 2016 Total Recall Track.
      PubDate: 2019-08-13
  • Abstraction of query auto completion logs for anonymity-preserving
    • Abstract: Query auto completion (QAC) is used in search interfaces to interactively offer a list of suggestions to users as they enter queries. The suggested completions are updated each time the user modifies their partial query, as they either add further keystrokes or interact directly with completions that have been offered. In this work we use a state model to capture the possible interactions that can occur in a QAC environment. Using this model, we show how an abstract QAC log can be derived from a sequence of QAC interactions; this log does not contain the actual characters entered, but records only the sequence of types of interaction, thus preserving user anonymity with extremely high confidence. To validate the usefulness of the approach, we use a large scale abstract QAC log collected from a popular commercial search engine to demonstrate how previous and new knowledge about QAC behavior can be inferred without knowledge of the queries being entered. An interaction model is then derived from this log to demonstrate its validity, and we report observations on user behavior with QAC systems based on the interaction model that is proposed.
      PubDate: 2019-06-06
  • The impact of result diversification on search behaviour and performance
    • Abstract: Result diversification aims to provide searchers with a broader view of a given topic while attempting to maximise the chances of retrieving relevant material. Diversifying results also aims to reduce search bias by increasing the coverage over different aspects of the topic. As such, searchers should learn more about the given topic in general. Despite diversification algorithms being introduced over two decades ago, little research has explicitly examined their impact on search behaviour and performance in the context of Interactive Information Retrieval (IIR). In this paper, we explore the impact of diversification when searchers undertake complex search tasks that require learning about different aspects of a topic (aspectual retrieval). We hypothesise that by diversifying search results, searchers will be exposed to a greater number of aspects. In turn, this will maximise their coverage of the topic (and thus reduce possible search bias). As a consequence, diversification should lead to performance benefits, regardless of the task, but how does diversification affect search behaviours and search satisfaction' Based on Information Foraging Theory (IFT), we infer two hypotheses regarding search behaviours due to diversification, namely that (i) it will lead to searchers examining fewer documents per query, and (ii) it will also mean searchers will issue more queries overall. To this end, we performed a within-subjects user study using the TREC AQUAINT collection with 51 participants, examining the differences in search performance and behaviour when using (i) a non-diversified system (BM25) versus (ii) a diversified system (BM25 + xQuAD) when the search task is either (a) ad-hoc or (b) aspectual. Our results show a number of notable findings in terms of search behaviour: participants on the diversified system issued more queries and examined fewer documents per query when performing the aspectual search task. Furthermore, we showed that when using the diversified system, participants were: more successful in marking relevant documents, and obtained a greater awareness of the topics (i.e. identified relevant documents containing more novel aspects). These findings show that search behaviour is influenced by diversification and task complexity. They also motivate further research into complex search tasks such as aspectual retrieval—and how diversity can play an important role in improving the search experience, by providing greater coverage of a topic and mitigating potential bias in search results.
      PubDate: 2019-05-16
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Tel: +00 44 (0)131 4513762

Your IP address:
Home (Search)
About JournalTOCs
News (blog, publications)
JournalTOCs on Twitter   JournalTOCs on Facebook

JournalTOCs © 2009-