Foundations and Trends® in Information Retrieval
  [SJR: 1.217]   [H-I: 18]
   Subscription journal (full text available via subscription)
   ISSN (Print) 1554-0669 - ISSN (Online) 1554-0677
   Published by Now Publishers Inc  [30 journals]
  • Display Advertising with Real-Time Bidding (RTB) and Behavioural Targeting
    • Abstract: The most significant progress in recent years in online display advertising is what is known as the Real-Time Bidding (RTB) mechanism to buy and sell ads. RTB essentially facilitates buying an individual ad impression in real time while it is still being generated from a user’s visit. RTB not only scales up the buying process by aggregating a large amount of available inventories across publishers but, most importantly, enables direct targeting of individual users. As such, RTB has fundamentally changed the landscape of digital marketing. Scientifically, the demand for automation, integration and optimisation in RTB also brings new research opportunities in information retrieval, data mining, machine learning and other related fields. In this monograph, an overview is given of the fundamental infrastructure, algorithms, and technical solutions of this new frontier of computational advertising. The covered topics include user response prediction, bid landscape forecasting, bidding algorithms, revenue optimisation, statistical arbitrage, dynamic pricing, and ad fraud detection.
      Suggested Citation: Jun Wang, Weinan Zhang and Shuai Yuan (2017), "Display Advertising with Real-Time Bidding (RTB) and Behavioural Targeting", Foundations and Trends® in Information Retrieval: Vol. 11: No. 4-5, pp 297-435.
      PubDate: Mon, 24 Jul 2017 00:00:00 +020
  • Applications of Topic Models
    • Abstract: How can a single person understand what’s going on in a collection of millions of documents? This is an increasingly common problem: sifting through an organization’s e-mails, understanding a decade worth of newspapers, or characterizing a scientific field’s research. Topic models are a statistical framework that help users understand large document collections: not just to find individual documents but to understand the general themes present in the collection. This survey describes the recent academic and industrial applications of topic models with the goal of launching a young researcher capable of building their own applications of topic models. In addition to topic models’ effective application to traditional problems like information retrieval, visualization, statistical inference, multilingual modeling, and linguistic understanding, this survey also reviews topic models’ ability to unlock large text collections for qualitative analysis. We review their successful use by researchers to help understand fiction, non-fiction, scientific publications, and political texts.
      Suggested Citation: Jordan Boyd-Graber, Yuening Hu and David Mimno (2017), "Applications of Topic Models", Foundations and Trends® in Information Retrieval: Vol. 11: No. 2-3, pp 143-296.
      PubDate: Thu, 20 Jul 2017 00:00:00 +020
  • Searching the Enterprise
    • Abstract: Search has become ubiquitous but that does not mean that search has been solved. Enterprise search, which is broadly speaking the use of information retrieval technology to find information within organisations, is a good example to illustrate this. It is an area that is of huge importance for businesses, yet has attracted relatively little academic interest. This monograph will explore the main issues involved in enterprise search both from a research as well as a practical point of view. We will first plot the landscape of enterprise search and its links to related areas. This will allow us to identify key features before we survey the field in more detail. Throughout the monograph we will discuss the topic as part of the wider information retrieval research field, and we use Web search as a common reference point as this is likely the search application area that the average reader is most familiar with.
      Suggested Citation: Udo Kruschwitz and Charlie Hull (2017), "Searching the Enterprise", Foundations and Trends® in Information Retrieval: Vol. 11: No. 1, pp 1-142.
      PubDate: Wed, 12 Jul 2017 00:00:00 +020
  • Aggregated Search
    • Abstract: The goal of aggregated search is to provide integrated search across multiple heterogeneous search services in a unified interface: a single query box and a common presentation of results. In the web search domain, aggregated search systems are responsible for integrating results from specialized search services, or verticals, alongside the core web results. For example, search portals such as Google, Bing, and Yahoo! provide access to vertical search engines that focus on different types of media (images and video), different types of search tasks (search for local businesses and online products), and even applications that can help users complete certain tasks (language translation and math calculations). Aggregated search systems perform two main tasks. The first task (vertical selection) is to predict which verticals (if any) to present in response to a user’s query. The second task (vertical presentation) is to predict where and how to present each selected vertical alongside the core web results. The goal of this work is to provide a comprehensive summary of previous research in aggregated search. We first describe why aggregated search requires unique solutions. Then, we discuss different sources of evidence that are likely to be available to an aggregated search system, as well as different techniques for integrating evidence in order to make vertical selection and presentation decisions. Next, we survey different evaluation methodologies for aggregated search and discuss prior user studies that have aimed to better understand how users behave with aggregated search interfaces. Finally, we review different advanced topics in aggregated search.
      Suggested Citation: Jaime Arguello (2017), "Aggregated Search", Foundations and Trends® in Information Retrieval: Vol. 10: No. 5, pp 365-502.
      PubDate: Mon, 06 Mar 2017 00:00:00 +010
  • A Survey of Query Auto Completion in Information Retrieval
    • Abstract: In information retrieval, query auto completion (QAC), also known as type-ahead [Xiao et al., 2013, Cai et al., 2014b] and auto-complete suggestion [Jain and Mishne, 2010], refers to the following functionality: given a prefix consisting of a number of characters entered into a search box, the user interface proposes alternative ways of extending the prefix to a full query. Ranking query completions is a challenging task due to the limited length of prefixes entered by users, the large volume of possible query completions matching a prefix, and the broad range of possible search intents. In recent years, a large number of query auto completion approaches have been proposed that produce ranked lists of alternative query completions by mining query logs. In this survey, we review work on query auto completion that has been published before 2016. We focus mainly on web search and provide a formal definition of the query auto completion problem. We describe two dominant families of approaches to the query auto completion problem, one based on heuristic models and the other based on learning to rank. We also identify dominant trends in published work on query auto completion, viz. the use of time-sensitive signals and the use of user-specific signals. We describe the datasets and metrics that are used to evaluate algorithms for query auto completion. We also devote a chapter to efficiency and a chapter to presentation and interaction aspects of query auto completion. We end by discussing related tasks as well as potential research directions to further the area.
      Suggested Citation: Fei Cai and Maarten de Rijke (2016), "A Survey of Query Auto Completion in Information Retrieval", Foundations and Trends® in Information Retrieval: Vol. 10: No. 4, pp 273-363.
      PubDate: Mon, 19 Sep 2016 00:00:00 +020
  • Semantic Search on Text and Knowledge Bases
    • Abstract: This article provides a comprehensive overview of the broad area of semantic search on text and knowledge bases. In a nutshell, semantic search is “search with meaning”. This “meaning” can refer to various parts of the search process: understanding the query (instead of just finding matches of its components in the data), understanding the data (instead of just searching it for such matches), or representing knowledge in a way suitable for meaningful retrieval. Semantic search is studied in a variety of different communities with a variety of different views of the problem. In this survey, we classify this work according to two dimensions: the type of data (text, knowledge bases, combinations of these) and the kind of search (keyword, structured, natural language). We consider all nine combinations. The focus is on fundamental techniques, concrete systems, and benchmarks. The survey also considers advanced issues: ranking, indexing, ontology matching and merging, and inference. It also provides a succinct overview of fundamental natural language processing techniques: POS-tagging, named-entity recognition and disambiguation, sentence parsing, and distributional semantics. The survey is as self-contained as possible, and should thus also serve as a good tutorial for newcomers to this fascinating and highly topical field.
      Suggested Citation: Hannah Bast, Björn Buchhold and Elmar Haussmann (2016), "Semantic Search on Text and Knowledge Bases", Foundations and Trends® in Information Retrieval: Vol. 10: No. 2-3, pp 119-271.
      PubDate: Wed, 22 Jun 2016 00:00:00 +020
  • Online Evaluation for Information Retrieval
    • Abstract: Online evaluation is one of the most common approaches to measure the effectiveness of an information retrieval system. It involves fielding the information retrieval system to real users, and observing these users’ interactions in-situ while they engage with the system. This allows actual users with real world information needs to play an important part in assessing retrieval quality. As such, online evaluation complements the common alternative offline evaluation approaches which may provide more easily interpretable outcomes, yet are often less realistic when measuring quality and actual user experience. In this survey, we provide an overview of online evaluation techniques for information retrieval. We show how online evaluation is used for controlled experiments, segmenting them into experiment designs that allow absolute or relative quality assessments. Our presentation of different metrics further partitions online evaluation based on different sized experimental units commonly of interest: documents, lists and sessions. Additionally, we include an extensive discussion of recent work on data re-use, and experiment estimation based on historical data. A substantial part of this work focuses on practical issues: how to run evaluations in practice, how to select experimental parameters, how to take into account ethical considerations inherent in online evaluations, and limitations that experimenters should be aware of. While most published work on online experimentation today is at large scale in systems with millions of users, we also emphasize that the same techniques can be applied at small scale. To this end, we emphasize recent work that makes it easier to use at smaller scales and encourage studying real-world information seeking in a wide range of scenarios. Finally, we present a summary of the most recent work in the area, and describe open problems, as well as postulating future directions.
      Suggested Citation: Katja Hofmann, Lihong Li and Filip Radlinski (2016), "Online Evaluation for Information Retrieval", Foundations and Trends® in Information Retrieval: Vol. 10: No. 1, pp 1-117.
      PubDate: Wed, 22 Jun 2016 00:00:00 +020
  • Credibility in Information Retrieval
    • PubDate: 2015-12-22T23:20:50-05:00
      Issue No: Vol. 9, No. 5 (2015)
  • Information Retrieval with Verbose Queries
    • Abstract: Recently, the focus of many novel search applications has shifted from short keyword queries to verbose natural language queries. Examples include question answering systems and dialogue systems, voice search on mobile devices and entity search engines like Facebook’s Graph Search or Google’s Knowledge Graph. However, the performance of textbook information retrieval techniques for such verbose queries is not as good as that for their shorter counterparts. Thus, effective handling of verbose queries has become a critical factor for adoption of information retrieval techniques in this new breed of search applications. Over the past decade, the information retrieval community has deeply explored the problem of transforming natural language verbose queries using operations like reduction, weighting, expansion, reformulation and segmentation into more effective structural representations. However, thus far, there has not been a coherent and organized survey on this topic. In this survey, we aim to put together various research pieces of the puzzle, provide a comprehensive and structured overview of various proposed methods, and also list various application scenarios where effective verbose query processing can make a significant difference.
      Suggested Citation: Manish Gupta and Michael Bendersky (2015), "Information Retrieval with Verbose Queries", Foundations and Trends® in Information Retrieval: Vol. 9: No. 3-4, pp 209-354.
      PubDate: Fri, 31 Jul 2015 00:00:00 +020
  • Temporal Information Retrieval
    • Abstract: Temporal dynamics and how they impact upon various components of information retrieval (IR) systems have received a large share of attention in the last decade. In particular, the study of relevance in information retrieval can now be framed within the so-called temporal IR approaches, which explain how user behavior, document content and scale vary with time, and how we can use them in our favor in order to improve retrieval effectiveness. This survey provides a comprehensive overview of temporal IR approaches, centered on the following questions: what are temporal dynamics, why do they occur, and when and how to leverage temporal information throughout the search cycle and architecture. We first explain the general and wide aspects associated with temporal dynamics by focusing on the web domain, from content and structural changes to variations of user behavior and interactions. Next, we pinpoint several research issues and the impact of such temporal characteristics on search, essentially regarding processing dynamic content, temporal query analysis and time-aware ranking. We also address particular aspects of temporal information extraction (for instance, how to timestamp documents and generate temporal profiles of text). To this end, we present existing temporal search engines and applications in related research areas, e.g., exploration, summarization, and clustering of search results, as well as future event retrieval and prediction, where the time dimension also plays an important role.
      Suggested Citation: Nattiya Kanhabua, Roi Blanco and Kjetil Nørvåg (2015), "Temporal Information Retrieval", Foundations and Trends® in Information Retrieval: Vol. 9: No. 2, pp 91-208.
      PubDate: Thu, 09 Jul 2015 00:00:00 +020
  • Search Result Diversification
    • Abstract: Ranking in information retrieval has been traditionally approached as a pursuit of relevant information, under the assumption that the users’ information needs are unambiguously conveyed by their submitted queries. Nevertheless, as an inherently limited representation of a more complex information need, every query can arguably be considered ambiguous to some extent. In order to tackle query ambiguity, search result diversification approaches have recently been proposed to produce rankings aimed to satisfy the multiple possible information needs underlying a query. In this survey, we review the published literature on search result diversification. In particular, we discuss the motivations for diversifying the search results for an ambiguous query and provide a formal definition of the search result diversification problem. In addition, we describe the most successful approaches in the literature for producing and evaluating diversity in multiple search domains. Finally, we also discuss recent advances as well as open research directions in the field of search result diversification.
      Suggested Citation: Rodrygo L. T. Santos, Craig Macdonald and Iadh Ounis (2015), "Search Result Diversification", Foundations and Trends® in Information Retrieval: Vol. 9: No. 1, pp 1-90.
      PubDate: Thu, 05 Mar 2015 00:00:00 +010
  • Computational Advertising: Techniques for Targeting Relevant Ads
    • Abstract: Computational Advertising, popularly known as online advertising or Web advertising, refers to finding the most relevant ads matching a particular context on the Web. The context depends on the type of advertising and could mean the content where the ad is shown, the user who is viewing the ad or the social network of the user. Computational Advertising (CA) is a scientific sub-discipline at the intersection of information retrieval, statistical modeling, machine learning, optimization, large scale search and text analysis. The core problem addressed in Computational Advertising is that of match-making between the ads and the context. CA is prevalent in three major forms on the Web. One of the forms involves showing textual ads relevant to a query on the search page, known as Sponsored Search. On the other hand, showing textual ads relevant to a third party webpage content is known as Contextual Advertising. The third form of advertising also deals with the placement of ads on third party Web pages, but the ads in this form are rich multimedia ads – image, video, audio, flash. The business model with rich media ads is slightly different from the ones with textual ads. These ads are also called banner ads, and this form of advertising is known as Display Advertising. Both Sponsored Search and Contextual Advertising involve retrieving relevant ads for different types of content (query and Web page). As ads are short and are mainly written to attract the user, retrieval of ads poses challenges like vocabulary mismatch between the query/content and the ad. Also, as the user’s probability of examining an ad decreases with the position of the ad in the ranked list, it is imperative to keep the best ads at the top positions. Display Advertising poses several challenges including modeling user behaviour and noisy page content and bid optimization on the advertiser’s side. Additionally, online advertising faces challenges like false bidding, click spam and ad spam. These challenges are prevalent in all forms of advertising. There has been a lot of research work published in different areas of CA in the last decade and a half. The focus of this survey is to discuss the problems and solutions pertaining to the information retrieval, machine learning and statistics domain of CA. This survey covers techniques and approaches that deal with several issues mentioned above. Research in Computational Advertising has evolved over time and currently continues both in traditional areas (vocabulary mismatch, query rewriting, click prediction) and recently identified areas (user targeting, mobile advertising, social advertising). In this study, we predominantly focus on the problems and solutions proposed in traditional areas in detail and briefly cover the emerging areas in the latter half of the survey. To facilitate future research, a discussion of available resources, list of public benchmark datasets and future directions of work is also provided in the end.
      Suggested Citation: Kushal Dave and Vasudeva Varma (2014), "Computational Advertising: Techniques for Targeting Relevant Ads", Foundations and Trends® in Information Retrieval: Vol. 8: No. 4–5, pp 263-418.
      PubDate: Wed, 29 Oct 2014 00:00:00 +010
  • Music Information Retrieval: Recent Developments and Applications
    • Abstract: We provide a survey of the field of Music Information Retrieval (MIR), in particular paying attention to latest developments, such as semantic auto-tagging and user-centric retrieval and recommendation approaches. We first elaborate on well-established and proven methods for feature extraction and music indexing, from both the audio signal and contextual data sources about music items, such as web pages or collaborative tags. These in turn enable a wide variety of music retrieval tasks, such as semantic music search or music identification (“query by example”). Subsequently, we review current work on user analysis and modeling in the context of music recommendation and retrieval, addressing the recent trend towards user-centric and adaptive approaches and systems. A discussion follows about the important aspect of how various MIR approaches to different problems are evaluated and compared. Eventually, a discussion about the major open challenges concludes the survey.
      Suggested Citation: Markus Schedl, Emilia Gómez and Julián Urbano (2014), "Music Information Retrieval: Recent Developments and Applications", Foundations and Trends® in Information Retrieval: Vol. 8: No. 2-3, pp 127-261.
      PubDate: Fri, 12 Sep 2014 00:00:00 +020
  • LifeLogging: Personal Big Data
    • Abstract: We have recently observed a convergence of technologies to foster the emergence of lifelogging as a mainstream activity. Computer storage has become significantly cheaper, and advancements in sensing technology allow for the efficient sensing of personal activities, locations and the environment. This is best seen in the growing popularity of the quantified self movement, in which life activities are tracked using wearable sensors in the hope of better understanding human performance in a variety of tasks. This review aims to provide a comprehensive summary of lifelogging, to cover its research history, current technologies, and applications. Thus far, most of the lifelogging research has focused predominantly on visual lifelogging, hence we maintain this focus in this review. However, we also reflect on the challenges lifelogging poses for information access and retrieval in general. This review is a suitable reference for those seeking an information retrieval scientist’s perspective on lifelogging and the quantified self.
      Suggested Citation: Cathal Gurrin, Alan F. Smeaton and Aiden R. Doherty (2014), "LifeLogging: Personal Big Data", Foundations and Trends® in Information Retrieval: Vol. 8: No. 1, pp 1-125.
      PubDate: Mon, 16 Jun 2014 00:00:00 +020
  • Semantic Matching in Search
    • Abstract: Relevance is the most important factor to assure users’ satisfaction in search and the success of a search engine heavily depends on its performance on relevance. It has been observed that most of the dissatisfaction cases in relevance are due to term mismatch between queries and documents (e.g., query “NY times” does not match well with a document only containing “New York Times”), because term matching, i.e., the bag-of-words approach, still functions as the main mechanism of modern search engines. It is no exaggeration to say, therefore, that mismatch between query and document poses the most critical challenge in search. Ideally, one would like to see query and document match with each other, if they are topically relevant. Recently, researchers have expended significant effort to address the problem. The major approach is to conduct semantic matching, i.e., to perform more query and document understanding to represent their meanings, and perform better matching between the enriched query and document representations. With the availability of large amounts of log data and advanced machine learning techniques, this becomes more feasible and significant progress has been made recently. This survey gives a systematic and detailed introduction to newly developed machine learning technologies for query document matching (semantic matching) in search, particularly web search. It focuses on the fundamental problems, as well as the state-of-the-art solutions of query document matching on form aspect, phrase aspect, word sense aspect, topic aspect, and structure aspect. The ideas and solutions explained may motivate industrial practitioners to turn the research results into products. The methods introduced and the discussions made may also stimulate academic researchers to find new research directions and approaches. Matching between query and document is not limited to search and similar problems can be found in question answering, online advertising, cross-language information retrieval, machine translation, recommender systems, link prediction, image annotation, drug design, and other applications, as the general task of matching between objects from two different spaces. The technologies introduced can be generalized into more general machine learning techniques, which are referred to as learning to match in this survey.
      Suggested Citation: Hang Li and Jun Xu (2014), "Semantic Matching in Search", Foundations and Trends® in Information Retrieval: Vol. 7: No. 5, pp 343-469.
      PubDate: Thu, 12 Jun 2014 00:00:00 +020
  • Arabic Information Retrieval
    • Abstract: In the past several years, Arabic Information Retrieval (IR) has garnered significant attention. The main research interests have focused on retrieval of formal language, mostly in the news domain, with ad hoc retrieval, OCR document retrieval, and cross-language retrieval. The literature on other aspects of retrieval continues to be sparse or non-existent, though some of these aspects have been investigated by industry. Other aspects of Arabic retrieval that have received attention include document image retrieval, speech search, social media and web search, and filtering. However, efforts on different aspects of Arabic retrieval continue to be deficient and severely lagging behind efforts in other languages. The survey covers: 1) general properties of the Arabic language; 2) some of the aspects of Arabic that affect retrieval; 3) Arabic processing necessary for effective Arabic retrieval; 4) Arabic retrieval in public IR evaluations; 5) specialized retrieval problems, namely Arabic-English CLIR, Arabic Document Image Retrieval, Arabic Social Search, Arabic Web Search, Question Answering, Image retrieval, and Arabic Speech Search; 6) Arabic IR and NLP resources; and 7) open IR problems that require further attention.
      Suggested Citation: Kareem Darwish and Walid Magdy (2014), "Arabic Information Retrieval", Foundations and Trends® in Information Retrieval: Vol. 7: No. 4, pp 239-342.
      PubDate: Wed, 05 Feb 2014 00:00:00 +010
  • Information Retrieval for E-Discovery
    • Abstract: E-discovery refers generally to the process by which one party (for example, the plaintiff) is entitled to "discover" evidence in the form of "electronically stored information" that is held by another party (for example, the defendant), and that is relevant to some matter that is the subject of civil litigation (that is, what is commonly called a "lawsuit"). This survey describes the emergence of the field, identifies the information retrieval issues that arise, reviews the work to date on this topic, and summarizes major open issues.
      Suggested Citation: Douglas W. Oard and William Webber (2013), "Information Retrieval for E-Discovery", Foundations and Trends® in Information Retrieval: Vol. 7: No. 2–3, pp 99-237.
      PubDate: Wed, 26 Jun 2013 00:00:00 +020
  • Patent Retrieval
    • Abstract: Intellectual property and the patent system in particular have been extremely present in research and discussion, even in the public media, in the last few years. Without going into any controversial issues regarding the patent system, we approach a very real and growing problem: searching for innovation. The target collection for this task does not consist of patent documents only, but it is in these documents that the main difference is found compared to web or news information retrieval. In addition, the issue of patent search implies a particular user model and search process model. This review is concerned with how research and technology in the field of Information Retrieval assists or even changes the processes of patent search. It is a survey of work done on patent data in relation to Information Retrieval in the last 20–25 years. It explains the sources of difficulty and the existing document processing and retrieval methods of the domain, and provides a motivation for further research in the area.
      Suggested Citation: Mihai Lupu and Allan Hanbury (2013), "Patent Retrieval", Foundations and Trends® in Information Retrieval: Vol. 7: No. 1, pp 1-97.
      PubDate: Wed, 20 Feb 2013 00:00:00 +010
  • Contextual Search: A Computational Framework
    • Abstract: The growing availability of data in electronic form, the expansion of the World Wide Web (WWW) and the accessibility of computational methods for large-scale data processing have allowed researchers in Information Retrieval (IR) to design systems which can effectively and efficiently constrain search within the boundaries given by context, thus transforming classical search into contextual search. Because of the constraints imposed by context, contextual search better focuses on the user's relevance and improves retrieval performance, since the out-of-context aspects of the search carried out by users that are likely linked to irrelevant documents are left aside. This survey introduces contextual search within a computational framework based on contextual variables, contextual factors and statistical models. The framework adopted in this survey considers the data observable from the real world entities participating in contextual search and classifies them as what we call contextual variables. The contextual variables considered are content, geotemporal, interaction, and social variables. Moreover, we distinguish between contextual variables and contextual factors: the former are what can be observed, the latter are what cannot be observed, yet these are the factors affecting the user's relevance assessment. Therefore, in this survey, we describe how statistical models can process contextual variables to infer the contextual factors underlying the current search context. In this survey we provide a background to the subject by: placing it among other surveys on relevance, interaction, context, and behavior; providing the description of the contextual variables used for implementing the statistical models which represent and predict relevance and contextual factors; citing and surveying useful publications to the reader for further examination; providing an overview of the evaluation methodologies and findings relevant to this subject; and briefly describing some implementations of contextual search tools.
      Suggested Citation: Massimo Melucci (2012), "Contextual Search: A Computational Framework", Foundations and Trends® in Information Retrieval: Vol. 6: No. 4–5, pp 257-405.
      PubDate: Wed, 05 Dec 2012 00:00:00 +010
  • Expertise Retrieval
    • Abstract: People have looked for experts since before the advent of computers. With advances in information retrieval technology and the large-scale availability of digital traces of knowledge-related activities, computer systems that can fully automate the process of locating expertise have become a reality. The past decade has witnessed tremendous interest, and a wealth of results, in expertise retrieval as an emerging subdiscipline in information retrieval. This survey highlights advances in models and algorithms relevant to this field. We draw connections among methods proposed in the literature and summarize them in five groups of basic approaches. These serve as the building blocks for more advanced models that arise when we consider a range of content-based factors that may impact the strength of association between a topic and a person. We also discuss practical aspects of building an expert search system and present applications of the technology in other domains, such as blog distillation and entity retrieval. The limitations of current approaches are also pointed out. We end our survey with a set of conjectures on what the future may hold for expertise retrieval research.
      Suggested Citation: Krisztian Balog, Yi Fang, Maarten de Rijke, Pavel Serdyukov and Luo Si (2012), "Expertise Retrieval", Foundations and Trends® in Information Retrieval: Vol. 6: No. 2–3, pp 127-256.
      PubDate: Mon, 30 Jul 2012 00:00:00 +020
  • Information Retrieval on the Blogosphere
    • Abstract: Blogs have recently emerged as a new open, rapidly evolving and reactive publishing medium on the Web. Rather than managed by a central entity, the content on the blogosphere (the collection of all blogs on the Web) is produced by millions of independent bloggers, who can write about virtually anything. This open publishing paradigm has led to a growing mass of user-generated content on the Web, which can vary tremendously both in format and quality when looked at in isolation, but which can also reveal interesting patterns when observed in aggregation. One field particularly interested in studying how information is produced, consumed, and searched in the blogosphere is information retrieval. In this survey, we review the published literature on searching the blogosphere. In particular, we describe the phenomenon of blogging and the motivations for searching for information on blogs. We cover both the search tasks underlying blog searchers' information needs and the most successful approaches to these tasks. These include blog post and full blog search tasks, as well as blog-aided search tasks, such as trend and market analysis. Finally, we also describe the publicly available resources that support research on searching the blogosphere. Disclaimer: Certain companies and/or products are identified in this paper in order to describe concepts and to specify experimental procedures adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the companies or products identified are necessarily the best available for the purpose. Suggested Citation: Rodrygo L. T. Santos, Craig Macdonald, Richard McCreadie, Iadh Ounis and Ian Soboroff (2012), "Information Retrieval on the Blogosphere", Foundations and Trends® in Information Retrieval: Vol. 6: No. 1, pp 1-125.
      PubDate: Mon, 30 Jul 2012 00:00:00 +020
  • Spoken Content Retrieval: A Survey of Techniques and Technologies
    • Abstract: Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR. Suggested Citation: Martha Larson and Gareth J. F. Jones (2012), "Spoken Content Retrieval: A Survey of Techniques and Technologies", Foundations and Trends® in Information Retrieval: Vol. 5: No. 4–5, pp 235-422.
      PubDate: Mon, 23 Jul 2012 00:00:00 +020
  • Automatic Summarization
    • Abstract: It has now been 50 years since the publication of Luhn's seminal paper on automatic summarization. During these years the practical need for automatic summarization has become increasingly urgent and numerous papers have been published on the topic. As a result, it has become harder to find a single reference that gives an overview of past efforts or a complete view of summarization tasks and necessary system components. This article attempts to fill this void by providing a comprehensive overview of research in summarization, including the more traditional efforts in sentence extraction as well as the most novel recent approaches for determining important content, for domain and genre specific summarization and for evaluation of summarization. We also discuss the challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field. Suggested Citation: Ani Nenkova and Kathleen McKeown (2011), "Automatic Summarization", Foundations and Trends® in Information Retrieval: Vol. 5: No. 2–3, pp 103-233.
      PubDate: Thu, 30 Jun 2011 00:00:00 +020
  • Federated Search
    • Abstract: Federated search (federated information retrieval or distributed information retrieval) is a technique for searching multiple text collections simultaneously. Queries are submitted to a subset of collections that are most likely to return relevant answers. The results returned by selected collections are integrated and merged into a single list. Federated search is preferred over centralized search alternatives in many environments. For example, commercial search engines such as Google cannot easily index uncrawlable hidden web collections while federated search systems can search the contents of hidden web collections without crawling. In enterprise environments, where each organization maintains an independent search engine, federated search techniques can provide parallel search over multiple collections. There are three major challenges in federated search. For each query, a subset of collections that are most likely to return relevant documents is selected. This creates the collection selection problem. To be able to select suitable collections, federated search systems need to acquire some knowledge about the contents of each collection, creating the collection representation problem. The results returned from the selected collections are merged before the final presentation to the user. This final step is the result merging problem. The goal of this work is to provide a comprehensive summary of the previous research on the federated search challenges described above. Suggested Citation: Milad Shokouhi and Luo Si (2011), "Federated Search", Foundations and Trends® in Information Retrieval: Vol. 5: No. 1, pp 1-102.
      PubDate: Mon, 07 Mar 2011 00:00:00 +010
  • Adversarial Web Search
    • Abstract: Web search engines have become indispensable tools for finding content. As the popularity of the Web has increased, the efforts to exploit the Web for commercial, social, or political advantage have grown, making it harder for search engines to discriminate between truthful signals of content quality and deceptive attempts to game search engines' rankings. This problem is further complicated by the open nature of the Web, which allows anyone to write and publish anything, and by the fact that search engines must analyze ever-growing numbers of Web pages. Moreover, increasing expectations of users, who over time rely on Web search for information needs related to more aspects of their lives, further deepen the need for search engines to develop effective counter-measures against deception. In this monograph, we consider the effects of the adversarial relationship between search systems and those who wish to manipulate them, a field known as "Adversarial Information Retrieval". We show that search engine spammers create false content and misleading links to lure unsuspecting visitors to pages filled with advertisements or malware. We also examine work over the past decade or so that aims to discover such spamming activities to get spam pages removed or their effect on the quality of the results reduced. Research in Adversarial Information Retrieval has been evolving over time, and currently continues both in traditional areas (e.g., link spam) and newer areas, such as click fraud and spam in social media, demonstrating that this conflict is far from over. Suggested Citation: Carlos Castillo and Brian D. Davison (2011), "Adversarial Web Search", Foundations and Trends® in Information Retrieval: Vol. 4: No. 5, pp 377-486.
      PubDate: Sat, 22 Jan 2011 00:00:00 +010
  • Test Collection Based Evaluation of Information Retrieval Systems
    • Abstract: Use of test collections and evaluation measures to assess the effectiveness of information retrieval systems has its origins in work dating back to the early 1950s. Across the nearly 60 years since that work started, use of test collections is a de facto standard of evaluation. This monograph surveys the research conducted and explains the methods and measures devised for evaluation of retrieval systems, including a detailed look at the use of statistical significance testing in retrieval experimentation. This monograph reviews more recent examinations of the validity of the test collection approach and evaluation measures as well as outlining trends in current research exploiting query logs and live labs. At its core, the modern-day test collection is little different from the structures that the pioneering researchers in the 1950s and 1960s conceived of. This tutorial and review shows that despite its age, this long-standing evaluation method is still a highly valued tool for retrieval research. Suggested Citation: Mark Sanderson (2010), "Test Collection Based Evaluation of Information Retrieval Systems", Foundations and Trends® in Information Retrieval: Vol. 4: No. 4, pp 247-375.
      PubDate: Tue, 22 Jun 2010 00:00:00 +020
  • Web Crawling
    • Abstract: This is a survey of the science and practice of web crawling. While at first glance web crawling may appear to be merely an application of breadth-first-search, the truth is that there are many challenges ranging from systems concerns such as managing very large data structures to theoretical questions such as how often to revisit evolving content sources. This survey outlines the fundamental challenges and describes the state-of-the-art models and solutions. It also highlights avenues for future work. Suggested Citation: Christopher Olston and Marc Najork (2010), "Web Crawling", Foundations and Trends® in Information Retrieval: Vol. 4: No. 3, pp 175-246.
      PubDate: Fri, 12 Feb 2010 00:00:00 +010
  • The Probabilistic Relevance Framework: BM25 and Beyond
    • Abstract: The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970–1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Again, this has led to one of the most successful Web-search and corporate-search algorithms, BM25F. This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features, and parameter optimisation for models with free parameters. Suggested Citation: Stephen Robertson and Hugo Zaragoza (2009), "The Probabilistic Relevance Framework: BM25 and Beyond", Foundations and Trends® in Information Retrieval: Vol. 3: No. 4, pp 333-389.
      PubDate: Thu, 17 Dec 2009 00:00:00 +010
  • Mining Query Logs: Turning Search Usage Data into Knowledge
    • Abstract: Web search engines have stored in their logs information about users since they started to operate. This information often serves many purposes. The primary focus of this survey is on introducing the discipline of query mining by showing its foundations and by analyzing the basic algorithms and techniques that are used to extract useful knowledge from this (potentially) infinite source of information. We show how search applications may benefit from this kind of analysis by analyzing popular applications of query log mining and their influence on user experience. We conclude the paper by briefly presenting some of the most challenging current open problems in this field. Suggested Citation: Fabrizio Silvestri (2009), "Mining Query Logs: Turning Search Usage Data into Knowledge", Foundations and Trends® in Information Retrieval: Vol. 4: No. 1–2, pp 1-174.
      PubDate: Sun, 29 Nov 2009 00:00:00 +010
  • Learning to Rank for Information Retrieval
    • Abstract: Learning to rank for Information Retrieval (IR) is a task to automatically construct a ranking model using training data, such that the model can sort new objects according to their degrees of relevance, preference, or importance. Many IR problems are by nature ranking problems, and many IR technologies can be potentially enhanced by using learning-to-rank techniques. The objective of this tutorial is to give an introduction to this research direction. Specifically, the existing learning-to-rank algorithms are reviewed and categorized into three approaches: the pointwise, pairwise, and listwise approaches. The advantages and disadvantages with each approach are analyzed, and the relationships between the loss functions used in these approaches and IR evaluation measures are discussed. Then the empirical evaluations on typical learning-to-rank methods are shown, with the LETOR collection as a benchmark dataset, which seems to suggest that the listwise approach be the most effective one among all the approaches. After that, a statistical ranking theory is introduced, which can describe different learning-to-rank algorithms, and be used to analyze their query-level generalization abilities. At the end of the tutorial, we provide a summary and discuss potential future work on learning to rank. Suggested Citation: Tie-Yan Liu (2009), "Learning to Rank for Information Retrieval", Foundations and Trends® in Information Retrieval: Vol. 3: No. 3, pp 225-331.
      PubDate: Sat, 27 Jun 2009 00:00:00 +020
  • Concept-Based Video Retrieval
    • Abstract: In this paper, we review 300 references on video retrieval, indicating when text-only solutions are unsatisfactory and showing the promising alternatives which are in majority concept-based. Therefore, central to our discussion is the notion of a semantic concept: an objective linguistic description of an observable entity. Specifically, we present our view on how its automated detection, selection under uncertainty, and interactive usage might solve the major scientific problem for video retrieval: the semantic gap. To bridge the gap, we lay down the anatomy of a concept-based video search engine. We present a component-wise decomposition of such an interdisciplinary multimedia system, covering influences from information retrieval, computer vision, machine learning, and human–computer interaction. For each of the components we review state-of-the-art solutions in the literature, each having different characteristics and merits. Because of these differences, we cannot understand the progress in video retrieval without serious evaluation efforts such as carried out in the NIST TRECVID benchmark. We discuss its data, tasks, results, and the many derived community initiatives in creating annotations and baselines for repeatable experiments. We conclude with our perspective on future challenges and opportunities. Suggested Citation: Cees G. M. Snoek and Marcel Worring (2009), "Concept-Based Video Retrieval", Foundations and Trends® in Information Retrieval: Vol. 2: No. 4, pp 215-322.
      PubDate: Wed, 27 May 2009 00:00:00 +020
  • Methods for Evaluating Interactive Information Retrieval Systems with Users
    • Abstract: This paper provides overview and instruction regarding the evaluation of interactive information retrieval systems with users. The primary goal of this article is to catalog and compile material related to this topic into a single source. This article (1) provides historical background on the development of user-centered approaches to the evaluation of interactive information retrieval systems; (2) describes the major components of interactive information retrieval system evaluation; (3) describes different experimental designs and sampling strategies; (4) presents core instruments and data collection techniques and measures; (5) explains basic data analysis techniques; and (6) reviews and discusses previous studies. This article also discusses validity and reliability issues with respect to both measures and methods, presents background information on research ethics and discusses some ethical issues which are specific to studies of interactive information retrieval (IIR). Finally, this article concludes with a discussion of outstanding challenges and future research directions. Suggested Citation: Diane Kelly (2009), "Methods for Evaluating Interactive Information Retrieval Systems with Users", Foundations and Trends® in Information Retrieval: Vol. 3: No. 1–2, pp 1-224.
      PubDate: Tue, 28 Apr 2009 00:00:00 +020
  • Statistical Language Models for Information Retrieval: A Critical Review
    • Abstract: Statistical language models have recently been successfully applied to many information retrieval problems. A great deal of recent work has shown that statistical language models not only lead to superior empirical performance, but also facilitate parameter tuning and open up possibilities for modeling nontraditional retrieval problems. In general, statistical language models provide a principled way of modeling various kinds of retrieval problems. The purpose of this survey is to systematically and critically review the existing work in applying statistical language models to information retrieval, summarize their contributions, and point out outstanding challenges. Suggested Citation: ChengXiang Zhai (2008), "Statistical Language Models for Information Retrieval: A Critical Review", Foundations and Trends® in Information Retrieval: Vol. 2: No. 3, pp 137-213.
      PubDate: Sun, 30 Nov 2008 00:00:00 +010
  • Opinion Mining and Sentiment Analysis
    • Abstract: An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided. Suggested Citation: Bo Pang and Lillian Lee (2008), "Opinion Mining and Sentiment Analysis", Foundations and Trends® in Information Retrieval: Vol. 2: No. 1–2, pp 1-135.
      PubDate: Mon, 07 Jul 2008 00:00:00 +020
  • Email Spam Filtering: A Systematic Review
    • Abstract: Spam is information crafted to be delivered to a large number of recipients, in spite of their wishes. A spam filter is an automated tool to recognize spam so as to prevent its delivery. The purposes of spam and spam filters are diametrically opposed: spam is effective if it evades filters, while a filter is effective if it recognizes spam. The circular nature of these definitions, along with their appeal to the intent of sender and recipient, makes them difficult to formalize. A typical email user has a working definition no more formal than "I know it when I see it." Yet, current spam filters are remarkably effective, more effective than might be expected given the level of uncertainty and debate over a formal definition of spam, more effective than might be expected given the state-of-the-art information retrieval and machine learning methods for seemingly similar problems. But are they effective enough? Which are better? How might they be improved? Will their effectiveness be compromised by more cleverly crafted spam? We survey current and proposed spam filtering techniques with particular emphasis on how well they work. Our primary focus is spam filtering in email; similarities and differences with spam filtering in other communication and storage media, such as instant messaging and the Web, are addressed peripherally. In doing so we examine the definition of spam, the user's information requirements and the role of the spam filter as one component of a large and complex information universe. Well-known methods are detailed sufficiently to make the exposition self-contained; however, the focus is on considerations unique to spam. Comparisons, wherever possible, use common evaluation measures, and control for differences in experimental setup. Such comparisons are not easy, as benchmarks, measures, and methods for evaluating spam filters are still evolving. We survey these efforts, their results and their limitations.
In spite of recent advances in evaluation methodology, many uncertainties (including widely held but unsubstantiated beliefs) remain as to the effectiveness of spam filtering techniques and as to the validity of spam filter evaluation methods. We outline several uncertainties and propose experimental methods to address them. Suggested Citation: Gordon V. Cormack (2008), "Email Spam Filtering: A Systematic Review", Foundations and Trends® in Information Retrieval: Vol. 1: No. 4, pp 335-455.
      PubDate: Mon, 23 Jun 2008 00:00:00 +020
  • Authorship Attribution
    • Abstract: Authorship attribution, the science of inferring characteristics of the author from the characteristics of documents written by that author, is a problem with a long history and a wide range of application. Recent work in "non-traditional" authorship attribution demonstrates the practicality of automatically analyzing documents based on authorial style, but the state of the art is confusing. Analyses are difficult to apply, little is known about type or rate of errors, and few "best practices" are available. In part because of this confusion, the field has perhaps had less uptake and general acceptance than is its due. This review surveys the history and present state of the discipline, presenting some comparative results when available. It shows, first, that the discipline is quite successful, even in difficult cases involving small documents in unfamiliar and less studied languages; it further analyzes the types of analysis and features used and tries to determine characteristics of well-performing systems, finally formulating these in a set of recommendations for best practices. Suggested Citation: Patrick Juola (2008), "Authorship Attribution", Foundations and Trends® in Information Retrieval: Vol. 1: No. 3, pp 233-334.
      PubDate: Fri, 07 Mar 2008 00:00:00 +010
  • Open-Domain Question–Answering
    • Suggested Citation: John Prager (2007), "Open-Domain Question–Answering", Foundations and Trends® in Information Retrieval: Vol. 1: No. 2, pp 91-231.
      PubDate: Tue, 21 Aug 2007 00:00:00 +020
  • Music Retrieval: A Tutorial and Review
    • Abstract: The increasing availability of music in digital format needs to be matched by the development of tools for music accessing, filtering, classification, and retrieval. The research area of Music Information Retrieval (MIR) covers many of these aspects. The aim of this paper is to present an overview of this vast and new field. A number of issues, which are peculiar to the music language, are described, including forms, formats, and dimensions of music, together with the typologies of users and their information needs. To fulfil these needs a number of approaches are discussed, from direct search to information filtering and clustering of music documents. An overview of the techniques for music processing, which are commonly exploited in many approaches, is also presented. Evaluation and comparisons of the approaches on a common benchmark are other important issues. To this end, a description of the initial efforts and evaluation campaigns for MIR is provided. Suggested Citation: Nicola Orio (2006), "Music Retrieval: A Tutorial and Review", Foundations and Trends® in Information Retrieval: Vol. 1: No. 1, pp 1-90.
      PubDate: Thu, 26 Oct 2006 00:00:00 +020