Abstract: Nowadays, a considerable volume of news articles is produced daily by news agencies worldwide. Since there is an extensive volume of news on the web, finding exact answers to users’ questions is not a straightforward task. Developing Question Answering (QA) systems for news articles can tackle this challenge. Due to the lack of studies on Persian QA systems and the importance and wide applications of QA systems in the news domain, this research aims to design and implement a QA system for Persian news articles. To the best of our knowledge, this is the first attempt to develop a Persian QA system in the news domain. We first create FarsQuAD: a Persian QA dataset for the news domain. We analyze the type and complexity of users’ questions about Persian news. The results show that What and Who questions have the most occurrences and Why and Which questions the fewest in the Persian news domain. The results also indicate that users usually raise complex questions about Persian news. We then develop FarsNewsQA: a QA system for answering questions about Persian news. We developed three models of FarsNewsQA using BERT, ParsBERT, and ALBERT. The best version of FarsNewsQA offers an F1 score of 75.61%, which is comparable with that of the QA system on the English SQuAD dataset developed by Stanford University, and shows that the new BERT-based technologies work well for Persian news QA systems. PubDate: 2023-03-19 DOI: 10.1007/s10791-023-09417-2
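As a point of reference for the F1 score reported above, the following is an illustrative sketch (not the authors' code) of the token-level F1 conventionally used in SQuAD-style QA evaluation: the harmonic mean of precision and recall over the overlapping tokens of the predicted and gold answers.

```python
def qa_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer span and a gold answer span."""
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection: count shared tokens with multiplicity.
    gold_counts = {}
    for t in gold_tokens:
        gold_counts[t] = gold_counts.get(t, 0) + 1
    common = 0
    for t in pred_tokens:
        if gold_counts.get(t, 0) > 0:
            common += 1
            gold_counts[t] -= 1
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `qa_f1("the red cat", "the cat")` yields 0.8 (precision 2/3, recall 1). Production evaluation scripts additionally normalize case, punctuation, and articles, which this sketch omits.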
Abstract: Visual search has become more popular in recent years, allowing users to search with an image they take using their mobile device or upload from their photo library. One domain in which visual search is especially valuable is electronic commerce, where users search for items to purchase. Despite the increasing popularity of visual search in e-commerce, no comprehensive study has examined its characteristics compared to traditional search using a text query. In this work, we present an in-depth, comprehensive study of visual e-commerce search. We perform query log analysis of the mobile search application of one of the largest e-commerce platforms. We compare visual and textual search across a variety of characteristics, with special focus on the retrieved results and user interaction with them. We also examine image query characteristics, refinement by attributes, and segmentation by user types. Additionally, we examine, for the first time, a wide variety of visual pre- and post-retrieval query performance predictors, several of which show strong results. Our study points out a variety of differences between visual and textual e-commerce search. We discuss the implications of these differences for the design of future e-commerce search systems. PubDate: 2023-03-03 DOI: 10.1007/s10791-023-09418-1
Abstract: In light of recent advances in adversarial learning, there has been strong and continuing interest in exploring how to perform adversarial learning-to-rank. Previous adversarial ranking methods [e.g., IRGAN by Wang et al. (IRGAN: a minimax game for unifying generative and discriminative information retrieval models. Proceedings of the 40th SIGIR pp. 515–524, 2017)] mainly follow the generative adversarial networks (GAN) framework (Goodfellow et al. in Generative adversarial nets. Proceedings of NeurIPS pp. 2672–2680, 2014), and focus on either pointwise or pairwise optimization based on rule-based adversarial sampling. Unfortunately, there are still many open problems. For example, how to perform listwise adversarial learning-to-rank has not been explored. Furthermore, GAN has many variants, such as f-GAN (Nowozin et al. in Proceedings of the 30th international conference on neural information processing systems, pp. 271–279, 2016) and EBGAN (Zhao et al. in Energy-based generative adversarial network. International conference on learning representations (ICLR), 2017); a natural question then arises: to what extent does the adversarial learning strategy affect the ranking performance? To cope with these problems, firstly, we show how to perform adversarial learning-to-rank in a listwise manner by following the GAN framework. Secondly, we investigate the effects of using a different adversarial learning framework, namely f-GAN. Specifically, a new general adversarial learning-to-rank framework via variational divergence minimization is proposed (referred to as IRf-GAN). Furthermore, we show how to perform pointwise, pairwise and listwise adversarial learning-to-rank within the same framework of IRf-GAN. In order to clearly understand the pros and cons of adversarial learning-to-rank, we conduct a series of experiments using multiple benchmark collections. 
The experimental results demonstrate that: (1) Thanks to the flexibility of being able to use different divergence functions, IRf-GAN-pair shows significantly better performance than adversarial learning-to-rank methods based on the IRGAN framework. This reveals that the learning strategy significantly affects the adversarial ranking performance. (2) An in-depth comparison with conventional ranking methods shows that although the adversarial learning-to-rank models can achieve performance comparable to conventional methods based on neural networks, they are still inferior to LambdaMART by a large margin. In particular, we pinpoint that the weakness of adversarial learning-to-rank is largely attributable to gradient estimation based on sampled rankings, which significantly diverge from ideal rankings. Careful examination of this weakness is highly recommended for developing adversarial learning-to-rank approaches. PubDate: 2023-02-28 DOI: 10.1007/s10791-023-09419-0
Abstract: In this work we address the problem of performing an error-tolerant prefix search on a set of string keys. While the ideas presented here could be adopted in other applications, our primary target application is error-tolerant query autocompletion. Tries and their variations have been adopted as the basic data structure to implement recently proposed error-tolerant prefix search methods. However, they require a large amount of extra memory to process queries. Burst tries are compact alternatives to tries, proposed to reduce storage costs while maintaining performance close to that of full tries. Here we discuss alternatives for adapting burst tries as error-tolerant prefix search data structures. We show how to adapt state-of-the-art trie-based methods for use with burst tries. We studied the trade-off between memory usage and time performance while varying the parameters used to build the burst trie index. As an example, when indexing the JusBrasil dataset, one of the datasets adopted in the experiments, the use of burst tries reduces the memory required by a full trie to 26%, while query times increase to 116%. The possibility of balancing memory usage and time performance constitutes an advantage of the burst trie over the full trie when adopted as an index for the task of performing error-tolerant prefix search. PubDate: 2022-10-18 DOI: 10.1007/s10791-022-09416-9
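To make the task concrete, here is a minimal brute-force sketch of error-tolerant prefix search (not the trie- or burst-trie-based indexes the paper studies): a key matches if some prefix of it is within edit distance tau of the query, computed with the standard Levenshtein dynamic program and taking the minimum over the last row.

```python
def edit_dp_min_prefix(query: str, key: str) -> int:
    """Minimum edit distance between `query` and any prefix of `key`."""
    n = len(key)
    prev = list(range(n + 1))  # distance from empty query to each key prefix
    for i in range(1, len(query) + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if query[i - 1] == key[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # delete from query
                         cur[j - 1] + 1,     # insert into query
                         prev[j - 1] + cost) # substitute / match
        prev = cur
    return min(prev)  # best alignment against any prefix of the key

def fuzzy_prefix_search(keys, query, tau=1):
    """All keys with some prefix within edit distance tau of the query."""
    return sorted(k for k in keys if edit_dp_min_prefix(query, k) <= tau)
```

This linear scan is O(|keys| * |query| * |key|); the point of trie- and burst-trie-based methods is to share this computation across keys with common prefixes.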
Abstract: There exist many research works that strive to answer the question “what news article is a user going to click next given his profile”. These works take the time dimension into account to reveal users’ preferences over time. However, few works adequately exploit the information that is hidden inside user sessions. A user session comprises a list of user interactions with items within a short period of time, such as 30 min, and can reveal the user’s most recent intentions. In this paper, we combine intra- with inter-session item transition probabilities to reveal the short- and long-term intentions of individuals. Thus, we are able to better capture the similarities among items that are co-selected inside a user session but also within any two consecutive sessions. We experimentally evaluate our method and compare it against state-of-the-art algorithms on three real-life datasets, demonstrating the superiority of our method over its competitors. PubDate: 2022-09-28 DOI: 10.1007/s10791-022-09415-w
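The intra-/inter-session combination can be sketched as follows. This is an illustrative toy (the weighting scheme `alpha` and the raw-count "probabilities" are our simplifications, not the paper's model): intra-session counts capture consecutive clicks within one session, inter-session counts capture item pairs bridging two consecutive sessions, and candidate next items are scored by a weighted blend.

```python
from collections import defaultdict

def build_transitions(user_sessions):
    """user_sessions: list of sessions (each a list of item ids) in time order."""
    intra = defaultdict(int)  # consecutive clicks inside one session
    inter = defaultdict(int)  # item pairs spanning two consecutive sessions
    for s in user_sessions:
        for a, b in zip(s, s[1:]):
            intra[(a, b)] += 1
    for s1, s2 in zip(user_sessions, user_sessions[1:]):
        for a in s1:
            for b in s2:
                inter[(a, b)] += 1
    return intra, inter

def score_next(intra, inter, current_item, alpha=0.7):
    """Blend short-term (intra) and long-term (inter) transition evidence."""
    candidates = {b for (a, b) in list(intra) + list(inter) if a == current_item}
    return {b: alpha * intra[(current_item, b)] + (1 - alpha) * inter[(current_item, b)]
            for b in candidates}
```

With sessions `[["a", "b"], ["b", "c"]]` and `alpha=0.5`, item "b" scores higher than "c" as a successor of "a" because it is supported by both intra- and inter-session evidence.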
Abstract: Pretrained language models (PLMs) exemplified by BERT have proven to be remarkably effective for ad hoc ranking. As opposed to pre-BERT models that required specialized neural components to capture different aspects of query-document relevance, PLMs are solely based on transformers, where attention is the only mechanism used for extracting signals from term interactions. Thanks to the transformer’s cross-match attention, BERT was found to be an effective soft matching model. However, exact matching is still an essential signal for assessing the relevance of a document to an information-seeking query, aside from semantic matching. We assume that BERT might benefit from explicit exact match cues to better adapt to the relevance classification task. In this work, we explore strategies for integrating exact matching signals using marker tokens to highlight exact term matches between the query and the document. We find that this simple marking approach significantly improves over the common vanilla baseline. We empirically demonstrate the effectiveness of our approach through exhaustive experiments on three standard ad hoc benchmarks. Results show that explicit exact match cues conveyed by marker tokens are beneficial for BERT and ELECTRA variants, helping them achieve higher or at least comparable performance. Our findings support that traditional information retrieval cues such as exact matching are still valuable for large pretrained contextualized models such as BERT. PubDate: 2022-08-06 DOI: 10.1007/s10791-022-09414-x
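The marking strategy can be sketched in a few lines. Note this is an illustration of the general idea only: the marker strings `[e]`/`[/e]` and whitespace tokenization are our own placeholder choices, not necessarily the tokens or tokenizer the paper uses.

```python
def mark_exact_matches(query: str, document: str,
                       open_tok: str = "[e]", close_tok: str = "[/e]") -> str:
    """Wrap document terms that exactly match a query term with marker tokens,
    before feeding the (query, document) pair to a BERT-style cross-encoder."""
    query_terms = {t.lower() for t in query.split()}
    out = []
    for tok in document.split():
        if tok.lower() in query_terms:
            out.append(f"{open_tok} {tok} {close_tok}")
        else:
            out.append(tok)
    return " ".join(out)
```

The marked text gives the transformer an explicit, position-anchored exact-match signal that it would otherwise have to infer from attention alone.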
Abstract: Online learning to rank (OLTR) aims to learn a ranker directly from implicit feedback derived from users’ interactions, such as clicks. Clicks however are a biased signal: specifically, top-ranked documents are likely to attract more clicks than documents down the ranking (position bias). In this paper, we propose a novel learning algorithm for OLTR that uses reinforcement learning to optimize rankers: Reinforcement Online Learning to Rank (ROLTR). In ROLTR, the gradients of the ranker are estimated based on the rewards assigned to clicked and unclicked documents. In order to remove the position bias contained in the reward signals, we introduce unbiased reward shaping functions that exploit inverse propensity scoring for clicked and unclicked documents. The fact that our method can also model unclicked documents provides a further advantage in that fewer user interactions are required to effectively train a ranker, thus providing gains in efficiency. Empirical evaluation on standard OLTR datasets shows that ROLTR achieves state-of-the-art performance, and provides significantly better user experience than other OLTR approaches. To facilitate the reproducibility of our experiments, we make all experiment code available at https://github.com/ielab/OLTR. PubDate: 2022-08-04 DOI: 10.1007/s10791-022-09413-y
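The core of inverse propensity scoring (IPS) can be sketched as follows. This is a generic illustration, not the paper's reward shaping functions: the `1/rank` examination model and the binary click rewards are simplifying assumptions chosen for the example.

```python
def ips_rewards(clicks, propensities):
    """IPS-weighted rewards: each click is up-weighted by the inverse of the
    probability that its rank position was examined, counteracting the fact
    that low-ranked documents receive fewer clicks merely due to position."""
    return [c / p for c, p in zip(clicks, propensities)]

# Toy examination model: P(examined at rank r) = 1 / r (1-based ranks).
propensities = [1 / (rank + 1) for rank in range(4)]  # 1, 1/2, 1/3, 1/4
rewards = ips_rewards([1, 0, 1, 0], propensities)
```

Here the click at rank 3 contributes a reward about three times larger than the click at rank 1, because a rank-3 document was far less likely to be examined in the first place.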
Abstract: Recent years have seen enormous gains in core information retrieval tasks, including document and passage ranking. Datasets and leaderboards, and in particular the MS MARCO datasets, illustrate the dramatic improvements achieved by modern neural rankers. When compared with traditional information retrieval test collections, such as those developed by TREC, the MS MARCO datasets employ substantially more queries (thousands vs. dozens) with substantially fewer known relevant items per query (often just one). For example, 94% of the nearly seven thousand queries in the MS MARCO passage ranking development set have only a single known relevant passage, and no query has more than four. Given the sparsity of these relevance labels, the MS MARCO leaderboards track improvements with mean reciprocal rank (MRR). In essence, the known relevant item is treated as the “right answer” or “best answer”, with rankers scored on their ability to place this item as high in the ranking as possible. In working with these sparse labels, we have observed that the top items returned by a ranker often appear superior to judged relevant items. Others have reported the same observation. To test this observation, we employed crowdsourced workers to make preference judgments between the top item returned by a modern neural ranking stack and a judged relevant item for the nearly seven thousand queries in the passage ranking development set. The results support our observation. If we imagine a hypothetical perfect ranker under MRR, with a score of 1 on all queries, our preference judgments indicate that a searcher would prefer the top result from a modern neural ranking stack more frequently than the top result from the hypothetical perfect ranker, making our neural ranker “better than perfect”. To understand the implications for the leaderboard, we pooled the top document from available runs near the top of the passage ranking leaderboard for over 500 queries. 
We employed crowdsourced workers to make preference judgments over these pools and re-evaluated the runs. Our results support our concerns that current MS MARCO datasets may no longer be able to recognize genuine improvements in rankers. In the future, if rankers are measured against a single answer, this answer should be the best or most preferred answer, and maintained with ongoing judgments. Since only the best known answer is required, this ongoing maintenance might be performed with shallow pooling. When a previously unjudged document is surfaced as the top item in a ranking, it can be directly compared with the previous best known answer. PubDate: 2022-07-20 DOI: 10.1007/s10791-022-09411-0
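For reference, the MRR metric that the abstract's argument hinges on is straightforward: each query contributes the reciprocal of the rank at which its first relevant item appears, so the hypothetical "perfect ranker" above scores 1 on every query. A minimal sketch:

```python
def mrr(first_relevant_ranks):
    """Mean reciprocal rank.
    first_relevant_ranks: 1-based rank of the first relevant item for each
    query; use 0 when no relevant item was retrieved (contributes 0)."""
    reciprocal = [1.0 / r if r > 0 else 0.0 for r in first_relevant_ranks]
    return sum(reciprocal) / len(reciprocal)
```

With only one judged relevant item per query, as in MS MARCO, MRR cannot reward a ranker for surfacing an unjudged passage that searchers would actually prefer, which is precisely the weakness the study probes.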
Abstract: Session-based recommendation, without access to a user’s historical user-item interactions, is a challenging task, where the available information in an ongoing session is very limited. Previous work on session-based recommendation has considered sequences of items that users have interacted with sequentially. Such item sequences may not fully capture the complex transition relationships between items that go beyond the inspection order. This issue is partially addressed by graph neural network (GNN) based models. However, GNNs can only propagate information from adjacent items while neglecting items without a direct connection, leaving latent connections unexploited during propagation. Importantly, GNN-based approaches often face a serious overfitting problem. Thus, we propose Star Graph Neural Networks with Highway Networks (SGNN-HN) for session-based recommendation. The proposed SGNN-HN model applies a star graph neural network (SGNN) to model the complex transition relationships between items in an ongoing session. To avoid overfitting, we employ highway networks (HN) to adaptively select embeddings from item representations before and after the multi-layer SGNNs. Finally, we aggregate the item embeddings generated by the SGNN in an ongoing session to represent a user’s final preference for item prediction. Experiments are conducted on two public benchmark datasets, i.e., Yoochoose and Diginetica. The results show that SGNN-HN outperforms state-of-the-art models in terms of Recall and MRR for session-based recommendation. PubDate: 2022-07-18 DOI: 10.1007/s10791-022-09412-z
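The highway-network selection step described above follows the generic highway formulation: a learned sigmoid gate mixes, per dimension, an item's embedding before and after multi-layer graph propagation. The sketch below illustrates only that mixing; in the real model the gate logits come from a learned linear transform of the two embeddings, which we abstract away as an input here.

```python
import math

def highway_mix(x_old, x_new, gate_logits):
    """Per-dimension highway gate: g = sigmoid(logit), out = g*new + (1-g)*old.
    x_old: embedding before graph propagation; x_new: after; gate_logits:
    would come from a learned linear layer over [x_old; x_new] in practice."""
    out = []
    for old, new, z in zip(x_old, x_new, gate_logits):
        g = 1.0 / (1.0 + math.exp(-z))  # gate in (0, 1)
        out.append(g * new + (1.0 - g) * old)
    return out
```

With zero logits the gate is exactly 0.5 and the result is the elementwise average of the two embeddings; training pushes the gate toward whichever representation is more useful, which is what counters overfitting to the propagated features.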
Abstract: Dialogue systems are becoming an increasingly common part of many users’ daily routines. Natural language serves as a convenient interface to express our preferences with the underlying systems. In this paper, we aim to learn user preferences through online conversations. Compared to the traditional collaborative filtering setting where feedback is provided quantitatively, conversational users may only indicate their preferences at a high level with inexact item mentions. To tackle the ambiguities in natural language conversations, we propose Personalized Memory Transfer which learns a personalized model in an online manner by leveraging a key-value memory structure to distill user feedback directly from conversations. This memory structure enables the integration of prior knowledge to transfer existing item representations/preferences and natural language representations. The experiments were conducted on two public datasets and the results demonstrate the effectiveness of the proposed approach. PubDate: 2022-06-26 DOI: 10.1007/s10791-022-09410-1
Abstract: We address the problem of recommending relevant items to a user in order to “complete” a partial set of already-known items. We consider the two scenarios of citation and subject label recommendation, which resemble different semantics of item co-occurrence: relatedness for co-citations and diversity for subject labels. We assess the influence of the completeness of an already known partial item set on the recommender’s performance. We also investigate data sparsity by imposing a pruning threshold on minimum item occurrence and the influence of using additional metadata. As models, we focus on different autoencoders, which are particularly suited for reconstructing missing items in a set. We extend autoencoders to exploit a multi-modal input of text and structured data. Our experiments on six real-world datasets show that supplying the partial item set as input is usually helpful when item co-occurrence resembles relatedness, while metadata are effective when co-occurrence implies diversity. The simple item co-occurrence model is a strong baseline for citation recommendation but also provides good results for subject labels. Autoencoders have the capability to exploit additional metadata besides the partial item set as input, and achieve comparable or better performance. For the subject label recommendation task, the title is the most important attribute. Adding more input modalities sometimes even harms the results. In conclusion, it is crucial to consider the semantics of the item co-occurrence for the choice of an appropriate model and to carefully decide which metadata to exploit. PubDate: 2022-04-04 DOI: 10.1007/s10791-022-09408-9
Abstract: In this work, we study recent advances in context-sensitive language models for the task of query expansion. We study the behavior of existing and new approaches for lexical word-based expansion in both unsupervised and supervised contexts. For unsupervised models, we study the behavior of the Contextualized Embeddings for Query Expansion (CEQE) model. We introduce a new model, Supervised Contextualized Query Expansion with Transformers (SQET), that performs expansion as a supervised classification task and leverages context in pseudo-relevant results. We study the behavior of these expansion approaches for the tasks of ad-hoc document and passage retrieval. We conduct experiments combining expansion with probabilistic retrieval models as well as neural document ranking models. We evaluate expansion effectiveness on three standard TREC collections: Robust, Complex Answer Retrieval, and Deep Learning. We analyze the results of extrinsic retrieval effectiveness and the intrinsic ability to rank expansion terms, and perform a qualitative analysis of the differences between the methods. We find that CEQE statistically significantly outperforms static embeddings across all three datasets in terms of Recall@1000. Moreover, CEQE outperforms static embedding-based expansion methods on multiple collections (by up to 18% on Robust and 31% on Deep Learning in average precision) and also improves over proven probabilistic pseudo-relevance feedback (PRF) models. SQET outperforms CEQE by 6% in P@20 on the intrinsic term ranking evaluation and is approximately as effective in retrieval performance. Models incorporating neural and CEQE-based expansion scores achieve gains of up to 5% in P@20 and 2% in AP on Robust over the state-of-the-art transformer-based re-ranking model, Birch. PubDate: 2022-03-22 DOI: 10.1007/s10791-022-09405-y
Abstract: In the quest to provide a more natural interaction between users and search systems, open-domain conversational search assistants have emerged, assisting users in answering questions about open topics in a conversational manner. In this work, we show how the Transformer architecture achieves state-of-the-art results in key IR tasks, leveraging the creation of conversational assistants that engage in open-domain conversational search with single, yet informative, answers. In particular, we propose a complete open-domain abstractive conversational search agent pipeline to address two major challenges: first, conversation context-aware search and second, abstractive search-answer generation. To address the first challenge, the conversation context is modeled using a query rewriting method that unfolds the context of the conversation up to a specific moment to search for the correct answers. These answers are then passed to a Transformer-based re-ranker to further improve retrieval performance. The second challenge is tackled with recent abstractive Transformer architectures to generate a digest of the top most relevant passages. Experiments show that Transformers deliver a solid performance across all tasks in conversational search, outperforming several baselines. This work is an expanded version of Ferreira et al. (Open-domain conversational search assistant with transformers. In: Advances in information retrieval—43rd European conference on IR research, ECIR 2021, virtual event, 28 March–1 April 2021, proceedings, Part I. Springer), which provides more details about the various components of the system, and extends the automatic evaluation with a novel user study, which confirmed the need for the conversational search paradigm and assessed the performance of our answer generation approach. PubDate: 2022-03-14 DOI: 10.1007/s10791-022-09403-0
Abstract: A key application of conversational search is refining a user’s search intent by asking a series of clarification questions, aiming to improve the relevance of search results. Training and evaluating such conversational systems currently requires human participation, making it infeasible to examine a wide range of user behaviors. To support robust training/evaluation of such systems, we propose a simulation framework called CoSearcher (code/resources available at https://github.com/amzn/cosearcher) that includes a parameterized user simulator controlling key behavioral factors like cooperativeness and patience. To evaluate our approach, we use both a standard conversational query clarification benchmark and develop an extended dataset using query suggestions from a popular Web search engine as a source of additional refinement candidates. Using these datasets, we investigate the impact of a variety of conditions on search refinement and clarification effectiveness over a wide range of user behaviors, semantic policies, and dynamic facet generation. Our results quantify the effects of user behavior variation, and identify the conditions required for conversational search refinement and clarification to be effective. This paper is an extended version of our previous work, and includes new experimental results comparing semantic similarity ranking strategies for facets, using enhanced representations of facets, and learning from negative user responses, among other new results and more detailed experimental descriptions. PubDate: 2022-03-10 DOI: 10.1007/s10791-022-09404-z
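A parameterized user simulator of the kind described can be sketched in a few lines. Everything here is hypothetical and for illustration only: the class name, response labels, and the way cooperativeness and patience are operationalized are our own placeholders, not CoSearcher's actual design.

```python
import random

class SimulatedUser:
    """Toy user simulator: `cooperativeness` is the probability of giving a
    useful answer to a clarification question; `patience` is the number of
    questions tolerated before the simulated user abandons the session."""

    def __init__(self, cooperativeness=0.8, patience=3, seed=0):
        self.cooperativeness = cooperativeness
        self.patience = patience
        self.rng = random.Random(seed)  # seeded for reproducible simulations

    def answer(self, turn):
        if turn >= self.patience:
            return "abandon"
        if self.rng.random() < self.cooperativeness:
            return "useful"
        return "uninformative"
```

Sweeping these two parameters is what lets a simulation framework evaluate a clarification policy across user behaviors that would be impractical to cover with human participants.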
Abstract: Query performance prediction (QPP) has been studied extensively in the IR community over the last two decades. A by-product of this research is a methodology to evaluate the effectiveness of QPP techniques. In this paper, we re-examine the existing evaluation methodology commonly used for QPP, and propose a new approach. Our key idea is to model QPP performance as a distribution instead of relying on point estimates. To obtain such a distribution, we exploit the scaled Absolute Ranking Error (sARE) measure and its mean, the scaled Mean Absolute Ranking Error (sMARE). Our work demonstrates important statistical implications, and overcomes key limitations imposed by the currently used correlation-based point-estimate evaluation approaches. We also explore the potential benefits of using multiple query formulations and ANalysis Of VAriance (ANOVA) modeling in order to measure interactions between multiple factors. The resulting statistical analysis combined with a novel evaluation framework demonstrates the merits of modeling QPP performance as distributions, and enables detailed statistical ANOVA models for comparative analyses to be created. PubDate: 2022-03-07 DOI: 10.1007/s10791-022-09407-w
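One common formulation of the sARE/sMARE idea (sketched here from the general definition, not from the paper's exact notation) compares the rank each query receives under the predicted difficulty ordering against its rank under the actual effectiveness ordering, scaled by the number of queries; sMARE averages these per-query errors.

```python
def ranks(values):
    """1-based rank of each entry when values are sorted in descending order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order):
        r[i] = pos + 1
    return r

def sare(predicted, actual):
    """Per-query scaled absolute ranking error: |predicted rank - actual rank| / |Q|."""
    n = len(predicted)
    rp, ra = ranks(predicted), ranks(actual)
    return [abs(p - a) / n for p, a in zip(rp, ra)]

def smare(predicted, actual):
    s = sare(predicted, actual)
    return sum(s) / len(s)
```

Unlike a single correlation coefficient, the list returned by `sare` is a distribution over queries, which is exactly what enables the per-query statistical (e.g., ANOVA) analyses the paper advocates.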
Abstract: Pretrained multilingual text encoders based on neural transformer architectures, such as multilingual BERT (mBERT) and XLM, have recently become a default paradigm for cross-lingual transfer of natural language processing models, rendering cross-lingual word embedding spaces (CLWEs) effectively obsolete. In this work we present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a number of diverse language pairs. We first treat these models as multilingual text encoders and benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR. In contrast to supervised language understanding, our results indicate that for unsupervised document-level CLIR—a setup with no relevance judgments for IR-specific fine-tuning—pretrained multilingual encoders on average fail to significantly outperform earlier models based on CLWEs. For sentence-level retrieval, we do obtain state-of-the-art performance: the peak scores, however, are achieved by multilingual encoders that have been further specialized, in a supervised fashion, for sentence understanding tasks, rather than by their vanilla ‘off-the-shelf’ variants. Following these results, we introduce localized relevance matching for document-level CLIR, where we independently score a query against document sections. In the second part, we evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments. Our results show that, despite the supervision, and due to the domain and language shift, supervised re-ranking rarely improves over the performance of multilingual transformers used as unsupervised base rankers. Finally, only with in-domain contrastive fine-tuning (i.e., same domain, only language transfer) do we manage to improve the ranking quality. 
We uncover substantial empirical differences between cross-lingual retrieval results and results of (zero-shot) cross-lingual transfer for monolingual retrieval in target languages, which point to “monolingual overfitting” of retrieval models trained on monolingual (English) data, even if they are based on multilingual transformers. PubDate: 2022-03-07 DOI: 10.1007/s10791-022-09406-x
Abstract: An automated contextual suggestion algorithm is likely to recommend contextually appropriate and personalized ‘points-of-interest’ (POIs) to a user if it can extract information from the user’s preference history (exploitation) and effectively blend it with the user’s current contextual information (exploration) to predict a POI’s ‘appropriateness’ in the current context. To balance this trade-off between exploitation and exploration, we propose an unsupervised, generic framework involving a factored relevance model (FRLM), constituting two distinct components, one pertaining to historical contexts, and the other corresponding to the current context. We further generalize the proposed FRLM by incorporating the semantic relationships between terms in POI descriptors using kernel density estimation (KDE) on embedded word vectors. Additionally, we show that trip-qualifiers (e.g., ‘trip-type’, ‘accompanied-by’) are potentially useful information sources that could be used to improve the recommendation effectiveness. Using such information is not straightforward, since users’ texts/reviews of visited POIs typically do not explicitly contain such annotations. We undertake a weakly supervised approach to predict the associations between the review-texts in a user profile and the likely trip contexts. Our experiments, conducted on the TREC Contextual Suggestion 2016 dataset, demonstrate that factorization, KDE-based generalizations, and trip-qualifier enriched contexts of the relevance model improve POI recommendation. PubDate: 2022-01-21 DOI: 10.1007/s10791-021-09400-9
Abstract: In top-k ranked retrieval the goal is to efficiently compute an ordered list of the highest scoring k documents according to some stipulated similarity function such as the well-known BM25 approach. In most implementation techniques a min-heap of size k is used to track the top scoring candidates. In this work we consider the question of how best to retrieve the second page of search results, given that a first page has already been computed; that is, identification of the documents at ranks k+1 to 2k for some query. Our goal is to understand what information is available as a by-product of the first-page scoring, and how it can be employed to accelerate the second-page computation, assuming that the second page of results is required for only a fraction of the query load. We propose a range of simple, yet efficient, next-page retrieval techniques which are suitable for accelerating Document-at-a-Time mechanisms, and demonstrate their performance on three large text collections. PubDate: 2022-01-18 DOI: 10.1007/s10791-021-09402-7
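The naive baseline implied above (recompute and slice) can be sketched as follows; the paper's contribution is precisely to avoid this full rescoring by reusing by-products of the first-page pass, which this illustration does not attempt.

```python
import heapq

def first_and_second_page(scores, k):
    """Naive two-page retrieval: score everything, keep the top 2k, and slice.
    scores: {doc_id: score}. Returns (page1, page2) as ordered doc-id lists,
    i.e., ranks 1..k and ranks k+1..2k."""
    top2k = heapq.nlargest(2 * k, scores, key=scores.get)
    return top2k[:k], top2k[k:2 * k]
```

In a Document-at-a-Time engine the second call effectively repeats all of the first-page work, which is wasteful when only a fraction of queries ever request a second page.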
Abstract: Clustering of the contents of a document corpus is used to create sub-corpora, with the intention that each consists of documents related to each other. However, while clustering is used in a variety of ways in document applications such as information retrieval, and a range of methods have been applied to the task, there has been relatively little exploration of how well it works in practice. Indeed, given the high dimensionality of the data, it is possible that clustering may not always produce meaningful outcomes. In this paper we use a well-known clustering method to explore a variety of techniques, existing and novel, for measuring clustering effectiveness. Results with our new, extrinsic techniques based on relevance judgements or retrieved documents demonstrate that retrieval-based information can be used to assess the quality of clustering, and also show that clustering can succeed to some extent at gathering together similar material. Further, they show that intrinsic clustering techniques that have been shown to be informative in other domains do not work for information retrieval. Whether clustering is sufficiently effective to have a significant impact on practical retrieval is unclear, but, as the results show, our measurement techniques can effectively distinguish between clustering methods. PubDate: 2022-01-10 DOI: 10.1007/s10791-021-09401-8