Subjects -> MATHEMATICS (Total: 1013 journals)     - APPLIED MATHEMATICS (92 journals)    - GEOMETRY AND TOPOLOGY (23 journals)    - MATHEMATICS (714 journals)    - MATHEMATICS (GENERAL) (45 journals)    - NUMERICAL ANALYSIS (26 journals)    - PROBABILITIES AND MATH STATISTICS (113 journals) MATHEMATICS (714 journals)            First | 1 2 3 4
 Showing 601 - 538 of 538 Journals sorted alphabetically Results in Control and Optimization Results in Mathematics Results in Nonlinear Analysis Review of Symbolic Logic       (Followers: 2) Reviews in Mathematical Physics       (Followers: 1) Revista Baiana de Educação Matemática Revista Bases de la Ciencia Revista BoEM - Boletim online de Educação Matemática Revista Colombiana de Matemáticas       (Followers: 1) Revista de Ciencias Revista de Educación Matemática Revista de la Escuela de Perfeccionamiento en Investigación Operativa Revista de la Real Academia de Ciencias Exactas, Fisicas y Naturales. Serie A. Matematicas Revista de Matemática : Teoría y Aplicaciones       (Followers: 1) Revista Digital: Matemática, Educación e Internet Revista Electrónica de Conocimientos, Saberes y Prácticas Revista Integración : Temas de Matemáticas Revista Internacional de Sistemas Revista Latinoamericana de Etnomatemática Revista Latinoamericana de Investigación en Matemática Educativa Revista Matemática Complutense Revista REAMEC : Rede Amazônica de Educação em Ciências e Matemática Revista SIGMA Ricerche di Matematica RMS : Research in Mathematics & Statistics Royal Society Open Science       (Followers: 7) Russian Journal of Mathematical Physics Russian Mathematics Sahand Communications in Mathematical Analysis Sampling Theory, Signal Processing, and Data Analysis São Paulo Journal of Mathematical Sciences Science China Mathematics       (Followers: 1) Science Progress       (Followers: 1) Sciences & Technologie A : sciences exactes Selecta Mathematica       (Followers: 1) SeMA Journal Semigroup Forum       (Followers: 1) Set-Valued and Variational Analysis SIAM Journal on Applied Mathematics       (Followers: 11) SIAM Journal on Computing       (Followers: 11) SIAM Journal on Control and Optimization       (Followers: 18) SIAM Journal on Discrete Mathematics       (Followers: 8) SIAM Journal on Financial Mathematics       (Followers: 3) SIAM Journal on Mathematics of Data Science       (Followers: 1) SIAM Journal on Matrix Analysis and Applications       (Followers: 3) SIAM Journal on Optimization       (Followers: 12) Siberian Advances in Mathematics Siberian Mathematical Journal Sigmae SILICON SN Partial Differential Equations and Applications Soft Computing       (Followers: 7) Statistics and Computing       (Followers: 14) Stochastic Analysis and Applications       (Followers: 3) Stochastic Partial Differential Equations : Analysis and Computations       (Followers: 2) Stochastic Processes and their Applications       (Followers: 6) Stochastics and Dynamics       (Followers: 2) Studia Scientiarum Mathematicarum Hungarica       (Followers: 1) Studia Universitatis Babeș-Bolyai Informatica Studies In Applied Mathematics       (Followers: 1) Studies in Mathematical Sciences       (Followers: 1) Superficies y vacio Suska Journal of Mathematics Education       (Followers: 1) Swiss Journal of Geosciences       (Followers: 1) Synthesis Lectures on Algorithms and Software in Engineering       (Followers: 2) Synthesis Lectures on Mathematics and Statistics       (Followers: 1) Tamkang Journal of Mathematics Tatra Mountains Mathematical Publications Teaching Mathematics       (Followers: 10) Teaching Mathematics and its Applications: An International Journal of the IMA       (Followers: 4) Teaching Statistics       (Followers: 8) Technometrics       (Followers: 8) The Journal of Supercomputing       (Followers: 1) The Mathematica journal The Mathematical Gazette       (Followers: 1) The Mathematical Intelligencer The Ramanujan Journal The VLDB Journal       (Followers: 2) Theoretical and Mathematical Physics       (Followers: 7) Theory and Applications of Graphs Topological Methods in Nonlinear Analysis Transactions of the London Mathematical Society       (Followers: 1) Transformation Groups Turkish Journal of Mathematics Ukrainian Mathematical Journal Uniciencia Uniform Distribution Theory Unisda Journal of Mathematics and Computer Science Unnes Journal of Mathematics       (Followers: 1) Unnes Journal of Mathematics Education       (Followers: 2) Unnes Journal of Mathematics Education Research       (Followers: 1) Ural Mathematical Journal Vestnik Samarskogo Gosudarstvennogo Tekhnicheskogo Universiteta. Seriya Fiziko-Matematicheskie Nauki Vestnik St. Petersburg University: Mathematics VFAST Transactions on Mathematics       (Followers: 1) Vietnam Journal of Mathematics Vinculum Visnyk of V. N. Karazin Kharkiv National University. Ser. Mathematics, Applied Mathematics and Mechanics       (Followers: 2) Water SA       (Followers: 1) Water Waves Zamm-Zeitschrift Fuer Angewandte Mathematik Und Mechanik       (Followers: 1) ZDM       (Followers: 2) Zeitschrift für angewandte Mathematik und Physik       (Followers: 2) Zeitschrift fur Energiewirtschaft Zetetike

First | 1 2 3 4

Similar Journals
 The VLDB JournalJournal Prestige (SJR): 1.003 Citation Impact (citeScore): 5Number of Followers: 2      Hybrid journal (It can contain Open Access articles) ISSN (Print) 1066-8888 - ISSN (Online) 0949-877X Published by Springer-Verlag  [2469 journals]
• Efficient kNN query for moving objects on time-dependent road networks

Abstract: Abstract In this paper, we study the Time-Dependent k Nearest Neighbor (TD-kNN) query on moving objects that aims to return k objects arriving at the query location with the least traveling cost departing at a given time t. Although the kNN query on moving objects has been widely studied in the scenario of the static road network, the TD-kNN query tends to be more complicated and challenging because under the time-dependent road network, the cost of each edge is measured by a cost function rather than a fixed distance value. To tackle such difficulty, we adopt the framework of GLAD and develop an advanced index structure to support efficient fastest travel cost query on time-dependent road network. In particular, we propose the Time-Dependent H2H (TD-H2H) index, which pre-computes the aggregated weight functions between each node to some specific nodes in the decomposition tree derived from the road network. Additionally, we establish a grid index on moving objects for candidate object retrieval and location update. To further accelerate the TD-kNN query, two pruning strategies are proposed in our solution. Apart from that, we extend our framework to tackle the time-dependent approachable kNN (TD-AkNN) query on moving objects targeting for the application of taxi-hailing service, where the moving object might have been occupied. Extensive experiments with different parameter settings on real-world road network show that our solutions for both TD-kNN and TD-AkNN queries are superior to the competitors in orders of magnitude.
PubDate: 2022-07-27

• A Pareto optimal Bloom filter family with hash adaptivity

Abstract: Abstract Bloom filter is a compact memory-efficient probabilistic data structure supporting membership testing, i.e., to check whether an element is in a given set. However, as Bloom filter maps each element with random hash functions, little flexibility is provided even if the information of negative keys (elements are not in the set) is available, especially when the misidentification of negative keys brings different costs. The problem worsens when the hash functions are non-uniform, i.e., mapping each element into Bloom filter non-uniformly. To address the above problem, we propose a new hash adaptive Bloom filter (HABF) that supports customizing hash functions for keys. Besides, we propose a filter family, including f-HABF (fast hashing version), c-HABF (cache-friendly version), and s-HABF (stacked version). We show that HABF family is Pareto optimal among all comparison filters in terms of accuracy and query latency. We conduct extensive experiments on representative datasets, and the results show that HABF family outperforms the standard Bloom filter and its cutting-edge variants on the whole in terms of accuracy, construction/query time, and memory space consumption. All the source codes are available in our source codes (https://github.com/njulands/HashAdaptiveBF).
PubDate: 2022-07-26

• HERMES: data placement and schema optimization for enterprise knowledge
bases

Abstract: Abstract Enterprises create domain-specific knowledge bases (KBs) by curating and integrating their business data from multiple sources. To support a variety of query types over domain-specific KBs, we propose Hermes, an ontology-based system that allows storing KB data in multiple backends, and querying them with different query languages. In this paper, we address two important challenges in realizing such a system: data placement and schema optimization. First, we identify the best data store for any query type and determine the subset of the KB that needs to be stored in this data store, while minimizing data replication. Second, we optimize how we organize the data for best query performance. To choose the best data stores, we partition the data described by the domain ontology into multiple overlapping subsets based on the operations performed in a given query workload, and place these subsets in appropriate data stores according to their capabilities. Then, we optimize the schema on each data store to enable efficient querying. In particular, we focus on the property graph schema optimization, which has been largely ignored in the literature. We propose two algorithms to generate an optimized schema from the domain ontology. We demonstrate the effectiveness of our data placement and schema optimization algorithms with two real-world KBs from the medical and financial domains. The results show that the proposed data placement algorithm generates near-optimal data placement plans with minimal data replication overhead, and the schema optimization algorithms produce high-quality schemas, achieving up to two orders of magnitude speed-up compared to alternative schema designs.
PubDate: 2022-07-26

• Reverse spatial top-k keyword queries

Abstract: Abstract We introduce the R everse S patial Top-k K eyword (RSK) query, which is defined as: given a query term q, an integer k and a neighborhood size find all the neighborhoods of that size where q is in the top-k most frequent terms among the social posts in those neighborhoods. An obvious approach would be to partition the dataset with a uniform grid structure of a given cell size and identify the cells where this term is in the top-k most frequent keywords. However, this answer would be incomplete since it only checks for neighborhoods that are perfectly aligned with the grid. Furthermore, for every neighborhood (square) that is an answer, we can define infinitely more result neighborhoods by minimally shifting the square without including more posts in it. To address that, we need to identify contiguous regions where any point in the region can be the center of a neighborhood that satisfies the query. We propose an algorithm to efficiently answer an RSK query using an index structure consisting of a uniform grid augmented by materialized lists of term frequencies. We apply various optimizations that drastically improve query latency against baseline approaches. We also provide a theoretical model to choose the optimal cell size for the index to minimize query latency. We further examine a restricted version of the problem (RSKR) that limits the scope of the answer and propose efficient approximate algorithms. Finally, we examine how parallelism can improve performance by balancing the workload using a smart load slicing technique. Extensive experimental performance evaluation of the proposed methods using real Twitter datasets and crime report datasets, shows the efficiency of our optimizations and the accuracy of the proposed theoretical model.
PubDate: 2022-07-25

• Enhancing domain-aware multi-truth data fusion using copy-based source
authority and value similarity

Abstract: Abstract Data fusion, within the data integration pipeline, addresses the problem of discovering the true values of a data item when multiple sources provide different values for it. An important contribution to the solution of the problem can be given by assessing the quality of the involved sources and relying more on the values coming from trusted sources. State-of-the-art data fusion systems define source trustworthiness on the basis of the accuracy of the provided values and on the dependence on other sources, and recently it has been also recognized that the trustworthiness of the same source may vary with the domain of interest. In this paper we propose STORM, a novel domain-aware algorithm for data fusion designed for the multi-truth case, that is, when a data item can also have multiple true values. Like many other data-fusion techniques, STORM relies on Bayesian inference. However, differently from the other Bayesian approaches to the problem, it determines the trustworthiness of sources by taking into account their authority: Here, we define authoritative sources as those that have been copied by many other ones, assuming that, when source administrators decide to copy data from other sources, they choose the ones they perceive as the most reliable. To group together the values that have been recognized as variants representing the same real-world entity, STORM provides also a value-reconciliation step, thus reducing the possibility of making mistakes in the remaining part of the algorithm. The experimental results on multi-truth synthetic and real-world datasets show that STORM represents a solid step forward in data-fusion research.
PubDate: 2022-07-19

• ICS-GNN $$^+$$ + : lightweight interactive community search via graph
neural network

Abstract: Abstract Searching for a community containing a query node in an online social network enjoys wide applications like recommendation, team organization, etc. When applied to real-life networks, the existing approaches face two major limitations. First, they usually take two steps, i.e., crawling a large part of the network first and then finding the community next, but the entire network is usually too big and most of the data are not interesting to end users. Second, the existing methods utilize hand-crafted rules to measure community membership, while it is very difficult to define effective rules as the communities are flexible for different query nodes. This paper proposes an interactive community search method based on graph neural network (shortened by ICS-GNN $$^+$$ ) to locate the target community over a subgraph collected on the fly from an online network iteratively. In each iteration, we first build a candidate subgraph around the query node and labeled nodes. We then train a node classification model using GNN to determine whether every node belongs to the target community, which captures similarities between nodes by combining content and structural features seamlessly and flexibly under the guide of users’ labeling. Based on the probabilities inferred from the trained GNN, we introduce a k-sized Maximum-GNN-scores (shortened by kMG) community to describe the target community and design a method to locate the kMG community which will be evaluated by end users to acquire more feedback. Besides, various optimization strategies are proposed including an adaptive method to maintain the subgraph during iterations, combining ranking loss into the GNN model, generating node embedding enhanced by pseudo-labels from node clusters in the subgraph, and a greedy community searching method with benefit computed globally. We conduct the experiments on both offline and online real-life datasets, and demonstrate that ICS-GNN $$^+$$ can produce effective communities with low overhead in communication, computation, and user labeling.
PubDate: 2022-07-18

• Highly distributed and privacy-preserving queries on personal data
management systems

Abstract: Abstract Personal data management system (PDMS) solutions are flourishing, boosted by smart disclosure initiatives and new regulations. PDMSs allow users to easily store and manage data directly generated by their devices or resulting from their (digital) interactions. Users can then leverage the power of their PDMS to benefit from their personal data, for their own good and in the interest of the community. The PDMS paradigm thus brings exciting perspectives by unlocking novel usages, but also raises security issues. An effective approach, considered in several recent works, is to let the user data distributed on personal platforms, secured locally using hardware and/or software security mechanisms. This paper goes beyond the local security issues and addresses the important question of securely querying this massively distributed personal data. To this end, we propose DISPERS, a fully distributed PDMS peer-to-peer architecture. DISPERS allows users to securely and efficiently share and query their personal data, even in the presence of malicious nodes. We consider three increasingly powerful threat models and derive, for each, a security requirement that must be fulfilled to reach a lower-bound in terms of sensitive data leakage: (1) hidden communications, (2) random dispersion of data and (3) collaborative proofs. These requirements are incremental and, respectively, resist spied, leaking or corrupted nodes. We show that the expected security level can be guaranteed with near certainty and validate experimentally the efficiency of the proposed protocols, allowing for adjustable trade-off between the security level and its cost.
PubDate: 2022-07-07

• VolcanoML: speeding up end-to-end AutoML via scalable search space
decomposition

Abstract: Abstract End-to-end AutoML has attracted intensive interests from both academia and industry which automatically searches for ML pipelines in a space induced by feature engineering, algorithm/model selection, and hyper-parameter tuning. Existing AutoML systems, however, suffer from scalability issues when applying to application domains with large, high-dimensional search spaces. We present VolcanoML, a scalable and extensible framework that facilitates systematic exploration of large AutoML search spaces. VolcanoML introduces and implements basic building blocks, which decompose a large search space into smaller ones, and allows users to utilize these building blocks to compose an execution plan for the AutoML problem at hand. VolcanoML further supports a Volcano-style execution model—akin to the one supported by modern database systems—to execute the plan constructed. Our evaluation demonstrates that, not only does VolcanoML raise the level of expressiveness for search space decomposition in AutoML, it also leads to actual findings of decomposition strategies that are significantly more efficient than the ones employed by state-of-the-art AutoML systems such as auto-sklearn.
PubDate: 2022-07-01

• Efficient exploratory clustering analyses in large-scale exploration
processes

Abstract: Abstract Clustering is a fundamental primitive in manifold applications. In order to achieve valuable results in exploratory clustering analyses, parameters of the clustering algorithm have to be set appropriately, which is a tremendous pitfall. We observe multiple challenges for large-scale exploration processes. On the one hand, they require specific methods to efficiently explore large parameter search spaces. On the other hand, they often exhibit large runtimes, in particular when large datasets are analyzed using clustering algorithms with super-polynomial runtimes, which repeatedly need to be executed within exploratory clustering analyses. We address these challenges as follows: First, we present LOG-Means and show that it provides estimates for the number of clusters in sublinear time regarding the defined search space, i.e., provably requiring less executions of a clustering algorithm than existing methods. Second, we demonstrate how to exploit fundamental characteristics of exploratory clustering analyses in order to significantly accelerate the (repetitive) execution of clustering algorithms on large datasets. Third, we show how these challenges can be tackled at the same time. To the best of our knowledge, this is the first work which simultaneously addresses the above-mentioned challenges. In our comprehensive evaluation, we unveil that our proposed methods significantly outperform state-of-the-art methods, thus especially supporting novice analysts for exploratory clustering analyses in large-scale exploration processes.
PubDate: 2022-07-01

• Interactively discovering and ranking desired tuples by data exploration

Abstract: Abstract Data exploration—the problem of extracting knowledge from database even if we do not know exactly what we are looking for —is important for data discovery and analysis. However, precisely specifying SQL queries is not always practical, such as “finding and ranking off-road cars based on a combination of Price, Make, Model, Age, Mileage, etc”—not only due to the query complexity (e.g.,the queries may have many if-then-else, and, or and not logic), but also because the user typically does not have the knowledge of all data instances (and their variants). We propose DExPlorer, a system for interactive data exploration. From the user perspective, we propose a simple and user-friendly interface, which allows to: (1) confirm whether a tuple is desired or not, and (2) decide whether a tuple is more preferred than another. Behind the scenes, we jointly use multiple ML models to learn from the above two types of user feedback. Moreover, in order to effectively involve human-in-the-loop, we need to select a set of tuples for each user interaction so as to solicit feedback. Therefore, we devise question selection algorithms, which consider not only the estimated benefit of each tuple, but also the possible partial orders between any two suggested tuples. Experiments on real-world datasets show that DExPlorer outperforms existing approaches in effectiveness.
PubDate: 2022-07-01

• Fast, exact, and parallel-friendly outlier detection algorithms with
proximity graph in metric spaces

Abstract: Abstract In many fields, e.g., data mining and machine learning, distance-based outlier detection (DOD) is widely employed to remove noises and find abnormal phenomena, because DOD is unsupervised, can be employed in any metric spaces, and does not have any assumptions of data distributions. Nowadays, data mining and machine learning applications face the challenge of dealing with large datasets, which requires efficient DOD algorithms. We address the DOD problem with two different definitions. Our new idea, which solves the problems, is to exploit an in-memory proximity graph. For each problem, we propose a new algorithm that exploits a proximity graph and analyze an appropriate type of proximity graph for the algorithm. Our empirical study using real datasets confirms that our DOD algorithms are significantly faster than state-of-the-art ones.
PubDate: 2022-07-01

• Parallel mining of large maximal quasi-cliques

Abstract: Abstract Given a user-specified minimum degree threshold $$\gamma$$ , a $$\gamma$$ -quasi-clique is a subgraph where each vertex connects to at least $$\gamma$$ fraction of the other vertices. Quasi-clique is a natural definition for dense structures, so finding large and hence statistically significant quasi-cliques is useful in applications such as community detection in social networks and discovering significant biomolecule structures and pathways. However, mining maximal quasi-cliques is notoriously expensive, and even a recent algorithm for mining large maximal quasi-cliques is flawed and can lead to a lot of repeated searches. This paper proposes a parallel solution for mining maximal quasi-cliques that is able to fully utilize CPU cores. Our solution utilizes divide and conquer to decompose the workloads into independent tasks for parallel mining, and we addressed the problem of (i) drastic load imbalance among different tasks and (ii) difficulty in predicting the task running time and the time growth with task-subgraph size, by (a) using a timeout-based task decomposition strategy, and by (b) utilizing a priority task queue to schedule long-running tasks earlier for mining and decomposition to avoid stragglers. Unlike our conference version in PVLDB 2020 where the solution was built on a distributed graph mining framework called G-thinker, this paper targets a single-machine multi-core environment which is more accessible to an average end user. A general framework called T-thinker is developed to facilitate the programming of parallel programs for algorithms that adopt divide and conquer, including but not limited to our quasi-clique mining algorithm. Additionally, we consider the problem of directly mining large quasi-cliques from dense parts of a graph, where we identify the repeated search issue of a recent method and address it using a carefully designed concurrent trie data structure. Extensive experiments verify that our parallel solution scales well with the number of CPU cores, achieving 26.68 $$\times$$ runtime speedup when mining a graph with 3.77M vertices and 16.5M edges with 32 mining threads. Additionally, mining large quasi-cliques from dense parts can provide an additional speedup of up to 89.46 $$\times$$ .
PubDate: 2022-07-01

• A survey on semantic schema discovery

Abstract: Abstract More and more weakly structured, and irregular data sources are becoming available every day. The schema of these sources is useful for a number of tasks, such as query answering, exploration and summarization. However, although semantic web data might contain schema information, in many cases this is completely missing or partially defined. In this paper, we present a survey of the state of the art on schema information extraction approaches. We analyze and classify these approaches into three families: (1) approaches that exploit the implicit structure of the data, without assuming that some explicit statements on the schema are provided in the dataset; (2) approaches that use the explicit schema statements contained in the dataset to complement and enrich the schema, and (3) those that discover structural patterns contained in a dataset. We compare these studies in terms of their approach, advantages and limitations. Finally we discuss the problems that remain open.
PubDate: 2022-07-01

• Span-reachability querying in large temporal graphs

Abstract: Abstract Reachability is a fundamental problem in graph analysis. In applications such as social networks and collaboration networks, edges are always associated with timestamps. Most existing works on reachability queries in temporal graphs assume that two vertices are related if they are connected by a path with non-decreasing timestamps (time-respecting) of edges. This assumption fails to capture the relationship between entities involved in the same group or activity with no time-respecting path connecting them. In this paper, we define a new reachability model, called span-reachability, designed to relax the time order dependency and identify the relationship between entities in a given time period. We adopt the idea of two-hop cover and propose an index-based method to answer span-reachability queries. Several optimizations are also given to improve the efficiency of index construction and query processing. We conduct extensive experiments on eighteen real-world datasets to show the efficiency of our proposed solution.
PubDate: 2022-07-01

• Privacy-preserving worker allocation in crowdsourcing

Abstract: Abstract Crowdsourcing has been a prevalent way to obtain answers for tasks that need human intelligence. In general, a crowdsourcing platform is responsible for allocating workers to each received task, with high-quality workers in priority. However, the allocation results can in turn yield knowledge about workers’ quality. For example, those unallocated workers are supposed to be less-qualified. They can be upset if such information is known by the public, which is an invasion of their privacy. To alleviate such concerns, we study the privacy-preserving worker allocation problem in this paper, aiming to properly allocate the workers while protecting their privacy. We propose worker allocation methods with the property of differential privacy, which proceed by first computing weights for each potential allocation and then sampling according to the weights. The Markov Chain Monte Carlo-based method is shown in our experiments to improve over the trivial random allocation method by 18.9% in terms of worker quality on synthetic data. On the real data, it realizes differential privacy with less than 20% loss on quality even when $$\epsilon = \frac{1}{3}$$ .
PubDate: 2022-07-01

• Optimal price profile for influential nodes in online social networks

Abstract: Abstract Influential nodes with rich connections in online social networks (OSNs) are of great values to initiate marketing campaigns. However, the potential influence spread that can be generated by these influential nodes is hidden behind the structures of OSNs, which are often held by OSN providers and unavailable to advertisers for privacy concerns. A social advertising model known as influencer marketing is to have OSN providers offer and price candidate nodes for advertisers to purchase for seeding marketing campaigns. In this setting, a reasonable price profile for the candidate nodes should effectively reflect the expected influence gain they can bring in a marketing campaign. In this paper, we study the problem of pricing the influential nodes based on their expected influence spread to help advertisers select the initiators of marketing campaigns without the knowledge of OSN structures. We design a function characterizing the divergence between the price and the expected influence of the initiator sets. We formulate the problem to minimize the divergence and derive an optimal price profile. An advanced algorithm is developed to estimate the price profile with accuracy guarantees. Experiments with real OSN datasets show that our pricing algorithm can significantly outperform other baselines.
PubDate: 2022-07-01

• Correction to: BugDoc Iterative debugging and explanation of pipeline
executions

PubDate: 2022-06-13

• Detecting rumours with latency guarantees using massive streaming data

Abstract: Abstract Today’s social networks continuously generate massive streams of data, which provide a valuable starting point for the detection of rumours as soon as they start to propagate. However, rumour detection faces tight latency bounds, which cannot be met by contemporary algorithms, given the sheer volume of high-velocity streaming data emitted by social networks. Hence, in this paper, we argue for best-effort rumour detection that detects most rumours quickly rather than all rumours with a high delay. To this end, we combine techniques for efficient, graph-based matching of rumour patterns with effective load shedding that discards some of the input data while minimising the loss in accuracy. Experiments with large-scale real-world datasets illustrate the robustness of our approach in terms of runtime performance and detection accuracy under diverse streaming conditions.
PubDate: 2022-06-08

• Fast subgraph query processing and subgraph matching via static and
dynamic equivalences

Abstract: Abstract Subgraph query processing (also known as subgraph search) and subgraph matching are fundamental graph problems in many application domains. A lot of efforts have been made to develop practical solutions for these problems. Despite the efforts, existing algorithms showed limited running time and scalability in dealing with large and/or many graphs. In this paper, we propose a new subgraph search algorithm using equivalences of vertices in order to reduce search space: (1) static equivalence of vertices in a query graph that leads to an efficient matching order of the vertices and (2) dynamic equivalence of candidate vertices in a data graph, which enables us to capture and remove redundancies in search space. These techniques for subgraph search also lead to an improved algorithm for subgraph matching. Experiments show that our approach outperforms state-of-the-art subgraph search and subgraph matching algorithms by up to several orders of magnitude with respect to query processing time.
PubDate: 2022-06-07

• Unified route representation learning for multi-modal transportation
recommendation with spatiotemporal pre-training

Abstract: Abstract Multi-modal transportation recommendation aims to provide the most appropriate travel route with various transportation modes according to certain criteria. After analyzing large-scale navigation data, we find that route representations exhibit two patterns: spatio-temporal autocorrelations within transportation networks and the semantic coherence of route sequences. However, there are few studies that consider both patterns when developing multi-modal transportation systems. To this end, in this paper, we study multi-modal transportation recommendation with unified route representation learning by exploiting both spatio-temporal dependencies in transportation networks and the semantic coherence of historical routes. Specifically, we first transform the multi-modal transportation network into time-dependent multi-view transportation graphs and devise a graph-based contextual encoder to impute the missing traffic condition in transportation networks by leveraging various contextual factors. Then, we propose a hierarchical multi-task route representation learning (HMTRL) framework for recommendations, including (1) a spatiotemporal graph neural network module to capture the spatial and temporal autocorrelation, (2) a coherent-aware attentive route representation learning module to explicitly model route coherence from historical routes, and (3) a hierarchical multi-task learning module to differentiate route representations for different transport modes by incorporating multiple auxiliary tasks equipped in different network layers. Moreover, to improve the model generalization capability, we further propose spatiotemporal pre-training strategies to exploit rich self-supervision signals hidden in transportation networks and historical trajectories. Finally, extensive experimental results on two large-scale real-world datasets demonstrate the effectiveness of the proposed system against eight baselines.
PubDate: 2022-05-27
DOI: 10.1007/s00778-022-00748-y

JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762