Authors: Vladimir Vapnik; Rauf Izmailov Pages: 3 - 19 Abstract: The paper considers general machine learning models, where knowledge transfer is positioned as the main method to improve their convergence properties. Previous research focused on mechanisms of knowledge transfer in the context of the SVM framework; the paper shows that this mechanism is applicable to the neural network framework as well. The paper describes several general approaches for knowledge transfer in both SVM and ANN frameworks and illustrates algorithmic implementations and performance of one of these approaches for several synthetic examples. PubDate: 2017-10-01 DOI: 10.1007/s10472-017-9538-x Issue No: Vol. 81, No. 1-2 (2017)

Authors: Vladimir Vovk; Ilia Nouretdinov; Valentina Fedorova; Ivan Petej; Alex Gammerman Pages: 21 - 46 Abstract: We study optimal conformity measures for various criteria of efficiency of set-valued classification in an idealised setting. This leads to an important class of criteria of efficiency that we call probabilistic and argue for; it turns out that the most standard criteria of efficiency used in the literature on conformal prediction are not probabilistic unless the problem of classification is binary. We consider both unconditional and label-conditional conformal prediction. PubDate: 2017-10-01 DOI: 10.1007/s10472-017-9540-3 Issue No: Vol. 81, No. 1-2 (2017)

Authors: Vladimir Vovk; Dusko Pavlovic Pages: 47 - 70 Abstract: We construct universal prediction systems in the spirit of Popper’s falsifiability and Kolmogorov complexity and randomness. These prediction systems do not depend on any statistical assumptions (but under the IID assumption they dominate, to within the usual accuracy, conformal prediction). Our constructions give rise to a theory of algorithmic complexity and randomness of time containing analogues of several notions and results of the classical theory of Kolmogorov complexity and randomness. PubDate: 2017-10-01 DOI: 10.1007/s10472-017-9547-9 Issue No: Vol. 81, No. 1-2 (2017)

Authors: Anthony Bellotti Pages: 71 - 84 Abstract: Accurate property valuation is important for property purchasers and investors, and for mortgage providers to assess credit risk in the mortgage market. Automated valuation models (AVM) are being developed to provide cheap, objective valuations that allow dynamic updating of property values over the term of a mortgage. A useful feature of automated valuations is to provide a region of plausible price estimates for each individual property, rather than just a single point estimate. This would allow buyers and sellers to understand uncertainty on pricing individual properties and mortgage providers to include conservatism in their credit risk assessment. In this study, Conformal Predictors (CP) are used to provide such region predictions, whilst strictly controlling for predictive accuracy. We show how an AVM can be constructed using a CP, based on an underlying k-nearest neighbours approach. Time trend in property prices is dealt with by assuming a systematic effect over time and adjusting prices in the training data accordingly. The AVM is tested on a large data set of London property prices. Region predictions are shown to be reliable and the efficiency, i.e., region width, of property price predictions is investigated. In particular, a regression model is constructed to model the uncertainty in price prediction linked to property characteristics. PubDate: 2017-10-01 DOI: 10.1007/s10472-016-9534-6 Issue No: Vol. 81, No. 1-2 (2017)

Authors: Henrik Boström; Henrik Linusson; Tuve Löfström; Ulf Johansson Pages: 125 - 144 Abstract: The conformal prediction framework allows for specifying the probability of making incorrect predictions by a user-provided confidence level. In addition to a learning algorithm, the framework requires a real-valued function, called nonconformity measure, to be specified. The nonconformity measure does not affect the error rate, but the resulting efficiency, i.e., the size of output prediction regions, may vary substantially. A recent large-scale empirical evaluation of conformal regression approaches showed that using random forests as the learning algorithm together with a nonconformity measure based on out-of-bag errors normalized using a nearest-neighbor-based difficulty estimate resulted in state-of-the-art performance with respect to efficiency. However, the nearest-neighbor procedure incurs a significant computational cost. In this study, a more straightforward nonconformity measure is investigated, where the difficulty estimate employed for normalization is based on the variance of the predictions made by the trees in a forest. A large-scale empirical evaluation is presented, showing that both the nearest-neighbor-based and the variance-based measures significantly outperform a standard (non-normalized) nonconformity measure, while no significant difference in efficiency between the two normalized approaches is observed. The evaluation moreover shows that the computational cost of the variance-based measure is several orders of magnitude lower than when employing the nearest-neighbor-based nonconformity measure. The use of out-of-bag instances for calibration does, however, result in nonconformity scores that are distributed differently from those obtained from test instances, questioning the validity of the approach. An adjustment of the variance-based measure is presented, which is shown to be valid and also to have a significant positive effect on the efficiency. For conformal regression forests, the variance-based nonconformity measure is hence a computationally efficient and theoretically well-founded alternative to the nearest-neighbor procedure. PubDate: 2017-10-01 DOI: 10.1007/s10472-017-9539-9 Issue No: Vol. 81, No. 1-2 (2017)
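
The idea of a variance-normalized nonconformity measure described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a separate calibration set (split conformal prediction) rather than out-of-bag calibration, it takes the per-tree predictions as plain lists, and the function names and the smoothing constant `beta` are hypothetical choices made for illustration.

```python
import math
from statistics import mean, pvariance

def normalized_scores(y_cal, tree_preds_cal, beta=0.01):
    """Nonconformity score per calibration example: absolute error of the
    ensemble mean, divided by (variance of the tree predictions + beta).
    beta is an assumed smoothing constant that avoids division by zero."""
    return [
        abs(y - mean(preds)) / (pvariance(preds) + beta)
        for y, preds in zip(y_cal, tree_preds_cal)
    ]

def conformal_quantile(scores, epsilon):
    """The ceil((n + 1) * (1 - epsilon))-th smallest calibration score.
    Returns infinity when the calibration set is too small for epsilon."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - epsilon))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]

def predict_interval(tree_preds, q, beta=0.01):
    """Prediction interval centred on the ensemble mean; its half-width
    grows with the disagreement (variance) among the trees."""
    centre = mean(tree_preds)
    half_width = q * (pvariance(tree_preds) + beta)
    return centre - half_width, centre + half_width

# Toy usage with made-up numbers: three "trees", four calibration examples.
y_cal = [1.0, 2.0, 3.0, 4.0]
tree_preds_cal = [[0.9, 1.1, 1.0], [2.5, 1.5, 2.0], [2.0, 3.0, 2.5], [4.2, 3.8, 4.0]]
q = conformal_quantile(normalized_scores(y_cal, tree_preds_cal), epsilon=0.25)
low, high = predict_interval([1.8, 2.2, 2.0], q)
```

Examples on which the trees disagree get wider intervals, which is the intended effect of the normalization; the calibration quantile alone determines the error rate.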

Authors: Ernst Ahlberg; Oscar Hammar; Claus Bendtsen; Lars Carlsson Pages: 145 - 154 Abstract: We present two applications of conformal prediction relevant to drug discovery. The first application is around interpretation of predictions and the second one around the selection of compounds to progress in a drug discovery project setting. PubDate: 2017-10-01 DOI: 10.1007/s10472-017-9550-1 Issue No: Vol. 81, No. 1-2 (2017)

Authors: Claus Bendtsen; Andrea Degasperi; Ernst Ahlberg; Lars Carlsson Pages: 155 - 166 Abstract: The high cost of new medicines is hindering their development, and machine learning is therefore being used to avoid carrying out physical experiments. Here, we present a comparison between three different machine learning approaches in a classification setting where learning and prediction follow a teaching schedule to mimic the drug discovery process. The approaches are standard SVM classification, SVM-based multi-kernel classification and SVM classification based on learning using privileged information. Our two main conclusions are derived using experimental in-vitro data and compound structure descriptors. The in-vitro data is assumed to i) be completely absent in the standard SVM setting, ii) be available at all times when applying multi-kernel learning, or iii) be available as privileged information during training only. The structure descriptors are always available. One conclusion is that multi-kernel learning has higher odds than standard SVM of producing higher accuracy. The second is that learning using privileged information does not have higher odds than the standard SVM, although it may improve accuracy when the training sets are small. PubDate: 2017-10-01 DOI: 10.1007/s10472-017-9541-2 Issue No: Vol. 81, No. 1-2 (2017)

Authors: A. Zaytsev; E. Burnaev Pages: 167 - 186 Abstract: Engineers widely use the Gaussian process regression framework to construct surrogate models intended to replace computationally expensive physical models while exploring a design space. Thanks to Gaussian process properties, we can use both samples generated by a high fidelity function (an expensive and accurate representation of a physical phenomenon) and a low fidelity function (a cheap and coarse approximation of the same physical phenomenon) while constructing a surrogate model. However, if sample sizes exceed a few thousand points, the computational costs of Gaussian process regression become prohibitive both for learning and for prediction. We propose two approaches to circumvent this computational burden: one approach is based on the Nyström approximation of sample covariance matrices and another is based on an intelligent usage of a blackbox that can evaluate a low fidelity function on the fly at any point of a design space. We examine the performance of the proposed approaches using a number of artificial and real problems, including engineering optimization of a rotating disk shape. PubDate: 2017-10-01 DOI: 10.1007/s10472-017-9545-y Issue No: Vol. 81, No. 1-2 (2017)

Authors: Evgeny Burnaev; Ivan Panin; Bruno Sudret Pages: 187 - 207 Abstract: Global sensitivity analysis aims at quantifying the respective effects of input random variables (or combinations thereof) on the variance of a physical or mathematical model response. Among the abundant literature on sensitivity measures, Sobol’ indices have received much attention since they provide accurate information for most models. We consider the problem of selecting experimental design points for Sobol’ indices estimation. Based on the concept of D-optimality, we propose a method for constructing an adaptive design of experiments, effective for calculation of Sobol’ indices based on Polynomial Chaos Expansions. We provide a set of applications that demonstrate the efficiency of the proposed approach. PubDate: 2017-10-01 DOI: 10.1007/s10472-017-9542-1 Issue No: Vol. 81, No. 1-2 (2017)

Authors: Jean-François Baget; Laurent Garcia; Fabien Garreau; Claire Lefèvre; Swan Rocher; Igor Stéphan Abstract: This article deals with the combination of ontologies and rules by means of existential rules and answer set programming. Existential rules have been proposed for representing ontological knowledge, specifically in the context of Ontology-Based Data Access. Furthermore, Answer Set Programming (ASP) is an appropriate formalism to represent various problems issued from Artificial Intelligence and arising when available information is incomplete. The combination of the two formalisms requires extending existential rules with nonmonotonic negation and extending ASP with existential variables. In this article, we present the syntax and semantics of Existential Non Monotonic Rules (ENM-rules) using skolemization which join together the two frameworks. We formalize its links with standard ASP. Moreover, since entailment with existential rules is undecidable, we present conditions that ensure the termination of a breadth-first forward chaining algorithm known as the chase and we discuss extension of these results in the nonmonotonic case. PubDate: 2017-09-13 DOI: 10.1007/s10472-017-9563-9

Authors: Olivier Caelen Abstract: We propose a way to infer distributions of any performance indicator computed from the confusion matrix. This allows us to evaluate the variability of an indicator and to assess the importance of an observed difference between two performance indicators. We will assume that the values in a confusion matrix are observations coming from a multinomial distribution. Our method is based on a Bayesian approach in which the unknown parameters of the multinomial probability function themselves are assumed to be generated from a random vector. We will show that these unknown parameters follow a Dirichlet distribution. Thanks to the Bayesian approach, we also benefit from an elegant way of injecting prior knowledge into the distributions. Experiments are done on real and synthetic data sets and assess our method’s ability to construct accurate distributions. PubDate: 2017-09-11 DOI: 10.1007/s10472-017-9564-8
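
As a rough illustration of the Bayesian approach in the abstract above, the sketch below draws multinomial parameters from a Dirichlet posterior over the four cells of a binary confusion matrix and collects the resulting values of a performance indicator. It is a sketch under stated assumptions, not the paper's code: the uniform prior (`prior=1.0`), the choice of the F1 score as the indicator, the example counts and all function names are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """One Dirichlet draw, obtained by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def indicator_posterior(counts, indicator, prior=1.0, n_samples=5000, seed=0):
    """Posterior samples of an indicator, given confusion-matrix counts.
    With a Dirichlet(prior, ..., prior) prior and multinomial counts, the
    posterior over cell probabilities is Dirichlet(counts + prior)."""
    rng = random.Random(seed)
    alpha = [c + prior for c in counts]
    return [indicator(sample_dirichlet(alpha, rng)) for _ in range(n_samples)]

def f1_score(p):
    """F1 expressed in cell probabilities ordered as (TP, FP, FN, TN)."""
    tp, fp, fn, _tn = p
    return 2 * tp / (2 * tp + fp + fn)

# Made-up confusion matrix: TP=80, FP=10, FN=20, TN=890.
counts = (80, 10, 20, 890)
samples = sorted(indicator_posterior(counts, f1_score))
# Central 95% credible interval for F1, read off the sorted samples.
lo = samples[int(0.025 * len(samples))]
hi = samples[int(0.975 * len(samples))]
```

Two classifiers can be compared in the same way by sampling an indicator for each and examining the distribution of the difference.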

Authors: Hiromi Narimatsu; Hiroyuki Kasai Abstract: Sequential data modeling and analysis have become indispensable tools for analyzing sequential data, such as time-series data, because larger amounts of sensed event data have become available. These methods capture the sequential structure of data of interest, such as input-output relations and correlation among datasets. However, because most studies in this area are specialized or limited to their respective applications, rigorous requirement analysis of such models has not been undertaken from a general perspective. Therefore, we particularly examine the structure of sequential data, and extract the necessity of “state duration” and “state interval” of events for efficient and rich representation of sequential data. Specifically addressing the hidden semi-Markov model (HSMM) that represents such state duration inside a model, we attempt to add representational capability of a state interval of events onto HSMM. To this end, we propose two extended models: an interval state hidden semi-Markov model (IS-HSMM) to express the length of a state interval with a special state node designated as “interval state node”; and an interval length probability hidden semi-Markov model (ILP-HSMM) which represents the length of the state interval with a new probabilistic parameter “interval length probability.” Exhaustive simulations have revealed superior performance of the proposed models in comparison with HSMM. These proposed models are the first reported extensions of HMM to support state interval representation as well as state duration representation. PubDate: 2017-08-31 DOI: 10.1007/s10472-017-9561-y

Authors: Ignacio Montes; Sebastien Destercke Abstract: Within imprecise probability theory, the extreme points of convex probability sets have an important practical role (to perform inference on graphical models, to compute expectation bounds, …). This is especially true for sets presenting specific features that make them easy to manipulate in applications. This ease of use is the reason why extreme points of such models (probability intervals, possibility distributions, …) have been well studied. Yet, imprecise cumulative distributions (a.k.a. p-boxes) constitute an important exception, as the characterization of their extreme points remains to be studied. This is what we do in this paper, where we characterize the maximal number of extreme points of a p-box, give a family of p-boxes that attains this number and show an algorithm that computes the extreme points of a given p-box. To achieve all this, we also provide what we think to be a new characterization of extreme points of a belief function. PubDate: 2017-08-11 DOI: 10.1007/s10472-017-9562-x

Authors: Pavel Surynek Abstract: This paper deals with solving cooperative path finding (CPF) problems in a makespan-optimal way. A feasible solution to the CPF problem lies in the moving of mobile agents where each agent has unique initial and goal positions. The abstraction adopted in CPF assumes that agents are discrete units that move over an undirected graph by traversing its edges. We focus specifically on makespan-optimal solutions to the CPF problem where the task is to generate solutions that are as short as possible in terms of the total number of time steps required for all agents to reach their goal positions. We demonstrate that reducing CPF to propositional satisfiability (SAT) represents a viable way to obtain makespan-optimal solutions. Several ways of encoding CPFs into propositional formulae are proposed and evaluated both theoretically and experimentally. Encodings based on the log and direct representations of decision variables are compared. The evaluation indicates that SAT-based solutions to CPF outperform the makespan-optimal versions of search-based CPF solvers such as OD+ID, CBS, and ICTS in highly constrained scenarios (i.e., environments that are densely occupied by agents and where interactions among the agents are frequent). Moreover, the experiments clearly show that CPF encodings based on the direct representation of variables can be solved faster, although they are less space-efficient than log encodings. PubDate: 2017-08-02 DOI: 10.1007/s10472-017-9560-z

Authors: S. Zhou; E. N. Smirnov; G. Schoenmakers; R. Peeters Abstract: Instance transfer for classification aims at boosting generalization performance of classification models for a target domain by exploiting data from a relevant source domain. Most of the instance-transfer approaches assume that the source data is relevant to the target data for the complete set of features used to represent the data. This assumption fails if the target data and source data are relevant only for strict subsets of the input features, a situation we call “partially input-feature relevant”. In this case these approaches may result in sub-optimal classification models or even in a negative transfer. This paper proposes a new decision-tree approach to instance transfer when the source data are partially input-feature relevant to the target data. The approach selects input features for tree nodes using univariate transfer of source instances. The instance transfer is guided by a conformal test for source relevance estimation. Experimental results on real-world data sets demonstrate that the new decision-tree approach is capable of outperforming existing instance-transfer approaches, especially when the source data are partially input-feature relevant to the target data. PubDate: 2017-06-17 DOI: 10.1007/s10472-017-9554-x

Authors: Paolo Toccaceli; Ilia Nouretdinov; Alexander Gammerman Abstract: The paper presents an application of Conformal Predictors to a chemoinformatics problem of predicting the biological activities of chemical compounds. The paper addresses some specific challenges in this domain: a large number of compounds (training examples), high-dimensionality of feature space, sparseness and a strong class imbalance. A variant of conformal predictors called Inductive Mondrian Conformal Predictor is applied to deal with these challenges. Results are presented for several non-conformity measures extracted from underlying algorithms and different kernels. A number of performance measures are used in order to demonstrate the flexibility of Inductive Mondrian Conformal Predictors in dealing with such a complex set of data. This approach allowed us to identify the most likely active compounds for a given biological target and present them in a ranking order. PubDate: 2017-06-16 DOI: 10.1007/s10472-017-9556-8
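
The label-conditional (Mondrian) calibration mentioned in the abstract above can be sketched as follows. The sketch assumes that nonconformity scores have already been computed by some underlying algorithm; the function names, the labels and the example scores are hypothetical and do not come from the paper.

```python
def mondrian_p_value(cal_scores_by_label, label, test_score):
    """p-value of a candidate label, using only the calibration scores of
    examples whose true label equals that candidate (the Mondrian category)."""
    scores = cal_scores_by_label[label]
    at_least_as_strange = sum(1 for s in scores if s >= test_score)
    return (at_least_as_strange + 1) / (len(scores) + 1)

def prediction_set(cal_scores_by_label, test_scores_by_label, epsilon):
    """All labels whose p-value exceeds the significance level epsilon."""
    return {
        label
        for label, score in test_scores_by_label.items()
        if mondrian_p_value(cal_scores_by_label, label, score) > epsilon
    }

# Made-up calibration scores for an imbalanced two-class problem.
calibration = {
    "active": [0.9, 0.5, 0.1],
    "inactive": [0.2, 0.3, 0.4, 0.8],
}
# Nonconformity of one test compound under each hypothesised label.
test_scores = {"active": 0.5, "inactive": 0.9}
labels = prediction_set(calibration, test_scores, epsilon=0.25)
```

Because each label is calibrated against its own class only, the error rate is controlled per class, which is why this variant suits strongly imbalanced data. Sorting compounds by the p-value of the "active" label would give a ranking of the kind the abstract mentions.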

Authors: Alexander Kuleshov; Alexander Bernstein Abstract: Consider an unknown smooth function which maps high-dimensional inputs to multidimensional outputs and whose domain of definition is an unknown low-dimensional input manifold embedded in an ambient high-dimensional input space. Given a training dataset consisting of ‘input-output’ pairs, the regression on input manifold problem is to estimate the unknown function and its Jacobian matrix, as well as to estimate the input manifold. By transforming high-dimensional inputs into their low-dimensional features, the initial regression problem is reduced to a certain regression on feature space problem. The paper presents a new geometrically motivated method for solving both interrelated regression problems. PubDate: 2017-05-16 DOI: 10.1007/s10472-017-9551-0