Abstract: This article proposes a modified AHC (Agglomerative Hierarchical Clustering) algorithm that clusters string vectors, rather than numerical vectors, as an approach to text clustering. Two facts motivate this research: string-vector-based algorithms were applied successfully to text clustering in previous works, and a synergy effect between text clustering and word clustering is expected when the two are combined. In this research, we define an operation on string vectors called semantic similarity and modify the AHC algorithm to adopt the proposed similarity metric. The proposed AHC algorithm is empirically validated as the better approach for clustering texts in news articles and opinions. Further operations on string vectors need to be defined and characterized mathematically in order to modify more advanced machine learning algorithms. PubDate: 2020-01-13
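The abstract does not define the semantic similarity operation; as a minimal sketch, the agglomerative loop below uses a hypothetical stand-in similarity (fraction of positions holding the same word) and average linkage to merge "string vectors" (tuples of words) until a target number of clusters remains.

```python
def similarity(u, v):
    """Hypothetical stand-in for the paper's semantic similarity on string vectors."""
    return sum(a == b for a, b in zip(u, v)) / max(len(u), len(v))

def average_link(c1, c2):
    """Average pairwise similarity between two clusters."""
    return sum(similarity(u, v) for u in c1 for v in c2) / (len(c1) * len(c2))

def ahc(vectors, k):
    """Repeatedly merge the two most similar clusters until k clusters remain."""
    clusters = [[v] for v in vectors]
    while len(clusters) > k:
        i, j = max(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: average_link(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

docs = [("stock", "market", "rises"), ("stock", "market", "falls"),
        ("team", "wins", "match"), ("team", "loses", "match")]
clusters = ahc(docs, 2)
```

With this toy similarity, the two finance documents and the two sports documents end up in separate clusters.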

Abstract: Time series averaging is an important subroutine for several time series data mining tasks. The most successful approaches formulate time series averaging as an optimization problem based on the dynamic time warping (DTW) distance. The existence of an optimal solution, called a sample mean, has been an open problem for more than four decades. Its existence is a necessary prerequisite for formulating exact algorithms, deriving complexity results, and studying statistical consistency. In this article, we propose sufficient conditions for the existence of a sample mean. A key result for deriving the proposed sufficient conditions is the Reduction Theorem, which provides an upper bound on the minimum length of a sample mean. PubDate: 2020-01-10
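The DTW distance underlying the averaging problem is computed by a standard dynamic program; a minimal sketch with squared local costs (one common convention, which may differ from the paper's exact setup):

```python
def dtw(x, y):
    """DTW distance between two numeric sequences (squared local costs)."""
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # extend the cheapest of the three admissible warping steps
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

A sample mean is then a minimizer of the Fréchet function F(z) = Σᵢ dtw(z, xᵢ) over candidate series z, and it is the existence of such a minimizer that the article addresses.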

Abstract: State-of-the-art decision tree algorithms are top-down induction heuristics which greedily partition the attribute space by iteratively choosing the best split on an individual attribute. Despite their attractive runtime performance, simple examples, such as the XOR problem, show that these heuristics often fail to find the best classification rules when there are strong interactions between two or more attributes in the given datasets. In this context, we present a branch-and-bound-based decision tree algorithm to identify optimal bivariate axis-aligned splits according to a given impurity measure. In contrast to a univariate split, which can be found in linear time, such an optimal cross-split has to consider every combination of values for every possible selection of pairs of attributes, which leads to a combinatorial optimization problem that is quadratic in the number of values and attributes. To overcome this complexity, we use a branch and bound procedure, a well-known technique from combinatorial optimization, to divide the solution space into several sets and to detect the optimal cross-splits in a short amount of time. These cross-splits can either be used directly to construct quaternary decision trees, or they can be used to select only the better of the two individual splits. In the latter case, the outcome is a binary decision tree with a certain sense of foresight for correlated attributes. We test both of these variants on various datasets from the UCI Machine Learning Repository and show that cross-splits can consistently produce smaller decision trees than state-of-the-art methods with comparable accuracy. In some cases, our algorithm produces considerably more accurate trees owing to their ability to draw more elaborate decisions than univariate induction algorithms. PubDate: 2020-01-03
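For context, the univariate baseline that the cross-split generalizes is an exhaustive scan for the best axis-aligned threshold under an impurity measure. The abstract leaves the impurity measure generic; the sketch below uses the common Gini impurity as an illustrative choice (this is the linear-time baseline, not the paper's branch-and-bound cross-split).

```python
def gini(labels):
    """Gini impurity of a multiset of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_univariate_split(xs, ys):
    """Scan all thresholds on one attribute; return (threshold, weighted impurity)."""
    best = (None, float("inf"))
    n = len(xs)
    for t in sorted(set(xs))[:-1]:          # candidate thresholds between values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best
```

On an XOR-style dataset every such univariate split looks equally bad, which is exactly the failure mode the bivariate cross-split is designed to overcome.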

Abstract: This paper proposes an algorithm to reason on constraints expressed as polynomials with integer coefficients whose variables take values from finite subsets of the integers. The proposed algorithm assumes that an initial approximation of the domains of the variables is available in terms of a bounding box, and it recursively subdivides the box into disjoint boxes until a termination condition is met. The algorithm includes three termination conditions that allow using it for three related reasoning tasks: constraint satisfaction, enumeration of solutions, and hyper-arc consistency enforcement. The considered termination conditions are based on suitable lower and upper bounds for polynomial functions over boxes that are determined using new results proved in the paper. The algorithm is particularly appropriate for reasoning on high-degree polynomial constraints because the proposed method for determining lower and upper bounds can outperform alternative methods when high-degree polynomials in a moderate number of variables are considered. PubDate: 2019-12-05
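The subdivide-and-prune loop can be sketched for the constraint-satisfaction task in two variables. The paper derives sharper bounds; the naive interval-style bounds below are only a stand-in, and the dictionary encoding of polynomials is a hypothetical representation chosen for the sketch.

```python
def ipow(lo, hi, k):
    """Range of x**k for integer x in [lo, hi], integer k >= 0."""
    if k == 0:
        return (1, 1)
    a, b = lo ** k, hi ** k
    if k % 2 == 0 and lo < 0 < hi:
        return (0, max(a, b))
    return (min(a, b), max(a, b))

def bounds(poly, box):
    """Naive lower/upper bounds of poly over a box.
    poly: {(i, j): coef} for terms coef * x**i * y**j; box: ((xl, xh), (yl, yh))."""
    lo = hi = 0
    for (i, j), c in poly.items():
        xl, xh = ipow(*box[0], i)
        yl, yh = ipow(*box[1], j)
        prods = [c * a * b for a in (xl, xh) for b in (yl, yh)]
        lo += min(prods)
        hi += max(prods)
    return lo, hi

def satisfiable(poly, box):
    """Does poly(x, y) = 0 have an integer solution in the box?
    Recursively bisect, pruning boxes whose bounds exclude zero."""
    lo, hi = bounds(poly, box)
    if lo > 0 or hi < 0:
        return False                      # termination: box cannot contain a root
    (xl, xh), (yl, yh) = box
    if xl == xh and yl == yh:
        return sum(c * xl ** i * yl ** j for (i, j), c in poly.items()) == 0
    if xh - xl >= yh - yl:                # split the longer dimension
        m = (xl + xh) // 2
        return (satisfiable(poly, ((xl, m), (yl, yh)))
                or satisfiable(poly, ((m + 1, xh), (yl, yh))))
    m = (yl + yh) // 2
    return (satisfiable(poly, ((xl, xh), (yl, m)))
            or satisfiable(poly, ((xl, xh), (m + 1, yh))))

circle = {(2, 0): 1, (0, 2): 1, (0, 0): -25}   # x^2 + y^2 - 25 = 0
found = satisfiable(circle, ((0, 10), (0, 10)))
```

Enumeration and consistency enforcement reuse the same recursion with different termination conditions, as the abstract notes.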

Abstract: Recently, Mahloujifar and Mahmoody (Theory of Cryptography Conference’17) studied attacks against learning algorithms using a special case of Valiant’s malicious noise, called p-tampering, in which the adversary gets to change any training example with independent probability p but is limited to choosing only ‘adversarial’ examples with correct labels. They obtained p-tampering attacks that increase the error probability in the so-called ‘targeted’ poisoning model, in which the adversary’s goal is to increase the loss of the trained hypothesis on a particular test example. At the heart of their attack was an efficient algorithm to bias the expected value of any bounded real-output function through p-tampering. In this work, we present new biasing attacks for increasing the expected value of bounded real-valued functions. Our improved biasing attacks directly imply improved p-tampering attacks against learners in the targeted poisoning model. As a bonus, our attacks come with considerably simpler analysis. We also study the possibility of PAC learning under p-tampering attacks in the non-targeted (aka indiscriminate) setting, where the adversary’s goal is to increase the risk of the generated hypothesis (for a random test example). We show that PAC learning is possible under p-tampering poisoning attacks essentially whenever it is possible in the realizable setting without the attacks. We further show that PAC learning under ‘no-mistake’ adversarial noise is not possible if the adversary could choose the (still limited to only a p fraction of) tampered examples that she substitutes with adversarially chosen ones. Our formal model for such ‘bounded-budget’ tampering attackers is inspired by the notions of adaptive corruption in cryptography. PubDate: 2019-12-03

Abstract: Moser and Tardos (J. ACM 57(2), Article 11, 2010) gave an algorithmic proof of the lopsided Lovász local lemma (LLL) in the variable framework, where each of the undesirable events is assumed to depend on a subset of a collection of independent random variables. For the proof, they define a notion of lopsided dependency between the events suitable for this framework. In this work, we strengthen this notion, defining a novel directed notion of dependency, and prove the LLL for the corresponding graph. We show that this graph can be strictly sparser (and thus the sufficient condition for the LLL weaker) compared with graphs that correspond to other extant lopsided versions of dependency. Thus, in a sense, we address the problem “find other simple local conditions for the constraints (in the variable framework) that advantageously translate to some abstract lopsided condition” posed by Szegedy (2013). We also give an example where our notion of dependency graph gives better results than the classical Shearer lemma. Finally, we prove Shearer’s lemma for the dependency graph we define. For the proofs, we perform a direct probabilistic analysis that yields an exponentially small upper bound on the probability that the algorithm searching for the desired assignment to the variables fails to return a correct answer within n steps. In contrast, the method of proof that became known as the entropic method gives an estimate only of the expectation of the number of steps until the algorithm returns a correct answer, unless the probabilities are tinkered with. PubDate: 2019-12-02
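The Moser–Tardos algorithm in the variable framework is short enough to sketch for CNF formulas: while some clause is violated, resample exactly the independent random variables that clause depends on. Under an LLL-type condition this terminates quickly; the abstract's analysis bounds the failure probability within n steps directly.

```python
import random

def moser_tardos(n_vars, clauses, seed=0):
    """Moser-Tardos resampling for CNF satisfiability.
    clauses: list of clauses, each a list of nonzero ints (DIMACS-style literals)."""
    rng = random.Random(seed)
    # independent uniform assignment; index 0 is unused
    assign = [rng.random() < 0.5 for _ in range(n_vars + 1)]

    def violated(clause):
        # a clause is a bad event when every literal in it is false
        return all(assign[abs(l)] != (l > 0) for l in clause)

    while True:
        bad = [c for c in clauses if violated(c)]
        if not bad:
            return assign
        for l in bad[0]:                      # resample the bad event's variables
            assign[abs(l)] = rng.random() < 0.5

clauses = [[1, 2], [-1, 2], [1, -2]]          # unique solution: x1 = x2 = True
a = moser_tardos(2, clauses, seed=1)
```

The dependency notions discussed in the abstract govern which clause pairs count as dependent (here, those sharing a variable) and hence how strong an LLL condition is needed for this loop to succeed.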

Abstract: This paper is concerned with investigating the fundamental conditions on the locations of the sampled entries, i.e., the sampling pattern, for finite completability of a matrix that represents the union of several subspaces with given ranks. In contrast with the existing analysis on the Grassmannian manifold for conventional matrix completion, we propose a geometric analysis of the manifold structure of the union of several subspaces that incorporates all given rank constraints simultaneously. In order to obtain deterministic conditions on the sampling pattern, we characterize the algebraic independence of a set of polynomials defined based on the sampling pattern, which is closely related to finite completability. We also give a probabilistic condition in terms of the number of samples per column, i.e., the sampling probability, which leads to finite completability with high probability. Furthermore, using the proposed geometric analysis for finite completability, we characterize sufficient conditions on the sampling pattern that ensure there exists only one completion for the sampled data. PubDate: 2019-12-01

Abstract: In this paper, we provide an overview of the SAT+CAS method, which combines satisfiability checkers (SAT solvers) and computer algebra systems (CAS) to resolve combinatorial conjectures, and present new results on best matrices. The SAT+CAS method is a variant of the Davis–Putnam–Logemann–Loveland DPLL(T) architecture, where the T solver is replaced by a CAS. We describe how the SAT+CAS method has previously been used to resolve many open problems from graph theory, combinatorial design theory, and number theory, showing that the method has broad applications across a variety of fields. Additionally, we apply the method to construct the largest best matrices yet known and present new skew Hadamard matrices constructed from best matrices. We show that the best matrix conjecture (that best matrices exist in all orders of the form r^2 + r + 1), which was previously known to hold for r ≤ 6, also holds for r = 7. We also confirm the results of the exhaustive searches previously completed for r ≤ 6. PubDate: 2019-12-01

Abstract: Artificial neural networks are growing in number of applications and in complexity, which calls for minimizing the number of units in some practical implementations. A particular problem is the minimum number of units that a feed-forward neural network needs in its first layer. In order to study this problem, we define a family of classification problems following a continuity hypothesis, where inputs that are close to some set of points may share the same category. Given a set S of k-dimensional inputs, and a feed-forward neural network \(\mathcal{N}\) that classifies any input in S within a fixed error, we prove that \(\mathcal{N}\) requires \(\Theta(k)\) units in its first layer if \(\mathcal{N}\) can solve any instance of the given family of classification problems. Furthermore, this asymptotic result is optimal. PubDate: 2019-12-01

Abstract: By their very design, many robot control programs are non-terminating. This paper describes a situation calculus approach to expressing and proving properties of non-terminating programs expressed in Golog, a high-level logic programming language for modeling and implementing dynamical systems. Because in this approach actions and programs are represented in classical (second-order) logic, it is natural to express and prove properties of Golog programs, including non-terminating ones, in the very same logic. This approach to program proofs has the advantages of logical uniformity and the availability of classical proof theory. PubDate: 2019-11-18

Abstract: We consider the binary classification problem when data are large and subject to unknown but bounded uncertainties. We address the problem by formulating the nonlinear support vector machine training problem with robust optimization. To do so, we analyze and propose two bounding schemes for uncertainties associated with random approximate features in low-dimensional spaces. The proposed bound calculations are based on Random Fourier Features and the Nyström method. Numerical experiments are conducted to illustrate the benefit of the technique. We also emphasize the decomposable structure of the proposed robust nonlinear formulation, which allows the use of efficient stochastic approximation techniques when datasets are large. PubDate: 2019-11-15
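Random Fourier Features (Rahimi and Recht) are the low-dimensional random approximate features the bounding schemes apply to: a random map z with z(x)·z(y) ≈ exp(−‖x−y‖²/(2σ²)), the Gaussian kernel. A minimal sketch of the feature map alone (the paper's uncertainty bounds on top of it are not reproduced here):

```python
import numpy as np

def rff(X, D, sigma, seed=0):
    """Map rows of X to D random Fourier features approximating the Gaussian kernel."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / sigma, size=(d, D))   # samples from the kernel's spectral density
    b = rng.uniform(0.0, 2 * np.pi, size=D)         # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = np.array([[0.0, 0.0], [1.0, 1.0]])
Z = rff(X, D=5000, sigma=1.0, seed=0)
approx = Z[0] @ Z[1]                                 # kernel estimate from features
exact = np.exp(-np.sum((X[0] - X[1]) ** 2) / 2.0)    # true Gaussian kernel value
```

Training a linear SVM on Z then approximates the nonlinear kernel SVM, which is what makes the robust formulation decomposable and amenable to stochastic approximation.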

Abstract: This article presents a novel, nonlinear, data-driven signal processing method which can help neuroscience researchers visualize and understand complex dynamical patterns in both time and space. Specifically, we present applications of a Koopman operator approach for the eigendecomposition of electrophysiological signals into orthogonal, coherent components, and examine their associated spatiotemporal dynamics. This approach thus provides enhanced capabilities over conventional computational neuroscience tools restricted to analyzing signals in either the time or the space domain. This is achieved via machine learning and kernel methods for data-driven approximation of skew-product dynamical systems. The approximations successfully converge to theoretical values in the limit of long embedding windows. First, we describe the method; then, using electrocorticographic (ECoG) data from a mismatch negativity experiment, we extract time-separable frequencies without bandpass filtering or prior selection of wavelet features. Finally, we discuss in detail two of the extracted components, the Beta (~13 Hz) and high Gamma (~50 Hz) frequencies, and explore the spatiotemporal dynamics of high- and low-frequency components. PubDate: 2019-11-11
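The paper's kernel-based Koopman approximation is involved; a much simpler linear relative that conveys the same idea of reading dynamics off an operator's eigendecomposition is dynamic mode decomposition (DMD), which fits A ≈ Y·pinv(X) from snapshot pairs. This sketch is illustrative only and is not the article's method.

```python
import numpy as np

def dmd_eigs(snapshots):
    """Eigenvalues of the best-fit linear operator mapping each snapshot to the next.
    snapshots: (state_dim, n_steps) matrix of a time series."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    A = Y @ np.linalg.pinv(X)          # least-squares one-step operator
    return np.linalg.eigvals(A)

# A planar rotation by angle theta has eigenvalues exp(+/- i*theta);
# their angle encodes the oscillation frequency, as with Koopman eigenvalues.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x0 = np.array([1.0, 0.0])
traj = np.column_stack([np.linalg.matrix_power(R, k) @ x0 for k in range(20)])
eigs = dmd_eigs(traj)
```

Here the eigenvalue angles recover the rotation frequency exactly; extracting Beta and Gamma components from ECoG data is the analogous step in the article's far richer, nonlinear setting.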

Abstract: Graph transformation theory uses rules to perform a graph transformation. However, there is no general way to choose among different transformations when several of them are applicable. A way to make this choice is suggested here, based on comparing the values of implications that correspond to different transformation variants. The relationship between the topos of bundles and the set of graphs with the same vertices is introduced to incorporate logic into graph transformation theory. Thus, one can use the special type of implication and the truth-values set of such a topos to evaluate different variants of graph transformations. In this approach, the chosen variant conserves the maximal part of the initial graph with respect to the terminal one. The analysis of self-adaptive systems uses graph grammars. Self-adaptive systems autonomously adapt to changes both in user needs and in their operational environments, while still maintaining some desired properties. The suggested way of choosing graph transformation variants may be used to choose between different graph grammars when modeling such systems. This approach is illustrated on a model of business processes, resulting in the automated choice of a business process adaptation under the assumption that the process changes are minimal with respect to the terminal state. PubDate: 2019-11-02

Abstract: This paper is about the statics and dynamics of belief states that are represented by pairs consisting of an agent’s credences (represented by a subjective probability measure) and her categorical beliefs (represented by a set of possible worlds). Regarding the static side, we argue that the latter proposition should be coherent with respect to the probability measure and that its probability should reach a certain threshold value. On the dynamic side, we advocate Jeffrey conditionalisation as the principal mode of changing one’s belief state. This updating method fits the idea of the Lockean Thesis better than plain Bayesian conditionalisation, and it affords a flexible method for adding and withdrawing categorical beliefs. We show that it fails to satisfy the traditional principles of Inclusion and Preservation for belief revision and the principle of Recovery for belief withdrawals, as well as the Levi and Harper identities. We take this to be a problem for the latter principles rather than for the idea of coherent belief change. PubDate: 2019-11-01
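Jeffrey conditionalisation on a finite set of worlds is easy to state: given a partition {Eᵢ} and new credences qᵢ for its cells, the updated credence of each world w in Eᵢ is P′(w) = qᵢ · P(w) / P(Eᵢ). A minimal sketch:

```python
def jeffrey_update(P, partition, q):
    """Jeffrey conditionalisation over a finite set of worlds.
    P: {world: prob}; partition: list of sets of worlds; q: new cell credences."""
    P_new = {}
    for cell, qi in zip(partition, q):
        p_cell = sum(P[w] for w in cell)          # prior probability of the cell
        for w in cell:
            P_new[w] = qi * P[w] / p_cell         # rescale within the cell
    return P_new

# Uniform prior over four worlds; experience shifts credence in {a, b} to 0.7.
P = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
Pn = jeffrey_update(P, [{"a", "b"}, {"c", "d"}], [0.7, 0.3])
```

Plain Bayesian conditionalisation on a cell E is the special case q = (1, 0), which is why Jeffrey's rule is the more flexible update method the abstract advocates.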

Abstract: Values are at the heart of human decision-making. They are used to decide whether something or some state of affairs is good or not, and they are also used to address the moral dilemma of the right thing to do under given circumstances. Both uses are present in many everyday situations, from the design of a public policy to the negotiation of employee benefit packages. Both uses of values are especially relevant when one intends to design artificial intelligent systems, or to validate that they behave in a morally correct way. In real life, the choice of policy components or the agreement on a benefit package are processes that involve argumentation. Likewise, the design and deployment of value-driven artificial entities may be well served by embedding practical reasoning capabilities in these entities or by using argumentation for their design and certification processes. In this paper, we propose a formal framework to support the choice of actions of a value-driven agent and to arrange them into plans that reflect the agent’s preferences. The framework is based on defeasible argumentation. It presumes that agent values are partially ordered in a hierarchy that is used to resolve conflicts between incommensurable values. PubDate: 2019-11-01

Abstract: This work explores formal aspects of model repair, i.e., how to rationally modify Kripke models representing the behavior of a system in order to satisfy a desired property. We investigate the problem in the light of Alchourrón, Gärdenfors, and Makinson’s work on belief revision. We propose two AGM-style characterizations of model repair: one based on belief sets and the other based on structural changes. In the first characterization, we define a set of rationality postulates over formulas with a close correspondence to those in the classical belief revision theory. We show that the proposed set of postulates fully characterizes the expected rationality of modifications in the model repair problem. In the second characterization, we propose a new set of rationality postulates based on structural modifications on models. These postulates have a close correspondence to the classical approach of model repair, while preserving the same rationality of the first characterization. We provide two representation results and the connection between them. PubDate: 2019-11-01

Abstract: Three seemingly weak and plausible conditions on an operation of contraction on belief sets are shown to be logically incompatible: (1) there is at least one sentence that can be successfully removed by the operation, (2) both the original belief set and the outcome of the operation have finite representations, and (3) a non-tautologous sentence can be removed without loss of all its non-tautologous logical consequences. PubDate: 2019-11-01

Abstract: This is a survey of some recent results relating Dung-style semantics for different types of logical argumentation frameworks to several forms of reasoning with maximally consistent sets (MCS) of premises. The related formalisms are also examined with respect to some rationality postulates and carried over to corresponding proof systems for non-monotonic reasoning. PubDate: 2019-11-01