Authors:Róisín Loughran; Alexandros Agapitos; Ahmed Kattan; Anthony Brabazon; Michael O’Neill Abstract: Abstract We present a study examining feature selection from high performing models evolved using genetic programming (GP) on the problem of automatic speaker verification (ASV). ASV is a highly unbalanced binary classification problem in which a given speaker must be verified against everyone else. We evolve classification models for 10 individual speakers using a variety of fitness functions and data sampling techniques and examine the generalisation of each model on a 1:9 unbalanced set. A significant difference between train and test performance is found which may indicate overfitting in the models. Using only the best generalising models, we examine two methods for selecting the most important features. We compare the performance of a number of tuned machine learning classifiers using the full 275 features and a reduced set of 20 features from both feature selection methods. Results show that using only the top 20 features found in high performing GP programs led to test classifications that are as good as, or better than, those obtained using all data in the majority of experiments undertaken. The classification accuracy between speakers varies considerably across all experiments showing that some speakers are easier to classify than others. This indicates that in such real-world classification problems, the content and quality of the original data has a very high influence on the quality of results obtainable. PubDate: 2017-01-11 DOI: 10.1007/s12065-016-0150-5

Authors:Hoai Bach Nguyen; Bing Xue; Peter Andreae Pages: 95 - 110 Abstract: Abstract In classification, feature selection is an important pre-processing step to simplify the dataset and improve the data representation quality, which makes classifiers become better, easier to train, and understand. Because of an ability to analyse non-linear interactions between features, mutual information has been widely applied to feature selection. Along with counting approaches, a traditional way to calculate mutual information, many mutual information estimations have been proposed to allow mutual information to work directly on continuous datasets. This work focuses on comparing the effect of counting approach and kernel density estimation (KDE) approach in feature selection using particle swarm optimisation as a search mechanism. The experimental results on 15 different datasets show that KDE can work well on both continuous and discrete datasets. In addition, feature subsets evolved by KDE achieves similar or better classification performance than the counting approach. Furthermore, the results on artificial datasets with various interactions show that KDE is able to capture correctly the interaction between features, in both relevance and redundancy, which can not be achieved by using the counting approach. PubDate: 2016-08-20 DOI: 10.1007/s12065-016-0143-4 Issue No:Vol. 9, No. 3 (2016)

Authors:Nafrizuan Mat Yahya; M. Osman Tokhi; Hyreil Anuar Kasdirin Pages: 1 - 20 Abstract: Abstract Bats sonar algorithm (BSA) as a swarm intelligence approach utilises the concept of echolocation of bats to find prey. However, the algorithm is unable to achieve good precision and fast convergence rate to the optimum solution. With this in mind, an adaptive bats sonar algorithm is introduced with new paradigms of real bats echolocation behaviour. The performance of the algorithm is validated through rigorous tests with several single objective optimisation benchmark test functions. The obtained results show that the proposed scheme outperforms the BSA in terms of accuracy and convergence speed and can be efficiently employed to solve engineering problems. PubDate: 2016-02-18 DOI: 10.1007/s12065-016-0134-5 Issue No:Vol. 9, No. 1-2 (2016)

Authors:Seyed Alireza Mohseni; Tony Wong; Vincent Duchaine Pages: 21 - 36 Abstract: Abstract As with other nature-inspired algorithms, the cuckoo optimization algorithm (COA) produces a population of candidate solutions to find the (near-) optimal solutions to a problem. In this paper, several modifications, including a dynamic mutation operator, are proposed for this algorithm. Design of experiments is employed to determine factors controlling the value of parameters and the target levels of those values to achieve desirable output. The efficiency of the modified COA algorithm is substantiated with the help of several optimization test problems. The results are then compared to other well-known algorithms such as PSO, DE and harmony search using a non-parametric statistical procedure. In order to analyze its effectiveness, the proposed modified COA is applied to a feature selection problem and spacecraft attitude control problem. PubDate: 2016-04-12 DOI: 10.1007/s12065-016-0135-4 Issue No:Vol. 9, No. 1-2 (2016)

Authors:Mohamad Farhan Mohamad Mohsin; Abdul Razak Hamdan; Azuraliza Abu Bakar Pages: 37 - 51 Abstract: Abstract Signal normalization is a part of signal formalization which is a vital data pre-processing constraint required for the functioning of the dendritic cell algorithm. In existing applications, most normalization algorithms are developed purposely for a specific application with drawing on human domain expertise and very few algorithms are designed for general problems. This makes it difficult for the inexperienced user to exploit existing approaches to another problem, particularly when the initial information about the problem is limited. Therefore, this study proposes a new signal normalization method for the dendritic cell algorithm that uses the statistical upper and lower cumulative sum so that the algorithm can be applied to general classification problems. In addition, a new method to calculate the anomaly threshold based on the average mature-contact antigen value is presented to suit the proposed algorithm. The proposed model is evaluated by applying it to eight universal classification datasets and assessing its performance according to four measurement metrics: detection rate, specificity, false alarm rate, and accuracy. Its performance is compared with that of the existing dendritic cell algorithm and four non-bio-inspired classifiers, namely, rough set, decision tree, naïve Bayes, and multilayer perceptron. The results show that the proposed model outperforms the existing model and the other classifiers as well as demonstrates a significant improvement in terms of specificity, false alarm rate, and accuracy for all datasets. This indicates that the proposed normalization approach can be applied to general classification problems and can improve detection performance. PubDate: 2016-04-20 DOI: 10.1007/s12065-016-0136-3 Issue No:Vol. 9, No. 1-2 (2016)

Authors:Anthony Brabazon; Piotr Lipinski; Philip Hamill Abstract: Abstract Trading on major financial markets is typically conducted via electronic order books whose state is visible to market participants in real-time. A significant research literature has emerged concerning order book evolution, focussing on characteristics of the order book such as the time series of trade prices, movements in the bid-ask spread and changes in the depth of the order book at each price point. The latter two items can be characterised as order book shape where the book is viewed as a histogram with the size of the bar at each price point corresponding to the volume of shares demanded or offered for sale at that price. Order book shape is of interest to market participants as it provides insight as to current, and potentially future, market liquidity. Questions such as what shapes are commonly observed in order books and whether order books transition between certain shape patterns over time are of evident interest from both a theoretical and practical standpoint. In this study, using high-frequency equity data from the London Stock Exchange, we apply an unsupervised clustering methodology to determine clusters of common order book shapes, and also attempt to assess the transition probabilities between these clusters. Findings indicate that order books for individual stocks display intraday seasonality, exhibit some common patterns, and that transitions between order book patterns over sequential time periods is not random. PubDate: 2016-11-17 DOI: 10.1007/s12065-016-0149-y

Authors:Omar Andres Carmona Cortes; Andrew Rau-Chaplin Abstract: Abstract The purpose of this paper is to revisit the Multiobjective Population-Based Incremental Learning method and show how its performance can be improved in the context of a real-world financial optimization problem . The proposed enhancements lead to both better performance and improvements in the quality of solutions, which can represent millions of dollars for the insurance company in terms of recoveries. Its performance was assessed in terms of runtime and speedup when parallelized. Also, metrics such as the average number of solutions, the average hypervolume, and coverage have been used in order to compare the Pareto frontiers obtained by both the original and enhanced methods. Results indicated that the proposed method is 22.1% faster, present more solutions in the average (better defining the Pareto frontier) and often generates solutions having larger hypervolumes. The method achieves a speedup of 15.7 on 16 cores of a dual socket Intel multi-core machine when solving a Reinsurance Contract Optimization problem involving 15 layers or sub-contracts . PubDate: 2016-10-24 DOI: 10.1007/s12065-016-0147-0

Authors:D. Lim; Y. S. Ong; A. Gupta; C. K. Goh; P. S. Dutta Abstract: Abstract As the field of evolutionary optimization continues to expand, it is becoming increasingly common to incorporate various machine learning approaches, such as clustering, classification, and regression models, to improve algorithmic efficiency. However, we note that although problem learning is popularly used in improving the ongoing optimization process, little effort is ever made in extracting re-usable domain knowledge. In other words, the acquired knowledge is seldom transferred and exploited for future design exercises. Focusing on evolutionary optimization, in this paper we investigate the concept of simultaneous problem learning and optimization inspired by the following notions: (1) that prior/dynamically acquired knowledge can enhance the effectiveness of evolutionary search, and (2) that evolution can be geared towards gathering crucial knowledge about the underlying problem. Taking benchmark functions as well as an engineering (process) design problem into consideration, we demonstrate the efficacy of a novel classifier-assisted constrained EA towards simultaneous evolutionary search and problem learning. PubDate: 2016-10-11 DOI: 10.1007/s12065-016-0146-1

Authors:Dietmar Maringer; Sebastian H. M. Deininger Abstract: Abstract Selecting and estimating parsimonious models is often desired, but hard to achieve. This is particularly true when models can potentially contain a very large number of parameters but data are scarce—as is the case for many macro-economic models in general and interest-rate models in particular. These models need to cater for a large number of potential relationships and dependencies, but are fitted on low-frequency data to focus on the bigger picture and long-term effects. To identify the ideal model and estimating it is then particularly demanding from an optimization perspective. In this paper, we suggest an evolutionary approach that considers model selection and estimation simultaneously. Numerical experiments with artificial data suggest that the approach is well suited for this type of problem. In an empirical application for short-term and long-term interest rates denominated in US dollar, euro and the Japanese yen, respectively, parsimonious model structures are identified that highlight the dependencies as well as spill-overs across maturities and currencies. PubDate: 2016-09-27 DOI: 10.1007/s12065-016-0145-2

Authors:Igor Deplano; Giovanni Squillero; Alberto Tonda Abstract: Abstract Predicting the market’s behavior to profit from trading stocks is far from trivial. Such a task becomes even harder when investors do not have large amounts of money available, and thus cannot influence this complex system in any way. Machine learning paradigms have been already applied to financial forecasting, but usually with no restrictions on the size of the investor’s budget. In this paper, we analyze an evolutionary portfolio optimizer for the management of limited budgets, dissecting each part of the framework, discussing in detail the issues and the motivations that led to the final choices. Expected returns are modeled resorting to artificial neural networks trained on past market data, and the portfolio composition is chosen by approximating the solution to a multi-objective constrained problem. An investment simulator is eventually used to measure the portfolio performance. The proposed approach is tested on real-world data from New York’s, Milan’s and Paris’ stock exchanges, exploiting data from June 2011 to May 2014 to train the framework, and data from June 2014 to July 2015 to validate it. Experimental results demonstrate that the presented tool is able to obtain a more than satisfying profit for the considered time frame. PubDate: 2016-09-02 DOI: 10.1007/s12065-016-0144-3

Authors:Cao Truong Tran; Mengjie Zhang; Peter Andreae; Bing Xue Abstract: Abstract Missing values are an unavoidable problem of many real-world datasets. Inadequate treatment of missing values may result in large errors on classification; thus, dealing well with missing values is essential for classification. Feature selection has been well known for improving classification, but it has been seldom used for improving classification with incomplete datasets. Moreover, some classifiers such as C4.5 are able to directly classify incomplete datasets, but they often generate more complex classifiers with larger classification errors. The purpose of this paper is to propose a wrapper-based feature selection method to improve the ability of a classifier able to classify incomplete datasets. In order to achieve the purpose, the feature selection method evaluates feature subsets using a classifier able to classify incomplete datasets. Empirical results on 14 datasets using particle swarm optimisation for searching feature subsets and C4.5 for evaluating the feature subsets in the feature selection method show that the wrapper-based feature selection is not only able to improve classification accuracy of the classifier, but also able to reduce the size of trees generated by the classifier. PubDate: 2016-08-09 DOI: 10.1007/s12065-016-0141-6

Authors:Mohammad Majid al-Rifaie; Tim Blackwell Abstract: Abstract This paper extends particle aggregate reconstruction technique (PART), a reconstruction algorithm for binary tomography based on the movement of particles. PART supposes that pixel values are particles, and that particles diffuse through the image, staying together in regions of uniform pixel value known as aggregates. In this work, a variation of this algorithm is proposed and a focus is placed on reducing the number of projections and whether this impacts the reconstruction of images. The algorithm is tested on three phantoms of varying sizes and numbers of forward projections and compared to filtered back projection, a random search algorithm and to SART, a standard algebraic reconstruction method. It is shown that the proposed algorithm outperforms the aforementioned algorithms on small numbers of projections. This potentially makes the algorithm attractive in scenarios where collecting less projection data are inevitable. PubDate: 2016-08-08 DOI: 10.1007/s12065-016-0140-7

Authors:Yih-Chun Cheng; Tom Hartmann; Pei-Yun Tsai; Martin Middendorf Abstract: Abstract A population based ant colony optimization algorithm (PACO) for the reconstruction of electrocardiogram (ECG) signals is proposed. Specifically, the PACO finds a subset of nonzero positions of a sparse wavelet domain ECG signal vector that is used for the reconstruction of the signal. A time window is used by the proposed PACO for fixing certain decisions of the ants during the run of the algorithm. The optimization behaviour of the PACO is compared with various algorithms from the literature for ECG signal reconstruction, and with two random search heuristics. Experimental results are presented for ECG signals from the MIT-BIT Arrhythmia database. The influence of several algorithmic parameters and of a local search procedure is evaluated. The results show that the proposed PACO algorithm reconstructs ECG signals with high accuracy. PubDate: 2016-07-30 DOI: 10.1007/s12065-016-0139-0

Authors:Shu-Heng Chen; Umberto Gostoli Abstract: Abstract In this paper, we carry out a sensitivity analysis for an agent-based model of the use of public resources as manifested by the El Farol Bar problem. An early study using the same model has shown that a good-society equilibrium, characterized by both economic efficiency and economic equality, can be achieved probabilistically by a von Neumann network, and can be achieved surely with the presence of some agents having social preferences, such as the inequity-averse preference or the ‘keeping-up-with-the-Joneses’ preference. In this study, we examine this fundamental result by exploring the inherent complexity of the model; specifically, we address the effect of the three key parameters related to size, namely, the network size, the neighborhood size, and the memory size. We find that social preferences still play an important role over all the sizes considered. Nonetheless, it is also found that when network size becomes large, the parameter, the bar capacity (the attendance threshold), may also play a determining role. PubDate: 2016-07-02 DOI: 10.1007/s12065-016-0138-1

Authors:Alhanoof Althnian; Arvin Agah Abstract: Abstract This paper presents an evolutionary approach that, given a performance goal, produces a communication strategy that can improve a multi-agent system’s performance with respect to the desired goal. The evolved strategy determines what, when, and to whom agents communicate. The proposed approach further enables tuning the trade-off between the performance goal and communication cost, to produce a strategy that achieves a good balance between the two objectives, according the system’s designer needs. Experiments are designed to evaluate the approach using the Wumpus World application domain, with variations of three factors: fitness parameters (including objectives’ weights and action and communication costs), fitness goal, and simulation environment. Results show that the system’s performance can be highly tuned by controlling communication, and that the presented approach has significant utilization in improving the performance with respect to the goal. PubDate: 2016-06-13 DOI: 10.1007/s12065-016-0137-2

Authors:Theodore W. Cornforth; Hod Lipson Pages: 149 - 164 Abstract: Abstract Natural and artificial dynamical systems in the real world often have dynamics at multiple time scales. Such dynamics can contribute substantially to the complexity of a dynamical system and increase the difficulty with which it can be analyzed. Although evolutionary algorithms have been proposed that are amenable to the automated modeling of dynamical systems, none have explicitly taken into account multiple time scales or leveraged the information about these dynamics that is inherent in experimental observations. We propose a hybrid approach to the design of models for multiple-time-scale dynamical systems that combines an evolutionary algorithm with other metaheuristics and conventional nonlinear regression. With only minimal human-supplied domain knowledge, the algorithm automates the process of analyzing raw experimental observations and creating an interpretable symbolic model of the system under study. We describe the algorithm in detail and demonstrate its applicability to a variety of both physical and simulated systems. In addition, we study the performance and scalability of the algorithm under different types of dynamics, varying levels of experimental noise, and other factors relevant to the practical application of the algorithm. PubDate: 2015-01-31 DOI: 10.1007/s12065-015-0126-x Issue No:Vol. 8, No. 4 (2015)

Authors:Mohammad-H. Tayarani-N.; Adam Prügel-Bennett Pages: 165 - 184 Abstract: Abstract The anatomy of the fitness landscape for the quadratic assignment problem is studied in this paper. We study the properties of both random problems, and real-world problems. Using auto-correlation as a measure for the landscape ruggedness, we study the landscape of the problems and show how this property is related to the problem matrices with which the problems are represented. Our main goal in this paper is to study new properties of the fitness landscape, which have not been studied before, and we believe are more capable of reflecting the problem difficulties. Using local search algorithm which exhaustively explore the plateaus and the local optima, we explore the landscape, store all the local optima we find, and study their properties. The properties we study include the time it takes for a local search algorithm to find local optima, the number of local optima, the probability of reaching the global optimum, the expected cost of the local optima around the global optimum and the basin of attraction of the global and local optima. We study the properties for problems of different sizes, and through extrapolations, we show how the properties change with the system size and why the problem becomes harder as the system size grows. In our study we show how the real-world problems are similar to, or different from the random problems. We also try to show what properties of the problem matrices make the landscape of the real problems be different from or similar to one another. PubDate: 2015-05-22 DOI: 10.1007/s12065-015-0132-z Issue No:Vol. 8, No. 4 (2015)

Authors:Syahaneim Marzukhi; Will N. Browne; Mengjie Zhang Pages: 185 - 201 Abstract: Abstract The Three-Cornered Coevolution Framework describes a method that is capable of addressing classification tasks through coevolution (coadaptive evolution) where three different agents (i.e. a generation agent and two classification agents) learn and adapt to the changes of the problems without human involvement. Here, artificial problems can be generated in concert with classification agents in order to provide insight into their relationships. Previous work on the Two-Cornered Coevolution Framework provided foundation for implementing the system that was able to set-up the problem’s difficulty appropriately while triggering the coevolutionary process. However, the triggering process was set manually without utilising the third agent as proposed in the original framework to perform this task. Previous work on the Three-Cornered Coevolution introduced the third agent (a new classification agent) to trigger the coevolutionary process within the system, where its functionality and effect on the system requires investigation. This paper details the implementation for this case; two classification agents that use different styles of learning techniques (e.g. supervised versus reinforcement learning techniques) is adapted in the classification agents to learn the various classification problems. Ultimately, Learning Classifier System (LCS) is chosen to be implemented in the participating agents. LCS has several potential characteristics, such as interpretability, generalization capability and variations in representation, that are suitable for the system. Experiments show that the Pittsburgh-style LCS with the adaptation of Tabu Search technique in S capable to autonomously adjust the problem’s difficulty and generate a wide range of problems for classification. The adaptation of A-PLUS to an ‘on-line’ system is successful implemented. Further, the classification agents (i.e. R and I) are able to solve the classification tasks where the classification performance are varied. The Three-Cornered Coevolution Framework offers a great potential for autonomous learning and provides useful insight into coevolution learning over the standard studies of pattern recognition. The system is capable of autonomously generating various problems, learning and providing insight into each learning system’s ability by determining the problem domains where they perform relatively well. This is in contrast to humans having to determine the problem domains. PubDate: 2015-11-18 DOI: 10.1007/s12065-015-0133-y Issue No:Vol. 8, No. 4 (2015)