  Subjects -> STATISTICS (Total: 130 journals)
Showing 1 - 151 of 151 Journals sorted by number of followers
Review of Economics and Statistics     Hybrid Journal   (Followers: 189)
Statistics in Medicine     Hybrid Journal   (Followers: 140)
Journal of Econometrics     Hybrid Journal   (Followers: 83)
Journal of the American Statistical Association     Full-text available via subscription   (Followers: 76, SJR: 3.746, CiteScore: 2)
Advances in Data Analysis and Classification     Hybrid Journal   (Followers: 52)
Biometrics     Hybrid Journal   (Followers: 49)
Sociological Methods & Research     Hybrid Journal   (Followers: 47)
Journal of the Royal Statistical Society, Series B (Statistical Methodology)     Hybrid Journal   (Followers: 42)
Journal of Business & Economic Statistics     Full-text available via subscription   (Followers: 41, SJR: 3.664, CiteScore: 2)
Computational Statistics & Data Analysis     Hybrid Journal   (Followers: 37)
Journal of the Royal Statistical Society Series C (Applied Statistics)     Hybrid Journal   (Followers: 36)
Oxford Bulletin of Economics and Statistics     Hybrid Journal   (Followers: 35)
Journal of Risk and Uncertainty     Hybrid Journal   (Followers: 34)
Journal of the Royal Statistical Society, Series A (Statistics in Society)     Hybrid Journal   (Followers: 29)
Journal of Urbanism: International Research on Placemaking and Urban Sustainability     Hybrid Journal   (Followers: 28)
The American Statistician     Full-text available via subscription   (Followers: 25)
Statistical Methods in Medical Research     Hybrid Journal   (Followers: 23)
Journal of Computational & Graphical Statistics     Full-text available via subscription   (Followers: 21)
Journal of Forecasting     Hybrid Journal   (Followers: 21)
Journal of Applied Statistics     Hybrid Journal   (Followers: 20)
British Journal of Mathematical and Statistical Psychology     Full-text available via subscription   (Followers: 19)
Statistical Modelling     Hybrid Journal   (Followers: 18)
International Journal of Quality, Statistics, and Reliability     Open Access   (Followers: 18)
Journal of Statistical Software     Open Access   (Followers: 18, SJR: 13.802, CiteScore: 16)
Journal of Time Series Analysis     Hybrid Journal   (Followers: 17)
Journal of Biopharmaceutical Statistics     Hybrid Journal   (Followers: 17)
Computational Statistics     Hybrid Journal   (Followers: 16)
Risk Management     Hybrid Journal   (Followers: 16)
Decisions in Economics and Finance     Hybrid Journal   (Followers: 15)
Statistics and Computing     Hybrid Journal   (Followers: 14)
Demographic Research     Open Access   (Followers: 14)
Australian & New Zealand Journal of Statistics     Hybrid Journal   (Followers: 13)
Statistics & Probability Letters     Hybrid Journal   (Followers: 13)
Geneva Papers on Risk and Insurance - Issues and Practice     Hybrid Journal   (Followers: 13)
Journal of Statistical Physics     Hybrid Journal   (Followers: 12)
Structural and Multidisciplinary Optimization     Hybrid Journal   (Followers: 12)
Statistics: A Journal of Theoretical and Applied Statistics     Hybrid Journal   (Followers: 11)
International Statistical Review     Hybrid Journal   (Followers: 10)
The Canadian Journal of Statistics / La Revue Canadienne de Statistique     Hybrid Journal   (Followers: 10)
Communications in Statistics - Theory and Methods     Hybrid Journal   (Followers: 10)
Journal of Probability and Statistics     Open Access   (Followers: 10)
Advances in Complex Systems     Hybrid Journal   (Followers: 10)
Pharmaceutical Statistics     Hybrid Journal   (Followers: 9)
Scandinavian Journal of Statistics     Hybrid Journal   (Followers: 9)
Communications in Statistics - Simulation and Computation     Hybrid Journal   (Followers: 9)
Stata Journal     Full-text available via subscription   (Followers: 9)
Journal of Educational and Behavioral Statistics     Hybrid Journal   (Followers: 8)
Multivariate Behavioral Research     Hybrid Journal   (Followers: 8)
Teaching Statistics     Hybrid Journal   (Followers: 8)
Law, Probability and Risk     Hybrid Journal   (Followers: 8)
Fuzzy Optimization and Decision Making     Hybrid Journal   (Followers: 8)
Current Research in Biostatistics     Open Access   (Followers: 8)
Environmental and Ecological Statistics     Hybrid Journal   (Followers: 7)
Journal of Combinatorial Optimization     Hybrid Journal   (Followers: 7)
Journal of Global Optimization     Hybrid Journal   (Followers: 7)
Journal of Statistical Planning and Inference     Hybrid Journal   (Followers: 7)
Queueing Systems     Hybrid Journal   (Followers: 7)
Argumentation et analyse du discours     Open Access   (Followers: 7)
Handbook of Statistics     Full-text available via subscription   (Followers: 7)
Research Synthesis Methods     Hybrid Journal   (Followers: 7)
Asian Journal of Mathematics & Statistics     Open Access   (Followers: 7)
Biometrical Journal     Hybrid Journal   (Followers: 6)
Journal of Nonparametric Statistics     Hybrid Journal   (Followers: 6)
Lifetime Data Analysis     Hybrid Journal   (Followers: 6)
Significance     Hybrid Journal   (Followers: 6)
International Journal of Computational Economics and Econometrics     Hybrid Journal   (Followers: 6)
Journal of Mathematics and Statistics     Open Access   (Followers: 6)
Applied Categorical Structures     Hybrid Journal   (Followers: 5)
Engineering With Computers     Hybrid Journal   (Followers: 5)
Optimization Methods and Software     Hybrid Journal   (Followers: 5)
Statistical Methods and Applications     Hybrid Journal   (Followers: 5)
CHANCE     Hybrid Journal   (Followers: 5)
ESAIM: Probability and Statistics     Open Access   (Followers: 4)
Mathematical Methods of Statistics     Hybrid Journal   (Followers: 4)
Metrika     Hybrid Journal   (Followers: 4)
Statistical Papers     Hybrid Journal   (Followers: 4)
TEST     Hybrid Journal   (Followers: 3)
Journal of Algebraic Combinatorics     Hybrid Journal   (Followers: 3)
Journal of Theoretical Probability     Hybrid Journal   (Followers: 3)
Statistical Inference for Stochastic Processes     Hybrid Journal   (Followers: 3)
Monthly Statistics of International Trade - Statistiques mensuelles du commerce international     Full-text available via subscription   (Followers: 3)
Handbook of Numerical Analysis     Full-text available via subscription   (Followers: 3)
Sankhya A     Hybrid Journal   (Followers: 3)
Journal of Statistical and Econometric Methods     Open Access   (Followers: 3)
AStA Advances in Statistical Analysis     Hybrid Journal   (Followers: 2)
Extremes     Hybrid Journal   (Followers: 2)
Optimization Letters     Hybrid Journal   (Followers: 2)
Stochastic Models     Hybrid Journal   (Followers: 2)
Stochastics An International Journal of Probability and Stochastic Processes: formerly Stochastics and Stochastics Reports     Hybrid Journal   (Followers: 2)
IEA World Energy Statistics and Balances -     Full-text available via subscription   (Followers: 2)
Building Simulation     Hybrid Journal   (Followers: 2)
Technology Innovations in Statistics Education (TISE)     Open Access   (Followers: 2)
International Journal of Stochastic Analysis     Open Access   (Followers: 2)
Measurement Interdisciplinary Research and Perspectives     Hybrid Journal   (Followers: 1)
Statistica Neerlandica     Hybrid Journal   (Followers: 1)
Sequential Analysis: Design Methods and Applications     Hybrid Journal   (Followers: 1)
Wiley Interdisciplinary Reviews - Computational Statistics     Hybrid Journal   (Followers: 1)
Statistics and Economics     Open Access  
Review of Socionetwork Strategies     Hybrid Journal  
SourceOECD Measuring Globalisation Statistics - SourceOCDE Mesurer la mondialisation - Base de donnees statistiques     Full-text available via subscription  
Journal of the Korean Statistical Society     Hybrid Journal  

AStA Advances in Statistical Analysis
Journal Prestige (SJR): 0.548
Citation Impact (citeScore): 1
Number of Followers: 2  
 
  Hybrid Journal (can contain Open Access articles)
ISSN (Print) 1863-818X - ISSN (Online) 1863-8171
Published by Springer-Verlag  [2468 journals]
  • Zero-modified count time series modeling with an application to influenza cases

      Abstract: The past few decades have seen considerable interest in modeling time series of counts, with applications in many domains. Classical and Bayesian modeling have primarily focused on conditional Poisson sampling distributions at each time. There is very little research on modeling time series involving Zero-Modified (i.e., Zero Deflated or Inflated) distributions. This paper aims to fill this gap and develop models for count time series involving Zero-Modified distributions, which belong to the Power Series family and are suitable for time series exhibiting both zero-inflation and zero-deflation. A full Bayesian approach via the Hamiltonian Monte Carlo (HMC) technique enables accurate modeling and inference. The paper illustrates our approach using time series on the number of deaths from the influenza virus in the city of São Paulo, Brazil.
      PubDate: 2023-11-27
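
Illustrative note (not part of the abstract): a zero-modified count distribution can be sketched with the zero-modified Poisson. The parameterization below, with a single weight `omega` that inflates zeros when positive and deflates them when negative, is a common textbook form and an assumption here; the paper's own Power Series formulation and its time series dynamics are not reproduced. The function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Ordinary Poisson probability P(Y = k) with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zmp_lower_bound(lam: float) -> float:
    """Smallest admissible omega: the value at which P(Y = 0) reaches 0."""
    p0 = math.exp(-lam)
    return -p0 / (1.0 - p0)


def zmp_pmf(k: int, lam: float, omega: float) -> float:
    """Zero-modified Poisson pmf (assumed parameterization).

    omega > 0 inflates zeros, omega < 0 deflates them,
    omega = 0 gives back the ordinary Poisson.
    """
    if not (zmp_lower_bound(lam) <= omega <= 1.0):
        raise ValueError("omega outside the admissible range for this lam")
    base = (1.0 - omega) * poisson_pmf(k, lam)
    return omega + base if k == 0 else base
```

With `lam = 2`, a call such as `zmp_pmf(0, 2.0, 0.3)` gives more mass at zero than the Poisson, while `zmp_pmf(0, 2.0, -0.1)` gives less.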
       
  • Mixtures of generalized normal distributions and EGARCH models to analyse returns and volatility of ESG and traditional investments

      Abstract: Environmental, social and governance (ESG) criteria are increasingly integrated into the investment process to contribute to overcoming global sustainability challenges. Focusing on the reaction to turmoil periods, this work analyses returns and volatility of several ESG indices and makes a comparison with their traditional counterparts from 2016 to 2022. These indices comprise the following markets: Global, the US, Europe and emerging markets. Firstly, the two-component mixture of generalized normal distribution was exploited to objectively detect financial market turmoil periods with the Naïve Bayes’ classifier. Secondly, the EGARCH-in-mean model with exogenous dummy variables was applied to capture the turmoil period impact. Results show that returns and volatility are both affected by turmoil periods. The return–risk performance differs by index type and market: the European ESG index is less volatile than its traditional market benchmark, while in the other markets, the estimated volatility is approximately the same. Moreover, ESG and non-ESG indices differ in terms of the impact of turmoil periods, risk premium and leverage effect.
      PubDate: 2023-11-18
       
  • Mixture of experts distributional regression: implementation using robust estimation with adaptive first-order methods

      Abstract: In this work, we propose an efficient implementation of mixtures of experts distributional regression models which exploits robust estimation by using stochastic first-order optimization techniques with adaptive learning rate schedulers. We take advantage of the flexibility and scalability of neural network software and implement the proposed framework in mixdistreg, an R software package that allows for the definition of mixtures of many different families, estimation in high-dimensional and large sample size settings and robust optimization based on TensorFlow. Numerical experiments with simulated and real-world data applications show that optimization is as reliable as estimation via classical approaches in many different settings and that results may be obtained for complicated scenarios where classical approaches consistently fail.
      PubDate: 2023-11-15
       
  • A Bayesian approach to modeling topic-metadata relationships

      Abstract: The objective of advanced topic modeling is not only to explore latent topical structures, but also to estimate relationships between the discovered topics and theoretically relevant metadata. Methods used to estimate such relationships must take into account that the topical structure is not directly observed, but is instead itself estimated in an unsupervised fashion, usually by common topic models. A frequently used procedure to achieve this is the method of composition, a Monte Carlo sampling technique performing multiple repeated linear regressions of sampled topic proportions on metadata covariates. In this paper, we propose two modifications of this approach: First, we substantially refine the existing implementation of the method of composition from the R package stm by replacing linear regression with the more appropriate Beta regression. Second, we provide a fundamental enhancement of the entire estimation framework by substituting the current blending of frequentist and Bayesian methods with a fully Bayesian approach. This allows for a more appropriate quantification of uncertainty. We illustrate our improved methodology by investigating relationships between Twitter posts by German parliamentarians and different metadata covariates related to their electoral districts, using the structural topic model to estimate topic proportions.
      PubDate: 2023-11-03
       
  • GPS data on tourists: a spatial analysis on road networks

      Abstract: This paper proposes a spatial point process model on a linear network to analyse cruise passengers’ stop activities. It identifies and models tourists’ stop intensity at the destination as a function of their main determinants. For this purpose, we consider data collected on cruise passengers through the integration of traditional questionnaire-based survey methods and GPS tracking data in two cities, namely Palermo (Italy) and Dubrovnik (Croatia). Firstly, the density-based spatial clustering of applications with noise algorithm is applied to identify stop locations from GPS tracking data. The influence of individual-related variables and itinerary-related characteristics is considered within a framework of a Gibbs point process model. The proposed model describes spatial stop intensity at the destination, accounting for the geometry of the underlying road network, individual-related variables, contextual-level information, and the spatial interaction amongst stop points. The analysis succeeds in quantifying the influence of both individual-related variables and trip-related characteristics on stop intensity. An interaction parameter allows for measuring the degree of dependence amongst cruise passengers in stop location decisions.
      PubDate: 2023-11-03
       
  • Conditional sum of squares estimation of k-factor GARMA models

      Abstract: We analyze issues related to estimation and inference for the constrained sum of squares estimator (CSS) of the k-factor Gegenbauer autoregressive moving average (GARMA) model. We present theoretical results for the estimator and show that the parameters that determine the cycle lengths are asymptotically independent, converging at rate T, the sample size, for finite cycles. The remaining parameters lack independence and converge at the standard rate. Analogous with existing literature, some challenges exist for testing the hypothesis of non-cyclical long memory, since the associated parameter lies on the boundary of the parameter space. We present simulation results to explore small sample properties of the estimator, which support most distributional results, while also highlighting areas that merit additional exploration. We demonstrate the applicability of the theory and estimator with an application to IBM trading volume.
      PubDate: 2023-10-31
       
  • Measures of interrater agreement for quantitative data

      Abstract: In this paper measures of interrater absolute agreement for quantitative measurements based on the standard deviation are proposed. Such indices make it possible (i) to overcome the limits affecting the intraclass correlation index and (ii) to measure the interrater agreement on single targets. Estimators of the proposed measures are introduced and their sampling properties are investigated for normal and non-normal data. Simulated data are employed to demonstrate the accuracy and practical utility of the new indices for assessing agreement. Finally, an application to assess the consistency of measurements performed by radiologists evaluating tumor size of lung cancer is presented.
      PubDate: 2023-10-10
       
  • Calibrated imputation for multivariate categorical data

      Abstract: Non-response is a major problem for anyone collecting and processing data. A commonly used technique to deal with missing data is imputation, where missing values are estimated and filled into the dataset. Imputation can become challenging if the variable to be imputed has to comply with a known total. Even more challenging is the case where several variables in the same dataset need to be imputed and, in addition to known totals, logical restrictions between variables have to be satisfied. In our paper, we develop an approach for a broad class of imputation methods for multivariate categorical data such that previously published totals are preserved while logical restrictions on the data are satisfied. The developed approach can be used in combination with any imputation model that estimates imputation probabilities, i.e. the probability that imputation of a certain category for a variable in a certain unit leads to the correct value for this variable and unit.
      PubDate: 2023-10-05
       
  • Editorial


      PubDate: 2023-09-15
      DOI: 10.1007/s10182-023-00480-0
       
  • Hierarchical disjoint principal component analysis

      Abstract: Dimension reduction, by means of Principal Component Analysis (PCA), is often employed to obtain a reduced set of components preserving the largest possible part of the total variance of the observed variables. Several methodologies have been proposed either to improve the interpretation of PCA results (e.g., by means of orthogonal, oblique rotations, shrinkage methods), or to model oblique components or factors with a hierarchical structure, such as in Bi-factor and High-Order Factor analyses. In this paper, we propose a new methodology, called Hierarchical Disjoint Principal Component Analysis (HierDPCA), that aims at building a hierarchy of disjoint principal components of maximum variance associated with disjoint groups of observed variables, from Q up to a unique, general one. HierDPCA also allows choosing the type of the relationship among disjoint principal components of two sequential levels, from the lowest upwards, by testing the component correlation per level and changing from a reflective to a formative approach when this correlation turns out to be not statistically significant. The methodology is formulated in a semi-parametric least-squares framework and a coordinate descent algorithm is proposed to estimate the model parameters. A simulation study and two real applications are illustrated to highlight the empirical properties of the proposed methodology.
      PubDate: 2023-09-01
       
  • Tests of stochastic dominance with repeated measurements data

      Abstract: The paper explores a testing problem which involves four hypotheses, that is, based on observations of two random variables X and Y, we wish to discriminate between four possibilities: identical survival functions, stochastic dominance of X over Y, stochastic dominance of Y over X, or crossing survival functions. Four-decision testing procedures for repeated measurements data are proposed. The tests are based on a permutation approach and do not rely on distributional assumptions. One-sided versions of the Cramér–von Mises, Anderson–Darling, and Kolmogorov–Smirnov statistics are utilized. The consistency of the tests is proven. A simulation study shows good power properties and control of false-detection errors. The suggested tests are applied to data from a psychophysical experiment.
      PubDate: 2023-09-01
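
Illustrative note (not part of the abstract): one of the ingredients named above, a one-sided Kolmogorov–Smirnov statistic, can be sketched as the largest amount by which one empirical distribution function lies above the other. This is only the generic two-sample statistic; the paper's four-decision rule and its permutation scheme for repeated measurements are not reproduced, and the function names are hypothetical.

```python
def ecdf(sample, t):
    """Empirical distribution function of `sample` evaluated at t."""
    return sum(1 for v in sample if v <= t) / len(sample)


def one_sided_ks(x, y):
    """sup_t [F_x(t) - F_y(t)] over the pooled observed values.

    A large value means F_x lies well above F_y somewhere, i.e. the
    x values tend to be smaller than the y values. The result is never
    negative because both functions equal 1 at the largest pooled value.
    """
    grid = sorted(set(x) | set(y))
    return max(ecdf(x, t) - ecdf(y, t) for t in grid)
```

Computing both `one_sided_ks(x, y)` and `one_sided_ks(y, x)` gives the two directional statistics; when both are clearly positive, the empirical distribution functions cross.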
       
  • Sieve bootstrapping the memory parameter in long-range dependent stationary functional time series

      Abstract: We consider a sieve bootstrap procedure to quantify the estimation uncertainty of long-memory parameters in stationary functional time series. We use a semiparametric local Whittle estimator to estimate the long-memory parameter. In the local Whittle estimator, discrete Fourier transform and periodogram are constructed from the first set of principal component scores via a functional principal component analysis. The sieve bootstrap procedure uses a general vector autoregressive representation of the estimated principal component scores. It generates bootstrap replicates that adequately mimic the dependence structure of the underlying stationary process. We first compute the estimated first set of principal component scores for each bootstrap replicate and then apply the semiparametric local Whittle estimator to estimate the memory parameter. By taking quantiles of the estimated memory parameters from these bootstrap replicates, we can nonparametrically construct confidence intervals of the long-memory parameter. As measured by coverage probability differences between the empirical and nominal coverage probabilities at three levels of significance, we demonstrate the advantage of using the sieve bootstrap compared to the asymptotic confidence intervals based on normality.
      PubDate: 2023-09-01
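
Illustrative note (not part of the abstract): the last step described above, turning bootstrap replicates of an estimate into a confidence interval by taking quantiles, can be sketched as a percentile interval. The sieve bootstrap and the local Whittle estimator themselves are not reproduced; the replicates are assumed to be given, the linear-interpolation quantile rule is one common convention, and the function names are hypothetical.

```python
def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)          # fractional index into the sorted list
    lo = int(pos)                   # floor, since pos is non-negative
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def percentile_interval(replicates, level=0.95):
    """Two-sided percentile interval from bootstrap replicates."""
    alpha = 1.0 - level
    return quantile(replicates, alpha / 2), quantile(replicates, 1 - alpha / 2)
```

For example, `percentile_interval(d_hat_replicates, level=0.95)` would return the lower and upper bounds for a hypothetical list `d_hat_replicates` of bootstrap memory-parameter estimates.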
       
  • Distributional properties of continuous time processes: from CIR to Bates

      Abstract: In this paper, we compute closed-form expressions of moments and comoments for the CIR process which allows us to provide a new construction of the transition probability density based on a moment argument that differs from the historic approach. For Bates’ model with stochastic volatility and jumps, we show that finite difference approximations of higher moments such as the skewness and the kurtosis are unstable and, as a remedy, provide exact analytic formulas for log-returns. Our approach does not assume a constant mean for log-price differentials but correctly incorporates volatility resulting from Ito’s lemma. We also provide R, MATLAB, and Mathematica modules with exact implementations of the theoretical conditional and unconditional moments. These modules should prove useful for empirical research.
      PubDate: 2023-09-01
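
Illustrative note (not part of the abstract): the simplest closed-form moments of the CIR process, its conditional mean and variance, can be sketched as below. These are the standard textbook expressions for dX = kappa (theta - X) dt + sigma sqrt(X) dW, not the paper's higher-order moment or comoment formulas and not its R, MATLAB, or Mathematica modules; the function and parameter names are assumptions.

```python
import math


def cir_conditional_mean(x0, kappa, theta, t):
    """E[X_t | X_0 = x0]: exponential reversion from x0 toward theta."""
    decay = math.exp(-kappa * t)
    return x0 * decay + theta * (1.0 - decay)


def cir_conditional_variance(x0, kappa, theta, sigma, t):
    """Var[X_t | X_0 = x0] for the CIR process."""
    decay = math.exp(-kappa * t)
    transient = x0 * (sigma**2 / kappa) * (decay - decay**2)
    stationary = theta * sigma**2 / (2.0 * kappa) * (1.0 - decay) ** 2
    return transient + stationary
```

As t grows, the mean tends to theta and the variance to theta * sigma**2 / (2 * kappa), the moments of the stationary distribution.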
       
  • On dealing with the unknown population minimum in parametric inference

      Abstract: A myriad of physical, biological and other phenomena are better modeled with semi-infinite distribution families, in which case not knowing the population minimum becomes a hassle when performing parametric inference. Ad hoc methods to deal with this problem exist, but are suboptimal and sometimes unfeasible. Besides, having the statistician handcraft solutions on a case-by-case basis is counterproductive. In this paper, we propose a framework under which the issue can be analyzed, and perform an extensive search in the literature for methods that could be used to solve the aforementioned problem; we also propose a method of our own. Simulation experiments were then performed to compare some methods from the literature and our proposal. We found that the straightforward method, which is to infer the population minimum by maximum likelihood, has severe difficulty in giving a good estimate for the population minimum, but manages to achieve very good inferred models. The other methods, including our proposal, involve estimating the population minimum, and we found that our method is superior to the other methods of this kind, considering the distributions simulated, followed very closely by the endpoint estimator by Alves et al. (Stat Sin 24(4):1811–1835, 2014). Although these two give much more accurate estimates for the population minimum, the straightforward method also displays some advantages, so choosing between these three methods will depend on the problem domain.
      PubDate: 2023-09-01
       
  • Group sparse recovery via group square-root elastic net and the iterative multivariate thresholding-based algorithm

      Abstract: In this work, we propose a novel group selection method called Group Square-Root Elastic Net. It is based on square-root regularization with a group elastic net penalty, i.e., an \(\ell_{2,1}+\ell_2\) penalty. As a type of square-root-based procedure, one distinct feature is that the estimator is independent of the unknown noise level \(\sigma\), which is non-trivial to estimate under the high-dimensional setting, especially when \(p\gg n\). In many applications, the estimator is expected to be sparse, not in an irregular way, but rather in a structured manner. It makes the proposed method very attractive to tackle both high-dimensionality and structured sparsity. We study the correct subset recovery under a Group Elastic Net Irrepresentable Condition. Both the slow rate bounds and fast rate bounds are established, the latter under the Restricted Eigenvalue assumption and Gaussian noise assumption. For implementation, a fast algorithm based on the scaled multivariate thresholding-based iterative selection idea is introduced with proved convergence. A comparative study examines the superiority of our approach against alternatives.
      PubDate: 2023-09-01
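
Illustrative note (not part of the abstract): group-sparse estimators of this kind are typically built on a multivariate soft-thresholding step, which shrinks a whole coefficient group toward zero and sets it exactly to zero when its Euclidean norm falls below the threshold. The sketch below shows only that generic operator; the paper's scaled variant, the square-root loss, and the elastic net term are not reproduced, and the function name is hypothetical.

```python
import math


def multivariate_soft_threshold(group, lam):
    """Shrink the vector `group` by lam in Euclidean norm.

    Returns group * max(0, 1 - lam / ||group||_2), so a group whose
    norm is at most lam is set to zero as a whole.
    """
    norm = math.sqrt(sum(v * v for v in group))
    if norm <= lam:
        return [0.0 for _ in group]
    scale = 1.0 - lam / norm
    return [scale * v for v in group]
```

Applying this operator group by group inside an iterative update is what produces structured sparsity: coefficients are kept or dropped by group rather than individually.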
       
  • Hierarchical clustering and matrix completion for the reconstruction of world input–output tables

      Abstract: Multi-regional input–output (I/O) matrices provide the networks of within- and cross-country economic relations. In the context of I/O analysis, the methodology adopted by national statistical offices in data collection raises the issue of obtaining reliable data in a timely fashion and it makes the reconstruction of (parts of) the I/O matrices of particular interest. In this work, we propose a method combining hierarchical clustering and matrix completion with a LASSO-like nuclear norm penalty, to predict missing entries of a partially unknown I/O matrix. Through analyses based on both real-world and synthetic I/O matrices, we study the effectiveness of the proposed method to predict missing values from both previous years' data and current data related to countries similar to the one for which current data are obscured. To show the usefulness of our method, an application based on World Input–Output Database (WIOD) tables, which are an example of industry-by-industry I/O tables, is provided. Strong similarities in structure between WIOD and other I/O tables are also found, which make the proposed approach easily generalizable to them.
      PubDate: 2023-09-01
       
  • Debiasing SHAP scores in random forests

      Abstract: Black box machine learning models are currently being used for high-stakes decision making in various parts of society such as healthcare and criminal justice. While tree-based ensemble methods such as random forests typically outperform deep learning models on tabular data sets, their built-in variable importance algorithms are known to be strongly biased toward high-entropy features. It was recently shown that the increasingly popular SHAP (SHapley Additive exPlanations) values suffer from a similar bias. We propose debiased or "shrunk" SHAP scores based on sample splitting which additionally enable the detection of overfitting issues at the feature level.
      PubDate: 2023-08-22
      DOI: 10.1007/s10182-023-00479-7
       
  • A family of consistent normally distributed tests for Poissonity

      Abstract: A family of consistent tests, derived from a characterization of the probability generating function, is proposed for assessing Poissonity against a wide class of count distributions, which includes some of the most frequently adopted alternatives to the Poisson distribution. Actually, the family of test statistics is based on the difference between the plug-in estimator of the Poisson cumulative distribution function and the empirical cumulative distribution function. The test statistics have an intuitive and simple form and are asymptotically normally distributed, allowing a straightforward implementation of the test. The finite sample properties of the test are investigated by means of an extensive simulation study. The test shows satisfactory behaviour compared to other tests with known limit distribution.
      PubDate: 2023-06-15
      DOI: 10.1007/s10182-023-00478-8
       
  • Correlation-type goodness-of-fit tests based on independence characterizations

      Abstract: This paper uses independence-type characterizations to propose a class of test statistics which can be used for testing goodness-of-fit with several classes of null distributions. The resulting tests are consistent against fixed alternatives. Some limiting and small sample properties of the test statistics are explored. In comparison with common universal goodness-of-fit tests, the new tests exhibit better power for most of the alternatives considered, while in comparison with another characterization-based procedure, the new tests provide competitive or comparable power in various simulation settings. The handiness of the proposed tests is demonstrated through several real-data examples.
      PubDate: 2023-05-04
      DOI: 10.1007/s10182-023-00475-x
       
  • Conditional feature importance for mixed data

      Abstract: Despite the popularity of feature importance (FI) measures in interpretable machine learning, the statistical adequacy of these methods is rarely discussed. From a statistical perspective, a major distinction is between analysing a variable’s importance before and after adjusting for covariates, i.e., between marginal and conditional measures. Our work draws attention to this rarely acknowledged, yet crucial distinction and showcases its implications. We find that few methods are available for testing conditional FI and practitioners have hitherto been severely restricted in method application due to mismatched data requirements. Most real-world data exhibits complex feature dependencies and incorporates both continuous and categorical features (i.e., mixed data). Both properties are oftentimes neglected by conditional FI measures. To fill this gap, we propose to combine the conditional predictive impact (CPI) framework with sequential knockoff sampling. The CPI enables conditional FI measurement that controls for any feature dependencies by sampling valid knockoffs (hence, generating synthetic data with similar statistical properties) for the data to be analysed. Sequential knockoffs were deliberately designed to handle mixed data and thus allow us to extend the CPI approach to such datasets. We demonstrate through numerous simulations and a real-world example that our proposed workflow controls type I error, achieves high power, and is in line with results given by other conditional FI measures, whereas marginal FI metrics can result in misleading interpretations. Our findings highlight the necessity of developing statistically adequate, specialized methods for mixed data.
      PubDate: 2023-04-29
      DOI: 10.1007/s10182-023-00477-9
       
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
 

