- Robust Bayesian structure learning for graphical models with t-distributions using G-Wishart prior
Abstract: Accurately interpreting complex relationships among many variables is of significant importance in science. One appealing approach to this task is Bayesian Gaussian graphical modeling, which has recently undergone numerous improvements. However, this model may struggle with datasets containing outliers; replacing Gaussian distributions with t-distributions improves inference and makes the model robust to outliers. In this paper, we aim to address the challenges of Gaussian graphical models through t-distribution graphical models. To this end, we draw inspiration from the Birth–Death Markov Chain Monte Carlo (BDMCMC) algorithm and introduce a Bayesian method for structure learning in both classical and alternative t-distribution graphical models. We also demonstrate that the more flexible model outperforms the other when applied to more complex generated data. This is illustrated using a wide range of simulated datasets as well as a real-world dataset. PubDate: 2025-04-20
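As a side illustration of the robustness mechanism described above (t-distributions downweighting outliers relative to a Gaussian), the following sketch implements the classical EM estimator of the location and scatter of a multivariate t distribution with fixed degrees of freedom; the simulated data, the value of nu, and the function name t_location_scatter are assumptions for illustration, and this is not the paper's BDMCMC structure-learning procedure.

```python
# Minimal sketch: EM estimation of location/scatter for a multivariate t
# distribution with fixed degrees of freedom nu. Outlying rows receive small
# weights, which is the robustness mechanism the abstract refers to.
# This is not the paper's BDMCMC structure-learning algorithm.
import numpy as np

def t_location_scatter(X, nu=4.0, n_iter=50):
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        diff = X - mu
        # Squared Mahalanobis distances under the current scatter estimate
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(Sigma), diff)
        # E-step: latent Gamma weights; outliers (large d2) get small weights
        w = (nu + p) / (nu + d2)
        # M-step: weighted location and scatter updates
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu
        Sigma = (w[:, None] * diff).T @ diff / n
    return mu, Sigma

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:10] += 8.0  # contaminate with outliers
mu_hat, Sigma_hat = t_location_scatter(X)
print(np.round(mu_hat, 2))
```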
- A community detection algorithm based on spectral co-clustering and weight self-adjustment in attributed stochastic co-block models
Abstract: The Degree-Corrected Stochastic co-Block Model (DC-ScBM) is widely utilized for detecting community structure in directed networks. It can flexibly depict the topology of edges in directed graphs. However, in practice, node attributes provide an additional source of information that can be leveraged for community detection, which is not considered in the DC-ScBM. Therefore, there is a critical need to develop models and detection methods for node-attributed directed networks, especially when the goal is to discover important nodes or special community structures. We generalize the DC-ScBM using a multiplicative form to fuse edges and node attributes and to describe the extent of influence of node attributes on each community. Then, a detection algorithm based on spectral co-clustering and feature weight self-adjustment (Spcc-SA) is developed. The algorithm aims to minimize the normalized cut (Ncut) and iteratively detects the sending and receiving communities and the weights of node attributes, so that node attributes with stronger signals are given greater weights. Numerical studies demonstrate that the Spcc-SA algorithm outperforms existing methods across a variety of node attributes and network topologies. Especially when attribute values differ greatly and the community structure is distinct, the normalized mutual information of Spcc-SA in the sending and receiving communities can reach 0.6 and 0.8, respectively. Furthermore, we apply this algorithm to real-world datasets, including the Enron email, world trade, and Weddell Sea networks, demonstrating that the algorithm can effectively detect interesting community structures. PubDate: 2025-04-15
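A plain spectral co-clustering of a directed adjacency matrix, without the attribute fusion or weight self-adjustment of Spcc-SA, can be sketched with scikit-learn; the planted block sizes and edge probabilities below are assumptions chosen only for illustration.

```python
# Minimal sketch: spectral co-clustering of a directed adjacency matrix into
# sending (row) and receiving (column) communities. The attribute fusion and
# weight self-adjustment of Spcc-SA are not reproduced here.
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(1)
# Two sending and two receiving blocks with planted structure (assumed sizes)
P = np.array([[0.30, 0.05],
              [0.05, 0.25]])
sizes = (60, 60)
row_blocks = np.repeat([0, 1], sizes)
col_blocks = np.repeat([0, 1], sizes)
A = rng.binomial(1, P[np.ix_(row_blocks, col_blocks)])

model = SpectralCoclustering(n_clusters=2, random_state=0)
model.fit(A + 1e-6)  # small jitter keeps rows/columns from being all-zero
print("sending communities:  ", np.bincount(model.row_labels_))
print("receiving communities:", np.bincount(model.column_labels_))
```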
- Hierarchical relations among principal component and factor analysis procedures elucidated from a comprehensive model
Abstract: In this review article, the term “hierarchy” is related to constrained-ness, but not to superiority. Procedures A and B forming a hierarchy means that A is a constrained variant of B or vice versa. A goal of this article is to present a hierarchy of principal component analysis (PCA) and factor analysis (FA) procedures, which follows from a comprehensive FA (CompFA) model. This model can be regarded as a hybrid of PCA and prevalent FA models. First, we show how a non-random version of the CompFA model leads to the following hierarchy: PCA is a constrained variant of completely decomposed FA, which itself is a constrained variant of matrix decomposition FA. Then, we prove that a random version of the CompFA model leads to minimum rank FA (MRFA) and that constraining MRFA leads to random PCA (RPCA), so as to present the following hierarchy: Probabilistic PCA is a constrained variant of prevalent FA, and the latter is a constrained variant of RPCA, which is itself a constrained variant of MRFA. Finally, this hierarchy and the above hierarchy following from the non-random version are unified into one. We further utilize the unified hierarchy to present a strategy for selecting a procedure suitable for a data set. PubDate: 2025-04-03
- Parameter-expanded ECME algorithms for logistic and penalized logistic regression
Abstract: Parameter estimation in logistic regression is a well-studied problem with the Newton–Raphson method being one of the most prominent optimization techniques used in practice. A number of monotone optimization methods including minorization-maximization (MM) algorithms, expectation-maximization (EM) algorithms and related variational Bayes approaches offer useful alternatives guaranteed to increase the logistic regression likelihood at every iteration. In this article, we propose and evaluate an optimization procedure that is based on a straightforward modification of an EM algorithm for logistic regression. Our method can substantially improve the computational efficiency of the EM algorithm while preserving the monotonicity of EM and the simplicity of the EM parameter updates. By introducing an additional latent parameter and selecting this parameter to maximize the penalized observed-data log-likelihood at every iteration, our iterative algorithm can be interpreted as a parameter-expanded expectation-conditional maximization either (ECME) algorithm, and we demonstrate how to use the parameter-expanded ECME with an arbitrary choice of weights and penalty function. In addition, we describe a generalized version of our parameter-expanded ECME algorithm that can be tailored to the challenges encountered in specific high-dimensional problems, and we study several interesting connections between this generalized algorithm and other well-known methods. Performance comparisons between our method, the EM algorithm, Newton–Raphson, and several other optimization methods are presented using an extensive series of simulation studies based upon both real and synthetic datasets. PubDate: 2025-04-01
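The Newton–Raphson (IRLS) baseline mentioned in the abstract can be sketched in a few lines; this is the standard textbook iteration on simulated data (the optional ridge penalty and the simulated coefficients are assumptions), not the proposed parameter-expanded ECME algorithm.

```python
# Minimal sketch: Newton-Raphson (IRLS) for logistic regression, the standard
# baseline mentioned in the abstract, not the proposed PX-ECME algorithm.
import numpy as np

def logistic_newton_raphson(X, y, n_iter=25, ridge=0.0):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                      # diagonal Hessian weights
        grad = X.T @ (y - p) - ridge * beta    # score (with optional penalty)
        hess = X.T @ (X * W[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
beta_true = np.array([-0.5, 1.0, -2.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
print(np.round(logistic_newton_raphson(X, y), 2))
```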
- Finding exclusive peak with permutation testing
Abstract: Comparing distributions is a crucial part of statistical work and supports decision-making across diverse fields. Various statistical tools address the task, evaluating central tendency, shape, and dispersion. This paper introduces a novel permutation-based methodology to compare distributions from a different perspective. In real-world scenarios, the objective may be to identify a product or service that clearly outperforms all competitors, if such an entity exists. To achieve this, we have implemented a method to compare exclusive peaks. Our approach relies on the NonParametric Combination methodology and offers flexibility without distributional assumptions, making it particularly suitable for multivariate problems. In this paper, we outline the methodology, conduct simulation studies, and conclude with insights for practical applications. PubDate: 2025-03-29
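A heavily simplified stand-in for the idea of comparing "exclusive peaks" is a label-permutation test on the gap between one group's maximum and its best competitor's; the statistic, group sizes, and simulated data below are assumptions, and the NonParametric Combination machinery of the paper is not reproduced.

```python
# Minimal sketch: a label-permutation test for whether one group's peak
# (maximum) exceeds every competitor's. A simplified stand-in, not the
# NonParametric Combination procedure used in the paper.
import numpy as np

def exclusive_peak_pvalue(samples, candidate, n_perm=5000, seed=0):
    rng = np.random.default_rng(seed)
    pooled = np.concatenate(list(samples.values()))
    labels = np.concatenate([[g] * len(v) for g, v in samples.items()])

    def gap(lbls):
        maxima = {g: pooled[lbls == g].max() for g in samples}
        others = max(m for g, m in maxima.items() if g != candidate)
        return maxima[candidate] - others

    observed = gap(labels)
    perm_gaps = np.array([gap(rng.permutation(labels)) for _ in range(n_perm)])
    return observed, (1 + np.sum(perm_gaps >= observed)) / (n_perm + 1)

rng = np.random.default_rng(3)
data = {"A": rng.normal(0, 1, 80), "B": rng.normal(0, 1, 80),
        "C": rng.normal(0.8, 1, 80)}
print(exclusive_peak_pvalue(data, candidate="C"))
```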
- Estimation of stress–strength reliability for the generalized inverted exponential distribution based on improved adaptive Type-II progressive censoring
Abstract: This study aims to estimate the reliability of a stress–strength system using the generalized inverted exponential distribution (GIED). We achieve this by employing an improved adaptive Type-II progressive censoring scheme and utilizing various estimation techniques. The techniques used include maximum likelihood estimation through the EM algorithm and Bayesian inference. We use Markov chain Monte Carlo (MCMC) methods and the TK approximation in the Bayesian framework. We compute various intervals, such as asymptotic confidence, arcsin-transformed, Bayesian credible, and highest posterior density intervals. To guide the estimation process, we use a generalized entropy loss function. Additionally, we conduct a comprehensive simulation analysis to validate the method’s performance and rigorously assess its applicability through real-life data analysis. PubDate: 2025-03-26
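The target quantity R = P(X > Y) can be approximated by plain Monte Carlo once GIED samples are available via inverse-transform sampling; the cdf form F(x) = 1 - (1 - exp(-lam/x))^alpha and the parameter values below are assumptions, and the paper's censoring-based estimators (EM-based MLE, MCMC, TK approximation) are not reproduced.

```python
# Minimal sketch: Monte Carlo evaluation of R = P(X > Y) when X (strength)
# and Y (stress) follow generalized inverted exponential distributions.
# The GIED cdf form and the parameter values are assumptions.
import numpy as np

def rgied(alpha, lam, size, rng):
    # Inverse transform for F(x) = 1 - (1 - exp(-lam/x))**alpha
    u = rng.uniform(size=size)
    return -lam / np.log1p(-u ** (1.0 / alpha))

rng = np.random.default_rng(4)
x = rgied(alpha=2.5, lam=1.0, size=200_000, rng=rng)   # strength
y = rgied(alpha=1.5, lam=1.0, size=200_000, rng=rng)   # stress
print("Monte Carlo estimate of R = P(X > Y):", round(np.mean(x > y), 4))
```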
- Robust mixture of linear mixed modeling via multivariate Laplace distribution
Abstract: The assumption of normality in random effects and regression errors is the primary cause of the lack of robustness in the maximum likelihood estimation procedure for linear mixed models. In this paper, we introduce a robust method for estimating regression parameters in these models, by positing that the random effects and regression errors follow a multivariate Laplace distribution. This new methodology, implemented via an EM algorithm, is computationally more efficient compared to the existing robust t procedure in the literature. Simulation studies suggest that the performance of the proposed estimation method in finite samples either surpasses or is at least on par with the robust t procedure. PubDate: 2025-03-24
- Bayesian modeling and forecasting of seasonal autoregressive models with scale-mixtures of normal errors
Abstract: Most existing Bayesian analysis methods for time series with a seasonal pattern are based on the normality assumption; however, most real time series violate this assumption. Assuming the scale-mixtures of normal (SMN) distribution for the model errors, we introduce Bayesian estimation and prediction of seasonal autoregressive (SAR) models, using the Gibbs sampler and Metropolis-Hastings algorithms. The SMN distribution is a general class that includes different symmetric heavy-tailed distributions as special cases, such as the Student’s t, slash and contaminated normal distributions. Employing different priors for the SAR parameters, we derive the full conditional posterior distributions of the SAR coefficients and scale parameter to be multivariate normal and inverse gamma, respectively, and the conditional predictive distribution of future observations to be multivariate normal. For the other parameters related to the SMN distribution, we derive their conditional posteriors in closed form, although some of them are not standard distributions. Using the derived closed-form conditional posterior and predictive distributions, we propose the Gibbs sampler with the Metropolis-Hastings algorithm to approximate empirically the marginal posterior and predictive distributions. We present an extensive simulation study and a real application in order to evaluate the accuracy of the proposed MCMC algorithm. PubDate: 2025-03-23
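The multivariate-normal and inverse-gamma conditional structure described above can be illustrated with a Gibbs sampler for a plain AR(p) model with normal errors and conjugate priors; the seasonal lags, the SMN mixing variables, and the prior settings below are omitted or assumed, so this is only a minimal sketch of the sampler's skeleton.

```python
# Minimal sketch: Gibbs sampler for a Bayesian AR(p) model with normal errors
# and conjugate priors, illustrating the multivariate-normal / inverse-gamma
# conditional structure. Seasonal (SAR) lags and SMN mixing variables omitted.
import numpy as np

def gibbs_ar(y, p=2, n_draws=2000, prior_var=100.0, a0=2.0, b0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    X = np.column_stack([y[p - k - 1:len(y) - k - 1] for k in range(p)])
    z = y[p:]
    phi, sigma2 = np.zeros(p), 1.0
    draws = []
    for _ in range(n_draws):
        # phi | sigma2, y ~ multivariate normal
        V = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / prior_var)
        m = V @ (X.T @ z / sigma2)
        phi = rng.multivariate_normal(m, V)
        # sigma2 | phi, y ~ inverse gamma
        resid = z - X @ phi
        sigma2 = 1.0 / rng.gamma(a0 + len(z) / 2, 1.0 / (b0 + resid @ resid / 2))
        draws.append(np.append(phi, sigma2))
    return np.array(draws)

rng = np.random.default_rng(5)
y = np.zeros(400)
for t in range(2, 400):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()
print(np.round(gibbs_ar(y).mean(axis=0), 2))  # posterior means of (phi1, phi2, sigma2)
```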
- An improved preconditioned unsupervised K-means clustering algorithm
Abstract: The Unsupervised K-means clustering (UKM) algorithm has attracted the attention of many researchers because it can automatically identify the number of clusters without requiring any parameter selection. However, it may produce poor clustering results on datasets with Gaussian mixtures. In this paper, we consider the preconditioned UKM algorithm, where the truncated UKM algorithm is first used as a preconditioning strategy. To further enhance the algorithm’s performance, we introduce a circular modification strategy. In particular, we determine whether to use the above strategies based on the Bayesian Information Criterion (BIC). The experimental results reveal that the proposed algorithms have a higher clustering accuracy than the UKM algorithm when applied to Gaussian mixture datasets. PubDate: 2025-03-19
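BIC-guided selection of the number of Gaussian components, the criterion the proposed preconditioning and circular-modification strategies rely on, can be sketched with scikit-learn; the simulated mixture and the candidate range of k are assumptions, and the UKM algorithm itself is not reproduced.

```python
# Minimal sketch: choosing the number of mixture components with BIC using
# scikit-learn's GaussianMixture. This is not the UKM algorithm itself.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(loc=c, scale=0.6, size=(150, 2))
               for c in ([0, 0], [4, 0], [2, 4])])

bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 8)}
best_k = min(bics, key=bics.get)
print("BIC-selected number of clusters:", best_k)
```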
- Bayesian nonparametric hypothesis testing methods on multiple comparisons
Abstract: In this paper, we introduce Bayesian testing procedures based on the Bayes factor to compare the means across multiple populations in classical nonparametric contexts. The proposed Bayesian methods are designed to maximize the probability of rejecting the null hypothesis when the Bayes factor exceeds a specified evidence threshold. It is shown that these procedures have straightforward closed-form expressions based on classical nonparametric test statistics and their corresponding critical values, allowing for easy computation. We also demonstrate that they effectively control Type I error and enable researchers to make consistent decisions aligned with both frequentist and Bayesian approaches, provided that the evidence threshold for the Bayesian methods is set according to the significance level of the frequentist tests. Importantly, the proposed approaches allow for the quantification of evidence from empirical data in favor of the null hypothesis, an advantage that frequentist methods lack, as they cannot quantify support for the null when the null hypothesis is not rejected. We also present simulation studies and real-world applications to illustrate the performance of the proposed testing procedures. PubDate: 2025-03-12
- A novel hybrid framework for forecasting stock indices based on the nonlinear time series models
Abstract: This study presents a new hybrid forecasting system to enhance the accuracy and efficiency of predicting stock market trends. To do this, the proposed framework involves several steps. First, the stock index closing price time series is preprocessed to address missing values, variance stabilization, nonnormality, and nonstationarity. Second, the closing prices are processed and filtered into a nonlinear long-term trend series and a stochastic series using three proposed filters and a benchmark filter. Third, the filtered series are estimated using nonlinear and neural network autoregressive models. Fourth, the residuals from both fitted models are extracted to form a new series, which is forecast using an autoregressive conditional heteroskedasticity model. Finally, the forecasts from each model are combined to obtain the final estimates. The results indicate that the proposed hybrid model produced the most accurate and efficient forecasts in comparison with the baseline models. PubDate: 2025-03-11
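The "filter, model each component, recombine" idea can be sketched on a toy series with a polynomial trend filter and an AR model for the residual; the polynomial degree, AR order, and simulated series are assumptions, and the paper's specific filters, neural network component, and ARCH step are not reproduced.

```python
# Minimal sketch: a two-component hybrid forecast (deterministic trend plus
# an AR model on the residuals) on a simulated price series.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(7)
n, horizon = 300, 20
t = np.arange(n)
y = 50 + 0.05 * t + np.cumsum(rng.normal(scale=0.5, size=n))  # toy price series

# Step 1: long-term trend via a low-order polynomial fit (stand-in filter)
coef = np.polyfit(t, y, deg=2)
trend = np.polyval(coef, t)

# Step 2: model the stochastic residual with an AR process
resid = y - trend
ar_fit = AutoReg(resid, lags=5).fit()
resid_fc = ar_fit.predict(start=n, end=n + horizon - 1)

# Step 3: recombine trend extrapolation and residual forecast
trend_fc = np.polyval(coef, np.arange(n, n + horizon))
print(np.round(trend_fc + resid_fc, 2))
```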
- Approximate Bayesian inference in a model for self-generated gradient collective cell movement
Abstract: In this article we explore parameter inference in a novel hybrid discrete-continuum model describing the movement of a population of cells in response to a self-generated chemotactic gradient. The model employs a drift-diffusion stochastic process, rendering likelihood-based inference methods impractical. Consequently, we consider approximate Bayesian computation (ABC) methods, which have gained popularity for models with intractable or computationally expensive likelihoods. ABC involves simulating from the generative model and retaining the parameters whose generated observations are “close enough” to the observed data, thereby approximating the posterior distribution. Given the plethora of existing ABC methods, selecting the most suitable one for a specific problem can be challenging. To address this, we employ a simple drift-diffusion stochastic differential equation (SDE) as a benchmark problem. This allows us to assess the accuracy of popular ABC algorithms under known configurations. We also evaluate the bias between ABC-posteriors and the exact posterior for the basic SDE model, where the posterior distribution is tractable. The top-performing ABC algorithms are subsequently applied to the proposed cell movement model to infer its key parameters. This study not only contributes to understanding cell movement but also sheds light on the comparative efficiency of different ABC algorithms in a well-defined context. PubDate: 2025-03-08
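A minimal rejection-ABC sketch for the drift parameter of a simple drift-diffusion SDE, simulated with Euler–Maruyama, is given below; the prior range, summary statistic, and tolerance are assumptions, and the cell-movement model and the more elaborate ABC variants compared in the paper are not reproduced.

```python
# Minimal sketch: rejection ABC for the drift theta of dX = theta dt + sigma dW,
# simulated with Euler-Maruyama and summarised by the path endpoint.
import numpy as np

rng = np.random.default_rng(8)
dt, n_steps, sigma, theta_true = 0.01, 500, 0.5, 2.0

def simulate_endpoint(theta):
    # Euler-Maruyama increments; only the endpoint is kept as a summary
    increments = theta * dt + sigma * np.sqrt(dt) * rng.normal(size=n_steps)
    return np.sum(increments)

observed = simulate_endpoint(theta_true)

accepted = []
for _ in range(20_000):
    theta = rng.uniform(-5, 5)                           # draw from the prior
    if abs(simulate_endpoint(theta) - observed) < 1.0:   # tolerance (assumed)
        accepted.append(theta)

print("accepted draws:", len(accepted),
      "ABC posterior mean:", round(float(np.mean(accepted)), 2))
```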
- Iterative weighted LAD estimation with homoskedasticity testing using the Gini concentration index
Abstract: An iterative technique is presented for weighted least absolute deviation (LAD) estimation, incorporating weights derived from the sparsity function associated with the response variable. The initial condition assumes homoskedastic residuals. The method’s essence lies in interpolating the unconditioned quantile function of the responses within a narrow neighborhood around 0.5. This interpolation yields an approximation of the sparsity function, which, in turn, guides the updating of weights based on the reciprocals of sparsity function values. These iterative steps are repeated until a predefined stopping criterion is satisfied. We propose using the Gini concentration index of these weights to assess the presence of heteroskedasticity in LAD residuals. The test statistic follows an asymptotic standard Gaussian distribution under the null hypothesis. We provide a simulation study to demonstrate the application and finite-sample performance of this test. Our results provide evidence for the utility of the Gini test. PubDate: 2025-03-04
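The Gini concentration index that the homoskedasticity test is built on can be computed from a vector of weights with the standard order-statistics formula; the weight vectors below are assumptions, and the iterative weighted-LAD fit and the asymptotic standardisation of the test statistic are not reproduced.

```python
# Minimal sketch: the Gini concentration index of a vector of positive weights,
# the quantity the proposed homoskedasticity test is built on.
import numpy as np

def gini_index(w):
    w = np.sort(np.asarray(w, dtype=float))
    n = w.size
    # Standard formula: G = sum_i (2i - n - 1) w_(i) / (n * sum_i w_i)
    return np.sum((2 * np.arange(1, n + 1) - n - 1) * w) / (n * w.sum())

rng = np.random.default_rng(9)
equal_w = np.ones(200)                                # homoskedastic-like weights
uneven_w = rng.lognormal(mean=0, sigma=1, size=200)   # heteroskedastic-like weights
print(round(gini_index(equal_w), 3), round(gini_index(uneven_w), 3))
```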
- Modeling of long-term survival data with unobserved dispersion via neural network
Abstract: Traditional models in survival analysis assume that every subject will eventually experience the event of interest in the study, such as death or disease recurrence; thus, the survival function is said to be proper. The cure rate model, which was first proposed seven decades ago, accounts for the presence of a fraction of individuals who will never experience the occurrence of the event of interest, referred to as the cure fraction. This cure fraction can be conceptualized as immune or cured subjects in the context of cancer treatment. In the literature, various cure rate models have been widely studied and commonly applied to structured data with a limited number of covariates. Recently, the use of convolutional neural networks, a powerful deep learning technique for image processing, has become increasingly common in the medical field. Medical images, such as histological slides and magnetic resonance images, are directly related to a patient’s prognostic factors. Therefore, it is reasonable to introduce these images as predictors in cure models. In this work, we extend the model of Xie and Yu (Stat Med 40(15):3516–3532, 2021. https://doi.org/10.1002/sim.8980), who employed a neural network to model the effect of unstructured predictors in the promotion time cure model setting, to cases involving overdispersed data. We refer to our extension as the integrated negative binomial cure rate model, with its parameters estimated through the Expectation–Maximization algorithm. PubDate: 2025-02-27
- Informative right censoring in nonparametric survival models
Abstract: Survival analysis models allow us to analyze and predict the time until a certain event occurs. Existing nonparametric models assume that the censoring of observations is random and unrelated to the study conditions. The estimators of the survival and hazard functions assume a constant survival probability between modes, have poor interpretability for datasets with multimodal time distributions, and lead to poor-quality data descriptions. In this paper, we investigate the quality of nonparametric models on four medical datasets with informative censoring and multimodal time distribution and propose a modification to improve the description quality. Proven properties of the IBS and AUPRC metrics show that the best quality is achieved by a survival function with a unimodal time distribution. We propose a modification of the nonparametric model based on virtual events drawn from a truncated normal distribution, which allows informative censoring to be suppressed. We compared the quality of the nonparametric models on multiple random subsets of the datasets of different sizes using the AUPRC and IBS metrics. According to a comparison of quality using Welch’s test, the proposed model with virtual events significantly outperformed the existing Kaplan–Meier model for all datasets (p-value …). PubDate: 2025-02-25
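As a reference point, the baseline Kaplan–Meier estimator that the proposed virtual-event modification is compared against can be written from scratch in a few lines; the exponential event and censoring times are assumptions, and the truncated-normal virtual events and IBS/AUPRC evaluation are not reproduced.

```python
# Minimal sketch: the baseline Kaplan-Meier estimator on simulated
# right-censored data (not the paper's virtual-event modification).
import numpy as np

def kaplan_meier(time, event):
    order = np.argsort(time)
    time, event = np.asarray(time)[order], np.asarray(event)[order]
    surv, t_out, s_out = 1.0, [], []
    n_at_risk = len(time)
    for t in np.unique(time):
        at_t = time == t
        d = event[at_t].sum()              # events at time t
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            t_out.append(t)
            s_out.append(surv)
        n_at_risk -= at_t.sum()            # events + censorings leave the risk set
    return np.array(t_out), np.array(s_out)

rng = np.random.default_rng(10)
t_true = rng.exponential(10, size=100)
c = rng.exponential(15, size=100)          # random censoring times
times, events = np.minimum(t_true, c), (t_true <= c).astype(int)
ts, S = kaplan_meier(times, events)
print(np.round(S[:5], 3))
```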
- An adaptive importance sampling for locally stable point processes
Abstract: The problem of finding the expected value of a statistic of a locally stable point process in a bounded region is addressed. We propose an adaptive importance sampling for solving the problem. In our proposal, we restrict the importance point process to the family of homogeneous Poisson point processes, which enables us to quickly generate independent samples of the importance point process. The optimal intensity of the importance point process is found by applying the cross-entropy minimization method. In the proposed scheme, the expected value of the statistic and the optimal intensity are iteratively estimated in an adaptive manner. We show that the proposed estimator converges to the target value almost surely and prove its asymptotic normality. We explain how to apply the proposed scheme to the estimation of the intensity of a stationary pairwise interaction point process. The performance of the proposed scheme is compared numerically with Markov chain Monte Carlo simulation and perfect sampling. PubDate: 2025-02-21
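Self-normalised importance sampling with a homogeneous Poisson proposal, the non-adaptive core of the scheme described above, can be sketched for the expected point count of a Strauss-type pairwise-interaction process; the Strauss parameters and the fixed proposal intensity mu are assumptions, and the cross-entropy adaptation of the intensity is not implemented.

```python
# Minimal sketch: self-normalised importance sampling for the expected number
# of points of a Strauss-type pairwise-interaction process on the unit square,
# using a homogeneous Poisson proposal with fixed intensity mu.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(11)
beta, gamma, R = 100.0, 0.5, 0.05   # Strauss parameters (assumed values)
mu = 80.0                            # proposal Poisson intensity (assumed)

def close_pairs(points, R):
    return 0 if len(points) < 2 else int(np.sum(pdist(points) < R))

log_w, counts = [], []
for _ in range(5000):
    n = rng.poisson(mu)
    pts = rng.uniform(size=(n, 2))
    # log importance weight up to a constant: n*log(beta/mu) + s*log(gamma)
    log_w.append(n * np.log(beta / mu) + close_pairs(pts, R) * np.log(gamma))
    counts.append(n)

w = np.exp(np.array(log_w) - np.max(log_w))   # stabilise before normalising
print("estimated E[N] under the Strauss process:",
      round(float(np.sum(w * np.array(counts)) / np.sum(w)), 1))
```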
- Success in the MLS SuperDraft: evaluating player characteristics and performance using mixed effects models
Abstract: Drafting is a common way for many North American professional sports teams to obtain new players. The Major League Soccer (MLS) SuperDraft takes place prior to the start of each season to select valuable players. Being able to make well-informed decisions surrounding draft selections is an important aspect of managing a team. This paper seeks to identify desirable characteristics of players drafted by MLS teams. Cox proportional hazards models and mixed effects logistic regression models were used to model the number of MLS games played and the probability of playing at least 30 MLS games, in order to identify desirable characteristics and predict the success of future drafted players in MLS. The performances of the techniques have been evaluated and compared through 10-fold cross-validation. Results reveal significant player characteristics and multiple significant sources of variability during drafting. Furthermore, predictions were made for players who were selected in the 2018 and 2019 MLS SuperDrafts. PubDate: 2025-02-17
- Analysing kinematic data from recreational runners using functional data analysis
Abstract: We present a multivariate functional mixed effects model for kinematic data from a large number of recreational runners. The runners’ sagittal plane hip and knee angles are modelled jointly as a bivariate function with random effects functions accounting for the dependence among bilateral measurements. The model is fitted by applying multivariate functional principal component analysis (mv-FPCA) and modelling the mv-FPCA scores using scalar linear mixed effects models. Simulation and bootstrap approaches are introduced to construct simultaneous confidence bands for the fixed effects functions, and covariance functions are reconstructed to summarise the variability structure in the data and thoroughly investigate the suitability of the proposed model. In our scientific application, we observe a statistically significant effect of running speed on both joints. We observe strong within-subject correlations, reflecting the highly idiosyncratic nature of running technique. Our approach is applicable to modelling multiple streams of smooth biomechanical data collected in complex experimental designs. PubDate: 2025-02-06
- A tree-based varying coefficient model
Abstract: The paper introduces a tree-based varying coefficient model (VCM) where the varying coefficients are modelled using the cyclic gradient boosting machine (CGBM) from Delong et al. (On cyclic gradient boosting machines, 2023). Modelling the coefficient functions using a CGBM allows for dimension-wise early stopping and feature importance scores. The dimension-wise early stopping not only reduces the risk of dimension-specific overfitting, but also reveals differences in model complexity across dimensions. The use of feature importance scores allows for simple feature selection and easy model interpretation. The model is evaluated on the same simulated and real data examples as those used in Richman and Wüthrich (Scand Actuar J 2023:71–95, 2023), and the results show that it achieves out-of-sample losses comparable to those of their neural network-based VCM, LocalGLMnet. PubDate: 2025-02-04
- On practical implementation of the fully robust one-sided cross-validation method in the nonparametric regression and density estimation contexts
Abstract: The fully robust one-sided cross-validation (OSCV) method has versions in the nonparametric regression and density estimation settings. It selects consistent bandwidths for estimating continuous regression and density functions that might have finitely many discontinuities in their first derivatives. The theoretical results underlying the method were thoroughly elaborated in the preceding publications, while its practical implementations needed improvement. In particular, until this publication, no appropriate implementation of the method existed in the density estimation context. In the regression setting, the previously proposed implementation has the serious disadvantage of occasionally producing irregular OSCV functions, which complicates the bandwidth selection procedure. In this article, we make substantial progress towards resolving the aforementioned issues by proposing a suitable implementation of fully robust OSCV for density estimation and providing specific recommendations for the further improvement of the method in the regression setting. PubDate: 2025-02-03
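For orientation, the ordinary least-squares cross-validation (LSCV) criterion for a Gaussian kernel density bandwidth, a standard relative of OSCV, is sketched below; the simulated bimodal sample and the bandwidth grid are assumptions, and the one-sided and fully robust versions discussed in the article are not reproduced.

```python
# Minimal sketch: ordinary least-squares cross-validation (LSCV) for a Gaussian
# kernel density bandwidth, a standard relative of the OSCV criterion, not the
# fully robust OSCV method itself.
import numpy as np

def phi(u, s):
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2 * np.pi))

def lscv(h, x):
    n = x.size
    diff = x[:, None] - x[None, :]
    # Integral of the squared estimate: average of N(0, 2h^2) kernels
    term1 = phi(diff, h * np.sqrt(2)).sum() / n**2
    # Leave-one-out average density at the observations
    k = phi(diff, h)
    loo = (k.sum(axis=1) - phi(0.0, h)) / (n - 1)
    return term1 - 2.0 * loo.mean()

rng = np.random.default_rng(12)
x = np.concatenate([rng.normal(0, 1, 150), rng.normal(4, 0.5, 50)])
grid = np.linspace(0.05, 1.5, 60)
scores = [lscv(h, x) for h in grid]
print("LSCV bandwidth:", round(float(grid[int(np.argmin(scores))]), 3))
```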