- Distributed penalizing function criterion for local polynomial estimation in nonparametric regression with massive data
Abstract: Bandwidth selection is one of the most important issues in local polynomial estimation. However, research on data-driven bandwidth selection combined with the divide-and-conquer (DC) strategy remains rare in the existing literature, which limits the application of local polynomial estimation to massive data sets. In this paper, as a development of the traditional penalizing function criterion, we propose a distributed penalizing function (DPF) for selecting the optimal bandwidth. The proposed DPF is computationally efficient for massive data sets and is shown to be “globally optimal” in the sense that minimizing the DPF is asymptotically equivalent to minimizing the true empirical loss of the averaged function estimator, i.e., the DC estimator. In addition, a novel algorithm is proposed for bandwidth selection under an imbalanced DC strategy. The performance of the DPF is demonstrated in simulation studies and a real data analysis.
PubDate: 2025-03-10
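The averaging step behind the DC estimator is easy to illustrate. Below is a minimal NumPy sketch, not the authors' DPF criterion: the data are split into blocks, a local linear fit with a common bandwidth h is computed on each block, and the block-wise fits are averaged. The function names (`local_linear`, `dc_local_linear`) and the Gaussian kernel are illustrative assumptions.

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear (degree-1 local polynomial) fit at x0 with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)           # kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])   # local design matrix
    Xw = X * w[:, None]
    beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)       # weighted least squares
    return beta[0]                                   # intercept = estimate of m(x0)

def dc_local_linear(x0, x, y, h, n_blocks=10, seed=0):
    """Divide-and-conquer (averaged) local linear estimator at x0."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(len(x)), n_blocks)
    return np.mean([local_linear(x0, x[b], y[b], h) for b in blocks])

# toy example: m(x) = sin(2*pi*x), so m(0.5) = 0
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 5000)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
print(dc_local_linear(0.5, x, y, h=0.05))
```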
- Tests for high-dimensional partially linear regression models
Abstract: In this paper, we consider tests for high-dimensional partially linear regression models. The presence of high-dimensional nuisance covariates and an unknown nuisance function makes the inference problem very challenging. We adopt machine learning methods to estimate the unknown nuisance function and introduce quadratic-form test statistics. Interestingly, although the machine learning methods can be very complex, under suitable conditions we establish the asymptotic normality of the introduced test statistics under the null hypothesis and under local alternative hypotheses. We further propose a power-enhanced procedure to improve the performance of the test statistics, and provide two threshold determination methods for it. We show that the power-enhanced procedure is powerful in detecting signals under either sparse or dense alternatives while still controlling the type-I error asymptotically under the null hypothesis. Numerical studies are carried out to illustrate the empirical performance of the proposed procedures.
PubDate: 2025-03-04
- Bounded data modeling using logit-skew-normal mixtures
Abstract: Bounded data on (0, 1) arise in many real-world applications and have been modelled using a variety of distributions. However, these approaches often fail to address skewness, kurtosis, and heavy tails in the observations. This study presents a novel skew-normal-type distribution defined on a bounded interval, derived by combining the structure of the skew-normal distribution with the logit function. With its extended skewness and bounded support, the proposed model provides a versatile solution for modeling rates and proportions. We develop an EM-type algorithm to estimate the parameters of the model and of its finite mixtures. To illustrate the effectiveness of the approach, we conduct two simulation studies and an analysis of real data. The results highlight the flexibility and accuracy of the proposed model in comparison to traditional mixture models.
PubDate: 2025-03-04
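The logit-skew-normal construction can be pictured with a change of variables: if Z is skew-normal, then Y = 1/(1 + exp(-Z)) lies in (0, 1) and has density f_Z(logit(y))/(y(1-y)). The sketch below assumes SciPy's `skewnorm` parametrization and does not reproduce the paper's finite mixtures or EM-type algorithm; the function names are illustrative.

```python
import numpy as np
from scipy.stats import skewnorm

def logit_skew_normal_pdf(y, a, loc=0.0, scale=1.0):
    """Density on (0, 1) of Y = 1/(1+exp(-Z)) with Z ~ skew-normal(a, loc, scale)."""
    z = np.log(y / (1.0 - y))                         # logit transform
    return skewnorm.pdf(z, a, loc=loc, scale=scale) / (y * (1.0 - y))

def logit_skew_normal_rvs(a, loc=0.0, scale=1.0, size=1, random_state=None):
    """Sample by pushing skew-normal draws through the logistic map."""
    z = skewnorm.rvs(a, loc=loc, scale=scale, size=size, random_state=random_state)
    return 1.0 / (1.0 + np.exp(-z))

y = logit_skew_normal_rvs(a=4.0, scale=1.5, size=5, random_state=0)
print(y)
print(logit_skew_normal_pdf(y, a=4.0, scale=1.5))
```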
- Dimensionality reduction in multivariate nonparametric regression via nuclear norm penalization
Abstract: This paper reports on our study of a nonparametric reduced-rank regression method within an additive model framework. The nuclear norm of the component functions is penalized to incorporate inherent low-dimensional structure into the estimation process. The proposed penalization scheme introduces sparsity into the singular values of the coefficient matrices of the basis functions, thereby enabling the identification of low-rank structures in the function space. A non-asymptotic oracle inequality is established to investigate the theoretical properties of the proposed estimator. Minimax upper and lower bounds are obtained to measure the complexity of the nonparametric multivariate regression problem with a low-rank structure under various high-dimensional asymptotic scenarios. The minimax analysis demonstrates the dimensionality reduction effects of the reduced-rank and additive modeling framework. The results also show that the proposed estimator is rate optimal in the minimax sense. The proposed method is implemented with a backfitting algorithm and the alternating direction method of multipliers (ADMM). Simulation studies are conducted to complement the theoretical findings. To demonstrate the practicality of the proposed method, we apply it to biochemical data and Arabidopsis thaliana data.
PubDate: 2025-03-04
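The core ingredient of nuclear norm penalization is its proximal operator, singular value soft-thresholding, which is what drives small singular values of the coefficient matrices to zero inside ADMM-type algorithms. The sketch below shows only this generic operator, not the authors' backfitting/ADMM estimator; the threshold `tau` is an illustrative value.

```python
import numpy as np

def singular_value_threshold(B, tau):
    """Proximal operator of tau * nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
B_true = rng.normal(size=(12, 4)) @ rng.normal(size=(4, 9))     # rank-4 signal
B_noisy = B_true + 0.2 * rng.normal(size=(12, 9))
B_hat = singular_value_threshold(B_noisy, tau=2.0)
print(np.linalg.matrix_rank(B_noisy), np.linalg.matrix_rank(B_hat))  # thresholding lowers the rank
```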
- A new estimator for the multicollinear logistic regression model
Abstract: Applications of the logistic regression model have become very popular in recent years in the analysis of social, economic, financial, agricultural, health-economics, and medical data. However, when the independent variables are highly correlated, the problem of multicollinearity arises in this model. In this paper, we introduce a new Jackknifed two-parameter estimator (JTPE) and the two-parameter estimator (TPE) for the logistic regression model by unifying the JTPE of Kandemir Çetinkaya and Kaçranlar (Biased estimators and their applications in generalized linear models. Thesis, 2024). We examine the bias vectors and the matrix mean squared error (MMSE) of the TPE and the JTPE. Generalizations of some estimation methods for the ridge and Liu parameters in the logistic regression model are provided. The superiority of the JTPE is also assessed in terms of the simulated mean squared error (SMSE) via a Monte Carlo simulation study in which the response follows a logistic regression model. Finally, we consider real data applications in which the proposed estimators are compared and interpreted.
PubDate: 2025-02-20
- Goodness-of-fit tests for discrete response models with covariates
Abstract: We propose goodness-of-fit tests for models of count responses with covariates. Our main focus is on the null hypothesis that the observed data come from a Poisson, a negative binomial, or a binomial regression model, but the method is fairly general, allowing the responses to follow, conditionally on covariates, any given discrete distribution. The test criteria are formulated using the probability generating function and are convenient from a computational point of view. Asymptotic as well as Monte Carlo results are presented. Applications to real data are also reported.
PubDate: 2025-02-20
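For intuition, a probability-generating-function (PGF) contrast for a fitted Poisson regression can be built by comparing the empirical PGF of the responses with the model-implied PGF averaged over the fitted means, on a grid of t in [0, 1]. The sketch below is a simplified illustration rather than the paper's exact criterion or its calibration; it assumes `statsmodels` for the Poisson fit.

```python
import numpy as np
import statsmodels.api as sm

def pgf_distance(y, mu_hat, grid=None):
    """n times the integrated squared difference between the empirical PGF
    (1/n) sum_i t**y_i and the fitted Poisson PGF (1/n) sum_i exp(mu_i (t-1))."""
    grid = np.linspace(0.0, 1.0, 101) if grid is None else grid
    emp = np.array([np.mean(t ** y) for t in grid])
    mod = np.array([np.mean(np.exp(mu_hat * (t - 1.0))) for t in grid])
    return len(y) * np.mean((emp - mod) ** 2)   # Riemann approximation of the integral

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = sm.add_constant(x)
y = rng.poisson(np.exp(0.3 + 0.5 * x))                   # data generated from a Poisson model
mu_hat = sm.GLM(y, X, family=sm.families.Poisson()).fit().predict(X)
print(pgf_distance(y, mu_hat))                            # small when the Poisson model is correct
```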
- Variable selection for nonparametric spatial additive autoregressive model via deep learning
Abstract: In this paper, the variable selection method based on deep neural networks is extended to the nonparametric spatial additive autoregressive model. We impose a Lasso penalty within the spatial residual network structure and then transform the problem into a constrained optimization. This method performs variable selection and parameter estimation simultaneously, and a particular set of selected variables is obtained without specifying the degree of sparsity. The model with a nonparametric endogenous effect accommodates spatial data in a unified way, and the variable selection method is also appropriate for linear cases because of the nonparametric additive covariate structure. The network structure can learn the specific form of influence of each important feature, so it has good interpretability and alleviates, to some degree, the black-box problem of deep learning models. Simulation studies and the analysis of a real dataset demonstrate the superiority of the method in variable selection and prediction performance.
PubDate: 2025-02-17
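To illustrate only the penalty mechanism (not the paper's spatial residual network or its constrained optimization), the PyTorch sketch below adds an L1 (Lasso) penalty on the first-layer weights of a small feedforward network, so that unimportant inputs receive numerically negligible weight columns. All names and tuning values are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AdditiveNet(nn.Module):
    """Small feedforward network; the columns of the first layer act as feature gates."""
    def __init__(self, p, hidden=32):
        super().__init__()
        self.first = nn.Linear(p, hidden)
        self.rest = nn.Sequential(nn.ReLU(), nn.Linear(hidden, hidden),
                                  nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):
        return self.rest(self.first(x)).squeeze(-1)

def fit_lasso_net(x, y, lam=0.05, epochs=500, lr=1e-2):
    model = AdditiveNet(x.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss = loss + lam * model.first.weight.abs().sum()   # Lasso penalty on input weights
        loss.backward()
        opt.step()
    importance = model.first.weight.abs().sum(dim=0)          # one score per input feature
    return model, importance > 1e-3                           # crude selection rule

torch.manual_seed(0)
x = torch.randn(400, 10)
y = torch.sin(x[:, 0]) + 0.5 * x[:, 1] ** 2 + 0.1 * torch.randn(400)
model, selected = fit_lasso_net(x, y)
print(selected)   # ideally True only for the first two coordinates
```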
- Group strong orthogonal arrays and their construction methods
Abstract: In practical applications, group structures may exist between response variables and input factors in certain computer experiments, where the interactions of interest occur exclusively among factors within several disjoint groups. In such experiments, an ideal design ensures superior space-filling properties for each group compared to the overall design, which itself exhibits commendable space-filling characteristics. Inspired by this idea, we introduce the concept of group strong orthogonal arrays, which can be partitioned into distinct groups. Both the overall design and each individual group constitute strong orthogonal arrays, with the strength of each group exceeding that of the entire design. Addressing different strengths and levels, we present construction methods for three distinct types of such designs, two of which are column-orthogonal. Orthogonal arrays, difference matrices, and rotation matrices play pivotal roles in the construction process.
PubDate: 2025-02-13
- Fast rates of exponential cost function
Abstract: In this paper, we introduce a new learning algorithm with an exponential cost function within the framework of statistical learning theory. We establish an important comparison theorem that illustrates the relationship between the prediction error and the excess generalization error under a moment condition. Furthermore, the paper investigates the generalization performance of the algorithm and the robustness of the exponential cost function. We prove that the resulting estimator enjoys asymptotic optimality and robustness under certain conditions. Numerical simulations are provided to demonstrate the theoretical findings.
PubDate: 2025-02-13
- Modified maximum likelihood estimator for censored linear regression model with two-piece generalized t distribution
Abstract: In many fields, limited or censored data are often collected due to limitations of measurement equipment or experimental design. Commonly used censored linear regression (CR) models rely on the assumption of normality for the error terms. However, this approach has faced criticism in the literature due to its sensitivity to deviations from the normality assumption. In this paper, we propose an extension of the CR model under the two-piece generalized t (TPGT) error distribution, called the TPGT-CR model. The TPGT-CR model offers greater flexibility in modeling data by accommodating skewness and heavy tails. We develop a modified maximum likelihood (MML) estimator for the proposed model and introduce a modified deviance residual to detect outliers. The MML estimator under the TPGT assumption possesses several appealing merits, including robustness against outliers, asymptotic equivalence to the maximum likelihood estimator, and expression as explicit functions of the sample observations. Simulation studies are conducted to examine the finite-sample performance, robustness, and effectiveness of both the classical and proposed estimators. Results from both simulated and real data illustrate the usefulness of the proposed method.
PubDate: 2025-02-10
- Shrinkage and pretest Liu estimators in semiparametric linear measurement error models
Abstract: In this article, for semiparametric linear measurement error models under a multicollinearity setting, we define five shrinkage Liu estimators, namely the ordinary Liu estimator, the restricted Liu estimator (RLE), the preliminary test Liu estimator (PLE), the Stein Liu estimator (SLE), and the positive Stein Liu estimator (PSLE), for estimating the parameters when it is suspected that the parameter $$\beta $$ may belong to a linear subspace defined by $${\textbf{H}}\beta = c$$. Asymptotic properties of the estimators are studied with respect to quadratic risks. We derive the biases and quadratic risk expressions of these estimators and obtain the region of optimality of each estimator. Necessary and sufficient conditions for the superiority of each shrinkage Liu estimator over its counterpart, in terms of the choice of the Liu parameter d, are also established. Finally, we illustrate the performance of the proposed shrinkage estimators with a simulation study and real data analyses.
PubDate: 2025-02-09
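For reference, the shrinkage variants listed above build on the classical Liu estimator of Liu (1993) for the linear model; with shrinkage parameter $$d \in (0, 1)$$ it takes the form below, and, roughly speaking, the restricted and preliminary test versions replace the least-squares component by its counterpart under the restriction $${\textbf{H}}\beta = c$$ (this is the textbook linear-model formula, not the paper's semiparametric measurement-error version).

$$\hat{\beta}_d = \left(X^{\top}X + I_p\right)^{-1}\left(X^{\top}X + d\,I_p\right)\hat{\beta}_{\mathrm{OLS}}, \qquad \hat{\beta}_{\mathrm{OLS}} = \left(X^{\top}X\right)^{-1}X^{\top}y.$$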
- A class of nonparametric tests for DMTTF alternatives based on moment inequality
Abstract: Based on a moment inequality, a family of test statistics for testing exponentiality against DMTTF alternatives is proposed. The asymptotic distribution of the test statistics is derived under the null and alternative hypotheses, and the consistency of the test is shown by exploiting U-statistics theory. Comparisons with competing tests are made in terms of Pitman asymptotic relative efficiency (PARE). Additionally, an adapted version of the test under random censorship is explored. The performance of the proposed test is assessed by means of a simulation study and through applications to some real-life data sets.
PubDate: 2025-02-07
- The empirical Bernstein process with application to uniformity testing
Abstract: In this study, we introduce the empirical Bernstein process and establish its weak convergence. We also present a novel testing procedure for assessing uniformity, which utilizes the Cramér–von Mises and Kolmogorov–Smirnov functionals of the empirical Bernstein process. Additionally, we derive the asymptotic properties of the proposed test statistics under the null hypothesis and under a sequence of local alternative hypotheses. Comprehensive simulation studies demonstrate that the proposed tests outperform those based solely on the empirical distribution.
PubDate: 2025-02-07
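One plausible way to picture the construction is to smooth the empirical distribution function with Bernstein polynomials and evaluate Kolmogorov–Smirnov and Cramér–von Mises functionals of its deviation from the uniform CDF. The sketch below is such an illustration under that assumption, not the paper's exact process or its limiting theory; the smoothing order `m` is arbitrary.

```python
import numpy as np
from scipy.stats import binom

def bernstein_ecdf(x_grid, sample, m=20):
    """Bernstein-polynomial smoothing of the empirical CDF of a sample in [0, 1]."""
    k = np.arange(m + 1)
    Fn_at_knots = np.array([np.mean(sample <= kk / m) for kk in k])   # F_n(k/m)
    # Bernstein basis C(m,k) x^k (1-x)^(m-k) equals the Binomial(m, x) pmf at k
    basis = binom.pmf(k[None, :], m, x_grid[:, None])
    return basis @ Fn_at_knots

def uniformity_statistics(sample, m=20, grid_size=501):
    """KS and CvM functionals of the deviation of the smoothed CDF from the U(0,1) CDF."""
    n = len(sample)
    x = np.linspace(0.0, 1.0, grid_size)
    dev = bernstein_ecdf(x, sample, m) - x
    ks = np.sqrt(n) * np.max(np.abs(dev))
    cvm = n * np.mean(dev ** 2)                  # Riemann approximation of the integral
    return ks, cvm

rng = np.random.default_rng(0)
print(uniformity_statistics(rng.uniform(size=300)))          # uniform sample (null)
print(uniformity_statistics(rng.beta(2.0, 5.0, size=300)))   # Beta(2, 5) sample (alternative)
```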
- A penalization method to estimate the intrinsic dimensionality of data
Abstract: We propose a novel penalization method for estimating the intrinsic dimensionality of data within a Probabilistic Principal Components Model, extending beyond the Gaussian case. Unlike existing approaches, our method is designed to handle non-normal data, providing a flexible alternative to traditional factor models. Our procedure identifies the dimension at which the eigenvalues of a scatter matrix stabilize. We establish the consistency of the procedure under mild conditions and demonstrate its robustness across a range of data distributions. A comparative analysis highlights its advantages over existing techniques, making it a valuable tool for dimensionality estimation without relying on distributional assumptions.
PubDate: 2025-02-06
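In the spirit of the description above (eigenvalues of a scatter matrix that stabilize beyond the intrinsic dimension, plus a complexity penalty), here is a generic NumPy sketch. The sphericity measure and the penalty constant are illustrative choices of ours, not the paper's criterion, and the sample covariance stands in for a general scatter matrix.

```python
import numpy as np

def estimate_intrinsic_dim(X, penalty=None):
    """Pick the dimension after which the scatter-matrix eigenvalues stabilize.

    For each candidate k, measure how far the trailing eigenvalues are from being
    constant (log arithmetic-to-geometric mean ratio, zero iff they are all equal)
    and add a penalty proportional to k; return the minimizer.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(Xc.T @ Xc / n)[::-1]    # descending eigenvalues
    penalty = np.log(n) / n if penalty is None else penalty
    crits = []
    for k in range(p):                                    # k = number of retained components
        tail = eigvals[k:]
        sphericity = np.log(tail.mean()) - np.mean(np.log(tail))
        crits.append(sphericity + penalty * k)
    return int(np.argmin(crits))

rng = np.random.default_rng(0)
n, p, d = 500, 10, 3
scores = rng.normal(size=(n, d)) * np.array([5.0, 4.0, 3.0])   # three strong directions
X = scores @ rng.normal(size=(d, p)) + 0.5 * rng.normal(size=(n, p))
print(estimate_intrinsic_dim(X))    # should recover a dimension near 3
```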
- Equivalent conditions of complete moment convergence for randomly weighted sums of random variables and some applications with random design
Abstract: In this paper, the equivalent conditions of complete moment convergence for maximal randomly weighted sums of negatively associated random variables with negatively associated random weights are investigated by using the slicing and monotonic truncation methods. Furthermore, some corollaries on the complete convergence and the almost sure convergence of randomly weighted sums of random variables are presented. The results obtained in this paper remain valid for some dependent random variables and mixing random variables, serving as generalizations and improvements of known results. Additionally, the derived results are applied to bootstrap sample means, yielding the complete moment convergence and complete convergence of the bootstrap sample means of negatively associated random variables. These results are further applied to a simple linear errors-in-variables regression model with random design, establishing the strong consistency of the least squares estimator. Finally, a numerical simulation is conducted to evaluate finite-sample performance.
PubDate: 2025-02-05
- Asymptotic behavior of bootstrapped extreme order statistics under unknown power normalizing constants
Abstract: Bootstrapping is a powerful statistical technique, but its reliability for analyzing extreme events with unknown power normalization remains unclear. This paper addresses this issue by exploring the asymptotic properties of bootstrapped extreme order statistics under unknown power normalizing constants. Different estimators of the power normalizing constants are considered. The consistency properties of bootstrapped extreme order statistics with the estimated power normalizing constants are investigated. An extensive simulation study is conducted to identify an optimal bootstrap sample size under power and linear normalization. This study is based on how closely the bootstrapped extreme order statistics align with their theoretical asymptotic limits.
PubDate: 2025-01-29
- Transfer learning for semiparametric varying coefficient spatial autoregressive models
Abstract: Transfer learning is widely recognized for its effectiveness in leveraging external information to enhance the learning performance and predictive accuracy of target domain models. However, research on transfer learning within the context of the semiparametric varying-coefficient spatial autoregressive model is currently absent. In this study, we address this gap by introducing a transfer learning approach tailored to this model. Our method aims to improve estimation and prediction accuracy by effectively transferring knowledge from source data to the target model. We propose different algorithms for the cases where the transferable sources are known and unknown, respectively. Through extensive simulation experiments and real-world applications, we validate the efficacy of our proposed approach.
PubDate: 2025-01-29
- Nonparametric testing of first-order structure in point processes on linear networks
Abstract: In this paper we address a two-sample problem in the context of point processes on linear networks. The aim is to determine whether two given point patterns, defined over the same linear network and under the assumption of Poissonness, share the same spatial structure. To this end, Kolmogorov–Smirnov and Cramér–von Mises type test statistics are developed and analysed through an extensive simulation study covering different types of networks, balanced and unbalanced sample sizes, and homogeneous and inhomogeneous Poisson point processes. The results show that the tests hold their nominal level well and attain high power, with power increasing with the sample size and the discrepancy between the two generating intensities. Finally, these methods are applied to the analysis of traffic accidents in Rio de Janeiro (Brazil), studying their distribution at different rush hours.
PubDate: 2025-01-27
- Communication-efficient model averaging prediction for massive data with asymptotic optimality
Abstract: This paper focuses on model averaging prediction for massive datasets. Specifically, in the framework of Mallows model averaging, we propose two distributed approaches to estimate the parameters of each submodel and the weights in the final weighted estimator. The first approach is a one-shot procedure that aggregates the estimated parameters and weights from each local machine via simple averaging. The second approach is an iterative procedure that approximates the global loss by a surrogate loss in parameter estimation. The two proposed distributed estimators are communication-efficient: the former requires only one round of communication, and the latter requires two rounds of communication between the central and local machines for parameter estimation, to achieve global statistical efficiency. To estimate the weight vector, two distributed algorithms are presented. Furthermore, we theoretically justify the two approaches by proving convergence rates and asymptotic normality. More importantly, we establish the asymptotic optimality of the distributed estimator of the weight vector in terms of the out-of-sample prediction error criterion. Finally, simulations and a real data analysis are carried out to illustrate the proposed methods.
PubDate: 2025-01-24
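The one-shot idea is easy to sketch for a single linear submodel: each machine computes a local least-squares estimate and the central machine simply averages them. The NumPy sketch below shows only this aggregation step; the Mallows weighting across submodels and the iterative surrogate-loss refinement are not reproduced, and the function names are illustrative.

```python
import numpy as np

def local_ols(X, y):
    """Ordinary least squares computed on one machine's local data."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def one_shot_average(X_blocks, y_blocks):
    """One-shot aggregation: simple average of the local estimates."""
    return np.mean([local_ols(X, y) for X, y in zip(X_blocks, y_blocks)], axis=0)

rng = np.random.default_rng(0)
beta = np.array([1.0, -2.0, 0.5])
X_blocks, y_blocks = [], []
for _ in range(20):                                   # 20 local machines
    X = rng.normal(size=(500, 3))                     # 500 local observations each
    X_blocks.append(X)
    y_blocks.append(X @ beta + rng.normal(size=500))
print(one_shot_average(X_blocks, y_blocks))           # close to the true beta
```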
- Feature screening via false discovery rate control for linear model with multivariate responses
Abstract: We develop a novel feature selection method for linear regression with multivariate responses in ultrahigh-dimensional data analysis. This method is constructed under the framework of False Discovery Rate (FDR) control for multiple testing, and it employs a multiple data-splitting strategy. In each splitting, the data is divided into two disjoint parts. The first part is utilized for feature screening based on R-Vector (RV) correlation, and multiple testing is then conducted on the selected features for both parts. The z-values of the statistics are aggregated to control the FDR, and the set of important features is determined by rejecting the null hypotheses. The asymptotic theory of FDR control for this method is established under mild conditions. Additionally, we evaluate the finite sample performance of our feature selection procedure through Monte Carlo simulations. Finally, we apply this approach to detect important human genes associated with psychological well-being.
PubDate: 2025-01-23
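To illustrate only the screening step (not the data splitting, z-value aggregation, or FDR calibration), the sketch below ranks each candidate feature by its RV correlation with the multivariate response and keeps the top-ranked ones; the selection size `n_keep` is an arbitrary illustrative choice.

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV coefficient between two column-centered data matrices with the same rows."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxy = Xc.T @ Yc
    Sxx = Xc.T @ Xc
    Syy = Yc.T @ Yc
    return np.trace(Sxy @ Sxy.T) / np.sqrt(np.trace(Sxx @ Sxx) * np.trace(Syy @ Syy))

def rv_screen(X, Y, n_keep=10):
    """Rank each feature by its RV correlation with the multivariate response."""
    scores = np.array([rv_coefficient(X[:, [j]], Y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:n_keep], scores

rng = np.random.default_rng(0)
n, p, q = 200, 1000, 3
X = rng.normal(size=(n, p))
B = np.zeros((p, q))
B[:5] = rng.normal(size=(5, q))                      # only the first five features matter
Y = X @ B + rng.normal(size=(n, q))
kept, _ = rv_screen(X, Y, n_keep=10)
print(sorted(kept.tolist()))                          # should contain 0, 1, 2, 3, 4
```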