Abstract: It is well known that the independence of the sample mean and the sample variance characterizes the normal distribution. By using Anosov’s theorem, we further investigate the analogous characteristic properties in terms of the sample mean and some feasible definite statistics. The latter statistics, introduced in this paper for the first time, are based on nonnegative, definite and continuous functions of ordered arguments with positive degree of homogeneity. The proposed approach seems to be natural and can be used to derive easily characterization results for many feasible definite statistics, such as known characterizations involving the sample variance, sample range as well as Gini’s mean difference.
PubDate: 2022-06-01
Abstract: Vinberg cones and the ambient vector spaces are important in modern statistics of sparse models. The aim of this paper is to study eigenvalue distributions of Gaussian, Wigner and covariance matrices related to growing Vinberg matrices. For Gaussian or Wigner ensembles, we give an explicit formula for the limiting distribution. For Wishart ensembles defined naturally on Vinberg cones, their limiting Stieltjes transforms, support and atom at 0 are described explicitly in terms of the Lambert–Tsallis functions, which are defined by using the Tsallis q-exponential functions.
PubDate: 2022-06-01
Abstract: In general, the solution to a regression problem is the minimizer of a given loss criterion and depends on the specified loss function. The nonparametric isotonic regression problem is special, in that optimal solutions can be found by solely specifying a functional. These solutions will then be minimizers under all loss functions simultaneously as long as the loss functions have the requested functional as the Bayes act. For the functional, the only requirement is that it can be defined via an identification function, with examples including the expectation, quantile, and expectile functionals. Generalizing classical results, we characterize the optimal solutions to the isotonic regression problem for identifiable functionals by rigorously treating these functionals as set-valued. The results hold in the case of totally or partially ordered explanatory variables. For total orders, we show that any solution resulting from the pool-adjacent-violators algorithm is optimal.
PubDate: 2022-06-01
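For readers unfamiliar with the pool-adjacent-violators algorithm named in the abstract above, the following is a minimal sketch for the simplest case only: the expectation functional on a totally ordered sample, where pooled blocks are replaced by their mean. The function name `pav_mean` is hypothetical, and the sketch does not reproduce the paper's set-valued treatment of general identifiable functionals.

```python
def pav_mean(y):
    """Nondecreasing fit to y by pool-adjacent-violators (mean functional).

    Illustrative sketch only: assumes the observations are already sorted
    by the explanatory variable and that the target functional is the mean.
    """
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Pool while the previous block's mean exceeds the last block's mean.
        # Means are compared by cross-multiplication to avoid division.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand each pooled block back to one fitted value per observation.
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit
```

Calling `pav_mean([1, 3, 2, 4])` pools the violating pair (3, 2) into a single block with mean 2.5 and leaves the other observations unchanged.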
Abstract: Rank regression is a robust modeling tool, but it is challenging to implement for distributed massive data owing to memory constraints. In practice, the massive data may be distributed heterogeneously from machine to machine; how to incorporate the heterogeneity is also an interesting issue. This paper proposes a distributed rank regression (\(\mathrm{DR}^{2}\)), which can be implemented on the master machine by solving a weighted least-squares problem and is adaptive when the data are heterogeneous. Theoretically, we prove that the resulting estimator is statistically as efficient as the global rank regression estimator. Furthermore, based on the adaptive LASSO and a newly defined distributed BIC-type tuning parameter selector, we propose a distributed regularized rank regression (\(\mathrm{DR}^{3}\)), which can make consistent variable selection and can also be easily implemented by using the LARS algorithm on the master machine. Simulation results and real data analysis are included to validate our method.
PubDate: 2022-06-01
Abstract: We investigate Bayesian variable selection in models driven by Gaussian processes, which allows us to treat linear, nonlinear and nonparametric models, in conjunction with even dependent setups, in the same vein. We consider the Bayes factor route to variable selection, and develop a general asymptotic theory for the Gaussian process framework in the “large p, large n” settings even with \(p\gg n\), establishing almost sure exponential convergence of the Bayes factor under appropriately mild conditions. The fixed p setup is included as a special case. To illustrate, we apply our result to variable selection in linear regression, a Gaussian process model with squared exponential covariance function accommodating the covariates, and a first-order autoregressive process with time-varying covariates. We also follow up our theoretical investigations with ample simulation experiments in the above regression contexts and with variable selection in a real riboflavin dataset consisting of 71 observations and 4088 covariates. For implementation of variable selection using Bayes factors, we develop a novel and effective general-purpose transdimensional, transformation-based Markov chain Monte Carlo algorithm, which has played a crucial role in simulated and real data applications.
PubDate: 2022-06-01
Abstract: Let \(f_{Y|X,Z}(y|x,z)\) be the conditional probability function of Y given (X, Z), where Y is the scalar response variable, while (X, Z) is the covariable vector. This paper proposes a robust model selection criterion for \(f_{Y|X,Z}(y|x,z)\) with X missing at random. The proposed method is developed based on a set of assumed models for the selection probability function. However, the consistency of model selection by our proposal does not require these models to be correctly specified, while it only requires that the selection probability function is a function of these assumed selective probability functions. Under some conditions, it is proved that the model selection by the proposed method is consistent and the estimator for population parameter vector is consistent and asymptotically normal. A Monte Carlo study was conducted to evaluate the finite-sample performance of our proposal. A real data analysis was used to illustrate the practical application of our proposal.
PubDate: 2022-06-01
Abstract: In this paper, we propose new semiparametric procedures for inference on linear functionals in the context of two semicontinuous populations. The distribution of each semicontinuous population is characterized by a mixture of a discrete point mass at zero and a continuous skewed positive component. To utilize the information from both populations, we model the positive components of the two mixture distributions via a semiparametric density ratio model. Under this model setup, we construct the maximum empirical likelihood estimators of the linear functionals. The asymptotic normality of the proposed estimators is established and is used to construct confidence regions and perform hypothesis tests for these functionals. We show that the proposed estimators are more efficient than the fully nonparametric ones. Simulation studies demonstrate the advantages of our method over existing methods. Two real-data examples are provided for illustration.
PubDate: 2022-06-01
Abstract: In high-dimensional data analysis, bi-level sparsity is often assumed when covariates function group-wise and sparsity can appear either at the group level or within certain groups. In such cases, an ideal model should be able to encourage the bi-level variable selection consistently. Bi-level variable selection becomes even more challenging when data have a heavy-tailed distribution or outliers exist in random errors and covariates. In this paper, we study a framework of high-dimensional M-estimation for bi-level variable selection. This framework encourages bi-level sparsity through a computationally efficient two-stage procedure. In theory, we provide sufficient conditions under which our two-stage penalized M-estimator possesses simultaneous local estimation consistency and the bi-level variable selection consistency if certain non-convex penalty functions are used at the group level. Both our simulation studies and real data analysis demonstrate satisfactory finite sample performance of the proposed estimators under different irregular settings.
PubDate: 2022-06-01
Abstract: The proportional hazards model proposed by D. R. Cox in a high-dimensional and sparse setting is discussed. The regression parameter is estimated by the Dantzig selector, which will be proved to have the variable selection consistency. This fact enables us to reduce the dimension of the parameter and to construct asymptotically normal estimators for the regression parameter and the cumulative baseline hazard function.
PubDate: 2022-06-01
Abstract: We consider a linear mixed-effects model with a clustered structure, where the parameters are estimated using maximum likelihood (ML) based on possibly unbalanced data. Inference with this model is typically done based on asymptotic theory, assuming that the number of clusters tends to infinity with the sample size. However, when the number of clusters is fixed, classical asymptotic theory developed under a divergent number of clusters is no longer valid and can lead to erroneous conclusions. In this paper, we establish the asymptotic properties of the ML estimators of random-effects parameters under a general setting, which can be applied to conduct valid statistical inference with fixed numbers of clusters. Our asymptotic theorems allow both fixed effects and random effects to be misspecified, and the dimensions of both effects to go to infinity with the sample size.
PubDate: 2022-05-14
Abstract: For \(1 \le i \le r\), let \(F_i\) be the cumulative incidence function (CIF) corresponding to the \(i\)th risk in an r-competing risks model. We assume a discrete or a grouped time framework and obtain the maximum likelihood estimators (m.l.e.) of these CIFs under the restriction that \(F_i(t)/F_{i+1}(t)\) is nondecreasing, \(1 \le i \le r-1\). We also derive the likelihood ratio tests for testing for and against this restriction and obtain their asymptotic distributions. The theory developed here can also be used to investigate the association between a failure time and a discretized or ordinal mark variable that is observed only at the time of failure. To illustrate the applicability of our results, we give examples in the competing risks and the mark variable settings.
PubDate: 2022-05-14
Abstract: For the class of Gauss–Markov processes we study the problem of asymptotic equivalence of the nonparametric regression model with errors given by the increments of the process and the continuous time model, where a whole path of a sum of a deterministic signal and the Gauss–Markov process can be observed. We derive sufficient conditions which imply asymptotic equivalence of the two models. We verify these conditions for the special cases of Sobolev ellipsoids and Hölder classes with smoothness index \(>1/2\) under mild assumptions on the Gauss–Markov process. To give a counterexample, we show that asymptotic equivalence fails to hold for the special case of Brownian bridge. Our findings demonstrate that the well-known asymptotic equivalence of the Gaussian white noise model and the nonparametric regression model with i.i.d. standard normal errors (see Brown and Low (Ann Stat 24:2384–2398, 1996)) can be extended to a setup with general Gauss–Markov noises.
PubDate: 2022-05-09
Abstract: Feature selection for the high-dimensional Cox proportional hazards model (Cox model) is very important in many microarray genetic studies. In this paper, we propose a sequential feature selection procedure for this model. We define a novel partial profile score to assess the impact of unselected features conditional on the current model; significant features are thereby added into the model sequentially, and the Extended Bayesian Information Criterion (EBIC) is adopted as a stopping rule. Under mild conditions, we show that this procedure is selection consistent. Extensive simulation studies and two real data applications are conducted to demonstrate the advantage of our proposed procedure over several representative approaches.
PubDate: 2022-05-07
Abstract: This paper proposes a blockwise network autoregressive (BWNAR) model by grouping nodes in the network into nonoverlapping blocks to adapt to networks with blockwise structures. Before modeling, we employ the pseudo likelihood ratio criterion (pseudo-LR) together with the standard spectral clustering approach and a binary segmentation method developed by Ma et al. (Journal of Machine Learning Research, 22, 1–63, 2021) to estimate the number of blocks and their memberships, respectively. Then, we establish the consistency and asymptotic normality of the estimator of influence parameters by the quasi-maximum likelihood estimation method without imposing any distribution assumptions. In addition, a novel likelihood ratio test statistic is proposed to verify the heterogeneity of the influencing parameters. The performance and usefulness of the model are assessed through simulations and an empirical example of the detection of fraud in financial transactions, respectively.
PubDate: 2022-05-04
Abstract: This paper systematically investigates the following issues. First, we construct different outcome regression-based estimators for the conditional average treatment effect under, respectively, true, parametric, nonparametric and semiparametric dimension reduction structure. Second, according to the corresponding asymptotic variance functions when supposing the models are correctly specified, we answer the following questions: What is the asymptotic efficiency ranking of the four estimators in general? How is the efficiency related to the affiliation of the given covariates in the set of arguments of the regression functions? What roles do bandwidth and kernel function selections play in the estimation efficiency? And in which scenarios should the estimator under semiparametric dimension reduction regression structure be used in practice? Meanwhile, the results show that any outcome regression-based estimation should be asymptotically more efficient than any inverse probability weighting-based estimation. Several simulation studies are conducted to examine the finite sample performances of these estimators, and a real dataset is analyzed for illustration.
PubDate: 2022-04-29
Abstract: Image classifiers based on convolutional neural networks are defined, and the rate of convergence of the misclassification risk of the estimates towards the optimal misclassification risk is analyzed. Under suitable assumptions on the smoothness and structure of the a posteriori probability, a rate of convergence is shown which is independent of the dimension of the image. This proves that in image classification, it is possible to circumvent the curse of dimensionality by convolutional neural networks. Furthermore, the obtained result gives an indication why convolutional neural networks are able to outperform the standard feedforward neural networks in image classification. Our classifiers are compared with various other classification methods using simulated data. Furthermore, the performance of our estimates is also tested on real images.
PubDate: 2022-04-27
Abstract: Motivated by the complexity of network data, we propose a directed hybrid random network that mixes preferential attachment (PA) rules with uniform attachment rules. When a new edge is created, with probability \(p\in (0,1)\), it follows the PA rule. Otherwise, this new edge is added between two uniformly chosen nodes. Such mixture makes the in- and out-degrees of a fixed node grow at a slower rate, compared to the pure PA case, thus leading to lighter distributional tails. For estimation and inference, we develop two numerical methods which are applied to both synthetic and real network data. We see that with extra flexibility given by the parameter p, the hybrid random network provides a better fit to real-world scenarios, where lighter tails from in- and out-degrees are observed.
PubDate: 2022-04-23
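The edge-creation rule described in the abstract above can be illustrated with a small simulation. This is a rough sketch under assumptions that the abstract does not state: the node set is fixed in advance, the PA step picks the source with weight out-degree + 1 and the target with weight in-degree + 1, and self-loops and repeated edges are allowed. The function name `simulate_hybrid_network` and the offset of 1 are hypothetical, not the paper's specification.

```python
import random


def simulate_hybrid_network(n_nodes, n_edges, p, seed=0):
    """Simulate a directed network mixing PA and uniform attachment.

    Illustrative sketch only: fixed node set, degree + 1 weights in the
    PA step, self-loops allowed. These are assumptions, not the paper's model.
    """
    rng = random.Random(seed)
    nodes = list(range(n_nodes))
    in_deg = [0] * n_nodes
    out_deg = [0] * n_nodes
    edges = []
    for _ in range(n_edges):
        if rng.random() < p:
            # PA step: endpoints drawn with probability proportional to degree + 1.
            source = rng.choices(nodes, weights=[d + 1 for d in out_deg])[0]
            target = rng.choices(nodes, weights=[d + 1 for d in in_deg])[0]
        else:
            # Uniform step: both endpoints drawn uniformly from the node set.
            source = rng.randrange(n_nodes)
            target = rng.randrange(n_nodes)
        out_deg[source] += 1
        in_deg[target] += 1
        edges.append((source, target))
    return edges, in_deg, out_deg
```

Comparing the largest in-degree from runs with `p` close to 1 and `p` close to 0 gives an informal feel for how the uniform steps affect the degree tails in this simplified setting.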
Abstract: In this paper, we consider the variable selection problem in functional linear regression with interactions. Our goal is to identify relevant main effects and corresponding interactions associated with the response variable. Heredity is a natural assumption in many statistical models involving two-way or higher-order interactions. Inspired by this, we propose an adaptive group Lasso method for the multiple functional linear model that adaptively selects important single functional predictors and pairwise interactions while obeying the strong heredity constraint. The proposed method is based on the functional principal components analysis with two adaptive group penalties, one for main effects and one for interaction effects. With appropriate selection of the tuning parameters, the rates of convergence of the proposed estimators and the consistency of the variable selection procedure are established. Simulation studies demonstrate the performance of the proposed procedure and a real example is analyzed to illustrate its practical usage.
PubDate: 2022-04-01
Abstract: Uniform designs have been widely applied in engineering and scientific innovation. When many quantitative factors are investigated with as few runs as possible, a supersaturated uniform design with good overall and projection uniformity is needed. By combining combinatorial methods and stochastic algorithms, such uniform designs with flexible numbers of columns are constructed in this article under the wrap-around \(L_2\)-discrepancy. Compared with the existing designs, the new designs and their two-dimensional projections not only have less aberration, but also have lower discrepancy. Furthermore, some novel theoretical results on the minimum-aberration, uniform and uniform projection designs are obtained.
PubDate: 2022-04-01
Abstract: This paper studies local polynomial estimation of expectile regression. Expectiles and quantiles both provide a full characterization of a (conditional) distribution function, but each has its own merits and inconveniences. Local polynomial fitting as a smoothing technique has the major advantage of being simple, allowing for explicit expressions and hence advantages when developing inference theory. The aim of this paper is twofold: to study in detail the use of local polynomial fitting in the context of expectile regression and to contribute to the important issue of bandwidth selection, from theoretical and practical points of view. We discuss local polynomial expectile regression estimators and establish an asymptotic normality result for them. The finite-sample performance of the estimators, combined with various bandwidth selectors, is investigated in a simulation study. Some illustrations with real data examples are given.
PubDate: 2022-04-01
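As background for the expectiles discussed in the abstract above, the sketch below computes an unconditional sample expectile, not the paper's local polynomial estimator. It relies on the standard fact that the \(\tau\)-expectile minimizes an asymmetrically weighted squared loss, so it is a weighted mean with weight \(\tau\) on observations above it and \(1-\tau\) on those at or below it; the fixed-point iteration and the name `sample_expectile` are illustrative choices.

```python
def sample_expectile(values, tau, tol=1e-12, max_iter=1000):
    """Sample tau-expectile by iterating an asymmetrically weighted mean.

    Illustrative sketch only: unconditional expectile of a plain sample,
    with no covariates, kernels, or bandwidths involved.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    estimate = sum(values) / len(values)  # tau = 0.5 gives the ordinary mean
    for _ in range(max_iter):
        # Weight tau above the current estimate, 1 - tau at or below it.
        weights = [tau if v > estimate else 1.0 - tau for v in values]
        updated = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(updated - estimate) <= tol:
            return updated
        estimate = updated
    return estimate
```

With `tau=0.5` the function returns the sample mean, and larger values of `tau` move the result toward the upper end of the sample.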