Abstract: Feature selection for the high-dimensional Cox proportional hazards model (Cox model) is very important in many microarray genetic studies. In this paper, we propose a sequential feature selection procedure for this model. We define a novel partial profile score to assess the impact of unselected features conditional on the current model; significant features are then added to the model sequentially, and the Extended Bayesian Information Criterion (EBIC) is adopted as the stopping rule. Under mild conditions, we show that this procedure is selection consistent. Extensive simulation studies and two real data applications demonstrate the advantage of our proposed procedure over several representative approaches. PubDate: 2022-12-01
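
As a sketch of how an EBIC stopping rule can be evaluated (not the authors' code), the snippet uses the commonly cited form \(\mathrm{EBIC}_\gamma = -2\ell + k\log n + 2\gamma\log\binom{p}{k}\), where \(\ell\) is the maximized log partial likelihood of a Cox model containing \(k\) of the \(p\) candidate features and \(n\) is the sample size. The log partial likelihood values, the default \(\gamma = 1\), and the helper names `ebic` and `should_stop` are assumptions for illustration; fitting the Cox model itself is not shown.

```python
import math


def ebic(log_partial_lik: float, k: int, n: int, p: int, gamma: float = 1.0) -> float:
    """Extended BIC for a model with k of p candidate features (hypothetical helper)."""
    penalty = k * math.log(n) + 2.0 * gamma * math.log(math.comb(p, k))
    return -2.0 * log_partial_lik + penalty


def should_stop(current: float, best_candidate: float) -> bool:
    """Stop the forward sequence once the best candidate no longer lowers EBIC."""
    return best_candidate >= current


# Hypothetical log partial likelihoods for the empty model and a one-feature model.
e0 = ebic(-100.0, k=0, n=50, p=1000)
e1 = ebic(-90.0, k=1, n=50, p=1000)
print(round(e0, 3), round(e1, 3), should_stop(e0, e1))
```

The \(\log\binom{p}{k}\) term is what distinguishes EBIC from ordinary BIC: it grows with the size of the candidate pool, which is why it suits high-dimensional selection.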

Abstract: This paper proposes a blockwise network autoregressive (BWNAR) model that groups the nodes of a network into nonoverlapping blocks to accommodate networks with blockwise structure. Before modeling, we employ the pseudo-likelihood ratio (pseudo-LR) criterion together with the standard spectral clustering approach and a binary segmentation method developed by Ma et al. (Journal of Machine Learning Research, 22, 1–63, 2021) to estimate the number of blocks and their memberships, respectively. We then establish the consistency and asymptotic normality of the quasi-maximum likelihood estimator of the influence parameters without imposing any distributional assumptions. In addition, a novel likelihood ratio test statistic is proposed to test for heterogeneity in the influence parameters. The performance and usefulness of the model are assessed through simulations and an empirical example on fraud detection in financial transactions, respectively. PubDate: 2022-12-01
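
A minimal sketch of one conditional-mean step, assuming a blockwise version of the usual network autoregression form \(Y_{it} = \beta_{0b} + \beta_{1b}\, n_i^{-1}\sum_j a_{ij} Y_{j,t-1} + \beta_{2b} Y_{i,t-1} + \varepsilon_{it}\), where \(b\) is the block of node \(i\), \(a_{ij}\) the adjacency matrix and \(n_i\) the out-degree of \(i\). The paper's exact specification may differ, block estimation and the quasi-maximum likelihood fit are not shown, and `bwnar_step` is a hypothetical name.

```python
def bwnar_step(y_prev, adj, block, beta):
    """Conditional mean of Y_t given Y_{t-1} under an assumed blockwise NAR form.

    y_prev: responses at time t-1; adj: 0/1 adjacency matrix (adj[i][j] = 1 if
    i follows j); block: block label of each node; beta[b] = (intercept,
    network effect, momentum effect) for block b.
    """
    n = len(y_prev)
    mean = []
    for i in range(n):
        out_deg = sum(adj[i])
        # Row-normalised network term: average lagged response of i's neighbours.
        net = sum(adj[i][j] * y_prev[j] for j in range(n)) / out_deg if out_deg else 0.0
        b0, b1, b2 = beta[block[i]]
        mean.append(b0 + b1 * net + b2 * y_prev[i])
    return mean


adj = [[0, 1, 1], [1, 0, 0], [0, 0, 0]]
pred = bwnar_step([1.0, 2.0, 4.0], adj, block=[0, 0, 1],
                  beta={0: (0.0, 0.5, 0.1), 1: (1.0, 0.2, 0.3)})
print(pred)
```

Allowing each block its own coefficient triple is what makes heterogeneity testable: equal triples across blocks reduce the sketch to an ordinary network autoregression.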

Abstract: We consider a linear mixed-effects model with a clustered structure, where the parameters are estimated by maximum likelihood (ML) from possibly unbalanced data. Inference with this model is typically based on asymptotic theory, assuming that the number of clusters tends to infinity with the sample size. However, when the number of clusters is fixed, classical asymptotic theory developed under a divergent number of clusters is no longer valid and can lead to erroneous conclusions. In this paper, we establish the asymptotic properties of the ML estimators of random-effects parameters under a general setting, which can be used to conduct valid statistical inference with a fixed number of clusters. Our asymptotic theorems allow both the fixed effects and the random effects to be misspecified, and the dimensions of both effects to go to infinity with the sample size. PubDate: 2022-12-01
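
To make the fixed-number-of-clusters issue concrete, the sketch computes the closed-form ML estimates in the simplest case, a balanced one-way random-effects model \(y_{ij} = \mu + a_i + e_{ij}\) with \(m\) clusters of size \(n\); this restriction is an assumption, as the paper covers unbalanced, misspecified, and high-dimensional settings. As \(n\) grows with \(m\) fixed, \(\hat\sigma_a^2\) tends to \(m^{-1}\sum_i (a_i - \bar a)^2\), which remains random, so the usual large-\(m\) approximations do not apply.

```python
def ml_one_way(groups: list[list[float]]) -> tuple[float, float]:
    """ML estimates (sigma_a^2, sigma_e^2) for a balanced one-way random-effects model."""
    m, n = len(groups), len(groups[0])
    if n < 2 or any(len(g) != n for g in groups):
        raise ValueError("this closed form assumes a balanced design with n >= 2")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / m
    ssw = sum((y - mb) ** 2 for g, mb in zip(groups, means) for y in g)  # within clusters
    ssb = n * sum((mb - grand) ** 2 for mb in means)                    # between clusters
    sigma_e2 = ssw / (m * (n - 1))
    if ssb / m >= sigma_e2:
        return (ssb / m - sigma_e2) / n, sigma_e2
    # Boundary case: the likelihood is maximized at sigma_a^2 = 0.
    return 0.0, (ssw + ssb) / (m * n)


print(ml_one_way([[1.0, 3.0], [5.0, 7.0]]))
```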

Abstract: Image classifiers based on convolutional neural networks are defined, and the rate of convergence of the misclassification risk of the estimates towards the optimal misclassification risk is analyzed. Under suitable assumptions on the smoothness and structure of the a posteriori probability, a rate of convergence is derived that is independent of the dimension of the image. This proves that in image classification it is possible to circumvent the curse of dimensionality with convolutional neural networks. Furthermore, the result gives an indication of why convolutional neural networks are able to outperform standard feedforward neural networks in image classification. Our classifiers are compared with various other classification methods on simulated data. In addition, the performance of our estimates is tested on real images. PubDate: 2022-12-01
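
The misclassification risk being analyzed is that of a classifier built from an estimate of the a posteriori probability \(\eta(x) = P(Y = 1 \mid X = x)\). A minimal sketch of the standard plug-in construction and its empirical risk follows; that the paper thresholds at exactly \(1/2\) is an assumption, and the toy `lambda x: x` merely stands in for a trained convolutional network.

```python
from typing import Callable, Sequence


def plug_in_classifier(eta_hat: Callable[[object], float]) -> Callable[[object], int]:
    """Turn an estimate of P(Y = 1 | X = x) into a 0/1 classifier (threshold 1/2)."""
    return lambda x: 1 if eta_hat(x) >= 0.5 else 0


def empirical_risk(classify: Callable[[object], int],
                   xs: Sequence[object], ys: Sequence[int]) -> float:
    """Fraction of misclassified examples: an estimate of P(classify(X) != Y)."""
    return sum(classify(x) != y for x, y in zip(xs, ys)) / len(ys)


# Toy stand-in for a trained network: each 'image' is a single number here.
clf = plug_in_classifier(lambda x: x)
print(empirical_risk(clf, [0.1, 0.6, 0.9, 0.4], [0, 1, 0, 0]))
```

With the true \(\eta\) in place of `eta_hat`, this rule attains the optimal (Bayes) misclassification risk, which is the benchmark the convergence rate is measured against.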

Abstract: For \(1 \le i \le r\), let \(F_i\) be the cumulative incidence function (CIF) corresponding to the \(i\)th risk in an \(r\)-competing risks model. We assume a discrete or grouped time framework and obtain the maximum likelihood estimators (m.l.e.) of these CIFs under the restriction that \(F_i(t)/F_{i+1}(t)\) is nondecreasing for \(1 \le i \le r-1\). We also derive likelihood ratio tests for testing for and against this restriction and obtain their asymptotic distributions. The theory developed here can also be used to investigate the association between a failure time and a discretized or ordinal mark variable that is observed only at the time of failure. To illustrate the applicability of our results, we give examples in the competing risks and the mark variable settings. PubDate: 2022-12-01
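
A minimal sketch, assuming no censoring, of the unrestricted empirical CIFs on a discrete time grid, \(\hat F_i(s) = n^{-1}\#\{T \le s,\ \text{cause } i\}\), together with a check of whether \(\hat F_i/\hat F_{i+1}\) is nondecreasing. When the check fails, the restricted m.l.e. is needed; that computation is not shown, and both function names are hypothetical.

```python
def empirical_cifs(times, causes, r, grid):
    """Unrestricted empirical CIFs F_i(s) = #{T <= s, cause i} / n (no censoring)."""
    n = len(times)
    return {
        i: [sum(1 for t, c in zip(times, causes) if t <= s and c == i) / n for s in grid]
        for i in range(1, r + 1)
    }


def ratio_nondecreasing(f_num, f_den, tol=1e-12):
    """Check whether F_num(t) / F_den(t) is nondecreasing where F_den(t) > 0."""
    ratios = [a / b for a, b in zip(f_num, f_den) if b > 0]
    return all(x <= y + tol for x, y in zip(ratios, ratios[1:]))


cifs = empirical_cifs([1, 2, 3], [1, 2, 2], r=2, grid=[1, 2, 3])
print(cifs, ratio_nondecreasing(cifs[1], cifs[2]))
```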

Abstract: For the class of Gauss–Markov processes, we study the asymptotic equivalence of the nonparametric regression model with errors given by the increments of the process and the continuous-time model in which a whole path of the sum of a deterministic signal and the Gauss–Markov process is observed. We derive sufficient conditions which imply asymptotic equivalence of the two models. We verify these conditions for the special cases of Sobolev ellipsoids and Hölder classes with smoothness index \(>1/2\) under mild assumptions on the Gauss–Markov process. As a counterexample, we show that asymptotic equivalence fails to hold for the special case of the Brownian bridge. Our findings demonstrate that the well-known asymptotic equivalence of the Gaussian white noise model and the nonparametric regression model with i.i.d. standard normal errors (see Brown and Low (Ann Stat 24:2384–2398, 1996)) can be extended to a setup with general Gauss–Markov noise. PubDate: 2022-12-01
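
To make the two noise types concrete, the sketch simulates discretized paths of Brownian motion and of the Brownian bridge on a grid of \(n\) points and returns their increments, which play the role of the regression errors in the discrete model. The grid, the scaling, and the helper names are assumptions. Unlike Brownian-motion increments, the bridge increments always sum to zero because \(B(0) = B(1) = 0\).

```python
import random


def brownian_motion(n: int, rng: random.Random) -> list[float]:
    """W on the grid k/n, k = 0..n, built from independent N(0, 1/n) increments."""
    w = [0.0]
    for _ in range(n):
        w.append(w[-1] + rng.gauss(0.0, (1.0 / n) ** 0.5))
    return w


def brownian_bridge(n: int, rng: random.Random) -> list[float]:
    """B(t) = W(t) - t W(1), a standard construction of the Brownian bridge."""
    w = brownian_motion(n, rng)
    return [w[k] - (k / n) * w[n] for k in range(n + 1)]


def increments(path: list[float]) -> list[float]:
    return [b - a for a, b in zip(path, path[1:])]


rng = random.Random(0)
bridge_errors = increments(brownian_bridge(100, rng))
print(len(bridge_errors), sum(bridge_errors))
```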

Abstract: In this paper, we consider possibly misspecified stochastic differential equation models driven by Lévy processes. Regardless of whether the driving noise is Gaussian, the Gaussian quasi-likelihood estimator can estimate the unknown parameters in the drift and scale coefficients. However, in the misspecified case, the asymptotic distribution of the estimator is affected by the correction for the misspecification bias, and the consistent estimators of the asymptotic variance proposed for the correctly specified case may lose theoretical validity. As one solution, we propose a bootstrap method for approximating the asymptotic distribution. We show that our bootstrap method works theoretically in both the correctly specified and the misspecified cases without assuming the precise distribution of the driving noise. PubDate: 2022-11-10
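
A minimal sketch of a Gaussian quasi-likelihood fit, assuming the hypothetical Ornstein–Uhlenbeck-type model \(dX_t = -\theta X_t\,dt + \sigma\,dZ_t\) observed at step \(h\). Maximizing the Euler-discretized Gaussian quasi-likelihood gives closed forms for \(\theta\) and \(\sigma^2\). Gaussian increments are used in the simulation only for convenience; the paper's Lévy-driven setting and its bootstrap procedure are not implemented here.

```python
import random


def simulate_ou(theta, sigma, h, n, rng, x0=0.0):
    """Euler scheme for dX = -theta X dt + sigma dW (Gaussian noise for simplicity)."""
    xs = [x0]
    for _ in range(n):
        xs.append(xs[-1] - theta * xs[-1] * h + sigma * rng.gauss(0.0, h ** 0.5))
    return xs


def gql_ou(xs, h):
    """Maximise the Euler-discretised Gaussian quasi-likelihood in (theta, sigma^2)."""
    dx = [b - a for a, b in zip(xs, xs[1:])]
    lag = xs[:-1]
    theta_hat = -sum(x * d for x, d in zip(lag, dx)) / (h * sum(x * x for x in lag))
    resid = [d + theta_hat * x * h for x, d in zip(lag, dx)]
    sigma2_hat = sum(e * e for e in resid) / (len(dx) * h)
    return theta_hat, sigma2_hat


xs = simulate_ou(theta=1.0, sigma=0.5, h=0.01, n=50_000, rng=random.Random(1))
print(gql_ou(xs, h=0.01))
```

The drift estimate does not depend on \(\sigma\) because the scale coefficient is constant here; with a state-dependent scale, the two parameters would have to be estimated jointly.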

Abstract: In this paper, we propose tests for the existence of random effects and interactions in two-way models with dependent errors. We prove that the proposed tests are asymptotically distribution-free, have asymptotic size \(\tau\), and are consistent. We elucidate the nontrivial power under the local alternative when the sample size tends to infinity and the number of groups is fixed. A simulation study is performed to investigate the finite-sample performance of the proposed tests. In the real data analysis, we apply our tests to the daily log-returns of 24 stock prices from six countries and four sectors. We find no strong evidence of substantial differences in log-returns across countries, nor of interactions between countries and sectors. However, there are random-effect differences in the daily log-return series across sectors. PubDate: 2022-10-31

Abstract: In recent years, many methodologies for distributed data have been developed. However, they have two problems. First, most of these methods require the data to be randomly and uniformly distributed across different machines. Second, most of them are not robust. To solve these problems, we propose a distributed pilot modal regression (MR) estimator, which is robust and adapts to data that are stored nonrandomly. First, we collect a random pilot sample from the different machines; then, we approximate the global MR objective function by a communication-efficient surrogate that can be efficiently evaluated using the pilot sample and the local gradients. The final estimator is obtained by minimizing the surrogate function on the master machine, while the other machines only need to compute their gradients. Theoretical results show that the new estimator is asymptotically as efficient as the global MR estimator. Simulation studies illustrate the utility of the proposed approach. PubDate: 2022-10-12
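
The robustness of modal regression (MR) comes from maximizing a kernel-smoothed objective rather than minimizing squared error. As a rough single-machine illustration, the sketch treats the simplest case, a location parameter, maximizing \(\sum_i \phi_h(y_i - \beta)\) with a Gaussian kernel via the standard mean-shift (EM-type) iteration. The bandwidth, the median starting value, and the function name are assumptions; the pilot sampling and the communication-efficient surrogate are not shown.

```python
import math


def modal_location(y, h, iters=500, tol=1e-10):
    """Mode of a Gaussian-kernel density estimate via mean-shift (EM-type) updates."""
    beta = sorted(y)[len(y) // 2]  # start from the sample median
    for _ in range(iters):
        w = [math.exp(-0.5 * ((yi - beta) / h) ** 2) for yi in y]
        total = sum(w)
        if total == 0.0:  # all weights underflowed; h is too small for this start
            break
        new = sum(wi * yi for wi, yi in zip(w, y)) / total
        if abs(new - beta) < tol:
            return new
        beta = new
    return beta


data = [0.0, 0.1, -0.1, 0.05, -0.05, 100.0, 120.0]
print(modal_location(data, h=1.0), sum(data) / len(data))
```

The two gross outliers receive essentially zero weight, so the modal estimate stays near 0 while the sample mean is pulled above 30.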

Abstract: Motivated by the complexity of network data, we propose a directed hybrid random network that mixes preferential attachment (PA) rules with uniform attachment rules. When a new edge is created, it follows the PA rule with probability \(p\in (0,1)\); otherwise, the new edge is added between two uniformly chosen nodes. Such a mixture makes the in- and out-degrees of a fixed node grow at a slower rate than in the pure PA case, thus leading to lighter distributional tails. For estimation and inference, we develop two numerical methods, which are applied to both synthetic and real network data. We see that, with the extra flexibility given by the parameter \(p\), the hybrid random network provides a better fit in real-world scenarios where lighter tails in the in- and out-degree distributions are observed. PubDate: 2022-10-01 DOI: 10.1007/s10463-022-00827-5
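
A simplified simulation of the hybrid attachment rule, under assumptions the abstract does not state: each step adds one new node and one directed edge; under the PA rule the source is drawn with probability proportional to out-degree \(+\,\delta\) and the target proportional to in-degree \(+\,\delta\) (the offset \(\delta\) keeps zero-degree nodes reachable); self-loops are allowed. The paper's exact attachment scenarios and parameterization may differ, and `hybrid_network` is a hypothetical name.

```python
import random


def hybrid_network(p, steps, rng, delta=1.0):
    """Grow a directed network mixing PA (prob. p) and uniform attachment.

    Hypothetical growth scheme: every step adds one node and one directed edge.
    Returns the edge list and the in-/out-degree sequences.
    """
    edges = [(0, 1)]
    in_deg, out_deg = [0, 1], [1, 0]
    for _ in range(steps):
        in_deg.append(0)
        out_deg.append(0)
        nodes = range(len(in_deg))
        if rng.random() < p:
            # PA rule: weights are degree + delta so zero-degree nodes can be chosen.
            src = rng.choices(nodes, weights=[d + delta for d in out_deg])[0]
            dst = rng.choices(nodes, weights=[d + delta for d in in_deg])[0]
        else:
            # Uniform rule: both endpoints drawn uniformly (self-loops allowed).
            src, dst = rng.randrange(len(in_deg)), rng.randrange(len(in_deg))
        edges.append((src, dst))
        out_deg[src] += 1
        in_deg[dst] += 1
    return edges, in_deg, out_deg


edges, in_deg, out_deg = hybrid_network(p=0.7, steps=500, rng=random.Random(42))
print(max(in_deg), max(out_deg))
```

Setting `p=1.0` recovers a pure PA sketch and `p=0.0` a purely uniform one, which gives a quick way to compare the heaviness of the resulting degree tails.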

Abstract: In this paper, we study generalized semi-Markov high-dimensional regression models in continuous time, observed at fixed discrete time moments. The generalized semi-Markov process has dependent jumps and is therefore an extension of the semi-Markov regression introduced in Barbu et al. (Stat Inference Stoch Process 22:187–231, 2019a). For such models, we consider estimation problems in a nonparametric setting. To this end, we develop model selection procedures for which sharp non-asymptotic oracle inequalities for the robust risks are obtained. Moreover, we give constructive sufficient conditions which, through the obtained oracle inequalities, provide the adaptive robust efficiency property in the minimax sense. Note also that these results use neither sparsity conditions nor the parameter dimension of the model. As examples, regression models constructed through spherically symmetric noise impulses and truncated fractional Poisson processes are considered. Numerical Monte Carlo simulations confirming the theoretical results are given in the supplementary materials. PubDate: 2022-10-01 DOI: 10.1007/s10463-022-00820-y

Abstract: Some quasi-arithmetic means of random variables easily give unbiased, strongly consistent, closed-form joint estimators of the location and scale parameters of the Cauchy distribution. One-step estimators based on those quasi-arithmetic means of the Cauchy distribution are considered. We establish the Bahadur efficiency of the maximum likelihood estimator and the one-step estimators. We also show that the rate of convergence of the mean-squared errors achieves the Cramér–Rao bound. Our results are also applicable to the circular Cauchy distribution. PubDate: 2022-10-01 DOI: 10.1007/s10463-021-00818-y
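
One concrete quasi-arithmetic mean for the Cauchy distribution uses the complex generator \(f(x) = 1/(x + i)\). Writing \(\theta = \mu + i\sigma\), the Cauchy density is the Poisson kernel of the upper half-plane, and \(1/(z + i)\) is bounded and analytic there, so \(E[1/(X + i)] = 1/(\theta + i)\). The sample mean of \(f(X_j)\) is therefore unbiased for \(1/(\theta + i)\), and inverting \(f\) gives a strongly consistent estimator of \((\mu, \sigma)\) by the strong law of large numbers. Whether this particular generator is among those used in the paper is an assumption, and the one-step refinement is not shown.

```python
import math
import random


def sample_cauchy(mu, sigma, n, rng):
    """Inverse-CDF sampling from Cauchy(mu, sigma)."""
    return [mu + sigma * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def cauchy_qam(xs):
    """Quasi-arithmetic mean with generator f(x) = 1 / (x + i).

    The sample mean w estimates 1 / (theta + i); inverting f gives
    theta_hat = 1 / w - i, whose real and imaginary parts estimate (mu, sigma).
    """
    w = sum(1.0 / complex(x, 1.0) for x in xs) / len(xs)
    theta_hat = 1.0 / w - 1j
    return theta_hat.real, theta_hat.imag


xs = sample_cauchy(mu=1.0, sigma=2.0, n=200_000, rng=random.Random(7))
print(cauchy_qam(xs))
```

Because \(|1/(x + i)| \le 1\), the averaged terms are bounded, so the estimator is unaffected by the heavy Cauchy tails that make the ordinary sample mean useless.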

Abstract: Statistical analysis of large-scale datasets is challenging because of limited memory and computational resources, and calls for efficient distributed methods. In this paper, we mainly study distributed estimation and inference for composite quantile regression (CQR). For computational and statistical efficiency, we propose to apply a smoothing idea to the CQR loss function for distributed data and then successively refine the estimator via multiple rounds of aggregation. Based on the Bahadur representation, we derive the asymptotic normality of the proposed multi-round smoothed CQR estimator and show that it achieves the same efficiency as the ideal CQR estimator obtained by analyzing the entire dataset simultaneously. Moreover, to improve the efficiency of the CQR, we propose a multi-round smoothed weighted CQR estimator. Extensive numerical experiments on both simulated and real data validate the superior performance of the proposed estimators. PubDate: 2022-10-01 DOI: 10.1007/s10463-021-00816-0
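
One common way to smooth the check loss \(\rho_\tau(u) = u(\tau - \mathbf{1}\{u < 0\})\) is to convolve it with a kernel. With a Gaussian kernel of bandwidth \(h\), the smoothed loss has the closed form \(\ell_h(u) = u\{\tau - \Phi(-u/h)\} + h\,\phi(u/h)\), which tends to \(\rho_\tau(u)\) as \(h \to 0\) and is differentiable everywhere. The sketch uses it to evaluate the CQR objective \(\sum_k \sum_i \ell_h(y_i - b_k - x_i\beta)\) for a scalar covariate. The Gaussian kernel is an assumption, and the distributed multi-round refinement is not shown.

```python
import math


def check_loss(u: float, tau: float) -> float:
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def smoothed_check_loss(u: float, tau: float, h: float) -> float:
    """Check loss convolved with a N(0, h^2) kernel (closed form)."""
    z = u / h
    cdf_neg = 0.5 * (1.0 + math.erf(-z / math.sqrt(2.0)))   # Phi(-u/h)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # phi(u/h)
    return u * (tau - cdf_neg) + h * pdf


def cqr_objective(beta, intercepts, xs, ys, taus, h=None):
    """Composite quantile loss; smoothed when a bandwidth h is given."""
    total = 0.0
    for b_k, tau in zip(intercepts, taus):
        for x, y in zip(xs, ys):
            r = y - b_k - beta * x
            total += check_loss(r, tau) if h is None else smoothed_check_loss(r, tau, h)
    return total


print(smoothed_check_loss(0.0, 0.3, 0.5), check_loss(2.0, 0.3))
```

CQR shares the slope \(\beta\) across all quantile levels while giving each level \(\tau_k\) its own intercept \(b_k\), which is where its efficiency gain over a single quantile regression comes from.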

Abstract: This research systematically investigates the following issues. First, we construct different outcome regression-based estimators of the conditional average treatment effect under the true, parametric, nonparametric, and semiparametric dimension reduction structures, respectively. Second, based on the corresponding asymptotic variance functions when the models are assumed to be correctly specified, we answer the following questions: What is the asymptotic efficiency ranking of the four estimators in general? How is the efficiency related to whether the given covariates belong to the set of arguments of the regression functions? What roles do bandwidth and kernel function selection play in the estimation efficiency? And in which scenarios should the estimator under the semiparametric dimension reduction regression structure be used in practice? The results also show that any outcome regression-based estimator should be asymptotically more efficient than any inverse probability weighting-based estimator. Several simulation studies examine the finite-sample performance of these estimators, and a real dataset is analyzed for illustration. PubDate: 2022-10-01 DOI: 10.1007/s10463-022-00821-x
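
As an illustration of a nonparametric outcome regression-based estimator, the sketch fits a Nadaraya–Watson regression separately in the treated and control groups and takes the difference, \(\hat\tau(x) = \hat m_1(x) - \hat m_0(x)\). The scalar covariate, the Gaussian kernel, and the bandwidth are assumptions; the parametric and dimension-reduction variants compared in the paper are not shown.

```python
import math


def nadaraya_watson(x0, xs, ys, h):
    """Gaussian-kernel Nadaraya-Watson estimate of E[Y | X = x0]."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


def cate_outcome_regression(x0, xs, ys, treated, h):
    """tau_hat(x0) = m1_hat(x0) - m0_hat(x0), fitting each arm separately."""
    arm1 = [(x, y) for x, y, d in zip(xs, ys, treated) if d]
    arm0 = [(x, y) for x, y, d in zip(xs, ys, treated) if not d]
    m1 = nadaraya_watson(x0, [x for x, _ in arm1], [y for _, y in arm1], h)
    m0 = nadaraya_watson(x0, [x for x, _ in arm0], [y for _, y in arm0], h)
    return m1 - m0


grid = [0.0, 0.25, 0.5, 0.75, 1.0]
xs = grid + grid
treated = [1] * 5 + [0] * 5
ys = [x + 2.0 for x in grid] + list(grid)  # true effect is 2 everywhere
print(cate_outcome_regression(0.5, xs, ys, treated, h=0.2))
```

The bandwidth `h` controls the bias-variance trade-off of each arm's fit, which is one way the bandwidth enters the efficiency comparisons the abstract raises.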

Abstract: In this work, we propose nonparametric two-sample tests for population-averaged transition and state occupation probabilities for continuous-time, finite state space processes with clustered, right-censored, and/or left-truncated data. We consider settings where the two groups under comparison are independent or dependent, with or without complete cluster structure. The proposed tests do not impose assumptions on the structure of the within-cluster dependence and are applicable to settings with informative cluster size and/or non-Markov processes. The asymptotic properties of the tests are rigorously established using empirical process theory. Simulation studies show that the proposed tests work well even with a small number of clusters, and that they can be substantially more powerful than the only previously proposed nonparametric test for this problem, to the best of our knowledge. The tests are illustrated using data from a multicenter randomized controlled trial on metastatic squamous-cell carcinoma of the head and neck. PubDate: 2022-10-01 DOI: 10.1007/s10463-021-00819-x
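
When cluster size is informative, a cluster-averaged estimand can differ from an individual-averaged one. A minimal sketch, assuming complete (uncensored, untruncated) observation of each subject's state at a fixed time \(t\), contrasts weighting every cluster equally (each subject weighted by the inverse of its cluster size) with weighting every subject equally. The data layout and function name are hypothetical, and whether this weighting matches the paper's population-averaged estimand is an assumption; the censoring and truncation adjustments and the test statistics are not shown.

```python
def occupation_probability(clusters, state, cluster_weighted=True):
    """P(X(t) = state) from the states observed at a fixed time t.

    clusters: list of clusters, each a list of subjects' states at time t.
    cluster_weighted=True gives each cluster equal weight (each subject weighted
    by 1 / cluster size); False gives each subject equal weight.
    """
    if cluster_weighted:
        per_cluster = [sum(s == state for s in c) / len(c) for c in clusters]
        return sum(per_cluster) / len(clusters)
    subjects = [s for c in clusters for s in c]
    return sum(s == state for s in subjects) / len(subjects)


data = [["A", "A", "A", "A"], ["B"]]  # one large cluster, one singleton
print(occupation_probability(data, "A"), occupation_probability(data, "A", False))
```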

Abstract: Multivariate nonresponse is often encountered in complex survey sampling, and simply ignoring it leads to erroneous inference. In this paper, we propose a new matrix completion method for complex survey sampling. Unlike existing works, which conduct either row-wise or column-wise imputation, our method treats the data matrix as a whole, which allows both row and column patterns to be exploited simultaneously. We adopt a column-space-decomposition model that incorporates a low-rank structured matrix for the finite population, with easy-to-obtain demographic information as covariates. In addition, we propose a computationally efficient projection strategy to identify the model parameters under complex survey sampling. Then, an augmented inverse probability weighting estimator is used to estimate the parameter of interest, and the corresponding asymptotic upper bound of the estimation error is derived. Simulation studies show that the proposed estimator has a smaller mean squared error than its competitors, and that the corresponding variance estimator performs well. The proposed method is applied to assess the health status of the U.S. population. PubDate: 2022-09-19 DOI: 10.1007/s10463-022-00851-5
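
The augmented inverse probability weighting (AIPW) step can be sketched for a finite-population mean. With design weights \(d_i\), response indicators \(r_i\), estimated response probabilities \(\hat p_i\), and model predictions \(\hat m_i\) (here imagined as entries of the completed matrix), the estimator is \(\hat\mu = N^{-1}\sum_{i\in S} d_i\{\hat m_i + r_i(y_i - \hat m_i)/\hat p_i\}\). This is the generic AIPW form; all inputs are hypothetical, and how the paper obtains \(\hat m_i\) and \(\hat p_i\) is not reproduced.

```python
def aipw_mean(d, r, y, m_hat, p_hat, N):
    """AIPW estimate of a finite-population mean.

    d: design weights; r: 1 if unit responded else 0; y: observed values (ignored
    for nonrespondents); m_hat: model predictions; p_hat: estimated response
    probabilities; N: population size.
    """
    total = 0.0
    for di, ri, yi, mi, pi in zip(d, r, y, m_hat, p_hat):
        residual_term = (yi - mi) / pi if ri else 0.0
        total += di * (mi + residual_term)
    return total / N


est = aipw_mean(d=[2, 2, 2], r=[1, 0, 1], y=[3.0, None, 5.0],
                m_hat=[2.0, 4.0, 5.0], p_hat=[0.5, 0.5, 0.5], N=6)
print(est)
```

The residual term corrects the model predictions using respondents only, so the estimator remains consistent if either the outcome model or the response model is correct.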

Abstract: A general class of inhomogeneous hidden semi-Markov models (IHSMMs) is proposed for modelling partially observed processes that do not necessarily behave in a stationary and memoryless manner. The key feature of the proposed model is that the sojourn times of the states in the semi-Markov chain are time-dependent, making it an inhomogeneous semi-Markov chain. The conjectured consistency of the parameter estimators is checked by a simulation study using direct numerical optimization of the log-likelihood function. The proposed models are applied to a global volcanic eruption catalogue to investigate the time-dependent incompleteness of the record by introducing a particular case of IHSMMs with time-dependent shifted Poisson state durations and a renewal process as the observed process. The Akaike Information Criterion and residual analysis are used to choose the best model. The selected IHSMM provides useful insights into the completeness of the global record of volcanic eruptions, demonstrating the effectiveness of this method. PubDate: 2022-09-18 DOI: 10.1007/s10463-022-00843-5
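
A sketch of the hidden layer only: state durations follow a shifted Poisson law, \(P(D = d) = e^{-\lambda}\lambda^{d-1}/(d-1)!\) for \(d \ge 1\), whose rate depends on the state and on the entry time. The form of `lam_fn`, the transition weights, and all helper names are hypothetical; the observed renewal process and likelihood fitting are not shown.

```python
import math
import random


def shifted_poisson_pmf(d: int, lam: float) -> float:
    """P(D = d) for D = 1 + Poisson(lam), so durations are at least one time unit."""
    if d < 1:
        return 0.0
    return math.exp(-lam) * lam ** (d - 1) / math.factorial(d - 1)


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplicative method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_sojourns(horizon, lam_fn, trans, start, rng):
    """Hidden state path as (state, entry time, duration) triples.

    lam_fn(state, t) gives a time-dependent shifted-Poisson rate (hypothetical form);
    trans[s] holds the embedded-chain transition weights out of state s.
    """
    t, state, path = 0, start, []
    while t < horizon:
        d = 1 + sample_poisson(lam_fn(state, t), rng)
        path.append((state, t, d))
        t += d
        state = rng.choices(range(len(trans)), weights=trans[state])[0]
    return path


rng = random.Random(3)
path = simulate_sojourns(50, lambda s, t: 2.0 + 0.05 * t if s == 0 else 4.0,
                         trans=[[0.0, 1.0], [1.0, 0.0]], start=0, rng=rng)
print(path[:3])
```

Because the rate is evaluated at the entry time of each sojourn, durations in state 0 lengthen as time goes on, which is the kind of inhomogeneity a time-homogeneous semi-Markov chain cannot express.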

Abstract: Over the last four decades, cluster regression analysis in a finite population (FP) setup for exponential family data, such as linear or binary data, has been done using a two-stage cluster sample chosen from the FP, but treating the sample as though it were a single-stage cluster sample from a super-population (SP) that contains the FP as a hypothetical sample. Because the responses within a cluster in the FP are correlated, this sampling misspecification makes the sample-based, so-called GLS (generalized least squares) estimators design-biased and inconsistent. In this paper, we demonstrate for exponential family data how to avoid the sampling misspecification and accommodate the cluster correlations to obtain unbiased and consistent estimates of the FP parameters. The asymptotic normality of the regression estimators is also given, for constructing confidence intervals when needed. PubDate: 2022-09-14 DOI: 10.1007/s10463-022-00850-6
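
For contrast with treating the sample as single-stage, the sketch shows the standard unbiased expansion estimator of a finite-population total under two-stage sampling with simple random sampling without replacement at both stages: \(\hat Y = (M/m)\sum_{i=1}^{m} N_i \bar y_i\), where \(M\) is the number of clusters in the population, \(m\) the number sampled, \(N_i\) the size of cluster \(i\), and \(\bar y_i\) its sample mean. It illustrates design-based weighting only; the paper's regression estimators are not shown, and the function name is hypothetical.

```python
def two_stage_total(M, sampled_clusters):
    """Unbiased total under two-stage SRSWOR.

    M: number of clusters in the population.
    sampled_clusters: list of (N_i, sampled responses in cluster i).
    """
    m = len(sampled_clusters)
    return (M / m) * sum(N_i * sum(ys) / len(ys) for N_i, ys in sampled_clusters)


# A census at both stages recovers the true total exactly.
census = two_stage_total(2, [(2, [1.0, 3.0]), (3, [2.0, 2.0, 2.0])])
print(census)
```

Each stage contributes its own expansion factor (\(M/m\) for clusters, \(N_i/n_i\) within clusters), which is the design information lost when a two-stage sample is analyzed as if it were single-stage.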

Abstract: We propose a general model that jointly characterizes degree heterogeneity and homophily in weighted, undirected networks. We present a moment estimation method using node degrees and homophily statistics. We establish the consistency and asymptotic normality of our estimator using a novel analysis. We apply our general framework to three applications, including both exponential family and non-exponential family models. Comprehensive numerical studies and a data example also demonstrate the usefulness of our method. PubDate: 2022-09-02 DOI: 10.1007/s10463-022-00848-0

Abstract: In this paper, we consider conditional selective inference (SI) for a linear model estimated after outliers are removed from the data. To apply the conditional SI framework, it is necessary to characterize the events of how the robust method identifies outliers. Unfortunately, existing conditional SI methods cannot be directly applied to our problem because they are applicable only when the selection events can be represented by linear or quadratic constraints. We propose a conditional SI method for popular robust regressions, such as least-absolute-deviation regression and Huber regression, by introducing a new computational method using a convex optimization technique called the homotopy method. We show that the proposed conditional SI method is applicable to a wide class of robust regression and outlier detection methods and has good empirical performance in experiments on both synthetic and real data. PubDate: 2022-08-27 DOI: 10.1007/s10463-022-00846-2
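
In conditional SI, the test statistic \(\eta^\top y\), conditioned on the selection event, follows a normal distribution truncated to some region, which in homotopy-type approaches can be a union of intervals. Given that region, the selective p-value is a truncated-normal tail probability. A minimal sketch, assuming a known noise scale \(\sigma\) and a region already computed; computing the region itself (the paper's homotopy step) is not shown.

```python
import math


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def selective_p_value(z, sigma, intervals):
    """Two-sided selective p-value for H0: eta^T mu = 0.

    Under H0 the statistic eta^T y ~ N(0, sigma^2); conditioning on the selection
    event truncates it to `intervals`, a list of disjoint (a, b) pairs that must
    contain the observed value z.
    """
    def mass(a, b):
        return norm_cdf(b / sigma) - norm_cdf(a / sigma)

    total = sum(mass(a, b) for a, b in intervals)
    below = sum(mass(a, min(b, z)) for a, b in intervals if a < z)
    cdf = below / total
    return 2.0 * min(cdf, 1.0 - cdf)


naive = selective_p_value(1.96, 1.0, [(-math.inf, math.inf)])
truncated = selective_p_value(1.96, 1.0, [(1.0, math.inf)])
print(round(naive, 4), round(truncated, 4))
```

With no truncation, the sketch reduces to the ordinary two-sided z-test (about 0.05 at \(z = 1.96\)). Conditioning on having selected a region that excludes small values inflates the p-value, which is how selective inference accounts for the data having been used to choose the model.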