- Extended Hotelling \(T^2\) test in distributed frameworks
Abstract: Hypothesis testing for a mean vector is a classical problem in data analysis but has been highly underinvestigated in distributed frameworks where samples of size n are located on k local sites. This paper focuses on the one-sample mean test, proposing synthesized test statistics with a much lower communication cost than the centralized Hotelling \(T^2\) test. For the homogeneous case, where data on different local sites are independent and identically distributed, the efficiency of our proposed test is comparable to that of the centralized one, and much better than that of the test constructed from the divide and conquer method. In addition, three heterogeneous cases are considered, where the distributions of the data on local sites can be different. Heterogeneous cases are much more challenging because the local sample means and covariance matrices may be inconsistent estimators. We construct communication-efficient testing procedures for heterogeneous cases, and the power of the proposed test statistics is comparable to that of the centralized one under some conditions. Simulation results verify the effectiveness of the proposed testing procedures.
PubDate: 2024-07-30
- Optimal subsampling for \(L_p\)-quantile regression via decorrelated score
Abstract: To balance the robustness of quantile regression and the effectiveness of expectile regression, we consider \(L_p\)-quantile regression models with large-scale data and develop a unified optimal subsampling method to downsize the data volume and reduce the computational burden. For low-dimensional \(L_p\)-quantile regression models, two optimal subsampling probabilities based on the A- and L-optimality criteria are first proposed. For the preconceived low-dimensional parameter in high-dimensional \(L_p\)-quantile regression models, a novel optimal subsampling decorrelated score function is proposed to mitigate the effect of nuisance parameter estimation, and then two optimal decorrelated score subsampling probabilities are provided. The asymptotic properties of the two optimal subsample estimators are established. The finite-sample performance of the proposed estimators is studied through simulations, and an application to the Beijing Air Quality Dataset is also presented.
PubDate: 2024-07-21
- Oracle-efficient M-estimation for single-index models with a smooth simultaneous confidence band
Abstract: Single-index models are important and popular semiparametric models, as they can handle the problem of the “curse of dimensionality” and enjoy the flexibility of nonparametric modeling and the interpretability of parametric modeling. Most existing methods for single-index models are sensitive to outliers or heavy-tailed distributions because they use the least squares criterion. An oracle-efficient M-estimator is proposed for single-index models, and a smooth simultaneous confidence band is constructed by treating the index coefficients as nuisance parameters. Under general assumptions it is shown that the M-estimator for the nonparametric link function, based on any \(\sqrt{n}\)-consistent coefficient index parameter estimators, is oracle-efficient. This means that it is uniformly as efficient as the infeasible one obtained by M-regression using the true single-index coefficient parameters. As a result, the asymptotic distribution of the maximal deviation between the M-type kernel estimator and the true link function is derived, and an asymptotically accurate simultaneous confidence band is established as a global inference tool for the link function. The proposed method generalizes the desirable uniform convergence property of ordinary least squares to the M-estimation. Meanwhile, it is a general approach that allows any \(\sqrt{n}\)-consistent coefficient parameter estimators to be applied in the procedure to make global inferences for the link function. Simulation studies with commonly encountered sample sizes are reported to support the theoretical findings. These numerical results show certain desirable robustness properties against heavy-tailed errors and outliers. As an illustration, the proposed method is applied to the analysis of a car purchasing dataset.
PubDate: 2024-07-12
- Specifications tests for count time series models with covariates
Abstract: We propose a goodness-of-fit test for a class of count time series models with covariates which includes the Poisson autoregressive model with covariates (PARX) as a special case. The test criteria are derived from a specific characterization for the conditional probability generating function, and the test statistic is formulated as an \(L_2\) weighting norm of the corresponding sample counterpart. The asymptotic properties of the proposed test statistic are provided under the null hypothesis as well as under specific alternatives. A bootstrap version of the test is explored in a Monte–Carlo study and illustrated on a real data set on road safety.
PubDate: 2024-07-01
- Marginal analysis of count time series in the presence of missing observations
Abstract: Time series in real-world applications often have missing observations, making typical analytical methods unsuitable. One method for dealing with missing data is the concept of amplitude modulation. While this principle works with any data, here, missing data for unbounded and bounded count time series are investigated, where tailor-made dispersion and skewness statistics are used for model diagnostics. General closed-form asymptotic formulas are derived for such statistics with only weak assumptions on the underlying process. Moreover, closed-form formulas are derived for the popular special cases of Poisson and binomial autoregressive processes, always under the assumption that missingness occurs. The finite-sample performances of the considered asymptotic approximations are analyzed with simulations. The practical application of the corresponding dispersion and skewness tests under missing data is demonstrated with three real data examples.
PubDate: 2024-06-28
- Conformal link prediction for false discovery rate control
Abstract: Most link prediction methods return estimates of the connection probability of missing edges in a graph. Such output can be used to rank the missing edges from most to least likely to be a true edge, but does not directly provide a classification into true and nonexistent. In this work, we consider the problem of identifying a set of true edges with a control of the false discovery rate (FDR). We propose a novel method based on high-level ideas from the literature on conformal inference. The graph structure induces intricate dependence in the data, which we carefully take into account, as this makes the setup different from the usual setup in conformal inference, where data exchangeability is assumed. The FDR control is empirically demonstrated for both simulated and real data.
PubDate: 2024-06-11 DOI: 10.1007/s11749-024-00934-w
- Comments on: Data integration via analysis of subspaces (DIVAS)
PubDate: 2024-06-06 DOI: 10.1007/s11749-024-00937-7
- Comments on: Data integration via analysis of subspaces (DIVAS)
PubDate: 2024-06-04 DOI: 10.1007/s11749-024-00936-8
- Change point detection in high dimensional data with U-statistics
Abstract: We consider the problem of detecting distributional changes in a sequence of high dimensional data. Our approach combines two separate statistics stemming from \(L_p\) norms whose behavior is similar under \(H_0\) but potentially different under \(H_A\), leading to a testing procedure that is flexible against a variety of alternatives. We establish the asymptotic distribution of our proposed test statistics separately in cases of weakly dependent and strongly dependent coordinates as \(\min\{N,d\}\rightarrow \infty\), where N denotes sample size and d is the dimension, and establish consistency of testing and estimation procedures in high dimensions under one-change alternative settings. Computational studies in single and multiple change point scenarios demonstrate our method can outperform other nonparametric approaches in the literature for certain alternatives in high dimensions. We illustrate our approach through an application to Twitter data concerning the mentions of U.S. governors.
PubDate: 2024-06-01 DOI: 10.1007/s11749-023-00900-y
- A generalized Hosmer–Lemeshow goodness-of-fit test for a family of generalized linear models
Abstract: Generalized linear models (GLMs) are very widely used, but formal goodness-of-fit (GOF) tests for the overall fit of the model seem to be in wide use only for certain classes of GLMs. We develop and apply a new goodness-of-fit test, similar to the well-known and commonly used Hosmer–Lemeshow (HL) test, that can be used with a wide variety of GLMs. The test statistic is a variant of the HL statistic, but we rigorously derive an asymptotically correct sampling distribution using methods of Stute and Zhu (Scand J Stat 29(3):535–545, 2002) and demonstrate its consistency. We compare the performance of our new test with other GOF tests for GLMs, including a naive direct application of the HL test to the Poisson problem. Our test provides competitive or comparable power in various simulation settings and we identify a situation where a naive version of the test fails to hold its size. Our generalized HL test is straightforward to implement and interpret and an R package is publicly available.
PubDate: 2024-06-01 DOI: 10.1007/s11749-023-00912-8
- Complete asymptotic expansions and the high-dimensional Bingham distributions
Abstract: For \(d \ge 2\), let X be a random vector having a Bingham distribution on \({\mathcal{S}}^{d-1}\), the unit sphere centered at the origin in \({\mathbb{R}}^d\), and let \(\Sigma\) denote the symmetric matrix parameter of the distribution. Let \(\Psi(\Sigma)\) be the normalizing constant of the distribution and let \(\nabla \Psi_d(\Sigma)\) be the matrix of first-order partial derivatives of \(\Psi(\Sigma)\) with respect to the entries of \(\Sigma\). We derive complete asymptotic expansions for \(\Psi(\Sigma)\) and \(\nabla \Psi_d(\Sigma)\), as \(d \rightarrow \infty\); these expansions are obtained subject to the growth condition that \(\Vert \Sigma \Vert\), the Frobenius norm of \(\Sigma\), satisfies \(\Vert \Sigma \Vert \le \gamma_0 d^{r/2}\) for all d, where \(\gamma_0 > 0\) and \(r \in [0,1)\). Consequently, we obtain for the covariance matrix of X an asymptotic expansion up to terms of arbitrary degree in \(\Sigma\). Using a range of values of d that have appeared in a variety of applications of high-dimensional spherical data analysis, we tabulate the bounds on the remainder terms in the expansions of \(\Psi(\Sigma)\) and \(\nabla \Psi_d(\Sigma)\) and we demonstrate the rapid convergence of the bounds to zero as r decreases.
PubDate: 2024-06-01 DOI: 10.1007/s11749-023-00910-w
- Change-point detection in a tensor regression model
Abstract: In this paper, we consider an inference problem in a tensor regression model with one change-point. Specifically, we consider a general hypothesis testing problem on a tensor parameter and the studied testing problem includes as a special case the problem about the absence of a change-point. To this end, we derive the unrestricted estimator (UE) and the restricted estimator (RE) as well as the joint asymptotic normality of the UE and RE. Thanks to the established asymptotic normality, we derive a test for testing the hypothesized restriction. We also derive the asymptotic power of the proposed test and we prove that the established test is consistent. Beyond the complexity of the testing problem in the tensor model, we consider a very general case where the tensor error term and the regressors do not need to be independent and the dependence structure of the outer-product of the tensor error term and regressors is as weak as that of an \(\mathcal{L}^2\)-mixingale. Further, to study the performance of the proposed methods in small and moderate sample sizes, we present some simulation results that corroborate the theoretical results. Finally, to illustrate the application of the proposed methods, we test the non-existence of a change-point in some fMRI neuro-imaging data.
PubDate: 2024-06-01 DOI: 10.1007/s11749-023-00915-5
- Application of the Cramér–Wold theorem to testing for invariance under group actions
Abstract: We address the problem of testing for the invariance of a probability measure under the action of a group of linear transformations. We propose a procedure based on consideration of one-dimensional projections, justified using a variant of the Cramér–Wold theorem. Our test procedure is powerful, computationally efficient, and dimension-independent, extending even to the case of infinite-dimensional spaces (multivariate functional data). It includes, as special cases, tests for exchangeability and sign-invariant exchangeability. We compare our procedure with some previous proposals in these cases, in a small simulation study. The paper concludes with two real-data examples.
PubDate: 2024-06-01 DOI: 10.1007/s11749-023-00899-2
- An instrumental variable approach under dependent censoring
Abstract: This paper considers the problem of inferring the causal effect of a variable Z on a dependently censored survival time T. We allow for unobserved confounding variables, such that the error term of the regression model for T is dependent on the confounded variable Z. Moreover, T is subject to dependent censoring. This means that T is right censored by a censoring time C, which is dependent on T (even after conditioning out the effects of the measured covariates). A control function approach, relying on an instrumental variable, is leveraged to tackle the confounding issue. Further, it is assumed that T and C follow a joint regression model with bivariate Gaussian error terms and an unspecified covariance matrix, such that the dependent censoring can be handled in a flexible manner. Conditions under which the model is identifiable are given, a two-step estimation procedure is proposed, and it is shown that the resulting estimator is consistent and asymptotically normal. Simulations are used to confirm the validity and finite-sample performance of the estimation procedure. Finally, the proposed method is used to estimate the causal effect of job training programs on unemployment duration.
PubDate: 2024-06-01 DOI: 10.1007/s11749-023-00903-9
- Testing hypotheses about correlation matrices in general MANOVA designs
Abstract: Correlation matrices are an essential tool for investigating the dependency structures of random vectors or comparing them. We introduce an approach for testing a variety of null hypotheses that can be formulated based upon the correlation matrix. Examples cover MANOVA-type hypotheses of equal correlation matrices as well as testing for special correlation structures such as sphericity. Apart from existing fourth moments, our approach requires no other assumptions, allowing applications in various settings. To improve the small sample performance, a bootstrap technique is proposed and theoretically justified. Based on this, we also present a procedure to simultaneously test the hypotheses of equal correlation and equal covariance matrices. The performance of all new test statistics is compared with existing procedures through extensive simulations.
PubDate: 2024-06-01 DOI: 10.1007/s11749-023-00906-6
- LRD spectral analysis of multifractional functional time series on manifolds
Abstract: This paper addresses the estimation of the second-order structure of a manifold cross-time random field (RF) displaying spatially varying Long Range Dependence (LRD), adopting the functional time series framework introduced in Ruiz-Medina (Fract Calc Appl Anal 25:1426–1458, 2022). Conditions for the asymptotic unbiasedness of the integrated periodogram operator in the Hilbert–Schmidt operator norm are derived beyond structural assumptions. Weak-consistent estimation of the long-memory operator is achieved under a semiparametric functional spectral framework in the Gaussian context. The case where the projected manifold process can display Short Range Dependence (SRD) and LRD at different manifold scales is also analyzed. The performance of both estimation procedures is illustrated in the simulation study, in the context of multifractionally integrated spherical functional autoregressive–moving average (SPHARMA(p,q)) processes.
PubDate: 2024-06-01 DOI: 10.1007/s11749-023-00913-7
- Model checking for generalized partially linear models
Abstract: We propose a residual-marked empirical process test to check goodness of fit for generalized partially linear models. The proposed test can gain dimension reduction, is shown to be consistent, and can detect root-n local alternatives. We further establish asymptotic distributions of the proposed test under the null hypothesis and analyze asymptotic properties under the local and global alternatives, and suggest a bootstrap procedure for calculating the critical value. We investigate its numerical performance by simulation experiments and illustrate its utilization in two real data examples.
PubDate: 2024-06-01 DOI: 10.1007/s11749-023-00897-4
- Specification procedures for multivariate stable-Paretian laws for independent and for conditionally heteroskedastic data
Abstract: We consider goodness-of-fit methods for multivariate symmetric and asymmetric stable Paretian random vectors in arbitrary dimension. The methods are based on the empirical characteristic function and are implemented both in the i.i.d. context as well as for innovations in GARCH models. Asymptotic properties of the proposed procedures are discussed, while the finite-sample properties are illustrated by means of an extensive Monte Carlo study. The procedures are also applied to real data from the financial markets.
PubDate: 2024-06-01 DOI: 10.1007/s11749-023-00909-3
- Tensor eigenvectors for projection pursuit
Abstract: Tensor eigenvectors naturally generalize matrix eigenvectors to multi-way arrays: eigenvectors of symmetric tensors of order k and dimension p are stationary points of polynomials of degree k in p variables on the unit sphere. Dominant eigenvectors of symmetric tensors maximize polynomials in several variables on the unit sphere, while base eigenvectors are roots of polynomials in several variables. In this paper, we focus on skewness-based projection pursuit and on third-order tensor eigenvectors, which provide the simplest, yet relevant connections between tensor eigenvectors and projection pursuit. Skewness-based projection pursuit finds interesting data projections using the dominant eigenvector of the sample third standardized cumulant to maximize skewness. Skewness-based projection pursuit also uses base eigenvectors of the sample third cumulant to remove skewness and facilitate the search for interesting data features other than skewness. Our contribution to the literature on tensor eigenvectors and on projection pursuit is twofold. Firstly, we show how skewness-based projection pursuit might be helpful in sequential cluster detection. Secondly, we show some asymptotic results regarding both dominant and base tensor eigenvectors of sample third cumulants. The practical relevance of the theoretical results is assessed with six well-known data sets.
PubDate: 2024-06-01 DOI: 10.1007/s11749-023-00902-w
- Correction to: LRD spectral analysis of multifractional functional time series on manifolds
PubDate: 2024-03-01 DOI: 10.1007/s11749-024-00924-y