Abstract: We propose a functional classification method with high-dimensional image predictors using a combination of logistic discrimination and basis expansions with sparse principal component analysis (PCA). Our model extends the existing functional generalized linear models with image predictors, which use functional principal component regression, to L1-regularized principal components. This extension enables us to create a more flexible prognostic region that does not depend on the shape of the basis functions. Monte Carlo simulations were conducted to examine the method’s efficiency compared with several possible classification techniques. Our method was shown to be the best in terms of both sensitivity and specificity for detecting the shapes of interest and classifying groups. In addition, our model was applied to data on Alzheimer’s disease. It detected the prognostic brain region and efficiently classified early-stage Alzheimer’s patients based on three-dimensional structural magnetic resonance imaging (sMRI). PubDate: 2019-03-22
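As a toy sketch of the two ingredients combined in this abstract — an L1-regularized leading principal component followed by logistic discrimination on the component score — the following pure-Python example is illustrative only: the five-pixel "image", the thresholded power iteration, and all parameter values are assumptions for exposition, not the authors' implementation.

```python
import math
import random

def soft(x, lam):
    # soft-thresholding operator behind the L1 penalty
    return math.copysign(max(abs(x) - lam, 0.0), x)

def sparse_pc1(X, lam=0.05, iters=200):
    """First L1-penalised principal component via thresholded power iteration."""
    n, p = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - mu[j] for j in range(p)] for row in X]
    v = [1.0] * p
    for _ in range(iters):
        s = [sum(xc[j] * v[j] for j in range(p)) for xc in Xc]               # X v
        w = [sum(Xc[i][j] * s[i] for i in range(n)) / n for j in range(p)]   # X'X v / n
        w = [soft(wj, lam) for wj in w]                                      # sparsify loadings
        nrm = math.sqrt(sum(wj * wj for wj in w)) or 1.0
        v = [wj / nrm for wj in w]
    return v, mu

def logistic_fit(z, y, lr=0.5, iters=2000):
    """Logistic discrimination on a single component score, by gradient ascent."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0
        for zi, yi in zip(z, y):
            pi = 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))
            g0 += yi - pi
            g1 += (yi - pi) * zi
        b0 += lr * g0 / len(z)
        b1 += lr * g1 / len(z)
    return b0, b1

# toy "image": 5 pixels, only pixel 0 differs between the two groups
random.seed(0)
X, y = [], []
for i in range(100):
    cls = i % 2
    X.append([random.gauss(2.0 * cls, 0.5)] + [random.gauss(0.0, 0.5) for _ in range(4)])
    y.append(cls)

v, mu = sparse_pc1(X)
z = [sum((x[j] - mu[j]) * v[j] for j in range(5)) for x in X]
b0, b1 = logistic_fit(z, y)
pred = [int(1.0 / (1.0 + math.exp(-(b0 + b1 * zi))) > 0.5) for zi in z]
acc = sum(int(p == t) for p, t in zip(pred, y)) / len(y)
```

With the signal confined to one pixel, the soft threshold drives the noise loadings toward zero, which is the mechanism that lets the prognostic region be determined by the data rather than by the shape of any basis function.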

Abstract: Generalized structured component analysis (GSCA) has been extensively enhanced in terms of data-analytic capability and flexibility as well as computational efficiency. This article illustrates a novel application of GSCA for brain connectivity research, the purpose of which is to facilitate its uses with functional neuroimaging data among applied researchers and practitioners. Using data collected during encoding of source memory in a functional magnetic resonance imaging study, this article demonstrates how to specify and evaluate a fully and bidirectionally connected structural model of brain connectivity using GSCA. Implications of the GSCA approach and future directions for brain research are discussed. PubDate: 2019-03-21

Abstract: Clustering (partitioning) and simultaneous dimension reduction of the objects and variables of a two-way two-mode data matrix is proposed here. The methodology is based on a general model that includes K-means clustering, factorial K-means, projection pursuit clustering (also known as reduced K-means), principal component analysis, and intermediate cases of object clustering and variable reduction. Since data sets often contain both qualitative and quantitative variables, the general model is extended here to the practically relevant case of mixed variables, analogous to variants of PCA that handle qualitative (nominal and ordinal) variables in addition to quantitative ones. The model, called clustering and dimension reduction (CDR), is fully discussed in all the special cases cited above. For least-squares estimation of the model, an efficient coordinate descent algorithm is presented. Finally, a simulation study and two real-data analyses illustrate the features of CDR and assess the performance of the proposed algorithm. PubDate: 2019-03-11
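CDR includes K-means clustering as a special case; a minimal sketch of Lloyd's K-means, the base ingredient, may help fix ideas. The two-cluster synthetic data and the naive deterministic initialisation are illustrative assumptions, not part of the CDR model or its coordinate descent estimator.

```python
import random

def kmeans(X, k, iters=50):
    """Lloyd's K-means with a naive deterministic initialisation."""
    centers = [list(X[0]), list(X[-1])] if k == 2 else [list(x) for x in X[:k]]
    labels = [0] * len(X)
    for _ in range(iters):
        # assignment step: nearest centre in squared Euclidean distance
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
                  for x in X]
        # update step: each centre becomes the mean of its cluster
        for c in range(k):
            members = [x for x, l in zip(X, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# two well-separated Gaussian clusters in the plane
random.seed(0)
X = ([[random.gauss(0.0, 0.4), random.gauss(0.0, 0.4)] for _ in range(20)] +
     [[random.gauss(5.0, 0.4), random.gauss(5.0, 0.4)] for _ in range(20)])
labels, centers = kmeans(X, 2)
```

Reduced K-means and factorial K-means, the other special cases named above, differ in whether the centres are additionally constrained to lie in a low-dimensional subspace of the variables.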

Abstract: Null hypothesis significance testing (NHST) is cited as a threat to validity and reproducibility. While many individuals suggest that we focus on altering the p value at which we deem an effect significant, we believe this suggestion is short-sighted. Alternative procedures (i.e., Bayesian analyses and observation-oriented modeling, OOM) can be more powerful and meaningful to our discipline. However, these methodologies are less frequently utilized and are rarely discussed in combination with NHST. Herein, we discuss three methodologies (NHST, Bayesian model comparison, and OOM) and then compare the possible interpretations of three analyses (ANOVA, Bayes factor, and an ordinal pattern analysis) in various data environments using a frequentist simulation study. We found that changing significance thresholds had little effect on conclusions. Furthermore, we suggest that evaluating multiple estimates as evidence of an effect allows for more robust and nuanced interpretations of results and implies the need to redefine evidentiary value and reporting practices. PubDate: 2019-03-08
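As a concrete, if crude, point of contact between the frequentist and Bayesian analyses compared here: a Bayes factor for a two-group mean difference can be approximated from the BIC difference of the two nested models. This Wagenmakers-style BIC approximation is a generic textbook device, not the simulation design used in the article, and the data below are made up.

```python
import math
import random

def bic(rss, n, k):
    # Gaussian BIC up to an additive constant: n*log(RSS/n) + k*log(n)
    return n * math.log(rss / n) + k * math.log(n)

random.seed(1)
g0 = [random.gauss(0.0, 1.0) for _ in range(50)]   # control group
g1 = [random.gauss(1.0, 1.0) for _ in range(50)]   # group with a true effect of 1 SD
both = g0 + g1
n = len(both)

# model 0: one common mean; model 1: separate group means
grand = sum(both) / n
rss0 = sum((x - grand) ** 2 for x in both)
m0, m1 = sum(g0) / 50, sum(g1) / 50
rss1 = sum((x - m0) ** 2 for x in g0) + sum((x - m1) ** 2 for x in g1)

# BIC-approximate Bayes factor in favour of the two-mean model
bf10 = math.exp((bic(rss0, n, 1) - bic(rss1, n, 2)) / 2)
```

Unlike a p value, bf10 quantifies relative evidence for the effect model over the null and can also express support *for* the null when it falls below 1.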

Abstract: In functional data analysis, it is often of interest to discover a general common pattern, or shape, of the function. When the subject-specific amplitude and phase variation of the data are not of interest, curve registration can be used to separate this variation from the data. Shape-invariant models (SIMs), one class of registration methods, aim to estimate the unknown shared-shape function. However, SIMs and general registration methods assume that all curves have the shared shape in common and do not consider the existence of outliers, such as a curve whose shape is inconsistent with the remainder of the data. Therefore, we propose using the t distribution to robustify SIMs, accommodating outliers in amplitude, phase, and other errors. Our SIM can identify and classify the three types of outliers mentioned above. We use a simulation and an empirical data set to evaluate the performance of our robust SIM. PubDate: 2019-02-22

Abstract: The mixture Rasch model is gaining popularity as it allows items to perform differently across subpopulations and hence addresses the violation of the unidimensionality assumption in traditional Rasch models. This study focuses on comparing two common maximum likelihood methods for estimating such models using Monte Carlo simulations. The conditional maximum likelihood (CML) and joint maximum likelihood (JML) estimations, as implemented in three popular R packages, are compared by evaluating parameter recovery and class accuracy. The results suggest that, in general, CML is preferred for parameter recovery and JML is preferred for identifying the correct number of classes. A set of guidelines is also provided regarding how sample sizes, test lengths, and actual class probabilities affect the accuracy of estimation and of the number of classes, as well as how different information criteria compare in achieving class accuracy. Specific issues regarding the performance of particular R packages are highlighted in the study as well. PubDate: 2019-01-18

Abstract: Many external validity indices for comparing different clusterings of the same set of objects are overall measures: they quantify similarity between clusterings for all clusters simultaneously. Because a single number provides only a general notion of what is going on, the values of such overall indices (usually between 0 and 1) are often difficult to interpret. In this paper, we show that a class of normalizations of the mutual information can be decomposed into indices that contain information at the level of individual clusters. The decompositions (1) reveal that overall measures can be interpreted as summary statistics of information reflected in the individual clusters, (2) specify how these overall indices are related to individual clusters, and (3) show that the overall indices are affected by cluster size imbalance. We recommend using measures for individual clusters, since they provide much more detailed information than a single overall number. PubDate: 2018-12-04
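The idea of reading mutual information cluster by cluster can be sketched directly from a pair of label vectors. This is a generic decomposition over the clusters of one partition, not the paper's specific normalizations, and the two tiny clusterings are made-up examples.

```python
import math
from collections import Counter

def mi_by_cluster(u, v):
    """Mutual information between clusterings u and v,
    decomposed into contributions from the clusters of u."""
    n = len(u)
    pu, pv = Counter(u), Counter(v)
    puv = Counter(zip(u, v))
    contrib = {}
    for i in pu:
        s = 0.0
        for (a, b), nab in puv.items():
            if a == i:
                # per-cell term p(a,b) * log( p(a,b) / (p(a) p(b)) )
                s += (nab / n) * math.log((nab / n) / ((pu[a] / n) * (pv[b] / n)))
        contrib[i] = s
    return contrib

u = [0, 0, 0, 0, 1, 1, 1, 1]
v = [0, 0, 0, 1, 1, 1, 1, 1]
contrib = mi_by_cluster(u, v)   # per-cluster contributions; they sum to the total MI
```

For identical clusterings the contributions sum to the entropy of the partition, which is the reference quantity behind the usual normalized variants.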

Abstract: Item banks are often created in large-scale research and testing settings in the social sciences to predict individuals’ latent trait scores. A common procedure is to fit multiple candidate item response theory (IRT) models to a calibration sample and select a single best-fitting IRT model. The parameter estimates from this model are then used to obtain trait scores for subsequent respondents. However, this model selection procedure ignores model uncertainty stemming from the fact that the model ranking in the calibration phase is subject to sampling variability. Consequently, the standard errors of trait scores obtained from subsequent respondents do not reflect such uncertainty. Ignoring such sources of uncertainty contributes to the current replication crisis in the social sciences. In this article, we propose and demonstrate an alternative procedure to account for model uncertainty in this context—model averaging of IRT trait scores and their standard errors. We outline the general procedure step-by-step and provide software to aid researchers in implementation, both for large-scale research settings with item banks and for smaller research settings involving IRT scoring. We then demonstrate the procedure with a simulated item-banking illustration, comparing model selection and model averaging within sample in terms of predictive coverage. We conclude by discussing ways that model averaging and IRT scoring can be used and investigated in future research. PubDate: 2018-10-01
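A minimal sketch of information-criterion model averaging of trait scores, assuming Akaike weights and a Buckland-style averaged standard error; the article's exact weighting scheme may differ, and the estimates, standard errors, and criterion values below are made up for illustration.

```python
import math

def model_average(estimates, ses, aics):
    """Akaike-weight averaging of per-model trait scores,
    with a Buckland-style model-averaged standard error."""
    d = [a - min(aics) for a in aics]
    w = [math.exp(-di / 2) for di in d]
    tot = sum(w)
    w = [wi / tot for wi in w]                      # model weights, sum to 1
    theta = sum(wi * e for wi, e in zip(w, estimates))
    # fold between-model spread into each model's sampling variance
    se = sum(wi * math.sqrt(s ** 2 + (e - theta) ** 2)
             for wi, e, s in zip(w, estimates, ses))
    return theta, se, w

# two candidate IRT models scoring the same respondent (hypothetical numbers)
theta, se, w = model_average([0.5, 0.8], [0.30, 0.35], [102.0, 104.0])
```

Because the averaged standard error carries the (estimate − average)² spread between models, model uncertainty widens the interval instead of vanishing after a single model is selected — the coverage point the abstract makes.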

Abstract: Propensity score methods are popular and effective statistical techniques for reducing selection bias in observational data and thereby increasing the validity of causal inference based on observational studies in behavioral and social science research. Some methodologists and statisticians have raised concerns about the rationale and applicability of propensity score methods. In this review, we address these concerns by reviewing the development history and assumptions of propensity score methods, followed by their fundamental techniques and the available software packages. We especially discuss the issues in, and debates about, the use of propensity score methods. This review provides useful information about propensity score methods from a historical point of view and helps researchers select appropriate propensity score methods for their observational studies. PubDate: 2018-10-01
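One of the fundamental techniques reviewed, nearest-neighbour matching on the propensity score, reduces to a short greedy loop. The scores and outcomes below are made-up illustrations; a real application would first estimate the scores, e.g., with a logistic regression of treatment on covariates.

```python
def nn_match(treated, controls):
    """Greedy 1:1 nearest-neighbour matching on the propensity score,
    without replacement."""
    pairs, pool = [], list(controls)
    for t in sorted(treated, key=lambda u: u["ps"]):
        c = min(pool, key=lambda u: abs(u["ps"] - t["ps"]))  # closest score
        pairs.append((t, c))
        pool.remove(c)                                       # each control used once
    return pairs

# hypothetical units: "ps" = estimated propensity score, "y" = outcome
treated  = [{"ps": 0.62, "y": 5.0}, {"ps": 0.71, "y": 6.0}]
controls = [{"ps": 0.60, "y": 4.1}, {"ps": 0.70, "y": 5.2}, {"ps": 0.30, "y": 2.0}]

pairs = nn_match(treated, controls)
# average treatment effect on the treated, over matched pairs
att = sum(t["y"] - c["y"] for t, c in pairs) / len(pairs)
```

The off-support control (score 0.30) is never matched, illustrating how matching restricts comparison to units with overlapping covariate distributions.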

Abstract: Reading comprehension is often assessed by having students read passages and administering a test that assesses their understanding of the text. Shorter assessments may fail to give a full picture of comprehension ability while more thorough ones can be time consuming and costly. This study used data from a conversational intelligent tutoring system (AutoTutor) to assess reading comprehension ability in 52 low-literacy adults who interacted with the system. We analyzed participants’ accuracy and time spent answering questions in conversations in lessons that targeted four theoretical components of comprehension: Word, Textbase, Situation Model, and Rhetorical Structure. Accuracy and answer response time were analyzed to track adults’ proficiency for comprehension components, and we analyzed whether the four components predicted reading grade level. We discuss the results with respect to the advantages that a conversational intelligent tutoring system assessment may provide over traditional assessment tools and the linking of theory to practice in adult literacy. PubDate: 2018-10-01

Abstract: In this article, we present a study on the design, development, and validation of a multimedia-based performance assessment (MBPA) for measuring the skills of confined space guards in Dutch vocational education. An MBPA is a computer-based assessment that incorporates multimedia to simulate tasks. It is designed to measure performance-based skills. A confined space guard (CSG) supervises operations that are carried out in a confined space (e.g., a tank or silo). In the Netherlands, individuals who want to become certified CSGs must participate in a one-day training program, and pass both a multiple-choice knowledge test and a performance-based assessment. In the first part of this article, we focus on the design and development of the MBPA, using a specific framework for design and development. In the second part of the article, we present a validation study. We use the argument-based approach to validation to validate the MBPA (Kane in Educational measurement. American Council on Education and Praeger Publishers, Westport, 2006 and J Educ Meas 50(1):1–73, 2013). More specifically, the extended argument-based approach to validation is used (Wools et al. in CADMO 18(1):63–82, 2010 and Stud Educ Eval 48:10–18, 2016). The approach suggests using multiple sources of validity evidence to build a comprehensive validity case for the proposed interpretation of assessment scores (Kane 2006, 2013) and to evaluate the strength of the validity case (Wools et al. 2010, 2016). We demonstrate that MBPA scores can be used for their intended purpose; students’ performance in the MBPA can be used as the basis for making a CSG certification decision. PubDate: 2018-10-01

Abstract: A rich variety of models are now in use for unsupervised modelling of text documents and, in particular, a rich variety of graphical models exist, with and without latent variables. To date, there is inadequate understanding of the comparative performance of these models, partly because they are subtly different and have been proposed and evaluated in different contexts. This paper reports on our experiments with a representative set of state-of-the-art models: chordal graphs, matrix factorisation, and hierarchical latent tree models. For the chordal graphs, we use different scoring functions. For the matrix factorisation models, we use different hierarchical priors and asymmetric priors on components. We use Boolean matrix factorisation rather than topic models so that we can make comparable evaluations. The experiments perform a number of evaluations: probability for each document, omni-directional prediction which predicts different variables, and anomaly detection. We find that matrix factorisation performed well at anomaly detection but poorly on the prediction task. Chordal graph learning generally performed best and, probably owing to its lower bias, often outperformed hierarchical latent trees. PubDate: 2018-10-01

Abstract: Scales that are psychometrically sound, meaning those that meet established standards regarding reliability and validity when measuring one or more constructs of interest, are customarily evaluated based on a set modality (i.e., computer or paper) and administration (fixed-item order). Deviating from an established administration profile could result in non-equivalent response patterns, indicating the possible evaluation of a dissimilar construct. Randomizing item administration may alter or eliminate these effects. Therefore, we examined the differences in scale relationships for randomized and nonrandomized computer delivery for two scales measuring meaning/purpose in life. These scales have questions about suicidality, depression, and life goals that may cause item reactivity (i.e., a changed response to a second item based on the answer to the first item). Results indicated that item randomization does not alter scale psychometrics for meaning in life scales, which implies that results are comparable even if researchers implement different delivery modalities. PubDate: 2018-10-01

Abstract: This paper provides a theoretical foundation to examine the effectiveness of post-hoc adjustment approaches such as propensity score matching in reducing the selection bias of synthetic cohort design (SCD) for causal inference and program evaluation. Compared with the Solomon four-group design, the SCD often encounters selection bias due to the imbalance of covariates between the two cohorts. The efficiency of SCD is ensured by the historical equivalence of groups (HEoG) assumption, indicating the comparability between the two cohorts. The multilevel structural equation modeling framework is used to define the HEoG assumption. According to the mathematical proof, HEoG ensures that the use of SCD results in an unbiased estimator of the schooling effect. Practical considerations and suggestions for future research and use of SCD are discussed. PubDate: 2018-10-01

Authors:Housila P. Singh; Swarangi M. Gorey Abstract: In this paper, we suggest a weighted unbiased estimator based on a mixed randomized response model. Several unbiased estimators are generated from the proposed weighted estimator. The variance of the proposed weighted estimator is derived, and the condition is obtained under which it is superior to the Singh and Tarray (Commun Stat Appl Methods 19(6):751–759, 2014) estimator. It is worth noting that we investigate an estimator \( \hat{\pi}_{\mathrm{HS}(1)} \), a member of the suggested weighted class \( \hat{\pi}_{\mathrm{HS}} \), that provides better efficiency than the Singh and Tarray (2014) estimator \( \hat{\pi}_{\mathrm{h}} \) and is close to the optimum estimator \( \hat{\pi}_{\mathrm{HS}}^{(\mathrm{o})} \). Thus, \( \hat{\pi}_{\mathrm{HS}(1)} \) is an alternative to the optimum estimator \( \hat{\pi}_{\mathrm{HS}}^{(\mathrm{o})} \). The study is further extended to the case of stratified random sampling. PubDate: 2018-04-24 DOI: 10.1007/s41237-018-0049-9
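For readers unfamiliar with randomized response designs, the classic Warner estimator illustrates the kind of unbiased estimator being weighted here. This is the textbook building block, not Singh and Tarray's mixed model or the authors' weighted estimator, and the true proportion and device probability below are made-up.

```python
import random

def warner_estimate(yes_prop, p):
    """Warner's unbiased estimator of a sensitive proportion pi.
    yes_prop: observed proportion of 'yes' answers;
    p: probability the device selects the sensitive statement (p != 0.5)."""
    return (yes_prop + p - 1) / (2 * p - 1)

# simulate Warner's design: true pi = 0.3, device probability p = 0.7
random.seed(0)
pi_true, p, n = 0.3, 0.7, 20000
yes = 0
for _ in range(n):
    sensitive = random.random() < pi_true       # respondent belongs to group A
    ask_sensitive = random.random() < p         # device picks "I belong to A"
    yes += sensitive if ask_sensitive else not sensitive
pi_hat = warner_estimate(yes / n, p)
```

Unbiasedness follows from E[yes_prop] = p·pi + (1 − p)(1 − pi); inverting that linear map recovers pi, and the weighting schemes discussed in the abstract combine such estimators to reduce variance.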

Authors:Marco Scutari Abstract: A classic approach for learning Bayesian networks from data is to identify a maximum a posteriori (MAP) network structure. In the case of discrete Bayesian networks, MAP networks are selected by maximising one of several possible Bayesian–Dirichlet (BD) scores; the most famous is the Bayesian–Dirichlet equivalent uniform (BDeu) score from Heckerman et al. (Mach Learn 20(3):197–243, 1995). The key properties of BDeu arise from its uniform prior over the parameters of each local distribution in the network: it makes structure learning computationally efficient, it does not require the elicitation of prior knowledge from experts, and it satisfies score equivalence. In this paper we review the derivation and the properties of BD scores, and of BDeu in particular, and we link them to the corresponding entropy estimates in order to study them from an information-theoretic perspective. To this end, we work in the context of the foundational work of Giffin and Caticha (Proceedings of the 27th international workshop on Bayesian inference and maximum entropy methods in science and engineering, pp 74–84, 2007), who showed that Bayesian inference can be framed as a particular case of the maximum relative entropy principle. We use this connection to show that BDeu should not be used for structure learning from sparse data, since it violates the maximum relative entropy principle; and that it is also problematic from a more classic Bayesian model selection perspective, because it produces Bayes factors that are sensitive to the value of its only hyperparameter. In a large simulation study reported in our previous work (Scutari in J Mach Learn Res (Proc Track PGM 2016) 52:438–448, 2016), we found that the Bayesian–Dirichlet sparse (BDs) score seems to provide better accuracy in structure learning; in this paper we further show that BDs does not suffer from the issues above, and we recommend using it instead of BDeu for sparse data. Finally, we show that these issues are in fact different aspects of the same problem and a consequence of the distributional assumptions of the prior. PubDate: 2018-04-07 DOI: 10.1007/s41237-018-0048-x
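The BD family of scores can be computed from counts with log-gamma functions. The sketch below is a generic local score under stated assumptions: child states are coded 0..r−1, and q is taken as the number of parent configurations observed in the data (the strict BDeu definition uses the full Cartesian product of parent state spaces; restricting to observed configurations is precisely the change made by the BDs score discussed above). The tiny data set is made up.

```python
import math
from collections import Counter

def bd_local(child, parents, data, ess=1.0):
    """Log marginal likelihood of one node given its parents, with the
    imaginary sample size ess spread uniformly over the counts."""
    r = len({row[child] for row in data})                       # child cardinality
    configs = sorted({tuple(row[p] for p in parents) for row in data}) or [()]
    q = len(configs)                                            # parent configs seen in data
    a_j, a_jk = ess / q, ess / (q * r)                          # Dirichlet hyperparameters
    nj = Counter(tuple(row[p] for p in parents) for row in data)
    njk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    score = 0.0
    for cfg in configs:
        score += math.lgamma(a_j) - math.lgamma(a_j + nj[cfg])
        for k in range(r):
            score += math.lgamma(a_jk + njk[(cfg, k)]) - math.lgamma(a_jk)
    return score

# tiny two-variable data set: parent in column 0, child in column 1
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0)]
s = bd_local(1, [0], data)
```

Rescoring with a different ess changes the value (and hence the Bayes factors between candidate parent sets), which is the hyperparameter sensitivity the abstract describes.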