Authors: Maurício Silva LACERDA, Jhennifer dos Santos NASCIMENTO, Eduardo Campana BARBOSA, Rômulo César MANULI, Moysés NASCIMENTO, Ana Carolina Campana NASCIMENTO, Paulo Cesar EMILIANO Abstract: The present study used data simulation to evaluate two multivariate statistical tests for mean vectors, the likelihood ratio test (LRT) and Hotelling’s T² test, with respect to type I error rate and power. The scenarios were designed to analyze test performance under the influence of p-variate normality, correlation, and homogeneity of variance, as well as the number of variables and the sample size. Our results show that, under p-variate normality, the type I error rate was not affected by violations of the assumptions of independence and homogeneity of variances, whereas the power of the test was. When data were simulated from a p-variate distribution with heavier tails than the normal (Student-t with 1 degree of freedom), Hotelling’s T² proved conservative, while the LRT showed better results, especially for small sample sizes. PubDate: 2022-09-23 DOI: 10.28951/bjb.v40i3.560 Issue No: Vol. 40, No. 3 (2022)
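
The kind of simulation described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes a one-sample test, fixes p = 2 so that the covariance inverse and the F tail probability have closed forms, and uses hypothetical function names (`hotelling_t2_bivariate`, `type1_error_rate`). The LRT is not implemented here.

```python
import math
import random


def hotelling_t2_bivariate(sample, mu0):
    """One-sample Hotelling's T^2 test for a bivariate mean vector.

    sample: list of (x, y) pairs; mu0: hypothesized mean (m1, m2).
    Returns (T2, p_value). Assumes p = 2 (illustrative restriction).
    """
    n = len(sample)
    m1 = sum(x for x, _ in sample) / n
    m2 = sum(y for _, y in sample) / n
    # Unbiased sample covariance matrix entries.
    s11 = sum((x - m1) ** 2 for x, _ in sample) / (n - 1)
    s22 = sum((y - m2) ** 2 for _, y in sample) / (n - 1)
    s12 = sum((x - m1) * (y - m2) for x, y in sample) / (n - 1)
    det = s11 * s22 - s12 ** 2
    d1, d2 = m1 - mu0[0], m2 - mu0[1]
    # Quadratic form d' S^{-1} d using the explicit 2x2 inverse.
    quad = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    t2 = n * quad
    p = 2
    # Under H0 and normality, this statistic follows F(p, n - p).
    f_stat = (n - p) / (p * (n - 1)) * t2
    # Survival function of F(2, n - 2) has the closed form below.
    p_value = (1 + 2 * f_stat / (n - 2)) ** (-(n - 2) / 2)
    return t2, p_value


def type1_error_rate(n, rho, reps, alpha=0.05, seed=0):
    """Estimate the rejection rate under H0 for correlated bivariate normal data."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = []
        for _ in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            # Induce correlation rho between the two coordinates.
            sample.append((z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2))
        _, p_value = hotelling_t2_bivariate(sample, (0.0, 0.0))
        rejections += p_value < alpha
    return rejections / reps


if __name__ == "__main__":
    print(type1_error_rate(n=20, rho=0.5, reps=2000))
```

Under normality the estimated rate should sit close to the nominal level regardless of `rho`; replacing the normal draws with heavy-tailed ones is how one would probe the conservative behaviour the abstract reports.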

Authors: Juliana Vieira GOMES, Camila Rafaela Gomes DIAS, José Ivo RIBEIRO JUNIOR Abstract: Exploratory principal component (PC) analysis requires neither multivariate normality of the variables nor that the variables be random. This means that variables that do not behave randomly can also be included in this analysis. Thus, in order to carry out PC analysis with variables that may or may not be random, a correction of the matrix based on the coefficients of variation (Campana et al., 2010) was proposed by applying the method of Lenth (1989), whose new array was named . To verify its feasibility, ten data sets of the random variables Y1, Y2, Y3 and Y4 were simulated, with 10,000 values each, following a multivariate normal distribution. After the simulation, 0%, 1%, 2%, 3% and 4% of the random values of Y4 were replaced by outliers in order to break its randomness. Subsequently, response surface analyses were performed for eight different mean absolute percentage errors obtained in relation to eight parameters related to the performance of the PC analysis, as a function of the percentage of Y4 values replaced by outliers (0, 1, 2, 3 and 4) and the matrices used in the PC analysis. According to the results, it was concluded that, in the presence of only normal random variables, it is the best matrix. On the other hand, when there are outliers, it is the most recommended. PubDate: 2022-09-23 DOI: 10.28951/bjb.v40i3.551 Issue No: Vol. 40, No. 3 (2022)

Authors: Maria Letícia SALVADOR, Eduardo Elías RIBEIRO JUNIOR, César Augusto TACONELI, Idemaro Antonio Rodrigues LARA Abstract: In agronomic experiments, polytomous variables are common, and the generalized logit model can be used to analyze such data. One characteristic of the generalized logit model is the assumption that the variance is a known function of the mean, so the observed variance is expected to be close to that assumed by the model. However, it is not uncommon for extra-multinomial variation to occur, that is, for the data to be systematically more heterogeneous than the variance specified by the model allows, a phenomenon known as overdispersion. In this context, the present work discusses the diagnosis of overdispersion in multinomial data, proposes a descriptive measure for this problem, and presents a methodological alternative through the Dirichlet-multinomial model. The descriptive measure is evaluated through simulation, based on two particular scenarios. As a motivating study, we report an experiment in fruit growing whose objective was to compare the flowering of adult orange trees grafted on “Rangpur” lime or “Swingle” citrumelo, with the response variable being the classification of branches into three categories: lateral flower, no flower or aborted flower, terminal flower. The proposed descriptive measure gave evidence of overdispersion, indicating that the generalized logit model may not be the most appropriate. Thus, as a methodological alternative, the Dirichlet-multinomial model was used. Compared to the generalized logit model, the Dirichlet-multinomial proved more suitable for fitting the overdispersed data, since it includes an additional parameter to accommodate the extra-multinomial dispersion. PubDate: 2022-09-23 DOI: 10.28951/bjb.v40i3.584 Issue No: Vol. 40, No. 3 (2022)
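
How the additional Dirichlet-multinomial parameter accommodates overdispersion can be seen from the standard variance formulas. The sketch below is only an illustration of that textbook result, not the descriptive measure proposed in the paper; the function names and the example parameter values are hypothetical.

```python
def multinomial_variances(n, probs):
    """Variance of each category count under a multinomial with n trials."""
    return [n * p * (1 - p) for p in probs]


def dirichlet_multinomial_variances(n, alphas):
    """Variance of each category count under a Dirichlet-multinomial.

    The category means match a multinomial with probs = alpha_j / alpha_0,
    but every variance is inflated by the factor (n + alpha_0) / (1 + alpha_0).
    """
    alpha0 = sum(alphas)
    probs = [a / alpha0 for a in alphas]
    inflation = (n + alpha0) / (1 + alpha0)
    return [v * inflation for v in multinomial_variances(n, probs)]


if __name__ == "__main__":
    n = 10                      # branches counted per plant (made-up value)
    alphas = [2.0, 3.0, 5.0]    # hypothetical Dirichlet parameters, 3 categories
    probs = [a / sum(alphas) for a in alphas]
    print(multinomial_variances(n, probs))
    print(dirichlet_multinomial_variances(n, alphas))
```

As `alpha_0` grows the inflation factor tends to 1 and the multinomial variance is recovered; small `alpha_0` corresponds to strong extra-multinomial variation.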

Authors: André Bisca FERREIRA, Josmar MAZUCHELI Abstract: In this paper, we propose the zero-, one- and zero-and-one-inflated New unit-Lindley distributions as natural extensions of the New unit-Lindley distribution to model continuous responses measured on the intervals [0, 1), (0, 1] and [0, 1], respectively. They are constructed as convex combinations of the New unit-Lindley distribution with, respectively, the distribution degenerate at zero, the distribution degenerate at one, and the Bernoulli distribution. They also have a number of interesting properties, such as being members of the exponential family. In addition, they have closed forms for the cumulative distribution functions, quantiles, and moments. Inferential aspects and regression structures are discussed, and a Monte Carlo simulation study is carried out to evaluate the performance of the regressors. Finally, we present an application to real data on the suicide rate in the year 2016. PubDate: 2022-09-23 DOI: 10.28951/bjb.v40i3.571 Issue No: Vol. 40, No. 3 (2022)

Authors: Luiz Otávio de Oliveira PALA, Marcela de Marillac CARVALHO, Thelma SÁFADI Abstract: Risk and exposure factors are important features to consider, as they provide financial and actuarial information for the insurer. Pricing methods are supported by the theory of mutualism, which ensures a level of indemnity and expected cost and makes it possible to constitute monetary reserves. The aim of our paper is to model and analyze the distribution of vehicle insurance claims in the south of Minas Gerais, Brazil. The data represent policies with a claim occurrence in 2018. Under the Bayesian approach, we consider the Gamma and Log-normal distributions, which allow asymmetric data to be modeled and can be used in loss models. The Jeffreys prior class was applied to the data from the first semester of 2018. The resulting information was then used to construct an informative prior for analyzing the data from the second semester. To compare models, we estimated the Bayes factor and the logarithm of the marginal likelihood, which showed the Log-normal to be more likely. After selecting a model, we estimated metrics such as the Conditional Tail Expectation (CTE) and the percentiles of the fitted distribution to evaluate extreme costs. The results showed the applicability of Bayesian inference to fitting insurance data, since it allows prior knowledge such as portfolio experience to be incorporated and a wide class of probability distributions to be used. PubDate: 2022-09-23 DOI: 10.28951/bjb.v40i3.555 Issue No: Vol. 40, No. 3 (2022)
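
For a Log-normal loss model, percentiles and the CTE have closed forms, which the following sketch evaluates. It is a plug-in illustration under assumed parameter values, not the Bayesian procedure of the paper: `mu` and `sigma` here are hypothetical point values, whereas a full analysis would average over their posterior distribution.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def lognormal_quantile(q, mu, sigma):
    """q-th quantile (Value at Risk) of a Log-normal(mu, sigma) loss."""
    return math.exp(mu + sigma * _STD_NORMAL.inv_cdf(q))


def lognormal_cte(q, mu, sigma):
    """Conditional Tail Expectation E[X | X > q-th quantile] for Log-normal(mu, sigma)."""
    z_q = _STD_NORMAL.inv_cdf(q)
    mean = math.exp(mu + sigma ** 2 / 2)
    return mean * _STD_NORMAL.cdf(sigma - z_q) / (1 - q)


if __name__ == "__main__":
    mu, sigma = 8.0, 1.2  # hypothetical parameters on the log-cost scale
    for q in (0.90, 0.95, 0.99):
        print(q, lognormal_quantile(q, mu, sigma), lognormal_cte(q, mu, sigma))
```

The CTE always exceeds the corresponding percentile, because it averages only the losses beyond it.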

Authors: Jéssica SPURI, Lucas Monteiro CHAVES Abstract: A proof of the friendship theorem, a classical result in combinatorics, is presented. Graphs are used extensively to explain each step of the proof and thus make it more intuitive. An application to experimental designs is also presented. PubDate: 2022-09-23 DOI: 10.28951/bjb.v40i3.586 Issue No: Vol. 40, No. 3 (2022)

Authors: Marcos Vinicius BUENO, Robson Marcelo ROSSI Abstract: The goal of this study was to use frequentist and Bayesian methodologies to fit probability distributions to the survival time of HIV/AIDS patients in Mato Grosso do Sul, Brazil, followed from 2009 to 2018. The influence of explanatory variables on the response variable can be quantified using regression models. According to the Akaike information criterion (AIC) and the maximized log-likelihood, the Log-Normal distribution was the most parsimonious for the data. Two regression models were built based on the described methodologies, and both led to the same interpretation of the explanatory variables: sex, race, education, and injecting drug use. According to the study, the median time to death from HIV/AIDS is approximately 2.1 times higher for females, 1.8 times higher for white people, 5.4 times higher for individuals with more than 8 years of education, and 5.5 times higher for individuals who do not use injecting drugs. Based on the interpretation of the model coefficients, the need for prevention and early diagnosis policies focused on groups with a shorter median survival time after notification of HIV infection can be discussed. PubDate: 2022-09-23 DOI: 10.28951/bjb.v40i3.574 Issue No: Vol. 40, No. 3 (2022)