Abstract: This paper proposes a supervised kernel-free quadratic surface regression method for feature selection (QSR-FS). The method fits a quadratic function to each class and incorporates it into the least squares loss function. The \(l_{2,1}\)-norm regularization term is introduced to obtain a sparse solution, and a feature weight vector is constructed from the coefficients of the quadratic functions in all classes to explain the importance of each feature. An alternating iteration algorithm is designed to solve the optimization problem of this model. The computational complexity of the algorithm is provided, and the iterative formula is reformulated to further accelerate computation. In the experimental part, feature selection and its downstream classification tasks are performed on eight datasets from different domains, and the experimental results are analyzed using relevant evaluation indices. Furthermore, feature selection interpretability and parameter sensitivity analyses are provided. The experimental results demonstrate the feasibility and effectiveness of our method. PubDate: 2024-02-15
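
The abstract does not give the exact objective function, but the role of the \(l_{2,1}\)-norm and of the coefficient-derived feature weights can be sketched as follows (an illustrative reading, not the paper's exact formulation): the norm sums the Euclidean norms of the rows of a coefficient matrix, so that whole rows (features) are driven to zero, and the surviving row norms act as feature weights.

```python
import numpy as np

def l21_norm(W):
    # l_{2,1} norm: sum over rows of the row-wise Euclidean norms,
    # which pushes entire rows (features) of W toward zero.
    return np.sum(np.linalg.norm(W, axis=1))

def feature_weights(W):
    # One plausible weight vector: the l2 norm of each feature's
    # coefficients across all classes, normalized to sum to one.
    row_norms = np.linalg.norm(W, axis=1)
    return row_norms / row_norms.sum()

# Toy coefficient matrix: 4 features x 3 classes.
W = np.array([[3.0, 4.0, 0.0],
              [0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0]])
weights = feature_weights(W)
```

Feature 2 has an all-zero row, so it receives weight zero, which is how a sparse row pattern translates into discarding a feature.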

Abstract: This paper introduces a new family of distributions called the hyperbolic tangent (HT) family. The cumulative distribution function of this model is defined using the standard hyperbolic tangent function. The fundamental properties of the distribution are thoroughly examined and presented. Additionally, an inverse exponential distribution is employed as a sub-model within the HT family, and its properties are also derived. The parameters of the HT family are estimated using the maximum likelihood method, and the performance of these estimators is assessed via simulation. To demonstrate the significance and flexibility of the newly introduced family, two real data sets are utilized; these serve as practical examples that showcase the applicability and usefulness of the HT family in real-world scenarios. By introducing the HT family, exploring its properties, employing maximum likelihood estimation, and conducting simulations and real data analyses, this paper contributes to the advancement of statistical modeling and distribution theory. PubDate: 2024-02-15
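
The abstract does not state the exact CDF of the HT family, but one natural tanh-based construction (an assumption for illustration only) composes a baseline CDF \(F\) with a normalized hyperbolic tangent, \(G(x) = \tanh(F(x))/\tanh(1)\), which is monotone and maps \([0,1]\) onto \([0,1]\). Below, the inverse exponential distribution mentioned in the abstract serves as the baseline.

```python
import math

def inv_exp_cdf(x, lam=1.0):
    # Baseline: inverse exponential CDF F(x) = exp(-lam / x), x > 0.
    return math.exp(-lam / x)

def ht_cdf(x, lam=1.0):
    # Hypothetical HT-family construction (an assumption, not the
    # paper's verified formula): G(x) = tanh(F(x)) / tanh(1), which is a
    # valid CDF transform because tanh is increasing with tanh(0) = 0.
    return math.tanh(inv_exp_cdf(x, lam)) / math.tanh(1.0)

vals = [ht_cdf(x) for x in (0.5, 1.0, 2.0, 100.0)]
```

Any construction of this type keeps the baseline's support while reshaping its tails, which is the usual source of the extra flexibility such families claim.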

Abstract: The main objective of this paper is to forecast the realized volatility (RV) of the Bitcoin futures (BTCF) market. To this end, we propose an augmented heterogeneous autoregressive (HAR) model that incorporates information on time-varying jumps observed in BTCF returns. Specifically, we estimate the jump-induced volatility using a GARCH-jump process and then include this information in the HAR model. Both in-sample and out-of-sample analyses show that jumps offer added information not provided by existing HAR models. In addition, a novel finding is that the jump-induced volatility offers incremental information relative to the Bitcoin implied volatility index. In sum, our results indicate that a HAR-RV process comprising leverage effects and jump volatility predicts RV more precisely than standard HAR-type models. These findings have important implications for cryptocurrency investors. PubDate: 2024-02-14
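
The baseline HAR-RV regression that the paper augments can be sketched as an OLS fit of next-day RV on daily, weekly (5-day average), and monthly (22-day average) RV components; the jump-volatility regressor would simply enter as an extra column. This is a sketch on synthetic data, not the paper's estimation.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
rv = np.abs(rng.normal(1.0, 0.1, T))  # synthetic realized volatility series

def har_design(rv):
    # Standard HAR components: daily RV, weekly (5-day) mean, monthly
    # (22-day) mean, each aligned to predict the next day's RV.
    d = rv[21:-1]
    w = np.array([rv[t - 4:t + 1].mean() for t in range(21, len(rv) - 1)])
    m = np.array([rv[t - 21:t + 1].mean() for t in range(21, len(rv) - 1)])
    X = np.column_stack([np.ones_like(d), d, w, m])
    y = rv[22:]
    return X, y

X, y = har_design(rv)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
```

The heterogeneity of the model comes from mixing these three horizons, mimicking traders who react at daily, weekly, and monthly frequencies.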

Abstract: In this paper, we propose and investigate a novel approach for generating probability distributions, known as the SMP transformation technique. Using this technique, we develop a new model of the Lomax distribution called the SMP Lomax (SMPL) distribution. The SMPL distribution offers greater flexibility than comparable well-known models such as the Sine Power Lomax, Power Length-Biased Weighted Lomax, exponentiated Lomax, and Lomax distributions. Furthermore, the article examines various aspects of the SMPL distribution, including its statistical properties and the maximum likelihood estimation procedure for its parameters. An extensive simulation study illustrates the behaviour of the MLEs in terms of mean squared error. To evaluate the effectiveness and flexibility of the proposed distribution, two real-life data sets are employed, and it is observed that the SMPL distribution outperforms the base Lomax distribution as well as the other competing models mentioned, based on the Akaike information criterion, corrected Akaike information criterion, Hannan–Quinn information criterion, and other goodness-of-fit measures. PubDate: 2024-02-13
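
The model-comparison criteria named in the abstract have standard closed forms given a fitted model's maximized log-likelihood, parameter count \(k\), and sample size \(n\); the numbers below are placeholders, not values from the paper.

```python
import math

def aic(loglik, k):
    # Akaike information criterion.
    return 2 * k - 2 * loglik

def aicc(loglik, k, n):
    # Small-sample corrected AIC.
    return aic(loglik, k) + 2 * k * (k + 1) / (n - k - 1)

def hqic(loglik, k, n):
    # Hannan-Quinn information criterion.
    return 2 * k * math.log(math.log(n)) - 2 * loglik

# Hypothetical fitted model: log-likelihood -250 with 3 parameters, n = 100.
scores = {"AIC": aic(-250.0, 3),
          "AICc": aicc(-250.0, 3, 100),
          "HQIC": hqic(-250.0, 3, 100)}
```

Lower values indicate a better trade-off between fit and complexity, which is the basis on which the SMPL distribution is compared against its competitors.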

Abstract: Identifying and controlling plant diseases is essential for healthy plant growth and good-quality produce. In this paper, we propose a novel model to detect whether a plant is diseased or healthy. The model was developed with a deep neural network (DNN) that extracts and evaluates features from plant leaf images. It is trained on two popular datasets, New Plant Diseases (Augmented) and Rice Leaf, with 38 and 4 classes of plant leaf images, respectively. The model extracts twelve features from a leaf image: total area, infected area, perimeter, x-centroid, y-centroid, mean intensity, equivalent diameter, entropy, eccentricity, energy, homogeneity, and dissimilarity. We observed that considering this many features yields good results, and the model has exhibited good performance on both datasets. The model is trained with different values for the following parameters: epoch, batch size, activation function, and dropout. When applied to the validation dataset, it showed good performance, and after considerable retraining the proposed model achieved 96% to 99% classification accuracy for certain classes. Compared to traditional machine learning models, the proposed model achieves better accuracy, and it has also been tested for consistency and reliability. PubDate: 2024-02-01
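
Several of the listed handcrafted features can be computed with plain numpy given a grayscale image and binary leaf/infection masks; this is a minimal sketch of a subset of the twelve features (the exact extraction pipeline in the paper is not specified).

```python
import numpy as np

def leaf_features(gray, leaf_mask, infected_mask):
    # A few of the listed features, computed from a grayscale leaf image
    # and binary masks (illustrative subset; perimeter, eccentricity, and
    # the texture features would need additional machinery).
    ys, xs = np.nonzero(leaf_mask)
    total_area = leaf_mask.sum()
    leaf_pixels = gray[leaf_mask.astype(bool)]
    # Shannon entropy of the leaf-pixel intensity histogram.
    hist, _ = np.histogram(leaf_pixels, bins=16, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return {
        "total_area": int(total_area),
        "infected_area": int(infected_mask.sum()),
        "x_centroid": float(xs.mean()),
        "y_centroid": float(ys.mean()),
        "mean_intensity": float(leaf_pixels.mean()),
        "entropy": float(-(p * np.log2(p)).sum()),
        "equivalent_diameter": float(np.sqrt(4 * total_area / np.pi)),
    }

# Toy 8x8 image: a uniform 4x4 leaf region with a 2x2 infected patch.
gray = np.full((8, 8), 100, dtype=float)
leaf = np.zeros((8, 8), dtype=int); leaf[2:6, 2:6] = 1
infected = np.zeros((8, 8), dtype=int); infected[2:4, 2:4] = 1
feats = leaf_features(gray, leaf, infected)
```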

Abstract: Over the past two decades, the data science, computer vision, and programming communities have evolved rapidly, and new programming techniques have replaced computationally expensive ones. This has been achieved with the aid of smart programming languages, powerful computers, and intelligent minds. Neural networks have been superseded by deep neural networks comprising several layers and neurons, and direct large-scale data classification has been replaced by transfer learning tools, which are computationally more efficient and accurate as long as the user has a clear vision of aligning the new problem with the pre-trained model. Artificial intelligence tools have improved greatly since the advent of transfer learning, and the programming time of several days or weeks for deep networks has been reduced to a few minutes or hours. This article presents a detailed insight into the transfer learning framework with the aid of some useful programming tools. PubDate: 2024-02-01

Abstract: In this paper, the authors propose a data-driven approach to draw insightful knowledge from Indian crime data. The proposed approach can help police and other law enforcement bodies in India control and prevent crime region-wise. Different regression models are built based on different regression algorithms, viz., random forest regression (RFR), decision tree regression (DTR), multiple linear regression (MLR), simple linear regression (SLR), and support vector regression (SVR), after pre-processing the data using MySQL Workbench and R programming. These regression models can predict 28 different types of Indian Penal Code (IPC) cognizable crime counts, as well as the total IPC cognizable crime count region-wise, state-wise, and year-wise (for the whole country), given the desired inputs. Data visualization techniques, namely chord diagrams and map plots, are used to visualize the pre-processed data (for the years 2014 to 2020) and the data predicted by the relatively best regression model for the year 2022. For the chosen data, the RFR model predicting total IPC cognizable crime fits relatively best, with an adjusted R-squared value of 0.96 and a MAPE of 0.2; among the models predicting region-wise theft crime counts, the RFR-based model again fits best, with an adjusted R-squared value of 0.96 and a MAPE of 0.166. These models predict that Andhra Pradesh state will have the highest crime counts, with Adilabad district at the top, at 31,933 predicted crimes. PubDate: 2024-02-01
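
The two goodness-of-fit metrics reported above, adjusted R-squared and MAPE, have standard definitions that can be stated in a few lines (the toy data below is purely illustrative, not the paper's crime data).

```python
import numpy as np

def adjusted_r2(y_true, y_pred, n_features):
    # Adjusted R-squared penalizes R-squared for the number of predictors.
    n = len(y_true)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

def mape(y_true, y_pred):
    # Mean absolute percentage error, expressed as a fraction.
    return np.mean(np.abs((y_true - y_pred) / y_true))

# Hypothetical observed vs. predicted crime counts.
y_true = np.array([100.0, 200.0, 300.0, 400.0, 500.0, 600.0])
y_pred = np.array([110.0, 190.0, 310.0, 390.0, 520.0, 580.0])
score = adjusted_r2(y_true, y_pred, n_features=2)
err = mape(y_true, y_pred)
```

Note that MAPE is undefined when a true count is zero, which matters for sparse region-wise crime categories.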

Abstract: Recently, the bivariate Weibull distribution based on different copula functions has received considerable attention in the statistical literature. For progressive Type-II censored samples from a bivariate distribution, the effective censored sample size \(m\) is fixed; the progressive censoring scheme is applied to one variable during the experiment, with the second variable concomitant to it. The Farlie–Gumbel–Morgenstern (FGM) copula is used to construct the bivariate Weibull distribution, called the FGM bivariate Weibull (FGMBW) distribution. In this paper, we consider point and interval estimation of the unknown parameters of the FGMBW distribution based on progressive Type-II censored samples. Two bootstrap confidence intervals are also proposed. In addition, two real data sets are introduced and analyzed to examine the model in practice, and a simulation study is conducted to compare the performance of different censoring schemes. PubDate: 2024-02-01
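
The FGM construction itself is short enough to state directly: the joint CDF plugs two Weibull marginal CDFs into the FGM copula \(C(u,v) = uv\bigl(1+\theta(1-u)(1-v)\bigr)\). The parameter values below are arbitrary placeholders.

```python
import math

def weibull_cdf(x, shape, scale):
    # Weibull marginal CDF.
    return 1.0 - math.exp(-((x / scale) ** shape))

def fgm_copula(u, v, theta):
    # Farlie-Gumbel-Morgenstern copula; theta in [-1, 1] controls the
    # (necessarily weak) dependence between the marginals.
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def fgmbw_cdf(x, y, theta, a1, b1, a2, b2):
    # Joint CDF of the FGM bivariate Weibull: copula applied to the
    # two Weibull marginal CDFs.
    u = weibull_cdf(x, a1, b1)
    v = weibull_cdf(y, a2, b2)
    return fgm_copula(u, v, theta)

F = fgmbw_cdf(1.0, 2.0, theta=0.5, a1=1.5, b1=1.0, a2=2.0, b2=2.0)
```

Setting theta = 0 recovers independence (the joint CDF factors into the product of the marginals), which is a useful sanity check on any implementation.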

Abstract: In the domain of natural language processing, part-of-speech (POS) tagging is a fundamental task. It plays a vital role in applications like sentiment analysis, text summarization, and opinion mining. POS tagging is the process of assigning POS information (noun, pronoun, verb, etc.) to a given word, considered in the context of its relationship with the surrounding words. Hindi is a very popular language in countries like India, Nepal, the United States, and Mauritius. The majority of Indians are accustomed to reading and writing Hindi, and they also use it on social media such as Twitter, Facebook, and WhatsApp. POS tagging is the most important phase in analyzing such Hindi text from social media. Text scripted in Hindi is ambiguous in nature and rich in morphology, which makes identifying POS information challenging. In this article, a heuristic-based approach is proposed for identifying POS information. The proposed method deploys a context-based bigram model that creates a bigram sequence based on the relationship with the adjacent words. Subsequently, it selects the most likely POS tag for a word based on both the forward and reverse bigram sequences. The experimental results of the proposed heuristic approach are compared with existing state-of-the-art techniques like hidden Markov models, decision trees, conditional random fields, support vector machines, neural networks, and recurrent neural networks. The proposed heuristic approach for POS tagging in Hindi outperforms the existing techniques and attains an accuracy of 94.3%. PubDate: 2024-02-01
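
The forward-plus-reverse bigram idea can be sketched in a few lines: score each candidate tag by how often it follows the previous tag (forward bigram) and precedes the next tag (reverse bigram), then keep the best. The toy counts and tag names below are hypothetical, not from the paper's Hindi corpus.

```python
from collections import defaultdict

# Toy tag-transition counts from a hypothetical tagged corpus:
# counts[(prev_tag, tag)] and counts[(tag, next_tag)].
bigram_counts = defaultdict(int)
for prev, cur in [("DET", "NOUN")] * 8 + [("DET", "VERB")] * 1:
    bigram_counts[(prev, cur)] += 1
for cur, nxt in [("NOUN", "VERB")] * 7 + [("VERB", "VERB")] * 2:
    bigram_counts[(cur, nxt)] += 1

def best_tag(prev_tag, next_tag, candidates):
    # Score each candidate tag by its forward bigram (prev_tag, tag)
    # plus its reverse bigram (tag, next_tag); keep the highest scorer.
    def score(tag):
        return bigram_counts[(prev_tag, tag)] + bigram_counts[(tag, next_tag)]
    return max(candidates, key=score)

tag = best_tag("DET", "VERB", ["NOUN", "VERB"])
```

Combining both directions is what lets the heuristic disambiguate a word whose forward context alone is inconclusive.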

Abstract: Accumulation of the amyloid-\(\beta \) (A\(\beta \)) peptide in the brain gives rise to a cascade of key events in the pathogenesis of Alzheimer's disease (AD). Different research trials have verified that the sleep-wake cycle directly affects A\(\beta \) levels in the brain. The catalytic nature of amyloidosis and protein aggregation can be understood with the help of enzyme kinetics. In this research, the chemical kinetics of enzyme and substrate are used to explore the initiation of Alzheimer's disease and the associated physiological factors, such as the sleep-wake cycle, related to this symptomatology. The model is based on the concentration of A\(\beta \) fibrils, such that the solution of the mathematical model may help to monitor the concentration gradients (deposition) during sleep deprivation. The model analyzes the existence of two phases in the production of amyloid fibrils under sleep deprivation: a first phase in which the soluble form of A\(\beta \) is dominant, and a second phase in which the fibrillar form predominates, suggesting that this product results from a strong imbalance between the production of A\(\beta \) and its clearance. The time-dependent model with delay helps to explore the production of the soluble A\(\beta \) form under a defective circadian cycle. The limitations of the time-dependent model are addressed with artificial intelligence (AI) time series forecasting tools. PubDate: 2024-02-01
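
A minimal production-clearance sketch with delayed clearance (an assumption for illustration; the paper's actual delay equations are not given in the abstract): soluble A-beta S is produced at a constant rate, cleared with a circadian delay tau, and slowly converted to an irreversible fibrillar pool F, integrated by a forward Euler scheme with a delay buffer.

```python
def simulate_abeta(p=1.0, c=0.5, k=0.05, tau=2.0, dt=0.01, t_end=50.0):
    # Hypothetical model: dS/dt = p - c*S(t - tau) - k*S,  dF/dt = k*S.
    # p: production rate, c: delayed clearance rate (defective circadian
    # clearance), k: conversion of soluble S into fibrillar F.
    n = int(t_end / dt)
    lag = int(tau / dt)
    S = [0.0] * (n + 1)
    F = [0.0] * (n + 1)
    for i in range(n):
        s_delayed = S[i - lag] if i >= lag else 0.0  # history assumed zero
        S[i + 1] = S[i] + dt * (p - c * s_delayed - k * S[i])
        F[i + 1] = F[i] + dt * (k * S[i])
    return S, F

S, F = simulate_abeta()
```

The two phases described in the abstract appear naturally here: S dominates early while the delayed clearance is inactive, and the monotonically growing F pool dominates later once the imbalance accumulates.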

Abstract: Statistical techniques allow assertive and controlled studies of projects, processes, and products, aiding management decision-making. Statistical process control (SPC) is one of the most important and powerful statistical tools for measuring, monitoring, and improving the quality of processes and products. Adopting artificial intelligence (AI) has recently gained increasing attention in the SPC literature. This paper presents a combined use of SPC and AI techniques, resulting in a novel and efficient process monitoring tool. The proposed prediction control chart, which we call the pred-chart, may be regarded as a more robust and flexible alternative to traditional SPC tools, given that it tracks the median behavior of the process. Besides its ability to recognize patterns and diagnose anomalies in the data regardless of the sample scenario, this approach can also perform its monitoring functions at large scale, predicting market scenarios and processes on massive amounts of data. The performance of the pred-chart is evaluated by the average run length (ARL), computed through Monte Carlo simulation studies. Two real data sets (small and medium) are also used to illustrate the applicability and usefulness of the proposed control chart for the prediction of continuous outcomes. PubDate: 2024-02-01
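
The Monte Carlo ARL computation mentioned above can be sketched for the simplest case, a Shewhart-style chart on an in-control standard normal process: simulate many runs, record how many observations pass before the first point falls outside the control limits, and average. This illustrates the evaluation metric, not the pred-chart itself.

```python
import numpy as np

def simulate_arl(limit=3.0, n_runs=2000, max_len=10000, seed=42):
    # Monte Carlo estimate of the in-control average run length (ARL):
    # for each run, count observations until a standard-normal process
    # first exceeds +/- limit, then average over runs.
    rng = np.random.default_rng(seed)
    run_lengths = []
    for _ in range(n_runs):
        x = rng.standard_normal(max_len)
        hits = np.nonzero(np.abs(x) > limit)[0]
        run_lengths.append(hits[0] + 1 if hits.size else max_len)
    return float(np.mean(run_lengths))

arl = simulate_arl()
```

For 3-sigma two-sided limits the theoretical in-control ARL is about 370, so the simulated value should land in that neighborhood.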

Abstract: Diabetic retinopathy (DR) is considered the leading cause of blindness in the population. High blood sugar levels can damage the tiny blood vessels in the retina at any time, leading to retinal detachment and sometimes glaucoma blindness. Treatment focuses on maintaining the patient's current visual quality, as the disease is irreversible, so early diagnosis and timely treatment are crucial to minimizing the risk of vision loss. However, existing DR recognition strategies face numerous challenges, such as limited training datasets, high training loss, high-dimensional features, and high misclassification rates, which can significantly affect classification accuracy. In this paper, we propose a ResNet-50-based transfer learning method for classifying DR, which leverages the knowledge gained from training on a large dataset such as ImageNet. Our method preprocesses and segments the input images, which are then fed into ResNet-50 for extracting optimal features. We freeze a few layers of the pre-trained ResNet-50 and add global average pooling to generate feature maps, which are then classified to categorize the type of diabetic retinopathy. We evaluated the proposed method on 40 real-time fundus images gathered from ICF Hospital together with the APTOS-2019 dataset, using various performance metrics. The experimental results revealed that the proposed method achieved an accuracy of 99.82%, a sensitivity of 99%, a specificity of 96%, and an AUC score of 0.99, outperforming existing DR recognition techniques. Overall, our ResNet-50-based transfer learning method presents a promising approach for DR classification and addresses the existing challenges of DR recognition strategies, with the potential to aid early DR diagnosis, timely treatment, and improved visual outcomes for patients. PubDate: 2024-02-01
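
The reported sensitivity and specificity follow directly from the confusion matrix of a binary diseased/healthy decision; a minimal sketch (with made-up labels, not the paper's predictions) makes their definitions concrete.

```python
def binary_metrics(y_true, y_pred):
    # Confusion-matrix counts for a binary classifier
    # (1 = diseased, 0 = healthy).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
    }

m = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0],
                   [1, 1, 1, 0, 0, 0, 0, 1])
```

High sensitivity is the clinically critical number here: a missed DR case (false negative) costs far more than a false alarm.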

Abstract: In this paper we present a new G family of probability distributions and derive some of its mathematical properties. Based on a special member of the new family, a single acceptance sampling plan is considered, addressing the case where the lifetime test is truncated at a pre-determined period. For various acceptance levels, confidence limits, and ratios of test time to the specified mean life, the minimum sample size required to assure the specified mean life is obtained. The lowest ratios of actual mean life to specified mean life that ensure acceptance with a given probability are also presented, and a case study illustrates the approach. PubDate: 2024-02-01
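
The mechanics of a truncated-life single sampling plan can be sketched generically: test n items for a fixed time t, accept the lot if at most c items fail, and note that the failure count is Binomial(n, p), where p is the probability that one item fails by time t under the assumed lifetime distribution. An exponential lifetime is used below purely as a stand-in for the paper's G-family member.

```python
import math

def acceptance_probability(n, c, p):
    # Single sampling plan: accept the lot if at most c of the n tested
    # items fail before the truncation time; the failure count is
    # Binomial(n, p).
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

def failure_prob_exponential(t, mean_life):
    # Illustrative lifetime assumption (not the paper's distribution):
    # exponential lifetimes give p = P(fail by t) = 1 - exp(-t / mean_life).
    return 1.0 - math.exp(-t / mean_life)

p = failure_prob_exponential(t=500.0, mean_life=1000.0)
pa = acceptance_probability(n=10, c=2, p=p)
```

The tabulated plan values in such papers are found by searching over n for the smallest sample size whose acceptance probability meets the required confidence level.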

Abstract: In this paper, we propose a new generalization of the Pareto distribution with a truncation parameter, and provide motivation for the construction. The shapes of the density and hazard functions are studied in mathematical detail. The raw moments are derived, and stochastic ordering is also discussed. Parameter estimation is addressed through several estimators, which are compared in a simulation study. Three examples are provided using real data sets from the literature. PubDate: 2024-02-01
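
As a point of reference for what a truncation parameter does here, the standard upper-truncated Pareto CDF on \([b, c]\) is \(F(x) = \bigl(1-(b/x)^{\alpha}\bigr)/\bigl(1-(b/c)^{\alpha}\bigr)\); the paper's distribution generalizes this idea, so the sketch below is the classical baseline, not the proposed model.

```python
def trunc_pareto_cdf(x, alpha, b, c):
    # Upper-truncated Pareto CDF on [b, c]: the ordinary Pareto CDF
    # renormalized so that F(c) = 1.
    return (1.0 - (b / x) ** alpha) / (1.0 - (b / c) ** alpha)

x_vals = [trunc_pareto_cdf(x, 2.0, 1.0, 10.0) for x in (1.0, 2.0, 5.0, 10.0)]
```

Truncation restores all raw moments (the untruncated Pareto has infinite moments beyond order alpha), which is one common motivation for introducing the extra parameter.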

Abstract: Social media plays a vital role in today's world, as it sits at the pinnacle of data sharing. Advances in technology have made a huge amount of information available for data analysis. People express and share their opinions across various social media platforms like Twitter, Facebook, and Instagram; Twitter in particular is a prodigious platform containing an ample amount of data, and analyzing that data is a top priority. Sentiment analysis is one of the most widely utilized approaches for classifying the emotions an individual displays in subjective data. It is commonly performed using machine learning algorithms such as support vector machines, naive Bayes, long short-term memory networks, and decision tree classifiers, but this paper presents a generalized way of performing Twitter sentiment analysis using a Flask environment. The Flask application provides functionality to classify the sentiment of text into three categories: positive, negative, and neutral. It also makes API calls to a Twitter developer account to fetch Twitter data. After fetching and analyzing the data, the results are displayed on a webpage as a pie chart showing the percentages of positive, negative, and neutral tweets for a phrase, along with a language analysis for the same phrase. Furthermore, the webpage highlights the tweets about that phrase and reveals their details. Considering major industry players from three different sectors, namely enterprises, the sports apparel industry, and the multimedia industry, we analyzed and compared the sentiments of two multinational companies from each sector. PubDate: 2024-02-01

Abstract: Because of the COVID-19 pandemic, most tasks have shifted to online platforms, and sectors such as e-commerce, sensitive multimedia transfer, and online banking have skyrocketed. There is therefore an urgent need to develop highly secure algorithms that cannot be broken by unauthorized users. A key building block for encryption algorithms is the pseudo-random number generator based on chaotic maps: mathematical functions that generate highly arbitrary patterns based on an initial seed value. This manuscript summarizes how chaotic maps are used to generate pseudo-random numbers and perform multimedia encryption. After carefully analyzing the recent literature, we found that the lowest correlation coefficient was 0.00006, achieved by the Ikeda chaotic map; the highest entropy was 7.999995 bits per byte, using a quantum chaotic map; the lowest execution time observed was 0.23 seconds, with the Zaslavsky chaotic map; and the highest data rate was 15.367 Mbits per second, using a hyperchaotic map. Chaotic map-based pseudo-random number generation can be utilized in multimedia encryption, video-game animations, digital marketing, chaotic system simulation, chaotic missile systems, and other applications. PubDate: 2024-02-01
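
The simplest chaotic-map generator in this literature is the logistic map, x → r·x·(1−x) with r near 4; quantizing each iterate yields a byte stream whose extreme sensitivity to the seed is the property encryption schemes exploit. This is an illustrative sketch only, not a cryptographically vetted generator.

```python
def logistic_prng_bytes(seed, n, r=3.99):
    # Iterate the logistic map in its chaotic regime (r close to 4) and
    # quantize each iterate x in (0, 1) to one byte.
    x = seed
    out = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(int(x * 256) % 256)
    return bytes(out)

stream_a = logistic_prng_bytes(0.123456789, 64)
stream_b = logistic_prng_bytes(0.123456780, 64)  # tiny seed perturbation
```

The same seed always reproduces the same stream (needed for decryption), while a perturbation in the ninth decimal place produces a completely different stream within a few dozen iterations.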

Abstract: In this paper, we present a machine learning based approach to detect phishing websites in real time, using URL and hyperlink based hybrid features to achieve high accuracy without relying on any third-party systems. In phishing, attackers typically try to deceive internet users by masking a webpage as an official genuine webpage to steal sensitive information such as usernames, passwords, social security numbers, and credit card information. Anti-phishing solutions like blacklists or whitelists, heuristics, and visual similarity based methods cannot detect zero-hour phishing attacks or brand-new websites. Moreover, earlier approaches are complex and unsuitable for real-time environments due to their dependency on third-party sources, such as search engines. Hence, detecting recently developed phishing websites in a real-time environment is a great challenge in cybersecurity. To overcome these problems, this paper proposes a hybrid feature based anti-phishing strategy that extracts features from the URL and hyperlink information on the client side only. We also develop a new dataset for conducting experiments with popular machine learning classification techniques. Our experimental results show that the proposed phishing detection approach is more effective than traditional approaches, achieving a detection accuracy of 99.17% with the XGBoost technique. PubDate: 2024-02-01
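
Client-side lexical URL features of the kind this literature uses can be extracted with no network access at all; the subset below (length, "@" symbol, dot and hyphen counts, raw-IP host, HTTPS) is illustrative and is not claimed to be the paper's exact feature set.

```python
import re

def url_features(url):
    # A few lexical URL features commonly used in phishing detection;
    # an "@" or a raw IP address in the URL is a classic phishing signal.
    return {
        "length": len(url),
        "has_at": int("@" in url),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "has_ip": int(bool(re.match(r"^https?://\d{1,3}(\.\d{1,3}){3}", url))),
        "is_https": int(url.startswith("https://")),
    }

f = url_features("http://192.168.0.1/secure-login.update@bank.example.com")
```

A feature dictionary like this, stacked over many URLs, is exactly the tabular input a classifier such as XGBoost consumes.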

Abstract: In this study, a one-parameter discrete probability distribution, named the Poisson moment exponential distribution, is proposed and studied. The mathematical properties of the proposed distribution are derived and discussed. For parameter estimation, seven different methods are used: maximum likelihood, maximum product spacing, Anderson–Darling, Cramér–von Mises, least squares, weighted least squares, and right-tailed Anderson–Darling. The behavior of these estimators is assessed using a Monte Carlo simulation study. Four real datasets from different fields (failure times, slow-paced students' marks, epileptic seizure counts, and European corn borer counts) are used to show the flexibility of the proposed distribution, which efficiently models these datasets. PubDate: 2024-02-01
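
Reading the name as a Poisson mixture with a moment exponential mixing density \(f(\lambda)=\lambda e^{-\lambda/\theta}/\theta^2\) (an assumption about the construction, not confirmed by the abstract), the mixture integral has the closed form \(P(X=k)=(k+1)\,\theta^k/(1+\theta)^{k+2}\), whose mean equals the mixing mean \(2\theta\).

```python
def pme_pmf(k, theta):
    # Poisson mixed over a moment exponential density
    # f(lam) = lam * exp(-lam/theta) / theta**2 (assumed construction):
    # integrating e^{-lam} lam^k / k! against f gives
    # P(X = k) = (k + 1) * theta**k / (1 + theta)**(k + 2).
    return (k + 1) * theta**k / (1 + theta) ** (k + 2)

theta = 1.5
probs = [pme_pmf(k, theta) for k in range(200)]
mean = sum(k * p for k, p in enumerate(probs))
```

Checking that the pmf sums to one and that the empirical mean matches \(2\theta\) is a quick way to validate such a derivation numerically.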

Abstract: In recent years, cryptocurrencies have emerged as a prime digital currency and an important asset class, and the associated financial system is growing in importance. To reduce investment risk and to support price and trend prediction, portfolio construction, and fraud detection, artificial intelligence techniques are required. The paper discusses recent research in the field of AI techniques for cryptocurrencies, with a focus on Bitcoin, the most popular cryptocurrency. AI and ML techniques such as SVM, ANN, LSTM, and GRU, along with other related work on cryptocurrency and Bitcoin, have been reviewed, and the most relevant studies are discussed. Possible research opportunities and directions for improving the efficiency of results are also highlighted. In the past few years, artificial intelligence (AI) and cybersecurity have advanced expeditiously; AI's implementation has been extensively useful in finance and has a crucial impact on markets, institutions, and legislation. AI involves the simulation of machines that replicate intelligent human behavior, and AI in finance is changing the way we interact with money: it helps the financial industry streamline and optimize processes from credit judgments to quantitative analysis, marketing, and economic risk management. The main goal of this research is to investigate certain impacts of artificial intelligence on the contemporary world, centered on the appeal of artificial intelligence, its challenges and opportunities, and its influence on professions and careers. The paper also considers how AI enables banks to manage financial resources and provide valuable customer services: the growing Indian banking sector, comprising banks such as RBI, SBI, and HDFC, has deployed chatbots that bring benefits to customers. PubDate: 2024-02-01

Abstract: In recent times, various machine learning approaches have been widely employed for effective diagnosis and prediction of diseases like cancer, thyroid disorders, and Covid-19. Alzheimer's disease (AD) is likewise a progressive malady that destroys memory and cognitive function over time. Unfortunately, there are no dedicated AI-based solutions for the diagnosis of AD to go hand in hand with medical diagnosis, even though multiple factors contribute to the diagnosis, making AI a very viable supplementary diagnostic tool. This paper reports an endeavor to apply various machine learning algorithms, including SGD, k-nearest neighbors, logistic regression, decision trees, random forests, AdaBoost, neural networks, SVM, and naive Bayes, to a dataset of affected patients to diagnose Alzheimer's disease. Longitudinal collections of subjects from the OASIS dataset have been used for prediction. Moreover, feature selection and dimension reduction methods, namely information gain, information gain ratio, the Gini index, chi-squared, and PCA, are applied to rank different factors and identify the optimum number of factors for disease diagnosis. Furthermore, the performance of each classifier is evaluated in terms of ROC-AUC, accuracy, F1 score, recall, and precision, and a comparative analysis between the algorithms is included. Our study suggests that approximately 90% classification accuracy is achieved using the four top-rated features: CDR, SES, nWBV, and EDUC. PubDate: 2024-02-01
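
Information gain, the first of the feature-ranking methods listed, is the drop in label entropy obtained by conditioning on a (discretized) feature; a minimal sketch with hypothetical AD/control labels shows how a cleanly separating feature outranks an uninformative one.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a label list.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    # IG = H(labels) - sum_v P(feature = v) * H(labels | feature = v).
    n = len(labels)
    cond = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        cond += (len(subset) / n) * entropy(subset)
    return entropy(labels) - cond

# Hypothetical data: binned CDR separates the classes; the second
# feature is nearly uninformative here.
labels = ["AD", "AD", "AD", "CTL", "CTL", "CTL"]
cdr_binned = ["high", "high", "high", "low", "low", "low"]
ses_binned = ["a", "b", "a", "b", "a", "b"]
ig_cdr = information_gain(cdr_binned, labels)
ig_ses = information_gain(ses_binned, labels)
```

Ranking all features by such scores and keeping the top few is exactly the process that leads to a reduced set like CDR, SES, nWBV, and EDUC.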