Abstract: Smart contracts are slowly penetrating our society, where they are leveraged to support critical business transactions in which the financial stakes are high. Smart contract programming is, however, in its infancy, and many failures due to programming defects exploited by malicious attackers have made the headlines. In recent years, there has been an increasing effort in the literature to identify such vulnerabilities early in smart contracts to reduce the threats to the security of the accounts. Automatically patching smart contracts, however, is a much less investigated research topic. Yet, it can provide tools to help developers fix known vulnerabilities more rapidly. In this paper, we propose to review smart contract vulnerabilities and specify templates that serve to automate patch generation. We implement the TIPS pipeline with 12 fix templates and assess its effectiveness on established smart contract datasets such as SmartBugs and ContractDefects. In particular, we show that TIPS is competitive against the state-of-the-art automated repair approach (SCRepair) in the literature. Finally, we evaluate the impact of the code changes suggested by TIPS in terms of gas usage. PubDate: 2023-09-13
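To make the idea of template-based patching concrete, here is a minimal Python sketch that applies a single toy fix template (wrapping an unchecked `send` call in `require(...)`, a well-known Solidity repair pattern). The regex, helper name, and contract snippet are assumptions for illustration only; this is not one of TIPS's actual 12 templates.

```python
import re

# Toy fix template: wrap unchecked low-level send calls in require(...),
# a common fix for the "unchecked send" vulnerability class. This is not
# TIPS's implementation; it only illustrates locating a vulnerable pattern
# and rewriting it from a template.
UNCHECKED_SEND = re.compile(r"^(\s*)(\w+\.send\([^;]*\));", re.MULTILINE)

def apply_unchecked_send_template(source: str) -> str:
    """Rewrite `x.send(v);` into `require(x.send(v));` wherever it appears."""
    return UNCHECKED_SEND.sub(r"\1require(\2);", source)

contract = """
function pay(address payable to, uint amount) public {
    to.send(amount);
}
"""
print(apply_unchecked_send_template(contract))
```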
Abstract: App reviews hold a huge amount of informative user feedback that may be used to assist software practitioners in better understanding users’ needs, identifying quality-related issues such as privacy concerns and low efficiency, and evaluating users’ perceived satisfaction with the app features. One way to efficiently extract this information is by using Aspect-Based Sentiment Analysis (ABSA). The role of ABSA of app reviews is to identify all of an app’s aspects being reviewed and assign a sentiment polarity towards each aspect. This paper aims to build ABSA models using supervised Machine Learning (ML) and Deep Learning (DL) approaches. Our automated technique is intended to (1) identify the most useful and effective text-representation and task-specific features in both Aspect Category Detection (ACD) and Aspect Category Polarity, (2) empirically investigate the performance of conventional ML models when utilized for the ABSA task on app reviews, and (3) empirically compare the performance of ML models and DL models in the context of the ABSA task. We built the models using different algorithms/architectures and performed hyper-parameter tuning. In addition, we extracted a set of relevant features for the ML models and performed an ablation study to analyze their contribution to the performance. Our empirical study showed that the ML model trained using the Logistic Regression algorithm and BERT embeddings outperformed the other models. Although ML outperformed DL, DL models do not require hand-crafted features and they allow for better learning of features when trained with more data. PubDate: 2023-09-09
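As a rough sketch of the best-performing setup described above (a conventional classifier over contextual embeddings), the snippet below trains Logistic Regression for the Aspect Category Detection sub-task. The sentence-embedding model, example reviews, and aspect labels are assumptions standing in for the paper's BERT embeddings and annotated review dataset.

```python
# Minimal sketch of Aspect Category Detection as a supervised classifier over
# pre-trained sentence embeddings; not the authors' pipeline or features.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

reviews = [
    "The app keeps asking for my contacts for no reason.",   # privacy
    "Battery drains really fast while the app runs.",        # efficiency
    "Love the new dark mode, looks great.",                   # UI
]
aspects = ["privacy", "efficiency", "UI"]

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # stand-in for BERT embeddings
X = encoder.encode(reviews)

clf = LogisticRegression(max_iter=1000).fit(X, aspects)
print(clf.predict(encoder.encode(["Why does it need my location all the time?"])))
```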
Abstract: The Go programming language offers a wide range of primitives to coordinate lightweight threads, e.g., channels, waitgroups, and mutexes—all of which may cause concurrency bugs. Static checkers that guarantee the absence of bugs are essential to help programmers avoid these costly errors before their code is executed. However, existing tools either miss too many bugs or cannot handle large programs, and do not support programs that rely on statically unknown parameters that affect their concurrent structure (e.g., number of threads). To address these limitations, we propose a static checker for Go programs which relies on performing bounded model checking of their concurrent behaviours. In contrast to previous works, our approach deals with large codebases, supports programs that have statically unknown parameters, and is extensible to additional concurrency primitives. Our work includes a detailed presentation of the extraction algorithm from Go programs to models, an algorithm to automatically check programs with statically unknown parameters, and a large-scale evaluation of our approach. The latter shows that our approach outperforms the state-of-the-art on 220 synthetic programs and 78 buggy programs adapted from existing codebases. PubDate: 2023-08-26
Abstract: Widespread applications of deep neural networks (DNNs) benefit from DNN testing to guarantee their quality. In DNN testing, numerous test cases are fed into the model to explore potential vulnerabilities, but checking their labels requires expensive manual effort. Therefore, test case prioritization has been proposed to reduce the labeling cost, e.g., surprise adequacy-based, uncertainty quantifier-based and mutation-based prioritization methods. However, most of them suffer from limited scenarios (i.e., high-confidence adversarial or false-positive cases) and high time complexity. To address these challenges, we propose the concept of the activation graph from the perspective of the spatial relationship of neurons. We observe that the activation graph of cases that trigger the model’s misbehavior differs significantly from that of normal cases. Motivated by this, we design a test case prioritization method based on the activation graph, ActGraph, which extracts high-order node features of the activation graph for prioritization. ActGraph explains the differences between test cases to address the problem of scenario limitation. Without mutation operations, ActGraph is easy to implement, leading to lower time complexity. Extensive experiments on three datasets and four models demonstrate that ActGraph has the following key characteristics. (i) Effectiveness and generalizability: ActGraph shows competitive performance in all of the natural, adversarial and mixed scenarios, especially in RAUC-100 improvement (~1.40×). (ii) Efficiency: ActGraph runs at a lower time cost (~1/50×) than the state-of-the-art method. The code of ActGraph is open-sourced at https://github.com/Embed-Debuger/ActGraph. PubDate: 2023-08-22
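A heavily simplified reading of the idea is sketched below: build a per-input graph over neurons, summarize each neuron by a degree-style feature, and rank inputs by how far their feature vector lies from that of normal inputs. The tiny random network, the edge weighting, and the scoring rule are all assumptions for illustration; this is not ActGraph's actual algorithm.

```python
# Toy activation-graph-style prioritization; NOT ActGraph's implementation.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 4))   # 2-layer toy net

def activations(x):
    h = np.maximum(0, x @ W1)           # hidden-layer ReLU activations
    return h, np.maximum(0, h @ W2)     # output-layer activations

def node_features(x):
    """Weighted-degree feature per neuron in the activation graph of input x."""
    h, o = activations(x)
    edges = np.abs(W2) * np.outer(h, o)            # edge weight = h_i * |w_ij| * o_j
    return np.concatenate([edges.sum(axis=1),      # out-degree of hidden nodes
                           edges.sum(axis=0)])     # in-degree of output nodes

normal = rng.normal(size=(200, 8))                 # reference ("normal") inputs
mean_feat = np.mean([node_features(x) for x in normal], axis=0)

tests = rng.normal(size=(50, 8)) * rng.choice([1.0, 3.0], size=(50, 1))
scores = [np.linalg.norm(node_features(x) - mean_feat) for x in tests]
ranking = np.argsort(scores)[::-1]                 # most "suspicious" inputs first
print(ranking[:10])
```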
Abstract: Program translation aims to translate one programming language into another, e.g., from Python to Java. Because constructing translation rules purely by human effort (software engineers) is inefficient, and the quality of purely machine-generated translations is low, it has been suggested that program translation be implemented in a human–machine cooperative way. However, existing human–machine program translation methods fail to utilize human ability effectively, as they require humans to post-edit the results (i.e., statically modify the model-generated code directly). To solve this problem, we propose HMPT (Human-Machine Program Translation), a novel method that achieves program translation based on human–machine cooperation. It can (1) reduce the human effort by introducing a prefix-based interactive protocol that feeds the human’s edit into the model as the prefix and regenerates better output code, and (2) reduce the interactive response time caused by excessive program length in the regeneration process from two aspects: avoiding duplicate prefix generation by caching attention information, and reducing invalid suffix generation by splicing the suffix of the results. The experiments are conducted on two real datasets. Results show that, compared to the baselines, our method reduces the human effort by up to 73.5% at the token level and reduces the response time by up to 76.1%. PubDate: 2023-08-21
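The sketch below shows only the shape of such a prefix-based interaction: the "model" is a canned lookup and the single simulated human edit is invented, purely to illustrate how an accepted prefix is fed back so that only the suffix is regenerated. It is not HMPT's model, caching, or suffix-splicing mechanism.

```python
# Toy prefix-based interactive translation loop; not HMPT's implementation.
def generate_suffix(source: str, target_prefix: str) -> str:
    # Stand-in for an autoregressive model: return whatever follows the prefix
    # in one of two canned candidate translations.
    candidates = [
        "public int add(int a, int b) { return a - b; }",   # first (wrong) draft
        "public int add(int a, int b) { return a + b; }",   # draft consistent with the fix
    ]
    for cand in candidates:
        if cand.startswith(target_prefix):
            return cand[len(target_prefix):]
    return ""

source = "def add(a, b): return a + b"
prefix = ""
print("draft 1:", prefix + generate_suffix(source, prefix))

# Simulated human edit: accept everything up to the operator and fix it;
# the confirmed prefix is fed back and only the remainder is regenerated.
prefix = "public int add(int a, int b) { return a +"
print("draft 2:", prefix + generate_suffix(source, prefix))
```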
Abstract: The testing of Android applications (apps) is a challenging task due to serious fragmentation issues and diverse usage environments. To improve testing efficiency and collect feedback from real usage scenarios, crowdsourcing has been employed in Android testing. However, crowdsourced testing is a manual working paradigm, and the shortage of testing guidance for crowd workers, who often have limited software engineering knowledge, may result in many redundant or invalid test reports. To fill this gap, this paper presents an automated test case generation approach for the testing of Android apps. Our approach is built upon static program analysis and is capable of providing detailed testing steps to guide workers in performing testing. Furthermore, we use an automated testing tool for pre-testing, so crowd workers only need to test the uncovered test cases. We evaluate our approach with six widely-used apps to assess its effectiveness and efficiency. The experimental results show that our approach can detect 71.5% more bugs in diverse categories and achieve 21.8% higher path coverage in comparison with classic crowdsourced testing techniques. Also, in the experiment, we detect 44 unknown bugs in the six subjects, which indicates our approach is highly promising for assisting the testing of Android apps in practice. PubDate: 2023-08-02
Abstract: Requirements elicitation is a stakeholder-centered approach; therefore, natural language remains an effective way of documenting and validating requirements. As the scope of the software domain grows, software analysts process a larger number of requirements documents, generating delays and errors while characterizing the software domain. Natural language processing is key in such a process, allowing software analysts to speed up the requirements elicitation process and mitigate the impact of ambiguity and misinterpretations arising from natural-language-based requirements documents. However, natural-language-processing-based proposals for requirements elicitation are mainly focused on specific domains and still fail to understand several requirements writing styles. In this paper, we present QUARE, a question-answering model for requirements elicitation. The QUARE model comprises a meta-ontology for requirements elicitation, easing the generation of requirements-elicitation-related questions and the initial structuring of any software domain. In addition, the QUARE model includes a named entity recognition and relation extraction system focused on requirements elicitation, allowing software analysts to process several requirements writing styles. Although software analysts address one software domain at a time, they use the same kind of questions for identifying and characterizing requirements abstractions such as actors, concepts, and actions from a software domain. Such a process may be framed within the QUARE model workflow. We validate our proposal through an experimental process including real-world requirements documents coming from several software domains and requirements writing styles. The QUARE model is a novel proposal aimed at supporting software analysts in the requirements elicitation process. PubDate: 2023-07-29
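For intuition only, the snippet below pulls actor/action/concept candidates out of a requirement sentence with a crude dependency-parse heuristic. This is a stand-in illustration of the kind of extraction involved; QUARE's actual named entity recognition and relation extraction system is a trained model, not these rules, and the example sentence is invented.

```python
# Crude heuristic extraction of requirements abstractions; not QUARE's system.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The administrator shall approve new user accounts within 24 hours.")

for token in doc:
    if token.pos_ == "VERB":
        actors = [c.text for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        concepts = [c.text for c in token.children if c.dep_ in ("dobj", "attr")]
        print({"actor": actors, "action": token.lemma_, "concept": concepts})
```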
Abstract: The complexity of mathematical database cost models obviously increases with the evolution of database technology brought by emerging hardware and new deployment platforms (e.g., the cloud). This raises questions about the reliability of past Cost Models (CMs). Indeed, redesigning a database CM to evaluate quality of service (QoS) attributes (i.e., response time, energy, sizing, etc.) is becoming a challenging task. First, because developers implement the CM directly by hard-coding it inside a DBMS without a prior design. Second, due to the lack of a stepwise development process to support incremental CM design and continuous testing to diagnose errors that occur at each design stage. Moreover, reusing CMs for other purposes is a major issue that necessitates investigations to allow designers to reuse and adapt CMs according to their needs. To take up these challenges, we propose a model-based framework for incremental design and continuous testing of database CMs. Specifically, we are motivated by proposing an approach that aims at shifting CM design from an ad hoc design to a structured and shared design by using a set of design guidelines inspired by software engineering practices. Finally, we propose to use DevOps reuse practices (Continuous Integration/Continuous Delivery: CI/CD) to store the CM under design in a repository after each upgrade so that it can be reused, improved, calibrated, and refined for other purposes. We evaluate our approach against common CM features, and we carry out a comparison with some analytical models from the literature. Findings show that our framework provides high CM prediction accuracy and identifies the right design components with a precision ranging from 85% to 100%. PubDate: 2023-07-28
Abstract: Software for scientific computing has traditionally been unfavorable to change, mainly due to its frequent use of old codebases in which modern programming practices for modularity have been relatively difficult to adopt. In this paper, we present our 10-year experience of developing and evolving program adaptations in scientific software. Throughout a series of different adaptive scenario implementations, we show why managing adaptation code using in-house tools in scientific computing has been difficult, and how we address the issue by means of modern software engineering techniques. PubDate: 2023-07-28
Abstract: Quantum computing systems harness the power of quantum mechanics to execute computationally demanding tasks more effectively than their classical counterparts. This has led to the emergence of Quantum Software Engineering (QSE), which focuses on unlocking the full potential of quantum computing systems. As QSE gains prominence, it seeks to address the evolving challenges of quantum software development by offering comprehensive concepts, principles, and guidelines. This paper aims to identify, prioritize, and develop a systematic decision-making framework for the challenging factors associated with QSE process execution. We conducted a literature survey to identify the challenging factors associated with the QSE process and mapped them into 7 core categories. Additionally, we used a questionnaire survey to collect insights from practitioners regarding these challenges. To examine the relationships between the core categories of challenging factors, we applied Interpretive Structural Modeling (ISM). Lastly, we applied fuzzy TOPSIS to rank the identified challenging factors with respect to their criticality for the QSE process. We identified 22 challenging factors of the QSE process and mapped them to 7 core categories. The ISM results indicate that the ‘resources’ category has the most decisive influence on the other six core categories of the identified challenging factors. Moreover, the fuzzy TOPSIS indicates that ‘complex programming’, ‘limited software libraries’, ‘maintenance complexity’, ‘lack of training and workshops’, and ‘data encoding issues’ are the highest-priority challenging factors for QSE process execution. Organizations using QSE could consider the identified challenging factors and their prioritization to improve their QSE process. PubDate: 2023-07-26
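As a rough illustration of the ranking step, the sketch below runs crisp (non-fuzzy) TOPSIS over an invented decision matrix: factors are scored against weighted criteria and ranked by closeness to the ideal solution. The factors, criteria scores, and weights are assumptions; the paper uses fuzzy TOPSIS with practitioner-elicited judgements, which this simplification does not reproduce.

```python
# Crisp TOPSIS sketch for ranking challenging factors; illustrative data only.
import numpy as np

factors = ["complex programming", "limited libraries", "maintenance complexity"]
# rows: factors, columns: criteria scores (all treated as benefit criteria)
X = np.array([[7.0, 8.0, 6.0],
              [6.0, 9.0, 5.0],
              [5.0, 6.0, 7.0]])
w = np.array([0.5, 0.3, 0.2])                        # criteria weights

V = w * X / np.linalg.norm(X, axis=0)                # weighted, vector-normalized matrix
ideal, anti = V.max(axis=0), V.min(axis=0)           # ideal and anti-ideal solutions
d_pos = np.linalg.norm(V - ideal, axis=1)
d_neg = np.linalg.norm(V - anti, axis=1)
closeness = d_neg / (d_pos + d_neg)                  # higher = more critical

for f, c in sorted(zip(factors, closeness), key=lambda t: -t[1]):
    print(f"{f}: {c:.3f}")
```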
Abstract: Development and Operations (DevOps) is a relatively recent phenomenon that can be defined as a multidisciplinary effort to improve and accelerate the delivery of business value in terms of IT solutions. Many software organizations are heading towards DevOps to leverage its benefits in terms of improved development speed, stability, collaboration, and communication. DevOps practices are essential to implement effectively in software organizations, but little attention has been given in the literature to how these practices can be managed. Our study aims to propose and develop a framework for effectively managing DevOps practices. We conducted an empirical study using the publicly available HELENA2 dataset to identify the best practices for effectively implementing DevOps. Furthermore, we used prediction algorithms such as Support Vector Machine (SVM), Artificial Neural Network (ANN) and Random Forest (RF) to develop a prediction model for DevOps implementation. The findings of this study show that “Continuous deployment”, “Coding standards”, “Continuous integration”, and “Daily Standup” are the most significant practices during the life cycle of projects for effectively managing DevOps practices. The contribution of this study is not only limited to investigating the best DevOps practices but also provides a prediction of DevOps project success and a prioritization of best practices. It can assist software organizations in choosing the best possible practices to focus on based on the nature of their projects. PubDate: 2023-07-12 DOI: 10.1007/s10515-023-00388-8
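The snippet below sketches the general shape of such a prediction model: a Random Forest trained on binary flags indicating which practices a project uses, predicting a success label. The feature names and the synthetic data are assumptions; the study trains on the HELENA2 survey dataset, not on generated data like this.

```python
# Toy Random Forest predicting project success from practice-adoption flags;
# illustrative only, not the study's model or data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
practices = ["continuous_deployment", "coding_standards",
             "continuous_integration", "daily_standup"]
X = rng.integers(0, 2, size=(300, len(practices)))              # 1 = practice adopted
y = (X.sum(axis=1) + rng.normal(0, 0.5, 300) > 2).astype(int)   # toy "success" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("accuracy:", model.score(X_te, y_te))
print(dict(zip(practices, model.feature_importances_.round(3))))
```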
Abstract: Data protection laws and regulations have recently emerged to protect the privacy and personal information of individuals. As cases of privacy breaches and vulnerabilities rapidly increase, people are aware of and more concerned about their privacy. This brings significant attention to software development teams to address privacy concerns when developing software applications. As today’s software development adopts an agile, issue-driven approach, issues in an issue tracking system become a centralised pool that gathers new requirements, requests for modification and all the tasks of a software project. Hence, establishing an alignment between those issues and privacy requirements is an important step in developing privacy-aware software systems. This alignment also facilitates privacy compliance checking, which may be required as an underlying part of regulations for organisations. However, manually establishing those alignments is labour-intensive and time-consuming. In this paper, we explore a wide range of machine learning and natural language processing techniques which can automatically classify privacy requirements in issue reports. We employ six popular techniques, namely Bag-of-Words (BoW), N-gram Inverse Document Frequency (N-gram IDF), Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, Convolutional Neural Network (CNN) and Bidirectional Encoder Representations from Transformers (BERT), to perform the classification on privacy-related issue reports in the Google Chrome and Moodle projects. The evaluation showed that the BoW, N-gram IDF, TF-IDF and Word2Vec techniques are suitable for classifying privacy requirements in those issue reports. In addition, N-gram IDF is the best performer in both projects. PubDate: 2023-06-28 DOI: 10.1007/s10515-023-00387-9
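A minimal sketch of one of the evaluated variants (TF-IDF features fed to a conventional classifier) is shown below. The three example issue titles and their labels are invented; the study itself uses labelled Google Chrome and Moodle issue reports, and the classifier choice here is an assumption.

```python
# Toy TF-IDF classifier for privacy-related issue reports; illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

issues = [
    "Clear browsing history does not remove tracking cookies",
    "Crash when opening a PDF attachment in the viewer",
    "User email addresses are visible to other course participants",
]
labels = ["privacy", "not-privacy", "privacy"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(issues, labels)
print(clf.predict(["Location data is sent to third-party servers"]))
```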
Abstract: Large-scale software systems must be decomposed into modular units to reduce maintenance efforts. Software Architecture Recovery (SAR) approaches have been introduced to analyze dependencies among software modules and automatically cluster them to achieve high modularity. These approaches employ various types of algorithms for clustering software modules. In this paper, we discuss design decisions and variations in existing genetic algorithms devised for SAR. We present a novel hybrid genetic algorithm that introduces three major differences with respect to these algorithms. First, it employs a greedy heuristic algorithm to automatically determine the number of clusters and enrich the initial population that is generated randomly. Second, it uses a different solution representation that facilitates an arithmetic crossover operator. Third, it is hybridized with a heuristic that improves solutions in each iteration. We present an empirical evaluation with seven real systems as experimental objects. We compare the effectiveness of our algorithm against a baseline and state-of-the-art hybrid genetic algorithms. Our algorithm outperforms the others in maximizing the modularity of the obtained clusters. PubDate: 2023-06-26 DOI: 10.1007/s10515-023-00384-y
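For readers unfamiliar with GA-based clustering, the bare-bones loop below evolves a module-to-cluster assignment over a toy dependency graph. The fitness function (a simple intra- minus inter-edge count), the operators, and the graph are assumptions made only to illustrate the search loop; they are not the paper's representation, arithmetic crossover, greedy initialization, or hybrid improvement heuristic.

```python
# Bare-bones genetic algorithm for module clustering; illustrative only.
import random

deps = {(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)}  # toy dependencies
N_MODULES, N_CLUSTERS, POP, GENS = 6, 2, 30, 200

def fitness(assign):
    intra = sum(1 for a, b in deps if assign[a] == assign[b])
    return intra - (len(deps) - intra)    # reward cohesive, loosely coupled clusters

def crossover(p1, p2):
    cut = random.randrange(1, N_MODULES)
    return p1[:cut] + p2[cut:]

def mutate(ind, rate=0.1):
    return [random.randrange(N_CLUSTERS) if random.random() < rate else g for g in ind]

population = [[random.randrange(N_CLUSTERS) for _ in range(N_MODULES)] for _ in range(POP)]
for _ in range(GENS):
    population.sort(key=fitness, reverse=True)
    parents = population[:POP // 2]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - len(parents))]
    population = parents + children

best = max(population, key=fitness)
print("best clustering:", best, "fitness:", fitness(best))
```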
Abstract: Software reverse engineering (SRE) is receiving more attention due to the current trend of rapid software development, since this allows developers to implement systems without any detailed design process and then use SRE tools to generate design content, such as Unified Modeling Language (UML) diagrams. In extant SRE research, studies have mainly focused on how precisely the conversion can reflect the system. However, little research has investigated the quality of the converted results, which could be inherently affected by poorly written software code. Accordingly, a knowledge-based ontological system (OntoRESec) was developed in this study to enable the online quality assessment of converted UML structural design content. With a focus on class design, OntoRESec features a domain-specific ontology model with a rule-based inference engine to assess the reversed class diagrams in SRE. With regard to the domain of quality assessment, the system is scoped to security in class design, since this is a major concern in terms of design quality and should be incorporated into the automated SRE context. An illustrative case is presented that adopts a security framework (STRIDE) to implement and demonstrate the rule-based inference of quality assessment in OntoRESec. In addition, several system cases are used to evaluate the proposed work. PubDate: 2023-06-18 DOI: 10.1007/s10515-023-00385-x
Abstract: Model transformations are central to model-driven software development. Applications of model transformations include creating models, handling model co-evolution, model merging, and understanding model evolution. In the past, various (semi-)automatic approaches to derive model transformations from meta-models or from examples have been proposed. These approaches require time-consuming handcrafting or the recording of concrete examples, or they are unable to derive complex transformations. We propose a novel unsupervised approach, called Ockham, which is able to learn edit operations from model histories in model repositories. Ockham is based on the idea that meaningful domain-specific edit operations are the ones that compress the model differences. It employs frequent subgraph mining to discover frequent structures in model difference graphs. We evaluate our approach in two controlled experiments and one real-world case study of a large-scale industrial model-driven architecture project in the railway domain. We found that our approach is able to discover frequent edit operations that have actually been applied before. Furthermore, Ockham is able to extract edit operations that are meaningful—in the sense of explaining model differences through the edit operations they comprise—to practitioners in an industrial setting. We also discuss use cases (i.e., semantic lifting of model differences and change profiles) for the discovered edit operations in this industrial setting. We find that the edit operations discovered by Ockham can be used to better understand and simulate the evolution of models. PubDate: 2023-04-26 DOI: 10.1007/s10515-023-00381-1
Abstract: Fault localization aims to efficiently locate faults when debugging programs, reducing software development and maintenance costs. Spectrum-based fault localization (SBFL) is the most commonly used fault localization technique; it calculates and ranks a suspiciousness value for each program entity with a specific formula by counting the coverage information of all program entities and the execution results of test cases. However, previous SBFL techniques suffered from low accuracy due to the sole use of execution coverage. This paper proposes an approach, GNet4FL, based on graph convolutional neural networks. GNet4FL first collects static features based on code structure and dynamic features based on test results. Then, GNet4FL uses GraphSAGE to obtain node representations of source code and performs feature fusion on an entity consisting of multiple nodes, which preserves the topological information of the graph. Finally, the representation of each entity is input to a multi-layer perceptron for training and ranking entities. The results of the study showed that GNet4FL successfully located 160 out of 262 faults, outperforming three state-of-the-art methods by 94%, 42%, and 14% in Top-1 accuracy, and achieving results close to Grace at a lower cost. Furthermore, we investigated the impact of each component (i.e., graph neural network, pruning, and dynamic features) of GNet4FL on the results. We found that all of these components had a positive impact on the proposed approach. PubDate: 2023-04-24 DOI: 10.1007/s10515-023-00383-z
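To make the classical SBFL step referenced above concrete, the snippet below computes a suspiciousness score per program entity from a coverage matrix and test outcomes using the standard Ochiai formula. Ochiai and the tiny coverage matrix are illustrative baseline assumptions, not GNet4FL's GraphSAGE-based model.

```python
# Classical SBFL with the Ochiai formula; baseline illustration, not GNet4FL.
import math

# coverage[t][e] == 1 if test t executes entity (e.g., statement) e
coverage = [
    [1, 1, 0, 1],   # test 0
    [1, 0, 1, 1],   # test 1
    [0, 1, 1, 1],   # test 2
]
failed = [False, True, True]          # outcome of each test
total_failed = sum(failed)

def ochiai(entity):
    ef = sum(1 for t, row in enumerate(coverage) if row[entity] and failed[t])
    ep = sum(1 for t, row in enumerate(coverage) if row[entity] and not failed[t])
    return ef / math.sqrt(total_failed * (ef + ep)) if ef else 0.0

scores = {e: ochiai(e) for e in range(4)}
print(sorted(scores.items(), key=lambda kv: -kv[1]))   # most suspicious entities first
```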
Abstract: Question Answering (QA) is an attractive and challenging area in the NLP community. With the development of QA techniques, plenty of QA software has been applied in daily human life to provide convenient access to information. To investigate the performance of QA software, many benchmark datasets have been constructed to provide various test cases. However, current QA software is mainly tested in a reference-based paradigm, in which the expected outputs (labels) of test cases must be annotated with considerable human effort before testing. As a result, neither just-in-time testing during usage nor extensible testing on massive unlabeled real-life data is feasible, which keeps the current testing of QA software from being flexible and sufficient. In this work, we propose a novel testing method, qaAskeR+, with five new Metamorphic Relations for QA software. qaAskeR+ does not refer to the annotated labels of test cases. Instead, based on the idea that a correct answer should imply a piece of reliable knowledge that always conforms with any other correct answer, qaAskeR+ tests QA software by inspecting its behaviors on multiple recursively asked questions that are relevant to the same or some further enriched knowledge. Experimental results show that qaAskeR+ can reveal quite a few violations that indicate actual answering issues in various mainstream QA software without using any pre-annotated labels. PubDate: 2023-03-28 DOI: 10.1007/s10515-023-00380-2
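The sketch below illustrates the metamorphic-testing idea only: instead of comparing against a labelled answer, it checks that the answers to two related questions stay mutually consistent. The relation, the canned `answer` stub, and the example are invented; they are not one of qaAskeR+'s five actual Metamorphic Relations.

```python
# Toy metamorphic consistency check for QA software; illustrative only.
def answer(question: str, context: str) -> str:
    # Hypothetical stand-in for the QA software under test; a real harness
    # would call the deployed QA system here.
    canned = {
        "Who wrote Hamlet?": "Shakespeare",
        "Did Shakespeare write Hamlet?": "yes",
    }
    return canned.get(question, "unknown")

def check_consistency(context: str, factual_q: str, confirm_q_template: str) -> bool:
    """Toy relation: the factual answer and its yes/no rephrasing must agree."""
    a1 = answer(factual_q, context)
    a2 = answer(confirm_q_template.format(a1), context)
    return a2.strip().lower() == "yes"      # a False result signals a violation

ctx = "Hamlet is a tragedy written by William Shakespeare around 1600."
print(check_consistency(ctx, "Who wrote Hamlet?", "Did {} write Hamlet?"))
```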
Abstract: Rapid GUI prototyping has evolved into a widely applied technique in the early stages of software development to facilitate the clarification and refinement of requirements. High-fidelity GUI prototyping, in particular, has been shown to enable productive discussions with customers and mitigate potential misunderstandings; however, these benefits come with the disadvantage that high-fidelity GUI prototypes are expensive and time-consuming to develop and require experience to create. In this work, we present RaWi, a data-driven GUI prototyping approach that retrieves GUIs for reuse from a large-scale, semi-automatically created GUI repository for mobile apps on the basis of Natural Language (NL) searches, thereby facilitating GUI prototyping and improving its productivity by leveraging the vast GUI prototyping knowledge embodied in the repository. Retrieved GUIs can directly be reused and adapted in the graphical editor of RaWi. Moreover, we present a comprehensive evaluation methodology to enable (i) the systematic evaluation of NL-based GUI ranking methods through a novel high-quality gold standard, together with an in-depth evaluation of traditional IR and state-of-the-art BERT-based models for GUI ranking, and (ii) the assessment of GUI prototyping productivity, accompanied by an extensive user study in a practical GUI prototyping environment. PubDate: 2023-03-14 DOI: 10.1007/s10515-023-00377-x
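As a sketch of the traditional-IR end of NL-based GUI ranking, the snippet below scores GUI screens by TF-IDF cosine similarity between a natural-language query and the text associated with each screen. The screen names and descriptions are invented; RaWi works over a large mined GUI repository and also evaluates BERT-based rankers, which this toy does not cover.

```python
# Toy NL-based GUI retrieval via TF-IDF cosine similarity; illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

screens = {
    "login_01": "login screen with email and password fields and sign in button",
    "search_03": "search bar with filter chips and a list of results",
    "profile_07": "user profile page with avatar, settings and logout",
}

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(screens.values())

query = "screen where the user signs in with a password"
sims = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]

for name, score in sorted(zip(screens, sims), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```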
Abstract: Bug localization technologies and tools are widely used in software engineering. Although state-of-the-art methods have achieved great progress, they only consider source code information at the text level, which may establish a wrong correlation between the source code and the bug report, affecting localization accuracy and reliability. In this paper, we propose an improved bug localization model which uses the semantics of source code at the graph level to supplement its semantics at the text level, optimizing and adjusting the graph semantics in combination with an attention mechanism to obtain code semantic features that include the shallow and deep semantics of the source code. Finally, the correlation between the code semantic feature and the report semantic feature is measured by cosine similarity. We conduct experiments on three open-source Java projects to comprehensively evaluate the performance of the proposed model. The experimental results show that the model is significantly better than state-of-the-art methods. PubDate: 2023-03-07 DOI: 10.1007/s10515-023-00379-9
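The final scoring step described above reduces to ranking source files by cosine similarity between a bug-report vector and each file's code vector; the sketch below shows only that step. The random vectors stand in for the learned graph/text semantic features, which this snippet does not compute, and the file names are invented.

```python
# Cosine-similarity ranking of source files against a bug report; the random
# vectors are placeholders for the model's learned semantic features.
import numpy as np

rng = np.random.default_rng(1)
report_vec = rng.normal(size=128)                        # bug-report semantic feature
code_vecs = {f"src/File{i}.java": rng.normal(size=128) for i in range(5)}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

ranking = sorted(code_vecs, key=lambda f: cosine(report_vec, code_vecs[f]), reverse=True)
for f in ranking:
    print(f, round(cosine(report_vec, code_vecs[f]), 3))
```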