IEEE Transactions on Software Engineering
Journal Prestige (SJR): 0.548
Citation Impact (citeScore): 5
 
  Hybrid journal (it can contain Open Access articles)
ISSN (Print) 0098-5589
Published by IEEE
  • Just-In-Time Obsolete Comment Detection and Update

      Authors: Zhongxin Liu;Xin Xia;David Lo;Meng Yan;Shanping Li;
      Pages: 1 - 23
      Abstract: Comments are valuable resources for the development, comprehension and maintenance of software. However, while changing code, developers sometimes neglect the evolution of the corresponding comments, resulting in obsolete comments. Such obsolete comments can mislead developers and introduce bugs in the future, and are therefore detrimental. We notice that by detecting and updating obsolete comments in time with code changes, obsolete comments can be effectively reduced and even avoided. We refer to this task as Just-In-Time (JIT) Obsolete Comment Detection and Update. In this work, we propose a two-stage framework named CUP² (Two-stage Comment UPdater) to automate this task. CUP² consists of two components, i.e., an Obsolete Comment Detector named OCD and a Comment UPdater named CUP, each of which relies on a distinct neural network model to perform detection and update, respectively. Specifically, given a code change and a corresponding comment, CUP² first leverages OCD to predict whether this comment should be updated. If the answer is yes, CUP is used to generate the new version of the comment automatically. To evaluate CUP², we build a large-scale dataset with over 4 million code-comment change samples. Our dataset focuses on method-level code changes and updates to method header comments, considering the importance and widespread use of such comments. Evaluation results show that 1) both OCD and CUP outperform their baselines by significant margins, and 2) CUP² performs better than a rule-based baseline. Specifically, the comments generated by CUP² are identical to the ground truth for 41.8% of the samples that are predicted to be positive by OCD. We believe CUP² can help developers detect obsolete comments, better understand where and how to update them, and reduce their edits when updating obsolete comments.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
  • Cerebro: Static Subsuming Mutant Selection

      Authors: Aayush Garg;Milos Ojdanic;Renzo Degiovanni;Thierry Titcheu Chekam;Mike Papadakis;Yves Le Traon;
      Pages: 24 - 43
      Abstract: Mutation testing research has indicated that a major part of its application cost is due to the large number of low-utility mutants that it introduces. Although previous research has identified this issue, no previous study has proposed an effective solution to the problem. Thus, it remains unclear how to mutate and test a given piece of code in a best-effort way, i.e., achieving a good trade-off between invested effort and test effectiveness. To achieve this, we propose Cerebro, a machine learning approach that statically selects subsuming mutants, i.e., the set of mutants that reside at the top of the subsumption hierarchy, based on the mutants’ surrounding code context. We evaluate Cerebro using 48 programs written in C and 10 written in Java, and demonstrate that it preserves the benefits of mutation testing while limiting application cost, i.e., it reduces all application cost factors, such as equivalent mutants, mutant executions, and the mutants requiring analysis. We demonstrate that Cerebro has strong inter-project prediction ability, which is significantly higher than that of two baseline methods, i.e., supervised learning on features proposed by the state of the art, and random mutant selection. More importantly, our results show that Cerebro’s selected mutants lead to strong tests that are capable of killing twice as many subsuming mutants as the tests obtained with the baselines when selecting the same number of mutants. At the same time, Cerebro reduces the cost-related factors, as it selects, on average, 68% fewer equivalent mutants, while requiring 90% fewer test executions than the baselines.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
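Subsumption, as used in the Cerebro abstract, can be made concrete with a kill matrix. The sketch below is a minimal illustration that assumes a hypothetical `kills` mapping from mutant ids to the tests that kill them; it computes the subsuming mutants dynamically from that matrix. Cerebro's contribution is to predict this set statically from code context, which the sketch does not attempt.

```python
from typing import Dict, FrozenSet, Set


def subsuming_mutants(kills: Dict[str, Set[str]]) -> Set[str]:
    """Return the mutants at the top of the subsumption hierarchy.

    `kills` (a hypothetical input format) maps each mutant id to the set of
    test ids that kill it. A killable mutant `a` subsumes `b` when every test
    that kills `a` also kills `b`, i.e. kills[a] is a subset of kills[b].
    """
    # Mutants that no test kills (possibly equivalent mutants) are excluded.
    killable: Dict[str, FrozenSet[str]] = {
        m: frozenset(ts) for m, ts in kills.items() if ts
    }
    top: Set[str] = set()
    for mutant, tests in killable.items():
        # Strictly subsumed: some other mutant is killed by a proper subset
        # of the tests that kill this mutant.
        strictly_subsumed = any(other < tests for other in killable.values())
        if not strictly_subsumed:
            top.add(mutant)
    return top


kills = {"m1": {"t1"}, "m2": {"t1", "t2"}, "m3": {"t2"}, "m4": set()}
print(sorted(subsuming_mutants(kills)))  # ['m1', 'm3']
```

Mutants with identical kill sets subsume each other, so this sketch keeps all of them; keeping one representative per group is a common simplification.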
       
  • The Secret Life of Software Vulnerabilities: A Large-Scale Empirical Study

      Authors: Emanuele Iannone;Roberta Guadagni;Filomena Ferrucci;Andrea De Lucia;Fabio Palomba;
      Pages: 44 - 63
      Abstract: Software vulnerabilities are weaknesses in source code that can potentially be exploited to cause loss or harm. While researchers have been devising a number of methods to deal with vulnerabilities, there is still a noticeable lack of knowledge about their software engineering life cycle, for example, how vulnerabilities are introduced and removed by developers. This information can be exploited to design more effective methods for vulnerability prevention and detection, as well as to understand the granularity at which these methods should aim. To investigate the life cycle of known software vulnerabilities, we focus on how, when, and under which circumstances the contributions to the introduction of vulnerabilities in software projects are made, as well as how long it takes to remove them and how they are removed. We consider 3,663 vulnerabilities with public patches from the National Vulnerability Database, pertaining to 1,096 open-source software projects on GitHub, and define an eight-step process involving both automated parts (e.g., using a procedure based on the SZZ algorithm to find the vulnerability-contributing commits) and manual analyses (e.g., how vulnerabilities were fixed). The investigated vulnerabilities can be classified into 144 categories, take on average at least 4 contributing commits before being introduced, and half of them remain unfixed for more than one year. Most of the contributions are made by developers with a high workload, often while performing maintenance activities, and the vulnerabilities are removed mostly by adding new source code that implements further checks on inputs. We conclude by distilling practical implications on how vulnerability detectors should work to assist developers in identifying these issues in a timely manner.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
  • Spork: Structured Merge for Java With Formatting Preservation

      Authors: Simon Larsén;Jean-Rémy Falleri;Benoit Baudry;Martin Monperrus;
      Pages: 64 - 83
      Abstract: The highly parallel workflows of modern software development have made merging of source code a common activity for developers. The state of the practice is based on line-based merge, which is ubiquitously used with “git merge”. However, line-based merge is a general technique for any kind of text and cannot leverage the structured nature of source code, which makes merge conflicts a common occurrence. As a remedy, research has proposed structured merge tools, which typically operate on abstract syntax trees instead of raw text. Structured merging greatly reduces the prevalence of merge conflicts but suffers from important limitations, the main ones being a tendency to alter the formatting of the merged code and being prone to excessive running times. In this paper, we present Spork, a novel structured merge tool for Java. Spork is unique in that it preserves formatting to a significantly greater degree than comparable state-of-the-art tools. Spork is also faster overall than the state of the art, in particular significantly reducing worst-case running times in practice. We demonstrate these properties by replaying 1740 real-world file merges collected from 119 open-source projects, and further demonstrate several key differences between Spork and the state of the art with in-depth case studies.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
  • DeepLineDP: Towards a Deep Learning Approach for Line-Level Defect Prediction

      Authors: Chanathip Pornprasit;Chakkrit Kla Tantithamthavorn;
      Pages: 84 - 98
      Abstract: Defect prediction is proposed to help practitioners effectively prioritize limited Software Quality Assurance (SQA) resources on the riskiest files, i.e., those that are likely to have post-release software defects. However, there exist two main limitations in prior studies: (1) the granularity levels of defect predictions are still coarse-grained and (2) the surrounding tokens and surrounding lines have not yet been fully utilized. In this paper, we perform a survey study to better understand how practitioners perform code inspection in the modern code review process, and their perception of line-level defect prediction. According to the responses from 36 practitioners, we found that 50% of them spent between 10 minutes and more than one hour reviewing a single file, while 64% of them still perceived code inspection as challenging to extremely challenging. In addition, 64% of the respondents perceived that a line-level defect prediction tool would potentially be helpful in identifying defective lines. Motivated by the practitioners’ perspective, we present DeepLineDP, a deep learning approach to automatically learn the semantic properties of the surrounding tokens and lines in order to identify defective files and defective lines. Through a case study of 32 releases of 9 software projects, we find that the risk score of code tokens varies greatly depending on their location. Our DeepLineDP is 17%-37% more accurate than other file-level defect prediction approaches, is 47%-250% more cost-effective than other line-level defect prediction approaches, and achieves reasonable performance when transferred to other software projects. These findings confirm that the surrounding tokens and surrounding lines should be considered to identify the fine-grained locations of defects in defective files (i.e., defective lines).
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
  • Achieving High MAP-Coverage Through Pattern Constraint Reduction

      Authors: Yingquan Zhao;Zan Wang;Shuang Liu;Jun Sun;Junjie Chen;Xiang Chen;
      Pages: 99 - 112
      Abstract: Testing multi-threaded programs is challenging due to the enormous space of thread interleavings. Recently, a code coverage criterion for multi-threaded programs called MAP-coverage has been proposed and shown to be effective for testing concurrent programs. Existing approaches for achieving high MAP-coverage are based on random testing with simple heuristics, which are ineffective at systematically triggering rare thread interleavings. In this study, we propose a novel approach called pattern constraint reduction (PCR), which employs optimized constraint solving to generate thread interleavings for high MAP-coverage. The idea is to iteratively encode and solve path conditions to generate thread interleavings that are guaranteed to improve MAP-coverage. Furthermore, we apply interpolation techniques to effectively reduce the effort of constraint solving by avoiding solving infeasible constraints. The experimental results on 20 benchmark programs show that our approach complements existing random-testing-based approaches when there are rare failure-inducing interleavings in the whole search space. Specifically, PCR finds concurrency bugs faster in 18 out of 20 programs, with an average speedup of 4.2x and a maximum speedup of 11.4x.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
  • APIMatchmaker: Matching the Right APIs for Supporting the Development of Android Apps

      Authors: Yanjie Zhao;Li Li;Haoyu Wang;Qiang He;John Grundy;
      Pages: 113 - 130
      Abstract: Android developers often need to learn how to use different APIs suitable for their projects. Automated API recommendation approaches have been invented to help fill this gap, and these have been demonstrated to be useful to some extent. Unfortunately, most state-of-the-art works are not designed for Android developers, and the ones dedicated to Android app development often suffer from high redundancy and poor run-time performance, or do not target the problem of recommending API usage patterns. To address this gap, we propose a new tool, APIMatchmaker, to recommend API usages by learning directly from similar real-world Android apps. Unlike existing recommendation approaches, which leverage a single context to find similar projects, we introduce a multi-dimensional, context-aware, collaborative filtering approach to better achieve this purpose. Specifically, in addition to code similarity, we also take app descriptions (or topics) into consideration to ensure that similar apps also provide similar functions. We evaluate APIMatchmaker on a large number of real-world Android apps and observe that APIMatchmaker yields a high success rate in recommending APIs for Android apps under development and outperforms the state of the art.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
  • An Experimental Assessment of Using Theoretical Defect Predictors to Guide Search-Based Software Testing

      Authors: Anjana Perera;Aldeida Aleti;Burak Turhan;Marcel Böhme;
      Pages: 131 - 146
      Abstract: Automated test generators, such as search-based software testing (SBST) techniques, are primarily guided by coverage information. As a result, they are very effective at achieving high code coverage. However, is high code coverage alone sufficient to detect bugs effectively? In this paper, we propose a new SBST technique, predictive many objective sorting algorithm (PreMOSA), which augments coverage information with defect prediction information to decide where to increase the test coverage in the class under test (CUT). Through an experimental evaluation using 420 labelled bugs from the Defects4J benchmark and using theoretical defect predictors, we demonstrate the improved effectiveness and efficiency of PreMOSA in detecting bugs when using any acceptable defect predictor, i.e., a defect predictor with recall and precision ≥ 75%, compared to the state-of-the-art dynamic many objective sorting algorithm (DynaMOSA). PreMOSA detects up to 8.3% more labelled bugs on average than DynaMOSA when given a time budget of 2 minutes for test generation per CUT.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
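The PreMOSA abstract does not say how its theoretical defect predictors are built. One plausible sketch, assuming a predictor is simulated by sampling truly buggy classes to reach a target recall and then adding clean classes as false positives to reach a target precision, is shown below. The function name and sampling scheme are hypothetical illustrations, not the authors' procedure.

```python
import random
from typing import Iterable, Set


def theoretical_predictor(
    classes: Iterable[str],
    buggy: Set[str],
    recall: float,
    precision: float,
    seed: int = 0,
) -> Set[str]:
    """Simulate a defect predictor with (approximately) the given recall and precision.

    Hypothetical construction: pick round(recall * |buggy|) true positives,
    then enough clean classes as false positives that TP / (TP + FP)
    approximates the requested precision.
    """
    rng = random.Random(seed)
    buggy_list = sorted(buggy)
    clean_list = sorted(set(classes) - buggy)
    tp = round(recall * len(buggy_list))
    fp = round(tp * (1 - precision) / precision)
    if fp > len(clean_list):
        raise ValueError("not enough clean classes for the requested precision")
    return set(rng.sample(buggy_list, tp)) | set(rng.sample(clean_list, fp))


classes = [f"C{i}" for i in range(100)]
buggy = set(classes[:20])
predicted = theoretical_predictor(classes, buggy, recall=0.75, precision=0.75)
tp = len(predicted & buggy)
print(tp / len(buggy), tp / len(predicted))  # 0.75 0.75
```

Because of rounding, the achieved recall and precision only approximate the targets when there are few classes.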
       
  • Neural Transfer Learning for Repairing Security Vulnerabilities in C Code

      Authors: Zimin Chen;Steve Kommrusch;Martin Monperrus;
      Pages: 147 - 165
      Abstract: In this paper, we address the problem of automatic repair of software vulnerabilities with deep learning. The major problem with data-driven vulnerability repair is that the few existing datasets of known confirmed vulnerabilities consist of only a few thousand examples. However, training a deep learning model often requires hundreds of thousands of examples. In this work, we leverage the intuition that the bug fixing task and the vulnerability fixing task are related and that the knowledge learned from bug fixes can be transferred to fixing vulnerabilities. In the machine learning community, this technique is called transfer learning. In this paper, we propose an approach for repairing security vulnerabilities named VRepair which is based on transfer learning. VRepair is first trained on a large bug fix corpus and is then tuned on a vulnerability fix dataset, which is an order of magnitude smaller. In our experiments, we show that a model trained only on a bug fix corpus can already fix some vulnerabilities. Then, we demonstrate that transfer learning improves the ability to repair vulnerable C functions. We also show that the transfer learning model performs better than a model trained with a denoising task and fine-tuned on the vulnerability fixing task. To sum up, this paper shows that transfer learning works well for repairing security vulnerabilities in C compared to learning on a small dataset.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
  • Learning How to Listen: Automatically Finding Bug Patterns in Event-Driven JavaScript APIs

      Authors: Ellen Arteca;Max Schäfer;Frank Tip;
      Pages: 166 - 184
      Abstract: Event-driven programming is widely practiced in the JavaScript community, both on the client side to handle UI events and AJAX requests, and on the server side to accommodate long-running operations such as file or network I/O. Many popular event-based APIs allow event names to be specified as free-form strings without any validation, potentially leading to lost events for which no listener has been registered and dead listeners for events that are never emitted. In previous work, Madsen et al. presented a precise static analysis for detecting such problems, but their analysis does not scale because it may require a number of contexts that is exponential in the size of the program. Concentrating on the problem of detecting dead listeners, we present an approach to learn how to use event-based APIs by first mining a large corpus of JavaScript code using a simple static analysis to identify code snippets that register an event listener, and then applying statistical modeling to identify anomalous patterns, which often indicate incorrect API usage. In a large-scale evaluation on 127,531 open-source JavaScript code bases, our technique was able to detect 75 anomalous listener-registration patterns, while maintaining a precision of 90.9% and recall of 7.5% over a validation set, demonstrating that a learning-based approach to detecting event-handling bug patterns is feasible. In an additional experiment, we investigated instances of these patterns in 25 open-source projects, and reported 30 issues to the project maintainers, of which 7 have been confirmed as bugs.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
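The abstract describes mining listener registrations and flagging statistically anomalous ones. As a much simpler stand-in for the paper's statistical model (not the authors' method), the sketch below flags an event name as suspicious when it accounts for only a small share of the registrations seen for the same API across a corpus. The `(api, event)` input format and the `min_share` threshold are assumptions.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, List, Tuple


def rare_registrations(
    registrations: Iterable[Tuple[str, str]], min_share: float = 0.05
) -> List[Tuple[str, str]]:
    """Flag (api, event) pairs whose event name is rare for that API.

    A rarely used event name (e.g. a misspelling) is a candidate dead
    listener: it may refer to an event the API never emits.
    """
    per_api: DefaultDict[str, Counter] = defaultdict(Counter)
    for api, event in registrations:
        per_api[api][event] += 1
    flagged = []
    for api, counts in per_api.items():
        total = sum(counts.values())
        for event, n in counts.items():
            if n / total < min_share:
                flagged.append((api, event))
    return sorted(flagged)


corpus = [("stream", "data")] * 30 + [("stream", "end")] * 15 + [("stream", "dat")]
print(rare_registrations(corpus))  # [('stream', 'dat')]
```

A plain frequency threshold cannot distinguish a typo from a legitimately rare event, so flagged pairs still need validation.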
       
  • Runtime Permission Issues in Android Apps: Taxonomy, Practices, and Ways Forward

      Authors: Ying Wang;Yibo Wang;Sinan Wang;Yepang Liu;Chang Xu;Shing-Chi Cheung;Hai Yu;Zhiliang Zhu;
      Pages: 185 - 210
      Abstract: Since Android 6.0 (Marshmallow, API level 23), Android has used a permission model that allows apps to request permissions at runtime rather than at installation time. While this runtime permission model provides users with greater flexibility in controlling an app's access to sensitive data and system features, it brings new challenges to app development. First, as users may grant or revoke permissions at any time while they are using an app, developers need to ensure that the app properly checks and requests required permissions before invoking any permission-protected APIs. Second, Android's permission mechanism keeps evolving and getting customized by device manufacturers. Developers are expected to comprehensively test their apps on different Android versions and device models to make sure permissions are properly requested in all situations. Unfortunately, these requirements are often impractical for developers. In practice, many Android apps suffer from various runtime permission issues (ARP issues). While existing studies have explored ARP issues, the understanding of such issues is still preliminary. To better characterize ARP issues, we performed an empirical study using 135 Stack Overflow posts that discuss ARP issues and 199 real ARP issues archived in popular open-source Android projects on GitHub. By analyzing the data, we observed 11 types of ARP issues that commonly occur in Android apps. For each type of issue, we systematically studied: (1) how they manifest, (2) how pervasive and serious they are in real-world apps, and (3) how they can be fixed. We also analyzed the evolution trend of different types of issues from 2015 to 2020 to understand their impact on the Android ecosystem. Furthermore, we conducted a field survey and in-depth interviews with practitioners from the open-source community and industry, to gain insights into practitioners’ practices and learn their requirements for tools that can help combat ARP issues. Finally, to understand the strengths and weaknesses of the existing tools that can detect ARP issues, we built ARPBench, an open benchmark consisting of 94 real ARP issues, and evaluated the performance of three available tools. The experimental results indicate that the existing tools provide very limited support for detecting our observed issue types and report a large number of false alarms. We further analyzed the tools’ limitations and summarized the challenges of designing an effective ARP issue detection technique. We hope that our findings can shed light on future research and provide useful guidance to practitioners.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
  • The Human Side of Software Engineering Teams: An Investigation of Contemporary Challenges

      Authors: Marco Hoffmann;Daniel Mendez;Fabian Fagerholm;Anton Luckhardt;
      Pages: 211 - 225
      Abstract: Context: There have been numerous recent calls for research on the human side of software engineering and its impact on various factors such as productivity, developer happiness and project success. An analysis of which challenges in software engineering teams are most frequent is still missing. As teams become more international, their members more frequently have different human values as well as different communication habits. Additionally, virtual team setups (working geographically separated, communicating remotely using digital tools, and with frequently changing team members) are increasingly prevalent. Objective: We aim to provide a starting point for a theory about contemporary human challenges in teams and their causes in software engineering. To do so, we look to establish a reusable set of challenges and start out by investigating the effect of team virtualization. Virtual teams often use digital communication and consist of members with different nationalities that may have more divergent human values due to cultural differences compared to single-nationality teams. Method: We designed a survey instrument and asked respondents to assess the frequency and criticality of a set of challenges, separated into the contexts “within teams” and “between teams and clients”, compiled from previous empirical work, blog posts, and pilot survey feedback. For the team challenges, we asked whether mitigation measures were already in place to tackle the challenge. Respondents were also asked to provide information about their team setup. The survey included the Personal Value Questionnaire to measure Schwartz human values. Finally, respondents were asked whether there were additional challenges at their workplace. The survey was first piloted and then distributed to professionals working in software engineering teams via social networking sites and personal business networks. Result: In this article, we report on the results obtained from 192 respondents. We present a set of challenges that takes the survey feedback into account and introduce two categories of challenges: “interpersonal” and “intrapersonal”. We found no evidence for links between human values and challenges. We found some significant links between the number of distinct nationalities in a team and certain challenges, with less frequent and less critical challenges occurring when 2-3 different nationalities were present, compared to teams whose members have just one nationality or more than three. A higher degree of virtualization seems to increase the frequency of some human challenges, which warrants further research on how to improve working processes when teams work remotely or in a distributed fashion. Conclusion: We present a set of human challenges in software engineering that can be used for further research on causes and mitigation measures, and which serves as our starting point for a theory about causes of contemporary human challenges in software engineering teams. We report on evidence that a higher degree of virtualization of teams leads to an increase in certain challenges. This warrants further research to gather more evidence and test countermeasures, such as whether the use of virtual reality software incorporating facial expressions and movements can help establish a less detached way of communication.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
  • Enhancing DNN-Based Binary Code Function Search With Low-Cost Equivalence Checking

      Authors: Huaijin Wang;Pingchuan Ma;Yuanyuan Yuan;Zhibo Liu;Shuai Wang;Qiyi Tang;Sen Nie;Shi Wu;
      Pages: 226 - 250
      Abstract: Binary code function search has been used as the core basis of various security and software engineering applications, including malware clustering, code clone detection, and vulnerability audits. Recognizing logically similar assembly functions, however, remains a challenge. Most binary code search tools rely on program structure-level information, such as control flow and data flow graphs, that is extracted using program analysis techniques or deep neural networks (DNNs). However, DNN-based techniques capture lexical-, control structure-, or data flow-level information of binary code for representation learning, which is often too coarse-grained and does not accurately denote program functionality. Additionally, such techniques may exhibit low robustness in a variety of challenging settings, such as compiler optimizations and obfuscations. This paper proposes a general solution for enhancing the top-k ranked candidates in DNN-based binary code function search. The key idea is to design a low-cost and comprehensive equivalence check that quickly exposes functionality deviations between the target function and its top-k matched functions. Functions that fail this equivalence check can be removed from the top-k list, and functions that pass the check can be revisited and moved ahead among the top-k ranked candidates, in a deliberate way. We design a practical and efficient equivalence check, named BinUSE, using under-constrained symbolic execution (USE). USE, a variant of symbolic execution, improves scalability by initiating symbolic execution directly from function entry points and relaxing constraints on function parameters. It eliminates the overhead incurred by path explosion and costly constraints. BinUSE is specifically designed to deliver an assembly function-level equivalence check, enhancing DNN-based binary code search by reducing its false alarms at low cost. Our evaluation shows that BinUSE can enable a general and effective enhancement of four state-of-the-art DNN-based binary code search tools when confronted with challenges posed by different compilers, optimizations, obfuscations, and architectures.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
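The top-k re-ranking step described in the BinUSE abstract can be sketched independently of how the equivalence check itself works. In this minimal sketch, `check` is a hypothetical callable standing in for BinUSE that returns True (passes), False (fails), or None (inconclusive, e.g. the symbolic execution timed out); the three-valued result is an assumption, not something the abstract specifies.

```python
from typing import Callable, List, Optional


def rerank_top_k(
    candidates: List[str], check: Callable[[str], Optional[bool]]
) -> List[str]:
    """Re-rank DNN top-k candidates using an equivalence check.

    Candidates that fail the check are dropped; candidates that pass are
    moved ahead of inconclusive ones. The DNN's original order is kept
    within each group.
    """
    results = [(c, check(c)) for c in candidates]  # one check per candidate
    passed = [c for c, r in results if r is True]
    undecided = [c for c, r in results if r is None]
    return passed + undecided


verdicts = {"f1": None, "f2": False, "f3": True, "f4": None}
print(rerank_top_k(["f1", "f2", "f3", "f4"], verdicts.get))  # ['f3', 'f1', 'f4']
```

Keeping the DNN order inside each group means the check only adjusts the ranking where it has something to say, rather than discarding the learned similarity scores.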
       
  • Impact of Usability Mechanisms: A Family of Experiments on Efficiency, Effectiveness and User Satisfaction

      Authors: Juan M. Ferreira;Francy D. Rodríguez;Adrián Santos;Oscar Dieste;Silvia T. Acuña;Natalia Juristo;
      Pages: 251 - 267
      Abstract: Context: The usability software quality characteristic aims to improve system user performance. In a previous study, we found evidence of the impact of a set of usability features from the viewpoint of users in terms of efficiency, effectiveness and satisfaction. However, the impact level appears to depend on the usability feature, which suggests priorities for their implementation depending on how they promote user performance. Objectives: We use a family of three experiments to increase the precision and generalization of the results in the baseline experiment and provide findings regarding the impact on user performance of the Abort Operation, Progress Feedback and Preferences usability mechanisms. Method: We conduct two replications of the baseline experiment in academic settings. We analyse the data of 366 experimental subjects and apply aggregation (meta-analysis) procedures. Results: We find that the Abort Operation and Preferences usability mechanisms appear to improve system usability a great deal with respect to efficiency, effectiveness and user satisfaction. Conclusions: We find that the family of experiments further corroborates the results of the baseline experiment. Most of the results are statistically significant, and, because of the large number of experimental subjects, the evidence that we gathered in the replications is sufficient to outweigh other experiments.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
  • Cross-Project Online Just-In-Time Software Defect Prediction

      Authors: Sadia Tabassum;Leandro L. Minku;Danyi Feng;
      Pages: 268 - 287
      Abstract: Cross-Project (CP) Just-In-Time Software Defect Prediction (JIT-SDP) makes use of CP data to overcome the lack of data necessary to train well-performing JIT-SDP classifiers at the beginning of software projects. However, such approaches have never been investigated in realistic online learning scenarios, where Within-Project (WP) software changes naturally arrive over time and can be used to automatically update the classifiers. We provide the first investigation of when and to what extent CP data are useful for JIT-SDP in such realistic scenarios. For that, we propose three different online CP JIT-SDP approaches that can be updated with incoming CP and WP training examples over time. We also collect data on 9 proprietary software projects and use 10 open source software projects to analyse these approaches. We find that training classifiers with incoming CP+WP data can lead to absolute improvements in G-mean of up to 53.89% and up to 35.02% at the initial stage of the projects compared to classifiers using WP-only and CP-only data, respectively. Using CP+WP data was also shown to be beneficial after a large number of WP data were received. Using CP data to supplement WP data helped the classifiers to reduce or prevent large drops in predictive performance that may occur over time, leading to absolute G-mean improvements of up to 37.35% and 48.16% compared to WP-only and CP-only data during such periods, respectively. During periods of stable predictive performance, absolute improvements were up to 29.03% and 41.25% compared to WP-only and CP-only classifiers, respectively. Our results highlight the importance of using both CP and WP data together in realistic online JIT-SDP scenarios.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
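G-mean, the metric reported in the cross-project JIT-SDP abstract, is commonly defined in class-imbalanced defect prediction as the geometric mean of the recalls on the defect-inducing and clean classes. The sketch below computes it from a confusion matrix. It is a batch version; how the paper tracks the recalls over time in the online setting is not described in the abstract.

```python
import math


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of the recall on each class.

    tp/fn: defect-inducing changes predicted correctly/incorrectly.
    tn/fp: clean changes predicted correctly/incorrectly.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes need at least one example")
    recall_defective = tp / (tp + fn)
    recall_clean = tn / (tn + fp)
    return math.sqrt(recall_defective * recall_clean)


# Recall 0.75 on defect-inducing changes and 0.80 on clean ones.
print(round(g_mean(tp=30, fn=10, tn=80, fp=20), 4))  # 0.7746
```

A classifier that always predicts "clean" gets a G-mean of 0 however high its accuracy, which is why the metric suits imbalanced JIT-SDP data.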
       
  • Automatic Detection of Java Cryptographic API Misuses: Are We There Yet?

      Authors: Ying Zhang;Md Mahir Asef Kabir;Ya Xiao;Danfeng Yao;Na Meng;
      Pages: 288 - 303
      Abstract: The Java platform provides various cryptographic APIs to facilitate secure coding. However, correctly using these APIs is challenging for developers who lack cybersecurity training. Prior work shows that many developers misused APIs and consequently introduced vulnerabilities into their software. To eliminate such vulnerabilities, people created tools to detect and/or fix cryptographic API misuses. However, it is still unknown (1) how current tools are designed to detect cryptographic API misuses, (2) how effectively the tools work to locate API misuses, and (3) how developers perceive the usefulness of tools’ outputs. For this paper, we conducted an empirical study to investigate the research questions mentioned above. Specifically, we first conducted a literature survey on existing tools and compared their approach designs from different angles. Then we applied six of the tools to three popularly used benchmarks to measure the tools’ effectiveness at detecting API misuses. Next, we applied the tools to 200 Apache projects and sent 57 vulnerability reports to developers for their feedback. Our study revealed interesting phenomena. For instance, none of the six tools was found universally better than the others; however, CogniCrypt, CogniGuard, and Xanitizer outperformed SonarQube. More developers rejected tools’ reports than accepted them (30 versus 9), due to concerns about the tools’ capabilities, the correctness of suggested fixes, and the exploitability of reported issues. This study reveals a significant gap between the state-of-the-art tools and developers’ expectations; it sheds light on future research in vulnerability detection.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
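To make "detecting a cryptographic API misuse" concrete, the sketch below encodes one well-known rule: flag `Cipher.getInstance` calls that select ECB mode for a block cipher, either explicitly or by passing a bare name such as `"AES"`, which the SunJCE provider completes with ECB. This is a toy, text-matching illustration under that assumption; it is not how any of the six studied tools works, and the function and constant names are hypothetical.

```python
import re
from typing import List, Tuple

# Matches Cipher.getInstance("<transformation>") when the argument is a string literal.
CIPHER_CALL = re.compile(r'Cipher\.getInstance\(\s*"([^"]+)"')

# Block ciphers for which ECB mode leaks patterns in the plaintext.
BLOCK_CIPHERS = {"AES", "DES", "DESEDE"}


def find_ecb_misuses(java_source: str) -> List[Tuple[int, str]]:
    """Return (line number, transformation) pairs that select ECB mode explicitly or implicitly."""
    findings = []
    for lineno, line in enumerate(java_source.splitlines(), start=1):
        for match in CIPHER_CALL.finditer(line):
            transformation = match.group(1)
            parts = transformation.split("/")
            if parts[0].upper() not in BLOCK_CIPHERS:
                continue
            # A bare name such as "AES" falls back to the provider's default mode,
            # which is ECB for the SunJCE provider.
            if len(parts) == 1 or parts[1].upper() == "ECB":
                findings.append((lineno, transformation))
    return findings
```

Text matching misses transformations assembled from variables or passed through method calls, which is one reason practical detectors rely on static analysis such as data-flow tracking instead.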
       
  • Enhancing the Capability of Testing-Based Formal Verification by Handling
           Operations in Software Packages

      Authors: Ai Liu;Shaoying Liu;
      Pages: 304 - 324
      Abstract: Testing a program against its specification is necessary to ensure that the program delivers its desired functionality. Formal methods, based on mathematical theories, are often used to improve the quality of systems but are difficult to apply. Testing-Based Formal Verification (TBFV) has been proposed as an alternative that ensures the correctness of all traversed program paths, but it is limited and impractical because it cannot deal with operations (e.g., methods defined in classes) provided in software packages. In this paper, we provide an axiomatic approach to this problem that enhances the capability of TBFV. In particular, we focus on the Vector, ArrayList, and LinkedList classes in Java. We present an example demonstrating how our approach works, as well as two small experiments that evaluate its performance against specification-based testing (SBT). The results show that our approach is more than 30% better than SBT at bug detection.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
  • On the Relationship Between Organizational Structure Patterns and
           Architecture in Agile Teams

      Authors: Damian A. Tamburri;Rick Kazman;Hamed Fahimi;
      Pages: 325 - 347
      Abstract: Forming members of an organisation into coherent groups or teams is an important issue in any large-scale software engineering endeavour, especially so in agile software development where teams rely heavily on self-organisation and organisational flexibility. But is there a recurrent organisational structure pattern in agile software engineering teams? And if so, what does that pattern imply in terms of software architecture quality? We address these questions using mixed-methods research in industry featuring interviews, surveys, and Delphi studies of real agile teams. In our study of 30 agile software teams we found that, out of seven organisational structure patterns that recur across our dataset, a single organisational pattern occurs over 37% of the time. This pattern: (a) reflects young communities (1-12 months old); (b) disappears in established ones (13+ months); and (c) reflects the highest number of architecture smells reported. Finally, we observe a negative correlation between a proposed organisational measure and architecture smells. On the one hand, these insights may serve to aid architects in designing not only their architectures but also their communities to best support their co-evolution. On the other hand, we observe that organisational structures in software engineering influence much more than simply software architectures, and we expect our results to lay the foundations of more structured and rigorous approaches to organisational structure studies and use in software engineering research and practice.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
  • ARTE: Automated Generation of Realistic Test Inputs for Web APIs

      Authors: Juan C. Alonso;Alberto Martin-Lopez;Sergio Segura;José María García;Antonio Ruiz-Cortés;
      Pages: 348 - 363
      Abstract: Automated test case generation for web APIs is a thriving research topic, where test cases are frequently derived from the API specification. However, this process is only partially automated since testers are usually obliged to manually set meaningful valid test inputs for each input parameter. In this article, we present ARTE, an approach for the automated extraction of realistic test data for web APIs from knowledge bases like DBpedia. Specifically, ARTE leverages the specification of the API parameters to automatically search for realistic test inputs using natural language processing, search-based, and knowledge extraction techniques. ARTE has been integrated into RESTest, an open-source testing framework for RESTful APIs, fully automating the test case generation process. Evaluation results on 140 operations from 48 real-world web APIs show that ARTE can efficiently generate realistic test inputs for 64.9% of the target parameters, outperforming the state-of-the-art approach SAIGEN (31.8%). More importantly, ARTE supported the generation of over twice as many valid API calls (57.3%) as random generation (20%) and SAIGEN (26%), leading to a higher failure detection capability and uncovering several real-world bugs. These results show the potential of ARTE for enhancing existing web API testing tools, achieving an unprecedented level of automation.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
  • Using the SOCIO Chatbot for UML Modelling: A Family of Experiments

      Authors: Ranci Ren;John W. Castro;Adrián Santos;Oscar Dieste;Silvia T. Acuña;
      Pages: 364 - 383
      Abstract: Context: Recent developments in natural language processing have facilitated the adoption of chatbots in typically collaborative software engineering tasks (such as diagram modelling). Families of experiments can assess the performance of tools and processes and, at the same time, alleviate some of the typical shortcomings of individual experiments (e.g., inaccurate and potentially biased results due to a small number of participants). Objective: Compare the usability of a chatbot for collaborative modelling (i.e., SOCIO) and an online web tool (i.e., Creately). Method: We conducted a family of three experiments to evaluate the usability of SOCIO against the Creately online collaborative tool in academic settings. Results: The student participants were faster at building class diagrams using the chatbot than with the online collaborative tool, and were more satisfied with SOCIO. In addition, the class diagrams built using the chatbot tended to be more concise, albeit slightly less complete. Conclusion: Chatbots appear to be helpful for building class diagrams. Our study also sheds light on the future direction for experimentation in this field and lays the groundwork for researching the applicability of chatbots in diagramming.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
  • Learning Configurations of Operating Environment of Autonomous Vehicles to
           Maximize their Collisions

      Authors: Chengjie Lu;Yize Shi;Huihui Zhang;Man Zhang;Tiexin Wang;Tao Yue;Shaukat Ali;
      Pages: 384 - 402
      Abstract: Autonomous vehicles must operate safely in their dynamic and continuously changing environment. However, the operating environment of an autonomous vehicle is complicated and full of various types of uncertainty. Additionally, the operating environment has many configurations, including static and dynamic obstacles with which an autonomous vehicle must avoid collisions. Although various approaches targeting environment configuration for autonomous vehicles have shown promising results, their effectiveness in dealing with a continuously changing environment is limited. Thus, it is essential to learn realistic configurations of a continuously changing environment under which an autonomous vehicle should be tested for its ability to avoid collisions. Because its agents interact dynamically with the environment, Reinforcement Learning (RL) has shown great potential for complicated problems that require adapting to the environment. To this end, we present an RL-based environment configuration learning approach, DeepCollision, which intelligently learns environment configurations that lead an autonomous vehicle to crash. DeepCollision employs Deep Q-Learning as the RL solution and uses collision probability as the safety measure to construct the reward function. We trained four DeepCollision models and conducted an experiment to compare them with two baselines, i.e., random and greedy. Results show that DeepCollision was significantly more effective at generating collisions than the baselines. We also provide recommendations on configuring DeepCollision with the most suitable time interval for different road structures.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
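Deep Q-Learning approximates action values with a neural network, but the update it approximates is the ordinary Q-learning rule Q(s, a) ← Q(s, a) + α(r + γ·max_a′ Q(s′, a′) − Q(s, a)). The tabular sketch below shows that rule with the reward taken to be a collision probability, as the abstract describes. The state and action labels, learning rate, and discount factor are hypothetical choices for illustration; this is not DeepCollision's implementation.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Q-table keyed by (state, action); hypothetical string labels stand in for real encodings.
QTable = DefaultDict[Tuple[str, str], float]


def q_update(q: QTable, state: str, action: str, reward: float, next_state: str,
             actions: List[str], alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def greedy_action(q: QTable, state: str, actions: List[str]) -> str:
    """Pick the environment change with the highest estimated value in `state`."""
    return max(actions, key=lambda a: q[(state, a)])


# Usage: reward an action whose resulting scene had an (assumed) collision probability of 0.8.
q: QTable = defaultdict(float)
actions = ["add_pedestrian", "add_vehicle", "change_weather"]
q_update(q, "s0", "add_pedestrian", reward=0.8, next_state="s1", actions=actions)
best = greedy_action(q, "s0", actions)  # "add_pedestrian"
```

Using collision probability rather than a 0/1 crash signal gives the agent a graded reward, so configurations that bring the vehicle close to a crash are still reinforced.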
       
  • Nighthawk: Fully Automated Localizing UI Display Issues via Visual
           Understanding

      Authors: Zhe Liu;Chunyang Chen;Junjie Wang;Yuekai Huang;Jun Hu;Qing Wang;
      Pages: 403 - 418
      Abstract: The Graphical User Interface (GUI) provides a visual bridge between a software application and its end users, through which they interact. With the upgrading of mobile devices and the development of aesthetics, the visual effects of GUIs are increasingly attractive, and users pay more attention to the accessibility and usability of applications. However, such GUI complexity poses a great challenge to GUI implementation. According to our pilot study of crowdtesting bug reports, display issues such as text overlap, component occlusion, and missing images frequently occur during GUI rendering on different devices due to software or hardware compatibility problems. They negatively influence app usability, resulting in a poor user experience. To detect these issues, we propose Nighthawk, a fully automated approach based on deep learning that models the visual information of GUI screenshots. Nighthawk can detect GUIs with display issues and also locate the detailed region of the issue in the given GUI to guide developers in fixing the bug. However, training the model requires a large amount of labeled buggy screenshots, which take considerable manual effort to prepare. We therefore propose a heuristic-based method to automatically generate labeled training data. The evaluation demonstrates that Nighthawk achieves an average precision of 0.84 and recall of 0.84 in detecting UI display issues, and an average AP of 0.59 and AR of 0.60 in localizing them. We also evaluate Nighthawk on popular Android apps from Google Play and F-Droid and successfully uncover 151 previously undetected UI display issues, 75 of which have been confirmed or fixed so far.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
  • Let’s Talk With Developers, Not About Developers: A Review of Automatic
           Program Repair Research

      Authors: Emily Winter;Vesna Nowack;David Bowes;Steve Counsell;Tracy Hall;Sæmundur Haraldsson;John Woodward;
      Pages: 419 - 436
      Abstract: Automatic program repair (APR) offers significant potential for automating some coding tasks. Using APR could reduce the high costs historically associated with fixing code faults and deliver significant benefits to software engineering. Adopting APR could also have profound implications for software developers’ daily activities, transforming their work practices. To realise the benefits of APR it is vital that we consider how developers feel about APR and the impact APR may have on developers’ work. Developing APR tools without consideration of the developer is likely to undermine the success of APR deployment. In this paper, we critically review how developers are considered in APR research by analysing how human factors are treated in 260 studies from Monperrus’s Living Review of APR. Over half of the 260 studies in our review were motivated by a problem faced by developers (e.g., the difficulty associated with fixing faults). Despite these human-oriented motivations, fewer than 7% of the 260 studies included a human study. We looked in detail at these human studies and found their quality mixed (for example, one human study was based on input from only one developer). Our results suggest that software developers are often talked about in APR studies, but are rarely talked with. A more comprehensive and reliable understanding of developer human factors in relation to APR is needed. Without this understanding, it will be difficult to develop APR tools and techniques which integrate effectively into developers’ workflows. We recommend a future research agenda to advance the study of human factors in APR.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
  • An Empirical Study of Yanked Releases in the Rust Package Registry

      Authors: Hao Li;Filipe R. Cogo;Cor-Paul Bezemer;
      Pages: 437 - 449
      Abstract: Cargo, Rust's package manager, provides a yank mechanism to support release-level deprecation, which can prevent packages from depending on yanked releases. Most prior studies focused on code-level (i.e., deprecated APIs) and package-level deprecation (i.e., deprecated packages); few studies have focused on release-level deprecation. In this study, we investigate how often and how the yank mechanism is used, the rationales behind its usage, and the adoption of yanked releases in the Cargo ecosystem. Our study shows that 9.6% of the packages in Cargo have at least one yanked release, and the proportion of yanked releases kept increasing from 2014 to 2020. Package owners yank releases for reasons other than withdrawing a defective release, such as fixing a release that does not follow semantic versioning or indicating that a package has been removed or replaced. In addition, we found that 46% of the packages directly adopted at least one yanked release and that yanked releases propagated through the dependency network, leaving 1.4% of the releases in the ecosystem with unresolved dependencies.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
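Cargo's documentation describes yanking as a soft deprecation: a yanked release is no longer chosen when a dependency is newly resolved, but projects whose `Cargo.lock` already pins it keep building. The toy resolver below sketches that behaviour for a single dependency. It is not Cargo's resolver; the function name is hypothetical and the version requirement is simplified (an assumption) to "same major version, not older than the minimum".

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, int, int]  # (major, minor, patch)


def pick_version(releases: Dict[Version, bool], minimum: Version,
                 locked: Optional[Version] = None) -> Optional[Version]:
    """Select a release for one dependency; `releases` maps each version to its yanked flag.

    Simplified requirement (an assumption): same major version as `minimum` and not older.
    """
    # A version already recorded in the lockfile keeps resolving even if it was yanked later.
    if locked is not None and locked in releases:
        return locked
    candidates = [v for v, yanked in releases.items()
                  if not yanked and v[0] == minimum[0] and v >= minimum]
    # None models an unresolved dependency: every matching release has been yanked.
    return max(candidates) if candidates else None
```

With releases `{(1, 0, 0): False, (1, 1, 0): True, (2, 0, 0): False}`, a fresh resolution from `(1, 0, 0)` skips the yanked 1.1.0 and picks 1.0.0, while a lockfile pinning 1.1.0 keeps it. A requirement of at least 1.1.0 resolves to nothing, which mirrors how yanked releases can leave dependents with unresolved dependencies.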
       
  • Generating Concise Patches for Newly Released Programming Assignments

      Authors: Leping Li;Hui Liu;Kejun Li;Yanjie Jiang;Rui Sun;
      Pages: 450 - 467
      Abstract: In programming courses, providing students with concise and constructive feedback on faulty submissions (programs) is highly desirable. However, providing feedback manually is often time-consuming and tedious. To relieve tutors of manually constructing concise feedback, researchers have proposed approaches such as CLARA and Refactory that construct feedback automatically. The key to such approaches is to fix a faulty program by making it equivalent to one of its correct reference programs whose overall structure is identical to that of the faulty submission. However, for a newly released assignment, it is likely that there are no correct reference programs at all, let alone correct reference programs sharing an identical structure with the faulty submission. Therefore, in this paper, we propose AssignmentMender, an approach that generates concise patches for newly released assignments. The key insight of AssignmentMender is that a faulty submission can be repaired by reusing fine-grained code snippets from other submissions (even faulty ones) for the same assignment. It automatically locates suspicious code in the faulty program and leverages static analysis to retrieve reference code from existing submissions with a graph-based matching algorithm. Finally, it generates candidate patches by modifying the suspicious code based on the reference code. Unlike existing approaches, AssignmentMender exploits faulty submissions in addition to bug-free submissions to generate patches. Another advantage of AssignmentMender is that it can leverage submissions whose overall structures differ from that of the to-be-fixed submission. Evaluation results on 128 faulty submissions from 10 assignments show that AssignmentMender improves the state of the art in feedback generation for newly released assignments. A case study involving 40 students and 80 submissions further provides initial evidence that the proposed approach is useful in practice.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
  • 2022 Reviewers List*
           * The 2022 reviewers list includes the names of all reviewers from 29 November 2021 through 15 December 2022. All other
           reviewers who submit after this date will be included in our 2022 list.

      Pages: 468 - 472
      Abstract: Presents a list of reviewers who contributed to this publication in 2022.
      PubDate: Jan. 1 2023
      Issue No: Vol. 49, No. 1 (2023)
       
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
 

JournalTOCs © 2009-