Abstract: The dramatic development of Edge Computing technologies is strongly stimulating the adoption of machine learning models on connected and autonomous vehicles (CAVs) so that they can provide a variety of intelligent onboard services. When multiple services run on resource-constrained CAVs, how the limited resources can dynamically support the desired services is of the utmost importance for both automakers and domain researchers. In this context, efficiently and dynamically managing vehicle services becomes critical for autonomous driving. While previous research focused on service scheduling, computation offloading, and virtual machine migration, we propose EdgeWare, an extensible and flexible middleware for managing the execution of vehicle services. EdgeWare is open-sourced to the community and offers four key features: i) on-demand model switching, i.e., the ability to easily switch and upgrade machine learning models; ii) function consolidation and deduplication to eliminate duplicate copies of repeated functions and maximize the reusability of vehicle services; iii) support for building event-driven applications to reduce workload; and iv) dynamic workflow customization, which enables customizing workflows to extend functionality. Our experimental results show that EdgeWare executes services about 2.6\(\times\) faster than the silo approach and reduces CPU and memory utilization by up to around 50% and 17%, respectively, while allowing domain researchers to dynamically add new services on CAVs or easily switch to upgraded applications for the life-cycle management of vehicle services. PubDate: 2022-05-09
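A minimal sketch of the function consolidation and deduplication idea described above, under the assumption that handler functions can be keyed by a hash of their source so identical copies collapse into one. All names here (FunctionRegistry, register, call) are hypothetical illustrations, not EdgeWare's actual API.

```python
import hashlib
import inspect

class FunctionRegistry:
    """Hypothetical sketch: identical handler bodies are stored once
    and shared across vehicle services instead of being duplicated."""

    def __init__(self):
        self._functions = {}   # source hash -> callable
        self._bindings = {}    # (service, name) -> source hash

    def register(self, service, name, func):
        # Key the function by a hash of its source so duplicates collapse.
        key = hashlib.sha256(inspect.getsource(func).encode()).hexdigest()
        self._functions.setdefault(key, func)
        self._bindings[(service, name)] = key

    def call(self, service, name, *args, **kwargs):
        return self._functions[self._bindings[(service, name)]](*args, **kwargs)

# Two services registering the same preprocessing step share a single copy.
def normalize(frame):
    return [x / 255.0 for x in frame]

registry = FunctionRegistry()
registry.register("lane_detection", "preprocess", normalize)
registry.register("object_tracking", "preprocess", normalize)
print(registry.call("lane_detection", "preprocess", [0, 128, 255]))
```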
Abstract: Landslides are a major natural hazard that cause losses of human lives and property, so assessing landslide susceptibility is important. This paper proposes a deep learning based assessment model for landslide susceptibility to help avoid landslide hazards and reduce losses. We combine the multilayer perceptron and the frequency ratio to construct a hybrid model that calculates landslide susceptibility. We used 22,877 landslide locations and an equal number of non-landslide locations obtained from high-resolution satellite images for the experiments. The model's accuracy and AUC value outperform those of the non-hybrid single models by 32.88%. Furthermore, we employed multiple GPUs to accelerate the training process: using a node with four GPUs to distribute the model and the input-batch computation yields a decent speedup. PubDate: 2022-04-07
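A hedged sketch of one generic way to combine frequency ratio (FR) features with a multilayer perceptron, as the abstract describes; the authors' exact hybridization is not specified here. The FR of a factor class is the landslide share of that class divided by its area share, and the toy data below is synthetic.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def frequency_ratio(factor_classes, labels):
    """Map each class of a conditioning factor to its frequency ratio:
    (landslide share of the class) / (area share of the class)."""
    fr = {}
    n_total = len(factor_classes)
    n_slides = labels.sum()
    for c in np.unique(factor_classes):
        in_class = factor_classes == c
        slide_pct = labels[in_class].sum() / max(n_slides, 1)
        area_pct = in_class.sum() / n_total
        fr[c] = slide_pct / area_pct if area_pct > 0 else 0.0
    return fr

# Toy data: one categorical conditioning factor (e.g. slope class) per location.
rng = np.random.default_rng(0)
slope_class = rng.integers(0, 5, size=2000)
labels = (rng.random(2000) < 0.1 * (slope_class + 1) / 5).astype(int)

fr_map = frequency_ratio(slope_class, labels)
X = np.array([fr_map[c] for c in slope_class]).reshape(-1, 1)  # FR-encoded feature
model = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500, random_state=0)
model.fit(X, labels)
print("susceptibility of class 4:", model.predict_proba([[fr_map[4]]])[0, 1])
```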
Abstract: More and more applications are organized as meshed microservices that can be deployed on Kubernetes, the popular container orchestration platform. Designing appropriate container auto-scaling methods for such applications in Kubernetes helps reduce costs and guarantee Quality of Service (QoS). However, most existing resource provisioning methods focus on a single service without considering interactions among meshed services. Meanwhile, synchronous calls among services affect the processing capacity of containers differently as the proportion of requests from different business types changes, which existing methods also ignore. Therefore, in this article, a method based on an adaptive queuing model and a queue-length-aware Jackson queuing network is proposed. It adjusts the processing rate of containers according to the ratio of synchronous calls and considers queued tasks when calculating the impact of bottleneck tiers on the others. Experiments performed on a real Kubernetes cluster show that the proposed method achieves the lowest percentage of Service Level Agreement (SLA) violations (a decrease of about 6.33%-12.29%) with only about 0.9% additional cost compared with Kubernetes' existing methods and other recent methods. PubDate: 2022-04-06
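A simplified, hypothetical sketch of queue-length-aware replica sizing for a single tier; it is not the paper's Jackson-network model, only the generic intuition that the effective load includes the queued backlog and that a container's service rate is discounted by the time it spends blocked on synchronous downstream calls. All parameter names and the 0.7 utilization target are assumptions.

```python
import math

def required_replicas(arrival_rate, queue_length, service_rate,
                      sync_call_ratio, window=1.0, target_util=0.7):
    """Estimate container replicas for one service tier.

    arrival_rate     -- new requests per second reaching the tier
    queue_length     -- requests currently waiting (drained over `window` seconds)
    service_rate     -- requests per second one container serves when not blocked
    sync_call_ratio  -- fraction of time spent waiting on synchronous downstream calls
    target_util      -- utilization ceiling kept below 1 to bound queueing delay
    """
    effective_load = arrival_rate + queue_length / window
    effective_rate = service_rate * (1.0 - sync_call_ratio)
    return max(1, math.ceil(effective_load / (effective_rate * target_util)))

# Example: 120 req/s arriving, 40 queued, each pod serves 30 req/s but spends
# 25% of its time blocked on a downstream service.
print(required_replicas(120, 40, 30, 0.25))  # -> 11
```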
Abstract: With the end of Moore's Law and Dennard scaling, it has become increasingly difficult to implement high-performance computing systems on a monolithic chip. Chiplet technology, which integrates multiple small chips into a large-scale computing system through heterogeneous integration, is one of the important development directions for high-performance computing. Chiplet-based systems have huge advantages over monolithic chips in terms of design and manufacturing cost and development efficiency. In this survey, we summarize the concept and history of chiplets and introduce the critical technologies needed to implement chiplet-based systems. Finally, we discuss several future research directions for chiplet-based systems. PubDate: 2022-03-31
Abstract: In edge computing, distributed training of Deep Neural Networks (DNNs) requires exchanging massive gradients between parameter servers and worker nodes, and the high communication cost constrains the training speed. To break this limitation, gradient compression algorithms pursue extreme compression ratios at the expense of the accuracy of the trained model. Therefore, new gradient compression techniques are necessary to ensure both communication efficiency and model accuracy. This paper introduces a novel technique, an Adaptive Sparse Ternary Gradient Compression (ASTC) scheme, which relies on the number of gradients in model layers to compress gradients. ASTC establishes a layer-selection criterion based on the number of gradients, compresses the network layers that meet this criterion, evaluates the gradients' importance based on entropy to adaptively perform sparse compression, and finally applies ternary quantization and a lossless coding scheme to the sparse gradients. In experimental evaluations on public datasets (MNIST, CIFAR-10, Tiny ImageNet) and deep learning models (CNN, LeNet5, ResNet18), the training efficiency of ASTC is about 1.6, 1.37, and 1.1 times higher than that of Top-1, AdaComp, and SBC, respectively. Furthermore, ASTC improves training accuracy by an average of about \(1.9\%\) compared with the above approaches. PubDate: 2022-03-18
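A rough sketch of the two core operations the abstract names, top-k sparsification followed by ternary quantization of the surviving gradients; ASTC's entropy-based importance scoring, layer-selection criterion, and lossless coding are not reproduced, and the keep ratio below is an arbitrary assumption.

```python
import numpy as np

def sparse_ternary_compress(grad, keep_ratio=0.01):
    """Keep only the largest-magnitude gradients, then quantize them to
    {-s, 0, +s}, where s is the mean magnitude of the kept values."""
    flat = grad.ravel()
    k = max(1, int(keep_ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the top-k gradients
    scale = np.abs(flat[idx]).mean()               # shared ternary magnitude s
    signs = np.sign(flat[idx]).astype(np.int8)     # -1 / +1 per kept gradient
    return idx, signs, scale, grad.shape

def decompress(idx, signs, scale, shape):
    flat = np.zeros(int(np.prod(shape)), dtype=np.float32)
    flat[idx] = signs * scale
    return flat.reshape(shape)

g = np.random.randn(256, 128).astype(np.float32)
payload = sparse_ternary_compress(g, keep_ratio=0.05)
g_hat = decompress(*payload)
print("kept:", payload[1].size, "of", g.size)
```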
Abstract: DNA storage is a new digital data storage technology based on specific encoding and decoding methods between the 0/1 binary codes of digital data and the A-T-C-G quaternary codes of DNA, and it is expected to develop into a major form of data storage in the future thanks to its advantages such as high data density, long storage time, low energy consumption, ease of carrying, concealed transport, and multiple encryption. In this review, we summarize recent research advances in the four main encoding and decoding methods of DNA storage technology: the early-stage direct mapping method between binary and A-T-C-G quaternary codes, fountain codes for higher logical storage density, inner and outer codes for random access to DNA storage data, and the CRISPR-mediated in vivo DNA storage method. The first three encoding/decoding methods belong to in vitro DNA storage and represent the mainstream research and applications in DNA storage. Their advantages and disadvantages are also reviewed: the direct mapping method is simple and efficient but suffers from a high error rate and low logical density; fountain codes achieve higher storage density but do not support random access; inner and outer codes incorporate error-correction designs that enable random access at the expense of logical density. This review provides important references for and an improved understanding of DNA storage methods. The development of efficient and accurate DNA storage encoding and decoding methods will play a very important, even decisive, role in the transition of DNA storage from the laboratory to practical application, which may fundamentally change the information industry in the future. PubDate: 2022-03-18
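A tiny illustration of the early-stage direct mapping the review describes, pairing each 2-bit group with one of the four bases. The particular bit-to-base assignment is one common convention, not a standard fixed by the review, and real schemes add constraints (GC balance, homopolymer limits) and error correction that this sketch omits.

```python
# Direct 2-bit-per-base mapping: 00->A, 01->T, 10->C, 11->G (one possible convention).
BIT2BASE = {"00": "A", "01": "T", "10": "C", "11": "G"}
BASE2BIT = {b: k for k, b in BIT2BASE.items()}

def encode(data: bytes) -> str:
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BIT2BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    bits = "".join(BASE2BIT[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = encode(b"HPC")
print(strand)           # TACATTAATAAG  (4 bases per byte)
print(decode(strand))   # b'HPC'
```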
Abstract: The power and energy consumed by high-performance computing systems are a significant problem today. Superconducting computing technology may offer an attractive low-power alternative to traditional complementary metal-oxide-semiconductor (CMOS) technology due to the ultrafast, low-power switching characteristics of superconductor devices. We offer a relatively comprehensive review of the latest developments in superconducting computing technology from the perspectives of logic circuits, emerging superconducting architectures, and automated design tools. In light of their inner operation mechanisms, we classify the superconducting single flux quantum (SFQ) logic family into six major categories and discuss their respective strengths and weaknesses. Many novel superconducting architectures have also been proposed, such as dual-clock-based superconducting circuits, superconducting accelerators, and superconducting neuromorphic circuits; however, their effectiveness needs further evaluation, and their manufacturability is still unknown. Additional efforts are also needed to enhance the electronic design automation of very large-scale integration (VLSI) SFQ circuits while maintaining a relatively low cost in area and power. We also discuss open challenges and future directions in superconducting computing research. PubDate: 2022-03-16
Abstract: Brain-inspired computing, which draws on the information processing procedures and the biophysiological structure of the brain, is believed to have the potential to drive the next wave of computer engineering and to provide a promising path toward the next generation of artificial intelligence. The basic software for brain-inspired computing is the core link in realizing the research goals of brain-inspired computing and in building the ecosystem of brain-inspired computing applications. This paper reviews the status of the three major kinds of basic software for brain-inspired computing, namely the toolchains for neuromorphic chips, the software simulation frameworks, and the frameworks that integrate spiking neural networks (SNNs) and deep neural networks (DNNs). Afterward, we point out that a "general-purpose", hierarchical, and HW/SW-decoupled basic software framework would benefit both the (computational) neuroscience and brain-inspired intelligence fields. Here, "general-purpose" refers to the decoupling of software and hardware and to support for the integration of computer science and neuroscience research. PubDate: 2022-03-16
Abstract: Quantum algorithms have been demonstrated to outperform classical algorithms for certain problems and are thus promising candidates for efficient information processing. Herein we aim to provide a brief and accessible introduction to quantum algorithms for both the academic community and interested members of the general public. We start by elucidating quantum parallelism, the basic framework of quantum algorithms, and the difficulty of quantum algorithm design. We then focus on a historical overview of progress in quantum algorithm research over the past three to four decades. Finally, we clarify two common questions about the study of quantum algorithms, hoping to stimulate readers toward further exploration. PubDate: 2022-02-17 DOI: 10.1007/s42514-022-00090-3
Abstract: Research is being actively conducted to find solutions that improve healthcare, precision medicine, and personalized medicine. Today's biomedical research increasingly relies on computing power, particularly high-performance computing (HPC), and simultaneously involves researchers from a diverse range of backgrounds. However, HPC is evolving rapidly toward more heterogeneous architectures as we enter the exascale computing era, which unavoidably increases technical complexity. To better serve end users in the biomedical application community, we integrate software and hardware resources to develop the biomedical application community on the China National Grid (CNGrid). The biomedical application community is a web-based computing service for many biomedical research applications, including drug development, personalized medicine, bioinformatics, and computational chemistry. It exposes computational resources provided by the CNGrid while masking the underlying complex cyberinfrastructure, enabling user-friendly and easily accessible services for end users in the biomedical community. Our service is currently available free of charge at http://biomed.cngrid.org/. PubDate: 2022-02-10 DOI: 10.1007/s42514-022-00088-x
Abstract: The prediction of \({\mathrm{PM}}_{2.5}\) concentration has attracted considerable research effort in recent years. However, due to the lack of open datasets, the data processed by existing intelligent methods are only values at single stations or mean values over small regions, whereas the data in real applications are gridded values over large regions. This incompatibility in data format prevents intelligent methods from being integrated into the practical process of \({\mathrm{PM}}_{2.5}\) prediction. In this paper, we first build a large dataset of gridded data obtained from the numerical prediction field, and then propose an intelligent prediction method that uses gridded data as the basic input and output format. To capture both the spatial and temporal characteristics of the data, the ConvLSTM (convolutional long short-term memory) model is applied, which combines the advantages of the CNN (convolutional neural network) and LSTM models. However, ConvLSTM has a weakness in processing multi-feature data: the more features the model uses, the worse the forecasting result becomes. To further improve the prediction accuracy of ConvLSTM, the attention mechanism is applied, which describes more accurately the importance of different features and different regions for prediction accuracy. On the built dataset of \({\mathrm{PM}}_{2.5}\) gridded concentrations, when predicting the next hour's values from the past 6 h, the RMSE (root mean square error) of the conventional MLR (multi-linear regression) and ConvLSTM models is 6.44 \(\mu \mathrm{g}/{\mathrm{m}}^{3}\) and 6.24 \(\mu \mathrm{g}/{\mathrm{m}}^{3}\), respectively; when the attention mechanism is incorporated into ConvLSTM, the RMSE decreases to 4.79 \(\mu \mathrm{g}/{\mathrm{m}}^{3}\). PubDate: 2022-02-03 DOI: 10.1007/s42514-021-00087-4
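A minimal Keras sketch of a ConvLSTM model that takes the past 6 hourly grids and outputs the next hour's grid, matching the gridded input/output format the abstract describes. The grid size, channel count, layer widths, and loss are assumptions, and the attention mechanism used in the paper is not reproduced, so treat this as a generic baseline only.

```python
import tensorflow as tf

T, H, W, C = 6, 64, 64, 1   # 6 past hours of an assumed 64x64 PM2.5 grid, 1 channel

model = tf.keras.Sequential([
    tf.keras.Input(shape=(T, H, W, C)),
    # Spatio-temporal encoder: convolutional gates instead of the dense gates of LSTM.
    tf.keras.layers.ConvLSTM2D(32, kernel_size=3, padding="same", return_sequences=True),
    tf.keras.layers.ConvLSTM2D(16, kernel_size=3, padding="same", return_sequences=False),
    # Map the final hidden state to the next-hour concentration grid.
    tf.keras.layers.Conv2D(1, kernel_size=1, activation="relu"),
])
model.compile(optimizer="adam", loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
model.summary()
```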
Abstract: Structure-based virtual screening is a key, routine computational method in computer-aided drug design. Such screening can be used to identify potentially highly active compounds and to speed up novel drug design. Molecular docking-based virtual screening can find active compounds in large ligand databases by estimating the binding affinities between receptors and ligands. In this study, we analyzed the challenges of virtual screening, with the aim of identifying highly active compounds faster and more easily than is generally possible. We discuss the accuracy and speed of molecular docking software and strategies for high-throughput molecular docking calculations, and we focus on the current challenges of ultra-large-scale virtual screening and our solutions to them. The development of web services helps lower the barrier to virtual drug screening; we introduce some related websites for docking and virtual screening, focusing on the development of pre- and post-processing, interactive visualization, and large-scale computing. PubDate: 2022-01-13 DOI: 10.1007/s42514-021-00086-5
Abstract: The energy consumption of large-scale heterogeneous computing systems has become a critical concern on both financial and environmental fronts. Current systems employ hand-crafted heuristics and ignore changes in system and workload characteristics. Moreover, problems with high-dimensional state and action spaces cannot be solved efficiently by traditional reinforcement learning methods in large-scale heterogeneous settings. Therefore, in this paper, energy-aware task scheduling with deep reinforcement learning (DRL) is proposed. First, based on the real-world SPECpower dataset, a high-precision energy consumption model that is convenient for environment simulation is designed. Based on actual production conditions, a partition-based task-scheduling algorithm using proximal policy optimization on heterogeneous resources is proposed. Simultaneously, an auto-encoder is used to process the high-dimensional state space and speed up DRL convergence. Finally, to fully verify our algorithm, three scheduling scenarios covering large-, medium-, and small-scale heterogeneous environments are simulated. Experiments show that, compared with heuristics and other DRL-based methods, our algorithm more effectively reduces system energy consumption and ensures quality of service without significantly increasing waiting time. PubDate: 2021-12-01 DOI: 10.1007/s42514-021-00083-8
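A hedged sketch of the kind of utilization-to-power model that can be fitted from SPECpower-style measurement points (idle through full load) and queried inside a simulated scheduling environment; the paper's own high-precision model is not specified here, and the measurement values below are made up.

```python
import numpy as np

class PowerModel:
    """Interpolate server power (watts) from measured utilization levels,
    in the style of SPECpower's stepped load points."""

    def __init__(self, util_points, watt_points):
        self.util = np.asarray(util_points, dtype=float)   # e.g. 0.0 .. 1.0
        self.watt = np.asarray(watt_points, dtype=float)

    def power(self, utilization):
        return float(np.interp(utilization, self.util, self.watt))

    def energy(self, utilization, seconds):
        return self.power(utilization) * seconds / 3600.0   # watt-hours

# Hypothetical measurements for one heterogeneous node type.
node = PowerModel(
    util_points=[0.0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0],
    watt_points=[55, 78, 92, 118, 141, 163, 190],
)
print(node.power(0.5))          # interpolated watts at 50% utilization
print(node.energy(0.5, 3600))   # watt-hours for one hour at 50% load
```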
Abstract: Specification mining is an automated or semi-automated process for inferring models or properties from computer programs or systems, and it is a useful way to aid program understanding, monitoring, and verification. There have been many works on mining various forms of specifications, among which mining temporal logic specifications is becoming increasingly interesting, as temporal logic can formally describe and reason about software behaviors in terms of temporal order. Several approaches have been proposed to mine linear temporal logic (LTL) specifications. Compared to LTL, however, past-time linear temporal logic (PTLTL) enables specifying many system behaviors in more natural forms: for example, the specification \(G(a \rightarrow Ob)\), stating "once event a happens, another event b must have happened before it", is much more intuitive for users than the equivalent LTL specification \(\lnot ((\lnot b)\ U\ (a \wedge \lnot b))\) and is easier to check because of its shorter form. In this paper, we propose a general approach to mining PTLTL specifications. In addition, we present a cache strategy and a parallel strategy to make it faster and more scalable. We implement a tool named Past Time Linear Temporal Logic Miner (PTLM) and evaluate it; the results are encouraging. PubDate: 2021-12-01 DOI: 10.1007/s42514-021-00079-4
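A small sketch of how the quoted past-time property ("once event a happens, event b must have happened before it") can be checked over a finite event trace, which is the kind of obligation a PTLTL miner tests candidate formulas against; this is an illustration only, not PTLM's implementation, and the function name is hypothetical.

```python
def holds_once_before(trace, a, b):
    """True iff every occurrence of `a` in the trace is preceded
    (strictly earlier) by at least one occurrence of `b`."""
    seen_b = False
    for event in trace:
        if event == a and not seen_b:
            return False          # `a` happened with no earlier `b`
        if event == b:
            seen_b = True
    return True

print(holds_once_before(["init", "b", "x", "a"], "a", "b"))   # True
print(holds_once_before(["init", "a", "b"], "a", "b"))        # False
```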
Abstract: With the rapid development of high-throughput sequencing technologies, the scale of sequencing data continues to increase at unprecedented speed. In the field of genomics, high performance computing (HPC) is urgently needed to process these large-scale sequencing data; HPC uses supercomputers and parallel processing technologies to solve complex computing problems and performs intensive computing operations across massive resources. Nowadays, high performance computing plays an important role in data-driven sciences and is widely used in genomics research. However, when dealing with massive multi-dimensional genomics data using high performance computing, there are still many challenges that limit the wide application of HPC, such as high data complexity, huge memory requirements, and low parallel computing performance. In this paper, we review the irreplaceable applications of high performance computing in genomics, especially in pan-genome, single-cell transcriptome, and large-scale population sequencing studies. In the future, with advancing methods for hardware acceleration and algorithm optimization, high performance computing will become even more inseparable from complex and large-scale genomics studies. PubDate: 2021-12-01 DOI: 10.1007/s42514-021-00081-w
Abstract: Heat shock protein 90 (Hsp90) is a promising target for cancer treatment, and developing new, effective Hsp90 inhibitors is of great significance in anticancer therapy. In this study, 20 machine learning models were constructed on 1321 molecules in order to precisely classify highly active and weakly active Hsp90 inhibitors. Six types of fingerprints, including MACCS keys (MACCS), Extended connectivity fingerprints with radius 2 (ECFP_4), PubChem fingerprints, Estate fingerprints, Substructure fingerprints, and 2D atom pairs fingerprints, were applied to characterize the Hsp90 inhibitors. Five machine learning algorithms, namely support vector machine (SVM), decision tree (DT), random forest (RF), gradient boosting decision tree (GBDT), and multilayer perceptron (MLP), were used to develop the classification models. The best RF and SVM models yielded MCC values of 0.8070 and 0.8003, respectively. The fingerprints of these best models were analyzed by the information gain (IG) method, and based on the IG analysis we identified several substructures favored by highly active Hsp90 inhibitors. Moreover, we clustered the 1321 Hsp90 inhibitors into eight subsets and further analyzed and summarized the structural characteristics of each subset. We found that the purine scaffold and resorcinol appear frequently in highly active Hsp90 inhibitors. PubDate: 2021-12-01 DOI: 10.1007/s42514-021-00084-7
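A compact sketch of the modeling step under the simplifying assumption that the molecular fingerprints have already been computed as binary bit vectors (e.g., 166-bit MACCS keys); it shows a random forest classifier scored with the Matthews correlation coefficient (MCC), as reported in the abstract, but on synthetic placeholder data rather than the 1321 real inhibitors, so the printed score is meaningless.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

# Placeholder stand-in for precomputed 166-bit MACCS fingerprints and activity labels.
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(1321, 166))   # one bit vector per molecule
y = rng.integers(0, 2, size=1321)          # 1 = highly active, 0 = weakly active

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

clf = RandomForestClassifier(n_estimators=500, random_state=42)
clf.fit(X_train, y_train)
print("MCC:", matthews_corrcoef(y_test, clf.predict(X_test)))
```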
Abstract: Traditional system design methods cannot guarantee the dependability of large-scale, complex real-time embedded software. Models constructed with UML and other semi-structured modeling languages do not support simulation and verification, nor can they reveal requirement omissions and logical contradictions. We propose the Extended Hierarchical State transition Matrix (EHSTM) model, which supports hierarchical modeling and concurrent states. The model hierarchy simplifies the formal modeling of large-scale software systems. Hierarchical states and state parallelization clarify the relations between any two complex system concepts, and parallel behavior modeling of the system is supported at the same time. After the model is constructed, it can be simulated and verified by the bounded model verification tool "GarakabuII", and C source code can be generated automatically after model checking and verification. In this way, system developers can focus only on model design, which simplifies the system design process. Finally, a system design tool, ZIPC, based on the EHSTM model is designed. To address the problems of atomicity violations and data races in concurrent program development, the ZIPC tool is used to construct the model, and experimental verification shows that these problems can be effectively resolved. PubDate: 2021-12-01 DOI: 10.1007/s42514-021-00082-9
Abstract: The emergence of supercomputers has brought rapid development to human life and scientific research. Today, the new wave of artificial intelligence (AI) not only brings convenience to people's lives but also changes engineering and scientific high-performance computation. AI technologies provide more efficient and accurate computing methods for many fields. These ongoing changes pose new challenges to the design of computing infrastructures, which are addressed in detail in this survey. The survey first describes the notable progress in combining AI and high-performance computing (HPC) in scientific computation, analyzes several typical scenarios, and summarizes the characteristics of the corresponding computing-resource requirements. On this basis, it lists four general methods for integrating AI computing with conventional HPC, along with their key features and application scenarios. Finally, it introduces the design strategy of the Peng Cheng Cloud Brain II Supercomputing Center for improving AI computing capability and cluster communication efficiency, which helped it win first place in the IO500 and AIPerf rankings. PubDate: 2021-12-01 DOI: 10.1007/s42514-021-00080-x
Abstract: Non-linear phase field models are increasingly used for the simulation of fracture propagation problems. The numerical simulation of fracture networks of realistic size requires the efficient parallel solution of large coupled non-linear systems. Although efficient iterative multi-level methods for these types of problems are available in principle, they are not widely used in practice due to the complexity of their parallel implementation. Here, we present Utopia, an open-source C++ library for parallel non-linear multilevel solution strategies. Utopia provides the advantages of high-level programming interfaces while at the same time offering a framework to access low-level data structures without breaking code encapsulation. Complex numerical procedures can be expressed with a few lines of code and evaluated with different implementations, libraries, or computing hardware. In this paper, we investigate the parallel performance of our implementation of the recursive multilevel trust-region (RMTR) method based on the Utopia library. RMTR is a globally convergent multilevel solution strategy designed to solve non-convex constrained minimization problems. In particular, we solve pressure-induced phase-field fracture propagation in large and complex fracture networks. Solving such problems is challenging even for a few fractures; here, however, we consider networks of realistic size with up to 1000 fractures. PubDate: 2021-06-29 DOI: 10.1007/s42514-021-00069-6
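A heavily simplified, single-level sketch of the trust-region globalization idea that multilevel strategies such as RMTR build on: take a step limited to a trust radius, compare the actual objective decrease with the model's prediction, and accept, shrink, or expand accordingly. This is neither Utopia's API nor RMTR (no level hierarchy, no constraints); all names, thresholds, and the Rosenbrock test objective are illustrative assumptions.

```python
import numpy as np

def trust_region_minimize(f, grad, x, radius=1.0, max_iter=1000,
                          eta=0.1, radius_max=10.0, tol=1e-8):
    """Single-level trust-region loop with a steepest-descent (Cauchy) step
    for a linear local model, showing the accept/shrink/expand logic."""
    for _ in range(max_iter):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm < tol:
            break
        step = -(radius / gnorm) * g      # minimizer of the linear model on the ball
        predicted = radius * gnorm        # decrease predicted by that model
        actual = f(x) - f(x + step)
        rho = actual / predicted          # agreement between model and objective
        if rho > eta:                     # sufficient agreement: accept the step
            x = x + step
        if rho < 0.25:                    # poor agreement: shrink the region
            radius *= 0.5
        elif rho > 0.75:                  # very good agreement: allow larger steps
            radius = min(2.0 * radius, radius_max)
    return x

# A smooth non-convex test objective (Rosenbrock) and its gradient.
f = lambda v: (1.0 - v[0])**2 + 100.0 * (v[1] - v[0]**2)**2
grad = lambda v: np.array([-2.0 * (1.0 - v[0]) - 400.0 * v[0] * (v[1] - v[0]**2),
                           200.0 * (v[1] - v[0]**2)])
x0 = np.array([-1.2, 1.0])
x_end = trust_region_minimize(f, grad, x0, max_iter=2000)
print(f(x0), "->", f(x_end))  # the objective only decreases on accepted steps
```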