Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.
Authors:
Mauricio Banaszeski da Silva;Gilson I. Wirth;Hans P. Tuinhout;Adrie Zegers-van Duijnhoven;Andries J. Scholten;
Pages: 2229 - 2242 Abstract: Flicker noise, or 1/f noise, is known to increase as devices scale down. However, as the scaling of MOS transistors advances, the effect of individual oxide defects, which originate flicker noise, becomes apparent through distinguishable discrete fluctuations in current. In such cases, flicker noise is recognizable as random telegraph noise (RTN). Another typical characteristic of RTN is that seemingly identical devices have different noise characteristics, as RTN variability from device to device also increases with scaling. Due to the large variability of RTN in highly scaled devices, circuit yield must be evaluated during noise analysis. This work proposes a model to analyze the effect of RTN in analog circuits and to evaluate circuit yield. The model can be used in the traditional workflow of noise analysis. The model estimates the distribution of RTN power from the distribution of the noise spectral density. Then, quantiles from the power distribution are used for predicting current/voltage fluctuations for a given yield. This work shows how to use the proposed model to calculate random decision errors in comparators, jitter in oscillators and phase-locked loops, and the impact of RTN in correlated double sampling circuits. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Mohammad Oveisi;Huan Wang;Payam Heydari;
Pages: 2243 - 2256 Abstract: Realization of high-order modulation schemes directly in the RF domain enables the generation of spectrally efficient $4^{M}$ quadrature-amplitude-modulated ($4^{M}$ QAM) symbols using the vectorial summation of $M$ quadrature phase-shift keying (QPSK) signals whose amplitudes are progressively scaled by a constant factor of two. Called RF-QAM, this approach leads to numerous advantages, including the elimination of the power-hungry digital-to-analog converter (DAC) and the mitigation of the stringent linearity requirement of the front-end power amplifier (PA). This paper presents a comprehensive comparative study of RF-QAM and conventional transmitters. The design issues associated with the front end and the mixed-signal blocks for both architectures are investigated, and the performance of these two designs is compared. Various circuit- and system-level simulations verify the superior performance of the RF-QAM transmitter compared to the conventional counterpart. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
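The vectorial summation described in the abstract above can be illustrated numerically. The following is a minimal baseband sketch, not the authors' transmitter: it assumes ideal QPSK symbols at (±1 ± j) and amplitude weights 1, 2, 4, …, and the function name `rf_qam_constellation` is hypothetical.

```python
import itertools

# Ideal QPSK symbols (assumed normalization: +/-1 on each axis).
QPSK = (1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j)


def rf_qam_constellation(m):
    """Return the set of points obtained by summing m QPSK symbols
    whose amplitudes are scaled by 1, 2, 4, ... (a factor of two each)."""
    points = set()
    for combo in itertools.product(QPSK, repeat=m):
        points.add(sum((2 ** k) * s for k, s in enumerate(combo)))
    return points


# M = 2 gives 4^2 = 16 distinct points on a square grid (16-QAM).
const16 = rf_qam_constellation(2)
levels16 = sorted({int(p.real) for p in const16})
print(len(const16), levels16)
```

Under these assumptions every combination of the $M$ QPSK symbols maps to a distinct point, so the sum yields exactly $4^{M}$ symbols on a uniform square grid.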
Authors:
Chi-Wei Huang;Chin-Kai Lai;Chung-Chih Hung;Chung-Yu Wu;Ming-Dou Ker;
Pages: 2257 - 2270 Abstract: In this paper, an analog front-end (AFE) local-field potential (LFP) acquisition unit with real-time stimulation artifact removal is proposed and verified for closed-loop deep brain stimulation (DBS) applications. The proposed acquisition unit is called the synchronized sample-and-hold stimulation artifact blanking (SSAB) AFE LFP acquisition unit. A right-leg-driven (RLD) circuit and a monopolar electrode-tissue impedance (ETI) measurement circuit associated with the AFE amplifier are also proposed. During closed-loop stimulations, the artifact removal is realized through the SSAB-IPC by blanking the AR-CCIA with a clock synchronized to the stimulation-enable signal and holding the amplifier at its state before stimulation through a sample-and-hold operation. After stimulation, the acquisition unit can quickly recover from the holding state back to the LFP recording state to reduce the discontinuity in LFP recording. The proposed acquisition unit was fabricated in 0.18-$\mu\mathrm{m}$ CMOS technology. With the RLD circuit, the measured CMRR is 124–145 dB in the signal bandwidth. The fabricated monopolar ETI measurement circuit has a measurement error of less than 8.3% with an extra power consumption of $2.65~\mu\text{W}$. The experimental results have shown that the proposed SSAB AFE LFP acquisition unit is feasible for integration into SoCs in real-time closed-loop DBS systems for various applications. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Tuomin Tao;Hanzhi Ma;Da Li;Yan Li;Shurun Tan;En-Xiao Liu;Jose Schutt-Aine;Er-Ping Li;
Pages: 2271 - 2282 Abstract: This paper presents efficient, systematic methods for modeling and analysis of spike signal sequences in crossbar arrays for neuromorphic computing chips. A novel spike signal sequence is proposed, where the ideal spike sequence with only spike time information in the original spiking neural network (SNN) algorithm is mapped onto an actual spike waveform by stitching neighboring sequential spikes together with certain overlaps. We thoroughly investigate and analyze the performance of the input encoding as well as the implementation of spike-timing-dependent plasticity (STDP)-based SNN on memristor crossbar arrays with the proposed spike signal sequence. A detailed circuit model of a crossbar array, consisting of resistance, capacitance and inductance derived by the partial element equivalent circuit (PEEC) method, is created to simulate the training process of the SNN. The proposed spike signal sequence is demonstrated to achieve accurate input encoding as well as high recognition accuracy when it is used to perform the classification task on MNIST handwritten digits. The spike signal sequence is further analyzed and assessed in terms of the main factors affecting its encoding accuracy and the parasitic effects of crossbar arrays on its robustness. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Andrea Ballo;Alfio Dario Grasso;Gaetano Palumbo;
Pages: 2283 - 2292 Abstract: This paper presents and discusses two Dickson charge pumps that are capable of working with a supply voltage lower than the MOS threshold voltage and are particularly suited for energy-constrained applications. Specifically, the paper includes a theoretical analysis of a previous topology introduced by the authors, and then it discusses a novel topology which solves drawbacks of the previous one. The paper also includes a comparison with other conventional topologies, namely the latched and the four-phase charge pumps, that shows the inherent advantage of the proposed solutions also in the range of input values higher than the MOS threshold voltage. 2-, 4-, and 6-stage versions of the conventional and proposed CPs have been fabricated in a 65-nm standard CMOS technology having a MOS threshold voltage equal to about 440 mV. Experimental results confirm the analytical predictions, since the proposed topologies work at an input voltage equal to 300 mV with a current load equal to $20~\mu\text{A}$, and also confirm the advantages of the two proposed solutions in terms of settling time, output current drivability and output power density for voltages slightly higher (450 mV) than the MOS threshold voltage. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Xin Xin;Linxiao Shen;Xiyuan Tang;Yi Shen;Jueping Cai;Xingyuan Tong;Nan Sun;
Pages: 2293 - 2305 Abstract: This paper presents a 13-tap FIR filter and an IIR filter embedded in a 10-bit SAR ADC for wireless communication chips. The IIR filter can be inherently realized through reusing the capacitor array of the SAR ADC, thus improving the stopband suppression and shaping the transition band. Besides, DC attenuation is also avoided. The sampling rate loss of the SAR ADC can be compensated by the $4\times$ time-interleaving technique. The proposed filter features high power efficiency, linearity and process compatibility. Compared with a 15-tap FIR filter, the out-of-band suppression at the cut-off frequency (OOBS@$f_{\mathrm{cut-off}}$) is enhanced by 9 dB theoretically. A prototype FIR/IIR filter in 40-nm CMOS occupies an active area of 0.067 mm$^{2}$, consumes $38~\mu\text{W}$ at a single supply of 1.1 V, has a 1-MHz bandwidth, and obtains $>$42.2 dB OOBS@5.9 MHz when operated at 40 MS/s. Meanwhile, the SAR ADC without/with the proposed filter can achieve a FoMw of 7.91 fJ/conversion-step and 13.5 fJ/conversion-step, respectively. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Fabricio Alcalde Bessia;Troy England;Hongzhi Sun;Leandro Stefanazzi;Davide Braga;Miguel Sofo Haro;Shaorui Li;Juan Estrada;Farah Fahim;
Pages: 2306 - 2316 Abstract: The MIDNA application-specific integrated circuit (ASIC) is a skipper-CCD readout chip fabricated in a 65-nm LP-CMOS process that is capable of working at cryogenic temperatures. The chip integrates four front-end channels that process the skipper-CCD signal and perform differential averaging using a dual-slope integration (DSI) circuit. Each readout channel contains a pre-amplifier, a DC restorer, and a dual-slope integrator with chopping capability. The integrator chopping is a key system design element in order to mitigate the effect of low-frequency noise produced by the integrator itself, and it is not often required with standard CCDs. Each channel consumes 4.5 mW of power, occupies 0.156 mm$^{2}$ of area and has an input-referred noise of 2.7 $\mu\text{V}_{\text{rms}}$. It is demonstrated experimentally to achieve sub-electron noise when coupled with a skipper-CCD by means of averaging samples of each pixel. Sub-electron noise is shown in three different acquisition approaches. The signal range is 6000 electrons. The readout system achieves 0.2 $\text{e}^{-}$ RMS by averaging 1000 samples with MIDNA both at room temperature and at 180 K. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Keisuke Kawahara;Yohtaro Umeda;Kyoya Takano;Shinsuke Hara;
Pages: 2317 - 2330 Abstract: This paper presents a low-imbalance and inductor-less active balun. The large immittance of the parasitics increases gain and phase errors in single-ended-to-differential conversion at high frequencies. Positive feedback is effective in reducing these errors. However, there is a trade-off between the stability of the feedback and the imbalance correction. This paper analyzes this trade-off, and the common-mode rejection ratio (CMRR) was improved by adding a capacitor. In addition, we show that the feedback loops used for the imbalance correction can also extend the bandwidth; that is, the additional capacitor improves not only the CMRR but also the high-frequency gain. This peaking technique removes inductors that consume large chip areas. The balun was fabricated in a 0.18-$\mu\text{m}$ CMOS process and achieved a small core area of 0.0058 mm$^{2}$. In addition, a self-bias scheme using a current mirror was devised. It ensures a good bias current balance and reduces errors. The manufacturing variation of the fabricated baluns was statistically evaluated. To obtain the 99.7% limit of the CMRR, we extended the theory of the random CMRR to the complex plane. The measurement results demonstrated small errors within −0.1±0.2 dB and −0.18±1.17°, including a variation of $\pm 3\sigma$, from DC to 8.0 GHz. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Junwei Sun;Jianling Yang;Yanfeng Wang;Peng Liu;Yin Sheng;
Pages: 2331 - 2341 Abstract: The reinforcement and extinction in conditioned reflex have been studied extensively, but memristor-based generalization and differentiation circuits under different emotional conditions are rarely studied. Therefore, a memristor-based generalization and differentiation circuit under positive and negative emotional conditions is presented in this paper. The circuit includes an emotion module, a synapse module, a voltage selection module and an output module. The emotion module is divided into positive emotion and negative emotion modules. Different emotions have different effects on the synapse module, which in turn affects the output module. The memristor-based circuit proposed in this paper can not only realize the process of generalization and differentiation under the influence of different emotions, but also realize the function of secondary differentiation. The results presented in this paper can be verified in PSPICE. By analyzing the effects of different emotions on differentiation and generalization, this paper provides a reference for future research in the field of generalization and differentiation. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Yi-Ta Chen;Yu-Chuan Chuang;Li-Sheng Chang;An-Yeu Wu;
Pages: 2342 - 2355 Abstract: User identification enables secure access to data and machines in smart factories. Compared with other modalities, ECG-based user identification is rising due to its intrinsic liveness proof and invulnerability to spoofing without contact. On the other hand, as new employees are registered at the factory, the ECG-based user identification system needs to be updated based on the newly arriving data. This scenario can be defined as an online class-incremental learning (O-CIL) problem. By exploiting hardware-software co-design, this work presents a Scalable QR-decomposition-based extreme learning machine (S-QRD-ELM) engine that can effectively and efficiently support O-CIL for ECG-based user identification. At the software level, we apply the concept of “the others” class and inversion-free QR-decomposition (QRD) recursive least squares to the S-QRD-ELM. This makes S-QRD-ELM achieve 79.7% higher accuracy in the O-CIL scenario compared with the neural network trained with back-propagation (BP-NN). At the hardware level, a one-dimensional diagonally-mapped linear array (1D-DMLA) is proposed to efficiently compute the QRD and back-substitution (BS) operations inside the S-QRD-ELM, reducing the silicon area by 98.5%. Moreover, the integrated processing element (PE) design with the unified COordinate Rotation DIgital Computer (u-CORDIC) further reduces the area by 15.3% and the power consumption by 22.4%. This engine is fabricated in 40-nm CMOS technology with a $1.33\times 1.33$ mm$^{2}$ die area. The chip achieves $0.02~\mu\text{J}$/sample and $2.47~\mu\text{J}$/sample inferencing and learning energy efficiency, respectively, which is $6.4\times$ and $28.5\times$ better than the state of the art. To the best of our knowledge, the proposed highly energy-efficient S-QRD-ELM engine is the first chip to meet the requirements of O-CIL for ECG-based user identification. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Liu Liu;Ann Franchesca Laguna;Ramin Rajaei;Mohammad Mehdi Sharifi;Arman Kazemi;Xunzhao Yin;Michael Niemier;Xiaobo Sharon Hu;
Pages: 2356 - 2369 Abstract: Pattern searches, a key operation in many data analytic applications, often deal with data represented by multiple states per dimension. However, hash tables, a common software-based pattern search approach, require a large amount of additional memory, and thus, are limited by the memory wall. A hardware-based solution is to use content-addressable memories (CAMs) that support fast associative searches in parallel. Ternary CAMs (TCAMs) support bit-wise Hamming distance (HD) based searches. Detecting the HD of vectors with multiple states per dimension (i.e., multi-state Hamming distance (MSHD)) can be implemented on TCAMs with one-hot encoding, but requires one TCAM cell per state, leading to a higher area, latency, and energy overhead. We propose a Ferroelectric FET (FeFET)-based multi-state CAM design, MHCAM, which implements MSHD searches in a dense FeFET-based memory array. MHCAM only uses $\lceil \log_{2} s \rceil$ 2FeFET CAM cells to represent $s$ states or symbols per dimension, and can be reconfigured to 2-bit/4-bit/6-bit/8-bit dimensions. A low-cost sensing circuit with a matchline voltage scaling technique is introduced to perform both exact match and threshold match. We use DNA and protein pre-alignment filtering as application case studies to evaluate the application-level benefit of MHCAM. DNA and protein pre-alignment filtering achieve $3.8\times/4.7\times$ speedup and $1.7\times/1.8\times$ energy improvement compared with the state-of-the-art 2FeFET TCAM-based implementation. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
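The relationship between multi-state Hamming distance and its one-hot realization, as described in the abstract above, can be sketched in software. This is only an illustration under stated assumptions, not the MHCAM circuit: symbols are assumed to be integers in `range(s)`, the one-hot comparison is a plain binary Hamming distance, and all function names are hypothetical.

```python
def mshd(a, b):
    """Multi-state Hamming distance: number of dimensions whose symbols differ."""
    return sum(x != y for x, y in zip(a, b))


def one_hot(symbols, num_states):
    """Encode each symbol as num_states bits with a single 1 (one cell per state)."""
    bits = []
    for sym in symbols:
        bits.extend(1 if i == sym else 0 for i in range(num_states))
    return bits


def binary_hd(a_bits, b_bits):
    """Plain bit-wise Hamming distance between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a_bits, b_bits))


def cells_per_dimension(num_states):
    """Cells per dimension: s for one-hot, ceil(log2 s) for a binary-coded scheme."""
    return num_states, (num_states - 1).bit_length()


# DNA example with 4 states (A, C, G, T mapped to 0..3; the mapping is an assumption).
query = [0, 1, 2, 3]
stored = [0, 2, 2, 1]
d_multi = mshd(query, stored)
d_onehot = binary_hd(one_hot(query, 4), one_hot(stored, 4))
print(d_multi, d_onehot, cells_per_dimension(4), cells_per_dimension(20))
```

In this sketch each mismatching dimension flips exactly two one-hot bits, so the binary distance is twice the MSHD, while the cell count per dimension grows linearly with $s$ for one-hot encoding and only logarithmically for a binary-coded representation.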
Authors:
Weiwei Wu;Fengbin Tu;Mengqi Niu;Zhiheng Yue;Leibo Liu;Shaojun Wei;Xiangyu Li;Yang Hu;Shouyi Yin;
Pages: 2370 - 2383 Abstract: Skeleton-based human action recognition (HAR) has drawn increasing attention recently. As an emerging approach for skeleton-based HAR tasks, Spatial-Temporal Graph Convolution Network (STGCN) achieves remarkable performance by fully exploiting the skeleton topology information via graph convolution. Unfortunately, existing GCN accelerators lose efficiency when processing STGCN models due to two limitations. (1) At the dataflow level, the hardware parallelism of GCN accelerators cannot match the computation parallelism of STGCN models, leading to computing resource under-utilization. (2) At the computation level, GCN accelerators fail to exploit the inherent temporal redundancy in STGCN models. To overcome the limitations, this paper proposes STAR, an STGCN architecture for skeleton-based human action recognition. STAR is designed based on the characteristics of different computation phases in STGCN. For limitation (1), a spatial-temporal dimension consistent (STDC) dataflow is proposed to fully exploit the data reuse opportunities in all the different dimensions of STGCN. For limitation (2), we propose a node-wise exponent sharing scheme and a temporal-structured redundancy elimination mechanism, to exploit the inherent temporal redundancy specially introduced by STGCN. To further address the under-utilization induced by redundancy elimination, we design a dynamic data scheduler to manage the feature data storage and schedule the features and weights for valid computation in real time. STAR achieves $4.48\times$, $5.98\times$, $2.54\times$, and $103.88\times$ energy savings on average over the HyGCN, AWB-GCN, TPU, and Jetson TX2 GPU, respectively. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Siyu Zhang;Wendong Mao;Zhongfeng Wang;
Pages: 2384 - 2397 Abstract: Deformable convolutional networks (DCNs) have shown outstanding potential in video super-resolution with their powerful inter-frame feature alignment. However, deploying DCNs on resource-limited devices is challenging, due to their high computational complexity and irregular memory accesses. In this work, an algorithm-hardware co-optimization framework is proposed to accelerate the DCNs on field-programmable gate array (FPGA). Firstly, at the algorithm level, an anchor-based lightweight deformable network (ALDNet) is proposed to extract spatio-temporal information from the aligned features, boosting the visual effects with low model complexity. Secondly, to reduce intensive multiplications, an innovative shift-based deformable 3D convolution is developed using low-cost bit shifts and additions, maintaining comparable reconstruction quality. Thirdly, at the hardware level, a dedicated critical processing core, together with a block-level interleaving storage scheme, is presented to avoid dynamic and irregular memory accesses caused by the deformable convolutions. Finally, an overall architecture is designed to accelerate the ALDNet and implemented on an Intel Stratix 10GX platform. Experimental results demonstrate that the proposed design can provide significantly better visual perception than other FPGA-based super-resolution implementations. Meanwhile, compared with the prior hardware accelerators, our design can achieve $2.75\times$ and $1.63\times$ improvements in terms of throughput and energy efficiency, respectively. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Wenjun Tang;Mingyen Lee;Juejian Wu;Yixin Xu;Yao Yu;Yongpan Liu;Kai Ni;Yu Wang;Huazhong Yang;Vijaykrishnan Narayanan;Xueqing Li;
Pages: 2398 - 2411 Abstract: Bitwise logic-in-memory (BLiM) is a promising approach to efficient computing in data-intensive applications by reducing data movement between memory and processing units. However, existing BLiM techniques face challenges in achieving higher energy efficiency and speed: (i) DC power in computing and result sensing is significant in most existing RRAM- and MRAM-based BLiM solutions; (ii) before the computation result can be stored back to the same memory array, existing BLiM has to sense the result first, at the cost of extra power and latency due to the sense amplifiers (SAs). Targeting higher energy efficiency and speed, this work proposes a new BLiM approach in 2-transistor/cell (2T/C) and 3T/C topologies based on ferroelectric field-effect transistors (FeFETs), supporting a variety of computing functions. For the first time, this new approach supports SA-free direct write-back, and consumes no static power for computing and sensing with the proposed fully dynamic computing and sensing schemes. Another highlight is that this work further minimizes the dynamic power by (i) reducing the chance of bitline charging activities and (ii) recycling the bitline charge in sensing multi-operand operations. Compared with prior BLiM methods based on nonvolatile memories, evaluation shows 3.0x–100x latency and 1.3x–200x energy improvement for a typical in-memory XOR operation, which further leads to 3.0x–58x and 3.2x–78x savings of latency and energy, respectively, for the application of the advanced encryption standard (AES). PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Bi Wu;Haonan Zhu;Ke Chen;Chenggang Yan;Weiqiang Liu;
Pages: 2412 - 2424 Abstract: Conventional computing architectures based on the von Neumann structure are suffering from the severe ‘memory wall’ issue due to the isolation and speed mismatch between memory and processor. As a promising solution, the concept of logic in-memory (LiM) has been proposed to effectively reduce the overhead of data migration and has been extensively studied in various memory technologies such as SRAM, DRAM, MRAM, ReRAM, etc. Among them, SOT-MRAM, which combines the advantages of non-volatility, low static power consumption, ultra-fast read/write speed, and high density, has emerged as one of the most promising candidates for low-power LiM implementations. In this paper, four in-memory logic operations, AND, OR, MAJ and full-addition (FA), are proposed based on the Unipolar Switching (US) SOT-MRAM devices. Incorporating the emerging switching behavior of SOT-MRAM, these operations can be performed with the basic memory access operations (read/write) with negligible modification of the peripheral circuits. Meanwhile, by optimizing the operation steps, the performance degradation caused by the instability of the SOT-MRAM device can be minimized in the proposed LiM architecture. Detailed simulation results show that the proposed design can reduce the latency (energy) of AND, OR operations at least by 71.2%, 74.4% (30.0%, 35.4%) compared with the existing SRAM and STT-MRAM designs. For MAJ and FA operations, the performance is improved by at least 34.7% and 44.8% compared to the existing design. The robustness of our design is demonstrated by the 100% pass rate of 1000-sample Monte Carlo simulations for the sufficient switching current margin and the effectiveness of basic operations. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Mahidhar Puligunta;Hayssam El-Razouk;
Pages: 2425 - 2438 Abstract: Cryptography primitives have a prominent role in securing applications that may require low-area realizations, for example portable devices and other resource-constrained devices. A given system may require support for different cryptography-based protocols/primitives. Many standardized and/or published primitives rely on arithmetic operations over $GF\left(2^{m}\right)$ that occupy a major area footprint. Therefore, versatile operators have been of interest to reduce the area penalty, in particular bit-serial multipliers. This paper introduces a novel scheme for versatile multiplication by the normal element in the Gaussian Normal Basis (GNB), leading to new low-area versatile GNB multiplier and inverter architectures that are presented for the first time, as far as we know. Specifically, the proposed inverters are the first versatile GNB inverters in the open literature, to the best of our knowledge. Field Programmable Gate Array (FPGA) implementation results demonstrate that the proposed versatile multiplication and inversion techniques save almost 30% and 46%, and for Application Specific Integrated Circuit (ASIC) implementation the savings are up to 29% and 35%, respectively, in terms of area when compared to other counterparts. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Zhuoying Zhao;Ziling Tan;Pinghui Mo;Xiaonan Wang;Dan Zhao;Xin Zhang;Ming Tao;Jie Liu;
Pages: 2439 - 2449 Abstract: This paper proposes a special-purpose system to achieve high-accuracy and high-efficiency machine learning (ML) molecular dynamics (MD) calculations. The system consists of a field programmable gate array (FPGA) and an application specific integrated circuit (ASIC) working in heterogeneous parallelization. To be specific, a multiplication-less neural network (NN) is deployed on the non-von Neumann (NvN)-based ASIC (SilTerra 180 nm process) to evaluate atomic forces, which is the most computationally expensive part of MD. All other calculations of MD are done using the FPGA (Xilinx XC7Z100). It is shown that, to achieve similar-level accuracy, the proposed NvN-based system based on low-end fabrication technologies (180 nm) is $1.6\times$ faster and $10^{2}$-$10^{3}\times$ more energy efficient than state-of-the-art vN-based MLMD using graphics processing units (GPUs) based on much more advanced technologies (12 nm), indicating the superiority of the proposed NvN-based heterogeneous parallel architecture. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Gianmarco Ottavi;Angelo Garofalo;Giuseppe Tagliavini;Francesco Conti;Alfio Di Mauro;Luca Benini;Davide Rossi;
Pages: 2450 - 2463 Abstract: Computationally intensive algorithms such as Deep Neural Networks (DNNs) are becoming killer applications for edge devices. Porting heavily data-parallel algorithms on resource-constrained and battery-powered devices while retaining the flexibility granted by instruction processor-based architectures poses several challenges related to memory footprint, computational throughput, and energy efficiency. Low-bitwidth and mixed-precision arithmetic have been proven to be valid strategies for tackling these problems. We present Dustin, a fully programmable compute cluster integrating 16 RISC-V cores capable of 2- to 32-bit arithmetic and all possible mixed-precision combinations. In addition to a conventional Multiple-Instruction Multiple-Data (MIMD) processing paradigm, Dustin introduces a Vector Lockstep Execution Mode (VLEM) to minimize power consumption in highly data-parallel kernels. In VLEM, a single leader core fetches instructions and broadcasts them to the 15 follower cores. Clock gating Instruction Fetch (IF) stages and private caches of the follower cores leads to 38% power reduction. The cluster, implemented in 65 nm CMOS technology, achieves a peak performance of 58 GOPS and a peak efficiency of 1.15 TOPS/W. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Maxime Schramme;David Bol;
Pages: 2464 - 2477 Abstract: Sensitivity to process, voltage, and temperature (PVT) variations constitutes a serious obstacle in ultralow-voltage/ultralow-power (ULV/ULP) circuits and systems. To address this challenge, we propose a unified frequency/back-bias regulation (UFBBR) macro embedded in a custom ULP ARM Cortex-M4 microcontroller unit (MCU) manufactured in 28-nm FDSOI technology. The UFBBR technique combines the generation of a 32-to-80 MHz system clock and asymmetric adaptive back biasing for PVT compensation. Relying on a novel dual-output frequency-locked loop, it senses both the logic speed and the N/PMOS process imbalance using back-bias-controlled oscillators, and generates adequate forward back-bias voltages with digitally-controlled oscillators followed by switched-capacitor charge-pumps for fast current actuation. Compared to a situation with zero back biasing and appropriate frequency/voltage margins, the UFBBR provides $15\times$ of frequency boosting at 0.4 V or 180 mV of voltage reduction at 64 MHz. It leverages software-programmable configuration knobs to achieve a fast wake-up of $8~\mu\text{s}$ and an in-lock power of $22~\mu\text{W}$, with an area overhead below 0.032 mm$^{2}$. In sleep, it drives the back-bias voltages towards 0 V and disables the clock for minimum power consumption. These features help the MCU system achieve a minimum energy point of $5.5~\mu\text{W}$/MHz and a sleep power of $7.7~\mu\text{W}$. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Guodao Zhang;Yisu Ge;Haojie Xu;Abdulilah Mohammad Mayet;Yanjie Lu;Mingtao Ye;Ehsan Nazemi;
Pages: 2478 - 2486 Abstract: Real-time and large-scale simulation of biological systems is interesting and challenging because of the nonlinear mathematical models needed to describe the interactions of biological blocks. Hardware circuit design of these basic blocks of the Central Nervous System (CNS) is therefore an important step toward a high-performance neuromorphic system emulator. This paper presents a high-speed, low-cost, and efficient digital circuit for emulating a plausible calcium-dynamics-based astrocyte model that exhibits spontaneous oscillations. The costly nonlinear functions of the complex astrocyte model are reformulated as low-cost power-of-2-based terms using an optimized exhaustive search algorithm. Subsequently, the proposed model is simulated to validate it and the new optimized functions. Finally, the proposed model is physically realized in hardware on a Virtex 4 FPGA platform to test and validate the final circuits. FPGA implementation results confirm the ability of the design to emulate biological cell behaviours in detail with high accuracy. The proposed hardware consumes at most 2% of the resources of a Virtex 4 board. Additionally, timing analysis and the synthesis report show that the proposed model operates at a high frequency of 371.56 MHz. Moreover, to validate the implementation results, the proposed model is compared with the original model and other similar works in terms of accuracy, speed-up, and maximum number of implemented astrocytes. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Donghyuk Kim;Sanghyun Jeong;Joo-Young Kim;
Pages: 2487 - 2496 Abstract: We propose a software/hardware co-design framework called Agamotto for the complete design automation and performance optimization of the row stationary-based CNN accelerator. We design a scalable accelerator template whose critical design parameters can be configured. Based on the hardware template, Agamotto estimates the performance of the numerous possible hardware implementations for the target FPGA device and CNN model using the latency modeling tool. It chooses the best hardware design and generates the instructions and optimal runtime variables for each target CNN layer. As a result, Agamotto can generate the best hardware design within 61.67 seconds, achieving up to 2.8x higher hardware utilization than the original accelerator. In addition, experimental results show that the performance estimation is accurate, showing only 4.8% difference against the FPGA runtime for the end-to-end CNN model execution. The accelerator implemented on the Xilinx VCU118 evaluation board achieves 402 giga operations per second (GOPS) at 200 MHz, resulting in 13 frames per second (FPS) for the end-to-end execution of VGG-16. It is flexible enough to run more complex CNN models such as ResNet-50 and DarkNet-53, achieving 29.3 FPS and 16.9 FPS, respectively. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Michael Wasef;Nader Rafla;
Pages: 2497 - 2510 Abstract: Recurrent neural networks (RNNs) are used extensively in time series data applications. Modern RNNs consist of three layer types: recurrent, Fully-Connected (FC), and attention. This paper introduces the design, acceleration, implementation, and verification of a complete reconfigurable RNN using a system-on-chip approach on an FPGA. This design is suitable for small-scale projects and Internet of Things (IoT) end devices as the design utilizes a small number of hardware resources compared to previous configurable architectures. The proposed reconfigurable architecture consists of three layers. The first layer is a Python software layer that contains a function serving as the architecture’s user interface. The output of the Python function is three binary files containing the RNN architecture description and trained parameters. The embedded software layer implemented on an on-chip ARM microcontroller is the second layer of that architecture. This layer reads the first layer output files and configures the hardware layer with the required configuration and parameters to execute each layer in the RNN. The hardware layer consists of two Intellectual Properties (IPs) with different configurations. The Recurrent Layer Hardware IP implements the recurrent layer using either Long Short Term Memory (LSTM) or Gated Recurrent Unit (GRU) as basic building blocks, while the ATTENTION/FC IP implements the attention layer and the FC layer. The proposed design allows the implementation of a recurrent layer on an FPGA with variable input and a hidden vector length of up to 100 elements for each vector. It also supports implementing an attention layer with up to 64 input vectors and a maximum vector length of 100 items. The FC layers can be configured to support a maximum value of 256 for the input vector length and the number of neurons in each layer. The hardware design of the recurrent layer achieves a maximum performance of 1.958 and 2.479 GOPS for the GRU and LSTM models, respectively. The maximum performance of the attention and FC layers is 2.641 GOPS and 634.3 MOPS, respectively. The hardware design works at a maximum frequency of 100 MHz. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Pengbo Liu;Xingyuan Wang;Yining Su;Huipeng Liu;Salahuddin Unar;
Pages: 2511 - 2522 Abstract: This article proposes a new infinite parameter range spatiotemporal chaotic system named improved sinusoidal dynamic non-adjacent coupled mapping lattice (ISDNCML). The system performance test shows that ISDNCML has more random lattice interaction, better chaos and excellent cryptography. Based on the characteristics of ISDNCML, the globally coupled private image encryption algorithm is proposed. The algorithm first identifies the private area of an image and then couples the private and non-private areas for encryption. While coupling encryption, ill-conditioned dynamic diffusion is introduced into the private area, which avoids the repeated encryption of the private area on the premise of ensuring the security of private information. The security and practicability of this proposed cryptographic system are verified by the analysis of ISDNCML and various tests of encryption effect. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Huihong Shi;Haoran You;Zhongfeng Wang;Yingyan Lin;
Pages: 2523 - 2536 Abstract: Multiplication is arguably the most computation-intensive operation in modern deep neural networks (DNNs), limiting their extensive deployment on resource-constrained devices. Thereby, pioneering works have handcrafted multiplication-free DNNs, which are hardware-efficient but generally inferior to their multiplication-based counterparts in task accuracy, calling for multiplication-reduced hybrid DNNs to marry the best of both worlds. To this end, we propose a Neural Architecture Search and Acceleration (NASA) framework for the above hybrid models, dubbed NASA+, to boost both task accuracy and hardware efficiency. Specifically, NASA+ augments the state-of-the-art (SOTA) search space with multiplication-free operators to construct hybrid ones, and then adopts a novel progressive pretraining strategy to enable the effective search. Furthermore, NASA+ develops a chunk-based accelerator with novel reconfigurable processing elements to better support searched hybrid models, and integrates an auto-mapper to search for optimal dataflows. Experimental results and ablation studies consistently validate the effectiveness of our NASA+ algorithm-hardware co-design framework, e.g., we can achieve up to 65.1% lower energy-delay-product with comparable accuracy over the SOTA multiplication-based system on CIFAR100. Codes are available at https://github.com/GATECH-EIC/NASA. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Yingqing Pei;Ye Tao;Haibo Gu;Jinhu Lü;
Pages: 2537 - 2549 Abstract: The problem of seeking Nash equilibrium (NE) based on aggregative games under quantization constraints is full of challenges. Although the NE seeking algorithm in continuous-time systems has been studied, this problem in discrete-time systems still needs to be solved urgently. To address this problem, three distributed algorithms are first proposed under three quantization cases, adaptive, random, and time-varying quantizations, based on doubly stochastic communication topology networks. Then, it is proved that the actions of players eventually converge to the NE under the conditions of vanishing step size and strong monotonicity. Moreover, the convergence rates of the three quantization cases are analyzed. Finally, numerical experiments are implemented on plug-in hybrid electric vehicles (PHEVs) to validate the effectiveness of the proposed distributed algorithms. A comparison of the convergence rates of the three proposed algorithms shows that adaptive quantization converges better than the other two quantization cases. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Deyuan Liu;Hao Liu;Jinhu Lü;Frank L. Lewis;
Pages: 2550 - 2560 Abstract: This paper investigates the optimal formation control of a heterogeneous multiagent system consisting of multiple quadrotors and ground vehicles via reinforcement learning to achieve the time-varying formation under switching topologies. A distributed observer is firstly constructed to generate references using local information for each vehicle to form time-varying formation and the convergence of the observer under switching topologies is proven. Then, reinforcement learning methods are provided for the heterogeneous vehicle group to realize the optimal tracking control without information of vehicle dynamical model. Simulation tests are given to confirm the effectiveness of the proposed method. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Zepeng Ning;Xunyuan Yin;Yang Shi;
Pages: 2561 - 2572 Abstract: This paper proposes a novel quantized control strategy for network-based linear systems subject to multi-input-multi-output (MIMO) quantization. A logarithmic quantization scheme is adopted for characterizing the quantization effect on system dynamics. A sufficient and necessary condition on the asymptotic stability is established for quantized MIMO systems. To improve the numerical testability of the obtained results, a polytopic approach approximating the MIMO quantization uncertainties is developed. By constructing a novel Lyapunov function that has dependence on the MIMO quantization uncertainties, asymptotic stability criteria are established for closed-loop quantized MIMO systems. The conditions on the existence of state-feedback controllers that guarantee the closed-loop stability are derived based on the proposed technique that decouples the controller gains and the parameters of MIMO quantization uncertainties. The proposed method and the associated theoretical results are extended to the disturbance attenuation case. Finally, the theoretical results are applied to a benchmark example and a converter circuit to illustrate their efficacy and superiority. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Adeel Arif;Hesheng Wang;Herman Castañeda;Yong Wang;
Pages: 2573 - 2586 Abstract: This paper presents a finite-time visual servoing control strategy for the autonomous landing of a quadrotor onto a tilting and moving landing vehicle. The proposed method, called Finite-Time Dynamic Visual Servo (FTDVS) control, utilizes a computer vision technique called Virtual Reticle Image Plane (VRIP) to track four observable features on the landing plane called landmarks. To represent the sea state, target-plane tilting motion is modelled using monochromatic sinusoidal waves. VRIP exploits the sinusoidal tilting pose of the landing plane and calculates the time period of the tilting motion. Based on this time period, the proposed FTDVS control strategy enables the quadrotor to search for suitable landing windows on a tilting target plane. The proposed finite-time controller converges the tracking errors to zero in finite-time during a landing maneuver. First, a rigorous stability analysis of the FTDVS control law is presented, followed by simulations demonstrating acceptable performance in terms of tracking accuracy when compared to existing controllers for autonomous landing on moving targets. The proposed method has been experimentally validated in an indoor environment. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Hao Li;Changchun Hua;Kuo Li;Qidong Li;
Pages: 2587 - 2598 Abstract: In order to improve the efficiency of data transmission and save communication resources, the problems of double event-triggered control are investigated for a class of high-order nonlinear random systems. Under more general system conditions, in addition to overcoming the difficulty of recursive design caused by signal discontinuity, the effects of high-order nonlinearity and random disturbances also need to be addressed. Based on the adding power integral technique, a practical finite-time stable result is established for the nonlinear random systems under a double event-triggered mechanism (ETM), and it is proved that there is no Zeno phenomenon. Compared with the existing results, the update frequency of the signals is effectively reduced, and the upper bound of the stable error is independent of trigger parameters and thus can be made sufficiently small by tuning design parameters. Furthermore, the result is extended to finite-time stabilization, in which the state variables converge to the origin in finite time. Finally, numerical simulations verify the effectiveness of the proposed algorithm. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Lin Chen;Yilin Wang;Jing Zhao;Shihong Ding;Jinwu Gao;Hong Chen;
Pages: 2599 - 2611 Abstract: To achieve rapid and high-precision servo control of an electronic throttle, an adaptive control scheme is proposed based on the extremum seeking (ES), which consists of a variable-gain adaptive proportional-integral (ES-API) controller and an adaptive compensator (ES-ACP). The two gains ( ${K_{p}}$ , ${K_{i}}$ ) of the ES-API controller are designed as maps with respect to the tracking error, and the parameters of these maps are learned by ES. Additionally, the ES-ACP is applied to compensate for the strong nonlinearity inherent in an electronic throttle control (ETC) system, whose parameters are also learned by ES. During parameter learning, an objective function is utilized to quantify the tracking error of the opening angle of the electronic throttle plate, and then the parameters are learned using a step reference signal and a ramp reference signal. ES optimizes the above parameters by reducing the objective function to achieve a more favorable tracking response. Five reference signals are used to evaluate the learned controller after the parameter learning process is completed. Experiments were performed on a test bench equipped with an electronic throttle, and the experimental results show that the control scheme is capable of tracking multiple reference trajectories quickly and accurately. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Yi Wu;Kaixue Ma;
Pages: 2612 - 2624 Abstract: Multimode resonators (MMRs) can be utilized to build multiband bandpass filters (MBPFs). Each individual mode serves as the operating mode for one of the passbands in a well-known coupling topology. However, several difficulties are involved with the classical design method. In this article, we present a novel star-like topology to design MBPFs based on MMRs. Unlike in the classical use, the MMRs in the proposed topology are regarded as bandstop resonators and provide multiple controllable transmission zeros. To explain the proposed concept and the implementation method, two bandstop MMRs, a classical dual-mode stub loaded resonator (DSLR) and a new tri-mode dual-stub loaded resonator (TDSR), are investigated with detailed design formulas and diagrams. Then, a tri-band MBPF with equal individual sub-bands, a quad-band MBPF, and two quint-band MBPFs with unequal individual sub-bands are synthesized and designed using the proposed bandstop MMRs. Additionally, simulations and experiments are used to verify the efficiency of all prototypes. Good agreement with the theoretical calculations is observed for all the MBPFs. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)
Authors:
Xiaoxuan Ji;Peng Zhao;Haoyu Wang;Hengzhao Yang;Minfan Fu;
Pages: 2625 - 2634 Abstract: Due to compatibility considerations, it is not attractive for a commercial wireless charger to modify the Qi-standard coils for charging multiple loads. This paper explores the potential of a power relay module (PX) to address this issue. The multiple-coil PX enhances the effective coupling when the standard coupler fails. In this paper, different types of PXs are developed for various applications, including a two-coil PX using series compensation, a two-coil PX using high-order compensation, and a three-coil PX using high-order compensation. Their power and efficiency characteristics are analyzed in a uniform manner, and the benefits of different PXs are justified through a planar charger and a bowl-shape charger in the experiment. The implemented bowl-shape charger is able to offer one fast-charging channel for a single device (30 W) and one multiple-load channel for at most four devices (each 10 W) simultaneously. The peak efficiency is 88% when the overall delivered power of the PX is 70 W. PubDate:
June 2023
Issue No:Vol. 70, No. 6 (2023)