Abstract: Honglan Jiang, Cong Liu, Leibo Liu, Fabrizio Lombardi, Jie Han

Often the most important arithmetic modules in a processor, adders, multipliers, and dividers determine the performance and energy efficiency of many computing tasks. The demand for higher speed and power efficiency, together with the error resilience of many applications (e.g., multimedia, recognition, and data analytics), has driven the development of approximate arithmetic design. In this article, a review and classification are presented for the current designs of approximate arithmetic circuits including adders, multipliers, and dividers. A comprehensive and comparative evaluation of their error and circuit characteristics is performed for understanding the features of various designs. By using approximate multipliers and adders, the circuit for an image processing application consumes as little as 47% of the power and 36% of the power-delay product of an accurate design while achieving similar image processing quality. PubDate: Fri, 11 Aug 2017 00:00:00 GMT
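To make the idea of trading accuracy for hardware cost concrete, the following is a minimal behavioral sketch of one well-known approximate adder, the lower-part OR adder (LOA). It is an illustration only and is not taken from the article; the function names and the default `width` and `k` values are assumptions chosen for the example.

```python
def loa_add(a: int, b: int, width: int = 8, k: int = 4) -> int:
    """Behavioral model of a lower-part OR adder (illustrative sketch).

    The upper (width - k) bits are added exactly; the lower k bits are
    approximated by a bitwise OR. The carry into the upper part is the
    AND of the most significant bits of the two lower parts.
    """
    mask_all = (1 << width) - 1
    a &= mask_all
    b &= mask_all
    if k == 0:
        return a + b  # no approximate part: exact addition
    mask_low = (1 << k) - 1
    low = (a | b) & mask_low
    carry = (a >> (k - 1)) & (b >> (k - 1)) & 1
    high = (a >> k) + (b >> k) + carry
    return (high << k) | low


def mean_error_distance(width: int = 8, k: int = 4) -> float:
    """Average |approximate - exact| over all input pairs (exhaustive)."""
    total = 0
    n = 1 << width
    for a in range(n):
        for b in range(n):
            total += abs(loa_add(a, b, width, k) - (a + b))
    return total / (n * n)


if __name__ == "__main__":
    print(loa_add(0x35, 0x42))   # lower parts share no set bits, so the sum is exact
    print(loa_add(15, 1))        # lower-part carry chain is dropped
    print(mean_error_distance(width=6, k=3))
```

Increasing `k` shortens the exact carry chain (lower delay and power in hardware) at the cost of a larger mean error distance, which is the kind of trade-off the review evaluates.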

Abstract: Hui Li, Sébastien Le Beux, Martha Johanna Sepulveda, Ian O'connor

Single-layer optical crossbar interconnections based on Wavelength Division Multiplexing stand out among other nanophotonic interconnects because of their low latency and low power. However, such architectures suffer from poor scalability due to losses induced by long propagation distances on waveguides and by waveguide crossings. Multi-layer deposited silicon technology allows the stacking of optical layers that are connected by means of Optical Vertical Couplers. This significantly reduces optical losses, which improves interconnect scalability but also leads to new challenges related to network designs and layouts. In this article, we investigate the design of optical crossbars using multi-layer deposited silicon technology. PubDate: Fri, 11 Aug 2017 00:00:00 GMT

Monolithic three-dimensional (M3D) integration is gaining momentum, as it has the potential to achieve significantly higher device density compared to 3D integration based on through-silicon vias. M3D integration uses several techniques that are not used in the fabrication of conventional integrated circuits (ICs). Therefore, a detailed analysis of the M3D fabrication process is required to understand the impact of defects that are likely to occur during chip fabrication. In this article, we first analyze electrostatic coupling in M3D ICs, which arises due to the aggressive scaling of the interlayer dielectric (ILD) thickness. We then analyze defects that arise due to voids created during wafer bonding, a key step in most M3D fabrication processes. PubDate: Tue, 11 Jul 2017 00:00:00 GMT

In this article, we present a comprehensive study of four frequency locking mechanisms in Spin Torque Nano Oscillators (STNOs) and explore their suitability for a class of specialized computing applications. We implemented a physical STNO model based on the Landau-Lifshitz-Gilbert-Slonczewski equation and benchmarked the model against experimental data. Based on our simulations, we provide an in-depth analysis of how the “self-organizing” ability of a coupled STNO array can be effectively used for computations that are unsuitable or inefficient in the von Neumann computing domain. As a case study, we demonstrate the computing ability of coupled STNOs with two applications: edge detection of an image and associative computing for image recognition. PubDate: Tue, 11 Jul 2017 00:00:00 GMT

Abstract: Mrityunjay Ghosh, Amlan Chakrabarti, Niraj K. Jha

Quantum computing is a new computational paradigm that promises an exponential speed-up over classical algorithms. To develop efficient quantum algorithms for problems of a non-deterministic nature, random walk is one of the most successful concepts employed. In this article, we target both continuous-time and discrete-time random walk in both the classical and quantum regimes. Binary Welded Tree (BWT), or glued tree, is one of the most well-known quantum walk algorithms in the continuous-time domain. Prior work implements quantum walk on the BWT with static welding. In this context, static welding is randomized but case-specific. We propose a solution to automatically generate the circuit for the Oracle for welding. PubDate: Thu, 29 Jun 2017 00:00:00 GMT
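As a rough illustration of the graph this abstract refers to, the sketch below builds a binary welded tree with a randomized weld and runs a classical random walk from one root to the other. It is not the article's oracle-generation method; the node naming, the heap-style indexing, and the helper names are all assumptions made for the example.

```python
import random


def build_welded_tree(depth: int, seed: int = 0) -> dict:
    """Return an adjacency map for two binary trees of the given depth
    (depth >= 1) whose leaves are joined by a random alternating cycle."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    rng = random.Random(seed)
    adj = {}

    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Heap-style indexing: node i has children 2i and 2i + 1.
    for side in ("L", "R"):
        for i in range(1, 2 ** depth):
            add_edge((side, i), (side, 2 * i))
            add_edge((side, i), (side, 2 * i + 1))

    leaf_ids = range(2 ** depth, 2 ** (depth + 1))
    left = [("L", i) for i in leaf_ids]
    right = [("R", i) for i in leaf_ids]
    rng.shuffle(left)
    rng.shuffle(right)

    # Weld: a cycle that alternates between left and right leaves, so
    # every leaf gets exactly two neighbours in the opposite tree.
    n = len(left)
    for j in range(n):
        add_edge(left[j], right[j])
        add_edge(right[j], left[(j + 1) % n])

    return {u: sorted(vs) for u, vs in adj.items()}


def classical_hitting_time(adj, start, target, seed=0, max_steps=10 ** 6):
    """Steps a classical random walk needs to reach target, or None if
    the step budget runs out."""
    rng = random.Random(seed)
    node, steps = start, 0
    while node != target:
        if steps >= max_steps:
            return None
        node = rng.choice(adj[node])
        steps += 1
    return steps


if __name__ == "__main__":
    tree = build_welded_tree(depth=3)
    print(len(tree), "nodes")
    print(classical_hitting_time(tree, ("L", 1), ("R", 1)), "steps")
```

The classical walk tends to get lost in the welded middle of the graph as the depth grows, which is the behaviour that quantum walk algorithms on this structure are designed to avoid.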

Abstract: M. Hassan Najafi, Peng Li, David J. Lilja, Weikang Qian, Kia Bazargan, Marc Riedel

Computations based on stochastic bit streams have several advantages compared to deterministic binary radix computations, including low power consumption, low hardware cost, high fault tolerance, and skew tolerance. To take advantage of this computing technique, previous work proposed a combinational logic-based reconfigurable architecture to perform complex arithmetic operations on stochastic streams of bits. The long execution time and the cost of converting between binary and stochastic representations, however, make the stochastic architectures less energy efficient than the deterministic binary implementations. This article introduces a methodology for synthesizing a given target function stochastically using finite-state machines (FSMs), and enhances and extends the reconfigurable architecture using sequential logic. PubDate: Thu, 29 Jun 2017 00:00:00 GMT
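For readers unfamiliar with stochastic bit streams, here is a small software sketch of two textbook constructions: multiplication with a single AND gate (unipolar encoding) and a saturating-counter FSM whose output roughly follows a tanh-shaped curve (bipolar encoding). These are classic examples from the stochastic computing literature, not the synthesis methodology this article introduces; the function names, stream lengths, and state count are assumptions.

```python
import random


def to_stream(p: float, length: int, rng: random.Random) -> list:
    """Bit stream in which each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def stream_value(bits: list) -> float:
    """Unipolar value of a stream: the fraction of 1s."""
    return sum(bits) / len(bits)


def sc_multiply(x: float, y: float, length: int = 4096, seed: int = 0) -> float:
    """Multiply two values in [0, 1] by AND-ing independent streams."""
    rng = random.Random(seed)
    sx = to_stream(x, length, rng)
    sy = to_stream(y, length, rng)
    return stream_value([a & b for a, b in zip(sx, sy)])


def fsm_tanh(x: float, states: int = 8, length: int = 20000, seed: int = 0) -> float:
    """Saturating up/down counter driven by a bipolar stream.

    Bipolar encoding maps x in [-1, 1] to P(bit = 1) = (x + 1) / 2.
    The output bit is 1 while the counter sits in its upper half.
    """
    rng = random.Random(seed)
    state = states // 2
    ones = 0
    for bit in to_stream((x + 1) / 2, length, rng):
        state = min(state + 1, states - 1) if bit else max(state - 1, 0)
        ones += 1 if state >= states // 2 else 0
    return 2 * ones / length - 1  # back to the bipolar range


if __name__ == "__main__":
    print(sc_multiply(0.5, 0.4))  # close to 0.2, with sampling noise
    print(fsm_tanh(0.5))          # saturates towards +1
```

The accuracy of both functions improves only slowly with stream length, which reflects the long execution time the abstract identifies as the main cost of the approach.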

Abstract: Sai Vineel Reddy Chittamuru, Srinivas Desai, Sudeep Pasricha

On-chip communication is widely considered to be one of the major performance bottlenecks in contemporary chip multiprocessors (CMPs). With recent advances in silicon nanophotonics, photonics-based network-on-chip (NoC) architectures are being considered as a viable solution to support communication in future CMPs as they can enable higher bandwidth and lower power dissipation compared to traditional electrical NoCs. In this article, we present SwiftNoC, a novel reconfigurable silicon-photonic NoC architecture that features improved multicast-enabled channel sharing, as well as dynamic re-prioritization and exchange of bandwidth between clusters of cores running multiple applications, to increase channel utilization and system performance. PubDate: Thu, 29 Jun 2017 00:00:00 GMT

Near-threshold computing (NTC) circuits have been shown to offer significant energy efficiency and power benefits but with a huge performance penalty. This performance loss is exacerbated when process and voltage variations are considered. In this article, we demonstrate that three-dimensional (3D) IC technology can overcome this limitation. We present a detailed case study with a 28nm commercial-grade core at 0.6V operation optimized with various 3D IC physical design methods. First, our study under the deterministic case shows that 3D IC NTC design outperforms 2D IC NTC by 29.5% in terms of performance at comparable energy. This is significantly higher than the 12.8% performance benefit of 3D IC at nominal supply voltages, because delay is more sensitive to input slew at lower voltages. PubDate: Thu, 29 Jun 2017 00:00:00 GMT

Content Addressable Memory (CAM) is widely used in applications where searching for a specific pattern of data is a major operation. Conventional CAMs suffer from area, power, and speed limitations. We propose Spin-Torque Transfer RAM-based Ternary CAM (TCAM) cells. The proposed NOR-type TCAM cell has a 62.5% (33%) reduction in the number of transistors compared to conventional CMOS TCAMs (spintronic TCAMs). We analyzed the sense margin of the proposed TCAM with respect to 16-, 32-, 64-, 128-, and 256-bit word sizes in 22nm predictive technology. Simulations indicated a reliable sense margin of 50mV even at a 0.7V supply voltage for a 256-bit word. We also explored a selective threshold voltage modulation of transistors to improve the sense margin and tolerate process and voltage variations. PubDate: Sun, 21 May 2017 00:00:00 GMT
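The search operation that a TCAM performs in hardware can be modeled in a few lines of software. The sketch below shows only the logical behavior (parallel match of a key against stored words that may contain don't-care bits); it says nothing about the proposed cell design, and the table contents and function name are made up for the example.

```python
def tcam_search(table: list, key: str) -> list:
    """Return the indices of all stored words that match the key.

    Each stored word is a string over '0', '1', and 'X', where 'X' is a
    don't-care bit that matches either key value. A real TCAM compares
    every word in parallel; this loop is only a functional model.
    """
    matches = []
    for index, word in enumerate(table):
        if len(word) != len(key):
            continue  # words of a different width cannot match
        if all(w == "X" or w == k for w, k in zip(word, key)):
            matches.append(index)
    return matches


if __name__ == "__main__":
    table = ["10X1", "1XXX", "0000"]  # hypothetical stored patterns
    print(tcam_search(table, "1011"))
    print(tcam_search(table, "1100"))
    print(tcam_search(table, "0000"))
```

In a NOR-type cell array, any mismatching bit discharges the word's match line, so the sense margin discussed in the abstract determines how reliably a match can be distinguished from a single-bit mismatch.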

In this article, we propose using optical networks-on-chip (NoCs) to design cache access protocols for large shared L2 caches. We observe that the problem is unique because optical networks have very low latency, and in principle all of the cache banks are very close to each other. A naive approach is to broadcast a request to a set of banks that might possibly contain the copy of a block. However, this approach is wasteful in terms of energy and bandwidth. Hence, we propose a set of novel schemes that create a set of virtual networks (overlays) of cache banks over a physical optical NoC. PubDate: Sun, 21 May 2017 00:00:00 GMT

Abstract: Mahboobeh Houshmand, Mehdi Sedighi, Morteza Saheb Zamani, Kourosh Marjoei

One-way quantum computation (1WQC) is a model of universal quantum computations in which a specific highly entangled state called a cluster state allows for quantum computation by single-qubit measurements. The needed computations in this model are organized as measurement patterns. The traditional approach to obtain a measurement pattern is by translating a quantum circuit that solely consists of CZ and J(α) gates into the corresponding measurement patterns and then performing some optimizations by using techniques proposed for the 1WQC model. However, in these cases, the input of the problem is a quantum circuit, not an arbitrary unitary matrix. Therefore, in this article, we focus on the first phase—that is, decomposing a unitary matrix into CZ and J(α) gates. PubDate: Sun, 21 May 2017 00:00:00 GMT
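The gate set named in this abstract can be written down explicitly. The sketch below assumes the usual measurement-calculus definition J(α) = (1/√2)·[[1, e^{iα}], [1, −e^{iα}]], which the abstract itself does not spell out, and checks two small facts: J(0) is the Hadamard gate, and J(0)·J(α) is a phase gate. The helper names are illustrative, and this is not the decomposition algorithm proposed in the article.

```python
import cmath
import math


def J(alpha: float) -> list:
    """J(alpha) under the assumed definition (Hadamard times a phase gate)."""
    s = 1 / math.sqrt(2)
    e = cmath.exp(1j * alpha)
    return [[s, s * e], [s, -s * e]]


# Controlled-Z on two qubits in the basis |00>, |01>, |10>, |11>.
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]


def matmul(a: list, b: list) -> list:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a: list) -> list:
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def is_close(a: list, b: list, tol: float = 1e-9) -> bool:
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))


def is_unitary(a: list) -> bool:
    n = len(a)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return is_close(matmul(dagger(a), a), identity)


if __name__ == "__main__":
    alpha = 0.7
    phase = [[1, 0], [0, cmath.exp(1j * alpha)]]
    print(is_unitary(J(alpha)), is_unitary(CZ))
    print(is_close(matmul(J(0), J(alpha)), phase))
```

Identities of this kind are the building blocks for rewriting single-qubit rotations in terms of J(α); an arbitrary unitary additionally needs CZ gates to entangle qubits.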