IBM Journal of Research and Development
Journal Prestige (SJR): 0.275
Citation Impact (citeScore): 1
Number of Followers: 17  
Hybrid journal (it can contain Open Access articles)
ISSN (Print) 0018-8646
Published by IBM
  • Message from the Senior Vice President, IBM Cognitive Systems
    • Authors: Bob Picciano
      Pages: 1 - 2
      PubDate: July-Sept. 1 2018
      Issue No: Vol. 62, No. 4/5 (2018)
  • Preface: IBM POWER9 Technology
    • Authors: Pratap Pattnaik
      Pages: 1 - 2
      PubDate: July-Sept. 1 2018
      Issue No: Vol. 62, No. 4/5 (2018)
  • IBM POWER9 processor and system features for computing in the cognitive
    • Authors: L. B. Arimilli; B. Blaner; B. C. Drerup; C. F. Marino; D. E. Williams; E. N. Lais; F. A. Campisano; G. L. Guthrie; M. S. Floyd; R. B. Leavens; S. M. Willenborg; R. Kalla; B. Abali
      Pages: 1:1 - 1:11
      Abstract: IBM POWER9 is a family of processor chips designed to serve a diverse set of workloads. New features have been added to POWER9 to address emerging workloads such as cognitive and artificial intelligence applications. POWER9 also further enhances features introduced in IBM POWER8 for big data and cloud applications. Distinct chips using common intellectual property building blocks are provided to enable enterprise applications requiring large symmetric multiprocessor servers with large memory footprints, as well as one to two socket industry form-factor servers. In this paper, we describe new POWER9 features for both system types. Several highly differentiated new features are described in other papers in this issue of the IBM Journal, and they provide a more in-depth description of their unique design aspects.
      PubDate: July-Sept. 1 2018
      Issue No: Vol. 62, No. 4/5 (2018)
  • IBM POWER9 processor core
    • Authors: H. Q. Le; J. A. Van Norstrand; B. W. Thompto; J. E. Moreira; D. Q. Nguyen; D. Hrusecky; M. J. Genden; M. Kroener
      Pages: 2:1 - 2:12
      Abstract: The IBM POWER9 processor is the latest Reduced Instruction Set Computer microprocessor from IBM. POWER9 employs a new modular core microarchitecture to counter the technology trend of decreasing frequency and increasing power density from generation to generation. The new POWER9 design enables a family of processors optimized for a broad range of server applications. The new microarchitecture is closely coupled with a rich set of new instructions geared toward data-centric applications. In this paper, we describe the POWER9 core microarchitecture innovations, its new instructions and features, and the exploitation of this new design for computing in the cognitive era.
      PubDate: July-Sept. 1 2018
      Issue No: Vol. 62, No. 4/5 (2018)
  • IBM POWER9 memory architectures for optimized systems
    • Authors: W. J. Starke; J. S. Dodson; J. Stuecheli; E. Retter; B. W. Michael; S. J. Powell; J. A. Marcella
      Pages: 3:1 - 3:13
      Abstract: The IBM POWER9 processor chipset provides a variety of system memory architecture interfaces to enable highly differentiated system offerings: a high bandwidth, high capacity, highly reliable, buffered architecture; a compute-density-optimized direct DDR attach architecture; heterogeneous integration of graphics processing unit memory into the host system memory; and an agnostic, flexibly attached SCM architecture. In this paper, we explore these architectures and the targeted optimizations they provide for various classes of workloads. We also explore the development synergies and semiconductor physical design tradeoffs associated with the varying implementations, and finally, we describe several hypothetical systems that could be constructed by utilizing these memory architectures.
      PubDate: July-Sept. 1 2018
      Issue No: Vol. 62, No. 4/5 (2018)
  • IBM POWER9 circuit design and energy optimization for 14-nm technology
    • Authors: E. J. Fluhr; R. M. Rao; H. Smith; A. Buyuktosunoglu; R. Bertran
      Pages: 4:1 - 4:11
      Abstract: Modern multicore microprocessors require attentive development to energy requirements when maximizing power-performance efficiency and ensuring reliable plus scalable functionality. IBM POWER9 relies on extensive modeling to identify representative workloads used when analyzing thermal design power and regulator design power against product requirements. Compounding benefits of circuit optimizations applied to the diverse subcomponents of the chip results in lower power cores, caches, and memory/IO interconnect. Specific dc- and ac-current analyses ensure proper definition of chip specifications for system voltage and current delivery. Finally, a systematic exploration of microbenchmarks on intermediate and final POWER9 hardware provides insight into processor core requirements while validating model accuracy.
      PubDate: July-Sept. 1 2018
      Issue No: Vol. 62, No. 4/5 (2018)
  • XIVE: External interrupt virtualization for the cloud infrastructure
    • Authors: F. Auernhammer; R. L. Arndt
      Pages: 5:1 - 5:10
      Abstract: In today's microprocessors, external interrupts are the only processor resource that has no full virtualization hardware support. Therefore, they create significant overhead in the form of virtual machine (VM) context switches, especially in cloud and hyperscale datacenter systems, in which the physical processors are shared by a large number of virtual processors allocated to the VMs running on the system. The IBM POWER9 processor closes this gap in the processor hardware virtualization support by introducing an eXternal Interrupt Virtualization Engine (XIVE) architecture. XIVE defines a holistic interrupt delivery mechanism that supports multiple layers of interrupt coalescing and hardware-based routing of interrupts to the correct physical thread and target level. As required, individual interrupts can thus be routed directly to a user process, to a specific supervisor or an operating system, or to the hypervisor, thereby eliminating the need for interrupt rerouting in software and minimizing the number of context switches. In addition, XIVE provides the means to automatically trigger escalation interrupts to the next higher privilege level in case the target of an interrupt is not dispatched. In this paper, we discuss the fundamental concepts of the XIVE architecture and provide an overview of the XIVE-based interrupt controller implementation of the IBM POWER9 processor.
      PubDate: July-Sept. 1 2018
      Issue No: Vol. 62, No. 4/5 (2018)
  • IBM POWER9 system software
    • Authors: J. Jann; P. Mackerras; J. Ludden; M. Gschwind; W. Ouren; S. Jacobs; B. F. Veale; D. Edelsohn
      Pages: 6:1 - 6:10
      Abstract: The IBM POWER9 architecture offers a substantial set of novel and performance-improvement features that are made available to both scale-up and scale-out applications via system software. These features provide significant performance improvements for cognitive, cloud, and virtualization workloads, many of which use dynamic scripting languages. In this paper, we describe some of the key features.
      PubDate: July-Sept. 1 2018
      Issue No: Vol. 62, No. 4/5 (2018)
  • IBM POWER9 systems designed for commercial, cognitive, and cloud
    • Authors: N. S. Nett; R. X. Arroyo; T. Nguyen; B. W. Mashak; R. M. Zgabay; H. Nguyen; C. W. Mann; E. J. Hauptli; S. P. Mroz; W. J. Anderl
      Pages: 7:1 - 7:13
      Abstract: A new era of computing has emerged that focuses on actionable insights and predictive analytics with machine learning and deep learning algorithms. This is referred to as the cognitive era of computing. Servers designed for cognitive computing require a much different architecture than a traditional commercial server designed for database transactional processing and process automation. For example, graphics processing unit acceleration and high-bandwidth I/O for scalability are some of the key requirements for cognitive computing. Another different set of requirements is driven by servers designed for cloud infrastructure. The requirements for a cloud server place an emphasis on the total cost of ownership, total cost of acquisition, as well as compute density and server management. In this paper, we describe the family of IBM POWER9 servers that have been designed to meet the differing requirements for the cognitive, commercial, and cloud market spaces. We describe how each server in the family has been optimized for one (or more) of these workloads by implementing different combinations of POWER9 module packages, memory subsystems, internal storage subsystems, system management, and different levels of reliability, accessibility, and serviceability.
      PubDate: July-Sept. 1 2018
      Issue No: Vol. 62, No. 4/5 (2018)
  • IBM POWER9 opens up a new era of acceleration enablement: OpenCAPI
    • Authors: J. Stuecheli; W. J. Starke; J. D. Irish; L. B. Arimilli; D. Dreps; B. Blaner; C. Wollbrink; B. Allison
      Pages: 8:1 - 8:8
      Abstract: Open Coherent Accelerator Processor Interface (OpenCAPI) is a new industry-standard device interface that enables the development of host-agnostic devices that can coherently connect to any host platform that supports the OpenCAPI standard. This in turn allows such devices to coherently cache host memory to facilitate accelerator execution, perform direct memory access and atomics to host memory, send messages and interrupts to the host, and act as a host memory home agent. OpenCAPI utilizes high-frequency differential signaling technology while providing the high bandwidth and low latency needed by advanced accelerators. OpenCAPI encapsulates the serializing cache access and address translation constructs in high-speed host silicon technology to minimize overhead and design complexity in attached silicon such as field-programmable gate arrays and application-specific integrated circuits. Finally, OpenCAPI architecturally ties together transaction layer, link layer, and physical layer attributes to optimally align to high serializer/deserializer (SerDes) ratios and enable high-bandwidth, highly parallel exploitation of attached silicon.
      PubDate: July-Sept. 1 2018
      Issue No: Vol. 62, No. 4/5 (2018)
  • Functionality and performance of NVLink with IBM POWER9 processors
    • Authors: IBM POWER9 NPU team
      Pages: 9:1 - 9:10
      Abstract: Heterogeneous computer systems with multiple types of processing elements (PEs) are becoming a popular design to optimize performance and efficiency for a wide variety of applications. Each part of an application can be executed on the PE for which it is best suited. In heterogeneous systems, communication, efficient data movement, and memory sharing across PEs are critical to execute an application across the different PEs while incurring minimal overhead for communication and synchronization. The IBM POWER9 processor supports the NVIDIA NVLink interface, a high-performance interconnect with many such capabilities. In the IBM Power System AC922, IBM POWER9 processors directly connect to multiple NVIDIA GPUs using NVLink. In this paper, we highlight the important functional and performance capabilities of NVLink with the POWER9 processor. These include high bandwidth, hardware cache coherence, fine-grained data movement, and hardware support for atomic operations across all PEs of a compute node. We also present an analysis of how these performance and functional capabilities of POWER9 processors and NVLink are expected to have significant impacts on performance and programmability across a variety of important applications, such as machine learning and domains within high-performance computing.
      PubDate: July-Sept. 1 2018
      Issue No: Vol. 62, No. 4/5 (2018)
  • IBM POWER9 and cognitive computing
    • Authors: M. Kumar; W. P. Horn; J. Kepner; J. E. Moreira; P. Pattnaik
      Pages: 10:1 - 10:12
      Abstract: Cognitive applications are complex and are composed of multiple components exhibiting diverse workload behavior. Efficient execution of these applications requires systems that can effectively handle this diversity. In this paper, we show that IBM POWER9™ shared memory systems have the compute capacity and memory throughput to efficiently handle the broad spectrum of computing requirements for cognitive workloads. We first review the GraphBLAS interface defined for supporting cognitive applications, particularly whole-graph analytics. We show that this application-programming interface effectively separates the concerns between the analytics application developer and the system developer and simultaneously enables good performance by permitting system developers to make platform-specific optimizations. A linear algebra formulation and execution of betweenness centrality kernel in the High-Performance Computing Scalable Graph Analysis Benchmark, for 256 million vertices and 2 billion edges graphs, delivers a sixfold reduction in execution time over a reference implementation. Following that, we present the results of benchmarking the forward propagation step of deep neural networks (DNNs) written in GraphBLAS and executed on POWER9. We present the rationale and evidence for weight matrices of large DNNs being sparse and show that for sparse weight matrices, GraphBLAS/POWER® has a two orders-of-magnitude performance advantage over dense implementations. Applications requiring analysis of graphs larger than several tens of billion vertices require distributed computing environments such as Apache Spark to provide resilience and parallelism. We show that when linear algebra techniques are implemented in an Apache Spark environment, we are able to leverage the parallelism available in POWER9 Servers.
      PubDate: July-Sept. 1 2018
      Issue No: Vol. 62, No. 4/5 (2018)
  • Addressing verification challenges of heterogeneous systems based on IBM
    • Authors: K.-D. Schubert; S. S. Abrar; D. Averill; E. Bauman; A. C. Brown; R. Cash; D. Chatterjee; J. Gullickson; M. Nelson; K. A. Pasnik; K. Sugavanam
      Pages: 11:1 - 11:12
      Abstract: In this paper, we describe methods and techniques used to verify the IBM POWER9 microprocessor in the context of heterogeneous and open system structures. The base concepts for the functional verification are those that have been already used in IBM POWER7 and IBM POWER8 verification. However, the POWER9 design point provided new features to connect to accelerator chips for cognitive or other use cases. These features required innovative verification solutions. The examples given in this paper demonstrate how a combination of new tools and new forms of collaboration addressed these verification challenges.
      PubDate: July-Sept. 1 2018
      Issue No: Vol. 62, No. 4/5 (2018)
  • IBM POWER9 package technology and design
    • Authors: S. Chun; W. D. Becker; J. Casey; S. Ostrander; D. Dreps; J. A. Hejase; R. M. Nett; B. Beaman; J. R. Eagle
      Pages: 12:1 - 12:10
      Abstract: The first-level package that contains the IBM POWER9 processor chip is designed to achieve the high computational performance needed for cognitive systems in a cost-effective design. The throughput data bandwidth of the POWER9 package for high-end scale-up systems is more than 1 TB/s, which is double the data bandwidth of the previous generation. This increase in bandwidth is achieved by introducing a dielectric with a loss tangent of 40% of the predecessor material, a C4 density increase of 15%, higher number of stacked vias to reduce jogging, and improved via pattern and placement to increase the frequency and density of signals. The cloud platform scale-out POWER9 package leverages the high-end and cognitive platform package attributes to maintain signal frequency while introducing novel chip-package-system co-design techniques. These design techniques were used to produce a well-balanced two-socket entry-level package with four build-up layers above and below the core, instead of six, resulting in a significant cost reduction from the previous generation while supporting the signal frequencies of POWER9. POWER9 systems are the first to offer 16-Gb/s PCIe Gen4 and 25.8-Gb/s open coherent accelerator processor interface that interconnect the processor to the I/O, networking, and accelerators required for systems in the cognitive computing era. In this paper, we present the material and wiring technology needed to achieve the signal performance up to 25.8 Gb/s per channel, the package physical attributes, and the chip-package-system co-design methodology to achieve the increased signal density, minimize the crosstalk, and maximize the frequency while reusing the package form factors of the previous generation, IBM POWER8.
      PubDate: July-Sept. 1 2018
      Issue No: Vol. 62, No. 4/5 (2018)