HC30 (2018)

Held at The Flint Center for the Performing Arts, Cupertino, California, Sunday-Tuesday, August 19-21, 2018.

Zipfile of All HC30 files (161MB)

Tutorial 1: Blockchains and Distributed Ledgers

Tutorial 2: Architectures for Accelerating Deep Neural Nets

Session 1: Mobile/Power-Efficient Processors

Session 2: Graphics Solutions

Keynote 1: Spectre/Meltdown and What It Means for Future Chip Design

Session 3: IoT/Edge Computing

Session 4: Security

Session 5: Switching Fabrics and FPGA Architectures

Session 6: New Technologies

Keynote 2: Adaptable Intelligence: the Next Computing Era

Session 7: Machine Learning I

Session 8: Machine Learning II

Session 9: Server Processors

  • Sunday 8/19: Tutorials
    • 8:00 AM – 9:15 AM: Breakfast
    • 9:15 AM – 12:45 PM: Tutorial 1: Blockchains and Distributed Ledgers (Chair: Geoffrey Burr)
    • 12:45 PM – 2:00 PM: Lunch
    • 2:00 PM – 5:00 PM: Tutorial 2: Architectures for Accelerating Deep Neural Nets (Chair: Kurt Keutzer)
    • 5:00 PM – 6:00 PM: Reception
  • Monday 8/20: Conference Day 1
    • 7:30 AM – 9:15 AM: Breakfast
    • 9:15 AM – 9:30 AM: Opening Remarks
    • 9:30 AM – 11:00 AM: Mobile/Power-Efficient Processors (Chair: Alisa Scherer)
    • 11:00 AM – 11:30 AM: Break
    • 11:30 AM – 12:30 PM: Graphics Solutions (Chair: Ralph Wittig)
    • 12:30 PM – 1:45 PM: Lunch
    • 1:45 PM – 3:30 PM: Keynote: Spectre/Meltdown and What It Means for Future Chip Design (Chair: Partha Ranganathan)
    • 3:30 PM – 4:00 PM: Break
    • 4:00 PM – 5:30 PM: IoT/Edge Computing (Chair: Bill Dally)
    • 5:30 PM – 6:30 PM: Security (Chair: Alan Smith)
    • 6:30 PM – 7:30 PM: Reception (Wine & Snacks)
  • Tuesday 8/21: Conference Day 2
    • 7:45 AM – 8:45 AM: Breakfast
    • 8:45 AM – 10:15 AM: Switching Fabrics and FPGA Architectures (Chair: Pradeep Dubey)
    • 10:15 AM – 10:45 AM: Break
    • 10:45 AM – 11:45 AM: New Technologies (Chair: Forest Baskett)
    • 11:45 AM – 12:45 PM: Keynote 2: Adaptable Intelligence: the Next Computing Era (Chair: John Hennessy)
    • 12:45 PM – 2:00 PM: Lunch
    • 2:00 PM – 3:30 PM: Machine Learning I (Chair: Gary Lauterbach)
    • 3:30 PM – 4:30 PM: Machine Learning II (Chair: Ian Bratt)
    • 4:30 PM – 5:00 PM: Break
    • 5:00 PM – 7:00 PM: Server Processors (Chair: Dileep Bhandarkar)
    • 7:00 PM – 7:15 PM: Closing Remarks

Tutorials

Sun 8/19 Tutorial Title Presenter Affiliation
8:00 AM Breakfast
9:15 AM T1 A New Era in Distributed Computing with Blockchains and Databases Dr. C. Mohan, IBM Fellow, IBM Research–Almaden
Abstract:

A new era is emerging in the world of distributed computing with the growing popularity of blockchains (shared, replicated and distributed ledgers) and the associated databases as a way of integrating inter-organizational work. Originally, the concept of a distributed ledger was invented as the underlying technology of the cryptocurrency Bitcoin. But its adoption and further adaptation for use in commercial or permissioned environments is what is of utmost interest to me and hence will be the focus of this tutorial. Computer companies like IBM and Microsoft, and many key players in different vertical industry segments, have recognized the applicability of blockchains in environments other than cryptocurrencies. IBM did some pioneering work by architecting and implementing Fabric, and then open-sourcing it. Now Fabric is being enhanced via the Hyperledger Consortium as part of The Linux Foundation. A few of the other efforts include Enterprise Ethereum, R3 Corda and BigchainDB.

While there is no standard in the blockchain space currently, all the ongoing efforts involve some combination of database, transaction, encryption, consensus and other distributed systems technologies. Some of the application areas in which blockchain pilots are being carried out are: smart contracts, supply chain management, know your customer, derivatives processing and provenance management. In this talk, I will survey some of the ongoing blockchain projects with respect to their architectures in general and their approaches to some specific technical areas. I will focus on how the functionality of traditional and modern data stores is being utilized, or not, in the different blockchain projects. I will also distinguish how traditional distributed database management systems have handled replication from how blockchain systems do it. Since most of the blockchain efforts are still in a nascent state, the time is right for database and other distributed systems researchers and practitioners to get more deeply involved to focus on the numerous open problems.

11:15 AM Break
11:45 AM T1 Introduction to XRP and the XRP Ledger Yana Novikova, Senior Product Manager, Ripple
Abstract:

The XRP Ledger is a decentralized cryptographic ledger powered by a network of peer-to-peer servers. The XRP Ledger is the home of XRP, a decentralized digital asset built for payments and a source of liquidity that bridges different currencies worldwide.

What sets the XRP Ledger apart is its unique consensus algorithm that does not require the time and energy of “mining,” the way other such systems do. Instead of “proof of work” or even “proof of stake,” the XRP Ledger’s consensus algorithm uses a system where every participant has an overlapping set of “trusted validators,” and those trusted validators efficiently agree on which transactions happen in what order. As of early 2018, the Bitcoin network uses more electricity per transaction than a US family home uses in an entire day, and confirming a transaction takes hours. In comparison, a single XRP transaction uses a negligible amount of electricity and takes only 4 to 5 seconds to confirm.

Abstractly, the XRP Ledger is a replicated state machine. The replicated state is the ledger maintained by each node in the network (both validator and tracking) and state transitions correspond to transactions submitted by clients of the network. Once validator nodes agree on sets of transactions to apply to the state, the XRP Ledger protocol specifies deterministic rules for ordering transactions within each set and how to apply transactions to generate the new ledger state. Thus, the role of the XRP Ledger Consensus Protocol (LCP) is only to make the network reach agreement on sets of transactions even in the presence of faulty or malicious participants (Byzantine), guaranteeing that every node generates a consistent ledger.

While several consensus algorithms exist for the Byzantine Generals Problem, specifically as it pertains to distributed payment systems, many suffer from high latency induced by the requirement that all nodes within the network communicate synchronously. The XRP Ledger consensus algorithm circumvents this requirement by utilizing collectively trusted subnetworks within the larger network, where the “trust” required of these subnetworks is in fact minimal and can be further reduced with a principled choice of member nodes. In addition, minimal connectivity is required to maintain agreement throughout the whole network. The result is a low-latency consensus algorithm that still maintains robustness in the face of Byzantine failures.
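The abstract's core ideas — overlapping sets of trusted validators agreeing on a transaction set, followed by deterministic ordering so every node derives the same ledger — can be illustrated with a toy sketch. This is not the actual XRP Ledger protocol: the 80% quorum threshold, the hash-based ordering rule, and all names below are illustrative assumptions.

```python
# Toy sketch of XRP-style agreement: each node trusts an overlapping
# "Unique Node List" (UNL) of validators; a transaction enters the
# agreed set once a supermajority (here 80%) of the UNL proposes it;
# the agreed set is then ordered by a deterministic rule so every node
# applies the same transactions in the same order.
import hashlib

QUORUM = 0.8  # illustrative supermajority threshold, not a protocol constant

def consensus_set(unl, proposals):
    """Transactions proposed by >= QUORUM of this node's trusted validators."""
    counts = {}
    for validator in unl:
        for tx in proposals.get(validator, set()):
            counts[tx] = counts.get(tx, 0) + 1
    needed = QUORUM * len(unl)
    return {tx for tx, n in counts.items() if n >= needed}

def canonical_order(txs):
    """Deterministic ordering (by hash) so all nodes build identical ledgers."""
    return sorted(txs, key=lambda tx: hashlib.sha256(tx.encode()).hexdigest())

# Five hypothetical validators; v3 missed one transaction, but 4 of 5
# proposed it, which still clears the 80% quorum.
proposals = {
    "v1": {"pay:alice->bob", "pay:bob->carol"},
    "v2": {"pay:alice->bob", "pay:bob->carol"},
    "v3": {"pay:alice->bob"},
    "v4": {"pay:alice->bob", "pay:bob->carol"},
    "v5": {"pay:alice->bob", "pay:bob->carol"},
}
agreed = consensus_set(["v1", "v2", "v3", "v4", "v5"], proposals)
print(canonical_order(agreed))
```

The consensus step only fixes the *set* of transactions; the deterministic ordering rule then removes any need for nodes to agree on sequence separately, which is exactly the division of labor the abstract describes.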

12:45 PM Lunch
2:00 PM T2 Architectures for Accelerating Deep Neural Nets
Abstract:
In the first portion of this tutorial, we provide a very brief introduction to Deep Neural Nets and their applications in computer vision, speech recognition, and other areas. We review the two key computational elements of Deep Neural Nets, inference and training, with regard to their compute and memory requirements. Finally, we review popular target architectures for supporting these applications, including CPUs, GPUs, and custom DNN accelerators, with a discussion of common micro-architectures for accelerating typical computational patterns and of computational considerations around batch sizes, quantization, and pruning.

In the second portion of this tutorial we turn our focus to the problem of accelerating inference in edge devices. These devices range from autonomous vehicles, through mobile phones, to very low-power IoT devices. We consider both the real-time speed requirements of these applications and their power and energy constraints. We consider effective design principles for reducing the computational requirements of DNNs and useful techniques for quantization/compression of DNN computations. We then go into depth on accelerator architectures for meeting these constraints.

In the last portion of this tutorial we consider the problem of training Deep Neural Nets, particularly in the cloud. We briefly examine the ubiquitous synchronous stochastic gradient descent and its asynchronous variants. We look at the problem of scaling DNN training on distributed multiprocessors and its attendant problems of increasing batch size and balancing computation and communication. We then broadly survey the architectures that support training, including special-purpose accelerators.
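Quantization, one of the cost-reduction techniques the abstract lists, can be sketched in a few lines. This is a minimal illustration of symmetric per-tensor linear quantization, not any particular accelerator's scheme; the weight values and 8-bit width are made up for the example.

```python
# Minimal sketch of symmetric linear quantization for DNN inference:
# float weights are mapped to signed 8-bit integers with a single
# per-tensor scale factor, and dequantized back to check the error.

def quantize(weights, bits=8):
    """Map floats to signed integers in [-(2**(bits-1)-1), 2**(bits-1)-1]."""
    qmax = 2 ** (bits - 1) - 1              # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integers and the scale."""
    return [x * scale for x in q]

weights = [0.53, -1.27, 0.02, 0.91]         # illustrative weight values
q, scale = quantize(weights)
approx = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, approx))
print(q)                                    # int8 codes for the weights
print(max_err <= scale / 2)                 # rounding error bounded by scale/2
```

Storing and multiplying 8-bit integers instead of 32-bit floats cuts memory traffic and lets accelerators pack many more MAC units into the same silicon area, which is why the tutorial pairs quantization with the accelerator micro-architecture discussion.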
2:00 PM T2 Overview of Deep Learning and Computer Architectures for Accelerating DNNs Michaela Blott Xilinx Research
2:50 PM T2 Accelerating Inference at the Edge Song Han Assistant Professor, MIT
3:40 PM Break
4:10 PM T2 Accelerating Training in the Cloud William L. Lynch, Ardavan Pedram Cerebras Systems
5:00 PM Reception
6:00 PM End of Reception

Keynotes

Mon 1:45 PM Keynote 1: Spectre/Meltdown The era of security: Introduction John Hennessy Stanford/Google
Tue 11:45 AM Keynote 2 Adaptable Intelligence: the Next Computing Era Victor Peng CEO, Xilinx

Conference Day 1

Mon 8/20 Session Title Presenter Affiliation
7:30 AM Breakfast
9:15 AM Intro Opening Remarks by Conference Chairs David Lau (MIPS), John Kubiatowicz (UC Berkeley), Stefan Rusu (TSMC)
9:30 AM Mobile/Power-Efficient Processors Samsung’s Exynos-M3 CPU Jeff Rupley Samsung Electronics
10:00 AM The Pixel Visual Core: Google’s Fully Programmable Image, Vision and AI Processor for Mobile Devices Jason Redgrave, Albert Meixner, Nathan Goulding-Hotta, Artem Vasilyev and Ofer Shacham Google
10:30 AM BROOM: An open-source Out-of-Order processor with resilient low-voltage operation in 28nm CMOS Christopher Celio, Pi-Feng Chiu, Krste Asanovic, David Patterson and Borivoje Nikolic UC Berkeley
11:00 AM Break
11:30 AM Graphics Solutions Intel’s High Performance Graphics solutions in thin and light mobile form factors Srinivas Chennupaty Intel
12:00 PM Delivering a new level of Visual Performance in an SoC – AMD Raven Ridge APU Dan Bouvier, Jim Gibney, Sonu Arora and Alex Branover AMD
12:30 PM Lunch
1:45 PM Keynote 1: Spectre/Meltdown The era of security: Introduction John Hennessy Stanford/Google
Spectre/Meltdown: the Project Zero/Retpoline journey Paul Turner Google
Exploiting modern microarchitectures, implications for SW Jon Masters Red Hat
Exploiting modern microarchitectures, implications for computer architects Mark Hill U Wisconsin-Madison
Panel Q & A
3:30 PM Break
4:00 PM IoT/Edge Computing SMIV: A 16nm SoC with Efficient and Flexible DNN Acceleration for Intelligent IoT Devices Paul Whatmough (ARM), Sae Kyu Lee (IBM), Sam Xi, Udit Gupta, Lillian Pentecost, Marco Donato, Hsea-Ching Hseuh, David Brooks and Gu-Yeon Wei (Harvard University)
4:30 PM Navion: An Energy-Efficient Visual-Inertial Odometry Accelerator for Micro Robotics and Beyond Amr Suleiman, Zhengdong Zhang, Luca Carlone, Sertac Karaman and Vivienne Sze MIT
5:00 PM NVIDIA’s Xavier System-on-Chip Michael Ditty NVIDIA
5:30 PM Security The Hardware Security Platform Behind Azure Sphere Doug Stiles Microsoft
6:00 PM Titan: Google’s Root-of-Trust Security Silicon Dominic Rizzo and Parthasarathy Ranganathan Google
6:30 PM Reception
7:30 PM End of Reception

Conference Day 2

Tue 8/21 Session Title Presenter Affiliation
7:45 AM Breakfast
8:45 AM Switching Fabrics and FPGA Architectures NVSwitch and DGX-2 – NVIDIA’s NVLink-Switching Chip and Scale-Up GPU-Compute Server Alexander Ishii and Denis Foley NVIDIA
9:15 AM Programmable Forwarding Planes at Terabit/s Speeds Patrick Bosshart Barefoot Networks
9:45 AM Xilinx Project Everest: ‘HW/SW Programmable Engine’ Juanjo Noguera, Chris Dick, Vinod Kathail, Gaurav Singh, Kees Vissers, and Ralph Wittig Xilinx
10:15 AM Break
10:45 AM New Technologies Architecture for Carbon Nanotube Based Memory (NRAM) Bill Gervasi Nantero
11:15 AM Analog Computation in Flash Memory for Datacenter-scale AI Inference in a Small Chip David Fick and Michael Henry Mythic
11:45 AM Keynote 2 Adaptable Intelligence: the Next Computing Era Victor Peng CEO, Xilinx
12:45 PM Lunch
2:00 PM Machine Learning I ARM’s First Generation ML Processor Ian Bratt and John Brothers ARM
2:30 PM The NVIDIA Deep Learning Accelerator Frans Sijstermans NVIDIA
3:00 PM Tachyum Cloud Chip for Hyperscale workloads, deep ML, general, symbolic and bio AI Radoslav Danilak, Rodney Mullendore, Igor Shevlyakov and Kenneth Wagner Tachyum
3:30 PM Machine Learning II The Evolution of Deep Learning Accelerators Upon the Evolution of Deep Learning Algorithms Song Yao (DeePhi Tech), Shuang Liang (Tsinghua University), Junbin Wang, Zhongmin Chen, Shaoxia Fang, Lingzhi Sui, Qian Yu, Dongliang Xie, Xiaoming Sun, Song Han (MIT), Yi Shan (DeePhi Tech), and Yu Wang (Tsinghua University)
4:00 PM Xilinx Tensor Processor: An Inference Engine, Network Compiler + Runtime for Xilinx FPGAs Rahul Nimaiyar Xilinx
4:30 PM Break
5:00 PM Server Processors The IBM POWER9 Scale Up Processor Jeffrey Stuecheli IBM
5:30 PM Fujitsu High Performance CPU for the Post-K Computer Toshio Yoshida Fujitsu Limited
6:00 PM Vector Engine Processor of NEC’s Brand-New Supercomputer SX-Aurora TSUBASA Yohei Yamada and Shintaro Momose NEC Corporation
6:30 PM Next Generation Intel Xeon(R) Scalable processor: Cascade Lake Sailesh Kottapalli and Akhilesh Kumar Intel
7:15 PM End of Conference

Posters

Title Authors Affiliation
DragonFly+: FPGA-Based Quad-Camera Visual SLAM System for Autonomous Vehicles Weikang Fang, Yanjun Zhang, Bo Yu and Shaoshan Liu Beijing Institute of Technology; PerceptIn
Ultra Low Latency and High Performance Deep Learning Processor Yang Kong, Jun Xu, Xiuyu Sun and Xulin Yu Alibaba
An Energy-Efficient Unified Deep Neural Network Accelerator with Fully-Variable Weight Precision for Mobile Deep Learning Applications Jinmook Lee, Changhyeon Kim, Sanghoon Kang, Dongjoo Shin, Sangyeob Kim and Hoi-Jun Yoo KAIST
AIX: FPGA geared-up Neural Network Accelerator for NUGU, the 1st Commercial Korean Virtual Personal Assistant (VPA) Minwook Ahn, Seok Joong Hwang, Wonsub Kim, Seungrok Jung, Jeong-Ho Han, Sangjun Yang, Yeonbok Lee, Jazzson Park, Mookyoung Chung, Woohyung Lim and Youngjoon Kim SK Telecom
DnnWeaver v2.0: From Tensors to FPGAs Hardik Sharma, Jongse Park, Balavinayagam Samynathan, Behnam Robatmili, Shahrzad Mirkhani and Hadi Esmaeilzadeh UC San Diego; Georgia Institute of Technology; Bigstream, Inc
Integrated Power Management for High Performance Computing Noah Sturcken Ferric, Inc.
Laconic: A Deep Learning Inference Hardware Accelerator Sayeh Sharify, Mostafa Mahmoud, Alberto Delmas Lascorz, Milos Nikolic and Andreas Moshovos University of Toronto
Bit-Tactical: A SW/HW Approach to Exploiting Value and Bit Sparsity in Neural Networks Alberto Delmas Lascorz, Patrick Judd, Dylan Malone Stuart, Zissis Poulos, Mostafa Mahmoud, Sayeh Sharify, Milos Nikolic and Andreas Moshovos University of Toronto
High-Level Synthesis of Multithreaded Accelerators for Irregular Applications Stefano Devecchi, Nicola Saporetti, Marco Minutoli, Marco Lattuada, Pietro Fezzardi, Vito Giovanni Castellana, Fabrizio Ferrandi and Antonino Tumeo Qualcomm; Politecnico di Milano; Pacific Northwest National Laboratory (PNNL)