Getting Started With HPC

Published April 29, 2021

High-performance computing (HPC) is the ability to process data and perform complex calculations at high speeds. More generally, the term refers to aggregating computing power so that it delivers much higher performance than a typical desktop computer or workstation, in order to solve large problems in science, engineering, or business.

This week's ACM Selects features several introductory resources on HPC, offering perspectives on its history, definitions of key HPC terminology, current trends, and the implications of its use.

As always, we invite you to share your feedback and suggestions at selects-feedback@acm.org. For more resources in computing, we encourage you to explore the ACM Digital Library and Learning Center.

What is HPC?

The articles in this subsection cover the basics of HPC, present some motivating real-world use cases, and introduce supercomputers.

HPC 101

Published in insideHPC online magazine

This article collects links to short articles on high-level HPC concepts, including what a cluster is, HPC architecture for beginners, computing hardware (processors, disks, and more), communication networks, and cluster software. It is a good starting point for anyone who wants a high-level overview of HPC.

HPC Matters, and We Have Video Evidence

Published in Communications of the ACM, November 17, 2014

HPC provides the means to support and advance research in a variety of fields. This suggested reference is a collection of short videos that give a broad overview of fields where HPC has an impact, ranging from advanced science to industrial R&D that affects our everyday lives.

What are Supercomputers and Supercomputing

Published in XRDS: Crossroads, The ACM Magazine for Students, June 2009, Article No. 3

The next reading offers a historical perspective on the evolution of supercomputers, showing how their development has often been at the forefront of computer architecture innovation. This short paper also provides a timeline of significant milestones, each of which helped make HPC what it is today.

Programming Languages in HPC

The articles selected for this section cover the most popular HPC programming models, offer tutorials, and shed light on their comparative benefits and applications.

MPI Tutorial

Authored by Wes Kendall, Dwaraka Nath, and Wesley Bland

As you might have read in the HPC 101 article, MPI (Message Passing Interface) is a fundamental set of APIs that application programmers can use to write parallel programs and run them on a cluster. This tutorial walks through MPI programming with very useful examples, along with introductions and explanations.
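To make the message-passing idea concrete, here is a minimal Python sketch of a pattern many MPI programs follow: rank 0 splits the input and sends chunks to the other ranks, every rank computes on its own chunk, and the partial results are sent back to rank 0 and combined, roughly the job of `MPI_Scatter` and `MPI_Reduce`. It is an analogy, not MPI: each thread stands in for a rank and receives data only through its own queue, and the names `run_ranks`, `send`, and `recv` are hypothetical. Because of CPython's global interpreter lock the threads do not compute in parallel, so the sketch shows the communication structure rather than any speedup; a real program would use an MPI library and be launched with `mpiexec` or `mpirun`.

```python
import queue
import threading


def run_ranks(data, size):
    """Sum `data` using `size` simulated ranks; ranks other than 0 see data only via messages."""
    inboxes = [queue.Queue() for _ in range(size)]  # one mailbox per rank
    result = {}

    def send(dest, payload):
        inboxes[dest].put(payload)

    def recv(rank):
        return inboxes[rank].get()  # blocks until a message arrives

    def rank_main(rank):
        if rank == 0:
            # Rank 0 splits the input and sends one chunk to every other rank.
            chunks = [data[r::size] for r in range(size)]
            for r in range(1, size):
                send(r, chunks[r])
            local = chunks[0]
        else:
            local = recv(rank)
        partial = sum(local)  # purely local work on this rank's own chunk
        if rank == 0:
            # Rank 0 collects one partial sum from each other rank and combines them.
            result["total"] = partial + sum(recv(0) for _ in range(size - 1))
        else:
            send(0, partial)

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]


print(run_ranks(list(range(1, 101)), 4))  # 5050
```

Keeping rank 0's own chunk local, instead of sending a message to itself, means inbox 0 only ever holds partial sums, so no message tags are needed to tell the two kinds of message apart.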

Translating OpenMP Device Constructs to OpenCL Using Unnecessary Data Transfer Elimination

Published in SC '16: Proceedings of the International Conference for High-Performance Computing, Networking, Storage and Analysis, November 2016, Article No. 51, Pages 1–12

HPC goes beyond OpenMP and MPI. This article gives a sense of a few other programming models and languages used in HPC, such as OpenCL, OpenACC, and CUDA. The paper proposes a framework that translates OpenMP 4.0 accelerator directives to OpenCL.
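The paper's title refers to eliminating unnecessary data transfers between host and accelerator. The toy Python model below illustrates that general idea under simplifying assumptions; it is not the SC '16 framework's algorithm or any real OpenMP or OpenCL runtime API, and `TransferTracker`, `launch`, `host_read`, `host_write`, and `naive_transfer_count` are hypothetical names. It tracks whether the host and device copies of each array are up to date, copies data to the device only when the device copy is stale, and copies results back only when host code actually reads them. A kernel is assumed to overwrite every array it writes.

```python
class TransferTracker:
    """Toy host/device coherence model (hypothetical, not a real runtime API)."""

    def __init__(self, names):
        # At start, only the host copy of each array holds valid data.
        self.host_valid = {n: True for n in names}
        self.dev_valid = {n: False for n in names}
        self.transfers = []  # log of ("H2D" or "D2H", array name)

    def launch(self, reads, writes):
        """Model one offloaded kernel that reads `reads` and fully overwrites `writes`."""
        for n in reads:
            if not self.dev_valid[n]:  # copy in only when the device copy is stale
                self.transfers.append(("H2D", n))
                self.dev_valid[n] = True
        for n in writes:
            self.dev_valid[n] = True
            self.host_valid[n] = False  # the host copy is now out of date

    def host_read(self, name):
        """Host code reads `name`: copy back only if the device holds newer data."""
        if not self.host_valid[name]:
            self.transfers.append(("D2H", name))
            self.host_valid[name] = True

    def host_write(self, name):
        """Host code overwrites `name`, so the device copy becomes stale."""
        self.host_valid[name] = True
        self.dev_valid[name] = False


def naive_transfer_count(kernels):
    """Baseline: every kernel copies all inputs in and all outputs back out."""
    return sum(len(reads) + len(writes) for reads, writes in kernels)


kernels = [(["a"], ["b"]), (["b"], ["c"])]  # the second kernel consumes the first one's output
tracker = TransferTracker(["a", "b", "c"])
for reads, writes in kernels:
    tracker.launch(reads, writes)
tracker.host_read("c")
print(tracker.transfers)              # [('H2D', 'a'), ('D2H', 'c')]
print(naive_transfer_count(kernels))  # 4
```

For these two chained kernels, the naive scheme performs four transfers, while the tracker keeps the intermediate array `b` on the device and performs two.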

CUDA vs OpenACC: Performance Case Studies with Kernel Benchmarks and a Memory-Bound CFD Application

Published in CCGRID '13: Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, May 2013, Pages 136–143

This paper is another look at how the choice of HPC programming model affects performance. “OpenACC is a new accelerator programming interface that provides a set of OpenMP-like loop directives for the programming of accelerators in an implicit and portable way. It allows the programmer to express the offloading of data and computations to accelerators, such that the porting process for legacy CPU-based applications can be significantly simplified. This paper focuses on the performance aspects of OpenACC using two micro benchmarks and one real-world computational fluid dynamics application.”

What are the applications?

Published on the ISC 2020 Digital YouTube channel, July 6, 2020

In this video, Rick Stevens of Argonne National Laboratory and the University of Chicago gives an overview of ongoing work in the US that applies HPC and AI to COVID-19 research problems. The first topic is the COVID-19 HPC Consortium, which brings together US supercomputing centers, computing and technology vendors, and federal agencies to provide HPC cycles to the SARS-CoV-2/COVID-19 research community. The second topic is a collaboration among nine DOE laboratories, formed to apply advanced computing to developing molecular therapeutics for COVID-19. The talk briefly covers the science, the state of play, how HPC and AI are being used, and progress toward solutions.

“The convergence of AI and HPC provides the means to address big data challenges in science, engineering and industry, and enables the creation of disruptive approaches for data-driven discovery and innovation. Realizing these goals demands a concerted effort between AI practitioners, HPC and domain experts” (Huerta et al.)

How does HPC relate to ML/Deep Learning?

Convergence of artificial intelligence and high performance computing on NSF-supported cyberinfrastructure

Published in Journal of Big Data, Volume 7, Article No. 88, October 16, 2020

In this survey paper, the authors discuss the convergence of AI and HPC and highlight the benefits of using HPC infrastructure to accelerate distributed model training. The article identifies software and hardware challenges and recommends ways to streamline the use of HPC resources for AI research, including support for containerization, up-to-date documentation, and the availability of distributed training software stacks. It also briefly discusses the tradeoffs between cloud computing and HPC for distributed training.
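A common way to use HPC infrastructure for distributed training is synchronous data parallelism: each worker holds a replica of the model, computes gradients on its own shard of the data, and the gradients are averaged (typically with an allreduce) before every worker applies the same update. The sketch below simulates this sequentially in plain Python for a one-variable linear regression; it is an illustrative assumption rather than a method from the survey, and `data_parallel_gd` and `grad_mse` are hypothetical names. With equal-sized shards, the average of the shard gradients equals the full-batch gradient, so adding workers splits the work without changing the result.

```python
def grad_mse(w, b, xs, ys):
    """Gradient of the mean squared error of y ~ w*x + b on one shard."""
    n = len(xs)
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    gw = sum(2 * r * x for r, x in zip(residuals, xs)) / n
    gb = sum(2 * r for r in residuals) / n
    return gw, gb


def data_parallel_gd(xs, ys, workers, steps=2000, lr=0.05):
    """Synchronous data-parallel gradient descent, simulated sequentially."""
    # Strided sharding; assumes len(xs) is divisible by `workers` so shards are equal.
    shards = [(xs[k::workers], ys[k::workers]) for k in range(workers)]
    w, b = 0.0, 0.0  # every replica starts from the same parameters
    for _ in range(steps):
        # On a cluster, each worker would compute its shard gradient concurrently.
        grads = [grad_mse(w, b, sx, sy) for sx, sy in shards]
        # Stand-in for an allreduce: average gradients so all replicas stay identical.
        gw = sum(g[0] for g in grads) / workers
        gb = sum(g[1] for g in grads) / workers
        w -= lr * gw
        b -= lr * gb
    return w, b


xs = [i / 10 for i in range(40)]
ys = [3 * x + 1 for x in xs]
print(data_parallel_gd(xs, ys, workers=4))  # approximately (3.0, 1.0)
```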

Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis

Published in ACM Computing Surveys, August 2019, Article No. 65

The following survey provides an introduction to the computational and performance aspects of Deep Learning from an HPC perspective. In addition to showing how most aspects of DL implementations already benefit from the pervasive application of HPC techniques, the modelling and analysis in the paper illustrate how to think about performance in a complex environment that has become central to our way of life.
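Much of the communication in data-parallel training is an allreduce of gradients. One well-known algorithm for it is ring allreduce: a reduce-scatter phase followed by an allgather phase, each taking p - 1 steps, so every worker sends about 2(p - 1)N/p elements for an N-element vector, which stays below 2N however many workers there are. The sequential Python simulation below is our own illustration of that kind of performance reasoning, not code from the survey, and `ring_allreduce` is a hypothetical name. It sums one vector per simulated worker and counts the elements each worker sends, so the count can be checked against the 2(p - 1)N/p formula.

```python
def ring_allreduce(vectors):
    """Sum-allreduce one list per simulated worker using the ring algorithm.

    Returns (results, sent): results[i] is worker i's final vector and
    sent[i] is the number of elements worker i transmitted.
    """
    p, n = len(vectors), len(vectors[0])
    buf = [list(v) for v in vectors]  # each worker's private buffer
    bounds = [(k * n // p, (k + 1) * n // p) for k in range(p)]  # chunk k = [lo, hi)
    sent = [0] * p

    def chunk(i, k):
        lo, hi = bounds[k]
        return buf[i][lo:hi]  # slicing copies, so a message is a snapshot

    # Reduce-scatter: after p-1 steps worker i holds the full sum of chunk (i+1) % p.
    for s in range(p - 1):
        msgs = [(i, (i - s) % p, chunk(i, (i - s) % p)) for i in range(p)]
        for i, k, data in msgs:  # all sends in a step happen "at once"
            dst, lo = (i + 1) % p, bounds[k][0]
            for j, x in enumerate(data):
                buf[dst][lo + j] += x
            sent[i] += len(data)

    # Allgather: pass the finished chunks around the ring until everyone has all of them.
    for s in range(p - 1):
        msgs = [(i, (i + 1 - s) % p, chunk(i, (i + 1 - s) % p)) for i in range(p)]
        for i, k, data in msgs:
            lo, hi = bounds[k]
            buf[(i + 1) % p][lo:hi] = data
            sent[i] += len(data)
    return buf, sent


vecs = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
results, sent = ring_allreduce(vecs)
print(results[0])  # [1111, 2222, 3333, 4444]
print(sent)        # [6, 6, 6, 6], i.e. 2 * (p - 1) * N / p with p = 4, N = 4
```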

HPC in the Cloud

HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges

Published in ACM Computing Surveys, January 2018, Article No. 8

HPC is not only about supercomputers: in recent years, substantial progress has been made in running performance-demanding applications in the Cloud. The next reading surveys the various efforts in this field, from the feasibility of doing HPC in the Cloud, to how HPC affects Cloud infrastructure, to how to make the Cloud a better environment for HPC applications.

10 Years Later: Cloud Computing is Closing the Performance Gap

Published in ICPE '21: Companion of the ACM/SPEC International Conference on Performance Engineering, April 2021, Pages 41–48

In this paper, researchers from Lawrence Berkeley National Laboratory set out to answer whether cloud computing can provide competitive performance for scientific applications. The authors run a series of experiments, including hardware and system microbenchmarks and user application proxies, to identify performance gaps, and they demonstrate that today’s cloud HPC systems can deliver competitive performance for computationally intensive, memory-intensive, and communication-intensive scientific applications.

Jesmin Jahan Tithi

Dr. Jesmin Jahan Tithi is an AI Research Scientist at Intel focusing on high-performance computing and software-hardware codesign of next-generation processors targeting large-scale machine learning and graph applications. At Intel, Jesmin contributed to DOE's OCR, ECP PathForward, and CORAL-2 projects, and to DARPA's HIVE and SDH projects. She received her Ph.D. from Stony Brook University, New York (SUNYSB), and interned at Google, Intel, and PNNL during her Ph.D. After completing her B.Sc. in Computer Science and Engineering at the Bangladesh University of Engineering and Technology, she worked as a lecturer in the same prestigious department. Jesmin is a founding member of Z-Inspection, an assessment process for Trustworthy & Ethical AI. She was a member of the ACM Future of Computing Academy (Dec 2019 to June 2021), is an alumna of the Heidelberg Laureate Forum (2019), and is a current member of the ACM Code of Professional Ethics Board and ACM Selects. Jesmin is a regular reviewer for ACM and IEEE conferences and journals. She holds six issued patents and has more than twenty-two peer-reviewed publications.

Pavan Kumar

Pavan Kumar is the Product Manager for Cloud HPC (High Performance Computing) at Google. Previously, Pavan was the Founder and CTO of Cocoon Health and led development of the world's first computer-vision-based vital signs and sleep monitoring products. He was also a co-founder of Chrysalis Cloud, a developer-first platform for efficiently streaming and analyzing massive amounts of data (including video) in real time. Before that, Pavan's career focused on software development at leading companies such as Apple and NetApp. He is a published researcher with applied experience in machine learning, computer vision/AI (algorithms and infrastructure), cloud-native service development, video streaming, and containerization. Pavan holds an MS in Computer Science from UC San Diego. He is an internationally recognized speaker on computer vision and machine learning, and has been a featured presenter at venues including the first-ever White House Demo Day hosted by President Obama, the Embedded Vision Summit, SVIEF in China, and INKtalks in India. Recently, Pavan was the commencement speaker at UC San Diego's 2018 master's graduation ceremony.

Fabio Checconi

Fabio Checconi received the BS and MS degrees in computer engineering from the University of Pisa, Italy, and the PhD degree in computer engineering from Scuola Superiore S. Anna, Pisa, Italy. He is currently a Research Scientist with the Intel Parallel Computing Labs, Santa Clara, California. His research interests include real-time operating systems, parallel graph algorithms and novel architectures for data analytics.

William Magro
