Agenda

Agenda is subject to change. Times listed below are in Pacific Time.

Lesson Materials: to be provided closer to the event date

Tuesday, July 29 - Preparation day (virtual)

Pacific time

Session

9:00 AM – 11:00 AM

Preparation Day - Welcome & Orientation    
Andrea Zonca, Lead of Scientific Computing Applications and Chair of the Summer Institute 
  Accounts, Login, Environment, Running Jobs and Logging into Expanse User Portal
Robert Sinkovits, Director of Education and Training, Emeritus
  Q&A wrap up

Monday, August 4

Pacific time

Main Room Session

8:00 AM – 8:30 AM

Check-in & Registration

8:30 AM - 9:30 AM Welcome & Overview
Andrea Zonca, Lead of Scientific Computing Applications and Chair of the Summer Institute 
9:30 AM - 12:00 PM
(break 10:30-10:45 AM)

Data Management: Data Storage, Data Transfers, File Systems

Marty Kandes, Computational and Data Science Research Specialist
Proper data management is essential for the effective use of advanced CI. This session will cover an overview of file systems, data compression, archives (tar files), checksums and MD5 digests, downloading data using wget and curl, and data transfer and long-term storage solutions.
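As a preview of the checksum material, here is a minimal Python sketch (the function name and chunk size are illustrative, not part of the session materials) that computes the MD5 digest of a file, reading in chunks so large files need not fit in memory:

```python
import hashlib

def md5_digest(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading in fixed-size
    chunks so that large files are never loaded whole into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

The resulting hex string can be compared against the output of the `md5sum` command-line tool to verify a transfer.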

12:00 PM - 1:30 PM Lunch
1:30 PM - 3:15 PM

Running Batch and Interactive Jobs 
Mary Thomas, Computational Data Scientist

3:15 PM - 3:30 PM

Break

3:30 PM  - 4:45 PM

Code Migration & Software Environments

Nicole Wolter, Computational and Data Science Research Specialist
Mahidhar Tatineni, Director of User Services

4:45 PM - 5:15 PM Q&A + Wrap-up
5:15 PM - 6:30 PM

Evening Reception

Tuesday, August 5

Pacific time

Main Room Session

8:00 AM – 8:30 AM

Check-in & Light Breakfast

8:30 AM - 10:30 AM Parallel Computing Concepts
Robert Sinkovits, Director of Education and Training, Emeritus
Advanced cyberinfrastructure users, whether they develop their own software or run third-party applications, should understand fundamental parallel computing concepts. Here we cover supercomputer architectures, the differences between threads and processes, implementations of parallelism (e.g., OpenMP and MPI), strong and weak scaling, limitations on scalability (Amdahl’s and Gustafson’s Laws) and benchmarking. We also discuss how to choose the appropriate number of cores, nodes or GPUs when running your applications and, when appropriate, the best balance between threads and processes. This session does not assume any programming experience.
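The scalability limits mentioned above can be made concrete with a short calculation. The sketch below (function names are our own) encodes Amdahl's law for a fixed problem size and Gustafson's law for a problem that grows with the number of workers:

```python
def amdahl_speedup(parallel_fraction, n):
    """Amdahl's law: speedup on n workers when only a fraction of
    the work can be parallelized (fixed problem size)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)

def gustafson_speedup(parallel_fraction, n):
    """Gustafson's law: scaled speedup when the problem size grows
    with the number of workers (fixed runtime)."""
    return (1.0 - parallel_fraction) + parallel_fraction * n
```

For example, with 95% of the work parallelizable, Amdahl's law caps the speedup below 20x no matter how many cores are added (1/0.05 = 20), while Gustafson's law predicts near-linear scaled speedup, which is why choosing the right core count depends on how the problem scales.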
10:30 AM  - 10:45 AM

Break

10:45 AM - 12:00 PM High Throughput Computing
Marty Kandes, Computational and Data Science Research Specialist
High-throughput computing (HTC) workloads are characterized by large numbers of small jobs. These frequently involve parameter sweeps where the same type of calculation is done repeatedly with different input values or data processing pipelines where an identical set of operations is applied to many files. This session covers the characteristics and potential pitfalls of HTC, job bundling, the Open Science Grid and the resources available through the Partnership to Advance Throughput Computing (PATh).
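A real HTC workload would go through Slurm or the Open Science Grid, which is beyond a short sketch, but the core pattern of many small, independent jobs over a parameter sweep can be illustrated with Python's standard library (the `simulate` function is a hypothetical stand-in for one real job):

```python
from multiprocessing import Pool

def simulate(x):
    """Hypothetical stand-in for one small, independent job in a
    parameter sweep: same calculation, different input value."""
    return x * x

if __name__ == "__main__":
    parameters = range(8)
    # Each parameter is an independent task, so they can run
    # concurrently with no communication between them.
    with Pool(processes=4) as pool:
        results = pool.map(simulate, parameters)
    print(results)  # one result per input parameter
```

Job bundling on a cluster follows the same logic: group many such small tasks into one batch job so the scheduler handles a few large requests instead of thousands of tiny ones.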

12:00 PM - 1:30 PM

Lunch
1:30 PM – 2:15 PM Getting Help 
Nicole Wolter, Computational and Data Science Research Specialist
Reducing the time and effort needed to address problems related to application performance, batch job submission or data management can minimize frustration and enable users to become more productive. In this session we will cover common problems and best practices for resolving issues.
2:15 PM - 4:30 PM
(break 3:15 PM - 3:30 PM)
Parallel Computing using MPI & OpenMP
Mahidhar Tatineni, Director of User Services
This session is targeted at attendees who are looking for a hands-on introduction to parallel computing using MPI and OpenMP programming. The session will start with an introduction and basic information for getting started with MPI. An overview of the common MPI routines that are useful for beginner MPI programmers, including MPI environment setup, point-to-point communications, and collective communications routines, will be provided. Simple examples illustrating distributed memory computing, with the use of common MPI routines, will be covered. The OpenMP section will provide an overview of constructs and directives for specifying parallel regions, work sharing, synchronization and data scope. Simple examples will be used to illustrate the OpenMP shared-memory programming model and important runtime environment variables. Hands-on exercises for both MPI and OpenMP will be done in C and Fortran.
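The session's exercises are in C and Fortran; as a language-neutral warm-up, the point-to-point send/receive pattern at the heart of MPI can be sketched with Python's standard library (this is an analogy to MPI_Send/MPI_Recv using pipes between processes, not MPI itself; all names are illustrative):

```python
from multiprocessing import Process, Pipe

def worker(conn):
    """Receive a message from the parent (cf. MPI_Recv), transform
    it, and send the result back (cf. MPI_Send)."""
    data = conn.recv()
    conn.send([x * 2 for x in data])
    conn.close()

def exchange(data):
    """One point-to-point round trip between two processes."""
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send(data)   # point-to-point send
    reply = parent_conn.recv()  # point-to-point receive
    p.join()
    return reply

if __name__ == "__main__":
    print(exchange([1, 2, 3]))
```

In real MPI the processes would typically run on different nodes and be addressed by rank, but the send/receive pairing is the same idea.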
4:30 PM - 4:45 PM Q&A + Wrap-up

Wednesday, August 6

Pacific time

Main Room Session

8:00 AM – 8:30 AM

Check-in & Light Breakfast

8:30 AM - 9:30 AM

Knowledge Management 

Subhasis Dasgupta, Computational and Data Researcher
This session will help participants understand knowledge management and how to implement it, specifically within the scientific community. It will also highlight the fundamental shift in the machine learning paradigm and how to incorporate knowledge management into daily processes. This section will cover the basic concepts of knowledge management, from ontology development to document management.

9:30 AM - 12:00 PM
(break 10:30-10:45 AM)
Deep Learning - Part 1
Mai Nguyen, Lead for Data Analytics
Paul Rodriguez, Computational Data Scientist

Deep learning, a subfield of machine learning, has seen tremendous growth and success in the past few years. Deep learning approaches have achieved state-of-the-art performance across many domains, including image classification, speech recognition, and biomedical applications. This session provides an introduction to neural networks and deep learning concepts and approaches. Examples utilizing deep learning will be presented, and hands-on exercises will be covered using Keras. Please note: Knowledge of fundamental machine learning concepts and techniques is required.
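As a taste of what a framework like Keras automates, the sketch below (not Keras code; names are our own) computes the forward pass of a single artificial neuron, the basic unit that deep networks stack into layers:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of the inputs plus a
    bias term, passed through a sigmoid activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

A layer applies many such neurons to the same inputs, and training consists of adjusting the weights and biases; Keras handles both, along with the gradient computations, automatically.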

12:00 PM - 1:30 PM

Lunch
1:30 PM - 4:30 PM
(break 3:15 PM - 3:30 PM)
Deep Learning – Part 2
Mai Nguyen, Lead for Data Analytics
Paul Rodriguez, Computational Data Scientist

This session continues and extends Deep Learning - Part 1 by going into more advanced examples. Concepts regarding architecture, layers, and applications will be presented. Additionally, more advanced tutorials and hands-on exercises with larger deep convolutional networks and transfer learning will be executed on GPUs. There will also be a chance to learn Keras more in depth and become familiar with building more flexible models.
4:30 PM - 4:45 PM Q&A + Wrap-up

Thursday, August 7

Pacific time

Main Room Session

8:00 AM – 8:30 AM

Check-in & Light Breakfast

8:30 AM - 9:30 AM Best Practices for Scientific Computing 
Fernando Garzon, Computational and Data Science Research Specialist
9:30 AM - 12:00 PM
(break 10:30-10:45 AM)
Performance Tuning
Robert Sinkovits, Director of Education and Training, Emeritus
This session is targeted at attendees who both do their own code development and need their calculations to finish as quickly as possible. We will cover the effective use of cache, loop-level optimizations, strength reduction, optimizing compilers and their limitations, short-circuiting, time-space tradeoffs and more. Exercises will be done mostly in C, but the emphasis will be on general techniques that can be applied in any language.
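Although the exercises are mostly in C, the flavor of loop-level optimization carries over to any language. A minimal sketch in Python (function names are illustrative): hoisting a loop-invariant expression out of the loop and replacing a repeated division with a single multiplication:

```python
import math

def rms_slow(values):
    """Root mean square, recomputing the loop-invariant division
    by len(values) on every iteration."""
    total = 0.0
    for v in values:
        total += v * v / len(values)  # invariant work inside the loop
    return math.sqrt(total)

def rms_fast(values):
    """Same result: the invariant 1/len(values) is hoisted out of
    the loop and the division becomes one multiplication."""
    inv_n = 1.0 / len(values)
    total = 0.0
    for v in values:
        total += v * v
    return math.sqrt(total * inv_n)
```

An optimizing C compiler would often perform this transformation automatically, which is exactly the kind of capability, and limitation, the session examines.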

12:00 PM - 1:30 PM

Lunch
1:30 PM - 4:00 PM
(break 3:15 PM - 3:30 PM)
GPU Computing and Programming
Andreas Goetz, Research Scientist and Principal Investigator
This session introduces massively parallel computing with graphics processing units (GPUs). The use of GPUs is popular across all scientific domains since GPUs can significantly accelerate time to solution for many computational tasks. Participants will be introduced to the essential background of the GPU chip architecture and will learn how to program GPUs via the use of libraries, OpenACC compiler directives, and CUDA programming. The session will incorporate hands-on exercises for participants to acquire the basic skills to use and develop GPU-aware applications.
4:00 PM - 4:15 PM Q&A + Wrap-up
Group Photo

Friday, August 8

Pacific time

Main Room Session

8:00 AM – 8:30 AM

Check-in & Light Breakfast

8:30 AM – 11:00AM

Python for HPC
Andrea Zonca, Lead of Scientific Computing Applications and Chair of the Summer Institute
In this session we will introduce key technologies in the Python ecosystem that provide significant benefits for scientific applications run in supercomputing environments. Previous Python experience is recommended but not required.


(1) First, we will learn how to speed up Python code by compiling it on the fly with numba. (2) Then we will introduce threads, processes and the Global Interpreter Lock, and leverage first numba and then dask to use all available cores on a machine. (3) Finally, we will distribute computations across multiple nodes by launching dask workers in a separate Expanse job.
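The multi-core step can be previewed without numba or dask installed: Python's standard library already offers process-based parallelism that sidesteps the Global Interpreter Lock (the `heavy` function below is a hypothetical stand-in for real CPU-bound work):

```python
from concurrent.futures import ProcessPoolExecutor
import os

def heavy(x):
    """Hypothetical CPU-bound function, standing in for work that
    numba might compile or dask might schedule across workers."""
    return sum(i * i for i in range(x))

if __name__ == "__main__":
    inputs = [10_000] * 8
    # Threads would serialize on the Global Interpreter Lock for
    # pure-Python CPU work; separate processes occupy all cores.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as ex:
        results = list(ex.map(heavy, inputs))
    print(len(results))
```

dask generalizes this pattern: the same map-over-inputs idea, but with workers that can live on other nodes of the cluster.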

11:00 AM – 11:15 AM Overview of Voyager
Amit Majumdar, Division Director of Data-Enabled Scientific Computing
Voyager provides an innovative system architecture uniquely optimized for deep learning operations using well-established frameworks such as PyTorch and TensorFlow. Voyager comprises 42 training nodes of Supermicro X12 Habana Gaudi Training Servers; each training node contains 8 GAUDI HL-205 training processor cards which have 100 GbE non-blocking, all-to-all connections among the 8 cards within a node; the 42 training nodes are connected via a high-performance, low-latency 400 GbE switch interconnect. Voyager’s architecture has already shown highly scalable AI application performance in various areas such as LLMs (with billions of parameters, such as GPT2-XL and GPT3-XL), convolutional neural network-based image processing, and graph neural network-based high-energy particle physics.
11:15 AM - 11:30 AM

Overview of COSMOS

Mahidhar Tatineni, Director of User Services

11:30 AM - 11:45 AM Overview of the Prototype National Research Platform (PNRP)
Mahidhar Tatineni, Director of User Services
11:45 AM - 12:00 PM

Closing Remarks

Andrea Zonca, Lead of Scientific Computing Applications and Chair of the Summer Institute
Lunch boxes will be provided.