The agenda is subject to change. All times listed below are Pacific Time.

Lesson Material: https://github.com/ciml-org/ciml-summer-institute-2023

 

Tuesday, June 20 – Preparation Day (virtual)

9:00 am - 9:15 am

1.1 Welcome & Orientation

9:15 am – 9:45 am

1.2 Accounts, Login, Environment, Running Jobs, and Logging into the Expanse User Portal
Robert Sinkovits, Director of Education and Training

9:45 am – 10:30 am

Q&A & Wrap-up

 

Tuesday, June 27 – HPC, Parallel Concepts

8:00 am – 8:30 am Light Breakfast & Check-in
8:30 am - 9:30 am

2.1 Welcome and Introductions

Mary Thomas, Computational Data Scientist & Director of the CIML Summer Institute

9:30 am - 9:40 am Break

9:40 am - 11:00 am

2.2 Parallel Computing Concepts
Robert Sinkovits, Director of Education and Training 
We will cover supercomputer architectures, the differences between threads and processes, implementations of parallelism (e.g., OpenMP and MPI), strong and weak scaling, limitations on scalability (Amdahl’s and Gustafson’s Laws), and benchmarking.
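As a taste of the scaling-law discussion, the two laws can be sketched in a few lines of Python (an illustrative sketch, not course material):

```python
# Illustrative sketch of the two classic scaling laws.
# p = parallelizable fraction of the work, n = number of processors.

def amdahl_speedup(p, n):
    """Speedup for a fixed problem size (strong scaling)."""
    return 1.0 / ((1.0 - p) + p / n)

def gustafson_speedup(p, n):
    """Scaled speedup when the problem size grows with n (weak scaling)."""
    return (1.0 - p) + p * n

# With 95% of the work parallelizable, even 1024 cores give a
# strong-scaling speedup below the 1/(1-p) = 20x ceiling:
print(round(amdahl_speedup(0.95, 1024), 1))  # prints 19.6
print(gustafson_speedup(0.95, 1024))
```

The contrast is the point: Amdahl assumes the problem stays fixed as cores are added, while Gustafson assumes the problem grows to fill the machine.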

11:00 am – 11:10 am Break
11:10 am - 12:30 pm

2.3 Running Batch Jobs on SDSC Systems

Marty Kandes, Computational and Data Science Research Specialist  
Batch job schedulers are used to manage and fairly distribute the shared resources of high-performance computing (HPC) systems. Learning how to interact with them and compose your work into batch jobs is essential to becoming an effective HPC user.
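As a concrete taste of what such a batch job looks like, here is a minimal Slurm job-script sketch (Expanse uses the Slurm scheduler; the account and partition names below are placeholders, so check the Expanse user guide for the values that apply to you):

```shell
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --account=abc123          # placeholder: your allocation account
#SBATCH --partition=shared        # assumption: a shared CPU partition
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:05:00           # wall-clock limit (HH:MM:SS)

echo "Hello from $(hostname)"
```

You would submit the script with `sbatch` and monitor it with `squeue -u $USER`.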

12:30 pm - 1:30 pm
Lunch
1:30 pm - 2:50 pm

2.4 Data Management and File Systems
Mahidhar Tatineni, Director of User Services
Managing data efficiently on a supercomputer is important from both the user’s and the system’s perspectives. We will cover a few basic data management techniques and I/O best practices in the context of the Expanse system at SDSC.

2:50 pm - 3:00 pm Break
3:00 pm - 4:30 pm

2.5 GPU Computing - Hardware architecture and software infrastructure  

Andreas Goetz, Research Scientist & Principal Investigator  

A brief overview of the massively parallel GPU architecture that enables large-scale deep learning applications, and how to access and use GPUs on SDSC Expanse for ML applications.

4:30 pm - 5:00 pm

Q&A, Wrap-up 

5:00 pm - 7:00 pm
Evening Reception - 15th Floor, the Village

 

Wednesday, June 28 - Scalable Machine Learning

8:00 am - 8:30 am Light Breakfast & Check-in
8:30 am - 8:40 am

3.1 Quick Welcome

8:40 am - 10:00 am

3.2 Introduction to Singularity: Containers for Scientific and High-Performance Computing  

Marty Kandes, Computational and Data Science Research Specialist  
Singularity is an open-source container engine designed to bring operating-system-level virtualization to scientific and high-performance computing. With Singularity you can package complex computational workflows (software applications, libraries, and data) in a simple, portable, and reproducible way, which can then be run almost anywhere.
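A typical Singularity workflow might look like the following command sketch (the image name is just an example, and on Expanse you may first need to load a Singularity module; check the user guide):

```shell
# Pull a container image from Docker Hub into a local .sif file
# (docker://python:3.11 is an example image, not course material):
singularity pull python.sif docker://python:3.11

# Run a single command inside the container:
singularity exec python.sif python3 --version

# Or open an interactive shell inside the container:
singularity shell python.sif
```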

10:00 am - 10:10 am Break
10:10 am - 12:10 pm

3.3 CONDA Environments and Jupyter Notebook on Expanse: Scalable & Reproducible Data Exploration and ML
Peter Rose, Director of Structural Bioinformatics Laboratory
Set up reproducible and transferable software environments and scale up calculations to large datasets using parallel computing.
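A minimal sketch of such a workflow, assuming standard conda and ipykernel usage (the environment name and package list are placeholders):

```shell
# Create a named, reproducible environment with pinned tools:
conda create --name ml-env -y python=3.10 pandas scikit-learn ipykernel

# Activate it for interactive work:
conda activate ml-env

# Register the environment as a Jupyter kernel so notebooks can select it:
python -m ipykernel install --user --name ml-env
```

Exporting the environment with `conda env export` makes it transferable to other systems.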

12:10 pm - 1:10 pm
Lunch

1:10 pm - 1:30 pm

3.4 Machine Learning (ML) Overview 

Mai Nguyen, Lead for Data Analytics
Brief review of machine learning concepts

1:30 pm - 2:25 pm

3.5 R on HPC
Paul Rodriguez, Computational Data Scientist 
A presentation and demo of parallelizing R, along with an example case study applying several ML tools in R to big data

2:25 pm - 2:35 pm Break
2:35 pm – 4:35 pm

3.6 Spark
Mai Nguyen, Lead for Data Analytics
Introduction to performing machine learning at scale, with hands-on exercises using Spark
4:35 pm - 5:00 pm Q&A, Wrap-up

 

Thursday, June 29 - Deep Learning

8:00 am – 8:30 am Light breakfast & Check-in
8:30 am – 8:40 am

4.1 Quick Welcome

8:40 am – 10:00 am

4.2 Introduction to Neural Networks and Convolutional Neural Networks
Paul Rodriguez, Computational Data Scientist  

An overview of the main concepts of neural networks and feature discovery, plus a basic convolutional neural network for digit recognition using TensorFlow
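The convolution operation at the heart of such a network can be sketched in plain NumPy (an illustrative toy, not the session's TensorFlow code):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation, as used in CNN layers (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output value is the sum of an elementwise product
            # between the kernel and one window of the image:
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector applied to a tiny image with a bright right half:
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)
print(conv2d(image, kernel))  # strongest response where the edge sits
```

In a real CNN the kernel weights are learned rather than hand-set, but the sliding-window arithmetic is the same.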

10:00 am – 10:10 am Break
10:10 am – 11:30 am

4.3 Practical Guidelines for Training Deep Learning on HPC

Paul Rodriguez, Computational Data Scientist 

Guidelines for running deep networks on Expanse, such as using TensorBoard, notebooks, and batch jobs; also some discussion of multi-node execution.

11:30 am - 12:30 pm
Lunch
12:30 pm – 1:30 pm

4.4 Deep Learning Layers and Architectures 
Mai Nguyen, Lead for Data Analytics
Overview of deep learning concepts, including layers, architectures, applications, and libraries

1:30 pm – 1:40 pm Break
1:40 pm – 3:10 pm

4.5 Deep Learning Transfer Learning 

Mai Nguyen, Lead for Data Analytics 

Tutorial and hands-on exercises on the use of transfer learning for efficient training of deep learning models.

3:10 pm – 3:20 pm Break

3:20 pm – 4:50 pm 

4.6 Deep Learning – Special Connections

Paul Rodriguez, Computational Data Scientist 

The architectures of many networks use paths and connections in flexible ways; we will review gate, skip, and residual connections and build some intuition for what they are good for.

4:50 pm - 5:00 pm

Q&A, Wrap-up