Agenda is subject to change. Times listed below are in Pacific.

Lesson Materials: https://github.com/ciml-org/ciml-summer-institute-2024

 

Tuesday, June 18
Preparation Day (virtual)

9:00 am - 9:15 am

1.1 Welcome & Orientation

Mary Thomas, Computational Data Scientist & Director of the CIML Summer Institute
Cindy Wong, Events Specialist 

9:15 am – 9:45 am

1.2 Accounts, Login, Environment, Running Jobs, and Logging into the Expanse User Portal
Mary Thomas, Computational Data Scientist & Director of the CIML Summer Institute 

9:45 am – 10:30 am

Q&A & Wrap-up

 

Tuesday, June 25
HPC, Parallel Concepts

8:00 am - 8:30 am Light Breakfast & Check-in
Location: SDSC Auditorium 
8:30 am - 9:30 am

2.1 Welcome and Introductions

Mary Thomas, Computational Data Scientist & Director of the CIML Summer Institute

9:30 am - 10:15 am

2.2 Parallel Computing Concepts
Robert Sinkovits, Director of Education and Training 
We will cover supercomputer architectures, the differences between threads and processes, implementations
of parallelism (e.g., OpenMP and MPI), strong and weak scaling, limitations on scalability (Amdahl’s and
Gustafson’s Laws), and benchmarking.
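As a rough illustration of the scalability limits named in this session, the sketch below evaluates the textbook forms of Amdahl's Law (fixed problem size, strong scaling) and Gustafson's Law (problem size grows with the worker count, weak scaling). It is not taken from the lesson materials; the function names and the 95% parallel fraction are assumptions chosen for the example.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup for a fixed problem size (strong scaling)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def gustafson_speedup(parallel_fraction: float, workers: int) -> float:
    """Scaled speedup when the parallel work grows with the worker count (weak scaling)."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * workers


if __name__ == "__main__":
    p = 0.95  # assumed parallel fraction, for illustration only
    for n in (1, 8, 64, 512):
        print(f"workers={n:4d}  Amdahl={amdahl_speedup(p, n):7.2f}  "
              f"Gustafson={gustafson_speedup(p, n):7.2f}")
```

With a 5% serial fraction, the Amdahl speedup can never exceed 1 / 0.05 = 20 no matter how many workers are added, while the Gustafson estimate keeps growing because the workload is assumed to scale with the machine.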

10:15 am - 10:30 am Break
10:30 am - 11:15 am

2.3 Getting Started with Batch Job Scheduling
Marty Kandes, Computational and Data Science Research Specialist

Batch job schedulers are used to manage and fairly distribute the shared resources of high-performance
computing (HPC) systems. Learning how to interact with them and compose your work into batch
jobs is essential to becoming an effective HPC user.

11:15 am - 12:30 pm

2.4 Data Management and File Systems
Marty Kandes, Computational and Data Science Research Specialist
Managing data efficiently on a supercomputer is important from both the user's and the system's perspectives.
We will cover a few basic data management techniques and I/O best practices in the context of the Expanse system at SDSC. 
12:30 pm - 1:30 pm
Lunch @ Cafe Ventanas
1:45 pm - 3:15 pm

2.5 GPU Computing - Hardware architecture and software infrastructure  

Andreas Goetz, Research Scientist & Principal Investigator  

Brief overview of the massively parallel GPU architecture that enables large-scale deep learning
applications, and of how to access and use GPUs on SDSC Expanse for ML applications.

3:15 pm - 3:30 pm Break
3:30 pm - 5:00 pm

2.6 Software Containers for Scientific and High-Performance Computing

Marty Kandes, Computational and Data Science Research Specialist  
Singularity is an open-source container engine designed to bring operating system-level virtualization to scientific
and high-performance computing. With Singularity you can package complex computational workflows ---
software applications, libraries, and data --- in a simple, portable, and reproducible way, which can then be run almost anywhere. 

5:00 pm - 5:15 pm

Q&A, Wrap-up 

5:30 pm - 7:30 pm
Evening Reception - UC San Diego, Seventh College, 15th Floor

 

Wednesday, June 26
Deep Learning

8:00 am - 8:30 am Light Breakfast & Check-in
8:30 am - 8:45 am

3.1 Machine Learning (ML) Overview 

Mai Nguyen, Lead for Data Analytics
Brief review of machine learning concepts

8:45 am - 10:15 am

3.2 Introduction to Neural Networks and Convolutional Neural Networks
Paul Rodriguez, Computational Data Scientist  

An overview of the main concepts of neural networks and feature discovery, followed by a basic convolutional neural network for digit recognition using TensorFlow.

10:15 am - 10:30 am Break
10:30 am - 12:00 pm

3.3 Practical Guidelines for Training Deep Learning on HPC

Paul Rodriguez, Computational Data Scientist 

Guidelines on running deep networks on Expanse, such as using TensorBoard, notebooks, and batch jobs, plus some discussion of multi-node execution.

12:00 pm - 1:00 pm
Lunch @ Cafe Ventanas

1:00 pm - 1:45 pm

3.4 Deep Learning Layers and Architectures 
Mai Nguyen, Lead for Data Analytics
Overview of deep learning concepts, including layers, architectures, applications, and libraries.

1:45 pm - 3:15 pm

3.5 Deep Learning – Transfer Learning

Mai Nguyen, Lead for Data Analytics 

Tutorial and hands-on exercises on the use of transfer learning for efficient training of deep learning models.

3:15 pm - 3:30 pm Break
3:30 pm - 5:00 pm

3.6 Deep Learning – Special Connections

Paul Rodriguez, Computational Data Scientist 

The architectures of many networks use paths and connections in flexible ways; we will review gate, skip, and residual connections and build some intuition for what they are good for.
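To give a feel for the connection types named in this session, here is a minimal NumPy sketch of a residual (skip) connection and a highway-style gated connection around a single dense layer. It is an illustrative assumption rather than the session's actual material; the helper names (`dense_relu`, `residual_block`, `gated_block`) are hypothetical.

```python
import numpy as np


def dense_relu(x, w, b):
    """One fully connected layer followed by a ReLU activation."""
    return np.maximum(0.0, x @ w + b)


def residual_block(x, w, b):
    """Residual (skip) connection: the input is added back to the layer output."""
    return x + dense_relu(x, w, b)


def gated_block(x, w, b, w_gate, b_gate):
    """Highway-style gate: a sigmoid decides how much of the layer output
    versus the untouched input passes through."""
    gate = 1.0 / (1.0 + np.exp(-(x @ w_gate + b_gate)))
    return gate * dense_relu(x, w, b) + (1.0 - gate) * x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))           # batch of 4 inputs with 8 features
    w = rng.normal(size=(8, 8)) * 0.1     # square weights so shapes match for the sum
    b = np.zeros(8)
    print(residual_block(x, w, b).shape)
    print(gated_block(x, w, b, w, b).shape)
```

Because the input is carried forward unchanged, a residual block with all-zero weights reduces to the identity, which is one intuition for why such connections make very deep networks easier to train.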

5:00 pm Q&A, Wrap-up

 

Thursday, June 27 
Scalable Machine Learning & Large Language Models

8:00 am – 8:30 am Light Breakfast & Check-in
8:30 am – 10:00 am

4.1 CONDA Environments and Jupyter Notebook on Expanse: Scalable & Reproducible Data Exploration and ML
Peter Rose, Director of Structural Bioinformatics Laboratory
Set up reproducible and transferable software environments and scale up calculations to large datasets using parallel computing.

10:00 am – 10:15 am Break
10:15 am – 10:45 am

4.2 R on HPC Demo
Paul Rodriguez, Computational Data Scientist
A presentation and demo of parallelizing R; also an example case study of several ML tools and R for big data.

10:45 am - 12:15 pm

4.3 Spark
Mai Nguyen, Lead for Data Analytics
Introduction to performing machine learning at scale, with hands-on exercises using Spark.
12:15 pm - 1:15 pm
Lunch @ Cafe Ventanas
1:15 pm - 2:15 pm

4.4 LLM Overview
In this session, we will introduce Large Language Models (LLMs) and their possible use cases, with examples of how to use them. This session is designed for people with a basic understanding of machine learning; no prior knowledge of LLMs is required.

2:15 pm - 2:30 pm Break
2:30 pm - 4:30 pm

4.5 LLM Overview (continued)

4:30 pm - 5:00 pm

Q&A, Wrap-up