Agenda - CIML

Agenda is subject to change. Times listed below are in Pacific Time.

Lesson Materials: https://github.com/ciml-org/ciml-summer-institute-2024

Tuesday, June 18
Preparation Day (virtual)

9:00 am - 9:15 am

1.1 Welcome & Orientation

Mary Thomas, Computational Data Scientist & Director of the CIML Summer Institute
Cindy Wong, Events Specialist

9:15 am – 9:45 am

1.2 Accounts, Login, Environment, Running Jobs, and Logging into the Expanse User Portal
Mary Thomas, Computational Data Scientist & Director of the CIML Summer Institute

9:45 am – 10:30 am

Q&A & Wrap-up

Tuesday, June 25
HPC, Parallel Concepts

8:00 am - 8:30 am	Light Breakfast & Check-in (Location: SDSC Auditorium)
8:30 am - 9:30 am	2.1 Welcome and Introductions Mary Thomas, Computational Data Scientist & Director of the CIML Summer Institute
9:30 am - 10:15 am	2.2 Parallel Computing Concepts Robert Sinkovits, Director of Education and Training We will cover supercomputer architectures, the differences between threads and processes, implementations of parallelism (e.g., OpenMP and MPI), strong and weak scaling, limitations on scalability (Amdahl’s and Gustafson’s Laws) and benchmarking.
10:15 am - 10:30 am	Break
10:30 am - 11:15 am	2.3 Getting Started with Batch Job Scheduling Marty Kandes, Computational and Data Science Research Specialist Batch job schedulers are used to manage and fairly distribute the shared resources of high-performance computing (HPC) systems. Learning how to interact with them and compose your work into batch jobs is essential to becoming an effective HPC user.
11:15 am - 12:30 pm	2.4 Data Management and File Systems Marty Kandes, Computational and Data Science Research Specialist Managing data efficiently on a supercomputer is important from both the user's and the system's perspectives. We will cover a few basic data management techniques and I/O best practices in the context of the Expanse system at SDSC.
12:30 pm - 1:30 pm	Lunch @ Cafe Ventanas
1:45 pm - 3:15 pm	2.5 GPU Computing - Hardware architecture and software infrastructure Andreas Goetz, Research Scientist & Principal Investigator Brief overview of the massively parallel GPU architecture that enables large-scale deep learning applications, followed by access to and use of GPUs on SDSC Expanse for ML applications.
3:15 pm - 3:30 pm	Break
3:30 pm - 5:00 pm	2.6 Software Containers for Scientific and High-Performance Computing Marty Kandes, Computational and Data Science Research Specialist Singularity is an open-source container engine designed to bring operating system-level virtualization to scientific and high-performance computing. With Singularity you can package complex computational workflows (software applications, libraries, and data) in a simple, portable, and reproducible way, which can then be run almost anywhere.
5:00 pm - 5:15 pm	Q&A, Wrap-up
5:30 pm - 7:30 pm	Evening Reception - UC San Diego, Seventh College, 15th Floor

Wednesday, June 26
Deep Learning

8:00 am - 8:30 am	Light Breakfast & Check-in
8:30 am - 8:45 am	3.1 Machine Learning (ML) Overview Mai Nguyen, Lead for Data Analytics Brief review of machine learning concepts
8:45 am - 10:15 am	3.2 Introduction to Neural Networks and Convolutional Neural Networks Paul Rodriguez, Computational Data Scientist An overview of the main concepts of neural networks and feature discovery; a basic convolutional neural network for digit recognition using TensorFlow.
10:15 am - 10:30 am	Break
10:30 am - 12:00 pm	3.3 Practical Guidelines for Training Deep Learning on HPC Paul Rodriguez, Computational Data Scientist Guidelines on running deep networks on Expanse, such as using TensorBoard, notebooks, and batch jobs; also some discussion of multi-node execution.
12:00 pm - 1:00 pm	Lunch @ Cafe Ventanas
1:00 pm - 1:45 pm	3.4 Deep Learning Layers and Architectures Mai Nguyen, Lead for Data Analytics Overview of deep learning concepts, including layers, architectures, applications, and libraries.
1:45 pm - 3:15 pm	3.5 Deep Learning Transfer Learning Mai Nguyen, Lead for Data Analytics Tutorial and hands-on exercises on the use of transfer learning for efficient training of deep learning models.
3:15 pm - 3:30 pm	Break
3:30 pm - 5:00 pm	3.6 Deep Learning – Special Connections Paul Rodriguez, Computational Data Scientist The architectures of many networks use paths and connections in flexible ways; we will review gate, skip, and residual connections and build some intuition for what they are good for.
5:00 pm	Q&A, Wrap-up

Thursday, June 27
Scalable Machine Learning & Large Language Models

8:00 am – 8:30 am	Light Breakfast & Check-in
8:30 am – 10:00 am	4.1 CONDA Environments and Jupyter Notebook on Expanse: Scalable & Reproducible Data Exploration and ML Peter Rose, Director of Structural Bioinformatics Laboratory Set up reproducible and transferable software environments and scale up calculations to large datasets using parallel computing.
10:00 am – 10:15 am	Break
10:15 am – 10:45 am	4.2 R on HPC Demo Paul Rodriguez, Computational Data Scientist A presentation and demo of parallelizing R; also an example case study of several ML tools and R for big data.
10:45 am - 12:15 pm	4.3 Spark Mai Nguyen, Lead for Data Analytics Introduction to performing machine learning at scale, with hands-on exercises using Spark.
12:15 pm - 1:15 pm	Lunch @ Cafe Ventanas
1:15 pm - 2:15 pm	4.4 LLM Overview In this session we will present an introduction to Large Language Models and their possible use cases. Examples of how to use LLMs will be covered. This session is designed for people with a basic understanding of machine learning; no prior knowledge of LLMs is required.
2:15 pm - 2:30 pm	Break
2:30 pm - 4:30 pm	4.5 LLM Overview (continued)
4:30 pm - 5:00 pm	Q&A, Wrap-up
