Preparation Day: Wednesday, July 27, 2022
Institute: Monday, August 1 – Friday, August 5, 2022
This event will be held virtually. All times listed are Pacific Time.
All program content will be available in the GitHub repository:
https://github.com/sdsc/sdsc-summer-institute-2022
Wednesday, July 27
Pacific time | Session
9:00 AM – 11:00 AM | 1.0 Preparation Day: Welcome & Orientation; Accounts, Login, Environment, Running Jobs and Logging into the Expanse User Portal; Q&A Wrap-up
Monday, August 1
Pacific time | Main Room Session
8:00 AM – 8:15 AM | Welcome
8:15 AM – 9:15 AM | 2.1 Parallel Computing Concepts (Robert Sinkovits, Director of Education and Training)
9:15 AM – 10:00 AM | 2.2 Hardware Overview: All users of advanced cyberinfrastructure (CI) benefit from a basic understanding of hardware, which helps them determine which factors affect application performance. The session starts with CPUs (processors, cores, hyperthreading, instruction sets), continues with the anatomy of a compute node (sockets, memory, attached devices, accelerators), and ends with an overview of cluster architecture (login and compute nodes, interconnects). We also cover how to obtain hardware information using Linux tools, pseudo-filesystems, and commonly used hardware utilization monitoring tools.
10:00 AM – 10:15 AM | Break
10:15 AM – 11:30 AM | 2.3 Intermediate Linux: Effective use of Linux-based compute resources via the command-line interface (CLI) can significantly increase researcher productivity. Assuming basic familiarity with the Linux CLI, we cover more advanced concepts with a focus on the Bash shell, including the filesystem hierarchy, file permissions, symbolic and hard links, wildcards and file globbing, finding commands and files, environment variables and modules, configuration files, aliases, history, and tips for effective Bash shell scripting.
11:30 AM – 12:30 PM | 2.4 Batch Computing
12:30 PM – 12:45 PM | Break
12:45 PM – 2:15 PM | 2.5 Data Management: Proper data management is essential for the effective use of advanced CI. This session covers file systems, data compression, archives (tar files), checksums and MD5 digests, downloading data using wget and curl, data transfer, and long-term storage solutions.
Tuesday, August 2
Pacific time | Main Room Session
8:00 AM – 8:30 AM | 3.1 Security
8:30 AM – 9:30 AM | 3.2 Interactive Computing
9:30 AM – 9:45 AM | Break
9:45 AM – 10:30 AM | 3.3 Getting Help
10:30 AM – 11:30 AM | 3.4 Code Migration
11:30 AM – 11:45 AM | Break
11:45 AM – 12:45 PM | 3.5 High Throughput Computing
12:45 PM – 1:45 PM | 3.6 Linux Tools for File Processing
Wednesday, August 3
Pacific time | Main Room Session | Breakout Room Session
8:00 AM – 9:30 AM | 4.1a Intro to Git & GitHub | 4.1b Advanced Git & GitHub (Data Science Research Specialist)
9:30 AM – 9:45 AM | Break |
9:45 AM – 12:00 PM | 4.2a Python for HPC: This session introduces four key technologies in the Python ecosystem that provide significant benefits for scientific applications running in supercomputing environments. Previous Python experience is recommended but not required. | 4.2b A Short Introduction to Data Science and its Applications (Ilkay Altintas, Chief Data Science Officer; Shweta Purawat, Computational and Data Researcher): The new era of data science is here. Our lives, as well as every field of science, engineering, business, and society, are continuously transformed by our ability to collect meaningful data in a systematic fashion and turn it into value. This transformation not only pushes for new and innovative capabilities in composable data management and analytical methods that can scale in an anytime, anywhere fashion, but also requires methods to bridge the gap between applications and compose such capabilities within solution architectures. This short overview presents a range of applications enabled by data science techniques and describes the processes and cyberinfrastructure used within these projects to answer research questions.
12:00 PM – 2:30 PM | 4.3a Performance Tuning | 4.3b Scalable Machine Learning: This session introduces approaches for performing machine learning at scale. Tools and procedures for executing machine learning techniques on HPC systems will be presented, and Spark will be covered for scalable data analytics and machine learning. Please note: knowledge of fundamental machine learning algorithms and techniques is required.
Thursday, August 4
Pacific time | Main Room Session | Breakout Room Session
8:00 AM – 10:30 AM | 5.1a Scientific Visualization for Mesh-Based Data with VisIt (Amit Chourasia, Senior Visualization Scientist) | 5.1b Deep Learning – Part 1
10:30 AM – 10:45 AM | Break |
10:45 AM – 1:30 PM | 5.2a GPU Computing and Programming: This session introduces massively parallel computing with graphics processing units (GPUs). GPUs are widely used across scientific domains because they can significantly accelerate time to solution for many computational tasks. Participants will learn the essentials of GPU chip architecture and how to program GPUs using libraries, OpenACC compiler directives, and CUDA. The session includes hands-on exercises that give participants the basic skills to use and develop GPU-aware applications. | 5.2b Deep Learning – Part 2
1:30 PM – 2:00 PM | 5.3 An Introduction to Singularity: Containers for Scientific and High-Performance Computing |
Friday, August 5
Pacific time | Main Room Session | Breakout Room Session
8:00 AM – 11:00 AM | 6.1a Parallel Computing using MPI & OpenMP | 6.1b Information Visualization Concepts
11:00 AM – 11:15 AM | Break |
11:15 AM – 12:00 PM | 6.2 Scaling up Interactive Data Analysis in JupyterLab: From Laptop to HPC |
12:00 PM – 12:15 PM | Closing Remarks (Robert Sinkovits, Director of Education and Training) |