Agenda

Lesson material repository: SDSC Summer Institute 2021

Agenda is subject to change. Times listed below are in Pacific Daylight Time.


Wednesday, July 28

Time (PDT) Session
9:00 - 11:00 AM Preparation Session: navigating the online tools, setting up accounts, logging in, and accessing the system

 

Monday, August 2

Time (PDT) Main Room Session
8:00 - 9:00 AM 1.1. Welcome, Orientation, & Introductions (Main Room)
Bob Sinkovits, Director of Scientific Computing, SDSC & Director of the Summer Institute
9:00 - 10:00 AM 1.2. Accessing and Running Jobs on Expanse (Main Room)
Mary Thomas, Computational Data Scientist, SDSC

This session covers the basics of accessing Expanse; managing the user environment; and compiling and running jobs. It is assumed that you have completed the basic steps of logging onto Expanse and refreshing your Unix skills prior to the event.
10:00 - 10:30 AM 1.3. Expanse User Portal (Main Room)
Subhashini Sivagnanam, Senior Computational and Data Science Specialist, SDSC

10:30 - 10:45 AM Break

  Main Room Session Breakout Room Session
10:45 - 12:45 PM

1.4a. Introduction to version control with git and GitHub 
Martin Kandes, Computational & Data Science Research Specialist, SDSC


An introduction to git for beginners, including how to create a repository on GitHub.

1.4b. Advanced GitHub
Andrea Zonca, Senior Computational Scientist, SDSC


You should already be familiar with creating pull requests, merging, and rebasing branches.

12:45 - 1:15 PM 30-minute lunch/break

1:15 - 2:00 PM 1.5. Understanding Performance and Obtaining Hardware Information (Main Room)
Bob Sinkovits, Director of Scientific Computing, SDSC & Director of the Summer Institute

 

Tuesday, August 3

AM Session Main Room Session Breakout Room Session

 

8:00 - 10:45 AM

 

2.1a. Python for HPC
Andrea Zonca, Senior Computational Scientist, SDSC

 

In this session we will introduce four key technologies in the Python ecosystem that provide significant benefits for scientific applications run in supercomputing environments. Previous Python experience is recommended but not required.
(1) First, we will learn how to speed up Python code by compiling it on the fly with numba. (2) Then we will introduce threads, processes, and the Global Interpreter Lock, and we will use first numba and then dask to exploit all available cores on a machine. (3) Finally, we will distribute computations across multiple nodes by launching dask workers in a separate Expanse job.

2.1b. A Short Introduction to Data Science and its Applications

Ilkay Altintas, Chief Data Science Officer, SDSC
Subhasis Dasgupta, Computational and Data Researcher, SDSC

Shweta Purawat, Computational and Data Researcher, SDSC

 

The new era of data science is here. Our lives, as well as every field of science, engineering, business, and society, are continuously transformed by our ability to collect meaningful data in a systematic fashion and turn that data into value. These needs not only push for new and innovative capabilities in composable data management and analytical methods that can scale in an anytime, anywhere fashion, but also require methods to bridge the gap between applications and to compose such capabilities within solution architectures.


In this short overview, we will show you a range of applications that are enabled by data science techniques and describe the process and cyberinfrastructure used within these projects to answer questions.

 

The timing of the 15-minute break is at the instructor's discretion.

10:45 - 11:15 AM 30-minute lunch/break

PM Session Main Room Session Breakout Room Session

11:15 AM - 2:00 PM

 

2.2a. Performance Tuning
Bob Sinkovits, Director for Scientific Computing Applications, SDSC


This session is targeted at attendees who both do their own code development and need their calculations to finish as quickly as possible. We will cover the effective use of cache, loop-level optimizations, force reductions, optimizing compilers and their limitations, short-circuiting, time-space tradeoffs and more. Exercises will be done mostly in C, but emphasis will be on general techniques that can be applied in any language.

2.2b. Information Visualization Concepts
Amit Chourasia, Senior Visualization Scientist, SDSC

 

This tutorial will provide a ground-up understanding of information visualization concepts and how they can be leveraged to select and use effective visual idioms for different data types (such as spreadsheet data, geospatial, graph, etc.). Example visualization designs and ways to fix problems with existing visualizations will be discussed, along with practical rules of thumb for visualization.

The timing of the 15-minute break is at the instructor's discretion.

 

Wednesday, August 4

AM Session Main Room Session Breakout Room Session

 

8:00 - 10:45 AM

 

3.1a. Scientific Visualization for Mesh-Based Data with VisIt
Amit Chourasia, Senior Visualization Scientist, SDSC

 

This tutorial will provide a high-level overview of scientific visualization techniques and their applicability to structured mesh-based data (such as rectilinear grids). Attendees will follow along with hands-on exercises that apply different techniques using the VisIt software, and will also perform remote visualization on the Expanse cluster.

3.1b. Scalable Machine Learning 

Mai Nguyen, Lead for Data Analytics, SDSC 

Paul Rodriguez, Research Analyst, SDSC 


Machine learning is an integral part of knowledge discovery in a wide variety of applications. From scientific domains to social media analytics, the data that needs to be analyzed has become massive and complex. This session introduces approaches that can be used to perform machine learning at scale. Tools and procedures for executing machine learning techniques on HPC will be presented. Spark will also be covered. In particular, we will use Spark’s machine learning library, MLlib, to demonstrate how distributed computing can be used to provide scalable machine learning. Please note: Knowledge of fundamental machine learning algorithms and techniques is required.

The timing of the 15-minute break is at the instructor's discretion.

10:45 - 11:15 AM 30-minute lunch/break

PM Session  Main Room Session

11:15 AM - 2:00 PM

Group photo 
3.2. Lightning Rounds
The timing of the 15-minute break is at the instructor's discretion.

 

Thursday, August 5

AM Session Main Room Session Breakout Room Session

 

8:00 - 10:45 AM

 

4.1a. GPU Computing and Programming 
Andreas Goetz, Research Scientist and Principal Investigator, SDSC 

 
This session introduces massively parallel computing with graphics processing units (GPUs). The use of GPUs is becoming increasingly popular across all scientific domains, since GPUs can significantly accelerate time to solution for many computational tasks. Participants will be introduced to the essential background of the GPU chip architecture and will learn how to program GPUs via libraries, OpenACC compiler directives, and CUDA programming. The session will incorporate hands-on exercises so that participants acquire the skills to use and develop GPU-aware applications.

4.1b. Deep Learning (part 1)

Mai Nguyen, Lead for Data Analytics, SDSC 

Paul Rodriguez, Research Analyst, SDSC 

 

Deep learning, a subfield of machine learning, has seen tremendous growth and success in the past few years. Deep learning approaches have achieved state-of-the-art performance across many domains, including image classification, speech recognition, and biomedical applications. Deep learning makes use of models that are composed of many layers of interconnected processing units. The many layers allow for a deep network to learn representations of data at multiple and increasingly complex and task-specific levels of abstraction, leading to automatic feature learning and excellent prediction performance. This session provides an introduction to deep learning concepts and approaches. Case studies utilizing deep learning will be presented, and hands-on exercises will be covered using Keras. Please note: Knowledge of fundamental machine learning concepts and techniques is required. 

 

The timing of the 15-minute break is at the instructor's discretion.

10:45 - 11:15 AM 30-minute lunch/break

PM Session  Main Room Session Breakout Room Session

11:15 AM - 2:00 PM

4.2a. Parallel Computing using MPI & OpenMP
Mahidhar Tatineni, Director of User Services, SDSC 

 
This session is targeted at attendees who are looking for a hands-on introduction to parallel computing using MPI and OpenMP programming. The session will start with an introduction and basic information for getting started with MPI. It will then give an overview of the common MPI routines that are useful for beginner MPI programmers, including MPI environment setup, point-to-point communications, and collective communications routines. Simple examples illustrating distributed-memory computing with these common MPI routines will be covered. The OpenMP section will provide an overview of constructs and directives for specifying parallel regions, work sharing, synchronization, and data scope. Simple examples will be used to illustrate the OpenMP shared-memory programming model and important runtime environment variables. Hands-on exercises for both MPI and OpenMP will be done in C and Fortran.

4.2b. Deep Learning (part 2)

Mai Nguyen, Lead for Data Analytics, SDSC 

Paul Rodriguez, Research Analyst, SDSC 

 

Deep learning, a subfield of machine learning, has seen tremendous growth and success in the past few years. Deep learning approaches have achieved state-of-the-art performance across many domains, including image classification, speech recognition, and biomedical applications. Deep learning makes use of models that are composed of many layers of interconnected processing units. The many layers allow for a deep network to learn representations of data at multiple and increasingly complex and task-specific levels of abstraction, leading to automatic feature learning and excellent prediction performance. This session provides an introduction to deep learning concepts and approaches. Case studies utilizing deep learning will be presented, and hands-on exercises will be covered using Keras. Please note: Knowledge of fundamental machine learning concepts and techniques is required. 

 

The timing of the 15-minute break is at the instructor's discretion.

 

Friday, August 6

Time Main Room Session

8:30 - 9:00 AM 

5.1. An Introduction to Singularity: Containers for Scientific and High-Performance Computing 
Martin Kandes, Computational & Data Science Research Specialist, SDSC 

9:00 - 9:30 AM 

5.2. Data sharing via SeedMeLab
Amit Chourasia, Senior Visualization Scientist, SDSC 
9:30 - 10:00 AM 5.3. Open Science Chain, Protecting Data Integrity with Open Science Chain
Subhashini Sivagnanam, Senior Computational and Data Science Specialist, SDSC & Manu Shantharam, Senior Computational Scientist, SDSC

10:00 - 11:00 AM 

Introduction to new projects/special topics (30 minutes each):
  • 5.4. Voyager, Mahidhar Tatineni, Director of User Services, SDSC 
  • 5.5. Composable Systems, Ilkay Altintas, Chief Data Science Officer, SDSC

11:00 - 11:15 AM Break

11:15 - 11:45 AM Introduction to new projects/special topics (30 minutes each):
  • 5.6. CloudBank, Shava Smallen, Computational & Data Science Research Specialist
11:45 AM - 12:00 PM Adjourn and wrap-up. Thank you for joining us!
Bob Sinkovits, Director of Scientific Computing, SDSC & Director of the Summer Institute