Structural Bioinformatics Training Workshop & Hackathon 2017

Application of Big Data Technology and 3D Visualization

San Diego Supercomputer Center/University of California, San Diego
June 26 - 28, 2017

This 3-day hands-on workshop introduces participants to the development of fast and scalable structural bioinformatics methods using state-of-the-art Big Data technologies and WebGL 3D visualization. The first two days of the workshop combine lectures, hands-on applications, and programming sessions. On the third day, participants apply the new technologies to their own projects.

This workshop is held at the University of California, San Diego and hosted by the Structural Bioinformatics Laboratory at SDSC in collaboration with the RCSB Protein Data Bank.


This workshop is sponsored by the NIH Big Data to Knowledge (BD2K) initiative. Air travel and 4-day lodging will be provided for non-commercial participants, including a limited number of international participants. Apply now to secure your place in the workshop. Participants will be selected based on the best fit to the program.

Target Audience

The workshop is aimed at graduate students, postdocs, staff, faculty, industrial researchers, and scientific software developers who develop software for Structural Bioinformatics applications. Intermediate to advanced programming skills in a high-level language (Java, JavaScript, Python, or C++) are required.

The application process is now CLOSED

See the HOW TO APPLY page for details.


This workshop applies the following leading-edge open source technologies to Structural Bioinformatics problems:

  • MMTF (Macromolecular Transmission Format) is a highly optimized, compact, and simple data format for processing 3D structures from the PDB. MMTF APIs are available in common programming languages and are integrated with BioJava and BioPython.
  • Apache Spark is the most popular Big Data framework for distributed parallel computing.
  • Apache Spark SQL and Apache Spark ML (Machine Learning) are scalable data analytics frameworks.
  • NGL Viewer is a WebGL-based viewer that supports MMTF for efficient display of 3D structures, even those with millions of atoms, in a web browser.


Workshop Outcomes

  • Apply parallel distributed computing using Apache Spark and the MMTF compressed data format to develop scalable analysis of the Protein Data Bank.
  • Efficiently transfer and display 3D structures in a web browser using NGL Viewer and MMTF.
  • Implement your own scalable analysis or visualization tools using the provided open source APIs.
  • Run scalable data integration, data analysis, and machine learning algorithms on your data.
  • Develop collaborations with SDSC scientists and workshop participants, and contribute to open source projects.