High-performance Computing for Reproducible Genomics
March 06, 2017

A HarvardX course on edX

If you’re interested in data analysis and interpretation, then this is the data science course for you.

Enhanced throughput: Almost all recently manufactured laptops and desktops include multicore CPUs. With R, it is very easy to obtain faster turnaround times for analyses by distributing tasks among the cores for concurrent execution. We will discuss how to use Bioconductor to simplify parallel computing for efficient, fault-tolerant, and reproducible high-performance analyses. This will be illustrated with common multicore architectures and Amazon’s EC2 infrastructure.

Enhanced interactivity: New approaches to programming with R and Bioconductor allow researchers to use the web browser as a highly dynamic interface for data interrogation and visualization. We will discuss how to create interactive reports that enable us to move beyond static tables and one-off graphics so that our analysis outputs can be transformed and explored in real time.

Enhanced reproducibility: New methods of virtualization of software environments, exemplified by the Docker ecosystem, are useful for achieving reproducible distributed analyses. The Docker Hub includes a considerable number of container images useful for important Bioconductor-based workflows, and we will illustrate how to use and extend these for sharable and reproducible analysis.

Given the diversity in our students' educational backgrounds, we have divided the series into seven parts. You can take the entire series or only the individual courses that interest you. If you are a statistician, consider skipping the first two or three courses; similarly, if you are a biologist, consider skipping some of the introductory biology lectures. Note that the statistics and programming aspects of the class ramp up in difficulty relatively quickly across the first three courses. By the third course we will be teaching advanced statistical concepts such as hierarchical models, and by the fourth, advanced software engineering skills such as parallel computing and reproducible research concepts.

These courses make up two XSeries and are self-paced:

PH525.1x: Statistics and R for the Life Sciences

PH525.2x: Introduction to Linear Models and Matrix Algebra

PH525.3x: Statistical Inference and Modeling for High-throughput Experiments

PH525.4x: High-Dimensional Data Analysis

PH525.5x: Introduction to Bioconductor: annotation and analysis of genomes and genomic assays

PH525.6x: High-performance computing for reproducible genomics

PH525.7x: Case studies in functional genomics

What you’ll learn:

  • Parallel Computing
  • Interactive Graphics
  • Reproducible distributed analysis


  • Length: 4 weeks
  • Effort: 2-4 hours per week
  • Price: Free
  • Add a Verified Certificate for $50
  • Institution: HarvardX
  • Subject: Biology and Life Sciences
  • Level: Advanced
  • Languages: English
  • Video Transcripts: English

© HPC Today 2017 - All rights reserved.
