DnA Lab | Short Read Workshop

2022 Workshop Archive

You must register to attend this workshop. Please register here

Building: Jennie Smoly Caruthers Biotechnology Building (JSCBB)

Address: 3415 Colorado Ave, Boulder, CO 80303

Room: A104 (class) and B231 (office hours)

Time: July 11-15 & 18-22, 9am-12pm (class hours, A104) & 1pm-3pm (office hours, B231)

Parking: See here

Contact: Dr. Mary Allen (mary.a.allen AT colorado DOT edu) and Dr. Lynn Sanford (lynn.sanford AT colorado DOT edu)

IT support: BIT Help (bit-help AT colorado DOT edu). Please include sr2022 in the subject line on any emails to BIT Help.

Before Day 1 | Tasks to Complete

This course is taught in a reverse-classroom style: before each day of the workshop, we post videos that cover the premise for the day ahead. During classroom hours, we then help you carry out the tasks described in the videos, hands-on. That way you learn the theory ahead of time, and classroom helpers can guide you through the nuts and bolts, which makes the course approachable for those who are relatively new to computing. We encourage you to attend every session because the content builds on previous days; please email us if you have concerns or questions about attending all 10 days.

For everyone (MacOS/Windows/Linux)

Go to GitHub and register for an account (if you do not have an existing one).

Put your GitHub username in this Google spreadsheet

We will use your GitHub username to give you temporary access to a supercomputer, an Amazon Cloud (AWS) server.

For Windows users

In addition to the items above, Windows users will need to install a terminal application. We recommend installing Linux Bash Shell with Ubuntu.

Windows 10 instructions: Bash for Windows with Ubuntu
Windows 11 instructions: How to install Bash on Windows 11

Optional Items

In addition to the items above, we recommend going through a basic command-line tutorial, which can be found on Codecademy. The first week will move relatively slowly through these basics, but if you find you're having trouble keeping up, we highly recommend completing the tutorial before the second week.
Download and install IGV. You will need to do this for Day 4.
Download and install R and RStudio. You will need to do this for Day 6.

Day 1 | Intro to Sequencing and Overview of Pipelines

The goal of the first day is to get oriented to the class, the AWS supercomputer, and sequencing data. We will present an overview of the course and discuss the basic principles of high-throughput sequencing (HTS) and the subsequent steps involved in processing the data (analysis pipelines). We will confirm that everyone can access the AWS instance and begin learning the Vim text editor.

Videos (watch before class!)

1.1 | Course Overview

1.2 | Sequencing: Creating Libraries

1.3 | Sequencing: Pre-sequencing library quality

1.4 | Sequencing: Illumina Sequencing

1.5 | Sequencing: Designing sequencing experiments

1.6 | Basic Unix: File System/Directory Trees

1.7 | Basic Unix: Commands for Moving Files/Directories

1.8 | Basic Unix: Vim Tutorial

Class slides

Libraries and sequencing

Class worksheets and notes

Logging on to a supercomputer

Configuring SSH Keys (if you are using AWS)

Logging onto Google Shell

Using vim tutor

Vim crash 911

Creating a variable in bash

Homework

Homework day 1

Library QC challenge

Library QC challenge - answers

Additional material / Useful links

Basic info about file types in the class

Illumina sequencing technology

Illumina video

More information about barcoding

Library kit considerations

Illumina platform comparison

Day 2 | Intro to Linux & Vim

The goal of day 2 is to learn some basic Linux/Unix commands for managing files and to work more with a text editor (Vim). You can search online for Linux/Unix and Vim commands, and use the handy cheat sheets (below) for all of these tools.

Videos (watch before class!)

2.1 | SSH and VPN Introduction

2.2 | Remote Rsync/Reading Files

2.3 | Searching/editing Files, Pipes, and Outputs

2.4 | Directory Permissions

Class slides

Intro to Linux/Unix

Class worksheets

Part 1: Basic Unix Skills

Part 2: Writing a Script

Homework

Homework Day 2

Additional material / Useful links

Linux/Unix cheatsheet

Vim cheatsheet

Day 3 | Intro to Servers, Public Data & Quality Control

The goal of day 3 is to start thinking in supercomputer terms. Today we will learn the basics of high-performance computing, servers, and job/workload managers. It is also important to know how to access public data, transfer data to and from the server, and run some basic quality control. Today we start using real data! We will evaluate sequencing data for quality. Remember: garbage in, garbage out. It is never worth your time to try to analyze garbage.
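To make "quality" concrete: each FASTQ record stores one quality character per base, and under the common Phred+33 encoding a character maps to a score by subtracting 33 from its ASCII code. The sketch below is not FastQC. It is a minimal illustration that assumes Phred+33 encoding and simple four-line FASTQ records; the function names and the example record are made up for illustration.

```python
def phred_scores(quality_line):
    """Convert a Phred+33 quality string into a list of integer scores."""
    return [ord(ch) - 33 for ch in quality_line]


def mean_quality(quality_line):
    """Average Phred score of one read (0.0 for an empty read)."""
    scores = phred_scores(quality_line)
    return sum(scores) / len(scores) if scores else 0.0


def parse_fastq(lines):
    """Yield (name, sequence, quality) tuples from four-line FASTQ records."""
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual  # drop the leading "@" from the name


# Hypothetical record, for illustration only.
example = ["@read1", "ACGT", "+", "II5!"]
for name, seq, qual in parse_fastq(example):
    print(name, phred_scores(qual), mean_quality(qual))
```

A read whose mean score is low, or whose scores fall off sharply toward the 3' end, is the kind of pattern that quality-control reports are meant to surface.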

Videos (watch before class!)

3.1 | Introduction to the Computer Cluster

3.2 | LMOD: An Environment Module System

3.3 | Transferring Data

3.4 | Slurm Workload Manager Basics

3.5 | FastQC Overview

3.6 | Running FastQC

Class slides

Clusters and SLURM

Class worksheets and notes

Part 1: Downloading data with sbatch

Part 2: Running FastQC on the compute nodes

Windows - where are my files?

Homework

Homework Day 3

Downloading/Running IGV

Additional material / Useful links

rsync for directories (the slash position matters)

Unix Permissions

Slurm (sbatch) Cheatsheet

FastQC Website

Previous FastQC worksheet (more detail if needed)

Day 4 | Trimming, Mapping & IGV (Genome Browser Visualization)

Congrats! It is now time to do some real data analysis! Today we will do more quality control and learn about mapping and visualization. These are the first steps after obtaining your sequencing data. It is important to determine whether your sequencing experiment was successful before moving on to downstream analyses.
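One common idea behind quality trimming is a sliding window: scan the read from the 5' end and cut it where the average quality within a window drops below a threshold. The sketch below illustrates only that idea. It is not Trimmomatic's implementation, and the function name, parameter names, and example read are assumptions made for illustration.

```python
def sliding_window_trim(seq, quals, window=4, min_avg=15):
    """Cut the read at the start of the first window whose mean quality
    falls below min_avg; return the (sequence, qualities) that remain."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must be the same length")
    # Reads shorter than the window are checked as a single, shorter window.
    last_start = max(len(quals) - window + 1, 1)
    for start in range(last_start):
        win = quals[start:start + window]
        if win and sum(win) / len(win) < min_avg:
            return seq[:start], quals[:start]
    return seq, quals


# Hypothetical read whose quality decays toward the 3' end.
seq = "ACGTACGTAC"
quals = [38, 38, 37, 36, 35, 30, 12, 10, 8, 5]
print(sliding_window_trim(seq, quals, window=4, min_avg=20))
```

With these made-up values, the window starting at position 5 is the first whose mean (15.0) falls below 20, so the read is cut to its first five bases.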

Videos (watch before class!)

4.1 | Trimmomatic Overview

4.2 | Running Trimmomatic

4.3 | Introduction to Mapping

4.4 | The HISAT2 mapper

4.5 | Mapping with HISAT2

4.6 | Integrative Genomics Viewer (IGV)

Class slides

Trimming and mapping

Class worksheets

Trimmomatic worksheet

Mapping/IGV worksheet

Homework

Homework day 4

Additional material / Useful links

HISAT2 Manual

Samtools Manual

IGV Manual

MultiQC Website

X2Go allows you to log in to a visualization node on a supercomputer and run programs such as IGV there. We will not go into detail on X2Go in this class, but the following information may be useful if you ever need to use it.

Install X2Go

Using X2Go to log on to a supercomputer

Using X2Go to log on to a supercomputer (Windows users)

Day 5 | Assessment

Today is about catching up and making sure we are ready for next week! We will convert a big BAM file into something small enough to be transferred quickly and visualized. We will also take a quick assessment to make sure everyone is comfortable with the first week of the workshop. We will give you a short task, which should ideally take less than an hour to complete. If it takes longer, it is a good idea to review and practice skills from the first week over the weekend. The second week will move at a much quicker pace!

Class slides

TDFs for visualization

Class worksheets

TDF Worksheet and Assessment

Homework

Homework day 5

Day 6 | RNA-seq: R & Read Counting

The goal for day 6 is to begin to analyze RNA-seq data -- the most commonly generated type of short-read sequencing data! We will focus on the most common pipeline. Since most differential expression programs are written in R, today we will focus on learning how to use R (a statistical computing environment). We will also discuss how to count reads over genes.
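Counting reads over genes comes down to asking, for each mapped read, which annotated gene interval it overlaps. The sketch below is a deliberately simplified illustration, not how featureCounts works internally. It assumes half-open coordinates, ignores strand and splicing, and leaves reads that overlap zero or several genes unassigned; all names and coordinates are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def count_reads(genes, reads):
    """genes: list of (name, chrom, start, end); reads: list of (chrom, start, end).
    Returns (per-gene counts, number of unassigned reads)."""
    counts = {name: 0 for name, _chrom, _start, _end in genes}
    unassigned = 0
    for r_chrom, r_start, r_end in reads:
        hits = [
            name
            for name, g_chrom, g_start, g_end in genes
            if g_chrom == r_chrom and overlaps(r_start, r_end, g_start, g_end)
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        else:
            unassigned += 1  # no gene, or ambiguous between several genes
    return counts, unassigned


# Hypothetical annotation and read positions.
genes = [("geneA", "chr1", 100, 200), ("geneB", "chr1", 180, 300), ("geneC", "chr2", 50, 150)]
reads = [
    ("chr1", 110, 160),  # geneA
    ("chr1", 185, 195),  # overlaps geneA and geneB: ambiguous
    ("chr1", 250, 290),  # geneB
    ("chr2", 140, 190),  # geneC
    ("chr2", 150, 200),  # starts where geneC ends: no overlap
    ("chr1", 120, 170),  # geneA
]
print(count_reads(genes, reads))
```

The resulting table of counts per gene is the kind of input that differential expression tools expect.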

Videos (watch before class!)

6.1 | Intro to R

6.2 | Counting reads

Class slides

Intro to R and Read Counting

Class worksheets and scripts

Learning R worksheet

Learning R script

featureCounts worksheet

featureCounts R script

Homework

Homework script - Learning R

pokemon_data

Day 7 | RNA-seq: Differential Expression

Today we will run DESeq2 -- the most commonly used program for differential expression assessment. We'll talk about how to interpret results and how to build good experimental designs.
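Before samples can be compared, their counts need to be put on a common scale, because samples are sequenced to different depths. DESeq2 bases its default normalization on a median-of-ratios idea; the sketch below illustrates that idea only and is not DESeq2's code. It assumes every sample lists the same genes in the same order and that at least one gene has nonzero counts in all samples; the function name and the counts are made up.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: dict mapping sample name -> list of gene counts (same gene order).
    Returns one scaling factor per sample using a median-of-ratios scheme."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_ratios = {s: [] for s in samples}
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if any(v == 0 for v in values):
            continue  # skip genes whose geometric mean would be zero
        log_geo_mean = sum(math.log(v) for v in values) / len(values)
        for s in samples:
            log_ratios[s].append(math.log(counts[s][g]) - log_geo_mean)
    return {s: math.exp(median(log_ratios[s])) for s in samples}


# Hypothetical counts: sampleB was sequenced about twice as deeply as sampleA.
counts = {"sampleA": [10, 20, 30, 0], "sampleB": [20, 40, 60, 5]}
factors = size_factors(counts)
print(factors)
```

Dividing each sample's counts by its factor puts the two samples on a comparable scale; here the factors differ by a ratio of two, matching the difference in depth.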

Videos (watch before class!)

7.1 | Differential expression overview

7.2 | Differential expression with DESeq2

Class slides

Differential expression with DESeq2

Class worksheets and scripts

DESeq2 worksheet

DESeq2 script

Homework

DESeq2 homework

Additional Material / Useful Links

Beginner's Guide to DESeq2 package

DESeq2 Manual

DESeq2 walk through

An alternative DESeq tutorial

Isoform read counting with Stringtie

Isoform DE with Ballgown

Day 8 | RNA-seq: Advanced DESeq2 Methods

Today we will practice more with DESeq2, discuss subtleties of the analysis, and create advanced condition designs to address them.

Videos (watch before class!)

8.1 | Multifactor Designs in DESeq2

Class slides

Multifactor Designs in DESeq2

Class worksheets

Worksheet day 8

Homework

Homework day 8

Day 9 | ChIP-seq, ATAC-seq & Peak Calling

The goal for day 9 is to discuss more peak-centric sequencing methods. To this end we will cover the basic analysis of ChIP-seq and ATAC-seq. For this we will learn to use MACS2 and the different settings required in peak calling depending on the type of data. We also will discuss index files, post-sequencing QC, bedtools, and a few other key concepts.
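Much of peak-centric analysis is interval arithmetic on BED-like records (chromosome, start, end). As one example, the sketch below merges overlapping or touching peaks into single regions, similar in spirit to interval merging in tools such as bedtools. It is an illustrative sketch, not the bedtools implementation, and it assumes half-open coordinates; the function name and the peaks are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals.
    Input may be unsorted; output is sorted by chromosome and start."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous region instead of starting a new one.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]


# Hypothetical peak calls.
peaks = [
    ("chr1", 100, 200),
    ("chr1", 150, 250),
    ("chr1", 400, 500),
    ("chr2", 100, 200),
    ("chr1", 250, 300),
]
print(merge_intervals(peaks))
```

Sorting first is what makes a single pass sufficient: after sorting, any interval that can be merged with an earlier one must overlap or touch the most recently emitted region.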

Class slides

Peak calling and bedtools slides

Homework

Homework - MACS2

Homework - Bedtools

Day 10 | Downstream Analysis

The goal of day 10 is to wrap up and give you pointers on where to go next. We will cover downloading data from public databases, working with GitHub, and when to make your own tools/pipelines.

Videos (watch before class!)

10.1 | Git and GitHub (external video)

10.2 | Nextflow pipelines (external video)

Class slides

Big Data Ethics

