DnA Lab | Short Read Workshop

2021 Workshop Archive

Building: Jennie Smoly Caruthers Biotechnology Building (JSCBB)

Address: 3415 Colorado Ave, Boulder, CO 80303

Room: A108 (9am-12pm); B331 (1-3pm, office hours)

Time: 9am-12pm (class hours, A108) & 1pm-3pm (helproom hours, B331)

Parking: See here

Contact: Dr. Mary Allen (mary.a.allen AT colorado DOT edu), Zach Maas (Zachary.Maas AT colorado DOT edu), and Lynn Sanford (Lynn.Sanford AT colorado DOT edu)

IT support: BIT Help (bit-help AT colorado DOT edu). Please include sr2021 in any emails to BIT Help.

Before starting if you are working alone | Tasks to Complete

This course is taught reverse-classroom style. In other words, we will post videos for you to watch before you try to work through that day's materials. The videos will go over the premise for the day ahead. Then you will try to work through the materials for that day. If you are at CU Boulder and working on the super computer called fiji, everything you are instructed to type should be the same, except for logging into the super computer. This is because we have installed all the material for class on fiji for you in the same location it was on the AWS server. However, if you are working from a super computer other than fiji, you will likely want to git clone our scripts. Follow these directions to get the materials on your super computer.

For everyone (MacOS/Windows/Linux)

Install X2GO Client

X2Go Client is open-source remote desktop software for Linux systems. In other words, you will be able to use this application for desktop visualization of data stored on the remote system (the AWS server). Our primary use of X2Go will be in visualizing genomic data. See installation instructions for MacOS/Windows here.

For Windows users

In addition to the items above, Windows users will need to install a terminal application. We recommend installing Linux Bash Shell with Ubuntu. Bash for Windows with Ubuntu

Optional Items

In addition to the items above, we recommend that users go through a basic command-line usage tutorial, which can be found on Codecademy. The first week will move relatively slowly through these basics, but if you find you're having trouble keeping up, we highly recommend this before the second week.

Before Day 1 if you are coming to the workshop | Tasks to Complete

This course is taught reverse-classroom style. In other words, we will post videos for you to watch before the next day of the workshop that will go over the premise for the day ahead. During the actual classroom hours, we will then help you execute the tasks described in the videos with a hands-on approach. The idea is that you learn the theory ahead of time, and classroom helpers then guide you through the nuts and bolts, which makes for a supportive environment for those who are relatively new to computing. Because the content builds on previous days, we encourage you to attend every lecture; please email us if you have concerns or questions about attending all 10 days.

For everyone (MacOS/Windows/Linux)

Go to GitHub and register for an account (if you do not have an existing one).

Put your GitHub username in this Google spreadsheet

We will use your GitHub username to give you temporary access to a super computer, an Amazon Cloud (AWS) server.

Install X2GO Client

For Windows users

In addition to the items above, Windows users will need to install a terminal application. We recommend installing Linux Bash Shell with Ubuntu. Bash for Windows with Ubuntu

Optional Items

Day 1: Intro to Sequencing and Overview of Pipelines

The goal of the first day is to get oriented to the class, the AWS super computer, and sequencing data. We will give an overview of the course and discuss the basic principles of high-throughput sequencing (HTS) and the subsequent steps involved in processing this data (analysis pipelines). We will confirm that everyone can access the AWS instance and begin learning the Vim text editor. If you are doing this on the fiji server, you do not need an SSH key on GitHub.

Videos (watch before class!)

1.1 | Course Overview

1.2 | Creating Libraries

1.3 | Pre-sequencing library quality

1.4 | Illumina Sequencing

1.5 | Designing sequencing experiments

1.6 | Vim Tutorial

In class slides, notes and materials

Configuring SSH Keys (only needed if you are using AWS)

Logging on to a super computer

Using vim tutor

Using x2go to log on to a super computer

Using x2go to log on to a super computer (Windows users)

Libraries and sequencing

Creating a variable in bash

Vim crash 911

Homework

Library QC challenge

Library QC challenge - answers

Additional Material/Useful Links (external websites)

Basic info about file types in the class

Illumina sequencing technology

Day 2: Intro to Linux & Vim

The goal of day 2 is to learn some basic Linux/Unix commands for managing files. Today we will go over basic Unix/Linux commands and text editors (Vim). You can Google search Linux/Unix commands/vim commands and use the handy cheat-sheets (below) for all of these tools.
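As a rough preview of the kind of commands covered today, here is a minimal sketch that builds a small directory tree, copies and renames a file, and chains commands with a pipe and a redirect. The directory and file names (project, chroms.txt, and so on) are made up for illustration and are not the workshop's actual materials; everything happens in a scratch directory created by mktemp.

```shell
#!/usr/bin/env bash
# Toy walk-through of everyday file commands. All names are hypothetical.
set -eu

workdir=$(mktemp -d)            # scratch directory so nothing real is touched
cd "$workdir"

# Build a small directory tree and a four-line text file.
mkdir -p project/raw project/results
printf 'chr1\nchr2\nchr1\nchrX\n' > project/raw/chroms.txt

# Copy the file, then rename the copy with mv.
cp project/raw/chroms.txt project/results/chroms_copy.txt
mv project/results/chroms_copy.txt project/results/chroms.txt

# List the contents of the results directory.
ls -l project/results

# Pipe: keep the lines containing "chr1", then count them with wc.
chr1_lines=$(grep 'chr1' project/raw/chroms.txt | wc -l)
echo "Lines containing chr1: $chr1_lines"

# Redirect: write the sorted, de-duplicated list to a new file.
sort project/raw/chroms.txt | uniq > project/results/unique_chroms.txt
unique_count=$(wc -l < project/results/unique_chroms.txt)
echo "Unique chromosome names: $unique_count"
```

Running each line by hand, and checking the result with ls or cat after every step, is usually more instructive than running the script in one go.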

Videos (watch before class!)

2.1 | SSH and VPN Introduction

2.2 | File System/Directory Trees

2.3 | Basic Commands for Moving Files/Directories

2.4 | Remote Rsync/Reading Files

2.5 | Searching/editing Files, Pipes, and Outputs

2.6 | Directory Permissions

In class slides, notes and materials

Linux slides

Homework

Additional Material/Useful Links (external websites)

Linux walk through

Linux/Unix cheatsheet

Vim cheatsheet

Day 3: Intro to Servers & downloading Public Data

The goal of day 3 is to move to super computer thinking. Therefore, today we will learn the basics of high-performance computing, servers, and job/workload managers. It is also important to know how to access public data and transfer this data to and from the server.
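To give a feel for what a job submitted to a workload manager looks like, the sketch below writes a very small Slurm job script to a scratch file and checks its syntax without submitting anything. The job name, time limit, and memory request are placeholder values chosen for illustration, not the settings used in class, and many clusters also require site-specific options (such as a partition) that are left out here.

```shell
#!/usr/bin/env bash
# Writes a small Slurm job script to a scratch file.
# All resource values and names below are placeholders (assumptions).
set -eu

job_script=$(mktemp)

# Lines starting with "#SBATCH" are read by sbatch as resource requests;
# to bash itself they are ordinary comments.
cat > "$job_script" <<'EOF'
#!/bin/bash
# Label shown in the queue
#SBATCH --job-name=hello_cluster
# Run a single task
#SBATCH --ntasks=1
# Wall-clock limit (hh:mm:ss)
#SBATCH --time=00:05:00
# Memory request
#SBATCH --mem=1G
# Output file; %j is replaced by the job ID
#SBATCH --output=hello_cluster.%j.out

echo "Running on host: $(hostname)"
echo "Started at: $(date)"
EOF

# Syntax check only; nothing is submitted to a scheduler here.
bash -n "$job_script"

directive_count=$(grep -c '^#SBATCH' "$job_script")
echo "Wrote $job_script with $directive_count #SBATCH directives"
echo "On a Slurm cluster it could be submitted with: sbatch $job_script"
```

The point of the design is that the resource requests travel with the script, so the same file documents what was run and what it was given to run with.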

Videos (watch before class!)

3.1 | Introduction to the Computer Cluster

3.2 | LMOD: An Environment Module System

3.3 | Transferring Data

3.4 | Slurm Workload Manager Basics

In class slides, notes and materials

Day 3 Slides

Windows - where are my files

Homework

Practice sbatch and computing basics

Additional Material/Useful Links (external websites)

rsync for directories (the slash position matters)

Unix Permissions

Slurm (sbatch) Cheatsheet

In Class Exercise Solution

Day 4: Quality Control, Mapping & IGV (Genome Browser Visualization)

Congrats! It is time to now do some real data analysis! The goal of day 4 is to evaluate your sequencing data for its quality. Remember: Garbage In, Garbage out. It is never worth your time to try to analyze garbage. Today we will learn about quality control, mapping, and visualization -- which are the first steps that take place after obtaining your sequencing data. It is important to determine whether your sequencing experiment is successful before moving on to downstream analyses.
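Before running a dedicated QC tool, it can help to see how simple the underlying file is. The sketch below is a hand-rolled sanity check, not a substitute for FastQC: it writes two made-up reads to a scratch FASTQ file, counts the reads (each record is four lines), totals the bases, and confirms that every quality string is as long as its sequence. The read names and sequences are invented for illustration.

```shell
#!/usr/bin/env bash
# Hand-rolled FASTQ sanity check on a tiny, made-up file.
set -eu

fastq=$(mktemp)

# Each FASTQ record is four lines:
# @name, sequence, "+", and one quality character per base.
cat > "$fastq" <<'EOF'
@read1
ACGTACGTAC
+
IIIIIIIIII
@read2
GGGTTTAA
+
IIIIFFFF
EOF

# Number of reads = number of lines divided by four.
total_lines=$(wc -l < "$fastq")
read_count=$(( total_lines / 4 ))

# Sequence lines are the 2nd line of every 4-line record.
total_bases=$(awk 'NR % 4 == 2 { n += length($0) } END { print n + 0 }' "$fastq")

# The quality line (4th of each record) must match the sequence length.
mismatches=$(awk '
    NR % 4 == 2 { seq_len = length($0) }
    NR % 4 == 0 { if (length($0) != seq_len) bad++ }
    END { print bad + 0 }
' "$fastq")

echo "Reads: $read_count"
echo "Total bases: $total_bases"
echo "Records with mismatched quality length: $mismatches"
```

A line count that is not a multiple of four, or any mismatched record, is a sign the file was truncated or corrupted in transfer, which is worth catching before any trimming or mapping.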

Videos (watch before class!)

4.1 | Barcoding

4.2 | FastQC Overview

4.3 | Running FastQC

4.4 | Trimmomatic Overview

4.5 | Running Trimmomatic

4.6 | Introduction to Mapping

4.7 | The HISAT2 mapper

4.8 | Mapping with HISAT2

4.9 | Integrative Genomics Viewer (IGV)

In class slides, notes and materials

QC slides

Mapping and viz

Homework

Homework day 4

Additional Material/Useful Links

fastqc worksheet

trimmomatic worksheet

mapping and IGV worksheet

FastQC Website

MultiQC Website

Day 5: Assessment

Today it's about catching up and making sure we are ready for next week! We will continue with our discussion of read mapping (started on day 4) and take a quick assessment to make sure everyone is comfortable with the first week of the workshop. We will give you a short task which should ideally take less than an hour to complete. If it takes longer, it is a good idea to go home over the weekend and review/practice skills from the first week. The second week will be at a much quicker pace!

Quiz

Assessment (timed!)

Homework

Homework day 5

Additional Material/Useful Links

Assessment worksheet

Day 6: RNA-seq

The goal for day 6 is to begin to analyze RNA-seq data -- the single most commonly obtained short read sequencing data! Today we will focus on the most common pipeline. Since most differential expression programs are written in R, we will focus today on learning how to use R (a statistical computing environment). We will also discuss how to count reads over genes. Note that the videos cover a much larger fraction of the process -- consider it a preview ...

Videos (watch before class!)

6.1 | Intro to R

6.2 | Differential expression overview

6.3 | Counting reads

6.4 | Differential expression with Deseq

6.5 | Isoforms

6.6 | Ballgown

In class worksheets, slides, notes and materials

Using R worksheet

Learning R script

sbatch Feature Counts walk through

Homework

Learning_R_Additional_Practice script

pokemon_data

Additional Material/Useful Links (external websites)

Deseq2 walk through

Day 7: RNA-seq continued

Today we will run DEseq -- the most commonly used program for differential expression assessment. We'll talk about how to interpret results and build quality designs.

Videos (watch before class!)

6.5 | isoforms

6.6 | Ballgown

In class slides, notes and materials

Differential expression with Deseq2

sbatch Deseq2 walkthrough

Homework

In-class Worksheet(s)

In Class Slides

Homework

Additional Material/Useful Links (external websites)

Beginner's Guide to DEseq2 package

DESeq2 Manual

An alternative DEseq tutorial

Day 8: Finish RNA-seq & Single-Cell Sequencing

Today we will wrap up our discussion of differential expression with RNA-seq and discuss a more advanced form of transcriptome sequencing: single cell.

Videos (watch before class!)

8.1 | Nascent Transcription (we may move this to day 10)

8.2 | Single-Cell Sequencing (1/3)

8.3 | Single-Cell Sequencing (2/3)

8.4 | Single-Cell Sequencing (3/3)

In class slides, notes and materials

RNAseq practice

Single cell slides

Single cell worksheet

In-class Worksheet(s)

Homework

Additional Material/Useful Links (external websites)

Azofeifa2017 | FStitch

Azofeifa2017 | Tfit

FStitch Usage | GitHub

Tfit Usage | GitHub

DAStk Usage | GitHub

Day 9: ChIP-seq & ATAC-seq

The goal for day 9 is to discuss more peak-centric sequencing methods. To this end, we will cover the basic analysis of ChIP-seq and ATAC-seq. For this we will learn to use MACS2 and the different settings required in peak calling depending on the type of data. We will also discuss index files, post-sequencing QC, and a few other key concepts.

Videos (watch before class!)

9.1 | Introduction to ChIP-seq

9.2 | Peak Calling with MACS2

9.3 | Evaluating ChIP-seq Data

9.4 | Motif Calling (MEME)

9.5 | Introduction to ATAC-seq

In class slides, notes and materials

QC and ChIP slides

Homework

In-class Worksheet(s)

Walkthrough HMMRATAC

Walkthrough for the program Macs2 which calls peaks

Bam file QC

Additional Material/Useful Links (external websites)

MACS2 Usage | GitHub

BEDTools Documentation | Intersect

Azofeifa2018 | MD Score Analysis

Andrysik2017 | GRO/ChIP/RNA-seq Data used in workshop

Sanchez2018 | Additional ChIP data used in workshop

Day 10: Downstream Analysis

The goal of day 10 is to wrap up and give you pointers on where to go next. We will cover downloading data from ENCODE, some basic BedTools commands, and open biomedical ontologies. We will also discuss nascent sequencing data and how this is distinct (both in data and analysis) from RNA-seq.
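To illustrate the idea behind interval intersection, here is a minimal sketch that uses plain awk as a stand-in for BedTools, so it runs without any extra software. It reports each peak that overlaps at least one gene, which is roughly what `bedtools intersect -u -a peaks.bed -b genes.bed` would report. The peak and gene coordinates are invented, and the nested loop is only sensible for toy-sized files; real data should go through BedTools itself.

```shell
#!/usr/bin/env bash
# Toy interval overlap in awk, standing in for a BedTools intersect.
# All coordinates and names are made up for illustration.
set -eu

peaks=$(mktemp)
genes=$(mktemp)

# BED columns: chromosome, start (0-based), end (exclusive), name.
printf 'chr1\t100\t200\tpeakA\nchr1\t500\t600\tpeakB\nchr2\t100\t200\tpeakC\n' > "$peaks"
printf 'chr1\t150\t400\tgene1\nchr2\t300\t900\tgene2\n' > "$genes"

# First file (NR == FNR): store every gene interval.
# Second file: print a peak's name once if it overlaps any stored gene.
# Two half-open intervals overlap when each starts before the other ends.
overlapping=$(awk -F'\t' '
    NR == FNR {
        chrom[FNR] = $1; start[FNR] = $2 + 0; stop[FNR] = $3 + 0
        n = FNR
        next
    }
    {
        for (i = 1; i <= n; i++) {
            if ($1 == chrom[i] && ($2 + 0) < stop[i] && ($3 + 0) > start[i]) {
                print $4
                break
            }
        }
    }
' "$genes" "$peaks")

echo "Peaks overlapping a gene:"
echo "$overlapping"
```

Here only peakA overlaps a gene: peakB lies beyond the end of gene1, and peakC is on the same chromosome as gene2 but ends before it starts.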

Additional Material/Useful Links (external websites)

Day 2 | Unix

Day 3 | Intro to Servers

Day 4 | QC & Mapping

Day 5 | Assessment

Day 6 | RNA-seq day 1

Day 7 | RNA-seq day 2

Day 8 | Single Cell sequencing

Day 9 | ChIP/ATAC-seq

Day 10 | Downstream Analysis

Short Read Sequencing Workshop | Basics of HTS Data Analysis
