DnA Lab | Short Read Workshop

2019 Workshop Archive

Course Outline: Download PDF

Building: Jennie Smoly Caruthers Biotechnology Building (JSCBB)

Address: 3415 Colorado Ave, Boulder, CO 80303

Room: A108

Time: 9am-12pm (class hours) & 1pm-3pm (helproom hours)

Parking: See here

Contact: Dr. Mary Allen (mary.a.allen AT colorado DOT edu) & Margaret Gruca (margaret.gruca AT colorado DOT edu)

IT support: BIT Help (bit-help AT colorado DOT edu)

Before Day 1 | Tasks to Complete

This course is taught reverse-classroom style. In other words, before each day of the workshop we will post videos for you to watch that go over the premise for the day ahead. In the actual classroom hours, we will then help you execute the tasks described in the videos in a hands-on approach. The idea is that you learn the theory ahead of time, while classroom helpers guide you through the nuts and bolts, creating a supportive environment for those who are relatively new to computing. We encourage you to attend every session, as the content builds on previous days; please email us if you have concerns or questions about attending all 10 days.

For everyone (MacOS/Windows/Linux)

Go to GitHub and register for an account (if you do not already have one). Send your user ID to 'bit-help AT colorado DOT edu' so we can give you access to the AWS server.

Install X2GO Client

X2Go client is open source remote desktop software for Linux systems. In other words, you will be able to use this application for desktop visualization of data stored on the remote system (the AWS server). Our primary use of X2Go will be in visualizing genomic data. See installation instructions for MacOS/Windows here.

Generate an ssh key

You will need to add a public ssh key to your GitHub account. You can view your key by going to the following link https://github.com/USERNAME.keys and replacing USERNAME with your GitHub username. See an example here.

For instructions on generating a key, see here. For instructions on adding this key to your GitHub account, see here. Once you have completed these two steps, confirm that your key was added by visiting https://github.com/USERNAME.keys (the link described above) with USERNAME replaced by your GitHub username.
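The check described above can also be scripted. The following is a minimal Python sketch, not part of the official course material: the function names (`keys_url`, `count_public_keys`, `fetch_key_count`) and the list of key-type prefixes are assumptions made for illustration. It builds the https://github.com/USERNAME.keys address and counts the lines in the response that look like public ssh keys.

```python
from urllib.request import urlopen

# Common OpenSSH public key type prefixes (an assumption of this sketch,
# not an exhaustive list).
KEY_PREFIXES = ("ssh-rsa", "ssh-ed25519", "ecdsa-sha2-", "sk-ssh-ed25519", "sk-ecdsa-sha2-")


def keys_url(username: str) -> str:
    """Build the public-keys URL for a GitHub username."""
    return f"https://github.com/{username}.keys"


def count_public_keys(text: str) -> int:
    """Count the lines of a .keys response that look like ssh public keys."""
    return sum(
        1 for line in text.splitlines() if line.strip().startswith(KEY_PREFIXES)
    )


def fetch_key_count(username: str, timeout: float = 10.0) -> int:
    """Download the .keys page and count the keys (requires network access)."""
    with urlopen(keys_url(username), timeout=timeout) as response:
        return count_public_keys(response.read().decode("utf-8"))
```

Calling `fetch_key_count("USERNAME")` with your own GitHub username should return at least 1 once your key has been added; a result of 0 suggests the key is not yet attached to your account.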

Email your GitHub user ID to BIT help @ 'bit-help AT colorado DOT edu'

For Windows users

In addition to the items above, Windows users will need to install a terminal application. We recommend one of the two following options:

Optional Items

In addition to the items above, we recommend that users go through a basic command-line usage tutorial, which can be found on Codecademy. The first week will move relatively slowly through these basics, but if you find you're having trouble keeping up, we highly recommend completing this tutorial before the second week.

Day 1: Intro to Sequencing and Overview of Pipelines

Cover the basic principles of High-throughput Sequencing (HTS) and the subsequent steps involved in processing this data (analysis pipelines).

Videos

1.1 | Course Overview

1.2 | Illumina Sequencing

1.3 | Library Prep Q & A

1.4 | Overview of Sequencing

Slides

Course Introduction

Sequencing and Data Processing Considerations

Library Quality Control

Additional Material/Useful Links

Day 1 Class Summary

Sims2014 | Sequencing Depth and Coverage Considerations

Head2014 | Library Prep Challenges and Considerations

Reuter2015 | High-Throughput Sequencing Technologies

Day 2: Intro to Unix & Vim/Emacs

Introduction to basic Unix commands and text editors (Vim). A Google image search for "Unix commands" or "Vim commands" will turn up handy cheat sheets for all of these tools.

Videos

2.1 | SSH and VPN Introduction

2.2 | File System/Directory Trees

2.3 | Basic Commands for Moving Files/Directories

2.4 | Remote Rsync/Reading Files

2.5 | Searching/editing Files, Pipes, and Outputs

2.6 | Directory Permissions

2.7 | Vim Tutorial

Slides

Unix/Vim Summary Slides

Homework

Practice Basics of Unix

Additional Material/Useful Links

Unix Cheat Sheet

Day 3: Intro to Servers & Downloading Public Data

Understand the basics of high-performance computing, servers, and job/workload managers. It is also important to know how to access public data and transfer this data to and from the server.

Videos

3.1 | Introduction to the Computer Cluster

3.2 | LMOD: An Environment Module System

3.3 | Transferring Data

3.4 | Slurm Workload Manager Basics

Slides

Introduction to HPC and Workload Managers

Homework

Practice sbatch and computing basics

Additional Material/Useful Links

Slurm (sbatch) Cheatsheet

Day 4: Quality Control, Mapping & IGV (Genome Browser Visualization)

Quality control, mapping, and visualization are the first steps that take place after obtaining your sequencing data. It is important to verify that your sequencing experiment was successful before moving on to downstream analyses.
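To make the idea of read-level quality control concrete, here is a minimal Python sketch of how per-base quality scores are stored in a FASTQ file. It is an illustration only, not how FastQC is implemented. It assumes the common Phred+33 encoding and well-formed four-line records; the function names are hypothetical.

```python
from typing import Iterator, List, Tuple


def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_name, sequence, quality_string) from four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header[1:], seq, qual  # drop the leading '@' from the header


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in qual]


def mean_quality(qual: str) -> float:
    """Average Phred score of one read; 0.0 for an empty quality string."""
    scores = phred_scores(qual)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    example = ["@read1", "ACGT", "+", "IIII"]
    for name, seq, qual in parse_fastq(example):
        print(name, seq, mean_quality(qual))
```

A Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so a score of 40 means roughly one error in 10,000 base calls.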

Videos

4.1 | Barcoding

4.2 | FastQC Overview

4.3 | Running FastQC

4.4 | Trimmomatic Overview

4.5 | Running Trimmomatic

4.6 | Introduction to Mapping

4.7 | Bowtie2 Overview

4.8 | Running Bowtie2

4.9 | Integrative Genomics Viewer (IGV)

Slides

Quality Control, Mapping, & IGV

Homework

4.1 | FastQC homework

4.2 | Trimmomatic homework

4.3 | Mapping homework

4.4 | Additional homework practice

Additional Material/Useful Links

QC Fail

MultiQC : a program that will assess (in batch) all quality control output from a given experiment for user-friendly visualization

Baruzzo2016 | Benchmarking of Sequence Alignment Tools

Day 5: Assessment

Quick assessment to make sure everyone is comfortable with the first four days of the workshop. We will give you a short task which should ideally take less than an hour to complete. If it takes longer, it is a good idea to review and practice skills from the first week over the weekend. The second week will move at a much quicker pace.

Quiz

Day 5 Quiz/Assessment

Additional Material/Useful Links

Sbatch and Software Notes

Day 6: RNA-seq

For RNA-seq, we will cover read counting using featureCounts, isoform analysis using StringTie/Ballgown & creating a custom GTF file from these annotations, and differential expression analysis (DEA) using DESeq2.

Videos

6.1 | HISAT2 Introduction

6.2 | Running HISAT2

6.3 | Differential Expression Analysis (DEA)

6.4 | Read Counting (FeatureCounts)

6.5 | DEA (DESeq2)

6.6 | Isoform Analysis (StringTie)

6.7 | DEA (Ballgown)

Slides

6.1 | R Basics & FeatureCounts

6.2 | DEA

Homework

R, FeatureCounts, DEA Homework

In-class Worksheet(s)

6.1 | HISAT2

6.2 | R Basics & FeatureCounts Worksheet

6.3 | DEA

Additional Material/Useful Links

Day 7: QC cont'd & Nascent Sequencing

We will learn how to assess the quality of our data post-mapping (mostly by analyzing BAM files). We will also learn how to use MultiQC to combine outputs from multiple samples into a single QC report. In the second half of the day, we will learn how to annotate nascent sequencing data, since most existing annotations are based on ChIP (for enhancers) and RNA-seq/steady-state data (for genes). Nascent analyses can capture elements such as intergenic & intragenic transcription regulatory elements, alternative 5'-end RNA polymerase initiation, and 3'-end run-on. We therefore need to be able to capture all of these elements quickly in order to analyze them with methods such as motif displacement analysis, differential transcription analysis, and comparative analyses with RNA/ChIP/ATAC-seq. We will use FStitch to capture these unannotated regions, and we will cover the principles of Tfit and DAStk beforehand in the video as well as in the hands-on portion of the workshop.

Videos

7.1 | Nascent Transcription Analysis

Slides

7.1 | QC & Nascent Analysis

Homework

7.1 | Homework: QC & MD Score Analysis

In-class Worksheet(s)

Nascent Worksheet | QC & FStitch

Additional Material/Useful Links

Azofeifa2017 | FStitch

Azofeifa2017 | Tfit

FStitch Usage | GitHub

Tfit Usage | GitHub

DAStk Usage | GitHub

Day 8: Variant Calling, DNA-seq, & Single-Cell Sequencing

Variant calling using GATK and single-cell sequencing analysis.

Videos

8.1 | DNA-seq & Variant Calling Introduction

8.2 | GATK

8.3 | Single-Cell Sequencing (1/3)

8.4 | Single-Cell Sequencing (2/3)

8.5 | Single-Cell Sequencing (3/3)

Slides

8.1 | GATK

8.2 | Single-Cell Sequencing

In-class Worksheet(s)

8.1 | GATK Worksheet

8.2 | Single-Cell Sequencing Worksheet

Homework

Additional Material/Useful Links

Day 9: ChIP-seq & ATAC-seq

This section will cover the basic analysis of ChIP-seq and ATAC-seq. We will cover peak calling using MACS2, the different settings required in peak calling depending on the type of data, and motif displacement (MD) analysis using DAStk (also covered in Day 7 homework).
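As a rough illustration of the idea behind an MD score, the Python sketch below computes the fraction of motif hits that fall close to a region center out of all hits within a larger surrounding window. This is a simplified sketch, not DAStk's implementation: the function name and the window radii (150 bp and 1500 bp) are assumptions used for illustration, so consult the DAStk documentation for the exact definition and defaults.

```python
from typing import Iterable


def md_score(
    distances: Iterable[int],
    small_radius: int = 150,    # assumed inner window radius (bp)
    large_radius: int = 1500,   # assumed outer window radius (bp)
) -> float:
    """Fraction of motif hits within small_radius among hits within large_radius.

    `distances` are signed offsets (bp) of motif hits from region centers.
    Returns 0.0 when no hit falls inside the outer window.
    """
    in_large = [d for d in distances if abs(d) <= large_radius]
    if not in_large:
        return 0.0
    in_small = sum(1 for d in in_large if abs(d) <= small_radius)
    return in_small / len(in_large)


if __name__ == "__main__":
    # Hypothetical offsets of one motif relative to several region centers.
    offsets = [0, 100, -120, 800, -1400, 2000]
    print(md_score(offsets))
```

In this toy input, three of the five hits inside the outer window lie within the inner window, so the score is 3/5. A score well above the ratio of the two window sizes suggests the motif is concentrated near region centers.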

Videos

9.1 | Introduction to ChIP-seq

9.2 | Peak Calling with MACS2

9.3 | Evaluating ChIP-seq Data

9.4 | Motif Calling (MEME)

9.5 | Introduction to ATAC-seq

Slides

ChIP/ATAC-seq Analysis

Homework

Homework | ChIP-seq Analysis

In-class Worksheet(s)

9.1 | Worksheet MACS2

9.2 | Worksheet Calculating MD Scores

9.3 | Worksheet Differential MD Score Analysis

Additional Material/Useful Links

MACS2 Usage | GitHub

BEDTools Documentation | Intersect

Azofeifa2018 | MD Score Analysis

Andrysik2017 | GRO/ChIP/RNA-seq Data used in workshop

Sanchez2018 | Additional ChIP data used in workshop

Day 10: Downstream Analysis

Day 10 is meant to get you started learning additional tools that might be helpful in downstream analysis of short read data. This year (2019) we will cover downloading data from ENCODE, some basic BEDTools commands, and open biomedical ontologies. Additionally, an advanced homework assignment will go over creating command-line runnable bash scripts with for-loops, if-statements, and user input.
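For intuition about what an interval intersection reports, here is a minimal Python sketch of the overlap logic. It is a conceptual illustration only and is not how BEDTools is implemented; it assumes BED-style 0-based, half-open coordinates, and the function name and example intervals are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def intersect(a_intervals: List[Interval], b_intervals: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every A/B interval pair.

    A simple O(len(A) * len(B)) comparison, fine for small examples.
    """
    hits: List[Interval] = []
    for chrom_a, start_a, end_a in a_intervals:
        for chrom_b, start_b, end_b in b_intervals:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # half-open: touching intervals do not overlap
                hits.append((chrom_a, start, end))
    return hits


if __name__ == "__main__":
    peaks = [("chr1", 100, 200)]
    genes = [("chr1", 150, 250), ("chr1", 200, 300), ("chr2", 100, 200)]
    print(intersect(peaks, genes))
```

Only the first gene overlaps the peak here: the second starts exactly where the peak ends, and the third is on a different chromosome.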

Videos

10.1 | Introduction to R Studio

In-class Worksheet(s)

Worksheet | ENCODE & BEDTools

Homework

Homework | BEDTools Practice

