DnA Lab | Short Read Workshop

2024 Workshop - General Information

*Please note that specific videos and documents may change until a few weeks before the workshop*

You must register to attend this workshop. Please register here.

Building: Jennie Smoly Caruthers Biotechnology Building (JSCBB)

Address: 3415 Colorado Ave, Boulder, CO 80303

Room: A108 (class and office hours)

Time: July 8-12 and July 15-19, 9am-12pm (class hours) & 1pm-3pm (office hours)

Parking: See here

Contact: Dr. Mary Allen (mary.a.allen AT colorado DOT edu) and Dr. Lynn Sanford (lynn.sanford AT colorado DOT edu)

IT support: BIT Help (bit-help AT colorado DOT edu). Please include sr2024 in the subject line on any emails to BIT Help.

GitHub class repository: Dowell-Lab/srworkshop

Before Day 1 | Tasks to Complete

This course is taught in a reverse-classroom style: we post videos for you to watch before each day of the workshop, covering the concepts for the day ahead. During classroom hours, we then help you carry out the tasks described in the videos hands-on. The idea is that you learn the theory ahead of time and have classroom helpers guide you through the nuts and bolts in person, which makes the course approachable for those who are relatively new to computing. We encourage you to attend every session, since each day's content builds on the previous days. Please email us if you have concerns or questions about attending all 10 days.

Before the first day of the workshop, there are a few things you need to do:

For everyone (MacOS/Windows/Linux)

Go to GitHub and register for an account (if you do not have an existing one).

Put your GitHub username in this Google spreadsheet

We will use your GitHub username to give you temporary access to an Amazon Web Services (AWS) server that we will use as the supercomputer for this course.

Watch the videos for the first day of class. Students in the DS2 cohort may skip the 'Sequencing' videos.

Starting on Day 6, we will be using the programming language R. We encourage you to install R and its integrated development environment, RStudio, on your own computer. If you run into major installation problems, we have alternative options available.

R (version 4.4.1)

RStudio

For Windows users

In addition to the items above, Windows users will need to install a terminal application. We recommend installing Windows Subsystem for Linux (WSL) with Ubuntu as an app interface.

Windows 11 instructions: How to install Bash on Windows 11; Configuring Windows Terminal
Windows 10 instructions: Bash for Windows with Ubuntu
Troubleshooting:
- The first thing to check if you're having problems is whether WSL is enabled. Open your Start menu and type "Turn Windows features on or off" until the Control Panel shows up, then click on it. Scroll down and make sure that "Virtual Machine Platform" and "Windows Subsystem for Linux" are both checked. You may need to restart your computer afterward. Then try to continue with the installation.
- If you are still having trouble, we recommend you come a half hour early to the first day of the workshop. There will be someone on hand to help you.

Optional Items

In addition to the items above, we recommend that beginners go through a basic command-line tutorial, which can be found on Codecademy. The first week will move relatively slowly through these basics, but if you find you're having trouble keeping up, we highly recommend completing the tutorial before the second week.

Day 1 | Intro to Sequencing and Overview of Pipelines

The goal of the first day is to get oriented to the class, the AWS supercomputer, and sequencing data. We will present an overview of the course and discuss the basic principles of high-throughput sequencing (HTS) and the subsequent steps involved in processing this data (analysis pipelines). We will confirm that everyone can access the AWS instance and begin learning basic Bash, Git/GitHub, and the Vim text editor.

Videos (watch before class!)

1.1 | Course Overview

1.2 | Sequencing: Creating Libraries

1.3 | Sequencing: Pre-sequencing library quality

1.4 | Sequencing: Illumina Sequencing

1.5 | Sequencing: Designing sequencing experiments

1.6 | Basic Unix: File System/Directory Trees

1.7 | Basic Unix: Commands for Moving Files/Directories

1.8 | Basic Unix: Vim Tutorial

Link to class materials

GitHub Dowell-Lab/srworkshop day01

Day 2 | Intro to Linux & Vim

The goal of day 2 is to learn basic Linux/Unix commands for managing files and to work more with a text editor (Vim). You can search online for Linux/Unix and Vim commands and use the handy cheat-sheets (below) for all of these tools.

Videos (watch before class!)

2.1 | SSH and VPN Introduction

2.2 | Remote Rsync/Reading Files

2.3 | Searching/editing Files, Pipes, and Outputs

2.4 | Directory Permissions

Link to class materials

GitHub Dowell-Lab/srworkshop day02

Day 3 | Intro to Servers, Public Data & Quality Control

The goal of day 3 is to start thinking in supercomputer terms. Today we will learn the basics of high-performance computing, servers, and job/workload managers. It is also important to know how to access public data, transfer data to and from the server, and run some basic quality control. Today we start using real data! We will evaluate sequencing data for its quality. Remember: garbage in, garbage out. It is never worth your time to try to analyze garbage.
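To make the idea of read quality concrete, here is a minimal Python sketch that computes the mean Phred score of each read in a FASTQ file. It is only an illustration of what a quality score is, not what FastQC does internally. The two records are made up, the function names are hypothetical, and the Phred+33 encoding is an assumption (it is the usual encoding for current Illumina data).

```python
# Two made-up FASTQ records (4 lines each); these are not workshop data.
FASTQ_TEXT = """@read1
ACGTACGT
+
IIIIIIII
@read2
ACGTACGT
+
IIII####
"""


def parse_fastq(text):
    """Yield (name, sequence, quality_string) for each 4-line FASTQ record."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        name, seq, _plus, qual = lines[i:i + 4]
        yield name[1:], seq, qual  # drop the leading '@' from the read name


def mean_phred(qual, offset=33):
    """Average Phred score of a quality string, assuming Phred+33 encoding."""
    scores = [ord(ch) - offset for ch in qual]
    return sum(scores) / len(scores)


for name, seq, qual in parse_fastq(FASTQ_TEXT):
    print(f"{name}\t{len(seq)} bp\tmean Q = {mean_phred(qual):.1f}")
```

In this toy input, read1 has a mean quality of 40.0 and read2 drops to 21.0 because its last four bases have a score of 2.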

Videos (watch before class!)

3.1 | Introduction to the Computer Cluster

3.2 | LMOD: An Environment Module System

3.3 | Transferring Data

3.4 | Slurm Workload Manager Basics

3.5 | FastQC Overview

3.6 | Running FastQC

Link to class materials

GitHub Dowell-Lab/srworkshop day03

Day 4 | Trimming, Mapping & IGV (Genome Browser Visualization)

Congrats! It is now time to do some real data analysis! Today we will do more quality control and learn about mapping and visualization. These are the first steps that take place after obtaining your sequencing data. It is important to determine whether your sequencing experiment was successful before moving on to downstream analyses.
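As a rough picture of what quality trimming does, the Python sketch below cuts a read at the first window whose mean quality falls below a threshold. This is a simplified illustration of the sliding-window idea, not Trimmomatic's actual implementation. The function name, the default window size and threshold, and the example read are all assumptions made for the example.

```python
def sliding_window_trim(seq, qual, window=4, min_mean_q=15, offset=33):
    """Cut the read at the start of the first low-quality window.

    Scans from the 5' end. Assumes Phred+33 quality encoding.
    Reads shorter than the window are returned unchanged.
    """
    scores = [ord(ch) - offset for ch in qual]
    for start in range(0, len(scores) - window + 1):
        win = scores[start:start + window]
        if sum(win) / window < min_mean_q:
            return seq[:start], qual[:start]
    return seq, qual


# Made-up read: six high-quality bases (Q40) followed by four poor ones (Q2).
trimmed_seq, trimmed_qual = sliding_window_trim("ACGTACGTAC", "IIIIII####")
print(trimmed_seq, trimmed_qual)
```

For this read the first failing window starts at the sixth base, so the sketch keeps the first five bases (ACGTA).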

Videos (watch before class!)

4.1 | Trimmomatic Overview

4.2 | Running Trimmomatic

4.3 | Introduction to Mapping

4.4 | The HISAT2 mapper

4.5 | Mapping with HISAT2

4.6 | Integrative Genomics Viewer (IGV)

Link to class materials

GitHub Dowell-Lab/srworkshop day04

Day 5 | Assessment

Today is about catching up and making sure we are ready for next week! We will cover loops and how to convert a big BAM file into something small enough to be transferred quickly and visualized. We will also take an assessment to make sure everyone is comfortable with the first week of the workshop. We will give you a short task that should ideally take less than an hour to complete. If it takes longer, it is a good idea to review and practice skills from the first week over the weekend. The second week will move at a much quicker pace!
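The looping idea can be sketched in Python as follows: apply the same conversion step to every BAM file by building one command per file. The sample file names are hypothetical, and using `bedtools genomecov -ibam ... -bg` to produce a smaller bedGraph coverage file is an assumption for illustration; the class may use a different tool or output format. The sketch only prints the commands and does not run them.

```python
import shlex
from pathlib import Path

# Hypothetical file names, for illustration only.
bam_files = [Path("sampleA.sorted.bam"), Path("sampleB.sorted.bam")]


def coverage_command(bam):
    """Return (command, output_path) for summarizing one BAM as a bedGraph."""
    out = bam.with_suffix(".bedGraph")  # replaces only the final '.bam'
    cmd = ["bedtools", "genomecov", "-ibam", str(bam), "-bg"]
    return cmd, out


# One pass through the loop per sample: same step, different input and output.
for bam in bam_files:
    cmd, out = coverage_command(bam)
    print(f"{shlex.join(cmd)} > {out}")
```

Deriving the output name from the input name inside the loop is the main point here, since it keeps each sample's result in its own file.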

Class slides

Looping, Visualization, Assessment

Link to class materials

GitHub Dowell-Lab/srworkshop day05

Day 6 | RNA-seq: R & Read Counting

The goal for this week is to delve deeper into actual data analysis. Since most differential expression programs are written in R, today we will focus on learning how to use R (a statistical computing language/environment).

Videos (watch before class!)

6.1 | Intro to R

Link to class materials

GitHub Dowell-Lab/srworkshop day06

Mini-Project A | Single Cell RNA-seq

The goal for this project is to learn how to analyze single cell RNA-seq data. We'll first run through the initial analysis steps you've already learned on a single-cell dataset, then we'll apply our knowledge of R from yesterday to count reads, perform linear and non-linear dimensional reduction, cluster cells, and categorize cell type-specific gene expression. Through all of these steps we'll talk about how to interpret results and represent data.

Day 7 | Mapping and Filtering

Videos (watch before class!)

A7.2 | Single Cell Sequencing Overview

A7.1 | Single Cell Sequencing Analysis and Seurat

Link to class materials

GitHub Dowell-Lab/srworkshop projectA day07

Day 8 | Dimensional Reduction, Clustering, and Cell Type Annotation

Videos (watch before class!)

A8.1 | Clustering and Cell Type Identification

A8.2 | Single Cell Cell Type Annotations

Link to class materials

GitHub Dowell-Lab/srworkshop projectA day08

Day 9 | Cell Types, Gene Expression, and Advanced Analysis

Videos (watch before class!)

A9.1 | An Introduction to Cell Chat

Link to class materials

GitHub Dowell-Lab/srworkshop projectA day09

Mini-Project B | Multi-omics: RNA-seq/ChIP-seq/ATAC-seq

The goal for this project is to learn how to analyze RNA-seq data in depth and then integrate the results with ChIP-seq and ATAC-seq data. We'll first count reads and analyze them with DESeq2, then call peaks and scan for motifs in ChIP-seq data, and finally integrate the datasets with BEDTools and advanced plotting. Through all of these steps we'll talk about how to interpret results and represent data.

Day 7 | Counting and DESeq2

B7.4 | Multifactor Designs in DESeq2 (Optional)

Link to class materials

GitHub Dowell-Lab/srworkshop projectB day07

Day 8 | ChIP-seq - Peak Calling and Motif Scanning

B8.5 | ATAC-seq Analysis (Optional)

Link to class materials

GitHub Dowell-Lab/srworkshop projectB day08

Day 9 | Integrating Data with BEDTools and Advanced Plotting

Videos (watch before class!)

B9.1 | Introduction to BEDTools

Link to class materials

GitHub Dowell-Lab/srworkshop projectB day09

Day 10 | Downstream Analysis

The goal of day 10 is to wrap up and give you pointers on where to go next. We will cover downloading data from public databases, working with GitHub, and when to make your own tools/pipelines.

Videos (watch before class!)

10.1 | Git and GitHub (external video)

10.2 | Nextflow pipelines (external video)

Link to class materials

GitHub Dowell-Lab/srworkshop day10

Short Read Sequencing Workshop | Basics of HTS Data Analysis

Mini-Project B | Multi-omics: RNA-seq/ChIP-seq/ATAC-seq

Day 7 | Counting and DESeq2

Videos (watch before class!)

B7.1 | Counting Reads