2022 Workshop Archive


You must register to attend this workshop. Please register here


Building: Jennie Smoly Caruthers Biotechnology Building (JSCBB)

Address: 3415 Colorado Ave, Boulder, CO 80303

Room: A104 (class) and B231 (office hours)

Time: July 11-15 & 18-22, 9am-12pm (class hours, A104) & 1pm-3pm (office hours, B231)

Parking: See here

Contact: Dr. Mary Allen (mary.a.allen AT colorado DOT edu) and Dr. Lynn Sanford (lynn.sanford AT colorado DOT edu)

IT support: BIT Help (bit-help AT colorado DOT edu). Please include sr2022 in the subject line on any emails to BIT Help.


Before Day 1 | Tasks to Complete


This course is taught reverse-classroom style. Before each day of the workshop, we will post videos that cover the premise for the day ahead. During classroom hours, we will help you carry out the tasks described in the videos hands-on. The idea is that you learn the theory ahead of time and then have classroom helpers guide you through the nuts and bolts, which makes the class approachable for those who are relatively new to computing. The content builds on previous days, so we encourage you to attend every session; please email us if you have concerns or questions about attending all 10 days.

For everyone (MacOS/Windows/Linux)

  1. Sign up for a free GitHub account
     Go to GitHub and register for an account (if you do not have an existing one).

  2. Put your GitHub username in this Google spreadsheet
     We will use your GitHub username to give you temporary access to a supercomputer, an Amazon Cloud (AWS) server.

For Windows users

In addition to the items above, Windows users will need to install a terminal application. We recommend installing Linux Bash Shell with Ubuntu.

Optional Items

  • In addition to the items above, we recommend that users go through a basic command-line tutorial, which can be found on Codecademy. The first week will move relatively slowly through these basics, but if you find you're having trouble keeping up, we highly recommend completing the tutorial before the second week.
  • Download and install IGV. You will need to do this for Day 4.
  • Download and install R and RStudio. You will need to do this for Day 6.

Day 1 | Intro to Sequencing and Overview of Pipelines


The goal of the first day is to get oriented to the class, the AWS supercomputer, and sequencing data. We will present an overview of the course and discuss the basic principles of high-throughput sequencing (HTS) and the subsequent steps involved in processing this data (analysis pipelines). We will confirm that everyone can access the AWS instance and begin learning the Vim text editor.

Videos (watch before class!)

1.1 | Course Overview
1.2 | Sequencing: Creating Libraries
1.3 | Sequencing: Pre-sequencing library quality
1.4 | Sequencing: Illumina Sequencing
1.5 | Sequencing: Designing sequencing experiments
1.6 | Basic Unix: File System/Directory Trees
1.7 | Basic Unix: Commands for Moving Files/Directories
1.8 | Basic Unix: Vim Tutorial

Class slides

Libraries and sequencing

Class worksheets and notes

Logging onto a supercomputer
Configuring SSH Keys (if you are using AWS)
Logging onto Google Shell
Using vim tutor
Vim crash 911
Creating a variable in bash

Homework

Homework day 1
Library QC challenge
Library QC challenge - answers

Additional material / Useful links

Basic info about file types in the class
Illumina sequencing technology
Illumina video
More information about barcoding
Library kit considerations
Illumina platform comparison

Day 2 | Intro to Linux & Vim


The goal of day 2 is to learn basic Linux/Unix commands for managing files and to get more practice with a text editor (Vim). You can look up Linux/Unix and Vim commands with a web search, and the cheat sheets below cover all of these tools.
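As a rough illustration of the kind of file management covered today, the sketch below builds a small directory tree, copies and moves a file, chains commands with a pipe, and makes a script executable. All file and directory names are made up for this example and are not taken from the class worksheets.

```shell
#!/usr/bin/env bash
# Hypothetical practice session; every name below is invented for illustration.
set -euo pipefail

workdir=$(mktemp -d)        # scratch directory so nothing real is touched
cd "$workdir"

# Build a small directory tree and create a text file in it.
mkdir -p project/raw project/scripts
printf 'chr1\nchr2\nchr1\nchrX\n' > project/raw/chroms.txt

# Copy a file, then move (rename) the copy.
cp project/raw/chroms.txt project/raw/chroms_backup.txt
mv project/raw/chroms_backup.txt project/chroms_copy.txt

# Pipe: sort the lines, drop duplicates, and count what is left.
unique_count=$(sort project/raw/chroms.txt | uniq | wc -l | tr -d ' ')
echo "unique chromosomes: $unique_count"

# Redirect the output of a pipe into a new file.
sort project/raw/chroms.txt | uniq > project/unique_chroms.txt

# Write a tiny script and give its owner execute permission.
printf '#!/usr/bin/env bash\necho "hello from a script"\n' > project/scripts/hello.sh
chmod u+x project/scripts/hello.sh
ls -l project/scripts/hello.sh          # the owner permissions now include x

# Run the script through bash (./project/scripts/hello.sh also works once it is executable).
greeting=$(bash project/scripts/hello.sh)
echo "$greeting"
```

Running the commands one at a time in a terminal, with an `ls` after each step, is a good way to see what each command changes.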

Videos (watch before class!)

2.1 | SSH and VPN Introduction
2.2 | Remote Rsync/Reading Files
2.3 | Searching/editing Files, Pipes, and Outputs
2.4 | Directory Permissions

Class slides

Intro to Linux/Unix

Class worksheets

Part 1: Basic Unix Skills
Part 2: Writing a Script

Homework

Homework Day 2

Additional material / Useful links

Linux/Unix cheatsheet
Vim cheatsheet

Day 3 | Intro to Servers, Public Data & Quality Control


The goal of day 3 is to start thinking in terms of supercomputers. Today we will learn the basics of high-performance computing, servers, and job/workload managers. It is also important to know how to access public data, how to transfer data to and from the server, and how to run some basic quality control. Today we start using real data! We will evaluate your sequencing data for its quality. Remember: garbage in, garbage out. It is never worth your time to try to analyze garbage.
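To give a feel for what a job script looks like, here is a minimal sketch of a Slurm batch script that runs a very basic sanity check on a FASTQ file (each read occupies four lines, so the line count should be a multiple of four). The job name, resource values, and the toy FASTQ content are all assumptions chosen for illustration; the right values depend on your cluster and your data, and this check is not a substitute for FastQC.

```shell
#!/usr/bin/env bash
# The #SBATCH lines are read by Slurm when the script is submitted with sbatch;
# when the script is run directly with bash they are ordinary comments.
# All values below are placeholders.
#SBATCH --job-name=fastq_check
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --mem=1G
#SBATCH --output=fastq_check_%j.out

set -euo pipefail

# Toy FASTQ file with two reads, invented for this example.
fastq=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nII#I\n' > "$fastq"

# A FASTQ record is four lines: header, sequence, separator, qualities.
line_count=$(wc -l < "$fastq" | tr -d ' ')
if (( line_count % 4 != 0 )); then
    echo "line count is not a multiple of 4; the file may be truncated" >&2
    exit 1
fi

read_count=$(( line_count / 4 ))
echo "reads: $read_count"
```

On a cluster, a script like this would typically be saved to a file (say, the hypothetical `fastq_check.sh`), submitted with `sbatch fastq_check.sh`, and monitored with `squeue -u $USER`. For real quality control, the toy check would be replaced by a FastQC call such as `fastqc -o <output directory> <file>`, after loading whatever module provides FastQC on your system.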

Videos (watch before class!)

3.1 | Introduction to the Computer Cluster
3.2 | LMOD: An Environment Module System
3.3 | Transferring Data
3.4 | Slurm Workload Manager Basics
3.5 | FastQC Overview
3.6 | Running FastQC

Class slides

Clusters and SLURM

Class worksheets and notes

Part 1: Downloading data with sbatch
Part 2: Running FastQC on the compute nodes
Windows - where are my files?

Homework

Homework Day 3
Downloading/Running IGV

Additional material / Useful links

rsync for directories (the slash position matters)
Unix Permissions
Slurm (sbatch) Cheatsheet
FastQC Website
Previous FastQC worksheet (more detail if needed)

Day 4 | Trimming, Mapping & IGV (Genome Browser Visualization)


Congrats! It is now time to do some real data analysis! Today we will do more quality control and learn about mapping and visualization. These are the first steps that take place after obtaining your sequencing data. It is important to determine whether your sequencing experiment was successful before moving on to downstream analyses.

Videos (watch before class!)

4.1 | Trimmomatic Overview
4.2 | Running Trimmomatic
4.3 | Introduction to Mapping
4.4 | The HISAT2 mapper
4.5 | Mapping with HISAT2
4.6 | Integrative Genomics Viewer (IGV)

Class slides

Trimming and mapping

Class worksheets

Trimmomatic worksheet
Mapping/IGV worksheet

Homework

Homework day 4

Additional material / Useful links

HISAT2 Manual
Samtools Manual
IGV Manual
MultiQC Website

X2Go allows you to log into a visualization node on a supercomputer and run programs such as IGV there. We will not go into detail on X2Go in this class, but the following information may be useful if you ever need to use it.

Install X2Go

Using X2Go to log on to a supercomputer
Using X2Go to log on to a supercomputer (Windows users)

Day 5 | Assessment


Today is about catching up and making sure we are ready for next week! We will convert a big BAM file into something small enough to be transferred quickly and visualized. We will also take a quick assessment to make sure everyone is comfortable with the first week of the workshop. We will give you a short task that should ideally take less than an hour to complete. If it takes longer, it is a good idea to review and practice skills from the first week over the weekend. The second week will move at a much quicker pace!

Class slides

TDFs for visualization

Class worksheets

TDF Worksheet and Assessment

Homework

Homework day 5

Day 6 | RNA-seq: R & Read Counting


The goal for day 6 is to begin analyzing RNA-seq data, the single most commonly obtained type of short-read sequencing data! We will focus on the most common pipeline. Since most differential expression programs are written in R, today we will focus on learning how to use R (a statistical computing environment). We will also discuss how to count reads over genes.
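To make the idea of counting reads over genes concrete, here is a toy sketch that counts how many read intervals overlap each gene interval using awk. This is not how featureCounts works internally; it ignores strand, splicing, and reads that overlap several genes, and the gene and read coordinates are invented for illustration. Coordinates are treated as BED-style half-open intervals, which is also an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Toy read counting; all genes, reads, and coordinates are made up.
set -euo pipefail

genes=$(mktemp)   # columns: chromosome, start, end, gene name
reads=$(mktemp)   # columns: chromosome, start, end

printf 'chr1\t100\t200\tgeneA\nchr1\t500\t900\tgeneB\nchr2\t100\t200\tgeneC\n' > "$genes"
printf 'chr1\t150\t180\nchr1\t190\t220\nchr1\t600\t650\nchr1\t300\t350\nchr2\t10\t50\n' > "$reads"

# First file (genes): store every gene. Second file (reads): compare each
# read with each gene and add one to the gene count when they overlap.
# Two half-open intervals overlap when each one starts before the other ends.
counts=$(awk -F'\t' '
    NR == FNR {
        n++
        gchrom[n] = $1; gstart[n] = $2 + 0; gend[n] = $3 + 0
        gname[n] = $4;  hits[n] = 0
        next
    }
    {
        for (i = 1; i <= n; i++)
            if ($1 == gchrom[i] && ($2 + 0) < gend[i] && ($3 + 0) > gstart[i])
                hits[i]++
    }
    END {
        for (i = 1; i <= n; i++)
            printf "%s\t%d\n", gname[i], hits[i]
    }
' "$genes" "$reads")

printf '%s\n' "$counts"
```

The result is a small table with one row per gene, which is the same basic shape as the count matrix that differential expression tools take as input.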

Videos (watch before class!)

6.1 | Intro to R
6.2 | Counting reads

Class slides

Intro to R and Read Counting

Class worksheets and scripts

Learning R worksheet
Learning R script
featureCounts worksheet
featureCounts R script

Homework

Homework script - Learning R
pokemon_data

Day 7 | RNA-seq: Differential Expression


Today we will run DESeq2, the most commonly used program for differential expression assessment. We'll talk about how to interpret results and build quality designs.

Videos (watch before class!)

7.1 | Differential expression overview
7.2 | Differential expression with DESeq

Class slides

Differential expression with DESeq2

Class worksheets and scripts

DESeq2 worksheet
DESeq2 script

Homework

DESeq2 homework

Additional Material / Useful Links

Beginner's Guide to DESeq2 package
DESeq2 Manual
DESeq2 walk through
An alternative DESeq tutorial
Isoform read counting with Stringtie
Isoform DE with Ballgown

Day 8 | RNA-seq: Advanced DESeq2 Methods


Today we will get more practice with DESeq2, discuss more subtleties of the analysis, and create advanced condition designs to address them.

Videos (watch before class!)

8.1 | Multifactor Designs in DESeq2

Class slides

Multifactor Designs in DESeq2

Class worksheets

Worksheet day 8

Homework

Homework day 8

Day 9 | ChIP-seq, ATAC-seq & Peak Calling


The goal for day 9 is to discuss more peak-centric sequencing methods. To this end, we will cover the basic analysis of ChIP-seq and ATAC-seq. We will learn to use MACS2 and the different peak-calling settings required for each type of data. We will also discuss index files, post-sequencing QC, bedtools, and a few other key concepts.

Videos (watch before class!)

9.1 | Introduction to ChIP-seq
9.2 | Peak Calling with MACS2
9.3 | Evaluating ChIP-seq Data
9.4 | Motif Calling (MEME)
9.5 | Introduction to ATAC-seq
9.6 | Introduction to BEDTools

Class slides

Peak calling and bedtools slides

Class worksheets

Processing ChIP-seq
MACS2
Bedtools

Homework

Homework - MACS2
Homework - Bedtools

Additional material / Useful links

MACS2 peak caller
HMMRATAC peak caller
BEDTools Documentation Homepage
Azofeifa2018 | MD Score Analysis
Andrysik2017 | GRO/ChIP/RNA-seq Data used in workshop

Day 10 | Downstream Analysis


The goal of day 10 is to wrap up and give you pointers on where to go next. We will cover downloading data from public databases, working with GitHub, and when to make your own tools/pipelines.

Videos (watch before class!)

10.1 | Git and GitHub (external video)
10.2 | Nextflow pipelines (external video)

Class slides

Big Data Ethics

Class worksheets

Worksheet - Choosing Slurm parameters
Worksheet - Git and GitHub
Worksheet - Short read analysis best practices
Worksheet - Getting genes for GO and GSEA
Worksheet - Gene ontology analysis
Worksheet - Running GSEA

Class data files

Background genes for GO
Significant genes for GO
Ranked gene list for GSEA

Additional material / Useful links

Worksheet - Downloading public data
Worksheet - What supercomputer can I use?
Worksheet - Nascent and Tfit
Other tools for specific analysis steps
Reactome Pathway Analyzer
Gene Ontology (GO)
The OBO Foundry
ENCODE
Official Bash Documentation
Filetypes FAQ
Useful tools I've found
GitHub tutorial
Nextflow Documentation
Azofeifa2017 | FStitch
Azofeifa2017 | Tfit
FStitch Usage | GitHub
Tfit Usage | GitHub
DAStk Usage | GitHub