2022 Workshop Archive
You must register to attend this workshop. Please register here
Building: Jennie Smoly Caruthers Biotechnology Building (JSCBB)
Address: 3415 Colorado Ave, Boulder, CO 80303
Room: A104 (class) and B231 (office hours)
Time: July 11-15 & 18-22, 9am-12pm (class hours, A104) & 1pm-3pm (office hours, B231)
Parking: See here
Contact: Dr. Mary Allen (mary.a.allen AT colorado DOT edu) and Dr. Lynn Sanford (lynn.sanford AT colorado DOT edu)
IT support: BIT Help (bit-help AT colorado DOT edu). Please include sr2022 in the subject line on any emails to BIT Help.
Before Day 1 | Tasks to Complete
This course is taught reverse-classroom (flipped) style. In other words, we will post videos
for you to watch before each day of the workshop that cover the premise
for the day ahead. During classroom hours, we will then help you carry out
the tasks described in the videos in a hands-on way. The idea is that you
learn the theory ahead of time, and classroom helpers then guide you through the nuts
and bolts, which creates a supportive environment for those who are relatively new to computing.
We encourage you to attend every session, since the content builds on
previous days. Please email us if you have concerns or questions about attending all 10 days.
For everyone (macOS/Windows/Linux)
- Sign up for a free GitHub account
Go to GitHub and register for
an account (if you do not have an existing one).
- Put your GitHub username in this Google spreadsheet
We will use your GitHub username to give you temporary access to a supercomputer:
an Amazon Web Services (AWS) cloud server.
For Windows users
In addition to the items above, Windows users will need to install a terminal
application. We recommend installing Linux Bash Shell with Ubuntu.
Optional Items
- In addition to the items above, we recommend working through a basic
command-line usage tutorial, which can be found on Codecademy.
The first week will move relatively slowly through these basics, but if you find
you're having trouble keeping up, we highly recommend completing it before the second week.
- Download and install IGV.
You will need to do this for Day 4.
- Download and install R and RStudio.
You will need to do this for Day 6.
Day 1 | Intro to Sequencing and Overview of Pipelines
The goal of the first day is to get oriented to the class, the AWS supercomputer,
and sequencing data. We will present an overview of the course and discuss the basic
principles of high-throughput sequencing (HTS) and the subsequent steps involved in
processing this data (analysis pipelines). We will confirm that everyone can
access the AWS instance and begin learning the Vim text editor.
Videos (watch before class!)
1.1 | Course Overview
1.2 | Sequencing: Creating Libraries
1.3 | Sequencing: Pre-sequencing library quality
1.4 | Sequencing: Illumina Sequencing
1.5 | Sequencing: Designing sequencing experiments
1.6 | Basic Unix: File System/Directory Trees
1.7 | Basic Unix: Commands for Moving Files/Directories
1.8 | Basic Unix: Vim Tutorial
Class slides
Libraries and sequencing
Class worksheets and notes
Logging onto a super computer
Configuring SSH Keys (if you are using AWS)
Logging onto Google Shell
Using vim tutor
Vim crash 911
Creating a variable in bash
Homework
Homework day 1
Library QC challenge
Library QC challenge - answers
Additional material / Useful links
Basic info about file types in the class
Illumina sequencing technology
Illumina video
More information about barcoding
Library kit considerations
Illumina platform comparison
Day 2 | Intro to Linux & Vim
The goal of day 2 is to learn some basic Linux/Unix commands for managing files
and to work more with a text editor (Vim).
You can search online for Linux/Unix and Vim commands and use the handy cheat sheets
(below) for all of these tools.
Videos (watch before class!)
2.1 | SSH and VPN Introduction
2.2 | Remote Rsync/Reading Files
2.3 | Searching/editing Files, Pipes, and Outputs
2.4 | Directory Permissions
Class slides
Intro to Linux/Unix
Class worksheets
Part 1: Basic Unix Skills
Part 2: Writing a Script
Homework
Homework Day 2
Additional material / Useful links
Linux/Unix cheatsheet
Vim cheatsheet
Day 3 | Intro to Servers, Public Data & Quality Control
The goal of day 3 is to start thinking in supercomputer terms. Today we will
learn the basics of high-performance computing, servers, and job/workload managers.
It is also important to know how to access public data, how to transfer this data to and
from the server, and how to run some basic quality control. Today we start using real
data! We will evaluate sequencing data for its quality. Remember: garbage in,
garbage out. It is never worth your time to try to analyze garbage.
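To make the idea of read quality concrete, here is a minimal Python sketch (not part of the workshop materials, and not how FastQC is implemented) that computes the mean Phred score of each read in a FASTQ file. It assumes the common Phred+33 quality encoding and simple 4-line records. The function names and the toy reads are hypothetical and invented for illustration.

```python
from statistics import mean


def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def mean_quality_per_read(fastq_text):
    """Return {read_name: mean Phred score} for simple 4-line FASTQ records."""
    lines = fastq_text.strip().splitlines()
    result = {}
    for i in range(0, len(lines), 4):
        name = lines[i][1:]        # drop the leading '@'
        quality = lines[i + 3]     # 4th line of each record holds the qualities
        result[name] = mean(phred_scores(quality))
    return result


# Toy records invented for illustration: 'I' encodes Phred 40, '!' encodes Phred 0.
EXAMPLE_FASTQ = """@read1
ACGT
+
IIII
@read2
ACGT
+
!!II
"""

means = mean_quality_per_read(EXAMPLE_FASTQ)
for read_name, score in means.items():
    print(read_name, score)
```

A read such as read2, whose average quality is dragged down by low-scoring bases, is the kind of "garbage" that quality control is meant to flag before any downstream analysis.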
Videos (watch before class!)
3.1 | Introduction to the Computer Cluster
3.2 | LMOD: An Environment Module System
3.3 | Transferring Data
3.4 | Slurm Workload Manager Basics
3.5 | FastQC Overview
3.6 | Running FastQC
Class slides
Clusters and SLURM
Class worksheets and notes
Part 1: Downloading data with sbatch
Part 2: Running FastQC on the compute nodes
Windows - where are my files?
Homework
Homework Day 3
Downloading/Running IGV
Additional material / Useful links
rsync for directories (the slash position matters)
Unix Permissions
Slurm (sbatch) Cheatsheet
FastQC Website
Previous FastQC worksheet (more detail if needed)
Day 4 | Trimming, Mapping & IGV (Genome Browser Visualization)
Congrats! It is now time to do some real data analysis! Today we will do more quality
control and learn about mapping and visualization. These are the first steps that take
place after obtaining your sequencing data. It is important to determine whether your
sequencing experiment was successful before moving on to downstream analyses.
Videos (watch before class!)
4.1 | Trimmomatic Overview
4.2 | Running Trimmomatic
4.3 | Introduction to Mapping
4.4 | The HISAT2 mapper
4.5 | Mapping with HISAT2
4.6 | Integrative Genomics Viewer (IGV)
Class slides
Trimming and mapping
Class worksheets
Trimmomatic worksheet
Mapping/IGV worksheet
Homework
Homework day 4
Additional material / Useful links
HISAT2 Manual
Samtools Manual
IGV Manual
MultiQC Website
X2Go allows you to log into a visualization node on a supercomputer and run programs such as IGV there. We will not go into detail on X2Go in this class, but the following information may be useful if you ever need it.
Install X2Go
Using X2Go to log on to a supercomputer
Using X2Go to log on to a supercomputer (Windows users)
Day 5 | Assessment
Today is about catching up and making sure we are ready for next week! We will
convert a big BAM file into something small enough to be transferred quickly and visualized.
We will also take a quick assessment to make sure everyone is comfortable with the first
week of the workshop. We will give you a short task that should ideally take less than
an hour to complete. If it takes longer, it is a good idea to use the weekend
to review and practice skills from the first week. The second week will move at a much quicker
pace!
Class slides
TDFs for visualization
Class worksheets
TDF Worksheet and Assessment
Homework
Homework day 5
Day 6 | RNA-seq: R & Read Counting
The goal for day 6 is to begin analyzing RNA-seq data, the single most commonly
obtained type of short-read sequencing data! We will focus on the most common pipeline.
Since most differential expression programs are written in R, today we will focus
on learning how to use R (a statistical computing environment). We will also discuss
how to count reads over genes.
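As a rough illustration of what "counting reads over genes" means, the following Python sketch tallies how many read intervals overlap each gene interval. It is a toy model only: it ignores strand, exons, and ambiguous or multi-mapping reads, and it is not how featureCounts works internally. The gene names, coordinates, and function names are hypothetical, and intervals are assumed to be half-open (start included, end excluded).

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def count_reads_per_gene(genes, reads):
    """Count reads overlapping each gene.

    genes: list of (name, chrom, start, end)
    reads: list of (chrom, start, end)
    """
    counts = {name: 0 for name, _, _, _ in genes}
    for r_chrom, r_start, r_end in reads:
        for name, g_chrom, g_start, g_end in genes:
            if r_chrom == g_chrom and overlaps(r_start, r_end, g_start, g_end):
                counts[name] += 1
    return counts


# Hypothetical annotation and mapped reads, invented for illustration.
GENES = [("geneA", "chr1", 100, 200), ("geneB", "chr1", 500, 600)]
READS = [
    ("chr1", 90, 140),   # overlaps geneA
    ("chr1", 150, 190),  # overlaps geneA
    ("chr1", 550, 600),  # overlaps geneB
    ("chr2", 120, 170),  # different chromosome, not counted
    ("chr1", 200, 250),  # starts exactly where geneA ends, not counted
]

counts = count_reads_per_gene(GENES, READS)
print(counts)
```

The resulting table of counts per gene is the kind of input that differential expression programs work from.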
Videos (watch before class!)
6.1 | Intro to R
6.2 | Counting reads
Class slides
Intro to R and Read Counting
Class worksheets and scripts
Learning R worksheet
Learning R script
featureCounts worksheet
featureCounts R script
Homework
Homework script - Learning R
pokemon_data
Day 7 | RNA-seq: Differential Expression
Today we will run DESeq2, the most commonly used program for differential expression
assessment. We'll talk about how to interpret results and build quality designs.
Videos (watch before class!)
7.1 | Differential expression overview
7.2 | Differential expression with DESeq
Class slides
Differential expression with DESeq2
Class worksheets and scripts
DESeq2 worksheet
DESeq2 script
Homework
DESeq2 homework
Additional Material / Useful Links
Beginner's Guide to DESeq2 package
DESeq2 Manual
DESeq2 walk through
An alternative DESeq tutorial
Isoform read counting with Stringtie
Isoform DE with Ballgown
Day 8 | RNA-seq: Advanced DESeq2 Methods
Today we will practice more with DESeq2, discuss subtler aspects of the analysis, and
create more advanced condition designs to address them.
Videos (watch before class!)
8.1 | Multifactor Designs in DESeq2
Class slides
Multifactor Designs in DESeq2
Class worksheets
Worksheet day 8
Homework
Homework day 8
Day 9 | ChIP-seq, ATAC-seq & Peak Calling
The goal for day 9 is to discuss more peak-centric sequencing methods. To this end,
we will cover the basic analysis of ChIP-seq and ATAC-seq. We will learn
to use MACS2 and the different peak-calling settings required for each
type of data. We will also discuss index files, post-sequencing QC, bedtools, and
a few other key concepts.
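Once peaks have been called, a common question is which peaks overlap another set of regions, which is the kind of interval arithmetic bedtools is built for. The Python sketch below shows the underlying idea only: it reports the overlapping portion of each pair of intervals on the same chromosome. It is a simple all-against-all comparison written for clarity, not a reimplementation of bedtools, and the peak and promoter coordinates are hypothetical.

```python
def intersect(a_intervals, b_intervals):
    """Return the overlapping portions of intervals in A and B.

    Each interval is (chrom, start, end), half-open as in BED files.
    """
    shared = []
    for chrom_a, start_a, end_a in a_intervals:
        for chrom_b, start_b, end_b in b_intervals:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # a non-empty overlap exists
                shared.append((chrom_a, start, end))
    return shared


# Hypothetical regions, invented for illustration.
PEAKS = [("chr1", 100, 300), ("chr1", 1000, 1200)]
PROMOTERS = [("chr1", 250, 400), ("chr2", 100, 300)]

shared_regions = intersect(PEAKS, PROMOTERS)
print(shared_regions)
```

Real tools use sorted or indexed data to avoid comparing every pair, which matters when there are many thousands of peaks.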
Videos (watch before class!)
9.1 | Introduction to ChIP-seq
9.2 | Peak Calling with MACS2
9.3 | Evaluating ChIP-seq Data
9.4 | Motif Calling (MEME)
9.5 | Introduction to ATAC-seq
9.6 | Introduction to BEDTools
Class slides
Peak calling and bedtools slides
Class worksheets
Processing ChIP-seq
MACS2
Bedtools
Homework
Homework - MACS2
Homework - Bedtools
Additional material / Useful links
MACS2 peak caller
HMMRATAC peak caller
BEDTools Documentation Homepage
Azofeifa2018 | MD Score Analysis
Andrysik2017 | GRO/ChIP/RNA-seq Data used in workshop
Day 10 | Downstream Analysis
The goal of day 10 is to wrap up and give you pointers on where to go next. We will
cover downloading data from public databases, working with GitHub, and when to make
your own tools/pipelines.
Videos (watch before class!)
10.1 | Git and GitHub (external video)
10.2 | Nextflow pipelines (external video)
Class slides
Big Data Ethics
Class worksheets
Worksheet - Choosing Slurm parameters
Worksheet - Git and GitHub
Worksheet - Short read analysis best practices
Worksheet - Getting genes for GO and GSEA
Worksheet - Gene ontology analysis
Worksheet - Running GSEA
Class data files
Background genes for GO
Significant genes for GO
Ranked gene list for GSEA
Additional material / Useful links
Worksheet - Downloading public data
Worksheet - What super computer can I use?
Worksheet - Nascent and Tfit
Other tools for specific analysis steps
Reactome Pathway Analyzer
Gene Ontology (GO)
The OBO Foundry
ENCODE
Official Bash Documentation
Filetypes FAQ
Useful tools I've found
GitHub tutorial
Nextflow Documentation
Azofeifa2017 | FStitch
Azofeifa2017 | Tfit
FStitch Usage | GitHub
Tfit Usage | GitHub
DAStk Usage | GitHub