2021 Workshop Archive


Building: Jennie Smoly Caruthers Biotechnology Building (JSCBB)

Address: 3415 Colorado Ave, Boulder, CO 80303

Room: A108 (9am-12pm); B331 (1-3pm, office hours)

Time: 9am-12pm (class hours, A108) & 1pm-3pm (helproom hours, B331)

Parking: See here

Contact: Dr. Mary Allen (mary.a.allen AT colorado DOT edu), Zach Maas (Zachary.Maas AT colorado DOT edu), and Lynn Sanford (Lynn.Sanford AT colorado DOT edu)

IT support: BIT Help (bit-help AT colorado DOT edu). Please include sr2021 in any emails to BIT Help.


Before starting if you are working alone | Tasks to Complete


This course is taught reverse-classroom style. In other words, we will post videos for you to watch before you try to work through that day's materials. The videos will go over the premise for the day ahead. Then you will try to work through the materials for that day. If you are at CU Boulder and working on the super computer called fiji, everything you are instructed to type should be the same, except for logging into the super computer. This is because we have installed all the class material on fiji in the same location it was in on the AWS server. However, if you are working from a super computer other than fiji, you will likely want to git clone our scripts. Follow these directions to get the materials on your super computer.

For everyone (MacOS/Windows/Linux)

  1. Install X2GO Client
     X2Go client is open-source remote desktop software for Linux systems. In other words, you will be able to use this application for desktop visualization of data stored on the remote system (the AWS server). Our primary use of X2Go will be in visualizing genomic data. See installation instructions for MacOS/Windows here.

For Windows users

In addition to the items above, Windows users will need to install a terminal application. We recommend installing the Linux Bash Shell with Ubuntu (see Bash for Windows with Ubuntu).

Optional Items

In addition to the items above, we recommend that users go through a basic command-line usage tutorial, which can be found on Codecademy. The first week will move relatively slowly through these basics, but if you find you're having trouble keeping up, we highly recommend completing this tutorial before the second week.


Before Day 1 if you are coming to the workshop | Tasks to Complete


This course is taught reverse-classroom style. In other words, we will post videos for you to watch before the next day of the workshop that will go over the premise for the day ahead. During the actual classroom hours, we will then help you carry out the tasks described in the videos in a hands-on approach. The idea is that you learn the theory ahead of time, and classroom helpers then guide you through the nuts and bolts, which creates a supportive environment for those who are relatively new to computing. We encourage you to attend every lecture, as the content builds on previous days; please email us if you have concerns or questions about attending all 10 days.

For everyone (MacOS/Windows/Linux)

  1. Sign up for a free GitHub account
     Go to GitHub and register for an account (if you do not have an existing one).

  2. Put your GitHub username in this Google spreadsheet
     We will use your GitHub username to give you temporary access to a super computer, an Amazon Cloud (AWS) server.

  3. Install X2GO Client
     X2Go client is open-source remote desktop software for Linux systems. In other words, you will be able to use this application for desktop visualization of data stored on the remote system (the AWS server). Our primary use of X2Go will be in visualizing genomic data. See installation instructions for MacOS/Windows here.

For Windows users

In addition to the items above, Windows users will need to install a terminal application. We recommend installing the Linux Bash Shell with Ubuntu (see Bash for Windows with Ubuntu).

Optional Items

In addition to the items above, we recommend that users go through a basic command-line usage tutorial, which can be found on Codecademy. The first week will move relatively slowly through these basics, but if you find you're having trouble keeping up, we highly recommend completing this tutorial before the second week.


Day 1: Intro to Sequencing and Overview of Pipelines


The goal of the first day is to get oriented to the class, the AWS super computer, and sequencing data. We will give an overview of the course and discuss the basic principles of high-throughput sequencing (HTS) and the subsequent steps involved in processing this data (analysis pipelines). We will confirm that everyone can access the AWS instance and begin learning the Vim text editor. If you are doing this on the fiji server, you do not need an SSH key on GitHub.

Videos (watch before class!)

1.1 | Course Overview
1.2 | Creating Libraries
1.3 | Pre-sequencing library quality
1.4 | Illumina Sequencing
1.5 | Designing sequencing experiments
1.6 | Vim Tutorial

In class slides, notes and materials

Configuring SSH Keys (only needed if you are using AWS)
Logging on to a super computer
Using vim tutor
Using x2go to log on to a super computer
Using x2go to log on to a super computer (Windows users)
Libraries and sequencing
Creating a variable in bash
Vim crash 911

Homework

Homework
Library QC challenge
Library QC challenge - answers

Additional Material/Useful Links (external websites)

Basic info about file types in the class
Illumina sequencing technology

Day 2: Intro to Linux & Vim


The goal of day 2 is to learn some basic Linux/Unix commands for managing files and to get comfortable with a text editor (Vim). You can search Google for Linux/Unix and Vim commands and use the handy cheat-sheets (below) for all of these tools.

Videos (watch before class!)

2.1 | SSH and VPN Introduction
2.2 | File System/Directory Trees
2.3 | Basic Commands for Moving Files/Directories
2.4 | Remote Rsync/Reading Files
2.5 | Searching/editing Files, Pipes, and Outputs
2.6 | Directory Permissions

In class slides, notes and materials

Linux slides

Homework

Homework

Additional Material/Useful Links (external websites)

Linux walk through
Linux/Unix cheatsheet
Vim cheatsheet

Day 3: Intro to Servers & downloading Public Data


The goal of day 3 is to move to super computer thinking. Therefore, today we will learn the basics of high-performance computing, servers, and job/workload managers. It is also important to know how to access public data and transfer this data to and from the server.

Videos (watch before class!)

3.1 | Introduction to the Computer Cluster
3.2 | LMOD: An Environment Module System
3.3 | Transferring Data
3.4 | Slurm Workload Manager Basics

In class slides, notes and materials

Day 3 Slides
Windows - where are my files

Homework

Practice sbatch and computing basics

Additional Material/Useful Links (external websites)

rsync for directories (the slash position matters)
Unix Permissions
Slurm (sbatch) Cheatsheet
In Class Exercise Solution

Day 4: Quality Control, Mapping & IGV (Genome Browser Visualization)


Congrats! It is now time to do some real data analysis! The goal of day 4 is to evaluate the quality of your sequencing data. Remember: garbage in, garbage out. It is never worth your time to try to analyze garbage. Today we will learn about quality control, mapping, and visualization -- the first steps that take place after obtaining your sequencing data. It is important to determine whether your sequencing experiment was successful before moving on to downstream analyses.
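To make "quality" concrete: each base in a FASTQ file carries a quality character, and tools such as FastQC summarize those values per position. The following is a minimal Python sketch of how one quality line can be decoded, assuming the Phred+33 encoding. The read itself is made up for illustration, and the cutoff of 20 is an arbitrary example threshold, not a course recommendation.

```python
# Toy FASTQ record (invented for illustration); real files hold millions of these.
record = [
    "@read1",        # read name
    "ACGTACGTAC",    # bases
    "+",             # separator
    "IIIIIIII#!",    # one quality character per base
]


def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores.

    Assumes Phred+33 encoding (ASCII code minus 33).
    """
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


scores = phred_scores(record[3])
mean_q = sum(scores) / len(scores)
# Positions (0-based) whose score falls below the example cutoff of 20.
low = [i for i, q in enumerate(scores) if q < 20]

print(scores)
print(f"mean quality: {mean_q:.1f}")
print("low-quality positions:", low)
```

A Phred score of 20 corresponds to a 1 in 100 chance that the base call is wrong, and a score of 40 to a 1 in 10,000 chance. In this toy read the last two bases are the low-quality ones, which is the kind of pattern that trimming is meant to remove.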

Videos (watch before class!)

4.1 | Barcoding
4.2 | FastQC Overview
4.3 | Running FastQC
4.4 | Trimmomatic Overview
4.5 | Running Trimmomatic
4.6 | Introduction to Mapping
4.7 | The HISAT2 mapper
4.8 | Mapping with HISAT2
4.9 | Integrative Genomics Viewer (IGV)

In class slides, notes and materials

QC slides
Mapping and viz

Homework

Homework day 4

Additional Material/Useful Links

fastqc worksheet
trimmomatic worksheet
mapping and IGV worksheet
FastQC Website
MultiQC Website

Day 5: Assessment


Today is about catching up and making sure we are ready for next week! We will continue with our discussion of read mapping (started on day 4) and take a quick assessment to make sure everyone is comfortable with the first week of the workshop. We will give you a short task, which should ideally take less than an hour to complete. If it takes longer, it is a good idea to review and practice the skills from the first week over the weekend. The second week will move at a much quicker pace!

Quiz

Assessment (timed!)

Homework

Homework day 5

Additional Material/Useful Links

Assessment worksheet

Day 6: RNA-seq


The goal for day 6 is to begin to analyze RNA-seq data -- the single most commonly obtained short read sequencing data! Today we will focus on the most common pipeline. Since most differential expression programs are written in R, we will focus today on learning how to use R (a statistical computing environment). We will also discuss how to count reads over genes. Note that the videos cover a much larger fraction of the process -- consider it a preview ...
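As a rough illustration of what "counting reads over genes" means, here is a toy Python sketch that counts how many mapped reads overlap each gene interval. All gene names, coordinates, and reads are hypothetical. Dedicated counting programs also deal with details this sketch ignores, such as strand, exon structure, and reads that overlap more than one gene.

```python
# Hypothetical gene annotations: name -> (chromosome, start, end), 1-based inclusive.
genes = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
}

# Hypothetical mapped reads: (chromosome, start, end).
reads = [
    ("chr1", 120, 170),
    ("chr1", 480, 530),   # overlaps the end of geneA
    ("chr1", 600, 650),   # falls between the genes, so it is not counted
    ("chr1", 900, 950),
    ("chr2", 900, 950),   # same coordinates as above, but a different chromosome
]


def overlaps(read, gene):
    """Return True if the read and the gene share at least one base."""
    r_chrom, r_start, r_end = read
    g_chrom, g_start, g_end = gene
    return r_chrom == g_chrom and r_start <= g_end and g_start <= r_end


def count_reads(reads, genes):
    """Count the reads overlapping each gene (simple all-against-all comparison)."""
    counts = {name: 0 for name in genes}
    for read in reads:
        for name, gene in genes.items():
            if overlaps(read, gene):
                counts[name] += 1
    return counts


counts = count_reads(reads, genes)
print(counts)
```

The resulting table of counts per gene is the kind of input that differential expression programs start from.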

Videos (watch before class!)

6.1 | Intro to R
6.2 | Differential expression overview
6.3 | Counting reads
6.4 | Differential expression with Deseq
6.5 | Isoforms
6.6 | Ballgown

In class worksheets, slides, notes and materials

Using R worksheet
Learning R script
sbatch Feature Counts walk through

Homework

Learning_R_Additional_Practice script
pokemon_data

Additional Material/Useful Links (external websites)

Deseq2 walk through

Day 7: RNA-seq continued


Today we will run DESeq2 -- the most commonly used program for differential expression assessment. We'll talk about how to interpret results and build quality designs.

Videos (watch before class!)

6.5 | Isoforms
6.6 | Ballgown

In class slides, notes and materials

Differential expression with Deseq2
sbatch Deseq2 walkthrough

Homework

In-class Worksheet(s)

In Class Slides
Homework

Additional Material/Useful Links (external websites)

Beginner's Guide to DEseq2 package
DESeq2 Manual
An alternative DEseq tutorial

Day 8: Finish RNA-seq & Single-Cell Sequencing


Today we will wrap up our discussion of differential expression with RNA-seq and discuss a more advanced form of transcriptome sequencing: single cell.

Videos (watch before class!)

8.1 | Nascent Transcription (we may move this to day 10)
8.2 | Single-Cell Sequencing (1/3)
8.3 | Single-Cell Sequencing (2/3)
8.4 | Single-Cell Sequencing (3/3)

In class slides, notes and materials

RNAseq practice
Single cell slides
Single cell worksheet

In-class Worksheet(s)

Homework

Homework

Additional Material/Useful Links (external websites)

Azofeifa2017 | FStitch
Azofeifa2017 | Tfit
FStitch Usage | GitHub
Tfit Usage | GitHub
DAStk Usage | GitHub

Day 9: ChIP-seq & ATAC-seq


The goal for day 9 is to discuss more peak-centric sequencing methods. To this end, we will cover the basic analysis of ChIP-seq and ATAC-seq. We will learn to use MACS2 and the different settings required for peak calling depending on the type of data. We will also discuss index files, post-sequencing QC, and a few other key concepts.

Videos (watch before class!)

9.1 | Introduction to ChIP-seq
9.2 | Peak Calling with MACS2
9.3 | Evaluating ChIP-seq Data
9.4 | Motif Calling (MEME)
9.5 | Introduction to ATAC-seq

In class slides, notes and materials

QC and ChIP slides

Homework

Homework

In-class Worksheet(s)

Walkthrough HMMRATAC
Walkthrough for the program MACS2, which calls peaks
Bam file QC

Additional Material/Useful Links (external websites)

MACS2 Usage | GitHub
BEDTools Documentation | Intersect
Azofeifa2018 | MD Score Analysis
Andrysik2017 | GRO/ChIP/RNA-seq Data used in workshop
Sanchez2018 | Additional ChIP data used in workshop

Day 10: Downstream Analysis


The goal of day 10 is to wrap up and give you pointers on where to go next. We will cover downloading data from ENCODE, some basic BEDTools commands, and open biomedical ontologies. We will also discuss nascent sequencing data and how it is distinct (both in data and analysis) from RNA-seq.

In-class Worksheet(s)

Day 10 slides
Downloading public data
What super computer can I use?
Other paths
Getting genes for GO and GSEA walkthrough
Running GSEA
Go walkthrough
Nascent and Tfit walkthrough
Short read analysis best practices

In class data files

Ranked gene list for GSEA
Background genes for GO
Significant genes for GO

Additional Material/Useful Links (external websites)

Reactome Pathway Analyzer
Gene Ontology (GO)
The OBO Foundry
ENCODE
BEDTools Documentation Homepage
Official Bash Documentation
Filetypes FAQ
Useful tools I've found
Github tutoria