2024 Workshop - General Information

*Please note that specific videos and documents may change until a few weeks before the workshop*


You must register to attend this workshop. Please register here.


Building: Jennie Smoly Caruthers Biotechnology Building (JSCBB)

Address: 3415 Colorado Ave, Boulder, CO 80303

Room: A108 (class and office hours)

Time: July 8-12 and July 15-19, 9am-12pm (class hours) and 1pm-3pm (office hours)

Parking: See here

Contact: Dr. Mary Allen (mary.a.allen AT colorado DOT edu) and Dr. Lynn Sanford (lynn.sanford AT colorado DOT edu)

IT support: BIT Help (bit-help AT colorado DOT edu). Please include sr2024 in the subject line on any emails to BIT Help.

GitHub class repository: Dowell-Lab/srworkshop


Before Day 1 | Tasks to Complete


This course is taught reverse-classroom style. In other words, before each day of the workshop we will post videos for you to watch that go over the premise for the day ahead. During classroom hours, we will then help you carry out the tasks described in the videos in a hands-on way. The idea is that you learn the theory ahead of time, and classroom helpers then guide you through the nuts and bolts, which makes the class approachable for those who are relatively new to computing. We encourage you to attend every session because the content builds on previous days; please email us if you have concerns or questions about attending all 10 days.

Before the first day of the workshop, there are a few things you need to do:

For everyone (macOS/Windows/Linux)

  1. Sign up for a free GitHub account
     Go to GitHub and register for an account (if you do not have an existing one).

  2. Put your GitHub username in this Google spreadsheet
     We will use your GitHub username to give you temporary access to an Amazon Web Services (AWS) cloud server that we will use as the supercomputer for this course.

  3. Watch the videos for the first day of class. Students in the DS2 cohort may skip the 'Sequencing' videos.

  4. Starting on Day 6, we will be using the programming language R. We encourage you to install R and its development environment RStudio on your own computer. If you run into major installation problems, we have alternate options available.
     • R (version 4.4.1)
     • RStudio

For Windows users

In addition to the items above, Windows users will need to install a terminal application. We recommend installing Windows Subsystem for Linux (WSL) with Ubuntu as an app interface.

  • Windows 11 instructions: How to install Bash on Windows 11; Configuring Windows Terminal
  • Windows 10 instructions: Bash for Windows with Ubuntu
  • Troubleshooting:
    • The first thing to check if you're having problems is whether WSL is enabled. Open your Start menu and type "Turn Windows features on or off" until the Control Panel shows up, then click on it. Scroll down and make sure that "Virtual Machine Platform" and "Windows Subsystem for Linux" are both checked. You may need to restart your computer afterward. Then try to continue with the installation.
    • If you are still having trouble, we recommend you come a half hour early to the first day of the workshop. There will be someone on hand to help you.
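
Once Ubuntu is installed, one quick way to confirm that your terminal is really running under WSL is to look for "microsoft" in the kernel version string. The sketch below is only an illustration: the function name is_wsl is our own, and it assumes the kernel string is readable at /proc/version, as it is on typical Linux and WSL systems.

```shell
# Succeeds if the given kernel version file mentions Microsoft.
# is_wsl is a hypothetical helper name; /proc/version is the assumed default.
is_wsl() {
    version_file=${1:-/proc/version}
    [ -r "$version_file" ] && grep -qi microsoft "$version_file"
}

if is_wsl; then
    echo "Running inside WSL"
else
    echo "Not running inside WSL"
fi
```

If this reports that you are not inside WSL while you are in the Ubuntu app, recheck the Windows features listed above.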

Optional Items

  • In addition to the items above, we recommend that beginners go through a basic command-line usage tutorial, which can be found on Codecademy. The first week will move relatively slowly through these basics, but if you find you're having trouble keeping up, we highly recommend completing the tutorial before the second week.

Day 1 | Intro to Sequencing and Overview of Pipelines


The goal of the first day is to get oriented to the class, the AWS supercomputer, and sequencing data. We will present an overview of the course and discuss the basic principles of high-throughput sequencing (HTS) and the subsequent steps involved in processing this data (analysis pipelines). We will confirm that everyone can access the AWS instance and begin learning basic Bash, Git/GitHub, and the Vim text editor.

Videos (watch before class!)

1.1 | Course Overview
1.2 | Sequencing: Creating Libraries
1.3 | Sequencing: Pre-sequencing library quality
1.4 | Sequencing: Illumina Sequencing
1.5 | Sequencing: Designing sequencing experiments
1.6 | Basic Unix: File System/Directory Trees
1.7 | Basic Unix: Commands for Moving Files/Directories
1.8 | Basic Unix: Vim Tutorial

Link to class materials

GitHub Dowell-Lab/srworkshop day01

Day 2 | Intro to Linux & Vim


The goal of day 2 is to learn basic Linux/Unix commands for managing files and to work more with the Vim text editor. You can Google Linux/Unix and Vim commands and use the handy cheat sheets (below) for all of these tools.
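
As a rough preview of the kind of file management covered today, the sketch below creates, copies, moves, and lists files. The directory and file names are made up for illustration, and everything happens inside a temporary directory so nothing on your machine is touched.

```shell
# Work inside a throwaway directory (hypothetical names throughout).
workdir=$(mktemp -d)

mkdir -p "$workdir/project/raw" "$workdir/project/results"   # nested directories
printf 'sample_A\nsample_B\n' > "$workdir/notes.txt"         # small text file
cp "$workdir/notes.txt" "$workdir/project/raw/"              # copy into raw/
mv "$workdir/notes.txt" "$workdir/project/results/summary.txt"  # move and rename
ls -R "$workdir/project"                                     # list the tree
wc -l < "$workdir/project/results/summary.txt"               # count lines
```

Note that mv both moves and renames in a single step, while cp leaves the original in place.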

Videos (watch before class!)

2.1 | SSH and VPN Introduction
2.2 | Remote Rsync/Reading Files
2.3 | Searching/editing Files, Pipes, and Outputs
2.4 | Directory Permissions

Link to class materials

GitHub Dowell-Lab/srworkshop day02

Day 3 | Intro to Servers, Public Data & Quality Control


The goal of day 3 is to start thinking in supercomputer terms. Today we will learn the basics of high-performance computing, servers, and job/workload managers. We will also learn how to access public data, transfer data to and from the server, and run some basic quality control. Today we start using real data! We will evaluate sequencing data for quality. Remember: garbage in, garbage out. It is never worth your time to try to analyze garbage.
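
Before running a full quality report, a simple sanity check is to count the reads in a FASTQ file. Each FASTQ record spans four lines (header, sequence, separator, qualities), so the read count is the line count divided by four. The sketch below uses a tiny made-up file and a helper name of our own (count_reads); it assumes an uncompressed, well-formed FASTQ file.

```shell
# Build a tiny, made-up FASTQ file with two reads.
fastq=$(mktemp)
cat > "$fastq" <<'EOF'
@read1
ACGTACGT
+
IIIIIIII
@read2
TTGACCAA
+
IIII##II
EOF

# count_reads is a hypothetical helper: lines / 4 = number of reads.
count_reads() {
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

count_reads "$fastq"   # prints 2
```

This check does not replace FastQC; it only confirms that a file transferred completely and has the number of reads you expect.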

Videos (watch before class!)

3.1 | Introduction to the Computer Cluster
3.2 | LMOD: An Environment Module System
3.3 | Transferring Data
3.4 | Slurm Workload Manager Basics
3.5 | FastQC Overview
3.6 | Running FastQC

Link to class materials

GitHub Dowell-Lab/srworkshop day03

Day 4 | Trimming, Mapping & IGV (Genome Browser Visualization)


Congrats! It is now time to do some real data analysis! Today we will do more quality control and learn about mapping and visualization. These are the first steps that take place after obtaining your sequencing data. It is important to determine whether your sequencing experiment was successful before moving on to downstream analyses.

Videos (watch before class!)

4.1 | Trimmomatic Overview
4.2 | Running Trimmomatic
4.3 | Introduction to Mapping
4.4 | The HISAT2 mapper
4.5 | Mapping with HISAT2
4.6 | Integrative Genomics Viewer (IGV)

Link to class materials

GitHub Dowell-Lab/srworkshop day04

Day 5 | Assessment


Today is about catching up and making sure we are ready for next week! We will cover loops and how to convert a big BAM file into something small enough to be transferred quickly and visualized. We will also take an assessment to make sure everyone is comfortable with the first week of the workshop. We will give you a short task that should ideally take less than an hour to complete. If it takes longer, it is a good idea to review and practice skills from the first week over the weekend. The second week will move at a much quicker pace!
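
To give a feel for looping, the sketch below walks over a set of BAM files and derives an output name for each one. It is only an illustration: the sample names are invented, the files are empty placeholders rather than real BAMs, the function name plan_outputs is our own, and the .bedGraph extension is an assumed stand-in for whatever smaller format is used in class. The loop prints what it would produce instead of running a conversion tool.

```shell
# Empty placeholder files with hypothetical sample names (not real BAMs).
demo=$(mktemp -d)
touch "$demo/ctrl_rep1.bam" "$demo/ctrl_rep2.bam" "$demo/treat_rep1.bam"

# plan_outputs is a hypothetical helper: for every .bam file in a
# directory, print the input name and the output name derived from it.
plan_outputs() {
    for bam in "$1"/*.bam; do
        sample=$(basename "$bam" .bam)   # strip the directory and .bam suffix
        echo "${sample}.bam -> ${sample}.bedGraph"
    done
}

plan_outputs "$demo"
```

Writing the loop once means the same steps run on three files or three hundred without retyping a command for each sample.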

Class slides

Looping, Visualization, Assessment

Link to class materials

GitHub Dowell-Lab/srworkshop day05

Day 6 | RNA-seq: R & Read Counting


The goal for this week is to delve deeper into actual data analysis. Since most differential expression programs are written in R, today we will focus on learning how to use R (a statistical computing language/environment).

Videos (watch before class!)

6.1 | Intro to R

Link to class materials

GitHub Dowell-Lab/srworkshop day06

Mini-Project A | Single Cell RNA-seq


The goal for this project is to learn how to analyze single cell RNA-seq data. We'll first run through the initial analysis steps you've already learned on a single-cell dataset, then we'll apply our knowledge of R from yesterday to count reads, perform linear and non-linear dimensional reduction, cluster cells, and categorize cell type-specific gene expression. Through all of these steps we'll talk about how to interpret results and represent data.

Day 7 | Mapping and Filtering


Videos (watch before class!)

A7.2 | Single Cell Sequencing Overview
A7.1 | Single Cell Sequencing Analysis and Seurat

Link to class materials

GitHub Dowell-Lab/srworkshop projectA day07


Day 8 | Dimensional Reduction, Clustering, and Cell Type Annotation


Videos (watch before class!)

A8.1 | Clustering and Cell Type Identification
A8.2 | Single Cell Cell Type Annotations

Link to class materials

GitHub Dowell-Lab/srworkshop projectA day08


Day 9 | Cell Types, Gene Expression, and Advanced Analysis


Videos (watch before class!)

A9.1 | An Introduction to Cell Chat

Link to class materials

GitHub Dowell-Lab/srworkshop projectA day09


Mini-Project B | Multi-omics: RNA-seq/ChIP-seq/ATAC-seq


The goal for this project is to learn how to analyze RNA-seq data in depth and then integrate the results with ChIP-seq and ATAC-seq data. We'll apply our knowledge of R from Day 6 to count reads and run differential expression analysis with DESeq2, then call peaks and scan for motifs in ChIP-seq data, and finally integrate the datasets with BEDTools. Through all of these steps we'll talk about how to interpret results and represent data.

Day 7 | Counting and DESeq2


Videos (watch before class!)

B7.1 | Counting Reads
B7.2 | Differential Expression Analysis
B7.3 | DESeq2
B7.4 | Multifactor Designs in DESeq2 (Optional)

Link to class materials

GitHub Dowell-Lab/srworkshop projectB day07


Day 8 | ChIP-seq - Peak Calling and Motif Scanning


Videos (watch before class!)

B8.1 | Intro to ChIP-seq
B8.2 | Evaluating ChIP-seq Data Quality
B8.3 | MACS2 for ChIP-seq Data
B8.4 | Motif Calling with MEME
B8.5 | ATAC-seq Analysis (Optional)

Link to class materials

GitHub Dowell-Lab/srworkshop projectB day08


Day 9 | Integrating Data with BEDTools and Advanced Plotting


Videos (watch before class!)

B9.1 | Introduction to BEDTools

Link to class materials

GitHub Dowell-Lab/srworkshop projectB day09


Day 10 | Downstream Analysis


The goal of day 10 is to wrap up and give you pointers on where to go next. We will cover downloading data from public databases, working with GitHub, and when to make your own tools/pipelines.

Videos (watch before class!)

10.1 | Git and GitHub (external video)
10.2 | Nextflow pipelines (external video)

Link to class materials

GitHub Dowell-Lab/srworkshop day10