*Please note that specific videos and documents may change until a few weeks before the workshop*
Building: Jennie Smoly Caruthers Biotechnology Building (JSCBB)
Address: 3415 Colorado Ave, Boulder, CO 80303
Room: A108 (class and office hours)
Time: July 8-12/15-19, 9am-12pm (class hours) & 1pm-3pm (office hours)
Parking: See here
Contact: Dr. Mary Allen (mary.a.allen AT colorado DOT edu) and Dr. Lynn Sanford (lynn.sanford AT colorado DOT edu)
IT support: BIT Help (bit-help AT colorado DOT edu). Please include sr2024 in the subject line on any emails to BIT Help.
GitHub class repository: Dowell-Lab/srworkshop
This course is taught in a reverse-classroom (flipped) style. Before each day of the workshop, we will post videos for you to watch that go over the premise for the day ahead. During classroom hours, we will then help you carry out the tasks described in the videos in a hands-on way. The idea is that you learn the theory ahead of time, and classroom helpers guide you through the nuts and bolts, which creates a supportive environment for those who are relatively new to computing. We encourage you to attend every session, since the content builds on previous days; please email us if you have concerns or questions about attending all 10 days.
Before the first day of the workshop, there are a few things you need to do:
Go to GitHub and register for an account (if you do not have an existing one).
We will use your GitHub username to give you temporary access to an Amazon Cloud (AWS) server that we will use as the supercomputer for this course.
In addition to the items above, Windows users will need to install a terminal application. We recommend installing Windows Subsystem for Linux (WSL) with Ubuntu as an app interface.
The goal of the first day is to get oriented to the class, the AWS supercomputer, and sequencing data. We will present an overview of the course and discuss the basic principles of high-throughput sequencing (HTS) and the subsequent steps involved in processing this data (analysis pipelines). We will confirm that everyone can access the AWS instance and begin learning basic Bash, Git/GitHub, and the Vim text editor.
The goal of day 2 is to learn some basic Linux/Unix commands for managing files and to work more with a text editor (Vim). You can search online for Linux/Unix and Vim commands and use the handy cheat-sheets (below) for all of these tools.
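As a rough illustration of what "managing files" means in practice, here is a minimal sketch using a few standard Unix commands. The directory and file names (`project`, `notes.txt`, and so on) are made up for this example and are not part of the course materials; everything happens inside a temporary directory so nothing real is touched.

```bash
# Work inside a throwaway directory (hypothetical example, not course data).
workdir="$(mktemp -d)"
cd "$workdir"

# Make two nested directories in one step.
mkdir -p project/raw project/results

# Create a small text file, then copy it and rename the copy.
echo "sample line" > project/raw/notes.txt
cp project/raw/notes.txt project/results/notes_copy.txt
mv project/results/notes_copy.txt project/results/notes_final.txt

# List everything under project/, print the renamed file, remove the original.
ls -R project
cat project/results/notes_final.txt
rm project/raw/notes.txt
```

Running the lines one at a time and checking with `ls` after each is a good way to see what every command does.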
The goal of day 3 is to start thinking in supercomputer terms. Today we will learn the basics of high-performance computing, servers, and job/workload managers. It is also important to know how to access public data, how to transfer data to and from the server, and how to run some basic quality control. Today we start using real data! We will evaluate your sequencing data for its quality. Remember: garbage in, garbage out. It is never worth your time to try to analyze garbage.
Congrats! It is now time to do some real data analysis! Today we will do more quality control and learn about mapping and visualization. These are the first steps that take place after obtaining your sequencing data. It is important to determine whether your sequencing experiment was successful before moving on to downstream analyses.
Today is about catching up and making sure we are ready for next week! We will cover loops and how to convert a big BAM file into something small enough to be transferred quickly and visualized. We will also take an assessment to make sure everyone is comfortable with the first week of the workshop. We will give you a short task that should ideally take less than an hour to complete. If it takes longer, it is a good idea to review and practice skills from the first week over the weekend. The second week will move at a much quicker pace!
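To give a feel for why loops matter here, the sketch below walks over a list of BAM file names and builds a matching output name for each one. The sample names and the `.converted` suffix are placeholders invented for this example, and the conversion itself is only printed, not run, since the actual tool and output format are covered in class.

```bash
# Hypothetical sample names; in practice these would be real BAM files on disk.
bam_files=("sampleA.bam" "sampleB.bam" "sampleC.bam")

planned=()
for bam in "${bam_files[@]}"; do
    # Strip the .bam suffix to get the sample prefix.
    prefix="${bam%.bam}"
    # Placeholder output name; the real format depends on the tool used.
    out="${prefix}.converted"
    # The real conversion command would go here instead of echo.
    echo "would convert ${bam} -> ${out}"
    planned+=("$out")
done
```

The same pattern works for any per-sample step: write the command once for one file, then wrap it in a loop so it runs on every file.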
The goal for this week is to delve deeper into actual data analysis. Since most differential expression programs are written in R, today we will focus on learning how to use R (a statistical computing language/environment).
The goal for this project is to learn how to analyze single cell RNA-seq data. We'll first run through the initial analysis steps you've already learned on a single-cell dataset, then we'll apply our knowledge of R from yesterday to count reads, perform linear and non-linear dimensional reduction, cluster cells, and categorize cell type-specific gene expression. Through all of these steps we'll talk about how to interpret results and represent data.
The goal for this project is to learn how to analyze RNA-seq data in depth and then integrate the results with . Through all of these steps we'll talk about how to interpret results and represent data.
The goal of day 10 is to wrap up and give you pointers on where to go next. We will cover downloading data from public databases, working with GitHub, and when to make your own tools/pipelines.