Building: Jennie Smoly Caruthers Biotechnology Building (JSCBB)
Address: 3415 Colorado Ave, Boulder, CO 80303
Room: A108 (9am-12pm); B331 (1-3pm, office hours)
Time: 9am-12pm (class hours, A108) & 1pm-3pm (helproom hours, B331)
Parking: See here
Contact: Dr. Mary Allen (mary.a.allen AT colorado DOT edu), Zach Maas (Zachary.Maas AT colorado DOT edu), and Lynn Sanford (Lynn.Sanford AT colorado DOT edu)
IT support: BIT Help (bit-help AT colorado DOT edu). Please include sr2021 in any emails to BIT Help.
This course is taught reverse-classroom style. In other words, we will post videos for you to watch before you try to work through that day's materials. The videos will go over the premise for the day ahead. Then you will try to work through the materials for that day. If you are at CU Boulder and working on the super computer called fiji, everything you are instructed to type should be the same, except for logging into the super computer. This is because we have installed all the material for class on fiji for you in the same location it was on the AWS server. However, if you are working from a super computer other than fiji, you will likely want to git clone our scripts. Follow these directions to get the materials on your super computer.
The X2Go client is open-source remote desktop software for Linux systems. In other words, you will be able to use this application for desktop visualization of data stored on the remote system (the AWS server). Our primary use of X2Go will be in visualizing genomic data. See installation instructions for MacOS/Windows here.
In addition to the items above, Windows users will need to install a terminal application. We recommend installing the Linux Bash shell with Ubuntu (see Bash for Windows with Ubuntu).
In addition to the items above, we recommend that users go through a basic command-line usage tutorial, which can be found on Codecademy. The first week will move relatively slowly through these basics, but if you find you're having trouble keeping up, we highly recommend completing this tutorial before the second week.
This course is taught reverse-classroom style. In other words, we will post videos for you to watch before the next day of the workshop that will go over the premise for the day ahead. During the actual classroom hours, we will then help you carry out the tasks described in the videos in a hands-on way. The idea is that you learn the theory ahead of time and then have classroom helpers guide you through the nuts and bolts, which makes the class approachable for those who are relatively new to computing. We encourage you to attend every lecture, since the content builds on previous days. Please email us if you have concerns or questions about attending all 10 days.
Go to GitHub and register for an account (if you do not have an existing one).
We will use your GitHub username to give you temporary access to a super computer, an Amazon Cloud (AWS) server.
The goal of the first day is to get oriented to the class, the AWS super computer, and sequencing data. We will give an overview of the course and discuss the basic principles of High-throughput Sequencing (HTS) and the subsequent steps involved in processing this data (analysis pipelines). We will confirm that everyone can access the AWS instance and begin learning the Vim text editor. If you are doing this on the fiji server, you do not need an SSH key on GitHub.
The goal of day 2 is to learn some basic Linux/Unix commands for managing files. Today we will go over basic Unix/Linux commands and a text editor (Vim). You can search Google for Linux/Unix and Vim commands and use the handy cheat-sheets (below) for all of these tools.
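As a rough illustration of the kind of file management covered on day 2, the sketch below performs a few common operations from Python and notes the shell command each one resembles in a comment. The directory and file names are hypothetical, and the class itself works in the shell rather than in Python.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def demo_file_management(workdir: Path) -> List[str]:
    """Mirror a few basic shell file operations inside workdir."""
    data_dir = workdir / "project" / "data"
    data_dir.mkdir(parents=True)              # like: mkdir -p project/data

    notes = data_dir / "notes.txt"
    notes.write_text("first line\n")          # like: echo "first line" > notes.txt

    backup = data_dir / "notes_backup.txt"
    shutil.copy(notes, backup)                # like: cp notes.txt notes_backup.txt

    renamed = data_dir / "notes_v1.txt"
    notes.rename(renamed)                     # like: mv notes.txt notes_v1.txt

    backup.unlink()                           # like: rm notes_backup.txt

    # like: ls project/data
    return sorted(p.name for p in data_dir.iterdir())


if __name__ == "__main__":
    # Work in a throwaway directory so nothing on disk is disturbed.
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_file_management(Path(tmp)))
```

After the copy, rename, and delete steps, only the renamed file remains in the directory listing.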
The goal of day 3 is to move to super computer thinking. Therefore, today we will learn the basics of high-performance computing, servers, and job/workload managers. It is also important to know how to access public data and transfer this data to and from the server.
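To make the idea of a job/workload manager more concrete, here is a minimal sketch that writes the text of a batch script. It assumes the scheduler is Slurm, which this page does not actually name, and the job name, resource values, and command are hypothetical placeholders.

```python
def make_sbatch_script(job_name: str, command: str,
                       cpus: int = 1, mem_gb: int = 4, hours: int = 1) -> str:
    """Return the text of a Slurm batch script (assumes a Slurm cluster)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",        # name shown in the queue
        "#SBATCH --ntasks=1",                    # one process
        f"#SBATCH --cpus-per-task={cpus}",       # threads for that process
        f"#SBATCH --mem={mem_gb}G",              # memory request
        f"#SBATCH --time={hours:02d}:00:00",     # wall-clock limit, HH:MM:SS
        f"#SBATCH --output={job_name}.%j.out",   # %j expands to the job ID
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 'echo hello' stands in for a real analysis command.
    print(make_sbatch_script("test_job", "echo hello", cpus=2, mem_gb=8, hours=2))
```

On a Slurm system the resulting text would typically be saved to a file and submitted with sbatch. The point of the sketch is that a job is a normal shell script plus a header describing the resources it needs.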
Congrats! It is now time to do some real data analysis! The goal of day 4 is to evaluate the quality of your sequencing data. Remember: garbage in, garbage out. It is never worth your time to try to analyze garbage. Today we will learn about quality control, mapping, and visualization -- which are the first steps that take place after obtaining your sequencing data. It is important to determine whether your sequencing experiment was successful before moving on to downstream analyses.
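One simple quality check can be sketched in a few lines: compute the mean base quality of each read in a FASTQ file and flag reads that fall below a cutoff. This assumes the common Phred+33 quality encoding and four-line FASTQ records. The cutoff of 20 and the function names are illustrative choices, not taken from the course materials, and dedicated QC tools report far more than this.

```python
from typing import List, Tuple


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one quality string (assumes Phred+33 encoding)."""
    scores = [ord(ch) - offset for ch in quality]
    return sum(scores) / len(scores)


def read_quality_summary(fastq_text: str,
                         min_mean: float = 20.0) -> List[Tuple[str, float, bool]]:
    """Return (read name, mean quality, passes cutoff) for each 4-line record."""
    lines = [ln for ln in fastq_text.splitlines() if ln.strip()]
    summary = []
    for i in range(0, len(lines) - 3, 4):
        name = lines[i][1:]          # drop the leading '@'
        quality = lines[i + 3]       # 4th line of the record holds the qualities
        mean_q = mean_phred(quality)
        summary.append((name, mean_q, mean_q >= min_mean))
    return summary


if __name__ == "__main__":
    example = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n"
    for name, mean_q, ok in read_quality_summary(example):
        print(name, mean_q, "pass" if ok else "fail")
```

In the example, 'I' encodes a score of 40 and '!' encodes a score of 0, so read1 passes and read2 fails.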
Today is about catching up and making sure we are ready for next week! We will continue with our discussion of read mapping (started on day 4) and take a quick assessment to make sure everyone is comfortable with the first week of the workshop. We will give you a short task which should ideally take less than an hour to complete. If it takes longer, it is a good idea to go home over the weekend and review/practice skills from the first week. The second week will be at a much quicker pace!
The goal for day 6 is to begin to analyze RNA-seq data -- the single most commonly obtained short read sequencing data! Today we will focus on the most common pipeline. Since most differential expression programs are written in R, we will focus today on learning how to use R (a statistical computing environment). We will also discuss how to count reads over genes. Note that the videos cover a much larger fraction of the process -- consider it a preview ...
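Counting reads over genes comes down to asking, for each mapped read, which gene interval it overlaps. The sketch below is a deliberately simplified version of that idea: it uses half-open coordinates, ignores strand, and counts a read for every gene it overlaps. Real counting programs make more careful choices about ambiguous reads, and the gene and read records here are made up for illustration.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, str, int, int]   # (name, chromosome, start, end), end exclusive
Read = Tuple[str, int, int]        # (chromosome, start, end), end exclusive


def count_reads_over_genes(genes: List[Gene], reads: List[Read]) -> Dict[str, int]:
    """Count reads overlapping each gene by at least one base."""
    counts = {name: 0 for name, _, _, _ in genes}
    for r_chrom, r_start, r_end in reads:
        for name, g_chrom, g_start, g_end in genes:
            # Two half-open intervals overlap when each starts before the other ends.
            if r_chrom == g_chrom and r_start < g_end and g_start < r_end:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    genes = [("geneA", "chr1", 100, 200), ("geneB", "chr1", 300, 400)]
    reads = [
        ("chr1", 90, 140),    # overlaps geneA
        ("chr1", 150, 190),   # overlaps geneA
        ("chr1", 350, 380),   # overlaps geneB
        ("chr2", 100, 150),   # different chromosome
        ("chr1", 200, 250),   # starts exactly where geneA ends, no overlap
    ]
    print(count_reads_over_genes(genes, reads))
```

The nested loop is fine for a toy example but far too slow for real data, where sorted or indexed intervals are used instead.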
Today we will run DESeq -- the most commonly used program for differential expression assessment. We'll talk about how to interpret results and build quality experimental designs.
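Before samples can be compared, their counts need to be put on a common scale. The sketch below illustrates the median-of-ratios idea behind DESeq-style size factors in plain Python. It is a simplified teaching version under stated assumptions (genes with a zero count in any sample are skipped, and at least one gene must have no zeros), not the package's implementation, and the count table is invented.

```python
import math
from statistics import median
from typing import Dict, List


def size_factors(counts: Dict[str, List[int]]) -> List[float]:
    """Median-of-ratios size factor per sample.

    counts maps gene name -> list of raw counts, one entry per sample.
    Genes with a zero in any sample are skipped because their geometric
    mean is zero. Raises StatisticsError if no gene survives that filter.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if any(c <= 0 for c in gene_counts):
            continue
        # Log of the geometric mean of this gene across samples.
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Divide each sample's counts by that sample's size factor."""
    factors = size_factors(counts)
    return {g: [c / f for c, f in zip(row, factors)] for g, row in counts.items()}


if __name__ == "__main__":
    # Sample 2 was sequenced twice as deeply; no gene truly changes.
    table = {"gene1": [10, 20], "gene2": [100, 200], "gene3": [50, 100]}
    print(size_factors(table))
    print(normalize(table))
```

In this toy table the second sample's size factor is twice the first, so after normalization both samples show the same value for every gene, which is the desired outcome when the only difference is sequencing depth.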
Today we will wrap up our discussion of differential expression with RNA-seq and discuss a more advanced transcriptome sequencing method: single cell.
The goal for day 9 is to discuss more peak-centric sequencing methods. To this end we will cover the basic analysis of ChIP-seq and ATAC-seq. For this we will learn to use MACS2 and the different settings required in peak calling depending on the type of data. We will also discuss index files, post-sequencing QC, and a few other key concepts.
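One way to picture how peak-calling settings depend on the data type is a small helper that assembles a MACS2 command line from a preset. The flags used are standard `macs2 callpeak` options, but the presets themselves (in particular the shift and extension values for ATAC-seq) are commonly seen conventions assumed here for illustration, not the settings taught in class. The file names are hypothetical.

```python
from typing import List, Optional

# Assumed presets for illustration; adjust to your own data and protocol.
PRESETS = {
    "chip-narrow": [],                                             # e.g. transcription factors
    "chip-broad": ["--broad"],                                     # e.g. broad histone marks
    "atac": ["--nomodel", "--shift", "-100", "--extsize", "200"],  # assumed values
}


def macs2_callpeak_args(treatment: str, name: str, assay: str,
                        control: Optional[str] = None,
                        genome_size: str = "hs") -> List[str]:
    """Build an argument list for 'macs2 callpeak' from a named preset."""
    if assay not in PRESETS:
        raise ValueError(f"unknown assay type: {assay!r}")
    args = ["macs2", "callpeak",
            "-t", treatment,      # treatment (IP or ATAC) alignments
            "-f", "BAM",          # input format
            "-g", genome_size,    # effective genome size ('hs' = human)
            "-n", name]           # prefix for output files
    if control is not None:
        args += ["-c", control]   # input/control alignments, if available
    return args + PRESETS[assay]


if __name__ == "__main__":
    print(" ".join(macs2_callpeak_args("chip.bam", "my_chip", "chip-broad",
                                       control="input.bam")))
    print(" ".join(macs2_callpeak_args("atac.bam", "my_atac", "atac")))
```

The list form can be passed directly to `subprocess.run` on a system where MACS2 is installed. The sketch only builds the command and does not run anything.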
The goal of day 10 is to wrap up and give you pointers on where to go next. We will cover downloading data from ENCODE, some basic BedTools commands, and open biomedical ontologies. We will also discuss nascent sequencing data and how this is distinct (both in data and analysis) from RNA-seq.
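Many interval operations are easier to understand once you have seen one written out. The sketch below merges overlapping intervals, which is conceptually what a merge operation on a BED file does. It assumes half-open BED-style coordinates and also merges intervals that touch end to start, which is a choice made for this sketch. The example intervals are invented.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]   # (chromosome, start, end), end exclusive


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals on the same chromosome."""
    merged: List[Interval] = []
    # Sorting by chromosome, then start, lets us merge in a single pass.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    peaks = [
        ("chr1", 100, 200),
        ("chr1", 150, 250),   # overlaps the first interval
        ("chr1", 250, 300),   # touches the merged interval's end
        ("chr2", 10, 20),
        ("chr1", 400, 500),
    ]
    for interval in merge_intervals(peaks):
        print(*interval, sep="\t")
```

The first three chr1 intervals collapse into a single interval from 100 to 300, while the other two are left unchanged.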