Course Outline: Download PDF
Building: Jennie Smoly Caruthers Biotechnology Building (JSCBB)
Address: 3415 Colorado Ave, Boulder, CO 80303
Room: A108
Time: 9am-12pm (class hours) & 1pm-3pm (helproom hours)
Parking: See here
Contact: Dr. Mary Allen (mary.a.allen AT colorado DOT edu) & Margaret Gruca (margaret.gruca AT colorado DOT edu)
IT support: BIT Help (bit-help AT colorado DOT edu)
This course is taught reverse-classroom (flipped-classroom) style. Before each day of the workshop, we will post videos that introduce the concepts for that day. During class hours, we will then help you carry out the tasks described in the videos in a hands-on setting. The idea is that you learn the theory ahead of time, while classroom helpers guide you through the nuts and bolts, which creates a supportive environment for those who are relatively new to computing. Because the content builds on previous days, we encourage you to attend every session; please email us if you have concerns or questions about attending all 10 days.
Go to GitHub and register for an account (if you do not already have one). Send your GitHub username to 'bit-help AT colorado DOT edu' so we can give you access to the AWS server.
X2Go Client is open-source remote desktop software for connecting to Linux systems. In other words, this application lets you visualize data stored on the remote system (the AWS server) in a desktop environment. We will primarily use X2Go to visualize genomic data. See installation instructions for macOS/Windows here.
You will need to add a public ssh key to your GitHub account. You can view your key by going to the following link https://github.com/USERNAME.keys and replacing USERNAME with your GitHub username. See an example here.
For instructions on generating a key, see here. For instructions on adding this key to your GitHub account, see here. Once you have completed these two steps, confirm that your key was added by visiting the https://github.com/USERNAME.keys link described above.
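If you would like to script this check, the sketch below is one illustrative way to do it in Python 3; it is not part of the official course materials. It assumes network access and that GitHub serves one public key per line (key type plus base64 data, without the trailing comment) at https://github.com/USERNAME.keys. The script name `check_keys.py`, the function names, and the optional key-file argument are hypothetical choices made for this example.

```python
"""Hypothetical helper: check which public SSH keys GitHub lists for a user.

Sketch only. Assumes Python 3 with network access, and that
https://github.com/USERNAME.keys returns one "type base64" key per line.
"""
import sys
import urllib.request
from typing import List


def keys_url(username: str) -> str:
    """Build the public-keys URL for a GitHub username."""
    return f"https://github.com/{username}.keys"


def parse_keys(text: str) -> List[str]:
    """Split a .keys response body into its non-empty key lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def key_body(line: str) -> str:
    """Keep only 'type base64' from a key line, dropping any trailing comment."""
    return " ".join(line.split()[:2])


def is_listed(local_pub: str, github_keys: List[str]) -> bool:
    """True if the local public key (e.g. a .pub file's contents) is on GitHub."""
    return key_body(local_pub) in {key_body(k) for k in github_keys}


def fetch_keys(username: str) -> List[str]:
    """Download and parse the keys GitHub lists for `username`."""
    with urllib.request.urlopen(keys_url(username), timeout=10) as resp:
        return parse_keys(resp.read().decode("utf-8"))


def main(argv: List[str]) -> None:
    if len(argv) < 2:
        print("usage: python check_keys.py GITHUB_USERNAME [PATH_TO_PUBLIC_KEY]")
        return
    keys = fetch_keys(argv[1])
    if not keys:
        print("GitHub lists no public keys for this account yet.")
        return
    print(f"GitHub lists {len(keys)} key(s) for {argv[1]}.")
    if len(argv) > 2:
        # Compare the local .pub file against what GitHub reports.
        with open(argv[2], encoding="utf-8") as fh:
            local = fh.read().strip()
        if is_listed(local, keys):
            print("Your local key is listed on GitHub.")
        else:
            print("Your local key was NOT found on GitHub.")


if __name__ == "__main__":
    main(sys.argv)
```

For example, `python check_keys.py YOUR_USERNAME ~/.ssh/id_ed25519.pub` would report whether that key appears on your account. The key path here is an assumption; use whichever `.pub` file your key-generation step created. The comparison ignores the comment field because GitHub's `.keys` listing omits it.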
In addition to the items above, Windows users will need to install a terminal application. We recommend one of the two following options:
In addition to the items above, we recommend that users work through a basic command-line tutorial, which can be found on Codecademy. The first week will move relatively slowly through these basics, but if you find you're having trouble keeping up, we strongly recommend completing the tutorial before the second week.
Cover the basic principles of High-throughput Sequencing (HTS) and the subsequent steps involved in processing this data (analysis pipelines).
Introduction to basic Unix commands and text editors (Vim). A Google Image search for "Unix commands" or "vim commands" turns up handy cheat sheets for all of these tools.
Understand the basics of high-performance computing, servers, and job/workload managers. It is also important to know how to access public data and transfer this data to and from the server.
Quality control, mapping, and visualization are the first steps after obtaining your sequencing data. It is important to confirm that your sequencing experiment succeeded before moving on to downstream analyses.
A quick assessment to make sure everyone is comfortable with the first four days of the workshop. We will give you a short task that should ideally take less than an hour to complete. If it takes longer, it is a good idea to review and practice the skills from the first week over the weekend, because the second week will move at a much quicker pace.
For RNA-seq, we will cover read counting using featureCounts, isoform analysis using StringTie/Ballgown & creating a custom GTF file from these annotations, and differential expression analysis (DEA) using DESeq2.
We will learn how to assess the quality of our data post-mapping (mostly by analyzing BAM files). We will also learn how to use MultiQC to combine the outputs from multiple samples into a single QC report. In the second half of the day, we will learn how to annotate nascent sequencing data, since most existing annotations are based on ChIP-seq (for enhancers) and RNA-seq/steady-state data (for genes). Nascent analyses can capture elements such as intergenic & intragenic transcriptional regulatory elements, alternative 5'-end RNA polymerase initiation, and 3'-end run-on. We therefore need a way to quickly capture all of these elements so they can be analyzed with methods such as motif displacement analysis, differential transcription analysis, and comparative analyses with RNA/ChIP/ATAC-seq. We will use FStitch to capture these unannotated regions; the principles behind Tfit and DAStk will be introduced in the video beforehand and discussed during the hands-on portion of the workshop.
Variant calling using GATK and single-cell sequencing analysis.
This section will cover the basic analysis of ChIP-seq and ATAC-seq data. We will cover peak calling using MACS2, the different settings required for peak calling depending on the type of data, and motif displacement (MD) analysis using DAStk (also covered in the Day 7 homework).
Day 10 is meant to get you started learning additional tools that might be helpful in downstream analysis of short read data. This year (2019) we will cover downloading data from ENCODE, some basic BedTools commands, and open biomedical ontologies. Additionally, an advanced homework assignment will go over creating command-line runnable bash scripts with for-loops, if-statements, and user-input.