Marco Antoniotti, DISCo, Università degli Studi di Milano Bicocca
Copyright (c) 2018, Marco Antoniotti, all rights reserved.
marco.antoniotti <a> unimib.it
bimib.disco.unimib.it
Introduction
============
The goal of this project is to produce analyses of cancer data sets from public data sources. The results of these analyses will be putative ''cancer progression models'' produced with the tools included in the TRanslational ONCOlogy (TRONCO) R/Bioconductor suite (www.troncopackage.org).
The TRONCO website contains references to the papers published by the BIMIB group and, in particular for the purpose of this exercise, to the paper:
* "Algorithmic methods to infer the evolutionary trajectories in cancer progression" by Caravagna et al., PNAS, 2016, https://doi.org/10.1073/pnas.1520213113
The paper describes a ''pipeline'' that can be applied to analyze several publicly available cancer data sets, in particular those of:
* The Cancer Genome Atlas (TCGA -- https://cancergenome.nih.gov/)
The Exercise
============
The troncopackage.org site contains instructions to replicate the study described in the PNAS paper above. See https://sites.google.com/site/troncopackage/picnic
The goal of the exercise is first to replicate the PICNIC study and then to analyze one of the other cancers available in TCGA.
The suggestion is to use the following resources to get the relevant data.
* firebrowse.org -- a user-friendly and mostly intuitive layer for accessing the data in TCGA (whose data model is quite complicated). After selecting the cohort, that is, the cancer type, you can use firebrowse.org to select, e.g., ''Copy Number Analyses'' or ''Mutation Analyses''. Finally, for each study listed, you can download the .MAF file, which can be handled by TRONCO.
* COSMIC (https://cancer.sanger.ac.uk/cosmic) -- a comprehensive catalog of somatic mutations in cancer. For each cancer, you will need to select a number of ''driver'' mutations to use as input to the TRONCO PICNIC pipeline. Normally, you would find these mutations by reading the reference literature (each cancer cohort on TCGA has an associated ''official'' paper describing it); alternatively, you could select a subset of the somatic mutations described in COSMIC for a given cancer type, although this will require quite a bit of work to understand how COSMIC works.
In the end, you will need, above all:
* A set of .MAF or .GISTIC files containing the ''raw data''
* A list of ''driver'' genes (or ''events'')
With these in hand, you should be able to tweak the TRONCO PICNIC R code to produce analyses for a different cancer type.
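To get a feeling for what these two inputs amount to before touching the R code, here is a minimal sketch in Python. It is not part of TRONCO or of the PICNIC pipeline, which do the actual import in R. It only shows how a .MAF file and a list of driver genes can be combined into a binary samples-by-genes table. The column names (`Hugo_Symbol`, `Variant_Classification`, `Tumor_Sample_Barcode`) are standard MAF columns; the inline data, the sample names, the driver list, and the function `mutation_matrix` are made up for illustration, and ignoring ''Silent'' mutations is an assumption, not a rule from the paper.

```python
import csv
import io

# Hypothetical, hand-made MAF excerpt (tab-separated). Real MAF files have many
# more columns; only these three standard ones are used here.
SAMPLE_MAF = (
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "TP53\tMissense_Mutation\tSAMPLE-01\n"
    "KRAS\tMissense_Mutation\tSAMPLE-01\n"
    "TTN\tSilent\tSAMPLE-01\n"
    "APC\tNonsense_Mutation\tSAMPLE-02\n"
    "TP53\tSilent\tSAMPLE-02\n"
    "KRAS\tMissense_Mutation\tSAMPLE-03\n"
)

# Hypothetical driver list; in the exercise it comes from the literature or COSMIC.
DRIVER_GENES = ["APC", "KRAS", "TP53"]


def mutation_matrix(maf_lines, drivers, skip_classes=("Silent",)):
    """Return {sample: {gene: 0 or 1}} restricted to the given driver genes."""
    # Comment lines starting with '#' (e.g. a version header) are skipped.
    rows = csv.DictReader(
        (line for line in maf_lines if not line.startswith("#")),
        delimiter="\t",
    )
    matrix = {}
    for row in rows:
        sample = row["Tumor_Sample_Barcode"]
        gene = row["Hugo_Symbol"]
        # Every sample gets a row, even if it carries no driver mutation.
        sample_row = matrix.setdefault(sample, {g: 0 for g in drivers})
        # Assumption: mutation classes listed in skip_classes are not events.
        if gene in sample_row and row["Variant_Classification"] not in skip_classes:
            sample_row[gene] = 1
    return matrix


matrix = mutation_matrix(io.StringIO(SAMPLE_MAF), DRIVER_GENES)
for sample in sorted(matrix):
    print(sample, [matrix[sample][g] for g in DRIVER_GENES])
```

To try it on a downloaded file, pass an open file handle instead of the `io.StringIO` object, e.g. `mutation_matrix(open("my_cohort.maf"), DRIVER_GENES)`, where the file name is a placeholder.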
Notes
=====
You will need RStudio installed on your computer. Note that some of the most recent versions of R and Bioconductor require somewhat different commands to complete the installation. In general, you can skip the MUTEX and NBS stratification steps if you find them too difficult to install in the time allotted.
A Few Final Words
=================
Enjoy the exercise and do contact us if you feel like it. We do not expect you to get everything in place (or even right) in the time you have at your disposal.
Remember that you are curing cancer instead of helping some giant tech firm sell more underwear.