Before we start, let’s download the data and install the software for ThermoFisher / Affymetrix microarray data analysis.
Let us try to work with gene expression data directly in Excel. This is definitely not the best choice, but it will help you get a feel for the data. Please download a subset of the TCGA LUSC data: lusc20.txt
It contains gene expression for 10 normal and 10 cancer lung tissues.
Download the data and save it as a txt file.
Import the data into Excel. Make sure you use “Open...” from within Excel; otherwise gene names (e.g. SEPT11) will be converted to dates and damaged!
Alternatively (on some systems the problems with the decimal separator are severe!), please use the prepared lusc20.xlsx file
average expression for each gene: global, normal, cancer =AVERAGE()
exclude genes that are not detected
log fold change: logFC = MeanTumour - MeanNormal (“-”, not “/”, as we work in log scale)
perform a t-test comparing tumour and normal tissues =T.TEST()
assign a rank to each p-value, either manually or by =RANK.AVG()
estimate FDR: FDR = m * pv / k, where m is the number of genes, pv is the p-value of the gene, and k is the rank of that p-value (1..m, smallest p-value first)
MA-plot (x: AverageGlobal, y: logFC)
Volcano-plot (x: logFC, y: -log10(FDR))
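The spreadsheet steps above can also be sketched in a few lines of Python, which may help when checking the Excel results. This is only an illustration under stated assumptions: the function names (log_fc, fdr_from_pvalues, volcano_point) are hypothetical, the numbers are made up and not taken from lusc20.txt, the p-values are assumed to come from a t-test such as =T.TEST(), ties are ranked in order of appearance rather than averaged as =RANK.AVG() does, and capping FDR at 1 is an addition not stated in the formula above.

```python
from math import log10
from statistics import mean


def log_fc(tumour, normal):
    """logFC as a difference of means, since the values are already in log scale."""
    return mean(tumour) - mean(normal)


def fdr_from_pvalues(pvalues):
    """FDR = m * pv / k, where k is the ascending 1-based rank of each p-value.

    Assumption: values are capped at 1.0, and tied p-values get consecutive
    ranks (unlike =RANK.AVG(), which averages them).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    fdr = [0.0] * m
    for k, i in enumerate(order, start=1):
        fdr[i] = min(1.0, m * pvalues[i] / k)
    return fdr


def volcano_point(logfc, fdr_value):
    """Coordinates of one gene on the volcano plot: (logFC, -log10(FDR))."""
    return logfc, -log10(fdr_value)


# Made-up example values, for illustration only
example_logfc = log_fc(tumour=[8.0, 9.0], normal=[6.0, 7.0])
example_pvalues = [0.01, 0.04, 0.03, 0.20]
example_fdr = fdr_from_pvalues(example_pvalues)
example_point = volcano_point(example_logfc, example_fdr[0])
```

Note that the formula as written does not guarantee that FDR increases with rank: in the example, the p-value 0.03 (rank 2) gets a larger FDR than 0.04 (rank 3). The usual Benjamini-Hochberg adjustment adds a cumulative-minimum step from the largest rank downwards to remove this effect.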
Transcriptome Analysis Console is a user-friendly tool for the analysis of ex-Affymetrix arrays, which now belong to ThermoFisher. Please install it and import the CEL files from SCC_CEL.zip.
You might need to register to download the library files. If you do not have an account, use the login: petr.nazarov@crp-sante.lu and ask for the password.
Annotate and import the data. Samples with “N” in the name come from normal tissue. To speed up the analysis, select “Gene” instead of “Gene+Exon” (optional).
See PCA visualization
Perform differential expression analysis (DEA)
Export the results of DEA and the data
Check functional annotation of the significant genes
Perform an analysis of the time-course experiment for the IFNg-stimulated A375 cell line. The data are in IFNg_CEL.zip
The dataset is discussed in Nazarov et al., 2013