Download Presentation

0. Data and software download

Before we start, let’s download the data and install the software for the analysis of ThermoFisher / Affymetrix microarray data.

Small LUSC dataset

TAC software

Microarray raw CEL files

1. Simple example of data analysis

Let us try to work with gene expression data directly in Excel. This is definitely not the best choice, but it will help you get a feel for the data. Please download a subset of the TCGA LUSC data: lusc20.txt

It contains gene expression for 10 normal and 10 cancer lung tissues.

Task 1:

  1. Download the data and save as txt file.

  2. Import the data into Excel. Make sure that you use “Open...” from within Excel. Otherwise Excel may convert gene names (e.g. SEPT11) into dates and damage them!

Alternatively, please use the prepared lusc20.xlsx file (on some systems the problems with the decimal separator are severe!).

  1. Calculate:
  • average expression for each gene: global, normal, cancer =AVERAGE()

  • exclude genes that are not detected

  • log fold change: logFC = MeanTumour - MeanNormal (“-”, not “/”, as we work in log scale)

  • perform a t-test comparing tumour and normal tissues =T.TEST()

  • assign a rank to each p-value, either manually or with =RANK.AVG()

  • estimate FDR: FDR = m * pv / k, where m - number of genes, pv - p-value, k - rank (1..m)
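To cross-check the spreadsheet results outside Excel, the logFC and FDR steps above can be sketched in plain Python. This is only an illustrative sketch: the function names and the toy numbers are assumptions, not part of the exercise, and the t-test itself is left to Excel. Ties are given their average rank, mirroring =RANK.AVG(), and the FDR is left uncapped so that it matches the formula FDR = m * pv / k exactly.

```python
from statistics import mean


def log_fold_change(tumour, normal):
    """logFC for one gene on log-scale data: difference of the group means."""
    return mean(tumour) - mean(normal)


def fdr_from_pvalues(pvalues):
    """Apply FDR = m * pv / k, where k is the ascending rank (1..m).

    Tied p-values share their average rank, as RANK.AVG does in Excel.
    The result is not capped at 1, to stay identical to the formula.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    ranks = [0.0] * m
    pos = 0
    while pos < m:
        # Find the end of the current group of tied p-values.
        end = pos
        while end + 1 < m and pvalues[order[end + 1]] == pvalues[order[pos]]:
            end += 1
        avg_rank = ((pos + 1) + (end + 1)) / 2
        for j in range(pos, end + 1):
            ranks[order[j]] = avg_rank
        pos = end + 1
    return [m * p / k for p, k in zip(pvalues, ranks)]


# Toy values (hypothetical, not taken from lusc20.txt).
example_logfc = log_fold_change([8.0, 9.0, 10.0], [6.0, 7.0, 8.0])
example_fdr = fdr_from_pvalues([0.01, 0.04, 0.03, 0.005])
print(example_logfc)
print(example_fdr)
```

With four p-values, the smallest one gets rank 1 and is multiplied by m = 4, while the largest gets rank 4 and is left unchanged.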

  1. Draw several plots used for visualization in transcriptomics:
  • MA-plot (x: AverageGlobal, y: logFC)

  • Volcano-plot (x: logFC, y: -log10(FDR))
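The coordinates for both plots follow directly from the columns computed earlier. The short Python sketch below shows the transformation, assuming per-gene lists of equal length; the function names and the toy values are hypothetical. It assumes every FDR is strictly positive, since the logarithm of zero is undefined.

```python
import math


def ma_points(average_global, log_fc):
    """MA-plot coordinates: x = global average expression, y = logFC."""
    return list(zip(average_global, log_fc))


def volcano_points(log_fc, fdr):
    """Volcano-plot coordinates: x = logFC, y = -log10(FDR).

    Assumes every FDR value is > 0.
    """
    return [(lfc, -math.log10(q)) for lfc, q in zip(log_fc, fdr)]


# Toy values for three genes (hypothetical).
avg = [7.5, 3.2, 10.1]
lfc = [2.0, -1.0, 0.1]
fdr = [0.01, 0.001, 1.0]

print(ma_points(avg, lfc))
print(volcano_points(lfc, fdr))
```

Strongly significant genes end up high on the volcano plot because a small FDR gives a large -log10(FDR).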

  1. Run online enrichment analysis. Select 1000 genes with lowest FDR (ensure that they all have FDR<0.05) and feed them to Enrichr. Investigate Pathways:Reactome2016, Ontologies:GO BioProc 2018
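The gene selection for the enrichment step can also be expressed as a small Python sketch. The function name, its parameters and the gene names are hypothetical and serve only as an illustration. It keeps the genes with FDR below the cutoff, sorts them by FDR and returns at most the requested number, so every selected gene is guaranteed to satisfy FDR<0.05.

```python
def select_top_genes(fdr_by_gene, n_top=1000, fdr_cutoff=0.05):
    """Return up to n_top gene names with the lowest FDR, all below fdr_cutoff."""
    significant = [(q, gene) for gene, q in fdr_by_gene.items() if q < fdr_cutoff]
    significant.sort()  # ascending FDR; ties are broken by gene name
    return [gene for q, gene in significant[:n_top]]


# Toy input (hypothetical gene names and FDR values).
fdr_by_gene = {"GENE_A": 0.001, "GENE_B": 0.2, "GENE_C": 0.03, "GENE_D": 0.0005}
top = select_top_genes(fdr_by_gene, n_top=2)

# One gene per line, ready to paste into the Enrichr input box.
print("\n".join(top))
```

If fewer than n_top genes pass the cutoff, the list is simply shorter, which makes it easy to notice that the FDR<0.05 condition was the limiting factor.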

2. TAC software

Transcriptome Analysis Console (TAC) is a user-friendly tool for the analysis of ex-Affymetrix arrays (Affymetrix has since been acquired by ThermoFisher). Please install it and import the CEL files from SCC_CEL.zip.

You might need to register in order to download the library files. If you do not have an account, use the login petr.nazarov@crp-sante.lu and ask for the password.

Task 2:

  1. Annotate and import the data. Samples with “N” in the name come from normal tissue. To speed up the analysis, select “Gene” instead of “Gene+Exon” (optional).

  2. See PCA visualization

  3. Perform differential expression analysis (DEA)

  4. Export the results of DEA and the data

  5. Check functional annotation of the significant genes

3. Optional task

Perform an analysis of the time-course experiment on the IFNg-stimulated A375 cell line. The data are in IFNg_CEL.zip.

The dataset is discussed in Nazarov et al., 2013.


LIH