Let us try to cluster samples from previously considered datasets. We will use PCA to present samples in 2D space. We will use default kmeans
function to cluster.
PC = prcomp(t(Data$LX),scale=TRUE)
plot(PC$x[,"PC1"],PC$x[,"PC2"],col=1,pch=19,cex=2, main="Samples in PC-space")
## clustering -> 4 clusters
cl = kmeans(t(Data$LX),centers=4,nstart=10)$cluster
plot(PC$x[,"PC1"],PC$x[,"PC2"],col=cl+1,pch=19,cex=2, main="Samples in PC-space")
Try to change number of clusters. We can also cluster features - usually not very interesting.
cl2 = kmeans(Data$LX,centers=4,nstart=10)$cluster
plotPCA(t(Data$LX),col=cl2+1, cex=size)
## clustering of the samples
hc = hclust(dist(t(Data$LX)))
plot(hc)
## "cut the tree" to get defined clusters
cl = cutree(hc,k=4)
plot(PC$x[,"PC1"],PC$x[,"PC2"],col=cl+1,pch=19,cex=2, main="Samples in PC-space")
## play with k
As you can see - not so beautiful. Usually people apply consensus clustering to get defined and robust clusters.
## install the package if needed:
#install.packages("ConsensusClusterPlus")
library(ConsensusClusterPlus)
## clustering of the samples
hc = ConsensusClusterPlus(Data$LX,maxK=4,reps=100,pItem=0.8,pFeature=1, title="GeneProCC_hc",clusterAlg="hc",distance="euclidean",seed=1234567890,plot="png")
## get clusters
cl=hc[[4]]$consensusClass
## plot
plot(PC$x[,"PC1"],PC$x[,"PC2"],col=cl+1,pch=19,cex=2, main="Samples in PC-space")
Not much better but at least it is more reproducible.
One of the most used methods for visualizing omics data is heatmap. Here we will use heatmaps from the package pheatmap
.
## install "pheatmap" if needed
#install.packages("pheatmap")
library(pheatmap)
## Warning: package 'pheatmap' was built under R version 3.5.2
## let's use previously defined markers
pheatmap(Data$LX[imarker,], annotation_col = Data$Meta)
There is not much sense to build heatmaps on all available features. Let’s select the most interesting. We will use custom warp-up for Limma package: DEA.limma
described in Transcriptomics course (section 4).
## load analysis function
source("http://sablab.net/scripts/LibDEA.r")
## Perform differential expression analysis (Fisher statistics)
Res = DEA.limma(Data$LX, group=Data$Meta$Time)
## Loading required package: limma
## [1] "day10-day04" "day14-day04" "day14-day10" "day21-day04" "day21-day10"
## [6] "day21-day14"
##
## Limma,unpaired: 3619,2971,2281,1732 DEG (FDR<0.05, 1e-2, 1e-3, 1e-4)
## Significant protein groups
isig = which(Res$FDR<0.1e-4)
## reannotate features by gene names
X = Data$LX[isig,]
rownames(X) = substr(Data$Anno$Gene.names[isig],1,20)
## vizualize
pheatmap(X,scale = "row", annotation_col = Data$Meta,fontsize_row = 2)