1.1. Current folder

R can import data from local storage or Internet, and export it locally. Let us first setup the current folder, where our results will be stored.

rm(list=ls()) ## clean memory

getwd()   # shows current folder
dir()     # shows files in the current folder
setwd("D:/Data/R") # sets the current folder

 

1.2. Read values from a text file

Here is a file with some marker genes Markers.

We can read values from an unformatted text file using scan().

markers = scan("http://edu.sablab.net/rp2019/data/markers.txt",what = "",sep="\n") # what - defines the value class
print(markers)
##  [1] "Gfap"    "Grid1"   "Grin2b"  "Mag"     "Mki67"   "Msi1"    "Nes"    
##  [8] "Plp1"    "Slc6a1"  "Slc6a11" "Sox2"    "Syp"

You can download an entire webpage by scan to parse it afterwards.

 

1.3. Read tables

We will use read.table() to import the data as a data frame. Let’s try on sample annotation file: sampleAnnotation.txt.

Some parameters are important in read.table():

Samples = read.table("http://edu.sablab.net/rp2019/data/sampleAnnotation.txt", header=T, sep="\t", as.is=T, row.names = 1) 
Time Replicate
BO2day4rep1 day04 rep1
BO2day4rep2 day04 rep2
BO2day4rep3 day04 rep3
BO2day4rep4 day04 rep4
BO2day4rep5 day04 rep5
BO2day10rep1 day10 rep1
BO2day10rep2 day10 rep2
BO2day10rep3 day10 rep3
BO2day10rep4 day10 rep4
BO2day10rep5 day10 rep5
BO2day14rep1 day14 rep1
BO2day14rep2 day14 rep2
BO2day14rep3 day14 rep3
BO2day14rep4 day14 rep4
BO2day14rep5 day14 rep5
BO2day21rep1 day21 rep1
BO2day21rep2 day21 rep2
BO2day21rep3 day21 rep3
BO2day21rep4 day21 rep4

Do not forget functions that allow you seeing, what is inside your data:

str(Samples)
## 'data.frame':    19 obs. of  2 variables:
##  $ Time     : chr  "day04" "day04" "day04" "day04" ...
##  $ Replicate: chr  "rep1" "rep2" "rep3" "rep4" ...
head(Samples)
##               Time Replicate
## BO2day4rep1  day04      rep1
## BO2day4rep2  day04      rep2
## BO2day4rep3  day04      rep3
## BO2day4rep4  day04      rep4
## BO2day4rep5  day04      rep5
## BO2day10rep1 day10      rep1
summary(Samples)
##      Time            Replicate        
##  Length:19          Length:19         
##  Class :character   Class :character  
##  Mode  :character   Mode  :character
View(Samples)

You can see row and column names of the loaded data.frame object:

rownames(Samples)
##  [1] "BO2day4rep1"  "BO2day4rep2"  "BO2day4rep3"  "BO2day4rep4" 
##  [5] "BO2day4rep5"  "BO2day10rep1" "BO2day10rep2" "BO2day10rep3"
##  [9] "BO2day10rep4" "BO2day10rep5" "BO2day14rep1" "BO2day14rep2"
## [13] "BO2day14rep3" "BO2day14rep4" "BO2day14rep5" "BO2day21rep1"
## [17] "BO2day21rep2" "BO2day21rep3" "BO2day21rep4"
colnames(Samples)
## [1] "Time"      "Replicate"

 

1.4. Read and extract values from a MaxQuant file

1.4.1. Reading data table

Here we will try to read protein group file generated by MaxQuant. Let us first do it manually first. We will work with data from rat brain organoid experiment. The data originated from Elena Martinez’ work in collaboration with Rolf Bjerkvig. You can download the proteinGroups.txt file to use it locally, or import it directly from Internet.

file = "D:/Data/RProteo2019/proteinGroups.txt"
## uncomment the following line to import from Internet
# file = "http://edu.sablab.net/rp2019/data/proteinGroups.txt"
Tab = read.table(file, header=T, sep="\t", as.is=T, row.names = 1) 

## see number of rows and columns
dim(Tab)
## [1] 5460  185

The table contains too many columns to be properly seen after str(). Let’s show all column names:

colnames(Tab)
##   [1] "Majority.protein.IDs"                
##   [2] "Peptide.counts..all."                
##   [3] "Peptide.counts..razor.unique."       
##   [4] "Peptide.counts..unique."             
##   [5] "Protein.names"                       
##   [6] "Gene.names"                          
##   [7] "Fasta.headers"                       
##   [8] "Number.of.proteins"                  
##   [9] "Peptides"                            
##  [10] "Razor...unique.peptides"             
##  [11] "Unique.peptides"                     
##  [12] "Peptides.BO2day10rep1"               
##  [13] "Peptides.BO2day10rep2"               
##  [14] "Peptides.BO2day10rep3"               
##  [15] "Peptides.BO2day10rep4"               
##  [16] "Peptides.BO2day10rep5"               
##  [17] "Peptides.BO2day14rep1"               
##  [18] "Peptides.BO2day14rep2"               
##  [19] "Peptides.BO2day14rep3"               
##  [20] "Peptides.BO2day14rep4"               
##  [21] "Peptides.BO2day14rep5"               
##  [22] "Peptides.BO2day21rep1"               
##  [23] "Peptides.BO2day21rep2"               
##  [24] "Peptides.BO2day21rep3"               
##  [25] "Peptides.BO2day21rep4"               
##  [26] "Peptides.BO2day4rep1"                
##  [27] "Peptides.BO2day4rep2"                
##  [28] "Peptides.BO2day4rep3"                
##  [29] "Peptides.BO2day4rep4"                
##  [30] "Peptides.BO2day4rep5"                
##  [31] "Razor...unique.peptides.BO2day10rep1"
##  [32] "Razor...unique.peptides.BO2day10rep2"
##  [33] "Razor...unique.peptides.BO2day10rep3"
##  [34] "Razor...unique.peptides.BO2day10rep4"
##  [35] "Razor...unique.peptides.BO2day10rep5"
##  [36] "Razor...unique.peptides.BO2day14rep1"
##  [37] "Razor...unique.peptides.BO2day14rep2"
##  [38] "Razor...unique.peptides.BO2day14rep3"
##  [39] "Razor...unique.peptides.BO2day14rep4"
##  [40] "Razor...unique.peptides.BO2day14rep5"
##  [41] "Razor...unique.peptides.BO2day21rep1"
##  [42] "Razor...unique.peptides.BO2day21rep2"
##  [43] "Razor...unique.peptides.BO2day21rep3"
##  [44] "Razor...unique.peptides.BO2day21rep4"
##  [45] "Razor...unique.peptides.BO2day4rep1" 
##  [46] "Razor...unique.peptides.BO2day4rep2" 
##  [47] "Razor...unique.peptides.BO2day4rep3" 
##  [48] "Razor...unique.peptides.BO2day4rep4" 
##  [49] "Razor...unique.peptides.BO2day4rep5" 
##  [50] "Unique.peptides.BO2day10rep1"        
##  [51] "Unique.peptides.BO2day10rep2"        
##  [52] "Unique.peptides.BO2day10rep3"        
##  [53] "Unique.peptides.BO2day10rep4"        
##  [54] "Unique.peptides.BO2day10rep5"        
##  [55] "Unique.peptides.BO2day14rep1"        
##  [56] "Unique.peptides.BO2day14rep2"        
##  [57] "Unique.peptides.BO2day14rep3"        
##  [58] "Unique.peptides.BO2day14rep4"        
##  [59] "Unique.peptides.BO2day14rep5"        
##  [60] "Unique.peptides.BO2day21rep1"        
##  [61] "Unique.peptides.BO2day21rep2"        
##  [62] "Unique.peptides.BO2day21rep3"        
##  [63] "Unique.peptides.BO2day21rep4"        
##  [64] "Unique.peptides.BO2day4rep1"         
##  [65] "Unique.peptides.BO2day4rep2"         
##  [66] "Unique.peptides.BO2day4rep3"         
##  [67] "Unique.peptides.BO2day4rep4"         
##  [68] "Unique.peptides.BO2day4rep5"         
##  [69] "Sequence.coverage...."               
##  [70] "Unique...razor.sequence.coverage...."
##  [71] "Unique.sequence.coverage...."        
##  [72] "Mol..weight..kDa."                   
##  [73] "Sequence.length"                     
##  [74] "Sequence.lengths"                    
##  [75] "Q.value"                             
##  [76] "Score"                               
##  [77] "Identification.type.BO2day10rep1"    
##  [78] "Identification.type.BO2day10rep2"    
##  [79] "Identification.type.BO2day10rep3"    
##  [80] "Identification.type.BO2day10rep4"    
##  [81] "Identification.type.BO2day10rep5"    
##  [82] "Identification.type.BO2day14rep1"    
##  [83] "Identification.type.BO2day14rep2"    
##  [84] "Identification.type.BO2day14rep3"    
##  [85] "Identification.type.BO2day14rep4"    
##  [86] "Identification.type.BO2day14rep5"    
##  [87] "Identification.type.BO2day21rep1"    
##  [88] "Identification.type.BO2day21rep2"    
##  [89] "Identification.type.BO2day21rep3"    
##  [90] "Identification.type.BO2day21rep4"    
##  [91] "Identification.type.BO2day4rep1"     
##  [92] "Identification.type.BO2day4rep2"     
##  [93] "Identification.type.BO2day4rep3"     
##  [94] "Identification.type.BO2day4rep4"     
##  [95] "Identification.type.BO2day4rep5"     
##  [96] "Sequence.coverage.BO2day10rep1...."  
##  [97] "Sequence.coverage.BO2day10rep2...."  
##  [98] "Sequence.coverage.BO2day10rep3...."  
##  [99] "Sequence.coverage.BO2day10rep4...."  
## [100] "Sequence.coverage.BO2day10rep5...."  
## [101] "Sequence.coverage.BO2day14rep1...."  
## [102] "Sequence.coverage.BO2day14rep2...."  
## [103] "Sequence.coverage.BO2day14rep3...."  
## [104] "Sequence.coverage.BO2day14rep4...."  
## [105] "Sequence.coverage.BO2day14rep5...."  
## [106] "Sequence.coverage.BO2day21rep1...."  
## [107] "Sequence.coverage.BO2day21rep2...."  
## [108] "Sequence.coverage.BO2day21rep3...."  
## [109] "Sequence.coverage.BO2day21rep4...."  
## [110] "Sequence.coverage.BO2day4rep1...."   
## [111] "Sequence.coverage.BO2day4rep2...."   
## [112] "Sequence.coverage.BO2day4rep3...."   
## [113] "Sequence.coverage.BO2day4rep4...."   
## [114] "Sequence.coverage.BO2day4rep5...."   
## [115] "Intensity"                           
## [116] "Intensity.BO2day10rep1"              
## [117] "Intensity.BO2day10rep2"              
## [118] "Intensity.BO2day10rep3"              
## [119] "Intensity.BO2day10rep4"              
## [120] "Intensity.BO2day10rep5"              
## [121] "Intensity.BO2day14rep1"              
## [122] "Intensity.BO2day14rep2"              
## [123] "Intensity.BO2day14rep3"              
## [124] "Intensity.BO2day14rep4"              
## [125] "Intensity.BO2day14rep5"              
## [126] "Intensity.BO2day21rep1"              
## [127] "Intensity.BO2day21rep2"              
## [128] "Intensity.BO2day21rep3"              
## [129] "Intensity.BO2day21rep4"              
## [130] "Intensity.BO2day4rep1"               
## [131] "Intensity.BO2day4rep2"               
## [132] "Intensity.BO2day4rep3"               
## [133] "Intensity.BO2day4rep4"               
## [134] "Intensity.BO2day4rep5"               
## [135] "LFQ.intensity.BO2day10rep1"          
## [136] "LFQ.intensity.BO2day10rep2"          
## [137] "LFQ.intensity.BO2day10rep3"          
## [138] "LFQ.intensity.BO2day10rep4"          
## [139] "LFQ.intensity.BO2day10rep5"          
## [140] "LFQ.intensity.BO2day14rep1"          
## [141] "LFQ.intensity.BO2day14rep2"          
## [142] "LFQ.intensity.BO2day14rep3"          
## [143] "LFQ.intensity.BO2day14rep4"          
## [144] "LFQ.intensity.BO2day14rep5"          
## [145] "LFQ.intensity.BO2day21rep1"          
## [146] "LFQ.intensity.BO2day21rep2"          
## [147] "LFQ.intensity.BO2day21rep3"          
## [148] "LFQ.intensity.BO2day21rep4"          
## [149] "LFQ.intensity.BO2day4rep1"           
## [150] "LFQ.intensity.BO2day4rep2"           
## [151] "LFQ.intensity.BO2day4rep3"           
## [152] "LFQ.intensity.BO2day4rep4"           
## [153] "LFQ.intensity.BO2day4rep5"           
## [154] "MS.MS.count.BO2day10rep1"            
## [155] "MS.MS.count.BO2day10rep2"            
## [156] "MS.MS.count.BO2day10rep3"            
## [157] "MS.MS.count.BO2day10rep4"            
## [158] "MS.MS.count.BO2day10rep5"            
## [159] "MS.MS.count.BO2day14rep1"            
## [160] "MS.MS.count.BO2day14rep2"            
## [161] "MS.MS.count.BO2day14rep3"            
## [162] "MS.MS.count.BO2day14rep4"            
## [163] "MS.MS.count.BO2day14rep5"            
## [164] "MS.MS.count.BO2day21rep1"            
## [165] "MS.MS.count.BO2day21rep2"            
## [166] "MS.MS.count.BO2day21rep3"            
## [167] "MS.MS.count.BO2day21rep4"            
## [168] "MS.MS.count.BO2day4rep1"             
## [169] "MS.MS.count.BO2day4rep2"             
## [170] "MS.MS.count.BO2day4rep3"             
## [171] "MS.MS.count.BO2day4rep4"             
## [172] "MS.MS.count.BO2day4rep5"             
## [173] "MS.MS.count"                         
## [174] "Only.identified.by.site"             
## [175] "Reverse"                             
## [176] "Potential.contaminant"               
## [177] "id"                                  
## [178] "Peptide.IDs"                         
## [179] "Peptide.is.razor"                    
## [180] "Mod..peptide.IDs"                    
## [181] "Evidence.IDs"                        
## [182] "MS.MS.IDs"                           
## [183] "Best.MS.MS"                          
## [184] "Oxidation..M..site.IDs"              
## [185] "Oxidation..M..site.positions"

First, we should remove bad features using information in Reverse and Potential.contaminant columns.

print("Original size:")
## [1] "Original size:"
dim(Tab)
## [1] 5460  185
ikeep = Tab$Reverse=="" & Tab$Potential.contaminant==""
Tab = Tab[ikeep,]
print("Filtered:")
## [1] "Filtered:"
dim(Tab)
## [1] 5306  185

Now, let us define a data container - it will be easier to manipulate later. We are interested in several columns to annotate the features (in addition to rownames, which is protein group IDs). Let’s keep: Protein.names, Gene.names, Fasta.headers. Sample information we shall take from

Data = list()
## get feature annotation
Data$Anno = Tab[,c("Protein.names","Gene.names","Fasta.headers")]
## get sample annotation
Data$Meta = read.table("http://edu.sablab.net/rp2019/data/sampleAnnotation.txt", header=T, sep="\t", as.is=T, row.names = 1) 

str(Data)
## List of 2
##  $ Anno:'data.frame':    5306 obs. of  3 variables:
##   ..$ Protein.names: chr [1:5306] "Synapsin-3" "" "" "" ...
##   ..$ Gene.names   : chr [1:5306] "Syn3" "Abcf2" "Morc3" "Ddx17" ...
##   ..$ Fasta.headers: chr [1:5306] "tr|A0A096MIT7|A0A096MIT7_RAT Synapsin-3 OS=Rattus norvegicus OX=10116 GN=Syn3 PE=1 SV=2;sp|O70441|SYN3_RAT Syna"| __truncated__ "tr|A0A096MIV5|A0A096MIV5_RAT ATP-binding cassette subfamily F member 2 OS=Rattus norvegicus OX=10116 GN=Abcf2 P"| __truncated__ "tr|D4A552|D4A552_RAT MORC family CW-type zinc finger 3 OS=Rattus norvegicus OX=10116 GN=Morc3 PE=1 SV=1;tr|A0A0"| __truncated__ "tr|E9PT29|E9PT29_RAT DEAD-box helicase 17 OS=Rattus norvegicus OX=10116 GN=Ddx17 PE=1 SV=3;tr|A0A096MIX2|A0A096"| __truncated__ ...
##  $ Meta:'data.frame':    19 obs. of  2 variables:
##   ..$ Time     : chr [1:19] "day04" "day04" "day04" "day04" ...
##   ..$ Replicate: chr [1:19] "rep1" "rep2" "rep3" "rep4" ...

Now, let us extract intensity data. We will use normalized LFQ intensities

## column indexes, containing LFQ intensities
icol = grep("LFQ.intensity",colnames(Tab))
## extract columns
Data$X0 = Tab[,icol]
## remove uninformative part of column names
colnames(Data$X0) = sub("LFQ.intensity.","",colnames(Data$X0))  
## matrices are faster than data.frames => transform to a matrix
Data$X0 = as.matrix(Data$X0)
str(Data)
## List of 3
##  $ Anno:'data.frame':    5306 obs. of  3 variables:
##   ..$ Protein.names: chr [1:5306] "Synapsin-3" "" "" "" ...
##   ..$ Gene.names   : chr [1:5306] "Syn3" "Abcf2" "Morc3" "Ddx17" ...
##   ..$ Fasta.headers: chr [1:5306] "tr|A0A096MIT7|A0A096MIT7_RAT Synapsin-3 OS=Rattus norvegicus OX=10116 GN=Syn3 PE=1 SV=2;sp|O70441|SYN3_RAT Syna"| __truncated__ "tr|A0A096MIV5|A0A096MIV5_RAT ATP-binding cassette subfamily F member 2 OS=Rattus norvegicus OX=10116 GN=Abcf2 P"| __truncated__ "tr|D4A552|D4A552_RAT MORC family CW-type zinc finger 3 OS=Rattus norvegicus OX=10116 GN=Morc3 PE=1 SV=1;tr|A0A0"| __truncated__ "tr|E9PT29|E9PT29_RAT DEAD-box helicase 17 OS=Rattus norvegicus OX=10116 GN=Ddx17 PE=1 SV=3;tr|A0A096MIX2|A0A096"| __truncated__ ...
##  $ Meta:'data.frame':    19 obs. of  2 variables:
##   ..$ Time     : chr [1:19] "day04" "day04" "day04" "day04" ...
##   ..$ Replicate: chr [1:19] "rep1" "rep2" "rep3" "rep4" ...
##  $ X0  : num [1:5306, 1:19] 2.46e+07 1.46e+07 1.44e+07 4.63e+08 4.78e+06 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:5306] "A0A096MIT7;O70441" "A0A096MIV5;A2VD14;F1M8H5;A0A096MJ15" "D4A552;A0A096MIX0" "E9PT29;A0A096MIX2;A0A096MJW9" ...
##   .. ..$ : chr [1:19] "BO2day10rep1" "BO2day10rep2" "BO2day10rep3" "BO2day10rep4" ...

1.4.2. Preprocessing

Now we have data in the matrix Data$X0. The data are extremely right-skewed, so log-transformation is needed before using any exploratory data analysis. Let us assume, that all 0 intensities come from proteins with low detection limit (another approach is to consider them as missing values - NA, but we will not use this here).

## define ofset to avoid -Inf as 1st percentile of non-zero data
os = quantile(Data$X0[Data$X0>0],probs = 0.01, na.rm=TRUE)
## log-transorm and shift left-most values to 0
Data$LX = log2(os + Data$X0) - log2(os)
Data$LX[is.na(Data$LX)] = 0
str(Data)
## List of 4
##  $ Anno:'data.frame':    5306 obs. of  3 variables:
##   ..$ Protein.names: chr [1:5306] "Synapsin-3" "" "" "" ...
##   ..$ Gene.names   : chr [1:5306] "Syn3" "Abcf2" "Morc3" "Ddx17" ...
##   ..$ Fasta.headers: chr [1:5306] "tr|A0A096MIT7|A0A096MIT7_RAT Synapsin-3 OS=Rattus norvegicus OX=10116 GN=Syn3 PE=1 SV=2;sp|O70441|SYN3_RAT Syna"| __truncated__ "tr|A0A096MIV5|A0A096MIV5_RAT ATP-binding cassette subfamily F member 2 OS=Rattus norvegicus OX=10116 GN=Abcf2 P"| __truncated__ "tr|D4A552|D4A552_RAT MORC family CW-type zinc finger 3 OS=Rattus norvegicus OX=10116 GN=Morc3 PE=1 SV=1;tr|A0A0"| __truncated__ "tr|E9PT29|E9PT29_RAT DEAD-box helicase 17 OS=Rattus norvegicus OX=10116 GN=Ddx17 PE=1 SV=3;tr|A0A096MIX2|A0A096"| __truncated__ ...
##  $ Meta:'data.frame':    19 obs. of  2 variables:
##   ..$ Time     : chr [1:19] "day04" "day04" "day04" "day04" ...
##   ..$ Replicate: chr [1:19] "rep1" "rep2" "rep3" "rep4" ...
##  $ X0  : num [1:5306, 1:19] 2.46e+07 1.46e+07 1.44e+07 4.63e+08 4.78e+06 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:5306] "A0A096MIT7;O70441" "A0A096MIV5;A2VD14;F1M8H5;A0A096MJ15" "D4A552;A0A096MIX0" "E9PT29;A0A096MIX2;A0A096MJW9" ...
##   .. ..$ : chr [1:19] "BO2day10rep1" "BO2day10rep2" "BO2day10rep3" "BO2day10rep4" ...
##  $ LX  : num [1:5306, 1:19] 4.87 4.15 4.14 9.06 2.7 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:5306] "A0A096MIT7;O70441" "A0A096MIV5;A2VD14;F1M8H5;A0A096MJ15" "D4A552;A0A096MIX0" "E9PT29;A0A096MIX2;A0A096MJW9" ...
##   .. ..$ : chr [1:19] "BO2day10rep1" "BO2day10rep2" "BO2day10rep3" "BO2day10rep4" ...

Finaly, let’s order the samples in the way, defined in Data$Meta.

## check consistancy
if (!all(colnames(Data$X0) %in% rownames(Data$Meta)))
  stop("Error! Sample annotation in Meta does not fit such in the data!")
Data$X0 = Data$X[,rownames(Data$Meta)]
Data$LX = Data$X[,rownames(Data$Meta)]
str(Data)
## List of 4
##  $ Anno:'data.frame':    5306 obs. of  3 variables:
##   ..$ Protein.names: chr [1:5306] "Synapsin-3" "" "" "" ...
##   ..$ Gene.names   : chr [1:5306] "Syn3" "Abcf2" "Morc3" "Ddx17" ...
##   ..$ Fasta.headers: chr [1:5306] "tr|A0A096MIT7|A0A096MIT7_RAT Synapsin-3 OS=Rattus norvegicus OX=10116 GN=Syn3 PE=1 SV=2;sp|O70441|SYN3_RAT Syna"| __truncated__ "tr|A0A096MIV5|A0A096MIV5_RAT ATP-binding cassette subfamily F member 2 OS=Rattus norvegicus OX=10116 GN=Abcf2 P"| __truncated__ "tr|D4A552|D4A552_RAT MORC family CW-type zinc finger 3 OS=Rattus norvegicus OX=10116 GN=Morc3 PE=1 SV=1;tr|A0A0"| __truncated__ "tr|E9PT29|E9PT29_RAT DEAD-box helicase 17 OS=Rattus norvegicus OX=10116 GN=Ddx17 PE=1 SV=3;tr|A0A096MIX2|A0A096"| __truncated__ ...
##  $ Meta:'data.frame':    19 obs. of  2 variables:
##   ..$ Time     : chr [1:19] "day04" "day04" "day04" "day04" ...
##   ..$ Replicate: chr [1:19] "rep1" "rep2" "rep3" "rep4" ...
##  $ X0  : num [1:5306, 1:19] 2.15e+07 1.92e+07 1.37e+07 5.36e+08 1.20e+07 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:5306] "A0A096MIT7;O70441" "A0A096MIV5;A2VD14;F1M8H5;A0A096MJ15" "D4A552;A0A096MIX0" "E9PT29;A0A096MIX2;A0A096MJW9" ...
##   .. ..$ : chr [1:19] "BO2day4rep1" "BO2day4rep2" "BO2day4rep3" "BO2day4rep4" ...
##  $ LX  : num [1:5306, 1:19] 2.15e+07 1.92e+07 1.37e+07 5.36e+08 1.20e+07 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:5306] "A0A096MIT7;O70441" "A0A096MIV5;A2VD14;F1M8H5;A0A096MJ15" "D4A552;A0A096MIX0" "E9PT29;A0A096MIX2;A0A096MJW9" ...
##   .. ..$ : chr [1:19] "BO2day4rep1" "BO2day4rep2" "BO2day4rep3" "BO2day4rep4" ...

1.4.3. Alternative: readMaxQuant(..)

Alternatively, I prepared a script which can do this automaticaly

## load the script from Internet
source("http://sablab.net/scripts/readMaxQuant.r")

Data = readMaxQuant(file, 
             key.data ="LFQ.intensity.", 
             names.anno = c("Protein.names","Gene.names","Fasta.headers"), 
             samples = "http://edu.sablab.net/rp2019/data/sampleAnnotation.txt", 
             keep.na = FALSE, 
             row.names=1, 
             os.quantile=0.01)
## 5460 features were read.
## Reverse and contaminants are removed, resulting in 5306 features 
## Remove 12 uninformative features
## [1] "Final data container:"
## List of 6
##  $ nf  : int 5294
##  $ ns  : int 19
##  $ Anno:'data.frame':    5294 obs. of  3 variables:
##   ..$ Protein.names: chr [1:5294] "Synapsin-3" "" "" "" ...
##   ..$ Gene.names   : chr [1:5294] "Syn3" "Abcf2" "Morc3" "Ddx17" ...
##   ..$ Fasta.headers: chr [1:5294] "tr|A0A096MIT7|A0A096MIT7_RAT Synapsin-3 OS=Rattus norvegicus OX=10116 GN=Syn3 PE=1 SV=2;sp|O70441|SYN3_RAT Syna"| __truncated__ "tr|A0A096MIV5|A0A096MIV5_RAT ATP-binding cassette subfamily F member 2 OS=Rattus norvegicus OX=10116 GN=Abcf2 P"| __truncated__ "tr|D4A552|D4A552_RAT MORC family CW-type zinc finger 3 OS=Rattus norvegicus OX=10116 GN=Morc3 PE=1 SV=1;tr|A0A0"| __truncated__ "tr|E9PT29|E9PT29_RAT DEAD-box helicase 17 OS=Rattus norvegicus OX=10116 GN=Ddx17 PE=1 SV=3;tr|A0A096MIX2|A0A096"| __truncated__ ...
##  $ Meta:'data.frame':    19 obs. of  2 variables:
##   ..$ Time     : chr [1:19] "day04" "day04" "day04" "day04" ...
##   ..$ Replicate: chr [1:19] "rep1" "rep2" "rep3" "rep4" ...
##  $ X0  : num [1:5294, 1:19] 2.15e+07 1.92e+07 1.37e+07 5.36e+08 1.20e+07 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:5294] "A0A096MIT7;O70441" "A0A096MIV5;A2VD14;F1M8H5;A0A096MJ15" "D4A552;A0A096MIX0" "E9PT29;A0A096MIX2;A0A096MJW9" ...
##   .. ..$ : chr [1:19] "BO2day4rep1" "BO2day4rep2" "BO2day4rep3" "BO2day4rep4" ...
##  $ LX  : num [1:5294, 1:19] 4.69 4.53 4.07 9.27 3.89 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:5294] "A0A096MIT7;O70441" "A0A096MIV5;A2VD14;F1M8H5;A0A096MJ15" "D4A552;A0A096MIX0" "E9PT29;A0A096MIX2;A0A096MJW9" ...
##   .. ..$ : chr [1:19] "BO2day4rep1" "BO2day4rep2" "BO2day4rep3" "BO2day4rep4" ...

 

1.5. Read values from binary file (optional)

R can keep data in GZip-ed form, automatically loading the variables into memory. Such files have .RData extension. This is a fast & easy way to store your data. Let us first download the data in RData format into you working directory using download.file() and then load it by load(). Parameters of downloading:

Example: Data from TCGA for 40 lung squamous cell carcinoma samples and 20 controls.

download.file("http://edu.sablab.net/rt2018/data/LUSC60.RData", destfile="LUSC60.RData",mode = "wb")

getwd()                # show current folder
dir(pattern=".RData")  # show files in the current folder
load("LUSC60.RData")   # load the data

ls()                   # you should see 'LUSC' object among variables

str(LUSC)

 

1.6. Data export (optional)

There are several ways to export your data. Let’s consider the most simple.

Parameters of write.table are:

Let’s select only SCC (squamous cell carcinoma) patients from Samples and save the file locally

## get indexes of differentiation markers
imarker = Data$Anno$Gene.names %in% markers

## extract markers and annotate them by gene names
X = Data$LX[imarker, ]
row.names(X) = Data$Anno$Gene.names[imarker]

write.table(X,file = "MarkerBO.txt",sep = "\t",row.names = TRUE, col.names=NA, quote=FALSE)

# save Data as a binary file
save(Data,file="Data.RData") 

# save all variables as binary file
save(list=ls(),file="workspace.RData") 

getwd()
dir()      # see the results

LIH