If you are new to programming, think of a variable as a labelled box in which you can store information. R offers several ways to assign a value to a variable.
# put 2 into x
x = 2
x
# put 3 into y
y <- 3
y
# put x+y into z
x + y -> z
z
Variable names are case-sensitive. Try typing Y instead of y and you will get an error.
As you can see, typing the name of a variable shows what is inside. For more control over the output, use the functions print(), cat() or View().
print(z)
cat("x=",x,", y=",y,", z=",z,"\n")
To see which variables are defined, type ls(). To remove a variable, use rm(). Try:
ls() # here are the variables
## [1] "x" "y" "z"
rm(list=ls()) # remove them
ls() # check
## character(0)
i = 5 # i assigned the value of 5
i*2
i/2
i^2 # power
i%/%2 # integer division
i%%2 # modulo - the remainder of integer division
round(1.5) # round the results
Atomic types of data are what we call scalars in math. An atomic value is a single, simple value. You can get the class of a value with the functions class() or mode().
Numbers can be represented by the integer or numeric data types. They are numeric by default.
r = 1.5
len = 2 * pi * r # note: 'pi' - predefined constant 3.141592653589793
len
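The difference between the two number types can be checked directly. The sketch below is only an illustration: the variable names x_num and x_int are made up for this example, and the L suffix (base R syntax for an integer literal) is not used elsewhere in this text.

```r
x_num <- 1.5        # a decimal number is numeric
x_int <- 2L         # the L suffix requests an integer
class(x_num)        # "numeric"
class(x_int)        # "integer"
class(2)            # "numeric": a plain 2 is not stored as an integer
class(1:3)          # "integer": the colon operator yields integers
mode(x_int)         # "numeric": mode() does not distinguish the two
is.integer(x_int)   # TRUE
```

In everyday work the distinction rarely matters, because R converts integers to numeric whenever an operation needs it.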
Logical (Boolean) variables take one of two values: TRUE (or just T) and FALSE (or F).
b1 = TRUE # try b1=T
b2 = FALSE # try b2=F
b1 & b2 # logical AND
b1 | b2 # logical OR
!b1 # logical NOT
xor(b1,b2) # logical XOR
r == len # is the value in `r` equal to the one in `len`?
r < len # is `r` smaller than `len`?
r <= len # is `r` smaller than or equal to `len`?
r != len # is `r` different from `len`?
In R, text is stored in variables of the character class. Unlike in many other languages, a single atomic character variable can contain an entire text. In other words, the value “hello” is not treated as a vector of letters, but as a whole.
Use "..." or '...' to define a character value.
There are many functions that work with text in R. Let’s consider some of them:
st = 'Hello, world!'
paste("We say:",st) # concatenation
## [1] "We say: Hello, world!"
# a more powerful method to create text (as in C):
sprintf("We say for the %d-nd time: %s..",2,st) # directly prints output
## [1] "We say for the 2-nd time: Hello, world!.."
st = sprintf("By the way, pi=%f and N_Avogadro=%.2e",pi,6.02214085e23) # set output to `st` variable
print(st)
## [1] "By the way, pi=3.141593 and N_Avogadro=6.02e+23"
casefold(st, upper=T) # change the case
## [1] "BY THE WAY, PI=3.141593 AND N_AVOGADRO=6.02E+23"
nchar(st) # number of characters
## [1] 47
strsplit(st," ") # split the string at spaces
## [[1]]
## [1] "By" "the" "way,"
## [4] "pi=3.141593" "and" "N_Avogadro=6.02e+23"
Two very powerful functions are sub() and gsub(). They replace the text matched by a regular expression with a given character value: sub() replaces only the first match, gsub() replaces all matches.
sub(".+and ","",st)
## [1] "N_Avogadro=6.02e+23"
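The difference between replacing the first match and replacing all matches is easiest to see on a string with a repeated word. This is a small sketch; the example string and variable names are invented for illustration.

```r
s <- "one fish, two fish, red fish"
first_only  <- sub("fish", "cat", s)    # replaces the first match only
all_matches <- gsub("fish", "cat", s)   # replaces every match
first_only      # "one cat, two fish, red fish"
all_matches     # "one cat, two cat, red cat"
# the pattern is a regular expression, so it can describe a class of text
digits_removed <- gsub("[0-9]+", "", "a1b22c333")
digits_removed  # "abc"
```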
In R, there is a special value to denote missing data. This value is NA, and it can be assigned to a variable of any class. Almost any operation involving NA returns NA. To test for a missing value, use the function is.na(), which returns TRUE for NA. Try:
na = NA # create variable `na` with NA inside
na + 1 # result is NA
100>na # result is still NA
na==na # result is still NA
is.na(na) # TRUE
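In practice NA usually appears inside a vector of measurements. The sketch below uses an invented vector w to show how missing values propagate and how they can be skipped; na.rm is a standard argument of mean(). The last line shows one of the few cases where the result does not depend on the missing value.

```r
w <- c(21, NA, 20)        # hypothetical measurements, one is missing
is.na(w)                  # FALSE TRUE FALSE
sum(is.na(w))             # 1: the number of missing values
mean(w)                   # NA: the missing value propagates
mean(w, na.rm = TRUE)     # 20.5: NA is dropped before averaging
w[!is.na(w)]              # 21 20: keep only the known values
NA & FALSE                # FALSE: the answer is FALSE whatever NA stands for
```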
Another special value is NULL. It shows that the variable is defined, but contains nothing yet. is.null() or length() can help to check for this value.
Numeric values can, in addition, be infinite (Inf, -Inf) or an undefined not-a-number (NaN). The functions is.infinite(), is.finite() and is.nan() help to detect such values.
1/0 # Inf
-1/0 # -Inf
is.infinite(1/0)
is.finite(1/0)
0/0 # undefined value NaN
sqrt(-1) # not a real number
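The special values are easy to confuse, so here is a short sketch that checks each of them with the functions named above. The variable names empty and v are made up for this example.

```r
empty <- NULL
is.null(empty)      # TRUE
length(empty)       # 0: NULL contains nothing
length(NA)          # 1: NA is a value, NULL is the absence of one
v <- c(1, Inf, -Inf, NaN, NA)
is.finite(v)        # TRUE FALSE FALSE FALSE FALSE
is.infinite(v)      # FALSE TRUE TRUE FALSE FALSE
is.nan(v)           # FALSE FALSE FALSE TRUE FALSE
is.na(v)            # FALSE FALSE FALSE TRUE TRUE: NaN also counts as missing
```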
Vectors combine atomic elements of a single class. You can have a vector of numbers, logical values or characters, but not a mixture. Numeric vectors can be created as a simple sequence, e.g. 1:5. The general-purpose function is c(), which takes an enumeration of elements and combines them. You can access an element of a vector using [i], where i is the element number (counting starts from 1).
a = c(1,2,3,4,5) # creating vector by enumeration
a
## [1] 1 2 3 4 5
a[1]+a[5]
## [1] 6
b=5:9
a+b
## [1] 6 8 10 12 14
length(a) # get length of `a`
## [1] 5
txt = c(st, "Let's try vectors", "bla-bla-bla")
txt
## [1] "By the way, pi=3.141593 and N_Avogadro=6.02e+23"
## [2] "Let's try vectors"
## [3] "bla-bla-bla"
boo = c(T,F,T,F,T)
boo
## [1] TRUE FALSE TRUE FALSE TRUE
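Vectors of different lengths can also be combined, e.g. a + 1:3. The missing values of the shorter vector are filled in by repeating it circularly; R issues a warning when the longer length is not a multiple of the shorter one, as is the case here.
A more advanced way to define sequences: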
seq(from=1,to=10,by=0.5) # a numeric sequence
rep(1:4, times = 2) # any sequence defined by repetition
rep(1:4, each = 2) # similar, but not the same
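The circular repetition of a shorter vector, and the difference between the times and each arguments of rep(), can be seen in a small sketch. The vector a6 is invented for this example; its length is a multiple of the shorter vectors, so no warning is produced.

```r
a6 <- 1:6
a6 + 1:3               # 2 4 6 5 7 9: 1:3 is used twice
a6 * c(1, -1)          # 1 -2 3 -4 5 -6: c(1, -1) is used three times
a6 + 10                # 11 12 13 14 15 16: a single number is repeated for every element
rep(1:3, times = 2)    # 1 2 3 1 2 3: the whole sequence is repeated
rep(1:3, each = 2)     # 1 1 2 2 3 3: every element is repeated in place
seq(0, 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
```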
And here is one of the strongest features of R: we can easily work with the elements of a vector. The indexes of a vector can be vectors themselves.
a
## [1] 1 2 3 4 5
a[1:3] # take a part of vector by index numbers
## [1] 1 2 3
a[boo] # take a part of vector by logical vector
## [1] 1 3 5
a[a>2] # take a part by a condition
## [1] 3 4 5
a[-1] # removes the first element
## [1] 2 3 4 5
Please do the following tasks:
a. Compare two numbers: \(e^\pi\) and \(\pi^e\). Print the results using cat().
   Use: pi, exp(), ^, >, cat()
b. Create a vector of powers of 2: \(2^0\), \(2^1\), \(2^2\), …, \(2^{10}\).
   Use: i:j, ^
c. Output the results of Task b as a character vector with the template “2^i = x”.
   Use: print(), sprintf()
d. Output the results of Task c, showing only even exponents.
   Use: print(), seq() or %%
Matrices are very similar to vectors, but are defined in 2 dimensions. Like vectors, they contain atomic values of a single class. Arrays generalize matrices to more than two dimensions.
Let us define a matrix with 5 rows and 3 columns:
A=matrix(0,nrow=5, ncol=3)
A
A=A-1 # subtract a scalar
A
A=A+1:5 # add a vector (recycled down each column)
A
t(A) # transpose
A*A # element-wise product
A%*%t(A) # matrix product
# alternative ways to create matrix:
cbind(c(1,2,3,4),c(10,20,30,40))
rbind(c(1,2,3,4),c(10,20,30,40))
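A matrix filled with distinct numbers makes it easier to see how R lays out the values and how elements are addressed. This is a sketch with invented names M and Arr; it also shows an array with three dimensions, created with the base R function array().

```r
M <- matrix(1:6, nrow = 2)   # values are filled column by column
M                            # row 1: 1 3 5, row 2: 2 4 6
dim(M)                       # 2 3: rows, columns
M[2, 3]                      # 6: row 2, column 3
M[, 2]                       # 3 4: the whole second column
M %*% t(M)                   # 2x2 matrix product: 35 44 / 44 56
Arr <- array(1:24, dim = c(2, 3, 4))   # a 2 x 3 x 4 array
dim(Arr)                     # 2 3 4
Arr[2, 3, 4]                 # 24: the last element
```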
Data frames are two-dimensional tables that can contain values of different classes in different columns.
Data=data.frame(matrix(NA,nrow=5,ncol=5)) # an empty 5x5 table filled with NA
# let us add a column to Data
mice = sprintf("Mouse_%d",1:5)
Data = cbind(mice,Data)
# name the columns
names(Data) = c("name","sex","weight","age","survival","code")
Data
## name sex weight age survival code
## 1 Mouse_1 NA NA NA NA NA
## 2 Mouse_2 NA NA NA NA NA
## 3 Mouse_3 NA NA NA NA NA
## 4 Mouse_4 NA NA NA NA NA
## 5 Mouse_5 NA NA NA NA NA
# put in the data manually
Data$name=sprintf("Mouse_%d",1:5)
Data$sex=c("Male","Female","Female","Male","Male")
Data$weight=c(21,17,20,22,19)
Data$age=c(160,131,149,187,141)
Data$survival=c(T,F,T,F,T)
Data$code = 1:nrow(Data)
Data
## name sex weight age survival code
## 1 Mouse_1 Male 21 160 TRUE 1
## 2 Mouse_2 Female 17 131 FALSE 2
## 3 Mouse_3 Female 20 149 TRUE 3
## 4 Mouse_4 Male 22 187 FALSE 4
## 5 Mouse_5 Male 19 141 TRUE 5
Useful functions to see what is inside your data frame:
View(Data) # visualize data as a table
str(Data) # see the structure of the table or other variables
head(Data) # see the head of the table
summary(Data) # summary on the data
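Besides inspecting a data frame as a whole, you will often want to pull out single columns or rows that satisfy a condition. The sketch below rebuilds part of the mouse table under the invented name mice_df so that it runs on its own; stringsAsFactors = FALSE is set explicitly as an assumption, so that text columns stay character in older R versions too.

```r
mice_df <- data.frame(
  name     = sprintf("Mouse_%d", 1:5),
  sex      = c("Male", "Female", "Female", "Male", "Male"),
  weight   = c(21, 17, 20, 22, 19),
  survival = c(TRUE, FALSE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
mice_df$weight                                # one column as a vector
mice_df[2, ]                                  # the second row, all columns
mice_df[mice_df$weight > 20, "name"]          # "Mouse_1" "Mouse_4"
mean(mice_df$weight[mice_df$sex == "Male"])   # 20.66667
sum(mice_df$survival)                         # 3: TRUE counts as 1
```

Rows are selected before the comma and columns after it, with the same index types that work for vectors.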
Factors are used instead of character vectors with repeated values, e.g. Data$sex. A factor variable consists of a vector of integer indexes and a short character vector holding the levels of the factor.
# Let's use factors
Data$sex = factor(Data$sex)
summary(Data)
# useful commands when working with factors:
levels(Data$sex) # returns levels of the factor
nlevels(Data$sex) # returns number of levels
as.character(Data$sex) # transform into character vector
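The two parts of a factor, the integer indexes and the levels, can be inspected separately. This sketch uses a standalone factor with the invented name sex, so it does not depend on the Data table.

```r
sex <- factor(c("Male", "Female", "Female", "Male", "Male"))
levels(sex)                    # "Female" "Male": levels are sorted alphabetically
as.integer(sex)                # 2 1 1 2 2: index of the level for each element
table(sex)                     # counts per level: Female 2, Male 3
levels(sex)[as.integer(sex)]   # rebuilds the original character values
```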
Lists are the most general containers in classical R. Elements (fields) of a list can be atomic values, vectors, matrices, data frames or other lists. Let’s create a list that includes the data and a description of an experiment.
L = list() # creates an empty list
L$Data = Data
L$description = "A fake experiment with virtual mice"
L$num = nrow(Data)
str(L)
Access to list elements:
L$Data
L$num
# or by index:
L[[1]]
L[[3]]
# other ways:
L[["num"]]
L$'num'
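One detail that often causes confusion is the difference between single and double brackets: [ ] returns a shorter list, while [[ ]] returns the element itself. The sketch below illustrates this with an invented list exp_list.

```r
exp_list <- list(description = "A fake experiment", num = 5)
exp_list["num"]            # a list of length 1 that contains num
exp_list[["num"]]          # 5: the element itself
class(exp_list["num"])     # "list"
class(exp_list[["num"]])   # "numeric"
names(exp_list)            # "description" "num"
exp_list$num <- NULL       # assigning NULL removes the element
length(exp_list)           # 1
```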
Although R is over 20 years old, it is still a rapidly developing language. If you are interested in modern and more advanced data structures, please check this recent course by A. Ginolhac, E. Koncina and R. Krause (UniLu/LCSB & Elixir): Data Processing in R-tidyverse.