2.1. Variables

If you are new to programming, think of a variable as a box with a label in which you can store information. R offers several ways to assign a value to a variable.

# assign 2 to x
x = 2

# assign 3 to y
y <- 3

# assign x+y to z
x + y -> z

Variable names are case-sensitive. Try typing Y instead of y and you will get an error.

You can type a variable's name to see what is inside. More flexible ways to show data are the functions print(), cat() and View().

cat("x=",x,", y=",y,", z=",z,"\n")
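A minimal sketch of how these display options differ. It redefines x, y and z so that it runs on its own; the names value and Value are hypothetical, chosen only to illustrate case sensitivity.

```r
x = 2; y = 3; z = x + y

z                     # typing the name prints the value with an index: [1] 5
print(z)              # the same, written explicitly
cat("z =", z, "\n")   # cat() prints plain text, without the [1] index

# names that differ only in case are two different variables
value = 1
Value = 2
value == Value        # FALSE
```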

To see which variables are defined, type ls(). To remove a variable, use rm(). Try:

ls() # here are the variables
## [1] "x" "y" "z"
rm(list=ls()) # remove them
ls() # check
## character(0)

2.2. Operations

i = 5      # assign the value 5 to i
i^2        # power
i%/%2      # integer division
i%%2       # modulo: the remainder of integer division
round(1.5) # round to the nearest integer

2.3. Atomic types of data

Atomic types of data correspond to what mathematics calls scalars. An atomic value is a single, indivisible value. You can find out the type of a value with the functions class() or mode().
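As an illustration, class() and mode() can be applied to a few literal values. This is only a sketch; the variable names are chosen for the example.

```r
cls_num = class(1.5)      # "numeric"
cls_lgl = class(TRUE)     # "logical"
cls_chr = class("hello")  # "character"
cls_int = class(2L)       # "integer"
mod_int = mode(2L)        # "numeric": mode() groups integers and doubles together
len_one = length(1.5)     # 1: in R a single value is a vector of length one
```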

2.3.1. Numeric

Numbers can be stored as integer or numeric (floating-point) data types. By default they are numeric.

r = 1.5
len = 2 * pi * r  # note: 'pi' - predefined constant 3.141592653589793
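The difference between the two number types can be sketched as follows. The L suffix and as.integer() are standard base R; the variable names are hypothetical.

```r
n = 5                  # a plain number is stored as numeric (double)
k = 5L                 # the L suffix requests an integer
class(n)               # "numeric"
class(k)               # "integer"
is.integer(n)          # FALSE
k2 = as.integer(3.9)   # conversion truncates toward zero: 3
class(k / 2L)          # "numeric": ordinary division always returns a double
class(k %/% 2L)        # "integer": integer division of integers keeps the type
```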

2.3.2. Logical values and operations

Logical (Boolean) variables take one of two values: TRUE (or just T) and FALSE (or F).

b1 = TRUE   # try  b1=T
b2 = FALSE  # try  b2=F
b1 & b2     # logical AND
b1 | b2     # logical OR
!b1         # logical NOT
xor(b1,b2)  # logical XOR
r == len    # is the value in `r` equal to the one in `len`?
r < len     # is `r` smaller than `len`?
r <= len    # is `r` smaller than or equal to `len`?
r != len    # is `r` different from `len`?
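Two caveats are worth a short sketch. First, T and F are ordinary variables that can be overwritten, while TRUE and FALSE are reserved words. Second, comparing floating-point numbers with == can give surprising results, and all.equal() is one common alternative. The variable names below are hypothetical.

```r
T = 0                 # legal: T is just a variable holding TRUE by default
isTRUE(T)             # FALSE now
rm(T)                 # remove our copy, the built-in T is visible again
isTRUE(T)             # TRUE

exact = (0.1 + 0.2 == 0.3)                # FALSE because of rounding error
close = isTRUE(all.equal(0.1 + 0.2, 0.3)) # TRUE: equal within a small tolerance
```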

2.3.3. Characters

In R, text is stored in variables of class character. Unlike in many other languages, a single atomic character value can hold an entire text. In other words, the value “hello” is treated as a whole, not as a vector of letters.

  • You can use either "..." or '...' to define a character value.

There are many functions that work with text in R. Let’s consider some of them:

st = 'Hello, world!'
paste("We say:",st)   # concatenation
## [1] "We say: Hello, world!"
# a more powerful way to build text (as in C):
sprintf("We say for the %d-nd time: %s..",2,st)   # the result is printed directly
## [1] "We say for the 2-nd time: Hello, world!.."
st = sprintf("By the way, pi=%f and N_Avogadro=%.2e",pi,6.02214085e23) # store the result in `st`
print(st)
## [1] "By the way, pi=3.141593 and N_Avogadro=6.02e+23"
casefold(st, upper=T) # change the case
## [1] "BY THE WAY, PI=3.141593 AND N_AVOGADRO=6.02E+23"
nchar(st)             # number of characters
## [1] 47
strsplit(st," ")      # split the text at spaces
## [[1]]
## [1] "By"                  "the"                 "way,"               
## [4] "pi=3.141593"         "and"                 "N_Avogadro=6.02e+23"

Two very powerful functions are sub() and gsub(). They replace text that matches a regular expression with a given character value: sub() replaces only the first match, gsub() replaces all matches.

sub(".+and ","",st)
## [1] "N_Avogadro=6.02e+23"
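The difference between the two functions is easiest to see on a short string. The following is a small sketch with a hypothetical string s.

```r
s = "one two three four"

first_only = sub(" ", "_", s)      # "one_two three four"
all_match  = gsub(" ", "_", s)     # "one_two_three_four"
no_vowels  = gsub("[aeiou]", "", s)   # "n tw thr fr"

# parentheses capture groups that can be reused as \\1, \\2 in the replacement
swapped = sub("(\\w+) (\\w+)", "\\2 \\1", s)   # "two one three four"
```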

2.3.4. Special values

In R there is a special value that denotes missing data: NA. It can be assigned to a variable of any class. Almost any operation involving NA returns NA; to test for a missing value, use the function is.na(), which returns TRUE for NA. Try:

na = NA    # create variable `na` with NA inside
na + 1     # result is NA
100>na     # result is still NA
na==na     # result is still NA
is.na(na)  # TRUE

  • Another special value is NULL. It means that the variable is defined but contains nothing yet. is.null() or length() can be used to check for it.
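A short sketch of how NULL differs from NA, together with a few operations whose result is known even when NA is involved. The variable names are hypothetical.

```r
v = NULL
is.null(v)        # TRUE
length(v)         # 0
length(NA)        # 1: NA is a value, NULL is the absence of one
v = c(v, 10)      # NULL disappears when combined: v is now 10

# the answer does not depend on the missing value in these cases
and_false = NA & FALSE                        # FALSE
or_true   = NA | TRUE                         # TRUE
avg = mean(c(1, NA, 3), na.rm = TRUE)         # 2: NA is dropped before averaging
```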

Numeric values can, in addition, be infinite (Inf, -Inf) or an undefined not-a-number (NaN). The functions is.infinite(), is.finite() and is.nan() help to detect such values.

1/0    # Inf
-1/0   # -Inf

0/0      # undefined value NaN
sqrt(-1) # NaN (with a warning): the result is not a real number
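The detection functions can be compared side by side on one vector. This is a sketch with a hypothetical vector vals.

```r
vals = c(1, NA, NaN, Inf, -Inf)

na_flag  = is.na(vals)        # FALSE  TRUE  TRUE FALSE FALSE (NaN also counts as missing)
nan_flag = is.nan(vals)       # FALSE FALSE  TRUE FALSE FALSE
inf_flag = is.infinite(vals)  # FALSE FALSE FALSE  TRUE  TRUE
fin_flag = is.finite(vals)    #  TRUE FALSE FALSE FALSE FALSE
```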

2.4. Vectors

Vectors combine atomic elements of a single class. You can have a vector of numbers, logical values or characters, but the classes cannot be mixed. Numeric vectors can be created as a simple sequence, e.g. 1:5. The generic function c() takes a list of elements and combines them into a vector. You can address an element of a vector with [i], where i is the element number (counting starts from 1).

a = c(1,2,3,4,5) # creating vector by enumeration
print(a)
## [1] 1 2 3 4 5
## [1] 6
## [1]  6  8 10 12 14
length(a)       # get length of `a`
## [1] 5
txt = c(st, "Let's try vectors", "bla-bla-bla")
print(txt)
## [1] "By the way, pi=3.141593 and N_Avogadro=6.02e+23"
## [2] "Let's try vectors"                              
## [3] "bla-bla-bla"
boo = c(T,F,T,F,T)
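What happens when values of different classes are passed to c() can be sketched as follows: R does not raise an error but silently converts everything to the most general class. The names m1 and m2 are hypothetical.

```r
m1 = c(1, TRUE, FALSE)   # logical values become numbers: 1 1 0
m2 = c(1, "a", TRUE)     # everything becomes character: "1" "a" "TRUE"
class(m1)                # "numeric"
class(m2)                # "character"
```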

More advanced ways to define sequences:

seq(from=1,to=10,by=0.5) # a numeric sequence
rep(1:4, times = 2)      # any sequence defined by repetition
rep(1:4, each = 2)       # repeats each element in turn: 1 1 2 2 3 3 4 4

And here is one of the strongest features of R:

We can easily work with the elements of a vector. The indexes of a vector can be vectors themselves.

print(a)
## [1] 1 2 3 4 5
a[1:3] # take a part of vector by index numbers
## [1] 1 2 3
a[boo] # take a part of vector by logical vector
## [1] 1 3 5
a[a>2] # take a part by a condition
## [1] 3 4 5
a[-1]  # removes the first element
## [1] 2 3 4 5
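Index vectors can also be used to find positions and to modify a vector in place. The following is a sketch that redefines a so it runs on its own; idx is a hypothetical name.

```r
a = c(1, 2, 3, 4, 5)

idx = which(a > 2)   # positions where the condition holds: 3 4 5
a[c(1, 5)]           # pick the first and the last element: 1 5
a[a > 3] = 0         # assign through a logical index: a is now 1 2 3 0 0
a[7] = 9             # assigning past the end extends the vector, the gap is filled with NA
```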

Please do the following tasks:

  a. Compare two numbers: \(e^\pi\) and \(\pi^e\). Print the results using cat().

use: pi, exp(), ^, >, cat()

  b. Create a vector of powers of 2: \(2^0\), \(2^1\), \(2^2\), …, \(2^{10}\)

use: i:j, ^

  c. Output the results of Task b as a character vector with the template “2^i = x”.

use: print(), sprintf()

  d. Output the results of Task c, showing only even exponents.

use: print(), seq() or %%

2.5. Matrices and arrays

Matrices are very similar to vectors, but they are defined in 2 dimensions. Like vectors, they hold atomic values of a single class. Arrays are the multidimensional generalization of matrices.

Let us define a matrix with 5 rows and 3 columns:

A=matrix(0,nrow=5, ncol=3)

A=A-1   # subtract a scalar from every element

A=A+1:5 # add a vector (it is recycled down each column)

t(A)    # transpose

A*A     # element-wise product

A%*%t(A)   # matrix product

# alternative ways to create matrix:
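A few standard base R ways to build a matrix, given here only as a sketch. The matrices B, C, D and E are hypothetical examples, not taken from the text.

```r
B = matrix(1:15, nrow = 5, ncol = 3)                # values fill column by column
C = matrix(1:15, nrow = 5, ncol = 3, byrow = TRUE)  # values fill row by row
D = cbind(1:5, 6:10, 11:15)                         # bind vectors as columns
E = rbind(1:3, 4:6)                                 # bind vectors as rows

dim(B)          # 5 3
B[2, 3]         # element in row 2, column 3: 12
C[2, 3]         # 6
all(B == D)     # TRUE: both hold the same values
dim(E)          # 2 3
```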

2.6. Data frames

Data frames are two-dimensional tables that can contain values of different classes in different columns.


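# create an empty table with 5 rows and 5 columns
# (assumed reconstruction of a missing line, inferred from the output below)
Data = data.frame(matrix(NA, nrow=5, ncol=5))
# let us add a column to Data
mice = sprintf("Mouse_%d",1:5)
Data = cbind(mice,Data)
# put the names to the variables
names(Data) = c("name","sex","weight","age","survival","code")
print(Data)
##      name sex weight age survival code
## 1 Mouse_1  NA     NA  NA       NA   NA
## 2 Mouse_2  NA     NA  NA       NA   NA
## 3 Mouse_3  NA     NA  NA       NA   NA
## 4 Mouse_4  NA     NA  NA       NA   NA
## 5 Mouse_5  NA     NA  NA       NA   NA
# put in the data manually (values as shown in the table below)
Data$sex = c("Male","Female","Female","Male","Male")
Data$weight = c(21,17,20,22,19)
Data$age = c(160,131,149,187,141)
Data$survival = c(TRUE,FALSE,TRUE,FALSE,TRUE)
Data$code = 1:nrow(Data)
print(Data)
##      name    sex weight age survival code
## 1 Mouse_1   Male     21 160     TRUE    1
## 2 Mouse_2 Female     17 131    FALSE    2
## 3 Mouse_3 Female     20 149     TRUE    3
## 4 Mouse_4   Male     22 187    FALSE    4
## 5 Mouse_5   Male     19 141     TRUE    5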

Useful functions to see what is inside your data frame:

View(Data)    # visualize data as a table
str(Data)     # see the structure of the table or other variables
head(Data)    # see the head of the table
summary(Data) # summary on the data
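Columns and rows of a data frame can be selected much like vector elements. The sketch below rebuilds the mouse table with data.frame() so that it runs on its own; males_mean and col_classes are hypothetical names.

```r
Data = data.frame(
  name     = sprintf("Mouse_%d", 1:5),
  sex      = c("Male", "Female", "Female", "Male", "Male"),
  weight   = c(21, 17, 20, 22, 19),
  age      = c(160, 131, 149, 187, 141),
  survival = c(TRUE, FALSE, TRUE, FALSE, TRUE)
)

Data$weight                   # one column as a vector
Data[2, ]                     # the second row
Data[Data$sex == "Male", ]    # rows that satisfy a condition
males_mean  = mean(Data$weight[Data$sex == "Male"])   # (21 + 22 + 19) / 3
col_classes = sapply(Data, class)                     # class of every column
```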

2.7. Factors

Factors are used instead of character vectors with repeated values, e.g. Data$sex. A factor stores a vector of integer indexes together with a short character vector holding the levels of the factor.

# Let's use factors
Data$sex = factor(Data$sex)

# useful commands when working with factors:
levels(Data$sex)        # returns levels of the factor
nlevels(Data$sex)       # returns number of levels
as.character(Data$sex)  # transform into character vector 
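The internal structure of a factor can be inspected directly. This is a sketch on a standalone vector sex (a hypothetical name) holding the same values as the sex column.

```r
sex = factor(c("Male", "Female", "Female", "Male", "Male"))

lev    = levels(sex)       # "Female" "Male": levels are sorted alphabetically
codes  = as.integer(sex)   # 2 1 1 2 2: each value is stored as an index into the levels
counts = table(sex)        # number of observations per level
```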

2.8. Lists

Lists are the most general containers in classical R. Elements (fields) of a list can be atomic values, vectors, matrices, data frames or other lists. Let’s create a list that includes the data and a description of an experiment.

L = list()    # creates an empty list
L$Data = Data
L$description = "A fake experiment with virtual mice"
L$num = nrow(Data)

Access to list elements:

# or by index:
# other ways:
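Standard base R offers several equivalent ways to reach a list element. The sketch below uses a smaller stand-alone list so that it runs on its own; the variable names on the left are hypothetical.

```r
L = list(description = "A fake experiment with virtual mice", num = 5)

by_name   = L$description    # by name with $
by_string = L[["num"]]       # by name given as a string
by_index  = L[[1]]           # by position
sub_list  = L[1]             # single brackets return a list, not the element itself
field_names = names(L)       # "description" "num"
```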


Although R is over 20 years old, it is still a rapidly developing language. If you are interested in modern and more advanced data structures, please check the recent course by A. Ginolhac, E. Koncina and R. Krause (UniLu/LCSB & Elixir): Data Processing in R-tidyverse.