- Variables and Operators

For those who are new to programming: think of a variable as a labeled box in which you can store some information. R offers several ways to assign a value to a variable.

```
# put 2 to x
x = 2
x
# put 3 to y
y <- 3
y
# put x+y into z
x + y -> z
z
```

Variables are case-sensitive. Try typing `Y` instead of `y` and you will see an error.

As you see, you can type the variable name to see what is inside. A more advanced way to show the data is to use the functions `print()`, `cat()`, and `View()`.

```
print(z)
cat("x=",x,", y=",y,", z=",z,"\n")
```

To see which variables are defined, type `ls()`. To remove a variable, use `rm()`. Try:

`ls() # here are the variables`

`## [1] "x" "y" "z"`

```
rm(list=ls()) # remove them
ls() # check
```

`## character(0)`

```
i = 5 # i assigned the value of 5
i*2
i/2
i^2 # power
i%/%2 # integer division
i%%2 # modulo - the remainder of integer division
round(1.5) # round the results
```

`atomic`

Atomic types of data are what we call *scalar* in math. An atomic value is a simple, single value. You can get the class of the data with the functions `class()` or `mode()`.
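As a small illustration of the two functions just mentioned, the sketch below inspects a few values. The variable names (`num`, `int`, `txt`, `flag`) are made up for this example.

```r
# Inspect a few atomic values with class() and mode()
num  <- 1.5       # a number with a fractional part
int  <- 2L        # the L suffix requests an integer
txt  <- "hello"   # text
flag <- TRUE      # logical value

class(num)   # "numeric"
mode(num)    # "numeric"
class(int)   # "integer"
mode(int)    # "numeric": mode() groups integers with the other numbers
class(txt)   # "character"
class(flag)  # "logical"
```

For everyday work `class()` is usually the more informative of the two, since `mode()` does not distinguish integers from other numbers.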

Numbers can be represented by the `integer` or `numeric` data types. They are `numeric` by default.

```
r = 1.5
len = 2 * pi * r # note: 'pi' - predefined constant 3.141592653589793
len
```
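To see what "numeric by default" means in practice, here is a minimal sketch. The names `whole` and `truncated` are chosen only for illustration.

```r
# A number typed without a suffix is numeric, even if it looks like an integer
whole <- 5
is.integer(whole)             # FALSE
is.integer(5L)                # TRUE: the L suffix creates an integer
class(1:3)                    # "integer": sequences made with ':' are integers

# Converting numeric to integer drops the fractional part
truncated <- as.integer(3.9)
truncated                     # 3

# Ordinary division of two integers gives a numeric result
class(5L / 2L)                # "numeric"
class(5L %/% 2L)              # "integer": integer division keeps the type
```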

Logical (Boolean) variables take one of two values: `TRUE` (or just `T`) and `FALSE` (or `F`).

```
b1 = TRUE # try b1=T
b2 = FALSE # try b2=F
b1 & b2 # logical AND
b1 | b2 # logical OR
!b1 # logical NOT
xor(b1,b2) # logical XOR
r == len # is the value in `r` equal to the one in `len`?
r < len # is `r` smaller than `len`?
r <= len # is `r` smaller than or equal to `len`?
r != len # is `r` different from `len`?
```

In **R**, text is stored in variables of class `character`. Unlike in many other languages, a single atomic `character` value can contain an entire text. In other words, the value “hello” is treated as a whole, not as a vector of letters.
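The point that a text value is one element, not a vector of letters, can be checked directly. This is a minimal sketch; the variable names `word` and `letters_of_word` are made up for the example.

```r
word <- "hello"
length(word)   # 1: the whole text is a single element
nchar(word)    # 5: the number of characters inside that element

# To get individual letters, split the text explicitly
letters_of_word <- strsplit(word, "")[[1]]
letters_of_word          # "h" "e" "l" "l" "o"
length(letters_of_word)  # 5
```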

- You can use either `"..."` or `'...'` to define your `character` values.

There are many functions that work with text in **R**. Let’s consider some of them:

```
st = 'Hello, world!'
paste("We say:",st) # concatenation
```

`## [1] "We say: Hello, world!"`

```
# a more powerful method to create text (as in C):
sprintf("We say for the %d-nd time: %s..",2,st) # directly prints output
```

`## [1] "We say for the 2-nd time: Hello, world!.."`

```
st = sprintf("By the way, pi=%f and N_Avogadro=%.2e",pi,6.02214085e23) # set output to `st` variable
print(st)
```

`## [1] "By the way, pi=3.141593 and N_Avogadro=6.02e+23"`

`casefold(st, upper=T) # change the case`

`## [1] "BY THE WAY, PI=3.141593 AND N_AVOGADRO=6.02E+23"`

`nchar(st) # number of characters`

`## [1] 47`

`strsplit(st," ") # splits characters`

```
## [[1]]
## [1] "By" "the" "way,"
## [4] "pi=3.141593" "and" "N_Avogadro=6.02e+23"
```

Very powerful functions are `sub` and `gsub`. They replace the text matching a *regular expression* pattern with a given character value. `sub` replaces only the first match, `gsub` replaces all matches.

`sub(".+and ","",st)`

`## [1] "N_Avogadro=6.02e+23"`
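The difference between the first match and all matches is easier to see side by side. The sketch below uses a made-up string `s`.

```r
s <- "one fish, two fish"

first_only <- sub("fish", "cat", s)    # replaces the first match only
first_only                             # "one cat, two fish"

all_matches <- gsub("fish", "cat", s)  # replaces every match
all_matches                            # "one cat, two cat"

# The pattern is a regular expression: [aeiou] matches any single vowel
no_vowels <- gsub("[aeiou]", "", s)
no_vowels                              # "n fsh, tw fsh"
```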

In **R**, there is a special value to denote missing data. This value is **NA**, and it can be assigned to a variable of any class. Almost any operation involving `NA` returns `NA`. To test for a missing value, use the function `is.na()`, which returns `TRUE`. Try:

```
na = NA # create variable `na` with NA inside
na + 1 # result is NA
100>na # result is still NA
na==na # result is still NA
is.na(na) # TRUE
```

- Another value is `NULL`. It shows that the variable is defined, but contains nothing yet. `is.null()` or `length()` may help to check for this value.
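A short sketch of how `NULL` differs from `NA` when checked with the functions named above. The variable names `empty` and `v` are chosen only for illustration.

```r
empty <- NULL
is.null(empty)   # TRUE
length(empty)    # 0: NULL contains nothing

length(NA)       # 1: NA is a value, just an unknown one
is.null(NA)      # FALSE

# NULL disappears when combined into a vector, NA does not
v <- c(1, NULL, 3)
length(v)                 # 2
is.na(c(1, NA, 3))        # FALSE TRUE FALSE
```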

Numeric values can, in addition, be infinite (`Inf`, `-Inf`) or undefined, not-a-number (`NaN`). The functions `is.infinite()`, `is.finite()` and `is.nan()` help to detect such values.

```
1/0 # Inf
-1/0 # -Inf
is.infinite(1/0)
is.finite(1/0)
0/0 # undefined value NaN
sqrt(-1) # not a real number
```

Vectors combine `atomic` elements of a single class. You can have a vector of numbers, logical values, or characters, but not a mix of them. Numeric vectors can be created as a simple sequence, e.g. `1:5`. The generic function `c()` takes an enumeration of elements and combines them. You can address an element of a vector using `[i]`, where `i` is the element number (starting from 1).

```
a = c(1,2,3,4,5) # creating vector by enumeration
a
```

`## [1] 1 2 3 4 5`

`a[1]+a[5]`

`## [1] 6`

```
b=5:9
a+b
```

`## [1] 6 8 10 12 14`

`length(a) # get length of a`

`## [1] 5`

```
txt = c(st, "Let's try vectors", "bla-bla-bla")
txt
```

```
## [1] "By the way, pi=3.141593 and N_Avogadro=6.02e+23"
## [2] "Let's try vectors"
## [3] "bla-bla-bla"
```

```
boo = c(T,F,T,F,T)
boo
```

`## [1] TRUE FALSE TRUE FALSE TRUE`
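Because a vector holds elements of a single class, `c()` converts mixed input to one common class rather than failing. A minimal sketch, with the hypothetical names `num_and_logical` and `mixed`:

```r
# Logical values combined with numbers become numbers (TRUE -> 1, FALSE -> 0)
num_and_logical <- c(1, TRUE, FALSE)
num_and_logical          # 1 1 0
class(num_and_logical)   # "numeric"

# As soon as one element is text, everything becomes text
mixed <- c(1, "a", TRUE)
mixed                    # "1" "a" "TRUE"
class(mixed)             # "character"
```

This silent conversion is worth keeping in mind: one stray text value turns a whole numeric vector into `character`.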

- Take care when summing vectors of different lengths. Try `a + 1:3`. The elements of the shorter vector are recycled (repeated circularly) to match the length of the longer one.
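The recycling of the shorter vector can be followed element by element in this sketch. `suppressWarnings()` is used here only to keep the output clean: without it, R warns when the longer length is not a multiple of the shorter one.

```r
a <- c(1, 2, 3, 4, 5)

# 1:3 is extended to 1 2 3 1 2 before the addition
uneven <- suppressWarnings(a + 1:3)
uneven    # 2 4 6 5 7

# When the lengths divide evenly, recycling happens without any warning
even <- 1:6 + 1:2   # 1:2 is extended to 1 2 1 2 1 2
even      # 2 4 4 6 6 8
```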

More advanced ways to define sequences:

```
seq(from=1,to=10,by=0.5) # a numeric sequence
rep(1:4, times = 2) # any sequence defined by repetition
rep(1:4, each = 2) # similar, but not the same
```

And here is one of the strongest features of **R**: we can easily work with elements of a vector. The indexes of a vector can be vectors themselves.

`a`

`## [1] 1 2 3 4 5`

`a[1:3] # take a part of vector by index numbers`

`## [1] 1 2 3`

`a[boo] # take a part of vector by logical vector`

`## [1] 1 3 5`

`a[a>2] # take a part by a condition`

`## [1] 3 4 5`

`a[-1] # removes the first element`

`## [1] 2 3 4 5`
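A few more variations on the same indexing idea, shown as a sketch on a separate vector `v` (a name made up here, so that `a` stays unchanged):

```r
v <- c(1, 2, 3, 4, 5)

v[c(1, 5)]        # 1 5: pick arbitrary positions
v[-(1:2)]         # 3 4 5: drop the first two elements
which(v > 2)      # 3 4 5: positions where the condition holds
v[v %% 2 == 0]    # 2 4: keep the even values

# Indexing also works on the left-hand side of an assignment
v[v > 3] <- 0
v                 # 1 2 3 0 0
```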

Please do the following tasks:

- a. Compare two numbers: \(e^\pi\) and \(\pi^e\). Print the results using `cat()`. Use: `pi`, `exp()`, `^`, `>`, `cat()`

- b. Create a vector of exponents of 2: \(2^0\), \(2^1\), \(2^2\), …, \(2^{10}\). Use: `i:j`, `^`

- c. Output the results of Task b as a vector of characters with a template: “2^i = x”. Use: `print()`, `sprintf()`

- d. Output the results of Task c, showing only even exponents. Use: `print()`, `seq()` or `%%`

**Matrices** are very similar to vectors, just defined in 2 dimensions. Like vectors, they contain atomic values of a single class. **Arrays** generalize matrices to more than two dimensions.

Let us define a matrix with 5 rows and 3 columns

```
A=matrix(0,nrow=5, ncol=3)
A
A=A-1 # subtract a scalar from every element
A
A=A+1:5 # add a vector (recycled down the columns)
A
t(A) # transpose
A*A # element-wise product
A%*%t(A) # matrix product
# alternative ways to create matrix:
cbind(c(1,2,3,4),c(10,20,30,40))
rbind(c(1,2,3,4),c(10,20,30,40))
```
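The text mentions arrays without showing one, so here is a minimal sketch of a matrix next to a three-dimensional array. The names `M` and `arr` are made up for this example; both objects are filled column by column.

```r
# A 2 x 3 matrix filled column-wise with 1..6
M <- matrix(1:6, nrow = 2)
dim(M)              # 2 3
M[2, 3]             # 6: row 2, column 3
dim(M %*% t(M))     # 2 2: (2 x 3) times (3 x 2)

# A 2 x 3 x 4 array: four "layers" of 2 x 3 matrices
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)            # 2 3 4
arr[1, 2, 3]        # 15: row 1, column 2, layer 3
arr[, , 1]          # the first layer, a 2 x 3 matrix holding 1..6
```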

Data frames are two-dimensional tables that can contain values of different classes in different columns.

```
Data=data.frame(matrix(nrow=5,ncol=5)) # empty 5x5 table filled with NA
# let us add a column to Data
mice = sprintf("Mouse_%d",1:5)
Data = cbind(mice,Data)
# name the columns
names(Data) = c("name","sex","weight","age","survival","code")
Data
```

```
## name sex weight age survival code
## 1 Mouse_1 NA NA NA NA NA
## 2 Mouse_2 NA NA NA NA NA
## 3 Mouse_3 NA NA NA NA NA
## 4 Mouse_4 NA NA NA NA NA
## 5 Mouse_5 NA NA NA NA NA
```

```
# put in the data manually
Data$name=sprintf("Mouse_%d",1:5)
Data$sex=c("Male","Female","Female","Male","Male")
Data$weight=c(21,17,20,22,19)
Data$age=c(160,131,149,187,141)
Data$survival=c(T,F,T,F,T)
Data$code = 1:nrow(Data)
Data
```

```
## name sex weight age survival code
## 1 Mouse_1 Male 21 160 TRUE 1
## 2 Mouse_2 Female 17 131 FALSE 2
## 3 Mouse_3 Female 20 149 TRUE 3
## 4 Mouse_4 Male 22 187 FALSE 4
## 5 Mouse_5 Male 19 141 TRUE 5
```

Useful functions to see what is inside your data frame:

```
View(Data) # visualize data as a table
str(Data) # see the structure of the table or other variables
head(Data) # see the head of the table
summary(Data) # summary on the data
```
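Besides looking at the whole table, you will often want a single column or a subset of rows. The sketch below rebuilds a smaller table under the hypothetical name `mice_df` so that it runs on its own. `stringsAsFactors = FALSE` is set explicitly because older versions of R converted text columns to factors by default.

```r
# A small table with columns of different classes
mice_df <- data.frame(
  name   = sprintf("Mouse_%d", 1:5),
  sex    = c("Male", "Female", "Female", "Male", "Male"),
  weight = c(21, 17, 20, 22, 19),
  stringsAsFactors = FALSE
)

sapply(mice_df, class)    # class of each column
mice_df$weight            # one column as a vector
mice_df[2, ]              # second row, all columns

# Rows selected by a condition, as with vectors
males <- mice_df[mice_df$sex == "Male", ]
nrow(males)               # 3
mean(males$weight)        # average weight of the males
```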

Factors are used instead of character vectors with repeated values, e.g. `Data$sex`. A `factor` variable consists of a vector of integer indexes and a short vector of characters: the levels of the factor.

```
# Let's use factors
Data$sex = factor(Data$sex)
summary(Data)
# useful commands when working with factors:
levels(Data$sex) # returns levels of the factor
nlevels(Data$sex) # returns number of levels
as.character(Data$sex) # transform into character vector
```
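The two parts of a factor, the integer indexes and the levels, can be inspected separately. A minimal sketch on a standalone vector, with the hypothetical name `sex_f`:

```r
sex_f <- factor(c("Male", "Female", "Female", "Male", "Male"))

levels(sex_f)        # "Female" "Male": levels are sorted alphabetically
as.integer(sex_f)    # 2 1 1 2 2: index of the level for each element
table(sex_f)         # counts per level: Female 2, Male 3

# Levels plus indexes reproduce the original text
restored <- levels(sex_f)[as.integer(sex_f)]
restored             # "Male" "Female" "Female" "Male" "Male"
```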

Lists are the most general containers in classical **R**. Elements (fields) of a list can be atomic, vectors, matrices, data frames or other lists. Let’s create a list that includes data and description of an experiment.

```
L = list() # creates an empty list
L$Data = Data
L$description = "A fake experiment with virtual mice"
L$num = nrow(Data)
str(L)
```

Access to list elements:

```
L$Data
L$num
# or by index:
L[[1]]
L[[3]]
# other ways:
L[["num"]]
L$'num'
```
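One detail that often surprises newcomers is the difference between single and double brackets on a list. The sketch below uses a small made-up list `exp_info` to show it.

```r
exp_info <- list(description = "A fake experiment", num = 5)

names(exp_info)              # "description" "num"

# Double brackets (or $) extract the element itself
exp_info[["num"]]            # 5
class(exp_info[["num"]])     # "numeric"
exp_info$num + 1             # 6

# Single brackets return a shorter list that still wraps the element
class(exp_info["num"])       # "list"
length(exp_info["num"])      # 1
```

So use `[[ ]]` or `$` when you need the value, and `[ ]` when you want a sub-list.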

Although **R** is over 20 years old, it is still a rapidly developing language. If you are interested in modern and more advanced data structures, please check this recent course by **A.Ginolhac, E.Koncina, R.Krause** (UniLu/LCSB & Elixir): Data Processing in R-tidyverse.