Functions

R is a functional programming (FP) language.

R is a collection of functions.

A function maps some input to some output.



Basic Structure

All R functions have three parts:

Part            Description
formals()       the list of arguments which controls how you can call the function.
body()          the code inside the function.
environment()   the map of the location of the function’s variables.


f <- function(x) { "ciao" }
f
## function(x) { "ciao" }
# the list of arguments which controls how you can call the function
formals(f)
## $x
#the code inside the function.
body(f)
## {
##     "ciao"
## }
#the map of the location of the function's variables.
environment(f)
## <environment: R_GlobalEnv>



Primitives

There is one exception to the rule that functions have three components.

Primitive functions call C code directly with .Primitive() and contain no R code (e.g. sum()).

Therefore their formals(), body(), and environment() are all NULL:

sum
## function (..., na.rm = FALSE)  .Primitive("sum")
formals(sum)
## NULL
body(sum)
## NULL
environment(sum)
## NULL



The syntax

The syntax for R functions is:

foo = function(arg1, arg2, arg3, ...){

  # ... do something ...

  # the value of the last evaluated expression is returned
}

You can specify arguments explicitly, or you can leave room for unspecified parameters.

Using the triple dots ... in the argument list allows for additional, unspecified arguments.

foo = function(x,y,...){
          print(x)
  
          ## capture the unspecified parameters in a list
          args = list(...)
          
          if("z" %in% names(args) ){
            print(args$z)
          }
}

foo(5, 3, z="what's up?" )
## [1] 5
## [1] "what's up?"
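
Besides being inspected with list(...), the dots can be forwarded untouched to another function. The sketch below is only illustrative: mean_of and count_extra are hypothetical names that do not come from the original material.

```r
# Forward any extra arguments to mean() without naming them explicitly
mean_of <- function(x, ...) {
  mean(x, ...)
}

# Count how many extra arguments were supplied through ...
count_extra <- function(...) {
  length(list(...))
}

mean_of(c(1, NA, 3), na.rm = TRUE)
## [1] 2
count_extra(a = 1, b = 2)
## [1] 2
```

Forwarding keeps the wrapper short and lets callers use every option of the wrapped function.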

The scoping rules

Scoping (noun): “the act or practice of eyeing or examining, as in order to evaluate or appreciate”

The scope is the context within your computer program where a variable or an identifier can be used, or within which a declaration has effect.

Scoping is the set of rules that control the way R picks up the value of a variable.

a <- 1 

The <- operator is called a variable assignment operator.

Given the expression a <- 1:

  • the value is assigned to the variable in the current environment.

  • If you already had an assignment for the variable before in the same environment, this one will overwrite it.

  • Variable assignments only update in the current environment.

When R is looking for a value of a given variable, it will start searching from the bottom. This means the current environment is inspected first, then its enclosing environment. The search goes until either the value is found or the empty environment is reached.
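
A minimal sketch of these rules, assuming a fresh session (f and g are illustrative names): an assignment inside a function creates a local variable, while a lookup that finds nothing locally continues in the enclosing environment.

```r
a <- 1

f <- function() {
  a <- 2   # creates a new, local a; the global a is untouched
  a
}

g <- function() {
  a        # no local a: R finds the one in the enclosing (global) environment
}

f()
## [1] 2
g()
## [1] 1
a
## [1] 1
```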


R has two types of scoping:


Scoping   Meaning                                                                  Usage
Static    a name is looked up in the environment where the function was defined.   implemented automatically at the language level
Dynamic   a name is looked up in the environment of the most recent caller.        used in select functions to save typing during interactive analysis


Static scoping is determined by the structure of the source code (which is why it is also called lexical scoping).
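
The difference between the two kinds of lookup can be sketched as follows. This is an illustration under assumptions, not how R implements scoping internally: dynamic lookup is emulated here with parent.frame(), and all function names are hypothetical.

```r
x <- "global"

# Static (lexical) lookup: x is resolved where show_x was defined
show_x <- function() x

# Emulated dynamic lookup: x is resolved in the caller's environment
show_caller_x <- function() {
  caller_env <- parent.frame()
  get("x", envir = caller_env)
}

caller <- function() {
  x <- "caller"
  c(static = show_x(), dynamic = show_caller_x())
}

caller()
##   static  dynamic 
## "global" "caller" 
```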



There are four basic principles behind R’s implementation of static scoping:

  1. name masking

  2. functions and variables

  3. the first start

  4. dynamic lookup



Name masking

If a name is NOT defined inside a function, R will look one level up.

f <- function() {
  x <- 1
  y <- 2
  c(x, y)
}

f()
## [1] 1 2
rm(f)
# A __________
x <- 1

g <- function() {
  # B __________
  y <- 2
  c(x, y)
  # __________ B
}

g()
## [1] 1 2
rm(x, g)
# __________ A



Functions and variables

The same principles apply regardless of the type of the associated value: the closest level wins.

The same name-masking rules apply when a function is defined inside another function.

# first function __________________
l <- function(x) x + 1

# second function ________________
m <- function() {
  # first function again (this local l masks the outer one) __________
  l <- function(x) x * 2
  l(10)
}
m()
#> [1] 20
rm(l, m)



The first start

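What happens if you call an operation on a variable that is NOT initialized? In a session where a has never been assigned:

a <- a + 1
# Error: object 'a' not found

To avoid it you can use the function exists()

j <- function() {
  if (!exists("a")) {
    a <- 1
  } else {
    a <- a + 1
  }
  a
}
j()
## [1] 2
rm(j)

Here j() returns 2 because exists("a") also searches the enclosing environments and finds the a <- 1 assigned earlier in the global environment; with no a defined anywhere it would return 1. Every call to j() starts from a fresh local environment, so repeated calls return the same value.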
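
By default exists() also searches the enclosing environments. The sketch below (count_calls and n_calls are hypothetical names) restricts the check to the function's own environment with inherits = FALSE, and shows that nothing survives from one call to the next.

```r
count_calls <- function() {
  # inherits = FALSE: look only in this call's own, freshly created environment
  if (!exists("n_calls", inherits = FALSE)) {
    n_calls <- 0
  }
  n_calls <- n_calls + 1
  n_calls
}

count_calls()
## [1] 1
count_calls()
## [1] 1
```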



Dynamic look up

Static scoping determines WHERE to look for values, not WHEN to look for them.

R looks for values when the function is run, not when it’s created.

This means that the output of a function can differ depending on objects outside its environment:

# function recalling an external variable
f <- function() x

# time 0 _________
x <- 15
f()
#> [1] 15

# ... time 1 _________
x <- 20
f()
#> [1] 20

You generally want to AVOID this behaviour.

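Variables should be defined before they are used. If you make a spelling mistake in your code, you won’t get an error when you create the function, and you might not even get one when you run it, depending on which variables are defined in the global environment.

One way to detect this problem is the findGlobals() function from codetools. This function lists all the external dependencies (i.e. functions and variables) of a function: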

f <- function() x + 1
codetools::findGlobals(f)
#> [1] "+" "x"
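
One way to avoid the dependency is to pass every value the function needs as an argument. The sketch below also includes free_vars(), a hypothetical and deliberately rough base-R helper: it ignores variables assigned inside the body, so it is no substitute for codetools::findGlobals().

```r
# Depends on a global x: the result changes whenever x changes
f_global <- function() x + 1

# Self-contained: everything it needs arrives through its arguments
f_arg <- function(x) x + 1

# Rough check: variables used in the body that are not arguments
free_vars <- function(fn) {
  setdiff(all.vars(body(fn)), names(formals(fn)))
}

free_vars(f_global)
## [1] "x"
free_vars(f_arg)
## character(0)
```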



Special calls

R supports two additional syntaxes for calling special types of functions:

infix and replacement functions.



Infix functions

Functions where the function name comes in between its arguments.

All user-created infix functions must start and end with %. R comes with the following infix functions predefined:

Infix   Description
%in%    Matching operator
%%      Remainder operator
%*%     Matrix multiplication
%/%     Integer division
%o%     Outer product
%x%     Kronecker product
The complete list of built-in infix operators that don’t need % is: :, ::, :::, $, @, ^, *, /, +, -, >, >=, <, <=, ==, !=, !, &, &&, |, ||, ~, <-, <<-

To create a new operator:

`%+%` <- function(a, b) paste0(a, b)

"new" %+% " string"
#> [1] "new string"
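
Because every infix operator is an ordinary function, it can also be called in prefix form with backticks or passed to another function. In this sketch %or% is a hypothetical operator defined only for illustration.

```r
# Prefix call of a built-in infix operator
`+`(1, 2)
## [1] 3

# An operator passed to a functional like any other function
sapply(1:3, `+`, 10)
## [1] 11 12 13

# A user-defined operator: fall back to b when a is NULL
`%or%` <- function(a, b) if (is.null(a)) b else a

NULL %or% "default"
## [1] "default"
"value" %or% "default"
## [1] "value"
```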



Replacement functions

Replacement functions have names of the form

function_name<-

and act as if they modify their arguments in place.

They:

  • typically have two arguments (x, value), although they can have more;

  • must return the modified object.

For example, the following function allows you to modify the second element of a vector:

# Replacement function (two parameters)
`replace_the_second<-` <- function(x, value) {
  x[2] <- value
  x
}

x <- 1:10

# Call of the replacement function (x on the left, the new value on the right of <-)
replace_the_second(x) <- 5

x
#>  [1]  1  5  3  4  5  6  7  8  9 10

When R evaluates the assignment replace_the_second(x) <- 5, it notices that the left hand side of the <- is not a simple name, so it looks for a function named replace_the_second<- to do the replacement.
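
A replacement function with an additional argument can be sketched as below; the extra argument goes between x and value. The name modify<- is illustrative.

```r
# Replacement function with an extra argument: the position to change
`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}

x <- 1:10
modify(x, 1) <- 10L
x
##  [1] 10  2  3  4  5  6  7  8  9 10

# The assignment form above is shorthand for an ordinary call plus reassignment
y <- 1:10
y <- `modify<-`(y, 1, value = 10L)
identical(x, y)
## [1] TRUE
```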



Missing and stop

The missing() function can be used to test whether a value was specified as an argument to a function.

foo = function(x,y,...){
                 print(x)
                ## ___MISSING___ ##
                if ( missing(y) ){
                   cat("y is not specified\n")
                }else{
                   print(y)
                   args = list(...)
                   if('z' %in% names(args)) print(args$z)
                 }
}

foo(5, z="what's up?" )
## [1] 5
## y is not specified

The stop() function stops execution of the current expression and executes an error action.

foo = function(x,y,...){
  if (missing(y)){
    
    # STOP
    stop("y is not specified, please STOP\n")
    
    }else{
     print(x)
     print(y)
     args = list(...)
     if("z" %in% names(args)) print(args$z)
  }
}

foo(5, z="what's up?" )
Error in foo(5, z = "what's up?") : y is not specified, please STOP
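
missing() and stop() are often combined to validate arguments. In this sketch ratio is a hypothetical function; call. = FALSE drops the call from the error message, and tryCatch() is used only to capture the message instead of halting the script.

```r
ratio <- function(x, y) {
  if (missing(y)) {
    stop("y is not specified", call. = FALSE)
  }
  x / y
}

# Capture the error message instead of letting the error stop execution
msg <- tryCatch(ratio(1), error = function(e) conditionMessage(e))
msg
## [1] "y is not specified"

ratio(6, 3)
## [1] 2
```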

Errors and warnings

The geterrmessage() function returns the last error message.

warning() generates a warning message.

foo = function(x,y,...){
               if (missing(y)){
                 ## WARNINGS
                 warning("foo: y is not specified\n")
               }
        print(x)
        args = list(...)
        if("z" %in% names(args))
          print(args$z)
}

foo(5, z="what's up?" )
[1] 5
[1] "what's up?"
Warning message:
In foo(5, z = "what's up?") : foo: y is not specified

Warnings can be printed using warnings().

Condition Handling and Recovery

The condition system provides a mechanism for signaling and handling unusual conditions, including errors and warnings.

try()

The most straightforward way to handle a failure is to wrap the problematic call in a try() block.

x = list( 1, 2, -2, 'H5', 0, 10)

str(x)

for(i in x) {
    print(i)
    cat("log of ", i, " = ", log(i),"\n-----------\n") 
}
[1] 1
log of  1  =  0 
-----------
[1] 2
log of  2  =  0.6931472 
-----------
[1] -2
log of  -2  =  NaN 
-----------
[1] "H5"
Error in log(i) : non-numeric argument to mathematical function
In addition: Warning message:
In log(i) : NaNs produced

To prevent this behaviour you can use a try() block

x = list( 1, 2, -2, 'H5', 0, 10)

str(x)

for(i in x) {
  print(i)
  ### TRY BLOCK
  try(
    cat("log of ", i, " = ", log(i),"\n-----------\n") 
  )
}
[1] 1
log of  1  =  0 
-----------
[1] 2
log of  2  =  0.6931472 
-----------
[1] -2
log of  -2  =  NaN 
-----------
[1] "H5"
Error in log(i) : non-numeric argument to mathematical function
In addition: Warning message:
In log(i) : NaNs produced
[1] 0
log of  0  =  -Inf 
-----------
[1] 10
log of  10  =  2.302585 
-----------

Errors and warnings no longer halt the loop, which continues with the rest of the input.
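
When a call fails, try() returns an object of class "try-error", which makes it possible to find out afterwards which inputs failed. A minimal sketch with a shortened input list chosen for illustration:

```r
x <- list(1, 2, "H5", 10)

# silent = TRUE suppresses the printed error; failures become "try-error" objects
logs <- lapply(x, function(i) try(log(i), silent = TRUE))

# Flag which elements failed
failed <- vapply(logs, inherits, logical(1), what = "try-error")
failed
## [1] FALSE FALSE  TRUE FALSE

# Keep only the successful results
unlist(logs[!failed])
## [1] 0.0000000 0.6931472 2.3025851
```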


tryCatch()

Sometimes you want to perform an operation and catch any errors and warnings.

This can be done with tryCatch(), which lets you write specific error and warning handlers.

Function     Summary
tryCatch()   Evaluates the operation and calls the matching handler when an error or a warning is signalled.

# declaration
foo = function(z,
               ## WARNING FUNCTION
               warning = function(w) {
                 print( paste('warning:',w) );
                 },
               
               ## ERROR FUNCTION
               error = function(e) {
                 print(paste('error:',e));
               }
               
               ){
 
        ## TRY CATCH BLOCK
        tryCatch(
          { 
          print(paste("attempt log operation for z:",z))
          return(log(z))
          }
          ,warning = warning
          ,error = error )
  }

#execution ---
foo(2)
## [1] "attempt log operation for z: 2"
## [1] 0.6931472
# executes & invokes the WARNING’s handler ---
foo(-2) 
## [1] "attempt log operation for z: -2"
## [1] "warning: simpleWarning in log(z): NaNs produced\n"
# executes & invokes the ERROR’s handler ---
foo("H5")
## [1] "attempt log operation for z: H5"
## [1] "error: Error in log(z): non-numeric argument to mathematical function\n"
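
A common variation is to let the handlers return a fallback value instead of printing. The sketch below uses a hypothetical safe_log() that returns NA for any input that triggers a warning or an error.

```r
safe_log <- function(z) {
  tryCatch(
    log(z),
    warning = function(w) NA_real_,  # e.g. log of a negative number
    error   = function(e) NA_real_   # e.g. log of a character string
  )
}

res <- vapply(list(2, -2, "H5"), safe_log, numeric(1))
res
## [1] 0.6931472        NA        NA
```

Because every branch returns a number, the results can be collected safely with vapply().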

Restarts

Sometimes you want to substitute the return value when an error or a warning is signalled.

Function          Summary
invokeRestart()   Transfers control to the point where the specified restart was established and calls the restart’s handler with the given arguments.
withRestarts()    Establishes named restarts and describes the action each one takes.

Suppose we want to keep retrying the logarithm of a value until we get a result.

How can we handle the warnings and errors raised by log() so that the computation always completes?

tryCatch() + invokeRestart() + withRestarts()

foo = function(z,
               
               ## WARNING FUNCTION with restart
               warning = function(w) {
                 print( paste('warning:',w) );
                 invokeRestart("correctArgForWarnings")
                 },
               
               ## ERROR FUNCTION with restart
               error = function(e) {
                 print(paste('error:',e));
                 invokeRestart("correctArgForErrors")
               }
               
               ){
  
  ## Loop is repeated until a break is specified
  repeat 
    ## 1. catch errors *********************
    withRestarts(   
     
      ## 2. catch warnings =================
      withRestarts(
        
        ## TRY CATCH BLOCK -----------------
        tryCatch(
          { 
          print(paste("attempt log operation for z:",z))
          return(log(z))
          } # return() exits the function and therefore ends the repeat loop
          ,warning = warning
          ,error = error )
          ##------------------------------------
        
        , correctArgForWarnings = function() {z <<- -z} ) 
      ##=================================
      
      , correctArgForErrors = function() {z <<- 1})
       ##*********************************
}

foo(2)
## [1] "attempt log operation for z: 2"
## [1] 0.6931472
# invokes the warning’s handler
foo(-2) 
## [1] "attempt log operation for z: -2"
## [1] "warning: simpleWarning in log(z): NaNs produced\n"
## [1] "attempt log operation for z: 2"
## [1] 0.6931472
# invokes the error’s handler
foo("H5")
## [1] "attempt log operation for z: H5"
## [1] "error: Error in log(z): non-numeric argument to mathematical function\n"
## [1] "attempt log operation for z: 1"
## [1] 0
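
Restarts are most useful when the code that knows how to recover is separated from the code that decides whether to recover. The sketch below illustrates that split with hypothetical names (parse_number, use_value); it uses withCallingHandlers() so that the handler runs while the restart is still available.

```r
# Low-level function: signals an error but offers a named way to recover
parse_number <- function(s) {
  withRestarts(
    {
      n <- suppressWarnings(as.numeric(s))
      if (is.na(n)) stop("cannot parse: ", s)
      n
    },
    use_value = function(v) v   # recovery action: return the supplied value
  )
}

# High-level code: decides that unparsable input should become 0
result <- withCallingHandlers(
  vapply(c("1", "x", "3"), parse_number, numeric(1), USE.NAMES = FALSE),
  error = function(e) invokeRestart("use_value", 0)
)
result
## [1] 1 0 3
```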

Debugging R code

Function      Summary
debug()       Set, unset or query the debugging flag on a function
browser()     Interrupt the execution of an expression and allow the inspection of the environment.
traceback()   Prints the call stack of the last uncaught error.

> debug(lsfit6)

> lsfit6(X,y)
debugging in: lsfit6(X, y)
debug at #1: {
    solve.default(crossprod(X), crossprod(X, y))
}
Browse[2]> crossprod(X)

               (Intercept) incidence I(incidence^2) I(incidence^3)
(Intercept)         20.000   37.7000        81.1900       191.9870
incidence           37.700   81.1900       191.9870       483.4555
I(incidence^2)      81.190  191.9870       483.4555      1270.0715
I(incidence^3)     191.987  483.4555      1270.0715      3435.4413

> undebug(lsfit6)

> foo = function(){  1994 + "You go out and it’s on" }

> fighters = function() { print("Yeah, whatever it is"); foo()}

> fighters()
[1] "Yeah, whatever it is"
Error in 1994 + "You go out and it’s on" :
  non-numeric argument to binary operator

> traceback()
2: foo() at #1
1: fighters()
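
The debugging flag can also be set and queried without starting an interactive session. A small sketch with an illustrative function f:

```r
f <- function(x) x + 1

debug(f)        # set the debugging flag
isdebugged(f)
## [1] TRUE

undebug(f)      # clear it again
isdebugged(f)
## [1] FALSE

# debugonce(f) would set the flag for a single call only
```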

Conditional & repetitive execution


Type of exec   Construct   Example
CONDITIONAL    if          if (cond) expr1 else expr2
CONDITIONAL    ifelse      ifelse( cond, expr1, expr2 )
REPETITIVE     for         for ( i in expr1 ) expr2
REPETITIVE     while       while (cond) expr
REPETITIVE     repeat      repeat expr

The break statement can be used to terminate ANY loop; apart from leaving the enclosing function with return() or signalling an error, it is the only way to end a repeat loop.
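
A short sketch of the constructs in the table, with illustrative variable names: ifelse() works element by element, if/else takes a single condition, and repeat runs until break is reached.

```r
x <- c(-1, 2, 0)

# ifelse() is vectorised: one result per element of the condition
sign_label <- ifelse(x > 0, "positive", "non-positive")
sign_label
## [1] "non-positive" "positive"     "non-positive"

# if/else takes a single TRUE/FALSE condition
size_label <- if (length(x) > 2) "long" else "short"
size_label
## [1] "long"

# repeat runs until break is reached
i <- 0
repeat {
  i <- i + 1
  if (i >= 3) break
}
i
## [1] 3
```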


Recursion and iteration

The factorial function

n! = n * (n-1) * (n-2) * ... * 2 * 1

can be defined recursively as:

f(n) = n * f(n-1)

with f(1) = 1.


Let’s see how we can implement it:

## Recursion
fact.rec = function(n){
  ifelse (n==1, 1, (n * fact.rec(n-1) ) )
}

## Iteration
fact.it = function(n){
  ans = 1
  # seq_len(n) avoids the 2:n trap: for n = 1, 2:n would iterate over c(2, 1)
  for (ii in seq_len(n)) ans = ans * ii
  ans
}


A simple way to benchmark: how long does each version take?

system.time( fact.rec(100) )["elapsed"]
## elapsed 
##   0.003
system.time( fact.it(100) )["elapsed"]
## elapsed 
##   0.002

library(rbenchmark)

# benchmark() is a simple wrapper around system.time()
benchmark( fact.rec(15)
          , fact.it(15)
          
          , order="relative"
          , replications=5000
          )
##           test replications elapsed relative user.self sys.self user.child sys.child
## 2  fact.it(15)         5000   0.012    1.000     0.012    0.000          0         0
## 1 fact.rec(15)         5000   0.134   11.167     0.132    0.003          0         0

  1. The recursive version is “conceptually attractive”; the iterative version less so.

  2. The recursive version is computationally more expensive. Overhead:

    • in time: every time a function is called;

    • in memory usage: computing fact.rec(100) requires fact.rec(99), which requires … fact.rec(1). So fact.rec(100) cannot be completed before fact.rec(1) is completed.
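
Both approaches can be checked against base R's factorial(). The functions below are illustrative re-implementations (fact_rec and fact_it are hypothetical names) that use if/else instead of ifelse(); they are not the versions benchmarked above.

```r
# Recursive version with a plain if/else
fact_rec <- function(n) if (n <= 1) 1 else n * fact_rec(n - 1)

# Iterative version; seq_len(n) is empty for n = 0, so the result stays 1
fact_it <- function(n) {
  ans <- 1
  for (ii in seq_len(n)) ans <- ans * ii
  ans
}

all.equal(fact_rec(10), factorial(10))
## [1] TRUE
all.equal(fact_it(10), factorial(10))
## [1] TRUE
fact_it(0)
## [1] 1
```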


Functionals

A functional is a function that takes a function as input and returns a vector (or list) as output.

Function   Summary
lapply()   Applies a function to each element of a list and returns a list
sapply()   Applies a function to each element of a list and returns a vector/matrix
tapply()   Applies a function to each group of values defined by one or more factors
apply()    Applies a function to the margins (rows or columns) of an array or matrix
mapply()   Applies a function to the corresponding elements of several objects in parallel
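
As a quick illustration of two entries in the table, the sketch below applies a function over the margins of a small matrix and over groups of values. The data are made up for the example.

```r
m <- matrix(1:6, nrow = 2)

apply(m, 1, sum)   # margin 1: one sum per row
## [1]  9 12
apply(m, 2, max)   # margin 2: one maximum per column
## [1] 2 4 6

values <- c(1, 2, 3, 4)
groups <- c("a", "b", "a", "b")

tapply(values, groups, sum)   # one sum per group
## a b 
## 4 6 
```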

The simplest functional is lapply(), which takes a function, applies it to each element of a list, and returns the results as a list. lapply() is the building block for many other functionals, so it’s important to understand how it works.

These functions are alternatives to explicit loops.

lapply() makes it easier to work with lists by eliminating much of the boilerplate associated with looping.

lapply() is written in C for performance, but we can obtain the same result with a for-loop:

l = list(a=1:3,b=4) 

ans = vector("list",length(l))
for (ii in seq_along(l)){
      ans[[ii]] = c(
         length(l[[ii]])
        ,mean(l[[ii]])
        )
} 
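
For comparison, the same computation can be written with lapply(). This sketch repeats the loop so that it is self-contained and then checks that both approaches give the same values.

```r
l <- list(a = 1:3, b = 4)

# for-loop version: pre-allocate, then fill by index
ans <- vector("list", length(l))
for (ii in seq_along(l)) {
  ans[[ii]] <- c(length(l[[ii]]), mean(l[[ii]]))
}

# lapply() version: no pre-allocation or index bookkeeping
res <- lapply(l, function(x) c(length(x), mean(x)))

# Same values; lapply() additionally keeps the names of l
identical(unname(res), ans)
## [1] TRUE
names(res)
## [1] "a" "b"
```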

Benchmarking

f.lapply = function(my.list){ 
             lapply( my.list, function(x){
               c(mean(x),length(x))
               })
}

f.forloop = function(my.list) {
            ans = vector("list",length(my.list))

              for (ii in seq_along(my.list)){
                ans[[ii]] = c(mean(my.list[[ii]])
                              ,length(my.list[[ii]])
                              )
              }
            # return the filled list (a for loop itself evaluates to NULL)
            ans
  }

library(rbenchmark) 

set.seed(1)

N=10^(1:6)

b = vector('list', length(N))

for ( i in 1:length(N) ){

    l = list( runif(N[[i]]), runif(N[[i]]) )

  b[[i]] = cbind.data.frame(
    
    "N" = N[[i]]
    
    , benchmark( 
             f.lapply(l) 
           , f.forloop(l)
           , columns=c("test", "replications", "elapsed", "relative")
           , order="relative" 
           , replications=10000)
  )

}

b = do.call ( "rbind.data.frame", b)

b

library(ggplot2)
ggplot(b, aes(x=N,y=log2(elapsed), group=test,color=test))+geom_line()+
  geom_point(aes(size=elapsed), fill='white', shape=21, stroke=2)+
  scale_x_log10()+
  ggsci::scale_color_startrek()+
  ggpubr::theme_pubr()

sapply() and vapply() are very similar to lapply() except they simplify their output to produce an atomic vector. While sapply() guesses, vapply() takes an additional argument specifying the output type.
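
The difference shows up most clearly with empty input, where sapply() has nothing to guess from. A minimal sketch:

```r
# With regular input both return the same atomic vector
sapply(list(1:3, 4:6), sum)
## [1]  6 15
vapply(list(1:3, 4:6), sum, integer(1))
## [1]  6 15

# With empty input sapply() falls back to a list, vapply() keeps the declared type
sapply(list(), length)
## list()
vapply(list(), length, integer(1))
## integer(0)
```

This is why vapply() is usually the safer choice inside functions, where the input cannot be inspected by eye.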

If you want to perform an operation on the corresponding elements of two lists, you can use mapply():

list1 = list(c('value'=1) ,c('value'=2),c('value'=3))
list2 = list('a','b','c')

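z = mapply( function (x,y){
  # build a one-row data frame from the named value and its label
  # (avoids x$names = y, which coerces the atomic vector to a list with a warning)
  data.frame(value = x[["value"]], names = y)
  }
  ,x=list1
  ,y=list2
        , SIMPLIFY = FALSE
        )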
z
## [[1]]
##   value names
## 1     1     a
## 
## [[2]]
##   value names
## 1     2     b
## 
## [[3]]
##   value names
## 1     3     c
z = do.call('rbind.data.frame', z)

z
##   value names
## 1     1     a
## 2     2     b
## 3     3     c


A work by Matteo Cereda and Fabio Iannelli