Prerequisites

Here we’ll explore the tibble package, which is part of the core tidyverse.

suppressWarnings(library(tidyverse))

The tibble documentation describes tibbles as a “modern reimagining of the data.frame […] that are lazy and surly: they do less (i.e. they don’t change variable names or types, and don’t do partial matching) and complain more (e.g. when a variable does not exist). This forces you to confront problems earlier, typically leading to cleaner, more expressive code.”



Creating tibbles

Most other R packages use regular data frames, so you might want to coerce a data frame to a tibble. You can do that with as_tibble():

as_tibble(iris)
## # A tibble: 150 × 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.4         2.9          1.4         0.2 setosa 
## 10          4.9         3.1          1.5         0.1 setosa 
## # … with 140 more rows
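
To check what the coercion did, a minimal sketch like the following compares the classes before and after. It assumes only the tibble package; `iris_tbl` is an illustrative name.

```r
library(tibble)

iris_tbl <- as_tibble(iris)

class(iris)          # a plain data frame
class(iris_tbl)      # tibble classes layered on top of "data.frame"
is_tibble(iris)      # FALSE
is_tibble(iris_tbl)  # TRUE
```

The data are unchanged by the conversion; only the class, and with it the printing and subsetting behaviour, differs.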

You can create a new tibble from individual vectors with tibble().

tibble() will automatically recycle inputs of length 1, and allows you to refer to variables that you just created, as shown below.

tibble(
  x = 1:5, 
  y = 1, 
  z = x ^ 2 + y
)
## # A tibble: 5 × 3
##       x     y     z
##   <int> <dbl> <dbl>
## 1     1     1     2
## 2     2     1     5
## 3     3     1    10
## 4     4     1    17
## 5     5     1    26
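
The recycling rule is deliberately narrow: only inputs of length 1 are recycled. The sketch below contrasts this with data.frame(), which also recycles longer vectors when the lengths divide evenly. The object names are illustrative, and tryCatch() is used only so the error can be inspected instead of stopping the script.

```r
library(tibble)

# data.frame() silently recycles y to match x (4 is a multiple of 2)
df_recycled <- data.frame(x = 1:4, y = 1:2)
nrow(df_recycled)

# tibble() only recycles length-1 inputs, so the same call is an error;
# tryCatch() captures the condition instead of halting
res <- tryCatch(
  tibble(x = 1:4, y = 1:2),
  error = function(e) e
)
inherits(res, "error")
```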

If you’re already familiar with data.frame(), tibble() does much less:

  1. it never changes the type of the inputs (e.g. it never converts strings to factors!);

  2. it never changes the names of variables;

  3. it never creates row names.

Because names are never changed, a tibble can have column names that are not valid R variable names (non-syntactic names). To refer to such columns, surround them with backticks:


tb <- tibble(
  `:)` = "smile", 
  ` ` = "space",
  `2000` = "number"
)
tb
## # A tibble: 1 × 3
##   `:)`  ` `   `2000`
##   <chr> <chr> <chr> 
## 1 smile space number
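
For contrast, here is a small sketch of what data.frame() does with a non-syntactic name under its default `check.names = TRUE`. The name `tb_num` is illustrative.

```r
library(tibble)

# data.frame() rewrites the non-syntactic name by default (check.names = TRUE)
names(data.frame(`2000` = "number"))  # "X2000"

# tibble() keeps the name exactly as written
tb_num <- tibble(`2000` = "number")
names(tb_num)                         # "2000"

# Backticks are needed to refer to the column
tb_num$`2000`                         # "number"
```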



Tibble vs data.frame

There are two main differences in the usage of a tibble vs. a classic data.frame: printing and subsetting.

1. Printing

Tibbles have a refined print method that shows only the first 10 rows, and all the columns that fit on screen.

In addition to its name, each column reports its type, a nice feature borrowed from str():

tibble(
  a = lubridate::now() + runif(1e3) * 86400,
  b = lubridate::today() + runif(1e3) * 30,
  c = 1:1e3,
  d = runif(1e3),
  e = sample(letters, 1e3, replace = TRUE)
)
## # A tibble: 1,000 × 5
##    a                   b              c      d e    
##    <dttm>              <date>     <int>  <dbl> <chr>
##  1 2023-02-10 05:10:22 2023-02-28     1 0.236  a    
##  2 2023-02-10 10:32:21 2023-02-16     2 0.958  o    
##  3 2023-02-10 00:29:15 2023-02-16     3 0.0447 o    
##  4 2023-02-10 03:28:00 2023-02-15     4 0.981  i    
##  5 2023-02-09 19:10:10 2023-03-03     5 0.295  u    
##  6 2023-02-10 04:33:31 2023-02-13     6 0.289  p    
##  7 2023-02-09 22:48:00 2023-02-15     7 0.352  b    
##  8 2023-02-10 07:10:40 2023-02-15     8 0.502  l    
##  9 2023-02-10 06:04:49 2023-03-01     9 0.776  l    
## 10 2023-02-10 13:25:36 2023-03-07    10 0.565  i    
## # … with 990 more rows

Sometimes you need more output than the default display. In that case you can explicitly print() the data frame and control the number of rows (n) and the width of the display. width = Inf will display all columns:

nycflights13::flights %>% 
  print(n = 10, width = Inf)

You can control the default print behaviour by setting options:

  • options(tibble.print_max = n, tibble.print_min = m): if more than n rows, print only m rows. Use options(tibble.print_min = Inf) to always show all rows.

  • Use options(tibble.width = Inf) to always print all columns, regardless of the width of the screen.
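
As a rough sketch of how these options interact, the following changes the defaults for one print call and then restores them. The tibble `tb_long` and the chosen thresholds are illustrative; the code relies on options() returning the previous values so they can be reset afterwards.

```r
library(tibble)

tb_long <- tibble(id = 1:30, value = id * 2)

# Temporarily change the defaults; options() returns the previous values
old <- options(tibble.print_max = 5, tibble.print_min = 3)

# 30 rows exceeds print_max (5), so only print_min (3) rows are shown
print(tb_long)

# Restore whatever was set before
options(old)
```

Saving and restoring the old values keeps the change local, which is useful inside a function or a single report chunk.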


2. Subsetting

If you want to pull out a single variable, you need some new tools, $ and [[.

[[ can extract by name or position; $ only extracts by name but is a little less typing.

df <- tibble(
  x = runif(5),
  y = rnorm(5)
)

# Extract by name
df$x
## [1] 0.6065414 0.6835919 0.9006848 0.3854084 0.9889426
df[["x"]]
## [1] 0.6065414 0.6835919 0.9006848 0.3854084 0.9889426
# Extract by position
df[[1]]
## [1] 0.6065414 0.6835919 0.9006848 0.3854084 0.9889426

To use these in a pipe, you’ll need to use the special placeholder .:

# Extract in pipe
df %>% .$x
## [1] 0.6065414 0.6835919 0.9006848 0.3854084 0.9889426
df %>% .[["x"]]
## [1] 0.6065414 0.6835919 0.9006848 0.3854084 0.9889426
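
Another way to extract a column inside a pipe is dplyr's pull(), which avoids the placeholder. The sketch below assumes dplyr is attached (it is when the tidyverse is loaded); `df_demo` is an illustrative name, and set.seed() is there only to make the example reproducible.

```r
library(dplyr)

set.seed(1)
df_demo <- tibble(x = runif(5), y = rnorm(5))

# pull() extracts one column as a vector, by name or by position
by_name <- df_demo %>% pull(x)
by_pos  <- df_demo %>% pull(1)

identical(by_name, df_demo$x)       # TRUE
identical(by_pos, df_demo[["x"]])   # TRUE
```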

Compared to a data.frame, tibbles are more strict: they never do partial matching:

# Partial matching
d <- data.frame(alpha = runif(10), beta = runif(10))
d$al
##  [1] 0.6276676 0.3800857 0.2967158 0.8523114 0.5388956 0.8361498 0.8964996 0.8116371
##  [9] 0.3043738 0.2367873
# surly tibble
# (named tbl rather than t, to avoid masking base::t())
tbl <- tibble(alpha = runif(10), beta = runif(10))
tbl$al
## Warning: Unknown or uninitialised column: `al`.
## NULL

They will also generate a warning if the column you are trying to access does not exist.
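
A related strictness shows up with single-bracket subsetting: selecting one column with [ simplifies a data.frame to a vector, while a tibble stays a tibble. The following sketch illustrates this; `d_df` and `d_tbl` are illustrative names.

```r
library(tibble)

d_df  <- data.frame(alpha = runif(3), beta = runif(3))
d_tbl <- as_tibble(d_df)

# Selecting one column with [ drops a data.frame down to a vector
class(d_df[, "alpha"])    # "numeric"

# The same call on a tibble returns a one-column tibble
class(d_tbl[, "alpha"])   # "tbl_df" "tbl" "data.frame"
```

This makes the return type of [ predictable; use $ or [[ when a vector is what you want.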


Interacting with older code

Some older functions don’t work with tibbles. If you encounter one of these functions, use as.data.frame() to turn a tibble back to a data.frame:

class(as.data.frame(tb))
## [1] "data.frame"
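
Many functions written for data frames still work on a tibble, because a tibble inherits from data.frame; the conversion is only needed when a function depends on classic data.frame behaviour. A minimal sketch of the round trip, with `tb_demo` and `df_back` as illustrative names:

```r
library(tibble)

tb_demo <- tibble(x = 1:3, y = c("a", "b", "c"))

class(tb_demo)                   # "tbl_df" "tbl" "data.frame"
inherits(tb_demo, "data.frame")  # TRUE

# Convert back for code that needs a classic data frame
df_back <- as.data.frame(tb_demo)
class(df_back)                   # "data.frame"

# The conversion is reversible
is_tibble(as_tibble(df_back))    # TRUE
```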

A work by Matteo Cereda and Fabio Iannelli