packyourcode/namespace.Rmd at master · anouel/packyourcode · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
# Namespace {#namespace}

## Introduction

In addition to the mandatory `DESCRIPTION` file in the top level of the package directory, an R package also requires another mandatory text file called `NAMESPACE`. This file, with its syntax and its purpose, is perhaps one of the more disorienting elements in a package. So let me try to demistify and explain what a name `NAMESPACE` file is used for.


## About Namespaces

Admittedly, the word _namespace_ is one of those terms that says everything and nothing at the same time. The meaning behind _namespace_ is basically that: space for a name. That's the simple part. The complication arises when we naturally ask: What name? What space?

According to [Wikipedia](https://en.wikipedia.org/wiki/Namespace):

> In computing, a namespace is a set of symbols that are used to organize objects of various kinds, so that these objects may be referred to by name. Prominent examples include:
>
> - file systems are namespaces that assign names to files
> - some programming languages organize their variables and subroutines in namespaces
> - computer networks and distributed systems assign names to resources, such as computers, printers, websites, (remote) files, etc.


The main concept is that of a __name__ of an object and its __value__. For example, consider the following code in which objects `a` and `b` are created:

```{r eval = FALSE}
a <- 1
b <- c('hi', 'there')
```

From a technical point of view, the command `a <- 1` implies that an object named `a` is being assigned the value of 1. In turn, the command `b <- c('hi', 'there')` involves creating an object named `b` which takes the value of a character vector.

When you type the name of an object, R will look up for it and see what value is associated to (if the object exists). In order to uniquely and correctly identified the value of a name (i.e. named object), R uses a formal system of environments, scoping rules, and namespaces. Without entering into the details of environments, and scoping rules, the idea of a namespace is to provide a context so that R knows how to find the value of an object associated with a name. The lack of such a context would result in total chaos.

Why is this context extremely important with R packages? The best way to answer this question is by thinking about two (or more) different packages containing a function with the same name. For instance, base R contains the function `norm()` which computes the norm of a matrix (and returns an error when you pass a vector to it). Say that I write my own function `norm()` to compute the norm of a vector. If I run the command `norm(1:5)`, what's going to happen? Will R get confused about which `norm()` to use? Will it use R's base `norm()`? Or will it use my `norm()`?

This is where namespaces come very handy. Let's assume that my function `norm()` is actually part of a package called `"vectortools"`. By having a dedicated namespace for each package, we can use the `::` operator to tell R which function (which name) to use: either `base::norm()` or `vectortools::norm()`.


## Namespace of a Package

So far we've discussed the idea of a _namescape_, which is what allows you to invoke commands such as `vectortools::norm()`, so that R does not get confused about which function `norm()` to use. But what about the file `NAMESPACE`?

According the technical manual _[Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Package-namespaces)_,

> This file contains _namespace directives_ describing the imports and exports of the namespace.

What does this mean?

In regards to a package, the concept of __names__ has to do with the names of the objects in the package. Most objects in a package are the functions that it is made of. But a package can also contain data sets typically in the form of data frames, or matrices, or other types of data structures.

The so-called _namespace directives_ is just the fancy name used to indicate: 1) which functions of your package will be _exported_, and optionally 2) which functions from other packages are _imported_ by your package, or 3) which packages need to be imported because your package depends on them.


### Exported Functions

Not all the functions that you write have to be available for the end-user. Often, you will be writing small auxiliary functions that are called by other functions. A typical example of these functions are checkers (or validators). In the `"cointoss"` package example, we have functions like `check_sides()` and `check_prob()`, which are the auxiliary functions called by the constructor function `coin()`. The constructor is the function that has been designed for the end user; the auxiliary functions are not intended to be called by the end user.

You can tell R which functions are to be exported: those that the end user is supposed to invoke. And which functions to keep "under the hood" (by not exporting them, by default). The list of functions and methods that will be exported (export directives) goes inside the `NAMESPACE`.

Likewise, if your package uses functions from other packages that don't form part of R's base distribution, you should also specify these in the `NAMESPACE` file.


### Roxygen export

This book assumes that you are using roxygen comments to document the functions. The good news is that roxygen has a dedicated keyword that let's you specify an _export directive_: the `@export` keyword. Every time you include an `@export` comment in the documentation of a function, this will be taken into account when running the code that generates the corresponding documentation (i.e. in an `.Rd` file).

Here's an example with the code of the `coin()` function. Notice the roxygen comment `@export` to indicate that this function must be exported in the namespace of the package.

```{r eval = FALSE}
#' @title Coin
#' @description Creates an object of class \code{"coin"}
#' @param sides vector of coin sides
#' @param prob vector of side probabilities
#' @return an object of class \code{"coin"}
#' @export
#' @examples
#' # default
#' coin1 <- coin()
#'
coin <- function(sides = c("heads", "tails"), prob = c(0.5, 0.5)) {
  check_sides(sides)
  check_prob(prob)

  object <- list(
    sides = sides,
    prob = prob)
  class(object) <- "coin"
  object
}
```

The way we are going to build a package will be done in such a manner that R will automatically generate content for the `NAMESPACE` file. Assuming that `coin()` was the only exported function, then `NAMESPACE` would look like this:

```
# Generated by roxygen2: do not edit by hand

export(coin)
```

This means that the package `"roxygen2"` is actually doing the work for us, automatically updating `NAMESPACE`. Observe the syntax `export(coin)`, meaning that `coin()` will be exported.