Introduction to reticulate

Overview

The reticulate package provides an R interface to Python modules, classes, and functions. For example, this code imports the Python os module and calls some functions within it:

library(reticulate)
os <- import("os")
os$chdir("tests")
os$getcwd()

Functions and other data within Python modules and classes can be accessed via the $ operator (analogous to the way you would interact with an R list, environment, or reference class).

The reticulate package is compatible with all versions of Python >= 2.7. Integration with NumPy is optional and requires NumPy >= 1.6.

Type Conversions

When calling into Python, R data types are automatically converted to their equivalent Python types. When values are returned from Python to R they are converted back to R types. Types are converted as follows:

R Python Examples
Single-element vector Scalar 1, 1L, TRUE, "foo"
Multi-element vector List c(1.0, 2.0, 3.0), c(1L, 2L, 3L)
List of multiple types Tuple list(1L, TRUE, "foo")
Named list Dict list(a = 1L, b = 2.0), dict(x = x_data)
Matrix/Array NumPy ndarray matrix(c(1,2,3,4), nrow = 2, ncol = 2)
Function Python function function(x) x + 1
NULL, TRUE, FALSE None, True, False NULL, TRUE, FALSE

If a Python object of a custom class is returned then an R reference to that object is returned. You can call methods and access properties of the object just as if it was an instance of an R reference class.

Importing Modules

The import() function can be used to import any Python module. For example:

difflib <- import("difflib")
difflib$ndiff(foo, bar)

filecmp <- import("filecmp")
filecmp$cmp(dir1, dir2)

The import_main() and import_builtins() functions give you access to the main module where code is executed by default and the collection of built in Python functions. For example:

main <- import_main()

py <- import_builtins()
py$print('foo')

The main module is generally useful if you have executed Python code from a file or string and want to get access to it’s results (see the section below for more details).

Sourcing Scripts

The source_python() function will source a Python script and make the objects it creates available within an R environment (by default the calling environment). For example, consider the following Python script:

def add(x, y):
  return x + y

We source it using the source_python() function and then can call the add() function directly from R:

source_python('add.py')
add(5, 10)
[1] 15

Executing Code

You can execute Python code within the main module using the py_run_file and py_run_string functions. These functions both return a reference to the main Python module so you can access the results of their execution. For example:

py_run_file("script.py")

main <- py_run_string("x = 10")
main$x

Object Conversion

By default when Python objects are returned to R they are converted to their equivalent R types. However, if you’d rather make conversion from Python to R explicit and deal in native Python objects by default you can pass convert = FALSE to the import function. In this case Python to R conversion will be disabled for the module returned from import. For example:

# import numpy and specify no automatic Python to R conversion
np <- import("numpy", convert = FALSE)

# do some array manipulations with NumPy
a <- np$array(c(1:4))
sum <- a$cumsum()

# convert to R explicitly at the end
py_to_r(sum)

As illustrated above, if you need access to an R object at end of your computations you can call the py_to_r() function explicitly.

Getting Help

You can print documentation on any Python object using the py_help() function. For example:

os <- import("os")
py_help(os$chdir)

Lists, Tuples, and Dictionaries

The automatic conversion of R types to Python types works well in most cases, but occasionally you will need to be more explicit on the R side to provide Python the type it expects.

For example, if a Python API requires a list and you pass a single element R vector it will be converted to a Python scalar. To overcome this simply use the R list function explicitly:

foo$bar(indexes = list(42L))

Similarly, a Python API might require a tuple rather than a list. In that case you can use the tuple() function:

tuple("a", "b", "c")

R named lists are converted to Python dictionaries however you can also explicitly create a Python dictionary using the dict() function:

dict(foo = "bar", index = 42L)

This might be useful if you need to pass a dictionary that uses a more complex object (as opposed to a string) as it’s key.

Arrays

R matrices and arrays are converted automatically to and from NumPy arrays.

When converting from R to NumPy, the NumPy array is mapped directly to the underlying memory of the R array (no copy is made). In this case, the NumPy array uses a column-based in memory layout that is compatible with R (i.e. Fortran style rather than C style). When converting from NumPy to R, R receives a column-ordered copy of the NumPy array.

You can also manually convert R arrays to NumPy using the np_array() function. For example, you might do this if you needed to create a NumPy array with C rather than Fortran style in-memory layout (for higher performance in row-oriented computations) or if you wanted to control the data type of the NumPy array more explicitly. Here are some example uses of np_array():

a <- np_array(c(1:8), dtype = "float16")
a <- np_array(c(1:8), dim = c(2,4), order = "C")

Reasoning about arrays which use distinct in-memory orders can be tricky. The Arrays in R and Python article provides additional details.

With Contexts

The R with generic function can be used to interact with Python context manager objects (in Python you use the with keyword to do the same). For example:

py <- import_builtins()
with(py$open("output.txt", "w") %as% file, {
  file$write("Hello, there!")
})

This example opens a file and ensures that it is automatically closed at the end of the with block. Note the use of the %as% operator to alias the object created by the context manager.

Iterators

If a Python API returns an iterator or generator you can interact with it using the iterate() function. The iterate() function can be used to apply an R function to each item yielded by the iterator:

iterate(iter, print)

If you don’t pass a function to iterate the results will be collected into an R vector:

results <- iterate(iter)

Note that the Iterators will be drained of their values by iterate():

a <- iterate(iter) # results are not empty
b <- iterate(iter) # results are empty since items have already been drained

Element Level Iteration

You can also iterate on an element-by-element basis using the iter_next() function. For example:

while (TRUE) {
  item <- iter_next(iter)
  if (is.null(item))
    break
}

By default iter_next() will return NULL when the iteration is complete but you can provide a custom completed value it will be returned instead. For example:

while (TRUE) {
  item <- iter_next(iter, completed = NA)
  if (is.na(item))
    break
}

Note that some iterators/generators in Python are infinite. In that case the caller will need custom logic to determine when to terminate the loop.

Generators

Python generators are functions that implement the Python iterator protocol. Similarly, the reticulate generator() function enables you to create a Python iterator from an R function.

In Python, generators produce values using the yield keyword. In R, values are simply returned from the function. One benefit of the yield keyword is that it enables successive iterations to use the state of previous iterations. In R, this can be done by returning a function that mutates it’s enclosing environment via the <<- operator. For example:

# define a generator function
sequence_generator <-function(start) {
  value <- start
  function() {
    value <<- value + 1
    value
  }
}

# convert the function to a python iterator
iter <- py_iterator(sequence_generator(10))

If you want to indicate the end of the iteration, return NULL from the function:

sequence_generator <-function(start) {
  value <- start
  function() {
    value <<- value + 1
    if (value < 100)
      value
    else
      NULL
  }
}

Note that you can change the value that indicates the end of the iteration using the completed parameter (e.g. py_iterator(func, completed = NA)).

Advanced Functions

There are several more advanced functions available that are useful principally when creating high level R interfaces for Python libraries.

Python Objects

Typically interacting with Python objects from R involves using the $ operator to access whatever properties for functions of the object you need. When using the $, Python objects are automatically converted to their R equivalents when possible. The following functions enable you to interact with Python objects at a lower level (e.g. no conversion to R is done unless you explicitly call the py_to_r function):

Function Description
py_has_attr() Check if an object has a specified attribute.
py_get_attr() Get an attribute of a Python object.
py_set_attr() Set an attribute of a Python object.
py_list_attributes() List all attributes of a Python object.
py_len() Length of Python object.
py_call() Call a Python callable object with the specified arguments.
py_to_r() Convert a Python object to it’s R equivalent
r_to_py() Convert an R object to it’s Python equivalent

Pickle

You can save and load Python objects (via pickle) using the py_save_object and py_load_object functions:

Function Description
py_save_object() Save a Python object to a file with pickle.
py_load_object() Load a previously saved Python object from a file.

Configuration

The following functions enable you to query for information about the Python configuration available on the current system.

Function Description
py_available() Check whether a Python interface is available on this system.
py_numpy_available() Check whether the R interface to NumPy is available (requires NumPy >= 1.6)
py_module_available() Check whether a Python module is available on this system.
py_config() Get information on the location and version of Python in use.

Output Control

These functions enable you to capture or suppress output from Python:

Function Description
py_capture_output() Capture Python output for the specified expression and return it as an R character vector.
py_suppress_warnings() Execute the specified expression, suppressing the display Python warnings.

Miscellaneous

The functions provide miscellaneous other lower-level capabilities:

Function Description
py_set_seed() Set Python and NumPy random seeds.
py_unicode() Convert a string to a Python unicode object.
py_str() Get the string representation of Python object.
py_id() Get a unique identifier for a Python object
py_is_null_xptr() Check whether a Python object is a null externalptr.
py_validate_xptr() Check whether a Python object is a null externalptr and throw an error if it is.

Learning More

The following articles cover additional aspects of using reticulate: