Basics of R Programming Language

R!

R is a powerful programming language and open-source software widely used for statistical computing and data analysis. It was developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand. R has gained popularity among statisticians, data scientists, researchers, and analysts for its flexibility, extensibility, and robust statistical capabilities.

Why learn R?

Here are several compelling reasons to consider learning R:

  • Statistical Analysis
  • Data Visualisation
  • Open Source
  • Community Support
  • Extensibility
  • Integration with Other Languages
  • Data Science and Machine Learning
  • Widely Used in Academia and Industry
  • Continuous Development

Getting Started with R

To begin working with R, users typically install an Integrated Development Environment (IDE) such as RStudio, which provides a user-friendly interface for coding, debugging, and visualising results. R scripts are written in the R language and can be executed interactively or saved for later use.

A look around RStudio

Open RStudio. You will see four windows (aka panes). Each window has a different function. The screenshot below shows an analogy linking the different RStudio windows to cooking.

Console Pane

On the left-hand side, you’ll find the console. This is where you can input commands (code that R can interpret), and the responses to your commands, known as output, are displayed here. While the console is handy for experimenting with code, it doesn’t save any of your entered commands. Therefore, relying exclusively on the console is not recommended.

History Pane

The history pane (located in the top right window) maintains a record of the commands that you have executed in the R console during your current R session. This includes both correct and incorrect commands.

You can navigate through your command history using the up and down arrow keys in the console. This allows you to quickly recall and re-run previous commands without retyping them.

Environment Pane

The environment pane (located in the top right window) provides an overview of the objects (variables, data frames, etc.) that currently exist in your R session. It displays the names, types, dimensions, and some content of these objects. This allows you to monitor the state of your workspace in real-time.

Plotting Pane

The plotting pane (located in the bottom right window) is where graphical output, such as plots and charts, is displayed when you create visualisations in R. The Plotting pane often includes tools for zooming, panning, and exporting plots, providing additional functionality for exploring and customising your visualisations.

Help Pane

The help pane (located in the bottom right window) is a valuable resource for accessing documentation and information about R functions, packages, and commands. When you type a function or command in the console and press the F1 key (Mac: fn + F1) the Help pane displays relevant documentation. Additionally, you can type a keyword in the text box at the top right corner of the Help Pane.

Files Pane

The files pane provides a file browser and file management interface within RStudio. It allows you to navigate through your project directories, view files, and manage your file system.

Packages Pane

This pane provides a user-friendly interface for managing R packages. It lists installed packages and allows you to load, unload, update, and install packages.

Viewer Pane

It is used to display dynamic content generated by R, such as HTML, Shiny applications, or interactive visualisations.

Working directory

Opening an RStudio session launches it from a specific location. This is the working directory. R looks in the working directory by default to read in data and save files. You can find out what the working directory is by using the command getwd(). This shows you the path to your working directory in the console. In Mac this is in the format /path/to/working/directory and in Windows C:\path\to\working\directory. It is often useful to have your data and R scripts in the same directory and set this as your working directory. We will do this now.

Make a folder for this course somewhere on your computer where you can easily find it. Name the folder, for example, Intro_R_REDCap_course. Then, to set this folder as your working directory:

In RStudio click on the Files tab and then click on the three dots, as shown below.

In the window that appears, find the folder you created (e.g. Intro_R_REDCap_course), click on it, then click Open. The files tab will now show the contents of your new folder. Click on More → Set As Working Directory, as shown below.
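The same steps can also be done from code with getwd(), dir.create() and setwd(). The sketch below is a minimal example that uses a folder inside R's temporary directory as a stand-in for the course folder, so the path is an assumption and not a location on your machine.

```r
# Remember where we started so we can switch back afterwards
original_dir <- getwd()

# Hypothetical course folder, created inside R's temporary directory
course_dir <- file.path(tempdir(), "Intro_R_REDCap_course")
dir.create(course_dir, showWarnings = FALSE)

# Make it the working directory and confirm the change
setwd(course_dir)
current_folder <- basename(getwd())
print(current_folder)

# Return to the original working directory
setwd(original_dir)
```

In your own session you would replace course_dir with the real path to the folder you created.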

Note: You can use an RStudio project as described here to automatically keep track of and set the working directory.

R Scripts

In RStudio, the Script pane (located at the top left window) serves as a dedicated space for writing, editing, and executing R scripts. It is where you compose and organise your R code, making it an essential area for creating reproducible and well-documented analyses.

RStudio provides syntax highlighting in the Script pane, making it easier to identify different components of your code. You can execute individual lines or selections of code from the Script pane. This helps in testing and debugging code without running the entire script.

Quarto Document

Quarto is an open-source scientific and technical publishing system that allows you to combine text, code, and output in a single document. It is the next-generation version of RMarkdown and is widely used for reproducible research, dynamic reports, and interactive documents.

With Quarto, you can:

  • Write reports that integrate R code and results
  • Create interactive documents (HTML, PDF, Word, and more)
  • Publish research outputs with dynamic figures and tables

Why use Quarto?

  • Reproducibility
  • Combines analysis and documentation in one file
  • Flexible Outputs
  • Generate HTML, PDF, Word, and presentations
  • Works with R, Python, and Julia
  • Supports Markdown Syntax
  • Easy formatting for text and visuals

In this workshop, we will be using Quarto documents to write R code.

Getting Started with a Quarto Document

Follow these steps to create a new Quarto document in RStudio:

Open a New Quarto Document

  1. Open RStudio
  2. Go to File → New File → Quarto Document
  3. A dialog box will appear:
    • Title: Enter a document title as “Analysing REDCap Data using R”
    • Format: Leave default format as HTML
    • Engine: Leave default engine as knitr
  4. Click Create.

This creates a new .qmd file in RStudio, which is a Quarto document.

Save the File

  1. Click File → Save As
  2. Choose a meaningful filename, e.g., introR_workshop.qmd
  3. Click Save

Understanding the Structure of a Quarto Document

A Quarto document consists of three main sections:

  1. YAML Header (Metadata Section)

This section sits at the top of the file, enclosed between two lines of three dashes (---), and contains metadata. Example:

---
title: "My First Quarto Document"
author: "John Doe"
date: "2025-01-30"
format: html
---

Common YAML options:

  • title: Document title
  • author: Name of the author
  • date: Date of the document
  • format: Output type (HTML, PDF, Word, etc.)

  2. Text and Markdown (Narrative Section)

Quarto supports Markdown, a simple way to format text.

  • Headings:

    # Main Heading
    ## Subheading
    ### Smaller Heading
  • Bold and Italic Text:

    **Bold Text**
    *Italic Text*
  • Lists:

    -   Bullet Point 1
    -   Bullet Point 2
  • Hyperlinks and Images:

    [Click here for Quarto docs](https://quarto.org/)
    ![RStudio Logo](https://www.rstudio.com/wp-content/uploads/2014/04/rstudio-logo.png)

  3. Code Blocks (Executable Section)

Quarto allows you to insert code chunks that run R scripts inside your document.

Example R Code Chunk:

```{r}
# Example calculation
x <- c(1, 2, 3, 4, 5)
sum(x)
```

To insert a code chunk, go to Code → Insert Code Chunk in the menu, or use the keyboard shortcut Ctrl + Alt + I (Windows/Linux) or Cmd + Option + I (Mac). Code is written inside triple backticks and is executed when you render the document.

Running and Rendering a Quarto Document

To run a single code chunk, click the Run button at the top of the chunk or use the keyboard shortcut Ctrl + Shift + Enter (Windows/Linux) or Cmd + Shift + Enter (Mac).

To generate an output file (HTML, PDF, or Word), click the Render button in RStudio. The document compiles and opens the rendered file.

Tip

If PDF output fails, install TinyTeX for LaTeX support:

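install.packages("tinytex")   # installs the tinytex helper package for R
tinytex::install_tinytex()    # downloads and installs the TinyTeX LaTeX distribution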

Keyboard Shortcuts in Quarto (Windows & Mac)

Action                      Windows/Linux           Mac
Run a single code line      Ctrl + Enter            Cmd + Enter
Run a single code chunk     Ctrl + Shift + Enter    Cmd + Shift + Enter
Run all chunks above        Ctrl + Alt + P          Cmd + Option + P
Render (Knit) document      Ctrl + Shift + K        Cmd + Shift + K
Insert a new code chunk     Ctrl + Alt + I          Cmd + Option + I
Comment/uncomment a line    Ctrl + Shift + C        Cmd + Shift + C
Open Quarto Render menu     Ctrl + Shift + R        Cmd + Shift + R
Open Quarto preview         Ctrl + Shift + O        Cmd + Shift + O
Restart R session           Ctrl + Shift + F10      Cmd + Shift + F10

Comments

In R, any text following the hash symbol # is termed a comment. R disregards this text, considering it non-executable. Comments serve the purpose of documenting your code, aiding your future understanding of specific lines, and highlighting the intentions or challenges encountered.

RStudio makes it easy to comment or uncomment lines: select the lines you want to comment (to comment a set of lines) or place the cursor anywhere on a line (to comment a single line), then press Cmd + Shift + C (Mac) or Ctrl + Shift + C (Windows/Linux).

Extensive use of comments is encouraged throughout this course.

# This is a comment. Ignored by R. But useful for me!
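As a small illustration, the sketch below mixes full-line comments, trailing comments and a commented-out line. The variable names radius and area are made up for this example.

```r
# Full-line comment: compute the area of a circle
radius <- 2            # trailing comment: the radius in cm
area <- pi * radius^2  # everything after the # is ignored by R
# radius <- 100        # a commented-out line is not executed
area
```

Because the last assignment to radius is commented out, area is still computed from a radius of 2.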

Executing Commands

Executing commands or running code is the process of submitting a command to your computer, which does some computation and returns an answer. In RStudio, there are several ways to execute commands:

  • Select the line(s) of code using the mouse, and then click Run at the top right corner of the R text file.
  • Select Run Lines from the Code menu.
  • Click anywhere on the line of code and click Run.
  • Select the line(s) you want to run. Press Cmd + Return (Mac) or Ctrl + Return (Windows/Linux) to run the selected code.

We suggest the last option, the keyboard shortcut, which is fastest. This link provides a list of useful RStudio keyboard shortcuts that can be beneficial when coding and navigating the RStudio IDE.

When you type in, and then run the commands shown in the grey boxes below, you should see the result in the Console pane at bottom left.

Simple Maths in R

We can use R as a calculator to do simple maths.

3 + 5
[1] 8
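The other arithmetic operators work the same way. The lines below are a short sketch of how R orders operations, with the result of each line noted in its comment.

```r
2 + 3 * 4      # multiplication happens before addition: 14
(2 + 3) * 4    # parentheses are evaluated first: 20
2^10           # exponentiation: 1024
7 / 2          # division: 3.5
7 %/% 2        # integer division: 3
7 %% 2         # remainder (modulo): 1
```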

More complex calculator functions are built into R, which is one reason it is popular among mathematicians and statisticians. To use them, we need to call them by name.

Try It Yourself

Add an R code chunk and find the result of the equation: \[ \frac{3^2 \times 8^3}{10 + 5} - 120 \]

(3^2 * 8^3)/(10 + 5) - 120
[1] 187.2

Calling Functions

R has a large collection of built-in functions that are called like this:

function_name(argument1 = value1, argument2 = value2, ...)

Let’s explore using seq() function to create a series of numbers.

Start by typing se and then press Tab. RStudio will suggest possible completions. Specify seq() by typing more or use the up/down arrows to select it. You’ll see a helpful tooltip pop up, reminding you of the function’s arguments. If you need more assistance, press F1 (Windows/Linux) or fn + F1 (Mac) to access the full documentation in the Help tab at the lower right.

Now, type the arguments 1, 10 and press Return.

seq(1, 10)
 [1]  1  2  3  4  5  6  7  8  9 10

You can explicitly specify arguments using the name = value format. However, if you don’t, R will try to resolve them based on their position.

seq(from = 1, to = 10)
 [1]  1  2  3  4  5  6  7  8  9 10

In this example, it assumes that we want a sequence starting from 1 and ending at 10. Since we didn’t mention the step size, it defaults to the value defined in the function, which is 1 in this case.

seq(from = 1, to = 10, by = 2)
[1] 1 3 5 7 9

If you are using the name = value format, the order of the arguments does not matter.

seq(to = 10, by = 2, from = 1)
[1] 1 3 5 7 9

For frequently used functions, I might rely on positional resolution for the first one or two arguments. However, beyond that, I prefer to use the name = value format for clarity and precision.
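To make the comparison concrete, the sketch below calls seq() by position, by name, and with a mix of both, then checks that the results match. The variable names are illustrative only. It also shows length.out, another seq() argument, which asks for a number of evenly spaced values instead of a step size.

```r
by_position <- seq(1, 10, 2)                    # from, to, by resolved by position
by_name     <- seq(from = 1, to = 10, by = 2)   # every argument named
mixed       <- seq(1, 10, by = 2)               # first two by position, the rest named

identical(by_position, by_name)   # TRUE
identical(by_name, mixed)         # TRUE

# Five evenly spaced values between 0 and 1
evenly_spaced <- seq(from = 0, to = 1, length.out = 5)
evenly_spaced
```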

To take the log of 100:

log(x = 100, base = 10)
[1] 2

To take the square root of 100:

sqrt(100) # this is the short-hand of sqrt(x = 100)
[1] 10

Notice that the square root function is abbreviated to sqrt(). This makes writing R code faster; the drawback is that some function names are hard to remember or to interpret.
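One detail worth knowing is that log() uses the natural logarithm (base e) when no base is given. The sketch below compares a few related built-in functions.

```r
natural_log <- log(100)              # natural log, about 4.605
base10_log  <- log(100, base = 10)   # 2
log10(100)                           # shortcut for base 10: 2
log2(8)                              # shortcut for base 2: 3
exp(natural_log)                     # exp() undoes log(), giving 100 up to rounding
```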

Try It Yourself

Find the sum of log square root values of the sequence 10, 20, 30, …, 100.

sum(log(sqrt(seq(10, 100, 10))))
[1] 19.06513

Getting Help

In R, the ? and ?? operators are used for accessing help documentation, but they behave slightly differently.

  • The ? operator is used to access help documentation for a specific function or topic. When you type ? followed by the name of a function, you get detailed information about that function. For example try:
?mean
mean R Documentation

Arithmetic Mean

Description

Generic function for the (trimmed) arithmetic mean.

Usage

mean(x, ...)

## Default S3 method:
mean(x, trim = 0, na.rm = FALSE, ...)

Arguments

x

An R object. Currently there are methods for numeric/logical vectors and date, date-time and time interval objects. Complex vectors are allowed for trim = 0, only.

trim

the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.

na.rm

a logical evaluating to TRUE or FALSE indicating whether NA values should be stripped before the computation proceeds.

...

further arguments passed to or from other methods.

Value

If trim is zero (the default), the arithmetic mean of the values in x is computed, as a numeric or complex vector of length one. If x is not logical (coerced to numeric), numeric (including integer) or complex, NA_real_ is returned, with a warning.

If trim is non-zero, a symmetrically trimmed mean is computed with a fraction of trim observations deleted from each end before the mean is computed.

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

weighted.mean, mean.POSIXct, colMeans for row and column means.

Examples

x <- c(0:10, 50)
xm <- mean(x)
c(xm, mean(x, trim = 0.10))

The above command displays the help documentation for the mean function, providing information about its usage, arguments, and examples.

  • The ?? operator is used for a broader search across help documentation. It performs a search for the specified term or keyword in the documentation.
??regression

This will search for the term “regression” in the help documentation and return relevant results. It’s useful when you want to find functions, packages, or topics related to a specific term.
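The arguments described on the mean help page can be tried out directly. The sketch below reuses the vector from the Examples section of that page and adds a small vector y, made up here, to show what na.rm does.

```r
x <- c(0:10, 50)
mean(x)                  # ordinary mean: 8.75
mean(x, trim = 0.10)     # drops the smallest and largest value first: 5.5

y <- c(1, NA, 3)         # illustrative vector with a missing value
mean(y)                  # NA, because one value is missing
mean(y, na.rm = TRUE)    # missing value removed first: 2
```

With 12 observations and trim = 0.10, one observation is removed from each end, so the outlier 50 no longer affects the result.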

Tab completion

A very useful feature is Tab completion. You can start typing and use Tab to autocomplete code, for example, a function name.

Try It Yourself

Check the help page of log function.

help(log)
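Besides opening a help page, a few base R functions let you look things up from code. The sketch below uses apropos() to find attached objects by name and formals() and args() to list a function's arguments. The choice of sd() and the search pattern are only examples.

```r
# Names of attached objects that start with "read."
read_functions <- apropos("^read\\.")
head(read_functions)

# Argument names of a function, without opening its help page
sd_arguments <- names(formals(sd))
sd_arguments

# args() prints the full function signature
args(sd)
```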

R Packages

Many developers have built 1000s of functions and shared them with the R user community to help make everyone’s work easier and more efficient. These functions (short programs) are generally packaged up together in (wait for it) Packages. For example, the tidyverse package is a compilation of many different functions, all of which help with data transformation and visualisation. Packages also contain data, which is often included to assist new users with learning the available functions.
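As a quick illustration of data shipped inside a package, the datasets package that comes with R includes the iris data frame. The sketch below looks at its first rows and its size.

```r
# 'datasets' is attached by default, so iris is available straight away
head(iris, 3)

iris_size <- dim(iris)   # number of rows and columns
iris_size
```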

Installing Packages

Packages are hosted on repositories, with CRAN (Comprehensive R Archive Network) being the primary repository. To install packages from CRAN, you use the install.packages() function. For example:

install.packages("tidyverse")

This will spit out a lot of text into the console as the package is being installed. Once complete you should have a message:

The downloaded binary packages are in... followed by a long directory name.

To remove an installed package:

remove.packages("tidyverse")

Loading Packages

After installation, you need to load a package into your R session using the library() function. For example:

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

This makes the functions and datasets from the ‘tidyverse’ package available for use in your current session.
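The Conflicts message above says that dplyr::filter() now masks stats::filter(). The package::function() form names the package explicitly, so you can still reach the masked version. The sketch below uses only the stats package, so it runs whether or not the tidyverse is loaded. The moving-average example is an illustration, not part of the workshop data.

```r
# Call a function by its full name
stats::median(c(1, 3, 5))   # 3

# stats::filter() applies a linear filter, here a 3-point moving average
moving_avg <- as.numeric(stats::filter(1:5, rep(1 / 3, 3)))
moving_avg                  # NA 2 3 4 NA
```

The first and last values are NA because a centred 3-point window does not fit at the ends of the series.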

Tip

You only need to install a package once. Once installed, you don’t need to reinstall it in subsequent sessions. However, you do need to load the package at the beginning of each R session using the library() function before you can utilise its functions and features. This ensures that the package is actively available for use in your current session.
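One way to avoid reinstalling a package every time a script runs is to check first whether it is already installed. The sketch below is a minimal example. The helper name is_installed() is made up here, and requireNamespace() is the base R function doing the work.

```r
# Hypothetical helper: TRUE if the package can be loaded, FALSE otherwise
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")   # TRUE, because stats ships with R

if (!is_installed("tidyverse")) {
  message("tidyverse is not installed; run install.packages(\"tidyverse\")")
}
```

The check only reports the missing package. Installation is left as a deliberate manual step.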

To view packages currently loaded into memory:

(.packages())
 [1] "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"     "readr"    
 [7] "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"     "graphics" 
[13] "grDevices" "utils"     "datasets"  "methods"   "base"     
search()
 [1] ".GlobalEnv"        "package:lubridate" "package:forcats"  
 [4] "package:stringr"   "package:dplyr"     "package:purrr"    
 [7] "package:readr"     "package:tidyr"     "package:tibble"   
[10] "package:ggplot2"   "package:tidyverse" "package:stats"    
[13] "package:graphics"  "package:grDevices" "package:utils"    
[16] "package:datasets"  "package:methods"   "Autoloads"        
[19] "package:base"     

Package Documentation

Each package comes with documentation that explains how to use its functions. You can access this information using the help() function or by using ? before the function name:

help(tidyverse)
tidyverse-package R Documentation

tidyverse: Easily Install and Load the ‘Tidyverse’

Description

The ‘tidyverse’ is a set of packages that work in harmony because they share common data representations and ‘API’ design. This package is designed to make it easy to install and load multiple ‘tidyverse’ packages in a single step. Learn more about the ‘tidyverse’ at https://www.tidyverse.org.

Author(s)

Maintainer: Hadley Wickham hadley@rstudio.com

Other contributors:

  • RStudio [copyright holder, funder]

See Also

Useful links:

Alternatively, use vignette() if the package provides its documentation as vignettes:

vignette(package="tidyverse")

Variables

A variable is a bit of a tricky concept, but very important for understanding R. Essentially, a variable is a symbol that we use in place of another value. Usually the other value is a larger/longer form of data. We can tell R to store a lot of data, for example, in a variable named x. When we execute the command x, R returns all of the data that we stored there.

For now however we’ll just use a tiny data set: the number 5. To store some data in a variable, we need to use a special symbol <-, which in our case tells R to assign the value 5 to the variable x. This is called the assignment operator. To insert the assignment operator press Option + - (Mac) or Alt + - (Windows/Linux).

Let’s see how this works.

Create a variable called x, that will contain the number 5.

x <- 5

R won’t return anything in the console, but note that you now have a new entry in the environment pane. The variable name is at the left (x) and the value that is stored in that variable, is displayed on the right (5).

We can now use x in place of 5:

x + 10
[1] 15
x * 3
[1] 15

Variables are sometimes referred to as objects. In R there are different conventions about how to name variables, but most importantly they:

  • cannot begin with a number
  • should begin with a letter
  • are case sensitive
  • can otherwise take any name, but it’s best to use something that makes sense to you and will likely make sense to others who may read your code.

It is wise to adopt a consistent convention for separating words in variable names.

For example:

# i_use_snake_case
# other.people.use.periods
# evenOthersUseCamelCase
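The naming rules can be checked in a short sketch. The variable names below are invented for illustration. make.names() is a base R function that shows how R would repair a name that breaks the rules.

```r
patient_age <- 42
Patient_Age <- 43              # a different variable: names are case sensitive
patient_age == Patient_Age     # FALSE

# make.names() turns invalid names into valid ones
fixed_names <- make.names(c("patient_age", "1st_visit", "total score"))
fixed_names                    # "patient_age" "X1st_visit" "total.score"
```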

Try It Yourself

Assign the value 5 to a variable a and the value 10 to a variable b. Then, create a new variable sum that stores the result of adding a and b, and print the value of sum.

a <- 5
b <- 10
sum <- a + b
sum
[1] 15

The Pipe Operator (|> or %>%)

R has two pipe operators. The native pipe |> has been built into base R since version 4.1.0. The older %>% pipe was originally defined in the (cleverly named) magrittr package and is also made available by the dplyr and tidyverse packages, which is why it is so common in tidyverse code. The pipe symbol can seem confusing and intimidating at first. However, once you understand the basic idea, it can become addicting!

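We suggest you use a shortcut: Cmd + Shift + M (Mac) or Ctrl + Shift + M (Windows/Linux). By default RStudio inserts %>% with this shortcut; to insert |> instead, tick “Use native pipe operator” under Tools → Global Options → Code.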

The |> symbol is placed between a value on the left and a function on the right. The |> simply takes the value to the left and passes it to the function on the right as the first argument. It acts as a “pipe”. That’s it!

Suppose we have a variable, x.

x <- 7

The following two commands are equivalent.

sqrt(x)
[1] 2.645751
x |> sqrt()
[1] 2.645751
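Because the piped value becomes the first argument, any further arguments are written inside the parentheses as usual. The sketch below shows this with log() and round(). It assumes R 4.1.0 or later, where |> is available.

```r
# Same as log(100, base = 10)
piped_log <- 100 |> log(base = 10)
piped_log        # 2

# Same as round(c(2.645751, 3.14159), digits = 2)
piped_round <- c(2.645751, 3.14159) |> round(digits = 2)
piped_round      # 2.65 3.14
```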

We’ll continue to use |> throughout this tutorial to show how useful it can be for chaining various data manipulation steps during an analysis.

Chaining functions

R chaining allows you to streamline your data analysis workflow by sequentially applying multiple operations to your data using the pipe operator |>. We often need to perform several data manipulation or analysis operations in a sequence. Chaining allows you to apply these operations one after the other in a clear and concise manner.

Here’s a basic template for chaining operations using the pipe operator |>:

result <- data |>
    operation1(...) |>
    operation2(...) |>
    operation3(...) |>
    ...
    operationN(...)

In this template:

  • data represents the input data frame or object.
  • operation1, operation2, …, operationN represent the functions or operations you want to apply sequentially to the data, for example the select(), filter() or mutate() functions.
  • ... represents any additional arguments or parameters that may be passed to each operation.

Each operation takes the output of the previous operation as its input, making it easy to chain multiple operations together. This improves the readability of your code by organising operations in a left-to-right fashion and it avoids creating intermediate variables to store the results of each operation.
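Here is the template filled in with a concrete case. To keep the sketch self-contained it uses base R functions (subset(), transform(), head()) and the built-in mtcars data instead of the tidyverse verbs named above. The new column wt_kg is made up for this example.

```r
result <- mtcars |>
  subset(cyl == 4, select = c(mpg, wt)) |>   # keep 4-cylinder cars and two columns
  transform(wt_kg = wt * 453.592) |>         # wt is in 1000 lbs, so this converts to kg
  head(3)                                    # first three rows only

result
```

Each step receives the data frame produced by the step before it, so no intermediate variables are needed.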

Try It Yourself

Find the sum of log square root values of the sequence 10, 20, 30, …, 100 by chaining functions.

seq(10, 100, 10) |> sqrt() |> log() |> sum()
[1] 19.06513

Clearing the Environment

Take a look at the objects you have created in your workspace, which have accumulated in the Environment pane in the upper right corner of RStudio.

You can obtain a list of objects in your workspace using a couple of different R commands:

objects()
[1] "a"   "b"   "sum" "x"  
ls()
[1] "a"   "b"   "sum" "x"  

If you wish to remove a specific object, let’s say a, you can use the following command:

rm(a)

To remove all objects:

rm(list = ls())

Alternatively, you can click the broom icon in RStudio’s Environment pane to clear everything.
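You can confirm that an object is gone with exists(), and remove only a group of objects by passing a pattern to ls(). The sketch below uses made-up object names to show both.

```r
scratch_value <- 42
exists("scratch_value")    # TRUE
rm(scratch_value)
exists("scratch_value")    # FALSE

# Remove only the objects whose names start with "tmp_"
tmp_a <- 1
tmp_b <- 2
keep_me <- 3
rm(list = ls(pattern = "^tmp_"))
exists("keep_me")          # TRUE, this object was not touched
```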

For the sake of reproducibility, it’s crucial to regularly delete your objects and restart your R session. This ensures that your analysis can be replicated next week or even after upgrading your operating system. Restarting your R session helps identify and address any dependencies or configurations needed for your analysis to run successfully.

Case Study: Immunotherapy Dataset

In this workshop, we are using a dummy Immunotherapy dataset on REDCap filled with randomly generated data. Therefore, note that in some cases the data can make no sense. However, this will be useful for learning how to import data into R, data manipulation and basic visualisation.

This dataset contains 15 instruments or forms namely: Demographics, Melanoma Data, Adjuvant Therapy, Systemic Therapy for Advanced Disease, Melanoma CNS Metastases, Adverse Events, Baseline Visit, Checkpoint Inhibitor Treatment, Immune Related Adverse Events (irAEs), Pathology, PET irAE Imaging, PPI and Antibiotic use during treatment with CPIs, Response Data, and Mortality Data.

Importing REDCap Data

REDCap API

REDCap (Research Electronic Data Capture) provides an Application Programming Interface (API) that allows users to programmatically access and interact with their project data. The API enables automation of data retrieval, updates, and exports, reducing manual effort and ensuring reproducibility in data analysis.

Peter Mac REDCap Instance

What is the REDCap API?

The REDCap API is a web-based service that allows users to interact with REDCap programmatically. Instead of manually downloading CSV files, users can use the API to:

  • Retrieve records from a REDCap project
  • Import new data or update existing records
  • Export metadata (variable names, field types)
  • Pull longitudinal and repeating instrument data
  • Generate reports dynamically

The API facilitates automated data retrieval, making it a powerful tool for integrating REDCap data into R-based workflows.

Example Use Case

A researcher can schedule a daily script in R to pull the latest REDCap data for real-time analysis instead of manually exporting files from the web interface.

Requesting an API Token in REDCap

To access the API, users must obtain an API token, which is a unique, secure key that authenticates requests. REDCap provides API tokens at the user-project level. This means that if three users on the same project need to use the API, each user will need to individually request an API token. Similarly, if one user wants an API token for three different projects, they will need to request an API token for each project.

Steps to Request an API Token for a Project:

  1. Log in to REDCap and navigate to your project.
  2. If your REDCap project has API access enabled, you will see it in the applications on the left side of the screen as follows. Otherwise contact REDCapServiceDesk@petermac.org.

  3. Click on “API” under the Applications menu.

  4. Click “Request API token” to send a token request to the REDCap administrative team.

  5. Your REDCap administrator will review and approve your request.

  6. Once approved, you will receive a unique API token (a long alphanumeric string).

Important

Keep your API token private and never share it. It grants full access to your REDCap project data.

Using the REDCap API

The best way to familiarise yourself with the REDCap API is to explore the API Playground.

  1. Click on the “API Playground” link from the left-hand menu under “Applications.”

  2. Once in the API Playground, there is a blue box with a dropdown menu labeled “API Method.” This dropdown includes all the API actions REDCap can take.

    1. If a project is in production, the methods listed in this dropdown will be limited so as not to affect real data in the project. This is noted in the green text under the “API Method” dropdown.

  3. Select the method you need from the dropdown menu and complete the additional information. The additional information (e.g., “Format”, “Instrument”, etc.) will vary depending on which API method you choose and the project structure. In the above example, the researcher is asking to export project information as a CSV.

    1. To see all the API functions REDCap is capable of, and export a .zip file of sample code, click on the “REDCap API documentation” link that is available on both the “API” page and in the “API Playground.”

  4. When you scroll further down the page, there is an open text box with a series of tabs on the top, with each tab corresponding to a coding language. Each tab will provide the API code in the indicated language.

  5. To execute a real API request, click the “Execute Request” button, and it will display the API response in a textbox as follows.

On the API Playground, there is a button that will let you “Execute Request.” This will perform the API action you are programming and thus affect the data in your project. Use this button with a great amount of caution.

Security Considerations for API Access

Since the API token provides direct access to your REDCap project, it must be handled securely.

Best Practices for API Security:

  • Never share your API token with anyone. Keep it private, and never hardcode it in scripts.
  • Do not test API tokens in browsers. Storing an API token in plain text within a script is insecure. Instead, the token should be encrypted within the script, read from a secure environment variable, or made accessible to the script via another secure mechanism.
  • Before you share code anywhere, remove your API token.
  • Enable logging and monitor API access regularly.
  • Revoke unused API tokens if they are no longer needed.
  • Regenerate your API token every 90 days, or at any point that you think your token has been compromised. To regenerate your token, go to the API page and select “Regenerate token.” If you are no longer using the API functionality on your project, delete your token.

Using an Environment Variable for API Token in R

You can save your API keys in a “hidden” file that R reads each time it starts. That file is called “.Renviron” and holds environment variables as name=value pairs. It can be a bit of a pain to find this file, so the best option is to install the usethis package, which contains helper functions, including one that opens this file.

install.packages("remotes")
remotes::install_cran("usethis")

When it comes to adding packages to your copy of R, the install_cran() function in the remotes package is superior to the usual install.packages() function because it first checks whether you already have the latest version before bothering to download and install.

After installing usethis you can access your “.Renviron” file by typing this in your console.

usethis::edit_r_environ()

It will cause the file to open. Create a name for your API key (for example: rcap_immuno_key) and add a line like this to your .Renviron file:

rcap_immuno_key="your_api_token_here"

Copy the token you obtained from the REDCap website in the previous section and paste it into the .Renviron file as explained above, replacing your_api_token_here and keeping the surrounding double quotes.

After adding the line, remember to save the file and completely restart R/RStudio. Once R restarts, you can access the key like this:

api_token <- Sys.getenv("rcap_immuno_key")
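Sys.getenv() returns an empty string when a variable is not set, which can lead to confusing API errors later. The sketch below is one possible guard, assuming you want the script to stop early with a clear message. The helper name get_redcap_token() and the variable rcap_demo_key are hypothetical, and the dummy value is set only so that the example runs.

```r
# Hypothetical helper: read a token from an environment variable or stop
get_redcap_token <- function(var_name = "rcap_immuno_key") {
  token <- Sys.getenv(var_name, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable '", var_name, "' is not set. ",
         "Add it to .Renviron and restart R.", call. = FALSE)
  }
  token
}

# Demonstration only: a dummy value under a separate, made-up variable name
Sys.setenv(rcap_demo_key = "DUMMY_TOKEN_FOR_ILLUSTRATION")
api_token <- get_redcap_token("rcap_demo_key")
nzchar(api_token)   # TRUE, checked without printing the token itself
```

In real use you would call get_redcap_token() with the name you chose in .Renviron and never print the returned value.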

Once you have an API token, you can test whether it works using httr in R.

Example: Checking Project Information

library(REDCapR)

# Define API URL and Token
url <- "https://redcap.petermac.org.au/api/"
token <- Sys.getenv("rcap_immuno_key")  # Load token securely

# Test API connection
formData <- list("token"=token,
    content='project',
    format='csv',
    returnFormat='json'
)
response <- httr::POST(url, body = formData, encode = "form")
result <- httr::content(response)

# Print project details
result
project_id                           1840
project_title                        Sample Immune checkpoint inhibitor related endocrine toxicity
creation_time                        2025-01-21 11:52:37
production_time                      2025-01-21 15:55:23
in_production                        1
project_language                     English
purpose                              1
purpose_other                        For a workshop
project_notes                        A database with dymmy data to be used for an R workshop titled Analysing REDCap data using R for Peter Mac employees
custom_record_label                  NA
secondary_unique_field               NA
is_longitudinal                      1
has_repeating_instruments_or_events  1
surveys_enabled                      1
scheduling_enabled                   0
record_autonumbering_enabled         0
randomization_enabled                0
ddp_enabled                          0
project_irb_number                   NA
project_grant_number                 NA
project_pi_firstname                 Anna
project_pi_lastname                  Galligan
display_today_now_button             1
missing_data_codes                   NA
external_modules                     sticky_matrix_headers,data_dictionary_revisions,annotated_pdf,record_logging_link,data_driven_project_banner,project_autocomplete
bypass_branching_erase_field_prompt  0

If the request is successful, you should see metadata about your REDCap project as shown above.
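It can also help to check the HTTP status code before trusting the result. The sketch below uses two hypothetical helpers, status_ok() and check_status(), which work on a plain number so that the example runs without a server. With a real response you could pass httr::status_code(response), or call httr::stop_for_status(response) for a similar effect.

```r
# Hypothetical helper: TRUE for HTTP status codes in the 2xx (success) range.
status_ok <- function(code) {
  code >= 200 && code < 300
}

# Hypothetical helper: stop with a readable message when the request failed.
check_status <- function(code) {
  if (!status_ok(code)) {
    stop("REDCap API request failed with HTTP status ", code)
  }
  invisible(TRUE)
}

# A successful request passes silently.
check_status(200)

# A failed request raises an error, which is caught here for demonstration.
outcome <- tryCatch(check_status(403), error = function(e) conditionMessage(e))
outcome
```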

Try It Yourself

In the REDCap API Playground, use the ‘Export Records’ option to create an API request that exports the 3rd record and the mortality_data form in CSV format.

#!/usr/bin/env Rscript
url <- "https://redcap.petermac.org.au/api/"
formData <- list("token"=token,
    content='record',
    action='export',
    format='csv',
    type='flat',
    csvDelimiter='',
    'records[0]'='3',
    'forms[0]'='mortality_data',
    rawOrLabel='raw',
    rawOrLabelHeaders='raw',
    exportCheckboxLabel='false',
    exportSurveyFields='false',
    exportDataAccessGroups='false',
    returnFormat='json'
)
response <- httr::POST(url, body = formData, encode = "form")
result <- httr::content(response)
print(result)
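The request above names each record and form with an indexed key such as 'records[0]'. When you want several records, typing these keys by hand becomes tedious. The sketch below builds them from a vector. The helper build_index_params() is hypothetical, and the record IDs 7 and 12 are placeholders rather than IDs known to exist in the project.

```r
# Hypothetical helper: turn a vector of values into a named list with
# zero-based indexed keys, e.g. "records[0]", "records[1]", ...
build_index_params <- function(name, values) {
  keys <- sprintf("%s[%d]", name, seq_along(values) - 1L)
  setNames(as.list(as.character(values)), keys)
}

record_params <- build_index_params("records", c(3, 7, 12))
form_params   <- build_index_params("forms", "mortality_data")

# Combine with the fixed parameters (the token is left out of this sketch).
demo_form <- c(
  list(content = "record", format = "csv", type = "flat"),
  record_params,
  form_params
)

names(demo_form)
```

The resulting list has the same shape as formData above, so after adding the token it could be passed as the body of httr::POST().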

Importing REDCap Data via API

Once you have set up your API token securely, you can use R to retrieve data directly from REDCap. The REDCapR package provides an interface to streamline API calls from R, making it easy to import records from a REDCap project.

Reading REDCap Data

# If this fails, run install.packages("REDCapR") or 
# remotes::install_github(repo="OuhscBbmc/REDCapR")
requireNamespace("REDCapR")

Set project-wide values

There is some information that is specific to the REDCap project, as opposed to an individual operation. This includes:

  1. the uniform resource identifier (uri) of the server
  2. the token for the user’s project.
library(REDCapR)

# Define API URL and Token
uri <- "https://redcap.petermac.org.au/api/"
token <- Sys.getenv("rcap_immuno_key")  # Load token securely

Read all records and fields

By default, the redcap_read() function retrieves the entire dataset from a REDCap project if no filtering parameters (such as records or fields) are specified.

# Read the entire dataset
immuno_all_rows_all_fields <- redcap_read(redcap_uri = uri, token = token)$data

# print the top 6 rows
head(immuno_all_rows_all_fields)
record_id redcap_event_name redcap_repeat_instrument redcap_repeat_instance ur      last_name first_name sex dob        height weight bmi
1         baseline_arm_1    NA                       NA                     2810493 Jackson   Hannah     2   1943-02-12 154    163    68.73
1         baseline_arm_1    prior_treatment          1                      NA      NA        NA         NA  NA         NA     NA     NA
2         baseline_arm_1    NA                       NA                     6408685 Howard    Samantha   2   2008-01-29 181    97     29.61
3         baseline_arm_1    NA                       NA                     9994173 Martinez  Noah       2   1940-11-06 199    123    31.06
3         end_arm_1         NA                       NA                     NA      NA        NA         NA  NA         NA     NA     NA
3         baseline_arm_1    prior_treatment          1                      NA      NA        NA         NA  NA         NA     NA     NA
(only the first 12 columns are shown; the dataset contains many more columns, most of which are NA in these rows)
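The call above takes $data directly from the result of redcap_read(). The returned list also carries a success flag, so a more defensive pattern is to check it first. The sketch below uses a small mock list in place of a real redcap_read() result, and the helper extract_data() is hypothetical.

```r
# Hypothetical helper: return the data only if the read succeeded.
extract_data <- function(result) {
  if (!isTRUE(result$success)) {
    stop("The REDCap read did not succeed; no data returned.")
  }
  result$data
}

# Mock result standing in for the list that redcap_read() returns.
mock_result <- list(
  success = TRUE,
  data = data.frame(
    record_id = c(1, 2, 3),
    redcap_event_name = rep("baseline_arm_1", 3),
    sex = c(2, 2, 2)
  )
)

demo_data <- extract_data(mock_result)

# For a wide dataset, inspect its size and a few columns rather than printing it all.
dim(demo_data)
demo_data[, c("record_id", "sex")]
```

With a dataset as wide as this one, dim() and a short column selection are usually easier to read than printing the whole data frame.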

Read a subset of records

In many cases, you may only need data for a specific subset of records (e.g., certain patients). You can do this by passing a vector of record IDs (one element per record) to the records argument of redcap_read():

# Define the specific records to retrieve
selected_records <- c(385, 490, 500)  # Replace with actual record IDs

# Read only the selected records
immuno_some_records <- redcap_read(
  redcap_uri = uri, 
  token = token, 
  records = selected_records
)$data

# print all rows
immuno_some_records
record_id redcap_event_name redcap_repeat_instrument redcap_repeat_instance ur last_name first_name sex dob height weight bmi coenrolled___1 coenrolled___2 coenrolled___3 coenrolled___4 clinical_trial clinical_trial_description medical_history___1 medical_history___2 medical_history___7 medical_history___8 medical_history___5 medical_history___3 medical_history___4 medical_history___6 medical_history___99 medical_history_other autoimmune_disease autoimmune_disease_select___1 autoimmune_disease_select___2 autoimmune_disease_select___3 autoimmune_disease_select___4 autoimmune_disease_select___5 autoimmune_disease_select___6 autoimmune_disease_select___9 autoimmune_disease_other rheumatoid_arthritis smoking demographics_complete mel_type___1 mel_type___2 mel_type___3 mel_type___4 mel_type___5 mel_type___6 mel_type_cutaneous mel_mutation___1 mel_mutation___2 mel_mutation___3 mel_mutation___4 mel_mutation___5 mel_mutation___9 mel_mutation_braf mel_mutation_nras mel_mutation_kit mel_mutation_other mel_first_date mel_first_stage resct1stdiag mel_date_diag stage_diagnosis dt_advanced_dis melanoma_data_complete adj_given adj_path_done adj_path_date adj_path_hb adj_path_wcc adj_path_neut_lymph adj_path_creat adj_path_glucose adj_path_lipase adj_path_a1c_dcct adj_path_a1c_ifcc adj_path_insulin adj_path_cpep adj_path_islet_ab___1 adj_path_islet_ab___2 adj_path_islet_ab___3 adj_path_islet_ab___4 adj_path_gad_ab adj_path_ia2_ab adj_path_insulin_ab adj_path_znt8_ab adj_path_cortisol adj_path_acth adj_path_tsh adj_path_ft4 adj_path_ft3 adj_path_tpo_ab adj_path_tg_ab adj_path_trab adj_path_fsh adj_path_lh adj_path_testosterone adj_path_oestradiol adj_path_igf1 adj_path_gh adj_path_prolactin adj_path_alt adj_path_albumin adj_path_alp adj_path_bilirubin adj_path_ferritin adj_path_crp adj_path_vitd adj_path_troponin adj_path_calprotectin adj_stage adj_type adj_summary adj_type_trialid adjstartdate adj_start adj_medication___1 adj_cessation adj_recurrence adj_recurrence_date 
timrecurrence adj_resection adjuvant_therapy_complete sys_path_done sys_path_date sys_path_hb sys_path_wcc sys_path_neut_lymph sys_path_creat sys_path_glucose sys_path_lipase sys_path_a1c_dcct sys_path_a1c_ifcc sys_path_insulin sys_path_cpep sys_path_islet_ab___1 sys_path_islet_ab___2 sys_path_islet_ab___3 sys_path_islet_ab___4 sys_path_gad_ab sys_path_ia2_ab sys_path_insulin_ab sys_path_znt8_ab sys_path_cortisol sys_path_acth sys_path_tsh sys_path_ft4 sys_path_ft3 sys_path_tpo_ab sys_path_tg_ab sys_path_trab sys_path_fsh sys_path_lh sys_path_testosterone sys_path_oestradiol sys_path_igf1 sys_path_gh sys_path_prolactin sys_path_alt sys_path_albumin sys_path_alp sys_path_bilirubin sys_path_ferritin sys_path_crp sys_path_vitd sys_path_troponin sys_path_calprotectin systemic_type systemic_summary systemic_chemo systemic_ici systemic_mapk systemic_trial systemic_trial_number systemic_type_other systemic_stage systemic_ecog systemic_disease_sites___1 systemic_disease_sites___2 systemic_disease_sites___3 systemic_disease_sites___4 systemic_disease_sites___5 systemic_disease_sites___6 systemic_disease_sites___7 systemic_disease_sites___8 systemic_disease_sites___9 systemic_disease_sites___10 systemic_disease_sites___11 systemic_disease_sites___12 systemic_ppi systemic_antibiotics systemic_steroids systemic_ldh systemic_ldh_value systemic_creatinine_units systemic_creatinine_uln systemic_egfr systemic_start systemic_cycles systemic_cease_reason rest_pet rest_ct first_response st_resp_ct dt_first_response best_response_percist best_resp_recist dt_best_response systemic_percist_response_time systemic_progression systemic_progression_type systemic_progression_date systemic_progression_time systemic_progression_imaging pseudoprogression systemic_prog_imaging_type___1 systemic_prog_imaging_type___2 systemic_prog_imaging_type___3 systemic_prog_ct_date systemic_prog_pet_date systemic_prog_mri_date systemic_progression_clinically sites_progression___1 sites_progression___2 
sites_progression___3 sites_progression___4 sites_progression___5 sites_progression___6 sites_progression___7 sites_progression___8 sites_progression___9 sites_progression___10 sites_progression___11 sites_progression___12 site_1_met_at_recur___1 site_1_met_at_recur___2 site_1_met_at_recur___3 site_1_met_at_recur___4 site_1_met_at_recur___5 site_1_met_at_recur___6 site_1_met_at_recur___7 site_1_met_at_recur___8 site_1_met_at_recur___9 site_1_met_at_recur___10 site_1_met_at_recur___11 site_1_met_at_recur___12 biopsy_confirmed systemic_oligorecurrence systemic_oligo_treatment treat_intent_olig systemic_oligo_treatment_date systemic_oligo_treatment_response systemic_oligo_treatment_systemic date_last_syst_tx tx_ongoing reason_cessation p_treatment___1 p_treatment___2 p_treatment___3 p_treatment___4 type_io___1 type_io___2 type_io___3 dis_free_io___1 dis_free_io___2 dis_free_io___3 dis_free_io___4 dur_res_io res_pemb_nivo___1 res_pemb_nivo___2 res_pemb_nivo___3 res_pemb_nivo___4 res_niv_pem res_pem___1 res_pem___2 res_pem___3 res_pem___4 dur_res_pem braf_mek targ_dis_free___1 targ_dis_free___2 targ_dis_free___3 targ_dis_free___4 dur_braf_mek site_disease_prog___1 site_disease_prog___2 site_disease_prog___3 site_disease_prog___4 site_disease_prog___5 site_disease_prog___6 site_disease_prog___7 site_disease_prog___8 site_disease_prog___9 site_disease_prog___10 site_disease_prog___11 p_treatment_info prior_treatment_complete cnsmets_date cnsmets_number number_cns_mets cnsmets_largest cns_symptoms surgery_brainmets cnsmets_radiotherapy cnsmets_glucocorticoids cnsmets_brafmek cnsmets_bevacizumab cns_diag cns_symptoms_type resp_io_brainmet___1 resp_io_brainmet___2 resp_io_brainmet___3 resp_io_brainmet___4 cns_steroid cns_braf_mek single_double melanoma_cns_metastases_complete ae_any ae_adj_sql ae_systemic_sql ae_type ae_type_sql ae_type_sql_select ae_endocrine ae_gastrointestinal ae_haematological ae_neurological ae_skin ae_onset_date ae_ctcae ae_kdigo ae_investigations___1 
ae_investigations___2 ae_investigations___3 ae_investigations___4 ae_investigations___5 ae_investigations___6 ae_investigations___7 ae_investigations___8 ae_investigations___9 ae_investigations___10 ae_investigations___11 ae_autoantibodies_date ae_autoantibodies_result ae_biopsy_date ae_biopsy_result ae_csf_date ae_csf_result ae_ecg_date ae_ecg_result ae_echo_date ae_echo_result ae_endoscopy_date ae_endoscopy_result ae_fcp_date ae_fcp_result ae_fmcs_date ae_fmcs_result ae_mri_date ae_mri_result ae_ncs_date ae_ncs_result ae_urin_date ae_urin_result ae_treatment___11 ae_treatment___10 ae_treatment___1 ae_treatment___2 ae_treatment___3 ae_treatment___4 ae_treatment___5 ae_treatment___15 ae_treatment___6 ae_treatment___7 ae_treatment___8 ae_treatment___9 ae_treatment___14 ae_treatment___12 ae_treatment___13 ae_summary adverse_events_complete b_date b_ecog b_autoim___1 b_autoim___2 b_autoim___3 b_autoim___4 b_autoim___5 b_autoim___6 b_autoim___7 b_autoim___8 b_autoim___9 b_autoim___10 b_autoim___11 b_autoim___12 b_autoim___13 b_autoim_type b_fhx b_endo___1 b_endo___2 b_endo___3 b_endo___4 b_endo___5 b_endo___6 b_endo___7 b_endo___8 b_endo___9 b_endo___10 b_endo___11 b_endo___12 b_endo_type hla b_cancer b_cancer_hx b_pmhx b_meds b_ppi b_steroid b_steroid_type baseline_visit_complete drug_name drug_date drug_number drug_comment checkpoint_inhibitor_treatment_complete irae_type___1 irae_type___2 irae_type___3 irae_type___4 irae_type___5 irae_type___6 irae_type___7 irae_type___8 irae_type___9 irae_type___10 endo_irae___1 endo_irae___2 endo_irae___3 endo_irae___4 hypophysitis_date hypophysitis_time thyroiditis_date thyroiditis_time pancreatitis_date adrenalitis_date hypophysitis_cycle thyroid_cycle pancreas_cycle adrenal_cycle endo_irae_comment skin_date skin_cycle skin_type___1 skin_type___2 skin_type___3 skin_type___4 skin_type___5 skin_type___6 skin_type___7 skin_type___8 skin_type___9 skin_type___10 skin_grade skin_histo rheum_date rheum_cycle rheum_type___1 
rheum_type___2 rheum_type___3 rheum_type___4 rheum_type___5 rheum_type___6 rheum_type___7 rheum_grade rheum_path gastro_date gastro_cycle gastro_type___1 gastro_type___2 gastro_type___3 gastro_type___4 gastro_type___5 gastro_grade gastro_endoscopy gastro_histo gastro_radiol gastro_cdt gastro_stool gastro_path liver_date liver_cycle liver_grade liver_path liver_histo liver_radiol renal_date renal_cycle renal_type___1 renal_type___2 renal_type___3 renal_type___4 renal_type___5 renal_type___6 renal_grade renal_path renal_urine renal_histo pulm_date pulm_cycle pulm_type___1 pulm_type___2 pulm_type___3 pulm_type___4 pulm_grade pulm_radiol pulm_histo pulm_path cardiac_date cardiac_cycle cardiac_type___1 cardiac_type___2 cardiac_type___3 cardiac_type___4 cardiac_type___5 cardiac_grade cardiac_tropck cardiac_ecg cardiac_echo cardiac_radiol cardiac_histo neuro_date neuro_cycle neuro_type___1 neuro_type___2 neuro_type___3 neuro_type___4 neuro_type___5 neuro_type___6 neuro_type___7 neuro_type___8 neuro_type___9 neuro_type___10 neuro_type___11 neuro_type___12 neuro_grade neuro_radiol neuro_path irae_steroids irae_details irae_emergency immune_related_adverse_events_iraes_complete path_date hb wcc neut_lymph creat glucose lipase a1c_dcct a1c_ifcc insulin cpep islet_ab___1 islet_ab___2 islet_ab___3 islet_ab___4 gad_ab ia2_ab insulin_ab znt8_ab cortisol acth tsh ft4 ft3 tpo_ab tg_ab trab fsh lh testosterone oestradiol igf1 gh prolactin alt albumin alp bilirubin ferritin crp vitd troponin calprotectin pathology_complete pit_image_date image_type pit_suspected pit_size pit_appearance hypophysitis_image pit_alt_image image_comment ctmri_imaging_complete ur_pet pet_date pet_timing___1 pet_timing___2 pet_timing___3 pet_timing___4 pet_bsl pet_uptake_time sul_peak suv_max pet_steroid b_pet_metastases___1 b_pet_metastases___2 b_pet_metastases___3 b_pet_metastases___4 b_pet_metastases___5 b_pet_metastases___6 b_pet_metastases___7 b_pet_metastases___8 b_pet_metastases___9 
b_pet_metastases___10 pet_endo___1 pet_endo___2 pet_endo___3 pet_endo___4 pet_endo___5 pet_endo___6 pet_endo___7 pet_endo___8 pet_endo___9 pet_endo___10 pet_endo___11 pit_pet_fdg pit_pet_suv pit_suv_change hypophysitis_pet pet_pit_ct thyr_pet_fdg thyr_pet_suv thyr_suv_change thyroiditis_pet pet_thyr_ct panc_pet_fdg panc_pet_suv panc_suv_change pancreatitis_pet pet_panc_ct adrenal_pet_fdg adrenal_pet_suv adrenal_suv_change adrenalitis_pet pet_adrenal_ct cns_pet_fdg cns_pet_suv brain_suv_change encephalitis_pet pet_cns_ct rheum_pet rheum_suvmax pet_rheum_ct liver_pet_fdg liver_pet_suv liver_suv_change hepatitis_pet pet_liver_ct uppergi_pet_fdg uppergi_pet_suv uppergi_suv_change gastritis_pet pet_uppergi_ct ilium_pet_fdg ilium_pet_suv ilium_suv_change ileitis_pet pet_ilium_ct colon_pet_fdg colon_pet_suv colon_suv_change colitis_pet pet_colon_ct pet_comments sul_change suv_change percist_pet eortc_pet recist_pet residual_disease___1 residual_disease___2 residual_disease___3 residual_disease___4 residual_disease___5 residual_disease___6 residual_disease___7 residual_disease___8 residual_disease___9 residual_disease___10 pet_metastasis pet_metastasis_number site_progression___1 site_progression___2 site_progression___3 site_progression___4 site_progression___5 site_progression___6 site_progression___7 site_progression___8 site_progression___9 site_progression___10 pet_imaging_complete antibiotic_oral antibiotic_iv antibiotic_type ppi ppi_and_antibiotic_use_during_treatment_with_cpis_complete responce_cpi best_res_pet time_to_best_response response_dur_cpi overall_response progression_cpi progression_adrenal progression_site___1 progression_site___2 progression_site___3 progression_site___4 progression_site___5 progression_site___6 progression_site___7 progression_site___8 progression_site___9 progression_site___10 progression_site___11 other_site subsequent_rx response_data_complete mortality mortality_date ongoing_survelliance date_last_scan mortality_treatment_time 
mortality_cause other_cause_death mortality_data_complete
385 baseline_arm_1 NA NA 5454920 Carter Rachel 1 1956-08-28 146 151 70.84 0 0 0 0 0 NA 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 0 0 0 0 NA NA 2 2 0 0 0 0 1 0 NA 0 0 0 1 0 0 NA NA NA NA 2014-06-23 4 0 2017-12-05 2 23/5/14 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2014-05-23 1 NA NA 1 1 1 1 0 0 1 confusion 0 0 0 0 0 0 NA 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
385 end_arm_1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1 2017-07-02 NA NA 11.3 1 NA 2
385 baseline_arm_1 prior_treatment 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2 Immune checkpoint inhibition: 2016-07-22 NA 3 NA 0 NA NA 6 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 328 83 110 77 2016-07-22 NA 2 1 NA 4 NA NA NA NA NA NA 1 3 2016-09-05 1.5 1 NA 0 1 1 NA 2016-10-10 2016-09-05 1 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 2 0 NA NA NA NA NA 14/10/16 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 NA 0 0 0 0 NA NA 0 0 0 0 NA 0 0 0 0 0 0 0 0 0 0 0 NA 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
385 baseline_arm_1 prior_treatment 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2 Immune checkpoint inhibition: 2016-10-14 NA 1 NA 0 NA NA 6 2 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 NA NA NA NA NA 2016-10-14 NA 2 1 NA 4 NA 2017-03-30 4 NA NA NA 1 2 2016-11-04 0.7 1 NA 0 0 1 NA NA 2016-11-04 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 2 1 2 NA 2016-11-11 4 0 14/10/26 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 NA 0 0 0 0 NA NA 0 0 0 0 NA 0 0 0 0 0 0 0 0 0 0 0 NA 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
490 baseline_arm_1 NA NA 1088816 Garcia Claire 1 1985-06-09 197 52 13.40 0 0 0 0 1 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 0 0 0 0 NA NA 2 2 0 0 0 0 0 0 NA 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 0 0 0 NA NA NA 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
490 end_arm_1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1 2018-12-11 2 NA 24.3 1 NA 2
490 baseline_arm_1 adjuvant_therapy 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 NA NA NA NA NA NA NA NA NA NA NA NA 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA Trial: NA NA NA 0 NA NA NA NA NA 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
490 baseline_arm_1 prior_treatment 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2 Immune checkpoint inhibition: 2016-12-02 NA 3 NA 1 Keynote 054 - cross over NA 5 1 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 9 NA NA NA NA 2016-12-02 NA NA 2 1 NA 1 2017-02-22 NA 1 NA NA 1 1 2017-05-18 5.5 1 NA 1 0 0 2017-05-18 NA NA 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 2 0 NA NA NA NA 0 29/4/17 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 NA 0 0 0 0 NA NA 0 0 0 0 NA 0 0 0 0 0 0 0 0 0 0 0 NA 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
490 baseline_arm_1 prior_treatment 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3 MAPK inhibitor therapy: 2017-05-19 NA NA 1 0 NA NA 5 1 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 9 NA NA NA NA 2017-05-19 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 0 0 NA NA NA NA 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA NA NA NA NA NA NA 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 NA 0 0 0 0 NA NA 0 0 0 0 NA 0 0 0 0 0 0 0 0 0 0 0 NA 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
500 baseline_arm_1 NA NA NA NA NA NA NA NA NA NA 0 0 0 0 NA NA 0 0 0 0 0 0 0 0 0 NA NA 0 0 0 0 0 0 0 NA NA NA 0 0 0 0 0 0 0 NA 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 0 0 0 NA NA NA 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
500 baseline_arm_1 prior_treatment 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3 MAPK inhibitor therapy: 2017-11-01 NA NA 1 0 NA NA 6 NA 1 0 0 0 0 1 0 0 0 0 0 0 NA NA NA NA NA 79 110 87 2017-11-01 NA 2 1 2 2 NA NA NA NA 2018-01-09 2.3 1 2 2018-06-05 7.1 1 2 0 0 1 NA NA 2018-06-05 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 NA NA NA NA 0 15/6/18 1 NA 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 NA 0 0 0 0 NA NA 0 0 0 0 NA 0 0 0 0 0 0 0 0 0 0 0 NA 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
500 baseline_arm_1 prior_treatment 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2 Immune checkpoint inhibition: 2018-06-29 NA 2 NA 0 NA NA 6 NA 1 0 0 1 0 0 0 0 0 0 0 0 NA NA NA NA NA NA NA NA 2018-06-29 2 2 1 NA 4 NA 2018-09-12 4 NA 2018-09-12 2.5 1 3 2018-09-12 2.5 1 2 0 1 1 NA 2018-09-12 2018-09-12 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 2 NA NA NA NA NA 0 NA 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 NA 0 0 0 0 NA NA 0 0 0 0 NA 0 0 0 0 0 0 0 0 0 0 0 NA 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
500 baseline_arm_1 adverse_events 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1 NA Immune checkpoint inhibition: 2018-06-29 4 Gastrointestinal NA NA 8 NA NA NA 2018-09-01 3 NA 0 0 0 0 0 0 1 1 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 2018-09-01: Gastrointestinal 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Read a subset of fields

If you only need specific variables (e.g., first_name, dob, mortality_date), pass a character vector of field names to the fields argument of redcap_read().

# Define the specific fields to retrieve
selected_fields <- c("first_name", "dob", "mortality_date")  # Replace with actual field names

# Read only the selected fields
immuno_some_fields <- redcap_read(
  redcap_uri = uri, 
  token = token, 
  fields = selected_fields
)$data

# print the top 6 rows
head(immuno_some_fields)
record_id redcap_event_name redcap_repeat_instrument redcap_repeat_instance first_name dob mortality_date
1 baseline_arm_1 NA NA Hannah 1943-02-12 NA
2 baseline_arm_1 NA NA Samantha 2008-01-29 NA
3 baseline_arm_1 NA NA Noah 1940-11-06 NA
3 end_arm_1 NA NA NA NA 2017-01-10
4 baseline_arm_1 NA NA Aiden 1963-06-04 NA
4 end_arm_1 NA NA NA NA 2024-02-09

In all these cases, the data imported into R from REDCap is in its raw format. For example, a categorical variable like sex, which is expected to contain values such as “Male” or “Female,” may instead be represented as numeric codes (e.g., 1 for Male, 2 for Female). While these values can be manually recoded in R, doing so for large projects with multiple categorical variables can quickly become cumbersome and error-prone.
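As a rough illustration of manual recoding, the sketch below converts numeric codes into labelled categories with base R's factor(). The toy data frame and the code-to-label mapping (1 = Male, 2 = Female) are assumptions made for this example; the real mapping lives in the project's REDCap data dictionary.

```r
# Toy data standing in for a raw REDCap export (values are made up)
raw_demo <- data.frame(
  record_id = 1:4,
  sex       = c(1, 2, 2, NA)
)

# Assumed coding: 1 = Male, 2 = Female (check the data dictionary for the real one)
raw_demo$sex_label <- factor(
  raw_demo$sex,
  levels = c(1, 2),
  labels = c("Male", "Female")
)

# Count each category, keeping missing values visible
table(raw_demo$sex_label, useNA = "ifany")
```

Repeating this for every categorical field is what becomes tedious in a large project.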

Furthermore, complex study designs—such as those used in clinical trials, cohort studies, and observational research—often involve longitudinal data or repeating instruments, adding another layer of complexity to data management.

  • Longitudinal data is used when information is collected at multiple time points or study events (e.g., Baseline, Follow-up).
  • Repeating instruments allow a single form to be completed multiple times per participant (e.g., recording multiple adverse events, medications, or hospital visits).

Handling these structured data formats in R requires additional steps for cleaning and organisation.
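To make the problem concrete, here is a minimal base R sketch of how a flat export mixes one-per-participant rows with repeating-instrument rows, and how they might be separated by hand. The data frame is invented for illustration, and ae_grade is a hypothetical field name.

```r
# Toy export mimicking REDCap's long format (all values are made up)
export <- data.frame(
  record_id                = c(1, 1, 1, 2),
  redcap_event_name        = rep("baseline_arm_1", 4),
  redcap_repeat_instrument = c(NA, "adverse_events", "adverse_events", NA),
  redcap_repeat_instance   = c(NA, 1, 2, NA),
  dob                      = c("1943-02-12", NA, NA, "2008-01-29"),
  ae_grade                 = c(NA, 3, 2, NA)  # hypothetical field
)

# Rows without a repeat instrument hold the one-per-participant fields
is_repeat <- !is.na(export$redcap_repeat_instrument)

demographics   <- export[!is_repeat, c("record_id", "redcap_event_name", "dob")]
adverse_events <- export[is_repeat, c("record_id", "redcap_repeat_instance", "ae_grade")]

nrow(demographics)    # one row per participant
nrow(adverse_events)  # one row per recorded adverse event
```

Doing this separately for every instrument and event is the bookkeeping that grows quickly in a real project.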

To address these challenges, the REDCapTidieR package extends the functionality of REDCapR, making it easier to analyse complex REDCap datasets.

Try It Yourself

Read records from 260 to 266 and the following fields: record_id, mel_type, systemic_type, systemic_stage, best_response_percist, mortality_treatment_time and mortality_cause.

# Define the specific fields to retrieve
selected_fields <- c("record_id", "mel_type", "systemic_type", "systemic_stage",
                     "best_response_percist", "mortality_treatment_time",
                     "mortality_cause")  # Replace with actual field names

# Read only the selected records and fields
immuno_subset <- redcap_read(
  redcap_uri = uri, 
  token = token, 
  fields = selected_fields,
  records = seq(260, 266)
)$data

# print the subset
immuno_subset

Reading all REDCap data

Unlike REDCapR, which returns a single large dataframe, REDCapTidieR automatically structures and organises the data by breaking it into separate tibbles, each representing a different REDCap instrument. This makes it easier to work with studies involving multiple forms and events.

Before using REDCapTidieR, ensure it is installed along with its dependencies:

# If this fails, run install.packages("REDCapTidieR") or
# devtools::install_github("CHOP-CGTInformatics/REDCapTidieR")
requireNamespace("REDCapTidieR")

To import the entire dataset as a set of structured tables (a "supertibble"), use read_redcap():

# Load required packages
library(REDCapTidieR)

# Read entire REDCap project data
immuno <- read_redcap(redcap_uri = uri, token = token, raw_or_label = "raw")
# print the structure of imported data
str(immuno, max.level = 2)
suprtbl [15 × 11] (S3: redcap_supertbl/tbl_df/tbl/data.frame)
 $ redcap_form_name : chr [1:15] "demographics" "melanoma_data" "adjuvant_therapy" "prior_treatment" ...
 $ redcap_form_label: chr [1:15] "Demographics" "Melanoma Data" "Adjuvant Therapy" "Systemic Therapy for Advanced Disease" ...
 $ redcap_data      :List of 15
 $ redcap_metadata  :List of 15
 $ redcap_events    :List of 15
 $ structure        : chr [1:15] "nonrepeating" "nonrepeating" "repeating" "repeating" ...
 $ data_rows        : int [1:15] 498 498 288 765 498 231 144 177 144 441 ...
 $ data_cols        : int [1:15] 38 26 61 177 22 67 41 8 128 47 ...
 $ data_size        : 'lobstr_bytes' num [1:15] 181.33 kB 130.62 kB 117.23 kB   1.10 MB  97.36 kB ...
 $ data_na_pct      : 'formattable' num [1:15]  13%  22%  82%  41%  62% ...
  ..- attr(*, "formattable")=List of 4
 $ form_complete_pct: 'formattable' num [1:15]  74%  74%  97%  96%  34% ...
  ..- attr(*, "formattable")=List of 4

Exploring the Data

The supertibble object can be viewed with the RStudio Data Viewer. Click the table icon in the Environment tab to open the supertibble in the Data Viewer. At a glance, you see an overview of the instruments in the REDCap project.

Data Viewer showing the immuno supertibble

You can drill down into individual tables in the redcap_data and redcap_metadata columns. Note that in the demographics data tibble, each row represents a patient, identified by their record_id.

Data Viewer showing the demographics data tibble

In the pet_imaging data tibble, each row holds the information for one PET scan of a specific patient. Each row is identified by the combination of record_id and redcap_form_instance. The granularity differs because pet_imaging is a repeating instrument, whereas demographics is a nonrepeating instrument.

Data Viewer showing the pet_imaging data tibble

You can also explore the metadata tibbles in the redcap_metadata column to find out about field labels, field types, and other field attributes.

Data Viewer showing the demographics metadata tibble

Extracting data tibbles from the supertibble

REDCapTidieR provides three different functions to extract data tibbles from a supertibble.

Binding data tibbles into the environment

The bind_tibbles() function takes a supertibble and binds its data tibbles directly into the global environment. When you use bind_tibbles() while working interactively in the RStudio IDE, you will see data tibbles appear in the Environment pane.

immuno |> bind_tibbles()

Demonstration of the bind_tibbles function

By default, bind_tibbles() extracts all data tibbles from the supertibble. With the tbls argument you can specify a subset of data tibbles that should be extracted.
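Conceptually, binding tibbles resembles copying the elements of a named list into an environment. The base R sketch below illustrates that idea with list2env() and toy data frames. It is an analogy only, not REDCapTidieR's actual implementation, and it binds into a fresh environment rather than the global one.

```r
# Toy named list standing in for the data tibbles of a supertibble
tables <- list(
  demographics   = data.frame(record_id = 1:3),
  mortality_data = data.frame(record_id = c(1, 3), mortality = c(0, 1))
)

# Bind only a chosen subset, similar in spirit to the tbls argument
wanted <- c("mortality_data")
target <- new.env()
list2env(tables[wanted], envir = target)

ls(target)  # names now available in the target environment
```

Binding into a separate environment keeps the workspace tidy, which can matter when a project has many instruments.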

Extracting a list of data tibbles

The extract_tibbles() function takes a supertibble and returns a named list of data tibbles. The default is to extract all data tibbles. We use str here to show the structure of the list returned by extract_tibbles().

immuno_instrument_list <- immuno |>
  extract_tibbles()

immuno_instrument_list |>
  str(max.level = 1)
List of 15
 $ demographics                                     : tibble [498 × 38] (S3: tbl_df/tbl/data.frame)
 $ melanoma_data                                    : tibble [498 × 26] (S3: tbl_df/tbl/data.frame)
 $ adjuvant_therapy                                 : tibble [288 × 61] (S3: tbl_df/tbl/data.frame)
 $ prior_treatment                                  : tibble [765 × 177] (S3: tbl_df/tbl/data.frame)
 $ melanoma_cns_metastases                          : tibble [498 × 22] (S3: tbl_df/tbl/data.frame)
 $ adverse_events                                   : tibble [231 × 67] (S3: tbl_df/tbl/data.frame)
 $ baseline_visit                                   : tibble [144 × 41] (S3: tbl_df/tbl/data.frame)
 $ checkpoint_inhibitor_treatment                   : tibble [177 × 8] (S3: tbl_df/tbl/data.frame)
 $ immune_related_adverse_events_iraes              : tibble [144 × 128] (S3: tbl_df/tbl/data.frame)
 $ pathology                                        : tibble [441 × 47] (S3: tbl_df/tbl/data.frame)
 $ ctmri_imaging                                    : tibble [332 × 12] (S3: tbl_df/tbl/data.frame)
 $ pet_imaging                                      : tibble [35 × 112] (S3: tbl_df/tbl/data.frame)
 $ ppi_and_antibiotic_use_during_treatment_with_cpis: tibble [23 × 7] (S3: tbl_df/tbl/data.frame)
 $ response_data                                    : tibble [144 × 23] (S3: tbl_df/tbl/data.frame)
 $ mortality_data                                   : tibble [381 × 10] (S3: tbl_df/tbl/data.frame)
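Because the result is an ordinary named list, the usual list tools apply. The sketch below uses a small invented list of data frames in place of the real extract_tibbles() output to show access by name and a one-line size summary.

```r
# Toy named list of data frames standing in for extract_tibbles() output
instrument_list <- list(
  demographics = data.frame(record_id = 1:3, sex = c(1, 2, 2)),
  pet_imaging  = data.frame(record_id = c(1, 1, 3),
                            redcap_form_instance = c(1, 2, 1))
)

# Pull one instrument out by name
pet <- instrument_list$pet_imaging  # same as instrument_list[["pet_imaging"]]

# Number of rows in every instrument, in one call
sapply(instrument_list, nrow)
```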

Adding variable labels with the labelled package

The REDCapTidieR package allows you to attach labels to variables in the supertibble. Variable labels can make data exploration easier.

immuno |>
  make_labelled() |>
  bind_tibbles()

The make_labelled() function takes a supertibble and returns a supertibble with variable labels applied to the variables of the supertibble as well as to the variables of all data and metadata tibbles in the redcap_data and redcap_metadata columns of the supertibble.

The RStudio Data Viewer shows variable labels below variable names.

Data Viewer showing part of a labelled supertibble

You can use the labelled::look_for() function to explore the variable labels of a tibble.

labelled::look_for(mortality_data)
pos variable label col_type missing levels value_labels
1 record_id Record ID dbl 0 NULL NULL
2 redcap_event REDCap Event chr 0 NULL NULL
3 mortality Has participant deceased? dbl 14 NULL NULL
4 mortality_date Date of last follow up or death date 10 NULL NULL
5 ongoing_survelliance Ongoing melanoma imaging surveillance? dbl 71 NULL NULL
6 date_last_scan Date of last scan date 113 NULL NULL
7 mortality_treatment_time Time since first treatment dose (months) dbl 60 NULL NULL
8 mortality_cause Cause of Death dbl 156 NULL NULL
9 other_cause_death Cause of death if not melanoma lgl 381 NULL NULL
10 form_status_complete REDCap Instrument Completed? dbl 0 NULL NULL

These labels are the REDCap field labels that prompt data entry in the REDCap instrument. REDCapTidieR places them into the field_label variable of the instrument’s metadata tibble. Below you can see that the field labels of the REDCap instrument for mortality_data are the same as the labels above.

REDCap data entry view of the mortality_data instrument

In the demographics instrument, one label has a trailing colon (see the label of the autoimmune_disease_select___9 variable below). This does not look good as a variable label, so let's remove it.

labelled::look_for(demographics)
pos variable label col_type missing levels value_labels
1 record_id Record ID dbl 0 NULL NULL
2 redcap_event REDCap Event chr 0 NULL NULL
3 ur UR Number dbl 8 NULL NULL
4 last_name Last Name chr 8 NULL NULL
5 first_name First Name chr 8 NULL NULL
6 sex Gender dbl 8 NULL NULL
7 dob Date of Birth date 8 NULL NULL
8 height Height dbl 8 NULL NULL
9 weight Weight dbl 8 NULL NULL
10 bmi BMI dbl 8 NULL NULL
11 coenrolled___1 Other Studies Enrolled In: MetaMel dbl 0 NULL NULL
12 coenrolled___2 Other Studies Enrolled In: Micromac dbl 0 NULL NULL
13 coenrolled___3 Other Studies Enrolled In: MRV dbl 0 NULL NULL
14 coenrolled___4 Other Studies Enrolled In: SUMMA dbl 0 NULL NULL
15 clinical_trial Participating in a Clinical Trial dbl 113 NULL NULL
16 clinical_trial_description Describe Trial(s) chr 401 NULL NULL
17 medical_history___1 Medical History: Chronic kidney disease dbl 0 NULL NULL
18 medical_history___2 Medical History: Diabetes dbl 0 NULL NULL
19 medical_history___7 Medical History: Diabetes - Type 1 dbl 0 NULL NULL
20 medical_history___8 Medical History: Diabetes - Type 2 dbl 0 NULL NULL
21 medical_history___5 Medical History: GN dbl 0 NULL NULL
22 medical_history___3 Medical History: Hypertension dbl 0 NULL NULL
23 medical_history___4 Medical History: Ischaemic heart disease dbl 0 NULL NULL
24 medical_history___6 Medical History: Vasculitis dbl 0 NULL NULL
25 medical_history___99 Medical History: Other dbl 0 NULL NULL
26 medical_history_other Medical History Other Unknown chr 415 NULL NULL
27 autoimmune_disease History of Idiopathic Autoimmune Disease dbl 102 NULL NULL
28 autoimmune_disease_select___1 Select Autoimmune Disease(s): Connective tissue disease dbl 0 NULL NULL
29 autoimmune_disease_select___2 Select Autoimmune Disease(s): Inflammatory arthritis dbl 0 NULL NULL
30 autoimmune_disease_select___3 Select Autoimmune Disease(s): Inflammatory bowel disease dbl 0 NULL NULL
31 autoimmune_disease_select___4 Select Autoimmune Disease(s): Interstitial lung disease dbl 0 NULL NULL
32 autoimmune_disease_select___5 Select Autoimmune Disease(s): Multiple sclerosis dbl 0 NULL NULL
33 autoimmune_disease_select___6 Select Autoimmune Disease(s): Sarcoidosis dbl 0 NULL NULL
34 autoimmune_disease_select___9 Select Autoimmune Disease(s): Other, specify: dbl 0 NULL NULL
35 autoimmune_disease_other Describe Autoimmune Disease lgl 498 NULL NULL
36 rheumatoid_arthritis Rheumatoid arthritis lgl 498 NULL NULL
37 smoking Smoking History dbl 100 NULL NULL
38 form_status_complete REDCap Instrument Completed? dbl 0 NULL NULL

The make_labelled() function has a format_labels argument that you can use to preprocess labels before applying them to variables.

immuno |>
  make_labelled(format_labels = ~ gsub(":", "", .)) |>
  bind_tibbles()

labelled::look_for(demographics, "autoimmune")
pos variable label col_type missing levels value_labels
27 autoimmune_disease History of Idiopathic Autoimmune Disease dbl 102 NULL NULL
28 autoimmune_disease_select___1 Select Autoimmune Disease(s) Connective tissue disease dbl 0 NULL NULL
29 autoimmune_disease_select___2 Select Autoimmune Disease(s) Inflammatory arthritis dbl 0 NULL NULL
30 autoimmune_disease_select___3 Select Autoimmune Disease(s) Inflammatory bowel disease dbl 0 NULL NULL
31 autoimmune_disease_select___4 Select Autoimmune Disease(s) Interstitial lung disease dbl 0 NULL NULL
32 autoimmune_disease_select___5 Select Autoimmune Disease(s) Multiple sclerosis dbl 0 NULL NULL
33 autoimmune_disease_select___6 Select Autoimmune Disease(s) Sarcoidosis dbl 0 NULL NULL
34 autoimmune_disease_select___9 Select Autoimmune Disease(s) Other, specify dbl 0 NULL NULL
35 autoimmune_disease_other Describe Autoimmune Disease lgl 498 NULL NULL

This removes all colons from the labels, including those in the middle of a label.
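If only the trailing colon should go while colons inside a label are kept, an anchored pattern is one option. The base R sketch below contrasts the two patterns on two label strings taken from the demographics output. Whether the anchored pattern behaves identically inside the format_labels formula has not been checked here.

```r
labels <- c(
  "Select Autoimmune Disease(s): Other, specify:",
  "Medical History: Diabetes"
)

# Removes every colon, wherever it appears
all_removed <- gsub(":", "", labels)

# Removes a colon only when it is the last character of the label
trailing_removed <- sub(":$", "", labels)

all_removed
trailing_removed
```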

Try It Yourself

List all the labels in the prior_treatment instrument that contain the word “response”.

labelled::look_for(prior_treatment, "response")

Renaming column names using labels

Columns associated with checkbox fields in REDCap forms often have unintuitive names. For example, in the melanoma_data instrument, the melanoma type columns are named mel_type___1, mel_type___2, mel_type___3, etc. These names correspond to different melanoma subtypes but are not easily interpretable.

To improve readability, these columns can be renamed using their corresponding labels, making the dataset more intuitive for analysis. The following function automates this renaming process by extracting variable labels and applying them to the column names.

# This function renames checkbox columns using their variable labels
# (requires the labelled and stringr packages to be installed)
rename_checkbox_columns <- function(instrument, column_name_prefix) {
  # Look up the matching variables once
  matches <- labelled::look_for(instrument, column_name_prefix)
  # List of column names to rename
  col_names_to_rename <- matches$variable
  # New names for the selected columns: spaces become underscores, colons are dropped
  new_names <- stringr::str_replace_all(matches$label, " ", "_")
  new_names <- stringr::str_replace_all(new_names, ":", "")
  # Rename the columns (look_for() returns matches in column order)
  names(instrument)[names(instrument) %in% col_names_to_rename] <- new_names
  return(instrument)
}
melanoma_data <- rename_checkbox_columns(melanoma_data, "mel_type___")
head(melanoma_data)
record_id redcap_event Melanoma_type_Cutaneous Melanoma_type_Mucosal Melanoma_type_Acral Melanoma_type_Uveal Melanoma_type_Unknown_primary Melanoma_type_Pathology_not_available mel_type_cutaneous mel_mutation___1 mel_mutation___2 mel_mutation___3 mel_mutation___4 mel_mutation___5 mel_mutation___9 mel_mutation_braf mel_mutation_nras mel_mutation_kit mel_mutation_other mel_first_date mel_first_stage resct1stdiag mel_date_diag stage_diagnosis dt_advanced_dis form_status_complete
1 baseline 0 0 0 0 0 0 NA 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA 0
2 baseline 0 0 0 0 0 0 NA 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA 0
3 baseline 0 0 0 0 0 0 NA 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA 0
4 baseline 0 0 0 0 0 0 NA 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA 0
5 baseline 0 0 0 0 0 0 2 0 0 0 0 0 0 V600M NA NA NA 2015-09-01 2 1 2016-03-10 1 27/10/16 0
6 baseline 0 0 0 0 0 0 NA 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA 0
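To see the renaming idea in isolation, here is a base R sketch on a toy data frame. It assumes each column's label is stored in its "label" attribute, which is the convention the labelled package follows; the data and labels are invented, and the sketch reads the attribute directly instead of calling labelled::look_for().

```r
# Toy instrument with two checkbox columns (values and labels are made up)
toy <- data.frame(
  record_id    = 1:2,
  mel_type___1 = c(1, 0),
  mel_type___2 = c(0, 1)
)
attr(toy$mel_type___1, "label") <- "Melanoma type: Cutaneous"
attr(toy$mel_type___2, "label") <- "Melanoma type: Mucosal"

# Columns whose names start with the checkbox prefix
prefix  <- "mel_type___"
targets <- startsWith(names(toy), prefix)

# Read each label, drop colons, and replace spaces with underscores
new_names <- vapply(toy[targets], function(col) attr(col, "label"), character(1))
new_names <- gsub(" ", "_", gsub(":", "", new_names))

names(toy)[targets] <- unname(new_names)
names(toy)
```

Matching on the name prefix rather than searching labels avoids accidentally picking up unrelated columns whose labels happen to contain the search text.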

Viewing the Data

This section demonstrates different ways to get to know the immuno dataset and its instruments.

When you type the name of the object, R displays the first few rows along with some summary information, such as the number of rows:

immuno

Since this is a large dataset, the output is not shown here. But you can try executing the above code.

To view any column of the immuno object, specify the column number within [[ ]] or the column name after $.

  • For example, to view column 1:
immuno[[1]]
 [1] "demographics"                                     
 [2] "melanoma_data"                                    
 [3] "adjuvant_therapy"                                 
 [4] "prior_treatment"                                  
 [5] "melanoma_cns_metastases"                          
 [6] "adverse_events"                                   
 [7] "baseline_visit"                                   
 [8] "checkpoint_inhibitor_treatment"                   
 [9] "immune_related_adverse_events_iraes"              
[10] "pathology"                                        
[11] "ctmri_imaging"                                    
[12] "pet_imaging"                                      
[13] "ppi_and_antibiotic_use_during_treatment_with_cpis"
[14] "response_data"                                    
[15] "mortality_data"                                   
  • For example, to view the redcap_form_name column:
immuno$redcap_form_name
 [1] "demographics"                                     
 [2] "melanoma_data"                                    
 [3] "adjuvant_therapy"                                 
 [4] "prior_treatment"                                  
 [5] "melanoma_cns_metastases"                          
 [6] "adverse_events"                                   
 [7] "baseline_visit"                                   
 [8] "checkpoint_inhibitor_treatment"                   
 [9] "immune_related_adverse_events_iraes"              
[10] "pathology"                                        
[11] "ctmri_imaging"                                    
[12] "pet_imaging"                                      
[13] "ppi_and_antibiotic_use_during_treatment_with_cpis"
[14] "response_data"                                    
[15] "mortality_data"                                   

A similar method can be used to access the patient data in all instruments using the redcap_data column or the 3rd column in this case. However, this displays patient data of all the instruments one after the other, making it difficult to read. A better way is to view a single instrument as follows.

For example, to view the mortality_data instrument, we can access the redcap_data column first (i.e., immuno$redcap_data or immuno[[3]]) and then access the 15th instrument:

head(immuno$redcap_data[[15]]) # same as immuno[[3]][[15]]
record_id redcap_event mortality mortality_date ongoing_survelliance date_last_scan mortality_treatment_time mortality_cause other_cause_death form_status_complete
3 end 1 2017-01-10 2 2016-10-24 4.7 NA NA 2
4 end 0 2024-02-09 2 2023-06-29 NA NA NA 2
7 end 0 2024-10-15 2 2021-09-01 NA NA NA 2
8 end 0 2024-09-20 1 2024-09-09 NA NA NA 2
12 end 1 2016-11-18 NA 2016-01-04 NA 1 NA 2
16 end 0 2024-04-26 1 2023-07-14 NA 1 NA 2
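A hard-coded position such as 15 can silently point at the wrong instrument if forms are added or reordered in the project. One possible alternative is to look the instrument up by name. The sketch below shows the idea on a toy stand-in for the supertibble, built as a plain data frame with a list column; the data are invented.

```r
# Toy stand-in for a supertibble: one row per instrument, data in a list column
toy_super <- data.frame(redcap_form_name = c("demographics", "mortality_data"))
toy_super$redcap_data <- list(
  data.frame(record_id = 1:3),
  data.frame(record_id = c(1, 3), mortality = c(0, 1))
)

# Positional access, as in the text
by_position <- toy_super$redcap_data[[2]]

# Name-based access: find the row first, then pull its data
idx     <- which(toy_super$redcap_form_name == "mortality_data")
by_name <- toy_super$redcap_data[[idx]]

identical(by_position, by_name)
```

The same pattern should carry over to the real object, e.g. by matching on immuno$redcap_form_name before indexing immuno$redcap_data.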

The dim() function prints the dimensions (rows x columns):

dim(immuno)
[1] 15 11
dim(immuno$redcap_data[[15]])
[1] 381  10

This information is also shown in the Environment pane (top right) as the number of observations (rows) and variables (columns).

The nrow() function prints the number of rows while ncol() prints the number of columns:

nrow(immuno$redcap_data[[15]])
[1] 381
ncol(immuno$redcap_data[[15]])
[1] 10

The View() function gives a spreadsheet-like view of the data frame:

View(immuno)

Clicking the object in the Environment tab also opens a spreadsheet-like view of the object.

The head() function prints the top 6 rows of a data frame:

head(immuno$redcap_data[[15]])
record_id redcap_event mortality mortality_date ongoing_survelliance date_last_scan mortality_treatment_time mortality_cause other_cause_death form_status_complete
3 end 1 2017-01-10 2 2016-10-24 4.7 NA NA 2
4 end 0 2024-02-09 2 2023-06-29 NA NA NA 2
7 end 0 2024-10-15 2 2021-09-01 NA NA NA 2
8 end 0 2024-09-20 1 2024-09-09 NA NA NA 2
12 end 1 2016-11-18 NA 2016-01-04 NA 1 NA 2
16 end 0 2024-04-26 1 2023-07-14 NA 1 NA 2

Similarly, the tail() function prints the bottom 6 rows of the data frame:

tail(immuno$redcap_data[[15]])
record_id redcap_event mortality mortality_date ongoing_survelliance date_last_scan mortality_treatment_time mortality_cause other_cause_death form_status_complete
492 end 1 2017-10-18 NA NA 6.8 1 NA 2
493 end 1 2024-06-26 2 NA 32.5 1 NA 2
494 end 1 2016-02-12 2 NA 6.1 1 NA 2
496 end 1 2018-06-28 2 NA NA 1 NA 2
497 end 1 2016-08-04 2 NA 1.1 1 NA 2
499 end NA 2018-10-09 NA NA NA NA NA 2

The colnames() function displays all the column names:

colnames(immuno$redcap_data[[15]])
 [1] "record_id"                "redcap_event"            
 [3] "mortality"                "mortality_date"          
 [5] "ongoing_survelliance"     "date_last_scan"          
 [7] "mortality_treatment_time" "mortality_cause"         
 [9] "other_cause_death"        "form_status_complete"    

The $ symbol allows access to individual columns. To display the first few values of the mortality_date column:

head(immuno$redcap_data[[15]]$mortality_date)
[1] "2017-01-10" "2024-02-09" "2024-10-15" "2024-09-20" "2016-11-18"
[6] "2024-04-26"

The str() function shows the structure of the data:

str(immuno$redcap_data[[15]])
tibble [381 × 10] (S3: tbl_df/tbl/data.frame)
 $ record_id               : num [1:381] 3 4 7 8 12 16 23 24 28 29 ...
 $ redcap_event            : chr [1:381] "end" "end" "end" "end" ...
 $ mortality               : num [1:381] 1 0 0 0 1 0 1 1 0 1 ...
 $ mortality_date          : Date[1:381], format: "2017-01-10" "2024-02-09" ...
 $ ongoing_survelliance    : num [1:381] 2 2 2 1 NA 1 NA 2 1 2 ...
 $ date_last_scan          : Date[1:381], format: "2016-10-24" "2023-06-29" ...
 $ mortality_treatment_time: num [1:381] 4.7 NA NA NA NA NA NA 17.2 80.4 9.7 ...
 $ mortality_cause         : num [1:381] NA NA NA NA 1 1 1 1 1 1 ...
 $ other_cause_death       : logi [1:381] NA NA NA NA NA NA ...
 $ form_status_complete    : num [1:381] 2 2 2 2 2 2 2 2 2 2 ...

The glimpse() function (dplyr package) displays a compact summary of the data frame, showing key details such as the data type of each column, the first few values, and the total number of observations.

glimpse(immuno$redcap_data[[15]])
Rows: 381
Columns: 10
$ record_id                <dbl> 3, 4, 7, 8, 12, 16, 23, 24, 28, 29, 31, 37, 3…
$ redcap_event             <chr> "end", "end", "end", "end", "end", "end", "en…
$ mortality                <dbl> 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, …
$ mortality_date           <date> 2017-01-10, 2024-02-09, 2024-10-15, 2024-09-…
$ ongoing_survelliance     <dbl> 2, 2, 2, 1, NA, 1, NA, 2, 1, 2, NA, 2, 1, 1, …
$ date_last_scan           <date> 2016-10-24, 2023-06-29, 2021-09-01, 2024-09-…
$ mortality_treatment_time <dbl> 4.7, NA, NA, NA, NA, NA, NA, 17.2, 80.4, 9.7,…
$ mortality_cause          <dbl> NA, NA, NA, NA, 1, 1, 1, 1, 1, 1, 1, 1, NA, N…
$ other_cause_death        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ form_status_complete     <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …

The summary() function generates summary statistics:

summary(immuno$redcap_data[[15]])
   record_id     redcap_event         mortality      mortality_date      
 Min.   :  3.0   Length:381         Min.   :0.0000   Min.   :2013-11-15  
 1st Qu.:150.0   Class :character   1st Qu.:0.0000   1st Qu.:2018-08-02  
 Median :270.0   Mode  :character   Median :1.0000   Median :2021-07-20  
 Mean   :265.2                      Mean   :0.5313   Mean   :2021-04-11  
 3rd Qu.:382.0                      3rd Qu.:1.0000   3rd Qu.:2024-02-15  
 Max.   :499.0                      Max.   :1.0000   Max.   :2024-10-15  
                                    NA's   :14       NA's   :10          
 ongoing_survelliance date_last_scan       mortality_treatment_time
 Min.   :1.000        Min.   :0624-01-02   Min.   :  0.10          
 1st Qu.:1.000        1st Qu.:2020-05-30   1st Qu.: 11.70          
 Median :2.000        Median :2023-07-06   Median : 33.40          
 Mean   :1.597        Mean   :2016-11-19   Mean   : 40.41          
 3rd Qu.:2.000        3rd Qu.:2024-02-09   3rd Qu.: 62.30          
 Max.   :2.000        Max.   :2028-07-06   Max.   :131.50          
 NA's   :71           NA's   :113          NA's   :60              
 mortality_cause other_cause_death form_status_complete
 Min.   :1.000   Mode:logical      Min.   :0.000       
 1st Qu.:1.000   NA's:381          1st Qu.:2.000       
 Median :1.000                     Median :2.000       
 Mean   :1.173                     Mean   :1.995       
 3rd Qu.:1.000                     3rd Qu.:2.000       
 Max.   :3.000                     Max.   :2.000       
 NA's   :156                                           

A statistical overview can be obtained using the skim() function in the skimr package:

library(skimr)
skim(immuno$redcap_data[[15]])
Data summary
Name immuno$redcap_data[[15]]
Number of rows 381
Number of columns 10
_______________________
Column type frequency:
character 1
Date 2
logical 1
numeric 6
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
redcap_event 0 1 3 3 0 1 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
mortality_date 10 0.97 2013-11-15 2024-10-15 2021-07-20 293
date_last_scan 113 0.70 0624-01-02 2028-07-06 2023-07-06 218

Variable type: logical

skim_variable n_missing complete_rate mean count
other_cause_death 381 0 NaN :

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
record_id 0 1.00 265.17 134.35 3.0 150.0 270.0 382.0 499.0 ▅▇▇▇▇
mortality 14 0.96 0.53 0.50 0.0 0.0 1.0 1.0 1.0 ▇▁▁▁▇
ongoing_survelliance 71 0.81 1.60 0.49 1.0 1.0 2.0 2.0 2.0 ▆▁▁▁▇
mortality_treatment_time 60 0.84 40.41 31.58 0.1 11.7 33.4 62.3 131.5 ▇▅▃▂▁
mortality_cause 156 0.59 1.17 0.54 1.0 1.0 1.0 1.0 3.0 ▇▁▁▁▁
form_status_complete 0 1.00 1.99 0.10 0.0 2.0 2.0 2.0 2.0 ▁▁▁▁▇
Try It Yourself
  1. Display the number of rows and columns in the melanoma_cns_metastases instrument.
  2. Show the first 6 and last 6 records from the melanoma_cns_metastases instrument.
  3. List the column names and row names of the instrument.
  4. Generate a statistical summary of the instrument using the skim() function.
# number of rows
nrow(immuno$redcap_data[[5]])
# number of columns
ncol(immuno$redcap_data[[5]])

# first 6 records
head(immuno$redcap_data[[5]])
# last 6 records
tail(immuno$redcap_data[[5]])

# column names
colnames(immuno$redcap_data[[5]])
# row names (no names given, so indices are used)
rownames(immuno$redcap_data[[5]])

skim(immuno$redcap_data[[5]])

Writing Data to a File

Writing data to a file is a fundamental operation in programming and data analysis. It involves taking data from within a program or environment and storing it in a file on a disk for later use or sharing. This section explains the basics of writing a data file using the readr package.

The write_csv() and write_tsv() functions are part of the readr package, which reads and writes delimited files such as CSV (comma-separated values) and TSV (tab-separated values). They write a data frame to a CSV or TSV file, respectively.

We first provide the variable name of the data frame followed by the file name (ideally including the full folder location).

To write a CSV file:

# on Mac:
write_csv(cms_data, "~/Desktop/cms_data.csv")

# on Windows
write_csv(cms_data, "C:/Users/srajapaksa/Desktop/cms_data.csv")

To write a TSV file:

# on Mac:
write_tsv(cms_data, "~/Desktop/cms_data.tsv")

# on Windows
write_tsv(cms_data, "C:/Users/srajapaksa/Desktop/cms_data.tsv")
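
The round trip of writing a file and reading it back can be sketched as follows. This is a minimal illustration rather than part of the workshop data: toy_data is a hypothetical stand-in for cms_data, and temporary files are used so that nothing is written to the Desktop.

```r
library(readr)

# Hypothetical stand-in for cms_data
toy_data <- data.frame(record_id = c(1, 2, 3),
                       site = c("brain", "spine", "brain"))

# Temporary file paths with the extension matching each format
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")

write_csv(toy_data, csv_path)
write_tsv(toy_data, tsv_path)

# Read the files back to confirm they hold the same rows and columns
csv_back <- read_csv(csv_path, show_col_types = FALSE)
tsv_back <- read_tsv(tsv_path, show_col_types = FALSE)
dim(csv_back)
dim(tsv_back)
```

Matching the file extension to the function (.csv for write_csv(), .tsv for write_tsv()) helps other programs pick the right delimiter when opening the file.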
Try It Yourself
  1. View the help documentation of the read_redcap() function.
  2. Use read_redcap() to make an API call and read the response_data instrument.
  3. Extract the first 10 rows and assign them to a new variable called response_data_10.
  4. Save response_data_10 as a CSV file named immuno_response_data_10.csv in your Downloads folder.
help("read_redcap")
response_data_df <- read_redcap(redcap_uri = uri, token = token, raw_or_label = "raw", forms = "response_data")
response_data_10 <- head(response_data_df$redcap_data[[1]], 10)
write_csv(response_data_10, "~/Downloads/immuno_response_data_10.csv")

Data manipulation with `dplyr` functions

Common tasks when working with data include filtering rows or columns, performing calculations, and adding new columns. These kinds of operations are known as data manipulation: the process of cleaning, organising, and transforming raw data into a more structured and usable format for analysis.

In this workshop, we’ll guide you through the process of data manipulation in R, starting with the tidyverse. The tidyverse is a collection of packages that align with a data science philosophy developed by Hadley Wickham and the RStudio team. Many users find it to be a more intuitive way to grasp R concepts.

You’ll primarily use five key dplyr functions for data manipulations:

  1. filter(): pick observations based on their values.
  2. select(): pick variables by their names.
  3. mutate(): create new variables using functions applied to existing variables.
  4. summarise(): collapse multiple values into a single summary.
  5. arrange(): reorder the rows based on specified criteria.

If you have not yet installed the tidyverse package, do so by running install.packages("tidyverse"). Then load it into the R session:

library(tidyverse)

Next, load the pre-processed RDS Object:

immuno_dataset <- readRDS("data/Sample_immuno_dataset.rds")

RDS (R Data Serialisation) files are used to save and load single R objects while preserving their structure, labels, and attributes. The .rds format is useful for storing dataframes, lists, models, and other complex objects. For convenience, the previously loaded REDCap dataset has been pre-processed and saved as an .rds file. This pre-processed version will be used throughout the remainder of the workshop.
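
As a small sketch of how an .rds file preserves structure and attributes, the example below saves a hypothetical list (toy, with an invented "instrument" attribute) to a temporary file and reads it back. It uses only base R and does not touch the workshop dataset.

```r
# Hypothetical object: a list holding a data frame with a custom attribute
toy <- list(redcap_data = data.frame(record_id = 1:3,
                                     mortality = c(TRUE, FALSE, NA)))
attr(toy$redcap_data, "instrument") <- "mortality_data"

rds_path <- tempfile(fileext = ".rds")
saveRDS(toy, rds_path)         # write a single object to disk
restored <- readRDS(rds_path)  # read it back under any name you choose

identical(toy, restored)
attr(restored$redcap_data, "instrument")
```

Because readRDS() returns the object rather than restoring it under its old name, the result must be assigned to a variable, as in the readRDS() call above.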

filter()

The filter() function takes logical expressions and returns the rows for which all are TRUE.

Example 1: Find all records from the melanoma_data data frame where the melanoma type is cutaneous.

immuno_dataset$redcap_data$melanoma_data |> 
  filter(melanoma_type == "cutaneous") |> 
  head()
record_id redcap_event melanoma_type melanoma_molecular_mutation mel_first_date mel_first_stage resct1stdiag mel_date_diag stage_diagnosis dt_advanced_dis
16 baseline cutaneous nras 2019-06-01 Stage III TRUE 2017-01-01 yes 01062017
24 baseline cutaneous wild_type 2014-03-27 Stage II TRUE 2015-01-01 yes 1/9/15
28 baseline cutaneous braf 2015-09-01 Stage II TRUE 2016-03-10 yes 27/10/16
57 baseline cutaneous nras 2019-12-01 Stage I TRUE 2020-07-09 yes 08/01/2021
58 baseline cutaneous wild_type 2013-10-01 Stage I TRUE 2015-09-14 yes 6/3/17
60 baseline cutaneous braf 2017-03-01 Stage III TRUE 2019-11-01 no 1/11/19

Here we pass the immuno_dataset$redcap_data$melanoma_data data frame to filter(), which tests each value in the melanoma_type column against “cutaneous” and returns the rows where this condition is TRUE.

You can check the dimension (number of rows and number of columns) of the resulting data frame by using the dim() function as follows:

immuno_dataset$redcap_data$melanoma_data |> filter(melanoma_type == "cutaneous") |> dim()
[1] 265  10

Example 2: Identify records in mortality_data where time since first treatment dose (mortality_treatment_time) exceeds 1 year. mortality_treatment_time is given in months.

immuno_dataset$redcap_data$mortality_data |> 
  filter(mortality_treatment_time > 12) |> 
  head()
record_id redcap_event mortality mortality_date ongoing_survelliance date_last_scan mortality_treatment_time mortality_cause
24 end TRUE 2017-03-03 no NA 17.2 melanoma progression
28 end FALSE 2023-08-18 yes 2022-08-03 80.4 melanoma progression
31 end TRUE 2017-11-09 NA NA 54.3 melanoma progression
38 end FALSE 2024-04-15 yes 2024-06-01 109.6 NA
43 end FALSE 2024-06-12 yes 2024-06-13 61.9 NA
48 end FALSE 2023-01-26 yes NA 89.0 NA

We can use logical operators such as & (and) and | (or) to combine multiple conditions as follows.

Example 3: Find all the records in mortality_data where the cause of death (mortality_cause) is categorised as “melanoma progression” and the date of last follow-up or death (mortality_date) is after “2023-01-01”.

immuno_dataset$redcap_data$mortality_data |> 
  filter(mortality_cause == "melanoma progression" & mortality_date > as.Date("2023-01-01")) |> 
  head()
record_id redcap_event mortality mortality_date ongoing_survelliance date_last_scan mortality_treatment_time mortality_cause
16 end FALSE 2024-04-26 yes 2023-07-14 NA melanoma progression
28 end FALSE 2023-08-18 yes 2022-08-03 80.4 melanoma progression
68 end FALSE 2024-01-19 yes 2023-04-18 46.1 melanoma progression
72 end FALSE 2024-03-21 yes 2024-03-15 53.6 melanoma progression
77 end FALSE 2024-07-05 yes 2024-06-28 91.1 melanoma progression
87 end TRUE 2024-01-28 no 2015-10-17 24.9 melanoma progression

Example 4: Find the records in melanoma_data where melanoma_molecular_mutation is either braf or nras.

immuno_dataset$redcap_data$melanoma_data |> 
  filter(melanoma_molecular_mutation == "braf" | melanoma_molecular_mutation == "nras") |> 
  head()
record_id redcap_event melanoma_type melanoma_molecular_mutation mel_first_date mel_first_stage resct1stdiag mel_date_diag stage_diagnosis dt_advanced_dis
28 baseline cutaneous braf 2015-09-01 Stage II TRUE 2016-03-10 yes 27/10/16
60 baseline cutaneous braf 2017-03-01 Stage III TRUE 2019-11-01 no 1/11/19
70 baseline cutaneous braf 2012-01-01 Stage unknown as pathology unavailable TRUE 2017-08-18 no NA
79 baseline cutaneous braf 2015-04-01 Stage unknown as pathology unavailable TRUE 2018-12-01 no 1/12/18
80 baseline cutaneous braf 2013-01-01 Stage I TRUE 2015-10-01 yes 8/10/15
85 baseline cutaneous braf 2012-07-01 Stage II TRUE 2014-12-04 no 4/12/14

Example 5: Retrieve records where mortality_cause is due to “treatment toxicity” and mortality_treatment_time is greater than 4 months but less than or equal to 10 months.

immuno_dataset$redcap_data$mortality_data |> 
  filter(
    mortality_cause == "treatment toxicity" & 
    mortality_treatment_time > 4 & 
    mortality_treatment_time <= 10) 
record_id redcap_event mortality mortality_date ongoing_survelliance date_last_scan mortality_treatment_time mortality_cause
414 end TRUE 2019-02-18 no 2019-01-31 7.0 treatment toxicity
429 end TRUE 2020-04-22 no NA 5.2 treatment toxicity
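
The behaviour of & and | inside filter(), including what happens to missing values, can be sketched on a small made-up table. The data frame toy and its values are hypothetical and only loosely modelled on mortality_data.

```r
library(dplyr)

# Hypothetical rows loosely modelled on mortality_data
toy <- tibble(
  record_id = 1:5,
  mortality_cause = c("melanoma progression", "treatment toxicity",
                      "treatment toxicity", NA, "treatment toxicity"),
  mortality_treatment_time = c(17.2, 7.0, 12.5, 5.2, NA)
)

# Conditions joined with & ...
with_and <- toy |>
  filter(mortality_cause == "treatment toxicity" & mortality_treatment_time <= 10)

# ... select the same rows as conditions separated by commas
with_comma <- toy |>
  filter(mortality_cause == "treatment toxicity", mortality_treatment_time <= 10)

# | keeps rows where at least one condition is TRUE
with_or <- toy |>
  filter(mortality_cause == "melanoma progression" | mortality_treatment_time <= 10)

with_and$record_id    # 2
with_comma$record_id  # 2
with_or$record_id     # 1 2 4
```

Rows 4 and 5 are dropped by the & version because a comparison involving NA evaluates to NA, and filter() keeps only rows where the condition is TRUE. Row 4 survives the | version because its second condition is TRUE on its own.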

%in% helper

The %in% operator is used to determine whether elements of one vector are present in another vector. It returns a logical vector indicating whether each element of the first vector is found in the second vector.

When we want to filter a subset of rows that may contain multiple different values, it’s more efficient to provide a vector of the values of interest instead of combining multiple OR commands.
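
To see what %in% returns on its own, here is a base R sketch on a hypothetical vector of melanoma types, compared with the equivalent chain of == and | tests.

```r
# Hypothetical values, including one missing entry
melanoma_type <- c("acral", "uveal", "cutaneous", NA, "mucosal")
wanted <- c("acral", "cutaneous", "mucosal")

via_in <- melanoma_type %in% wanted
via_or <- melanoma_type == "acral" |
  melanoma_type == "cutaneous" |
  melanoma_type == "mucosal"

via_in  # TRUE FALSE TRUE FALSE TRUE
via_or  # TRUE FALSE TRUE    NA TRUE
```

The two approaches differ only for the missing value: %in% returns FALSE, while the == chain returns NA. Inside filter() both lead to the row being dropped.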

Example 6: Retrieve records where melanoma_type is acral, cutaneous or mucosal.

immuno_dataset$redcap_data$melanoma_data |> 
  filter(melanoma_type %in% c("acral", "cutaneous", "mucosal")) |> 
  head()
record_id redcap_event melanoma_type melanoma_molecular_mutation mel_first_date mel_first_stage resct1stdiag mel_date_diag stage_diagnosis dt_advanced_dis
16 baseline cutaneous nras 2019-06-01 Stage III TRUE 2017-01-01 yes 01062017
24 baseline cutaneous wild_type 2014-03-27 Stage II TRUE 2015-01-01 yes 1/9/15
28 baseline cutaneous braf 2015-09-01 Stage II TRUE 2016-03-10 yes 27/10/16
44 baseline acral NA 2012-04-20 Stage II TRUE 2016-03-30 no 30/3/16
57 baseline cutaneous nras 2019-12-01 Stage I TRUE 2020-07-09 yes 08/01/2021
58 baseline cutaneous wild_type 2013-10-01 Stage I TRUE 2015-09-14 yes 6/3/17
Try It Yourself

Find patients who have stage III or IV disease (mel_first_stage column) and whose first recurrence (mel_date_diag) is after 2021-03-01.

Hint: Use melanoma_data instrument

immuno_dataset$redcap_data$melanoma_data |> 
  filter(
    mel_first_stage %in% c("Stage III", "Stage IV") & 
    mel_date_diag > as.Date("2021-03-01"))

select()

The select() function returns a subset of the variables or columns.

This function accepts column names (without quotation marks) or column positions counted from the left. Unlike in base R (which we explored earlier), the columns listed inside select() do not need to be combined using c().

Example 1: Extract the record ID, Echo date (ae_echo_date) and MRI date (ae_mri_date) columns from adverse_events data frame.

immuno_dataset$redcap_data$adverse_events |> 
  select(record_id, ae_echo_date, ae_mri_date) |> 
  filter(!is.na(ae_echo_date)) # filter non-missing values in ae_echo_date column
record_id ae_echo_date ae_mri_date
104 2022-10-19 2022-09-27
104 2022-10-19 2022-09-27
104 2022-10-19 2022-09-27
104 2022-10-19 2022-09-27

Using column positions:

immuno_dataset$redcap_data$adverse_events |> 
  select(1, 23, 30) |> 
  filter(!is.na(ae_echo_date)) # filter non-missing values in ae_echo_date column
record_id ae_echo_date ae_mri_date
104 2022-10-19 2022-09-27
104 2022-10-19 2022-09-27
104 2022-10-19 2022-09-27
104 2022-10-19 2022-09-27

We can use the ‘-’ symbol to extract all columns except for specific ones:

immuno_dataset$redcap_data$demographics |> 
  select(-redcap_event, -ur, -sex, -dob, -height, -weight) |> 
  head()
record_id last_name first_name bmi other_studies_enrolled_in medical_history autoimmune_disease select_autoimmune_diseases smoking
1 Jackson Hannah 68.73 NA NA NA NA NA
2 Howard Samantha 29.61 NA NA NA NA NA
3 Martinez Noah 31.06 NA NA NA NA NA
4 Lewis Aiden 51.51 NA NA NA NA NA
5 Jenkins Connor 20.28 NA NA FALSE NA Past smoker
6 Allen Claire 28.06 NA NA NA NA NA

Or use a combination of column names and positions:

immuno_dataset$redcap_data$demographics |> 
  select(1, medical_history, 15) |> 
  head()
record_id medical_history smoking
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA Past smoker
6 NA NA

Useful helper functions

The select helper functions (check ?select_helpers) are a set of convenience functions provided by the dplyr package. These functions offer shortcuts for selecting columns based on specific criteria or patterns, making it easier to work with data frames.

Some commonly used select helper functions include:

  1. starts_with(): selects columns that start with a specified prefix.
immuno_dataset$redcap_data$immune_related_adverse_events_iraes |> 
  select(starts_with('liver')) |> 
  head()
liver_date liver_cycle liver_grade liver_path liver_histo liver_radiol
NA NA NA NA NA NA
NA NA NA NA NA NA
NA NA NA NA NA NA
NA NA NA NA NA NA
NA NA NA NA NA NA
NA NA NA NA NA NA
  2. ends_with(): selects columns that end with a specified suffix.
immuno_dataset$redcap_data$immune_related_adverse_events_iraes |> 
  select(ends_with('date')) |> 
  head()
hypophysitis_date thyroiditis_date pancreatitis_date skin_date rheum_date gastro_date liver_date renal_date pulm_date neuro_date
2017-01-20 NA NA 2016-09-21 NA NA NA NA NA NA
2017-01-20 NA NA 2016-09-21 NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA
  3. contains(): selects columns that contain a specified substring.
immuno_dataset$redcap_data$immune_related_adverse_events_iraes |> 
  select(contains('skin')) |> 
  head()
skin_date skin_cycle characterise_skin/hair_irae skin_grade skin_histo
2016-09-21 NA bullous_pemphigoid 3 Consistent with a pemphigoid-type reaction, possibly drug-related or consistent with Bullous Pemphigoid.
2016-09-21 NA bullous_pemphigoid 3 Consistent with a pemphigoid-type reaction, possibly drug-related or consistent with Bullous Pemphigoid.
NA NA NA NA NA
NA NA NA NA NA
NA NA NA NA NA
NA NA NA NA NA
  4. everything(): selects all columns.

When combined with other selections, this function matches all columns that have not already been selected. It is often used when reordering the columns in a dataframe:

immuno_dataset$redcap_data$immune_related_adverse_events_iraes |> 
  select(1, starts_with("gastro"), everything()) |> 
  head()
record_id gastro_date gastro_cycle gastro_grade gastro_endoscopy gastro_histo gastro_radiol gastro_cdt gastro_stool redcap_event immune_related_adverse_event endocrine_toxicity hypophysitis_date hypophysitis_time thyroiditis_date thyroiditis_time pancreatitis_date hypophysitis_cycle thyroid_cycle pancreas_cycle skin_date skin_cycle characterise_skin/hair_irae skin_grade skin_histo rheum_date rheum_cycle characterise_rheumatic_irae rheum_grade rheum_path characterise_gastrointestinal_irae liver_date liver_cycle liver_grade liver_path liver_histo liver_radiol renal_date renal_cycle characterise_renal_irae_ renal_grade renal_path renal_histo pulm_date pulm_cycle characterise_pulmonary_irae pulm_grade pulm_radiol neuro_date neuro_cycle classify_neurological_irae neuro_grade neuro_radiol irae_steroids irae_details irae_emergency
31 NA NA NA NA NA NA NA NA legacy_data endocrine pituitary 2017-01-20 234 NA NA NA NA NA NA 2016-09-21 NA bullous_pemphigoid 3 Consistent with a pemphigoid-type reaction, possibly drug-related or consistent with Bullous Pemphigoid. NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA TRUE ipi 2013 - no irae pembro 2015 - no irae ipi 2016 - Bullous pemphigoid - eruption right thigh, limbs,feet Plaque left cheek Derm Imp: bullous pemphigoid, ?secondary to ipilimumab ipi nivo 2017 - hypophysitis and recurrence of bullous pemphigoid. Steroids and IvIg TRUE
31 NA NA NA NA NA NA NA NA legacy_data skin/hair pituitary 2017-01-20 234 NA NA NA NA NA NA 2016-09-21 NA bullous_pemphigoid 3 Consistent with a pemphigoid-type reaction, possibly drug-related or consistent with Bullous Pemphigoid. NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA TRUE ipi 2013 - no irae pembro 2015 - no irae ipi 2016 - Bullous pemphigoid - eruption right thigh, limbs,feet Plaque left cheek Derm Imp: bullous pemphigoid, ?secondary to ipilimumab ipi nivo 2017 - hypophysitis and recurrence of bullous pemphigoid. Steroids and IvIg TRUE
74 NA NA NA NA NA NA NA NA legacy_data NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
96 NA NA NA NA NA NA NA NA legacy_data NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
97 NA NA NA NA NA NA NA NA legacy_data NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
102 NA NA NA NA NA NA NA NA legacy_data NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Here the dimensions of the dataframe are unchanged; only the column order differs.

You can combine multiple helper functions to create more complex selection criteria. Additionally, you can use the ‘-’ symbol in front of the helper function to exclude the matched columns.
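
One way to combine helpers is with the | (or) and & (and) operators that select() understands. The sketch below uses a hypothetical table, toy, whose column names imitate the style of the iraes instrument; the values are invented.

```r
library(dplyr)

# Hypothetical columns named in the style of the iraes instrument
toy <- tibble(
  record_id = 1:2,
  liver_date = c("2020-01-01", NA),
  liver_grade = c(2, NA),
  skin_date = c(NA, "2021-05-01"),
  skin_grade = c(NA, 3)
)

# Columns that start with "liver" OR end with "date"
union_cols <- toy |> select(starts_with("liver") | ends_with("date")) |> names()

# Columns that start with "liver" AND end with "date"
both_cols <- toy |> select(starts_with("liver") & ends_with("date")) |> names()

# Every column except those ending in "date"
no_date_cols <- toy |> select(-ends_with("date")) |> names()

union_cols    # "liver_date" "liver_grade" "skin_date"
both_cols     # "liver_date"
no_date_cols  # "record_id" "liver_grade" "skin_grade"
```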

Try It Yourself

Identify patients whose first response (overall_response) and best response on PET (best_res_pet) were either CMR (Complete metabolic response) or PMR (Partial metabolic response). Display only relevant columns: patient ID (record_id), first response (overall_response), and best PET response (best_res_pet).

Hint: Use response_data instrument

immuno_dataset$redcap_data$response_data |> 
  filter(best_res_pet %in% c("Complete metabolic response", "Partial metabolic response"), 
         overall_response %in% c("Complete metabolic response", "Partial metabolic response")) |> 
  select(record_id, best_res_pet, overall_response)

mutate()

The mutate() function adds new columns of data, thus ‘mutating’ the contents and dimensions of the input data frame.

Example 1: Calculate the BMI of patients (i.e., \(\text{BMI} = \frac{\text{weight in kg}}{\text{height in m} \times \text{height in m}}\)).

Height is recorded in centimetres, so the code divides it by 100 to convert it to metres. The result is left unrounded; wrapping the expression in round(..., 2) would match the two decimal places of the existing bmi column.

immuno_dataset$redcap_data$demographics |> 
  mutate(bmi_new = weight / (height/100 * height/100)) |> 
  head()
record_id redcap_event ur last_name first_name sex dob height weight bmi other_studies_enrolled_in medical_history autoimmune_disease select_autoimmune_diseases smoking bmi_new
1 baseline 2810493 Jackson Hannah Male 1943-02-12 154 163 68.73 NA NA NA NA NA 68.72997
2 baseline 6408685 Howard Samantha Male 2008-01-29 181 97 29.61 NA NA NA NA NA 29.60838
3 baseline 9994173 Martinez Noah Male 1940-11-06 199 123 31.06 NA NA NA NA NA 31.05982
4 baseline 9580798 Lewis Aiden Male 1963-06-04 189 184 51.51 NA NA NA NA NA 51.51032
5 baseline 2653008 Jenkins Connor Male 2019-08-15 157 50 20.28 NA NA FALSE NA Past smoker 20.28480
6 baseline 931154 Allen Claire Male 1957-07-12 145 59 28.06 NA NA NA NA NA 28.06183

This creates a new column at the end of the data frame named bmi_new and computes the BMI. Because the number of columns is expanding, we can reduce the number of columns displayed using the select() function.

To do this, we use chaining, which was discussed earlier.

Let’s use chaining to combine both select() and mutate() operations for the previous example:

immuno_dataset$redcap_data$demographics |> 
  select(record_id, weight, height, bmi) |> 
  mutate(bmi_new = weight / (height/100 * height/100)) |> 
  head()
record_id weight height bmi bmi_new
1 163 154 68.73 68.72997
2 97 181 29.61 29.60838
3 123 199 31.06 31.05982
4 184 189 51.51 51.51032
5 50 157 20.28 20.28480
6 59 145 28.06 28.06183
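
As a quick arithmetic check, the calculation can be reproduced in base R using the first three rows of the output above. The assumption that weight is in kilograms is ours; the heights are clearly in centimetres, since dividing by 100 gives plausible values in metres.

```r
# First three rows of the output above
weight <- c(163, 97, 123)        # assumed to be kilograms
height <- c(154, 181, 199)       # centimetres
bmi    <- c(68.73, 29.61, 31.06) # existing bmi column

height_m <- height / 100                    # convert to metres
bmi_new  <- weight / (height_m * height_m)  # same formula as in mutate()

round(bmi_new, 2)
# Does the recomputed value agree with the stored one to two decimal places?
all(abs(bmi_new - bmi) < 0.005)
```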

case_when helper function

The case_when() function allows you to create conditional statements inside mutate(). It is a vectorized alternative to nested ifelse() calls, making the code cleaner and easier to read. The cases are evaluated sequentially, and the first match for each element determines the corresponding value in the output vector. If no case matches, a final TRUE ~ default_value case acts as the “else” branch (dplyr 1.1.0 and later also accept a .default argument for the same purpose).

case_when(
  condition1 ~ value1,
  condition2 ~ value2,
  condition3 ~ value3,
  TRUE ~ default_value
)

Each condition is checked in order, and the corresponding value is assigned if the condition is TRUE. The TRUE ~ default_value at the end acts as a fallback for any rows that do not match previous conditions.
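
Because the first matching condition wins, the order of the cases matters. The sketch below applies case_when() to a hypothetical vector of BMI values twice, once with the conditions in a sensible order and once with the broadest condition placed first.

```r
library(dplyr)

bmi <- c(17, 22, 27, 35, NA)  # hypothetical values

# Conditions ordered from lowest to highest, so each needs only one test
ordered <- case_when(
  bmi < 18.5 ~ "Underweight",
  bmi < 25   ~ "Normal Weight",
  bmi < 30   ~ "Overweight",
  bmi >= 30  ~ "Obese",
  TRUE       ~ "Unknown"
)

# The broadest condition comes first and captures every value below 30
misordered <- case_when(
  bmi < 30   ~ "Overweight",
  bmi < 18.5 ~ "Underweight",
  bmi < 25   ~ "Normal Weight",
  bmi >= 30  ~ "Obese",
  TRUE       ~ "Unknown"
)

ordered     # "Underweight" "Normal Weight" "Overweight" "Obese" "Unknown"
misordered  # "Overweight" "Overweight" "Overweight" "Obese" "Unknown"
```

The missing value reaches the TRUE fallback in both versions because every comparison with NA evaluates to NA rather than TRUE.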

Example 2: Generate a new column, bmi_category, which classifies patients based on their Body Mass Index (BMI) using the following categories:

  • “Underweight”: BMI < 18.5
  • “Normal Weight”: 18.5 ≤ BMI < 25
  • “Overweight”: 25 ≤ BMI < 30
  • “Obese”: BMI ≥ 30.
immuno_dataset$redcap_data$demographics |> 
  select(record_id, weight, height, bmi) |> 
  mutate(bmi_category = case_when(
    bmi < 18.5 ~ "Underweight",
    bmi >= 18.5 & bmi < 25 ~ "Normal Weight",
    bmi >= 25 & bmi < 30 ~ "Overweight",
    bmi >= 30 ~ "Obese",
    TRUE ~ "Unknown"
  )) |> head()
record_id weight height bmi bmi_category
1 163 154 68.73 Obese
2 97 181 29.61 Overweight
3 123 199 31.06 Obese
4 184 189 51.51 Obese
5 50 157 20.28 Normal Weight
6 59 145 28.06 Overweight

Here mutate() creates a new column bmi_category. case_when() assigns categories based on the BMI values and the final condition TRUE ensures any missing or unclassified values are labeled as “Unknown”.

Try It Yourself

Determine whether patients have “normal” or “abnormal” serum creatinine levels based on their sex (sex) and creatinine (sys_path_creat). Create a new column creat_level with the following categories:

  • “Normal”:
    • Men (sex = “Male”) with creatinine between 0.7 and 1.3 mg/dL
    • Women (sex = “Female”) with creatinine between 0.6 and 1.1 mg/dL
  • “Abnormal”: Otherwise

The demog_prior_treatment dataset, which merges demographic and prior treatment instruments, has already been provided.

Hint: To convert serum creatinine from µmol/L to mg/dL, use the conversion factor:

\[ \text{Creatinine (mg/dL)} = \text{Creatinine (µmol/L)} \times 0.0113 \]

demog_prior_treatment <- full_join(immuno_dataset$redcap_data$demographics,
                                   immuno_dataset$redcap_data$prior_treatment, by = "record_id") |> 
  filter(!is.na(sys_path_creat)) # filter non-missing values in sys_path_creat column
demog_prior_treatment |> 
  mutate(creat_level = case_when(
    sex == "Male" & sys_path_creat * 0.0113 >= 0.7 & sys_path_creat * 0.0113 <= 1.3 ~ "Normal",
    sex == "Female" & sys_path_creat * 0.0113 >= 0.6 & sys_path_creat * 0.0113 <= 1.1 ~ "Normal",
    TRUE ~ "Abnormal"
)) |> 
  select(record_id, sex, sys_path_creat, creat_level)

summarise()

The summarise() function creates individual summary statistics from larger data sets.

The output of summarise()/summarize() differs qualitatively from the input. It results in a smaller dataframe with a reduced representation of the original data. While not strictly necessary, it’s advisable to assign new column names for the summary statistics generated by this function. This practice enhances clarity and organisation in your data analysis workflow.

Example 1: Calculate the mean creatinine level (sys_path_creat) in the prior_treatment data frame.

immuno_dataset$redcap_data$prior_treatment |> 
  summarise(mean_creatinine = mean(sys_path_creat))
mean_creatinine
NA

This results in a data frame of size 1 row \(\times\) 1 col with a value of NA (Not Available), R's marker for a missing value. This occurs because the column contains missing values, and mean() returns NA if any of its inputs is missing. To compute the mean creatinine level while excluding missing values, use the na.rm = TRUE argument in the mean() function.

immuno_dataset$redcap_data$prior_treatment |> 
  summarise(mean_creatinine = mean(sys_path_creat, na.rm = TRUE))
mean_creatinine
69.14035

We can create additional summary statistics by adding them in a comma-separated sequence as follows:

immuno_dataset$redcap_data$prior_treatment |> 
  summarise(mean_creatinine = mean(sys_path_creat, na.rm = TRUE),
            min_creatinine = min(sys_path_creat, na.rm = TRUE),
            max_creatinine = max(sys_path_creat, na.rm = TRUE),
            total_creatinine = sum(sys_path_creat, na.rm = TRUE))
mean_creatinine min_creatinine max_creatinine total_creatinine
69.14035 51 108 3941

n() helper function

This function counts the number of observations in a dataset. It does not take any arguments, but simply counts the rows.

immuno_dataset$redcap_data$prior_treatment |> 
  summarise(mean_creatinine = mean(sys_path_creat, na.rm = TRUE),
            min_creatinine = min(sys_path_creat, na.rm = TRUE),
            max_creatinine = max(sys_path_creat, na.rm = TRUE),
            total_creatinine = sum(sys_path_creat, na.rm = TRUE), 
            n_rows = n())
mean_creatinine min_creatinine max_creatinine total_creatinine n_rows
69.14035 51 108 3941 11627
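
Note that n() counts every row, including rows where the summarised column is missing. In the output above, dividing the total by the mean (3941 / 69.14035) gives about 57, which suggests that only around 57 of the 11627 rows hold a creatinine value. A minimal sketch on hypothetical values shows how to report both counts side by side:

```r
library(dplyr)

toy <- tibble(sys_path_creat = c(51, NA, 108, NA, 60))  # hypothetical values

result <- toy |>
  summarise(
    n_rows = n(),                                 # every row, missing or not
    n_non_missing = sum(!is.na(sys_path_creat)),  # rows that contribute to the mean
    mean_creatinine = mean(sys_path_creat, na.rm = TRUE)
  )
result
```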
Try It Yourself

Summarise the key laboratory values and treatment outcomes for patients who received systemic therapy. Calculate the mean of C peptide (sys_path_cpep), creatinine (sys_path_creat), and lactate dehydrogenase (systemic_ldh_value).

Hint: Use prior_treatment instrument

immuno_dataset$redcap_data$prior_treatment |> 
  summarise(
    mean_cpep = mean(sys_path_cpep, na.rm = TRUE),
    mean_creat = mean(sys_path_creat, na.rm = TRUE),
    mean_ldh = mean(systemic_ldh_value, na.rm = TRUE))

arrange()

The arrange() function orders rows based on the values in a given column.

Example 1: Order the records based on the UR number in demographics.

immuno_dataset$redcap_data$demographics |> 
  arrange(ur) |> 
  head()
record_id redcap_event ur last_name first_name sex dob height weight bmi other_studies_enrolled_in medical_history autoimmune_disease select_autoimmune_diseases smoking
216 baseline 102028 Johnson Catherine Female 1979-08-09 190 103 28.53 mrv NA TRUE inflammatory_bowel_disease Never smoked
74 baseline 112616 Peterson Sophie Male 1929-09-08 161 71 27.39 NA hypertension TRUE inflammatory_arthritis Never smoked
98 baseline 113893 Green Nathan Male 1975-11-29 147 190 87.93 NA NA FALSE NA Never smoked
249 baseline 121563 Jones Jack Male 1998-06-06 193 68 18.26 NA NA NA NA NA
195 baseline 140072 Foster Charlotte Female 1963-05-19 195 135 35.50 micromac NA FALSE NA Never smoked
260 baseline 146543 Barnes Abigail Female 1927-01-16 167 158 56.65 NA NA NA NA NA

Example 2: Sort the records in mortality_data based on the mortality date first and then by last scan date (date_last_scan).

immuno_dataset$redcap_data$mortality_data |> 
  arrange(mortality_date, date_last_scan) |> 
  head()
record_id redcap_event mortality mortality_date ongoing_survelliance date_last_scan mortality_treatment_time mortality_cause
70 end TRUE 2013-11-15 no NA 45.7 melanoma progression
193 end TRUE 2015-04-25 no NA NA melanoma progression
331 end TRUE 2015-05-29 no NA 28.0 melanoma progression
84 end TRUE 2015-06-19 no NA 0.5 melanoma progression
349 end TRUE 2015-07-25 no 2024-01-17 8.5 melanoma progression
85 end FALSE 2015-08-05 no 2018-10-25 7.5 melanoma progression

desc() helper function

This function is used to sort data in descending order.

Example 3: Sort the records in mortality_data in descending order based on the mortality_treatment_time.

immuno_dataset$redcap_data$mortality_data |> 
  arrange(desc(mortality_treatment_time)) |> 
  head()
record_id redcap_event mortality mortality_date ongoing_survelliance date_last_scan mortality_treatment_time mortality_cause
188 end FALSE 2024-09-06 yes 2023-12-22 131.5 NA
345 end TRUE 2018-08-03 no NA 115.0 melanoma progression
221 end FALSE 2024-08-12 yes 2024-06-08 111.8 NA
38 end FALSE 2024-04-15 yes 2024-06-01 109.6 NA
146 end FALSE 2024-09-06 no 2020-08-06 108.6 NA
200 end FALSE 2024-07-12 no 2022-01-04 108.3 NA
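
desc() can be mixed with ordinary columns in the same arrange() call. The sketch below sorts a hypothetical table by the longest treatment time first and breaks ties using the earliest mortality_date; all values are invented.

```r
library(dplyr)

toy <- tibble(
  record_id = 1:5,
  mortality_treatment_time = c(45.7, NA, 108.6, 45.7, 0.5),
  mortality_date = as.Date(c("2013-11-15", "2015-04-25", "2024-09-06",
                             "2012-01-01", "2015-06-19"))
)

# Longest treatment time first; ties broken by the earliest mortality_date
sorted <- toy |>
  arrange(desc(mortality_treatment_time), mortality_date)

sorted$record_id  # 3 4 1 5 2
```

Record 2, whose treatment time is missing, ends up last: arrange() places missing values at the end whether the sort is ascending or descending.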
Try It Yourself

Exclude records where either the first name (first_name) or last name (last_name) is missing, and then sort the remaining records in ascending order, first by first_name and then by last_name.

Hint: Use demographics instrument. Use is.na() to check for missing values and !is.na() to keep only non-missing values.

immuno_dataset$redcap_data$demographics |> 
  filter(!is.na(first_name), !is.na(last_name)) |> 
  arrange(first_name, last_name)

count() helper

The count() function is used to count the number of occurrences of unique values in one or more variables within a data frame. This function is particularly useful for summarising data and understanding the distribution of values within a dataset.

Example 1: Count the number of melanoma types in melanoma_data data frame.

immuno_dataset$redcap_data$melanoma_data |> 
  count(melanoma_type)
melanoma_type n
acral 7
cutaneous 265
mucosal 16
pathology_not_available 4
unknown_primary 64
uveal 21
NA 132
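
By default the result is ordered by the counted variable. count() also accepts a sort = TRUE argument, which lists the most frequent values first. A minimal sketch on hypothetical data:

```r
library(dplyr)

# Hypothetical values, including one missing entry
toy <- tibble(melanoma_type = c("cutaneous", "uveal", "cutaneous", NA,
                                "acral", "cutaneous", "uveal"))

counts        <- toy |> count(melanoma_type)
counts_sorted <- toy |> count(melanoma_type, sort = TRUE)

counts
counts_sorted
```

As in the workshop output, missing values are counted as their own group rather than being dropped.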

Example 2: Count the number of records observed in each melanoma type and melanoma molecular mutation.

immuno_dataset$redcap_data$melanoma_data |> 
  count(melanoma_type, melanoma_molecular_mutation)
melanoma_type melanoma_molecular_mutation n
acral braf 1
acral kit 2
acral nras 2
acral wild_type 1
acral NA 1
cutaneous braf 110
cutaneous kit 3
cutaneous nras 54
cutaneous other 1
cutaneous unknown 3
cutaneous wild_type 74
cutaneous NA 20
mucosal braf 1
mucosal kit 3
mucosal nras 3
mucosal wild_type 9
pathology_not_available braf 1
pathology_not_available nras 1
pathology_not_available wild_type 2
unknown_primary braf 30
unknown_primary kit 1
unknown_primary nras 15
unknown_primary other 1
unknown_primary wild_type 16
unknown_primary NA 1
uveal braf 1
uveal nras 1
uveal other 2
uveal unknown 1
uveal wild_type 12
uveal NA 4
NA braf 2
NA wild_type 2
NA NA 128
Try It Yourself

Determine the number of records with each single-agent immunotherapy (type_of_single_agent_io: ipilimumab, nivolumab, pembrolizumab). Additionally, count the number of patients for each type of best response (best_response_to_ipi, best_response_to_nivo_p, best_response_to_pembro) separately.

Hint: Use prior_treatment instrument.

immuno_dataset$redcap_data$prior_treatment |> count(type_of_single_agent_io)
immuno_dataset$redcap_data$prior_treatment |> count(best_response_to_ipi)
immuno_dataset$redcap_data$prior_treatment |> count(best_response_to_nivo)
immuno_dataset$redcap_data$prior_treatment |> count(best_response_to_pembro)

Visualising Data

The ggplot2 package simplifies the creation of plots. It offers a streamlined interface for defining the variables to plot, configuring how they are displayed, and adjusting visual attributes. Consequently, adapting to changes in the data or switching between plot types requires only minimal modifications, which makes it easier to create publication-quality plots with few manual adjustments.

If you have not yet installed the tidyverse package, do so by running install.packages("tidyverse"). Then load it into the R session:

library(tidyverse)

Next, load the pre-processed RDS Object:

immuno_dataset <- readRDS("data/Sample_immuno_dataset.rds")

Building a Basic Plot

The construction of ggplot graphics is incremental, allowing for the addition of new elements in layers. This approach grants users extensive flexibility and customisation options, enabling the creation of tailored plots to suit specific needs.

To build a ggplot, the following basic templates can be used for different types of plots.

Three things are required for a ggplot:

1. The data

We first specify the data frame that contains the relevant data to create a plot. Here we are sending immuno_dataset$redcap_data$demographics to the ggplot() function.

# render plot background
ggplot(immuno_dataset$redcap_data$demographics)

This command results in an empty gray panel. We must specify how various columns of the data frame should be depicted in the plot.

2. Aesthetics aes()

Next, we specify the columns in the data we want to map to visual properties (called aesthetics or aes in ggplot2). e.g. the columns for x values, y values and colours.

Since we are interested in generating a scatter plot, each point will have an x and a y coordinate. Therefore, we need to specify that the x-axis represents weight and the y-axis represents height.

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = height))

This results in a plot which includes the grid lines, the variables and the scales for x and y axes. However, the plot is empty or lacks data points.

3. Geometric Representation geom_()

Finally, we specify the type of plot (the geom). There are different types of geoms:

geom_blank() draws an empty plot.

geom_segment() draws a straight line. geom_vline() draws a vertical line and geom_hline() draws a horizontal line.

geom_curve() draws a curved line.

geom_line()/geom_path() makes a line plot. geom_line() connects points from left to right and geom_path() connects points in the order they appear in the data.


geom_point() produces a scatterplot.

geom_jitter() adds a small amount of random noise to the points in a scatter plot.

geom_dotplot() produces a dot plot.

geom_smooth() adds a smooth trend line to a plot.

geom_quantile() draws fitted quantile with lines (a scatter plot with regressed quantiles).

geom_density() creates a density plot.


geom_histogram() produces a histogram.

geom_bar() makes a bar chart. Height of the bar is proportional to the number of cases in each group.

geom_col() makes a bar chart. Height of the bar is proportional to the values in data.


geom_boxplot() produces a box plot.

geom_violin() creates a violin plot.


geom_ribbon() produces a ribbon (y interval defined line).

geom_area() draws an area plot, which is a line plot filled to the y-axis (filled lines).

geom_rect(), geom_tile() and geom_raster() draw rectangles.

geom_polygon() draws polygons, which are filled paths.


geom_text() adds text to a plot.

geom_label() adds text to a plot with a rectangle drawn behind it, which makes the label easier to read.

The range of geoms available in ggplot2 can be obtained by navigating to the ggplot2 package in the Packages tab pane in RStudio (bottom right-hand corner) and scrolling down the list of functions sorted alphabetically to the geom_... functions.
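The same list can also be produced from the console. The following is a minimal sketch that assumes only that ggplot2 is installed; the variable name geom_functions is chosen here for illustration.

```r
library(ggplot2)

# List every exported ggplot2 function whose name starts with "geom_"
geom_functions <- ls("package:ggplot2", pattern = "^geom_")

length(geom_functions)    # how many geoms this ggplot2 version exports
head(geom_functions, 10)  # the first few, in alphabetical order
```

The exact number of geoms depends on the installed ggplot2 version.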

Since we are interested in creating a scatter plot, the geometric representation of the data will be in point form. Therefore we use the geom_point() function.

To plot BMI against weight:

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point() 
Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

Notice that we use the + sign to add a layer of points to the plot. This concept bears resemblance to Adobe Photoshop, where layers of images can be rearranged and edited independently. In ggplot, each layer is added over the plot in accordance with its position in the code using the + sign.
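The layering idea can be checked on a small example. This sketch uses a made-up data frame (toy) rather than the immuno dataset, and inspects the layers element of each plot object to show that every + adds one layer.

```r
library(ggplot2)

# Hypothetical toy data standing in for the demographics instrument
toy <- data.frame(
  weight = c(55, 62, 70, 81, 94),
  bmi    = c(20.1, 22.4, 24.8, 27.3, 31.0)
)

base_plot    <- ggplot(toy, aes(x = weight, y = bmi))  # data + aesthetics only
scatter_plot <- base_plot + geom_point()               # first layer added with +
line_plot    <- scatter_plot + geom_line()             # second layer drawn on top

length(base_plot$layers)     # no layers yet, so nothing is drawn
length(scatter_plot$layers)  # one layer: the points
length(line_plot$layers)     # two layers: points, then the line
```

Because each intermediate plot is an ordinary R object, it can be stored in a variable and extended later.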

A note about |> and +

The ggplot2 package was developed before the pipe operator was introduced. In ggplot2, the + sign plays a role analogous to the pipe operator in other tidyverse packages: it lets a plot be built up step by step, reading from left to right. The two are not interchangeable, however: layers are always added with +, not |>.
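The two operators can appear in the same statement. In this sketch, which uses a made-up data frame (toy), the data are piped into ggplot() with |> and the layer is then added with +.

```r
library(ggplot2)

# Hypothetical toy data with a couple of missing values
toy <- data.frame(
  weight = c(55, 62, NA, 81, 94),
  bmi    = c(20.1, 22.4, 24.8, NA, 31.0)
)

# |> passes the data frame along; + adds layers once ggplot() has been called
piped_plot <- toy |>
  subset(!is.na(weight) & !is.na(bmi)) |>
  ggplot(aes(x = weight, y = bmi)) +
  geom_point()

piped_plot
```

Replacing the + with |> here would fail, because geom_point() expects to be added to a plot rather than to receive one as its first argument.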

Customising Plots

Adding Colour

The above plot could be made more informative. For instance, the additional information regarding the gender (i.e., sex column) could be incorporated into the plot. To do this, we can utilise aes() and specify which column in the data frame should be represented as the color of the points.

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi, color = sex)) + 
  geom_point() 
Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

Notice that here we specify color = sex in the aes() mapping inside the ggplot() function. Aesthetic mappings can be set in both ggplot() and individual geom_() layers, and we will discuss the difference in the Section: Adding Layers.

To colour points based on a continuous variable, for example: height:

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(colour = height)) 
Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

In ggplot2, a color scale is used for continuous variables, while discrete or categorical values are represented using discrete colors.

Note that some patient samples lack values, leading ggplot2 to remove those points with missing values for bmi and weight.
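To see in advance how many rows will be dropped, the missing values can be counted before plotting. This is a sketch on a made-up data frame (toy); complete.cases() is a base R function.

```r
library(ggplot2)

# Hypothetical toy data: one missing weight and one missing bmi
toy <- data.frame(
  weight = c(55, 62, NA, 81, 94, 77),
  bmi    = c(20.1, NA, 24.8, 27.3, 31.0, 25.5)
)

# Rows where either plotted column is missing; these are the rows ggplot2 drops
n_incomplete <- sum(!complete.cases(toy[, c("weight", "bmi")]))
n_incomplete

# Dropping them up front gives the same picture without the warning
toy_complete <- toy[complete.cases(toy[, c("weight", "bmi")]), ]
ggplot(toy_complete, aes(x = weight, y = bmi)) +
  geom_point()
```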

Adding Shape

Let’s add shape to points.

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(shape = smoking)) 
Warning: Removed 105 rows containing missing values or values outside the scale range
(`geom_point()`).

Note that some patient samples have not been classified and ggplot has removed those points with missing values for the smoking categories.

Some aesthetics like shape can only be used with categorical variables:

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(shape = height)) 
Error in `geom_point()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error in `scale_f()`:
! A continuous variable cannot be mapped to the shape aesthetic.
ℹ Choose a different aesthetic or use `scale_shape_binned()`.

The shape argument allows you to customise the appearance of all data points by assigning an integer associated with predefined shapes shown below:
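One way to look up the integer codes is to draw them. The sketch below builds a small helper data frame (shape_codes, a name chosen here) with the codes 0 to 25 and uses scale_shape_identity() so that each integer is used directly as its own shape.

```r
library(ggplot2)

# One row per built-in shape code (0 to 25), laid out on a grid
shape_codes <- data.frame(
  shape  = 0:25,
  grid_x = (0:25) %% 7,
  grid_y = -((0:25) %/% 7)
)

shape_chart <- ggplot(shape_codes, aes(x = grid_x, y = grid_y)) +
  geom_point(aes(shape = shape), size = 4, fill = "grey70") +
  geom_text(aes(label = shape), nudge_y = -0.35, size = 3) +
  scale_shape_identity() +  # use the integers in `shape` as the shape codes
  theme_void()

shape_chart
```

Shapes 21 to 25 have both an outline colour and a fill, which is why a fill is set in this sketch.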

To use asterisks instead of points in the plot:

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(shape = 8) 
Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

Here we changed the shape of all the points by setting shape to a single value rather than mapping it to one of the variables in the data set. This has to be done outside the aesthetic mappings (i.e. outside aes()), as above.

Aesthetic Setting vs. Mapping

Instead of mapping an aesthetic property to a variable, you can set it to a single value by specifying it in the layer parameters (outside aes()). We map an aesthetic to a variable (e.g., aes(shape = smoking)) or set it to a constant (e.g., shape = 8). If you want appearance to be governed by a variable in your data frame, put the specification inside aes(); if you want to override the default shape, size or colour with a fixed value, put the value outside of aes().

# shape outside aes()
ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(shape = 8) 
Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).
# shape inside aes()
ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(shape = smoking)) 
Warning: Removed 105 rows containing missing values or values outside the scale range
(`geom_point()`).

The above plots are created with similar code, but have rather different outputs. The first plot sets the shape to a fixed value and the second plot maps (not sets) the shape to the smoking variable.
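The difference between setting and mapping can also be seen in the values ggplot2 computes for a layer. This sketch uses a made-up data frame (toy) and the layer_data() helper to compare the shape codes that end up on the points.

```r
library(ggplot2)

# Hypothetical toy data with a three-level smoking column
toy <- data.frame(
  weight  = c(55, 62, 70, 81, 94, 77),
  bmi     = c(20.1, 22.4, 24.8, 27.3, 31.0, 25.5),
  smoking = c("Never", "Former", "Current", "Never", "Former", "Current")
)

# Setting: one constant shape for every point, outside aes()
set_plot <- ggplot(toy, aes(x = weight, y = bmi)) +
  geom_point(shape = 8)

# Mapping: shape chosen per row from the smoking column, inside aes()
mapped_plot <- ggplot(toy, aes(x = weight, y = bmi)) +
  geom_point(aes(shape = smoking))

unique(layer_data(set_plot)$shape)     # a single shape code
unique(layer_data(mapped_plot)$shape)  # one shape code per smoking category
```

Only the mapped version produces a legend, since a constant carries no information about the data.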

It is usually preferable to use colours to distinguish between different categories but sometimes colour and shape are used together when we want to show which group a data point belongs to in two different categorical variables.

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(colour = sex, shape = smoking))
Warning: Removed 105 rows containing missing values or values outside the scale range
(`geom_point()`).

Adding Size and Transparency

We can adjust the size and/or transparency of the points.

Let’s first increase the size of points.

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(colour = sex), size = 2)
Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

Note that here we add the size argument outside of the aesthetic mapping.

Size is not usually a good aesthetic to map to a variable, particularly a discrete one, and ggplot2 warns against doing so.

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(colour = sex, size = smoking))
Warning: Using size for a discrete variable is not advised.
Warning: Removed 105 rows containing missing values or values outside the scale range
(`geom_point()`).

Because smoking is a discrete variable, the default size scale assigns evenly spaced point sizes to the smoking categories.

Transparency can be useful when we have a large number of points as we can more easily tell when points are overlaid, but like size, it is not usually mapped to a variable and sits outside the aes().

Let’s change the transparency of points.

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(colour = sex), alpha = 0.5)
Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

Adding Layers

We can add another layer to this plot using a different geometric representation (or geom_ function) we discussed previously.

Let’s add a trend line to this plot using the geom_smooth() function, which provides a summary of the data.

ggplot(immuno_dataset$redcap_data$demographics) + 
  geom_point(aes(x = weight, y = bmi)) +
  geom_smooth(aes(x = weight, y = bmi))
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 8 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

Note that the shaded area surrounding the blue line represents the confidence interval (95% by default) around the fitted curve, which is derived from the standard error of the fit.
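The band can be inspected or switched off. The sketch below uses simulated data (toy) and a straight-line fit (method = "lm") instead of the default loess curve, purely to keep the example small; se = FALSE removes the shaded band.

```r
library(ggplot2)

set.seed(1)  # reproducible simulated data
toy <- data.frame(weight = seq(50, 110, length.out = 40))
toy$bmi <- 0.3 * toy$weight + rnorm(40, sd = 2)

# Straight-line fit with its default confidence band
with_band <- ggplot(toy, aes(x = weight, y = bmi)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Same fit with the band switched off
without_band <- ggplot(toy, aes(x = weight, y = bmi)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# The second layer's computed data carries the band limits as ymin and ymax
band <- layer_data(with_band, i = 2)
head(band[, c("x", "y", "ymin", "ymax")])
```

The width of the band can be changed with the level argument of geom_smooth(), for example level = 0.99.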

There is some annoying duplication of code used to create this plot. We’ve repeated the exact same aesthetic mapping for both geoms. We can avoid this by putting the mappings in the ggplot() function instead.

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point() +
  geom_smooth()

Geom layers specified earlier in the command are drawn first, preceding subsequent geom layers. The sequence of geom layers specified in the command determines their order of appearance in the plot.

If you switch the order of the geom_point() and geom_smooth() functions above, you’ll notice a change in the regression line. Specifically, the regression line will now be plotted underneath the points.
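The drawing order is simply the order in which the layers are stored in the plot object. This sketch uses a made-up data frame (toy), and layer_order() is a small helper written here for illustration, not a ggplot2 function.

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(
  weight = c(55, 62, 70, 81, 94, 77),
  bmi    = c(20.1, 22.4, 24.8, 27.3, 31.0, 25.5)
)

points_first <- ggplot(toy, aes(x = weight, y = bmi)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

smooth_first <- ggplot(toy, aes(x = weight, y = bmi)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_point()

# Helper (defined here): name of the geom used by each layer, in drawing order
layer_order <- function(p) {
  unname(vapply(p$layers, function(layer) class(layer$geom)[1], character(1)))
}

layer_order(points_first)  # points drawn first, line on top
layer_order(smooth_first)  # line drawn first, points on top
```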

Let’s make the plot look a bit prettier by reducing the size of the points and making them transparent. We’re not mapping size or alpha to any variables, just setting them to constant values, and we only want these settings to apply to the points, so we set them inside geom_point().

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(size = 0.5, alpha = 0.5) +
  geom_smooth() 
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 8 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

Aesthetic Specifications in Plot vs. Layers

Aesthetic mappings can be provided either in the initial ggplot() call, in individual layers, or through a combination of both approaches. When there’s only one layer in the plot, the method used to specify aesthetics doesn’t impact the result.

# colour argument inside ggplot()
ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi, colour = smoking)) + 
  geom_point(size = 0.5, alpha = 0.5) +
  geom_smooth() 
# colour argument inside geom_point()
ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(colour = smoking), size = 0.5, alpha = 0.5) +
  geom_smooth() 

In the first plot, since we specified the colour (i.e., colour = smoking) inside the ggplot() function, the geom_smooth() function will fit a separate trend line for each smoking category, and the lines will be coloured accordingly. This is because, when aesthetic mappings are defined in ggplot(), at the global level, they’re passed down to each of the subsequent geom layers of the plot.

If we want to add colour only to the points and fit a single trend line across all points, we could specify the colour inside the geom_point() function (i.e., the second plot).
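The effect of global versus layer-level mappings can be confirmed by counting the fitted lines. This sketch uses a made-up data frame (toy) and a linear fit to keep it small; n_lines() is a helper written here, not part of ggplot2.

```r
library(ggplot2)

# Hypothetical toy data: three smoking categories with three patients each
toy <- data.frame(
  weight  = c(55, 62, 70, 58, 75, 88, 66, 81, 94),
  bmi     = c(20.1, 22.4, 24.8, 21.0, 25.9, 29.5, 23.2, 27.3, 31.0),
  smoking = rep(c("Never", "Former", "Current"), each = 3)
)

# Colour mapped globally: inherited by both geom_point() and geom_smooth()
global_colour <- ggplot(toy, aes(x = weight, y = bmi, colour = smoking)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Colour mapped in geom_point() only: geom_smooth() sees no grouping
layer_colour <- ggplot(toy, aes(x = weight, y = bmi)) +
  geom_point(aes(colour = smoking)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Helper (defined here): number of separate lines fitted by the second layer
n_lines <- function(p) length(unique(layer_data(p, i = 2)$group))

n_lines(global_colour)  # one fitted line per smoking category
n_lines(layer_colour)   # a single fitted line across all points
```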

Plot Labels

You can customise plots to include a title, a subtitle, a caption or a tag using the labs() function. The same function also sets the axis labels.

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(colour = sex), size = 0.5, alpha = 0.5) +
  geom_smooth() +
  labs(
    title = "Variation between BMI and Weight of melanoma patients, coloured by sex",
    subtitle = "BMI vs Weight",
    caption = "Variation between BMI and Weight",
    tag = "Figure 1",
    y = "Body Mass Index",
    x = "Weight (kg)")

Themes

Themes control the overall appearance of the plot, including background colour, grid lines, axis labels, and text styles. ggplot2 offers several built-in themes, and you can also create custom themes to match your preferences or the requirements of your publication. The default theme has a grey background.

weight_vs_bmi <- ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(colour = smoking), size = 0.5, alpha = 0.5) +
  geom_smooth() 

weight_vs_bmi + theme_bw()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 8 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

Try these themes yourselves: theme_classic(), theme_dark(), theme_grey() (default), theme_light(), theme_linedraw(), theme_minimal(), theme_void() and theme_test().
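A built-in theme can be adjusted further with theme(). The following is a sketch on a made-up data frame (toy); the particular settings chosen (base_size = 14, legend at the bottom, no minor grid lines) are illustrative only.

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(
  weight  = c(55, 62, 70, 81, 94, 77),
  bmi     = c(20.1, 22.4, 24.8, 27.3, 31.0, 25.5),
  smoking = c("Never", "Former", "Current", "Never", "Former", "Current")
)

base_plot <- ggplot(toy, aes(x = weight, y = bmi)) +
  geom_point(aes(colour = smoking))

# Start from a complete theme, then override individual elements
styled_plot <- base_plot +
  theme_minimal(base_size = 14) +
  theme(
    legend.position  = "bottom",
    panel.grid.minor = element_blank()
  )

styled_plot
```

The theme() call comes after theme_minimal() because a complete theme replaces any element settings added before it.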

Try It Yourself

Given the cortisol_mort_time data frame, complete the following tasks:

  1. Print the data frame to explore its contents.
  2. Create a scatter plot of cortisol levels versus melanoma treatment time.
  3. Color the points by melanoma type.
  4. Set the point size to 2 and adjust transparency to 0.7.
  5. Add a single regression line for all points.
  6. Include a plot title and label the axes as:
    • Y-axis: Cortisol (nmol/L)
    • X-axis: Time since first treatment dose (months)
  7. Apply the theme_grey() for styling.

1.

# Extract cortisol values from the pathology instrument,
# keeping only patient ID and cortisol, and remove rows with missing cortisol values
cortisol <- immuno_dataset$redcap_data$pathology |> 
  select(record_id, cortisol) |> 
  filter(!is.na(cortisol))

# Join cortisol data with mortality_data instrument to bring in mortality_treatment_time,
# and keep only relevant columns: patient ID, mortality treatment time, and cortisol
cortisol_mort_time <- left_join(cortisol, immuno_dataset$redcap_data$mortality_data, by = "record_id") |> 
  select(record_id, mortality_treatment_time, cortisol)

# Extract melanoma type information from the melanoma_data instrument
mel_subset <- immuno_dataset$redcap_data$melanoma_data |> select(record_id, melanoma_type)

# Join melanoma type to the existing cortisol_mort_time data
cortisol_mort_time <- left_join(cortisol_mort_time, mel_subset, by = "record_id")
cortisol_mort_time

2.-7.

ggplot(cortisol_mort_time, aes(x = mortality_treatment_time, y = cortisol)) + 
  geom_point(aes(colour = melanoma_type), size = 2, alpha = 0.7) +
  geom_smooth() +
  labs(
    title = "Cortisol vs Time since first treatment dose",
    y = "Cortisol (nmol/L)",
    x = "Time since first treatment dose (months)",
    colour = "Melanoma Type" # legend title
  ) + 
  theme_grey()

Bar chart

Let’s create a bar chart of the number of patients with each melanoma type in the melanoma_data instrument.

geom_bar() is the geom used to plot bar charts. It requires a single aesthetic mapping: the categorical variable of interest mapped to x.

ggplot(immuno_dataset$redcap_data$melanoma_data) +  
  geom_bar(aes(x = melanoma_type))

The dark grey bars are a bit ugly - what if we want each bar to be a different colour?

ggplot(immuno_dataset$redcap_data$melanoma_data) +  
  geom_bar(aes(x = melanoma_type, colour = melanoma_type))

Colouring the edges wasn’t quite what we had in mind. Look at the help for geom_bar to see what other aesthetic we should have used.

ggplot(immuno_dataset$redcap_data$melanoma_data) +  
  geom_bar(aes(x = melanoma_type, fill = melanoma_type))

What happens if we colour (fill) with something other than the melanoma_type?

ggplot(immuno_dataset$redcap_data$melanoma_data) +  
  geom_bar(aes(x = melanoma_type, fill = melanoma_molecular_mutation))

We get a stacked bar plot.

Note the similarity in what we did here to what we did with the scatter plot - there is a common grammar.

We can rearrange the molecular mutation groups into adjacent (dodged) bars by specifying a different position within geom_bar():

ggplot(immuno_dataset$redcap_data$melanoma_data) +  
  geom_bar(aes(x = melanoma_type, fill = melanoma_molecular_mutation), position = 'dodge')

What if we want all the bars to be the same colour but not dark grey, e.g. blue?

ggplot(immuno_dataset$redcap_data$melanoma_data) +  
  geom_bar(aes(x = melanoma_type, fill = "blue"))

That doesn’t look right - why not?

You can set the aesthetics to a fixed value but this needs to be outside the mapping, just like we did before for size and transparency in the scatter plots.

ggplot(immuno_dataset$redcap_data$melanoma_data) +  
  geom_bar(aes(x = melanoma_type), fill = "blue")

Setting this inside the aes() mapping told ggplot2 to map the fill aesthetic to some variable in the data frame, one that doesn’t really exist but which is created on-the-fly with a value of “blue” for every observation. That single category is then given the first colour of the default palette, which is why the bars are not blue.
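The fill colours that ggplot2 actually uses can be read back with layer_data(). This sketch uses a made-up data frame (toy) with placeholder category names.

```r
library(ggplot2)

# Hypothetical toy data with placeholder melanoma types
toy <- data.frame(
  melanoma_type = c("Type A", "Type A", "Type B", "Type C", "Type A", "Type B")
)

# "blue" inside aes(): treated as a one-level categorical variable
mapped_fill <- ggplot(toy) +
  geom_bar(aes(x = melanoma_type, fill = "blue"))

# "blue" outside aes(): used directly as the fill colour
set_fill <- ggplot(toy) +
  geom_bar(aes(x = melanoma_type), fill = "blue")

unique(layer_data(mapped_fill)$fill)  # a default palette colour, not blue
unique(layer_data(set_fill)$fill)     # the colour that was set
```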

Try It Yourself

Create a horizontal bar chart displaying the types of immune checkpoint inhibition therapies. Add appropriate axis labels and a descriptive title.

Hint: use prior_treatment instrument.

ggplot(immuno_dataset$redcap_data$prior_treatment, aes(y = systemic_ici)) +
  geom_bar() +
  labs(
    title = "Immune Checkpoint Inhibition Therapy Counts",
    x = "Counts",
    y = "Immune Checkpoint Inhibition Therapy Type"
  )

Box plot

Box plots (or box & whisker plots) are a particular favourite seen in many seminars and papers. Box plots summarise the distribution of a set of values by displaying the median (i.e. middle-ranked value), the range of the middle 50% of values (the inter-quartile range, IQR, drawn as a box from the lower quartile Q1 to the upper quartile Q3), and whiskers showing the spread of the remaining values. In geom_boxplot(), the whiskers extend above and below the box to the most extreme values that lie within Q3 + (1.5 x IQR) and Q1 - (1.5 x IQR) respectively; values beyond the whiskers are drawn as individual outlier points.

To create a box plot from the immuno dataset:
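The quantities behind a box plot can be computed by hand. This is a sketch in base R on a made-up vector (values), using the default quantile() method; it illustrates the 1.5 x IQR rule rather than reproducing the internals of geom_boxplot().

```r
# Hypothetical values with one obvious outlier
values <- c(2, 4, 5, 6, 7, 8, 9, 30)

quartiles <- quantile(values, probs = c(0.25, 0.5, 0.75))
q1  <- quartiles[["25%"]]
q3  <- quartiles[["75%"]]
iqr <- q3 - q1

# Limits beyond which a value is treated as an outlier
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Whiskers reach the most extreme values that are still inside the fences
inside   <- values[values >= lower_fence & values <= upper_fence]
whiskers <- range(inside)
outliers <- values[values < lower_fence | values > upper_fence]

c(q1 = q1, median = quartiles[["50%"]], q3 = q3, iqr = iqr)
whiskers  # lower and upper whisker ends
outliers  # values drawn as separate points
```

Here the value 30 lies above the upper fence of 13.5, so the upper whisker stops at 9 and 30 is shown as an outlier.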

# join the melanoma_data instrument and mortality_data
mel_mort <- full_join(immuno_dataset$redcap_data$melanoma_data , 
                      immuno_dataset$redcap_data$mortality_data, 
                      by = "record_id")
# keep only the non-missing rows of mortality_treatment_time column
mel_mort <- mel_mort |> filter(!is.na(mortality_treatment_time)) 

ggplot(mel_mort, aes(x = melanoma_type, y = mortality_treatment_time)) +
  geom_boxplot()

See the geom_boxplot() help page for an explanation of how the box and whiskers are constructed and how it decides which points are outliers that should be displayed as individual points.

Let’s add a colour aesthetic so that the box for each melanoma type is drawn in a different colour.

ggplot(mel_mort, aes(x = melanoma_type, y = mortality_treatment_time, color = melanoma_type)) +
  geom_boxplot() 

Try It Yourself

Create a box plot showing the duration of response to BRAF/MEK inhibitors by type of BRAF/MEK therapy. Make sure to include appropriate axis labels.

Hint: use prior_treatment instrument.

ggplot(immuno_dataset$redcap_data$prior_treatment, 
       aes(x = braf_mek, y = dur_braf_mek)) +
  geom_boxplot() + 
  labs(
    x = "Type of BRAF/MEK therapy",
    y = "Duration of Response to BRAF/MEK inhibitor (months)"
  )

Violin plot

A violin plot is used to visualise the distribution of a numeric variable across different categories. It combines aspects of a box plot and a kernel density plot.

The width of the violin at any given point represents the density of data at that point. Wider sections indicate a higher density of data points, while narrower sections indicate lower density. By default, violin plots are symmetric.
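Because a violin shows the density but not the quartiles, a narrow box plot is sometimes drawn inside it. The sketch below does this on simulated data (toy); the fill colour and the box width of 0.1 are arbitrary choices.

```r
library(ggplot2)

set.seed(42)  # reproducible simulated data
toy <- data.frame(
  group = rep(c("A", "B"), each = 50),
  value = c(rnorm(50, mean = 10, sd = 2), rnorm(50, mean = 14, sd = 3))
)

# Violin for the density, with a narrow box plot layered on top
violin_box <- ggplot(toy, aes(x = group, y = value)) +
  geom_violin(fill = "grey90") +
  geom_boxplot(width = 0.1)

violin_box
```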

ggplot(mel_mort, aes(x = melanoma_type, y = mortality_treatment_time, color = melanoma_type)) +
    geom_violin()

Try It Yourself

Create a violin plot showing the time from commencing treatment to best response by type of best response to PET. Make sure to include appropriate axis labels.

Hint: use response_data instrument.

ggplot(immuno_dataset$redcap_data$response_data, 
       aes(y = time_to_best_response, x = best_res_pet)) +
  geom_violin() + 
  labs(
    x = "Best response on PET",
    y = "Time from commencing treatment to best response (days)"
  )

Histogram

The geom for creating histograms is, rather unsurprisingly, geom_histogram().

ggplot(immuno_dataset$redcap_data$response_data) +
  geom_histogram(aes(x = time_to_best_response))
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 155 rows containing non-finite outside the scale range
(`stat_bin()`).

The message printed with the plot suggests choosing a more suitable bin width than the default of 30 bins by specifying the binwidth argument.

ggplot(immuno_dataset$redcap_data$response_data) +
  geom_histogram(aes(x = time_to_best_response), binwidth = 5)
Warning: Removed 155 rows containing non-finite outside the scale range
(`stat_bin()`).

Or we can set the number of bins.

ggplot(immuno_dataset$redcap_data$response_data) +
  geom_histogram(aes(x = time_to_best_response), bins = 20)
Warning: Removed 155 rows containing non-finite outside the scale range
(`stat_bin()`).
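The two arguments are alternative ways of describing the same binning. This sketch uses simulated data (toy) and layer_data() to look at the bars that each version produces.

```r
library(ggplot2)

set.seed(7)  # reproducible simulated data
toy <- data.frame(days = rexp(200, rate = 1 / 60))

# Fix the width of each bin; the number of bins follows from the data range
by_width <- ggplot(toy) +
  geom_histogram(aes(x = days), binwidth = 20)

# Fix the number of bins; the width follows from the data range
by_count <- ggplot(toy) +
  geom_histogram(aes(x = days), bins = 20)

width_bins <- layer_data(by_width)
count_bins <- layer_data(by_count)

unique(round(width_bins$xmax - width_bins$xmin, 6))  # every bar is 20 units wide
sum(width_bins$count)  # all 200 values are counted
sum(count_bins$count)  # all 200 values are counted
```

If both arguments are supplied, binwidth takes precedence over bins.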

These histograms are not very pleasing, aesthetically speaking - how about some better aesthetics?

ggplot(immuno_dataset$redcap_data$response_data) +
  geom_histogram(
    aes(x = time_to_best_response), 
    bins = 20, 
    colour = "darkblue", 
    fill = "grey")
Warning: Removed 155 rows containing non-finite outside the scale range
(`stat_bin()`).

Try It Yourself

Create a histogram of pituitary size using data from the ctmri_imaging instrument. Add color to the bars and a distinct color to the borders. Include clear and appropriate axis labels.

ggplot(immuno_dataset$redcap_data$ctmri_imaging, 
       aes(x = pit_size)) +
  geom_histogram(binwidth = 1, colour = "darkgreen", fill = "lightgreen") + 
  labs(
    x = "Pituitary size (mm)",
    y = "Counts"
  )

Density plot

Density plots are used to visualise the distribution of a continuous variable in a dataset. They are essentially smoothed histograms, scaled so that the area under the curve for each sub-group equals 1. This allows us to compare the shapes of the distributions of sub-groups of different sizes.
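The unit-area property can be checked numerically. This sketch uses base R's density() on two simulated groups of very different sizes; area_under() is a helper written here that approximates the area with a simple sum.

```r
set.seed(123)  # reproducible simulated data
small_group <- rnorm(30,  mean = 50, sd = 10)
large_group <- rnorm(300, mean = 70, sd = 15)

# Helper (defined here): approximate the area under a kernel density estimate
area_under <- function(x) {
  d <- density(x)            # evaluated on an evenly spaced grid
  sum(d$y) * mean(diff(d$x)) # sum of heights times the grid spacing
}

area_small <- area_under(small_group)
area_large <- area_under(large_group)

area_small  # close to 1 despite only 30 values
area_large  # close to 1 with 300 values
```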

ggplot(immuno_dataset$redcap_data$response_data) +
  geom_density(aes(x = time_to_best_response, colour = best_res_pet))
Warning: Removed 155 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Groups with fewer than two data points have been dropped.
Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
-Inf

Saving plot images

Use ggsave() to save the last plot you displayed.

ggsave("time_to_best_response_density_plot.png")

You can alter the width and height of the plot and can change the image file type.

ggsave("time_to_best_response_density_plot.pdf", width = 20, height = 12, units = "cm")

You can also pass in a plot object you have created instead of using the last plot displayed. See the help page (?ggsave) for more details.
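As a sketch of saving a stored plot object, the example below builds a plot from a made-up data frame (toy) and writes it to a temporary folder; the file name and dimensions are arbitrary choices.

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(
  weight = c(55, 62, 70, 81, 94),
  bmi    = c(20.1, 22.4, 24.8, 27.3, 31.0)
)

scatter <- ggplot(toy, aes(x = weight, y = bmi)) +
  geom_point()

# Write to a temporary folder so nothing is left in the working directory
out_file <- file.path(tempdir(), "toy_scatter.pdf")

# Passing plot = ... saves this object rather than the last plot displayed
ggsave(filename = out_file, plot = scatter, width = 12, height = 8, units = "cm")

file.exists(out_file)
```

The file type is taken from the extension of the file name, so changing .pdf to .png or .jpeg changes the output format.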

Try It Yourself

Assign the variable name cortisol_mort_time_plt to the scatter plot you created before. Save this plot as a jpeg file.

cortisol_mort_time_plt <- ggplot(cortisol_mort_time, aes(x = mortality_treatment_time, y = cortisol)) + 
  geom_point(aes(colour = melanoma_type), size = 2, alpha = 0.7) +
  geom_smooth() +
  labs(
    title = "Cortisol vs Time since first treatment dose",
    y = "Cortisol (nmol/L)",
    x = "Time since first treatment dose (months)",
    colour = "Melanoma Type" # legend title
  ) + 
  theme_grey()
ggsave(plot = cortisol_mort_time_plt, filename = "cortisol_vs_mort_time_plot.jpeg")