Basics of R Programming Language

R!

R is a powerful programming language and open-source software widely used for statistical computing and data analysis. It was developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand. R has gained popularity among statisticians, data scientists, researchers, and analysts for its flexibility, extensibility, and robust statistical capabilities.

Why learn R?

Here are several compelling reasons to consider learning R:

Statistical Analysis
Data Visualisation
Open Source
Community Support
Extensibility
Integration with Other Languages
Data Science and Machine Learning
Widely Used in Academia and Industry
Continuous Development

Getting Started with R

To begin working with R, users typically install an Integrated Development Environment (IDE) such as RStudio, which provides a user-friendly interface for coding, debugging, and visualising results. R scripts are written in the R language and can be executed interactively or saved for later use.

A look around RStudio

Open RStudio. You will see four windows (aka panes). Each window has a different function. The screenshot below shows an analogy linking the different RStudio windows to cooking.

Console Pane

On the left-hand side, you’ll find the console. This is where you type commands (code that R can interpret) and where R displays its responses, known as output. While the console is handy for experimenting with code, the commands you type there are not saved as a reusable script. Therefore, relying exclusively on the console is not recommended.

History Pane

The history pane (located in the top right window) maintains a record of the commands that you have executed in the R console during your current R session. This includes both correct and incorrect commands.

You can navigate through your command history using the up and down arrow keys in the console. This allows you to quickly recall and re-run previous commands without retyping them.

Environment Pane

The environment pane (located in the top right window) provides an overview of the objects (variables, data frames, etc.) that currently exist in your R session. It displays the names, types, dimensions, and some content of these objects. This allows you to monitor the state of your workspace in real-time.

Plotting Pane

The plotting pane (located in the bottom right window) is where graphical output, such as plots and charts, is displayed when you create visualisations in R. The plotting pane often includes tools for zooming, panning, and exporting plots, providing additional functionality for exploring and customising your visualisations.

Help Pane

The help pane (located in the bottom right window) is a valuable resource for accessing documentation and information about R functions, packages, and commands. When you type a function or command in the console and press the F1 key (Mac: fn + F1) the Help pane displays relevant documentation. Additionally, you can type a keyword in the text box at the top right corner of the Help Pane.

Files Pane

The files pane provides a file browser and file management interface within RStudio. It allows you to navigate through your project directories, view files, and manage your file system.

Packages Pane

This pane provides a user-friendly interface for managing R packages. It lists installed packages and allows you to load, unload, update, and install packages.

Viewer Pane

The viewer pane is used to display dynamic content generated by R, such as HTML, Shiny applications, or interactive visualisations.

Working directory

Opening an RStudio session launches it from a specific location. This is the working directory. R looks in the working directory by default to read in data and save files. You can find out what the working directory is by using the command getwd(), which prints the path to your working directory in the console. On a Mac this is in the format /path/to/working/directory and on Windows C:/path/to/working/directory (R prints Windows paths with forward slashes). It is often useful to have your data and R scripts in the same directory and set this as your working directory. We will do this now.

Make a folder for this course somewhere on your computer that you will be able to easily find. Name the folder, for example, Intro_R_REDCap_course. Then, to set this folder as your working directory:

In RStudio click on the Files tab and then click on the three dots, as shown below.

In the window that appears, find the folder you created (e.g. Intro_R_REDCap_course), click on it, then click Open. The files tab will now show the contents of your new folder. Click on More → Set As Working Directory, as shown below.

Note: You can use an RStudio project as described here to automatically keep track of and set the working directory.
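If you prefer code to menus, you can carry out the same steps with getwd() and setwd(). A minimal sketch follows. It creates the course folder inside tempdir() only so that it runs on any computer. Treat that path as a placeholder and point setwd() at the folder you actually created.

```r
# Show the current working directory
getwd()

# For illustration only: build a course folder inside R's temporary directory.
# Replace this with the real path to your own course folder.
course_dir <- file.path(tempdir(), "Intro_R_REDCap_course")
dir.create(course_dir, showWarnings = FALSE)

# Equivalent to Files pane → More → Set As Working Directory
setwd(course_dir)

# Confirm that the working directory has changed
getwd()
```

A hard-coded setwd() path only works on one computer, so RStudio projects are usually the more portable choice.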

R Scripts

In RStudio, the Script pane (located at the top left window) serves as a dedicated space for writing, editing, and executing R scripts. It is where you compose and organise your R code, making it an essential area for creating reproducible and well-documented analyses.

RStudio provides syntax highlighting in the Script pane, making it easier to identify different components of your code. You can execute individual lines or selections of code from the Script pane. This helps in testing and debugging code without running the entire script.

Quarto Document

Quarto is an open-source scientific and technical publishing system that allows you to combine text, code, and output in a single document. It is the next-generation version of RMarkdown and is widely used for reproducible research, dynamic reports, and interactive documents.

With Quarto, you can:

Write reports that integrate R code and results
Create interactive documents (HTML, PDF, Word, and more)
Publish research outputs with dynamic figures and tables

Why use Quarto?

Reproducibility: combines analysis and documentation in one file
Flexible outputs: generate HTML, PDF, Word, and presentations
Works with R, Python, and Julia
Supports Markdown syntax: easy formatting for text and visuals

In this workshop, we will be using Quarto documents to write R code.

Getting Started with a Quarto Document

Follow these steps to create a new Quarto document in RStudio:

Open a New Quarto Document

Open RStudio
Go to File → New File → Quarto Document
A dialog box will appear:
- Title: Enter “Analysing REDCap Data using R” as the document title
- Format: Leave default format as HTML
- Engine: Leave default engine as knitr
Click Create.

This creates a new .qmd file in RStudio, which is a Quarto document.

Save the File

Click File → Save As
Choose a meaningful filename, e.g., introR_workshop.qmd
Click Save

Understanding the Structure of a Quarto Document

A Quarto document consists of three main sections:

YAML Header (Metadata Section)

This section sits at the very top of the file, is enclosed between two lines of three dashes (---), and contains metadata. Example:

---
title: "My First Quarto Document"
author: "John Doe"
date: "2025-01-30"
format: html
---

Common YAML options:

title: Document title
author: Name of the author
date: Date of the document
format: Output type (HTML, PDF, Word, etc.)

Text and Markdown (Narrative Section)

Quarto supports Markdown, a simple way to format text.

Headings:

# Main Heading
## Subheading
### Smaller Heading

Bold and Italic Text:
```
**Bold Text**
*Italic Text*
```
Lists:
```
-   Bullet Point 1
-   Bullet Point 2
```

Hyperlinks and Images:

[Click here for Quarto docs](https://quarto.org/)
![RStudio Logo](https://www.rstudio.com/wp-content/uploads/2014/04/rstudio-logo.png)

Code Blocks (Executable Section)

Quarto allows you to insert code chunks that run R code inside your document.

Example R Code Chunk:

```{r}
# Example calculation
x <- c(1, 2, 3, 4, 5)
sum(x)
```

To insert a code chunk, go to Code → Insert Code Chunk in the menu, or use the keyboard shortcut Ctrl + Alt + I (Windows/Linux) or ⌘ + Option + I (Mac). Code is written inside triple backticks, and it is executed when you render the document.

Running and Rendering a Quarto Document

To run a single code chunk click the Run button at the top of the chunk or use the keyboard shortcut Windows/Linux: Ctrl + Shift + Enter or Mac: ⌘ + Shift + Enter.

To generate an output file (HTML, PDF, or Word), click the Render button in RStudio. The document compiles and opens the rendered file.

Tip

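If PDF output fails, install TinyTeX for LaTeX support. The tinytex R package provides the installer, and install_tinytex() then downloads the TinyTeX distribution itself:

install.packages("tinytex")
tinytex::install_tinytex()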

Keyboard Shortcuts in Quarto (Windows & Mac)

Action	Windows/Linux	Mac
Run a single code line	`Ctrl + Enter`	`Cmd + Enter`
Run a single code chunk	`Ctrl + Shift + Enter`	`Cmd + Shift + Enter`
Run all chunks above	`Ctrl + Alt + P`	`Cmd + Option + P`
Render (Knit) document	`Ctrl + Shift + K`	`Cmd + Shift + K`
Insert a new code chunk	`Ctrl + Alt + I`	`Cmd + Option + I`
Comment/uncomment a line	`Ctrl + Shift + C`	`Cmd + Shift + C`
Insert a code section	`Ctrl + Shift + R`	`Cmd + Shift + R`
Show document outline	`Ctrl + Shift + O`	`Cmd + Shift + O`
Restart R session	`Ctrl + Shift + F10`	`Cmd + Shift + F10`

Comments

In R, any text following the hash symbol # is termed a comment. R disregards this text, considering it non-executable. Comments serve the purpose of documenting your code, aiding your future understanding of specific lines, and highlighting the intentions or challenges encountered.

RStudio makes it easy to comment or uncomment a block of code. Select the lines you want to comment, or place the cursor anywhere on a line to comment just that line. Then press ⌘ + Shift + C (Mac) or Ctrl + Shift + C (Windows/Linux).

Extensive use of comments is encouraged throughout this course.

# This is a comment. Ignored by R. But useful for me!

Executing Commands

Executing commands or running code is the process of submitting a command to your computer, which does some computation and returns an answer. In RStudio, there are several ways to execute commands:

Select the line(s) of code using the mouse, and then click Run at the top right corner of the R text file.
Select Run Lines from the Code menu.
Click anywhere on the line of code and click Run.
Select the line(s) you want to run. Press ⌘ + Return (Mac) or Ctrl + Return (Windows/Linux) to run the selected code.

We suggest the last option, the keyboard shortcut, which is the fastest. This link provides a list of useful RStudio keyboard shortcuts that can be beneficial when coding and navigating the RStudio IDE.

When you type and run the commands shown in the grey boxes below, you should see the result in the Console pane at the bottom left or, in a Quarto document, directly below the code chunk.

Simple Maths in R

We can use R as a calculator to do simple maths.

3 + 5

[1] 8
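R supports the usual arithmetic operators beyond addition. The values below are arbitrary and chosen only to illustrate each operator.

```r
7 - 2      # subtraction: 5
6 * 4      # multiplication: 24
10 / 4     # division: 2.5
2^3        # exponent (power): 8
10 %/% 3   # integer division: 3
10 %% 3    # remainder (modulo): 1

# The standard order of operations applies; use brackets to change it
3 + 5 * 2    # 13
(3 + 5) * 2  # 16
```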

More complex mathematical functions are built into R, which is one reason it is popular among mathematicians and statisticians. To use these functions, we call them by name, as described in Calling Functions.

Try It Yourself

Add an R code chunk and find the result of the equation: \[ \frac{3^2 \times 8^3}{10 + 5} - 120 \]

Solution

(3^2 * 8^3)/(10 + 5) - 120

[1] 187.2

Calling Functions

R has a large collection of built-in functions that are called like this:

function_name(argument1 = value1, argument2 = value2, ...)

Let’s explore using the seq() function to create a series of numbers.

Start by typing se and then press Tab. RStudio will suggest possible completions. Select seq() by typing more or by using the up/down arrows. A tooltip will pop up, reminding you of the function’s arguments. If you need more assistance, press F1 (Windows/Linux) or fn + F1 (Mac) to access the full documentation in the Help tab at the lower right.

Now, type the arguments 1, 10 and press Return.

seq(1, 10)

 [1]  1  2  3  4  5  6  7  8  9 10

You can explicitly specify arguments using the name = value format. However, if you don’t, R will try to resolve them based on their position.

seq(from = 1, to = 10)

 [1]  1  2  3  4  5  6  7  8  9 10

In this example, it assumes that we want a sequence starting from 1 and ending at 10. Since we didn’t mention the step size, it defaults to the value defined in the function, which is 1 in this case.

seq(from = 1, to = 10, by = 2)

[1] 1 3 5 7 9

If you are using name = value format the order of the arguments does not matter.

seq(to = 10, by = 2, from = 1)

[1] 1 3 5 7 9

For frequently used functions, I might rely on positional resolution for the first one or two arguments. However, beyond that, I prefer to use the name = value format for clarity and precision.
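To see every argument a function accepts, together with its defaults, use args(), which prints the function's signature. Because seq() is a generic function, its detailed argument list belongs to seq.default(). The sketch also uses length.out, another seq() argument, which asks for a fixed number of evenly spaced values instead of a step size.

```r
# Print the arguments (and their defaults) of the default seq() method
args(seq.default)

# Ask for 4 evenly spaced values between 1 and 10, instead of giving a step size
seq(from = 1, to = 10, length.out = 4)   # 1 4 7 10
```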

To take the log of 100:

log(x = 100, base = 10)

[1] 2

To take the square root of 100:

sqrt(100) # this is the short-hand of sqrt(x = 100)

[1] 10

Notice that the square root function is abbreviated to sqrt(). Abbreviations make R code faster to write. The drawback is that some function names are hard to remember or to interpret.
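Here are a few more abbreviated built-in functions you are likely to meet. This is an illustrative selection rather than a complete list. Note that log() uses the natural logarithm (base e) unless you supply base, which is why the earlier example passed base = 10.

```r
exp(1)                      # e to the power 1: 2.718282
log(100)                    # natural log (base e) when base is not given: 4.60517
log10(100)                  # base-10 log: 2
log2(8)                     # base-2 log: 3
abs(-3)                     # absolute value: 3
round(3.14159, digits = 2)  # round to 2 decimal places: 3.14
```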

Try It Yourself

Find the sum of the (natural) log of the square root of each value in the sequence 10, 20, 30, …, 100.

Solution

sum(log(sqrt(seq(10, 100, 10))))

[1] 19.06513

Getting Help

In R, the ? and ?? operators are used for accessing help documentation, but they behave slightly differently.

The ? operator is used to access help documentation for a specific function or topic. When you type ? followed by the name of a function, you get detailed information about that function. For example, try:

?mean


mean	R Documentation

Arithmetic Mean

Description

Generic function for the (trimmed) arithmetic mean.

Usage

mean(x, ...)

## Default S3 method:
mean(x, trim = 0, na.rm = FALSE, ...)

Arguments

`x`	An R object. Currently there are methods for numeric/logical vectors and date, date-time and time interval objects. Complex vectors are allowed for `trim = 0`, only.
`trim`	the fraction (0 to 0.5) of observations to be trimmed from each end of `x` before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.
`na.rm`	a logical evaluating to `TRUE` or `FALSE` indicating whether `NA` values should be stripped before the computation proceeds.
`...`	further arguments passed to or from other methods.

Value

If trim is zero (the default), the arithmetic mean of the values in x is computed, as a numeric or complex vector of length one. If x is not logical (coerced to numeric), numeric (including integer) or complex, NA_real_ is returned, with a warning.

If trim is non-zero, a symmetrically trimmed mean is computed with a fraction of trim observations deleted from each end before the mean is computed.

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Examples

x <- c(0:10, 50)
xm <- mean(x)
c(xm, mean(x, trim = 0.10))

The above command displays the help documentation for the mean function, providing information about its usage, arguments, and examples.

The ?? operator is used for a broader search across help documentation. It performs a search for the specified term or keyword in the documentation.

??regression

This will search for the term “regression” in the help documentation and return relevant results. It’s useful when you want to find functions, packages, or topics related to a specific term.
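A related tool, not covered above, is apropos() from the built-in utils package. It searches the names of functions and other currently available objects (not their documentation) for a pattern, which helps when you half-remember a function's name. For reference, ??regression is shorthand for help.search("regression").

```r
# List objects whose names start with "mean" (the pattern is a regular expression)
apropos("^mean")
```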

Tab completion

A very useful feature is Tab completion. You can start typing and use Tab to autocomplete code, for example, a function name.

Try It Yourself

Check the help page of the log function.

Solution

help(log)

R Packages

Many developers have built thousands of functions and shared them with the R user community to help make everyone’s work easier and more efficient. These functions (short programs) are generally packaged up together in (wait for it) Packages. For example, the tidyverse package is a collection of packages whose functions all help with data transformation and visualisation. Packages also contain data, which is often included to assist new users with learning the available functions.

Installing Packages

Packages are hosted on repositories, with CRAN (Comprehensive R Archive Network) being the primary repository. To install packages from CRAN, you use the install.packages() function. For example:

install.packages("tidyverse")

This will spit out a lot of text into the console as the package is being installed. Once complete you should have a message:

The downloaded binary packages are in... followed by a long directory name.

To remove an installed package:

remove.packages("tidyverse")
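Reinstalling a package you already have wastes time, so a common pattern is to install only what is missing. The helper below, install_if_missing(), is a hypothetical name introduced for illustration. It relies on requireNamespace(), which returns FALSE (rather than failing) when a package is not installed.

```r
# Hypothetical helper: install only the packages that are not yet installed
install_if_missing <- function(pkgs) {
  # TRUE for each package that is already installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!installed]
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)  # return the names that needed installing
}

# Usage: installs tidyverse and REDCapR only if they are not already installed
# install_if_missing(c("tidyverse", "REDCapR"))
```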

Loading Packages

After installation, you need to load a package into your R session using the library() function. For example:

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

This makes the functions and datasets from the ‘tidyverse’ package available for use in your current session.

Tip

You only need to install a package once. Once installed, you don’t need to reinstall it in subsequent sessions. However, you do need to load the package at the beginning of each R session using the library() function before you can utilise its functions and features. This ensures that the package is actively available for use in your current session.

To view packages currently loaded into memory:

(.packages())

 [1] "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"     "readr"    
 [7] "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"     "graphics" 
[13] "grDevices" "utils"     "datasets"  "methods"   "base"

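To see the search path, that is, the attached packages plus other environments such as your workspace (.GlobalEnv):

search()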

 [1] ".GlobalEnv"        "package:lubridate" "package:forcats"  
 [4] "package:stringr"   "package:dplyr"     "package:purrr"    
 [7] "package:readr"     "package:tidyr"     "package:tibble"   
[10] "package:ggplot2"   "package:tidyverse" "package:stats"    
[13] "package:graphics"  "package:grDevices" "package:utils"    
[16] "package:datasets"  "package:methods"   "Autoloads"        
[19] "package:base"

Package Documentation

Each package comes with documentation that explains how to use its functions. You can access this information using the help() function or by using ? before the function name:

help(tidyverse)


tidyverse-package

R Documentation

tidyverse: Easily Install and Load the ‘Tidyverse’

Description

The ‘tidyverse’ is a set of packages that work in harmony because they share common data representations and ‘API’ design. This package is designed to make it easy to install and load multiple ‘tidyverse’ packages in a single step. Learn more about the ‘tidyverse’ at https://www.tidyverse.org.

Author(s)

Maintainer: Hadley Wickham hadley@rstudio.com

Other contributors:

RStudio [copyright holder, funder]
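Besides help for a single topic, help() can also open a package's full index of help pages. The short sketch below uses the base stats package so that it runs without extra installs. Swap in "tidyverse" or any other installed package name.

```r
# Open the index of all help pages in a package (stats is installed with R)
help(package = "stats")

# Report which version of a package is installed
packageVersion("stats")

# Help for one function from a specific package, using package::function
?stats::sd
```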

Variables

A variable is a bit of a tricky concept, but it is very important for understanding R. Essentially, a variable is a symbol that we use in place of another value. Usually the other value is a larger/longer form of data. We can tell R to store a lot of data, for example, in a variable named x. When we execute the command x, R returns all of the data that we stored there.

For now however we’ll just use a tiny data set: the number 5. To store some data in a variable, we need to use a special symbol <-, which in our case tells R to assign the value 5 to the variable x. This is called the assignment operator. To insert the assignment operator press Option + - (Mac) or Alt + - (Windows/Linux).

Let’s see how this works.

Create a variable called x, that will contain the number 5.

x <- 5

R won’t return anything in the console, but note that you now have a new entry in the environment pane. The variable name (x) is displayed on the left, and the value stored in that variable (5) is displayed on the right.

We can now use x in place of 5:

x + 10

[1] 15

x * 3

[1] 15

Variables are sometimes referred to as objects. In R there are different conventions about how to name variables, but most importantly, variable names:

cannot begin with a number
should begin with a letter
cannot contain spaces
are case sensitive (x and X are different variables)
can be almost anything, but it’s best to use a name that makes sense to you and will likely make sense to others who may read your code.

It is wise to adopt a consistent convention for separating words in variable names.

For example:

# i_use_snake_case
# other.people.use.periods
# evenOthersUseCamelCase
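You can check the naming rules above directly in the console. In this sketch, the variable names age and Age are arbitrary examples. make.names() is a base R function that shows how R would convert an invalid name into a valid one.

```r
age <- 30
Age <- 45      # a different variable, because names are case sensitive
age == Age     # FALSE

# A name cannot start with a number; this line would be a syntax error,
# so it is commented out:
# 2nd_visit <- 10

# make.names() shows how R would turn an invalid name into a valid one
make.names("2nd visit")   # "X2nd.visit"
```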

Try It Yourself

Assign the value 5 to a variable a and the value 10 to a variable b. Then, create a new variable sum that stores the result of adding a and b, and print the value of sum.

Solution

a <- 5
b <- 10
sum <- a + b
sum

[1] 15

The Pipe Operator (`|>` or `%>%`)

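The pipe operator is a commonly used feature of tidyverse-style code. The original pipe, %>%, was defined in the (cleverly named) magrittr package and is also made available by the dplyr and tidyverse packages. Since version 4.1.0, R has its own built-in (native) pipe, |>, which needs no package. This workshop uses |>. The pipe symbol can seem confusing and intimidating at first. However, once you understand the basic idea, it can become addicting!

We suggest you use a shortcut: ⌘ + Shift + M (Mac) or Ctrl + Shift + M (Windows/Linux). By default this shortcut inserts %>%. To make it insert |> instead, tick “Use native pipe operator, |>” under Tools → Global Options → Code.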

The |> symbol is placed between a value on the left and a function on the right. The |> simply takes the value to the left and passes it to the function on the right as the first argument. It acts as a “pipe”. That’s it!

Suppose we have a variable, x.

x <- 7

The following two commands are equivalent:

sqrt(x)

[1] 2.645751

x |> sqrt()

[1] 2.645751
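Because the pipe fills in the first argument, any other arguments are written inside the parentheses as usual. Here is a short sketch that reuses x <- 7. The native |> pipe requires R 4.1.0 or later.

```r
x <- 7

# The piped value becomes the first argument, here seq()'s `from`
x |> seq(to = 10)        # same as seq(7, to = 10): 7 8 9 10

# Other arguments go inside the parentheses as usual
x |> log(base = 2)       # same as log(7, base = 2): 2.807355
```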

We’ll continue to use |> throughout this tutorial to show how useful it can be for chaining various data manipulation steps during an analysis.

Chaining functions

Chaining allows you to streamline your data analysis workflow by applying multiple operations to your data in sequence using the pipe operator |>. We often need to perform several data manipulation or analysis steps one after another, and chaining lets you express them in a clear and concise manner.

Here’s a basic template for chaining operations using the pipe operator |>:

result <- data |>
    operation1(...) |>
    operation2(...) |>
    operation3(...) |>
    ... |>
    operationN(...)

In this template:

data represents the input data frame or object.
operation1, operation2, …, operationN represent the functions or operations you want to apply sequentially to the data, for example the select(), filter() or mutate() functions.
... represents any additional arguments or parameters that may be passed to each operation.

Each operation takes the output of the previous operation as its input, making it easy to chain multiple operations together. This improves the readability of your code by organising operations in a left-to-right fashion and it avoids creating intermediate variables to store the results of each operation.
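To make the template concrete, here is a small sketch using R's built-in mtcars data. So that it runs without installing anything, it uses base R's subset() and transform() as stand-ins for dplyr's filter()/select() and mutate(). The column name wt_kg is made up for the example.

```r
# Keep 4-cylinder cars and two columns, then add a weight column in kilograms.
# In mtcars, wt is in units of 1000 lb; 1 lb is about 0.4536 kg.
four_cyl <- mtcars |>
  subset(cyl == 4, select = c(mpg, wt)) |>
  transform(wt_kg = wt * 1000 * 0.4536)

head(four_cyl)

# The same result without the pipe needs nesting, which reads inside-out
four_cyl_nested <- transform(subset(mtcars, cyl == 4, select = c(mpg, wt)),
                             wt_kg = wt * 1000 * 0.4536)
identical(four_cyl, four_cyl_nested)  # TRUE
```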

Try It Yourself

Find the sum of the (natural) log of the square root of each value in the sequence 10, 20, 30, …, 100, this time by chaining functions.

Solution

seq(10, 100, 10) |> sqrt() |> log() |> sum()

[1] 19.06513

Clearing the Environment

Take a look at the objects you have created in your workspace. They accumulate in the Environment pane in the upper right corner of RStudio.

You can obtain a list of objects in your workspace using a couple of different R commands:

objects()

Output

[1] "a"   "b"   "sum" "x"

ls()

Output

[1] "a"   "b"   "sum" "x"

If you wish to remove a specific object, let’s say a, you can use the following command:

rm(a)

To remove all objects:

rm(list = ls())
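rm() also accepts several object names at once, and exists() can confirm that an object is gone. Here is a short sketch. The objects a, b and x are throwaway examples, and the last step clears the whole workspace.

```r
a <- 5
b <- 10
x <- 7

rm(a, b)                       # remove several named objects at once
exists("a", inherits = FALSE)  # FALSE: a is gone from the workspace
exists("x", inherits = FALSE)  # TRUE: x is still there

rm(list = ls())                # remove everything
length(ls())                   # 0: the workspace is now empty
```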

Alternatively, you can click the broom icon in RStudio’s Environment pane to clear everything.

For the sake of reproducibility, it’s crucial to regularly delete your objects and restart your R session. This ensures that your analysis can be replicated next week or even after upgrading your operating system. Restarting your R session helps identify and address any dependencies or configurations needed for your analysis to run successfully.

Case Study: Immunotherapy Dataset

In this workshop, we are using a dummy Immunotherapy dataset on REDCap filled with randomly generated data, so in some cases the data may not make sense. However, it will be useful for learning how to import data into R, manipulate it, and create basic visualisations.

This dataset contains 15 instruments or forms namely: Demographics, Melanoma Data, Adjuvant Therapy, Systemic Therapy for Advanced Disease, Melanoma CNS Metastases, Adverse Events, Baseline Visit, Checkpoint Inhibitor Treatment, Immune Related Adverse Events (irAEs), Pathology, PET irAE Imaging, PPI and Antibiotic use during treatment with CPIs, Response Data, and Mortality Data.

Importing REDCap Data

REDCap API

REDCap (Research Electronic Data Capture) provides an Application Programming Interface (API) that allows users to programmatically access and interact with their project data. The API enables automation of data retrieval, updates, and exports, reducing manual effort and ensuring reproducibility in data analysis.

Peter Mac REDCap Instance

https://redcap.petermac.org.au

What is the REDCap API?

The REDCap API is a web-based service that allows users to interact with REDCap programmatically. Instead of manually downloading CSV files, users can use the API to:

Retrieve records from a REDCap project
Import new data or update existing records
Export metadata (variable names, field types)
Pull longitudinal and repeating instrument data
Generate reports dynamically

The API facilitates automated data retrieval, making it a powerful tool for integrating REDCap data into R-based workflows.

Example Use Case

A researcher can schedule a daily script in R to pull the latest REDCap data for real-time analysis instead of manually exporting files from the web interface.

Requesting an API Token in REDCap

To access the API, users must obtain an API token, which is a unique, secure key that authenticates requests. REDCap provides API tokens at the user-project level. This means that if three users on the same project need to use the API, each user will need to individually request an API token. Similarly, if one user wants an API token for three different projects, they will need to request an API token for each project.

Steps to Request an API Token for a Project:

Log in to REDCap and navigate to your project.
If your REDCap project has API access enabled, you will see it in the applications on the left side of the screen as follows. Otherwise contact REDCapServiceDesk@petermac.org.

Click on “API” under the Applications menu.
Click “Request API token” to send a token request to the REDCap administrative team.
Your REDCap administrator will review and approve your request.
Once approved, you will receive a unique API token (a long alphanumeric string).

Important

Keep your API token private and never share it. It grants full access to your REDCap project data.

Using the REDCap API

The best way to familiarise yourself with the REDCap API is to explore the API Playground.

Click on the “API Playground” link from the left-hand menu under “Applications.”
Once in the API Playground, there is a blue box with a dropdown menu labeled “API Method.” This dropdown includes all the API actions REDCap can take.
Note: If a project is in production, the methods listed in this dropdown will be limited so as not to affect real data in the project. This is noted in the green text under the “API Method” dropdown.

Select the method you need from the dropdown menu and complete the additional information. The additional information (e.g., “Format”, “Instrument”, etc.) will vary depending on which API method you choose and the project structure. In the above example, the researcher is asking to export project information as a CSV.
Note: To see all the API functions REDCap is capable of, and export a .zip file of sample code, click on the “REDCap API documentation” link that is available on both the “API” page and in the “API Playground.”
When you scroll further down the page, there is an open text box with a series of tabs on the top, with each tab corresponding to a coding language. Each tab will provide the API code in the indicated language.

To execute a real API request, click the “Execute Request” button, and it will display the API response in a textbox as follows.

Use the “Execute Request” button in the API Playground with a great amount of caution. It performs the API action you have configured for real, so methods that import, update, or delete records will change the data in your project.

Security Considerations for API Access

Since the API token provides direct access to your REDCap project, it must be handled securely.

Best Practices for API Security:

Never share your API token with anyone, and never hardcode it in scripts.
Do not test API tokens in browsers. Using an API token in plain text within a script is insecure. An API token should be encrypted within a script, be called via secure environment variables, or otherwise be accessible from the script via other secure mechanisms.
Before you share code anywhere, remove your API token.
Enable logging and monitor API access regularly.
Revoke unused API tokens if they are no longer needed.
Regenerate your API token every 90 days, or at any point that you think your token has been compromised. To regenerate your token, go to the API page and select “Regenerate token.” If you are no longer using the API functionality on your project, delete your token.

Using an Environment Variable for API Token in R

You can save your API keys into a “hidden” file containing code that runs when you start R. That file is called the “.Renviron”. It can be a bit of a pain to find this file. So the best option is to install the usethis package, which contains helper functions, including a function to find this file.

install.packages("remotes")
remotes::install_cran("usethis")

When it comes to adding packages to your copy of R, the install_cran() function in the remotes package is better than the usual install.packages() function because it first checks whether you already have the latest version before bothering to download and install it.

After installing usethis you can access your “.Renviron” file by typing this in your console.

usethis::edit_r_environ()

It will cause the file to open. Create a name for your API key (for example: rcap_immuno_key) and add a line like this to your .Renviron file:

rcap_immuno_key="your_api_token_here"

Copy the API token for this project from the API page on the REDCap website (see Requesting an API Token in REDCap) and paste it into the .Renviron file in place of your_api_token_here, keeping the surrounding quotation marks.

After adding the line, remember to save the file and completely restart R/RStudio. Once R restarts, you can access the key like this:

api_token <- Sys.getenv("rcap_immuno_key")
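When a variable is missing, Sys.getenv() returns an empty string rather than an error, which can lead to confusing API failures later. A minimal sketch of a guard follows. get_redcap_token() is a hypothetical helper name, and its default variable name matches the rcap_immuno_key example above.

```r
# Hypothetical helper: read a REDCap API token from an environment variable
# and stop with a clear message if it has not been set
get_redcap_token <- function(var = "rcap_immuno_key") {
  token <- Sys.getenv(var)      # returns "" when the variable is not set
  if (!nzchar(token)) {
    stop("Environment variable '", var, "' is empty. ",
         "Add it to .Renviron with usethis::edit_r_environ() and restart R.")
  }
  token
}

# Usage:
# api_token <- get_redcap_token()
```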

Once you have an API token, you can test whether it works using httr in R.

Example: Checking Project Information

library(httr)  # provides POST() and content(), called below as httr::POST() etc.

# Define API URL and Token
url <- "https://redcap.petermac.org.au/api/"
token <- Sys.getenv("rcap_immuno_key")  # Load token securely

# Test API connection: request project information as CSV
formData <- list("token" = token,
    content = 'project',
    format = 'csv',
    returnFormat = 'json'
)
response <- httr::POST(url, body = formData, encode = "form")
httr::stop_for_status(response)  # stop with an error if the request failed
result <- httr::content(response)

# Print project details
result

project_id	project_title	creation_time	production_time	in_production	project_language	purpose	purpose_other	project_notes	custom_record_label	secondary_unique_field	is_longitudinal	has_repeating_instruments_or_events	surveys_enabled	scheduling_enabled	record_autonumbering_enabled	randomization_enabled	ddp_enabled	project_irb_number	project_grant_number	project_pi_firstname	project_pi_lastname	display_today_now_button	missing_data_codes	external_modules	bypass_branching_erase_field_prompt
1840	Sample Immune checkpoint inhibitor related endocrine toxicity	2025-01-21 11:52:37	2025-01-21 15:55:23	1	English	1	For a workshop	A database with dymmy data to be used for an R workshop titled Analysing REDCap data using R for Peter Mac employees	NA	NA	1	1	1	0	0	0	0	NA	NA	Anna	Galligan	1	NA	sticky_matrix_headers,data_dictionary_revisions,annotated_pdf,record_logging_link,data_driven_project_banner,project_autocomplete	0

If the request is successful, you should see metadata about your REDCap project as shown above.
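If the request is not successful, a common cause is a token that was copied incompletely or that still contains the placeholder text. A quick offline check can catch this before any request is sent. The helper below, looks_like_redcap_token(), is hypothetical (it is not part of httr or REDCapR) and assumes a standard project token of 32 upper-case hexadecimal characters; REDCapR performs a similar check on tokens internally.

```r
# Hypothetical helper: TRUE if the value looks like a standard REDCap API token,
# assumed here to be exactly 32 upper-case hexadecimal characters.
looks_like_redcap_token <- function(token) {
  is.character(token) && length(token) == 1L && grepl("^[0-9A-F]{32}$", token)
}

# Check the token stored in .Renviron (FALSE if it is unset or still a placeholder)
looks_like_redcap_token(Sys.getenv("rcap_immuno_key"))
```

A FALSE result usually means the .Renviron line needs fixing, or R has not been restarted since the file was saved.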

Try It Yourself

In the REDCap API Playground, use the ‘Export Records’ option to create an API request that exports the mortality_data form for record 3 in CSV format.

Solution

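# Request generated with the REDCap API Playground; the token is read from .Renviron
token <- Sys.getenv("rcap_immuno_key")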
url <- "https://redcap.petermac.org.au/api/"
formData <- list("token"=token,
    content='record',
    action='export',
    format='csv',
    type='flat',
    csvDelimiter='',
    'records[0]'='3',
    'forms[0]'='mortality_data',
    rawOrLabel='raw',
    rawOrLabelHeaders='raw',
    exportCheckboxLabel='false',
    exportSurveyFields='false',
    exportDataAccessGroups='false',
    returnFormat='json'
)
response <- httr::POST(url, body = formData, encode = "form")
result <- httr::content(response)
print(result)
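The Playground encodes array parameters with indexed names such as 'records[0]' and 'forms[0]'. To request several records or forms, each value needs its own index. The sketch below uses a hypothetical helper, redcap_array_params() (not part of httr or REDCapR), to generate those names; the record IDs 3 and 5 are placeholders.

```r
# Hypothetical helper: expand a vector into REDCap's indexed array parameters,
# e.g. records[0], records[1], ...
redcap_array_params <- function(name, values) {
  values <- as.character(values)
  stats::setNames(as.list(values), sprintf("%s[%d]", name, seq_along(values) - 1L))
}

# Request body for records 3 and 5, restricted to the mortality_data form
formData <- c(
  list(token = Sys.getenv("rcap_immuno_key"),
       content = "record", action = "export", format = "csv", type = "flat"),
  redcap_array_params("records", c(3, 5)),
  redcap_array_params("forms", "mortality_data")
)
names(formData)
```

The resulting formData can be passed to httr::POST(url, body = formData, encode = "form") exactly as in the solution above.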

Importing REDCap Data via API

Once you have set up your API token securely, you can use R to retrieve data directly from REDCap. The REDCapR package provides an interface to streamline API calls from R, making it easy to import records from a REDCap project.

Reading REDCap Data

# If this fails, run install.packages("REDCapR") or 
# remotes::install_github(repo="OuhscBbmc/REDCapR")
requireNamespace("REDCapR")

Set project-wide values

Some information is specific to the REDCap project as a whole, rather than to an individual operation. This includes:

the uniform resource identifier (uri) of the server
the token for the user’s project.

library(REDCapR)

# Define API URL and Token
uri <- "https://redcap.petermac.org.au/api/"
token <- Sys.getenv("rcap_immuno_key")  # Load token securely

Read all records and fields

By default, the redcap_read() function retrieves the entire dataset from a REDCap project if no filtering parameters (such as records or fields) are specified.

# Read the entire dataset
immuno_all_rows_all_fields <- redcap_read(redcap_uri = uri, token = token)$data

# print the top 6 rows
head(immuno_all_rows_all_fields)

record_id	redcap_event_name	redcap_repeat_instrument	redcap_repeat_instance	ur	last_name	first_name	sex	dob	height	weight	bmi	coenrolled___1	coenrolled___2	coenrolled___3	coenrolled___4	clinical_trial	clinical_trial_description	medical_history___1	medical_history___2	medical_history___7	medical_history___8	medical_history___5	medical_history___3	medical_history___4	medical_history___6	medical_history___99	medical_history_other	autoimmune_disease	autoimmune_disease_select___1	autoimmune_disease_select___2	autoimmune_disease_select___3	autoimmune_disease_select___4	autoimmune_disease_select___5	autoimmune_disease_select___6	autoimmune_disease_select___9	autoimmune_disease_other	rheumatoid_arthritis	smoking	demographics_complete	mel_type___1	mel_type___2	mel_type___3	mel_type___4	mel_type___5	mel_type___6	mel_type_cutaneous	mel_mutation___1	mel_mutation___2	mel_mutation___3	mel_mutation___4	mel_mutation___5	mel_mutation___9	mel_mutation_braf	mel_mutation_nras	mel_mutation_kit	mel_mutation_other	mel_first_date	mel_first_stage	resct1stdiag	mel_date_diag	stage_diagnosis	dt_advanced_dis	melanoma_data_complete	adj_given	adj_path_done	adj_path_date	adj_path_hb	adj_path_wcc	adj_path_neut_lymph	adj_path_creat	adj_path_glucose	adj_path_lipase	adj_path_a1c_dcct	adj_path_a1c_ifcc	adj_path_insulin	adj_path_cpep	adj_path_islet_ab___1	adj_path_islet_ab___2	adj_path_islet_ab___3	adj_path_islet_ab___4	adj_path_gad_ab	adj_path_ia2_ab	adj_path_insulin_ab	adj_path_znt8_ab	adj_path_cortisol	adj_path_acth	adj_path_tsh	adj_path_ft4	adj_path_ft3	adj_path_tpo_ab	adj_path_tg_ab	adj_path_trab	adj_path_fsh	adj_path_lh	adj_path_testosterone	adj_path_oestradiol	adj_path_igf1	adj_path_gh	adj_path_prolactin	adj_path_alt	adj_path_albumin	adj_path_alp	adj_path_bilirubin	adj_path_ferritin	adj_path_crp	adj_path_vitd	adj_path_troponin	adj_path_calprotectin	adj_stage	adj_type	adj_summary	adj_type_trialid	adjstartdate	adj_start	adj_medication___1	adj_cessation	adj_recurrence	adj_recurrence_date	
timrecurrence	adj_resection	adjuvant_therapy_complete	sys_path_done	sys_path_date	sys_path_hb	sys_path_wcc	sys_path_neut_lymph	sys_path_creat	sys_path_glucose	sys_path_lipase	sys_path_a1c_dcct	sys_path_a1c_ifcc	sys_path_insulin	sys_path_cpep	sys_path_islet_ab___1	sys_path_islet_ab___2	sys_path_islet_ab___3	sys_path_islet_ab___4	sys_path_gad_ab	sys_path_ia2_ab	sys_path_insulin_ab	sys_path_znt8_ab	sys_path_cortisol	sys_path_acth	sys_path_tsh	sys_path_ft4	sys_path_ft3	sys_path_tpo_ab	sys_path_tg_ab	sys_path_trab	sys_path_fsh	sys_path_lh	sys_path_testosterone	sys_path_oestradiol	sys_path_igf1	sys_path_gh	sys_path_prolactin	sys_path_alt	sys_path_albumin	sys_path_alp	sys_path_bilirubin	sys_path_ferritin	sys_path_crp	sys_path_vitd	sys_path_troponin	sys_path_calprotectin	systemic_type	systemic_summary	systemic_chemo	systemic_ici	systemic_mapk	systemic_trial	systemic_trial_number	systemic_type_other	systemic_stage	systemic_ecog	systemic_disease_sites___1	systemic_disease_sites___2	systemic_disease_sites___3	systemic_disease_sites___4	systemic_disease_sites___5	systemic_disease_sites___6	systemic_disease_sites___7	systemic_disease_sites___8	systemic_disease_sites___9	systemic_disease_sites___10	systemic_disease_sites___11	systemic_disease_sites___12	systemic_ppi	systemic_antibiotics	systemic_steroids	systemic_ldh	systemic_ldh_value	systemic_creatinine_units	systemic_creatinine_uln	systemic_egfr	systemic_start	systemic_cycles	systemic_cease_reason	rest_pet	rest_ct	first_response	st_resp_ct	dt_first_response	best_response_percist	best_resp_recist	dt_best_response	systemic_percist_response_time	systemic_progression	systemic_progression_type	systemic_progression_date	systemic_progression_time	systemic_progression_imaging	pseudoprogression	systemic_prog_imaging_type___1	systemic_prog_imaging_type___2	systemic_prog_imaging_type___3	systemic_prog_ct_date	systemic_prog_pet_date	systemic_prog_mri_date	systemic_progression_clinically	sites_progression___1	sites_progression___2	
sites_progression___3	sites_progression___4	sites_progression___5	sites_progression___6	sites_progression___7	sites_progression___8	sites_progression___9	sites_progression___10	sites_progression___11	sites_progression___12	site_1_met_at_recur___1	site_1_met_at_recur___2	site_1_met_at_recur___3	site_1_met_at_recur___4	site_1_met_at_recur___5	site_1_met_at_recur___6	site_1_met_at_recur___7	site_1_met_at_recur___8	site_1_met_at_recur___9	site_1_met_at_recur___10	site_1_met_at_recur___11	site_1_met_at_recur___12	biopsy_confirmed	systemic_oligorecurrence	systemic_oligo_treatment	treat_intent_olig	systemic_oligo_treatment_date	systemic_oligo_treatment_response	systemic_oligo_treatment_systemic	date_last_syst_tx	tx_ongoing	reason_cessation	p_treatment___1	p_treatment___2	p_treatment___3	p_treatment___4	type_io___1	type_io___2	type_io___3	dis_free_io___1	dis_free_io___2	dis_free_io___3	dis_free_io___4	dur_res_io	res_pemb_nivo___1	res_pemb_nivo___2	res_pemb_nivo___3	res_pemb_nivo___4	res_niv_pem	res_pem___1	res_pem___2	res_pem___3	res_pem___4	dur_res_pem	braf_mek	targ_dis_free___1	targ_dis_free___2	targ_dis_free___3	targ_dis_free___4	dur_braf_mek	site_disease_prog___1	site_disease_prog___2	site_disease_prog___3	site_disease_prog___4	site_disease_prog___5	site_disease_prog___6	site_disease_prog___7	site_disease_prog___8	site_disease_prog___9	site_disease_prog___10	site_disease_prog___11	p_treatment_info	prior_treatment_complete	cnsmets_date	cnsmets_number	number_cns_mets	cnsmets_largest	cns_symptoms	surgery_brainmets	cnsmets_radiotherapy	cnsmets_glucocorticoids	cnsmets_brafmek	cnsmets_bevacizumab	cns_diag	cns_symptoms_type	resp_io_brainmet___1	resp_io_brainmet___2	resp_io_brainmet___3	resp_io_brainmet___4	cns_steroid	cns_braf_mek	single_double	melanoma_cns_metastases_complete	ae_any	ae_adj_sql	ae_systemic_sql	ae_type	ae_type_sql	ae_type_sql_select	ae_endocrine	ae_gastrointestinal	ae_haematological	ae_neurological	ae_skin	ae_onset_date	ae_ctcae	ae_kdigo	ae_investigations___1	
ae_investigations___2	ae_investigations___3	ae_investigations___4	ae_investigations___5	ae_investigations___6	ae_investigations___7	ae_investigations___8	ae_investigations___9	ae_investigations___10	ae_investigations___11	ae_autoantibodies_date	ae_autoantibodies_result	ae_biopsy_date	ae_biopsy_result	ae_csf_date	ae_csf_result	ae_ecg_date	ae_ecg_result	ae_echo_date	ae_echo_result	ae_endoscopy_date	ae_endoscopy_result	ae_fcp_date	ae_fcp_result	ae_fmcs_date	ae_fmcs_result	ae_mri_date	ae_mri_result	ae_ncs_date	ae_ncs_result	ae_urin_date	ae_urin_result	ae_treatment___11	ae_treatment___10	ae_treatment___1	ae_treatment___2	ae_treatment___3	ae_treatment___4	ae_treatment___5	ae_treatment___15	ae_treatment___6	ae_treatment___7	ae_treatment___8	ae_treatment___9	ae_treatment___14	ae_treatment___12	ae_treatment___13	ae_summary	adverse_events_complete	b_date	b_ecog	b_autoim___1	b_autoim___2	b_autoim___3	b_autoim___4	b_autoim___5	b_autoim___6	b_autoim___7	b_autoim___8	b_autoim___9	b_autoim___10	b_autoim___11	b_autoim___12	b_autoim___13	b_autoim_type	b_fhx	b_endo___1	b_endo___2	b_endo___3	b_endo___4	b_endo___5	b_endo___6	b_endo___7	b_endo___8	b_endo___9	b_endo___10	b_endo___11	b_endo___12	b_endo_type	hla	b_cancer	b_cancer_hx	b_pmhx	b_meds	b_ppi	b_steroid	b_steroid_type	baseline_visit_complete	drug_name	drug_date	drug_number	drug_comment	checkpoint_inhibitor_treatment_complete	irae_type___1	irae_type___2	irae_type___3	irae_type___4	irae_type___5	irae_type___6	irae_type___7	irae_type___8	irae_type___9	irae_type___10	endo_irae___1	endo_irae___2	endo_irae___3	endo_irae___4	hypophysitis_date	hypophysitis_time	thyroiditis_date	thyroiditis_time	pancreatitis_date	adrenalitis_date	hypophysitis_cycle	thyroid_cycle	pancreas_cycle	adrenal_cycle	endo_irae_comment	skin_date	skin_cycle	skin_type___1	skin_type___2	skin_type___3	skin_type___4	skin_type___5	skin_type___6	skin_type___7	skin_type___8	skin_type___9	skin_type___10	skin_grade	skin_histo	rheum_date	rheum_cycle	rheum_type___1	
rheum_type___2	rheum_type___3	rheum_type___4	rheum_type___5	rheum_type___6	rheum_type___7	rheum_grade	rheum_path	gastro_date	gastro_cycle	gastro_type___1	gastro_type___2	gastro_type___3	gastro_type___4	gastro_type___5	gastro_grade	gastro_endoscopy	gastro_histo	gastro_radiol	gastro_cdt	gastro_stool	gastro_path	liver_date	liver_cycle	liver_grade	liver_path	liver_histo	liver_radiol	renal_date	renal_cycle	renal_type___1	renal_type___2	renal_type___3	renal_type___4	renal_type___5	renal_type___6	renal_grade	renal_path	renal_urine	renal_histo	pulm_date	pulm_cycle	pulm_type___1	pulm_type___2	pulm_type___3	pulm_type___4	pulm_grade	pulm_radiol	pulm_histo	pulm_path	cardiac_date	cardiac_cycle	cardiac_type___1	cardiac_type___2	cardiac_type___3	cardiac_type___4	cardiac_type___5	cardiac_grade	cardiac_tropck	cardiac_ecg	cardiac_echo	cardiac_radiol	cardiac_histo	neuro_date	neuro_cycle	neuro_type___1	neuro_type___2	neuro_type___3	neuro_type___4	neuro_type___5	neuro_type___6	neuro_type___7	neuro_type___8	neuro_type___9	neuro_type___10	neuro_type___11	neuro_type___12	neuro_grade	neuro_radiol	neuro_path	irae_steroids	irae_details	irae_emergency	immune_related_adverse_events_iraes_complete	path_date	hb	wcc	neut_lymph	creat	glucose	lipase	a1c_dcct	a1c_ifcc	insulin	cpep	islet_ab___1	islet_ab___2	islet_ab___3	islet_ab___4	gad_ab	ia2_ab	insulin_ab	znt8_ab	cortisol	acth	tsh	ft4	ft3	tpo_ab	tg_ab	trab	fsh	lh	testosterone	oestradiol	igf1	gh	prolactin	alt	albumin	alp	bilirubin	ferritin	crp	vitd	troponin	calprotectin	pathology_complete	pit_image_date	image_type	pit_suspected	pit_size	pit_appearance	hypophysitis_image	pit_alt_image	image_comment	ctmri_imaging_complete	ur_pet	pet_date	pet_timing___1	pet_timing___2	pet_timing___3	pet_timing___4	pet_bsl	pet_uptake_time	sul_peak	suv_max	pet_steroid	b_pet_metastases___1	b_pet_metastases___2	b_pet_metastases___3	b_pet_metastases___4	b_pet_metastases___5	b_pet_metastases___6	b_pet_metastases___7	b_pet_metastases___8	b_pet_metastases___9	
b_pet_metastases___10	pet_endo___1	pet_endo___2	pet_endo___3	pet_endo___4	pet_endo___5	pet_endo___6	pet_endo___7	pet_endo___8	pet_endo___9	pet_endo___10	pet_endo___11	pit_pet_fdg	pit_pet_suv	pit_suv_change	hypophysitis_pet	pet_pit_ct	thyr_pet_fdg	thyr_pet_suv	thyr_suv_change	thyroiditis_pet	pet_thyr_ct	panc_pet_fdg	panc_pet_suv	panc_suv_change	pancreatitis_pet	pet_panc_ct	adrenal_pet_fdg	adrenal_pet_suv	adrenal_suv_change	adrenalitis_pet	pet_adrenal_ct	cns_pet_fdg	cns_pet_suv	brain_suv_change	encephalitis_pet	pet_cns_ct	rheum_pet	rheum_suvmax	pet_rheum_ct	liver_pet_fdg	liver_pet_suv	liver_suv_change	hepatitis_pet	pet_liver_ct	uppergi_pet_fdg	uppergi_pet_suv	uppergi_suv_change	gastritis_pet	pet_uppergi_ct	ilium_pet_fdg	ilium_pet_suv	ilium_suv_change	ileitis_pet	pet_ilium_ct	colon_pet_fdg	colon_pet_suv	colon_suv_change	colitis_pet	pet_colon_ct	pet_comments	sul_change	suv_change	percist_pet	eortc_pet	recist_pet	residual_disease___1	residual_disease___2	residual_disease___3	residual_disease___4	residual_disease___5	residual_disease___6	residual_disease___7	residual_disease___8	residual_disease___9	residual_disease___10	pet_metastasis	pet_metastasis_number	site_progression___1	site_progression___2	site_progression___3	site_progression___4	site_progression___5	site_progression___6	site_progression___7	site_progression___8	site_progression___9	site_progression___10	pet_imaging_complete	antibiotic_oral	antibiotic_iv	antibiotic_type	ppi	ppi_and_antibiotic_use_during_treatment_with_cpis_complete	responce_cpi	best_res_pet	time_to_best_response	response_dur_cpi	overall_response	progression_cpi	progression_adrenal	progression_site___1	progression_site___2	progression_site___3	progression_site___4	progression_site___5	progression_site___6	progression_site___7	progression_site___8	progression_site___9	progression_site___10	progression_site___11	other_site	subsequent_rx	response_data_complete	mortality	mortality_date	ongoing_survelliance	date_last_scan	mortality_treatment_time	
mortality_cause	other_cause_death	mortality_data_complete
1	baseline_arm_1	NA	NA	2810493	Jackson	Hannah	2	1943-02-12	154	163	68.73	0	0	0	0	NA	NA	0	0	0	0	0	0	0	0	0	NA	NA	0	0	0	0	0	0	0	NA	NA	NA	0	0	0	0	0	0	0	NA	0	0	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	NA	NA	NA	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
1	baseline_arm_1	prior_treatment	1	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2	Immune checkpoint inhibition: 2019-04-15	NA	2	NA	0	NA	NA	6	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	1	9	NA	NA	NA	NA	2019-04-15	2	3	1	NA	3	NA	2019-07-09	3	NA	2019-07-09	2.8	0	NA	NA	NA	NA	NA	0	0	0	NA	NA	NA	NA	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	NA	NA	NA	NA	NA	NA	0	10/5/19	0	2	0	0	0	0	0	0	0	0	0	0	0	NA	0	0	0	0	NA	0	0	0	0	NA	NA	0	0	0	0	NA	0	0	0	0	0	0	0	0	0	0	0	NA	2	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
2	baseline_arm_1	NA	NA	6408685	Howard	Samantha	2	2008-01-29	181	97	29.61	0	0	0	0	NA	NA	0	0	0	0	0	0	0	0	0	NA	NA	0	0	0	0	0	0	0	NA	NA	NA	0	0	0	0	0	0	0	NA	0	0	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	NA	NA	NA	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
3	baseline_arm_1	NA	NA	9994173	Martinez	Noah	2	1940-11-06	199	123	31.06	0	0	0	0	NA	NA	0	0	0	0	0	0	0	0	0	NA	NA	0	0	0	0	0	0	0	NA	NA	NA	0	0	0	0	0	0	0	NA	0	0	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	NA	NA	NA	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
3	end_arm_1	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	1	2017-01-10	2	2016-10-24	4.7	NA	NA	2
3	baseline_arm_1	prior_treatment	1	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2	Immune checkpoint inhibition: 2016-08-19	NA	3	NA	0	NA	NA	5	1	0	0	0	0	0	1	1	0	0	0	0	0	NA	NA	NA	9	NA	NA	NA	NA	2016-08-19	NA	NA	1	NA	4	NA	2016-10-24	4	NA	NA	NA	1	1	2016-10-24	2.2	NA	2	0	0	0	NA	NA	NA	1	0	0	0	0	0	1	1	0	0	0	0	0	0	0	0	0	0	1	1	0	0	0	0	0	2	0	NA	NA	NA	NA	0	21/10/16	0	0	0	0	0	0	0	0	0	0	0	0	0	NA	0	0	0	0	NA	0	0	0	0	NA	NA	0	0	0	0	NA	0	0	0	0	0	0	0	0	0	0	0	NA	2	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
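The output above has more than one row per record_id because this project is longitudinal (events such as baseline_arm_1 and end_arm_1) and has repeating instruments (prior_treatment), as the project information exported earlier indicates. The sketch below copies only the three identifying columns of the six printed rows into a small data frame, immuno_head, counts the rows per record, and keeps one baseline row per record. The same subset() call could be applied to immuno_all_rows_all_fields.

```r
# Identifying columns copied from the six rows printed above
immuno_head <- data.frame(
  record_id = c(1, 1, 2, 3, 3, 3),
  redcap_event_name = c("baseline_arm_1", "baseline_arm_1", "baseline_arm_1",
                        "baseline_arm_1", "end_arm_1", "baseline_arm_1"),
  redcap_repeat_instrument = c(NA, "prior_treatment", NA, NA, NA, "prior_treatment"),
  stringsAsFactors = FALSE
)

# Rows contributed by each record (one per event, plus one per repeat instance)
rows_per_record <- table(immuno_head$record_id)
rows_per_record

# Keep the baseline-event rows that do not belong to a repeating instrument
baseline_rows <- subset(
  immuno_head,
  redcap_event_name == "baseline_arm_1" & is.na(redcap_repeat_instrument)
)
baseline_rows$record_id
```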

Read a subset of records

In many cases you only need data for a specific subset of records (for example, certain patients). To retrieve them, pass a vector of record IDs (one element per record) to the records argument of redcap_read():

# Define the specific records to retrieve
selected_records <- c(385, 490, 500)  # Replace with actual record IDs

# Read only the selected records
immuno_some_records <- redcap_read(
  redcap_uri = uri, 
  token = token, 
  records = selected_records
)$data

# print all rows
immuno_some_records

record_id	redcap_event_name	redcap_repeat_instrument	redcap_repeat_instance	ur	last_name	first_name	sex	dob	height	weight	bmi	coenrolled___1	coenrolled___2	coenrolled___3	coenrolled___4	clinical_trial	clinical_trial_description	medical_history___1	medical_history___2	medical_history___7	medical_history___8	medical_history___5	medical_history___3	medical_history___4	medical_history___6	medical_history___99	medical_history_other	autoimmune_disease	autoimmune_disease_select___1	autoimmune_disease_select___2	autoimmune_disease_select___3	autoimmune_disease_select___4	autoimmune_disease_select___5	autoimmune_disease_select___6	autoimmune_disease_select___9	autoimmune_disease_other	rheumatoid_arthritis	smoking	demographics_complete	mel_type___1	mel_type___2	mel_type___3	mel_type___4	mel_type___5	mel_type___6	mel_type_cutaneous	mel_mutation___1	mel_mutation___2	mel_mutation___3	mel_mutation___4	mel_mutation___5	mel_mutation___9	mel_mutation_braf	mel_mutation_nras	mel_mutation_kit	mel_mutation_other	mel_first_date	mel_first_stage	resct1stdiag	mel_date_diag	stage_diagnosis	dt_advanced_dis	melanoma_data_complete	adj_given	adj_path_done	adj_path_date	adj_path_hb	adj_path_wcc	adj_path_neut_lymph	adj_path_creat	adj_path_glucose	adj_path_lipase	adj_path_a1c_dcct	adj_path_a1c_ifcc	adj_path_insulin	adj_path_cpep	adj_path_islet_ab___1	adj_path_islet_ab___2	adj_path_islet_ab___3	adj_path_islet_ab___4	adj_path_gad_ab	adj_path_ia2_ab	adj_path_insulin_ab	adj_path_znt8_ab	adj_path_cortisol	adj_path_acth	adj_path_tsh	adj_path_ft4	adj_path_ft3	adj_path_tpo_ab	adj_path_tg_ab	adj_path_trab	adj_path_fsh	adj_path_lh	adj_path_testosterone	adj_path_oestradiol	adj_path_igf1	adj_path_gh	adj_path_prolactin	adj_path_alt	adj_path_albumin	adj_path_alp	adj_path_bilirubin	adj_path_ferritin	adj_path_crp	adj_path_vitd	adj_path_troponin	adj_path_calprotectin	adj_stage	adj_type	adj_summary	adj_type_trialid	adjstartdate	adj_start	adj_medication___1	adj_cessation	adj_recurrence	adj_recurrence_date	
timrecurrence	adj_resection	adjuvant_therapy_complete	sys_path_done	sys_path_date	sys_path_hb	sys_path_wcc	sys_path_neut_lymph	sys_path_creat	sys_path_glucose	sys_path_lipase	sys_path_a1c_dcct	sys_path_a1c_ifcc	sys_path_insulin	sys_path_cpep	sys_path_islet_ab___1	sys_path_islet_ab___2	sys_path_islet_ab___3	sys_path_islet_ab___4	sys_path_gad_ab	sys_path_ia2_ab	sys_path_insulin_ab	sys_path_znt8_ab	sys_path_cortisol	sys_path_acth	sys_path_tsh	sys_path_ft4	sys_path_ft3	sys_path_tpo_ab	sys_path_tg_ab	sys_path_trab	sys_path_fsh	sys_path_lh	sys_path_testosterone	sys_path_oestradiol	sys_path_igf1	sys_path_gh	sys_path_prolactin	sys_path_alt	sys_path_albumin	sys_path_alp	sys_path_bilirubin	sys_path_ferritin	sys_path_crp	sys_path_vitd	sys_path_troponin	sys_path_calprotectin	systemic_type	systemic_summary	systemic_chemo	systemic_ici	systemic_mapk	systemic_trial	systemic_trial_number	systemic_type_other	systemic_stage	systemic_ecog	systemic_disease_sites___1	systemic_disease_sites___2	systemic_disease_sites___3	systemic_disease_sites___4	systemic_disease_sites___5	systemic_disease_sites___6	systemic_disease_sites___7	systemic_disease_sites___8	systemic_disease_sites___9	systemic_disease_sites___10	systemic_disease_sites___11	systemic_disease_sites___12	systemic_ppi	systemic_antibiotics	systemic_steroids	systemic_ldh	systemic_ldh_value	systemic_creatinine_units	systemic_creatinine_uln	systemic_egfr	systemic_start	systemic_cycles	systemic_cease_reason	rest_pet	rest_ct	first_response	st_resp_ct	dt_first_response	best_response_percist	best_resp_recist	dt_best_response	systemic_percist_response_time	systemic_progression	systemic_progression_type	systemic_progression_date	systemic_progression_time	systemic_progression_imaging	pseudoprogression	systemic_prog_imaging_type___1	systemic_prog_imaging_type___2	systemic_prog_imaging_type___3	systemic_prog_ct_date	systemic_prog_pet_date	systemic_prog_mri_date	systemic_progression_clinically	sites_progression___1	sites_progression___2	
sites_progression___3	sites_progression___4	sites_progression___5	sites_progression___6	sites_progression___7	sites_progression___8	sites_progression___9	sites_progression___10	sites_progression___11	sites_progression___12	site_1_met_at_recur___1	site_1_met_at_recur___2	site_1_met_at_recur___3	site_1_met_at_recur___4	site_1_met_at_recur___5	site_1_met_at_recur___6	site_1_met_at_recur___7	site_1_met_at_recur___8	site_1_met_at_recur___9	site_1_met_at_recur___10	site_1_met_at_recur___11	site_1_met_at_recur___12	biopsy_confirmed	systemic_oligorecurrence	systemic_oligo_treatment	treat_intent_olig	systemic_oligo_treatment_date	systemic_oligo_treatment_response	systemic_oligo_treatment_systemic	date_last_syst_tx	tx_ongoing	reason_cessation	p_treatment___1	p_treatment___2	p_treatment___3	p_treatment___4	type_io___1	type_io___2	type_io___3	dis_free_io___1	dis_free_io___2	dis_free_io___3	dis_free_io___4	dur_res_io	res_pemb_nivo___1	res_pemb_nivo___2	res_pemb_nivo___3	res_pemb_nivo___4	res_niv_pem	res_pem___1	res_pem___2	res_pem___3	res_pem___4	dur_res_pem	braf_mek	targ_dis_free___1	targ_dis_free___2	targ_dis_free___3	targ_dis_free___4	dur_braf_mek	site_disease_prog___1	site_disease_prog___2	site_disease_prog___3	site_disease_prog___4	site_disease_prog___5	site_disease_prog___6	site_disease_prog___7	site_disease_prog___8	site_disease_prog___9	site_disease_prog___10	site_disease_prog___11	p_treatment_info	prior_treatment_complete	cnsmets_date	cnsmets_number	number_cns_mets	cnsmets_largest	cns_symptoms	surgery_brainmets	cnsmets_radiotherapy	cnsmets_glucocorticoids	cnsmets_brafmek	cnsmets_bevacizumab	cns_diag	cns_symptoms_type	resp_io_brainmet___1	resp_io_brainmet___2	resp_io_brainmet___3	resp_io_brainmet___4	cns_steroid	cns_braf_mek	single_double	melanoma_cns_metastases_complete	ae_any	ae_adj_sql	ae_systemic_sql	ae_type	ae_type_sql	ae_type_sql_select	ae_endocrine	ae_gastrointestinal	ae_haematological	ae_neurological	ae_skin	ae_onset_date	ae_ctcae	ae_kdigo	ae_investigations___1	
ae_investigations___2	ae_investigations___3	ae_investigations___4	ae_investigations___5	ae_investigations___6	ae_investigations___7	ae_investigations___8	ae_investigations___9	ae_investigations___10	ae_investigations___11	ae_autoantibodies_date	ae_autoantibodies_result	ae_biopsy_date	ae_biopsy_result	ae_csf_date	ae_csf_result	ae_ecg_date	ae_ecg_result	ae_echo_date	ae_echo_result	ae_endoscopy_date	ae_endoscopy_result	ae_fcp_date	ae_fcp_result	ae_fmcs_date	ae_fmcs_result	ae_mri_date	ae_mri_result	ae_ncs_date	ae_ncs_result	ae_urin_date	ae_urin_result	ae_treatment___11	ae_treatment___10	ae_treatment___1	ae_treatment___2	ae_treatment___3	ae_treatment___4	ae_treatment___5	ae_treatment___15	ae_treatment___6	ae_treatment___7	ae_treatment___8	ae_treatment___9	ae_treatment___14	ae_treatment___12	ae_treatment___13	ae_summary	adverse_events_complete	b_date	b_ecog	b_autoim___1	b_autoim___2	b_autoim___3	b_autoim___4	b_autoim___5	b_autoim___6	b_autoim___7	b_autoim___8	b_autoim___9	b_autoim___10	b_autoim___11	b_autoim___12	b_autoim___13	b_autoim_type	b_fhx	b_endo___1	b_endo___2	b_endo___3	b_endo___4	b_endo___5	b_endo___6	b_endo___7	b_endo___8	b_endo___9	b_endo___10	b_endo___11	b_endo___12	b_endo_type	hla	b_cancer	b_cancer_hx	b_pmhx	b_meds	b_ppi	b_steroid	b_steroid_type	baseline_visit_complete	drug_name	drug_date	drug_number	drug_comment	checkpoint_inhibitor_treatment_complete	irae_type___1	irae_type___2	irae_type___3	irae_type___4	irae_type___5	irae_type___6	irae_type___7	irae_type___8	irae_type___9	irae_type___10	endo_irae___1	endo_irae___2	endo_irae___3	endo_irae___4	hypophysitis_date	hypophysitis_time	thyroiditis_date	thyroiditis_time	pancreatitis_date	adrenalitis_date	hypophysitis_cycle	thyroid_cycle	pancreas_cycle	adrenal_cycle	endo_irae_comment	skin_date	skin_cycle	skin_type___1	skin_type___2	skin_type___3	skin_type___4	skin_type___5	skin_type___6	skin_type___7	skin_type___8	skin_type___9	skin_type___10	skin_grade	skin_histo	rheum_date	rheum_cycle	rheum_type___1	
rheum_type___2	rheum_type___3	rheum_type___4	rheum_type___5	rheum_type___6	rheum_type___7	rheum_grade	rheum_path	gastro_date	gastro_cycle	gastro_type___1	gastro_type___2	gastro_type___3	gastro_type___4	gastro_type___5	gastro_grade	gastro_endoscopy	gastro_histo	gastro_radiol	gastro_cdt	gastro_stool	gastro_path	liver_date	liver_cycle	liver_grade	liver_path	liver_histo	liver_radiol	renal_date	renal_cycle	renal_type___1	renal_type___2	renal_type___3	renal_type___4	renal_type___5	renal_type___6	renal_grade	renal_path	renal_urine	renal_histo	pulm_date	pulm_cycle	pulm_type___1	pulm_type___2	pulm_type___3	pulm_type___4	pulm_grade	pulm_radiol	pulm_histo	pulm_path	cardiac_date	cardiac_cycle	cardiac_type___1	cardiac_type___2	cardiac_type___3	cardiac_type___4	cardiac_type___5	cardiac_grade	cardiac_tropck	cardiac_ecg	cardiac_echo	cardiac_radiol	cardiac_histo	neuro_date	neuro_cycle	neuro_type___1	neuro_type___2	neuro_type___3	neuro_type___4	neuro_type___5	neuro_type___6	neuro_type___7	neuro_type___8	neuro_type___9	neuro_type___10	neuro_type___11	neuro_type___12	neuro_grade	neuro_radiol	neuro_path	irae_steroids	irae_details	irae_emergency	immune_related_adverse_events_iraes_complete	path_date	hb	wcc	neut_lymph	creat	glucose	lipase	a1c_dcct	a1c_ifcc	insulin	cpep	islet_ab___1	islet_ab___2	islet_ab___3	islet_ab___4	gad_ab	ia2_ab	insulin_ab	znt8_ab	cortisol	acth	tsh	ft4	ft3	tpo_ab	tg_ab	trab	fsh	lh	testosterone	oestradiol	igf1	gh	prolactin	alt	albumin	alp	bilirubin	ferritin	crp	vitd	troponin	calprotectin	pathology_complete	pit_image_date	image_type	pit_suspected	pit_size	pit_appearance	hypophysitis_image	pit_alt_image	image_comment	ctmri_imaging_complete	ur_pet	pet_date	pet_timing___1	pet_timing___2	pet_timing___3	pet_timing___4	pet_bsl	pet_uptake_time	sul_peak	suv_max	pet_steroid	b_pet_metastases___1	b_pet_metastases___2	b_pet_metastases___3	b_pet_metastases___4	b_pet_metastases___5	b_pet_metastases___6	b_pet_metastases___7	b_pet_metastases___8	b_pet_metastases___9	
b_pet_metastases___10	pet_endo___1	pet_endo___2	pet_endo___3	pet_endo___4	pet_endo___5	pet_endo___6	pet_endo___7	pet_endo___8	pet_endo___9	pet_endo___10	pet_endo___11	pit_pet_fdg	pit_pet_suv	pit_suv_change	hypophysitis_pet	pet_pit_ct	thyr_pet_fdg	thyr_pet_suv	thyr_suv_change	thyroiditis_pet	pet_thyr_ct	panc_pet_fdg	panc_pet_suv	panc_suv_change	pancreatitis_pet	pet_panc_ct	adrenal_pet_fdg	adrenal_pet_suv	adrenal_suv_change	adrenalitis_pet	pet_adrenal_ct	cns_pet_fdg	cns_pet_suv	brain_suv_change	encephalitis_pet	pet_cns_ct	rheum_pet	rheum_suvmax	pet_rheum_ct	liver_pet_fdg	liver_pet_suv	liver_suv_change	hepatitis_pet	pet_liver_ct	uppergi_pet_fdg	uppergi_pet_suv	uppergi_suv_change	gastritis_pet	pet_uppergi_ct	ilium_pet_fdg	ilium_pet_suv	ilium_suv_change	ileitis_pet	pet_ilium_ct	colon_pet_fdg	colon_pet_suv	colon_suv_change	colitis_pet	pet_colon_ct	pet_comments	sul_change	suv_change	percist_pet	eortc_pet	recist_pet	residual_disease___1	residual_disease___2	residual_disease___3	residual_disease___4	residual_disease___5	residual_disease___6	residual_disease___7	residual_disease___8	residual_disease___9	residual_disease___10	pet_metastasis	pet_metastasis_number	site_progression___1	site_progression___2	site_progression___3	site_progression___4	site_progression___5	site_progression___6	site_progression___7	site_progression___8	site_progression___9	site_progression___10	pet_imaging_complete	antibiotic_oral	antibiotic_iv	antibiotic_type	ppi	ppi_and_antibiotic_use_during_treatment_with_cpis_complete	responce_cpi	best_res_pet	time_to_best_response	response_dur_cpi	overall_response	progression_cpi	progression_adrenal	progression_site___1	progression_site___2	progression_site___3	progression_site___4	progression_site___5	progression_site___6	progression_site___7	progression_site___8	progression_site___9	progression_site___10	progression_site___11	other_site	subsequent_rx	response_data_complete	mortality	mortality_date	ongoing_survelliance	date_last_scan	mortality_treatment_time	
mortality_cause	other_cause_death	mortality_data_complete
385	baseline_arm_1	NA	NA	5454920	Carter	Rachel	1	1956-08-28	146	151	70.84	0	0	0	0	0	NA	0	0	0	0	0	0	0	0	0	NA	0	0	0	0	0	0	0	0	NA	NA	2	2	0	0	0	0	1	0	NA	0	0	0	1	0	0	NA	NA	NA	NA	2014-06-23	4	0	2017-12-05	2	23/5/14	2	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2014-05-23	1	NA	NA	1	1	1	1	0	0	1	confusion	0	0	0	0	0	0	NA	2	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
385	end_arm_1	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	1	2017-07-02	NA	NA	11.3	1	NA	2
385	baseline_arm_1	prior_treatment	1	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2	Immune checkpoint inhibition: 2016-07-22	NA	3	NA	0	NA	NA	6	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	9	328	83	110	77	2016-07-22	NA	2	1	NA	4	NA	NA	NA	NA	NA	NA	1	3	2016-09-05	1.5	1	NA	0	1	1	NA	2016-10-10	2016-09-05	1	1	0	0	0	1	0	1	0	0	0	0	0	1	0	0	0	1	0	0	0	0	0	0	0	2	0	NA	NA	NA	NA	NA	14/10/16	0	0	0	0	0	0	0	0	0	0	0	0	0	NA	0	0	0	0	NA	0	0	0	0	NA	NA	0	0	0	0	NA	0	0	0	0	0	0	0	0	0	0	0	NA	2	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
385	baseline_arm_1	prior_treatment	2	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2	Immune checkpoint inhibition: 2016-10-14	NA	1	NA	0	NA	NA	6	2	1	0	0	0	1	0	1	0	0	0	0	0	0	0	0	NA	NA	NA	NA	NA	2016-10-14	NA	2	1	NA	4	NA	2017-03-30	4	NA	NA	NA	1	2	2016-11-04	0.7	1	NA	0	0	1	NA	NA	2016-11-04	1	1	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	2	1	2	NA	2016-11-11	4	0	14/10/26	0	0	0	0	0	0	0	0	0	0	0	0	0	NA	0	0	0	0	NA	0	0	0	0	NA	NA	0	0	0	0	NA	0	0	0	0	0	0	0	0	0	0	0	NA	2	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
490	baseline_arm_1	NA	NA	1088816	Garcia	Claire	1	1985-06-09	197	52	13.40	0	0	0	0	1		0	0	0	0	0	0	0	0	0	NA	0	0	0	0	0	0	0	0	NA	NA	2	2	0	0	0	0	0	0	NA	0	0	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	NA	NA	NA	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
490	end_arm_1	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	1	2018-12-11	2	NA	24.3	1	NA	2
490	baseline_arm_1	adjuvant_therapy	1	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	Trial:	NA	NA	NA	0	NA	NA	NA	NA	NA	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
490	baseline_arm_1	prior_treatment	1	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2	Immune checkpoint inhibition: 2016-12-02	NA	3	NA	1	Keynote 054 - cross over	NA	5	1	0	0	0	1	0	1	1	0	0	0	0	0	0	0	0	9	NA	NA	NA	NA	2016-12-02	NA	NA	2	1	NA	1	2017-02-22	NA	1	NA	NA	1	1	2017-05-18	5.5	1	NA	1	0	0	2017-05-18	NA	NA	0	0	0	0	1	1	1	1	0	0	0	0	0	0	0	0	1	1	1	1	0	0	0	0	0	2	0	NA	NA	NA	NA	0	29/4/17	0	0	0	0	0	0	0	0	0	0	0	0	0	NA	0	0	0	0	NA	0	0	0	0	NA	NA	0	0	0	0	NA	0	0	0	0	0	0	0	0	0	0	0	NA	2	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
490	baseline_arm_1	prior_treatment	2	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	3	MAPK inhibitor therapy: 2017-05-19	NA	NA	1	0	NA	NA	5	1	0	0	0	1	1	1	1	0	0	0	0	0	0	0	0	9	NA	NA	NA	NA	2017-05-19	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	NA	NA	NA	NA	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	0	0	0	0	0	0	0	0	0	NA	0	0	0	0	NA	0	0	0	0	NA	NA	0	0	0	0	NA	0	0	0	0	0	0	0	0	0	0	0	NA	2	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
500	baseline_arm_1	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	NA	NA	0	0	0	0	0	0	0	0	0	NA	NA	0	0	0	0	0	0	0	NA	NA	NA	0	0	0	0	0	0	0	NA	0	0	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	NA	NA	NA	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
500	baseline_arm_1	prior_treatment	1	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	3	MAPK inhibitor therapy: 2017-11-01	NA	NA	1	0	NA	NA	6	NA	1	0	0	0	0	1	0	0	0	0	0	0	NA	NA	NA	NA	NA	79	110	87	2017-11-01	NA	2	1	2	2	NA	NA	NA	NA	2018-01-09	2.3	1	2	2018-06-05	7.1	1	2	0	0	1	NA	NA	2018-06-05	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	2	0	NA	NA	NA	NA	0	15/6/18	1	NA	0	0	0	0	0	0	0	0	0	0	0	NA	0	0	0	0	NA	0	0	0	0	NA	NA	0	0	0	0	NA	0	0	0	0	0	0	0	0	0	0	0	NA	2	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
500	baseline_arm_1	prior_treatment	2	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2	Immune checkpoint inhibition: 2018-06-29	NA	2	NA	0	NA	NA	6	NA	1	0	0	1	0	0	0	0	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	2018-06-29	2	2	1	NA	4	NA	2018-09-12	4	NA	2018-09-12	2.5	1	3	2018-09-12	2.5	1	2	0	1	1	NA	2018-09-12	2018-09-12	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	2	NA	NA	NA	NA	NA	0	NA	0	0	0	0	0	0	0	0	0	0	0	0	0	NA	0	0	0	0	NA	0	0	0	0	NA	NA	0	0	0	0	NA	0	0	0	0	0	0	0	0	0	0	0	NA	2	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
500	baseline_arm_1	adverse_events	1	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	1	NA	Immune checkpoint inhibition: 2018-06-29	4	Gastrointestinal	NA	NA	8	NA	NA	NA	2018-09-01	3	NA	0	0	0	0	0	0	1	1	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	2018-09-01: Gastrointestinal	2	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA

Read a subset of fields

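If you only need specific variables (e.g., first_name, dob, mortality_date), you can pass a character vector of field names to the fields argument of redcap_read(). As the output below shows, record_id and the event and repeat-instrument columns are returned as well.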

# Define the specific fields to retrieve
selected_fields <- c("first_name", "dob", "mortality_date")  # Replace with the field names you need

# Read only the selected fields
immuno_some_fields <- redcap_read(
  redcap_uri = uri, 
  token = token, 
  fields = selected_fields
)$data

# print the top 6 rows
head(immuno_some_fields)

record_id	redcap_event_name	redcap_repeat_instrument	redcap_repeat_instance	first_name	dob	mortality_date
1	baseline_arm_1	NA	NA	Hannah	1943-02-12	NA
2	baseline_arm_1	NA	NA	Samantha	2008-01-29	NA
3	baseline_arm_1	NA	NA	Noah	1940-11-06	NA
3	end_arm_1	NA	NA	NA	NA	2017-01-10
4	baseline_arm_1	NA	NA	Aiden	1963-06-04	NA
4	end_arm_1	NA	NA	NA	NA	2024-02-09

In all these cases, the data imported into R from REDCap is in its raw format. For example, a categorical variable like sex, which is expected to contain values such as “Male” or “Female,” may instead be represented as numeric codes (e.g., 1 for Male, 2 for Female). While these values can be manually recoded in R, doing so for large projects with multiple categorical variables can quickly become cumbersome and error-prone.
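As a minimal sketch of manual recoding, assuming (as in the example above) that 1 means Male and 2 means Female, base R's factor() maps each code to a label. The sex_raw vector is hypothetical toy data, not values from the immuno project.

```r
# Hypothetical raw REDCap codes: 1 = Male, 2 = Female (assumed coding)
sex_raw <- c(1, 2, 2, 1, NA)

# factor() matches each value against `levels` and shows the paired `labels`;
# missing values stay NA
sex <- factor(sex_raw, levels = c(1, 2), labels = c("Male", "Female"))

print(sex)
table(sex, useNA = "ifany")
```

Every categorical field needs its own levels and labels pair, which is why this approach becomes tedious and error-prone across a large project.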

Furthermore, complex study designs—such as those used in clinical trials, cohort studies, and observational research—often involve longitudinal data or repeating instruments, adding another layer of complexity to data management.

Longitudinal data is used when information is collected at multiple time points or study events (e.g., Baseline, Follow-up).
Repeating instruments allow a single form to be completed multiple times per participant (e.g., recording multiple adverse events, medications, or hospital visits).

Handling these structured data formats in R requires additional steps for cleaning and organisation.

To address these challenges, the REDCapTidieR package extends the functionality of REDCapR, making it easier to analyse complex REDCap datasets.

Try It Yourself

Read records from 260 to 266 and the following fields: record_id, mel_type, systemic_type, systemic_stage, best_response_percist, mortality_treatment_time and mortality_cause.

Solution

# Define the specific fields to retrieve
selected_fields <- c("record_id", "mel_type", "systemic_type", "systemic_stage",
                     "best_response_percist", "mortality_treatment_time",
                     "mortality_cause")

# Read only the selected records and fields
immuno_subset <- redcap_read(
  redcap_uri = uri, 
  token = token, 
  fields = selected_fields,
  records = seq(260, 266)
)$data

# print the subset
immuno_subset

Reading all REDCap data

Unlike REDCapR, which returns a single large dataframe, REDCapTidieR automatically structures and organises the data by breaking it into separate tibbles, each representing a different REDCap instrument. This makes it easier to work with studies involving multiple forms and events.
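The following rough base R sketch makes the idea concrete. The data frame is a hypothetical toy shaped like REDCapR output (one row per record, event and repeat instance, with NA in redcap_repeat_instrument for nonrepeating forms); the instrument names and the ae_type column are made up. It is not how REDCapTidieR is implemented, and unlike REDCapTidieR it keeps every column in every table.

```r
# Hypothetical REDCapR-style output mixing a nonrepeating and a repeating form
raw <- data.frame(
  record_id = c(1, 1, 1, 2, 2),
  redcap_event_name = "baseline_arm_1",
  redcap_repeat_instrument = c(NA, "adverse_events", "adverse_events", NA, "adverse_events"),
  redcap_repeat_instance = c(NA, 1, 2, NA, 1),
  first_name = c("Hannah", NA, NA, "Samantha", NA),
  ae_type = c(NA, "Gastrointestinal", "Skin", NA, "Skin")
)

# Give the nonrepeating rows a form name of their own (assumed: demographics)
form <- ifelse(is.na(raw$redcap_repeat_instrument), "demographics",
               raw$redcap_repeat_instrument)

# split() returns a named list with one data frame per form
by_form <- split(raw, form)
sapply(by_form, nrow)
```

The two resulting tables have different row counts because one form is filled in once per participant and the other as many times as needed.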

Before using REDCapTidieR, ensure it is installed along with its dependencies:

# Install REDCapTidieR from CRAN if it is not already available
# (development version: devtools::install_github("CHOP-CGTInformatics/REDCapTidieR"))
if (!requireNamespace("REDCapTidieR", quietly = TRUE)) {
  install.packages("REDCapTidieR")
}

To import the entire project as a structured set of tables (a supertibble), use read_redcap():

# Load required packages
library(REDCapTidieR)

# Read entire REDCap project data
immuno <- read_redcap(redcap_uri = uri, token = token, raw_or_label = "raw")

# print the structure of imported data
str(immuno, max.level = 2)

suprtbl [15 × 11] (S3: redcap_supertbl/tbl_df/tbl/data.frame)
 $ redcap_form_name : chr [1:15] "demographics" "melanoma_data" "adjuvant_therapy" "prior_treatment" ...
 $ redcap_form_label: chr [1:15] "Demographics" "Melanoma Data" "Adjuvant Therapy" "Systemic Therapy for Advanced Disease" ...
 $ redcap_data      :List of 15
 $ redcap_metadata  :List of 15
 $ redcap_events    :List of 15
 $ structure        : chr [1:15] "nonrepeating" "nonrepeating" "repeating" "repeating" ...
 $ data_rows        : int [1:15] 498 498 288 765 498 231 144 177 144 441 ...
 $ data_cols        : int [1:15] 38 26 61 177 22 67 41 8 128 47 ...
 $ data_size        : 'lobstr_bytes' num [1:15] 181.33 kB 130.62 kB 117.23 kB   1.10 MB  97.36 kB ...
 $ data_na_pct      : 'formattable' num [1:15]  13%  22%  82%  41%  62% ...
  ..- attr(*, "formattable")=List of 4
 $ form_complete_pct: 'formattable' num [1:15]  74%  74%  97%  96%  34% ...
  ..- attr(*, "formattable")=List of 4

Exploring the Data

The supertibble object can be viewed with the RStudio Data Viewer: click the table icon next to immuno in the Environment tab. At a glance, you get an overview of the instruments in the REDCap project.

Data Viewer showing the `immuno` supertibble

You can drill down into individual tables in the redcap_data and redcap_metadata columns. Note that in the demographics data tibble, each row represents a patient, identified by their record_id.

Data Viewer showing the `demographics` data tibble

In the pet_imaging data tibble, each row holds the information from one PET scan of a specific patient, so a row is identified by the combination of record_id and redcap_form_instance. This difference in granularity arises because pet_imaging is a repeating instrument, whereas demographics is a nonrepeating instrument.

Data Viewer showing the `pet_imaging` data tibble
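A quick way to check which columns identify a row is to look for duplicates with base R's anyDuplicated(), which returns 0 when there are none. The sketch below uses hypothetical toy tables; the key column names mirror those described above, suv_max is a made-up measurement column, and all values are invented.

```r
# Hypothetical nonrepeating table: one row per patient
demographics <- data.frame(record_id = c(1, 2, 3), sex = c(1, 2, 2))

# Hypothetical repeating table: several scans per patient
pet_imaging <- data.frame(
  record_id = c(1, 1, 2, 3, 3, 3),
  redcap_form_instance = c(1, 2, 1, 1, 2, 3),
  suv_max = c(4.1, 3.2, 6.0, 2.5, 2.7, 3.9)
)

# record_id alone is unique in the nonrepeating table ...
anyDuplicated(demographics$record_id) == 0
# ... but not in the repeating one
anyDuplicated(pet_imaging$record_id) == 0
# record_id together with redcap_form_instance is unique
anyDuplicated(pet_imaging[, c("record_id", "redcap_form_instance")]) == 0
```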

You can also explore the metadata tibbles in the redcap_metadata column to find out about field labels, field types, and other field attributes.

Data Viewer showing the `demographics` metadata tibble

Extracting data tibbles from the supertibble

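REDCapTidieR provides three functions to extract data tibbles from a supertibble: bind_tibbles() and extract_tibbles(), described below, and extract_tibble(), which returns a single data tibble.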

Binding data tibbles into the environment

The bind_tibbles() function takes a supertibble and binds its data tibbles directly into the global environment. When you use bind_tibbles() while working interactively in the RStudio IDE, you will see data tibbles appear in the Environment pane.

immuno |> bind_tibbles()

Demonstration of the `bind_tibbles` function

By default, bind_tibbles() extracts all data tibbles from the supertibble. With the tbls argument you can specify a subset of data tibbles that should be extracted.
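Conceptually, turning a named list of tables into separate objects is what base R's list2env() does. The sketch below only illustrates that idea and is not REDCapTidieR's code: the list is hypothetical toy data, and it writes into a fresh environment rather than the global one so your workspace stays clean (envir = globalenv() would mirror what bind_tibbles() does).

```r
# Hypothetical named list of instrument tables, like extract_tibbles() returns
tbls <- list(
  demographics   = data.frame(record_id = 1:3),
  mortality_data = data.frame(record_id = c(1, 3), mortality = c(0, 1))
)

# Keep only the tables you want (the role played by the tbls argument)
wanted <- tbls["mortality_data"]

# Create one object per list element in a fresh environment
target_env <- new.env()
list2env(wanted, envir = target_env)

ls(target_env)
get("mortality_data", envir = target_env)
```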

Extracting a list of data tibbles

The extract_tibbles() function takes a supertibble and returns a named list of data tibbles. The default is to extract all data tibbles. We use str here to show the structure of the list returned by extract_tibbles().

immuno_instrument_list <- immuno |>
  extract_tibbles()

immuno_instrument_list |>
  str(max.level = 1)

List of 15
 $ demographics                                     : tibble [498 × 38] (S3: tbl_df/tbl/data.frame)
 $ melanoma_data                                    : tibble [498 × 26] (S3: tbl_df/tbl/data.frame)
 $ adjuvant_therapy                                 : tibble [288 × 61] (S3: tbl_df/tbl/data.frame)
 $ prior_treatment                                  : tibble [765 × 177] (S3: tbl_df/tbl/data.frame)
 $ melanoma_cns_metastases                          : tibble [498 × 22] (S3: tbl_df/tbl/data.frame)
 $ adverse_events                                   : tibble [231 × 67] (S3: tbl_df/tbl/data.frame)
 $ baseline_visit                                   : tibble [144 × 41] (S3: tbl_df/tbl/data.frame)
 $ checkpoint_inhibitor_treatment                   : tibble [177 × 8] (S3: tbl_df/tbl/data.frame)
 $ immune_related_adverse_events_iraes              : tibble [144 × 128] (S3: tbl_df/tbl/data.frame)
 $ pathology                                        : tibble [441 × 47] (S3: tbl_df/tbl/data.frame)
 $ ctmri_imaging                                    : tibble [332 × 12] (S3: tbl_df/tbl/data.frame)
 $ pet_imaging                                      : tibble [35 × 112] (S3: tbl_df/tbl/data.frame)
 $ ppi_and_antibiotic_use_during_treatment_with_cpis: tibble [23 × 7] (S3: tbl_df/tbl/data.frame)
 $ response_data                                    : tibble [144 × 23] (S3: tbl_df/tbl/data.frame)
 $ mortality_data                                   : tibble [381 × 10] (S3: tbl_df/tbl/data.frame)

Adding variable labels with the labelled package

The REDCapTidieR package lets you attach labels to the variables in a supertibble. Variable labels can make data exploration easier.

immuno |>
  make_labelled() |>
  bind_tibbles()

The make_labelled() function takes a supertibble and returns a supertibble with variable labels applied to the variables of the supertibble as well as to the variables of all data and metadata tibbles in the redcap_data and redcap_metadata columns of the supertibble.

The RStudio Data Viewer shows variable labels below variable names.

Data Viewer showing part of a labelled supertibble
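The labelled package stores a variable label as a "label" attribute on the column, which is what the RStudio Data Viewer displays. A minimal base R sketch of that mechanism, using a hypothetical toy table:

```r
# Hypothetical toy table
mortality_toy <- data.frame(record_id = c(3, 4), mortality = c(1, 0))

# Attach a label as a "label" attribute, the convention labelled relies on
attr(mortality_toy$mortality, "label") <- "Has participant deceased?"

# Read one label back
attr(mortality_toy$mortality, "label")

# Labels of all columns (NULL where no label is set)
lapply(mortality_toy, attr, which = "label")
```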

You can use the labelled::look_for() function to explore the variable labels of a tibble.

labelled::look_for(mortality_data)

pos	variable	label	col_type	missing	levels	value_labels
1	record_id	Record ID	dbl	0	NULL	NULL
2	redcap_event	REDCap Event	chr	0	NULL	NULL
3	mortality	Has participant deceased?	dbl	14	NULL	NULL
4	mortality_date	Date of last follow up or death	date	10	NULL	NULL
5	ongoing_survelliance	Ongoing melanoma imaging surveillance?	dbl	71	NULL	NULL
6	date_last_scan	Date of last scan	date	113	NULL	NULL
7	mortality_treatment_time	Time since first treatment dose (months)	dbl	60	NULL	NULL
8	mortality_cause	Cause of Death	dbl	156	NULL	NULL
9	other_cause_death	Cause of death if not melanoma	lgl	381	NULL	NULL
10	form_status_complete	REDCap Instrument Completed?	dbl	0	NULL	NULL

These labels are the REDCap field labels that prompt data entry in the REDCap instrument. REDCapTidieR places them into the field_label variable of the instrument’s metadata tibble. Below you can see that the field labels of the REDCap instrument for mortality_data are the same as the labels above.

REDCap data entry view of the `mortality_data` instrument

In the demographics instrument, one label ends with a trailing colon (see the label of the autoimmune_disease_select___9 variable below). This won’t look good as a variable label, so let’s remove it.

labelled::look_for(demographics)

pos	variable	label	col_type	missing	levels	value_labels
1	record_id	Record ID	dbl	0	NULL	NULL
2	redcap_event	REDCap Event	chr	0	NULL	NULL
3	ur	UR Number	dbl	8	NULL	NULL
4	last_name	Last Name	chr	8	NULL	NULL
5	first_name	First Name	chr	8	NULL	NULL
6	sex	Gender	dbl	8	NULL	NULL
7	dob	Date of Birth	date	8	NULL	NULL
8	height	Height	dbl	8	NULL	NULL
9	weight	Weight	dbl	8	NULL	NULL
10	bmi	BMI	dbl	8	NULL	NULL
11	coenrolled___1	Other Studies Enrolled In: MetaMel	dbl	0	NULL	NULL
12	coenrolled___2	Other Studies Enrolled In: Micromac	dbl	0	NULL	NULL
13	coenrolled___3	Other Studies Enrolled In: MRV	dbl	0	NULL	NULL
14	coenrolled___4	Other Studies Enrolled In: SUMMA	dbl	0	NULL	NULL
15	clinical_trial	Participating in a Clinical Trial	dbl	113	NULL	NULL
16	clinical_trial_description	Describe Trial(s)	chr	401	NULL	NULL
17	medical_history___1	Medical History: Chronic kidney disease	dbl	0	NULL	NULL
18	medical_history___2	Medical History: Diabetes	dbl	0	NULL	NULL
19	medical_history___7	Medical History: Diabetes - Type 1	dbl	0	NULL	NULL
20	medical_history___8	Medical History: Diabetes - Type 2	dbl	0	NULL	NULL
21	medical_history___5	Medical History: GN	dbl	0	NULL	NULL
22	medical_history___3	Medical History: Hypertension	dbl	0	NULL	NULL
23	medical_history___4	Medical History: Ischaemic heart disease	dbl	0	NULL	NULL
24	medical_history___6	Medical History: Vasculitis	dbl	0	NULL	NULL
25	medical_history___99	Medical History: Other	dbl	0	NULL	NULL
26	medical_history_other	Medical History Other Unknown	chr	415	NULL	NULL
27	autoimmune_disease	History of Idiopathic Autoimmune Disease	dbl	102	NULL	NULL
28	autoimmune_disease_select___1	Select Autoimmune Disease(s): Connective tissue disease	dbl	0	NULL	NULL
29	autoimmune_disease_select___2	Select Autoimmune Disease(s): Inflammatory arthritis	dbl	0	NULL	NULL
30	autoimmune_disease_select___3	Select Autoimmune Disease(s): Inflammatory bowel disease	dbl	0	NULL	NULL
31	autoimmune_disease_select___4	Select Autoimmune Disease(s): Interstitial lung disease	dbl	0	NULL	NULL
32	autoimmune_disease_select___5	Select Autoimmune Disease(s): Multiple sclerosis	dbl	0	NULL	NULL
33	autoimmune_disease_select___6	Select Autoimmune Disease(s): Sarcoidosis	dbl	0	NULL	NULL
34	autoimmune_disease_select___9	Select Autoimmune Disease(s): Other, specify:	dbl	0	NULL	NULL
35	autoimmune_disease_other	Describe Autoimmune Disease	lgl	498	NULL	NULL
36	rheumatoid_arthritis	Rheumatoid arthritis	lgl	498	NULL	NULL
37	smoking	Smoking History	dbl	100	NULL	NULL
38	form_status_complete	REDCap Instrument Completed?	dbl	0	NULL	NULL

The make_labelled() function has a format_labels argument that you can use to preprocess labels before applying them to variables.

immuno |>
  make_labelled(format_labels = ~ gsub(":", "", .)) |>
  bind_tibbles()

labelled::look_for(demographics, "autoimmune")

pos	variable	label	col_type	missing	levels	value_labels
27	autoimmune_disease	History of Idiopathic Autoimmune Disease	dbl	102	NULL	NULL
28	autoimmune_disease_select___1	Select Autoimmune Disease(s) Connective tissue disease	dbl	0	NULL	NULL
29	autoimmune_disease_select___2	Select Autoimmune Disease(s) Inflammatory arthritis	dbl	0	NULL	NULL
30	autoimmune_disease_select___3	Select Autoimmune Disease(s) Inflammatory bowel disease	dbl	0	NULL	NULL
31	autoimmune_disease_select___4	Select Autoimmune Disease(s) Interstitial lung disease	dbl	0	NULL	NULL
32	autoimmune_disease_select___5	Select Autoimmune Disease(s) Multiple sclerosis	dbl	0	NULL	NULL
33	autoimmune_disease_select___6	Select Autoimmune Disease(s) Sarcoidosis	dbl	0	NULL	NULL
34	autoimmune_disease_select___9	Select Autoimmune Disease(s) Other, specify	dbl	0	NULL	NULL
35	autoimmune_disease_other	Describe Autoimmune Disease	lgl	498	NULL	NULL

This removes all colons from the labels, not just the trailing one: the colon after “Select Autoimmune Disease(s)” is dropped too.
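If you only want to drop the trailing colon, anchor the pattern to the end of the string with $. A base R comparison on two labels copied from the look_for() output above:

```r
labels <- c(
  "Select Autoimmune Disease(s): Connective tissue disease",
  "Select Autoimmune Disease(s): Other, specify:"
)

# gsub() removes every colon, as in the make_labelled() call above
gsub(":", "", labels)

# sub() with an end-of-string anchor removes only a final colon
# (plus any trailing whitespace)
sub(":[[:space:]]*$", "", labels)
```

The same pattern could be passed as make_labelled(format_labels = ~ sub(":[[:space:]]*$", "", .)), following the formula style used above.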

Try It Yourself

List all the labels in the prior_treatment instrument that contain the word “response”.

Solution

labelled::look_for(prior_treatment, "response")

Renaming column names using labels

Columns associated with checkbox fields in REDCap forms often have unintuitive names. For example, in the melanoma_data instrument, the melanoma type columns are named mel_type___1, mel_type___2, mel_type___3, and so on. These names correspond to different melanoma subtypes but are not easily interpretable.

To improve readability, these columns can be renamed using their corresponding labels, making the dataset more intuitive for analysis. The following function automates this renaming process by extracting variable labels and applying them to the column names.

# Rename checkbox columns (e.g. mel_type___1) using their variable labels
rename_checkbox_columns <- function(instrument, column_name_prefix) {
  # Look up the matching variables and their labels once
  matches <- labelled::look_for(instrument, column_name_prefix)
  # Build new names from the labels: drop colons, replace spaces with "_"
  new_names <- gsub(":", "", matches$label)
  new_names <- gsub(" ", "_", new_names)
  # Rename by position so each old name receives its own label
  idx <- match(matches$variable, names(instrument))
  names(instrument)[idx] <- new_names
  instrument
}

melanoma_data <- rename_checkbox_columns(melanoma_data, "mel_type___")
head(melanoma_data)

record_id	redcap_event	mel_type_cutaneous	mel_mutation_braf	mel_mutation_nras	mel_mutation_kit	mel_mutation_other	mel_first_date	mel_first_stage	resct1stdiag	mel_date_diag	stage_diagnosis	dt_advanced_dis
1	baseline	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
2	baseline	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
3	baseline	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
4	baseline	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
5	baseline	2	V600M	NA	NA	NA	2015-09-01	2	1	2016-03-10	1	27/10/16
6	baseline	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
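For comparison, here is a dependency-free sketch of the same renaming idea in base R. It reads each column's "label" attribute directly instead of calling labelled::look_for(), and then runs make.names() so the results are always valid, unique R names. The rename_by_label() helper, the toy columns and their labels are all hypothetical.

```r
# Hypothetical helper: rename columns starting with `prefix` by their labels
rename_by_label <- function(df, prefix) {
  idx <- which(startsWith(names(df), prefix))
  for (i in idx) {
    lab <- attr(df[[i]], "label")
    if (is.null(lab)) next               # leave unlabelled columns alone
    lab <- gsub(":", "", lab)            # drop colons
    names(df)[i] <- gsub(" ", "_", lab)  # spaces -> underscores
  }
  # make.names() replaces invalid characters such as "(" with "." and
  # de-duplicates repeated names
  names(df) <- make.names(names(df), unique = TRUE)
  df
}

# Hypothetical checkbox columns with labels
toy_melanoma <- data.frame(record_id = 1:2, mel_type___1 = c(1, 0), mel_type___2 = c(0, 1))
attr(toy_melanoma$mel_type___1, "label") <- "Melanoma Type: Cutaneous"
attr(toy_melanoma$mel_type___2, "label") <- "Melanoma Type: Mucosal"

names(rename_by_label(toy_melanoma, "mel_type___"))
```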

Viewing the Data

This section demonstrates different ways to get to know the immuno dataset and its instruments.

Typing the name of an object prints its first few rows together with some information, such as the number of rows:

immuno

Because this is a large dataset, the output is not shown here, but you can try running the code above.

To view a single column of the immuno object, put the column number inside [[ ]], or write the object name followed by $ and the column name.

For example, to view column 1:

immuno[[1]]

 [1] "demographics"                                     
 [2] "melanoma_data"                                    
 [3] "adjuvant_therapy"                                 
 [4] "prior_treatment"                                  
 [5] "melanoma_cns_metastases"                          
 [6] "adverse_events"                                   
 [7] "baseline_visit"                                   
 [8] "checkpoint_inhibitor_treatment"                   
 [9] "immune_related_adverse_events_iraes"              
[10] "pathology"                                        
[11] "ctmri_imaging"                                    
[12] "pet_imaging"                                      
[13] "ppi_and_antibiotic_use_during_treatment_with_cpis"
[14] "response_data"                                    
[15] "mortality_data"

For example, to view the redcap_form_name column:

immuno$redcap_form_name

 [1] "demographics"                                     
 [2] "melanoma_data"                                    
 [3] "adjuvant_therapy"                                 
 [4] "prior_treatment"                                  
 [5] "melanoma_cns_metastases"                          
 [6] "adverse_events"                                   
 [7] "baseline_visit"                                   
 [8] "checkpoint_inhibitor_treatment"                   
 [9] "immune_related_adverse_events_iraes"              
[10] "pathology"                                        
[11] "ctmri_imaging"                                    
[12] "pet_imaging"                                      
[13] "ppi_and_antibiotic_use_during_treatment_with_cpis"
[14] "response_data"                                    
[15] "mortality_data"

A similar approach gives access to the patient data of all instruments through the redcap_data column (the 3rd column). However, this prints the data of every instrument one after the other, which is hard to read. A better way is to view a single instrument, as follows.

For example, to view the mortality_data instrument, we can access the redcap_data column first (i.e., immuno$redcap_data or immuno[[3]]) and then access the 15th instrument:

head(immuno$redcap_data[[15]]) # same as immuno[[3]][[15]]

record_id	redcap_event	mortality	mortality_date	ongoing_survelliance	date_last_scan	mortality_treatment_time	mortality_cause	other_cause_death	form_status_complete
3	end	1	2017-01-10	2	2016-10-24	4.7	NA	NA	2
4	end	0	2024-02-09	2	2023-06-29	NA	NA	NA	2
7	end	0	2024-10-15	2	2021-09-01	NA	NA	NA	2
8	end	0	2024-09-20	1	2024-09-09	NA	NA	NA	2
12	end	1	2016-11-18	NA	2016-01-04	NA	1	NA	2
16	end	0	2024-04-26	1	2023-07-14	NA	1	NA	2
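Hard-coding position 15 works, but it silently picks the wrong table if the instruments are ever reordered or a form is added. A more robust pattern is to look the position up by name in redcap_form_name. The sketch below builds a hypothetical two-row stand-in for a supertibble (a data frame with a list column) so it runs without a REDCap connection; with the real object you would replace toy_supertbl with immuno.

```r
# Hypothetical stand-in for a supertibble: one row per instrument,
# with each instrument's data stored in a list column
toy_supertbl <- data.frame(redcap_form_name = c("demographics", "mortality_data"))
toy_supertbl$redcap_data <- list(
  data.frame(record_id = 1:3, sex = c(1, 2, 2)),
  data.frame(record_id = c(3, 4), mortality = c(1, 0))
)

# Position-based access, as in the text above
toy_supertbl$redcap_data[[2]]

# Name-based access: find the row whose form name matches
pos <- match("mortality_data", toy_supertbl$redcap_form_name)
mortality <- toy_supertbl$redcap_data[[pos]]
head(mortality)
```

With the real data, the equivalent is immuno$redcap_data[[match("mortality_data", immuno$redcap_form_name)]].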

The dim() function prints the dimensions (rows x columns):

dim(immuno)

[1] 15 11

dim(immuno$redcap_data[[15]])

[1] 381  10

The same information is shown in the Environment pane (top right) as the number of observations (rows) and variables (columns).

The nrow() function prints the number of rows while ncol() prints the number of columns:

nrow(immuno$redcap_data[[15]])

[1] 381

ncol(immuno$redcap_data[[15]])

[1] 10

The View() function gives a spreadsheet-like view of the data frame:

View(immuno)

Clicking the object in the Environment tab opens the same spreadsheet-like view.

The head() function prints the top 6 rows of a data frame:

head(immuno$redcap_data[[15]])

record_id	redcap_event	mortality	mortality_date	ongoing_survelliance	date_last_scan	mortality_treatment_time	mortality_cause	other_cause_death	form_status_complete
3	end	1	2017-01-10	2	2016-10-24	4.7	NA	NA	2
4	end	0	2024-02-09	2	2023-06-29	NA	NA	NA	2
7	end	0	2024-10-15	2	2021-09-01	NA	NA	NA	2
8	end	0	2024-09-20	1	2024-09-09	NA	NA	NA	2
12	end	1	2016-11-18	NA	2016-01-04	NA	1	NA	2
16	end	0	2024-04-26	1	2023-07-14	NA	1	NA	2

Similarly, the tail() function prints the bottom 6 rows of the data frame:

tail(immuno$redcap_data[[15]])

record_id	redcap_event	mortality	mortality_date	ongoing_survelliance	date_last_scan	mortality_treatment_time	mortality_cause	other_cause_death	form_status_complete
492	end	1	2017-10-18	NA	NA	6.8	1	NA	2
493	end	1	2024-06-26	2	NA	32.5	1	NA	2
494	end	1	2016-02-12	2	NA	6.1	1	NA	2
496	end	1	2018-06-28	2	NA	NA	1	NA	2
497	end	1	2016-08-04	2	NA	1.1	1	NA	2
499	end	NA	2018-10-09	NA	NA	NA	NA	NA	2
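Both head() and tail() take an n argument that changes the number of rows from the default of 6; a negative n drops that many rows from the other end instead. A quick illustration on a hypothetical toy data frame:

```r
# Hypothetical toy data frame with 10 rows
df <- data.frame(record_id = 1:10, mortality = rep(c(0, 1), 5))

head(df, n = 3)    # first 3 rows instead of the default 6
tail(df, n = 2)    # last 2 rows
head(df, n = -8)   # every row except the last 8
```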

The colnames() function displays all the column names:

colnames(immuno$redcap_data[[15]])

 [1] "record_id"                "redcap_event"            
 [3] "mortality"                "mortality_date"          
 [5] "ongoing_survelliance"     "date_last_scan"          
 [7] "mortality_treatment_time" "mortality_cause"         
 [9] "other_cause_death"        "form_status_complete"

The $ operator gives access to individual columns. To display the first few values of the mortality_date column:

head(immuno$redcap_data[[15]]$mortality_date)

[1] "2017-01-10" "2024-02-09" "2024-10-15" "2024-09-20" "2016-11-18"
[6] "2024-04-26"

The str() function shows the structure of the data:

str(immuno$redcap_data[[15]])

tibble [381 × 10] (S3: tbl_df/tbl/data.frame)
 $ record_id               : num [1:381] 3 4 7 8 12 16 23 24 28 29 ...
 $ redcap_event            : chr [1:381] "end" "end" "end" "end" ...
 $ mortality               : num [1:381] 1 0 0 0 1 0 1 1 0 1 ...
 $ mortality_date          : Date[1:381], format: "2017-01-10" "2024-02-09" ...
 $ ongoing_survelliance    : num [1:381] 2 2 2 1 NA 1 NA 2 1 2 ...
 $ date_last_scan          : Date[1:381], format: "2016-10-24" "2023-06-29" ...
 $ mortality_treatment_time: num [1:381] 4.7 NA NA NA NA NA NA 17.2 80.4 9.7 ...
 $ mortality_cause         : num [1:381] NA NA NA NA 1 1 1 1 1 1 ...
 $ other_cause_death       : logi [1:381] NA NA NA NA NA NA ...
 $ form_status_complete    : num [1:381] 2 2 2 2 2 2 2 2 2 2 ...

The glimpse() function (from the dplyr package) displays a compact summary of a data frame: the data type of each column, its first few values, and the total number of rows and columns.

library(dplyr)
glimpse(immuno$redcap_data[[15]])

Rows: 381
Columns: 10
$ record_id                <dbl> 3, 4, 7, 8, 12, 16, 23, 24, 28, 29, 31, 37, 3…
$ redcap_event             <chr> "end", "end", "end", "end", "end", "end", "en…
$ mortality                <dbl> 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, …
$ mortality_date           <date> 2017-01-10, 2024-02-09, 2024-10-15, 2024-09-…
$ ongoing_survelliance     <dbl> 2, 2, 2, 1, NA, 1, NA, 2, 1, 2, NA, 2, 1, 1, …
$ date_last_scan           <date> 2016-10-24, 2023-06-29, 2021-09-01, 2024-09-…
$ mortality_treatment_time <dbl> 4.7, NA, NA, NA, NA, NA, NA, 17.2, 80.4, 9.7,…
$ mortality_cause          <dbl> NA, NA, NA, NA, 1, 1, 1, 1, 1, 1, 1, 1, NA, N…
$ other_cause_death        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ form_status_complete     <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …

The summary() function generates summary statistics:

summary(immuno$redcap_data[[15]])

   record_id     redcap_event         mortality      mortality_date      
 Min.   :  3.0   Length:381         Min.   :0.0000   Min.   :2013-11-15  
 1st Qu.:150.0   Class :character   1st Qu.:0.0000   1st Qu.:2018-08-02  
 Median :270.0   Mode  :character   Median :1.0000   Median :2021-07-20  
 Mean   :265.2                      Mean   :0.5313   Mean   :2021-04-11  
 3rd Qu.:382.0                      3rd Qu.:1.0000   3rd Qu.:2024-02-15  
 Max.   :499.0                      Max.   :1.0000   Max.   :2024-10-15  
                                    NA's   :14       NA's   :10          
 ongoing_survelliance date_last_scan       mortality_treatment_time
 Min.   :1.000        Min.   :0624-01-02   Min.   :  0.10          
 1st Qu.:1.000        1st Qu.:2020-05-30   1st Qu.: 11.70          
 Median :2.000        Median :2023-07-06   Median : 33.40          
 Mean   :1.597        Mean   :2016-11-19   Mean   : 40.41          
 3rd Qu.:2.000        3rd Qu.:2024-02-09   3rd Qu.: 62.30          
 Max.   :2.000        Max.   :2028-07-06   Max.   :131.50          
 NA's   :71           NA's   :113          NA's   :60              
 mortality_cause other_cause_death form_status_complete
 Min.   :1.000   Mode:logical      Min.   :0.000       
 1st Qu.:1.000   NA's:381          1st Qu.:2.000       
 Median :1.000                     Median :2.000       
 Mean   :1.173                     Mean   :1.995       
 3rd Qu.:1.000                     3rd Qu.:2.000       
 Max.   :3.000                     Max.   :2.000       
 NA's   :156
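The summary above also hints at data-entry errors: date_last_scan runs from 0624-01-02 to 2028-07-06, and neither end looks plausible. The following minimal base R sketch flags such values. The dates are toy values, and the window bounds are assumptions to adapt to your own study.

```r
# Hypothetical toy dates, including two typos like those seen in date_last_scan
date_last_scan <- as.Date(c("2016-10-24", "0624-01-02", "2023-06-29", NA, "2028-07-06"))

# Assumed plausible window: adjust both bounds to your study's timeline
earliest <- as.Date("2000-01-01")
latest   <- as.Date("2024-12-31")

# TRUE for non-missing dates outside the window (missing dates are not flagged)
suspicious <- !is.na(date_last_scan) &
  (date_last_scan < earliest | date_last_scan > latest)

date_last_scan[suspicious]
which(suspicious)
```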

A statistical overview can be obtained using the skim() function from the skimr package:

library(skimr)
skim(immuno$redcap_data[[15]])

Data summary
Name	immuno$redcap_data[[15]]
Number of rows	381
Number of columns	10
_______________________
Column type frequency:
character	1
Date	2
logical	1
numeric	6
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
redcap_event	0	1	3	3	0	1	0

Variable type: Date

skim_variable	n_missing	complete_rate	min	max	median	n_unique
mortality_date	10	0.97	2013-11-15	2024-10-15	2021-07-20	293
date_last_scan	113	0.70	0624-01-02	2028-07-06	2023-07-06	218

Variable type: logical

skim_variable	n_missing	complete_rate	mean	count
other_cause_death	381	0	NaN	:

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
record_id	0	1.00	265.17	134.35	3.0	150.0	270.0	382.0	499.0	▅▇▇▇▇
mortality	14	0.96	0.53	0.50	0.0	0.0	1.0	1.0	1.0	▇▁▁▁▇
ongoing_survelliance	71	0.81	1.60	0.49	1.0	1.0	2.0	2.0	2.0	▆▁▁▁▇
mortality_treatment_time	60	0.84	40.41	31.58	0.1	11.7	33.4	62.3	131.5	▇▅▃▂▁
mortality_cause	156	0.59	1.17	0.54	1.0	1.0	1.0	1.0	3.0	▇▁▁▁▁
form_status_complete	0	1.00	1.99	0.10	0.0	2.0	2.0	2.0	2.0	▁▁▁▁▇

Try It Yourself

Display the number of rows and columns in the melanoma_cns_metastases instrument.
Show the first 6 and last 6 records from the melanoma_cns_metastases instrument.
List the column names and row names of the instrument.
Generate a statistical summary of the instrument using the skim() function.

Solution

# number of rows
nrow(immuno$redcap_data[[5]])
# number of columns
ncol(immuno$redcap_data[[5]])

# first 6 records
head(immuno$redcap_data[[5]])
# last 6 records
tail(immuno$redcap_data[[5]])

# column names
colnames(immuno$redcap_data[[5]])
# row names (no names given, so indices are used)
rownames(immuno$redcap_data[[5]])

skim(immuno$redcap_data[[5]])

Writing Data to a File

Writing data to a file is a fundamental operation in programming and data analysis. It involves taking data from within a program or environment and storing it in a file on a disk for later use or sharing. This section explains the basics of writing a data file using the readr package.

The write_csv() and write_tsv() functions are part of the readr package, which is designed for reading and writing delimited files like CSV (comma-separated values) and TSV (tab-separated values). These functions write data frames to CSV and TSV files, respectively.

We first provide the variable name of the data frame followed by the file name (ideally including the full folder location).

To write a CSV file:

# on Mac:
write_csv(cms_data, "~/Desktop/cms_data.csv")

# on Windows:
write_csv(cms_data, "C:/Users/srajapaksa/Desktop/cms_data.csv")

To write a TSV file:

# on Mac:
write_tsv(cms_data, "~/Desktop/cms_data.tsv")

# on Windows:
write_tsv(cms_data, "C:/Users/srajapaksa/Desktop/cms_data.tsv")
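The Desktop paths above are specific to one user's machine. As a minimal sketch that runs on any operating system, the code below assumes a small made-up data frame called toy in place of cms_data, writes it to temporary files created by tempfile(), and reads the files back with read_csv() and read_tsv() to confirm the contents survived:

```r
library(readr)

# Made-up stand-in for cms_data (hypothetical columns)
toy <- data.frame(record_id = 1:3, site = c("brain", "lung", "liver"))

# tempfile() returns a platform-independent path in R's temporary folder
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")

write_csv(toy, csv_path)  # comma-separated
write_tsv(toy, tsv_path)  # tab-separated

# Read both files back; show_col_types = FALSE silences the column summary message
csv_back <- read_csv(csv_path, show_col_types = FALSE)
tsv_back <- read_tsv(tsv_path, show_col_types = FALSE)

# The first line of each file is the header row
readLines(csv_path, n = 1)  # "record_id,site"
readLines(tsv_path, n = 1)  # "record_id\tsite"
```

The only difference between the two files is the delimiter: a comma in the CSV file and a tab (\t) in the TSV file.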

Try It Yourself

View the help documentation of the read_redcap() function.
Use read_redcap() to make an API call and read the response_data instrument.
Extract the first 10 rows and assign them to a new variable called response_data_10.
Save response_data_10 as a CSV file named immuno_response_data_10.csv in your Downloads folder.

Solution

help("read_redcap")

response_data_df <- read_redcap(redcap_uri = uri, token = token, raw_or_label = "raw", forms = "response_data")

response_data_10 <- head(response_data_df$redcap_data[[1]], 10)

write_csv(response_data_10, "~/Downloads/immuno_response_data_10.csv")

Data manipulation with `dplyr` functions

Common tasks when working with data include filtering rows, selecting columns, performing calculations, and adding new columns. These operations are known as data manipulation: the process of cleaning, organising, and transforming raw data into a more structured and usable format for analysis.

In this workshop, we’ll guide you through the process of data manipulation in R, starting with the tidyverse. The tidyverse is a collection of packages that align with a data science philosophy developed by Hadley Wickham and the RStudio team. Many users find it to be a more intuitive way to grasp R concepts.

You’ll primarily use five key dplyr functions for data manipulation:

filter(): pick observations based on their values.
select(): pick variables by their names.
mutate(): create new variables using functions applied to existing variables.
summarise(): collapse multiple values into a single summary.
arrange(): reorder the rows based on specified criteria.
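To see how the five verbs fit together before applying them to the workshop data, here is a small sketch on a made-up tibble. The patients data frame and its columns are hypothetical and are not part of the REDCap dataset:

```r
library(dplyr)

# Hypothetical toy data; all column names are made up for illustration
patients <- tibble(
  id = 1:5,
  stage = c("I", "III", "IV", "II", "III"),
  weight = c(70, 82, 65, 90, 58),
  height_cm = c(170, 180, 160, 175, 165),
  site = c("A", "B", "A", "B", "A")
)

late_stage <- patients |>
  filter(stage %in% c("III", "IV")) |>          # keep rows for stage III/IV
  select(id, stage, weight, height_cm) |>       # keep four columns (drops site)
  mutate(bmi = weight / (height_cm / 100)^2) |> # add a BMI column
  arrange(desc(bmi))                            # highest BMI first

late_stage_summary <- late_stage |>
  summarise(n = n(), mean_bmi = mean(bmi))      # collapse to a single row
```

Here late_stage keeps patients 3, 2 and 5 (in that order of decreasing BMI), and late_stage_summary is a one-row data frame.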

If you’ve already installed the tidyverse package (if not, you can do so by running the command: install.packages("tidyverse")), let’s proceed to load it into our R session first:

library(tidyverse)

Next, load the pre-processed RDS Object:

immuno_dataset <- readRDS("data/Sample_immuno_dataset.rds")

RDS (R Data Serialisation) files are used to save and load single R objects while preserving their structure, labels, and attributes. The .rds format is useful for storing dataframes, lists, models, and other complex objects. For convenience, the previously loaded REDCap dataset has been pre-processed and saved as an .rds file. This pre-processed version will be used throughout the remainder of the workshop.
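A short sketch of why .rds is convenient: saving a made-up data frame with saveRDS() and reading it back with readRDS() returns an identical object, including factor levels and custom attributes that a CSV file would not keep. The toy object and its "source" attribute are hypothetical:

```r
# Hypothetical data frame with a factor column and a custom attribute
toy <- data.frame(
  id = 1:3,
  stage = factor(c("I", "II", "I"), levels = c("I", "II", "III"))
)
attr(toy, "source") <- "hypothetical example"

rds_path <- tempfile(fileext = ".rds")
saveRDS(toy, rds_path)          # serialise the whole object to disk
restored <- readRDS(rds_path)   # read it back into a new variable

identical(toy, restored)        # TRUE: nothing was lost
levels(restored$stage)          # "I" "II" "III", including the unused level
```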

`filter()`

The filter() function takes logical expressions and returns the rows for which all are TRUE.
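Because filter() keeps only rows where the condition is TRUE, rows where the condition evaluates to NA (for example, a missing value compared with a number) are dropped. A minimal sketch on a made-up tibble of follow-up times:

```r
library(dplyr)

# Hypothetical follow-up times in months, one of them missing
toy <- tibble(id = 1:4, months = c(5, NA, 20, 14))

# NA > 12 is NA, not TRUE, so id 2 is dropped
over_a_year <- toy |> filter(months > 12)
over_a_year$id              # 3 4

# Keep missing values explicitly if you need them
over_a_year_or_missing <- toy |> filter(months > 12 | is.na(months))
over_a_year_or_missing$id   # 2 3 4
```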

Example 1: Find all records from the melanoma_data data frame where the melanoma type is cutaneous.

immuno_dataset$redcap_data$melanoma_data |> 
  filter(melanoma_type == "cutaneous") |> 
  head()

record_id	redcap_event	melanoma_type	melanoma_molecular_mutation	mel_first_date	mel_first_stage	resct1stdiag	mel_date_diag	stage_diagnosis	dt_advanced_dis
16	baseline	cutaneous	nras	2019-06-01	Stage III	TRUE	2017-01-01	yes	01062017
24	baseline	cutaneous	wild_type	2014-03-27	Stage II	TRUE	2015-01-01	yes	1/9/15
28	baseline	cutaneous	braf	2015-09-01	Stage II	TRUE	2016-03-10	yes	27/10/16
57	baseline	cutaneous	nras	2019-12-01	Stage I	TRUE	2020-07-09	yes	08/01/2021
58	baseline	cutaneous	wild_type	2013-10-01	Stage I	TRUE	2015-09-14	yes	6/3/17
60	baseline	cutaneous	braf	2017-03-01	Stage III	TRUE	2019-11-01	no	1/11/19

Here we pipe the immuno_dataset$redcap_data$melanoma_data data frame into filter(), which tests each value in the melanoma_type column against “cutaneous” and returns the rows where this condition is TRUE. head() then displays the first six of those rows.

You can check the dimensions (number of rows and number of columns) of the resulting data frame with the dim() function:

immuno_dataset$redcap_data$melanoma_data |> filter(melanoma_type == "cutaneous") |> dim()

[1] 265  10

Example 2: Identify records in mortality_data where the time since the first treatment dose (mortality_treatment_time) exceeds 1 year. Because mortality_treatment_time is recorded in months, the threshold is 12.

immuno_dataset$redcap_data$mortality_data |> 
  filter(mortality_treatment_time > 12) |> 
  head()

record_id	redcap_event	mortality	mortality_date	ongoing_survelliance	date_last_scan	mortality_treatment_time	mortality_cause
24	end	TRUE	2017-03-03	no	NA	17.2	melanoma progression
28	end	FALSE	2023-08-18	yes	2022-08-03	80.4	melanoma progression
31	end	TRUE	2017-11-09	NA	NA	54.3	melanoma progression
38	end	FALSE	2024-04-15	yes	2024-06-01	109.6	NA
43	end	FALSE	2024-06-12	yes	2024-06-13	61.9	NA
48	end	FALSE	2023-01-26	yes	NA	89.0	NA

We can combine multiple conditions using the logical operators & (and) and | (or).

Example 3: Find all the records in mortality_data where the cause of death (mortality_cause) is categorised as “melanoma progression” and the date of last follow-up or death (mortality_date) is after “2023-01-01”.
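The sketch below, on a made-up tibble with hypothetical cause and date columns, shows & and | side by side. It also shows that separating conditions with a comma inside filter() is equivalent to joining them with &:

```r
library(dplyr)

toy <- tibble(
  cause = c("melanoma progression", "treatment toxicity", "melanoma progression"),
  date  = as.Date(c("2022-05-01", "2023-06-01", "2024-02-10"))
)

# Both conditions must be TRUE: only row 3 is kept
with_and   <- toy |> filter(cause == "melanoma progression" & date > as.Date("2023-01-01"))
# A comma between conditions means the same as &
with_comma <- toy |> filter(cause == "melanoma progression", date > as.Date("2023-01-01"))
# At least one condition must be TRUE: rows 2 and 3 are kept
with_or    <- toy |> filter(cause == "treatment toxicity" | date > as.Date("2023-01-01"))
```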

immuno_dataset$redcap_data$mortality_data |> 
  filter(mortality_cause == "melanoma progression" & mortality_date > as.Date("2023-01-01")) |> 
  head()

record_id	redcap_event	mortality	mortality_date	ongoing_survelliance	date_last_scan	mortality_treatment_time	mortality_cause
16	end	FALSE	2024-04-26	yes	2023-07-14	NA	melanoma progression
28	end	FALSE	2023-08-18	yes	2022-08-03	80.4	melanoma progression
68	end	FALSE	2024-01-19	yes	2023-04-18	46.1	melanoma progression
72	end	FALSE	2024-03-21	yes	2024-03-15	53.6	melanoma progression
77	end	FALSE	2024-07-05	yes	2024-06-28	91.1	melanoma progression
87	end	TRUE	2024-01-28	no	2015-10-17	24.9	melanoma progression

Example 4: Find the records in melanoma_data where melanoma_molecular_mutation is either braf or nras.

immuno_dataset$redcap_data$melanoma_data |> 
  filter(melanoma_molecular_mutation == "braf" | melanoma_molecular_mutation == "nras") |> 
  head()

record_id	redcap_event	melanoma_type	melanoma_molecular_mutation	mel_first_date	mel_first_stage	resct1stdiag	mel_date_diag	stage_diagnosis	dt_advanced_dis
28	baseline	cutaneous	braf	2015-09-01	Stage II	TRUE	2016-03-10	yes	27/10/16
60	baseline	cutaneous	braf	2017-03-01	Stage III	TRUE	2019-11-01	no	1/11/19
70	baseline	cutaneous	braf	2012-01-01	Stage unknown as pathology unavailable	TRUE	2017-08-18	no	NA
79	baseline	cutaneous	braf	2015-04-01	Stage unknown as pathology unavailable	TRUE	2018-12-01	no	1/12/18
80	baseline	cutaneous	braf	2013-01-01	Stage I	TRUE	2015-10-01	yes	8/10/15
85	baseline	cutaneous	braf	2012-07-01	Stage II	TRUE	2014-12-04	no	4/12/14

Example 5: Retrieve records where mortality_cause is “treatment toxicity” and mortality_treatment_time is greater than 4 months but less than or equal to 10 months.

immuno_dataset$redcap_data$mortality_data |> 
  filter(
    mortality_cause == "treatment toxicity" & 
    mortality_treatment_time > 4 & 
    mortality_treatment_time <= 10)

record_id	redcap_event	mortality	mortality_date	ongoing_survelliance	date_last_scan	mortality_treatment_time	mortality_cause
414	end	TRUE	2019-02-18	no	2019-01-31	7.0	treatment toxicity
429	end	TRUE	2020-04-22	no	NA	5.2	treatment toxicity

`%in%` helper

The %in% operator tests whether each element of one vector is present in another vector. It returns a logical vector containing one TRUE or FALSE for each element of the first vector.

When we want to keep rows that match any of several values, it is more concise to provide a vector of the values of interest than to combine multiple OR (|) conditions.
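A quick base R sketch with a made-up vector shows that %in% gives the same answer as a chain of == comparisons joined by |, except for missing values: %in% returns FALSE for NA, whereas == returns NA. Inside filter() both versions drop the NA row, because only TRUE rows are kept.

```r
mutations <- c("braf", "nras", "kit", NA)

in_result <- mutations %in% c("braf", "nras")
in_result   # TRUE TRUE FALSE FALSE

or_result <- mutations == "braf" | mutations == "nras"
or_result   # TRUE TRUE FALSE NA
```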

Example 6: Retrieve records where melanoma_type is acral, cutaneous or mucosal.

immuno_dataset$redcap_data$melanoma_data |> 
  filter(melanoma_type %in% c("acral", "cutaneous", "mucosal")) |> 
  head()

record_id	redcap_event	melanoma_type	melanoma_molecular_mutation	mel_first_date	mel_first_stage	resct1stdiag	mel_date_diag	stage_diagnosis	dt_advanced_dis
16	baseline	cutaneous	nras	2019-06-01	Stage III	TRUE	2017-01-01	yes	01062017
24	baseline	cutaneous	wild_type	2014-03-27	Stage II	TRUE	2015-01-01	yes	1/9/15
28	baseline	cutaneous	braf	2015-09-01	Stage II	TRUE	2016-03-10	yes	27/10/16
44	baseline	acral	NA	2012-04-20	Stage II	TRUE	2016-03-30	no	30/3/16
57	baseline	cutaneous	nras	2019-12-01	Stage I	TRUE	2020-07-09	yes	08/01/2021
58	baseline	cutaneous	wild_type	2013-10-01	Stage I	TRUE	2015-09-14	yes	6/3/17

Try It Yourself

Find patients who have stage III or IV disease (mel_first_stage column) and whose first recurrence (mel_date_diag) is after 2021-03-01.

Hint: Use melanoma_data instrument

Solution

immuno_dataset$redcap_data$melanoma_data |> 
  filter(
    mel_first_stage %in% c("Stage III", "Stage IV") & 
    mel_date_diag > as.Date("2021-03-01"))

`select()`

The select() function returns a subset of the variables or columns.

This function accepts column names (quotation marks are not needed) or column positions, counted from the left. Unlike base R indexing, which we explored earlier, multiple columns passed to select() do not need to be combined with c().

Example 1: Extract the record ID, echo date (ae_echo_date) and MRI date (ae_mri_date) columns from the adverse_events data frame.

immuno_dataset$redcap_data$adverse_events |> 
  select(record_id, ae_echo_date, ae_mri_date) |> 
  filter(!is.na(ae_echo_date)) # filter non-missing values in ae_echo_date column

record_id	ae_echo_date	ae_mri_date
104	2022-10-19	2022-09-27
104	2022-10-19	2022-09-27
104	2022-10-19	2022-09-27
104	2022-10-19	2022-09-27

Using column positions:

immuno_dataset$redcap_data$adverse_events |> 
  select(1, 23, 30) |> 
  filter(!is.na(ae_echo_date)) # filter non-missing values in ae_echo_date column

record_id	ae_echo_date	ae_mri_date
104	2022-10-19	2022-09-27
104	2022-10-19	2022-09-27
104	2022-10-19	2022-09-27
104	2022-10-19	2022-09-27

We can use the ‘-’ symbol to extract all columns except for specific ones:

immuno_dataset$redcap_data$demographics |> 
  select(-redcap_event, -ur, -sex, -dob, -height, -weight) |> 
  head()

record_id	last_name	first_name	bmi	other_studies_enrolled_in	medical_history	autoimmune_disease	select_autoimmune_diseases	smoking
1	Jackson	Hannah	68.73	NA	NA	NA	NA	NA
2	Howard	Samantha	29.61	NA	NA	NA	NA	NA
3	Martinez	Noah	31.06	NA	NA	NA	NA	NA
4	Lewis	Aiden	51.51	NA	NA	NA	NA	NA
5	Jenkins	Connor	20.28	NA	NA	FALSE	NA	Past smoker
6	Allen	Claire	28.06	NA	NA	NA	NA	NA

Or use a combination of column names and positions:

immuno_dataset$redcap_data$demographics |> 
  select(1, medical_history, 15) |> 
  head()

record_id	medical_history	smoking
1	NA	NA
2	NA	NA
3	NA	NA
4	NA	NA
5	NA	Past smoker
6	NA	NA

Useful helper functions

The select helper functions (check ?select_helpers) are a set of convenience functions provided by the dplyr package. These functions offer shortcuts for selecting columns based on specific criteria or patterns, making it easier to work with data frames.

Some commonly used select helper functions include:

starts_with(): selects columns that start with a specified prefix.

immuno_dataset$redcap_data$immune_related_adverse_events_iraes |> 
  select(starts_with('liver')) |> 
  head()

liver_date	liver_cycle	liver_grade	liver_path	liver_histo	liver_radiol
NA	NA	NA	NA	NA	NA
NA	NA	NA	NA	NA	NA
NA	NA	NA	NA	NA	NA
NA	NA	NA	NA	NA	NA
NA	NA	NA	NA	NA	NA
NA	NA	NA	NA	NA	NA

ends_with(): selects columns that end with a specified suffix.

immuno_dataset$redcap_data$immune_related_adverse_events_iraes |> 
  select(ends_with('date')) |> 
  head()

hypophysitis_date	thyroiditis_date	pancreatitis_date	skin_date	rheum_date	gastro_date	liver_date	renal_date	pulm_date	neuro_date
2017-01-20	NA	NA	2016-09-21	NA	NA	NA	NA	NA	NA
2017-01-20	NA	NA	2016-09-21	NA	NA	NA	NA	NA	NA
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA

contains(): selects columns that contain a specified substring.

immuno_dataset$redcap_data$immune_related_adverse_events_iraes |> 
  select(contains('skin')) |> 
  head()

skin_date	skin_cycle	characterise_skin/hair_irae	skin_grade	skin_histo
2016-09-21	NA	bullous_pemphigoid	3	Consistent with a pemphigoid-type reaction, possibly drug-related or consistent with Bullous Pemphigoid.
2016-09-21	NA	bullous_pemphigoid	3	Consistent with a pemphigoid-type reaction, possibly drug-related or consistent with Bullous Pemphigoid.
NA	NA	NA	NA	NA
NA	NA	NA	NA	NA
NA	NA	NA	NA	NA
NA	NA	NA	NA	NA

everything(): selects all columns.

Because select() never includes the same column twice, placing everything() after other selections adds all the remaining columns. This makes it useful for reordering the columns of a data frame:

immuno_dataset$redcap_data$immune_related_adverse_events_iraes |> 
  select(1, starts_with("gastro"), everything()) |> 
  head()

record_id	gastro_date	gastro_cycle	gastro_grade	gastro_endoscopy	gastro_histo	gastro_radiol	gastro_cdt	gastro_stool	redcap_event	immune_related_adverse_event	endocrine_toxicity	hypophysitis_date	hypophysitis_time	thyroiditis_date	thyroiditis_time	pancreatitis_date	hypophysitis_cycle	thyroid_cycle	pancreas_cycle	skin_date	skin_cycle	characterise_skin/hair_irae	skin_grade	skin_histo	rheum_date	rheum_cycle	characterise_rheumatic_irae	rheum_grade	rheum_path	characterise_gastrointestinal_irae	liver_date	liver_cycle	liver_grade	liver_path	liver_histo	liver_radiol	renal_date	renal_cycle	characterise_renal_irae_	renal_grade	renal_path	renal_histo	pulm_date	pulm_cycle	characterise_pulmonary_irae	pulm_grade	pulm_radiol	neuro_date	neuro_cycle	classify_neurological_irae	neuro_grade	neuro_radiol	irae_steroids	irae_details	irae_emergency
31	NA	NA	NA	NA	NA	NA	NA	NA	legacy_data	endocrine	pituitary	2017-01-20	234	NA	NA	NA	NA	NA	NA	2016-09-21	NA	bullous_pemphigoid	3	Consistent with a pemphigoid-type reaction, possibly drug-related or consistent with Bullous Pemphigoid.	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	TRUE	ipi 2013 - no irae pembro 2015 - no irae ipi 2016 - Bullous pemphigoid - eruption right thigh, limbs,feet Plaque left cheek Derm Imp: bullous pemphigoid, ?secondary to ipilimumab ipi nivo 2017 - hypophysitis and recurrence of bullous pemphigoid. Steroids and IvIg	TRUE
31	NA	NA	NA	NA	NA	NA	NA	NA	legacy_data	skin/hair	pituitary	2017-01-20	234	NA	NA	NA	NA	NA	NA	2016-09-21	NA	bullous_pemphigoid	3	Consistent with a pemphigoid-type reaction, possibly drug-related or consistent with Bullous Pemphigoid.	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	TRUE	ipi 2013 - no irae pembro 2015 - no irae ipi 2016 - Bullous pemphigoid - eruption right thigh, limbs,feet Plaque left cheek Derm Imp: bullous pemphigoid, ?secondary to ipilimumab ipi nivo 2017 - hypophysitis and recurrence of bullous pemphigoid. Steroids and IvIg	TRUE
74	NA	NA	NA	NA	NA	NA	NA	NA	legacy_data	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
96	NA	NA	NA	NA	NA	NA	NA	NA	legacy_data	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
97	NA	NA	NA	NA	NA	NA	NA	NA	legacy_data	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
102	NA	NA	NA	NA	NA	NA	NA	NA	legacy_data	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA

Here the dimensions of the data frame are unchanged; only the column order differs.

You can combine multiple helper functions to create more complex selection criteria. Additionally, you can use the ‘-’ symbol in front of the helper function to exclude the matched columns.
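A minimal sketch of these ideas on a made-up tibble whose hypothetical column names mimic the irAE instrument: combining a column name with helpers, excluding with -, and reordering with everything():

```r
library(dplyr)

# Made-up stand-in for the irAE instrument (hypothetical columns)
toy <- tibble(
  record_id = 31,
  skin_date = "2016-09-21",
  skin_grade = 3,
  liver_date = NA_character_,
  liver_grade = NA_real_,
  irae_steroids = TRUE
)

# record_id plus the skin columns, minus any column ending in "date"
skin_no_date <- toy |> select(record_id, starts_with("skin"), -ends_with("date"))
names(skin_no_date)   # "record_id" "skin_grade"

# Drop every column matched by a helper
no_liver <- toy |> select(-starts_with("liver"))
names(no_liver)       # "record_id" "skin_date" "skin_grade" "irae_steroids"

# Move one column to the front; everything() keeps the rest, so ncol() is unchanged
reordered <- toy |> select(irae_steroids, everything())
names(reordered)
```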

Try It Yourself

Identify patients whose first response (overall_response) and best response on PET (best_res_pet) were either CMR (Complete metabolic response) or PMR (Partial metabolic response). Display only relevant columns: patient ID (record_id), first response (overall_response), and best PET response (best_res_pet).

Hint: Use response_data instrument

Solution

immuno_dataset$redcap_data$response_data |> 
  filter(best_res_pet %in% c("Complete metabolic response", "Partial metabolic response"), 
         overall_response %in% c("Complete metabolic response", "Partial metabolic response")) |> 
  select(record_id, best_res_pet, overall_response)

`mutate()`

The mutate() function adds new columns of data (or modifies existing ones), thus ‘mutating’ the contents and dimensions of the input data frame.

Example 1: Calculate the BMI of patients, i.e., $\text{BMI} = \frac{\text{weight in kg}}{\text{height in m} \times \text{height in m}}$.

Because height is recorded in centimetres, we divide it by 100 to convert it to metres before multiplying.

immuno_dataset$redcap_data$demographics |> 
  mutate(bmi_new = weight / (height/100 * height/100)) |> 
  head()

record_id	redcap_event	ur	last_name	first_name	sex	dob	height	weight	bmi	other_studies_enrolled_in	medical_history	autoimmune_disease	select_autoimmune_diseases	smoking	bmi_new
1	baseline	2810493	Jackson	Hannah	Male	1943-02-12	154	163	68.73	NA	NA	NA	NA	NA	68.72997
2	baseline	6408685	Howard	Samantha	Male	2008-01-29	181	97	29.61	NA	NA	NA	NA	NA	29.60838
3	baseline	9994173	Martinez	Noah	Male	1940-11-06	199	123	31.06	NA	NA	NA	NA	NA	31.05982
4	baseline	9580798	Lewis	Aiden	Male	1963-06-04	189	184	51.51	NA	NA	NA	NA	NA	51.51032
5	baseline	2653008	Jenkins	Connor	Male	2019-08-15	157	50	20.28	NA	NA	FALSE	NA	Past smoker	20.28480
6	baseline	931154	Allen	Claire	Male	1957-07-12	145	59	28.06	NA	NA	NA	NA	NA	28.06183

This creates a new column named bmi_new at the end of the data frame and computes the BMI. Because the number of columns keeps growing, we can limit the columns displayed with the select() function.

To do this, we chain the functions together with the pipe operator (|>), as introduced earlier.

Let’s use chaining to combine both select() and mutate() operations for the previous example:

immuno_dataset$redcap_data$demographics |> 
  select(record_id, weight, height, bmi) |> 
  mutate(bmi_new = weight / (height/100 * height/100)) |> 
  head()

record_id	weight	height	bmi	bmi_new
1	163	154	68.73	68.72997
2	97	181	29.61	29.60838
3	123	199	31.06	31.05982
4	184	189	51.51	51.51032
5	50	157	20.28	20.28480
6	59	145	28.06	28.06183
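To check the formula by hand, the first two patients in the outputs above (weights of 163 kg and 97 kg, heights of 154 cm and 181 cm) can be recomputed in base R. Squaring with ^2 is equivalent to multiplying the height by itself, and round() to two decimal places reproduces the values in the bmi column:

```r
weight_kg <- c(163, 97)
height_cm <- c(154, 181)

# Convert cm to m, then square: (height / 100)^2 equals height/100 * height/100
bmi <- weight_kg / (height_cm / 100)^2

round(bmi, 2)   # 68.73 29.61
```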

`case_when` helper function

The case_when() function lets you write conditional statements inside mutate(). It is a vectorised alternative to nested ifelse() statements, which makes the code cleaner and easier to read. The cases are evaluated in order, and the first case that matches an element determines the corresponding value in the output. If no case matches, the value given by the .default argument (or by a final TRUE ~ case, as in the template below) is used as the final “else”; without either, unmatched elements become NA.

case_when(
  condition1 ~ value1,
  condition2 ~ value2,
  condition3 ~ value3,
  TRUE ~ default_value
)

Each condition is checked in order, and the corresponding value is assigned if the condition is TRUE. The TRUE ~ default_value at the end acts as a fallback for any rows that do not match previous conditions.

Example 2: Generate a new column, bmi_category, that classifies patients by Body Mass Index (BMI) using the following categories:

“Underweight”: BMI < 18.5
“Normal Weight”: 18.5 ≤ BMI < 25
“Overweight”: 25 ≤ BMI < 30
“Obese”: BMI ≥ 30.

immuno_dataset$redcap_data$demographics |> 
  select(record_id, weight, height, bmi) |> 
  mutate(bmi_category = case_when(
    bmi < 18.5 ~ "Underweight",
    bmi >= 18.5 & bmi < 25 ~ "Normal Weight",
    bmi >= 25 & bmi < 30 ~ "Overweight",
    bmi >= 30 ~ "Obese",
    TRUE ~ "Unknown"
  )) |> head()

record_id	weight	height	bmi	bmi_category
1	163	154	68.73	Obese
2	97	181	29.61	Overweight
3	123	199	31.06	Obese
4	184	189	51.51	Obese
5	50	157	20.28	Normal Weight
6	59	145	28.06	Overweight

Here mutate() creates a new column bmi_category. case_when() assigns categories based on the BMI values and the final condition TRUE ensures any missing or unclassified values are labeled as “Unknown”.
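Because case_when() stops at the first matching case, the lower bounds in the example above are not strictly required. The sketch below, on a made-up vector that includes the boundary value 30 and a missing value, uses shorter conditions and the .default argument instead of TRUE ~ (this assumes dplyr 1.1.0 or later, where .default was introduced):

```r
library(dplyr)

bmi <- c(17.0, 22.4, 30.0, NA)

bmi_category <- case_when(
  bmi < 18.5 ~ "Underweight",
  bmi < 25   ~ "Normal Weight",  # only reached when bmi >= 18.5
  bmi < 30   ~ "Overweight",     # only reached when bmi >= 25
  bmi >= 30  ~ "Obese",
  .default = "Unknown"           # used when no case is TRUE, including NA
)

bmi_category   # "Underweight" "Normal Weight" "Obese" "Unknown"
```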

Try It Yourself

Determine whether patients have “normal” or “abnormal” serum creatinine levels based on their sex (sex) and creatinine (sys_path_creat). Create a new column creat_level with the following categories:

“Normal”:
- Men (sex = “Male”) with creatinine between 0.7 and 1.3 mg/dL
- Women (sex = “Female”) with creatinine between 0.6 and 1.1 mg/dL
“Abnormal”: Otherwise

The demog_prior_treatment dataset, which merges the demographics and prior_treatment instruments, is created by the code below.

Hint: To convert serum creatinine from µmol/L to mg/dL, use the conversion factor:

\[ \text{Creatinine (mg/dL)} = \text{Creatinine (µmol/L)} \times 0.0113 \]

demog_prior_treatment <- full_join(immuno_dataset$redcap_data$demographics,
                                   immuno_dataset$redcap_data$prior_treatment, by = "record_id") |> 
  filter(!is.na(sys_path_creat)) # filter non-missing values in sys_path_creat column

Solution

demog_prior_treatment |> 
  mutate(creat_level = case_when(
    sex == "Male" & sys_path_creat * 0.0113 >= 0.7 & sys_path_creat * 0.0113 <= 1.3 ~ "Normal",
    sex == "Female" & sys_path_creat * 0.0113 >= 0.6 & sys_path_creat * 0.0113 <= 1.1 ~ "Normal",
    TRUE ~ "Abnormal"
)) |> 
  select(record_id, sex, sys_path_creat, creat_level)
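dplyr's between() is a shorthand for x >= left & x <= right (both ends inclusive), so it can express the range checks in the solution more compactly, e.g. sex == "Male" & between(sys_path_creat * 0.0113, 0.7, 1.3) ~ "Normal". A sketch on made-up creatinine values in µmol/L:

```r
library(dplyr)

creat_umol <- c(60, 88, 120)
creat_mg <- creat_umol * 0.0113   # 0.678, 0.9944, 1.356 mg/dL

# Male reference range from the exercise: 0.7 to 1.3 mg/dL inclusive
in_male_range <- between(creat_mg, 0.7, 1.3)
in_male_range   # FALSE TRUE FALSE
```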

`summarise()`

The summarise() function creates individual summary statistics from larger data sets.

The output of summarise()/summarize() differs qualitatively from the input. It results in a smaller dataframe with a reduced representation of the original data. While not strictly necessary, it’s advisable to assign new column names for the summary statistics generated by this function. This practice enhances clarity and organisation in your data analysis workflow.

Example 1: Calculate the mean creatinine level (sys_path_creat) in the prior_treatment data frame.

immuno_dataset$redcap_data$prior_treatment |> 
  summarise(mean_creatinine = mean(sys_path_creat))

mean_creatinine
NA

This results in a 1 row $\times$ 1 column data frame containing NA, R's marker for a missing value. This happens because sys_path_creat contains missing values, and mean() returns NA whenever any of its inputs is NA. To compute the mean creatinine level while excluding missing values, use the na.rm = TRUE argument in the mean() function.

immuno_dataset$redcap_data$prior_treatment |> 
  summarise(mean_creatinine = mean(sys_path_creat, na.rm = TRUE))

mean_creatinine
69.14035
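The same behaviour can be seen on a short made-up vector in base R: a single NA makes the mean NA unless na.rm = TRUE is set.

```r
creat <- c(60, 75, NA)

mean(creat)                # NA, because one value is missing
mean(creat, na.rm = TRUE)  # 67.5, the mean of 60 and 75
```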

We can create additional summary statistics by adding them in a comma-separated sequence as follows:

immuno_dataset$redcap_data$prior_treatment |> 
  summarise(mean_creatinine = mean(sys_path_creat, na.rm = TRUE),
            min_creatinine = min(sys_path_creat, na.rm = TRUE),
            max_creatinine = max(sys_path_creat, na.rm = TRUE),
            total_creatinine = sum(sys_path_creat, na.rm = TRUE))

mean_creatinine	min_creatinine	max_creatinine	total_creatinine
69.14035	51	108	3941

`n()` helper function

This function counts the number of rows in the data (or in each group, when the data are grouped). It takes no arguments; it simply counts the rows.

immuno_dataset$redcap_data$prior_treatment |> 
  summarise(mean_creatinine = mean(sys_path_creat, na.rm = TRUE),
            min_creatinine = min(sys_path_creat, na.rm = TRUE),
            max_creatinine = max(sys_path_creat, na.rm = TRUE),
            total_creatinine = sum(sys_path_creat, na.rm = TRUE), 
            n_rows = n())

mean_creatinine	min_creatinine	max_creatinine	total_creatinine	n_rows
69.14035	51	108	3941	11627
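Note that n() counts every row, including rows where sys_path_creat is missing. Dividing total_creatinine by mean_creatinine (3941 / 69.14035 ≈ 57) suggests that the statistics above rest on only about 57 non-missing values out of 11627 rows. Counting non-missing values directly with sum(!is.na()) makes this explicit; a sketch on a made-up tibble:

```r
library(dplyr)

toy <- tibble(creat = c(60, NA, 75, NA))

counts <- toy |>
  summarise(
    n_rows = n(),                       # all rows: 4
    n_non_missing = sum(!is.na(creat))  # rows with a value: 2
  )
```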

Try It Yourself

Summarise the key laboratory values and treatment outcomes for patients who received systemic therapy. Calculate the mean of C peptide (sys_path_cpep), creatinine (sys_path_creat), and lactate dehydrogenase (systemic_ldh_value).

Hint: Use prior_treatment instrument

Solution

immuno_dataset$redcap_data$prior_treatment |> 
  summarise(
    mean_cpep = mean(sys_path_cpep, na.rm = TRUE),
    mean_creat = mean(sys_path_creat, na.rm = TRUE),
    mean_ldh = mean(systemic_ldh_value, na.rm = TRUE))

`arrange()`

The arrange() function orders rows based on the values in one or more columns, in ascending order by default.

Example 1: Order the records based on the UR number in demographics.

immuno_dataset$redcap_data$demographics |> 
  arrange(ur) |> 
  head()

record_id	redcap_event	ur	last_name	first_name	sex	dob	height	weight	bmi	other_studies_enrolled_in	medical_history	autoimmune_disease	select_autoimmune_diseases	smoking
216	baseline	102028	Johnson	Catherine	Female	1979-08-09	190	103	28.53	mrv	NA	TRUE	inflammatory_bowel_disease	Never smoked
74	baseline	112616	Peterson	Sophie	Male	1929-09-08	161	71	27.39	NA	hypertension	TRUE	inflammatory_arthritis	Never smoked
98	baseline	113893	Green	Nathan	Male	1975-11-29	147	190	87.93	NA	NA	FALSE	NA	Never smoked
249	baseline	121563	Jones	Jack	Male	1998-06-06	193	68	18.26	NA	NA	NA	NA	NA
195	baseline	140072	Foster	Charlotte	Female	1963-05-19	195	135	35.50	micromac	NA	FALSE	NA	Never smoked
260	baseline	146543	Barnes	Abigail	Female	1927-01-16	167	158	56.65	NA	NA	NA	NA	NA

Example 2: Sort the records in mortality_data based on the mortality date first and then by last scan date (date_last_scan).

immuno_dataset$redcap_data$mortality_data |> 
  arrange(mortality_date, date_last_scan) |> 
  head()

record_id	redcap_event	mortality	mortality_date	ongoing_survelliance	date_last_scan	mortality_treatment_time	mortality_cause
70	end	TRUE	2013-11-15	no	NA	45.7	melanoma progression
193	end	TRUE	2015-04-25	no	NA	NA	melanoma progression
331	end	TRUE	2015-05-29	no	NA	28.0	melanoma progression
84	end	TRUE	2015-06-19	no	NA	0.5	melanoma progression
349	end	TRUE	2015-07-25	no	2024-01-17	8.5	melanoma progression
85	end	FALSE	2015-08-05	no	2018-10-25	7.5	melanoma progression

`desc()` helper function

This function is used to sort data in descending order.
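One detail worth knowing: for data frames in memory, arrange() always places missing values last, even when the column is wrapped in desc(). A sketch on a made-up tibble:

```r
library(dplyr)

toy <- tibble(id = 1:4, months = c(45.7, NA, 0.5, 28.0))

ascending  <- toy |> arrange(months) |> pull(id)
descending <- toy |> arrange(desc(months)) |> pull(id)

ascending    # 3 4 1 2  (NA last)
descending   # 1 4 3 2  (NA still last)
```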

Example 3: Sort the records in mortality_data in descending order based on the mortality_treatment_time.

immuno_dataset$redcap_data$mortality_data |> 
  arrange(desc(mortality_treatment_time)) |> 
  head()

record_id	redcap_event	mortality	mortality_date	ongoing_survelliance	date_last_scan	mortality_treatment_time	mortality_cause
188	end	FALSE	2024-09-06	yes	2023-12-22	131.5	NA
345	end	TRUE	2018-08-03	no	NA	115.0	melanoma progression
221	end	FALSE	2024-08-12	yes	2024-06-08	111.8	NA
38	end	FALSE	2024-04-15	yes	2024-06-01	109.6	NA
146	end	FALSE	2024-09-06	no	2020-08-06	108.6	NA
200	end	FALSE	2024-07-12	no	2022-01-04	108.3	NA

Try It Yourself

Exclude records where either the first name (first_name) or last name (last_name) is missing, and then sort the remaining records in ascending order, first by first_name and then by last_name.

Hint: Use demographics instrument. Use is.na() to check for missing values and !is.na() to keep only non-missing values.

Solution

immuno_dataset$redcap_data$demographics |> 
  filter(!is.na(first_name), !is.na(last_name)) |> 
  arrange(first_name, last_name)

`count()` helper

The count() function is used to count the number of occurrences of unique values in one or more variables within a data frame. This function is particularly useful for summarising data and understanding the distribution of values within a dataset.
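count() is a shortcut for grouping by the variables and then summarising with n(). The sketch below, on a made-up tibble, compares the two approaches and shows the sort = TRUE argument, which lists the most frequent value first:

```r
library(dplyr)

toy <- tibble(type = c("cutaneous", "uveal", "cutaneous", NA))

counted <- toy |> count(type)
manual  <- toy |> group_by(type) |> summarise(n = n())

counted   # cutaneous 2, uveal 1, NA 1

# sort = TRUE puts the most frequent value first
sorted <- toy |> count(type, sort = TRUE)
```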

Example 1: Count the number of records of each melanoma type in the melanoma_data data frame.

immuno_dataset$redcap_data$melanoma_data |> 
  count(melanoma_type)

melanoma_type	n
acral	7
cutaneous	265
mucosal	16
pathology_not_available	4
unknown_primary	64
uveal	21
NA	132

Example 2: Count the number of records observed in each melanoma type and melanoma molecular mutation.

immuno_dataset$redcap_data$melanoma_data |> 
  count(melanoma_type, melanoma_molecular_mutation)

melanoma_type	melanoma_molecular_mutation	n
acral	braf	1
acral	kit	2
acral	nras	2
acral	wild_type	1
acral	NA	1
cutaneous	braf	110
cutaneous	kit	3
cutaneous	nras	54
cutaneous	other	1
cutaneous	unknown	3
cutaneous	wild_type	74
cutaneous	NA	20
mucosal	braf	1
mucosal	kit	3
mucosal	nras	3
mucosal	wild_type	9
pathology_not_available	braf	1
pathology_not_available	nras	1
pathology_not_available	wild_type	2
unknown_primary	braf	30
unknown_primary	kit	1
unknown_primary	nras	15
unknown_primary	other	1
unknown_primary	wild_type	16
unknown_primary	NA	1
uveal	braf	1
uveal	nras	1
uveal	other	2
uveal	unknown	1
uveal	wild_type	12
uveal	NA	4
NA	braf	2
NA	wild_type	2
NA	NA	128

Try It Yourself

Determine the number of records with each single-agent immunotherapy (type_of_single_agent_io: ipilimumab, nivolumab, pembrolizumab). Additionally, count the number of patients for each type of best response (best_response_to_ipi, best_response_to_nivo_p, best_response_to_pembro) separately.

Hint: Use prior_treatment instrument.

Solution

immuno_dataset$redcap_data$prior_treatment |> count(type_of_single_agent_io)

immuno_dataset$redcap_data$prior_treatment |> count(best_response_to_ipi)

immuno_dataset$redcap_data$prior_treatment |> count(best_response_to_nivo)

immuno_dataset$redcap_data$prior_treatment |> count(best_response_to_pembro)

Visualising Data

The ggplot2 package simplifies the creation of plots. It offers a streamlined interface for defining the variables to plot, configuring how they are displayed, and adjusting visual attributes. Consequently, adapting to changes in the data or switching between plot types requires only minimal modifications, which makes it easier to create publication-quality plots with little manual adjustment.

If you’ve already installed the tidyverse package (if not, you can do so by running the command: install.packages("tidyverse")), let’s proceed to load it into our R session first:

library(tidyverse)

Next, load the pre-processed RDS Object:

immuno_dataset <- readRDS("data/Sample_immuno_dataset.rds")

Building a Basic Plot

The construction of ggplot graphics is incremental, allowing for the addition of new elements in layers. This approach grants users extensive flexibility and customisation options, enabling the creation of tailored plots to suit specific needs.

To build a ggplot, the following basic templates can be used for different types of plots.

Three things are required for a ggplot:

1. The data

We first specify the data frame that contains the data for the plot. Here we pass immuno_dataset$redcap_data$demographics to the ggplot() function.

# render plot background
ggplot(immuno_dataset$redcap_data$demographics)

This command results in an empty gray panel. We must specify how various columns of the data frame should be depicted in the plot.

2. Aesthetics `aes()`

Next, we specify the columns in the data we want to map to visual properties (called aesthetics or aes in ggplot2). e.g. the columns for x values, y values and colours.

Since we are interested in generating a scatter plot, each point will have an x and a y coordinate. Therefore, we map weight to the x-axis and height to the y-axis.

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = height))

This results in a plot that includes grid lines, axis labels for the mapped variables, and scales for the x and y axes. However, no data points are drawn yet.

3. Geometric Representation `geom_()`

Finally, we specify the type of plot (the geom). There are different types of geoms:

	`geom_blank()` draws an empty plot.
	`geom_segment()` draws a straight line segment between two points. `geom_vline()` draws a vertical line and `geom_hline()` draws a horizontal line.
	`geom_curve()` draws a curved line.
	`geom_line()/geom_path()` makes a line plot. `geom_line()` connects points from left to right and `geom_path()` connects points in the order they appear in the data.

	`geom_point()` produces a scatterplot.
	`geom_jitter()` adds a small amount of random noise to the points in a scatter plot.
	`geom_dotplot()` produces a dot plot.
	`geom_smooth()` adds a smooth trend line to a plot.
	`geom_quantile()` draws fitted quantile regression lines (a scatter plot with regressed quantiles).
	`geom_density()` creates a density plot.

	`geom_histogram()` produces a histogram.
	`geom_bar()` makes a bar chart. Height of the bar is proportional to the number of cases in each group.
	`geom_col()` makes a bar chart. Height of the bar is proportional to the values in data.

	`geom_boxplot()` produces a box plot.
	`geom_violin()` creates a violin plot.

	`geom_ribbon()` produces a ribbon: a filled band between ymin and ymax along x.
	`geom_area()` draws an area plot: a line plot with the area between the line and y = 0 filled in.
	`geom_rect()`, `geom_tile()` and `geom_raster()` draw rectangles.
	`geom_polygon()` draws polygons, which are filled paths.

	`geom_text()` adds text to a plot.
	`geom_label()` adds labels (text drawn inside a rectangle) to a plot.
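A minimal sketch of the geom_line() versus geom_path() distinction, using a hypothetical three-row data frame (`zigzag`) whose rows are deliberately out of x order. layer_data() returns the data each layer will actually draw, so it shows the order in which points get connected.

```r
library(ggplot2)

# Hypothetical toy data: rows deliberately out of x order
zigzag <- data.frame(x = c(3, 1, 2), y = c(1, 2, 3))

# geom_line() sorts the points by x before connecting them
p_line <- ggplot(zigzag, aes(x = x, y = y)) + geom_line()

# geom_path() connects the points in the order the rows appear
p_path <- ggplot(zigzag, aes(x = x, y = y)) + geom_path()

# layer_data() returns the data frame each layer will draw
layer_data(p_line)$x  # 1 2 3
layer_data(p_path)$x  # 3 1 2
```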

The range of geoms available in ggplot2 can be obtained by navigating to the ggplot2 package in the Packages tab pane in RStudio (bottom right-hand corner) and scrolling down the list of functions sorted alphabetically to the geom_... functions.
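The same list can also be pulled up from the console. This sketch assumes ggplot2 is installed and simply lists the exported objects whose names start with `geom_`:

```r
library(ggplot2)

# All objects in the attached ggplot2 package whose names start with "geom_"
geoms <- ls("package:ggplot2", pattern = "^geom_")
length(geoms)
head(geoms)
```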

Since we are interested in creating a scatter plot, the geometric representation of the data will be in point form. Therefore we use the geom_point() function.

To plot BMI against weight:

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point()

Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

Notice that we use the + sign to add a layer of points to the plot. This concept bears resemblance to Adobe Photoshop, where layers of images can be rearranged and edited independently. In ggplot, each layer is added over the plot in accordance with its position in the code using the + sign.
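A small sketch of this layering, using the built-in mtcars data set as a stand-in for the immuno dataset. A ggplot object can be stored in a variable and extended one `+` at a time; `p$layers` holds the layers added so far.

```r
library(ggplot2)

# Data and aesthetic mappings only: no layers yet
p <- ggplot(mtcars, aes(x = wt, y = mpg))
length(p$layers)   # 0

# Each `+` returns a new plot object with one more layer
p <- p + geom_point()
p <- p + geom_smooth(method = "lm", formula = y ~ x)
length(p$layers)   # 2

# Printing p (or calling print(p)) draws the finished plot
```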

A note about |> and +

The ggplot2 package was developed before the pipe operator was introduced. In ggplot2, the + sign plays a role analogous to the pipe operator in other tidyverse functions, enabling code to be written from left to right.
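One practical consequence, sketched here with the built-in mtcars data set: the native pipe `|>` (R 4.1 or later) can feed data into ggplot(), but from that point on layers are joined with `+`. Writing `|> geom_point()` instead would pass the plot object as the first argument of geom_point() and raise an error.

```r
library(ggplot2)

# |> passes the filtered data into ggplot(); + then adds the layer.
# |> binds more tightly than +, so the pipe chain is evaluated first.
p <- mtcars |>
  subset(cyl == 4) |>
  ggplot(aes(x = wt, y = mpg)) +
  geom_point()
```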

Customising Plots

Adding Colour

The above plot could be made more informative. For instance, the additional information regarding the gender (i.e., sex column) could be incorporated into the plot. To do this, we can utilise aes() and specify which column in the data frame should be represented as the color of the points.

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi, color = sex)) + 
  geom_point()

Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

Notice that here we specify colour = sex in the aes() mapping inside the ggplot() function. Aesthetic mappings can be set both in ggplot() and in individual geom_*() layers; we discuss the difference in the Adding Layers section.

To colour points by a continuous variable, for example height:

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(colour = height))

Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

In ggplot2, a continuous variable is mapped to a colour gradient, while a discrete (categorical) variable is mapped to a set of distinct colours.
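A minimal sketch of the two behaviours, using the built-in mtcars data set (not the immuno dataset): horsepower (`hp`) is numeric and gets a gradient, while wrapping the cylinder count in factor() turns it into a categorical variable with one colour per level.

```r
library(ggplot2)

# Continuous variable -> colour gradient
p_cont <- ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) + geom_point()

# Discrete variable (factor) -> one distinct colour per level
p_disc <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) + geom_point()

# layer_data() shows the colour actually assigned to each point
length(unique(layer_data(p_disc)$colour))  # 3, one per cylinder count
length(unique(layer_data(p_cont)$colour))  # many shades along the gradient
```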

Note that some patients lack values for weight or bmi, so ggplot2 removes those points and reports how many rows were removed in the warning.
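To find out how many rows will be dropped before plotting, or to drop them without the warning, a sketch along these lines may help. The `toy` data frame is hypothetical; `complete.cases()` is base R and `na.rm` is a standard argument of geom_point().

```r
library(ggplot2)

# Hypothetical toy data with two incomplete rows
toy <- data.frame(
  weight = c(60, 72, NA, 85, 90),
  bmi    = c(21.5, 24.0, 26.1, NA, 29.3)
)

# Count rows that cannot be plotted because x or y is missing
n_missing <- sum(!complete.cases(toy[, c("weight", "bmi")]))
n_missing  # 2

# na.rm = TRUE removes those rows silently instead of with a warning
p_clean <- ggplot(toy, aes(x = weight, y = bmi)) +
  geom_point(na.rm = TRUE)
```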

Adding Shape

Let’s add shape to points.

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(shape = smoking))

Warning: Removed 105 rows containing missing values or values outside the scale range
(`geom_point()`).

Note that smoking status has not been recorded for some patients, so ggplot2 has also removed the points with a missing smoking category.

Some aesthetics like shape can only be used with categorical variables:

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(shape = height))

Error in `geom_point()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error in `scale_f()`:
! A continuous variable cannot be mapped to the shape aesthetic.
ℹ Choose a different aesthetic or use `scale_shape_binned()`.
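One way around this error, sketched with the built-in mtcars data set, is to turn the continuous variable into categories first, here with base R's cut() into three equal-width bins (`hp_group` is an illustrative column name). The error message's own suggestion, scale_shape_binned(), is an alternative that does the binning inside the scale instead.

```r
library(ggplot2)

# Shape needs a discrete variable, so cut a continuous one into bins first
mtcars_binned <- transform(mtcars, hp_group = cut(hp, breaks = 3))
levels(mtcars_binned$hp_group)

# Each horsepower bin now gets its own point shape
p <- ggplot(mtcars_binned, aes(x = wt, y = mpg)) +
  geom_point(aes(shape = hp_group))
```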

The shape argument also lets you customise the appearance of all data points at once by assigning an integer code (0 to 25) for one of the predefined point shapes.
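The predefined shapes can also be drawn from R itself. This sketch lays out codes 0 to 25 on a grid; scale_shape_identity() tells ggplot2 to use the numbers directly as shape codes. Shapes 21 to 25 are the ones that also take a fill colour.

```r
library(ggplot2)

# One point per shape code, 0 to 25, arranged in rows of six
shapes <- data.frame(code = 0:25)
shapes$x <- shapes$code %% 6
shapes$y <- shapes$code %/% 6

p_shapes <- ggplot(shapes, aes(x = x, y = -y)) +
  geom_point(aes(shape = code), size = 4, fill = "orange") +  # fill only affects 21-25
  geom_text(aes(label = code), nudge_y = -0.35, size = 3) +   # code printed under each point
  scale_shape_identity() +                                    # use the numbers as-is, no legend
  theme_void()
```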

To use asterisks instead of points in the plot:

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(shape = 8)

Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

This changes the shape of all the points at once. We do so by setting shape to a single value rather than mapping it to one of the variables in the data set; this has to be done outside the aesthetic mappings (i.e. outside aes()), as above.

Aesthetic Setting vs. Mapping

Instead of mapping an aesthetic property to a variable, you can set it to a single value by specifying it in the layer parameters (outside aes()). We map an aesthetic to a variable (e.g., aes(shape = smoking)) or set it to a constant (e.g., shape = 8). If you want the appearance to be governed by a variable in your data frame, put the specification inside aes(); if you want to override the default shape, size or colour for all points, put the value outside aes().
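A quick way to see the difference between setting and mapping is to inspect the data ggplot2 computes for each layer with layer_data(). This sketch uses the built-in mtcars data set, with the transmission variable `am` standing in for smoking:

```r
library(ggplot2)

# Setting: a constant outside aes() applies to every point
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point(shape = 8)

# Mapping: inside aes() the shape depends on a column of the data
p_map <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point(aes(shape = factor(am)))

unique(layer_data(p_set)$shape)  # 8
unique(layer_data(p_map)$shape)  # two shape codes, one per level of am
```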

# shape outside aes()
ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(shape = 8)

Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

# shape inside aes()
ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(shape = smoking))

Warning: Removed 105 rows containing missing values or values outside the scale range
(`geom_point()`).

The above plots are created with similar code but have rather different outputs. The first plot sets the shape to a constant value, and the second maps (rather than sets) the shape to the smoking variable.

It is usually preferable to use colours to distinguish between different categories but sometimes colour and shape are used together when we want to show which group a data point belongs to in two different categorical variables.

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(colour = sex, shape = smoking))

Warning: Removed 105 rows containing missing values or values outside the scale range
(`geom_point()`).

Adding Size and Transparency

We can adjust the size and/or transparency of the points.

Let’s first increase the size of points.

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(colour = sex), size = 2)

Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

Note that here we add the size argument outside the aesthetic mapping.

Size is usually not a good aesthetic to map to a discrete variable, and ggplot2 warns against it:

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(colour = sex, size = smoking))

Warning: Using size for a discrete variable is not advised.

Warning: Removed 105 rows containing missing values or values outside the scale range
(`geom_point()`).

Because smoking is a discrete variable, the default size scale assigns evenly spaced point sizes to its categories.

Transparency can be useful when we have a large number of points as we can more easily tell when points are overlaid, but like size, it is not usually mapped to a variable and sits outside the aes().

Let’s change the transparency of points.

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(colour = sex), alpha = 0.5)

Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).
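Transparency pairs well with jittering when many points overlap exactly. A sketch with the built-in mtcars data set, where many cars share the same cylinder count; `height = 0` keeps the y values unchanged so only the horizontal position is randomised:

```r
library(ggplot2)

set.seed(42)  # jitter is random; fix the seed for a reproducible layout

# Semi-transparent, horizontally jittered points make overlaps visible
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_jitter(width = 0.1, height = 0, alpha = 0.4, size = 3)

ld <- layer_data(p)
unique(ld$alpha)  # 0.4 for every point
```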

Adding Layers

We can add another layer to this plot using a different geometric representation (or geom_ function) we discussed previously.

Let’s add a trend line to this plot using the geom_smooth() function, which provides a summary of the data.

ggplot(immuno_dataset$redcap_data$demographics) + 
  geom_point(aes(x = weight, y = bmi)) +
  geom_smooth(aes(x = weight, y = bmi))

`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Warning: Removed 8 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

Note that the shaded band around the blue line shows the confidence interval (95% by default) around the fitted smooth.
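By default geom_smooth() picks a loess smoother for fewer than 1,000 observations, which is what the message above reports. For a straight-line (linear regression) fit without the band, the method, formula and se arguments can be set explicitly. A sketch with the built-in mtcars data set, checking that the drawn line matches lm():

```r
library(ggplot2)

# Linear fit, no confidence band; giving method and formula also
# avoids the "`geom_smooth()` using method = ..." message
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# The values drawn by the smooth layer (layer 2) agree with lm() directly
fit <- lm(mpg ~ wt, data = mtcars)
fitted_line <- layer_data(p, 2)
all.equal(fitted_line$y, unname(predict(fit, newdata = data.frame(wt = fitted_line$x))))
# TRUE
```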

There is some annoying duplication of code used to create this plot. We’ve repeated the exact same aesthetic mapping for both geoms. We can avoid this by putting the mappings in the ggplot() function instead.

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point() +
  geom_smooth()

Layers are drawn in the order in which they are added: layers specified earlier in the command are drawn first, and later layers are drawn on top of them.

If you switch the order of the geom_point() and geom_smooth() functions above, the smoothed line will be drawn underneath the points instead of on top of them.
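A sketch of both orders with the built-in mtcars data set. The layers are stored in the plot object in the order they were added, which is also the drawing order:

```r
library(ggplot2)

# Points first, smooth second: the line is drawn on top of the points
p_line_on_top <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Smooth first, points second: the points are drawn on top of the line
p_points_on_top <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_point()

# The geom class of each stored layer, in drawing order
unname(sapply(p_points_on_top$layers, function(l) class(l$geom)[1]))
# "GeomSmooth" "GeomPoint"
```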

Let’s make the plot look a bit prettier by reducing the size of the points and making them transparent. We’re not mapping size or alpha to any variables, just setting them to constant values, and we only want these settings to apply to the points, so we set them inside geom_point().

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(size = 0.5, alpha = 0.5) +
  geom_smooth()

`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Warning: Removed 8 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

Aesthetic Specifications in Plot vs. Layers

Aesthetic mappings can be provided either in the initial ggplot() call, in individual layers, or through a combination of both approaches. When there’s only one layer in the plot, the method used to specify aesthetics doesn’t impact the result.

# colour argument inside ggplot()
ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi, colour = smoking)) + 
  geom_point(size = 0.5, alpha = 0.5) +
  geom_smooth() 
# colour argument inside geom_point()
ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(colour = smoking), size = 0.5, alpha = 0.5) +
  geom_smooth()

In the left plot, since we specified the colour (i.e., colour = smoking) inside the ggplot() function, geom_smooth() fits a separate, coloured line for each smoking category. This is because aesthetic mappings defined in ggplot() apply at the global level and are passed down to every subsequent geom layer of the plot.

If we want to add colour only to the points and fit a regression line across all points, we could specify the colour inside geom_point() function (i.e., right plot).
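The same contrast can be reproduced with the built-in mtcars data set, using the cylinder count in place of smoking. Counting the groups in the smooth layer's data shows one fitted line per cylinder group when colour is mapped globally, and a single line when it is mapped only in geom_point(). A linear fit (method = "lm") is used here to keep the small groups well behaved.

```r
library(ggplot2)

# Colour mapped in ggplot(): inherited by every layer,
# so geom_smooth() fits one line per cylinder group
p_global <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Colour mapped only in geom_point(): the smooth layer sees no grouping,
# so it fits a single line through all points
p_local <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(cyl))) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

length(unique(layer_data(p_global, 2)$group))  # 3
length(unique(layer_data(p_local, 2)$group))   # 1
```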

Plot Labels

You can customise plots to include a title, a subtitle, a caption or a tag, and change the axis titles, using the labs() function.

ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(colour = sex), size = 0.5, alpha = 0.5) +
  geom_smooth() +
  labs(
    title = "Variation between BMI and Weight coloured by sex of melanoma patients",
    subtitle = "BMI vs Weight",
    caption = "Variation between BMI and Weight",
    tag = "Figure 1",
    y = "Body Mass Index",
    x = "Weight (kg)")
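labs() also sets the legend title for any mapped aesthetic, by naming that aesthetic. A sketch with the built-in mtcars data set (the titles are illustrative):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(
    title    = "Fuel economy vs weight",
    subtitle = "Built-in mtcars data set",
    caption  = "Example caption",
    tag      = "A",
    x        = "Weight (1000 lbs)",
    y        = "Miles per gallon",
    colour   = "Cylinders"   # legend title for the colour mapping
  )

p$labels$colour  # "Cylinders"
```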

Themes

Themes control the overall appearance of the plot, including background colour, grid lines, axis labels and text styles. ggplot2 offers several built-in themes, and you can also create custom themes to match your preferences or the requirements of your publication. The default theme, theme_grey(), has a grey background.

weight_vs_bmi <- ggplot(immuno_dataset$redcap_data$demographics, aes(x = weight, y = bmi)) + 
  geom_point(aes(colour = smoking), size = 0.5, alpha = 0.5) +
  geom_smooth() 

weight_vs_bmi + theme_bw()

`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Warning: Removed 8 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).

Try these themes yourself: theme_classic(), theme_dark(), theme_grey() (default), theme_light(), theme_linedraw(), theme_minimal(), theme_void() and theme_test().
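A custom theme is usually built by starting from a built-in theme and overriding individual elements with theme(). This sketch uses mtcars and an arbitrary set of tweaks (`my_theme` is a hypothetical name); theme_set() changes the default theme for subsequent plots and returns the previous one so it can be restored.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()

# Hypothetical custom theme: a built-in base plus a few overrides
my_theme <- theme_minimal(base_size = 12) +
  theme(
    legend.position  = "bottom",              # move the legend below the panel
    panel.grid.minor = element_blank(),       # drop the minor grid lines
    plot.title       = element_text(face = "bold")
  )

# Apply it to a single plot with +
p_custom <- p + labs(title = "Custom theme example") + my_theme

# Or make it the default for every plot created afterwards
old_theme <- theme_set(my_theme)
theme_set(old_theme)  # restore the previous default
```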

Try It Yourself

Given the cortisol_mort_time data frame, complete the following tasks:

1. Print the data frame to explore its contents.
2. Create a scatter plot of cortisol levels versus melanoma treatment time.
3. Color the points by melanoma type.
4. Set the point size to 2 and adjust transparency to 0.7.
5. Add a single regression line for all points.
6. Include a plot title and label the axes as:
   - Y-axis: Cortisol (nmol/L)
   - X-axis: Time since first treatment dose (months)
7. Apply the theme_grey() for styling.

# Extract cortisol values from the pathology instrument,
# keeping only patient ID and cortisol, and remove rows with missing cortisol values
cortisol <- immuno_dataset$redcap_data$pathology |> 
  select(record_id, cortisol) |> 
  filter(!is.na(cortisol))

# Join cortisol data with mortality_data instrument to bring in mortality_treatment_time,
# and keep only relevant columns: patient ID, mortality treatment time, and cortisol
cortisol_mort_time <- left_join(cortisol, immuno_dataset$redcap_data$mortality_data) |> 
  select(record_id, mortality_treatment_time, cortisol)

# Extract melanoma type information from the melanoma_data instrument
mel_subset <- immuno_dataset$redcap_data$melanoma_data |> select(record_id, melanoma_type)

# Join melanoma type to the existing cortisol_mort_time data
cortisol_mort_time <- left_join(cortisol_mort_time, mel_subset)

Solution

1.

cortisol_mort_time

2.-7.

ggplot(cortisol_mort_time, aes(x = mortality_treatment_time, y = cortisol)) + 
  geom_point(aes(colour = melanoma_type), size = 2, alpha = 0.7) +
  geom_smooth() +
  labs(
    title = "Cortisol vs Time since first treatment dose",
    y = "Cortisol (nmol/L)",
    x = "Time since first treatment dose (months)",
    colour = "Melanoma Type" # legend title
  ) + 
  theme_grey()

Bar chart

Let’s create a bar chart of the number of patients with each melanoma type in the melanoma_data instrument.

geom_bar() is the geom used to plot bar charts. It requires a single aesthetic mapping: the categorical variable of interest mapped to x.
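geom_bar() counts the rows for you. When the counts are already in the data, geom_col() draws the same bars from a y column. A sketch with the built-in mtcars data set showing that both give identical bar heights:

```r
library(ggplot2)

# geom_bar(): counts the rows in each category itself
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

# geom_col(): uses heights already present in the data
cyl_counts <- as.data.frame(table(cyl = mtcars$cyl))  # columns: cyl, Freq
p_col <- ggplot(cyl_counts, aes(x = cyl, y = Freq)) + geom_col()

layer_data(p_bar)$count  # 11 7 14
```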

ggplot(immuno_dataset$redcap_data$melanoma_data) +  
  geom_bar(aes(x = melanoma_type))

The dark grey bars are a bit ugly - what if we want each bar to be a different colour?

ggplot(immuno_dataset$redcap_data$melanoma_data) +  
  geom_bar(aes(x = melanoma_type, colour = melanoma_type))

Colouring the edges wasn’t quite what we had in mind. Look at the help for geom_bar to see what other aesthetic we should have used.

ggplot(immuno_dataset$redcap_data$melanoma_data) +  
  geom_bar(aes(x = melanoma_type, fill = melanoma_type))

What happens if we colour (fill) with something other than the melanoma_type?

ggplot(immuno_dataset$redcap_data$melanoma_data) +  
  geom_bar(aes(x = melanoma_type, fill = melanoma_molecular_mutation))

We get a stacked bar plot.

Note the similarity in what we did here to what we did with the scatter plot - there is a common grammar.

We can rearrange the melanoma molecular mutation groups into adjacent (dodged) bars by specifying a different position within geom_bar():

ggplot(immuno_dataset$redcap_data$melanoma_data) +  
  geom_bar(aes(x = melanoma_type, fill = melanoma_molecular_mutation), position = 'dodge')
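A third option, position = "fill", stacks the bars and rescales each one to a height of 1, so the bars show proportions rather than counts. A sketch with the built-in mtcars data set, where `am` (transmission type) plays the role of the mutation groups:

```r
library(ggplot2)

# Stacked bars rescaled so each cylinder group sums to 1
p_fill <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am))) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")

ld <- layer_data(p_fill)
tapply(ld$ymax, unclass(ld$x), max)  # every bar reaches 1
```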

What if we want all the bars to be the same colour but not dark grey, e.g. blue?

ggplot(immuno_dataset$redcap_data$melanoma_data) +  
  geom_bar(aes(x = melanoma_type, fill = "blue"))

That doesn’t look right - why not?

You can set the aesthetics to a fixed value but this needs to be outside the mapping, just like we did before for size and transparency in the scatter plots.

ggplot(immuno_dataset$redcap_data$melanoma_data) +  
  geom_bar(aes(x = melanoma_type), fill = "blue")

Setting this inside the aes() mapping told ggplot2 to map the fill aesthetic to a variable in the data frame, one that doesn't really exist but is created on the fly with the value “blue” for every observation.
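The difference is visible in the colours ggplot2 actually assigns. A sketch with the built-in mtcars data set:

```r
library(ggplot2)

# Inside aes(): "blue" is treated as a data value (a one-level category)
# and is given the default discrete palette colour
p_mapped <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar(aes(fill = "blue"))

# Outside aes(): the bars really are blue
p_set <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar(fill = "blue")

unique(layer_data(p_mapped)$fill)  # a palette colour, not "blue"
unique(layer_data(p_set)$fill)     # "blue"
```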

Try It Yourself

Create a horizontal bar chart displaying the types of immune checkpoint inhibition therapies. Add appropriate axis labels and a descriptive title.

Hint: use prior_treatment instrument.

Solution

ggplot(immuno_dataset$redcap_data$prior_treatment, aes(y = systemic_ici)) +
  geom_bar() +
  labs(
    title = "Immune Checkpoint Inhibition Therapy Counts",
    x = "Counts",
    y = "Immune Checkpoint Inhibition Therapy Type"
  )

Box plot

Box plots (or box-and-whisker plots) are a particular favourite in many seminars and papers. A box plot summarises the distribution of a set of values: the box spans the inter-quartile range (IQR, the middle 50% of values, from the first quartile Q1 to the third quartile Q3), the line inside the box marks the median (the middle-ranked value), and the whiskers extend from the box to the most extreme values lying within 1.5 × IQR of it, i.e. no further than Q3 + (1.5 × IQR) above and Q1 - (1.5 × IQR) below. Values beyond the whiskers are drawn individually as outliers.
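These quantities can be inspected directly: layer_data() returns the statistics ggplot2 computed for each box. A sketch with the built-in mtcars data set, checking the median and the 1.5 × IQR whisker rule:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# One row per box: whisker ends (ymin/ymax), hinges (lower/upper), median (middle)
box_stats <- layer_data(p)[, c("ymin", "lower", "middle", "upper", "ymax")]
box_stats

# The middle line is the ordinary median of each group
tapply(mtcars$mpg, mtcars$cyl, median)

# Whiskers never reach further than 1.5 x IQR beyond the hinges
iqr <- box_stats$upper - box_stats$lower
all(box_stats$ymax <= box_stats$upper + 1.5 * iqr)
```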

To create a box plot from immuno dataset:

# join the melanoma_data instrument and mortality_data
mel_mort <- full_join(immuno_dataset$redcap_data$melanoma_data , 
                      immuno_dataset$redcap_data$mortality_data, 
                      by = "record_id")
# keep only the non-missing rows of mortality_treatment_time column
mel_mort <- mel_mort |> filter(!is.na(mortality_treatment_time)) 

ggplot(mel_mort, aes(x = melanoma_type, y = mortality_treatment_time)) +
  geom_boxplot()

See the geom_boxplot() help page (?geom_boxplot) for details of how the box and whiskers are constructed and how outliers are chosen for display as individual points.

Let’s add a colour aesthetic so that the box for each melanoma type is drawn in a different colour.

ggplot(mel_mort, aes(x = melanoma_type, y = mortality_treatment_time, color = melanoma_type)) +
  geom_boxplot()

Try It Yourself

Create a box plot showing the duration of response to BRAF/MEK inhibitors by type of BRAF/MEK therapy. Make sure to include appropriate axis labels.

Hint: use prior_treatment instrument.

Solution

ggplot(immuno_dataset$redcap_data$prior_treatment, 
       aes(x = braf_mek, y = dur_braf_mek)) +
  geom_boxplot() + 
  labs(
    x = "Type of BRAF/ MEK",
    y = "Duration of Response to BRAF/MEK inhibitor (months)"
  )

Violin plot

A violin plot is used to visualise the distribution of a numeric variable across different categories. It combines aspects of a box plot and a kernel density plot.

The width of the violin at any given point represents the density of data at that point. Wider sections indicate a higher density of data points, while narrower sections indicate lower density. By default, violin plots are symmetric.

ggplot(mel_mort, aes(x = melanoma_type, y = mortality_treatment_time, color = melanoma_type)) +
    geom_violin()
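Because a violin shows the density shape but not summary statistics, it is common to overlay a narrow box plot. A sketch with the built-in mtcars data set; outlier.shape = NA hides the box plot's outlier points so they are not drawn on top of the violin:

```r
library(ggplot2)

# Violin for the distribution shape, narrow box plot for median and IQR
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_violin(fill = "grey90") +
  geom_boxplot(width = 0.1, outlier.shape = NA)
```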

Try It Yourself

Create a violin plot showing the time from commencing treatment to best response by type of best response to PET. Make sure to include appropriate axis labels.

Hint: use response_data instrument.

Solution

ggplot(immuno_dataset$redcap_data$response_data, 
       aes(y = time_to_best_response, x = best_res_pet)) +
  geom_violin() + 
  labs(
    x = "Best response on PET",
    y = "Time from commencing treatment to best response (days)"
  )

Histogram

The geom for creating histograms is, rather unsurprisingly, geom_histogram().

ggplot(immuno_dataset$redcap_data$response_data) +
  geom_histogram(aes(x = time_to_best_response))

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning: Removed 155 rows containing non-finite outside the scale range
(`stat_bin()`).

The message suggests choosing a better bin width with the binwidth argument, since ggplot2 falls back to 30 bins by default.

ggplot(immuno_dataset$redcap_data$response_data) +
  geom_histogram(aes(x = time_to_best_response), binwidth = 5)

Warning: Removed 155 rows containing non-finite outside the scale range
(`stat_bin()`).

Or we can set the number of bins.

ggplot(immuno_dataset$redcap_data$response_data) +
  geom_histogram(aes(x = time_to_best_response), bins = 20)

Warning: Removed 155 rows containing non-finite outside the scale range
(`stat_bin()`).
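bins and binwidth are two ways of controlling the same thing: how many bins there are, or how wide each one is. A sketch with the built-in mtcars data set; the `count` column of layer_data() holds the bar heights, and every observation lands in exactly one bin:

```r
library(ggplot2)

# bins: choose the number of bins; binwidth: choose the width of each bin
p_bins  <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 10)
p_width <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(binwidth = 2.5)

sum(layer_data(p_bins)$count)   # 32, the number of cars
sum(layer_data(p_width)$count)  # 32 again, just split differently
```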

These histograms are not very pleasing, aesthetically speaking - how about some better aesthetics?

ggplot(immuno_dataset$redcap_data$response_data) +
  geom_histogram(
    aes(x = time_to_best_response), 
    bins = 20, 
    colour = "darkblue", 
    fill = "grey")

Warning: Removed 155 rows containing non-finite outside the scale range
(`stat_bin()`).

Try It Yourself

Create a histogram of pituitary size using data from the ctmri_imaging instrument. Add color to the bars and a distinct color to the borders. Include clear and appropriate axis labels.

Solution

ggplot(immuno_dataset$redcap_data$ctmri_imaging, 
       aes(x = pit_size)) +
  geom_histogram(binwidth = 1, colour = "darkgreen", fill = "lightgreen") + 
  labs(
    x = "Pituitary size (mm)",
    y = "Counts"
  )

Density plot

Density plots are used to visualise the distribution of a continuous variable. They are essentially smoothed histograms, scaled so that the area under the curve for each sub-group integrates to 1. This allows us to compare the shapes of sub-groups of different sizes.

ggplot(immuno_dataset$redcap_data$response_data) +
  geom_density(aes(x = time_to_best_response, colour = best_res_pet))

Warning: Removed 155 rows containing non-finite outside the scale range
(`stat_density()`).

Warning: Groups with fewer than two data points have been dropped.

Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
-Inf
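Besides the default density scale, stat_density() also computes a `count` variable (the density multiplied by the number of observations in the group), which can be mapped with after_stat() when group sizes should remain visible. A sketch with the built-in mtcars data set:

```r
library(ggplot2)

# Default: y = density, so each group's curve has an area of about 1
p_dens <- ggplot(mtcars, aes(x = mpg, colour = factor(cyl))) + geom_density()

# after_stat(count) scales each curve by its group size
p_count <- ggplot(mtcars, aes(x = mpg, colour = factor(cyl))) +
  geom_density(aes(y = after_stat(count)))

# count / density recovers the number of observations per group: 11, 7, 14
ld <- layer_data(p_dens)
group_sizes <- tapply(ld$count, ld$group, max) / tapply(ld$density, ld$group, max)
group_sizes
```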

Saving plot images

Use ggsave() to save the last plot you displayed.

ggsave("time_to_best_response_density_plot.png")

You can alter the width and height of the plot and can change the image file type.

ggsave("time_to_best_response_density_plot.pdf", width = 20, height = 12, units = "cm")

You can also pass in a plot object you have created instead of using the last plot displayed. See the help page (?ggsave) for more details.
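A sketch of saving a stored plot object rather than the last plot displayed, using the built-in mtcars data set. The file is written to a temporary directory here only so the example leaves nothing behind; the file type follows the extension (.pdf):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Save a specific plot object; width and height are in the given units
out_file <- file.path(tempdir(), "mpg_vs_weight.pdf")
ggsave(out_file, plot = p, width = 15, height = 10, units = "cm")

file.exists(out_file)  # TRUE
```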

Try It Yourself

Assign the variable name cortisol_mort_time_plt to the scatter plot you created before. Save this plot as a jpeg file.

Solution

cortisol_mort_time_plt <- ggplot(cortisol_mort_time, aes(x = mortality_treatment_time, y = cortisol)) + 
  geom_point(aes(colour = melanoma_type), size = 2, alpha = 0.7) +
  geom_smooth() +
  labs(
    title = "Cortisol vs Time since first treatment dose",
    y = "Cortisol (nmol/L)",
    x = "Time since first treatment dose (months)",
    colour = "Melanoma Type" # legend title
  ) + 
  theme_grey()

ggsave(plot = cortisol_mort_time_plt, filename = "cortisol_vs_mort_time_plot.jpeg")
