Add a scripted action to the pipeline
Every study starts with a dataset definition like the one you just edited.
When executed, a dataset definition generates a compressed CSV (.csv.gz) of patient data.
A real analysis will have several further steps after this. Each step is defined in a separate file, and can be written in any of the programming languages supported in OpenSAFELY.
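The dataset definition's output is an ordinary CSV file compressed with gzip. pandas' read_csv infers the compression from the .gz extension, which is why the report scripts below can read it directly. The following is a minimal sketch, using only the Python standard library, of peeking at the same file. The function name read_dataset is a hypothetical helper for illustration; it is not part of OpenSAFELY.

```python
import csv
import gzip


def read_dataset(path):
    """Return the rows of a gzip-compressed CSV as a list of dicts.

    Hypothetical helper for illustration; every value is returned as a
    string, so convert columns such as age to numbers as needed.
    """
    # "rt" decompresses the file and opens it in text mode so csv can parse it.
    with gzip.open(path, "rt", newline="") as f:
        return list(csv.DictReader(f))
```

For example, read_dataset("output/dataset.csv.gz") would return one dict per patient, keyed by column name.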
Create a new action
In this tutorial, we're going to draw a histogram of ages, using either four lines of Python or just a few more lines of R.
Python:
- Right-click on the analysis folder in the editor's Explorer and select "New file". Type "report.py" as the filename and press Enter.
- Add the following to report.py:

```python
import pandas as pd

# Read the patient data generated by the dataset definition.
data = pd.read_csv("output/dataset.csv.gz")
# Plot a histogram of the age column and save it as an image.
fig = data.age.plot.hist().get_figure()
fig.savefig("output/report.png")
```
R:
- Right-click on the analysis folder in the editor's Explorer and select "New file". Type "report.R" as the filename and press Enter.
- Add the following to report.R:

```r
library('tidyverse')

# Read the patient data, declaring the type of each column.
df_input <- read_csv(
  here::here("output", "dataset.csv.gz"),
  col_types = cols(patient_id = col_integer(), age = col_double())
)

# Plot a histogram of the age column.
plot_age <- ggplot(data = df_input, aes(x = age)) + geom_histogram()

# Save the plot as output/report.png.
ggsave(
  plot = plot_age,
  filename = "report.png",
  path = here::here("output")
)
```
Either version reads the CSV of patient data and saves a histogram of ages to a new file, output/report.png.
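A histogram groups the ages into equal-width bins and draws one bar per bin. pandas' plot.hist uses 10 bins by default, and ggplot2's geom_histogram defaults to 30 bins (and prints a message suggesting you choose a better bin width). The sketch below is a hypothetical helper, histogram_counts, that approximates the matplotlib/NumPy-style binning rule behind the pandas plot. It is meant only to show what the bar heights represent; ggplot2 places its bin boundaries differently.

```python
def histogram_counts(values, bins=10):
    """Count values into `bins` equal-width bins spanning min..max.

    Hypothetical helper: each bin includes its left edge, and the last bin
    also includes the maximum value. Requires at least two distinct values.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("need at least two distinct values")
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Index of the bin containing v; the maximum lands in the last bin.
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    return edges, counts
```

For ages 0, 10, 20, ..., 100, every bin holds one patient except the last, which also holds the 100-year-old, giving counts of [1, 1, 1, 1, 1, 1, 1, 1, 1, 2].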
Add the action to the pipeline
- Open project.yaml in the editor. This file will be near the end of the list of files and folders. It describes how each step in your analysis should be run, and it already defines a single generate_dataset action, which produces the output we've generated so far. The file is in a format called YAML: indentation matters, so copy and paste the following carefully.
- Add a generate_report action to the file, so the entire file looks like this:
- Line 14 tells the system we want to create a new action called generate_report.
- Line 15 says how to run the script (using the python or R runner).
- Line 16 tells the system that this action depends on the outputs of the generate_dataset action being present.
- Lines 17-19 describe the files that the action creates. Line 18 says that the items indented below it are moderately sensitive, which means they may be released to the public after a careful review (and possible redaction). Line 19 says that there's one output file, which will be found at output/report.png.
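The needs key is what orders the pipeline: an action can only run once every action it needs has produced its outputs. The following is a simplified model of that idea, assuming the two actions are held in a plain dictionary (the ACTIONS dict and the run_order helper are hypothetical; this is not how the opensafely command is implemented). It uses graphlib.TopologicalSorter from the standard library, available in Python 3.9 and later.

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory model of project.yaml: each action name maps to
# the list of actions named in its `needs` key.
ACTIONS = {
    "generate_dataset": [],
    "generate_report": ["generate_dataset"],
}


def run_order(actions, target):
    """Return `target` and everything it transitively needs, dependencies first."""
    # Walk the `needs` links to collect every action the target depends on.
    required, stack = set(), [target]
    while stack:
        name = stack.pop()
        if name not in required:
            required.add(name)
            stack.extend(actions[name])
    graph = {name: actions[name] for name in required}
    # static_order() yields each action only after all of its dependencies,
    # and raises graphlib.CycleError if the needs form a loop.
    return list(TopologicalSorter(graph).static_order())
```

Here run_order(ACTIONS, "generate_report") returns ["generate_dataset", "generate_report"], matching the dependency declared on line 16.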
At the command line, type opensafely run generate_report and press Enter. The command's output should end by telling you that a file containing the histogram has been created.
Open the output folder (you can do this via Visual Studio Code's Explorer) and check that it contains report.png.
Double-click report.png to display the image, or right-click it and select Download to download the image.
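If you would rather confirm the result from a script than by eye, one option is to check that the file exists and starts with the fixed 8-byte signature that every PNG file begins with. The sketch below assumes the output path used in this tutorial; the is_png helper is hypothetical and only inspects the file header, not whether the histogram itself looks right.

```python
from pathlib import Path

# Every PNG file begins with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(path):
    """Return True if `path` is an existing file that starts with the PNG signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE
```

For example, is_png("output/report.png") should return True after the action has run successfully.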
Warning
Changes you make to files are saved automatically within your GitHub codespace. However, they will not persist outside the codespace unless you commit and push them to GitHub, as described in the next section.