TL;DR
Read this if you want to automate parameterized Quarto reports using Python.
Assumptions
I assume Quarto is already installed and that you have experience using it. If you’re unfamiliar with Quarto, start with the Quarto documentation and its installation instructions. FYI, I use an Ubuntu-based Linux distro at home, so you’ll need to translate any terminal commands to your own system. Assume that any time a Python script is called in a terminal, the project environment has already been activated (e.g., `source .venv/bin/activate`).
Quarto and R
There are numerous examples of how to automate parameterized Quarto reports with R.
- Jadey Ryan: Automating Quarto reports with parameters
- David Keyes: A step-by-step guide to parameterized reporting in R using Quarto
- Mandy Norrbo: Generate multiple presentations with Quarto parameters
- Meghan Hall: Tips for Custom Parameterized PDFs in Quarto
However, I haven’t seen any tutorials for Python. Don’t believe me? Using your preferred search engine, search for `parameterized quarto` and see if any results aren’t for R. If you look closely, there’s one result about using shell scripts.
I think R users are more familiar with Quarto because it’s a Posit product, and I imagine that most RMarkdown users, like me, have transitioned to Quarto.
Why Python?
Because, right or wrong, Python is more prevalent than R.
Recently, I’ve been working on a broadly scoped analytics passion project analyzing my household energy usage. One aspect of the project is forecasting that usage. This time series project is multilingual: it builds forecasting models with both Python and R. You can learn more about that work by clicking the links below.
This tutorial references the code found in the Forecasting repo, specifically the `Forecasting/py` directory.
Parameterized Quarto Reports
The need for parameterized Quarto reports arises when there’s repetition and a need for automation. Many of the R examples above illustrate this; see them for examples and details. I’ll discuss my use case in the context of the forecasting project.
Part of the forecasting process is hyperparameter tuning and choosing which models to put into production. All of this was developed in a Quarto markdown file (`.qmd`). Originally, there was one Quarto file per model type (i.e., statistical, machine learning, and AutoML). However, most of the code across those files was the same or similar enough that I consolidated them into a single template. The template uses parameters to tell the code which model type is being processed. The output files are dynamically named as method-date.html. For example, when training the statistics models the output file is stats-2024-10-27.html. Then, I created a CLI to execute the training process easily.
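As a small illustration of the naming scheme, the sketch below builds a method-date.html file name. The helper `report_name` is hypothetical and is not taken from the Forecasting repo.

```python
from __future__ import annotations

from datetime import date


def report_name(method: str, day: date | None = None) -> str:
    """Return a file name of the form method-date.html (hypothetical helper)."""
    day = day or date.today()  # default to the current date
    return f"{method}-{day.isoformat()}.html"


print(report_name("stats", date(2024, 10, 27)))  # stats-2024-10-27.html
```

Using the ISO date format keeps the files sorted chronologically within each method.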
Let’s explore each of these components individually.
Quarto projects
I like using Quarto projects. To create a project, use the following command and choose the appropriate options. There are several project types, from websites to blogs, but choose the default project unless you know what you’re doing.
```{shell}
quarto create project
```
One outcome of creating a project is a file called `_quarto.yml`. This file is like the Quarto command center for your project.
For my project, I only specified the bare minimum, an output directory for the rendered reports. However, you can add specific elements and styling to the rendered documents.
```{yaml}
project:
  output-dir: notebooks/training_results
```
Quarto YAML Headers
If you’re familiar with Quarto markdown files then there’s nothing new here. The only callout I’ll make is that, while testing this process, the Jupyter kernel would sometimes die. I learned that Quarto starts the Jupyter kernel and then keeps it alive via a daemon. I think this reduces run times, as booting the kernel is time-consuming.
Usually, this efficiency would be great. However, during my testing, when the kernel died the entire training process died with it. This was sub-optimal because each training instance is independent of the rest. If one model’s training process fails, the process should continue with the next model. I’d rather have some results than none at all. By reading the Quarto docs I learned that setting `daemon: false` forces Quarto to restart the Jupyter kernel every time.
Takeaway: if your reports need to run sequentially and stop when something goes wrong, set `daemon: true`. Otherwise, if the reports are independent of each other, setting `daemon: false` might be helpful. Just be sure to add some way to catch any reports that weren’t executed.
```{yaml}
---
title: Model Training
author: Santiago Rodriguez
format:
  html:
    code-fold: true
    code-summary: Unhide
    code-copy: false
engine: jupyter
execute:
  warning: false
  error: false
  echo: true
  freeze: false
  daemon: false
---
```
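One way to catch reports that weren’t produced is to check the output directory after a run. The sketch below is a minimal example, assuming the method-date.html naming described earlier and the output directory from `_quarto.yml`. The helper `missing_reports` is hypothetical, and the date is only an example.

```python
from pathlib import Path


def missing_reports(output_dir: str, methods: list[str], day: str) -> list[str]:
    """Return the methods whose method-date.html file is absent (hypothetical helper)."""
    out = Path(output_dir)
    return [m for m in methods if not (out / f"{m}-{day}.html").is_file()]


if __name__ == "__main__":
    # directory taken from _quarto.yml above; the date is only an example
    todo = missing_reports(
        "notebooks/training_results", ["stats", "ml", "autoML"], "2024-10-27"
    )
    print("Not rendered:", todo)
```

A non-empty list is a signal to rerun only the methods that failed.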
Parameters in Python
Quarto parameters must be specified in code chunks when using Python. Here’s the excerpt from the Quarto docs.
For Jupyter, Quarto uses the same syntax for defining parameters as Papermill. To parameterize a document, designate a cell with the tag parameters and provide appropriate default values.
The second element in my file, after the YAML header, is the code chunk that defines the parameters. This is not a rule but my convention; it helps me keep track of the parameters.
The primary parameter for my work is called `method`. Its default value is `None` so I can’t accidentally mess something up. The code chunk immediately after it stops the work if the `method` parameter is `None`.
```{python, quarto_params}
#| tags: [parameters]
months: int = 6
method: None | str = None
```
```{python}
assert method is not None, """
The Quarto param method cannot be None
"""
```
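A stricter variant of this guard might also reject unknown method names. The sketch below is an assumption rather than code from the template: `VALID_METHODS` and `check_method` are hypothetical names, and the allowed values are the three methods used in this post. Unlike `assert`, an explicit `raise` still runs when Python is started with the `-O` flag.

```python
VALID_METHODS = {"stats", "ml", "autoML"}  # method names used in this post


def check_method(method):
    """Raise ValueError unless method is one of the known model methods."""
    if method not in VALID_METHODS:  # also catches the None default
        raise ValueError(
            f"The Quarto param method must be one of {sorted(VALID_METHODS)}, "
            f"got {method!r}"
        )
    return method


check_method("stats")  # passes silently
```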
Rendering reports
As all of the R examples highlight, there are three ways to execute the reports:
- manually
- Quarto CLI
- Python
Manually
I’m skipping the manual execution option because the intent is to automate parameterized Quarto reports using Python.
Quarto CLI
First, create a variable called today that holds the current date. Then use the Quarto CLI to render the report, manually specifying the model method (i.e., stats, ml, autoML) and using today in the output file name.
For example:
```{shell}
today=$(date +"%Y-%m-%d")
quarto render notebooks/training.qmd --output "stats-${today}.html" -P method:"stats"
quarto render notebooks/training.qmd --output "ml-${today}.html" -P method:"ml"
quarto render notebooks/training.qmd --output "autoML-${today}.html" -P method:"autoML"
```
I could put these commands in a shell script and call the script when I need to execute the training process. The shell script would look something like the following.
```{shell}
source .venv/bin/activate
today=$(date +"%Y-%m-%d")
quarto render notebooks/training.qmd --output "stats-${today}.html" -P method:"stats"
quarto render notebooks/training.qmd --output "ml-${today}.html" -P method:"ml"
quarto render notebooks/training.qmd --output "autoML-${today}.html" -P method:"autoML"
```
Or, I could use Python. Let’s use Python.
Python
I created a Python script called `train.py` and used `docopt` to create a command-line interface (CLI) for the script. Below is the CLI for `train.py`.
```{python}
"""Model Training Interface
Usage:
  train.py (-h | --help)
  train.py --version
  train.py fit <methods>...

Options:
  -h --help   Show this screen
  --version   Show version
"""
```
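For readers who prefer the standard library, roughly the same interface can be sketched with `argparse`. This is a swapped-in alternative to `docopt`, not what `train.py` uses. The version string is a placeholder, and the help texts are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """argparse sketch of the docopt interface above (not the author's code)."""
    parser = argparse.ArgumentParser(
        prog="train.py", description="Model Training Interface"
    )
    # placeholder version string; the real version is not given in the post
    parser.add_argument(
        "--version", action="version", version="%(prog)s (placeholder)"
    )
    subparsers = parser.add_subparsers(dest="command", required=True)
    fit = subparsers.add_parser("fit", help="run the training process")
    fit.add_argument("methods", nargs="+", help="one or more model methods")
    return parser


args = build_parser().parse_args(["fit", "stats", "ml"])
print(args.command, args.methods)  # fit ['stats', 'ml']
```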
The `fit` command takes the model methods for which the training process is executed. The model methods are extracted via `docopt_args.get("<methods>")`. The script then loops through the model methods and interacts with Quarto via the Python Quarto module. Use a `dict` to define parameters, e.g., `{"param1": value, "param2": "value"}`. Below is a redacted snippet from `train.py`.
```{python}
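from datetime import date

from docopt import docopt
from quarto import render

# parse the CLI described in the module docstring shown earlier
docopt_args = docopt(__doc__)
model_methods = docopt_args.get("<methods>")

# assumed values: the original snippet is redacted and omits these definitions
input_file = "notebooks/training.qmd"
current_date = date.today().isoformat()  # e.g., 2024-10-27

if __name__ == "__main__":
    for method in model_methods:
        # iterative param
        output_file = method + "-" + current_date + ".html"
        # call training script, execute, and render output
        render(
            input=input_file,
            output_format="html",
            output_file=output_file,
            execute=True,
            execute_params={"method": method},
        )
        # verbosity
        print("Finished")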
```
Last, run the script via a terminal.
```{shell}
python train.py fit stats ml autoML
```
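Because each method is independent, the loop could also keep going when one render fails. The sketch below is a minimal example, assuming that a failed render surfaces as a Python exception. The names `run_all` and `fake_render` are hypothetical, and `fake_render` stands in for the real `render` call so the example runs without Quarto.

```python
def run_all(methods, render_one):
    """Call render_one(method) for each method; collect failures instead of stopping."""
    failures = {}
    for method in methods:
        try:
            render_one(method)
        except Exception as exc:  # broad on purpose: no single failure stops the loop
            failures[method] = exc
    return failures


def fake_render(method):
    # stand-in for the real render call; fails for one method to show the behaviour
    if method == "ml":
        raise RuntimeError("kernel died")


failed = run_all(["stats", "ml", "autoML"], fake_render)
print(sorted(failed))  # ['ml']
```

The returned dict doubles as a simple record of which reports need to be rerun.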
Side note: there are at least two Python Quarto modules, `quarto-cli` and `quarto`. I don’t know which is official. The former has a more recent release date.
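If you’d rather not depend on either module, the Quarto CLI can be called from Python with `subprocess`. The sketch below mirrors the shell commands shown earlier. The helper names `quarto_command` and `render_with_cli` are hypothetical, and it assumes the `quarto` executable is on the PATH.

```python
import subprocess


def quarto_command(input_file: str, method: str, day: str) -> list[str]:
    """Build the same quarto render call used in the shell examples."""
    return [
        "quarto", "render", input_file,
        "--output", f"{method}-{day}.html",
        "-P", f"method:{method}",
    ]


def render_with_cli(input_file: str, method: str, day: str) -> None:
    # check=True raises CalledProcessError if quarto exits with a non-zero status
    subprocess.run(quarto_command(input_file, method, day), check=True)


# only builds and prints the command; call render_with_cli to actually run it
print(quarto_command("notebooks/training.qmd", "stats", "2024-10-27"))
```

Passing the arguments as a list avoids shell quoting issues.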
Closing remarks
Both the shell and Python scripts are viable options. I chose Python because it is more versatile. If, for instance, I only needed to run the training process for the statistics models, I could. That said, the shell script is more concise.
I could use a scheduler, such as cron, to automate the process completely. This would work with either the shell or the Python script. In the context of my forecasting project, I would schedule the training process to run every month so the models don’t get stale.
Thanks for reading. Hopefully, this was useful and is only the beginning of Quarto and Python writings.