This document illustrates the basic simulation workflow of ERAHUMED. Specifically, we will cover:
- How to set up and run a simulation.
- How to extract and analyze simulation results.
The goal is to provide a practical, step-by-step overview of how to perform simulations. For a detailed explanation of the underlying models, algorithms, and assumptions, refer to the user manual.
This guide is intended for users working with the R interface of the package and is not required for those using only the Shiny GUI.
Running a simulation
The main interface for running simulations in erahumed is the erahumed_simulation() function:
sim <- erahumed_simulation()
#> Initializing inputs
#> Computing hydrology: lake
#> Computing hydrology: clusters
#> Computing hydrology: ditches
#> Computing exposure: clusters
#> Computing exposure: ditches
#> Computing exposure: lake
#> Computing risk: clusters
#> Computing risk: ditches
#> Computing risk: lake
sim
#> <ERAHUMED Simulation>
#> Date range : 2020-01-01 to 2020-12-31
#> Simulation days : 366
#> Clusters : 552
#> Management systems : 2
#> Chemicals simulated : 8
#> Total applications : 17
#>
#> Need help extracting simulation outputs? Check `?get_results`.
This function handles both the setup and execution of a simulation. Simulation parameters are specified via its arguments, and calling the function launches the simulation and returns a fully executed object containing the results (note that this may take some time).
The example above runs a simulation with the default model parameters. These can be customized via the arguments of erahumed_simulation(). For instance:
sim2 <- erahumed_simulation(foc_ss = 0.20, foc_sed = 0.07)
#> Initializing inputs
#> Computing hydrology: lake
#> Computing hydrology: clusters
#> Computing hydrology: ditches
#> Computing exposure: clusters
#> Computing exposure: ditches
#> Computing exposure: lake
#> Computing risk: clusters
#> Computing risk: ditches
#> Computing risk: lake
sim2
#> <ERAHUMED Simulation>
#> Date range : 2020-01-01 to 2020-12-31
#> Simulation days : 366
#> Clusters : 552
#> Management systems : 2
#> Chemicals simulated : 8
#> Total applications : 17
runs a simulation with modified environmental parameters (the fraction of organic content in suspended solids and in sediment), while:
sim3 <- erahumed_simulation(date_start = "2019-01-01", date_end = "2019-12-31")
#> Initializing inputs
#> Computing hydrology: lake
#> Computing hydrology: clusters
#> Computing hydrology: ditches
#> Computing exposure: clusters
#> Computing exposure: ditches
#> Computing exposure: lake
#> Computing risk: clusters
#> Computing risk: ditches
#> Computing risk: lake
sim3
#> <ERAHUMED Simulation>
#> Date range : 2019-01-01 to 2019-12-31
#> Simulation days : 365
#> Clusters : 552
#> Management systems : 2
#> Chemicals simulated : 8
#> Total applications : 17
runs a simulation over a different date range.
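The "Simulation days" values printed above (366 for 2020, 365 for 2019) are consistent with counting both endpoints of the date range. The short base R sketch below reproduces that arithmetic; count_days() is a hypothetical helper written for this illustration and is not part of erahumed.

```r
# Hypothetical helper (not part of erahumed): number of days in a date range,
# counting both the start date and the end date.
count_days <- function(date_start, date_end) {
  as.integer(as.Date(date_end) - as.Date(date_start)) + 1L
}

days_2020 <- count_days("2020-01-01", "2020-12-31")  # 2020 is a leap year
days_2019 <- count_days("2019-01-01", "2019-12-31")
```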
The full set of simulation parameters is documented in the user manual, as well as in the R documentation page ?erahumed_simulation. We highlight a few special parameters:
- Observational inputs: the outflows_df and weather_df arguments of erahumed_simulation() are data frames containing time-series data that serve as the empirical basis for ERAHUMED simulations. Further details are provided in the user manual. The date_start and date_end arguments (see above) must fall within the time range covered by these datasets.
- Rice-field (agrochemical) management system map: the rfms_map argument is the primary interface for advanced scenario customization, encapsulating the full set of user-defined agrochemical configurations (including custom chemicals and Rice-Field Management Systems, or RFMSs). A conceptual overview of these capabilities is provided in the user manual, while a step-by-step guide to creating custom scenarios is available in a dedicated vignette.
- Random seed: the seed argument controls the random number generator used in the simulation, ensuring reproducible results when stochastic elements are involved (e.g., the random order in which rice field clusters are drained during the sowing season). Setting a fixed value allows you to obtain identical results when re-running the same simulation; leaving it unset may produce slightly different outcomes across runs.
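Two of the points above can be checked with a few lines of R. The sketch below first tests whether a requested date range is covered by an observational dataset, using a toy data frame: the date column name and the covers_range() helper are assumptions made for this illustration, not part of the package. It then re-runs the default simulation twice with the same, arbitrarily chosen seed and compares the extracted results. This second part runs only if erahumed is installed and may take some time.

```r
# Toy stand-in for an observational input. The `date` column name is an
# assumption of this sketch, not taken from the package documentation.
toy_weather_df <- data.frame(
  date = seq(as.Date("2019-01-01"), as.Date("2020-12-31"), by = "day")
)

# Hypothetical helper: TRUE if [date_start, date_end] lies within the dates
# covered by `df`.
covers_range <- function(df, date_start, date_end) {
  date_start <- as.Date(date_start)
  date_end <- as.Date(date_end)
  date_start <= date_end &&
    date_start >= min(df$date) &&
    date_end <= max(df$date)
}

range_ok <- covers_range(toy_weather_df, "2020-01-01", "2020-12-31")
range_bad <- covers_range(toy_weather_df, "2018-06-01", "2019-06-01")

# Reproducibility check with a fixed seed (the value 1 is arbitrary).
if (requireNamespace("erahumed", quietly = TRUE)) {
  sim_a <- erahumed::erahumed_simulation(seed = 1)
  sim_b <- erahumed::erahumed_simulation(seed = 1)
  same_results <- identical(
    erahumed::get_results(sim_a, component = "hydrology", element = "cluster"),
    erahumed::get_results(sim_b, component = "hydrology", element = "cluster")
  )
}
```

Here range_bad is FALSE because the requested range starts before the first date in the toy dataset.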
Analyzing simulation results
Simulation results are extracted as follows:
lake_hydrology_df <- get_results(sim, component = "hydrology", element = "lake")
cluster_hydrology_df <- get_results(sim, component = "hydrology", element = "cluster")
cluster_exposure_df <- get_results(sim, component = "exposure", element = "cluster")
These are provided in the form of data.frames, for instance:
head(cluster_hydrology_df)
#> ideal_height_eod_cm ideal_irrigation ideal_draining is_plan_delays_window
#> 1 20 TRUE TRUE FALSE
#> 2 20 TRUE TRUE FALSE
#> 3 20 TRUE TRUE FALSE
#> 4 20 TRUE TRUE FALSE
#> 5 20 TRUE TRUE FALSE
#> 6 20 TRUE TRUE FALSE
#> petp_cm area_m2 capacity_m3 date element_id
#> 1 -0.058 114881.78 4960.991 2020-01-01 02_Carrera_del_Saler0-2_0
#> 2 -0.058 116539.90 4960.991 2020-01-01 03_Petxinar0-3_2
#> 3 -0.058 154730.35 4960.991 2020-01-01 03_Petxinar0-3_3
#> 4 -0.058 163789.56 4960.991 2020-01-01 03_Petxinar1-3_1
#> 5 -0.058 83016.51 4960.991 2020-01-01 03_Petxinar1-3_2
#> 6 -0.058 106260.07 4960.991 2020-01-01 03_Petxinar1-3_3
#> ditch_element_id seed_day tancat rfms_id rfms_name height_sod_cm irrigation
#> 1 d2 -110 TRUE 2 Clearfield 20 TRUE
#> 2 d2 -110 TRUE 2 Clearfield 20 TRUE
#> 3 d2 -110 TRUE 2 Clearfield 20 TRUE
#> 4 d2 -110 TRUE 2 Clearfield 20 TRUE
#> 5 d2 -110 TRUE 2 Clearfield 20 TRUE
#> 6 d2 -110 TRUE 2 Clearfield 20 TRUE
#> draining ideal_diff_flow_cm ideal_inflow_cm ideal_outflow_cm outflow_m3
#> 1 TRUE 0.058 5 4.942 0
#> 2 TRUE 0.058 5 4.942 0
#> 3 TRUE 0.058 5 4.942 0
#> 4 TRUE 0.058 5 4.942 0
#> 5 TRUE 0.058 5 4.942 0
#> 6 TRUE 0.058 5 4.942 0
#> outflow_cm inflow_cm inflow_m3 height_eod_cm plan_delay
#> 1 0 0.058 66.63143 20 0
#> 2 0 0.058 67.59314 20 0
#> 3 0 0.058 89.74360 20 0
#> 4 0 0.058 94.99795 20 0
#> 5 0 0.058 48.14958 20 0
#> 6 0 0.058 61.63084 20 0
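The rows printed above share a date but differ in element_id, which suggests a long format with one row per cluster and day. This is an inference from the printed output, not a documented guarantee. The sketch below shows how such a layout can be checked, using a toy data frame that borrows three column names from the output; the values are invented for illustration.

```r
# Toy data frame reusing three column names from the output above.
# All values are invented for illustration.
days <- seq(as.Date("2020-01-01"), as.Date("2020-01-03"), by = "day")
ids <- c("cluster_a", "cluster_b")
toy_hydrology_df <- data.frame(
  date = rep(days, times = length(ids)),
  element_id = rep(ids, each = length(days)),
  height_eod_cm = c(20, 18, 15, 20, 19, 17)
)

# Basic structure checks: number of rows, clusters, days, and date coverage.
n_rows <- nrow(toy_hydrology_df)
n_clusters <- length(unique(toy_hydrology_df$element_id))
n_days <- length(unique(toy_hydrology_df$date))
date_span <- range(toy_hydrology_df$date)

# In a complete long-format table, rows = clusters x days.
is_complete_grid <- n_rows == n_clusters * n_days
```

The same checks can be applied to cluster_hydrology_df by replacing the toy data frame.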
From here on, the analysis may proceed in whichever way you find most convenient. For instance, the chunk below plots water levels for a set of clusters with similar features, using dplyr and ggplot2:
library(dplyr)
library(ggplot2)
ditch <- "d4"
tancat <- FALSE
rfms_name <- "Clearfield" # Rice field management system
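# Keep only the clusters matching the selection above. `!!` forces the use of
# the variables defined above where they share a name with a column.
clusters_df <- cluster_hydrology_df |>
  filter(ditch_element_id == !!ditch, tancat == !!tancat, rfms_name == !!rfms_name)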
# Daily average water level across the selected clusters
avg_df <- clusters_df |>
  group_by(date) |>
  summarise(height_eod_cm = mean(height_eod_cm))
# Thin lines: individual clusters; thick line: daily average
ggplot() +
  geom_line(
    data = clusters_df,
    mapping = aes(x = date, y = height_eod_cm, group = element_id),
    color = "black", linewidth = 0.1, alpha = 0.2) +
  geom_line(
    data = avg_df,
    mapping = aes(x = date, y = height_eod_cm),
    color = "black"
  ) +
  xlab("Date") + ylab("Height [cm]") +
  ggtitle("Cluster simulated water levels",
    paste("Ditch:", ditch, "- Tancat:", tancat, "- RFMS:", rfms_name)
  )
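For readers who prefer to avoid extra dependencies, the filter-then-average step can also be sketched in base R. The example below runs on a small toy data frame that reuses column names from the output shown earlier, with invented values. The selection variables carry a sel_ prefix (a naming choice of this sketch) so that they cannot be confused with columns of the same name.

```r
# Toy data: three clusters over two days, two of them draining into ditch "d4".
# Column names follow the output shown earlier; values are invented.
days <- seq(as.Date("2020-01-01"), as.Date("2020-01-02"), by = "day")
toy_df <- data.frame(
  date = rep(days, times = 3),
  element_id = rep(c("c1", "c2", "c3"), each = 2),
  ditch_element_id = rep(c("d4", "d4", "d2"), each = 2),
  tancat = FALSE,
  rfms_name = "Clearfield",
  height_eod_cm = c(10, 12, 20, 22, 50, 50)
)

sel_ditch <- "d4"
sel_tancat <- FALSE
sel_rfms <- "Clearfield"

# Row filter: keep clusters matching all three selection criteria.
keep <- toy_df$ditch_element_id == sel_ditch &
  toy_df$tancat == sel_tancat &
  toy_df$rfms_name == sel_rfms
toy_clusters_df <- toy_df[keep, ]

# Daily mean water level across the selected clusters.
toy_avg_df <- aggregate(height_eod_cm ~ date, data = toy_clusters_df, FUN = mean)
```

The cluster in ditch "d2" is excluded by the filter, so the daily means are computed over c1 and c2 only.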
Further information
For additional details not covered in this guide, see the other package vignettes. If there is a specific topic you would like to see documented, please let us know by filing an issue on GitHub or by using the contact information provided on the package homepage.