Runs the aggregate model — run_model • aggregate.model

Runs the aggregate model according to the given specification of modules.

Usage

run_model(
  specification,
  dictionary = NULL,
  inputdata_directory = paste0(getwd(), "/data/raw"),
  primary_source = c("download", "local"),
  save_to_disk = NULL,
  present = FALSE,
  quiet = FALSE,
  use_logs = c("both", "y", "x"),
  trend = TRUE,
  ardl_or_ecm = "ardl",
  max.ar = 4,
  max.dl = 4,
  saturation = c("IIS", "SIS"),
  saturation.tpval = 0.01,
  max.block.size = 20,
  gets_selection = TRUE,
  selection.tpval = 0.01,
  constrain.to.minimum.sample = TRUE
)

Arguments

specification: A tibble or data.frame with three columns. Column names must be: 'type', 'dependent', and 'independent'. The column 'type' must contain for each row a character of either 'd' (Identity) or 'n' (Definition - i.e. will be estimated). The column 'dependent' must contain the LHS (Y variables) and the column 'independent' must contain the RHS (x variables separated by + and -).
dictionary: A tibble or data.frame storing the Eurostat variable code in column 'eurostat_code' and the model variable name in column 'model_varname'. If data are downloaded (primary_source = "download"), the dictionary also requires a column named 'dataset_id' that stores the Eurostat dataset id. When NULL, the default dictionary is used.
inputdata_directory: A path to the directory containing the .rds input files in which the data is stored. Can be NULL if data are downloaded (primary_source = "download").
primary_source: A string. Determines whether "download" or "local" data loading takes precedence.
save_to_disk: A file path, including the file name and extension, where the final dataset will be saved. Not saved when NULL.
present: A logical value whether the final aggregate model output should be presented or not. NOTE: not implemented yet.
quiet: Logical. Should messages be suppressed? Default is FALSE, so messages are displayed. These messages are intended to give more information about the estimation and data retrieval process.
use_logs: Character. Determines which variables are log-transformed. Must be one of "both", "y" (the dependent variable only), or "x" (the independent variables only). Default is "both".
trend: Logical. Should a trend be added? Default is TRUE.
ardl_or_ecm: Either 'ardl' or 'ecm' to determine whether to estimate the model as an Autoregressive Distributed Lag Function (ardl) or as an Equilibrium Correction Model (ecm).
max.ar: Integer. The maximum number of lags to use for the AR terms (lags of the dependent variable). Default is 4.
max.dl: Integer. The maximum number of lags to use for the independent variables (the distributed lags).
saturation: Carry out Indicator Saturation using the 'isat' function in the 'gets' package. Needs a character vector or string. Default is 'c("IIS","SIS")' to carry out Impulse Indicator Saturation and Step Indicator Saturation. Other possible values are 'NULL' to disable saturation or 'TIS' for Trend Indicator Saturation. When disabled, estimation will be carried out using the 'arx' function from the 'gets' package.
saturation.tpval: The target p-value of the saturation methods (e.g. SIS and IIS, see the 'isat' function in the 'gets' package). Default is 0.01.
max.block.size: Integer. Maximum size of block of variables to be selected over, default = 20.
gets_selection: Logical. Whether general-to-specific selection using the 'getsm' function from the 'gets' package should be done on the final saturation model. Default is TRUE.
selection.tpval: Numeric. The target p-value of the model selection methods (i.e. general-to-specific modelling, see the 'getsm' function in the 'gets' package). Default is 0.01.
constrain.to.minimum.sample: Logical. Should all data series be restricted to the time span of the shortest data series? Default is TRUE.
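
As a rough guide to what ardl_or_ecm, max.ar, max.dl and trend control, the two model forms can be written out for a single regressor. This is the textbook formulation, with p standing for max.ar and q for max.dl as upper bounds on the lag lengths; the exact parameterization used inside the package is an assumption here, and any log transformation or indicator terms are left out.

```latex
% ARDL(p, q) with one regressor x and an optional linear trend
y_t = c + \delta t + \sum_{i=1}^{p} \phi_i \, y_{t-i}
        + \sum_{j=0}^{q} \beta_j \, x_{t-j} + \varepsilon_t

% Equivalent equilibrium correction (ECM) form
\Delta y_t = c + \delta t + \alpha \, y_{t-1} + \theta \, x_{t-1}
        + \sum_{i=1}^{p-1} \gamma_i \, \Delta y_{t-i}
        + \sum_{j=0}^{q-1} \lambda_j \, \Delta x_{t-j} + \varepsilon_t,
\qquad
\alpha = \sum_{i=1}^{p} \phi_i - 1, \quad
\theta = \sum_{j=0}^{q} \beta_j
```

Both forms describe the same dynamics. The ECM form separates the long-run adjustment (the terms in levels) from the short-run changes (the terms in differences), and the remaining coefficients are linear combinations of the ARDL coefficients.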

Value

An object of class aggmod, which is a named list with four elements:

args: A named list storing the user arguments for the aggregate model.
module_order_eurostatvars: The original specification with translated variable names to Eurostat codes and arranged in order of estimation.
module_collection: The above specification with two added columns that store the model object for each module and the dataset used for estimation, including fitted values for the dependent variable.
full_data: A tibble or data.frame containing the complete original data for the aggregate model and the fitted values of each module.
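
The layout of the returned list can be illustrated with a stand-in object. The sketch below does not call run_model(); mock_result and check_aggmod() are hypothetical names used only for illustration, and the element contents are placeholders rather than real output. Only the class name and the four element names are taken from the description above.

```r
# Stand-in object with the four documented element names. The contents are
# placeholders, not real run_model() output.
mock_result <- structure(
  list(
    args = list(trend = TRUE, max.ar = 4, max.dl = 4),
    module_order_eurostatvars = data.frame(),
    module_collection = data.frame(),
    full_data = data.frame()
  ),
  class = "aggmod"
)

# Hypothetical helper: confirm that an object has the documented layout.
check_aggmod <- function(x) {
  expected <- c("args", "module_order_eurostatvars",
                "module_collection", "full_data")
  inherits(x, "aggmod") && setequal(names(x), expected)
}

# Elements are accessed by name, as with any named list.
check_aggmod(mock_result)
mock_result$args$max.ar
```

With a real result, the same `$` access would apply, for example to retrieve full_data for plotting or further analysis.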

Examples

spec <- dplyr::tibble(
  type = c(
    "d",
    "d",
    "n"
  ),
  dependent = c(
    "StatDiscrep",
    "TOTS",
    "Import"
  ),
  independent = c(
    "TOTS - FinConsExpHH - FinConsExpGov - GCapitalForm - Export",
    "GValueAdd + Import",
    "FinConsExpHH + GCapitalForm"
  )
)
# \donttest{
a <- run_model(specification = spec, dictionary = NULL,
               inputdata_directory = NULL, primary_source = "download",
               save_to_disk = NULL, present = FALSE)
#> Dataset query already saved in cache_list.json...
#> Reading cache file /tmp/RtmpRdTQJB/eurostat/9c7812fa85b16c2feb69beab873eec66.rds
#> Table  namq_10_a10  read from cache file:  /tmp/RtmpRdTQJB/eurostat/9c7812fa85b16c2feb69beab873eec66.rds
#> Dataset query already saved in cache_list.json...
#> Reading cache file /tmp/RtmpRdTQJB/eurostat/494bfc7fbeb296a64850e2948fd8afc0.rds
#> Table  namq_10_gdp  read from cache file:  /tmp/RtmpRdTQJB/eurostat/494bfc7fbeb296a64850e2948fd8afc0.rds
#> 
#> --- Estimation begins ---
#> Estimating Import = FinConsExpHH + GCapitalForm 
#> Constructing TOTS = GValueAdd + Import 
#> Constructing StatDiscrep = TOTS - FinConsExpHH - FinConsExpGov - GCapitalForm - Export 
# }
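
The log above processes the modules in a different order than the specification lists them: Import is estimated first because TOTS depends on it, and StatDiscrep comes last because it depends on TOTS. The base R sketch below shows one way such an ordering could be derived from the 'dependent' and 'independent' columns. It is an illustration only, not the package's implementation, and order_modules() is a hypothetical helper.

```r
# Same specification as in the example, built with base R only.
spec <- data.frame(
  type = c("d", "d", "n"),
  dependent = c("StatDiscrep", "TOTS", "Import"),
  independent = c(
    "TOTS - FinConsExpHH - FinConsExpGov - GCapitalForm - Export",
    "GValueAdd + Import",
    "FinConsExpHH + GCapitalForm"
  ),
  stringsAsFactors = FALSE
)

# Split each right-hand side on "+" and "-" and trim whitespace.
rhs_vars <- lapply(strsplit(spec$independent, "[+-]"), trimws)

# Hypothetical helper: return row indices in an order where every module
# comes after the modules that produce its inputs.
order_modules <- function(dependent, rhs_vars) {
  remaining <- seq_along(dependent)
  ordered <- integer(0)
  while (length(remaining) > 0) {
    # A module is ready when none of its inputs is the dependent variable
    # of another module that has not been processed yet.
    ready <- vapply(remaining, function(i) {
      !any(rhs_vars[[i]] %in% dependent[setdiff(remaining, i)])
    }, logical(1))
    if (!any(ready)) stop("Circular dependency between modules.")
    ordered <- c(ordered, remaining[ready])
    remaining <- remaining[!ready]
  }
  ordered
}

module_order <- order_modules(spec$dependent, rhs_vars)
print(spec$dependent[module_order])
#> [1] "Import"      "TOTS"        "StatDiscrep"
```

The resulting order matches the sequence of "Estimating" and "Constructing" messages in the example output.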