Runs the OSEM model
Runs the OSEM model according to the given specification of modules.
Usage
run_model(
  specification,
  dictionary = NULL,
  inputdata_directory = paste0(getwd(), "/data/raw"),
  primary_source = c("download", "local"),
  save_to_disk = NULL,
  present = FALSE,
  quiet = FALSE,
  use_logs = "both",
  trend = TRUE,
  ardl_or_ecm = "ardl",
  max.ar = 4,
  max.dl = 4,
  saturation = c("IIS", "SIS"),
  saturation.tpval = 0.01,
  max.block.size = 20,
  gets_selection = TRUE,
  selection.tpval = 0.01,
  constrain.to.minimum.sample = TRUE
)
Arguments
- specification
A tibble or data.frame with three columns, which must be named 'type', 'dependent', and 'independent'. Each row of 'type' must contain either 'd' (Identity) or 'n' (Definition, i.e. will be estimated). The column 'dependent' must contain the LHS (Y variables) and the column 'independent' must contain the RHS (x variables separated by + and -).
- dictionary
A tibble or data.frame storing the Eurostat variable code in column 'eurostat_code' and the model variable name in 'model_varname'. If download == TRUE, then the dictionary also requires a column named 'dataset_id' that stores the Eurostat dataset id. When NULL, the default dictionary is used.
- inputdata_directory
A path to .rds input files in which the data is stored. Can be NULL if download == TRUE.
- primary_source
A string. Determines whether "download" or "local" data loading takes precedence.
- save_to_disk
A path to a directory where the final dataset will be saved, including the file name and ending. Not saved when NULL.
- present
Logical. Whether the final OSEM model output should be presented.
- quiet
Logical. Should messages be suppressed? Default is FALSE, i.e. messages are displayed. These messages are intended to give more information about the estimation and data retrieval process.
- use_logs
Character. Determines which variables are log-transformed. Must be one of 'both', 'y', 'x', or 'none'. Default is 'both'.
- trend
Logical. Should a trend be added? Default is TRUE.
- ardl_or_ecm
Either 'ardl' or 'ecm' to determine whether to estimate the model as an Autoregressive Distributed Lag Function (ardl) or as an Equilibrium Correction Model (ecm).
- max.ar
Integer. The maximum number of lags to use for the AR terms.
- max.dl
Integer. The maximum number of lags to use for the independent variables (the distributed lags).
- saturation
Carry out Indicator Saturation using the 'isat' function in the 'gets' package. Needs a character vector or string. Default is 'c("IIS","SIS")' to carry out Impulse Indicator Saturation and Step Indicator Saturation. Other possible values are 'NULL' to disable saturation or 'TIS' for Trend Indicator Saturation. When disabled, estimation will be carried out using the 'arx' function from the 'gets' package.
- saturation.tpval
The target p-value of the saturation methods (e.g. SIS and IIS, see the 'isat' function in the 'gets' package). Default is 0.01.
- max.block.size
Integer. Maximum size of the block of variables to be selected over. Default is 20.
- gets_selection
Logical. Whether general-to-specific selection using the 'getsm' function from the 'gets' package should be done on the final saturation model. Default is TRUE.
- selection.tpval
Numeric. The target p-value of the model selection methods (i.e. general-to-specific modelling, see the 'getsm' function in the 'gets' package). Default is 0.01.
- constrain.to.minimum.sample
Logical. Should all data series be constrained to the minimum data series? Default is TRUE.
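The rules for the specification argument can be checked before calling run_model(). The following is a minimal sketch in base R, assuming only the column names and 'type' values documented above. The helper check_specification() is hypothetical and not part of the package, and it only verifies that the required columns are present, not that there are exactly three.

```r
# Hypothetical helper (not part of the package): checks a specification
# against the documented rules before it is passed to run_model().
check_specification <- function(spec) {
  required <- c("type", "dependent", "independent")
  missing_cols <- setdiff(required, names(spec))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Every row of 'type' must be either 'd' or 'n'
  bad_type <- !(spec$type %in% c("d", "n"))
  if (any(bad_type)) {
    stop("Column 'type' must contain only 'd' or 'n'; check row(s): ",
         paste(which(bad_type), collapse = ", "))
  }
  invisible(TRUE)
}

# A small specification in the documented format
spec_ok <- data.frame(
  type = c("d", "n"),
  dependent = c("TOTS", "Import"),
  independent = c("GValueAdd + Import", "FinConsExpHH + GCapitalForm"),
  stringsAsFactors = FALSE
)
check_specification(spec_ok)
```

A specification that fails either check stops with an error naming the missing columns or the offending rows.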
Value
An object of class osem, which is a named list with four elements:
- args
A named list storing the user arguments for the OSEM model.
- module_order_eurostatvars
The original specification with translated variable names to Eurostat codes and arranged in order of estimation.
- module_collection
The above specification with two added columns that store the model object for each module and the dataset used for estimation, including fitted values for the dependent variable.
- full_data
A tibble or data.frame containing the complete original data for the OSEM model and the fitted values of each module.
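Because the returned object is a named list, its elements can be reached with `$` or `[[`. The sketch below shows one way to confirm that an object carries the four documented elements. The helper is_complete_osem() is hypothetical, and the object `mock` is a stand-in with placeholder contents, since a real object would come from run_model().

```r
# Element names as listed in the Value section
osem_elements <- c("args", "module_order_eurostatvars",
                   "module_collection", "full_data")

# Hypothetical helper (not part of the package): TRUE if x has class
# 'osem' and contains all four documented elements.
is_complete_osem <- function(x) {
  inherits(x, "osem") && all(osem_elements %in% names(x))
}

# Stand-in object with empty placeholder contents, for illustration only
mock <- structure(
  list(
    args = list(),
    module_order_eurostatvars = data.frame(),
    module_collection = data.frame(),
    full_data = data.frame()
  ),
  class = "osem"
)

is_complete_osem(mock)
names(mock)
```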
Examples
spec <- dplyr::tibble(
  type = c("d", "d", "n"),
  dependent = c("StatDiscrep", "TOTS", "Import"),
  independent = c(
    "TOTS - FinConsExpHH - FinConsExpGov - GCapitalForm - Export",
    "GValueAdd + Import",
    "FinConsExpHH + GCapitalForm"
  )
)
# \donttest{
a <- run_model(specification = spec, dictionary = NULL,
               inputdata_directory = NULL, primary_source = "download",
               save_to_disk = NULL, present = FALSE)
#> Dataset query already saved in cache_list.json...
#> Reading cache file /tmp/RtmpbqqFES/eurostat/359cf54d5550ae78d4a50764c95bc538.rds
#> Table namq_10_a10 read from cache file: /tmp/RtmpbqqFES/eurostat/359cf54d5550ae78d4a50764c95bc538.rds
#> Dataset query already saved in cache_list.json...
#> Reading cache file /tmp/RtmpbqqFES/eurostat/ec61ed0d2532c29bd845c4f33896b64d.rds
#> Table namq_10_gdp read from cache file: /tmp/RtmpbqqFES/eurostat/ec61ed0d2532c29bd845c4f33896b64d.rds
#>
#> --- Estimation begins ---
#> Estimating Import = FinConsExpHH + GCapitalForm
#> Constructing TOTS = GValueAdd + Import
#> Constructing StatDiscrep = TOTS - FinConsExpHH - FinConsExpGov - GCapitalForm - Export
# }
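The 'independent' strings in a specification combine variable names with + and -. To illustrate that format, here is a minimal base R sketch that splits such a string into signed terms. The helper split_terms() is hypothetical and is not the package's own parser, which may treat the strings differently.

```r
# Hypothetical helper (not part of the package): splits an 'independent'
# string into variable names and their signs.
split_terms <- function(rhs) {
  rhs <- gsub("\\s+", "", rhs)                        # drop whitespace
  if (!grepl("^[+-]", rhs)) rhs <- paste0("+", rhs)   # leading term is positive
  # Each match is a sign followed by a variable name
  matches <- regmatches(rhs, gregexpr("[+-][^+-]+", rhs))[[1]]
  data.frame(
    variable = substring(matches, 2),
    sign = ifelse(substr(matches, 1, 1) == "-", -1, 1),
    stringsAsFactors = FALSE
  )
}

split_terms("TOTS - FinConsExpHH - FinConsExpGov - GCapitalForm - Export")
```

The result has one row per variable, with sign 1 for added terms and -1 for subtracted terms.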