--- title: "An Introduction to missoNet" author: - Yixiao Zeng, Celia M. T. Greenwood date: "`r format(Sys.time(), '%B %d, %Y')`" output: html_document: toc: true toc_depth: 3 number_sections: true theme: united highlight: tango bibliography: missoNet_refs.bib nocite: '@*' vignette: > %\VignetteIndexEntry{An Introduction to missoNet} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Introduction **missoNet** is a package that fits penalized multi-task regression -- that is, with multiple correlated tasks or response variables -- to simultaneously estimate the coefficients of a set of predictor variables for all tasks and the conditional response network structure given all predictors, via penalized maximum likelihood in an undirected conditional Gaussian graphical model. In contrast to most penalized multi-task regression (conditional graphical lasso) methods, **missoNet** has the capability of obtaining estimates even when the response data is corrupted by missing values. The method automatically enjoys the theoretical and computational benefits of convexity, and returns solutions that are comparable/close to the estimates without any missing values. **missoNet** assumes the model $$ \mathbf{Y} = \mathbf{X}\mathbf{B}^* + \mathbf{E},\ \ \mathbf{E}_{i\cdot} \stackrel{iid}{\sim} \mathcal{MVN}(0_q,(\mathbf{\Theta}^*)^{-1})\ \ \forall i = 1,...,n, $$ where $\mathbf{Y} \in \mathbb{R}^{n\times q}$ and $\mathbf{X} \in \mathbb{R}^{n\times p}$ represent the centered response data matrix and predictor data matrix, respectively (thus the intercepts are omitted). $\mathbf{E}_{i\cdot} \in \mathbb{R}^{q}$ is the $i$th row ($i$th sample) of the error matrix and is drawn from a multivariate Gaussian distribution. $n, p, q$ denote the sample size, the number of predictors and the number of responses, respectively. The regression coefficient matrix $\mathbf{B}^* \in \mathbb{R}^{p\times q}$ and the precision (inverse covariance) matrix $\mathbf{\Theta}^* \in \mathbb{R}^{q\times q}$ are parameters to be estimated in this model. Note that the entries of $\mathbf{\Theta}^*$ have a one-to-one correspondence with partial correlations. That is, the random variables $Y_j$ and $Y_k$ are conditionally independent (i.e. $\mathbf{\Theta}^*_{jk} = 0$) given $X$ and all other remaining variables in $Y$ iff the corresponding partial correlation coefficient is zero. To estimate the parameters, **missoNet** solves a penalized MLE problem: $$ (\hat{\mathbf{\Theta}},\hat{\mathbf{B}}) = \underset{\mathbf{\Theta} \succeq 0,\ \mathbf{B}}{\text{argmin}}\ \mathrm{tr}\left[\frac{1}{n}(\mathbf{Y} - \mathbf{XB})^\top(\mathbf{Y} - \mathbf{XB}) \mathbf{\Theta}\right] - \mathrm{log}|\mathbf{\Theta}| + \lambda_{\Theta}(\|\mathbf{\Theta}\|_{1,\mathrm{off}} + 1_{n\leq \mathrm{max}(p,q)} \|\mathbf{\Theta}\|_{1,\mathrm{diag}}) + \lambda_{B}\|\mathbf{B}\|_1. $$ $\mathbf{B}^*$ that mapping predictors to responses and $\mathbf{\Theta}^*$ that revealing the responses' conditional dependencies are estimated for the lasso penalties -- $\lambda_B$ and $\lambda_\Theta$, over a grid of values (on the log scale) covering the entire range of possible solutions. To learn more about sparse multi-task regression with inverse covariance estimation using datasets without missing values, see [MRCE](https://CRAN.R-project.org/package=MRCE). However, missingness in real data is inevitable. 
To learn more about sparse multi-task regression with inverse covariance estimation using datasets without missing values, see [MRCE](https://CRAN.R-project.org/package=MRCE). However, missingness in real data is inevitable. Most penalized multi-task regression / conditional graphical lasso methods either assume the data are fully observed (e.g. MRCE), or deal with missing data through a likelihood-based method such as the EM algorithm (e.g. [cglasso](https://CRAN.R-project.org/package=cglasso)). An important challenge in this context is the possible non-convexity of the associated optimization problem when there is missing data, as well as the high computational cost of iteratively updating the parameter estimates. **missoNet** aims to handle a specific class of datasets where the response matrix $\mathbf{Y}$ contains data which is missing at random (MAR). To use **missoNet**, users need no additional expertise in pre-processing missing data (e.g. imputation) or in selecting the regularization parameters ($\lambda_B$ and $\lambda_\Theta$); the method provides a unified framework for automatically solving a convex modification of the multi-task learning problem defined above, using corrupted datasets.

The package provides an integrated set of core routines for data simulation, model fitting and cross-validation, visualization of results, as well as prediction on new data. The function arguments are in the same style as those of `glmnet`, making it easy for experienced users to get started.

# Installation

To install the package **missoNet** from CRAN, type the following command in the R console:

```{r, echo = TRUE, eval = FALSE}
install.packages("missoNet")
```

Or install the development version of **missoNet** from GitHub:

```{r, echo = TRUE, eval = FALSE}
if(!require("devtools")) {
  install.packages("devtools")
}
devtools::install_github("yixiao-zeng/missoNet")
```

# Quick Start

The purpose of this section is to give users a general overview of the functions and their usage. We will briefly go over the main functions, basic operations and outputs. First, we load the **missoNet** package:

```{r}
library(missoNet)
```

We will use a synthetic dataset for illustration. To generate a set of data with missing response values, we can simply call the built-in function `generateData`:

```{r}
## Specify a random seed for reproducibility.
## The overall missing rate in the response matrix is around 10%.
## The missing mechanism can also be "MCAR" or "MNAR".
sim.dat <- generateData(n = 150, p = 15, q = 12, rho = 0.1,
                        missing.type = "MAR", with.seed = 1512)

tr <- 1:120    # training set indices
tst <- 121:150  # test set indices
```

This command returns an object containing all necessary components for analysis, including:

* `'X'`, `'Y'`: a predictor matrix $\mathbf{X} \in \mathbb{R}^{n\times p}$ and a complete (clean) response matrix $\mathbf{Y} \in \mathbb{R}^{n\times q}$, in which rows correspond to samples and columns correspond to variables. The rows of $\mathbf{Y}$ are sampled from $\mathcal{MVN}(\mathbf{X}\mathbf{B}^*,(\mathbf{\Theta}^*)^{-1})$.

* `'Beta'`, `'Theta'`: the true parameters $\mathbf{B}^* \in \mathbb{R}^{p\times q}$ and $\mathbf{\Theta}^* \in \mathbb{R}^{q\times q}$ of the generative model that will be estimated later (see the later section for more details on their structures and sparsity).

* `'Z'`: a corrupted response matrix $\mathbf{Z} \in \mathbb{R}^{n\times q}$ with elements $\mathbf{Z}_{ij}$ ($\forall\ i=1,...,n,\ \ j=1,...,q$), each of which is either `NA` or equal to $\mathbf{Y}_{ij}$. The probabilities of missingness for the response variables and the missing mechanism are specified by `'rho'` and `'missing.type'`, respectively. `'rho'` can be a scalar or a vector of length $q$. In the code above, `'rho' = 0.1` indicates an overall missing rate of around 10%. In addition to `"MAR"`, the `'missing.type'` can also be `"MCAR"` or `"MNAR"`; see the later section for more details on how missing values are generated under different mechanisms.
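For instance, the returned components and the realized missing rate can be checked quickly (base R only; the component names are those documented above):

```{r, eval = FALSE}
str(sim.dat, max.level = 1)   # components: X, Y, Z, Beta, Theta, ...
dim(sim.dat$Z)                # 150 x 12
mean(is.na(sim.dat$Z))        # overall missing rate, approximately 0.1
```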
> **_NOTE 1:_** the program only accepts missing values that are coded as either `NA`s or `NaN`s.

> **_NOTE 2:_** although the program will provide estimates even when $n \leq \text{max}(p,q)$, convergence is likely to be rather slow, and the variance of the estimates is likely to be excessively large. Therefore, we __suggest__ that both the predictor columns and the response columns be reduced (filtered) if possible, prior to fitting any models. For example in genomics, where $\mathbf{X}$ can be very wide (i.e. large $p$), features are often filtered based on variance or coefficient of variation.

We can easily visualize how missing data are distributed in the corrupted response matrix $\mathbf{Z}$ using the package **visdat**:

```{r, eval = FALSE}
Z <- sim.dat$Z  # corrupted response matrix
visdat::vis_miss(as.data.frame(Z))
```

```{r, include = FALSE, eval = FALSE}
png(file = "vismis.png", width = 1900, height = 1300, res = 260)
suppressWarnings(visdat::vis_miss(as.data.frame(sim.dat$Z)))
dev.off()
```

```{r, echo = FALSE, fig.align = 'left', out.width = "80%"}
knitr::include_graphics(system.file("extdata", "vismis.png", package = "missoNet"))
```

A single model can be fitted using the most basic call to `missoNet`:

```{r}
## Training set
X.tr <- sim.dat$X[tr, ]  # predictor matrix
Z.tr <- sim.dat$Z[tr, ]  # corrupted response matrix

## Using the training set to fit the model.
## 'lambda.Beta' and 'lambda.Theta' are arbitrarily set to 0.1.
## 'verbose' = 0 suppresses the printing of messages.
fit <- missoNet(X = X.tr, Y = Z.tr, lambda.Beta = 0.1, lambda.Theta = 0.1, verbose = 0)
```

However, `missoNet` is more commonly used with a grid of values for $\lambda_B$ and $\lambda_\Theta$. The two penalty vectors must have the same length, and the `missoNet` function will be called with each consecutive pair of values -- i.e. with the first elements of the two vectors, then with the second elements, etc.

```{r}
lambda.Beta.vec <- 10^(seq(from = 0, to = -1, length.out = 10))  # 10 values on the log scale, from 1 to 0.1
lambda.Theta.vec <- rep(0.1, 10)  # for each value of 'lambda.Beta', 'lambda.Theta' remains constant at 0.1

fit_list <- missoNet(X = X.tr, Y = Z.tr,
                     lambda.Beta = lambda.Beta.vec, lambda.Theta = lambda.Theta.vec, verbose = 0)
```

The command above returns a sequence of models for users to choose from.
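Because the two penalty vectors are paired element-wise, exploring both penalties jointly requires expanding the grid yourself. A small sketch of our own (note that this fits `nrow(grid)` -- here 25 -- models, so keep the grid modest):

```{r, eval = FALSE}
## Illustrative: expand a 5 x 5 grid of (lambda.Beta, lambda.Theta) pairs
## into two equal-length vectors for missoNet()'s element-wise pairing.
lamB <- 10^(seq(from = 0, to = -1, length.out = 5))
lamTht <- 10^(seq(from = 0, to = -1, length.out = 5))
grid <- expand.grid(lambda.Beta = lamB, lambda.Theta = lamTht)

fit_grid <- missoNet(X = X.tr, Y = Z.tr,
                     lambda.Beta = grid$lambda.Beta,
                     lambda.Theta = grid$lambda.Theta, verbose = 0)
```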
In many cases, users may prefer that the software selects the best fit among them. Cross-validation is perhaps the simplest and most widely used method for this task. `cv.missoNet` is the main function for cross-validation, along with supporting methods such as `plot` and `predict`. Here we use `cv.missoNet` to perform a five-fold cross-validation; the samples will be permuted before being split into folds. For reproducibility, we assign a random seed to the permutation.

```{r}
## If 'permute' = FALSE, the samples will be split into k folds in their original order,
## i.e. the first (n/'kfold') samples will be assigned to the first fold, and so on.
cvfit <- cv.missoNet(X = X.tr, Y = Z.tr, kfold = 5,
                     lambda.Beta = lambda.Beta.vec, lambda.Theta = lambda.Theta.vec,
                     permute = TRUE, with.seed = 433, verbose = 0)
```

The program has warned us that the range of $\lambda_\Theta$ is inappropriate, so we need to supply a sequence of $\lambda_\Theta$ covering larger values. However, picking a suitable range for such hyperparameters may require experience, or trial and error. Therefore, `cv.missoNet` provides a smarter way to solve this problem. Let's fit the cross-validation model again, this time running all folds in parallel on two CPU cores. The parallelization of **missoNet** models relies on the package **parallel**, so make sure the parallel clusters have been registered beforehand.

```{r, eval = FALSE}
## 'fit.1se = TRUE' tells the program to make additional estimations of the
## parameters at the largest value of 'lambda.Beta' / 'lambda.Theta' that gives
## the most regularized model such that the cross-validated error is within one
## standard error of the minimum.
cl <- parallel::makeCluster(min(parallel::detectCores() - 1, 2))
cvfit <- cv.missoNet(X = X.tr, Y = Z.tr, kfold = 5,
                     fit.1se = TRUE, parallel = TRUE, cl = cl,
                     permute = TRUE, with.seed = 433, verbose = 1)
parallel::stopCluster(cl)
```

Note that we have not explicitly specified the regularization parameters $\lambda_B$ and $\lambda_\Theta$ in the command above. In this case, a grid of $\lambda_B$ and $\lambda_\Theta$ values in a (hopefully) reasonable range will be computed and used by the program. Users can even provide just one of the $\lambda_B$ and $\lambda_\Theta$ vectors, and the program will compute the other. Generally, it is difficult at the beginning to choose suitable $\lambda$ sequences, so we __suggest__ that users start their analysis with `cv.missoNet`, letting the program compute appropriate $\lambda_B$ and $\lambda_\Theta$ values itself and then automatically pick the optimal combination at which the minimum cross-validated error is achieved.

`cv.missoNet` returns a `'cv.missoNet'` object, a list with all the ingredients of the cross-validated fit. We can execute the `plot` method

```{r, eval = FALSE}
## The values of the mean cross-validated errors along with the upper and lower standard deviation bounds
## can be accessed via 'cvfit$cvm', 'cvfit$cvup' and 'cvfit$cvlo', respectively.
plot(cvfit)
```

```{r, echo = FALSE, fig.align = 'left', out.width = "80%"}
knitr::include_graphics(system.file("extdata", "cvfitHeat.png", package = "missoNet"))
```

to visualize the mean cross-validated error as a heatmap. The white solid box marks the position of the minimum mean cross-validated error with the corresponding $\lambda_B$ and $\lambda_\Theta$ (`"lambda.min"` := $({\lambda_B}_\text{min}, {\lambda_\Theta}_\text{min})$), and the white dashed boxes indicate the largest $\lambda_B$ and $\lambda_\Theta$ at which the mean cross-validated error is within one standard error of the minimum, with the other one fixed at its `"lambda.min"` value (i.e. `"lambda.1se.Beta"` := $({\lambda_B}_\text{1se}, {\lambda_\Theta}_\text{min})$ and `"lambda.1se.Theta"` := $({\lambda_B}_\text{min}, {\lambda_\Theta}_\text{1se})$). We often use this "one-standard-error" rule when selecting the most parsimonious model; it acknowledges the fact that the cross-validation surface is estimated with error, so erring on the side of parsimony is preferable.
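As a hedged illustration of this rule, the documented components `'cvm'` and `'cvup'` can be used to count how many $\lambda$ pairs fall within one standard error of the minimum (the internal ordering of the grid inside these vectors may differ from this sketch):

```{r, eval = FALSE}
idx.min <- which.min(cvfit$cvm)   # position of the minimum mean CV error
cvfit$cvm[idx.min]                # minimum mean cross-validated error
cvfit$cvup[idx.min]               # its upper one-standard-error bound

## Lambda pairs whose mean CV error falls below this bound are
## "within one standard error of the minimum".
sum(cvfit$cvm <= cvfit$cvup[idx.min])
```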
We can also plot the errors in a 3D scatter plot (without surfaces):

```{r, eval = FALSE}
plot(cvfit, type = "cv.scatter", plt.surf = FALSE)
```

```{r, echo = FALSE, fig.align = 'left', out.width = "80%"}
knitr::include_graphics(system.file("extdata", "cvfitScat.png", package = "missoNet"))
```

After cross-validation, $\mathbf{B}^*$ and $\mathbf{\Theta}^*$ can be estimated at the three special $\lambda$ pairs discussed above -- `"lambda.min"`, `"lambda.1se.Beta"` and `"lambda.1se.Theta"`. The corresponding results, along with the $\lambda$ values, are stored in three separate lists: `'est.min'`, `'est.1se.B'` and `'est.1se.Tht'` (`'est.1se.B'` and `'est.1se.Tht'` are `NULL` unless the argument `'fit.1se' = TRUE` when calling `cv.missoNet`). Let's extract the estimates of the model parameters $\hat{\mathbf{B}}$ and $\hat{\mathbf{\Theta}}$ at `"lambda.min"` and `"lambda.1se.Beta"`, then plot them beside the ground truth $\mathbf{B}^*$ and $\mathbf{\Theta}^*$:

```{r, eval = FALSE}
## Define a plotting function
plot_heatmap <- function(est, col, legend = FALSE, lgd_name, title) {
  return(ComplexHeatmap::Heatmap(est, cluster_rows = FALSE, cluster_columns = FALSE,
                                 col = col, show_heatmap_legend = legend, name = lgd_name,
                                 column_names_gp = grid::gpar(fontsize = 8),
                                 row_names_gp = grid::gpar(fontsize = 8),
                                 row_names_side = "left", border = TRUE, column_title = title))
}

## Color space
col <- circlize::colorRamp2(c(-2, 0, 2), c("blue", "white", "red"))

## Beta*
Beta_star <- sim.dat$Beta
Beta_star_ht <- plot_heatmap(Beta_star, col, title = expression(paste(bold(Beta), "*")))

## Beta_hat at "lambda.min"
Beta_hat_min <- cvfit$est.min$Beta
Beta_hat_min_ht <- plot_heatmap(Beta_hat_min, col,
                                title = expression(paste(hat(bold(Beta)), " at `lambda.min`")))

## Beta_hat at "lambda.1se.Beta"
Beta_hat_1se <- cvfit$est.1se.B$Beta
Beta_hat_1se_ht <- plot_heatmap(Beta_hat_1se, col, legend = TRUE, lgd_name = "values",
                                title = expression(paste(hat(bold(Beta)), " at `lambda.1se.Beta`")))

## Theta*
Theta_star <- sim.dat$Theta
Theta_star_ht <- plot_heatmap(Theta_star, col, title = expression(paste(bold(Theta), "*")))

## Theta_hat at "lambda.min"
Theta_hat_min <- cvfit$est.min$Theta
Theta_hat_min_ht <- plot_heatmap(Theta_hat_min, col,
                                 title = expression(paste(hat(bold(Theta)), " at `lambda.min`")))

## Theta_hat at "lambda.1se.Beta"
Theta_hat_1se <- cvfit$est.1se.B$Theta
Theta_hat_1se_ht <- plot_heatmap(Theta_hat_1se, col, legend = TRUE, lgd_name = "values",
                                 title = expression(paste(hat(bold(Theta)), " at `lambda.1se.Beta`")))

## Plot
Beta_star_ht + Beta_hat_min_ht + Beta_hat_1se_ht
Theta_star_ht + Theta_hat_min_ht + Theta_hat_1se_ht
```

```{r, echo = FALSE, fig.align = 'left', out.width = "100%"}
knitr::include_graphics(system.file("extdata", "cvfitEst1.png", package = "missoNet"))
```

```{r, echo = FALSE, fig.align = 'left', out.width = "100%"}
knitr::include_graphics(system.file("extdata", "cvfitEst2.png", package = "missoNet"))
```
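Beyond a visual comparison, recovery of the ground truth can also be quantified. A small sketch of our own (not a package routine), reusing the objects above:

```{r, eval = FALSE}
## Frobenius-norm estimation errors at "lambda.min"
norm(cvfit$est.min$Beta - sim.dat$Beta, type = "F")
norm(cvfit$est.min$Theta - sim.dat$Theta, type = "F")

## Support recovery of the network: compare off-diagonal non-zero patterns
off <- row(sim.dat$Theta) != col(sim.dat$Theta)   # off-diagonal positions
true.edge <- (sim.dat$Theta != 0) & off
est.edge <- (cvfit$est.min$Theta != 0) & off
mean(est.edge == true.edge)   # proportion of correctly recovered entries
```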
Like other regression methods, predictions can be made from the fitted `'cv.missoNet'` object as well. The code below gives predictions for a new input matrix `'newx'` at `"lambda.min"` (`'s' = "lambda.1se.Beta"` and `'s' = "lambda.1se.Theta"` are available only when `'fit.1se' = TRUE`):

```{r, eval = FALSE}
## Predictions on the test set: newy = mu + newx %*% Beta.
## 's' can also be "lambda.1se.Beta" or "lambda.1se.Theta"
## (why 's' and not 'lambda'? Because we follow the naming rules of glmnet).
newy <- predict(cvfit, newx = sim.dat$X[tst, ], s = "lambda.min")
cat("dim(newy):", dim(newy))
```

```{r, echo = FALSE}
cat("dim(newy): 30 12")
```
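Since the simulated test responses are available without corruption, the predictive accuracy can be checked directly -- an illustrative sketch of our own, comparing against the clean test-set responses:

```{r, eval = FALSE}
## Mean squared prediction error on the clean test responses
Y.tst <- sim.dat$Y[tst, ]
mean((Y.tst - newy)^2)
```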
Users should now be able to use the **missoNet** package with the functions introduced so far. There are many more arguments in the functions that give users a great deal of flexibility. To learn more, move on to the next section. In the section on a real data application, we demonstrate a complete process of analyzing a neuroimaging and genetic dataset using **missoNet**, and we __highly recommend__ that interested users browse this content.

# Other Commonly Used Function Arguments

**missoNet** includes a variety of functions for data simulation, goodness-of-fit evaluation, regularization parameter tuning, and visualization of results, as well as prediction on new data. In addition to the basic function arguments introduced so far, there are some other commonly used arguments available for users to customize the fit.

## `cv.missoNet`

* `'rho'`: (optional) a scalar or a numeric vector of length $q$ for the missing probabilities of the response variables. The default is `'rho' = NULL`, in which case the program computes the empirical missing rate for each column of `'Y'` and uses these as the working missing probabilities; a user-supplied `'rho'` overrides this default. The default setting should suffice in most cases; note that misspecified missing probabilities would introduce bias into the model. Use the global assignment of an overall missing probability (i.e. a scalar `'rho'`) with care, because it might introduce extra error, especially when the outcomes in `'Y'` have highly unbalanced missing rates across variables.

* `'lambda.Beta'` and `'lambda.Theta'`: (optional) user-supplied sequences of regularization parameters for the lasso penalties, {$\lambda_B$} and {$\lambda_\Theta$}, among which the cross-validation procedure searches. When `'lambda.Beta' = NULL` and/or `'lambda.Theta' = NULL` (the default), a grid of $\lambda_B$ and/or $\lambda_\Theta$ values in a (hopefully) reasonable range will be computed and used by the program. In this case, the following arguments can be used to adjust the range of the sequences as well as the density of the grid (__given the limited space, similar control arguments are discussed together; below, "/" serves as a separator between $\lambda_B$ and $\lambda_\Theta$__):
    + `'lamBeta.min.ratio'` / `'lamTheta.min.ratio'`: the smallest value of $\lambda_B$ / $\lambda_\Theta$ is calculated as the data-derived $\text{max}(\lambda_B)$ / $\text{max}(\lambda_\Theta)$ multiplied by this positive ratio. This argument controls the width of the generated lambda sequence by trimming its smallest end. The default depends on the sample size $n$ relative to the number of predictors $p$ / responses $q$. If $n>p$ / $n>q$, the default is `'lamBeta.min.ratio' = 1.0E-4` / `'lamTheta.min.ratio' = 1.0E-4`, otherwise it is `1.0E-2` / `1.0E-2`. A very small value of `'lamBeta.min.ratio'` / `'lamTheta.min.ratio'` may significantly increase the runtime and lead to a saturated fit when $n \leq p$ / $n \leq q$.
    + `'n.lamBeta'` / `'n.lamTheta'`: the program generates `'n.lamBeta'` / `'n.lamTheta'` values linear on the log scale from $\text{max}(\lambda_B)$ / $\text{max}(\lambda_\Theta)$ down to ($\text{max}(\lambda_B)*$`'lamBeta.min.ratio'`) / ($\text{max}(\lambda_\Theta)*$`'lamTheta.min.ratio'`). If $n>p$ / $n>q$, the default is `'n.lamBeta' = 40` / `'n.lamTheta' = 40`, otherwise it is `20` / `20`. This argument controls the density of the generated lambda sequence. Avoid supplying very large numbers for both `'n.lamBeta'` and `'n.lamTheta'` at the same time, because the program fits (`'n.lamBeta'`$*$`'n.lamTheta'`) models in total for each fold of the cross-validation, so the cost grows quadratically in the sequence length. Instead, users can adopt a dynamic search strategy; see the next section for an example.
    + `'lamBeta.scale.factor'` / `'lamTheta.scale.factor'`: a positive multiplication factor controlling the overall magnitude of the entire $\lambda_B$ / $\lambda_\Theta$ sequence. This argument scales the lambda sequence (or equivalently, shifts it on the log scale); the default is `'lamBeta.scale.factor' = 1` / `'lamTheta.scale.factor' = 1`. A typical usage is when the program has warned that the magnitudes of the $\lambda_B$ / $\lambda_\Theta$ values are inappropriate (either too large or too small), so that the optimal value of $\lambda_B$ / $\lambda_\Theta$ selected by the cross-validation lies close to a boundary of the search range (see the sketch after this list).

* `'standardize'` and `'standardize.response'`: logical flags (the defaults are `TRUE`) for the standardization of variables in `'X'` and `'Y'` to have unit variance prior to fitting the model. The estimated results are always returned on the original scale. `cv.missoNet` relies on this standardization to compute appropriate lambda sequences. If users wish to compare the results with those of other software, it is suggested to supply the program with datasets that have been standardized beforehand (using `'scale()'` or similar functions), and then set `'standardize'` and `'standardize.response'` to `FALSE`.

* `'fit.1se'`: a logical flag (the default is `FALSE`) telling the program whether $\mathbf{B}^*$ and $\mathbf{\Theta}^*$ should also be estimated at `"lambda.1se.Beta"` := $({\lambda_B}_\text{1se}, {\lambda_\Theta}_\text{min})$ and `"lambda.1se.Theta"` := $({\lambda_B}_\text{min}, {\lambda_\Theta}_\text{1se})$, where ${\lambda_B}_\text{1se}$ and ${\lambda_\Theta}_\text{1se}$ are the largest $\lambda_B$ and $\lambda_\Theta$, respectively, at which the mean cross-validated error is within one standard error of the minimum, while the other one is fixed at its value in `"lambda.min"` := $({\lambda_B}_\text{min}, {\lambda_\Theta}_\text{min})$. The corresponding estimation results are stored in `'est.1se.B'` and `'est.1se.Tht'` inside the returned object.

* `'fit.relax'`: if `TRUE` (the default is `FALSE`), the program will re-estimate the edges in the active set (i.e. the non-zero off-diagonal elements) of $\hat{\mathbf{\Theta}}$ without penalization ($\lambda_\Theta = 0$). Such a debiased estimate of the network structure could be useful for some interdependency analyses. This "relaxed" fit is stored in `'relax.net'`. WARNING: there may be convergence issues if the working empirical covariance matrix is not of full rank (e.g. $n < q$).

* `'verbose'`: a value of `0`, `1` or `2`. `'verbose' = 0` -- silent mode; `'verbose' = 1` (the default) -- limited tracing with progress bars; `'verbose' = 2` -- detailed tracing. Note that displaying the progress bars slightly increases the computational overhead compared to the silent mode. The detailed tracing should be used for monitoring progress only when the program runs extremely slowly, and it is not supported under `'parallel' = TRUE`.

* `'parallel'` and `'cl'`: if `'parallel' = TRUE`, the program uses clusters to compute all folds of the cross-validation in parallel. Users must first register the parallel clusters with `parallel::makeCluster('nCores')` and supply the created cluster object to `'cl'`. One way to determine the number of cores for parallelization is `'nCores' = min(parallel::detectCores()-1, ceiling(kfold/2))`. Note that if `'verbose' == 1` and `'nCores' == 'kfold'`, the progress bar for the cross-validation process will jump directly from 0% to 100%, because the computations for all folds start and end at approximately the same time on a homogeneous machine.
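As an illustration of these sequence controls, the call below (a sketch with arbitrarily chosen values, not a recommendation) shifts and widens the automatically generated sequences:

```{r, eval = FALSE}
## Illustrative: e.g. after a warning that the lambda.Theta values are too small,
## scale that sequence up, and make the lambda.Beta path wider but coarser.
cvfit.adj <- cv.missoNet(X = X.tr, Y = Z.tr, kfold = 5,
                         lamTheta.scale.factor = 5,
                         lamBeta.min.ratio = 1e-3, n.lamBeta = 20,
                         permute = TRUE, with.seed = 433, verbose = 0)
```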
## `missoNet`

`missoNet` shares most arguments with `cv.missoNet`, except that `'lambda.Beta'` and `'lambda.Theta'` are not optional for `missoNet`. Instead, users must specify either a scalar or a vector for both `'lambda.Beta'` and `'lambda.Theta'`; the vectors must have the same length. `missoNet` will be called with each consecutive pair of `'lambda.Beta'` and `'lambda.Theta'` -- i.e. with the first elements of the two vectors (`'lambda.Beta[1]'`, `'lambda.Theta[1]'`), then with the second elements (`'lambda.Beta[2]'`, `'lambda.Theta[2]'`), etc.

## `generateData`

`generateData` provides a convenient way for users to quickly generate simulation data and get started building their own analysis frameworks. Given the predictor matrix $\mathbf{X}$ and the true parameters $\mathbf{B}^*$ and $\mathbf{\Theta}^*$, a response matrix $\mathbf{Y}$ without missing values is generated by $\mathbf{Y} = \mathbf{X}\mathbf{B}^* + \mathbf{E}$, where the rows of $\mathbf{E}$ are sampled from $\mathcal{MVN}(0_q,(\mathbf{\Theta}^*)^{-1})$. A corrupted response matrix $\mathbf{Z} := f_{\mathbf{M}}(\mathbf{Y})$ has elements $\mathbf{Z}_{ij}$ = `NA` if $\mathbf{M}_{ij} = 1$, and $\mathbf{Z}_{ij} = \mathbf{Y}_{ij}$ otherwise. $\mathbf{M}$ is an indicator matrix of missingness having the same dimension as $\mathbf{Y}$, parameterized by `'rho'` and `'missing.type'`; see below for details.
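For example, before going through the argument details below, a more customized simulation might look like this (argument values chosen arbitrarily for illustration):

```{r, eval = FALSE}
## Illustrative: unbalanced, column-specific missing rates under MCAR,
## with user-controlled sparsity for the automatically generated Beta*.
sim.custom <- generateData(n = 200, p = 20, q = 8,
                           rho = runif(8, min = 0.05, max = 0.3),  # vector of length q
                           missing.type = "MCAR",
                           Beta.row.sparsity = 0.5, Beta.elm.sparsity = 0.4,
                           with.seed = 2022)
```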
* `'Beta'`: (optional) a user-supplied true regression coefficient matrix $\mathbf{B}^*$ ($p \times q$). If `'Beta' = NULL` (the default), a dense ($p \times q$) matrix is first created by setting each element to a random draw from $\mathcal{N}(0, 1)$; a sparse $\mathbf{B}^*$ is then generated by randomly assigning zeros to some of the elements of this matrix. Two kinds of sparsity can be controlled through the following arguments when $\mathbf{B}^*$ is automatically generated:
    + `'Beta.row.sparsity'`: a Bernoulli parameter between 0 and 1 controlling the approximate proportion of rows of $\mathbf{B}^*$ in which at least one element may be nonzero.
    + `'Beta.elm.sparsity'`: a Bernoulli parameter between 0 and 1 controlling the approximate proportion of nonzero elements among the rows of $\mathbf{B}^*$ in which not all elements are zero.

* `'Theta'`: (optional) a user-supplied (positive definite) true precision matrix $\mathbf{\Theta}^*$ ($q \times q$) for sampling the error matrix $\mathbf{E} \sim \mathcal{MVN}(0_q,(\mathbf{\Theta}^*)^{-1})$. The default is `'Theta' = NULL`, in which case the program generates a block-structured $\mathbf{\Theta}^*$ with four blocks corresponding to four types of network structures: independent, weak graph, strong graph and chain. This argument is only needed when `'E' = NULL`.

* `'Sigma.X'`: (optional) a user-supplied (positive definite) covariance matrix ($p \times p$) for sampling the predictor matrix $\mathbf{X} \sim \mathcal{MVN}(0_p, \mathbf{\Sigma}_X)$. If `'Sigma.X' = NULL` (the default), the function generates $\mathbf{X}$ using an AR(1) covariance matrix with autocorrelation 0.7 (i.e. $[\mathbf{\Sigma}_X]_{jk} = 0.7^{|j-k|}$). This argument is only needed when `'X' = NULL`.

* `'rho'`: a scalar or a vector of length $q$ specifying the approximate proportion of missing values in each column of $\mathbf{Z}$. Note that a scalar will be automatically converted to a vector filled with the same value.

* `'missing.type'`: determines how the missing values in the corrupted response matrix $\mathbf{Z}$ are randomly generated; can be one of `"MCAR"`, `"MAR"` and `"MNAR"`:
    + `"MCAR"`: missing completely at random. For all $i=1,\dots,n$ and $j=1,\dots,q$: $\mathbf{Z}_{ij}$ = `NA` if $\mathbf{M}_{ij}=1$, otherwise $\mathbf{Z}_{ij} = \mathbf{Y}_{ij}$, where the $\mathbf{M}_{ij}$ are i.i.d. Bernoulli draws with probability `'rho[j]'`;
    + `"MAR"`: missing at random. For all $i=1,\dots,n$ and $j=1,\dots,q$: $\mathbf{Z}_{ij}$ = `NA` if $\mathbf{M}_{ij}=1$, otherwise $\mathbf{Z}_{ij} = \mathbf{Y}_{ij}$, where the $\mathbf{M}_{ij}$ are Bernoulli draws with probability (`'rho[j]'`$*\frac{c}{1+e^{-[\mathbf{X}\mathbf{B}^*]_{ij}}}$), in which $c$ is a constant correcting the missing rate of the $j$th column of $\mathbf{Y}$ back to the level of `'rho[j]'`;
    + `"MNAR"`: missing not at random. For all $i=1,\dots,n$ and $j=1,\dots,q$: $\mathbf{Z}_{ij}$ = `NA` if $\mathbf{M}_{ij}=1$, otherwise $\mathbf{Z}_{ij} = \mathbf{Y}_{ij}$, where $\mathbf{M}_{ij} = 1 \cdot (\mathbf{Y}_{ij} < T^j)$, in which $T^{j} = \mathcal{Q}($`'rho[j]'`$, \mathbf{Y}_{\cdot j})$ is the `'rho[j]'`-quantile of $\mathbf{Y}_{\cdot j}$. In other words, $\mathbf{Y}_{ij}$ will be missing if it is smaller than the `'rho[j]'`-quantile of the $j$th column of $\mathbf{Y}$.
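As a quick sanity check of the chosen mechanism, the realized column-wise missing rates can be compared against `'rho'` (a small sketch reusing the object simulated earlier):

```{r, eval = FALSE}
## Realized per-column missing rates; these should be close to 'rho' = 0.1
colMeans(is.na(sim.dat$Z))
```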