Bayesian goodness-of-fit testing for regression

This document describes the steps users need to follow to implement the proposed Bayesian goodness-of-fit test. To illustrate the approach, we use simulated data.

Step 1: Load packages and run R file

To implement the method, To implement the method, users need to load the packages below and read the R code from the file “MCMCAlgorithm.R”. This file contains the algorithm used to fit our proposal and obtain the corresponding Bayes factor.

#####################
### Required libraries
#####################
library(invgamma)
library(MASS)
library(Matrix)
library(mvtnorm)
library(splines)

#####################
### Read "MCMCAlgorithm.R" file
#####################
source(url("https://raw.githubusercontent.com/anfebar/anfebar.github.io/master/Software/BayesianGOFRegression/MCMCAlgorithm.R"))

Simulated data

We illustrate the proposed method using simulated data and beta regression. In an application with geniune data, this step is omitted.

#####################
### Simulate Data
#####################
set.seed(123)
## Predictors
n = 500 # sample size
p1 = 5  # number of continuous predictors
p2 = 5  # number of binary predictors
S.X = matrix(1,p1,p1)
for(j1 in 1:p1)
{
  for(j2 in 1:p1) S.X[j1,j2] = 0.7^abs(j1-j2)
}
S.X[1,-1] = 0
S.X[-1,1] = 0
X = cbind(1,rmvnorm(n,rep(0,p1),S.X))
X = cbind(X,sapply(rep(0.5,len=p2),function(p)rbinom(n,1,p)))

## Response variable
p = ncol(X)
m = X%*%c(-2,rep(0.15,p-1))
mu = 1/(1+exp(-m))
m = X[,c(1,2,ncol(X))]%*%rep(1,3)
phi= 1.5+(exp(m))
y=rbeta(n,mu*phi,(1-mu)*phi)
X = X[,-1]

Step 2: fit the model defining \(H_0\)

The input in this step is the data set containing the response variable (\(y\)), predictors (\(x\)), and model for \(y|x\). In this example, the precision parameter in the proposed beta regression is not indexed by predictors. Notice that the precision parameter in the Beta regression model used to simulate data depends on the predictors, i.e., the proposed model is not correctly specified.

#######################
### Fit beta regression with constant precision parameter 
#######################
library(betareg) # Package for beta regression
fit <- betareg(y ~ X)

Step 3: compute universal residuals

Compute universal residuals using the fitted model in step 2.

#######################
### Universal residuals
#######################
X = cbind(1,X)
m = X%*%fit$coefficients$mean
mu = 1/(1+exp(-m))
phi= fit$coefficients$precision
u=qnorm(pbeta(y,mu*phi,(1-mu)*phi)) # qnorm of the Rosenblatt’s transformation
## Transform continous predictors
aux = cbind(rep(1,n))
for(j in 2:(p1+1)) aux = cbind(aux,bs(X[,j]))
Xbs = cbind(aux,X[,(p1+2):p])
## Create dataframe -- name column containing the universal residuals as "u"
Data = data.frame(u=u,X=Xbs)

Step 4: compute Bayes factor and posterior probability

The input in this last step is the data set containing the universal residuals and predictors. Using the proposed approach, we then compute the posterior probability (and Bayes factor) of \(H_0\) which states the model use in Step 2 is the data generating mechanism. This step might take several minutes.

#####################
### Set mcmc parameters --  H: DPM truncation level
#####################
MCMC = list(nburn = 5000, nskip = 10, nsave = 1000, H = 50)

#####################
### Compute Bayes factor and posterior probability
#####################
BF = f.MCMC(Data,MCMC)

## [1] "Burning ..."
## [1] "Burning 1000 of 5000"
## [1] "Burning 2000 of 5000"
## [1] "Burning 3000 of 5000"
## [1] "Burning 4000 of 5000"
## [1] "Burning 5000 of 5000"
## [1] "Saving 100 of 1000"
## [1] "Saving 200 of 1000"
## [1] "Saving 300 of 1000"
## [1] "Saving 400 of 1000"
## [1] "Saving 500 of 1000"
## [1] "Saving 600 of 1000"
## [1] "Saving 700 of 1000"
## [1] "Saving 800 of 1000"
## [1] "Saving 900 of 1000"
## [1] "Saving 1000 of 1000"

BF