This is the first post in a series on using copulas in Stan. Each post will be short so that I don’t keep postponing writing them. In this post I briefly introduce the series and give a quick primer on copulas.
Welcome to the first post in my series on copulas in Stan. After StanCon 2024 I was inspired to start writing short blog posts on this topic. I want to help get others started, and I also often don’t really understand how something works until I have to write about it or present it.
If you’ve ever felt intimidated by Sklar’s theorem or by how the Frank copula is defined, this series is for you.
At their core, copulas are functions that link univariate marginal distribution functions to form a multivariate distribution. According to Sklar’s theorem, any multivariate joint distribution can be expressed in terms of its marginals and a copula that captures the dependence structure between the variables. When the marginals are continuous, that copula is unique.
If we let \(X = (X_1, \dots, X_D)\) be a multivariate random variable with marginal distribution functions \(F_i\), the joint distribution function of \(X\) can be written

\[
H(X) = C\left(F_1(X_1), \dots, F_D(X_D)\right),
\]

where
\(H(X)\) is the joint cumulative distribution function (CDF) of the collection of random variables \(X\).
\(F_i(X_i)\) are the marginal CDFs of each variate.
\(C\) is the copula function.
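To make Sklar’s representation concrete, here is a small sketch in Python (used only so the example runs standalone) that builds a bivariate joint CDF from two exponential marginals and the Frank copula. The exponential marginals, the rates, the value of \(\theta\), and the helper names `frank_cdf`, `exp_cdf`, and `joint_cdf` are illustrative assumptions, not code from this series.

```python
import math

# Illustrative sketch: the Frank copula and exponential marginals are assumptions
# chosen for this example, not part of the original post.

def frank_cdf(u: float, v: float, theta: float) -> float:
    """Bivariate Frank copula C(u, v | theta) for theta != 0."""
    ratio = math.expm1(-theta * u) * math.expm1(-theta * v) / math.expm1(-theta)
    return -math.log1p(ratio) / theta


def exp_cdf(x: float, rate: float) -> float:
    """Exponential marginal CDF F(x) = 1 - exp(-rate * x), zero for x <= 0."""
    return -math.expm1(-rate * x) if x > 0 else 0.0


def joint_cdf(x1: float, x2: float, rate1: float, rate2: float, theta: float) -> float:
    """Sklar's representation: H(x1, x2) = C(F_1(x1), F_2(x2))."""
    return frank_cdf(exp_cdf(x1, rate1), exp_cdf(x2, rate2), theta)


h = joint_cdf(1.0, 0.5, rate1=1.0, rate2=2.0, theta=5.0)
print(0.0 <= h <= 1.0)  # True
# Sending x2 to infinity makes F_2(x2) = 1, and C(u, 1) = u recovers F_1(x1).
print(math.isclose(joint_cdf(1.0, math.inf, 1.0, 2.0, 5.0), exp_cdf(1.0, 1.0)))  # True
```

Because \(C(u, 1) = u\) holds for any copula, pushing one coordinate to infinity and recovering the other marginal is a handy sanity check for any copula implementation.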
Copulas as Densities
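Copulas are multivariate distribution functions for random variables with uniform marginal distributions, i.e. they are functions that map the unit cube \([0,1]^D\) to \([0,1]\). When the marginals are continuous and the copula is absolutely continuous, we can also describe it by its copula density \(c(u_1, \dots, u_D) = \partial^D C / \partial u_1 \cdots \partial u_D\). If \(H(X)\) is the CDF of \(X\), the multivariate distribution has a PDF \(h\), and the marginals have PDFs \(f_i\), then differentiating Sklar’s representation with the chain rule gives

\[
h(X) = c\left(F_1(X_1), \dots, F_D(X_D)\right) \prod_{i=1}^D f_i(X_i),
\]

or, on the log scale,

\[
\log h(X) = \log c\left(F_1(X_1), \dots, F_D(X_D)\right) + \sum_{i=1}^D \log f_i(X_i).
\]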
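One way to convince yourself of the density identity is to check it numerically: differentiate the joint CDF by finite differences and compare the result with \(c(F_1, F_2)\, f_1 f_2\). The sketch below does this for a hypothetical setup with a Frank copula, exponential marginals, and made-up parameter values. The closed-form Frank density is the standard mixed partial \(\partial^2 C / \partial u \, \partial v\), and all function names are illustrative.

```python
import math

# Hypothetical setup: Frank copula with exponential marginals (rates 1 and 2, theta = 5).

def frank_cdf(u: float, v: float, theta: float) -> float:
    """Bivariate Frank copula C(u, v | theta) for theta != 0."""
    ratio = math.expm1(-theta * u) * math.expm1(-theta * v) / math.expm1(-theta)
    return -math.log1p(ratio) / theta


def frank_log_density(u: float, v: float, theta: float) -> float:
    """log c(u, v | theta), where c is the mixed partial d^2 C / (du dv)."""
    g1 = math.expm1(-theta)
    denom = g1 + math.expm1(-theta * u) * math.expm1(-theta * v)
    return math.log(-theta * g1) - theta * (u + v) - 2.0 * math.log(abs(denom))


def exp_cdf(x: float, rate: float) -> float:
    return -math.expm1(-rate * x)


def exp_logpdf(x: float, rate: float) -> float:
    return math.log(rate) - rate * x


def log_joint_density(x1, x2, rate1, rate2, theta):
    # log h = log c(F_1(x1), F_2(x2)) + log f_1(x1) + log f_2(x2)
    u1, u2 = exp_cdf(x1, rate1), exp_cdf(x2, rate2)
    return frank_log_density(u1, u2, theta) + exp_logpdf(x1, rate1) + exp_logpdf(x2, rate2)


def joint_cdf(x1: float, x2: float) -> float:
    """H(x1, x2) = C(F_1(x1), F_2(x2)) with the fixed example parameters."""
    return frank_cdf(exp_cdf(x1, 1.0), exp_cdf(x2, 2.0), 5.0)


def mixed_partial(f, a: float, b: float, step: float = 1e-4) -> float:
    """Central finite-difference approximation of d^2 f / (da db)."""
    return (f(a + step, b + step) - f(a + step, b - step)
            - f(a - step, b + step) + f(a - step, b - step)) / (4.0 * step * step)


numeric = mixed_partial(joint_cdf, 1.0, 0.5)
analytic = math.exp(log_joint_density(1.0, 0.5, 1.0, 2.0, 5.0))
print(math.isclose(numeric, analytic, rel_tol=1e-4))  # True
```

The finite-difference derivative of the joint CDF and the "copula density times marginal densities" formula agree up to discretization error.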
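Notice that \(\sum_{i=1}^D \log f_i(X_i)\) is just the usual sum of marginal log-densities. Let’s rewrite the other term a little and explicitly write the parameters we’re conditioning on: \(\theta_i\) for the \(i\)-th marginal and \(\theta_c\) for the copula. With \(u_i = F_i(X_i \vert \theta_i)\), this gives

\[
\log h(X \vert \theta) = \log c\left(u_1, \dots, u_D \vert \theta_c\right) + \sum_{i=1}^D \log f_i(X_i \vert \theta_i), \qquad u_i = F_i(X_i \vert \theta_i).
\]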
The main differences when modeling with a copula are:
We need to use the CDFs \(F_i(X_i \vert \theta_i)\) as well as the PDFs \(f_i(X_i \vert \theta_i)\).
We need to code up some function \(\log c\left(u_1, \dots, u_D\vert \theta_c\right)\) that takes as input the data \(X\) after it’s been transformed to \([0,1]^D\) by our CDFs and outputs a log density.
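These two steps can be sketched as a generic log-likelihood function. To have something to check it against, this sketch pairs a bivariate Gaussian copula with normal marginals. It is an illustrative assumption, not the series’ Stan code. That particular combination is exactly a bivariate normal, so the copula-based log density should match the bivariate normal log density. `copula_log_lik`, `gaussian_copula_log_density`, and `bivariate_normal_logpdf` are hypothetical helper names. `statistics.NormalDist` is from the Python standard library (3.8+).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the inverse CDF in the copula


def gaussian_copula_log_density(u1: float, u2: float, rho: float) -> float:
    """log c(u1, u2 | rho) for the bivariate Gaussian copula."""
    z1, z2 = STD_NORMAL.inv_cdf(u1), STD_NORMAL.inv_cdf(u2)
    one_minus_r2 = 1.0 - rho * rho
    quad = rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2
    return -0.5 * math.log(one_minus_r2) - quad / (2.0 * one_minus_r2)


def copula_log_lik(x, marginals, log_c, theta_c):
    """Step 1: map x to u via the marginal CDFs. Step 2: add log c(u | theta_c)
    to the usual sum of marginal log-densities."""
    u = [m.cdf(xi) for m, xi in zip(marginals, x)]
    log_f = sum(math.log(m.pdf(xi)) for m, xi in zip(marginals, x))
    return log_c(*u, theta_c) + log_f


def bivariate_normal_logpdf(x1, x2, mu1, s1, mu2, s2, rho):
    """Closed-form bivariate normal log density, used only as a reference."""
    z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    one_minus_r2 = 1.0 - rho * rho
    q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / one_minus_r2
    return -math.log(2.0 * math.pi * s1 * s2 * math.sqrt(one_minus_r2)) - 0.5 * q


marginals = [NormalDist(1.0, 2.0), NormalDist(-0.5, 0.5)]
lhs = copula_log_lik((2.0, 0.0), marginals, gaussian_copula_log_density, 0.6)
rhs = bivariate_normal_logpdf(2.0, 0.0, 1.0, 2.0, -0.5, 0.5, 0.6)
print(math.isclose(lhs, rhs, rel_tol=1e-8))  # True
```

Swapping in a different `log_c` or different marginals changes the model without touching the structure of `copula_log_lik`. That separation is the main appeal of the copula approach.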
The Copula We All Use
The simplest copula is the independence copula, where we simply multiply together the uniform variates:

\[
C(u_1, \dots, u_D) = \prod_{i=1}^D u_i.
\]
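Its density is \(c(u_1, \dots, u_D) = 1\) on the whole unit cube, so \(\log c = 0\) and the joint log density reduces to the sum of the marginal log-densities. Any model that simply adds up marginal log-likelihoods is therefore implicitly using the independence copula. In this way, we all use copulas whether we want to or not!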
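A tiny sketch of this point: with the independence copula, the copula term contributes \(\log c = 0\), so a copula-based log-likelihood is numerically identical to summing marginal log-densities. The helper names and the normal marginals are illustrative assumptions.

```python
import math
from statistics import NormalDist


def independence_copula_cdf(u):
    """C(u_1, ..., u_D) = u_1 * ... * u_D."""
    return math.prod(u)


def independence_copula_log_density(u):
    """The mixed partial of prod(u) is 1 on the unit cube, so log c = 0."""
    return 0.0


def copula_log_lik(x, marginals, log_c):
    """log c(F_1(x_1), ..., F_D(x_D)) + sum_i log f_i(x_i)."""
    u = [m.cdf(xi) for m, xi in zip(marginals, x)]
    return log_c(u) + sum(math.log(m.pdf(xi)) for m, xi in zip(marginals, x))


marginals = [NormalDist(0.0, 1.0), NormalDist(3.0, 2.0), NormalDist(-1.0, 0.5)]
x = [0.4, 2.0, -1.3]
plain_sum = sum(math.log(m.pdf(xi)) for m, xi in zip(marginals, x))
print(copula_log_lik(x, marginals, independence_copula_log_density) == plain_sum)  # True
```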
An Imaginary Stan Model
The Stan code below is just meant to give an idea of what a barebones copula model might look like. In future posts I’ll write models based on this blueprint that implement different types of copulas.
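The code is basically a simple implementation of this equation from above:

\[
\log h(X \vert \theta) = \log c\left(F_1(X_1 \vert \theta_1), \dots, F_D(X_D \vert \theta_D) \vert \theta_c\right) + \sum_{i=1}^D \log f_i(X_i \vert \theta_i)
\]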