This is the first post in what’s going to be a series on using copulas in Stan. Each post is going to be short to keep me from postponing writing them. In this post I lightly introduce the series and give a quick primer on copulas.
Welcome to the first post in my series on copulas in Stan. After StanCon 2024 I was inspired to start writing short blog posts about this, both to help others get started and because I often don’t really know how a thing works until I have to write about it or present it.
If you’ve ever felt intimidated by Sklar’s theorem or how the Frank copula is defined, just remember Arnold’s famous words from Predator
What Are Copulas?
At their core, copulas are functions that link univariate marginal distribution functions to form a multivariate distribution. According to Sklar’s Theorem, any multivariate joint distribution can be expressed in terms of its marginals and a copula that captures the dependence structure between variables.
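If we let $\mathbf{X} = (X_1, \dots, X_D)$ be a multivariate random variable with marginal distribution functions $F_1, \dots, F_D$, the joint distribution function of $\mathbf{X}$ can be written
$$H(\mathbf{X}) = C\left(F_1(X_1), \dots, F_D(X_D)\right),$$
where:
$H(\mathbf{X})$ is the joint cumulative distribution function (CDF) of the collection of random variables $X_1, \dots, X_D$.
$F_i(X_i)$ are the marginal CDFs of each variate.
$C$ is the copula function.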
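To make Sklar’s construction concrete, here is a minimal sketch in plain Python (standard library only, not Stan) that builds a bivariate joint CDF by plugging two marginal CDFs into a copula. The Clayton copula, the normal and exponential marginals, and every function name are assumptions chosen for illustration, not something taken from this series.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal marginal (illustrative choice)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def exponential_cdf(x, rate=1.0):
    """CDF of an exponential marginal (illustrative choice)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def clayton_copula(u, v, theta=2.0):
    """Bivariate Clayton copula CDF, valid here for theta > 0."""
    if u <= 0.0 or v <= 0.0:
        return 0.0  # a copula is zero whenever one argument is zero
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def joint_cdf(x1, x2, theta=2.0):
    """Joint CDF obtained by feeding the marginal CDFs into the copula."""
    return clayton_copula(normal_cdf(x1), exponential_cdf(x2), theta)
```

Because any copula returns `u` when its other argument is 1, pushing one variable far into its upper tail should recover the other marginal: `joint_cdf(0.3, 1e6)` agrees with `normal_cdf(0.3)` up to floating-point error.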
Copulas as Densities
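Copulas are multivariate distribution functions for random variables with uniform marginal distributions, i.e. they are functions that map the unit cube $[0, 1]^D$ to $[0, 1]$. They can also be described using copula density functions when the marginals are continuous. If $F_i$ is the CDF of $X_i$ with density $f_i$, and the multivariate distribution has a PDF, $h$, we write
$$h(\mathbf{X}) = c\left(F_1(X_1), \dots, F_D(X_D)\right) \prod_{i=1}^D f_i(X_i),$$
where $c$ is the density of the copula. Most often, we’d model the log of the PDF
$$\log h(\mathbf{X}) = \log c\left(F_1(X_1), \dots, F_D(X_D)\right) + \sum_{i=1}^D \log f_i(X_i).$$
Notice that $\sum_{i=1}^D \log f_i(X_i)$ is just the usual sum over marginal log-densities. Let’s rewrite the other term a little bit and explicitly write the parameters we’re conditioning on, with $\alpha$ for the copula parameters and $\beta_i$ for those of the $i$-th marginal:
$$\log c\left(u_1, \dots, u_D \mid \alpha\right), \qquad u_i = F_i(X_i \mid \beta_i).$$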
The main differences when modeling with a copula are:
We need to use the CDFs as well as the PDFs.
We need to code up some function that takes as input the data after it’s been transformed to $u_i = F_i(X_i)$ by our CDFs and outputs a density.
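The two requirements above can be sketched in plain Python (standard library only, not Stan). The bivariate Gaussian copula, the restriction to normal marginals via `statistics.NormalDist`, and the function names are all assumptions made for this illustration. The joint log-density is the copula log-density evaluated at the CDF-transformed data, plus the marginal log-densities.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used inside the copula


def gaussian_copula_logpdf(u1, u2, rho):
    """Log-density of a bivariate Gaussian copula with correlation rho.

    Inputs must lie strictly inside (0, 1), otherwise inv_cdf raises.
    """
    z1 = STD_NORMAL.inv_cdf(u1)
    z2 = STD_NORMAL.inv_cdf(u2)
    quad = (rho**2 * (z1**2 + z2**2) - 2.0 * rho * z1 * z2) / (
        2.0 * (1.0 - rho**2)
    )
    return -0.5 * math.log(1.0 - rho**2) - quad


def joint_logpdf(x1, x2, marginal1, marginal2, rho):
    """Copula log-density at the transformed data plus marginal log-densities."""
    # Step 1: the usual marginal log-densities (needs the PDFs).
    log_marginals = math.log(marginal1.pdf(x1)) + math.log(marginal2.pdf(x2))
    # Step 2: transform the data to the unit square (needs the CDFs).
    u1 = marginal1.cdf(x1)
    u2 = marginal2.cdf(x2)
    # Step 3: add the copula log-density evaluated at the transformed data.
    return gaussian_copula_logpdf(u1, u2, rho) + log_marginals
```

With standard normal marginals this construction should reproduce the ordinary bivariate normal log-density with correlation `rho`, which gives a convenient sanity check. A call such as `joint_logpdf(0.5, -1.2, NormalDist(1.0, 2.0), NormalDist(), 0.6)` shows how the marginals can differ while the copula stays the same.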
The Copula We All Use
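The simplest copula is the independence copula, where we simply multiply together the uniform variates:
$$C(u_1, \dots, u_D) = \prod_{i=1}^D u_i.$$
Its density is $c(u_1, \dots, u_D) = 1$ on the unit cube, so the $\log c$ term is zero. We see that if we use the independence copula, we just end up with the usual likelihood
$$\log h(\mathbf{X}) = \sum_{i=1}^D \log f_i(X_i).$$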
In this way, we all use copulas whether we want to or not!
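A quick numerical check of that claim, again as a hedged sketch in plain Python rather than Stan: the copula is passed in as a function, and plugging in the independence copula leaves only the sum of marginal log-densities. The helper names and the choice of normal marginals are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def independence_copula_cdf(us):
    """Independence copula: the product of the uniform variates."""
    return math.prod(us)


def independence_copula_logpdf(us):
    """Its density is 1 on the unit cube, so the log-density is 0."""
    return 0.0


def joint_logpdf(xs, marginals, copula_logpdf):
    """Copula log-density at the CDF-transformed data plus marginal terms."""
    us = [m.cdf(x) for m, x in zip(marginals, xs)]
    log_marginals = sum(math.log(m.pdf(x)) for m, x in zip(marginals, xs))
    return copula_logpdf(us) + log_marginals


xs = [0.2, 1.5, -0.7]
marginals = [NormalDist(0.0, 1.0), NormalDist(1.0, 2.0), NormalDist(-1.0, 0.5)]
usual_loglik = sum(math.log(m.pdf(x)) for m, x in zip(marginals, xs))
copula_loglik = joint_logpdf(xs, marginals, independence_copula_logpdf)
```

Here `copula_loglik` and `usual_loglik` coincide, since the copula term contributes exactly zero. Swapping in a different `copula_logpdf` is the only change needed to introduce dependence.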
An Imaginary Stan Model
The Stan code below is just to give an idea of what a barebones model that uses copulas might look like in Stan. In future posts I’ll write models that are based on this blueprint to implement different types of copulas.
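The code is basically a simple implementation of this equation from above:
$$\log h(\mathbf{X}) = \log c\left(u_1, \dots, u_D \mid \alpha\right) + \sum_{i=1}^D \log f_i(X_i \mid \beta_i), \qquad u_i = F_i(X_i \mid \beta_i),$$
except we’re allowing for $N$ realisations of the $D$-dimensional random variable $\mathbf{X}$, so the input data becomes an $N \times D$ matrix.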