Sum of dependent random variables and copulas

Question

I have two dependent continuous random variables (RVs) $X$ and $Y$ and I'm interested in determining the CDF of the sum, i.e., $F_{X+Y}(t) = \mathbb{P}(X+Y \leq t)$. I know the marginal densities of $X$ and $Y$ ($f_X(\cdot)$ and $f_Y(\cdot)$, respectively). Using the copula framework, I know that the joint density is given by $$ f(x,y) = c(F_X(x),F_Y(y)) \times f_X(x) f_Y(y),$$ where $c$ is the copula density. Thus, the CDF can be written as $$F_{X+Y}(t) = \int \int \mathbb{1}_{(x+y\leq t)} c(F_X(x),F_Y(y)) \times f_X(x) f_Y(y) dx dy,$$ where $\mathbb{1}_{(\cdot)}$ is the indicator function. At this stage, computing the above in closed form seems difficult even for the simplest case, so instead I can write $$F_{X+Y}(t) = \mathbb{E}[\mathbb{1}_{(X+Y\leq t)} c(F_X(X),F_Y(Y))],$$ where the expectation is taken over $X$ and $Y$ drawn independently from their marginals, and use Monte Carlo simulations to evaluate the above.

As context, $X$ and $Y$ are Rayleigh RVs. I know that they are correlated, but I don't have their joint distribution in analytical form (I only have access to samples from the joint distribution).

Provided that my argument is correct, I have the following questions:

How to choose the copulas $C$?
How to sample from $X$ and $Y$ when they are dependent? Thanks!

Your question "How to choose the copulas C?" is impossible to answer without knowing the specifics of your problem. It is exactly the same as asking: "I have a random variable $X$, how should I choose its distribution?" So please provide details: What is the specific problem you are trying to solve? What do you know about the joint distribution of $X$ and $Y$ and their dependence? — g g, Aug 18 '22 at 06:57
@gg: Regarding your question "What do you know about the joint distribution of X and Y and their dependence?", do you mean, for instance, whether $X$ and $Y$ have lower/upper tail dependence? Also, if I have samples from the joint distribution of $X$ and $Y$, would that help in characterizing the dependence and hence the choice of $C$? — Jeremy, Aug 18 '22 at 09:43
Yes, exactly. The more you can clarify, the better one can answer. Some context is also always helpful: what kind of distributions are we talking about, and in what kind of context do they occur? And how do you know the margins but not the joint distribution? — g g, Aug 18 '22 at 10:49
@gg: I don't have a specific context but I added an example of a context and an example of margins. — Jeremy, Aug 18 '22 at 12:32
@gg: maybe I'm missing a point here but if I have the joint distribution, I won't need copulas, right? I thought Sklar's Theorem provides a way to model the dependency in the absence of knowledge of the joint distribution. — Jeremy, Aug 18 '22 at 14:07

score 1 · Answer 1 · answered Apr 28 '23 at 14:13

Determining the distribution of sums of random variables is a popular topic in statistics, and even more so in finance and actuarial science. So when you are looking for literature, you can also check the literature in these fields.

There are two cases that you can consider: First, specific margins coupled with specific dependence structures, and second a general framework.

An example of the first case is the multivariate normal (or, more generally, multivariate elliptical distributions), where the distribution of sums can be computed. It is important to note that this is only possible for specific combinations of margins and dependence structures. Finding such combinations and calculating the resulting distribution of $X+Y$ is (or was) a popular topic in finance and actuarial science. If you want to go in this direction with your Rayleigh margins, you would have to try out different families of copulas to see in which cases you can solve the resulting double integral analytically (possibly taking inspiration from the literature on how to do this). As a result you would have a model $F_Z$ for $Z=X+Y$ that depends on the parameters of $X$, $Y$ and the chosen copula $C$, and you can then go on to estimate these parameters from your data. This approach is nice when it works, but since it is not always possible to find closed-form solutions, a numerical alternative is the second approach.

Concerning the second point, determining the distribution of $Z=X+Y$ in general for $(X,Y)$ with copula $C$ is discussed in:

Gijbels, I. and Herrmann, K. (2014). On the distribution of sums of random variables with copula-induced dependence. Insurance: Mathematics and Economics, 59, pp.27-44. (https://www.sciencedirect.com/science/article/pii/S0167668714000961)

Your integral representation is indeed the starting point of the paper, but you can actually transform the variables into $[0,1]^2$ and get rid of one of the integrals leaving you with a one-dimensional integral only. This integral can be computed efficiently and you can then again estimate the parameters of $F_Z$ based on your data.
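As an illustration of such a reduction, one standard way to write a one-dimensional integral (not necessarily the exact form used in the paper) is to condition on $U = F_X(X)$: $$F_{X+Y}(t) = \int_0^1 \frac{\partial C}{\partial u}\big(u, F_Y(t - F_X^{-1}(u))\big)\, du.$$ The sketch below evaluates this with a midpoint rule. The Gaussian copula with correlation `rho` and the Rayleigh scales `sigma_x`, `sigma_y` are assumptions chosen only for illustration, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def rayleigh_cdf(x, sigma):
    """CDF of a Rayleigh(sigma) variable; zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-x * x / (2.0 * sigma * sigma))

def rayleigh_ppf(u, sigma):
    """Quantile function (inverse CDF) of a Rayleigh(sigma) variable."""
    return sigma * math.sqrt(-2.0 * math.log1p(-u))

def gaussian_h(v, u, rho):
    """dC(u, v)/du = P(V <= v | U = u) for a Gaussian copula (assumed here)."""
    if v <= 0.0:
        return 0.0
    if v >= 1.0:
        return 1.0
    z_u = STD_NORMAL.inv_cdf(u)
    z_v = STD_NORMAL.inv_cdf(v)
    return STD_NORMAL.cdf((z_v - rho * z_u) / math.sqrt(1.0 - rho * rho))

def cdf_sum(t, sigma_x, sigma_y, rho, n_grid=4000):
    """Midpoint-rule approximation of the one-dimensional integral for F_{X+Y}(t)."""
    total = 0.0
    for k in range(n_grid):
        u = (k + 0.5) / n_grid                       # midpoint of the k-th cell in (0, 1)
        x = rayleigh_ppf(u, sigma_x)                 # x = F_X^{-1}(u)
        v = rayleigh_cdf(t - x, sigma_y)             # v = F_Y(t - x)
        total += gaussian_h(v, u, rho)
    return total / n_grid
```

For example, `cdf_sum(2.0, 1.0, 1.0, 0.5)` approximates $\mathbb{P}(X+Y \leq 2)$ under these assumptions. Swapping in another copula only requires replacing `gaussian_h` with that copula's conditional distribution.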

Now coming back to your specific questions:

Simulation from copulas is covered by the answer here (and you can use the copula package in R to run the simulations): How to sample from a copula?
How to choose the copula $C$ is more difficult. If you have access to a bivariate sample $(x_i,y_i)_{i=1}^{n}$ from $(X,Y)$, you can choose a parametric model and estimate its parameters, or you can use a non-parametric approximation (like the empirical copula) to get an estimated copula $\hat{C}$.
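Both points can be sketched in a few lines of plain Python. The example below assumes a Gaussian copula with Rayleigh margins purely for illustration; the function names and the scale parameters `sigma_x`, `sigma_y` are hypothetical. It draws dependent pairs by transforming correlated normals, estimates the copula correlation from data via Kendall's tau (using the relation $\rho = \sin(\pi\tau/2)$, which holds for the Gaussian copula), and estimates $F_{X+Y}(t)$ by the empirical CDF of the sums.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def sample_pairs(n, sigma_x, sigma_y, rho, seed=0):
    """Draw n pairs (x, y) with Rayleigh margins and an assumed Gaussian copula."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Map to uniforms; clamp away from 1 so the log below stays finite.
        u = min(STD_NORMAL.cdf(z1), 1.0 - 1e-12)
        v = min(STD_NORMAL.cdf(z2), 1.0 - 1e-12)
        # Apply the Rayleigh quantile function to each uniform.
        x = sigma_x * math.sqrt(-2.0 * math.log1p(-u))
        y = sigma_y * math.sqrt(-2.0 * math.log1p(-v))
        pairs.append((x, y))
    return pairs

def kendall_tau(pairs):
    """Plain O(n^2) Kendall's tau; fine for a few thousand points."""
    n = len(pairs)
    s = 0
    for i in range(n):
        xi, yi = pairs[i]
        for j in range(i + 1, n):
            xj, yj = pairs[j]
            prod = (xi - xj) * (yi - yj)
            s += (prod > 0) - (prod < 0)   # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))

def fit_gaussian_rho(pairs):
    """Estimate the Gaussian-copula correlation by inverting Kendall's tau."""
    return math.sin(math.pi * kendall_tau(pairs) / 2.0)

def empirical_cdf_sum(pairs, t):
    """Monte Carlo / empirical estimate of P(X + Y <= t)."""
    return sum(1 for x, y in pairs if x + y <= t) / len(pairs)
```

Note that if you already have samples from the joint distribution, `empirical_cdf_sum` applied directly to those samples estimates $F_{X+Y}(t)$ without choosing any copula; a fitted copula is mainly useful when you want a parametric model or need to simulate further data.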
