4

What's the difference between the Cartesian product in set theory and in relational algebra?

The Cartesian product in set theory is defined as:

$$A \times B = \{(a, b) \mid (a \in A) \land (b \in B)\}$$

I think this is exactly how it works in relational databases, but Wikipedia tries to make a difference that I don't understand:

$$ R \times S := \{ (r_1,r_2,\dots,r_n,s_1,s_2,\dots,s_m) \mid (r_1,r_2,\dots,r_n) \in R, (s_1,s_2,\dots,s_m) \in S \} $$

1 Answers1

2

The Wikipedia definition merely "opens the parentheses". Using the pure set-theoretic definition, we would expect $$ ((r_1,r_2,\dots,r_n),(s_1,s_2,\dots,s_m)) $$ instead of $$ (r_1,r_2,\dots,r_n,s_1,s_2,\dots,s_m). $$ Assuming $R,S$ are both relations over the same universe $U$, the Wikipedia definition ensures that $R \times S$ is an $(n+m)$-ary relation over $U$. Under the set-theoretic definition, we get a binary relation over $U^n$ when $n = m$, and a unary relation over $U^n \times U^m$ in general.

We often write $U^n \times U^m = U^{n+m}$, but this isn't in general an equality of sets (it might be in some special cases, depending on the exact definition of $U^n$). Rather, we identify $U^n \times U^m$ with $U^{n+m}$, that is, we ignore the difference (this informal notion can be formalized). The difference between $R \times S$ as sets and as relations is similar – we identify the two objects although they are "physically" different.

Yuval Filmus
  • 280,205
  • 27
  • 317
  • 514