11

Consider the discrete case. The PMF gives the probability of each value that a random variable can take. For example, let X ~ Poisson(2). If I plot these probabilities (below), I can say that I am showing the PMF of X. But I can equally say that I am showing the distribution of X; for example, I can tell whether the distribution is symmetric or not. So what is the difference between the terms "probability distribution" and "PMF" (in the discrete case)? Below I also quote the definitions from Wikipedia, but they do not help either.

Many thanks!

[Figure: plot of the PMF of X ~ Poisson(2)]

A probability mass function (pmf) is a function that gives the probability that a discrete random variable is exactly equal to some value.

A probability distribution is a mathematical function that provides the probabilities of occurrence of different possible outcomes in an experiment.

MathAdam
  • 3,397
John
  • 435

4 Answers

9

The word "distribution" gets thrown around loosely sometimes, which can cause confusion.

The distribution of a random variable $X$ is the function that takes a set $S \subset \mathbb R$ as input and returns the number $P(X \in S)$ as output. (Technically I should assume that $S$ is a "nice" subset of $\mathbb R$ in some sense, but let's not worry about that.) I think the Wikipedia article would be more clear if it just gave us this definition up front.

The probability mass function (PMF) of a random variable $X$ is the function that takes a number $x \in \mathbb R$ as input and returns the number $P(X=x)$ as output. If $X$ is a discrete random variable, then the PMF of $X$ is a convenient way to specify the distribution of $X$.

Here is one way to describe the relationship between the distribution of $X$ and the PMF of $X$, in the case where $X$ is a discrete random variable. Suppose that the possible values of $X$ are $x_1, x_2, \ldots$. If $f$ is the distribution of $X$, then $$ f(S) = \sum_{i : x_i \in S} P(X = x_i) $$ for any set $S \subset \mathbb R$.
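The sum above can be sketched in a few lines of Python. This is only an illustration, not part of the definition: it uses the two-coin-flip example from the comments below (with $X$ the number of tails), and it represents the set $S$ by a membership test. The names `pmf`, `distribution` and `contains` are hypothetical, chosen here for the sketch.

```python
from fractions import Fraction

# PMF of X = number of tails in two fair coin flips.
# The PMF takes a single number x and returns P(X = x).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def distribution(contains):
    """Return P(X in S), where the set S is described by the test `contains`.

    This sums P(X = x_i) over the possible values x_i that lie in S.
    """
    return sum((p for x, p in pmf.items() if contains(x)), Fraction(0))


# S = the open interval (-7, 1.3); it contains the values 0 and 1.
print(distribution(lambda x: -7 < x < 1.3))  # 3/4
```

The point of the sketch is the difference in input type: `pmf` is indexed by a number, while `distribution` takes a whole set (here encoded as a predicate).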

littleO
  • 54,048
  • 1
    Thanks. But isn't the definition of the PMF exactly what you described as the "distribution of a random variable X"? If not, how are they different? – John Dec 31 '18 at 14:34
  • 1
    No, those definitions are not the same. For example, the PMF of $X$ takes a number as input, but the distribution of $X$ takes a set of numbers as input. – littleO Dec 31 '18 at 14:38
  • 1
    Thanks a lot for the clarification. But I am still a bit confused by your statement "the distribution of X takes a set of numbers as input". Suppose I flip a coin twice and X is the number of tails. My PMF is: 0: 1/4, 1: 1/2, 2: 1/4. So, for each value x of X (0, 1, 2), the PMF returns P(X=x). What would the distribution function look like? It would help if you could update your answer. Thanks! – John Dec 31 '18 at 14:51
  • 1
    Let $f$ be the distribution of the random variable $X$ that you mentioned. Then for example, if $S = (-7,1.3)$ (that is, $S$ is the open interval from $-7$ to $1.3$), then $f(S) = 3/4$. We could attempt to describe $f$ more explicitly but I'm not sure that it would be useful. The definition of $f$ is just that $f(S) = P(X \in S)$ for any set $S \subset \mathbb R$. – littleO Dec 31 '18 at 14:58
  • 1
    OK. I see. Thanks a lot. – John Dec 31 '18 at 15:03
  • @John Sure. I added a paragraph that describes the relationship between the distribution of $X$ and the PMF of $X$ a bit more explicitly (in the case where $X$ is a discrete random variable). – littleO Dec 31 '18 at 15:19
  • To the downvoter, I'd be genuinely interested in knowing the reason for the downvote; I might learn something. – littleO Dec 31 '18 at 15:36
  • Didn't downvote, but you probably mean "The distribution of a random variable X is the function that takes a set $S \subset \mathbb{R}$ as input and returns the number $P(X \in S)$".

    Also, first you define a PMF, and then you say that in the specific case of $X$ being discrete, it is a useful way of obtaining the distribution. This is misleading. The PMF is only defined for discrete random variables.

    – snar Dec 31 '18 at 15:55
  • Thank you @snarski. I just changed it to $X \in S$. Regarding the definition of the PMF, I believe that Blitzstein's probability book defines the PMF as I have here, so that the input can be any real number (but the output might be $0$). In this approach, technically any random variable has a PMF, but PMFs are only interesting / useful for discrete random variables. However, maybe that way of defining the PMF is not standard. – littleO Dec 31 '18 at 16:14
  • 1
    It's misleading because then people think that's how the probability density function is defined. – snar Dec 31 '18 at 22:12
4

I'm not aware of an agreed-upon definition of the term "probability distribution".

On the other hand, probability mass functions and probability density functions have agreed-upon definitions and are used to describe probability distributions.

A probability density function is the generalization of probability mass functions to random variables which are not strictly discrete. In the case of a discrete random variable, the main difference is that the probability density function should integrate to one, while the probability mass function should add to one.

Suppose $X$ is a discrete random variable taking values $S=\{x_1,x_2,\ldots\} \subset \mathbb{R}$.

The probability mass function is a function $p : S\to [0,1]$ where $$ p(x) = \mathbb{P}(X=x) $$

On the other hand, the density function (of any RV) can be thought of as $$ f(x)\,dx = \mathbb{P}(X\in[x,x+dx]) $$ In integral form you could write this as $$ \int_{x}^{x+dx} f(z)\,dz = \mathbb{P}(X\in [x,x+dx]) $$

That is, the density times the width of a small interval gives the probability that $X$ lies in that small interval $[x,x+dx]$.

If the random variable is discrete, then the probability that $X$ is in this interval is the same as the probability $X=x$ for small enough $dx$. So you have $f(x)dx = \mathbb{P}(X=x)$ (or in integral form, $\lim_{dx\to 0}\int_{x}^{x+dx} f(z)dz = \mathbb{P}(X=x)$).

In particular, if $p(x)$ is the pmf for a discrete random variable $X$, then we can write the density function as: $$ f(x) = \sum_{i:p(x_i)\neq 0} p(x_i) \delta(x-x_i) $$ where $\delta(x)$ is the delta distribution; i.e. $\int_a^b g(x)\delta(x-c)\,dx = g(c)$ for any continuous function $g$ whenever $c\in[a,b]$.
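As a rough consistency check (a sketch only, assuming the sum and the integral can be interchanged and that the delta property above is also applied at the endpoints of the interval), integrating this density over an interval $[a,b]$ recovers the sum of the pmf over the possible values in that interval: $$ \int_a^b f(x)\,dx = \sum_{i:p(x_i)\neq 0} p(x_i) \int_a^b \delta(x-x_i)\,dx = \sum_{i:x_i\in[a,b]} p(x_i) = \mathbb{P}(X\in[a,b]) $$ Letting the interval grow to all of $\mathbb{R}$ gives $\sum_i p(x_i) = 1$, so in this formulation the density integrates to one exactly when the pmf adds to one.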

  • 2
    Only a continuous random variable has a density function. Note that $\lim_{dx \to 0} \int_x^{x+dx} f(z) dz = 0$, so your final equation states that $P(X = x) = 0$. This is true for a continuous random variable but not for a discrete random variable. (Discrete random variables don't have density functions.) – littleO Dec 31 '18 at 15:23
  • 2
    You can write the density of discrete random variables using delta distributions. – overfull hbox Dec 31 '18 at 15:26
0

I provide a simple explanation of this here: Difference between "probability density function" and "probability distribution function"?. In short, a probability mass function is a discrete probability distribution function, where discrete is often implied.

0

Probability Mass Function.

I would say the pmf of a discrete random variable is a graph, a table, or a formula that specifies the proportions or probabilities associated with each possible value the random variable can take.

It is a function that gives the probability that a discrete random variable is exactly equal to some value.