I've actually spent many sleepless nights thinking about this. Let me try to give some sort of an answer.
Suppose we assume the model $Y = X\beta + \epsilon$, where $X$ is an $N\times(p+1)$ data matrix whose entries we view as fixed, and $\epsilon \sim \mathcal N(0,\sigma^2 I)$. The question is how to interpret this model. Suppose you are an experimenter, and you want to study the relationship between the height of a plant and how much water it is given. You have $N$ plants, and you give the $j$th plant $x_j$ liters of water each day; after 30 days, you record the height $y_j$. If $y_j$ is well modeled by a normal distribution with mean $\beta_0+\beta_1x_j$ and variance $\sigma^2$, then this model is appropriate. Moreover, since you chose what $X$ is, it makes sense to treat $X$ as fixed.
Now the question is what exactly can we do with this assumption? First, we use the OLS estimator $\hat \beta = (X^TX)^{-1}X^Ty$, and from our assumptions, it follows that $\hat \beta \sim \mathcal N(\beta,\sigma^2(X^TX)^{-1})$, and it is common to take the estimator $\hat \sigma^2 = \frac{1}{N-p-1}\sum(y_j-\hat y_j)^2$.
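To make this concrete, here is a minimal sketch in Python/NumPy of the fixed-design setup: the water amounts are chosen by us (so $X$ is fixed), only $\epsilon$ is random, and we compute $\hat \beta$ and $\hat\sigma^2$ exactly as above. The sample size, true coefficients, and noise level are all made up for illustration.

```python
# Fixed-design sketch: X is chosen by the experimenter, only epsilon is random.
import numpy as np

rng = np.random.default_rng(0)

N = 30
water = np.repeat([0.1, 0.2, 0.3], N // 3)       # amounts we chose, hence "fixed"
X = np.column_stack([np.ones(N), water])          # N x (p+1) design matrix, p = 1

beta_true = np.array([5.0, 10.0])                 # (beta_0, beta_1), unknown in practice
sigma_true = 1.5                                  # illustrative noise level

eps = rng.normal(0.0, sigma_true, size=N)         # eps ~ N(0, sigma^2 I)
y = X @ beta_true + eps                           # heights after 30 days

# OLS estimate and the usual unbiased estimate of sigma^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
p = X.shape[1] - 1
sigma2_hat = resid @ resid / (N - p - 1)

print(beta_hat, sigma2_hat)
```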
There are two things we can do with this information: (1) we can make inferences about $\beta$, and (2) we can make predictions for future plant heights. Let's look at both.
To do inference, the most basic thing one can do is test $H_0: \beta_0 = 0$ or $H_0: \beta_1 = 0$ or $H_0 : \beta_0=\beta_1 = 0$. Hopefully it is clear how to interpret these. We could also construct confidence intervals for the components of $\beta$.
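As a quick illustration (continuing the sketch above, with made-up numbers), a $t$-test of $H_0:\beta_1=0$ and a 95% confidence interval for $\beta_1$ follow directly from $\hat \beta \sim \mathcal N(\beta,\sigma^2(X^TX)^{-1})$, plugging in $\hat\sigma^2$:

```python
# t-test of H0: beta_1 = 0 and a 95% CI for beta_1, using the variables above.
from scipy import stats

XtX_inv = np.linalg.inv(X.T @ X)
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))       # standard errors of beta_hat

df = N - p - 1
t_stat = beta_hat[1] / se[1]                      # test statistic for H0: beta_1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df)

t_crit = stats.t.ppf(0.975, df)
ci_beta1 = (beta_hat[1] - t_crit * se[1], beta_hat[1] + t_crit * se[1])

print(t_stat, p_value, ci_beta1)
```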
To do prediction, suppose we were to give another plant $x$ liters of water every day. If $x$ matches one of the entries of the second column of $X$, then we can clearly construct a confidence interval for the mean height after 30 days (or a prediction interval for that plant's actual height). What if $x$ is not an entry of the second column of $X$? Say the amounts of water we gave were in $\{0.1,0.2,0.3\}$ and $x \notin \{0.1,0.2,0.3\}$. If $0.1\leq x\leq 0.3$, then it seems reasonable to assume that $y = \beta_0+\beta_1x+\epsilon$ still holds, so prediction is sensible. If $x\gg 0.3$ or $x\ll 0.1$, then such a model is probably no longer reasonable, and prediction would not be appropriate.
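Here is what that looks like in code, again continuing the sketch with an illustrative $x = 0.25$ inside the observed range: an interval for the mean height at $x$, and a wider prediction interval for a single new plant, which also accounts for the noise $\epsilon$:

```python
# Prediction at a new water amount x within the observed range [0.1, 0.3].
x_new = 0.25
x0 = np.array([1.0, x_new])

y_hat = x0 @ beta_hat
se_mean = np.sqrt(sigma2_hat * x0 @ XtX_inv @ x0)           # CI for the mean height
se_pred = np.sqrt(sigma2_hat * (1 + x0 @ XtX_inv @ x0))     # PI for one new plant

ci_mean = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
pi_new = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)

print(y_hat, ci_mean, pi_new)
```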
Now, this model is for experimental data. But if we have observational data, then it is appropriate to assume that $(X,Y)$ come from some joint distribution (here I am using $X$ to denote both the random variable and the data matrix with a column of 1's; hopefully it is clear from context which one I mean). We can always write $Y = \mathbb E(Y\mid X)+\epsilon$. Here we assume $\mathbb E(Y\mid X) = \beta_0+\beta^T X$, and we further assume that $\epsilon \mid X \sim \mathcal N(0,\sigma^2)$, where $\sigma^2$ is not a function of $X$. Then, again, we use the OLS estimator $\hat \beta = (X^TX)^{-1}X^Ty$, and it follows that $\hat \beta\mid X \sim \mathcal N(\beta, \sigma^2(X^TX)^{-1})$. (Note that $X$ on the left means something different than $X$ on the right, sorry!) How should we interpret this statement? If you were to observe the exact same data matrix $X$ over and over again, the outcomes $y$ would differ each time, because not all of the variability in the outcome is explained by the covariates. Hence the estimate $\hat \beta$ would differ each time, and its distribution across those hypothetical replications is exactly the one given above. You can do inference and prediction as before, and the interpretation follows the same kind of reasoning.
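Finally, a small self-contained simulation of that "same $X$, new $y$ each time" interpretation: draw the covariates once from some distribution (uniform here, purely for illustration), hold them fixed, redraw the noise many times, and compare the empirical spread of $\hat \beta$ with $\sigma^2(X^TX)^{-1}$:

```python
# Random covariates drawn once, then held fixed while the noise is resampled;
# the spread of beta_hat should match sigma^2 (X^T X)^{-1}.
import numpy as np

rng = np.random.default_rng(1)

N = 200
x_obs = rng.uniform(0.0, 1.0, size=N)             # covariate, itself random here
X = np.column_stack([np.ones(N), x_obs])          # but held fixed across replications

beta_true = np.array([2.0, -3.0])
sigma_true = 0.5

reps = 5000
beta_hats = np.empty((reps, 2))
for r in range(reps):
    y = X @ beta_true + rng.normal(0.0, sigma_true, size=N)
    beta_hats[r], *_ = np.linalg.lstsq(X, y, rcond=None)

# Empirical covariance of beta_hat vs the theoretical sigma^2 (X^T X)^{-1}
print(np.cov(beta_hats.T))
print(sigma_true**2 * np.linalg.inv(X.T @ X))
```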