Questions tagged [linear-algebra]

A field of mathematics concerned with the study of finite-dimensional vector spaces, including matrices and their manipulation, which are important in statistics.

Overview

Linear algebra is the field of mathematics concerned with the study of finite-dimensional vector spaces. Matrices and their manipulation, which form the algorithmic part of linear algebra, are particularly important in statistics.


87 questions
14 votes, 3 answers

How does tensor product/multiplication work in TensorFlow?

In TensorFlow, I saw the following example: import tensorflow as tf import numpy as np mat_a = tf.constant(np.arange(1,13, dtype=np.int32), shape=[2,2,3]) mat_b = tf.constant(np.arange(12,24, dtype=np.int32), shape=[2,3,2]) mul_c =…
frt132
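
The snippet in the question is cut off; below is a minimal runnable sketch of what it most likely builds (the completion with tf.matmul is our assumption, not part of the question). tf.matmul treats every dimension before the last two as a batch dimension, so shapes [2,2,3] and [2,3,2] multiply to [2,2,2].

    import numpy as np
    import tensorflow as tf

    # Same constants as in the question.
    mat_a = tf.constant(np.arange(1, 13, dtype=np.int32), shape=[2, 2, 3])
    mat_b = tf.constant(np.arange(12, 24, dtype=np.int32), shape=[2, 3, 2])

    # Assumed completion of the truncated line: a batched matrix product.
    # The leading dimension (2) is the batch; each 2x3 block multiplies the
    # corresponding 3x2 block, giving output shape [2, 2, 2].
    mul_c = tf.matmul(mat_a, mat_b)
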
12 votes, 1 answer

Finding linear transformation under which distance matrices are similar

I have $n$ sets of vectors, where each set $S_i$ contains $k$ vectors in $\mathbb{R}^d$. I know there is some unknown linear transformation $W$ under which the distance matrix $D_i$ (a $k\times k$ matrix) is approximately "the same" (i.e. has a low…
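
To make the setup concrete, here is a small sketch (the function name and shapes are our assumptions, not from the question) of the distance matrix $D_i$ of one set after applying a linear map $W$:

    import numpy as np

    def distance_matrix(S, W):
        """Pairwise Euclidean distances between the k vectors in S
        (a k x d array) after applying the linear map W (d x d)."""
        T = S @ W.T                            # transformed vectors, k x d
        diff = T[:, None, :] - T[None, :, :]   # k x k x d differences
        return np.linalg.norm(diff, axis=-1)   # the k x k matrix D_i
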
11 votes, 2 answers

What is the use of additional column of 1s in normal equation?

Currently I am going through the normal equation in machine learning. $$ \hat\theta = (X^T \cdot X)^{-1} \cdot X^T \cdot y $$ But when I see how this equation is used, I find they always add an additional column of 1s at the start of the matrix X…
Inquisitive
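
The column of 1s gives the model an intercept: with it, the first entry of $\hat\theta$ multiplies the constant 1, so $X\theta = \theta_0 + \theta_1 x_1 + \dots$ A minimal sketch with made-up data:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 2))
    y = 3.0 + x @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=100)

    X = np.hstack([np.ones((100, 1)), x])      # prepend the column of 1s
    theta = np.linalg.solve(X.T @ X, X.T @ y)  # the normal equation
    print(theta)                               # ~[3.0, 1.5, -2.0]; intercept first
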
9 votes, 1 answer

Deriving backpropagation equations "natively" in tensor form

The image shows a typical layer somewhere in a feed-forward network: $a_i^{(k)}$ is the activation value of the $i^{th}$ neuron in the $k^{th}$ layer. $W_{ij}^{(k)}$ is the weight connecting the $i^{th}$ neuron in the $k^{th}$ layer to the $j^{th}$ neuron…
Neil Slater
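
With the indexing above ($W_{ij}^{(k)}$ linking neuron $i$ in layer $k$ to neuron $j$ in layer $k+1$), one layer of backpropagation can be written index-free as an outer product plus a matrix-vector product. A sketch under those assumptions:

    import numpy as np

    def dense_backward(a_prev, W, delta_next, dphi):
        """One layer of backprop in tensor (index-free) form.
        a_prev:     activations a^(k), shape (n_k,)
        W:          weights W^(k), shape (n_k, n_{k+1}), W[i, j] links i -> j
        delta_next: dL/dz for layer k+1, shape (n_{k+1},)
        dphi:       activation derivative at layer k, shape (n_k,)"""
        dW = np.outer(a_prev, delta_next)   # dL/dW^(k), same shape as W
        delta = (W @ delta_next) * dphi     # dL/dz for layer k
        return dW, delta
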
5 votes, 1 answer

Closed form solution of linear regression via least squares using matrix derivatives

How is the closed-form solution to linear regression derived using matrix derivatives, as opposed to the trace method Andrew Ng uses in his machine learning lectures? Specifically, I am trying to understand how Nando de Freitas does it…
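
For reference, the standard matrix-calculus derivation (not a transcript of either lecture) runs: $$ J(\theta) = \|X\theta - y\|^2 = \theta^T X^T X \theta - 2 y^T X \theta + y^T y $$ $$ \nabla_\theta J = 2 X^T X \theta - 2 X^T y = 0 \quad\Longrightarrow\quad \hat\theta = (X^T X)^{-1} X^T y. $$
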
5 votes, 0 answers

How is image convolution actually implemented in deep learning libraries using simple linear algebra?

As a clarifier, I want to implement cross-correlation, but the machine learning literature keeps referring to it as convolution so I will stick with it. I am trying to implement image convolution using linear algebra. After looking around on the…
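
One common answer is the im2col trick (a sketch of the idea; production libraries use more elaborate variants): unroll every receptive field into a column, after which cross-correlation is a single matrix product.

    import numpy as np

    def im2col(img, kh, kw):
        """img: (H, W) single-channel image; returns (kh*kw, n_patches)."""
        H, W = img.shape
        cols = [img[i:i + kh, j:j + kw].reshape(-1)
                for i in range(H - kh + 1)
                for j in range(W - kw + 1)]
        return np.stack(cols, axis=1)

    img = np.arange(16.0).reshape(4, 4)
    kernel = np.ones((2, 2))                         # 2x2 kernel
    out = kernel.reshape(1, -1) @ im2col(img, 2, 2)  # one matmul
    out = out.reshape(3, 3)                          # (H-kh+1, W-kw+1)
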
4 votes, 2 answers

How do we define a linearly separable problem?

When we talk about perceptrons, we say that they are limited to approximating functions that are linearly separable, while neural networks that use non-linear transformations are not. I am having trouble understanding this idea of linear…
Stefan Radonjic
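
For reference, the standard definition: two sets $A, B \subset \mathbb{R}^d$ are linearly separable if some hyperplane puts them on opposite sides, i.e. there exist $w \in \mathbb{R}^d$ and $b \in \mathbb{R}$ with $$ w^T x + b > 0 \;\; \text{for all } x \in A, \qquad w^T x + b < 0 \;\; \text{for all } x \in B. $$ A classification problem is called linearly separable when its classes are.
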
4 votes, 1 answer

Mathematical formulation of Support Vector Machines?

I'm trying to learn the maths behind SVMs (hard margin), but due to the different mathematical formulations I'm a bit confused. Assume we have two sets of points $\text{(i.e. positives, negatives)}$, one on each side of a hyperplane $\pi$. So the…
Anjith
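
For reference, one of the equivalent formulations the question is contrasting, the hard-margin primal: $$ \min_{w,\, b} \; \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i \left( w^T x_i + b \right) \ge 1, \quad i = 1, \dots, n, $$ where $y_i \in \{-1, +1\}$ labels the side of the hyperplane; the constraints keep every point outside the margin, and minimizing $\|w\|$ maximizes the margin width $2/\|w\|$.
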
4 votes, 1 answer

How can positional encodings including a sine operation be linearly transformable for any offset?

In the paper "Attention is all you need", the authors add a positional encoding to each token in the sequence (section 3.5). The following encoding is chosen: $PE(pos, 2dim) = \sin(pos / 10000^{2dim/d_{model}})$ $PE(pos, 2dim+1) = \cos(pos /…
Stephan Heijl
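
For reference, the property follows from the angle-addition identities. Writing $\omega = 1/10000^{2dim/d_{model}}$ for one sine–cosine pair, for any fixed offset $k$ $$ \begin{pmatrix} \sin(\omega(pos+k)) \\ \cos(\omega(pos+k)) \end{pmatrix} = \begin{pmatrix} \cos(\omega k) & \sin(\omega k) \\ -\sin(\omega k) & \cos(\omega k) \end{pmatrix} \begin{pmatrix} \sin(\omega \, pos) \\ \cos(\omega \, pos) \end{pmatrix}, $$ so the encoding at $pos + k$ is a rotation of the encoding at $pos$: a linear map that depends only on $k$, not on $pos$.
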
4 votes, 2 answers

What type of technique can be used to solve this question?

Apologies for the ambiguous title; I do not know the term. I have data on some products with a few variables: origin, weight, brand. For example: Product A = "China, 100g, Brand X" Product B = "Japan, 50g, Brand Y" Product C = "China, 30g,…
lpounng
3 votes, 2 answers

Backpropagation with a different sized training set?

I'm trying to create an NN whose input is a (length $m$) array of 3d vectors $$\vec{x}_i = [x_{i,1},x_{i,2},x_{i,3}], \hspace{5mm}i=1:m $$ and whose output is a similarly sized array: $$\vec{h}_{\theta,i} = [h_{\theta,i1},h_{\theta,i2},h_{\theta,i3}],…
3 votes, 1 answer

How to incorporate the uncertainty of the model coefficients in the prediction interval of a multiple linear regression

I'm dealing with modeling small experimental data sets. As most experimental work does not generate thousands of samples, but rather a handful, I need to be inventive about how to deal with this small number of data sets (say 10-20). I've been…
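
For reference, under the usual Gaussian-noise assumptions of ordinary least squares, the prediction interval for a new input $x_0$ (with a leading 1 for the intercept) is $$ \hat y_0 \pm t_{n-p,\,1-\alpha/2} \; \hat\sigma \sqrt{1 + x_0^T (X^T X)^{-1} x_0}, $$ where the $x_0^T (X^T X)^{-1} x_0$ term carries the coefficient uncertainty the question asks about and the leading 1 the irreducible noise; with only 10-20 samples, both the $t$ quantile and the coefficient term widen the interval considerably.
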
3 votes, 0 answers

Possible flaw in the MDS method for dimensionality reduction

The MDS (multidimensional scaling) method is used to solve the problem of dimensionality reduction. Basically, it does the following: given $n$ points $x_1,\cdots,x_n\in\mathbb R^d$, try to find a smaller $d'$ and points $y_1,\cdots,y_n\in\mathbb…
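
For concreteness, a sketch of classical MDS (one common variant of the method the question describes): double-center the squared distance matrix, then embed with the top eigenvectors.

    import numpy as np

    def classical_mds(D, d_new):
        """D: (n, n) Euclidean distance matrix; returns an (n, d_new)
        embedding whose pairwise distances approximate D."""
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
        B = -0.5 * J @ (D ** 2) @ J           # Gram matrix of centered points
        w, V = np.linalg.eigh(B)              # eigenvalues, ascending
        w, V = w[::-1][:d_new], V[:, ::-1][:, :d_new]
        return V * np.sqrt(np.maximum(w, 0))  # points y_1..y_n as rows
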
3 votes, 1 answer

Why in this case are gradient steps not perpendicular to contour lines?

There is a theorem that the gradient at a point is perpendicular to the tangent of the contour line through that point. Why does this picture seem not to respect that rule? source: http://www.deeplearningbook.org/contents/numerical.html page 89 And…
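
For reference, the theorem the question cites: if $r(t)$ parametrizes a contour line of $f$, then $f(r(t)) = c$ for all $t$, and differentiating gives $$ \frac{d}{dt} f(r(t)) = \nabla f(r(t)) \cdot r'(t) = 0, $$ so the gradient is orthogonal to the contour's tangent. Note also that a plot whose axes use different scales does not preserve right angles, so orthogonal vectors can look oblique on the page.
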
3 votes, 1 answer

PCA formulation - Deep Learning book by Ian Goodfellow

I am reading the deep learning book by Ian Goodfellow. In the PCA formulation in the first chapter, i.e. Linear Algebra, he mentions the following: we need to choose the encoding matrix D. To do so, we revisit the idea of minimizing the L2 distance…
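
For reference, the optimization being set up there (paraphrased from the book's linear algebra chapter): $D \in \mathbb{R}^{n \times l}$ is constrained to have orthonormal columns ($D^T D = I_l$), the code of a point is $c = D^T x$ and its reconstruction is $Dc$, so one chooses $$ D^* = \arg\min_D \sqrt{ \sum_{i,j} \left( x_j^{(i)} - \left( D D^T x^{(i)} \right)_j \right)^2 } \quad \text{subject to } D^T D = I_l. $$
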