19

I'm looking for a Python package that implements multivariate linear regression.

(Terminological note: multivariate regression deals with the case where there is more than one dependent variable, while multiple regression deals with the case where there is one dependent variable but more than one independent variable.)

Franck Dernoncourt

2 Answers

14

You can still use sklearn.linear_model.LinearRegression. Simply make the output y a matrix with one column per dependent variable. If you want something non-linear, you can try different basis functions, use polynomial features, or use a different regression method (such as a neural network).
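As a minimal sketch of the suggestion above, the snippet below fits one LinearRegression model to a two-column target. The synthetic data and the names W_true and b_true are assumptions made up for illustration; they are not part of the original answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data (assumed for illustration only):
# 50 samples, 3 independent variables, 2 dependent variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
W_true = np.array([[1.0, -2.0],
                   [0.5, 0.0],
                   [3.0, 1.5]])      # shape (n_features, n_targets)
b_true = np.array([0.1, -0.3])       # one intercept per target
Y = X @ W_true + b_true              # shape (n_samples, n_targets)

# Passing a 2-D Y fits all dependent variables in a single call.
model = LinearRegression().fit(X, Y)
Y_pred = model.predict(X)            # shape (n_samples, n_targets)

print(model.coef_.shape)             # (n_targets, n_features)
```

Note that coef_ is stored with one row per dependent variable, so it is the transpose of W_true as laid out here.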

jamesmf
2

Just for fun, you can compute the polynomial features by hand by forming tuples $seq = (d_1, \dots, d_N)$ such that $Sum(seq) = \sum_{i=1}^{N} d_i \leq D$. Once you form those tuples, each entry indicates the power to which the corresponding raw feature should be raised. So, say, $(1,2,3)$ would map to the monomial $x_1 x_2^2 x_3^3$.

The code to get the tuples is:

def generate_all_tuples_for_monomials(N, D):
    """Return a dict mapping each degree d in 0..D to the list of
    length-N exponent sequences whose entries sum to d."""
    if D == 0:
        # The only degree-0 monomial is the constant term: all exponents zero.
        return {0: [N * [0]]}
    # S_all = {d: [seq_0, ..., seq_K]} for every degree d up to D - 1.
    S_all = generate_all_tuples_for_monomials(N, D - 1)
    S_D_current = []
    # Raise each degree-(D - 1) sequence by one in every position.
    for seq in S_all[D - 1]:
        for pos in range(N):
            seq_new = seq[:]
            seq_new[pos] += 1  # entries of seq_new now add up to D
            # Different parents can yield the same sequence; keep one copy.
            if seq_new not in S_D_current:
                S_D_current.append(seq_new)
    S_all[D] = S_D_current
    return S_all
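As a cross-check on the recursive function above, the same exponent tuples can be enumerated by brute force with itertools.product. This is only a sketch: the helper name exponent_tuples is hypothetical, and it does more work than the recursion because it scans all (D+1)^N candidates before filtering.

```python
from itertools import product

def exponent_tuples(N, D):
    """Group all length-N exponent tuples with total degree <= D by degree."""
    by_degree = {d: [] for d in range(D + 1)}
    for seq in product(range(D + 1), repeat=N):
        if sum(seq) <= D:
            by_degree[sum(seq)].append(seq)
    return by_degree

tuples_2_2 = exponent_tuples(2, 2)
print(tuples_2_2)
# {0: [(0, 0)], 1: [(0, 1), (1, 0)], 2: [(0, 2), (1, 1), (2, 0)]}
```

For N = 2 and D = 2 this gives the six monomials 1, x_2, x_1, x_2^2, x_1 x_2 and x_1^2. The recursive version returns the same sets as lists rather than tuples, possibly in a different order.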

Once the tuples are turned into a polynomial feature matrix X_poly, the regression itself is easy if you know linear algebra, for example:

c = pseudo_inverse(X_poly)*y

It is probably better to use regularized linear regression, though, if you're interested in generalization.
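The two steps above can be sketched with NumPy as follows. The helper poly_features, the synthetic data, and the ridge strength lam are all assumptions chosen for illustration; np.linalg.pinv plays the role of pseudo_inverse, and the ridge variant is one common form of regularized linear regression, not necessarily the one the answer had in mind.

```python
import numpy as np
from itertools import product

def poly_features(X, D):
    """Map each row of X to all monomials of total degree <= D (hypothetical helper)."""
    N = X.shape[1]
    exps = [e for e in product(range(D + 1), repeat=N) if sum(e) <= D]
    # Column j holds prod_i X[:, i] ** exps[j][i]
    return np.stack([np.prod(X ** np.array(e), axis=1) for e in exps], axis=1)

# Synthetic data: 2 raw features, 2 dependent variables (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 2))
Y = np.stack([1 + 2 * X[:, 0] * X[:, 1],
              X[:, 1] ** 2 - X[:, 0]], axis=1)

X_poly = poly_features(X, D=2)        # shape (40, 6)

# Ordinary least squares via the pseudo-inverse; Y may have several columns.
c = np.linalg.pinv(X_poly) @ Y        # shape (6, 2)

# Ridge (L2-regularized) variant: solve (X^T X + lam * I) c = X^T Y.
lam = 1e-3                            # arbitrary regularization strength
A = X_poly.T @ X_poly + lam * np.eye(X_poly.shape[1])
c_ridge = np.linalg.solve(A, X_poly.T @ Y)
```

Because y is allowed to be a matrix here, this covers the multivariate case from the question directly. This simple ridge form also penalizes the constant term; in practice lam would be chosen by validation.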


Acknowledgements to Yuval on the CS exchange for the help.