Context:
I am trying to understand the differences between the GRU/LSTM cells in TensorFlow and PyTorch (for research reproducibility). I noticed that TensorFlow distinguishes between the kernel_initializer and the recurrent_initializer (see the documentation for GRU/LSTM), while PyTorch does not expose any such built-in initializer arguments (though you can overwrite the parameters manually for custom initialization; see the torch documentation for GRU/LSTM).
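For concreteness, this is the Keras API I mean; the initializer values shown are the documented defaults:

```python
import tensorflow as tf

# Keras exposes two separate initializer arguments
# (the values below are the documented defaults):
lstm = tf.keras.layers.LSTM(
    units=16,
    kernel_initializer="glorot_uniform",   # input -> hidden weights
    recurrent_initializer="orthogonal",    # hidden -> hidden weights
)
```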
Question:
What is the difference between the kernel weights and the recurrent kernel weights in LSTMs and GRUs? Why is the initialization of the two different (in TensorFlow), and how would I replicate such an initialization in PyTorch?
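To illustrate what I mean by "replicate", here is a sketch of what I imagine it might look like, assuming TensorFlow's documented defaults (glorot_uniform for the kernel, orthogonal for the recurrent kernel, zeros for biases) and PyTorch's weight_ih_*/weight_hh_* parameter naming:

```python
import torch
import torch.nn as nn

def tf_style_init_(rnn: nn.Module) -> None:
    """Re-initialize a PyTorch RNN module to mimic TF/Keras defaults (sketch)."""
    for name, param in rnn.named_parameters():
        if "weight_ih" in name:
            # kernel (input -> hidden): TF default is glorot_uniform
            nn.init.xavier_uniform_(param)
        elif "weight_hh" in name:
            # recurrent kernel (hidden -> hidden): TF default is orthogonal
            nn.init.orthogonal_(param)
        elif "bias" in name:
            # TF default is zeros (ignoring Keras's unit_forget_bias for LSTM)
            nn.init.zeros_(param)

lstm = nn.LSTM(input_size=8, hidden_size=16)
tf_style_init_(lstm)
```

I am not sure whether applying orthogonal_ to PyTorch's stacked (4*hidden_size, hidden_size) weight_hh matrix is exactly equivalent to Keras initializing its recurrent_kernel as a whole, or whether the different gate ordering between the frameworks matters here.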