What about the biases? Shouldn't the r, z, and h biases be added before applying activation?

I am aware there is an alternative form that eliminates the input weights from r and z (and, with them, the biases for those two gates), but I am not familiar with any GRU variant that eliminates all of the biases.

https://en.wikipedia.org/wiki/Gated_recurrent_unit
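For reference, in the fully gated form given on that Wikipedia page, each bias is added inside the activation. A minimal NumPy sketch of one step (the variable names here are mine for illustration, not from any particular library):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wr, Ur, br, Wz, Uz, bz, Wh, Uh, bh):
    """One step of the standard (fully gated) GRU.
    Note each bias (br, bz, bh) is added BEFORE the activation."""
    r = sigmoid(Wr @ x + Ur @ h_prev + br)             # reset gate
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)             # update gate
    h_hat = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)   # candidate state
    return (1.0 - z) * h_prev + z * h_hat              # new hidden state
```

(Note that conventions differ on whether z or 1 - z weights the candidate; the bias placement is the same either way.)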
