Linear layers (or Dense layers) are to Deep Learning what rockets are to spaceships: essential. Most, if not all, models I’ve seen to date use them in one capacity or another. But can the proverbial wheel be improved on? In this article, we’ll look at a simple method to put your Linear layers on “steroids”.

What happens in a Linear layer is we take all of the data as a matrix (or tensor), multiply it by a matrix of trained weights, then add a vector of trained biases. The number of operations in that matrix multiplication grows quadratically with layer width…
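The forward pass described above can be sketched in a few lines of NumPy (the function and variable names here are illustrative, not tied to any particular framework):

```python
import numpy as np

def linear_forward(x, W, b):
    """Linear (dense) layer: y = x @ W + b."""
    return x @ W + b

# A batch of 4 inputs with 3 features each, mapped to 2 outputs.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 2))   # trained weights
b = rng.standard_normal(2)        # trained biases
y = linear_forward(x, W, b)
print(y.shape)  # (4, 2)
```

Mapping an input of width *n* to an output of width *m* costs roughly *n·m* multiply-adds per example, which is why doubling the width of two adjacent layers quadruples the work between them.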

What about the biases? Shouldn't the r, z, and h biases be added before applying activation?

I am aware there is an alternative form that eliminates the input weights from r and z and, with them, the biases for those two gates, but I am not familiar with any GRU variant that eliminates all of the biases.
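For reference, in the standard GRU formulation the r, z, and h biases are indeed added inside the activations. A minimal NumPy sketch of one GRU step (variable names are mine; gate conventions vary slightly between papers and frameworks):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_cell(x, h, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One GRU step; every bias is added *inside* its activation."""
    z = sigmoid(x @ Wz + h @ Uz + bz)              # update gate
    r = sigmoid(x @ Wr + h @ Ur + br)              # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh + bh)  # candidate state
    return (1 - z) * h + z * h_tilde               # interpolate old/new

rng = np.random.default_rng(1)
n_in, n_hid = 3, 4
x = rng.standard_normal(n_in)
h = np.zeros(n_hid)
params = [rng.standard_normal(s) for s in
          [(n_in, n_hid), (n_hid, n_hid), n_hid] * 3]
h_next = gru_cell(x, h, *params)
print(h_next.shape)  # (4,)
```

Dropping bz or br would remove a learned offset from each gate's pre-activation, which changes what the gates can express; that is different from the reparameterized variants that merely fold some weight matrices together.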

- Training a Machine Learning model can take a long time.
- There is the problem of dead neurons: a model may have started out with too many neurons and/or layers, but there is no efficient way to know this in advance or to remove the extra parameters later.
- Sometimes a model plateaus below the accuracy we want, leaving no way to improve further without restarting training on a new, larger model.

What if we could start from a smaller neural network model and grow it to the size of the problem during the training process?
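One way to grow a network mid-training without throwing away what it has learned is function-preserving widening, in the spirit of Net2Net (Chen et al., 2016): duplicate a hidden unit's incoming weights and split its outgoing weights in half, so the widened network computes exactly the same function and training simply continues. A hedged NumPy sketch (the helper name and layer shapes are illustrative):

```python
import numpy as np

def widen_layer(W1, b1, W2, idx):
    """Duplicate hidden unit `idx`: copy its incoming weights/bias,
    then split its outgoing weights so the overall function is preserved."""
    W1_new = np.hstack([W1, W1[:, idx:idx + 1]])  # new column = copy
    b1_new = np.append(b1, b1[idx])
    W2_new = np.vstack([W2, W2[idx:idx + 1, :]])  # new row = copy
    W2_new[idx, :] *= 0.5                         # halve both copies so
    W2_new[-1, :] *= 0.5                          # their sum is unchanged
    return W1_new, b1_new, W2_new

rng = np.random.default_rng(2)
W1, b1 = rng.standard_normal((3, 4)), rng.standard_normal(4)
W2 = rng.standard_normal((4, 2))
x = rng.standard_normal((5, 3))
relu = lambda a: np.maximum(a, 0)

y_before = relu(x @ W1 + b1) @ W2
W1w, b1w, W2w = widen_layer(W1, b1, W2, idx=1)
y_after = relu(x @ W1w + b1w) @ W2w
print(np.allclose(y_before, y_after))  # True
```

In practice a little noise is usually added to the duplicated weights so the two copies can diverge during subsequent training; without it, symmetric gradients would keep them identical.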

Suppose we could…

If you’ve ever trained a GAN (Generative Adversarial Network) or an image classification neural network, you know just how data-hungry it can be, especially when you want to work with higher-resolution images. The other day, I tried running a GAN on a couple hundred images. At a resolution of just 256x256, the compute time was close to an hour. And I’ve got a fairly decent graphics card: a GeForce RTX 2070 Super.

So that got me thinking: How in the world do animals process so much visual information on the go? …

Neural Network Enthusiast