Beyond Wx + b in TensorFlow

Many tutorials I have seen center on using TensorFlow for classification. Most of these tutorials use some variant of logistic regression, strengthening the model by adding more layers or introducing techniques like dropout or convolutions. This post documents my journey from Y = Wx + b to any other cool algorithm I can find. I will also be taking on the problem I documented in A curious case of a hairy flashdrive.
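For reference, the Y = Wx + b baseline that everything here builds on can be sketched as a plain affine map. This is a minimal illustration in NumPy; the shapes and values are made up for the example.

```python
import numpy as np

# The baseline linear model: y = Wx + b.
# Every dense layer in a network computes exactly this
# (usually followed by a nonlinearity).
W = np.array([[2.0, 0.0],
              [0.0, 3.0]])   # weight matrix, shape (2, 2) -- illustrative values
b = np.array([1.0, -1.0])    # bias vector, shape (2,)
x = np.array([1.0, 1.0])     # input vector, shape (2,)

y = W @ x + b                # the affine transformation
print(y)
```

Stacking several of these maps with nonlinearities in between is what the "adding more layers" approach amounts to.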

Where does it all start?

This story starts with reading the writings of Sebastian Ruder, a natural language processing PhD student. His articles explore many concepts and I find them quite illuminating; I recommend you take a look.