Machine Learning Study Group – Week 2 & 3 Recap

Hello World,

So this is another recap from our study group covering the Andrew Ng course on Coursera. Let's start with a quick summary of the two weeks. Week 1 of the course was all about the introduction to linear regression and gradient descent; there were no assignments due. Week 2 was all about multivariate linear regression, normalization, and a few other topics; there was a coding assignment as well as a quiz due for week 2.

During our study group sessions for each of these weeks, a few common themes came up.

  1. Vectorization of algorithms was confusing.
  2. Derivatives and partial derivatives were confusing.
  3. Converting mathematical notation to real code was confusing.
  4. The difference between cost functions and optimization functions was confusing.
  5. Normalization was confusing.

I'm going to cover each of these at a high level in this article, and then create separate articles where necessary, linking back here as they are written.

Vectorization

This is a difficult topic that will require its own article.  In short, vectorization is the process of converting standard mathematical equations into linear algebra matrix/vector operations to increase performance.  This is crucial for machine learning because many of the core functions that are executed repeatedly can be vectorized easily, increasing performance literally a thousandfold, especially if you take advantage of modern GPUs.  I have written an article here which demonstrates some of that difference.
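To make the idea concrete, here is a small sketch in Python with NumPy (the function names are my own, not from the course) comparing a hand-rolled loop to its vectorized equivalent:

```python
import numpy as np

def dot_loop(a, b):
    # Dot product written as an explicit Python loop.
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

def dot_vectorized(a, b):
    # The same computation expressed as a single vectorized operation.
    return np.dot(a, b)

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Both produce the same value, but the vectorized version runs orders of
# magnitude faster because the loop happens in optimized native code.
print(dot_loop(a, b))
print(dot_vectorized(a, b))
```

On large arrays the vectorized form also maps naturally onto GPU-backed libraries, which is where the really dramatic speedups come from.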

Derivatives and Partial Derivatives

Again, another topic which requires its own separate article that I have yet to write, but here are the basics at a high level.  A derivative shows the rate of change of a function.  This is key because a cost function always has, at minimum, the form hypothesis − actual.  In most circumstances you have weighted values, and the derivative tells you the slope of the cost with respect to those weights.  Partial derivatives allow you to separate out each weight and see its contribution to the overall error, giving you a rate of error for each one.  This is the premise behind the equations for optimizing our function weights, and it is the backbone of training machine learning algorithms that use weighted values.
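To ground this in the course's own formulas, here is the squared-error cost for linear regression and its partial derivative with respect to a single weight:

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
\qquad
\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```

Each partial derivative isolates one weight's share of the error, which is exactly the per-weight slope that gradient descent follows.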

Converting Mathematical Notation to Code

This is a very interesting topic which also deserves its very own article, but here is what to think about at a high level.  A summation is simply a for loop (or a vectorized operation).  A derivative or partial derivative will likely have its own hand-coded implementation for performance reasons; if you find yourself needing derivatives on the fly, a symbolic math library would be necessary.  Active patterns in F# are great for this.  When converting equations, think hard about linear algebra: expressing equations as matrix and vector operations can increase performance enormously.  When I pick up a new language, the first thing I do is look for numerical computing and/or linear algebra libraries.  In .NET this is Math.NET Numerics; in Python it is NumPy; in R it comes out of the box :), though you should use Microsoft R Open for the built-in MKL and LAPACK/BLAS optimizations for your platform.
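As a small illustrative sketch (the names here are mine), here is the hypothesis h(x) = theta-transpose times x written first the way the summation notation reads, and then as a single linear algebra operation in NumPy:

```python
import numpy as np

def hypothesis_loop(theta, x):
    # h(x) = sum over j of theta[j] * x[j], exactly as the summation reads.
    total = 0.0
    for j in range(len(theta)):
        total += theta[j] * x[j]
    return total

def hypothesis_linalg(theta, X):
    # The same hypothesis for every training row at once: X times theta.
    return X @ theta

theta = np.array([1.0, 2.0, 3.0])
X = np.array([[1.0, 0.5, 2.0],
              [1.0, 1.5, 0.5]])

print(hypothesis_loop(theta, X[0]))  # one sample, via the loop
print(hypothesis_linalg(theta, X))   # all samples, via linear algebra
```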

The difference between cost functions and optimization functions

A cost function shows you how well your function is performing relative to your desired outcome.  For example, the sum of squared errors is simply the difference between your prediction and the actual value, squared and summed across all training samples.  You may choose other cost functions.  Your optimization function is simply the partial derivative of the cost function, which lets you identify and adjust each component that makes up the cost, thereby reducing it.
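Here is a minimal sketch in NumPy, assuming a linear hypothesis X @ theta (the function names are illustrative), showing the cost function next to the gradient that an optimizer like gradient descent actually uses:

```python
import numpy as np

def cost(theta, X, y):
    # Sum-of-squared-errors cost: how far predictions are from actuals.
    m = len(y)
    errors = X @ theta - y
    return (errors @ errors) / (2 * m)

def gradient(theta, X, y):
    # Partial derivative of the cost with respect to each weight.
    m = len(y)
    return (X.T @ (X @ theta - y)) / m

def gradient_descent_step(theta, X, y, alpha=0.01):
    # Nudge every weight against its own error slope.
    return theta - alpha * gradient(theta, X, y)
```

The cost tells you how wrong you are; the gradient tells each weight which direction, and how hard, to move in order to become less wrong.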

Normalization

This deserves its own article, which is found here.  Feel free to leave comments if anything is still unclear and I will update the article or write new articles to address those concerns.
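For quick reference, here is a small sketch of the mean/standard-deviation feature scaling the course uses (the helper name is mine):

```python
import numpy as np

def normalize_features(X):
    # Scale each feature (column) to zero mean and unit standard deviation.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0]])

X_norm, mu, sigma = normalize_features(X)
# Reuse the SAME mu and sigma when normalizing new data before predicting.
```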

Summary

There are a lot of topics here, so I will work through them as time permits, balancing them against the other articles on my backlog as well as the ones more directly associated with my daily work and happenings in the Microsoft Cognitive Services and Cortana Analytics Suite world.  Keep an eye on the blog and they will pop up over the next few months.
