This article explains how the K-Means Clustering algorithm works and teaches a little Python along the way.
What is K-Means?
K-Means Clustering is an unsupervised learning algorithm that shows you how similar observations are by putting them into groups, or “clusters”. It is often used as a discovery step on new data to find out what the categories might be. Once you understand the centroid labels, you can apply something such as a k-nearest-neighbor classifier to the data. A centroid is the center of a “cluster” or group.
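To make the grouping idea concrete, here is a minimal sketch of the two steps K-Means repeats: assign each point to its nearest centroid, then move each centroid to the mean of its points. The function names and the sample points are my own for illustration, and the starting centroids are passed in by hand to keep the run predictable. Real implementations usually pick them at random.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iterations=100):
    """Return (centroids, labels) for the given points.

    initial_centroids is supplied by the caller (an assumption made
    here for repeatability, not a requirement of the algorithm).
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: label each point with its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # No point changed cluster, so we have converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # Leave an empty cluster's centroid where it is.
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, labels


# Two obvious groups: one near (1, 1) and one near (8, 8).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The labels tell you which cluster each observation landed in, and the centroids are the "centers" you would inspect before deciding what each group means.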
Here you can find the slide deck for my talk for SQL Saturday #524 in South Florida. I hope you find this useful, but it should be far more useful if you attend in person. This talk lays the foundation for building and understanding analytical computing for cloud and devices as well as how they work together.
So here we go with another recap. This week we did a deep dive into binary classification using Logistic Regression. Logistic regression and binary classification are the underpinnings of modern neural networks, so a deep and complete understanding of them is necessary to be proficient in machine learning.
Sigmoid really isn’t that complicated (once you understand it, of course). Some background in case you are coming at this totally fresh: the Sigmoid function is used in machine learning primarily as a hypothesis function for classifiers. What is interesting is that this same function is used for binary classifiers and multi-class classifiers, and is the backbone of modern neural networks.
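For reference, the Sigmoid function is 1 / (1 + e^(-z)), which squashes any real number into the range 0 to 1. Below is a small sketch in plain Python. The `predict` helper, its parameter names, and its 0.5 threshold are my own illustration of how the output is typically turned into a yes/no answer. They are not taken from the article.

```python
import math


def sigmoid(z):
    """Map any real number z to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an equivalent form that avoids overflow in exp().
    e = math.exp(z)
    return e / (1.0 + e)


def predict(theta, features, threshold=0.5):
    """Hypothetical binary classifier: sigmoid of a weighted sum.

    theta[0] is the intercept; the remaining entries pair up with features.
    """
    z = theta[0] + sum(t * x for t, x in zip(theta[1:], features))
    return 1 if sigmoid(z) >= threshold else 0


print(sigmoid(0))    # exactly halfway between the two classes
print(sigmoid(4))    # close to 1
print(sigmoid(-4))   # close to 0
print(predict([-3.0, 1.0], [5.0]))
```

Because the output always lands between 0 and 1, it can be read as a probability that an observation belongs to the positive class.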
This is a high-level article geared toward a general audience! I’ve been thinking about the types of customer engagements I have been doing lately and decided to break them down into a series of categories. There are 4 categories of engagements: Descriptive, Predictive, Prescriptive and Actuated Analytics.
So here I am, after trying for a long time not to learn Python, learning Python. It just seems like I might get a hit or two more on my blog with some Python content. Well, what’s the first thing I need to figure out, aside from getting it up and running in my environment and installing some libraries… That’s right, find a numerical computing library and see how it ticks.
Let’s just start with my environment, because I painstakingly chose one.
In this article we are going to cover a simple version of Gradient Descent. It is important to note that this version of gradient descent uses the sum of squared errors as the cost function it minimizes. The implementation uses vectorized operations. Let’s start off with…
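As a rough sketch of what a vectorized gradient descent on a sum-of-squares cost can look like, here is a small NumPy version. NumPy, the function name, the learning rate `alpha`, and the iteration count are all my assumptions for illustration. The article's own implementation may differ.

```python
import numpy as np


def gradient_descent(X, y, alpha=0.1, iterations=2000):
    """Fit theta for y ~ theta0 + theta1 * x1 + ... by batch gradient descent.

    Minimizes J(theta) = 1/(2m) * sum((X_b @ theta - y) ** 2).
    """
    m = len(y)
    # Prepend a column of ones so theta[0] acts as the intercept.
    X_b = np.column_stack([np.ones(m), X])
    theta = np.zeros(X_b.shape[1])
    for _ in range(iterations):
        errors = X_b @ theta - y          # prediction error for every row at once
        gradient = X_b.T @ errors / m     # partial derivatives of J for every theta
        theta -= alpha * gradient         # step downhill
    return theta


# Data generated from y = 1 + 2x, so theta should approach [1, 2].
X = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * X
theta = gradient_descent(X, y)
print(theta)
```

The whole update is two matrix products, with no loop over individual observations, which is what "vectorized" means here.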
So this is another recap from our study group covering the Andrew Ng course on Coursera. Let’s start with a quick summary of the two weeks. Week 1 was all about introduction to linear regression and gradient descent. There were no assignments due. Week 2 was all about multi-variate linear regression, normalization and a few other topics. There was a coding assignment as well as a quiz due for week 2.
If you are practicing machine learning, you are likely going to run into feature scaling at some point. The reason we use feature scaling is to help our algorithms train faster and better. Let’s begin by taking a standard theta optimization equation to help better understand the problem.
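One common form of feature scaling is standardization: subtract each feature's mean and divide by its standard deviation, so every column ends up on a comparable scale. The sketch below assumes NumPy, and the function name and sample numbers are made up for illustration. Other schemes, such as min-max scaling, work along the same lines.

```python
import numpy as np


def standardize(X):
    """Scale each column of X to zero mean and unit standard deviation.

    Returns the scaled data plus the mean and standard deviation used,
    so the same transformation can be applied to new data later.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # Avoid dividing by zero for constant columns.
    return (X - mu) / sigma, mu, sigma


# Illustrative data: the first column is in the thousands,
# the second is a small count.
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])
X_scaled, mu, sigma = standardize(X)
print(X_scaled)
```

Without scaling, the large column dominates the gradient, so a learning rate that suits one theta is far too big or too small for the other. After scaling, a single learning rate works for both.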
So I wrote an article earlier, “Linear Regression From Scratch”. Many folks have pointed out that it is in fact not the optimal approach. Being the perfectionist that I am, I decided to re-implement it. Not to mention it works great in my own libraries. The following article discusses converting the original code into code that uses linear algebra. Beyond this, it still works in PCL for Xamarin. Hoo-Rah Xamarin!
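One well-known way to express linear regression with linear algebra is the normal equation, theta = (XᵀX)⁻¹Xᵀy. Whether the article takes exactly this route is my assumption, and the sketch below is in Python with NumPy rather than the article's Xamarin code. The function name is hypothetical.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Solve the normal equation (X_b^T X_b) theta = X_b^T y for theta."""
    # Prepend a column of ones so theta[0] acts as the intercept.
    X_b = np.column_stack([np.ones(len(y)), X])
    # Solving the linear system is more stable than forming the inverse.
    return np.linalg.solve(X_b.T @ X_b, X_b.T @ y)


# Data generated from y = 3 - 0.5x, so theta should come out as [3, -0.5].
X = np.array([0.0, 2.0, 4.0, 6.0])
y = 3.0 - 0.5 * X
theta = fit_linear_regression(X, y)
print(theta)
```

There is no learning rate and no iteration here. The trade-off is that solving the system gets expensive when there are a lot of features.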