So today I was asked to put some thought into which tech skills we should focus our entry-level data scientists on. After mulling it over, I decided the most important considerations boiled down to a few items:
Don’t overload them
They can deliver to production, where the target can be anything, including IoT.
They will not be concerned with building front ends.
I’m writing this article because, believe it or not, this process is a pain in the neck and not completely documented in any one place. Let’s start with why in the world you would want to do this. For me, I want to use TensorFlow and NVIDIA’s embedded robotics SDKs. Unfortunately, the only supported dev environment for these is Ubuntu. I have nothing against Ubuntu; it just appears to be fairly unstable in comparison to Mac and Windows. But that is neither here nor there: if you want to build intelligent robots, you need these tools.
This article explains how the K-Means clustering algorithm works while teaching a little Python along the way.
What is K-Means?
K-Means clustering is an unsupervised learning algorithm that tells you how similar observations are by putting them into groups, or “clusters”. K-Means is often used as a discovery step on new data: first you discover what the various categories might be, then, once you understand the centroid labels, you apply something such as k-nearest-neighbors as a classifier. A centroid is simply the center of a cluster.
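To make that concrete, here is a minimal from-scratch sketch in Python. The kmeans function and the toy blob data are mine, purely for illustration: pick k observations as starting centroids, assign every point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until nothing moves.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal K-Means: assign points to the nearest centroid, move each
    centroid to the mean of its points, repeat until stable."""
    rng = np.random.default_rng(seed)
    # Start with k random observations as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # New centroid = mean of each cluster (assumes no cluster goes
        # empty, which is fine for this toy data).
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two obvious blobs, so k=2 should separate them cleanly.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)  # one centroid near (0, 0), the other near (5, 5)
```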
So this article is inspired by a customer doing financial analysis who can only grab a certain amount of data at a time from the data steward’s stores, in chunks based on time windows. Since time is constantly moving, you occasionally get duplicate data across requests. If you attempt to grab exactly on the window edges, you have a chance of missing something, so it’s best to have a bit of an overlap and just deal with that overlap.
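Dealing with the overlap is mechanical once you have a stable key to dedupe on. Here is a sketch of the overlap-then-dedupe pattern in pandas; the store DataFrame and fetch_window are hypothetical stand-ins for the steward’s API, not the customer’s real setup.

```python
import pandas as pd

# Fake source data standing in for the data steward's store (hypothetical).
store = pd.DataFrame({
    "txn_id": range(10),
    "ts": pd.date_range("2024-01-01", periods=10, freq="h"),
    "amount": [100.0 + i for i in range(10)],
})

def fetch_window(start, end):
    # Stand-in for the real API call; inclusive bounds create the overlap risk.
    return store[(store["ts"] >= start) & (store["ts"] <= end)]

# Pull in 4-hour chunks, stepping back one hour each time so nothing on an
# edge is ever missed...
chunks = []
start = store["ts"].min()
while start <= store["ts"].max():
    end = start + pd.Timedelta(hours=4)
    chunks.append(fetch_window(start, end))
    start = end - pd.Timedelta(hours=1)  # deliberate one-hour overlap

# ...then drop the duplicates the overlap produced, keyed on a stable ID.
result = pd.concat(chunks).drop_duplicates(subset="txn_id").reset_index(drop=True)
assert len(result) == len(store)
```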
So here I am, after trying for a long time not to learn Python, learning Python. It just seems like I might get a hit or two more on my blog with some Python content. What’s the first thing I need to figure out, aside from getting it up and running in my environment and installing some libraries? That’s right: find a numerical computing library and see how it ticks.
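For a first taste of what that library, NumPy, is actually for, here is a toy comparison of my own: the same sum of squares computed with a Python loop and with one vectorized expression.

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)

# Pure-Python loop: one element at a time.
total = 0.0
for x in a:
    total += x * x

# Vectorized: one expression, executed in optimized C under the hood.
vec_total = float(a @ a)

print(np.isclose(total, vec_total))  # same answer; the vectorized one is far faster
```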
Let’s just start with my environment, because I painstakingly chose one.
In this article we are going to cover a simple version of gradient descent. It is important to note that this version uses sum of squares as the cost function it reduces, and the implementation uses vectorized algorithms. Let’s start off with…
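As a preview of where we are headed, here is a vectorized sketch in NumPy (the function name and toy data are mine): the gradient of the sum-of-squares cost is computed for all observations at once with matrix operations, and theta is stepped downhill.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iters=1000):
    """Batch gradient descent on linear regression, fully vectorized.
    Cost being reduced: J(theta) = (1/2m) * sum((X @ theta - y)^2)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        error = X @ theta - y      # residuals for all m observations at once
        grad = (X.T @ error) / m   # gradient of J with respect to theta
        theta -= alpha * grad      # step downhill
    return theta

# Toy data: y = 4 + 3x plus noise; a column of ones carries the intercept.
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, 100)
y = 4 + 3 * x + rng.normal(0, 0.1, 100)
X = np.column_stack([np.ones_like(x), x])
print(gradient_descent(X, y, alpha=0.1, iters=5000))  # roughly [4, 3]
```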
So I wrote an article earlier, “Linear Regression From Scratch”. Many folks have pointed out that it is in fact not the optimal approach, so, being a perfectionist, I decided to re-implement it. Not to mention it works great in my own libraries. The following article discusses converting the original code into code that uses linear algebra. Beyond this, it still works in a PCL for Xamarin. Hoo-rah, Xamarin!
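The original code targets my own libraries, but the linear-algebra idea itself fits in a few lines of NumPy: instead of looping, solve the normal equations X^T X θ = X^T y directly. This is a sketch of that approach, not the article’s code.

```python
import numpy as np

# Same toy setup as before: y = 4 + 3x plus noise, ones column for intercept.
rng = np.random.default_rng(1)
x = rng.uniform(0, 2, 100)
y = 4 + 3 * x + rng.normal(0, 0.1, 100)
X = np.column_stack([np.ones_like(x), x])

# lstsq solves the least-squares problem, the numerically safer route to
# the normal-equation solution.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)  # roughly [4, 3], matching the iterative version
```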
So today we will do a quick conversion from the mathematical notation of algebra into a real algorithm that can be executed. Note that we will not be covering gradient descent, but rather only cost functions and errors, and how to execute them, to provide the framework for gradient descent. Gradient descent has so many flavors that it deserves its own article.
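As an example of that conversion, the usual sum-of-squares cost J(θ) = (1/2m) Σᵢ (h(xᵢ) − yᵢ)² translates almost symbol for symbol into vectorized Python. The tiny data set here is mine, just to exercise the function.

```python
import numpy as np

def cost(theta, X, y):
    """Sum-of-squares cost: J(theta) = (1/2m) * sum_i (h(x_i) - y_i)^2,
    where the hypothesis h(x) is the dot product x . theta."""
    m = len(y)
    errors = X @ theta - y        # h(x_i) - y_i for every i at once
    return (errors @ errors) / (2 * m)

# Evaluate the cost at a couple of guesses for theta on tiny toy data.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # leading 1s = intercept
y = np.array([2.0, 3.0, 4.0])
print(cost(np.array([0.0, 0.0]), X, y))  # bad guess, high cost (~4.83)
print(cost(np.array([1.0, 1.0]), X, y))  # perfect fit, cost 0.0
```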
So I’ve been working on building some interesting visualizations with open data, and today I get to show off a really interesting one. Not only will we discuss the visualization in depth, we will also dive into how I built it. Here it is: the top 10 bookings in Miami, where the legend is in descending order by most common booking overall.
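For flavor, here is roughly the shape of the code behind a chart like that. The bookings DataFrame and its “charge” column are hypothetical stand-ins; the real open-data schema isn’t shown here.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the open bookings data.
bookings = pd.DataFrame({"charge": ["DUI", "Theft", "Assault", "DUI", "Theft",
                                    "DUI", "Trespass", "Theft", "DUI", "Assault"]})

# Count each charge and keep the ten most common, already in descending order.
top10 = bookings["charge"].value_counts().head(10)

top10.plot(kind="barh")
plt.gca().invert_yaxis()          # most common booking at the top
plt.xlabel("number of bookings")
plt.title("Top 10 bookings")
plt.tight_layout()
plt.show()
```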
Here is a recorded version of an in-person training I have been doing. Enjoy. I end up coming back to it myself for reference.
This episode is all about performing data manipulation to derive raw insights from your data using the R programming language. Data manipulation is at the core of anything and everything you do in business intelligence and machine learning. This episode sets the foundation for all R-based intelligence sessions from here on out.