I’m not sure the title quite nails it, but we are going to talk about solving VERY big problems as fast as we possibly can using highly sophisticated techniques. This article is a high-level overview of what you want to set up rather than the usual walkthrough of how to set it up. The actual how-to involves a ton of steps, so I thought it best to stick to the overview here.
So I just completed an incredible project with Brain Thermal Tunnel Genix, where I learned a great deal about pattern recognition, machine learning, and taking research and algorithms into a production environment where they can be integrated into a real product. Today’s article takes those lessons and provides a sample of how to perform complex modelling and operationalize it in the cloud. The accompanying Gallery Example can be found here.
Here is the description as it exists on meetup and here is the link to the meetup page.
This session is an introduction to data science and how it fits into the greater application development workflow. We will begin with a generic architecture overview, followed by an actual implementation example using several Azure components. We will then dive into data insights and algorithm development.
So today, I was asked to put some thought into which tech skills our entry-level data scientists should focus on. After giving it a good deal of thought, I came up with this. I decided that the most important considerations were the following:
Don’t overload them
They can deliver to production, where the target can be anything, including IoT.
They will not be concerned with building front ends.
This article is meant to explain how the K-Means Clustering algorithm works while teaching a little Python along the way.
What is K-Means?
K-Means Clustering is an unsupervised learning algorithm that tells you how similar observations are by putting them into groups or “clusters”. A centroid is the center of a cluster. K-Means is often used as a discovery step on new data to find out what the various categories might be; once the centroid labels are understood, something such as a k-nearest-neighbor classifier can then be applied to the data.
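To make the grouping idea concrete, here is a minimal sketch of the standard K-Means loop (often called Lloyd's algorithm) in plain Python. It is an illustration, not the implementation from the full article: the function names `k_means`, `squared_distance` and `mean_point`, the random initialization, and the sample points are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Return (centroids, labels) after running the basic K-Means loop.

    Assumption: centroids start as k randomly chosen observations.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # Nothing moved, so the assignments are stable.
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    sample = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
              (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
    centroids, labels = k_means(sample, k=2)
    print(centroids, labels)
```

The two steps inside the loop are the whole algorithm: assign every observation to its closest centroid, then move each centroid to the center of the observations assigned to it, and repeat until nothing changes.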
So I’ve spent a while now looking at 3 competing languages, and I did my best to give each one a fair shake. Those 3 languages were F#, Python and R. I have to say it was really close for a while, because each language has its strengths and weaknesses. That said, I am moving forward with 2 languages and a very specific way I use each one. I wanted to outline this because it took me a very long time to learn all three languages well enough to reach this conclusion, and I would hate for others to go through the same exercise.
So here we go with another recap. This week we did a deep dive into binary classification using Logistic Regression. Logistic regression and binary classification are the underpinnings of modern neural networks, so a deep and complete understanding of them is necessary to be proficient in machine learning.
Sigmoid really isn’t that complicated (once you understand it, of course). Some background, in case you are coming at this totally fresh: the Sigmoid function is used in machine learning primarily as a hypothesis function for classifiers. What is interesting is that this same function is used for binary classifiers and multi-class classifiers, and is the backbone of modern neural networks.
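For reference, the Sigmoid function is sigmoid(z) = 1 / (1 + e^(-z)), which squashes any real number into the range 0 to 1. The snippet below is a minimal sketch of using it as a hypothesis function for a binary classifier. The `predict` helper, its `weights` and `bias` parameters, and the 0.5 threshold are assumptions for illustration, not code from the full article.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    # Branch on the sign so math.exp never receives a large positive value.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(weights, bias, features, threshold=0.5):
    """Hypothetical binary classifier: probability and 0/1 label."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    return probability, int(probability >= threshold)


if __name__ == "__main__":
    print(sigmoid(0.0))  # the midpoint of the curve
    print(predict([0.8, -0.4], 0.1, [2.0, 1.0]))
```

The output can be read as the estimated probability of the positive class, which is why a threshold turns it into a yes/no decision.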
So this article is inspired by a customer doing financial analysis who can only grab a certain amount of data at a time from the data steward’s stores, in chunks based on time windows. As time is constantly moving, you occasionally get duplicate data across requests. If you attempt to grab exactly on the edges, you have a chance of missing something, so it’s best to have a bit of an overlap and just deal with that overlap.
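One way to handle the overlap, sketched below in plain Python, is to request windows that deliberately overlap and then drop any record already seen. This is an illustration under stated assumptions rather than the customer's actual solution: `overlapping_windows` and `merge_chunks` are hypothetical helpers, and the code assumes every record carries some unique key (here an `id` field) that identifies duplicates.

```python
from datetime import datetime, timedelta


def overlapping_windows(start, end, size, overlap):
    """Yield (window_start, window_end) pairs covering start..end.

    Each window begins `overlap` before the previous one ends.
    """
    step = size - overlap
    if step <= timedelta(0):
        raise ValueError("overlap must be smaller than the window size")
    cursor = start
    while cursor < end:
        yield cursor, min(cursor + size, end)
        cursor += step


def merge_chunks(chunks, key):
    """Concatenate chunks in order, keeping the first copy of each record."""
    seen = set()
    merged = []
    for chunk in chunks:
        for record in chunk:
            k = key(record)
            if k not in seen:
                seen.add(k)
                merged.append(record)
    return merged


if __name__ == "__main__":
    start = datetime(2020, 1, 1, 0, 0)
    end = datetime(2020, 1, 1, 1, 0)
    for window in overlapping_windows(start, end,
                                      timedelta(minutes=30),
                                      timedelta(minutes=5)):
        print(window)

    # Two hypothetical responses that share the record with id 2.
    first = [{"id": 1}, {"id": 2}]
    second = [{"id": 2}, {"id": 3}]
    print(merge_chunks([first, second], key=lambda r: r["id"]))
```

If the records have no natural identifier, a tuple of the fields that together define a duplicate (for example timestamp plus instrument) could serve as the key instead.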