Becoming a Functional Data Scientist

Hello World,

So today, I was asked to put some thought into what we should focus our entry level data scientists on in terms of tech skills.  After I put a bunch of thought into it, I ended up coming up with this.  I decided that the most important aspect of this was a few items fold

  1. Don’t overload them
  2. Can deliver to production where the target can be anything, including IoT.
  3. They will not be concerned with building front ends.

I have to say, the result greatly surprised me.

Continue reading

Setting up Python and Virtual Environments in Visual Studio Code on Ubuntu

Hello World,

I’m writing this article because believe it or not, this process is a pain in the neck and not completely documented in any one place. Lets start with why in the world you would want to do this. For me, I want to use Tensor Flow and NVidia embedded robotics SDKs. Unfortunately the only supported dev environment for this is Ubuntu. Not anything against Ubuntu it just appears to be fairly unstable in comparison to Mac and Windows, but that is neither here nor there, if you want to build intelligent robots, you need these tools.

Continue reading

K-Means under the hood with Python

plot_6

Hello World!

This article is meant to explain how the K-Means Clustering algorithm works while simultaneously learning a little Python.

What is K-Means?

K-Means Clustering is an unsupervised learning algorithm that tells you how similar observations are by putting them into groups or “clusters”.  K-Means is often used as a discovery step on new data to discover what various categories might be and then apply something such as a k-nearest-neighbor as a classifier to it after understanding the centroid labels.  Where a centroid is the center of a “cluster” or group.

Continue reading

Merging Data Sets in Python

Hello World,

So this article is inspired by a customer doing financial analysis who can only grab a certain amount of data at a time from the data steward’s stores in chunks based on time windows. As time is constantly moving, what happens is that occasionally you get duplicate data in each request. If you attempt to grab exactly on the edges, you have a chance of missing something, so its best to have a bit of an overlap and just deal with that overlap.
Continue reading

Getting Started with Linear Algebra in Python

Hello World!

So here I am after trying for a long time to not learn Python learning Python.  It just seems like I might get a hit or two more on my blog with some Python content.  Well whats the first thing I need to figure out aside from getting it up and running in my environment and installing some libraries… Thats right, find a numerical computing library and see how it ticks.

Lets just start with my environment, because I painstakingly chose one.

Continue reading

Basic Gradient Descent

Hello World!

In this article we are going to cover a simple version of Gradient Descent. It is important to note that this version of gradient descent is using Sum of Squares as its cost function to reduce. This implementation utilizes vectorized algorithms. Lets start off with…

What is Gradient Descent

Continue reading

Linear Regression from Scratch using Linear Algebra

Hello World!

So I wrote an article earlier “Linear Regression From Scratch”.  Many folks have pointed out that this is in fact not the optimal approach.  Now being the perfectionist I decided to re-implement.  Not to mention it works great in my own libraries.  The following article discussing converting the original code into code that uses linear algebra.  Beyond this, it still works in PCL for xamarin,  Hoo-Rah Xamarin!

Continue reading

Linear Regression from Scratch

Hello World,

So today we will do a quick conversion from mathematical notations of Algebra into a real algorithm that can be executed.  Note we will not be covering gradient descent, but rather only cost functions, errors and execution of these to provide the framework for gradient descent.  Gradient descent has so many flavors that it deserves its own article.

So to the mathematical representation.

LinearRegression

Continue reading

Miami’s top 10 Jail Bookings

Hello World!

So I’ve been working on building some interesting visualizations with open data.  Today I get to show off a really interesting one, not only will we discuss the visualization in depth, but also dive into how I built it.  And here it is, the top 10 bookings in Miami where the legend is in descending order for most common bookings holistically.

Continue reading

Intro to Data Manipulation with R

Hello World,

Here is a recorded version of an in-person training I have been doing.  Enjoy.  I end up coming back to this myself even for reference.

This episode is all about performing data manipulation to derive raw insights from your data using the R programming language.  Data manipulation is the core to anything and everything you do in business intelligence and machine learning.  This episode sets the base for all R based intelligence sessions from here on out.

Part 1: Introduction to Microsoft R Open.

Part 2: Introduction to R Data Structures

Part 3: Data Manipulation with R

Part 4: Beautiful Visualizations with R

Continue reading