Flask Mongo Engine with Azure Cosmos DB

Hello World!

In this article we will take a light look at Cosmos DB; specifically the Mongo API for Cosmos DB, and using that API from Mongo Engine.  I think one of the great things about Cosmos DB’s Mongo API is that I simply swap out my connection string and guess what; it works!  This means I can use not only Mongo Engine, but also PyMongo or any other framework, in any language, that connects to Mongo.

Understanding Tensor Flow Input Pipelines Part 1

Hello World!

Alright; so this whole input pipeline thing is, in pretty much every framework, the least documented thing in the universe.  So this article is about demystifying it.  We can break down the process into a few key steps:

1. Acquire & Label Data
2. Process Label Files for Record Conversions
3. Process Label Files for Training a Specific Network Interface
4. Train the Specific Network Interface

This is part 1.  We will focus on the 3rd item in this list; processing the files into TF Records.  Note you can find more associated code in the TensorFlow section of this git repository: https://github.com/drcrook1/CIFAR10

Operationalizing SKLearn with Azure Machine Learning

Hello World!

So I just completed an incredible project with Brain Thermal Tunnel Genix, where I learned so much about pattern recognition, machine learning, and taking research and algorithms into a production environment where they can be integrated into a real product.  Today’s article takes those lessons and provides a sample of how to perform complex modelling and operationalize it in the cloud.  The accompanying Gallery Example can be found here.

Arduino + Reinforcement Learning = Autonomous Robot

Hello World,

So there are a ton of articles out there on the theory of Reinforcement Learning, but very few with an actual application.  I watched a few lectures from Berkeley, read a few articles by NVIDIA, and thought, “Well, let’s just give this a shot”.  8 hours later, this is what I had.

Herby V1 simply learns to go forward as much as possible while avoiding obstacles.

Decoding Woes Solved – Python

Hello World,

This is a short post.  Basically I had a data set come in where there were some funky characters involved.  I was getting “Can’t read this; doesn’t appear to be UTF-8”.  I looked around on Stack Overflow for a while to little avail.  I came up with this, which works.

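import pandas as pd
from io import StringIO

dataPath = "C:\\data\\CompanyA\\DavidCrook\\davidData_Session1.csv"
fil = open(dataPath)
txt = fil.readlines()  # read every line with the standard line reader
txt = ''.join(txt)  # join the lines back into one string
works = pd.read_csv(StringIO(txt), index_col = 0)
doesntWork = pd.read_csv(dataPath, index_col = 0)  # the direct read that raised the UTF-8 error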

Just read the sucker with the standard file open and line reader, push it into a StringIO and then read into a data frame.  Guess what I’m doing from now on.

#MicroBlogPost 🙂

K-Means under the hood with Python

Hello World!

What is K-Means?

K-Means Clustering is an unsupervised learning algorithm that tells you how similar observations are by putting them into groups or “clusters”.  K-Means is often used as a discovery step on new data to find out what the various categories might be; once you understand the centroid labels, you can apply something such as k-nearest-neighbor as a classifier.  A centroid is the center of a “cluster” or group.
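
To make the idea of clusters and centroids concrete, here is a minimal sketch of the standard K-Means loop in plain Python.  It is not the code from the full article; the function names, the sample points, and the choice of picking starting centroids at random from the data are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Return (centroids, assignments) after running the K-Means loop."""
    rng = random.Random(seed)
    # Assumption: start from k distinct observations chosen at random.
    centroids = rng.sample(points, k)
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the clusters are stable
        assignments = new_assignments
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = mean_point(members)
    return centroids, assignments


# Hypothetical observations forming two obvious groups.
observations = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
                (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = k_means(observations, k=2)
```

Each entry of `assignments` is the index of the cluster for the matching observation, and `centroids` holds the center of each cluster.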

Miami Data Science – R Fundamentals Talk

Hello World,

Here is the slide deck link for the talk from last night. https://drcdata.blob.core.windows.net/slidedecks/DataScience_MSRO_DataManipulation_Visualization.pptx

Also you can find the free video series here: https://aka.ms/rjumpstart

Finally, don’t forget to tweet me @Data4Bots if you want into the slack channel.

Machine Learning Study Group Recap – Week 4

Hello World,

So here we go with another recap. This week we did a deep dive into binary classification using Logistic Regression. Logistic regression and binary classification are the underpinnings of modern neural networks, so a deep and complete understanding of them is necessary to be proficient in machine learning.

Sigmoid for Classifiers Decoded

Hello World,

Sigmoid really isn’t that complicated (once you understand it, of course).  Some background in case you are coming at this totally fresh: the Sigmoid function is used in machine learning primarily as a hypothesis function for classifiers.  What is interesting is that this same function is used for binary classifiers and multi-class classifiers, and is the backbone of modern neural networks.

Here is the sigmoid function:    $\frac{ 1 }{ 1 + e^{-z}}$
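
As a rough sketch, the function above translates to Python like this.  The names `sigmoid` and `classify`, the 0.5 threshold, and the two-branch form (added only to avoid overflow for large negative z) are my assumptions, not something from the original post.

```python
import math


def sigmoid(z):
    """Return 1 / (1 + e^-z), squashing any real z into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Algebraically the same value, rewritten so exp() never overflows
    # when z is a large negative number.
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(z, threshold=0.5):
    """Turn the sigmoid output into a binary label (hypothetical helper)."""
    return 1 if sigmoid(z) >= threshold else 0


midpoint = sigmoid(0.0)        # exactly halfway between the two classes
likely_positive = classify(3.0)
likely_negative = classify(-3.0)
```

The output can be read as a score between 0 and 1, which is why a threshold is enough to turn it into a yes/no answer.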

Merging Data Sets in Python

Hello World,

So this article is inspired by a customer doing financial analysis who can only pull a limited amount of data per request from the data steward’s stores, in chunks based on time windows. Because time is constantly moving, consecutive requests occasionally return duplicate data. If you attempt to grab exactly on the edges, you have a chance of missing something, so it’s best to have a bit of an overlap and just deal with that overlap.
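
Here is a minimal sketch of dealing with that overlap in plain Python.  It is not the approach from the full article; the `merge_chunks` helper, the `timestamp` and `price` fields, and the assumption that a timestamp uniquely identifies a record are all hypothetical.

```python
def merge_chunks(chunks, key="timestamp"):
    """Combine overlapping chunks, keeping one record per key value.

    Assumes the key uniquely identifies a record; when a key appears in
    more than one chunk, the record from the later chunk wins.
    """
    merged = {}
    for chunk in chunks:
        for record in chunk:
            merged[record[key]] = record
    # Return the records ordered by key so the time series stays sorted.
    return [merged[k] for k in sorted(merged)]


# Two hypothetical pulls whose time windows overlap at 09:02.
first_pull = [
    {"timestamp": "2017-01-01T09:00", "price": 10.0},
    {"timestamp": "2017-01-01T09:01", "price": 10.5},
    {"timestamp": "2017-01-01T09:02", "price": 10.2},
]
second_pull = [
    {"timestamp": "2017-01-01T09:02", "price": 10.2},
    {"timestamp": "2017-01-01T09:03", "price": 10.8},
]

combined = merge_chunks([first_pull, second_pull])
```

Keying on the timestamp means the duplicated 09:02 row collapses into a single record, so the overlap costs nothing beyond the extra rows pulled.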