Decoding Woes Solved – Python

Hello World,

This is a short post.  Basically, I had a data set come in that contained some funky characters.  I was getting “Can’t read this; doesn’t appear to be UTF-8”.  I looked around on Stack Overflow for a while, to little avail, and came up with this, which works:

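import pandas as pd
from io import StringIO

dataPath = "C:\\data\\CompanyA\\DavidCrook\\davidData_Session1.csv"
# open() decodes with the platform's default encoding (not necessarily UTF-8)
with open(dataPath) as fil:
    txt = fil.read()
works = pd.read_csv(StringIO(txt), index_col = 0)
# read_csv assumes UTF-8 by default, so this line raises UnicodeDecodeError:
# doesntWork = pd.read_csv(dataPath, index_col = 0)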

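Just read the sucker with a standard Python file open, push the text into a StringIO, and then read that into a data frame.  This works because open() decodes the file using the platform's default encoding (typically cp1252 on an English-language Windows machine), whereas read_csv assumes UTF-8.  Guess what I’m doing from now on.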
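If you know (or can guess) the file's real encoding, pandas can also decode it directly through the encoding parameter of read_csv.  The sketch below is a hypothetical reproduction, not the original data set: it assumes the funky characters came from a Windows-1252 (cp1252) file, writes a tiny CSV in that encoding, and compares the default UTF-8 read, an explicit encoding, and the StringIO workaround.

```python
import os
import tempfile
from io import StringIO

import pandas as pd

# Hypothetical sample: a CSV saved as cp1252, where the curly apostrophe is
# the single byte 0x92 and "é" is 0xE9, neither of which is valid UTF-8 here.
raw = "id,name\n1,O\u2019Brien\n2,Caf\u00e9\n".encode("cp1252")

with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

# read_csv assumes UTF-8 by default, so the stray bytes fail to decode.
try:
    pd.read_csv(path, index_col=0)
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Option 1: tell pandas which encoding the file actually uses.
df_direct = pd.read_csv(path, index_col=0, encoding="cp1252")

# Option 2: the StringIO workaround, with the encoding made explicit
# instead of relying on the platform default.
with open(path, encoding="cp1252") as fil:
    df_buffer = pd.read_csv(StringIO(fil.read()), index_col=0)

os.remove(path)
print(utf8_failed)               # True
print(df_direct.loc[1, "name"])  # O’Brien
```

Passing the encoding explicitly avoids keeping a second full copy of the text in memory, and it keeps working if the script later runs on a machine whose default encoding is different.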

#MicroBlogPost 🙂

Standardize Continuous Data Shape for Neural Networks

Hello World,

So this is an interesting problem.  You are collecting data from somewhere and you want to feed it into a neural network for classification.  There is one main problem with this: the shape of the data!  Neural networks, and really most models, require data of a fixed shape; you can’t just hand them something of arbitrary size.  There are tons of papers out there on dimensionality reduction, but I found very little on reducing data to a specific, predetermined size.  This article explains my approach.
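The full approach is behind the link.  Purely as a hypothetical illustration of the goal (every sample ending up the same length), here is one common, simple technique: linear resampling with NumPy's interp.  It is not necessarily the method the article uses, and resample_to_length is a made-up helper name.

```python
import numpy as np


def resample_to_length(series, target_len):
    """Linearly interpolate a 1-D series onto target_len evenly spaced points."""
    series = np.asarray(series, dtype=float)
    if target_len < 1:
        raise ValueError("target_len must be >= 1")
    if series.size == 0:
        raise ValueError("series must be non-empty")
    if series.size == 1:
        # A single reading carries no shape information; just repeat it.
        return np.full(target_len, series[0])
    # Map both the original and the target samples onto the range [0, 1].
    old_x = np.linspace(0.0, 1.0, series.size)
    new_x = np.linspace(0.0, 1.0, target_len)
    return np.interp(new_x, old_x, series)


# Two recordings of different lengths become rows of one fixed-shape batch.
short = [0.0, 10.0]
longer = list(range(11))  # 0, 1, ..., 10
batch = np.stack([resample_to_length(s, 5) for s in (short, longer)])
print(batch.shape)  # (2, 5)
```

Interpolation keeps the overall trend of each signal but smooths away detail when shrinking; averaging over windows or padding/truncating are the usual alternatives when that trade-off matters.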

Continue reading

Time Series Discovery with Python

Hello World,

This article is loosely based on a time series challenge from customer data.  I have fabricated three data files that reproduce the same challenge, and we will walk through the process of exploring that data.  The primary challenge in this data set is that it comes from a sleep study and the researchers left the date portion of the time stamp off.  Because only the time of day was recorded, any readings taken after midnight plot at the beginning of the x-axis.  The second challenge is lining up the data to see whether there is anything interesting about the time.  So yes, you could simply plot against the default row index Python generates; however, I’m also interested in the actual clock time, as this is a study involving humans.
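One way to restore ordering for the missing-date problem, sketched under assumptions: the readings are in recorded order, consecutive readings are less than 24 hours apart, and the real start date is unknown, so a placeholder date (2000-01-01, hypothetical) is used.  Whenever the time of day goes backwards, we assume midnight was crossed and advance the day.  add_dates is an illustrative helper, not code from the article.

```python
import pandas as pd


def add_dates(times, start_date="2000-01-01"):
    """Attach dates to time-only stamps, rolling over a day whenever the clock wraps.

    start_date is a hypothetical placeholder, since the study date was never recorded.
    """
    tod = pd.to_timedelta(pd.Series(times))  # time of day as an offset from midnight
    # A reading earlier than the previous one means we passed midnight.
    # (The first diff is NaT, which compares as False.)
    day_offset = (tod.diff() < pd.Timedelta(0)).cumsum()
    return pd.Timestamp(start_date) + pd.to_timedelta(day_offset, unit="D") + tod


stamps = add_dates(["23:58:00", "23:59:30", "00:01:00", "00:02:30"])
print(stamps.is_monotonic_increasing)  # True
```

Because the day counter is a cumulative sum, the same helper handles recordings that span several nights.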

Continue reading

Data Science 101 – St. Pete .Net Talk

Hello World,

Here is the link to the slide deck.

Here is the description as it exists on Meetup and here is the link to the Meetup page.

This session is an introduction to data science and how it fits into the greater application development workflow. We will begin with a generic architecture overview, followed by an actual implementation example using several Azure components. We will then dive into data insights and algorithm development.

Becoming a Functional Data Scientist

Hello World,

So today, I was asked to put some thought into what we should focus our entry-level data scientists on in terms of tech skills.  After giving it a lot of thought, I decided that the most important considerations were threefold:

  1. Don’t overload them.
  2. They can deliver to production, where the target can be anything, including IoT.
  3. They will not be concerned with building front ends.

I have to say, the result greatly surprised me.

Continue reading

Microsoft Tech and Robotics

Hello World,

This article is a high-level discussion of where you might use various Microsoft technologies in the field of robotics.  I will begin with a side project I’m kicking off to get more familiar with some cool tools and tech I’ve recently discovered, in the hope of getting assigned to some really cool projects at work, including drones.

Continue reading

Setting up Python and Virtual Environments in Visual Studio Code on Ubuntu

Hello World,

I’m writing this article because, believe it or not, this process is a pain in the neck and not completely documented in any one place. Let’s start with why in the world you would want to do this. For me, it’s because I want to use TensorFlow and NVIDIA’s embedded robotics SDKs. Unfortunately, the only supported development environment for these is Ubuntu. Nothing against Ubuntu; in my experience it just seems fairly unstable compared to Mac and Windows, but that is neither here nor there: if you want to build intelligent robots, you need these tools.
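As a rough sketch of the end state (not the article’s exact steps), the snippet below uses Python’s standard venv module to create a project-local environment and writes a .vscode/settings.json that points VS Code’s Python extension at it as the default interpreter via the python.defaultInterpreterPath setting.  The folder name .venv and the helper make_project_env are assumptions for illustration.  On Ubuntu, creating an environment with pip requires the python3-venv package to be installed first.

```python
import json
import sys
import tempfile
import venv
from pathlib import Path


def make_project_env(project_dir, with_pip=True):
    """Create <project>/.venv and point VS Code's Python extension at it (hypothetical helper)."""
    project = Path(project_dir)
    env_dir = project / ".venv"
    # On Ubuntu, with_pip=True needs the python3-venv package (it provides ensurepip).
    venv.create(env_dir, with_pip=with_pip)
    # Linux/macOS put the interpreter in bin/; Windows uses Scripts\python.exe.
    if sys.platform == "win32":
        interpreter = env_dir / "Scripts" / "python.exe"
    else:
        interpreter = env_dir / "bin" / "python"
    settings_dir = project / ".vscode"
    settings_dir.mkdir(exist_ok=True)
    settings = {"python.defaultInterpreterPath": str(interpreter)}
    (settings_dir / "settings.json").write_text(json.dumps(settings, indent=4))
    return interpreter


# Usage: build a throwaway project (pip skipped to keep the demo fast).
with tempfile.TemporaryDirectory() as demo_dir:
    print(make_project_env(demo_dir, with_pip=False))
```

VS Code may also offer to use a .venv folder it finds in the workspace on its own; writing the setting just makes the choice explicit and shareable with the rest of the project.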

Continue reading

K-Means under the hood with Python

Hello World!

This article is meant to explain how the K-Means Clustering algorithm works while you pick up a little Python along the way.

What is K-Means?

K-Means Clustering is an unsupervised learning algorithm that tells you how similar observations are by putting them into groups, or “clusters”.  K-Means is often used as a discovery step on new data to find out what the underlying categories might be; once you understand and label what each centroid represents, you can apply a classifier such as k-nearest neighbors.  A centroid is the center of a cluster: the mean of the observations assigned to it.
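To make the idea concrete, here is a minimal NumPy sketch of the standard (Lloyd’s algorithm) version of K-Means: assign each observation to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop moving.  The random initialization, the iteration cap, and the function names are assumptions for illustration, not necessarily the code from the full article.

```python
import numpy as np


def assign_clusters(X, centroids):
    """Return the index of the nearest centroid (squared Euclidean distance) for each row of X."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's-algorithm K-Means; returns (centroids, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct observations at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each observation joins its nearest centroid.
        labels = assign_clusters(X, centroids)
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its old centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, assign_clusters(X, centroids)


# Two obvious groups of points.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centroids, labels = kmeans(X, k=2)
print(labels)
```

Because the result depends on the starting centroids, practical implementations usually run several random initializations (or use k-means++ seeding) and keep the run with the lowest within-cluster distance.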

Continue reading

SQL Saturday 524 Slides – Analytical Computing – Device to Cloud and Back

Hello World!

Here you can find the slide deck for my talk for SQL Saturday #524 in South Florida.  I hope you find this useful, but it should be far more useful if you attend in person.  This talk lays the foundation for building and understanding analytical computing for cloud and devices as well as how they work together.

https://drcdata.blob.core.windows.net/slidedecks/Analytics_Device_To_Cloud_And_Back.pptx