Here is a recording of an in-person training I have been giving. Enjoy. I even end up coming back to it myself for reference.
This episode is all about performing data manipulation to derive raw insights from your data using the R programming language. Data manipulation is at the core of anything and everything you do in business intelligence and machine learning. This episode sets the base for all R-based intelligence sessions from here on out.
This article is a video tutorial introducing the bare basics of R. It’s a bit dry, but it covers the underlying components of everything in the more interesting sessions. You can’t do the cool stuff without understanding the basics first.
Ever wondered what the difference is between R and Microsoft R? Considering learning R as a programming language? You should probably watch this video. It is the first in a 4-part series to give you the jump start you need to become a professional data scientist with R.
These days I need to make videos instead of written articles, so I am going to post a few of those here.
In this video we will do an initial exploratory analysis on a water flow data set that came from a prototype I built. The prototype consists of a water pump, a valve and a flow meter. The data set is stored in SQL Azure. We will use R and RStudio to perform the analysis from an Azure virtual machine.
Today is a freaking cool day. Why, you ask? Because today I am writing an article on how to use two of the coolest freaking big data/data science tools out there together to do epic shit! Let’s start with HBase. HBase gives you a big data store with query performance at an interactive level, and so many folks are starting to just dump data into it. In the project teddy solution, we are dumping tweets, dialogue and dialogue annotations into HBase to power our open-domain conversational API. For us, there really is no other approach that is this easy to use.
The second part of project teddy is to predict, based on an incoming conversational component, what sort of response the speaker is attempting to elicit from the teddy bear. Powering our teddy bear with predictive analytics and big data would be perfect. What better platform to do this quickly and easily than AzureML?
Many folks may know that the South Florida Evangelism team is undertaking a task that many think is impossible. Well, in that statement all I hear is “there is still a chance!” The end goal is to create a teddy bear that can have a conversation about anything. So step one is to collect as much dialogue as possible from as many sources as possible and annotate it. What better place to power an association engine for word and phrase relevance than a platform that forces you down to 140 characters to get your message across?
So, like any normal developer, I decided to start by looking for samples already out there. MSDN has a great starter for writing tweets and doing sentiment analysis with HBase and C#. The only issue with the sample is that it is poorly written, difficult to understand, and has no separation of concerns. So I want to walk through simplifying the solution and separating out a few concerns.
So I had a life-changing event this past Sunday, 5/24/2015, at 8:55am. My first child was born! Both child and wife are healthy and happy. Everything is good in life. Like many couples, though, my wife and I struggled to find the right name for our child. We didn’t want something too common, an old-person name, or something so rare and funky that nobody could spell it. We also realized we just didn’t know what names were out there. So after much debate and discussion over what to name her, I started doing a bit of analysis using some census data. I want to thank Jamie Dixon for providing the data that he found for use in his Dinner Nerds article. The data itself can be found here. This article will discuss the code used to go through all of the data and provide insights into child names.
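As a rough illustration only (the article covers the actual code), here is a minimal Python sketch of one way to filter out names that are "too common" or "too rare": total each name's count across years, then keep the names that fall inside a chosen popularity band. It assumes the data can be reduced to `(name, year, count)` rows; the sample records, their counts, the thresholds, and the helper names `total_counts` and `middle_band` are all hypothetical and are not taken from the census data or the original analysis.

```python
from collections import defaultdict

# Hypothetical, made-up records in an assumed (name, year, count) layout.
# These are NOT real census figures.
records = [
    ("Olivia", 2013, 500), ("Olivia", 2014, 520),
    ("Harper", 2013, 120), ("Harper", 2014, 150),
    ("Mildred", 2013, 4), ("Mildred", 2014, 3),
    ("Zyla", 2014, 2),
]


def total_counts(rows):
    """Sum the count for each name across all years."""
    totals = defaultdict(int)
    for name, _year, count in rows:
        totals[name] += count
    return dict(totals)


def middle_band(totals, low, high):
    """Return (name, total) pairs with low <= total <= high, most popular first.

    Names above `high` are treated as "too common" and names below `low`
    as "too rare"; both thresholds are arbitrary choices for illustration.
    """
    picks = [(name, total) for name, total in totals.items() if low <= total <= high]
    return sorted(picks, key=lambda pair: (-pair[1], pair[0]))


print(middle_band(total_counts(records), 50, 800))  # [('Harper', 270)]
```

Filtering on a band rather than simply ranking by popularity mirrors the two constraints in the post (not too common, not too rare); catching "old-person names" would need an extra step, such as comparing a name's counts in early years against recent ones.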
As many of you may know at this point, I am relocating to South Florida. Final location to be determined, but I will probably be renting around Pompano Beach or Fort Lauderdale while working out of Venture Hive and the Microsoft Fort Lauderdale offices. So what does this have to do with Zillow? Well, it has EVERYTHING to do with Zillow. What I’ve found while searching for homes is that Realtors, Zillow and Trulia just don’t have a predictive analytics solution that works for me. So I decided to give AzureML a shot, mashing together a few datasets to send me notifications more to my liking than the ones I currently receive. So step 1 in this plan is to data mine Zillow. Luckily, Zillow has an API for that. Or, if you are feeling particularly frisky, Zillow gets their data from ArcGIS (example for Raleigh). So let’s get cracking…
The answer to these questions is pretty much always the same. Step 1, learn about it and build one piece of software focused on that goal. Step 2, go for it, just do it. That said, Microsoft has a fantastic resource, Microsoft Virtual Academy (MVA), which provides free training on various topics from entry level to advanced. This article focuses on a learning plan with MVA to attain the goal of becoming an Analytics Developer.