Today is a freaking cool day. Why do you ask? Because today I am writing an article on how to use two of the coolest freaking big data/data science tools out there together to do epic shit! Let's start with HBase. HBase is a way to get a big data solution with query performance at an interactive level, so many folks are starting to just dump data into HBase. In the Project Teddy solution, we are dumping tweets, dialogue and dialogue annotations to power our open domain conversational API. There really is no easier way for us to do this.
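Project Teddy's real schema isn't shown here, but a minimal sketch of the kind of row-key design that matters when you dump tweets into HBase (all names and the salting scheme are my illustration, not the actual solution):

```python
import hashlib

def tweet_row_key(user_id: str, timestamp_ms: int) -> str:
    """Build an HBase row key for a tweet.

    A short hash-prefix salt spreads writes across regions, and a
    reversed timestamp makes the newest tweets sort first in a scan.
    """
    salt = hashlib.md5(user_id.encode()).hexdigest()[:2]  # 2-char region salt
    reversed_ts = 9_999_999_999_999 - timestamp_ms        # newest-first ordering
    return f"{salt}|{user_id}|{reversed_ts:013d}"

def tweet_columns(text: str, lang: str) -> dict:
    """Column family 'd' holds the raw dialogue fields (hypothetical layout)."""
    return {"d:text": text, "d:lang": lang}
```

With a key like this, a prefix scan on one user walks their tweets newest-first, which is exactly the interactive-query behavior HBase is good at.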
The second part of Project Teddy is to predict, based on an incoming conversational component, what sort of response the speaker is attempting to elicit from the teddy bear. Powering our teddy bear with predictive analytics and big data would be perfect. What better platform to do this quickly and easily than AzureML?
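A published AzureML model is just a REST endpoint that takes a JSON body, so the teddy bear only has to build that request. A sketch of the standard request envelope (the column name is a placeholder, not the real Project Teddy schema):

```python
import json

def build_azureml_request(utterance: str) -> str:
    """Build the JSON body an AzureML request/response endpoint expects.

    The {"Inputs": {"input1": ...}} envelope is the standard AzureML
    web-service shape; "utterance" is a hypothetical column name.
    """
    body = {
        "Inputs": {
            "input1": {
                "ColumnNames": ["utterance"],
                "Values": [[utterance]],
            }
        },
        "GlobalParameters": {},
    }
    return json.dumps(body)
```

At runtime you would POST this body, with an `Authorization: Bearer <api-key>` header, to the service URL shown on the AzureML dashboard after you publish.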
As Azure becomes more and more popular and I encounter more startups, I find myself walking through this tutorial and explaining it all the time. Therefore, I have decided to write a blog article, with pictures, to make my life (and yours) easier.
What is BizSpark?
BizSpark is the best thing since sliced bread for a startup. It is literally every development tool and license that Microsoft has to offer, free for commercial purposes for 3 years. Not only that, but you get $150/month (as of this writing) in Azure for 3 years as well. As if it couldn't get any better, you get access to reduced pricing on various products from Microsoft Partners, and Microsoft being such a giant of a company, there are a TON of partners you get special pricing from. BizSpark also includes product licensing such as your simple Windows licenses, Visual Studio licenses, and even SQL Server licenses. It's everything! Usually at this point I get the question: so what's the catch? There is no catch! Microsoft wants you to use their tools and be successful with them, so that when you become a giant company you are using their tools and not a competitor's. Therefore, Microsoft gives these tools to high-potential startups for free! If you think you qualify, apply or come to an event I attend (usually found on the events tab). You can also ping me on Twitter @DavidCrook1988.
Many folks may know that the South Florida Evangelism team is undertaking a task that many think is impossible. Well, in that statement all I hear is "there is still a chance!" The end goal is to create a teddy bear that can have a conversation about anything. So step one is to collect as much dialogue as possible from as many sources as possible and annotate it. What better place to power an association engine for word and phrase relevance than something that forces you down to 140 characters to get your message across?
So, as any normal developer would, I decided to start by looking for samples already out there. MSDN has a great starter for writing tweets and doing sentiment analysis with HBase and C#. The only issue with the sample is that it is very poorly written and difficult to understand, with no separation of concerns. So I want to go through simplifying the solution and separating a few concerns out.
As many of you may know at this point, I am relocating to South Florida. The final location is to be determined, but I will probably be renting around Pompano Beach or Fort Lauderdale while working out of Venture Hive and the Microsoft Fort Lauderdale offices. So what does this have to do with Zillow? Well, it has EVERYTHING to do with Zillow. What I've found while searching for homes is that between Realtors, Zillow and Trulia, they really just don't have a predictive analytics solution that works for me. So I decided to give AzureML a shot at mashing together a few datasets to send me notifications more to my liking than what is currently being sent. Step 1 in this plan is to data mine Zillow. Luckily, Zillow has an API for that. Or if you are feeling particularly frisky, Zillow gets their data from ArcGIS (example for Raleigh). So let's get cracking…
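To make the mining step concrete, here is a sketch of building a Zillow `GetSearchResults` call and pulling the Zestimate out of the XML it returns. The endpoint and parameter names follow Zillow's documented API as I recall it, but verify against the docs; the response below is a canned fragment for illustration, and in real use you would substitute your own ZWS-ID and a live HTTP call:

```python
import urllib.parse
import xml.etree.ElementTree as ET

API = "http://www.zillow.com/webservice/GetSearchResults.htm"

def search_url(zws_id: str, address: str, citystatezip: str) -> str:
    """Build the GetSearchResults request URL."""
    qs = urllib.parse.urlencode(
        {"zws-id": zws_id, "address": address, "citystatezip": citystatezip}
    )
    return f"{API}?{qs}"

def zestimate_amount(response_xml: str) -> int:
    """Pull the Zestimate dollar amount out of a response document."""
    root = ET.fromstring(response_xml)
    amount = root.find(".//zestimate/amount")
    return int(amount.text)

# Canned fragment of a GetSearchResults-style response, for illustration only:
SAMPLE = """<searchresults>
  <response><results><result>
    <zestimate><amount currency="USD">285000</amount></zestimate>
  </result></results></response>
</searchresults>"""
```

From here the parsed amounts can be dumped into whatever store AzureML will read from.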
So I thought it would be beneficial to discuss Angular, Web API and Azure in some depth, as well as provide an entire set of functioning code. I will start by addressing a few questions: what is Angular, what is Web API, and what is Azure? That is followed by the code and an explanation of it. The code itself provides a simple website, which has RESTful routing, requests for processing, and lists out data from a database acquired from said processing.
The answer to these questions is pretty much always the same. Step 1: learn about it and build one piece of software focused on that goal. Step 2: go for it, just do it. That said, Microsoft has a fantastic resource, Microsoft Virtual Academy, which provides free training around various topics from entry level to advanced. This article focuses on a learning plan with MVA to attain the goal of becoming an Analytics Developer.
This should prove to be an interesting series of posts coming up, as I am working on a new project that is very unique and interesting. The idea is to use incoming data from Arduinos, Raspberry Pis, Galileos, Edisons and an assortment of other IoT-type devices connected to oil and gas pipelines to determine if a leak is currently in progress, and also to predict if a leak is likely to occur in the future based on current and trending conditions.
My part in the project is all back end analytics, and I have very little to do with the actual telemetry and hardware. The telemetry will be posted using Azure Event Hubs, and thus my portion of the project begins with mocking that real-time data at a large enough geo-dispersed scale that I can develop a system that can handle it, and then switching my configuration to consume from the production event hubs. Since I am no longer a consultant working on projects with trade secrets, and everything these days is about the elevation of skills in the community, I have posted everything on GitHub so you can download and peruse it at your leisure. Please note that this is in progress, so the GitHub source may not necessarily work when you look. I'll try to enforce a standard of commenting "working – comment" on pushes to the repository. The git repository is located here: https://github.com/drcrook1/OilGas
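Before the real feed exists, the telemetry has to be faked convincingly. A deterministic sketch of the kind of geo-dispersed pipeline readings I mean; the field names and value ranges are my illustration, not the production schema, and the actual project code on GitHub may differ:

```python
import json
import random

def mock_readings(n_sensors: int, seed: int = 42) -> list:
    """Generate one JSON reading per simulated pipeline sensor.

    Each sensor gets a fixed lat/lon marching along a fake pipeline run
    and a pressure reading in PSI; a seeded RNG keeps runs reproducible.
    """
    rng = random.Random(seed)
    readings = []
    for sensor_id in range(n_sensors):
        readings.append(json.dumps({
            "sensor_id": sensor_id,
            "lat": 26.0 + sensor_id * 0.01,      # march north along the line
            "lon": -80.1,
            "psi": round(rng.uniform(400, 900), 1),  # made-up pressure range
        }))
    return readings
```

Each string here would become the body of one Event Hubs message, so swapping the mock for the production feed is just a configuration change on the consumer side.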
This article is one of those that is going to help remind me how to do this deployment, as it can be a bit tricky. If you are working with F# for web jobs, like I have started doing, there are a few steps:
Create a new console application
Add proper nuget packages
Manually add a .dll reference and copy said .dll to output
So you are going to notice a slight shift in this blog to start incorporating not only video game development, but also hardcore data analytics. As part of that shift, I am going to start incorporating F# into my standard set of languages, as it is the language of hardcore data analytics if you roll with the .NET stack.
This particular article is about building a console-based blob manager in F# instead of C#. The very first thing I noticed about using F# to manage my blobs as opposed to C# is the sheer reduction in lines of code. The code presented here is a port of the C# article located here. This code will eventually make its way into a production system which is part of a big data solution I am building. New data sets that we acquire will be uploaded into blob storage, with an entry stored in a queue containing a link to the data set. Once a job is prepared to run, the data will be moved to Hadoop for processing and then stored in its final location. So step 1 is… store data in blob storage.
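The moving parts of that pipeline are: upload the data set as a blob, then drop a queue message that links to it. The real implementation is F#; this is a language-neutral Python sketch of just the URL and queue-message shapes (account, container and field names are placeholders):

```python
import json

def blob_url(account: str, container: str, blob_name: str) -> str:
    """Public URL of a block blob in Azure Storage."""
    return f"https://{account}.blob.core.windows.net/{container}/{blob_name}"

def dataset_queue_message(account: str, container: str, blob_name: str) -> str:
    """Queue entry pointing a later Hadoop job at a newly uploaded data set.

    The message carries the link rather than the data, so the queue stays
    tiny no matter how large the data set is.
    """
    return json.dumps({
        "dataset": blob_name,
        "url": blob_url(account, container, blob_name),
        "status": "pending",
    })
```

A worker draining the queue later resolves the URL, pushes the data into Hadoop, and moves it to its final location.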
Welcome to Part 2! We will be discussing binary classification. I hope many of you have started using AzureML; if not, you should definitely check it out. Here is the link to the dev center for it. This article series will focus on a few key points:
Understanding the Evaluation of each Model Type.
Understanding the published Web Service of each Model.
If you are looking for a simple how-to to get started, check out this article.
The series will be broken down into three parts.
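To ground the evaluation discussion before diving in: every headline number AzureML reports for a binary classifier falls out of the confusion matrix. A quick hand-rolled sketch of those calculations (not AzureML's code, just the same arithmetic):

```python
def confusion_matrix(y_true, y_pred):
    """Count (TP, FP, TN, FN) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def metrics(y_true, y_pred):
    """Accuracy, precision and recall, the figures the Evaluate step reports."""
    tp, fp, tn, fn = confusion_matrix(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Keeping these definitions in mind makes the evaluation blade much easier to read: precision answers "when it said yes, was it right?", recall answers "did it find the yeses?".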