My Production Data Science Workflow

Hello World,

So I’ve spent a while now looking at 3 competing languages, and I did my best to give each one a fair shake. Those 3 languages were F#, Python and R. I have to say it was really close for a while, because each language has its strengths and weaknesses. That said, I am moving forward with 2 languages and a very specific way of using each one. I wanted to outline this because it took me a very long time to learn all three languages well enough to reach this conclusion, and I would hate for others to go through the same exercise.

Language Dropped

Let’s start with the language I dropped.  It is with great disappointment that I am dropping Python.  This was honestly a very difficult decision.  Here is why.

  1. Too many languages to know for my limited brain.
  2. It serves the same purpose as F#, but not necessarily the same purpose as R.
  3. F# won the client device battle as well as the cross-platform battle.
  4. Python’s tool chain for production code delivery is not nearly on par with F#’s.  This was the largest drawback.

Why was this decision hard?  TensorFlow and CNTK really made this challenging.  CNTK is shipping Python wrappers next month and TensorFlow already has Python wrappers.  Advanced GPU computing is really something I wanted in on.  But some quick research around F# machine learning and GPU revealed that, hey, F# has had advanced machine learning on GPU for a long time.  The caveat is documentation, but I’m pretty good, so I can figure it out.

OK, so this isn’t an article about language battles.  On to the workflow.

So I have R and F# left.  I am happy with this decision.  Here is the new workflow.

Initial Engagement

Most engagements begin with a customer saying “Here’s my data!”.  R is perfect for this: super fast prototyping and data exploration.  It’s not the most production-ready language, but it is super strong, with great visualizations, efficient data handling and so on.  Not only that, I can take this code and put it into SQL Server and Power BI.  It also runs in AzureML and a variety of big data platforms where performance may not be the most critical aspect.  R makes me very powerful very quickly.

Production Delivery

This is where F# comes in.  By this point I have figured out generally what I want to do, and I have delivered some value with R.  Often, though, you need to build APIs, micro-services, client applications, big data compute that is lightning fast, and so on.  This is where F# really shines.  It’s my one-stop shop for production-level coding, integrating into the familiar and well-developed .NET tool chain as well as other cloud-based services.  Beyond this, seeing F# being adopted so readily by Xamarin gives me hope for the future of the language and its adoption.  It’s open source, as are most of the libraries for it, so I suspect academia and research should pick it up a bit more readily.

Conclusion

Using this workflow and these technologies, I as an individual can deliver full stack projects with advanced analytics at the heart and soul of them, quickly and easily, focusing more directly on the problem at hand.

Many folks know evangelists like myself primarily as talkers or presenters, but the other aspect of our job is to deliver value, typically in a highly competitive ecosystem.  This is often done solo due to the distributed nature of our position and our geographic responsibilities.  As a result, I need the tooling that empowers me most as an individual, so that I can deliver value to the folks in my community in a fast, effective and efficient way.

I experimented with the other workflows and have delivered production code with them.  However, I have personally found this one the most effective and will be standardizing on it.

Final Notes

I’ll still post some Python blog articles for kicks, but I will likely not be shipping production code with it.
