Battle of the Programming Languages

Hello World!

This article offers some guidance on which programming language to use.  Note that it is specifically geared towards delivering code in which intelligence and information are the soul of the product.  In this day and age, that should be every product.

I want to preface this article with a few things:

  1. This is an excerpt from a paper I wrote of my own volition for internal use.  Because of that, I was able to remove all confidential information and publish my findings.
  2. I only analyzed F#, C#, R and Python.  I know there are more languages, but I picked the top dogs, plus F#, whose special circumstances made me feel it belonged.

The Problem

Today, client devices are becoming increasingly powerful.  The simple Lumia 830 smartphone that currently resides in my pocket has 1 GB of RAM, a 128 GB SD card and a quad-core processor for computation, as well as integrated graphics.  It also has an array of sensors.  This sort of power is becoming increasingly common on devices this size, as well as on other form factors that often have even more power.  It means that devices can perform computationally intensive tasks without server connectivity and can enable peer-to-peer networking schemes if necessary, which is powerful in emerging markets.

Software development is becoming more complex as each language and stack becomes more and more empowered.  This is driving innovations such as Node.js, with which JavaScript can be the language of choice across embedded technologies, databases, web serving and UI, simplifying the skills needed to deliver a product.  We also see technologies such as the Python wrappers around the Plotly wrappers around D3.js, which enable advanced interactive visualizations from a traditionally scientific computing language.

Needs of an Intelligent Applications Developer

A developer has a host of needs; however, from an intelligent applications perspective, they can be broken down into a few key categories.

  1. Experimentation
    1. And conversion to production code
  2. Data Ingestion & Manipulation
  3. Data Visualization
  4. Machine Learning Capabilities
  5. Modern Application Targets
    1. Includes connected as well as disconnected experiences.

Languages and Platform Integrations Overview

A few key languages are applicable to these scenarios: C#, F#, R and Python.  Below is a chart of the languages and key tie-ins, with further explanations to follow.  A score of 1 is poor performance and 5 is excellent performance in each category.  After the chart, I break down by task area why each language received the score that it did.

                           C#   F#   R    Python
Experimentation            2    4    3    4
Data Ingest & Manipulate   2    4    5    4
Visualization              2    5    4    4
Machine Learning           3    4    4    5
Modern Apps                5    4    2    3
Total                      14   21   18   20
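
As a quick check on the arithmetic, the totals row can be recomputed from the category scores.  This is only a minimal sketch; the names `LANGUAGES`, `SCORES` and `totals` are made up for illustration, and the numbers are copied from the chart.

```python
# Scores copied from the chart; each list is ordered C#, F#, R, Python.
LANGUAGES = ["C#", "F#", "R", "Python"]
SCORES = {
    "Experimentation":          [2, 4, 3, 4],
    "Data Ingest & Manipulate": [2, 4, 5, 4],
    "Visualization":            [2, 5, 4, 4],
    "Machine Learning":         [3, 4, 4, 5],
    "Modern Apps":              [5, 4, 2, 3],
}


def totals(scores):
    """Sum each language's column across all categories."""
    return {
        lang: sum(row[i] for row in scores.values())
        for i, lang in enumerate(LANGUAGES)
    }


if __name__ == "__main__":
    for lang, total in totals(SCORES).items():
        print(f"{lang}: {total}")
```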

Experimentation

Experimentation is a key task in intelligent application development, as well as in application development as a whole.  It happens in an area separate from your production code, in which you can build out sections of code, see how they behave and then migrate them into your production environment.  The key here is that not only do you experiment, but the experiment must be representative of how the code will live in a production environment.  Below is the breakdown for each language.

C#

C# received a 2 for the following reasons

  • C# has a REPL.
    • .csx files are difficult to find and add to an application.
  • C# does not have
    • IntelliSense for dynamic types
    • Easy integration of dispersed data sets.  Importing TSV, CSV and other dirty data sources is difficult in C#.
    • Ad-hoc visualizations

F#

F# received a 4 for the following reasons.

  • F# has a REPL.
    • Referencing external libraries is tedious and requires more steps than in Python and R.
    • It is easy to surface and experiment with code in the REPL; however, there is no good way to inspect the current REPL environment.
  • REPL to production is fantastic.
  • Type providers make it easy to connect disparate systems and data formats
    • It is more verbose than R.
  • Ad-hoc visualizations with XPlot (plotly wrapper)

R

R received a 3 for the following reasons.

  • Experimentation is phenomenal in R; however, code has a tendency to behave differently when brought into a production system or when it encounters cases that were not explicitly tried in the REPL.
  • The REPL and production code environment are completely integrated, and this causes migration issues.
  • Ad-hoc visualizations with a variety of choices
  • Great dispersed data integration; however, systems integration is difficult during experimentation.

Python

Python received a 4 for the following reasons.

  • REPL
  • Dispersed Data & Systems
  • Ad-hoc visualizations
  • Movement into production codebase is seamless and code behaves as expected.
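
One way to keep an experiment representative of production is to carry the REPL session along with the code.  The sketch below uses Python's standard `doctest` module for that; the `normalize` function and its examples are hypothetical and only illustrate the workflow, not anything from the original paper.

```python
import doctest


def normalize(values):
    """Scale a list of numbers to the range 0..1.

    The examples below stand in for a (hypothetical) REPL session;
    doctest replays them against the production function.

    >>> normalize([2, 4, 6])
    [0.0, 0.5, 1.0]
    >>> normalize([5, 5])
    [0.0, 0.0]
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid dividing by zero when every value is identical.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    # Re-run the REPL examples embedded in the docstring.
    print(doctest.testmod())
```

If the function later drifts from what was tried interactively, the replayed examples fail, which is roughly the "code behaves as expected" property described above.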

Data Ingestion and Manipulation

Data ingestion and manipulation need to be clean and concise.  This section covers grouping, sorting, filtering, selecting and dealing with NA (missing) data quickly, easily, intuitively and in a fashion that is easy to read and understand.
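
To make these operations concrete, here is a minimal sketch in Python, one of the four languages compared, using only the standard library.  The inline CSV data and the helper names `load` and `mean_by` are invented for illustration; in practice a dataframe library such as pandas would shorten this considerably.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical CSV with a missing ("NA") reading, inlined so the example runs as-is.
RAW = """device,region,reading
phone,east,10
phone,west,NA
tablet,east,30
phone,east,20
tablet,west,50
"""


def load(text):
    """Ingest: parse CSV rows, converting 'NA' readings to None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        value = row["reading"]
        row["reading"] = None if value == "NA" else float(value)
        rows.append(row)
    return rows


def mean_by(rows, key):
    """Filter out missing readings, group by `key` and average each group."""
    groups = defaultdict(list)
    for row in rows:
        if row["reading"] is not None:
            groups[row[key]].append(row["reading"])
    # Sort the groups by descending mean.
    return sorted(
        ((k, mean(v)) for k, v in groups.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


if __name__ == "__main__":
    print(mean_by(load(RAW), "device"))
```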

C#

C# receives a 2 here for the following reasons

  • LINQ, though powerful, is only usable on objects.  A collection of objects can in theory be thought of as a dataframe; however, you have to import the data and spend extra time coercing it into this format.
  • Multi-dimensional data sets are difficult to deal with and slow.
  • There are packages out there to help deal with this; however, most require a substantial amount of verbosity to achieve the desired result and are often slow.

F#

F# received a 4 for the following reasons

  • F# was docked primarily for the following
    • As compared to R, F# is more verbose
    • Renaming ingested types after the fact is difficult/impossible
  • The forward pipe operator provides significant cleanliness
  • Great built-in manipulation features, which are further improved with Deedle.
  • F# is the only one of the four languages with active patterns and discriminated unions.

R

R received a 5 for the following reasons

  • Ingesting data is very easy
  • Manipulating data is very easy and clean
  • Comparatively, R does a great job with the dplyr, magrittr, tidyr and lubridate packages.
  • That said, R does have some downsides: dates are still difficult, and coercion can create issues that do not surface in an easily debuggable fashion.

Python

Python received a 4 for the following reasons

  • Docked primarily for verbosity in relation to the other languages
  • Python is not consistent about functional paradigms and pipe operators within its primary packages.
  • Ingestion requires only light verbosity
  • Ability to manipulate multi-dimensional data is good.
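
Python has no built-in pipe operator, but the left-to-right style that F#'s forward pipe and R's magrittr provide can be approximated with a small helper.  The `pipe` function below is a hypothetical illustration built on `functools.reduce`; it is not part of the standard library or of any package named in this article.

```python
from functools import reduce


def pipe(value, *steps):
    """Pass `value` through each function in `steps`, left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Made-up readings, with None standing in for missing data.
readings = [12, None, 7, 30, None, 18]

result = pipe(
    readings,
    lambda xs: [x for x in xs if x is not None],  # drop missing values
    sorted,                                        # ascending order
    lambda xs: xs[-2:],                            # keep the two largest
    sum,                                           # add them up
)
print(result)
```

Because every package would need to agree on such a helper for it to feel native, this also illustrates the consistency complaint in the list above.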

Data Visualizations

Data visualization is the ability to render your data in meaningful ways, clearly and concisely conveying what is important.  This category takes into account the ability to deliver visualizations within a single language to the modern application world.

C#

C# received a 2 in this space for the following reasons.

  • The charting libraries that can be used from C# alone are outdated and offer no path to modern application development.
    • Most are forms based
    • Some target the web, but you mostly have to jump into JavaScript.
  • No known interactive charting libraries for pure C#.
  • That said, there are C#/XAML control combinations that work; however, I know of no option for ad-hoc visualizations in pure C#.

F#

F# received a 5 in this space for the following reasons.

  • XPlot enables modern web visualizations built on Google Charts and Plotly
  • Fallback visualizations are good and client based, working in Xamarin.
  • Does have databinding for client visualizations using the primary charting libraries.
  • Works from F# Interactive.

R

R received a 4 in this space for the following reasons.

  • ggplot2 remains one of the best static visualization libraries out there.
  • The Plotly R API wrappers are phenomenal; however, you have to pay for licensing to bring that to production.
  • ggvis is another good option for interactive graphics, but again production is expensive.
  • Databinding only exists in ggvis through Shiny Server interaction, which is not ideal and also expensive.

Python

Python received a 4 in this space for the following reasons

  • Plotly wrappers are great and well maintained.
  • Fallback visualizations are powerful.
  • No databinding

Machine Learning

This category focuses on each language’s ability to perform machine learning tasks in several scenarios: big data and small data, as well as wrappers and delivery mechanisms.

C#

C# receives a 3 for the following reasons.

  • There are many ML projects out there which you can purchase or use with C#.
  • C# can take advantage of all of the F# ML libraries, which is partly why it is a 3; however, those libraries do not necessarily feel natural from C#.
  • No CNTK or TensorFlow wrappers.
  • Has CUDAfy.NET for GPU processing; however, no higher-level processing wrappers.
  • No Azure ML support.
  • Big Data Support

F#

F# receives a 4 for the following reasons.

  • No CNTK or TensorFlow wrappers
  • Can use all of R’s packages through R Type Provider
  • No Azure ML support.
  • Distributed and GPU compute is well supported
  • Many high level wrappers around other common ML libraries
  • Active patterns are very conducive to these types of workloads.
  • Big Data Support

R

R receives a 4 for the following reasons.

  • No CNTK or TensorFlow wrappers
  • Does have GPU support
  • Lots of pre-built packages for various ML workloads that are easy to use
  • When data gets large, you need to move to paid-for libraries, such as those in Microsoft R Server (formerly Revolution R Enterprise).

Python

Python receives a 5 for the following reasons

  • CNTK & TensorFlow wrappers with ability to deploy C++
  • GPU Support
  • Loads of other wrappers
  • Big Data support with just infrastructure costs.

Modern Application Integration

This category takes everything from earlier and assumes you are attempting to deploy the code to a modern application.  A modern application is defined as a web application, a cross-platform mobile application or a standard desktop application for Mac and Windows.  This takes into account connected as well as disconnected experiences.

C#

C# received a score of 5 for the following reasons.

  • C# is the best language for this, bar none.
  • .NET templates, .NET Core and Xamarin support.
  • Test frameworks
  • Documentation
  • Scaffolded Applications

F#

F# received a score of 4 for the following reasons.

  • Easily integrated into C# and served out.
  • It often lags behind C# in support
    • Example: .NET Core compatibility
    • .NET templates
    • Scaffolded Applications
  • Anything C# can do F# can do
    • Minus platform limitations and templates
    • Greatly increases F#’s delivery capabilities as it rides the C# wave.
  • Web, Client, Classic all covered on all platforms.

R

R received a score of 2 for the following reasons.

  • To even deploy R as an executable, you need to wrap it.
  • Can be delivered through F#’s Type Provider and then through C#.
    • It’s a workaround, but this enables ASP.NET and Xamarin through 2 integration layers.
  • Has to be deployed to something like R Server.
    • Security and integrations are difficult.
  • Licensing is very restrictive: R is GPL licensed, and if R is packaged and deployed with your application, you may be forced to open source your entire product.

Python

Python received a 3 for the following reasons.

  • Python does have Flask and Flask templates for various platforms.
    • However, performance lags behind C# and F#.
  • Python can generate executables for classic desktops across platforms.
  • There is currently no path to deploy Python to mobile applications aside from web back-ends.

Microsoft Data Platform & Language Investments

This section is simply a chart of Microsoft Data Platform investments and the languages each one supports.  Note that not all platform offerings are listed here, simply some of the more high-impact ones.

  C# F# R Python *
SQL Yes T-SQL
HD-Insight Yes Yes Yes Yes
Azure-ML Yes Yes Consume All
Data Lake Yes Sort-Of U-SQL
Data Factory
Cognitive Services Yes Yes Yes Yes ALL
Power BI Yes JavaScript
Stream Analytics T-SQL
IoT/Event Hub Yes Yes C/C++
Web Apps Yes Yes Yes Many Others
Azure Storage Yes Yes Sort-Of Yes Most other languages
Azure Batch Yes Yes Yes
Azure Rest Yes Yes Yes Yes ALL
Service Fabric Yes Yes Sort-Of Yes Anything with .exe
Visual Studio Yes Yes Yes Yes Most

What Does this Mean?

From a platform support perspective with a focus on data workloads, C# is the number 1 language with F# behind it.  Python comes in third while R comes in last.  From an intelligent applications developer’s perspective this ranking is F#, Python, R and then C#.

            Platform Perspective   AI Dev Perspective
1st Place   C#                     F#
2nd Place   F#                     Python
3rd Place   Python                 R
4th Place   R                      C#

Of course, this has to be balanced against the fact that, from an AI dev perspective, R has no real options for disconnected execution, and Python misses all disconnected modern client scenarios, being relegated to the classic desktop in those cases.

This means that every language has gaps and there is no optimal path for an AI dev to move forward with.  Each language has its benefits and its cons, and no one language executes across the board on functionality and platform adoption.  If we award each language points for its position from each perspective (4 points for 1st place, 3 for 2nd, 2 for 3rd and 1 for 4th) and add those points to our chart, we get the following ordering.

                           C#   F#   R    Python
Experimentation            2    4    3    4
Data Ingest & Manipulate   2    4    5    4
Visualization              2    5    4    4
Machine Learning           3    4    4    5
Modern Apps                5    4    2    3
Platform View              4    3    1    2
AI Dev View                1    4    2    3
Total                      19   28   21   25
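
The placement points can be derived mechanically from the two orderings.  The sketch below is only illustrative: the function and constant names are invented, the base totals come from the first chart and the orderings from the placement table.

```python
LANGUAGES = ["C#", "F#", "R", "Python"]

# Category totals from the first chart (before placement points).
BASE_TOTALS = {"C#": 14, "F#": 21, "R": 18, "Python": 20}

# Orderings from the placement table, first place to fourth place.
PLATFORM_ORDER = ["C#", "F#", "Python", "R"]
AI_DEV_ORDER = ["F#", "Python", "R", "C#"]


def placement_points(order):
    """1st place earns 4 points, 2nd earns 3, 3rd earns 2 and 4th earns 1."""
    return {lang: len(order) - position for position, lang in enumerate(order)}


def final_totals():
    """Add both sets of placement points to each language's base total."""
    platform = placement_points(PLATFORM_ORDER)
    ai_dev = placement_points(AI_DEV_ORDER)
    return {
        lang: BASE_TOTALS[lang] + platform[lang] + ai_dev[lang]
        for lang in LANGUAGES
    }


if __name__ == "__main__":
    for lang, total in final_totals().items():
        print(f"{lang}: {total}")
```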

This chart clearly shows that F# and Python are our best candidates across the board; however, each is lacking in some key areas.  F#’s weakness is primarily lack of adoption, which I believe is partly due to it not being supported at the same time as C#.  For example, ASP.NET Core released with C# support but no F# support.  Python’s primary weakness is delivery to modern mobile applications.

Summary

A variety of new needs are emerging with the progress of technology and the rise of data science to the forefront of the technology world.  With that said, it is my view that F# provides the most complete offering for this paradigm shift.  That paradigm shift is defined as data science and intelligent applications becoming the central pivot of most computational workloads in everyday applications.
