Analytics · R

R vs. Python?

David Feldman Data Analyst at Scribd

June 7th, 2015

Both R and Python are popular languages used to perform data analysis tasks. From what I understand, Python is a great general-purpose language, and R's functionality is developed specifically with statisticians in mind. I've heard people argue both sides, but I wonder which is better for daily use?

Benjamin Olding Former Co-founder, Board Member at Jana

June 7th, 2015

I did a phd in statistics.  Everyone used R.  I didn't know R (I was not a stats undergrad), and it seemed magical: everyone was using it to solve everything.  So, I invested time learning it.

I was pretty disappointed.  It really seemed like the result of a small community only knowing a single scripting language.  You can do pretty much anything with pretty much any language.  Why would you want to though?  This isn't a case of best tool - it's just the only script tool for that community (or was at the time - I think it's changing, mercifully).

If you already know R, can accomplish the task with R, and don't know python, I can't see a reason for you not to just use R to solve your problem.

If you already know python, then check out pandas and numpy/scipy.  When I was in grad school, these tools didn't exist, and as a result, I would have told you then that it made more sense to use the packages already in R than code the specialized routines you needed in another language.  Even so, R is just awful at manipulating data; I'd usually manipulate the data into the form I wanted outside R, then use read.table to read it in and pass it through the least amount of R code I needed to get the analysis done.  I was hardly alone: in fact, many of my fellow grad students just wrote everything in C++ for their dissertation, using R just as a way to easily bang out graphs when needed.  

Now that these python-based tools and libraries exist, however, I see no reason for a python programmer to not turn to them first, regardless of what you may hear about R.

If you do not know either R or python, please just learn python with pandas; this is the future.  There is nothing inherent to the R language that makes it superior - it just has a lot of packages already written for it.  However, that advantage decreases every day as more people contribute to pandas and numpy.  I love stats - but the ideas behind statistical analysis aren't "owned" by a programming language.  Python didn't really exist when S was created (the precursor to R).  S+ and then R had real advantages over other script-based languages for a long time.  It's just no longer the case.
Python can realistically be used for 20 other things, unlike R, and the reality of analysis is usually that more than 50% of the work is getting the data into a usable form.  R just fails at this.  As a result, I used a lot of awk and sed; but python will get things done too.  I only turned to awk and sed because R was so terrible at manipulating real-world raw data.  R does a fine job at analysis once you have things in table form, but it doesn't do a better job at it than python if the routine exists in both languages (and, unless you're doing something pretty obscure at this point, it likely does).
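To make the "getting the data into a usable form" point concrete, here is a minimal sketch of the kind of cleanup that often gets pushed out to awk and sed, done with only the Python standard library. The input format, the column names, and the helper names `clean_rows` and `to_csv` are made up for illustration; real raw data will need its own rules.

```python
import csv
import io
import re

# Hypothetical messy input: mixed delimiters, stray whitespace,
# a comment line, a blank line, and a non-numeric score.
RAW = """\
# id; name ; score
1;  alice ;  3.5
2 , bob,4

3|carol| n/a
"""


def clean_rows(text):
    """Yield (id, name, score) tuples from messy delimited text."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Accept ';', ',' or '|' as the field separator.
        fields = [f.strip() for f in re.split(r"[;,|]", line)]
        if len(fields) != 3:
            continue  # skip malformed rows rather than guess
        ident, name, score = fields
        try:
            value = float(score)
        except ValueError:
            value = None  # treat unparseable scores as missing
        yield int(ident), name, value


def to_csv(rows):
    """Render cleaned rows as CSV text, writing missing scores as empty cells."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "name", "score"])
    for ident, name, score in rows:
        writer.writerow([ident, name, "" if score is None else score])
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(clean_rows(RAW)), end="")
```

The resulting CSV is in plain table form, so it could be read by either pandas or R's read.table-style functions afterwards.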

I really don't see a trade-off on this one.  Unless you already know R for some reason, I believe the answer to your question is python, full stop.

Hasan Diwan contract Data Scientist to several startups

June 7th, 2015

As with most such questions, it depends. Python was designed by a computer scientist; R by statisticians. The personalities of the designers of each shine through in their use.

Ana Echeverri Visual Analytics, Predictive Analytics, Enterprise Software

June 7th, 2015

I would say it depends on what you are trying to do. I use both R and python+scikit-learn. If I am just doing statistical modeling or data mining I prefer to use R. If however I need the analysis to be part of a web app I prefer to use Python. But the bottom line is I can probably achieve the same results from the analysis perspective using either one. Ana

Dan Oblinger Founder at AnalyticsFire

June 7th, 2015

I second Benjamin's opinion.  Scripting in a general-purpose language that has libraries like pandas is nearly always a better experience than working in a purpose-built language that was extended after the fact to be general purpose.

Just one example to illustrate the point.  In R, certain operations on a data frame sometimes return lower-dimensional objects and sometimes do not.  I think the rules originated when the operators were specialized statistical steps.  R has since been extended to handle everything general-purpose languages do, but not in the simplest, cleanest way.  In Python the core language was designed cleanly first, and the pandas DataFrame was added later, so it does not 'pollute' other operations (like textual manipulation of data in a file).

Hasan noted that Python graphing is primitive compared to R.  I do agree on this point.
I generally write up a small python function that dumps the R statements into a file in /tmp and then invokes R on that file.  (Once this is done, that graphing tool is available directly within python.)
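A rough sketch of that approach, in case it helps. This is not my exact code: the function names (`write_r_script`, `run_r`, `histogram_statements`) are invented for illustration, it assumes the `Rscript` command is installed and on your PATH, and it does no escaping of the output path.

```python
import subprocess
import tempfile


def write_r_script(statements, directory=None):
    """Write R statements to a temporary .R file and return its path.

    With directory=None the file lands in the system temp dir
    (typically /tmp on Linux).
    """
    with tempfile.NamedTemporaryFile(
        "w", suffix=".R", dir=directory, delete=False
    ) as fh:
        fh.write("\n".join(statements) + "\n")
        return fh.name


def run_r(statements):
    """Dump the statements to a file and run them with Rscript.

    Assumes Rscript is on PATH; raises CalledProcessError if R fails.
    """
    path = write_r_script(statements)
    subprocess.run(["Rscript", path], check=True)
    return path


def histogram_statements(values, png_path):
    """Build the R statements that draw a histogram into a PNG file.

    Assumes png_path contains no quotes or backslashes.
    """
    vec = ", ".join(repr(float(v)) for v in values)
    return [
        f'png("{png_path}")',
        f"hist(c({vec}))",
        "dev.off()",
    ]
```

Usage would look like `run_r(histogram_statements([1, 2.5, 2.7, 4], "/tmp/hist.png"))`. Keeping the script-writing step separate from the step that launches R makes it easy to inspect the generated file when a plot comes out wrong.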

Hasan also noted other statistical functions that R has and python does not.  Certainly true, but if you listed the algorithms in scipy and scikit-learn, I am positive there would be many not found in R.

My only disclaimer: I am not a hard-core stats guy.  I am doing ML and lots of data preprocessing.
So I cannot assess the completeness of the Python environment from the perspective of a stats guy.

Shobhit Verma Ed Tech Test Prep

June 7th, 2015

I have degrees in Statistics as well as Computer Science. I love and use R for exploration, and once I have played with the data and figured out what model would generalize best, I use python to create a production version of the algorithm that scales.
If you do not want to learn python you may be able to go very far using Revolution Analytics support. However, I just prefer rewriting in python as it allows me to be more in control of the various optimizations at scale.

Micah Stevens Software/Hardware Engineer

June 7th, 2015

I'd suggest Python for general purpose stuff. It has a much larger ecosystem, more libraries, and can be used fruitfully in almost any manner. You might find that most problems are already solved; for 75% of your tasks it's more a matter of integration than programming.

Don't underestimate the importance of community support, and having a large community. Python is likely several orders of magnitude greater in this respect. This means you have more people to ask for help, you have more and better tools to work with the language, and it's better understood. 

I've also seen people use Python in modeling and simulations, so I know it's capable in that realm if it's a requirement, although I wouldn't be surprised if it's not as good as R for what R was designed for. 

R is a statistical tool

Python is a programming language

This is a huge and fundamental difference between the two which makes any further comparison redundant. I would say that daily use of Python would be by a developer and for R would be by statistician, plain and simple.


June 7th, 2015

It depends on what you mean by "daily use". Here are a couple of scenarios:

1. If you are building a generalized web platform whose user engagement use-cases go beyond data and statistical dashboards, then Python is going to be more resourceful: it has full-stack web frameworks that can assist with web development, and it provides a more productive ecosystem than R for web dev.

2. If your daily chores and product require a lot of data analysis and predictive modeling based on large sets of data, my bias is that R is the better fit and makes it easier to attain your goals.

Bojan Tunguz Chief Data Scientist at Tunguz Consulting LLC

June 7th, 2015

Another consideration might be performance. In my experience Python is much faster than R, which can be a serious issue for large data sets. 

Alexandre Bellaiche Project Manager at Microwave Vision Group

March 6th, 2017

If you have the data already gathered (for example in a CSV file), R will allow you to manipulate and clean it very quickly.

Otherwise, I would recommend Python, which can do everything that R can do by using a few additional libraries (pandas in particular). On the plus side, you can also connect to databases, write scripts for web-scraping, etc.

If you have a specific business need, I can provide personalized help.