R vs SAS (Comparison and Opinion)

Background

PC or Mac, Windows or Linux, Intel or AMD: we geeks simply love comparing things. This particular comparison, although not known in popular culture, is an oft-repeated argument in the analytics industry.

SAS needs no introduction; those who need one can check out the Wikipedia article as well as the LearnAnalytics SAS training section.

R, or rather the R statistical package, is very simply put the open source equivalent of SAS. For what it’s worth, R can do pretty much everything SAS can do in terms of statistical analysis, and there are some pretty cool things R can do which SAS can’t. Say you want to build a predictive model using logistic regression: R can do it. ARIMA models, yes; decision trees, yes; association rule mining, yes; and so on.

Anything you envisage using SAS/STAT for in statistical analysis and data mining, R can do.

What makes R Special?

So what if R can do everything SAS can? There are others, like SPSS, Statistica and so on, which can also do pretty much what SAS can do.

Yes, but is the other software free? Therein lies the crux of the whole argument: R is free. It’s an open source project that started in New Zealand and is now considered one of the best statistical analysis tools in the world.

What’s the argument? Isn’t R always better?

It’s not that simple. Linux can do everything Windows can and more, but Windows still dominates. One of the biggest reasons for continued Windows dominance is momentum and an easier user experience. In spite of all the advantages Linux offers (better security, far fewer viruses, a comparable user experience, especially in the Ubuntu variants), the common man still prefers Windows. That is not to say Linux doesn’t have its die-hard following and a vibrant support community.

The same goes for R. I have used both SAS and R extensively, and I discuss the pros and cons of both packages below.

Statistical Capability

SAS/STAT and other SAS packages pack a powerful punch and cover almost the whole gamut of statistical analysis and techniques. However, since R is open source and people can submit their own packages/libraries, the latest cutting-edge techniques are invariably released in R first. To date R has almost 15,000 packages in the CRAN (Comprehensive R Archive Network, the site that hosts R and its contributed packages) repository.

Some of the latest techniques, such as GLMNET, RF (random forests) and AdaBoost, are available for use in R but not in SAS. Many experimental packages are also available in R. In fact, in most Kaggle competitions (which require a blog post of their own), the winners (who are amongst the world’s best data miners) have almost invariably used R to build their models.

In this aspect R is the hands-down winner. However, a word does need to be put in for SAS: since SAS is paid software with support, any innovation or new statistical technique has to be vetted and accepted before release. SAS is used in many mission-critical assignments where merely experimental techniques cannot be allowed to creep in. While this is necessary for the environment SAS works in, it also means that SAS will keep playing catch-up with R on the latest innovations. On the other hand, since anybody can upload a package to R, user beware!

Therefore, in terms of pure statistical capabilities, I rate R higher.

Data Handling

Data handling is the bugbear of R. The single largest drawback of R is the way it allocates and handles memory: it tries to load the whole dataset into RAM. This can cause severe problems when working with a combination of large datasets and small computers (which it always is; your data is always huge and your computer is always puny!).

SAS excels at handling large datasets. In fact, server editions of SAS can chew through terabytes of data without any issues, whereas R is very likely to throw out-of-memory errors or become unresponsive and die.

That is not to say that R cannot handle big data. It can, but say I have a laptop with 2 GB of RAM and a dataset running into millions of records: for the same exercise which SAS can do in 30 seconds, R might take up to a few minutes or even die.
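To make the memory point concrete, here is a minimal sketch in Python (not R or SAS) of the two approaches: loading every row before computing, versus reading one row at a time. The column name `amount` and the generated data are made up for illustration, and the comparison to how either package works internally is only a loose analogy.

```python
import csv
import io


def make_csv(n_rows):
    """Build a small in-memory CSV with a made-up 'amount' column."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "amount"])
    for i in range(n_rows):
        writer.writerow([i, i % 100])
    buf.seek(0)
    return buf


def mean_in_memory(handle):
    """Load every row first, then compute: memory use grows with the data."""
    rows = list(csv.DictReader(handle))
    return sum(float(r["amount"]) for r in rows) / len(rows)


def mean_streaming(handle):
    """Read one row at a time: only a running total and a count are kept."""
    total, count = 0.0, 0
    for row in csv.DictReader(handle):
        total += float(row["amount"])
        count += 1
    return total / count


in_memory = mean_in_memory(make_csv(10_000))
streamed = mean_streaming(make_csv(10_000))
print(in_memory, streamed)  # both approaches give the same mean
```

Both functions return the same answer; the difference is that the first needs room for the whole table, while the second would work the same way on a file far larger than RAM.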

However, computing power is cheap and getting cheaper by the day. Given enough RAM and computing power, R can also crunch through large datasets efficiently, especially on 64-bit machines.

But for now, in terms of data handling, I rate SAS higher.

Ease of Use

One of the biggest reasons Linux has never been a runaway success compared to Windows is that it was so damn difficult to use, install or troubleshoot. Now take that problem and multiply it by 10, and you get the idea of R. There is no easy way to put it: R is not for the faint of heart. It is damn difficult to learn compared to SAS.

SAS programming syntax can be considered a high-level language which is intuitive and easy to learn; additionally, it was designed as a DML (Data Manipulation Language). R programming, on the other hand, is a monster.

For example, say you have to do a simple data manipulation task such as sorting a few tables and joining them together. It would be a piece of cake to do this in SQL (any SQL package or even PROC SQL) or with SAS data steps. Now consider doing this in C++ (makes your blood run cold, doesn’t it?).

If SAS programming is high level, more akin to SQL, then R is a low-level language closer to C++. Even simple tasks can mean writing lengthy pieces of obfuscated code.
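For reference, here is roughly what the "sort and join" task looks like in plain SQL. This is a minimal sketch that uses Python’s built-in sqlite3 module only as a convenient way to run the query; the `customers` and `orders` tables and their contents are hypothetical, and a similar SELECT should carry over to other SQL engines with little change.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (10, 2, 250.0), (11, 1, 90.0), (12, 2, 40.0);
""")

# Join the tables on the shared key, then sort by name and by amount (largest first).
rows = conn.execute("""
    SELECT c.name, o.order_id, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.name, o.amount DESC
""").fetchall()

for row in rows:
    print(row)

conn.close()
```

The whole task is one declarative statement: the join condition and the sort order are stated, and the engine decides how to carry them out. Customers with no orders (here, the third one) drop out because this is an inner join.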

Learning R is definitely more challenging than learning SAS, but since R is a true programming language it gives the programmer more flexibility and power than SAS. Mere mortals like the rest of us, though, would prefer to use the SAS programming language.

Support for R is another issue; obscure error messages can literally suck the lifeblood out of somebody who is fairly new to R. There are support groups and forums on the internet, but if you are using a new package and it throws an error, you are on your own.

All in all, for true programmers R is closer to the heart, but for the rest of us who just want to get our work done, SAS is the winner by a mile in terms of ease of use.

 

Recommendation

I have used both R and SAS, and there is no straightforward answer to this. For example, since R is free, it should be cheaper to use, shouldn’t it? Well, not always.

The TCO (Total Cost of Ownership) of using R might actually be higher than that of SAS. For example, an analytics company decides to use R exclusively, figuring that since it doesn’t have to pay for SAS licenses, its cost of project delivery will go down: better profit margins, lower billing to clients, better competitiveness in the market. Win-win, right?

Except now they have to train their consultants on R, or hire outside talent. R programmers are in short supply (especially in India), which drives up your cost of resources for one. Now take into account the learning curve, the deployment cost and the code migrations of client legacy systems, not to mention the obscure tantrums that R can throw at you. And you can’t call anyone for support, since there is none; it’s free software. At least if SAS doesn’t work, you can hold them by their throats. (For the kind of licensing fee they demand, it’d better work!)
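The TCO argument is just arithmetic, and a small sketch shows how a zero license fee can still lose. Every figure below is a placeholder invented to show the mechanics, not a real price, salary or quote, and the cost categories are only the ones named above (licenses, training, scarcer talent, migration, support).

```python
from dataclasses import dataclass


@dataclass
class ToolCosts:
    """Cost inputs for one analytics tool. All figures are placeholders."""
    license_per_seat: float            # per analyst, per year
    salary_premium_per_analyst: float  # per analyst, per year (scarce skills)
    support_contract: float            # per year, for the whole team
    training_per_analyst: float        # one-off
    migration_one_off: float           # one-off, legacy code migration


def total_cost_of_ownership(costs, analysts, years):
    """Recurring costs multiplied by the period, plus one-off costs counted once."""
    recurring = (
        analysts * (costs.license_per_seat + costs.salary_premium_per_analyst)
        + costs.support_contract
    )
    one_off = analysts * costs.training_per_analyst + costs.migration_one_off
    return recurring * years + one_off


# Hypothetical inputs: a licensed tool versus a free one with higher people costs.
paid_tool = ToolCosts(license_per_seat=8000, salary_premium_per_analyst=0,
                      support_contract=0, training_per_analyst=500,
                      migration_one_off=0)
free_tool = ToolCosts(license_per_seat=0, salary_premium_per_analyst=6000,
                      support_contract=10000, training_per_analyst=4000,
                      migration_one_off=50000)

paid_total = total_cost_of_ownership(paid_tool, analysts=10, years=3)
free_total = total_cost_of_ownership(free_tool, analysts=10, years=3)
print(paid_total, free_total)
```

With these made-up inputs the free tool comes out more expensive over three years; change the training or salary assumptions and the result flips, which is exactly why the answer is "not always".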

On the other hand, say you have a startup: a small team of really smart people. Investing in a SAS license may not make sense at this point. They can simply use what I call the RUM stack (R-Ubuntu-MySQL), a pun on the LAMP stack.

That is, use MySQL for heavy data manipulation and use R only for statistical analysis, on a machine running Ubuntu Linux. Everything for free! While this solution may work for a small company with high-calibre programmers, it is not scalable for a 25,000-man consulting organization that runs on processes and adherence rather than individual brilliance.

My choice: if you are small and hungry, go for R. If you are a big organization where budget is not an issue, close your eyes and buy SAS licenses; everybody will be happy (but install R on your laptop nonetheless).

 

Comments


54 thoughts on “R vs SAS (Comparison and Opinion)”

  1. Have you ever considered writing an e-book or guest authoring on other websites? I have a blog based on the same subjects you discuss and would really like to have you share some stories/information. I know my subscribers would value your work. If you are even remotely interested, feel free to send me an e-mail.

  2. Poor analysis. Sorting and joining 3 tables in R is literally 4 lines of code. Let’s not exaggerate.

    Serious troubleshooting/debugging problems are rare b/c the user community is so good that your answers are easily available on stack overflow (and the package/function documentation for items on the official CRAN server are miles better than for SAS in terms of their conciseness and organizational consistency).

    You overstate the big data problems with R, because tools such as Hadoop and SparkR are readily available. And one can also make use of multi-core processing, given the lower level of the R language.

    Tools like SAS, SPSS and Stata are all of the spreadsheet era #antiquated. Once you get beyond computing simple statistics like the mean of a single column, you spend most of your programming time coming up with a workaround to get SAS to do what you want instead of tackling the problem directly. E.g. try looping over columns and applying a function to each, or explore SAS IML.

    R ftw

    • I fourth that. I have used SAS and now I am using R. Using packages like dplyr and piping anyone can write several lines of SAS code in one line in R more clearly.

    • Also I am not sure why people say learning curve in R is steep. I learnt R in a fraction of time compared to the time I spent learning SAS. Tidyverse is a good step for anyone interested in R. Best approach is learning by doing.

      The only way in which SAS is better than R is that it uses disk storage too, while R works in RAM. For everything else R is perfect. I have explored pandas too, but I must say R is far easier to code and follow. While there is no debate between Python and R (the two are complementary), SAS can certainly be avoided.

  3. If statistics or data analysis is your livelihood, you have to learn both SAS and R.

    Yes, SAS is easier to pick up, so if you decide to go with SAS, that’s understandable. That’s what I did. SAS is way ahead in data manipulation. SAS can do lots of what R can do but, as the author said, SAS is a bit behind the curve on the latest stuff. There are some things only R can do and some things that R can do much better than SAS. I have found figures produced by R to be more manuscript friendly. In short, my suggestion is to try out both at first. Choose the one that you find easier to learn, but make sure to pick up the other one over time. Knowing both will make your life much easier as someone who has to produce statistical results as displays that are easy on the eyes of non-statisticians.

  4. I have been using R for a couple of months now and found it pretty easy to learn. It is like any programming language and the basics remain the same. It is a good tool for beginners and medium-size organizations. I have heard SAS is very powerful but only for big enterprises; I hope to work with SAS one day, and then I will be able to compare in practice.

    • Yes! I spent a year with an instructor in grad school trying to learn SAS and it was infuriating and eventually I gave up because I was easily able to teach myself R using the online support groups etc. I really don’t understand why the author thinks learning R is more difficult…

  5. Hi All,
    I am an MCA 2014 passout, currently working in an IT company as a Data Analyst.
    So please tell me which one is the best course for my career. I want to switch to one of them (R or SAS) but am confused about things like ease of switching and which is more in demand. Please reply.

  6. Adding a few important points: comparing SAS with R is not comparing apples to apples. If we compare R with “Complete SAS”, then R is nowhere in the league. If we compare R with Base SAS, then R is better, as it gives for free all the facilities that Base SAS gives.
    R is hard-core programming, but open source (free), and can be used to develop enterprise-wide analytics applications. It has thousands of built-in functions to solve complex analytics problems. For those from a non-programming background, it is a little difficult to master. R is widely used in academia and SMEs.
    SAS is a complete end-to-end solution, from data management to data visualization to ETL to BI reports to advanced analytics. Even with SAS EIS and SAS EF we can develop enterprise-wide, large-scale applications. SAS SCL can also be used as a general-purpose OO programming language like Java or C++. More importantly, SAS EG and E-Miner have user-friendly GUIs, which non-programmers can use comfortably. In industry SAS is widely used for the following reasons.
    1. Data security is very high, for this reason in BFSI domain SAS is no. 1 choice.
    2. SAS has over 250 industry specific “point and click” solution like credit and market risk, Asset/ Performance management, Fraud/ Pricing/ Marketing analytics.
    3. SAS DI is a very powerful ETL tool which augments the power of SAS in Data management.
    4. SAS BI provides the facility to produce world class dashboard in real time.
    5. In corporations, time has more value than product cost. 90% of Fortune 100 companies use SAS.
    6. SAS Visual Analytics analyzes big data in real time with great visualization, like Tableau.

    In future, when competition grows, Base SAS can be made free, like SAS University Edition. SAS is much more than just an analytics tool and it will always remain dominant in future.

  7. Pingback: SAS Versus R (Part 1) | The Big Analytics Blog

  8. Dear
    It’s a nice article for beginners! Can you suggest the best institute in Chennai to enroll in for the R language? It would be really helpful to us.
    Thank you

  9. Hii
    I finished my MBA in May and am very much interested in analytics. Currently I am doing a sales job for a reputed automobile components manufacturing company. I am thinking of pursuing an analytics course so that I can get into the analytics industry. I am very much confused between R, SAS and the latest one, Hadoop. As I am an engineering graduate from the IT stream, programming is not a problem for me. Please suggest one statistical tool, according to future scope, for getting a decent job. And is this one year of my experience going to be an advantage for me or not? Please reply.

  10. We currently use SAS at the company I work for, but we recently ran into issues with the SAS licensing model which, despite a large IT budget, has caused us to choke on the immediate and long-term costs. At this time, our top 2 alternative candidates are IBM SPSS and R. Since the writing of this article (2.5 years ago) I would presume R has been enhanced, and Revolution Analytics now has a commercial offering, Revolution R Enterprise. My question is, have any of the cons against R mentioned in the article been addressed in the most recent release, or do these gaps still exist between R and SAS? Thank you

    • Joe, I wrote this almost 3 years ago. Older and wiser now – I would consider this argument dead.
      SAS and R do not compete much at all. What are you using SAS for? Data manipulation or model building? If data manipulation/MIS/exploration, move to MySQL or Hadoop. If model building, use MySQL for data manipulation and R for model estimation.

      Pretty much all of R’s shortfalls have been addressed, but it’s probably moot now as the world has shifted to Python. But SAS remains SAS: if you can sort out the budget, just go get it. In terms of overall TCO, going or not going for SAS may not have any impact at all. It doesn’t in my company, and we build credit scores and run the credit bureaus in all major geographies in the world.

  11. Hi,

    I have been a data analyst since 2010. I use Excel and SQL. I want to switch my career to SAS. Will I get a SAS job if I take a course outside my organization? There are no resources available in my organization to learn SAS.

    Please advise.

    Regards,
    Satya

    • Hi Satya,

      Since you are a data analyst by profession, I recommend pursuing the SAS advanced and specialization courses. R statistical analysis may also be helpful to you; it is a booming area too, with a lot of scope in IT firms such as Google and in iPhone applications.
      If you do a SAS STAT course, only do it at an accredited SAS India institute, as there are many institutes with fake accreditation. One accredited institute is EPOCH; they are genuine as well as accredited. Based on your experience they also put you forward for placement in good firms. However, it is your responsibility to crack the interviews. Hope this helps. All the best

  12. Hi .. I would like to ask a question.If i learn R extensively can i get a job in any IT industry.Please help me guys.. Thanks..

  13. Hi karan,

    Great article man..
    Can you do me a favor, i have just entered the world of analytics.. Can you please suggest me the software that i should opt to go for to learn.. i am at my very initial stage.. Please help me out..

    Thanks…

  14. Well written article. I mostly agree with what is said.
    R is not by definition a statistical programming language nor a data manipulation language so it will need to be associated with stats packages and SQL server to compete directly with SAS.

  15. As a grad student in community ecology, R is easily the winner. It’s free, and it has powerful packages like vegan that do ordination and other community analyses quickly.

    Personally I prefer the object-based nature of R, especially for basic statistics, because it avoids the proverbial “black box” of the PROCs in SAS. What I mean is that in R you write the exact ANOVA model you want calculated and get the ANOVA table for that model, and then you can perform pairwise comparisons using whichever method you choose (e.g., Tukey’s HSD, Fisher’s LSD). While SAS does this in things like PROC MIXED and PROC GLM, these PROCs spit out obscene amounts of output that obfuscate what you’re after.

    Also the resources for learning R are fantastic (especially the R Cookbook), as are the graphics. While my experience with SAS is limited, I remember that its graphics paled by comparison to those in R (especially the ggplot2 package).

  16. It is also worth noting that SAS’s JMP software embodies the JMP Scripting Language. This is an in-memory programming language that performs much better than Matlab or R for larger datasets. Also, it has amazing dynamic data discovery/visualization/predictive modeling capabilities at your fingertips. It can also run SAS and R programs.

    I was a Matlab addict for years, and I think it offers a better programming environment than R, but I got hooked on JMP within a few weeks of getting used to it. Also, JMP is fairly cheap for a license.

    One other note is that I recently got hooked on SAS Enterprise Guide. After the learning curve, I have found this to be the mother of all data-centric code writing environments. It is impressive software and has great features for enterprise business analytics usage.

  17. I am totally new to Analytics … just have a quick question::::

    Can’t we write R programs and use them in SAS … if not how are algorithms made available in the licensed tools like SAP PA, SPSS, SAS … these tools have the options to integrate with R don’t they … if so don’t we have the option to write R scripts, pack it and then use them in the other tools …

    • Yes. SAS has developed capabilities in calling R and or porting results of R analysis. I have seen it demoed in the SAS Analytics2013 Conference. Search SAS.com for R references.

  18. How can you say that SAS is easier to learn than R?

    Have you ever tried to compute a matrix product with SAS? You will have to write at least one page of code (check for yourself: http://www.lexjansen.com/pharmasug/2010/cc/cc15.pdf)

    With R you just have to write A%*%B, and Bob’s your uncle.

    I have worked with both R and SAS, and I can tell you that SAS is very bad software, full of bugs. If you need to do something other than just computing means or basic linear regressions, SAS becomes a real nightmare!

    • I suppose IML would make it easy. However from a plain business point of view (non research), I still stand by my view. For research purposes R is already the weapon of choice.

      In fact, if you check the Kaggle forums, nobody would be caught dead using SAS! That Python and C++ are coming up over R is a different matter altogether.

  19. I read the below statement in your post…..and could guess you may not have as yet started using the ‘sqldf’ package for R…..addresses the pain you are talking about, in a cool way..see the youtube video showing simple syntaxes..

    “For e.g. consider you have to do a simple data manipulation task such as sorting a few tables and joining them together. It would be a piece of cake to do this on SQL (any SQL package or even PROC SQL) or any of the SAS data steps. Now consider doing this in C++ (makes your blood run cold doesn’t it).”

      • I learned C, Matlab and R sequentially. Now I am learning SAS. I feel that SAS is quite different from the first three languages. It is hard for me to learn. But I think once I start to use it as a daily routine, it won’t be too hard to figure out the logic behind it.

  20. Pingback: Thoughts on Machine Learning – the statistical software R | Florian Hartl

  21. Pingback: Installing Rattle and R | Analytics Training
