Installing Rattle and R

The halo around R continues to grow, and more and more organizations are beginning to explore building capabilities in R programming, as it can potentially deliver cost savings. For more on how R compares with SAS, see our earlier blog entry.

In this post we will take you through the installation of R and Rattle on a Windows 7 machine. Here is a YouTube video showing the capabilities of R on a small credit scoring dataset.

  1. Download R from the website. The link is for the Windows installation; the setup file is the same for 32-bit and 64-bit systems, so you need not worry about which one to pick.
  2. The setup file is an executable. Simply run it and follow the instructions; it should install the basic R software on your system.
  3. An icon should be created on your desktop. On 64-bit systems two icons are created (one for 32-bit R, the other for 64-bit R). If you have a 64-bit system, double-click the Rx64 2.XX icon, where XX is the version number.
  4. The R software interface window should open up. Type in the following commands one after the other, pressing Enter after each statement: install.packages("RGtk2") and then install.packages("rattle"). After the first command, a window will open up asking you to select a CRAN mirror. You can select any CRAN mirror to download the packages from (to be safe, select a US or western Europe mirror to ensure you get the latest versions).
  5. Now run the following commands: library(rattle) followed by rattle().
  6. This is where most Rattle installation errors pop up. In a lot of cases R will throw an error such as “GTK not found” or an error with GTK+, and it will offer to download GTK for you. But even that option, once downloaded, will not work. Fear not, the instructions below resolve this. If your Rattle window launches, congratulations, it’s working.
  7. For those with GTK problems, follow the bulleted steps below.
  • 32-bit systems: open this link. 64-bit systems: open this link.
  • On the page, scroll down to the GTK+ packages and select GTK+ version 2.24.8 (32-bit runtime) or GTK+ version 2.22.1 (64-bit binaries), depending on your system.
  • Copy the download to the root of the C drive and extract the ZIP file as it is. For example, I created the folder C:\gtk+_2.22.1-1_win64.
  • Now right-click on My Computer and click on Properties (alternatively, go via Control Panel > System & Security > System). A new window will open up; on its left-hand side click on “Advanced system settings”.
  • A new window will open up.
  • Click on Environment Variables near the bottom and another window will pop up. Within the System variables section, scroll down to Path and click on Edit.
  • An “Edit System Variable” window will open up with the variable name “Path”. In the variable value you will see a number of folder paths separated by semicolons.
  • Go to the beginning of the variable value and add the path to the bin folder inside the GTK folder we extracted, e.g. C:\gtk+_2.22.1-1_win64\bin, followed by a semicolon. (Note: make sure this path actually exists, i.e. that there is a bin folder inside the folder you extracted into.)
  • Close all the windows and restart R.
  • Type in library(rattle), press Enter, and then type rattle().
  • The Rattle window should now open up, and you are ready to shake, rattle and roll your data. Install all the packages which Rattle prompts you to; this is done automatically after you press OK. Check out our Rattle demonstration post for a flavor of what Rattle can do.
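
The console commands from the steps above can be gathered into one short script. This is only a sketch: the helper names gtk_bin_on_path and install_and_launch_rattle are made up for illustration, and it assumes RGtk2 and rattle are still available from the CRAN mirror you pick. The helper simply checks whether the GTK bin folder made it into the Path variable, which is the usual cause of the GTK error.

```r
# Hypothetical helper: is the GTK bin folder listed in the PATH variable?
gtk_bin_on_path <- function(gtk_bin, path = Sys.getenv("PATH"),
                            sep = .Platform$path.sep) {
  entries <- strsplit(path, sep, fixed = TRUE)[[1]]
  # Windows paths are case-insensitive and may carry a trailing slash
  normalise <- function(p) tolower(sub("[/\\\\]+$", "", p))
  normalise(gtk_bin) %in% normalise(entries)
}

# Install and launch Rattle; wrapped in a function so nothing is
# downloaded until you call install_and_launch_rattle() yourself.
install_and_launch_rattle <- function() {
  install.packages("RGtk2")
  install.packages("rattle")
  library(rattle)
  rattle()
}

# Example check, using the folder name from the steps above
gtk_bin_on_path("C:\\gtk+_2.22.1-1_win64\\bin")
```

If the check returns FALSE after you have edited Path, restart R first: a running R session does not see changes made to the environment variables after it was started.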


Do let us know if the post was helpful in solving your Rattle installation issues, especially the pesky GTK/RGTK2 error. Feel free to comment even if you still face installation issues, we will try and solve them!

LearnAnalytics Team.

R-Rattle Training Video

Today, we are going to introduce a very powerful data mining tool called Rattle. An interesting feature of Rattle is that it is a GUI which sits on top of R. This means it gives users a point-and-click interface for building data mining projects, predictive models and so on, without writing a single line of R code.

In the featured video we have built various predictive models on a credit scoring dataset and compared their performance against each other using ROC curves. The models built are:

  • Decision Trees
  • Random Forests
  • Adaptive Boosting
  • Support Vector Machines
  • Logistic Regression
  • Neural Networks

This was done without writing any R code (except to launch Rattle). The total video length is about 17 minutes, and it will take you through data import in Rattle, variable exploration, model building and model evaluation using ROC curves.
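
For readers curious what sits behind the point-and-click interface, here is a rough sketch in plain R of one of the model types listed above, scored with the area under the ROC curve. The data is simulated, not the credit scoring dataset from the video, the column names are invented, and the model is scored on its own training data purely for brevity.

```r
set.seed(42)

# Simulated stand-in for a credit scoring table (not the video's dataset)
n <- 500
income <- rnorm(n, mean = 50, sd = 10)
debt   <- rnorm(n, mean = 20, sd = 5)
# Higher debt and lower income raise the chance of default
p_default <- plogis(-1 + 0.15 * (debt - 20) - 0.08 * (income - 50))
credit <- data.frame(income, debt, default = rbinom(n, 1, p_default))

# Logistic regression, one of the model types listed above
fit   <- glm(default ~ income + debt, data = credit, family = binomial)
score <- predict(fit, type = "response")

# Area under the ROC curve via the rank (Mann-Whitney) formula
roc_auc <- function(score, label) {
  r     <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc <- roc_auc(score, credit$default)
auc
```

An AUC of 0.5 means the model ranks defaulters no better than chance, and 1 means it ranks every defaulter above every non-defaulter.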

This video is for people from an advanced analytics background, as we have not explained much of the methodology behind the techniques, merely how to apply them in Rattle. If you understand the methodology and are not working in the analytics industry, you should jump ship immediately; greener pastures await. (Seriously, if you understand even 40% of this, you cannot be unemployed!)

For those who want to understand and learn the material shown in the video, check out our website. We specialize in analytics training for students worldwide and provide SAS, R and advanced analytics training.

For doubts, queries or batch timings, drop in a mail to

  1. Click here to download R
  2. Click here to download Rattle
  3. Click here to download the dataset discussed in the video

To install Rattle, simply follow the instructions on the website linked above. If you have problems installing it, drop us a mail and we will be glad to help you out. We will be following up with a detailed post on R and Rattle installation with troubleshooting.

Drop in comments to give us feedback!!

Learn Analytics Team

R vs SAS (Comparison and Opinion)


PC or Mac, Windows or Linux, Intel or AMD: we geeks simply love comparing things. This particular comparison, although not known in popular culture, is an oft-repeated argument in the analytics industry.

SAS needs no introduction; those who need one can check out the Wikipedia article as well as the LearnAnalytics SAS training section.

R, or rather the R statistical package, is very simply put the open source equivalent of SAS. For what it’s worth, R can do pretty much everything SAS can do in terms of statistical analysis, and there are some pretty cool things R can do which SAS can’t. Say you want to build a predictive model using logistic regression: R can do it. ARIMA models, yes; decision trees, yes; association rule mining, yes; and so on.

Anything you envisage using SAS/STAT for in statistical analysis and data mining, R can do.
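
To give a flavor of two of the techniques named above, here is a small sketch using only functions and datasets that ship with a standard R installation (glm, arima, mtcars and lh). The choice of datasets and variables is ours, purely for illustration.

```r
# Logistic regression: is a car's transmission manual (am = 1), given its weight?
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)
coef(logit_fit)

# ARIMA(1,0,0) on the built-in luteinizing hormone time series
arima_fit <- arima(lh, order = c(1, 0, 0))

# Forecast the next three observations
predict(arima_fit, n.ahead = 3)$pred
```

Decision trees and association rule mining live in add-on packages rather than in base R, so they are left out of this sketch.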

What makes R Special?

So what if R can do everything SAS can? There are others, such as SPSS and Statistica, which can also do pretty much what SAS can do.

Yes, but are the other packages free? Therein lies the crux of the whole argument. R is free: it is an open source project, initially started in New Zealand, and is now considered one of the best statistical analysis tools in the world.

What’s the argument, isn’t R always better?

It’s not that simple. Linux can do everything Windows can and more, but Windows still dominates. Two of the biggest reasons for continued Windows dominance are momentum and an easier user experience. In spite of all the advantages Linux offers (better security, far fewer viruses, a comparable user experience, especially in the Ubuntu variants), the common man still prefers Windows. That is not to say Linux doesn’t have its die-hard following and a vibrant support community.

The same goes for R. I have used both SAS and R extensively, and I am going to discuss the pros and cons of both packages below.

Statistical Capability

SAS/STAT and other SAS packages pack a powerful punch and cover almost the whole gamut of statistical analysis and techniques. However, since R is open source and people can submit their own packages/libraries, the latest cutting-edge techniques are invariably released in R first. To date R has almost 15,000 packages in the CRAN (Comprehensive R Archive Network, the network of sites that hosts R and its contributed packages) repository.

Some of the latest techniques, such as glmnet, random forests and AdaBoost, are available for use in R but not in SAS. Many experimental packages are also available in R. In fact, in most Kaggle competitions (which require a blog post of their own), the winners (who are amongst the world’s best data miners) have almost invariably used R to build their models.

In this aspect R is the hands-down winner. However, a word does need to be put in for SAS: since SAS is paid software with support, any innovation or new statistical technique has to be vetted and accepted. SAS is used in many mission-critical assignments where merely experimental techniques cannot be allowed to creep in. While this is necessary for the environment SAS works in, it also means that it will keep playing catch-up with R in terms of the latest innovations. On the other hand, since anybody can publish a package for R, user beware!

Therefore in terms of pure statistical capabilities, I rate R higher.

Data Handling

Data handling is the bugbear of R. The single largest drawback of R is the way it allocates and handles memory by trying to load the whole dataset in RAM. This can cause severe problems when working on a combination of large datasets and small computers (which it always is, your data is always huge and your computer is always puny!).

SAS excels in handling large datasets. In fact, server editions of SAS can chew through terabytes of data without any issues, whereas R is very likely to throw out-of-memory errors or become unresponsive and die.

That is not to say R cannot handle big data; it can. But say I have a laptop with 2 GB of RAM and a dataset running into millions of records: for the same exercise which SAS can do in 30 seconds, R might take up to a few minutes or even die.
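
A quick way to see why memory bites is to measure what R actually stores. The sketch below uses object.size from base R; the estimate_gb helper is hypothetical and assumes every column is held as an 8-byte double, so treat its result as a rough lower bound rather than an exact figure.

```r
# One million doubles: 8 bytes each, plus a small header
x <- numeric(1e6)
format(object.size(x), units = "MB")

# Rough estimate of the RAM a numeric table would need, in gigabytes
# (hypothetical helper; assumes every column is stored as a double)
estimate_gb <- function(n_rows, n_cols, bytes_per_value = 8) {
  n_rows * n_cols * bytes_per_value / 1024^3
}

# Ten million rows by fifty numeric columns
estimate_gb(n_rows = 10e6, n_cols = 50)
```

A table of that shape works out to roughly 3.7 GB before R makes any working copies, which is already more than the laptop in the example above has.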

However, computing power is cheap and getting cheaper by the day. Given enough RAM and computing power, R can also crunch through large datasets efficiently, especially on 64-bit machines.

But for now in terms of Data handling, I rate SAS higher.

Ease of Use

One of the biggest reasons Linux has never been a runaway success compared to Windows is that it was so damn difficult to use, install or troubleshoot. Now take that problem and multiply it by 10, and you get the idea of R. There is no easy way to put it: R is not for the faint of heart. It is damn difficult to learn compared to SAS.

SAS programming syntax can be considered a high-level language which is intuitive and easy to learn; additionally, it was designed as a DML (Data Manipulation Language). R programming, on the other hand, is a monster.

For example, say you have to do a simple data manipulation task such as sorting a few tables and joining them together. It would be a piece of cake to do this in SQL (any SQL package or even PROC SQL) or in SAS data steps. Now consider doing this in C++ (makes your blood run cold, doesn’t it?).

If SAS programming is high level, more akin to SQL, then R is a low-level language closer to C++. Even simple tasks can mean writing lengthy pieces of obfuscated code.
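
For reference, here is roughly how the sort-and-join task looks in base R using merge and order. The two tables and their column names are made up for illustration. The bracket-and-comma indexing in the sorting step is the kind of syntax newcomers tend to find harder to read than an ORDER BY clause.

```r
customers <- data.frame(
  cust_id = c(3, 1, 2),
  name    = c("Chitra", "Arun", "Bela"),
  stringsAsFactors = FALSE
)
orders <- data.frame(
  cust_id = c(2, 1, 1, 3),
  amount  = c(250, 100, 75, 300)
)

# Join the two tables on their shared key (an inner join by default)
joined <- merge(customers, orders, by = "cust_id")

# Sort by customer, then by descending order amount
joined <- joined[order(joined$cust_id, -joined$amount), ]
joined
```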

Learning R is definitely more challenging than learning SAS, but since R is a true programming language it gives the programmer more flexibility and power than SAS. Mere mortals like the rest of us, though, would prefer to use the SAS programming language.

Support for R is another issue; obscure error messages can literally suck the lifeblood out of somebody who is fairly new to R. There are support groups and forums on the internet, but if you are using a new package and it throws an error, you are on your own.

All in all, for true programmers R is closer to the heart, but for the rest of us, who just want to get our work done, SAS is the winner by a mile in terms of ease of use.



I have used both R and SAS, and there is no straightforward answer to this. For example, since R is free, it should be cheaper to use, shouldn’t it? Well, the answer is: not always.

The TCO (Total Cost of Ownership) of using R might actually be higher than that of SAS. For example, an analytics company decides to use R exclusively, figuring that since they don’t have to pay for SAS licenses, their cost of project delivery will go down: better profit margins, lower billing to the client, better competitiveness in the market. Win-win, right?

Except now they have to train their consultants on R, or hire outside talent. R programmers are in short supply (especially in India), which drives up the cost of resources for one. Now take into account the learning curve, the deployment cost and the code migration of client legacy systems, not to mention the obscure tantrums that R can throw at you. And you can’t call anyone for support, since there is none; it’s free software. At least if SAS doesn’t work, you can hold them by their throats. (For the kind of licensing fee they demand, it’d better work!)

On the other hand, say you have a startup: a small team of really smart people. Investing in a SAS license may not make sense at this point. They can simply use what I call the RUM stack (R-Ubuntu-MySQL), a pun on the LAMP stack.

That is, use MySQL for heavy data manipulation and use R only for statistical analysis, all on a machine running Ubuntu Linux. Everything for free! While this solution may work for a small company with high-calibre programmers, it does not scale to a 25,000-person consulting organization which is run by processes and adherence rather than individual brilliance.

My choice: if you are small and hungry, go for R. If you are a big organization where budget is not an issue, close your eyes and buy SAS licenses; everybody will be happy (but install R on your laptop nonetheless).