Data Analysis Toolpak – 2 Sample T Test

We check the significance of the difference in means between 2 samples using a T Test (in Excel).
Dataset can be downloaded at

Data Analysis Tool Pak – Multiple Linear Regression (Excel 2010)

In this segment – I demonstrate the use of Data Analysis Tool Pak to build a multiple linear model using only Excel 2010 and basic interpretation of the results.

While the Data Analysis Toolpak does not replace a dedicated statistical tool, it does allow you to quickly check and do basic tests, analysis on Excel in a user friendly way, however it’s capabilities are limited and not recommended as a dedicated tool.

Who ‘does’ Analytics?

Who ‘does’ Analytics?


A weird question really, not meant in the literal sense, but really who are the people who work in Analytics or are trying to learn/enter the Analytics industry. Going through the Google Analytics reports of my YouTube channel, this question struck me as quite relevant and threw up some very interesting insights.

A little background first – back in early 2012, while in-between jobs, I taught an introduction to analytics batch to a varied set of students (based in India and US). The curriculum covered SAS Programming (taught by my wife) and Basic Statistics, predictive modelling (taught by myself) using webex. The training sessions were recorded using Webex. A few months back, I uploaded the full SAS and analytics trainings on YouTube for general viewing.

Now, almost 100,000 views later, 850 subscribers, @300 views a day – YouTube analytics section throws up some unique insights about the viewer’s logging in. Firstly, topics like PROC SQL and PROC LOGISTIC are highly esoteric and of no interest to the average YouTuber searching for the Miley Cyrus VMA twerk, i.e. it is unlikely to be either suggested by the recommendation engine or searched for except by someone who is actually interested in these topics and is actively searching/viewing such stuff. However, internet being the great enabler it is, the channel has managed to garner some attention (inspite of the dubious audio quality) and generate some regular traffic. (8-10K views a month).

So, then who is “doing analytics”?

Country Distribution

Country List

That India tops my list of incoming viewers is no surprise, after all – my networks (social and otherwise) which drive a significant volume of traffic is primarily India based. However, after India – the worldwide anglosphere (UKUSA) community dominates and infact “rank-orders” perfectly based on their relative population size. i.e. all English speaking countries in descending order based on their populations. After the anglosphere – the list of non english countries show a pattern (France, Germany, Singapore and Brazil); the size of their financial markets and population driving it. China & Japan do not figure anywhere on the list – even though they would be mature geographies in terms of analytics use and analytics professionals primarily due to the language divide. (Hypothesized, my opinion, could be wrong).

Being an analytics professional, I just cant stop at simply publishing a bar chart of the geographic traffic to my channel, I feel the urge to draw an “insight” from it too. (How many of you can relate to your bosses screaming – “don’t give me numbers dammit – give me an INSIGHT”). I hate this word, but yeah – whats the insight. The question I want to ask is – “Which country has the highest concentration of analytics professionals?”. The data source being only the traffic to my channel – I claim no reliability of the answer but here goes à

  1. Simply relying on the traffic is not enough – large countries have large pools and dominate, hence India and US on the top.
  2. I am going to ignore the bias of English vs Non English, since being a technical topic, majority of analytics professionals around the world will have a working mastery of the English language
  3. Dividing the traffic by the population of the countries will give me an index which will remove the bias of population size and give me an apple to apple comparison (or closest to it) with which I can directly compare and rank order countries based on their “Analytics Concentration”

And the list/ranking goes as à

Surprisingly, the data shows that the tiny city state of Singapore accounts for the highest concentration of analytics! Punching way above its weight with a score of 15 and leader of the pack by far! (Methodology for calculating the index is very simple – views/population).

In the Anglo-sphere Australia/USA and UK after removing the effect of population show almost the same Analytics penetration (countries with very similar HDI, income levels and the language), however Canada surprisingly has a small but significant lead over the other 3 (11 vs. 9). Therefore amongst the Anglo countries, Canada ranks as the most “Analytics” country! Maybe I should also try and find a job there eh!

India, not surprisingly has fallen way behind others after removing the huge population advantage, but still ranks higher than Germany and France. Brazil again finishes last in this ranking as well, but again I am sure the language bias is probably affects it more than others.

Gender Distribution

I ask all the analytics professionals to stand up from their cubicles and look around , calculate the rough ratio of males to females in your office – the chances neigh almost a certainty that the skew is towards males, but how much? Let’s look at the data once more à


Not surprisingly, the traffic shows a major skew towards males. 77% to 23%, or rather more than 3 out 4 analytics professionals (or professionals in making) are males, makes for a very boring office culture if we keep adding only more and more male geeks to the profession.

But the what I really want to see is, does geography play a role here? Is there variance in gender participation by country? Remember, India is driving almost 40% of the numbers here, and lesser said about gender participation in India, the better.

Gender distribution by top countries as below à
















YouTube does not give in-depth analysis beyond the top 4 countries for me currently, still – the data shows that the gender divide is global, and very surprisingly the worst in the United Kingdom (and not India)! The UK shows a massive 85-15 divide as compared to the global average of 77-23. US has the best ratio of 70-30 from the countries on the list. So if you are a female analytics professional – the best office climate would probably be in the US and the worst in the UK!


Age Distribution

Distribution by age for the top 4 countries à

Country <18 Years 18-24 Years 25-34 Years 35-44 Years 45-54 Years 55-64 Years 65+ Years

































The first thing which hits you when you look at this table is the massive youth skew in India as compared to the rest. By and far, the average Indian Analytics professional (or the would be professional) is much younger as compared to her global counterparts.

A full 61% of Indians are less than 34 years, with a solid 51% sitting in the 25-34 year bracket. Comparing that with the US where 71% are older than 34 years! This table probably shows clearer than any other set of numbers the India story à Young Indian professionals working for their older American bosses out of offices in Banglaore, Gurgaon and Mumbai.

Amongst other countries, UK has 57% older than 35, Canada has 75% (compared to India with a mere 38%). Canada and the US have similar age profiles, UK is comparatively younger as compared to these two, and of course Indians are the babies in the room.



So, what have we learned, any analytics presentation without the summary is incomplete, therefore top conclusions based on our Data à


  1. Singapore and Canada are the most analytics heavy countries in the world.
  2. Analytics industry globally is skewed towards male participation, UK has the worst male-female ratio and the US has the best
  3. Indians are the youngest analytics professionals globally whereas the Americans and Canadians are likely to be the oldest.

Please feel free to comment with your views on this article below or on the linkedin page – as well as mail me at .

Karan Sarao has over 5 years of experience in analytics including training professionals on tools and techniques. Currently works with TransUnion India in Analytics Business Development based out of Mumbai, India. 


Advanced SAS Programming – Day 12 & 13 (Introduction to PROC SQL)




Day 12 – Introduction to PROC SQL

  • Group By and Having clauses in SQL
  • – Using aggregate functions to summarize data in SQL
  • – Using nested Where clause in SQL
  • – Modifying datasets using Create Table Like, Create Table As, Alter Table and Update clauses

Day 13 – PROC SQL Continued

– Using SQL queries in SAS
– Queries to create a table and select values from a table
– Using SQL conditional statements (Where and Case When clauses)
– Order by clause in SQL to sort datasets

Regression – Linear and Logistic using SAS

Basic introduction to Multiple Linear and Logistic regression using SAS with real life data sets.

Basic introduction to credit scoring using Logit modeling. Key concepts of binary prediction like lift, KS, ROC curve, Gains charts etc are explained in these set of videos.

Basic Linear Regression – Part 1

Linear Regression – Part 2

Introduction to Logistic Regression – Part 1

Logistic Regression – Part 2

Logistic Regression – Part 3

Base SAS Programming – Day 5

Merging Data Sets and other options

  1. Merge: (Synonymous to Joins)
    • Datasets can be merged using Data step only, using Merge statement
    • There must be a common variable (Primary id) between the two (or more) datasets being  merged
    • The datasets being merged MUST be SORTED  by the primary id before being merged
    • Types of Merge
      • Inner Merge (no condition) – All observations of the datasets are merged.
      • Exact Merge (x=1 and y = 1) – Only the common observations between the datasets are merged.
      • Right Inner Merge(y = 1) – Only the observations of Right side dataset
      • Left inner Merge(x=1) – Only the observations of Left side dataset
      • Outer Merge(x=1 or y =1) – Only uncommon observations between the datasets
      • Right Outer(x=0 and y=1) – Only uncommon observations of right side dataset
      • Left Outer(x=1 and y = 0) – Only uncommon observations of Left side dataset
  2. Proc Means: Other options and statements
    • By and Class statements to sub-group statistics
    • Output Out statement to save the result into a dataset