Basic Statistics

Series of videos on Basic Stats using SAS

Hypothesis Testing – First Part

Hypothesis Testing – Second Part

Hypothesis Testing – Third Part

Round up and Revision – Basic Stats (Introduction to Excel Data Analysis Tool Pack)

SAS Programming – Day 10

Arrays in SAS

  • SAS arrays are a way to temporarily group SAS variables and refer to them under a single name. An array exists only for the duration of the Data step in which it is defined
  • The Array statement begins with the keyword ARRAY, followed by the array name and N, the number of elements in the array
  • The _temporary_ option is used to create a temporary array, whose elements are not written to the output dataset
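
The points above can be sketched in a short Data step. The dataset and variable names (scores, q1-q3, maxpts) are made up for illustration and do not come from the videos.

```sas
/* Hypothetical sample data: three quiz scores per student */
data scores;
    input id q1 q2 q3;
    datalines;
1 10 . 30
2 . 20 25
;
run;

data scores_pct;
    set scores;
    /* Group q1-q3 under the array name q; {3} is the number of elements */
    array q{3} q1-q3;
    /* Temporary array: holds constants and adds no variables to the output */
    array maxpts{3} _temporary_ (10 20 30);
    do i = 1 to dim(q);
        if q{i} = . then q{i} = 0;        /* replace missing scores with 0 */
        q{i} = q{i} / maxpts{i} * 100;    /* convert each score to a percentage */
    end;
    drop i;
run;
```

Because maxpts is declared with _temporary_, only id and q1-q3 appear in scores_pct.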

Base SAS Programming – Day 9

Looping in SAS

  1. Functions in SAS:  Continued
    • Text function:
      • Compress – Returns a character string with specified characters removed from the original string
      • Index – Returns the position of the first occurrence of a substring in a string (0 if it is not found)
    • Use of upcase, lowcase and propcase functions in string comparison
    • Math / Stat functions: Like Int, Round, Sum, Mean etc
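    • Difference between the Mean (or any aggregate statistic) from Proc Means / Summary and the Mean function: the procedures compute across the observations of a variable, while the function computes across its arguments within a single observation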
  2. Loops in SAS : Do Loops
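    • Loops are used to repeat a block of statements a specified number of times, or while / until a condition holds, to obtain a desired result
    • Types of Loops:
      • Iterative Do (Do index = start To stop)
      • Do While – the condition is checked before each iteration
      • Do Until – the condition is checked after each iteration, so the loop body runs at least once
    • The index of an iterative Do loop increments by 1 by default
    • Can use BY to increment by any value other than 1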
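
A minimal sketch of the three loop types, using made-up variable names (total, steps, balance, count). The interest-rate example is only an illustration, not taken from the video.

```sas
data loop_demo;
    /* Iterative DO: i takes the values 1, 2, 3, 4, 5 (default increment of 1) */
    total = 0;
    do i = 1 to 5;
        total = total + i;
    end;

    /* BY sets a different increment: j takes the values 0, 5, 10 */
    steps = 0;
    do j = 0 to 10 by 5;
        steps = steps + 1;
    end;

    /* DO WHILE tests the condition before each pass */
    balance = 1000;
    years = 0;
    do while (balance < 2000);
        balance = balance * 1.10;   /* assumed 10% growth per year */
        years = years + 1;
    end;

    /* DO UNTIL tests the condition after each pass */
    count = 0;
    do until (count >= 3);
        count = count + 1;
    end;
run;

proc print data=loop_demo;
run;
```

The Data step has no input dataset, so it runs once and writes a single observation holding the final value of each variable.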

Base SAS Programming – Day 8

Functions in SAS (Continued)

  1. Text Functions
    • Catx – to concatenate characters / strings with any delimiter. Cat is also a function used to concatenate characters / strings
    • Trim – to remove trailing blanks in a string
    • Tranwrd – Replaces all occurrences of a substring in a character string
    • Translate – Replaces specific characters in a character expression.
    • Do check out Compress and other Text functions like Upcase, Lowcase and Propcase
  2. Date and Time Functions
    • Day – Returns the Day from a SAS date value
    • Month – Returns the Month from a SAS date value
    • Year – Returns the year from a SAS date value
    • Week – Returns the week number from a SAS date value (Try weekday function)
    • Mdy – Builds a SAS date value from Month, Day and Year values
    • Today – Returns current system date
    • Datdif – Difference between 2 dates in days
    • Yrdif – Difference between 2 dates in years
    • INTCK – Returns the number of interval boundaries of a given kind that lie between 2 dates.
    • INTNX – Increments a date value by a given time interval, and returns a date
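
The functions listed above can be tried in one Data step. This is a sketch with made-up values; the 'ACT/ACT' basis passed to Datdif and Yrdif is one of several bases those functions accept and is chosen here only as an example.

```sas
data fn_demo;
    length full $ 40;
    fname = 'john  ';
    lname = 'SMITH';

    /* CATX strips leading/trailing blanks and inserts the delimiter */
    full   = catx(' ', propcase(fname), propcase(lname));
    /* TRIM removes trailing blanks before concatenating with || */
    tagged = trim(fname) || '!';
    /* TRANWRD replaces a substring; TRANSLATE replaces single characters */
    phrase = tranwrd('Base SAS, Advanced SAS', 'SAS', 'R');
    dashed = translate('2024/01/15', '-', '/');
    /* COMPRESS removes every character listed in the second argument */
    digits = compress('(080) 555-0100', '()- ');

    /* Date functions operate on SAS date values */
    start  = mdy(1, 15, 2024);
    finish = '20MAR2025'd;
    d      = day(start);
    m      = month(start);
    y      = year(start);
    wk     = week(start);
    wd     = weekday(start);                    /* 1 = Sunday ... 7 = Saturday */
    days   = datdif(start, finish, 'ACT/ACT');
    years  = yrdif(start, finish, 'ACT/ACT');
    months = intck('month', start, finish);     /* month boundaries crossed */
    next_m = intnx('month', start, 1);          /* aligned to the start of the next month */
    cur    = today();
    format start finish next_m cur date9.;
run;

proc print data=fn_demo;
run;
```

Note that Intck counts interval boundaries rather than elapsed time, so two dates one day apart can still be one month apart if they straddle the end of a month.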

Base SAS Programming – Day 5

Merging Data Sets and other options

  1. Merge: (Synonymous to Joins)
    • In the Data step, datasets are merged using the Merge statement together with a By statement
    • There must be a common variable (Primary id) between the two (or more) datasets being merged
    • The datasets being merged MUST be SORTED by the primary id before being merged
    • Types of Merge, where x and y are the IN= flags of the Left and Right side datasets
      • Inner Merge (no condition) – All observations of both datasets are kept (a full outer join in SQL terms)
      • Exact Merge (x=1 and y=1) – Only the observations common to both datasets are kept (an inner join in SQL terms)
      • Right Inner Merge (y=1) – All observations of the Right side dataset, with matching values from the Left
      • Left Inner Merge (x=1) – All observations of the Left side dataset, with matching values from the Right
      • Outer Merge (x=0 or y=0) – Only the uncommon observations between the datasets
      • Right Outer (x=0 and y=1) – Only the uncommon observations of the Right side dataset
      • Left Outer (x=1 and y=0) – Only the uncommon observations of the Left side dataset
  2. Proc Means: Other options and statements
    • By and Class statements to sub-group statistics
    • Output statement with the Out= option to save the result into a dataset
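
As a rough illustration of the merge types, the sketch below routes observations to different output datasets based on the IN= flags. The datasets customers and orders and all their values are invented for the example.

```sas
data customers;
    input cust_id name $;
    datalines;
1 Asha
2 Ben
3 Chen
;
run;

data orders;
    input cust_id amount;
    datalines;
2 250
3 100
4 75
;
run;

/* Both inputs must be sorted by the common variable */
proc sort data=customers; by cust_id; run;
proc sort data=orders;    by cust_id; run;

data all_rows both left_only right_only;
    merge customers(in=x) orders(in=y);
    by cust_id;
    output all_rows;                              /* no condition: ids 1, 2, 3, 4 */
    if x = 1 and y = 1 then output both;          /* ids 2 and 3 */
    if x = 1 and y = 0 then output left_only;     /* id 1 */
    if x = 0 and y = 1 then output right_only;    /* id 4 */
run;

/* Proc Means with a Class variable, saving the statistics to a dataset */
proc means data=orders noprint;
    class cust_id;
    var amount;
    output out=order_stats n=n_orders mean=avg_amount;
run;
```

The IN= variables exist only during the Data step and are not written to any of the output datasets.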

Base SAS Programming – Day 7

Introduction to Important PROCs – FREQ, SUMMARY and FUNCTIONS in SAS

  1. Proc Freq
    • To determine the frequency of occurrence of values in categorical variables
    • Results in frequency, cumulative frequency, percentage and cumulative percentage
    • Can create an n-way crosstab using the Tables statement with * between the variables
    • Crosstab results in frequency, percentage, row percent and column percent
    • norow, nocol, nocum and nopercent options can be used with the Tables statement to customise the result
  2. Proc Summary
    • Similar to proc Means, used to extract basic statistics / summarize data
    • The Print option or an Output statement MUST be used in Proc Summary to get a result, because it prints nothing by default
    • By and Class statements can be used in Proc Summary
  3. Functions in SAS
    • There are different types of functions in SAS, mainly used for data manipulation
    • Text functions, Data Type conversion functions, Math / Stat functions, Date and Time functions
    • Type conversion functions
      • Input – to convert character data type to numeric
      • Put – to convert numeric data type to character
    • Text functions
      • Substr – to extract part of a string, based on number of characters
      • Scan – to extract part of a string, based on a delimiter
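
The sketch below puts the Day 7 topics together on a small made-up dataset (staff). The variable names and the emp_code layout are assumptions for illustration.

```sas
data staff;
    input dept $ gender $ salary_txt $ emp_code :$12.;
    /* INPUT: character to numeric, read with an informat */
    salary = input(salary_txt, 8.);
    /* PUT: numeric to character, written with a format */
    salary_chr = put(salary, 8.);
    /* SUBSTR: 3 characters starting at position 1 */
    prefix = substr(emp_code, 1, 3);
    /* SCAN: second piece when splitting on the '-' delimiter */
    serial = scan(emp_code, 2, '-');
    datalines;
HR F 52000 HRD-001
HR M 48000 HRD-002
IT F 61000 ITS-001
IT M 59000 ITS-002
;
run;

proc freq data=staff;
    tables dept;                                  /* one-way frequency table */
    tables dept*gender / norow nocol nopercent;   /* crosstab showing counts only */
run;

/* PRINT is needed here, otherwise Proc Summary displays nothing */
proc summary data=staff print;
    class dept;
    var salary;
run;
```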

Base SAS Programming – Day 4

Sorting and De-duping datasets in SAS

  1. Proc Sort
    • By statement is used in Proc Sort to sort the observations of a dataset by one or more variables
    • By default, the observations are sorted in ascending order of the specified variable(s)
    • Descending option / keyword is placed before a variable in the ‘By’ statement to sort by it in descending order
    • Only the variable that immediately follows ‘descending’ is sorted in descending order; the rest, if any, are sorted in ascending order
    • Noduprecs option – removes observations that are exact duplicates across all variables
    • Nodupkey option – removes observations with duplicate values of the variable(s) specified in the By statement
  2. Proc Print
    • By and Sum statements – to print group-wise subtotals and a grand total of numeric variables in Proc Print
  3. Proc Append
    • To append SAS datasets is to stack the observations of one dataset below those of another. It is a vertical concatenation, not a join.
    • The datasets being appended must have the same variables (same data type and variable name)
    • Force option – to append datasets with uncommon variable names/attributes
    • Appending of datasets can be done using data and set statement, which is more of a union of datasets.
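
A short sketch of these procedures on an invented dataset (sales); the names and values are only for illustration.

```sas
data sales;
    input region $ rep $ amount;
    datalines;
East Ann 100
West Bob 200
East Ann 100
East Cal 150
;
run;

/* region ascending (default), amount descending within each region */
proc sort data=sales out=sales_sorted;
    by region descending amount;
run;

/* NODUPKEY: keep one observation per value of region */
proc sort data=sales out=one_per_region nodupkey;
    by region;
run;

/* NODUPRECS: drop observations identical on every variable.
   Sorting by all variables places the duplicates next to each other. */
proc sort data=sales out=no_dup_rows noduprecs;
    by region rep amount;
run;

/* BY and SUM in Proc Print: subtotal per region plus a grand total */
proc print data=sales_sorted;
    by region;
    sum amount;
run;

data more_sales;
    input region $ rep $ amount;
    datalines;
North Dee 300
;
run;

/* Stack more_sales below the observations of sales_sorted */
proc append base=sales_sorted data=more_sales;
run;

/* The same stacking with Data and Set, written to a new dataset */
data all_sales;
    set sales more_sales;
run;
```

Proc Append adds to the base dataset in place, while the Data step approach reads both inputs and writes a new dataset.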

Base SAS Programming – Day 2

  1. Proc Contents – other option
    • Position – to list the variables in alphabetic order as well as in order of creation in the dataset.
    • _all_ option – to list out the contents of all the files / datasets in a library
    • nods option – used only with _all_ , to list out only the file/dataset names in a library
  2. Data step:
    • Creating a copy of a dataset using data and set statement.
    • Keep and Drop options – to modify a dataset by retaining only the required variables in it. Can be used in both Data and Set statements.
    • Rename and Label statements to change the name of the variable and provide a brief description to it.
    • Using the Firstobs= and Obs= dataset options in the Data step to read a range of observations
  3. Proc Print: Options
    • Firstobs and Obs – to specify the range of observations (based on observation number) to be printed in the result / output window
    • Label – Prints the label of the variable in place of the variable name in the output.
    • n – prints the number of observations at the end of the output
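
The options above can be tried on sashelp.class, a sample dataset that ships with SAS. Using it here is an assumption for the sake of a self-contained example, not something stated in the video.

```sas
/* POSITION adds a listing of the variables in creation order */
proc contents data=sashelp.class position;
run;

/* _ALL_ with NODS: list only the member names in the library */
proc contents data=sashelp._all_ nods;
run;

data class_subset;
    /* Read observations 3 through 10, keeping four variables */
    set sashelp.class(firstobs=3 obs=10 keep=name sex age height);
    rename sex = gender;
    label height = 'Height in inches';
    drop age;
run;

/* LABEL shows labels as column headings; N prints the observation count */
proc print data=class_subset(obs=5) label n;
run;
```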

Base SAS Programming – Day 1

The Base SAS video series assumes that the viewer has no background in SAS programming and little or no prior exposure to programming of any kind. The series comprises 9 video lectures (about an hour and a half each on average), plus additional videos covering advanced topics such as PROC SQL and SAS Macros.

Day 1 topics are listed below:

  1. Intro to Libraries, Data step and Proc step in SAS
  2. Data step example:
    • Creating a sample dataset using Datalines / Cards statement
    • Understanding data types in SAS
    • Informat and Format
    • Label
  3. Proc step example: Overview
    • Proc Contents – to know the dataset structure, with list of variables and number of observations
      • Varnum option – to list the variables of a dataset in creation order (otherwise the list is in alphabetic order)
    • Proc Print – to view the data in a dataset.
      • Label option and Var statement
    • Proc Means – to extract basic statistics of the numeric variables in a dataset; by default N, Mean, Standard Deviation, Minimum and Maximum
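
A minimal sketch of the Day 1 topics, using an invented dataset (employees) in the WORK library. The names, values, informats and formats are chosen only for illustration.

```sas
data work.employees;
    /* Informats tell SAS how to read the raw values */
    input name $ joined :date9. salary :comma10.;
    /* Formats control how the stored values are displayed */
    format joined date9. salary dollar10.;
    label joined = 'Date of joining'
          salary = 'Annual salary';
    datalines;
Asha 15JAN2020 52,000
Ben 01MAR2019 48,500
Chen 20JUL2021 61,250
;
run;

/* VARNUM lists the variables in creation order instead of alphabetically */
proc contents data=work.employees varnum;
run;

/* LABEL shows the labels as column headings; VAR selects and orders columns */
proc print data=work.employees label;
    var name joined salary;
run;

/* Default statistics: N, Mean, Std Dev, Minimum, Maximum */
proc means data=work.employees;
    var salary;
run;
```

Here joined is stored as a number (a SAS date value) and only appears as a date because of the date9. format.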