Merging data is one of the more complicated tasks for new coders. PROC SQL has many advantages to the traditional SAS MERGE Data Step, but understanding the different types of joins can be daunting. Enter The Simple Guide to SAS: From Null to Novice, designed to demystify the world of SAS programming for beginners.
The following is a conceptual example from the guide that explains the four types of PROC SQL joins: right, left, inner and full. The tables below represent two data sets called Jobs and Salaries.
Left Join
Let’s say an HR representative wants to see a list of all employees with a job title (all rows in data set 1) with their associated salary information added when available. To get this list, left join the Jobs and Salaries data sets matching on employee Name. A left join returns all rows in data set 1 (Jobs) and appends on any requested information from data set 2 (Salaries) when available. This would result in the following table.
Notice that Ashley appears in the left joined data set because a record for her exists in data set 1 even though there was no associated record for her in data set 2. Since she was not in the Salaries data set, a salary was not found for her. Therefore, she is assigned a Salary value of missing.
Right Join
Alternatively, the HR representative might want a report of all known employees with salaries at the bank (all rows in data set 2) with job titles added, when available. To get this list, right join the Jobs and Salaries data sets on employee Name. This will return all rows in data set 2 and add information from data set 1 when available. The results look like this.
When using a right join, Ashley no longer appears in the merged data set because she does not have Salary information, but Susan does. Since Susan is not in data set 1, however, the Job field is blank/missing for her in the merged data set.
Inner Join
The HR representative might want only the list of employees with a known job title and salary. In this case, inner join the Jobs data set with the Salaries data set on Name. This will return only the records on data set 1 that also existed in data set 2.
Only Emma and John were in both data sets.
Full Join
Finally, the HR representative might want to know all available information. In this case, do a full join on Name so that all rows from data set 1 and all rows from data set 2 are included.
All employees are included in the merged data set and have missing values where they did not have a Name match on the other data set.
Before coding a PROC SQL join, it is helpful to walk through what is needed conceptually and what type of join is needed to achieve the desired result. As data analysts, we often get requests to merge data with little instruction on how to join it and what the resulting data will be used for. Don’t be afraid to ask about the goals and purpose of the merged data set to make sure the most useful information is provided.
The Simple Guide to SAS: From Null to Novice
Tags