DSNY5-Week 1: Data Joining and All Types of Join

Data joining is a technique which helps to combine 2 or more tables/ datasets together based on the common field (join key) in order to get the fields that the user wants. In the first week of DSNY 5, I learned a new type of join that I haven't learned or used before. That is Cross Join. Before exploring what is Cross Join and when to use it, I am going to summarize all types of joins that I have known in this blog. I divide into 3 groups: Outer Join (included Inner Join), Exclusive Join and Cross Join.

I have 3 tables: A (green), B (blue) and C (yellow) and Null value represented as pink cross-hatching area.


Outer Join

  1. Left Join or Left Outer Join: the result includes all records in table A and the matching records with table B.
Image 1: Left Join or Left Outer Join

In the image above, when Order ID=5, product cell in that row is Null. In Table B, there is no Order Id =5, so there is no product. But the Left Join includes all records in table A. Therefore, product is Null at that row.

2.    Right Join or Right Outer Join: the result includes all records in table B and matching records with table A.

Image 2: Right Join or Right Outer Join

This is the right join, so I put Order ID and Product columns first. It will return all records in table B, so all Order ID from 1 to 4 will be listed. In table A, the Order id=5 doesn't match with table B. Also, the Order ID=4 in table B doesn't match Order ID in table A, so the Customer ID and Customer Name at that row is Null.

3.   Inner Join or Join: the result includes only the matching records between table A and table B.

Image 3: Inner Join

In the image above, the matching Order ID is 1, 2 and 3.

4.  Full Outer Join: the result includes all rows in table A, table B and matching records between table A and table B.

Image 4: Full Outer Join

In the image above, the result includes the result from the inner join in part 3 and the Order ID=5 in table A (not in table B) and Order ID = 4 in table B (not in table A). Therefore, there are some null values when Order ID =5 and Order ID =4.

Exclusive Join

  1. Left Exclusive Join: the result includes records in table A but not present in table B.
Image 5: Left Exclusive Join

Order ID = 1,2,3 present in table B. In the Left Exclusive Join image above, only Order ID=5 doesn't present in table B. Therefore, the result is only 1 row when Order ID=5 and Null value in Product.

2.  Right Exclusive Join: the result includes records in table B but not present in table A.

Image 6: Right Exclusive Join

In table B, only Order ID=4 doesn't present in table A. Therefore, the result only returns Order ID=4 with Null Values in Customer ID and Customer Name.

3. Full Outer Exclusive Join: the result includes records in table A, table B but exclude the matching records between table A and table B.

Image 7: Full Outer Exclusive Join

Order ID=1,2,3 are presented in both table A and table B. For full outer exclusive join, all records have those Order ID will be excluded. The remaining records will be included in the result. There are some Null values when the Order ID doesn't match in the other table.

Cross Join

For all of joins above, there is at least one matching field to join 2 tables. But Cross Join is special. We don't need any matching fields to join 2 tables. Cross join is used to create all combinations of records in both tables.

Image 8: Cross Join

In the image of cross join above, each rows in table B will match with each rows in table C. There are 4 rows in table B and 4 rows in table C. Therefore, the total rows for the Cross Join result will be equal 4 x 4 = 16 rows. But how to do it in the tools (mySQL, Tableau Prep,...) you are using?

In the first week of DSNY 5, I learned a tip from Valerija Kirjackaja. By creating a dummy field for each table, I can join table B and table C with that join key. So my new tables would be like:

Image 9: Use dummy field to join 2 tables in Cross Join

Self Join

There is another join also uses the same method as Cross Join. Its name is Self Join because the table joins with itself. It's useful when we would like to make a pair of value in the table.

Image 10: Self Join

I use data joining most of the time when I work with data. In the first week of DSNY 5, I learned and practiced joining datasets in Tableau Prep. Data joining is very helpful for me when I would like to combine or contrast 2 or more datasets. I hope this blog is helpful to you to distinguish the difference between types of join.

Author:
Le Luu
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2024 The Information Lab