Data joining is a technique which helps to combine 2 or more tables/ datasets together based on the common field (join key) in order to get the fields that the user wants. In the first week of DSNY 5, I learned a new type of join that I haven't learned or used before. That is Cross Join. Before exploring what is Cross Join and when to use it, I am going to summarize all types of joins that I have known in this blog. I divide into 3 groups: Outer Join (included Inner Join), Exclusive Join and Cross Join.
I have 3 tables: A (green), B (blue) and C (yellow) and Null value represented as pink cross-hatching area.
Outer Join
- Left Join or Left Outer Join: the result includes all records in table A and the matching records with table B.
In the image above, when Order ID=5, product cell in that row is Null. In Table B, there is no Order Id =5, so there is no product. But the Left Join includes all records in table A. Therefore, product is Null at that row.
2. Right Join or Right Outer Join: the result includes all records in table B and matching records with table A.
This is the right join, so I put Order ID and Product columns first. It will return all records in table B, so all Order ID from 1 to 4 will be listed. In table A, the Order id=5 doesn't match with table B. Also, the Order ID=4 in table B doesn't match Order ID in table A, so the Customer ID and Customer Name at that row is Null.
3. Inner Join or Join: the result includes only the matching records between table A and table B.
In the image above, the matching Order ID is 1, 2 and 3.
4. Full Outer Join: the result includes all rows in table A, table B and matching records between table A and table B.
In the image above, the result includes the result from the inner join in part 3 and the Order ID=5 in table A (not in table B) and Order ID = 4 in table B (not in table A). Therefore, there are some null values when Order ID =5 and Order ID =4.
Exclusive Join
- Left Exclusive Join: the result includes records in table A but not present in table B.
Order ID = 1,2,3 present in table B. In the Left Exclusive Join image above, only Order ID=5 doesn't present in table B. Therefore, the result is only 1 row when Order ID=5 and Null value in Product.
2. Right Exclusive Join: the result includes records in table B but not present in table A.
In table B, only Order ID=4 doesn't present in table A. Therefore, the result only returns Order ID=4 with Null Values in Customer ID and Customer Name.
3. Full Outer Exclusive Join: the result includes records in table A, table B but exclude the matching records between table A and table B.
Order ID=1,2,3 are presented in both table A and table B. For full outer exclusive join, all records have those Order ID will be excluded. The remaining records will be included in the result. There are some Null values when the Order ID doesn't match in the other table.
Cross Join
For all of joins above, there is at least one matching field to join 2 tables. But Cross Join is special. We don't need any matching fields to join 2 tables. Cross join is used to create all combinations of records in both tables.
In the image of cross join above, each rows in table B will match with each rows in table C. There are 4 rows in table B and 4 rows in table C. Therefore, the total rows for the Cross Join result will be equal 4 x 4 = 16 rows. But how to do it in the tools (mySQL, Tableau Prep,...) you are using?
In the first week of DSNY 5, I learned a tip from Valerija Kirjackaja. By creating a dummy field for each table, I can join table B and table C with that join key. So my new tables would be like:
Self Join
There is another join also uses the same method as Cross Join. Its name is Self Join because the table joins with itself. It's useful when we would like to make a pair of value in the table.
I use data joining most of the time when I work with data. In the first week of DSNY 5, I learned and practiced joining datasets in Tableau Prep. Data joining is very helpful for me when I would like to combine or contrast 2 or more datasets. I hope this blog is helpful to you to distinguish the difference between types of join.