Unions and Joins - And then there was one

Unions and Joins are often made to sound far more complex than they actually are – I’ll do my best to not do that here… They are both ways to merge two data sources together, put simply, a union is adding to rows and a join is adding to fields.

Unions

So unions add rows. You need common fields in both data sources to do this (having uncommon fields will generate nulls in the rows from the data source that doesn’t have those rows). Lets have a pic that explains it all…

1+2=Union

Here you can see that both sources have identical fields and so everything just gets put into the same source!

Joins

If unions add rows then joins add fields. But it all hinges on the data sources having at least one common field. Lets have a look at a pic and I’ll talk you through some notable goings-on.

As you can see the species field isn’t called “Species” in the second table – it’s called “Animals” – but that doesn’t stop the join. All the tools I’ve seen allow you to manually select fields to join on – and this field is what it all hinges on because the tables share data here. The new table is created from the combination of the previous tables, things to note: bears appear twice and so the scientific name is added on to both those rows without any trouble; another thing is that Squirrel (and a few others) haven’t been given scientific names so they’re nulls and Wolf (and a few others) aren’t present in our original table so they also get nulls.

This is what’s called a “Full Outer Join” as it takes everything and puts nulls in where information is missing. But there other types of join…

This is a screen in Tableau that allows you to decide on your type of join, we just saw full outer, so lets have a look at what the other options look like…

Inner Join

Inner Joins don’t have those nulls in. That’s it. If the the that row isn’t in the common field of both then it’s gone. No more. Good bye.

Left Join

Left Joins take all of the rows from the left hand table (our one from unioning) and assigns new information where it can and where it can’t whacks a null in. (Sometimes referred to as a Left-Inner Join).

Right Join

Right Joins are like Left Joins but the other table is the one that has all it’s rows in it. (Sometimes referred to as a Right-Inner Join).

That’s it! That’s all there is!

(For the most part)

 

Author:
Gheorghie Lang
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2024 The Information Lab