Any part of the blog indented in italics is written by ChatGPT AI.
After recently recapping data models, I thought I would write this blog to test my ability to describe them, with the twist of testing how a very popular AI bot (that is creating waves in all forms of creative expression) can describe them also. The explanations are brief to test precision and the ability to condense explanations into a shorter format (for both me and the AI).
The default method of linking two sets of data within Tableau is via relationships, however, the idea of using physical joins can initially seem to be more intuitive. This blog will recap the differences between the two, as well as some potential drawbacks.
A join is a great way to combine two or more tables - and increase the amount of fields that relate to an original row of the data, by using a unique identifier in order to join the two. If the two tables have a common field, for example an ID, then the tables can be joined in one of four ways; inner, left, right, and full outer. The latter options allow the passing through of data which might be dropped otherwise - since they have no match with the other tables. It is important to note that the tables must all have the same granularity, otherwise the structure of the original data will change or an explosion of the rows will occur (which I will explain later) .
When using data, a join is a way to combine data from two or more tables based on a common field between them. For example, you might have a customers table and an orders table, and you can use a join to get a list of all the orders made by each customer. There are different types of joins that you can use in Tableau, including inner, outer, left, right, and full outer joins. The type of join you choose depends on the data you are working with and the type of analysis you want to perform.
Relationships on the other hand allow multiple tables to be connected via logical steps, and not in the physical merging of tables as done with a join. This allows values of data to relate to other relevant values of data from another table, without joining them into one dataset, and can work well with different levels of granularity (without the problem of exploding the data). Relationships can also work with multiple datasets easily, and this can be quickly done within Tableau - by creating 'noodles' which relate the datasets in the Data Source section within desktop, while also using a common field as the 'join clause'.
The main use case for relationships is when you require one row of data to relate to many rows of other data in another dataset, but without exploding the data and keeping the original granularity. It is also seen as the preferred method in most cases in modern data model theory, and there are even arguments as to using relationships in every single scenario as opposed to its counterpart.
A relationship, however, is a way to link two or more tables together in a database in such a way that data in one table can be related to data in another table. There are several types of relationships that can be used, including one-to-one, one-to-many, and many-to-many.
For example, in a one-to-one relationship, each record in one table is related to exactly one record in another table. In a one-to-many relationship, a record in one table can be related to many records in another table. And in a many-to-many relationship, a record in one table can be related to many records in another table, and vice versa. A relationship data model helps to ensure the integrity and consistency of the data in a database, by establishing clear connections between entities and enforcing rules about how they can be related to one another. It also makes it easier to query and analyze the data, as you can use the relationships to navigate between different entities and explore the data in different ways.
Joining the data can cause an explosion of data, especially when the tables that are being joined have different levels of granularity, and one row doesn't correspond to another singular row in particular.
Take the example where there are two tables, one being 'Sales Person' and the other being 'Sales'. The Sales Person dataset shows each sales person and their employee information, whereas the sales shows every single sale and which sales person that corresponds to. If there were 10 sales people, but hundreds of rows of sales for each one, then joining the data by the Sales Person's name will explode the data from 10 rows to several hundred if not thousands of new rows. This can be useful in some cases, but catastrophic in others. It's important to know what is going into the joins, and what your desired outcome is; especially the structure and the granularity of the data.
A join can cause an explosion of data, also known as a "cartesian product," when it combines two or more tables in such a way that every row in one table is paired with every row in the other table. This can result in a very large number of rows in the resulting table, depending on the size of the tables being joined.
A drawback that is worth mentioning regarding relationships is the processing speed, and that can be easily seen with anybody that is familiar with Tableau. Physical joins of extract data can improve query speeds, especially when that is compared to logical relationships of live databases.
Obviously joins and relationships can happen simultaneously, but this blog just looks at the concepts from a basic level (when joining 2 tables).
A few things:
- AI is becoming increasingly more impressive.
- It delivers the output within 30 seconds - whereas my part took 30 mins.
- The AI gave general descriptions, and even picked up on areas I forgot to.
- I won't be using the AI again to help me write blogs, however, this was an interesting look into how it could describe some complex ideas - what does this mean for the future of education in the data analytics space?
![](https://www.thedataschool.co.uk/content/images/2022/12/image-286.png)