One thing I've learnt so far through Data School projects is that data rarely arrives exactly when you need it.
When we want to build out a dashboard but the data is not ready, we might choose to use dummy data as a placeholder. Our go-to will often be to use Superstore, but what if Superstore is not suitable as a substitute?
Example - Organic Store dashboard
In this example, I want to create some dummy data for a UK-based chain of stores that sell organic food; required fields include a product name, sales figure, production cost, profit, and store location.
Sales figures for this store will usually range from £10 to £140.
We can go to Mockaroo to create our dummy data:
It is worth signing up for a free account, as this allows us to save and revisit the schemas that we make.
N.B. - a schema describes the contents of a dataset, including the data type of each field and any key features.
The landing page automatically suggests a typical schema for dummy data; to build out our own schema, we can start by just deleting the fields suggested.
Click the 'ADD ANOTHER FIELD' button to start creating fields.
We can begin by creating our Product Name field: type in the field name, then click in the 'Type' box to select the kind of mock data that will go here. We are given a large range of options to choose from in the 'Choose a Type' pop-up window:
Mockaroo has a Type called 'Product (Grocery)'; this will be a good substitute for our organic store product names.
For each field we make, its Options appear to the right of the data type box. The default option sets what percentage of the values in that field will be blank; for this exercise we can leave this at 0%.
Next, we want to set the Sales figure for each product (i.e. how much a customer paid for it). Add another field, call it 'Sales', and choose 'Number' as the data type.
This time, we have more options to choose from: we can set a min and max value, and also choose the number of decimal places.
We know that this number represents a price, so we can set 2 decimal places; we also know that prices tend to range from £10 to £140, so we can set these min and max values.
Next, we can set up the cost of production field; this will also be a 'Number' data type with 2 decimal places, but we would expect it to be a fraction of the selling price. Let's pick an arbitrary number for the max value in this field, say £30.
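To get a feel for what these two Number fields will produce, here is a minimal Python sketch that mimics them locally. It is not how Mockaroo generates its values; the function name `random_amount`, the fixed seed, and the £0 minimum for production cost are illustrative assumptions, since only the £30 maximum is chosen above.

```python
import random


def random_amount(low, high, rng):
    """Return a random amount between low and high, rounded to 2 decimal places."""
    return round(rng.uniform(low, high), 2)


rng = random.Random(42)  # fixed seed so the sketch is repeatable (an assumption)

# Sales: £10 to £140, as set in the schema above
sales = [random_amount(10, 140, rng) for _ in range(5)]

# Production cost: max £30 from the text; the £0 minimum is an assumption
costs = [random_amount(0, 30, rng) for _ in range(5)]

print(sales)
print(costs)
```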
Now that we have the selling price and the cost of production for each product, we can create a profit field based on these fields. Add another field and call it 'Profit'.
To the far right of the Options next to this new field, there is a sigma button (Σ) representing the Formula option.
Clicking this button will open a new window where we can create a formula for the new field.
Formulas in Mockaroo are case-sensitive and have a particular syntax: to refer to one of our fields in a formula, we use the construction field('field_name').
Subtract the cost of production from the sales figure to calculate our profit.
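In Mockaroo's syntax, this formula would be along the lines of field('Sales') - field('Production_Cost'), where 'Production_Cost' is a hypothetical name for the cost of production field (use whatever name you gave it, matching the case exactly). The same calculation, sketched in Python purely for illustration:

```python
def profit(row):
    """Profit = sales figure minus cost of production, kept to 2 decimal places."""
    return round(row["Sales"] - row["Production_Cost"], 2)


# Illustrative rows only; "Production_Cost" is an assumed field name
rows = [
    {"Sales": 100.00, "Production_Cost": 25.50},
    {"Sales": 10.00, "Production_Cost": 30.00},  # cost above sales gives a loss
]

for row in rows:
    row["Profit"] = profit(row)
    print(row)
```

The second row shows that, with the ranges chosen above, a cost of up to £30 can exceed a sale as low as £10, so some rows in the dummy data may show a negative profit.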
The final field we need is for our stores' locations. Mockaroo has a list of cities around the world that would be suitable for this; we can create a new 'Store_location' field and select 'City' as our data type.
However, we know that our organic store is UK-based, so ideally we will only use city names from the UK. We can create a new 'Store_Country' field, set its data type to 'Country', and use the 'restrict countries...' option to limit it to the United Kingdom, so that our cities come only from the UK.
Our final schema should look something like this:
At the bottom of the page, we can click the 'SAVE THIS SCHEMA' button in case we want to revisit this work. We can then click 'DOWNLOAD DATA'.
If we want to configure our options for the download, we can set the number of rows, the file format etc. at the bottom of the page:
Note: when you preview the dataset each time, it will look subtly different as Mockaroo refreshes the random values in each field. The final output will only be 'confirmed' when it is downloaded.
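Once the file is downloaded, it can be worth a quick check that the values behave as intended before building on them. The sketch below assumes a CSV download with columns named Sales, Production_Cost and Profit (the cost column name is hypothetical), and uses a couple of made-up rows in place of the real file.

```python
import csv
import io

# Made-up sample standing in for the downloaded file; swap in open("your_file.csv")
sample = io.StringIO(
    "Product_Name,Sales,Production_Cost,Profit\n"
    "Example item A,100.00,25.50,74.50\n"
    "Example item B,150.00,12.00,138.00\n"
)


def check_rows(reader):
    """Return a list of (row number, problem) for rows that break the schema rules."""
    problems = []
    for i, row in enumerate(reader, start=1):
        sales = float(row["Sales"])
        cost = float(row["Production_Cost"])
        profit = float(row["Profit"])
        if not 10 <= sales <= 140:
            problems.append((i, "Sales outside 10-140"))
        if abs((sales - cost) - profit) > 0.005:
            problems.append((i, "Profit is not Sales minus cost"))
    return problems


problems = check_rows(csv.DictReader(sample))
print(problems)
```

Here the second sample row is deliberately out of range, so it is the only one reported.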
We now have some viable dummy data for our food store:
If we did not want to include the Store Country field in our final output - since it only shows 'United Kingdom' over and over - we could use a 'hidden field' here. Hidden fields in Mockaroo can be used to help construct other fields, without necessarily appearing in the dataset we download. To make a hidden field, we just select the field in question and add two underscores at the front of the field name, e.g. '__Store_Country'.
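The effect of the double-underscore convention can be pictured with a short Python sketch: columns whose names start with '__' are available while each row is being built, but are left out of the output. This only imitates the behaviour described above; the helper name `visible_columns` and the sample row are made up for illustration.

```python
def visible_columns(row):
    """Drop any column whose name starts with two underscores."""
    return {name: value for name, value in row.items() if not name.startswith("__")}


# Illustrative row; the hidden country field is present while building the row
row = {
    "Store_location": "Leeds",
    "__Store_Country": "United Kingdom",
}

output_row = visible_columns(row)
print(output_row)
```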
We can go back to the schema in Mockaroo and refine things if need be; otherwise, we can use this as our Tableau data input, and start building a sample dashboard for our client: