Making up data: Mockaroo

by Algirdas Grajauskas

What is it?

Mockaroo is a website that lets you generate mock datasets. This is useful for anonymising data, or for scoping out projects where the real data is not yet available (for whatever reason).

source: Mockaroo website

Do not forget to press SAVE SCHEMA: Mockaroo refreshes every time you open it and does not save your work automatically.

Source: Mockaroo website

Creating your data set: Superstore Dataset

  • First, clear the existing fields

Start by understanding the Superstore dataset: what are its important fields? For example: "ID", "first_name", "last_name", "Store Name", "full_name", "email", and any other bits and pieces that would be associated with a generic store dataset.

Press ADD ANOTHER FIELD, name the field ID, and choose "Row Number" as the type, as this will represent the row number of each record.

Clicking ADD ANOTHER FIELD duplicates the last field created, and Mockaroo will not let you save the schema while two fields share a name. So make sure to rename the second field to first_name and choose "First Name" as the type. Repeat this process for the last_name field.


I SHALL REPEAT ONCE MORE: DO NOT FORGET TO SAVE. CLICK SAVE SCHEMA.

It is also good practice to preview the data as you build the dataset. Note that every time you press preview or download the dataset, the rows are regenerated at random according to your types, while the header names stay the same.


Add another field called Store Name and choose the type "Company Name". You can also rename the previous fields, as they are not set in stone; for example, add the prefix "store_owner_" to the first name and last name fields.

source: Mockaroo Website

Proceeding to more complex data in Mockaroo

To add something more distinctive, such as a title, create a new field, name it title, and choose the "Title" type (the one that has "Mr, Mrs, Dr"). What is interesting is that the titles are matched to gendered names, so female names get Mrs and male names get Mr.

If we also add __ to the front of the field name, the field will not be included in the download, but it can still be used in things such as formulas. This makes your life easier when you want to use a field in a calculation but do not want to fill your dataset with extra columns.

source: Mockaroo website
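
To picture what the __ prefix does, here is a minimal Python sketch of the same idea applied to rows after generation. It is not how Mockaroo works internally; the helper name drop_hidden_fields and the sample rows are made up for illustration.

```python
def drop_hidden_fields(rows):
    """Return copies of the rows without any column whose name starts with '__'."""
    return [
        {name: value for name, value in row.items() if not name.startswith("__")}
        for row in rows
    ]


# Hypothetical rows: "__title" can feed a formula but should not be exported.
rows = [
    {"ID": 1, "__title": "Mrs", "store_owner_first_name": "Ana"},
    {"ID": 2, "__title": "Dr", "store_owner_first_name": "Tom"},
]

exported = drop_hidden_fields(rows)
print(exported)
```

The hidden column is still there while the row is being built, so other fields can read it; it only disappears from the final output.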

Adding a store_owner_full_name

source: Mockaroo website

When adding a full name we could simply choose the "Full Name" type, which knows what the previous first and last names were and builds the full name from them. For this exercise, however, we also want the title in the full name, so we will use the formula tool ∑ and change the field type to Formula. When writing out the formula, make sure that if a field name contains any uppercase or alphanumerical characters you put field in front of the column name. (The language formulas are written in is similar to Ruby.)

source: Mockaroo website

KEEP IN MIND THAT FORMULAS ARE CASE SENSITIVE.
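
As a rough illustration of what the full-name formula produces, here is a small Python sketch. It is not Mockaroo formula syntax; the function name build_full_name and the sample row are assumptions made for this example. The dictionary lookup also shows why case matters: asking for a field under the wrong capitalisation simply fails.

```python
def build_full_name(row):
    """Join title, first name and last name with single spaces."""
    return " ".join([
        row["__title"],
        row["store_owner_first_name"],
        row["store_owner_last_name"],
    ])


# Hypothetical row, using the field names from this walkthrough.
row = {
    "__title": "Dr",
    "store_owner_first_name": "Ana",
    "store_owner_last_name": "Lopez",
}

full_name = build_full_name(row)
print(full_name)

# Field names are case sensitive: "__Title" is not the same key as "__title".
wrong_case_found = "__Title" in row
```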


  • Next step: create an email

To do this, repeat the process from before: add a new field, name it email, choose "Username" for the type, and use the formula to create a mock email. The lower function is used to make the emails more uniform, as there may be capital letters in the first and last names.

source: Mockaroo website
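
The same email logic can be sketched in Python. This is only an illustration of lower-casing the name parts, not the Mockaroo formula itself; the function name mock_email, the "first.last" pattern and the example.com domain are assumptions.

```python
def mock_email(first_name, last_name, domain="example.com"):
    """Build a lower-case mock email address from a first and last name."""
    return f"{first_name}.{last_name}@{domain}".lower()


email = mock_email("Ana", "Lopez")
print(email)
```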
  • Repeat the previous processes for Country, City, Street name.

It is also possible to restrict the Country field to specific countries. The City field then knows which countries it is restricted to and only pulls up cities and streets from those countries.

source: Mockaroo Website

When saving or downloading the dataset, choose fewer rows than 1000, as you would want only up to 50 stores rather than 1000. Then download your dataset by clicking DOWNLOAD DATA.


If you save the dataset you have just exported to your datasets in Mockaroo, you can reuse it in later Mockaroo sessions.


  • Create the order list by making a new schema which will be our orders

We will need Order ID, Order Date, Store ID, Customer Name, Customer Age, Category, Sales.

For Order ID we will use the GUID type, and for Order Date we will use Datetime. For Store ID we will use Dataset Column, choose our imported Superstore dataset, and pick its ID column. For Customer Age we choose the Normal Distribution type instead of Number, as this gives more realistic data than random numbers within a range. Lastly, we add a Category field with the Custom List type, which lets us enter our own list of items to be randomly assigned. We do not want an even distribution between our categories, so we will choose a custom distribution.

source: Mockaroo website

The custom distribution essentially works as a set of weights: for every value x you say how many y of them there should be relative to the others.
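
To see why these two choices give more realistic data, here is a minimal Python sketch of both ideas using only the standard library. The mean, standard deviation, age limits and category weights are made-up numbers, and clamping the age to a range is my own assumption rather than something Mockaroo is known to do.

```python
import random


def sample_age(rng, mean=40, stddev=12, low=18, high=90):
    """Draw an age from a normal distribution, clamped to a sensible range."""
    age = round(rng.gauss(mean, stddev))
    return max(low, min(high, age))


def sample_category(rng, weights):
    """Pick one category, with probability proportional to its weight."""
    categories = list(weights)
    return rng.choices(categories, weights=[weights[c] for c in categories], k=1)[0]


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical weights: for every 1 Technology order, about 3 Clothes and 6 Groceries.
weights = {"Technology": 1, "Clothes": 3, "Groceries": 6}

ages = [sample_age(rng) for _ in range(1000)]
categories = [sample_category(rng, weights) for _ in range(1000)]

print({c: categories.count(c) for c in weights})
```

Most ages land near the mean instead of being spread evenly, and the category counts follow the weights rather than splitting into thirds.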

  • Using Scenarios to create Sales data

Create a column called Sales and add scenario rows that match your Categories from before; in my case they were "Technology, Clothes, Groceries". When setting the deviations, make sure the values cannot reach 0, as you would not want £0 items in your superstore data.

source: Mockaroo website
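
The idea behind the scenarios can be sketched in Python as "one distribution per category". This is an illustration under stated assumptions, not Mockaroo's implementation: the means, deviations and the minimum sale value below are invented numbers.

```python
import random

# Hypothetical (mean, standard deviation) of a sale in pounds for each category.
SALES_BY_CATEGORY = {
    "Technology": (400.0, 120.0),
    "Clothes": (60.0, 20.0),
    "Groceries": (25.0, 8.0),
}

MIN_SALE = 0.01  # assumed floor so that no order is ever worth £0 or less


def sample_sales(rng, category):
    """Draw a sales value from the distribution that belongs to the category."""
    mean, stddev = SALES_BY_CATEGORY[category]
    value = rng.gauss(mean, stddev)
    return round(max(MIN_SALE, value), 2)


rng = random.Random(0)  # fixed seed so the run is repeatable
for category in SALES_BY_CATEGORY:
    print(category, sample_sales(rng, category))
```

Keeping the deviation small relative to the mean makes the floor almost never kick in, which is the same reason to check your deviations in Mockaroo.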

This concludes your Mockaroo preview session. You have now created both the Superstore dataset and the orders dataset, and they can be joined together in a program such as Alteryx or Tableau Prep, or even in Tableau itself.
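
If you want to check the join before opening one of those tools, the same thing can be sketched in a few lines of Python. The sample rows and the function name join_orders_to_stores are made up for illustration; the column names follow the ones used in this walkthrough.

```python
def join_orders_to_stores(orders, stores):
    """Attach the store name to each order by matching Store ID to the store's ID."""
    stores_by_id = {store["ID"]: store for store in stores}
    joined = []
    for order in orders:
        store = stores_by_id.get(order["Store ID"])
        if store is None:
            continue  # skip orders that point at a store we do not have
        joined.append({**order, "Store Name": store["Store Name"]})
    return joined


# Hypothetical rows standing in for the two downloaded datasets.
stores = [
    {"ID": 1, "Store Name": "Acme Ltd"},
    {"ID": 2, "Store Name": "Globex"},
]
orders = [
    {"Order ID": "a1", "Store ID": 2, "Category": "Clothes", "Sales": 59.99},
    {"Order ID": "b2", "Store ID": 1, "Category": "Groceries", "Sales": 12.50},
]

joined = join_orders_to_stores(orders, stores)
print(joined)
```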


Thank you for reading :)

