In today's training we discovered Mockaroo - a really handy tool for creating fictional datasets, which is also free to use! I expect it will be very useful in the not-so-distant future, when the data I have for a project is fairly limited and I need a way to replicate and expand upon it. This has happened to me already: I had a few different columns of data, all for one specific date, but I wanted to bring in more rows for different dates to allow for some examples of trend analysis.
To start the process, create a new schema, which will take you to a page like the one in the screenshot below:
![](https://www.thedataschool.co.uk/content/images/2022/03/image-17.png)
You can then create as many columns as you'd like, labelling each one via 'Field Name'. Once a column is named, you can select its 'Type', which offers a whole range of options such as names, car makes, addresses etc. There are also a few more customisable types, such as random numbers or letters, which give you a bit more control, for example if there is a certain format you'd like the field to follow, like 'ABC-123'.
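To make the 'ABC-123' idea a bit more concrete, here is a minimal Ruby sketch of what a pattern-based field does. It is only a local imitation, not how Mockaroo works internally: the `random_from_pattern` name and the pattern symbols ('A' for a random uppercase letter, '9' for a random digit) are my own assumptions for illustration.

```ruby
# Sketch only: mimics a pattern-based field locally; not Mockaroo's implementation.
# Hypothetical pattern symbols: "A" -> random uppercase letter, "9" -> random digit.
# Any other character (such as the hyphen) is copied through unchanged.
def random_from_pattern(pattern, rng: Random.new)
  pattern.each_char.map do |ch|
    case ch
    when "A" then ("A".."Z").to_a.sample(random: rng)
    when "9" then rng.rand(10).to_s
    else ch
    end
  end.join
end

# Generate a few codes that follow the 'ABC-123' shape.
codes = Array.new(3) { random_from_pattern("AAA-999") }
puts codes
```

Each run prints three different codes, all with three letters, a hyphen and three digits.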
After this, there are a few more options, such as choosing a percentage of a field's values to be blank to replicate incomplete data, or performing calculations on a field in a similar way to a 'Calculated Field' in Tableau or the 'Formula' tool in Alteryx. The main difference is that these calculations are written in the Ruby language, so there are some slight differences, such as needing to use field('first_name') to reference the first_name field.
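If you haven't used Ruby before, the sketch below shows roughly what a formula in this style looks like. So that it runs outside Mockaroo, I've written a stand-in `field` helper that reads from a made-up row; the `last_name` column, the sample values and the email formula are all assumptions for illustration, not something taken from the tool.

```ruby
# Sketch only: a local stand-in for the field(...) helper, so the formula
# style can be tried outside Mockaroo. The row below is made up.
ROW = { "first_name" => "Ada", "last_name" => "Lovelace" }.freeze

# Hypothetical helper: look up another column's value in the current row.
def field(name)
  ROW.fetch(name)
end

# A formula in this style: build an email address from two other fields.
email = "#{field('first_name')}.#{field('last_name')}@example.com".downcase
puts email
```

The last expression is the part that resembles a calculated field: it joins two other columns and lowercases the result, much like a string calculation in Tableau or Alteryx.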
Once this has been set up as you'd like, you can choose how many rows of data to generate and then preview the result to check it's all looking okay. If you're happy with it, you can download the data in your chosen format and you're ready to go.
![](https://www.thedataschool.co.uk/content/images/2022/03/image-31.png)
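Once the file is downloaded, a quick sanity check can confirm that the row count and the share of blanks match what you asked for. The Ruby sketch below assumes a CSV download and uses a small inline sample in place of a real file; the column names, the values and the `blank_percentage` helper are all hypothetical.

```ruby
require "csv"

# Stand-in for a downloaded file; the columns and values are made up.
# With a real download, CSV.read("your_file.csv", headers: true) would
# take the place of CSV.parse (the filename here is a placeholder).
SAMPLE = <<~DATA
  id,first_name,email
  1,Ada,ada@example.com
  2,Grace,
  3,Alan,alan@example.com
  4,Edsger,
DATA

table = CSV.parse(SAMPLE, headers: true)

# Percentage of rows where the given column is empty.
def blank_percentage(table, column)
  blanks = table.count { |row| row[column].to_s.strip.empty? }
  (100.0 * blanks / table.size).round(1)
end

puts "rows: #{table.size}"
puts "blank email: #{blank_percentage(table, 'email')}%"
```

With this sample, two of the four email values are empty, so the check reports 4 rows and 50.0% blanks.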
The last thing worth mentioning is that you can save schemas, which is particularly useful when you're building larger datasets, as there's nothing more frustrating than making 20 complicated fields only to lose them all - so save as you go!