Dealing with complex datasets can be a challenging task

This week we had to work with a dataset that seemed easy to understand at first but, on closer inspection, turned out to be more complex than expected. First glances can be deceiving. Spoiler alert: that's my main learning from this week. Some datasets look simple but in reality take several hours to understand.

Here are some general steps for handling a complex dataset effectively:

  1. Understand the data: Before doing any analysis, it is essential to understand the data. This includes understanding the data structure, variable types, and relationships between variables. If you can, ask the stakeholders for help, and go over the data along with your client.
  2. Clean and preprocess the data: Cleaning and preprocessing the data can help identify and correct errors, handle missing data, and transform the data into a format suitable for analysis. This also helps with the understanding process, because new questions often arise from this step.
  3. Explore the data: Exploring the data can help identify patterns, relationships, and outliers that can inform the subsequent analysis. This can include visualizations, summary statistics, and data profiling. For us in the Data School, that means having fun in Tableau.
  4. Choose appropriate analysis methods: The chosen methods should be appropriate for the type of data and the research question being addressed. This could include statistical methods, machine learning algorithms, or data mining techniques. Don't try to fit the data to your preferred method; instead, find the method that fits your data and your problem.
  5. Validate the results: Validating the results is crucial to ensure that the analysis methods used are appropriate and that the results are reliable. This can include cross-validation, sensitivity analysis, or hypothesis testing. This part took up most of our time this week, which in hindsight was not ideal. Asking more questions at the kick-off meeting and working closely with the client would have saved us hours.
  6. Communicate the results: The results of the analysis should be communicated clearly and effectively. This can include visualizations, tables, and written summaries, sometimes even all of them. This week we missed the opportunity to include a summary, and the presentation suffered because of it. Being honest with the client is also very important: if you don't have the necessary data to answer their questions, you should tell them that.
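
To make steps 1 to 3 a bit more concrete, here is a minimal sketch in plain Python. It is not what we used this week (our work happened in Tableau); the sample data, the column names (`order_id`, `region`, `amount`), and the list of "missing" markers are all made up for illustration. The idea is simply to profile each column first, then clean, then look at a quick summary.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; the column names are assumptions for illustration.
RAW = """order_id,region,amount
1,North,120.50
2,South,
3,north,80.00
3,north,80.00
4,East,n/a
"""

# Assumed markers for "no value"; a real dataset may use others.
MISSING = {"", "n/a", "na", "null"}


def load_rows(text):
    """Step 1: read the CSV text into a list of dicts to inspect its structure."""
    return list(csv.DictReader(io.StringIO(text)))


def profile(rows):
    """Step 1: count missing and distinct values per column."""
    report = {}
    for column in rows[0].keys():
        values = [row[column].strip() for row in rows]
        missing = sum(1 for v in values if v.lower() in MISSING)
        distinct = len({v for v in values if v.lower() not in MISSING})
        report[column] = {"missing": missing, "distinct": distinct}
    return report


def clean(rows):
    """Step 2: normalise text case, parse numbers, drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        amount_text = row["amount"].strip()
        record = (
            int(row["order_id"]),
            row["region"].strip().title(),
            None if amount_text.lower() in MISSING else float(amount_text),
        )
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned


rows = load_rows(RAW)
report = profile(rows)          # what does each column look like?
cleaned = clean(rows)           # tidy version of the same rows
# Step 3: a first summary, here the number of orders per region.
region_counts = Counter(region for _, region, _ in cleaned)

print(report)
print(cleaned)
print(region_counts)
```

Even on this toy example, the profile raises the kind of question worth taking back to the client: are "North" and "north" the same region, and does an empty amount mean zero or unknown?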

In addition to these steps, it is essential to clearly understand the research question being addressed and to involve the stakeholders where necessary. This week, one of their definitions made no sense to me but did make sense to them. Never forget that they know their business better than we do (even though they often don't know their data as well as they think they do). It is also important to keep refining the analysis as new insights are gained or additional data becomes available. Don't forget to present what the next steps would be!

Author:
Lucas Carvalhal Sirieiro