In this blog post, I’ll walk you through the process of creating an air quality indicator using data from the EPA API, cleaning it in Tableau Prep, and visualizing the results in Tableau Pulse.
1. Introduction to the Project
The goal of this project is to create a reliable air quality indicator by leveraging real-time data from the EPA, cleaning and preparing the data in Tableau Prep, and then using Tableau Pulse to analyze and visualize the data effectively.
2. Data for Pulse
To begin with, it’s essential to understand the type of data we’re working with. Air quality data typically includes various metrics like PM2.5, Ozone levels, and other pollutants on a daily level. For this project, the data was sourced from the EPA’s AirNow API, which provides daily air quality information.In order for a metric to work well in Pulse the data needs these qualities.
- Data Characteristics:
- Granularity: Daily data is essential for tracking changes over time.
- Completeness: The dataset should be free of gaps to ensure accurate analysis.
- Field Names: Use understandable and human-readable field names.
- Key Fields: Include a time dimension, a measure (e.g., Air Quality Indicators), filters (e. g., Pollutants), and a record ID for identifying outliers.
3. Getting the Data from the EPA API
The EPA provides an API called AirNow, which allows you to download daily air quality data. You can automate this process using Google Scripts, which will fetch the data daily and save it to your Google Drive.You will need to create a free account to get a key and call the API. The API key can be found at the top right corner of the Web Services tab. If you don’t want to use the API there is also the option to download daily data as a csv.
Steps:
- API Call: Set up an API call using the provided query specifying the parameters like zip code, date, and the type of data (e.g., PM2.5 levels).
Automation: Use Google Scripts to automate the data retrieval process. You can create a new script from Google Drive under the New button.
To write the script navigate to the <> tab.
The script below retrieves the last 90 days of data and saves it to drive, ensure to use your API key with it and designate your drive location.
function Air_Quality_Data() {
// API key and other parameters
var apiKey = 'insert your API key';
var zipCode = '10010';
var distance = 25; // Adjust the distance if necessary
// Initialize a variable to store the combined CSV content
var combinedCsvContent = '';
var headerIncluded = false; // Flag to track if the header has been included
// Loop through the last 90 days
for (var i = 0; i < 90; i++) {
// Calculate the date for each day
var date = new Date();
date.setDate(date.getDate() - i);
var dateStr = Utilities.formatDate(date, Session.getScriptTimeZone(), 'yyyy-MM-dd');
// Construct the API URL for each day
var url = 'https://www.airnowapi.org/aq/forecast/zipCode/?format=text/csv&zipCode='
+ zipCode + '&date=' + dateStr + '&distance=' + distance + '&API_KEY=' + apiKey;
// Fetch the data from the API
var response = UrlFetchApp.fetch(url);
if (response.getResponseCode() == 200) {
Logger.log('Data for ' + dateStr + ': Fetched successfully.');
// Get the CSV content for the day
var csvContent = response.getContentText();
// Split the content into rows
var rows = csvContent.split('\n');
// Append rows to the combined content, skipping empty rows and headers after the first one
for (var j = 0; j < rows.length; j++) {
if (rows[j].trim() !== '') { // Check if the row is not empty
if (!headerIncluded) {
// Include the header from the first file
combinedCsvContent += rows[j];
headerIncluded = true; // Set the flag to true after including the header
} else if (j > 0) {
// Skip the header and only include data rows
combinedCsvContent += '\n' + rows[j];
}
}
}
} else {
Logger.log('Failed to fetch data for ' + dateStr + '. HTTP Status code: ' + response.getResponseCode());
}
}
// Folder ID where you want to save the file
var folderId = '1NrAOYee2AGyCTdC8i7I2zBPZjERGQTIM';
// Get the folder by ID
var folder = DriveApp.getFolderById(folderId);
// Define the file name for the combined CSV
var fileName = 'NYCDailyAirQualityData.csv';
// Search for an existing file with the same name in the folder
var files = folder.getFilesByName(fileName);
// If the file exists, delete it
if (files.hasNext()) {
var file = files.next();
file.setTrashed(true);
}
// Create a new file in the folder with the combined CSV content
folder.createFile(fileName, combinedCsvContent, MimeType.CSV);
Logger.log('Data saved: ' + fileName);
}
The script can be scheduled to run daily, ensuring that you always have the most recent data available.
4. Cleaning and Preparing Data in Tableau Prep
Once the data is retrieved, the next step is to clean and prepare it in Tableau Prep.
Key Steps in Tableau Prep:
- Import Data: Load the daily air quality data into Tableau Prep, you can either connect directly to Google Drive from Prep or upload it as a data source to Tableau Cloud. Make sure to embed your credentials!
- Data Cleaning: Keep only the date values in date issue that are equal to date forecast, remove all columns except date issue, parameter, and AQI. Rename date issue to date and parameter to pollutant.
- Data Transformation: Create a record ID using the ROW_NUMBER() function. Create a calculated field with this value {ORDERBY [Date]: ROW_NUMBER()}
- Output: Save the cleaned dataset as a data source in Cloud, which will be used in Tableau Pulse for further analysis.
5. Creating the Air Quality Indicator in Tableau Pulse
With the data cleaned and ready, the next step is to create the air quality indicator in Tableau Pulse. Pulse is a powerful tool for real-time monitoring and alerting based on key metrics.
Steps:
- Connect to Data:Create a new metric definition and select the data source you would like to use.
- Metric Creation: Pass AQI into measure and aggregate it to average. Pass date to time dimension. Pass pollutant to metric filters. In the Insights tab pass Record ID to Record identifier.
- Save the Definition: Once saved you can adjust the filters and follow the metric to get a daily feed of insights.