How to Build a Serverless API Data Pipeline on GCP (Part 1/2)

Introduction

In this blog, you'll learn how to build a production-ready, serverless data pipeline on Google Cloud Platform that automatically ingests data from an external API and loads it into BigQuery for analysis.

This is Part 1 of 2.

What you'll build:

    • An automated pipeline that runs every hour
    • Serverless API ingestion using Cloud Functions
    • Raw data storage in Cloud Storage
    • Structured data warehouse in BigQuery

What you'll learn:

    • Setting up GCP services for data engineering
    • Writing and deploying Cloud Functions
    • Scheduling jobs with Cloud Scheduler
    • Loading data into BigQuery
    • Handling duplicates and data quality

Prerequisites:

    • A GCP account (free tier works!)
    • Basic Python knowledge (enough to write a simple API call, or have AI help you write one)
    • Familiarity with REST APIs
    • ~30-45 minutes

Architecture Overview:

Cloud Scheduler → Cloud Functions → Cloud Storage → BigQuery
     (Hourly)     (Python API Call)   (JSON Files)    (Analytics)

Step 1: Set Up Your GCP Project

1.1 Create a New Project

  1. Go to the GCP Console
  2. Click on the project dropdown at the top
  3. Click "New Project"
  4. Name your project (e.g., `serverless-api-pipeline`)
  5. Click "Create"

1.2 Enable Required APIs

Navigate to APIs & Services → Library and enable the following (or enable them from Cloud Shell, as sketched after this list):

    • Cloud Functions API
    • Cloud Scheduler API
    • Cloud Storage API
    • BigQuery API
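
If you prefer the command line, the same four APIs can be enabled in one go from Cloud Shell. A minimal sketch (the service names are the standard identifiers for these products; it assumes gcloud is authenticated against your new project, as it is in Cloud Shell):

import subprocess

# Equivalent to clicking "Enable" on each API in the Library page.
SERVICES = [
    "cloudfunctions.googleapis.com",
    "cloudscheduler.googleapis.com",
    "storage.googleapis.com",
    "bigquery.googleapis.com",
]

subprocess.run(["gcloud", "services", "enable", *SERVICES], check=True)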

1.3 Set Up Billing

Make sure billing is enabled for your project (required for Cloud Functions and Scheduler).


Step 2: Choose and Test Your API

2.1 Select an API

For this tutorial, we'll use a free exchange rates API. You can use:

  • Frankfurter (free)
  • Or any other public REST API that returns JSON

2.2 Test the API in Cloud Shell

    • Open Cloud Shell (search for it in the console, or click the Cloud Shell icon in the top bar)
    • Create a Python file with the script below and run it to test the API:
import requests
import json
from datetime import datetime, timezone

API_URL = "https://api.frankfurter.app/latest"

def call_api():
    response = requests.get(API_URL, timeout=10)
    response.raise_for_status()
    return response.json()

def main():
    data = call_api()

    output = {
        "ingestion_timestamp": datetime.now(timezone.utc).isoformat(),
        "base_currency": data["base"],
        "date": data["date"],
        "rates": data["rates"]
    }

    print(json.dumps(output, indent=2))

if __name__ == "__main__":
    main()

Run the script - you should see output like this:

{
  "ingestion_timestamp": "2025-12-25T09:00:00+00:00",
  "base_currency": "EUR",
  "date": "2025-12-25",
  "rates": {
    "AUD": 1.75,
    "GBP": 0.73,
    "JPY": 110.5
  }
}
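
The Cloud Function you'll write in Step 4 assumes the base, date, and rates keys are always present in the API response. If you want the script to fail fast when the API changes shape, you could add a small guard like this (validate_payload is a hypothetical helper, not part of the original script) and call it on the result of call_api() before building the output:

def validate_payload(payload):
    # Raise early if the API response is missing the fields the pipeline relies on.
    required = {"base", "date", "rates"}
    missing = required - payload.keys()
    if missing:
        raise ValueError(f"API response missing keys: {sorted(missing)}")
    if not isinstance(payload["rates"], dict) or not payload["rates"]:
        raise ValueError("'rates' should be a non-empty mapping of currency to rate")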

Step 3: Create a Cloud Storage Bucket

3.1 Create the Bucket

  1. Search for Cloud Storage
  2. Click "Create Bucket"
  3. Name your bucket (e.g., `api-pipeline-data`)
  4. Choose a region close to you
  5. Select "Standard" storage class
  6. Click "Create"

3.2 Create Folder Structure

Inside your bucket, create a folder called raw/exchange_rates/ - this is where the Cloud Function will write its JSON files.
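
Cloud Storage has no real folders: the console's "Create folder" button just writes a zero-byte placeholder object, and any object whose name starts with raw/exchange_rates/ will show up inside it. If you'd rather create the placeholder from Cloud Shell, a minimal sketch (assuming google-cloud-storage is installed and the bucket name matches yours):

from google.cloud import storage

BUCKET_NAME = "api-pipeline-data"  # replace with your bucket name

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)
# A zero-byte object ending in "/" is what the console displays as a folder.
bucket.blob("raw/exchange_rates/").upload_from_string("")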


Step 4: Build the Cloud Function

4.1 Create the Function

  1. Search for Cloud Functions
  2. Click "Create Function"
  3. Configure:
    • Name: api-ingestion-function
    • Region: the same region as your bucket (ideally one with Tier 1 pricing)
    • Authentication: Require authentication

4.2 Write the Function Code

Click "Next" to go to the code editor.

Runtime: Python 3.11

Function entry point: main

main.py:

import json
import requests
import functions_framework
from datetime import datetime, timezone
from google.cloud import storage

API_URL = "https://api.frankfurter.app/latest"
BUCKET_NAME = "api-pipeline-data"  # replace with your bucket name from Step 3

def call_api():
    # Fetch the latest exchange rates and return the parsed JSON payload.
    response = requests.get(API_URL, timeout=10)
    response.raise_for_status()
    return response.json()

# Register main as the HTTP entry point for the Functions Framework.
@functions_framework.http
def main(request):
    data = call_api()

    # Wrap the API payload with an ingestion timestamp so every file is traceable.
    output = {
        "ingestion_timestamp": datetime.now(timezone.utc).isoformat(),
        "base_currency": data["base"],
        "date": data["date"],
        "rates": data["rates"]
    }

    # One file per run, named by UTC timestamp, under the raw/ prefix.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    filename = f"raw/exchange_rates/{timestamp}.json"

    # Write the JSON document to the Cloud Storage bucket created in Step 3.
    client = storage.Client()
    bucket = client.bucket(BUCKET_NAME)
    blob = bucket.blob(filename)
    blob.upload_from_string(
        json.dumps(output),
        content_type="application/json"
    )

    # Returning a dict sends a JSON response back to the caller.
    return {
        "status": "success",
        "file_written": filename
    }

requirements.txt:

functions-framework==3.*
google-cloud-storage==2.10.0
requests==2.31.0
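
Optionally, if you also save main.py and requirements.txt in Cloud Shell, you can smoke-test the handler before deploying. A minimal sketch (test_local.py is a hypothetical helper, not deployed with the function), assuming the dependencies are installed with pip3 install -r requirements.txt and you are authenticated against the right project, as Cloud Shell is by default:

# test_local.py - run with: python3 test_local.py
import main

if __name__ == "__main__":
    # The handler never reads the request object, so None is enough for a smoke test.
    result = main.main(None)
    print(result)  # expect {'status': 'success', 'file_written': 'raw/exchange_rates/...'}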

4.3 Deploy the Function

  1. Click "Deploy"
  2. Wait for deployment to complete (2-3 minutes)
  3. Note the Trigger URL once deployed
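
Before wiring up the scheduler, you can invoke the deployed function once by hand. Because it requires authentication, a plain request is rejected; one workaround (assuming your account is allowed to invoke the function, which a project owner is, and that gcloud is available and authenticated, as in Cloud Shell) is to borrow an identity token from gcloud:

import subprocess
import requests

# Paste your function's trigger URL here.
FUNCTION_URL = "https://YOUR-REGION-YOUR-PROJECT.cloudfunctions.net/api-ingestion-function"

# gcloud mints a short-lived identity token for the active account.
token = subprocess.run(
    ["gcloud", "auth", "print-identity-token"],
    capture_output=True, text=True, check=True
).stdout.strip()

response = requests.get(
    FUNCTION_URL,
    headers={"Authorization": f"Bearer {token}"},
    timeout=30
)
print(response.status_code, response.text)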

Step 5: Schedule the Function with Cloud Scheduler

5.1 Create a Scheduler Job

  1. Navigate to Cloud Scheduler
  2. Click "Create Job"
  3. Configure:
    • Name: hourly-api-ingestion
    • Region: Same as your function
    • Frequency: 0 * * * * (every hour at minute 0)
    • Timezone: Your timezone

5.2 Configure the Target

  1. Target type: HTTP
  2. URL: Your Cloud Function trigger URL
  3. HTTP method: GET
  4. Auth header: Add OIDC token
    • Service account: Default compute service account

5.3 Test the Schedule

Click "Force Run" to test immediately.

Check your Cloud Storage bucket - you should see a new JSON file! (If the run fails with a permission error instead, grant the service account from step 5.2 the Cloud Functions Invoker role on your function, since it requires authentication.)
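
You can also confirm the file landed without clicking through the console; a quick check from Cloud Shell (assuming google-cloud-storage is installed and the bucket name matches yours):

from google.cloud import storage

BUCKET_NAME = "api-pipeline-data"  # replace with your bucket name

client = storage.Client()
# List everything the pipeline has written so far.
for blob in client.list_blobs(BUCKET_NAME, prefix="raw/exchange_rates/"):
    print(blob.name, blob.size, blob.time_created)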


Continue to Part 2 of this blog for the remaining steps: loading the data into BigQuery and handling duplicates and data quality.

Author:
Rosh Khan