Summarising YouTube Video Descriptions with Alteryx and OpenAI

The aim of this project was to leverage the natural language processing power of Open AI’s GPT language model in order to automatically generate video descriptions for our YouTube channel. This was achieved using Alteryx and Google Sheets. Here you will find an explanation of the steps in the process.

Google API Authentication

Ref: https://developers.google.com/identity/protocols/oauth2

  1. Create Oauth 2.0 credentials (Client ID & Secret) for your application here: https://console.cloud.google.com/apis/credentials
  2. Deal with scopes and get a refresh token.
    To do this you need to put an https request in the address bar of your browser and authenticate manually with the Google account that owns the videos you want to generate descriptions for. The request is formed like this:

    https://accounts.google.com/o/oauth2/v2/auth?scope=https://www.googleapis.com/auth/youtube.force-ssl&response_type=code&redirect_uri=http://127.0.0.1/callback&client_id={client id}.apps.googleusercontent.com

    Notice that the scope is set to ‘force-ssl’. Authorization tokens are all linked to certain scopes which allow you to perform certain functions on your Google/Youtube Account. In order to do what we want to do in this project, the ‘force-ssl’ or ‘youtubepartner’ scope is required.

    the redirect_uri is set to http://127.0.0.1/callback as this should be your localhost and will give you the response in the address bar. Look for ‘code’ in the address bar once you have hit enter and that will be the code you need for the next step.
  3. Generate Access and Refresh tokens.
    This request can be done in Alteryx or something else like Postman:

    POST request
    URL: https://oauth2.googleapis.com/token?
    Params:
    client_id - {client id}
    client_secret - {client secret}
    code - the code you got from the previous step
    grant_type - ‘authorization_code’
    redirect_uri - http://127.0.0.1/callback
    Headers: none should be needed

    The request will look something like this: https://oauth2.googleapis.com/token?client_id={client id}&client_secret={client secret}&code={code}&grant_type=authorization_code&redirect_uri=http://127.0.0.1/callback

    In response you should receive an Access & Refresh Token.
  4. Set up token refresh in Alteryx
    Now onto Alteryx. You want to start the workflow with a refresh of the access token as they do not last forever.

    In the download tool:

    POST request
    URL: https://oauth2.googleapis.com/token
    Payload:
    grant_type - ‘refresh_token’
    client_id - {client id}
    client_secret - {client secret}
    refresh_token - {refresh token} (obtained in previous step)
    Headers: none

Get Video Captions

The input here is a Google sheet which contains the Ids (the bit at the end of the url) of the videos we want to generate descriptions for. The other columns are for a checking stage and for keeping track of which stages of the workflow have been completed for each video.

Download tool 1 (Get caption ID):

GET Request
URL: https://www.googleapis.com/youtube/v3/captions
Payload: part - ‘snippet’
videoId - {videoId}
Headers: Authorization - ‘Bearer {Access token}’

Download tool 2 (Get caption text):

GET Request
URL: https://www.googleapis.com/youtube/v3/captions/{videoId}
Payload: Id - {caption id} (from previous request)
Headers: Authorization - ‘Bearer {Access token}’

Summarise Video Using Open AI and Write Description Back to Google Sheets

We now need to start using the OpenAI API. This requires an account and is not free, although the charges are very small, especially when using gpt 3.5 (the same model used in ChatGPT as of July 2023). The authorization for this API will be ‘Bearer’ + a key you can get for your account here: https://platform.openai.com/account.

In order to use gpt 3.5, which in my opinion is the best option currently, you need to format your message within the JSON like a chat with “role” and “content”. Here is what is should look like:

'{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "' + 'summarise the following text in 150 words or less, do not mention the word Transcript:' + {Transcript}+ '"}]
}'

You can adapt the prompt to determine the response you will get. The {Transcript} will be the captions/transcript that you got earlier from your video.

In the download tool, the setups should be as follows:

POST request
URL: https://api.openai.com/v1/chat/completions
Payload: select ‘Take Query String/Body from Field’ and choose your JSON field. Headers:
Authorization - ‘Bearer {API key}’
Content-Type - ‘application/json’
User-Agent - ‘alteryx

The Google Sheet output will write to a new sheet with the videoId and the summary/description generated by gpt 3.5.

Update YouTube Video Description

The final step is to update the video automatically with our new description text. For this we need a second branch in the workflow after the first Append Fields tool, just after the first Google sheet input.

The first filter tool looks at two columns in the Google sheet that we saw earlier: ‘ok?’ and ‘description updated?’. The ‘ok?’ column is there to be manually updated with a ‘Y’ when the relevant individual is happy for the generated description to be put live on YouTube. The ‘description updated?’ column will be automatically filled in when the description is actually updated on the video. Therefore the filter tool looks for video Ids that have had their description ok’d and which have not already had their descriptions updated. Because these columns and the actual description for the video are on separate sheets, a join is needed to bring in the actual text. Things have to be like this as Alteryx does not allow you to update rows in a Google sheet, only append.

The first download tool here is used to get the video title and the video category -  two necessary parameters for actually updating the description. The setup is as follows:

GET request
URL: https://www.googleapis.com/youtube/v3/videos
Payload: part - ‘snippet’ Id - {video Id}
Headers: Authorization - ‘Bearer {Access token}’

Next the JSON needs to be constructed as follows:

'{
"id": "'+ {video id} +'",
"snippet": {
"title": "'+ {title} +'",
"categoryId": "'+ {categoryId} +'",
"description": "' + {description text} + '"
}
}'

Download tool setup should be like this:

PUT request
URL: https://www.googleapis.com/youtube/v3/videos
Payload: select ‘Take Query String/Body from Field’ and choose your JSON field. Headers:
Authorization - ‘Bearer {Access Token}’
Content-Type - ‘application/json’

After updating the description on YouTube, the workflow adds a row on yet another sheet on the Google sheet with the videoid and video title in order to keep track of if the video description update has been completed for that video.

Google Sheet Formulas

Something I have operating on my Google sheet are a couple of formulas on sheet 1 which keep track of if a video has a had a description generated by gpt and if that description has been updated on the actual video. This makes things easier. The formula on sheet 1 looks at one of the other relevant sheets and checks if the video Id exists, if it does then a ‘Y’ is put in the relevant column.

Here is the formula used for the 'description updated?' column:

=IF(ISNA(VLOOKUP(A2, 'Video Updates'!A:A, 1, FALSE)), "N", "Y")

The Google API in this workflow can be particularly tricky and my intention is for this guide to be a useful resource for projects of this kind. Good luck!

Author:
Ollie Matthews
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2024 The Information Lab