Matching Multiple Keywords Per Row in Alteryx

When working with string text data in Alteryx, one common challenge is matching keywords within a field. This is usually straightforward if you only need to find one keyword per row, but things get more complicated when a single row can contain multiple relevant keywords.

In a recent project, I needed to identify all keyword matches within a text field, not just the first one. In this post, I’ll walk through how I used the Append Fields tool to solve this problem.

The Problem

Let’s say you have a dataset like this:

ID Description
1 Annual audit fee and bank charge
2 Software subscription
3 Transfer fee and service fee

And a keyword list like this:

KeywordClassification
audit feeProfessional Services
bank chargeBanking Fees
subscriptionSoftware / SaaS
service feeOperational Costs

The goal is to match the keywords in the dataset to a keyword list that contains a classification for each keyword, and then append the relevant classification to the dataset.

A typical approach (e.g. Find Replace or IF/OR logic) will only return one match per row.

But in this case, we want:

  • Row 1 → audit fee and bank charge
  • Row 3 → transfer fee and service fee

So we need a way to capture multiple matches within the same row.

The Approach

The way I approached this was to flip the logic slightly. Instead of trying to find one matching keyword per row, I checked each row against every keyword in the list. To do that, I expanded the data so each row could be compared to all keywords individually.

Step 1: Expand the Data with Append Fields

The Append Fields tool allows us to pair each row in our main dataset with every row in the keyword list.

After this step, the data looks like this:

IDDescriptionKeyword
1Annual audit fee and bank chargeaudit fee
1Annual audit fee and bank chargebank charge
1Annual audit fee and bank chargesubscription
1Annual audit fee and bank chargeservice fee
2Software subscriptionaudit fee
2Software subscriptionbank charge
2Software subscriptionsubscription
2Software subscriptionservice fee

Each original row is now duplicated for every keyword, so row ID 1 now has a row for every keyword, and the same for row ID 2, and so on.

This might look inefficient (and it temporarily increases row count a lot), but it’s exactly what we need.

Step 2: Filter for Matches

Now that each row contains a single keyword to check, we can test whether the keyword appears in the description.

A simple approach would be:

Contains(Lowercase([Description]), Lowercase([Keyword]))

However, this can create false positives (e.g. matching “fee” inside “coffee”).

So instead, I used regex to match whole words or phrases more accurately.

This step filters the dataset down to only the true matches. The exact regex you use will depend on your dataset, but a simple example would be:

REGEX_Match(Lowercase([Description]), ".*\\b" + Lowercase([Keyword]) + "\\b.*")

This uses word boundaries (\b) to ensure we only match whole words, rather than matching keywords inside other words.

In my actual workflow, I used a slightly more robust version to handle punctuation and avoid edge cases with short keywords. This looked something like:

REGEX_Match(
Lowercase([Description]),
".*(^|[^a-z0-9])" + Lowercase([Keyword]) + "([^a-z0-9]|$).*"
)

This version ensures the keyword is not part of a larger word, even when punctuation or formatting varies.

Step 3: Remove Duplicate Matches

Sometimes the keyword list contains duplicates or small variations, such as different casing.

To clean this up, I grouped by:

  • ID
  • cleaned keyword
  • classification

This ensures each keyword-classification match is only returned once per row, and avoids duplicate classifications appearing later in the process.

Step 4: Handle Multiple Matches per Row

At this point, some IDs will still appear multiple times, because a single row may contain more than one valid keyword match, each with its own classification.

To reshape this back into a usable format, I then:

  • sorted by ID
  • used a Multi-Row Formula to assign a sequence number to each match per ID
  • used that sequence number to create the output columns in a Cross Tab tool

The final result looked like this:

IDClassification_1Classification_2
1Professional ServicesBanking Fees
2Software / SaaSnull
3Operational Costsnull

You’ll notice that Classification_2 is null for some rows. This simply means that only one keyword match was found for that record, so there was no second classification to populate. Where multiple keyword matches exist, the Multi-Row Formula assigns them an order, and the Cross Tab tool outputs them into separate columns.

This approach is also scalable. It is not limited to two matches. If a row contains three or more keyword matches, the same logic can automatically produce additional columns such as Classification_3, Classification_4, and so on as needed.

Why This Works

Most keyword matching approaches only return one match per row, usually based on keyword order. That means you can miss valid matches when multiple keywords appear in the same text. By checking each keyword independently, this approach captures all relevant matches and is easy to scale as your keyword list grows.

Trade-offs

The main downside is that the data gets much larger during processing, which can impact performance on bigger datasets. You may also need extra logic, like priority rules, if you want to decide which match should be treated as the main one.

Final Thoughts

Using Append Fields like this is a simple way to move from single-match to multi-match logic in Alteryx. It works especially well with free-text fields and keyword classification problems. Once you understand the expand → filter → reshape pattern, it’s a really useful technique you can reuse in a lot of scenarios.

Author:
Martin Regan
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2026 The Information Lab