SUBSTR in Snowflake, or SUBSTRING in other databases, is a simple function that extracts part of a string from a larger string.
Syntax:
SUBSTR(<value>, <starting_point> [, <length>])
- value: the string you want to extract a substring from
- starting_point: where you want to start extracting from (position of the letter, where 1 is the first character)
- length: how many characters to extract (optional)
Note: The <length> parameter is optional. If you leave it out, the function will extract from the starting point all the way to the end of the string.
Example 1: Basic extraction
SELECT SUBSTR('Snowflake', 1, 4);
-- Result: 'Snow'
Example 2: Extracting from the middle
SELECT SUBSTR('Snowflake', 5, 5);
-- Result: 'flake'
Example 3: Omitting length to extract to the end
SELECT SUBSTR('Snowflake', 5);
-- Result: 'flake'
Bonus Tip: You can also use negative starting positions to count from the end of the string. For example, SUBSTR('Snowflake', -5) returns 'flake'.
Pairing up with POSITION to make it dynamic
While SUBSTR is powerful on its own, it becomes even more useful when combined with POSITION (or CHARINDEX). This allows you to dynamically extract substrings based on the location of specific characters or patterns, rather than hardcoding position numbers.
Example: Extracting everything after a delimiter
Let's say you have a table with email addresses and want to extract just the domain:
SELECT
email,
SUBSTR(email, POSITION('@' IN email) + 1) AS domain
FROM emails;
Result:
email | domain
-------------------------|-------------
john.doe@company.com | company.com
jane.smith@tech.io | tech.io
bob.jones@startup.co | startup.co
Example: Extracting everything before a delimiter
Now let's extract just the username (everything before the @):
SELECT
email,
SUBSTR(email, 1, POSITION('@' IN email) - 1) AS username
FROM emails;Result:
email | username
-------------------------|------------
john.doe@company.com | john.doe
jane.smith@tech.io | jane.smith
bob.jones@startup.co | bob.jones
By combining SUBSTR with POSITION, you can handle real-world data that doesn't follow a fixed format, making your queries more flexible and maintainable.
