An Intro to Alteryx Data Types

Alteryx has a lot of built in tools and features for dealing with and examining data types. In this post we will review what a data type is and how to manage data types in your data set. Having incorrect data types can lead to information loss, cause errors in formulas and joins, and prevent your data to properly flow. 1. StringsStrings represents a sequence of characters, which can include letters, numbers, symbols, and whitespace. Strings are used to store and manipulate textual data in various contexts, such as programming languages, databases, and file systems. They are usually surrounded by quotation marks. In Alteryx there are 4 different string types (Source):

Type	Description	Example
String	Fixed Length Latin-1 String. The length should be at least as large as the longest string you want to be contained in the field, or values are truncated. Limited to 8192 Latin-1 characters.	Any string whose length does not vary much from value to value, and only contains simple Latin-1 characters.
WString	Wide String accepts any character (Unicode.) Limited to 8192 characters.	Any string whose length does not vary much from value to value and contains any character.
V_String	Variable Length. The length of the field adjusts to accommodate the entire string within the field.	Any string whose length varies from value to value, and only contains simple Latin-1 characters.
V_WString	Variable Length Wide String. The length of the field adjusts to accommodate the entire string within the field and will accept any character.	Any string whose length varies from value to value and contains any character.

2. Numbers:

Numeric data, also known as numerical data or numeric values, refers to data that consists of numerical values representing quantities, measurements, or counts. It is one of the fundamental types of data used in various fields, including mathematics, statistics, computer science, finance, and scientific research. Numeric data can be further categorized into different subtypes based on their characteristics, such as integers, floating-point numbers, and decimals. Here's an overview of these subtypes:

Integers: Integers are whole numbers without any fractional or decimal part. They can be positive, negative, or zero. Examples of integers include -3, 0, 42, and 100.
Floating-Point Numbers: Floating-point numbers, also known as real numbers, are numbers that contain a fractional part represented with a decimal point. They can represent both integer and non-integer values. Floating-point numbers are typically stored in scientific notation, consisting of a sign bit, a significand (mantissa), and an exponent. Examples of floating-point numbers include -3.14, 0.001, 2.71828, and 123.456.
Decimals: Decimals are numeric values that represent fixed-point numbers with a specified number of decimal places. Unlike floating-point numbers, decimals have a fixed precision and do not suffer from rounding errors associated with floating-point arithmetic. Examples of decimals include 3.14159, 0.5, and -123.456.

There are several different numeric data types including integers, decimals, floats, and doubles. With the exception of the Fixed Decimal type, numeric data types are not adjustable in length. (Source)

Type	Description	Example
Byte	A unit of data that is 8 binary digits (bits) long. A byte field is a positive whole number that falls within the range 0 thru 255, or 28	0, 1, 2, 3....253, 254, 255
Int16	A numeric value without a decimal equal to 2 bytes, or -(215) to (215)-1	–32,768 to 32,767
Int32	A numeric value without a decimal equal to 4 bytes, or -(231) to (231)-1	–2,147,483,648 to 2,147,483,647
Int64	A numeric value without a decimal equal to 8 bytes, or -(263) to (263)-1	–9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
Fixed Decimal	A numeric value with a decimal. The length (precision) of a fixed decimal is equal to the width of the integer (left side of decimal) plus the decimal point plus the width of the scale (right side of decimal). If a number is negative, the negative sign is also included in the length. Alteryx defaults a Fixed Decimal to 19.6. The maximum precision is 50, inclusive of the decimal point and negative sign (if applicable). A Fixed Decimal is the only numeric data type with an adjustable length. Be careful when using Fixed Decimal in the Formula tool and when converting Fixed Decimal to Float or Double. In Formula, Fixed Decimal is implicitly converted into Double. The maximum precision for the Double data type is 15 digits and for Float 7 digits. If you are converting Fixed Decimal, you need to expect to lose all the data that couldn’t fit into the type you are converting to.	A value of 1234.567 with a length of 7.2 results in 1234.57 A value of 1234.567 with a length of 7.3 results in a field conversion error and Null output, as the value does not fit within the specified precision. A value of 1234.567 with a length of 6.1 results in 1234.6 A value of 1234.567 with a length of 8.3 results in 1234.567 A value of -1234.567 with a length of 8.3 results in a field conversion error and Null output, as the value does not fit within the specified precision. A value of 1234.567 with a length of 11.6 results in 1234.567000 A value of 12222222222222222222222222222222222.00000 when converted to Double results in 1.22222222222222e+34 A value of 1.983274187638715245 when converted to Double results in 1.98327418763872
Float	A standard single-precision floating-point value. It uses 4 bytes, and can represent values from +/- 3.4 x 10-38 to 3.4 x 1038 with 7 digits of precision. A float uses a decimal that can be placed in any position and is mainly used to save memory in large arrays of floating-point numbers.	+/- 3.4 x 10-38 to 3.4 x 1038 with 7 digits precision
Double	A standard double-precision floating-point value. It uses 8 bytes and can represent values from +/- 1.7 x 10-308 to 1.7 x 10308 with 15 digits precision. A double uses a decimal that can be placed in any position. A double uses twice as many bits as a float and is generally used as the default data type for decimal values.	+/- 1.7 x 10-308 to 1.7 x 10308 with 15 digits

3. Date Time Data:

DateTime data, also known as date and time data, refers to information that represents specific points in time or durations. It combines date information, which indicates a particular calendar date, with time information, which specifies the time of day. DateTime data is used to represent events, schedules, timestamps, and durations in various applications, including databases, software systems, and data analysis.

Type	Description	Example
Date	A 10-character String in "yyyy-mm-dd" format.	December 2, 2005 = 2005-12-02
Time	Default is an 8-character String in "HH:MM:SS" format. Specify additional precision up to 18 digits, for a max of 27 characters, including the decimal separator.	2:47 and 53 seconds a.m. = 02:47:53 2:47 and 53.236 seconds p.m. = 14:47:53.236
DateTime	Default is a 19-character String in "yyyy-mm-dd HH:MM:SS" format. Specify additional precision up to 18 digits, for a max of 38 characters, including the decimal separator.	2011-05-15 07:20:33 2005-12-02 14:47:53.123456

_datetime) # Output: 2024-02-02 12:34:56

4. Booleans:

A boolean, often referred to as a boolean value or boolean data type, is a fundamental data type in computer programming and logic that represents two possible states: true or false. Booleans are named after the British mathematician George Boole, who first defined an algebraic system of logic in the mid-19th century.

Booleans are used extensively in programming and logical operations to make decisions, control program flow, and evaluate conditions. They are particularly useful in conditional statements, loops, and boolean algebra. Here are some key characteristics of booleans:

Two States: Booleans can only have two possible values: true or false. These values represent the outcome of a logical condition or comparison.
Logical Operations: Booleans are often used in conjunction with logical operators, such as AND, OR, and NOT, to perform logical operations. For example:
- The AND operator returns true if both operands are true.
- The OR operator returns true if at least one operand is true.
- The NOT operator negates the value of a boolean, returning true if the operand is false, and vice versa.
Conditional Statements: Booleans are commonly used in conditional statements (e.g., if statements, switch statements) to execute different blocks of code based on whether a condition is true or false. For example:

Boolean Data

Type	Description	Example
Bool	An expression with only two possible values: True or False.	The words 'True' and 'False' display in the results where 'False' = 0 and 'True' = non-zero.

5. Spatial Objects:

A spatial object, also known as a geometric object or geometry, is a data type used in geographic information systems (GIS), spatial databases, and computational geometry to represent geographical features, locations, and spatial relationships. Spatial objects are fundamental for storing, analyzing, and visualizing spatial data, such as maps, satellite imagery, and geographic information.

Spatial objects can represent various types of geographic entities, including:

Points: A point represents a single location in space defined by its coordinates (latitude and longitude in a geographic coordinate system). Points are often used to represent specific landmarks, addresses, or geographic features.
Lines (or Line Strings): A line is a series of connected points that form a continuous path or route. Lines can represent features such as roads, rivers, borders, or pipelines.
Polygons: A polygon is a closed geometric shape consisting of a series of connected lines that form a closed boundary. Polygons are used to represent areas such as countries, states, cities, and administrative boundaries.
Multipoints: Multipoints are collections of multiple individual points grouped together as a single spatial object. Multipoints are useful for representing sets of discrete locations or spatial distributions.
Multilines (or Line Strings): Multilines are collections of multiple individual lines grouped together as a single spatial object. Multilines are commonly used to represent networks, such as transportation routes or utility networks.
Multipolygons: Multipolygons are collections of multiple individual polygons grouped together as a single spatial object. Multipolygons are used to represent complex geographic features with multiple contiguous or non-contiguous regions.

Spatial objects often include additional attributes or metadata associated with the geographic features they represent, such as names, identifiers, and properties. Spatial databases and GIS software support various operations and analyses on spatial objects, including spatial queries, geometric calculations, proximity analysis, and spatial joins.

Calculations and Data Type Dependency:

It's crucial to note that calculations in programming and Alteryx are data-type dependent. Arithmetic operations like "AVG" are exclusive to numeric fields, string manipulations like "TRIM" are reserved for string types, and date calculations like "DATEDIFF" can only be applied to date fields. Understanding these dependencies ensures accurate and meaningful computations within your programs.

Data Blending and Field Matching:

Data blending, a common practice in combining datasets, necessitates matching field types. String fields can only be joined with other string fields, numeric with numeric, and so forth. Failing to adhere to these rules can result in schema errors, impacting the integrity of your data integration.

Avoiding Data Loss:

Ensuring data integrity also involves avoiding data loss. Truncated strings and loss of precision in numeric values are common pitfalls. String fields, when truncated, lose valuable information, while numeric fields may lose precision, impacting the accuracy of calculations.

As you navigate the vast landscape of data types, understanding their nuances empowers you to construct robust programs and ensure the reliability of your data. Whether you're crunching numbers, manipulating text, or dealing with temporal aspects, a comprehensive grasp of data types is your compass in the coding journey.

Tools for Dealing with Data Types

Select Tool

Data Type

Utilize the Type dropdown menu to modify the data type of a column within your dataset.

Data Size

To adjust the permissible length (in characters for string, date-time, time, and numeric fixed decimal types) or measurement (in bytes for other numeric types) of data in a column, choose Size and input a numerical value. The allowable size varies depending on the data type, and you can customize it for fixed decimal numeric types, date-time and time types, as well as all string types.

Employ the [data type]: Forced option to ensure a column consistently maintains the desired data type. This feature proves beneficial when developing macros.

Auto Field Tool

Utilize the Auto Field function to analyze all records within an input and automatically determine the most suitable field type based on the data contained within the column.

The tool accurately identifies a numeric field even if a record begins with zero rather than a number and appropriately assigns it to a string data type.

Tool Configuration

In the "Select String Fields to Auto Change Field Type" section, designate the fields (columns) for autoconfiguration. You can select fields individually or opt for the "All" or "Clear" options.

Although you can configure this tool in a single step, it's essential to run the workflow at this stage to ensure that the configured fields are accurately loaded and recognized downstream. Run the workflow before proceeding to configure additional downstream tools.

DateTime Tool

Utilize the DateTime tool to seamlessly convert date-time data into various formats, catering to both expression-friendly and human-readable formats. This tool also allows you to specify the language of your date-time data. When performing operations involving two date-time data sets with different precisions, the higher precision takes precedence. To format more precise date-time formats as strings, it's necessary to insert a Select tool before writing to a database.

Tool Configuration:

Connect a data source to the input anchor of the DateTime tool.
In the Configuration window, choose the format to convert. There are two options:a. Date/Time format to string: Converts the date-time data format to a string.
- Select the date/time field to convert: Choose a date-time field (column) from the dropdown menu. If no date-time columns are available, the field appears dimmed and reads "No Suitable Input Fields Available."
- Specify the new column name: Enter a new column name for the converted data, or use the default "DateTime_Out."
- Specify your DateTime Language: Select the dropdown to choose the language for your new string column output.
- Select the format for the new column: Choose the format for your new column from the list, or select "Custom" to create your format via the "Specify a custom format for the new column" field.
b. String to Date/Time format: Converts string data to a DateTime format.
- Select the string field to convert: Choose a string field (column) to convert to a date-time format from the dropdown menu. If no string columns are available, the field appears dimmed and reads "No Suitable Input Fields Available."
- Specify the new column name: Enter a new column name for the converted data, or use the default "DateTime_Out."
- Specify your DateTime Language: Select the dropdown to identify the language of the incoming string data.
- Select the format that matches the incoming string field: Choose the format of the incoming string from the list, or select "Custom" to specify a different format via the "Specify the format of the incoming string" field.

Author:

Salome Grasland

View Profile

An Intro to Alteryx Data Types

Type

Description

Example

String

Fixed Length Latin-1 String. The length should be at least as large as the longest string you want to be contained in the field, or values are truncated. Limited to 8192 Latin-1 characters.

Any string whose length does not vary much from value to value, and only contains simple Latin-1 characters.

WString

Wide String accepts any character (Unicode.) Limited to 8192 characters.

Any string whose length does not vary much from value to value and contains any character.

V_String

Variable Length. The length of the field adjusts to accommodate the entire string within the field.

Any string whose length varies from value to value, and only contains simple Latin-1 characters.

V_WString

Variable Length Wide String. The length of the field adjusts to accommodate the entire string within the field and will accept any character.