Field Data Types in Alteryx

by Ian Baldwin

In Alteryx data fields are assigned a field type that is then used consistently for every record in the data field. Field types largely fall into five categories which can then be explored further.

Boolean Data

Boolean data is the simplest possible form of data. It is either True or False, a 1 or a 0. In Alteryx a Boolean record takes up one byte and is denoted by Bool in field type menus.

Numerical Data

Numerical data can come in various forms and byte sizes depending on the size and accuracy of the numbers needed. Byte is integer counting using a byte’s 8 bits, it gives you a number between 0 and 255 (28 -1). Int16, Int32 and Int64 are integer counters using 16, 32 and 64 bits respectively, exponentially growing the numbers you can count to with the data types. Int32 tops out at just over 2 billion whereas Int64 tops out at over 9 quintillion. If your data is discretely countable then using integer format counting is best practice for precision.

Fixed Decimal data is an adaptation of integer counting that places a fixed decimal point within the counted number. This format is useful for data such as currency to two decimal places or a set of measurements with an accepted precision such as a person’s height to the nearest centimetre. Fixed decimal data uses the appropriate number of bytes after the formatting has been declared.

Float and Double are ‘floating’ number formats that dedicate a portion of their ones and zeros to placing a decimal point where necessary. As such they do not count as accurately as their integer counterparts, but the upside to this format is you get a continuous measure rather than a discrete one. These floating formats can reach much higher values, 1038 for 4 byte Float data and 10308 for 8 byte Double data, and of course you can do decimals with them.

Alteryx automatically reads numeric data as Double data as it can pretty much catch all scenarios with its huge maximum value and 15 significant-digits precision. The downside to this is its not always the most compact data format you could use.

String Data

String data can be declared in a number of different ways depending on the character set required and the anticipated length of the string. If the string uses standard western character types and has a known length, declare it using String where each character will take up one byte. If it contains other characters such as Arabic, Chinese or emoji characters then WString should be used as the wide format which uses two bytes per character recognises over 60,000 Unicode characters.

Input data with variable string lengths declared using V_String and V_WString will have a cutting character that signals the end of a string. Like Double, V_WString is a great catch all field type when you are unsure on the format of your incoming data. A perfect use case for V_WString is analysing twitter engagement that comes in multiple languages, has variable character length and almost certainly contains a lot of emoji. However, if you have a highly controlled data field such as three letter country codes (AUS, FRA, USA etc.) then String could save you over 50% memory over V_Wstring. In this case it saves 63% memory with 3 bytes per record instead of 8.

Date and Time Data

Date and time fields are strings that get read in a special way to be interpreted as time based data. Date fields are 10 character strings in the format “yyyy-mm-dd”, Time fields are 8 character strings of the format “hh:mm:ss” and DateTime fields combine these into a 19 character string of the format “yyyy-mm-dd hh:mm:ss”.

Blobs

Blobs can either be a SpatialObj (a spatial object) or an undefined Blob. Spatial objects can be a point, a line or a polygon and are primarily used in geospatial analytics. While undefined blobs can be almost any undefined binary file, often this can be an image or a sound file.

Avatar

Ian Baldwin

Thu 05 Jan 2017

Fri 28 Oct 2016

Fri 28 Oct 2016