Save space. Keep it compact. Make it small. It wouldn't be the first time you have to be aware of the space something takes up.
In the data world, more space means more computing power, which can lead to delays and lengthy run times... not a very appetizing thought, right? Lucky for us, we can choose the data types and their sizes to reduce the memory the data set occupies.
Before we delve into the details of data types, let's talk about bits and bytes.
Unlike humans, who store information using decimal numbers and letters, computers store information using bits (also known as binary digits). A bit is the smallest unit of information and is represented as a one (1) or a zero (0). When you put eight (8) bits together, you get a byte. When information is stored in a computer, the amount stored is measured in bytes. There are a lot more details and information about computer processing, but let's not byte off more than we can chew for now.
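If you'd like to see a byte with your own eyes, here is a tiny Python sketch. The letter "A" is just an arbitrary example; the point is that one character fits in eight bits, and eight bits can hold 256 different values.

```python
# One byte is eight bits, so it can hold 2**8 distinct values.
BITS_PER_BYTE = 8
values_per_byte = 2 ** BITS_PER_BYTE

# Show the letter "A" as the eight bits a computer stores for it.
letter = "A"
code_point = ord(letter)             # numeric code of the character
bits = format(code_point, "08b")     # the same number written as 8 binary digits

print(f"A byte can hold {values_per_byte} different values")
print(f"{letter!r} is stored as {bits} ({len(bits)} bits)")
```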
For this part, we'll focus on some popular data types where the size can be changed. It's important to note that you cannot change the size of your data type once you're in Tableau, so make sure to change it in your data preparation or cleaning tool.
String (number of characters)
Latin-1 characters with a length range from 0 to 8192
One character = 8 bits (or 1 byte).
Changing the size of your string data type is pretty simple. In Python you can add the size in brackets after specifying the data type. In Alteryx you choose the size in the formula or select tool.
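To get a feel for how big a string field really needs to be, you can measure your values before you set the size. The sketch below is only an illustration: the helper names and the city list are made up, and it assumes every value can be written in Latin-1, where each character takes exactly one byte.

```python
def latin1_size_in_bytes(text):
    """Bytes needed to store text as Latin-1 (one byte per character)."""
    return len(text.encode("latin-1"))


def smallest_string_size(values):
    """Hypothetical helper: the longest value decides the size the field needs."""
    return max((latin1_size_in_bytes(value) for value in values), default=0)


# Made-up sample column; accented letters like "ü" and "ã" are part of Latin-1.
cities = ["Oslo", "Zürich", "São Paulo"]

for city in cities:
    print(city, latin1_size_in_bytes(city))

print("Smallest size that fits every value:", smallest_string_size(cities))
```

If the longest value in your column is 9 characters, a string size in the hundreds is reserving room you will never use.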
Date values (number of characters)
Date - default 10 characters
Time - default 8 characters (max 27)
DateTime - default 19 characters (max 38)
In Alteryx you can only change the Time & DateTime types by adding space for an additional 18 characters.
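Those default lengths are easier to remember once you count the characters yourself. This sketch assumes the values are written in the common year-month-day and hour:minute:second layout; the timestamp itself is an arbitrary example.

```python
from datetime import datetime

# Arbitrary example moment, chosen only for illustration.
moment = datetime(2024, 1, 31, 13, 45, 0)

# Assumed layouts: yyyy-mm-dd for dates and hh:mm:ss for times.
date_text = moment.strftime("%Y-%m-%d")
time_text = moment.strftime("%H:%M:%S")
datetime_text = moment.strftime("%Y-%m-%d %H:%M:%S")

for label, text in [("Date", date_text), ("Time", time_text), ("DateTime", datetime_text)]:
    print(f"{label}: {text!r} uses {len(text)} characters")
```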
Numeric values (binary size)
To choose the size of the numeric data types, you select one of the options below. You cannot further edit the size of the types below.
Byte = 1 byte (only positive numbers in range 0-255)
Int16 = 2 bytes (–32,768 to 32,767)
Int32 = 4 bytes (–2,147,483,648 to 2,147,483,647)
Int64 = 8 bytes (–9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
Float = 4 bytes (+/- 3.4 x 10^-38 to 3.4 x 10^38 with 7 digits of precision)
Double = 8 bytes (+/- 1.7 x 10^-308 to 1.7 x 10^308 with 15 digits of precision)
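The integer ranges in the list above follow directly from the number of bytes, so you can work them out instead of memorizing them. Here is a rough sketch of that arithmetic, plus a hypothetical helper (not an Alteryx or Tableau function) that picks the smallest type from the list that fits a set of values.

```python
def signed_range(num_bytes):
    """Smallest and largest whole number a signed integer of this size can hold."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Type names and sizes taken from the list above; Byte is the only unsigned one.
INTEGER_TYPES = [
    ("Byte", 0, 2 ** 8 - 1),
    ("Int16", *signed_range(2)),
    ("Int32", *signed_range(4)),
    ("Int64", *signed_range(8)),
]


def smallest_integer_type(values):
    """Hypothetical helper: first (smallest) type whose range covers all values."""
    lowest, highest = min(values), max(values)
    for name, type_min, type_max in INTEGER_TYPES:
        if type_min <= lowest and highest <= type_max:
            return name
    raise ValueError("values do not fit in any of the listed integer types")


print(signed_range(2))
print(smallest_integer_type([0, 37, 200]))      # ages, counts, small codes
print(smallest_integer_type([-5, 120]))         # a negative value rules out Byte
print(smallest_integer_type([40_000, 75_000]))  # too large for Int16
```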
The fixed decimal length is determined by the number of characters before, after, and including the decimal point. In Alteryx the maximum number of characters is 19, and the scale is the number of characters after the decimal point.
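Counting those characters by hand gets old quickly, so here is a small sketch of the idea. The helper name and the price list are invented for the example, the values are given as text, and it only looks at positive numbers; how a minus sign is counted is not covered here.

```python
def fixed_decimal_size(values):
    """Hypothetical helper: (length, scale) needed to hold every value.

    Values are decimal numbers written as strings, e.g. "1250.5".
    """
    longest_whole = 0
    longest_fraction = 0
    for text in values:
        whole, _, fraction = text.partition(".")
        longest_whole = max(longest_whole, len(whole))
        longest_fraction = max(longest_fraction, len(fraction))
    # The decimal point takes one character, but only if any value has decimals.
    point = 1 if longest_fraction else 0
    length = longest_whole + point + longest_fraction
    return length, longest_fraction


# Made-up sample values.
prices = ["19.99", "1250.5", "7.125"]
length, scale = fixed_decimal_size(prices)
print(f"length {length}, scale {scale}")
```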
With smaller data sets, the size of your data types won't affect the computing power much; it becomes an issue with very large data sets. Still, keeping this in the back of your mind and getting into the habit of checking sizes as you work through cleaning your data will be beneficial in future projects.
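To put a number on it, here is a quick sketch using Python's built-in array module as a stand-in for a column of data. The one million made-up "age" values all fit in a single byte, so storing them as 8-byte integers takes eight times the space. Your tool will have its own overhead, so treat this as an illustration of the principle rather than a measurement of Alteryx or Tableau.

```python
from array import array

# One million made-up values between 0 and 99, e.g. an "age" column.
row_count = 1_000_000
ages = [n % 100 for n in range(row_count)]

# "q" stores each value as a signed 8-byte integer (like Int64);
# "B" stores each value as an unsigned 1-byte integer (like Byte).
as_int64 = array("q", ages)
as_byte = array("B", ages)

int64_bytes = as_int64.itemsize * len(as_int64)
byte_bytes = as_byte.itemsize * len(as_byte)

print(f"Int64 column: {int64_bytes:,} bytes")
print(f"Byte column:  {byte_bytes:,} bytes")
print(f"The Int64 column is {int64_bytes // byte_bytes}x larger")
```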