The Power of 1s and 0s

In data analysis, analysts often work with the boolean data type, whose values are True and False. However, I often use the two digits 1 and 0 in place of True and False. In this blog, I will summarize the reasons why I love using 1 and 0 when preparing data.


1/ Saving time with Logical Operators

For complicated problems, we usually need multiple conditions to filter the data. If a problem has more than 5 or 6 filter conditions, how can we keep track of the result?

For example, given the combined conditions below, how would you check the result quickly?

(True OR False) AND (True AND False) AND True OR ( True AND True)

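Instead of using True and False, I use the digit 1 for True and 0 for False. Then I replace OR with addition (+) and AND with multiplication (*). Because AND is evaluated before OR, just as multiplication is evaluated before addition, the grouping of the conditions stays the same. That gives a new equation like this: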

(1 + 0) * (1 * 0) * 1 + (1 * 1) = 1 * 0 * 1 + 1 = 1

It takes me less than a minute to solve that equation. The result is 1, which means the condition returns True. To build that equation, I used the logic gates for OR and AND.
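The substitution can be checked with a short Python sketch. This is only an illustration, not part of the original workflow; the variable names are made up for the example.

```python
# The same condition written twice: once with booleans, once with 1/0 arithmetic.
# OR is replaced by +, AND is replaced by *.

bool_result = (True or False) and (True and False) and True or (True and True)

num_result = (1 + 0) * (1 * 0) * 1 + (1 * 1)

print(bool_result)       # True
print(num_result)        # 1
print(bool(num_result))  # True: any non-zero total counts as True
```

Both forms agree, which is what makes the arithmetic version a convenient shortcut when checking a long condition by hand.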

Image 1: The logical OR operator behaves like addition

The result of the logical OR operator is the same as the result of an addition, with one special case: 1 + 1. A binary value can only be 1 or 0, so 1 OR 1 = 1. In practice, any sum greater than 0 counts as True.

Image 2: The logical AND operator behaves like multiplication

The result of the logical AND operator is the same as the result of a multiplication.
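As a rough check of both rules, the sketch below walks through every combination of 0 and 1 and compares the arithmetic result with the logical one. The helper names or_as_sum and and_as_product are hypothetical, chosen only for this example; the sum is capped at 1 to cover the 1 + 1 case.

```python
from itertools import product

def or_as_sum(a, b):
    # Addition, capped at 1 so that 1 + 1 still gives 1.
    return min(a + b, 1)

def and_as_product(a, b):
    # Multiplication is 1 only when both inputs are 1.
    return a * b

# Compare against Python's own logical operators for all four input pairs.
for a, b in product((0, 1), repeat=2):
    assert or_as_sum(a, b) == int(bool(a) or bool(b))
    assert and_as_product(a, b) == int(bool(a) and bool(b))
    print(a, b, or_as_sum(a, b), and_as_product(a, b))
```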

By switching True and False values to 1 and 0, I can:

  • Solve the problem faster
  • Track the conditions easily
  • Debug easily

2/ Easy to summarize the total result

Assume that you have a dataset with measure values like this:

Image 3: A dataset with 1 field for Name and 1 field for Grade

Now, I would like to count all students whose grade is greater than 80. So, I need to create a new field that checks whether the grade is greater than 80 and returns True if it is; otherwise, it returns False.

I will solve that problem in 2 ways. My first solution uses the Boolean data type to return True or False. My second solution returns the values 1 and 0.

a) Using Boolean data type (return True or False)

Image 4: Using True or False to return

I need 4 tools to solve that problem. I need a Formula tool to check if Grade > 80 and return True; otherwise, it returns False. Then, I use a Filter tool to keep only the records that have True in the [Passed ?] column. Finally, I use a Summarize tool to count the records.
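The same flag, filter, and count steps can be sketched in plain Python. The student names and grades below are hypothetical sample data, since the original dataset is only shown as an image, and the code is an analogy to the workflow rather than what the tools do internally.

```python
# Hypothetical sample data (not the dataset from the image).
students = [
    {"Name": "Alice", "Grade": 92},
    {"Name": "Bob", "Grade": 75},
    {"Name": "Carol", "Grade": 81},
    {"Name": "Dan", "Grade": 80},
]

# Formula step: add a True/False flag to every record.
flagged = [{**s, "Passed ?": s["Grade"] > 80} for s in students]

# Filter step: keep only the records where the flag is True.
passed_rows = [s for s in flagged if s["Passed ?"]]

# Summarize step: count the remaining records.
passed_count = len(passed_rows)
print(passed_count)  # 2
```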

b) Using binary numbers (1 and 0)

Image 5: Using binary numbers (1 and 0) 

I only need 3 tools to solve the problem. In the Formula tool, I use an IF function to return 1 if Grade is greater than 80; otherwise, it returns 0. Then, I use a Summarize tool to sum all values in the [Passed ?] column. I get the same answer, but the workflow is shorter. This leads to the next advantage of using 1 and 0.
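A minimal Python sketch of the 1 and 0 approach, again with hypothetical sample data because the original dataset is only shown as an image. With a numeric flag, the filter step disappears and the count is just a sum.

```python
# Hypothetical sample data (not the dataset from the image).
students = [
    {"Name": "Alice", "Grade": 92},
    {"Name": "Bob", "Grade": 75},
    {"Name": "Carol", "Grade": 81},
    {"Name": "Dan", "Grade": 80},
]

# Formula step: 1 if the grade is greater than 80, otherwise 0.
flags = [1 if s["Grade"] > 80 else 0 for s in students]

# Summarize step: summing the flags counts the passing students directly.
passed_count = sum(flags)

print(flags)         # [1, 0, 1, 0]
print(passed_count)  # 2
```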

3/ Saving space and increasing performance

If we use binary numbers as the return value, we need fewer tools than with the Boolean approach (as shown in part 2). This affects the size of the workflow file.

Image 6: Comparing the size of each file

In the image above, the 1_0 workflow file is 1 KB smaller than the boolean_workflow file. If we work with a big dataset, we could save a lot of space in physical storage or in the cloud. It can also help performance: the program runs faster and more smoothly when debugging or testing.

4/ Applications of Binary Numbers

Computers use binary numbers and logic gates to operate. The system encodes characters in 1s and 0s. In some machines, 1 represents ON and 0 represents OFF.

ASCII code (Image from the Internet)
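To make the character encoding concrete, here is a small Python sketch that prints the 8-bit pattern behind each character of a string. The function name to_ascii_bits is made up for this example.

```python
def to_ascii_bits(text):
    # ord() gives the character's code; format(..., "08b") writes it as 8 binary digits.
    return [format(ord(ch), "08b") for ch in text]

print(to_ascii_bits("Hi"))  # ['01001000', '01101001']
```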

In Machine Learning, one-hot encoding is a method for encoding categories as binary numbers.

One-hot encoding in Machine Learning (Image from Wikipedia)
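One-hot encoding can be sketched in a few lines of plain Python. The one_hot helper and the color values are hypothetical, chosen only for illustration; machine learning libraries ship their own encoders.

```python
def one_hot(values):
    # One column per distinct category, in sorted order.
    categories = sorted(set(values))
    # Each value becomes a row with a single 1 in its category's column.
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot(colors)

print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Each row sums to 1, so summing a column gives the number of records in that category, the same counting trick used in part 2.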

They are only 2 digits, 1 and 0, but they are powerful in helping me solve problems and track long conditions. Besides that, they also help save space and improve the performance of the program. I hope that after this blog, you will try applying those digits to your own problems and challenges.

See you in the next blog!

Author:
Le Luu