NumPy arrays in Python have the advantage of faster computation speeds than regular Python lists.
One of the key reasons behind this advantage is that a NumPy array stores elements of a single, homogeneous data type, unlike a regular Python list, which can mix types freely.
Because every element has the same type and size, NumPy can store them in consecutive blocks of memory, which increases the ease and speed with which they can be retrieved.
This reduces the overall computation time of numerical operations performed on NumPy arrays.
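To make the memory layout a little more concrete, here is a minimal sketch that inspects a small array. The values are chosen only for illustration, and the exact integer dtype depends on your platform and NumPy version.

```python
import numpy as np

arr = np.array([6, 2, 3, 4, 5])

print(arr.dtype)                  # one dtype shared by every element (often int64)
print(arr.itemsize)               # size of each element in bytes
print(arr.strides)                # bytes to step from one element to the next
print(arr.flags["C_CONTIGUOUS"])  # True: the elements sit in one contiguous block
```

The stride equals the item size, which is another way of saying that the elements sit back to back in memory.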
Well, it makes sense, but what if we intentionally tweak the input of a NumPy array to make it hold elements of different data types?
NumPy still won’t deviate from its fundamental quirk and plays a clever trick to find a way around it!
It ‘coerces’ the elements into a single, homogeneous data type.
Consider a regular list: [6, 2, True, 3, 4, 5]. It is a collection of five integers and one boolean.
If we convert it into a NumPy array, the resulting array would have the elements as below:
[6,2,1,3,4,5]
NumPy converted the boolean element ‘True’ to the integer ‘1’ so that every element shares one data type and the array can still be stored in consecutive blocks of memory.
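You can check this yourself with a short sketch like the one below. The variable names are just for illustration, and the integer dtype you see (for example int64) depends on your platform.

```python
import numpy as np

mixed = [6, 2, True, 3, 4, 5]  # five integers and one boolean
arr = np.array(mixed)

print(arr)        # [6 2 1 3 4 5]
print(arr.dtype)  # an integer dtype, e.g. int64
```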
Now, consider another list: [6, 2, "True", 3, 4, 5]. This time, we have a collection of five integers and one string.
Following the earlier pattern, you might expect NumPy to convert the string "True" to an integer.
But that is not as straightforward as converting a boolean to an integer, because an arbitrary string has no safe numeric value. Therefore, NumPy coerces all the other elements to strings instead.
The resulting NumPy array would be a collection of strings as below:
["6", "2", "True", "3", "4", "5"]
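The same check works for the string case. This is a small sketch. It prints only the dtype kind because the exact string width NumPy picks can vary by platform.

```python
import numpy as np

mixed = [6, 2, "True", 3, 4, 5]  # five integers and one string
arr = np.array(mixed)

print(arr)             # ['6' '2' 'True' '3' '4' '5']
print(arr.dtype.kind)  # 'U', meaning a Unicode string dtype
```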
For data scientists and data analysts working with NumPy, understanding its type coercion quirk is crucial.
Especially when dealing with uncleaned data, even a single stray value can silently change the data type of an entire NumPy array!
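One way to catch such a value early, sketched here with a made-up `raw` list, is either to inspect the dtype NumPy inferred or to request a numeric dtype explicitly so that the conversion fails loudly. Treat this as one possible guard rather than the only approach.

```python
import numpy as np

raw = [6, 2, "True", 3, 4, 5]  # hypothetical uncleaned input

# Check 1: let NumPy infer the dtype, then inspect what it chose.
inferred = np.array(raw)
is_numeric = np.issubdtype(inferred.dtype, np.number)
print(is_numeric)  # False: the whole array was coerced to strings

# Check 2: ask for integers explicitly so the stray value raises an error.
try:
    clean = np.array(raw, dtype=int)
except ValueError as exc:
    clean = None
    print("Could not build an integer array:", exc)
```

The second check is stricter: instead of quietly producing an array of strings, NumPy refuses to build the array at all.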