Given an array of integers, find if the array contains any duplicates. Your function should return true if any value appears at least twice in the array, and it should return false if every element is distinct.
The easy solution would be to simply compare each element to the others. It takes O(n2) steps with two loops. Larger the array size leads exponentially increasing amounts of time.
For completing it in O(n) steps we can think of allocating the extra array in which we mark at the position of the integer whenever we find it in the given array. Let’s say we have the mask array M. When we run into ‘324’ in the given array, we mark 1 in M. In this question, because there’s no range limitation of integers, we would need to allocate the extra array with size 232 = 4G + 1. It’s huge and it most likely can fail with limited memory constraints such as embedded systems. We could use the bit mask to reduce the size of mask array. With a 32-bit mask we can mask the integers with range of 0 – 31. To cover whole 32-bit integer range we need around 2G memory (2G and one more bit). It’s still huge.
How can we reduce the memory footprint? At 3rd we can think of using a hash table. This consumes the smallest memory footprint among the 3 approaches. This takes more computing time than bit mask approach. Smaller hash table leads more conflict, meaning taking longer computing time to resolve.
- Interview question – https://leetcode.com/problems/contains-duplicate/
- Hash tables – p162, Programming Pearls 2nd ed, Jon Bentley