Contains Duplicate

Given an array of integers, find if the array contains any duplicates. Your function should return true if any value appears at least twice in the array, and it should return false if every element is distinct.

Difficulty: Easy

The easy solution would be to simply compare each element to the others. It takes O(n2) steps with two loops. Larger the array size leads exponentially increasing amounts of time.

For completing it in O(n) steps we can think of allocating the extra array in which we mark at the position of the integer whenever we find it in the given array. Let’s say we have the mask array M. When we run into ‘324’ in the given array, we mark 1 in M[325]. In this question, because there’s no range limitation of integers, we would need to allocate the extra array with size 232 = 4G + 1. It’s huge and it most likely can fail with limited memory constraints such as embedded systems. We could use the bit mask to reduce the size of mask array. With a 32-bit mask we can mask the integers with range of 0 – 31. To cover whole 32-bit integer range we need around 2G memory (2G and one more bit). It’s still huge.

How can we reduce the memory footprint? At 3rd we can think of using a hash table. This consumes the smallest memory footprint among the 3 approaches. This takes more computing time than bit mask approach. Smaller hash table leads more conflict, meaning taking longer computing time to resolve.

struct node {
    int num;
    struct node *next;
};

#define NHASH 237
#define MULT 31
struct node *bin[NHASH] = { 0 };

unsigned int hash( unsigned int num ) {
    unsigned int h = MULT * num + num;
    return h % NHASH;
}

bool containsDuplicate(int* nums, int numsSize) {
    int i, h;
    struct node *p;
    
    for ( i = 0; i < NHASH; i++ ) {
        if ( bin[i] != NULL ) free( bin[i] );
        bin[i] = NULL;
    }
    
    while ( numsSize-- ) {
        h = hash( *nums );
        printf("\n%d, %d", *nums, h);
        for ( p = bin[h]; p != NULL; p = p->next ) {
            if ( p->num == *nums )
                return true;
        }
        p = malloc( sizeof(struct node) );
        p->num = *nums++;
        p->next = bin[h];
        bin[h] = p;
    };
    
    return false;
}

 

[references]

  1. Interview question – https://leetcode.com/problems/contains-duplicate/
  2. Hash tables – p162, Programming Pearls 2nd ed, Jon Bentley
Advertisements
This entry was posted in Algorithms, C and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s