4. Time complexity and time factor #

Created Wednesday 19 February 2020

  1. hash function and compression -
    • This will not be a constant. It depends on the string size, so h() = O(l), where l = key length.
    • We assume that multiplication is fairly constant here for reasonable values of p.
    • The number of keys are huge as compared to the max key size, so we assume that generating the hash_Index is negligible as compared to LL traversal of a bucket. Hence hashing can be said to be O(1), i.e constant.
    • hashing function is independent of the bucket size. But the index depends on the compression function, which does **depend **on the bucket size hence the index **depends **on the bucketSize;
  2. void insertion(string key, V value)
    • The worst case can be if all the elements are in the same bucket. i.e an LL of length n. n = number of keyVal pairs. **But **we have very good hash functions and research indicates that such a thing is not likely to happen.
    • On an average, each bucket has n/b elements. This number is called the load factor = number of kv pairs/ size of bucket array;
    • We need to ensure that n/b < 0.7 or something according to the situation. This means that each bucket has less than one bucket.
    • So having multiple entries is very improbable.
    • Consequenty complexity becomes O(1).
    • So the **crux **is the load factor, if n increases, the only way to keep n/b in check is by increasing b too. i.e resizing the bucket array. This is called **Rehashing. **We hash all the entries to a new double sized bucket array. The old bucket array is discarded. Rehashing is costly but it happens only occasionally. This is like the reallocation of a full vector. Rehashing depends on the load factor.
    • Remember that load factor should ideally indicate the spread of the entries. Merely copying the old bucket in the new bucket is not good and won’t work too, as the compression function depends on the bucketSize .
  3. V remove() - O(1)
  4. getValue - O(1)

Insertion needs only an if statement at the start for rehashing, nothing else is required. Let the insert work normally when this is over.