Hash Tables & Hash Maps (해시 테이블 & 해시 맵)
Overview
Hash tables are one of the most important data structures, providing average O(1) time complexity for insert, delete, and search operations. Understanding how they work is essential for coding interviews.
1. Hash Table Basics
Definition
- Hash Table: Data structure that maps keys to values using hash function
- Hash Function: Converts key to array index
- Goal: Average O(1) operations
Key Operations
insert(key, value): Add key-value pairget(key): Retrieve value by keydelete(key): Remove key-value paircontains(key): Check if key exists
Time Complexity
| Operation | Average | Worst | Space |
|---|
| Insert | $\mathcal{O}(1)$ | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ |
| Search | $\mathcal{O}(1)$ | $\mathcal{O}(n)$ | - |
| Delete | $\mathcal{O}(1)$ | $\mathcal{O}(n)$ | - |
Note: Worst case occurs when all keys hash to same index (collision)
2. Hash Function
Requirements
- Deterministic: Same key → same hash
- Uniform Distribution: Keys distributed evenly
- Fast Computation: O(1) time
- Minimize Collisions: Different keys → different hashes (ideally)
Common Hash Functions
Integer Keys
1
2
3
4
5
6
7
8
9
| // Simple modulo
int hash(int key, int size) {
return key % size;
}
// Better: use prime number for size
int hash(int key, int prime) {
return key % prime;
}
|
String Keys
1
2
3
4
5
6
7
8
9
10
11
12
| // Polynomial rolling hash
int hash(string s, int size) {
int hash = 0;
int p = 31; // or 53 for lowercase + uppercase
int pow = 1;
for (char c : s) {
hash = (hash + (c - 'a' + 1) * pow) % size;
pow = (pow * p) % size;
}
return hash;
}
|
Visualization
Hash Function Example:
1
2
3
4
5
6
7
8
9
10
11
| Keys: [10, 20, 30, 15, 25]
Hash function: h(k) = k % 7
h(10) = 10 % 7 = 3
h(20) = 20 % 7 = 6
h(30) = 30 % 7 = 2
h(15) = 15 % 7 = 1
h(25) = 25 % 7 = 4
Array: [_, 15, 30, 10, 25, _, 20]
Index: 0 1 2 3 4 5 6
|
3. Collision Resolution
Problem
- Collision: Two keys hash to same index
- Example: h(10) = 3, h(17) = 3 (if size = 7)
Methods
1. Chaining (Separate Chaining)
- Each bucket contains linked list of entries
- Insert: Add to list
- Search: Traverse list
- Delete: Remove from list
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
| class HashTable {
vector<list<pair<int, int>>> table;
int size;
public:
HashTable(int s) : size(s), table(s) {}
void insert(int key, int value) {
int index = hash(key);
for (auto& p : table[index]) {
if (p.first == key) {
p.second = value;
return;
}
}
table[index].push_back({key, value});
}
int get(int key) {
int index = hash(key);
for (auto& p : table[index]) {
if (p.first == key) return p.second;
}
return -1; // Not found
}
};
|
Time Complexity:
- Average: O(1 + α) where α = n/m (load factor)
- Worst: O(n) if all keys collide
2. Open Addressing
- Store all entries in array itself
- When collision occurs, find next available slot
Probing Methods:
Linear Probing
1
2
3
| int probe(int key, int i, int size) {
return (hash(key) + i) % size;
}
|
Quadratic Probing
1
2
3
| int probe(int key, int i, int size) {
return (hash(key) + i*i) % size;
}
|
Double Hashing
1
2
3
4
5
| int probe(int key, int i, int size) {
int h1 = hash1(key);
int h2 = hash2(key);
return (h1 + i * h2) % size;
}
|
Comparison
| Method | Average | Worst | Space | Notes |
|---|
| Chaining | O(1+α) | O(n) | O(n) | Simple, handles any load |
| Linear Probing | O(1/(1-α)) | O(n) | O(n) | Cache friendly, clustering |
| Quadratic Probing | O(1/(1-α)) | O(n) | O(n) | Less clustering |
| Double Hashing | O(1/(1-α)) | O(n) | O(n) | Best distribution |
4. Load Factor
Definition
- Load Factor α = n/m
- n = number of entries
- m = number of buckets
Impact
- Low α (< 0.5): Few collisions, fast operations
- High α (> 0.7): Many collisions, slow operations
- Solution: Rehash when α > threshold
Rehashing
1
2
3
4
5
6
7
8
9
10
| void rehash() {
vector<pair<int, int>> old = entries;
size *= 2;
table.clear();
table.resize(size);
for (auto& p : old) {
insert(p.first, p.second);
}
}
|
5. Hash Set vs Hash Map
Hash Set
- Stores only keys (no values)
- Operations:
add(key), contains(key), remove(key) - Use: Remove duplicates, fast membership test
Hash Map
- Stores key-value pairs
- Operations:
put(key, value), get(key), remove(key) - Use: Frequency counting, caching, indexing
6. Applications
1. Frequency Counting
1
2
3
4
| unordered_map<int, int> freq;
for (int num : nums) {
freq[num]++;
}
|
2. Two Sum Problem
1
2
3
4
5
6
7
8
9
10
11
12
| vector<int> twoSum(vector<int>& nums, int target) {
unordered_map<int, int> map; // value -> index
for (int i = 0; i < nums.size(); i++) {
int complement = target - nums[i];
if (map.count(complement)) {
return {map[complement], i};
}
map[nums[i]] = i;
}
return {};
}
|
3. Group Anagrams
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
| vector<vector<string>> groupAnagrams(vector<string>& strs) {
unordered_map<string, vector<string>> map;
for (string s : strs) {
string key = s;
sort(key.begin(), key.end());
map[key].push_back(s);
}
vector<vector<string>> result;
for (auto& p : map) {
result.push_back(p.second);
}
return result;
}
|
4. LRU Cache
- Use hash map + doubly linked list
- O(1) get and put operations
7. Implementation in Different Languages
C++
1
2
3
4
5
6
7
8
9
10
| #include <unordered_map>
#include <unordered_set>
unordered_map<string, int> map;
map["key"] = 10;
int value = map["key"];
unordered_set<int> set;
set.insert(10);
bool exists = set.count(10);
|
Java
1
2
3
4
5
6
7
| HashMap<String, Integer> map = new HashMap<>();
map.put("key", 10);
int value = map.get("key");
HashSet<Integer> set = new HashSet<>();
set.add(10);
boolean exists = set.contains(10);
|
Python
1
2
3
4
5
6
7
8
9
| # Dictionary (hash map)
map = {}
map["key"] = 10
value = map["key"]
# Set (hash set)
set = set()
set.add(10)
exists = 10 in set
|
8. Key Insights
- Average O(1) but worst O(n)
- Depends on hash function quality and load factor
- Collision resolution is crucial
- Chaining: Simple, works with any load
- Open addressing: Space efficient, cache friendly
- Load factor affects performance
- Keep α < 0.7 for good performance
- Rehash when needed
- Hash function matters
- Good hash: uniform distribution
- Bad hash: many collisions
Common Mistakes
- Not handling collisions
- Assuming perfect hash function
- Not considering load factor
- Letting table get too full
- Using mutable keys
- Key changes → hash changes → can’t find entry
- Not checking for existence
- Accessing non-existent key
References