Why would anyone use set instead of unordered_set?

C++0x is introducing `unordered_set` which is available in `boost` and many other places. What I understand is that `unordered_set` is hash table with `O(1)` lookup complexity. On the other hand, `set` is nothing but a tree with `log(n)` lookup complexity. Why on earth would anyone use `set` instead of `unordered_set`? i.e is there a need for `set` anymore?

When, for someone who wants to iterate over the items of the set, the order matters.

Unordered sets have to pay for their O(1) average access time in a few ways:

• `set` uses less memory than `unordered_set` to store the same number of elements.
• For a small number of elements, lookups in a `set` might be faster than lookups in an `unordered_set`.
• Even though many operations are faster in the average case for `unordered_set`, they are often guaranteed to have better worst case complexities for `set` (for example `insert`).
• That `set` sorts the elements is useful if you want to access them in order.
• You can lexicographically compare different `set`s with `<`, `<=`, `>` and `>=`. `unordered_set`s are not required to support these operations.

Whenever you prefer a tree to a hash table.

For instance, hash tables are “O(n)” at worst case. O(1) is the average case. Trees are “O(log n)” at worst.

Use set when:

1. We need ordered data(distinct elements).
2. We would have to print/access the data (in sorted order).
3. We need predecessor/successor of elements.

Use unordered_set when:

1. We need to keep a set of distinct elements and no ordering is required.
2. We need single element access i.e. no traversal.

Examples:

set:

Input : 1, 8, 2, 5, 3, 9

Output : 1, 2, 3, 5, 8, 9

Unordered_set:

Input : 1, 8, 2, 5, 3, 9

Output : 9 3 1 8 2 5 (maybe this order, influenced by hash function)

Mainly difference :

Note:(in some case `set` is more convenient) for example using `vector` as key

``````set<vector<int>> s;
s.insert({1, 2});
s.insert({1, 3});
s.insert({1, 2});

for(const auto& vec:s)
cout<<vec<<endl;   // I have override << for vector
// 1 2
// 1 3
``````

The reason why `vector<int>` can be as key in `set` because `vector` override `operator<`.

But if you use `unordered_set<vector<int>>` you have to create a hash function for `vector<int>`, because vector does’t have a hash function, so you have to define one like:

``````struct VectorHash {
size_t operator()(const std::vector<int>& v) const {
std::hash<int> hasher;
size_t seed = 0;
for (int i : v) {
seed ^= hasher(i) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}
return seed;
}
};

vector<vector<int>> two(){
//unordered_set<vector<int>> s; // error vector<int> doesn't  have hash function
unordered_set<vector<int>, VectorHash> s;
s.insert({1, 2});
s.insert({1, 3});
s.insert({1, 2});

for(const auto& vec:s)
cout<<vec<<endl;
// 1 2
// 1 3
}
``````

you can see that in some case `unordered_set` is more complicated.

Mainly cited from:
https://www.geeksforgeeks.org/set-vs-unordered_set-c-stl/
https://stackoverflow.com/a/29855973/6329006

Because std::set is part of Standard C++ and unordered_set isn’t. C++0x
is NOT a standard, and neither is Boost. For many of us, portability is essential, and that means sticking to the standard.

Consider sweepline algorithms. These algorithms would fail utterly with hash tables, but work beautifully with balanced trees. To give you a concrete example of a sweepline algorithm consider fortune’s algorithm. http://en.wikipedia.org/wiki/Fortune%27s_algorithm

`g++` 6.4 stdlibc++ ordered vs unordered set benchmark

I benchmarked this dominant Linux C++ implementation to see the difference:

The full benchmark details and analysis have been given at: What is the underlying data structure of a STL set in C++? and I will not repeat them here.

“BST” means “tested with `std::set` and “hash map” means “tested with `std::unordered_set`. “Heap” is for `std::priority_queue` which I analyzed at: Heap vs Binary Search Tree (BST)

As a quick summary:

• the graph clearly shows that under these conditions, hashmap insertion were always a lot faster when there are more than 100k items, and the difference grows as the number of items increases

The cost of this speed boost is that you are not able to efficiently traverse in order.

• the curves clearly suggest that ordered `std::set` is BST-based and `std::unordered_set` is hashmap based. In the reference answer, I further confirmed that by GDB step debugging the code.

Similar question for `map` vs `unordered_map`: Is there any advantage of using map over unordered_map in case of trivial keys?

One more thing, in addition to what other people already mentioned. While the expected amortized complexity for inserting an element to an unordered_set is O(1), every now and then it will take O(n) because the hash-table needs to be restructured (the number of buckets needs to change) – even with a ‘good’ hash function. Just like inserting an element in a vector takes O(n) every now and then because the underlying array needs to be reallocated.

Inserting in a set always takes at most O(log n). This might be preferable in some applications.

While this answer might be 10 years late, it’s worth pointing out that `std::unordered_set` also has security downsides.

If the hash function is predictable (this is typically the case unless it applies counter-measures such as a randomized salt), attackers can hand-craft data that produces hash collisions and causes all insertions and look-ups to take O(n) time.

This can be used for very efficient and elegant denial-of-service attacks.

Many (most?) implementations of languages that internally employ hash maps have run into this:

Pardon me, one more thing worth noticing about the sorted property:

If you want a range of data in container, for example: You stored time in set, and you want time from 2013-01-01 to 2014-01-01.

For unordered_set it is impossible.

Of course, this example would be more convincing for usage cases between map and unordered_map.

Off hand, I would say it is convenient to have things in a relationship if you’re looking to convert it into a different format.

It is also possible that whilst one is faster to access, the time to build the index or the memory used when creating and/or accessing it is greater.

If you want to have things sorted, then you would use set instead of unordered_set. unordered_set is used over set when ordering stored does not matter.

Here’s a practical reason that I haven’t seen listed… if used incorrectly in buggy code, unordered sets can cause code to behave differently on different machines. This is because the order that the values are stored is not consistent across machines.

If code is (incorrectly) written that relies on the order of storage, the result will be that the program behaves inconsistently between different machines. Practically, this could happen if the unordered set is part of the implementation of a function/method that returns a list of values. The client of that function may not realize that an unordered set is being used, and may not realize that the order of the returned list is not guaranteed to be consistent/portable.

Thus, unordered sets are a bit more unforgiving to the programmer than ordered sets. They introduce this additional mechanism for confusing code behavior, which can lead to time consuming/confusing bugs because they may not be reproducible between machines.