Data-Structure Genericity

This section describes genericity over different underlying data-structures. It is organized as follows.

  1. The Basic Problem
  2. Container Hierarchy
  3. Data-Structure Tags and Traits
  4. Find-Type and Range-Type Methods and Iterators

The Basic Problem

The design attempts to address the following problem. When writing a function manipulating a generic container object, what is the behaviour of the object? E.g., suppose one writes

template<
    class Cntnr>
void some_op_sequence
    (Cntnr &r_cntnr)
{
    ...
}
then one needs to address the following questions in the body of some_op_sequence:
  1. Which types and methods does Cntnr support? Containers based on hash tables can be queries for the hash-functor type and object; this is meaningless for tree-based containers. Containers based on trees can be split, joined, or can erase iterators and return the following iterator; this cannot be done by hash-based containers.
  2. What are the guarantees of Cntnr? A container based on a probing hash-table invalidates all iterators when it is modified; this is not the case for containers based on node-based trees. Containers based on a node-based tree can be split or joined without exceptions; this is not the case for containers based on vector-based trees.
  3. How does the container maintain its elements? containers based on splay trees or lists with update policies "cache" "frequently accessed" elements; containers based on most other underlying data-structures do not.

Container Hierarchy

Figure Class hierarchy shows the container hierarchy.

  1. basic_assoc_cntnr contains types and methods shared by all associative containers, e.g., the type allocator and the method find.
  2. basic_hash_assoc_cntnr subclasses basic_assoc_cntnr, and contains types and methods shared by all hash-based containers, e.g., the type hash_fn.
    1. cc_hash_assoc_cntnr and gp_hash_assoc_cntnr each subclass basic_hash_assoc_cntnr, and encapsulate collision-chaining and (general) probing hash tables, respectively. These two types of hash tables have somewhat different policies and methods (i.e., constructors and policy-access methods).
  3. tree_assoc_cntnr subclasses one of basic_tree_assoc_cntnr which subclasses basic_assoc_cntnr. tree_assoc_cntnr encapsulates a tree-based container, and is parameterized by which underlying data-structure to use (e.g., a red-black tree); basic_assoc_cntnr. is specialized to the capabilities of the underlying structure. tree_assoc_cntnr contains some additional methods over basic_assoc_cntnr, e.g., split and join methods.
  4. lu_assoc_cntnr subclasses basic_assoc_cntnr, and encapsulates a list with update policies.

The hierarchy is composed naturally, such that each container inherits all types and methods from its base. Data-Structure Tags and Traits discusses how to query which types and methods each container supports.

Data-Structure Tags and Traits

pb_assoc contains a tag and traits mechanism similar to that of the STL's iterators.

pb_assoc contains a tag hierarchy corresponding to the hierarchy in Figure Class hierarchy. The tag hierarchy is shown in Figure Data-structure tag class hierarchy.

no image
Data-structure tag class hierarchy.

basic_assoc_cntnr publicly defines ds_category as one of the classes in Figure . Given any container Cntnr, the tag of the underlying data-structure can be found via typename Cntnr::ds_category.

Additionally, a traits mechanism can be used to query a container type for its attributes. Given any container Cntnr, then ds_traits<Cntnr> is a traits class identifying the properties of the container.

To find if a container can throw when a key is erased (which is true for vector-based trees, for example), one can use

ds_traits<Cntnr>::erase_can_throw, for example.

Some of the definitions in ds_traits are dependent on other definitions. E.g., if ds_traits<Cntnr>::split_join is true (which is the case for containers based on trees), then ds_traits<Cntnr>::split_join_can_throw indicates whether splits or joins can throw exceptions (which is true for vector-based trees); otherwise ds_traits<Cntnr>::split_join_can_throw will yield a compilation error. (This is somewhat similar to a compile-time version of the COM model [mscom]).

Find-Type and Range-Type Methods and Iterators

pb_assoc differentiates between two types of methods: find-type methods, and range-type methods. For example, find is a find-type method, since a container object searches for an element with a given key; insert is a find-type method, since, by STL convention, a container object returns an iterator corresponding to an element with a given key; begin and end are range-type methods, since they are not used to find a specific element, but rather to go over all elements in a container object.

Correspondingly, containers in pb_assoc define two families of iterators. const_find_iterator and find_iterator are the iterator types returned by find-type methods; const_iterator and iterator are the iterator types returned by range-type methods.

The relationship between these iterator types varies between container types. In a tree-based container, for example, const_find_iterator and const_iterator are synonymous, and find_iterator and iterator are synonymous; in a hash-based container, for example, this is not the case. Futhermore, find-type iterators in a hash-based container lack movement operators, such as operator++. All containers, however, maintain the invariants shown in Figure .

This distinction between find-type and range-type iterators and methods, while complicating the interface, has several advantages:

Iterators in unordered container types

Given an unordered container type, e.g., a hash-based container, it makes no sense to move an iterator returned by a find-type method. Let cntnr be an associative-container object, and consider:

std::for_each(m.find(1), m.find(5), foo);

which applies foo to all elements in m between 1 and 5.

If cntnr is a tree-based container object, then an in-order walk will apply foo to the relevant elements, e.g., as in Figure Range iteration in different data-structures -A. If m is a hash-based container, then the order of elements between any two elements is undefined (and probably time-varying); there is no guarantee that the elements traversed will coincide with the logical elements between 1 and 5, e.g., as in Figure Range iteration in different data-structures-B.

The application of a range function e.g., for_each, to a pair of hash-based container's iterators is possibly sensical only if the iterators are those returned by begin and end, respectively. Therefore, the iterator returned by m's find method should be immovable.

Another point also indicates that hash-based containers' find-type iterators and range-type iterators should be distinct. Consider Figure Find-type iterators in hash tables-A. An (immovable) find-type iterator, designed only to access an element, requires at most a single pointer to the element's link. Conversely, an iterator designed for range operations requires some more information e.g., the bucket number), since a cross-list traversal might be necessary. Alternatively, the lists might be linked, forming a monolithic total-element list, as in Figure Find-type iterators in hash tables-B (this seems similar to the Dinkumware design [dinkumware_stl]). This, however, complicates the hash-table's operations.

no image
Range iteration in different data-structures.
no image
Find-type iterators in hash tables.

As a consequence of this design,

std::for_each(m.find(1), m.find(5), foo);

will compile for tree-based containers, but will not compile for hash-tables or other types. The returned type of find is a find-type iterator. For tree-based containers, this is synonymous with a range-type iterator, and therefore supports operator++; for other types of containers, a find-type iterator lacks operator++.

Invalidation Guarantees

Consider the following snippet:

it = c.find(3);

c.erase(5);

Following the call to erase, what is the validity of it: can it be dereferenced? can it be incremented?

The answer depends on the underlying data-structure of the container. Figure Effect of erase in different underlying data-structures shows three cases: A1 and A2 show a red-black tree; B1 and B2 show an ordered-vector tree; C1 and C2 show a collision-chaining hash table.

no image
Effect of erase in different underlying data-structures.
  1. `Erasing 5 from A1 yields A2. Clearly, an iterator to 3 can be dereferenced and incremented.
  2. Erasing 5 from B1 yields B2. Clearly, an iterator to 3 is not valid at all.
  3. Erasing 5 from C1 yields C2. Here the situation is more complicated. On the one hand, incrementing it can be undefined. On the other hand, there is no problem in dereferencing it. In classic STL, it is not possible to express whether it is valid or not.

Thus again, the iterator concept seems overloaded. Distinguishing between find and range types allows fine-grained invalidation guarantees. Invalidation guarantees class hierarchy shows tags corresponding to different types of invalidation guarantees.

no image
Invalidation guarantees class hierarchy.
  1. basic_invalidation_guarantee corresponds to a basic guarantee that a find-type iterator, a found pointer, or a found reference, remains valid as long as the container object is not modified.
  2. find_invalidation_guarantee corresponds to a guarantee that a find-type iterator, a found pointer, or a found reference, remains valid even if the containter object is modified.
  3. range_invalidation_guarantee corresponds to a guarantee that a range-type iterator remains valid even if the containter object is modified.

To find the invalidation guarantee of a container, one can use

typename ds_traits<Cntnr>::invalidation_guarantee

which is one of the classes in Figure Invalidation guarantees class hierarchy.