Searched hist:246726 (Results 1 - 2 of 2) sorted by relevance

/freebsd-11-stable/sys/vm/
H A Dvm_radix.hdiff 246726 Tue Feb 12 23:26:27 MST 2013 attilio Implement a new algorithm for managing the radix trie which also
includes path-compression. This greatly helps with sparsely populated
tries, where an uncompressed trie may end up by having a lot of
intermediate nodes for very little leaves.

The new algorithm introduces 2 main concepts: the node level and the
node owner. Every node represents a branch point where the leaves share
the key up to the level specified in the node-level (current level
excluded, of course). Such key partly shared is the one contained in
the owner. Of course, the root branch is exempted to keep a valid
owner, because theoretically all the keys are contained in the space
designed by the root branch node. The search algorithm seems very
intuitive and that is where one should start reading to understand the
full approach.

In the end, the algorithm ends up by demanding only one node per insert
and this is not necessary in all the cases. To stay safe, we basically
preallocate as many nodes as the number of physical pages are in the
system, using uma_preallocate(). However, this raises 2 concerns:
* As pmap_init() needs to kmem_alloc(), the nodes must be pre-allocated
when vm_radix_init() is currently called, which is much before UMA
is fully initialized. This means that uma_prealloc() will dig into the
UMA_BOOT_PAGES pool of pages, which is often not enough to keep track
of such large allocations.
In order to fix this, change a bit the concept of UMA_BOOT_PAGES and
vm.boot_pages. More specifically make the UMA_BOOT_PAGES an initial "value"
as long as vm.boot_pages and extend the boot_pages physical area by as
many bytes as needed with the information returned by
vm_radix_allocphys_size().
* A small amount of pages will be held in per-cpu buckets and won't be
accessible from curcpu, so the vm_radix_node_get() could really panic
when the pre-allocation pool is close to be exhausted.
In theory we could pre-allocate more pages than the number of physical
frames to satisfy such request, but as many insert would happen without
a node allocation anyway, I think it is safe to assume that the
over-allocation is already compensating for such problem.
On the field testing can stand me correct, of course. This could be
further helped by the case where we allow a single-page insert to not
require a complete root node.

The use of pre-allocation gets rid all the non-direct mapping trickery
and introduced lock recursion allowance for vm_page_free_queue.

The nodes children are reduced in number from 32 -> 16 and from 16 -> 8
(for respectively 64 bits and 32 bits architectures).
This would make the children to fit into cacheline for amd64 case,
for example, and in general spawn less cacheline, which may be
helpful in lookup_ge() case.
Also, path-compression cames to help in cases where there are many levels,
making the fallouts of such change less hurting.

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff (partially)
Tested by: flo
H A Dvm_radix.cdiff 246726 Tue Feb 12 23:26:27 MST 2013 attilio Implement a new algorithm for managing the radix trie which also
includes path-compression. This greatly helps with sparsely populated
tries, where an uncompressed trie may end up by having a lot of
intermediate nodes for very little leaves.

The new algorithm introduces 2 main concepts: the node level and the
node owner. Every node represents a branch point where the leaves share
the key up to the level specified in the node-level (current level
excluded, of course). Such key partly shared is the one contained in
the owner. Of course, the root branch is exempted to keep a valid
owner, because theoretically all the keys are contained in the space
designed by the root branch node. The search algorithm seems very
intuitive and that is where one should start reading to understand the
full approach.

In the end, the algorithm ends up by demanding only one node per insert
and this is not necessary in all the cases. To stay safe, we basically
preallocate as many nodes as the number of physical pages are in the
system, using uma_preallocate(). However, this raises 2 concerns:
* As pmap_init() needs to kmem_alloc(), the nodes must be pre-allocated
when vm_radix_init() is currently called, which is much before UMA
is fully initialized. This means that uma_prealloc() will dig into the
UMA_BOOT_PAGES pool of pages, which is often not enough to keep track
of such large allocations.
In order to fix this, change a bit the concept of UMA_BOOT_PAGES and
vm.boot_pages. More specifically make the UMA_BOOT_PAGES an initial "value"
as long as vm.boot_pages and extend the boot_pages physical area by as
many bytes as needed with the information returned by
vm_radix_allocphys_size().
* A small amount of pages will be held in per-cpu buckets and won't be
accessible from curcpu, so the vm_radix_node_get() could really panic
when the pre-allocation pool is close to be exhausted.
In theory we could pre-allocate more pages than the number of physical
frames to satisfy such request, but as many insert would happen without
a node allocation anyway, I think it is safe to assume that the
over-allocation is already compensating for such problem.
On the field testing can stand me correct, of course. This could be
further helped by the case where we allow a single-page insert to not
require a complete root node.

The use of pre-allocation gets rid all the non-direct mapping trickery
and introduced lock recursion allowance for vm_page_free_queue.

The nodes children are reduced in number from 32 -> 16 and from 16 -> 8
(for respectively 64 bits and 32 bits architectures).
This would make the children to fit into cacheline for amd64 case,
for example, and in general spawn less cacheline, which may be
helpful in lookup_ge() case.
Also, path-compression cames to help in cases where there are many levels,
making the fallouts of such change less hurting.

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff (partially)
Tested by: flo

Completed in 90 milliseconds