bitmap_allocator.xml revision 1.5
<chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
         xml:id="manual.ext.allocator.bitmap" xreflabel="bitmap_allocator">
<?dbhtml filename="bitmap_allocator.html"?>

<info><title>The bitmap_allocator</title>
  <keywordset>
    <keyword>ISO C++</keyword>
    <keyword>allocator</keyword>
  </keywordset>
</info>



<para>
</para>

<section xml:id="allocator.bitmap.design"><info><title>Design</title></info>


  <para>
    As the name suggests, this allocator uses a bit-map to keep track
    of the used and unused memory locations for its book-keeping
    purposes.
  </para>
  <para>
    The allocator uses a single bit per block to keep track of
    whether that block has been allocated or not. A bit 1 indicates free,
    while 0 indicates allocated. This has been done so that you can
    easily check a collection of bits for a free block. This kind of
    bitmapped strategy works best for single object allocations, and
    with the STL type parameterized allocators, we do not need to
    choose any size for the block which will be represented by a
    single bit. This will be the size of the parameter around which
    the allocator has been parameterized. Thus, close to optimal
    performance will result. Hence, this should be used for node based
    containers which call the allocate function with an argument of 1.
  </para>

  <para>
    The bitmapped allocator's internal pool grows exponentially:
    internally, the size of the blocks acquired from the Free List
    Store doubles every time the bitmapped allocator runs out of
    memory.
  </para>

  <para>
    The macro <literal>__GTHREADS</literal> decides whether to use
    mutex protection around every allocation/deallocation. The state
    of the macro is picked up automatically from the gthr abstraction
    layer.
  </para>

</section>

<section xml:id="allocator.bitmap.impl"><info><title>Implementation</title></info>
<?dbhtml filename="bitmap_allocator_impl.html"?>


<section xml:id="bitmap.impl.free_list_store" xreflabel="Free List Store"><info><title>Free List Store</title></info>


  <para>
    The Free List Store (referred to as FLS for the remaining part of this
    document) is the global memory pool that is shared by all instances of
    the bitmapped allocator instantiated for any type. It maintains a
    sorted order of all free memory blocks given back to it by the
    bitmapped allocator, and is also responsible for giving memory to the
    bitmapped allocator when it asks for more.
  </para>
  <para>
    Internally, there is a Free List threshold which indicates the
    maximum number of free lists that the FLS can hold internally
    (cache). Currently, this value is set at 64. So, if there are
    more than 64 free lists coming in, then some of them will be given
    back to the OS using operator delete so that at any given time the
    Free List's size does not exceed 64 entries. This is done because
    a binary search is used to locate an entry in a free list when a
    request for memory comes along. The run-time cost of the search
    goes up as the list grows. For 64 entries, however,
    lg(64) == 6 comparisons are enough to locate the correct
    free list if it exists.
  </para>
  <para>
    If the free list size has reached its threshold, then the
    largest block from among those in the list and the new block will
    be selected and given back to the OS. This is done because it
    reduces external fragmentation, and allows the OS to use the
    larger blocks later in an orderly fashion, possibly merging them
    later. Also, on some systems, large blocks are obtained via calls
    to mmap, so giving them back to free system resources becomes most
    important.
  </para>
  <para>
    The function _S_should_i_give implements the policy that determines
    whether the current block of memory should be given to the
    allocator for the request that it has made. A policy is needed because
    we may not always have exact fits for the memory size that the allocator
    requests. We do this mainly to prevent external fragmentation at
    the cost of a little internal fragmentation. How much internal
    fragmentation is acceptable has to be decided by this function. I
    can see 3 possibilities right now. Please add more as and when you
    find better strategies.
  </para>

<orderedlist>
  <listitem><para>Equal size check. Return true only when the 2 blocks are of equal
size.</para></listitem>
  <listitem><para>Difference Threshold. Return true only when the _block_size is
greater than or equal to the _required_size, and exceeds it by a
difference of less than some THRESHOLD value. Otherwise, return
false.</para></listitem>
  <listitem><para>Percentage Threshold. Return true only when the _block_size is
greater than or equal to the _required_size, and exceeds it by a
percentage of less than some THRESHOLD value. Otherwise, return
false.</para></listitem>
</orderedlist>

  <para>
    Currently, (3) is being used with a value of 36% maximum wastage per
    Super Block.
  </para>
</section>

<section xml:id="bitmap.impl.super_block" xreflabel="Super Block"><info><title>Super Block</title></info>


  <para>
    A super block is the block of memory acquired from the FLS from
    which the bitmap allocator carves out memory for single objects
    and satisfies the user's requests. These super blocks come in
    sizes that are powers of 2 and multiples of 32
    (_Bits_Per_Block). Yes, both at the same time!
    That's because the
    next super block acquired will be 2 times the size of the previous one,
    and also all super blocks have to be multiples of the _Bits_Per_Block
    value.
  </para>
  <para>
    How does it interact with the free list store?
  </para>
  <para>
    The super block is contained in the FLS, and the FLS is responsible for
    getting Super Blocks from, and returning them to, the OS using
    operator new and operator delete as defined by the C++ standard.
  </para>
</section>

<section xml:id="bitmap.impl.super_block_data" xreflabel="Super Block Data"><info><title>Super Block Data Layout</title></info>

  <para>
    Each Super Block will be of some size that is a multiple of the
    number of Bits Per Block. Typically, this value is chosen as
    Bits_Per_Byte x sizeof(size_t). On a 32-bit x86 system, this gives the
    figure 8 x 4 = 32. Thus, each Super Block will be of size 32
    x Some_Value. This Some_Value is sizeof(value_type). For now, let
    it be called 'K'. Thus, finally, Super Block size is 32 x K bytes.
  </para>
  <para>
    This value of 32 has been chosen because each size_t has 32 bits
    and maximum use of these can be made with such a figure.
  </para>
  <para>
    Consider a block of size 64 ints. In memory, it would look like this
    (assume a 32-bit system, where size_t is a 32-bit entity):
  </para>

<table frame="all" xml:id="table.bitmap_alloc">
<title>Bitmap Allocator Memory Map</title>

<tgroup cols="5" align="left" colsep="1" rowsep="1">
<colspec colname="c1"/>
<colspec colname="c2"/>
<colspec colname="c3"/>
<colspec colname="c4"/>
<colspec colname="c5"/>

<tbody>
  <row>
    <entry>268</entry>
    <entry>0</entry>
    <entry>4294967295</entry>
    <entry>4294967295</entry>
    <entry>Data -&gt; Space for 64 ints</entry>
  </row>
</tbody>
</tgroup>
</table>

  <para>
    The first column (268) represents the size of the block in bytes as
    seen by the Bitmap Allocator.
    Internally, a global free list is
    used to keep track of the free blocks used and given back by the
    bitmap allocator. It is this Free List Store that is responsible
    for writing and managing this information. The number of
    bytes actually allocated in this case would be 4 + 4 + (4x2) + (64x4) =
    272 bytes, but the first 4 bytes are an addition by the Free List
    Store, so the Bitmap Allocator sees only 268 bytes. These first 4
    bytes, of which the bitmapped allocator is not aware, hold the
    value 268.
  </para>

  <para>
  What do the remaining values represent?</para>
  <para>
    The second 4 in the expression is sizeof(size_t), because the
    Bitmapped Allocator maintains a use count for each Super Block,
    which is initially set to 0 (as indicated in the diagram). This is
    incremented every time a block is removed from this super block
    (allocated), and decremented whenever it is given back. So, when
    the use count falls to 0, the whole super block will be given
    back to the Free List Store.
  </para>
  <para>
    The value 4294967295 represents the integer corresponding to the bit
    representation of all bits set: 11111111111111111111111111111111.
  </para>
  <para>
    The third term, 4x2, is the size of the bitmap itself: 2 x 32 bits,
    which is 8 bytes, or 2 x sizeof(size_t). The final term, 64x4, is
    the data area for the 64 ints, at 4 bytes each.
  </para>
</section>

<section xml:id="bitmap.impl.max_wasted" xreflabel="Max Wasted Percentage"><info><title>Maximum Wasted Percentage</title></info>


  <para>
    This has nothing to do with the algorithm per se,
    only with some values that must be chosen correctly to ensure that the
    allocator performs well in a real-world scenario, and maintains a good
    balance between the memory consumption and the allocation/deallocation
    speed.
  </para>
  <para>
    The formula for calculating the maximum wastage as a percentage:
  </para>

  <para>
(32 x k + 1) / (2 x (32 x k + 1 + 32 x c)) x 100.
  </para>

  <para>
    where k is the constant overhead per node (e.g., for list, it is
    8 bytes, and for map it is 12 bytes) and c is the size of the
    base type on which the map/list is instantiated. The formula
    assumes two types, type1 and type2, whose sizes differ by a factor
    of two. For example, if type1 is int and type2 is double, they are
    related by sizeof(double) == 2*sizeof(int). All types must have this
    double size relation for this formula to work properly.
  </para>
  <para>
    Plugging in: For list: k = 8 and c = 4 (int and double), we get:
    33.376%
  </para>

  <para>
For map/multimap: k = 12, and c = 4 (int and double), we get: 37.524%
  </para>
  <para>
    Thus, knowing these values, and based on the sizeof(value_type), we may
    create a function that returns the Max_Wastage_Percentage for us to use.
  </para>

</section>

<section xml:id="bitmap.impl.allocate" xreflabel="Allocate"><info><title><function>allocate</function></title></info>


  <para>
    The allocate function is specialized for single object allocation
    ONLY. Thus, ONLY if n == 1 will the bitmap_allocator's
    specialized algorithm be used. Otherwise, the request is satisfied
    directly by calling operator new.
  </para>
  <para>
    Suppose n == 1, then the allocator does the following:
  </para>
  <orderedlist>
    <listitem>
      <para>
        Checks to see whether a free block exists somewhere in a region
        of memory close to the last satisfied request. If so, then that
        block is marked as allocated in the bit map and given to the
        user. If not, then (2) is executed.
      </para>
    </listitem>
    <listitem>
      <para>
        Is there a free block anywhere after the current block, right
        up to the end of the memory that we have?
        If so, that block is
        found, the same procedure as above is applied, and the block is
        returned to the user. If not, then (3) is executed.
      </para>
    </listitem>
    <listitem>
      <para>
        Is there a free block anywhere in the memory that we
        own? This is done by checking
      </para>
      <itemizedlist>
        <listitem>
          <para>
            The use count for each super block, and if that fails then
          </para>
        </listitem>
        <listitem>
          <para>
            The individual bit-maps for each super block.
          </para>
        </listitem>
      </itemizedlist>

      <para>
        Note: Here we are never touching any of the memory that the
        user will be given, and we are confining all memory accesses
        to a small region of memory! This helps reduce cache
        misses. If this succeeds then we apply the same procedure on
        that bit-map as (1), and return that block of memory to the
        user. However, if this process fails, then we resort to (4).
      </para>
    </listitem>
    <listitem>
      <para>
        This process involves refilling the internal exponentially
        growing memory pool. This is achieved by calling
        _S_refill_pool, which does the following:
      </para>
      <itemizedlist>
        <listitem>
          <para>
            Gets more memory of the required size from the Global
            Free List.
          </para>
        </listitem>
        <listitem>
          <para>
            Adjusts the size for the next call to itself.
          </para>
        </listitem>
        <listitem>
          <para>
            Writes the appropriate headers in the bit-maps.
          </para>
        </listitem>
        <listitem>
          <para>
            Sets the use count for that super-block just allocated to 0
            (zero).
          </para>
        </listitem>
        <listitem>
          <para>
            All of the above amounts to maintaining the basic invariant
            for the allocator. If the invariant is maintained, we are
            sure that all is well. Now, the same process is applied on
            the newly acquired free blocks, which are dispatched
            accordingly.
          </para>
        </listitem>
      </itemizedlist>
    </listitem>
</orderedlist>

<para>
Thus, you can clearly see that the allocate function is nothing but a
combination of the next-fit and first-fit algorithms, optimized ONLY for
single object allocations.
</para>

</section>

<section xml:id="bitmap.impl.deallocate" xreflabel="Deallocate"><info><title><function>deallocate</function></title></info>

  <para>
    The deallocate function again is specialized for single objects ONLY.
    For all n &gt; 1, operator delete is called without
    further ado, and the deallocate function returns.
  </para>
  <para>
    However, for n == 1, a series of steps is performed:
  </para>

  <orderedlist>
    <listitem><para>
      We first need to locate the super-block which holds the memory
      location given to us by the user. For that purpose, we maintain
      a static variable _S_last_dealloc_index, which holds the index
      into the vector of block pairs of the
      last super-block from which memory was freed. We use this
      strategy in the hope that the user will deallocate memory in a
      region close to what they deallocated the last time around. If
      the check for belongs_to succeeds, then we determine the bit-map
      for the given pointer, locate the index into that bit-map,
      and mark that bit as free by setting it.
    </para></listitem>
    <listitem><para>
      If the _S_last_dealloc_index does not point to the memory block
      that we're looking for, then we do a linear search on the blocks
      stored in the vector of Block Pairs. This vector in code is
      called _S_mem_blocks. When the corresponding super-block is
      found, we apply the same procedure as we did for (1) to mark the
      block as free in the bit-map.
    </para></listitem>
  </orderedlist>

  <para>
    Now, whenever a block is freed, the use count of that particular
    super block goes down by 1.
    When this use count hits 0, we remove
    that super block from the list of all valid super blocks stored in
    the vector. While doing this, we also make sure that the basic
    invariant is maintained by making sure that _S_last_request and
    _S_last_dealloc_index point to valid locations within the vector.
  </para>
</section>

<section xml:id="bitmap.impl.questions" xreflabel="Questions"><info><title>Questions</title></info>


  <section xml:id="bitmap.impl.question.1" xreflabel="Question 1"><info><title>1</title></info>

    <para>
Q1) The "Data Layout" section is
cryptic. I have no idea of what you are trying to say. Layout of what?
The free-list? Each bitmap? The Super Block?
    </para>
    <para>
      The layout of a Super Block of a given
size. In the example, a super block of size 32 x 1 is taken. The
general formula for calculating the size of a super block is
32 x sizeof(value_type) x 2^n, where n ranges from 0 to 32 for 32-bit
systems.
    </para>
  </section>

  <section xml:id="bitmap.impl.question.2" xreflabel="Question 2"><info><title>2</title></info>

    <para>
      And since I just mentioned the
term `each bitmap', what in the world is meant by it? What does each
bitmap manage? How does it relate to the super block? Is the Super
Block a bitmap as well?
    </para>
    <para>
      Each bitmap is part of a Super Block, which is made up of 3 parts
      as I have mentioned earlier. To reiterate: 1. The use count,
      2. The bit-map for that Super Block. 3. The actual memory that
      will be eventually given to the user. Each bitmap is a multiple
      of 32 bits in size. If there are 32 x (2^3) blocks of single objects
      to be given, there will be '32 x (2^3)' bits present. Each group
      of 32 bits manages the allocated / free status for 32 blocks. Since
      each size_t contains 32 bits, one size_t can manage the status of
      up to 32 blocks.
      Each bit-map is made up of a number of size_t,
      whose exact number for a super-block of a given size I have just
      mentioned.
    </para>
  </section>

  <section xml:id="bitmap.impl.question.3" xreflabel="Question 3"><info><title>3</title></info>

    <para>
      How do the allocate and deallocate functions work in regard to
      bitmaps?
    </para>
    <para>
      The allocate and deallocate functions manipulate the bitmaps and
      have nothing to do with the memory that is given to the user. As
      I have mentioned earlier, a 1 in the bitmap's bit field
      indicates free, while a 0 indicates allocated. This lets us
      check 32 bits at a time to see whether there is at least one
      free block in those 32 blocks by testing for equality with
      (0). Given a memory block, the allocate function will find
      the corresponding bit in the bitmap and reset it (i.e.,
      clear it to 0). And when the deallocate function is called,
      it will again set that bit after locating it to indicate that
      the particular block corresponding to this bit in the bit-map
      is not being used by anyone, and may be used to satisfy future
      requests.
    </para>
    <para>
      e.g.: Consider a bit-map of 64 bits as represented below:
      1111111111111111111111111111111111111111111111111111111111111111
    </para>

    <para>
      Now, when the first request for allocation of a single object
      comes along, the first block in address order is returned. And
      since the bit-maps are stored in the reverse order to that of the
      address order, the last bit (LSB if the bit-map is considered as a
      binary word of 64 bits) is reset to 0.
    </para>

    <para>
      The bit-map now looks like this:
      1111111111111111111111111111111111111111111111111111111111111110
    </para>
  </section>
</section>

<section xml:id="bitmap.impl.locality" xreflabel="Locality"><info><title>Locality</title></info>

  <para>
    Another issue would be whether to keep all the bitmaps in a
    separate area in memory, or to keep them near the actual blocks
    that will be given out or allocated for the client. After some
    testing, I've decided to keep these bitmaps close to the actual
    blocks. This will help in 2 ways.
  </para>

  <orderedlist>
  <listitem><para>Constant time access for the bitmaps themselves, since no kind of
look up will be needed to find the correct bitmap list or its
equivalent.</para></listitem>
  <listitem><para>And also this would preserve the cache as far as possible.</para></listitem>
  </orderedlist>

  <para>
    So in effect, this kind of an allocator might prove beneficial from a
    purely cache point of view. But this allocator has been made to try and
    address the defects of the node_allocator, wherein the nodes get
    skewed about in memory if they are not returned in the exact reverse
    order or in the same order in which they were allocated. Also, the
    new_allocator's book keeping overhead is too much for small objects and
    single object allocations, though it preserves the locality of blocks
    very well when they are returned to the allocator.
  </para>
</section>

<section xml:id="bitmap.impl.grow_policy" xreflabel="Grow Policy"><info><title>Overhead and Grow Policy</title></info>

  <para>
    Expected overhead per block would be 1 bit in memory. Also, once
    the address of the free list has been found, the cost for
    allocation/deallocation would be negligible, and is supposed to be
    constant time.
    For these very reasons, it is very important to
    minimize the linear time costs, which include finding a free list
    with a free block while allocating, and finding the corresponding
    free list for a block while deallocating. Therefore, I have
    decided that the growth of the internal pool for this allocator
    will be exponential, as compared to linear for
    node_allocator. There, linear growth works well, because we are
    mainly concerned with speed of allocation/deallocation and memory
    consumption, whereas here, the allocation/deallocation part does
    have some linear/logarithmic complexity components in it. Thus, to
    try and minimize them would be a good thing to do at the cost of a
    little bit of memory.
  </para>

  <para>
    Another thing to be noted is that the pool size will double every time
    the internal pool gets exhausted, that is, when all the free blocks
    have been given away. The initial size of the pool would be
    sizeof(size_t) x 8, which is the number of bits in an integer
    that can fit exactly in a CPU register. Hence the term
    exponential growth of the internal pool.
  </para>
</section>

</section>

</chapter>