Memory Resource Controller

NOTE: The Memory Resource Controller has generically been referred
      to as the memory controller in this document. Do not confuse the memory
      controller used here with the memory controller that is used in hardware.

(For editors)
In this document:
      When we mention a cgroup (cgroupfs's directory) with memory controller,
      we call it "memory cgroup". When you see git-log and source code, you'll
      see that patch titles and function names tend to use "memcg".
      In this document, we avoid using it.

Benefits and Purpose of the memory controller

The memory controller isolates the memory behaviour of a group of tasks
from the rest of the system. The article on LWN [12] mentions some probable
uses of the memory controller. The memory controller can be used to

a. Isolate an application or a group of applications
   Memory hungry applications can be isolated and limited to a smaller
   amount of memory.
b. Create a cgroup with limited amount of memory, this can be used
   as a good alternative to booting with mem=XXXX.
c. Virtualization solutions can control the amount of memory they want
   to assign to a virtual machine instance.
d. A CD/DVD burner could control the amount of memory used by the
   rest of the system to ensure that burning does not fail due to lack
   of available memory.
e. There are several other use cases, find one or use the controller just
   for fun (to learn and hack on the VM subsystem).

Current Status: linux-2.6.34-mmotm(development version of 2010/April)

Features:
 - accounting anonymous pages, file caches, swap caches usage and limiting them.
 - private LRU and reclaim routine. (system's global LRU and private LRU
   work independently from each other)
 - optionally, memory+swap usage can be accounted and limited.
 - hierarchical accounting
 - soft limit
 - moving(recharging) account at moving a task is selectable.
 - usage threshold notifier
 - oom-killer disable knob and oom-notifier
 - Root cgroup has no limit controls.

 Kernel memory and Hugepages are not under control yet. We just manage
 pages on LRU. To add more controls, we have to take care of performance.

Brief summary of control files.

 tasks                           # attach a task(thread) and show list of threads
 cgroup.procs                    # show list of processes
 cgroup.event_control            # an interface for eventfd()
 memory.usage_in_bytes           # show current memory(RSS+Cache) usage.
 memory.memsw.usage_in_bytes     # show current memory+Swap usage
 memory.limit_in_bytes           # set/show limit of memory usage
 memory.memsw.limit_in_bytes     # set/show limit of memory+Swap usage
 memory.failcnt                  # show the number of times memory usage hit the limit
 memory.memsw.failcnt            # show the number of times memory+Swap hit the limit
 memory.max_usage_in_bytes       # show max memory usage recorded
 memory.memsw.max_usage_in_bytes # show max memory+Swap usage recorded
 memory.soft_limit_in_bytes      # set/show soft limit of memory usage
 memory.stat                     # show various statistics
 memory.use_hierarchy            # set/show hierarchical account enabled
 memory.force_empty              # trigger forced move charge to parent
 memory.swappiness               # set/show swappiness parameter of vmscan
                                 (See sysctl's vm.swappiness)
 memory.move_charge_at_immigrate # set/show controls of moving charges
 memory.oom_control              # set/show oom controls.

1. History

The memory controller has a long history. A request for comments for the memory
controller was posted by Balbir Singh [1]. At the time the RFC was posted
there were several implementations for memory control. The goal of the
RFC was to build consensus and agreement for the minimal features required
for memory control. The first RSS controller was posted by Balbir Singh [2]
in Feb 2007. Pavel Emelianov [3][4][5] has since posted three versions of the
RSS controller.
At OLS, at the resource management BoF, everyone suggested
that we handle both page cache and RSS together. Another request was raised
to allow user space handling of OOM. The current memory controller is
at version 6; it combines both mapped (RSS) and unmapped Page
Cache Control [11].

2. Memory Control

Memory is a unique resource in the sense that it is present in a limited
amount. If a task requires a lot of CPU processing, the task can spread
its processing over a period of hours, days, months or years, but with
memory, the same physical memory needs to be reused to accomplish the task.

The memory controller implementation has been divided into phases. These
are:

1. Memory controller
2. mlock(2) controller
3. Kernel user memory accounting and slab control
4. user mappings length controller

The memory controller is the first controller developed.

2.1. Design

The core of the design is a counter called the res_counter. The res_counter
tracks the current memory usage and limit of the group of processes associated
with the controller. Each cgroup has a memory controller specific data
structure (mem_cgroup) associated with it.

2.2. Accounting

                +--------------------+
                |  mem_cgroup        |
                |  (res_counter)     |
                +--------------------+
                 /            ^      \
                /             |       \
           +---------------+  |        +---------------+
           | mm_struct     |  |....    | mm_struct     |
           |               |  |        |               |
           +---------------+  |        +---------------+
                              |
                              +---------------+
                                              |
           +---------------+           +------+--------+
           | page          +---------->|  page_cgroup  |
           |               |           |               |
           +---------------+           +---------------+

             (Figure 1: Hierarchy of Accounting)


Figure 1 shows the important aspects of the controller

1. Accounting happens per cgroup
2. Each mm_struct knows about which cgroup it belongs to
3. Each page has a pointer to the page_cgroup, which in turn knows the
   cgroup it belongs to

The accounting is done as follows: mem_cgroup_charge() is invoked to set up
the necessary data structures and check if the cgroup that is being charged
is over its limit. If it is, then reclaim is invoked on the cgroup.
More details can be found in the reclaim section of this document.
If everything goes well, a page meta-data-structure called page_cgroup is
updated. page_cgroup has its own LRU on cgroup.
(*) page_cgroup structure is allocated at boot/memory-hotplug time.

2.2.1 Accounting details

All mapped anon pages (RSS) and cache pages (Page Cache) are accounted.
Some pages which are never reclaimable and will not be on the global LRU
are not accounted. We just account pages under usual VM management.

RSS pages are accounted at page_fault unless they've already been accounted
for earlier. A file page will be accounted for as Page Cache when it's
inserted into inode (radix-tree). While it's mapped into the page tables of
processes, duplicate accounting is carefully avoided.

An RSS page is unaccounted when it's fully unmapped. A PageCache page is
unaccounted when it's removed from radix-tree. Even if RSS pages are fully
unmapped (by kswapd), they may exist as SwapCache in the system until they
are really freed. Such SwapCaches are also accounted.
A swapped-in page is not accounted until it's mapped.

Note: The kernel does swapin-readahead and reads multiple swap entries at once.
This means swapped-in pages may contain pages for tasks other than the task
causing the page fault. So, we avoid accounting at swap-in I/O.

At page migration, accounting information is kept.

Note: we just account pages-on-LRU because our purpose is to control the amount
of used pages; not-on-LRU pages tend to be out-of-control from VM view.
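
The charge path described in 2.2 can be pictured with a small user-space toy
model. This is only a sketch for illustration: the class ResCounter, the
helper charge_with_reclaim() and the reclaim callback are hypothetical names
invented here and do not mirror the kernel's res_counter API. The field names
are borrowed from the control files (usage, limit, max_usage, failcnt).

```python
class ResCounter:
    """Toy stand-in for a per-cgroup usage counter (illustration only)."""

    def __init__(self, limit):
        self.limit = limit      # compare memory.limit_in_bytes
        self.usage = 0          # compare memory.usage_in_bytes
        self.max_usage = 0      # compare memory.max_usage_in_bytes
        self.failcnt = 0        # compare memory.failcnt

    def charge(self, nbytes):
        """Account nbytes; refuse and bump failcnt if the limit would be hit."""
        if self.usage + nbytes > self.limit:
            self.failcnt += 1
            return False
        self.usage += nbytes
        self.max_usage = max(self.max_usage, self.usage)
        return True

    def uncharge(self, nbytes):
        """Drop nbytes from the usage, never going below zero."""
        self.usage -= min(nbytes, self.usage)


def charge_with_reclaim(counter, nbytes, reclaim):
    """Try to charge; on failure run reclaim() once and retry.

    reclaim is a hypothetical callback returning the number of bytes it
    managed to free from this group.
    """
    if counter.charge(nbytes):
        return True
    counter.uncharge(reclaim())
    return counter.charge(nbytes)
```

In this model a failed retry corresponds to the point where the real
controller would fall back to its OOM handling.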

2.3 Shared Page Accounting

Shared pages are accounted on the basis of the first touch approach. The
cgroup that first touches a page is accounted for the page. The principle
behind this approach is that a cgroup that aggressively uses a shared
page will eventually get charged for it (once it is uncharged from
the cgroup that brought it in -- this will happen on memory pressure).

Exception: If CONFIG_CGROUP_MEM_RES_CTLR_SWAP is not used,
when you do swapoff and force swapped-out pages of shmem(tmpfs) to
be brought back into memory, charges for those pages are accounted against the
caller of swapoff rather than the users of shmem.


2.4 Swap Extension (CONFIG_CGROUP_MEM_RES_CTLR_SWAP)

Swap Extension allows you to record charge for swap. A swapped-in page is
charged back to original page allocator if possible.

When swap is accounted, the following files are added.
 - memory.memsw.usage_in_bytes.
 - memory.memsw.limit_in_bytes.

memsw means memory+swap. Usage of memory+swap is limited by
memsw.limit_in_bytes.

Example: Assume a system with 4G of swap. A task which allocates 6G of memory
(by mistake) under 2G memory limitation will use all swap.
In this case, setting memsw.limit_in_bytes=3G will prevent bad use of swap.
By using memsw limit, you can avoid system OOM which can be caused by swap
shortage.

* why 'memory+swap' rather than swap.
The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
to move account from memory to swap...there is no change in usage of
memory+swap. In other words, when we want to limit the usage of swap without
affecting global LRU, memory+swap limit is better than just limiting swap from
OS point of view.

* What happens when a cgroup hits memory.memsw.limit_in_bytes
When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out
in this cgroup.
Then, swap-out will not be done by cgroup routine and file
caches are dropped. But as mentioned above, global LRU can still swap out memory
from it for sanity of the system's memory management state. You can't forbid
it by cgroup.

2.5 Reclaim

Each cgroup maintains a per cgroup LRU which has the same structure as
global VM. When a cgroup goes over its limit, we first try
to reclaim memory from the cgroup so as to make space for the new
pages that the cgroup has touched. If the reclaim is unsuccessful,
an OOM routine is invoked to select and kill the bulkiest task in the
cgroup. (See 10. OOM Control below.)

The reclaim algorithm has not been modified for cgroups, except that
pages that are selected for reclaiming come from the per cgroup LRU
list.

NOTE: Reclaim does not work for the root cgroup, since we cannot set any
limits on the root cgroup.

Note2: When panic_on_oom is set to "2", the whole system will panic.

When oom event notifier is registered, event will be delivered.
(See oom_control section)

2.6 Locking

   lock_page_cgroup()/unlock_page_cgroup() should not be called under
   mapping->tree_lock.

   The other lock order is as follows:
   PG_locked.
   mm->page_table_lock
       zone->lru_lock
          lock_page_cgroup.
  In many cases, just lock_page_cgroup() is called.
  per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by
  zone->lru_lock, it has no lock of its own.

3. User Interface

0. Configuration

a. Enable CONFIG_CGROUPS
b. Enable CONFIG_RESOURCE_COUNTERS
c. Enable CONFIG_CGROUP_MEM_RES_CTLR
d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension)

1. Prepare the cgroups
# mkdir -p /cgroups
# mount -t cgroup none /cgroups -o memory

2. Make the new group and move bash into it
# mkdir /cgroups/0
# echo $$ > /cgroups/0/tasks

Since now we're in the 0 cgroup, we can alter the memory limit:
# echo 4M > /cgroups/0/memory.limit_in_bytes

NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.)

NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited).
NOTE: We cannot set limits on the root cgroup any more.

# cat /cgroups/0/memory.limit_in_bytes
4194304

We can check the usage:
# cat /cgroups/0/memory.usage_in_bytes
1216512

A successful write to this file does not guarantee a successful set of
this limit to the value written into the file. This can be due to a
number of factors, such as rounding up to page boundaries or the total
availability of memory on the system. The user is required to re-read
this file after a write to guarantee the value committed by the kernel.

# echo 1 > memory.limit_in_bytes
# cat memory.limit_in_bytes
4096

The memory.failcnt field gives the number of times that the cgroup limit was
exceeded.

The memory.stat file gives accounting information. Now, the number of
caches, RSS and Active pages/Inactive pages are shown.

4. Testing

For testing features and implementation, see memcg_test.txt.

Performance testing is also important. To see the pure overhead of the memory
controller, testing on tmpfs will give you good numbers for small overheads.
Example: do kernel make on tmpfs.

Page-fault scalability is also important. When measuring a parallel
page fault test, a multi-process test may be better than a multi-thread
test because the latter has noise from shared objects/status.

But the above two are testing extreme situations.
Trying usual tests under the memory controller is always helpful.
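
A multi-process page-fault test of the kind mentioned above could look roughly
like the following. This is a minimal sketch under stated assumptions, not a
calibrated benchmark: the function names touch_pages() and
run_parallel_faults() are made up for this example, and the "fork" start
method restricts it to POSIX systems. Each worker maps anonymous memory and
writes one byte per page so that every page is faulted in.

```python
import mmap
import multiprocessing
import time

PAGE_SIZE = mmap.PAGESIZE


def touch_pages(npages):
    """Map anonymous memory and write one byte per page to fault it in."""
    buf = mmap.mmap(-1, npages * PAGE_SIZE)
    try:
        for offset in range(0, npages * PAGE_SIZE, PAGE_SIZE):
            buf[offset] = 1
    finally:
        buf.close()


def run_parallel_faults(nprocs, npages):
    """Fork nprocs workers; return (elapsed seconds, list of exit codes)."""
    ctx = multiprocessing.get_context("fork")  # separate processes, not threads
    workers = [ctx.Process(target=touch_pages, args=(npages,))
               for _ in range(nprocs)]
    start = time.perf_counter()
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    elapsed = time.perf_counter() - start
    return elapsed, [worker.exitcode for worker in workers]
```

To compare overheads, one would run the same call, for example
run_parallel_faults(4, 65536), from a shell inside a memory cgroup and from a
shell outside of it.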

4.1 Troubleshooting

Sometimes a user might find that the application under a cgroup is
terminated by the OOM killer. There are several causes for this:

1. The cgroup limit is too low (just too low to do anything useful)
2. The user is using anonymous memory and swap is turned off or too low

A sync followed by echo 1 > /proc/sys/vm/drop_caches will help get rid of
some of the pages cached in the cgroup (page cache pages).

To find out what happens, it is helpful to disable the OOM killer (see
10. OOM Control below) and observe the cgroup.

4.2 Task migration

When a task migrates from one cgroup to another, its charge is not
carried forward by default. The pages allocated from the original cgroup still
remain charged to it; the charge is dropped when the page is freed or
reclaimed.

You can move charges of a task along with task migration.
See 8. "Move charges at task migration"

4.3 Removing a cgroup

A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a
cgroup might have some charge associated with it, even though all
tasks have migrated away from it. (because we charge against pages, not
against tasks.)

Such charges are freed or moved to their parent. At moving, both of RSS
and CACHES are moved to parent.
rmdir() may return -EBUSY if freeing/moving fails. See 5.1 also.

Charges recorded in swap information are not updated at removal of cgroup.
Recorded information is discarded and a cgroup which uses swap (swapcache)
will be charged as a new owner of it.


5. Misc. interfaces.

5.1 force_empty
  memory.force_empty interface is provided to make cgroup's memory usage empty.
  You can use this interface only when the cgroup has no tasks.
  When anything is written to this file, e.g.

  # echo 0 > memory.force_empty

  almost all pages tracked by this memory cgroup will be unmapped and freed.
  Some pages cannot be freed because they are locked or in-use. Such pages are
  moved to parent and this cgroup will be empty. This may return -EBUSY if
  VM is too busy to free/move all pages immediately.

  A typical use case of this interface is calling it before rmdir().
  Because rmdir() moves all pages to parent, some out-of-use page caches can be
  moved to the parent. If you want to avoid that, force_empty will be useful.

5.2 stat file

memory.stat file includes the following statistics

# per-memory cgroup local status
cache           - # of bytes of page cache memory.
rss             - # of bytes of anonymous and swap cache memory.
mapped_file     - # of bytes of mapped file (includes tmpfs/shmem)
pgpgin          - # of pages paged in (equivalent to # of charging events).
pgpgout         - # of pages paged out (equivalent to # of uncharging events).
swap            - # of bytes of swap usage
inactive_anon   - # of bytes of anonymous memory and swap cache memory on
                  inactive LRU list.
active_anon     - # of bytes of anonymous and swap cache memory on active
                  LRU list.
inactive_file   - # of bytes of file-backed memory on inactive LRU list.
active_file     - # of bytes of file-backed memory on active LRU list.
unevictable     - # of bytes of memory that cannot be reclaimed (mlocked etc).

# status considering hierarchy (see memory.use_hierarchy settings)

hierarchical_memory_limit - # of bytes of memory limit with regard to hierarchy
                            under which the memory cgroup is
hierarchical_memsw_limit  - # of bytes of memory+swap limit with regard to
                            hierarchy under which memory cgroup is.

total_cache             - sum of all children's "cache"
total_rss               - sum of all children's "rss"
total_mapped_file       - sum of all children's "mapped_file"
total_pgpgin            - sum of all children's "pgpgin"
total_pgpgout           - sum of all children's "pgpgout"
total_swap              - sum of all children's "swap"
total_inactive_anon     - sum of all children's "inactive_anon"
total_active_anon       - sum of all children's "active_anon"
total_inactive_file     - sum of all children's "inactive_file"
total_active_file       - sum of all children's "active_file"
total_unevictable       - sum of all children's "unevictable"

# The following additional stats are dependent on CONFIG_DEBUG_VM.

inactive_ratio          - VM internal parameter. (see mm/page_alloc.c)
recent_rotated_anon     - VM internal parameter. (see mm/vmscan.c)
recent_rotated_file     - VM internal parameter. (see mm/vmscan.c)
recent_scanned_anon     - VM internal parameter. (see mm/vmscan.c)
recent_scanned_file     - VM internal parameter. (see mm/vmscan.c)

Memo:
        recent_rotated means recent frequency of LRU rotation.
        recent_scanned means recent # of scans to LRU.
        These are shown for better debugging; please see the code for their
        meanings.

Note:
        Only anonymous and swap cache memory is listed as part of 'rss' stat.
        This should not be confused with the true 'resident set size' or the
        amount of physical memory used by the cgroup.
        'rss + mapped_file' will give you resident set size of cgroup.
        (Note: file and shmem may be shared among other cgroups. In that case,
         mapped_file is accounted only when the memory cgroup is owner of page
         cache.)

5.3 swappiness

Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only.

The swappiness of the following cgroups can't be changed.
- root cgroup (uses /proc/sys/vm/swappiness).
- a cgroup which uses hierarchy and has other cgroup(s) below it.
- a cgroup which uses hierarchy and is not the root of the hierarchy.
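
The statistics listed in 5.2 can be consumed from a script. The sketch below
assumes that memory.stat prints one "name value" pair per line; the helper
names parse_memory_stat() and resident_bytes() and the sample numbers are
invented for this example. It computes the 'rss + mapped_file' figure
mentioned in the Note of 5.2.

```python
def parse_memory_stat(text):
    """Turn the text of a memory.stat file into a {name: int} dictionary.

    Assumes one "name value" pair per line; other lines are skipped.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].isdigit():
            stats[fields[0]] = int(fields[1])
    return stats


def resident_bytes(stats):
    """Return rss + mapped_file, the resident set size described in 5.2."""
    return stats.get("rss", 0) + stats.get("mapped_file", 0)


# Made-up sample content, only to show the expected input shape.
SAMPLE = """cache 1048576
rss 524288
mapped_file 262144
pgpgin 400
pgpgout 16
"""
```

With a mounted hierarchy, the text would come from something like
open("/cgroups/0/memory.stat").read(), using the example path from section 3.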

5.4 failcnt

A memory cgroup provides memory.failcnt and memory.memsw.failcnt files.
This failcnt(== failure count) shows the number of times that a usage counter
hit its limit. When a memory cgroup hits a limit, failcnt increases and
memory under it will be reclaimed.

You can reset failcnt by writing 0 to failcnt file.
# echo 0 > .../memory.failcnt

6. Hierarchy support

The memory controller supports a deep hierarchy and hierarchical accounting.
The hierarchy is created by creating the appropriate cgroups in the
cgroup filesystem. Consider for example, the following cgroup filesystem
hierarchy

                root
             /   |   \
           /     |    \
          a      b      c
                        | \
                        |  \
                        d   e

In the diagram above, with hierarchical accounting enabled, all memory
usage of e is accounted to its ancestors up to the root (i.e., c and root)
that have memory.use_hierarchy enabled. If one of the ancestors goes over its
limit, the reclaim algorithm reclaims from the tasks in the ancestor and the
children of the ancestor.

6.1 Enabling hierarchical accounting and reclaim

A memory cgroup by default disables the hierarchy feature. Support
can be enabled by writing 1 to memory.use_hierarchy file of the root cgroup

# echo 1 > memory.use_hierarchy

The feature can be disabled by

# echo 0 > memory.use_hierarchy

NOTE1: Enabling/disabling will fail if the cgroup already has other
       cgroups created below it.

NOTE2: When panic_on_oom is set to "2", the whole system will panic in
       case of an OOM event in any cgroup.

7. Soft limits

Soft limits allow for greater sharing of memory. The idea behind soft limits
is to allow control groups to use as much of the memory as needed, provided

a. There is no memory contention
b. They do not exceed their hard limit

When the system detects memory contention or low memory, control groups
are pushed back to their soft limits. If the soft limit of each control
group is very high, they are pushed back as much as possible to make
sure that one control group does not starve the others of memory.

Please note that soft limits are a best effort feature. They come with
no guarantees, but the controller does its best to make sure that when memory
is heavily contended for, memory is allocated based on the soft limit
hints/setup. Currently soft limit based reclaim is set up such that
it gets invoked from balance_pgdat (kswapd).

7.1 Interface

Soft limits can be set up by using the following commands (in this example we
assume a soft limit of 256 MiB)

# echo 256M > memory.soft_limit_in_bytes

If we want to change this to 1G, we can at any time use

# echo 1G > memory.soft_limit_in_bytes

NOTE1: Soft limits take effect over a long period of time, since they involve
       reclaiming memory for balancing between memory cgroups
NOTE2: It is recommended to set the soft limit always below the hard limit,
       otherwise the hard limit will take precedence.

8. Move charges at task migration

Users can move charges associated with a task along with task migration, that
is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
This feature is not supported in !CONFIG_MMU environments because of lack of
page tables.

8.1 Interface

This feature is disabled by default. It can be enabled(and disabled again) by
writing to memory.move_charge_at_immigrate of the destination cgroup.

If you want to enable it:

# echo (some positive value) > memory.move_charge_at_immigrate

Note: Each bit of move_charge_at_immigrate has its own meaning about what type
      of charges should be moved. See 8.2 for details.
Note: Charges are moved only when you move mm->owner, IOW, a leader of a thread
      group.
Note: If we cannot find enough space for the task in the destination cgroup, we
      try to make space by reclaiming memory. Task migration may fail if we
      cannot make enough space.
Note: It can take several seconds if you move many charges.

And if you want to disable it again:

# echo 0 > memory.move_charge_at_immigrate

8.2 Type of charges which can be moved

Each bit of move_charge_at_immigrate has its own meaning about what type of
charges should be moved. But in any case, it must be noted that an account of
a page or a swap can be moved only when it is charged to the task's current(old)
memory cgroup.

  bit | what type of charges would be moved ?
 -----+------------------------------------------------------------------------
   0  | A charge of an anonymous page(or swap of it) used by the target task.
      | Those pages and swaps must be used only by the target task. You must
      | enable Swap Extension(see 2.4) to enable move of swap charges.
 -----+------------------------------------------------------------------------
   1  | A charge of file pages(normal file, tmpfs file(e.g. ipc shared memory)
      | and swaps of tmpfs file) mmapped by the target task. Unlike the case of
      | anonymous pages, file pages(and swaps) in the range mmapped by the task
      | will be moved even if the task hasn't done page fault, i.e. they might
      | not be the task's "RSS", but other task's "RSS" that maps the same file.
      | And mapcount of the page is ignored(the page can be moved even if
      | page_mapcount(page) > 1). You must enable Swap Extension(see 2.4) to
      | enable move of swap charges.

8.3 TODO

- Implement madvise(2) to let users decide the vma to be moved or not to be
  moved.
- All of moving charge operations are done under cgroup_mutex.
  It's not good behavior to hold the mutex too long, so we may need some trick.

9. Memory thresholds

Memory cgroup implements memory thresholds using cgroups notification
API (see cgroups.txt). It allows an application to register multiple memory
and memsw thresholds and to get a notification when usage crosses one of them.

To register a threshold, an application needs to:
- create an eventfd using eventfd(2);
- open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
- write string like "<event_fd> <fd of memory.usage_in_bytes> <threshold>" to
  cgroup.event_control.

The application will be notified through eventfd when memory usage crosses
the threshold in any direction.

It's applicable for root and non-root cgroup.

10. OOM Control

memory.oom_control file is for OOM notification and other controls.

Memory cgroup implements OOM notifier using cgroup notification
API (See cgroups.txt). It allows an application to register multiple OOM
notification deliveries and to get a notification when OOM happens.

To register a notifier, an application needs to:
 - create an eventfd using eventfd(2)
 - open memory.oom_control file
 - write string like "<event_fd> <fd of memory.oom_control>" to
   cgroup.event_control

The application will be notified through eventfd when OOM happens.
OOM notification doesn't work for root cgroup.

You can disable OOM-killer by writing "1" to memory.oom_control file, as:

        # echo 1 > memory.oom_control

This operation is only allowed to the top cgroup of sub-hierarchy.
If OOM-killer is disabled, tasks under cgroup will hang/sleep
in memory cgroup's OOM-waitqueue when they request accountable memory.

To let them run again, you have to relax the memory cgroup's OOM status by
        * enlarging the limit or reducing usage.
To reduce usage,
        * kill some tasks.
        * move some tasks to other group with account migration.
        * remove some files (on tmpfs?)

Then, stopped tasks will work again.
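
The registration steps listed in sections 9 and 10 can be sketched from user
space as follows. This is a minimal sketch, not a reference listener: the
helper names registration_line() and wait_for_event() are hypothetical, the
code assumes Python 3.10 or later on Linux (for os.eventfd() and
os.eventfd_read()), and the cgroup directory is whatever path the hierarchy
was mounted on.

```python
import os


def registration_line(event_fd, target_fd, threshold=None):
    """Build the string written to cgroup.event_control.

    "<event_fd> <fd of control file>" registers an OOM notifier (section 10);
    appending "<threshold>" registers a usage threshold (section 9).
    """
    fields = [str(event_fd), str(target_fd)]
    if threshold is not None:
        fields.append(str(threshold))
    return " ".join(fields)


def wait_for_event(cgroup_dir, target_file, threshold=None):
    """Register one notification and block until it fires.

    target_file is e.g. "memory.oom_control" or "memory.usage_in_bytes".
    Returns the counter value read from the eventfd.
    """
    event_fd = os.eventfd(0)  # assumption: Python >= 3.10, Linux only
    target_fd = os.open(os.path.join(cgroup_dir, target_file), os.O_RDONLY)
    try:
        control = os.path.join(cgroup_dir, "cgroup.event_control")
        with open(control, "w") as ctl:
            ctl.write(registration_line(event_fd, target_fd, threshold))
        return os.eventfd_read(event_fd)  # blocks until notified
    finally:
        os.close(target_fd)
        os.close(event_fd)
```

For example, wait_for_event("/cgroups/0", "memory.usage_in_bytes", 4194304)
would block until usage of the example group from section 3 crosses 4M, and
wait_for_event("/cgroups/0", "memory.oom_control") would block until that
group hits OOM.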

At reading, current status of OOM is shown.
        oom_kill_disable 0 or 1 (if 1, oom-killer is disabled)
        under_oom        0 or 1 (if 1, the memory cgroup is under OOM, tasks may
                                 be stopped.)

11. TODO

1. Add support for accounting huge pages (as a separate controller)
2. Make per-cgroup scanner reclaim not-shared pages first
3. Teach controller to account for shared-pages
4. Start reclamation in the background when the limit is
   not yet hit but the usage is getting closer

Summary

Overall, the memory controller has been a stable controller and has been
commented and discussed quite extensively in the community.

References

1. Singh, Balbir. RFC: Memory Controller, http://lwn.net/Articles/206697/
2. Singh, Balbir. Memory Controller (RSS Control),
   http://lwn.net/Articles/222762/
3. Emelianov, Pavel. Resource controllers based on process cgroups,
   http://lkml.org/lkml/2007/3/6/198
4. Emelianov, Pavel. RSS controller based on process cgroups (v2),
   http://lkml.org/lkml/2007/4/9/78
5. Emelianov, Pavel. RSS controller based on process cgroups (v3),
   http://lkml.org/lkml/2007/5/30/244
6. Menage, Paul. Control Groups v10, http://lwn.net/Articles/236032/
7. Vaidyanathan, Srinivasan. Control Groups: Pagecache accounting and control
   subsystem (v3), http://lwn.net/Articles/235534/
8. Singh, Balbir. RSS controller v2 test results (lmbench),
   http://lkml.org/lkml/2007/5/17/232
9. Singh, Balbir. RSS controller v2 AIM9 results,
   http://lkml.org/lkml/2007/5/18/1
10. Singh, Balbir. Memory controller v6 test results,
    http://lkml.org/lkml/2007/8/19/36
11. Singh, Balbir. Memory controller introduction (v6),
    http://lkml.org/lkml/2007/8/17/69
12. Corbet, Jonathan. Controlling memory use in cgroups,
    http://lwn.net/Articles/243795/