        Keeping data small

When many applets are compiled into busybox, the rw data and
bss of all applets are concatenated, including those from libc
if busybox is built statically. When busybox is started, _all_ of
this data is allocated, not just the part used by the selected applet.

What "allocated" means exactly depends on the architecture.
On NOMMU it probably bites the most, since rwdata and bss
occupy real RAM. On i386, bss is lazily allocated
by COWed zero pages. Not sure about rwdata - also COW?

In order to keep busybox friendly to NOMMU and small-memory systems,
we should avoid large global data in our applets, and should
minimize usage of libc functions which implicitly use
such structures.

A small experiment to measure "parasitic" bbox memory consumption:
here we start 1000 "busybox sleep 10" in parallel.
The busybox binary is practically an allyesconfig static one,
built against uclibc. Run on an x86-64 machine with a 64-bit kernel:

bash-3.2# nmeter '%t %c %m %p %[pn]'
23:17:28 ..........  168M    0  147
23:17:29 ..........  168M    0  147
23:17:30 U.........  168M    1  147
23:17:31 SU........  181M  244  391
23:17:32 SSSSUUU...  223M  757 1147
23:17:33 UUU.......  223M    0 1147
23:17:34 U.........  223M    1 1147
23:17:35 ..........  223M    0 1147
23:17:36 ..........  223M    0 1147
23:17:37 S.........  223M    0 1147
23:17:38 ..........  223M    1 1147
23:17:39 ..........  223M    0 1147
23:17:40 ..........  223M    0 1147
23:17:41 ..........  210M    0  906
23:17:42 ..........  168M    1  147
23:17:43 ..........  168M    0  147

This requires 55M of memory (223M - 168M). Thus 1 trivial busybox applet
takes 55k of memory on a 64-bit x86 kernel.

On a 32-bit kernel we need ~26k per applet.

Script:

i=1000; while test $i != 0; do
        echo -n .
        busybox sleep 30 &
        i=$((i - 1))
done
echo
wait

(Data from NOMMU arches are sought.
Provide 'size busybox' output too)


        Example 1

One example of how to reduce global data usage is in
archival/libunarchive/decompress_unzip.c:

/* This is somewhat complex-looking arrangement, but it allows
 * to place decompressor state either in bss or in
 * malloc'ed space simply by changing #defines below.
 * Sizes on i386:
 * text    data     bss     dec     hex
 * 5256       0     108    5364    14f4 - bss
 * 4915       0       0    4915    1333 - malloc
 */
#define STATE_IN_BSS 0
#define STATE_IN_MALLOC 1

(see the rest of the file to get the idea)

This example completely eliminates globals in that module.
The required memory is allocated in unpack_gz_stream() [its main module]
and then passed down as a parameter to all subroutines which need
to access 'globals'.


        Example 2

In case you don't want to pass this additional parameter everywhere,
take a look at archival/gzip.c. Here all global data is replaced by
a single global pointer (ptr_to_globals) to allocated storage.

In order to not duplicate ptr_to_globals in every applet, you can
reuse the single common one. It is defined in libbb/messages.c
as struct globals *const ptr_to_globals, but the struct globals is
NOT defined in libbb.h. You first define your own struct:

struct globals { int a; char buf[1000]; };

and then declare that ptr_to_globals is a pointer to it:

#define G (*ptr_to_globals)

ptr_to_globals is declared as a constant pointer.
This helps gcc understand that it won't change, resulting in noticeably
smaller code. In order to assign it, use the SET_PTR_TO_GLOBALS macro:

        SET_PTR_TO_GLOBALS(xzalloc(sizeof(G)));

Typically this is done in <applet>_main().

Now you can reference "globals" as G.a, G.buf and so on, in any function.


        bb_common_bufsiz1

There is one big common buffer in bss - bb_common_bufsiz1. It is a much
earlier mechanism to reduce bss usage. Each applet can use it for
its needs.
Library functions are prohibited from using it.

The 'G.' trick can be done using bb_common_bufsiz1 instead of a malloced
buffer:

#define G (*(struct globals*)&bb_common_bufsiz1)

Be careful, though, and use it only if globals fit into bb_common_bufsiz1.
Since bb_common_bufsiz1 is BUFSIZ + 1 bytes long and BUFSIZ can change
from one libc to another, you have to add a compile-time check for it:

if (sizeof(struct globals) > sizeof(bb_common_bufsiz1))
        BUG_<applet>_globals_too_big();

(BUG_<applet>_globals_too_big() is declared but never defined, so the
build fails at link time unless the compiler can see that the condition
is false and drops the call.)


        Drawbacks

You have to initialize it by hand. xzalloc() can be helpful in clearing
allocated storage to 0, but anything more must be done by hand.

All global variables are prefixed by 'G.' now. If this makes code
less readable, use #defines:

#define dev_fd (G.dev_fd)
#define sector (G.sector)


        Word of caution

If an applet doesn't use much global data, converting it to use
one of the above methods is not worth the resulting code obfuscation.
If you have less than ~300 bytes of global data - don't bother.


        Finding non-shared duplicated strings

strings busybox | sort | uniq -c | sort -nr


        gcc's data alignment problem

The following attribute added in vi.c:

static int tabstop;
static struct termios term_orig __attribute__ ((aligned (4)));
static struct termios term_vi __attribute__ ((aligned (4)));

reduces bss size by 32 bytes, because gcc sometimes aligns structures to
ridiculously large values. asm output diff for the above example:

 tabstop:
        .zero   4
        .section        .bss.term_orig,"aw",@nobits
-       .align 32
+       .align 4
        .type   term_orig, @object
        .size   term_orig, 60
 term_orig:
        .zero   60
        .section        .bss.term_vi,"aw",@nobits
-       .align 32
+       .align 4
        .type   term_vi, @object
        .size   term_vi, 60

gcc doesn't seem to have options for altering this behaviour.
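
To see what your own toolchain does, a small probe along the following
lines may help. This is only a sketch: struct state60 is a made-up
60-byte stand-in for struct termios, and observed_alignment(),
plain_alignment() and capped_alignment() are hypothetical helpers, not
busybox code. The addresses chosen depend on compiler version, target
and linker, and an object can land on a 32-byte boundary by chance, so
treat the numbers as a hint and confirm with the asm output.

```c
#include <stdint.h>

/* Hypothetical 60-byte structure standing in for struct termios. */
struct state60 { int field[15]; };

/* No attribute: gcc is free to raise the alignment of this object. */
static struct state60 plain_state;
/* Explicit attribute, as in the vi.c example above. */
static struct state60 capped_state __attribute__ ((aligned (4)));

/* Largest power of two (capped at 64) that divides the address. */
unsigned observed_alignment(const void *p)
{
	uintptr_t addr = (uintptr_t)p;
	unsigned a = 1;

	while (a < 64 && (addr % (a * 2)) == 0)
		a *= 2;
	return a;
}

unsigned plain_alignment(void)
{
	return observed_alignment(&plain_state);
}

unsigned capped_alignment(void)
{
	return observed_alignment(&capped_state);
}
```

Printing plain_alignment() and capped_alignment() from a test program
built with the same CFLAGS as busybox gives a quick first impression;
"gcc -S" on the file shows the actual .align directives.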

gcc 3.4.3 and 4.1.1 tested:
char c = 1;
// gcc aligns to 32 bytes if sizeof(struct) >= 32
struct {
        int a,b,c,d;
        int i1,i2,i3;
} s28 = { 1 };    // struct will be aligned to 4 bytes
struct {
        int a,b,c,d;
        int i1,i2,i3,i4;
} s32 = { 1 };    // struct will be aligned to 32 bytes
// same for arrays
char vc31[31] = { 1 }; // unaligned
char vc32[32] = { 1 }; // aligned to 32 bytes

-fpack-struct=1 reduces alignment of s28 to 1 (but probably
will break layout of many libc structs) but s32 and vc32
are still aligned to 32 bytes.

I will try to cook up a patch to add a gcc option for disabling it.
Meanwhile, this is where it can be disabled in gcc source:

gcc/config/i386/i386.c
int
ix86_data_alignment (tree type, int align)
{
#if 0
  if (AGGREGATE_TYPE_P (type)
       && TYPE_SIZE (type)
       && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
       && (TREE_INT_CST_LOW (TYPE_SIZE (type)) >= 256
           || TREE_INT_CST_HIGH (TYPE_SIZE (type))) && align < 256)
    return 256;
#endif

(Sizes and alignments in this function are in bits: 256 bits is 32 bytes.)

Result (non-static busybox built against glibc):

# size /usr/srcdevel/bbox/fix/busybox.t0/busybox busybox
   text    data     bss     dec     hex filename
 634416    2736   23856  661008   a1610 busybox
 632580    2672   22944  658196   a0b14 busybox_noalign



        Keeping code small

Set CONFIG_EXTRA_CFLAGS="-fno-inline-functions-called-once",
run "make bloatcheck", and see which are the biggest auto-inlined functions.
Now, set CONFIG_EXTRA_CFLAGS back to "", but add NOINLINE
to some of these functions.
In 1.16.x timeframe, the results were
(annotated "make bloatcheck" output):

function                   old    new   delta
expand_vars_to_list          -   1712   +1712 win
lzo1x_optimize               -   1429   +1429 win
arith_apply                  -   1326   +1326 win
read_interfaces              -   1163   +1163 loss, leave w/o NOINLINE
logdir_open                  -   1148   +1148 win
check_deps                   -   1148   +1148 loss
rewrite                      -   1039   +1039 win
run_pipe                   358   1396   +1038 win
write_status_file            -   1029   +1029 almost the same, leave w/o NOINLINE
dump_identity                -    987    +987 win
mainQSort3                   -    921    +921 win
parse_one_line               -    916    +916 loss
summarize                    -    897    +897 almost the same
do_shm                       -    884    +884 win
cpio_o                       -    863    +863 win
subCommand                   -    841    +841 loss
receive                      -    834    +834 loss

855 bytes saved in total.

scripts/mkdiff_obj_bloat may be useful to automate this process: run
"scripts/mkdiff_obj_bloat NORMALLY_BUILT_TREE FORCED_NOINLINE_TREE"
and select modules which shrank.
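
For illustration, marking a called-once helper might look roughly like
this. It is a sketch under assumptions: the fallback NOINLINE definition
assumes the macro expands to gcc's noinline attribute, and
count_fields() and parse_one_record() are made-up names, not busybox
functions. Whether a given function is a "win" still has to be checked
with "make bloatcheck", as the table above shows.

```c
#include <stddef.h>

/* Fallback for building outside the busybox tree; assumed to match
 * what the busybox headers provide. */
#ifndef NOINLINE
# define NOINLINE __attribute__((__noinline__))
#endif

/* Hypothetical helper with a single caller. Without NOINLINE, gcc
 * may fold its body into that caller. */
static NOINLINE size_t count_fields(const char *line, char sep)
{
	size_t n = 1;

	if (*line == '\0')
		return 0;
	for (; *line != '\0'; line++)
		if (*line == sep)
			n++;
	return n;
}

/* Hypothetical only caller of count_fields(). */
size_t parse_one_record(const char *line)
{
	return count_fields(line, ':');
}
```

Building once with and once without the attribute and comparing the
two trees shows whether the change is worth keeping for that function.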