# Benchmarks

The results of these benchmarks suggest that building this `bc` with
optimization at `-O3` with link-time optimization (`-flto`) will result in the
best performance. However, using `-march=native` can result in **WORSE**
performance.

*Note*: all benchmarks were run four times, and the fastest run is the one
shown. `[bc]` means whichever `bc` was being run, and the assumed working
directory is the root directory of this repository. This `bc` was at
version `3.0.0` while GNU `bc` was at version `1.07.1`, and all tests were
conducted on an `x86_64` machine running Gentoo Linux with `clang` `9.0.1` as
the compiler.
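As a rough illustration of the "fastest of four runs" procedure, the sketch below picks the smallest `real` value out of several `time -p` reports. The helper names `min_real` and `best_of_four` are hypothetical and not part of this repository, and the exact way the original runs were repeated is an assumption.

```shell
# min_real (hypothetical helper): read one or more `time -p` reports on stdin
# and print the smallest "real" value found.
min_real() {
	awk '$1 == "real" && (best == "" || $2 + 0 < best + 0) { best = $2 }
	     END { if (best != "") print best }'
}

# best_of_four (hypothetical helper): run the given command four times under
# `time -p`, discard the command's stdout, and print the fastest "real" time.
# Assumes `time -p` is available as a shell keyword or an external utility.
best_of_four() {
	for run in 1 2 3 4; do
		{ time -p "$@" > /dev/null; } 2>&1
	done | min_real
}
```

A call would look like `best_of_four [bc] ../timeconst.bc`, with `[bc]` replaced by the path to the binary under test. Commands that read their input from a pipe are not a good fit for this helper, because the first run would consume the input.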

## Typical Optimization Level

These benchmarks were run with both `bc`s compiled with the typical `-O2`
optimizations and no link-time optimization.

### Addition

The command used was:

```
tests/script.sh bc add.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.54
user 1.21
sys 1.32
```

For this `bc`:

```
real 0.88
user 0.85
sys 0.02
```

### Subtraction

The command used was:

```
tests/script.sh bc subtract.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.51
user 1.05
sys 1.45
```

For this `bc`:

```
real 0.91
user 0.85
sys 0.05
```

### Multiplication

The command used was:

```
tests/script.sh bc multiply.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 7.15
user 4.69
sys 2.46
```

For this `bc`:

```
real 2.20
user 2.10
sys 0.09
```

### Division

The command used was:

```
tests/script.sh bc divide.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 3.36
user 1.87
sys 1.48
```

For this `bc`:

```
real 1.61
user 1.57
sys 0.03
```

### Power

The command used was:

```
printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
```

For GNU `bc`:

```
real 11.30
user 11.30
sys 0.00
```

For this `bc`:

```
real 0.73
user 0.72
sys 0.00
```

### Scripts

[This file][1] was downloaded, saved at `../timeconst.bc` and the following
patch was applied:

```
--- ../timeconst.bc	2018-09-28 11:32:22.808669000 -0600
+++ ../timeconst.bc	2019-06-07 07:26:36.359913078 -0600
@@ -110,8 +110,10 @@
 
 		print "#endif /* KERNEL_TIMECONST_H */\n"
 	}
-	halt
 }
 
-hz = read();
-timeconst(hz)
+for (i = 0; i <= 50000; ++i) {
+	timeconst(i)
+}
+
+halt
```

The command used was:

```
time -p [bc] ../timeconst.bc > /dev/null
```

For GNU `bc`:

```
real 16.71
user 16.06
sys 0.65
```

For this `bc`:

```
real 13.16
user 13.15
sys 0.00
```

Because this `bc` is faster when doing math, it might be a better comparison to
run a script that does essentially no math. As such, I put the following into
`../test.bc`:

```
for (i = 0; i < 100000000; ++i) {
	y = i
}

i
y

halt
```

The command used was:

```
time -p [bc] ../test.bc > /dev/null
```

For GNU `bc`:

```
real 16.60
user 16.59
sys 0.00
```

For this `bc`:

```
real 22.76
user 22.75
sys 0.00
```

I also put the following into `../test2.bc`:

```
i = 0

while (i < 100000000) {
	i += 1
}

i

halt
```

The command used was:

```
time -p [bc] ../test2.bc > /dev/null
```

For GNU `bc`:

```
real 17.32
user 17.30
sys 0.00
```

For this `bc`:

```
real 16.98
user 16.96
sys 0.01
```

It seems that the improvements to the interpreter helped a lot in certain cases.

Also, I have no idea why GNU `bc` did worse on the `while` loop script when it
is technically doing less work.
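To make the `-O2` timings easier to compare, they can be turned into speedup ratios. The sketch below divides GNU `bc`'s `real` time by this `bc`'s; the helper name `speedup` is hypothetical, and using `real` time rather than `user` time is an assumption.

```shell
# speedup (hypothetical helper): print baseline_time / candidate_time with two
# decimals. A value above 1 means the candidate was faster than the baseline.
speedup() {
	awk -v base="$1" -v cand="$2" 'BEGIN {
		if (cand + 0 <= 0) exit 1   # refuse to divide by zero
		printf "%.2f\n", base / cand
	}'
}

speedup 2.54 0.88    # addition: prints 2.89
speedup 16.60 22.76  # `for` loop script: prints 0.73 (this bc was slower)
```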

## Recommended Optimizations from `2.7.0`

Note that the benchmarks above did not use the optimizations I recommended for
version `2.7.0`, which are `-O3 -flto -march=native`.

This `bc` separates its code into modules, and optimizing them at link time
removes a lot of the inefficiency that comes from function call overhead. This
is most keenly felt with one function: `bc_vec_item()`, which should turn into
just one instruction (on `x86_64`) when optimized at link time and inlined.
There are other functions that matter as well.

I also recommended `-march=native` on the grounds that newer instructions would
increase performance on math-heavy code. We will see if that assumption was
correct. (Spoiler: **NO**.)

When compiling both `bc`s with the optimizations I recommended for this `bc`
for version `2.7.0`, the results are as follows.

### Addition

The command used was:

```
tests/script.sh bc add.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.44
user 1.11
sys 1.32
```

For this `bc`:

```
real 0.59
user 0.54
sys 0.05
```

### Subtraction

The command used was:

```
tests/script.sh bc subtract.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.42
user 1.02
sys 1.40
```

For this `bc`:

```
real 0.64
user 0.57
sys 0.06
```

### Multiplication

The command used was:

```
tests/script.sh bc multiply.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 7.01
user 4.50
sys 2.50
```

For this `bc`:

```
real 1.59
user 1.53
sys 0.05
```

### Division

The command used was:

```
tests/script.sh bc divide.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 3.26
user 1.82
sys 1.44
```

For this `bc`:

```
real 1.24
user 1.20
sys 0.03
```

### Power

The command used was:

```
printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
```

For GNU `bc`:

```
real 11.08
user 11.07
sys 0.00
```

For this `bc`:

```
real 0.71
user 0.70
sys 0.00
```

### Scripts

The command for the `../timeconst.bc` script was:

```
time -p [bc] ../timeconst.bc > /dev/null
```

For GNU `bc`:

```
real 15.62
user 15.08
sys 0.53
```

For this `bc`:

```
real 10.09
user 10.08
sys 0.01
```

The command for the next script, the `for` loop script, was:

```
time -p [bc] ../test.bc > /dev/null
```

For GNU `bc`:

```
real 14.76
user 14.75
sys 0.00
```

For this `bc`:

```
real 17.95
user 17.94
sys 0.00
```

The command for the next script, the `while` loop script, was:

```
time -p [bc] ../test2.bc > /dev/null
```

For GNU `bc`:

```
real 14.84
user 14.83
sys 0.00
```

For this `bc`:

```
real 13.53
user 13.52
sys 0.00
```

## Link-Time Optimization Only

Just for kicks, let's see if `-march=native` is even useful.

The optimizations I used for both `bc`s were `-O3 -flto`.
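For reference, the three build configurations compared in this document differ only in their optimization flags. The sketch below collects them in one place; the helper name `flags_for` and the configuration labels are hypothetical, and how the flags reach the compiler (for example through `CFLAGS`) is left out because it depends on the build system.

```shell
# flags_for (hypothetical helper): print the optimization flags this document
# used for each of its three build configurations.
flags_for() {
	case "$1" in
		typical)  printf '%s\n' '-O2' ;;
		native)   printf '%s\n' '-O3 -flto -march=native' ;;
		lto-only) printf '%s\n' '-O3 -flto' ;;
		*)        printf 'unknown configuration: %s\n' "$1" >&2; return 1 ;;
	esac
}

# List all three configurations side by side.
for config in typical native lto-only; do
	printf '%-8s %s\n' "$config" "$(flags_for "$config")"
done
```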

### Addition

The command used was:

```
tests/script.sh bc add.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.41
user 1.05
sys 1.35
```

For this `bc`:

```
real 0.58
user 0.52
sys 0.05
```

### Subtraction

The command used was:

```
tests/script.sh bc subtract.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.39
user 1.10
sys 1.28
```

For this `bc`:

```
real 0.65
user 0.57
sys 0.07
```

### Multiplication

The command used was:

```
tests/script.sh bc multiply.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 6.82
user 4.30
sys 2.51
```

For this `bc`:

```
real 1.57
user 1.49
sys 0.08
```

### Division

The command used was:

```
tests/script.sh bc divide.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 3.25
user 1.81
sys 1.43
```

For this `bc`:

```
real 1.27
user 1.23
sys 0.04
```

### Power

The command used was:

```
printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
```

For GNU `bc`:

```
real 10.50
user 10.49
sys 0.00
```

For this `bc`:

```
real 0.72
user 0.71
sys 0.00
```

### Scripts

The command for the `../timeconst.bc` script was:

```
time -p [bc] ../timeconst.bc > /dev/null
```

For GNU `bc`:

```
real 15.50
user 14.81
sys 0.68
```

For this `bc`:

```
real 10.17
user 10.15
sys 0.01
```

The command for the next script, the `for` loop script, was:

```
time -p [bc] ../test.bc > /dev/null
```

For GNU `bc`:

```
real 14.99
user 14.99
sys 0.00
```

For this `bc`:

```
real 16.85
user 16.84
sys 0.00
```

The command for the next script, the `while` loop script, was:

```
time -p [bc] ../test2.bc > /dev/null
```

For GNU `bc`:

```
real 14.92
user 14.91
sys 0.00
```

For this `bc`:

```
real 12.75
user 12.75
sys 0.00
```
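
It turns out that `-march=native` can be a problem. It made little difference
on the math benchmarks, and this `bc` ran both loop scripts slower with it
(17.95 vs. 16.85 seconds for the `for` loop and 13.53 vs. 12.75 seconds for the
`while` loop). As such, I have removed the recommendation to build with
`-march=native`.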
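To put a number on the `-march=native` effect, this `bc`'s `real` times with `-O3 -flto` can be compared against its times with `-O3 -flto -march=native`. The sketch below computes the percent change; the helper name `pct_change` is hypothetical, and the choice of `real` time is an assumption.

```shell
# pct_change (hypothetical helper): percent by which the `-march=native` time
# differs from the LTO-only time. Positive means `-march=native` was slower.
pct_change() {
	awk -v lto="$1" -v native="$2" 'BEGIN {
		printf "%+.1f\n", (native - lto) / lto * 100
	}'
}

pct_change 16.85 17.95  # `for` loop script: prints +6.5
pct_change 12.75 13.53  # `while` loop script: prints +6.1
pct_change 10.17 10.09  # timeconst.bc: prints -0.8
```

By this measure the loop scripts regress by about six percent, while the `timeconst.bc` difference is under one percent.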

## Recommended Compiler

When I ran these benchmarks with my `bc` compiled under `clang` vs. `gcc`, it
performed much better under `clang`. I recommend compiling this `bc` with
`clang`.

[1]: https://github.com/torvalds/linux/blob/master/kernel/time/timeconst.bc