Cross Reference: /linux-master/crypto/xor.c

History log of /linux-master/crypto/xor.c
Revision	Date	Author	Comments
# cfb28fde	03-Feb-2021	Bhaskar Chowdhury <unixbhaskar@gmail.com>	crypto: xor - Fix typo of optimization s/optimzation/optimization/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
# 3c02e04f	30-Dec-2020	Kirill Tkhai <ktkhai@virtuozzo.com>	crypto: xor - Fix divide error in do_xor_speed() crypto: Fix divide error in do_xor_speed() From: Kirill Tkhai <ktkhai@virtuozzo.com> Latest (but not only latest) linux-next panics with divide error on my QEMU setup. The patch at the bottom of this message fixes the problem. xor: measuring software checksum speed divide error: 0000 [#1] PREEMPT SMP KASAN PREEMPT SMP KASAN CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.10.0-next-20201223+ #2177 RIP: 0010:do_xor_speed+0xbb/0xf3 Code: 41 ff cc 75 b5 bf 01 00 00 00 e8 3d 23 8b fe 65 8b 05 f6 49 83 7d 85 c0 75 05 e8 84 70 81 fe b8 00 00 50 c3 31 d2 48 8d 7b 10 <f7> f5 41 89 c4 e8 58 07 a2 fe 44 89 63 10 48 8d 7b 08 e8 cb 07 a2 RSP: 0000:ffff888100137dc8 EFLAGS: 00010246 RAX: 00000000c3500000 RBX: ffffffff823f0160 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000808 RDI: ffffffff823f0170 RBP: 0000000000000000 R08: ffffffff8109c50f R09: ffffffff824bb6f7 R10: fffffbfff04976de R11: 0000000000000001 R12: 0000000000000000 R13: ffff888101997000 R14: ffff888101994000 R15: ffffffff823f0178 FS: 0000000000000000(0000) GS:ffff8881f7780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000000220e000 CR4: 00000000000006a0 Call Trace: calibrate_xor_blocks+0x13c/0x1c4 ? do_xor_speed+0xf3/0xf3 do_one_initcall+0xc1/0x1b7 ? start_kernel+0x373/0x373 ? unpoison_range+0x3a/0x60 kernel_init_freeable+0x1dd/0x238 ? rest_init+0xc6/0xc6 kernel_init+0x8/0x10a ret_from_fork+0x1f/0x30 ---[ end trace 5bd3c1d0b77772da ]--- Fixes: c055e3eae0f1 ("crypto: xor - use ktime for template benchmarking") Cc: <stable@vger.kernel.org> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
# 10b0f78a	06-Oct-2020	Nathan Chancellor <nathan@kernel.org>	crypto: xor - Remove unused variable count in do_xor_speed Clang warns: crypto/xor.c:101:4: warning: variable 'count' is uninitialized when used here [-Wuninitialized] count++; ^~~~~ crypto/xor.c:86:17: note: initialize the variable 'count' to silence this warning int i, j, count; ^ = 0 1 warning generated. After the refactoring to use ktime that happened in this function, count is only assigned, never read. Just remove the variable to get rid of the warning. Fixes: c055e3eae0f1 ("crypto: xor - use ktime for template benchmarking") Link: https://github.com/ClangBuiltLinux/linux/issues/1171 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
# c055e3ea	25-Sep-2020	Ard Biesheuvel <ardb@kernel.org>	crypto: xor - use ktime for template benchmarking Currently, we use the jiffies counter as a time source, by staring at it until a HZ period elapses, and then staring at it again and perform as many XOR operations as we can at the same time until another HZ period elapses, so that we can calculate the throughput. This takes longer than necessary, and depends on HZ, which is undesirable, since HZ is system dependent. Let's use the ktime interface instead, and use it to time a fixed number of XOR operations, which can be done much faster, and makes the time spent depend on the performance level of the system itself, which is much more reasonable. To ensure that we have the resolution we need even on systems with 32 kHz time sources, while not spending too much time in the benchmark on a slow CPU, let's switch to 3 attempts of 800 repetitions each: that way, we will only misidentify algorithms that perform within 10% of each other as the fastest if they are faster than 10 GB/s to begin with, which is not expected to occur on systems with such coarse clocks. On ThunderX2, I get the following results: Before: [72625.956765] xor: measuring software checksum speed [72625.993104] 8regs : 10169.000 MB/sec [72626.033099] 32regs : 12050.000 MB/sec [72626.073095] arm64_neon: 11100.000 MB/sec [72626.073097] xor: using function: 32regs (12050.000 MB/sec) After: [72599.650216] xor: measuring software checksum speed [72599.651188] 8regs : 10491 MB/sec [72599.652006] 32regs : 12345 MB/sec [72599.652871] arm64_neon : 11402 MB/sec [72599.652873] xor: using function: 32regs (12345 MB/sec) Link: https://lore.kernel.org/linux-crypto/20200923182230.22715-3-ardb@kernel.org/ Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
# 524ccdbd	25-Sep-2020	Ard Biesheuvel <ardb@kernel.org>	crypto: xor - defer load time benchmark to a later time Currently, the XOR module performs its boot time benchmark at core initcall time when it is built-in, to ensure that the RAID code can make use of it when it is built-in as well. Let's defer this to a later stage during the boot, to avoid impacting the overall boot time of the system. Instead, just pick an arbitrary implementation from the list, and use that as the preliminary default. Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
# af1a8899	20-May-2019	Thomas Gleixner <tglx@linutronix.de>	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 47 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 or at your option any later version you should have received a copy of the gnu general public license for example usr src linux copying if not write to the free software foundation inc 675 mass ave cambridge ma 02139 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 20 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520170858.552543146@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
# 75f296d9	15-Nov-2017	Levin, Alexander (Sasha Levin) <alexander.levin@verizon.com>	kmemcheck: stop using GFP_NOTRACK and SLAB_NOTRACK Convert all allocations that used a NOTRACK flag to stop using it. Link: http://lkml.kernel.org/r/20171007030159.22241-3-alexander.levin@verizon.com Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Cc: Alexander Potapenko <glider@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tim Hansen <devtimhansen@gmail.com> Cc: Vegard Nossum <vegardno@ifi.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 27c4d548	26-Aug-2016	Herbert Xu <herbert@gondor.apana.org.au>	crypto: xor - Fix warning when XOR_SELECT_TEMPLATE is unset This patch fixes an unused label warning triggered when the macro XOR_SELECT_TEMPLATE is not set. Fixes: 39457acda913 ("crypto: xor - skip speed test if the xor...") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
# 39457acd	19-Aug-2016	Martin Schwidefsky <schwidefsky@de.ibm.com>	crypto: xor - skip speed test if the xor function is selected automatically If the architecture selected the xor function with XOR_SELECT_TEMPLATE the speed result of the do_xor_speed benchmark is of limited value. The speed measurement increases the bootup time a little, which can makes a difference for kernels used in container like virtual machines. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
# af7cf25d	10-Oct-2012	Jan Beulich <JBeulich@suse.com>	add further __init annotations to crypto/xor.c Allow particularly do_xor_speed() to be discarded post-init. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
# 56a51991	21-May-2012	Jim Kukunas <james.t.kukunas@linux.intel.com>	crypto: disable preemption while benchmarking RAID5 xor checksumming With CONFIG_PREEMPT=y, we need to disable preemption while benchmarking RAID5 xor checksumming to ensure we're actually measuring what we think we're measuring. Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
# 6a328475	21-May-2012	Jim Kukunas <james.t.kukunas@linux.intel.com>	crypto: wait for a full jiffy in do_xor_speed In the existing do_xor_speed(), there is no guarantee that we actually run do_2() for a full jiffy. We get the current jiffy, then run do_2() until the next jiffy. Instead, let's get the current jiffy, then wait until the next jiffy to start our test. Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
# d788fec8	04-Apr-2012	Borislav Petkov <borislav.petkov@amd.com>	crypto, xor: Sanitize checksumming function selection output Currently, it says [ 1.015541] xor: automatically using best checksumming function: generic_sse [ 1.040769] generic_sse: 6679.000 MB/sec [ 1.045377] xor: using function: generic_sse (6679.000 MB/sec) and repeats the function name three times unnecessarily. Change it into [ 1.015115] xor: automatically using best checksumming function: [ 1.040794] generic_sse: 6680.000 MB/sec and save us a line in dmesg. No functional change. Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
# 5a0e3ad6	24-Mar-2010	Tejun Heo <tj@kernel.org>	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
# 33f65df7	26-Feb-2009	Vegard Nossum <vegard.nossum@gmail.com>	crypto: don't track xor test pages with kmemcheck The xor tests are run on uninitialized data, because it is doesn't really matter what the underlying data is. Annotate this false- positive warning. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
# bff61975	30-Mar-2009	NeilBrown <neilb@suse.de>	md: move lots of #include lines out of .h files and into .c This makes the includes more explicit, and is preparation for moving md_k.h to drivers/md/md.h Remove include/raid/md.h as its only remaining use was to #include other files. Signed-off-by: NeilBrown <neilb@suse.de>
# 9bc89cd8	02-Jan-2007	Dan Williams <dan.j.williams@intel.com>	async_tx: add the async_tx api The async_tx api provides methods for describing a chain of asynchronous bulk memory transfers/transforms with support for inter-transactional dependencies. It is implemented as a dmaengine client that smooths over the details of different hardware offload engine implementations. Code that is written to the api can optimize for asynchronous operation and the api will fit the chain of operations to the available offload resources. I imagine that any piece of ADMA hardware would register with the 'async_' subsystem, and a call to async_X would be routed as appropriate, or be run in-line. - Neil Brown async_tx exploits the capabilities of struct dma_async_tx_descriptor to provide an api of the following general format: struct dma_async_tx_descriptor async_<operation>(..., struct dma_async_tx_descriptor depend_tx, dma_async_tx_callback cb_fn, void cb_param) { struct dma_chan chan = async_tx_find_channel(depend_tx, <operation>); struct dma_device device = chan ? chan->device : NULL; int int_en = cb_fn ? 1 : 0; struct dma_async_tx_descriptor tx = device ? device->device_prep_dma_<operation>(chan, len, int_en) : NULL; if (tx) { / run <operation> asynchronously / ... tx->tx_set_dest(addr, tx, index); ... tx->tx_set_src(addr, tx, index); ... async_tx_submit(chan, tx, flags, depend_tx, cb_fn, cb_param); } else { / run <operation> synchronously / ... <operation> ... async_tx_sync_epilog(flags, depend_tx, cb_fn, cb_param); } return tx; } async_tx_find_channel() returns a capable channel from its pool. The channel pool is organized as a per-cpu array of channel pointers. The async_tx_rebalance() routine is tasked with managing these arrays. In the uniprocessor case async_tx_rebalance() tries to spread responsibility evenly over channels of similar capabilities. For example if there are two copy+xor channels, one will handle copy operations and the other will handle xor. In the SMP case async_tx_rebalance() attempts to spread the operations evenly over the cpus, e.g. cpu0 gets copy channel0 and xor channel0 while cpu1 gets copy channel 1 and xor channel 1. When a dependency is specified async_tx_find_channel defaults to keeping the operation on the same channel. A xor->copy->xor chain will stay on one channel if it supports both operation types, otherwise the transaction will transition between a copy and a xor resource. Currently the raid5 implementation in the MD raid456 driver has been converted to the async_tx api. A driver for the offload engines on the Intel Xscale series of I/O processors, iop-adma, is provided in a later commit. With the iop-adma driver and async_tx, raid456 is able to offload copy, xor, and xor-zero-sum operations to hardware engines. On iop342 tiobench showed higher throughput for sequential writes (20 - 30% improvement) and sequential reads to a degraded array (40 - 55% improvement). For the other cases performance was roughly equal, +/- a few percentage points. On a x86-smp platform the performance of the async_tx implementation (in synchronous mode) was also +/- a few percentage points of the original implementation. According to 'top' on iop342 CPU utilization drops from ~50% to ~15% during a 'resync' while the speed according to /proc/mdstat doubles from ~25 MB/s to ~50 MB/s. The tiobench command line used for testing was: tiobench --size 2048 --block 4096 --block 131072 --dir /mnt/raid --numruns 5 iop342 had 1GB of memory available Details: * if CONFIG_DMA_ENGINE=n the asynchronous path is compiled away by making async_tx_find_channel a static inline routine that always returns NULL * when a callback is specified for a given transaction an interrupt will fire at operation completion time and the callback will occur in a tasklet. if the the channel does not support interrupts then a live polling wait will be performed * the api is written as a dmaengine client that requests all available channels * In support of dependencies the api implicitly schedules channel-switch interrupts. The interrupt triggers the cleanup tasklet which causes pending operations to be scheduled on the next channel * Xor engines treat an xor destination address differently than a software xor routine. To the software routine the destination address is an implied source, whereas engines treat it as a write-only destination. This patch modifies the xor_blocks routine to take a an explicit destination address to mirror the hardware. Changelog: * fixed a leftover debug print * don't allow callbacks in async_interrupt_cond * fixed xor_block changes * fixed usage of ASYNC_TX_XOR_DROP_DEST * drop dma mapping methods, suggested by Chris Leech * printk warning fixups from Andrew Morton * don't use inline in C files, Adrian Bunk * select the API when MD is enabled * BUG_ON xor source counts <= 1 * implicitly handle hardware concerns like channel switching and interrupts, Neil Brown * remove the per operation type list, and distribute operation capabilities evenly amongst the available channels * simplify async_tx_find_channel to optimize the fast path * introduce the channel_table_initialized flag to prevent early calls to the api * reorganize the code to mimic crypto * include mm.h as not all archs include it in dma-mapping.h * make the Kconfig options non-user visible, Adrian Bunk * move async_tx under crypto since it is meant as 'core' functionality, and the two may share algorithms in the future * move large inline functions into c files * checkpatch.pl fixes * gpl v2 only correction Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-By: NeilBrown <neilb@suse.de>
# 685784aa	09-Jul-2007	Dan Williams <dan.j.williams@intel.com>	xor: make 'xor_blocks' a library routine for use with async_tx The async_tx api tries to use a dma engine for an operation, but will fall back to an optimized software routine otherwise. Xor support is implemented using the raid5 xor routines. For organizational purposes this routine is moved to a common area. The following fixes are also made: * rename xor_block => xor_blocks, suggested by Adrian Bunk * ensure that xor.o initializes before md.o in the built-in case * checkpatch.pl fixes * mark calibrate_xor_blocks __init, Adrian Bunk Cc: Adrian Bunk <bunk@stusta.de> Cc: NeilBrown <neilb@suse.de> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Dan Williams <dan.j.williams@intel.com>