History log of /linux-master/drivers/md/bcache/bset.c
Revision Date Author Comments
# 11e529cc 19-Sep-2022 Jules Maselbas <jmaselbas@kalray.eu>

bcache: bset: Fix comment typos

Remove the redundant word `by`, correct the typo `creaated`.

CC: Kent Overstreet <kent.overstreet@gmail.com>
CC: linux-bcache@vger.kernel.org
Signed-off-by: Jules Maselbas <jmaselbas@kalray.eu>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20220919161647.81238-4-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6751c1e3 09-Feb-2021 Joe Perches <joe@perches.com>

bcache: Avoid comma separated statements

Use semicolons and braces.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 5fe48867 25-Jul-2020 Coly Li <colyli@suse.de>

bcache: allocate meta data pages as compound pages

There are some meta data of bcache are allocated by multiple pages,
and they are used as bio bv_page for I/Os to the cache device. for
example cache_set->uuids, cache->disk_buckets, journal_write->data,
bset_tree->data.

For such meta data memory, all the allocated pages should be treated
as a single memory block. Then the memory management and underlying I/O
code can treat them more clearly.

This patch adds __GFP_COMP flag to all the location allocating >0 order
pages for the above mentioned meta data. Then their pages are treated
as compound pages now.

Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 46f5aa88 26-May-2020 Joe Perches <joe@perches.com>

bcache: Convert pr_<level> uses to a more typical style

Remove the trailing newline from the define of pr_fmt and add newlines
to the uses.

Miscellanea:

o Convert bch_bkey_dump from multiple uses of pr_err to pr_cont
as the earlier conversion was inappropriate done causing multiple
lines to be emitted where only a single output line was desired
o Use vsprintf extension %pV in bch_cache_set_error to avoid multiple
line output where only a single line output was desired
o Coalesce formats

Fixes: 6ae63e3501c4 ("bcache: replace printk() by pr_*() routines")

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 7a0bc2a8 23-Jan-2020 Coly Li <colyli@suse.de>

bcache: add code comments for state->pool in __btree_sort()

To explain the pages allocated from mempool state->pool can be
swapped in __btree_sort(), because state->pool is a page pool,
which allocates pages by alloc_pages() indeed.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 15fbb231 13-Nov-2019 Christoph Hellwig <hch@lst.de>

bcache: don't export symbols

None of the exported bcache symbols are actually used anywhere.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 06c1526d 13-Nov-2019 Coly Li <colyli@suse.de>

bcache: add code comment bch_keylist_pop() and bch_keylist_pop_front()

This patch adds simple code comments for bch_keylist_pop() and
bch_keylist_pop_front() in bset.c, to make the code more easier to
be understand.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 944a4f34 28-Jun-2019 Coly Li <colyli@suse.de>

bcache: make bset_search_tree() be more understandable

The purpose of following code in bset_search_tree() is to avoid a branch
instruction,
994 if (likely(f->exponent != 127))
995 n = j * 2 + (((unsigned int)
996 (f->mantissa -
997 bfloat_mantissa(search, f))) >> 31);
998 else
999 n = (bkey_cmp(tree_to_bkey(t, j), search) > 0)
1000 ? j * 2
1001 : j * 2 + 1;

This piece of code is not very clear to understand, even when I tried to
add code comment for it, I made mistake. This patch removes the implict
bit operation and uses explicit branch to calculate next location in
binary tree search.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# bd9026c8 28-Jun-2019 Coly Li <colyli@suse.de>

bcache: remove unncessary code in bch_btree_keys_init()

Function bch_btree_keys_init() initializes b->set[].size and
b->set[].data to zero. As the code comments indicates, these code indeed
is unncessary, because both struct btree_keys and struct bset_tree are
nested embedded into struct btree, when struct btree is filled with 0
bits by kzalloc() in mca_bucket_alloc(), b->set[].size and
b->set[].data are initialized to 0 (a.k.a NULL) already.

This patch removes the redundant code, and add comments in
bch_btree_keys_init() and mca_bucket_alloc() to explain why it's safe.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f960facb 28-Jun-2019 Coly Li <colyli@suse.de>

bcache: remove unnecessary prefetch() in bset_search_tree()

In function bset_search_tree(), when p >= t->size, t->tree[0] will be
prefetched by the following code piece,
974 unsigned int p = n << 4;
975
976 p &= ((int) (p - t->size)) >> 31;
977
978 prefetch(&t->tree[p]);

The purpose of the above code is to avoid a branch instruction, but
when p >= t->size, prefetch(&t->tree[0]) has no positive performance
contribution at all. This patch avoids the unncessary prefetch by only
calling prefetch() when p < t->size.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 31b90956 09-Jun-2019 Coly Li <colyli@suse.de>

bcache: fix stack corruption by PRECEDING_KEY()

Recently people report bcache code compiled with gcc9 is broken, one of
the buggy behavior I observe is that two adjacent 4KB I/Os should merge
into one but they don't. Finally it turns out to be a stack corruption
caused by macro PRECEDING_KEY().

See how PRECEDING_KEY() is defined in bset.h,
437 #define PRECEDING_KEY(_k) \
438 ({ \
439 struct bkey *_ret = NULL; \
440 \
441 if (KEY_INODE(_k) || KEY_OFFSET(_k)) { \
442 _ret = &KEY(KEY_INODE(_k), KEY_OFFSET(_k), 0); \
443 \
444 if (!_ret->low) \
445 _ret->high--; \
446 _ret->low--; \
447 } \
448 \
449 _ret; \
450 })

At line 442, _ret points to address of a on-stack variable combined by
KEY(), the life range of this on-stack variable is in line 442-446,
once _ret is returned to bch_btree_insert_key(), the returned address
points to an invalid stack address and this address is overwritten in
the following called bch_btree_iter_init(). Then argument 'search' of
bch_btree_iter_init() points to some address inside stackframe of
bch_btree_iter_init(), exact address depends on how the compiler
allocates stack space. Now the stack is corrupted.

Fixes: 0eacac22034c ("bcache: PRECEDING_KEY()")
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Rolf Fokkens <rolf@rolffokkens.nl>
Reviewed-by: Pierre JUHEN <pierre.juhen@orange.fr>
Tested-by: Shenghui Wang <shhuiw@foxmail.com>
Tested-by: Pierre JUHEN <pierre.juhen@orange.fr>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Nix <nix@esperi.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 3be11dba 10-Aug-2018 Coly Li <colyli@suse.de>

bcache: fix code comments style

This patch fixes 3 style issues warned by checkpatch.pl,
- Comment lines are not aligned
- Comments use "/*" on subsequent lines
- Comment lines use a trailing "*/"

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6ae63e35 10-Aug-2018 Coly Li <colyli@suse.de>

bcache: replace printk() by pr_*() routines

There are still many places in bcache use printk to display kernel
message, which are suggested to be preplaced by pr_*() routines like
pr_err(), pr_info(), or pr_notice().

This patch replaces all printk() with a proper pr_*() routine for
bcache code.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b0d30981 10-Aug-2018 Coly Li <colyli@suse.de>

bcache: style fixes for lines over 80 characters

This patch fixes the lines over 80 characters into more lines, to minimize
warnings by checkpatch.pl. There are still some lines exceed 80 characters,
but it is better to be a single line and I don't change them.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 1fae7cf0 10-Aug-2018 Coly Li <colyli@suse.de>

bcache: style fix to add a blank line after declarations

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6f10f7d1 10-Aug-2018 Coly Li <colyli@suse.de>

bcache: style fix to replace 'unsigned' by 'unsigned int'

This patch fixes warning reported by checkpatch.pl by replacing 'unsigned'
with 'unsigned int'.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b467a6ac 09-Aug-2018 Coly Li <colyli@suse.de>

bcache: add code comments for bset.c

This patch tries to add code comments in bset.c, to make some
tricky code and designment to be more comprehensible. Most information
of this patch comes from the discussion between Kent and I, he
offers very informative details. If there is any mistake
of the idea behind the code, no doubt that's from me misrepresentation.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d19936a2 20-May-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcache: convert to bioset_init()/mempool_init()

Convert bcache to embedded bio sets.

Reviewed-by: Coly Li <colyli@suse.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 42361469 18-Mar-2018 Bart Van Assche <bvanassche@acm.org>

bcache: Suppress more warnings about set-but-not-used variables

This patch does not change any functionality.

Reviewed-by: Michael Lyle <mlyle@lyle.org>
Reviewed-by: Coly Li <colyli@suse.de>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# e6017571 01-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h>

We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which
will have to be picked up from other headers and .c files.

Create a trivial placeholder <linux/sched/clock.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 501d52a9 19-May-2014 Kent Overstreet <kmo@daterainc.com>

bcache: Allocate bounce buffers with GFP_NOWAIT

There's no point in blocking on these allocations, since our fallback paths will
probably go faster than blocking.

Change-Id: I733ca202c25cb36bde02607a0a60552229a4241c


# 85cbe1f8 17-Feb-2014 Kent Overstreet <kmo@daterainc.com>

bcache: Fix another compiler warning on m68k

Use a bigger hammer this time

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org>


# 3572324a 10-Jan-2014 Kent Overstreet <kmo@daterainc.com>

bcache: Minor fixes from kbuild robot

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# 9dd6358a 17-Dec-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Fix auxiliary search trees for key size > cacheline size

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# 3bdad1e4 11-Nov-2013 Nicholas Swenson <nks@daterainc.com>

bcache: Add bch_bkey_equal_header()

Checks if two keys have equivalent header fields.
(good enough for replacement or merging)

Used in bch_bkey_try_merge, and replacing a key
in the btree.

Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# 0f49cf3d 14-Oct-2013 Nicholas Swenson <nks@daterainc.com>

bcache: update bch_bkey_try_merge

Added generic header checks to bch_bkey_try_merge,
which then calls the bkey specific function

Removed extraneous checks from bch_extent_merge

Signed-off-by: Nicholas Swenson <nks@daterainc.com>


# 829a60b9 11-Nov-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Move insert_fixup() to btree_keys_ops

Now handling overlapping extents/keys is a method that's specific to what the
btree node contains.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# 89ebb4a2 11-Nov-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Convert sorting to btree_keys

More work to disentangle various code from struct btree

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# dc9d98d6 18-Dec-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Convert debug code to btree_keys

More work to disentangle various code from struct btree

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# c052dd9a 11-Nov-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Convert btree_iter to struct btree_keys

More work to disentangle bset.c from struct btree

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# f67342dd 11-Nov-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Refactor bset_tree sysfs stats

We're in the process of turning bset.c into library code, so none of the code in
that file should know about struct cache_set or struct btree - so, move the
btree traversal part of the stats code to sysfs.c.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# a85e968e 20-Dec-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Add struct btree_keys

Soon, bset.c won't need to depend on struct btree.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# 65d45231 20-Dec-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Abstract out stuff needed for sorting

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# ee811287 18-Dec-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Rename/shuffle various code around

More work to disentangle bset.c from the rest of the code:

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# 67539e85 10-Sep-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Add struct bset_sort_state

More disentangling bset.c from the rest of the bcache code - soon, the
sorting routines won't have any dependencies on any outside structs.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# 911c9610 28-Jul-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Split out sort_extent_cmp()

Only use extent comparison for comparing extents, so we're not using
START_KEY() on other key types (i.e. btree pointers)

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# fafff81c 17-Dec-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Bkey indexing renaming

More refactoring:

node() -> bset_bkey_idx()
end() -> bset_bkey_last()

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# 085d2a3d 11-Nov-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Make bch_keylist_realloc() take u64s, not nptrs

Getting away from KEY_PTRS and moving toward KEY_U64s - and getting rid of magic
2s

Also - split out the part that checks against journal entry size so as to avoid
a dependancy on struct cache_set in bset.c

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# 0a451145 18-Dec-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Use a mempool for mergesort temporary space

It was a single element mempool before, it's slightly cleaner to just use a real
mempool.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# 78b77bf8 17-Dec-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Btree verify code improvements

Used this fixed code to find and fix the bug fixed by
a4d885097b0ac0cd1337f171f2d4b83e946094d4.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# d56d000a 09-Aug-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Don't touch bucket gen for dirty ptrs

Unnecessary since a bucket that has dirty pointers pointing to it can
never be invalidated - and skipping it is a measurable performance
boost, since the bucket gen will usually be a cache miss.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# ef71ec00 17-Dec-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Data corruption fix

The code that handles overlapping extents that we've just read back in from disk
was depending on the behaviour of the code that handles overlapping extents as
we're inserting into a btree node in the case of an insert that forced an
existing extent to be split: on insert, if we had to split we'd also insert a
new extent to represent the top part of the old extent - and then that new
extent would get written out.

The code that read the extents back in thus not bother with splitting extents -
if it saw an extent that ovelapped in the middle of an older extent, it would
trim the old extent to only represent the bottom part, assuming that the
original insert would've inserted a new extent to represent the top part.

I still haven't figured out _how_ it can happen, but I'm now pretty convinced
(and testing has confirmed) that there's some kind of an obscure corner case
(probably involving extent merging, and multiple overwrites in different sets)
that breaks this. The fix is to change the mergesort fixup code to split extents
itself when required.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10


# 098fb254 21-Aug-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Delete some slower inline asm

Never saw a profile of bset_search_tree() where it wasn't bottlenecked
on memory until I got my new Haswell machine, but when I tried it there
it was suddenly burning 20% of the cpu in the inner loop on shrd...

Turns out, the version of shrd that takes 64 bit operands has a 9 cycle
latency. hah.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# 65d22e91 31-Jul-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Move spinlock into struct time_stats

Minor cleanup.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# 50310164 10-Sep-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Kill bch_next_recurse_key()

This dates from before the btree iterator, and now it's finally gone

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# d5cc66e9 25-Jul-2013 Kent Overstreet <kmo@daterainc.com>

bcache: bch_(btree|extent)_ptr_invalid()

Trying to treat btree pointers and leaf node pointers the same way was a
mistake - going to start being more explicit about the type of
key/pointer we're dealing with. This is the first part of that
refactoring; this patch shouldn't change any actual behaviour.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# 280481d0 24-Oct-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Debug code improvements

Couple changes:
* Consolidate bch_check_keys() and bch_check_key_order(), and move the
checks that only check_key_order() could do to bch_btree_iter_next().

* Get rid of CONFIG_BCACHE_EDEBUG - now, all that code is compiled in
when CONFIG_BCACHE_DEBUG is enabled, and there's now a sysfs file to
flip on the EDEBUG checks at runtime.

* Dropped an old not terribly useful check in rw_unlock(), and
refactored/improved a some of the other debug code.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# e58ff155 24-Jul-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Fix bch_ptr_bad()

Previously, bch_ptr_bad() could return false when there was a pointer to
a nonexistant device... it only filtered out keys with PTR_CHECK_DEV
pointers.

This behaviour was intended for multiple cache device support; for that,
just because the device for one of the pointers has gone away doesn't
mean we want to filter out the rest of the pointers.

But we don't yet explicitly filter/check individual pointers, so without
that this behaviour was wrong - a corrupt bkey with a bad device pointer
could cause us to deref a bad pointer. Doh.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# 81ab4190 31-Oct-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Pull on disk data structures out into a separate header

Now, the on disk data structures are in a header that can be exported to
userspace - and having them all centralized is nice too.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# b54d6934 24-Jul-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Kill op->cl

This isn't used for waiting asynchronously anymore - so this is a fairly
trivial refactoring.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# c18536a7 24-Jul-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Prune struct btree_op

Eventual goal is for struct btree_op to contain only what is necessary
for traversing the btree.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# 48dad8ba 10-Sep-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Add btree_map() functions

Lots of stuff has been open coding its own btree traversal - which is
generally pretty simple code, but there are a few subtleties.

This adds new new functions, bch_btree_map_nodes() and
bch_btree_map_keys(), which do the traversal for you. Everything that's
open coding btree traversal now (with the exception of garbage
collection) is slowly going to be converted to these two functions;
being able to write other code at a higher level of abstraction is a
big improvement w.r.t. overall code quality.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# c2f95ae2 24-Jul-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Clean up keylist code

More random refactoring.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# 26c949f8 10-Sep-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Add btree_insert_node()

The flow of control in the old btree insertion code was rather -
backwards; we'd recurse down the btree (in btree_insert_recurse()), and
then if we needed to split the keys to be inserted into the parent node
would be effectively returned up to btree_insert_recurse(), which would
notice there was more work to do and finish the insertion.

The main problem with this was that the full logic for btree insertion
could only be used by calling btree_insert_recurse; if you'd gotten to a
btree leaf some other way and had a key to insert, if it turned out that
node needed to be split you were SOL.

This inverts the flow of control so btree_insert_node() does _full_
btree insertion, including splitting - and takes a (leaf) btree node to
insert into as a parameter.

This means we can now _correctly_ handle cache misses - for cache
misses, we need to insert a fake "check" key into the btree when we
discover we have a cache miss - while we still have the btree locked.
Previously, if the btree node was full inserting a cache miss would just
fail.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# 84786438 24-Sep-2013 Kent Overstreet <kmo@daterainc.com>

bcache: Fix for handling overlapping extents when reading in a btree node

btree_sort_fixup() was overly clever, because it was trying to avoid
pulling a key off the btree iterator in more than one place.

This led to a really obscure bug where we'd break early from the loop in
btree_sort_fixup() if the current key overlapped with keys in more than
one older set, and the next key it overlapped with was zero size.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6ded34d1 11-May-2013 Kent Overstreet <koverstreet@google.com>

bcache: Improve lazy sorting

The old lazy sorting code was kind of hacky - rewrite in a way that
mathematically makes more sense; the idea is that the size of the sets
of keys in a btree node should increase by a more or less fixed ratio
from smallest to biggest.

Signed-off-by: Kent Overstreet <koverstreet@google.com>


# 85b1492e 14-May-2013 Kent Overstreet <koverstreet@google.com>

bcache: Rip out pkey()/pbtree()

Old gcc doesnt like the struct hack, and it is kind of ugly. So finish
off the work to convert pr_debug() statements to tracepoints, and delete
pkey()/pbtree().

Signed-off-by: Kent Overstreet <koverstreet@google.com>


# 48a73025 03-Jun-2013 Phil Viana <phillip.l.viana@gmail.com>

md: bcache: Fixed a typo with the word 'arithmetic'

The word 'arithmetic' was typed as 'arithmatic'

Signed-off-by: Phil Viana <phillip.l.viana@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# cc0f4eaa 27-Mar-2013 Kent Overstreet <koverstreet@google.com>

bcache: Use WARN_ONCE() instead of __WARN()

Signed-off-by: Kent Overstreet <koverstreet@google.com>


# cd953ed0 27-Mar-2013 Geert Uytterhoeven <geert@linux-m68k.org>

bcache: Add missing #include <linux/prefetch.h>

m68k/allmodconfig:

drivers/md/bcache/bset.c: In function ‘bset_search_tree’:
drivers/md/bcache/bset.c:727: error: implicit declaration of function ‘prefetch’

drivers/md/bcache/btree.c: In function ‘bch_btree_node_get’:
drivers/md/bcache/btree.c:933: error: implicit declaration of function ‘prefetch’

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Kent Overstreet <koverstreet@google.com>


# 169ef1cf 28-Mar-2013 Kent Overstreet <koverstreet@google.com>

bcache: Don't export utility code, prefix with bch_

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: linux-bcache@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b1a67b0f 25-Mar-2013 Kent Overstreet <koverstreet@google.com>

bcache: Style/checkpatch fixes

Took out some nested functions, and fixed some more checkpatch
complaints.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: linux-bcache@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 07e86ccb 25-Mar-2013 Kent Overstreet <koverstreet@google.com>

bcache: Build fixes from test robot

config: make ARCH=i386 allmodconfig

All error/warnings:

drivers/md/bcache/bset.c: In function 'bch_ptr_bad':
>> drivers/md/bcache/bset.c:164:2: warning: format '%li' expects argument of type 'long int', but argument 4 has type 'size_t' [-Wformat]
--
drivers/md/bcache/debug.c: In function 'bch_pbtree':
>> drivers/md/bcache/debug.c:86:4: warning: format '%li' expects argument of type 'long int', but argument 4 has type 'size_t' [-Wformat]
--
drivers/md/bcache/btree.c: In function 'bch_btree_read_done':
>> drivers/md/bcache/btree.c:245:8: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t' [-Wformat]
--
drivers/md/bcache/closure.o: In function `closure_debug_init':
>> (.init.text+0x0): multiple definition of `init_module'
>> drivers/md/bcache/super.o:super.c:(.init.text+0x0): first defined here

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: linux-bcache@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# cafe5635 23-Mar-2013 Kent Overstreet <koverstreet@google.com>

bcache: A block layer cache

Does writethrough and writeback caching, handles unclean shutdown, and
has a bunch of other nifty features motivated by real world usage.

See the wiki at http://bcache.evilpiepirate.org for more.

Signed-off-by: Kent Overstreet <koverstreet@google.com>