History log of /freebsd-11-stable/sys/geom/mirror/g_mirror.c
Revision Date Author Comments
# 334611 04-Jun-2018 markj

MFC r333278, r333279:
Avoid dropping the topology lock in gmirror's dumpconf implementation.


# 332640 17-Apr-2018 kevans

MFC r332387: Annotate geom modules with MODULE_VERSION

GEOM ELI may ask for the password twice during boot: once at loader time
and once at init time.

This happens due to a module loading bug. By default GEOM ELI caches the
password in the kernel, but without the MODULE_VERSION annotation, the
kernel loads the module again, even if GEOM ELI was compiled into
the kernel. In this case, the newly loaded module
purges/invalidates/overwrites GEOM ELI's password cache, which causes
the second prompt.

This MFC commit differs slightly from head, due to pc98 removal. These
changes were trivial and should be obvious.


# 329613 20-Feb-2018 markj

MFC r328938:
Simplify synchronization read error handling.


# 328334 24-Jan-2018 markj

MFC r327779, r327780:
Fix handling of read errors during synchronization.


# 328333 24-Jan-2018 markj

MFC r327496, r327760:
Fix some I/O ordering issues in gmirror.


# 327988 15-Jan-2018 markj

MFC r327700:
Sort and remove unneeded includes.


# 327804 11-Jan-2018 markj

MFC r327698:
Release the queue lock before restarting the worker loop.


# 327493 02-Jan-2018 markj

MFC r326983:
Avoid using bioq_* in gmirror.


# 327080 22-Dec-2017 markj

MFC r326881, r326882:
Minor cleanup.


# 327070 21-Dec-2017 markj

MFC r326409:
Update gmirror metadata less frequently when synchronizing.


# 326979 19-Dec-2017 markj

MFC r326796-r326798:
Fix sc_writes tracking, and address a lost wakeup.


# 326715 08-Dec-2017 markj

MFC r325044:
Fix a lock leak in g_mirror_destroy().


# 326696 08-Dec-2017 markj

MFC r302794, r306744, r307691, r307692, r316174, r316681, r316859,
r316866, r316867, r316869:
Various gmirror fixes and cleanups.


# 326530 04-Dec-2017 markj

MFC r326132:
Allow kern.geom.mirror.debug to be negative.


# 324588 13-Oct-2017 avg

MFC r323612: gmirror: treat ENXIO as disk disconnect, not media error


# 324402 07-Oct-2017 ngie

MFC r305508:
r305508 (by markj):

Add some fail points to gmirror.

These are useful for testing changes to I/O error handling, and for
reproducing existing bugs in a controlled manner. The fail points are

g_mirror_regular_request_read
g_mirror_regular_request_write
g_mirror_sync_request_read
g_mirror_sync_request_write
g_mirror_metadata_write

They all effectively allow one to inject an error value into the bio_error
field of a corresponding BIO request as it is being completed.
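
The injection described above can be illustrated with a small userspace
sketch. It is not the kernel code: struct bio_sketch, struct
fail_point_sketch and complete_with_fail_point() are hypothetical stand-ins
that only model the idea of overwriting bio_error as a request completes.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the one struct bio field used here. */
struct bio_sketch {
	int bio_error;	/* 0 on success, otherwise an errno value */
};

/* Hypothetical fail point: when armed, it supplies an errno to inject. */
struct fail_point_sketch {
	int armed;	/* non-zero when injection is enabled */
	int error;	/* errno value to inject, e.g. EIO */
};

/*
 * Called as a request completes. If the fail point is armed, overwrite
 * bio_error so that the regular error-handling path is exercised.
 */
static void
complete_with_fail_point(struct bio_sketch *bp,
    const struct fail_point_sketch *fp)
{
	if (fp != NULL && fp->armed)
		bp->bio_error = fp->error;
}
```

With the fail point disarmed the request completes with whatever error it
already carried, so normal operation is unaffected.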


# 324401 07-Oct-2017 ngie

MFC r306743,r317712:

r306743 (by markj):

gmirror: Bump the syncid if broken disks are found during startup.

Consider a mirror with two components, m1 and m2. Suppose a hardware error
results in the removal of m2, with m1's genid bumped. Suppose further that
a replacement mirror component m3 is created and synchronized, after which
the system is shut down uncleanly. During a subsequent bootup, if gmirror
tastes m1 and m2 first, m2 will be removed from the mirror because it is
broken, but the mirror will be started without bumping the syncid on m1
because all elements of the mirror are accounted for. Then m3 will be
added to the already-running mirror with the same syncid as m1, so the
components will not be synchronized despite the unclean shutdown.

Handle this scenario by bumping the syncid of healthy components if any
broken mirrors are discovered during mirror startup.
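
A minimal userspace sketch of the rule above, assuming a simplified
component record; struct component_sketch and bump_syncid_if_broken() are
illustrative names, not gmirror's own.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a mirror component at startup. */
struct component_sketch {
	uint32_t syncid;	/* synchronization generation */
	bool broken;		/* detected as broken during tasting */
};

/*
 * If any component is broken, bump the syncid of every healthy one, so
 * that a component added later with the old syncid is resynchronized.
 * Returns true if a bump happened.
 */
static bool
bump_syncid_if_broken(struct component_sketch *c, size_t n)
{
	bool found = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (c[i].broken) {
			found = true;
			break;
		}
	}
	if (!found)
		return (false);
	for (i = 0; i < n; i++) {
		if (!c[i].broken)
			c[i].syncid++;
	}
	return (true);
}
```

In the m1/m2/m3 scenario, m1's syncid moves ahead of m3's after the bump,
so m3 no longer looks up to date when it joins the running mirror.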

r317712 (by markj):

Synchronize unclean mirrors before adding them to a running gmirror.

During gmirror startup, if component mirrors are found to be dirty as is
typical after a system crash, the mirrors are synchronized to the mirror
with highest priority. However if a gmirror starts without all of its
mirrors present, for example because of some transient delays during
tasting, the remaining mirrors must be synchronized before they may become
active.


# 318752 23-May-2017 mav

MFC r309321:
Add `gmirror create` subcommand, similar to gstripe, gconcat, etc.

It is a quite specific mode of operation that stores no on-disk metadata.
It can be useful in some cases in combination with external control
tools handling mirror creation and disk hot-plug.

Sponsored by: iXsystems, Inc.


# 316711 11-Apr-2017 markj

MFC r316032:
Refine r301173 a bit.


# 316709 11-Apr-2017 markj

MFC r316175:
Avoid sleeping when the mirror I/O queue is non-empty.


# 309205 26-Nov-2016 mav

MFC r308608:
Use providergone method to cover race between destroy and g_access().


# 307665 20-Oct-2016 mav

MFC r306762: Fix possible geom destruction before final provider close.

Introduce an internal counter to track opens. Using the provider's counters
is not very successful after calling g_wither_provider().


# 307645 19-Oct-2016 markj

MFC r306742:
gmirror: Use bool instead of boolean_t.


# 306764 06-Oct-2016 mav

MFC r306279: Use g_wither_provider() where applicable.

It is just a helper function combining G_PF_WITHER setting with
g_orphan_provider().


# 306373 27-Sep-2016 markj

MFC r305509:
Don't treat an error from g_mirror_clear_metadata() as fatal.


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 302091 22-Jun-2016 markj

Do not complete pending gmirror BIOs when tearing down the provider.

This will result in lock recursion and is more generally incorrect since
the completion handlers will just reinsert the BIOs into the queue we're
trying to drain.

Reviewed by: imp, ngie
Approved by: re (gjb)
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D6908


# 301173 01-Jun-2016 glebius

When we are in panic, always go the asynchronous path in g_mirror_destroy(),
otherwise the system will hang.

This is a temporary, least intrusive crutch to get certain panicking systems
dumping. The proper fix should question whether g_mirror_destroy() should be
called on a panicking system at all.

Discussed with: mav


# 300288 20-May-2016 kib

Removal of Giant dropping wrappers for GEOM classes.

Sponsored by: The FreeBSD Foundation


# 298808 29-Apr-2016 pfg

sys/geom: spelling fixes in comments.

No functional change.


# 298698 27-Apr-2016 pfg

geom: unsign some types to match their definitions and avoid overflows.

In struct:gctl_req, nargs is unsigned.

In mirror:
g_mirror_syncreqs is unsigned.

In raid:
in struct:g_raid_volume, v_disks_count is unsigned.

In virstor:
in struct:g_virstor_softc, n_components is unsigned.

MFC after: 2 weeks


# 297955 14-Apr-2016 imp

Bump bio_cmd and bio_*flags from 8 bits to 16.

Differential Revision: https://reviews.freebsd.org/D5784


# 295707 17-Feb-2016 imp

Create an API to reset a struct bio (g_reset_bio). This is mandatory
for all struct bio you get back from g_{new,alloc}_bio. Temporary
bios that you create on the stack or elsewhere should use this before
first use of the bio, and between uses of the bio. At the moment, it
is nothing more than a wrapper around bzero, but that may change in
the future. The wrapper also removes one place where we encode the
size of struct bio in the KBI.
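
As a rough userspace illustration of the wrapper idea (not the kernel
function itself), the sketch below assumes a hypothetical struct bio_sketch
and shows a reset helper that keeps the structure size out of callers.

```c
#include <string.h>

/* Hypothetical stand-in with a few bio-like fields. */
struct bio_sketch {
	int bio_cmd;
	int bio_error;
	long bio_offset;
};

/*
 * Reset a bio before first use and between uses. Callers never pass
 * sizeof(struct bio_sketch) themselves, so the size stays private to
 * this helper.
 */
static void
reset_bio_sketch(struct bio_sketch *bp)
{
	memset(bp, 0, sizeof(*bp));
}
```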


# 283291 22-May-2015 jkim

CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head. However, it is continuously misused as the mpsafe argument
for callout_init(9). Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision: https://reviews.freebsd.org/D2613
Reviewed by: jhb
MFC after: 2 weeks


# 280758 27-Mar-2015 mav

Remove extra semicolon.

MFC after: 1 week


# 280757 27-Mar-2015 mav

Remove request sorting from GEOM_MIRROR and GEOM_RAID.

When the CPU is not busy, those queues are typically empty. When the CPU is
busy, one more extra sort is the last thing it needs. If a specific device
(HDD) really needs sorting, it will be done later by CAM.

This is supposed to fix a livelock reported for a mirror of two SSDs, where
UFS fires a zillion BIO_DELETE requests, which totally blocks the I/O
subsystem with pointless sorting of requests and responses under a single
mutex lock.

MFC after: 2 weeks


# 280756 27-Mar-2015 mav

Fix bug on memory allocation error in split method.

While there, use bioq_takefirst() in place where it is convenient.

MFC after: 1 week


# 279913 12-Mar-2015 mav

Fix a couple of BIO_DELETE bugs in geom_mirror.

Do not report GEOM::candelete if none of providers support BIO_DELETE.
If consumer still requests BIO_DELETE, report error instead of hanging.
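
The two rules can be sketched in userspace C as follows. The types and
function names are hypothetical, and the choice of EOPNOTSUPP as the
reported error is an assumption made for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-component record. */
struct disk_sketch {
	bool candelete;	/* component supports BIO_DELETE */
};

/* Report candelete only if at least one component supports it. */
static bool
mirror_candelete_sketch(const struct disk_sketch *disks, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (disks[i].candelete)
			return (true);
	}
	return (false);
}

/*
 * Distribute a delete to the capable components. If there are none,
 * fail the request instead of leaving it pending forever.
 */
static int
mirror_delete_sketch(const struct disk_sketch *disks, size_t n,
    size_t *sent)
{
	size_t i;

	*sent = 0;
	for (i = 0; i < n; i++) {
		if (disks[i].candelete)
			(*sent)++;
	}
	return (*sent == 0 ? EOPNOTSUPP : 0);
}
```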

MFC after: 2 weeks


# 267992 28-Jun-2014 hselasky

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 267985 27-Jun-2014 gjb

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 267961 27-Jun-2014 hselasky

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating the children of a static/extern SYSCTL
node. This operation should probably be factored out into a
common macro, since some device drivers use it. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading kludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, since malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# 264142 05-Apr-2014 bdrewery

Show error code when failing to destroy a mirror on delay

Sponsored by: EMC / Isilon Storage Division
MFC after: 2 weeks


# 259929 27-Dec-2013 ae

Add an ability to stop gmirror and clear its metadata in one command.
This fixes the problem, when gmirror starts again just after stop.

The problem occurs when gmirror's component has a geom label of equal size.
E.g. gpt and gptid have the same size as the partition, and diskid has the
same size as the entire disk. When gmirror's geom has been destroyed, glabel
creates its providers and this initiates a retaste.

Now "gmirror destroy" command is available. It destroys geom and also
erases gmirror's metadata.

MFC after: 2 weeks


# 258357 19-Nov-2013 ae

Add "resize" verb to gmirror(8) and such functionality to geom_mirror(4).
Now it is easy to expand the size of the mirror when all its components
are replaced. Also add g_resize method to geom_mirror class. It will write
updated metadata to new last sector, when parent provider is resized.

Silence from: geom@
MFC after: 1 month


# 256880 22-Oct-2013 mav

Merge GEOM direct dispatch changes from the projects/camlock branch.

When safety requirements are met, it allows I/O requests to bypass the
GEOM g_up/g_down threads and execute directly in the caller context.
This avoids CPU bottlenecks in the g_up/g_down threads and saves
several context switches per I/O.

The currently defined safety requirements are:
- caller should not hold any locks and should be reenterable;
- callee should not depend on GEOM dual-threaded concurrency semantics;
- on the way down, if request is unmapped while callee doesn't support it,
the context should be sleepable;
- kernel thread stack usage should be below 50%.

To keep compatibility with GEOM classes not meeting above requirements
new provider and consumer flags added:
- G_CF_DIRECT_SEND -- consumer code meets caller requirements (request);
- G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done);
- G_PF_DIRECT_SEND -- provider code meets caller requirements (done);
- G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request).
Capable GEOM class can set them, allowing direct dispatch in cases where
it is safe. If any of requirements are not met, request is queued to
g_up or g_down thread same as before.
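
A simplified decision function for a request travelling down might look
like the sketch below. The flag values and names are hypothetical, and the
unmapped-request/sleepable-context condition is deliberately left out to
keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; the real values are not taken from the source. */
#define	CF_DIRECT_SEND_SKETCH		0x1u	/* consumer may call directly */
#define	PF_DIRECT_RECEIVE_SKETCH	0x2u	/* provider may be called directly */

/*
 * Decide whether a request going down can be dispatched in the caller's
 * context. Both sides must opt in and less than half of the kernel
 * thread stack may be in use; otherwise the request is queued.
 */
static bool
can_dispatch_down_sketch(unsigned consumer_flags, unsigned provider_flags,
    size_t stack_used, size_t stack_size)
{
	if ((consumer_flags & CF_DIRECT_SEND_SKETCH) == 0)
		return (false);
	if ((provider_flags & PF_DIRECT_RECEIVE_SKETCH) == 0)
		return (false);
	return (stack_used < stack_size / 2);
}
```

Returning false here corresponds to the fallback in the text: the request
is handed to the g_down thread as before.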

Such GEOM classes were reviewed and updated to support direct dispatch:
CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE,
VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL,
MAP, FLASHMAP, etc).

To declare direct completion capability disk(9) KPI got new flag equivalent
to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. da(4) and ada(4) disk
drivers got it set now thanks to earlier CAM locking work.

This change more than doubles peak block storage performance on
systems with many CPUs, together with earlier CAM locking changes reaching
more than 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to
256 user-level threads).

Sponsored by: iXsystems, Inc.
MFC after: 2 months


# 254252 12-Aug-2013 ed

Fix the formatting of the error message.

The G_MIRROR_DEBUG() macro already appends a newline. Also, most of the
log messages emitted by gmirror start with an uppercase letter.


# 252011 19-Jun-2013 scottl

Fix a mystery cut-n-paste corruption from the previous commit.

Submitted by: Brenden Fabeny


# 252010 19-Jun-2013 scottl

Mark geom_mirror as capable of unmapped i/o

Obtained from: Netflix
MFC after: 3 days


# 245946 26-Jan-2013 avg

g_mirror: g_getattr() failure should not be fatal

This allows gmirror to be used, e.g., on top of ZVOLs.

PR: kern/175323
Submitted by: Alexei.Volkov@softlynx.ru, mav
Reported by: Alexei.Volkov@softlynx.ru
Tested by: Alexei.Volkov@softlynx.ru
Reviewed by: ae, mav, pjd
MFC after: 1 week


# 245443 14-Jan-2013 mav

Similar to r242314 for GRAID, make GMIRROR more aggressive in marking volumes
as clean on shutdown and move that action from the shutdown_pre_sync stage to
shutdown_post_sync to avoid extra flapping.

ZFS tends not to close devices on shutdown, which doesn't allow GEOM RAID
to shut down gracefully. To handle that, mark the volume as clean as soon as
shutdown time comes and there are no active writes.

PR: kern/113957
MFC after: 2 weeks


# 240371 11-Sep-2012 glebius

When synchronizing, include in the config dump the number of
bytes synchronized.
The rationale behind this is the following: for large disks the
percent synchronisation counter ticks too seldom, and monitoring
software (as well as a human operator) can't tell whether
synchronisation is progressing or one of the disks got stuck. On an idle
server one can look into gstat and see whether synchronisation is
progressing, but on a busy server that won't work. Also, the new value
can be differentiated to obtain the synchronisation speed
quite precisely.
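
The differentiation mentioned above is plain arithmetic over two samples
of the byte counter. The helper below is a hypothetical monitoring-side
sketch, not part of gmirror.

```c
#include <stdint.h>

/*
 * Estimate synchronization speed from two samples of the "bytes
 * synchronized" counter taken `seconds` apart. Returns bytes per
 * second, or -1.0 if the samples are unusable.
 */
static double
sync_speed_sketch(uint64_t bytes_before, uint64_t bytes_after,
    double seconds)
{
	if (seconds <= 0.0 || bytes_after < bytes_before)
		return (-1.0);
	return ((double)(bytes_after - bytes_before) / seconds);
}
```

A speed of zero across several samples is the "disk got stuck" signal that
the percentage counter alone cannot give on large disks.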

Submitted by: Konstantin Kukushkin <dark ramtel.ru>
Reviewed by: pjd


# 237930 01-Jul-2012 glebius

Make geom_mirror more friendly to SSDs. To properly support TRIM,
we need to pass BIO_DELETE requests down to providers that support
it. Also, we need to announce our support for BIO_DELETE to upper
consumer. This requires:

- In g_mirror_start() return true for "GEOM::candelete" request.
- In g_mirror_init_disk() probe below provider for "GEOM::candelete"
attribute, and mark disk with a flag if it does support BIO_DELETE.
- In g_mirror_register_request() distribute BIO_DELETE requests only
to those disks, that do support it.

Note that we announce "GEOM::candelete" as true regardless of
whether we have TRIM-capable media down below or not. This is
intentional, because the upper consumer (usually UFS) requests the
attribute only once at mount time. If the user ever migrates the
mirror from HDDs to SSDs, TRIM will start working without
remounting the filesystem.

Reviewed by: pjd


# 237929 01-Jul-2012 glebius

In g_mirror_regular_request() upon successful delivery treat
BIO_DELETE requests same way as BIO_WRITE removing them from
queue. This fixes panic with BIO_DELETE operations on geom_mirror.

Reviewed by: pjd


# 235599 18-May-2012 ae

Introduce new device flag G_MIRROR_DEVICE_FLAG_TASTING. It should
protect geom from destroying while it is tasting.

PR: kern/154860
Reviewed by: pjd
MFC after: 1 week


# 227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 223921 11-Jul-2011 ae

Include sys/sbuf.h directly.

Reviewed by: pjd


# 221101 26-Apr-2011 mav

Implement relaxed comparison for hardcoded provider names to make it
ignore the adX/adaY difference in both directions, simplifying migration to
CAM-based ATA or back.
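
One way such a relaxed comparison could work is sketched below in
userspace C. The helper names are hypothetical and the approach (skip an
"ad" or "ada" prefix together with its unit number, then compare the rest)
is an assumption about the technique, not a copy of the committed code.

```c
#include <string.h>

/*
 * Length of a leading "adN" or "adaN" device prefix including the unit
 * digits, or 0 if the name does not start with one.
 */
static size_t
ata_prefix_len_sketch(const char *name)
{
	size_t len;

	if (strncmp(name, "ada", 3) == 0)
		len = 3;
	else if (strncmp(name, "ad", 2) == 0)
		len = 2;
	else
		return (0);
	if (name[len] < '0' || name[len] > '9')
		return (0);
	while (name[len] >= '0' && name[len] <= '9')
		len++;
	return (len);
}

/* Return 1 if the names match exactly or differ only in adX vs adaY. */
static int
names_match_sketch(const char *a, const char *b)
{
	size_t la, lb;

	if (strcmp(a, b) == 0)
		return (1);
	la = ata_prefix_len_sketch(a);
	lb = ata_prefix_len_sketch(b);
	if (la == 0 || lb == 0)
		return (0);
	return (strcmp(a + la, b + lb) == 0);
}
```

Under this sketch "ad4s1a" and "ada0s1a" are treated as the same provider,
while the partition suffix still has to match.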


# 219029 25-Feb-2011 netchild

Add some FEATURE macros for various GEOM classes.

No FreeBSD version bump; the userland application to query the features will
be committed last and can serve as an indication of availability if
needed.

Sponsored by: Google Summer of Code 2010
Submitted by: kibab
Reviewed by: silence on geom@ during 2 weeks
X-MFC after: to be determined in last commit with code from this project


# 211455 18-Aug-2010 mav

Remove bintime_cmp() function, unused since r200086.

MFC after: 1 week


# 201566 05-Jan-2010 mav

Move wakeup() out of mutex to reduce contention.


# 200935 24-Dec-2009 mav

Since a mirror has no stripes of its own, report the largest stripe of the
underlying components, hoping the others fit if they are not equal.


# 200086 03-Dec-2009 mav

Change 'load' balancing mode algorithm:
- Instead of measuring the last request execution time for each drive and
choosing the one with the smallest time, use the averaged number of requests
running on each drive. This information is more accurate and timely. It allows
load to be distributed between drives in a more even and predictable way.
- For each drive track the offset of the last submitted request. If a new
request's offset matches the previous one, or is close to it, for some drive,
prefer that drive. This significantly speeds up simultaneous sequential reads.
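
The selection policy can be approximated by the userspace sketch below. It
is a simplification under stated assumptions: the struct, the 128 KiB
"close" window and the strict locality-first ordering are all hypothetical,
and the real code may weigh load and locality differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-drive state. */
struct drive_sketch {
	unsigned load;		/* averaged number of running requests */
	uint64_t last_offset;	/* offset of the last submitted request */
};

/* Assumed window within which an offset counts as "close". */
#define	NEAR_BYTES_SKETCH	(128u * 1024u)

/*
 * Pick a drive for a read at `offset` (n must be at least 1). Drives
 * whose last offset is close to the new one win; ties are broken by the
 * lowest load. The caller is expected to update last_offset afterwards.
 */
static size_t
choose_drive_sketch(const struct drive_sketch *d, size_t n, uint64_t offset)
{
	size_t i, best = 0;
	int near, best_near = 0;
	uint64_t dist;

	for (i = 0; i < n; i++) {
		dist = offset >= d[i].last_offset ?
		    offset - d[i].last_offset : d[i].last_offset - offset;
		near = dist <= NEAR_BYTES_SKETCH;
		if (i == 0 || near > best_near ||
		    (near == best_near && d[i].load < d[best].load)) {
			best = i;
			best_near = near;
		}
	}
	return (best);
}
```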

PR: kern/113885
Reviewed by: sobomax


# 190878 10-Apr-2009 thompsa

Revert r190676,190677

The geom and CAM changes for root_hold are the wrong solution for USB design
quirks.

Requested by: scottl


# 190676 03-Apr-2009 thompsa

Add a how argument to root_mount_hold() so it can be passed NOWAIT and be
called in situations where sleeping isn't allowed.


# 172836 20-Oct-2007 julian

Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
This makes way for us to add REAL kthread_create() and friends
that actually make threads. It turns out that most of these
calls actually end up being moved back to the thread version
when it's added, but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0 so that we can eventually MFC the
new kthread_xxx() calls.


# 170307 04-Jun-2007 jeff

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 163888 01-Nov-2006 pjd

Now, that we have gjournal in the tree add possibility to configure
gmirror and graid3 in a way that it is not resynchronized after a
power failure or system crash.
It is safe when gjournal is running on top of gmirror/graid3.


# 163836 31-Oct-2006 pjd

Implement BIO_FLUSH handling by simply passing it down to the components.

Sponsored by: home.pl


# 162282 13-Sep-2006 pjd

Fix synchronization in gmirror and graid3, which I broke. A synchronization
request can still have bio_to set to sc_provider (this is the READ part of a
synchronization request) and in this case g_{mirror,raid3}_sync() wasn't
called as it should be.

MFC after: 1 week


# 162188 09-Sep-2006 jmg

move created/detected/activated under debug level 1 to quiet the common case..

add count of active and total components to the launched line so you can
see at a glance if your mirror/raid3 is complete...

now:
GEOM_MIRROR: Device mirror/sam launched (2/2).

Reviewed by: pjd


# 161116 09-Aug-2006 pjd

Not only a request from us can be passed to the g_{mirror,raid3}_worker()
function, but also a request to us, in which case checking bio_cflags
is wrong, because the class above us is controlling it, not us.

MFC after: 1 week


# 160964 04-Aug-2006 yar

Commit the results of the typo hunt by Darren Pilgrim.
This change affects documentation and comments only,
no real code involved.

PR: misc/101245
Submitted by: Darren Pilgrim <darren pilgrim bitfreak org>
Tested by: md5(1)
MFC after: 1 week


# 160895 01-Aug-2006 pjd

Don't use f-word in comments. We are gentlemans.

Pointed out by: Maciej Sobczak


# 160248 10-Jul-2006 pjd

Use proper defines instead of magic values.

MFC after: 1 week


# 160081 03-Jul-2006 pjd

Allow to close access even if device is already destroyed.

Reported by: Ulrich Spoerlein <uspoerlein@gmail.com>
PR: kern/98093
MFC after: 1 week


# 158116 28-Apr-2006 pjd

- Remove dead code.
- Comment possible event miss, which isn't critical, but probably can be
fixed by replacing the event lock usage with the queue lock.

MFC after: 2 weeks


# 158112 28-Apr-2006 pjd

Be sure to not destroy device twice. This is not possible in theory, but
with this change there is even no theoretical race.

MFC after: 2 weeks


# 157630 10-Apr-2006 pjd

Introduce and use delayed-destruction functionality from a pre-sync hook,
which means that devices will be destroyed on last close.

This fixes destruction order problems when, e.g., a RAID3 array is built on
top of RAID1 arrays.

Requested, reviewed and tested by: ru
MFC after: 2 weeks


# 157290 30-Mar-2006 pjd

- 'ndisks' variable is not boolean, so compare it with a value.
- Keep conditions order consistent with the comment above.

MFC after: 3 days


# 156878 19-Mar-2006 pjd

Update copyright for 2006.


# 156873 19-Mar-2006 pjd

kern.geom.mirror.sync_requests=2 seems to be a better default - it still
keeps disks very busy, but makes system much more responsive.

While here, kill extra space.


# 156684 13-Mar-2006 ru

Fix build on 64-bit platforms.


# 156610 12-Mar-2006 pjd

- Speed up synchronization process by using configurable number of I/O
requests in parallel.
+ Add kern.geom.mirror.sync_requests tunable which defines how many parallel
I/O requests should be used.
+ Retire kern.geom.mirror.reqs_per_sync and kern.geom.mirror.syncs_per_sec
sysctls.
- Fix race between regular and synchronization requests.
- Reimplement mirror's data synchronization - do not use the topology lock
for this purpose, as it may cause deadlocks.
- Stop synchronization from pre-sync hook.
- Fix some other minor issues.

MFC after: 3 days


# 156421 08-Mar-2006 pjd

Allow to dump kernel to gmirror providers.
Some conditions have to be met to make it work properly. This will be
described in the manual page.

MFC after: 3 days


# 155582 12-Feb-2006 pjd

On component state change to ACTIVE don't forget to update metadata.

MFC after: 3 days


# 155581 12-Feb-2006 pjd

Use time_uptime instead of time_second, as the latter may go backwards.

Suggested by: ru
MFC after: 3 days


# 155545 11-Feb-2006 pjd

- Add kern.geom.mirror.disconnect_on_failure sysctl/tunable (defaults to 1
to preserve current behaviour). When set to 0, components are not
disconnected - gmirror will still try to use them (only the first error will
be logged). This is helpful when we have two broken components, but in
different places, so actually all data is available.
Such a buggy component will be visible in 'gmirror list' output with the flag
BROKEN.
- Never disconnect the last valid component. If we detect errors there we
will just pass them up. It wasn't reasonable to deny access to the
whole provider because of one broken sector.
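
The policy in this entry can be summarized by a small decision function.
This is a userspace sketch with hypothetical names; it only models the
choice made on an I/O error, not how gmirror carries it out.

```c
/* Hypothetical outcomes for a component that returned an I/O error. */
enum error_action_sketch {
	ACTION_PASS_ERROR_UP,	/* last valid component: keep it, report error */
	ACTION_DISCONNECT,	/* disconnect_on_failure=1 */
	ACTION_KEEP_BROKEN	/* disconnect_on_failure=0: keep, flag BROKEN */
};

/*
 * Decide what to do with a failing component given the sysctl value and
 * the number of valid components, counting the failing one.
 */
static enum error_action_sketch
on_component_error_sketch(int disconnect_on_failure,
    unsigned valid_components)
{
	if (valid_components <= 1)
		return (ACTION_PASS_ERROR_UP);
	if (disconnect_on_failure)
		return (ACTION_DISCONNECT);
	return (ACTION_KEEP_BROKEN);
}
```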

Prodded by: ru
MFC after: 3 days


# 155539 11-Feb-2006 pjd

Mark array as CLEAN when there have been no write requests for
kern.geom.mirror.idletime seconds. Write requests, not any requests.
Mark array as clean immediately on last write close.
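
A compact way to express both conditions, as a userspace sketch with
hypothetical parameter names (times are in seconds from any monotonic
source):

```c
#include <stdbool.h>

/*
 * Decide whether the array may be marked CLEAN. It is clean at once
 * when nobody has it open for writing, and otherwise only after
 * `idletime` seconds without a write request. Reads do not count.
 */
static bool
should_mark_clean_sketch(long now, long last_write, long idletime,
    unsigned write_openers)
{
	if (write_openers == 0)
		return (true);
	return (now - last_write >= idletime);
}
```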

Prodded by: ru
MFC after: 3 days


# 155174 01-Feb-2006 pjd

Remove trailing spaces.


# 154538 18-Jan-2006 pjd

Remove dead code.

Found by: Coverity Prevent(tm)
Coverity ID: CID104
MFC after: 3 days


# 152967 30-Nov-2005 sobomax

Check for g_read_data(9) errors properly:

o The only indication of error condition is NULL value returned by
the function;

o value pointed to by error argument is undefined in the case when
operation completes successfully.
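
The calling convention described above can be shown with a userspace
stand-in. read_data_sketch() is hypothetical and only mimics the contract
(NULL means failure, the error variable is untouched on success); it is not
the GEOM function.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stand-in reader: returns a buffer on success or NULL on failure.
 * *error is written only on failure, so its value after a successful
 * call must not be inspected.
 */
static void *
read_data_sketch(int should_fail, size_t length, int *error)
{
	void *buf;

	if (should_fail) {
		*error = EIO;
		return (NULL);
	}
	buf = malloc(length);
	if (buf == NULL) {
		*error = ENOMEM;
		return (NULL);
	}
	memset(buf, 0, length);
	return (buf);
}

/* Correct caller: test the returned pointer, never the error variable. */
static int
read_metadata_sketch(int should_fail)
{
	int error = 12345;	/* stale value, to show it is ignored */
	void *buf;

	buf = read_data_sketch(should_fail, 512, &error);
	if (buf == NULL)
		return (error);
	free(buf);
	return (0);
}
```

A caller that tested `error != 0` instead would report a failure here even
though the read succeeded, which is the class of bug this entry addresses.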

Discussed with: phk


# 151897 31-Oct-2005 rwatson

Normalize a significant number of kernel malloc type names:

- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.


# 146624 25-May-2005 pjd

After provider creation!!


# 146616 25-May-2005 pjd

- Call root_mount_rel() when provider IS created, not earlier.
This should close the race observed by Daniel Eriksson.
- Remove redundant wakeup().


# 146538 23-May-2005 pjd

Add some debug code to diagnose root-on-mirror problems with recent -current.

Reported by: Daniel Eriksson


# 146110 11-May-2005 pjd

Add KASSERT() to be sure there is an active component.

Suggested by: Coverity Prevent analysis tool


# 145305 19-Apr-2005 pjd

Remove the hack which allowed to use gmirror for root file system,
use root_mount KPI instead.


# 144143 26-Mar-2005 pjd

Make the code more obvious - when an error occurs in g_mirror_connect_disk(),
detach and destroy consumer before returning.


# 142727 27-Feb-2005 pjd

- Add md_provsize field to metadata, which will help with
shared-last-sector problem.
After this change, even if there is more than one provider with the same
last sector, the proper one will be chosen based on its size.
It still doesn't fix the 'c' partition problem (when da0s1 can be confused
with da0s1c) and situation when 'a' partition starts at offset 0
(then da0s1a can be confused with da0s1 and da0s1c). One can use '-h'
option there, when creating device or avoid sharing last sector.
Actually, when providers share the same last sector and their size is equal,
they provide exactly the same data, so the name (da0s1, da0s1a, da0s1c)
isn't important at all.
- Provide backward compatibility.
- Update copyright's year.

MFC after: 1 week


# 141994 16-Feb-2005 pjd

Update copyright in files changed this year.


# 139940 09-Jan-2005 pjd

Increase default synchronization speed.

MFC after: 3 days


# 139670 04-Jan-2005 pjd

Spoiling is now not possible, because we keep consumers open for writing
all the time. Remove unused code then.

MFC after: 4 days


# 139650 03-Jan-2005 pjd

Fix 'rebuild' command (we ignore retaste event now, so don't rely on it).


# 139451 30-Dec-2004 jhb

Stop explicitly touching td_base_pri outside of the scheduler and simply
set a thread's priority via sched_prio() when that is the desired action.
The schedulers will start managing td_base_pri internally shortly.


# 139246 23-Dec-2004 pjd

Update disk->d_genid field when increasing sc->sc_genid.


# 139213 22-Dec-2004 pjd

- Add genid field to the metadata, which improves reliability a bit.
After this change, when a component is disconnected because of an I/O error,
it will not be connected and synchronized automatically; it will be logged
as broken and skipped. Autosynchronization can still occur when a component is
disconnected (on orphan event) and connected again: there was no I/O
error, so there is no reason not to connect the component, but if there were
writes while it wasn't connected, it will be synchronized.
This fixes cases where a component is disconnected because of an I/O error and
can be connected again and again.
- Bump version number.
- Add version change history.
- Implement backward compatibility mechanism. After this change, when metadata in
an old version is detected, it is automatically upgraded to the new (current)
version.


# 139146 21-Dec-2004 pjd

Now, when force device destruction is done on shutdown, hide warning,
that device cannot be destroyed immediately, under debug=1.

Suggested by: simon


# 139140 21-Dec-2004 pjd

This should not be permitted, but some GEOM classes held the topology lock
while doing g_(read|write)_data() (e.g. BSD). This can cause a deadlock
in MIRROR class. Not sure if this is safe to drop the topology lock in BSD
class, so change the code in MIRROR class to avoid this deadlock.


# 139054 19-Dec-2004 pjd

Remove unused variables.


# 139053 19-Dec-2004 pjd

- Argument 'flags' in g_mirror_destroy_consumer() function is unused -
mark it as such.
- Before closing consumer check if it is open. It can be closed here
when g_mirror_connect_disk() fails on g_access().


# 139051 19-Dec-2004 pjd

Some major cleanups.

Keeping consumers open when device is closed is very hard. We need to
open consumers sometimes to update metadata, etc.
Many hacks were introduced in the past to make it possible. You cannot
be sure that you can open consumer for writing always, even if you think
it should be allowed. If one of the mirror components is for example da0
and you try to open it, you can get EPERM when da0s1 is opened for reading
(because BSD class opens consumers (da0) with an extra 'e' bit set).
Waiting for the events queue to be empty may do the trick, but it makes
code much uglier (as you cannot always call g_waitidle()), it doesn't
solve all edge cases and it can introduce deadlocks if there are events
in the queue that wait for gmirror.

I removed those hacks. Now all consumers are open r1w1e1 always, even if
device is closed. Maybe it is less clean from GEOM perspective, but it simplifies
the code a lot and makes it much more reliable.
The only issue was retaste event which is sent when we close consumers
opened for writing. I ignore retaste event by not detaching consumer
immediately (so retaste event is not sent to my class) and sending event
right after it to detach and destroy consumer.


# 137490 09-Nov-2004 pjd

Before trying to update metadata (so open consumer for writing), be sure
that the events queue is empty. Otherwise we can hit the race
where for example da0s1 is tasted by some other class, which means that
da0 is open with exclusive bit set, which means that we can't open da0
for writing if it is our component.

Reported by: Attila Nagy <bra@fsn.hu> (and somebody else sometime ago,
but I cannot find who it was)


# 137487 09-Nov-2004 pjd

Don't rely on DIRTY flag to be sure that consumer is open, because
DIRTY flag can be removed in idle process. Use consumer's acw field
instead to avoid opening consumer twice.


# 137421 08-Nov-2004 pjd

Drop Giant lock before grabbing the topology lock.


# 137412 08-Nov-2004 pjd

If device is marked as being destroyed, deny all access requests.


# 137259 05-Nov-2004 pjd

Don't forget to make sure that there are no not-finished requests before
marking components as clean.

Pointed out by: scottl


# 137254 05-Nov-2004 pjd

Use shutdown hooks to mark mirrors as clean after all file systems are
unmounted.

Suggested by: scottl


# 137253 05-Nov-2004 pjd

Remove unused #include.


# 137251 05-Nov-2004 pjd

- Add a sysctl kern.geom.mirror.idletime, so one can specify after how many
seconds of idling, DIRTY flags are removed.
- If mirror is in idle state or is not open for writing, sleep without
timeout when waiting for I/O requests.
- Don't use atomic operations, for now sysctls are protected by Giant.
- Update debugs.


# 137248 05-Nov-2004 pjd

MFp4:
- Fix for good (I hope) force-stopping mirrors and some failure cases
(e.g. the last good component dies when synchronization is in progress).
Don't use ->nstart/->nend consumer's fields, as this could be racy,
because those fields are used in g_down/g_up, use ->index consumer's
field instead for tracking number of not finished requests.

Reported by: marcel

- After 5 seconds of idle time (this should be configurable) mark all
dirty providers as clean, so when the mirror has not been used for 5 seconds
and a power failure occurs, no synchronization on boot is needed.

Idea from: sorry, I can't find who suggested this

- When there are no ACTIVE components and no NEW components destroy whole
mirror, not only provider.

- Fix one debug to show information about I/O request, before we change
its command.


# 136504 14-Oct-2004 pjd

Ehh. Introduce a hack: Wait for 3 seconds, so GEOM is able to give us
providers for tasting. Before this hack, race below is possible:
SI_SUB_RAID (no not-fully-configured geoms, so don't block)
GEOM tasting (now geoms are created)
SI_SUB_MOUNT_ROOT (if root file system is placed on a mirror, it is
possible that this mirror is not fully configured yet)
There is a lot of work to do to avoid such hacks and I need a working
solution before 5.3, sorry.

Reported by: John Hay <jhay@icomtek.csir.co.za>


# 136236 07-Oct-2004 pjd

Be sure to always return 0 for negative access requests.

Reported by: Maciej Kucharz <qk@comp.waw.pl>


# 136197 06-Oct-2004 pjd

Geoms without softc are geoms which are initialized, so wait for them.


# 136191 06-Oct-2004 pjd

Look out for geoms without softc.

Reported by: tegge


# 136143 05-Oct-2004 pjd

Before root file system is mounted, wait for mirrors in degraded state.


# 135872 28-Sep-2004 pjd

Just use MAXPHYS as maximum I/O request size, instead of using my own
#define for this purpose.
No functional change.


# 135859 27-Sep-2004 pjd

Minor, but very important condition fix. The current one can never be true.


# 135854 27-Sep-2004 pjd

Decrease kern.geom.mirror.timeout to 4, so it is smaller than
vfs.root.mountdelay by default.


# 135833 26-Sep-2004 pjd

Avoid race while synchronizing components. It is very hard to bump into,
but it is possible:
1. Read data from good component for synchronization.
2. Write data to the same area.
3. Write synchronization data, which are now stale.

Found by: tegge
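
The three-step ordering above can be replayed deterministically with a toy model. This is a sketch only: struct toy_mirror and both functions are hypothetical and do not reflect gmirror's data structures, the sketch assumes that a regular write reaches both the good and the synchronizing component, and the serialized variant is just one possible remedy, not necessarily the one this commit implements.

```c
#include <assert.h>

/* Hypothetical single-block "disks"; not gmirror data structures. */
struct toy_mirror {
	int good;	/* block on the up-to-date component */
	int syncing;	/* same block on the component being rebuilt */
};

/*
 * Replays the racy ordering from the commit message.
 * Returns 1 if both components agree afterwards, 0 otherwise.
 */
static int
sync_unprotected(struct toy_mirror *m, int new_value)
{
	int sync_buf;

	sync_buf = m->good;		/* 1. read data for synchronization */
	m->good = new_value;		/* 2. regular write to the same area */
	m->syncing = new_value;		/*    (assumed to reach both disks) */
	m->syncing = sync_buf;		/* 3. stale synchronization write */
	return (m->good == m->syncing);
}

/*
 * One possible remedy (an assumption): let the synchronization write
 * finish before the regular write to the same area is issued.
 */
static int
sync_serialized(struct toy_mirror *m, int new_value)
{
	int sync_buf;

	sync_buf = m->good;		/* synchronization read */
	m->syncing = sync_buf;		/* synchronization write completes */
	m->good = new_value;		/* regular write is issued afterwards */
	m->syncing = new_value;
	return (m->good == m->syncing);
}
```

In the unprotected ordering the rebuilt component ends up holding the old block while the good component holds the new one, which is exactly the stale-data outcome described in step 3.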


# 135831 26-Sep-2004 pjd

Simplify code a bit.


# 135524 20-Sep-2004 pjd

Force commit to provide more detailed info about this change.

There is no need to skip providers with 0 sectorsize in taste routine,
it is now forced by GEOM.
Actually, it can even cause some problems, because GEOM requires sectorsize
to be greater than 0 on first access, not on provider creation, so we can
skip valid providers by doing this check in taste method.

Requested by: scottl


# 135522 20-Sep-2004 pjd

This is not needed anymore, it is forced in GEOM now.
Actually, it can even cause some problems, because GEOM requires sectorsize
to be more than 0 on first access, not on provider creation, so we can skip
valid providers by doing this check here.

Reported by: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Sven Willenberger <sven@dmv.com>


# 134957 08-Sep-2004 pjd

Show current status of mirror device directly.

Suggested by: Krzysztof Ciepłucha <kris@home.pl>


# 134528 30-Aug-2004 pjd

Allow configuring debug level from /boot/loader.conf.


# 134486 29-Aug-2004 pjd

GCC, ehh.


# 134344 26-Aug-2004 pjd

Skip providers with undefined sector size.

Reported by: kuriyama


# 134226 23-Aug-2004 pjd

Allow setting kern.geom.mirror.timeout from /boot/loader.conf.


# 133991 18-Aug-2004 pjd

We really don't want to receive spoil event for synchronization consumers.


# 133946 18-Aug-2004 pjd

Bump synchronization ID if we are sure, that we have ACTIVE components.


# 133752 15-Aug-2004 pjd

Avoid code duplication by introducing g_mirror_write_metadata() function,
which is used now by g_mirror_clear_metadata() function and
g_mirror_update_metadata() function.


# 133530 11-Aug-2004 pjd

MFp4: Simplify code a bit:
- Remove kern.geom.mirror.sync_block_size sysctl. It is quite obvious that we
want to use the biggest size possible.
- Do not use UMA zone for sync data allocations. There could be only one
synchronization request per synchronized disk at a time, so allocate memory
for one request on whole synchronization process related to one disk.

Tested by synchronizing one component (out of three) and by synchronizing
two components (out of three) in parallel.


# 133484 11-Aug-2004 pjd

Try harder to not panic on 'stop -f'.
After the commit, this command should be really safe to use.


# 133448 10-Aug-2004 pjd

- Recognize HARDCODED flag when dumping consumer configuration.
- Improve code readability a bit.


# 133373 09-Aug-2004 pjd

- Introduce option for hardcoding providers' names into metadata.
It allows fixing problems when a provider's last sector is shared between
several providers.
- Bump version number for CONCAT and STRIPE and add code for backward
compatibility.
- Do not bump version number of MIRROR, as it wasn't officially introduced yet.
Even if someone started to play with it, there is no big deal, because
wrong MD5 sum of metadata will deny those providers.
- Update manual pages.
- Add version history to g_(stripe|concat).h files.


# 133318 08-Aug-2004 phk

Tag all geom classes in the tree with a version number.


# 133173 05-Aug-2004 pjd

Don't use 'bp' after its destruction!


# 133170 05-Aug-2004 pjd

Simplify a bit - we could use 'sc' here as it was initialized properly.


# 133142 04-Aug-2004 pjd

- Add two fields to bio structure: 'bio_cflags' which can be used by
consumer and 'bio_pflags' which can be used by provider.
- Remove BIO_FLAG1 and BIO_FLAG2 flags. From now on new fields should be
used for internal flags.
- Update g_bio(9) manual page.
- Update some comments.
- Update GEOM_MIRROR, which was the only one using BIO_FLAGs.

Idea from: phk
Reviewed by: phk


# 133115 04-Aug-2004 pjd

- Add "prefer" balance algorithm. When used, only disk with the biggest
priority will be used for reading.
- Bump version number.


# 133114 04-Aug-2004 pjd

MFp4: We don't really need g_mirror_free_disk() function.


# 133079 03-Aug-2004 pjd

Fix comment.


# 132976 01-Aug-2004 pjd

Typo.


# 132954 01-Aug-2004 pjd

- Launch main provider when there are no more disks in NEW state.
- Log syncid bump at debug level 1.


# 132941 31-Jul-2004 pjd

If there are no valid components after the timeout, just destroy device.
There is probably nothing to wait for.


# 132938 31-Jul-2004 pjd

Handle spoil event in dedicated function: g_mirror_spoiled().
The difference between the new function and g_mirror_orphan() (which was
used previously) is that syncid is bumped immediately, instead of on
first write, because when consumer was spoiled, it means, that its
provider was opened for writing, so we can't trust that its data
will be valid when it will be connected again.


# 132922 31-Jul-2004 pjd

Destroy synchronization geom immediately. This should fix unloading without
stopping all mirrors.


# 132907 30-Jul-2004 pjd

Dump correct field.


# 132904 30-Jul-2004 pjd

Add GEOM_MIRROR class which provides RAID1 functionality and has many useful
features. The gmirror(8) utility should be used for control of this class.
There is no manual page yet, but I'm working on it with keramida@.

Many useful tests provided by: simon (thank you!)
Some ideas from: scottl, simon, phk