History log of /freebsd-11-stable/sys/geom/geom_disk.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 356589 10-Jan-2020 mav

MFC r356214: Avoid few memory accesses in g_disk_done().


# 329316 15-Feb-2018 avg

MFC r327996: geom_disk / scsi_da: deny opening write-protected disks for writing

Ths change consists of two parts.

geom_disk: deny opening a disk for writing if it's marked as
write-protected. A new disk(9) flag is added to mark write protected
disks. A possible alternative could be to add another parameter to d_open,
so that the open mode could be passed to it and the disk drivers could
make the decision internally, but the flag required less churn.

scsi_da: add a new phase of disk probing to query the all pages mode
sense page. We can determine if the disk is write protected using bit 7
of the device specific field in the mode parameter header returned by
MODE SENSE.

PR: 224037


# 321290 20-Jul-2017 mav

MFC r320729: Add GEOM::descr attribute for symmetry with GEOM::ident.


# 312405 19-Jan-2017 mav

MFC r311971: Report random flash storage as non-rotating to GEOM_DISK.

While doing it, introduce respective constants in geom_disk.h.


# 306476 29-Sep-2016 ae

MFC r303019:
Use g_resize_provider() to change the size of GEOM_DISK provider,
when it is being opened. This should fix the possible loss of a resize
event when disk capacity changed.

MFC r303288:
Do not invoke resize method if geom is being withered.

MFC r303637:
Do not invoke resize event if initial disk size is zero. Some disks
report the size only after first opening. And due to the events are
asynchronous, some consumers can receive this event too late and
this confuses them. This partially restores previous behaviour, and
at the same time this should fix the problem, when already opened
provider loses resize event.

PR: 211028


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 302150 23-Jun-2016 ken

Switch geom_disk over to using a pool mutex.

The GEOM disk d_mtx is only acquired on disk creation and destruction.
It is a good candidate for replacement with a pool mutex. This eliminates
the mutex initialization and teardown and the mutex and name variables
themselves from struct disk.

sys/geom/geom_disk.h:
Take d_mtx and d_mtx_name out of struct disk.

sys/geom/geom_disk.c:
Use mtx_pool_lock() and mtx_pool_unlock() to guard the disk
initialization state instead of a dedicated mutex.

This allows removing the initialization and destruction of
d_mtx.

sys/sys/param.h:
Bump __FreeBSD_version to 1100119 for the change to struct disk.

Suggested by: jhb
Sponsored by: Spectra Logic
Approved by: re (gjb)


# 302069 21-Jun-2016 ken

Fix a bug that caused da(4) instances to hang around after the underlying
device is gone.

The problem was that when disk_gone() is called, if the GEOM disk
creation process has not yet happened, the withering process
couldn't start.

We didn't record any state in the GEOM disk code, and so the d_gone()
callback to the da(4) driver never happened.

The solution is to track the state of the creation process, and
initiate the withering process from g_disk_create() if the disk is
being created.

This change does add fields to struct disk, and so I have bumped
DISK_VERSION.

geom_disk.c: Track where we are in the disk creation process,
and check to see whether our underlying disk has
gone away or not.

In disk_gone(), set a new d_goneflag variable that
g_disk_create() can check to see if it needs to
clean up the disk instance.

geom_disk.h: Add a mutex to struct disk (for internal use) disk
init level, and a gone flag.

Bump DISK_VERSION because the size of struct disk has
changed and fields have been added at the beginning.

Sponsored by: Spectra Logic
Approved by: re (marius)


# 300207 19-May-2016 ken

Add support for managing Shingled Magnetic Recording (SMR) drives.

This change includes support for SCSI SMR drives (which conform to the
Zoned Block Commands or ZBC spec) and ATA SMR drives (which conform to
the Zoned ATA Command Set or ZAC spec) behind SAS expanders.

This includes full management support through the GEOM BIO interface, and
through a new userland utility, zonectl(8), and through camcontrol(8).

This is now ready for filesystems to use to detect and manage zoned drives.
(There is no work in progress that I know of to use this for ZFS or UFS, if
anyone is interested, let me know and I may have some suggestions.)

Also, improve ATA command passthrough and dispatch support, both via ATA
and ATA passthrough over SCSI.

Also, add support to camcontrol(8) for the ATA Extended Power Conditions
feature set. You can now manage ATA device power states, and set various
idle time thresholds for a drive to enter lower power states.

Note that this change cannot be MFCed in full, because it depends on
changes to the struct bio API that break compatilibity. In order to
avoid breaking the stable API, only changes that don't touch or depend on
the struct bio changes can be merged. For example, the camcontrol(8)
changes don't depend on the new bio API, but zonectl(8) and the probe
changes to the da(4) and ada(4) drivers do depend on it.

Also note that the SMR changes have not yet been tested with an actual
SCSI ZBC device, or a SCSI to ATA translation layer (SAT) that supports
ZBC to ZAC translation. I have not yet gotten a suitable drive or SAT
layer, so any testing help would be appreciated. These changes have been
tested with Seagate Host Aware SATA drives attached to both SAS and SATA
controllers. Also, I do not have any SATA Host Managed devices, and I
suspect that it may take additional (hopefully minor) changes to support
them.

Thanks to Seagate for supplying the test hardware and answering questions.

sbin/camcontrol/Makefile:
Add epc.c and zone.c.

sbin/camcontrol/camcontrol.8:
Document the zone and epc subcommands.

sbin/camcontrol/camcontrol.c:
Add the zone and epc subcommands.

Add auxiliary register support to build_ata_cmd(). Make sure to
set the CAM_ATAIO_NEEDRESULT, CAM_ATAIO_DMA, and CAM_ATAIO_FPDMA
flags as appropriate for ATA commands.

Add a new get_ata_status() function to parse ATA result from SCSI
sense descriptors (for ATA passthrough over SCSI) and ATA I/O
requests.

sbin/camcontrol/camcontrol.h:
Update the build_ata_cmd() prototype

Add get_ata_status(), zone(), and epc().

sbin/camcontrol/epc.c:
Support for ATA Extended Power Conditions features. This includes
support for all features documented in the ACS-4 Revision 12
specification from t13.org (dated February 18, 2016).

The EPC feature set allows putting a drive into a power power mode
immediately, or setting timeouts so that the drive will
automatically enter progressively lower power states after various
idle times.

sbin/camcontrol/fwdownload.c:
Update the firmware download code for the new build_ata_cmd()
arguments.

sbin/camcontrol/zone.c:
Implement support for Shingled Magnetic Recording (SMR) drives
via SCSI Zoned Block Commands (ZBC) and ATA Zoned Device ATA
Command Set (ZAC).

These specs were developed in concert, and are functionally
identical. The primary differences are due to SCSI and ATA
differences. (SCSI is big endian, ATA is little endian, for
example.)

This includes support for all commands defined in the ZBC and
ZAC specs.

sys/cam/ata/ata_all.c:
Decode a number of additional ATA command names in ata_op_string().

Add a new CCB building function, ata_read_log().

Add ata_zac_mgmt_in() and ata_zac_mgmt_out() CCB building
functions. These support both DMA and NCQ encapsulation.

sys/cam/ata/ata_all.h:
Add prototypes for ata_read_log(), ata_zac_mgmt_out(), and
ata_zac_mgmt_in().

sys/cam/ata/ata_da.c:
Revamp the ada(4) driver to support zoned devices.

Add four new probe states to gather information needed for zone
support.

Add a new adasetflags() function to avoid duplication of large
blocks of flag setting between the async handler and register
functions.

Add new sysctl variables that describe zone support and paramters.

Add support for the new BIO_ZONE bio, and all of its subcommands:
DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP,
DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS.

sys/cam/scsi/scsi_all.c:
Add command descriptions for the ZBC IN/OUT commands.

Add descriptions for ZBC Host Managed devices.

Add a new function, scsi_ata_pass() to do ATA passthrough over
SCSI. This will eventually replace scsi_ata_pass_16() -- it
can create the 12, 16, and 32-byte variants of the ATA
PASS-THROUGH command, and supports setting all of the
registers defined as of SAT-4, Revision 5 (March 11, 2016).

Change scsi_ata_identify() to use scsi_ata_pass() instead of
scsi_ata_pass_16().

Add a new scsi_ata_read_log() function to facilitate reading
ATA logs via SCSI.

sys/cam/scsi/scsi_all.h:
Add the new ATA PASS-THROUGH(32) command CDB. Add extended and
variable CDB opcodes.

Add Zoned Block Device Characteristics VPD page.

Add ATA Return SCSI sense descriptor.

Add prototypes for scsi_ata_read_log() and scsi_ata_pass().

sys/cam/scsi/scsi_da.c:
Revamp the da(4) driver to support zoned devices.

Add five new probe states, four of which are needed for ATA
devices.

Add five new sysctl variables that describe zone support and
parameters.

The da(4) driver supports SCSI ZBC devices, as well as ATA ZAC
devices when they are attached via a SCSI to ATA Translation (SAT)
layer. Since ZBC -> ZAC translation is a new feature in the T10
SAT-4 spec, most SATA drives will be supported via ATA commands
sent via the SCSI ATA PASS-THROUGH command. The da(4) driver will
prefer the ZBC interface, if it is available, for performance
reasons, but will use the ATA PASS-THROUGH interface to the ZAC
command set if the SAT layer doesn't support translation yet.
As I mentioned above, ZBC command support is untested.

Add support for the new BIO_ZONE bio, and all of its subcommands:
DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP,
DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS.

Add scsi_zbc_in() and scsi_zbc_out() CCB building functions.

Add scsi_ata_zac_mgmt_out() and scsi_ata_zac_mgmt_in() CCB/CDB
building functions. Note that these have return values, unlike
almost all other CCB building functions in CAM. The reason is
that they can fail, depending upon the particular combination
of input parameters. The primary failure case is if the user
wants NCQ, but fails to specify additional CDB storage. NCQ
requires using the 32-byte version of the SCSI ATA PASS-THROUGH
command, and the current CAM CDB size is 16 bytes.

sys/cam/scsi/scsi_da.h:
Add ZBC IN and ZBC OUT CDBs and opcodes.

Add SCSI Report Zones data structures.

Add scsi_zbc_in(), scsi_zbc_out(), scsi_ata_zac_mgmt_out(), and
scsi_ata_zac_mgmt_in() prototypes.

sys/dev/ahci/ahci.c:
Fix SEND / RECEIVE FPDMA QUEUED in the ahci(4) driver.

ahci_setup_fis() previously set the top bits of the sector count
register in the FIS to 0 for FPDMA commands. This is okay for
read and write, because the PRIO field is in the only thing in
those bits, and we don't implement that further up the stack.

But, for SEND and RECEIVE FPDMA QUEUED, the subcommand is in that
byte, so it needs to be transmitted to the drive.

In ahci_setup_fis(), always set the the top 8 bits of the
sector count register. We need it in both the standard
and NCQ / FPDMA cases.

sys/geom/eli/g_eli.c:
Pass BIO_ZONE commands through the GELI class.

sys/geom/geom.h:
Add g_io_zonecmd() prototype.

sys/geom/geom_dev.c:
Add new DIOCZONECMD ioctl, which allows sending zone commands to
disks.

sys/geom/geom_disk.c:
Add support for BIO_ZONE commands.

sys/geom/geom_disk.h:
Add a new flag, DISKFLAG_CANZONE, that indicates that a given
GEOM disk client can handle BIO_ZONE commands.

sys/geom/geom_io.c:
Add a new function, g_io_zonecmd(), that handles execution of
BIO_ZONE commands.

Add permissions check for BIO_ZONE commands.

Add command decoding for BIO_ZONE commands.

sys/geom/geom_subr.c:
Add DDB command decoding for BIO_ZONE commands.

sys/kern/subr_devstat.c:
Record statistics for REPORT ZONES commands. Note that the
number of bytes transferred for REPORT ZONES won't quite match
what is received from the harware. This is because we're
necessarily counting bytes coming from the da(4) / ada(4) drivers,
which are using the disk_zone.h interface to communicate up
the stack. The structure sizes it uses are slightly different
than the SCSI and ATA structure sizes.

sys/sys/ata.h:
Add many bit and structure definitions for ZAC, NCQ, and EPC
command support.

sys/sys/bio.h:
Convert the bio_cmd field to a straight enumeration. This will
yield more space for additional commands in the future. After
change r297955 and other related changes, this is now possible.
Converting to an enumeration will also prevent use as a bitmask
in the future.

sys/sys/disk.h:
Define the DIOCZONECMD ioctl.

sys/sys/disk_zone.h:
Add a new API for managing zoned disks. This is very close to
the SCSI ZBC and ATA ZAC standards, but uses integers in native
byte order instead of big endian (SCSI) or little endian (ATA)
byte arrays.

This is intended to offer to the complete feature set of the ZBC
and ZAC disk management without requiring the application developer
to include SCSI or ATA headers. We also use one set of headers
for ioctl consumers and kernel bio-level consumers.

sys/sys/param.h:
Bump __FreeBSD_version for sys/bio.h command changes, and inclusion
of SMR support.

usr.sbin/Makefile:
Add the zonectl utility.

usr.sbin/diskinfo/diskinfo.c
Add disk zoning capability to the 'diskinfo -v' output.

usr.sbin/zonectl/Makefile:
Add zonectl makefile.

usr.sbin/zonectl/zonectl.8
zonectl(8) man page.

usr.sbin/zonectl/zonectl.c
The zonectl(8) utility. This allows managing SCSI or ATA zoned
disks via the disk_zone.h API. You can report zones, reset write
pointers, get parameters, etc.

Sponsored by: Spectra Logic
Differential Revision: https://reviews.freebsd.org/D6147
Reviewed by: wblock (documentation)


# 298808 29-Apr-2016 pfg

sys/geom: spelling fixes in comments.

No functional change.


# 298439 21-Apr-2016 asomers

DRY on buffer sizes. Update to r298420.

sys/geom/geom_disk.c:
In disk_attr_changed, don't repeat a buffer size.

Reported by: ngie, hselasky
MFC after: 4 weeks
X-MFC-With: 298420
Sponsored by: Spectra Logic Corp


# 298420 21-Apr-2016 asomers

Notify userspace listeners when geom disk attributes have changed

sys/geom/geom_disk.c:
disk_attr_changed(): Generate a devctl event of type GEOM:<attr> for
every call.

MFC after: 4 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D5952


# 296605 10-Mar-2016 imp

Don't assume that bio_cmd is bit mask.

Differential Revision: https://reviews.freebsd.org/D5593


# 294042 14-Jan-2016 rpokala

Add rotationrate to geom disk dumpconf

Parse and report the nominal rotation rate reported by the drive.

Reviewed by: sbruno, jhb
Approved by: jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D4483
Requested by: Kevin Bowling < kevin.bowling @ kev009.com >


# 291742 04-Dec-2015 ken

Fix a style issue in g_disk_limit().

Noticed by: bdrewery
MFC after: 1 week


# 291741 04-Dec-2015 ken

Fix g_disk_vlist_limit() to work properly with deletes.

Add a new bp argument to g_disk_maxsegs(), and add a new function,
g_disk_maxsize() tha will properly determine the maximum I/O size for a
delete or non-delete bio.

Submitted by: will
MFC after: 1 week
Sponsored by: Spectra Logic


# 291716 03-Dec-2015 ken

Add asynchronous command support to the pass(4) driver, and the new
camdd(8) utility.

CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and
completed CCBs may be retrieved via the CAMIOGET ioctl. User
processes can use poll(2) or kevent(2) to get notification when
I/O has completed.

While the existing CAMIOCOMMAND blocking ioctl interface only
supports user virtual data pointers in a CCB (generally only
one per CCB), the new CAMIOQUEUE ioctl supports user virtual and
physical address pointers, as well as user virtual and physical
scatter/gather lists. This allows user applications to have more
flexibility in their data handling operations.

Kernel memory for data transferred via the queued interface is
allocated from the zone allocator in MAXPHYS sized chunks, and user
data is copied in and out. This is likely faster than the
vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in
configurations with many processors (there are more TLB shootdowns
caused by the mapping/unmapping operation) but may not be as fast
as running with unmapped I/O.

The new memory handling model for user requests also allows
applications to send CCBs with request sizes that are larger than
MAXPHYS. The pass(4) driver now limits queued requests to the I/O
size listed by the SIM driver in the maxio field in the Path
Inquiry (XPT_PATH_INQ) CCB.

There are some things things would be good to add:

1. Come up with a way to do unmapped I/O on multiple buffers.
Currently the unmapped I/O interface operates on a struct bio,
which includes only one address and length. It would be nice
to be able to send an unmapped scatter/gather list down to
busdma. This would allow eliminating the copy we currently do
for data.

2. Add an ioctl to list currently outstanding CCBs in the various
queues.

3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do
that.

4. Test physical address support. Virtual pointers and scatter
gather lists have been tested, but I have not yet tested
physical addresses or scatter/gather lists.

5. Investigate multiple queue support. At the moment there is one
queue of commands per pass(4) device. If multiple processes
open the device, they will submit I/O into the same queue and
get events for the same completions. This is probably the right
model for most applications, but it is something that could be
changed later on.

Also, add a new utility, camdd(8) that uses the asynchronous pass(4)
driver interface.

This utility is intended to be a basic data transfer/copy utility,
a simple benchmark utility, and an example of how to use the
asynchronous pass(4) interface.

It can copy data to and from pass(4) devices using any target queue
depth, starting offset and blocksize for the input and ouptut devices.
It currently only supports SCSI devices, but could be easily extended
to support ATA devices.

It can also copy data to and from regular files, block devices, tape
devices, pipes, stdin, and stdout. It does not support queueing
multiple commands to any of those targets, since it uses the standard
read(2)/write(2)/writev(2)/readv(2) system calls.

The I/O is done by two threads, one for the reader and one for the
writer. The reader thread sends completed read requests to the
writer thread in strictly sequential order, even if they complete
out of order. That could be modified later on for random I/O patterns
or slightly out of order I/O.

camdd(8) uses kqueue(2)/kevent(2) to get I/O completion events from
the pass(4) driver and also to send request notifications internally.

For pass(4) devcies, camdd(8) uses a single buffer (CAM_DATA_VADDR)
per CAM CCB on the reading side, and a scatter/gather list
(CAM_DATA_SG) on the writing side. In addition to testing both
interfaces, this makes any potential reblocking of I/O easier. No
data is copied between the reader and the writer, but rather the
reader's buffers are split into multiple I/O requests or combined
into a single I/O request depending on the input and output blocksize.

For the file I/O path, camdd(8) also uses a single buffer (read(2),
write(2), pread(2) or pwrite(2)) on reads, and a scatter/gather list
(readv(2), writev(2), preadv(2), pwritev(2)) on writes.

Things that would be nice to do for camdd(8) eventually:

1. Add support for I/O pattern generation. Patterns like all
zeros, all ones, LBA-based patterns, random patterns, etc. Right
Now you can always use /dev/zero, /dev/random, etc.

2. Add support for a "sink" mode, so we do only reads with no
writes. Right now, you can use /dev/null.

3. Add support for automatic queue depth probing, so that we can
figure out the right queue depth on the input and output side
for maximum throughput. At the moment it defaults to 6.

4. Add support for SATA device passthrough I/O.

5. Add support for random LBAs and/or lengths on the input and
output sides.

6. Track average per-I/O latency and busy time. The busy time
and latency could also feed in to the automatic queue depth
determination.

sys/cam/scsi/scsi_pass.h:
Define two new ioctls, CAMIOQUEUE and CAMIOGET, that queue
and fetch asynchronous CAM CCBs respectively.

Although these ioctls do not have a declared argument, they
both take a union ccb pointer. If we declare a size here,
the ioctl code in sys/kern/sys_generic.c will malloc and free
a buffer for either the CCB or the CCB pointer (depending on
how it is declared). Since we have to keep a copy of the
CCB (which is fairly large) anyway, having the ioctl malloc
and free a CCB for each call is wasteful.

sys/cam/scsi/scsi_pass.c:
Add asynchronous CCB support.

Add two new ioctls, CAMIOQUEUE and CAMIOGET.

CAMIOQUEUE adds a CCB to the incoming queue. The CCB is
executed immediately (and moved to the active queue) if it
is an immediate CCB, but otherwise it will be executed
in passstart() when a CCB is available from the transport layer.

When CCBs are completed (because they are immediate or
passdone() if they are queued), they are put on the done
queue.

If we get the final close on the device before all pending
I/O is complete, all active I/O is moved to the abandoned
queue and we increment the peripheral reference count so
that the peripheral driver instance doesn't go away before
all pending I/O is done.

The new passcreatezone() function is called on the first
call to the CAMIOQUEUE ioctl on a given device to allocate
the UMA zones for I/O requests and S/G list buffers. This
may be good to move off to a taskqueue at some point.
The new passmemsetup() function allocates memory and
scatter/gather lists to hold the user's data, and copies
in any data that needs to be written. For virtual pointers
(CAM_DATA_VADDR), the kernel buffer is malloced from the
new pass(4) driver malloc bucket. For virtual
scatter/gather lists (CAM_DATA_SG), buffers are allocated
from a new per-pass(9) UMA zone in MAXPHYS-sized chunks.
Physical pointers are passed in unchanged. We have support
for up to 16 scatter/gather segments (for the user and
kernel S/G lists) in the default struct pass_io_req, so
requests with longer S/G lists require an extra kernel malloc.

The new passcopysglist() function copies a user scatter/gather
list to a kernel scatter/gather list. The number of elements
in each list may be different, but (obviously) the amount of data
stored has to be identical.

The new passmemdone() function copies data out for the
CAM_DATA_VADDR and CAM_DATA_SG cases.

The new passiocleanup() function restores data pointers in
user CCBs and frees memory.

Add new functions to support kqueue(2)/kevent(2):

passreadfilt() tells kevent whether or not the done
queue is empty.

passkqfilter() adds a knote to our list.

passreadfiltdetach() removes a knote from our list.

Add a new function, passpoll(), for poll(2)/select(2)
to use.

Add devstat(9) support for the queued CCB path.

sys/cam/ata/ata_da.c:
Add support for the BIO_VLIST bio type.

sys/cam/cam_ccb.h:
Add a new enumeration for the xflags field in the CCB header.
(This doesn't change the CCB header, just adds an enumeration to
use.)

sys/cam/cam_xpt.c:
Add a new function, xpt_setup_ccb_flags(), that allows specifying
CCB flags.

sys/cam/cam_xpt.h:
Add a prototype for xpt_setup_ccb_flags().

sys/cam/scsi/scsi_da.c:
Add support for BIO_VLIST.

sys/dev/md/md.c:
Add BIO_VLIST support to md(4).

sys/geom/geom_disk.c:
Add BIO_VLIST support to the GEOM disk class. Re-factor the I/O size
limiting code in g_disk_start() a bit.

sys/kern/subr_bus_dma.c:
Change _bus_dmamap_load_vlist() to take a starting offset and
length.

Add a new function, _bus_dmamap_load_pages(), that will load a list
of physical pages starting at an offset.

Update _bus_dmamap_load_bio() to allow loading BIO_VLIST bios.
Allow unmapped I/O to start at an offset.

sys/kern/subr_uio.c:
Add two new functions, physcopyin_vlist() and physcopyout_vlist().

sys/pc98/include/bus.h:
Guard kernel-only parts of the pc98 machine/bus.h header with
#ifdef _KERNEL.

This allows userland programs to include <machine/bus.h> to get the
definition of bus_addr_t and bus_size_t.

sys/sys/bio.h:
Add a new bio flag, BIO_VLIST.

sys/sys/uio.h:
Add prototypes for physcopyin_vlist() and physcopyout_vlist().

share/man/man4/pass.4:
Document the CAMIOQUEUE and CAMIOGET ioctls.

usr.sbin/Makefile:
Add camdd.

usr.sbin/camdd/Makefile:
Add a makefile for camdd(8).

usr.sbin/camdd/camdd.8:
Man page for camdd(8).

usr.sbin/camdd/camdd.c:
The new camdd(8) utility.

Sponsored by: Spectra Logic
MFC after: 1 week


# 273638 25-Oct-2014 mav

Revert somewhat hackish geom_disk optimization, committed as part of r256880,
and the following r273143 commit, supposed to workaround introduced issue by
quite innocent-looking change.

While there is no clear understanding why, but r273143 is accused in data
corruption in some environments with high I/O load. I personally don't see
any problem in that commit, and possibly it is just a trigger to some other
bug somewhere, but better safe then sorry for now.

Requested by: scottl@
MFC after: 3 days


# 267992 28-Jun-2014 hselasky

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 267985 27-Jun-2014 gjb

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 267961 27-Jun-2014 hselasky

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# 266319 17-May-2014 mav

Make GEOM DISK to account also BIO_FLUSH operations.


# 265072 28-Apr-2014 bdrewery

Remove redundant include

MFC after: 3 days


# 264320 10-Apr-2014 bdrewery

Fix spelling error in g_trace() call.

Sponsored by: EMC / Isilon Storage Division
MFC after: 1 week


# 258683 27-Nov-2013 mav

Escape special XML chars, returned by some devices, confusing XML parsers.

MFC after: 1 month


# 257131 25-Oct-2013 jhb

Reject attempts to attack a disk device that has the old NEEDSGIANT
flag set.

Reviewed by: mav


# 256956 23-Oct-2013 smh

Improve ZFS N-way mirror read performance by using load and locality
information.

The existing algorithm selects a preferred leaf vdev based on offset of the zio
request modulo the number of members in the mirror. It assumes the devices are
of equal performance and that spreading the requests randomly over both drives
will be sufficient to saturate them. In practice this results in the leaf vdevs
being under utilized.

The new algorithm takes into the following additional factors:
* Load of the vdevs (number outstanding I/O requests)
* The locality of last queued I/O vs the new I/O request.

Within the locality calculation additional knowledge about the underlying vdev
is considered such as; is the device backing the vdev a rotating media device.

This results in performance increases across the board as well as significant
increases for predominantly streaming loads and for configurations which don't
have evenly performing devices.

The following are results from a setup with 3 Way Mirror with 2 x HD's and
1 x SSD from a basic test running multiple parrallel dd's.

With pre-fetch disabled (vfs.zfs.prefetch_disable=1):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s

With pre-fetch enabled (vfs.zfs.prefetch_disable=0):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s

In addition to the performance changes the code was also restructured, with
the help of Justin Gibbs, to provide a more logical flow which also ensures
vdevs loads are only calculated from the set of valid candidates.

The following additional sysctls where added to allow the administrator
to tune the behaviour of the load algorithm:
* vfs.zfs.vdev.mirror.rotating_inc
* vfs.zfs.vdev.mirror.rotating_seek_inc
* vfs.zfs.vdev.mirror.rotating_seek_offset
* vfs.zfs.vdev.mirror.non_rotating_inc
* vfs.zfs.vdev.mirror.non_rotating_seek_inc

These changes where based on work started by the zfsonlinux developers:
https://github.com/zfsonlinux/zfs/pull/1487

Reviewed by: gibbs, mav, will
MFC after: 2 weeks
Sponsored by: Multiplay


# 256884 22-Oct-2013 mav

Remove Giant-locked drivers support (DISKFLAG_NEEDSGIANT flag) from disk(9).

Since at least FreeBSD 7 we had only four of them in the base tree, and
in head branch, thanks to jhb@, we have no any for more then a year.


# 256880 22-Oct-2013 mav

Merge GEOM direct dispatch changes from the projects/camlock branch.

When safety requirements are met, it allows to avoid passing I/O requests
to GEOM g_up/g_down thread, executing them directly in the caller context.
That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid
several context switches per I/O.

The defined now safety requirements are:
- caller should not hold any locks and should be reenterable;
- callee should not depend on GEOM dual-threaded concurency semantics;
- on the way down, if request is unmapped while callee doesn't support it,
the context should be sleepable;
- kernel thread stack usage should be below 50%.

To keep compatibility with GEOM classes not meeting above requirements
new provider and consumer flags added:
- G_CF_DIRECT_SEND -- consumer code meets caller requirements (request);
- G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done);
- G_PF_DIRECT_SEND -- provider code meets caller requirements (done);
- G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request).
Capable GEOM class can set them, allowing direct dispatch in cases where
it is safe. If any of requirements are not met, request is queued to
g_up or g_down thread same as before.

Such GEOM classes were reviewed and updated to support direct dispatch:
CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE,
VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL,
MAP, FLASHMAP, etc).

To declare direct completion capability disk(9) KPI got new flag equivalent
to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. da(4) and ada(4) disk
drivers got it set now thanks to earlier CAM locking work.

This change more then twice increases peak block storage performance on
systems with manu CPUs, together with earlier CAM locking changes reaching
more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to
256 user-level threads).

Sponsored by: iXsystems, Inc.
MFC after: 2 months


# 256606 16-Oct-2013 mav

MFprojects/camlock r254907:
Move g_io_deliver() out of the lock, as required for direct dispatch.
Move g_destroy_bio() out too to reduce lock scope even more.


# 256603 16-Oct-2013 mav

MFprojects/camlock r254905:
Introduce new function devstat_end_transaction_bio_bt(), adding new argument
to specify present time. Use this function to move binuptime() out of lock,
substantially reducing lock congestion when slow timecounter is used.


# 254766 24-Aug-2013 mav

Add new attribute lunname to report only textual LUN-specific device IDs.
While lunid attribute prefers to report numeric ones, having both may be
useful in some situations.


# 252657 03-Jul-2013 smh

Bump disk(9) ABI version to signify the addition of d_delmaxsize by r249940.

Ensure that d_delmaxsize is always set, removing init to 0 which could cause
future issues if use cases change.

Allow kern.cam.da.X.delete_max (which maps to d_delmaxsize) to be increased
up to the calculated max after being reduced.

MFC after: 1 day
X-MFC-With: r249940


# 251654 12-Jun-2013 mav

Make CAM return and GEOM DISK pass through new GEOM::lunid attribute.

SPC-4 specification states that serial number may be property of device,
but not a specific logical unit. People reported about FC storages using
serial number in that way, making it unusable for purposes of LUN multipath
detection. SPC-4 states that designators associated with logical unit from
the VPD page 83h "Device Identification" should be used for that purpose.
Report first of them in the new attribute in such preference order: NAA,
EUI-64, T10 and SCSI name string.

While there, make GEOM DISK properly report GEOM::ident in XML output also
using d_getattr() method, if available. This fixes serial numbers reporting
for SCSI disks in `geom disk list` output and confxml.

Discussed with: gibbs, ken
Sponsored by: iXsystems, Inc.
MFC after: 2 weeks


# 251616 11-Jun-2013 mav

Don't update provider properties and don't set DISKFLAG_OPEN if d_open()
disk method call returned error. GEOM considers devices in such case as
still closed, and won't call symmetric d_close() for them.


# 249940 26-Apr-2013 smh

Teach GEOM and CAM about the difference between the max "size" of r/w and delete
requests.

sys/geom/geom_disk.h:
- Added d_delmaxsize which represents the maximum size of individual
device delete requests in bytes. This can be used by devices to
inform geom of their size limitations regarding delete operations
which are generally different from the read / write limits as data
is not usually transferred from the host to physical device.

sys/geom/geom_disk.c:
- Use new d_delmaxsize to calculate the size of chunks passed through to
the underlying strategy during deletes instead of using read / write
optimised values. This defaults to d_maxsize if unset (0).

- Moved d_maxsize default up so it can be used to default d_delmaxsize

sys/cam/ata/ata_da.c:
- Added d_delmaxsize calculations for TRIM and CFA

sys/cam/scsi/scsi_da.c:
- Added re-calculation of d_delmaxsize whenever delete_method is set.

- Added kern.cam.da.X.delete_max sysctl which allows the max size for
delete requests to be limited. This is useful in preventing timeouts
on devices who's delete methods are slow. It should be noted that
this limit is reset then the device delete method is changed and
that it can only be lowered not increased from the device max.

Reviewed by: mav
Approved by: pjd (mentor)


# 249507 15-Apr-2013 ivoras

Introduce a symbol for the GEOM class name instead of using the ad-hoc string
constant.


# 249161 05-Apr-2013 mav

Following r241022, replace iteration over the provider list on media events
by taking first one and asserting that there is no others.

MFC after: 1 week


# 248694 25-Mar-2013 mav

In GEOM DISK:
- Replace single done mutex with per-disk ones. On system with several
disks on several HBAs that removes small, but measurable lock congestion.
- Modify disk destruction process to not destroy the mutex prematurely.
- Remove some extra pointer derefences.


# 248516 19-Mar-2013 kib

A flag for the geom disk driver to indicate that it accepts the
unmapped i/o requests.

Sponsored by: The FreeBSD Foundation
Tested by: pho


# 242322 29-Oct-2012 trasz

Fix locking problem in disk_resize(); previously it would run without
topology lock, resulting in assertion when running with DIAGNOSTIC.

Reviewed by: mav (earlier version)


# 241022 28-Sep-2012 pjd

Remove the topology lock from disk_gone(), it might be called with regular
mutexes held and the topology lock is an sx lock.

The topology lock was there to protect traversing through the list of providers
of disk's geom, but it seems that disk's geom has always exactly one provider.

Change the code to call g_wither_provider() for this one provider, which is
safe to do without holding the topology lock and assert that there is indeed
only one provider.

Discussed with: ken
MFC after: 1 week


# 240822 22-Sep-2012 pjd

Use the topology lock to protect list of providers while withering them.
It is possible that provider is destroyed while we are iterating over the
list.

Reported by: Brian Parkison <parkison@panzura.com>
Discussed with: phk
MFC after: 1 week


# 240629 18-Sep-2012 avg

g_disk_flushcache definitely should not be traced under G_T_TOPOLOGY

... use G_T_BIO instead

MFC after: 1 week


# 239790 28-Aug-2012 ed

Remove unneeded G_PF_CANDELETE flag.

This flag is only used by GEOM so it can be propagated to the character
device's SI_CANDELETE. Unfortunately, SI_CANDELETE seems to do nothing.


# 238886 29-Jul-2012 mav

Implement media change notification for DA and CD removable media devices.
It includes three parts:
1) Modifications to CAM to detect media media changes and report them to
disk(9) layer. For modern SATA (and potentially UAS) devices it utilizes
Asynchronous Notification mechanism to receive events from hardware.
Active polling with TEST UNIT READY commands with 3 seconds period is used
for incapable hardware. After that both CD and DA drivers work the same way,
detecting two conditions: "NOT READY: Medium not present" after medium was
detected previously, and "UNIT ATTENTION: Not ready to ready change, medium
may have changed". First one reported to disk(9) as media removal, second
as media insert/change. To reliably receive second event new
AC_UNIT_ATTENTION async added to make UAs broadcasted to all periphs by
generic error handling code in cam_periph_error().
2) Modifications to GEOM core to handle media remove and change events.
Media removal handled by spoiling all consumers attached to the provider.
Media change event also schedules provider retaste after spoiling to probe
new media. New flag G_CF_ORPHAN was added to consumers to reflect that
consumer is in process of destruction. It allows retaste to create new
geom instance of the same class, while previous one is still dying.
3) Modifications to some GEOM classes: DEV -- to report media change
events to devd; VFS -- to handle spoiling same as orphan to prevent
accessing replaced media. PART class already handles spoiling alike to
orphan.

Reviewed by: silence on geom@ and scsi@
Tested by: avg
Sponsored by: iXsystems, Inc. / PC-BSD
MFC after: 2 months


# 238216 07-Jul-2012 trasz

Add disk_resize(), to make it possible for the disk drivers such as da(4)
to notify GEOM about LUN size change.

Reviewed by: mav (earlier version)
Sponsored by: FreeBSD Foundation


# 237648 27-Jun-2012 ken

In g_disk_providergone(), don't continue if the softc is NULL. This may be
the case if we've already gone through g_disk_destroy().

Reported by: Michael Butler <imb@protected-networks.net>
MFC after: 3 days


# 237518 24-Jun-2012 ken

Fix a bug which causes a panic in daopen(). The panic is caused by
a da(4) instance going away while GEOM is still probing it.

In this case, the GEOM disk class instance has been created by
disk_create(), and the taste of the disk is queued in the GEOM
event queue.

While that event is queued, the da(4) instance goes away. When the
open call comes into the da(4) driver, it dereferences the freed
(but non-NULL) peripheral pointer provided by GEOM, which results
in a panic.

The solution is to add a callback to the GEOM disk code that is
called when all of its resources are cleaned up. This is
implemented inside GEOM by adding an optional callback that is
called when all consumers have detached from a provider, and the
provider is about to be deleted.

scsi_cd.c,
scsi_da.c: In the register routine for the cd(4) and da(4)
routines, acquire a reference to the CAM peripheral
instance just before we call disk_create().

Use the new GEOM disk d_gone() callback to register
a callback (dadiskgonecb()/cddiskgonecb()) that
decrements the peripheral reference count once GEOM
has finished cleaning up its resources.

In the cd(4) driver, clean up open and close
behavior slightly. GEOM makes sure we only get one
open() and one close call, so there is no need to
set an open flag and decrement the reference count
if we are not the first open.

In the cd(4) driver, use cam_periph_release_locked()
in a couple of error scenarios to avoid extra mutex
calls.

geom.h: Add a new, optional, providergone callback that
is called when a provider is about to be deleted.

geom_disk.h: Add a new d_gone() callback to the GEOM disk
interface.

Bump the DISK_VERSION to version 2. This probably
should have been done after a couple of previous
changes, especially the addition of the d_getattr()
callback.

geom_disk.c: Add a providergone callback for the disk class,
g_disk_providergone(), that calls the user's
d_gone() callback if it exists.

Bump the DISK_VERSION to 2.

geom_subr.c: In g_destroy_provider(), call the providergone
callback if it has been provided.

In g_new_geomf(), propagate the class's
providergone callback to the new geom instance.

blkfront.c: Callers of disk_create() are supposed to pass in
DISK_VERSION, not an explicit disk API version
number. Update the blkfront driver to do that.

disk.9: Update the disk(9) man page to include information
on the new d_gone() callback, as well as the
previously added d_getattr() callback, d_descr
field, and HBA PCI ID fields.

MFC after: 5 days


# 227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 226737 25-Oct-2011 pjd

Allow upper layers to discover than BIO_DELETE and/or BIO_FLUSH is not
supported by returning EOPNOTSUPP instead of 0 or ENODEV.

MFC after: 3 days


# 226736 25-Oct-2011 pjd

Improve style a bit.

MFC after: 3 days


# 226735 25-Oct-2011 pjd

Simplify disk_alloc().

MFC after: 3 days


# 223921 11-Jul-2011 ae

Include sys/sbuf.h directly.

Reviewed by: pjd


# 223089 14-Jun-2011 gibbs

Plumb device physical path reporting from CAM devices, through GEOM and
DEVFS, and make it accessible via the diskinfo utility.

Extend GEOM's generic attribute query mechanism into generic disk consumers.
sys/geom/geom_disk.c:
sys/geom/geom_disk.h:
sys/cam/scsi/scsi_da.c:
sys/cam/ata/ata_da.c:
- Allow disk providers to implement a new method which can override
the default BIO_GETATTR response, d_getattr(struct bio *). This
function returns -1 if not handled, otherwise it returns 0 or an
errno to be passed to g_io_deliver().

sys/cam/scsi/scsi_da.c:
sys/cam/ata/ata_da.c:
- Don't copy the serial number to dp->d_ident anymore, as the CAM XPT
is now responsible for returning this information via
d_getattr()->(a)dagetattr()->xpt_getatr().

sys/geom/geom_dev.c:
- Implement a new ioctl, DIOCGPHYSPATH, which returns the GEOM
attribute "GEOM::physpath", if possible. If the attribute request
returns a zero-length string, ENOENT is returned.

usr.sbin/diskinfo/diskinfo.c:
- If the DIOCGPHYSPATH ioctl is successful, report physical path
data when diskinfo is executed with the '-v' option.

Submitted by: will
Reviewed by: gibbs
Sponsored by: Spectra Logic Corporation

Add generic attribute change notification support to GEOM.

sys/sys/geom/geom.h:
Add a new attrchanged method field to both g_class
and g_geom.

sys/sys/geom/geom.h:
sys/geom/geom_event.c:
- Provide the g_attr_changed() function that providers
can use to advertise attribute changes.
- Perform delivery of attribute change notifications
from a thread context via the standard GEOM event
mechanism.

sys/geom/geom_subr.c:
Inherit the attrchanged method from class to geom (class instance).

sys/geom/geom_disk.c:
Provide disk_attr_changed() to provide g_attr_changed() access
to consumers of the disk API.

sys/cam/scsi/scsi_pass.c:
sys/cam/scsi/scsi_da.c:
sys/geom/geom_dev.c:
sys/geom/geom_disk.c:
Use attribute changed events to track updates to physical path
information.

sys/cam/scsi/scsi_da.c:
Add AC_ADVINFO_CHANGED to the registered asynchronous CAM
events for this driver. When this event occurs, and
the updated buffer type references our physical path
attribute, emit a GEOM attribute changed event via the
disk_attr_changed() API.

sys/cam/scsi/scsi_pass.c:
Add AC_ADVINFO_CHANGED to the registered asynchronous CAM
events for this driver. When this event occurs, update
the physical patch devfs alias for this pass instance.

Submitted by: gibbs
Sponsored by: Spectra Logic Corporation


# 222652 03-Jun-2011 mav

Update disk's stripesize and stripeoffset parameters on provider open.
They are media-dependent and may change in run-time, same as sectorsize
and/or mediasize.

SCSI devices return physical sector size and offset via READ CAPACITY(16)
command and so can not report it until media inserted or at least until
probe sequence completed. UNMAP support is also reported there.


# 219970 24-Mar-2011 mav

MFgraid/head r218212, r218257:
Introduce new type of BIO_GETATTR -- GEOM::setstate, used to inform lower
GEOM about state of it's providers from the point of upper layers.
Make geom_disk use led(4) subsystem to illuminate states in such fashion:
FAILED - "1" (on), REBUILD - "f5" (slow blink), RESYNC - "f1" (fast blink),
ACTIVE - "0" (off).
LED name should be set for each disk via kern.geom.disk.%s.led sysctl.
Later disk API could be extended to allow disk driver to report this info
in custom way via it's own facilities.


# 219950 24-Mar-2011 mav

MFgraid/head r217827:
Change BIO_GETATTR("GEOM::kerneldump") API to make set_dumper() called by
consumer (geom_dev) instead of provider (geom_disk). This allows any geom
insert it's code into the dump call chain, implementing more sophisticated
functionality then just disk partitioning.


# 219056 26-Feb-2011 nwhitehorn

Add the disk ident and a human-meaningful description (here, the disk model
string) to the geom_disk config XML so that they are easily accessible from
userland.

MFC after: 1 week


# 217915 26-Jan-2011 mdf

Remove the CTLFLAG_NOLOCK as it seems to be both unused and
unfunctional. Wiring the user buffer has only been done explicitly
since r101422.

Mark the kern.disks sysctl as MPSAFE since it is and it seems to have
been mis-using the NOLOCK flag.

Partially break the KPI (but not the KBI) for the sysctl_req 'lock'
field since this member should be private and the "REQ_LOCKED" state
seems meaningless now.


# 216794 29-Dec-2010 kib

Add reporting of GEOM::candelete BIO_GETATTR for md(4) and geom_disk(4).
Non-zero value of attribute means that device supports BIO_DELETE.

Suggested and reviewed by: pjd
Tested by: pho
MFC after: 1 week


# 210471 25-Jul-2010 mav

Export PCI IDs of ATA/SATA controllers through CAM and ata(4) layers to
GEOM. This information needed for proper soft-RAID's on-disk metadata
reading and writing.


# 196823 04-Sep-2009 pjd

Simplify g_disk_ident_adjust() function and allow any printable character
in serial number.

Discussed with: trasz
Obtained from: Wheel Sp. z o.o. (http://www.wheel.pl)


# 190878 10-Apr-2009 thompsa

Revert r190676,190677

The geom and CAM changes for root_hold are the wrong solution for USB design
quirks.

Requested by: scottl


# 190677 03-Apr-2009 thompsa

Add interleaving root hold tokens from the CAM probe to disk_create and geom
provider tasting. This is needed for disk attachments that happen after threads
are running in the boot process.

Tested by: rnoland


# 184499 31-Oct-2008 kib

Revert r184136. Instead, push the check for crashdumpmap overflow into the
MD i386 and amd64 dump code.

Requested by: jhb
Retested by: pho
MFC after: 3 days (+ 176304 + 184136)


# 184136 21-Oct-2008 kib

Do not overflow crashdumpmap.

Reported and tested by: pho
Reviewed by: jhb
MFC after: 1 week


# 181463 09-Aug-2008 des

Add sbuf_new_auto as a shortcut for the very common case of creating a
completely dynamic sbuf.

Obtained from: Varnish
MFC after: 2 weeks


# 176304 15-Feb-2008 scottl

Teach the dump and minidump code to respect the maxioszie attribute of
the disk; the hard-coded assumption of 64K doesn't work in all cases.


# 169289 05-May-2007 pjd

Allow to use ':' in d_ident, which is quite handy character.


# 169287 05-May-2007 pjd

Because there are many strange hardware out there, allow to use only
[a-zA-Z0-9-_@#%.] characters in d_ident field.


# 169285 05-May-2007 pjd

- Extend disk structure to allow to store disk's serial number, which can be
retrieved via GEOM::ident attribute.
- Bump disk(9) ABI version.

OK'ed by: phk


# 166861 21-Feb-2007 n_hibma

Reduce the noise when plugging in (USB) mass storage devices, like a 4 port
flash card reader.
Also remove an 'Opened da0 -> <random number>' which is not needed on a daily
basis (available through bootverbose).

Reviewed by: phk, ken
MFC after: 1 week


# 163833 31-Oct-2006 pjd

Add a new disk flag - DISKFLAG_CANFLUSHCACHE, which indicates that the disk
can handle BIO_FLUSH requests.

Sponsored by: home.pl


# 157619 10-Apr-2006 marcel

Add g_wither_provider() to abstract the details of destroying a
particular provider. Use this function where g_orphan_provider()
is being called so that the flags are updated correctly and
g_orphan_provider() is called only when allowed.


# 152565 18-Nov-2005 jdp

Fix a bug that caused some /dev entries to continue to exist after
the underlying drive had been hot-unplugged from the system. Here
is a specific example. Filesystem code had opened /dev/da1s1e.
Subsequently, the drive was hot-unplugged. This (correctly) caused
all of the associated /dev/da1* entries to be deleted. When the
filesystem later realized that the drive was gone it closed the
device, reducing the write-access counts to 0 on the geom providers
for da1s1e, da1s1, and da1. This caused geom to re-taste the
providers, resulting in the devices being created again. When the
drive was hot-plugged back in, it resulted in duplicate /dev entries
for da1s1e, da1s1, and da1.

This fix adds a new disk_gone() function which is called by CAM when a
drive goes away. It orphans all of the providers associated with the
drive, setting an error condition of ENXIO in each one. In addition,
we prevent a re-taste on last close for writing if an error condition
has been set in the provider.

Sponsored by: Isilon Systems
Reviewed by: phk
MFC after: 1 week


# 150759 30-Sep-2005 tegge

Move some devstat collection to below where large IO operations are chopped
up. This make iostat report operations passed down to the device driver
instead of operations passed down to GEOM disk. The transfer size limit
imposed by the device driver is no longer hidden, improving the correlation
between iostat output and device driver workload.


# 143791 18-Mar-2005 phk

After rejecting the bio request early, return instead of panicing.

Found by: Coverity (ID#450)


# 141624 10-Feb-2005 phk

Make various random things static


# 140968 29-Jan-2005 phk

When dumping to a unpartitioned disk, make sure to chop the
length of the dump area accordingly.

Run into by: scottl


# 140261 14-Jan-2005 phk

CAM will sometimes remove a disk again even before it finished being
initialized. We already cancel the pending events but we need to not
dereference the geom pointer which never got set different from NULL.


# 138732 12-Dec-2004 phk

Pass the file->flags down to geom ioctl handlers.

Reject certain ioctls if write permission is not indicated.

Bump geom API version.

Reported by: Ruben de Groot <mail25@bzerk.org>


# 133318 08-Aug-2004 phk

Tag all geom classes in the tree with a version number.


# 133314 08-Aug-2004 phk

Use default method initialization on geoms.


# 131267 29-Jun-2004 phk

Fix regression in last commit.


# 131207 27-Jun-2004 phk

Make sure to kill the devstat entry for disappearing disks.

PR: 68074
Submitted by: Hendrik Scholz <hscholz@raisdorf.net>


# 129877 30-May-2004 phk

Zap a redundant NULL


# 129116 11-May-2004 sos

Dont try to finish devstat's if the disk pointer is NULL, this can happen
when a disk has been destroyed but still has outstanding bio's.

Reviewed by: phk


# 125975 18-Feb-2004 phk

Change the disk(9) API in order to make device removal more robust.

Previously the "struct disk" were owned by the device driver and this
gave us problems when the device disappared and the users of that device
were not immediately disappearing.

Now the struct disk is allocate with a new call, disk_alloc() and owned
by geom_disk and just abandonned by the device driver when disk_create()
is called.

Unfortunately, this results in a ton of "s/\./->/" changes to device
drivers.

Since I'm doing the sweep anyway, a couple of other API improvements
have been carried out at the same time:

The Giant awareness flag has been flipped from DISKFLAG_NOGIANT to
DISKFLAG_NEEDSGIANT

A version number have been added to disk_create() so that we can detect,
report and ignore binary drivers with old ABI in the future.

Manual page update to follow shortly.


# 125651 10-Feb-2004 phk

don't call sbuf_clear() right after sbuf_new(), it is not necessary.


# 125539 06-Feb-2004 pjd

Allow decreasing access count even if there is no disk anymore.
This will allow closing disks that were removed while opened.

Approved by: phk, scottl (mentor)


# 124864 23-Jan-2004 phk

Remove no longer necessary debug printfs


# 123271 07-Dec-2003 truckman

Correct usage of mtx_init() API. This is not a functional change since
the code happened to work because MTX_DEF and NULL are both defined as 0.

Reviewed by: phk


# 121216 18-Oct-2003 phk

Retire bio_blkno entirely.

bio_offset is the field drivers should use.
bio_pblkno remains as a convenient place to store the number of
the device drivers.


# 120572 29-Sep-2003 phk

Return ENODEV in case the driver has no dump routine.


# 120374 23-Sep-2003 phk

Be more careful in dumpconf: softc may be NULL for departing devices.

Allow drivers to initialize the d_devstat if they want magic params.


# 119660 01-Sep-2003 phk

Simplify the ioctl handling in GEOM.

This replaces the current ioctl processing with a direct call path
from geom_dev() where the ioctl arrives (from SPECFS) to any directly
connected GEOM class.

The inverse of the above is no longer supported. This is the
situation were you have one or more intervening GEOM classes, for
instance a BSDlabel on top of a MBR or PC98. If you want to issue
MBR or PC98 specific ioctls, you will need to issue them on a MBR
or PC98 providers.

This paves the way for inviting CD's, FD's and other special cases
inside GEOM.


# 119652 01-Sep-2003 phk

Try to close the race between disk_destroy() and a subsequent disk_create().


# 116196 11-Jun-2003 obrien

Use __FBSDID().

Approved by: phk


# 115473 31-May-2003 phk

Introduce a init and fini member functions on a class.

Use ->init() and ->fini() to handle the mutex in geom_disk.c

Remove the g_add_class() function and replace it with a standardized
g_modevent() function.

This adds the basic infrastructure for loading/unloading GEOM classes


# 115468 31-May-2003 phk

Remove the G_CLASS_INITIALIZER, we do not need it anymore.


# 115309 25-May-2003 phk

Don't do silly thing if the disk_create() event gets canceled.

Approved by: re/scottl


# 115214 21-May-2003 phk

Return ENXIO if the softc pointer is NULL, in all likelyhood the
disk is in the process of disappearing.

Approved by: re/rwats*


# 114958 12-May-2003 phk

When a disk disappears, destroy the class from the event thread
to avoid race condtion.

Approved by: re/rwatson


# 114498 02-May-2003 phk

Use g_wither_geom() for cleanup.


# 113940 23-Apr-2003 phk

Introduce a g_waitfor_event() function which posts an event and waits for
it to be run (or cancelled) and use this instead of home-rolled versions.


# 113937 23-Apr-2003 phk

Rename g_call_me() to g_post_event(), and give it a flag
argument to determine if we can M_WAITOK in malloc.


# 113032 03-Apr-2003 phk

Remove all references to BIO_SETATTR. We will not be using it.


# 112989 02-Apr-2003 phk

Add handling for cancelled events in the g_call_me() methods.


# 112988 02-Apr-2003 phk

Change events to have an array of "void *" references, and give the
event posting functions varargs to fill these.

Attribute g_call_me() to appropriate g_geom's where necessary.

Add a flag argument to g_call_me() methods which will be used to signal
cancellation of events in the future.

This commit should be a no-op.


# 112952 01-Apr-2003 phk

Include <geom/geom_disk.h> not <sys/disk.h>


# 112708 27-Mar-2003 phk

Check return value of g_call_me()


# 112552 24-Mar-2003 phk

Premptively change initializations of struct g_class to use C99
sparse struct initializations before we extend the struct with
new OAM related member functions.


# 112476 21-Mar-2003 phk

Mitigate deadlock situation pending a more complete solution.


# 112367 18-Mar-2003 phk

Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a newsted include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.


# 112259 15-Mar-2003 phk

Use devstat_{start,end}_transaction_bio().
Remember to set bio_resid correctly first.


# 112002 08-Mar-2003 phk

Allocate devstat structure with devstat_new_entry().


# 111979 08-Mar-2003 phk

Centralize the devstat handling for all GEOM disk device drivers
in geom_disk.c.

As a side effect this makes a lot of #include <sys/devicestat.h>
lines not needed and some biofinish() calls can be reduced to
biodone() again.


# 111668 28-Feb-2003 phk

NO_GEOM cleanup:

Retire the "dev_t" centric version of the disk mini-layer.
Remove now unneeded linkage field in dev_t and struct disk.


# 111220 21-Feb-2003 phk

NO_GEOM cleanup:

Retire the "d_dump_t" and use the "dumper_t" type instead.

Dumper_t takes a void * as first arg which is more general than the
dev_t taken by d_dump_t. (Remember: we could have net-dumpers if
somebody wrote us one!)

Define the convention for GEOM controlled disk devices to be that the
first argument to the dumper function is the struct disk pointer.

Change device drivers accordingly.


# 111216 21-Feb-2003 phk

NO_GEOM cleanup:

Change the argument to disk_destroy() to be the same struct disk * as
disk_create() takes.

This enables drivers to ignore the (now) bogus dev_t which disk_create()
returns.


# 111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 110766 12-Feb-2003 tegge

Correctly set bio_data in cloned children when cutting up large requests.


# 110727 11-Feb-2003 phk

Check disk->d_maxsize/dev->si_iosize_max at open time rather than in strategy.

Printf a warning and use DFLTPHYS if the drive has not set a size.


# 110720 11-Feb-2003 phk

Make a mutex to stop the race coming into geom_disk's done routine.

Cut up requests into smaller bits if they are longer than the drivers
disk->d_maxsize or dev->si_iosize_max.

Properly handle the race condition when using g_clone_bio() is used
without having the single-threadedness of g_down/g_up secure locking.


# 110710 11-Feb-2003 phk

Better names for struct disk elements: d_maxsize, d_stripeoffset
and d_stripesisze;

Introduce si_stripesize and si_stripeoffset in struct cdev so we
can make the visible to clustering code.

Add stripesize and stripeoffset to providers.

DTRT with stripesize and stripeoffset in various places in GEOM.


# 110708 11-Feb-2003 phk

Propagate DISKFLAG_CANDELETE from struct disk to G_PF_CANDELETE on the
provider.


# 110477 06-Feb-2003 phk

Check return value of g_clone_bio().


# 110475 06-Feb-2003 phk

Experimentally don't let go of Giant in geom_disk's done.
We may actually be increasing Giant contention doing so because the
actual stuff we do is very cheap.

Also I am not convinced there is not a tiny window for a race here.


# 110419 05-Feb-2003 phk

Implement the new "struct disk" centered API for device drivers.

This commit should not change anything as no device drivers use the
new API yet.


# 110317 04-Feb-2003 phk

Pave the road to removing the fixed size limit on device nodes:

Change the si_name of dev_t's to be a char * and put a private buffer for
holding the name at then end of the struct.

Initialize si_name to point to the private buffer.

Put a KASSERT in geom_disk to prevent overrun on the fake dev_t we still
have to generate for the disk_drivers.


# 110230 02-Feb-2003 phk

Add a bio_disk pointer for use between geom_disk and the device drivers.


# 110119 30-Jan-2003 phk

Add some agility to the disk_create() API:

Make passing the methods in a cdevsw structure optional.

Move "CANFREE" and "NOGIANT" flags into struct disk instead of the
cdevsw which may or may not be there.

Rename CANFREE to CANDELETE to match BIO_DELETE operation.

Add "OPEN" flag so drivers don't have to provide open/close methods
just to maintain such a flag.

Add temporary stopgap include of <sys/conf.h> to <sys/disk.h> until
the files which have them in the other order are fixed.

Add KASSERTS to make sure we don't get fed too many NULL pointers.

Clear our geom's softc pointer before we wither.


# 110118 30-Jan-2003 phk

NO_GEOM cleanup: Remove sys/disklabel.h include.


# 110116 30-Jan-2003 phk

NO_GEOM cleanup: retire disk_invalidate()


# 110081 30-Jan-2003 phk

NO_GEOM cleanup: Mark the last arg to disk_create() as unused.


# 110052 29-Jan-2003 phk

Add code to repsect the D_NOGIANT flag, should the disk device driver set it.
NO_GEOM cleanup: remove ifdefs.

Still untested.


# 109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 109563 20-Jan-2003 phk

disk_dev_synth() is a NO_GEOM hack.


# 109560 20-Jan-2003 phk

Remove need for <sys/diskslice.h> but retain numerical compatibilty just in case.


# 107953 16-Dec-2002 phk

Constification and some s/int/u_int/ changes.


# 106101 28-Oct-2002 phk

Add the remaning part of the new libdisk interaction.

WARNING: This is not a published interface, it is a stopgap measure for
WARNING: libdisk so we can get 5.0-R out of the door.

Sponsored by: DARPA & NAI Labs


# 105957 25-Oct-2002 phk

Reduce the GEOM verbosity under bootverbose to something more sufferable.
This is not quite the set of information I would want, but the tree where
I have the "correct" version is messed up with conflicts.

Sponsored by: DARPA & NAI Labs.


# 105551 20-Oct-2002 phk

Now that the sectorsize and mediasize are properties of the provider,
don't take the detour over the I/O path to discover them using getattr(),
we can just pick them out directly.

Do note though, that for now they are only valid after the first open
of the underlying disk device due compatibility with the old disk_create()
API. This will change in the future so they will always be valid.

Sponsored by: DARPA & NAI Labs.


# 105542 20-Oct-2002 phk

Make the sectorsize a property of providers so we can include it in the XML
output.

Sponsored by: DARPA & NAI Labs


# 105539 20-Oct-2002 phk

It makes more sense for the fwheads and fwsectors properties to be in
the provider stanza rather than the geom stanza.


# 105537 20-Oct-2002 phk

Include fwsectors and gfwheads in the XML output for the disks we know.

Sponsored by: DARPA & NAI Labs.


# 105350 17-Oct-2002 phk

NUL terminate sysctl kern.disks


# 104936 11-Oct-2002 phk

The CAM system has it's own ideas of what locks are to be held by whom.
So do GEOM. Not a pretty sight.

Take all the interesting stuff out of GEOM::disk_create(), and leave just
the creation of the fake dev_t. Schedule the topology munging to happen
in the g_event thread with g_call_me().

This makes disk_create() pretty lock-agnostic, almost lock-atheist.

Tripped over by: peter
Sponsored by: DARPA & NAI Labs


# 104609 07-Oct-2002 phk

Correctly deal with non-DEVBSIZE drives.
Allow BIO_DELETE through too.

This fixes swap-backed md(4) devices.

Sponsored by: DARPA & NAI Labs.


# 104602 07-Oct-2002 phk

Copyin and copyout are only possible from a process-native thread,
and therefore we need a way for ioctl handlers to run in that thread
in GEOM. Rather than invent a complicated registration system to
recognize which ioctl handler to use for a given ioctl, we still
schedule all ioctls down the tree as bio transactions but add a
special return code that means "call me directly" and have the
geom_dev layer do that.

Use this for all ioctls that make it as far as a diskdriver to
avoid any backwards compatibility problems.

Requested by: scottl
Sponsored by: DARPA & NAI Labs


# 104542 05-Oct-2002 phk

This patch got lost in my trees: Pass setattr down to device drivers
as well.

Detected by: scottl
Sponsored by: DARPA & NAI Labs.


# 104519 05-Oct-2002 phk

NB: This commit does *NOT* make GEOM the default in FreeBSD
NB: But it will enable it in all kernels not having options "NO_GEOM"

Put the GEOM related options into the intended order.

Add "options NO_GEOM" to all kernel configs apart from NOTES.

In some order of controlled fashion, the NO_GEOM options will be
removed, architecture by architecture in the coming days.

There are currently three known issues which may force people to
need the NO_GEOM option:

boot0cfg/fdisk:
Tries to update the MBR while it is being used to control
slices. GEOM does not allow this as a direct operation.

SCSI floppy drives:
Appearantly the scsi-da driver return "EBUSY" if no media
is inserted. This is wrong, it should return ENXIO.

PC98:
It is unclear if GEOM correctly recognizes all variants of
PC98 disklabels. (Help Wanted! I have neither docs nor HW)

These issues are all being worked.

Sponsored by: DARPA & NAI Labs.


# 104451 04-Oct-2002 phk

Implement the "kern.disks" sysctl in GEOM.

This makes "mdconfig -l" work again.

Sponsored by: DARPA & NAI Labs.


# 104450 04-Oct-2002 phk

Properly conditionalize a debugging printf.

Sponsored by: DARPA & NAI Labs.


# 104312 01-Oct-2002 phk

Don't restrict device drivers ability to sleep in the ioctl method, this
is actually entirely legal.

Do bio's with ioctls in them in a g_call_me() function.

Sponsored by: DARPA & NAI Labs


# 104195 30-Sep-2002 phk

Retire g_io_fail() and let g_io_deliver() take an error argument instead.

Sponsored by: DARPA & NAI Labs.


# 104087 28-Sep-2002 phk

Style, whitespace and lint fixes.

Sponsored by: DARPA & NAI Labs.


# 103714 20-Sep-2002 phk

(This commit touches about 15 disk device drivers in a very consistent
and predictable way, and I apologize if I have gotten it wrong anywhere,
getting prior review on a patch like this is not feasible, considering
the number of people involved and hardware availability etc.)

If struct disklabel is the messenger: kill the messenger.

Inside struct disk we had a struct disklabel which disk drivers used to
communicate certain metrics to the disklayer above (GEOM or the disk
mini-layer). This commit changes this communication to use four
explicit fields instead.

Amongst the benefits is that the fields do not get overwritten by
wrong or bogus on-disk disklabels.

Once that is clear, <sys/disk.h> which is included in the drivers
no longer need to pull <sys/disklabel.h> and <sys/diskslice.h> in,
the few places that needs them, have gotten explicit #includes for
them.

The disklabel inside struct disk is now only for internal use in
the disk mini-layer, so instead of embedding it, we malloc it as
we need it.

This concludes (modulus any mistakes) the series of disklabel related
commits.

I belive it all amounts to a NOP for all the rest of you :-)

Sponsored by: DARPA & NAI Labs.


# 103284 13-Sep-2002 phk

"Fix" printf format issues by using %j

Sponsored by: DARPA & NAI Labs.


# 98066 09-Jun-2002 phk

Improve some on the naming.

Submitted by: iedowse


# 97317 26-May-2002 phk

Give the closet-dev_t we hand to the diskdrivers a name.


# 97075 21-May-2002 phk

Remove the "-class" suffix from classes, they will not be ambiguous.

Sponsored by: DARPA & NAI Labs.


# 95038 19-Apr-2002 phk

Make kernel dumps work with GEOM.

Notice that if the device on which the dump is set is destroyed for
any reason, the dump setting is lost. This in particular will
happen in the case of spoilage. For instance if you set dump on
ad0s1b and open ad0 for writing, ad0s* will be spoilt and the dump
setting lost. See geom(4) for more about spoiling.

Sponsored by: DARPA & NAI Labs.


# 94287 09-Apr-2002 phk

Implement DIOCGFRONTSTUFF ioctl which reports how many bytes from the start
of the device magic stuff might occupy.

Sponsored by: DARPA & NAI Labs.


# 94175 08-Apr-2002 phk

In reverence of the 3rd X11 development rule:

3.The only thing worse than generalizing from one example
is generalizing from no examples at all.

Remove the fwcylinders attribute before anybody gets the idea that we
alone have squared the circle.

Sponsored by: DARPA & NAI Labs.


# 93778 04-Apr-2002 phk

Centralize EOF handling and improve access controls for bio scheduling.

Sponsored by: DARPA & NAI Labs


# 93776 04-Apr-2002 phk

Move access and orphan member functions from class to geom.

Sponsored by: DARPA & NAI Labs


# 93642 02-Apr-2002 phk

Initialize a field to cater for ata-raid


# 93496 31-Mar-2002 phk

Here follows the new kernel dumping infrastructure.

Caveats:

The new savecore program is not complete in the sense that it emulates
enough of the old savecores features to do the job, but implements none
of the options yet.

I would appreciate if a userland hacker could help me out getting savecore
to do what we want it to do from a users point of view, compression,
email-notification, space reservation etc etc. (send me email if
you are interested).

Currently, savecore will scan all devices marked as "swap" or "dump" in
/etc/fstab _or_ any devices specified on the command-line.

All architectures but i386 lack an implementation of dumpsys(), but
looking at the i386 version it should be trivial for anybody familiar
with the platform(s) to provide this function.

Documentation is quite sparse at this time, more to come.

Details:

ATA and SCSI drivers should work as the dump formatting code has been
removed. The IDA, TWE and AAC have not yet been converted.

Dumpon now opens the device and uses ioctl(DIOCGKERNELDUMP) to set
the device as dumpdev. To implement the "off" argument, /dev/null
is used as the device.

Savecore will fail if handed any options since they are not (yet)
implemented. All devices marked "dump" or "swap" in /etc/fstab
will be scanned and dumps found will be saved to diskfiles
named from the MD5 hash of the header record. The header record
is dumped in readable format in the .info file. The kernel
is not saved. Only complete dumps will be saved.

All maintainer rights for this code are disclaimed: feel free to
improve and extend.

Sponsored by: DARPA, NAI Labs


# 93248 26-Mar-2002 phk

Cave in to tradition and rename "methods" to "classes".


# 92403 16-Mar-2002 phk

Add a generic and general ioctl pass-through mechanism.

It should now be posible to issue ioctls to SCSI CD drives.


# 92108 11-Mar-2002 phk

First commit of the GEOM subsystem to make it easier for people to
test and play with this.

This is not yet production quality and should be run only on dedicated
test boxes.

For people who want to develop transformations for GEOM there exist a
set of shims to run geom in userland (ask phk@freebsd.org).

Reports of all kinds to: phk@freebsd.org
Please include in report:
dmesg
sysctl debug.geomdot
sysctl debug.geomconf

Known significant limitations:
no kernel dump facility.
ioctls severely restricted.

Sponsored by: DARPA, NAI Labs