History log of /freebsd-10.0-release/sys/geom/
Revision Date Author Comments
259065 07-Dec-2013 gjb

- Copy stable/10 (r259064) to releng/10.0 as part of the
10.0-RELEASE cycle.
- Update __FreeBSD_version [1]
- Set branch name to -RC1

[1] 10.0-CURRENT __FreeBSD_version value ended at '55', so
start releng/10.0 at '100' so the branch is started with
a value ending in zero.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

258505 24-Nov-2013 mjg

MFC r256951:
gnop: make sure that newly allocated memory for softc is zeroed

This prevents mtx_init from encountering non-zeros and panicking
the kernel as a result.

Approved by: re


257718 05-Nov-2013 delphij

MFC r257539:

When zero'ing out a buffer, make sure we are using right size.

Without this change, in the worst but unlikely case scenario, certain
administrative operations, including change of configuration, set or
delete key from a GEOM ELI provider, may leave potentially sensitive
information in buffer allocated from kernel memory.

We believe that it is not possible to actively exploit these issues, nor
does it impact the security of normal usage of GEOM ELI providers when
these operations are not performed after system boot.

Security: possible sensitive information disclosure
Submitted by: Clement Lecigne <clecigne google com>
Approved by: re (glebius)
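
One common way a wrong size reaches a zeroing call is taking sizeof of a pointer instead of the object it points to. The sketch below illustrates that class of bug in user-space C. It is not the GEOM ELI code: struct key_material, wipe_wrong(), wipe_right() and count_nonzero() are hypothetical names, and the actual change in r257539 may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel buffer holding sensitive data. */
struct key_material {
	unsigned char bytes[64];
};

/* Buggy pattern: the size used is that of the pointer, not the object. */
static size_t
wipe_wrong(struct key_material *km)
{
	size_t n = sizeof(km);		/* 4 or 8 bytes, not 64 */

	assert(km != NULL);
	memset(km, 0, n);
	return (n);
}

/* Intended pattern: sizeof(*km) covers the whole structure. */
static size_t
wipe_right(struct key_material *km)
{
	size_t n = sizeof(*km);

	assert(km != NULL);
	memset(km, 0, n);
	return (n);
}

/* Count the bytes that are still nonzero after a wipe. */
static size_t
count_nonzero(const struct key_material *km)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(km->bytes); i++)
		if (km->bytes[i] != 0)
			n++;
	return (n);
}
```

After wipe_wrong() most of the buffer still holds its old contents, which is the kind of leftover data the commit message describes.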


256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


255860 24-Sep-2013 des

Introduce a kern.geom.notaste sysctl that can be used to temporarily
disable GEOM tasting to avoid the "bouncing GEOM" problem where, when
you shut down the consumer of a provider which can be viewed in multiple
ways (typically a mirror whose members are labeled partitions), GEOM
will immediately taste that provider's alter ego and reattach the
consumer.

Approved by: re (glebius)


255237 05-Sep-2013 ae

Remove stub implementation.

MFC after: 1 week


255144 02-Sep-2013 mav

Make ELI destruction (including orphanization) less aggressive, making it
always wait for provider close. The old algorithm was reported to cause a NULL
dereference panic on an attempt to close the provider after softc destruction.
If not for the global workaround in GEOM, that could even cause destruction with
requests still in flight.


254936 26-Aug-2013 mav

MFprojects/camlock r254895:
Add unmapped BIO support to GEOM ZERO if kern.geom.zero.clear is cleared.


254766 24-Aug-2013 mav

Add new attribute lunname to report only textual LUN-specific device IDs.
While lunid attribute prefers to report numeric ones, having both may be
useful in some situations.


254389 15-Aug-2013 ken

Change the way that unmapped I/O capability is advertised.

The previous method was to set the D_UNMAPPED_IO flag in the cdevsw
for the driver. The problem with this is that in many cases (e.g.
sa(4)) there may be some instances of the driver that can handle
unmapped I/O and some that can't. The isp(4) driver can handle
unmapped I/O, but the esp(4) driver currently cannot. The cdevsw
is shared among all driver instances.

So instead of setting a flag on the cdevsw, set a flag on the cdev.
This allows drivers to indicate support for unmapped I/O on a
per-instance basis.

sys/conf.h: Remove the D_UNMAPPED_IO cdevsw flag and replace it
with an SI_UNMAPPED cdev flag.

kern_physio.c: Look at the cdev SI_UNMAPPED flag to determine
whether or not a particular driver can handle
unmapped I/O.

geom_dev.c: Set the SI_UNMAPPED flag for all GEOM cdevs.
Since GEOM will create a temporary mapping when
needed, setting SI_UNMAPPED unconditionally will
work.

Remove the D_UNMAPPED_IO flag.

nvme_ns.c: Set the SI_UNMAPPED flag on cdevs created here
if NVME_UNMAPPED_BIO_SUPPORT is enabled.

vfs_aio.c: In aio_qphysio(), check the SI_UNMAPPED flag on a
cdev instead of the D_UNMAPPED_IO flag on the cdevsw.

sys/param.h: Bump __FreeBSD_version to 1000045 for the switch from
setting the D_UNMAPPED_IO flag in the cdevsw to setting
SI_UNMAPPED in the cdev.

Reviewed by: kib, jimharris
MFC after: 1 week
Sponsored by: Spectra Logic
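
The difference between a flag on the shared cdevsw and a flag on each cdev can be sketched with toy structures. This is an illustration only: struct toy_cdevsw, struct toy_cdev and dev_accepts_unmapped() are hypothetical, and the value given to SI_UNMAPPED here is not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	SI_UNMAPPED	0x0001	/* illustrative value only */

/* One switch structure is shared by every instance of a driver. */
struct toy_cdevsw {
	const char *d_name;
};

/* Each device instance has its own flags word. */
struct toy_cdev {
	const struct toy_cdevsw *si_devsw;	/* shared among instances */
	unsigned int si_flags;			/* per-instance */
};

/* Decide per instance whether unmapped I/O may be passed down. */
static bool
dev_accepts_unmapped(const struct toy_cdev *dev)
{
	assert(dev != NULL);
	return ((dev->si_flags & SI_UNMAPPED) != 0);
}
```

Two instances can point at the same toy_cdevsw while only one of them sets SI_UNMAPPED, which a flag stored in the shared structure could not express.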


254275 13-Aug-2013 mav

Return error when opening read-only volumes (like RAID4/5/...) for writing.
Previously opens succeeded, but actual write operations returned errors.

Requested by: peter
MFC after: 2 weeks


254271 13-Aug-2013 mav

Oops, wrong constant at r254269.


254269 13-Aug-2013 mav

Fix reasonable but safe Clang warnings.


254252 12-Aug-2013 ed

Fix the formatting of the error message.

The G_MIRROR_DEBUG() macro already appends a newline. Also, most of the
log messages emitted by gmirror start with an uppercase letter.


254095 08-Aug-2013 ae

gpt_entries is used as limit for the number of partition entries in
the GEOM_PART. Instead of just using number of entries from the GPT
header, calculate this limit based on the reserved space between
GPT header and first available LBA.

MFC after: 2 weeks
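
A rough sketch of the arithmetic, assuming the entry table occupies the sectors from its starting LBA up to (but not including) the first usable LBA. The function and parameter names are hypothetical and the real g_part_gpt code may compute the limit differently.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of partition entries that fit in the space reserved between
 * the start of the entry table and the first usable LBA.
 */
static uint64_t
gpt_entry_limit(uint64_t table_lba, uint64_t first_usable_lba,
    uint32_t sector_size, uint32_t entry_size)
{
	assert(entry_size != 0);
	assert(first_usable_lba >= table_lba);
	return ((first_usable_lba - table_lba) * sector_size / entry_size);
}
```

With the table at LBA 2, the first usable LBA at 34, 512-byte sectors and 128-byte entries, this gives 32 * 512 / 128 = 128 entries.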


254015 07-Aug-2013 marcel

Change <sys/diskpc98.h> to not redefine the same symbols that are
being defined in <sys/diskmbr.h>. Instead give the symbols here a
"PC98_" prefix. This way, both <sys/diskmbr.h> and <sys/diskpc98.h>
can be included in the same C source file.

The renaming is trivial. The only gotcha is that DOSBBSECTOR is
also redefined from 0 to 1. This because DOSBBSECTOR was always
used in conjunction with an addition of 1. The PC98_BBSECTOR symbol
is defined as 1 and the expression is simplified.

Note: it is not believed that ports are seriously impacted; or at
all for that matter.

Approved by: nyan@


253938 04-Aug-2013 marcel

Remove inclusion of <sys/diskmbr.h>. We have no business knowing
anything related to MBR in this file.


253706 27-Jul-2013 mav

Introduce 3 seconds timeout on `graid stop` command (mostly with -f flag).
Since completion waiting goes in g_event thread, it may cause GEOM deadlock
if consumer on top (for example, ZFS) uses g_event thread for closing.


253141 10-Jul-2013 kib

When panicking due to the gjournal overflow, print the geom metadata
journal id.

Requested by: Andreas Longwitz <longwitz@incore.de>
MFC after: 1 week


253106 09-Jul-2013 kib

There are several code sequences like
vfs_busy(mp);
vfs_write_suspend(mp);
which are problematic if another thread starts an unmount between the two
calls. The unmount starts a write, while vfs_write_suspend() drains
writers. On the other hand, unmount drains busy references, causing
the deadlock.

Add a flag argument to vfs_write_suspend and require the callers of it
to specify VS_SKIP_UNMOUNT flag, when the call is performed not in the
mount path, i.e. the covered vnode is not locked. The suspension is
not attempted if VS_SKIP_UNMOUNT is specified and unmount is in
progress.

Reported and tested by: Andreas Longwitz <longwitz@incore.de>
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks


252657 03-Jul-2013 smh

Bump disk(9) ABI version to signify the addition of d_delmaxsize by r249940.

Ensure that d_delmaxsize is always set, removing init to 0 which could cause
future issues if use cases change.

Allow kern.cam.da.X.delete_max (which maps to d_delmaxsize) to be increased
up to the calculated max after being reduced.

MFC after: 1 day
X-MFC-With: r249940


252330 28-Jun-2013 jeff

- Add a general purpose resource allocator, vmem, from NetBSD. It was
originally inspired by the Solaris vmem detailed in the proceedings
of usenix 2001. The NetBSD version was heavily refactored for bugs
and simplicity.
- Use this resource allocator to allocate the buffer and transient maps.
Buffer cache defrags are reduced by 25% when used by filesystems with
mixed block sizes. Ultimately this may permit dynamic buffer cache
sizing on low KVA machines.

Discussed with: alc, kib, attilio
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


252011 19-Jun-2013 scottl

Fix a mystery cut-n-paste corruption from the previous commit.

Submitted by: Brenden Fabeny


252010 19-Jun-2013 scottl

Mark geom_mirror as capable of unmapped i/o

Obtained from: Netflix
MFC after: 3 days


251654 12-Jun-2013 mav

Make CAM return and GEOM DISK pass through new GEOM::lunid attribute.

SPC-4 specification states that serial number may be property of device,
but not a specific logical unit. People reported about FC storages using
serial number in that way, making it unusable for purposes of LUN multipath
detection. SPC-4 states that designators associated with logical unit from
the VPD page 83h "Device Identification" should be used for that purpose.
Report first of them in the new attribute in such preference order: NAA,
EUI-64, T10 and SCSI name string.

While there, make GEOM DISK properly report GEOM::ident in XML output also
using d_getattr() method, if available. This fixes serial numbers reporting
for SCSI disks in `geom disk list` output and confxml.

Discussed with: gibbs, ken
Sponsored by: iXsystems, Inc.
MFC after: 2 weeks


251616 11-Jun-2013 mav

Don't update provider properties and don't set DISKFLAG_OPEN if d_open()
disk method call returned error. GEOM considers devices in such case as
still closed, and won't call symmetric d_close() for them.


251588 09-Jun-2013 marcel

Change the set and unset ctlreqs by making the index argument optional.
This allows setting attributes on tables. One simply does not provide
an index in that case. Otherwise the entry corresponding the index has
the attribute set or unset.

Use this change to fix a relatively longstanding bug in our GPT scheme
that's the result of rev 198097 (relatively harmless) followed by rev
237057 (damaging). The damaging part being that our GPT scheme always
has the active flag set on the PMBR slice. This is in violation with
EFI. Existing EFI implementations for both x86 and ia64 reject the GPT.
As such, GPT disks created by us aren't usable under EFI because of
that.

After this change, GPT disks never have the active flag set on the PMBR
slice. In order to make the GPT disk bootable under some x86 BIOSes,
the reason of rev 198097, one must now set the active attribute on the
gpt table. The kernel will apply this to the PMBR slice. For (S)ATA:
gpart set -a active ada0

To fix an existing GPT disk that has the active flag set in the PMBR,
and that does not need the flag, use (again for (S)ATA):
gpart unset -a active ada0

The EBR, MBR & PC98 schemes, which also implement at least 1 attribute,
now check to make sure the entry passed is valid. They do not have
attributes that apply to the table.


251587 09-Jun-2013 marcel

Remove stub implementation.


251117 30-May-2013 brooks

MFP4 @222836

Add support for partitioning CFI disks from FDT using geom_flashmap.

Sponsored by: DARPA, AFRL


250868 21-May-2013 jh

Remove an extra semicolon from the DOT language output.

PR: kern/178540
Submitted by: Trond Endrestol
MFC after: 1 week


250819 20-May-2013 mav

Fix vdc->Secondary_Element_Count metadata field access from 16 to 8 bit.
In some cases it could cause kernel panic during failed drive replacement.

Reported by: trasz
MFC after: 1 week


250264 05-May-2013 stas

- Use int8_t type for the mftrecsz field in g_label_ntfs. char type
used previously caused probe failure on platforms where char is unsigned
(e.g. ARM), as mftrecsz can be negative.

Submitted by: Ilya Bakulin <ilya@bakulin.de>
MFC after: 2 weeks
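
The signedness problem can be sketched as follows. The sketch assumes the commonly documented NTFS convention that a positive value counts clusters per MFT record while a negative value n means a record of 2^(-n) bytes. mft_record_size() is a hypothetical helper, not the g_label_ntfs code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Interpret the raw on-disk byte as signed, as int8_t does on every
 * platform, instead of relying on the signedness of plain char.
 */
static uint32_t
mft_record_size(uint8_t raw, uint32_t bytes_per_cluster)
{
	int8_t v = (int8_t)raw;

	assert(v != 0 && v > -32);
	if (v > 0)
		return ((uint32_t)v * bytes_per_cluster);
	/* Negative: record size is 2^(-v) bytes. */
	return ((uint32_t)1 << -v);
}
```

A raw byte of 0xF6 is -10 when read as int8_t, giving a 1024-byte record. Read through an unsigned char it would be 246, so the negative branch is never taken and the computed size is wrong.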


249974 27-Apr-2013 mav

Return "descr" field alike to "Intel RAID1 volume" for GEOM RAID to make
it look better in bsdinstall.


249940 26-Apr-2013 smh

Teach GEOM and CAM about the difference between the max "size" of r/w and delete
requests.

sys/geom/geom_disk.h:
- Added d_delmaxsize which represents the maximum size of individual
device delete requests in bytes. This can be used by devices to
inform geom of their size limitations regarding delete operations
which are generally different from the read / write limits as data
is not usually transferred from the host to physical device.

sys/geom/geom_disk.c:
- Use new d_delmaxsize to calculate the size of chunks passed through to
the underlying strategy during deletes instead of using read / write
optimised values. This defaults to d_maxsize if unset (0).

- Moved d_maxsize default up so it can be used to default d_delmaxsize

sys/cam/ata/ata_da.c:
- Added d_delmaxsize calculations for TRIM and CFA

sys/cam/scsi/scsi_da.c:
- Added re-calculation of d_delmaxsize whenever delete_method is set.

- Added kern.cam.da.X.delete_max sysctl which allows the max size for
delete requests to be limited. This is useful in preventing timeouts
on devices whose delete methods are slow. It should be noted that
this limit is reset when the device delete method is changed and
that it can only be lowered, not increased, from the device max.

Reviewed by: mav
Approved by: pjd (mentor)
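
A minimal sketch of the chunking described for geom_disk.c, assuming a delete request is simply split into pieces no larger than the effective limit. The function names are hypothetical; only d_maxsize and d_delmaxsize come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* d_delmaxsize falls back to d_maxsize when it is left unset (0). */
static uint64_t
effective_delmaxsize(uint64_t d_maxsize, uint64_t d_delmaxsize)
{
	return (d_delmaxsize != 0 ? d_delmaxsize : d_maxsize);
}

/* Count how many requests a delete of 'length' bytes is split into. */
static uint64_t
delete_chunks(uint64_t length, uint64_t d_maxsize, uint64_t d_delmaxsize)
{
	uint64_t limit = effective_delmaxsize(d_maxsize, d_delmaxsize);
	uint64_t chunks = 0;

	assert(limit != 0);
	while (length > 0) {
		uint64_t n = length < limit ? length : limit;

		length -= n;
		chunks++;
	}
	return (chunks);
}
```

With a 128 KiB read/write limit and no delete limit set, a 1 MiB delete becomes eight requests. With a 1 MiB delete limit it becomes one.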


249930 26-Apr-2013 smh

Added a sysctl (kern.geom.dev.delete_max_sectors) to control the maximum
size of a delete request sent to the providing device performed by g_dev_ioctl.

This allows the kernel and apps via ioctl e.g. newfs -E to request large LBA
deletes which significantly improves performance.

Previously this was hard coded to 65536 sectors, the new default is 262144
which doubles the throughput of deletes on commonly available SSD's.

In tests on an Intel 520 120GB FW: 400i disk it improved the delete throughput
from 1.6GB/s to over 2.6GB/s on a full disk delete such as that done via
newfs -E

For some SSD's where delete time is pretty much constant, no matter what
the request, setting this to 0 will provide significantly better throughput
e.g. Samsung 840 240GB FW DXT07B0Q @ 262144 = 79G/s, @ 0 = 2259G/s

Reviewed by: mav
Approved by: pjd (mentor)
MFC after: 2 weeks


249571 16-Apr-2013 ivoras

Comment typo fix.

Is aware of the importance of comments: dim


249564 16-Apr-2013 ivoras

Fix the buffer-overflow-fixing fixes.

Pointy-hat to: me, for not realizing snprintf() is available in kernel.
Thanks to: jh, for bringing me the good news of snprintf(), Pawel Worach, for
noting that the panic can be provoked in i386 and not in amd64


249556 16-Apr-2013 brooks

Partial MFP4 of 222836:

Only look for FDT partitions if our potential parent is a DISK device.

Excluding direct recursion on the flashmap geoms was insufficient
because it did not prevent the underlying device from being retrieved if
flashmap geoms were further partitioned.

Reviewed by: imp
Sponsored by: DARPA, AFRL


249508 15-Apr-2013 ivoras

Introduce glabel labels based on GEOM ident attributes. In this initial
implementation, error on the side of conservatism and only create labels
for GEOMs of classes DISK and MULTIPATH.

Discussed with: trasz
Approved by: silence from freebsd-geom@


249507 15-Apr-2013 ivoras

Introduce a symbol for the GEOM class name instead of using the ad-hoc string
constant.


249440 13-Apr-2013 jmg

move the error report to a lower log level... Now you can see when it
returns an error without getting every single io that went through it..

MFC after: 1 week


249193 06-Apr-2013 trasz

Make it possible to submit FLUSH bios through geom_dev strategy. This
is required for CTL to work with device-backed LUNs.

Reviewed by: mav


249161 05-Apr-2013 mav

Following r241022, replace iteration over the provider list on media events
by taking first one and asserting that there is no others.

MFC after: 1 week


248722 26-Mar-2013 mav

geom_slice.c and its consumers like GEOM_LABEL are not touching the data
unless hotspots are used. Pass G_PF_ACCEPT_UNMAPPED flag through except
such rare cases (obsolete GEOM_SUNLABEL and GEOM_BSD).


248721 26-Mar-2013 mav

GEOM NOP does not touch the data, so pass G_PF_ACCEPT_UNMAPPED flag through.


248720 26-Mar-2013 mav

Remove extra bio_data and bio_length copying to child request after calling
g_clone_bio(), that already copied them.


248712 26-Mar-2013 kan

Do not pass unmapped buffers to drivers that cannot handle them

In physio, check if device can handle unmapped IO and pass an
appropriately mapped buffer to the driver strategy routine. The
only driver in the tree that can handle unmapped buffers is one
exposed by GEOM, so mark it as such with the new flag in the
driver cdevsw structure.

This fixes insta-panics on hosts, running dconschat, as /dev/fwmem
is an example of the driver that makes use of physio routine, but
bypasses the g_down thread, where the buffer gets mapped normally.

Discussed with: kib (earlier version)


248696 25-Mar-2013 mav

Make GEOM MULTIPATH report unmapped bio support if the underlying path reports
it. GEOM MULTIPATH itself never touches the data and so is transparent.


248694 25-Mar-2013 mav

In GEOM DISK:
- Replace single done mutex with per-disk ones. On system with several
disks on several HBAs that removes small, but measurable lock congestion.
- Modify disk destruction process to not destroy the mutex prematurely.
- Remove some extra pointer dereferences.


248679 24-Mar-2013 mav

Fix long known deadlock between geom dev destruction and d_close() call.
Use destroy_dev_sched_cb() to not wait for device destruction while holding
GEOM topology lock (that actually caused deadlock). Use request counting
protected by mutex to properly wait for outstanding requests completion in
cases of device closing and geom destruction. Unlike r227009, this code
does not block taskqueue thread for indefinite time, waiting for completion.


248674 24-Mar-2013 mav

Make g_wither_washer() to not loop by itself, but only when there was some
more topology change done that may require its attention. Add few missing
g_do_wither() calls in respective places to signal it.

This fixes potential infinite loop here when some provider is withered, but
still opened or connected for some reason and so can not be destroyed. For
example, see r227009 and r227510.


248596 21-Mar-2013 kib

Correct the page count when excess length is trimmed from the bio.

Reported and tested by: Ivan Klymenko <fidaj@ukr.net>


248568 21-Mar-2013 kib

Assert that transient mapping of the bio is only done when unmapped
buffers are allowed.

Sponsored by: The FreeBSD Foundation


248517 19-Mar-2013 kib

The geom_part provider supports unmapped bio iff the underlying
provider does so, since geom_part never inspects the bio_data.

Sponsored by: The FreeBSD Foundation
Tested by: pho


248516 19-Mar-2013 kib

A flag for the geom disk driver to indicate that it accepts the
unmapped i/o requests.

Sponsored by: The FreeBSD Foundation
Tested by: pho


248508 19-Mar-2013 kib

Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA. The use of the
unmapped buffers eliminate the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitly requested by the GB_UNMAPPED
flag by the consumer. For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer. The provider which processes the bio
should explicitly specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings. Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is not the subject to maxbufspace
limit anymore. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, are
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not work, because
buffer_map fragmentation does not allow the limit to be reached.

By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by: The FreeBSD Foundation
Discussed with: jeff (previous version)
Tested by: pho, scottl (previous version), jhb, bf
MFC after: 2 weeks


248295 14-Mar-2013 pjd

We don't need buffer to handle BIO_DELETE, so don't check buffer size for it.
This fixes handling BIO_DELETE larger than MAXPHYS.


248068 08-Mar-2013 sbruno

Add legacy support to geom raid to create a /dev/arX device for support
of upgrading older machines using ataraid(4) to newer releases.

This optional parameter is controlled via kern.geom.raid.legacy_aliases
and will create a /dev/ar0 device that will point at /dev/raid/r0 for
example.

Tested on Dell SC 1425 DDF-1 format software raid controllers installing from
stable/7 and upgrading to stable/9 without having to adjust /etc/fstab

Reviewed by: mav
Obtained from: Yahoo!
MFC after: 2 Weeks


248058 08-Mar-2013 dumbbell

g_label_ntfs_taste: Abort taste if recsize == 0

This will avoid a 0-byte read (in g_read_data()) leading to a panic, if
previously read data are erroneous.

Suggested by: John-Mark Gurney <jmg@funkthat.com>


247961 07-Mar-2013 gavin

Support the FAT16 partition type in gpart(8)

PR: kern/174714
Submitted by: 4721 at hushmail dot com
MFC after: 1 week


247918 07-Mar-2013 mav

Fix panic when Secondary_Element_Count == 1 and Secondary_Element_Seq
is not set (255).

Reported by: sbruno
MFC after: 1 week


247837 05-Mar-2013 dumbbell

g_label_ntfs.c: Mark structures as __packed

Without this, read data is mis-interpreted. This could trigger a panic,
as was the case on one computer where computed "recsize" was zero,
leading to a call to g_read_data() asking for 0 bytes.
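
Why packing matters when an on-disk layout is overlaid with a C structure can be sketched like this. The structures are hypothetical, not the NTFS ones, and __attribute__((packed)) is the GCC/Clang spelling assumed here to play the role of __packed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* On-disk layout: a 1-byte tag immediately followed by a 2-byte length. */
struct hdr_packed {
	uint8_t  tag;
	uint16_t len;
} __attribute__((packed));

/*
 * Same members without packing: the compiler typically inserts one
 * padding byte after 'tag', so 'len' no longer lines up with the disk.
 */
struct hdr_natural {
	uint8_t  tag;
	uint16_t len;
};

/* Copy raw bytes into the packed view and return the length field. */
static uint16_t
read_len_packed(const unsigned char *raw)
{
	struct hdr_packed h;

	assert(raw != NULL);
	memcpy(&h, raw, sizeof(h));
	return (h.len);
}
```

With the unpacked structure, fields after the padding would be read from the wrong offsets, which is one way a computed size can come out as zero.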


247662 02-Mar-2013 attilio

Remove ntfs headers dependency for g_label_ntfs.c by redefining the
used structs and values.

This patch is not targeted for MFC.


246876 16-Feb-2013 mckusick

Add barrier write capability to the VFS buffer interface. A barrier
write is a disk write request that tells the disk that the buffer
being written must be committed to the media along with any writes
that preceded it before any future blocks may be written to the drive.

Barrier writes are provided by adding the functions bbarrierwrite
(bwrite with barrier) and babarrierwrite (bawrite with barrier).

Following a bbarrierwrite the client knows that the requested buffer
is on the media. It does not ensure that buffers written before that
buffer are on the media. It only ensures that buffers written before
that buffer will get to the media before any buffers written after
that buffer. A flush command must be sent to the disk to ensure that
all earlier written buffers are on the media.

Reviewed by: kib
Tested by: Peter Holm


245946 26-Jan-2013 avg

g_mirror: g_getattr() failure should not be fatal

This allows gmirror to be used e.g. on top of ZVOLs.

PR: kern/175323
Submitted by: Alexei.Volkov@softlynx.ru, mav
Reported by: Alexei.Volkov@softlynx.ru
Tested by: Alexei.Volkov@softlynx.ru
Reviewed by: ae, mav, pjd
MFC after: 1 week


245533 17-Jan-2013 mav

- Fix rebuild position broken at r245522.
- Identify one more metadata field.


245522 17-Jan-2013 mav

For Promise/AMD metadata add support for disks with capacity above 2TiB
and for volumes with sector size above 512 bytes.


245519 17-Jan-2013 mav

Recalculate volume size only for real CONCATs. For SINGLE trust volume
size given by metadata, as it should be correct and in some cases can be
smaller than subdisk size.


245456 15-Jan-2013 mav

Allow to insert new component to geom_raid3 without specifying number.

PR: kern/160562
MFC after: 2 weeks


245444 15-Jan-2013 mav

Alike to r242314 for GRAID make GRAID3 more aggressive in marking volumes
as clean on shutdown and move that action from shutdown_pre_sync stage to
shutdown_post_sync to avoid extra flapping.

ZFS tends to not close devices on shutdown, that doesn't allow GEOM RAID
to shutdown gracefully. To handle that, mark volume as clean just when
shutdown time comes and there are no active writes.

MFC after: 2 weeks


245443 15-Jan-2013 mav

Alike to r242314 for GRAID make GMIRROR more aggressive in marking volumes
as clean on shutdown and move that action from shutdown_pre_sync stage to
shutdown_post_sync to avoid extra flapping.

ZFS tends to not close devices on shutdown, that doesn't allow GEOM RAID
to shutdown gracefully. To handle that, mark volume as clean just when
shutdown time comes and there are no active writes.

PR: kern/113957
MFC after: 2 weeks


245433 14-Jan-2013 mav

Keep value of orig_config_id metadata field. Windows driver writes there
previous value of config_id when it is changed in some cases. I guess it
may be used to avoid some split-brain conditions.


245425 14-Jan-2013 mav

Small cosmetic tuning of the IRRT status constants.


245423 14-Jan-2013 mav

Print some more metadata fields.


245400 14-Jan-2013 mav

Windows driver writes relative volume IDs to metadata field. Use that value
as a hint for raid/rX device number to make it persistent across reboots.


245398 13-Jan-2013 mav

- Add checks for Intel metadata version and attributes. Ignore disks with
unsupported metadata types like Intel Smart Response to not corrupt them.
- Improve setting of these things during metadata writing to protect from
incapable BIOS'es and other implementations.


245363 13-Jan-2013 mav

Improve support for disabled disks. If disabled disk disconnected and then
reconnected back, leave it as disconnected. If new disk inserted instead of
disabled, rebuild it and leave as enabled.


245341 12-Jan-2013 mav

Windows handles INIT and VERIFY as array-wide and it doesn't specify which
disks should be rebuilt. Our rebuild code is at the same time disk-centric. To
handle this situation properly check all disks for RBLD flags, and if no
disk specified try rebuild/resync all of them except newly inserted.


245338 12-Jan-2013 mav

Implement migration from single disk to RAID1/IRRT for Intel metadata.
Windows driver uses such migration when it creates new arrays. While GEOM
RAID has no mechanism to implement migration in general case, this specific
case still can be handled easily via degraded RAID1 creation followed by
regular rebuild.


245326 12-Jan-2013 mav

Add basic support for Intel Rapid Recover Technology (Intel RRT).
It is alike to RAID1, but with dedicating master and recovery disks and
providing manual control over synchronization. It allows to use recovery
disk as snapshot of the master disk from the time of the last sync.

This implementation is not functionally complete compared to Windows,
but it is better than silent conversion to RAID1 on first boot.


245286 11-Jan-2013 kib

Add flags argument to vfs_write_resume() and remove
vfs_write_resume_flags().

Sponsored by: The FreeBSD Foundation


244716 26-Dec-2012 pjd

Reset provider-specific fields when resending I/O request in low memory
conditions. This fixes assertion which checks those fields when kernel is
compiled with DIAGNOSTIC.

Reported by: kib, pho
MFC after: 1 week


244585 22-Dec-2012 jh

Mangle label names containing spaces, non-printable characters, '%' or
'"'. Mangling is only done for label names read from file system
metadata. Encoding resembles URL encoding. For example, the space
character becomes %20.

Help by: kib
Discussed with: imp, kib, pjd
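
The encoding can be sketched in user-space C as below. This is an approximation under stated assumptions, not the glabel code: mangle_label() and needs_mangling() are hypothetical names, isprint() in the default C locale stands in for the kernel's notion of printable, and lowercase hex digits are an arbitrary choice.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Characters the commit message lists as needing to be mangled. */
static int
needs_mangling(unsigned char c)
{
	return (c == ' ' || c == '%' || c == '"' || !isprint(c));
}

/*
 * Write a mangled copy of src into dst, replacing each listed character
 * with '%' followed by two hex digits.  Returns the length written
 * (excluding the terminating NUL) or -1 if dst is too small.
 */
static int
mangle_label(const char *src, char *dst, size_t dstlen)
{
	size_t o = 0;

	assert(src != NULL && dst != NULL);
	for (; *src != '\0'; src++) {
		unsigned char c = (unsigned char)*src;
		size_t need = needs_mangling(c) ? 3 : 1;

		if (o + need + 1 > dstlen)
			return (-1);
		if (need == 3)
			snprintf(dst + o, 4, "%%%02x", (unsigned int)c);
		else
			dst[o] = (char)c;
		o += need;
	}
	if (o + 1 > dstlen)
		return (-1);
	dst[o] = '\0';
	return ((int)o);
}
```

A label such as "My Disk" comes out as "My%20Disk". Encoding '%' itself keeps the mapping unambiguous.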


243333 20-Nov-2012 jh

- Don't pass geom and provider names as format strings.
- Add __printflike() attributes.
- Remove an extra argument for the g_new_geomf() call in swapongeom_ev().

Reviewed by: pjd
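
The first two items can be sketched in user-space C. set_name() is a hypothetical printf-style helper, and the GCC/Clang format attribute is assumed here to stand in for __printflike().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Let the compiler check the format string against the arguments. */
static int set_name(char *dst, size_t len, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

static int
set_name(char *dst, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(dst != NULL && fmt != NULL);
	va_start(ap, fmt);
	n = vsnprintf(dst, len, fmt, ap);
	va_end(ap);
	return (n);
}
```

A caller holding an externally supplied name should write set_name(buf, sizeof(buf), "%s", name) rather than set_name(buf, sizeof(buf), name). In the second form any '%' in the name is interpreted as a conversion specifier.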


242439 01-Nov-2012 alfred

Provide a device name in the sysctl tree for programs to query the
state of crashdump target devices.

This will be used to add a "-l" (ell) flag to dumpon(8) to list the
currently configured dumpdev.

Reviewed by: phk


242379 30-Oct-2012 trasz

Fix problem with geom_label(4) not recognizing UFS labels on filesystems
extended using growfs(8). The problem here is that geom_label checks if
the filesystem size recorded in UFS superblock is equal to the provider
(i.e. device) size. This check cannot be removed due to backward
compatibility. On the other hand, in most cases growfs(8) cannot set
fs_size in the superblock to match the provider size, because, differently
from newfs(8), it cannot recompute cylinder group sizes.

To fix this problem, add another superblock field, fs_providersize, used
only for this purpose. The geom_label(4) will attach if either fs_size
(filesystem created with newfs(8)) or fs_providersize (filesystem expanded
using growfs(8)) matches the device size.

PR: kern/165962
Reviewed by: mckusick
Sponsored by: FreeBSD Foundation
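
The attach rule reduces to a two-way comparison. In this simplified sketch all three sizes are expressed in one common unit. The real superblock fields and their units are not reproduced, and label_should_attach() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Attach if the provider size matches either the size recorded by
 * newfs(8) (fs_size) or the one recorded by growfs(8) (fs_providersize).
 */
static bool
label_should_attach(uint64_t provider_size, uint64_t fs_size,
    uint64_t fs_providersize)
{
	assert(provider_size != 0);
	return (provider_size == fs_size ||
	    provider_size == fs_providersize);
}
```

A grown filesystem whose fs_size falls short of the provider still matches through fs_providersize, while a filesystem that matches neither field is left alone.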


242328 29-Oct-2012 mav

Minor addition to r242323:
Alike to BIO_WRITE, report success if at least one subdisk succeeded with
BIO_DELETE. But unlike BIO_WRITE don't fail disk on BIO_DELETE error.

Sponsored by: iXsystems, Inc.
MFC after: 1 month


242323 29-Oct-2012 mav

Add basic BIO_DELETE support to GEOM RAID class for all RAID levels.

If at least one subdisk in the volume supports it, BIO_DELETE requests
will be propagated down. Unfortunately, for RAID levels with redundancy
unmapped blocks will be mapped back during first rebuild/resync process.

Sponsored by: iXsystems, Inc.
MFC after: 1 month


242322 29-Oct-2012 trasz

Fix locking problem in disk_resize(); previously it would run without
topology lock, resulting in assertion when running with DIAGNOSTIC.

Reviewed by: mav (earlier version)


242314 29-Oct-2012 mav

Make GEOM RAID more aggressive in marking volumes as clean on shutdown
and move that action from shutdown_pre_sync to shutdown_post_sync stage
to avoid extra flapping.

ZFS tends to not close devices on shutdown, that doesn't allow GEOM RAID
to shutdown gracefully. To handle that, mark volume as clean just when
shutdown time comes and there are no active writes.

MFC after: 2 weeks


241896 22-Oct-2012 kib

Remove the support for using non-mpsafe filesystem modules.

In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by: attilio
Tested by: pho


241706 18-Oct-2012 attilio

It seems that it is preferable to keep support for glabel also for
filesystems that we don't support natively.
Revert part of r241636 to do so.

This patch is not targeted for MFC.

Requested by: gleb, jhb


241636 17-Oct-2012 attilio

Disconnect non-MPSAFE NTFS from the build in preparation for dropping
GIANT from VFS. This code is particularly broken and fragile and other
in-kernel implementations around, found in other operating systems,
don't really seem clean and solid enough to be imported at all.
If someone wants to reconsider in-kernel NTFS implementation for
inclusion again, a fair effort for completely fixing and cleaning it
up is expected.

In the meantime NTFS regular users can use FUSE interface and ntfs-3g
port to work with their NTFS partitions.

This is not targeted for MFC.


241418 10-Oct-2012 mav

NULL-ify last previously used pointer instead of last possible pointer.
This should be only a cosmetic change.

Found by: Clang Static Analyzer


241329 07-Oct-2012 mav

Make graid command line a bit more friendly by allowing volume name or
provider name to be specified instead of geom name (first argument in all
subcommands except label). In most cases there is only one array used
anyway, so it is not really useful to make user type ugly geom names like
Intel-f0bdf223 or SiI-732c2b9448cf. Though they can be used in some cases.

Sponsored by: iXsystems, Inc.
MFC after: 1 month


241296 06-Oct-2012 avg

g_part_taste: directly destroy consumer and geom here, no need for withering

Besides, withered but still alive consumers may interfere with
re-tasting.

MFC after: 16 days


241022 28-Sep-2012 pjd

Remove the topology lock from disk_gone(), it might be called with regular
mutexes held and the topology lock is an sx lock.

The topology lock was there to protect traversing through the list of providers
of disk's geom, but it seems that disk's geom has always exactly one provider.

Change the code to call g_wither_provider() for this one provider, which is
safe to do without holding the topology lock and assert that there is indeed
only one provider.

Discussed with: ken
MFC after: 1 week


240822 22-Sep-2012 pjd

Use the topology lock to protect list of providers while withering them.
It is possible that provider is destroyed while we are iterating over the
list.

Reported by: Brian Parkison <parkison@panzura.com>
Discussed with: phk
MFC after: 1 week


240629 18-Sep-2012 avg

g_disk_flushcache definitely should not be traced under G_T_TOPOLOGY

... use G_T_BIO instead

MFC after: 1 week


240465 13-Sep-2012 mav

Add global and per-module sysctls/tunables to enable/disable metadata taste.
That should help to handle some cases when disk has some RAID metadata that
should be ignored, especially during boot.

MFC after: 3 days


240371 11-Sep-2012 glebius

When synchronizing, include in the config dump the number of
bytes synchronized.
The rationale behind this is the following: for large disks the
percent synchronisation counter ticks too seldom, and monitoring
software (as well as human operator) can't tell whether
synchronisation goes on or one of disks got stuck. On an idle
server one can look into gstat and see whether synchronisation goes
on or not, but on a busy server that won't work. Also, the new value
monitored can be differentiated to obtain the synchronisation speed
quite precisely.

Submitted by: Konstantin Kukushkin <dark ramtel.ru>
Reviewed by: pjd
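
The rate calculation described in r240371 can be sketched as follows. This is
a minimal illustration, not code from the commit: the function name
sync_rate() and the way samples are taken are assumptions, and the log does
not say which field of the config dump carries the byte count.

```python
def sync_rate(bytes_a, time_a, bytes_b, time_b):
    """Estimate synchronisation speed in bytes per second from two samples.

    Each sample is the "bytes synchronized" value read from the config
    dump, paired with the time (in seconds) at which it was read.
    """
    elapsed = time_b - time_a
    if elapsed <= 0:
        raise ValueError("second sample must be taken after the first")
    return (bytes_b - bytes_a) / elapsed


def looks_stuck(bytes_a, time_a, bytes_b, time_b):
    """A zero rate between two samples suggests synchronisation is stalled."""
    return sync_rate(bytes_a, time_a, bytes_b, time_b) == 0
```

A monitoring tool would take two samples a few seconds apart; unlike the
percent counter, the byte count changes on every sample even for very large
disks, so a stalled disk shows up quickly.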


239987 01-Sep-2012 pjd

Allow to pass providers with /dev/ prefix to g_provider_by_name().

MFC after: 3 days


239790 28-Aug-2012 ed

Remove unneeded G_PF_CANDELETE flag.

This flag is only used by GEOM so it can be propagated to the character
device's SI_CANDELETE. Unfortunately, SI_CANDELETE seems to do nothing.


239673 25-Aug-2012 thomas

(g_multipath_rotate): Fix algorithm so that it does rotate over all good
providers, not just the last two.

PR: kern/170379
Reviewed by: mav
MFC after: 2 weeks
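
The intended behaviour of the r239673 fix, rotating over every good provider
rather than only the last two, can be modelled with a short sketch. This is an
assumption-laden illustration rather than the kernel code: rotate() and the
"failed" key are hypothetical names, and the real g_multipath_rotate() works
on GEOM consumers.

```python
def rotate(paths, active):
    """Return the index of the next usable path after `active`.

    `paths` is a list of dicts with a boolean "failed" key (hypothetical
    model of a multipath geom). The search wraps around the whole list,
    so every good path is visited in turn. Returns None if all paths
    are marked failed.
    """
    count = len(paths)
    for step in range(1, count + 1):
        candidate = (active + step) % count
        if not paths[candidate]["failed"]:
            return candidate
    return None
```

Because the scan covers all `count` positions, including the current one on
the last step, a geom with a single good path keeps using that path.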


239184 10-Aug-2012 pjd

Always initialize sc_ekey, because as of r238116 it is always used.

If GELI provider was created on FreeBSD HEAD r238116 or later (but before this
change), it is using very weak keys and the data is not protected.
The bug was introduced on 4th July 2012.

One can verify whether a provider was created with weak keys by running:

# geli dump <provider> | grep version

If the version is 7 and the system didn't include this fix when provider was
initialized, then the data has to be backed up, underlying provider overwritten
with random data, system upgraded and provider recreated.

Reported by: Fabian Keil <fk@fabiankeil.de>
Tested by: Fabian Keil <fk@fabiankeil.de>
Discussed with: so
MFC after: 3 days


239175 10-Aug-2012 mav

Add missing FAILED event to g_raid_subdisk_event2str() to print it properly
in debug messages.

Submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com>


239132 07-Aug-2012 jimharris

Clone BIO_ORDERED flag, for disk drivers (namely CAM) that try to
consume it.

Sponsored by: Intel
Discussed with: gibbs, scottl


239131 07-Aug-2012 trociny

In g_gate_dumpconf() always check the result of g_gate_hold().

This fixes "Negative sc_ref" panic possible when sysctl_kern_geom_confxml()
is run simultaneously with destroying GATE device.

Reviewed by: pjd
MFC after: 3 days


239021 03-Aug-2012 jimharris

In virstor_ctl_stop(), check for a valid softc before trying to update
metadata.

Sponsored by: Intel
Reported and tested by: Marcelo Gondim <gondim at bsdinfo dot com dot br>
PR: kern/170199
MFC after: 3 days


239012 03-Aug-2012 thomas

New command "gmultipath prefer" to force selection of a specified
provider in an Active/Passive configuration.

Reviewed by: mav
MFC after: 4 weeks


238892 29-Jul-2012 mav

Partially revert r238886 in part of GEOM_VFS spoiling.

This change triggered interesting foot shooting condition in GEOM when
RW access to root partition by fsck spoils VFS geom there, which has it
opened RO at the same time. Seems spoiling concept needs some rework.


238886 29-Jul-2012 mav

Implement media change notification for DA and CD removable media devices.
It includes three parts:
1) Modifications to CAM to detect media changes and report them to
disk(9) layer. For modern SATA (and potentially UAS) devices it utilizes
Asynchronous Notification mechanism to receive events from hardware.
Active polling with TEST UNIT READY commands with 3 seconds period is used
for incapable hardware. After that both CD and DA drivers work the same way,
detecting two conditions: "NOT READY: Medium not present" after medium was
detected previously, and "UNIT ATTENTION: Not ready to ready change, medium
may have changed". First one reported to disk(9) as media removal, second
as media insert/change. To reliably receive the second event, a new
AC_UNIT_ATTENTION async was added to make UAs broadcast to all periphs by
generic error handling code in cam_periph_error().
2) Modifications to GEOM core to handle media remove and change events.
Media removal handled by spoiling all consumers attached to the provider.
Media change event also schedules provider retaste after spoiling to probe
new media. New flag G_CF_ORPHAN was added to consumers to reflect that
consumer is in process of destruction. It allows retaste to create new
geom instance of the same class, while previous one is still dying.
3) Modifications to some GEOM classes: DEV -- to report media change
events to devd; VFS -- to handle spoiling same as orphan to prevent
accessing replaced media. PART class already handles spoiling alike to
orphan.

Reviewed by: silence on geom@ and scsi@
Tested by: avg
Sponsored by: iXsystems, Inc. / PC-BSD
MFC after: 2 months


238868 28-Jul-2012 trociny

Reorder things in g_gate_create() so at the moment when g_new_geomf()
is called name is properly initialized.

Discussed with: pjd
MFC after: 2 weeks


238657 20-Jul-2012 trasz

Make it possible to resize opened partitions.

Sponsored by: FreeBSD Foundation


238565 18-Jul-2012 trasz

Add missing free.


238559 17-Jul-2012 ken

Add back spare fields consumed in r237545. It seems that these should only
be consumed to maintain backward compatibility in stable, but should not be
consumed in head.

Submitted by: trasz, attilio (indirectly)


238534 16-Jul-2012 trasz

The resize GEOM event has no references, thus cannot be canceled.


238533 16-Jul-2012 trasz

Add back spare fields reused in r238213. According to Attilio, the rule
is to reuse spares only when MFC-ing, not in CURRENT.


238219 07-Jul-2012 trasz

Add trivial resize handling to gnop(8).

Reviewed by: mav
Sponsored by: FreeBSD Foundation


238218 07-Jul-2012 trasz

Add trivial resize handling to gmountver(8).

Reviewed by: mav
Sponsored by: FreeBSD Foundation


238216 07-Jul-2012 trasz

Add disk_resize(), to make it possible for the disk drivers such as da(4)
to notify GEOM about LUN size change.

Reviewed by: mav (earlier version)
Sponsored by: FreeBSD Foundation


238213 07-Jul-2012 trasz

Add a new GEOM method, resize(), which is called after provider size changes.
Add a new routine, g_resize_provider(), to use to notify GEOM about provider
change.

Reviewed by: mav
Sponsored by: FreeBSD Foundation


238198 07-Jul-2012 trasz

Fix orphan() methods of several GEOM classes to not assume that there
is an error set on the provider. With GEOM resizing, class can become
orphaned when it doesn't implement resize() method and the provider size
decreases.

Reviewed by: mav
Sponsored by: FreeBSD Foundation


238171 06-Jul-2012 trasz

Fix typo in the comment.


238119 04-Jul-2012 pjd

Extend GEOM Gate class to handle read I/O requests directly within the kernel.
This will allow HAST to read directly from the local component without
even communicating with the userland daemon.

Sponsored by: Panzura, http://www.panzura.com
MFC after: 1 month


238116 04-Jul-2012 pjd

Use correct part of the Master-Key for generating encryption keys.
Before this change the IV-Key was used to generate encryption keys,
which was incorrect, but safe - for the XTS mode this key was unused
anyway and for CBC mode it was used differently to generate IV
vectors, so there is no risk that IV vector collides with encryption
key somehow.

Bump version number and keep compatibility for older versions.

MFC after: 2 weeks


238115 04-Jul-2012 pjd

Correct comment.

MFC after: 3 days


238114 04-Jul-2012 pjd

Correct a comment and correct style of a flag check.

MFC after: 3 days


237930 01-Jul-2012 glebius

Make geom_mirror more friendly to SSDs. To properly support TRIM,
we need to pass BIO_DELETE requests down to providers that support
it. Also, we need to announce our support for BIO_DELETE to upper
consumer. This requires:

- In g_mirror_start() return true for "GEOM::candelete" request.
- In g_mirror_init_disk() probe below provider for "GEOM::candelete"
attribute, and mark disk with a flag if it does support BIO_DELETE.
- In g_mirror_register_request() distribute BIO_DELETE requests only
to those disks, that do support it.

Note that we announce "GEOM::candelete" as true unconditionally,
whether we have TRIM-capable media down below or not. This is done
intentionally, because the upper consumer (usually UFS) requests the
attribute only once at mount time. And if the user ever migrates the
mirror from HDDs to SSDs, then he/she would get TRIM working without
remounting the filesystem.

Reviewed by: pjd
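
The policy described in r237930 can be summarised in a small model. This is
only a sketch under stated assumptions: the function names and the dict layout
are hypothetical, and the real logic lives in g_mirror_start(),
g_mirror_init_disk() and g_mirror_register_request().

```python
def mirror_candelete_answer(disks):
    """Answer to a "GEOM::candelete" query from the upper consumer.

    Always True, regardless of the disks currently in the mirror,
    because the consumer asks only once at mount time.
    """
    return True


def delete_targets(disks):
    """Pick the mirror components that should receive a BIO_DELETE.

    `disks` is a list of dicts with "name" and "candelete" keys, where
    "candelete" records the result of probing the provider below.
    """
    return [disk["name"] for disk in disks if disk["candelete"]]
```

With a mirror of one HDD and one SSD, only the SSD would be sent the delete
request, while the mirror as a whole still reports TRIM support upward.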


237929 01-Jul-2012 glebius

In g_mirror_regular_request() upon successful delivery treat
BIO_DELETE requests same way as BIO_WRITE removing them from
queue. This fixes panic with BIO_DELETE operations on geom_mirror.

Reviewed by: pjd


237875 01-Jul-2012 imp

Use %j to match intmax_t.


237820 29-Jun-2012 brooks

MFP4 #212266

Fix compile on MIPS64.

Sponsored by: DARPA, AFRL


237648 27-Jun-2012 ken

In g_disk_providergone(), don't continue if the softc is NULL. This may be
the case if we've already gone through g_disk_destroy().

Reported by: Michael Butler <imb@protected-networks.net>
MFC after: 3 days


237545 25-Jun-2012 ken

Consume spare fields for the providergone pointers added to the g_class and
g_geom structures in change 237518. The original change would have broken
the ABI.

Suggested by: ae
MFC after: 4 days


237518 24-Jun-2012 ken

Fix a bug which causes a panic in daopen(). The panic is caused by
a da(4) instance going away while GEOM is still probing it.

In this case, the GEOM disk class instance has been created by
disk_create(), and the taste of the disk is queued in the GEOM
event queue.

While that event is queued, the da(4) instance goes away. When the
open call comes into the da(4) driver, it dereferences the freed
(but non-NULL) peripheral pointer provided by GEOM, which results
in a panic.

The solution is to add a callback to the GEOM disk code that is
called when all of its resources are cleaned up. This is
implemented inside GEOM by adding an optional callback that is
called when all consumers have detached from a provider, and the
provider is about to be deleted.

scsi_cd.c,
scsi_da.c: In the register routine for the cd(4) and da(4)
routines, acquire a reference to the CAM peripheral
instance just before we call disk_create().

Use the new GEOM disk d_gone() callback to register
a callback (dadiskgonecb()/cddiskgonecb()) that
decrements the peripheral reference count once GEOM
has finished cleaning up its resources.

In the cd(4) driver, clean up open and close
behavior slightly. GEOM makes sure we only get one
open() and one close call, so there is no need to
set an open flag and decrement the reference count
if we are not the first open.

In the cd(4) driver, use cam_periph_release_locked()
in a couple of error scenarios to avoid extra mutex
calls.

geom.h: Add a new, optional, providergone callback that
is called when a provider is about to be deleted.

geom_disk.h: Add a new d_gone() callback to the GEOM disk
interface.

Bump the DISK_VERSION to version 2. This probably
should have been done after a couple of previous
changes, especially the addition of the d_getattr()
callback.

geom_disk.c: Add a providergone callback for the disk class,
g_disk_providergone(), that calls the user's
d_gone() callback if it exists.

Bump the DISK_VERSION to 2.

geom_subr.c: In g_destroy_provider(), call the providergone
callback if it has been provided.

In g_new_geomf(), propagate the class's
providergone callback to the new geom instance.

blkfront.c: Callers of disk_create() are supposed to pass in
DISK_VERSION, not an explicit disk API version
number. Update the blkfront driver to do that.

disk.9: Update the disk(9) man page to include information
on the new d_gone() callback, as well as the
previously added d_getattr() callback, d_descr
field, and HBA PCI ID fields.

MFC after: 5 days
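
The lifetime rule introduced by r237518 can be illustrated with a toy model.
It is a sketch, not the kernel implementation: the Periph and Provider classes
and their methods are hypothetical stand-ins for the CAM peripheral, the GEOM
provider, and the d_gone()/providergone callbacks named above.

```python
class Periph:
    """Toy stand-in for a CAM peripheral with a reference count."""

    def __init__(self):
        self.refcount = 1      # reference held by the driver itself
        self.freed = False

    def acquire(self):
        self.refcount += 1

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # memory may be reused from here on


class Provider:
    """Toy stand-in for a GEOM provider with an optional gone callback."""

    def __init__(self, gone_cb=None):
        self.gone_cb = gone_cb
        self.consumers = 0
        self.withering = False
        self.destroyed = False

    def attach(self):
        self.consumers += 1

    def detach(self):
        self.consumers -= 1
        self._maybe_destroy()

    def wither(self):
        self.withering = True
        self._maybe_destroy()

    def _maybe_destroy(self):
        # The callback fires only once all consumers have detached and
        # the provider is about to be deleted.
        if self.withering and self.consumers == 0 and not self.destroyed:
            self.destroyed = True
            if self.gone_cb is not None:
                self.gone_cb()
```

In this model the driver calls acquire() just before creating the disk and
passes release() as the gone callback, so the peripheral cannot be freed while
a queued taste still holds a consumer on the provider.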


237057 14-Jun-2012 ae

Always reconstruct partition entries in the PMBR when Boot Camp is
disabled. This helps to easily recover from situations when PMBR is
damaged and contains no entries.

MFC after: 1 week


236619 05-Jun-2012 mav

Add missing newlines into XML output.

MFC after: 3 days
Sponsored by: iXsystems, Inc.


236023 25-May-2012 marcel

Add a partition type for nandfs to the apm, bsd, gpt and vtoc8 schemes.
The gpart alias for these partition types is "freebsd-nandfs".


235989 25-May-2012 trasz

Revert r235918 for now and add comment explaining the reason for the
size check.


235918 24-May-2012 trasz

Make g_label(4) ignore provider size when looking for UFS labels.
Without it, it fails to create labels for filesystems resized by
growfs(8).

PR: kern/165962
Submitted by: Olivier Cochard-Labbe <olivier at cochard dot me>


235858 23-May-2012 delphij

- Correct signedness for casts;
- Wrap long line while I'm there.

Noticed by: pjd, avg


235852 23-May-2012 delphij

Use %ju to match uintmax_t usage


235849 23-May-2012 delphij

Use %j and cast off_t to intmax_t for now to fix build.

Noticed by: bz


235778 22-May-2012 gber

Add a new geom class which allows to divide NAND Flash chip
into partitions.

Partitions are created based on data in dts file which are
extracted and interpreted by slicer.

Obtained from: Semihalf
Supported by: FreeBSD Foundation, Juniper Networks


235600 18-May-2012 ae

Prevent removing of the last active component from a mirror.

PR: kern/154860
Reviewed by: pjd
MFC after: 1 week


235599 18-May-2012 ae

Introduce new device flag G_MIRROR_DEVICE_FLAG_TASTING. It should
protect geom from being destroyed while it is tasting.

PR: kern/154860
Reviewed by: pjd
MFC after: 1 week


235419 13-May-2012 eadler

Add missing period at the end of the error message

Submitted by: pjd
Approved by: cperciva (implicit)
MFC after: 3 days
X-MFC-With: r235201


235270 11-May-2012 mav

- Prevent error status leak if write to some of the RAID1/1E volume disks
failed while write to some other succeeded. Instead mark disk as failed.
- Make RAID1E less aggressive in failing disks to avoid volume breakage.

MFC after: 2 weeks


235201 09-May-2012 eadler

Clarify error that geli generates
when it finds corrupt data.

PR: kern/165695
Submitted by: Robert Simmons <rsimmons0@gmail.com>
Reviewed by: pjd
Approved by: cperciva
MFC after: 1 week


235096 06-May-2012 mav

Remove some hardcoded constants from code.


235080 06-May-2012 mav

Plug small memory leaks.


235076 06-May-2012 mav

Add support for RAID5R. Slightly improve support for RAIDMDF.


235069 06-May-2012 mav

Fix `gmultipath configure` for big-endian machines.

MFC after: 1 week


234994 04-May-2012 mav

Fix bug causing memory corruption and panics with big-endian metadata.


234993 04-May-2012 mav

Implement read-only support for volumes in optimal state (without using
redundancy) for the following RAID levels: RAID4/5E/5EE/6/MDF.


234940 03-May-2012 mav

Add optional -o argument to the `graid label ` to specify some metadata
format options. Use it for specifying byte order for the DDF metadata:
big-endian defined by specification and little-endian used by Adaptec.


234899 01-May-2012 mav

Improve spare disks support. Unluckily, for some reason Adaptec 1430SA
RAID BIOS doesn't want to understand spare disks created by graid. But
at least spares created by BIOS are working fine now.


234869 01-May-2012 mav

Implement volume deletion if disk has more than one partition.


234868 01-May-2012 mav

Improve DDF metadata writing.


234848 30-Apr-2012 mav

Add to GEOM RAID class module, supporting the DDF metadata format, as
defined by the SNIA Common RAID Disk Data Format Specification v2.0.

Supports multiple volumes per array and multiple partitions per disk.
Supports standard big-endian and Adaptec's little-endian byte ordering.
Supports all single-layer RAID levels. Dual-layer RAID levels except
RAID10 are not supported now because of GEOM RAID design limitations.

Some work is still to be done, but the present code already manages basic
interoperation with RAID BIOS of the Adaptec 1430SA SATA RAID controller.

MFC after: 1 month
Sponsored by: iXsystems, Inc.


234816 29-Apr-2012 mav

s/gmirror/graid/


234727 27-Apr-2012 mav

Fix RAID5 level names changed at r234603.


234610 23-Apr-2012 mav

Fix copy-paste typo in r234603.

Submitted by: kan


234603 23-Apr-2012 mav

Add names for all primary RAID levels defined by DDF 2.0 specification.


234601 23-Apr-2012 mav

Add sos@ copyrights to RAID metadata modules, respecting his efforts in
decoding metadata formats in ataraid(4) code.


234458 19-Apr-2012 mav

Add to GEOM RAID class module for reading non-degraded RAID5 volumes and
some environment to differentiate 4 possible RAID5 on-disk layouts.

Tested with Intel and AMD RAID BIOSes.

MFC after: 2 weeks


234417 18-Apr-2012 marck

VMware environments are not unusual now. Add VMware partitions recognition
(both MBR for ESXi <= 4.1 and GPT for ESXi 5) to g_part.

Reviewed by: ae
Approved by: ae
MFC after: 2 weeks


234415 18-Apr-2012 mav

Some improvements to GEOM MULTIPATH:
- Implement "configure" command to allow switching operation mode of
running device on-fly without destroying and recreation.
- Implement Active/Read mode as hybrid of Active/Active and Active/Passive.
In this mode all paths not marked FAIL may handle reads same time,
but unlike Active/Active only one path handles write requests at any
point in time. It allows to closer follow original write request order
if above layers need it for data consistency (not waiting for requisite
write completion before sending dependent write).
- Hide duplicate messages about device status change.
- Remove periodic thread wake up with 10Hz rate.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.


234026 08-Apr-2012 mckusick

Expand locking around identification of filesystem mount point when
accounting for I/O counts at completion of I/O operation. Also switch
from using global devmtx to vnode mutex to reduce contention.

Suggested and reviewed by: kib


233652 29-Mar-2012 ae

VMDB offset should be greater than logical volume size only for MBR.


233651 29-Mar-2012 ae

Do proper cleanup for the GPT case when an error occurs.


233627 28-Mar-2012 mckusick

Keep track of the mount point associated with a special device
to enable the collection of counts of synchronous and asynchronous
reads and writes for its associated filesystem. The counts are
displayed using `mount -v'.

Ensure that buffers used for paging indicate the vnode from
which they are operating so that counts of paging I/O operations
from the filesystem are collected.

This checkin only adds the setting of the mount point for the
UFS/FFS filesystem, but it would be trivial to add the setting
and clearing of the mount point at filesystem mount/unmount
time for other filesystems too.

Reviewed by: kib


233342 23-Mar-2012 ae

Check that scheme is not already registered. This may happen when a
KLD is preloaded with loader(8) and leads to infinite loops.

Also do not return EEXIST error code from MOD_LOAD handler, because
we have undocumented(?) ability to replace kernel's module with preloaded one.
And if we have so, then preloaded module will be initialized first.
Thus error in MOD_LOAD handler will be triggered for the kernel.

PR: kern/165573
MFC after: 3 weeks


233181 19-Mar-2012 ae

Add CTLFLAG_TUN to sysctls.

MFC after: 1 month


233176 19-Mar-2012 ae

Add new GEOM_PART_LDM module that implements the Logical Disk Manager
scheme. The LDM is a logical volume manager for MS Windows NT and it
is also known as dynamic volumes. It supports about 2000 partitions
and also provides the capability for software RAID implementations.

This version implements only partitioning scheme capability and based
on the linux-ntfs project documentation and several publications across
the Web. NOTE: JBOD, RAID0 and RAID5 volumes aren't supported.

An access to the LDM metadata is read-only. When LDM is on the disk
partitioned with MBR we can also destroy metadata. For the GPT
partitioned disks destroy action is not supported.

Reviewed by: ivoras (previous version)
MFC after: 1 month


233175 19-Mar-2012 ae

Make kern.geom.part node not static. Also add CTLFLAG_TUN to the
check_integrity sysctl.

MFC after: 1 month


233000 15-Mar-2012 ae

Add MODULE_DEPEND() to geom_part modules.

MFC after: 2 weeks


232680 08-Mar-2012 emaste

Remove unactionable message about label geometry

It's not clear to a user what they should do after seeing the "geometry
does not match label" kernel message, and it does not appear to present
a problem in practice. Thus, just remove the messages.

Approved by: marcel


231929 20-Feb-2012 ae

If nested scheme allows dump kernel to its partition, we may allow
dump for the parent partition too.

MFC after: 2 weeks


231928 20-Feb-2012 ae

Add alias for the partition type 0x0f. Now "ebr" name is used for both
types 0x05 and 0x0f, but 0x05 is preferred and used when partition is
created with "gpart add -t ebr ...".
This should keep EBR partitions accessible after r231754 for those,
who have EBR on the partition with type 0x0f.
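
The two-way mapping described in r231928 can be sketched as a pair of lookups.
The function names are hypothetical and the real table lives in the g_part MBR
scheme code; only the values 0x05, 0x0f and the "ebr" alias come from the log
entry above.

```python
# Both MBR type codes are reported under the same alias.
EBR_TYPES = (0x05, 0x0F)
PREFERRED_EBR_TYPE = 0x05


def type_to_alias(mbr_type):
    """Map an MBR partition type byte to its gpart alias, if it is an EBR."""
    return "ebr" if mbr_type in EBR_TYPES else None


def alias_to_type(alias):
    """Map a gpart alias to the type byte written on creation.

    "ebr" always resolves to the preferred code 0x05, so
    "gpart add -t ebr ..." never produces a 0x0f entry.
    """
    return PREFERRED_EBR_TYPE if alias == "ebr" else None
```

The mapping is deliberately asymmetric: reading accepts either code, writing
produces only the preferred one.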


231754 15-Feb-2012 ae

Add additional check to EBR probe and create methods:
don't try to probe and create EBR scheme when parent partition type
is not "ebr". This fixes error messages about corrupted EBR for
some partitions which actually contain another partition scheme.

NOTE: if you have EBR on a partition with a type other than "ebr"
(0x05), then you will lose access to partitions until it is
changed.

MFC after: 2 weeks


231751 15-Feb-2012 ae

Add PART::type attribute handler. It returns partition type as string.

MFC after: 2 weeks


231367 10-Feb-2012 ae

Add alias for the partition with type 0x42 to the MBR scheme.

MFC after: 1 week


231349 10-Feb-2012 ae

Let's be more realistic and limit maximum number of partition to 4k.

MFC after: 1 week


231075 06-Feb-2012 kib

Current implementations of sync(2) and syncer vnode fsync() VOP uses
mnt_noasync counter to temporarily remove MNTK_ASYNC mount option, which
is needed to guarantee a synchronous completion of the initiated i/o
before syscall or VOP return. Global removal of MNTK_ASYNC option is
harmful because not only i/o started from corresponding thread becomes
synchronous, but all i/o is synchronous on the filesystem which is
initiated during sync(2) or syncer activity.

Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local
thread flag to disable async i/o for current thread only. Use the
opportunity to move DOINGASYNC() macro into sys/vnode.h and
consistently use it through places which tested for MNTK_ASYNC.

Some testing demonstrated 60-70% improvements in run time for the
metadata-intensive operations on async-mounted UFS volumes, but still
with great deviation due to other reasons.

Reviewed by: mckusick
Tested by: scottl
MFC after: 2 weeks


230990 04-Feb-2012 emaste

Correct typo in comment (numbver)


230861 01-Feb-2012 ae

The scheme code may not know about some inconsistency in the metadata.
So, add an integrity check after recovery attempt.

MFC after: 1 week


230643 28-Jan-2012 attilio

Avoid to check the same cache line/variable from all the locking
primitives by breaking stop_scheduler into a per-thread variable.
Also, store the new td_stopsched very close to td_*locks members as
they will be accessed mostly in the same codepaths as td_stopsched and
this results in avoiding a further cache-line pollution, possibly.

STOP_SCHEDULER() was pondered to use a new 'thread' argument, in order to
take advantage of already cached curthread, but in the end there should
not really be a performance benefit, while introducing a KPI breakage.

In collaboration with: flo
Reviewed by: avg
MFC after: 3 months (or never)
X-MFC: r228424


230522 25-Jan-2012 nwhitehorn

Experimental support for booting CHRP-type PowerPC systems from hard disks.


230064 13-Jan-2012 truckman

Allow an MBR primary or extended Linux swap partition to be specified
as the system dump device. This was already allowed for GPT. The Linux
swap metadata at the beginning of the partition should not be disturbed
because the crash dump is written at the end.

Reviewed by: alfred, pjd, marcel
MFC after: 2 weeks


229886 09-Jan-2012 jimharris

Add support for >2TB disks in GEOM RAID for Intel metadata format.

Reviewed by: mav
Approved by: scottl
MFC after: 1 week


229537 04-Jan-2012 ray

GEOM_UNCOMPRESS module, can be used with uzip images and with new ulzma images.

Approved by: adrian (mentor)


228634 17-Dec-2011 avg

replace uses of libkern gets with cngets

MFC after: 2 months


228204 02-Dec-2011 mav

Close race between geom destruction on g_vfs_close() when softc destroyed
and g_vfs_orphan() call that tries to access softc, introduced at r227015.

PR: kern/162997


228076 28-Nov-2011 ae

Add an ability to increase number of allocated APM entries when we
have reserved free space in the APM area.
Also instead of one write request per each APM entry, use MAXPHY
sized writes when we are updating APM.

MFC after: 1 month


228061 28-Nov-2011 ae

The size of APM could be bigger than number of already allocated entries.
And the first usable sector should not start from the inside of APM area.

MFC after: 1 month


227510 14-Nov-2011 mav

Temporarily revert r227009 to fix freeze on UP systems without PREEMPTION.

Before r215687, if some withered geom or provider could not be destroyed,
g_event thread went to sleep for 0.1s before retrying. After that change
it is just restarting immediately. r227009 made orphaned (withered) provider
to not detach immediately, but only after context switch. That made loop
inside g_event thread infinite on UP systems without PREEMPTION.

To address original problem with possible deadlock addressed by r227009
we have to fix r215687 change first, that needs some time to think and test.


227464 12-Nov-2011 mav

Major GEOM MULTIPATH class rewrite:
- Improved locking and destruction process to fix crashes.
- Improved "automatic" configuration method to make it consistent and safe
by reading metadata back from all specified paths after writing to one.
- Added provider size check to reduce chance of ordering conflict with
other GEOM classes.
- Added "manual" configuration method without using on-disk metadata.
- Added "add" and "remove" commands to allow managing paths manually.
- Failed paths are no longer dropped from geom, but only marked as FAIL
and excluded from I/O operations.
- Automatically restore failed paths when all other paths are marked
as failed, for example, because of device-caused (not transport) errors.
- Added "fail" and "restore" commands to manually control FAIL flag.
- geom is now destroyed on last path disconnection.
- Added optional Active/Active mode support. Unlike Active/Passive
mode, load evenly distributed between all working paths. If supported by
the device, it allows to significantly improve performance, utilizing
bandwidth of all paths. It is controlled by -A option during creation.
Disabled by default now.
- Improved `status` and `list` commands output.

Sponsored by: iXsystems, inc.
MFC after: 1 month


227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


227293 07-Nov-2011 ed

Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.

This means that their use is restricted to a single C file.


227015 02-Nov-2011 mav

Add mutex and two flags to make orphan() call properly asynchronous:
- delay consumer closing and detaching on orphan() until all I/Os complete;
- prevent new I/Os submission after orphan() called.
Previous implementation could destroy consumers still having active
requests and worked only because of global workaround made on GEOM level.


227009 01-Nov-2011 mav

Make orphan() method in geom_dev asynchronous using destroy_dev_sched_cb()
instead of destroy_dev(). It moves device destruction waiting out of the
topology lock and so fixes dead lock between orphanization and closing.
Real provider and geom destruction called from swi context after device
destroyed as callback of the destroy_dev_sched_cb().


227004 01-Nov-2011 mav

Refactor disk disconnection and geom destruction handling sequences.
Do not close/destroy opened consumer directly in case of disconnect. Instead
keep it existing until it will be closed in regular way in response to
upstream provider destruction. Delay geom destruction in the same way.
Previous implementation could destroy consumers still having active
requests and worked only because of global workaround made on GEOM level.


226998 01-Nov-2011 mav

Refactor disk disconnection and geom destruction handling sequences.
Do not close/destroy opened consumer directly in case of disconnect. Instead
keep it existing until it will be closed in regular way in response to
upstream provider destruction. Delay geom destruction in the same way.
Previous implementation could destroy consumers still having active
requests and worked only because of global workaround made on GEOM level.


226985 01-Nov-2011 mav

Work around the problem introduced by the combination of r162200 and r215687.
r162200 delays provider orphanization until all running requests complete,
to work around a broken orphan() method implementation in some classes.
r215687 removes persistent periodic (10Hz) event thread wake ups.
Together these changes can indefinitely delay orphanization until some
other event wakes up the event thread. One consequence of this is that CAM
is unable to destroy a device disconnected while busy and, as a result, to
create a new one after reconnection.

While the best solution would be to revert r162200, it is not easy, as
some classes still look broken in that way. Instead conditionally wake up
the event thread if there are some providers waiting for orphanization.

MFC after: 1 week


226880 28-Oct-2011 ae

Our geom withering function could take some time before a geom with its
providers and consumers is destroyed. Before taking any action on a geom,
check that it is not being destroyed at the moment.

Tested by: nwhitehorn
MFC after: 1 week


226840 27-Oct-2011 pjd

Before this change, when GELI detected hardware crypto acceleration it would
start only one worker thread. For software crypto it would start by default
N worker threads, where N is the number of available CPUs.

This is not optimal if hardware crypto is AES-NI, which uses CPU for AES
calculations.

Change that to always start one worker thread for every available CPU.
Number of worker threads per GELI provider can be easily reduced with
kern.geom.eli.threads sysctl/tunable and even for software crypto it
should be reduced when using more providers.

While here, when the number of threads exceeds the number of CPUs available,
don't reduce this number; assume the user knows what he is doing.

Reported by: Yuri Karaban <dev@dev97.com>
MFC after: 3 days
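
The thread-count policy described in r226840 can be sketched as follows. This
is a minimal Python model, not the kernel code; the function name is
hypothetical, and treating a tunable value of 0 as "not set" is an assumption.

```python
def geli_worker_threads(ncpus, threads_tunable=0):
    """Return how many worker threads to start for one GELI provider.

    threads_tunable models kern.geom.eli.threads; 0 is assumed to mean
    "not set, choose automatically".
    """
    if threads_tunable > 0:
        # An explicit setting is used as-is, even when it exceeds ncpus.
        return threads_tunable
    # Otherwise one worker per CPU, for hardware and software crypto alike.
    return ncpus
```

With this model, an 8-CPU machine gets 8 workers by default, while an explicit
setting of 16 on a 4-CPU machine is left at 16.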


226816 26-Oct-2011 mav

Clarify disks/volumes above 2TiB support in geom_raid:
- add support for volumes above 2TiB with Promise metadata format;
- enforce and document other limitations:
- Intel and Promise metadata formats do not support disks above 2TiB;
- NVIDIA metadata format does not support volumes above 2TiB.

Sponsored by: iXsystems, Inc.
MFC after: 2 weeks


226737 25-Oct-2011 pjd

Allow upper layers to discover that BIO_DELETE and/or BIO_FLUSH is not
supported by returning EOPNOTSUPP instead of 0 or ENODEV.

MFC after: 3 days


226736 25-Oct-2011 pjd

Improve style a bit.

MFC after: 3 days


226735 25-Oct-2011 pjd

Simplify disk_alloc().

MFC after: 3 days


226733 25-Oct-2011 pjd

Add support for creating GELI devices with older metadata version for use
with older FreeBSD versions:
- Add -V option to 'geli init' to specify version number. If no -V is given
the most recent version is used.
- If -V is given don't allow to use features not supported by this version.
- Print version in 'geli list' output.
- Update manual page and add table describing which GELI version is
supported by which FreeBSD version, so one can use it when preparing GELI
device for older FreeBSD version.

Inspired by: Garrett Cooper <yanegomi@gmail.com>
MFC after: 3 days


226730 25-Oct-2011 pjd

When decoding metadata, check magic string, so we know this is not GELI device
before we check its version. We don't want to report that some garbage is
unsupported version if this is not even GELI provider.

MFC after: 3 days
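
The order of checks described in r226730, together with the EOPNOTSUPP result
from r226721, might look like the sketch below. It is a Python model under
stated assumptions: the magic string, the highest supported version and the
EINVAL result for a bad magic are illustrative choices, not taken from this log.

```python
import errno

G_ELI_MAGIC = b"GEOM::ELI"  # assumed magic string
G_ELI_VERSION = 6           # assumed highest supported metadata version


def check_metadata(magic, version):
    """Return 0 if the metadata looks usable, otherwise an errno value."""
    # Magic first: garbage means "not a GELI provider" rather than
    # "unsupported version".
    if magic != G_ELI_MAGIC:
        return errno.EINVAL  # assumed error code for a bad magic
    # Only a real GELI provider gets its version examined.
    if version > G_ELI_VERSION:
        return errno.EOPNOTSUPP
    return 0
```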


226728 25-Oct-2011 pjd

Prefer G_ELI_VERSION_* defines for version numbers over plain digits.

MFC after: 3 days


226727 25-Oct-2011 pjd

Fit lines into 80 chars.

MFC after: 3 days


226721 25-Oct-2011 pjd

When metadata is at newer version than the highest supported, return
EOPNOTSUPP when decoding.

MFC after: 3 days


226647 23-Oct-2011 marcel

Add support for Boot Camp. The support is defined as follows:
o Detect when Boot Camp is enabled (i.e. the MBR mirrors the GPT).
o When Boot Camp is enabled, update the MBR whenever we write the GPT.
o Creation of a Boot Camp enabled GPT is not supported.
o Automatically disable Boot Camp when the GPT has been changed so that
there's either no EFI partition or no HFS+ partition.
o The first 4 partitions (by index) get mirrored in the MBR.

Requested by, discussed with and tested by: kris@pcbsd.org
MFC after: 1 week


226522 18-Oct-2011 marius

Allow to dump on Solaris swap partitions.

PR: 161764
Submitted by: Peter Jeremy


224147 17-Jul-2011 pjd

Add some spare fields to the g_class and g_geom structures needed to implement
direct I/O handling and provider's property changes handling.


223930 11-Jul-2011 ae

Remove include of sys/sbuf.h from geom/geom.h.
sbuf support is not always required for geom/geom.h users, and there is no
need to depend on it.

PR: kern/158398


223921 11-Jul-2011 ae

Include sys/sbuf.h directly.

Reviewed by: pjd


223900 10-Jul-2011 mckusick

Allow disk partitions associated with UFS read-only mounted
filesystems to be opened for writing. This functionality used to
be special-cased for just the root filesystem, but with this change
is now available for all UFS filesystems. This change is needed for
journaled soft updates recovery.

Discussed with: Jeff Roberson


223660 29-Jun-2011 ae

Initialize elements of state array when creating the GPT table.
This fixes the problem where the secondary GPT header is not erased when
the partition table is destroyed. Move the common operations from
g_part_gpt_create and g_part_gpt_recover to the separate function
g_gpt_set_defaults.

Reported by: dwhite
MFC after: 1 week


223594 27-Jun-2011 ae

EBR could contain an early stage of boot code, but we do not support it.
Remove the message about non-empty bootcode; we can not break anything
while GEOM_PART_EBR_COMPAT is defined.

But without GEOM_PART_EBR_COMPAT any changes in EBR are allowed and we
can accidentally wipe the boot code. To avoid breaking anything, save
the first EBR chunk and keep it untouched each time we change the EBR.
Note that we still do not support boot code for EBR.

PR: kern/141235
MFC after: 1 month


223587 27-Jun-2011 ae

MS Windows NT+ uses 4 bytes at offset 0x1b8 in the MBR to identify
the disk drive. The boot0cfg(8) utility preserves these 4 bytes when
writing bootcode, to keep multiboot ability.
Change gpart's bootcode method to keep the DSN if it is not zero. Also
do not allow writing bootcode with a size not equal to MBRSIZE.

PR: kern/157819
Tested by: Eir Nym
MFC after: 1 month
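
The behaviour described in r223587 can be illustrated with a small Python
model. The function name is hypothetical, and the 512-byte MBRSIZE and the
partition table offset of 446 are assumptions; only the 0x1b8 offset and the
4-byte length come from the log entry.

```python
MBRSIZE = 512      # assumed size of one MBR sector
DOSDSNOFF = 0x1b8  # offset of the 4-byte disk serial number (from the log)
DOSPARTOFF = 446   # assumed offset where the partition table starts


def install_bootcode(current_mbr, bootcode):
    """Return a new MBR with bootcode installed and a non-zero DSN kept."""
    if len(bootcode) != MBRSIZE:
        raise ValueError("bootcode must be exactly MBRSIZE bytes")
    dsn = bytes(current_mbr[DOSDSNOFF:DOSDSNOFF + 4])
    new = bytearray(current_mbr)
    # Replace only the code area; the partition table stays untouched.
    new[:DOSPARTOFF] = bootcode[:DOSPARTOFF]
    if dsn != b"\x00\x00\x00\x00":
        # Restore the serial number that the copy above overwrote.
        new[DOSDSNOFF:DOSDSNOFF + 4] = dsn
    return bytes(new)
```

When the existing DSN is zero, the four bytes supplied by the bootcode image
are kept instead.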


223332 20-Jun-2011 ae

Change the way how we update bootcode for BSD scheme.
Since the only parameter that we check is size of bootcode, then
allow only two sizes: size of boot1 and size of /boot/boot.
This partially protects users from losing ability to boot if incorrect
bootcode is specified.

Requested by: ru


223089 14-Jun-2011 gibbs

Plumb device physical path reporting from CAM devices, through GEOM and
DEVFS, and make it accessible via the diskinfo utility.

Extend GEOM's generic attribute query mechanism into generic disk consumers.
sys/geom/geom_disk.c:
sys/geom/geom_disk.h:
sys/cam/scsi/scsi_da.c:
sys/cam/ata/ata_da.c:
- Allow disk providers to implement a new method which can override
the default BIO_GETATTR response, d_getattr(struct bio *). This
function returns -1 if not handled, otherwise it returns 0 or an
errno to be passed to g_io_deliver().

sys/cam/scsi/scsi_da.c:
sys/cam/ata/ata_da.c:
- Don't copy the serial number to dp->d_ident anymore, as the CAM XPT
is now responsible for returning this information via
d_getattr()->(a)dagetattr()->xpt_getattr().

sys/geom/geom_dev.c:
- Implement a new ioctl, DIOCGPHYSPATH, which returns the GEOM
attribute "GEOM::physpath", if possible. If the attribute request
returns a zero-length string, ENOENT is returned.

usr.sbin/diskinfo/diskinfo.c:
- If the DIOCGPHYSPATH ioctl is successful, report physical path
data when diskinfo is executed with the '-v' option.

Submitted by: will
Reviewed by: gibbs
Sponsored by: Spectra Logic Corporation

Add generic attribute change notification support to GEOM.

sys/sys/geom/geom.h:
Add a new attrchanged method field to both g_class
and g_geom.

sys/sys/geom/geom.h:
sys/geom/geom_event.c:
- Provide the g_attr_changed() function that providers
can use to advertise attribute changes.
- Perform delivery of attribute change notifications
from a thread context via the standard GEOM event
mechanism.

sys/geom/geom_subr.c:
Inherit the attrchanged method from class to geom (class instance).

sys/geom/geom_disk.c:
Provide disk_attr_changed() to provide g_attr_changed() access
to consumers of the disk API.

sys/cam/scsi/scsi_pass.c:
sys/cam/scsi/scsi_da.c:
sys/geom/geom_dev.c:
sys/geom/geom_disk.c:
Use attribute changed events to track updates to physical path
information.

sys/cam/scsi/scsi_da.c:
Add AC_ADVINFO_CHANGED to the registered asynchronous CAM
events for this driver. When this event occurs, and
the updated buffer type references our physical path
attribute, emit a GEOM attribute changed event via the
disk_attr_changed() API.

sys/cam/scsi/scsi_pass.c:
Add AC_ADVINFO_CHANGED to the registered asynchronous CAM
events for this driver. When this event occurs, update
the physical path devfs alias for this pass instance.

Submitted by: gibbs
Sponsored by: Spectra Logic Corporation


222813 07-Jun-2011 attilio

Retire the cpumask_t type and replace it with cpuset_t usage.

This is intended to fix the bug where cpu mask objects are
capped to 32. MAXCPU, then, can now be arbitrarily bumped to whatever
value. Anyway, as long as several structures in the kernel are
statically allocated and sized as MAXCPU, it is suggested to keep it
as low as possible for the time being.

Technical notes on this commit itself:
- More functions to handle cpuset_t objects are introduced.
The most notable are cpusetobj_ffs() (which calculates a ffs(3)
for a cpuset_t object), cpusetobj_strprint() (which prepares a string
representing a cpuset_t object) and cpusetobj_strscan() (which
creates a valid cpuset_t starting from a string representation).
- pc_cpumask and pc_other_cpus are targeted to be removed soon.
With the move from cpumask_t to cpuset_t they are now inefficient
and not really useful. Anyway, for the time being, please note that
access to pcpu data is protected by sched_pin() in order to avoid
migrating the CPU while reading more than one (possible) word.
- Please note that size of cpuset_t objects may differ between kernel
and userland. While this is not directly related to the patch itself,
it is good to understand that concept and possibly use the patch
as a reference on how to deal with cpuset_t objects in userland, when
accessing kernel members.
- KTR_CPUMASK is changed and now is represented through a string, to be
set as the example reported in NOTES.

Please additionally note that no MAXCPU is bumped in this patch, but
private testing has been done up to MAXCPU=128 on a real 8x8x2(htt)
machine (amd64).

Please note that the FreeBSD version is not yet bumped because of
the upcoming pcpu changes. However, note that this patch is not
targeted for MFC.

People to thank for the time spent on this patch:
- sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested
several revisions of the patches and really helped in improving
stability of this work.
- marius fixed several bugs in the sparc64 implementation and reviewed
patches related to ktr.
- jeff and jhb discussed the basic approach followed.
- kib and marcel made targeted review on some specific part of the
patch.
- marius, art, nwhitehorn and andreast reviewed MD specific part of
the patch.
- marius, andreast, gonzo, nwhitehorn and jceel tested MD specific
implementations of the patch.
- Other people have made contributions on other patches that have been
already committed and have been listed separately.

Companies that should be mentioned for having participated at several
degrees:
- Yahoo! for having offered the machines used for testing on big
count of CPUs.
- The FreeBSD Foundation for having sponsored my devsummit attendance,
which has been instrumental.
- Sandvine for having offered offices and infrastructure during
development.

(I really hope I didn't forget anyone, if it happened I apologize in
advance).


222652 03-Jun-2011 mav

Update disk's stripesize and stripeoffset parameters on provider open.
They are media-dependent and may change in run-time, same as sectorsize
and/or mediasize.

SCSI devices return physical sector size and offset via READ CAPACITY(16)
command and so can not report it until media inserted or at least until
probe sequence completed. UNMAP support is also reported there.


222642 03-Jun-2011 ae

Add diagnostic message about not aligned partitions.

Idea from: ivoras


222603 02-Jun-2011 ae

Do not hide stripeoffset from libgeom(3), it may be useful even when
stripesize is zero.

MFC after: 1 week


222341 27-May-2011 ae

Some partitioning tools may have a different opinion about disk
geometry and partitions may start from within the first track.
If we find such partitions, then do not reserve space of the
first track, only the first sector.


222283 25-May-2011 ae

Prevent non-aligned reading from provider while tasting. Reject
providers with unsupported sectorsize.

Reported by: Joerg Wunsch
MFC after: 1 week


222281 25-May-2011 ae

Do not truncate available disk space to the closest track boundary.


222280 25-May-2011 ae

Do not truncate available disk space to the closest track boundary.


222279 25-May-2011 ae

Do not truncate available disk space to the closest track boundary.


222244 24-May-2011 ae

Remove unused variable.

MFC after: 1 week


222243 24-May-2011 ae

Remove unused variable.

MFC after: 1 week


222225 23-May-2011 pjd

Recognize BIO_FLUSH requests and pass them to userland.

MFC after: 1 week


221992 16-May-2011 ae

Make diagnostic messages more specific. With bootverbose print out
all inconsistencies of integrity in the partition table, not first
found only.

Requested by: kib


221984 16-May-2011 ae

Add diagnostic messages for integrity checks.


221972 15-May-2011 ae

Add a sysctl kern.geom.part.check_integrity for those who have corrupt
partition tables and lost the ability to boot after r221788.
Also unhide an error message from bootverbose; this makes it
easier to determine the problem.


221953 15-May-2011 trociny

Fix a memory leak possible in g_eli_key_allocate() if the key with the
same keyno is added while we aren't holding the lock.

Approved by: pjd (mentor)
MFC after: 1 week


221792 11-May-2011 thompsa

Move the three geom kprocs as threads under a single pid.

Reviewed by: julian


221788 11-May-2011 ae

Add basic metadata integrity check. In case when partition table was
probed and read successfully, but it contains invalid values (e.g.
overlapped partitions, offset or size is out of bounds), then the table
will be rejected.

MFC after: 1 month
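
The kind of check r221788 describes (overlapping partitions, offsets or sizes
out of bounds) can be sketched like this. It is a simplified Python model with
a hypothetical function name; entries are given as inclusive sector ranges,
which is an assumption made for the example.

```python
def table_is_consistent(entries, first, last):
    """Check (start, end) inclusive sector ranges against [first, last].

    Returns False when an entry is out of bounds, inverted, or overlaps
    another entry; a table failing this check would be rejected.
    """
    prev_end = None
    for start, end in sorted(entries):
        if start > end or start < first or end > last:
            return False  # offset or size is out of bounds
        if prev_end is not None and start <= prev_end:
            return False  # overlaps the previous partition
        prev_end = end
    return True
```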


221658 08-May-2011 ae

Limit number of sectors that can be addressed.

MFC after: 1 week


221656 08-May-2011 ae

Limit number of sectors that can be addressed.

MFC after: 1 week


221654 08-May-2011 ae

Limit number of sectors that can be addressed.
Reject table if blkcount from metadata is greater than provider.


221652 08-May-2011 ae

Limit number of sectors that can be addressed.

MFC after: 1 week


221647 08-May-2011 ae

Replace UINT_MAX with UINT32_MAX.

Pointed out by: kib
MFC after: 1 week


221645 08-May-2011 ae

Limit number of sectors that can be addressed.

MFC after: 1 week


221644 08-May-2011 ae

Limit number of sectors that can be addressed.

MFC after: 1 week


221631 08-May-2011 pjd

Export GELI class version via sysctl kern.geom.eli.version.

MFC after: 1 week


221630 08-May-2011 pjd

Version 6 is compatible with version 5 when it comes to control commands.

MFC after: 1 week


221629 08-May-2011 pjd

Detect and handle metadata of version 6.

MFC after: 1 week


221628 08-May-2011 pjd

When support for multiple encryption keys was committed, GELI integrity mode
was not updated to pass CRD_F_KEY_EXPLICIT flag to opencrypto. This resulted in
always using first key.

We need to support providers created with this bug, so set special
G_ELI_FLAG_FIRST_KEY flag for GELI provider in integrity mode with version
smaller than 6 and pass the CRD_F_KEY_EXPLICIT flag to opencrypto only if
G_ELI_FLAG_FIRST_KEY doesn't exist.

Reported by: Anton Yuzhaninov <citrin@citrin.ru>
MFC after: 1 week


221626 08-May-2011 pjd

Remove prototype for a function that no longer exists.

MFC after: 1 week


221625 08-May-2011 pjd

Drop proper key.

MFC after: 1 week


221624 08-May-2011 pjd

Add magic field to the g_eli_key structure to detect if we are really
operating on proper structures.

MFC after: 1 week


221500 05-May-2011 adrian

Updates to geom_map from the author.

The major update here is to support 64 bit size/offsets.
There's also style related changes.

Submitted by: ray@dlink.ua


221453 04-May-2011 ae

Remove unneeded code.

MFC after: 1 week


221452 04-May-2011 ae

Remove unneeded code.

MFC after: 1 week


221451 04-May-2011 ae

Remove unneeded code.

MFC after: 1 week


221449 04-May-2011 ae

Removed KASSERT, g_new_providerf() can not fail.

MFC after: 1 week


221447 04-May-2011 ae

Remove "for a moment" assignment. struct g_geom zeroed when allocated.

MFC after: 1 week


221446 04-May-2011 ae

Remove unneeded checks, g_new_xxx functions can not fail.

MFC after: 1 week


221433 04-May-2011 ae

When checking existence of providers skip those which are orphaned.

PR: kern/132273
MFC after: 2 week


221400 03-May-2011 mav

Use make_dev_alias_p() added in r221397 to create alias dev entry.
It removes panic in case if alias name is already busy for some reason.


221101 27-Apr-2011 mav

Implement relaxed comparison for hardcoded provider names to make it
ignore adX/adaY difference in both directions to simplify migration to
the CAM-based ATA or back.
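
One plausible reading of the relaxed comparison in r221101 is sketched below
in Python: strip a leading adX or adaY device prefix, unit number included,
and compare what is left. The function name and the exact matching rule are
assumptions for illustration.

```python
import re

# "ad" or "ada" followed by a unit number, anchored at the start of the name.
_AD_PREFIX = re.compile(r"ada?[0-9]+")


def relaxed_name_equal(a, b):
    """Compare provider names, ignoring an adX/adaY prefix difference."""
    if a == b:
        return True
    ma, mb = _AD_PREFIX.match(a), _AD_PREFIX.match(b)
    if ma is None or mb is None:
        return False
    # Only the part after the device prefix (e.g. "s1a") has to agree.
    return a[ma.end():] == b[mb.end():]
```

Under this rule a hardcoded name such as ad4s1a still matches after the disk
shows up as ada0s1a, and the reverse holds as well.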


221071 26-Apr-2011 mav

- Add shim to simplify migration to the CAM-based ATA. For each new adaX
device in /dev/ create symbolic link with adY name, trying to mimic old ATA
numbering. Imitation is not complete, but should be enough in most cases to
mount file systems without touching /etc/fstab.
- To know what behavior to mimic, restore ATA_STATIC_ID option in cases
where it was present before.
- Add some more details to UPDATING.


220984 24-Apr-2011 pjd

One key is expected from providers smaller than or equal to (2^20)*sectorsize
bytes. Remove bogus assertion and while here remove another too obvious
assertion.

Reported by: Fabian Keil <freebsd-listen@fabiankeil.de>
MFC after: 2 weeks


220923 21-Apr-2011 pjd

If number of keys for the given provider doesn't exceed the limit,
allocate all of them at attach time. This allows to avoid moving
keys around in the most-recently-used queue and needs no mutex
synchronization nor refcounting.

MFC after: 2 weeks


220922 21-Apr-2011 pjd

Instead of allocating memory for all the keys at device attach,
create reasonably large cache for the keys that is filled when
needed. The previous version was problematic for very large providers
(hundreds of terabytes or several petabytes). Every terabyte of data
needs around 256kB for keys. Make the default cache limit big enough
to fit all the keys needed for 4TB providers, which will eat at most
1MB of memory.

MFC after: 2 weeks
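
The figures in r220922 can be reproduced with a short calculation. This Python
sketch assumes 512-byte sectors, one key per 2^20 sectors (the ratio stated in
r220984) and a per-key memory cost of 128 bytes; the last value is inferred
from the 256kB-per-terabyte figure and is not stated in the log.

```python
SECTORS_PER_KEY = 2 ** 20  # one key covers 2^20 sectors
KEY_ENTRY_BYTES = 128      # assumed memory cost of one cached key
TIB = 2 ** 40


def keys_needed(mediasize, sectorsize=512):
    """Number of keys needed to cover a provider of mediasize bytes."""
    span = SECTORS_PER_KEY * sectorsize
    return (mediasize + span - 1) // span  # round up


def key_memory(mediasize, sectorsize=512):
    """Approximate bytes of memory needed to hold every key at once."""
    return keys_needed(mediasize, sectorsize) * KEY_ENTRY_BYTES
```

Under these assumptions 1 TiB needs 2048 keys, or 256 KiB, and a 4 TiB
provider needs 1 MiB, which agrees with the default cache limit mentioned
in the entry.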


220790 18-Apr-2011 mav

Reduce geom_raid log verbosity.


220652 15-Apr-2011 gavin

Remove an incorrect be16toh() that prevented geom_part_apm from working on
little-endian machines.

Reviewed by: marcel
MFC after: 2 weeks


220559 12-Apr-2011 adrian

Introduce geom_map, a GEOM provider designed for use by
embedded flash stores.

Some devices - notably those with uboot - don't have an
explicit partition table (eg like Redboot's FIS.)
geom_map thus provides an easy way to export the hard-coded
flash layout as geom providers for use by filesystems and
other tools.

It also includes a "search" function which allows for
dynamic creation of partition layouts where the device only
has a single hard-coded partition. For example, if
there is a "kernel+rootfs" partition, a single image can
be created which appends the rootfs after the kernel with
an appropriate search string. geom_map can be told to
search for said search string and create a partition
beginning after it.

Submitted by: Aleksandr Rybalko <ray@dlink.ua>


220299 03-Apr-2011 trociny

In g_eli_read_done() and g_eli_write_done(), for a bio with
bio_children > 1, g_destroy_bio() is never called and the bio
leaks. Fix this by calling g_destroy_bio() earlier, before the check.

Submitted by: Victor Balada Diaz <victor@bsdes.net> (initial version)
Approved by: pjd (mentor)
MFC after: 1 week


220264 02-Apr-2011 pjd

GEOM has an internal mechanism to deal with ENOMEM errors returned via
g_io_deliver(). In such case it increases 'pace' counter on each ENOMEM and
reschedules the request. The 'pace' counter is decreased for each request going
down, but until 'pace' is greater than zero, GEOM will handle at most 10
requests per second. For GEOM GATE users that are proxy to local GEOM providers
(like ggatel(8) and HAST) we can end up with almost permanent slow down of GEOM
down queue. This is because once we reach GEOM GATE queue limit, we return
ENOMEM to the GEOM. This means that we have, eg. 1024 I/O requests in the GEOM
GATE queue. To make room in the queue and stop returning ENOMEM we need to
process the requests of course, but those requests are handled by userland
daemons that handle them by reading/writing also from/to local GEOM providers.
For example with HAST, a new request comes to /dev/hast/data, which is GEOM
GATE provider. GEOM GATE passes the request to hastd(8) and hastd(8)
reads/writes from/to /dev/da0. Once we reach GEOM GATE queue limit, to free up
a slot in GEOM GATE queue, hastd(8) has to read/write from/to /dev/da0, but
this request will also be very slow, because GEOM now slows down all the
requests. We end up with full queue that we can unload at the speed of 10
requests per second. This simply looks like a deadlock.

Fix it by allowing userland daemons that work with both GEOM GATE and local
GEOM providers to specify unlimited queue size, so GEOM GATE will never return
ENOMEM to the GEOM.

MFC after: 1 week


220210 31-Mar-2011 mav

Bunch of small bugfixes and cleanups.

Found with: Clang Static Analyzer


220209 31-Mar-2011 mav

Bunch of small bugfixes and cleanups.

Found with: Coverity Prevent(tm)
CID: 9656, 9658, 9693, 9705, 9706, 9707, 9808, 9809, 9810,
9711, 9712, 9713, 9714


220184 31-Mar-2011 ae

Remove unneeded checks, g_new_xxx functions can not return NULL.

Reviewed by: pjd
MFC after: 1 week


220173 30-Mar-2011 trociny

Increase debug level on g_gate device destruction and add message on
device creation.

Suggested by: danger
Approved by: pjd (mentor)
MFC after: 3 days


220062 27-Mar-2011 trociny

In g_gate_create() there is a window between when g_gate_softc is
registered in g_gate_units array and when its sc_provider field is
filled. If during this period g_gate_units is accessed by another
thread that is checking for provider name collision a crash is
possible.

Fix this by adding sc_name field to struct g_gate_softc. In
g_gate_create() when g_gate_softc is created but sc_provider is still
not set, sc_name points to the provider name stored in the local array.

Approved by: pjd (mentor)
Reported by: Freddie Cash <fjwcash@gmail.com>
MFC after: 1 week


219974 24-Mar-2011 mav

MFgraid/head:
Add new RAID GEOM class, that is going to replace ataraid(4) in supporting
various BIOS-based software RAIDs. Unlike ataraid(4) this implementation
does not depend on legacy ata(4) subsystem and can be used with any disk
drivers, including new CAM-based ones (ahci(4), siis(4), mvs(4), ata(4)
with `options ATA_CAM`). To make code more readable and extensible, this
implementation follows modular design, including core part and two sets
of modules, implementing support for different metadata formats and RAID
levels.

Support for the following popular metadata formats is now implemented:
Intel, JMicron, NVIDIA, Promise (also used by AMD/ATI) and SiliconImage.

The following RAID levels are now supported:
RAID0, RAID1, RAID1E, RAID10, SINGLE, CONCAT.

For all of these RAID levels and metadata formats this class supports
the full cycle of volume operations: reading, writing, creation, deletion,
disk removal and insertion, rebuilding, dirty shutdown detection
and resynchronization, bad sector recovery, faulty disks tracking,
hot-spare disks. For Intel and Promise formats there is support for multiple
volumes per disk set.

See the graid(8) manual page for additional details.

Co-authored by: imp
Sponsored by: Cisco Systems, Inc. and iXsystems, Inc.


219970 24-Mar-2011 mav

MFgraid/head r218212, r218257:
Introduce new type of BIO_GETATTR -- GEOM::setstate, used to inform lower
GEOM about state of its providers from the point of view of upper layers.
Make geom_disk use led(4) subsystem to illuminate states in such fashion:
FAILED - "1" (on), REBUILD - "f5" (slow blink), RESYNC - "f1" (fast blink),
ACTIVE - "0" (off).
LED name should be set for each disk via kern.geom.disk.%s.led sysctl.
Later disk API could be extended to allow disk driver to report this info
in custom way via its own facilities.


219950 24-Mar-2011 mav

MFgraid/head r217827:
Change BIO_GETATTR("GEOM::kerneldump") API to make set_dumper() called by
consumer (geom_dev) instead of provider (geom_disk). This allows any geom to
insert its code into the dump call chain, implementing more sophisticated
functionality than just disk partitioning.


219400 08-Mar-2011 sobomax

Some linux distros put mount point into the ext2fs labels, such as '/', or
'/boot', which confuses the devfs code and can cause userland programs to
fail reading /dev/ext2fs directory with weird error code, such as any
program that uses pwlib.

Strip any leading slashes before feeding the label to the geom_label code.

Sponsored by: Sippy Software, Inc.

MFC after: 1 week
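
The label clean-up from r219400 amounts to dropping leading slashes before the
label is used as a name under /dev/ext2fs. A minimal Python sketch, with a
hypothetical function name:

```python
def sanitize_label(label):
    """Strip leading slashes so a label like '/boot' becomes 'boot'."""
    return label.lstrip("/")
```

Note that a label consisting only of "/" becomes an empty string under this
model; how the kernel treats that case is not stated in the log entry.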


219056 26-Feb-2011 nwhitehorn

Add the disk ident and a human-meaningful description (here, the disk model
string) to the geom_disk config XML so that they are easily accessible from
userland.

MFC after: 1 week


219029 25-Feb-2011 netchild

Add some FEATURE macros for various GEOM classes.

No FreeBSD version bump, the userland application to query the features will
be committed last and can serve as an indication of the availability if
needed.

Sponsored by: Google Summer of Code 2010
Submitted by: kibab
Reviewed by: silence on geom@ during 2 weeks
X-MFC after: to be determined in last commit with code from this project


218909 21-Feb-2011 brucec

Fix typos - remove duplicate "the".

PR: bin/154928
Submitted by: Eitan Adler <lists at eitanadler.com>
MFC after: 3 days


218845 19-Feb-2011 nyan

Add support to set a slice name.


218675 14-Feb-2011 luigi

Correct a subtle bug in the 'gsched_rr' disk scheduler.
The algorithm is supposed to work as follows:
in order to prevent starvation, when a new client starts being served we
record the start time and reset the counter of bytes served.
We then switch to a new client after a certain amount of time or bytes,
even if the current one still has pending requests.
To avoid charging a new client the time of the first seek,
we start counting time when the first request is served.

Unfortunately a bug in the previous version of the code failed
to set the start time in certain cases, resulting in some processes
exceeding their timeslice.

The fix (in this patch) is trivial, though it took a while to find
out and replicate the bug.
Thanks to Tommaso Caprai for investigating and fixing the problem.

Submitted by: Tommaso Caprai
MFC after: 1 week


218663 13-Feb-2011 marcel

Use the preload_fetch_addr() and preload_fetch_size() convenience
functions to obtain the address and size of the preloaded key files.

Sponsored by: Juniper Networks.


218558 11-Feb-2011 nyan

Add support to write boot menu.


218014 28-Jan-2011 ae

Add new user-friendly aliases for partition types for the MBR and
EBR schemes: fat32, ebr, linux-data, linux-raid, linux-swap and
linux-lvm. Add bios-boot GUID and alias for the GPT scheme. It is used by
the GRUB 2 loader. Also sort the definitions of types in diskmbr.h
and in g_part.c.

PR: bin/120990, kern/147664
MFC after: 2 weeks


217924 27-Jan-2011 ae

While inspecting the disklabel, check that the start offset of a partition is
within the provider's bounds. If not, then reject this disklabel.
Mark bbarea as NULL to avoid freeing it again in the destroy method.

MFC after: 1 week


217915 26-Jan-2011 mdf

Remove the CTLFLAG_NOLOCK as it seems to be both unused and
unfunctional. Wiring the user buffer has only been done explicitly
since r101422.

Mark the kern.disks sysctl as MPSAFE since it is and it seems to have
been mis-using the NOLOCK flag.

Partially break the KPI (but not the KBI) for the sysctl_req 'lock'
field since this member should be private and the "REQ_LOCKED" state
seems meaningless now.


217880 26-Jan-2011 kib

Treat async buffer writes from the gjournal switcher thread the same as
from syncer. We shall not sleep on running buffer space when suspending.

Reproduced and tested by: pho
PR: kern/154228
MFC after: 1 week


217531 18-Jan-2011 ae

Limit maximum number of GPT entries to 4k. It is the most realistic value
and can prevent kernel memory exhaustion when a big value is specified
from the command line.

Split reading and writing operations into several iterations to avoid
triggering a KASSERT when data length is greater than MAXPHYS.

PR: kern/144962, kern/147851
MFC after: 2 weeks
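
Splitting a large table transfer into requests no bigger than MAXPHYS, as
r217531 describes, can be sketched as below. This is a Python model with a
hypothetical function name; the 128 KiB MAXPHYS and the 128-byte GPT entry
size are assumptions used only to make the numbers concrete.

```python
MAXPHYS = 128 * 1024    # assumed maximum size of a single I/O request
GPT_ENTRY_SIZE = 128    # assumed size of one GPT entry in bytes
GPT_MAX_ENTRIES = 4096  # the "4k" limit from the log entry


def io_chunks(offset, length, maxphys=MAXPHYS):
    """Split one transfer into (offset, size) pieces of at most maxphys."""
    chunks = []
    while length > 0:
        n = min(length, maxphys)
        chunks.append((offset, n))
        offset += n
        length -= n
    return chunks
```

With these values a full table of 4096 entries is 512 KiB, so it would be
transferred in four requests instead of one oversized request.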


217324 12-Jan-2011 mdf

sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.

Commit the geom piece.


217305 12-Jan-2011 ae

Sector size can not be greater than MAXPHYS. Since GRAID3 calculates
sector size from user-specified block size, report to user about
big blocksize.

PR: kern/147851
MFC after: 1 week


217303 12-Jan-2011 ae

Sector size can not be greater than MAXPHYS.

MFC after: 1 week


217263 11-Jan-2011 ae

Remove redundant check.

MFC after: 1 week


217262 11-Jan-2011 ae

Round GNOP provider's mediasize to its sectorsize. This prevents a KASSERT
in g_io_request when geom classes are tasting.

PR: kern/147852
MFC after: 1 week
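
The rounding in r217262 can be illustrated with one line of arithmetic. The
sketch assumes the size is rounded down to a whole number of sectors; the
function name is hypothetical.

```python
def round_mediasize(mediasize, sectorsize):
    """Round mediasize down to a multiple of sectorsize."""
    return mediasize - (mediasize % sectorsize)
```

For example, a requested size of 1000 bytes with 512-byte sectors becomes 512,
so no request can end in the middle of a sector.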


217109 07-Jan-2011 mdf

Fix a memory overflow where the input length to g_gpt_utf8_to_utf16()
was specified incorrectly, causing the bzero to run past the end of a
malloc(9)'d object.

Submitted by: Eric Youngblut < eyoungblut AT isilon DOT com >
MFC after: 3 days


217040 06-Jan-2011 nwhitehorn

Add an entry to the gpart XML to determine if the geom has pending changes
that need to be committed (or undone).

MFC after: 2 weeks


216952 04-Jan-2011 kib

Finish r210923, 210926. Mark some devices as eternal.

MFC after: 2 weeks


216794 29-Dec-2010 kib

Add reporting of GEOM::candelete BIO_GETATTR for md(4) and geom_disk(4).
Non-zero value of attribute means that device supports BIO_DELETE.

Suggested and reviewed by: pjd
Tested by: pho
MFC after: 1 week


216755 28-Dec-2010 ae

Allow destroying EBR in COMPAT (default) mode.

MFC after: 2 week


216754 28-Dec-2010 ae

Make EBR probe method less strict to be able to detect EBRs with
small non-fatal inconsistencies. EBR may contain boot loader and sometimes
it just has some garbage data. Now this does not prevent FreeBSD from using
extended partitions. But since we do not support bootcode for EBR we mark
tables which have a non-empty boot area as corrupt. This makes them
read-only, so we can not damage this data.

PR: kern/141235
MFC after: 1 month


216269 07-Dec-2010 brucec

Don't warn if a partition appears not to be aligned on a track boundary.
Modern disks use LBA and create a fake CHS geometry that doesn't have any
relation to the on-disk layout of data.


216132 02-Dec-2010 ivoras

Add a note about the magic number 20. Actually, 22.75 entries fit in
a 512 byte sector but when choosing magic numbers, 20 looks nicer.

Discussed with: marcel
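
The 22.75 figure in r216132 is consistent with the following calculation,
assuming the note refers to a BSD disklabel with a 148-byte header followed by
16-byte partition entries. Both sizes are assumptions here; the log entry only
gives the result.

```python
SECTOR_SIZE = 512
LABEL_HEADER_SIZE = 148  # assumed bytes before the partition array
PARTITION_ENTRY_SIZE = 16  # assumed bytes per partition entry


def entries_per_sector(sector=SECTOR_SIZE, header=LABEL_HEADER_SIZE,
                       entry=PARTITION_ENTRY_SIZE):
    """How many partition entries fit after the header in one sector."""
    return (sector - header) / entry
```

Only 22 whole entries fit, so a limit of 20 leaves a small margin.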


216098 01-Dec-2010 jh

- Report an error when a label with invalid name is attempted to be
created with glabel(8).
- Fix a typo in an error message.
- Fix comment typos.

Approved by: pjd


215687 22-Nov-2010 jh

Use g_eventlock to protect against losing wakeups in the g_event process
and replace tsleep(9) with msleep(9) which doesn't use a timeout. The
previously used timeout caused the event process to wake up ten times
per second on an idle system.

one_event() is now called with the topology lock held and it returns
with both the topology and event locks held when there are no more
events in the queue.

Reported by: mav, Marius Nünnerich
Reviewed by: freebsd-geom


215299 14-Nov-2010 ed

Add support for asterisk characters when filling in the GELI password
during boot.

Change the last argument of gets() to indicate a visibility flag and add
definitions for the numerical constants. Except for the value 2, gets()
will behave exactly the same, so existing consumers shouldn't break. We
only use it in two places, though.

Submitted by: lme (older version)


215118 11-Nov-2010 ae

Fix regression introduced in r215088: gpart(8) reports
"arg0 'provider': Invalid argument" after creating new partition
table.
Move code for search of existing geom into g_part_find_geom
function and use this function instead of g_part_parm_geom
in g_part_ctl_create.

Approved by: kib (mentor)


215088 10-Nov-2010 ae

In r212554 the names of the G_PART_PARM_GEOM and G_PART_PARM_PROVIDER
ctlreq parameters were changed to "arg0". Fix the last place where
it is used.

Approved by: kib (mentor)


214748 03-Nov-2010 jh

Extend the g_eventlock mutex coverage in one_event() to include setting
of the EV_DONE flag and use the mutex to protect against losing wakeups
in g_waitfor_event().

Reported by: davidxu
Tested by: davidxu
Discussed on: freebsd-current


214352 25-Oct-2010 ae

Reimplement "gpart destroy -F". It now does all the work in the kernel.
This was needed for the recover implementation.

Implement the recover command for GPT. A GPT is now marked as
corrupt when any of three types of corruption is detected:
1. Damaged primary GPT header or table
2. Damaged secondary GPT header or table
3. Secondary header is not located in the last LBA
A GPT marked as corrupt becomes read-only. Any changes to a corrupt table
are prohibited. Only the "destroy" and "recover" commands are allowed.

Discussed with: geom@ (mostly silence)
Tested by: Ilya A. Arhipov
Approved by: mav (mentor)
MFC after: 2 weeks


214229 22-Oct-2010 pjd

- Improve error messages, so instead of 'Not fully done', the user will be
told that the device is already suspended or that the device is using a
one-time key and suspend is not supported.
- 'geli suspend -a' silently skips devices that use a one-time key. This is
fine, but because we log on the console which devices were suspended, also
log which devices were skipped.


214228 22-Oct-2010 pjd

Close a race between checking if device is already suspended and suspending it.


214227 22-Oct-2010 pjd

Add State tag, so 'geli status' will report active/suspended status, eg:

# geli status
Name Status Components
da0.eli SUSPENDED da0
da1.eli ACTIVE da1


214226 22-Oct-2010 pjd

Encryption keys array might be NULL if device is suspended. Check for this, so
we don't panic when we detach suspended device.


214225 22-Oct-2010 pjd

Move sc_akeyctx and sc_ivctx initialization to the g_eli_mkey_propagate()
function which eliminates code duplication and will ensure proper order
of operation.


214163 21-Oct-2010 pjd

Free opencrypto sessions on suspend, as they also might keep encryption keys.


214133 21-Oct-2010 pjd

Fix a bug introduced in r213067 where we use authentication key before
initializing it.


214118 20-Oct-2010 pjd

Bring in geli suspend/resume functionality (finally).

Before this change if you wanted to suspend your laptop and be sure that your
encryption keys are safe, you had to stop all processes that use file system
stored on encrypted device, unmount the file system and detach geli provider.

This isn't very handy. If you are a lucky user of a laptop where suspend/resume
actually works with FreeBSD (I'm not!) you most likely want to suspend your
laptop, because you don't want to start everything over again when you turn
your laptop back on.

And this is where geli suspend/resume steps in. When you execute:

# geli suspend -a

geli will wait for all in-flight I/O requests, suspend new I/O requests, remove
all geli sensitive data from the kernel memory (like encryption keys) and will
wait for either 'geli resume' or 'geli detach'.

Now with no keys in memory you can suspend your laptop without stopping any
processes or unmounting any file systems.

When you resume your laptop you have to resume geli devices using 'geli resume'
command. You need to provide your passphrase, etc. again so the keys can be
restored and suspended I/O requests released.

Of course you need to remember that 'geli suspend' won't clear file system
cache and other places where data from your geli-encrypted file system might be
present. But to get rid of those stopping processes and unmounting file system
won't help either - you have to turn your laptop off. Be warned.

Also note that suspending a geli device which contains the file system with
the geli utility (or anything used by 'geli resume') is not a very good idea,
as you won't be able to resume it - when you execute geli(8), the kernel will
try to read it and this read I/O request will be suspended.


214116 20-Oct-2010 pjd

- Add missing comments.
- Make a comment consistent with others.


214063 19-Oct-2010 jh

Use make_dev_p(9) with the MAKEDEV_CHECKNAME flag instead of make_dev(9)
and print a diagnostic if the call fails.

This avoids a panic when something attempts to register a device with an
invalid name. For example, the label class gets device names from
untrusted input.

Reviewed by: freebsd-geom


213769 13-Oct-2010 rpaulo

The canonical way to print __func__ when using KASSERT() is to write
("%s", __func__). This avoids clang's -Wformat-string warnings.
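
The pattern can be illustrated in userland. This is only a sketch: SKETCH_ASSERT and sketch_report() are hypothetical stand-ins for KASSERT(9) and the kernel's reporting path, not the real definitions. The point is that the function name is passed as an argument to "%s" rather than being used as the format string itself.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Last message produced by a failed check; stands in for a kernel panic. */
char sketch_last_msg[128];

/* Hypothetical reporter: formats its arguments into sketch_last_msg. */
void
sketch_report(const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    va_start(ap, fmt);
    vsnprintf(sketch_last_msg, sizeof(sketch_last_msg), fmt, ap);
    va_end(ap);
}

/*
 * Hypothetical stand-in for KASSERT(exp, msg), where msg is a
 * parenthesized printf-style argument list.
 */
#define SKETCH_ASSERT(exp, msg) do {    \
    if (!(exp))                         \
        sketch_report msg;              \
} while (0)

/* Returns the recorded message: empty if the check passed. */
const char *
sketch_check_positive(int n)
{
    sketch_last_msg[0] = '\0';
    /* __func__ is data for "%s", never the format string itself. */
    SKETCH_ASSERT(n > 0, ("%s", __func__));
    return (sketch_last_msg);
}
```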


213662 09-Oct-2010 ae

Replace strlen(_PATH_DEV) with sizeof(_PATH_DEV) - 1.

Suggested by: kib
Approved by: kib (mentor)
MFC after: 5 days
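
A minimal sketch of the idiom, assuming a prefix macro that mirrors _PATH_DEV ("/dev/"). SKETCH_PATH_DEV and strip_dev_prefix() are illustrative names, not the code touched by this change. For a string literal, sizeof() counts the terminating NUL, so sizeof() - 1 equals strlen() and is evaluated at compile time.

```c
#include <assert.h>
#include <string.h>

/* Assumed to mirror _PATH_DEV from <paths.h>. */
#define SKETCH_PATH_DEV "/dev/"

/* Return name with a leading "/dev/" removed, if present. */
const char *
strip_dev_prefix(const char *name)
{
    assert(name != NULL);
    /* sizeof(literal) - 1 is the literal's length, known at compile time. */
    if (strncmp(name, SKETCH_PATH_DEV, sizeof(SKETCH_PATH_DEV) - 1) == 0)
        name += sizeof(SKETCH_PATH_DEV) - 1;
    return (name);
}
```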


213318 01-Oct-2010 lulf

- Check flag with the bitwise operator, not the logical operator.

Submitted by: arundel
MFC after: 1 week
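
The class of bug fixed here can be sketched as follows. The flag values and the helper name are hypothetical. With the logical operator, "flags && FLAG" is true whenever flags is non-zero, no matter which bits are set; the bitwise operator tests the one bit that is meant.

```c
#include <assert.h>

/* Hypothetical flag bits, for illustration only. */
#define SKETCH_FLAG_A 0x01u
#define SKETCH_FLAG_B 0x04u

/*
 * Test a single flag bit.  The buggy form would be (flags && flag),
 * which is true for any non-zero flags value.
 */
int
sketch_has_flag(unsigned int flags, unsigned int flag)
{
    /* Exactly one bit must be set in flag. */
    assert(flag != 0 && (flag & (flag - 1)) == 0);
    return ((flags & flag) != 0);
}
```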


213174 25-Sep-2010 ae

Some schemes can allocate memory for internal purposes, but when
GEOM does withering this memory is not freed. Add a G_PART_DESTROY
call to g_part_wither. Also add a missing g_free() call to the G_PART_READ
method for the MBR and PC98 schemes.

Submitted by: jh (previous version)
Reviewed by: pjd
Approved by: kib (mentor)


213165 25-Sep-2010 pjd

Change g_eli_debug to int, so one can turn off any GELI output by setting
kern.geom.eli.debug sysctl to -1.

MFC after: 2 weeks


213164 25-Sep-2010 pjd

Ignore errors from BIO_FLUSH. It might confuse users that provider wasn't
really killed. What we really care about are write errors only.

MFC after: 2 weeks


213135 24-Sep-2010 pjd

Allow configuring GPT attributes. It shouldn't be allowed to set bootfailed
attribute (it should be allowed only to unset it), but for test purposes it
might be useful, so the current code allows it.

Reviewed by: arch@ (Message-ID: <20100917234542.GE1902@garage.freebsd.pl>)
MFC after: 2 weeks


213072 23-Sep-2010 pjd

Update copyright years.

MFC after: 1 week


213070 23-Sep-2010 pjd

Add support for AES-XTS. This will be the default now.

MFC after: 1 week


213067 23-Sep-2010 pjd

Implement switching of data encryption key every 2^20 blocks.
This ensures the same encryption key won't be used for more than
2^20 blocks (sectors). This will be the default now.

MFC after: 1 week
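
One way to picture the rule is as a mapping from a sector number to a key number. The sketch below is an assumption about the arithmetic, not the GELI code: only the 2^20 figure comes from the text above, and sketch_key_index() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* One data key covers 2^20 sectors, as stated in the commit message. */
#define SKETCH_KEY_SHIFT 20

/* Map a byte offset on the provider to the index of its data key. */
uint64_t
sketch_key_index(uint64_t byte_offset, uint32_t sectorsize)
{
    assert(sectorsize != 0);
    return ((byte_offset / sectorsize) >> SKETCH_KEY_SHIFT);
}
```

With 512-byte sectors each key would then cover 512 MB of data before the next one is used.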


213063 23-Sep-2010 pjd

Make the code similar to the code in g_eli_integrity.c.

MFC after: 1 week


213062 23-Sep-2010 pjd

Define default overwrite count, so that userland can use it.

MFC after: 1 week


213055 23-Sep-2010 pjd

When trashing metadata, flush after each write.

MFC after: 1 week


212845 19-Sep-2010 brian

Support attaching version 4 metadata

Reviewed by: pjd


212754 16-Sep-2010 mav

Add support for dumping kernel to gconcat.
Dumping goes to the component, where dump partition begins.


212706 15-Sep-2010 pjd

Make the message printed when setting or unsetting an attribute less confusing.
Before:

ada0 has <attrib> set

After:

<attrib> set on ada0

MFC after: 2 weeks


212703 15-Sep-2010 pjd

Make the message that informs about bootcode being written to disk less
confusing.

Note there is still no information about 'partcode' being written to disk
(gpart bootcode -p <partcode> <disk>).

Maybe in the future all the messages printed by gpart(8) on success could be
hidden under -v?

PR: bin/150239
Reported by: Roddi <roddi@me.com>
Submitted by: arundel
MFC after: 2 weeks


212614 14-Sep-2010 pjd

- Change all places where G_TYPE_ASCNUM is used to G_TYPE_NUMBER.
It turns out the new type wasn't really needed.
- Reorganize code a little bit.


212609 14-Sep-2010 pjd

Simplify the code a bit.


212554 13-Sep-2010 pjd

- Remove gc_argname field. It was introduced for gpart(8), but if I
understand everything correctly, we don't really need it.
- Provide default numeric values as strings. This allows simplifying
a lot of code.
- Bump version number.


212547 13-Sep-2010 pjd

- Allow specifying values as const pointers.
- Make optional string values always an empty string.


212160 02-Sep-2010 gibbs

Correct bioq_disksort so that bioq_insert_tail() offers barrier semantics.
Add the BIO_ORDERED flag for struct bio and update bio clients to use it.

The barrier semantics of bioq_insert_tail() were broken in two ways:

o In bioq_disksort(), an added bio could be inserted at the head of
the queue, even when a barrier was present, if the sort key for
the new entry was less than that of the last queued barrier bio.

o The last_offset used to generate the sort key for newly queued bios
did not stay at the position of the barrier until either the
barrier was de-queued, or a new barrier (which updates last_offset)
was queued. When a barrier is in effect, we know that the disk
will pass through the barrier position just before the
"blocked bios" are released, so using the barrier's offset for
last_offset is the optimal choice.

sys/geom/sched/subr_disk.c:
sys/kern/subr_disk.c:
o Update last_offset in bioq_insert_tail().

o Only update last_offset in bioq_remove() if the removed bio is
at the head of the queue (typically due to a call via
bioq_takefirst()) and no barrier is active.

o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
set prev to the barrier and cur to its next element. Now that
last_offset is kept at the barrier position, this change isn't
strictly necessary, but since we have to take a decision branch
anyway, it does avoid one, no-op, loop iteration in the while
loop that immediately follows.

o In bioq_disksort(), bypass the normal sort for bios with the
BIO_ORDERED attribute and instead insert them into the queue
with bioq_insert_tail(). bioq_insert_tail() not only gives
the desired command order during insertion, but also provides
barrier semantics so that commands disksorted in the future
cannot pass the just enqueued transaction.

sys/sys/bio.h:
Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.

sys/cam/ata/ata_da.c:
sys/cam/scsi/scsi_da.c
Use an ordered command for SCSI/ATA-NCQ commands issued in
response to bios with the BIO_ORDERED flag set.

sys/cam/scsi/scsi_da.c
Use an ordered tag when issuing a synchronize cache command.

Wrap some lines to 80 columns.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
sys/geom/geom_io.c
Mark bios with the BIO_FLUSH command as BIO_ORDERED.

Sponsored by: Spectra Logic Corporation
MFC after: 1 month


211927 28-Aug-2010 pjd

Correct offset conversion to little endian. It was implemented in version 2,
but because of a bug it was a no-op, so we were still using offsets in native
byte order for the host. Do it properly this time, bump version to 4 and set
the G_ELI_FLAG_NATIVE_BYTE_ORDER flag when version is under 4.

MFC after: 2 weeks
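
Storing a value in little-endian order regardless of the host can be sketched with explicit byte shifts. The helpers below are illustrative and are not the GELI code; the version check and the G_ELI_FLAG_NATIVE_BYTE_ORDER handling described above are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v at p as 8 bytes, least significant byte first. */
void
sketch_le64enc(unsigned char *p, uint64_t v)
{
    assert(p != NULL);
    for (int i = 0; i < 8; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Load 8 little-endian bytes from p, independent of host byte order. */
uint64_t
sketch_le64dec(const unsigned char *p)
{
    uint64_t v = 0;

    assert(p != NULL);
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return (v);
}
```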


211455 18-Aug-2010 mav

Remove bintime_cmp() function, unused since r200086.

MFC after: 1 week


210795 03-Aug-2010 ae

Check that gsp is not NULL before access. It can be NULL
for some cases.

Approved by: kib (mentor)
MFC after: 1 week


210792 03-Aug-2010 ae

Check that table is not NULL before access, it can be NULL
for some cases.

Approved by: mav (mentor)
MFC after: 2 weeks


210747 02-Aug-2010 ae

Forward ioctl requests to original geom.

PR: 148540
Silence from: luigi
Reviewed by: pjd
Approved by: mav (mentor)
MFC after: 2 weeks


210746 02-Aug-2010 ae

Release access for consumers that are opened, but will be destroyed
indirectly by orphan method.

PR: 148688
Silence from: marcel
Approved by: mav (mentor)
MFC after: 2 weeks


210471 25-Jul-2010 mav

Export PCI IDs of ATA/SATA controllers through CAM and ata(4) layers to
GEOM. This information is needed for properly reading and writing
soft-RAID on-disk metadata.


210401 23-Jul-2010 ae

Prevent access after free of a table entry when the user deletes a
partition that has not been created yet (the changes are not yet
committed to disk).

PR: 148687
Approved by: mav (mentor)
MFC after: 7 days


210046 14-Jul-2010 ru

Fixed cache size decoding read from a label.

PR: kern/144732
Submitted by: Eugene Grosbein
MFC after: 3 days


209536 26-Jun-2010 rpaulo

Add NTFS partition type to GEOM_MBR.


209187 14-Jun-2010 pjd

'unit' can be negative, so use signed type for it.

Found by: Coverity Prevent
CID: 3731
MFC after: 3 days


209186 14-Jun-2010 pjd

BIO_DELETE contains range we want to delete and doesn't provide any useful
data, so there is no need to copy it to userland.

MFC after: 3 days


209062 11-Jun-2010 avg

fix a few cases where a string is passed via format argument instead of
via %s

Most of the cases looked harmless, but this is done for the sake of
correctness. In one case it even allowed to drop an intermediate buffer.

Found by: clang
MFC after: 2 weeks


208992 10-Jun-2010 trasz

Untangle g_print_bio(), silencing Coverity.

Found with: Coverity Prevent
CID: 3566, 3567


208927 08-Jun-2010 mjacob

Try and narrow the gap in which you act on an event that has been canceled.
Obtained from: Jaakko Heinonen
MFC after: 1 month


208812 05-Jun-2010 trasz

Make sure not to pass NULL to g_orphan_provider().

Found with: Coverity Prevent
CID: 3411


208746 02-Jun-2010 marius

Don't leak memory on destruction.

Reviewed by: marcel
MFC after: 3 days


208672 31-May-2010 avg

g_label: fix possible NULL pointer dereference

in case glabel debug level is >= 1 and gp->provider list is empty
for some reason

Found by: clang static analyzer
MFC after: 4 days


208515 24-May-2010 marius

Fix some whitespace nits.


208173 16-May-2010 nwhitehorn

Teach gpart about bootcode on APM.


208101 14-May-2010 mjacob

Yet another potential dereference of a dead provider.

Sponsored by: Panasas
MFC after: 1 week


208082 14-May-2010 mjacob

Make sure to check that the active provider pointer points to something before
dereferencing the pointer.

Sponsored by: Panasas
MFC after: 1 week


207878 10-May-2010 jh

- Don't return EAGAIN from gv_unload(). It was used to work around the
deadlock fixed in r207671.
- Wait for worker process to exit at class unload. The worker process
was not guaranteed to exit before the linker unloaded the module.
- Use 0 as the worker process exit status instead of ENXIO and style
the NOTREACHED comment.

Reviewed by: lulf
X-MFC after: r207671


207877 10-May-2010 jh

In g_zero_destroy_geom(), return 0 instead of EBUSY in the success case.
EBUSY was probably used as a workaround for the deadlock fixed in r207671.

Approved by: pjd
X-MFC after: r207671


207789 08-May-2010 lulf

- Remove obsolete flags.

MFC after: 1 week


207671 05-May-2010 jh

Fix deadlock between GEOM class unloading and withering. Withering can't
proceed while g_unload_class() blocks the event thread. Fix this by not
running g_unload_class() as a GEOM event and dropping the topology lock
when withering needs to proceed.

PR: kern/139847
Silence on: freebsd-geom


207181 25-Apr-2010 marcel

Re-calculate a geometry when reprobing as well.

PR: kern/145452
Reported by: "Andrey V. Elsukov" <bu7cher@yandex.ru>


207178 25-Apr-2010 marcel

Fix undo for schemes that have internal partitions. Internal partitions
do not constitute user-visible or active partitions and as such should
not prevent undoing pending operations.

While here, initialize the last usable sector for the placeholder geom
based on the null scheme, created to allow undoing the destruction of
a scheme. This gives consistent output with "gpart show".

Based on a patch from: "Andrey V. Elsukov" <bu7cher@yandex.ru>


207094 23-Apr-2010 marcel

Implement the resize verb and add support for resizing partitions
for all schemes but EBR. Quality work by Andrey!

Submitted by: "Andrey V. Elsukov" <bu7cher@yandex.ru>


206859 19-Apr-2010 jh

Fix ddb(4) "show geom addr" command when INVARIANTS is enabled. Don't
assert that the topology lock is held when g_valid_obj() is called from
debugger.

MFC after: 1 week


206665 15-Apr-2010 pjd

Use lower priority for GELI worker threads. This improves system
responsiveness under heavy GELI load.

MFC after: 3 days


206650 15-Apr-2010 avg

g_io_check: respond to zero pp->mediasize with ENXIO

Previously this condition was reported with EIO by the bio_offset > mediasize
check.
Perhaps that check should be extended to bio_offset+bio_length > mediasize.

MFC after: 1 week
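
The ordering of the two checks can be sketched as below. This is a simplified illustration, not the body of g_io_check(): sketch_check_range() is a hypothetical helper that models only the two conditions mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Return 0 if the request may proceed, or an errno value.
 * A zero media size is reported first, as ENXIO; an offset past the
 * end of the media is reported as EIO.
 */
int
sketch_check_range(int64_t mediasize, int64_t offset)
{
    assert(mediasize >= 0);
    if (mediasize == 0)
        return (ENXIO);
    if (offset > mediasize)
        return (EIO);
    return (0);
}
```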


206552 13-Apr-2010 luigi

fix copyright format, as requested by Joel Dahl


206551 13-Apr-2010 luigi

make code compile with KTR


206497 12-Apr-2010 luigi

Bring in geom_sched, support for scheduling disk I/O requests
in a device independent manner. Also include an example anticipatory
scheduler, gsched_rr, which gives very nice performance improvements
in presence of competing random access patterns.

This is joint work with Fabio Checconi, developed last year
and presented at BSDCan 2009. You can find details in the
README file or at

http://info.iet.unipi.it/~luigi/geom_sched/


206130 03-Apr-2010 avg

g_vfs_open: allow only one mount per device vnode

In other words, deny multiple read-only mounts of the same device.
Shared read-only mounts should theoretically be possible, but,
unfortunately, can not be implemented correctly using current
buffer cache code/interface and results in an eventual system crash.
Also, using nullfs seems to be a more efficient way to achieve the same
goal.

This gets us back to where we were before GEOM and where other BSDs are.

Submitted by: pjd (idea for checking for shared mounting)
Discussed with: phk, pjd
Silence from: fs@, geom@
MFC after: 2 weeks


206097 02-Apr-2010 avg

bo_bsize: revert r205860 and take an alternative approach in getblk

In r205860 I missed the fact that there is code that strongly assumes
that devvp bo_bsize is equal to underlying provider's sectorsize.
In those places it is hard to obtain the sectorsize in an alternative
way if devvp bo_bsize is set to something else.
So, I am reverting bo_bsize assignment in g_vfs_open.
Instead, in getblk I use DEV_BSIZE block size for b_offset calculation
if vp is a disk vp as reported by vn_isdisk. This should coincide with
vp being a devvp.

Reported by: Mykola Dzham <i@levsha.me>
Tested by: Mykola Dzham <i@levsha.me>
Pointyhat to: avg
MFC after: 2 weeks
X-ToDo: convert bread(devvp) in all fs to use bo_bsize-d blocks


205860 29-Mar-2010 avg

g_vfs_open: correctly set devvp.v_bufobj.bo_bsize to DEV_BSIZE

Because of how breadn -> bufstrategy -> g_vfs_strategy are currently
implemented, bread on devvp always expects DEV_BSIZE block size.
Thus, devvp bo_bsize must always be DEV_BSIZE irrespective of media
properties or filesystem implementation details.

Reviewed by: mckusick
MFC after: 2 weeks


205847 29-Mar-2010 mjacob

Change how multipath labels are created and managed. This makes it easier
to support various storage boxes which really aren't active-active.

We only write the label on the *first* provider. For all other providers
we just "add" the disk. This also allows for an "add" verb.

A usage implication is that you should specify the currently active
storage path as the first provider.

Note that this does not add RDAC-like functionality, but better allows for
autovolumefailover configurations (additional checkins elsewhere will support
this).

Sponsored by: Panasas
MFC after: 1 month


205619 24-Mar-2010 mav

Do not fetch precise time of request start when stats collection disabled.

Reviewed by: pjd, phk


205412 21-Mar-2010 mjacob

Add 'rotate' and 'getactive' verbs to provide some control and information
about what the currently active path is.

Sponsored by: Panasas
MFC after: 1 month


205385 20-Mar-2010 jh

Escape characters unsafe for XML output in GEOM class, instance and
provider names.

- Characters in range 0x01-0x1f except '\t', '\n', and '\r' are replaced
with '?'. Those characters are disallowed in XML.
- '&', '<', '>', '\'', '"' and characters in range 0x7f-0xff are
replaced with XML numeric character reference.

If the kern.geom.confxml sysctl provides invalid XML, libgeom
geom_xml2tree() fails and utilities using it do not work. Unsafe
characters are common in msdosfs and cd9660 labels.

PR: kern/104389
Submitted by: Doug Steinwand (original version)
Reviewed by: pjd
Discussed on: freebsd-geom
MFC after: 3 weeks
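
The two rules above can be sketched as a small escaping routine. This is an illustration under assumptions, not the kernel code: the buffer-based interface, the function name and the lowercase hexadecimal form of the character reference are all choices made here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Escape src into dst (at most dstlen bytes, including the NUL).
 * Returns the escaped length, or -1 if dst is too small.
 */
int
sketch_xml_escape(char *dst, size_t dstlen, const char *src)
{
    size_t used = 0;

    assert(dst != NULL && src != NULL);
    if (dstlen == 0)
        return (-1);
    for (const unsigned char *p = (const unsigned char *)src; *p != '\0';
        p++) {
        unsigned char c = *p;
        char tmp[16];
        int n;

        if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
            /* Control characters disallowed in XML become '?'. */
            n = snprintf(tmp, sizeof(tmp), "?");
        } else if (c == '&' || c == '<' || c == '>' || c == '\'' ||
            c == '"' || c >= 0x7f) {
            /* Markup and 0x7f-0xff become numeric references. */
            n = snprintf(tmp, sizeof(tmp), "&#x%02x;", (unsigned int)c);
        } else {
            n = snprintf(tmp, sizeof(tmp), "%c", c);
        }
        if (used + (size_t)n + 1 > dstlen)
            return (-1);
        memcpy(dst + used, tmp, (size_t)n);
        used += (size_t)n;
    }
    dst[used] = '\0';
    return ((int)used);
}
```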


205279 18-Mar-2010 pjd

Simplify loops.


204886 08-Mar-2010 lulf

- Set missing flag when initiating a plex rebuild with the rebuildparity
command.
- Check if plex is already syncing or rebuilding before initiating a parity
rebuild or check.


204076 18-Feb-2010 pjd

Please welcome HAST - Highly Available Storage.

HAST allows data to be stored transparently on two physically separated machines
connected over the TCP/IP network. HAST works in Primary-Secondary
(Master-Backup, Master-Slave) configuration, which means that only one of the
cluster nodes can be active at any given time. Only Primary node is able to
handle I/O requests to HAST-managed devices. Currently HAST is limited to two
cluster nodes in total.

HAST operates on block level - it provides disk-like devices in /dev/hast/
directory for use by file systems and/or applications. Working on block level
makes it transparent for file systems and applications. There is no difference
between using HAST-provided device and raw disk, partition, etc. All of them
are just regular GEOM providers in FreeBSD.

For more information please consult hastd(8), hastctl(8) and hast.conf(5)
manual pages, as well as http://wiki.FreeBSD.org/HAST.

Sponsored by: FreeBSD Foundation
Sponsored by: OMCnet Internet Service GmbH
Sponsored by: TransIP BV


204071 18-Feb-2010 pjd

- Style fixes.
- Prefer strlcpy() over strncpy().


204070 18-Feb-2010 pjd

Correct comment.


204069 18-Feb-2010 pjd

Log attach just like we log detach.


203411 03-Feb-2010 gonzo

- Give geom_redboot taste of flash/spi. Now there is another provider
of redboot partitions. This patch was missed during merge from
projects/mips.


203408 02-Feb-2010 delphij

Prevent NULL dereference by checking return value of
gctl_get_asciiparam.

MFC after: 2 weeks


203261 30-Jan-2010 marcel

Export the UUID of the partition in the XML. The partition UUID is used
by EFI's device path to identify a partition. In order for FreeBSD to
add EFI boot options, proper device paths need to be constructed.


202987 25-Jan-2010 ivoras

Go through with write_metadata() non-error-handling and make it return "void".
This is mostly to avoid dead variable assignment warning by LLVM.
No functional change.

Pointed out by: trasz
Approved by: gnn (mentor)


202977 25-Jan-2010 trasz

Remove unneeded variables.

Found with: clang


202976 25-Jan-2010 trasz

Remove pointless assignment.

Found with: clang


202974 25-Jan-2010 trasz

Remove some pointless variable assignments.

Found with: clang


202972 25-Jan-2010 trasz

Remove unused variable.

Found with: clang


202454 17-Jan-2010 delphij

Expose stripe offset and stripe size through libgeom and geom(8) userland
utilities.

Reviewed by: pjd, mav (earlier version)


202437 16-Jan-2010 trasz

Add gmountver, disk mount verification GEOM class.

Note that due to e.g. write throttling ('wdrain'), it can stall all the disk
I/O instead of just the device it's configured for. Using it for removable
media is therefore not a good idea.

Reviewed by: pjd (earlier version)


201645 06-Jan-2010 mav

Change the way in which zero stripesize is handled. Instead of reporting
zero stripeoffset in such case (as if device has no stripes), report offset
from the beginning of the media (as if device has single infinite stripe).

This gives partitioning tools information, required to guess better
partition alignment, in case the hardware doesn't report its stripe size.
For example, it should give disklabel info about odd offset made by fdisk.
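
The rule can be sketched for a child provider (for example, a partition) that starts at some byte offset inside its parent. This is an assumption about the arithmetic rather than a copy of the GEOM code, and the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the stripe offset to report for a child that begins
 * child_start bytes into its parent.
 */
uint64_t
sketch_child_stripeoffset(uint64_t parent_stripesize,
    uint64_t parent_stripeoffset, uint64_t child_start)
{
    uint64_t off;

    assert(parent_stripesize == 0 ||
        parent_stripeoffset < parent_stripesize);
    off = parent_stripeoffset + child_start;
    /* Known stripe size: reduce the offset into a single stripe. */
    if (parent_stripesize > 0)
        off %= parent_stripesize;
    /* Zero stripe size: keep the offset from the start of the media. */
    return (off);
}
```

Under this sketch, a slice starting at sector 63 (byte 32256) of a disk with no reported stripe size would report 32256 instead of 0, which lets a tool notice the odd start.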


201567 05-Jan-2010 mav

Move wakeup() out of mutex to reduce contention.


201566 05-Jan-2010 mav

Move wakeup() out of mutex to reduce contention.


201545 05-Jan-2010 mav

Slightly optimize XOR calculation.


201374 02-Jan-2010 marcel

Properly return the UUID represented by the alias.

PR: 142174
Submitted by: Przemyslaw Laczynski <torindel@gmail.com>
Pointy hat to: rpaulo


201264 30-Dec-2009 mav

Call wakeup() only for the first request on the queue.


201145 28-Dec-2009 antoine

(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR: 137213
Submitted by: Eygene Ryabinkin (initial version)
MFC after: 1 month


201139 28-Dec-2009 mav

Add BIO_DELETE support to ada(4):
- For SSDs use TRIM feature of DATA SET MANAGEMENT command, as defined by
ACS-2 specification working draft.
- For CompactFlash use CFA ERASE command, same as ad(4) does.

With this patch, `newfs -E /dev/ada1` was able to restore the write speed of
my heavily worn OCZ Vertex SSD (firmware 1.4) up to the initial level
for most of its capacity. The previous 1.3 firmware, even though it reported
the TRIM capability bit set, was not working, reporting an ABORT error for
every DSM command.

I have no idea whether it is normal, but for some reason it takes 200ms
to handle any TRIM command on this drive, which was making delete extremely
slow. But the TRIM command is able to accept a long list of LBAs, and the
length of that list doesn't seem to affect its execution time. The implemented
request clustering algorithm allowed me to raise the delete rate up to
reasonable numbers when many parallel DELETE requests are running.


200942 24-Dec-2009 mav

Make geom_concat pass through the stripe parameters of the first component,
hoping that the rest will fit.


200940 24-Dec-2009 mav

Since geom_raid3 reports its own stripe as the sector size, report the
largest underlying provider's stripe, multiplied by the number of data disks
in the array (because of the transformation done), as the array stripe.


200935 24-Dec-2009 mav

Since a mirror has no stripes of its own, report the largest stripe of the
underlying components, hoping the others fit if they are not equal.


200934 24-Dec-2009 mav

Add two disk ioctls, giving user-level tools information about disk/array
stripe (optimal access block) size and offset.


200933 24-Dec-2009 mav

Make geom_stripe report its stripe size to upper layers.


200821 21-Dec-2009 mav

Make graid3 fall back to malloc() when the component request size is bigger
than the maximal prepared UMA zone size. This fixes a crash with MAXPHYS > 128K.


200539 14-Dec-2009 rpaulo

Add Microsoft and NetBSD partition types handling.


200534 14-Dec-2009 rpaulo

Simplify partition type parsing by using a data-oriented model.
While there add more Apple and Linux partition types.


200086 03-Dec-2009 mav

Change 'load' balancing mode algorithm:
- Instead of measuring last request execution time for each drive and
choosing one with smallest time, use averaged number of requests, running
on each drive. This information is more accurate and timely. It allows to
distribute load between drives in more even and predictable way.
- For each drive track offset of the last submitted request. If new request
offset matches previous one or close for some drive, prefer that drive.
It allows to significantly speedup simultaneous sequential reads.

PR: kern/113885
Reviewed by: sobomax
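
The selection step can be sketched as follows. This is a hedged illustration, not the gmirror code: the structure, the fixed-point scale, the size of the "close" window and the size of the bonuses are assumptions, and the averaging that maintains the load value is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LOAD_SCALE 256                /* assumed fixed-point scale */
#define SKETCH_NEAR_BYTES (8 * 1024 * 1024)  /* assumed "close" window */

struct sketch_disk {
    int     load;           /* averaged running requests, scaled */
    int64_t last_offset;    /* offset of the last submitted request */
};

/* Return the index of the preferred disk, or -1 if there are none. */
int
sketch_pick_disk(const struct sketch_disk *disks, size_t ndisks,
    int64_t offset)
{
    int best = -1, best_prio = 0;

    assert(disks != NULL || ndisks == 0);
    for (size_t i = 0; i < ndisks; i++) {
        int prio = disks[i].load;
        int64_t dist = offset - disks[i].last_offset;

        if (dist < 0)
            dist = -dist;
        /* Prefer a drive whose last request ended here or nearby. */
        if (dist == 0)
            prio -= 2 * SKETCH_LOAD_SCALE;
        else if (dist < SKETCH_NEAR_BYTES)
            prio -= SKETCH_LOAD_SCALE;
        if (best < 0 || prio < best_prio) {
            best = (int)i;
            best_prio = prio;
        }
    }
    return (best);
}
```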


199875 28-Nov-2009 trasz

Provide a set of sysctls and tunables to disable device node creation
for specific "kinds" of disk labels - for example, GPT UUIDs. Reason
for this is that sometimes, other GEOM classes attach to these device
nodes instead of the proper ones - e.g. they attach to /dev/gptid/XXX
instead of /dev/ada0p2, which is annoying.

Reviewed by: pjd (earlier version)
MFC after: 1 month


199232 12-Nov-2009 rpaulo

Add a missing check for Apple HFS partitions.

MFC after: 1 week


199228 12-Nov-2009 rnoland

We need to allocate space for the header in the create path also.

This fixes a null pointer dereference with "gpart create -s GPT" after
the previous commit.

Reported by: Yuri Pankov
Pointyhat to: me
MFC after: 1 week


199017 07-Nov-2009 rnoland

Fix handling of GPT headers when size is > 92 bytes.

It is valid for an on-disk GPT header to report a header size which is
greater than 92 bytes. Previously, we would read in the sector and copy
only the 92 bytes that we know how to deal with before calculating the
checksum for comparison. This meant that when we did the checksum, we
overshot the buffer and took in random memory, so the checksum would fail.

We now determine the size of the header and allocate enough space to
preserve the entire on-disk contents. This allows us to correctly
calculate the checksum and to modify and write the header back
to the disk, while preserving data that we might not understand.

Reported by: Kris Weston
Approved by: marcel@
MFC after: 2 weeks
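
The allocation step can be sketched as below. This is a simplified illustration, not the g_part_gpt code: the function name and interface are hypothetical, and the CRC calculation itself is left out. The header size is the little-endian 32-bit field at byte offset 12 of the GPT header.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_GPT_MIN_HDR 92   /* the part of the header we understand */

/*
 * Copy the whole on-disk header (hdr_size bytes, not just the first 92)
 * so a checksum can later be computed over exactly hdr_size bytes.
 * Returns a malloc()ed copy, or NULL if the size is implausible.
 */
unsigned char *
sketch_copy_gpt_header(const unsigned char *sector, size_t sectorsize,
    uint32_t *sizep)
{
    unsigned char *copy;
    uint32_t hdr_size;

    assert(sector != NULL && sizep != NULL);
    if (sectorsize < SKETCH_GPT_MIN_HDR)
        return (NULL);
    hdr_size = (uint32_t)sector[12] | ((uint32_t)sector[13] << 8) |
        ((uint32_t)sector[14] << 16) | ((uint32_t)sector[15] << 24);
    if (hdr_size < SKETCH_GPT_MIN_HDR || hdr_size > sectorsize)
        return (NULL);
    copy = malloc(hdr_size);
    if (copy == NULL)
        return (NULL);
    memcpy(copy, sector, hdr_size);
    *sizep = hdr_size;
    return (copy);
}
```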


198097 14-Oct-2009 rnoland

Set the active flag in the PMBR when we install bootcode on a GPT
partitioned disk. Some BIOS require this to be set before they will
boot the device.

Approved by: marcel
MFC after: 2 weeks


197898 09-Oct-2009 pjd

If a provider is open for writing when we taste it, skip it for classes that
depend on on-disk metadata. This way we won't attach to providers that are used
by other classes. For example, we don't want to configure partitions on da0 if
it is part of a gmirror; what we really want is partitions on mirror/foo.

During regular work it works like this: if a provider is open for writing, a
class receives the spoiled event from GEOM and detaches; once the provider is
closed, the taste event is sent again and the class can rediscover its metadata
if it is still there. It doesn't work that way when a new class arrives,
because GEOM gives it all existing providers to taste, including those open for
writing. Classes have to decide on their own whether they want to deal with
such providers (e.g. geom_dev) or not (classes modified by this commit).

Reported by: des, Oliver Lehmann <lehmann@ans-netz.de>
Tested by: des, Oliver Lehmann <lehmann@ans-netz.de>
Discussed with: phk, marcel
Reviewed by: marcel
MFC after: 3 days


197767 05-Oct-2009 lulf

- Improve error message consistency and wording.


197608 28-Sep-2009 marcel

The first 96 bytes may not be zeroes. It can contain trivial boot
code that merely emits an error and waits for a key press before
rebooting. The error being that extended partitions are not
bootable. The origin is presumed to be Windows 2000; Windows XP
does not do this...

For now, ignore the first 96 bytes when checking that the EBR is
(for the most part) all zeroes.

Tested by: Mario Lobo <mlobo@digiart.art.br>
MFC after: 1 week


197449 24-Sep-2009 marcel

Don't create more partitions than can fit in the table by checking
that the index is within bounds.


196986 08-Sep-2009 trasz

Remove unused variable.


196964 08-Sep-2009 mav

Do not check proper request alignment here in geom_dev in production.
It will be checked anyway later by g_io_check() in g_io_schedule_down().
It is only needed here to not trigger panic from additional check, when
INVARIANTS enabled. So cover it with #ifdef INVARIANTS. It saves two
64bit divisions per request.
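
The shape of the change can be sketched as below. SKETCH_INVARIANTS is a hypothetical stand-in for the kernel's INVARIANTS option, and both function names are made up for illustration. The two modulo operations are the two 64-bit divisions that production builds no longer perform at this point.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* True if both offset and length are multiples of the sector size. */
int
sketch_request_aligned(int64_t offset, int64_t length, uint32_t sectorsize)
{
    assert(sectorsize != 0);
    return ((offset % sectorsize) == 0 && (length % sectorsize) == 0);
}

/* Early check that is compiled in only for debugging builds. */
int
sketch_dev_precheck(int64_t offset, int64_t length, uint32_t sectorsize)
{
#ifdef SKETCH_INVARIANTS
    if (!sketch_request_aligned(offset, length, sectorsize))
        return (EINVAL);
#else
    /* Production: a later stage validates the request anyway. */
    (void)offset;
    (void)length;
    (void)sectorsize;
#endif
    return (0);
}
```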


196904 06-Sep-2009 mav

MFp4:
Remove msleep() timeout from g_io_schedule_up/down(). It works fine
without it, saving few percents of CPU on high request rates without
need to rearm callout twice per request.


196879 06-Sep-2009 pjd

Add support for changing providers priority.

Submitted by: Mel Flynn


196837 04-Sep-2009 mav

Remove artificial MAX_IO_SIZE constant, equal to DFLTPHYS * 2. Use MAXPHYS
instead. It is NULL change for GENERIC kernel, but allows 'fast' mode to
work on systems with increased MAXPHYS.


196823 04-Sep-2009 pjd

Simplify g_disk_ident_adjust() function and allow any printable character
in serial number.

Discussed with: trasz
Obtained from: Wheel Sp. z o.o. (http://www.wheel.pl)


196580 27-Aug-2009 pjd

There's no need for checking result of M_WAITOK allocation.


196579 27-Aug-2009 pjd

Fix an obvious topology lock leak.

MFC after: 3 days


196333 17-Aug-2009 marcel

The start of the EFI GPT partition in the PMBR can always be represented
by CHS addressing. Don't define these fields as 0xff, but rather define
them correctly. This prevents boot problems on PCs where GPT is being
used.

PR: 115406
Submitted by: Kent Hauser <kent@khauser.net>
Approved by: re (kib)
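
As an illustration of why the start can always be expressed in CHS: the
protective partition begins at LBA 1, which maps to cylinder 0, head 0,
sector 2 for any geometry with at least two sectors per track. The sketch
below shows the standard LBA-to-CHS conversion; the function name and the
255/63 geometry are assumptions chosen for the example, not taken from
the commit.

```python
def lba_to_chs(lba, heads, sectors_per_track):
    """Convert a logical block address to a (cylinder, head, sector)
    tuple. Sectors are numbered from 1, cylinders and heads from 0."""
    cylinder, rem = divmod(lba, heads * sectors_per_track)
    head, sector0 = divmod(rem, sectors_per_track)
    return cylinder, head, sector0 + 1
```

For example, lba_to_chs(1, 255, 63) yields (0, 0, 2), so the 0xff
placeholder values are not needed for the start fields.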


195752 18-Jul-2009 lulf

- Fix the issue with read access count modification on RAID-5 plexes properly.
If the access counts were not increased and decreased in equal numbers by
gvinum consumers, the read access count would be inconsistent with the write
access count. Instead, modify the read access count with the write access
count directly to prevent any inconsistencies.

Approved by: re (kib)


195436 08-Jul-2009 marcel

Revert revisions 188839 and 188868. Use of the ioctl in geom_dev.c
is invalid because the ioctl happens without prior open. The ioctl
got introduced to provide backward compatibility for extended
partitions, but it ended up not being used because it didn't work
as expected. Since there are no consumers of the ioctl and the
implementation is broken, the best fix is to remove the code
entirely.

Spotted by: phk
Approved by: re (kensmith)


195257 01-Jul-2009 trasz

Fix a panic which (reportedly) can happen when unmounting a filesystem
with I/O requests in flight on kernels compiled with "options INVARIANTS".
Also, make it obvious it's not right to call g_valid_obj() (and macros
using it, e.g. G_VALID_CONSUMER()) without topology lock held.

Approved by: re (kib)
Reported by: pho


195195 30-Jun-2009 trasz

Make gjournal work with kernel compiled with "options DIAGNOSTIC".
Previously, it would panic immediately.

Reviewed by: pjd
Approved by: re (kib)


194924 24-Jun-2009 lulf

- Apply the same naming rules of LVM names as done in the LVM code itself.

PR: kern/135874


194811 24-Jun-2009 jhay

Do not stop the loop when an empty or deleted directory entry is found.
Rather just skip over it.


194433 18-Jun-2009 ivoras

Fix tabs, slightly improve comments.

Approved by: gnn (mentor) (original)
Noticed by: stas


194092 13-Jun-2009 ivoras

Add support for labels derived from GPT metadata.

Approved by: gnn (mentor)
Reviewed by: pjd
PR: 128398
Submitted by: Marius Nuennerich < marius at nuenneri.ch >


193981 11-Jun-2009 luigi

As discussed in the devsummit, introduce two fields in the
struct bio to store classification information, and a hook
for classifier functions that can be called by g_io_request().

This code is from Fabio Checconi as part of his GSOC work.


193547 05-Jun-2009 pjd

Simplify.


193131 30-May-2009 dougb

Crank the debug level necessary to display the "Label foo is removed"
and "Label for provider ..." messages up from 0 to 1.


193066 29-May-2009 jamie

Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex. Jails may
have their own host information, or they may inherit it from the
parent/system. The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL. The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by: bz (mentor)


192808 26-May-2009 lulf

- Unbreak 64 bit platforms by casting off_t to intmax.


192803 26-May-2009 lulf

- Fix wrong print on BIO_DONE.
- Use db_printf instead of printf. While here, apply this to other ddb commands
as well.

Pointed out by: pjd


192797 26-May-2009 lulf

- Add 'show bio' DDB command.

MFC after: 3 weeks


192021 12-May-2009 trasz

Check return value of gctl_get_asciiparam().

Found with: Coverity Prevent(tm)
CID: 1118


191990 11-May-2009 attilio

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. The VFS_* functions and related parts no longer take the
context, since it always refers to curthread.

In some places, in particular when dealing with VOPs and functions living
in the same namespace (e.g. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavily changed by this commit, so third-party modules need
to be recompiled. Bump __FreeBSD_version in order to signal this
change.


191856 06-May-2009 lulf

- Split up the BIO queue into a queue for new and one for completed requests.
This is necessary for two reasons:
1) In order to avoid collisions with the use of a BIO's flags set by a consumer
or a provider
2) Because GV_BIO_DONE was used to mark a BIO as done, not enough flags were
available, so the consumer flags of a BIO had to be misused in order to
support enough flags. The new queue makes it possible to recycle the
GV_BIO_DONE flag into GV_BIO_GROW.
As a consequence, gvinum will now work with any other GEOM class under it or
on top of it.

- Use bio_pflags for storing internal flags on downgoing BIOs, as the requests
appear to come from a consumer of a gvinum volume. Use bio_cflags only for
cloned BIOs.
- Move gv_post_bio to be used internally for maintenance requests.
- Remove some cases where flags were set without need.

PR: kern/133604


191855 06-May-2009 lulf

- Fix a case where a RAID5 volume would think that it is supposed to grow a new
subdisk after a parity rebuild.


191854 06-May-2009 lulf

- Check if any plexes are doing internal maintenance before removing them.


191853 06-May-2009 lulf

- Add forgotten KASSERT.


191852 06-May-2009 lulf

- Fix a bug where the bio_data field of the wrong BIO is freed if an error
occurs when doing a RAID5 request.


191850 06-May-2009 lulf

- GV_BIO_RETRY is not used, and it is actually impossible with more than 8
values for bio_cflags/bio_pflags.


191849 06-May-2009 lulf

- Split the queue mutex into one for the event queue and one for the BIO queue,
as they do not really relate and to prepare for an additional queue to be
covered by the BIO queue mutex.
- Implement wrappers for fetching the next element from the event queue as well
as for putting a new element into the BIO queue.


191787 04-May-2009 lulf

- Make the gvinum softc invisible to userland, as it is not needed.


191248 18-Apr-2009 lulf

- Remove assertion of topology lock remaining from 7.x gvinum. It is not needed,
as the renaming only changes internal gvinum names and will not alter the geom
topology.
- The topology lock was not held when calling g_wither_geom after renaming.


191134 16-Apr-2009 marcel

Precision '*' expects an int and strlen() returns a size_t.
Compensate.


191130 15-Apr-2009 marcel

Add a compat option to the EBR scheme that controls the
naming of the partitions (GEOM_PART_EBR_COMPAT). When
compatibility is enabled, changes to the partitioning are
disallowed.

Remove the device name aliasing added previously to provide
backward compatibility, but which in practice doesn't give
us anything.

Enable compatibility on amd64 and i386.


190881 10-Apr-2009 lulf

- Move out allocation part of different gvinum objects into its own routine and
make use of it in the gvinum userland code.


190878 10-Apr-2009 thompsa

Revert r190676,190677

The geom and CAM changes for root_hold are the wrong solution for USB design
quirks.

Requested by: scottl


190849 08-Apr-2009 marcel

Don't use hexadecimal in the EBR partition names, because 'a'..'f'
are more commonly known as BSD partition names.

Discussed with: ivoras@


190677 03-Apr-2009 thompsa

Add interleaving root hold tokens from the CAM probe to disk_create and geom
provider tasting. This is needed for disk attachments that happen after threads
are running in the boot process.

Tested by: rnoland


190676 03-Apr-2009 thompsa

Add a how argument to root_mount_hold() so it can be passed NOWAIT and be called
in situations where sleeping isn't allowed.


190667 03-Apr-2009 marcel

The 9 bytes immediately prior to the partition table can contain
signatures or disk serial numbers. Don't assume those to be zero
in all cases. This fixes a false negative.

Tested by: avatar@mmlab.cse.yzu.edu.tw


190537 30-Mar-2009 marcel

Sharpen the saw:
o PC98 uses 32-bit block numbers. Limit the scheme to 2^32-1
blocks when the media is larger. The 32-bit block numbers
are implicit (16-bit cylinder * 8-bit head * 8-bit sector).


190536 30-Mar-2009 marcel

Sharpen the saw:
o MBR uses 32-bit block numbers. Limit the scheme to 2^32-1
blocks when the media is larger.
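
The limit described in these "Sharpen the saw" entries amounts to a simple
clamp. A minimal sketch, with a hypothetical function name and with the
media size and sector size passed in as plain integers:

```python
UINT32_MAX = 0xFFFFFFFF  # 2^32 - 1, the largest 32-bit block number


def usable_blocks(mediasize, sectorsize):
    """Number of blocks a scheme with 32-bit block numbers can address
    on a provider of the given size in bytes (illustrative only)."""
    return min(mediasize // sectorsize, UINT32_MAX)
```

With 512-byte sectors the clamp takes effect once the media exceeds
roughly 2 TiB; smaller media are unaffected.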


190535 30-Mar-2009 marcel

Sharpen the saw:
o EBR uses 32-bit block numbers. Limit the scheme to 2^32-1
blocks when the media is larger.
o Calculate the number of entries based on the rounded media
size, rather than the raw media size.


190534 30-Mar-2009 marcel

Sharpen the saw:
o Don't create a GPT scheme underneath another scheme when
the probe doesn't allow it.


190513 28-Mar-2009 lulf

- Add files that should have been added in r190507.


190507 28-Mar-2009 lulf

Import the gvinum work that has been done during and after Summer of Code 2007.
The work has been under testing and fixing since then, and it is mature enough
to be put into HEAD for further testing.

A lot has changed in this time, and here are the most important changes:
- Gvinum now uses one single worker thread instead of one thread for each
volume and each plex. The reason for this is that the previous scheme was
very complex, and was the cause of many of the bugs discovered in gvinum.
Instead, gvinum now uses one worker thread with an event queue, quite
similar to what is used in gmirror.
- The rebuild/grow/initialize/parity check routines no longer run in
separate threads, but are run as regular I/O requests with special flags.
This made it easier to support mounted growing and parity rebuild.
- Support for growing striped and raid5-plexes, meaning that one can extend the
volumes for these plex types in addition to the concat type. Also works while
the volume is mounted.
- Implementation of many of the missing commands from the old vinum:
attach/detach, start (was partially implemented), stop (was partially
implemented), concat, mirror, stripe, raid5 (shortcuts for creating volumes
with one plex of these organizations).
- The parity check and rebuild no longer go between userland/kernel, meaning
that the gvinum command will not stay and wait forever for the rebuild to
finish. You can instead watch the status with the list command.
- Many problems with gvinum have been reported since 5.x, and some have been hard
to fix due to the complicated architecture. Hopefully, it should be more
stable and better handle edge cases that previously made gvinum crash.
- Failed drives no longer disappear entirely, but now leave behind a dummy
drive that makes sure the original state is not forgotten in case the system
is rebooted between drive failures/swaps.
- Update manpage to reflect new commands and extend it with some examples.

Sponsored by: Google Summer of Code 2007
Mentored by: le
Tested by: Rick C. Petty <rick-freebsd2008 -at- kiwi-computer.com>


190463 27-Mar-2009 marcel

Sharpen the saw:
o BSD uses 32-bit block numbers. Limit the scheme to 2^32-1
blocks when the media is larger.


190461 27-Mar-2009 marcel

Sharpen the saw:
o Don't create an APM scheme underneath another scheme when
the probe doesn't allow it.
o APM uses 32-bit block numbers. Limit the scheme to 2^32-1
blocks when the media is larger.


190443 26-Mar-2009 marcel

Change the priority from high to normal. This makes sure that
the BSD or GPT schemes can take precedence as appropriate.


190423 25-Mar-2009 ivoras

Create GEOM labels from UFS IDs, e.g. /dev/ufsid/49c97b1faa2adc43. UFS IDs
are always present and can be used to identify file systems (useful if
hardware devices move often).

Actually-by: pjd
Approved by: gnn (mentor)
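
A sketch of how such a label name could be formed. It assumes the UFS ID
consists of two 32-bit words printed as zero-padded hexadecimal, which is
consistent with the 16-digit example above; the function name is
hypothetical.

```python
def ufsid_label(fs_id0, fs_id1):
    """Build a 'ufsid/<16 hex digits>' label from two 32-bit ID words
    (illustrative; the word layout is an assumption)."""
    return "ufsid/%08x%08x" % (fs_id0 & 0xFFFFFFFF, fs_id1 & 0xFFFFFFFF)
```

The resulting name appears under /dev and does not change when the
underlying device is renumbered.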


190232 22-Mar-2009 ivoras

Be more explicit and complain if kernel dumps are performed on unsupported
partition types. This is to help users used to the old behaviour.

Reviewed by: marcel
Approved by: gnn (mentor)


190058 19-Mar-2009 ivoras

Make GEOM provider names starting with "/dev/" acceptable as well as their
"raw" names. While there, change the formatting of extended MSDOS partitions
so that the dot (".") is not used to separate two numbers (which kind of
looks like the whole is a decimal number). Use "+" instead, which also
hints that the second part of the name is the offset from the start of
the partition in the first part of the name. Also change the offset from
decimal to hexadecimal notation, simply for aesthetic reasons and future
compatibility.

GEOM_PART is the default in 8-CURRENT but not yet in 7-STABLE so this
changeset can be MFC-ed without causing major problems from the second
part.

Reviewed by: marcel
Approved by: gnn (mentor)
MFC after: 2 weeks


189900 16-Mar-2009 pjd

Detach GELI providers on shutdown/reboot, which will allow providers underneath
to close properly.

Reported, reviewed and tested by: guido
MFC after: 1 week


189762 13-Mar-2009 guido

Back out this commit while a better solution is developed.


189695 11-Mar-2009 nyan

Move the PC98_[MS]ID_* defines from g_part_pc98.c to diskpc98.h.

Reviewed by: marcel


189660 11-Mar-2009 sam

o disallow write to RedBoot and FIS directory partitions; these are painful
to resurrect (maybe honor foot shooting bit in kern.geom_debugflags)
o fix match macro so we now recognize we want to merge FIS dir with RedBoot
config parameters even if we don't actually do it


189625 10-Mar-2009 guido

When attaching a geli on boot make sure that it is detached
upon last close. (needed for a gmirror to properly shut down
upon reboot when a geli is on top of the gmirror)


189616 10-Mar-2009 nyan

Restore the return statement. It was accidentally removed by rev 188429.


189608 09-Mar-2009 sam

add geom_redboot, a geom module that exports RedBoot FIS partitions as named
slices in dev/redboot/*


188899 21-Feb-2009 marcel

o When creating the EBR scheme, set the number of entries
properly. Otherwise the minimum of 1 is used and you can
only insert a single partition/slice and only at sector
0 (index 1).
o When adding a partition/slice, recalculate the index after
the start and size of the partition/slice are adjusted to
make them a multiple of the track size. Since the precheck
method sets the index based on the start of the partition
as provided by the user, we know that we're off by at most
1 and adjusting the index is safe.
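
The index recalculation can be sketched as follows. The mapping
index = start / track size + 1 is inferred from "sector 0 (index 1)"
above; rounding the start up to the next track boundary is an assumption
made for illustration, and both function names are hypothetical.

```python
def ebr_index(start_lba, sectors_per_track):
    """Index derived from the start LBA; sector 0 maps to index 1."""
    return start_lba // sectors_per_track + 1


def align_up(lba, sectors_per_track):
    """Round an LBA up to a multiple of the track size (assumed
    rounding direction)."""
    return -(-lba // sectors_per_track) * sectors_per_track
```

With 63 sectors per track, a user-supplied start of 100 gives index 2 in
the precheck; after alignment the start becomes 126 and the index becomes
3, a difference of one.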


188893 21-Feb-2009 marcel

Add bootcode handling.


188839 20-Feb-2009 marcel

Provide compatibility symlink for logical partitions:
1. Extend geom_dev by having it create the symlink (i.e. call
make_dev_alias) based on the DIOCGPROVIDERALIAS ioctl.
In this way the functionality is generic and thus usable
by any geom/provider.
2. Have g_part handle said ioctl through the devalias method,
so that it's under control of the scheme itself. By design
the alias will not be created for newly added partitions.


188838 20-Feb-2009 marcel

Fix an infinite loop created when the last logical partition is
removed.


188723 17-Feb-2009 marcel

Add a default implementation for pre-check. It should
always succeed if not implemented.

Pointy hat: marcel


188705 17-Feb-2009 marcel

Remove gpt_offset and related code. It was introduced for use
by the BSD scheme, but ended up not being needed. Remove to avoid
abuse and to keep the bloat to a minimum.


188667 16-Feb-2009 marcel

Add support to add, delete and modify logical partitions, as well
as to create and destroy the extended partitioning scheme. In
other words: full support.


188659 15-Feb-2009 marcel

Add method precheck to the g_part interface. The precheck
method allows schemes to reject the ctl request, pre-check
the parameters and/or modify/set parameters. There are 2
use cases that triggered the addition:
1. When implementing a R/O scheme, deletes will still
happen to the in-memory representation. The scheme is
not involved in that operation. The pre-check method
can be used to fail the delete up-front. Without this
the write to disk will typically fail, but at that
time the delete already happened.
2. The EBR scheme uses a linked list to record slices.
There's no index. The EBR scheme defines the index
as a function of the start LBA of the partition. The
add verb picks an index for the range and then invokes
the add method of the scheme to fill in the blanks. It
is too late for the add method to change the index.
The pre-check is used to set the index up-front. This
also (silently) overrides/nullifies any (pointless)
user-specified index value.


188492 11-Feb-2009 lulf

- Use the correct argument when determining the buffer size.

PR: kern/131575
MFC after: 2 days


188429 10-Feb-2009 imp

Fix g_part_dumpconf and g_part_name prototypes.

Submitted by: marcel@


188354 09-Feb-2009 marcel

Add the EBR scheme. The EBR scheme supports the Extended Boot Records
found inside extended partitions and used to create logical partitions.
At this time write/modify support is not (yet) present.
The EBR and MBR schemes both check the parent scheme. The MBR will
back-off when nested under another MBR, whereas the EBR only nests
under a MBR.


188352 08-Feb-2009 marcel

Allow gpe_offset to be set by the scheme. When gpe_offset is zero,
or invalid, initialize it to the start of the partition. Adjust
the mediasize when the offset lies somewhere inside the partition.


188329 08-Feb-2009 marcel

o Add the "PART::scheme" attribute that returns the name of the
underlying partitioning scheme.
o Put the start and end of the partition in the XML configuration.
The start and end are the LBAs of the first and last sector
(resp.) of the partition. They are currently identical to the
offset and size attributes, which describe the partition as an
offset and size in bytes, but may not in the future. The start
and end will be used for the logical partition boundaries and
may include metadata. The offset and size will always represent
the useful storage space within the partition. Typically these
two notions are the same, but for logical partitions in an
extended partition, the EBR is more naturally treated as being
part of the partition.


188303 08-Feb-2009 imp

Fix g_part_*dumpconf to return void to match kobj definition.
Fix g_part_*name to return a const char * rather than a char *.


188054 03-Feb-2009 marcel

In g_handleattr(), set bp->bio_completed also for the case
where len is 0. Otherwise g_getattr() will never succeed
when it is handled by g_handleattr_str().


187973 01-Feb-2009 marcel

Constify val in g_handleattr() and str in g_handleattr_str().
This allows passing string constants to g_handleattr_str().


187672 24-Jan-2009 ed

Remove unused unrhdr from GEOM character device module.

Now that make_dev() doesn't require unit numbers to be unique, there is
no need to use an unrhdr here to generate the numbers. Remove the entire
init-routine, because it is optional.


187053 11-Jan-2009 trasz

Prevent a panic that happens on SMP machines when removing a disk with
many writes queued up.

Reviewed by: phk, scottl
Approved by: rwatson (mentor)
Sponsored by: FreeBSD Foundation


186823 06-Jan-2009 marius

- Don't enforce an upper-bound to the number of sectors or heads,
allowing the full 16-bit width of the corresponding fields in the
VTOC8 label to be used. The removed limits basically only held
true for providers labeled using the synthetic geometry provided
by cam_calc_geometry(9) but neither SCSI disks labeled with Solaris
nor sufficiently large ATA disks.
- Given that providers (originally) labeled with Solaris typically
use the native geometry as reported by the target while FreeBSD
typically uses a synthetic one, put the message complaining about
mismatching geometries between what the label indicates and what
GEOM thinks the provider has, which we generally can't help,
under bootverbose in order to not unnecessarily scare users.
- For informational purposes add the non-matching values to the
message complaining about them, similar to what r186501 did for
g_part_bsd_read() except also indicating the origin of the
values.
- Make it clear that the messages emitted by this code refer to
the VTOC8 support rather than to another existing scheme or to
VTOC32.


186807 06-Jan-2009 marcel

Don't enforce an upper-bound to the number of sectors or heads
that the provider has. The limits we imposed were PC BIOS
specific and not always applicable.


186733 04-Jan-2009 marcel

Improve probing.
o Don't check the dummy fields.
o The entry is unused if either dp_mid is 0 or dp_sid is 0.
o The start or end cylinder cannot be 0.
o The start CHS cannot be equal to the end CHS.

Submitted by: nyan


186517 27-Dec-2008 lulf

- Fix an issue with access permissions to underlying disks used by a gvinum
plex. If the plex is a raid5 plex, and is being written to, parity data might
have to be read from the underlying disks, requiring them to be opened for
reading as well as writing.

MFC after: 1 week


186501 26-Dec-2008 obrien

When the geometry does not match the label, print out the values.


186188 16-Dec-2008 trasz

Implement g_vfs_orphan(). Without it, the filesystem never closes
the device, which means refcount on periph drivers never drops,
which means cam_sim_free() never returns, which results in umass
sleeping there ad infinitum.

Submitted by: pjd
Reviewed by: scottl, pjd
Approved by: rwatson (mentor)
Sponsored by: FreeBSD Foundation


185768 08-Dec-2008 lulf

- Add missing word in comment.


185693 06-Dec-2008 trasz

Make it possible to use gjournal for the root filesystem. Previously,
an unclean shutdown would make it impossible to mount rootfs at boot.

PR: kern/128529
Reviewed by: pjd
Approved by: rwatson (mentor)
Sponsored by: FreeBSD Foundation


185518 01-Dec-2008 ivoras

Trivial patch to show on which geom the error has been detected.

Submitted by: Rick C. Petty
Approved by: gnn (mentor)
MFC after: 1 month


185497 01-Dec-2008 marcel

Allow boot code to be smaller than what the scheme expects.
This effectively changes the boot code size to be an upper
bound and makes the interface more flexible.


185327 26-Nov-2008 marcel

Allow dumpon to a partition of type FS_UNUSED as well.


185318 25-Nov-2008 lulf

- Fix a potential NULL pointer reference. Note that this should not happen in
practice, but it is a good programming practice and allows the kernel to not
depend on userland correctness.
- While there, make sizeof usage match the rest of the code.

Found with: Coverity Prevent(tm)
CID: 660, 662


185309 25-Nov-2008 lulf

- Fix a potential NULL pointer reference. Note that this cannot happen in
practice, but it is a good programming practice nonetheless and it allows the
kernel to not depend on userland correctness.

Found with: Coverity Prevent(tm)
CID: 655-659, 664-667


185048 18-Nov-2008 marcel

Partition type FS_UNUSED does not mean the partition entry
is unused. Unused partition entries have a partition size
of zero. Therefore, partitions can have type FS_UNUSED.

MFC after: 3 days
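
The distinction can be sketched as a predicate on the entry size alone.
The function name is hypothetical; FS_UNUSED (0) and FS_BSDFFS (7) are
the disklabel type values.

```python
FS_UNUSED = 0   # disklabel partition type "unused"
FS_BSDFFS = 7   # disklabel partition type for a UFS file system


def entry_in_use(p_size, p_fstype):
    """An entry is in use when its size is non-zero; the type is
    deliberately ignored (illustrative sketch)."""
    return p_size != 0
```

So an entry with a non-zero size and type FS_UNUSED is still a valid
partition, while a zero-sized entry is unused whatever its type.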


184734 06-Nov-2008 marcel

Fix a panic caused by a corrupted table when the header is
still valid. We were checking the state of the header and
not the table.

PR: 119868
Based on a patch from: Jaakko Heinonen <jh@saunalahti.fi>
MFC after: 1 week


184554 02-Nov-2008 attilio

Improve VFS locking:
- Implement real draining for vfs consumers by not relying on the
mnt_lock and using instead a refcount in order to keep track of lock
requesters.
- Due to the change above, remove the mnt_lock lockmgr because it is now
useless.
- Due to the change above, vfs_busy() is no longer linked to a lockmgr.
So change its KPI by removing the interlock argument and defining 2 new
flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the
old version (which was unlinked from the lockmgr already) and
MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx
once the mnt interlock is held (ability still desired by most consumers).
- The stub used in vfs_mount_destroy(), which allows overriding the
mnt_ref if running for more than 3 seconds, makes it totally useless.
Remove it, as it was meant to work with older versions.
If a problem of "refcount held never going away" should appear, we will
need to fix it properly rather than trust such a hackish solution.
- Fix a bug where returning (with an error) from dounmount() was still
leaving the MNTK_MWAIT flag on even if the waiters were actually
woken up. Just a place in vfs_mount_destroy() is left because it is
going to recycle the structure in any case, so it doesn't matter.
- Remove the markercnt refcount as it is useless.

This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and
__FreeBSD_version will be modified accordingly.

Discussed with: kib
Tested by: pho


184552 02-Nov-2008 imp

Add support for reading Tivo Series 1 partitioning. This likely needs
a little refinement, but is good enough to commit as is.

# Should look to see if I should move swab(3) into the kernel or just
# provide the unoptimized routine here.

Reviewed by: marcel@


184499 31-Oct-2008 kib

Revert r184136. Instead, push the check for crashdumpmap overflow into the
MD i386 and amd64 dump code.

Requested by: jhb
Retested by: pho
MFC after: 3 days (+ 176304 + 184136)


184292 26-Oct-2008 lulf

- Import macros used in gmirror for printing gvinum debug messages and making
the output more standardized.
- Add a sysctl to set the verbosity of the debug messages.
- While there, fixup typos and wording in the messages.


184264 25-Oct-2008 marcel

Invalid BSD disklabels have been created by sysinstall and
are possibly still being created. The d_secperunit field
contains the number of sectors of the disk and not of the
slice/partition to which the disklabel applies.
Rather than reject the disklabel, we now silently adjust
the field. Existing code, like bsdlabel(8), does not seem
to check the label that extensively and seems to adjust
fields as a side-effect as well.
In other words, it's not that important apparently, so
gpart should not be too strict about it.

Reported by: nyan@
Reported by: Andriy Gapon <avg@icyb.net.ua>


184151 22-Oct-2008 marcel

Allow dumps to partitions with a tag of 0. The legacy
sunlabel implementation in FreeBSD does not use VTOC
information and as such has no partition types.


184136 21-Oct-2008 kib

Do not overflow crashdumpmap.

Reported and tested by: pho
Reviewed by: jhb
MFC after: 1 week


184069 20-Oct-2008 marcel

The active and bootable flags are not part of the type.
Export the active and bootable flags as attributes in
the configuration XML and allow them to be manipulated
with the set/unset commands.

Since libdisk treats the flags as part of the partition
type, preserve behavior by keeping them included in the
configuration text.


183754 10-Oct-2008 attilio

Remove the unused struct thread argument from the bufobj interface.
In particular, the KPI of the following functions is modified:
- bufobj_invalbuf()
- bufsync()

and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()

Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.

As a side note, please consider the 'curthread' argument passed to
VOP_SYNC() (in bufsync()) temporary, as it will be axed ASAP.

Reviewed by: kib
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


183546 02-Oct-2008 lulf

- Use the new gv_write_header function to write out the header when removing a
drive to make sure that the header is in the correct format.


183545 02-Oct-2008 lulf

- Remove unneeded macro since the config_length field in the header was changed
to 64 bit in the new format.


183514 01-Oct-2008 lulf

- Make gvinum header on-disk structure consistent on all platforms by storing
the gvinum header in fields of fixed size and in a big endian byte order
rather than the size and byte order of the actual platform.

Note that the change is backwards compatible with the old gvinum configuration
format, but will save the configuration in the new format when the 'saveconfig'
command is executed.

Submitted by: Rick C. Petty <rick-freebsd -at- kiwi-computer.com>
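
The idea of a platform-independent header can be sketched with
fixed-width big-endian fields. The two-field layout below is purely
hypothetical and is not the real gvinum header; only the principle
(fixed sizes, big-endian byte order) comes from the entry above.

```python
import struct

# Hypothetical layout: a 64-bit magic followed by a 64-bit config length.
# ">" selects big-endian with no padding, so the bytes are identical on
# every platform.
HEADER = struct.Struct(">QQ")


def pack_header(magic, config_length):
    """Serialize the hypothetical header to 16 bytes."""
    return HEADER.pack(magic, config_length)


def unpack_header(raw):
    """Parse the first 16 bytes back into (magic, config_length)."""
    return HEADER.unpack(raw[:HEADER.size])
```

A header written on a little-endian 32-bit machine then reads back
unchanged on a big-endian 64-bit one.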


183455 29-Sep-2008 marcel

Return G_PART_PROBE_PRI_HIGH instead of G_PART_PROBE_PRI_NORM
if the probe succeeds. This guarantees that the BSD scheme
wins over the MBR scheme when MBR gets to probe first. Build-
or link-time conditions can cause schemes to end up in the
linker set in a different order. Normally BSD is before MBR
in the linker set and as such gets to probe first. But typically
when the kernel gets rebuilt or relinked, this can change.
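
Why a higher priority removes the dependence on probe order can be shown
with a small sketch. The selection function and the numeric priority
values are assumptions for illustration; only the ordering (high beats
normal) matters.

```python
PRI_NORM = -20  # illustrative values; only their relative order matters
PRI_HIGH = 0


def pick_scheme(probe_results):
    """Return the name of the scheme with the highest priority from a
    list of (name, priority) pairs; ties go to the earlier probe."""
    best = None
    for name, pri in probe_results:
        if best is None or pri > best[1]:
            best = (name, pri)
    return best[0] if best else None
```

With BSD at PRI_HIGH and MBR at PRI_NORM, the result is "BSD" whichever
of the two is probed first. If both returned PRI_NORM, the tie would be
decided by order, which is the situation the commit avoids.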


183454 29-Sep-2008 marcel

Insert the null scheme at the head. This does not change any
functionality, but creates an invariant: the first element
on the list is always the null scheme.


183420 27-Sep-2008 marcel

Export the partition name in the conftxt and confxml output.
The conftxt output is used by libdisk, and the confxml
output is used by gpart itself (gpart show -l).

Submitted by: nyan@


183419 27-Sep-2008 marcel

Hold the root mount while we're tasting. It is possible
that a nested partition (typically the BSD disklabel)
is not done tasting while the root file system is being
mounted. While this is rare, it's still possible.


183410 27-Sep-2008 marcel

Allow 255 sectors/track for the BSD disklabel. The previous limit
of 63 sectors/track is too PC BIOS specific. On pc98, where the
BSD disklabel is used as well, 255 sectors/track is not uncommon.

Submitted by: nyan@


183381 26-Sep-2008 ed

Remove unit2minor() use from kernel code.

When I changed kern_conf.c three months ago I made device unit numbers
equal to (unneeded) device minor numbers. We used to require
bitshifting, because there were eight bits in the middle that were
reserved for a device major number. Not very long after I turned
dev2unit(), minor(), unit2minor() and minor2unit() into macros.
The unit2minor() and minor2unit() macros were no-ops.

We'd better not remove these four macros from the kernel, because there
is a lot of (external) code that may still depend on them. For now it's
harmless to remove all invocations of unit2minor() and minor2unit().

Reviewed by: kib


183146 18-Sep-2008 sbruno

Just a fixup for a KTRACE message I stumbled upon many moons ago.

Reviewed by: Scott Long
MFC after: 2 days


182843 07-Sep-2008 lulf

- Add a new ioctl for getting the provider name of a geom provider.
- Add a routine for looking up a device and checking if it is a valid geom
provider given a partial or full path to its device node.

Reviewed by: phk
Approved by: pjd (mentor)


182798 05-Sep-2008 rpaulo

Fix build.


182797 05-Sep-2008 rpaulo

Keep entries sorted.


182793 05-Sep-2008 rpaulo

Include the vendor in the partition name.


182784 05-Sep-2008 rpaulo

Detect Apple HFS GPT slices.


182542 31-Aug-2008 attilio

Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.

Manpages are updated accordingly.

Tested by: Diego Sardina <siarodx at gmail dot com>


181803 17-Aug-2008 bz

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


181646 12-Aug-2008 pjd

Style(9).


181463 09-Aug-2008 des

Add sbuf_new_auto as a shortcut for the very common case of creating a
completely dynamic sbuf.

Obtained from: Varnish
MFC after: 2 weeks


180717 22-Jul-2008 peter

Trivial commit to attempt to diagnose a svn problem. Add
comment that Tivo disks are APM, but do not have a DDR record.


180638 20-Jul-2008 pjd

Clear passphrase buffer after use.

Submitted by: Fabian Keil <fk@fabiankeil.de> (a bit different version)


180612 19-Jul-2008 lulf

- When renaming a drive, also set the drive name in the gvinum header.

PR: kern/125632
Approved by: pjd (mentor)
MFC after: 3 days


180451 11-Jul-2008 lulf

- Fix a logic error when updating plex configuration.

Approved by: pjd (mentor)


180291 05-Jul-2008 rwatson

Introduce a new lock, hostname_mtx, and use it to synchronize access
to global hostname and domainname variables. Where necessary, copy
to or from a stack-local buffer before performing copyin() or
copyout(). A few uses, such as in cd9660 and daemon_saver, remain
under-synchronized and will require further updates.

Correct a bug in which a failed copyin() of domainname would leave
domainname potentially corrupted.

MFC after: 3 weeks


180120 30-Jun-2008 delphij

Avoid NULL dereference.

Reviewed by: ivoras


179897 20-Jun-2008 lulf

- Fix spelling errors.

Approved by: kib (mentor)
PR: kern/124788
Submitted by: Hywel Mallett <Hywel -at- hmallett.co.uk>


179853 18-Jun-2008 marcel

Add the set and unset verbs used to set and clear attributes for
partition entries. Implement the setunset method for the MBR
scheme to control the active flag.


179763 12-Jun-2008 marcel

Finish the support for partition labels and add it to the XML.


179756 12-Jun-2008 marcel

Add the raw partition type to the XML.


179755 12-Jun-2008 marcel

Add the raw partition type to the XML.


179752 12-Jun-2008 marcel

Add the raw partition type to the XML.


179751 12-Jun-2008 marcel

Add the raw partition type to the XML.


179750 12-Jun-2008 marcel

Add the raw partition type to the XML.


179748 12-Jun-2008 marcel

Add the partition label and the raw partition type to the XML.


179413 29-May-2008 ed

Remove the distinction between device minor and unit numbers.

Even though we got rid of device major numbers some time ago, device
drivers still need to provide unique device minor numbers to make_dev().
These numbers are only used inside the kernel. They are not related to
device major and minor numbers which are visible in devfs. These are
actually based on the inode number of the device.

It would eventually be nice to remove minor numbers entirely, but we
don't want to be too aggressive here.

Because the 8-15 bits of the device number field (si_drv0) are still
reserved for the major number, there is no 1:1 mapping of the device
minor and unit numbers. Because this is now unused, remove the
restrictions on these numbers.

The MAXMAJOR definition was actually used for two purposes. It was used
to convert both the userspace and kernelspace device numbers to their
major/minor pair, which is why it is now named UMINORMASK.

minor2unit() and unit2minor() have now become useless. Both minor() and
dev2unit() now serve the same purpose. We should eventually remove some
of them, at least turning them into macros. If devfs would become
completely minor number unaware, we could consider using si_drv0 directly,
just like si_drv1 and si_drv2.

Approved by: philip (mentor)
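
The restriction described above can be illustrated with a minimal sketch.
It assumes only what the message states (bits 8-15 of the number were
reserved for the major); the helper names unit_to_minor() and
minor_to_unit() are hypothetical and are not the kernel's macros.

```c
#include <stdint.h>

/*
 * Hypothetical helpers. The layout is an assumption taken from the commit
 * message: bits 8-15 are reserved, so a unit number has to skip them.
 */
static uint32_t
unit_to_minor(uint32_t unit)
{
	/* Keep the low 8 bits, move the remaining bits above the hole. */
	return ((unit & 0xffu) | ((unit & 0xffff00u) << 8));
}

static uint32_t
minor_to_unit(uint32_t minor)
{
	/* Drop the reserved byte and close the gap again. */
	return ((minor & 0xffu) | ((minor >> 8) & 0xffff00u));
}
```

Once the reserved byte is no longer needed, both conversions collapse to
the identity, which is why the two helpers become useless.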


179206 22-May-2008 lulf

- Recognize the 'volume' parameter when creating a plex.

PR: kern/75632
Approved by: pjd (mentor)
MFC after: 1 day


179097 18-May-2008 pjd

- Assert that we don't send new provider event for a provider which has
G_PF_WITHER flag set.
- Fix typo in assertion condition (sorry, but I forgot who reported that).


179094 18-May-2008 pjd

Play nice with DDB pager.

Educated by: jhb's BSDCan presentation


178444 23-Apr-2008 marcel

Implement the G_PART_DUMPCONF method for all 6 schemes. Also call
the method for the (indent == NULL) case (i.e. the kern.geom.conftxt
sysctl). The purpose is to extend the conftxt output with scheme-
specific fields which can be used by libdisk. In particular, have
the schemes dump the xs and xt fields, which contain the backward
compatible values for class type and partition type. This allows
libdisk to work with the legacy slicers as well as with gpart and
helps/promotes migration.


178180 13-Apr-2008 marcel

Add the bootcode verb for installing boot code. Boot code
is supported for the MBR, GPT and PC98 schemes, where GPT
installs boot code into the PMBR.


177713 29-Mar-2008 marcel

Change the order from SI_ORDER_FIRST to SI_ORDER_ANY (within
SI_SUB_DRIVERS) to avoid loading schemes before all the GEOM
classes have been loaded and initialized. Otherwise we may
end up using mutexes that haven't been initialized (due to
g_retaste() posting an event).


177692 28-Mar-2008 marcel

Add support for PC-9800 partition tables.


177681 28-Mar-2008 marcel

When retasting, wither any existing GEOMs of the same class. This
allows the class to create a different GEOM for the same provider
as well as avoid that we end up with multiple GEOMs of the same
class with the same name.

For example, when a disk contains a PC98 partition table but
only MBR is supported, then the partition table can be treated
as a MBR. If support for PC98 is later loaded as a module, the
MBR scheme is pre-empted for the PC98 scheme as expected.


177510 23-Mar-2008 marcel

Redefine G_PART_SCHEME_DECLARE() from populating a private linker set
to declaring a proper module. The module event handler is part of the
gpart core and will add the scheme to an internal list on module load
and will remove the scheme from the internal list on module unload.
This makes it possible to dynamically load and unload partitioning
schemes.


177509 23-Mar-2008 marcel

Add g_retaste(), which given a class will present all non-open providers
to it for tasting. This is useful when the class, through means outside
the scope of GEOM, can claim providers previously unclaimed.

The g_retaste() function posts an event which is handled by the
g_retaste_event().

Event suggested by: phk


177345 18-Mar-2008 lulf

- Fix a memory leak when re-discovering a gvinum configuration.

Approved by: pjd (mentor)
MFC after: 1 week


176718 02-Mar-2008 marcel

Add support for VTOC8 labels (aka sun disk labels). When a label does
not have VTOC information about the partitions, it will be created.
This is because the VTOC information is used for the partition type
and FreeBSD's sunlabel(8) does not create nor use VTOC information.
For this purpose, new tags have been added to support FreeBSD's
partition types.


176672 29-Feb-2008 marcel

Follow-up improvements to the handling of false positives: If the
partition table is empty, check to see if we have something that
looks sufficiently like a BPB. On non-i386 machines, the boot
sector typically doesn't contain boot code; the end of the boot
sector is all zeroes. This is also where the partition table is
for MBRs.
We only check the sector size and cluster size, as that seems to
be the most reliable across implementations, BPB versions and
platforms.


176650 28-Feb-2008 marcel

Better handle false positives. The MBR differs from the boot sector
only because there's a partition table where the boot sector has
boot code. Boot sectors without boot code look like a MBR for all
practical purposes. This change adds a check for the partition table
and fails the probe when it's obviously invalid. The assumption being
that the sector contains a boot sector and not a MBR.
More checks are needed to distinguish a boot sector without boot code
from an (empty) MBR.


176419 20-Feb-2008 thompsa

geom_lvm(4) is now known as geom_linux_lvm(4).


176417 20-Feb-2008 thompsa

Add a geom class to map Linux LVM logical volumes.

The logical disks will appear as /dev/lvm/<vol group>-<logical vol>, for
instance /dev/lvm/vg0-home. G_LINUX_LVM currently supports linear stripes with
segments on multiple physical disks. The metadata is read only, logical
volumes can not be allocated or resized.

Reviewed by: Ivan Voras

Previously known as geom_lvm(4), rename requested by des, phk.


176304 15-Feb-2008 scottl

Teach the dump and minidump code to respect the maxiosize attribute of
the disk; the hard-coded assumption of 64K doesn't work in all cases.


176183 11-Feb-2008 thompsa

Unbreak build, size_t is larger on 64bit platforms.


176166 11-Feb-2008 thompsa

Add a geom class to map Linux LVM logical volumes.

The logical disks will appear as /dev/lvm/<vol group>-<logical vol>, for
instance /dev/lvm/vg0-home. GLVM currently supports linear stripes with
segments on multiple physical disks. The metadata is read only, logical
volumes can not be allocated or resized.

Reviewed by: Ivan Voras


174882 24-Dec-2007 marcel

Various fixes:
o BSD disklabels have relative offsets. Even for the BSD in MBR slice
setup, except when the mbroffset ioctl is supported. Since we don't
support that ioctl, bsdlabel(8) expects relative offsets. So, when
reading an existing disklabel, correct for disklabels that mistakenly
have the mbroffset offsets.
o Don't take the geometry seriously, because it's untrustworthy. We do
expect the numbers to be within range. This means that the secperunit
field will not be computed from secpercyl and ncyls, but simply is
the mediasize in sectors.
o Don't enforce partitions to be aligned to track boundaries. The
default label, constructed by bsdlabel(8), puts partition a at offset
BBSIZE bytes, which commonly means sector 16.


174674 16-Dec-2007 phk

Chop DIOCGDELETE from userland up in 1024 sector chunks to give geom_disk
or any other bio chopping geom a reasonable size of work.

Check for delivered signals between chunks, because the request size
and service time is unbounded.
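
A minimal sketch of the chunking arithmetic described above, assuming a
byte-based request length and the 1024-sector chunk size named in the
message. The names next_chunk_len() and count_chunks() are hypothetical;
the real code issues a BIO_DELETE per chunk and checks for signals, which
is only marked by a comment here.

```c
#include <stdint.h>

#define CHUNK_SECTORS	1024	/* chunk size named in the commit message */

/* Length in bytes of the next chunk to delete (hypothetical helper). */
static uint64_t
next_chunk_len(uint64_t remaining, uint32_t sectorsize)
{
	const uint64_t chunk = (uint64_t)CHUNK_SECTORS * sectorsize;

	return (remaining < chunk ? remaining : chunk);
}

/* Number of delete requests a range of 'length' bytes is split into. */
static uint64_t
count_chunks(uint64_t length, uint32_t sectorsize)
{
	uint64_t n = 0;

	while (length > 0) {
		length -= next_chunk_len(length, sectorsize);
		n++;
		/* The real code would check for a pending signal here. */
	}
	return (n);
}
```

Checking between chunks rather than inside one keeps each request
bounded while still letting a long delete be interrupted.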


174669 16-Dec-2007 phk

Don't limit BIO_DELETE requests to MAXPHYS, they perform no data
transfers, so they are not subject to the VM system limitation.


174500 09-Dec-2007 marcel

Decode as many or as few partition entries as the label claims there
are. We have already checked it against the caller provided maxpart.


174499 09-Dec-2007 marcel

Fix a bug in the add verb, where we failed to keep the list
of partitions in index-order. This is assumed by the APM, MBR
and BSD partitioning schemes.
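
The invariant mentioned above (a partition list kept in index order) can
be sketched as a sorted insert. This is an illustration only: struct
part_entry and insert_in_index_order() are hypothetical and are not the
g_part data structures.

```c
#include <stddef.h>

/* Hypothetical stand-in for a partition table entry. */
struct part_entry {
	int index;
	struct part_entry *next;
};

/*
 * Insert 'e' so the list stays sorted by ascending index.
 * Returns the (possibly new) head of the list.
 */
static struct part_entry *
insert_in_index_order(struct part_entry *head, struct part_entry *e)
{
	struct part_entry **pp = &head;

	/* Walk to the first entry whose index is not smaller than e's. */
	while (*pp != NULL && (*pp)->index < e->index)
		pp = &(*pp)->next;
	e->next = *pp;
	*pp = e;
	return (head);
}
```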


174465 08-Dec-2007 marcel

Internal partitions can not be deleted or modified.


174456 08-Dec-2007 marcel

Skip internal partitions in the check for (user) partitions for
the destroy command. Previously a freshly created BSD disklabel
could not be destroyed because of the internal partition.


174437 08-Dec-2007 marcel

Add support for FS_ZFS.


174347 06-Dec-2007 jhb

Only attach to a GPT partition if it has the GPT_ENT_TYPE_FREEBSD type.

XXX: This only works currently with GEOM_GPT which only exists in 6.x.
XXX: I didn't add 'mbroffset' support for a GPT partition holding a BSD
label as I'm not sure if they use relative or absolute offsets.

MFC after: 3 days


174326 06-Dec-2007 marcel

Add a BSD disklabel backend to g_part:
o Disklabels can have between 8 and 20 partitions (inclusive).
o No device special file is created for the raw partition.
o Switch ia64 to use this backend.
o No support for boot code yet.


173746 19-Nov-2007 jb

On some arches, openssl is built with OPENSSL_NO_CAMELLIA, so the
code here needs to depend on that too.


173677 16-Nov-2007 maxim

o s/resiserfs_sb/reiserfs_sb/.

Submitted by: Ighighi


173001 26-Oct-2007 pjd

Save stack only when KTR_GEOM is both compiled into the kernel and enabled
in debug.ktr.mask. Because saving stack is very expensive, it's better only
to do it when one really wants to.

Reported by: Dan Nelson


172940 24-Oct-2007 jhb

First cut at support for booting a GPT labeled disk via the BIOS bootstrap
on i386 and amd64 machines. The overall process is that /boot/pmbr lives
in the PMBR (similar to /boot/mbr for MBR disks) and is responsible for
locating and loading /boot/gptboot. /boot/gptboot is similar to /boot/boot
except that it groks GPT rather than MBR + bsdlabel. Unlike /boot/boot,
/boot/gptboot lives in its own dedicated GPT partition with a new
"FreeBSD boot" type. This partition does not have a fixed size in that
/boot/pmbr will load the entire partition into the lower 640k. However,
it is limited in that it can only be 545k. That's still a lot better than
the current 7.5k limit for boot2 on MBR. gptboot mostly acts just like
boot2 in that it reads /boot.config and loads up /boot/loader. Some more
details:
- Include uuid_equal() and uuid_is_nil() in libstand.
- Add a new 'boot' command to gpt(8) which makes a GPT disk bootable using
/boot/pmbr and /boot/gptboot. Note that the disk must have some free
space for the boot partition.
- This required exposing the backend of the 'add' function as a
gpt_add_part() function to the rest of gpt(8). 'boot' uses this to
create a boot partition if needed.
- Don't cripple cgbase() in the UFS boot code for /boot/gptboot so that
it can handle a filesystem > 1.5 TB.
- /boot/gptboot has a simple loader (gptldr) that doesn't do any I/O
unlike boot1 since /boot/pmbr loads all of gptboot up front. The
C portion of gptboot (gptboot.c) has been repocopied from boot2.c.
The primary changes are to parse the GPT to find a root filesystem
and to use 64-bit disk addresses. Currently gptboot assumes that the
first UFS partition on the disk is the / filesystem, but this algorithm
will likely be improved in the future.
- Teach the biosdisk driver in /boot/loader to understand GPT tables.
GPT partitions are identified as 'disk0pX:' (e.g. disk0p2:) which is
similar to the /dev names the kernel uses (e.g. /dev/ad0p2).
- Add a new "freebsd-boot" alias to g_part() for the new boot UUID.

MFC after: 1 month
Discussed with: marcel (some things might still change, but am committing
what I have so far)


172857 21-Oct-2007 marcel

Add the freebsd-zfs alias. Both APM and GPT have ZFS partition
types.


172836 20-Oct-2007 julian

Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
This makes way for us to add REAL kthread_create() and friends
that actually make threads. It turns out that most of these
calls actually end up being moved back to the thread version
when it's added, but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0 so that we can eventually MFC the
new kthread_xxx() calls.


172354 27-Sep-2007 pjd

When orphaning a provider, cancel events related to it.
Without this change the following situation was possible:

1. Provider is orphaned from within class' access() method on last write
close - orphan provider event is sent.
2. GEOM detects last write close on a provider and sends new provider event.
3. g_orphan_register() is called, and calls all orphan methods of attached
consumers.
4. New provider event is executed on orphaned provider, all classes can
taste already orphaned provider, and some may attach consumers to it.
Those consumers will never go away, because the g_orphan_register()
was already called.

We end up with a zombie provider.

With this change, at step 3, we will cancel new provider event.

How to repeat this problem:

# mdconfig -a -t malloc -s 10m
# geli init -i 0 md0
# geli attach md0
# newfs -L test /dev/md0.eli
# mount /dev/ufs/test /mnt/tmp
# geli detach -l md0.eli
# umount /mnt/tmp
# glabel status
Name Status Components
ufs/test N/A N/A

Reviewed by: phk
Approved by: re (kensmith)


172304 23-Sep-2007 pjd

LINT compiled just fine for me, but it seems it breaks tinderbox's way of
compiling LINT.

Approved by: re (implicitly)


172302 23-Sep-2007 pjd

Bring in the GEOM Virtualisation class, which allows creating huge GEOM
providers with limited physical storage and add physical storage as
needed.

Submitted by: Ivan Voras
Sponsored by: Google Summer of Code 2006
Approved by: re (kensmith)


172031 01-Sep-2007 pjd

Add support for Camellia encryption algorithm.

PR: kern/113790
Submitted by: Yoshisato YANAGISAWA <yanagisawa@csg.is.titech.ac.jp>
Approved by: re (bmah)


170897 17-Jun-2007 marcel

Have gpart synthesize a disk geometry if the underlying provider
doesn't have it. Some partitioning schemes, as well as file systems,
operate on the geometry and without it such schemes (e.g. MBR)
and file systems (e.g. FAT) can't be created. This is useful for
memory disks.


170651 13-Jun-2007 marcel

Add the MBR partitioning scheme to g_part. This does not yet
support the ability to install boot code.


170362 06-Jun-2007 marcel

Prefix unknown (i.e. un-aliased) partition types with '!'. This is
how they had to be given with ctlreq.


170361 06-Jun-2007 marcel

Call sbuf_finish() before sbuf_data() and sbuf_len().


170307 05-Jun-2007 jeff

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170289 04-Jun-2007 dwmalone

Despite several examples in the kernel, the third argument of
sysctl_handle_int is not sizeof the int type you want to export.
The type must always be an int or an unsigned int.

Remove the instances where a sizeof(variable) is passed to stop
people accidentally cutting and pasting these examples.

In a few places sysctl_handle_int was being used on 64 bit
types, which would truncate the value to be exported. In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.
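
The truncation described above can be modelled in a few lines. This is a
sketch of the width mismatch only, not of the sysctl handler interface:
export_via_int_handler() and export_via_quad_handler() are hypothetical
names, and unsigned types are used so the narrowing is well defined in C.

```c
#include <stdint.h>

/* Models an int-sized export: only the low 32 bits survive. */
static uint32_t
export_via_int_handler(uint64_t value)
{
	return ((uint32_t)value);
}

/* Models a 64-bit (quad) export: the value is passed through intact. */
static uint64_t
export_via_quad_handler(uint64_t value)
{
	return (value);
}
```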


169588 15-May-2007 marcel

Fix a dereference in KASSERT.


169585 15-May-2007 marcel

o Implement automatic commit. It's enabled when the flags parameter
exists and contains the 'C' flag.
o The partition label can be the empty string. It's how labels are
cleared.
o When an action fails, lower permissions when they were raised
in order to allow the action. A failed action will not result
in any uncommitted changes.
o Allow the flags parameter to be present but empty. It's the
equivalent of not being present.


169404 09-May-2007 marcel

Write the output parameter (if present) for the add, create, delete,
destroy and modify verbs.


169398 09-May-2007 marcel

When reverting the creation of a partitioning scheme on a provider,
the failure to probe an existing partitioning scheme means that no
previous partitioning scheme existed. Don't error. Just destroy the
geom.


169389 08-May-2007 marcel

MFp4:
119373: o Remove the query verb, along with the request and response
parameters.
o Add the version and output parameters.
119390: [APM,GPT] Properly clear deleted entries.
119394: o Make the alias the standard and use the '!' to prefix
literal partition types.
o Treat schemes and partition types as case insensitive.
119462: [GPT] Fix a page fault caused when modifying a partition entry
without a new partition type.


169313 06-May-2007 pjd

When deleting key, flush write cache after each overwrite, so we don't
overwrite data N times in cache and only once on disk.


169289 05-May-2007 pjd

Allow the use of ':' in d_ident, which is quite a handy character.


169288 05-May-2007 pjd

Handle GEOM::ident attribute by attaching 'sX' string at the end of ident
received from the underlying provider, where X is pp->index value.

OK'ed by: phk
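
A minimal sketch of the naming rule described above, assuming the ident
is a NUL-terminated string and the index a small integer. The function
ident_with_index() is hypothetical and only shows how the 'sX' suffix is
formed.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "<ident>s<index>" in 'buf' (hypothetical helper).
 * Returns 0 on success, -1 if the result did not fit.
 */
static int
ident_with_index(char *buf, size_t size, const char *ident, int index)
{
	int n = snprintf(buf, size, "%ss%d", ident, index);

	return ((n < 0 || (size_t)n >= size) ? -1 : 0);
}
```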


169287 05-May-2007 pjd

Because there is a lot of strange hardware out there, allow only
[a-zA-Z0-9-_@#%.] characters in the d_ident field.
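
The character set above can be expressed as a small predicate. This is a
sketch under the assumption that the check is a plain per-character
whitelist; ident_char_ok() and ident_ok() are hypothetical names, and what
the kernel does with a rejected character is not shown.

```c
#include <stdbool.h>
#include <string.h>

/* True if 'c' is in the set [a-zA-Z0-9-_@#%.] named in the message. */
static bool
ident_char_ok(char c)
{
	if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
	    (c >= '0' && c <= '9'))
		return (true);
	/* strchr() would also match the terminator, so exclude it. */
	return (c != '\0' && strchr("-_@#%.", c) != NULL);
}

/* True if every character of 'ident' is allowed. */
static bool
ident_ok(const char *ident)
{
	for (; *ident != '\0'; ident++)
		if (!ident_char_ok(*ident))
			return (false);
	return (true);
}
```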


169285 05-May-2007 pjd

- Extend disk structure to allow to store disk's serial number, which can be
retrieved via GEOM::ident attribute.
- Bump disk(9) ABI version.

OK'ed by: phk


169284 05-May-2007 pjd

Implement three new ioctls that can be used with GEOM provider:

DIOCGFLUSH - Flush write cache (sends BIO_FLUSH).

DIOCGDELETE - Delete data (mark as unused) (sends BIO_DELETE).

DIOCGIDENT - Get provider's unique and fixed identifier (asks for
GEOM::ident attribute).

First two are self-explanatory, but the last one might not be. Here are
properties of provider's ident:

- ident value is preserved between reboots,
- provider can be detached/attached and ident is preserved,
- provider's name can change - ident can't,
- ident value should not be based on on-disk metadata; in other words
copying whole data from one disk to another should not yield the same
ident for the other disk,
- there could be more than one provider with the same ident, but only if
they point at exactly the same physical storage, this is the case for
multipathing for example,
- GEOM classes that consume single providers and provide single providers,
like geli, gbde, should just attach class name to the ident of the
underlying provider,
- ident is an ASCII string (is printable),
- ident is optional and applications can't rely on its presence.

The main purpose for this is that an application can remember a provider's ident
and once it tries to open provider by its name again, it may compare idents
to be sure this is the right provider. If it is not (idents don't match),
then it can open provider by its ident.

OK'ed by: phk


169283 05-May-2007 pjd

Implement g_delete_data() similar to g_read_data() and g_write_data().

OK'ed by: phk


169282 05-May-2007 pjd

- Implement helper g_handleattr_str() function for string attributes
handling.
- Extend g_handleattr() to treat attribute as string when len=0.

OK'ed by: phk


169065 27-Apr-2007 marcel

Put the scheme (APM, GPT, etc) in the XML.


168999 24-Apr-2007 simokawa

If compressed length is zero, return a zero-filled block.

MFC after: 1 week


168670 12-Apr-2007 le

-) Correct sdcount for a plex when removing or adding subdisks.
-) Set correct sizes for plexes and volumes after a subdisk has been removed.

Submitted by: Ulf Lilleengen <lulf_AT_freebsd.org>


168669 12-Apr-2007 le

Avoid infinite loop if the device string given for a drive
only consists of "/".

Submitted by: Ulf Lilleengen <lulf_AT_freebsd.org>


168507 08-Apr-2007 pjd

Use root_mounted().


168445 07-Apr-2007 simokawa

Fix a bug for over 4GB media.

MFC after: 3 days


168426 06-Apr-2007 pjd

Sysctl description is not a format string, so one % is enough.


168052 30-Mar-2007 delphij

- Be more verbose when saying "foo" not found.
- In gctl_get_geom(), don't issue error when we were not
provided with a parameter, like gctl_get_provider() did.

Reviewed by: pjd


167913 26-Mar-2007 kris

make_dev(9) can be (and is) called without Giant, so there is no need to
drop the topology lock and acquire Giant around this call.

Reviewed by: phk


167800 22-Mar-2007 pjd

Add missing \n.


167755 21-Mar-2007 sam

Overhaul driver/subsystem api's:
o make all crypto drivers have a device_t; pseudo drivers like the s/w
crypto driver synthesize one
o change the api between the crypto subsystem and drivers to use kobj;
cryptodev_if.m defines this api
o use the fact that all crypto drivers now have a device_t to add support
for specifying which of several potential devices to use when doing
crypto operations
o add new ioctls that allow user apps to select a specific crypto device
to use (previous ioctls maintained for compatibility)
o overhaul crypto subsystem code to eliminate lots of cruft and hide
implementation details from drivers
o bring in numerous fixes from Michael Richardson/hifn; mostly for
795x parts
o add an optional mechanism for mmap'ing the hifn 795x public key h/w
to user space for use by openssl (not enabled by default)
o update crypto test tools to use new ioctl's and add cmd line options
to specify a device to use for tests

These changes will also enable much future work on improving the core
crypto subsystem; including proper load balancing and interposing code
between the core and drivers to dispatch small operations to the s/w
driver as appropriate.

These changes were instigated by the work of Michael Richardson.

Reviewed by: pjd
Approved by: re


167229 05-Mar-2007 pjd

Warn when the user uses a sectorsize bigger than the page size, which will lead
to problems when the geli device is used with file system or as a swap.

Hopefully will prevent problems like kern/98742 in the future.

MFC after: 1 week


167164 02-Mar-2007 pjd

Fix geli after last commit for UP systems that are running SMP kernel.

Submitted by: Hyo geol, Lee <hyogeollee@gmail.com>
MFC after: 1 week


167086 27-Feb-2007 jhb

Use pause() rather than tsleep() on stack variables and function pointers.


167050 27-Feb-2007 mjacob

First cut at GEOM based multipath. This is an active/passive{/passive...}
arrangement that has no intrinsic internal knowledge of whether devices
it is given are truly multipath devices. As such, this is a simplistic
approach, but still a useful one.

The basic approach is to (at present- this will change soon) use camcontrol
to find likely identical devices and label the trailing sector of the
first one. This label contains both a full UUID and a name. The name is
what is presented in /dev/multipath, but the UUID is used as a true
distinguisher at g_taste time, thus making sure we don't have chaos
on a shared SAN where everyone names their data multipath as "Fred".

The first of N identical devices (and N *may* be 1!) becomes the active
path until a BIO request is failed with EIO or ENXIO. When this occurs,
the active disk is ripped away and the next in a list is picked to
(retry and) continue with.

During g_taste events new disks that meet the match criteria for existing
multipath geoms get added to the tail end of the list.

Thus, this active/passive setup actually does work for devices which
go away and come back, as do (now) mpt(4) and isp(4) SAN based disks.

There is still a lot to do to improve this- like about 5 of the 12
recommendations I've received about it, but it's been functional enough
for a while that it deserves a broader test base.

Reviewed by: pjd
Sponsored by: IronPort Systems
MFC: 2 months


166934 23-Feb-2007 jhb

Use tsleep() rather than msleep() with a NULL mtx parameter.


166861 21-Feb-2007 n_hibma

Reduce the noise when plugging in (USB) mass storage devices, like a 4 port
flash card reader.
Also remove an 'Opened da0 -> <random number>' which is not needed on a daily
basis (available through bootverbose).

Reviewed by: phk, ken
MFC after: 1 week


166561 08-Feb-2007 rodrigc

#include <sys/systm.h> before <sys/geom.h> to get KASSERT(), and fix LINT build.


166551 07-Feb-2007 marcel

Evolve the ctlreq interface added to geom_gpt into a generic
partitioning class that supports multiple schemes. Current
schemes supported are APM (Apple Partition Map) and GPT.
Change all GEOM_APPLE and GEOM_GPT options into GEOM_PART_APM
and GEOM_PART_GPT (resp).

The ctlreq interface supports verbs to create and destroy
partitioning schemes on a disk; to add, delete and modify
partitions; and to commit or undo changes made.


166325 28-Jan-2007 pjd

We expect 'bio_data != NULL' for BIO_{READ,WRITE,GETATTR}, but for
BIO_{DELETE,FLUSH} we expect 'bio_data == NULL'.

Reviewed by: phk


166321 28-Jan-2007 pjd

It is possible that GEOM tastes a provider before SMP is started.
We can't bind to a CPU which is not yet on-line, so add code that waits for
CPUs to go on-line before binding to them.

Reported by: Alin-Adrian Anton <aanton@spintech.ro>
MFC after: 2 weeks


166193 23-Jan-2007 kib

Cylinder group bitmaps and blocks containing inode for a snapshot
file are after snaplock, while other ffs device buffers are before
snaplock in global lock order. By itself, this could cause deadlock
when bdwrite() tries to flush dirty buffers on snapshotted ffs. If,
during the flush, COW activity for snapshot needs to allocate block
and ffs_alloccg() selects the cylinder group that is being written
by bdwrite(), then kernel would panic due to recursive buffer lock
acquisition.

Avoid dealing with buffers in bdwrite() that are from the other side of
the snaplock divisor in the lock order than the buffer being written. Add
new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in
the bdwrite(). Default implementation, bufbdflush(), refactors the code
from bdwrite(). For ffs device buffers, specialized implementation is
used.

Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes)
Tested by: Peter Holm
X-MFC after: 3 weeks (if ever: it changes ABI)


164821 02-Dec-2006 pjd

Softc may be NULL in g_journal_orphan(), so don't be surprised.


163912 02-Nov-2006 pjd

Fix ia64 build breakage.


163906 02-Nov-2006 pjd

- Use g_duplicate_bio() instead of g_clone_bio(), so the memory is
allocated with M_WAITOK flag.
- Check 'buf' instead of 'error' so Prevent is not confused.

CID: 1562, 1563
Found by: Coverity Prevent analysis tool


163905 02-Nov-2006 pjd

I want CPU number here.

Noticed by: ru


163894 02-Nov-2006 pjd

Grr, fix one more build breakage.


163888 01-Nov-2006 pjd

Now, that we have gjournal in the tree add possibility to configure
gmirror and graid3 in a way that it is not resynchronized after a
power failure or system crash.
It is safe when gjournal is running on top of gmirror/graid3.


163886 01-Nov-2006 pjd

Change spaces to tabs where needed.


163877 01-Nov-2006 pjd

Skip disabled CPU, because after we sched_bind() to a disabled CPU,
we won't be able to exit from the thread.

Function g_eli_cpu_is_disabled() stolen from kern_pmc.c.

PR: 104669
Reported by: Nikolay Mirin <nik@optim.com.ru>
MFC after: 1 week


163875 01-Nov-2006 pjd

Forgot to remove this line.

Reported by: maxim


163869 01-Nov-2006 pjd

Add BIO_FLUSH support to GSHSEC class.


163868 01-Nov-2006 pjd

Add BIO_FLUSH support to GPT class.


163865 01-Nov-2006 pjd

Update the code to the current sync(2) version:
- Do not modify mnt_flag without mount interlock held.
- Do not touch MNT_ASYNC flag, as this can lead to a race with nmount(2).

Pointed out by: tegge
Reviewed by: tegge


163853 01-Nov-2006 pjd

Remove debugging code I accidentally committed.


163837 31-Oct-2006 pjd

Add gjournal GEOM class (kernel side), which implements block level
journaling and can be taught about marking file system as clean before
doing journal switch, which easily allows adding journaling to file
systems that don't have this feature.

Sponsored by: home.pl


163836 31-Oct-2006 pjd

Implement BIO_FLUSH handling by simply passing it down to the components.

Sponsored by: home.pl


163833 31-Oct-2006 pjd

Add a new disk flag - DISKFLAG_CANFLUSHCACHE, which indicates that the disk
can handle BIO_FLUSH requests.

Sponsored by: home.pl


163832 31-Oct-2006 pjd

Add a new I/O request - BIO_FLUSH, which basically tells providers below to
flush their caches. For now will mostly be used by disks to flush their
write cache.

Sponsored by: home.pl


163206 10-Oct-2006 pjd

Guard against invalid metadata.

MFC after: 1 week


163048 06-Oct-2006 ru

A GEOM cache can speed up read performance by sending fixed size
read requests to its consumer. It has been developed to address
the problem of a horrible read performance of a 64k blocksize FS
residing on a RAID3 array with 8 data components, where a single
disk component would only get 8k read requests, thus effectively
killing disk performance under high load. Documentation will be
provided later. I'd like to thank Vsevolod Lobko for his bright
ideas, and Pawel Jakub Dawidek for helping me fix the nasty bug.


162835 30-Sep-2006 pjd

One more white space fix.


162834 30-Sep-2006 pjd

Remove trailing spaces.


162832 30-Sep-2006 pjd

Remove trailing spaces.


162357 16-Sep-2006 pjd

Fix detecting of UFS1 label when mediasize%fragsize != 0.

Submitted by: Stanislav Sedov
PR: kern/84637
MFC after: 1 week


162353 16-Sep-2006 pjd

Add 'configure' subcommand which for now only allows setting and removing
of the BOOT flag. It can be performed on both attached and detached
providers.

Requested by: Matthias Lederhofer <matled@gmx.net>
MFC after: 1 week


162352 16-Sep-2006 pjd

Add __printflike() to gctl_error().

Approved by: phk
MFC after: 1 week


162350 16-Sep-2006 pjd

Small fixes after adding __printflike() to gctl_error().

Approved by: phk
MFC after: 3 days


162345 16-Sep-2006 pjd

Remove extra arguments.

MFC after: 3 days


162326 15-Sep-2006 pjd

Add 'show geom [addr]' ddb(4) command, which prints entire GEOM topology if
no additional argument is given or details about the given GEOM object
(class, geom, provider or consumer).

Approved by: phk


162282 13-Sep-2006 pjd

Fix synchronization in gmirror and graid3 which I broke. Synchronization
request can still have bio_to set to sc_provider (this is READ part of a
synchronization request) and in this case g_{mirror,raid3}_sync() wasn't
called as it should be.

MFC after: 1 week


162200 10-Sep-2006 pjd

Delay an orphan event if provider has still in-flight I/O requests.
This way GEOM classes can safely detach from provider when an orphan
event is received. This fixes 'detach with active requests' panic for
gstripe/gconcat under load.

PR: kern/102766
Submitted by: mjacob
OK'ed by: phk
MFC after: 1 week


162188 09-Sep-2006 jmg

move created/detected/activated under debug level 1 to quiet the common case..

add count of active and total components to the launched line so you can
see at a glance if your mirror/raid3 is complete...

now:
GEOM_MIRROR: Device mirror/sam launched (2/2).

Reviewed by: pjd


162153 08-Sep-2006 pjd

Fix format character.

Reported by: andre


162149 08-Sep-2006 pjd

Bump copyright year.


162148 08-Sep-2006 pjd

Use __FBSDID in .c files.


162142 08-Sep-2006 pjd

- Split failure probability configuration into read failure probability and
write failure probability.
- Allow specifying an error number to return on failure.

MFC after: 3 days
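
A minimal sketch of split read/write failure injection as described above,
written in Python rather than kernel C. The names should_fail, rfailprob,
wfailprob and error are hypothetical, and the probabilities are assumed to
be percentages; the real class may represent them differently.

```python
import errno
import random


def should_fail(op, rfailprob, wfailprob, error=errno.EIO, rng=random):
    """Return an errno to inject for this request, or 0 to let it through.

    op is "read" or "write"; the probabilities are percentages (0-100).
    All names here are illustrative, not taken from the kernel sources.
    """
    prob = rfailprob if op == "read" else wfailprob
    if prob > 0 and rng.randrange(100) < prob:
        return error
    return 0
```

Keeping the two probabilities separate lets a test exercise, say, write
errors only, while the configurable error number lets it check how upper
layers react to something other than EIO.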


162056 05-Sep-2006 pjd

Fix problems with destroy and forcible destroy functionality:
- hold/release device in start/done routines, this will probably slow
down things a bit, but previous code was racy;
- only release device if g_gate_destroy() failed - if it succeeded device
is dead and there is nothing to release;
- various other changes which makes forcible destruction reliable.

MFC after: 3 days


161425 17-Aug-2006 imp

while (0); -> while (0) in multi-line macros


161246 12-Aug-2006 pjd

Handle MSDOS file systems properly. Before the change file systems
created on Windows XP (and others maybe) were not detected.
We detected only those created with newfs_msdos(8).

Submitted by: Tobias Reifenberger <treif@mayn.de>
style(9)ified by: pjd


161245 12-Aug-2006 pjd

Verify if a label doesn't point to the parent directory.


161220 11-Aug-2006 pjd

Before using byte offset for IV creation, convert it to little endian.
This way one will be able to use provider encrypted on eg. i386 on
eg. sparc64. This doesn't really buy us much today, because UFS isn't
endian agnostic.

We retain backward compatibility by setting G_ELI_FLAG_NATIVE_BYTE_ORDER
flag on devices with version number less than 2 and not converting the
offset.
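
The byte-order step can be sketched as follows, in Python for brevity. This
shows only the serialization of the offset, not how geli derives the IV from
it; the function name and the native_byte_order parameter are hypothetical
stand-ins for the compatibility flag mentioned above.

```python
import struct


def iv_offset_bytes(offset, native_byte_order=False):
    """Serialize a 64-bit byte offset for use as IV input.

    With native_byte_order=True the host's byte order is used, which models
    the old behaviour retained for providers created before the change.
    """
    fmt = "=Q" if native_byte_order else "<Q"
    return struct.pack(fmt, offset)
```

With the little-endian form, a big-endian and a little-endian host produce
the same eight bytes for the same offset, which is what makes the provider
portable between them.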


161217 11-Aug-2006 pjd

Forgot to bump version number after G_ELI_FLAG_READONLY flag addition.


161136 09-Aug-2006 marcel

Strengthen the check for a PMBR:
o PMBR partitions count to the number of partitions on the disk, which
means that if a PMBR entry is invalid we will not treat the MBR as a
PMBR by virtue of it not describing any partitions.
Previously the checks were inconsistent in that an invalid PMBR entry
would be harmless when no other partitions exist (we would treat the
MBR as a PMBR by virtue of it being empty), but it would be fatal when
there is at least one other partition.
o The partition size of a PMBR partition is one less than the media size
because the GPT starts at the second sector (LBA 1) and extends to
the end of the media. For backward bug-compatibility we accept a size
that's exactly the media size (FreeBSD bug).
Also, when the partition size can not be represented in a 32-bit
integral, the partition size in the MBR is to be set to 0xFFFFFFFF.
Accept this as a valid size, even if the size can be represented.
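
The size rules above can be sketched as a small predicate. This is a Python
illustration under the assumption that both values are counted in sectors;
the function and constant names are hypothetical, and the real check also
covers conditions not modelled here.

```python
MBR_SIZE_MAX = 0xFFFFFFFF  # largest value a 32-bit MBR size field can hold


def pmbr_size_ok(part_sectors, media_sectors):
    """Check the size field of a type 0xEE entry against the media size."""
    if part_sectors == MBR_SIZE_MAX:
        return True   # saturated size, accepted even if it would have fit
    if part_sectors == media_sectors - 1:
        return True   # correct: the GPT spans LBA 1 to the end of the media
    if part_sectors == media_sectors:
        return True   # bug-compatible with the older FreeBSD behaviour
    return False
```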


161127 09-Aug-2006 pjd

Allow geli to operate on read-only providers.

Initial patch from: vd
MFC after: 2 weeks


161116 09-Aug-2006 pjd

Not only a request from us can be passed to g_{mirror,raid3}_worker()
function, but also a request to us, in which case checking bio_cflags
is wrong, because the class above us is controlling it, not us.

MFC after: 1 week


161107 08-Aug-2006 marcel

Fix a phase-ordering bug: check the mediasize and sectorsize after
we obtained access. It is possible that GPT gets to taste a disk
first, which means the disk has not been opened before and it will
not get opened until after we checked the mediasize and sectorsize.
However, since the mediasize and sectorsize are determined at open
and that happens when access is obtained, checking the mediasize
and sectorsize before obtaining access may result in GPT rejecting
the disk.


160964 04-Aug-2006 yar

Commit the results of the typo hunt by Darren Pilgrim.
This change affects documentation and comments only,
no real code involved.

PR: misc/101245
Submitted by: Darren Pilgrim <darren pilgrim bitfreak org>
Tested by: md5(1)
MFC after: 1 week


160895 01-Aug-2006 pjd

Don't use f-word in comments. We are gentlemen.

Pointed out by: Maciej Sobczak


160741 27-Jul-2006 yar

Fix what looks like a typo: MODULE_DEPEND() takes module names,
not KLD file names; and GELI module's name is g_eli, not geom_eli.

Approved by: pjd (silence)
MFC after: 5 days


160569 22-Jul-2006 pjd

Don't forget to initialize crp_olen field, which is used to calculate
bio_completed value.


160330 13-Jul-2006 pjd

Always allow to specify components with /dev/ prefix.

MFC after: 3 days


160301 12-Jul-2006 pjd

Only check if we're freeing a valid object if we hold the topology lock.
This prevents panic under heavy load with DIAGNOSTIC compiled in.


160248 10-Jul-2006 pjd

Use proper defines instead of magic values.

MFC after: 1 week


160203 09-Jul-2006 pjd

When kern.geom.raid3.use_malloc tunable is set to 1, malloc(9) instead of
uma(9) will be used for memory allocation.
In case of problems or tracking bugs, there are more useful tools for malloc(9)
debugging than for uma(9) debugging, like memguard(9) and redzone(9).

MFC after: 1 week


160155 07-Jul-2006 pjd

Remove bogus assertion.

Reported by: Bradley W. Dutton <brad-fbsd-stable@duttonbros.com>
MFC after: 3 days


160081 03-Jul-2006 pjd

Allow to close access even if device is already destroyed.

Reported by: Ulrich Spoerlein <uspoerlein@gmail.com>
PR: kern/98093
MFC after: 1 week


159936 26-Jun-2006 sobomax

Improve check for protective MBR. Instead of assuming that protective
MBR should have only one entry of type 0xEE, consider protective MBR
to be one, that has at least one entry of type 0xEE covering the whole
unit. This makes GEOM_GPT compatible with disks partitioned by the
Apple's BootCamp.

Approved in principle by: marcel
MFC After: 1 month


159756 18-Jun-2006 simon

In g_dev_strategy(), when failing an IO request with EINVAL due to
offset or request size which is not a multiple of the sector size, make
sure that the bio is set to indicate that no data has actually been
transferred.

The result of this is that the file offset is no longer incremented for
these requests. The fact that the file offset was incremented broke
fdisk(8)'s probing of sector size for non-512 byte sector sizes.

Reviewed by: phk, cperciva
Submitted by: mdodd
MFC after: 2 weeks
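
A rough model of the behaviour described above, in Python. The function
name and the (error, completed) return shape are assumptions made for
illustration; in the kernel the same information lives in fields of the bio.

```python
import errno


def check_request(offset, length, sectorsize):
    """Return (error, completed_bytes) for a request before it is issued.

    The success branch assumes the transfer then completes in full.
    """
    if offset % sectorsize != 0 or length % sectorsize != 0:
        # Fail the request and report that nothing was transferred, so the
        # caller does not advance the file offset.
        return errno.EINVAL, 0
    return 0, length
```

A tool probing the sector size by trying successively larger reads from the
same position relies on failed attempts leaving the file offset untouched.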


159361 06-Jun-2006 pjd

Allow to use the old -a option to specify an encryption algorithm to use
(for backward compatibility), but print a warning to inform about the
change.


159343 06-Jun-2006 pjd

- Unbreak the build when geli is compiled into the kernel (not as a module),
by silencing unfounded compiler warning.

Reported by:


159307 05-Jun-2006 pjd

Implement data integrity verification (data authentication) for geli(8).

Supported by: Wheel Sp. z o.o. (http://www.wheel.pl)


159306 05-Jun-2006 pjd

Make kern.geom.eli.overwrites sysctl a tunable as well.


159304 05-Jun-2006 pjd

Add g_duplicate_bio() function which does the same thing as g_clone_bio(),
except that g_duplicate_bio() allocates the new bio with the M_WAITOK flag.


159238 04-Jun-2006 marcel

Fix unaligned memory accesses on Alpha and possibly other platforms.
By using a pointer to struct dos_partition, we implicitly tell the
compiler that the pointer is 4-bytes aligned, even though we know
that's not the case. The fact that we only dereference the pointer
to access a byte-wide field (field dp_ptyp) is not a guarantee that
the compiler will in fact use a byte-wide load. On some platforms
it's more efficient to use long word or quad word loads and use
bit-shifting and bit-masking to get the intended byte. On those
platforms a misaligned load will be the result.
The fix is to use byte-wide pointer arithmetic based on sizeof() and
offsetof() to avoid invalid casts which avoids that the compiler
makes invalid assumptions.

Backtrace provided by: wilko@
MFC after: 1 week
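
The offset arithmetic can be illustrated in Python, which has no alignment
concerns of its own and so only shows the indexing, not the compiler issue.
The layout constants follow the standard MBR format (partition table at
byte 446, four 16-byte entries, type byte at offset 4 within an entry); the
names are chosen for this sketch.

```python
DOSPARTOFF = 446   # offset of the partition table in the MBR sector
DOSPARTSIZE = 16   # size of one partition entry
NDOSPART = 4       # number of entries in the table
DP_TYP_OFF = 4     # offset of the type byte within an entry


def partition_types(sector):
    """Return the type byte of each of the four MBR partition entries."""
    return [sector[DOSPARTOFF + i * DOSPARTSIZE + DP_TYP_OFF]
            for i in range(NDOSPART)]
```

Each access touches exactly one byte at a computed offset, which is the
effect the C fix achieves by avoiding a cast to a structure pointer.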


158875 24-May-2006 ceri

Remove the trailing half of a sentence which was clearly superseded
by the preceding one some time during editing.


158290 04-May-2006 pjd

Use G_RAID3_FOREACH_SAFE_BIO() macro instead of G_RAID3_FOREACH_BIO() in
two places where g_io_request() is called. g_io_request() can free bio
structure so we can't reference it after and G_RAID3_FOREACH_BIO() macro
was doing this.

Found by: Coverity Prevent analysis tool (with my new models)
MFC after: 1 day
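
The difference between the two iteration styles can be modelled with a toy
linked list in Python. Bio, issue and foreach_safe are hypothetical names;
clearing the next pointer stands in for the structure being freed.

```python
class Bio:
    """Toy stand-in for a queued request with a singly linked 'next'."""

    def __init__(self, name):
        self.name = name
        self.next = None


def issue(bio, log):
    """Pretend to hand the request off; afterwards it must not be touched."""
    log.append(bio.name)
    bio.next = None  # models the structure being freed or reused


def foreach_safe(head, fn, log):
    """Visit every element, reading 'next' before fn can invalidate it."""
    bio = head
    while bio is not None:
        nxt = bio.next  # saved first, as a FOREACH_SAFE style macro does
        fn(bio, log)
        bio = nxt
```

A loop that read bio.next only after calling fn would stop after the first
element in this model; in C it would dereference freed memory.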


158195 30-Apr-2006 pjd

We shouldn't lock the topology here - we will panic on assertion inside
g_raid3_bump_syncid().

Reported by: Bradley W. Dutton <brad-fbsd-stable@duttonbros.com>
MFC after: 1 day


158117 28-Apr-2006 pjd

- Don't hold the device sx lock when going to sleep.
- Prevent possible live-lock in case of memory problems by freeing
already completed requests first.

Reported and tested by: markus, Bradley W. Dutton <brad-fbsd-stable@duttonbros.com>
MFC after: 1 day


158116 28-Apr-2006 pjd

- Remove dead code.
- Comment possible event miss, which isn't critical, but probably can be
fixed by replacing the event lock usage with the queue lock.

MFC after: 2 weeks


158114 28-Apr-2006 pjd

Be sure to not destroy device twice. This is not possible in theory, but
with this change there is even no theoretical race.

MFC after: 2 weeks


158112 28-Apr-2006 pjd

Be sure to not destroy device twice. This is not possible in theory, but
with this change there is even no theoretical race.

MFC after: 2 weeks


157900 20-Apr-2006 pjd

geli(8) provides keys on newsession time, so remove CRD_F_KEY_EXPLICIT flag
as HW crypto drivers don't support it.


157838 18-Apr-2006 pjd

Fix storing offset of already synchronized data. Offset in entire array was
stored in metadata instead of an offset in single disk.
After reboot/crash synchronization process started from a wrong offset
skipping (not synchronizing) part of the component which can lead to data
corruption (when synchronization process was interrupted on initial
synchronization) or other strange situations like 'graid3 status' showing
value more than 100%.

Reported, reviewed and tested by: ru
Reported by: Dmitry Morozovsky <marck@rinet.ru>
MFC after: 1 day


157783 15-Apr-2006 pjd

Correct debug: we are sending child bio here, not parent bio.

MFC after: 1 week


157740 13-Apr-2006 cracauer

Make CCD be able to read and write Linux software raids.

Supported for raid-0 with <n> disks, raid-1 with 2 disks.

Manpages have examples, warnings etc.

Test scripts on
http://www.cons.org/cracauer/ccdconfig-linux/
Reviewed by: alfred


157686 12-Apr-2006 pjd

Pass BIO_GETATTR requests down.

MFC after: 1 week


157630 10-Apr-2006 pjd

Introduce and use delayed-destruction functionality from a pre-sync hook,
which means that devices will be destroyed on last close.

This fixes destruction order problems when, eg. RAID3 array is built on
top of RAID1 arrays.

Requested, reviewed and tested by: ru
MFC after: 2 weeks


157620 10-Apr-2006 marcel

MFp4:
o Implement the remove verb to remove a partition entry.
o Improve error reporting by first checking that the verb is valid.
o Add an entry parameter to the add verb. This parameter can be
both read-only as well as read-write and specifies the entry
number of the newly added partition.
o Make sure that the provider is alive when passed to us. It may
be withering away.
o When adding a new partition entry, test for overlaps with existing
partitions.
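
The overlap test mentioned in the last item can be sketched as an interval
check. This Python version assumes each partition is an inclusive
(first LBA, last LBA) pair; the representation and the function name are
illustrative only.

```python
def overlaps(start, end, entries):
    """Return True if [start, end] intersects any (first, last) LBA range."""
    return any(start <= last and first <= end for first, last in entries)
```

Two inclusive ranges intersect exactly when each one starts no later than
the other ends, so a single comparison pair per existing entry is enough.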


157619 10-Apr-2006 marcel

Add g_wither_provider() to abstract the details of destroying a
particular provider. Use this function where g_orphan_provider()
is being called so that the flags are updated correctly and
g_orphan_provider() is called only when allowed.


157581 07-Apr-2006 marcel

Change gctl_set_param() to return an error instead of setting an
error on the request. Add a wrapper, gctl_set_param_err(), that
sets the error on the request from the error returned by
gctl_set_param() and update current callers of gctl_set_param()
to call gctl_set_param_err() instead.
This makes gctl_set_param() much more usable in situations where
the caller knows better what to do with certain (apparent) error
conditions and setting an error on the request is not one of the
things that need to be done.


157548 05-Apr-2006 pjd

Typos.


157305 30-Mar-2006 pjd

Revert previous change, as I fixed MD5(9).


157293 30-Mar-2006 pjd

md_hash field in g_eli_metadata structure is not 4 byte aligned, which
causes a panic on sparc64.

The problem is in MD5(9) implementation. The Encode() function takes
'unsigned char *output' as its first argument, which is then assigned to
'u_int32_t *op'. If the 'output' argument is not 4 byte aligned (and in
geli(8) case it is not), sparc64 machine will panic.

I don't know how to fix MD5(9) in a clean way, so I'm implementing a
work-around in geli(8).

Reported by: brueffer
MFC after: 3 days


157292 30-Mar-2006 le

Protect from creating striped and RAID5 plexes with unequally sized
subdisks.


157290 30-Mar-2006 pjd

- 'ndisks' variable is not boolean, so compare it with a value.
- Keep conditions order consistent with the comment above.

MFC after: 3 days


157222 28-Mar-2006 pjd

Preserve previous behaviour of kern.geom.raid3.n{64,16,4}k tunables where 0
means unlimited.

Reported by: ru
MFC after: 3 days


157134 25-Mar-2006 pjd

Increase debug level for "Thread exiting." message. It's not that important
and is 0 by accident.

MFC after: 3 days


157053 23-Mar-2006 le

Fix whitespace.


157052 23-Mar-2006 le

Implement the 'resetconfig' command.

PR: kern/94835
Submitted by: Ulf Lilleengen <lulf@stud.ntnu.no>


156878 19-Mar-2006 pjd

Update copyright for 2006.


156876 19-Mar-2006 pjd

kern.geom.raid3.sync_requests=2 seems to be a better default - it still
keeps disks very busy, but makes system much more responsive.

While here, kill extra space.


156873 19-Mar-2006 pjd

kern.geom.mirror.sync_requests=2 seems to be a better default - it still
keeps disks very busy, but makes system much more responsive.

While here, kill extra space.


156686 13-Mar-2006 ru

Fix a typo.


156684 13-Mar-2006 ru

Fix build on 64-bit platforms.


156612 13-Mar-2006 pjd

- Reimplement I/O data allocation to prevent deadlocks.

Submitted by: green

- Speed up synchronization process by using configurable number of I/O
requests in parallel.
+ Add kern.geom.raid3.sync_requests tunable which defines how many parallel
I/O requests should be used.
+ Retire kern.geom.raid3.reqs_per_sync and kern.geom.raid3.syncs_per_sec
sysctls.
- Fix race between regular and synchronization requests.
- Reimplement raid3's data synchronization - do not use the topology lock
for this purpose, as it may cause deadlocks.
- Stop synchronization from pre-sync hook.
- Fix some other minor issues.

Tested by: Mike Tancsa <mike@sentex.net>
MFC after: 3 days


156610 13-Mar-2006 pjd

- Speed up synchronization process by using configurable number of I/O
requests in parallel.
+ Add kern.geom.mirror.sync_requests tunable which defines how many parallel
I/O requests should be used.
+ Retire kern.geom.mirror.reqs_per_sync and kern.geom.mirror.syncs_per_sec
sysctls.
- Fix race between regular and synchronization requests.
- Reimplement mirror's data synchronization - do not use the topology lock
for this purpose, as it may cause deadlocks.
- Stop synchronization from pre-sync hook.
- Fix some other minor issues.

MFC after: 3 days


156527 10-Mar-2006 pjd

When inserting a new component md_provsize metadata field wasn't set, which
means that old problem was triggered (when two providers end at the same
offset, eg. ad0 and ad0s1 and the wrong one is picked up by gmirror/graid3).

Reported by: Michal Suszko <dry@dry.pl>
MFC after: 3 days


156421 08-Mar-2006 pjd

Allow to dump kernel to gmirror providers.
Some conditions have to be met to make it work properly. This will be
described in the manual page.

MFC after: 3 days


156299 04-Mar-2006 pjd

We need to check if file system size is equal to provider's size, because
sysinstall(8) still bogusly puts first partition at offset 0 instead of 16,
so glabel/ufs will find file system on slice instead of partition.

Before sysinstall is fixed, we must keep this code, which means that we
won't be able to detect UFS file systems created with 'newfs -s ...'.

PS. bsdlabel(8) creates partitions properly.

MFC after: 3 days


156201 02-Mar-2006 jeff

- Lock Giant if needed around the call to vnode_create_vobject(). This is
only important if devfs is not mpsafe.

Sponsored by: Isilon Systems, Inc.
Found by: kris


156170 01-Mar-2006 pjd

Assert proper use of bio_caller1, bio_caller2, bio_cflags, bio_driver1,
bio_driver2 and bio_pflags fields.

Reviewed by: phk


155906 22-Feb-2006 pjd

Do not use bio structure after g_io_deliver(), it may no longer be valid.

Found and fixed by: Vsevolod Lobko <seva@ip.net.ua>
MFC after: 3 days


155803 18-Feb-2006 pjd

Inform when label disappears.

MFC after: 3 days


155802 18-Feb-2006 pjd

Allow to use g_slice_orphan() from outside.

MFC after: 3 days


155801 18-Feb-2006 pjd

- Do not depend on fact that file system covers entire provider.
It won't work for file systems created with -s option.
Use better file system verification.
- Add myself to the copyright.

MFC after: 3 days


155798 18-Feb-2006 pjd

This function returns nothing.


155797 18-Feb-2006 pjd

If provider's sector size prevents reading SBLOCKSIZE bytes return
immediately.


155582 12-Feb-2006 pjd

On component state change to ACTIVE don't forget to update metadata.

MFC after: 3 days


155581 12-Feb-2006 pjd

Use time_uptime instead of time_second, as the latter may go backwards.

Suggested by: ru
MFC after: 3 days
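
The reasoning carries over to userland: elapsed-time checks should use a
clock that cannot be stepped backwards. A small Python sketch follows; the
idle check is a hypothetical use chosen for illustration, with
time.monotonic() standing in for an uptime-based clock.

```python
import time


def is_idle(last_write, idletime, now=None):
    """True if no write was seen for at least idletime seconds.

    Timestamps come from a monotonic clock, which never goes backwards,
    unlike wall-clock time that may be stepped by an operator or NTP.
    """
    if now is None:
        now = time.monotonic()
    return now - last_write >= idletime
```

With a wall-clock source, a backwards step would make the difference
negative and could delay the idle decision by the size of the step.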


155560 12-Feb-2006 pjd

Allow to set kern.geom.raid3.disconnect_on_failure from loader.conf.

MFC after: 3 days


155546 11-Feb-2006 pjd

- Add kern.geom.raid3.disconnect_on_failure sysctl/tunable (default to 1
to preserve current behaviour). When set to 0, components are not
disconnected - graid3 will try to still use them (only first error will
be logged). This is helpful when we have two broken components, but in
different places, so actually all data is available.
Such buggy component will be visible in 'graid3 list' output with flag
BROKEN.
- Never disconnect the last valid component. If we detect errors there we
will just pass them up. This wasn't reasonable to deny access to the
whole provider because of one broken sector.

Prodded by: ru
MFC after: 3 days


155545 11-Feb-2006 pjd

- Add kern.geom.mirror.disconnect_on_failure sysctl/tunable (default to 1
to preserve current behaviour). When set to 0, components are not
disconnected - gmirror will try to still use them (only first error will
be logged). This is helpful when we have two broken components, but in
different places, so actually all data is available.
Such buggy component will be visible in 'gmirror list' output with flag
BROKEN.
- Never disconnect the last valid component. If we detect errors there we
will just pass them up. This wasn't reasonable to deny access to the
whole provider because of one broken sector.

Prodded by: ru
MFC after: 3 days


155544 11-Feb-2006 pjd

Correct typo. 'fbp' is NULL here so this will result in a panic.

MFC after: 3 days


155540 11-Feb-2006 pjd

Mark array as CLEAN when there are no write requests in
kern.geom.raid3.idletime seconds. Write, not any requests.
Mark array as clean immediately on last write close.

Prodded by: ru
MFC after: 3 days


155539 11-Feb-2006 pjd

Mark array as CLEAN when there are no write requests in
kern.geom.mirror.idletime seconds. Write, not any requests.
Mark array as clean immediately on last write close.

Prodded by: ru
MFC after: 3 days


155537 11-Feb-2006 pjd

Teach geli how to load keyfiles before root file system is mounted.
Example entries for loader.conf that make it possible:

geli_da0_keyfile0_load="YES"
geli_da0_keyfile0_type="da0:geli_keyfile0"
geli_da0_keyfile0_name="/boot/keys/da0.key0"
geli_da0_keyfile1_load="YES"
geli_da0_keyfile1_type="da0:geli_keyfile1"
geli_da0_keyfile1_name="/boot/keys/da0.key1"
geli_da0_keyfile2_load="YES"
geli_da0_keyfile2_type="da0:geli_keyfile2"
geli_da0_keyfile2_name="/boot/keys/da0.key2"

geli_da1s3a_keyfile0_load="YES"
geli_da1s3a_keyfile0_type="da1s3a:geli_keyfile0"
geli_da1s3a_keyfile0_name="/boot/keys/da1s3a.key"

Thanks to jhb and kan who showed me the right direction.

MFC after: 3 days


155535 11-Feb-2006 pjd

Check rootvnode variable to see if we still want to ask for passphrase on
boot. Other methods just don't work properly.

MFC after: 3 days


155462 08-Feb-2006 le

Catch the case when a subdisk has no provider or no consumer
attached to it.


155432 07-Feb-2006 brueffer

Clean up some sysctl descriptions, debug messages etc.

Approved by: pjd
MFC after: 3 days


155174 01-Feb-2006 pjd

Remove trailing spaces.


155071 30-Jan-2006 pjd

Allow to specify only one disk. This is helpful when we want to extend
our concatenated device later.

MFC after: 1 week


155070 30-Jan-2006 pjd

Fix typo which caused the 64kB elements limit not to be set properly and
the 16kB elements limit not to be set at all.

Submitted by: Vsevolod Lobko <seva@ip.net.ua>
MFC after: 3 days


154686 22-Jan-2006 fjoe

Rename geom_uzip class to g_uzip in order to be consistent with the naming
of other GEOM modules.

PR: 89998


154540 18-Jan-2006 pjd

Fix bio leak in case of malloc(9) failure.

Found by: Coverity Prevent(tm)
Coverity ID: CID794
MFC after: 3 days


154539 18-Jan-2006 pjd

Remove dead code.

Found by: Coverity Prevent(tm)
Coverity ID: CID105
MFC after: 3 days


154538 18-Jan-2006 pjd

Remove dead code.

Found by: Coverity Prevent(tm)
Coverity ID: CID104
MFC after: 3 days


154513 18-Jan-2006 pjd

Style cleanups.

X-MFC-after: Already MFCed to RELENG_6 by accident.


154473 17-Jan-2006 pjd

Move $FreeBSD$ from comment to __FBSDID().


154463 17-Jan-2006 pjd

- Use better types.
- Log problems at level 0 when killing providers.

MFC after: 3 days


154462 17-Jan-2006 pjd

Check return value.

Found by: Coverity Prevent(tm)
MFC after: 3 days


154461 17-Jan-2006 pjd

Remove dead code.

Found by: Coverity Prevent(tm)
MFC after: 3 days


154460 17-Jan-2006 pjd

Remove unused value.

Found by: Coverity Prevent(tm)
MFC after: 3 days


154459 17-Jan-2006 pjd

Log situation when EIO is returned.


154458 17-Jan-2006 pjd

Remove bio leak when EIO error is emulated.

Found by: Coverity Prevent(tm)
MFC after: 3 days


154075 06-Jan-2006 le

Get rid of the gv_bioq hack in most parts of the I/O path and
use the standard bioq structures.


153532 19-Dec-2005 pjd

MFp4: Typo fix (without it the XML GEOM tree wasn't consistent).

Reported by: Eric Anderson <anderson@centtech.com>


153265 09-Dec-2005 pjd

Fix build breakage by fixing typo.

Reported by: glebius


153251 08-Dec-2005 pjd

- Allow to specify the byte which will be used for filling read buffer.
- Improve sysctl description a bit.

Submitted by: Ivan Voras <ivoras@gmail.com>


153250 08-Dec-2005 pjd

Teach NOP GEOM class how to gather the following statistics:
- number of read I/O requests,
- number of write I/O requests,
- number of read bytes,
- number of written bytes.
Add 'reset' subcommand for resetting statistics.


152972 30-Nov-2005 sobomax

It is unclear who is wrong and who is right, but when operating on
plain file bsdlabel(8) always writes label at a fixed offset from
its beginning (512 bytes), regardless of the sector size. At the same
time, bsdlabel geom class expects label to be available at the very
beginning of the second sector.

As a result, images prepared in userland for media with sector size
different from 512 bytes (i.e. 2k for cdroms) are not recognized by
the tasting mechanism.

Solve the problem by always looking for the label at 512-byte offset
if we can't find it at the beginning of the second sector and sector
size is not 512 bytes.
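
The lookup order described above can be sketched as follows. This Python
helper only lists the byte offsets that would be probed; the names are
hypothetical and the actual label parsing is not shown.

```python
LABEL_FIXED_OFFSET = 512  # where bsdlabel(8) writes the label in a plain file


def label_offsets(sectorsize):
    """Byte offsets to probe for a label, in the order they are tried."""
    offsets = [sectorsize]  # start of the second sector
    if sectorsize != LABEL_FIXED_OFFSET:
        offsets.append(LABEL_FIXED_OFFSET)
    return offsets
```

For 512-byte sectors both locations coincide, so only one probe is needed;
for 2k media the second probe catches images prepared in userland.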


152971 30-Nov-2005 sobomax

Don't pass error value pointer to g_read_data(9) at all if we don't
have any use of it.

Suggested by: pjd


152967 30-Nov-2005 sobomax

Check for g_read_data(9) errors properly:

o The only indication of error condition is NULL value returned by
the function;

o value pointed to by error argument is undefined in the case when
operation completes successfully.

Discussed with: phk


152966 30-Nov-2005 sobomax

Kill leading whitespace.


152922 29-Nov-2005 pjd

We do nothing with returned error value, so just remove it.


152913 29-Nov-2005 sobomax

Check value returned by g_read_data(9), otherwise we can end in panic(9)
if read error happens.

MFC after: 1 week


152784 25-Nov-2005 le

Add sysctl descriptions.


152773 24-Nov-2005 le

Since we want a vinum geom created anytime the module loads, move
the geom creation to a separate init function and ignore the tasting.

The config is now parsed only in the vinumdrive geom, which hopefully
fixes the problem, that the drive class tasted before the vinum class
had a chance, for good.

Also restore the behaviour that the module can be loaded at boot time
and on a running system.


152634 20-Nov-2005 le

Whitespace.


152633 20-Nov-2005 le

Always declare variables at the start of the function.
Don't allocate potentially large variables on the stack.
Check strsep() return values when the string comes from userland.
Shorten variable names for lucidity's sake.

most of the stuff:
Pointed out by: njl@


152632 20-Nov-2005 le

Fix whitespace issue.

Pointed out by: joel@


152615 19-Nov-2005 le

Finally bring in what was produced during Google SoC 2005:

Add functions to rename objects and to move a subdisk from one drive
to another.

Obtained from: Chris Jones <chris.jones@ualberta.ca>
Sponsored by: Google Summer of Code 2005
MFC in: 1 week


152565 18-Nov-2005 jdp

Fix a bug that caused some /dev entries to continue to exist after
the underlying drive had been hot-unplugged from the system. Here
is a specific example. Filesystem code had opened /dev/da1s1e.
Subsequently, the drive was hot-unplugged. This (correctly) caused
all of the associated /dev/da1* entries to be deleted. When the
filesystem later realized that the drive was gone it closed the
device, reducing the write-access counts to 0 on the geom providers
for da1s1e, da1s1, and da1. This caused geom to re-taste the
providers, resulting in the devices being created again. When the
drive was hot-plugged back in, it resulted in duplicate /dev entries
for da1s1e, da1s1, and da1.

This fix adds a new disk_gone() function which is called by CAM when a
drive goes away. It orphans all of the providers associated with the
drive, setting an error condition of ENXIO in each one. In addition,
we prevent a re-taste on last close for writing if an error condition
has been set in the provider.

Sponsored by: Isilon Systems
Reviewed by: phk
MFC after: 1 week


152401 13-Nov-2005 marcel

o Slightly refactor the ctlreq code to maximize code sharing between
verbs. Only the create verb operates on a provider. All other verbs
operate on a GPT geom. Also, the GPT entry oriented verbs require
a non-downgraded GPT.
o Have all verbs take an optional flags parameter. The flags parameter
is a string of single-letter flags. The typical use of these flags
is to enable certain behaviour in support of the gpt(8) tool.
o Add dummy implementations for the destroy and recover verbs.

This change causes test 2 of the GPT regression test suite to fail.
The presence of a geom parameter is now required even for unknown
verbs.


152342 12-Nov-2005 marcel

Make the kern.geom.conftxt sysctl more usable by also dumping the
MD class. Previously only the DISK class was dumped. The only
consumer of this sysctl is libdisk (i.e. sysinstall) and it tests
explicitly for instances of the DISK class. Dumping other classes
is therefore harmless.
By also dumping the MD class regression tests can be written that
use the MD class for operations that would normally be done on the
DISK class. The sysctl can now be used to test if those operations
took an effect. An example is partitioning.


151897 31-Oct-2005 rwatson

Normalize a significant number of kernel malloc type names:

- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.


151822 28-Oct-2005 pjd

Fix possible live-lock under heavy load where we can't allocate more
memory for request.
I was sure graid3 should handle such situations well, but green@ reported
it is not and we want to fix it before 6.0.

Submitted by: green


151684 26-Oct-2005 takawata

Add checking for File record magic.


151172 09-Oct-2005 marcel

Rough implementation of the create and add verbs. The verbs cause
in-memory changes only and as such are only useful for prototyping
and regression testing purposes.


150759 30-Sep-2005 tegge

Move some devstat collection to below where large IO operations are chopped
up. This makes iostat report operations passed down to the device driver
instead of operations passed down to GEOM disk. The transfer size limit
imposed by the device driver is no longer hidden, improving the correlation
between iostat output and device driver workload.


150735 29-Sep-2005 fjoe

- Fix "end_blk out of range" panic when INVARIANTS.
- Do not allow rw access.

Submitted by: Dario Freni <saturnero at freesbie dot org>
MFC after: 3 days


150304 18-Sep-2005 marcel

o Don't cause a panic when the control request lacks a verb.
o Don't set the error twice when the named class does not exist.
It causes ioctl(2) to return with error EEXIST.


150240 17-Sep-2005 marcel

Complete rewrite in preparation of adding support for control
requests. The following features have been added:
1. Extensive checking and validation of both the primary and
secondary headers to protect against corrupted data and to
take advantage of the redundancy to allow the GPT to be
used in the face of recoverable corruption.
2. Dynamic data-structures to avoid hardcoding gratuitous
table limits so as to support the creation of GPT tables
of (as of yet) unspecified size.
3. Only allow kernel dumps to swap partitions to provide the
necessary anti-footshooting measures. Linux swap partitions
are allowed.
4. Complete dump of the GPT configuration, including labels.
5. Supports Byte Order Mark (U+FEFF) handling for big-endian,
little-endian and mixed-endian partition names.


150177 15-Sep-2005 jhb

- Add a new simple facility for marking the current thread as being in a
state where sleeping on a sleep queue is not allowed. The facility
doesn't support recursion but uses a simple private per-thread flag
(TDP_NOSLEEPING). The sleepq_add() function will panic if the flag is
set and INVARIANTS is enabled.
- Use this new facility to replace the g_xup and g_xdown mutexes that were
(ab)used to achieve similar behavior.
- Disallow sleeping in interrupt threads when invoking interrupt handlers.

MFC after: 1 week
Reviewed by: phk


150143 14-Sep-2005 rodrigc

Fix so that when a slice or a partition is removed through g_slice_config(),
it is destroyed in GEOM, in addition to being removed from /dev.
Before this patch, if you applied a new MBR which deleted a slice,
the deleted slice would not be in /dev, but it would still appear
in kern.geom.conftxt and kern.geom.confxml, which would confuse
the diskPartitionEditor in sysinstall.

Submitted by: pjd
Tested by: pjd, rodrigc
MFC after: 1 week


149931 10-Sep-2005 pjd

Fix copy&paste typo.

MFC after: 3 days


149930 10-Sep-2005 pjd

Don't forget to initialize crp_etype field.

Reported by: Nick Evans <nevans@syphen.net>
MFC after: 3 days


149895 08-Sep-2005 le

Set the G_PF_WITHER flag on the subdisk provider that is about to
be destroyed. That way the GEOM system handles all deallocations
and we don't have to do it ourselves.


149787 04-Sep-2005 phk

Remove a race condition that could result in processes being stuck
waiting for geom events to happen:

Instead of maintaining a count of outstanding events, simply look if
the queue is empty. Make sure to not remove events from the queue
until they are executed in order to not open a new race.

Much work by: pjd
Tested by: kris
MT6: yes, should be.
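
The idea in r149787 (test for an empty queue instead of keeping a separate
counter, and leave an event on the queue until it has run) can be sketched
as below. This is a simplified, single-threaded userland illustration, not
the GEOM event code: all demo_* names are hypothetical, and the locking and
sleep/wakeup that the real code needs are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event: a callback plus its argument. */
struct demo_event {
	void			(*func)(void *);
	void			*arg;
	struct demo_event	*next;
};

/* Singly linked FIFO with a pointer to the last 'next' slot. */
struct demo_queue {
	struct demo_event	*head;
	struct demo_event	**tailp;
};

/* Helper used to observe the queue state from inside a callback. */
struct demo_probe {
	struct demo_queue	*q;
	int			idle_during_run;
};

void
demo_queue_init(struct demo_queue *q)
{
	assert(q != NULL);
	q->head = NULL;
	q->tailp = &q->head;
}

void
demo_post(struct demo_queue *q, struct demo_event *ev)
{
	ev->next = NULL;
	*q->tailp = ev;
	q->tailp = &ev->next;
}

/* "Idle" is defined by the queue itself; there is no separate counter. */
int
demo_is_idle(const struct demo_queue *q)
{
	return (q->head == NULL);
}

/* Run the first event. It is unlinked only after it has executed. */
int
demo_run_one(struct demo_queue *q)
{
	struct demo_event *ev = q->head;

	if (ev == NULL)
		return (0);
	ev->func(ev->arg);	/* ev is still queued while it runs */
	q->head = ev->next;
	if (q->head == NULL)
		q->tailp = &q->head;
	return (1);
}

/* Callback that records whether the queue looked idle while it ran. */
void
demo_probe_cb(void *arg)
{
	struct demo_probe *p = arg;

	p->idle_during_run = demo_is_idle(p->q);
}
```

Because the running event stays on the queue, a waiter that only checks
demo_is_idle() cannot conclude that all work is finished while an event is
still executing.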


149757 03-Sep-2005 phk

Typo.


149576 29-Aug-2005 pjd

Use KTR to log allocations and destructions of bios.
This should hopefully help track down "duplicate free of g_bio" panics.


149555 28-Aug-2005 le

Prevent sync operations from being started when they are already
in progress, and be a bit more user friendly in terms of error
messages returned from the kernel.


149538 28-Aug-2005 pjd

Verify length of the data to read as well.


149501 26-Aug-2005 le

Shuffle around the order in which the components are compiled.

This way, the VINUMDRIVE class is loaded before the VINUM class,
but since geom does the tasting for newly arrived classes
last-in-first-out, the VINUM class tastes first.

This removes the need to call gv_parse_config() in the drive
taste path.


149495 26-Aug-2005 pjd

Verify offset before reading.

MFC after: 2 days


149492 26-Aug-2005 takawata

Add NTFS labeling function.

Reviewed by: pjd


149395 23-Aug-2005 pjd

Verify if we can actually read the data at given offset.

Reported by: Martin <nakal@nurfuerspam.de>


149379 22-Aug-2005 le

Correct the check if a plex is accessible in case it is not up.
This makes degraded RAID5 plexes actually work.


149353 21-Aug-2005 pjd

By default, when doing crypto work in software, start as many threads
as we have active CPUs and bind each thread to its own CPU.

MFC after: 3 days


149352 21-Aug-2005 pjd

Remove stale comment (we now always start worker thread).

MFC after: 3 days


149339 20-Aug-2005 pjd

Back-out the change from revision 1.14 and allow for '/' in labels again.

Convinced by: green, Gavin Atkinson, dougb, gordon
MFC after: 1 day


149323 20-Aug-2005 pjd

Add a __packed keyword to g_eli_metadata struct definition, so
sizeof(struct g_eli_metadata) will return the exact number of bytes needed
for storing it on the disk.
Without this change GELI was unusable on amd64 (and probably other 64-bit
archs), because sizeof(struct g_eli_metadata) was greater than 512 bytes
and geli(8) was failing on assertion.

Reported by: Michael Reifenberger <mike@Reifenberger.com>
MFC after: 3 days
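
The sizeof() effect described in r149323 can be reproduced with a small
userland sketch. The structs below are hypothetical and much smaller than
the real g_eli_metadata; the sketch assumes a GCC- or Clang-compatible
compiler, where the packed attribute removes the padding the ABI would
otherwise insert before the 64-bit member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk record; NOT the real g_eli_metadata layout. */
struct demo_md_natural {
	uint8_t		md_flag;	/* 1 byte, normally followed by padding */
	uint64_t	md_size;	/* aligned according to the ABI */
};

/* Same members, but laid out with no padding at all. */
struct demo_md_packed {
	uint8_t		md_flag;
	uint64_t	md_size;
} __attribute__((packed));

size_t
demo_natural_size(void)
{
	return (sizeof(struct demo_md_natural));
}

size_t
demo_packed_size(void)
{
	return (sizeof(struct demo_md_packed));
}

/* The packed size is the exact sum of the member sizes. */
void
demo_check_sizes(void)
{
	assert(demo_packed_size() == sizeof(uint8_t) + sizeof(uint64_t));
	assert(demo_natural_size() >= demo_packed_size());
}
```

The padded size differs between ABIs, which is why code that assumes an
exact on-disk size can work on one architecture and fail on another.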


149304 19-Aug-2005 pjd

Allow changing the number of iterations for PKCS#5v2. It can only be used
when there is only one key set.

MFC after: 3 days


149303 19-Aug-2005 pjd

- Add a missing period.
- Fix number of spaces.

MFC after: 3 days


149300 19-Aug-2005 pjd

Avoid code duplication and implement bitcount32() function in systm.h only.

Reviewed by: cperciva
MFC after: 3 days
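
For readers unfamiliar with what a bitcount32() helper computes, here is a
minimal sketch of a 32-bit population count using the common parallel
summing technique. The name demo_bitcount32 is hypothetical and this is not
claimed to be the exact code that r149300 placed in systm.h.

```c
#include <assert.h>
#include <stdint.h>

/* Count the bits set in x by summing adjacent bit fields in parallel. */
uint32_t
demo_bitcount32(uint32_t x)
{
	/* Sums of adjacent 1-bit fields, stored in 2-bit fields. */
	x = (x & 0x55555555u) + ((x >> 1) & 0x55555555u);
	/* Sums of adjacent 2-bit fields, stored in 4-bit fields. */
	x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
	/* Sums of adjacent 4-bit fields, stored in bytes (each at most 8). */
	x = (x + (x >> 4)) & 0x0f0f0f0fu;
	/* Fold the four byte counts into the low byte (at most 32). */
	x = x + (x >> 8);
	x = x + (x >> 16);
	return (x & 0x3fu);
}

/* A few sanity checks on known values. */
void
demo_bitcount32_selftest(void)
{
	assert(demo_bitcount32(0x00000000u) == 0);
	assert(demo_bitcount32(0x00000001u) == 1);
	assert(demo_bitcount32(0xffffffffu) == 32);
}
```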


149193 17-Aug-2005 pjd

Always run dedicated kernel thread (even when we have hardware support).
There is no performance impact, but it allows allocating memory with
the M_WAITOK flag.
As a side effect, this simplifies the code a bit.

MFC after: 3 days


149192 17-Aug-2005 pjd

We should now return 0.


149187 17-Aug-2005 pjd

Even if crypto_dispatch() returns an error, the request is not canceled and
our callback will still be called, just to tell us that the request
failed...

Reported by: Mike Tancsa <mike@sentex.net>
MFC after: 3 days


149185 17-Aug-2005 pjd

We don't need to clear allocated memory. This will speed-up things a bit.

MFC after: 3 days


149150 16-Aug-2005 phk

remove stale comments


149140 16-Aug-2005 le

Make it possible to remove stale, left-over subdisks.


149094 15-Aug-2005 le

Fix a stupid logic bug introduced in geom_vinum_drive.c rev 1.18:

When a drive is newly created, its state is initially set to 'down',
so it won't allow saving the config to it (thus it will never know of
itself being created). Work around this by adding a new flag, that's
also checked when saving the config to a drive.


149030 13-Aug-2005 pjd

Because code paths for I/O requests are quite complex, add comments above
the functions which participate in I/O paths.

MFC after: 1 day


148979 12-Aug-2005 pjd

Provide more complete "How to add a new file system to glabel." list.

MFC after: 1 week


148978 12-Aug-2005 pjd

Add code for Ext2FS and ReiserFS labels recognition.

Submitted by: Stanislav Sedov <stas@310.ru>
PR: kern/84638
MFC after: 1 week


148977 12-Aug-2005 pjd

Avoid creating directories in devfs by changing all '/' in labels to '_'.

Idea from: Stanislav Sedov <stas@310.ru>
MFC after: 3 days
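
The substitution described in r148977 amounts to a one-pass character
replacement. A minimal userland sketch follows; demo_label_sanitize is a
hypothetical name, not the function used in glabel.

```c
#include <assert.h>
#include <string.h>

/*
 * Replace every '/' in a NUL-terminated label with '_' in place, so the
 * label cannot introduce a directory level when used as a device name.
 * Returns the number of characters replaced.
 */
int
demo_label_sanitize(char *label)
{
	char *p;
	int replaced = 0;

	assert(label != NULL);
	p = label;
	while ((p = strchr(p, '/')) != NULL) {
		*p++ = '_';
		replaced++;
	}
	return (replaced);
}
```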


148961 11-Aug-2005 pjd

GELI doesn't need cryptodev.

MFC after: 3 days


148867 08-Aug-2005 pjd

Be case-insensitive when dealing with algorithm names.

PR: kern/84659
Submitted by: Benjamin Lutz <benlutz@datacomm.ch>
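
Case-insensitive matching of algorithm names, as in r148867, is typically
done with strcasecmp(3). The sketch below shows the pattern; the table, the
enum values and the demo_str2algo name are hypothetical and are not taken
from the GELI sources.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>	/* strcasecmp(), POSIX */

/* Hypothetical algorithm identifiers. */
enum demo_algo {
	DEMO_ALGO_UNKNOWN = 0,
	DEMO_ALGO_AES,
	DEMO_ALGO_BLOWFISH,
	DEMO_ALGO_3DES
};

static const struct {
	const char	*name;
	enum demo_algo	id;
} demo_algos[] = {
	{ "aes",	DEMO_ALGO_AES },
	{ "blowfish",	DEMO_ALGO_BLOWFISH },
	{ "3des",	DEMO_ALGO_3DES }
};

/* Map a user-supplied name to an identifier, ignoring letter case. */
enum demo_algo
demo_str2algo(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(demo_algos) / sizeof(demo_algos[0]); i++) {
		if (strcasecmp(name, demo_algos[i].name) == 0)
			return (demo_algos[i].id);
	}
	return (DEMO_ALGO_UNKNOWN);
}
```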


148460 27-Jul-2005 pjd

MFp4: Export more information about encrypted providers.

MFC after: 1 week


148458 27-Jul-2005 pjd

Reduce default debug level to 0.

MFC after: 1 week


148456 27-Jul-2005 pjd

Add GEOM_ELI class which provides GEOM providers encryption.
For features list and usage see manual page: geli(8).

Sponsored by: Wheel Sp. z o.o.
http://www.wheel.pl
MFC after: 1 week


148440 27-Jul-2005 pjd

Use root_mount KPI for RAID3 to delay root file system mount.
Actually, one cannot set up a root file system on a RAID3 device, but when
other file systems placed on RAID3 devices are listed in /etc/fstab, the
boot process will be interrupted if these devices are missing.

MFC after: 3 days
X-MFC-note: MFC only to RELENG_6, as RELENG_5 doesn't have root_mount KPI.


148410 25-Jul-2005 phk

By design I left a tiny race in updating the I/O statistics based on
the assumption that performance was more important than beancounter
quality statistics.

As it transpires the microoptimization is not measurable in the
real world and the inconsistent statistics confuse users, so revert
the decision.

MT6 candidate: possibly
MT5 candidate: possibly


148382 25-Jul-2005 pjd

Add a very simple and small GEOM class - ZERO.
It creates a very large provider (41PB), /dev/gzero.
On BIO_READ request it zero-fills bio_data and on BIO_WRITE it does nothing.
You can also set kern.geom.zero.clear sysctl to 0 to do nothing even for
BIO_READ.

I'm using it for performance testing where it is very helpful.

MFC after: 3 days


148192 20-Jul-2005 phk

Comment typo


148092 17-Jul-2005 pjd

Before calling g_orphan_provider(), add G_PF_WITHER flag, so GEOM will know
to destroy it.

PR: kern/81758
Submitted by: trasz <trasz@buziaczek.pl>
MFC after: 3 days


148061 15-Jul-2005 nyan

Merged from geom_mbr.c revisions 1.62 and 1.66.
- Implement a gctl handler and the verb "write MBR".


148048 15-Jul-2005 le

*) Implement round-robin reads for multiplex volumes.

*) Plug a possible memory leak. [1]

[1] obtained from: pjd@.


148034 15-Jul-2005 phk

Implement a gctl handler and the verb "write MBR" which can be used to
update metadata and bootcode while the MBR is in use.

MFC candidate


147843 08-Jul-2005 pjd

Add a CANCEL command which allows removing one request from the queue, or
all requests from the queue if no request number is given.

Bump version number.

Approved by: re (scottl)


146624 25-May-2005 pjd

After provider creation!!


146616 25-May-2005 pjd

- Call root_mount_rel() when provider IS created, not earlier.
This should close the race observed by Daniel Eriksson.
- Remove redundant wakeup().


146538 23-May-2005 pjd

Add some debug code to diagnose root-on-mirror problems with recent -current.

Reported by: Daniel Eriksson


146353 18-May-2005 pjd

Correct typo.


146325 17-May-2005 le

When a drive dies, don't call g_wither_geom() directly, but instead
post an event to the geom event queue that will take care of it,
letting outstanding bios finish, and closing the consumers.

Plus some cosmetic clean ups.


146118 11-May-2005 pjd

cp can't be NULL.

Noticed by: Coverity Prevent analysis tool


146117 11-May-2005 pjd

gp can't be NULL.

Noticed by: Coverity Prevent analysis tool


146110 11-May-2005 pjd

Add KASSERT() to be sure there is an active component.

Suggested by: Coverity Prevent analysis tool


146109 11-May-2005 pjd

Check return value.

Found by: Coverity Prevent analysis tool


145761 01-May-2005 nyan

Fix signed vs unsigned warning.


145619 28-Apr-2005 le

Only allow RAID5 plexes to be parity checked.

PR: kern/80427
Submitted by: Stijn Hoop <stijn@win.tue.nl>


145502 25-Apr-2005 pjd

Fix provider's size check for 'insert' command.
Before this fix one was able to insert one sector too small provider.

MFC after: 3 days


145306 19-Apr-2005 wollman

The size of a filesystem may be less than the size of the provider it
resides on. Fix the special case of the filesystem fragment size not
evenly dividing the size of the provider. Fixing the general case
probably requires better superblock validation (left as an exercise to
the reader).


145305 19-Apr-2005 pjd

Remove the hack which allowed using gmirror for the root file system,
use root_mount KPI instead.


145259 19-Apr-2005 phk

Call g_waitidle() instead of GEOM using the root_mount_hold() KPI.
GEOM could (and will) get events as a result of drivers coming in
late so a one-shot method is not good enough for GEOM.


145250 18-Apr-2005 phk

Add a named reference-count KPI to hold off mounting of the root filesystem.

While we wait for holds to be released, print a list of who holds us
back once per second.

Use the new KPI from GEOM instead of vfs_mount.c calling g_waitidle().

Use the new KPI also from ata.

With ATAmkIII's newbusification, ata could narrowly miss the window
and ad0 would not exist when we tried to mount root.


144934 12-Apr-2005 pjd

Protect against recursive label creation in a similar way as it is done
in BSD and MBR classes, i.e. if the provider below us uses the same metadata,
don't create labels based on the metadata.
This allows creating labels on geoms with rank != 1 without hacks.

Tested by: Chris Elsworth <chris@shagged.org> on sparc64
OK'ed by: phk
MFC after: 2 weeks


144789 08-Apr-2005 pjd

Fix a long-standing bug. Error string has to be copied from the user
process context.

Approved by: phk
MFC after: 3 days


144592 03-Apr-2005 pjd

- Add a missing g_io_deliver() in case of allocation failure - we didn't
complete I/O requests here.
- First allocate all needed bios, so if any of the allocations fails, we can
free memory before sending any I/O requests down.

Reported by: Pawel Malachowski
MFC after: 3 days
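
The second item of r144592 describes an allocate-everything-first pattern:
no request is sent until every allocation has succeeded, so a failure can
be undone without having to recall requests that are already in flight. A
minimal userland sketch, assuming hypothetical demo_* names and using
malloc(3) in place of the kernel bio allocator:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cloned bio. */
struct demo_req {
	int	id;
};

/* Number of requests currently allocated; shows that nothing leaks. */
int demo_live = 0;

/*
 * Send n requests. fail_at simulates an allocation failure at that index
 * (pass -1 for no simulated failure). Returns the number of requests
 * sent: n on success, 0 if any allocation failed.
 */
int
demo_send_all(int n, int fail_at)
{
	struct demo_req **reqs;
	int i, sent;

	assert(n > 0);
	reqs = calloc((size_t)n, sizeof(*reqs));
	if (reqs == NULL)
		return (0);

	/* Phase 1: allocate everything before sending anything. */
	for (i = 0; i < n; i++) {
		reqs[i] = (i == fail_at) ? NULL : malloc(sizeof(**reqs));
		if (reqs[i] == NULL) {
			/* Undo the partial allocation; nothing was sent. */
			while (--i >= 0) {
				free(reqs[i]);
				demo_live--;
			}
			free(reqs);
			return (0);
		}
		reqs[i]->id = i;
		demo_live++;
	}

	/* Phase 2: every allocation succeeded; now dispatch. */
	sent = 0;
	for (i = 0; i < n; i++) {
		sent++;		/* stand-in for passing the request down */
		free(reqs[i]);
		demo_live--;
	}
	free(reqs);
	return (sent);
}
```

In the failure path the caller still has to complete the original request
with an error, which is what the first item of the commit addresses.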


144333 30-Mar-2005 nyan

Remove geometry translations here.


144328 30-Mar-2005 joerg

Support VTOC volume names. This can be useful to distinguish multiple
disks in a system. Solaris' format(1m) displays the volume names in
the disk overview.

MFC after: 1 month


144157 26-Mar-2005 phk

fix a "modify after free" bug which is practically impossible to
experience.

Found by: Coverity (id #540 #541)


144144 26-Mar-2005 pjd

If an error occurs, clean up before returning from g_raid3_connect_disk().


144143 26-Mar-2005 pjd

Make the code more obvious - when an error occurs in g_mirror_connect_disk(),
detach and destroy consumer before returning.


144142 26-Mar-2005 pjd

Check for return values.

Submitted by: sam
Found by: Coverity Prevent analysis tool


143792 18-Mar-2005 phk

g_read_data() can return NULL, check for it.

Found by: Coverity (ID#258)


143791 18-Mar-2005 phk

After rejecting the bio request early, return instead of panicking.

Found by: Coverity (ID#450)


143790 18-Mar-2005 phk

Avoid null pointer dereference.


143719 16-Mar-2005 pjd

Plug memory leak.

Submitted by: Ted Unangst
Found by: Coverity Prevent analysis tool
Approved by: phk
MFC after: 3 days


143627 15-Mar-2005 phk

forward declare struct disk.


143590 14-Mar-2005 phk

Do not attach MBR on top of an MBR. This removes some confusing
slice names on disks with extended partitions.

Spotted on: Mother-in-law's computer.


143418 11-Mar-2005 ume

stop including rijndael-api-fst.h from rijndael.h.
this is required to integrate opencrypto into crypto.


143259 07-Mar-2005 le

Remove test for zero sectorsize when tasting. This check doesn't
seem to be necessary anymore, and it prevents tasting a valid drive
when booting with geom_vinum already loaded, since SCSI disks set their
sectorsize not until first opening them.


143238 07-Mar-2005 phk

Add placeholder mutex argument to new_unrhdr().


143130 04-Mar-2005 le

Don't allow synchronizing a plex that is already synchronizing.

Reset the 'syncing' flag in case of errors, too.

Some cosmetics.


142727 27-Feb-2005 pjd

- Add md_provsize field to metadata, which will help with
shared-last-sector problem.
After this change, even if there is more than one provider with the same
last sector, the proper one will be chosen based on its size.
It still doesn't fix the 'c' partition problem (when da0s1 can be confused
with da0s1c) and situation when 'a' partition starts at offset 0
(then da0s1a can be confused with da0s1 and da0s1c). One can use '-h'
option there, when creating device or avoid sharing last sector.
Actually, when providers share the same last sector and their size is equal,
they provide exactly the same data, so the name (da0s1, da0s1a, da0s1c)
isn't important at all.
- Provide backward compatibility.
- Update copyright's year.

MFC after: 1 week


142301 23-Feb-2005 le

Correctly calculate what to do and how to retry a request to a plex when
the previous one failed and there is more than one plex in the volume.

This could have led to a flood of error messages on the console and
probably a deadlock in certain situations.


142079 19-Feb-2005 phk

Try to unbreak the vnode locking around vop_reclaim() (based mostly on
patch from kan@).

Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on
close. This is not yet a generally safe function, but for this very
specific use it is safe. This solves the problem with buffers not
being flushed by unmount or after failed mount attempts.


142020 17-Feb-2005 le

In case of drive errors, don't close the associated consumer and
detach it, but instead let the geom wither away.

Bump copyright year.


141998 16-Feb-2005 pjd

Fix year in copyrights.


141994 16-Feb-2005 pjd

Update copyright in files changed this year.


141993 16-Feb-2005 pjd

Fix year in copyrights.


141973 16-Feb-2005 pjd

Remove mutex assertion from g_gate_find(). We don't want g_gate_list_mtx
mutex to be held here, because we want speed here.


141972 16-Feb-2005 pjd

Remove TDP_GEOM flag from thread after ggate device creation.
This flag means "wait for all pending requests before returning to userland".
There are pending events for sure, because we just created new provider and
other classes want to taste it, but we cannot answer on I/O requests until
we're here.


141742 12-Feb-2005 pjd

Fix typo. We want to unlock mutex here.

Submitted by: Andreas Kohn <andreas.kohn@gmail.com>
MFC after: 1 week


141624 10-Feb-2005 phk

Make various random things static


141561 09-Feb-2005 pjd

- Remove g_gate_hold()/g_gate_release() from start/done paths. It saves
4 mutex operations per I/O request.
- Use only one mutex to protect both (incoming and outgoing) queue.
As MUTEX_PROFILING(9) shows, there is no big contention for this lock.
- Protect sc_queue_count with queue mutex, instead of doing atomic
operations on it.
- Remove DROP_GIANT()/PICKUP_GIANT() - ggate is marked as MPSAFE and no
Giant there.


141513 08-Feb-2005 des

merge from geom_vol_ffs.c rev 1.14 (avoid unaligned I/O requests)


141498 08-Feb-2005 des

Take care not to issue unaligned I/O requests while tasting a provider.


141312 05-Feb-2005 pjd

- Use bioq_insert_tail()/bioq_insert_head() instead of bioq_disksort().
- Improve mediasize checking.

MFC after: 1 week


140968 29-Jan-2005 phk

When dumping to an unpartitioned disk, make sure to chop the
length of the dump area accordingly.

Run into by: scottl


140940 28-Jan-2005 jeff

- If mpsafevfs is off, acquire giant around all calls to bufdone().

Sponsored by: Isilon Systems, Inc.


140822 25-Jan-2005 phk

Introduce and use g_vfs_close().


140773 24-Jan-2005 phk

Create correctly sized vnode objects for disk devices.


140722 24-Jan-2005 jeff

- Don't acquire giant around calls to bufdone().

Sponsored By: Isilon Systems, Inc.


140591 21-Jan-2005 le

Only report state changes of subdisks and plexes when there's
really a state change.

Reword the info a bit.


140590 21-Jan-2005 le

Don't initialize error with ENXIO as we might end up here when
the plex has no more consumers (e.g. orphaning).


140532 20-Jan-2005 pjd

Protect against recursive slice creation in a similar way as it is done
in BSD class, i.e. if the provider below us uses the same metadata, don't
create slices based on the metadata.
This allows creating slices on geoms with rank != 1 without hacks.

Discussed with: phk
Approved by: phk
MFC after: 2 weeks


140476 19-Jan-2005 le

Rename synchronization and initialization threads and prefix them
with 'gv_' for consistency.


140475 19-Jan-2005 le

Although an object may already be known in the configuration, its
worker thread may have been destroyed (e.g. during orphaning).

Make sure that objects get back their worker threads when they get a
new geom.


140474 19-Jan-2005 le

Reset object flags after killing off an object's worker thread.


140367 17-Jan-2005 phk

Discontinue zero-length g_ctl arguments as "just give him this pointer"
transfers. The necessary context for calling copyin() isn't available
anyway and automatic code-validation chokes on this.


140261 14-Jan-2005 phk

CAM will sometimes remove a disk again even before it finished being
initialized. We already cancel the pending events but we need to not
dereference the geom pointer which never got set different from NULL.


140074 11-Jan-2005 pjd

Introduce a new GEOM class - SHSEC. It shares a secret between
the given providers. Without even one of the configured components there
should be no way to get the secret.

Supported by: WHEEL Sp. z o.o.
http://www.wheel.pl


140056 11-Jan-2005 phk

Add BO_SYNC() and add a default which uses the secret vnode pointer
and VOP_FSYNC() for now.


139940 09-Jan-2005 pjd

Increase default synchronization speed.

MFC after: 3 days


139778 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


139671 04-Jan-2005 pjd

- Fix 'rebuild' command - it can no longer rely on retaste event
(we ignore it).
- Remove code used for handling spoil events, as spoiling is not possible
anymore, because we keep consumers open for writing all the time.

MFC after: 4 days


139670 04-Jan-2005 pjd

Spoiling is now not possible, because we keep consumers open for writing
all the time. Remove unused code then.

MFC after: 4 days


139650 03-Jan-2005 pjd

Fix 'rebuild' command (we ignore retaste event now, so don't rely on it).


139622 03-Jan-2005 pjd

Remove unused #include.


139451 30-Dec-2004 jhb

Stop explicitly touching td_base_pri outside of the scheduler and simply
set a thread's priority via sched_prio() when that is the desired action.
The schedulers will start managing td_base_pri internally shortly.


139379 28-Dec-2004 pjd

Remove debug code.


139295 25-Dec-2004 pjd

- Add a genid field to the metadata, which improves reliability a bit.
After this change, when a component is disconnected because of an I/O error,
it will not be connected and synchronized automatically; it will be logged
as broken and skipped. Autosynchronization can still occur when a component is
disconnected (on an orphan event) and connected again: there was no I/O
error, so there is no reason to refuse the component, but if there were
writes while it wasn't connected, it will be synchronized.
This fixes cases where a component is disconnected because of an I/O error and
can be connected again and again.
- Bump version number.
- Implement backward compatibility mechanism. After this change when metadata in
old version is detected, it is automatically upgraded to the new (current)
version.


139246 23-Dec-2004 pjd

Update disk->d_genid field when increasing sc->sc_genid.


139213 22-Dec-2004 pjd

- Add a genid field to the metadata, which improves reliability a bit.
After this change, when a component is disconnected because of an I/O error,
it will not be connected and synchronized automatically; it will be logged
as broken and skipped. Autosynchronization can still occur when a component is
disconnected (on an orphan event) and connected again: there was no I/O
error, so there is no reason to refuse the component, but if there were
writes while it wasn't connected, it will be synchronized.
This fixes cases where a component is disconnected because of an I/O error and
can be connected again and again.
- Bump version number.
- Add version change history.
- Implement backward compatibility mechanism. After this change when metadata in
old version is detected, it is automatically upgraded to the new (current)
version.


139146 21-Dec-2004 pjd

Now, when force device destruction is done on shutdown, hide warning,
that device cannot be destroyed immediately, under debug=1.

Suggested by: simon


139144 21-Dec-2004 pjd

Improve reliability and clean up code a bit.
For more details check src/sys/geom/mirror/g_mirror.c rev.1.47,1.48,1.49,1.50.


139140 21-Dec-2004 pjd

This should not be permitted, but some GEOM classes held the topology lock
while doing g_(read|write)_data() (e.g. BSD). This can cause a deadlock
in MIRROR class. Not sure if this is safe to drop the topology lock in BSD
class, so change the code in MIRROR class to avoid this deadlock.


139139 21-Dec-2004 pjd

Implement g_topology_try_lock().

No objection from: phk


139054 19-Dec-2004 pjd

Remove unused variables.


139053 19-Dec-2004 pjd

- Argument 'flags' in g_mirror_destroy_consumer() function is unused -
mark it as such.
- Before closing consumer check if it is open. It can be closed here
when g_mirror_connect_disk() fails on g_access().


139051 19-Dec-2004 pjd

Some major cleanups.

Keeping consumers open when device is closed is very hard. We need to
open consumers sometimes to update metadata, etc.
Many hacks were introduced in the past to make it possible. You cannot
be sure that you can open consumer for writing always, even if you think
it should be allowed. If one of the mirror components is for example da0
and you try to open it, you can get EPERM when da0s1 is opened for reading
(because BSD class opens consumers (da0) with an extra 'e' bit set).
Waiting for the events queue to be empty may do the trick, but it makes
code much uglier (as you cannot always call g_waitidle()), it doesn't
solve all edge cases and it can introduce deadlocks if there are events
in the queue that wait for gmirror.

I removed those hacks. Now all consumers are open r1w1e1 always, even if
the device is closed. Maybe it is less clean from a GEOM perspective, but it
simplifies the code a lot and makes it much more reliable.
The only issue was the retaste event which is sent when we close consumers
opened for writing. I ignore the retaste event by not detaching the consumer
immediately (so the retaste event is not sent to my class) and sending an
event right after it to detach and destroy the consumer.


139050 19-Dec-2004 pjd

Don't quit on first failure, just skip failures.


138888 15-Dec-2004 brueffer

Fix typo in a comment.

MFC after: 3 days


138801 13-Dec-2004 pjd

bioq_insert_head() function is already in subr_disk.c.


138732 12-Dec-2004 phk

Pass the file->flags down to geom ioctl handlers.

Reject certain ioctls if write permission is not indicated.

Bump geom API version.

Reported by: Ruben de Groot <mail25@bzerk.org>


138623 09-Dec-2004 pjd

- Turn off 'fast' mode by default and increase maximum memory to consume
when this mode is used.
- Manual page update.


138382 05-Dec-2004 marcel

o Don't limit GPT as a rank 2 provider. Allow it to be connected
anywhere in the DAG. This includes configurations that are not
allowed by the EFI specification.
o Reject a GPT partition table if it's not preceded by a PMBR.
There's no need to preserve the MBR partitioning anymore as GPT
is mature and with the first bullet extending the applicability
of GPT, it's better to be a bit more strict.


138374 04-Dec-2004 pjd

When initializing device, set d_softc and d_no fields for all components,
because we know it then and we need it when inserting a component which
wasn't destroyed while device was running.

Reported by: Michael Handler <handler@grendel.net>
MFC after: 1 week


138221 30-Nov-2004 imp

Add observations of the Linux98 and Grub/98 boot loaders. These
observations lead me to believe that the convention for pc98 boot
loaders is to have a jump instruction, followed by a string, followed
by code. The jump usually doesn't have a nop after it and usually the
string is NUL terminated, but Grub/98 breaks both of these rules.

# I looked for, but failed to find the Minux boot blocks for PC-9801 port.


138219 30-Nov-2004 imp

Reject tasting of this provider if the sector size isn't a multiple of
512. If I had an audio cdrom in my cd player when I booted my system,
I'd get a panic from geom because you can't read 8192 bytes from an
audio cdrom.

Remove XXX comment about IPL1 and replace it with some information
from my soon to be published web page on the pc98 disk layout. The
IPL1 test was the result of an observation of a disk with FreeBSD's
boot0 program. It was testing part of an area that appears to be
reserved for a boot loader name, which comes after a jump over this
area. I don't yet know if it is required to be any specific jump
instruction, or if the destination has to be location 11. [1]

[1] FreeBSD Press No. 13, page 115, poorly translated by myself. The
picture there shows offset 8 as the destination of the jump, but
FreeBSD's boot0 program has three padding NULs after the IPL1 name and
uses a 16-bit 'jmp' instruction.
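
The sector-size test from the first paragraph of r138219 is a single
modulo check. The sketch below uses a hypothetical helper name; rejecting
a sector size of zero as well is an assumption of this sketch and is not
stated in the commit message.

```c
#include <assert.h>

/*
 * Decide whether a provider is worth tasting, based on its sector size.
 * Returns 1 to continue tasting, 0 to reject the provider.
 */
int
demo_should_taste(unsigned int sectorsize)
{
	/* Assumption of this sketch: an unset sector size is rejected. */
	if (sectorsize == 0)
		return (0);
	/* Only sector sizes that are a multiple of 512 are accepted. */
	return (sectorsize % 512 == 0);
}
```

An audio CD presents 2352-byte sectors, which is not a multiple of 512, so
a check of this kind rejects it before any read is attempted.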


138171 28-Nov-2004 phk

Fix a long standing bug in geom_mbr which is only now exposed by the
correct open/close behaviour of filesystems:

When an ioctl to modify the MBR arrives, we cannot take for granted that
we have the consumer open.

The symptom is that one cannot run 'boot0cfg -s2 /dev/ad0' in single-user
mode because / is the only open partition and it is only open r1w0e1.

If it is not, we attempt to increase the write count by one and
decrease it again afterwards.

Presumably most if not all other slices suffer from the same problem.


138112 26-Nov-2004 le

Implement 'setstate' to allow setting the state of drives and subdisks
for debugging and emergency purposes.


138110 26-Nov-2004 le

Implement checkparity/rebuildparity.


138014 23-Nov-2004 pjd

- Add missing Giant drop before acquiring the topology lock.
- Move DROP_GIANT()/PICKUP_GIANT() to g_gate_ioctl().


137936 20-Nov-2004 fjoe

Use M_ZERO to not panic in mtx_init when INVARIANTS enabled.

Submitted by: simokawa
MFC after: 1 week


137730 15-Nov-2004 le

Move RAID5 offset calculation into a separate function to avoid
code duplication.


137727 15-Nov-2004 le

Share gv_roughlength() between kernel and userland, as we will need it
there later.


137490 09-Nov-2004 pjd

Before trying to update metadata (and so opening the consumer for writing),
be sure that the events queue is empty. Otherwise we can hit a race
where, for example, da0s1 is tasted by some other class, which means that
da0 is open with the exclusive bit set, which means that we can't open da0
for writing if it is our component.

Reported by: Attila Nagy <bra@fsn.hu> (and somebody else sometime ago,
but I cannot find who it was)


137489 09-Nov-2004 pjd

Introduce g_waitidlelock() function which is similar to g_waitidle(),
but should be called with the topology lock held and returns with the
topology lock held and empty event queue.

Approved by: phk (sometime ago)


137487 09-Nov-2004 pjd

Don't rely on DIRTY flag to be sure that consumer is open, because
DIRTY flag can be removed in idle process. Use consumer's acw field
instead to avoid opening consumer twice.


137485 09-Nov-2004 pjd

For BIO_READ check if provider is open for reading and for BIO_WRITE,
check if provider is open for writing.
This fixes panic when device is open only for writing and we send write
request.
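
The per-command check described in r137485 can be sketched as follows. The
struct and constants are hypothetical stand-ins for the provider's access
counters and the bio commands; the error values returned here are an
assumption, since the commit message does not say how a mismatch is
reported.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for BIO_READ and BIO_WRITE. */
enum demo_cmd {
	DEMO_BIO_READ,
	DEMO_BIO_WRITE
};

/* Hypothetical provider carrying read/write/exclusive access counts. */
struct demo_provider {
	int	acr;
	int	acw;
	int	ace;
};

/* Return 0 if the request matches how the provider is open. */
int
demo_check_request(const struct demo_provider *pp, enum demo_cmd cmd)
{
	assert(pp != NULL);
	switch (cmd) {
	case DEMO_BIO_READ:
		/* Reads only need the provider open for reading. */
		return (pp->acr > 0 ? 0 : EPERM);
	case DEMO_BIO_WRITE:
		/* Writes only need the provider open for writing. */
		return (pp->acw > 0 ? 0 : EPERM);
	}
	return (EINVAL);
}
```

With a single check that required read access for every command, a write
to a provider opened only for writing would be flagged incorrectly, which
is the situation the commit describes.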


137421 09-Nov-2004 pjd

Drop Giant lock before grabbing the topology lock.


137412 08-Nov-2004 pjd

If device is marked as being destroyed, deny all access requests.


137259 05-Nov-2004 pjd

Don't forget to make sure that there are no not-finished requests before
marking components as clean.

Pointed out by: scottl


137258 05-Nov-2004 pjd

- Mark all raid3 components as clean after kern.geom.raid3.idletime seconds.
- Make kern.geom.raid3.timeout variable tunable.


137257 05-Nov-2004 pjd

Mark raid3 devices as clean on shutdown (after all file systems are
unmounted).

Suggested by: scottl


137256 05-Nov-2004 pjd

- Use ->index consumer's field to track number of in-flight requests.
- Remove unused #include.


137254 05-Nov-2004 pjd

Use shutdown hooks to mark mirrors as clean after all file systems are
unmounted.

Suggested by: scottl


137253 05-Nov-2004 pjd

Remove unused #include.


137251 05-Nov-2004 pjd

- Add a sysctl kern.geom.mirror.idletime, so one can specify after how many
seconds of idling, DIRTY flags are removed.
- If mirror is in idle state or is not open for writing, sleep without
timeout when waiting for I/O requests.
- Don't use atomic operations, for now sysctls are protected by Giant.
- Update debugs.


137248 05-Nov-2004 pjd

MFp4:
- Fix for good (I hope) force-stopping mirrors and some failure cases
(e.g. the last good component dies when synchronization is in progress).
Don't use ->nstart/->nend consumer's fields, as this could be racy,
because those fields are used in g_down/g_up, use ->index consumer's
field instead for tracking number of not finished requests.

Reported by: marcel

- After 5 seconds of idle time (this should be configurable) mark all
dirty providers as clean, so that if the mirror has not been used for
5 seconds and a power failure occurs, no synchronization on boot is needed.

Idea from: sorry, I can't find who suggested this

- When there are no ACTIVE components and no NEW components destroy whole
mirror, not only provider.

- Fix one debug to show information about I/O request, before we change
its command.


137184 04-Nov-2004 phk

Finish cut&paste adjustments.

Spotted by: tegge


137150 03-Nov-2004 phk

Stop dumping the MBR entries under bootverbose


137149 03-Nov-2004 phk

Stop wasting a bootverbose line on all geom slices.


137048 29-Oct-2004 phk

Don't set si_bsize_phys, nobody cares.


137034 29-Oct-2004 phk

Add GEOM class "VFS" for filesystems and other buffer cache users
of GEOM devices.

There is nothing magic about this, it just gives a bufobj interface
to GEOM.


137032 29-Oct-2004 phk

Add g_wither_geom_close() function.


137029 29-Oct-2004 phk

Give dev_strategy() an explicit cdev argument in preparation for removing
buf->b_dev.

Put a bio between the buf passed to dev_strategy() and the device driver
strategy routine in order to not clobber fields in the buf.

Assert copyright on vfs_bio.c and update copyright message to canonical
text. There is no legal difference between John Dyson's two-clause
abbreviated BSD license and the canonical text.


136983 26-Oct-2004 le

Give each plex a separate queue where held back bios are put on.
This lowers the CPU usage of the worker thread and prevents a
possible live lock on non-SMP machines.

MFC candidate.


136946 25-Oct-2004 phk

Use unit number allocation functions for GEOM minor numbers.


136940 25-Oct-2004 phk

Retire si_stripesize and si_stripeoffset they will not be needed in cdev
in the future.


136839 23-Oct-2004 phk

Don't call g_waitidle(), it happens automagically now.


136837 23-Oct-2004 phk

Add a new per-thread private flag: TDP_GEOM.

This flag gets set whenever the thread posts an event on the GEOM
event queue, and if the flag is set when the thread is prepared
to return to userland from the kernel, g_waitidle() will be called
to make sure that the posted events have completed.

This can replace an insufficient number of g_waitidle() calls in
various other places, and has the advantage of being failsafe: Any
system call which does a VOP_OPEN()/VOP_CLOSE will now correctly
wait for any geom events it posted as part of spoils or tastes.

Assert that topology and Giant is not held in g_waitidle().
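
The flag mechanism described in this entry can be sketched in userland C.
This is only an illustration under stated assumptions: every name with the
sk_ prefix is hypothetical, the event queue is reduced to a counter, and
"waiting" simply drains that counter. It is not the kernel code.

```c
#include <assert.h>

/* All names below are hypothetical stand-ins for the kernel's. */
#define SK_TDP_GEOM	0x01	/* thread posted a GEOM event */

struct sk_thread {
	int	pflags;		/* per-thread private flags */
};

static int sk_pending_events;	/* stand-in for the GEOM event queue */

/* Post an event and remember that this thread did so. */
static void
sk_post_event(struct sk_thread *td)
{
	sk_pending_events++;
	td->pflags |= SK_TDP_GEOM;
}

/* Stand-in for g_waitidle(): here "waiting" just drains the queue. */
static void
sk_waitidle(struct sk_thread *td)
{
	sk_pending_events = 0;
	td->pflags &= ~SK_TDP_GEOM;
}

/* Called on the way back to userland. Returns 1 if it had to wait. */
static int
sk_userret(struct sk_thread *td)
{
	if ((td->pflags & SK_TDP_GEOM) == 0)
		return (0);
	sk_waitidle(td);
	return (1);
}
```

The point of the design is that the check sits in one place (the return
path), so a code path that posts events cannot forget to wait for them.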


136836 23-Oct-2004 phk

Move the prototype for g_waitidle() to a more visible place.


136797 22-Oct-2004 arr

- Turn KASSERT()s into warning printf()'s in the g_class_load() routine.
This removes a panic that will occur if you build with GENERIC and
attempt to kldload a GEOM module that is already in the kernel.

Reviewed by: phk


136755 21-Oct-2004 rwatson

Add KTR_GEOM, which allows tracing of basic GEOM I/O events occurring
in the g_up and g_down threads. Each time a bio is propelled up and
down the stack, an event is generated showing the provider, offset,
and length, as well as thread wakeup and work status information.


136504 14-Oct-2004 pjd

Ehh. Introduce a hack: Wait for 3 seconds, so GEOM is able to give us
providers for tasting. Before this hack, race below is possible:
SI_SUB_RAID (no not-fully-configured geoms, so don't block)
GEOM tasting (now geoms are created)
SI_SUB_MOUNT_ROOT (if root file system is placed on a mirror, it is
possible that this mirror is not fully configured yet)
There is a lot of work to do to avoid such hacks and I need a working
solution before 5.3, sorry.

Reported by: John Hay <jhay@icomtek.csir.co.za>


136503 14-Oct-2004 pjd

Only allow for unloading when there are no geoms in LABEL GEOM class.
We have to use our own destroy_geom method, because default one, which
is a part of geom_slice is broken.
MT5 candidate.

PR: kern/72467
Submitted by: Vladimir Novoseltsev


136414 12-Oct-2004 green

When loading GEOM modules, we expect the actual load process to be done
by the time that kldload(8) returns. Satisfy that by making the GEOM
module load event -- only when the kernel is !cold -- wait until the
GEOM module init function has finished instead of returning immediately.

This is the other half of fixing md(8) (actually, "mfs" in fstab(5))
that is similar to r1.128 of src/sys/dev/md/md.c. This bug would be
why RAM disks would often fail on boot and the first call to mdconfig(8)
would probably fail.

pjd has ideas for not requiring kldload(8) to work synchronously for
control devices that could make this obsolete.

Silence on: -arch


136399 11-Oct-2004 ups

Trace information about a buffer while we still control it.

Reviewed by: phk
Approved by: sam (mentor)


136284 08-Oct-2004 sos

Only do the geometry translations on ad* devices, other devices seem to
have their own way of life.
Those other devices translations should be moved here as well.


136236 07-Oct-2004 pjd

Be sure to always return 0 for negative access requests.

Reported by: Maciej Kucharz <qk@comp.waw.pl>


136233 07-Oct-2004 sos

Move the PC98 specific geometry "gunk" to geom_pc98.c where it belongs.
This also adds support for bigger disks on the controller I have access to,
and maybe others if I understood the adhoc methods used on those.

Those with more PC98 bigdrive controllers are hereby invited to add/fix
support for those in geom_pc98.c and not using #ifdef PC98 all over the place.


136201 06-Oct-2004 phk

Don't set the BIO_ONQUEUE debugging flag until we actually put the bio
onto a queue. This made the ENOMEM handling an instant panic.


136197 06-Oct-2004 pjd

Geoms without softc are geoms which are initialized, so wait for them.


136191 06-Oct-2004 pjd

Look out for geoms without softc.

Reported by: tegge


136143 05-Oct-2004 pjd

Before root file system is mounted, wait for mirrors in degraded state.


136065 02-Oct-2004 le

Don't allow to create a drive that already exists.


136064 02-Oct-2004 le

Correctly skip the '/dev/' part when creating new drives and prefix
a drive's provider with '/dev/' when printing the config.

Reported by: will@


136056 02-Oct-2004 pjd

Unlock g_gate_list_mtx mutex when we cannot allocate unit number.
MT5 candidate.

PR: kern/72253
Submitted by: Ivan Voras <ivoras@fer.hr>


135966 30-Sep-2004 le

Make it possible to rebuild degraded RAID5 plexes. Note that it is
currently not possible to do this while the volume is mounted.

MFC in: 1 week


135876 28-Sep-2004 phk

Protect the start/end counts on consumers and providers with the up/down
mutexes.

Make it possible to also protect the disk statistics (at a minor cost in
performance) by setting bit 2 of kern.geom.collectstats.


135873 28-Sep-2004 pjd

- Set maximum request size to MAXPHYS (128kB), instead of DFLPHYS (64kB).
- Set minimum request size to sectorsize, instead of 512 bytes.

Approved by: phk (some time ago)


135872 28-Sep-2004 pjd

Just use MAXPHYS as maximum I/O request size, instead of using my own
#define for this purpose.
No functional change.


135866 27-Sep-2004 pjd

Decrease kern.geom.raid3.timeout to 4, so it is smaller than
vfs.root.mountdelay by default.


135865 27-Sep-2004 pjd

Deny invalid I/O requests which come from userland here, because later
we'll get a panic.
MT5 candidate.

Reviewed by: phk


135863 27-Sep-2004 pjd

Avoid race while synchronizing components. It is very hard to bump into,
but it is possible:
1. Read data from good component for synchronization.
2. Write data to the same area.
3. Write synchronization data, which are now stale.

Found by: tegge (for gmirror)


135859 27-Sep-2004 pjd

Minor, but very important condition fix. The current one can never be true.


135854 27-Sep-2004 pjd

Decrease kern.geom.mirror.timeout to 4, so it is smaller than
vfs.root.mountdelay by default.


135834 26-Sep-2004 pjd

Forgot to commit addition of ds_resync field.


135833 26-Sep-2004 pjd

Avoid race while synchronizing components. It is very hard to bump into,
but it is possible:
1. Read data from good component for synchronization.
2. Write data to the same area.
3. Write synchronization data, which are now stale.

Found by: tegge


135831 26-Sep-2004 pjd

Simplify code a bit.


135716 24-Sep-2004 phk

Assert topology is held in g_dev_getprovider().

Don't call devsw(). It is not necessary, and we do not need to hold dev_lock
to compare the devsw pointer to our own since we do not dereference it.


135522 20-Sep-2004 pjd

This is not needed anymore, it is forced in GEOM now.
Actually, it can even cause some problems, because GEOM requires sectorsize
to be more than 0 on first access, not on provider creation, so we can skip
valid providers by doing this check here.

Reported by: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Sven Willenberger <sven@dmv.com>


135461 19-Sep-2004 fjoe

Use correct malloc type when freeing memory allocated by g_read_data.

PR: 71431
Submitted by: daichi


135434 18-Sep-2004 le

Single concat or striped plexes don't need no special initialization
if their subdisks are all available, so let them be brought up.


135426 18-Sep-2004 le

Re-vamp how I/O is handled in volumes and plexes.

Analogous to the drive level, give each volume and plex a worker thread
that picks up and processes incoming and completed BIOs.

This should fix the data corruption issues that have come up a few
weeks ago and improve performance, especially of RAID5 plexes.

The volume level needs a little work, though.


135302 16-Sep-2004 fjoe

g_nop_create: destroy newly created provider in case of errors.


135173 13-Sep-2004 le

Give the DRIVE geom a worker thread that picks up incoming bios,
sends them down, and takes care of the finished bios. This makes it
easier to handle I/O errors at drive level.


135164 13-Sep-2004 le

Rename gv_kill_thread() to gv_kill_plex_thread(), since there are more
threads to come.


135162 13-Sep-2004 le

Save the config back to disk when a drive goes down.


135161 13-Sep-2004 le

Read a whole sector instead of GV_HDR_LEN, since a sector might be
bigger (i.e. on CD-ROMs).


135151 13-Sep-2004 pjd

Make kern.geom.debugflags sysctl tunable from /boot/loader.conf.
It will help to debug problems when booting.

Approved by: phk


135085 11-Sep-2004 phk

Fix a problem that shows up if less than the full complement of
lock sectors are defined ("number_of_keys" argument to gbde init being
less than 4 in the default compile).


135084 11-Sep-2004 phk

Respect that G_BDE_MAXKEYS is a compile time variable.


134958 08-Sep-2004 fjoe

Do not compile in zlib.c. Add a dependency on module instead.


134957 08-Sep-2004 pjd

Show current status of mirror device directly.

Suggested by: Krzysztof Ciepłucha <kris@home.pl>


134824 05-Sep-2004 phk

For removable devices without media we set a zero mediasize but a non-zero
sectorsize in order to avoid a lot of checks around various divisions etc.

Enforce the sectorsize being > 0 with a KASSERT on successful open.

Fix scsi_cd.c to return 2k sectors when no media inserted.


134528 30-Aug-2004 pjd

Allow to configure debug level from /boot/loader.conf.


134519 30-Aug-2004 phk

Add more KASSERTS and checks.


134486 29-Aug-2004 pjd

GCC, ehh.


134421 28-Aug-2004 pjd

Use sc->sc_mediasize instead of sc->sc_provider->mediasize which contains
exactly the same value, but is shorter.


134420 28-Aug-2004 pjd

Warn the user if we are not going to use whole provider space.

Requested by: Michael Handler <handler@grendel.net>


134418 28-Aug-2004 pjd

Don't allow to insert providers, which are too small.

Reported by: Michael Handler <handler@grendel.net>


134407 27-Aug-2004 le

Move config_new_drive() to the correct place and rename it to
gv_config_new_drive().


134379 27-Aug-2004 phk

Introduce g_alloc_bio() as a waiting variant of g_new_bio().

Use in places where we can sleep and where we previously failed to check
for a NULL pointer.

MT5 candidate.


134356 26-Aug-2004 le

When attaching a consumer from a volume to a plex, check if the
volume already has a plex attached and adjust the access counts
of the new consumer accordingly.


134344 26-Aug-2004 pjd

Skip providers with not defined sector size.

Reported by: kuriyama


134303 25-Aug-2004 pjd

Log verification errors at level 1.


134292 25-Aug-2004 pjd

Dump disk number.


134226 23-Aug-2004 pjd

Allow to set kern.geom.mirror.timeout from /boot/loader.conf.


134221 23-Aug-2004 le

Compare the addresses of two RAID5 work packets directly instead
of the addresses of their related bios when locking one out, since
they could share a bio and this could lead to parity corruption.


134176 22-Aug-2004 le

Implement the possibility to remove drives.


134168 22-Aug-2004 pjd

Implementation of 'verify reading' algorithm, which uses parity data for
verification of regular data when device is in complete state.
On verification error, EIO error is returned for the bio and sysctl
kern.geom.raid3.stat.parity_mismatch is increased.

Suggested by: phk


134155 22-Aug-2004 le

Add forgotten format specifier in a KASSERT and shut up the compiler.

Submitted by: Gavin Atkinson <gavin.atkinson@ury.york.ac.uk>


134136 21-Aug-2004 pjd

Add version history.


134124 21-Aug-2004 pjd

Implement new reading algorithm, which will use parity component for reading
as well, even if device is in complete state.
I observe 40% of speed-up with this option for random read operations,
but slowdown for sequential reads.
Basically, without this option reading from a RAID3 device built from 5
components (c0-c4) looks like this:

Request no. Used components
1 c0+c1+c2+c3
2 c0+c1+c2+c3
3 c0+c1+c2+c3

With the new feature:

Request no. Used components
1 c0+c1+c2+c3
2 (c1^c2^c3^c4)+c1+c2+c3
3 c0+(c0^c2^c3^c4)+c2+c3
4 c0+c1+(c0^c1^c3^c4)+c3
5 c0+c1+c2+(c0^c1^c2^c4)
6 c0+c1+c2+c3
[...]
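
The substitution shown in the table (for example c0 replaced by
c1^c2^c3^c4) relies on the parity component being the XOR of the data
components. A minimal sketch of that identity follows, assuming the last
component is parity; the function names are hypothetical and this is not
the graid3 implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the parity buffer as the XOR of
 * ndata data buffers of len bytes each.
 */
static void
sk_raid3_parity(const uint8_t *const *data, size_t ndata, uint8_t *parity,
    size_t len)
{
	size_t c, i;

	for (i = 0; i < len; i++) {
		uint8_t x = 0;

		for (c = 0; c < ndata; c++)
			x ^= data[c][i];
		parity[i] = x;
	}
}

/*
 * Hypothetical helper: rebuild component 'skip' without reading it,
 * by XOR-ing all other components (data and parity) together.
 */
static void
sk_raid3_rebuild(const uint8_t *const *comps, size_t ncomps, size_t skip,
    uint8_t *out, size_t len)
{
	size_t c, i;

	assert(skip < ncomps);
	for (i = 0; i < len; i++) {
		uint8_t x = 0;

		for (c = 0; c < ncomps; c++) {
			if (c != skip)
				x ^= comps[c][i];
		}
		out[i] = x;
	}
}
```

With five components, calling sk_raid3_rebuild() with skip set to 0
corresponds to request no. 2 in the table: c0 is obtained from c1, c2, c3
and the parity component c4 instead of being read.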


134014 19-Aug-2004 le

A volume can be up if it has a degraded RAID5 plex.


133991 18-Aug-2004 pjd

We really don't want to receive spoil event for synchronization consumers.


133986 18-Aug-2004 phk

Do not override the class provided dumpconf function.


133984 18-Aug-2004 le

Pretty print some informational messages.


133983 18-Aug-2004 le

Fix a stupid bug in the drive taste function: when checking if a
drive is known to the configuration check also if it already has a geom.
Without this check several needless geoms are created and valid
configuration data was overwritten.

This change obsoletes the need for a separate geom to taste an
offered provider and the consumer doesn't need to be opened with the
exclusive bit set.


133981 18-Aug-2004 pjd

NOP class doesn't operate on metadata, so the spoil event can be safely
ignored.


133979 18-Aug-2004 pjd

Dump device status on 'list' command.


133946 18-Aug-2004 pjd

Bump synchronization ID if we are sure, that we have ACTIVE components.


133839 16-Aug-2004 obrien

Minor style.9 cleanup.


133825 16-Aug-2004 pjd

Decrease debug level to 0.


133823 16-Aug-2004 pjd

Fix warning.


133808 16-Aug-2004 pjd

Introduce GEOM RAID3 class, i.e. kernel module, which implements RAID3
transformation and graid3(8) userland utility, which can be used for
configuration. No manual page yet, sorry.

Hardware provided by: Daniel Seuffert


133752 15-Aug-2004 pjd

Avoid code duplication by introducing g_mirror_write_metadata() function,
which is used now by g_mirror_clear_metadata() function and
g_mirror_update_metadata() function.


133717 14-Aug-2004 le

Make informational output look less like an accident.


133640 13-Aug-2004 fjoe

Add geom_uzip -- geom class that implements read-only compressed disks.
Currently supports cloop V2.0 disk compression format.
May support more formats in future.


133530 11-Aug-2004 pjd

MFp4: Simplify code a bit:
- Remove kern.geom.mirror.sync_block_size sysctl. It is quite obvious that we
want to use the biggest size possible.
- Do not use UMA zone for sync data allocations. There could be only one
synchronization request per synchronized disk at a time, so allocate memory
for one request on whole synchronization process related to one disk.

Tested by synchronizing one component (out of three) and by synchronizing
two components (out of three) in parallel.


133528 11-Aug-2004 pjd

Actually, HARDCODED flag isn't stored in metadata, so don't bother
dumping it.


133527 11-Aug-2004 pjd

- Fix typo.
- Dump HARDCODED flag.


133498 11-Aug-2004 pjd

Increase default kern.geom.stripe.maxmem to 50 elements.


133487 11-Aug-2004 pjd

When sending request once again because of ENOMEM, reset bio_children
and bio_inbed fields to 0. Without this change we can end up with
I/O leakage in some rare situations.
I tested this change by putting failure probability mechanism similar
to that used in NOP class into g_clone_bio(9) function, so it was
able to return NULL with the given probability.

Discussed with: phk


133484 11-Aug-2004 pjd

Try harder to not panic on 'stop -f'.
After the commit, this command should be really safe to use.


133450 10-Aug-2004 le

If we kill the worklist thread of a RAID5 plex we can destroy
the worklist mutex at the same time, so move the mtx_destroy() call
to gv_kill_thread().


133449 10-Aug-2004 le

Lock the topology before calling gv_parse_config, not afterwards.


133448 10-Aug-2004 pjd

- Recognize HARDCODED flag when dumping consumer configuration.
- Improve code readability a bit.


133447 10-Aug-2004 pjd

Forgot to commit those: introduce hardcoded provider functionality,
which allow to store provider's name in the metadata and avoid
problems when few providers share the same last sector.


133444 10-Aug-2004 pjd

Fix one of the latest commits. This bio_caller1 should also be changed to
bio_driver1 (as all the rest).
This introduced a small memory leak, but it wasn't really critical,
because maximum memory for g_stripe_zone is always set, so after few
requests gstripe was working in "economic" mode.


133373 09-Aug-2004 pjd

- Introduce option for hardcoding providers' names into metadata.
It allows to fix problems when last provider's sector is shared between few
providers.
- Bump version number for CONCAT and STRIPE and add code for backward
compatibility.
- Do not bump version number of MIRROR, as it wasn't officially introduced yet.
Even if someone started to play with it, there is no big deal, because
wrong MD5 sum of metadata will deny those providers.
- Update manual pages.
- Add version history to g_(stripe|concat).h files.


133371 09-Aug-2004 pjd

Do not use g_wither_geom(9). It doesn't work in the way which is expected
here anymore (after g_wither_washer() was introduced), i.e. geom and consumer
will not be immediately destroyed if possible.


133356 09-Aug-2004 phk

Too many versions.

Spotted by: pjd


133319 08-Aug-2004 phk

OK, now check geom class version numbers.


133318 08-Aug-2004 phk

Tag all geom classes in the tree with a version number.


133316 08-Aug-2004 phk

OOps, that check was a bit premature. Allow zero versions as well.


133314 08-Aug-2004 phk

Use default method initialization on geoms.


133312 08-Aug-2004 phk

Give classes a version number and refuse to touch classes which are not
understood. This makes room for additional binary compatibility in the
future.

Put fields in the class for the geom's methods and initialize the methods
of a new geom from these fields. This saves some code in all classes.


133205 06-Aug-2004 pjd

Add and document kern.geom.stripe.fast_failed sysctl, which shows how
many times "fast" mode failed.


133204 06-Aug-2004 pjd

Fields bio_caller[12] should be used by the consumer and fields
bio_driver[12] should be used by the provider!


133201 06-Aug-2004 pjd

Fix I/O leakage. We're cloning bios in g_stripe_start_fast(), but when
something goes wrong while running in "fast" mode, we free all bios and
fall back to "economic" mode. Freeing bios doesn't decrease
bio_children, so bio_inbed could never be equal to bio_children and the
request was never finished.
Decrease bio_children manually when destroying bios.

Reported by: Sam Lawrance <boris@brooknet.com.au>, simon
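
The counter mismatch described in this entry can be modelled with two
integers. The sketch below is a userland illustration only: the sk_ names
are hypothetical, and the 'fixed' argument switches between the behaviour
before and after the change.

```c
#include <assert.h>

/* Hypothetical model of the two parent-bio counters named in the log. */
struct sk_bio {
	unsigned	children;	/* clones created from this bio */
	unsigned	inbed;		/* clones that have completed */
};

/* Cloning a child bumps the parent's child count. */
static void
sk_clone(struct sk_bio *p)
{
	p->children++;
}

/* Destroying a clone; only the fixed variant undoes the count. */
static void
sk_destroy_clone(struct sk_bio *p, int fixed)
{
	if (fixed)
		p->children--;
}

/* A child completing is recorded in inbed. */
static void
sk_child_done(struct sk_bio *p)
{
	p->inbed++;
}

/*
 * Simulate: nfast clones are made in "fast" mode, all are destroyed,
 * then necon clones are made in "economic" mode and all complete.
 * Returns 1 if the parent request is considered finished.
 */
static int
sk_fallback_completes(unsigned nfast, unsigned necon, int fixed)
{
	struct sk_bio parent = { 0, 0 };
	unsigned i;

	for (i = 0; i < nfast; i++)
		sk_clone(&parent);
	for (i = 0; i < nfast; i++)
		sk_destroy_clone(&parent, fixed);
	for (i = 0; i < necon; i++)
		sk_clone(&parent);
	for (i = 0; i < necon; i++)
		sk_child_done(&parent);
	return (parent.inbed == parent.children);
}
```

In this model, three abandoned fast-mode clones followed by two completed
economic-mode clones leave children at 5 and inbed at 2 without the fix,
so the completion test never succeeds.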


133173 05-Aug-2004 pjd

Don't use 'bp' after its destruction!


133170 05-Aug-2004 pjd

Simplify a bit - we could use 'sc' here as it was initialized properly.


133142 04-Aug-2004 pjd

- Add two fields to bio structure: 'bio_cflags' which can be used by
consumer and 'bio_pflags' which can be used by provider.
- Remove BIO_FLAG1 and BIO_FLAG2 flags. From now on new fields should be
used for internal flags.
- Update g_bio(9) manual page.
- Update some comments.
- Update GEOM_MIRROR, which was the only one using BIO_FLAGs.

Idea from: phk
Reviewed by: phk


133115 04-Aug-2004 pjd

- Add "prefer" balance algorithm. When used, only disk with the biggest
priority will be used for reading.
- Bump version number.


133114 04-Aug-2004 pjd

MFp4: We don't really need g_mirror_free_disk() function.


133079 03-Aug-2004 pjd

Fix comment.


132988 02-Aug-2004 pjd

- Fix unloading by the same way it is done in my other classes:
set gp->softc to NULL and return ENXIO when it is NULL, so GEOM
will not panic or hang, but unload one device on every 'unload'.
This makes 'unload' command usable, but it has to be executed
<number of devices> + 1 times.
- Made use of 'pp' variable.


132976 01-Aug-2004 pjd

Typo.


132954 01-Aug-2004 pjd

- Launch main provider when there are no more disks in NEW state.
- Log syncid bump at debug level 1.


132941 31-Jul-2004 pjd

If there are no valid components after the timeout, just destroy device.
There is probably nothing to wait for.


132940 31-Jul-2004 le

Propagate size changes upwards.


132938 31-Jul-2004 pjd

Handle spoil event in dedicated function: g_mirror_spoiled().
The difference between the new function and g_mirror_orphan() (which was
used previously) is that syncid is bumped immediately, instead of on
first write, because when consumer was spoiled, it means, that its
provider was opened for writing, so we can't trust that its data
will be valid when it will be connected again.


132923 31-Jul-2004 pjd

Remove unused field.


132922 31-Jul-2004 pjd

Destroy synchronization geom immediately. This should fix unloading without
stopping all mirrors.


132911 31-Jul-2004 pjd

Allow slice creation on providers from MIRROR class.
This should allow mounting root file system from a mirror.


132909 31-Jul-2004 pjd

Add '-p' option for 'insert' command which allows to specify priority
of the new component.
Version number wasn't bumped (it should be), because I think there are
no geom_mirror users yet.


132908 31-Jul-2004 pjd

- Check if 'slice' argument was given.
- Check if disk isn't already the mirror component.


132907 31-Jul-2004 pjd

Dump correct field.


132906 30-Jul-2004 le

Set the access counts of a subdisk correctly when attaching it
to a plex that already has subdisks.


132904 30-Jul-2004 pjd

Add GEOM_MIRROR class which provides RAID1 functionality and has many useful
features. The gmirror(8) utility should be used for control of this class.
There is no manual page yet, but I'm working on it with keramida@.

Many useful tests provided by: simon (thank you!)
Some ideas from: scottl, simon, phk


132896 30-Jul-2004 pjd

Nuke geom_mirror class. New geom_mirror class is in the way.

Approved by: phk


132895 30-Jul-2004 pjd

Allow to create slices on providers from class LABEL and class NOP.
This is really ugly way to do this, but there is no other way for now.
It allows to mount root file system from providers which belong to
those classes.

Approved by: phk


132877 30-Jul-2004 pjd

- Add '-S' option, which allow to specify sector size for transparent
provider.
- Bump version number.

This allows for a quite interesting trick. One can setup a stripe with
stripe size of 512 bytes and create transparent provider on top of it
with sector size equal to <ndisks> * 512. The result will be something
like RAID3 without parity disk (every access will touch all disks).
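
Why every access touches all disks can be checked with a little
arithmetic. The sketch below assumes the usual RAID0 layout, in which
stripe number k lives on disk k % ndisks; that layout and the function
name are assumptions made for illustration, not taken from the gstripe
source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return a bitmask of the disks touched by an
 * I/O request of 'length' bytes at byte 'offset' on a stripe built
 * from 'ndisks' disks with the given stripe size.
 * Assumes stripe k is stored on disk (k % ndisks).
 */
static unsigned
sk_disks_touched(uint64_t offset, uint64_t length, uint64_t stripesize,
    unsigned ndisks)
{
	uint64_t first, last, s;
	unsigned mask = 0;

	assert(length > 0 && stripesize > 0);
	assert(ndisks > 0 && ndisks <= 32);
	first = offset / stripesize;
	last = (offset + length - 1) / stripesize;
	for (s = first; s <= last; s++)
		mask |= 1u << (s % ndisks);
	return (mask);
}
```

With four disks and a 512-byte stripe size, a sector of 4 * 512 = 2048
bytes always starts on a multiple of 2048 and therefore covers four
consecutive stripes, one per disk, while a plain 512-byte request touches
a single disk.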


132833 29-Jul-2004 le

Shut up the compiler and temporarily '#if 0' gv_destroy_geom(),
until we need it again.


132665 26-Jul-2004 pjd

Improve geom(8)'s 'list' command to show geoms and their providers and
consumers. Teach STRIPE, CONCAT and NOP classes about this improvement.


132664 26-Jul-2004 pjd

Change naming scheme from /dev/<name>.stripe to /dev/stripe/<name>.


132663 26-Jul-2004 pjd

Change naming scheme from /dev/<name>.concat to /dev/concat/<name>.


132662 26-Jul-2004 pjd

M_WAITOK is ok here, while I'm using M_WAITOK later in this function.


132661 26-Jul-2004 pjd

M_WAITOK is ok here, while I'm using M_WAITOK later in this function.


132654 26-Jul-2004 le

Save the vinum config back to disk after syncing two plexes.


132642 25-Jul-2004 le

There's a chance that the VINUMDRIVE class tastes before the
VINUM class, so let the VINUMDRIVE class parse the on-disk
configuration, too.


132631 25-Jul-2004 le

Check for a NULL pointer before dereferencing it.


132617 24-Jul-2004 le

Use a temporary geom when tasting vinumdrives and lock the 'real'
vinumdrive geom with an exclusive bit. This should fix the problem
when underlying partitions overlap (i.e. the 'a' partition is at
the same offset as the 'c' partition).

Ideas borrowed from pjd@, quite a bit of testing by
Matthias Schuendehuette <msch@snafu.de>.


132607 24-Jul-2004 le

Disable kldunloading of geom_vinum temporarily until I figured out
how to do it correctly.


132381 19-Jul-2004 pjd

MFp4: Add two options for gnop(8)'s 'create' command:
-o offset - specifies where to start on the original provider
-s size - specifies size of the transparent provider


132355 18-Jul-2004 pjd

Fix copy&paste bug.


132342 18-Jul-2004 pjd

Fix exclusive-bit leakage.


132199 15-Jul-2004 phk

Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".


132098 13-Jul-2004 pjd

Remove unused macro.


132097 13-Jul-2004 pjd

Decrease log level of one debug message, so there is no hole (level 2
wasn't used at all).


132095 13-Jul-2004 pjd

Minor sysctl description fixes.

Submitted by: simon


131878 09-Jul-2004 pjd

Implement "FAST" mode for GEOM_STRIPE class and turn it on by default.

In this mode you can setup even very small stripe size and you can be
sure that only one I/O request will be sent to every disk in stripe.
It consumes some more memory, but if allocation fails, it will fall
back to "ECONOMIC" mode.

It is about 10 times faster for small stripe size than "ECONOMIC" mode
and other RAID0 implementations. It is even recommended to use this
mode and small stripe size, so our requests are always split.

One can still use "ECONOMIC" mode by setting kern.geom.stripe.fast to 0.
It is also possible to setup maximum memory which "FAST" mode can consume,
by setting kern.geom.stripe.maxmem from /boot/loader.conf.


131877 09-Jul-2004 phk

Only detach consumers which are attached when we wither stuff away.

Pointed out by: pjd


131820 08-Jul-2004 phk

Make withering water tight.

When we orphan/wither a provider, an attached geom+consumer could
end up being withered as a result and it may be in front of us in
the normal object scanning order so we need to do multi-pass. On
the other hand, there may be withering stuff we can't get rid off
(yet), so we need to keep track of both the existence of withering
stuff and if there is more we can do at this time.


131798 08-Jul-2004 phk

Fail normally rather than KASSERT if attempt to open a spoiled consumer.


131718 06-Jul-2004 pjd

Add missing argument.


131716 06-Jul-2004 pjd

Properly free resources if g_access() fails.


131649 05-Jul-2004 pjd

- Add 'stop' command, which works just like 'destroy' command, but sounds
less dangerous.
- Update manual pages and extend examples.
- Bump versions.


131625 05-Jul-2004 pjd

g_clone_bio() can fail, be ready for this.

Approved by: le


131568 04-Jul-2004 phk

We only need to check for overlaps if we are increasing access counts.


131476 02-Jul-2004 pjd

Introduce GEOM_LABEL class.
This class is used for detecting volume labels on file systems:
UFS, MSDOSFS (FAT12, FAT16, FAT32) and ISO9660.
It also provides native labelization (there is no need for file system).

g_label_ufs.c is based on geom_vol_ffs from Gordon Tetlow.
g_label_msdos.c and g_label_iso9660.c are probably hacks, I just found
where volume labels are stored and I use those offsets here,
but with this class it should be easy to do it as it should be done by
someone who knows how.
Implementing volume labels detection for other file systems also should
be trivial.

New providers are created in those directories:
/dev/ufs/ (UFS1, UFS2)
/dev/msdosfs/ (FAT12, FAT16, FAT32)
/dev/iso9660/ (ISO9660)
/dev/label/ (native labels, configured with glabel(8))

Manual page cleanups and some comments inside were submitted by
Simon L. Nielsen, who was, as always, very helpful. Thanks!


131411 01-Jul-2004 pjd

Remove unused argument for good.


131408 01-Jul-2004 pjd

Free only if pointer isn't NULL.


131267 29-Jun-2004 phk

Fix regression in last commit.


131207 27-Jun-2004 phk

Make sure to kill the devstat entry for disappearing disks.

PR: 68074
Submitted by: Hendrik Scholz <hscholz@raisdorf.net>


131188 27-Jun-2004 pjd

Introduce a hack that will make geom_gate to work with read-only mounts.
Now, when trying to mount file system in read-only mode it tries to
open a device for writing to be able to update to read-write mode
later. Ehh.

Discussed with: phk


131160 26-Jun-2004 rwatson

The g_up and g_down threads use a local 'mymutex' mutex to allow WITNESS
to warn about attempts to sleep in the I/O path. This change pushes the
definition and use of 'mymutex' behind #ifdef WITNESS to avoid the cost
in non-debugging cases. This results in a clear .22% performance win for
512 byte and 1k I/O tests on my SMP test box. Not much, but every bit
counts.


131107 25-Jun-2004 le

Mark a plex as 'newborn' when it is created. This is used to indicate
that new RAID5 plexes need to be initialized first.


131046 24-Jun-2004 pjd

Don't force class to give a valid softc to g_slice_new(), it is not always
needed.

Approved by: phk


131015 24-Jun-2004 csjp

Currently, if the drives specified for volume creation are
not active GEOM providers, it will result in a kernel panic.

If the GEOM provider or disk goes away before the volume
configuration data gets written to the disk, it will result
in another kernel panic.

o Make sure that the drives specified for volume creation
are active GEOM providers.

o When writing out volume configuration data to associated drives,
make sure that the GEOM provider is active, otherwise continue
to the next drive in the volume.

Approved by: le, bmilekic (mentor)


131000 23-Jun-2004 le

Add a function to clean up RAID5 packets and use it when I/O has
finished or when building the complete packet fails.


130997 23-Jun-2004 le

Remove two debugging printfs that are currently rather disturbing
than helpful.


130990 23-Jun-2004 le

Accept "sd len 0" and auto-size the subdisk correctly.

Spotted by: csjp


130930 22-Jun-2004 le

No need to free the softc, because it wasn't allocated.


130925 22-Jun-2004 le

Don't sleep in the g_down path. More error checks to come.


130875 21-Jun-2004 phk

Kill g_access_rel() already now before we send it down 5-stable


130836 21-Jun-2004 pjd

Don't hold topology lock while calling g_gate_release().

Found by: KASSERT()


130712 19-Jun-2004 phk

Duplicate the securelevel check from spec_vnops.c here.


130697 18-Jun-2004 le

Clean up allocated resources when destroying the main vinum geom.


130651 17-Jun-2004 phk

Reduce the thaumaturgical level of root filesystem mounts: Instead of using
an otherwise redundant clone routine in geom_disk.c, mount a temporary
DEVFS and do a proper lookup.

Submitted by: thomas


130640 17-Jun-2004 phk

Second half of the dev_t cleanup.

The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.


130597 16-Jun-2004 le

Handle dead disks in a somewhat sane way.


130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


130542 15-Jun-2004 le

Fix several bugs related to subdisk drive_offset calculation.


130478 14-Jun-2004 le

Don't free a VINUMDRIVE softc when it's orphaned or spoiled. All
allocated resources should be ultimately freed in gv_destroy_geom()
(when unloading the module and not earlier), but I need to look at this
more closely.


130477 14-Jun-2004 le

Correctly calculate subdisk offset in RAID5 plexes.


130389 12-Jun-2004 le

Add a first version of a GEOMified vinum.


130280 09-Jun-2004 phk

Make the sysctl kern.geom.collectstats more granular.

Bit 0 controls statistics collection on GEOM providers.
Bit 1 controls statistics collection on GEOM consumers.

Default value is 1.

Prodded by: scottl
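
The two-bit layout described above can be sketched as a pair of mask tests. This is only an illustration: the macro and function names are hypothetical and are not taken from the GEOM sources. Only the meaning of bit 0 and bit 1 comes from the commit message.

```c
#include <stdbool.h>

/* Hypothetical names; only the bit assignments come from the log entry. */
#define COLLECT_PROVIDER_STATS 0x1u /* bit 0: statistics on GEOM providers */
#define COLLECT_CONSUMER_STATS 0x2u /* bit 1: statistics on GEOM consumers */

/* True when the sysctl value asks for provider statistics. */
static bool
collects_provider_stats(unsigned int value)
{
	return ((value & COLLECT_PROVIDER_STATS) != 0);
}

/* True when the sysctl value asks for consumer statistics. */
static bool
collects_consumer_stats(unsigned int value)
{
	return ((value & COLLECT_CONSUMER_STATS) != 0);
}
```

With the default value of 1, only provider statistics are collected; a value of 3 would enable both.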


130193 07-Jun-2004 pjd

Fix format string.


130191 07-Jun-2004 pjd

Don't allow creation of duplicate entries.


129963 01-Jun-2004 joerg

Add SVR4-compatible VTOC-style elements to the Sun label. The
FreeBSD kernel doesn't use them but sunlabel(8) shortly will,
and both these files are used by sunlabel(8).


129877 30-May-2004 phk

Zap a redundant NULL


129747 26-May-2004 pjd

Dump some more information:
- device state
- list of used providers
- total number of disks
- number of disks online

Prodded by: Alex Deiter <tiamat@komi.mts.ru>


129548 21-May-2004 pjd

- Change command name from 'config' to 'configure'.
- Bump version number.


129478 20-May-2004 pjd

- Teach CONCAT class how to talk with geom(8).
- Remove provider if any disk was lost.
- Dump CONCAT version.

Supported by: Wheel - Open Technologies - http://www.wheel.pl


129473 20-May-2004 pjd

Introduce STRIPE GEOM class. It implements RAID0 transformation and
is intended to be fast. Just like the CONCAT class, it provides manual
and auto configuration methods.

Supported by: Wheel - Open Technologies - http://www.wheel.pl


129471 20-May-2004 pjd

Introduce NOP GEOM class. This is a totally transparent GEOM class, but
it is very useful for tests. One can destroy its provider forcibly to
test how other classes handle such events. One can also specify a
failure probability to check how other classes handle I/O errors.

Supported by: Wheel - Open Technologies - http://www.wheel.pl


129116 11-May-2004 sos

Don't try to finish devstats if the disk pointer is NULL; this can happen
when a disk has been destroyed but still has outstanding bios.

Reviewed by: phk


128957 05-May-2004 pjd

Close some small wakeup<->msleep races.


128913 04-May-2004 pjd

Fix compilation on 64-bit architectures.

Noticed by: Tinderbox


128889 03-May-2004 pjd

Turn off debugging by default.


128887 03-May-2004 pjd

Prefer signed type over unsigned to be able to assert negative
reference count.


128881 03-May-2004 pjd

- Hold g_gate_list_mtx lock while generating/checking unit number.
Found by: mtx_assert() g_gate.c:273
- Set command before returning to userland with ENOMEM error value.
Found by: assert() ggatel.c:108


128835 02-May-2004 pjd

Make it compile on 64-bit architectures.
The biggest issue was that 16-bit atomic operations aren't supported
on all architectures.


128760 30-Apr-2004 pjd

Kernel bits of GEOM Gate.


128747 30-Apr-2004 marcel

Allow disks with a GPT to be used on big-endian machines. The GPT is
little-endian by definition and needs byte-swap operations for any
multi-byte field. While here fix indentation.
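
The byte-order point above can be illustrated by decoding fields one byte at a time, which gives the same result on little-endian and big-endian hosts. This is a sketch, not the code from this commit; the function names are hypothetical. FreeBSD's <sys/endian.h> provides helpers of this kind.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian field from raw on-disk bytes. */
static uint32_t
le32_decode(const unsigned char *p)
{
	return ((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}

/* Decode a 64-bit little-endian field as two 32-bit halves. */
static uint64_t
le64_decode(const unsigned char *p)
{
	return ((uint64_t)le32_decode(p) |
	    ((uint64_t)le32_decode(p + 4) << 32));
}
```

Because the decoding never reinterprets the buffer as a native integer, no conditional byte swap is needed.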


128486 20-Apr-2004 pjd

- Don't check if 'gp' is non-NULL, it always is and GEOM wants to
dump geom configuration when 'pp' and 'cp' are NULL.
- Use tabs instead of spaces.


127863 04-Apr-2004 pjd

Calculate bio_completed properly or die!

Approved by: phk


127699 01-Apr-2004 grehan

Move the name attribute to the end of the conftxt line to simplify
libdisk parsing (the name may be empty, or contain spaces).

Submitted by: Suleiman Souhlal <refugee@segfaulted.com>


127162 18-Mar-2004 pjd

Move "is consumer attached?" check before G_VALID_PROVIDER() check,
because if consumer is not attached, its provider never will be valid,
so we never reach this check.

Approved by: phk


126832 11-Mar-2004 phk

Be more insistent on destroying geoms at unload time. Still not perfect,
but it will do (better) for now.

KASSERT that to have providers a class must have an access method.

Tag the new_provider event with the geom as well.


126798 10-Mar-2004 phk

Rearrange some of the GEOM debugging tools to be more structured.

Retire g_sanity() and corresponding debugflag (0x8)

Retire g_{stall,release}_events().

Under #ifdef DIAGNOSTIC:

Make g_valid_obj() an official function and have it return a
non-zero integer which indicates the kind of object when found.

Implement G_VALID_{CLASS,GEOM,CONSUMER,PROVIDER}() macros based
on g_valid_obj().

Sprinkle calls to these macros liberally over the infrastructure.

Always check that we do not free a live object.


126773 09-Mar-2004 pjd

- Don't take sectorsize from first disk. Calculate it by finding the
least common multiple of all disks' sector sizes.
This allows disks with different sector sizes to be concatenated safely.
- Mark unused function arguments.
- Other minor cleanups.
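
The sector size calculation described in the first item can be sketched as a fold of the least common multiple over all member disks. This is a minimal illustration with hypothetical function names, not the code from geom_concat; it also assumes the result fits in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Greatest common divisor by Euclid's algorithm. */
static uint32_t
gcd_u32(uint32_t a, uint32_t b)
{
	while (b != 0) {
		uint32_t t = a % b;

		a = b;
		b = t;
	}
	return (a);
}

/* Least common multiple; divide first to limit intermediate growth. */
static uint32_t
lcm_u32(uint32_t a, uint32_t b)
{
	assert(a != 0 && b != 0);
	return (a / gcd_u32(a, b) * b);
}

/* Smallest sector size that every member's sector size divides evenly. */
static uint32_t
common_sectorsize(const uint32_t *sizes, size_t n)
{
	uint32_t result;
	size_t i;

	assert(n > 0);
	result = sizes[0];
	for (i = 1; i < n; i++)
		result = lcm_u32(result, sizes[i]);
	return (result);
}
```

For members with 512-, 2048- and 4096-byte sectors this yields 4096, so any request aligned to the combined provider's sector size is also aligned on each member.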


126772 09-Mar-2004 pjd

Print a space character between string given as a macro argument and
bio description.


126726 07-Mar-2004 phk

Don't panic on providers already withered when we wither a geom.


126674 05-Mar-2004 jhb

kthread_exit() no longer requires Giant, so don't force callers to acquire
Giant just to call kthread_exit().

Requested by: many


126589 04-Mar-2004 pjd

Correct year in copyrights.


126565 03-Mar-2004 pjd

- Remove d_valid field, we can use d_consumer field to check if disk
is valid.
- Use SYSCTL_DECL() instead of using own, ugly extern.


126450 01-Mar-2004 pjd

Removed unused fields.


126449 01-Mar-2004 pjd

We don't need d_length field.


126315 27-Feb-2004 pjd

Even if we're sure that we can't be orphaned here, we have to define
orphan field - we're enforcing it in GEOM. This will reach KASSERT
in INVARIANTS case.

Add missing space.

Approved by: scottl (mentor)


126314 27-Feb-2004 pjd

Remove unused field.

Approved by: scottl (mentor)


126080 21-Feb-2004 phk

Device megapatch 4/6:

Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.


126007 19-Feb-2004 pjd

Introduce CONCAT GEOM class for disk concatenation.
It allows manual and automatic (based on on-disk metadata) concatenation.

Reviewed by: phk, scottl
Approved by: scottl (mentor)


125975 18-Feb-2004 phk

Change the disk(9) API in order to make device removal more robust.

Previously the "struct disk" was owned by the device driver, and this
gave us problems when the device disappeared and the users of that device
did not disappear immediately.

Now the struct disk is allocated with a new call, disk_alloc(), is owned
by geom_disk, and is just abandoned by the device driver when
disk_create() is called.

Unfortunately, this results in a ton of "s/\./->/" changes to device
drivers.

Since I'm doing the sweep anyway, a couple of other API improvements
have been carried out at the same time:

The Giant awareness flag has been flipped from DISKFLAG_NOGIANT to
DISKFLAG_NEEDSGIANT

A version number has been added to disk_create() so that we can detect,
report and ignore binary drivers with old ABI in the future.

Manual page update to follow shortly.


125803 14-Feb-2004 phk

Do not check error code from closing ->access() calls, we know they succeed.


125802 14-Feb-2004 phk

Add a KASSERT which checks that a class never fails a closing ->access()
call.


125755 12-Feb-2004 phk

Remove the absolute count g_access_abs() function since experience has
shown that it is not useful.

Rename the relative count g_access_rel() function to g_access(), only
the name has changed.

Change all g_access_rel() calls in our CVS tree to call g_access() instead.

Add an #ifndef BURN_BRIDGES #define of g_access_rel() for source
code compatibility.


125743 12-Feb-2004 phk

Give both consumers and providers a {void *private, u_int index} which
the implementing class can use to hang internal info from.


125713 11-Feb-2004 pjd

Added g_print_bio() function to print information about a given bio.

Approved by: phk, scottl (mentor)


125657 10-Feb-2004 pjd

Now we have g_topology_assert_not(), so use it to detect deadlocks.

Approved by: phk, scottl (mentor)


125656 10-Feb-2004 pjd

Added macro which will be used to assert that the topology lock is not held.

Approved by: phk, scottl (mentor)


125651 10-Feb-2004 phk

don't call sbuf_clear() right after sbuf_new(), it is not necessary.


125591 08-Feb-2004 phk

Polish the work/state engine in preparation for HW-crypto support.


125590 08-Feb-2004 phk

Add a missing error case return.

Problem reported by: Flemming Jacobsen <fj@batmule.dk>


125579 07-Feb-2004 phk

We don't need to hold Giant to create the worker kthread.


125539 06-Feb-2004 pjd

Allow decreasing access count even if there is no disk anymore.
This will allow closing disks that were removed while opened.

Approved by: phk, scottl (mentor)


125538 06-Feb-2004 le

Fix memory leak.

PR: kern/58634
Submitted by: le
Approved by: phk


125342 02-Feb-2004 phk

Allow a GEOM class to unload if it has no geoms or a method function to
get rid of them.

Prodded by: pjd


125332 02-Feb-2004 pjd

- Use proper names in KASSERTs.
- Typos.

Approved by: phk, scottl (mentor)


125325 02-Feb-2004 phk

Check error return from g_clone_bio(). (netchild@)

Rearrange code to avoid duplication (phk@)

Submitted by: netchild@


125318 02-Feb-2004 phk

Don't mingle malloc/g_event flags.

Spotted by: pjd@


125137 28-Jan-2004 phk

Bring back the geom_bioqueues, they _are_ a good idea.

ATA will use these RSN.


124885 23-Jan-2004 phk

Make sure to keep track of canceled events.

Submitted by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>


124883 23-Jan-2004 phk

Add KASSERTS.

Submitted by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>


124881 23-Jan-2004 phk

Plug an insignificant memory leak.

Submitted by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>


124880 23-Jan-2004 phk

Add missing newline in printf.

Submitted by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>


124869 23-Jan-2004 phk

Remove the MD5_KEY debugging tool


124864 23-Jan-2004 phk

Remove no longer necessary debug printfs


124371 11-Jan-2004 phk

Print the correct pointer in a KASSERT.

Submitted by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>


124294 09-Jan-2004 phk

KASSERT against no-op access requests.

Submitted by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>


123761 23-Dec-2003 phk

Prevent withering of the provider we're orphaning from happening until
we do it ourselves.

Nailed by: Simon Heath <heath@cng.fr>


123271 07-Dec-2003 truckman

Correct usage of mtx_init() API. This is not a functional change since
the code happened to work because MTX_DEF and NULL are both defined as 0.

Reviewed by: phk


123233 07-Dec-2003 phk

KASSERT against multiple orphanings of providers.


123215 07-Dec-2003 scottl

Re-arrange and consolidate some random debugging stuff


122888 18-Nov-2003 phk

Call class->init() and class->fini() while the class is hooked up,
rather than right before and right after. This allows these routines
to manipulate the mesh.

KASSERT that nobody creates a geom on an alien class.

Assert topology in g_valid_obj().

Approved by: re@


122880 18-Nov-2003 phk

Fix a harmless bug and add a ')' in a debugging printf.

Submitted by: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>


122762 15-Nov-2003 phk

This is a crude bandaid for 5.2 to protect against providers which disappear
while being tasted. I can trigger this fairly easily with atapi-cd, but
I do not fully understand the circumstances.


122550 12-Nov-2003 phk

Make sure to return errors if we have any.

Submitted by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>


121476 24-Oct-2003 phk

Close the right consumers if we run into trouble opening them all.

Submitted by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>


121475 24-Oct-2003 phk

Fix two old/new consumer confusions.

Submitted by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>


121366 22-Oct-2003 phk

Fix a braino memory leak.

Found by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>


121323 22-Oct-2003 phk

Forgotten commit: If a provider has zero sectorsize, it is an
indication of lack of media.

Tripped up: peter


121253 19-Oct-2003 phk

Remove KASSERT check for negative bio_offsets, add "normal" EIO
error return for same.


121216 18-Oct-2003 phk

Retire bio_blkno entirely.

bio_offset is the field drivers should use.
bio_pblkno remains as a convenient place to store the number of
the device drivers.


121030 12-Oct-2003 phk

Assume that bp->bio_offset is correctly initialized.

This fixes non-power-of-2 blocksize GEOM I/O.


121029 12-Oct-2003 phk

Destroy providers marked with G_PF_WITHER when the last consumer has detached.


120876 07-Oct-2003 phk

Interior decoration changes.


120852 06-Oct-2003 phk

Allow our bio tools to be used for local bio-chopping by not mandating
a bio_from value. bio_to is still mandated (mostly for debugging) and
shall be copied from the parent bio.


120851 06-Oct-2003 phk

Introduce a per provider wither flag


120572 29-Sep-2003 phk

Return ENODEV in case the driver has no dump routine.


120506 27-Sep-2003 phk

The present defaults for open and close in device drivers which
provide no methods do not make any sense, and are not used by any
driver.

It is pretty hard to come up with even a theoretical concept of
a device driver which would always fail open and close with ENODEV.

Change the defaults to be nullopen() and nullclose(), which simply
do nothing.

Remove explicit initializations to these from the drivers which
already used them.


120493 26-Sep-2003 phk

Add more KASSERTS().


120374 23-Sep-2003 phk

Be more careful in dumpconf: softc may be NULL for departing devices.

Allow drivers to initialize the d_devstat if they want magic params.


119973 11-Sep-2003 phk

Reorder a couple of KASSERTS to give more sensible messages.

Found by: GEOM 101 class of '03


119891 08-Sep-2003 phk

Correct bzero length so we clear the entire key structure.


119809 06-Sep-2003 phk

Bzero the right number of bytes.

Found by: Juergen Buchmueller <pullmoll@stop1984.com>
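
This entry and the one before it both correct the length passed when zeroing a buffer. The log does not say how the wrong length arose. One common cause, shown here purely as an assumed illustration, is taking sizeof of a pointer instead of the object it points to. The structure and function names are hypothetical, and memset() stands in for bzero().

```c
#include <string.h>

/* Hypothetical key structure, not the one used by the real code. */
struct key_material {
	unsigned char bytes[64];
};

/* Zero the whole structure that k points to. */
static void
wipe_key(struct key_material *k)
{
	/*
	 * sizeof(*k) is the size of the structure (64 bytes here);
	 * sizeof(k) would only be the size of the pointer and would
	 * leave most of the key in memory.
	 */
	memset(k, 0, sizeof(*k));
}
```

Writing the length as sizeof(*k) keeps it tied to the object type, so it stays correct if the structure later changes size.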


119749 04-Sep-2003 phk

Make sure to return ENOIOCTL if the ioctl is not handled.


119660 01-Sep-2003 phk

Simplify the ioctl handling in GEOM.

This replaces the current ioctl processing with a direct call path
from geom_dev() where the ioctl arrives (from SPECFS) to any directly
connected GEOM class.

The inverse of the above is no longer supported. This is the
situation where you have one or more intervening GEOM classes, for
instance a BSDlabel on top of a MBR or PC98. If you want to issue
MBR or PC98 specific ioctls, you will need to issue them on a MBR
or PC98 provider.

This paves the way for inviting CD's, FD's and other special cases
inside GEOM.


119652 01-Sep-2003 phk

Try to close the race between disk_destroy() and a subsequent disk_create().


119593 30-Aug-2003 phk

Add the new g_dev_getprovider() function, the swap_pager needs it now.

Spotted by: mr


119300 22-Aug-2003 ps

Change the size fields to daddr_t to support greater than 2TB ccd volumes.

Reviewed by: phk


119299 22-Aug-2003 phk

Make CCD unloadable.


119298 22-Aug-2003 phk

Don't panic over the fact that unloading failed if we already knew that.


119296 22-Aug-2003 phk

Block all GETATTR calls hitting the CCD, we wouldn't know which child
device should handle them.

This prevents for instance GEOM::ioctl requests from reaching a
lower BSDlabel node, which ps@ found would confuse newfs(8).


119295 22-Aug-2003 phk

Check for null softc pointers; this happens when a ccd is withering.

Found by: David Schultz <dschultz@OCF.Berkeley.EDU>


118869 13-Aug-2003 phk

Replace a panic with a .1Hz retry loop.
Not a perfect solution, but far cheaper than one.


118855 13-Aug-2003 phk

In case we encounter a zero sectorsize provider in g_io_check(), fail
the request with a printf rather than a divide by zero error.


118355 02-Aug-2003 phk

Kick Giant compatibility one layer up.


118182 29-Jul-2003 phk

Fix a memory leak in CCD's mirror code.


118150 29-Jul-2003 phk

Implement DOSPTYP_EXTLBA more completely: loop until we find no more
partitions.

Submitted by: Rudolf Cejka <cejkar@fit.vutbr.cz>
PR: 53719


117342 08-Jul-2003 phk

Handle geoms which are withering away specially in the dump functions.


117150 02-Jul-2003 phk

Only dump 512 bytes of debugging.

Always wait for things to settle before returning.


116522 18-Jun-2003 phk

Sleep on "-" in our normal state to simplify debugging.


116518 18-Jun-2003 phk

Add "GEOM_FOX", a class which detects and selects between multiple
redundant paths to the same device.

This class reacts to a label in the first sector of the device,
which is created the following way:

# "0123456789abcdef012345..."
# "<----magic-----><-id-...>
echo "GEOM::FOX someid" | dd of=/dev/da0 conv=sync

NB: Since the fact that multiple disk devices are in fact the same
device is not known to GEOM, the geom taste/spoil process cannot
fully catch all corner cases and this module can therefore be
confused if you do the right wrong things.

NB: The disk level drivers need to do the right thing for this to
be useful, and that is not by definition currently the case.


116196 11-Jun-2003 obrien

Use __FBSDID().

Approved by: phk


116107 09-Jun-2003 phk

Fix error handling for ENOMEM style issues.


115960 07-Jun-2003 phk

Improve the root-dev prompt facility for printing devices which could
possibly be a root filesystem.


115959 07-Jun-2003 phk

Wait for everything to settle before we try to print the list of
geom devices.


115958 07-Jun-2003 phk

Make sure we return an error message if the geom parameter is not
located.


115953 07-Jun-2003 phk

Polishing and nitpicking.


115951 07-Jun-2003 phk

Drop a memory-corruption debugging test-tool.


115949 07-Jun-2003 phk

Add missing va_end() calls.

Noticed by: tmm
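
As a reminder of the rule behind this fix, every va_start() must be paired with a va_end() in the same function before it returns. The sketch below shows the pattern with a hypothetical helper; it is not the function touched by this commit.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Format a message into buf; returns what vsnprintf() returns. */
static int
format_message(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);	/* required on every path after va_start() */
	return (n);
}
```

Storing the result in a local variable, rather than returning directly from the vsnprintf() call, leaves room for the va_end() before the return.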


115850 04-Jun-2003 phk

Introduce g_provider_by_name() function, and use it.


115849 04-Jun-2003 phk

Make this a true GEOM class:
Attach to the component devices using GEOM semantics.
Create a GEOM provider instead of using disk_create()
Use the GEOM OAM api for configuration.

I saw approximately 1% speedup in throughput and 7% in latency in a
simple-minded test of a two-disk striped device.

This file was repo-copied from src/sys/dev/ccd/ccd.c.

This is not yet linked into the build.


115845 04-Jun-2003 phk

Add a KASSERT to prevent the same GEOM class from being loaded
twice.

Enforce that classes should have different names while we are here.


115731 02-Jun-2003 phk

Further deevilification of CCD:

Change the list interface to simplify things.
Remove old list ioctls which bogusly exported the softc to userland.
Move the softc and associated structures from the public header to
the source file.


115729 02-Jun-2003 phk

Begin deevilification of CCD:

Make CCD a GEOM class.

For now only use this for implementing a OAM config method which
can return a list of configured CCD devices in the format which
"ccdconfig -g[v]" would normally output.


115726 02-Jun-2003 phk

Return an indicative error message.


115624 01-Jun-2003 phk

Simplify the GEOM OAM api: Drop the request type, and let everything
hinge on the "verb" parameter which the class gets to interpret as
it sees fit.

Move the entire request into the kernel and move changed parameters
back when done.


115623 01-Jun-2003 phk

constify g_sanity()


115611 01-Jun-2003 phk

Use bcmp() to compare hash strings.


115517 31-May-2003 phk

Remove unused variable.
Remove unneeded return;

Found by: FlexeLint


115515 31-May-2003 phk

Remove unused variables.

Found by: FlexeLint


115512 31-May-2003 phk

Remove unused variables.
Rename struct h0h0 to g_hh01 in order to make it unique over files.

Found by: FlexeLint


115509 31-May-2003 phk

Remove unused variables.
Remove #ifdef notyet which will never become.

Found by: FlexeLint


115508 31-May-2003 phk

Remove unused variable.
Remove unneeded return.

Found by: FlexeLint


115507 31-May-2003 phk

Remove unused variable.

Found by: FlexeLint


115506 31-May-2003 phk

Add a destroy_geom method to the slice "library".
If a slice class has no destroy_geom method, use this one.

This should allow all slicers to kldload.


115505 31-May-2003 phk

Don't use & in front of arrays.

Found by: FlexeLint


115504 31-May-2003 phk

Remove unused variable.

Found by: FlexeLint


115492 31-May-2003 phk

Remove unused variable.

Found by: FlexeLint


115473 31-May-2003 phk

Introduce init and fini member functions on a class.

Use ->init() and ->fini() to handle the mutex in geom_disk.c

Remove the g_add_class() function and replace it with a standardized
g_modevent() function.

This adds the basic infrastructure for loading/unloading GEOM classes


115468 31-May-2003 phk

Remove the G_CLASS_INITIALIZER, we do not need it anymore.


115460 31-May-2003 phk

Use le_uuid_dec() since GPT UUID's are always in LE format.

Tested by: Marcel


115309 25-May-2003 phk

Don't do silly things if the disk_create() event gets canceled.

Approved by: re/scottl


115214 21-May-2003 phk

Return ENXIO if the softc pointer is NULL, in all likelihood the
disk is in the process of disappearing.

Approved by: re/rwats*


114958 12-May-2003 phk

When a disk disappears, destroy the class from the event thread
to avoid a race condition.

Approved by: re/rwatson


114864 09-May-2003 phk

When a GEOM (/dev-)device is closed and we find that I/O requests are
still outstanding, give them a chance to complete.

If after 10 seconds we still find outstanding I/O requests, complete
the close with a console warning that the system is likely to panic
later on.

This is a workaround for umount -f not quite doing the right thing.

Approved by: re/scottl


114795 07-May-2003 phk

Hide the "ENOMEM" notice messages behind bootverbose. They are still
a valuable debugging tool for certain kinds of problems.

Approved by: re/scottl


114785 06-May-2003 phk

Fix the WARNING for wrong rawoffset, I tested incompatible units.

Approved by: re/jhb


114736 05-May-2003 phk

Avoid double-free panic.

Tripped up: DougB


114720 05-May-2003 phk

Re-order the initialization slightly to improve structure.


114715 05-May-2003 phk

Use a dedicated malloc(9) bucket for sector storage.


114712 05-May-2003 phk

Don't warn if the rawoffset is zero, that is actually the best value it
could have.


114705 05-May-2003 phk

Turn the check that rawoffset == mbroffset into a warning instead.


114672 04-May-2003 phk

Only accept a rawoffset if it is identical to the mbroffset.


114671 04-May-2003 phk

Add a way to read the current mbroffset from a BSD label class.


114670 04-May-2003 phk

Add gctl_set_param() function.


114668 04-May-2003 phk

Remove debugging printfs which should not have been committed.


114568 03-May-2003 phk

Add a OAM interface for changing the label and writing the boot code.


114566 03-May-2003 phk

remove unused variables.

Spotted by: dougb


114556 02-May-2003 phk

Make bsd_disklabel_le_enc calculate the checksum and fill it in.
(If there is a legitimate need to correctly encode and pack a
disklabel with an invalid checksum custom tools can be built for
that.)

Make bsd_disklabel_le_dec() validate the magics, number of partitions
(against a new parameter) and the checksum.

Vastly simplify the logic of the GEOM::BSD class implementation:

Let g_bsd_modify() always take a byte-stream label.

This simplifies all users, except the ioctl's which now have to
convert to a byte-stream first. Their loss.

g_bsd_modify() is called with topology held now, and it returns
with it held.

Always update the md5sum in g_bsd_modify(), otherwise the check
is no use after the first modification of the label. Make the
MD5 over the bytestream version of the label.

Move the rawoffset hack to g_bsd_modify() and remove all the
inram/ondisk conversions.

Don't configure hotspots in g_bsd_modify(), do it in taste instead,
we do not support moving the label to a different location on the
fly anyway.

This passes all current regression tests.


114548 02-May-2003 phk

Pull in bcopy() prototype from <string.h> when compiled in userland.


114543 02-May-2003 phk

Considering that I did cast the arguments to (intmax_t) I must have
been sleepy since I used %qd instead of %jd.
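
The pairing this entry refers to is the C99 one: a value cast to intmax_t is printed with the "j" length modifier. A minimal sketch follows; the helper name is hypothetical and the argument is shown as an off_t only as an example of a wide signed type.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Print a signed offset portably, whatever the width of off_t is. */
static int
format_offset(char *buf, size_t len, off_t offset)
{
	/* %jd expects intmax_t, which is exactly what the cast supplies. */
	return (snprintf(buf, len, "%jd", (intmax_t)offset));
}
```

Casting to intmax_t and using %jd avoids depending on how wide the underlying type happens to be on a given architecture.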


114533 02-May-2003 phk

Style improvement.


114532 02-May-2003 phk

Use g_wither_geom() and plug memory leaks.


114531 02-May-2003 phk

Plug memory leaks.


114526 02-May-2003 phk

Use an uma-zone for allocation bio requests.


114519 02-May-2003 phk

Use g_slice_spoiled() instead of g_std_spoiled().

Add XXX comment about minor memory leak until I can fix it.


114518 02-May-2003 phk

Use g_slice_spoiled() instead of g_std_spoiled().


114517 02-May-2003 phk

Use g_slice_spoiled().
Free buffer from g_read_data().


114511 02-May-2003 phk

Back out all the stuff that didn't belong in the last commit.


114508 02-May-2003 phk

Use g_slice_spoiled() rather than g_std_spoiled().

Remember to free the buffer we got from g_read_data().


114507 02-May-2003 phk

Use g_slice_spoiled() not g_std_spoiled()


114506 02-May-2003 phk

Use g_slice_spoiled() rather than g_std_spoiled()


114505 02-May-2003 phk

Use g_slice_spoiled() rather than g_std_spoiled().


114504 02-May-2003 phk

Use a more tailored spoil routine for slices, and take advantage of
g_wither_geom() to do most of the work for us.


114499 02-May-2003 phk

Style improvement.


114498 02-May-2003 phk

Use g_wither_geom() for cleanup.


114495 02-May-2003 phk

Rework the "withering" mechanism:

Introduce g_wither_geom() to do the work in one single place.


114493 02-May-2003 phk

Rename g_slice_init() to the more appropriate g_slice_alloc() and give
it a g_slice_free() partner function.


114491 02-May-2003 phk

style improvement.


114490 02-May-2003 phk

Get rid of trivial function g_destroy_event().


114459 01-May-2003 phk

Plug some memory-leaks.


114455 01-May-2003 phk

Remove the now obsolete geomidorname hack.


114450 01-May-2003 phk

Add a new flag, EV_CANCELED, and use it to make g_waitfor_event() return
EAGAIN if an event got canceled.


114447 01-May-2003 phk

When events on a reference are cancelled, check our doorstep first,
it might be an orphan.


114440 01-May-2003 phk

Remove now unneeded special case for "geom.ctl".


114421 01-May-2003 nyan

Remove DIOCGPC98 ioctl.


114414 01-May-2003 nyan

- Move decoding pc98_partition function into geom_pc98_enc.c.
- Add encoding pc98_partition function.


114367 01-May-2003 marcel

Don't emulate a MBR by handling the MBR::type attribute. It is
not needed at all. The BSD class will attach to a GPT class without
it.


114293 30-Apr-2003 markm

Fix some easy, global, lint warnings. In most cases, this means
making some local variables static. In a couple of cases, this means
removing an unused variable.


114251 29-Apr-2003 phk

Fix an obscure fencepost error in GBDE's sector mapping code:

For certain combinations of sectorsize, mediasize and random numbers
(used to define the mapping), a multisector read or write would ignore
some subset of the sectors past the first sector in the request because
those sectors would be mapped past the end of the parent device, and
normal "end of media" truncation would zap that part of the request.

Rev 1.19+1.20 of g_bde_work.c added the check which should have alerted
me to this happening. This commit maps the request correctly and
adds KASSERTS to make sure things stay inside the parent device.

This does not change the on-disk layout of GBDE, there is no need to
backup/restore.


114250 29-Apr-2003 phk

Typo in last commit: Do not press xZZ to leave vi(1).


114249 29-Apr-2003 phk

When a bio comes back from below with a zero error code, check that
it wrote the full length. The only case where this should be able
to happen is if we try to read/write past the end and the request
is truncated. We obviously should never try to do that, so this
code should never activate.


114216 29-Apr-2003 kan

Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on: standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>


114167 28-Apr-2003 phk

I accidentally leaked this debugging tool in with my last commit.

Disable it with a direct warning.


114153 28-Apr-2003 phk

Rename g_bde_get_sector() to g_bde_get_keysector() and pick up the
offset from the work packet.


114152 28-Apr-2003 phk

Only attempt total cache-purge once in case of failure.


114150 28-Apr-2003 phk

Better criteria for skipping disk reading BIO_READ work packets.


114148 28-Apr-2003 phk

Explicitly set the sector state to JUNK if we encounter a read-error.


114088 26-Apr-2003 phk

Bail as soon as the first write request has failed, there is no point
in trying the second write if the first one went nowhere.


114087 26-Apr-2003 phk

Apparently UFS no longer issues BIO_DELETE requests correctly, and
consequently trashes data. Disable BIO_DELETE handling in gbde for now.


114041 25-Apr-2003 phk

Do an explicit retry after we have dumped the cache, rather than a
(potential) tail recursion.


114040 25-Apr-2003 phk

If on a BIO_READ request, we failed to allocate the bio for reading
our key-sector, we would end up returning the read without an error,
despite the fact that the data was not correctly decrypted.

This would result in data corruption on read, but intact data still
on the media.


114038 25-Apr-2003 phk

Fix a problem and slightly improve the ENOMEM handling:

Give up the entire bio as soon as we detect a problem.

When we detect a problem, give up the bio by contributing the
remainder with ENOMEM, rather than kicking the bio back right
away.

If we failed on a non-first iteration we previously could end up
modifying fields in the bio after we delivered it. This could
account for memory corruption (none directly reported) on machines
with GBDE.


114035 25-Apr-2003 phk

Don't count a sector in the cache unless we manage to create it.


114034 25-Apr-2003 phk

Rename g_bde_release_sector() to g_bde_release_keysector() and pick up
the sector from the work item.


114033 25-Apr-2003 phk

Rename g_bde_read_sector() to g_bde_read_keysector() and pick up the
offset from the work structure.


113940 23-Apr-2003 phk

Introduce a g_waitfor_event() function which posts an event and waits for
it to be run (or cancelled) and use this instead of home-rolled versions.


113938 23-Apr-2003 phk

More of the event stuff can now be private to geom_event.c


113937 23-Apr-2003 phk

Rename g_call_me() to g_post_event(), and give it a flag
argument to determine if we can M_WAITOK in malloc.


113934 23-Apr-2003 phk

Remove the now unused hardcoded g_post_event() event support.


113930 23-Apr-2003 phk

Turn EV_NEW_PROVIDER into a g_call_me() event.


113929 23-Apr-2003 phk

Convert EV_SPOILED event to use g_call_me().


113927 23-Apr-2003 phk

Turn the hardwired NEW_CLASS event into a g_call_me() event.


113926 23-Apr-2003 phk

Move the shutdown eventhandler stuff to a more logical place.


113895 23-Apr-2003 phk

Implement CONFIG_GEOM verbs "write label" and "write bootcode".


113893 23-Apr-2003 phk

Introduce gctl_get_paraml() which gets a parameter only if it has the
right length.


113892 23-Apr-2003 phk

Make gctl_error() take printf-like varargs.


113889 23-Apr-2003 phk

Remove unused event pointers in object structures.
Remove KASSERTS which checked that they were unused.


113880 22-Apr-2003 phk

Change the locking so that the _modify function is called with topology
held.

The only place where we want to not hold topology is when we read
(or write) the label to disk: in the case of a disk error with a
long recovery time, holding topology would prevent open/close of
any disk device.


113879 22-Apr-2003 phk

We don't need to have a slice->start() function.


113878 22-Apr-2003 phk

Do not mandate that slicers have a private ->start(), they may not need
one. KASSERT() that they have one if G_SLICE_HOT_START is used.


113876 22-Apr-2003 phk

Implement handling of CONFIG_GEOM OAM request.


113875 22-Apr-2003 phk

Add "CONFIG_GEOM" operation to the OAM API.


113862 22-Apr-2003 phk

Collapse meta arguments into regular arguments, the distinction is
more trouble than it is worth.


113821 21-Apr-2003 phk

Implement a hotspot for the sunlabel.

This means that you can no longer trash your opened partitions by writing to
the sunlabel through another partition. This is similar to the semantics
implemented for BSD labels.


113819 21-Apr-2003 phk

Update GEOM::SUN to use the decoding functions in geom_sunlabel_enc.c
and #defines from sys/sun_disklabel.h.


113818 21-Apr-2003 phk

Use #defines from <sys/sun_disklabel.h> instead of private ones.


113813 21-Apr-2003 phk

Functions to encode and decode Sun Microsystems disk partitioning data
structures.

Mostly by: jake


113713 19-Apr-2003 phk

Make more of the "hotspot" stuff generic:

Give the class a way to specify the necessary action for read/delete/write:
ALLOW, DENY, START or CALL.

Update geom_bsd to use this.


113712 19-Apr-2003 phk

Create a dedicated structure for holding hotspot information rather than
using slice structures for it.


113593 17-Apr-2003 phk

These two files fell off during my previous commit: put the encoding and
decoding functions for struct disklabel in a separate .c file.


113464 14-Apr-2003 phk

More correct patch: Only call biofinish if we have not already sent
any children down the mesh.


113462 14-Apr-2003 phk

Call biofinish() also when we get a malloc() failure.


113432 13-Apr-2003 phk

Time has run out for the "run GEOM in userland" harness, and the new regression
test is built to test GEOM as running in the kernel.

This commit is basically "unifdef -D_KERNEL" to remove the mainly #include
related code to support the userland-harness.


113411 12-Apr-2003 phk

If we hit access ahead of a spoil event, we should have negative
delta access-counts and proceed.


113408 12-Apr-2003 phk

Fix a bug which resulted in orphanization getting confused every now
and then.


113392 12-Apr-2003 phk

Retire the experimental bio_taskqueue(), it was not quite as usable as
hoped. It can be revived from here, should other drivers be able to
use it.


113390 12-Apr-2003 phk

Retire the "frontstuff" record keeping, it was no match for the
in-band meta-data of BSD labels and a more complex solution will be needed.


113389 12-Apr-2003 phk

Move the functions for encoding and decoding struct dos_partition into
a separate .c file so they can be used from userland as well.


113294 09-Apr-2003 phk

Only be verbose if (bootverbose)


113292 09-Apr-2003 phk

With the magic sequence checks removed this class is downright dangerous
to have in your kernel since it indiscriminately attaches to anything
it is offered with a range of bogus partitions.

Stop this from happening by rejecting any label with negative numbers in
it.


113286 09-Apr-2003 phk

Correctly split cyl/sects bytes when we print them.


113285 09-Apr-2003 phk

Style issue: use do {...} while(0); for multi-exit section.
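
For readers unfamiliar with the idiom named above, here is a minimal sketch of a multi-exit section written as do { ... } while (0). The structure, the magic number and the function name are hypothetical and are not taken from the GEOM sources.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAGIC 0x1234	/* hypothetical magic number */

/* Hypothetical label, not a real on-disk format. */
struct demo_label {
	int magic;
	int nparts;
	long size;
};

/* Validate a label; every failed check leaves through the same exit. */
static int
demo_check_label(const struct demo_label *l)
{
	int error = EINVAL;

	assert(l != NULL);
	do {
		if (l->magic != DEMO_MAGIC)
			break;
		if (l->nparts <= 0 || l->nparts > 8)
			break;
		if (l->size < 0)
			break;
		error = 0;	/* all checks passed */
	} while (0);
	/* Shared cleanup would go here, reached on success and failure. */
	return (error);
}
```

Each break jumps to the single point after the loop, so any common cleanup is written once instead of before every early return.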


113034 03-Apr-2003 phk

Retire the DIOCGMBR ioctl before anybody starts to use it.


113032 03-Apr-2003 phk

Remove all references to BIO_SETATTR. We will not be using it.


113031 03-Apr-2003 phk

Update the initializer for GEOM_MBREXT, I overlooked it previously.


113030 03-Apr-2003 phk

Add #define for DOSPTYP_PMBR, and use it.


113013 03-Apr-2003 phk

#include <sys/endian.h> as needed.


113012 03-Apr-2003 phk

Remove geom_enc.c, a superset of these functions are now available in
<sys/endian.h>


113011 03-Apr-2003 phk

Use <sys/endian.h> instead of geom_enc.c for endianness-agnostification.


113010 03-Apr-2003 phk

Use sys/endian.h instead of geom_enc.c for endian-agnostification.


113008 03-Apr-2003 phk

Make sure we don't ignore error codes.


112989 02-Apr-2003 phk

Add handling for cancelled events in the g_call_me() methods.


112988 02-Apr-2003 phk

Change events to have an array of "void *" references, and give the
event posting functions varargs to fill these.

Attribute g_call_me() to appropriate g_geom's where necessary.

Add a flag argument to g_call_me() methods which will be used to signal
cancellation of events in the future.

This commit should be a no-op.
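
As an illustration only, the idea of an event carrying an array of "void *" references filled from varargs could be sketched as follows. The names, the array size and the NULL terminator convention are assumptions made for this example and do not reproduce the kernel code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

#define DEMO_NREF 4	/* hypothetical capacity, not the kernel's value */

/* Hypothetical event record. */
struct demo_event {
	void *ref[DEMO_NREF];	/* references, NULL for unused slots */
};

/*
 * Store up to DEMO_NREF references taken from a NULL-terminated
 * argument list and return how many were stored.
 */
static int
demo_post_event(struct demo_event *ev, ...)
{
	va_list ap;
	void *p;
	int i, n = 0;

	assert(ev != NULL);
	for (i = 0; i < DEMO_NREF; i++)
		ev->ref[i] = NULL;
	va_start(ap, ev);
	while (n < DEMO_NREF && (p = va_arg(ap, void *)) != NULL)
		ev->ref[n++] = p;
	va_end(ap);
	return (n);
}
```

Callers in this sketch must terminate the list with (void *)NULL, since a bare NULL is not guaranteed to have pointer type in a variadic call.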


112979 02-Apr-2003 phk

Only orphan things if the open/close actually succeeded.


112978 02-Apr-2003 phk

Properly handle races between open/close and orphan.

KASSERT the race between close and strategy, it is an error in the upper
echelons if this happens.

Add XXX: comment explaining why the ioctl/orphan race is not closed.


112952 01-Apr-2003 phk

Include <geom/geom_disk.h> not <sys/disk.h>


112946 01-Apr-2003 phk

Use bioq_flush() to drain a bio queue with a specific error code.
Retain the mistake of not updating the devstat API for now.

Spell bioq_disksort() consistently with the remaining bioq_*().

#include <geom/geom_disk.h> where this is more appropriate.


112943 01-Apr-2003 phk

Start to split the GEOM/diskdriver specific bits into geom/geom_disk.h


112927 01-Apr-2003 phk

Remove the old config interface, the new OAM is sufficiently functional
now.


112926 01-Apr-2003 phk

Remove the old config interface now that the new OAM is functional.


112876 31-Mar-2003 phk

Remove some debugging in the new OAM[*] and add a debug flag for other
parts of it.

[*] I've been asked what "OAM" means: It's an acronym used in the
telecom industry, "Operations And Maintenance", and there it covers
anything from a single unlabeled LED on the front panel to the full
nightmare of CMIP for SS7.


112830 29-Mar-2003 phk

Fix a bug in the ENOMEM pacing code which probably made it panic systems
after a lot of ENOMEM errors.


112828 29-Mar-2003 phk

Add create_geom and destroy_geom methods.


112709 27-Mar-2003 phk

Run a revision on the OAM api.

Use prefix gctl_ systematically.
Add flag with access perms for each argument.
Add ro/rw versions of argument building functions.
General cleanup.


112708 27-Mar-2003 phk

Check return value of g_call_me()


112596 25-Mar-2003 phk

g_class_by_name() was unused too.


112595 25-Mar-2003 phk

Remove unused g_insert_geom().


112594 25-Mar-2003 phk

Forward compatibility: NULL check the passed in meta argument.


112552 24-Mar-2003 phk

Preemptively change initializations of struct g_class to use C99
sparse struct initializations before we extend the struct with
new OAM related member functions.
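
A minimal sketch of why C99 sparse (designated) initializers help here, assuming a hypothetical demo_class structure rather than the real struct g_class: members are set by name, so adding a member later leaves existing initializers valid and the new member zeroed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical class structure; the real struct g_class differs. */
struct demo_class {
	const char *name;
	int (*taste)(int);
	int (*config)(int);	/* imagine this member was added later */
};

static int
demo_taste(int x)
{
	return (x + 1);
}

/* Members are named, so order does not matter and omitted ones are zeroed. */
static struct demo_class demo = {
	.name = "DEMO",
	.taste = demo_taste,
};

/* Call the optional config method, or return -1 when none is provided. */
static int
demo_config(const struct demo_class *c, int arg)
{
	assert(c != NULL);
	return (c->config != NULL ? c->config(arg) : -1);
}
```

With positional initializers, inserting a member in the middle of the structure would silently shift every later value; the named form avoids that.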


112534 24-Mar-2003 phk

Turn /dev/geom.ctl from a GEOM class into a plain character device driver
instead, it will never see a disk-I/O transaction, so this is a lot simpler.


112533 24-Mar-2003 phk

Save a lock: Grab the stall_events SX lock exclusively so it also serializes
OAM requests.


112518 23-Mar-2003 phk

Introduce g_cancel_events() and use it a couple of places where it makes
sense.


112517 23-Mar-2003 phk

Introduce an SX lock which allows us to stall event processing
during OAM operations.


112512 23-Mar-2003 phk

I forgot the evil ioctl census scripts: #include <geom/geom_ctl.h>


112511 23-Mar-2003 phk

Marshalling stuff for OAM API.


112509 23-Mar-2003 phk

A note about which #include files may be used where.


112508 23-Mar-2003 phk

Start leaking the OAM API into the tree.


112476 21-Mar-2003 phk

Mitigate deadlock situation pending a more complete solution.


112370 18-Mar-2003 phk

Retire the GEOM private statistics code and use devstat instead.


112367 18-Mar-2003 phk

Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a nested include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.


112322 16-Mar-2003 phk

#ifdef notyet a bit of code which needs not yet committed refcounting to
work correctly.


112259 15-Mar-2003 phk

Use devstat_{start,end}_transaction_bio().
Remember to set bio_resid correctly first.


112070 10-Mar-2003 phk

If we run out of consumers while orphaning them, and the provider's geom
is withering, destroy the provider when done.

This was exposed by the recent change to geom_dev's orphaning logic.


112069 10-Mar-2003 phk

Fix yet another fallout of our M_* song and dance.


112030 09-Mar-2003 phk

Remove unneeded #include of geom_stats.h


112029 09-Mar-2003 phk

Stamp out Danglish.


112028 09-Mar-2003 phk

Don't use statistics counters to detect outstanding I/O.


112027 09-Mar-2003 phk

Don't abuse the statistics counters for detecting if we have outstanding
I/O requests, instead use the new dedicated fields in the consumer and
provider to track this.


112026 09-Mar-2003 phk

Add u_int nstart, nend counters to consumer and providers so we will not
have to examine the stats structure to tell if we have outstanding I/O
requests.

Making them u_int improves the chance of atomic updates to them,
but risks roll-over. Since the only interesting property is if
they are equal or not, this is not an issue.
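
The roll-over argument can be checked with a small sketch. The structure and helper names below are hypothetical stand-ins, not the real consumer or provider fields; the point is only that unsigned wrap-around is well defined in C and does not disturb an equality test.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the nstart/nend pair. */
struct demo_counters {
	unsigned int nstart;	/* requests started */
	unsigned int nend;	/* requests completed */
};

static void
demo_io_start(struct demo_counters *c)
{
	assert(c != NULL);
	c->nstart++;		/* wraps from UINT_MAX to 0 */
}

static void
demo_io_end(struct demo_counters *c)
{
	assert(c != NULL);
	c->nend++;
}

/* I/O is outstanding exactly when the two counters differ. */
static int
demo_io_outstanding(const struct demo_counters *c)
{
	return (c->nstart != c->nend);
}
```

The test only misfires if exactly UINT_MAX + 1 requests are outstanding at once, which this sketch assumes cannot happen.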


112024 09-Mar-2003 phk

When a DEV class consumer is orphan'ed we need to wait for all the
outstanding requests to return before we unravel the mesh.

It is very important that the stuff below us plays nice and don't
overlook a couple of outstanding bio's, because until they remember
the geom event thread is blocked. At an expense in code here this
could be made more robust, but I actually _want_ a robust failure
in this case so any offending drivers can be fixed.


112002 08-Mar-2003 phk

Allocate devstat structure with devstat_new_entry().


111979 08-Mar-2003 phk

Centralize the devstat handling for all GEOM disk device drivers
in geom_disk.c.

As a side effect this makes a lot of #include <sys/devicestat.h>
lines not needed and some biofinish() calls can be reduced to
biodone() again.


111964 07-Mar-2003 phk

Limit our requests to DFLTPHYS, this is generally a good idea for
memory-allocation purposes. Right now it is also a very good idea
because we hit a Giant assertion in the free(9) processing if we
free something larger than 64k.


111863 04-Mar-2003 phk

Initialize the second buffer for mirroring to point to itself and not its
partner.


111815 03-Mar-2003 phk

Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initialization and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by: re(scottl)


111733 02-Mar-2003 phk

NO_GEOM cleanup:

Remove cdevsw->d_psize() implementation, we don't need it any more.


111668 28-Feb-2003 phk

NO_GEOM cleanup:

Retire the "dev_t" centric version of the disk mini-layer.
Remove now unneeded linkage field in dev_t and struct disk.


111462 25-Feb-2003 mux

Cleanup of the d_mmap_t interface.

- Get rid of the useless atop() / pmap_phys_address() detour. The
device mmap handlers must now give back the physical address
without atop()'ing it.
- Don't borrow the physical address of the mapping in the returned
int. Now we properly pass a vm_offset_t * and expect it to be
filled by the mmap handler when the mapping was successful. The
mmap handler must now return 0 when successful, any other value
is considered as an error. Previously, returning -1 was the only
way to fail. This change thus accidentally fixes some devices
which were bogusly returning errno constants which would have been
considered as addresses by the device pager.
- Garbage collect the poorly named pmap_phys_address() now that it's
no longer used.
- Convert all the d_mmap_t consumers to the new API.

I'm still not sure whether we need a __FreeBSD_version bump for this,
since we didn't guarantee API/ABI stability until 5.1-RELEASE.

Discussed with: alc, phk, jake
Reviewed by: peter
Compile-tested on: LINT (i386), GENERIC (alpha and sparc64)
Runtime-tested on: i386


111277 23-Feb-2003 grehan

Drop down Apple Partition Map code that has been in use by some
ppc developers for a while.

OK'd by: phk


111232 21-Feb-2003 phk

NO_GEOM cleanup: Convert CCD(4) to use "struct disk*" instead of "dev_t"
as "this" handle.


111220 21-Feb-2003 phk

NO_GEOM cleanup:

Retire the "d_dump_t" and use the "dumper_t" type instead.

Dumper_t takes a void * as first arg which is more general than the
dev_t taken by d_dump_t. (Remember: we could have net-dumpers if
somebody wrote us one!)

Define the convention for GEOM controlled disk devices to be that the
first argument to the dumper function is the struct disk pointer.

Change device drivers accordingly.


111216 21-Feb-2003 phk

NO_GEOM cleanup:

Change the argument to disk_destroy() to be the same struct disk * as
disk_create() takes.

This enables drivers to ignore the (now) bogus dev_t which disk_create()
returns.


111146 19-Feb-2003 phk

Add M_WAITOK


111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


110766 12-Feb-2003 tegge

Correctly set bio_data in cloned children when cutting up large requests.


110759 12-Feb-2003 phk

Implement a handle for efficient implementation of perforations in
lower extremities.

Setting bit 4 in debugflags (sysctl kern.geom.debugflags=16) will
allow any open to succeed on rank#1 providers. This will generally
correspond to the physical disk devices: ad0, da0, md0 etc.

This fundamentally violates the mechanics of GEOMs autoconfiguration,
and is only provided as a debugging facility, so obviously error
reports on GEOM where this bit is or has been set will not be
accepted.


110736 11-Feb-2003 phk

Implement a bio-taskqueue to reduce number of context switches in
disk I/O processing.

The intent is that the disk driver in its hardware interrupt
routine will simply schedule the bio on the task queue with
a routine to finish off whatever needs done.

The g_up thread will then schedule this routine, the likely
outcome of which is a biodone() which queues the bio on
g_up's regular queue where it will be picked up and processed.

Compared to using the regular taskqueue, this saves one
context switch.

Change our scheduling of the g_up and g_down queues to be water-tight,
at the cost of breaking the userland regression test-shims.

Input and ideas from: scottl


110729 11-Feb-2003 phk

Announce our ability to do MAXPHYS transfers.


110728 11-Feb-2003 phk

Advertise MAXPHYS upwards, we will split as necessary before we get to the
bottom of things.


110727 11-Feb-2003 phk

Check disk->d_maxsize/dev->si_iosize_max at open time rather than in strategy.

Printf a warning and use DFLTPHYS if the drive has not set a size.


110720 11-Feb-2003 phk

Make a mutex to stop the race coming into geom_disk's done routine.

Cut up requests into smaller bits if they are longer than the drivers
disk->d_maxsize or dev->si_iosize_max.

Properly handle the race condition when g_clone_bio() is used
without having the single-threadedness of g_down/g_up secure locking.
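
The request cutting described above can be pictured with a small sketch. It only shows the arithmetic of splitting a byte range into pieces no longer than a maximum size; the structure and function names are hypothetical, and the real code works on cloned bios rather than an output array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one piece of a split request. */
struct demo_chunk {
	unsigned long offset;
	unsigned long length;
};

/*
 * Split [offset, offset + length) into pieces no longer than maxsize.
 * Returns the number of pieces written, at most max_out.
 */
static size_t
demo_split(unsigned long offset, unsigned long length, unsigned long maxsize,
    struct demo_chunk *out, size_t max_out)
{
	size_t n = 0;

	assert(maxsize > 0);
	while (length > 0 && n < max_out) {
		unsigned long len = length < maxsize ? length : maxsize;

		out[n].offset = offset;
		out[n].length = len;
		offset += len;
		length -= len;
		n++;
	}
	return (n);
}
```

For example, a 150-byte request with a 64-byte limit yields pieces of 64, 64 and 22 bytes.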


110713 11-Feb-2003 phk

Don't divide by zero if there is no stripewidth specified.


110712 11-Feb-2003 phk

Typo in last commit.


110710 11-Feb-2003 phk

Better names for struct disk elements: d_maxsize, d_stripeoffset
and d_stripesize.

Introduce si_stripesize and si_stripeoffset in struct cdev so we
can make them visible to clustering code.

Add stripesize and stripeoffset to providers.

DTRT with stripesize and stripeoffset in various places in GEOM.


110708 11-Feb-2003 phk

Propagate DISKFLAG_CANDELETE from struct disk to G_PF_CANDELETE on the
provider.


110706 11-Feb-2003 phk

Wrap a long line.


110703 11-Feb-2003 phk

Don't short-circuit zero-length requests if they are BIO_[SG]ETATTR.


110700 11-Feb-2003 phk

Use the SI_CANDELETE flag on the dev_t rather than the D_CANFREE flag
on the cdevsw to determine ability to handle the BIO_DELETE request.


110697 11-Feb-2003 phk

Unconditionally mark our provider with G_PF_CANDELETE.


110696 11-Feb-2003 phk

Propagate G_PF_CANDELETE to our own providers from the provider we attach to.


110690 11-Feb-2003 phk

Introduce flag field and G_PF_CANDELETE field on providers.


110686 11-Feb-2003 phk

Remove another printf which does not say anything we didn't already know.


110685 11-Feb-2003 phk

Turn the "updating" flag (back) into two sequence number fields at
either ends of the structure so we have a way to determine if a
snapshot is consistent.
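
A minimal sketch of the sequence-number idea, under stated assumptions: the field and function names are hypothetical, the example is single-threaded, and a real concurrent reader would also need memory barriers, which are omitted here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical statistics record with a sequence number at each end. */
struct demo_snap {
	unsigned int seq0;	/* bumped before an update starts */
	unsigned long nop;
	unsigned long nbytes;
	unsigned int seq1;	/* set equal to seq0 when the update ends */
};

static void
demo_snap_begin(struct demo_snap *s)
{
	assert(s != NULL);
	s->seq0++;
}

static void
demo_snap_end(struct demo_snap *s)
{
	assert(s != NULL);
	s->seq1 = s->seq0;
}

/* Copy *src into *dst; return 1 if the copy is consistent, else 0. */
static int
demo_snap_copy(struct demo_snap *dst, const struct demo_snap *src)
{
	memcpy(dst, src, sizeof(*dst));
	return (dst->seq0 == dst->seq1);
}
```

A reader that gets 0 back simply copies again; a copy taken in the middle of an update shows mismatched sequence numbers and is discarded.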


110684 11-Feb-2003 phk

Remove a debugging printf.


110592 09-Feb-2003 phk

Update the statistics collection code to track busy time instead of
idle time.

Statistics now default to "on" and can be turned off with
sysctl kern.geom.collectstats=0

Performance impact of statistics collection is on the order of
800 nsec per consumer/provider set on a 700MHz Athlon.


110543 08-Feb-2003 phk

Put the name of the /dev entry in the .h file, userland will need it.


110541 08-Feb-2003 phk

Move the g_stat struct to its own .h file, we will export it to other code.

Instead of embedding a struct g_stat in consumers and providers, merely
include a pointer.

Remove a couple of <sys/time.h> includes now unneeded.

Add a special allocator for struct g_stat. This allocator will allocate
entire pages and hand out g_stat functions from there. The "id" field
indicates free/used status.

Add "/dev/geom.stats" device driver which exports the pages from the
allocator to userland with mmap(2) in read-only mode.

This mmap(2) interface should be considered a non-public interface and
the functions in libgeom (not yet committed) should be used to access
the statistics data.


110540 08-Feb-2003 phk

Move #defines of major/minor to internal header file so other bits can
share and coordinate with geom_dev.


110523 07-Feb-2003 phk

Commit the correct copy of the g_stat structure.

Add debug.sizeof.g_stat sysctl.

Set the id field of the g_stat when we create consumers and providers.

Remove biocount from consumer, we will use the counters in the g_stat
structure instead. Replace one field which will need to be atomically
manipulated with two fields which will not (stat.nop and stat.nend).

Add a companion field to bio_children, bio_inbed, for the exact
same reason.

Don't output the biocount in the confdot output.

Fix KASSERT in g_io_request().

Add sysctl kern.geom.collectstats defaulting to off.

Collect the following raw statistics conditioned on this sysctl:

for each consumer and provider {
total number of operations started.
total number of operations completed.
time last operation completed.
sum of idle-time.
for each of BIO_READ, BIO_WRITE and BIO_DELETE {
number of operations completed.
number of bytes completed.
number of ENOMEM errors.
number of other errors.
sum of transaction time.
}
}

API for getting hold of these statistics data not included yet.


110520 07-Feb-2003 phk

Fix some sleep strings to make more sense.


110518 07-Feb-2003 phk

Add the new statistics structure, put one in consumers and providers.
include <sys/time.h> as necessary.


110517 07-Feb-2003 phk

Rename bio_linkage to the more obvious bio_parent.
Add bio_t0 timestamp, and include <sys/time.h> where needed


110513 07-Feb-2003 gordon

Add some comments about the deficiencies of this module. I had hoped to get
around to addressing them some more, but Real Life (tm) has gotten in the
way.


110477 06-Feb-2003 phk

Check return value of g_clone_bio().


110475 06-Feb-2003 phk

Experimentally don't let go of Giant in geom_disk's done.
We may actually be increasing Giant contention doing so because the
actual stuff we do is very cheap.

Also I am not convinced there is not a tiny window for a race here.


110471 06-Feb-2003 phk

Put the checks we perform on a bio before calling ::start in their
own function, handle all validation and truncation at the time we
process the bio instead of when it gets scheduled.


110419 05-Feb-2003 phk

Implement the new "struct disk" centered API for device drivers.

This commit should not change anything as no device drivers use the
new API yet.


110317 04-Feb-2003 phk

Pave the road to removing the fixed size limit on device nodes:

Change the si_name of dev_t's to be a char * and put a private buffer for
holding the name at the end of the struct.

Initialize si_name to point to the private buffer.

Put a KASSERT in geom_disk to prevent overrun on the fake dev_t we still
have to generate for the disk_drivers.


110291 03-Feb-2003 gordon

Correct a comment. GEOM modules do not create /dev entries. They create
providers.

Pointed out by: phk


110290 03-Feb-2003 gordon

Add the GEOM module that makes volume labels useful. A kernel compiled with
this will cause volume labels to be exposed in /dev/vol/<volname>. Currently,
there is no conflict resolution if more than one FS has the same volume name.

Reviewed by: phk


110230 02-Feb-2003 phk

Add a bio_disk pointer for use between geom_disk and the device drivers.


110188 01-Feb-2003 phk

Eliminate the sc_openmask, ccdopen() and ccdclose() functions, we
can use the flag maintained by geom_disk.c

Having only a strategy method to initialize, don't waste space using
a cdevsw structure to do so.


110183 01-Feb-2003 phk

Move configuration of geom/providers into its own function in preparation
for adding on-the-fly config interface.


110157 31-Jan-2003 phk

Remove commented out g_enc_dos_partition(). We won't be needing it.


110150 31-Jan-2003 phk

Add a rudimentary class for slicing Apple partitioned disks.

More work is needed on this, stakeholders please contact me.

Not quite asked for by: rwatson


110119 30-Jan-2003 phk

Add some agility to the disk_create() API:

Make passing the methods in a cdevsw structure optional.

Move "CANFREE" and "NOGIANT" flags into struct disk instead of the
cdevsw which may or may not be there.

Rename CANFREE to CANDELETE to match BIO_DELETE operation.

Add "OPEN" flag so drivers don't have to provide open/close methods
just to maintain such a flag.

Add temporary stopgap include of <sys/conf.h> to <sys/disk.h> until
the files which have them in the other order are fixed.

Add KASSERTS to make sure we don't get fed too many NULL pointers.

Clear our geom's softc pointer before we wither.


110118 30-Jan-2003 phk

NO_GEOM cleanup: Remove sys/disklabel.h include.


110116 30-Jan-2003 phk

NO_GEOM cleanup: retire disk_invalidate()


110081 30-Jan-2003 phk

NO_GEOM cleanup: Mark the last arg to disk_create() as unused.


110052 29-Jan-2003 phk

Add code to respect the D_NOGIANT flag, should the disk device driver set it.
NO_GEOM cleanup: remove ifdefs.

Still untested.


110050 29-Jan-2003 phk

Sort these functions as the author instructed.


109973 28-Jan-2003 phk

Mark some args unused so this compiles in userland.


109972 28-Jan-2003 phk

Use a void * to carry the private data for return-call'ed ioctl requests.
Amongst other things this avoids a complex workaround in the userland
regression bits.


109900 26-Jan-2003 phk

Implement DIOCBSDBB ioctl which overwrites first BBSIZE bytes of BSD
labeled disk.

This is complicated by the fact that BBSIZE is greater than the
PAGE_SIZE limit ioctl inflicts on arguments which are automatically
copied in.

As long as we don't need access to userland memory (copyin/out) we
can deal with the ioctl using g_callme() which executes it from the
GEOM event thread.

Once we need copyin/out, we need to return the bio with EDIRIOCTL
in order to make geom_dev call us back in the original process context
where copyin will work.

Unfortunately, that results in us getting called with Giant, so
we have to DROP_GIANT/PICKUP_GIANT around the code where we diddle
GEOMs internals.

Sometimes you just can't win...

... But it does make geom_bsd.c an almost complete example of the
GEOM beastiarium.


109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


109563 20-Jan-2003 phk

disk_dev_synth() is a NO_GEOM hack.


109560 20-Jan-2003 phk

Remove need for <sys/diskslice.h> but retain numerical compatibility just in case.


109535 19-Jan-2003 phk

Finally give CCD the disk mini-layer treatment:

CAUTION:

Previously CCD would be different from all other disks in
the system in that there was no "ccd0" device, only a
"ccd0c" device.

This is no longer so after this commit. If you access a
ccd device through the "/dev/ccd0c" device _and_ have not
actually put a BSD disklabel on the device, you will have
to use the name "/dev/ccd0". If your CCD device contains
a BSD disklabel there should be no difference.

You need to recompile ccdconfig(8) using the changed
src/sys/sys/ccdvar.h for the -g "show me" option to work.

I have run the regression test I created before I started
overhauling CCD and it flags no problems, but this code
is mildly evil, so take care. If you would cry if you lost
what's on CCD, make a backup before you upgrade.

Create separate cdevsw for the /dev/ccd.ctl device.

Remove the cloning function, the disk-minilayer will do all naming
for us.

Remove the ccdunit and ccdpart functions and carry the softc pointer
in the relevant dev_t's and structures.

Release all memory when a CCD device is unconfigured, previously
the softc would linger behind.

Remove all traces of BSD disklabel fiddling code.

Remove ccdpsize, the disk mini-layer does this for us.

Don't allocate memory with M_WAITOK in ccdstrategy().

Remove boundary checks which the disk mini-layer does for us.

Don't allocate space for more than 2 ccdbuf, RAID was never implemented.

NB: I have not tried to address any of the preexisting ailments of CCD.


109534 19-Jan-2003 phk

Unifdef -UDEBUG on the CCD driver. The debugging is mostly useless
and can be added back selectively, should anybody start to interest
themselves for the internal workings of ccd.

This commit will make the diffs for the following commits much more
readable.


109486 18-Jan-2003 phk

Inline now trivial functions getccdbuf() and putccdbuf().
Fix another trivial memory-leak.


109482 18-Jan-2003 phk

Fix minor memory-leak.


109474 18-Jan-2003 phk

Use the M_CCD malloc bucket instead of M_DEVBUF.
Don't keep a private freelist of a low number of trivially small structures.


109473 18-Jan-2003 phk

Inline trivial function ccdintr() into its one caller ccdiodone().
Only call ccdfind() once in ccdiodone() and cache the result.


109471 18-Jan-2003 phk

Sanitize the copyright section a bit: We do not need two copies of the
four-clause BSD license in the file, one will do.


109421 17-Jan-2003 phk

Find places to store the previously implicitly passed unit number in
the three configuration ioctls which need a unit number.

Add a "ccd.ctl" device for config operations.

Implement ioctls on ccd.ctl which rely on the explicitly passed
unit numbers.

Update ccdconfig to use the new ccd.ctl interface.

Add code to the kernel to detect old ccdconfig binaries, and whine
about it.

Add code to ccdconfig to detect old kernels, and whine about it.

These two compatibility measures will be retained only for a limited
period since they are in the way of GEOM'ification of ccd.


109256 14-Jan-2003 phk

Add a very simple but functional GEOM mirror class.

This is committed more as an instructive tool than as a production
facility, but this will change over time.


109253 14-Jan-2003 phk

Now that we have non-geom_disk based drivers, we need to cover for those,
in case they return EOPNOTSUPP on an ioctl.

Found by: jhb


109176 13-Jan-2003 phk

Always issue ioctls as BIO_GETATTR requests. The direction of data copies on
ioctls are no reliable indication of the ioctls "set" or "get" nature or if
such simplistic categories can even be applied.

MFC candidate: boot0cfg issue.


109170 13-Jan-2003 phk

Remove g_silence(). It does not do anything anymore.


109169 13-Jan-2003 phk

Fix typo.


109101 11-Jan-2003 phk

Don't restrict MBR sectorsize to 512 bytes.

Test data provided by: Andrey Koklin <aka@veco.ru>


109081 10-Jan-2003 jhb

Output the fstype of each partition in a disklabel in the configuration
text similar to the way that the MBR module dumps its slice types.


108819 06-Jan-2003 phk

BSD disklabels expose the controlling label through the 'c' partition, and
some trick is necessary to prevent further BSD geoms from attaching to
that. Our old trick was to make sure we don't attach to a geom from
the "BSD" class, but this doesn't work if an intermediary geom obscures
this fact. Instead, calculate the MD5 checksum of the label we target
and ask if anybody below us loves that label. If they do we don't.

Coded by: gordon.


108817 06-Jan-2003 phk

In userland case include <errno.h>, not <err.h>. This is needed to make
the src/tools/regression/geom stuff compile.


108650 04-Jan-2003 nyan

Rename the dos_partition structure for pc98 to pc98_partition.


108593 03-Jan-2003 phk

Remove CCDF_SWAP and CCDF_PARITY, they have never been implemented.


108591 03-Jan-2003 nyan

MFMBR: Add ioctls for writing an IPL and a boot menu.


108584 03-Jan-2003 phk

Remove unused second argument from BIO_STRATEGY()


108558 02-Jan-2003 phk

Optimize the size of the work-items by letting the mapping function
decide the largest size which stays inside the zone and does not
collide with a lock sector.


108552 02-Jan-2003 phk

Update si_bsize_phys on open.

MFC candidate.


108470 30-Dec-2002 schweikh

Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/
Add FreeBSD Id tag where missing.


108393 29-Dec-2002 phk

Implement ioctls for tampering with sector0.


108308 27-Dec-2002 phk

Remove the "ascii" attribute from the sysctls so that "sysctl -a" will
skip them.


108297 26-Dec-2002 phk

white-space changes


108296 26-Dec-2002 phk

Use a mutex assert to document our locking circumstances.


108295 26-Dec-2002 phk

We should not need to hold Giant for sbuf operations any more.


108294 26-Dec-2002 phk

Add an XXX comment to explain the predicament.


108093 19-Dec-2002 phk

Don't forget our topology lock in the MBREXT case.


108060 18-Dec-2002 phk

Solve another bug in the mapping code: correctly skip lock sectors.
Make sure sector zero is protected if it contains metadata.

Lower WARNS for gbde to 3 on non-i386 archs. rijndael-fst is evil
but apparently does the right thing and passes the test-vectors.

MFC Candidate.


108052 18-Dec-2002 phk

Fix two blunders in the mapping functions which can lead to corrupt data,
for request sizes larger than the sectorsize or for multi-key setups.

See warning mailed to current@ for details of recovery.

Found by: Marcus Reid <marcus@blazingdot.com>


108051 18-Dec-2002 phk

Balk at unaligned requests.

MFC candidate.


108003 17-Dec-2002 phk

Add a check for negative offset locations and return EINVAL for them.


107970 17-Dec-2002 phk

Don't mangle geometry for pc98, this will happen in the ata driver.


107968 17-Dec-2002 phk

Remember to hold topology lock when we change things.

Spotted by: kuriyama


107967 17-Dec-2002 phk

Constify the dumpconf() function.


107956 16-Dec-2002 phk

Get rid of g_slice_addslice() and use g_slice_config() instead.

Tested with: i386 + src/tools/regression/geom


107953 16-Dec-2002 phk

Constification and some s/int/u_int/ changes.


107834 13-Dec-2002 phk

Add a couple of KASSERTS, just in case.


107832 13-Dec-2002 phk

Don't interpret the hotspots relative to all slices on a slicer, but
relative to the parent device.


107831 13-Dec-2002 phk

Fix spelling in comment.


107562 03-Dec-2002 sos

Add support for the PC98 platform to the ATA driver.
This mostly consists of functionality to serialize accesses to
the two ATA channels (which can also be used to "fix" certain
PCI based controllers).
Add support for Acard controllers.
Enable the ATA driver in PC98 GENERIC, and add device hints.
Update man page with latest support.

The PC98 core team has kindly provided me with a PC98
machine that made this all possible. Thanks to all who
contributed to that effort; without it this would
probably never have been possible.

Approved by: re@


107526 02-Dec-2002 phk

Use the hotspot code to prevent people from overwriting their disklabel
with stuff which would ruin the day for any open partitions.

Approved by: re


107522 02-Dec-2002 phk

Add a simplified version of the hot-spot code to enable us to protect
in-band disklabels from in-band vandalism.

Approved by: re


107453 01-Dec-2002 phk

Use more mnemonic argument names in the access functions.

Sponsored by: DARPA & NAI Labs
Approved by: re (blanket)


107452 01-Dec-2002 phk

Fix a cut&past-o.

Spotted by: yar
Approved by: re (blanket)


107451 01-Dec-2002 phk

Conceivably, there may exist an algorithm which can tell if a sequence of bytes
is the output of AES/128/CBC or ARC4RANDOM. Encrypt the random data with which
we wipe when we get a BIO_DELETE to make such an algorithm useless.

Sponsored by: DARPA & NAI Labs
Approved by: re (blanket)


107450 01-Dec-2002 phk

Use unsigned for an index.

Sponsored by: DARPA & NAI Labs.
Approved by: re (blanket).


107116 20-Nov-2002 phk

Remember to update the provider's idea of its size when we reconfigure
a slice child.

Approved by: re


107111 20-Nov-2002 phk

Do not call the dumpconf method unless there is one.
Compare pointers with NULL.

Partially submitted by: Christian Carstensen <cc@gate5.de>
Approved by: re


107012 17-Nov-2002 nyan

Save a slice name on the disk and print it at g_pc98_dumpconf().


106635 08-Nov-2002 phk

Remove harmless but irritating printf.


106634 08-Nov-2002 phk

Always recalculate the SRM checksum if the label is at 64 bytes offset.

Tested by: jhb


106559 07-Nov-2002 nyan

Fix to support pc98.
It is mostly merged from MBR specific part.

Reviewed by: phk


106518 06-Nov-2002 phk

Straighten up the geom.ctl config interface definitions.

Sponsored by: DARPA & NAI Labs


106408 04-Nov-2002 phk

Polish a bit here and there.
Reenable the geom.ctl device so people can play with gbde.

Sponsored by: DARPA & NAI Labs


106407 04-Nov-2002 phk

Run a revision on the GBDE encryption facility.

Replace ARC4 with SHA2-512.
Change lock-structure encoding to use random ordering rather for obscurity.
Encrypt lock-structure with AES/256 instead of AES/128.
Change kkey derivation to be MD5 hash based.
Watch for malloc(M_NOWAIT) failures and ditch our cache when they happen.
Remove clause 3 of the license with NAI Labs consent.

Many thanks to "Lucky Green" <shamrock@cypherpunks.to> and "David
Wagner" <daw@cs.berkeley.edu>, for code reading, inputs and
suggestions.

This code has still not been stared at for 10 years by a gang of
hard-core cryptographers. Discretion advised.

NB: These changes result in the on-disk format changing: dump/restore needed.

Sponsored by: DARPA & NAI Labs.


106398 04-Nov-2002 phk

Reject slices where begin == end.
Remove clause 3 from the license with NAI Labs consent.

Sponsored by: DARPA & NAI Labs


106397 04-Nov-2002 phk

Remove clause 3 in the license with NAI's consent.
Reject slices with type==0.
Diddle the bootverbose printfs.

Sponsored by: DARPA & NAI Labs


106341 02-Nov-2002 marcel

Remove the GEOM_GPT hack. We now check for partition type 0xEE and
skip those. This handles the Protective MBR (PMBR) which consists
of a single partition of type 0xEE that covers the whole disk and
as such protects the GPT partitioning. We allow other partitions to
be present besides partitions of type 0xEE and as such interpret
partition type 0xEE as a "hands-off" partition only.

While here, fix g_mbrext_dumpconf to test if indent is NULL and
dump the data in a form that libdisk can grok. Change the logic
in g_mbr_dumpconf to match that of g_mbrext_dumpconf. This does
not change the output, but prevents a NULL-pointer dereference
when indent == NULL && pp == NULL.
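
The hands-off treatment of type 0xEE can be sketched as below. This is an illustration only, not the g_mbr code: the function name is hypothetical, and treating type 0 as an empty entry is an assumption made here. The layout used (four 16-byte entries starting at byte 446 of the sector, type in byte 4 of each entry) is the standard MBR layout.

```c
#include <stdint.h>

#define MBR_PART_OFFSET	446	/* first partition entry in the sector */
#define MBR_PART_SIZE	16	/* bytes per partition entry */
#define MBR_NPART	4	/* primary entries in an MBR */
#define MBR_TYPE_PMBR	0xEE	/* protective entry covering a GPT disk */

/*
 * Hypothetical helper: return a bitmask with bit i set when entry i
 * is neither empty (type 0) nor a hands-off 0xEE partition.
 */
static unsigned
mbr_usable_entries(const uint8_t sector[512])
{
	unsigned mask = 0;
	int i;

	for (i = 0; i < MBR_NPART; i++) {
		uint8_t type = sector[MBR_PART_OFFSET + i * MBR_PART_SIZE + 4];

		if (type == 0x00 || type == MBR_TYPE_PMBR)
			continue;	/* skip, but keep looking at the rest */
		mask |= 1u << i;
	}
	return (mask);
}
```

Because the loop continues past a 0xEE entry, other partitions next to it are still reported, which matches the "hands-off partition only" interpretation described above.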


106340 02-Nov-2002 marcel

Fix dumpconf so libdisk can grok its output. We weren't checking
if indent was NULL. Consequently we always emitted the XML format.


106338 02-Nov-2002 phk

malloc(9) with M_NOWAIT seems to return NULL a lot more than I would have
expected under -current. This is a problem for GEOM because the up/down
threads cannot sleep waiting for memory to become free. The reason they
cannot sleep is that paging things out to disk may be the only way we can
clear up some RAM. Nice catch-22 there.

Implement a rudimentary ENOMEM recovery strategy: If an I/O request
fails with an error code of ENOMEM, schedule it for a retry, and
tell the down-thread to sleep hz/10 to get other parts of the system
a chance to free up some memory, in particular the up-path in GEOM.

All caches should probably start to monitor malloc(9) failures using the new
malloc_last_fail() function, and release when it indicates congestion.

Sponsored by: DARPA & NAI Labs.
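
The retry idea can be sketched in userland as follows. Everything here is a stand-in: struct request, try_io() and the retry cap are hypothetical, and the point where the counter is bumped marks where the down-thread would sleep hz/10 in the kernel. The commit itself does not mention a cap; one is assumed here so the loop terminates.

```c
#include <errno.h>

/* Hypothetical request that succeeds on its Nth attempt. */
struct request {
	int	attempts_needed;
	int	attempts;
};

/* Stand-in for issuing the I/O: fails with ENOMEM until enough tries. */
static int
try_io(struct request *r)
{

	r->attempts++;
	return (r->attempts < r->attempts_needed ? ENOMEM : 0);
}

/*
 * Retry only on ENOMEM, pausing between attempts.  *naps counts the
 * pauses taken; any other result is returned to the caller at once.
 */
static int
run_with_retry(struct request *r, int max_retries, int *naps)
{
	int error;

	*naps = 0;
	for (;;) {
		error = try_io(r);
		if (error != ENOMEM)
			return (error);
		if (*naps >= max_retries)
			return (ENOMEM);
		(*naps)++;	/* the kernel would sleep hz/10 here */
	}
}
```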


106301 01-Nov-2002 phk

Make this compile in the userland shims again.

Sponsored by: DARPA & NAI Labs


106300 01-Nov-2002 phk

Add KASSERT for bio_cmd validity here as well. Various hacks still
bypass specfs.


106263 31-Oct-2002 phk

Spruce up bootverbose output a bit.

Allow extended partitions to have flag=0x80


106226 30-Oct-2002 phk

Change the kkey generation cherry-picker to use MD5.

Sponsored by: DARPA & NAI Labs


106101 28-Oct-2002 phk

Add the remaining part of the new libdisk interaction.

WARNING: This is not a published interface, it is a stopgap measure for
WARNING: libdisk so we can get 5.0-R out of the door.

Sponsored by: DARPA & NAI Labs


106100 28-Oct-2002 phk

Add support for the new libdisk interaction.

Sponsored by: DARPA & NAI Labs.


106085 28-Oct-2002 phk

Fix a bug in the cherry-picker kkey generator routine.

WARNING: You need to backup and restore the _unencrypted_ contents
WARNING: of your GBDE disks when you take this update!

Sponsored by: DARPA & NAI Labs.


106076 28-Oct-2002 phk

Add more compatibility junk.


106030 27-Oct-2002 phk

Don't truncate on large disks.


106001 26-Oct-2002 phk

Make geom_mbr.c optional on PC98, use GEOM_MBR option to include it.

Disable check for supposedly magic "IPL1" string for PC98 labels, its
thaumaturgical power is in doubt.


105957 25-Oct-2002 phk

Reduce the GEOM verbosity under bootverbose to something more sufferable.
This is not quite the set of information I would want, but the tree where
I have the "correct" version is messed up with conflicts.

Sponsored by: DARPA & NAI Labs.


105947 25-Oct-2002 phk

Add a g_dev_print() function which prints all the /dev entries GEOM
knows about.


105941 25-Oct-2002 phk

Lose the g_dev_clone() noise.


105897 24-Oct-2002 phk

Use a better test to prevent tasting geom.ctl so we don't screw the
regression tests.


105892 24-Oct-2002 phk

Don't taste the first provider, it's /dev/geom.ctl and it's not going
to taste like anything we like anyway.


105581 20-Oct-2002 phk

No need to specify CTLTYPE_INT when we use SYSCTL_INT.


105551 20-Oct-2002 phk

Now that the sectorsize and mediasize are properties of the provider,
don't take the detour over the I/O path to discover them using getattr(),
we can just pick them out directly.

Do note though, that for now they are only valid after the first open
of the underlying disk device due to compatibility with the old disk_create()
API. This will change in the future so they will always be valid.

Sponsored by: DARPA & NAI Labs.


105550 20-Oct-2002 phk

The g_id*() functions are not needed in the userland test-suite so
#ifdef _KERNEL them rather than deal with a copyin simulation.

Sponsored by: DARPA & NAI Labs


105542 20-Oct-2002 phk

Make the sectorsize a property of providers so we can include it in the XML
output.

Sponsored by: DARPA & NAI Labs


105540 20-Oct-2002 phk

Use %jd instead of %lld now that we have it.


105539 20-Oct-2002 phk

It makes more sense for the fwheads and fwsectors properties to be in
the provider stanza rather than the geom stanza.


105537 20-Oct-2002 phk

Include fwsectors and fwheads in the XML output for the disks we know.

Sponsored by: DARPA & NAI Labs.


105520 20-Oct-2002 phk

Be consistent about functions being static.

Spotted by: FlexeLint


105512 20-Oct-2002 phk

Constify input to the arc4 seed function.
Implement the lockfile hunting in sector zero.

Sponsored by: DARPA & NAI Labs.


105506 20-Oct-2002 phk

Don't track bio allocation in debug output.

Sponsored by: DARPA & NAI Labs.


105505 20-Oct-2002 phk

Style(9) and english(9) fixes.

Submitted by: schweikh


105504 20-Oct-2002 phk

Make it possible to specify also via geom_t ID in the geom.ctl config ioctl.

Sponsored by: DARPA & NAI Labs.


105465 19-Oct-2002 phk

Fix a missing initialization.


105464 19-Oct-2002 phk

Add Geom Based Disk Encryption to the tree.

This is an encryption module designed to secure denial of access
to the contents of "cold disks" with or without destruction activation.

Major features:

* Based on AES, MD5 and ARC4 algorithms.
* Four cryptographic barriers:
1) Pass-phrase encrypts the master key.
2) Pass-phrase + Lock data locates master key.
3) 128 bit key derived from 2048 bit master key protects sector key.
4) 128 bit random single-use sector keys protect data payload.
* Up to four different changeable pass-phrases.
* Blackening feature for provable destruction of master key material.
* Isotropic disk contents offers no information about sector contents.
* Configurable destination sector range allows steganographic deployment.

This commit adds the kernel part, separate commits will follow for the
userland utility and documentation.

This software was developed for the FreeBSD Project by Poul-Henning Kamp and
NAI Labs, the Security Research Division of Network Associates, Inc. under
DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the DARPA CHATS
research program.

Many thanks to Robert Watson, CBOSS Principal Investigator for making this
possible.

Sponsored by: DARPA & NAI Labs.


105452 19-Oct-2002 tmm

The argument to the DIOCGMEDIASIZE ioctl() is an off_t, not an u_int.

Reviewed by: phk


105358 17-Oct-2002 phk

Be consistent and return the NUL at the end of kern.geom.conf{xml,dot}.

Spotted by: sam


105350 17-Oct-2002 phk

NUL terminate sysctl kern.disks


105180 15-Oct-2002 njl

Return an error if the drive reports heads/sectors that do not make sense.
This fixes a divide by zero in fdisk(8)

Reviewed by: phk
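
A sketch of the kind of sanity check that avoids the divide by zero. The function name and signature are hypothetical, and the actual check in the tree may look different; the sketch only shows that heads and sectors must be validated before they are used as a divisor.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive a cylinder count from the total number
 * of sectors, refusing geometries that would divide by zero.
 */
static int
geometry_cylinders(uint64_t total_sectors, unsigned heads, unsigned sectors,
    uint64_t *cylinders)
{

	if (heads == 0 || sectors == 0)
		return (EINVAL);
	*cylinders = total_sectors / ((uint64_t)heads * sectors);
	return (0);
}
```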


105163 15-Oct-2002 phk

Constification ? Yes, out that door, row on the left, one patch each.

Sponsored by: DARPA & NAI Labs


105133 14-Oct-2002 phk

Remove a bogus local variable.

Sponsored by: DARPA & NAI Labs.


105124 14-Oct-2002 jake

Moved geom class initialization to SI_SUB_DRIVERS from SI_SUB_PSEUDO.
This fixes mounting root from md(4) which calls disk_create() early.


105092 14-Oct-2002 phk

Implement the GEOMCONFIGGEOM ioctl which can be used to manually create
and configure an instance of a class on a given provider.

Sponsored by: DARPA & NAI Labs


105091 14-Oct-2002 phk

Add more KASSERTS.

Sponsored by: DARPA & NAI Labs.


105068 13-Oct-2002 phk

Add the outline of the "/dev/geom.ctl" handling code.

Sponsored by: DARPA & NAI Labs.


105061 13-Oct-2002 phk

Give GEOM modules a chance to specify their own init routine, in case they
have special requirements.

Sponsored by: DARPA & NAI Labs.


104936 11-Oct-2002 phk

The CAM system has its own ideas of what locks are to be held by whom.
So does GEOM. Not a pretty sight.

Take all the interesting stuff out of GEOM::disk_create(), and leave just
the creation of the fake dev_t. Schedule the topology munging to happen
in the g_event thread with g_call_me().

This makes disk_create() pretty lock-agnostic, almost lock-atheist.

Tripped over by: peter
Sponsored by: DARPA & NAI Labs


104701 09-Oct-2002 phk

Add support for g_clone_bio() and g_std_done() to spawn multiple children
of a bio and correctly gather status when done.

Sponsored by: DARPA & NAI Labs.


104665 08-Oct-2002 phk

For now, don't wait for drives to stop returning EBUSY. There is too
much broken hardware around it seems.

Sponsored by: DARPA & NAI Labs.


104609 07-Oct-2002 phk

Correctly deal with non-DEVBSIZE drives.
Allow BIO_DELETE through too.

This fixes swap-backed md(4) devices.

Sponsored by: DARPA & NAI Labs.


104606 07-Oct-2002 phk

Put a printf under #ifdef DIAGNOSTIC.

Sponsored by: DARPA & NAI Labs.


104602 07-Oct-2002 phk

Copyin and copyout are only possible from a process-native thread,
and therefore we need a way for ioctl handlers to run in that thread
in GEOM. Rather than invent a complicated registration system to
recognize which ioctl handler to use for a given ioctl, we still
schedule all ioctls down the tree as bio transactions but add a
special return code that means "call me directly" and have the
geom_dev layer do that.

Use this for all ioctls that make it as far as a diskdriver to
avoid any backwards compatibility problems.

Requested by: scottl
Sponsored by: DARPA & NAI Labs


104542 05-Oct-2002 phk

This patch got lost in my trees: Pass setattr down to device drivers
as well.

Detected by: scottl
Sponsored by: DARPA & NAI Labs.


104534 05-Oct-2002 phk

Fix argument order mistake when decoding disklabels from on-disk format.

Detected by: jhay
Sponsored by: DARPA & NAI Labs.


104519 05-Oct-2002 phk

NB: This commit does *NOT* make GEOM the default in FreeBSD
NB: But it will enable it in all kernels not having options "NO_GEOM"

Put the GEOM related options into the intended order.

Add "options NO_GEOM" to all kernel configs apart from NOTES.

In some order of controlled fashion, the NO_GEOM options will be
removed, architecture by architecture in the coming days.

There are currently three known issues which may force people to
need the NO_GEOM option:

boot0cfg/fdisk:
Tries to update the MBR while it is being used to control
slices. GEOM does not allow this as a direct operation.

SCSI floppy drives:
Apparently the scsi-da driver returns "EBUSY" if no media
is inserted. This is wrong, it should return ENXIO.

PC98:
It is unclear if GEOM correctly recognizes all variants of
PC98 disklabels. (Help Wanted! I have neither docs nor HW)

These issues are all being worked.

Sponsored by: DARPA & NAI Labs.


104452 04-Oct-2002 phk

Properly isolate the locking domains of sysctl from the topology lock
for the sysctls which report the configuration.

Sponsored by: DARPA & NAI Labs.


104451 04-Oct-2002 phk

Implement the "kern.disks" sysctl in GEOM.

This makes "mdconfig -l" work again.

Sponsored by: DARPA & NAI Labs.


104450 04-Oct-2002 phk

Properly conditionalize a debugging printf.

Sponsored by: DARPA & NAI Labs.


104359 02-Oct-2002 phk

Move GEOM's sysctls under kern.geom.

Sponsored by: DARPA & NAI Labs.


104357 02-Oct-2002 phk

Put some failing ioctl related printfs under a suitable debug flag.

Sponsored by: DARPA & NAI Labs.


104316 01-Oct-2002 phk

Use the canonical root:operator 0640 for GEOM disk devices.

Spotted by: brooks
Sponsored by: DARPA & NAI Labs.


104312 01-Oct-2002 phk

Don't restrict device drivers ability to sleep in the ioctl method, this
is actually entirely legal.

Do bio's with ioctls in them in a g_call_me() function.

Sponsored by: DARPA & NAI Labs


104292 01-Oct-2002 phk

Include <sys/diskmbr.h> instead of <sys/disklabel.h>

Sponsored by: DARPA & NAI Labs.


104197 30-Sep-2002 phk

Don the asbestos underwear and add the code which lets DIOCWDINFO
write modified disklabels back to disk.

Sponsored by: DARPA & NAI Labs.


104195 30-Sep-2002 phk

Retire g_io_fail() and let g_io_deliver() take an error argument instead.

Sponsored by: DARPA & NAI Labs.


104194 30-Sep-2002 phk

Introduce g_write_data() function.

Sponsored by: DARPA & NAI Labs


104193 30-Sep-2002 phk

Add missing g_enc_le2().

Sponsored by: DARPA & NAI Labs.


104191 30-Sep-2002 phk

Disable the g_sanity() check unless people ask for it in the debugflags.

Sponsored by: DARPA & NAI Labs.


104184 30-Sep-2002 phk

Make sure we don't lose our topology lock in a call_me() handler.

Sponsored by: DARPA & NAI Labs.


104107 28-Sep-2002 phk

Zero the local-variable mutexes before we call mtx_init() on them,
failing to do this may lead mtx_init() to believe they have already
been initialized.

Detected by: Marc Recht <marc@informatik.uni-bremen.de>


104087 28-Sep-2002 phk

Style, whitespace and lint fixes.

Sponsored by: DARPA & NAI Labs.


104086 28-Sep-2002 phk

Void functions cannot use return(foo) even if foo is also returning void.

Sponsored by: DARPA & NAI Labs.


104081 28-Sep-2002 phk

First confirmed kill from my Flexelint license: Check return value
of g_clone_bio().

Detected by: http://www.gimpel.com/html/flex.htm
Sponsored by: DARPA & NAI Labs.


104065 27-Sep-2002 phk

Extensively rework the geom_bsd method, put a lot of comments in, betting
that this will make people use this for their future copy&paste operations.

Rework the detection of raw-disk offsets in disklabels. This actually
unearthed a number of bugs in the (now) previous version.

Also accept labels which don't have a magic RAW_PART, provided they don't
confuse us too much.

Change the order of our sanity-checks on labels found on disks to be more
robust.

Check against MAXPARTITIONS in our sanity-check and reject disklabels
we cannot cope with.

Create new g_bsd_modify() function to implement disklabel modifying
ioctls.

Implement DIOCSDINFO and DIOCWDINFO with the provision that the latter
still does not write your change back to disk. I didn't have the nerve
for that yet.

In the start routine, use g_call_me() for complex ioctls to prevent
sleeping.

Sponsored by: DARPA & NAI Labs.


104064 27-Sep-2002 phk

Add the new g_slice_config() call, which can add/delete/change a slice,
with support for trying, doing and forcing.

This will eventually replace g_slice_addslice() which gets changed from
grabbing topology to requiring it in this commit as well.

Sponsored by: DARPA & NAI Labs.


104063 27-Sep-2002 phk

Make the UP/DOWN threads hold on to their own private mutex while doing
work.

This prevents people from sleeping in the UP/DOWN I/O path by mistake
or design (doing so almost invariably results in deadlocks since it
stalls all I/O processing in the given direction).

Sponsored by: DARPA & NAI Labs.


104062 27-Sep-2002 phk

Correctly en/decode MAXPARTITIONS partitions.

Sponsored by: DARPA & NAI Labs.


104061 27-Sep-2002 phk

Setattr should not retry on EBUSY, we could get EBUSY back because
a disklabel modification tries to change an open device, and no
counter-examples exist.

Be less fascist about when we can do Setattr, the openmodes of devices
are so loosely managed that the "exclusive" count is almost useless.

Sponsored by: DARPA & NAI Labs.


104060 27-Sep-2002 phk

Various no-ops:

Add a __unused.

Make the 2byte decoder functions return 16 bits for the benefits
of picky lints.

No need to grab giant around a tsleep() when we have a timeout.

Sponsored by: DARPA & NAI Labs.


104059 27-Sep-2002 phk

Correctly calculate size of PC98 slices.

Sponsored by: DARPA & NAI Labs.


104058 27-Sep-2002 phk

Allocate bio's with M_NOWAIT and let the caller deal with the problems.

Sponsored by: DARPA & NAI Labs.


104057 27-Sep-2002 phk

Add checks for g_clone_bio() returning NULL, it will be possible RSN.

Sponsored by: DARPA & NAI Labs.


104056 27-Sep-2002 phk

Implement g_call_me() as a way for geom methods to schedule operations
to be performed in the event-thread.

To do this, we need to lock the eventlist with g_eventlock (nee g_doorlock),
since g_call_me() being called from the UP/DOWN paths will not be able to
acquire g_topology_lock.

This also means that for now these events are not referenced on any
particular consumer/provider/geom.

For UP/DOWN path use, this will not become a problem since the access()
function will make sure we drain any bio's before we dismantle.

Sponsored by: DARPA & NAI Labs.


104055 27-Sep-2002 phk

Ok, include also the two tests which actually do effect the claims
of the last commit message.

Sponsored by: DARPA & NAI Labs.


104054 27-Sep-2002 phk

Hook into the shutdown EVENTHANDLER and stop tasting things after we
get notified to make things settle a bit faster.

Sponsored by: DARPA & NAI Labs.


104053 27-Sep-2002 phk

Rename the doorlock to eventlock, it gets to protect a bit more in the future.

Sponsored by: DARPA & NAI Labs.


103942 25-Sep-2002 jeff

- Use vrefcnt() instead of v_usecount.


103714 20-Sep-2002 phk

(This commit touches about 15 disk device drivers in a very consistent
and predictable way, and I apologize if I have gotten it wrong anywhere,
getting prior review on a patch like this is not feasible, considering
the number of people involved and hardware availability etc.)

If struct disklabel is the messenger: kill the messenger.

Inside struct disk we had a struct disklabel which disk drivers used to
communicate certain metrics to the disklayer above (GEOM or the disk
mini-layer). This commit changes this communication to use four
explicit fields instead.

Amongst the benefits is that the fields do not get overwritten by
wrong or bogus on-disk disklabels.

Once that is clear, <sys/disk.h> which is included in the drivers
no longer needs to pull <sys/disklabel.h> and <sys/diskslice.h> in;
the few places that need them have gotten explicit #includes for
them.

The disklabel inside struct disk is now only for internal use in
the disk mini-layer, so instead of embedding it, we malloc it as
we need it.

This concludes (modulo any mistakes) the series of disklabel related
commits.

I believe it all amounts to a NOP for all the rest of you :-)

Sponsored by: DARPA & NAI Labs.


103695 20-Sep-2002 phk

Remove unneeded #include <sys/disklabel.h>

Sponsored by: DARPA & NAI Labs.


103670 20-Sep-2002 phk

Retire now unused DIOCGDVIRGIN kludge.

Sponsored by: DARPA & NAI Labs.


103284 13-Sep-2002 phk

"Fix" printf format issues by using %j

Sponsored by: DARPA & NAI Labs.


103283 13-Sep-2002 phk

Use biowait() rather than DIY.

Sponsored by: DARPA & NAI Labs


103279 13-Sep-2002 phk

Add a couple more of the big/little-endian conversion routines and make
them visible from userland, if need be.

I wish that the C language contained this as part of struct definitions,
but failing that, I would settle for an agreed upon set of functions for
packing/unpacking integers in various sizes from byte-streams which may
have unfriendly alignment.

This really belongs in <sys/endian.h> I guess.
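
A minimal sketch of such pack/unpack helpers. The names dec_le32(), dec_be32() and enc_le32() are chosen here for illustration and are not necessarily the names used in the tree. Working byte by byte means the pointer may have any alignment and the result does not depend on host byte order.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian value from a byte stream. */
static uint32_t
dec_le32(const uint8_t *p)
{

	return ((uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
}

/* Decode a 32-bit big-endian value from a byte stream. */
static uint32_t
dec_be32(const uint8_t *p)
{

	return ((uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	    (uint32_t)p[2] << 8 | (uint32_t)p[3]);
}

/* Encode a 32-bit value into a byte stream, least significant byte first. */
static void
enc_le32(uint8_t *p, uint32_t v)
{

	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}
```

An on-disk field at an odd offset can then be read with dec_le32(buf + 1) without casting the buffer to a struct.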


103278 13-Sep-2002 mux

Fix another two printf() format errors which weren't warned
about because the bio_blknos were bogusly casted to long long.


103276 13-Sep-2002 mux

Fix another printf() format error which wasn't warned about
because the bio_blkno was bogusly casted to an int.


103275 13-Sep-2002 mux

Fix a printf() format error on 64-bit architectures.
Also fix some style bugs on the same line.


103100 08-Sep-2002 phk

Deal with a new extended MBR partition type

Submitted by: Michal Mertl <mime@traveller.cz>


103009 06-Sep-2002 phk

Remove "magicspace". It looks good on paper, it doesn't work in practice.

Sponsored by: DARPA & NAI Labs.


103004 06-Sep-2002 phk

Don't respect the O_EXCL flag, we don't get it back on close so we cannot
correctly track it.

Spotted by: peter
Sponsored by: DARPA & NAI Labs.


102380 24-Aug-2002 marcel

Use 'p' as the partition specifier instead of 's'. We continue to use
's' for compatibility partitions (ie partitions with a BSD disklabel).
Partition numbers continue to start with 1.
Example /etc/fstab:
# Device Mountpoint FStype Options ...
/dev/da0p1 /efi msdos rw ...
/dev/da0p2 / ufs rw ...
/dev/da0p3 none swap sw ...


99028 29-Jun-2002 julian

Don't use the static thread.. it is going away.


98987 28-Jun-2002 phk

Add two new submodes to the AES encryption method.

This method is now suitable for encrypting swap spaces.

Sponsored by: DARPA & NAI Labs.


98099 10-Jun-2002 phk

Put geom_gpt.c under the GEOM option instead of having a special GEOM_GPT
option for it.


98066 09-Jun-2002 phk

Improve some on the naming.

Submitted by: iedowse


97887 05-Jun-2002 phk

Change the registration of magic spaces so it does its own memory management.

Sponsored by: DARPA & NAI Labs.


97547 30-May-2002 marcel

Add compile time asserts for the size of struct gpt_hdr and struct
gpt_ent. Use offsetof() for struct gpt_hdr to exclude padding.
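
The technique can be sketched as below. struct demo_hdr is a made-up structure, not the real struct gpt_hdr, and C11 _Static_assert stands in for whatever compile-time assert macro the kernel used at the time. Measuring up to the end of the last member with offsetof() leaves out any trailing padding the compiler adds to sizeof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk header; the compiler pads after 'flag'. */
struct demo_hdr {
	uint8_t		sig[8];
	uint32_t	revision;
	uint32_t	size;
	uint64_t	lba_self;
	uint8_t		flag;
};

/* Bytes actually used on disk: up to and including the last member. */
#define	DEMO_HDR_ONDISK_SIZE						\
	(offsetof(struct demo_hdr, flag) +				\
	    sizeof(((struct demo_hdr *)0)->flag))

/* Fail the build if the layout ever drifts from the on-disk format. */
_Static_assert(offsetof(struct demo_hdr, flag) == 24,
    "demo_hdr layout changed");
_Static_assert(DEMO_HDR_ONDISK_SIZE == 25,
    "demo_hdr on-disk size changed");
```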


97512 29-May-2002 phk

Add one copy of crc32() and crc32_tab[] in libkern, and remove it two other
places.

Comment out crc32 related definitions in zlib.h, we don't seem to have the
corresponding code in our kernel.


97392 28-May-2002 marcel

Add support to GEOM for GUID Partition Tables (GPTs). The support
is currently conditional on both the GEOM and GEOM_GPT options to
avoid getting GPT by default and having the MBR and GPT classes
clash.
The correct behaviour of the MBR class would be to back-off (reject)
a MBR if it's a Protective MBR (a MBR with a single partition of type
0xEE that spans the whole disk, as far as the MBR is concerned).
The correct behaviour of the GPT class would be to back-off (reject)
a GPT if there's a MBR that's not a Protective MBR.

At this stage it's so inconvenient to destroy a good MBR when working
with GPTs that it's more convenient to have the MBR class back-off
when it detects the GPT signature on disk and have the GPT class
ignore the MBR.

In sys/gpt.h UUIDs (GUIDs) for the following FreeBSD partitions
have been defined:

GPT_ENT_TYPE_FREEBSD
FreeBSD slice with disklabel. This is the equivalent of
the well-known FreeBSD MBR partition type.
GPT_ENT_TYPE_FREEBSD_{SWAP|UFS|UFS2|VINUM}
FreeBSD partitions in the context of disklabel. This is
speculating on the idea to use the GPT to hold partitions
instead of slices and removing the fixed (and low) limits
we have on the number of partitions.

This commit lacks a GPT image for the regression suite.


97318 26-May-2002 phk

Add a proof-of-concept encryption class.

"The only hard problem in cryptography is key-management."

All sectors are encrypted with AES in CBC mode using a constant key,
currently compiled in and all zero.

To activate this module, write the magic header on the partition:

echo "<<FreeBSD-GEOM-AES>>" | dd conv=sync of=/dev/md98

The encrypted device will be one sector shorter and have ".aes"
appended to its name.

Sponsored by: DARPA & NAI Labs.


97317 26-May-2002 phk

Give the closet-dev_t we hand to the diskdrivers a name.


97316 26-May-2002 phk

Only clear the spoiled flag if the class had no spoiled method, the spoiled
method may have deallocated the consumer already and modifying free()'ed
memory is bad style.

Sponsored by: DARPA & NAI Labs.


97272 25-May-2002 bde

Fixed printf format errors. Most of them are 64-bit daddr_t casualties.
Printing daddr_t's using %d format was always an error, but gcc's
warning about it was ignored for supported 64-bit arches and not printed
for supported 32-bit arches. Hundreds if not thousands of
previously "fixed" daddr_t printings are now broken on 32-bit machines
by casting daddr_t's to longs. daddr_t's should be printed using %jd
format, but this fix uses %lld since %j is not implemented in the
kernel yet.

Fixed some nearby format printf errors (style bugs).
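
The %jd approach mentioned above can be sketched in userland as follows. The helper name format_blkno() is hypothetical, and int64_t stands in for daddr_t. Casting to intmax_t and printing with %jd gives the same result whatever width the underlying type has on a given architecture.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a 64-bit block number portably.
 * Returns the number of characters snprintf() would have written.
 */
static int
format_blkno(char *buf, size_t len, int64_t blkno)
{

	return (snprintf(buf, len, "%jd", (intmax_t)blkno));
}
```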


97078 21-May-2002 phk

Introduce the concept of "magic spaces", and implement them in most of
the relevant classes.

Some methods may implement various "magic spaces", these are reserved
or magic areas on the disk, set aside for various and sundry purposes.
A good example is the BSD disklabel and boot code on i386 which occupies
a total of four magic spaces: boot1, the disklabel, the padding behind
the disklabel and boot2. The reason we don't simply tell people to
write the appropriate stuff on the underlying device is that (some of)
the magic spaces might be real-time modifiable. It is for instance
possible to change a disklabel while partitions are open, provided
the open partitions do not get trampled in the process.

Sponsored by: DARPA & NAI Labs.


97075 21-May-2002 phk

Remove the "-class" suffix from classes, they will not be ambiguous.

Sponsored by: DARPA & NAI Labs.


96987 20-May-2002 phk

Don't grab Giant around malloc(9) and free(9).
Don't grab Giant around wakeup(9).
Don't print verbose messages about each device found in geom_dev.
Various cleanups.

Sponsored by: DARPA & NAI Labs.


96953 19-May-2002 phk

Generalize a bit: we don't need separate functions to find the i386 and
alpha disklabels, just one function which is told where to look.

Sponsored by: DARPA & NAI Labs.


96952 19-May-2002 phk

Include needed #include for regression tests.

Sponsored by: DARPA & NAI Labs.


96475 12-May-2002 phk

Retire the bogus uses of the disklabel field d_sbsize and begin to
initialize it to zero so we don't have to have everybody and their
aunt including FFS specific header files.

Sponsored by: DARPA & NAI Labs.


95550 27-Apr-2002 phk

Fix a {} bug which doesn't have any effect yet.

Spotted by: jake


95405 24-Apr-2002 phk

Improve the cross-references in the XML output.

Explained by: des
Sponsored by: DARPA & NAI Labs.


95362 24-Apr-2002 phk

Make specific provisions for the kernel simulator used in the regression
tests, other userland programs may need to include <geom/geom.h>.

Sponsored by: DARPA & NAI Labs.


95323 23-Apr-2002 phk

Implement the GEOMGETCONF ioctl which returns vital stats for the
current device in XML in an sbuf.

Sponsored by: DARPA & NAI Labs


95321 23-Apr-2002 phk

All in a day's work: make a function static.


95310 23-Apr-2002 phk

Introduce some serious paranoia to try to catch a memory overwrite problem
as early as possible.

Sponsored by: DARPA & NAI Labs


95276 22-Apr-2002 phk

Protect against multiple #includes of this file.


95038 19-Apr-2002 phk

Make kernel dumps work with GEOM.

Notice that if the device on which the dump is set is destroyed for
any reason, the dump setting is lost. This in particular will
happen in the case of spoilage. For instance if you set dump on
ad0s1b and open ad0 for writing, ad0s* will be spoilt and the dump
setting lost. See geom(4) for more about spoiling.

Sponsored by: DARPA & NAI Labs.


95037 19-Apr-2002 phk

Make life easier for reference-vector generators in tools/regression/geom
by including a FreeBSD friendly CVS identifier in the XML output.

Sponsored by: DARPA & NAI Labs.


94287 09-Apr-2002 phk

Implement DIOCGFRONTSTUFF ioctl which reports how many bytes from the start
of the device magic stuff might occupy.

Sponsored by: DARPA & NAI Labs.


94285 09-Apr-2002 phk

Various stylistic nit picking.

Sponsored by: DARPA & NAI Labs.


94284 09-Apr-2002 phk

Introduce the convenience function g_getattr() and make it DWIM.

Sponsored by: DARPA & NAI Labs.


94283 09-Apr-2002 phk

Constification of attribute argument to g_io_[gs]etattr()

Sponsored by: DARPA & NAI Labs


94182 08-Apr-2002 phk

Move generic disk ioctls from <sys/disklabel.h> to <sys/disk.h>.

Sponsored by: DARPA & NAI Labs


94175 08-Apr-2002 phk

In reverence of the 3rd X11 development rule:

3. The only thing worse than generalizing from one example
is generalizing from no examples at all.

Remove the fwcylinders attribute before anybody gets the idea that we
alone have squared the circle.

Sponsored by: DARPA & NAI Labs.


93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


93778 04-Apr-2002 phk

Centralize EOF handling and improve access controls for bio scheduling.

Sponsored by: DARPA & NAI Labs


93776 04-Apr-2002 phk

Move access and orphan member functions from class to geom.

Sponsored by: DARPA & NAI Labs


93774 04-Apr-2002 phk

s/classs/classes/ to fixup grammar after the previous global renaming.

Sponsored by: DARPA & NAI Labs


93657 02-Apr-2002 phk

Retire the bogus ioctl DIOCGPART in toto.

Once again we can notice that badly thought out hacks ferment and infect
far more code than initially expected.

Sponsored by: DARPA and NAI Labs.


93653 02-Apr-2002 phk

One less user of the bogus DIOCGPART ioctl.


93642 02-Apr-2002 phk

Initialize a field to cater for ata-raid


93496 31-Mar-2002 phk

Here follows the new kernel dumping infrastructure.

Caveats:

The new savecore program is not complete in the sense that it emulates
enough of the old savecore's features to do the job, but implements none
of the options yet.

I would appreciate if a userland hacker could help me out getting savecore
to do what we want it to do from a user's point of view, compression,
email-notification, space reservation etc etc. (send me email if
you are interested).

Currently, savecore will scan all devices marked as "swap" or "dump" in
/etc/fstab _or_ any devices specified on the command-line.

All architectures but i386 lack an implementation of dumpsys(), but
looking at the i386 version it should be trivial for anybody familiar
with the platform(s) to provide this function.

Documentation is quite sparse at this time, more to come.

Details:

ATA and SCSI drivers should work as the dump formatting code has been
removed. The IDA, TWE and AAC have not yet been converted.

Dumpon now opens the device and uses ioctl(DIOCGKERNELDUMP) to set
the device as dumpdev. To implement the "off" argument, /dev/null
is used as the device.

Savecore will fail if handed any options since they are not (yet)
implemented. All devices marked "dump" or "swap" in /etc/fstab
will be scanned and dumps found will be saved to diskfiles
named from the MD5 hash of the header record. The header record
is dumped in readable format in the .info file. The kernel
is not saved. Only complete dumps will be saved.

All maintainer rights for this code are disclaimed: feel free to
improve and extend.

Sponsored by: DARPA, NAI Labs


93395 29-Mar-2002 phk

Remove bogus ccddump() function in favour of the standard nodump.


93358 28-Mar-2002 phk

Complete an incomplete cut&paste operation.


93354 28-Mar-2002 phk

Add preliminary PC98 class to GEOM.

I have not been able to find very much information about the PC98
extended partition layout so this is gleaned from the source in
our pc98 architecture. Corrections and patches very welcome.

Sponsored by: DARPA and NAI Labs.


93326 28-Mar-2002 phk

In the absence of any smarter way to do this, cast various printf
arguments to silence printf format warnings.


93292 27-Mar-2002 phk

Calculate the checksum in the right place for alpha. The fact that this
worked for the beast disklabel only goes to show how weak a simple
parity really is.


93250 26-Mar-2002 phk

Eliminate some thread pointers which do not make sense anymore.

Split private parts of geom.h into geom_int.h. The latter should
never be included in class implementations.


93248 26-Mar-2002 phk

Cave in to tradition and rename "methods" to "classes".


93238 26-Mar-2002 phk

Push BIO_FORMAT into a local hack inside the floppy drivers where
it belongs.


93097 24-Mar-2002 phk

Make the BSD method width/endian agnostic and support alpha
architecture labels as well.

Sponsored by: DARPA, NAI Labs.


93090 24-Mar-2002 phk

Be more systematic about conversion of on-disk formats in a endian/width
agnostic way.

Collapse the MBR and MBREXT methods into one file and make them endian/width
agnostic.

Sponsored by: DARPA & NAI Labs.
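
Annotation: a minimal sketch of what "endian/width agnostic" decoding of an
on-disk format can look like. It is not the code from this commit. The
helper and struct names (example_le32dec, example_mbr_part,
example_mbr_part_dec) are hypothetical; only the 16-byte MBR partition
entry layout (flag at byte 0, type at byte 4, little-endian start at
byte 8, little-endian size at byte 12) is taken as given.

```c
#include <stdint.h>

/*
 * Hypothetical helper: assemble a 32-bit little-endian value one byte at
 * a time, so the result does not depend on host byte order, host word
 * size, or the alignment of the source buffer.
 */
static uint32_t
example_le32dec(const unsigned char *p)
{
	return ((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}

/* Host-side view of one MBR partition entry (illustrative subset). */
struct example_mbr_part {
	uint8_t  flag;   /* boot indicator, byte 0 */
	uint8_t  type;   /* partition type, byte 4 */
	uint32_t start;  /* first sector, bytes 8..11 on disk */
	uint32_t size;   /* sector count, bytes 12..15 on disk */
};

/* Decode a raw 16-byte entry field by field instead of casting it. */
static void
example_mbr_part_dec(const unsigned char *raw, struct example_mbr_part *d)
{
	d->flag = raw[0];
	d->type = raw[4];
	d->start = example_le32dec(raw + 8);
	d->size = example_le32dec(raw + 12);
}
```

Decoding field by field avoids overlaying a C struct on the sector buffer,
which would tie the result to the compiler's padding and the CPU's byte
order.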


92718 19-Mar-2002 alfred

Fix bio->bio_blkno format warning.


92698 19-Mar-2002 phk

Add five GEOM oriented ioctls to get basic information about a geom device.


92514 17-Mar-2002 phk

Need a different #include for the userland regression test.


92513 17-Mar-2002 phk

Make this compile in the userland-regression testsuite again.


92479 17-Mar-2002 phk

Change the giant-dropping method a fair bit to keep WITNESS more
happy.


92474 17-Mar-2002 phk

Forgot to remove the old g_malloc() call when I split it.

Spotted by: dima


92408 16-Mar-2002 phk

Hmm, talk about optimizer-fodder. Make the DIOCGDVIRGIN hack work again.


92403 16-Mar-2002 phk

Add a generic and general ioctl pass-through mechanism.

It should now be possible to issue ioctls to SCSI CD drives.


92372 15-Mar-2002 phk

Teach GEOM about Sun disklabel formats.

The detection code in this method is written so that it should work on
all architectures which means that you can plug a Sun disk into an i386
now and access the partitions.

We still need an endian-agnostic ufs/ffs before this is really
interesting, but the main focus was to get sparc64 onto the GEOM
trail.


92371 15-Mar-2002 phk

Try to get used to architectures which are picky about alignment.


92363 15-Mar-2002 mckusick

Introduce the new 64-bit size disk block, daddr64_t. Change
the bio and buffer structures to have daddr64_t bio_pblkno,
b_blkno, and b_lblkno fields which allows access to disks
larger than a Terabyte in size. This change also requires
that the VOP_BMAP vnode operation accept and return daddr64_t
blocks. This delta should not affect system operation in
any way. It merely sets up the necessary interfaces to allow
the development of disk drivers that work with these larger
disk block addresses. It also allows for the development of
UFS2 which will use 64-bit block addresses.
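
Annotation: the "larger than a Terabyte" limit follows from simple
arithmetic, sketched below. This assumes the previous block number type
was a signed 32-bit integer counting 512-byte blocks; the names
EXAMPLE_DEV_BSIZE, example_max_bytes_32 and example_blk_to_off are
hypothetical and not part of the commit.

```c
#include <stdint.h>

/* Assumed block size in bytes for this illustration. */
#define EXAMPLE_DEV_BSIZE 512

/*
 * Number of bytes addressable when block numbers are non-negative values
 * of a signed 32-bit type: 2^31 blocks * 512 bytes = 2^40 bytes (1 TiB).
 */
static int64_t
example_max_bytes_32(void)
{
	return (((int64_t)INT32_MAX + 1) * EXAMPLE_DEV_BSIZE);
}

/*
 * With a 64-bit block number the same multiplication is done entirely in
 * 64 bits, so block numbers beyond 2^31 still map to valid byte offsets.
 */
static int64_t
example_blk_to_off(int64_t blkno)
{
	return (blkno * EXAMPLE_DEV_BSIZE);
}
```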


92108 11-Mar-2002 phk

First commit of the GEOM subsystem to make it easier for people to
test and play with this.

This is not yet production quality and should be run only on dedicated
test boxes.

For people who want to develop transformations for GEOM there exists a
set of shims to run geom in userland (ask phk@freebsd.org).

Reports of all kinds to: phk@freebsd.org
Please include in report:
dmesg
sysctl debug.geomdot
sysctl debug.geomconf

Known significant limitations:
no kernel dump facility.
ioctls severely restricted.

Sponsored by: DARPA, NAI Labs


91406 27-Feb-2002 jhb

Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.


88707 30-Dec-2001 phk

Reduce kernel stack usage of ccdinit() by MAXPATHLEN by using MALLOC(9).

Submitted by: Maxim Konovalov <maxim@macomnet.ru>
MFC after: 1 week


86479 17-Nov-2001 iedowse

Return EOPNOTSUPP for unknown module events.

PR: kern/18473
Submitted by: "Jeroen C. van Gelderen" <gelderen@systemics.com>


83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


83291 10-Sep-2001 kris

Fix some signed/unsigned integer confusion, and add bounds checking of
arguments to some functions.

Obtained from: NetBSD
Reviewed by: peter
MFC after: 2 weeks


82937 04-Sep-2001 phk

Kill the NCCD constant by modernizing the ccd driver.

Submitted by: sobomax
Reviewed by: phk


76366 08-May-2001 phk

Polish error handling with biofinish().


76322 06-May-2001 phk

Actually biofinish(struct bio *, struct devstat *, int error) is more general
than the bioerror().

Most of this patch is generated by scripts.


74993 29-Mar-2001 gallatin

fix a number of printf format string warnings inside DEBUG ifdefs


74810 26-Mar-2001 phk

Send the remains (such as I have located) of "block major numbers" to
the bit-bucket.


71773 29-Jan-2001 phk

Fix a braino in ccd's clone routine.

Submitted by: tegge


71699 27-Jan-2001 jhb

Back out proc locking to protect p_ucred for obtaining additional
references along with the actual obtaining of additional references.


71463 23-Jan-2001 jhb

Proc locking in the form of using the proc lock to protect p_ucred while
we obtain another reference to it for vnode operations.


69781 08-Dec-2000 dwmalone

Convert more malloc+bzero to malloc+M_ZERO.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>


65374 02-Sep-2000 phk

Avoid the modules madness I inadvertently introduced by making the
cloning infrastructure standard in kern_conf. Modules are now
the same with or without devfs support.

If you need to detect if devfs is present, in modules or elsewhere,
check the integer variable "devfs_present".

This happily removes an ugly hack from kern/vfs_conf.c.

This forces a rename of the eventhandler and the standard clone
helper function.

Include <sys/eventhandler.h> in <sys/conf.h>: it's a helper #include
like <sys/queue.h>

Remove all #includes of opt_devfs.h they no longer matter.


65208 29-Aug-2000 phk

Give ccd a cloning function.


62550 04-Jul-2000 mckusick

Move the truncation code out of vn_open and into the open system call
after the acquisition of any advisory locks. This fix corrects a case
in which a process tries to open a file with a non-blocking exclusive
lock. Even if it fails to get the lock it would still truncate the
file even though its open failed. With this change, the truncation
is done only after the lock is successfully acquired.

Obtained from: BSD/OS


60041 05-May-2000 phk

Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bde's teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.h> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by: peter


59841 01-May-2000 phk

Convert to struct bio instead of struct buf.


59794 30-Apr-2000 phk

Remove unneeded #include <vm/vm_zone.h>

Generated by: src/tools/tools/kerninclude


59249 15-Apr-2000 phk

Complete the bio/buf divorce for all code below devfs::strategy

Exceptions:
Vinum untouched. This means that it cannot be compiled.
Greg Lehey is on the case.

CCD not converted yet, casts to struct buf (still safe)

atapi-cd casts to struct buf to examine B_PHYS


58934 02-Apr-2000 phk

Move B_ERROR flag to b_ioflags and call it BIO_ERROR.

(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.


58349 20-Mar-2000 phk

Rename the existing BUF_STRATEGY() to DEV_STRATEGY()

substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.


58345 20-Mar-2000 phk

Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd. The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue. It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch was machine generated (Thanks to style(9) compliance!)

Vinum users: Greg has not had time to test this yet, be careful.
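
Annotation: a small sketch of why a zero-valued write flag invites coding
mistakes, and of what "exactly one bit set" means for a command field.
All constants and function names here (EX_OLD_B_READ, EX_OLD_B_WRITE,
EX_BIO_*, example_old_is_write_buggy, example_iocmd_valid) are
hypothetical stand-ins chosen for the illustration, not the kernel's
definitions.

```c
#include <stdint.h>

/* Old style (illustrative values): "write" is the absence of "read". */
#define EX_OLD_B_READ   0x00100000u
#define EX_OLD_B_WRITE  0x00000000u

/* New style (illustrative values): one distinct bit per command. */
#define EX_BIO_READ     0x01u
#define EX_BIO_WRITE    0x02u
#define EX_BIO_DELETE   0x04u

/*
 * The obvious-looking test for a write. Because the mask is zero the
 * expression can never be true, no matter what flags holds.
 */
static int
example_old_is_write_buggy(uint32_t flags)
{
	return ((flags & EX_OLD_B_WRITE) != 0);
}

/*
 * A command field is well formed when exactly one bit is set: it is
 * non-zero and clearing its lowest set bit leaves nothing behind.
 */
static int
example_iocmd_valid(uint32_t cmd)
{
	return (cmd != 0 && (cmd & (cmd - 1)) == 0);
}
```

A check along the lines of example_iocmd_valid() is one way a sanity test
on the command field could be phrased; how the kernel enforces it is not
shown in this log.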


56825 29-Jan-2000 peter

Remove #if NCCD > 0 - it's guaranteed to be true by config if ccd.c is
being compiled. (NCCD is used elsewhere though :-( )


56098 16-Jan-2000 phk

Cleanup some remaining bdev fluff.


55756 10-Jan-2000 phk

Give vn_isdisk() a second argument where it can return a suitable errno.

Suggested by: bde


54934 21-Dec-1999 eivind

Remove unused variable


54655 15-Dec-1999 eivind

Introduce NDFREE (and remove VOP_ABORTOP)


54279 08-Dec-1999 ken

Revamp the devstat priority system. All disks now have the same priority.
The same goes for CD drivers and tape drivers. In systems with mixed IDE
and SCSI, devices in the same priority class will be sorted in attach
order.

Also, the 'CCD' priority is now the 'ARRAY' priority, and a number of
drivers have been modified to use that priority.

This includes the necessary changes to all drivers, except the ATA drivers.
Soren will modify those separately.

This does not include and does not require any change in the devstat
version number, since no known userland applications use the priority
enumerations.

Reviewed by: msmith, sos, phk, jlemon, mjacob, bde


53577 22-Nov-1999 phk

Convert various pieces of code to use vn_isdisk() rather than checking
for vp->v_type == VBLK.

In ccd: we don't need to call VOP_GETATTR to find the type of a vnode.

Reviewed by: sos


52965 07-Nov-1999 phk

Remove the devsw magic from DEV_MODULE()


51957 05-Oct-1999 n_hibma

Removal of sys/device.h

- Move intrhook stuff into kernel.h
- Remove all occurrences of #device <device.h>
- Add kernel.h where necessary (nowhere)
- delete device.h

This file contained the structures for cfdata (old style config) and is no
longer used. It was included by most drivers.

It confuses the remote debugger as the definition of 'struct device' in
device.h is found before the one in bus_private.h.


51714 27-Sep-1999 grog

Correct typo in comment. putccdbuf() releases a buffer, it doesn't allocate one.


51701 27-Sep-1999 dillon

Buffer locking code failed to use BUF_KERNPROC and BUF_UNLOCK and
BUF_LOCKFREE a buffer prior to physically freeing it. While these
bugs did not cause a crash, they might in the future.

Added eof handling for unlabeled partitions.

Submitted by: Tor.Egge@fast.no


51658 25-Sep-1999 phk

Remove five now unused fields from struct cdevsw. They should never
have been there in the first place. A GENERIC kernel shrinks almost 1k.

Add a slightly different safetybelt under nostop for tty drivers.

Add some missing FreeBSD tags


51601 23-Sep-1999 dillon

Cleanup CCD quite a bit, including adding clarifying comments.

Enhance MIRROR code. Add a few more sanity checks and implement
a zone-based disk selector to make use of both disks when reading.

Also implement a read fail-over. If a read error occurs on one
disk, the I/O is retried on the other.

NOTE: CCD's mirroring support cannot deal with write errors properly
in regards to recovery, meaning that 'old' data under a write error may
be read non-deterministically if you reboot after a write error, and CCD
certainly cannot deal with a disk changeout. And it still can't. Use
vinum if you are really serious about mirroring. CCD basically just
implements a poor-man's mirror.


51600 23-Sep-1999 dillon

Fix ccdiodone code. The code was using cbp->cb_buf.b_bcount to
sum the total amount of I/O issued to determine when all the I/O
has completed. This fails when the EOF boundary occurs in the middle
of an I/O. Using cbp->cb_buf.b_bufsize works better.


51580 23-Sep-1999 dillon

Fix bug in pseudo-geometry calculation code that assumed a sector size
smaller than 1024 bytes.


51376 18-Sep-1999 phk

Use devstat_end_transaction_buf() rather than devstat_end_transaction()


51111 09-Sep-1999 julian

Changes to centralise the default blocksize behaviour.
More likely to follow.

Submitted by: phk@freebsd.org


50830 03-Sep-1999 julian

Revert a bunch of controversial changes by PHK. After
a quick think and discussion among various people some form of some of
these changes will probably be recommitted.

The reversion was requested by dg while discussions proceed.
PHK has indicated that he can live with this, and it has been agreed
that some form of some of these changes may return shortly after further
discussion.


50623 30-Aug-1999 phk

Make bdev userland access work like cdev userland access unless
the highly non-recommended option ALLOW_BDEV_ACCESS is used.

(bdev access is evil because you don't get write errors reported.)

Kill si_bsize_best before it kills Matt :-)

Use the specfs routines rather than having cloned copies in devfs.


50477 28-Aug-1999 peter

$Id$ -> $FreeBSD$


50403 26-Aug-1999 phk

Initialize the dev->si_bsize fields.

Submitted by: tegge
Reviewed by: phk


49771 14-Aug-1999 phk

Spring cleaning around strategy and disklabels/slices:

Introduce BUF_STRATEGY(struct buf *, int flag) macro, and use it throughout.
please see comment in sys/conf.h about the flag argument.

Remove strategy argument from all the diskslice/label/bad144
implementations, it should be found from the dev_t.

Remove bogus and unused strategy1 routines.

Remove open/close arguments from dssize(). Pick them up from dev_t.

Remove unused and unfinished setgeom support from diskslice/label/bad144 code.


48885 18-Jul-1999 phk

Use the vn_todev() function, rather than VOP_GETATTR


48865 17-Jul-1999 phk

Fix 2nd arg to udev2dev() call in ccd.c


48268 27-Jun-1999 peter

Initialize and hold locks for ccd generated bufs.

Obtained from: Matt Dillon <dillon@backplane.com>


47625 30-May-1999 phk

This commit should be an extensive NO-OP:

Reformat and initialize correctly all "struct cdevsw".

Initialize the d_maj and d_bmaj fields.

The d_reset field was not removed, although it is never used.

I used a program to do most of this, so all the files now use the
same consistent format. Please keep it that way.

Vinum and i4b not modified, patches emailed to respective authors.


47028 11-May-1999 phk

Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
major() umajor()
minor() uminor()
makedev() umakedev()
dev2udev() udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to an actual device.

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few housekeeping bits. This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference. If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I believe this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).
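
Annotation: a sketch of the "major|minor" bitmap that udev_t keeps
carrying. The bit layout used here (major in bits 8..15, minor in the
remaining bits) is an assumption about the traditional BSD encoding, and
the ex_ names are hypothetical stand-ins for umajor(), uminor() and
umakedev(); the real definitions are not shown in this log.

```c
#include <stdint.h>

/* Hypothetical stand-in for the packed major/minor number. */
typedef uint32_t ex_udev_t;

/* Assumed layout: the major number occupies bits 8..15. */
static uint32_t
ex_umajor(ex_udev_t d)
{
	return ((d >> 8) & 0xffu);
}

/* Assumed layout: the minor number is everything except bits 8..15. */
static uint32_t
ex_uminor(ex_udev_t d)
{
	return (d & 0xffff00ffu);
}

/* Pack a major and a minor number back into one value. */
static ex_udev_t
ex_umakedev(uint32_t maj, uint32_t min)
{
	return ((maj << 8) | min);
}
```

The point of the split described above is that a value like ex_udev_t only
names a potential device, whereas the kernel's dev_t becomes a reference
to an actual one.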


46635 07-May-1999 phk

Continue where Julian left off in July 1998:

Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline)
function.

Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention
to the order of the cmaj/bmaj arguments!)

Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE
(ditto!)

(Next step will be to convert all bdev dev_t's to cdev dev_t's
before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)


46625 07-May-1999 phk

Introduce two functions: physread() and physwrite() and use these directly
in *devsw[] rather than the 46 local copies of the same functions.

(grog will do the same for vinum when he has time)


46576 06-May-1999 phk

Don't use <sys/disk.h>


44671 11-Mar-1999 dg

Fixed variable overflow problem.

Obtained from: NetBSD via Mark J. Taylor <mtaylor@cybernet.com>


44617 10-Mar-1999 mjacob

Don't forget to remove devstat entries when taking
down the CCD device.


44126 18-Feb-1999 ken

Set the devstat priority for ccd devices to DEVSTAT_PRIORITY_CCD
instead of DEVSTAT_PRIORITY_OTHER.


43819 10-Feb-1999 ken

Add a prioritization field to the devstat_add_entry() call so that
peripheral drivers can determine where in the devstat(9) list they are
inserted.

This requires recompilation of libdevstat, systat, vmstat, rpc.rstatd, and
any ports that depend on the devstat code, since the size of the devstat
structure has changed. The devstat version number has been incremented as
well to reflect the change.

This sorts devices in the devstat list in "more interesting" to "less
interesting" order. So, for instance, da devices are now more important
than floppy drives, and so will appear before floppy drives in the default
output from systat, iostat, vmstat, etc.

The order of devices is, for now, kept in a central table in devicestat.h.
If individual drivers were able to make a meaningful decision on what
priority they should be at attach time, we could consider splitting the
priority information out into the various drivers. For now, though, they
have no way of knowing that, so it's easier to put them in an easy to find
table.

Also, move the checkversion() call in vmstat(8) to a more logical place.

Thanks to Bruce and David O'Brien for suggestions, for reviewing this, and
for putting up with the long time it has taken me to commit it. Bruce did
object somewhat to the central priority table (he would rather the
priorities be distributed in each driver), so his objection is duly noted
here.

Reviewed by: bde, obrien


43295 27-Jan-1999 dillon

Fix warnings preparing for -Wall -Wcast-qual

Also disable one usb module in LINT due to fatal compilation errors,
temporary.


43076 22-Jan-1999 peter

Convert ccd to a proper module vs. something started by PSEUDO_SET().


39228 15-Sep-1998 gibbs

Update system to new device statistics code.

Submitted by: "Kenneth D. Merry" <ken@plutotech.com>
mike@smith.net.au (Mike Smith)


38438 19-Aug-1998 sos

Make struct buf->b_offset reflect the real byte offset which got
in via the uio struct. This enables device drivers to use != DEV_BSIZE
blocking on devices with weird sector/block sizes (ie CDROM's).


37389 04-Jul-1998 julian

There is no such thing any more as "struct bdevsw".

There is only cdevsw (which should be renamed in a later edit to deventry
or something). cdevsw contains the union of what were in both bdevsw and
cdevsw entries. The bdevsw[] table still exists and is a second pointer
to the cdevsw entry of the device. Its major is in d_bmaj rather than
d_maj. some cleanup still to happen (e.g. dsopen now gets two pointers
to the same cdevsw struct instead of one to a bdevsw and one to a cdevsw).

rawread()/rawwrite() went away as part of this though it's not strictly
the same patch, just that it involves all the same lines in the drivers.

cdroms no longer have write() entries (they did have rawwrite (?)).
tapes no longer have support for bdev operations.

Reviewed by: Eivind Eklund and Mike Smith
Changes suggested by eivind.


37384 04-Jul-1998 julian

VOP_STRATEGY grows an (struct vnode *) argument
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>


36735 07-Jun-1998 dfr

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


34437 09-Mar-1998 julian

Slightly more correct initialisation of the new buf struct for soft-updates.
Submitted by: Chris Csanady <ccsanady@friley585.res.iastate.edu>
Suggested by: Kirk McKusick


33740 22-Feb-1998 jkh

Properly bzero() structures after they're returned from getccdbuf().
Submitted by: Chris Csanady <ccsanady@friley585.res.iastate.edu>


33365 15-Feb-1998 jkh

Revert part of my previous patch - I don't see the *need*
to call splbio() from within an interrupt handler here. :-)


33363 15-Feb-1998 jkh

missing spl() call and off by one error in the handling of the partitions.
Submitted by: Chris Csanady <ccsanady@friley585.res.iastate.edu>
Obtained from: OpenBSD


32921 31-Jan-1998 eivind

Remove unused devfs include. (Julian or Satoshi might want to add proper
DEVFS support here; just including the header file doesn't do any good, and
would make this depend on opt_devfs.h)


31270 18-Nov-1997 phk

There is no ccdread() nor ccdwrite().


30688 24-Oct-1997 phk

Staticize.


30294 11-Oct-1997 phk

Remove a #ifndef __FreeBSD__ chunk.


26640 14-Jun-1997 bde

Removed unused #includes.


25360 01-May-1997 sos

Make ccd use the maxsecsize sector size as denominator; this
fixes ccd on != 512-byte devices.


24203 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 1: don't include
it when it is not used. In most cases, the reasons for including it
went away when the special ioctl headers became self-sufficient.


22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


22538 10-Feb-1997 mpp

Make ccd compile again after the Lite2 merge.

VOP_UNLOCK was being called with the wrong number of arguments.


21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


21470 10-Jan-1997 dyson

Fix CCD for bounced devices.


18084 06-Sep-1996 phk

Remove devconf, it never grew up to be of any use.


17275 24-Jul-1996 asami

Fail when odd number of disks are specified with mirror flag. Memory
leak fixes. Miscellaneous cleanup.

Partially submitted by: Matt White <mwhite+@CMU.EDU>


17264 23-Jul-1996 phk

Make a "DWIM" function for adding [bc]devsw entries for bdev drivers.

Saves about 280 bytes of source per driver, 56 bytes in object size
and another 56 bytes moves from data to bss.

No functional change intended nor expected.

GENERIC should be about one k smaller now :-)


17237 21-Jul-1996 phk

Substitute raw{read|write} for ccd{read|write}


16322 12-Jun-1996 gpalmer

Clean up -Wunused warnings.

Reviewed by: bde


15765 13-May-1996 asami

Add #ifndef/#endif around the "#define CCD_OFFSET 16", so you can override
it in your kernel config file.

Requested (in essence) by: phk
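
Annotation: the guard described above has the usual overridable-default
shape, sketched here. The helper ex_component_usable() and the idea that
the offset is simply subtracted from a component's size are assumptions
made for the illustration, not taken from ccd.c.

```c
/*
 * Default only applies when the build (for example a kernel config
 * option) has not already defined CCD_OFFSET.
 */
#ifndef CCD_OFFSET
#define CCD_OFFSET 16
#endif

/*
 * Hypothetical helper: space left in a component after reserving
 * CCD_OFFSET units at its front. Returns 0 if the component is too
 * small to hold anything beyond the reserved area.
 */
static long
ex_component_usable(long component_size)
{
	if (component_size <= CCD_OFFSET)
		return (0);
	return (component_size - CCD_OFFSET);
}
```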


15763 13-May-1996 asami

Leave 16 lines in front of each component partition. It's now safe to
use sd87a or sd237e even if they start at the beginning of the slice.

You can also use sd85c if you prefer, although you need to change the
type field in the disklabel to "4.2BSD".


15369 24-Apr-1996 asami

Add missing "int" to static var.


14821 26-Mar-1996 asami

Change how mirror writes are handled, according to the discussion on the
mailing list.

When initiating a write, ccdbuffer() returns two "struct ccdbuf *"s
linked together by the cb_mirror field. "cb_pflags &
CCDPF_MIRROR_DONE" is set to 0 on both of them.

When a component returns to ccdiodone(), it checks if "cb_pflags &
CCDPF_MIRROR_DONE" is set or not. If not, it sets the partner's
flag and returns. If it is, it means its partner has already
returned, so it will go to the regular cleanup (which is in the
fallthrough code).

There should be no performance or functionality changes unless the
higher-level scsi driver does something with the resid value. The change
is purely aesthetical and prepares us for the parity implementation.
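
Annotation: the completion handshake described above can be sketched as
follows. The struct is a cut-down, hypothetical stand-in for struct
ccdbuf (only the two fields named in the message are kept), and the
function names are invented for the illustration.

```c
#include <stddef.h>

#define EX_CCDPF_MIRROR_DONE 0x01

/* Hypothetical subset of a ccd buffer: flags plus the partner link. */
struct ex_ccdbuf {
	int cb_pflags;
	struct ex_ccdbuf *cb_mirror;
};

/* Link two buffers for one mirrored write, with the flag clear on both. */
static void
ex_mirror_pair_init(struct ex_ccdbuf *a, struct ex_ccdbuf *b)
{
	a->cb_pflags = 0;
	b->cb_pflags = 0;
	a->cb_mirror = b;
	b->cb_mirror = a;
}

/*
 * Called when one component finishes. Returns 0 if this is the first
 * half of a mirrored pair to complete (the partner is marked and nothing
 * else happens), or 1 if the caller should fall through to the regular
 * cleanup (partner already done, or the buffer is not mirrored).
 */
static int
ex_mirror_iodone(struct ex_ccdbuf *cbp)
{
	if (cbp->cb_mirror != NULL &&
	    (cbp->cb_pflags & EX_CCDPF_MIRROR_DONE) == 0) {
		cbp->cb_mirror->cb_pflags |= EX_CCDPF_MIRROR_DONE;
		return (0);
	}
	return (1);
}
```

Whichever component completes second sees the flag its partner set and
performs the cleanup, so the order of completion does not matter.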


14730 21-Mar-1996 asami

Ported to 2.2-current. Uses [bc]devsw_add(), and is also now a proper
pseudo-device.

Doesn't use devfs correctly yet.


13784 31-Jan-1996 asami

Fix one warning and fix one bug found while looking at another warning (but
caused by a different reason):

. #ifndef __FreeBSD__ around check for negative size, FreeBSD size_t is
unsigned

. Disable mirror/parity if interleave size is 0 (i.e., serial concatenation).


13775 31-Jan-1996 asami

Mirror support. When CCDF_MIRROR is set:

(1) The reads are always done from the first n/2 disks.

(2) Each write is done twice, to the "data" disk (in the first half) and
the "mirror" disk (in the second half).

ccdbuffer() now takes an extra argument (struct ccdbuf **) and stores
the pointer to ccdbuf in there. In case of a mirrored write, it
allocates and stores two pointers. The "residual" is also doubled
for mirrored writes so that ccdiodone() can correctly tell when all
the writes are done.
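
Annotation: a sketch of the data/mirror split described above. Pairing
data disk i with mirror disk i + n/2 is an assumption; the message only
says that the data disk is in the first half and the mirror disk in the
second. Rejecting an odd disk count mirrors the later change in r17275.
All names are hypothetical.

```c
/* Component indices touched by one mirrored write. */
struct ex_mirror_pair {
	int data;    /* index in the first half; also the read source */
	int mirror;  /* assumed partner index in the second half */
};

/*
 * Map a logical component index (0 .. ndisks/2 - 1) to its data and
 * mirror disks. Returns 0 on success, -1 if the disk count is not a
 * positive even number or the index is outside the first half.
 */
static int
ex_mirror_map(int ndisks, int idx, struct ex_mirror_pair *out)
{
	if (ndisks <= 0 || ndisks % 2 != 0)
		return (-1);
	if (idx < 0 || idx >= ndisks / 2)
		return (-1);
	out->data = idx;
	out->mirror = idx + ndisks / 2;
	return (0);
}
```

Reads would use only out->data, while a write would be issued to both
indices, which is why the residual count is doubled for mirrored writes.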


13764 30-Jan-1996 asami

Prepare for adding mirroring. Check for flags (mirror forces uniform),
reduce the size to half, etc. Right now it only uses the first n/2 disks
for both read and write.


13173 02-Jan-1996 asami

Prepare to add support for parity. Report the post-parity size,
allocate space around parity blocks.


13070 28-Dec-1995 asami

Added $Id$.


13046 27-Dec-1995 asami

Changes to make it work on FreeBSD-2.1.


13041 27-Dec-1995 asami

ccd.c and ccd.4 from NetBSD-current circa 12/25/95.