History log of /freebsd-10-stable/sys/cam/ctl/ctl_private.h
Revision Date Author Comments
# 314767 06-Mar-2017 mav

MFC r314338: Polish handling of different reset flavours.

The biggest change is that ctl_remove_initiator() now generates I_T NEXUS
LOSS event, cleaning part of LUs state related to the initiator.


# 314753 06-Mar-2017 mav

MFC r314255: Reenable CTL_WITH_CA, optimizing it for lower memory usage.

This code was disabled due to its high memory usage. But now we need this
functionality for cfumass(4) frontend, since USB MS BBB transport does not
support autosense.


# 313369 07-Feb-2017 mav

MFC r312603: Add initial support for CTL module unloading.

It is only a first step and not perfect, but better than nothing.
The main blocker is CAM target frontend, that can not be unloaded,
since CAM does not have mechanism to unregister periph driver now.


# 312841 26-Jan-2017 mav

MFC r311804: Rewrite CTL statistics in more simple and scalable way.

Instead of collecting statistics for each combination of ports and logical
units, that consumed ~45KB per LU with present number of ports, collect
separate statistics for every port and every logical unit separately, that
consume only 176 bytes per each single LU/port. This reduces struct
ctl_lun size down to just 6KB.

Also new IOCTL API/ABI does not hardcode number of LUs/ports, and should
allow handling of very large quantities.

Old API is still enabled in stable branches for compatibility reasons.


# 312839 26-Jan-2017 mav

MFC r311787: Allocate memory for prevent flags only for removable LUs.

This array takes 64KB of RAM now, which was more than half of struct ctl_lun
size. If at some point we support more ports, this may need another tune.


# 311440 05-Jan-2017 mav

MFC r310524: Improve length handling when writing sense data.

- Allow maximal sense size limitation via Control Extension mode page.
- When sense size limited, include descriptors atomically: whole or none.
- Set new SDAT_OVFL bit if some descriptors don't fit the limit.
- Report real written sense length instead of static maximal 252 bytes.
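
The "whole or none" rule for descriptors can be sketched as follows. This is a minimal illustration, not the CTL code: the struct, the function name and the `overflow` flag (standing in for the SDAT_OVFL bit) are hypothetical, and only the 252-byte maximum comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical output buffer for descriptor-format sense data. */
struct sketch_sense_out {
	uint8_t buf[252];  /* static maximum mentioned in the commit message */
	size_t  len;       /* bytes actually written so far */
	bool    overflow;  /* stands in for the SDAT_OVFL bit */
};

/*
 * Append one descriptor only if all of it fits under `limit`.
 * A descriptor that does not fit is skipped entirely and the
 * overflow flag is raised; it is never truncated.
 */
static void
sketch_add_desc(struct sketch_sense_out *s, const uint8_t *desc,
    size_t dlen, size_t limit)
{
	assert(s->len <= sizeof(s->buf));
	if (limit > sizeof(s->buf))
		limit = sizeof(s->buf);
	if (s->len + dlen > limit) {
		s->overflow = true;
		return;
	}
	memcpy(s->buf + s->len, desc, dlen);
	s->len += dlen;
}
```

With a limit of 20 bytes, a first 12-byte descriptor is written in full, a second one is dropped, and `len` reports the 12 bytes really written.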


# 311428 05-Jan-2017 mav

MFC r310366: Add support for SITUA bit in Logical Block Provisioning mode page.

VMware tries to enable this bit to avoid multiple threshold notifications
in case of multiple initiators connected to the same LUN. Unfortunately
their code sends MODE SELECT(6) request with parameter length hardcoded
for the page without any thresholds. Since we have four thresholds and our
page is bigger, this attempt fails, which is correct in my understanding.
So all we can do about this now is to report proper error code and hope
VMware fixes their code one day.


# 311407 05-Jan-2017 mav

MFC r310265: Add set of macros to simplify code access to mode pages fields.


# 311403 05-Jan-2017 mav

MFC r310257: Improve support for informational exceptions.

While CTL still has no real events to report in this way (like SMART),
it is possible to trigger false event by manually setting TEST bit in
Informational Exceptions Control mode page, that can be useful for
initiator testing. This code supports all flavours of IE reporting:
UNIT ATTENTION, RECOVERED ERROR and NO SENSE sense keys, REQUEST SENSE
command and Informational Exceptions log page.


# 291388 27-Nov-2015 mav

MFC r290670: Modify target port groups logic in CTL.

- Introduce "ha_shared" port option, which being set to "on" moves the
port into separate port group, shared between HA nodes. This allows to
better handle cases when iSCSI portals are bound to CARP address that can
dynamically move between nodes. Some initiators (at least VMware) don't
detect that after iSCSI reconnect they've attached to different SCSI port
from different port group, that totally breaks ALUA status parsing.
In theory, I believe, it should be enough to have different iSCSI portal
group tags on different nodes to make initiators detect this condition,
but it seems like VMware ignores those values, and even full LUN retaste
forced by UA does not help.
- Make CTL report up to three port groups: 1 -- non-HA mode or ports
with "ha_shared" option set, 2 -- HA node 1, 3 -- HA node 2.
- Report Transitioning state for all port groups when HA interlink is
connected, but neither of nodes is primary for the LUN.


# 288819 05-Oct-2015 mav

MFC r288448: Unify PR variable names to reduce confusion.


# 288816 05-Oct-2015 mav

MFC r288369: Really implement PREVENT ALLOW MEDIUM REMOVAL command.


# 288812 05-Oct-2015 mav

MFC r288358: Add CD/DVD Capabilities and Mechanical Status Page.

This page is obsolete since MMC-4, but still used by some software.

approved by:


# 288811 05-Oct-2015 mav

MFC r288348: Implement media load/eject support for removable devices.

In case of block backend eject really closes the backing store, while
load tries to open it back. Failed store open is reported as no media.


# 288810 05-Oct-2015 mav

MFC r288310: Add to CTL initial support for CDROMs and removable devices.

Relnotes: yes


# 288809 05-Oct-2015 mav

MFC r288264: Allow LOG SENSE command on non-disk devices.


# 288807 05-Oct-2015 mav

MFC r288261: Move ioctl frontend defines where they belong.


# 288806 05-Oct-2015 mav

MFC r288260: Remove few more unused variables.


# 288805 05-Oct-2015 mav

MFC r288259: Remove some duplicate, legacy, dead and questionable code.


# 288788 05-Oct-2015 mav

MFC r288110: Add support for Control extension mode page.


# 288778 05-Oct-2015 mav

MFC r287993: Split two command flags with different meaning.

This is only a cosmetic change.


# 288768 05-Oct-2015 mav

MFC r287921: When reporting TPT UA, report which of thresholds was reached.


# 288732 05-Oct-2015 mav

MFC r287621: Reimplement CTL High Availability.

CTL HA functionality was originally implemented by Copan many years ago,
but large part of the sources was never published. This change includes
clean room implementation of the missing code and fixes for many bugs.

This code supports dual-node HA with ALUA in four modes:
- Active/Unavailable without interlink between nodes;
- Active/Standby with second node handling only basic LUN discovery and
reservation, synchronizing with the first node through the interlink;
- Active/Active with both nodes processing commands and accessing the
backing storage, synchronizing with the first node through the interlink;
- Active/Active with second node working as proxy, transferring all
commands to the first node for execution through the interlink.

Unlike original Copan's implementation, depending on specific hardware,
this code uses simple custom TCP-based protocol for interlink. It has
no authentication, so it should never be enabled on public interfaces.

The code may still need some polishing, but generally it is functional.

Relnotes: yes
Sponsored by: iXsystems, Inc.


# 288727 05-Oct-2015 mav

MFC r287499: Move setting of media parameters inside open routines.

This is preparation for possibility to open/close media several times
per LUN life cycle. While there, rename variables to reduce confusion.
As additional bonus this allows to open read-only media, such as ZFS
snapshots.


# 288720 05-Oct-2015 mav

MFC r286807: Move "ioctl" CAM frontend into separate file.

It has nothing to share with too huge ctl.c other than device descriptor,
but even that may be counted as design error that may be fixed later.
At some point we may even want to have several ioctl ports.


# 288719 05-Oct-2015 mav

MFC r286806: Drop "internal" CTL frontend.

Its idea was to be a simple initiator and execute several commands from
kernel level, but FreeBSD never had consumer for that functionality,
while its implementation polluted many unrelated places.


# 286930 19-Aug-2015 mav

MFC r286345: Relax serialization of SYNCHRONIZE CACHE commands.

Before this change SYNCHRONIZE CACHE commands were executed exclusively,
as if they had ORDERED tag. But looking through SCSI specs I've found
no reason to be so strict. For reads this ordering seems pointless.
For writes it looks less obvious, so I left ordering against preceding
write commands, while following ones are no longer required to wait.


# 284798 25-Jun-2015 mav

MFC r284640: Bring per-port LUN enable/disable code up to date:
- remove last remnants of never implemented multiple targets support;
- implement missing support for LUN mapping in this area.

Due to existing locking constraints LUN mapping code is practically
unlocked at this point. Hopefully it is not too racy to survive until
somebody gets an idea how to call sleeping frontend methods under a lock also
taken by the same frontend in non-sleepable context. :(


# 284796 25-Jun-2015 mav

MFC r284639: Introduce separate lock for tokens to reduce ctl_lock scope.


# 279273 25-Feb-2015 mav

MFC r278584: Add support for General Statistics and Performance log page.

CTL already collects most of statistics reported there, so why not.


# 279002 19-Feb-2015 mav

MFC r278037: CTL LUN mapping rewrite.

Replace iSCSI-specific LUN mapping mechanism with new one, working for any
ports. By default all ports are created without LUN mapping, exposing all
CTL LUNs as before. But, if needed, LUN mapping can be manually set on
per-port basis via ctladm. For its iSCSI ports ctld does it via ioctl(2).
The next step will be to teach ctld to work with FibreChannel ports also.

Respecting additional flexibility of the new mechanism, ctl.conf now allows
alternative syntax for LUN definition. LUNs can now be defined in global
context, and then referenced from targets by unique name, as needed. It
allows same LUN to be exposed several times via multiple targets.

While there, increase limit for LUNs per target in ctld from 256 to 1024.
Some initiators do not support LUNs above 255, but that is not our problem.

Relnotes: yes
Sponsored by: iXsystems, Inc.


# 275895 18-Dec-2014 mav

MFC r275568:
Count consecutive read requests as blocking in CTL for files and ZVOLs.

Technically read requests can be executed in any order or simultaneously
since they are not changing any data. But ZFS prefetcher goes crazy when
it receives consecutive requests from different threads. Since prefetcher
works on level of separate blocks, instead of two consecutive 128K requests
it may receive 32 8K requests in mixed order.

This patch is more a workaround than a real fix, and it does not fix all of
prefetcher problems, but it improves sequential read speed by 3-4 times
in some configurations. On the other side it may hurt performance if
some backing store has no prefetch, which is why it is disabled by default
for raw devices.


# 275892 18-Dec-2014 mav

MFC r275474: Add GET LBA STATUS command support to CTL.

It is implemented for LUNs backed by ZVOLs in "dev" mode and files.
GEOM has no such API, so for LUNs backed by raw devices all LBAs will
be reported as mapped/unknown.

Sponsored by: iXsystems, Inc.


# 275888 18-Dec-2014 mav

MFC r275458:
Do not pre-allocate UNIT ATTENTIONs storage for every possible initiator.

Abusing ability of major UAs cover minor ones we may not account UAs for
inactive ports. Allocate UAs storage for port and start accounting only
after some initiator from that port fetched its first POWER ON OCCURRED.

This reduces per-LUN CTL memory usage from >1MB to less than 100K.


# 275886 18-Dec-2014 mav

MFC r275447:
Do not pre-allocate reservation keys memory for every possible initiator.

In configurations with many ports, like iSCSI, each LUN is typically
accessed only by limited subset of ports. Allocating that memory on
demand allows to reduce CTL memory usage from 5.3MB/LUN to 1.3MB/LUN.


# 275885 18-Dec-2014 mav

MFC r275405: Convert persis_offset from global variable to softc field.


# 275878 18-Dec-2014 mav

MFC r274962: Replace home-grown CTL IO allocator with UMA.

Old allocator created significant lock congestion protecting its lists
of preallocated I/Os, while UMA provides much better SMP scalability.
The downside of UMA is lack of reliable preallocation, that could guarantee
successful allocation in non-sleepable environments. But careful code
review has shown that only CAM target frontend really has that requirement.
Fix that by making that frontend preallocate and statically bind CTL I/O for
every ATIO/INOT it preallocates anyway. That allows to avoid allocations
in hot I/O path. Other frontends either may sleep in allocation context
or can properly handle allocation errors.

On 40-core server with 6 ZVOL-backed LUNs and 7 iSCSI client connections
this change increases peak performance from ~700K to >1M IOPS! Yay! :)

Sponsored by: iXsystems, Inc.


# 275493 05-Dec-2014 mav

MFC r274785: Partially reconstruct Active/Standby clustering.

In this mode one head is in Active state, supporting all commands, while
another is in Standby state, supporting only minimal LUN discovery subset.

It is still incomplete since Standby state requires reservation support,
which is impossible to do right without having interlink between heads.
But it allows to run some basic experiments.


# 274732 20-Nov-2014 mav

MFC r274154, r274163:
Add to CTL support for logical block provisioning threshold notifications.

For ZVOL-backed LUNs this allows to inform initiators if storage's used or
available spaces get above/below the configured thresholds.

Sponsored by: iXsystems, Inc.


# 274492 13-Nov-2014 mav

MFC r274206:
Synchronize medium rotation rate in legacy Rigid Disk Drive Geometry mode
page with modern Block Device Characteristics VPD page.


# 274003 03-Nov-2014 mav

MFC r273730, r273731:
Reduce code duplication around Write Exclusive persistent reservation.

While there, allow some more commands to pass persistent reservation.


# 274002 03-Nov-2014 mav

MFC r273711:
Allocate buffer for READ BUFFER/WRITE BUFFER commands on demand.

These commands are rare, but consume additional 256KB RAM per LUN.


# 273978 02-Nov-2014 mav

MFC r273075: Remove couple Copan's vendor-specific mode pages.

Those pages are highly system-/hardware-specific, the code is incomplete,
and so they hardly can be useful for anybody else.


# 273977 02-Nov-2014 mav

MFC r273073: Some groundwork for later Informational Exceptions support.

This includes support for:
- Read-Write Error Recovery mode page;
- Informational Exceptions Control mode page;
- Logical Block Provisioning mode page;
- LOG SENSE command.

No real Informational Exceptions features yet. This is only a placeholder.


# 273531 23-Oct-2014 mav

MFC r273163: Implement more functional CTL debug logging.

Setting bits in kern.cam.ctl.debug allows to log errors, commands and some
commands data respectively.


# 273323 20-Oct-2014 mav

MFC r273038: Add support for READ DEFECT DATA (10/12) commands.

SPC-4 r2 allows to return empty defect list if the list is not supported.
We don't really support defect data lists, but this suppresses some errors.


# 273314 20-Oct-2014 mav

MFC r272893:
Store persistent reservation keys as uint64_t instead of uint8_t[8].

This allows to simplify the code and save 512KB of RAM per LUN (8%)
by removing no longer needed "registered" keys flags.
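
A rough sketch of why a separate "registered" flag becomes unnecessary once keys are plain 64-bit integers. It assumes that a zero key can stand for "not registered"; the table layout, the names and the initiator count are hypothetical and are not taken from ctl_private.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_INITIATORS 8	/* hypothetical; not CTL's real limit */

/* Hypothetical per-LUN table: one 64-bit key per initiator, 0 = none. */
struct sketch_pr_table {
	uint64_t keys[SKETCH_MAX_INITIATORS];
};

/* Decode the 8-byte big-endian key as carried in a SCSI parameter list. */
static uint64_t
sketch_key_decode(const uint8_t raw[8])
{
	uint64_t key = 0;

	assert(raw != NULL);
	for (int i = 0; i < 8; i++)
		key = (key << 8) | raw[i];
	return (key);
}

/* Registration is implied by a non-zero key; no extra flag array. */
static bool
sketch_key_registered(const struct sketch_pr_table *t, unsigned iid)
{
	return (iid < SKETCH_MAX_INITIATORS && t->keys[iid] != 0);
}
```

Comparing two keys then becomes a single integer comparison instead of a byte-array compare.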


# 273312 20-Oct-2014 mav

MFC r272748:
Implement software (mode page) and hardware (config) write protection.


# 272639 06-Oct-2014 mav

MFC r271945:
Simplify legacy reservation handling. Drop it on I_T nexus loss.


# 272630 06-Oct-2014 mav

MFC r271507:
Implement control over command reordering via options and control mode page.

It allows to bypass range checks between UNMAP and READ/WRITE commands,
which may introduce additional delays while waiting for UNMAP parameters.
READ and WRITE commands are always processed in safe order since their
range checks are almost free.


# 272616 06-Oct-2014 mav

MFC r271309:
Improve cache control support, including DPO/FUA flags and the mode page.

At this moment it works only for files and ZVOLs in device mode since BIOs
have no respective cache control flags (DPO/FUA).


# 271529 13-Sep-2014 mav

MFC r271362:
Make ctl_port_mask an array to support more than 32 ports.

Overflow reported by Coverity.

CID: 1229894

Approved by: re (marius)
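
The general technique behind this change, a bitmask spread over an array of words, can be sketched like this. The type, the function names and the 128-port limit are assumptions made for the example, not the actual ctl_port_mask definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS  128	/* hypothetical limit for illustration */
#define SKETCH_MASK_WORDS ((SKETCH_MAX_PORTS + 31) / 32)

/* One bit per port, spread over as many 32-bit words as needed. */
struct sketch_port_mask {
	uint32_t words[SKETCH_MASK_WORDS];
};

static void
sketch_port_set(struct sketch_port_mask *m, unsigned port)
{
	assert(port < SKETCH_MAX_PORTS);
	m->words[port / 32] |= (uint32_t)1 << (port % 32);
}

static bool
sketch_port_isset(const struct sketch_port_mask *m, unsigned port)
{
	assert(port < SKETCH_MAX_PORTS);
	return ((m->words[port / 32] & ((uint32_t)1 << (port % 32))) != 0);
}
```

With a single 32-bit mask, port 40 cannot be represented at all; here it lands in word 1, bit 8, and does not alias port 8.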


# 270106 17-Aug-2014 mav

MFC r269497:
Add support for Windows dialect of EXTENDED COPY command, aka Microsoft ODX.

This allows to avoid extra network traffic when copying files on NTFS iSCSI
disks within one storage host by drag'n'dropping them in Windows Explorer
of Windows 8/2012. It should also accelerate Hyper-V VM operations, etc.


# 269298 30-Jul-2014 mav

MFC r268808:
Increase maximal number of SCSI ports in CTL from 32 to 128.

After I gave each iSCSI target its own port, the old limit appeared to be
not so big. This change almost proportionally increases per-LUN memory
use, but it is still three times better than it was before r268807.


# 269297 30-Jul-2014 mav

MFC r268807:
Reduce per-LUN memory usage from 18MB to 1.8MB.

CTL never had use for CA support code since SPI has gone, and there are not
even any frontends supporting that. But it still was reserving 256 bytes of
memory per LUN per every possible initiator on every possible port.

Wrap unused code with ifdef's in case somebody ever needs it.


# 269296 30-Jul-2014 mav

MFC r268767:
Add support for VMWare dialect of EXTENDED COPY command, aka VAAI Clone.

This allows to clone VMs and move them between LUNs inside one storage
host without generating extra network traffic to the initiator and back,
and without being limited by network bandwidth.

LUNs participating in copy operation should have UNIQUE NAA or EUI IDs set.
For LUNs without these IDs VMWare will use traditional copy operations.

Beware: the above LUN IDs explicitly set to values non-unique from the VM
cluster point of view may cause data corruption if wrong LUN is addressed!

Sponsored by: iXsystems, Inc.


# 269295 30-Jul-2014 mav

MFC r268581:
Merge several equal serialization indexes.


# 268692 15-Jul-2014 mav

MFC r268362:
Teach ctl_add_initiator() to dynamically allocate IIDs from pool.

If port passed negative IID value, the function will try to allocate IID
from the pool of unused, based on passed wwpn or name arguments. It does
all its best to make IID unique and persistent across reconnects.

This makes persistent reservation properly work for iSCSI. Previously,
in case of reconnects, reservation could be unexpectedly lost, or even
migrate between initiators.


# 268686 15-Jul-2014 mav

MFC r268308:
Make REPORT TARGET PORT GROUPS command report realistic data instead of
hardcoded garbage.


# 268683 15-Jul-2014 mav

MFC r268293:
Bury devid port method, which was a gross hack.

Instead make ports provide wanted port and target IDs, and LUNs provide
wanted LUN IDs. After that core Device ID VPD code only had to link all
of them together and add relative port and port group numbers.

LUN ID for iSCSI LUNs no longer created by CTL, but by ctld, and passed
to CTL as "scsiname" LUN option. This makes LUNs to report the same set
of IDs, independently from the port through which it is accessed, as
required by SCSI specifications.


# 268677 15-Jul-2014 mav

MFC r268266, r268275:
Separate concepts of frontend and port.

Before iSCSI implementation CTL had no knowledge about frontend drivers,
it had only frontends, which really were ports (alike to LUNs, if comparing
to backends). But iSCSI added there ioctl() method, which does not belong
to frontend as a port, but belongs to a frontend driver.


# 268675 15-Jul-2014 mav

MFC r268103:
Add support for REPORT TIMESTAMP command.


# 268674 15-Jul-2014 mav

MFC r268096, r268306, r268361:
Add more formal and strict command parsing and validation.

For every supported command define CDB length and mask of bits that are
allowed to be set. This allows to remove bunch of checks through the code
and still make the validation more strict. To properly do it for commands
supporting multiple service actions, formalize their parsing by adding
subtables for each of such commands.

As visible effect, this change allows to add support for REPORT SUPPORTED
OPERATION CODES command, reporting to client all the data about supported
SCSI commands, except timeouts.
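
Table-driven validation of this kind can be sketched as below. The entry layout, the names and the mask values for INQUIRY are illustrative assumptions, not a copy of the CTL command table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical table entry: the CDB length plus, for every byte after
 * the opcode, a mask of the bits an initiator is allowed to set.
 */
struct sketch_cmd_entry {
	uint8_t opcode;
	uint8_t length;
	uint8_t usage[15];
};

/* INQUIRY, 6-byte CDB; the mask values are illustrative only. */
static const struct sketch_cmd_entry sketch_inquiry = {
	.opcode = 0x12,
	.length = 6,
	.usage = { 0x01, 0xff, 0xff, 0xff, 0x07 },
};

/* Reject a CDB that is too short or sets any bit outside the mask. */
static bool
sketch_cdb_valid(const struct sketch_cmd_entry *e, const uint8_t *cdb,
    size_t len)
{
	assert(e->length >= 1 && e->length <= 16);
	if (len < e->length || cdb[0] != e->opcode)
		return (false);
	for (size_t i = 1; i < e->length; i++) {
		if ((cdb[i] & ~e->usage[i - 1]) != 0)
			return (false);
	}
	return (true);
}
```

One generic loop replaces per-command checks scattered through the handlers, and the same table can describe which bits are supported when answering a query about a command.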


# 268556 12-Jul-2014 mav

MFC r267643, r267873, r268391, r268398:
Introduce fine-grained CTL locking to improve SMP scalability.

Split global ctl_lock, historically protecting most of CTL context:
- remaining ctl_lock now protects lists of frontends and backends;
- per-LUN lun_lock(s) protect LUN-specific information;
- per-thread queue_lock(s) protect request queues.
This allows to radically reduce congestion on ctl_lock.

Create multiple worker threads, depending on number of CPUs, and assign
each LUN to one of them. This allows to spread load between multiple CPUs,
still avoiding congestion on queues and LUNs locks.

On 40-core server, exporting 5 LUNs, each backed by gstripe of SATA SSDs,
accessed via 6 iSCSI connections, this change improves peak request rate
from 250K to 680K IOPS.

Sponsored by: iXsystems, Inc.


# 268550 12-Jul-2014 mav

MFC r267905:
Add READ BUFFER and improve WRITE BUFFER SCSI commands support.

This gives some use to 512KB per-LUN buffers, allocated for Copan-specific
processor code and not used. It allows, for example, to test transport
performance and/or correctness without accessing the media, as supported
by Linux version of sg3_utils.


# 268151 02-Jul-2014 mav

MFC r267537:
Add support for VERIFY(10/12/16) and COMPARE AND WRITE SCSI commands.

Make data_submit backends method support not only read and write requests,
but also two new ones: verify and compare. Verify just checks readability
of the data in specified location without transferring them outside.
Compare reads the specified data and compares them to received data,
returning error if they are different.

VERIFY(10/12/16) commands request either verify or compare from backend,
depending on BYTCHK CDB field. COMPARE AND WRITE command executed in two
stages: first it requests compare, and then, if succeeded, requests write.
Atomicity of operation is guaranteed by CTL request ordering code.

Sponsored by: iXsystems, Inc.


# 268144 02-Jul-2014 mav

MFC r267485:
Remove non-functional remnants of control LUN -- 18MB of RAM for nothing.


# 265634 08-May-2014 mav

MFC r264274, r264279, r264283, r264296, r264297:
Add support for SCSI UNMAP commands to CTL.

This patch adds support for three new SCSI commands: UNMAP, WRITE SAME(10)
and WRITE SAME(16). WRITE SAME commands support both normal write mode
and UNMAP flag. To properly report UNMAP capabilities this patch also adds
support for reporting two new VPD pages: Block limits and Logical Block
Provisioning.

UNMAP support can be enabled per-LUN by adding "-o unmap=on" to `ctladm
create` command line or "option unmap on" to lun sections of /etc/ctl.conf.

At this moment UNMAP supported for ramdisks and device-backed block LUNs.
It was tested to work great with ZFS ZVOLs. For file-backed LUNs UNMAP
support is unfortunately missing due to absence of respective VFS KPI.

Sponsored by: iXsystems, Inc


# 260477 09-Jan-2014 mav

MFC r257946:
Introduce separate mutex lock to protect CTL I/O pools, slightly
reducing global CTL lock scope and congestion.

While there, simplify CTL I/O pools KPI, hiding implementation details.


# 288732 05-Oct-2015 mav

MFC r287621: Reimplement CTL High Availability.

CTL HA functionality was originally implemented by Copan many years ago,
but large part of the sources was never published. This change includes
clean room implementation of the missing code and fixes for many bugs.

This code supports dual-node HA with ALUA in four modes:
- Active/Unavailable without interlink between nodes;
- Active/Standby with second node handling only basic LUN discovery and
reservation, synchronizing with the first node through the interlink;
- Active/Active with both nodes processing commands and accessing the
backing storage, synchronizing with the first node through the interlink;
- Active/Active with second node working as proxy, transfering all
commands to the first node for execution through the interlink.

Unlike original Copan's implementation, depending on specific hardware,
this code uses simple custom TCP-based protocol for interlink. It has
no authentication, so it should never be enabled on public interfaces.

The code may still need some polishing, but generally it is functional.

Relnotes: yes
Sponsored by: iXsystems, Inc.


# 288727 05-Oct-2015 mav

MFC r287499: Move setting of media parameters inside open routines.

This is preparation for possibility to open/close media several times
per LUN life cycle. While there, rename variables to reduce confusion.
As additional bonus this allows to open read-only media, such as ZFS
snapshots.


# 288720 05-Oct-2015 mav

MFC r286807: Move "ioctl" CAM frontend into separate file.

It has nothing to share with too huge ctl.c other then device descriptor,
but even that may be counted as design error that may be fixed later.
At some point we may even want to have several ioctl ports.


# 288719 05-Oct-2015 mav

MFC r286806: Drop "internal" CTL frontend.

Its idea was to be a simple initiator and execute several commands from
kernel level, but FreeBSD never had consumer for that functionality,
while its implementation polluted many unrelated places.


# 286930 19-Aug-2015 mav

MFC r286345: Relax serialization of SYNCHRONIZE CACHE commands.

Before this change SYNCHRONIZE CACHE commands were executed exclusively,
as if they had ORDERED tag. But looking through SCSI specs I've found
no any reason to be so strict. For reads this ordering seems pointless.
For writes it looks less obvious, so I left ordering against preceeding
write commands, while following ones are no longer required to wait.


# 284798 25-Jun-2015 mav

MFC r284640: Bring per-port LUN enable/disable code up to date:
- remove last remnants of never implemented multiple targets support;
- implement missing support for LUN mapping in this area.

Due to existing locking constraints LUN mapping code is practically
unlocked at this point. Hopefully it is not racy enough to live until
somebody get idea how to call sleeping fronend methods under lock also
taken by the same frontend in non-sleepable context. :(


# 284796 25-Jun-2015 mav

MFC r284639: Introduce separate lock for tokens to reduce ctl_lock scope.


# 279273 25-Feb-2015 mav

MFC r278584: Add support for General Statistics and Performance log page.

CTL already collects most of statistics reported there, so why not.


# 279002 19-Feb-2015 mav

MFC r278037: CTL LUN mapping rewrite.

Replace iSCSI-specific LUN mapping mechanism with new one, working for any
ports. By default all ports are created without LUN mapping, exposing all
CTL LUNs as before. But, if needed, LUN mapping can be manually set on
per-port basis via ctladm. For its iSCSI ports ctld does it via ioctl(2).
The next step will be to teach ctld to work with FibreChannel ports also.

Respecting additional flexibility of the new mechanism, ctl.conf now allows
alternative syntax for LUN definition. LUNs can now be defined in global
context, and then referenced from targets by unique name, as needed. It
allows same LUN to be exposed several times via multiple targets.

While there, increase limit for LUNs per target in ctld from 256 to 1024.
Some initiators do not support LUNs above 255, but that is not our problem.

Relnotes: yes
Sponsored by: iXsystems, Inc.


# 275895 18-Dec-2014 mav

MFC r275568:
Count consecutive read requests as blocking in CTL for files and ZVOLs.

Technically read requests can be executed in any order or simultaneously
since they are not changing any data. But ZFS prefetcher goes crasy when
it receives consecutive requests from different threads. Since prefetcher
works on level of separate blocks, instead of two consecutive 128K requests
it may receive 32 8K requests in mixed order.

This patch is more workaround then a real fix, and it does not fix all of
prefetcher problems, but it improves sequential read speed by 3-4x times
in some configurations. On the other side it may hurt performance if
some backing store has no prefetch, that is why it is disabled by default
for raw devices.


# 275892 18-Dec-2014 mav

MFC r275474: Add GET LBA STATUS command support to CTL.

It is implemented for LUNs backed by ZVOLs in "dev" mode and files.
GEOM has no such API, so for LUNs backed by raw devices all LBAs will
be reported as mapped/unknown.

Sponsored by: iXsystems, Inc.


# 275888 18-Dec-2014 mav

MFC r275458:
Do not pre-allocate UNIT ATTENTIONs storage for every possible initiator.

Abusing ability of major UAs cover minor ones we may not account UAs for
inactive ports. Allocate UAs storage for port and start accounting only
after some initiator from that port fetched its first POWER ON OCCURRED.

This reduces per-LUN CTL memory usage from >1MB to less then 100K.


# 275886 18-Dec-2014 mav

MFC r275447:
Do not pre-allocate reservation keys memory for every possible initiator.

In configurations with many ports, like iSCSI, each LUN is typically
accessed only by limited subset of ports. Allocating that memory on
demand allows to reduce CTL memory usage from 5.3MB/LUN to 1.3MB/LUN.


# 275885 18-Dec-2014 mav

MFC r275405: Convert persis_offset from global variable to softc field.


# 275878 18-Dec-2014 mav

MFC r274962: Replace home-grown CTL IO allocator with UMA.

Old allocator created significant lock congestion protecting its lists
of preallocated I/Os, while UMA provides much better SMP scalability.
The downside of UMA is lack of reliable preallocation, that could guarantee
successful allocation in non-sleepable environments. But careful code
review shown, that only CAM target frontend really has that requirement.
Fix that making that frontend preallocate and statically bind CTL I/O for
every ATIO/INOT it preallocates any way. That allows to avoid allocations
in hot I/O path. Other frontends either may sleep in allocation context
or can properly handle allocation errors.

On 40-core server with 6 ZVOL-backed LUNs and 7 iSCSI client connections
this change increases peak performance from ~700K to >1M IOPS! Yay! :)

Sponsored by: iXsystems, Inc.


# 275493 05-Dec-2014 mav

MFC r274785: Partially reconstruct Active/Standby clusting.

In this mode one head is in Active state, supporting all commands, while
another is in Standby state, supporting only minimal LUN discovery subset.

It is still incomplete since Standby state requires reservation support,
which is impossible to do right without having interlink between heads.
But it allows to run some basic experiments.


# 274732 20-Nov-2014 mav

MFC r274154, r274163:
Add to CTL support for logical block provisioning threshold notifications.

For ZVOL-backed LUNs this allows to inform initiators if storage's used or
available spaces get above/below the configured thresholds.

Sponsored by: iXsystems, Inc.


# 274492 13-Nov-2014 mav

MFC r274206:
Synchronize medium rotation rate in legacy Rigid Disk Drive Geometry mode
page with modern Block Device Characteristics VPD page.


# 274003 03-Nov-2014 mav

MFC r273730, r273731:
Reduce code duplication around Write Exclusive persistent reservation.

While there, allow some more commands to pass persistent reservation.


# 274002 03-Nov-2014 mav

MFC r273711:
Allocate buffer for READ BUFFER/WRITE BUFFER commands on demand.

These commands are rare, but consume additional 256KB RAM per LUN.


# 273978 02-Nov-2014 mav

MFC r273075: Remove couple Copan's vendor-specific mode pages.

Those pages are highly system-/hardware-specific, the code is incomplete,
and so they hardly can be useful for anybody else.


# 273977 02-Nov-2014 mav

MFC r273073: Some groundwork for later Informational Exceptions support.

This includes support for:
- Read-Write Error Recovery mode page;
- Informational Exceptions Control mode page;
- Logical Block Provisioning mode page;
- LOG SENSE command.

No real Informational Exceptions features yet. This is only a placeholder.


# 273531 23-Oct-2014 mav

MFC r273163: Implement more functional CTL debug logging.

Setting bits in kern.cam.ctl.debug allows to log errors, commands and some
commands data respectively.


# 273323 20-Oct-2014 mav

MFC r273038: Add support for READ DEFECT DATA (10/12) commands.

SPC-4 r2 allows to return empty defect list if the list is not supported.
We don't really support defect data lists, but this suppresses some errors.


# 273314 20-Oct-2014 mav

MFC r272893:
Store persistent reservation keys as uint64_t instead of uint8_t[8].

This allows to simplify the code and save 512KB of RAM per LUN (8%)
by removing no longer needed "registered" keys flags.
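
A minimal sketch of the idea, not the CTL code itself: once a key is held as a
single uint64_t, a zero value can double as the "not registered" marker, so a
separate flag array is unnecessary. The table layout, the names and the
MAX_INITIATORS value are hypothetical, and treating zero as "unregistered" is
an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INITIATORS 8    /* hypothetical, kept small for illustration */

/* Hypothetical per-LUN key table; a zero entry means "not registered". */
struct pr_table {
    uint64_t keys[MAX_INITIATORS];
};

/* Decode an 8-byte big-endian key as carried in SCSI parameter data. */
static uint64_t
key_from_bytes(const uint8_t b[8])
{
    uint64_t k = 0;

    for (size_t i = 0; i < 8; i++)
        k = (k << 8) | b[i];
    return (k);
}

/* Store the key for one initiator index. */
static void
pr_register(struct pr_table *t, size_t iid, const uint8_t raw[8])
{
    assert(iid < MAX_INITIATORS);
    t->keys[iid] = key_from_bytes(raw);
}

/* No separate flag is needed: a non-zero key implies registration. */
static int
pr_is_registered(const struct pr_table *t, size_t iid)
{
    assert(iid < MAX_INITIATORS);
    return (t->keys[iid] != 0);
}
```

Comparing two keys also becomes a plain integer comparison instead of a
memcmp() over eight bytes.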


# 273312 20-Oct-2014 mav

MFC r272748:
Implement software (mode page) and hardware (config) write protection.


# 272639 06-Oct-2014 mav

MFC r271945:
Simplify legacy reservation handling. Drop it on I_T nexus loss.


# 272630 06-Oct-2014 mav

MFC r271507:
Implement control over command reordering via options and control mode page.

It allows to bypass range checks between UNMAP and READ/WRITE commands,
which may introduce additional delays while waiting for UNMAP parameters.
READ and WRITE commands are always processed in safe order since their
range checks are almost free.


# 272616 06-Oct-2014 mav

MFC r271309:
Improve cache control support, including DPO/FUA flags and the mode page.

At this moment it works only for files and ZVOLs in device mode since BIOs
have no respective cache control flags (DPO/FUA).


# 271529 13-Sep-2014 mav

MFC r271362:
Make ctl_port_mask an array to support more than 32 ports.

Overflow reported by Coverity.

CID: 1229894

Approved by: re (marius)
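
A minimal sketch of why a single 32-bit mask overflows and how an array of
words avoids it. This is an illustration only: the struct, the helper names
and the MAX_PORTS value are hypothetical and are not taken from the CTL
sources.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PORTS     128   /* hypothetical port limit */
#define BITS_PER_WORD 32
#define MASK_WORDS    ((MAX_PORTS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per port, spread over as many 32-bit words as needed. */
struct port_mask {
    uint32_t w[MASK_WORDS];
};

/* Mark a port as present; the word index and bit index are derived
 * separately, so port numbers >= 32 no longer shift past the word. */
static void
port_mask_set(struct port_mask *m, unsigned int port)
{
    assert(port < MAX_PORTS);
    m->w[port / BITS_PER_WORD] |= (uint32_t)1 << (port % BITS_PER_WORD);
}

/* Test whether a port is present in the mask. */
static int
port_mask_isset(const struct port_mask *m, unsigned int port)
{
    assert(port < MAX_PORTS);
    return ((m->w[port / BITS_PER_WORD] >> (port % BITS_PER_WORD)) & 1);
}
```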


# 270106 17-Aug-2014 mav

MFC r269497:
Add support for Windows dialect of EXTENDED COPY command, aka Microsoft ODX.

This allows to avoid extra network traffic when copying files on NTFS iSCSI
disks within one storage host by drag'n'dropping them in Windows Explorer
of Windows 8/2012. It should also accelerate Hyper-V VM operations, etc.


# 269298 30-Jul-2014 mav

MFC r268808:
Increase maximal number of SCSI ports in CTL from 32 to 128.

After I gave each iSCSI target its own port, the old limit appeared to be
not so big. This change almost proportionally increases per-LUN memory
use, but it is still three times better than it was before r268807.


# 269297 30-Jul-2014 mav

MFC r268807:
Reduce per-LUN memory usage from 18MB to 1.8MB.

CTL never had use for CA support code since SPI has gone, and there are not
even any frontends supporting it. But it was still reserving 256 bytes of
memory per LUN for every possible initiator on every possible port.

Wrap unused code with ifdef's in case somebody ever needs it.


# 269296 30-Jul-2014 mav

MFC r268767:
Add support for VMWare dialect of EXTENDED COPY command, aka VAAI Clone.

This allows to clone VMs and move them between LUNs inside one storage
host without generating extra network traffic to the initiator and back,
and without being limited by network bandwidth.

LUNs participating in copy operation should have UNIQUE NAA or EUI IDs set.
For LUNs without these IDs VMWare will use traditional copy operations.

Beware: the above LUN IDs explicitly set to values non-unique from the VM
cluster point of view may cause data corruption if wrong LUN is addressed!

Sponsored by: iXsystems, Inc.


# 269295 30-Jul-2014 mav

MFC r268581:
Merge several equal serialization indexes.


# 268692 15-Jul-2014 mav

MFC r268362:
Teach ctl_add_initiator() to dynamically allocate IIDs from pool.

If port passed negative IID value, the function will try to allocate IID
from the pool of unused, based on passed wwpn or name arguments. It does
all its best to make IID unique and persistent across reconnects.

This makes persistent reservation properly work for iSCSI. Previously,
in case of reconnects, reservation could be unexpectedly lost, or even
migrate between initiators.


# 268686 15-Jul-2014 mav

MFC r268308:
Make REPORT TARGET PORT GROUPS command report realistic data instead of
hardcoded garbage.


# 268683 15-Jul-2014 mav

MFC r268293:
Bury devid port method, which was a gross hack.

Instead make ports provide wanted port and target IDs, and LUNs provide
wanted LUN IDs. After that core Device ID VPD code only had to link all
of them together and add relative port and port group numbers.

LUN ID for iSCSI LUNs is no longer created by CTL, but by ctld, and passed
to CTL as "scsiname" LUN option. This makes a LUN report the same set
of IDs regardless of the port through which it is accessed, as
required by SCSI specifications.


# 268677 15-Jul-2014 mav

MFC r268266, r268275:
Separate concepts of frontend and port.

Before iSCSI implementation CTL had no knowledge about frontend drivers,
it had only frontends, which really were ports (alike to LUNs, if comparing
to backends). But iSCSI added there ioctl() method, which does not belong
to frontend as a port, but belongs to a frontend driver.


# 268675 15-Jul-2014 mav

MFC r268103:
Add support for REPORT TIMESTAMP command.


# 268674 15-Jul-2014 mav

MFC r268096, r268306, r268361:
Add more formal and strict command parsing and validation.

For every supported command define CDB length and mask of bits that are
allowed to be set. This allows to remove bunch of checks through the code
and still make the validation more strict. To properly do it for commands
supporting multiple service actions, formalize their parsing by adding
subtables for each of such commands.

As visible effect, this change allows to add support for REPORT SUPPORTED
OPERATION CODES command, reporting to client all the data about supported
SCSI commands, except timeouts.
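
A minimal sketch of table-driven CDB validation as described above: each entry
carries the CDB length and a per-byte mask of bits that may be set. The struct
layout, the function name and the mask values are assumptions for illustration
and are not copied from the CTL command table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command table entry. */
struct cmd_entry {
    uint8_t opcode;     /* CDB byte 0 */
    uint8_t length;     /* total CDB length in bytes */
    uint8_t usage[15];  /* allowed bits for CDB bytes 1..length-1 */
};

/* Return 1 if the CDB matches the entry and sets no disallowed bits. */
static int
cdb_is_valid(const struct cmd_entry *e, const uint8_t *cdb, size_t len)
{
    assert(e->length >= 1 && e->length <= 16);
    if (len < e->length || cdb[0] != e->opcode)
        return (0);
    for (size_t i = 1; i < e->length; i++) {
        /* Any bit outside the mask makes the CDB invalid. */
        if ((cdb[i] & ~e->usage[i - 1]) != 0)
            return (0);
    }
    return (1);
}
```

With a table like this, the same masks can be returned to the initiator as
CDB usage data, which is what makes a REPORT SUPPORTED OPERATION CODES style
reply cheap to produce.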


# 268556 12-Jul-2014 mav

MFC r267643, r267873, r268391, r268398:
Introduce fine-grained CTL locking to improve SMP scalability.

Split global ctl_lock, historically protecting most of CTL context:
- remaining ctl_lock now protects lists of frontends and backends;
- per-LUN lun_lock(s) protect LUN-specific information;
- per-thread queue_lock(s) protect request queues.
This allows to radically reduce congestion on ctl_lock.

Create multiple worker threads, depending on number of CPUs, and assign
each LUN to one of them. This allows to spread load between multiple CPUs,
still avoiding congestion on queue and LUN locks.

On 40-core server, exporting 5 LUNs, each backed by gstripe of SATA SSDs,
accessed via 6 iSCSI connections, this change improves peak request rate
from 250K to 680K IOPS.

Sponsored by: iXsystems, Inc.


# 268550 12-Jul-2014 mav

MFC r267905:
Add READ BUFFER and improve WRITE BUFFER SCSI commands support.

This gives some use to 512KB per-LUN buffers, allocated for Copan-specific
processor code and not used. It allows, for example, to test transport
performance and/or correctness without accessing the media, as supported
by Linux version of sg3_utils.


# 268151 02-Jul-2014 mav

MFC r267537:
Add support for VERIFY(10/12/16) and COMPARE AND WRITE SCSI commands.

Make data_submit backends method support not only read and write requests,
but also two new ones: verify and compare. Verify just checks readability
of the data in specified location without transferring them outside.
Compare reads the specified data and compares them to received data,
returning error if they are different.

VERIFY(10/12/16) commands request either verify or compare from backend,
depending on BYTCHK CDB field. COMPARE AND WRITE command is executed in two
stages: first it requests compare, and then, if that succeeded, requests write.
Atomicity of operation is guaranteed by CTL request ordering code.

Sponsored by: iXsystems, Inc.
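
A minimal single-threaded sketch of the two-stage COMPARE AND WRITE flow
described above, using an in-memory buffer in place of a backend. The names
and the status enum are hypothetical. The sketch does not model atomicity,
which in CTL comes from the request ordering code rather than from the
operation itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes. */
enum caw_status {
    CAW_OK,
    CAW_MISCOMPARE
};

/*
 * Stage 1: compare the medium with the expected data.
 * Stage 2: write the new data only if stage 1 succeeded.
 */
static enum caw_status
compare_and_write(uint8_t *medium, const uint8_t *expect,
    const uint8_t *newdata, size_t len)
{
    assert(medium != NULL && expect != NULL && newdata != NULL);
    if (memcmp(medium, expect, len) != 0)
        return (CAW_MISCOMPARE);    /* medium is left untouched */
    memcpy(medium, newdata, len);
    return (CAW_OK);
}
```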


# 268144 02-Jul-2014 mav

MFC r267485:
Remove non-functional remnants of control LUN -- 18MB of RAM for nothing.


# 265634 08-May-2014 mav

MFC r264274, r264279, r264283, r264296, r264297:
Add support for SCSI UNMAP commands to CTL.

This patch adds support for three new SCSI commands: UNMAP, WRITE SAME(10)
and WRITE SAME(16). WRITE SAME commands support both normal write mode
and UNMAP flag. To properly report UNMAP capabilities this patch also adds
support for reporting two new VPD pages: Block limits and Logical Block
Provisioning.

UNMAP support can be enabled per-LUN by adding "-o unmap=on" to `ctladm
create` command line or "option unmap on" to lun sections of /etc/ctl.conf.

At this moment UNMAP supported for ramdisks and device-backed block LUNs.
It was tested to work great with ZFS ZVOLs. For file-backed LUNs UNMAP
support is unfortunately missing due to absence of respective VFS KPI.

Sponsored by: iXsystems, Inc


# 260477 09-Jan-2014 mav

MFC r257946:
Introduce separate mutex lock to protect CTL I/O pools, slightly
reducing global CTL lock scope and congestion.

While there, simplify CTL I/O pools KPI, hiding implementation details.