History log of /freebsd-10-stable/sys/cam/ctl/ctl_io.h
Revision Date Author Comments
# 313367 07-Feb-2017 mav

MFC r312348: Remove writing 'residual' field of struct ctl_scsiio.

This field has no practical use and is never read. Initiators already
receive respective residual size from frontends. Removed field had
different semantics, which looks useless, and was never passed through
by any frontend.

While there, fix kern_data_resid field support in case of HA, missed in
r312291.


# 312835 26-Jan-2017 mav

MFC r310778, r310782: Improve use of I/O's private area.

- Since I/Os are allocated from per-port pools, make allocations store
pointer to CTL softc there, and use it where needed instead of global.
- Created bunch of helper macros to access LUN, port and CTL softc.


# 312585 21-Jan-2017 mav

MFC r310649: Allow more efficient use of private area.

There are 16 bytes of space, so we may store two pointers in one.
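
As a minimal sketch of the idea, assuming a private slot laid out as a union
(the names example_priv and example_priv_set are hypothetical and are not
taken from ctl_io.h), a 16-byte slot can expose two pointer members:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 16-byte private slot, for illustration only.
 * The byte array fixes the size; the other members are views of it.
 */
union example_priv {
	uint8_t		bytes[sizeof(uint64_t) * 2];
	uint64_t	integers[2];
	void		*ptrs[2];	/* two pointers share one slot */
};

/* The slot stays 16 bytes on both 32-bit and 64-bit platforms. */
static_assert(sizeof(union example_priv) == 16, "slot must be 16 bytes");

/* Store two unrelated pointers in a single slot. */
static inline void
example_priv_set(union example_priv *p, void *first, void *second)
{
	p->ptrs[0] = first;
	p->ptrs[1] = second;
}
```

A caller would then read p->ptrs[0] and p->ptrs[1] back instead of spending
a separate slot on each pointer.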


# 312577 21-Jan-2017 mav

MFC r310635: Decouple limits on number of LUNs per port and LUs per CTL.

Those two values are not directly related, so make them independent.
This does not change any limits immediately, but makes number of LUNs
per port controllable via tunable/sysctl kern.cam.ctl.lun_map_size.
After this change increasing CTL_MAX_LUNS should be pretty cheap,
and even making it tunable should be easy.


# 311402 05-Jan-2017 mav

MFC r298810 (by pfg): sys/cam: spelling fixes in comments.

No functional change.


# 288811 05-Oct-2015 mav

MFC r288348: Implement media load/eject support for removable devices.

In case of block backend eject really closes the backing store, while
load tries to open it back. Failed store open is reported as no media.


# 288798 05-Oct-2015 mav

MFC r288215: Switch I/O time accounting from system time to uptime.

While there, make num_dmas accounted independently of CTL_TIME_IO.


# 288790 05-Oct-2015 mav

MFC r288148: Synchronize mode pages between HA peers.

We allow to modify only few fields in mode pages now, but still it is
not good if they unexpectedly change during failover. Also this fixes
reporting of "Mode parameters changed" UAs on secondary node.


# 288789 05-Oct-2015 mav

MFC r288146: Make HA peers announce their parameters on connect.

HA protocol requires strict version, parameters and configuration match.
Differences there may cause full set of problems up to kernel panic.
To avoid that, validate peer parameters on connect, and abort connection
immediately if some mismatch is detected.


# 288781 05-Oct-2015 mav

MFC r288020: Remove couple excess SGLIST I/O flags.

Those flags duplicated respective (sg_entries > 0) values.


# 288777 05-Oct-2015 mav

MFC r287991: Pack struct ctl_ha_msg_hdr by 8 bytes.


# 288774 05-Oct-2015 mav

MFC r287967: Relax serseq option operation for reads.

Previously, with serseq enabled, next command was unblocked only after
previous completed. With this change, for read operations, next command
is unblocked as soon as last media read completed. This is important
for frontends that actually wait for data move completion (like camtgt),
or when data are moved through the HA link, or especially when both.


# 288770 05-Oct-2015 mav

MFC r287940: Replicate initiators WWPNs and names between HA peers.


# 288769 05-Oct-2015 mav

MFC r287933: Replicate port->init_devid to HA peer.


# 288768 05-Oct-2015 mav

MFC r287921: When reporting TPT UA, report which of thresholds was reached.


# 288755 05-Oct-2015 mav

MFC r287778: Remove CTL_PRIV_LBA_LEN from HA messages.

Previously it was used for statistics, but now it is just 16 extra bytes.


# 288754 05-Oct-2015 mav

MFC r287774: Implement QUERY TASK, QUERY TASK SET and QUERY ASYNC EVENT.

Now we support most of SAM-5 task management.


# 288732 05-Oct-2015 mav

MFC r287621: Reimplement CTL High Availability.

CTL HA functionality was originally implemented by Copan many years ago,
but large part of the sources was never published. This change includes
clean room implementation of the missing code and fixes for many bugs.

This code supports dual-node HA with ALUA in four modes:
- Active/Unavailable without interlink between nodes;
- Active/Standby with second node handling only basic LUN discovery and
reservation, synchronizing with the first node through the interlink;
- Active/Active with both nodes processing commands and accessing the
backing storage, synchronizing with the first node through the interlink;
- Active/Active with second node working as proxy, transferring all
commands to the first node for execution through the interlink.

Unlike original Copan's implementation, depending on specific hardware,
this code uses simple custom TCP-based protocol for interlink. It has
no authentication, so it should never be enabled on public interfaces.

The code may still need some polishing, but generally it is functional.

Relnotes: yes
Sponsored by: iXsystems, Inc.


# 288731 05-Oct-2015 mav

MFC r287620: Remove unused target and initiator IDs.


# 288730 05-Oct-2015 mav

MFC r287618: Disable CTL_IO_DELAY feature.

It is too developer-oriented to be enabled by default.


# 288723 05-Oct-2015 mav

MFC r287293: Remove 600 bytes of port_priv from struct ctl_io_hdr.

This field is used only by the camtgt frontend, and since it preallocates
all requests anyway, let it preallocate this memory too, not bothering core code.


# 275881 18-Dec-2014 mav

MFC r275058: Coalesce last data move and command status for read commands.

Make CTL core and block backend set success status before initiating last
data move for read commands. Make CAM target and iSCSI frontends detect
such condition and send command status together with data. New I/O flag
allows to skip duplicate status sending on later fe_done() call.

For Fibre Channel this change saves one of three interrupts per read command,
increasing performance from 126K to 160K IOPS. For iSCSI this change saves
one of three PDUs per read command, increasing performance from 1M to 1.2M
IOPS.

Sponsored by: iXsystems, Inc.


# 275878 18-Dec-2014 mav

MFC r274962: Replace home-grown CTL IO allocator with UMA.

Old allocator created significant lock congestion protecting its lists
of preallocated I/Os, while UMA provides much better SMP scalability.
The downside of UMA is the lack of reliable preallocation that could guarantee
successful allocation in non-sleepable environments. But careful code
review has shown that only the CAM target frontend really has that requirement.
Fix that by making that frontend preallocate and statically bind a CTL I/O for
every ATIO/INOT it preallocates anyway. That allows to avoid allocations
in hot I/O path. Other frontends either may sleep in allocation context
or can properly handle allocation errors.

On 40-core server with 6 ZVOL-backed LUNs and 7 iSCSI client connections
this change increases peak performance from ~700K to >1M IOPS! Yay! :)

Sponsored by: iXsystems, Inc.


# 273978 02-Nov-2014 mav

MFC r273075: Remove couple Copan's vendor-specific mode pages.

Those pages are highly system-/hardware-specific, the code is incomplete,
and so they hardly can be useful for anybody else.


# 272616 06-Oct-2014 mav

MFC r271309:
Improve cache control support, including DPO/FUA flags and the mode page.

At this moment it works only for files and ZVOLs in device mode since BIOs
have no respective cache control flags (DPO/FUA).


# 269298 30-Jul-2014 mav

MFC r268808:
Increase maximal number of SCSI ports in CTL from 32 to 128.

After I gave each iSCSI target its own port, the old limit appeared to be
not so big. This change almost proportionally increases per-LUN memory
use, but it is still three times better than it was before r268807.


# 268697 15-Jul-2014 mav

MFC r268418:
Enable TAS feature: notify initiator if its command was aborted by other.

That should make operation more kind to multi-initiator environment.
Without this, other initiators may find out that something bad happened
to their commands only via command timeout.


# 268690 15-Jul-2014 mav

MFC r268353:
Implement ABORT TASK SET and I_T NEXUS RESET task management functions.

Use the last one to terminate active commands on iSCSI session termination.
Previous code was aborting only commands doing some data moves.


# 268685 15-Jul-2014 mav

MFC r268307:
Move lun_map() method from command nexus to port.

Previous implementation made impossible to do some things, such as calling
it for ports other than the one through which the command arrived.


# 268556 12-Jul-2014 mav

MFC r267643, r267873, r268391, r268398:
Introduce fine-grained CTL locking to improve SMP scalability.

Split global ctl_lock, historically protecting most of CTL context:
- remaining ctl_lock now protects lists of frontends and backends;
- per-LUN lun_lock(s) protect LUN-specific information;
- per-thread queue_lock(s) protect request queues.
This allows to radically reduce congestion on ctl_lock.

Create multiple worker threads, depending on number of CPUs, and assign
each LUN to one of them. This allows to spread load between multiple CPUs,
still avoiding congestion on queues and LUNs locks.

On 40-core server, exporting 5 LUNs, each backed by gstripe of SATA SSDs,
accessed via 6 iSCSI connections, this change improves peak request rate
from 250K to 680K IOPS.

Sponsored by: iXsystems, Inc.


# 268151 02-Jul-2014 mav

MFC r267537:
Add support for VERIFY(10/12/16) and COMPARE AND WRITE SCSI commands.

Make data_submit backends method support not only read and write requests,
but also two new ones: verify and compare. Verify just checks readability
of the data in specified location without transferring them outside.
Compare reads the specified data and compares them to received data,
returning error if they are different.

VERIFY(10/12/16) commands request either verify or compare from backend,
depending on BYTCHK CDB field. COMPARE AND WRITE command is executed in two
stages: first it requests compare, and then, if that succeeded, requests write.
Atomicity of the operation is guaranteed by CTL request ordering code.
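
The two-stage flow can be sketched roughly as follows. This is a
single-threaded illustration over an in-memory buffer, not the CTL code:
example_compare_and_write and the EX_* status values are hypothetical names,
and the atomicity that CTL gets from its request ordering is not modelled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes, for illustration only. */
enum { EX_OK = 0, EX_MISCOMPARE = 1 };

/*
 * Stage 1: compare the current media contents with the expected data.
 * Stage 2: only if they match, write the new data over the same range.
 */
static int
example_compare_and_write(unsigned char *media, const unsigned char *expected,
    const unsigned char *newdata, size_t len)
{
	if (memcmp(media, expected, len) != 0)
		return (EX_MISCOMPARE);	/* media is left untouched */
	memcpy(media, newdata, len);
	return (EX_OK);
}
```

In this sketch a failed compare returns before any write happens, which
mirrors the "compare first, then write only on success" ordering described
above.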

Sponsored by: iXsystems, Inc.


# 268142 02-Jul-2014 mav

MFC r265323 (by trasz):
Provide better descriptions for 'struct ctl_scsiio' fields; based mostly
on emails from ken@.


# 265634 08-May-2014 mav

MFC r264274, r264279, r264283, r264296, r264297:
Add support for SCSI UNMAP commands to CTL.

This patch adds support for three new SCSI commands: UNMAP, WRITE SAME(10)
and WRITE SAME(16). WRITE SAME commands support both normal write mode
and UNMAP flag. To properly report UNMAP capabilities this patch also adds
support for reporting two new VPD pages: Block limits and Logical Block
Provisioning.

UNMAP support can be enabled per-LUN by adding "-o unmap=on" to `ctladm
create` command line or "option unmap on" to lun sections of /etc/ctl.conf.

At this moment UNMAP is supported for ramdisks and device-backed block LUNs.
It was tested to work great with ZFS ZVOLs. For file-backed LUNs UNMAP
support is unfortunately missing due to absence of respective VFS KPI.

Sponsored by: iXsystems, Inc

