History log of /freebsd-10-stable/sys/cam/ctl/ctl_tpc.c
Revision Date Author Comments
# 314767 06-Mar-2017 mav

MFC r314338: Polish handling of different reset flavours.

The biggest change is that ctl_remove_initiator() now generates an I_T NEXUS
LOSS event, cleaning the part of the LUs' state related to the initiator.


# 314238 25-Feb-2017 mav

MFC r313910: Change XCOPY memory allocations.

Before this change the XCOPY code could allocate memory in chunks of up to
16-32MB (VMware does XCOPY in 4MB chunks by default). Such allocations could
be difficult for the VM subsystem to satisfy due to KVA fragmentation, which
sometimes caused huge allocation delays, blocking all I/O for the respective
LU for that time.

This change limits allocations to TPC_MAX_IO_SIZE, which is now 1MB. 1MB is
still not a trivial allocation, but ZFS can also do that for large blocks,
so it should be less dramatic. As a drawback this increases CPU overhead,
but it still looks acceptable compared to the time consumed by ZFS
read/write.
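
The capping described above can be illustrated with a minimal sketch. It is
not the code from ctl_tpc.c: only the name TPC_MAX_IO_SIZE and its 1MB value
come from the commit message, while chunk_len() and copy_in_chunks() are
hypothetical helpers that show how a large transfer is split so that no
single allocation exceeds the cap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-allocation cap; 1MB is the value named in the commit message. */
#define TPC_MAX_IO_SIZE	(1024 * 1024)

/* Hypothetical helper: size of the next chunk given the bytes left. */
size_t
chunk_len(uint64_t remaining)
{
	return (remaining < TPC_MAX_IO_SIZE ?
	    (size_t)remaining : (size_t)TPC_MAX_IO_SIZE);
}

/* Hypothetical helper: walk a transfer in capped chunks, return their count. */
uint64_t
copy_in_chunks(uint64_t total)
{
	uint64_t done = 0, nchunks = 0;

	while (done < total) {
		size_t len = chunk_len(total - done);

		assert(len > 0 && len <= TPC_MAX_IO_SIZE);
		/* Real code would allocate 'len' bytes and issue the I/O here. */
		done += len;
		nchunks++;
	}
	return (nchunks);
}
```

With these assumptions a 4MB XCOPY request is served by four 1MB buffers
instead of one 4MB allocation, at the cost of more loop iterations.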


# 313367 07-Feb-2017 mav

MFC r312348: Remove writing 'residual' field of struct ctl_scsiio.

This field has no practical use and is never read. Initiators already
receive the respective residual size from frontends. The removed field had
different semantics, which look useless, and was never passed through
by any frontend.

While there, fix kern_data_resid field support in case of HA, missed in
r312291.


# 313365 07-Feb-2017 mav

MFC r312291, r312669:
Make CTL frontends report kern_data_resid for under-/overruns.

It seems like kern_data_resid was never really implemented. This change
finally does it. Now frontends update this field while transferring data,
and CTL and the backends receiving it can handle the result more flexibly.
At this point behavior should not change significantly: errors are still
reported on write overrun, but that may be changed later if we decide so.

CAM target frontend still does not properly handle overruns due to CAM API
limitations. We may need to add some fields to struct ccb_accept_tio to
pass information about initiator requested transfer size(s).


# 312835 26-Jan-2017 mav

MFC r310778, r310782: Improve use of I/O's private area.

- Since I/Os are allocated from per-port pools, make allocations store a
pointer to the CTL softc there, and use it where needed instead of the global.
- Create a bunch of helper macros to access the LUN, port and CTL softc.


# 312571 21-Jan-2017 mav

MFC r310539: Remove CTL_MAX_LUNS from places where it is not required.


# 311442 05-Jan-2017 mav

MFC r310534: Improve third-party copy error reporting.

For EXTENDED COPY:
- improve parameter checking to report some errors before the copy starts;
- forward sense data from the copy target as a descriptor in case of error;
- report which CSCD reported the error in sense key specific information.
For WRITE USING TOKEN:
- pass through real sense data from the copy target instead of reporting
our copy error, since for the initiator it is a "simple" write, not a copy.


# 311417 05-Jan-2017 mav

MFC r310285:
When reporting "Logical block address out of range" error, report the LBA
in sense data INFORMATION field.
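
As an illustration of what carrying the LBA in the INFORMATION field means,
here is a minimal sketch that fills fixed-format sense data by hand. The
function fill_lba_out_of_range() and the SENSE_LEN constant are hypothetical
and are not the CTL API. The byte layout follows the SPC fixed sense format,
where the INFORMATION field is four bytes wide, so the sketch sets the VALID
bit only when the LBA fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SENSE_LEN	18	/* fixed-format length used in this sketch */

/* Hypothetical helper: build "LBA out of range" fixed-format sense data. */
void
fill_lba_out_of_range(uint8_t sense[SENSE_LEN], uint64_t lba)
{
	assert(sense != NULL);
	memset(sense, 0, SENSE_LEN);
	sense[0] = 0x70;		/* current error, fixed format */
	sense[2] = 0x05;		/* sense key: ILLEGAL REQUEST */
	sense[7] = SENSE_LEN - 8;	/* additional sense length */
	sense[12] = 0x21;		/* ASC: LBA out of range */
	sense[13] = 0x00;		/* ASCQ */
	if (lba <= UINT32_MAX) {
		sense[0] |= 0x80;	/* VALID: INFORMATION is meaningful */
		sense[3] = (uint8_t)(lba >> 24);
		sense[4] = (uint8_t)(lba >> 16);
		sense[5] = (uint8_t)(lba >> 8);
		sense[6] = (uint8_t)lba;
	}
}
```

An initiator that receives such sense data can read the offending LBA from
bytes 3-6 instead of having to guess which part of the request was rejected.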


# 300588 24-May-2016 mav

MFC r299347, r299348: Validate XCOPY range offsets and lengths.


# 300587 24-May-2016 mav

MFC r299346: More XCOPY parameters validation.


# 300586 24-May-2016 mav

MFC r299329: Improve validation of some POPULATE TOKEN parameters.


# 288822 05-Oct-2015 mav

MFC r288458: More aggressively fill WUT read pipeline.

In some tests I've measured a 5% copy speedup from this.


# 288821 05-Oct-2015 mav

MFC r288450: Make zero WUT use WRITE SAME with recently allowed NDOB flag.


# 288814 05-Oct-2015 mav

MFC r288367: Fix arguments order.


# 288767 05-Oct-2015 mav

MFC r287913: Report number of failed XCOPY segment.


# 288740 05-Oct-2015 mav

MFC r287715: Improve XCOPY error reporting.


# 288739 05-Oct-2015 mav

MFC r287714: Report that we have no limit on POPULATE TOKEN segment size.


# 288732 05-Oct-2015 mav

MFC r287621: Reimplement CTL High Availability.

CTL HA functionality was originally implemented by Copan many years ago,
but a large part of the sources was never published. This change includes
a clean room implementation of the missing code and fixes for many bugs.

This code supports dual-node HA with ALUA in four modes:
- Active/Unavailable without interlink between nodes;
- Active/Standby with second node handling only basic LUN discovery and
reservation, synchronizing with the first node through the interlink;
- Active/Active with both nodes processing commands and accessing the
backing storage, synchronizing with the first node through the interlink;
- Active/Active with second node working as proxy, transferring all
commands to the first node for execution through the interlink.

Unlike Copan's original implementation, which depended on specific hardware,
this code uses a simple custom TCP-based protocol for the interlink. It has
no authentication, so it should never be enabled on public interfaces.

The code may still need some polishing, but generally it is functional.

Relnotes: yes
Sponsored by: iXsystems, Inc.


# 288719 05-Oct-2015 mav

MFC r286806: Drop "internal" CTL frontend.

Its idea was to be a simple initiator and execute several commands from
kernel level, but FreeBSD never had a consumer for that functionality,
while its implementation polluted many unrelated places.


# 286928 19-Aug-2015 mav

MFC r286320: Issue all reads of single XCOPY segment simultaneously.

During vMotion and Clone, VMware by default runs multiple sequential 4MB
XCOPY requests at the same time. If CTL issues reads sequentially in 1MB
chunks for each XCOPY command, reads from different commands are not detected
as sequential by the serseq option code and are allowed to execute
simultaneously. Such a read pattern confused the ZFS prefetcher, causing
suboptimal disk access. Issuing all reads at the same time makes the serseq
code work properly, serializing reads both within each XCOPY command and
between them.

My tests with a ZFS pool of 14 disks in RAID10 show prefetcher efficiency
improved from 37% to 99.7%, copying speed improved by 10-60%, and average
read latency halved on the HDD layer and reduced fivefold on the zvol layer.


# 284796 25-Jun-2015 mav

MFC r284639: Introduce separate lock for tokens to reduce ctl_lock scope.


# 279004 19-Feb-2015 mav

MFC r278625: Make XCOPY and WUT commands respect physical block size/offset.

This change improves performance of misaligned XCOPY and WUT commands by
2-3 times by avoiding unneeded read-modify-write cycles inside ZFS.
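
One way to picture respecting the physical block size and offset is the
sketch below. It is an assumption about the general technique, not the logic
in ctl_tpc.c: aligned_chunk() and all of its parameters are hypothetical. All
quantities are in logical blocks, 'pb' is the number of logical blocks per
physical block, and physical block boundaries are assumed to fall at LBAs
congruent to 'pboff' modulo 'pb'. A chunk is trimmed so that the next one
starts on such a boundary.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: length of the next chunk starting at 'lba', trimmed
 * so that it ends on a physical block boundary whenever more data follows.
 */
uint64_t
aligned_chunk(uint64_t lba, uint64_t remaining, uint64_t max_blocks,
    uint64_t pb, uint64_t pboff)
{
	uint64_t len, adj;

	assert(pb > 0 && max_blocks > 0);
	len = remaining < max_blocks ? remaining : max_blocks;
	if (len == remaining)
		return (len);	/* last chunk: nothing after it to align */
	/* Distance from the chunk end back to the previous boundary. */
	adj = (lba + len + pb - pboff % pb) % pb;
	if (adj < len)
		len -= adj;	/* trim so the next chunk starts aligned */
	return (len);
}
```

For example, with 8 logical blocks per physical block and a 16-block cap, a
transfer starting at LBA 3 gets a 13-block first chunk, so every later chunk
starts at a multiple of 8 and covers whole physical blocks.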


# 277810 27-Jan-2015 mav

MFC r277647: Fix wrong LUN reference in XCOPY block-to-block operation.

This could cause data corruption due to accessing the wrong LUN in case of
retries on write errors: failed writes were retried against the LUN being read.


# 276614 03-Jan-2015 mav

MFC r275942: Reduce number of places where global control_softc is used.

At some point we may want to have several CTL instances, and that is not
really impossible.


# 275881 18-Dec-2014 mav

MFC r275058: Coalesce last data move and command status for read commands.

Make the CTL core and block backend set success status before initiating the
last data move for read commands. Make the CAM target and iSCSI frontends
detect such a condition and send command status together with data. A new I/O
flag allows skipping duplicate status sending on the later fe_done() call.

For Fibre Channel this change saves one of three interrupts per read command,
increasing performance from 126K to 160K IOPS. For iSCSI this change saves
one of three PDUs per read command, increasing performance from 1M to 1.2M
IOPS.

Sponsored by: iXsystems, Inc.


# 275878 18-Dec-2014 mav

MFC r274962: Replace home-grown CTL IO allocator with UMA.

The old allocator created significant lock contention protecting its lists
of preallocated I/Os, while UMA provides much better SMP scalability.
The downside of UMA is the lack of reliable preallocation that could guarantee
successful allocation in non-sleepable environments. But careful code
review showed that only the CAM target frontend really has that requirement.
Fix that by making that frontend preallocate and statically bind a CTL I/O for
every ATIO/INOT it preallocates anyway. That avoids allocations in the hot
I/O path. Other frontends either may sleep in allocation context or can
properly handle allocation errors.

On a 40-core server with 6 ZVOL-backed LUNs and 7 iSCSI client connections
this change increases peak performance from ~700K to >1M IOPS! Yay! :)

Sponsored by: iXsystems, Inc.


# 275673 10-Dec-2014 mav

MFC r275446: Plug memory leaks on UNMAP and XCOPY with invalid parameters.


# 272647 06-Oct-2014 mav

MFC r272355: Fix a couple of issues with ROD token content.


# 271904 20-Sep-2014 mav

MFC r271702:
Fix tpc_create_token() introduced in r269497 to encode CREATOR LOGICAL
UNIT DESCRIPTOR field as Identification Descriptor CSCD descriptor, not
just as Identification Descriptor.

Approved by: re (gjb)


# 270389 23-Aug-2014 mav

MFC r270176:
Fix lock recursion on LUN shutdown, introduced in r269497.


# 270107 17-Aug-2014 mav

MFC r269587:
Reimplement WRITE USING TOKEN with Block Zero token using WRITE SAME.

On my ZVOL of SSDs this increases the speed of writing zeroes that way from
1 to 2.5GB/s by reducing CPU overhead.


# 270106 17-Aug-2014 mav

MFC r269497:
Add support for Windows dialect of EXTENDED COPY command, aka Microsoft ODX.

This avoids extra network traffic when copying files on NTFS iSCSI disks
within one storage host by drag'n'dropping them in Windows Explorer of
Windows 8/2012. It should also accelerate Hyper-V VM operations, etc.


# 269574 05-Aug-2014 mav

MFC r269444, r269450:
Plug EXTENDED COPY request data memory leak.


# 269572 05-Aug-2014 mav

MFC r269442:
Fix some bugs in RECEIVE COPY STATUS data.


# 269570 05-Aug-2014 mav

MFC r269441:
Add missing comparisons to make list IDs in EXTENDED COPY per-initiator,
as they should be. Wrap it into a function to not duplicate the code.
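
A minimal sketch of what a per-initiator list ID lookup could look like
follows. The struct copy_list layout and the find_list() name are
hypothetical and do not reflect the actual CTL data structures. The point is
only that both the list ID and the initiator must match, so two initiators
can use the same list ID without colliding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one outstanding EXTENDED COPY list. */
struct copy_list {
	uint32_t list_id;	/* list identifier chosen by the initiator */
	uint32_t init_id;	/* hypothetical initiator identifier */
};

/* Hypothetical helper: find a list by ID within one initiator's namespace. */
struct copy_list *
find_list(struct copy_list *lists, size_t n, uint32_t list_id,
    uint32_t init_id)
{
	size_t i;

	assert(lists != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (lists[i].list_id == list_id &&
		    lists[i].init_id == init_id)
			return (&lists[i]);
	}
	return (NULL);
}
```

Keeping the comparison in one function means every caller applies the same
two-part match, which is the duplication the commit message wants to avoid.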


# 269298 30-Jul-2014 mav

MFC r268808:
Increase maximal number of SCSI ports in CTL from 32 to 128.

After I gave each iSCSI target its own port, the old limit turned out to be
not so large. This change almost proportionally increases per-LUN memory
use, but it is still three times better than it was before r268807.


# 269296 30-Jul-2014 mav

MFC r268767:
Add support for VMware dialect of EXTENDED COPY command, aka VAAI Clone.

This allows cloning VMs and moving them between LUNs inside one storage
host without generating extra network traffic to the initiator and back,
and without being limited by network bandwidth.

LUNs participating in a copy operation should have UNIQUE NAA or EUI IDs set.
For LUNs without these IDs VMware will use traditional copy operations.

Beware: explicitly setting the above LUN IDs to values that are not unique
from the VM cluster's point of view may cause data corruption if the wrong
LUN is addressed!

Sponsored by: iXsystems, Inc.

