#
314767 |
|
06-Mar-2017 |
mav |
MFC r314338: Polish handling of different reset flavours.
The biggest change is that ctl_remove_initiator() now generates an I_T NEXUS LOSS event, cleaning up the part of each LU's state related to the initiator.
|
#
314238 |
|
25-Feb-2017 |
mav |
MFC r313910: Change XCOPY memory allocations.
Before this change the XCOPY code could allocate memory in chunks of up to 16-32MB (VMware does XCOPY in 4MB chunks by default). Such allocations could be difficult for the VM subsystem to satisfy because of KVA fragmentation, which sometimes caused huge allocation delays that blocked all I/O to the respective LU for that time.
This change limits allocations to TPC_MAX_IO_SIZE, which is now 1MB. 1MB is still not trivial, but ZFS can also allocate that much for large blocks, so the effect should be less dramatic. As a drawback this increases CPU overhead, but it still looks acceptable compared to the time consumed by ZFS reads and writes.
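The capping described above can be illustrated with a minimal sketch. It is not the CTL code: `SKETCH_MAX_IO_SIZE`, `next_chunk_len()` and `chunk_count()` are hypothetical names, and only the 1MB value comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* 1MB cap from the commit message; the macro name is hypothetical. */
#define SKETCH_MAX_IO_SIZE ((uint64_t)1024 * 1024)

/* Length of the next buffer to allocate when `remaining` bytes are left. */
uint64_t
next_chunk_len(uint64_t remaining)
{
	return (remaining < SKETCH_MAX_IO_SIZE ? remaining : SKETCH_MAX_IO_SIZE);
}

/* Number of capped chunks needed to cover `total` bytes. */
uint64_t
chunk_count(uint64_t total)
{
	uint64_t n = 0;

	while (total > 0) {
		total -= next_chunk_len(total);
		n++;
	}
	return (n);
}
```

Under this sketch a default 4MB VMware XCOPY request is served by four 1MB allocations instead of one large one, which is where the extra CPU overhead mentioned above would come from.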
|
#
313367 |
|
07-Feb-2017 |
mav |
MFC r312348: Remove writing 'residual' field of struct ctl_scsiio.
This field has no practical use and is never read. Initiators already receive the respective residual size from frontends. The removed field had different semantics, which look useless, and it was never passed through by any frontend.
While there, fix kern_data_resid field support in the HA case, missed in r312291.
|
#
313365 |
|
07-Feb-2017 |
mav |
MFC r312291, r312669: Make CTL frontends report kern_data_resid for under-/overruns.
It seems like kern_data_resid was never really implemented. This change finally does it. Now frontends update this field while transferring data, and CTL and the backends that receive it can handle the result more flexibly. At this point behavior should not change significantly: errors are still reported on write overrun, but that may change later if we decide so.
The CAM target frontend still does not properly handle overruns due to CAM API limitations. We may need to add some fields to struct ccb_accept_tio to pass information about the initiator-requested transfer size(s).
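A rough sketch of the accounting a frontend might do for one data move, assuming the residual is simply the part of the requested move that could not be transferred. The struct and function names are hypothetical and the real frontends may compute this differently.

```c
#include <assert.h>
#include <stdint.h>

/* Result of one data move (hypothetical type, for illustration only). */
struct move_result {
	uint32_t moved;	/* bytes actually transferred */
	uint32_t resid;	/* bytes requested by the target side but not moved */
};

/*
 * `kern_data_len` is how much the target side wants to move;
 * `initiator_room` is how much the initiator is still willing to transfer.
 */
struct move_result
account_move(uint32_t kern_data_len, uint32_t initiator_room)
{
	struct move_result r;

	r.moved = kern_data_len < initiator_room ? kern_data_len : initiator_room;
	r.resid = kern_data_len - r.moved;
	return (r);
}
```

A nonzero `resid` is what would let the core or a backend decide whether to report an error, as the message says it still does for write overruns.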
|
#
312835 |
|
26-Jan-2017 |
mav |
MFC r310778, r310782: Improve use of I/O's private area.
- Since I/Os are allocated from per-port pools, make allocations store a pointer to the CTL softc there, and use it where needed instead of the global.
- Create a bunch of helper macros to access the LUN, port and CTL softc.
|
#
312571 |
|
21-Jan-2017 |
mav |
MFC r310539: Remove CTL_MAX_LUNS from places where it is not required.
|
#
311442 |
|
05-Jan-2017 |
mav |
MFC r310534: Improve third-party copy error reporting.
For EXTENDED COPY:
- improve parameter checking to report some errors before the copy starts;
- forward sense data from the copy target as a descriptor in case of error;
- report which CSCD reported the error in sense key specific information.
For WRITE USING TOKEN:
- pass through real sense data from the copy target instead of reporting our own copy error, since for the initiator it's a "simple" write, not a copy.
|
#
311417 |
|
05-Jan-2017 |
mav |
MFC r310285: When reporting "Logical block address out of range" error, report the LBA in sense data INFORMATION field.
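As an illustration of what carrying the LBA in the INFORMATION field looks like, here is a minimal sketch that builds fixed-format sense data by hand. The function name and the 18-byte buffer size are assumptions made for the example; how CTL itself handles LBAs that do not fit the 4-byte field is not stated in the commit message, so the sketch simply leaves VALID clear in that case.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SENSE_LEN 18	/* common fixed-format sense length */

/* Fill `buf` with fixed-format sense data for an out-of-range LBA. */
void
build_lba_out_of_range_sense(uint8_t buf[SKETCH_SENSE_LEN], uint64_t lba)
{
	memset(buf, 0, SKETCH_SENSE_LEN);
	buf[0] = 0x70;			/* current error, fixed format */
	buf[2] = 0x05;			/* sense key: ILLEGAL REQUEST */
	buf[7] = SKETCH_SENSE_LEN - 8;	/* additional sense length */
	buf[12] = 0x21;			/* ASC: LOGICAL BLOCK ADDRESS OUT OF RANGE */
	buf[13] = 0x00;			/* ASCQ */
	if (lba <= UINT32_MAX) {
		buf[0] |= 0x80;		/* VALID: INFORMATION is meaningful */
		buf[3] = (uint8_t)(lba >> 24);	/* INFORMATION, big-endian */
		buf[4] = (uint8_t)(lba >> 16);
		buf[5] = (uint8_t)(lba >> 8);
		buf[6] = (uint8_t)lba;
	}
}
```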
|
#
300588 |
|
24-May-2016 |
mav |
MFC r299347, r299348: Validate XCOPY range offsets and lengths.
|
#
300587 |
|
24-May-2016 |
mav |
MFC r299346: More XCOPY parameters validation.
|
#
300586 |
|
24-May-2016 |
mav |
MFC r299329: Improve validation of some POPULATE TOKEN parameters.
|
#
288822 |
|
05-Oct-2015 |
mav |
MFC r288458: More aggressively fill WUT read pipeline.
On some tests I've measured 5% copy speedup from this.
|
#
288821 |
|
05-Oct-2015 |
mav |
MFC r288450: Make zero WUT use WRITE SAME with recently allowed NDOB flag.
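For context, a WRITE SAME(16) CDB with the NDOB (no data-out buffer) bit set can be sketched as below. This is a hand-built illustration of the command layout, not the code from the commit, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a WRITE SAME(16) CDB that zeroes `nblocks` blocks without a data buffer. */
void
build_write_same16_ndob(uint8_t cdb[16], uint64_t lba, uint32_t nblocks)
{
	int i;

	memset(cdb, 0, 16);
	cdb[0] = 0x93;			/* WRITE SAME(16) opcode */
	cdb[1] = 0x01;			/* NDOB: no Data-Out buffer is sent */
	for (i = 0; i < 8; i++)		/* bytes 2-9: LBA, big-endian */
		cdb[2 + i] = (uint8_t)(lba >> (56 - 8 * i));
	for (i = 0; i < 4; i++)		/* bytes 10-13: block count, big-endian */
		cdb[10 + i] = (uint8_t)(nblocks >> (24 - 8 * i));
}
```

With NDOB set, the target treats the command as writing zeroes, so no block of zeroes has to be allocated or transferred for it.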
|
#
288814 |
|
05-Oct-2015 |
mav |
MFC r288367: Fix arguments order.
|
#
288767 |
|
05-Oct-2015 |
mav |
MFC r287913: Report the number of the failed XCOPY segment.
|
#
288740 |
|
05-Oct-2015 |
mav |
MFC r287715: Improve XCOPY error reporting.
|
#
288739 |
|
05-Oct-2015 |
mav |
MFC r287714: Report that we have no limit on POPULATE TOKEN segment size.
|
#
288732 |
|
05-Oct-2015 |
mav |
MFC r287621: Reimplement CTL High Availability.
CTL HA functionality was originally implemented by Copan many years ago, but a large part of the sources was never published. This change includes a clean room implementation of the missing code and fixes for many bugs.
This code supports dual-node HA with ALUA in four modes:
- Active/Unavailable without interlink between nodes;
- Active/Standby with the second node handling only basic LUN discovery and reservation, synchronizing with the first node through the interlink;
- Active/Active with both nodes processing commands and accessing the backing storage, synchronizing with the first node through the interlink;
- Active/Active with the second node working as a proxy, transferring all commands to the first node for execution through the interlink.
Unlike Copan's original implementation, which depended on specific hardware, this code uses a simple custom TCP-based protocol for the interlink. It has no authentication, so it should never be enabled on public interfaces.
The code may still need some polishing, but generally it is functional.
Relnotes: yes
Sponsored by: iXsystems, Inc.
|
#
288719 |
|
05-Oct-2015 |
mav |
MFC r286806: Drop "internal" CTL frontend.
The idea was for it to act as a simple initiator and execute several commands from kernel level, but FreeBSD never had a consumer for that functionality, while its implementation polluted many unrelated places.
|
#
286928 |
|
19-Aug-2015 |
mav |
MFC r286320: Issue all reads of single XCOPY segment simultaneously.
During vMotion and Clone, VMware by default runs multiple sequential 4MB XCOPY requests at the same time. If CTL issues reads sequentially in 1MB chunks for each XCOPY command, reads from different commands are not detected as sequential by the serseq option code and are allowed to execute simultaneously. Such a read pattern confused the ZFS prefetcher, causing suboptimal disk access. Issuing all reads at the same time makes the serseq code work properly, serializing reads both within each XCOPY command and between them.
My tests with a ZFS pool of 14 disks in RAID10 show prefetcher efficiency improved from 37% to 99.7%, copying speed improved by 10-60%, and average read latency halved on the HDD layer and reduced five times on the zvol layer.
|
#
284796 |
|
25-Jun-2015 |
mav |
MFC r284639: Introduce separate lock for tokens to reduce ctl_lock scope.
|
#
279004 |
|
19-Feb-2015 |
mav |
MFC r278625: Make XCOPY and WUT commands respect physical block size/offset.
This change improves performance of misaligned XCOPY and WUT commands by 2-3 times by avoiding unneeded read-modify-write cycles inside ZFS.
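One way to respect physical block boundaries when splitting a transfer is to trim each chunk so that it ends on a physical block edge. The sketch below shows that idea only; the function and its parameters (`pb` logical blocks per physical block, `pb_off` as the lowest aligned LBA) are hypothetical and not taken from the CTL sources.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return how many logical blocks the next chunk starting at `lba` should
 * cover, given `remaining` blocks left and a cap of `max_blocks`.
 * Non-final chunks are shortened so that they end on a physical boundary.
 */
uint64_t
aligned_chunk_blocks(uint64_t lba, uint64_t remaining, uint64_t max_blocks,
    uint64_t pb, uint64_t pb_off)
{
	uint64_t n, adj;

	assert(pb > 0 && max_blocks > 0);
	n = remaining < max_blocks ? remaining : max_blocks;
	if (n == remaining)
		return (n);	/* final chunk: nothing follows it */
	/* Blocks by which the chunk end overshoots the last boundary. */
	adj = (lba + n + pb - pb_off % pb) % pb;
	if (adj < n)
		n -= adj;	/* trim so the next chunk starts aligned */
	return (n);
}
```

For example, with 8 logical blocks per physical block, a misaligned start at LBA 3 and a 16-block cap, the first chunk becomes 13 blocks so that every following chunk starts at a multiple of 8 and no physical block is written in two pieces.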
|
#
277810 |
|
27-Jan-2015 |
mav |
MFC r277647: Fix wrong LUN reference in XCOPY block-to-block operation.
This could cause data corruption by accessing the wrong LUN when retrying after write errors: failed writes were retried against the LUN being read.
|
#
276614 |
|
03-Jan-2015 |
mav |
MFC r275942: Reduce number of places where global control_softc is used.
At some point we may want to have several CTL instances, and that is not really impossible.
|
#
275881 |
|
18-Dec-2014 |
mav |
MFC r275058: Coalesce last data move and command status for read commands.
Make CTL core and the block backend set success status before initiating the last data move for read commands. Make the CAM target and iSCSI frontends detect this condition and send command status together with the data. A new I/O flag allows skipping duplicate status sending on the later fe_done() call.
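The interaction between the early status and the flag can be modelled with a toy sketch. All names here are hypothetical stand-ins (the real flag and structures are not named in the message), and `msgs_sent` just counts messages sent toward the initiator.

```c
#include <assert.h>

enum sketch_status { SKETCH_STATUS_NONE, SKETCH_STATUS_SUCCESS };

#define SKETCH_FLAG_STATUS_SENT 0x1	/* hypothetical flag name */

struct sketch_io {
	enum sketch_status status;
	unsigned int flags;
	int msgs_sent;		/* data/status messages sent to the initiator */
};

/* Frontend handling of the last data move of a read command. */
void
sketch_last_datamove(struct sketch_io *io)
{
	io->msgs_sent++;	/* the data itself */
	if (io->status == SKETCH_STATUS_SUCCESS)
		io->flags |= SKETCH_FLAG_STATUS_SENT;	/* status rides along */
}

/* Frontend completion callback, analogous in role to fe_done(). */
void
sketch_fe_done(struct sketch_io *io)
{
	if (io->flags & SKETCH_FLAG_STATUS_SENT)
		return;		/* status already went out with the data */
	io->msgs_sent++;	/* separate status message */
	io->flags |= SKETCH_FLAG_STATUS_SENT;
}
```

In this model a read whose status is known before the last data move completes with one outgoing message instead of two.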
For Fibre Channel this change saves one of three interrupts per read command, increasing performance from 126K to 160K IOPS. For iSCSI this change saves one of three PDUs per read command, increasing performance from 1M to 1.2M IOPS.
Sponsored by: iXsystems, Inc.
|
#
275878 |
|
18-Dec-2014 |
mav |
MFC r274962: Replace home-grown CTL IO allocator with UMA.
The old allocator created significant lock congestion protecting its lists of preallocated I/Os, while UMA provides much better SMP scalability. The downside of UMA is the lack of reliable preallocation that could guarantee successful allocation in non-sleepable environments. But careful code review has shown that only the CAM target frontend really has that requirement. Fix that by making that frontend preallocate and statically bind a CTL I/O for every ATIO/INOT it preallocates anyway. That avoids allocations in the hot I/O path. Other frontends either may sleep in allocation context or can properly handle allocation errors.
On a 40-core server with 6 ZVOL-backed LUNs and 7 iSCSI client connections this change increases peak performance from ~700K to >1M IOPS! Yay! :)
Sponsored by: iXsystems, Inc.
|
#
275673 |
|
10-Dec-2014 |
mav |
MFC r275446: Plug memory leaks on UNMAP and XCOPY with invalid parameters.
|
#
272647 |
|
06-Oct-2014 |
mav |
MFC r272355: Fix a couple of issues with ROD token content.
|
#
271904 |
|
20-Sep-2014 |
mav |
MFC r271702: Fix tpc_create_token() introduced in r269497 to encode CREATOR LOGICAL UNIT DESCRIPTOR field as Identification Descriptor CSCD descriptor, not just as Identification Descriptor.
Approved by: re (gjb)
|
#
270389 |
|
23-Aug-2014 |
mav |
MFC r270176: Fix lock recursion on LUN shutdown, introduced in r269497.
|
#
270107 |
|
17-Aug-2014 |
mav |
MFC r269587: Reimplement WRITE USING TOKEN with Block Zero token using WRITE SAME.
On my ZVOL of SSDs this increases the speed of zero writing in that way from 1 to 2.5GB/s by reducing CPU overhead.
|
#
270106 |
|
17-Aug-2014 |
mav |
MFC r269497: Add support for Windows dialect of EXTENDED COPY command, aka Microsoft ODX.
This avoids extra network traffic when copying files on NTFS iSCSI disks within one storage host by dragging and dropping them in Windows Explorer on Windows 8/2012. It should also accelerate Hyper-V VM operations, etc.
|
#
269574 |
|
05-Aug-2014 |
mav |
MFC r269444, r269450: Plug EXTENDED COPY request data memory leak.
|
#
269572 |
|
05-Aug-2014 |
mav |
MFC r269442: Fix some bugs in RECEIVE COPY STATUS data.
|
#
269570 |
|
05-Aug-2014 |
mav |
MFC r269441: Add missing comparisons to make list IDs in EXTENDED COPY per-initiator, as they should be. Wrap it into a function to not duplicate the code.
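The per-initiator lookup can be pictured as a search that matches on both the list ID and the initiator. The sketch below uses a plain array and hypothetical names; it only illustrates the comparison that was missing, not the actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One in-flight EXTENDED COPY list (hypothetical layout). */
struct sketch_copy_list {
	uint32_t list_id;	/* list identifier chosen by the initiator */
	uint32_t init_idx;	/* index of the initiator that created it */
};

/* Find a list by ID, but only among lists owned by the same initiator. */
const struct sketch_copy_list *
sketch_find_list(const struct sketch_copy_list *lists, size_t n,
    uint32_t list_id, uint32_t init_idx)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (lists[i].list_id == list_id &&
		    lists[i].init_idx == init_idx)
			return (&lists[i]);
	}
	return (NULL);
}
```

Keeping the comparison in one helper means two initiators can reuse the same list ID without seeing each other's copy state.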
|
#
269298 |
|
30-Jul-2014 |
mav |
MFC r268808: Increase maximal number of SCSI ports in CTL from 32 to 128.
After I gave each iSCSI target its own port, the old limit turned out to be not so big. This change almost proportionally increases per-LUN memory use, but it is still three times better than it was before r268807.
|
#
269296 |
|
30-Jul-2014 |
mav |
MFC r268767: Add support for the VMware dialect of the EXTENDED COPY command, aka VAAI Clone.
This allows cloning VMs and moving them between LUNs inside one storage host without generating extra network traffic to the initiator and back, and without being limited by network bandwidth.
LUNs participating in a copy operation should have UNIQUE NAA or EUI IDs set. For LUNs without these IDs VMware will use traditional copy operations.
Beware: if the above LUN IDs are explicitly set to values that are not unique from the VM cluster's point of view, data corruption may result if the wrong LUN is addressed!
Sponsored by: iXsystems, Inc.
|