331646 |
27-Mar-2018 |
ken |
MFC r331422:
------------------------------------------------------------------------ r331422 | ken | 2018-03-23 07:52:26 -0600 (Fri, 23 Mar 2018) | 42 lines
Disable T10 Protection Information / EEDP handling for type 2 protection.
The mps(4) and mpr(4) drivers and hardware handle T10 Protection Information, which is a system of checksums and guard blocks to protect data while it is being transferred and while it is on disk. It is also known as T10 DIF. For more details, see section 4.22 of the SBC-4 spec.
Supporting Type 2 protection requires using 32 byte CDBs, and filling in the fields in those CDBs. We don't yet support that in the da(4) driver.
Type 1 and Type 3 protection don't require that, and can be handled by the mps(4)/mpr(4) driver's code and firmware without any additional input from the da(4) driver.
If a drive has Type 2 protection enabled (you frequently see this with SAS drives shipped from Dell), don't set the various EEDP fields in the mps(4)/mpr(4) driver command fields. Otherwise, you wind up with errors like this that seem to make no sense:
(da9:mpr0:0:18:0): READ(10). CDB: 28 00 00 00 00 00 00 02 00 00
(da9:mpr0:0:18:0): CAM status: SCSI Status Error
(da9:mpr0:0:18:0): SCSI status: Check Condition
(da9:mpr0:0:18:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command operation code)
(da9:mpr0:0:18:0):
(da9:mpr0:0:18:0): Field Replaceable Unit: 0
(da9:mpr0:0:18:0): Command Specific Info: 0
(da9:mpr0:0:18:0):
(da9:mpr0:0:18:0): Descriptor 0x80: f8 21
(da9:mpr0:0:18:0): Descriptor 0x81: 00 00 00 00 00 00
(da9:mpr0:0:18:0): Error 22, Unretryable error
In other words, what kind of strange SAS hard drive doesn't support a standard 10 byte SCSI READ command? In this case, one that has Type 2 protection enabled.
We can revisit this when we put Type 2 protection support in the da(4) driver, but for now this will help people who put Type 2 formatted drives in a system and wonder what in the world is going on.
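The decision described above can be sketched as follows. This is only an illustration: the enum and the function name are hypothetical and are not the identifiers used in mps(4)/mpr(4).

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the mps(4)/mpr(4) identifiers. */
enum prot_type {
	PROT_NONE  = 0,	/* drive not formatted with protection information */
	PROT_TYPE1 = 1,
	PROT_TYPE2 = 2,	/* needs 32 byte CDBs, which da(4) does not build yet */
	PROT_TYPE3 = 3
};

/*
 * Return true when the driver should fill in the EEDP fields of a request.
 * Type 1 and Type 3 need no extra input from da(4); Type 2 is skipped.
 */
static bool
eedp_should_set_fields(enum prot_type type)
{
	switch (type) {
	case PROT_TYPE1:
	case PROT_TYPE3:
		return (true);
	case PROT_NONE:
	case PROT_TYPE2:
	default:
		return (false);
	}
}
```

With a check like this, a Type 2 formatted drive gets a plain READ(10)/WRITE(10) with no EEDP fields set.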
Sponsored by: Spectra Logic
------------------------------------------------------------------------ |
322658 |
18-Aug-2017 |
ken |
MFC r321502, r321714, r321733, r321737, r321799, r322364:
------------------------------------------------------------------------ r321502 | scottl | 2017-07-25 19:48:13 -0600 (Tue, 25 Jul 2017) | 2 lines
Quiet a message that sounds far more dire than it really is.
------------------------------------------------------------------------ r321714 | scottl | 2017-07-30 00:53:58 -0600 (Sun, 30 Jul 2017) | 13 lines
Split the interrupt setup code into two parts: allocation and configuration. Do the allocation before requesting the IOCFacts message. This triggers the LSI firmware to recognize that multiqueue should be enabled if available. Multiqueue isn't used by the driver yet, but this also fixes a problem with the cached IOCFacts not matching later checks, leading to potential problems with error recovery.
As a side-effect, fetch the driver tunables as early as possible.
Reviewed by: slm Obtained from: Netflix Differential Revision: D9243
------------------------------------------------------------------------ r321733 | scottl | 2017-07-30 16:34:24 -0600 (Sun, 30 Jul 2017) | 5 lines
Change from using underbar function names to normal function names for the informational print functions. Collapse the debug API a bit to be more generic and not require as much code duplication. While here, fix a bug in MPS that was already fixed in MPR.
------------------------------------------------------------------------ r321737 | scottl | 2017-07-30 18:05:49 -0600 (Sun, 30 Jul 2017) | 3 lines
Don't re-parse PCI IDs in order to set card-specific flags, use the flags field in the PCIID table.
------------------------------------------------------------------------ r321799 | scottl | 2017-07-31 10:55:56 -0600 (Mon, 31 Jul 2017) | 4 lines
Fix a logic bug in the split PCI interrupt code that slipped through
Reported by: Harry Schmalzbauer
------------------------------------------------------------------------ r322364 | ken | 2017-08-10 08:59:17 -0600 (Thu, 10 Aug 2017) | 39 lines
Changes to make mps(4) and mpr(4) handle reinit with reallocation.
When the mps(4) and mpr(4) drivers need to reinitialize the firmware, they sometimes need to reallocate all of the memory allocated by the driver. The reallocation happens whenever the IOC Facts change. That should only happen after a firmware upgrade.
If the reinitialization happens as a result of a timed out command sent to the card, the command that timed out and triggered the reinit may have been freed if iocfacts_allocate() reallocated all memory. If the caller attempts to access the command after that, the kernel will panic because the caller will be dereferencing freed memory.
The solution is to set a flag in the softc when we reallocate, and avoid dereferencing the command structure if we've reallocated.
The changes are largely the same in both drivers, since mpr(4) is a derivative of mps(4).
o In iocfacts_allocate(), if the IOC Facts have changed and we need to reallocate, set the REALLOCATED flag in the softc.
o Change wait_command() to take a struct mps_command ** instead of a struct mps_command *. This allows us to NULL out the caller's command pointer if we have to reinit the controller and the data structures get reallocated. (The REALLOCATED flag will be set in the softc if that has happened.)
o In every place that calls wait_command(), make sure we handle the case where the command is NULL after the call.
o The mpr(4) driver has mpr_request_polled() which can also reinitialize the card. Also check for reallocation there.
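The pointer-to-pointer pattern from the list above can be sketched like this. The structures and function names are simplified, hypothetical stand-ins, and the timeout is passed in as a flag instead of being detected. In the real driver the flag is set in iocfacts_allocate().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the driver structures. */
struct cmd {
	int	status;
};

struct softc {
	bool	reallocated;	/* plays the role of the REALLOCATED flag */
};

/*
 * Wait for *cmp to complete. If a reinit reallocated the driver's memory,
 * the command no longer exists, so clear the caller's pointer.
 * 'timed_out' stands in for the real wait/timeout logic.
 */
static int
wait_command_sketch(struct softc *sc, struct cmd **cmp, bool timed_out)
{
	if (!timed_out)
		return (0);
	/* Pretend the reinit path freed and reallocated every command. */
	free(*cmp);
	sc->reallocated = true;
	*cmp = NULL;
	return (-1);
}

/* Caller pattern: never touch the command after it has been NULLed out. */
static int
caller_sketch(struct softc *sc, bool timed_out)
{
	struct cmd *cm = calloc(1, sizeof(*cm));
	int error;

	if (cm == NULL)
		return (-1);
	error = wait_command_sketch(sc, &cm, timed_out);
	if (cm == NULL)
		return (error);	/* reallocated: nothing left to free */
	error = cm->status;
	free(cm);
	return (error);
}
```

Passing the address of the caller's pointer is what lets the wait routine invalidate it. With a plain struct pointer the caller would have no way to learn that the command was freed.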
Reviewed by: scottl, slm Sponsored by: Spectra Logic
------------------------------------------------------------------------ |
319446 |
01-Jun-2017 |
slm |
MFC r318895: Fix several problems with mapping code in mps(4). MFC r318896: Fix several problems with mapping code in mpr(4).
-Add several comments describing what the mapping code is doing.
-Added a callout timer to improve check for missing devices when discovery has completed so that missing counts are incremented correctly.
-Fix problems with missing counts not being saved to the HBA.
-Update man pages mps(4) and mpr(4) to include a description of the use_phy_num sysctl variable.
-Remove channel field in the mapping structure because it's not used.
-Improve logging by using mps_dprint or mpr_dprint instead of printf and adding more logging where appropriate.
-Add check for a bad index before writing mapping entries to controller.
-The high missing count check in the mapping table was using the incorrect initial value, which could lead to a bad result.
-The usage of the IN_USE flag for volume mapping was changed to be more intuitive, and was not being used correctly.
-The check for a free DPM entry was changed, as this was completely wrong.
-Updates to the missing count for volumes were not being done correctly, so this function was completely rewritten.
-_mapping_add_to_removal_table() was overly complicated and incorrectly used, so this function was rewritten.
-Missing counts for all devices were not being incremented properly, so this functionality was added.
-The search for space in the mapping table for missing enclosures was not calculating the found space correctly due to not breaking out of a loop when required, and the num_found variable was not being reset when needed.
-Retries when a device fails to get added due to a full mapping table were removed because this is unnecessary.
-mps_mapping_is_reinit_required() and mpr_mapping_is_reinit_required() were removed because they were not being used.
-Some functions were renamed to avoid confusion between Target IDs and SAS IDs.
-_mapping_check_update_ir_mt_idx() was removed because it was overly complicating volume mapping.
-The setting of the maxtargets variable was changed to include max volumes.
-The setting of the initiator_id variable was changed to be the invalid target ID after all targets, including volumes. Previously, this was set to the last valid target ID.
-Don't exclude target IDs of RAID components or check for a reuse of a target ID for RAID components.
-Some endianness handling was added.
Approved by: ken, mav |
319435 |
01-Jun-2017 |
slm |
MFC r308217, r308301, r311958, r312437, r318188, r318427, r318679
r308217: Add a fallback to the device mapper logic. We've seen systems in the field that are apparently misconfigured by the manufacturer and cause the mapping logic to fail. The fallback allows drive numbers to be assigned based on the PHY number that they're attached to. Add sysctls and tunables to override this new behavior, but they should be considered only necessary for debugging.
Reviewed by: imp, smh Obtained from: Netflix MFC after: 3 days Sponsored by: D8403
r308301: Record the LogInfo field when reporting the IOCStatus. Helps in debugging errors.
Submitted by: slm Obtained from: Netflix MFC after: 3 days
r311958: Print out the number of queues/MSIx vectors.
Sponsored by: Netflix
r312437: Rework the debug print API. Event printing no longer gets special handling. All of the printing from the tables file now has wrappers so that the handling is cleaner and it's possible to print something out (say, during development) without having to fight the global debug flags. This re-org will also make it easier to have the tables be compiled out at build time if desired.
Other than fixing some minor bugs, there are no user-visible changes from this change
Sponsored by: Netflix, Inc. Differential Revision: D9238
r318188: Improve error messages during command timeout for the mpr and mps drivers.
Sponsored by: Netflix
r318427: Add tri-mode support (SAS/SATA/PCIe).
This includes NVMe device support and adds support for the following adapters:
SAS 3408
SAS 3416
SAS 3508
SAS 3516
SAS 3616
SAS 3708
SAS 3716
Reviewed by: ken, scottl, asomers, mav Approved by: ken, scottl, mav MFC after: 2 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D10095
r318679: Fix powerpc compiler error.
Approved by: ken |
302031 |
20-Jun-2016 |
slm |
- No log bit in IOCStatus and endian-safe changes.
Use MPI2_IOCSTATUS_MASK when checking IOCStatus to mask off the log bit, and make a few more things endian-safe.
- Fix possible use of invalid pointer.
It was possible to use an invalid pointer to get the target ID value. To fix this, initialize a local Target ID variable to an invalid value and change that variable to a valid value only if the pointer to the Target ID is not NULL.
- No need to set the MPSSAS_SHUTDOWN flag because it's never used.
- done_ccb pointer can be used if it is NULL.
To prevent this, move check for done_ccb == NULL to before done_ccb is used in mpssas_stop_unit_done().
- Disks can go missing until a reboot is done in some cases.
This is due to the DevHandle not being released, which causes the Firmware to not allow that disk to be re-added.
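The first item above (masking off the log bit, endian-safe) can be sketched as follows. The two constants are assumed values that mirror the MPI2 headers, and the helper names are hypothetical. The driver itself would use MPI2_IOCSTATUS_MASK and the kernel's byte-order macros.

```c
#include <stdint.h>

/*
 * Assumed values mirroring the MPI2 headers: the top bit of IOCStatus
 * flags that log info is available, the low 15 bits hold the status code.
 */
#define SKETCH_IOCSTATUS_FLAG_LOG_INFO	0x8000u
#define SKETCH_IOCSTATUS_MASK		0x7FFFu

/* Convert a little-endian 16-bit wire value to host order, portably. */
static uint16_t
le16_to_host(const uint8_t wire[2])
{
	return ((uint16_t)(wire[0] | (wire[1] << 8)));
}

/* Extract the status code with the log bit masked off. */
static uint16_t
iocstatus_code(const uint8_t wire[2])
{
	return (le16_to_host(wire) & SKETCH_IOCSTATUS_MASK);
}
```

Comparing the unmasked value against a status code fails whenever the firmware also sets the log bit, which is why the mask is applied before the comparison.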
Reviewed by: ken Approved by: re (gjb), ken, scottl, ambrisko (mentors) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D6872
|
279253 |
24-Feb-2015 |
slm |
- Updated all files with 2015 Avago copyright, and updated LSI's copyright dates.
- Changed all of the PCI device strings from LSI to Avago Technologies (LSI).
- Added a sysctl variable to control how StartStopUnit behavior works. User can select to spin down disks based on if disk is SSD or HDD.
- Inquiry data is required to tell if a disk will support SSU at shutdown or not. Due to the addition of mpssas_async, which gets Advanced Info but not Inquiry data, the setting of supports_SSU was moved to the mpssas_scsiio_complete function, which snoops for any Inquiry commands. And, since disks are shutdown as a target and not a LUN, this process was simplified by basing it on targets and not LUNs.
- Added a sysctl variable that sets the amount of time to retry after sending a failed SATA ID command. This helps with some bad disks and large disks that require a lot of time to spin up. Part of this change was to add a callout to handle timeouts with the SATA ID command. The callout function is called mpssas_ata_id_timeout(). (Fixes PR 191348)
- Changed the way resets work by allowing I/O to continue to devices that are not currently under a reset condition. This uses devq's instead of simq's and makes use of the MPSSAS_TARGET_INRESET flag. This change also adds a function called mpssas_prepare_tm().
- Some changes were made to reduce code duplication when getting a SAS address for a SATA disk.
- Fixed some formatting and whitespace.
- Bump version of mps driver to 20.00.00.00-fbsd
PR: 191348 Reviewed by: ken, scottl Approved by: ken, scottl MFC after: 2 weeks
|
278964 |
18-Feb-2015 |
ken |
Make sure that the flags for the XPT_DEV_ADVINFO CCB are initialized properly.
If there is garbage in the flags field, it can sometimes include a set CDAI_FLAG_STORE flag, which may cause either an error or perhaps result in overwriting the field that was intended to be read.
sys/cam/cam_ccb.h: Add a new flag to the XPT_DEV_ADVINFO CCB, CDAI_FLAG_NONE, that callers can use to set the flags field when no store is desired.
sys/cam/scsi/scsi_enc_ses.c: In ses_setphyspath_callback(), explicitly set the XPT_DEV_ADVINFO flags to CDAI_FLAG_NONE when fetching the physical path information. Instead of ORing in the CDAI_FLAG_STORE flag when storing the physical path, set the flags field to CDAI_FLAG_STORE.
sys/cam/scsi/scsi_sa.c: Set the XPT_DEV_ADVINFO flags field to CDAI_FLAG_NONE when fetching extended inquiry information.
sys/cam/scsi/scsi_da.c: When storing extended READ CAPACITY information, set the XPT_DEV_ADVINFO flags field to CDAI_FLAG_STORE instead of ORing it into a field that isn't initialized.
sys/dev/mpr/mpr_sas.c, sys/dev/mps/mps_sas.c: When fetching extended READ CAPACITY information, set the XPT_DEV_ADVINFO flags field to CDAI_FLAG_NONE instead of setting it to 0.
sbin/camcontrol/camcontrol.c: When fetching a device ID, set the XPT_DEV_ADVINFO flags field to CDAI_FLAG_NONE instead of 0.
sys/sys/param.h: Bump __FreeBSD_version to 1100061 for the new XPT_DEV_ADVINFO CCB flag, CDAI_FLAG_NONE.
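The difference between assigning the flags and ORing into an uninitialized field can be sketched like this. The structure, the helper names, and the flag values are assumptions for illustration. The real definitions are in sys/cam/cam_ccb.h.

```c
#include <stdint.h>

/* Assumed values for illustration; see sys/cam/cam_ccb.h for the real ones. */
#define SKETCH_CDAI_FLAG_NONE	0x0u
#define SKETCH_CDAI_FLAG_STORE	0x1u

/* Hypothetical stand-in for the XPT_DEV_ADVINFO CCB. */
struct advinfo_sketch {
	uint32_t flags;
};

/* Fetch path: assign, so stale bits cannot request a store. */
static void
prepare_fetch(struct advinfo_sketch *cdai)
{
	cdai->flags = SKETCH_CDAI_FLAG_NONE;
}

/* Store path: assign rather than OR into a possibly uninitialized field. */
static void
prepare_store(struct advinfo_sketch *cdai)
{
	cdai->flags = SKETCH_CDAI_FLAG_STORE;
}
```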
Sponsored by: Spectra Logic MFC after: 1 week
|
274819 |
21-Nov-2014 |
smh |
Prevent overflow issues in timeout processing
Previously, any timeout value for which (timeout * hz) overflowed a signed integer gave unexpected results, since callout(9) routines convert negative tick values to '1'. With unsigned integer overflow, the resulting timeout is significantly smaller than expected.
Switch from callout_reset, which requires conversion to int based ticks, to callout_reset_sbt to avoid this.
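The overflow and the wider representation can be sketched in userland C. The type and constants below are modeled on sbintime_t (32.32 fixed point) but are assumptions for illustration, not the kernel definitions, and the helper names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Does (timeout_ms * hz) fit in a signed int? The product is computed
 * in 64 bits so the check itself cannot overflow.
 */
static bool
ticks_fit_in_int(uint32_t timeout_ms, int hz)
{
	int64_t product = (int64_t)timeout_ms * hz;

	return (product <= INT_MAX);
}

/* 32.32 fixed-point time, modeled on sbintime_t (an assumption here). */
typedef int64_t sbt_sketch_t;
#define SKETCH_SBT_1S	((sbt_sketch_t)1 << 32)
#define SKETCH_SBT_1MS	(SKETCH_SBT_1S / 1000)

/* Convert milliseconds to the 64-bit representation without overflow. */
static sbt_sketch_t
ms_to_sbt(uint32_t timeout_ms)
{
	return ((sbt_sketch_t)timeout_ms * SKETCH_SBT_1MS);
}
```

A timeout of 4,000,000 ms with hz = 1000 does not fit in a 32-bit int as a product, while the 64-bit conversion stays positive for any 32-bit millisecond value.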
Also correct isci to correctly resolve ccb timeout.
This was based on the original work done by Eygene Ryabinkin <rea@freebsd.org> on 5 Aug 2011, which used a macro to help avoid the overflow.
Differential Revision: https://reviews.freebsd.org/D1157 Reviewed by: mav, davide MFC after: 1 month Sponsored by: Multiplay
|
273377 |
21-Oct-2014 |
hselasky |
Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.
- Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes.
- Logical OR where binary OR was expected.
- Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs.
- Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, since there is no easy way to assert types in the C language outside a C-function.
- Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement.
- Updated "EXAMPLES" section in SYSCTL manual page.
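The "logical OR where binary OR was expected" item is easy to see in isolation. The flag values below are hypothetical and were chosen only to show the difference. They are not the real CTLFLAG_* definitions.

```c
/* Hypothetical flag values, chosen only to show the difference. */
#define SKETCH_FLAG_RD	0x80000000u
#define SKETCH_FLAG_TUN	0x00080000u

/* Buggy: logical OR collapses any non-zero operands to 1. */
static unsigned int
access_logical_or(void)
{
	return (SKETCH_FLAG_RD || SKETCH_FLAG_TUN);
}

/* Intended: bitwise OR keeps both flag bits. */
static unsigned int
access_bitwise_or(void)
{
	return (SKETCH_FLAG_RD | SKETCH_FLAG_TUN);
}
```

The logical form compiles without error, which is why a compile-time assertion on the "access" argument is useful for catching it.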
MFC after: 3 days Sponsored by: Mellanox Technologies
|
254615 |
21-Aug-2013 |
ken |
Fix mps(4) driver breakage that came in with change 253550 and manifested itself in out of chain frame conditions.
When the driver ran out of chain frames, the request in question would get completed early, and go through mpssas_scsiio_complete().
In mpssas_scsiio_complete(), the negation of the CAM status values (CAM_STATUS_MASK | CAM_SIM_QUEUED) was ORed in instead of being ANDed in. This resulted in a bogus CAM CCB status value. This didn't show up in the non-error case, because the status was reset to something valid (e.g. CAM_REQ_CMP) later on in the function.
But in the error case, such as when the driver ran out of chain frames, the CAM_REQUEUE_REQ status was ORed in to the bogus status value. This led to the CAM transport layer repeatedly releasing the SIM queue, because it thought that the CAM_RELEASE_SIMQ flag had been set. The symptom was messages like this on the console when INVARIANTS were enabled:
xpt_release_simq: requested 1 > present 0
xpt_release_simq: requested 1 > present 0
xpt_release_simq: requested 1 > present 0
mps_sas.c: In mpssas_scsiio_complete(), use &= to take status bits out. |= adds them in.
In the error case in mpssas_scsiio_complete(), set the status to CAM_REQUEUE_REQ, don't OR it in.
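The effect of |= versus &= with a negated mask can be reproduced in a few lines. The constants are assumed values for illustration (the real ones are in sys/cam/cam.h) and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones live in sys/cam/cam.h. */
#define SK_CAM_REQUEUE_REQ	0x013u
#define SK_CAM_STATUS_MASK	0x03Fu
#define SK_CAM_RELEASE_SIMQ	0x100u
#define SK_CAM_SIM_QUEUED	0x200u

/* Buggy: |= with the negated mask sets every other bit. */
static uint32_t
complete_buggy(uint32_t status, bool error)
{
	status |= ~(SK_CAM_STATUS_MASK | SK_CAM_SIM_QUEUED);
	if (error)
		status |= SK_CAM_REQUEUE_REQ;	/* ORs into a bogus value */
	return (status);
}

/* Fixed: &= clears the bits, and the error path assigns the status. */
static uint32_t
complete_fixed(uint32_t status, bool error)
{
	status &= ~(SK_CAM_STATUS_MASK | SK_CAM_SIM_QUEUED);
	if (error)
		status = SK_CAM_REQUEUE_REQ;
	return (status);
}
```

In the buggy variant the release-SIMQ bit ends up set even though nothing requested it, which matches the repeated xpt_release_simq messages described above.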
MFC after: 3 days Sponsored by: Spectra Logic
|
254263 |
12-Aug-2013 |
scottl |
Update PCI drivers to no longer look at the MEMIO-enabled bit in the PCI command register. The lazy BAR allocation code in FreeBSD sometimes disables this bit when it detects a range conflict, and will re-enable it on demand when a driver allocates the BAR. Thus, the bit is no longer a reliable indication of capability, and should not be checked. This results in the elimination of a lot of code from drivers, and also gives the opportunity to simplify a lot of drivers to use a helper API to set the busmaster enable bit.
This change fixes some recent reports of disk controllers and their associated drives/enclosures disappearing during boot.
Submitted by: jhb Reviewed by: jfv, marius, achadd, achim MFC after: 1 day
|
253550 |
22-Jul-2013 |
ken |
Merge in phase 14+ -> 16 mps driver fixes from LSI:
--------------------------------------------------------------- System panics during a Port reset with outstanding I/O --------------------------------------------------------------- It is possible to call mps_mapping_free_memory after this memory is already freed, causing a panic. Removed this extra call to mps_mapping_free_memory and call mps_mapping_exit in place of the mps_mapping_free_memory call so that any outstanding mapping items can be flushed before memory is freed.
--------------------------------------------------------------- Correct memory leak during a Port reset with outstanding I/O --------------------------------------------------------------- In mps_reinit function, the mapping memory was not being freed before being re-allocated. Added line to call the memory free function for mapping memory.
--------------------------------------------------------------- Use CAM_SIM_QUEUED flag in Driver IO path. --------------------------------------------------------------- This flag informs the XPT that successful abort of a CCB requires an abort ccb to be issued to the SIM. While processing SCSI IO's, set the CAM_SIM_QUEUED flag in the status for the IO. When the command completes, clear this flag.
--------------------------------------------------------------- Check for CAM_REQ_INPROG in I/O path. --------------------------------------------------------------- Added a check in mpssas_action_scsiio for the In Progress status for the IO. If this flag is set, the IO has already been aborted by the upper layer (before CAM_SIM_QUEUED was set) and there is no need to send the IO. The request will be completed without error.
--------------------------------------------------------------- Improve "doorbell handshake method" for mps_get_iocfacts --------------------------------------------------------------- Removed call to get Port Facts since this information is not used currently.
Added mps_iocfacts_allocate function to allocate memory that is based on IOC Facts data. Added mps_iocfacts_free function to free memory that is based on IOC Facts data. Both of the functions are used when a Diag Reset is performed or when the driver is attached/detached. This is needed in case IOC Facts changes after a Diag Reset, which could happen if FW is upgraded.
Moved call of mps_bases_static_config_pages from the attach routine to after the IOC is ready to process accesses based on the new memory allocations (instead of polling through the Doorbell).
--------------------------------------------------------------- Set TimeStamp in INIT message in millisecond format Set the IOC ---------------------------------------------------------------
--------------------------------------------------------------- Prefer mps_wait_command to mps_request_polled --------------------------------------------------------------- Instead of using mps_request_polled, call mps_wait_command whenever possible. Change the mps_wait_command function to check the current context and either use interrupt context or poll if required by using the pause or DELAY function. Added a check after waiting 50 ms to see if the command has timed out. This is only done when polling; the msleep call will automatically time out if the command has taken too long to complete.
--------------------------------------------------------------- Integrated RAID: Volume Activation Failed error message is displayed though the volume has been activated. --------------------------------------------------------------- Instead of failing an IOCTL request that does not have a large enough buffer to hold the complete reply, copy as much data from the reply as possible into the user's buffer and log a message saying that the user's buffer was smaller than the returned data.
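The "copy as much as possible" behavior for an undersized IOCTL buffer can be sketched in userland C. The function name and parameters are hypothetical, and memcpy stands in for the kernel's copy-out to user space.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy as much of the reply as fits. Returns the number of bytes copied
 * and sets *truncated when the user's buffer was too small.
 */
static size_t
copy_reply(void *user_buf, size_t user_len, const void *reply,
    size_t reply_len, int *truncated)
{
	size_t n = reply_len < user_len ? reply_len : user_len;

	*truncated = (reply_len > user_len);
	memcpy(user_buf, reply, n);
	return (n);
}
```

The caller can log a message when the truncated flag comes back set, instead of failing the whole request.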
--------------------------------------------------------------- mapping_add_new_device failure due to persistent table FULL --------------------------------------------------------------- When a new device is added, if it is determined that the device persistent table is being used and is full, instead of displaying a message for this condition every time, only log a message if the MPS_INFO bit is set in the debug_flags.
Submitted by: LSI MFC after: 1 week
|
253549 |
22-Jul-2013 |
ken |
CAM and mps(4) driver scanning changes.
Add a PIM_NOSCAN flag to the CAM path inquiry CCB. This tells CAM not to perform a rescan on a bus when it is registered.
We now use this flag in the mps(4) driver. Since it knows what devices it has attached, it is more efficient for it to just issue a target rescan on the targets that are attached.
Also, remove the private rescan thread from the mps(4) driver in favor of the rescan thread already built into CAM. Without this change, but with the change above, the MPS scanner could run before or during CAM's initial setup, which would cause duplicate device reprobes and announcements.
sys/param.h: Bump __FreeBSD_version to 1000039 for the inclusion of the PIM_NOSCAN CAM path inquiry flag.
sys/cam/cam_ccb.h: sys/cam/cam_xpt.c: Added a PIM_NOSCAN flag. If a SIM sets this in the path inquiry ccb, then CAM won't rescan the bus in xpt_bus_register.
sys/dev/mps/mps_sas.c For versions of FreeBSD that have the PIM_NOSCAN path inquiry flag, don't freeze the sim queue during scanning, because CAM won't be scanning this bus. Instead, hold up the boot. Don't call mpssas_rescan_target in mpssas_startup_decrement; it's redundant and I don't know why it was in there.
Set PIM_NOSCAN in path inquiry CCBs.
Remove methods related to the internal rescan daemon.
Always use async events to trigger a probe for EEDP support. In older versions of FreeBSD where AC_ADVINFO_CHANGED is not available, use AC_FOUND_DEVICE and issue the necessary READ CAPACITY manually.
Provide a path to xpt_register_async() so that we only receive events for our own SCSI domain.
Improve error reporting in cases where setup for EEDP detection fails.
sys/dev/mps/mps_sas.h: Remove softc flags and data related to the scanner thread.
sys/dev/mps/mps_sas_lsi.c: Unconditionally rescan the target whenever a device is added.
Sponsored by: Spectra Logic MFC after: 1 week
|
246713 |
12-Feb-2013 |
kib |
Reform the busdma API so that new types may be added without modifying every architecture's busdma_machdep.c. It is done by unifying the bus_dmamap_load_buffer() routines so that they may be called from MI code. The MD busdma is then given a chance to do any final processing in the complete() callback.
The cam changes unify the bus_dmamap_load* handling in cam drivers.
The arm and mips implementations are updated to track virtual addresses for sync(). Previously this was done in a type specific way. Now it is done in a generic way by recording the list of virtuals in the map.
Submitted by: jeff (sponsored by EMC/Isilon) Reviewed by: kan (previous version), scottl, mjacob (isp(4), no objections for target mode changes) Discussed with: ian (arm changes) Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris), amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)
|
237683 |
28-Jun-2012 |
ken |
Bring in LSI's latest mps(4) 6Gb SAS and WarpDrive driver, version 14.00.00.01-fbsd.
Their description of the changes is as follows:
1. Copyright contents has been changed in all respective .c and .h files
2. Support for WRITE12 and READ12 for direct-io (warpdrive only) has been added.
3. Driver has added checks to see if Drive has READ_CAP_16 support before sending it down to the device. If SPC3_SID_PROTECT flag is set in the inquiry data, the device supports protection information, and must support the 16 byte read capacity command, otherwise continue without sending read cap 16. This will optimize driver performance, since it will not send READ_CAP_16 to the drive which does not have support of READ_CAP_16.
4. With the new approach, the "MPTIOCTL_RESET_ADAPTER" IOCTL no longer uses DELAY(), which is a busy-loop implementation. It uses msleep(), which sleeps without busy looping. In the HBA reset code path and some other places, DELAY() is also replaced with msleep() or pause(), which are based on sleep/wakeup style calls. The driver chooses msleep()/pause() over DELAY() based on the CAN_SLEEP/NO_SLEEP flags, to avoid a busy loop where one is not required.
a. While the driver is being loaded, it issues most commands with NO_SLEEP.
b. When the driver is functional and needs to reinit the HBA, the CAN_SLEEP flag is used.
5. <mpslsi> driver is not Endian safe. It will not work on Big Endian machines like Sparc and PowerPC platforms because it assumes it is running on a Little Endian machine.
Driver code is modified such that it does not assume the CPU architecture is Little Endian.
a. In all places where the driver handles data from the HBA to the host, it converts Little Endian format to CPU format.
b. In all places where the driver handles data from the host to the HBA, it converts CPU format to Little Endian.
6. Found and resolved memory leaks in the FreeBSD driver, such as a memory leak in target LUN creation/deletion. Also added checks for memory allocation success/failure.
7. Add loginfo prints as debug message, i.e. When FW sends any loginfo, Driver should print those as debug message. This will help for debugging purpose.
8. A config request can time out. The current driver is able to detect a config request timeout, but it does not do anything about it. The driver should call mps_reinit() if any request_poll (which is called as part of config_request) times out.
9. cdb length check is required for 32 byte CDB. Add correct mpi control value for 32 byte CDB as below while submitting SCSI IO Request to controller.
mpi_control |= 4 << MPI2_SCSIIO_CONTROL_ADDCDBLEN_SHIFT;
10. Check the actual status of Message unit reset (mps_message_unit_reset). Previously the FreeBSD driver just wrote MPI2_FUNCTION_IOC_MESSAGE_UNIT_RESET and never checked the ack (it just waited for 50 milliseconds). The driver now checks the status of "MPI2_FUNCTION_IOC_MESSAGE_UNIT_RESET" after writing it to the FW.
The wait for the doorbell ack now also uses msleep with the proper sleep flags, instead of DELAY.
11. Previously CAM did not detect multi-LUN devices. For CAM to detect multi-LUN devices, the driver needs the following changes:
a. There is a "max_lun" field which the driver needs to set based on hw/fw support. Currently the LSI released driver does not set this field.
b. The default of "max_lun" should not be 0 in the OS, but it is currently set to 0 in the CAM layer.
c. Export a max_lun capacity of 255.
12. Driver will not reset target info after port enable completes, and will also do Device removal when a Device is removed from FW. The detailed description is as follows:
a. When the driver receives WD PD add events, it adds all information to the driver's local data structure.
b. Only for WD, we have the check below after port enable completes, where the driver clears all information retrieved at #1.
if ((sc->WD_available && (sc->WD_hide_expose == MPS_WD_HIDE_ALWAYS)) ||
    (sc->WD_valid_config && (sc->WD_hide_expose == MPS_WD_HIDE_IF_VOLUME))) {
        // clear off target data structure.
}
This is mainly so that PDs are not attached to the OS.
FreeBSD does bus rescan as older Parallel scsi style. So Driver needs to handle which Drive is visible to OS. That is a reason we have to clear off targ information for PDs.
Again, above logic was implemented long time ago. Similar concept we have for non-wd also. For that, LSI have introduced different logic to hide PDs.
Eventually, because of the above gap, when a Phy goes offline, we observe the failure below. That is, the driver is not doing complete removal of the device with the FW (which was pointed out by Scott):
Apr 5 02:39:24 Freebsd7 kernel: mpslsi0: mpssas_prepare_remove
Apr 5 02:39:24 Freebsd7 kernel: mpssas_prepare_remove 497 : invalid handle 0xe
Now Driver will not reset target info after port enable complete and also will do Device removal when Device remove from FW.
13. Return "CAM_SEL_TIMEOUT" instead of "CAM_TID_INVALID" for requests to Target IDs that have no devices connected at that moment. If the "CAM_TID_INVALID" error code is returned to the CAM layer, it results in a huge chain of errors in verbose kernel messages on boot and on every hot-plug event.
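For item 9 in the list above, the constant 4 is consistent with an additional CDB length counted in 4-byte words beyond the first 16 bytes: (32 - 16) / 4 = 4. The sketch below works under that assumption. The shift value and the helper name are also assumptions, so check the MPI2 headers before relying on them.

```c
#include <stdint.h>

/* Assumed shift value; check the MPI2 headers before relying on it. */
#define SK_ADDCDBLEN_SHIFT	26

/*
 * Additional CDB length, assumed to count 4-byte words beyond the
 * first 16 bytes. Returns 0 for CDBs of 16 bytes or fewer.
 */
static uint32_t
addcdblen_bits(unsigned int cdb_len)
{
	uint32_t extra_words;

	if (cdb_len <= 16)
		return (0);
	extra_words = (cdb_len - 16) / 4;
	return (extra_words << SK_ADDCDBLEN_SHIFT);
}
```

For a 32 byte CDB this yields the same bits as the `4 << MPI2_SCSIIO_CONTROL_ADDCDBLEN_SHIFT` expression in item 9, under the assumed shift.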
Submitted by: Sreekanth Reddy <Sreekanth.Reddy@lsi.com> MFC after: 3 days
|
231240 |
09-Feb-2012 |
ken |
Bring in a number of mps(4) driver fixes from LSI:
1. Fixed timeout specification for the msleep in mps_wait_command(). Added 30 second timeout for mps_wait_command() calls in mps_user.c.
2. Make sure we call mps_detach_user() from the kldunload path.
3. Raid Hotplug behavior change.
The driver now removes a volume when it goes to a failed state, so we also need to add volume back to the OS when it goes to optimal/degraded/online from failed/missing.
Handle raid volume add and remove from the IR_Volume event.
4. Added some more debugging information.
5. Replace xpt_async(AC_LOST_DEVICE, path, NULL) with mpssas_rescan_target().
This is to work around a panic in CAM that shows up when adding a drive with a rescan and removing another device from the driver thread with an AC_LOST_DEVICE async notification.
This problem was encountered in testing with the LSI sas2ircu utility, which was used to create a RAID volume from physical disks. The driver has to create the RAID volume target and remove the physical disk targets, and triggered a panic in the process.
The CAM issue needs to be fully diagnosed and fixed, but this works around the issue for now.
6. Fix some memory initialization issues in mps_free_command().
7. Resolve the "devq freeze forever" issue. This was caused by the internal read capacity command issued in the non-head version of the driver. When the command completed with an error, the driver wasn't unfreezing the device queue.
The version in head uses the CAM infrastructure for getting the read capacity information, and therefore doesn't have the same issue.
8. Bump the version to 13.00.00.00-fbsd. (this is very close to LSI's internal stable driver 13.00.00.00)
Submitted by: Kashyap Desai <Kashyap.Desai@lsi.com> MFC after: 3 days
|
225950 |
03-Oct-2011 |
ken |
Add descriptor sense support to CAM, and honor sense residuals properly in CAM.
Descriptor sense is a new sense data format that originated in SPC-3. Among other things, it allows for an 8-byte info field, which is necessary to pass back block numbers larger than 4 bytes.
This change adds a number of new functions to scsi_all.c (and therefore libcam) that abstract out most access to sense data.
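As a rough illustration of what such an accessor has to do, the sketch below pulls the sense key, ASC and ASCQ out of either fixed or descriptor format sense data, and only trusts bytes that fall inside the valid length (sense length minus residual). The names `sketch_sense` and `sketch_extract_sense()` are hypothetical; this is not the scsi_all.c code, only a minimal sketch of the idea based on the SPC sense data layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type; not a CAM structure. */
struct sketch_sense {
	int error_code;	/* response code, or -1 if not present */
	int sense_key;	/* -1 if not present in the valid bytes */
	int asc;	/* -1 if not present in the valid bytes */
	int ascq;	/* -1 if not present in the valid bytes */
};

/*
 * valid_len is the sense buffer length minus the sense residual, i.e. the
 * number of bytes the device actually returned.
 */
struct sketch_sense
sketch_extract_sense(const uint8_t *buf, size_t valid_len)
{
	struct sketch_sense s = { -1, -1, -1, -1 };

	assert(buf != NULL);
	if (valid_len < 1)
		return (s);
	s.error_code = buf[0] & 0x7f;

	switch (s.error_code) {
	case 0x72:	/* descriptor format, current error */
	case 0x73:	/* descriptor format, deferred error */
		if (valid_len > 1)
			s.sense_key = buf[1] & 0x0f;
		if (valid_len > 2)
			s.asc = buf[2];
		if (valid_len > 3)
			s.ascq = buf[3];
		break;
	case 0x70:	/* fixed format, current error */
	case 0x71:	/* fixed format, deferred error */
		if (valid_len > 2)
			s.sense_key = buf[2] & 0x0f;
		if (valid_len > 12)
			s.asc = buf[12];
		if (valid_len > 13)
			s.ascq = buf[13];
		break;
	default:	/* unknown response code: report nothing */
		break;
	}
	return (s);
}
```

A caller that passes a short valid length for fixed format data gets the sense key back but -1 for ASC/ASCQ, which is the kind of "field present and valid" distinction the length-aware interface is meant to expose.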
This includes a bump of CAM_VERSION, because the CCB ABI has changed. Userland programs that use the CAM pass(4) driver will need to be recompiled.
camcontrol.c: Change uses of scsi_extract_sense() to use scsi_extract_sense_len().
Use scsi_get_sks() instead of accessing sense key specific data directly.
scsi_modes: Update the control mode page to the latest version (SPC-4).
scsi_cmds.c, scsi_target.c: Change references to struct scsi_sense_data to struct scsi_sense_data_fixed. This should be changed to allow the user to specify fixed or descriptor sense, and then use scsi_set_sense_data() to build the sense data.
ps3cdrom.c: Use scsi_set_sense_data() instead of setting sense data manually.
cam_periph.c: Use scsi_extract_sense_len() instead of using scsi_extract_sense() or accessing sense data directly.
cam_ccb.h: Bump the CAM_VERSION from 0x15 to 0x16. The change of struct scsi_sense_data from 32 to 252 bytes changes the size of struct ccb_scsiio, but not the size of union ccb. So the version must be bumped to prevent structure mis-matches.
scsi_all.h: Lots of updated SCSI sense data and other structures.
Add function prototypes for the new sense data functions.
Take out the inline implementation of scsi_extract_sense(). It is now too large to put in a header file.
Add macros to calculate whether fields are present and filled in fixed and descriptor sense data.
scsi_all.c: In scsi_op_desc(), allow the user to pass in NULL inquiry data, and we'll assume a direct access device in that case.
Changed the SCSI RESERVED sense key name and description to COMPLETED, as it is now defined in the spec.
Change the error recovery action for a number of read errors to prevent lots of retries when the drive has said that the block isn't accessible. This speeds up reconstruction of the block by any RAID software running on top of the drive (e.g. ZFS).
In scsi_sense_desc(), allow for invalid sense key numbers. This allows calling this routine without checking the input values first.
Change scsi_error_action() to use scsi_extract_sense_len(), and handle things when invalid asc/ascq values are encountered.
Add a new routine, scsi_desc_iterate(), that will call the supplied function for every descriptor in descriptor format sense data.
Add scsi_set_sense_data(), and scsi_set_sense_data_va(), which build descriptor and fixed format sense data. They currently default to fixed format sense data.
Add a number of scsi_get_*() functions, which get different types of sense data fields from either fixed or descriptor format sense data, if the data is present.
Add a number of scsi_*_sbuf() functions, which print formatted versions of various sense data fields. These functions work for either fixed or descriptor sense.
Add a number of scsi_sense_*_sbuf() functions, which have a standard calling interface and print the indicated field. These functions take descriptors only.
Add scsi_sense_desc_sbuf(), which will print a formatted version of the given sense descriptor.
Pull out a majority of the scsi_sense_sbuf() function and put it into scsi_sense_only_sbuf(). This allows callers that don't use struct ccb_scsiio to easily utilize the printing routines. Revamp that function to handle descriptor sense and use the new sense fetching and printing routines.
Move scsi_extract_sense() into scsi_all.c, and implement it in terms of the new function, scsi_extract_sense_len(). The _len() version takes a length (which should be the sense length - residual) and can indicate which fields are present and valid in the sense data.
Add a couple of new scsi_get_*() routines to get the sense key, asc, and ascq only.
mly.c: Rename struct scsi_sense_data to struct scsi_sense_data_fixed.
sbp_targ.c: Use the new sense fetching routines to get sense data instead of accessing it directly.
sbp.c: Change the firewire/SCSI sense data transformation code to use struct scsi_sense_data_fixed instead of struct scsi_sense_data. This should be changed later to use scsi_set_sense_data().
ciss.c: Calculate the sense residual properly. Use scsi_get_sense_key() to fetch the sense key.
mps_sas.c, mpt_cam.c: Set the sense residual properly.
iir.c: Use scsi_set_sense_data() instead of building sense data by hand.
iscsi_subr.c: Use scsi_extract_sense_len() instead of grabbing sense data directly.
umass.c: Use scsi_set_sense_data() to build sense data.
Grab the sense key using scsi_get_sense_key().
Calculate the sense residual properly.
isp_freebsd.h: Use scsi_get_*() routines to grab asc, ascq, and sense key values.
Calculate and set the sense residual.
MFC after: 3 days Sponsored by: Spectra Logic Corporation
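The descriptor walk mentioned above for scsi_desc_iterate() can be sketched roughly as follows. This is a hedged illustration only: `sketch_desc_iterate()`, `sketch_desc_cb` and `sketch_count_cb()` are hypothetical names, and the layout assumed is the SPC descriptor format (8-byte header with the additional sense length in byte 7, then descriptors that each begin with a type byte and an additional length byte).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: return nonzero to stop iterating. */
typedef int (*sketch_desc_cb)(const uint8_t *desc, size_t desc_len, void *arg);

/* Call cb for every complete descriptor inside the valid sense bytes. */
void
sketch_desc_iterate(const uint8_t *sense, size_t valid_len,
    sketch_desc_cb cb, void *arg)
{
	size_t end, off;

	assert(cb != NULL);
	if (sense == NULL || valid_len < 8)
		return;
	end = 8 + (size_t)sense[7];
	if (end > valid_len)
		end = valid_len;	/* honor the sense residual */

	for (off = 8; off + 2 <= end; ) {
		size_t dlen = 2 + (size_t)sense[off + 1];

		if (off + dlen > end)
			break;		/* truncated descriptor */
		if (cb(sense + off, dlen, arg) != 0)
			break;
		off += dlen;
	}
}

/* Example callback state: count descriptors of one type. */
struct sketch_count {
	uint8_t type;
	int hits;
};

int
sketch_count_cb(const uint8_t *desc, size_t desc_len, void *arg)
{
	struct sketch_count *c = arg;

	(void)desc_len;
	if (desc[0] == c->type)
		c->hits++;
	return (0);
}
```

Stopping at the first truncated descriptor is one possible policy; a real implementation might choose to report the truncation instead.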
|
218812 |
18-Feb-2011 |
ken |
Fix several issues with the mps(4) driver.
When the driver ran out of DMA chaining buffers, it kept the timeout for the I/O, and I/O would stall.
The driver was not freezing the device queue on errors.
mps.c: Pull command completion logic into a separate function, and call the callback/wakeup for commands that are never sent due to lack of chain buffers.
Add a number of extra diagnostic sysctl variables.
Handle pre-hardware errors for configuration I/O. This doesn't panic the system, but it will fail the configuration I/O and there is no retry mechanism. So the device probe will not succeed. This should be a very uncommon situation, however.
mps_sas.c: Freeze the SIM queue when we run out of chain buffers, and unfreeze it when more commands complete.
Freeze the device queue when errors occur, so that CAM can ensure proper command ordering.
Report pre-hardware errors for task management commands. In general, that shouldn't be possible because task management commands don't have S/G lists, and that is currently the only error path before we get to the hardware.
Handle pre-hardware errors (like out of chain elements) for SMP requests. That shouldn't happen either, since we should have enough space for two S/G elements in the standard request.
For commands that end with MPI2_IOCSTATUS_SCSI_IOC_TERMINATED and MPI2_IOCSTATUS_SCSI_EXT_TERMINATED, return them with CAM_REQUEUE_REQ to retry them unconditionally. These seem to be related to back end, transport related problems that are hopefully transient. We don't want to go through the retry count for something that is not a permanent error.
Keep track of the number of outstanding I/Os.
mpsvar.h: Track the number of free chain elements.
Add variables for the number of outstanding I/Os, and I/O high water mark.
Add variables to track the number of free chain buffers and the chain low water mark, as well as the number of chain allocation failures.
Add I/O state flags and an attach done flag.
MFC after: 3 days
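The chain buffer accounting described above (free count, low water mark, allocation failures, and freezing the SIM queue on exhaustion) might look something like the sketch below. The structure and function names are hypothetical and the freeze is modeled as a plain flag; in the driver the freeze and release go through CAM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical softc fragment; field names are illustrative only. */
struct sketch_softc {
	int  chain_free;		/* free chain buffers right now */
	int  chain_free_lowwater;	/* lowest value chain_free has hit */
	int  chain_alloc_fail;		/* count of failed allocations */
	bool simq_frozen;		/* true while new I/O is held off */
};

void
sketch_init(struct sketch_softc *sc, int nchain)
{
	assert(nchain >= 0);
	sc->chain_free = nchain;
	sc->chain_free_lowwater = nchain;
	sc->chain_alloc_fail = 0;
	sc->simq_frozen = false;
}

/* Returns true if a chain buffer was taken, false if none were left. */
bool
sketch_alloc_chain(struct sketch_softc *sc)
{
	if (sc->chain_free == 0) {
		sc->chain_alloc_fail++;
		sc->simq_frozen = true;	/* stop new I/O until buffers return */
		return (false);
	}
	sc->chain_free--;
	if (sc->chain_free < sc->chain_free_lowwater)
		sc->chain_free_lowwater = sc->chain_free;
	return (true);
}

/* Called when a command completes and gives a chain buffer back. */
void
sketch_free_chain(struct sketch_softc *sc)
{
	sc->chain_free++;
	if (sc->simq_frozen)
		sc->simq_frozen = false;	/* resources are back: unfreeze */
}
```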
|
218811 |
18-Feb-2011 |
ken |
In the MPS driver, during device removal processing, don't assume that the controller firmware will return all of our commands. Instead, keep track of outstanding I/Os and return them to CAM once device removal processing completes.
mpsvar.h: Declare the new "io_list" in the mps_softc.
mps.c: Initialize the new "io_list" in the mps softc.
mps_sas.c:
o Track SCSI I/O requests on the io_list from the time of mpssas_action() through mpssas_scsiio_complete().
o Zero out the request structures used for device removal commands prior to filling them out.
o Once the target reset task management function completes during device removal processing, assume any SCSI I/O commands that are still outstanding will never return from the controller, and process them manually.
Submitted by: gibbs MFC after: 3 days
|
216363 |
10-Dec-2010 |
ken |
Fix an event handling bug with the mps(4) driver.
This bug manifested itself after repeated device arrivals and departures. The root of the problem was that the last entry in the reply array wasn't initialized/allocated. So every time we got around to that event, we had a bogus address.
There were a couple more problems with the code that are also fixed:
- The reply mechanism was being treated as sequential (indexed by sc->replycurindex) even though the spec says that the driver should use the ReplyFrameAddress field of the post queue descriptor to figure out where the reply is. There is no guarantee that the reply descriptors will be used in sequential order.
- The second word of the reply post queue descriptor wasn't being checked in mps_intr_locked() to make sure that it wasn't 0xffffffff. So the driver could potentially come across a partially DMAed descriptor.
- The number of replies allocated was one less than the actual size of the queue. It was the number of replies that can be in use at one time, which is one less than the size of the queue.
mps.c: When initializing the entries in the reply free queue, make sure we initialize the full number that we tell the chip we have (sc->fqdepth), not the number that can be used at any one time (sc->num_replies).
When allocating replies, make sure we allocate the number of replies that we've told the chip exist, not just the number that can be used simultaneously.
Use the ReplyFrameAddress field of the post queue descriptor to figure out which reply is being referenced. This is what the spec says to do, and the spec doesn't guarantee that the replies will be used in order.
Put a check in to verify that the reply address passed back from the card is valid. (Panic if it isn't; we would panic when we try to dereference the reply pointer in any case.)
In mps_intr_locked(), verify that the second word of the post queue descriptor is not 0xffffffff in addition to verifying that the unused flag is not set, so we can make sure we didn't get a partially DMAed descriptor.
Remove references to sc->replycurindex, it isn't needed now.
mpsvar.h: Remove replycurindex from the softc, it isn't needed now.
Reviewed by: scottl
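The two checks described above can be sketched as follows: decide whether a post queue descriptor is fully written, then turn its ReplyFrameAddress into an index only if the address is in range. Names are hypothetical. The sketch assumes the "unused" marker is a type value of 0x0f in the low nibble of the first word, and it returns -1 for a bad address where the driver panics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding of the "unused" descriptor marker (illustrative). */
#define SKETCH_DESC_TYPE_MASK	0x0fu
#define SKETCH_DESC_UNUSED	0x0fu

/* Hypothetical description of the reply frame allocation. */
struct sketch_reply_pool {
	uint32_t base;		/* bus address of the first reply frame */
	uint32_t frame_size;	/* bytes per reply frame */
	uint32_t nframes;	/* frames allocated (full queue depth) */
};

/*
 * A descriptor is usable only if it is not marked unused and its second
 * word is not all ones, which would suggest a partially DMAed entry.
 */
bool
sketch_desc_ready(uint32_t word0, uint32_t word1)
{
	if ((word0 & SKETCH_DESC_TYPE_MASK) == SKETCH_DESC_UNUSED)
		return (false);
	if (word1 == 0xffffffffu)
		return (false);
	return (true);
}

/* Map a reply frame address to an index, or -1 if it is not valid. */
int
sketch_reply_index(const struct sketch_reply_pool *p, uint32_t addr)
{
	uint32_t off;

	assert(p->frame_size != 0);
	if (addr < p->base)
		return (-1);
	off = addr - p->base;
	if (off % p->frame_size != 0)
		return (-1);	/* not on a frame boundary */
	if (off / p->frame_size >= p->nframes)
		return (-1);	/* past the end of the allocation */
	return ((int)(off / p->frame_size));
}
```

Looking the reply up by address rather than by a running index is what removes the dependence on the chip using descriptors in order.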
|
216088 |
30-Nov-2010 |
ken |
Add Serial Management Protocol (SMP) passthrough support to CAM.
This includes support in the kernel, camcontrol(8), libcam and the mps(4) driver for SMP passthrough.
The CAM SCSI probe code has been modified to fetch Inquiry VPD page 0x00 to determine supported pages, and will now fetch page 0x83 in addition to page 0x80 if supported.
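To illustrate the decision the probe code makes from page 0x00, here is a hedged sketch that scans a Supported VPD Pages response for a given page code. `sketch_vpd_supported()` is a hypothetical helper, not a CAM function. It assumes the SPC layout (page code in byte 1, page length in bytes 2 and 3, then one byte per supported page); older devices leave byte 2 reserved as zero, which yields the same length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if page_code is listed in a Supported VPD Pages (0x00)
 * response of len valid bytes.
 */
bool
sketch_vpd_supported(const uint8_t *buf, size_t len, uint8_t page_code)
{
	size_t list_len, i;

	assert(buf != NULL);
	if (len < 4 || buf[1] != 0x00)
		return (false);	/* too short, or not page 0x00 */

	list_len = ((size_t)buf[2] << 8) | buf[3];
	if (list_len > len - 4)
		list_len = len - 4;	/* never read past the valid data */

	for (i = 0; i < list_len; i++) {
		if (buf[4 + i] == page_code)
			return (true);
	}
	return (false);
}
```

With such a helper, a probe would ask for page 0x80 or 0x83 only when the device lists it, instead of sending the request and handling the resulting error.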
Add two new CAM CCBs, XPT_SMP_IO, and XPT_GDEV_ADVINFO. The SMP CCB is intended for SMP requests and responses. The ADVINFO is currently used to fetch cached VPD page 0x83 data from the transport layer, but is intended to be extensible to fetch other types of device-specific data.
SMP-only devices are not currently represented in the CAM topology, and so the current semantics are that the SIM will route SMP CCBs to either the addressed device, if it contains an SMP target, or its parent, if it contains an SMP target. (This is noted in cam_ccb.h, since it will change later once we have the ability to have SMP-only devices in CAM's topology.)
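The routing rule in the previous paragraph can be written down in a few lines. This is only a sketch with hypothetical types (`sketch_target`, `sketch_smp_route()`); the real SIM works from cached device page data rather than a pointer to a parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of a target in the topology. */
struct sketch_target {
	bool has_smp_target;		/* device contains an SMP target */
	struct sketch_target *parent;	/* e.g. the expander above it */
};

/*
 * Pick where an SMP CCB addressed to t should go: the device itself if
 * it has an SMP target, otherwise its parent if the parent has one.
 * Returns NULL if neither can accept the request.
 */
const struct sketch_target *
sketch_smp_route(const struct sketch_target *t)
{
	assert(t != NULL);
	if (t->has_smp_target)
		return (t);
	if (t->parent != NULL && t->parent->has_smp_target)
		return (t->parent);
	return (NULL);
}
```

In practice this means an SMP request addressed to a disk behind an expander ends up at the expander.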
smp_all.c, smp_all.h: New helper routines for SMP. This includes SMP request building routines, response parsing routines, error decoding routines, and structure definitions for a number of SMP commands.
libcam/Makefile: Add smp_all.c to libcam, so that SMP functionality is available to userland applications.
camcontrol.8, camcontrol.c: Add smp passthrough support to camcontrol. Several new subcommands are now available:
'smpcmd' functions much like 'cmd', except that it allows the user to send generic SMP commands.
'smprg' sends the SMP report general command, and displays the decoded output. It will automatically fetch extended output if it is available.
'smppc' sends the SMP phy control command, with any number of potential options. Among other things, this allows the user to reset a phy on a SAS expander, or disable a phy on an expander.
'smpmaninfo' sends the SMP report manufacturer information and displays the decoded output.
'smpphylist' displays a list of phys on an expander, and the CAM devices attached to those phys, if any.
cam.h, cam.c: Add a status value for SMP errors (CAM_SMP_STATUS_ERROR).
Add a missing description for CAM_SCSI_IT_NEXUS_LOST.
Add support for SMP commands to cam_error_string().
cam_ccb.h: Rename the CAM_DIR_RESV flag to CAM_DIR_BOTH. SMP commands are by nature bi-directional, and we may need to support bi-directional SCSI commands later.
Add the XPT_SMP_IO CCB. Since SMP commands are bi-directional, there are pointers for both the request and response.
Add a fill routine for SMP CCBs.
Add the XPT_GDEV_ADVINFO CCB. This is currently used to fetch cached page 0x83 data from the transport layer, but is extensible to fetch many other types of data.
cam_periph.c: Add support in cam_periph_mapmem() for XPT_SMP_IO and XPT_GDEV_ADVINFO CCBs.
cam_xpt.c: Add support for executing XPT_SMP_IO CCBs.
cam_xpt_internal.h: Add fields for VPD pages 0x00 and 0x83 in struct cam_ed.
scsi_all.c: Add scsi_get_sas_addr(), a function that parses VPD page 0x83 data and pulls out a SAS address.
scsi_all.h: Add VPD page 0x00 and 0x83 structures, and a prototype for scsi_get_sas_addr().
scsi_pass.c: Add support for mapping buffers in XPT_SMP_IO and XPT_GDEV_ADVINFO CCBs.
scsi_xpt.c: In the SCSI probe code, first ask the device for VPD page 0x00. If any VPD pages are supported, that page is required to be implemented. Based on the response, we may probe for the serial number (page 0x80) or device id (page 0x83).
Add support for the XPT_GDEV_ADVINFO CCB.
sys/conf/files: Add smp_all.c.
mps.c: Add support for passing in a uio in mps_map_command(), so we can map a S/G list at once.
Add support for SMP passthrough commands in mps_data_cb(). SMP is a special case, because the first buffer in the S/G list is outbound and the second buffer is inbound.
Add support for warning the user if the busdma code comes back with more buffers than will work for the command. This will, for example, help the user determine why an SMP command failed if busdma comes back with three buffers.
mps_pci.c: Add sys/uio.h.
mps_sas.c: Add the SAS address and the parent handle to the list of fields we pull from device page 0 and cache in struct mpssas_target. These are needed for SMP passthrough.
Add support for the XPT_SMP_IO CCB. For now, this CCB is routed to the addressed device if it supports SMP, or to its parent if it does not and the parent does. This is necessary because CAM does not currently support SMP-only nodes in the topology.
Make SMP passthrough support conditional on __FreeBSD_version >= 900026. This will make it easier to MFC this change to the driver without MFCing the CAM changes as well.
mps_user.c: Un-staticize mpi_init_sge() so we can use it for the SMP passthrough code.
mpsvar.h: Add a uio and iovecs into struct mps_command for SMP passthrough commands.
Add a cm_max_segs field to struct mps_command so that we can warn the user if busdma comes back with too many segments.
Clear the cm_reply when a command gets freed. If it is not cleared, reply frames will eventually get freed into the pool multiple times and corrupt the pool. (This fix is from scottl.)
Add a prototype for mpi_init_sge().
sys/param.h: Bump __FreeBSD_version to 900026 for the inclusion of the XPT_GDEV_ADVINFO and XPT_SMP_IO CAM CCBs.
|
213535 |
07-Oct-2010 |
ken |
Turn on serialization of task management commands going down to the controller, but make it optional.
After a problem report from Andrew Boyer, it looks like the LSI chip may have issues (the watchdog timer fired) if too many aborts are sent down to the chip at the same time. We know that task management commands are serialized, and although the manual doesn't say it, it may be a good idea to just send one at a time.
But, since I'm not certain that this is necessary, add a tunable and sysctl variable (hw.mps.%d.allow_multiple_tm_cmds) to control the driver's behavior.
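A hedged sketch of the optional serialization: when multiple task management commands are not allowed, only one is active at a time and the rest wait in FIFO order until the active one completes. The structure and function names are hypothetical, and "sending" a command is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_tm {
	struct sketch_tm *next;
	int issued;	/* set once handed to the controller */
};

/* Hypothetical softc fragment. */
struct sketch_softc {
	bool allow_multiple_tm_cmds;	/* mirrors the tunable's meaning */
	int tm_active;			/* commands currently at the chip */
	struct sketch_tm *pending_head;	/* FIFO of waiting commands */
	struct sketch_tm *pending_tail;
};

static void
sketch_send(struct sketch_softc *sc, struct sketch_tm *tm)
{
	tm->issued = 1;
	sc->tm_active++;
}

/* Issue now if permitted, otherwise queue behind the active command. */
void
sketch_issue_tm(struct sketch_softc *sc, struct sketch_tm *tm)
{
	tm->next = NULL;
	tm->issued = 0;
	if (sc->allow_multiple_tm_cmds || sc->tm_active == 0) {
		sketch_send(sc, tm);
		return;
	}
	if (sc->pending_tail != NULL)
		sc->pending_tail->next = tm;
	else
		sc->pending_head = tm;
	sc->pending_tail = tm;
}

/* Called from the completion callback of an active command. */
void
sketch_complete_tm(struct sketch_softc *sc)
{
	struct sketch_tm *tm;

	assert(sc->tm_active > 0);
	sc->tm_active--;
	if (sc->pending_head == NULL)
		return;
	if (!sc->allow_multiple_tm_cmds && sc->tm_active != 0)
		return;
	tm = sc->pending_head;		/* start the next waiter */
	sc->pending_head = tm->next;
	if (sc->pending_head == NULL)
		sc->pending_tail = NULL;
	sketch_send(sc, tm);
}
```

Funneling every command through one issue routine and one completion routine is what makes the policy switchable from a single flag.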
mps.c: Add support for the sysctl and tunable, and add a comment about the possible return values to mps_map_command().
mps_sas.c: Run all task management commands through two new routines, mpssas_issue_tm_request() and mpssas_complete_tm_request().
This allows us to optionally serialize task management commands. Also, change things so that the response to a task management command always comes back through the callback. (Before it could come via the callback or the return value.)
mpsvar.h: Add softc variables for the list of active task management commands, the number of active commands, and whether we should allow multiple active task management commands. Add an active command flag.
mps.4: Describe the new sysctl/loader tunable variable.
Sponsored by: Spectra Logic Corporation
|
212772 |
16-Sep-2010 |
ken |
MFp4 (//depot/projects/mps/...):
According to the MPT2 spec, task management commands are serialized, and so no I/O should start while task management commands are active.
So, to comply with that, freeze the SIM queue before we send any task management commands (abort, target reset, etc.) down to the IOC. We unfreeze the queue once the task management command completes.
It isn't clear from the spec whether multiple simultaneous task management commands are supported. Right now it is possible to have multiple outstanding task management commands, especially in the abort case. Multiple outstanding aborts do complete successfully, so it may be supported.
We also don't yet have any recovery mechanism (e.g. reset the IOC) if the task management command fails.
|
212420 |
10-Sep-2010 |
ken |
MFp4 (//depot/projects/mps/...)
Bring in a driver for the LSI Logic MPT2 6Gb SAS controllers.
This driver supports basic I/O, and works with SAS and SATA drives and expanders.
Basic error recovery works (i.e. timeouts and aborts) as well.
Integrated RAID isn't supported yet, and there are some known bugs.
So this isn't ready for production use, but is certainly ready for testing and additional development. For the moment, new commits to this driver should go into the FreeBSD Perforce repository first (//depot/projects/mps/...) and then get merged into -current once they've been vetted.
This has only been added to the amd64 GENERIC, since that is the only architecture I have tested this driver with.
Submitted by: scottl Discussed with: imp, gibbs, will Sponsored by: Yahoo, Spectra Logic Corporation
|