#
debdce20 |
|
08-Mar-2024 |
Dave Jiang <dave.jiang@intel.com> |
cxl/region: Deal with numa nodes not enumerated by SRAT For the numa nodes that are not created by SRAT, no memory_target is allocated and is not managed by the HMAT_REPORTING code. Therefore hmat_callback() memory hotplug notifier will exit early on those NUMA nodes. The CXL memory hotplug notifier will need to call node_set_perf_attrs() directly in order to setup the access sysfs attributes. In acpi_numa_init(), the last proximity domain (pxm) id created by SRAT is stored. Add a helper function acpi_node_backed_by_real_pxm() in order to check if a NUMA node id is defined by SRAT or created by CFMWS. node_set_perf_attrs() symbol is exported to allow update of perf attribs for a node. The sysfs path of /sys/devices/system/node/nodeX/access0/initiators/* is created by node_set_perf_attrs() for the various attributes where nodeX is matched to the NUMA node of the CXL region. Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-13-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
067353a4 |
|
08-Mar-2024 |
Dave Jiang <dave.jiang@intel.com> |
cxl/region: Add memory hotplug notifier for cxl region When the CXL region is formed, the driver computes the performance data for the region. However this data is not available at the node data collection that has been populated by the HMAT during kernel initialization. Add a memory hotplug notifier to update the access coordinates to the 'struct memory_target' context kept by the HMAT_REPORTING code. Add CXL_CALLBACK_PRI for a memory hotplug callback priority. Set the priority number to be called before HMAT_CALLBACK_PRI. The CXL update must happen before hmat_callback(). A new HMAT_REPORTING helper hmat_update_target_coordinates() is added in order to allow CXL to update the memory_target access coordinates. A new ext_updated member is added to the memory_target to indicate that the access coordinates within the memory_target has been updated by an external agent such as CXL. This prevents data being overwritten by the hmat_update_target_attrs() triggered by hmat_callback(). Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Huang, Ying <ying.huang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-12-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
c20eaf44 |
|
08-Mar-2024 |
Dave Jiang <dave.jiang@intel.com> |
cxl/region: Add sysfs attribute for locality attributes of CXL regions Add read/write latencies and bandwidth sysfs attributes for the enabled CXL region. The bandwidth is the aggregated bandwidth of all devices that contribute to the CXL region. The latency is the worst latency of the device amongst all the devices that contribute to the CXL region. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-11-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
3d9f4a19 |
|
08-Mar-2024 |
Dave Jiang <dave.jiang@intel.com> |
cxl/region: Calculate performance data for a region Calculate and store the performance data for a CXL region. Find the worst read and write latency for all the included ranges from each of the devices that attributes to the region and designate that as the latency data. Sum all the read and write bandwidth data for each of the device region and that is the total bandwidth for the region. The perf list is expected to be constructed before the endpoint decoders are registered and thus there should be no early reading of the entries from the region assemble action. The calling of the region qos calculate function is under the protection of cxl_dpa_rwsem and will ensure that all DPA associated work has completed. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-10-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
cb66b1d6 |
|
31-Jan-2024 |
Alison Schofield <alison.schofield@intel.com> |
cxl/region: Allow out of order assembly of autodiscovered regions Autodiscovered regions can fail to assemble if they are not discovered in HPA decode order. The user will see failure messages like: [] cxl region0: endpoint5: HPA order violation region1 [] cxl region0: endpoint5: failed to allocate region reference The check that is causing the failure helps the CXL driver enforce a CXL spec mandate that decoders be committed in HPA order. The check is needless for autodiscovered regions since their decoders are already programmed. Trying to enforce order in the assembly of these regions is useless because they are assembled once all their member endpoints arrive, and there is no guarantee on the order in which endpoints are discovered during probe. Keep the existing check, but for autodiscovered regions, allow the out of order assembly after a sanity check that the lesser numbered decoder has the lesser HPA starting address. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Wonjae Lee <wj28.lee@samsung.com> Link: https://lore.kernel.org/r/3dec69ee97524ab229a20c6739272c3000b18408.1706736863.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
453a7fde |
|
31-Jan-2024 |
Alison Schofield <alison.schofield@intel.com> |
cxl/region: Handle endpoint decoders in cxl_region_find_decoder() In preparation for adding a new caller of cxl_region_find_decoders() teach it to find a decoder from a cxl_endpoint_decoder structure. Combining switch and endpoint decoder lookup in one function prevents code duplication in call sites. Update the existing caller. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Wonjae Lee <wj28.lee@samsung.com> Link: https://lore.kernel.org/r/79ae6d72978ef9f3ceec9722e1cb793820553c8e.1706736863.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
d76779dd |
|
24-Jan-2024 |
Quanquan Cao <caoqq@fujitsu.com> |
cxl/region:Fix overflow issue in alloc_hpa() Creating a region with 16 memory devices caused a problem. The div_u64_rem function, used for dividing an unsigned 64-bit number by a 32-bit one, faced an issue when SZ_256M * p->interleave_ways. The result surpassed the maximum limit of the 32-bit divisor (4G), leading to an overflow and a remainder of 0. note: At this point, p->interleave_ways is 16, meaning 16 * 256M = 4G To fix this issue, I replaced the div_u64_rem function with div64_u64_rem and adjusted the type of the remainder. Signed-off-by: Quanquan Cao <caoqq@fujitsu.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Fixes: 23a22cd1c98b ("cxl/region: Allocate HPA capacity to regions") Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
c7ad3dc3 |
|
03-Nov-2023 |
Jim Harris <jim.harris@samsung.com> |
cxl/region: fix x9 interleave typo CXL supports x3, x6 and x12 - not x9. Fixes: 80d10a6cee050 ("cxl/region: Add interleave geometry attributes") Signed-off-by: Jim Harris <jim.harris@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/169904271254.204936.8580772404462743630.stgit@ubuntu Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
58f1e9d3 |
|
02-Jan-2024 |
Randy Dunlap <rdunlap@infradead.org> |
cxl/region: use %pap format to print resource_size_t Use "%pap" to print a resource_size_t (phys_addr_t derived type) to prevent build warnings on 32-bit arches (seen on i386 and riscv-32). ../drivers/cxl/core/region.c: In function 'alloc_hpa': ../drivers/cxl/core/region.c:556:25: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type 'resource_size_t' {aka 'unsigned int'} [-Wformat=] 556 | "HPA allocation error (%ld) for size:%#llx in %s %pr\n", Fixes: 7984d22f1315 ("cxl/region: Add dev_dbg() detail on failure to allocate HPA space") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Fan Ni <fan.ni@samsung.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <linux-cxl@vger.kernel.org> Link: https://lore.kernel.org/r/20240102173917.19718-1-rdunlap@infradead.org Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
7984d22f |
|
22-Dec-2023 |
Alison Schofield <alison.schofield@intel.com> |
cxl/region: Add dev_dbg() detail on failure to allocate HPA space When the region driver fails while allocating HPA space for a new region it can be because the parent resource, the CXL Window, has no more available space. In that case, the debug user sees this message: cxl_core:alloc_hpa:555: cxl region2: failed to allocate HPA: -34 Expand the message like this: cxl_core:alloc_hpa:555: cxl region8: HPA allocation error (-34) for size:0x20000000 in CXL Window 0 [mem 0xf010000000-0xf04fffffff flags 0x200] Now the debug user can examine /proc/iomem and consider actions like removing other allocations in that space or reducing the size of their region request. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/20231223004740.1401858-1-alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
ace196de |
|
14-Dec-2023 |
Dave Jiang <dave.jiang@intel.com> |
cxl: Fix unregister_region() callback parameter assignment In devm_cxl_add_region(), devm_add_action_or_reset() is called by passing in unregister_region() with data ptr of 'cxlr'. However, in unregister_region(), the passed in parameter is incorrectly assumed to be a 'struct device' rather than the 'cxlr' pointer. The code has been working because 'struct device' is the first member of 'struct cxl_region'. Issue found by inspection. Fix the assignment so that cxlr is pointing directly to the passed in parameter. Not flagged for -stable since there is no functional impact of this fix. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/170258123810.952211.3907381447996426480.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
5558b92e |
|
26-Nov-2023 |
Alison Schofield <alison.schofield@intel.com> |
cxl/core: Always hold region_rwsem while reading poison lists A read of a device poison list is triggered via a sysfs attribute and the results are logged as kernel trace events of type cxl_poison. The work is managed by either: a) the region driver when one of more regions map the device, or by b) the memdev driver when no regions map the device. In the case of a) the region driver holds the region_rwsem while reading the poison by committed endpoint decoder mappings and for any unmapped resources. This makes sure that the cxl_poison trace event trace reports valid region info. (Region name, HPA, and UUID). In the case of b) the memdev driver holds the dpa_rwsem preventing new DPA resources from being attached to a region. However, it leaves a gap between region attach and decoder commit actions. If a DPA in the gap is in the poison list, the cxl_poison trace event will omit the region info. Close the gap by holding the region_rwsem and the dpa_rwsem when reading poison per memdev. Since both methods now hold both locks, down_read both from the caller. Doing so also addresses the lockdep assert that found this issue: Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper") Fixes: f0832a586396 ("cxl/region: Provide region info to the cxl_poison trace event") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/08e8e7ec9a3413b91d51de39e385653494b1eed0.1701041440.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
5d09c63f |
|
31-Oct-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/hdm: Remove broken error path Dan reports that cxl_decoder_commit() potentially leaks a hold of cxl_dpa_rwsem. The potential error case is a "should not" happen scenario, turn it into a "can not" happen scenario by adding the error check to cxl_port_setup_targets() where other setting validation occurs. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: http://lore.kernel.org/r/63295673-5d63-4919-b851-3b06d48734c0@moroto.mountain Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
98a04c7a |
|
26-Oct-2023 |
Jim Harris <jim.harris@samsung.com> |
cxl/region: Fix x1 root-decoder granularity calculations Root decoder granularity must match value from CFWMS, which may not be the region's granularity for non-interleaved root decoders. So when calculating granularities for host bridge decoders, use the region's granularity instead of the root decoder's granularity to ensure the correct granularities are set for the host bridge decoders and any downstream switch decoders. Test configuration is 1 host bridge * 2 switches * 2 endpoints per switch. Region created with 2048 granularity using following command line: cxl create-region -m -d decoder0.0 -w 4 mem0 mem2 mem1 mem3 \ -g 2048 -s 2048M Use "cxl list -PDE | grep granularity" to get a view of the granularity set at each level of the topology. Before this patch: "interleave_granularity":2048, "interleave_granularity":2048, "interleave_granularity":512, "interleave_granularity":2048, "interleave_granularity":2048, "interleave_granularity":512, "interleave_granularity":256, After: "interleave_granularity":2048, "interleave_granularity":2048, "interleave_granularity":4096, "interleave_granularity":2048, "interleave_granularity":2048, "interleave_granularity":4096, "interleave_granularity":2048, Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Cc: <stable@vger.kernel.org> Signed-off-by: Jim Harris <jim.harris@samsung.com> Link: https://lore.kernel.org/r/169824893473.1403938.16110924262989774582.stgit@bgt-140510-bm03.eng.stellus.in [djbw: fixup the prebuilt cxl_test region] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
3531b27f |
|
25-Oct-2023 |
Li Zhijian <lizhijian@fujitsu.com> |
cxl/region: Fix cxl_region_rwsem lock held when returning to user space Fix a missed "goto out" to unlock on error to cleanup this splat: WARNING: lock held when returning to user space! 6.6.0-rc3-lizhijian+ #213 Not tainted ------------------------------------------------ cxl/673 is leaving the kernel with locks still held! 1 lock held by cxl/673: #0: ffffffffa013b9d0 (cxl_region_rwsem){++++}-{3:3}, at: commit_store+0x7d/0x3e0 [cxl_core] In terms of user visible impact of this bug for backports: cxl_region_invalidate_memregion() on x86 invokes wbinvd which is a problematic instruction for virtualized environments. So, on virtualized x86, cxl_region_invalidate_memregion() returns an error. This failure case got missed because CXL memory-expander device passthrough is not a production use case, and emulation of CXL devices is typically limited to kernel development builds with CONFIG_CXL_REGION_INVALIDATION_TEST=y, that makes cxl_region_invalidate_memregion() succeed. In other words, the expected exposure of this bug is limited to CXL subsystem development environments using QEMU that neglected CONFIG_CXL_REGION_INVALIDATION_TEST=y. Fixes: d1257d098a5a ("cxl/region: Move cache invalidation before region teardown, and before setup") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231025085450.2514906-1-lizhijian@fujitsu.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
0cf36a85 |
|
25-Oct-2023 |
Alison Schofield <alison.schofield@intel.com> |
cxl/region: Use cxl_calc_interleave_pos() for auto-discovery For auto-discovered regions the driver must assign each target to a valid position in the region interleave set based on the decoder topology. The current implementation fails to parse valid decode topologies, as it does not consider the child offset into a parent port. The sort put all targets of one port ahead of another port when an interleave was expected, causing the region assembly to fail. Replace the existing relative sort with cxl_calc_interleave_pos() that finds the exact position in a region interleave for an endpoint based on a walk up the ancestral tree from endpoint to root decoder. cxl_calc_interleave_pos() was introduced in a prior patch, so the work here is to use it in cxl_region_sort_targets(). Remove the obsoleted helper functions from the prior sort. Testing passes on pre-production hardware with BIOS defined regions that natively trigger this autodiscovery path of the region driver. Testing passes a CXL unit test using the dev_dbg() calculation test (see cxl_region_attach()) across an expanded set of region configs: 1, 1, 1+1, 1+1+1, 2, 2+2, 2+2+2, 2+2+2+2, 4, 4+4, where each number represents the count of endpoints per host bridge. Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery") Reported-by: Dmytro Adamenko <dmytro.adamenko@intel.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jim Harris <jim.harris@samsung.com> Link: https://lore.kernel.org/r/3946cc55ddc19678733eddc9de2c317749f43f3b.1698263080.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
a3e00c96 |
|
27-Oct-2023 |
Alison Schofield <alison.schofield@intel.com> |
cxl/region: Calculate a target position in a region interleave Introduce a calculation to find a target's position in a region interleave. Perform a self-test of the calculation on user-defined regions. The region driver uses the kernel sort() function to put region targets in relative order. Positions are assigned based on each target's index in that sorted list. That relative sort doesn't consider the offset of a port into its parent port which causes some auto-discovered regions to fail creation. In one failure case, a 2 + 2 config (2 host bridges each with 2 endpoints), the sort puts all the targets of one port ahead of another port when they were expected to be interleaved. In preparation for repairing the autodiscovery region assembly, introduce a new method for discovering a target position in the region interleave. cxl_calc_interleave_pos() adds a method to find the target position by ascending from an endpoint to a root decoder. The calculation starts with the endpoint's local position and position in the parent port. It traverses towards the root decoder and examines both position and ways in order to allow the position to be refined all the way to the root decoder. This calculation: position = position * parent_ways + parent_pos; applied iteratively yields the correct position. Include a self-test that exercises this new position calculation against every successfully configured user-defined region. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/r/0ac32c75cf81dd8b86bf07d70ff139d33c2300bc.1698263080.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
11105814 |
|
26-Oct-2023 |
Alison Schofield <alison.schofield@intel.com> |
cxl/region: Prepare the decoder match range helper for reuse match_decoder_by_range() and decoder_match_range() both determine if an HPA range matches a decoder. The first does it for root decoders and the second one operates on switch decoders. Tidy these up with clear naming and make the switch helper more like the root decoder helper in style and functionality. Make it take the actual range, rather than an endpoint decoder from which it extracts the range. Require an exact match on switch decoders, because unlike a root decoder that maps an entire region, Linux only supports 1:1 mapping of switch to endpoint decoders. Note that root-decoders are a super-set of switch-decoders and the range they cover is a super-set of a region, hence the use of range_contains() for that case. Aside from aesthetics and maintainability, this is in preparation for reuse. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Jim Harris <jim.harris@samsung.com> Link: https://lore.kernel.org/r/011b1f498e1758bb8df17c5951be00bd8d489e3b.1698263080.git.alison.schofield@intel.com [djbw: fixup root decoder vs switch decoder range checks] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
0718588c |
|
11-Oct-2023 |
Jim Harris <jim.harris@samsung.com> |
cxl/region: Do not try to cleanup after cxl_region_setup_targets() fails Commit 5e42bcbc3fef ("cxl/region: decrement ->nr_targets on error in cxl_region_attach()") tried to avoid 'eiw' initialization errors when ->nr_targets exceeded 16, by just decrementing ->nr_targets when cxl_region_setup_targets() failed. Commit 86987c766276 ("cxl/region: Cleanup target list on attach error") extended that cleanup to also clear cxled->pos and p->targets[pos]. The initialization error was incidentally fixed separately by: Commit 8d4285425714 ("cxl/region: Fix port setup uninitialized variable warnings") which was merged a few days after 5e42bcbc3fef. But now the original cleanup when cxl_region_setup_targets() fails prevents endpoint and switch decoder resources from being reused: 1) the cleanup does not set the decoder's region to NULL, which results in future dpa_size_store() calls returning -EBUSY 2) the decoder is not properly freed, which results in future commit errors associated with the upstream switch Now that the initialization errors were fixed separately, the proper cleanup for this case is to just return immediately. Then the resources associated with this target get cleanup up as normal when the failed region is deleted. The ->nr_targets decrement in the error case also helped prevent a p->targets[] array overflow, so add a new check to prevent against that overflow. Tested by trying to create an invalid region for a 2 switch * 2 endpoint topology, and then following up with creating a valid region. Fixes: 5e42bcbc3fef ("cxl/region: decrement ->nr_targets on error in cxl_region_attach()") Cc: <stable@vger.kernel.org> Signed-off-by: Jim Harris <jim.harris@samsung.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/169703589120.1202031.14696100866518083806.stgit@bgt-140510-bm03.eng.stellus.in Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
33981838 |
|
04-Oct-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/memdev: Fix sanitize vs decoder setup locking The sanitize operation is destructive and the expectation is that the device is unmapped while in progress. The current implementation does a lockless check for decoders being active, but then does nothing to prevent decoders from racing to be committed. Introduce state tracking to resolve this race. This incidentally cleans up unpriveleged userspace from triggering mmio read cycles by spinning on reading the 'security/state' attribute. Which at a minimum is a waste since the kernel state machine can cache the completion result. Lastly cxl_mem_sanitize() was mistakenly marked EXPORT_SYMBOL() in the original implementation, but an export was never required. Fixes: 0c36b6ad436a ("cxl/mbox: Add sanitization handling machinery") Cc: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
7914992b |
|
14-Sep-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/port: Quiet warning messages from the cxl_test environment The cxl_test platform device CXL port hierarchy is useful for testing, but throws warning messages of the form: cxl_mem mem2: at cxl_root_port.1 no parent for dport: platform cxl_mem mem3: at cxl_root_port.2 no parent for dport: platform cxl_mem mem4: at cxl_root_port.3 no parent for dport: platform cxl_mem mem5: at cxl_root_port.0 no parent for dport: platform cxl_mem mem6: at cxl_root_port.1 no parent for dport: platform cxl_mem mem7: at cxl_root_port.2 no parent for dport: platform cxl_mem mem8: at cxl_root_port.3 no parent for dport: platform cxl_mem mem9: at cxl_root_port.4 no parent for dport: platform cxl_mem mem10: at cxl_root_port.4 no parent for dport: platform ...and this message when running testing in QEMU: cxl_region region4: Bypassing cpu_cache_invalidate_memregion() for testing! Noisy cxl_test warnings have caused other regressions to be missed. In the interest of using cxl_test for early detection of dev_err() and dev_warn() messages, silence platform device topology and cache-invalidation messages. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
18f35dc9 |
|
22-Aug-2023 |
Alison Schofield <alison.schofield@intel.com> |
cxl/region: Refactor granularity select in cxl_port_setup_targets() In cxl_port_setup_targets() the region driver validates the configuration of auto-discovered region decoders, as well as decoders the driver is preparing to program. The existing calculations use the encoded interleave granularity value to create an interleave granularity that properly fans out when routing an x1 interleave to a greater than x1 interleave. That all worked well, until this config came along: Host Bridge: 2 way at 256 granularity Switch Decoder_A: 1 way at 512 Endpoint_X: 2 way at 256 Switch Decoder_B: 1 way at 512 Endpoint_Y: 2 way at 256 When the Host Bridge interleave is greater than 1 and the root decoder interleave is exactly 1, the region driver needs to consider the number of targets in the region when calculating the expected granularity. While examining the existing logic, and trying to cover the case above, a couple of simplifications appeared, hence this proposed refactoring. The first simplification is to apply the logic to the nominal values and use the existing helper function granularity_to_eig() to translate the desired granularity to the encoded form. This means the comment and code regarding setting address bits is discarded. Although that logic is not wrong, it adds a level of complexity that is not required in the granularity selection. The eig and eiw are indeed part of the routing instructions programmed into the decoders. Up-level the discussion to nominal ways and granularity for clearer analysis. The second simplification reduces the logic to a single granularity calculation that works for all cases. The new calculation doesn't care if parent_iw => 1 because parent_iw is used as a multiplier. The refactor cleans up a useless assignment of eiw made after the iw is already calculated. Regression testing included an examination of all of the ways and granularity selections made during a run of the cxl_test unit tests. There were no differences in selections before and after this patch. Fixes: ("27b3f8d13830 cxl/region: Program target lists") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230822180928.117596-1-alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
9e4edf1a |
|
05-Sep-2023 |
Alison Schofield <alison.schofield@intel.com> |
cxl/region: Match auto-discovered region decoders by HPA range Currently, when the region driver attaches a region to a port, it selects the ports next available decoder to program. With the addition of auto-discovered regions, a port decoder has already been programmed so grabbing the next available decoder can be a mismatch when there is more than one region using the port. The failure appears like this with CXL DEBUG enabled: [] cxl_core:alloc_region_ref:754: cxl region0: endpoint9: HPA order violation region0:[mem 0x14780000000-0x1478fffffff flags 0x200] vs [mem 0x880000000-0x185fffffff flags 0x200] [] cxl_core:cxl_port_attach_region:972: cxl region0: endpoint9: failed to allocate region reference When CXL DEBUG is not enabled, there is no failure message. The region just never materializes. Users can suspect this issue if they know their firmware has programmed decoders so that more than one region is using a port. Note that the problem may appear intermittently, ie not on every reboot. Add a matching method for auto-discovered regions that finds a decoder based on an HPA range. The decoder range must exactly match the region resource parameter. Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230905211007.256385-1-alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
8c897b36 |
|
14-Jun-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Manage decoder target_type at decoder-attach time Switch-level (mid-level) decoders between the platform root and an endpoint can dynamically switch modes between HDM-H and HDM-D[B] depending on which region they target. Use the region type to fixup each decoder that gets allocated to map the given region. Note that endpoint decoders are meant to determine the region type, so warn if those ever need to be fixed up, but since it is possible to continue do so. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168679262543.3436160.13053831955768440312.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
5aa39a91 |
|
14-Jun-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/port: Rename CXL_DECODER_{EXPANDER, ACCELERATOR} => {HOSTONLYMEM, DEVMEM} In preparation for support for HDM-D and HDM-DB configuration (device-memory, and device-memory with back-invalidate). Rename the current type designators to use HOSTONLYMEM and DEVMEM as a suffix. HDM-DB can be supported by devices that are not accelerators, so DEVMEM is a more generic term for that case. Fixup one location where this type value was open coded. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168679261369.3436160.7042443847605280593.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
adfe1973 |
|
16-Jun-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Fix state transitions after reset failure Jonathan reports that failed attempts to reset a region (teardown its HDM decoder configuration) mistakenly advance the state of the region to "not committed". Revert to the previous state of the region on reset failure so that the reset can be re-attempted. Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Closes: http://lore.kernel.org/r/20230316171441.0000205b@Huawei.com Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168696507968.3590522.14484000711718573626.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
2ab47045 |
|
16-Jun-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Flag partially torn down regions as unusable cxl_region_decode_reset() walks all the decoders associated with a given region and disables them. Due to decoder ordering rules it is possible that a switch in the topology notices that a given decoder can not be shutdown before another region with a higher HPA is shutdown first. That can leave the region in a partially committed state. Capture that state in a new CXL_REGION_F_NEEDS_RESET flag and require that a successful cxl_region_decode_reset() attempt must be completed before cxl_region_probe() accepts the region. This is a corollary for the bug that Jonathan identified in "CXL/region : commit reset of out of order region appears to succeed." [1]. Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Link: http://lore.kernel.org/r/20230316171441.0000205b@Huawei.com [1] Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168696507423.3590522.16254212607926684429.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
d1257d09 |
|
16-Jun-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Move cache invalidation before region teardown, and before setup Vikram raised a concern with the theoretical case of a CPU sending MemClnEvict to a device that is not prepared to receive. MemClnEvict is a message that is sent after a CPU has taken ownership of a cacheline from accelerator memory (HDM-DB). In the case of hotplug or HDM decoder reconfiguration it is possible that the CPU is holding old contents for a new device that has taken over the physical address range being cached by the CPU. To avoid this scenario, invalidate caches prior to tearing down an HDM decoder configuration. Now, this poses another problem that it is possible for something to speculate into that space while the decode configuration is still up, so to close that gap also invalidate prior to establish new contents behind a given physical address range. With this change the cache invalidation is now explicit and need not be checked in cxl_region_probe(), and that obviates the need for CXL_REGION_F_INCOHERENT. Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Fixes: d18bc74aced6 ("cxl/region: Manage CPU caches relative to DPA invalidation events") Reported-by: Vikram Sethi <vsethi@nvidia.com> Closes: http://lore.kernel.org/r/BYAPR12MB33364B5EB908BF7239BB996BBD53A@BYAPR12MB3336.namprd12.prod.outlook.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168696506886.3590522.4597053660991916591.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
7481653d |
|
22-Jun-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl: Rename 'uport' to 'uport_dev' For symmetry with the recent rename of ->dport_dev for a 'struct cxl_dport', add the "_dev" suffix to the ->uport property of a 'struct cxl_port'. These devices represent the downstream-port-device and upstream-port-device respectively in the CXL/PCIe topology. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230622205523.85375-6-terry.bowman@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
227db574 |
|
22-Jun-2023 |
Robert Richter <rrichter@amd.com> |
cxl: Rename member @dport of struct cxl_dport to @dport_dev Reading code like dport->dport does not immediately suggest that this points to the corresponding device structure of the dport. Rename struct member @dport to @dport_dev. While at it, also rename @new argument of add_dport() to @dport. This better describes the variable as a dport (e.g. new->dport becomes to dport->dport_dev). Co-developed-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230622205523.85375-5-terry.bowman@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
f0832a58 |
|
18-Apr-2023 |
Alison Schofield <alison.schofield@intel.com> |
cxl/region: Provide region info to the cxl_poison trace event User space may need to know which region, if any, maps the poison address(es) logged in a cxl_poison trace event. Since the mapping of DPAs (device physical addresses) to a region can change, the kernel must provide this information at the time the poison list is read. The event informs user space that at event <timestamp> this <region> mapped to this <DPA>, which is poisoned. The cxl_poison trace event is already wired up to log the region name and uuid if it receives param 'struct cxl_region'. In order to provide that cxl_region, add another method for gathering poison - by committed endpoint decoder mappings. This method is only available with CONFIG_CXL_REGION and is only used if a region actually maps the memdev where poison is being read. After the region driver reads the poison list for all the mapped resources, poison is read for any remaining unmapped resources. The default method remains: read the poison by memdev resource. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/438b01ccaa70592539e8eda4eb2b1d617ba03160.1681838292.git.alison.schofield@intel.com Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
9ff3eec9 |
|
03-Apr-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Move coherence tracking into cxl_region_attach() Each time the contents of a given HPA are potentially changed in a cache incoherent manner the CXL core sets CXL_REGION_F_INCOHERENT to invalidate CPU caches before the region is used. Successful invocation of attach_target() indicates that DPA has been newly assigned to a given HPA in the dynamic region creation flow. However, attach_target() is also reused in the autodiscovery flow where the region was activated by platform firmware. In that case there is no need to invalidate caches because that region is already in active use and nothing about the autodiscovery flow modifies the HPA-to-DPA relationship. In the autodiscovery case cxl_region_attach() exits early after determining the endpoint decoder is already correctly attached to the region. Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery") Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168002858817.50647.1217607907088920888.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
030f8803 |
|
03-Apr-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Fix region setup/teardown for RCDs RCDs (CXL memory devices that link train without VH capability and show up as root complex integrated endpoints), hide the presence of the link between the endpoint and the host-bridge. The CXL region setup/teardown paths assume that a link hop is present and go looking for at least one 'struct cxl_port' instance between the CXL root port-object and an endpoint port-object leading to crashes of the form: BUG: kernel NULL pointer dereference, address: 0000000000000008 [..] RIP: 0010:cxl_region_setup_targets+0x3e9/0xae0 [cxl_core] [..] Call Trace: <TASK> cxl_region_attach+0x46c/0x7a0 [cxl_core] cxl_create_region+0x20b/0x270 [cxl_core] cxl_mock_mem_probe+0x641/0x800 [cxl_mock_mem] platform_probe+0x5b/0xb0 Detect RCDs explicitly and skip walking the non-existent port hierarchy between root and endpoint in that case. While this has been a problem since: commit 0a19bfc8de93 ("cxl/port: Add RCD endpoint port enumeration") ...it becomes a more reliable crash scenario with the new autodiscovery implementation. Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery") Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168002858268.50647.728091521032131326.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
d35b495d |
|
03-Apr-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/port: Fix find_cxl_root() for RCDs and simplify it The find_cxl_root() helper is used to lookup root decoders and other CXL platform topology information for a given endpoint. It turns out that for RCDs it has never worked. The result of find_cxl_root(&cxlmd->dev) is always NULL for the RCH topology case because it expects to find a cxl_port at the host-bridge. RCH topologies only have the root cxl_port object with the host-bridge as a dport. While there are no reports of this being a problem to date, by inspection region enumeration should crash as a result of this problem, and it does in a local unit test for this scenario. However, an observation that ever since: commit f17b558d6663 ("cxl/pmem: Refactor nvdimm device registration, delete the workqueue") ...all callers of find_cxl_root() occur after the memdev connection to the port topology has been established. That means that find_cxl_root() can be simplified to a walk of the endpoint port topology to the root. Switch to that arrangement which also fixes the RCD bug. Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168002857715.50647.344876437247313909.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
7abcb0b1 |
|
13-Feb-2023 |
Arnd Bergmann <arnd@arndb.de> |
cxl: avoid returning uninitialized error code The new cxl_add_to_region() function returns an uninitialized value on success: drivers/cxl/core/region.c:2628:6: error: variable 'rc' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] if (IS_ERR(cxlr)) { ^~~~~~~~~~~~ drivers/cxl/core/region.c:2654:9: note: uninitialized use occurs here return rc; Simplify the logic to have the rc variable always initialized in the same place. Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20230213101220.3821689-1-arnd@kernel.org Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
09d09e04 |
|
10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/dax: Create dax devices for CXL RAM regions While platform firmware takes some responsibility for mapping the RAM capacity of CXL devices present at boot, the OS is responsible for mapping the remainder and hot-added devices. Platform firmware is also responsible for identifying the platform general purpose memory pool, typically DDR attached DRAM, and arranging for the remainder to be 'Soft Reserved'. That reservation allows the CXL subsystem to route the memory to core-mm via memory-hotplug (dax_kmem), or leave it for dedicated access (device-dax). The new 'struct cxl_dax_region' object allows for a CXL memory resource (region) to be published, but also allow for udev and module policy to act on that event. It also prevents cxl_core.ko from having a module loading dependency on any drivers/dax/ modules. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167602003896.1924368.10335442077318970468.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
a32320b7 |
|
10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Add region autodiscovery Region autodiscovery is an asynchronous state machine advanced by cxl_port_probe(). After the decoders on an endpoint port are enumerated they are scanned for actively enabled instances. Each active decoder is flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region. If a region does not already exist for the address range setting of the decoder one is created. That creation process may race with other decoders of the same region being discovered since cxl_port_probe() is asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is introduced to mitigate that race. Once all decoders have arrived, "p->nr_targets == p->interleave_ways", they are sorted by their relative decode position. The sort algorithm involves finding the point in the cxl_port topology where one leg of the decode leads to deviceA and the other deviceB. At that point in the topology the target order in the 'struct cxl_switch_decoder' indicates the relative position of those endpoint decoders in the region. >From that point the region goes through the same setup and validation steps as user-created regions, but instead of programming the decoders it validates that driver would have written the same values to the decoders as were already present. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
9995576c |
|
10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Move region-position validation to a helper In preparation for region autodiscovery, that needs all devices discovered before their relative position in the region can be determined, consolidate all position dependent validation in a helper. Recall that in the on-demand region creation flow the end-user picks the position of a given endpoint decoder in a region. In the autodiscovery case the position of an endpoint decoder can only be determined after all other endpoint decoders that claim to decode the region's address range have been enumerated and attached. So, in the autodiscovery case endpoint decoders may be attached before their relative position is known. Once all decoders arrive, then positions can be determined and validated with cxl_region_validate_position() the same as user initiated on-demand creation. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601997584.1924368.4615769326126138969.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
86987c76 |
|
10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Cleanup target list on attach error Jonathan noticed that the target list setup is not unwound completely upon error. Undo all the setup in the 'err_decrement:' exit path. Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Link: http://lore.kernel.org/r/20230208123031.00006990@Huawei.com Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601996980.1924368.390423634911157277.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
3528b1e1 |
|
10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Refactor attach_target() for autodiscovery Region autodiscovery is the process of kernel creating 'struct cxl_region' object to represent active CXL memory ranges it finds already active in hardware when the driver loads. Typically this happens when platform firmware establishes CXL memory regions and then publishes them in the memory map. However, this can also happen in the case of kexec-reboot after the kernel has created regions. In the autodiscovery case the region creation process starts with a known endpoint decoder. Refactor attach_target() into a helper that is suitable to be called from either sysfs, for runtime region creation, or from cxl_port_probe() after it has enumerated all endpoint decoders. The cxl_port_probe() context is an async device-core probing context, so it is not appropriate to allow SIGTERM to interrupt the assembly process. Refactor attach_target() to take @cxled and @state as arguments where @state indicates whether waiting from the region rwsem is interruptible or not. No behavior change is intended. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/167601996393.1924368.2202255054618600069.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
6e099264 |
|
10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Add volatile region creation support Expand the region creation infrastructure to enable 'ram' (volatile-memory) regions. The internals of create_pmem_region_store() and create_pmem_region_show() are factored out into helpers __create_region() and __create_region_show() for the 'ram' case to reuse. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/167601995775.1924368.352616146815830591.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
1b9b7a6f |
|
10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Validate region mode vs decoder mode In preparation for a new region mode, do not, for example, allow 'ram' decoders to be assigned to 'pmem' regions and vice versa. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/167601995111.1924368.7459128614177994602.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
a8e7d558 |
|
10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Support empty uuids for non-pmem regions Shipping versions of the cxl-cli utility expect all regions to have a 'uuid' attribute. In preparation for 'ram' regions, update the 'uuid' attribute to return an empty string which satisfies the current expectations of 'cxl list -R'. Otherwise, 'cxl list -R' fails in the presence of regions with the 'uuid' attribute missing. Force the attribute to be read-only as there is no facility or expectation for a 'ram' region to recall its uuid from one boot to the next. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601994558.1924368.12612811533724694444.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
7d505f98 |
|
10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Add a mode attribute for regions In preparation for a new region type, "ram" regions, add a mode attribute to clarify the mode of the decoders that can be added to a region. Share the internals of mode_show() (for decoders) with the region case. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/167601993930.1924368.4305018565539515665.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
cbbd05d0 |
|
24-Jan-2023 |
Randy Dunlap <rdunlap@infradead.org> |
cxl: fix spelling mistakes Correct spelling mistakes (reported by codespell). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Ben Widawsky <bwidawsk@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-cxl@vger.kernel.org Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/r/20230125032221.21277-1-rdunlap@infradead.org Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
af3ea9ab |
|
16-Dec-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Clarify when a cxld->commit() callback is mandatory Both cxl_switch_decoders() and cxl_endpoint_decoders() are considered by cxl_region_decode_commit(). Flag cases where cxl_switch_decoders with multiple targets, or cxl_endpoint_decoders do not have a commit callback set. The switch case is unlikely to happen since switches are only enumerated by the CXL core, but the endpoint case may support decoders defined by drivers outside of drivers/cxl, like accerator drivers. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167124081824.1626103.1555704405392757219.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
e520d52d |
|
15-Dec-2022 |
Davidlohr Bueso <dave@stgolabs.net> |
cxl/region: Only warn about cpu_cache_invalidate_memregion() once No need for more than once per module load. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20221215183836.24136-1-dave@stgolabs.net Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
711442e2 |
|
07-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Fix passthrough-decoder detection A passthrough decoder is a decoder that maps only 1 target. It is a special case because it does not impose any constraints on the interleave-math as compared to a decoder with multiple targets. Extend the passthrough case to multi-target-capable decoders that only have one target selected. I.e. the current code was only considering passthrough *ports* which are only a subset of the potential passthrough decoder scenarios. Fixes: e4f6dfa9ef75 ("cxl/region: Fix 'distance' calculation with passthrough ports") Cc: <stable@vger.kernel.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167564540422.847146.13816934143225777888.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
4fa4302d |
|
15-Dec-2022 |
Fan Ni <fan.ni@samsung.com> |
cxl/region: Fix null pointer dereference for resetting decoder Not all decoders have a reset callback. The CXL specification allows a host bridge with a single root port to have no explicit HDM decoders. Currently the region driver assumes there are none. As such the CXL core creates a special pass through decoder instance without a commit/reset callback. Prior to this patch, the ->reset() callback was called unconditionally when calling cxl_region_decode_reset. Thus a configuration with 1 Host Bridge, 1 Root Port, and one directly attached CXL type 3 device or multiple CXL type 3 devices attached to downstream ports of a switch can cause a null pointer dereference. Before the fix, a kernel crash was observed when we destroy the region, and a pass through decoder is reset. The issue can be reproduced as below, 1) create a region with a CXL setup which includes a HB with a single root port under which a memdev is attached directly. 2) destroy the region with cxl destroy-region regionX -f. Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Cc: <stable@vger.kernel.org> Signed-off-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Gregory Price <gregory.price@memverge.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Link: https://lore.kernel.org/r/20221215170909.2650271-1-fan.ni@samsung.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
f04facfb |
|
07-Nov-2022 |
Fan Ni <fan.ni@samsung.com> |
cxl/region: Fix memdev reuse check Due to a typo, the check of whether or not a memdev has already been used as a target for the region (above code piece) will always be skipped. Given a memdev with more than one HDM decoder, an interleaved region can be created that maps multiple HPAs to the same DPA. According to CXL spec 3.0 8.1.3.8.4, "Aliasing (mapping more than one Host Physical Address (HPA) to a single Device Physical Address) is forbidden." Fix this by using existing iterator for memdev reuse check. Cc: <stable@vger.kernel.org> Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Signed-off-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/20221107212153.745993-1-fan.ni@samsung.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
c99b2e8c |
|
05-Dec-2022 |
Dave Jiang <dave.jiang@intel.com> |
cxl: update names for interleave ways conversion macros Change names for interleave ways macros to clearly indicate which variable is encoded and which is the actual ways value. ways == interleave ways eiw == encoded interleave ways Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167027516228.3124679.11265039496968588580.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
83351ddb |
|
05-Dec-2022 |
Dave Jiang <dave.jiang@intel.com> |
cxl: update names for interleave granularity conversion macros Change names for granularity macros to clearly indicate which variable is encoded and which is the actual granularity. granularity == interleave granularity eig == encoded interleave granularity Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167027493237.3124429.8948852388671827664.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
cb4cdf74 |
|
05-Dec-2022 |
Colin Ian King <colin.i.king@gmail.com> |
cxl/region: Fix spelling mistake "memergion" -> "memregion" There is a spelling mistake in a dev_warn message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20221205091819.1943564-1-colin.i.king@gmail.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
d18bc74a |
|
01-Dec-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Manage CPU caches relative to DPA invalidation events A "DPA invalidation event" is any scenario where the contents of a DPA (Device Physical Address) is modified in a way that is incoherent with CPU caches, or if the HPA (Host Physical Address) to DPA association changes due to a remapping event. PMEM security events like Unlock and Passphrase Secure Erase already manage caches through LIBNVDIMM, so that leaves HPA to DPA remap events that need cache management by the CXL core. Those only happen when the boot time CXL configuration has changed. That event occurs when userspace attaches an endpoint decoder to a region configuration, and that region is subsequently activated. The implications of not invalidating caches between remap events is that reads from the region at different points in time may return different results due to stale cached data from the previous HPA to DPA mapping. Without a guarantee that the region contents after cxl_region_probe() are written before being read (a layering-violation assumption that cxl_region_probe() can not make) the CXL subsystem needs to ensure that reads that precede writes see consistent results. A CONFIG_CXL_REGION_INVALIDATION_TEST option is added to support debug and unit testing of the CXL implementation in QEMU or other environments where cpu_cache_has_invalidate_memregion() returns false. This may prove too restrictive for QEMU where the HDM decoders are emulated, but in that case the CXL subsystem needs some new mechanism / indication that the HDM decoder is emulated and not a passthrough of real hardware. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/166993222098.1995348.16604163596374520890.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
bf3e5da8 |
|
01-Dec-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Fix missing probe failure cxl_region_probe() allows for regions not in the 'commit' state to be enabled. Fail probe when the region is not committed otherwise the kernel may indicate that an address range is active when none of the decoders are active. Fixes: 8d48817df6ac ("cxl/region: Add region driver boiler plate") Cc: <stable@vger.kernel.org> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/166993220462.1995348.1698008475198427361.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
f17b558d |
|
01-Dec-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/pmem: Refactor nvdimm device registration, delete the workqueue The three objects 'struct cxl_nvdimm_bridge', 'struct cxl_nvdimm', and 'struct cxl_pmem_region' manage CXL persistent memory resources. The bridge represents base platform resources, the nvdimm represents one or more endpoints, and the region is a collection of nvdimms that contribute to an assembled address range. Their relationship is such that a region is torn down if any component endpoints are removed. All regions and endpoints are torn down if the foundational bridge device goes down. A workqueue was deployed to manage these interdependencies, but it is difficult to reason about, and fragile. A recent attempt to take the CXL root device lock in the cxl_mem driver was reported by lockdep as colliding with the flush_work() in the cxl_pmem flows. Instead of the workqueue, arrange for all pmem/nvdimm devices to be torn down immediately and hierarchically. A similar change is made to both the 'cxl_nvdimm' and 'cxl_pmem_region' objects. For bisect-ability both changes are made in the same patch which unfortunately makes the patch bigger than desired. Arrange for cxl_memdev and cxl_region to register a cxl_nvdimm and cxl_pmem_region as a devres release action of the bridge device. Additionally, include a devres release action of the cxl_memdev or cxl_region device that triggers the bridge's release action if an endpoint exits before the bridge. I.e. this allows either unplugging the bridge, or unplugging and endpoint to result in the same cleanup actions. To keep the patch smaller the cleanup of the now defunct workqueue infrastructure is saved for a follow-on patch. Tested-by: Robert Richter <rrichter@amd.com> Link: https://lore.kernel.org/r/166993041773.1882361.16444301376147207609.stgit@dwillia2-xfh.jf.intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
8f401ec1 |
|
03-Nov-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Recycle region ids At region creation time the next region-id is atomically cached so that there is predictability of region device names. If that region is destroyed and then a new one is created the region id increments. That ends up looking like a memory leak, or is otherwise surprising that identifiers roll forward even after destroying all previously created regions. Try to reuse rather than free old region ids at region release time. While this fixes a cosmetic issue, the needlessly advancing memory region-id gives the appearance of a memory leak, hence the "Fixes" tag, but no "Cc: stable" tag. Cc: Ben Widawsky <bwidawsk@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Fixes: 779dd20cfb56 ("cxl/region: Add region creation support") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166752186062.947915.13200195701224993317.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
e4f6dfa9 |
|
03-Nov-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Fix 'distance' calculation with passthrough ports When programming port decode targets, the algorithm wants to ensure that two devices are compatible to be programmed as peers beneath a given port. A compatible peer is a target that shares the same dport, and where that target's interleave position also routes it to the same dport. Compatibility is determined by the device's interleave position being >= to distance. For example, if a given dport can only map every Nth position then positions less than N away from the last target programmed are incompatible. The @distance for the host-bridge's cxl_port in a simple dual-ported host-bridge configuration with 2 direct-attached devices is 1, i.e. An x2 region divided by 2 dports to reach 2 region targets. An x4 region under an x2 host-bridge would need 2 intervening switches where the @distance at the host bridge level is 2 (x4 region divided by 2 switches to reach 4 devices). However, the distance between peers underneath a single ported host-bridge is always zero because there is no limit to the number of devices that can be mapped. In other words, there are no decoders to program in a passthrough, all descendants are mapped and distance only starts matters for the intervening descendant ports of the passthrough port. Add tracking for the number of dports mapped to a port, and use that to detect the passthrough case for calculating @distance. Cc: <stable@vger.kernel.org> Reported-by: Bobo WL <lmw.bobo@gmail.com> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166752185440.947915.6617495912508299445.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
0d9e7340 |
|
03-Nov-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Fix cxl_region leak, cleanup targets at region delete When a region is deleted any targets that have been previously assigned to that region hold references to it. Trigger those references to drop by detaching all targets at unregister_region() time. Otherwise that region object will leak as userspace has lost the ability to detach targets once region sysfs is torn down. Cc: <stable@vger.kernel.org> Fixes: b9686e8c8e39 ("cxl/region: Enable the assignment of endpoint decoders to regions") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166752183055.947915.17681995648556534844.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
a90accb3 |
|
03-Nov-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Fix region HPA ordering validation Some regions may not have any address space allocated. Skip them when validating HPA order otherwise a crash like the following may result: devm_cxl_add_region: cxl_acpi cxl_acpi.0: decoder3.4: created region9 BUG: kernel NULL pointer dereference, address: 0000000000000000 [..] RIP: 0010:store_targetN+0x655/0x1740 [cxl_core] [..] Call Trace: <TASK> kernfs_fop_write_iter+0x144/0x200 vfs_write+0x24a/0x4d0 ksys_write+0x69/0xf0 do_syscall_64+0x3a/0x90 store_targetN+0x655/0x1740: alloc_region_ref at drivers/cxl/core/region.c:676 (inlined by) cxl_port_attach_region at drivers/cxl/core/region.c:850 (inlined by) cxl_region_attach at drivers/cxl/core/region.c:1290 (inlined by) attach_target at drivers/cxl/core/region.c:1410 (inlined by) store_targetN at drivers/cxl/core/region.c:1453 Cc: <stable@vger.kernel.org> Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/166752182461.947915.497032805239915067.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
71ee71d7 |
|
01-Nov-2022 |
Vishal Verma <vishal.l.verma@intel.com> |
cxl/region: Fix decoder allocation crash When an intermediate port's decoders have been exhausted by existing regions, and creating a new region with the port in question in it's hierarchical path is attempted, cxl_port_attach_region() fails to find a port decoder (as would be expected), and drops into the failure / cleanup path. However, during cleanup of the region reference, a sanity check attempts to dereference the decoder, which in the above case didn't exist. This causes a NULL pointer dereference BUG. To fix this, refactor the decoder allocation and de-allocation into helper routines, and in this 'free' routine, check that the decoder, @cxld, is valid before attempting any operations on it. Cc: <stable@vger.kernel.org> Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Link: https://lore.kernel.org/r/20221101074100.1732003-1-vishal.l.verma@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
2816e24b |
|
18-Aug-2022 |
Jonathan Cameron <Jonathan.Cameron@huawei.com> |
cxl/region: Fix null pointer dereference due to pass through decoder commit Not all decoders have a commit callback. The CXL specification allows a host bridge with a single root port to have no explicit HDM decoders. Currently the region driver assumes there are none. As such the CXL core creates a special pass through decoder instance without a commit callback. Prior to this patch, the ->commit() callback was called unconditionally. Thus a configuration with 1 Host Bridge, 1 Root Port, 1 switch with multiple downstream ports below which there are multiple CXL type 3 devices results in a situation where committing the region causes a null pointer dereference. Reported-by: Bobo WL <lmw.bobo@gmail.com> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/20220818164210.2084-1-Jonathan.Cameron@huawei.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
4d8e4ea5 |
|
05-Aug-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Disallow region granularity != window granularity The endpoint decode granularity must be <= the window granularity otherwise capacity in the endpoints is lost in the decode. Consider an attempt to have a region granularity of 512 with 4 devices within a window that maps 2 host bridges at a granularity of 256 bytes: HPA DPA Offset HB Port EP 0x0 0x0 0 0 0 0x100 0x0 1 0 2 0x200 0x100 0 0 0 0x300 0x100 1 0 2 0x400 0x200 0 1 1 0x500 0x200 1 1 3 0x600 0x300 0 1 1 0x700 0x300 1 1 3 0x800 0x400 0 0 0 0x900 0x400 1 0 2 0xA00 0x500 0 0 0 0xB00 0x500 1 0 2 Notice how endpoint0 maps HPA 0x0 and 0x200 correctly, but then at HPA 0x800 it results in DPA 0x200-0x400 on being skipped. Fix this by restricing the region granularity to be equal to the window granularity resulting in the following for a x4 region under a x2 window at a granularity of 256. HPA DPA Offset HB Port EP 0x0 0x0 0 0 0 0x100 0x0 1 0 2 0x200 0x0 0 1 1 0x300 0x0 1 1 3 0x400 0x100 0 0 0 0x500 0x100 1 0 2 0x600 0x100 0 1 1 0x700 0x100 1 1 3 Not that it ever made practical sense to support region granularity > window granularity. The window rotates host bridges causing endpoints to never see a consecutive stream of requests at the desired granularity without breaks to issue cycles to the other host bridge. Fixes: 80d10a6cee05 ("cxl/region: Add interleave geometry attributes") Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/165973127171.1526540.9923273539049172976.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
298d44d0 |
|
05-Aug-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Fix x1 interleave to greater than x1 interleave routing In cases where the decode fans out as it traverses downstream, the interleave granularity needs to increment to identify the port selector bits out of the remaining address bits. For example, recall that with an x2 parent port intereleave (IW == 1), the downstream decode for children of those ports will either see address bit IG+8 always set, or address bit IG+8 always clear. So if the child port needs to select a downstream port it can only use address bits starting at IG+9 (where IG and IW are the CXL encoded values for interleave granularity (ilog2(ig) - 8) and ways (ilog2(iw))). When the parent port interleave is x1 no such masking occurs and the child port can maintain the granularity that was routed to the parent port. Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/165973126583.1526540.657948655360009242.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
910bc55d |
|
05-Aug-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Move HPA setup to cxl_region_attach() A recent bug fix added the setup of the endpoint decoder interleave geometry settings to cxl_region_attach(). Move the HPA setup there as well to keep all endpoint decoder parameter setting in a central location. For symmetry, move endpoint HPA teardown to cxl_region_detach(), and for switches move HPA setup / teardown to cxl_port_{setup,reset}_targets(). Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/165973126020.1526540.14701949254436069807.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
2901c8bd |
|
02-Aug-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Fix decoder interleave programming Jonathan notes: "Curiously interleave ways = 1 for the EPs which is obviously wrong" ...while testing the latest CXL development branch on QEMU. It turns out the region creation process failed to program the endpoint decoders. This was missed because the default settings of x1 at 4K intereleave still results in the region appearing to function. Jonathan caught the bug by reverse mapping the translations that need to happen for the QEMU support. Link: https://lore.kernel.org/r/62e95fdf9f6e2_30440294e4@dwillia2-xfh.jf.intel.com.notmuch Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165951146336.967013.11160153960900111443.stgit@dwillia2-xfh.jf.intel.com Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
f13da0d9 |
|
04-Aug-2022 |
Bagas Sanjaya <bagasdotme@gmail.com> |
cxl/regions: add padding for cxl_rr_ep_add nested lists Sphinx reported indentation warnings: Documentation/driver-api/cxl/memory-devices:457: ./drivers/cxl/core/region.c:732: WARNING: Unexpected indentation. Documentation/driver-api/cxl/memory-devices:457: ./drivers/cxl/core/region.c:733: WARNING: Block quote ends without a blank line; unexpected unindent. Documentation/driver-api/cxl/memory-devices:457: ./drivers/cxl/core/region.c:735: WARNING: Unexpected indentation. These warnings above are due to missing blank line padding in the nested list in kernel-doc comment for cxl_rr_ep_add(). Add the paddings to fix the warnings. Fixes: 384e624bb211b4 ("cxl/region: Attach endpoint decoders") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220804075448.98241-2-bagasdotme@gmail.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
e29a8995 |
|
01-Aug-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Fix region reference target accounting Dan reports: The error handling in cxl_port_attach_region() looks like it might have a similar bug. The cxl_rr->nr_targets++; might want a --. That function is more complicated. Indeed cxl_rr->nr_targets leaks when cxl_rr_ep_add() fails, but that flow is not clear. Fix the bug and the clarity by separating the 'new' region-reference case from the 'extend' region-reference case. This also moves the host-physical-address (HPA) validation, that the HPA of a new region being accounted to the port is greater than the HPA of all other regions associated with the port, to alloc_region_ref(). Introduce @nr_targets_inc to track when the error exit path needs to clean up cxl_rr->nr_targets. Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: http://lore.kernel.org/r/165939482134.252363.1915691883146696327.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
69c99613 |
|
02-Aug-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Fix region commit uninitialized variable warning 0day robot reports: drivers/cxl/core/region.c:196 cxl_region_decode_commit() error: uninitialized symbol 'rc'. The re-checking of loop termination conditions to determine "success" makes it hard to see that @rc is initialized in all cases. Remove those to make it explicit that @rc reflects a commit error and that the rest of logic is concerned with unwinding committed decoders. This change potentially results in cxl_region_decode_reset() being called with @count == 0 where it was not called before, but cxl_region_decode_reset() treats that as a nop. Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: http://lore.kernel.org/r/165951148105.967013.14191992449932268431.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
8d428542 |
|
02-Aug-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Fix port setup uninitialized variable warnings 0day robot reports: drivers/cxl/core/region.c:1068 cxl_port_setup_targets() error: uninitialized symbol 'eiw'. drivers/cxl/core/region.c:1068 cxl_port_setup_targets() error: uninitialized symbol 'peig'. drivers/cxl/core/region.c:1068 cxl_port_setup_targets() error: uninitialized symbol 'peiw'. ...which are all valid reports. Add debug statement to consume the, albeit unexpected, errors. Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165951147487.967013.929590444907251028.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
817b2794 |
|
01-Aug-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Stop initializing interleave granularity In preparation for a patch that validates that the region ways setting is compatible with the granularity setting, the initial granularity setting needs to start at zero to indicate "unset". Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/r/165853777484.2430596.3423921169034844397.stgit@dwillia2-xfh.jf.intel.com [djbw: fix up unused variable] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
5e42bcbc |
|
01-Aug-2022 |
Dan Carpenter <dan.carpenter@oracle.com> |
cxl/region: decrement ->nr_targets on error in cxl_region_attach() The ++ needs a match -- on the clean up path. If the p->nr_targets value gets to be more than 16 it leads to uninitialized data in cxl_port_setup_targets(). drivers/cxl/core/region.c:995 cxl_port_setup_targets() error: uninitialized symbol 'eiw'. Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/YuepCvUAoCtdpcoO@kili Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
c7e3548c |
|
01-Aug-2022 |
Dan Carpenter <dan.carpenter@oracle.com> |
cxl/region: prevent underflow in ways_to_cxl() The "ways" variable comes from the user. The ways_to_cxl() function has an upper bound but it doesn't check for negatives. Make the "ways" variable an unsigned int to fix this bug. Fixes: 80d10a6cee05 ("cxl/region: Add interleave geometry attributes") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/Yueo3NV2hFCXx1iV@kili [djbw: fixup interleave_ways_store() to only accept unsigned input] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
88ab1dde |
|
01-Aug-2022 |
Dan Carpenter <dan.carpenter@oracle.com> |
cxl/region: uninitialized variable in alloc_hpa() This should check "p->res" instead of "res" (which is uninitialized). Fixes: 23a22cd1c98b ("cxl/region: Allocate HPA capacity to regions") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/Yueor88I/DkVSOtL@kili Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
04ad63f0 |
|
11-Jan-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Introduce cxl_pmem_region objects The LIBNVDIMM subsystem is a platform agnostic representation of system NVDIMM / persistent memory resources. To date, the CXL subsystem's interaction with LIBNVDIMM has been to register an nvdimm-bridge device and cxl_nvdimm objects to proxy CXL capabilities into existing LIBNVDIMM subsystem mechanics. With regions the approach is the same. Create a new cxl_pmem_region object to proxy CXL region details into a LIBNVDIMM definition. With this enabling LIBNVDIMM can partition CXL persistent memory regions with legacy namespace labels. A follow-on patch will add CXL region label and CXL namespace label support to persist region configurations across driver reload / system-reset events. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784340111.1758207.3036498385188290968.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
8d48817d |
|
15-Jun-2021 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Add region driver boiler plate The CXL region driver is responsible for routing fully formed CXL regions to one of libnvdimm, for persistent memory regions, device-dax for volatile memory regions, or just act as an enumeration placeholder if the region was setup and configuration locked by platform firmware. In the platform-firmware-setup case the expectation is that region is already accounted in the system memory map, i.e. already enabled as "System RAM". For now, just attach to CXL regions in the CXL_CONFIG_COMMIT state, and take no further action. Given this driver is just a small / simple router, include it in the core rather than its own module. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220624041950.559155-18-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
176baefb |
|
08-Jun-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/hdm: Commit decoder state to hardware After all the soft validation of the region has completed, convey the region configuration to hardware while being careful to commit decoders in specification mandated order. In addition to programming the endpoint decoder base-address, interleave ways and granularity, the switch decoder target lists are also established. While the kernel can enforce spec-mandated commit order, it can not enforce spec-mandated reset order. For example, the kernel can't stop someone from removing an endpoint device that is occupying decoderN in a switch decoder where decoderN+1 is also committed. To reset decoderN, decoderN+1 must be torn down first. That "tear down the world" implementation is saved for a follow-on patch. Callback operations are provided for the 'commit' and 'reset' operations. While those callbacks may prove useful for CXL accelerators (Type-2 devices with memory) the primary motivation is to enable a simple way for cxl_test to intercept those operations. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784338418.1758207.14659830845389904356.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
27b3f8d1 |
|
06-Jun-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Program target lists Once the region's interleave geometry (ways, granularity, size) is established and all the endpoint decoder targets are assigned, the next phase is to program all the intermediate decoders. Specifically, each CXL switch in the path between the endpoint and its CXL host-bridge (including the logical switch internal to the host-bridge) needs to have its decoders programmed and the target list order assigned. The difficulty in this implementation lies in determining which endpoint decoder ordering combinations are valid. Consider the cxl_test case of 2 host bridges, each of those host-bridges attached to 2 switches, and each of those switches attached to 2 endpoints for a potential 8-way interleave. The x2 interleave at the host-bridge level requires that all even numbered endpoint decoder positions be located on the "left" hand side of the topology tree, and the odd numbered positions on the other. The endpoints that are peers on the same switch need to have a position that can be routed with a dedicated address bit per-endpoint. See check_last_peer() for the details. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784337827.1758207.132121746122685208.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
384e624b |
|
07-Jun-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Attach endpoint decoders CXL regions (interleave sets) are made up of a set of memory devices where each device maps a portion of the interleave with one of its decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure). As endpoint decoders are identified by a provisioning tool they can be added to a region provided the region interleave properties are set (way, granularity, HPA) and DPA has been assigned to the decoder. The attach event triggers several validation checks, for example: - is the DPA sized appropriately for the region - is the decoder reachable via the host-bridges identified by the region's root decoder - is the device already active in a different region position slot - are there already regions with a higher HPA active on a given port (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming) ...and the attach event affords an opportunity to collect data and resources relevant to later programming the target lists in switch decoders, for example: - allocate a decoder at each cxl_port in the decode chain - for a given switch port, how many the region's endpoints are hosted through the port - how many unique targets (next hops) does a port need to map to reach those endpoints The act of reconciling this information and deploying it to the decoder configuration is saved for a follow-on patch. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784337277.1758207.4108508181328528703.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
b9686e8c |
|
04-Jun-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Enable the assignment of endpoint decoders to regions The region provisioning process involves allocating DPA to a set of endpoint decoders, and HPA plus the region geometry to a region device. Then the decoder is assigned to the region. At this point several validation steps can be performed to validate that the decoder is suitable to participate in the region. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/165784336184.1758207.16403282029203949622.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
23a22cd1 |
|
25-Apr-2022 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Allocate HPA capacity to regions After a region's interleave parameters (ways and granularity) are set, add a way for regions to allocate HPA (host physical address space) from the free capacity in their parent root-decoder. The allocator for this capacity reuses the 'struct resource' based allocator used for CONFIG_DEVICE_PRIVATE. Once the tuple of "ways, granularity, [uuid], and size" is set the region configuration transitions to the CXL_CONFIG_INTERLEAVE_ACTIVE state which is a precursor to allowing endpoint decoders to be added to a region. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784335630.1758207.420216490941955417.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
80d10a6c |
|
25-Apr-2022 |
Ben Widawsky <bwidawsk@kernel.org> |
cxl/region: Add interleave geometry attributes Add ABI to allow the number of devices that comprise a region to be set as well as the interleave granularity for the region. Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> [djbw: reword changelog] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220624041950.559155-11-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
dd5ba0eb |
|
27-May-2021 |
Ben Widawsky <bwidawsk@kernel.org> |
cxl/region: Add a 'uuid' attribute The process of provisioning a region involves triggering the creation of a new region object, pouring in the configuration, and then binding that configured object to the region driver to start its operation. For persistent memory regions the CXL specification mandates that it identified by a uuid. Add an ABI for userspace to specify a region's uuid. Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> [djbw: simplify locking] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784334465.1758207.8224025435884752570.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
779dd20c |
|
08-Jun-2021 |
Ben Widawsky <bwidawsk@kernel.org> |
cxl/region: Add region creation support CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|