#
f36be9ce |
|
19-Dec-2023 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
EDAC: constantify the struct bus_type usage In many places in the edac code, struct bus_type pointers are passed around and then eventually sent to the driver core, which can handle a constant pointer. So constantify all of the edac usage of these as well because the data in them is never modified by the edac code either. Cc: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rric@kernel.org> Cc: <linux-edac@vger.kernel.org> Link: https://lore.kernel.org/r/2023121909-tribute-punctuate-4b22@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
9a5f580c |
|
02-Nov-2023 |
Muralidhara M K <muralidhara.mk@amd.com> |
EDAC/mc: Add support for HBM3 memory type AMD MI300A models use HBM3 (High Bandwidth Memory Gen 3) memory. HBM is a high-speed computer memory interface for 3D-stacked synchronous dynamic random-access memory (SDRAM). Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231102114225.2006878-4-muralimk@amd.com
|
#
9a1043d4 |
|
22-Aug-2022 |
Serge Semin <Sergey.Semin@baikalelectronics.ru> |
EDAC/mc: Replace spaces with tabs in memtype flags definition Currently, the memory type macros are partly defined with multiple spaces between the macro name and its definition. Replace the spaces with tabs as the kernel coding style requires. Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220822190730.27277-13-Sergey.Semin@baikalelectronics.ru
|
#
f9571124 |
|
08-Dec-2021 |
Yazen Ghannam <yazen.ghannam@amd.com> |
EDAC: Add RDDR5 and LRDDR5 memory types Include Registered-DDR5 and Load-Reduced DDR5 in the list of memory types. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211208174356.1997855-2-yazen.ghannam@amd.com
|
#
e1ca90b7 |
|
30-Jun-2021 |
Naveen Krishna Chatradhi <nchatrad@amd.com> |
EDAC/mc: Add new HBM2 memory type Add a new entry to 'enum mem_type' and a new string to 'edac_mem_types[]' for HBM2 (High Bandwidth Memory Gen 2) new memory type. Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Muralidhara M K <muralimk@amd.com> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210630152828.162659-4-nchatrad@amd.com
|
#
bc1c99a5 |
|
17-Nov-2020 |
Qiuxu Zhuo <qiuxu.zhuo@intel.com> |
EDAC: Add DDR5 new memory type Add a new entry to 'enum mem_type' and a new string to 'edac_mem_types[]' for DDR5 new memory type. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
|
#
3b203693 |
|
05-Nov-2020 |
Qiuxu Zhuo <qiuxu.zhuo@intel.com> |
EDAC: Add three new memory types There are {Low-Power DDR3/4, WIO2} types of memory. Add new entries to 'enum mem_type' and new strings to 'edac_mem_types[]' for the new types. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
|
#
24269999 |
|
23-Oct-2020 |
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> |
EDAC: Fix some kernel-doc markups Kernel-doc markup should use this format: identifier - description Correct that and also fix some enums' names in the kernel-doc markup. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1d291393ba58c7b80908a3fedf02d2f53921ffe9.1603469755.git.mchehab+huawei@kernel.org
|
#
e370f886 |
|
16-Jun-2020 |
Borislav Petkov <bp@suse.de> |
EDAC: Remove edac_get_dimm_by_index() It is unused now. Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
7fc0b9b9 |
|
14-Feb-2020 |
Tony Luck <tony.luck@intel.com> |
EDAC: Drop the EDAC report status checks When acpi_extlog was added, we were worried that the same error would be reported more than once by different subsystems. But in the ensuing years I've seen complaints that people could not find an error log (because this mechanism suppressed the log they were looking for). Rip it all out. People are smart enough to notice the same address from different reporting mechanisms. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200214222720.13168-8-tony.luck@intel.com
|
#
4aa92c86 |
|
16-Feb-2020 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Remove per layer counters Looking at how mci->{ue,ce}_per_layer[EDAC_MAX_LAYERS] is used, it turns out that only the leaves in the memory hierarchy are consumed (in sysfs), but not the intermediate layers, e.g.: count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][dimm->idx]; These unused counters only add complexity, remove them. The error counter values are directly stored in struct dimm_info now. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-11-rrichter@marvell.com
|
#
67792cf9 |
|
23-Jan-2020 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Remove enable_per_layer_report function argument Many functions carry the enable_per_layer_report argument. This is a bool value indicating the error information contains some location data where the error occurred. This can easily being determined by checking the pos[] array for values. Negative values indicate there is no location available. So if the top layer is negative, the error location is unknown. Just check if the top layer is negative and remove enable_per_layer_report as function argument and also from struct edac_raw_error_desc. [ bp: Reflow comments to 80 columns, while at it. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-8-rrichter@marvell.com
|
#
672ef0e5 |
|
23-Jan-2020 |
Robert Richter <rrichter@marvell.com> |
EDAC: Store error type in struct edac_raw_error_desc Store the error type in struct edac_raw_error_desc. This makes the type parameter of edac_raw_mc_handle_error() obsolete. [ kernel-doc typo ] Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-4-rrichter@marvell.com
|
#
98edb865 |
|
06-Nov-2019 |
Robert Richter <rrichter@marvell.com> |
EDAC: Remove misleading comment in struct edac_raw_error_desc There never has been such function edac_raw_error_desc_clean() and in function ghes_edac_report_mem_error() the whole struct is zero'ed including the string arrays. Remove that comment. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-9-rrichter@marvell.com
|
#
c498afaf |
|
06-Nov-2019 |
Robert Richter <rrichter@marvell.com> |
EDAC: Introduce an mci_for_each_dimm() iterator Introduce an mci_for_each_dimm() iterator. It returns a pointer to a struct dimm_info. This makes the declaration and use of an index obsolete and avoids access to internal data of struct mci (direct array access etc). [ bp: push the struct dimm_info *dimm; declaration into the CONFIG_EDAC_DEBUG block. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-4-rrichter@marvell.com
|
#
977b1ce7 |
|
06-Nov-2019 |
Robert Richter <rrichter@marvell.com> |
EDAC: Remove EDAC_DIMM_OFF() macro The EDAC_DIMM_OFF() macro takes 5 arguments to get the DIMM's index. Simplify this by storing the index in struct dimm_info to avoid its calculation and remove the EDAC_DIMM_OFF() macro. The index can be directly used then. Another advantage is that edac_mc_alloc() could be used even if the exact size of the layers is unknown. Only the number of DIMMs would be needed. Rename iterator variable to idx, while at it. The name is more handy, esp. when searching for it in the code. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-3-rrichter@marvell.com
|
#
bc9ad9e4 |
|
06-Nov-2019 |
Robert Richter <rrichter@marvell.com> |
EDAC: Replace EDAC_DIMM_PTR() macro with edac_get_dimm() function The EDAC_DIMM_PTR() macro takes 3 arguments from struct mem_ctl_info. Clean up this interface to only pass the mci struct and replace this macro with a new function edac_get_dimm(). Also introduce an edac_get_dimm_by_index() function for later use. This allows it to get a DIMM pointer only by a given index. This can be useful if the DIMM's position within the layers of the memory controller or the exact size of the layers are unknown. Small style changes made for some hunks after applying the semantic patch. Semantic patch used: @@ expression mci, a, b,c; @@ -EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, a, b, c) +edac_get_dimm(mci, a, b, c) [ bp: Touchups. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Tero Kristo <t-kristo@ti.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-2-rrichter@marvell.com
|
#
d55c79ac |
|
01-Sep-2019 |
Robert Richter <rrichter@marvell.com> |
EDAC: Prefer 'unsigned int' to bare use of 'unsigned' Use of 'unsigned int' instead of bare use of 'unsigned'. Fix this for edac_mc*, ghes and the i5100 driver as reported by checkpatch.pl. While at it, struct member dev_ch_attribute->channel is always used as unsigned int. Change type to unsigned int to avoid type casts. [ bp: Massage. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190902123216.9809-2-rrichter@marvell.com
|
#
861e6ed6 |
|
05-Nov-2018 |
Borislav Petkov <bp@suse.de> |
EDAC: Drop per-memory controller buses ... and use the single edac_subsys object returned from subsys_system_register(). The idea is to have a single bus and multiple devices on it. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> CC: Aristeu Rozanski Filho <arozansk@redhat.com> CC: Greg KH <gregkh@linuxfoundation.org> CC: Justin Ernst <justin.ernst@hpe.com> CC: linux-edac <linux-edac@vger.kernel.org> CC: Mauro Carvalho Chehab <mchehab@kernel.org> CC: Russ Anderson <rja@hpe.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20180926152752.GG5584@zn.tnic
|
#
6b588594 |
|
25-Sep-2018 |
Justin Ernst <justin.ernst@hpe.com> |
EDAC: Raise the maximum number of memory controllers We observe an oops in the skx_edac module during boot: EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0 EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1 EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0 ... EDAC MC13: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1 EDAC MC14: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0 EDAC MC15: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1 Too many memory controllers: 16 EDAC MC: Removed device 0 for skx_edac Skylake Socket#0 IMC#0 We observe there are two memory controllers per socket, with a limit of 16. Raise the maximum number of memory controllers from 16 to 2 * MAX_NUMNODES (1024). [ bp: This is just a band-aid fix until we've sorted out the whole issue with the bus_type association and handling in EDAC and can get rid of this arbitrary limit. ] Signed-off-by: Justin Ernst <justin.ernst@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Russ Anderson <russ.anderson@hpe.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac@vger.kernel.org Link: https://lkml.kernel.org/r/20180925143449.284634-1-justin.ernst@hpe.com
|
#
c798c88f |
|
18-Sep-2018 |
Fan Wu <wufan@codeaurora.org> |
EDAC, ghes: Use CPER module handles to locate DIMMs Use SMBIOS module handle type 17, on platforms which provide valid ones, to locate the corresponding DIMM and thus have per-DIMM error counter updates. Signed-off-by: Fan Wu <wufan@codeaurora.org> [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tyler Baicar <baicar.tyler@gmail.com> Reviewed-by: James Morse <james.morse@arm.com> Tested-by: Toshi Kani <toshi.kani@hpe.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: baicar.tyler@gmail.com Cc: john.garry@huawei.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: shiju.jose@huawei.com Cc: tanxiaofei@huawei.com Cc: wanghuiqiang@huawei.com Link: http://lkml.kernel.org/r/1537322340-1860-1-git-send-email-wufan@codeaurora.org
|
#
001f8613 |
|
12-Mar-2018 |
Tony Luck <tony.luck@intel.com> |
EDAC: Add new memory type for non-volatile DIMMs There are now non-volatile versions of DIMMs. Add a new entry to "enum mem_type" and a new string in edac_mem_types[]. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jean Delvare <jdelvare@suse.com> Cc: Len Brown <lenb@kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-acpi@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/20180312182430.10335-3-tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
c54182ec |
|
28-Jun-2017 |
Borislav Petkov <bp@suse.de> |
EDAC: Get rid of mci->mod_ver It is a write-only variable so get rid of it. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Robert Richter <rric@kernel.org> Acked-by: Michal Simek <michal.simek@xilinx.com> Acked-by: Thor Thayer <thor.thayer@linux.intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Jason Baron <jbaron@akamai.com> Cc: "Sören Brinkmann" <soren.brinkmann@xilinx.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Loc Ho <lho@apm.com> Cc: linux-edac@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@linux-mips.org
|
#
bffc7dec |
|
04-Feb-2017 |
Borislav Petkov <bp@suse.de> |
EDAC: Rename report status accessors Change them to have the edac_ prefix. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
fee27d7d |
|
04-Feb-2017 |
Borislav Petkov <bp@suse.de> |
EDAC: Delete edac_stub.c Move the remaining functionality to edac_mc.c. Convert "edac_report=" to a module parameter. Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
d3116a08 |
|
26-Jan-2017 |
Borislav Petkov <bp@suse.de> |
EDAC: Remove edac_err_assert ... and the glue around it. It is not needed anymore. Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
97bb6c17 |
|
26-Jan-2017 |
Borislav Petkov <bp@suse.de> |
EDAC: Get rid of edac_handlers Use mc_devices list instead to check whether we have EDAC driver instances successfully registered with EDAC core. Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
db47d5f8 |
|
25-Jan-2017 |
Borislav Petkov <bp@suse.de> |
x86/nmi, EDAC: Get rid of DRAM error reporting thru PCI SERR NMI Apparently, some machines used to report DRAM errors through a PCI SERR NMI. This is why we have a call into EDAC in the NMI handler. See c0d121720220 ("drivers/edac: add new nmi rescan"). From looking at the patch above, that's two drivers: e752x_edac.c and e7xxx_edac.c. Now, I wanna say those are old machines which are probably decommissioned already. Tony says that "[t]the newest CPU supported by either of those drivers is the Xeon E7520 (a.k.a. "Nehalem") released in Q1'2010. Possibly some folks are still using these ... but people that hold onto h/w for 7 years generally cling to old s/w too ... so I'd guess it unlikely that we will get complaints for breaking these in upstream." So even if there is a small number still in use, we did load EDAC with edac_op_state == EDAC_OPSTATE_POLL by default (we still do, in fact) which means a default EDAC setup without any parameters supplied on the command line or otherwise would never even log the error in the NMI handler because we're polling by default: inline int edac_handler_set(void) { if (edac_op_state == EDAC_OPSTATE_POLL) return 0; return atomic_read(&edac_handlers); } So, long story short, I'd like to get rid of that nastiness called edac_stub.c and confine all the EDAC drivers solely to drivers/edac/. If we ever have to do stuff like that again, it should be notifiers we're using and not some insanity like this one. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com>
|
#
eca90a3b |
|
05-Jan-2017 |
Alexander Alemayhu <alexander@alemayhu.com> |
EDAC: Fix typos in enum mem_type comments s/labed/labeled/ s/differenciate/differentiate/ Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170105211150.24003-1-alexander@alemayhu.com Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
4838a0de |
|
01-Dec-2016 |
Yazen Ghannam <Yazen.Ghannam@amd.com> |
EDAC: Document HW_EVENT_ERR_DEFERRED type Add a description of the HW_EVENT_ERR_DEFERRED type that wasn't included with commit d12a969ebbfc ("EDAC, amd64: Add Deferred Error type"). Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
#
6b1fb6f7 |
|
29-Oct-2016 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac.rst: move concepts dictionary from edac.h Instead of storing the concepts dictionary inside header file, move it to the subsystem documentation. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
#
e0020758 |
|
28-Oct-2016 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: fix kenel-doc markups at edac.h As this file was never added to the driver-api, the kernel-doc markups there were never tested. Some of them have issues. Fix them. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
#
0b892c71 |
|
29-Oct-2016 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: move EDAC PCI definitions to drivers/edac/edac_pci.h The edac_core.h header contain data structures and function definitions for the 3 parts of EDAC: MC, PCI and device. Let's move the PCI ones to a separate header file, as part of a header reorganization. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
#
d12a969e |
|
17-Nov-2016 |
Yazen Ghannam <Yazen.Ghannam@amd.com> |
EDAC, amd64: Add Deferred Error type Currently, deferred errors are classified as correctable in EDAC. Add a new error type for deferred errors so that they are correctly reported to the user. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-7-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
1e8096bb |
|
17-Nov-2016 |
Yazen Ghannam <Yazen.Ghannam@amd.com> |
EDAC: Add LRDDR4 DRAM type AMD Fam17h systems can support Load-Reduced DDR4 DIMMs. So add this new type to edac.h in preparation for the Fam17h EDAC update. Also, let's fix a format issue with the LRDDR3 line while we're here. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-3-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
a97d2627 |
|
30-Nov-2015 |
Borislav Petkov <bp@suse.de> |
EDAC: Unexport and make edac_subsys static ... and use the accessor instead. Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
733476cf |
|
27-Nov-2015 |
Borislav Petkov <bp@suse.de> |
EDAC: Rip out the edac_subsys reference counting This was really dumb - reference counting for the main EDAC sysfs object. While we could've simply registered it as the first thing in the module init path and then hand it around to what needs it. Do that and rip out all the code around it, thus simplifying the whole handling significantly. Move the edac_subsys node back to edac_module.c. Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
255379ae |
|
03-Dec-2015 |
Jim Snow <jim.m.snow@intel.com> |
EDAC: Add DDR4 flag Make EDAC aware of DDR4/RDDR4 mem types. Signed-off-by: Jim Snow <jim.m.snow@intel.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449136134-23706-2-git-send-email-hubert.chrzaniuk@intel.com [ Rebase to 4.4-rc3. ] Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
621a5f7a |
|
26-Sep-2015 |
Viresh Kumar <viresh.kumar@linaro.org> |
debugfs: Pass bool pointer to debugfs_create_bool() Its a bit odd that debugfs_create_bool() takes 'u32 *' as an argument, when all it needs is a boolean pointer. It would be better to update this API to make it accept 'bool *' instead, as that will make it more consistent and often more convenient. Over that bool takes just a byte. That required updates to all user sites as well, in the same commit updating the API. regmap core was also using debugfs_{read|write}_file_bool(), directly and variable types were updated for that to be bool as well. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
7ac8bf9b |
|
22-Sep-2015 |
Borislav Petkov <bp@suse.de> |
EDAC: Carve out debugfs functionality ... into a separate compilation unit and drop a couple of CONFIG_EDAC_DEBUG ifdefferies. Rename edac_create_debug_nodes() to edac_create_debugfs_nodes(), while at it. No functionality change. Cc: <linux-edac@vger.kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
348fec70 |
|
18-Sep-2014 |
Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> |
EDAC: Add DDR3 LRDIMM entries to edac_mem_types F15hM60h adds support for DDR4 and DDR3 LRDIMMs. Add them here. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1411070218-10258-1-git-send-email-Aravind.Gopalakrishnan@amd.com [ Boris: improve comments. ] Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
7b827835 |
|
18-Jun-2014 |
Aristeu Rozanski <aris@redhat.com> |
edac: add DDR4 and RDDR4 Haswell memory controller can make use of DDR4 and Registered DDR4 Cc: tony.luck@intel.com Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
|
#
c700f013 |
|
05-Dec-2013 |
Chen, Gong <gong.chen@linux.intel.com> |
EDAC: Add an edac_report parameter to EDAC This new parameter is used to control how to report HW error reporting, especially for newer Intel platform, like Ivybridge-EX, which contains an enhanced error decoding functionality in the firmware, i.e. eMCA. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1386310630-12529-2-git-send-email-gong.chen@linux.intel.com [ Boris: massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
56507694 |
|
18-Oct-2013 |
Chen, Gong <gong.chen@linux.intel.com> |
EDAC, GHES: Update ghes error record info In latest UEFI spec(by now it's 2.4) there are some new fields for memory error reporting. Add these new fields for ghes_edac interface. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Cc: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
|
#
88d84ac9 |
|
18-Jul-2013 |
Borislav Petkov <bp@suse.de> |
EDAC: Fix lockdep splat Fix the following: BUG: key ffff88043bdd0330 not in .data! ------------[ cut here ]------------ WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0() DEBUG_LOCKS_WARN_ON(1) Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1 Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013 0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958 ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000 ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc Call Trace: dump_stack warn_slowpath_common warn_slowpath_fmt lockdep_init_map ? trace_hardirqs_on_caller ? trace_hardirqs_on debug_mutex_init __mutex_init bus_register edac_create_sysfs_mci_device edac_mc_add_mc sbridge_probe pci_device_probe driver_probe_device __driver_attach ? driver_probe_device bus_for_each_dev driver_attach bus_add_driver driver_register __pci_register_driver ? 0xffffffffa0010fff sbridge_init ? 0xffffffffa0010fff do_one_initcall load_module ? unset_module_init_ro_nx SyS_init_module tracesys ---[ end trace d24a70b0d3ddf733 ]--- EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0 EDAC sbridge: Driver loaded. What happens is that bus_register needs a statically allocated lock_key because the last is handed in to lockdep. However, struct mem_ctl_info embeds struct bus_type (the whole struct, not a pointer to it) and the whole thing gets dynamically allocated. Fix this by using a statically allocated struct bus_type for the MC bus. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: stable@kernel.org # v3.10 Signed-off-by: Tony Luck <tony.luck@intel.com>
|
#
9713faec |
|
11-Mar-2013 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
EDAC: Merge mci.mem_is_per_rank with mci.csbased Both mci.mem_is_per_rank and mci.csbased denote the same thing: the memory controller is csrows based. Merge both fields into one. There's no need for the driver to actually fill it, as the core detects it by checking if one of the layers has the csrows type as part of the memory hierarchy: if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT) per_rank = true; Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
1eef1282 |
|
11-Mar-2013 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
amd64_edac: Correct DIMM sizes We were filling the csrow size with a wrong value. 16a528ee3975 ("EDAC: Fix csrow size reported in sysfs") tried to address the issue. It fixed the report with the old API but not with the new one. Correct it for the new API too. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> [ make it a per-csrow accounting regardless of ->channel_count ] Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
8dd93d45 |
|
19-Feb-2013 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: add support for error type "Info" The CPER spec defines a forth type of error: informational logs. Add support for it at the edac API and at the trace event interface. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
c7ef7645 |
|
21-Feb-2013 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: reduce stack pressure by using a pre-allocated buffer The number of variables at the stack is too big. Reduces the stack usage by using a pre-allocated error buffer. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
c2c93dbc |
|
19-Feb-2013 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: remove proc_name from mci structure proc_name isn't used anywhere. Remove it. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
c66b5a79 |
|
15-Feb-2013 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: add a new memory layer type There are some cases where the memory controller layout is completely hidden. This is the case of firmware-driven error code, like the one provided by GHES. Add a new layer to be used on such memory error report mechanisms. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
80f5ab09 |
|
18-Aug-2012 |
Shaun Ruffell <sruffell@digium.com> |
edac: edac_mc no longer deals with kobjects directly There are no more embedded kobjects in struct mem_ctl_info. Remove a header and a comment that does not reflect the code anymore. Signed-off-by: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
16a528ee |
|
13-Sep-2012 |
Borislav Petkov <borislav.petkov@amd.com> |
EDAC: Fix csrow size reported in sysfs On csrow-based memory controllers, we combine the csrow size from both channels and there's no need to do that again in csrow_size_show which leads to double the size of a csrow. Fix it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
#
11652769 |
|
13-Sep-2012 |
Borislav Petkov <borislav.petkov@amd.com> |
EDAC: Add memory controller flags The first flag is ->csbased and will be used in common EDAC code later. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
#
38ced28b |
|
12-Jun-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: allow specifying the error count with fake_inject In order to test if the error counters are properly incremented, add a way to specify how many errors were generated by a trace. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
de3910eb |
|
24-Apr-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: change the mem allocation scheme to make Documentation/kobject.txt happy Kernel kobjects have rigid rules: each container object should be dynamically allocated, and can't be allocated into a single kmalloc. EDAC never obeyed this rule: it has a single malloc function that allocates all needed data into a single kzalloc. As this is not accepted anymore, change the allocation schema of the EDAC *_info structs to enforce this Kernel standard. Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Greg K H <gregkh@linuxfoundation.org> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
452a6bf9 |
|
26-Mar-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: Add debufs nodes to allow doing fake error inject Sometimes, it is useful to have a mechanism that generates fake errors, in order to test the EDAC core code, and the userspace tools. Provide such mechanism by adding a few debugfs nodes. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
d90c0089 |
|
21-Mar-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: Get rid of the old kobj's from the edac mc code Now that al users for the old kobj raw access are gone, we can get rid of the legacy kobj-based structures and data. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
7a623c03 |
|
16-Apr-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: rewrite the sysfs code to use struct device The EDAC subsystem uses the old struct sysdev approach, creating all nodes using the raw sysfs API. This is bad, as the API is deprecated. As we'll be changing the EDAC API, let's first port the existing code to struct device. There's one drawback on this patch: driver-specific sysfs nodes, used by mpc85xx_edac, amd64_edac and i7core_edac won't be created anymore. While it would be possible to also port the device-specific code, that would mix kobj with struct device, with is not recommended. Also, it is easier and nicer to move the code to the drivers, instead, as the core can get rid of some complex logic that just emulates what the device_add() and device_create_file() already does. The next patches will convert the driver-specific code to use the device-specific calls. Then, the remaining bits of the old sysfs API will be removed. NOTE: a per-MC bus is required, otherwise devices with more than one memory controller will hit a bug like the one below: [ 819.094946] EDAC DEBUG: find_mci_by_dev: find_mci_by_dev() [ 819.094948] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device() idx=1 [ 819.094952] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device(): creating device mc1 [ 819.094967] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device creating dimm0, located at channel 0 slot 0 [ 819.094984] ------------[ cut here ]------------ [ 819.100142] WARNING: at fs/sysfs/dir.c:481 sysfs_add_one+0xc1/0xf0() [ 819.107282] Hardware name: S2600CP [ 819.111078] sysfs: cannot create duplicate filename '/bus/edac/devices/dimm0' [ 819.119062] Modules linked in: sb_edac(+) edac_core ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun kvm microcode pcspkr iTCO_wdt iTCO_vendor_support igb i2c_i801 i2c_core sg ioatdma dca sr_mod cdrom sd_mod crc_t10dif ahci libahci isci libsas libata scsi_transport_sas scsi_mod wmi dm_mod [last unloaded: scsi_wait_scan] [ 819.175748] Pid: 10902, comm: modprobe Not tainted 3.3.0-0.11.el7.v12.2.x86_64 #1 [ 819.184113] Call Trace: [ 819.186868] [<ffffffff8105adaf>] warn_slowpath_common+0x7f/0xc0 [ 819.193573] [<ffffffff8105aea6>] warn_slowpath_fmt+0x46/0x50 [ 819.200000] [<ffffffff811f53d1>] sysfs_add_one+0xc1/0xf0 [ 819.206025] [<ffffffff811f5cf5>] sysfs_do_create_link+0x135/0x220 [ 819.212944] [<ffffffff811f7023>] ? sysfs_create_group+0x13/0x20 [ 819.219656] [<ffffffff811f5df3>] sysfs_create_link+0x13/0x20 [ 819.226109] [<ffffffff813b04f6>] bus_add_device+0xe6/0x1b0 [ 819.232350] [<ffffffff813ae7cb>] device_add+0x2db/0x460 [ 819.238300] [<ffffffffa0325634>] edac_create_dimm_object+0x84/0xf0 [edac_core] [ 819.246460] [<ffffffffa0325e18>] edac_create_sysfs_mci_device+0xe8/0x290 [edac_core] [ 819.255215] [<ffffffffa0322e2a>] edac_mc_add_mc+0x5a/0x2c0 [edac_core] [ 819.262611] [<ffffffffa03412df>] sbridge_register_mci+0x1bc/0x279 [sb_edac] [ 819.270493] [<ffffffffa03417a3>] sbridge_probe+0xef/0x175 [sb_edac] [ 819.277630] [<ffffffff813ba4e8>] ? pm_runtime_enable+0x58/0x90 [ 819.284268] [<ffffffff812f430c>] local_pci_probe+0x5c/0xd0 [ 819.290508] [<ffffffff812f5ba1>] __pci_device_probe+0xf1/0x100 [ 819.297117] [<ffffffff812f5bea>] pci_device_probe+0x3a/0x60 [ 819.303457] [<ffffffff813b1003>] really_probe+0x73/0x270 [ 819.309496] [<ffffffff813b138e>] driver_probe_device+0x4e/0xb0 [ 819.316104] [<ffffffff813b149b>] __driver_attach+0xab/0xb0 [ 819.322337] [<ffffffff813b13f0>] ? driver_probe_device+0xb0/0xb0 [ 819.329151] [<ffffffff813af5d6>] bus_for_each_dev+0x56/0x90 [ 819.335489] [<ffffffff813b0d7e>] driver_attach+0x1e/0x20 [ 819.341534] [<ffffffff813b0980>] bus_add_driver+0x1b0/0x2a0 [ 819.347884] [<ffffffffa0347000>] ? 0xffffffffa0346fff [ 819.353641] [<ffffffff813b19f6>] driver_register+0x76/0x140 [ 819.359980] [<ffffffff8159f18b>] ? printk+0x51/0x53 [ 819.365524] [<ffffffffa0347000>] ? 0xffffffffa0346fff [ 819.371291] [<ffffffff812f5896>] __pci_register_driver+0x56/0xd0 [ 819.378096] [<ffffffffa0347054>] sbridge_init+0x54/0x1000 [sb_edac] [ 819.385231] [<ffffffff8100203f>] do_one_initcall+0x3f/0x170 [ 819.391577] [<ffffffff810bcd2e>] sys_init_module+0xbe/0x230 [ 819.397926] [<ffffffff815bb529>] system_call_fastpath+0x16/0x1b [ 819.404633] ---[ end trace 1654fdd39556689f ]--- This happens because the bus is not being properly initialized. Instead of putting the memory sub-devices inside the memory controller, it is putting everything under the same directory: $ tree /sys/bus/edac/ /sys/bus/edac/ ├── devices │ ├── all_channel_counts -> ../../../devices/system/edac/mc/mc0/all_channel_counts │ ├── csrow0 -> ../../../devices/system/edac/mc/mc0/csrow0 │ ├── csrow1 -> ../../../devices/system/edac/mc/mc0/csrow1 │ ├── csrow2 -> ../../../devices/system/edac/mc/mc0/csrow2 │ ├── dimm0 -> ../../../devices/system/edac/mc/mc0/dimm0 │ ├── dimm1 -> ../../../devices/system/edac/mc/mc0/dimm1 │ ├── dimm3 -> ../../../devices/system/edac/mc/mc0/dimm3 │ ├── dimm6 -> ../../../devices/system/edac/mc/mc0/dimm6 │ ├── inject_addrmatch -> ../../../devices/system/edac/mc/mc0/inject_addrmatch │ ├── mc -> ../../../devices/system/edac/mc │ └── mc0 -> ../../../devices/system/edac/mc/mc0 ├── drivers ├── drivers_autoprobe ├── drivers_probe └── uevent On a multi-memory controller system, the names "csrow%d" and "dimm%d" should be under "mc%d", and not at the main hierarchy level. So, we need to create a per-MC bus, in order to have its own namespace. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Greg K H <gregkh@linuxfoundation.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
b0610bb8 |
|
21-Mar-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: use Documentation-nano format for some data structs No functional changes. Just comment improvements. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
fd687502 |
|
16-Mar-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: Rename the parent dev to pdev As EDAC doesn't use struct device itself, it created a parent dev pointer called as "pdev". Now that we'll be converting it to use struct device, instead of struct devsys, this needs to be fixed. No functional changes. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
5926ff50 |
|
09-Feb-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: Initialize the dimm label with the known information While userspace doesn't fill the dimm labels, add there the dimm location, as described by the used memory model. This could eventually match what is described at the dmidecode, making easier for people to identify the memory. For example, on an Intel motherboard where the DMI table is reliable, the first memory stick is described as: Memory Device Array Handle: 0x0029 Error Information Handle: Not Provided Total Width: 64 bits Data Width: 64 bits Size: 2048 MB Form Factor: DIMM Set: 1 Locator: A1_DIMM0 Bank Locator: A1_Node0_Channel0_Dimm0 Type: <OUT OF SPEC> Type Detail: Synchronous Speed: 800 MHz Manufacturer: A1_Manufacturer0 Serial Number: A1_SerNum0 Asset Tag: A1_AssetTagNum0 Part Number: A1_PartNum0 The memory named as "A1_DIMM0" is physically located at the first memory controller (node 0), at channel 0, dimm slot 0. After this patch, the memory label will be filled with: /sys/devices/system/edac/mc/csrow0/ch0_dimm_label:mc#0channel#0slot#0 And (after the new EDAC API patches) as: /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:mc#0channel#0slot#0 So, even if the memory label is not initialized on userspace, an useful information with the error location is filled there, expecially since several systems/motherboards are provided with enough info to map from channel/slot (or branch/channel/slot) into the DIMM label. So, letting the EDAC core fill it by default is a good thing. It should noticed that, as the label filling happens at the edac_mc_alloc(), drivers can override it to better describe the memories (and some actually do it). Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
4275be63 |
|
18-Apr-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: Change internal representation to work with layers Change the EDAC internal representation to work with non-csrow based memory controllers. There are lots of those memory controllers nowadays, and more are coming. So, the EDAC internal representation needs to be changed, in order to work with those memory controllers, while preserving backward compatibility with the old ones. The edac core was written with the idea that memory controllers are able to directly access csrows. This is not true for FB-DIMM and RAMBUS memory controllers. Also, some recent advanced memory controllers don't present a per-csrows view. Instead, they view memories as DIMMs, instead of ranks. So, change the allocation and error report routines to allow them to work with all types of architectures. This will allow the removal of several hacks with FB-DIMM and RAMBUS memory controllers. Also, several tests were done on different platforms using different x86 drivers. TODO: a multi-rank DIMMs are currently represented by multiple DIMM entries in struct dimm_info. That means that changing a label for one rank won't change the same label for the other ranks at the same DIMM. This bug is present since the beginning of the EDAC, so it is not a big deal. However, on several drivers, it is possible to fix this issue, but it should be a per-driver fix, as the csrow => DIMM arrangement may not be equal for all. So, don't try to fix it here yet. I tried to make this patch as short as possible, preceding it with several other patches that simplified the logic here. Yet, as the internal API changes, all drivers need changes. The changes are generally bigger in the drivers for FB-DIMMs. Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
982216a4 |
|
16-Apr-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac.h: Add generic layers for describing a memory location The edac core were written with the idea that memory controllers are able to directly access csrows, and that the channels are used inside a csrows select. This is not true for FB-DIMM and RAMBUS memory controllers. Also, some recent advanced memory controllers don't present a per-csrows view. Instead, they view memories as DIMMs, instead of ranks, accessed via csrow/channel. So, changes are needed in order to allow the EDAC core to work with all types of architectures. In preparation for handling non-csrows based memory controllers, add some memory structs and a macro: enum hw_event_mc_err_type: describes the type of error (corrected, uncorrected, fatal) To be used by the new edac_mc_handle_error function; enum edac_mc_layer: describes the type of a given memory architecture layer (branch, channel, slot, csrow). struct edac_mc_layer: describes the properties of a memory layer (type, size, and if the layer will be used on a virtual csrow. EDAC_DIMM_PTR() - as the number of layers can vary from 1 to 3, this macro converts from an address with up to 3 layers into a linear address. Reviewed-by: Borislav Petkov <bp@amd64.org> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
a895bf8b |
|
28-Jan-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: move nr_pages to dimm struct The number of pages is a dimm property. Move it to the dimm struct. After this change, it is possible to add sysfs nodes for the DIMM's that will properly represent the DIMM stick properties, including its size. A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when the memory controller represents the memory via chip select rows. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
084a4fcc |
|
27-Jan-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: move dimm properties to struct dimm_info On systems based on chip select rows, all channels need to use memories with the same properties, otherwise the memories on channels A and B won't be recognized. However, such assumption is not true for all types of memory controllers. Controllers for FB-DIMM's don't have such requirements. Also, modern Intel controllers seem to be capable of handling such differences. So, we need to get rid of storing the DIMM information into a per-csrow data, storing it, instead at the right place. The first step is to move grain, mtype, dtype and edac_mode to the per-dimm struct. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Reviewed-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: James Bottomley <James.Bottomley@parallels.com> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Mike Williams <mike@mikebwilliams.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
a7d7d2e1 |
|
27-Jan-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: Create a dimm struct and move the labels into it The way a DIMM is currently represented implies that they're linked into a per-csrow struct. However, some drivers don't see csrows, as they're ridden behind some chip like the AMB's on FBDIMM's, for example. This forced drivers to fake^Wvirtualize a csrow struct, and to create a mess under csrow/channel original's concept. Move the DIMM labels into a per-DIMM struct, and add there the real location of the socket, in terms of csrow/channel. Latter patches will modify the location to properly represent the memory architecture. All other drivers will use a per-csrow type of location. Some of those drivers will require a latter conversion, as they also fake the csrows internally. TODO: While this patch doesn't change the existing behavior, on csrows-based memory controllers, a csrow/channel pair points to a memory rank. There's a known bug at the EDAC core that allows having different labels for the same DIMM, if it has more than one rank. A latter patch is need to merge the several ranks for a DIMM into the same dimm_info struct, in order to avoid having different labels for the same DIMM. The edac_mc_alloc() will now contain a per-dimm initialization loop that will be changed by latter patches in order to match other types of memory architectures. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Reviewed-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
a4b4be3f |
|
27-Jan-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: rename channel_info to rank_info What it is pointed by a csrow/channel vector is a rank information, and not a channel information. On a traditional architecture, the memory controller directly access the memory ranks, via chip select rows. Different ranks at the same DIMM is selected via different chip select rows. So, typically, one csrow/channel pair means one different DIMM. On FB-DIMMs, there's a microcontroller chip at the DIMM, called Advanced Memory Buffer (AMB) that serves as the interface between the memory controller and the memory chips. The AMB selection is via the DIMM slot, and not via a csrow. It is up to the AMB to talk with the csrows of the DRAM chips. So, the FB-DIMM memory controllers see the DIMM slot, and not the DIMM rank. RAMBUS is similar. Newer memory controllers, like the ones found on Intel Sandy Bridge and Nehalem, even working with normal DDR3 DIMM's, don't use the usual channel A/channel B interleaving schema to provide 128 bits data access. Instead, they have more channels (3 or 4 channels), and they can use several interleaving schemas. Such memory controllers see the DIMMs directly on their registers, instead of the ranks, which is better for the driver, as its main usageis to point to a broken DIMM stick (the Field Repleceable Unit), and not to point to a broken DRAM chip. The drivers that support such such newer memory architecture models currently need to fake information and to abuse on EDAC structures, as the subsystem was conceived with the idea that the csrow would always be visible by the CPU. To make things a little worse, those drivers don't currently fake csrows/channels on a consistent way, as the concepts there don't apply to the memory controllers they're talking with. So, each driver author interpreted the concepts using a different logic. In order to fix it, let's rename the data structure that points into a DIMM rank to "rank_info", in order to be clearer about what's stored there. Latter patches will provide a better way to represent the memory hierarchy for the other types of memory controller. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
01a6e28b |
|
03-Feb-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: Improve the comments to better describe the memory concepts The Computer memory terminology has changed with time since EDAC was originally written: new concepts were introduced, and some things have different meanings, depending on the memory architecture. Improve the definition of all related terms. Also, describe each memory type in a more detailed fashion. No functional changes. Just comments were touched. Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
313162d0 |
|
30-Jan-2012 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
device.h: audit and cleanup users in main include dir The <linux/device.h> header includes a lot of stuff, and it in turn gets a lot of use just for the basic "struct device" which appears so often. Clean up the users as follows: 1) For those headers only needing "struct device" as a pointer in fcn args, replace the include with exactly that. 2) For headers not really using anything from device.h, simply delete the include altogether. 3) For headers relying on getting device.h implicitly before being included themselves, now explicitly include device.h 4) For files in which doing #1 or #2 uncovers an implicit dependency on some other header, fix by explicitly adding the required header(s). Any C files that were implicitly relying on device.h to be present have already been dealt with in advance. Total removals from #1 and #2: 51. Total additions coming from #3: 9. Total other implicit dependencies from #4: 7. As of 3.3-rc1, there were 110, so a net removal of 42 gives about a 38% reduction in device.h presence in include/* Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
fe5ff8b8 |
|
14-Dec-2011 |
Kay Sievers <kay.sievers@vrfy.org> |
edac: convert sysdev_class to a regular subsystem After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Cc: Doug Thompson <dougthompson@xmission.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
ddeb3547 |
|
04-Mar-2011 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: Move edac main structs to include/linux/edac.h As we'll need to use those structs for trace functions, they should be on a more public place. So, move struct mem_ctl_info & friends to edac.h. No functional changes on this patch. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com>
|
#
60063497 |
|
26-Jul-2011 |
Arun Sharma <asharma@fb.com> |
atomic: use <linux/atomic.h> This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
30e1f7a8 |
|
02-Sep-2010 |
Borislav Petkov <borislav.petkov@amd.com> |
EDAC: Export edac sysfs class to users. Move toplevel sysfs class to the stub and make it available to non-modularized code too. Add proper refcounting of its users and move the registration functionality into the reference counting routines. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
#
c3c52bce |
|
29-Apr-2008 |
Hitoshi Mitake <h.mitake@gmail.com> |
edac: fix module initialization on several modules 2nd time I implemented opstate_init() as a inline function in linux/edac.h. added calling opstate_init() to: i82443bxgx_edac.c i82860_edac.c i82875p_edac.c i82975x_edac.c I wrote a fixed patch of edac-fix-module-initialization-on-several-modules.patch, and tested building 2.6.25-rc7 with applying this. It was succeed. I think the patch is now correct. Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
66ee2f94 |
|
19-Jul-2007 |
Dave Jiang <djiang@mvista.com> |
drivers/edac: mod assert_error check Change error check and clear variable from an atomic to an int Signed-off-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Douglas Thompson <dougthompson@xmission.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c0d12172 |
|
19-Jul-2007 |
Dave Jiang <djiang@mvista.com> |
drivers/edac: add new nmi rescan Provides a way for NMI reported errors on x86 to notify the EDAC subsystem pending ECC errors by writing to a software state variable. Here's the reworked patch. I added an EDAC stub to the kernel so we can have variables that are in the kernel even if EDAC is a module. I also implemented the idea of using the chip driver to select error detection mode via module parameter and eliminate the kernel compile option. Please review/test. Thx! Also, I only made changes to some of the chipset drivers since I am unfamiliar with the other ones. We can add similar changes as we go. Signed-off-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|