#
9a5f580c |
|
02-Nov-2023 |
Muralidhara M K <muralidhara.mk@amd.com> |
EDAC/mc: Add support for HBM3 memory type AMD MI300A models use HBM3 (High Bandwidth Memory Gen 3) memory. HBM is a high-speed computer memory interface for 3D-stacked synchronous dynamic random-access memory (SDRAM). Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231102114225.2006878-4-muralimk@amd.com
|
#
93df1947 |
|
22-Aug-2022 |
Serge Semin <Sergey.Semin@baikalelectronics.ru> |
EDAC/mc: Drop duplicated dimm->nr_pages debug printout The duplicated edac_dbg()-based dimm->nr_pages print was introduced in 6e84d359b2be ("edac_mc: Cleanup per-dimm_info debug messages"). The duplicated line can be found even in the commit message text: [ 1011.380101] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000 [ 1011.380103] EDAC DEBUG: edac_mc_dump_dimm: dimm->grain = 8 [ 1011.380104] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000 Drop the second edac_dbg() call. [ bp: Massage commit message. ] Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220822190730.27277-14-Sergey.Semin@baikalelectronics.ru
|
#
13088b65 |
|
12-Apr-2022 |
Borislav Petkov <bp@suse.de> |
EDAC: Use kcalloc() It is syntactic sugar anyway: # drivers/edac/edac_mc.o: text data bss dec hex filename 13378 324 8 13710 358e edac_mc.o.before 13378 324 8 13710 358e edac_mc.o.after md5: 70a53ee3ac7f867730e35c2be9110748 edac_mc.o.before.asm 70a53ee3ac7f867730e35c2be9110748 edac_mc.o.after.asm # drivers/edac/edac_device.o: text data bss dec hex filename 5704 120 4 5828 16c4 edac_device.o.before 5704 120 4 5828 16c4 edac_device.o.after md5: 880563c859da6eb9aca85ec431fdbaeb edac_device.o.before.asm 880563c859da6eb9aca85ec431fdbaeb edac_device.o.after.asm No functional changes. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220412211957.28899-1-bp@alien8.de
|
#
713c4ff8 |
|
08-Mar-2022 |
Borislav Petkov <bp@suse.de> |
EDAC/mc: Get rid of edac_align_ptr() Get rid of it now that it is unused. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220310095254.1510-6-bp@alien8.de
|
#
0bbb265f |
|
20-Feb-2022 |
Borislav Petkov <bp@suse.de> |
EDAC/mc: Get rid of silly one-shot struct allocation in edac_mc_alloc() This has probably meant something at some point but there's no need for it anymore - the struct mem_ctl_info allocation can happen with normal, boring k*alloc() calls like everyone else does it. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220310095254.1510-2-bp@alien8.de
|
#
b0596da1 |
|
13-Jan-2022 |
Eliav Farber <farbere@amazon.com> |
EDAC/mc: Remove unnecessary cast to char * in edac_align_ptr() Remove the forgotten (char *) casts as that function returns void *. [ bp: Rewrite commit message. ] Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220113100622.12783-3-farbere@amazon.com
|
#
f8efca92 |
|
13-Jan-2022 |
Eliav Farber <farbere@amazon.com> |
EDAC: Fix calculation of returned address and next offset in edac_align_ptr() Do alignment logic properly and use the "ptr" local variable for calculating the remainder of the alignment. This became an issue because struct edac_mc_layer has a size that is not zero modulo eight, and the next offset that was prepared for the private data was unaligned, causing an alignment exception. The patch in Fixes: which broke this actually wanted to "what we actually care about is the alignment of the actual pointer that's about to be returned." But it didn't check that alignment. Use the correct variable "ptr" for that. [ bp: Massage commit message. ] Fixes: 8447c4d15e35 ("edac: Do alignment logic properly in edac_align_ptr()") Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220113100622.12783-2-farbere@amazon.com
|
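The alignment problem described in the entry above can be illustrated with a small userspace sketch. It is not the kernel's edac_align_ptr(); the helper name `sketch_reserve` and its parameters are hypothetical. The point it shows is that the remainder must be computed from the offset that is about to be returned, so that the following item also starts aligned.

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: reserve n_elems items of
 * `size` bytes at the next offset that is a multiple of `align`.
 * Returns the aligned offset and advances *next past the reservation.
 */
static size_t sketch_reserve(size_t *next, size_t size, size_t n_elems,
			     size_t align)
{
	size_t off = *next;
	size_t r = off % align;	/* remainder of the offset being returned */

	if (r)
		off += align - r;	/* round up to the next boundary */

	*next = off + size * n_elems;
	return off;
}
```

With a 12-byte item (an illustrative size that is not a multiple of eight) reserved first, a following 8-byte-aligned reservation lands at offset 16 rather than 12.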
#
f9571124 |
|
08-Dec-2021 |
Yazen Ghannam <yazen.ghannam@amd.com> |
EDAC: Add RDDR5 and LRDDR5 memory types Include Registered-DDR5 and Load-Reduced DDR5 in the list of memory types. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211208174356.1997855-2-yazen.ghannam@amd.com
|
#
fca61165 |
|
03-Sep-2021 |
Len Baker <len.baker@gmx.com> |
EDAC/mc: Replace strcpy(), sprintf() and snprintf() with strscpy() or scnprintf() strcpy() performs no bounds checking on the destination buffer. This could result in linear overflows beyond the end of the buffer, leading to all kinds of misbehavior. The safe replacement is strscpy(). [1][2] However, to simplify and clarify the code, to concatenate labels use the scnprintf() function. This way it is not necessary to check the return value of strscpy() (-E2BIG if the parameter count is 0 or the src was truncated) since scnprintf() always returns the number of chars written into the buffer. This function always returns a nul-terminated string even if it needs to be truncated. While at it, fix all other broken string generation code that wrongly interprets snprintf()'s return code or just uses sprintf(), implement that using scnprintf() here too. Drop breaks in loops around scnprintf() as it is safe now to loop. Moreover, the check is not needed: for the case when the buffer is exhausted, len never gets zero because scnprintf() takes the full buffer length as input parameter, but excludes the trailing '\0' in its return code and thus, 1 is the minimum len. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [2] https://github.com/KSPP/linux/issues/88 [ rric: Replace snprintf() with scnprintf(), rework sprintf() user, drop breaks in loops around scnprintf(), introduce 'end' pointer to reduce pointer arithmetic, use prefix pattern for e->location, adjust subject and description ] Co-developed-by: Joe Perches <joe@perches.com> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Len Baker <len.baker@gmx.com> Signed-off-by: Robert Richter <rrichter@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210903150539.7282-1-len.baker@gmx.com
|
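The reasoning in the entry above (scnprintf() returns the number of characters actually written, so a loop can keep appending without checking for overflow) can be sketched in userspace. scnprintf() is a kernel function; `sketch_scnprintf` below is a hypothetical stand-in built on the standard vsnprintf(), and `join_labels` with its " or " separator is an illustrative example, not the kernel code.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for the kernel's scnprintf(): returns
 * the number of characters actually stored in buf, excluding the
 * trailing '\0', so the result is at most size - 1.
 */
static int sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;
	if ((size_t)n >= size)
		return (int)(size - 1);	/* output was truncated */
	return n;
}

/* Join labels into buf; truncates silently, never writes past the end. */
static size_t join_labels(char *buf, size_t size,
			  const char *const *labels, size_t n)
{
	char *p = buf;
	char *end = buf + size;
	size_t i;

	if (size == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < n; i++)
		p += sketch_scnprintf(p, (size_t)(end - p), "%s%s",
				      i ? " or " : "", labels[i]);

	return (size_t)(p - buf);
}
```

Because the return value never exceeds size - 1, the remaining length `end - p` passed on each iteration is always at least 1, which is why no break is needed inside the loop.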
#
e1ca90b7 |
|
30-Jun-2021 |
Naveen Krishna Chatradhi <nchatrad@amd.com> |
EDAC/mc: Add new HBM2 memory type Add a new entry to 'enum mem_type' and a new string to 'edac_mem_types[]' for HBM2 (High Bandwidth Memory Gen 2) new memory type. Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Muralidhara M K <muralimk@amd.com> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210630152828.162659-4-nchatrad@amd.com
|
#
bc1c99a5 |
|
17-Nov-2020 |
Qiuxu Zhuo <qiuxu.zhuo@intel.com> |
EDAC: Add DDR5 new memory type Add a new entry to 'enum mem_type' and a new string to 'edac_mem_types[]' for DDR5 new memory type. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
|
#
3b203693 |
|
05-Nov-2020 |
Qiuxu Zhuo <qiuxu.zhuo@intel.com> |
EDAC: Add three new memory types There are {Low-Power DDR3/4, WIO2} types of memory. Add new entries to 'enum mem_type' and new strings to 'edac_mem_types[]' for the new types. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
|
#
e9ff6636 |
|
10-Jun-2020 |
Zhenzhong Duan <zhenzhong.duan@gmail.com> |
EDAC/mc: Call edac_inc_ue_error() before panic By calling edac_inc_ue_error() before panic, we get a correct UE error count for core dump analysis. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20200610065846.3626-2-zhenzhong.duan@gmail.com
|
#
7fc0b9b9 |
|
14-Feb-2020 |
Tony Luck <tony.luck@intel.com> |
EDAC: Drop the EDAC report status checks When acpi_extlog was added, we were worried that the same error would be reported more than once by different subsystems. But in the ensuing years I've seen complaints that people could not find an error log (because this mechanism suppressed the log they were looking for). Rip it all out. People are smart enough to notice the same address from different reporting mechanisms. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200214222720.13168-8-tony.luck@intel.com
|
#
4aa92c86 |
|
16-Feb-2020 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Remove per layer counters Looking at how mci->{ue,ce}_per_layer[EDAC_MAX_LAYERS] is used, it turns out that only the leaves in the memory hierarchy are consumed (in sysfs), but not the intermediate layers, e.g.: count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][dimm->idx]; These unused counters only add complexity, remove them. The error counter values are directly stored in struct dimm_info now. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-11-rrichter@marvell.com
|
#
1853ee72 |
|
23-Jan-2020 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Remove detail[] string and cleanup error string generation The error descriptor is passed to the error reporting functions, so the error details can be directly generated there. Move string generation from edac_raw_mc_handle_error() to edac_ce_error() and edac_ue_error(). The intermediate detail[] string can be removed then. Also, cleanup the string generation by switching to a single variant only using the ternary operator. [ bp: put ternary operators on a separate line for better readability and use the short-form "inline if" in edac_mc_handle_error(). ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-10-rrichter@marvell.com
|
#
6ab76179 |
|
23-Jan-2020 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Pass the error descriptor to error reporting functions Most arguments of error reporting functions are already stored in the struct edac_raw_error_desc error descriptor. Pass the error descriptor to the functions and reduce the functions' argument list. [ bp: Sort function args in reverse fir tree order. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-9-rrichter@marvell.com
|
#
67792cf9 |
|
23-Jan-2020 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Remove enable_per_layer_report function argument Many functions carry the enable_per_layer_report argument. This is a bool value indicating the error information contains some location data where the error occurred. This can easily be determined by checking the pos[] array for values. Negative values indicate there is no location available. So if the top layer is negative, the error location is unknown. Just check if the top layer is negative and remove enable_per_layer_report as function argument and also from struct edac_raw_error_desc. [ bp: Reflow comments to 80 columns, while at it. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-8-rrichter@marvell.com
|
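The check described in the entry above can be sketched as follows. The array size and the helper name `sketch_location_known` are assumptions made for illustration; only the rule itself (a negative top-layer position means the location is unknown) comes from the commit message.

```c
#include <stdbool.h>

#define SKETCH_MAX_LAYERS 3	/* illustrative size, not the kernel constant */

/*
 * Hypothetical helper: the location is known only if the top layer
 * (index 0) holds a non-negative position.
 */
static bool sketch_location_known(const int pos[SKETCH_MAX_LAYERS])
{
	return pos[0] >= 0;
}
```

Deriving the flag from pos[] removes the need to carry a separate bool through every function in the call chain.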
#
65bb4d1a |
|
23-Jan-2020 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Report "unknown memory" on too many DIMM labels found There is a limitation to report only EDAC_MAX_LABELS in e->label of the error descriptor. This is to prevent a potential string overflow. The current implementation falls back to "any memory" in this case and also stops all further processing to find a unique row and channel of the possible error location. Reporting "any memory" is wrong as the memory controller reported an error location for one of the layers. Instead, report "unknown memory" and also do not break early in the loop to further check row and channel for uniqueness. [ bp: Massage commit message. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-7-rrichter@marvell.com
|
#
6334dc4e |
|
14-Feb-2020 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Carve out error increment into a separate function Carve out the error_count increment into a separate function edac_inc_csrow(). This better separates code and reduces the indentation level. Implementation note: The function edac_inc_csrow() counts the same as before, ->ce_count is only incremented if row >= 0. This is esp. true for the case of (!e->enable_per_layer_report). Here, a DIMM was not found, variable row still has a value of -1 and ->ce_count is not incremented. [ bp: Massage commit message. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200214141757.8976-1-rrichter@marvell.com
|
#
91b327f6 |
|
23-Jan-2020 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Determine mci pointer from the error descriptor Each struct mci has its own error descriptor. Create a function error_desc_to_mci() to determine the corresponding mci from an error descriptor. This removes @mci from the parameter list of edac_raw_mc_handle_error() as the mci pointer does not need to be passed any longer. [ bp: Massage commit message. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-5-rrichter@marvell.com
|
#
672ef0e5 |
|
23-Jan-2020 |
Robert Richter <rrichter@marvell.com> |
EDAC: Store error type in struct edac_raw_error_desc Store the error type in struct edac_raw_error_desc. This makes the type parameter of edac_raw_mc_handle_error() obsolete. [ kernel-doc typo ] Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-4-rrichter@marvell.com
|
#
1f27c790 |
|
23-Jan-2020 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Reorder functions edac_mc_alloc*() Reorder the new created functions edac_mc_alloc_csrows() and edac_mc_alloc_dimms() and move them before edac_mc_alloc(). No further code changes. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-3-rrichter@marvell.com
|
#
aad28c6f |
|
23-Jan-2020 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Split edac_mc_alloc() into smaller functions edac_mc_alloc() is huge. Factor out code by moving it to the two new functions edac_mc_alloc_csrows() and edac_mc_alloc_dimms(). Do not move code yet for better review. [ bp: sort local args in reversed fir tree order. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-2-rrichter@marvell.com
|
#
bea1bfd5 |
|
12-Feb-2020 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Change mci device removal to use put_device() There are dimm and csrow devices linked to the mci device esp. to show up in sysfs. It must be granted that children devices are removed before its mci parent. Thus, the release functions must be called in the correct order and may not miss any child before releasing its parent. In the current implementation this is only granted by the correct order of release functions. A much better approach is to use put_device() that releases the device only after all users are gone. It is the recommended way to release a device and free its memory. The function uses the device's refcount and only frees it if there are no users of it anymore such as children. So implement a mci_release() function to remove mci devices, use put_device() to free them and early initialize the mci device right after its struct has been allocated. Change the release function so that it can be universally used no matter if the device is registered or not. Since subsequent dimm and csrow sysfs links are implemented as children devices, their refcounts will keep the parent mci device from being removed as long as sysfs entries exist and until all users have been unregistered in edac_remove_sysfs_mci_device(). Remove edac_unregister_sysfs() and merge mci sysfs removal into edac_remove_sysfs_mci_device(). There is only a single instance now that removes the sysfs entries. The function can now be used in the error paths for cleanup. Also, create device release functions for all involved devices (dev->release), remove device_type release functions (dev_type-> release) and also use dev->init_name instead of dev_set_name(). [ bp: Massage commit message and comments. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200212120340.4764-5-rrichter@marvell.com
|
#
216aa145 |
|
12-Feb-2020 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Fix use-after-free and memleaks during device removal A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and DEBUG_KMEMLEAK set, revealed several issues when removing an mci device: 1) Use-after-free: On 27.11.19 17:07:33, John Garry wrote: > [ 22.104498] BUG: KASAN: use-after-free in > edac_remove_sysfs_mci_device+0x148/0x180 The use-after-free is caused by the mci_for_each_dimm() macro called in edac_remove_sysfs_mci_device(). The iterator was introduced with c498afaf7df8 ("EDAC: Introduce an mci_for_each_dimm() iterator"). The iterator loop calls device_unregister(&dimm->dev), which removes the sysfs entry of the device, but also frees the dimm struct in dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(), the dimm struct is accessed again, after having been freed already. The fix is to free all the mci device's subsequent dimm and csrow objects at a later point, in _edac_mc_free(), when the mci device itself is being freed. This keeps the data structures intact and the mci device can be fully used until its removal. The change allows the safe usage of mci_for_each_dimm() to release dimm devices from sysfs. 2) Memory leaks: Following memory leaks have been detected: # grep edac /sys/kernel/debug/kmemleak | sort | uniq -c 1 [<000000003c0f58f9>] edac_mc_alloc+0x3bc/0x9d0 # mci->csrows 16 [<00000000bb932dc0>] edac_mc_alloc+0x49c/0x9d0 # csr->channels 16 [<00000000e2734dba>] edac_mc_alloc+0x518/0x9d0 # csr->channels[chn] 1 [<00000000eb040168>] edac_mc_alloc+0x5c8/0x9d0 # mci->dimms 34 [<00000000ef737c29>] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc() All leaks are from memory allocated by edac_mc_alloc(). Note: The test above shows that edac_mc_alloc() was called here from ghes_edac_register(), thus both functions show up in the stack trace but the module causing the leaks is edac_mc. The comments with the data structures involved were made manually by analyzing the objdump. 
The data structures listed above and created by edac_mc_alloc() are not properly removed during device removal, which is done in edac_mc_free(). There are two paths implemented to remove the device depending on device registration, _edac_mc_free() is called if the device is not registered and edac_unregister_sysfs() otherwise. The implementations differ. For the sysfs case, the mci device removal lacks the removal of subsequent data structures (csrows, channels, dimms). This causes the memory leaks (see mci_attr_release()). [ bp: Massage commit message. ] Fixes: c498afaf7df8 ("EDAC: Introduce an mci_for_each_dimm() iterator") Fixes: faa2ad09c01c ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.") Fixes: 7a623c039075 ("edac: rewrite the sysfs code to use struct device") Reported-by: John Garry <john.garry@huawei.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: John Garry <john.garry@huawei.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com
|
#
787d8999 |
|
06-Nov-2019 |
Robert Richter <rrichter@marvell.com> |
EDAC: Unify the mc_event tracepoint call The code in ghes_edac.c and edac_mc.c for grain_bits calculation and calling trace_mc_event() is now the same. Move it to a single location in edac_raw_mc_handle_error(). The only difference is the missing IS_ENABLED(CONFIG_RAS) switch, but this is needed for ghes too. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-13-rrichter@marvell.com
|
#
0d8292e0 |
|
06-Nov-2019 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Reduce indentation level in edac_mc_handle_error() Reduce the indentation level in edac_mc_handle_error() a bit. No functional changes. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-7-rrichter@marvell.com
|
#
47bec6b4 |
|
06-Nov-2019 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Remove needless zero string termination The e string to which this is pointing to has already been cleared earlier in the function so remove the needless zero string termination. [ bp: Correct the commit message. ] Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-6-rrichter@marvell.com
|
#
d260e8ff |
|
06-Nov-2019 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Do not BUG_ON() in edac_mc_alloc() No need to crash the system in case edac_mc_alloc() is called with invalid arguments, just warn and return. This would cause a checkpatch warning when touching the code later, so just fix it. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-5-rrichter@marvell.com
|
#
c498afaf |
|
06-Nov-2019 |
Robert Richter <rrichter@marvell.com> |
EDAC: Introduce an mci_for_each_dimm() iterator Introduce an mci_for_each_dimm() iterator. It returns a pointer to a struct dimm_info. This makes the declaration and use of an index obsolete and avoids access to internal data of struct mci (direct array access etc). [ bp: push the struct dimm_info *dimm; declaration into the CONFIG_EDAC_DEBUG block. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-4-rrichter@marvell.com
|
#
977b1ce7 |
|
06-Nov-2019 |
Robert Richter <rrichter@marvell.com> |
EDAC: Remove EDAC_DIMM_OFF() macro The EDAC_DIMM_OFF() macro takes 5 arguments to get the DIMM's index. Simplify this by storing the index in struct dimm_info to avoid its calculation and remove the EDAC_DIMM_OFF() macro. The index can be directly used then. Another advantage is that edac_mc_alloc() could be used even if the exact size of the layers is unknown. Only the number of DIMMs would be needed. Rename iterator variable to idx, while at it. The name is more handy, esp. when searching for it in the code. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-3-rrichter@marvell.com
|
#
d55c79ac |
|
01-Sep-2019 |
Robert Richter <rrichter@marvell.com> |
EDAC: Prefer 'unsigned int' to bare use of 'unsigned' Use of 'unsigned int' instead of bare use of 'unsigned'. Fix this for edac_mc*, ghes and the i5100 driver as reported by checkpatch.pl. While at it, struct member dev_ch_attribute->channel is always used as unsigned int. Change type to unsigned int to avoid type casts. [ bp: Massage. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190902123216.9809-2-rrichter@marvell.com
|
#
718d5851 |
|
24-Jun-2019 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Cleanup _edac_mc_free() code Remove needless and boilerplate variable declarations. No functional changes. [ bp: Add newlines for better readability. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190624150758.6695-10-rrichter@marvell.com
|
#
3724ace5 |
|
24-Jun-2019 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Fix grain_bits calculation

The grain in EDAC is defined as "minimum granularity for an error report, in bytes". The following calculation of the grain_bits in edac_mc is wrong:

	grain_bits = fls_long(e->grain) + 1;

Where grain_bits is defined as:

	grain = 1 << grain_bits

Example:

	grain = 8	# 64 bit (8 bytes)
	grain_bits = fls_long(8) + 1
	grain_bits = 4 + 1 = 5

	grain = 1 << grain_bits
	grain = 1 << 5 = 32

Replace it with the correct calculation:

	grain_bits = fls_long(e->grain - 1);

The example gives now:

	grain_bits = fls_long(8 - 1)
	grain_bits = fls_long(7)
	grain_bits = 3

	grain = 1 << 3 = 8

Also, check if the hardware reports a reasonable grain != 0 and fallback with a warning to 1 byte granularity otherwise.

[ bp: massage a bit. ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190624150758.6695-2-rrichter@marvell.com
|
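The arithmetic in the grain_bits entry above can be reproduced in userspace. fls_long() is a kernel helper; `sketch_fls_long` below is a portable stand-in written for this example, and the two wrapper names are hypothetical. The warning that the commit mentions for grain == 0 is left out of the sketch.

```c
/*
 * Userspace stand-in for the kernel's fls_long(): returns the 1-based
 * position of the most significant set bit, or 0 when x is 0.
 */
static unsigned int sketch_fls_long(unsigned long x)
{
	unsigned int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* The calculation the commit message describes as wrong. */
static unsigned int grain_bits_old(unsigned long grain)
{
	return sketch_fls_long(grain) + 1;
}

/* The corrected calculation, falling back to 1 byte when grain is 0. */
static unsigned int grain_bits_new(unsigned long grain)
{
	if (grain == 0)
		grain = 1;

	return sketch_fls_long(grain - 1);
}
```

For grain = 8 the old form yields 5 (1 << 5 = 32) and the new form yields 3 (1 << 3 = 8). For a grain that is not a power of two, the new form rounds up: grain = 9 gives 4, that is, a 16-byte granularity.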
#
29a0c843 |
|
14-May-2019 |
Robert Richter <rrichter@marvell.com> |
EDAC/mc: Fix edac_mc_find() in case no device is found The function should return NULL in case no device is found, but it always returns the last checked mc device from the list even if the index did not match. Fix that. I did some analysis why this did not raise any issues for about 3 years and the reason is that edac_mc_find() is mostly used to search for existing devices. Thus, the bug is not triggered. [ bp: Drop the if (mci->mc_idx > idx) test in favor of readability. ] Fixes: c73e8833bec5 ("EDAC, mc: Fix locking around mc_devices list") Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Link: https://lkml.kernel.org/r/20190514104838.15065-1-rrichter@marvell.com
|
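The lookup bug in the entry above (returning the last visited element instead of NULL) is easy to reintroduce whenever the loop cursor doubles as the return value. A minimal sketch of the intended behaviour, using a hypothetical singly linked `struct sketch_mci` rather than the kernel's list and locking:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a memory controller instance. */
struct sketch_mci {
	int mc_idx;
	struct sketch_mci *next;
};

/* Return the controller with the given index, or NULL if none matches. */
static struct sketch_mci *sketch_mc_find(struct sketch_mci *head, int idx)
{
	struct sketch_mci *mci;

	for (mci = head; mci; mci = mci->next) {
		if (mci->mc_idx == idx)
			return mci;
	}

	/* Falling out of the loop means no match: never return the cursor. */
	return NULL;
}
```

Returning from inside the loop on a match and returning an explicit NULL afterwards keeps the "not found" case separate from whatever the cursor last pointed at.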
#
861e6ed6 |
|
05-Nov-2018 |
Borislav Petkov <bp@suse.de> |
EDAC: Drop per-memory controller buses ... and use the single edac_subsys object returned from subsys_system_register(). The idea is to have a single bus and multiple devices on it. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> CC: Aristeu Rozanski Filho <arozansk@redhat.com> CC: Greg KH <gregkh@linuxfoundation.org> CC: Justin Ernst <justin.ernst@hpe.com> CC: linux-edac <linux-edac@vger.kernel.org> CC: Mauro Carvalho Chehab <mchehab@kernel.org> CC: Russ Anderson <rja@hpe.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20180926152752.GG5584@zn.tnic
|
#
b748f2de |
|
10-Aug-2018 |
Takashi Iwai <tiwai@suse.de> |
EDAC: Add missing MEM_LRDDR4 entry in edac_mem_types[] The edac_mem_types[] array misses a MEM_LRDDR4 entry, which leads to NULL pointer dereference when accessed via sysfs or such. Signed-off-by: Takashi Iwai <tiwai@suse.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20180810141426.8918-1-tiwai@suse.de Fixes: 1e8096bb2031 ("EDAC: Add LRDDR4 DRAM type") Signed-off-by: Borislav Petkov <bp@suse.de>
|
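The failure mode in the entry above comes from a string table indexed by an enum: any enumerator without a matching initializer leaves a NULL slot. The sketch below uses hypothetical enum and string values, not the kernel's. The guarded lookup is an extra defensive measure shown for illustration; the commit itself fixes the problem by adding the missing entry.

```c
#include <stddef.h>

/* Hypothetical, shortened enum; the real 'enum mem_type' is larger. */
enum sketch_mem_type {
	SKETCH_MEM_EMPTY,
	SKETCH_MEM_DDR4,
	SKETCH_MEM_LRDDR4,
	SKETCH_MEM_COUNT
};

/* Designated initializers: any enumerator left out becomes NULL. */
static const char *const sketch_mem_types[SKETCH_MEM_COUNT] = {
	[SKETCH_MEM_EMPTY] = "Empty",
	[SKETCH_MEM_DDR4]  = "Unbuffered-DDR4",
	/* SKETCH_MEM_LRDDR4 deliberately omitted to show the NULL slot */
};

/* Defensive lookup that never hands a NULL string to the caller. */
static const char *sketch_mem_type_name(enum sketch_mem_type t)
{
	if ((unsigned int)t >= SKETCH_MEM_COUNT || !sketch_mem_types[t])
		return "Unknown";

	return sketch_mem_types[t];
}
```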
#
001f8613 |
|
12-Mar-2018 |
Tony Luck <tony.luck@intel.com> |
EDAC: Add new memory type for non-volatile DIMMs There are now non-volatile versions of DIMMs. Add a new entry to "enum mem_type" and a new string in edac_mem_types[]. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jean Delvare <jdelvare@suse.com> Cc: Len Brown <lenb@kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-acpi@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/20180312182430.10335-3-tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
d6dd77eb |
|
12-Mar-2018 |
Tony Luck <tony.luck@intel.com> |
EDAC: Drop duplicated array of strings for memory type names Somehow we ended up with two separate arrays of strings to describe the "enum mem_type" values. In edac_mc.c we have an exported list edac_mem_types[] that is used by a couple of drivers in debug messages. In edac_mc_sysfs.c we have a private list that is used to display values in: /sys/devices/system/edac/mc/mc*/dimm*/dimm_mem_type /sys/devices/system/edac/mc/mc*/csrow*/mem_type This list was missing a value for MEM_LRDDR3. The string values in the two lists were different :-( Combining the lists, I kept the values so that the sysfs output will be unchanged as some scripts may depend on that. Reported-by: Borislav Petkov <bp@suse.de> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jean Delvare <jdelvare@suse.com> Cc: Len Brown <lenb@kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-acpi@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/20180312182430.10335-2-tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
3877c7d1 |
|
23-Aug-2017 |
Toshi Kani <toshi.kani@hpe.com> |
EDAC: Add helper which returns the loaded platform driver Only a single EDAC platform driver can be loaded. When ghes_edac is enabled, an EDAC platform driver still attempts to register itself and fails in edac_mc_add_mc(). Add edac_get_owner() so that EDAC platform drivers can check the owner first. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Suggested-by: Borislav Petkov <bp@alien8.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170823225447.15608-5-toshi.kani@hpe.com [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
bffc7dec |
|
04-Feb-2017 |
Borislav Petkov <bp@suse.de> |
EDAC: Rename report status accessors Change them to have the edac_ prefix. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
fee27d7d |
|
04-Feb-2017 |
Borislav Petkov <bp@suse.de> |
EDAC: Delete edac_stub.c Move the remaining functionality to edac_mc.c. Convert "edac_report=" to a module parameter. Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
be1d1629 |
|
03-Feb-2017 |
Borislav Petkov <bp@suse.de> |
EDAC: Issue tracepoint only when it is defined ... and this happens only when CONFIG_RAS is enabled. Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
8c22b4fe |
|
26-Jan-2017 |
Borislav Petkov <bp@suse.de> |
EDAC: Move edac_op_state to edac_mc.c ... as part of moving stuff away from edac_stub.c Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
d3116a08 |
|
26-Jan-2017 |
Borislav Petkov <bp@suse.de> |
EDAC: Remove edac_err_assert ... and the glue around it. It is not needed anymore. Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
97bb6c17 |
|
26-Jan-2017 |
Borislav Petkov <bp@suse.de> |
EDAC: Get rid of edac_handlers Use mc_devices list instead to check whether we have EDAC driver instances successfully registered with EDAC core. Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
d7fc9d77 |
|
27-Jan-2017 |
Yazen Ghannam <Yazen.Ghannam@amd.com> |
EDAC: Add routine to check if MC devices list is empty We need to know if any MC devices have been allocated. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-7-git-send-email-Yazen.Ghannam@amd.com [ Prettify text. ] Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
7c0f6ba6 |
|
24-Dec-2016 |
Linus Torvalds <torvalds@linux-foundation.org> |
Replace <asm/uaccess.h> with <linux/uaccess.h> globally This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e01aa14c |
|
26-Oct-2016 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: move documentation from edac_mc.c to edac_core.h Several functions are documented in edac_mc.c. As edac_core.h will be included in the drivers-api book, move that documentation there, so that the kernel-doc markup becomes part of the API documentation book. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
#
78d88e8a |
|
29-Oct-2016 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: rename edac_core.h to edac_mc.h Everything that is left in edac_core.h now belongs to drivers/edac/edac_mc.c, so rename the header to edac_mc.h. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
#
c73e8833 |
|
14-Nov-2016 |
Borislav Petkov <bp@suse.de> |
EDAC, mc: Fix locking around mc_devices list When accessing the mc_devices list of memory controller descriptors, we need to hold mem_ctls_mutex. This was not always the case, fix that. Make all external callers call a version which grabs the mutex since the last is local to edac_mc.c. Reported-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de>
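The locking rule described here can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct mc_dev`, `find_mc_locked()`, `find_mc_by_idx()` and `add_mc()` are hypothetical names, and a pthread mutex stands in for the kernel mutex; it is not the code from edac_mc.c.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory controller descriptor. */
struct mc_dev {
    int mc_idx;
    struct mc_dev *next;
};

static struct mc_dev *mc_devices;   /* list head, protected by mem_ctls_mutex */
static pthread_mutex_t mem_ctls_mutex = PTHREAD_MUTEX_INITIALIZER;

/* File-local helper: the caller must already hold mem_ctls_mutex. */
static struct mc_dev *find_mc_locked(int idx)
{
    struct mc_dev *p;

    for (p = mc_devices; p; p = p->next)
        if (p->mc_idx == idx)
            return p;
    return NULL;
}

/* Version for external callers: takes the mutex around the list walk. */
struct mc_dev *find_mc_by_idx(int idx)
{
    struct mc_dev *p;

    pthread_mutex_lock(&mem_ctls_mutex);
    p = find_mc_locked(idx);
    pthread_mutex_unlock(&mem_ctls_mutex);
    return p;
}

/* Every list modification also happens under the mutex. */
void add_mc(struct mc_dev *mc)
{
    pthread_mutex_lock(&mem_ctls_mutex);
    mc->next = mc_devices;
    mc_devices = mc;
    pthread_mutex_unlock(&mem_ctls_mutex);
}
```

Splitting the lookup into a locked helper and a locking wrapper keeps the mutex private to one file while still giving outside callers a safe entry point.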
|
#
fbedcaf4 |
|
19-May-2016 |
Nicholas Krause <xerofoify@gmail.com> |
EDAC: Fix workqueues poll period resetting After the workqueue cleanup, we're registering workqueues based on the presence of an ->edac_check function. When that is the case, we're setting OP_RUNNING_POLL. But we forgot to check that in edac_mc_reset_delay_period(), leading to: BUG: unable to handle kernel paging request at 0000000000015d10 IP: [ .. ] queued_spin_lock_slowpath PGD 3ffcc8067 PUD 3ffc56067 PMD 0 Oops: 0002 [#1] SMP Modules linked in: ... CPU: 1 PID: 2792 Comm: edactest Not tainted 4.6.0-dirty #1 Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013 Stack: Call Trace: ? _raw_spin_lock_irqsave ? lock_timer_base.isra.34 ? del_timer ? try_to_grab_pending ? mod_delayed_work_on ? edac_mc_reset_delay_period ? edac_set_poll_msec ? param_attr_store ? module_attr_store ? kernfs_fop_write ? __vfs_write ? __vfs_read ? __alloc_fd ? vfs_write ? SyS_write ? entry_SYSCALL_64_fastpath Code: RIP [ .. ] queued_spin_lock_slowpath RSP <> CR2: 0000000000015d10 ---[ end trace 3f286bc71cca15d1 ]--- Kernel panic - not syncing: Fatal exception Fix it. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Cc: <stable@vger.kernel.org> # 4.5 Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1463697958-13406-1-git-send-email-xerofoify@gmail.com [ Rewrite commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
993f88f1 |
|
23-Apr-2016 |
Emmanouil Maroudas <emmanouil.maroudas@gmail.com> |
EDAC: Increment correct counter in edac_inc_ue_error() Fix typo in edac_inc_ue_error() to increment ue_noinfo_count instead of ce_noinfo_count. Signed-off-by: Emmanouil Maroudas <emmanouil.maroudas@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: 4275be635597 ("edac: Change internal representation to work with layers") Link: http://lkml.kernel.org/r/1461425580-5898-1-git-send-email-emmanouil.maroudas@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
06e912d4 |
|
02-Feb-2016 |
Borislav Petkov <bp@suse.de> |
EDAC: Cleanup/sync workqueue functions They're both running only when ->edac_check is initialized so remove that check from the workqueue function itself. Synchronize/generalize the ->op_state check between the two. Kill useless comments, while at it. Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
626a7a4d |
|
02-Feb-2016 |
Borislav Petkov <bp@suse.de> |
EDAC: Kill workqueue setup/teardown functions We have the generic wrappers now, use those. edac_pci_workq_setup() had an unused argument anyway. Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
09667606 |
|
02-Feb-2016 |
Borislav Petkov <bp@suse.de> |
EDAC: Balance workqueue setup and teardown We use the ->edac_check function pointers to determine whether we need to set up a polling workqueue. However, the destroy path is not balanced and we might try to tear down an uninitialized workqueue. Balance init and destroy paths by looking at ->edac_check in both cases. Set op_state to OP_OFFLINE *before* destroying anything. Reported-by: Zhiqiang Hou <Zhiqiang.Hou@freescale.com> Cc: Varun Sethi <Varun.Sethi@freescale.com> Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
c4cf3b45 |
|
30-Nov-2015 |
Borislav Petkov <bp@suse.de> |
EDAC: Rework workqueue handling Hide the EDAC workqueue pointer in a separate compilation unit and add accessors for the workqueue manipulations needed. Remove edac_pci_reset_delay_period() which wasn't used by anything. It seems it got added without a user with 91b99041c1d5 ("drivers/edac: updated PCI monitoring") Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
fcd5c4dd |
|
27-Nov-2015 |
Borislav Petkov <bp@suse.de> |
EDAC: Robustify workqueues destruction EDAC workqueue destruction is really fragile. We cancel delayed work but if it is still running and requeues itself, we still go ahead and destroy the workqueue and the queued work explodes when workqueue core attempts to run it. Make the destruction more robust by switching op_state to offline so that requeuing stops. Cancel any pending work *synchronously* too. EDAC i7core: Driver loaded. general protection fault: 0000 [#1] SMP CPU 12 Modules linked in: Supported: Yes Pid: 0, comm: kworker/0:1 Tainted: G IE 3.0.101-0-default #1 HP ProLiant DL380 G7 RIP: 0010:[<ffffffff8107dcd7>] [<ffffffff8107dcd7>] __queue_work+0x17/0x3f0 < ... regs ...> Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600) Stack: ... Call Trace: call_timer_fn run_timer_softirq __do_softirq call_softirq do_softirq irq_exit smp_apic_timer_interrupt apic_timer_interrupt intel_idle cpuidle_idle_call cpu_idle Code: ... RIP __queue_work RSP <...> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org>
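The ordering argued for above (go offline first, then cancel) can be modelled with a small single-threaded sketch. Everything here is an assumption made for illustration: `struct poller`, `poll_work()` and `poller_stop()` are hypothetical, and the `queued` flag merely stands in for a pending delayed work item; the real code relies on the kernel workqueue API.

```c
#include <stdbool.h>

enum op_state { OP_RUNNING_POLL, OP_OFFLINE };

/* Hypothetical model of a self-requeuing poll worker. */
struct poller {
    enum op_state op_state;
    bool queued;    /* stand-in for "work item is pending" */
    int runs;       /* how many polls actually ran */
};

/* The work function requeues itself only while polling is active. */
void poll_work(struct poller *p)
{
    p->queued = false;
    if (p->op_state != OP_RUNNING_POLL)
        return;             /* offline: do nothing, do not requeue */
    p->runs++;
    p->queued = true;       /* requeue for the next period */
}

/* Teardown: switch to offline first, then cancel what is still pending. */
void poller_stop(struct poller *p)
{
    p->op_state = OP_OFFLINE;
    p->queued = false;      /* stand-in for a synchronous cancel */
}
```

Because the state flips before the cancel, a work function that is already running when teardown starts sees OP_OFFLINE and stops requeuing itself, so nothing is left to fire after the queue is gone.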
|
#
990995ba |
|
20-Oct-2015 |
Tan Xiaojun <tanxiaojun@huawei.com> |
EDAC: Fix PAGES_TO_MiB macro misuse The PAGES_TO_MiB macro is used for unit conversion but the trace_mc_event() tracepoint expects a page address. Fix that. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1445341538-24271-1-git-send-email-tanxiaojun@huawei.com Signed-off-by: Borislav Petkov <bp@suse.de>
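To make the distinction concrete, here is a small sketch contrasting a unit conversion with an address computation. It assumes 4 KiB pages (`PAGE_SHIFT` of 12); the macro is modelled on the usual definition, and the helper names `pages_to_mib()` and `pfn_to_addr()` are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Unit conversion: a COUNT of pages expressed in MiB. */
#define PAGES_TO_MiB(pages) ((pages) >> (20 - PAGE_SHIFT))

/* Size of a region of `pages` pages, in MiB. */
static inline uint64_t pages_to_mib(uint64_t pages)
{
    return PAGES_TO_MiB(pages);
}

/* Byte address of the start of page frame `pfn`. */
static inline uint64_t pfn_to_addr(uint64_t pfn)
{
    return pfn << PAGE_SHIFT;
}
```

Applying the first helper to a page frame number yields a number that looks plausible but is neither a size nor an address, which is the kind of mix-up the commit describes.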
|
#
b01aec9b |
|
21-May-2015 |
Borislav Petkov <bp@suse.de> |
EDAC: Cleanup atomic_scrub mess So first of all, this atomic_scrub() function's naming is bad. It looks like an atomic_t helper. Change it to edac_atomic_scrub(). The bigger problem is that this function is arch-specific and every new arch which doesn't necessarily need that functionality still needs to define it, otherwise EDAC doesn't compile. So instead of doing that and including arch-specific headers, have each arch define an EDAC_ATOMIC_SCRUB symbol which can be used in edac_mc.c for ifdeffery. Much cleaner. And we already are doing this with another symbol - EDAC_SUPPORT. This is also much cleaner than having CONFIG_EDAC enumerate all the arches which need/have EDAC support and drivers. This way I can kill the useless edac.h header in tile too. Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Chris Metcalf <cmetcalf@ezchip.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Doug Thompson <dougthompson@xmission.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linuxppc-dev@lists.ozlabs.org Cc: "Maciej W. Rozycki" <macro@codesourcery.com> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Paul Mackerras <paulus@samba.org> Cc: "Steven J. Hill" <Steven.Hill@imgtec.com> Cc: x86@kernel.org Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
4e8d230d |
|
04-Feb-2015 |
Takashi Iwai <tiwai@suse.de> |
EDAC: Allow to pass driver-specific attribute groups Add edac_mc_add_mc_with_groups() for initializing the mem_ctl_info object with the optional attribute groups. This allows drivers to pass additional sysfs entries without manual (and racy) device_create_file() and co calls. edac_mc_add_mc() is kept as is, just calling edac_mc_add_mc_with_groups() with NULL groups. Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1423046938-18111-3-git-send-email-tiwai@suse.de Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
4cfc3a40 |
|
30-Sep-2014 |
Borislav Petkov <bp@suse.de> |
EDAC: Sync memory types and names Make keeping the sync between the mem_types enum and the actual string names simpler by using designated initializers. Signed-off-by: Borislav Petkov <bp@suse.de>
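A minimal sketch of the designated-initializer technique, assuming a trimmed, hypothetical enum; the enumerator list and the strings below are illustrative and are not the kernel's table.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down memory type enum. */
enum demo_mem_type {
    DEMO_MEM_EMPTY = 0,
    DEMO_MEM_DDR3,
    DEMO_MEM_RDDR3,
    DEMO_MEM_LRDDR3,
    DEMO_MEM_DDR4,
    DEMO_MEM_COUNT
};

/*
 * Each string is tied to its enumerator by index, so the order of the
 * entries no longer has to match the order of the enum.
 */
static const char * const demo_mem_names[DEMO_MEM_COUNT] = {
    [DEMO_MEM_DDR4]   = "Unbuffered-DDR4",
    [DEMO_MEM_EMPTY]  = "Empty",
    [DEMO_MEM_LRDDR3] = "Load-Reduced-DDR3",
    [DEMO_MEM_DDR3]   = "Unbuffered-DDR3",
    /* DEMO_MEM_RDDR3 deliberately left out: its slot is NULL. */
};

/* Lookup that tolerates out-of-range values and missing entries. */
const char *demo_mem_name(enum demo_mem_type t)
{
    if ((unsigned int)t >= DEMO_MEM_COUNT || !demo_mem_names[t])
        return "Unknown";
    return demo_mem_names[t];
}
```

A forgotten entry shows up as a NULL slot that the lookup can detect, instead of silently shifting every later string by one position.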
|
#
348fec70 |
|
18-Sep-2014 |
Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> |
EDAC: Add DDR3 LRDIMM entries to edac_mem_types F15hM60h adds support for DDR4 and DDR3 LRDIMMs. Add them here. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1411070218-10258-1-git-send-email-Aravind.Gopalakrishnan@amd.com [ Boris: improve comments. ] Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
f4ce6eca |
|
13-Aug-2014 |
Borislav Petkov <bp@suse.de> |
EDAC: Fix mem_types strings type This one got forgotten during an earlier cleanup. Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
76ac8275 |
|
11-Jun-2014 |
Chen, Gong <gong.chen@linux.intel.com> |
trace, RAS: Add basic RAS trace event To avoid confusion and conflicting usage of RAS-related trace events, add a unified RAS trace event stub. Start a RAS subsystem menu which will be fleshed out in time, when more features get added to it. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1402475691-30045-2-git-send-email-gong.chen@linux.intel.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>
|
#
aa2064d7 |
|
08-May-2014 |
Loc Ho <lho@apm.com> |
EDAC: Fix MC scrub mode comparison bug for correctable errors The MC structure field scrub_mode is of integer type - not bit field. Use it accordingly. Signed-off-by: Loc Ho <lho@apm.com> Link: http://lkml.kernel.org/r/1399590199-12256-2-git-send-email-lho@apm.com Signed-off-by: Borislav Petkov <bp@suse.de>
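A short sketch of why testing an enumerated value with a bitwise AND misfires. The enum below is a trimmed, hypothetical copy with sequential values, and both helper names are made up for this example.

```c
#include <stdbool.h>

/* Hypothetical scrub modes: plain sequential integers, not bit flags. */
enum demo_scrub_type {
    DEMO_SCRUB_UNKNOWN = 0,
    DEMO_SCRUB_NONE,          /* 1 */
    DEMO_SCRUB_SW_PROG,       /* 2 */
    DEMO_SCRUB_SW_SRC,        /* 3 */
    DEMO_SCRUB_SW_PROG_SRC    /* 4 */
};

/* Buggy test: treats the mode as if it were a bit mask. */
bool wants_sw_src_scrub_buggy(enum demo_scrub_type mode)
{
    return (mode & DEMO_SCRUB_SW_SRC) != 0;
}

/* Correct test: the mode is one value out of a list, so compare it. */
bool wants_sw_src_scrub(enum demo_scrub_type mode)
{
    return mode == DEMO_SCRUB_SW_SRC;
}
```

With these values the buggy version also answers "yes" for DEMO_SCRUB_NONE (1 & 3 is 1) and DEMO_SCRUB_SW_PROG (2 & 3 is 2), so software scrubbing would be attempted in modes that never asked for it.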
|
#
cb6ef42e |
|
12-Feb-2014 |
Borislav Petkov <bp@suse.de> |
EDAC: Correct workqueue setup path We're using edac_mc_workq_setup() both on the init path, when we load an edac driver and when we change the polling period (edac_mc_reset_delay_period) through /sys/.../edac_mc_poll_msec. On that second path we don't need to init the workqueue which has been initialized already. Thanks to Tejun for workqueue insights. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com Cc: <stable@vger.kernel.org>
|
#
9da21b15 |
|
03-Feb-2014 |
Borislav Petkov <bp@suse.de> |
EDAC: Poll timeout cannot be zero, p2 Sanitize code even more to accept unsigned longs only and to not allow polling intervals below 1 second as this is unnecessary and doesn't make much sense anyway for polling errors. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com Cc: Doug Thompson <dougthompson@xmission.com> Cc: <stable@vger.kernel.org>
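The validation rule (unsigned values only, nothing below one second) could look roughly like this in user-space C. `parse_poll_msec()` and `MIN_POLL_MSEC` are hypothetical names, and `strtoul()` stands in for the kernel's string-parsing helpers.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

#define MIN_POLL_MSEC 1000UL   /* one second, per the commit message */

/*
 * Parse a polling period in milliseconds.
 * Returns 0 and stores the value in *out on success, -EINVAL otherwise.
 */
int parse_poll_msec(const char *s, unsigned long *out)
{
    char *end;
    unsigned long v;

    /* Require a leading digit: rejects "", signs and leading blanks. */
    if (!s || !isdigit((unsigned char)s[0]))
        return -EINVAL;

    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno)
        return -EINVAL;         /* out of range */

    if (*end == '\n')           /* tolerate the newline added by echo */
        end++;
    if (*end != '\0')
        return -EINVAL;         /* trailing garbage */

    if (v < MIN_POLL_MSEC)
        return -EINVAL;         /* below one second, including zero */

    *out = v;
    return 0;
}
```

Rejecting the value up front means the rest of the polling code never has to handle a zero or negative delay.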
|
#
7270a608 |
|
10-Oct-2013 |
Robert Richter <robert.richter@linaro.org> |
edac: Unify reporting of device info for device, mc and pci Log messages slightly differ between edac subsystems. Unifying it. Signed-off-by: Robert Richter <robert.richter@linaro.org> Acked-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rric@kernel.org>
|
#
88d84ac9 |
|
18-Jul-2013 |
Borislav Petkov <bp@suse.de> |
EDAC: Fix lockdep splat Fix the following: BUG: key ffff88043bdd0330 not in .data! ------------[ cut here ]------------ WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0() DEBUG_LOCKS_WARN_ON(1) Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1 Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013 0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958 ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000 ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc Call Trace: dump_stack warn_slowpath_common warn_slowpath_fmt lockdep_init_map ? trace_hardirqs_on_caller ? trace_hardirqs_on debug_mutex_init __mutex_init bus_register edac_create_sysfs_mci_device edac_mc_add_mc sbridge_probe pci_device_probe driver_probe_device __driver_attach ? driver_probe_device bus_for_each_dev driver_attach bus_add_driver driver_register __pci_register_driver ? 0xffffffffa0010fff sbridge_init ? 0xffffffffa0010fff do_one_initcall load_module ? unset_module_init_ro_nx SyS_init_module tracesys ---[ end trace d24a70b0d3ddf733 ]--- EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0 EDAC sbridge: Driver loaded. What happens is that bus_register needs a statically allocated lock_key because the last is handed in to lockdep. However, struct mem_ctl_info embeds struct bus_type (the whole struct, not a pointer to it) and the whole thing gets dynamically allocated. Fix this by using a statically allocated struct bus_type for the MC bus. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: stable@kernel.org # v3.10 Signed-off-by: Tony Luck <tony.luck@intel.com>
|
#
9713faec |
|
11-Mar-2013 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
EDAC: Merge mci.mem_is_per_rank with mci.csbased Both mci.mem_is_per_rank and mci.csbased denote the same thing: the memory controller is csrows based. Merge both fields into one. There's no need for the driver to actually fill it, as the core detects it by checking if one of the layers has the csrows type as part of the memory hierarchy: if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT) per_rank = true; Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de>
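The detection quoted in the message can be wrapped into a self-contained helper. The types below are simplified stand-ins that reuse the enumerator name from the commit text; `layers_are_csbased()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the layer description types. */
enum edac_mc_layer_type {
    EDAC_MC_LAYER_BRANCH,
    EDAC_MC_LAYER_CHANNEL,
    EDAC_MC_LAYER_SLOT,
    EDAC_MC_LAYER_CHIP_SELECT
};

struct edac_mc_layer {
    enum edac_mc_layer_type type;
    unsigned int size;
};

/* True if any layer of the hierarchy is a chip select (csrow) layer. */
bool layers_are_csbased(const struct edac_mc_layer *layers, size_t n_layers)
{
    size_t i;

    for (i = 0; i < n_layers; i++)
        if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
            return true;
    return false;
}
```

Since the answer follows from the layer description alone, a driver has no separate flag to set and therefore none to get out of sync.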
|
#
e7e24830 |
|
31-Oct-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: add support for raw error reports That allows APEI GHES driver to report errors directly, using the EDAC error report API. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
c7ef7645 |
|
21-Feb-2013 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: reduce stack pressure by using a pre-allocated buffer Too many variables are kept on the stack. Reduce the stack usage by using a pre-allocated error buffer. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
80cc7d87 |
|
31-Oct-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: lock module owner to avoid error report conflicts APEI GHES and i7core_edac/sb_edac currently can be loaded at the same time, but those are Highlander modules: "There can be only one". There are two reasons for that: 1) Each driver assumes that it is the only one registering at the EDAC core, as it is the driver's responsibility to number the memory controllers, and all of them start from 0; 2) If BIOS is handling the memory errors, the OS can't also be doing it, as one will interfere with the other. So, we need to add a module owner lock at the EDAC core, in order to avoid having two different modules handling memory errors at the same time. The best way to implement this lock seems to be to use the driver's name, as this is unique, and won't require changes on every driver. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
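One way to picture the owner lock is a name that the first registering driver claims and that every later registration is compared against. This is a sketch under assumptions: `owner_get()`, `owner_put()` and the `-EBUSY` return value are hypothetical choices, not taken from the EDAC core.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

static const char *edac_owner;      /* name of the owning driver, or NULL */
static unsigned int edac_owner_refs; /* controllers registered by the owner */

/* Claim the core for `mod_name`; fails if another driver already owns it. */
int owner_get(const char *mod_name)
{
    if (edac_owner && strcmp(edac_owner, mod_name) != 0)
        return -EBUSY;
    edac_owner = mod_name;
    edac_owner_refs++;
    return 0;
}

/* Drop one registration; the last one releases ownership. */
void owner_put(void)
{
    if (edac_owner_refs && --edac_owner_refs == 0)
        edac_owner = NULL;
}
```

Keying on the name lets one driver register several memory controllers while any other driver is turned away until the owner has removed all of them.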
|
#
c66b5a79 |
|
15-Feb-2013 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: add a new memory layer type There are some cases where the memory controller layout is completely hidden. This is the case of firmware-driven error code, like the one provided by GHES. Add a new layer to be used on such memory error report mechanisms. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
d3d09e18 |
|
26-Jan-2013 |
Joe Perches <joe@perches.com> |
EDAC: Fix kcalloc argument order First number, then size. Signed-off-by: Joe Perches <joe@perches.com> Cc: <stable@vger.kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
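The same "number first, then size" convention applies to the standard `calloc()`, which makes for a runnable user-space illustration. `struct demo_csrow` and `alloc_csrows()` are hypothetical; the kernel's kcalloc() takes an additional allocation-flags argument after these two.

```c
#include <stdlib.h>

/* Hypothetical element type for the array being allocated. */
struct demo_csrow {
    int idx;
    unsigned long nr_pages;
};

/* Allocate a zeroed array of n elements: element count first, size second. */
struct demo_csrow *alloc_csrows(size_t n)
{
    return calloc(n, sizeof(struct demo_csrow));
}
```

Swapping the two arguments still yields the same number of bytes, which is why such a slip goes unnoticed at run time and is easy to miss in review.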
|
#
80f5ab09 |
|
18-Aug-2012 |
Shaun Ruffell <sruffell@digium.com> |
edac: edac_mc no longer deals with kobjects directly There are no more embedded kobjects in struct mem_ctl_info. Remove a header and a comment that does not reflect the code anymore. Signed-off-by: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
f430d570 |
|
10-Sep-2012 |
Borislav Petkov <borislav.petkov@amd.com> |
EDAC: Handle empty msg strings when reporting errors A reported error could look like this [ 226.178315] EDAC MC0: 1 CE on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x427c0d offset:0xde0 grain:0 syndrome:0x1c6) with two spaces back-to-back due to the msg argument of edac_mc_handle_error being passed on empty by the specific drivers. Handle that. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
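A sketch of one way to avoid the doubled space: emit the separator only when the message is non-empty. `format_report()` and its format string are hypothetical and only mimic the shape of the log line quoted above.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build a report line. The blank after the message is added only when
 * there is a message, so an empty one does not leave two spaces behind.
 */
int format_report(char *buf, size_t len, unsigned int count,
                  const char *msg, const char *label, const char *detail)
{
    const char *sep;

    if (!msg)
        msg = "";
    sep = *msg ? " " : "";

    return snprintf(buf, len, "%u CE %s%son %s (%s)",
                    count, msg, sep, label, detail);
}
```

For example, an empty message gives "1 CE on mc#0csrow#0channel#0 (page:0x427c0d)", while a message of "read error" gives "1 CE read error on mc#0csrow#0channel#0 (page:0x427c0d)".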
|
#
4da1b7bf |
|
10-Sep-2012 |
Borislav Petkov <borislav.petkov@amd.com> |
EDAC: Remove useless assignment of error type The tracepoint decodes the error type later anyway so remove a useless assignment to the temporary p which gets overwritten later anyway. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
#
24bef66e |
|
24-Oct-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: Fix the dimm filling for csrows-based layouts

The data is currently filled in the wrong way for drivers of csrows-based memory controllers when the first layer is a csrow. This is not easy to notice because, in general, memories are filled in dual, interleaved, symmetric mode, and very few memory controllers support asymmetric modes.

While digging into a bug in the i82975x_edac driver, the asymmetric mode there is now working, allowing us to fill the machine with 4x1GB ranks at channel 0 and 2x512MB ranks at channel 1:

Channel 0 ranks:
EDAC DEBUG: i82975x_init_csrows: DIMM A0: from page 0x00000000 to 0x0003ffff (size: 0x00040000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM A1: from page 0x00040000 to 0x0007ffff (size: 0x00040000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM A2: from page 0x00080000 to 0x000bffff (size: 0x00040000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM A3: from page 0x000c0000 to 0x000fffff (size: 0x00040000 pages)

Channel 1 ranks:
EDAC DEBUG: i82975x_init_csrows: DIMM B0: from page 0x00100000 to 0x0011ffff (size: 0x00020000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM B1: from page 0x00120000 to 0x0013ffff (size: 0x00020000 pages)

Instead of properly showing the memories as such, before this patch, it shows the memory layout as:

          +-----------------------------------+
          |                mc0                |
          |  csrow0   |  csrow1   |  csrow2   |
----------+-----------------------------------+
channel1: |  1024 MB  |  1024 MB  |   512 MB  |
channel0: |  1024 MB  |  1024 MB  |   512 MB  |
----------+-----------------------------------+

as if both channels were symmetric, grouping the DIMMs in a wrong layout.

After this patch, the memory is correctly represented. So, for csrows at layers[0], it shows:

          +-----------------------------------------------+
          |                      mc0                      |
          |  csrow0   |  csrow1   |  csrow2   |  csrow3   |
----------+-----------------------------------------------+
channel1: |   512 MB  |   512 MB  |     0 MB  |     0 MB  |
channel0: |  1024 MB  |  1024 MB  |  1024 MB  |  1024 MB  |
----------+-----------------------------------------------+

For csrows at layers[1], it shows:

        +-----------------------+
        |          mc0          |
        | channel0  | channel1  |
--------+-----------------------+
csrow3: |  1024 MB  |     0 MB  |
csrow2: |  1024 MB  |     0 MB  |
--------+-----------------------+
csrow1: |  1024 MB  |   512 MB  |
csrow0: |  1024 MB  |   512 MB  |
--------+-----------------------+

So, no matter which layer comes first, the relation between channel and csrow will be properly represented.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
faa2ad09 |
|
22-Sep-2012 |
Shaun Ruffell <sruffell@digium.com> |
edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs. Fix potential NULL pointer dereference in edac_unregister_sysfs() on system boot introduced in 3.6-rc1. Since commit 7a623c039 ("edac: rewrite the sysfs code to use struct device") edac_mc_alloc() no longer initializes embedded kobjects in struct mem_ctl_info. Therefore edac_mc_free() can no longer simply decrement a kobject reference count to free the allocated memory unless the memory controller driver module had also called edac_mc_add_mc(). Now edac_mc_free() will check if the newly embedded struct device has been registered with sysfs before using either the standard device release functions or freeing the data structures itself with logic pulled out of the error path of edac_mc_alloc(). The BUG this patch resolves for me: BUG: unable to handle kernel NULL pointer dereference at (null) EIP is at __wake_up_common+0x1a/0x6a Process modprobe (pid: 933, ti=f3dc6000 task=f3db9520 task.ti=f3dc6000) Call Trace: complete_all+0x3f/0x50 device_pm_remove+0x23/0xa2 device_del+0x34/0x142 edac_unregister_sysfs+0x3b/0x5c [edac_core] edac_mc_free+0x29/0x2f [edac_core] e7xxx_probe1+0x268/0x311 [e7xxx_edac] e7xxx_init_one+0x56/0x61 [e7xxx_edac] local_pci_probe+0x13/0x15 ... Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ef6e7816 |
|
22-Sep-2012 |
Fengguang Wu <fengguang.wu@intel.com> |
edac_mc: fix messy kfree calls in the error path

coccinelle warns about:

+ drivers/edac/edac_mc.c:429:9-23: ERROR: reference preceded by free on line 429

   421         if (mci->csrows) {
 > 422                 for (chn = 0; chn < tot_channels; chn++) {
   423                         csr = mci->csrows[chn];
   424                         if (csr) {
 > 425                                 for (chn = 0; chn < tot_channels; chn++)
   426                                         kfree(csr->channels[chn]);
   427                                 kfree(csr);
   428                         }
 > 429                         kfree(mci->csrows[i]);
   430                 }
   431                 kfree(mci->csrows);
   432         }

and that code block seems to mess things up in several ways (double free, memory leak, out-of-bound reads etc.):

L422: The iterator "chn" and bound "tot_channels" are totally wrong. Should be "row" and "tot_csrows" respectively. Which means either memory leak, or out-of-bound reads (which, if it does not trigger an immediate page fault error, will further lead to kfree() on random addresses).

L425: The inner loop is reusing the same iterator "chn" as the outer loop, which could lead to premature end of the outer loop, and hence memory leak.

L429: The array index 'i' in mci->csrows[i] is a temporary value used in previous loops, and won't change at all in the current loop. Which means either out-of-bound read and possibly kfree(random number), or the same mci->csrows[i] gets freed over and over again, and possibly double free for the kfree(csr) in L427.

L426/L427: a kfree(csr->channels) is needed in between to avoid leaking the memory.

The buggy code was introduced by commit de3910eb ("edac: change the mem allocation scheme to make Documentation/kobject.txt happy") in the 3.6-rc1 merge window.

Fix it by freeing up resources in this order:

free csrows[i]->channels[j]
free csrows[i]->channels
free csrows[i]
free csrows

CC: Mauro Carvalho Chehab <mchehab@redhat.com> CC: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
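The freeing order listed in this commit can be sketched in user-space C with `free()` in place of kfree(). The structures are hypothetical, trimmed-down stand-ins and the function names are made up; only the order of the frees follows the message.

```c
#include <stdlib.h>

/* Hypothetical, trimmed-down versions of the nested structures. */
struct demo_channel { int idx; };
struct demo_csrow   { struct demo_channel **channels; };
struct demo_mc {
    struct demo_csrow **csrows;
    unsigned int tot_csrows;
    unsigned int tot_channels;
};

/* Free innermost objects first; safe on partially built trees. */
void demo_free_csrows(struct demo_mc *mci)
{
    unsigned int row, chn;

    if (!mci->csrows)
        return;
    for (row = 0; row < mci->tot_csrows; row++) {
        struct demo_csrow *csr = mci->csrows[row];

        if (!csr)
            continue;
        if (csr->channels) {
            for (chn = 0; chn < mci->tot_channels; chn++)
                free(csr->channels[chn]);   /* csrows[i]->channels[j] */
            free(csr->channels);            /* csrows[i]->channels */
        }
        free(csr);                          /* csrows[i] */
    }
    free(mci->csrows);                      /* csrows */
    mci->csrows = NULL;
}

/* Build the tree; on any failure, unwind through the same free path. */
int demo_alloc_csrows(struct demo_mc *mci)
{
    unsigned int row, chn;

    mci->csrows = calloc(mci->tot_csrows, sizeof(*mci->csrows));
    if (!mci->csrows)
        return -1;
    for (row = 0; row < mci->tot_csrows; row++) {
        struct demo_csrow *csr = calloc(1, sizeof(*csr));

        if (!csr)
            goto fail;
        mci->csrows[row] = csr;
        csr->channels = calloc(mci->tot_channels, sizeof(*csr->channels));
        if (!csr->channels)
            goto fail;
        for (chn = 0; chn < mci->tot_channels; chn++) {
            csr->channels[chn] = calloc(1, sizeof(struct demo_channel));
            if (!csr->channels[chn])
                goto fail;
        }
    }
    return 0;
fail:
    demo_free_csrows(mci);
    return -1;
}
```

Using distinct iterators with matching bounds for each loop, and zero-initialized pointer arrays, is what lets the same routine serve both the normal teardown and the error path.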
|
#
41f63c53 |
|
03-Aug-2012 |
Tejun Heo <tj@kernel.org> |
workqueue: use mod_delayed_work() instead of cancel + queue Convert delayed_work users doing cancel_delayed_work() followed by queue_delayed_work() to mod_delayed_work(). Most conversions are straight-forward. Ones worth mentioning are,
* drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always use mod_delayed_work() and cancel loop in edac_mc_reset_delay_period() is dropped.
* drivers/platform/x86/thinkpad_acpi.c: No need to remember whether watchdog is active or not. @fan_watchdog_active and related code dropped.
* drivers/power/charger-manager.c: Seemingly a lot of delayed_work_pending() abuse going on here. [delayed_]work_pending() are unsynchronized and racy when used like this. I converted one instance in fullbatt_handler(). Please convert the rest so that it invokes workqueue APIs for the intended target state rather than trying to game work item pending state transitions. e.g. if timer should be modified - call mod_delayed_work(), canceled - call cancel_delayed_work[_sync]().
* drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling() simplified. Note that round_jiffies() calls in this function are meaningless. round_jiffies() work on absolute jiffies not delta delay used by delayed_work.
v2: Tomi pointed out that __cancel_delayed_work() users can't be safely converted to mod_delayed_work(). They could be calling it from irq context and if that happens while delayed_work_timer_fn() is running, it could deadlock. __cancel_delayed_work() users are dropped. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Anton Vorontsov <cbouatmailru@gmail.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Doug Thompson <dougthompson@xmission.com> Cc: David Airlie <airlied@linux.ie> Cc: Roland Dreier <roland@kernel.org> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Len Brown <len.brown@intel.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Johannes Berg <johannes@sipsolutions.net>
|
#
9eb07a7f |
|
04-Jun-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: edac_mc_handle_error(): add an error_count parameter In order to avoid losing error events, it is desirable to group error events together and generate a single trace for several identical errors. The trace API already allows reporting multiple errors. Change the handle_error function to also allow that. The changes at the drivers were made by this small script: $file .=$_ while (<>); $file =~ s/(edac_mc_handle_error)\s*\(([^\,]+)\,([^\,]+)\,/$1($2,$3, 1,/g; print $file; Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
03f7eae8 |
|
04-Jun-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: remove arch-specific parameter for the error handler Remove the arch-dependent parameter: it was never used, because the MCE tracepoint was never implemented. It probably doesn't make sense to have an MCE-specific tracepoint, as this would cost more bytes per tracepoint, and tracepoints are not free. The changes at the EDAC drivers were done by this small perl script: $file .=$_ while (<>); $file =~ s/(edac_mc_handle_error)\s*\(([^\;]+)\,([^\,\)]+)\s*\)/$1($2)/g; print $file; Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
08a4a136 |
|
18-May-2012 |
Dan Carpenter <dan.carpenter@oracle.com> |
edac_mc: check for allocation failure in edac_mc_alloc() Add a check here for if kzalloc() failed. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
6e84d359 |
|
30-Apr-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac_mc: Cleanup per-dimm_info debug messages The edac_mc_alloc() routine allocates one dimm_info device for all possible memories, including the non-filled ones. The debug messages there are somewhat confusing. So, cleans them, by moving the code that prints the memory location to edac_mc, and using it on both edac_mc_sysfs and edac_mc. Also, only dumps information when DIMM/ranks are actually filled. After this patch, a dimm-based memory controller will print the debug info as: [ 1011.380027] EDAC DEBUG: edac_mc_dump_csrow: csrow->csrow_idx = 0 [ 1011.380029] EDAC DEBUG: edac_mc_dump_csrow: csrow = ffff8801169be000 [ 1011.380031] EDAC DEBUG: edac_mc_dump_csrow: csrow->first_page = 0x0 [ 1011.380032] EDAC DEBUG: edac_mc_dump_csrow: csrow->last_page = 0x0 [ 1011.380034] EDAC DEBUG: edac_mc_dump_csrow: csrow->page_mask = 0x0 [ 1011.380035] EDAC DEBUG: edac_mc_dump_csrow: csrow->nr_channels = 3 [ 1011.380037] EDAC DEBUG: edac_mc_dump_csrow: csrow->channels = ffff8801149c2840 [ 1011.380039] EDAC DEBUG: edac_mc_dump_csrow: csrow->mci = ffff880117426000 [ 1011.380041] EDAC DEBUG: edac_mc_dump_channel: channel->chan_idx = 0 [ 1011.380042] EDAC DEBUG: edac_mc_dump_channel: channel = ffff8801149c2860 [ 1011.380044] EDAC DEBUG: edac_mc_dump_channel: channel->csrow = ffff8801169be000 [ 1011.380046] EDAC DEBUG: edac_mc_dump_channel: channel->dimm = ffff88010fe90400 ... [ 1011.380095] EDAC DEBUG: edac_mc_dump_dimm: dimm0: channel 0 slot 0 mapped as virtual row 0, chan 0 [ 1011.380097] EDAC DEBUG: edac_mc_dump_dimm: dimm = ffff88010fe90400 [ 1011.380099] EDAC DEBUG: edac_mc_dump_dimm: dimm->label = 'CPU#0Channel#0_DIMM#0' [ 1011.380101] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000 [ 1011.380103] EDAC DEBUG: edac_mc_dump_dimm: dimm->grain = 8 [ 1011.380104] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000 ... (a rank-based memory controller would print, instead of "dimm?", "rank?" on the above debug info) Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
956b9ba1 |
|
29-Apr-2012 |
Joe Perches <joe@perches.com> |
edac: Convert debugfX to edac_dbg(X, Use a more common debugging style. Remove __FILE__ uses, add missing newlines, coalesce formats and align arguments. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
dd23cd6e |
|
29-Apr-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs The debug macro already adds that. Most of the work here was done by this small script: $f .=$_ while (<>); $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*": /\1"/g; $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*/\1/g; $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*"MC: /\1"/g; $f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g; $f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g; $f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g; $f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g; $f =~ s/\"MC\: \\n\"/"MC:\\n"/g; print $f; After running the script, manual cleanups were done to fix the remaining places. While here, remove __LINE__ in most places, as it doesn't actually give useful info there. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
de3910eb |
|
24-Apr-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: change the mem allocation scheme to make Documentation/kobject.txt happy Kernel kobjects have rigid rules: each container object should be dynamically allocated, and can't be allocated into a single kmalloc. EDAC never obeyed this rule: it has a single malloc function that allocates all needed data into a single kzalloc. As this is not accepted anymore, change the allocation schema of the EDAC *_info structs to enforce this Kernel standard. Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Greg K H <gregkh@linuxfoundation.org> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
d90c0089 |
|
21-Mar-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: Get rid of the old kobj's from the edac mc code Now that all users of the old kobj raw access are gone, we can get rid of the legacy kobj-based structures and data. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
7a623c03 |
|
16-Apr-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: rewrite the sysfs code to use struct device The EDAC subsystem uses the old struct sysdev approach, creating all nodes using the raw sysfs API. This is bad, as the API is deprecated. As we'll be changing the EDAC API, let's first port the existing code to struct device. There's one drawback on this patch: driver-specific sysfs nodes, used by mpc85xx_edac, amd64_edac and i7core_edac won't be created anymore. While it would be possible to also port the device-specific code, that would mix kobj with struct device, with is not recommended. Also, it is easier and nicer to move the code to the drivers, instead, as the core can get rid of some complex logic that just emulates what the device_add() and device_create_file() already does. The next patches will convert the driver-specific code to use the device-specific calls. Then, the remaining bits of the old sysfs API will be removed. NOTE: a per-MC bus is required, otherwise devices with more than one memory controller will hit a bug like the one below: [ 819.094946] EDAC DEBUG: find_mci_by_dev: find_mci_by_dev() [ 819.094948] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device() idx=1 [ 819.094952] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device(): creating device mc1 [ 819.094967] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device creating dimm0, located at channel 0 slot 0 [ 819.094984] ------------[ cut here ]------------ [ 819.100142] WARNING: at fs/sysfs/dir.c:481 sysfs_add_one+0xc1/0xf0() [ 819.107282] Hardware name: S2600CP [ 819.111078] sysfs: cannot create duplicate filename '/bus/edac/devices/dimm0' [ 819.119062] Modules linked in: sb_edac(+) edac_core ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun 
kvm microcode pcspkr iTCO_wdt iTCO_vendor_support igb i2c_i801 i2c_core sg ioatdma dca sr_mod cdrom sd_mod crc_t10dif ahci libahci isci libsas libata scsi_transport_sas scsi_mod wmi dm_mod [last unloaded: scsi_wait_scan] [ 819.175748] Pid: 10902, comm: modprobe Not tainted 3.3.0-0.11.el7.v12.2.x86_64 #1 [ 819.184113] Call Trace: [ 819.186868] [<ffffffff8105adaf>] warn_slowpath_common+0x7f/0xc0 [ 819.193573] [<ffffffff8105aea6>] warn_slowpath_fmt+0x46/0x50 [ 819.200000] [<ffffffff811f53d1>] sysfs_add_one+0xc1/0xf0 [ 819.206025] [<ffffffff811f5cf5>] sysfs_do_create_link+0x135/0x220 [ 819.212944] [<ffffffff811f7023>] ? sysfs_create_group+0x13/0x20 [ 819.219656] [<ffffffff811f5df3>] sysfs_create_link+0x13/0x20 [ 819.226109] [<ffffffff813b04f6>] bus_add_device+0xe6/0x1b0 [ 819.232350] [<ffffffff813ae7cb>] device_add+0x2db/0x460 [ 819.238300] [<ffffffffa0325634>] edac_create_dimm_object+0x84/0xf0 [edac_core] [ 819.246460] [<ffffffffa0325e18>] edac_create_sysfs_mci_device+0xe8/0x290 [edac_core] [ 819.255215] [<ffffffffa0322e2a>] edac_mc_add_mc+0x5a/0x2c0 [edac_core] [ 819.262611] [<ffffffffa03412df>] sbridge_register_mci+0x1bc/0x279 [sb_edac] [ 819.270493] [<ffffffffa03417a3>] sbridge_probe+0xef/0x175 [sb_edac] [ 819.277630] [<ffffffff813ba4e8>] ? pm_runtime_enable+0x58/0x90 [ 819.284268] [<ffffffff812f430c>] local_pci_probe+0x5c/0xd0 [ 819.290508] [<ffffffff812f5ba1>] __pci_device_probe+0xf1/0x100 [ 819.297117] [<ffffffff812f5bea>] pci_device_probe+0x3a/0x60 [ 819.303457] [<ffffffff813b1003>] really_probe+0x73/0x270 [ 819.309496] [<ffffffff813b138e>] driver_probe_device+0x4e/0xb0 [ 819.316104] [<ffffffff813b149b>] __driver_attach+0xab/0xb0 [ 819.322337] [<ffffffff813b13f0>] ? driver_probe_device+0xb0/0xb0 [ 819.329151] [<ffffffff813af5d6>] bus_for_each_dev+0x56/0x90 [ 819.335489] [<ffffffff813b0d7e>] driver_attach+0x1e/0x20 [ 819.341534] [<ffffffff813b0980>] bus_add_driver+0x1b0/0x2a0 [ 819.347884] [<ffffffffa0347000>] ? 
0xffffffffa0346fff [ 819.353641] [<ffffffff813b19f6>] driver_register+0x76/0x140 [ 819.359980] [<ffffffff8159f18b>] ? printk+0x51/0x53 [ 819.365524] [<ffffffffa0347000>] ? 0xffffffffa0346fff [ 819.371291] [<ffffffff812f5896>] __pci_register_driver+0x56/0xd0 [ 819.378096] [<ffffffffa0347054>] sbridge_init+0x54/0x1000 [sb_edac] [ 819.385231] [<ffffffff8100203f>] do_one_initcall+0x3f/0x170 [ 819.391577] [<ffffffff810bcd2e>] sys_init_module+0xbe/0x230 [ 819.397926] [<ffffffff815bb529>] system_call_fastpath+0x16/0x1b [ 819.404633] ---[ end trace 1654fdd39556689f ]--- This happens because the bus is not being properly initialized. Instead of putting the memory sub-devices inside the memory controller, it is putting everything under the same directory: $ tree /sys/bus/edac/ /sys/bus/edac/ ├── devices │ ├── all_channel_counts -> ../../../devices/system/edac/mc/mc0/all_channel_counts │ ├── csrow0 -> ../../../devices/system/edac/mc/mc0/csrow0 │ ├── csrow1 -> ../../../devices/system/edac/mc/mc0/csrow1 │ ├── csrow2 -> ../../../devices/system/edac/mc/mc0/csrow2 │ ├── dimm0 -> ../../../devices/system/edac/mc/mc0/dimm0 │ ├── dimm1 -> ../../../devices/system/edac/mc/mc0/dimm1 │ ├── dimm3 -> ../../../devices/system/edac/mc/mc0/dimm3 │ ├── dimm6 -> ../../../devices/system/edac/mc/mc0/dimm6 │ ├── inject_addrmatch -> ../../../devices/system/edac/mc/mc0/inject_addrmatch │ ├── mc -> ../../../devices/system/edac/mc │ └── mc0 -> ../../../devices/system/edac/mc/mc0 ├── drivers ├── drivers_autoprobe ├── drivers_probe └── uevent On a multi-memory controller system, the names "csrow%d" and "dimm%d" should be under "mc%d", and not at the main hierarchy level. So, we need to create a per-MC bus, in order to have its own namespace. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Greg K H <gregkh@linuxfoundation.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
8447c4d1 |
|
06-Jun-2012 |
Chris Metcalf <cmetcalf@tilera.com> |
edac: Do alignment logic properly in edac_align_ptr() The logic was checking the sizeof the structure being allocated to determine whether an alignment fixup was required. This isn't right; what we actually care about is the alignment of the actual pointer that's about to be returned. This became an issue recently because struct edac_mc_layer has a size that is not zero modulo eight, so we were taking the correctly-aligned pointer and forcing it to be misaligned. On Tile this caused an alignment exception. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
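The difference between the two alignment checks described in commit 8447c4d1 can be sketched in plain userspace C. This is a minimal illustration, not the kernel's edac_align_ptr(): the helper names `fixup_by_size` and `fixup_by_pointer` are hypothetical, and addresses are modeled as plain integers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the older approach: the fixup is derived from
 * the size of the structure, regardless of where the pointer really is.
 */
static uintptr_t fixup_by_size(uintptr_t addr, size_t size, size_t align)
{
	size_t r = size % align;

	return r ? addr + (align - r) : addr;
}

/*
 * Hypothetical sketch of the corrected approach: the fixup is derived
 * from the address that is about to be returned.
 */
static uintptr_t fixup_by_pointer(uintptr_t addr, size_t align)
{
	size_t r = (size_t)(addr % align);

	return r ? addr + (align - r) : addr;
}
```

With an already aligned address of 16, an alignment of 8 and a 12-byte structure, the size-based check moves the address to 20 (misaligned), while the pointer-based check leaves it at 16.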
#
fd687502 |
|
16-Mar-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: Rename the parent dev to pdev As EDAC doesn't use struct device itself, it created a parent dev pointer called as "pdev". Now that we'll be converting it to use struct device, instead of struct devsys, this needs to be fixed. No functional changes. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
53f2d028 |
|
23-Feb-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
RAS: Add a tracepoint for reporting memory controller events Add a new tracepoint-based hardware events report method for reporting Memory Controller events. Part of the description below is shamelessly copied from Tony Luck's notes about the Hardware Error BoF during LPC 2010 [1]. Tony, thanks for your notes and discussions to generate the h/w error reporting requirements. [1] http://lwn.net/Articles/416669/ We have several subsystems & methods for reporting hardware errors: 1) EDAC ("Error Detection and Correction"). In its original form this consisted of a platform specific driver that read topology information and error counts from chipset registers and reported the results via a sysfs interface. 2) mcelog - x86 specific decoding of machine check bank registers reporting in binary form via /dev/mcelog. Recent additions make use of the APEI extensions that were documented in version 4.0a of the ACPI specification to acquire more information about errors without having to rely on reading chipset registers directly. A user-level program decodes into somewhat human readable format. 3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and decodes errors reported via machine check bank registers in AMD processors to the console log using printk(); Each of these mechanisms has a band of followers ... and none of them appear to meet all the needs of all users. As part of a RAS subsystem, let's encapsulate the memory error hardware events into a trace facility. The tracepoint printk will be displayed like: mc_event: [quant] (Corrected|Uncorrected|Fatal) error:[error msg] on [label] ([location] [edac_mc detail] [driver_detail]) Where: [quant] is the quantity of errors [error msg] is the driver-specific error message (e. g. 
"memory read", "bus error", ...); [location] is the location in terms of memory controller and branch/channel/slot, channel/slot or csrow/channel; [label] is the memory stick label; [edac_mc detail] describes the address location of the error and the syndrome; [driver detail] is driver-specific error message details, when needed/provided (e. g. "area:DMA", ...) For example: mc_event: 1 Corrected error:memory read on memory stick DIMM_1A (mc:0 location:0:0:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA) Of course, any userspace tools meant to handle errors should not parse the above data. They should, instead, use the binary fields provided by the tracepoint, mapping them directly into their Management Information Base. NOTE: The original patch was providing an additional mechanism for MCA-based trace events that also contained MCA error register data. However, as no agreement was reached so far for the MCA-based trace events, for now, let's add events only for memory errors. A later patch is planned to change the tracepoint, for those types of event. Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
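The human-readable layout that commit 53f2d028 describes for mc_event can be reproduced with a small userspace sketch. This is not the kernel tracepoint definition: `format_mc_event` and its parameter names are hypothetical, and the field grouping is only an assumption read off the commit text.

```c
#include <stdio.h>

/*
 * Hypothetical sketch: assemble the display string from the fields the
 * commit message lists. Returns what snprintf() returns.
 */
static int format_mc_event(char *buf, size_t len, unsigned int quant,
			   const char *severity, const char *error_msg,
			   const char *label, const char *location,
			   const char *edac_mc_detail,
			   const char *driver_detail)
{
	return snprintf(buf, len, "mc_event: %u %s error:%s on %s (%s %s %s)",
			quant, severity, error_msg, label, location,
			edac_mc_detail, driver_detail);
}
```

As the commit itself stresses, tools should consume the binary tracepoint fields rather than parse a string like this one; the sketch only makes the field order concrete.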
#
5926ff50 |
|
09-Feb-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: Initialize the dimm label with the known information While userspace doesn't fill the dimm labels, add there the dimm location, as described by the used memory model. This could eventually match what is described by dmidecode, making it easier for people to identify the memory. For example, on an Intel motherboard where the DMI table is reliable, the first memory stick is described as: Memory Device Array Handle: 0x0029 Error Information Handle: Not Provided Total Width: 64 bits Data Width: 64 bits Size: 2048 MB Form Factor: DIMM Set: 1 Locator: A1_DIMM0 Bank Locator: A1_Node0_Channel0_Dimm0 Type: <OUT OF SPEC> Type Detail: Synchronous Speed: 800 MHz Manufacturer: A1_Manufacturer0 Serial Number: A1_SerNum0 Asset Tag: A1_AssetTagNum0 Part Number: A1_PartNum0 The memory named as "A1_DIMM0" is physically located at the first memory controller (node 0), at channel 0, dimm slot 0. After this patch, the memory label will be filled with: /sys/devices/system/edac/mc/csrow0/ch0_dimm_label:mc#0channel#0slot#0 And (after the new EDAC API patches) as: /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:mc#0channel#0slot#0 So, even if the memory label is not initialized by userspace, useful information about the error location is filled there, especially since several systems/motherboards are provided with enough info to map from channel/slot (or branch/channel/slot) into the DIMM label. So, letting the EDAC core fill it by default is a good thing. It should be noted that, as the label filling happens in edac_mc_alloc(), drivers can override it to better describe the memories (and some actually do it). Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
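A default label such as the "mc#0channel#0slot#0" shown in commit 5926ff50 can be built by concatenating each level's name and index. The following is a userspace sketch under that assumption; `struct sketch_layer_pos` and `build_default_label` are hypothetical names, not the code in edac_mc_alloc().

```c
#include <stdio.h>

/* Hypothetical description of one level of the memory location. */
struct sketch_layer_pos {
	const char *name;	/* e.g. "mc", "channel", "slot" */
	unsigned int index;
};

/*
 * Hypothetical sketch: append "<name>#<index>" for every level.
 * Returns the label length, or -1 if the buffer is too small.
 */
static int build_default_label(char *buf, size_t len,
			       const struct sketch_layer_pos *pos, size_t n)
{
	size_t used = 0;
	size_t i;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	for (i = 0; i < n; i++) {
		int w = snprintf(buf + used, len - used, "%s#%u",
				 pos[i].name, pos[i].index);

		if (w < 0 || (size_t)w >= len - used)
			return -1;
		used += (size_t)w;
	}

	return (int)used;
}
```

A driver that knows the real silkscreen names could still overwrite the result afterwards, which matches the override behavior the commit message mentions.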
#
ca0907b9 |
|
02-May-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: Remove the legacy EDAC ABI Now that all drivers got converted to use the new ABI, we can drop the old one. Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
4275be63 |
|
18-Apr-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: Change internal representation to work with layers Change the EDAC internal representation to work with non-csrow based memory controllers. There are lots of those memory controllers nowadays, and more are coming. So, the EDAC internal representation needs to be changed, in order to work with those memory controllers, while preserving backward compatibility with the old ones. The edac core was written with the idea that memory controllers are able to directly access csrows. This is not true for FB-DIMM and RAMBUS memory controllers. Also, some recent advanced memory controllers don't present a per-csrows view. Instead, they view memories as DIMMs rather than ranks. So, change the allocation and error report routines to allow them to work with all types of architectures. This will allow the removal of several hacks with FB-DIMM and RAMBUS memory controllers. Also, several tests were done on different platforms using different x86 drivers. TODO: multi-rank DIMMs are currently represented by multiple DIMM entries in struct dimm_info. That means that changing a label for one rank won't change the same label for the other ranks of the same DIMM. This bug has been present since the beginning of EDAC, so it is not a big deal. However, on several drivers, it is possible to fix this issue, but it should be a per-driver fix, as the csrow => DIMM arrangement may not be equal for all. So, don't try to fix it here yet. I tried to make this patch as short as possible, preceding it with several other patches that simplified the logic here. Yet, as the internal API changes, all drivers need changes. The changes are generally bigger in the drivers for FB-DIMMs. Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." 
<arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
93e4fe64 |
|
16-Apr-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: rewrite edac_align_ptr() The edac_align_ptr() function is used to prepare data for a single memory allocation kzalloc() call. It counts how many bytes are needed by some data structure. Using it as-is is not that trivial, as the quantity of memory elements reserved is not passed to it but, instead, appears on the next call. In order to avoid mistakes when using it, move the number of allocated elements into it, making it easier to use. Reviewed-by: Borislav Petkov <bp@amd64.org> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
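The idea behind commit 93e4fe64, sizing one allocation by reserving each region together with its element count, can be sketched in userspace C as follows. `reserve_region` is a hypothetical helper that works on offsets, not the kernel's edac_align_ptr() signature.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: reserve n_elems items of the given size inside a
 * single future allocation. *total is the running byte count; the return
 * value is the aligned offset where the new region starts.
 */
static size_t reserve_region(size_t *total, size_t size, size_t align,
			     size_t n_elems)
{
	size_t r = *total % align;
	size_t offset = r ? *total + (align - r) : *total;

	/* The element count is consumed here, not by the next call. */
	*total = offset + size * n_elems;

	return offset;
}
```

After all regions are reserved, `*total` holds the size to pass to one allocation call, and each returned offset can be added to the base pointer to locate its region.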
#
a895bf8b |
|
28-Jan-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: move nr_pages to dimm struct The number of pages is a dimm property. Move it to the dimm struct. After this change, it is possible to add sysfs nodes for the DIMM's that will properly represent the DIMM stick properties, including its size. A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when the memory controller represents the memory via chip select rows. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
084a4fcc |
|
27-Jan-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: move dimm properties to struct dimm_info On systems based on chip select rows, all channels need to use memories with the same properties, otherwise the memories on channels A and B won't be recognized. However, such assumption is not true for all types of memory controllers. Controllers for FB-DIMM's don't have such requirements. Also, modern Intel controllers seem to be capable of handling such differences. So, we need to get rid of storing the DIMM information into a per-csrow data, storing it, instead at the right place. The first step is to move grain, mtype, dtype and edac_mode to the per-dimm struct. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Reviewed-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: James Bottomley <James.Bottomley@parallels.com> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Mike Williams <mike@mikebwilliams.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
a7d7d2e1 |
|
27-Jan-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: Create a dimm struct and move the labels into it The way a DIMM is currently represented implies that they're linked into a per-csrow struct. However, some drivers don't see csrows, as they're hidden behind some chip like the AMB's on FBDIMM's, for example. This forced drivers to fake^Wvirtualize a csrow struct, and to make a mess of the original csrow/channel concept. Move the DIMM labels into a per-DIMM struct, and add there the real location of the socket, in terms of csrow/channel. Later patches will modify the location to properly represent the memory architecture. All other drivers will use a per-csrow type of location. Some of those drivers will require a later conversion, as they also fake the csrows internally. TODO: While this patch doesn't change the existing behavior, on csrows-based memory controllers, a csrow/channel pair points to a memory rank. There's a known bug in the EDAC core that allows having different labels for the same DIMM, if it has more than one rank. A later patch is needed to merge the several ranks for a DIMM into the same dimm_info struct, in order to avoid having different labels for the same DIMM. The edac_mc_alloc() will now contain a per-dimm initialization loop that will be changed by later patches in order to match other types of memory architectures. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Reviewed-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
a4b4be3f |
|
27-Jan-2012 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac: rename channel_info to rank_info What a csrow/channel vector points to is rank information, and not channel information. On a traditional architecture, the memory controller directly accesses the memory ranks, via chip select rows. Different ranks at the same DIMM are selected via different chip select rows. So, typically, one csrow/channel pair means one different DIMM. On FB-DIMMs, there's a microcontroller chip at the DIMM, called Advanced Memory Buffer (AMB) that serves as the interface between the memory controller and the memory chips. The AMB selection is via the DIMM slot, and not via a csrow. It is up to the AMB to talk with the csrows of the DRAM chips. So, the FB-DIMM memory controllers see the DIMM slot, and not the DIMM rank. RAMBUS is similar. Newer memory controllers, like the ones found on Intel Sandy Bridge and Nehalem, even working with normal DDR3 DIMM's, don't use the usual channel A/channel B interleaving schema to provide 128 bits data access. Instead, they have more channels (3 or 4 channels), and they can use several interleaving schemas. Such memory controllers see the DIMMs directly on their registers, instead of the ranks, which is better for the driver, as its main usage is to point to a broken DIMM stick (the Field Replaceable Unit), and not to point to a broken DRAM chip. The drivers that support such newer memory architecture models currently need to fake information and to abuse EDAC structures, as the subsystem was conceived with the idea that the csrow would always be visible by the CPU. To make things a little worse, those drivers don't currently fake csrows/channels in a consistent way, as the concepts there don't apply to the memory controllers they're talking with. So, each driver author interpreted the concepts using a different logic. In order to fix it, let's rename the data structure that points into a DIMM rank to "rank_info", in order to be clearer about what's stored there. 
Later patches will provide a better way to represent the memory hierarchy for the other types of memory controller. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
4e5df7ca |
|
25-Nov-2011 |
Cong Wang <amwang@redhat.com> |
edac: remove the second argument of k[un]map_atomic() Signed-off-by: Cong Wang <amwang@redhat.com>
|
#
fe5ff8b8 |
|
14-Dec-2011 |
Kay Sievers <kay.sievers@vrfy.org> |
edac: convert sysdev_class to a regular subsystem After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Cc: Doug Thompson <dougthompson@xmission.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
e2e77098 |
|
26-May-2011 |
Lai Jiangshan <laijs@cn.fujitsu.com> |
edac,rcu: use synchronize_rcu() instead of call_rcu()+rcu_barrier() synchronize_rcu() does the stuff as needed. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
25985edc |
|
30-Mar-2011 |
Lucas De Marchi <lucas.demarchi@profusion.mobi> |
Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
|
#
24f9a7fe |
|
07-Oct-2010 |
Borislav Petkov <borislav.petkov@amd.com> |
amd64_edac: Rework printk macros Add a macro per printk level, shorten up error messages. Add relevant information to KERN_INFO level. No functional change. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
#
bb31b312 |
|
02-Dec-2010 |
Borislav Petkov <borislav.petkov@amd.com> |
EDAC: Fix workqueue-related crashes 00740c58541b6087d78418cebca1fcb86dc6077d changed edac_core to un-/register a workqueue item only if a lowlevel driver supplies a polling routine. Normally, when we remove a polling low-level driver, we go and cancel all the queued work. However, the workqueue unreg happens based on the ->op_state setting, and edac_mc_del_mc() sets this to OP_OFFLINE _before_ we cancel the work item, leading to NULL ptr oops on the workqueue list. Fix it by putting the unreg stuff in proper order. Cc: <stable@kernel.org> #36.x Reported-and-tested-by: Tobias Karnat <tobias.karnat@googlemail.com> LKML-Reference: <1291201307.3029.21.camel@Tobias-Karnat> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
#
accf74ff |
|
16-Aug-2010 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
i7core_edac: don't use a freed mci struct This is a nasty bug. Since the kobject count will be reduced to zero by edac_mc_del_mc(), and this triggers the kobj release method, the mci memory will be freed automatically. So, all we have left is ctl_name, as shown by enabling debug: [ 80.822186] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device() remove_link [ 80.832590] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device() remove_mci_instance [ 80.843776] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing [ 80.855163] EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:3f:03.0 [ 80.862936] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2089: (null): free structs [ 80.871134] EDAC DEBUG: in drivers/edac/edac_mc.c, line at 238: edac_mc_free() [ 80.878379] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 726: edac_mc_unregister_sysfs_main_kobj() [ 80.888043] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1232: drivers/edac/i7core_edac.c: i7core_put_devices() Also, kfree(mci) shouldn't happen at the kobj.release, as it happens when edac_remove_sysfs_mci_device() is called, but the logic is: edac_remove_sysfs_mci_device(mci); edac_printk(KERN_INFO, EDAC_MC, "Removed device %d for %s %s: DEV %s\n", mci->mc_idx, mci->mod_name, mci->ctl_name, edac_dev_name(mci)); So, as the edac_printk() needs the mci struct, this generates an OOPS. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
bbc560ae |
|
16-Aug-2010 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac_core: Print debug messages at release calls This is important to track a nasty bug at the free logic. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
6fe1108f |
|
11-Aug-2010 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
edac_core: Do a better job with node removal Make sure we remove groups in the right order. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
939747bd |
|
10-Aug-2010 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
i7core_edac: Be sure that the edac pci handler will be properly released With multi-sockets, more than one edac pci handler is enabled. Be sure to un-register all instances. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
#
00740c58 |
|
25-Sep-2010 |
Borislav Petkov <borislav.petkov@amd.com> |
amd64_edac: Fix driver module removal f4347553b30ec66530bfe63c84530afea3803396 removed the edac polling mechanism in favor of using a notifier chain for conveying MCE information to edac. However, the module removal path didn't test whether the driver had setup the polling function workqueue at all and the rmmod process was hanging in the kernel at try_to_del_timer_sync() in the cancel_delayed_work() path, trying to cancel an uninitialized work struct. Fix that by adding a balancing check to the workqueue removal path. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
#
239642fe |
|
12-Nov-2009 |
Borislav Petkov <borislav.petkov@amd.com> |
edac: add memory types strings for debugging Instead of using deeply-nested conditionals for dumping the DIMM type in debug mode, add a strings array of the supported DIMM types. This is useful in cases where an edac driver supports multiple DRAM types and is only defined in debug builds. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
#
458e5ff1 |
|
23-Sep-2009 |
Jesper Dangaard Brouer <hawk@comx.dk> |
edac: core: remove completion-wait for complete with rcu_barrier Module edac_core.ko uses call_rcu() callbacks in edac_device.c, edac_mc.c and edac_pci.c. They all use a wait_for_completion() scheme, but this scheme is not 100% safe on multiple CPUs. See the _rcu_barrier() implementation which explains why extra precaution is needed. The patch adds a comment about rcu_barrier() and as a precaution calls rcu_barrier(). A maintainer needs to look at removing the wait_for_completion code. [dougthompson@xmission.com: remove the wait_for_completion code] Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fbeb4384 |
|
13-Apr-2009 |
Jean Delvare <khali@linux-fr.org> |
edac: use to_delayed_work() The edac-core driver includes code which assumes that the work_struct which is included in every delayed_work is the first member of that structure. This is currently the case but might change in the future, so use to_delayed_work() instead, which doesn't make such an assumption. linux-2.6.30-rc1 has the to_delayed_work() function that will allow this patch to work Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
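A userspace sketch of why the cast is fragile, assuming simplified stand-in struct layouts (the real kernel structures differ, and cast_delayed_work() is a hypothetical name for the old pattern). The embedded work_struct is deliberately not the first member here, so a plain cast yields the wrong address while a container_of()-style helper does not.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; layouts are illustrative only. */
struct work_struct {
    int pending;
};

struct delayed_work {
    long timer_expires;        /* placed first on purpose */
    struct work_struct work;   /* so this member is not at offset 0 */
};

/* Same idea as the kernel's container_of(): step back from a member
 * pointer to the structure that embeds it. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Layout-independent conversion, mirroring what to_delayed_work() does. */
static struct delayed_work *to_delayed_work(struct work_struct *work)
{
    return container_of(work, struct delayed_work, work);
}

/* The fragile pattern: only correct if 'work' is the first member. */
static struct delayed_work *cast_delayed_work(struct work_struct *work)
{
    return (struct delayed_work *)work;
}
```

With this layout, to_delayed_work(&dw.work) returns &dw, whereas cast_delayed_work(&dw.work) returns a pointer into the middle of the structure.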
|
#
281efb17 |
|
06-Jan-2009 |
Kay Sievers <kay.sievers@vrfy.org> |
edac: struct device: replace bus_id with dev_name(), dev_set_name() This patch is part of a larger patch series which will remove the "char bus_id[20]" name string from struct device. The device name is managed in the kobject anyway, and without any size limitation, and just needlessly copied into "struct device". [akpm@linux-foundation.org: coding-style fixes] Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
17aa7e03 |
|
04-May-2008 |
Stephen Rothwell <sfr@canb.auug.org.au> |
dev_name introduction fall out fix Commit 06916639e2fed9ee475efef2747a1b7429f8fe76 ("driver-core: add dev_name() to help transition away from using bus_id") added a static inline dev_name() and used it in dev_printk. Unfortunately, drivers/edac/edac_core.h defines a macro called dev_name(). Rename the latter. Diagnosis by Tony Breeds and Michael Ellerman. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1a45027d |
|
29-Apr-2008 |
Adrian Bunk <bunk@kernel.org> |
edac: remove unneeded functions and add static accessor Collection of patches, merged into one, from Adrian that do the following: 1) This patch makes the following needlessly global functions static: - edac_pci_get_log_pe() - edac_pci_get_log_npe() - edac_pci_get_panic_on_pe() - edac_pci_unregister_sysfs_instance_kobj() - edac_pci_main_kobj_setup() 2) Remove unneeded function edac_device_find() 3) Added #if 0 around function edac_pci_find() 4) make the needlessly global edac_pci_generic_check() static 5) Removed function edac_check_mc_devices() Doug Thompson modified Adrian's patches, to better represent the direction of EDAC, and make them one patch. Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ff6ac2a6 |
|
29-Apr-2008 |
Robert P. J. Day <rpjday@crashcourse.ca> |
edac: use the shorter LIST_HEAD for brevity Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Acked-by: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
bce19683 |
|
26-Jul-2007 |
Doug Thompson <dougthompson@xmission.com> |
drivers/edac: fix reset edac_mc pollmsec This fixes a deadlock that could occur on a 'setup' and 'teardown' sequence of the workq for an edac_mc control structure instance. A similar fix was previously implemented for the edac_device code. In addition, the edac_mc device code there was missing code to allow the workq period value to be altered via sysfs control. This patch adds that fix on the code, and allows for the changing of the period value as well. Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
bf52fa4a |
|
19-Jul-2007 |
Doug Thompson <dougthompson@xmission.com> |
drivers/edac: fix workq reset deadlock Fix mutex locking deadlock on the device controller linked list. Was calling a lock then a function that could call the same lock. Moved the cancel workq function to outside the lock Added some short circuit logic in the workq code Added comments of description Code tidying Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: Greg KH <greg@kroah.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
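The locking mistake described above (taking a lock, then calling a function that takes the same lock) can be sketched in userspace with a toy lock that counts re-entry attempts instead of hanging. All names here (toy_lock, cancel_workq, teardown_buggy, teardown_fixed) are hypothetical and do not come from the EDAC source; the sketch only shows why moving the cancel call outside the locked region removes the self-deadlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: counts attempts to take it while it is already
 * held. A real mutex would block forever at that point. */
struct toy_lock {
    bool held;
    int self_deadlocks;
};

static bool toy_lock_acquire(struct toy_lock *l)
{
    if (l->held) {
        l->self_deadlocks++;
        return false;
    }
    l->held = true;
    return true;
}

static void toy_lock_release(struct toy_lock *l)
{
    l->held = false;
}

/* Hypothetical helper that takes the list lock internally. */
static void cancel_workq(struct toy_lock *list_lock)
{
    if (toy_lock_acquire(list_lock))
        toy_lock_release(list_lock);
}

/* Buggy ordering: the helper is called with the lock already held. */
static void teardown_buggy(struct toy_lock *list_lock)
{
    toy_lock_acquire(list_lock);
    cancel_workq(list_lock);
    toy_lock_release(list_lock);
}

/* Fixed ordering: do the list work under the lock, release it, and only
 * then call the helper that needs the same lock. */
static void teardown_fixed(struct toy_lock *list_lock)
{
    toy_lock_acquire(list_lock);
    /* ... unlink the device from the list here ... */
    toy_lock_release(list_lock);
    cancel_workq(list_lock);
}
```

Running teardown_buggy() records one re-entry attempt, which is where a real mutex would deadlock; teardown_fixed() records none.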
|
#
8096cfaf |
|
19-Jul-2007 |
Doug Thompson <dougthompson@xmission.com> |
drivers/edac: fix edac_mc sysfs completion code This patch refactors the 'releasing' of kobjects for the edac_mc type of device. The correct pattern of kobject release is followed. As internal kobjs are allocated they bump a ref count on the top level kobj. It in turn has a module ref count on the edac_core module. When internal kobjects are released, they dec the ref count on the top level kobj. When the top level kobj reaches zero, it decrements the ref count on the edac_core object, allow it to be unloaded, as all resources have all now been released. Cc: Alan Cox alan@lxorguk.ukuu.org.uk Signed-off-by: Doug Thompson <dougthompson@xmission.com> Acked-by: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
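The release pattern described above can be sketched as a userspace model, assuming hypothetical names (top_kobj, child_kobj, module_pinned) rather than the real kobject API. Each child takes a reference on the top-level object when it is created and drops it when it is released; the module pin is only given up once the top-level count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the top-level kobject. */
struct top_kobj {
    int refcount;
    bool module_pinned; /* stands in for the module reference count */
};

/* Hypothetical model of an internal (child) kobject. */
struct child_kobj {
    struct top_kobj *parent;
    bool live;
};

static void top_init(struct top_kobj *top)
{
    top->refcount = 1;        /* the creator's own reference */
    top->module_pinned = true;
}

/* Drop one reference; the last put releases the module pin. */
static void top_put(struct top_kobj *top)
{
    assert(top->refcount > 0);
    if (--top->refcount == 0)
        top->module_pinned = false;
}

/* Creating a child bumps the top-level reference count. */
static void child_create(struct child_kobj *c, struct top_kobj *top)
{
    top->refcount++;
    c->parent = top;
    c->live = true;
}

/* Releasing a child drops the reference it took on the top level. */
static void child_release(struct child_kobj *c)
{
    c->live = false;
    top_put(c->parent);
}
```

In this model the module stays pinned until every child has been released, even if the creator drops its own reference first, which is the ordering guarantee the commit message describes.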
|
#
b8f6f975 |
|
19-Jul-2007 |
Doug Thompson <dougthompson@xmission.com> |
drivers/edac: fix edac_mc init apis Refactoring of sysfs code necessitated the refactoring of the edac_mc_alloc() and edac_mc_add_mc() apis, of moving the index value to the alloc() function. This patch alters the in tree drivers to utilize this new api signature. Having the index value performed later created a chicken-and-the-egg issue. Moving it to the alloc() function allows for creating the necessary sysfs entries with the proper index number Cc: Alan Cox alan@lxorguk.ukuu.org.uk Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7391c6dc |
|
19-Jul-2007 |
Douglas Thompson <dougthompson@xmission.com> |
drivers/edac: mod edac_align_ptr function Refactor the edac_align_ptr() function to reduce the noise of casting the aligned pointer to the various types of data objects and modified its callers to its new signature Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
052dfb45 |
|
19-Jul-2007 |
Douglas Thompson <dougthompson@xmission.com> |
drivers/edac: cleanup spaces-gotos after Lindent messup This patch fixes some remnant spaces inserted by the use of Lindent. Seems Lindent adds some spaces when it shouldn't. These have been fixed. In addition, goto targets have issues, these have been fixed in this patch. Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
86aa8cb7 |
|
19-Jul-2007 |
Douglas Thompson <dougthompson@xmission.com> |
drivers/edac: cleanup workq ifdefs The origin of this code comes from patches at sourceforge, that allow EDAC to be updated to various kernels. With kernel version 2.6.20 a new workq system was installed, thus the patches needed to be modified based on the kernel version. For submitting to the latest kernel.org those #ifdefs are removed Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
079708b9 |
|
19-Jul-2007 |
Douglas Thompson <dougthompson@xmission.com> |
drivers/edac: core Lindent cleanup Run the EDAC CORE files through Lindent for cleanup Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4de78c68 |
|
19-Jul-2007 |
Dave Jiang <djiang@mvista.com> |
drivers/edac: mod PCI poll names Fixup poll values for MC and PCI. Also make mc function names unique to mc. Signed-off-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Douglas Thompson <dougthompson@xmissin.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
66ee2f94 |
|
19-Jul-2007 |
Dave Jiang <djiang@mvista.com> |
drivers/edac: mod assert_error check Change error check and clear variable from an atomic to an int Signed-off-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
81d87cb1 |
|
19-Jul-2007 |
Dave Jiang <djiang@mvista.com> |
drivers/edac: mod MC to use workq instead of kthread Move the memory controller object to work queue based implementation from the kernel thread based. Signed-off-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c4192705 |
|
19-Jul-2007 |
Dave Jiang <djiang@mvista.com> |
drivers/edac: add dev_name getter function Move dev_name() macro to a more generic interface since it's not possible to determine whether a device is pci, platform, or of_device easily. Now each low level driver sets the name into the control structure, and the EDAC core references the control structure for the information. Better abstraction. Signed-off-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
20bcb7a8 |
|
19-Jul-2007 |
Douglas Thompson <dougthompson@xmission.com> |
drivers/edac: mod use edac_core.h In the refactoring of edac_mc.c into several subsystem files, the header file edac_mc.h became meaningless. A new header file edac_core.h was created. All the files that previously included "edac_mc.h" are changed to include "edac_core.h". Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c0d12172 |
|
19-Jul-2007 |
Dave Jiang <djiang@mvista.com> |
drivers/edac: add new nmi rescan Provides a way for NMI reported errors on x86 to notify the EDAC subsystem pending ECC errors by writing to a software state variable. Here's the reworked patch. I added an EDAC stub to the kernel so we can have variables that are in the kernel even if EDAC is a module. I also implemented the idea of using the chip driver to select error detection mode via module parameter and eliminate the kernel compile option. Please review/test. Thx! Also, I only made changes to some of the chipset drivers since I am unfamiliar with the other ones. We can add similar changes as we go. Signed-off-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
63b7df91 |
|
19-Jul-2007 |
Matthias Kaehlcke <matthias.kaehlcke@gmail.com> |
drivers/edac: change from semaphore to mutex operation The EDAC core code uses a semaphore as mutex. Use the mutex API instead of the (binary) semaphore. Matthias wrote this, but since I had some patches ahead of it, I need to modify it to follow my patches. Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com> Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e27e3dac |
|
19-Jul-2007 |
Douglas Thompson <dougthompson@xmission.com> |
drivers/edac: add edac_device class This patch adds the new 'class' of object to be managed, named: 'edac_device'. As a peer of the 'edac_mc' class of object, it provides a non-memory centric view of an ERROR DETECTING device in hardware. It provides a sysfs interface and an abstraction for various EDAC type devices. Multiple 'instances' within the class are possible, with each 'instance' able to have multiple 'blocks', and each 'block' having 'attributes'. At the 'block' level there are the 'ce_count' and 'ue_count' fields which the device driver can update and/or call edac_device_handle_XX() functions. At each higher level are additional 'total' count fields, which are a summation of counts below that level. This 'edac_device' has been used to capture and present ECC errors which are found in an L1 and L2 system on a per CORE/CPU basis. Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7c9281d7 |
|
19-Jul-2007 |
Douglas Thompson <dougthompson@xmission.com> |
drivers/edac: split out functions to unique files This is a large patch to refactor the original EDAC module in the kernel and to break it up into better file granularity, such that each source file contains a given subsystem of the EDAC CORE. Originally, the EDAC 'core' was contained in one source file: edac_mc.c with it corresponding edac_mc.h file. Now, there are the following files: edac_module.c The main module init/exit function and other overhead edac_mc.c Code handling the edac_mc class of object edac_mc_sysfs.c Code handling for sysfs presentation edac_pci_sysfs.c Code handling for PCI sysfs presentation edac_core.h CORE .h include file for 'edac_mc' and 'edac_device' drivers edac_module.h Internal CORE .h include file This forms a foundation upon which a later patch can create the 'edac_device' class of object code in a new file 'edac_device.c'. Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2da1c119 |
|
19-Jul-2007 |
Adrian Bunk <bunk@stusta.de> |
drivers/edac: core: make functions static This patch makes needlessly global code static, in the edac core Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Doug Thompson <norsk5@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5da0831c |
|
19-Jul-2007 |
Douglas Thompson <dougthompson@xmission.com> |
drivers/edac: add edac_mc_find API This simple patch adds an important CORE API for EDAC that EDAC drivers can use to find their edac_mc control structure by passing a mem_ctl_info 'instance' value Needed for subsequent patches Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
83144186 |
|
17-Jul-2007 |
Rafael J. Wysocki <rjw@rjwysocki.net> |
Freezer: make kernel threads nonfreezable by default Currently, the freezer treats all tasks as freezable, except for the kernel threads that explicitly set the PF_NOFREEZE flag for themselves. This approach is problematic, since it requires every kernel thread to either set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't care for the freezing of tasks at all. It seems better to only require the kernel threads that want to or need to be frozen to use some freezer-related code and to remove any freezer-related code from the other (nonfreezable) kernel threads, which is done in this patch. The patch causes all kernel threads to be nonfreezable by default (ie. to have PF_NOFREEZE set by default) and introduces the set_freezable() function that should be called by the freezable kernel threads in order to unset PF_NOFREEZE. It also makes all of the currently freezable kernel threads call set_freezable(), so it shouldn't cause any (intentional) change of behaviour to appear. Additionally, it updates documentation to describe the freezing of tasks more accurately. [akpm@linux-foundation.org: build fixes] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: Pavel Machek <pavel@ucw.cz> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9794f33d |
|
12-Feb-2007 |
eric wollesen <ericw@xmtp.net> |
[PATCH] EDAC: Add Fully-Buffered DIMM APIs to core Eric Wollesen ported the Bluesmoke Memory Controller driver for the Intel 5000X/V/P (Blackford/Greencreek) chipset to the in kernel EDAC model. This patch incorporates those required changes to the edac_mc.c and edac_mc.h core files by added new Fully Buffered DIMM interface to the EDAC Core module. Signed-off-by: eric wollesen <ericw@xmtp.net> Signed-off-by: doug thompson <norsk5@xmission.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4f423ddf |
|
12-Feb-2007 |
Frithiof Jensen <frithiof.jensen@ericson.com> |
[PATCH] EDAC: Add memory scrubbing controls API to core This is an attempt of providing an interface for memory scrubbing control in EDAC. This patch modifies the EDAC Core to provide the Interface for memory controller modules to implment. The following things are still outstanding: - K8 is the first implemenation, The patch provide a method of configuring the K8 hardware memory scrubber via the 'mcX' sysfs directory. There should be some fallback to a generic scrubber implemented in software if the hardware does not support scrubbing. Or .. the scrubbing sysfs entry should not be visible at all. - Only works with SDRAM, not cache, The K8 can scrub cache and l2cache also - but I think this is not so useful as the cache is busy all the time (one hopes). One would also expect that cache scrubbing requires hardware support. - Error Handling, I would like that errors are returned to the user in "terms of file system". - Presentation, I chose Bandwidth in Bytes/Second as a representation of the scrubbing rate for the following reasons: I like that the sysfs entries are sort-of textual, related to something that makes sense instead of magical values that must be looked up. "My People" wants "% main memory scrubbed per hour" others prefer "% memory bandwidth used" as representation, "bandwith used" makes it easy to calculate both versions in one-liner scripts. If one later wants to scrub cache, the scaling becomes wierd for K8 changing from "blocks of 64 byte memory" to "blocks of 64 cache lines" to "blocks of 64 bit". Using "bandwidth used" makes sense in all three cases, (I.M.O. anyway ;-). - Discovery, There is no way to discover the possible settings and what they do without reading the code and the documentation. *I* do not know how to make that work in a practical way. - Bugs(??), other tools can set invalid values in the memory scrub control register, those will read back as '-1', requiring the user to reset the scrub rate. This is how *I* think it should be. 
- Afflicting other areas of code, I made changes to edac_mc.c and edac_mc.h which will show up globally - this is not nice, it would be better that the memory scrubbing functionality and interface could be entirely contained within the memory controller it applies to. Frithiof Jensen edac_mc.c and its .h file is a CORE helper module for EDAC driver modules. This provides the abstraction for device specific drivers. It is fine to modify this CORE to provide help for new features of the drivers doug thompson Signed-off-by: Frithiof Jensen <frithiof.jensen@ericson.com> Signed-off-by: doug thompson <norsk5@xmission.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7dfb7103 |
|
06-Dec-2006 |
Nigel Cunningham <ncunningham@linuxmail.org> |
[PATCH] Add include/linux/freezer.h and move definitions from sched.h Move process freezing functions from include/linux/sched.h to freezer.h, so that modifications to the freezer or the kernel configuration don't require recompiling just about everything. [akpm@osdl.org: fix ueagle driver] Signed-off-by: Nigel Cunningham <nigel@suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
77d6e139 |
|
02-Nov-2006 |
Akinobu Mita <akinobu.mita@gmail.com> |
[PATCH] edac_mc: fix error handling Call sysdev_class_unregister() on failure in edac_sysfs_memctrl_setup() and decrease indentation level for clear logic. Acked-by: Doug Thompson <norsk5@xmission.com> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
49c0dab7 |
|
10-Jul-2006 |
Doug Thompson <norsk5@xmission.com> |
[PATCH] Fix and enable EDAC sysfs operation When EDAC was first introduced into the kernel it had a sysfs interface, but due to some problems it was disabled in 2.6.16 and remained disabled in 2.6.17. With feedback, several of the control and attribute files of that interface had some good constructive feedback. PCI Blacklist/Whitelist was a major set which has design issues and it has been removed in this patch. Instead of storing PCI broken parity status in EDAC, it has been moved to the pci_dev structure itself by a previous PCI patch. A future patch will enable that feature in EDAC by utilizing the pci_dev info. The sysfs is now enabled in this patch, with a minimal set of control and attribute files for examining EDAC state and for enabling/disabling the memory and PCI operations. The Documentation for EDAC has also been updated to reflect the new state of EDAC operation. Signed-off-by:Doug Thompson <norsk5@xmisson.com> Cc: Greg KH <greg@kroah.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
2d7bbb91 |
|
30-Jun-2006 |
Doug Thompson <norsk5@xmission.com> |
[PATCH] EDAC: mc numbers refactor 1-of-2 Remove add_mc_to_global_list(). In next patch, this function will be reimplemented with different semantics. 1 Reimplement add_mc_to_global_list() with semantics that allow the caller to determine the ID number for a mem_ctl_info structure. Then modify edac_mc_add_mc() so that the caller specifies the ID number for the new mem_ctl_info structure. Platform-specific code should be able to assign the ID numbers in a platform-specific manner. For instance, on Opteron it makes sense to have the ID of the mem_ctl_info structure match the ID of the node that the memory controller belongs to. 2 Modify callers of edac_mc_add_mc() so they use the new semantics. Signed-off-by: Doug Thompson <norsk5@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
37f04581 |
|
30-Jun-2006 |
Doug Thompson <norsk5@xmission.com> |
[PATCH] EDAC: PCI device to DEVICE cleanup Change MC drivers from using CVS revision strings for their version number, Now each driver has its own local string. Remove some PCI dependencies from the core EDAC module. Made the code 'struct device' centric instead of 'struct pci_dev' Most of the code changes here are from a patch by Dave Jiang. It may be best to eventually move the PCI-specific code into a separate source file. Signed-off-by: Doug Thompson <norsk5@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
6ab3d562 |
|
30-Jun-2006 |
Jörn Engel <joern@wohnheim.fh-wedel.de> |
Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
7f927fcc |
|
28-Mar-2006 |
Alexey Dobriyan <adobriyan@gmail.com> |
[PATCH] Typo fixes Fix a lot of typos. Eyeballed by jmc@ in OpenBSD. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
9110540f |
|
26-Mar-2006 |
Dave Peterson <dsp@llnl.gov> |
[PATCH] EDAC: use EXPORT_SYMBOL_GPL Change all instances of EXPORT_SYMBOL() in the core EDAC module to EXPORT_SYMBOL_GPL(). Signed-off-by: David S. Peterson <dsp@llnl.gov> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
e7ecd891 |
|
26-Mar-2006 |
Dave Peterson <dsp@llnl.gov> |
[PATCH] EDAC: formatting cleanup Cosmetic indentation/formatting cleanup for EDAC code. Make sure we are using tabs rather than spaces to indent, etc. Signed-off-by: David S. Peterson <dsp@llnl.gov> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
54933ddd |
|
26-Mar-2006 |
Dave Peterson <dsp@llnl.gov> |
[PATCH] EDAC: reorder EXPORT_SYMBOL macros Fix EDAC code so EXPORT_SYMBOL comes after the function that is being exported. This is to maintain consistency with the rest of the kernel. Signed-off-by: David S. Peterson <dsp@llnl.gov> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
18dbc337 |
|
26-Mar-2006 |
Dave Peterson <dsp@llnl.gov> |
[PATCH] EDAC: protect memory controller list - Fix code so we always hold mem_ctls_mutex while we are stepping through the list of mem_ctl_info structures. Otherwise bad things may happen if one task is stepping through the list while another task is modifying it. We may eventually want to use reference counting to manage the mem_ctl_info structures. In the meantime we may as well fix this bug. - Don't disable interrupts while we are walking the list of mem_ctl_info structures in check_mc_devices(). This is unnecessary. Signed-off-by: David S. Peterson <dsp@llnl.gov> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
472678eb |
|
26-Mar-2006 |
Dave Peterson <dsp@llnl.gov> |
[PATCH] EDAC: kobject/sysfs fixes - After we unregister a kobject, wait for our kobject release method to call complete(). This causes us to wait until the kobject reference count reaches 0. Otherwise, a task accessing the EDAC sysfs interface can hold the reference count above 0 until after the EDAC module has been unloaded. When the reference count finally drops to 0, this will result in an attempt to call our release method inside the EDAC module after the module has already been unloaded. This isn't the best fix, since a process can get stuck sleeping forever uninterruptibly if the user does the following: rmmod my_module < /sys/my_sysfs/file I'll go back and implement a better fix later. However this should be ok for now. - Call edac_remove_sysfs_mci_device() from edac_mc_del_mc() rather than from edac_mc_free(). Since edac_mc_add_mc() calls edac_create_sysfs_mci_device(), edac_mc_del_mc() should call edac_remove_sysfs_mci_device(). Signed-off-by: David S. Peterson <dsp@llnl.gov> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
6e5a8748 |
|
26-Mar-2006 |
Dave Peterson <dsp@llnl.gov> |
[PATCH] EDAC: kobject_init/kobject_put fixes - Remove calls to kobject_init(). These are unnecessary because kobject_register() calls kobject_init(). - Remove extra calls to kobject_put(). When we call kobject_unregister(), this releases our reference to the kobject. The extra calls to kobject_put() may cause the reference count to drop to 0 while a kobject is still in use. Signed-off-by: David S. Peterson <dsp@llnl.gov> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
028a7b6d |
|
26-Mar-2006 |
Dave Peterson <dsp@llnl.gov> |
[PATCH] EDAC: edac_mc_add_mc fix [2/2] This is part 2 of a 2-part patch set. Fix edac_mc_add_mc() so it cleans up properly if call to edac_create_sysfs_mci_device() fails. Signed-off-by: David S. Peterson <dsp@llnl.gov> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
a1d03fcc |
|
26-Mar-2006 |
Dave Peterson <dsp@llnl.gov> |
[PATCH] EDAC: edac_mc_add_mc fix [1/2] This is part 1 of a 2-part patch set. The code changes are split into two parts to make the patches more readable. Move complete_mc_list_del() and del_mc_from_global_list() so we can call del_mc_from_global_list() from edac_mc_add_mc() without forward declarations. Perhaps using forward declarations would be better? I'm doing things this way because the rest of the code is missing them. Signed-off-by: David S. Peterson <dsp@llnl.gov> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
749ede57 |
|
26-Mar-2006 |
Dave Peterson <dsp@llnl.gov> |
[PATCH] EDAC: cleanup code for clearing initial errors Fix xxx_probe1() functions so they call xxx_get_error_info() functions to clear initial errors. This is simpler and cleaner than duplicating the low-level code for accessing PCI config space. Signed-off-by: David S. Peterson <dsp@llnl.gov> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
537fba28 |
|
26-Mar-2006 |
Dave Peterson <dsp@llnl.gov> |
[PATCH] EDAC: printk cleanup This implements the following idea: On Monday 30 January 2006 19:22, Eric W. Biederman wrote: > One piece missing from this conversation is the issue that we need errors > in a uniform format. That is why edac_mc has helper functions. > > However there will always be errors that don't fit any particular model. > Could we add a edac_printk(dev, ); That is similar to dev_printk but > prints out an EDAC header and the device on which the error was found? > Letting the rest of the string be user specified. > > For actual control that interface may be to blunt, but at least for people > looking in the logs it allows all of the errors to be detected and > harvested. Signed-off-by: David S. Peterson <dsp@llnl.gov> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
f2fe42ab |
|
26-Mar-2006 |
Dave Peterson <dsp@llnl.gov> |
[PATCH] EDAC: switch to kthread_ API This patch was originally posted by Christoph Hellwig (see http://lkml.org/lkml/2006/2/14/331): "Christoph Hellwig" <hch@lst.de> wrote: > Use the kthread_ API instead of opencoding lots of hairy code for kernel > thread creation and teardown, including tasklist_lock abuse. > Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Peterson <dsp@llnl.gov> Cc: <dave_peterson@pobox.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
ceb2ca9c |
|
13-Mar-2006 |
Dave Peterson <dsp@llnl.gov> |
[PATCH] EDAC: disable sysfs interface - Disable the EDAC sysfs code. The sysfs interface that EDAC presents to user space needs more thought, and is likely to change substantially. Therefore disable it for now so users don't start depending on it in its current form. - Disable the default behavior of calling panic() when an uncorrectable error is detected (since for now, there is no sysfs interface that allows the user to configure this behavior). Signed-off-by: David S. Peterson <dsp@llnl.gov> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
4136cabf |
|
11-Mar-2006 |
Arjan van de Ven <arjan@linux.intel.com> |
[PATCH] edac: disable a few sysfs files to avoid them becoming an ABI Disable (via ugly #if 0's) the 3 sysfs files that I think by now we all agree are very much wrong. These files shouldn't become part of the ABI by the 2.6.16 release, so I rather have this minimal patch merged to disable them for now, the real fix can then come during the 2.6.17 devel window. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
353368df |
|
03-Feb-2006 |
Eric W. Biederman <ebiederm@xmission.com> |
[PATCH] edac_mc: Remove include of version.h By including version.h, edac_mc was rebuilding on every incremental build, which defeats the point of incremental builds. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
da9bb1d2 |
|
18-Jan-2006 |
Alan Cox <alan@lxorguk.ukuu.org.uk> |
[PATCH] EDAC: core EDAC support code This is a subset of the bluesmoke project core code, stripped of the NMI work which isn't ready to merge and some of the "interesting" proc functionality that needs reworking or just has no place in kernel. It requires no core kernel changes except the added scrub functions already posted. The goal is to merge further functionality only after the core code is accepted and proven in the base kernel, and only at the point the upstream extras are really ready to merge. From: doug thompson <norsk5@xmission.com> This converts EDAC to sysfs and is the final chunk necessary before EDAC has a stable user space API and can be considered for submission into the base kernel. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: doug thompson <norsk5@xmission.com> Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|