History log of /linux-master/drivers/edac/edac_mc.c
Revision Date Author Comments
# 9a5f580c 02-Nov-2023 Muralidhara M K <muralidhara.mk@amd.com>

EDAC/mc: Add support for HBM3 memory type

AMD MI300A models use HBM3 (High Bandwidth Memory Gen 3) memory. HBM is
a high-speed computer memory interface for 3D-stacked synchronous
dynamic random-access memory (SDRAM).

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231102114225.2006878-4-muralimk@amd.com


# 93df1947 22-Aug-2022 Serge Semin <Sergey.Semin@baikalelectronics.ru>

EDAC/mc: Drop duplicated dimm->nr_pages debug printout

The duplicated edac_dbg()-based dimm->nr_pages print was introduced in

6e84d359b2be ("edac_mc: Cleanup per-dimm_info debug messages").

The duplicated line can be found even in the commit message text:

[ 1011.380101] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000
[ 1011.380103] EDAC DEBUG: edac_mc_dump_dimm: dimm->grain = 8
[ 1011.380104] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000

Drop the second edac_dbg() call.

[ bp: Massage commit message. ]

Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220822190730.27277-14-Sergey.Semin@baikalelectronics.ru


# 13088b65 12-Apr-2022 Borislav Petkov <bp@suse.de>

EDAC: Use kcalloc()

It is syntactic sugar anyway:

# drivers/edac/edac_mc.o:

text data bss dec hex filename
13378 324 8 13710 358e edac_mc.o.before
13378 324 8 13710 358e edac_mc.o.after

md5:
70a53ee3ac7f867730e35c2be9110748 edac_mc.o.before.asm
70a53ee3ac7f867730e35c2be9110748 edac_mc.o.after.asm

# drivers/edac/edac_device.o:

text data bss dec hex filename
5704 120 4 5828 16c4 edac_device.o.before
5704 120 4 5828 16c4 edac_device.o.after

md5:
880563c859da6eb9aca85ec431fdbaeb edac_device.o.before.asm
880563c859da6eb9aca85ec431fdbaeb edac_device.o.after.asm

No functional changes.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220412211957.28899-1-bp@alien8.de


# 713c4ff8 08-Mar-2022 Borislav Petkov <bp@suse.de>

EDAC/mc: Get rid of edac_align_ptr()

Get rid of it now that it is unused.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220310095254.1510-6-bp@alien8.de


# 0bbb265f 20-Feb-2022 Borislav Petkov <bp@suse.de>

EDAC/mc: Get rid of silly one-shot struct allocation in edac_mc_alloc()

This has probably meant something at some point but there's no need for
it anymore - the struct mem_ctl_info allocation can happen with normal,
boring k*alloc() calls like everyone else does it.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220310095254.1510-2-bp@alien8.de


# b0596da1 13-Jan-2022 Eliav Farber <farbere@amazon.com>

EDAC/mc: Remove unnecessary cast to char * in edac_align_ptr()

Remove the forgotten (char *) casts as that function returns void *.

[ bp: Rewrite commit message. ]

Signed-off-by: Eliav Farber <farbere@amazon.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220113100622.12783-3-farbere@amazon.com


# f8efca92 13-Jan-2022 Eliav Farber <farbere@amazon.com>

EDAC: Fix calculation of returned address and next offset in edac_align_ptr()

Do alignment logic properly and use the "ptr" local variable for
calculating the remainder of the alignment.

This became an issue because struct edac_mc_layer has a size that is not
zero modulo eight, and the next offset that was prepared for the private
data was unaligned, causing an alignment exception.

The patch in Fixes: which broke this actually wanted to "what we
actually care about is the alignment of the actual pointer that's about
to be returned." But it didn't check that alignment.

Use the correct variable "ptr" for that.

[ bp: Massage commit message. ]

Fixes: 8447c4d15e35 ("edac: Do alignment logic properly in edac_align_ptr()")
Signed-off-by: Eliav Farber <farbere@amazon.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220113100622.12783-2-farbere@amazon.com


# f9571124 08-Dec-2021 Yazen Ghannam <yazen.ghannam@amd.com>

EDAC: Add RDDR5 and LRDDR5 memory types

Include Registered-DDR5 and Load-Reduced DDR5 in the list of memory
types.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211208174356.1997855-2-yazen.ghannam@amd.com


# fca61165 03-Sep-2021 Len Baker <len.baker@gmx.com>

EDAC/mc: Replace strcpy(), sprintf() and snprintf() with strscpy() or scnprintf()

strcpy() performs no bounds checking on the destination buffer. This
could result in linear overflows beyond the end of the buffer, leading
to all kinds of misbehavior. The safe replacement is strscpy().
[1][2]

However, to simplify and clarify the code, to concatenate labels use the
scnprintf() function. This way it is not necessary to check the return
value of strscpy() (-E2BIG if the parameter count is 0 or the src was
truncated) since scnprintf() always returns the number of chars written
into the buffer. This function always returns a nul-terminated string
even if it needs to be truncated.

While at it, fix all other broken string generation code that wrongly
interprets snprintf()'s return code or just uses sprintf(), implement
that using scnprintf() here too. Drop breaks in loops around
scnprintf() as it is safe now to loop.

Moreover, the check is not needed: for the case when the buffer is
exhausted, len never gets zero because scnprintf() takes the full buffer
length as input parameter, but excludes the trailing '\0' in its return
code and thus, 1 is the minimum len.

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy
[2] https://github.com/KSPP/linux/issues/88

[ rric: Replace snprintf() with scnprintf(), rework sprintf() user,
drop breaks in loops around scnprintf(), introduce 'end' pointer to
reduce pointer arithmetic, use prefix pattern for e->location,
adjust subject and description ]

Co-developed-by: Joe Perches <joe@perches.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Len Baker <len.baker@gmx.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210903150539.7282-1-len.baker@gmx.com


# e1ca90b7 30-Jun-2021 Naveen Krishna Chatradhi <nchatrad@amd.com>

EDAC/mc: Add new HBM2 memory type

Add a new entry to 'enum mem_type' and a new string to 'edac_mem_types[]'
for HBM2 (High Bandwidth Memory Gen 2) new memory type.

Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Muralidhara M K <muralimk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210630152828.162659-4-nchatrad@amd.com


# bc1c99a5 17-Nov-2020 Qiuxu Zhuo <qiuxu.zhuo@intel.com>

EDAC: Add DDR5 new memory type

Add a new entry to 'enum mem_type' and a new string to
'edac_mem_types[]' for DDR5 new memory type.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>


# 3b203693 05-Nov-2020 Qiuxu Zhuo <qiuxu.zhuo@intel.com>

EDAC: Add three new memory types

There are {Low-Power DDR3/4, WIO2} types of memory.
Add new entries to 'enum mem_type' and new strings to
'edac_mem_types[]' for the new types.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>


# e9ff6636 10-Jun-2020 Zhenzhong Duan <zhenzhong.duan@gmail.com>

EDAC/mc: Call edac_inc_ue_error() before panic

By calling edac_inc_ue_error() before panic, we get a correct UE error
count for core dump analysis.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20200610065846.3626-2-zhenzhong.duan@gmail.com


# 7fc0b9b9 14-Feb-2020 Tony Luck <tony.luck@intel.com>

EDAC: Drop the EDAC report status checks

When acpi_extlog was added, we were worried that the same error would
be reported more than once by different subsystems. But in the ensuing
years I've seen complaints that people could not find an error log
(because this mechanism suppressed the log they were looking for).

Rip it all out. People are smart enough to notice the same address from
different reporting mechanisms.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200214222720.13168-8-tony.luck@intel.com


# 4aa92c86 16-Feb-2020 Robert Richter <rrichter@marvell.com>

EDAC/mc: Remove per layer counters

Looking at how mci->{ue,ce}_per_layer[EDAC_MAX_LAYERS] is used, it
turns out that only the leaves in the memory hierarchy are consumed
(in sysfs), but not the intermediate layers, e.g.:

count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][dimm->idx];

These unused counters only add complexity, remove them. The error
counter values are directly stored in struct dimm_info now.

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Link: https://lkml.kernel.org/r/20200123090210.26933-11-rrichter@marvell.com


# 1853ee72 23-Jan-2020 Robert Richter <rrichter@marvell.com>

EDAC/mc: Remove detail[] string and cleanup error string generation

The error descriptor is passed to the error reporting functions, so
the error details can be directly generated there. Move string
generation from edac_raw_mc_handle_error() to edac_ce_error() and
edac_ue_error(). The intermediate detail[] string can be removed then.

Also, cleanup the string generation by switching to a single variant
only using the ternary operator.

[ bp: put ternary operators on a separate line for better readability
and use the short-form "inline if" in edac_mc_handle_error(). ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Link: https://lkml.kernel.org/r/20200123090210.26933-10-rrichter@marvell.com


# 6ab76179 23-Jan-2020 Robert Richter <rrichter@marvell.com>

EDAC/mc: Pass the error descriptor to error reporting functions

Most arguments of error reporting functions are already stored in the
struct edac_raw_error_desc error descriptor. Pass the error descriptor
to the functions and reduce the functions' argument list.

[ bp: Sort function args in reverse fir tree order. ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Link: https://lkml.kernel.org/r/20200123090210.26933-9-rrichter@marvell.com


# 67792cf9 23-Jan-2020 Robert Richter <rrichter@marvell.com>

EDAC/mc: Remove enable_per_layer_report function argument

Many functions carry the enable_per_layer_report argument. This is a
bool value indicating the error information contains some location
data where the error occurred. This can easily being determined by
checking the pos[] array for values. Negative values indicate there is
no location available. So if the top layer is negative, the error
location is unknown.

Just check if the top layer is negative and remove
enable_per_layer_report as function argument and also from struct
edac_raw_error_desc.

[ bp: Reflow comments to 80 columns, while at it. ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Link: https://lkml.kernel.org/r/20200123090210.26933-8-rrichter@marvell.com


# 65bb4d1a 23-Jan-2020 Robert Richter <rrichter@marvell.com>

EDAC/mc: Report "unknown memory" on too many DIMM labels found

There is a limitation to report only EDAC_MAX_LABELS in e->label of
the error descriptor. This is to prevent a potential string overflow.

The current implementation falls back to "any memory" in this case and
also stops all further processing to find a unique row and channel of
the possible error location.

Reporting "any memory" is wrong as the memory controller reported an
error location for one of the layers. Instead, report "unknown memory"
and also do not break early in the loop to further check row and channel
for uniqueness.

[ bp: Massage commit message. ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Link: https://lkml.kernel.org/r/20200123090210.26933-7-rrichter@marvell.com


# 6334dc4e 14-Feb-2020 Robert Richter <rrichter@marvell.com>

EDAC/mc: Carve out error increment into a separate function

Carve out the error_count increment into a separate function
edac_inc_csrow(). This better separates code and reduces the indentation
level.

Implementation note: The function edac_inc_csrow() counts the same
as before, ->ce_count is only incremented if row >= 0. This is esp.
true for the case of (!e->enable_per_layer_report). Here, a DIMM was
not found, variable row still has a value of -1 and ->ce_count is not
incremented.

[ bp: Massage commit message. ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Link: https://lkml.kernel.org/r/20200214141757.8976-1-rrichter@marvell.com


# 91b327f6 23-Jan-2020 Robert Richter <rrichter@marvell.com>

EDAC/mc: Determine mci pointer from the error descriptor

Each struct mci has its own error descriptor. Create a function
error_desc_to_mci() to determine the corresponding mci from an
error descriptor. This removes @mci from the parameter list of
edac_raw_mc_handle_error() as the mci pointer does not need to be passed
any longer.

[ bp: Massage commit message. ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Link: https://lkml.kernel.org/r/20200123090210.26933-5-rrichter@marvell.com


# 672ef0e5 23-Jan-2020 Robert Richter <rrichter@marvell.com>

EDAC: Store error type in struct edac_raw_error_desc

Store the error type in struct edac_raw_error_desc. This makes the
type parameter of edac_raw_mc_handle_error() obsolete.

[ kernel-doc typo ]
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Link: https://lkml.kernel.org/r/20200123090210.26933-4-rrichter@marvell.com


# 1f27c790 23-Jan-2020 Robert Richter <rrichter@marvell.com>

EDAC/mc: Reorder functions edac_mc_alloc*()

Reorder the new created functions edac_mc_alloc_csrows() and
edac_mc_alloc_dimms() and move them before edac_mc_alloc(). No further
code changes.

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Link: https://lkml.kernel.org/r/20200123090210.26933-3-rrichter@marvell.com


# aad28c6f 23-Jan-2020 Robert Richter <rrichter@marvell.com>

EDAC/mc: Split edac_mc_alloc() into smaller functions

edac_mc_alloc() is huge. Factor out code by moving it to the two new
functions edac_mc_alloc_csrows() and edac_mc_alloc_dimms(). Do not
move code yet for better review.

[ bp: sort local args in reversed fir tree order. ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Link: https://lkml.kernel.org/r/20200123090210.26933-2-rrichter@marvell.com


# bea1bfd5 12-Feb-2020 Robert Richter <rrichter@marvell.com>

EDAC/mc: Change mci device removal to use put_device()

There are dimm and csrow devices linked to the mci device esp. to show
up in sysfs. It must be granted that children devices are removed before
its mci parent. Thus, the release functions must be called in the
correct order and may not miss any child before releasing its parent. In
the current implementation this is only granted by the correct order of
release functions.

A much better approach is to use put_device() that releases the device
only after all users are gone. It is the recommended way to release a
device and free its memory. The function uses the device's refcount and
only frees it if there are no users of it anymore such as children.

So implement a mci_release() function to remove mci devices, use
put_device() to free them and early initialize the mci device right
after its struct has been allocated.

Change the release function so that it can be universally used no
matter if the device is registered or not. Since subsequent dimm
and csrow sysfs links are implemented as children devices, their
refcounts will keep the parent mci device from being removed as long
as sysfs entries exist and until all users have been unregistered in
edac_remove_sysfs_mci_device().

Remove edac_unregister_sysfs() and merge mci sysfs removal into
edac_remove_sysfs_mci_device(). There is only a single instance now that
removes the sysfs entries. The function can now be used in the error
paths for cleanup.

Also, create device release functions for all involved devices
(dev->release), remove device_type release functions (dev_type->
release) and also use dev->init_name instead of dev_set_name().

[ bp: Massage commit message and comments. ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Link: https://lkml.kernel.org/r/20200212120340.4764-5-rrichter@marvell.com


# 216aa145 12-Feb-2020 Robert Richter <rrichter@marvell.com>

EDAC/mc: Fix use-after-free and memleaks during device removal

A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and
DEBUG_KMEMLEAK set, revealed several issues when removing an mci device:

1) Use-after-free:

On 27.11.19 17:07:33, John Garry wrote:
> [ 22.104498] BUG: KASAN: use-after-free in
> edac_remove_sysfs_mci_device+0x148/0x180

The use-after-free is caused by the mci_for_each_dimm() macro called in
edac_remove_sysfs_mci_device(). The iterator was introduced with

c498afaf7df8 ("EDAC: Introduce an mci_for_each_dimm() iterator").

The iterator loop calls device_unregister(&dimm->dev), which removes
the sysfs entry of the device, but also frees the dimm struct in
dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(),
the dimm struct is accessed again, after having been freed already.

The fix is to free all the mci device's subsequent dimm and csrow
objects at a later point, in _edac_mc_free(), when the mci device itself
is being freed.

This keeps the data structures intact and the mci device can be
fully used until its removal. The change allows the safe usage of
mci_for_each_dimm() to release dimm devices from sysfs.

2) Memory leaks:

Following memory leaks have been detected:

# grep edac /sys/kernel/debug/kmemleak | sort | uniq -c
1 [<000000003c0f58f9>] edac_mc_alloc+0x3bc/0x9d0 # mci->csrows
16 [<00000000bb932dc0>] edac_mc_alloc+0x49c/0x9d0 # csr->channels
16 [<00000000e2734dba>] edac_mc_alloc+0x518/0x9d0 # csr->channels[chn]
1 [<00000000eb040168>] edac_mc_alloc+0x5c8/0x9d0 # mci->dimms
34 [<00000000ef737c29>] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc()

All leaks are from memory allocated by edac_mc_alloc().

Note: The test above shows that edac_mc_alloc() was called here from
ghes_edac_register(), thus both functions show up in the stack trace
but the module causing the leaks is edac_mc. The comments with the data
structures involved were made manually by analyzing the objdump.

The data structures listed above and created by edac_mc_alloc() are
not properly removed during device removal, which is done in
edac_mc_free().

There are two paths implemented to remove the device depending on device
registration, _edac_mc_free() is called if the device is not registered
and edac_unregister_sysfs() otherwise.

The implemenations differ. For the sysfs case, the mci device removal
lacks the removal of subsequent data structures (csrows, channels,
dimms). This causes the memory leaks (see mci_attr_release()).

[ bp: Massage commit message. ]

Fixes: c498afaf7df8 ("EDAC: Introduce an mci_for_each_dimm() iterator")
Fixes: faa2ad09c01c ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.")
Fixes: 7a623c039075 ("edac: rewrite the sysfs code to use struct device")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: John Garry <john.garry@huawei.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com


# 787d8999 06-Nov-2019 Robert Richter <rrichter@marvell.com>

EDAC: Unify the mc_event tracepoint call

The code in ghes_edac.c and edac_mc.c for grain_bits calculation and
calling trace_mc_event() is now the same. Move it to a single location
in edac_raw_mc_handle_error().

The only difference is the missing IS_ENABLED(CONFIG_RAS) switch, but
this is needed for ghes too.

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-13-rrichter@marvell.com


# 0d8292e0 06-Nov-2019 Robert Richter <rrichter@marvell.com>

EDAC/mc: Reduce indentation level in edac_mc_handle_error()

Reduce the indentation level in edac_mc_handle_error() a bit.

No functional changes.

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-7-rrichter@marvell.com


# 47bec6b4 06-Nov-2019 Robert Richter <rrichter@marvell.com>

EDAC/mc: Remove needless zero string termination

The e string to which this is pointing to has already been cleared
earlier in the function so remove the needless zero string termination.

[ bp: Correct the commit message. ]

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-6-rrichter@marvell.com


# d260e8ff 06-Nov-2019 Robert Richter <rrichter@marvell.com>

EDAC/mc: Do not BUG_ON() in edac_mc_alloc()

No need to crash the system in case edac_mc_alloc() is called with
invalid arguments, just warn and return. This would cause a checkpatch
warning when touching the code later, so just fix it.

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-5-rrichter@marvell.com


# c498afaf 06-Nov-2019 Robert Richter <rrichter@marvell.com>

EDAC: Introduce an mci_for_each_dimm() iterator

Introduce an mci_for_each_dimm() iterator. It returns a pointer to
a struct dimm_info. This makes the declaration and use of an index
obsolete and avoids access to internal data of struct mci (direct array
access etc).

[ bp: push the struct dimm_info *dimm; declaration into the
CONFIG_EDAC_DEBUG block. ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-4-rrichter@marvell.com


# 977b1ce7 06-Nov-2019 Robert Richter <rrichter@marvell.com>

EDAC: Remove EDAC_DIMM_OFF() macro

The EDAC_DIMM_OFF() macro takes 5 arguments to get the DIMM's index.
Simplify this by storing the index in struct dimm_info to avoid its
calculation and remove the EDAC_DIMM_OFF() macro. The index can be
directly used then.

Another advantage is that edac_mc_alloc() could be used even if the
exact size of the layers is unknown. Only the number of DIMMs would be
needed.

Rename iterator variable to idx, while at it. The name is more handy,
esp. when searching for it in the code.

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-3-rrichter@marvell.com


# d55c79ac 01-Sep-2019 Robert Richter <rrichter@marvell.com>

EDAC: Prefer 'unsigned int' to bare use of 'unsigned'

Use of 'unsigned int' instead of bare use of 'unsigned'. Fix this for
edac_mc*, ghes and the i5100 driver as reported by checkpatch.pl.

While at it, struct member dev_ch_attribute->channel is always used as
unsigned int. Change type to unsigned int to avoid type casts.

[ bp: Massage. ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190902123216.9809-2-rrichter@marvell.com


# 718d5851 24-Jun-2019 Robert Richter <rrichter@marvell.com>

EDAC/mc: Cleanup _edac_mc_free() code

Remove needless and boilerplate variable declarations. No functional
changes.

[ bp: Add newlines for better readability. ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190624150758.6695-10-rrichter@marvell.com


# 3724ace5 24-Jun-2019 Robert Richter <rrichter@marvell.com>

EDAC/mc: Fix grain_bits calculation

The grain in EDAC is defined as "minimum granularity for an error
report, in bytes". The following calculation of the grain_bits in
edac_mc is wrong:

grain_bits = fls_long(e->grain) + 1;

Where grain_bits is defined as:

grain = 1 << grain_bits

Example:

grain = 8 # 64 bit (8 bytes)
grain_bits = fls_long(8) + 1
grain_bits = 4 + 1 = 5

grain = 1 << grain_bits
grain = 1 << 5 = 32

Replace it with the correct calculation:

grain_bits = fls_long(e->grain - 1);

The example gives now:

grain_bits = fls_long(8 - 1)
grain_bits = fls_long(7)
grain_bits = 3

grain = 1 << 3 = 8

Also, check if the hardware reports a reasonable grain != 0 and fallback
with a warning to 1 byte granularity otherwise.

[ bp: massage a bit. ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190624150758.6695-2-rrichter@marvell.com


# 29a0c843 14-May-2019 Robert Richter <rrichter@marvell.com>

EDAC/mc: Fix edac_mc_find() in case no device is found

The function should return NULL in case no device is found, but it
always returns the last checked mc device from the list even if the
index did not match. Fix that.

I did some analysis why this did not raise any issues for about 3 years
and the reason is that edac_mc_find() is mostly used to search for
existing devices. Thus, the bug is not triggered.

[ bp: Drop the if (mci->mc_idx > idx) test in favor of readability. ]

Fixes: c73e8833bec5 ("EDAC, mc: Fix locking around mc_devices list")
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lkml.kernel.org/r/20190514104838.15065-1-rrichter@marvell.com


# 861e6ed6 05-Nov-2018 Borislav Petkov <bp@suse.de>

EDAC: Drop per-memory controller buses

... and use the single edac_subsys object returned from
subsys_system_register(). The idea is to have a single bus
and multiple devices on it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
CC: Aristeu Rozanski Filho <arozansk@redhat.com>
CC: Greg KH <gregkh@linuxfoundation.org>
CC: Justin Ernst <justin.ernst@hpe.com>
CC: linux-edac <linux-edac@vger.kernel.org>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Russ Anderson <rja@hpe.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20180926152752.GG5584@zn.tnic


# b748f2de 10-Aug-2018 Takashi Iwai <tiwai@suse.de>

EDAC: Add missing MEM_LRDDR4 entry in edac_mem_types[]

The edac_mem_types[] array misses a MEM_LRDDR4 entry, which leads to
NULL pointer dereference when accessed via sysfs or such.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180810141426.8918-1-tiwai@suse.de
Fixes: 1e8096bb2031 ("EDAC: Add LRDDR4 DRAM type")
Signed-off-by: Borislav Petkov <bp@suse.de>


# 001f8613 12-Mar-2018 Tony Luck <tony.luck@intel.com>

EDAC: Add new memory type for non-volatile DIMMs

There are now non-volatile versions of DIMMs. Add a new entry to "enum
mem_type" and a new string in edac_mem_types[].

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/20180312182430.10335-3-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>


# d6dd77eb 12-Mar-2018 Tony Luck <tony.luck@intel.com>

EDAC: Drop duplicated array of strings for memory type names

Somehow we ended up with two separate arrays of strings to describe the
"enum mem_type" values.

In edac_mc.c we have an exported list edac_mem_types[] that is used
by a couple of drivers in debug messaged.

In edac_mc_sysfs.c we have a private list that is used to display
values in:
/sys/devices/system/edac/mc/mc*/dimm*/dimm_mem_type
/sys/devices/system/edac/mc/mc*/csrow*/mem_type

This list was missing a value for MEM_LRDDR3.

The string values in the two lists were different :-(

Combining the lists, I kept the values so that the sysfs output
will be unchanged as some scripts may depend on that.

Reported-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/20180312182430.10335-2-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>


# 3877c7d1 23-Aug-2017 Toshi Kani <toshi.kani@hpe.com>

EDAC: Add helper which returns the loaded platform driver

Only a single EDAC platform driver can be loaded. When ghes_edac is
enabled, an EDAC platform driver still attempts to register itself and
fails in edac_mc_add_mc().

Add edac_get_owner() so that EDAC platform drivers can check the owner
first.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170823225447.15608-5-toshi.kani@hpe.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>


# bffc7dec 04-Feb-2017 Borislav Petkov <bp@suse.de>

EDAC: Rename report status accessors

Change them to have the edac_ prefix.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>


# fee27d7d 04-Feb-2017 Borislav Petkov <bp@suse.de>

EDAC: Delete edac_stub.c

Move the remaining functionality to edac_mc.c. Convert "edac_report=" to
a module parameter.

Signed-off-by: Borislav Petkov <bp@suse.de>


# be1d1629 03-Feb-2017 Borislav Petkov <bp@suse.de>

EDAC: Issue tracepoint only when it is defined

... and this happens only when CONFIG_RAS is enabled.

Signed-off-by: Borislav Petkov <bp@suse.de>


# 8c22b4fe 26-Jan-2017 Borislav Petkov <bp@suse.de>

EDAC: Move edac_op_state to edac_mc.c

... as part of moving stuff away from edac_stub.c

Signed-off-by: Borislav Petkov <bp@suse.de>


# d3116a08 26-Jan-2017 Borislav Petkov <bp@suse.de>

EDAC: Remove edac_err_assert

... and the glue around it. It is not needed anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>


# 97bb6c17 26-Jan-2017 Borislav Petkov <bp@suse.de>

EDAC: Get rid of edac_handlers

Use mc_devices list instead to check whether we have EDAC driver
instances successfully registered with EDAC core.

Signed-off-by: Borislav Petkov <bp@suse.de>


# d7fc9d77 27-Jan-2017 Yazen Ghannam <Yazen.Ghannam@amd.com>

EDAC: Add routine to check if MC devices list is empty

We need to know if any MC devices have been allocated.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-7-git-send-email-Yazen.Ghannam@amd.com
[ Prettify text. ]
Signed-off-by: Borislav Petkov <bp@suse.de>


# 7c0f6ba6 24-Dec-2016 Linus Torvalds <torvalds@linux-foundation.org>

Replace <asm/uaccess.h> with <linux/uaccess.h> globally

This was entirely automated, using the script by Al:

PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e01aa14c 26-Oct-2016 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: move documentation from edac_mc.c to edac_core.h

Several functions are documented at edac_mc.c.

As we'll be including edac_core.h at drivers-api book, move
those, in order for the kernel-doc markups be part of the API
documentation book.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>


# 78d88e8a 29-Oct-2016 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: rename edac_core.h to edac_mc.h

Now, all left at edac_core.h are at drivers/edac/edac_mc.c,
so rename it to edac_mc.h.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>


# c73e8833 14-Nov-2016 Borislav Petkov <bp@suse.de>

EDAC, mc: Fix locking around mc_devices list

When accessing the mc_devices list of memory controller descriptors, we
need to hold mem_ctls_mutex. This was not always the case, fix that.

Make all external callers call a version which grabs the mutex since the
last is local to edac_mc.c.

Reported-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>


# fbedcaf4 19-May-2016 Nicholas Krause <xerofoify@gmail.com>

EDAC: Fix workqueues poll period resetting

After the workqueue cleanup, we're registering workqueues based on
the presence of an ->edac_check function. When that is the case,
we're setting OP_RUNNING_POLL. But we forgot to check that in
edac_mc_reset_delay_period(), leading to:

BUG: unable to handle kernel paging request at 0000000000015d10
IP: [ .. ] queued_spin_lock_slowpath
PGD 3ffcc8067 PUD 3ffc56067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: ...
CPU: 1 PID: 2792 Comm: edactest Not tainted 4.6.0-dirty #1
Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
Stack:
Call Trace:
? _raw_spin_lock_irqsave
? lock_timer_base.isra.34
? del_timer
? try_to_grab_pending
? mod_delayed_work_on
? edac_mc_reset_delay_period
? edac_set_poll_msec
? param_attr_store
? module_attr_store
? kernfs_fop_write
? __vfs_write
? __vfs_read
? __alloc_fd
? vfs_write
? SyS_write
? entry_SYSCALL_64_fastpath
Code:
RIP [ .. ] queued_spin_lock_slowpath
RSP <>
CR2: 0000000000015d10
---[ end trace 3f286bc71cca15d1 ]---
Kernel panic - not syncing: Fatal exception

Fix it.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Cc: <stable@vger.kernel.org> # 4.5
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1463697958-13406-1-git-send-email-xerofoify@gmail.com
[ Rewrite commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>


# 993f88f1 23-Apr-2016 Emmanouil Maroudas <emmanouil.maroudas@gmail.com>

EDAC: Increment correct counter in edac_inc_ue_error()

Fix typo in edac_inc_ue_error() to increment ue_noinfo_count instead of
ce_noinfo_count.

Signed-off-by: Emmanouil Maroudas <emmanouil.maroudas@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 4275be635597 ("edac: Change internal representation to work with layers")
Link: http://lkml.kernel.org/r/1461425580-5898-1-git-send-email-emmanouil.maroudas@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>


# 06e912d4 02-Feb-2016 Borislav Petkov <bp@suse.de>

EDAC: Cleanup/sync workqueue functions

They're both running only when ->edac_check is initialized so remove
that check from the workqueue function itself. Synchronize/generalize
the ->op_state check between the two.

Kill useless comments, while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>


# 626a7a4d 02-Feb-2016 Borislav Petkov <bp@suse.de>

EDAC: Kill workqueue setup/teardown functions

We have the generic wrappers now, use those. edac_pci_workq_setup() had
an unused argument anyway.

Signed-off-by: Borislav Petkov <bp@suse.de>


# 09667606 02-Feb-2016 Borislav Petkov <bp@suse.de>

EDAC: Balance workqueue setup and teardown

We use the ->edac_check function pointers to determine whether we need
to setup a polling workqueue. However, the destroy path is not balanced
and we might try to teardown an unitialized workqueue.

Balance init and destroy paths by looking at ->edac_check in both cases.
Set op_state to OP_OFFLINE *before* destroying anything.

Reported-by: Zhiqiang Hou <Zhiqiang.Hou@freescale.com>
Cc: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Borislav Petkov <bp@suse.de>


# c4cf3b45 30-Nov-2015 Borislav Petkov <bp@suse.de>

EDAC: Rework workqueue handling

Hide the EDAC workqueue pointer in a separate compilation unit and add
accessors for the workqueue manipulations needed.

Remove edac_pci_reset_delay_period() which wasn't used by anything. It
seems it got added without a user with

91b99041c1d5 ("drivers/edac: updated PCI monitoring")

Signed-off-by: Borislav Petkov <bp@suse.de>


# fcd5c4dd 27-Nov-2015 Borislav Petkov <bp@suse.de>

EDAC: Robustify workqueues destruction

EDAC workqueue destruction is really fragile. We cancel delayed work
but if it is still running and requeues itself, we still go ahead and
destroy the workqueue and the queued work explodes when workqueue core
attempts to run it.

Make the destruction more robust by switching op_state to offline so
that requeuing stops. Cancel any pending work *synchronously* too.

EDAC i7core: Driver loaded.
general protection fault: 0000 [#1] SMP
CPU 12
Modules linked in:
Supported: Yes
Pid: 0, comm: kworker/0:1 Tainted: G IE 3.0.101-0-default #1 HP ProLiant DL380 G7
RIP: 0010:[<ffffffff8107dcd7>] [<ffffffff8107dcd7>] __queue_work+0x17/0x3f0
< ... regs ...>
Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600)
Stack:
...
Call Trace:
call_timer_fn
run_timer_softirq
__do_softirq
call_softirq
do_softirq
irq_exit
smp_apic_timer_interrupt
apic_timer_interrupt
intel_idle
cpuidle_idle_call
cpu_idle
Code: ...
RIP __queue_work
RSP <...>

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>


# 990995ba 20-Oct-2015 Tan Xiaojun <tanxiaojun@huawei.com>

EDAC: Fix PAGES_TO_MiB macro misuse

The PAGES_TO_MiB macro is used for unit conversion but the
trace_mc_event() tracepoint expects a page address. Fix that.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1445341538-24271-1-git-send-email-tanxiaojun@huawei.com
Signed-off-by: Borislav Petkov <bp@suse.de>


# b01aec9b 21-May-2015 Borislav Petkov <bp@suse.de>

EDAC: Cleanup atomic_scrub mess

So first of all, this atomic_scrub() function's naming is bad. It looks
like an atomic_t helper. Change it to edac_atomic_scrub().

The bigger problem is that this function is arch-specific and every new
arch which doesn't necessarily need that functionality still needs to
define it, otherwise EDAC doesn't compile.

So instead of doing that and including arch-specific headers, have each
arch define an EDAC_ATOMIC_SCRUB symbol which can be used in edac_mc.c
for ifdeffery. Much cleaner.

And we already are doing this with another symbol - EDAC_SUPPORT. This
is also much cleaner than having CONFIG_EDAC enumerate all the arches
which need/have EDAC support and drivers.

This way I can kill the useless edac.h header in tile too.

Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Chris Metcalf <cmetcalf@ezchip.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: "Maciej W. Rozycki" <macro@codesourcery.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Steven J. Hill" <Steven.Hill@imgtec.com>
Cc: x86@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>


# 4e8d230d 04-Feb-2015 Takashi Iwai <tiwai@suse.de>

EDAC: Allow to pass driver-specific attribute groups

Add edac_mc_add_mc_with_groups() for initializing the mem_ctl_info
object with the optional attribute groups. This allows drivers to
pass additional sysfs entries without manual (and racy)
device_create_file() and co calls.

edac_mc_add_mc() is kept as is, just calling edac_mc_add_with_groups()
with NULL groups.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-3-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>


# 4cfc3a40 30-Sep-2014 Borislav Petkov <bp@suse.de>

EDAC: Sync memory types and names

Make keeping the sync between the mem_types enum and the actual string
names simpler by using designated initializers.

Signed-off-by: Borislav Petkov <bp@suse.de>


# 348fec70 18-Sep-2014 Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>

EDAC: Add DDR3 LRDIMM entries to edac_mem_types

F15hM60h adds support for DDR4 and DDR3 LRDIMMs. Add them here.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1411070218-10258-1-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Boris: improve comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>


# f4ce6eca 13-Aug-2014 Borislav Petkov <bp@suse.de>

EDAC: Fix mem_types strings type

This one got forgotten during an earlier cleanup.

Signed-off-by: Borislav Petkov <bp@suse.de>


# 76ac8275 11-Jun-2014 Chen, Gong <gong.chen@linux.intel.com>

trace, RAS: Add basic RAS trace event

To avoid confuision and conflict of usage for RAS related trace event,
add an unified RAS trace event stub.

Start a RAS subsystem menu which will be fleshed out in time, when more
features get added to it.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1402475691-30045-2-git-send-email-gong.chen@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>


# aa2064d7 08-May-2014 Loc Ho <lho@apm.com>

EDAC: Fix MC scrub mode comparsion bug for correctable errors

The MC structure field scrub_mode is of integer type - not bit field.
Use it accordingly.

Signed-off-by: Loc Ho <lho@apm.com>
Link: http://lkml.kernel.org/r/1399590199-12256-2-git-send-email-lho@apm.com
Signed-off-by: Borislav Petkov <bp@suse.de>


# cb6ef42e 12-Feb-2014 Borislav Petkov <bp@suse.de>

EDAC: Correct workqueue setup path

We're using edac_mc_workq_setup() both on the init path, when
we load an edac driver and when we change the polling period
(edac_mc_reset_delay_period) through /sys/.../edac_mc_poll_msec.

On that second path we don't need to init the workqueue which has been
initialized already.

Thanks to Tejun for workqueue insights.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: <stable@vger.kernel.org>


# 9da21b15 03-Feb-2014 Borislav Petkov <bp@suse.de>

EDAC: Poll timeout cannot be zero, p2

Sanitize code even more to accept unsigned longs only and to not allow
polling intervals below 1 second as this is unnecessary and doesn't make
much sense anyway for polling errors.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@vger.kernel.org>


# 7270a608 10-Oct-2013 Robert Richter <robert.richter@linaro.org>

edac: Unify reporting of device info for device, mc and pci

Log messages slightly differ between edac subsystems. Unifying it.

Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Robert Richter <rric@kernel.org>


# 88d84ac9 18-Jul-2013 Borislav Petkov <bp@suse.de>

EDAC: Fix lockdep splat

Fix the following:

BUG: key ffff88043bdd0330 not in .data!
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
Call Trace:
dump_stack
warn_slowpath_common
warn_slowpath_fmt
lockdep_init_map
? trace_hardirqs_on_caller
? trace_hardirqs_on
debug_mutex_init
__mutex_init
bus_register
edac_create_sysfs_mci_device
edac_mc_add_mc
sbridge_probe
pci_device_probe
driver_probe_device
__driver_attach
? driver_probe_device
bus_for_each_dev
driver_attach
bus_add_driver
driver_register
__pci_register_driver
? 0xffffffffa0010fff
sbridge_init
? 0xffffffffa0010fff
do_one_initcall
load_module
? unset_module_init_ro_nx
SyS_init_module
tracesys
---[ end trace d24a70b0d3ddf733 ]---
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
EDAC sbridge: Driver loaded.

What happens is that bus_register needs a statically allocated lock_key
because the last is handed in to lockdep. However, struct mem_ctl_info
embeds struct bus_type (the whole struct, not a pointer to it) and the
whole thing gets dynamically allocated.

Fix this by using a statically allocated struct bus_type for the MC bus.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: stable@kernel.org # v3.10
Signed-off-by: Tony Luck <tony.luck@intel.com>


# 9713faec 11-Mar-2013 Mauro Carvalho Chehab <mchehab@kernel.org>

EDAC: Merge mci.mem_is_per_rank with mci.csbased

Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
memory controller is csrows based. Merge both fields into one.

There's no need for the driver to actually fill it, as the core detects
it by checking if one of the layers has the csrows type as part of the
memory hierarchy:

if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
per_rank = true;

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>


# e7e24830 31-Oct-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: add support for raw error reports

That allows APEI GHES driver to report errors directly, using
the EDAC error report API.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# c7ef7645 21-Feb-2013 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: reduce stack pressure by using a pre-allocated buffer

The number of variables at the stack is too big.
Reduces the stack usage by using a pre-allocated error
buffer.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 80cc7d87 31-Oct-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: lock module owner to avoid error report conflicts

APEI GHES and i7core_edac/sb_edac currently can be loaded at
the same time, but those are Highlander modules:
"There can be only one".

There are two reasons for that:

1) Each driver assumes that it is the only one registering at
the EDAC core, as it is driver's responsibility to number
the memory controllers, and all of them start from 0;

2) If BIOS is handling the memory errors, the OS can't also be
doing it, as one will mangle with the other.

So, we need to add an module owner's lock at the EDAC core,
in order to avoid having two different modules handling memory
errors at the same time. The best way for doing this lock seems
to use the driver's name, as this is unique, and won't require
changes on every driver.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# c66b5a79 15-Feb-2013 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: add a new memory layer type

There are some cases where the memory controller layout is
completely hidden. This is the case of firmware-driven error
code, like the one provided by GHES. Add a new layer to be
used on such memory error report mechanisms.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# d3d09e18 26-Jan-2013 Joe Perches <joe@perches.com>

EDAC: Fix kcalloc argument order

First number, then size.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>


# 80f5ab09 18-Aug-2012 Shaun Ruffell <sruffell@digium.com>

edac: edac_mc no longer deals with kobjects directly

There are no more embedded kobjects in struct mem_ctl_info. Remove a header and
a comment that does not reflect the code anymore.

Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# f430d570 10-Sep-2012 Borislav Petkov <borislav.petkov@amd.com>

EDAC: Handle empty msg strings when reporting errors

A reported error could look like this

[ 226.178315] EDAC MC0: 1 CE on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x427c0d offset:0xde0 grain:0 syndrome:0x1c6)

with two spaces back-to-back due to the msg argument of
edac_mc_handle_error being passed on empty by the specific drivers.
Handle that.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>


# 4da1b7bf 10-Sep-2012 Borislav Petkov <borislav.petkov@amd.com>

EDAC: Remove useless assignment of error type

The tracepoint decodes the error type later anyway so remove a useless
assignment to the temporary p which gets overwritten later anyway.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>


# 24bef66e 24-Oct-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: Fix the dimm filling for csrows-based layouts

The driver is currently filling data in a wrong way, on drivers
for csrows-based memory controller, when the first layer is a
csrow.

This is not easily to notice, as, in general, memories are
filed in dual, interleaved, symetric mode, as very few memory
controllers support asymetric modes.

While digging into a bug for i82795_edac driver, the asymetric
mode there is now working, allowing us to fill the machine with
4x1GB ranks at channel 0, and 2x512GB at channel 1:

Channel 0 ranks:
EDAC DEBUG: i82975x_init_csrows: DIMM A0: from page 0x00000000 to 0x0003ffff (size: 0x00040000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM A1: from page 0x00040000 to 0x0007ffff (size: 0x00040000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM A2: from page 0x00080000 to 0x000bffff (size: 0x00040000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM A3: from page 0x000c0000 to 0x000fffff (size: 0x00040000 pages)

Channel 1 ranks:
EDAC DEBUG: i82975x_init_csrows: DIMM B0: from page 0x00100000 to 0x0011ffff (size: 0x00020000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM B1: from page 0x00120000 to 0x0013ffff (size: 0x00020000 pages)

Instead of properly showing the memories as such, before this patch, it
shows the memory layout as:

+-----------------------------------+
| mc0 |
| csrow0 | csrow1 | csrow2 |
----------+-----------------------------------+
channel1: | 1024 MB | 1024 MB | 512 MB |
channel0: | 1024 MB | 1024 MB | 512 MB |
----------+-----------------------------------+

as if both channels were symetric, grouping the DIMMs on a wrong
layout.

After this patch, the memory is correctly represented.
So, for csrows at layers[0], it shows:

+-----------------------------------------------+
| mc0 |
| csrow0 | csrow1 | csrow2 | csrow3 |
----------+-----------------------------------------------+
channel1: | 512 MB | 512 MB | 0 MB | 0 MB |
channel0: | 1024 MB | 1024 MB | 1024 MB | 1024 MB |
----------+-----------------------------------------------+

For csrows at layers[1], it shows:

+-----------------------+
| mc0 |
| channel0 | channel1 |
--------+-----------------------+
csrow3: | 1024 MB | 0 MB |
csrow2: | 1024 MB | 0 MB |
--------+-----------------------+
csrow1: | 1024 MB | 512 MB |
csrow0: | 1024 MB | 512 MB |
--------+-----------------------+

So, no matter of what comes first, the information between
channel and csrow will be properly represented.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# faa2ad09 22-Sep-2012 Shaun Ruffell <sruffell@digium.com>

edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.

Fix potential NULL pointer dereference in edac_unregister_sysfs() on
system boot introduced in 3.6-rc1.

Since commit 7a623c039 ("edac: rewrite the sysfs code to use struct
device") edac_mc_alloc() no longer initializes embedded kobjects in
struct mem_ctl_info. Therefore edac_mc_free() can no longer simply
decrement a kobject reference count to free the allocated memory unless
the memory controller driver module had also called edac_mc_add_mc().

Now edac_mc_free() will check if the newly embedded struct device has
been registered with sysfs before using either the standard device
release functions or freeing the data structures itself with logic
pulled out of the error path of edac_mc_alloc().

The BUG this patch resolves for me:

BUG: unable to handle kernel NULL pointer dereference at (null)
EIP is at __wake_up_common+0x1a/0x6a
Process modprobe (pid: 933, ti=f3dc6000 task=f3db9520 task.ti=f3dc6000)
Call Trace:
complete_all+0x3f/0x50
device_pm_remove+0x23/0xa2
device_del+0x34/0x142
edac_unregister_sysfs+0x3b/0x5c [edac_core]
edac_mc_free+0x29/0x2f [edac_core]
e7xxx_probe1+0x268/0x311 [e7xxx_edac]
e7xxx_init_one+0x56/0x61 [e7xxx_edac]
local_pci_probe+0x13/0x15
...

Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ef6e7816 22-Sep-2012 Fengguang Wu <fengguang.wu@intel.com>

edac_mc: fix messy kfree calls in the error path

coccinelle warns about:

+ drivers/edac/edac_mc.c:429:9-23: ERROR: reference preceded by free on line 429

421 if (mci->csrows) {
> 422 for (chn = 0; chn < tot_channels; chn++) {
423 csr = mci->csrows[chn];
424 if (csr) {
> 425 for (chn = 0; chn < tot_channels; chn++)
426 kfree(csr->channels[chn]);
427 kfree(csr);
428 }
> 429 kfree(mci->csrows[i]);
430 }
431 kfree(mci->csrows);
432 }

and that code block seem to mess things up in several ways (double free, memory
leak, out-of-bound reads etc.):

L422: The iterator "chn" and bound "tot_channels" are totally wrong. Should be
"row" and "tot_csrows" respectively. Which means either memory leak, or
out-of-bound reads (which if does not trigger an immediate page fault
error, will further lead to kfree() on random addresses).

L425: The inner loop is reusing the same iterator "chn" as the outer loop,
which could lead to premature end of the outer loop, and hence memory leak.

L429: The array index 'i' in mci->csrows[i] is a temporary value used in
previous loops, and won't change at all in the current loop. Which
means either out-of-bound read and possibly kfree(random number), or the
same mci->csrows[i] get freed once and again, and possibly double free
for the kfree(csr) in L427.

L426/L427: a kfree(csr->channels) is needed in between to avoid leaking the memory.

The buggy code was introduced by commit de3910eb ("edac: change the mem
allocation scheme to make Documentation/kobject.txt happy") in the 3.6-rc1
merge window. Fix it by freeing up resources in this order:

free csrows[i]->channels[j]
free csrows[i]->channels
free csrows[i]
free csrows

CC: Mauro Carvalho Chehab <mchehab@redhat.com>
CC: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 41f63c53 03-Aug-2012 Tejun Heo <tj@kernel.org>

workqueue: use mod_delayed_work() instead of cancel + queue

Convert delayed_work users doing cancel_delayed_work() followed by
queue_delayed_work() to mod_delayed_work().

Most conversions are straight-forward. Ones worth mentioning are,

* drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always
use mod_delayed_work() and cancel loop in
edac_mc_reset_delay_period() is dropped.

* drivers/platform/x86/thinkpad_acpi.c: No need to remember whether
watchdog is active or not. @fan_watchdog_active and related code
dropped.

* drivers/power/charger-manager.c: Seemingly a lot of
delayed_work_pending() abuse going on here.
[delayed_]work_pending() are unsynchronized and racy when used like
this. I converted one instance in fullbatt_handler(). Please
conver the rest so that it invokes workqueue APIs for the intended
target state rather than trying to game work item pending state
transitions. e.g. if timer should be modified - call
mod_delayed_work(), canceled - call cancel_delayed_work[_sync]().

* drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling()
simplified. Note that round_jiffies() calls in this function are
meaningless. round_jiffies() work on absolute jiffies not delta
delay used by delayed_work.

v2: Tomi pointed out that __cancel_delayed_work() users can't be
safely converted to mod_delayed_work(). They could be calling it
from irq context and if that happens while delayed_work_timer_fn()
is running, it could deadlock. __cancel_delayed_work() users are
dropped.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Roland Dreier <roland@kernel.org>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Johannes Berg <johannes@sipsolutions.net>


# 9eb07a7f 04-Jun-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: edac_mc_handle_error(): add an error_count parameter

In order to avoid loosing error events, it is desirable to group
error events together and generate a single trace for several identical
errors.

The trace API already allows reporting multiple errors. Change the
handle_error function to also allow that.

The changes at the drivers were made by this small script:

$file .=$_ while (<>);
$file =~ s/(edac_mc_handle_error)\s*\(([^\,]+)\,([^\,]+)\,/$1($2,$3, 1,/g;
print $file;

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 03f7eae8 04-Jun-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: remove arch-specific parameter for the error handler

Remove the arch-dependent parameter, as it were not used,
as the MCE tracepoint weren't implemented. It probably doesn't
make sense to have an MCE-specific tracepoint, as this will
cost more bytes at the tracepoint, and tracepoint is not free.

The changes at the EDAC drivers were done by this small perl script:

$file .=$_ while (<>);
$file =~ s/(edac_mc_handle_error)\s*\(([^\;]+)\,([^\,\)]+)\s*\)/$1($2)/g;
print $file;

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 08a4a136 18-May-2012 Dan Carpenter <dan.carpenter@oracle.com>

edac_mc: check for allocation failure in edac_mc_alloc()

Add a check here for if kzalloc() failed.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 6e84d359 30-Apr-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac_mc: Cleanup per-dimm_info debug messages

The edac_mc_alloc() routine allocates one dimm_info device for all
possible memories, including the non-filled ones. The debug messages
there are somewhat confusing. So, cleans them, by moving the code
that prints the memory location to edac_mc, and using it on both
edac_mc_sysfs and edac_mc.

Also, only dumps information when DIMM/ranks are actually
filled.

After this patch, a dimm-based memory controller will print the debug
info as:

[ 1011.380027] EDAC DEBUG: edac_mc_dump_csrow: csrow->csrow_idx = 0
[ 1011.380029] EDAC DEBUG: edac_mc_dump_csrow: csrow = ffff8801169be000
[ 1011.380031] EDAC DEBUG: edac_mc_dump_csrow: csrow->first_page = 0x0
[ 1011.380032] EDAC DEBUG: edac_mc_dump_csrow: csrow->last_page = 0x0
[ 1011.380034] EDAC DEBUG: edac_mc_dump_csrow: csrow->page_mask = 0x0
[ 1011.380035] EDAC DEBUG: edac_mc_dump_csrow: csrow->nr_channels = 3
[ 1011.380037] EDAC DEBUG: edac_mc_dump_csrow: csrow->channels = ffff8801149c2840
[ 1011.380039] EDAC DEBUG: edac_mc_dump_csrow: csrow->mci = ffff880117426000
[ 1011.380041] EDAC DEBUG: edac_mc_dump_channel: channel->chan_idx = 0
[ 1011.380042] EDAC DEBUG: edac_mc_dump_channel: channel = ffff8801149c2860
[ 1011.380044] EDAC DEBUG: edac_mc_dump_channel: channel->csrow = ffff8801169be000
[ 1011.380046] EDAC DEBUG: edac_mc_dump_channel: channel->dimm = ffff88010fe90400
...
[ 1011.380095] EDAC DEBUG: edac_mc_dump_dimm: dimm0: channel 0 slot 0 mapped as virtual row 0, chan 0
[ 1011.380097] EDAC DEBUG: edac_mc_dump_dimm: dimm = ffff88010fe90400
[ 1011.380099] EDAC DEBUG: edac_mc_dump_dimm: dimm->label = 'CPU#0Channel#0_DIMM#0'
[ 1011.380101] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000
[ 1011.380103] EDAC DEBUG: edac_mc_dump_dimm: dimm->grain = 8
[ 1011.380104] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000
...

(a rank-based memory controller would print, instead of "dimm?", "rank?"
on the above debug info)

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 956b9ba1 29-Apr-2012 Joe Perches <joe@perches.com>

edac: Convert debugfX to edac_dbg(X,

Use a more common debugging style.

Remove __FILE__ uses, add missing newlines,
coalesce formats and align arguments.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# dd23cd6e 29-Apr-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs

The debug macro already adds that. Most of the work here was
made by this small script:

$f .=$_ while (<>);

$f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*": /\1"/g;
$f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*/\1/g;
$f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*"MC: /\1"/g;

$f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g;
$f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g;
$f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g;
$f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g;

$f =~ s/\"MC\: \\n\"/"MC:\\n"/g;

print $f;

After running the script, manual cleanups were done to fix it the remaining
places.

While here, removed the __LINE__ on most places, as it doesn't actually give
useful info on most places.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# de3910eb 24-Apr-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: change the mem allocation scheme to make Documentation/kobject.txt happy

Kernel kobjects have rigid rules: each container object should be
dynamically allocated, and can't be allocated into a single kmalloc.

EDAC never obeyed this rule: it has a single malloc function that
allocates all needed data into a single kzalloc.

As this is not accepted anymore, change the allocation schema of the
EDAC *_info structs to enforce this Kernel standard.

Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Greg K H <gregkh@linuxfoundation.org>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# d90c0089 21-Mar-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: Get rid of the old kobj's from the edac mc code

Now that al users for the old kobj raw access are gone,
we can get rid of the legacy kobj-based structures and
data.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 7a623c03 16-Apr-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: rewrite the sysfs code to use struct device

The EDAC subsystem uses the old struct sysdev approach,
creating all nodes using the raw sysfs API. This is bad,
as the API is deprecated.

As we'll be changing the EDAC API, let's first port the existing
code to struct device.

There's one drawback on this patch: driver-specific sysfs
nodes, used by mpc85xx_edac, amd64_edac and i7core_edac
won't be created anymore. While it would be possible to
also port the device-specific code, that would mix kobj with
struct device, with is not recommended. Also, it is easier and nicer
to move the code to the drivers, instead, as the core can get rid
of some complex logic that just emulates what the device_add()
and device_create_file() already does.

The next patches will convert the driver-specific code to use
the device-specific calls. Then, the remaining bits of the old
sysfs API will be removed.

NOTE: a per-MC bus is required, otherwise devices with more than
one memory controller will hit a bug like the one below:

[ 819.094946] EDAC DEBUG: find_mci_by_dev: find_mci_by_dev()
[ 819.094948] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device() idx=1
[ 819.094952] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device(): creating device mc1
[ 819.094967] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device creating dimm0, located at channel 0 slot 0
[ 819.094984] ------------[ cut here ]------------
[ 819.100142] WARNING: at fs/sysfs/dir.c:481 sysfs_add_one+0xc1/0xf0()
[ 819.107282] Hardware name: S2600CP
[ 819.111078] sysfs: cannot create duplicate filename '/bus/edac/devices/dimm0'
[ 819.119062] Modules linked in: sb_edac(+) edac_core ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun kvm microcode pcspkr iTCO_wdt iTCO_vendor_support igb i2c_i801 i2c_core sg ioatdma dca sr_mod cdrom sd_mod crc_t10dif ahci libahci isci libsas libata scsi_transport_sas scsi_mod wmi dm_mod [last unloaded: scsi_wait_scan]
[ 819.175748] Pid: 10902, comm: modprobe Not tainted 3.3.0-0.11.el7.v12.2.x86_64 #1
[ 819.184113] Call Trace:
[ 819.186868] [<ffffffff8105adaf>] warn_slowpath_common+0x7f/0xc0
[ 819.193573] [<ffffffff8105aea6>] warn_slowpath_fmt+0x46/0x50
[ 819.200000] [<ffffffff811f53d1>] sysfs_add_one+0xc1/0xf0
[ 819.206025] [<ffffffff811f5cf5>] sysfs_do_create_link+0x135/0x220
[ 819.212944] [<ffffffff811f7023>] ? sysfs_create_group+0x13/0x20
[ 819.219656] [<ffffffff811f5df3>] sysfs_create_link+0x13/0x20
[ 819.226109] [<ffffffff813b04f6>] bus_add_device+0xe6/0x1b0
[ 819.232350] [<ffffffff813ae7cb>] device_add+0x2db/0x460
[ 819.238300] [<ffffffffa0325634>] edac_create_dimm_object+0x84/0xf0 [edac_core]
[ 819.246460] [<ffffffffa0325e18>] edac_create_sysfs_mci_device+0xe8/0x290 [edac_core]
[ 819.255215] [<ffffffffa0322e2a>] edac_mc_add_mc+0x5a/0x2c0 [edac_core]
[ 819.262611] [<ffffffffa03412df>] sbridge_register_mci+0x1bc/0x279 [sb_edac]
[ 819.270493] [<ffffffffa03417a3>] sbridge_probe+0xef/0x175 [sb_edac]
[ 819.277630] [<ffffffff813ba4e8>] ? pm_runtime_enable+0x58/0x90
[ 819.284268] [<ffffffff812f430c>] local_pci_probe+0x5c/0xd0
[ 819.290508] [<ffffffff812f5ba1>] __pci_device_probe+0xf1/0x100
[ 819.297117] [<ffffffff812f5bea>] pci_device_probe+0x3a/0x60
[ 819.303457] [<ffffffff813b1003>] really_probe+0x73/0x270
[ 819.309496] [<ffffffff813b138e>] driver_probe_device+0x4e/0xb0
[ 819.316104] [<ffffffff813b149b>] __driver_attach+0xab/0xb0
[ 819.322337] [<ffffffff813b13f0>] ? driver_probe_device+0xb0/0xb0
[ 819.329151] [<ffffffff813af5d6>] bus_for_each_dev+0x56/0x90
[ 819.335489] [<ffffffff813b0d7e>] driver_attach+0x1e/0x20
[ 819.341534] [<ffffffff813b0980>] bus_add_driver+0x1b0/0x2a0
[ 819.347884] [<ffffffffa0347000>] ? 0xffffffffa0346fff
[ 819.353641] [<ffffffff813b19f6>] driver_register+0x76/0x140
[ 819.359980] [<ffffffff8159f18b>] ? printk+0x51/0x53
[ 819.365524] [<ffffffffa0347000>] ? 0xffffffffa0346fff
[ 819.371291] [<ffffffff812f5896>] __pci_register_driver+0x56/0xd0
[ 819.378096] [<ffffffffa0347054>] sbridge_init+0x54/0x1000 [sb_edac]
[ 819.385231] [<ffffffff8100203f>] do_one_initcall+0x3f/0x170
[ 819.391577] [<ffffffff810bcd2e>] sys_init_module+0xbe/0x230
[ 819.397926] [<ffffffff815bb529>] system_call_fastpath+0x16/0x1b
[ 819.404633] ---[ end trace 1654fdd39556689f ]---

This happens because the bus is not being properly initialized.
Instead of putting the memory sub-devices inside the memory controller,
it is putting everything under the same directory:

$ tree /sys/bus/edac/
/sys/bus/edac/
├── devices
│ ├── all_channel_counts -> ../../../devices/system/edac/mc/mc0/all_channel_counts
│ ├── csrow0 -> ../../../devices/system/edac/mc/mc0/csrow0
│ ├── csrow1 -> ../../../devices/system/edac/mc/mc0/csrow1
│ ├── csrow2 -> ../../../devices/system/edac/mc/mc0/csrow2
│ ├── dimm0 -> ../../../devices/system/edac/mc/mc0/dimm0
│ ├── dimm1 -> ../../../devices/system/edac/mc/mc0/dimm1
│ ├── dimm3 -> ../../../devices/system/edac/mc/mc0/dimm3
│ ├── dimm6 -> ../../../devices/system/edac/mc/mc0/dimm6
│ ├── inject_addrmatch -> ../../../devices/system/edac/mc/mc0/inject_addrmatch
│ ├── mc -> ../../../devices/system/edac/mc
│ └── mc0 -> ../../../devices/system/edac/mc/mc0
├── drivers
├── drivers_autoprobe
├── drivers_probe
└── uevent

On a multi-memory controller system, the names "csrow%d" and "dimm%d"
should be under "mc%d", and not at the main hierarchy level.

So, we need to create a per-MC bus, in order to have its own namespace.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Greg K H <gregkh@linuxfoundation.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 8447c4d1 06-Jun-2012 Chris Metcalf <cmetcalf@tilera.com>

edac: Do alignment logic properly in edac_align_ptr()

The logic was checking the sizeof the structure being allocated to
determine whether an alignment fixup was required. This isn't right;
what we actually care about is the alignment of the actual pointer that's
about to be returned. This became an issue recently because struct
edac_mc_layer has a size that is not zero modulo eight, so we were
taking the correctly-aligned pointer and forcing it to be misaligned.
On Tile this caused an alignment exception.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# fd687502 16-Mar-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: Rename the parent dev to pdev

As EDAC doesn't use struct device itself, it created a parent dev
pointer called as "pdev". Now that we'll be converting it to use
struct device, instead of struct devsys, this needs to be fixed.

No functional changes.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 53f2d028 23-Feb-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

RAS: Add a tracepoint for reporting memory controller events

Add a new tracepoint-based hardware events report method for
reporting Memory Controller events.

Part of the description bellow is shamelessly copied from Tony
Luck's notes about the Hardware Error BoF during LPC 2010 [1].
Tony, thanks for your notes and discussions to generate the
h/w error reporting requirements.

[1] http://lwn.net/Articles/416669/

We have several subsystems & methods for reporting hardware errors:

1) EDAC ("Error Detection and Correction"). In its original form
this consisted of a platform specific driver that read topology
information and error counts from chipset registers and reported
the results via a sysfs interface.

2) mcelog - x86 specific decoding of machine check bank registers
reporting in binary form via /dev/mcelog. Recent additions make use
of the APEI extensions that were documented in version 4.0a of the
ACPI specification to acquire more information about errors without
having to rely reading chipset registers directly. A user level
programs decodes into somewhat human readable format.

3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
decodes errors reported via machine check bank registers in AMD
processors to the console log using printk();

Each of these mechanisms has a band of followers ... and none
of them appear to meet all the needs of all users.

As part of a RAS subsystem, let's encapsulate the memory error hardware
events into a trace facility.

The tracepoint printk will be displayed like:

mc_event: [quant] (Corrected|Uncorrected|Fatal) error:[error msg] on [label] ([location] [edac_mc detail] [driver_detail]

Where:
[quant] is the quantity of errors
[error msg] is the driver-specific error message
(e. g. "memory read", "bus error", ...);
[location] is the location in terms of memory controller and
branch/channel/slot, channel/slot or csrow/channel;
[label] is the memory stick label;
[edac_mc detail] describes the address location of the error
and the syndrome;
[driver detail] is driver-specifig error message details,
when needed/provided (e. g. "area:DMA", ...)

For example:

mc_event: 1 Corrected error:memory read on memory stick DIMM_1A (mc:0 location:0:0:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)

Of course, any userspace tools meant to handle errors should not parse
the above data. They should, instead, use the binary fields provided by
the tracepoint, mapping them directly into their Management Information
Base.

NOTE: The original patch was providing an additional mechanism for
MCA-based trace events that also contained MCA error register data.
However, as no agreement was reached so far for the MCA-based trace
events, for now, let's add events only for memory errors.
A latter patch is planned to change the tracepoint, for those types
of event.

Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 5926ff50 09-Feb-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: Initialize the dimm label with the known information

While userspace doesn't fill the dimm labels, add there the dimm location,
as described by the used memory model. This could eventually match what
is described at the dmidecode, making easier for people to identify the
memory.

For example, on an Intel motherboard where the DMI table is reliable,
the first memory stick is described as:

Memory Device
Array Handle: 0x0029
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 2048 MB
Form Factor: DIMM
Set: 1
Locator: A1_DIMM0
Bank Locator: A1_Node0_Channel0_Dimm0
Type: <OUT OF SPEC>
Type Detail: Synchronous
Speed: 800 MHz
Manufacturer: A1_Manufacturer0
Serial Number: A1_SerNum0
Asset Tag: A1_AssetTagNum0
Part Number: A1_PartNum0

The memory named as "A1_DIMM0" is physically located at the first
memory controller (node 0), at channel 0, dimm slot 0.

After this patch, the memory label will be filled with:
/sys/devices/system/edac/mc/csrow0/ch0_dimm_label:mc#0channel#0slot#0

And (after the new EDAC API patches) as:
/sys/devices/system/edac/mc/mc0/dimm0/dimm_label:mc#0channel#0slot#0

So, even if the memory label is not initialized on userspace, an useful
information with the error location is filled there, expecially since
several systems/motherboards are provided with enough info to map from
channel/slot (or branch/channel/slot) into the DIMM label. So, letting the
EDAC core fill it by default is a good thing.

It should noticed that, as the label filling happens at the
edac_mc_alloc(), drivers can override it to better describe the memories
(and some actually do it).

Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# ca0907b9 02-May-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: Remove the legacy EDAC ABI

Now that all drivers got converted to use the new ABI, we can
drop the old one.

Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 4275be63 18-Apr-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: Change internal representation to work with layers

Change the EDAC internal representation to work with non-csrow
based memory controllers.

There are lots of those memory controllers nowadays, and more
are coming. So, the EDAC internal representation needs to be
changed, in order to work with those memory controllers, while
preserving backward compatibility with the old ones.

The edac core was written with the idea that memory controllers
are able to directly access csrows.

This is not true for FB-DIMM and RAMBUS memory controllers.

Also, some recent advanced memory controllers don't present a per-csrows
view. Instead, they view memories as DIMMs, instead of ranks.

So, change the allocation and error report routines to allow
them to work with all types of architectures.

This will allow the removal of several hacks with FB-DIMM and RAMBUS
memory controllers.

Also, several tests were done on different platforms using different
x86 drivers.

TODO: a multi-rank DIMMs are currently represented by multiple DIMM
entries in struct dimm_info. That means that changing a label for one
rank won't change the same label for the other ranks at the same DIMM.
This bug is present since the beginning of the EDAC, so it is not a big
deal. However, on several drivers, it is possible to fix this issue, but
it should be a per-driver fix, as the csrow => DIMM arrangement may not
be equal for all. So, don't try to fix it here yet.

I tried to make this patch as short as possible, preceding it with
several other patches that simplified the logic here. Yet, as the
internal API changes, all drivers need changes. The changes are
generally bigger in the drivers for FB-DIMMs.

Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 93e4fe64 16-Apr-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: rewrite edac_align_ptr()

The edac_align_ptr() function is used to prepare data for a single
memory allocation kzalloc() call. It counts how many bytes are needed
by some data structure.

Using it as-is is not that trivial, as the quantity of memory elements
reserved is not there, but, instead, it is on a next call.

In order to avoid mistakes when using it, move the number of allocated
elements into it, making easier to use it.

Reviewed-by: Borislav Petkov <bp@amd64.org>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# a895bf8b 28-Jan-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: move nr_pages to dimm struct

The number of pages is a dimm property. Move it to the dimm struct.

After this change, it is possible to add sysfs nodes for the DIMM's that
will properly represent the DIMM stick properties, including its size.

A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when
the memory controller represents the memory via chip select rows.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 084a4fcc 27-Jan-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: move dimm properties to struct dimm_info

On systems based on chip select rows, all channels need to use memories
with the same properties, otherwise the memories on channels A and B
won't be recognized.

However, such assumption is not true for all types of memory
controllers.

Controllers for FB-DIMM's don't have such requirements.

Also, modern Intel controllers seem to be capable of handling such
differences.

So, we need to get rid of storing the DIMM information into a per-csrow
data, storing it, instead at the right place.

The first step is to move grain, mtype, dtype and edac_mode to the
per-dimm struct.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: James Bottomley <James.Bottomley@parallels.com>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Mike Williams <mike@mikebwilliams.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# a7d7d2e1 27-Jan-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: Create a dimm struct and move the labels into it

The way a DIMM is currently represented implies that they're
linked into a per-csrow struct. However, some drivers don't see
csrows, as they're ridden behind some chip like the AMB's
on FBDIMM's, for example.

This forced drivers to fake^Wvirtualize a csrow struct, and to create
a mess under csrow/channel original's concept.

Move the DIMM labels into a per-DIMM struct, and add there
the real location of the socket, in terms of csrow/channel.
Latter patches will modify the location to properly represent the
memory architecture.

All other drivers will use a per-csrow type of location.
Some of those drivers will require a latter conversion, as
they also fake the csrows internally.

TODO: While this patch doesn't change the existing behavior, on
csrows-based memory controllers, a csrow/channel pair points to a memory
rank. There's a known bug at the EDAC core that allows having different
labels for the same DIMM, if it has more than one rank. A latter patch
is need to merge the several ranks for a DIMM into the same dimm_info
struct, in order to avoid having different labels for the same DIMM.

The edac_mc_alloc() will now contain a per-dimm initialization loop that
will be changed by latter patches in order to match other types of
memory architectures.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# a4b4be3f 27-Jan-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: rename channel_info to rank_info

What it is pointed by a csrow/channel vector is a rank information, and
not a channel information.

On a traditional architecture, the memory controller directly access the
memory ranks, via chip select rows. Different ranks at the same DIMM is
selected via different chip select rows. So, typically, one
csrow/channel pair means one different DIMM.

On FB-DIMMs, there's a microcontroller chip at the DIMM, called Advanced
Memory Buffer (AMB) that serves as the interface between the memory
controller and the memory chips.

The AMB selection is via the DIMM slot, and not via a csrow.

It is up to the AMB to talk with the csrows of the DRAM chips.

So, the FB-DIMM memory controllers see the DIMM slot, and not the DIMM
rank. RAMBUS is similar.

Newer memory controllers, like the ones found on Intel Sandy Bridge and
Nehalem, even working with normal DDR3 DIMM's, don't use the usual
channel A/channel B interleaving schema to provide 128 bits data access.

Instead, they have more channels (3 or 4 channels), and they can use
several interleaving schemas. Such memory controllers see the DIMMs
directly on their registers, instead of the ranks, which is better for
the driver, as its main usageis to point to a broken DIMM stick (the
Field Repleceable Unit), and not to point to a broken DRAM chip.

The drivers that support such such newer memory architecture models
currently need to fake information and to abuse on EDAC structures, as
the subsystem was conceived with the idea that the csrow would always be
visible by the CPU.

To make things a little worse, those drivers don't currently fake
csrows/channels on a consistent way, as the concepts there don't apply
to the memory controllers they're talking with. So, each driver author
interpreted the concepts using a different logic.

In order to fix it, let's rename the data structure that points into a
DIMM rank to "rank_info", in order to be clearer about what's stored
there.

Latter patches will provide a better way to represent the memory
hierarchy for the other types of memory controller.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 4e5df7ca 25-Nov-2011 Cong Wang <amwang@redhat.com>

edac: remove the second argument of k[un]map_atomic()

Signed-off-by: Cong Wang <amwang@redhat.com>


# fe5ff8b8 14-Dec-2011 Kay Sievers <kay.sievers@vrfy.org>

edac: convert sysdev_class to a regular subsystem

After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# e2e77098 26-May-2011 Lai Jiangshan <laijs@cn.fujitsu.com>

edac,rcu: use synchronize_rcu() instead of call_rcu()+rcu_barrier()

synchronize_rcu() does the stuff as needed.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# 24f9a7fe 07-Oct-2010 Borislav Petkov <borislav.petkov@amd.com>

amd64_edac: Rework printk macros

Add a macro per printk level, shorten up error messages. Add relevant
information to KERN_INFO level. No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>


# bb31b312 02-Dec-2010 Borislav Petkov <borislav.petkov@amd.com>

EDAC: Fix workqueue-related crashes

00740c58541b6087d78418cebca1fcb86dc6077d changed edac_core to
un-/register a workqueue item only if a lowlevel driver supplies a
polling routine. Normally, when we remove a polling low-level driver, we
go and cancel all the queued work. However, the workqueue unreg happens
based on the ->op_state setting, and edac_mc_del_mc() sets this to
OP_OFFLINE _before_ we cancel the work item, leading to NULL ptr oops on
the workqueue list.

Fix it by putting the unreg stuff in proper order.

Cc: <stable@kernel.org> #36.x
Reported-and-tested-by: Tobias Karnat <tobias.karnat@googlemail.com>
LKML-Reference: <1291201307.3029.21.camel@Tobias-Karnat>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>


# accf74ff 16-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: don't use a freed mci struct

This is a nasty bug. Since kobject count will be reduced by zero by
edac_mc_del_mc(), and this triggers the kobj release method, the
mci memory will be freed automatically. So, all we have left is ctl_name,
as shown by enabling debug:

[ 80.822186] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device() remove_link
[ 80.832590] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device() remove_mci_instance
[ 80.843776] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing
[ 80.855163] EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:3f:03.0
[ 80.862936] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2089: (null): free structs
[ 80.871134] EDAC DEBUG: in drivers/edac/edac_mc.c, line at 238: edac_mc_free()
[ 80.878379] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 726: edac_mc_unregister_sysfs_main_kobj()
[ 80.888043] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1232: drivers/edac/i7core_edac.c: i7core_put_devices()

Also, kfree(mci) shouldn't happen at the kobj.release, as it happens
when edac_remove_sysfs_mci_device() is called, but the logic is:
edac_remove_sysfs_mci_device(mci);
edac_printk(KERN_INFO, EDAC_MC,
"Removed device %d for %s %s: DEV %s\n", mci->mc_idx,
mci->mod_name, mci->ctl_name, edac_dev_name(mci));
So, as the edac_printk() needs the mci struct, this generates an OOPS.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# bbc560ae 16-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

edac_core: Print debug messages at release calls

This is important to track a nasty bug at the free logic.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 6fe1108f 11-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

edac_core: Do a better job with node removal

Make sure we remove groups at the right order

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 939747bd 10-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Be sure that the edac pci handler will be properly released

With multi-sockets, more than one edac pci handler is enabled. Be sure to
un-register all instances.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 00740c58 25-Sep-2010 Borislav Petkov <borislav.petkov@amd.com>

amd64_edac: Fix driver module removal

f4347553b30ec66530bfe63c84530afea3803396 removed the edac polling
mechanism in favor of using a notifier chain for conveying MCE
information to edac. However, the module removal path didn't test
whether the driver had setup the polling function workqueue at all and
the rmmod process was hanging in the kernel at try_to_del_timer_sync()
in the cancel_delayed_work() path, trying to cancel an uninitialized
work struct.

Fix that by adding a balancing check to the workqueue removal path.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>


# 239642fe 12-Nov-2009 Borislav Petkov <borislav.petkov@amd.com>

edac: add memory types strings for debugging

Instead of using deeply-nested conditionals for dumping the DIMM type in
debug mode, add a strings array of the supported DIMM types.

This is useful in cases where an edac driver supports multiple DRAM
types and is only defined in debug builds.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>


# 458e5ff1 23-Sep-2009 Jesper Dangaard Brouer <hawk@comx.dk>

edac: core: remove completion-wait for complete with rcu_barrier

Module edac_core.ko uses call_rcu() callbacks in edac_device.c, edac_mc.c
and edac_pci.c.

They all use a wait_for_completion() scheme, but this scheme it not 100%
safe on multiple CPUs. See the _rcu_barrier() implementation which
explains why extra precausion is needed.

The patch adds a comment about rcu_barrier() and as a precausion calls
rcu_barrier(). A maintainer needs to look at removing the
wait_for_completion code.

[dougthompson@xmission.com: remove the wait_for_completion code]
Signed-off-by Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# fbeb4384 13-Apr-2009 Jean Delvare <khali@linux-fr.org>

edac: use to_delayed_work()

The edac-core driver includes code which assumes that the work_struct
which is included in every delayed_work is the first member of that
structure. This is currently the case but might change in the future, so
use to_delayed_work() instead, which doesn't make such an assumption.

linux-2.6.30-rc1 has the to_delayed_work() function that will allow this
patch to work

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 281efb17 06-Jan-2009 Kay Sievers <kay.sievers@vrfy.org>

edac: struct device: replace bus_id with dev_name(), dev_set_name()

This patch is part of a larger patch series which will remove the "char
bus_id[20]" name string from struct device. The device name is managed in
the kobject anyway, and without any size limitation, and just needlessly
copied into "struct device".

[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 17aa7e03 04-May-2008 Stephen Rothwell <sfr@canb.auug.org.au>

dev_name introduction fall out fix

Commit 06916639e2fed9ee475efef2747a1b7429f8fe76 ("driver-core: add
dev_name() to help transition away from using bus_id") added a static
inline dev_name() and used it in dev_printk.

Unfortunately, drivers/edac/edac_core.h defines a macro called
dev_name(). Rename the latter.

Diagnosis by Tony Breeds and Michael Ellerman.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1a45027d 29-Apr-2008 Adrian Bunk <bunk@kernel.org>

edac: remove unneeded functions and add static accessor

Collection of patches, merged into one, from Adrian that do the following:

1) This patch makes the following needlessly global functions static:
- edac_pci_get_log_pe()
- edac_pci_get_log_npe()
- edac_pci_get_panic_on_pe()
- edac_pci_unregister_sysfs_instance_kobj()
- edac_pci_main_kobj_setup()

2) Remove unneeded function edac_device_find()

3) Added #if 0 around function edac_pci_find()

4) make the needlessly global edac_pci_generic_check() static

5) Removed function edac_check_mc_devices()

Doug Thompson modified Adrian's patches, to bettern represent
the direction of EDAC, and make them one patch.

Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ff6ac2a6 29-Apr-2008 Robert P. J. Day <rpjday@crashcourse.ca>

edac: use the shorter LIST_HEAD for brevity

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bce19683 26-Jul-2007 Doug Thompson <dougthompson@xmission.com>

drivers/edac: fix reset edac_mc pollmsec

This fixes a deadlock that could occur on a 'setup' and 'teardown' sequence of
the workq for a edac_mc control structure instance. A similiar fix was
previously implemented for the edac_device code.

In addition, the edac_mc device code there was missing code to allow the workq
period valu to be altered via sysfs control.

This patch adds that fix on the code, and allows for the changing of the
period value as well.

Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bf52fa4a 19-Jul-2007 Doug Thompson <dougthompson@xmission.com>

drivers/edac: fix workq reset deadlock

Fix mutex locking deadlock on the device controller linked list. Was calling
a lock then a function that could call the same lock. Moved the cancel workq
function to outside the lock

Added some short circuit logic in the workq code

Added comments of description

Code tidying

Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Greg KH <greg@kroah.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8096cfaf 19-Jul-2007 Doug Thompson <dougthompson@xmission.com>

drivers/edac: fix edac_mc sysfs completion code

This patch refactors the 'releasing' of kobjects for the edac_mc type of
device. The correct pattern of kobject release is followed.

As internal kobjs are allocated they bump a ref count on the top level kobj.
It in turn has a module ref count on the edac_core module. When internal
kobjects are released, they dec the ref count on the top level kobj. When the
top level kobj reaches zero, it decrements the ref count on the edac_core
object, allow it to be unloaded, as all resources have all now been released.

Cc: Alan Cox alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b8f6f975 19-Jul-2007 Doug Thompson <dougthompson@xmission.com>

drivers/edac: fix edac_mc init apis

Refactoring of sysfs code necessitated the refactoring of the edac_mc_alloc()
and edac_mc_add_mc() apis, of moving the index value to the alloc() function.
This patch alters the in tree drivers to utilize this new api signature.

Having the index value performed later created a chicken-and-the-egg issue.
Moving it to the alloc() function allows for creating the necessary sysfs
entries with the proper index number

Cc: Alan Cox alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7391c6dc 19-Jul-2007 Douglas Thompson <dougthompson@xmission.com>

drivers/edac: mod edac_align_ptr function

Refactor the edac_align_ptr() function to reduce the noise of casting the
aligned pointer to the various types of data objects and modified its callers
to its new signature

Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 052dfb45 19-Jul-2007 Douglas Thompson <dougthompson@xmission.com>

drivers/edac: cleanup spaces-gotos after Lindent messup

This patch fixes some remnant spaces inserted by the use of Lindent.
Seems Lindent adds some spaces when it shoulded. These have been fixed.
In addition, goto targets have issues, these have been fixed
in this patch.

Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 86aa8cb7 19-Jul-2007 Douglas Thompson <dougthompson@xmission.com>

drivers/edac: cleanup workq ifdefs

The origin of this code comes from patches at sourceforge, that
allow EDAC to be updated to various kernels. With kernel version 2.6.20 a
new workq system was installed, thus the patches needed to be modified
based on the kernel version. For submitting to the latest kernel.org
those #ifdefs are removed

Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 079708b9 19-Jul-2007 Douglas Thompson <dougthompson@xmission.com>

drivers/edac: core Lindent cleanup

Run the EDAC CORE files through Lindent for cleanup

Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4de78c68 19-Jul-2007 Dave Jiang <djiang@mvista.com>

drivers/edac: mod PCI poll names

Fixup poll values for MC and PCI.
Also make mc function names unique to mc.

Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmissin.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 66ee2f94 19-Jul-2007 Dave Jiang <djiang@mvista.com>

drivers/edac: mod assert_error check

Change error check and clear variable from an atomic to an int

Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 81d87cb1 19-Jul-2007 Dave Jiang <djiang@mvista.com>

drivers/edac: mod MC to use workq instead of kthread

Move the memory controller object to work queue based implementation from the
kernel thread based.

Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c4192705 19-Jul-2007 Dave Jiang <djiang@mvista.com>

drivers/edac: add dev_name getter function

Move dev_name() macro to a more generic interface since it's not possible
to determine whether a device is pci, platform, or of_device easily.

Now each low level driver sets the name into the control structure, and
the EDAC core references the control structure for the information.

Better abstraction.

Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 20bcb7a8 19-Jul-2007 Douglas Thompson <dougthompson@xmission.com>

drivers/edac: mod use edac_core.h

In the refactoring of edac_mc.c into several subsystem files,
the header file edac_mc.h became meaningless. A new header file
edac_core.h was created. All the files that previously included
"edac_mc.h" are changed to include "edac_core.h".

Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c0d12172 19-Jul-2007 Dave Jiang <djiang@mvista.com>

drivers/edac: add new nmi rescan

Provides a way for NMI reported errors on x86 to notify the EDAC
subsystem pending ECC errors by writing to a software state variable.

Here's the reworked patch. I added an EDAC stub to the kernel so we can
have variables that are in the kernel even if EDAC is a module. I also
implemented the idea of using the chip driver to select error detection
mode via module parameter and eliminate the kernel compile option.
Please review/test. Thx!

Also, I only made changes to some of the chipset drivers since I am
unfamiliar with the other ones. We can add similar changes as we go.

Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 63b7df91 19-Jul-2007 Matthias Kaehlcke <matthias.kaehlcke@gmail.com>

drivers/edac: change from semaphore to mutex operation

The EDAC core code uses a semaphore as mutex. use the mutex API
instead of the (binary) semaphore.

Matthaias wrote this, but since I had some patches ahead of it,
I need to modify it to follow my patches.

Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e27e3dac 19-Jul-2007 Douglas Thompson <dougthompson@xmission.com>

drivers/edac: add edac_device class

This patch adds the new 'class' of object to be managed, named: 'edac_device'.

As a peer of the 'edac_mc' class of object, it provides a non-memory centric
view of an ERROR DETECTING device in hardware. It provides a sysfs interface
and an abstraction for varioius EDAC type devices.

Multiple 'instances' within the class are possible, with each 'instance'
able to have multiple 'blocks', and each 'block' having 'attributes'.

At the 'block' level there are the 'ce_count' and 'ue_count' fields
which the device driver can update and/or call edac_device_handle_XX()
functions. At each higher level are additional 'total' count fields,
which are a summation of counts below that level.

This 'edac_device' has been used to capture and present ECC errors
which are found in a a L1 and L2 system on a per CORE/CPU basis.

Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7c9281d7 19-Jul-2007 Douglas Thompson <dougthompson@xmission.com>

drivers/edac: split out functions to unique files

This is a large patch to refactor the original EDAC module in the kernel
and to break it up into better file granularity, such that each source
file contains a given subsystem of the EDAC CORE.

Originally, the EDAC 'core' was contained in one source file: edac_mc.c
with it corresponding edac_mc.h file.

Now, there are the following files:

edac_module.c The main module init/exit function and other overhead
edac_mc.c Code handling the edac_mc class of object
edac_mc_sysfs.c Code handling for sysfs presentation
edac_pci_sysfs.c Code handling for PCI sysfs presentation
edac_core.h CORE .h include file for 'edac_mc' and 'edac_device' drivers
edac_module.h Internal CORE .h include file

This forms a foundation upon which a later patch can create the 'edac_device'
class of object code in a new file 'edac_device.c'.

Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2da1c119 19-Jul-2007 Adrian Bunk <bunk@stusta.de>

drivers/edac: core: make functions static

This patch makes needlessly global code static, in the edac core

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Doug Thompson <norsk5@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5da0831c 19-Jul-2007 Douglas Thompson <dougthompson@xmission.com>

drivers/edac: add edac_mc_find API

This simple patch adds an important CORE API for EDAC that EDAC drivers can
use to find their edac_mc control structure by passing a mem_ctl_info
'instance' value

Needed for subsequent patches

Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 83144186 17-Jul-2007 Rafael J. Wysocki <rjw@rjwysocki.net>

Freezer: make kernel threads nonfreezable by default

Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves. This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.

It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.

The patch causes all kernel threads to be nonfreezable by default (ie. to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE. It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear. Additionally, it updates documentation to
describe the freezing of tasks more accurately.

[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9794f33d 12-Feb-2007 eric wollesen <ericw@xmtp.net>

[PATCH] EDAC: Add Fully-Buffered DIMM APIs to core

Eric Wollesen ported the Bluesmoke Memory Controller driver for the Intel
5000X/V/P (Blackford/Greencreek) chipset to the in kernel EDAC model.

This patch incorporates those required changes to the edac_mc.c and edac_mc.h
core files by added new Fully Buffered DIMM interface to the EDAC Core module.

Signed-off-by: eric wollesen <ericw@xmtp.net>
Signed-off-by: doug thompson <norsk5@xmission.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4f423ddf 12-Feb-2007 Frithiof Jensen <frithiof.jensen@ericson.com>

[PATCH] EDAC: Add memory scrubbing controls API to core

This is an attempt of providing an interface for memory scrubbing control in
EDAC.

This patch modifies the EDAC Core to provide the Interface for memory
controller modules to implment.

The following things are still outstanding:

- K8 is the first implemenation,

The patch provide a method of configuring the K8 hardware memory scrubber
via the 'mcX' sysfs directory. There should be some fallback to a generic
scrubber implemented in software if the hardware does not support
scrubbing.

Or .. the scrubbing sysfs entry should not be visible at all.

- Only works with SDRAM, not cache,

The K8 can scrub cache and l2cache also - but I think this is not so
useful as the cache is busy all the time (one hopes).

One would also expect that cache scrubbing requires hardware support.

- Error Handling,

I would like that errors are returned to the user in "terms of file
system".

- Presentation,

I chose Bandwidth in Bytes/Second as a representation of the scrubbing
rate for the following reasons:

I like that the sysfs entries are sort-of textual, related to something
that makes sense instead of magical values that must be looked up.

"My People" wants "% main memory scrubbed per hour" others prefer "%
memory bandwidth used" as representation, "bandwith used" makes it easy to
calculate both versions in one-liner scripts.

If one later wants to scrub cache, the scaling becomes wierd for K8
changing from "blocks of 64 byte memory" to "blocks of 64 cache lines" to
"blocks of 64 bit". Using "bandwidth used" makes sense in all three cases,
(I.M.O. anyway ;-).

- Discovery,

There is no way to discover the possible settings and what they do
without reading the code and the documentation.

*I* do not know how to make that work in a practical way.

- Bugs(??),

other tools can set invalid values in the memory scrub control register,
those will read back as '-1', requiring the user to reset the scrub rate.
This is how *I* think it should be.

- Afflicting other areas of code,

I made changes to edac_mc.c and edac_mc.h which will show up globally -
this is not nice, it would be better that the memory scrubbing fuctionality
and interface could be entirely contained within the memory controller it
applies to.

Frithiof Jensen

edac_mc.c and its .h file is a CORE helper module for EDAC
driver modules. This provides the abstraction for device specific
drivers. It is fine to modify this CORE to provide help for
new features of the the drivers

doug thompson

Signed-off-by: Frithiof Jensen <frithiof.jensen@ericson.com>
Signed-off-by: doug thompson <norsk5@xmission.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7dfb7103 06-Dec-2006 Nigel Cunningham <ncunningham@linuxmail.org>

[PATCH] Add include/linux/freezer.h and move definitions from sched.h

Move process freezing functions from include/linux/sched.h to freezer.h, so
that modifications to the freezer or the kernel configuration don't require
recompiling just about everything.

[akpm@osdl.org: fix ueagle driver]
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 77d6e139 02-Nov-2006 Akinobu Mita <akinobu.mita@gmail.com>

[PATCH] edac_mc: fix error handling

Call sysdev_class_unregister() on failure in edac_sysfs_memctrl_setup()
and decrease identation level for clear logic.

Acked-by: Doug Thompson <norsk5@xmission.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 49c0dab7 10-Jul-2006 Doug Thompson <norsk5@xmission.com>

[PATCH] Fix and enable EDAC sysfs operation

When EDAC was first introduced into the kernel it had a sysfs interface,
but due to some problems it was disabled in 2.6.16 and remained disabled in
2.6.17.

With feedback, several of the control and attribute files of that interface
had some good constructive feedback. PCI Blacklist/Whitelist was a major
set which has design issues and it has been removed in this patch. Instead
of storing PCI broken parity status in EDAC, it has been moved to the
pci_dev structure itself by a previous PCI patch. A future patch will
enable that feature in EDAC by utilizing the pci_dev info.

The sysfs is now enabled in this patch, with a minimal set of control and
attribute files for examining EDAC state and for enabling/disabling the
memory and PCI operations.

The Documentation for EDAC has also been updated to reflect the new state
of EDAC operation.

Signed-off-by:Doug Thompson <norsk5@xmisson.com>
Cc: Greg KH <greg@kroah.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 2d7bbb91 30-Jun-2006 Doug Thompson <norsk5@xmission.com>

[PATCH] EDAC: mc numbers refactor 1-of-2

Remove add_mc_to_global_list(). In next patch, this function will be
reimplemented with different semantics.

1 Reimplement add_mc_to_global_list() with semantics that allow the caller to
determine the ID number for a mem_ctl_info structure. Then modify
edac_mc_add_mc() so that the caller specifies the ID number for the new
mem_ctl_info structure. Platform-specific code should be able to assign the
ID numbers in a platform-specific manner. For instance, on Opteron it makes
sense to have the ID of the mem_ctl_info structure match the ID of the node
that the memory controller belongs to.

2 Modify callers of edac_mc_add_mc() so they use the new semantics.

Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 37f04581 30-Jun-2006 Doug Thompson <norsk5@xmission.com>

[PATCH] EDAC: PCI device to DEVICE cleanup

Change MC drivers from using CVS revision strings for their version number,
Now each driver has its own local string.

Remove some PCI dependencies from the core EDAC module. Made the code 'struct
device' centric instead of 'struct pci_dev' Most of the code changes here are
from a patch by Dave Jiang. It may be best to eventually move the
PCI-specific code into a separate source file.

Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 6ab3d562 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de>

Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>


# 7f927fcc 28-Mar-2006 Alexey Dobriyan <adobriyan@gmail.com>

[PATCH] Typo fixes

Fix a lot of typos. Eyeballed by jmc@ in OpenBSD.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 9110540f 26-Mar-2006 Dave Peterson <dsp@llnl.gov>

[PATCH] EDAC: use EXPORT_SYMBOL_GPL

Change all instances of EXPORT_SYMBOL() in the core EDAC module to
EXPORT_SYMBOL_GPL().

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# e7ecd891 26-Mar-2006 Dave Peterson <dsp@llnl.gov>

[PATCH] EDAC: formatting cleanup

Cosmetic indentation/formatting cleanup for EDAC code. Make sure we
are using tabs rather than spaces to indent, etc.

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 54933ddd 26-Mar-2006 Dave Peterson <dsp@llnl.gov>

[PATCH] EDAC: reorder EXPORT_SYMBOL macros

Fix EDAC code so EXPORT_SYMBOL comes after the function that is being
exported. This is to maintain consistency with the rest of the kernel.

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 18dbc337 26-Mar-2006 Dave Peterson <dsp@llnl.gov>

[PATCH] EDAC: protect memory controller list

- Fix code so we always hold mem_ctls_mutex while we are stepping
through the list of mem_ctl_info structures. Otherwise bad things
may happen if one task is stepping through the list while another
task is modifying it. We may eventually want to use reference
counting to manage the mem_ctl_info structures. In the meantime we
may as well fix this bug.

- Don't disable interrupts while we are walking the list of
mem_ctl_info structures in check_mc_devices(). This is unnecessary.

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 472678eb 26-Mar-2006 Dave Peterson <dsp@llnl.gov>

[PATCH] EDAC: kobject/sysfs fixes

- After we unregister a kobject, wait for our kobject release method
to call complete(). This causes us to wait until the kobject
reference count reaches 0. Otherwise, a task accessing the EDAC
sysfs interface can hold the reference count above 0 until after the
EDAC module has been unloaded. When the reference count finally
drops to 0, this will result in an attempt to call our release
method inside the EDAC module after the module has already been
unloaded.

This isn't the best fix, since a process can get stuck sleeping forever
uninterruptibly if the user does the following:

rmmod my_module < /sys/my_sysfs/file

I'll go back and implement a better fix later. However this should
be ok for now.

- Call edac_remove_sysfs_mci_device() from edac_mc_del_mc() rather
than from edac_mc_free(). Since edac_mc_add_mc() calls
edac_create_sysfs_mci_device(), edac_mc_del_mc() should call
edac_remove_sysfs_mci_device().

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 6e5a8748 26-Mar-2006 Dave Peterson <dsp@llnl.gov>

[PATCH] EDAC: kobject_init/kobject_put fixes

- Remove calls to kobject_init(). These are unnecessary because
kobject_register() calls kobject_init().

- Remove extra calls to kobject_put(). When we call
kobject_unregister(), this releases our reference to the kobject.
The extra calls to kobject_put() may cause the reference count to
drop to 0 while a kobject is still in use.

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 028a7b6d 26-Mar-2006 Dave Peterson <dsp@llnl.gov>

[PATCH] EDAC: edac_mc_add_mc fix [2/2]

This is part 2 of a 2-part patch set.

Fix edac_mc_add_mc() so it cleans up properly if call to
edac_create_sysfs_mci_device() fails.

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# a1d03fcc 26-Mar-2006 Dave Peterson <dsp@llnl.gov>

[PATCH] EDAC: edac_mc_add_mc fix [1/2]

This is part 1 of a 2-part patch set. The code changes are split into
two parts to make the patches more readable.

Move complete_mc_list_del() and del_mc_from_global_list() so we can
call del_mc_from_global_list() from edac_mc_add_mc() without forward
declarations. Perhaps using forward declarations would be better?
I'm doing things this way because the rest of the code is missing
them.

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 749ede57 26-Mar-2006 Dave Peterson <dsp@llnl.gov>

[PATCH] EDAC: cleanup code for clearing initial errors

Fix xxx_probe1() functions so they call xxx_get_error_info() functions
to clear initial errors. This is simpler and cleaner than duplicating
the low-level code for accessing PCI config space.

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 537fba28 26-Mar-2006 Dave Peterson <dsp@llnl.gov>

[PATCH] EDAC: printk cleanup

This implements the following idea:

On Monday 30 January 2006 19:22, Eric W. Biederman wrote:
> One piece missing from this conversation is the issue that we need errors
> in a uniform format. That is why edac_mc has helper functions.
>
> However there will always be errors that don't fit any particular model.
> Could we add a edac_printk(dev, ); That is similar to dev_printk but
> prints out an EDAC header and the device on which the error was found?
> Letting the rest of the string be user specified.
>
> For actual control that interface may be to blunt, but at least for people
> looking in the logs it allows all of the errors to be detected and
> harvested.

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# f2fe42ab 26-Mar-2006 Dave Peterson <dsp@llnl.gov>

[PATCH] EDAC: switch to kthread_ API

This patch was originally posted by Christoph Hellwig (see
http://lkml.org/lkml/2006/2/14/331):

"Christoph Hellwig" <hch@lst.de> wrote:
> Use the kthread_ API instead of opencoding lots of hairy code for kernel
> thread creation and teardown, including tasklist_lock abuse.
>

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Cc: <dave_peterson@pobox.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# ceb2ca9c 13-Mar-2006 Dave Peterson <dsp@llnl.gov>

[PATCH] EDAC: disable sysfs interface

- Disable the EDAC sysfs code. The sysfs interface that EDAC presents to
user space needs more thought, and is likely to change substantially.
Therefore disable it for now so users don't start depending on it in its
current form.

- Disable the default behavior of calling panic() when an uncorrectible
error is detected (since for now, there is no sysfs interface that allows
the user to configure this behavior).

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 4136cabf 11-Mar-2006 Arjan van de Ven <arjan@linux.intel.com>

[PATCH] edac: disable a few sysfs files to avoid them becoming an ABI

Disable (via ugly #if 0's) the 3 sysfs files that I think by now we all
agree are very much wrong. These files shouldn't become part of the ABI by
the 2.6.16 release, so I rather have this minimal patch merged to disable
them for now, the real fix can then come during the 2.6.17 devel window.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 353368df 03-Feb-2006 Eric W. Biederman <ebiederm@xmission.com>

[PATCH] edac_mc: Remove include of version.h

By including version.h edac_mc was rebuilding on every incremental build.
Which defeats the point of incremental builds.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# da9bb1d2 18-Jan-2006 Alan Cox <alan@lxorguk.ukuu.org.uk>

[PATCH] EDAC: core EDAC support code

This is a subset of the bluesmoke project core code, stripped of the NMI work
which isn't ready to merge and some of the "interesting" proc functionality
that needs reworking or just has no place in kernel. It requires no core
kernel changes except the added scrub functions already posted.

The goal is to merge further functionality only after the core code is
accepted and proven in the base kernel, and only at the point the upstream
extras are really ready to merge.

From: doug thompson <norsk5@xmission.com>

This converts EDAC to sysfs and is the final chunk neccessary before EDAC
has a stable user space API and can be considered for submission into the
base kernel.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: doug thompson <norsk5@xmission.com>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>