History log of /linux-master/drivers/edac/i7core_edac.c
Revision Date Author Comments
# a2f99fba 28-Nov-2023 Abhinav Singh <singhabhinav9051571833@gmail.com>

EDAC/{sb,i7core}_edac: Do not use a plain integer for a NULL pointer

Sparse warns about the use of the integer constant 0 as a NULL pointer
with the -Wnon-pointer-null switch.

Even though the C standard requires that 0 == NULL and type conversion
rules turn an integer constant 0 into a NULL pointer when cast to a void
* type, Linus notes that this is a very poor situation from a type
safety angle and a pointer should be initialized with a pointer type
- not an integer constant.

See https://www.spinics.net/lists/linux-sparse/msg10066.html for more
info.

[ bp: Rewrite commit message, drop useless comments in the code. ]

Signed-off-by: Abhinav Singh <singhabhinav9051571833@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231128141703.614605-1-singhabhinav9051571833@gmail.com


# 45bc6098 07-Jul-2020 Tony Luck <tony.luck@intel.com>

EDAC/{i7core,sb,pnd2,skx}: Fix error event severity

IA32_MCG_STATUS.RIPV indicates whether the return RIP value pushed onto
the stack as part of machine check delivery is valid or not.

Various drivers copied a code fragment that uses the RIPV bit to
determine the severity of the error as either HW_EVENT_ERR_UNCORRECTED
or HW_EVENT_ERR_FATAL, but this check is reversed (marking errors where
RIPV is set as "FATAL").

Reverse the tests so that the error is marked fatal when RIPV is not set.

Reported-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200707194324.14884-1-tony.luck@intel.com
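
The corrected test can be sketched in user space as follows. This is a minimal illustration, not the driver code: the enum and function names are hypothetical stand-ins for the kernel's HW_EVENT_ERR_* values, and only the two architectural bit positions (RIPV as bit 0 of IA32_MCG_STATUS, UC as bit 61 of IA32_MCi_STATUS) are taken from the Intel SDM.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural bit positions (Intel SDM). */
#define MCG_STATUS_RIPV (1ULL << 0)   /* IA32_MCG_STATUS: restart IP valid */
#define MCI_STATUS_UC   (1ULL << 61)  /* IA32_MCi_STATUS: uncorrected error */

/* Hypothetical stand-in for the kernel's HW_EVENT_ERR_* severities. */
enum sketch_severity {
	SKETCH_ERR_CORRECTED,
	SKETCH_ERR_UNCORRECTED,
	SKETCH_ERR_FATAL,
};

/* An uncorrected error is fatal only when RIPV is NOT set. */
enum sketch_severity sketch_classify(uint64_t mcgstatus, uint64_t status)
{
	if (!(status & MCI_STATUS_UC))
		return SKETCH_ERR_CORRECTED;
	if (mcgstatus & MCG_STATUS_RIPV)
		return SKETCH_ERR_UNCORRECTED;
	return SKETCH_ERR_FATAL;
}

/* Usage: the three cases the commit message distinguishes. */
void sketch_classify_examples(void)
{
	assert(sketch_classify(MCG_STATUS_RIPV, MCI_STATUS_UC) == SKETCH_ERR_UNCORRECTED);
	assert(sketch_classify(0, MCI_STATUS_UC) == SKETCH_ERR_FATAL);
	assert(sketch_classify(MCG_STATUS_RIPV, 0) == SKETCH_ERR_CORRECTED);
}
```

The reversed check described above would have swapped the last two return statements.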


# 7d4c1ea2 08-Jul-2020 Alexander A. Klimov <grandmaster@al2klimov.de>

EDAC: Replace HTTP links with HTTPS ones

Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
          If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
            If both the HTTP and HTTPS versions
            return 200 OK and serve the same content:
              Replace HTTP with HTTPS.

[ bp: Merge all EDAC patches into a single one. ]

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tero Kristo <t-kristo@ti.com> # ti_edac
Link: https://lkml.kernel.org/r/20200708113546.14135-1-grandmaster@al2klimov.de


# 23ba710a 14-Feb-2020 Tony Luck <tony.luck@intel.com>

x86/mce: Fix all mce notifiers to update the mce->kflags bitmask

If the handler took any action to log or deal with the error, set a bit
in mce->kflags so that the default handler on the end of the machine
check chain can see what has been done.

Get rid of NOTIFY_STOP returns. Make the EDAC and dev-mcelog handlers
skip over errors already processed by CEC.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200214222720.13168-5-tony.luck@intel.com
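
The idea of recording handling in a bitmask, rather than stopping the chain, might look roughly like this toy model. Everything here is an assumption made for illustration: the struct, the SKETCH_HANDLED_* bit values and the "print only if nobody handled it" policy are hypothetical and do not reproduce the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the kernel's own MCE_HANDLED_* values may differ. */
#define SKETCH_HANDLED_CEC  (1ULL << 0)
#define SKETCH_HANDLED_EDAC (1ULL << 1)

/* Hypothetical, heavily reduced stand-in for struct mce. */
struct sketch_mce {
	uint64_t status;
	uint64_t kflags;  /* which handlers acted on this record */
};

/* EDAC-style handler: skip records CEC already processed, else mark them. */
int sketch_edac_handler(struct sketch_mce *m)
{
	assert(m);
	if (m->kflags & SKETCH_HANDLED_CEC)
		return 0;                  /* nothing done, chain continues */
	/* ... decode and log the error here ... */
	m->kflags |= SKETCH_HANDLED_EDAC;
	return 1;                          /* acted, chain still continues */
}

/* Assumed policy for the last handler: report only unhandled records. */
int sketch_default_should_print(const struct sketch_mce *m)
{
	assert(m);
	return m->kflags == 0;
}
```

Because no handler stops the chain, the final handler always runs and can decide from kflags alone whether anything is left to report.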


# bc9ad9e4 06-Nov-2019 Robert Richter <rrichter@marvell.com>

EDAC: Replace EDAC_DIMM_PTR() macro with edac_get_dimm() function

The EDAC_DIMM_PTR() macro takes 3 arguments from struct mem_ctl_info.
Clean up this interface to only pass the mci struct and replace this
macro with a new function edac_get_dimm().

Also introduce an edac_get_dimm_by_index() function for later use.
This allows it to get a DIMM pointer only by a given index. This can
be useful if the DIMM's position within the layers of the memory
controller or the exact size of the layers are unknown.

Small style changes made for some hunks after applying the semantic
patch.

Semantic patch used:

@@ expression mci, a, b,c; @@

-EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, a, b, c)
+edac_get_dimm(mci, a, b, c)

[ bp: Touchups. ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tero Kristo <t-kristo@ti.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-2-rrichter@marvell.com
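
To illustrate why passing only the mci struct is sufficient, here is a user-space sketch of how up to three layer positions can be flattened into a single DIMM index. The struct and function names are hypothetical, and the row-major layout is an assumption of this sketch rather than a statement about the EDAC core's internals.

```c
#include <assert.h>

#define SKETCH_MAX_LAYERS 3

/* Hypothetical, reduced stand-in for struct mem_ctl_info. */
struct sketch_mci {
	unsigned int n_layers;                      /* 1..3 */
	unsigned int layer_size[SKETCH_MAX_LAYERS]; /* entries per layer */
};

/*
 * Flatten (l0, l1, l2) into one index, row-major.
 * Returns -1 if a position needed by the layer count is negative.
 */
int sketch_dimm_index(const struct sketch_mci *mci, int l0, int l1, int l2)
{
	int index;

	assert(mci && mci->n_layers >= 1 && mci->n_layers <= SKETCH_MAX_LAYERS);

	if (l0 < 0 ||
	    (mci->n_layers > 1 && l1 < 0) ||
	    (mci->n_layers > 2 && l2 < 0))
		return -1;

	index = l0;
	if (mci->n_layers > 1)
		index = index * (int)mci->layer_size[1] + l1;
	if (mci->n_layers > 2)
		index = index * (int)mci->layer_size[2] + l2;
	return index;
}
```

A lookup by plain index, as offered by edac_get_dimm_by_index(), skips this computation entirely, which is what makes it usable when the layer sizes are unknown.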


# 12237550 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 172

Based on 1 normalized pattern(s):

this file may be distributed under the terms of the gnu general
public license version 2

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 9 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070034.395589349@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 1722bc0e 09-Nov-2018 Colin Ian King <colin.king@canonical.com>

EDAC: Fix indentation issues in several EDAC drivers

Replace spaces with tabs and insert missing indentation.

[ bp: Rewrite commit message. ]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: "Arvind R." <arvino55@gmail.com>
CC: Mark Gross <mark.gross@intel.com>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Ranganathan Desikan <ravi@jetztechnologies.com>
CC: kernel-janitors@vger.kernel.org
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20181109133757.21471-1-colin.king@canonical.com


# 432de7fd 28-Sep-2018 Tony Luck <tony.luck@intel.com>

EDAC, {i7core,sb,skx}_edac: Fix uncorrected error counting

The count of errors is picked up from bits 52:38 of the machine check
bank status register. But this is the count of *corrected* errors. If an
uncorrected error is being logged, the h/w sets this field to 0. Which
means that when edac_mc_handle_error() is called, the EDAC core will
carefully add zero to the appropriate uncorrected error counts.

Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180928213934.19890-1-tony.luck@intel.com
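
A small user-space sketch of the counting rule follows. The helper names are hypothetical; the bit range 52:38 comes from the message above and the UC bit (61) from the Intel SDM. Returning 1 for an uncorrected error is the assumption this sketch makes about the fix.

```c
#include <assert.h>
#include <stdint.h>

#define MCI_STATUS_UC (1ULL << 61)  /* IA32_MCi_STATUS: uncorrected error */

/* Extract bits lo..hi (inclusive) of v. */
uint64_t sketch_get_bitfield(uint64_t v, unsigned int lo, unsigned int hi)
{
	assert(lo <= hi && hi < 64);
	uint64_t width = hi - lo + 1;
	uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
	return (v >> lo) & mask;
}

/*
 * Bits 52:38 hold the *corrected* error count. For an uncorrected
 * error the hardware reports 0 there, so count that event as 1.
 */
uint32_t sketch_error_count(uint64_t status)
{
	if (status & MCI_STATUS_UC)
		return 1;
	return (uint32_t)sketch_get_bitfield(status, 38, 52);
}
```

Without the UC special case, an uncorrected error would be passed on with a count of 0 and never show up in the uncorrected totals.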


# 07517369 24-Sep-2018 YueHaibing <yuehaibing@huawei.com>

EDAC, i7core: Remove set but not used variable pvt

Remove the unused local variable pvt:

drivers/edac/i7core_edac.c: In function 'i7core_mce_check_error':
drivers/edac/i7core_edac.c:1818:21: warning: variable 'pvt' set but not used \
[-Wunused-but-set-variable]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1537841043-108267-1-git-send-email-yuehaibing@huawei.com


# 6f6da136 18-Sep-2018 Qiuxu Zhuo <qiuxu.zhuo@intel.com>

EDAC: Correct DIMM capacity unit symbol

The {i3200|i7core|sb|skx}_edac drivers show DIMM capacity using the
wrong unit symbol: 'Mb' - megabit. Fix them by replacing 'Mb' with
'MiB' - mebibyte.

[Tony: These are all "edac_dbg()" messages, so this won't break scripts
that parse console logs.]

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/20180919003433.16475-1-tony.luck@intel.com


# 6c974d4d 12-Jun-2018 Johan Hovold <johan@kernel.org>

EDAC, i7core: Fix memleaks and use-after-free on probe and remove

Make sure to free and deregister the addrmatch and chancounts devices
allocated during probe in all error paths. Also fix use-after-free in a
probe error path and in the remove success path where the devices were
being put before deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 356f0a30860d ("i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy")
Link: http://lkml.kernel.org/r/20180612124335.6420-2-johan@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>


# 6396bb22 12-Jun-2018 Kees Cook <keescook@chromium.org>

treewide: kzalloc() -> kcalloc()

The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:

kzalloc(a * b, gfp)

with:
kcalloc(a, b, gfp)

as well as handling cases of:

kzalloc(a * b * c, gfp)

with:

kzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

kcalloc(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

kzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
kzalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kzalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
kzalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc
+ kcalloc
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
kzalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
kzalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
kzalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
kzalloc(sizeof(THING) * C2, ...)
|
kzalloc(sizeof(TYPE) * C2, ...)
|
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * E2
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- E1 * E2
+ E1, E2
, ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
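
The motivation for the two-argument form can be shown in user space: an open-coded a * b can wrap around and yield a far smaller buffer than intended, while a count/size interface can reject the request. The function below is a hypothetical stand-in written with malloc() and memset(); it is not the kernel's kcalloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Zeroed array allocation with an explicit overflow check.
 * Returns NULL if n * size would not fit in size_t.
 */
void *sketch_zalloc_array(size_t n, size_t size)
{
	void *p;

	if (size != 0 && n > SIZE_MAX / size)
		return NULL;            /* n * size would wrap around */

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}

/* Usage: the same request, open-coded versus checked. */
void sketch_zalloc_examples(void)
{
	size_t n = SIZE_MAX / 2 + 1;

	/* Open-coded product silently wraps to 0 (unsigned arithmetic). */
	assert((size_t)(n * 2) == 0);

	/* The checked helper refuses the request instead. */
	assert(sketch_zalloc_array(n, 2) == NULL);
}
```

With the product hidden inside a single size argument, the allocator never sees the individual factors and so cannot perform this check itself.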


# 83e548be 03-May-2018 Colin Ian King <colin.king@canonical.com>

EDAC, i7core: Fix spelling mistake: "redundacy" -> "redundancy"

Trivial fix to spelling mistake in err string.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: kernel-janitors@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180504113804.17103-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>


# 75f029c3 20-Sep-2017 Arvind Yadav <arvind.yadav.cs@gmail.com>

EDAC: Handle return value of kasprintf()

kasprintf() can fail and we must check its return value.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: linux-edac@vger.kernel.org
[ Merged into a single patch, small formatting fixups. ]
Signed-off-by: Borislav Petkov <bp@suse.de>


# b2b3e736 19-Aug-2017 Bhumika Goyal <bhumirks@gmail.com>

EDAC: Make device_type const

Make these const as they are only stored in the type field of a device
structure, which is const.

Done using Coccinelle.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1503130946-2854-2-git-send-email-bhumirks@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>


# c54182ec 28-Jun-2017 Borislav Petkov <bp@suse.de>

EDAC: Get rid of mci->mod_ver

It is a write-only variable so get rid of it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Robert Richter <rric@kernel.org>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Thor Thayer <thor.thayer@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: "Sören Brinkmann" <soren.brinkmann@xilinx.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <david.daney@cavium.com>
Cc: Loc Ho <lho@apm.com>
Cc: linux-edac@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org


# 1c18be5a 17-Jul-2017 Arvind Yadav <arvind.yadav.cs@gmail.com>

EDAC: Constify attribute_group structures

attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with
const attribute_group. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
CC: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/776cb8265509054abd01b0b551624cc0da3b88e7.1499078335.git.arvind.yadav.cs@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>


# 9026cc82 23-Jan-2017 Borislav Petkov <bp@suse.de>

x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority

Assign all notifiers on the MCE decode chain a priority so that they get
called in the correct order.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-10-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 78d88e8a 29-Oct-2016 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: rename edac_core.h to edac_mc.h

Everything that is left in edac_core.h now belongs to
drivers/edac/edac_mc.c, so rename it to edac_mc.h.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>


# 53595345 28-Apr-2016 Tony Luck <tony.luck@intel.com>

EDAC, i7core: Remove double buffering of error records

In the bad old days the functions from x86_mce_decoder_chain could be
called in machine check context. So we used to carefully copy them and
defer processing until later. But in

f29a7aff4bd60 ("x86/mce: Avoid potential deadlock due to printk() in MCE context")

we switched the logging code to save the record in a genpool, and call
the functions that registered to be notified later from a work queue.

So drop all the double buffering and do all the work we want to do as
soon as i7core_mce_check_error() is called.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/29ab2c370915c6e132fc5d88e7b72cb834bedbfe.1461855008.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>


# c4fc1956 29-Apr-2016 Tony Luck <tony.luck@intel.com>

EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback

Both of these drivers can return NOTIFY_BAD, but this terminates
processing other callbacks that were registered later on the chain.
Since the driver did nothing to log the error it seems wrong to prevent
other interested parties from seeing it. E.g. neither of them had even
bothered to check the type of the error to see if it was a memory error
before the return NOTIFY_BAD.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/72937355dd92318d2630979666063f8a2853495b.1461864507.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
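
The effect of NOTIFY_BAD on later callbacks can be modelled with a toy chain walker. The NOTIFY_* values match include/linux/notifier.h; the walker and the two callbacks are hypothetical and only mimic the stop-on-mask behaviour, not the kernel's locking or priorities.

```c
#include <assert.h>
#include <stddef.h>

/* Values as defined in include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)

typedef int (*sketch_notifier_fn)(void *data);

/* Walk the chain in order; return how many callbacks were invoked. */
size_t sketch_call_chain(const sketch_notifier_fn *chain, size_t n, void *data)
{
	size_t called = 0;

	assert(chain || n == 0);
	for (size_t i = 0; i < n; i++) {
		int ret = chain[i](data);

		called++;
		if (ret & NOTIFY_STOP_MASK)
			break;          /* later callbacks never see the event */
	}
	return called;
}

/* A driver that gives up with NOTIFY_BAD versus one returning NOTIFY_DONE. */
int sketch_returns_bad(void *data)  { (void)data; return NOTIFY_BAD; }
int sketch_returns_done(void *data) { (void)data; return NOTIFY_DONE; }
```

With sketch_returns_bad first in a two-entry chain only one callback runs; with sketch_returns_done first, both do.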


# eef4dfa0 12-Aug-2015 Borislav Petkov <bp@suse.de>

x86/mce: Kill drain_mcelog_buffer()

This used to flush out MCEs logged during early boot and which
were in the MCA registers from a previous system run. No need
for that now, since we've moved to a genpool.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1439396985-12812-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# fd4cf79f 12-Aug-2015 Chen, Gong <gong.chen@linux.intel.com>

x86/mce: Remove the MCE ring for Action Optional errors

Use unified genpool to save Action Optional error events and put
Action Optional error handling in the same notification chain as
MCE error decoding.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
[ Fold in subsequent patch from Boris for early boot logging. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Correct a lot. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1439396985-12812-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 2eace188 04-Feb-2015 Takashi Iwai <tiwai@suse.de>

EDAC: i7core: Use static attribute groups for sysfs entries

... instead of manual device_create_file() and device_remove_file()
calls.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
[ Add NULL terminator to i7core_dev_attrs[] caught by the build robot. ]
Reported-by: Huang Ying <ying.huang@intel.com>
Link: http://lkml.kernel.org/r/1423046938-18111-6-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>


# e97d7e38 04-Feb-2015 Takashi Iwai <tiwai@suse.de>

EDAC: i7core: Return proper error codes for kzalloc() errors

... instead of possibly uninitialized return value.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-5-git-send-email-tiwai@suse.de
[ Add a commit message, albeit a small one. ]
Signed-off-by: Borislav Petkov <bp@suse.de>


# f118920b 24-Feb-2014 Jean Delvare <jdelvare@suse.de>

i7core_edac: Drop unused variable

Fix the following warning:

drivers/edac/i7core_edac.c: In function "core_mce_output_error":
drivers/edac/i7core_edac.c:1711:8: warning: variable "type" set but not used [-Wunused-but-set-variable]
char *type, *optype, *err;
^
According to Mauro, type can just be dropped, as tp_event now indicates
whether the error is corrected, uncorrected non-fatal or uncorrected
fatal.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224171358.692d7e5a@endymion.delvare
Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>


# c0f5eeed 24-Feb-2014 Jean Delvare <jdelvare@suse.de>

i7core_edac: Fix PCI device reference count

The reference count changes done by pci_get_device can be a little
misleading when the usage diverges from the most common scheme. The
reference count of the device passed as the last parameter is always
decreased, even if the function returns no new device. So if we are
going to try alternative device IDs, we must manually increment the
device reference count before each retry. If we don't, we end up
decreasing the reference count, and after a few modprobe/rmmod cycles
the PCI devices will vanish.

In other words and as Alan put it: without this fix the EDAC code
corrupts the PCI device list.

This fixes kernel bug #50491:
https://bugzilla.kernel.org/show_bug.cgi?id=50491

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224093927.7659dd9d@endymion.delvare
Reviewed-by: Alan Cox <alan@linux.intel.com>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: stable@vger.kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
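
The reference-count problem can be reproduced with a toy model. Nothing below is the PCI API: sketch_get_device() only imitates the one property described above, namely that the reference on the device passed as the starting point is dropped whether or not a match is found, and the flag that toggles the extra reference is purely illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted struct pci_dev. */
struct sketch_dev {
	int refcnt;
};

/*
 * Toy lookup: always drops the reference on `from`, and takes one
 * on `found` if there is a match (found may be NULL for "no match").
 */
struct sketch_dev *sketch_get_device(struct sketch_dev *found,
				     struct sketch_dev *from)
{
	if (from) {
		assert(from->refcnt > 0);
		from->refcnt--;
	}
	if (found)
		found->refcnt++;
	return found;
}

/*
 * First try an ID that does not match, then retry with an alternative.
 * take_ref_before_retry models the extra get added before the retry.
 */
struct sketch_dev *sketch_lookup(struct sketch_dev *alt_match,
				 struct sketch_dev *prev,
				 int take_ref_before_retry)
{
	struct sketch_dev *pdev = sketch_get_device(NULL, prev);

	if (!pdev) {
		if (take_ref_before_retry && prev)
			prev->refcnt++;  /* the retry will drop it again */
		pdev = sketch_get_device(alt_match, prev);
	}
	return pdev;
}
```

Starting from a count of 2 on prev, the variant with the extra reference leaves it at 1, while the variant without it drops it to 0, which corresponds to the device vanishing after a few modprobe/rmmod cycles.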


# 37e59f87 07-Feb-2014 Mauro Carvalho Chehab <mchehab@kernel.org>

[media, edac] Change my email address

There are several leftovers with my old email address.
Remove them and add myself to CREDITS, so that people
can reach me at my new addresses.

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>


# ba935f40 06-Dec-2013 Jingoo Han <jg1.han@samsung.com>

EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro

Currently, there is no other bus that has something like this macro for
their device ids. Thus, DEFINE_PCI_DEVICE_TABLE macro should be removed.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Link: http://lkml.kernel.org/r/001c01ceefb3$5724d860$056e8920$%han@samsung.com
[ Boris: swap commit message with better one. ]
Signed-off-by: Borislav Petkov <bp@suse.de>


# c7f62fc8 01-Jun-2013 Jingoo Han <jg1.han@samsung.com>

EDAC: Replace strict_strtoul() with kstrtoul()

The usage of strict_strtoul() is not preferred, because strict_strtoul()
is obsolete. Thus, kstrtoul() should be used.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>


# 9b3c6e85 21-Dec-2012 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Drivers: edac: remove __dev* attributes.

CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
markings need to be removed.

This change removes the use of __devinit, __devexit_p, and __devexit
from these drivers.

Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <david.daney@cavium.com>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# c31d34fe 29-Jan-2012 Niklas Söderlund <niso@kth.se>

i7core_edac: fix erroneous size of static array

Remove size from lookup arrays and mark them as const.

Reviewed-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Niklas Söderlund <niso@kth.se>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 42709efb 16-Oct-2012 Prarit Bhargava <prarit@redhat.com>

i7core_edac: fix panic when accessing sysfs files

The i7core_edac addrmatch_dev and chancounts_dev have sysfs files
associated with them. The sysfs files, however, are coded so that the
parent device is the mci device. This is incorrect and the mci struct
should be obtained through the addrmatch_dev and chancounts_dev device's
private data field which is populated in i7core_create_sysfs_devices().

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 00d18339 04-Jun-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: properly handle error count

Instead of generating a burst of errors or reporting the error
count via driver-specific details, use the new way provided by
edac_mc_handle_error.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 9eb07a7f 04-Jun-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: edac_mc_handle_error(): add an error_count parameter

In order to avoid losing error events, it is desirable to group
error events together and generate a single trace for several identical
errors.

The trace API already allows reporting multiple errors. Change the
handle_error function to also allow that.

The changes at the drivers were made by this small script:

$file .=$_ while (<>);
$file =~ s/(edac_mc_handle_error)\s*\(([^\,]+)\,([^\,]+)\,/$1($2,$3, 1,/g;
print $file;

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 03f7eae8 04-Jun-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: remove arch-specific parameter for the error handler

Remove the arch-dependent parameter, as it was not used:
the MCE tracepoint was never implemented. It probably doesn't
make sense to have an MCE-specific tracepoint, as this would
cost more bytes at the tracepoint, and tracepoints are not free.

The changes at the EDAC drivers were done by this small perl script:

$file .=$_ while (<>);
$file =~ s/(edac_mc_handle_error)\s*\(([^\;]+)\,([^\,\)]+)\s*\)/$1($2)/g;
print $file;

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 956b9ba1 29-Apr-2012 Joe Perches <joe@perches.com>

edac: Convert debugfX to edac_dbg(X,

Use a more common debugging style.

Remove __FILE__ uses, add missing newlines,
coalesce formats and align arguments.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# dd23cd6e 29-Apr-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs

The debug macro already adds that. Most of the work here was
done by this small script:

$f .=$_ while (<>);

$f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*": /\1"/g;
$f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*/\1/g;
$f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*"MC: /\1"/g;

$f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g;
$f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g;
$f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g;
$f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g;

$f =~ s/\"MC\: \\n\"/"MC:\\n"/g;

print $f;

After running the script, manual cleanups were done to fix the remaining
places.

While here, remove __LINE__ in most places, as it doesn't give useful
info there.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 356f0a30 30-Mar-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy

Kernel kobjects have rigid rules: each container object should be
dynamically allocated, and can't be allocated into a single kmalloc.

EDAC never obeyed this rule: it has a single malloc function that
allocates all needed data into a single kzalloc.

As this is not accepted anymore, change the allocation schema of the
EDAC *_info structs to enforce this Kernel standard.

Cc: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 5c4cdb5a 21-Mar-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: convert it to use struct device

Instead of relying on a complex logic inside the edac core to create
a "device tree-like" sysfs struct, just use device_add.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# fd687502 16-Mar-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: Rename the parent dev to pdev

As EDAC doesn't use struct device itself, it created a parent dev
pointer called "dev". Now that we'll be converting it to use
struct device, instead of struct devsys, this needs to be fixed.

No functional changes.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# e35fca47 08-May-2012 Chen Gong <gong.chen@linux.intel.com>

edac: avoid mce decoding crash after edac driver unloaded

Some edac drivers register themselves as mce decoders via a
notifier chain. But the current notifier chain implementation does not
support registering the same notifier twice: doing so corrupts the list
when the element is later added or removed. For example, on one
SandyBridge platform, removing module sb_edac and then triggering an
error hits an oops, because no mce decoder is registered anymore but the
notifier chain still points to an invalid callback function. Here is an
example:

Call Trace:
[<ffffffff8150ef6a>] atomic_notifier_call_chain+0x1a/0x20
[<ffffffff8102b936>] mce_log+0x46/0x180
[<ffffffff8102eaea>] apei_mce_report_mem_error+0x4a/0x60
[<ffffffff812e19d2>] ghes_do_proc+0x192/0x210
[<ffffffff812e2066>] ghes_proc+0x46/0x70
[<ffffffff812e20d8>] ghes_notify_sci+0x48/0x80
[<ffffffff8150ef05>] notifier_call_chain+0x55/0x80
[<ffffffff81076f1a>] __blocking_notifier_call_chain+0x5a/0x80
[<ffffffff812aea11>] ? acpi_os_wait_events_complete+0x23/0x23
[<ffffffff81076f56>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff812ddc4d>] acpi_hed_notify+0x19/0x1b
[<ffffffff812b16bd>] acpi_device_notify+0x19/0x1b
[<ffffffff812beb38>] acpi_ev_notify_dispatch+0x67/0x7f
[<ffffffff812aea3a>] acpi_os_execute_deferred+0x29/0x36
[<ffffffff81069dc2>] process_one_work+0x132/0x450
[<ffffffff8106bbcb>] worker_thread+0x17b/0x3c0
[<ffffffff8106ba50>] ? manage_workers+0x120/0x120
[<ffffffff81070aee>] kthread+0x9e/0xb0
[<ffffffff81514724>] kernel_thread_helper+0x4/0x10
[<ffffffff81070a50>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff81514720>] ? gs_change+0x13/0x13
Code: f3 49 89 d4 45 85 ed 4d 89 c6 48 8b 0f 74 48 48 85 c9 75 17 eb 41
0f 1f 80 00 00 00 00 41 83 ed 01 4c 89 f9 74 22 4d 85 ff 74 1d <4c> 8b
79 08 4c 89 e2 48 89 de 48 89 cf ff 11 4d 85 f6 74 04 41
RIP [<ffffffff8150eef6>] notifier_call_chain+0x46/0x80
RSP <ffff88042868fb20>
CR2: ffffffffa01af838
---[ end trace 0100930068e73e6f ]---
BUG: unable to handle kernel paging request at fffffffffffffff8
IP: [<ffffffff810705b0>] kthread_data+0x10/0x20
PGD 1a0d067 PUD 1a0e067 PMD 0
Oops: 0000 [#2] SMP

Only i7core_edac and sb_edac have such issues, because they have more
than one memory controller, which means they register the mce
decoder several times.
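
The failure mode can be reproduced with a toy singly linked chain. This is
only a sketch under stated assumptions: toy_notifier, toy_chain_register()
and the other toy_* names are hypothetical, the list is not the kernel's
notifier implementation, and the use counter shown is just one possible way
to register a shared decoder once, not necessarily what this patch does.

```c
#include <stddef.h>

/* Hypothetical minimal notifier node. */
struct toy_notifier {
	int (*call)(void *data);
	struct toy_notifier *next;
};

static struct toy_notifier *toy_chain;

/* Naive head insertion: adding the same node twice makes it point at itself. */
static void toy_chain_register(struct toy_notifier *nb)
{
	nb->next = toy_chain;
	toy_chain = nb;
}

/* Unlink the first match only. */
static void toy_chain_unregister(struct toy_notifier *nb)
{
	struct toy_notifier **p = &toy_chain;

	while (*p) {
		if (*p == nb) {
			*p = nb->next;
			return;
		}
		p = &(*p)->next;
	}
}

static int toy_decode(void *data)
{
	(void)data;
	return 0;
}

static struct toy_notifier toy_decoder = { .call = toy_decode };
static int toy_decoder_users;

/* Called once per memory controller: only the first one registers. */
static void toy_controller_add(void)
{
	if (toy_decoder_users++ == 0)
		toy_chain_register(&toy_decoder);
}

/* Called once per memory controller: only the last one unregisters. */
static void toy_controller_del(void)
{
	if (toy_decoder_users > 0 && --toy_decoder_users == 0)
		toy_chain_unregister(&toy_decoder);
}
```

With the naive calls, registering toy_decoder twice and unregistering it
once leaves the chain head still pointing at the node, which mirrors the
stale callback seen in the trace above.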

Cc: <stable@vger.kernel.org> # 3.2 and upper
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 0bf09e82 26-Apr-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core: fix ranks information at the per-channel struct

There is a flag in the per-channel struct that indicates whether there
are any 4R DIMMs on it. The way the presence of this flag was reported
is not OK, as it might give the false idea that the channel is filled
with 2R memories:

[ 580.588701] EDAC DEBUG: get_dimm_config: Ch1 phy rd1, wr1 (0x063f7431): 2 ranks, UDIMMs
[ 580.588704] EDAC DEBUG: get_dimm_config: dimm 0 1024 Mb offset: 0, bank: 8, rank: 1, row: 0x4000, col: 0x400

(in this case, just one 1R memory is filled on channel 1)

So, use a better way to represent the per-channel ranks information.
After the patch, it will show:

[ 2002.233978] EDAC DEBUG: get_dimm_config: Ch0 phy rd0, wr0 (0x063f7431): UDIMMs
[ 2002.233982] EDAC DEBUG: get_dimm_config: dimm 0 1024 Mb offset: 0, bank: 8, rank: 1, row: 0x4000, col: 0x400
[ 2002.233988] EDAC DEBUG: get_dimm_config: dimm 1 1024 Mb offset: 4, bank: 8, rank: 1, row: 0x4000, col: 0x400

(in this case, there aren't any 4R memories)

Reported-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# e17a2f42a 11-May-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: Cleanup the logs for i7core and sb edac drivers

Remove some information that is duplicated in the MCE log
and is of little use for the error. That data will be
added again when creating a trace function that outputs both
memory errors and MCE fields.

Cc: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# ca0907b9 02-May-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: Remove the legacy EDAC ABI

Now that all drivers got converted to use the new ABI, we can
drop the old one.

Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 0975c16f 16-Apr-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: convert driver to use the new edac ABI

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# a895bf8b 28-Jan-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: move nr_pages to dimm struct

The number of pages is a dimm property. Move it to the dimm struct.

After this change, it is possible to add sysfs nodes for the DIMM's that
will properly represent the DIMM stick properties, including its size.

A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when
the memory controller represents the memory via chip select rows.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 5e2af0c0 27-Jan-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: Don't initialize csrow's first_page & friends when not needed

Almost all edac drivers initialize csrow_info->first_page,
csrow_info->last_page and csrow_info->page_mask. Those vars are
used inside the EDAC core, in order to calculate the csrow affected
by an error, by using the routine edac_mc_find_csrow_by_page().

However, very few drivers actually use it:
e752x_edac.c
e7xxx_edac.c
i3000_edac.c
i82443bxgx_edac.c
i82860_edac.c
i82875p_edac.c
i82975x_edac.c
r82600_edac.c

There are also a few other drivers that have their own calculation
formula internally using those vars.

All the others are just wasting time by initializing that
data.

While initializing data without using it won't cause any trouble, as
that information is stored in the wrong place (in the csrows structure), it
is better to remove what is unused, in order to simplify the next patch.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 084a4fcc 27-Jan-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: move dimm properties to struct dimm_info

On systems based on chip select rows, all channels need to use memories
with the same properties, otherwise the memories on channels A and B
won't be recognized.

However, such assumption is not true for all types of memory
controllers.

Controllers for FB-DIMM's don't have such requirements.

Also, modern Intel controllers seem to be capable of handling such
differences.

So, we need to stop storing the DIMM information in per-csrow
data and store it, instead, in the right place.

The first step is to move grain, mtype, dtype and edac_mode to the
per-dimm struct.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: James Bottomley <James.Bottomley@parallels.com>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Mike Williams <mike@mikebwilliams.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# a7d7d2e1 27-Jan-2012 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: Create a dimm struct and move the labels into it

The way a DIMM is currently represented implies that they're
linked into a per-csrow struct. However, some drivers don't see
csrows, as they're hidden behind some chip like the AMBs
on FBDIMMs, for example.

This forced drivers to fake^Wvirtualize a csrow struct, and to create
a mess under the original csrow/channel concept.

Move the DIMM labels into a per-DIMM struct, and add there
the real location of the socket, in terms of csrow/channel.
Later patches will modify the location to properly represent the
memory architecture.

All other drivers will use a per-csrow type of location.
Some of those drivers will require a later conversion, as
they also fake the csrows internally.

TODO: While this patch doesn't change the existing behavior, on
csrows-based memory controllers, a csrow/channel pair points to a memory
rank. There's a known bug in the EDAC core that allows having different
labels for the same DIMM, if it has more than one rank. A later patch
is needed to merge the several ranks for a DIMM into the same dimm_info
struct, in order to avoid having different labels for the same DIMM.

The edac_mc_alloc() will now contain a per-dimm initialization loop that
will be changed by later patches in order to match other types of
memory architectures.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 15ed103a 17-Apr-2012 David Mackey <tdmackey@twitter.com>

edac: Fix spelling errors.

Signed-off-by: David Mackey <tdmackey@twitter.com>
Signed-off-by: Vinson Lee <vlee@twitter.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 36c46f31 26-Feb-2012 Lionel Debroux <lionel_debroux@yahoo.fr>

EDAC: Make pci_device_id tables __devinitconst.

These const tables are currently marked __devinitdata, but
Documentation/PCI/pci.txt says:

"o The ID table array should be marked __devinitconst; this is done
automatically if the table is declared with DEFINE_PCI_DEVICE_TABLE()."

So use DEFINE_PCI_DEVICE_TABLE(x).

Based on PaX and earlier work by Andi Kleen.

Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>


# 3653ada5 04-Dec-2011 Borislav Petkov <borislav.petkov@amd.com>

x86, mce: Add wrappers for registering on the decode chain

No functionality change, this is done so that in a follow-on patch all
queued-up MCEs can be decoded after registering on the chain.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>


# 767ba4a5 16-Sep-2011 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Initialize memory name with cpu, channel, bank

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 4fad8098 21-Sep-2011 Sedat Dilek <sedat.dilek@googlemail.com>

i7core_edac: Fix compilation on 32 bits arch

on i386:
ERROR: "__udivdi3" [drivers/edac/i7core_edac.ko] undefined!

The 64-bit divisions behind this error are in both
get_sdram_scrub_rate() and set_sdram_scrub_rate().
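
For background: __udivdi3 is the libgcc helper that GCC emits for an
unsigned 64-bit "/" on 32-bit targets, and the kernel does not link against
libgcc. Kernel code usually goes through do_div() or div_u64() instead; this
message does not say which one was used here. The user-space sketch below
(toy_div64_32() is a hypothetical name) mimics the do_div() contract, with
the quotient stored in place and the remainder returned, using only shifts
and subtraction.

```c
#include <stdint.h>

/*
 * Divide *n by base without using the 64-bit '/' or '%' operators.
 * On return *n holds the quotient and the remainder is returned.
 * base must be non-zero.
 */
static uint32_t toy_div64_32(uint64_t *n, uint32_t base)
{
	uint64_t rem = 0, quot = 0;
	int i;

	/* Classic shift-subtract long division, one dividend bit per step. */
	for (i = 63; i >= 0; i--) {
		rem = (rem << 1) | ((*n >> i) & 1);
		if (rem >= base) {
			rem -= base;
			quot |= (uint64_t)1 << i;
		}
	}
	*n = quot;
	return (uint32_t)rem;
}
```

The remainder always stays below base, so it fits in 32 bits and the
intermediate shift cannot overflow a 64-bit value.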

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 535e9c78 08-Aug-2011 Nils Carlson <nils.carlson@ericsson.com>

i7core_edac: scrubbing fixups

Get a more reliable DCLK value from DMI, name the SCRUBINTERVAL mask
and guard against potential overflow in the scrub rate computations.

Signed-off-by: Nils Carlson <nils.carlson@ericsson.com>


# 40557591 30-Nov-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: return -ENODEV if no MC is found

Nehalem-EX uses a different memory controller. However, as the
memory controller is not visible on some Nehalem/Nehalem-EP, we
need to indirectly probe via a X58 PCI device. The same devices
are found on (some) Nehalem-EX. So, on those machines, the
probe routine needs to return -ENODEV, as the actual Memory
Controller registers won't be detected.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# f9902f24 21-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: use edac's own way to print errors

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 4140c542 18-Jul-2011 Borislav Petkov <borislav.petkov@amd.com>

i7core_edac: Drop the edac_mce facility

Remove edac_mce pieces and use the normal MCE decoder notifier chain by
retaining the same functionality with considerably less code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 5034086b 22-Jun-2011 Thomas Renninger <trenn@suse.de>

EDAC i7core: Use mce socketid for better compatibility

mce->socketid and cpu_data(mce->cpu).phys_proc_id are the same,
compare with mce_setup (in mce.c):
m->cpu = m->extcpu = smp_processor_id();
...
m->socketid = cpu_data(m->extcpu).phys_proc_id;

This makes it easier for example for XEN patches to hook into
the MCE subsystem.
Compile tested on x86_64.

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: JBeulich@novell.com
CC: linux-edac@vger.kernel.org
CC: Mauro Carvalho Chehab <mchehab@redhat.com>


# 27100db0 04-Aug-2011 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Don't enable memory scrubbing for Xeon 35xx

Xeon 35xx doesn't mention memory scrub. It seems that only Xeon 55xx
and above supports it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# e8b6a127 30-Mar-2011 Samuel Gabrielsson <samuel.gabrielsson@gmail.com>

i7core_edac: Add scrubbing support

Add scrubbing support to i7core_edac, tested on intel Xeon L5638.

Signed-off-by: Samuel Gabrielsson <samuel.gabrielsson@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 224e871f 17-Mar-2011 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Fix oops when trying to inject errors

Error injection needs the pci device 0:0. So, we need to revert
this changeset: 79daef2099a02fed35747c23bad22f30441133ea.

Tests need to be made to be sure that refcount won't be wrong
as noticed before.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 80b8ce89 27-Dec-2010 David Sterba <dsterba@suse.cz>

i7core_edac: fix misuse of logical operation in place of bitop

CC: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 8cf2d239 18-Aug-2011 Mathias Krause <minipli@googlemail.com>

i7core_edac: fixed typo in error count calculation

Based on a patch from the PaX Team, found during a clang analysis pass.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: stable@kernel.org [v2.6.35+]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 152ba394 31-Mar-2011 Michal Marek <mmarek@suse.cz>

edac: Drop __DATE__ usage

The kernel already prints its build timestamp during boot, no need to
repeat it in random drivers and produce different object files each
time.

Cc: Doug Thompson <dougthompson@xmission.com>
Cc: bluesmoke-devel@lists.sourceforge.net
Cc: linux-edac@vger.kernel.org
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# e7bf068a 27-Dec-2010 David Sterba <dsterba@suse.cz>

i7core_edac: fix typos in comments

Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 76a7bd81 24-Oct-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: return -ENODEV when devices were already probed

Due to the nature of i7core, we need to probe and attach all PCI
devices used by this driver during the first time probe is called.
However, PCI core will call the probe routine one time for each CPU
socket. If we return -EINVAL to those calls, it would seem that the
driver fails, when, in fact, there's no more devices left to initialize.

Changing the return code to -ENODEV solves this issue.
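
A minimal user-space sketch of the probe-once pattern described above,
assuming a hypothetical toy_probed counter and toy_probe()/toy_remove()
helpers (the real driver also takes a lock and does the actual device setup):

```c
#include <errno.h>

/* Hypothetical counter: non-zero once all devices have been attached. */
static int toy_probed;

/* The first call does the real work; later calls report "nothing to do". */
static int toy_probe(void)
{
	if (toy_probed >= 1)
		return -ENODEV;	/* not a failure, just no device left */
	toy_probed++;
	return 0;
}

/* Tear down only if a probe actually succeeded, so the count never goes negative. */
static void toy_remove(void)
{
	if (toy_probed == 0)
		return;
	toy_probed--;
}
```

Returning -ENODEV rather than -EINVAL tells the caller that there is no
matching device for this call, instead of suggesting a driver malfunction.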

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 3c52cc57 24-Oct-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: properly terminate pci_dev_table

pci_xeon_fixup() expects a null-terminated table, while
i7core_get_all_devices() just loops over 0..ARRAY_SIZE. As other tables
are zero-terminated, change it to be terminated with 0 as well. This fixes
a bug where the code may run past the end of the table.
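
The two iteration styles only agree when the table layout matches what each
walker expects. The sketch below uses a hypothetical toy_dev_table with
made-up entries to show a sentinel-terminated table and a walker for it;
TOY_ARRAY_SIZE is a local stand-in for the kernel's ARRAY_SIZE().

```c
#include <stddef.h>

#define TOY_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table entry; a NULL name marks the end of the table. */
struct toy_dev_table {
	const char *name;
	int n_devs;
};

static const struct toy_dev_table toy_tables[] = {
	{ "table_a", 3 },
	{ "table_b", 2 },
	{ "table_c", 3 },
	{ NULL, 0 }	/* terminator */
};

/* Walk until the sentinel; the caller never needs to know the array size. */
static int toy_count_tables(const struct toy_dev_table *t)
{
	int n = 0;

	while (t && t->name) {
		n++;
		t++;
	}
	return n;
}
```

Note that TOY_ARRAY_SIZE(toy_tables) counts the terminator too, so a
size-based loop over such a table has to stop one entry early or skip
entries whose name is NULL.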

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# a3e15416 21-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Avoid PCI refcount to reach zero on successive load/reload

That's a nasty bug that took me a lot of time to track, and whose
solution took just one line. The best fragrances and the worst
poisons are shipped in the smallest bottles.

drivers/pci/search.c implements the pci_get_device function. The normal
behavior is that you call it, the function returns you a pdev pointer
and increments pdev->kobj.kref.refcount of the pci device. However,
if you want to keep searching, you need to pass the previous
pdev pointer to the search.

When you pass a non-NULL pointer in the "from" argument, pci_get_device
will decrement its pdev->kobj.kref.refcount, assuming that the driver won't
be using the previous pdev.

The solution is simple: we just need to call pci_dev_get() manually,
for the pdev's that the driver will actually use.
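
The reference counting can be modelled in user space as follows. This is a
toy model only: toy_pdev, toy_get_device(), toy_dev_get() and toy_collect()
are hypothetical names that imitate the get/drop behaviour described above,
not the PCI core itself.

```c
#include <stddef.h>

/* Hypothetical device with a bare reference counter. */
struct toy_pdev {
	int refcount;
	int wanted;	/* non-zero when the driver keeps this device */
};

/*
 * Toy iterator: takes a reference on the device it returns and drops
 * one reference on 'from', as the search helper is described to do.
 */
static struct toy_pdev *toy_get_device(struct toy_pdev *devs, size_t n,
				       struct toy_pdev *from)
{
	size_t next = from ? (size_t)(from - devs) + 1 : 0;

	if (from)
		from->refcount--;
	if (next >= n)
		return NULL;
	devs[next].refcount++;
	return &devs[next];
}

/* Explicit extra reference for devices the driver will hold on to. */
static void toy_dev_get(struct toy_pdev *pdev)
{
	pdev->refcount++;
}

/* Walk all devices, pinning the wanted ones so the next step can't drop them. */
static void toy_collect(struct toy_pdev *devs, size_t n)
{
	struct toy_pdev *pdev = NULL;

	while ((pdev = toy_get_device(devs, n, pdev)) != NULL) {
		if (pdev->wanted)
			toy_dev_get(pdev);
	}
}
```

Without the toy_dev_get() call, every device would end the walk with the
same count it started with, even though the driver still holds pointers to
some of them.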

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 79daef20 20-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Fix refcount error at PCI devices

Probably due to a bug or some testing logic at PCI level, device
refcount for <bus>:00.0 device is decremented at the end of the
pci_get_device, made by i7core_get_all_devices(). The fact is that
the first versions of the driver relied on those devices to probe
for Nehalem, but the current versions don't use it at all.

So, let's just remove those devices from the driver, making it simpler
and fixing the bug.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 88ef5ea9 20-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: it is safe to i7core_unregister_mci() when mci=NULL

i7core_unregister_mci() checks internally when mci=NULL. There's no
need to test it outside.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 6d37d240 19-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Fix an oops at i7core probe

changeset c91d57ba9ce5b5c93a7077e2f72510eb1f9131c4 moved the init
of the priv pointer to the end of the probe routine. However, we need
them before that, otherwise, we hit an OOPS:

[ 67.743453] EDAC DEBUG: mci_bind_devs: Associated fn 0.0, dev = ffff88011b46e000, socket 0
[ 67.751861] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 67.759685] IP: [<ffffffffa017e484>] i7core_probe+0x979/0x130c [i7core_edac]
[ 67.766721] PGD 10bd38067 PUD 10bd37067 PMD 0
[ 67.771178] Oops: 0000 [#1] SMP
[ 67.774414] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
[ 67.782213] CPU 1
[ 67.784042] Modules linked in: i7core_edac(+) edac_core cpufreq_ondemand binfmt_misc dm_multipath video output pci_slot snd_hda_codd

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 21b6806a 20-Aug-2010 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

i7core_edac: Remove unused member channels in i7core_pvt

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 2e5185f7 20-Aug-2010 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

i7core_edac: Remove unused arg csrow from get_dimm_config

A local is enough.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# aace4283 20-Aug-2010 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

i7core_edac: Reduce args of i7core_register_mci

We can check the number of channels in i7core_register_mci.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 1c6edbbe 20-Aug-2010 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

i7core_edac: Introduce i7core_unregister_mci

In i7core_probe, when setup of the mci for the 2nd or a later socket fails,
we should clean up the mci already prepared for the 1st socket (and any
others) before the "put" of all devices.

So let's have an i7core_unregister_mci that can be shared between here
and i7core_remove.

While here fix a typo "hanler".

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 73589c80 20-Aug-2010 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

i7core_edac: Use saved pointers

We already have saved pointers. Use shorter ones.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 71fe0170 20-Aug-2010 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

i7core_edac: Check probe counter in i7core_remove

Prevent i7core_remove from running multiple times.
Otherwise the probed counter will go negative and things will go wrong.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 2896637b 20-Aug-2010 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

i7core_edac: Call pci_dev_put() when alloc_i7core_dev() failed

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 628c5ddf 20-Aug-2010 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

i7core_edac: Fix error path of i7core_register_mci

Release resources properly.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 5939813b 20-Aug-2010 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

i7core_edac: Fix order of lines in i7core_register_mci

The flag is_registered is not initialized until mci_bind_devs()
is called. Refer to it only after that point.

The mci->dev and mci->edac_check fields are required in edac_mc_add_mc(),
so prepare them just before the call.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 64c10f6e 20-Aug-2010 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

i7core_edac: Always do get/put for all devices

We already do 'get' for all sockets at once. So do 'put' in the
same way.

And change the args of the 'get' function to void, since it handles
only the single, static, known-size table pci_dev_table[].

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# a3aa0a4a 20-Aug-2010 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

i7core_edac: Introduce i7core_pci_ctl_create/release

Introduce a create/release pair of methods.
While here, sort out the lines in i7core_register_mci() a bit.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 2aa9be44 20-Aug-2010 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

i7core_edac: Introduce free_i7core_dev

Add a method to pair with the previously introduced alloc_i7core_dev().
Using them as a pair will help proper resource handling.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 848b2f7e 20-Aug-2010 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

i7core_edac: Introduce alloc_i7core_dev

It's nice to have a method for a single purpose.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# b197cba0 20-Aug-2010 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

i7core_edac: Reduce args of i7core_get_onedevice

Since we need to pass the index of the entry, pass the table itself
instead of passing individual members of the table.

While here make it static.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 45b7c981 20-Aug-2010 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

i7core_edac: Fix the logic in i7core_remove()

Commit 47251b4d960bdfa648b0d06dbc6d445f41cb3906 changed
the logic for unexplained reasons. It looks strange that it
can release i7core_dev without calling i7core_put_devices(),
which releases i7core_dev->pdev.

Fix that part.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 54a08ab1 19-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Don't do the legacy PCI probe by default

The legacy PCI probe sometimes causes hangs. Better to have it
disabled by default, and have a parameter to enable it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# accf74ff 16-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: don't use a freed mci struct

This is a nasty bug. Since the kobject count will be reduced to zero by
edac_mc_del_mc(), and this triggers the kobj release method, the
mci memory will be freed automatically. So, all we have left is ctl_name,
as shown by enabling debug:

[ 80.822186] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device() remove_link
[ 80.832590] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device() remove_mci_instance
[ 80.843776] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing
[ 80.855163] EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:3f:03.0
[ 80.862936] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2089: (null): free structs
[ 80.871134] EDAC DEBUG: in drivers/edac/edac_mc.c, line at 238: edac_mc_free()
[ 80.878379] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 726: edac_mc_unregister_sysfs_main_kobj()
[ 80.888043] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1232: drivers/edac/i7core_edac.c: i7core_put_devices()

Also, kfree(mci) shouldn't happen at the kobj.release, as it happens
when edac_remove_sysfs_mci_device() is called, but the logic is:
edac_remove_sysfs_mci_device(mci);
edac_printk(KERN_INFO, EDAC_MC,
"Removed device %d for %s %s: DEV %s\n", mci->mc_idx,
mci->mod_name, mci->ctl_name, edac_dev_name(mci));
So, as the edac_printk() needs the mci struct, this generates an OOPS.
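
The ordering problem can be shown with a small user-space model. It is a
sketch under stated assumptions: toy_mci, toy_mci_put() and toy_del_mc() are
hypothetical, and copying the fields before the final put is just one way to
avoid the stale access, not necessarily the approach taken by this patch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted controller object. */
struct toy_mci {
	int refcount;
	int mc_idx;
	char ctl_name[32];
};

/* Drop one reference; the object is freed when the count reaches zero. */
static void toy_mci_put(struct toy_mci *mci)
{
	if (--mci->refcount == 0)
		free(mci);
}

/* Remove the controller and format the log line without touching freed memory. */
static int toy_del_mc(struct toy_mci *mci, char *msg, size_t len)
{
	/* Snapshot everything the log line needs while mci is still valid. */
	int idx = mci->mc_idx;
	char name[sizeof(mci->ctl_name)];

	memcpy(name, mci->ctl_name, sizeof(name));
	toy_mci_put(mci);	/* may free mci; do not use it afterwards */
	return snprintf(msg, len, "Removed device %d for %s", idx, name);
}
```

An alternative with the same effect is to hold an extra reference across
the log call and drop it afterwards.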

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# bbc560ae 16-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

edac_core: Print debug messages at release calls

This is important to track a nasty bug at the free logic.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 39300e71 11-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: explicitly remove PCI devices from the devices list

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 41ba6c10 10-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: MCE NMI handling should stop first

Otherwise, an NMI may happen, causing a race condition and a panic.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 6ee7dd50 10-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Initialize all priv vars before start polling

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 3cfd0146 10-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Improve debug to seek for register/remove errors

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# e9144601 10-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: move #if PAGE_SHIFT to edac_core.h

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 1288c18f 10-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Properly mark const static vars as such

There are two groups of sysfs attributes: one for rdimm and another
for udimm. Instead of changing dynamically the unique static struct
for handling udimm's, declare two vars and make them constant.

This avoids the risk of having two or more memory controllers, each
needing a different set of attributes.

While here, use const on all places where it is applicable.
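
A small sketch of the idea, with hypothetical attribute names and a
hypothetical toy_pick_attrs() selector: two read-only tables are declared up
front and each controller simply picks one, so nothing shared is ever
modified at runtime.

```c
#include <stddef.h>

/* Hypothetical sysfs-like attribute descriptor; a NULL name ends a table. */
struct toy_attr {
	const char *name;
};

/* One constant table per DIMM type. */
static const struct toy_attr toy_rdimm_attrs[] = {
	{ "common_a" },
	{ "rdimm_only" },
	{ NULL }
};

static const struct toy_attr toy_udimm_attrs[] = {
	{ "common_a" },
	{ "udimm_only" },
	{ NULL }
};

/* Each controller selects a table; neither table is written to. */
static const struct toy_attr *toy_pick_attrs(int is_registered)
{
	return is_registered ? toy_rdimm_attrs : toy_udimm_attrs;
}
```

Since the tables are const, two controllers with different DIMM types can
use different sets at the same time without interfering with each other.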

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac_core: use const for constant sysfs arguments

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 18c29002 10-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: move static vars to the beginning of the file

While here, don't initialize probed with 0.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 939747bd 10-Aug-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Be sure that the edac pci handler will be properly released

With multi-sockets, more than one edac pci handler is enabled. Be sure to
un-register all instances.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 64aab720 30-Sep-2010 Marcin Slusarz <marcin.slusarz@gmail.com>

i7core_edac: fix panic in udimm sysfs attributes registration

Array of udimm sysfs attributes was not ended with NULL marker, leading to
dereference of random memory.

EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm0
EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm1
EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm2
BUG: unable to handle kernel NULL pointer dereference at 00000000000001a4
IP: [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
Pid: 1, comm: swapper Not tainted 2.6.36-rc3-nv+ #483 P6T SE/System Product Name
RIP: 0010:[<ffffffff81330b36>] [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
(...)
Call Trace:
[<ffffffff81330b86>] edac_create_mci_instance_attributes+0x198/0x1f1
[<ffffffff81330c9a>] edac_create_sysfs_mci_device+0xbb/0x2b2
[<ffffffff8132f533>] edac_mc_add_mc+0x46b/0x557
[<ffffffff81428901>] i7core_probe+0xccf/0xec0
RIP [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
---[ end trace 20de320855b81d78 ]---
Kernel panic - not syncing: Attempted to kill init!

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
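
The failure mode described above can be illustrated outside the kernel. The
following is only a minimal userspace sketch: struct demo_attr and
count_attrs() are hypothetical stand-ins, not the EDAC core's types. A walker
that stops at a NULL entry runs off the end of the array when the terminator
is missing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a sysfs attribute; not the kernel's struct. */
struct demo_attr {
	const char *name;	/* a NULL name marks the end of the array */
};

/* The trailing { NULL } entry is the terminator the walker relies on. */
static const struct demo_attr udimm_attrs[] = {
	{ "udimm0" },
	{ "udimm1" },
	{ "udimm2" },
	{ NULL },
};

/* Count entries by walking until the NULL-name sentinel is reached. */
static size_t count_attrs(const struct demo_attr *attrs)
{
	size_t n = 0;

	assert(attrs != NULL);
	while (attrs[n].name != NULL)
		n++;
	return n;
}
```

Without the last initializer, count_attrs() would keep reading whatever
happens to follow the array in memory.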


# ab089374 23-Jul-2010 Daniel J Blueman <daniel.blueman@gmail.com>

quiesce EDAC initialisation on desktop/mobile i7

Don't print failure to detect Core i7 EDAC facilities to the console at
boot time, most often occurring on Core i7 desktops and laptops.

Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2d95d815 29-Jun-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Avoid doing multiple probes for the same card

As Nehalem/Nehalem-EP/Westmere devices use several devices for the same
functionality (memory controller), the default way of probing devices doesn't
work. So, instead of a per-device probe, all devices should be probed at once.

This means that we should block any new probe attempt; otherwise, it will
try to register the same device several times.

Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# bda14289 29-Jun-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Properly discover the first QPI device

On Nehalem/Nehalem-EP/Westmere, the first QPI device is the last PCI bus.
The last bus is generally at 0x3f or 0xff, but there are also other systems
using different setups. For example, HP Z800 has 0x7f as the last bus.

This patch adds logic to discover the last bus, detecting it dynamically
at runtime.

Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
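
As a rough illustration of the idea only (the driver itself walks the kernel's
PCI bus list, which is not reproduced here), picking the last bus amounts to
taking the highest bus number seen. The function name and the array input are
assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the highest bus number in the list, or -1 if the list is empty.
 * The caller supplies the bus numbers; how they are enumerated is outside
 * the scope of this sketch.
 */
static int last_bus(const unsigned char *buses, size_t n)
{
	int last = -1;
	size_t i;

	assert(buses != NULL);
	for (i = 0; i < n; i++) {
		if (buses[i] > last)
			last = buses[i];
	}
	return last;
}
```

With this approach a system whose last bus is 0x7f needs no special case.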


# 52707f91 18-May-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Better describe the supported devices

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# bd9e19ca 18-May-2010 Vernon Mauery <vernux@us.ibm.com>

Add support for Westmere to i7core_edac driver

This adds new PCI IDs for the Westmere's memory controller
devices and modifies the i7core_edac driver to be able to
probe both Nehalem and Westmere processors.

Signed-off-by: Vernon Mauery <vernux@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# d4d1ef45 18-May-2010 Tony Luck <tony.luck@intel.com>

i7core_edac: don't free on success

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# ac1ecece 18-May-2010 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Add support for X5670

As reported by Vernon Mauery <vernux@us.ibm.com>, X5670 (Westmere-EP) uses a
different register for one of the uncore PCI devices. Add support for
it.

These are the PCI IDs on this new chipset:

fe:00.0 0600: 8086:2c70 (rev 02)
fe:00.1 0600: 8086:2d81 (rev 02)
fe:02.0 0600: 8086:2d90 (rev 02)
fe:02.1 0600: 8086:2d91 (rev 02)
fe:02.2 0600: 8086:2d92 (rev 02)
fe:02.3 0600: 8086:2d93 (rev 02)
fe:02.4 0600: 8086:2d94 (rev 02)
fe:02.5 0600: 8086:2d95 (rev 02)
fe:03.0 0600: 8086:2d98 (rev 02)
fe:03.1 0600: 8086:2d99 (rev 02)
fe:03.2 0600: 8086:2d9a (rev 02)
fe:03.4 0600: 8086:2d9c (rev 02)
fe:04.0 0600: 8086:2da0 (rev 02)
fe:04.1 0600: 8086:2da1 (rev 02)
fe:04.2 0600: 8086:2da2 (rev 02)
fe:04.3 0600: 8086:2da3 (rev 02)
fe:05.0 0600: 8086:2da8 (rev 02)
fe:05.1 0600: 8086:2da9 (rev 02)
fe:05.2 0600: 8086:2daa (rev 02)
fe:05.3 0600: 8086:2dab (rev 02)
fe:06.0 0600: 8086:2db0 (rev 02)
fe:06.1 0600: 8086:2db1 (rev 02)
fe:06.2 0600: 8086:2db2 (rev 02)
fe:06.3 0600: 8086:2db3 (rev 02)
(as usual, the same PCI devices repeat at ff: bus)

The PCI device 8086:2c70 is shown as:

fe:00.0 Host bridge: Intel Corporation QuickPath Architecture Generic
Non-core Registers (rev 02)

So, for this device to be recognized, it is only a matter of adding this
new PCI ID to the driver.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 8a311e17 16-Apr-2010 Vernon Mauery <vernux@us.ibm.com>

Always call i7core_[ur]dimm_check_mc_ecc_err

This fixes an error in function i7core_check_error

In commit ca9c90ba09ca3c9799319f46a56f397afbf617c2 which converts the
driver to use double buffering, there is a change in the logic. Before,
if mce_count was zero, it skipped over a couple of statements and
finished out with a call to the *check_mc_ecc_err function. The current
code checks to see if mce_count is 0 and then exits.

This change reverts the behavior back to the original where if there are
no errors to report, we skip to the end and call the *check_mc_ecc_err
function.

This fix allows the driver to work again on my Nehalem-based blades.

Signed-off-by: Vernon Mauery <vernux@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
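
The restored control flow can be sketched as follows. This is a simplified
userspace model, not the driver code: check_error(), check_mc_ecc_err() and
the two counters are hypothetical names used only to show that the
corrected-error poll runs even when there are no queued MCE events.

```c
#include <assert.h>

static int handled;	/* MCE events processed so far */
static int ecc_checks;	/* times the corrected-error poll ran */

/* Hypothetical stand-in for the *check_mc_ecc_err functions. */
static void check_mc_ecc_err(void)
{
	ecc_checks++;
}

static void check_error(int mce_count)
{
	assert(mce_count >= 0);
	if (mce_count == 0)
		goto check_ce_error;	/* nothing queued: skip to the poll */

	handled += mce_count;		/* placeholder for per-event decoding */

check_ce_error:
	check_mc_ecc_err();		/* runs on every call, errors or not */
}
```

An early return in place of the goto would reproduce the regression: the poll
would never run on systems that report no MCE events.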


# 2a6fae32 07-Jan-2010 Alexander Beregalov <a.beregalov@gmail.com>

i7core_edac: fix memory leak of i7core_dev

Free already allocated i7core_dev.

Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 71753e01 09-Dec-2009 Jiri Slaby <jirislaby@kernel.org>

EDAC: add __init to i7core_xeon_pci_fixup

It's called only from an __init function and is the only user
of pcibios_scan_specific_bus which will be marked as __devinit in
the next patch.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 508fa179 14-Oct-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Fix wrong device id for channel 1 devices

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# f05da2f7 14-Oct-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core: add support for Lynnfield alternate address

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 52a2e4fc 14-Oct-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Add initial support for Lynnfield

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 3b918c12 07-Nov-2009 Randy Dunlap <randy.dunlap@oracle.com>

edac: fix i7core build

Fix build warning (missing header file) and
build error when CONFIG_SMP=n.

drivers/edac/i7core_edac.c:860: error: implicit declaration of function 'msleep'
drivers/edac/i7core_edac.c:1700: error: 'struct cpuinfo_x86' has no member named 'phys_proc_id'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 486dd09f 07-Nov-2009 Alan Cox <alan@linux.intel.com>

edac: i7core_edac produces undefined behaviour on 32bit

Fix the shifts up

Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
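
The commit message does not say which shifts were affected, so the following
is only a generic illustration of this class of bug: shifting a 32-bit value
by 32 or more bits, or past its width, is undefined in C, and the usual remedy
is to widen the operand before shifting. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert megabytes to bytes without overflowing a 32-bit intermediate. */
static uint64_t mb_to_bytes(uint32_t mb)
{
	/* Cast first so the shift is performed in 64-bit arithmetic. */
	return (uint64_t)mb << 20;
}

/* Build a single-bit 64-bit mask; valid for bit positions 0..63. */
static uint64_t bit64(unsigned int n)
{
	assert(n < 64);
	return UINT64_C(1) << n;
}
```

Writing `mb << 20` or `1 << n` instead would do the arithmetic in 32 bits
where unsigned long or int is 32 bits wide.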


# de06eeef 14-Oct-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Use a more generic approach for probing PCI devices

Currently, only one set of PCI tables is allowed. This prevents using
the driver for other devices like Lynnfield, which have a different
set of PCI IDs.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# fd382654 14-Oct-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: PCI device is called NONCORE, instead of NOCORE

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 321ece4d 08-Oct-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Fix ringbuffer maxsize

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 6e103be1 05-Oct-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: First store, then increment

Fix ringbuffer store logic.

While here, add a few comments to the code and remove the undesired
printk that could otherwise be called during NMI time.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
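
The ordering named in the subject line can be shown with a small
single-producer ring. This is a sketch under stated assumptions: the struct,
the size and the function names are illustrative, and a real lockless ring in
the kernel also needs memory barriers, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 4	/* illustrative size, not the driver's */

struct ring {
	int slot[RING_SIZE];
	unsigned int in;	/* next slot the producer writes */
	unsigned int out;	/* next slot the consumer reads */
};

/* Returns 0 on success, -1 if the ring is full. */
static int ring_put(struct ring *r, int value)
{
	unsigned int next;

	assert(r != NULL);
	next = (r->in + 1) % RING_SIZE;
	if (next == r->out)
		return -1;		/* full: one slot is kept empty */
	r->slot[r->in] = value;		/* store first ... */
	r->in = next;			/* ... then advance the index */
	return 0;
}

/* Returns 0 on success, -1 if the ring is empty. */
static int ring_get(struct ring *r, int *value)
{
	assert(r != NULL && value != NULL);
	if (r->out == r->in)
		return -1;
	*value = r->slot[r->out];
	r->out = (r->out + 1) % RING_SIZE;
	return 0;
}
```

Advancing the index before the store would let a consumer read a slot that
has not been filled yet.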


# 4f87fad1 04-Oct-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Better parse "any" addrmask

Instead of accepting just "any", accept also "any\n"

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
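
A value written with echo arrives with a trailing newline, which is why both
forms need to match. A minimal sketch of such a comparison, with is_any() as a
hypothetical helper name rather than the driver's own function:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if buf holds "any", optionally followed by one newline. */
static int is_any(const char *buf)
{
	assert(buf != NULL);
	return strcmp(buf, "any") == 0 || strcmp(buf, "any\n") == 0;
}
```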


# ca9c90ba 04-Oct-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Use a lockless ringbuffer

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# f338d736 24-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Convert UDIMM error counters into a proper sysfs group

Instead of displaying 3 values at the same var, break it into 3
different sysfs nodes:

/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm0
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm1
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm2

For registered dimms, however, the error counters are already being
displayed at:
/sys/devices/system/edac/mc/mc0/csrow*/ce_count

So, there's no need to add any extra sysfs nodes.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# cc301b3a 24-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

edac: store/show methods for device groups weren't working

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# a5538e53 23-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Add support for sysfs addrmatch group

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 4af91889 24-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Avoid printing a warning when debug is disabled

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 42538680 24-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: We need to use list_for_each_entry_safe to avoid errors

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
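
The reason the _safe variant is needed when entries are removed during
iteration can be shown with a plain singly linked list. This is a userspace
sketch with hypothetical names, not the kernel's list API: the successor is
saved before the current node is freed, which is what the extra cursor in
list_for_each_entry_safe provides.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	int id;
	struct node *next;
};

/* Prepend a node and return the new head. */
static struct node *push(struct node *head, int id)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->id = id;
	n->next = head;
	return n;
}

/* Free every node; returns how many were freed. */
static int free_all(struct node *head)
{
	struct node *pos, *tmp;
	int freed = 0;

	for (pos = head; pos != NULL; pos = tmp) {
		tmp = pos->next;	/* save the successor before freeing pos */
		free(pos);
		freed++;
	}
	return freed;
}
```

Reading pos->next after free(pos) would be a use-after-free, which is the
kind of error the plain iterator invites.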


# 22e6bcbd 05-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: change remove module strategy

The old remove module strategy didn't work on devices with multiple
cores, since only one PCI device is used to open all mc's, due to
the nature of Nehalem.

Also, it was based on the pdev value. However, this doesn't point to the
pci device used at mci->dev.

So, instead, it unregisters all devices at once, deleting them from the
device list.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 0f062792 04-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: remove static counter for max sockets

The number of sockets is now fully dynamic. Get rid of this obsolete
var.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 13d6e9b6 04-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: at remove, don't remove all pci devices at once

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# d88b8507 05-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Fix a bug when printing error counts with RDIMMs

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# d4c27795 05-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: a few fixes for multiple mc's

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 6c6aa3af 05-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: sanity check: print a warning if a mcelog is ignored

In theory, the other mc controller should handle it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# f4742949 04-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: create one mc per socket/QPI

Instead of creating just one memory controller, create one per socket
(e.g. per QuickPath Interconnect).

This better reflects the Nehalem architecture.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 66607706 04-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

Dynamically allocate memory for PCI devices

Instead of using a static table assuming always 2 CPU sockets, allocate
space dynamically for Nehalem PCI devs.

This patch is part of a series of patches that changes i7core_edac to
allow more than 2 sockets and to properly report one memory controller
per socket.


# a55456f3 04-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core: temporary workaround to allow it to compile against 2.6.30

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 3a3bb4a6 03-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Improve corrected_error_counts output for RDIMM

Just cosmetics. Instead of showing something like:

socket 0, channel 2dimm0: 1
dimm1: 0
dimm2: 0
socket 1, channel 2dimm0: 0
dimm1: 0
dimm2: 0

Show:

socket 0, channel 2 RDIMM0: 1 RDIMM1: 0 RDIMM2: 0
socket 0, channel 2 RDIMM0: 0 RDIMM1: 0 RDIMM2: 0

This is more compact and easier to parse.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# bc2d7245 02-Sep-2009 Keith Mannthey <kmannth@us.ibm.com>

i7core_edac: Probe on Xeons earlier

On the Xeon 55XX series CPUs the PCI devices are not exposed via ACPI, so
we must explicitly probe them to make them usable as Linux PCI devices.

This moves the detection of this state to before pci_register_driver is
called. Its present position was not working on my systems; the driver
would complain about not finding a specific device.

This patch allows the driver to load on my systems.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 14d2c083 02-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core: Use registered memories per processor

Instead of assuming that the entire machine has either registered or
unregistered memories, decide this per CPU socket.

While here, fix a bug at i7core_mce_output_error(), where we were
using m->cpu directly as if it represented a socket. Instead, the
proper socket_id is given by cpu_data[m->cpu].phys_proc_id.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# b4e8f0b6 02-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Use Device 3 function 2 to report errors with RDIMM's

Nehalem and newer chipsets provide a special device that holds counters for
corrected memory errors detected on registered DIMMs. This device is only
seen if registered memories are plugged in.

After this patch, on a machine fully equipped with RDIMM's, it will use
Device 3 function 2 to count corrected errors instead of relying on mcelog.

For unregistered DIMMs, it will keep the old behavior, counting errors
via mcelog.

This patch was developed together with Keith Mannthey <kmannth@us.ibm.com>

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 61053fde 02-Sep-2009 Keith Mannthey <kmannth@us.ibm.com>

i7core_edac: Fix ecc enable shift

From: Keith Mannthey <kmannth@us.ibm.com>

Simple correction to a shift value.
ECC_ENABLED is bit 4 of MC_STATUS, Dev 3 Fun 0 Offset 0x4c

This correctly identifies the state of the ECC at the machine.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
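
Testing the bit described above comes down to a shift and a mask. In this
small sketch the bit position is the one stated in the commit message, while
the macro and function names are illustrative rather than taken from the
driver.

```c
#include <assert.h>
#include <stdint.h>

#define ECC_ENABLED_BIT 4	/* bit position stated in the commit message */

/* Extract one bit of a 32-bit register value. */
static int reg_bit(uint32_t reg, unsigned int bit)
{
	assert(bit < 32);	/* shifting by 32 or more would be undefined */
	return (reg >> bit) & 1;
}

/* Returns 1 if the ECC_ENABLED bit is set in an MC_STATUS value. */
static int ecc_enabled(uint32_t mc_status)
{
	return reg_bit(mc_status, ECC_ENABLED_BIT);
}
```

An off-by-one shift would read a neighbouring bit and misreport the ECC state.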


# 3ef288a9 02-Sep-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Print an error message if pci register fails

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# b990538a 05-Aug-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: CodingStyle fixes/cleanups

No functional changes.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 4157d9f5 05-Aug-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: fix error injection

There were two stupid error injection bugs introduced by wrong
cut-and-paste: one at socket store, and another at the error inject
register. The latter was causing the code to not work at all.

While here, add debug messages to allow seeing what registers are being
set while sending error injection.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 2068def5 05-Aug-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: fix error codes for sysfs error injection interface

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 276b824c 22-Jul-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: some fixes at error injection code

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 17cb7b0c 20-Jul-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Some cleanups at displayed info

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 086271a0 17-Jul-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core: remove some unneeded noisy debug messages

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 3a7dde7f 17-Jul-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core: add socket info at the debug msg

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# ec6df24c 18-Jul-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core: better document i7core_get_active_channels()

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# c77720b9 18-Jul-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core: fix get_devices routine for Xeon55xx

i7core_get_devices() was prepared to get just the first found device of each type.
Due to that, on Xeon 55xx, only socket 1 was retrieved.

Rework i7core_get_devices() to clean it and to properly support Xeon 55xx.

While here, fix a small typo.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# a639539f 17-Jul-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core: enrich error information based on memory transaction type

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# c5d34528 17-Jul-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core: check if the memory error is fatal or non-fatal

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 310cbb72 16-Jul-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core: fix probing on Xeon55xx

Xeon55xx fails to probe with this error message:

EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1660: MC: drivers/edac/i7core_edac.c: i7core_init()
EDAC i7core: Device not found: dev 00:00.0 PCI ID 8086:2c41
i7core_edac: probe of 0000:00:14.0 failed with error -22

This is due to the fact that, on Xeon35xx (and i7core), device 00.0 has
PCI ID 8086:2c40.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# f237fcf2 15-Jul-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: some fixes at memory error parser

m->bank is not related to the memory bank but, instead, to the MCA Error
register bank. Fix it accordingly. While here, improve the comments for
the Nehalem bank.

A later fix is needed, in order to get bank/rank information from MCA
error log.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 8a2f118e 15-Jul-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: decode mcelog error and send it via edac interface

Enriches mcelog error by using the encoded information at MCE status and
misc registers (IA32_MCx_STATUS, IA32_MCx_MISC).

Some fixes are still needed here, in order to properly fill the EDAC
fields.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# ba6c5c62 15-Jul-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: maps all sockets as if they were one MC controller

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 67166af4 15-Jul-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: add support for more than one MC socket

Some Nehalem architectures have more than one MC socket. Socket 0 is
located at bus 255.

Currently, it is using up to 2 sockets, but increasing it to a larger
number is just a matter of increasing MAX_SOCKETS definition.

This seems to be required to properly support Xeon 55xx.

Still needs testing with Xeon 55xx.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# d1fd4fb6 10-Jul-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Add code to probe Xeon 55xx bus

This code changes the detection procedure of i7core_edac. Instead of
directly probing for MC registers, it probes for another register found
on Nehalem. If found, it tries to pick the first MC PCI BUS. This should
work fine with Xeon 35xx, but, on Xeon 55xx, this is at bus 254 and 255
that are not properly detected by the non-legacy PCI methods.

The new detection code scans specifically at buses 254 and 255 for the
Xeon 55xx devices.

This code has not been tested yet. Once it works, a further change to the
code will be needed, since the i7core is not yet ready to work with 2 sets
of MC.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# e9bd2e73 09-Jul-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Adds write unlock to MC registers

The public Intel Xeon 5500 volume 2 datasheet describes, on page 53,
section 2.6.7, a register called MC_CFG_CONTROL that can lock/unlock the
Memory Controller configuration register.

Adds support for it in the hope that software error injection would
work. In my tests with Xeon 35xx, there's still something missing.
With a program that does sequential bit writes at dev 0.0, sometimes it
produces error injection after unlocking MC_CFG_CONTROL (and,
sometimes, it just locks up my testing machine).

I'll try later to discover by trial and error what's the register that
solves this issue on Xeon 35xx.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# d5381642 09-Jul-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Add edac_mce glue

Adds glue code to allow i7core to work with mcelog. With the glue,
i7core registers itself on edac_mce. When mce detects an error,
it calls all registered drivers (in this case, i7core) for EDAC error
handling.

TODO: It currently just prints the MCE error log using about the same
format as mce panic messages. The error message should be enhanced
with mcelog userspace info and converted into the proper EDAC format,
to feed the EDAC error counts.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 41fcb7fe 22-Jun-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: CodingStyle fixes

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# eb94fc40 22-Jun-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: fill csrows edac sysfs info

csrows is still fake, since we can't identify its representation with
Nehalem registers.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 5566cb7c 22-Jun-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Memory info fixes and preparation for properly filling csrow data

Now, memory size is properly displayed:

EDAC i7core: DOD Max limits: DIMMS: 2, 1-ranked, 8-banked
EDAC i7core: DOD Max rows x colums = 0x4000 x 0x400
EDAC i7core: Memory channel configuration:
EDAC i7core: Ch0 phy rd0, wr0 (0x063f7c31): 2 ranks, UDIMMs
EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8,
numrank: 1, numrow: 0x4000, numcol: 0x400
EDAC i7core: dimm 1 (0x00001288) 1024 Mb offset: 4, numbank: 8,
numrank: 1, numrow: 0x4000, numcol: 0x400
EDAC i7core: Ch1 phy rd1, wr1 (0x063f7c31): 2 ranks, UDIMMs
EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8,
numrank: 1, numrow: 0x4000, numcol: 0x400
EDAC i7core: Ch2 phy rd3, wr3 (0x063f7c31): 2 ranks, UDIMMs
EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8,
numrank: 1, numrow: 0x4000, numcol: 0x400

Still, as the way to retrieve csrows info is not known, it does a
mapping of what's available to csrows basic unit at edac core.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 854d3349 22-Jun-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Get more info about the memory DIMMs

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 7dd6953c 22-Jun-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Add more information about each active dimm

Thanks-to: Aristeu Rozanski <aris@redhat.com> for part of the code

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# b7c76151 22-Jun-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Improve error handling

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 1c6fed80 22-Jun-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Properly fill struct csrow_info

Thanks-to: Aristeu Rozanski <aris@redhat.com> for part of the code

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# ef708b53 22-Jun-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Add additional tests for error detection

Properly check the number of channels and improve probing error detection

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 442305b1 22-Jun-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Add a memory check routine, based on device 3 function 4

This function appears only in the Xeon 5500 datasheet. Yet, testing with a
Xeon 3503 showed that it is also implemented on other Nehalem
processors.

At the first read, MC_TEST_ERR_RCV1 and MC_TEST_ERR_RCV0 can contain any
value. Modify CE error logic to update the error count only after the
second read.

An alternative approach would be to do a write at rcv0 and rcv1
registers, but it seemed better to keep them untouched, since the BIOS
might assume that they are reserved for its own use.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
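
The "count only after the second read" rule can be sketched as a
baseline-then-delta tracker. This is an illustration under assumptions: the
struct and function names are hypothetical, and the counter is treated as a
plain 32-bit value, which is not necessarily the width of the hardware fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ce_tracker {
	int have_baseline;	/* 0 until the first read has been taken */
	uint32_t last;		/* raw counter value at the previous read */
	uint64_t total;		/* corrected errors counted since baseline */
};

/* Feed one raw counter reading into the tracker. */
static void ce_update(struct ce_tracker *t, uint32_t raw)
{
	assert(t != NULL);
	if (t->have_baseline)
		t->total += (uint32_t)(raw - t->last);	/* delta since last poll */
	t->have_baseline = 1;	/* the first reading only sets the baseline */
	t->last = raw;
}
```

The first reading is never added to the total, so whatever value the
registers held before the driver loaded is not reported as errors.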


# 87d1d272 22-Jun-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: need mci->edac_check, otherwise module removal doesn't work

There are some locking troubles with edac_core: if you don't declare an
edac_check, the module may suffer a soft lockup.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 7b029d03 22-Jun-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: A few fixes at error injection code

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# f122a892 22-Jun-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Show read/write virtual/physical channel association

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 8f331907 22-Jun-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Registers all supported MC functions

Now, it will try to register on all supported Memory Controller
functions.

It should be noted that dev3, function 2 is present only on chips with
Registered DIMM's, according to the datasheet. So, the driver doesn't
return -ENODEV if all functions but this one were successfully
registered and enabled:

EDAC i7core: Registered device 8086:2c18 fn=3 0
EDAC i7core: Registered device 8086:2c19 fn=3 1
EDAC i7core: Device not found: PCI ID 8086:2c1a (dev 3, func 2)
EDAC i7core: Registered device 8086:2c1c fn=3 4
EDAC i7core: Registered device 8086:2c20 fn=4 0
EDAC i7core: Registered device 8086:2c21 fn=4 1
EDAC i7core: Registered device 8086:2c22 fn=4 2
EDAC i7core: Registered device 8086:2c23 fn=4 3
EDAC i7core: Registered device 8086:2c28 fn=5 0
EDAC i7core: Registered device 8086:2c29 fn=5 1
EDAC i7core: Registered device 8086:2c2a fn=5 2
EDAC i7core: Registered device 8086:2c2b fn=5 3
EDAC i7core: Registered device 8086:2c30 fn=6 0
EDAC i7core: Registered device 8086:2c31 fn=6 1
EDAC i7core: Registered device 8086:2c32 fn=6 2
EDAC i7core: Registered device 8086:2c33 fn=6 3
EDAC i7core: Driver loaded.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 0b2b7b7e 22-Jun-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Add more status functions to EDAC driver

This patch was co-authored with Aristeu Rozanski.

Signed-off-by: Aristeu Sergio <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 194a40fe 22-Jun-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Add error insertion code for Nehalem

Implements set_inject_error() with the low-level code needed to inject
memory errors at Nehalem, and adds some sysfs nodes to allow error injection.

The next patch will add an API for error injection.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# a0c36a1f 22-Jun-2009 Mauro Carvalho Chehab <mchehab@kernel.org>

i7core_edac: Add an EDAC memory controller driver for Nehalem chipsets

This driver is meant to support Core i7/Core i7 Extreme desktop
processors and Xeon 35xx/55xx series with integrated memory controller.
It is likely that it can be expanded in the future to work with other
processor series based on the same Memory Controller design.

For now, it has just a few MCH status reads.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>