#
0aafbdf3 |
|
18-Feb-2023 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Make generic_calibrate_decr() the default ppc_md.calibrate_decr() is a mandatory item. Its nullity is never checked so it must be non null on all platforms. Most platforms define generic_calibrate_decr() as their ppc_md.calibrate_decr(). Have time_init() call generic_calibrate_decr() when ppc_md.calibrate_decr() is NULL, and remove default assignment from all machines. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/6cb9865d916231c38401ba34ad1a98c249fae135.1676711562.git.christophe.leroy@csgroup.eu
|
#
2a7ce82d |
|
05-Feb-2023 |
Rohan McLure <rmclure@linux.ibm.com> |
powerpc/kcsan: Exclude udelay to prevent recursive instrumentation In order for KCSAN to increase its likelihood of observing a data race, it sets a watchpoint on memory accesses and stalls, allowing for detection of conflicting accesses by other kernel threads or interrupts. Stalls are implemented by injecting a call to udelay in instrumented code. To prevent recursive instrumentation, exclude udelay from being instrumented. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230206021801.105268-3-rmclure@linux.ibm.com
|
#
c2854801 |
|
21-Jan-2023 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: Fix perf profiling asynchronous interrupt handlers Interrupt entry sets the soft mask to IRQS_ALL_DISABLED to match the hard irq disabled state. So when should_hard_irq_enable() returns true because we want PMI interrupts in irq handlers, MSR[EE] is enabled but PMIs just get soft-masked. Fix this by clearing IRQS_PMI_DISABLED before enabling MSR[EE]. This also tidies some of the warnings, no need to duplicate them in both should_hard_irq_enable() and do_hard_irq_enable(). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230121100156.2824054-1-npiggin@gmail.com
|
#
f985adaf |
|
06-Oct-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: remove the last remnants of cputime_t cputime_t was a core kernel type, removed by commits ed5c8c854f2b..b672592f0221. As explained in commit b672592f0221 ("sched/cputime: Remove generic asm headers"), the final cleanup is for the arch to provide cputime_to_nsec[s](). Commit ade7667a981b ("powerpc: Add cputime_to_nsecs()") did that, but justdidn't remove the then-unused cputime_to_usecs(), cputime_t type, and associated remnants. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221006105653.115829-1-npiggin@gmail.com
|
#
c8455020 |
|
09-Sep-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/time: avoid programming DEC at the start of the timer interrupt Setting DEC to maximum at the start of the timer interrupt is not necessary and can be avoided for performance when MSR[EE] is not enabled during the handler as explained in commit 0faf20a1ad16 ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use"), where this change was first attempted. The idea is that the timer interrupt runs with MSR[EE]=0, and at the end of the interrupt DEC is programmed to the next timer interval, so there is no need to clear the decrementer exception before then. When the above commit was merged, that was not quite true. The low res timer subsystem had some cases in the oneshot timer code where if the tick was to be stopped and no timers active, the clock device would not get the ->set_state_oneshot_stopped() call, so DEC would not get reprogrammed, and this would hang taking continual timer interrupts. So this was reverted in commit d2b9be1f4af5 ("powerpc/time: Always set decrementer in timer_interrupt()"), which was a partial revert of the above commit. Commit 62c1256d5447 ("timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped") was later merged to fix this missing case in the timer subsystem, so now the behaviour can be restored. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220909142457.278032-1-npiggin@gmail.com
|
#
6ba5aa54 |
|
02-Sep-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/pseries: Move dtl scanning and steal time accounting to pseries platform dtl is the PAPR Dispatch Trace Log, which is entirely a pseries feature. The pseries platform alrady has a file dealing with the dtl, so move scanning for stolen time accounting there from kernel/time.c. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220902085316.2071519-5-npiggin@gmail.com
|
#
e6f6390a |
|
08-Mar-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Add missing headers Don't inherit headers "by chances" from asm/prom.h, asm/mpc52xx.h, asm/pci.h etc... Include the needed headers, and remove asm/prom.h when it was needed exclusively for pulling necessary headers. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/be8bdc934d152a7d8ee8d1a840d5596e2f7d85e0.1646767214.git.christophe.leroy@csgroup.eu
|
#
1fd02f66 |
|
30-Apr-2022 |
Julia Lawall <Julia.Lawall@inria.fr> |
powerpc: fix typos in comments Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220430185654.5855-1-Julia.Lawall@inria.fr
|
#
ce0091a0 |
|
24-Mar-2021 |
He Ying <heying24@huawei.com> |
powerpc/time: Fix sparse warnings We found these warnings in arch/powerpc/kernel/time.c as follows: warning: symbol 'decrementer_max' was not declared. Should it be static? warning: symbol 'rtc_lock' was not declared. Should it be static? warning: symbol 'dtl_consumer' was not declared. Should it be static? Declare 'decrementer_max' in powerpc asm/time.h. Include linux/mc146818rtc.h in powerpc kernel/time.c where 'rtc_lock' is declared. And remove duplicated declaration of 'rtc_lock' in powerpc platforms/chrp/time.c because it has included linux/mc146818rtc.h. Move 'dtl_consumer' definition after "include <asm/dtl.h>" because it is declared there. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: He Ying <heying24@huawei.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210324090939.143477-1-heying24@huawei.com
|
#
d2b9be1f |
|
20-Apr-2022 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/time: Always set decrementer in timer_interrupt() This is a partial revert of commit 0faf20a1ad16 ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use"). Prior to that commit, we always set the decrementer in timer_interrupt(), to clear the timer interrupt. Otherwise we could end up continuously taking timer interrupts. When high res timers are enabled there is no problem seen with leaving the decrementer untouched in timer_interrupt(), because it will be programmed via hrtimer_interrupt() -> tick_program_event() -> clockevents_program_event() -> decrementer_set_next_event(). However with CONFIG_HIGH_RES_TIMERS=n or booting with highres=off, we see a stall/lockup, because tick_nohz_handler() does not cause a reprogram of the decrementer, leading to endless timer interrupts. Example trace: [ 1.898617][ T7] Freeing initrd memory: 2624K^M [ 22.680919][ C1] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:^M [ 22.682281][ C1] rcu: 0-....: (25 ticks this GP) idle=073/0/0x1 softirq=10/16 fqs=1050 ^M [ 22.682851][ C1] (detected by 1, t=2102 jiffies, g=-1179, q=476)^M [ 22.683649][ C1] Sending NMI from CPU 1 to CPUs 0:^M [ 22.685252][ C0] NMI backtrace for cpu 0^M [ 22.685649][ C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc2-00185-g0faf20a1ad16 #145^M [ 22.686393][ C0] NIP: c000000000016d64 LR: c000000000f6cca4 CTR: c00000000019c6e0^M [ 22.686774][ C0] REGS: c000000002833590 TRAP: 0500 Not tainted (5.16.0-rc2-00185-g0faf20a1ad16)^M [ 22.687222][ C0] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24000222 XER: 00000000^M [ 22.688297][ C0] CFAR: c00000000000c854 IRQMASK: 0 ^M ... [ 22.692637][ C0] NIP [c000000000016d64] arch_local_irq_restore+0x174/0x250^M [ 22.694443][ C0] LR [c000000000f6cca4] __do_softirq+0xe4/0x3dc^M [ 22.695762][ C0] Call Trace:^M [ 22.696050][ C0] [c000000002833830] [c000000000f6cc80] __do_softirq+0xc0/0x3dc (unreliable)^M [ 22.697377][ C0] [c000000002833920] [c000000000151508] __irq_exit_rcu+0xd8/0x130^M [ 22.698739][ C0] [c000000002833950] [c000000000151730] irq_exit+0x20/0x40^M [ 22.699938][ C0] [c000000002833970] [c000000000027f40] timer_interrupt+0x270/0x460^M [ 22.701119][ C0] [c0000000028339d0] [c0000000000099a8] decrementer_common_virt+0x208/0x210^M Possibly this should be fixed in the lowres timing code, but that would be a generic change and could take some time and may not backport easily, so for now make the programming of the decrementer unconditional again in timer_interrupt() to avoid the stall/lockup. Fixes: 0faf20a1ad16 ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use") Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Link: https://lore.kernel.org/r/20220420141657.771442-1-mpe@ellerman.id.au
|
#
35de589c |
|
24-Jan-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/time: improve decrementer clockevent processing The stop/shutdown op should not use decrementer_set_next_event because that sets decrementers_next_tb to now + decrementer_max, which means a decrementer interrupt that occurs after that time will call the clockevent event handler unexpectedly. Set next_tb to ~0 here to prevent any clock event call. Init all clockevents to stopped. Then the decrementer clockevent device always has event_handler set and applicable because we know the clock event device was not stopped. So make this call unconditional to show that it is always called. next_tb need not be set to ~0 before the event handler is called because it will stop the clockevent device if there is no other timer. Finally, the timer broadcast interrupt should not modify next_tb because it is not involved with the local decrementer clockevent on this CPU. This doesn't fix a known bug, just tidies the code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220124143930.3923442-3-npiggin@gmail.com
|
#
cf74ff52 |
|
24-Jan-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/time: Fix KVM host re-arming a timer beyond decrementer range If the next host timer is beyond decrementer range, timer_rearm_host_dec will leave decrementer not programmed. This will not cause a problem for the host it will just set the decrementer correctly when the decrementer interrupt hits, it seems safer not to leave the next host decrementer interrupt timing able to be influenced by a guest. This code is only used in the P9 KVM paths so it's unlikely to be hit practically unless large decrementer is force disabled in the host. Fixes: 25aa145856cd ("powerpc/time: add API for KVM to re-arm the host timer/decrementer") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220124143930.3923442-2-npiggin@gmail.com
|
#
76222808 |
|
04-Mar-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Move C prototypes out of asm-prototypes.h We originally added asm-prototypes.h in commit 42f5b4cacd78 ("powerpc: Introduce asm-prototypes.h"). It's purpose was for prototypes of C functions that are only called from asm, in order to fix sparse warnings about missing prototypes. A few months later Nick added a different use case in commit 4efca4ed05cb ("kbuild: modversions for EXPORT_SYMBOL() for asm") for C prototypes for exported asm functions. This is basically the inverse of our original usage. Since then we've added various prototypes to asm-prototypes.h for both reasons, meaning we now need to unstitch it all. Dispatch prototypes of C functions into relevant headers and keep only the prototypes for functions defined in assembly. For the time being, leave prom_init() there because moving it into asm/prom.h or asm/setup.h conflicts with drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadowrom.o This will be fixed later by untaggling asm/pci.h and asm/prom.h or by renaming the function in shadowrom.c Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/62d46904eca74042097acf4cb12c175e3067f3d1.1646413435.git.christophe.leroy@csgroup.eu
|
#
cc15ff32 |
|
20-Jan-2022 |
Ganesh Goudar <ganeshgr@linux.ibm.com> |
powerpc/mce: Avoid using irq_work_queue() in realmode In realmode mce handler we use irq_work_queue() to defer the processing of mce events, irq_work_queue() can only be called when translation is enabled because it touches memory outside RMA, hence we enable translation before calling irq_work_queue and disable on return, though it is not safe to do in realmode. To avoid this, program the decrementer and call the event processing functions from timer handler. Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220120121931.517974-1-ganeshgr@linux.ibm.com
|
#
8defc2a5 |
|
24-Jan-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/interrupt: Fix decrementer storm The decrementer exception can fail to be cleared when the interrupt returns in the case where the decrementer wraps with the next timer still beyond decrementer_max. This results in a decrementer interrupt storm. This is triggerable with small decrementer system with hard and soft watchdogs disabled. Fix this by always programming the decrementer if there was no timer. Fixes: 0faf20a1ad16 ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220124143930.3923442-1-npiggin@gmail.com
|
#
0faf20a1 |
|
22-Sep-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use Enabling MSR[EE] in interrupt handlers while interrupts are still soft masked allows PMIs to profile interrupt handlers to some degree, beyond what SIAR latching allows. When perf is not being used, this is almost useless work. It requires an extra mtmsrd in the irq handler, and it also opens the door to masked interrupts hitting and requiring replay, which is more expensive than just taking them directly. This effect can be noticable in high IRQ workloads. Avoid enabling MSR[EE] unless perf is currently in use. This saves about 60 cycles (or 8%) on a simple decrementer interrupt microbenchmark. Replayed interrupts drop from 1.4% of all interrupts taken, to 0.003%. This does prevent the soft-nmi interrupt being taken in these handlers, but that's not too reliable anyway. The SMP watchdog will continue to be the reliable way to catch lockups. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210922145452.352571-5-npiggin@gmail.com
|
#
047a6fd4 |
|
19-Oct-2021 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/config: Add CONFIG_BOOKE_OR_40x We have many functionnalities common to 40x and BOOKE, it leads to many places with #if defined(CONFIG_BOOKE) || defined(CONFIG_40x). We are going to add a few more with KUAP for booke/40x, so create a new symbol which is defined when either BOOKE or 40x is defined. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9a3dbd60924cb25c9f944d3d8205ac5a0d15e229.1634627931.git.christophe.leroy@csgroup.eu
|
#
25aa1458 |
|
23-Nov-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/time: add API for KVM to re-arm the host timer/decrementer Rather than have KVM look up the host timer and fiddle with the irq-work internal details, have the powerpc/time.c code provide a function for KVM to re-arm the Linux timer code when exiting a guest. This is implementation has an improvement over existing code of marking a decrementer interrupt as soft-pending if a timer has expired, rather than setting DEC to a -ve value, which tended to cause host timers to take two interrupts (first hdec to exit the guest, then the immediate dec). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-8-npiggin@gmail.com
|
#
9581991a |
|
23-Nov-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV P9: Use large decrementer for HDEC On processors that don't suppress the HDEC exceptions when LPCR[HDICE]=0, this could help reduce needless guest exits due to leftover exceptions on entering the guest. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-6-npiggin@gmail.com
|
#
4ebbd075 |
|
23-Nov-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV P9: Use host timer accounting to avoid decrementer read There is no need to save away the host DEC value, as it is derived from the host timer subsystem which maintains the next timer time, so it can be restored from there. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-5-npiggin@gmail.com
|
#
e606a2f4 |
|
31-Aug-2021 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/time: Remove generic_suspend_{dis/en}able_irqs() Commit d75d68cfef49 ("powerpc: Clean up obsolete code relating to decrementer and timebase") made generic_suspend_enable_irqs() and generic_suspend_disable_irqs() static. Fold them into their only caller. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c3f9ec9950394ef939014f7934268e6ee30ca04f.1630398566.git.christophe.leroy@csgroup.eu
|
#
e225c4d6 |
|
23-Mar-2021 |
Wan Jiabing <wanjiabing@vivo.com> |
powerpc: Remove duplicate includes interrupt.c: asm/interrupt.h has been included at line 12, so remove the duplicate one at line 10. time.c: linux/sched/clock.h has been included at line 33,so remove the duplicate one at line 56 and move sched/cputime.h under sched including segament. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210323062916.295346-1-wanjiabing@vivo.com
|
#
98694166 |
|
10-Aug-2021 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/interrupt: Fix OOPS by not calling do_IRQ() from timer_interrupt() An interrupt handler shall not be called from another interrupt handler otherwise this leads to problems like the following: Kernel attempted to write user page (afd4fa84) - exploit attempt? (uid: 1000) ------------[ cut here ]------------ Bug: Write fault blocked by KUAP! WARNING: CPU: 0 PID: 1617 at arch/powerpc/mm/fault.c:230 do_page_fault+0x484/0x720 Modules linked in: CPU: 0 PID: 1617 Comm: sshd Tainted: G W 5.13.0-pmac-00010-g8393422eb77 #7 NIP: c001b77c LR: c001b77c CTR: 00000000 REGS: cb9e5bc0 TRAP: 0700 Tainted: G W (5.13.0-pmac-00010-g8393422eb77) MSR: 00021032 <ME,IR,DR,RI> CR: 24942424 XER: 00000000 GPR00: c001b77c cb9e5c80 c1582c00 00000021 3ffffbff 085b0000 00000027 c8eb644c GPR08: 00000023 00000000 00000000 00000000 24942424 0063f8c8 00000000 000186a0 GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 c07640c4 cb9e5e98 cb9e5e90 GPR24: 00000040 afd4fa96 00000040 02000000 c1fda6c0 afd4fa84 00000300 cb9e5cc0 NIP [c001b77c] do_page_fault+0x484/0x720 LR [c001b77c] do_page_fault+0x484/0x720 Call Trace: [cb9e5c80] [c001b77c] do_page_fault+0x484/0x720 (unreliable) [cb9e5cb0] [c000424c] DataAccess_virt+0xd4/0xe4 --- interrupt: 300 at __copy_tofrom_user+0x110/0x20c NIP: c001f9b4 LR: c03250a0 CTR: 00000004 REGS: cb9e5cc0 TRAP: 0300 Tainted: G W (5.13.0-pmac-00010-g8393422eb77) MSR: 00009032 <EE,ME,IR,DR,RI> CR: 48028468 XER: 20000000 DAR: afd4fa84 DSISR: 0a000000 GPR00: 20726f6f cb9e5d80 c1582c00 00000004 cb9e5e3a 00000016 afd4fa80 00000000 GPR08: 3835202d 72777872 2d78722d 00000004 28028464 0063f8c8 00000000 000186a0 GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 c07640c4 cb9e5e98 cb9e5e90 GPR24: 00000040 afd4fa96 00000040 cb9e5e0c 00000daa a0000000 cb9e5e98 afd4fa56 NIP [c001f9b4] __copy_tofrom_user+0x110/0x20c LR [c03250a0] _copy_to_iter+0x144/0x990 --- interrupt: 300 [cb9e5d80] [c03e89c0] n_tty_read+0xa4/0x598 (unreliable) [cb9e5df0] [c03e2a0c] tty_read+0xdc/0x2b4 [cb9e5e80] [c0156bf8] vfs_read+0x274/0x340 [cb9e5f00] [c01571ac] ksys_read+0x70/0x118 [cb9e5f30] [c0016048] ret_from_syscall+0x0/0x28 --- interrupt: c00 at 0xa7855c88 NIP: a7855c88 LR: a7855c5c CTR: 00000000 REGS: cb9e5f40 TRAP: 0c00 Tainted: G W (5.13.0-pmac-00010-g8393422eb77) MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 2402446c XER: 00000000 GPR00: 00000003 afd4ec70 a72137d0 0000000b afd4ecac 00004000 0065a990 00000800 GPR08: 00000000 a7947930 00000000 00000004 c15831b0 0063f8c8 00000000 000186a0 GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 0065a9e0 00000001 0065fac0 GPR24: 00000000 00000089 00664050 00000000 00668e30 a720c8dc a7943ff4 0065f9b0 NIP [a7855c88] 0xa7855c88 LR [a7855c5c] 0xa7855c5c --- interrupt: c00 Instruction dump: 3884aa88 38630178 48076861 807f0080 48042e45 2f830000 419e0148 3c80c079 3c60c076 38841be4 386301c0 4801f705 <0fe00000> 3860000b 4bfffe30 3c80c06b ---[ end trace fd69b91a8046c2e5 ]--- Here the problem is that by re-enterring an exception handler, kuap_save_and_lock() is called a second time with this time KUAP access locked, leading to regs->kuap being overwritten hence KUAP not being unlocked at exception exit as expected. Do not call do_IRQ() from timer_interrupt() directly. Instead, redefine do_IRQ() as a standard function named __do_IRQ(), and call it from both do_IRQ() and time_interrupt() handlers. Fixes: 3a96570ffceb ("powerpc: convert interrupt handlers to use wrappers") Cc: stable@vger.kernel.org # v5.12+ Reported-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c17d234f4927d39a1d7100864a8e1145323d33a0.1628611927.git.christophe.leroy@csgroup.eu
|
#
0cdff98b |
|
22-Jun-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Remove irq mask workaround in accumulate_stolen_time() The caller has been moved to C after irq soft-mask state has been reconciled, and Linux IRQs have been marked as disabled, so this no longer needs to play games with IRQ internals. Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210623022924.704645-1-npiggin@gmail.com
|
#
6ffe2c6e |
|
28-May-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV P9: Reduce irq_work vs guest decrementer races irq_work's use of the DEC SPR is racy with guest<->host switch and guest entry which flips the DEC interrupt to guest, which could lose a host work interrupt. This patch closes one race, and attempts to comment another class of races. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-11-npiggin@gmail.com
|
#
1b1b6a6f |
|
30-Jan-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: handle irq_enter/irq_exit in interrupt handler wrappers Move irq_enter/irq_exit into asynchronous interrupt handler wrappers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-35-npiggin@gmail.com
|
#
3a96570f |
|
30-Jan-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: convert interrupt handlers to use wrappers Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-29-npiggin@gmail.com
|
#
b709e32e |
|
22-Oct-2020 |
Pingfan Liu <kernelfans@gmail.com> |
powerpc/time: Enable sched clock for irqtime When CONFIG_IRQ_TIME_ACCOUNTING and CONFIG_VIRT_CPU_ACCOUNTING_GEN, powerpc does not enable "sched_clock_irqtime" and can not utilize irq time accounting. Like x86, powerpc does not use the sched_clock_register() interface. So it needs an dedicated call to enable_sched_clock_irqtime() to enable irq time accounting. Fixes: 518470fe962e ("powerpc: Add HAVE_IRQ_TIME_ACCOUNTING") Signed-off-by: Pingfan Liu <kernelfans@gmail.com> [mpe: Add fixes tag] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1603349479-26185-1-git-send-email-kernelfans@gmail.com
|
#
59d512e4 |
|
06-Nov-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: irq replay remove decrementer overflow check This is way to catch some cases of decrementer overflow, when the decrementer has underflowed an odd number of times, while MSR[EE] was disabled. With a typical small decrementer, a timer that fires when MSR[EE] is disabled will be "lost" if MSR[EE] remains disabled for between 4.3 and 8.6 seconds after the timer expires. In any case, the decrementer interrupt would be taken at 8.6 seconds and the timer would be found at that point. So this check is for catching extreme latency events, and it prevents those latencies from being a further few seconds long. It's not obvious this is a good tradeoff. This is already a watchdog magnitude event and that situation is not improved a significantly with this check. For large decrementers, it's useless. Therefore remove this check, which avoids a mftb when enabling hard disabled interrupts (e.g., when enabling after coming from hardware interrupt handlers). Perhaps more importantly, it also removes the clunky MSR[EE] vs PACA_IRQ_HARD_DIS incoherency in soft-interrupt replay which simplifies the code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201107014336.2337337-1-npiggin@gmail.com
|
#
ab037dd8 |
|
26-Nov-2020 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/vdso: Switch VDSO to generic C implementation. With the C VDSO, the performance is slightly lower, but it is worth it as it will ease maintenance and evolution, and also brings clocks that are not supported with the ASM VDSO. On an 8xx at 132 MHz, vdsotest with the ASM VDSO: gettimeofday: vdso: 828 nsec/call clock-getres-realtime-coarse: vdso: 391 nsec/call clock-gettime-realtime-coarse: vdso: 614 nsec/call clock-getres-realtime: vdso: 460 nsec/call clock-gettime-realtime: vdso: 876 nsec/call clock-getres-monotonic-coarse: vdso: 399 nsec/call clock-gettime-monotonic-coarse: vdso: 691 nsec/call clock-getres-monotonic: vdso: 460 nsec/call clock-gettime-monotonic: vdso: 1026 nsec/call On an 8xx at 132 MHz, vdsotest with the C VDSO: gettimeofday: vdso: 955 nsec/call clock-getres-realtime-coarse: vdso: 545 nsec/call clock-gettime-realtime-coarse: vdso: 592 nsec/call clock-getres-realtime: vdso: 545 nsec/call clock-gettime-realtime: vdso: 941 nsec/call clock-getres-monotonic-coarse: vdso: 545 nsec/call clock-gettime-monotonic-coarse: vdso: 591 nsec/call clock-getres-monotonic: vdso: 545 nsec/call clock-gettime-monotonic: vdso: 940 nsec/call It is even better for gettime with monotonic clocks. Unsupported clocks with ASM VDSO: clock-gettime-boottime: vdso: 3851 nsec/call clock-gettime-tai: vdso: 3852 nsec/call clock-gettime-monotonic-raw: vdso: 3396 nsec/call Same clocks with C VDSO: clock-gettime-tai: vdso: 941 nsec/call clock-gettime-monotonic-raw: vdso: 1001 nsec/call clock-gettime-monotonic-coarse: vdso: 591 nsec/call On an 8321E at 333 MHz, vdsotest with the ASM VDSO: gettimeofday: vdso: 220 nsec/call clock-getres-realtime-coarse: vdso: 102 nsec/call clock-gettime-realtime-coarse: vdso: 178 nsec/call clock-getres-realtime: vdso: 129 nsec/call clock-gettime-realtime: vdso: 235 nsec/call clock-getres-monotonic-coarse: vdso: 105 nsec/call clock-gettime-monotonic-coarse: vdso: 208 nsec/call clock-getres-monotonic: vdso: 129 nsec/call clock-gettime-monotonic: vdso: 274 nsec/call On an 8321E at 333 MHz, vdsotest with the C VDSO: gettimeofday: vdso: 272 nsec/call clock-getres-realtime-coarse: vdso: 160 nsec/call clock-gettime-realtime-coarse: vdso: 184 nsec/call clock-getres-realtime: vdso: 166 nsec/call clock-gettime-realtime: vdso: 281 nsec/call clock-getres-monotonic-coarse: vdso: 160 nsec/call clock-gettime-monotonic-coarse: vdso: 184 nsec/call clock-getres-monotonic: vdso: 169 nsec/call clock-gettime-monotonic: vdso: 275 nsec/call On a Power9 Nimbus DD2.2 at 3.8GHz, with the ASM VDSO: clock-gettime-monotonic: vdso: 35 nsec/call clock-getres-monotonic: vdso: 16 nsec/call clock-gettime-monotonic-coarse: vdso: 18 nsec/call clock-getres-monotonic-coarse: vdso: 522 nsec/call clock-gettime-monotonic-raw: vdso: 598 nsec/call clock-getres-monotonic-raw: vdso: 520 nsec/call clock-gettime-realtime: vdso: 34 nsec/call clock-getres-realtime: vdso: 16 nsec/call clock-gettime-realtime-coarse: vdso: 18 nsec/call clock-getres-realtime-coarse: vdso: 517 nsec/call getcpu: vdso: 8 nsec/call gettimeofday: vdso: 25 nsec/call And with the C VDSO: clock-gettime-monotonic: vdso: 37 nsec/call clock-getres-monotonic: vdso: 20 nsec/call clock-gettime-monotonic-coarse: vdso: 21 nsec/call clock-getres-monotonic-coarse: vdso: 19 nsec/call clock-gettime-monotonic-raw: vdso: 38 nsec/call clock-getres-monotonic-raw: vdso: 20 nsec/call clock-gettime-realtime: vdso: 37 nsec/call clock-getres-realtime: vdso: 20 nsec/call clock-gettime-realtime-coarse: vdso: 20 nsec/call clock-getres-realtime-coarse: vdso: 19 nsec/call getcpu: vdso: 8 nsec/call gettimeofday: vdso: 28 nsec/call Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126131006.2431205-8-mpe@ellerman.id.au
|
#
8a6a5920 |
|
01-Dec-2020 |
Frederic Weisbecker <frederic@kernel.org> |
sched/vtime: Consolidate IRQ time accounting The 3 architectures implementing CONFIG_VIRT_CPU_ACCOUNTING_NATIVE all have their own version of irq time accounting that dispatch the cputime to the appropriate index: hardirq, softirq, system, idle, guest... from an all-in-one function. Instead of having these ad-hoc versions, move the cputime destination dispatch decision to the core code and leave only the actual per-index cputime accounting to the architecture. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201202115732.27827-4-frederic@kernel.org
|
#
942e8911 |
|
30-Sep-2020 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/time: Avoid using get_tbl() and get_tbu() internally get_tbl() is confusing as it returns the content of TBL register on PPC32 but the concatenation of TBL and TBU on PPC64. Use mftb() instead. Do the same with get_tbu() for consistency allthough it's name is less confusing. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/41573406a4eab98838decaa91649086fef1e6119.1601556145.git.christophe.leroy@csgroup.eu
|
#
6601ec1c |
|
29-Sep-2020 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Remove get_tb_or_rtc() 601 is gone, get_tb_or_rtc() is equivalent to get_tb(). Replace the former by the later. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3e8a13ee83418630c753c30cb722ae682d5b2d39.1601362098.git.christophe.leroy@csgroup.eu
|
#
a4c5a355 |
|
29-Sep-2020 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Remove __USE_RTC() Now that PowerPC 601 is gone, __USE_RTC() is never true. Remove it. That also leads to removing get_rtc() and get_rtcl() Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4757e1ed21fe1968c761ae081d1f3d790a9673f8.1601362098.git.christophe.leroy@csgroup.eu
|
#
d6bdceb6 |
|
29-May-2020 |
Peter Zijlstra <peterz@infradead.org> |
powerpc64: Break asm/percpu.h vs spinlock_types.h dependency In order to use <asm/percpu.h> in lockdep.h, we need to make sure asm/percpu.h does not itself depend on lockdep. The below seems to make that so and builds powerpc64-defconfig + PROVE_LOCKING. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> https://lkml.kernel.org/r/20200623083721.336906073@infradead.org
|
#
abc3fce7 |
|
02-Apr-2020 |
Nicholas Piggin <npiggin@gmail.com> |
Revert "powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled" This reverts commit ebb37cf3ffd39fdb6ec5b07111f8bb2f11d92c5f. That commit does not play well with soft-masked irq state manipulations in idle, interrupt replay, and possibly others due to tracing code sometimes using irq_work_queue (e.g., in trace_hardirqs_on()). That can cause PACA_IRQ_DEC to become set when it is not expected, and be ignored or cleared or cause warnings. The net result seems to be missing an irq_work until the next timer interrupt in the worst case which is usually not going to be noticed, however it could be a long time if the tick is disabled, which is against the spirit of irq_work and might cause real problems. The idea is still solid, but it would need more work. It's not really clear if it would be worth added complexity, so revert this for now (not a straight revert, but replace with a comment explaining why we might see interrupts happening, and gives git blame something to find). Fixes: ebb37cf3ffd3 ("powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200402120401.1115883-1-npiggin@gmail.com
|
#
60083063 |
|
13-Feb-2020 |
Geert Uytterhoeven <geert+renesas@glider.be> |
powerpc/time: Replace <linux/clk-provider.h> by <linux/of_clk.h> The PowerPC time code is not a clock provider, and just needs to call of_clk_init(). Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>. Remove the #ifdef protecting the of_clk_init() call, as a stub is available for the !CONFIG_COMMON_CLK case. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200213083804.24315-1-geert+renesas@glider.be
|
#
2babd6ea |
|
25-Feb-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/exception: Avoid touching the stack in hdecrementer The hdec interrupt handler is reported to sometimes fire in Linux if KVM leaves it pending after a guest exists. This is harmless, so there is a no-op handler for it. The interrupt handler currently uses the regular kernel stack. Change this to avoid touching the stack entirely. This should be the last place where the regular Linux stack can be accessed with asynchronous interrupts (including PMI) soft-masked. It might be possible to take advantage of this invariant, e.g., to context switch the kernel stack SLB entry without clearing MSR[EE]. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-17-npiggin@gmail.com
|
#
55226345 |
|
02-Dec-2019 |
Vincenzo Frascino <vincenzo.frascino@arm.com> |
powerpc: Fix vDSO clock_getres() clock_getres in the vDSO library has to preserve the same behaviour of posix_get_hrtimer_res(). In particular, posix_get_hrtimer_res() does: sec = 0; ns = hrtimer_resolution; and hrtimer_resolution depends on the enablement of the high resolution timers that can happen either at compile or at run time. Fix the powerpc vdso implementation of clock_getres keeping a copy of hrtimer_resolution in vdso data and using that directly. Fixes: a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel") Cc: stable@vger.kernel.org Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Shuah Khan <skhan@linuxfoundation.org> [chleroy: changed CLOCK_REALTIME_RES to CLOCK_HRTIMER_RES] Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a55eca3a5e85233838c2349783bcb5164dae1d09.1575273217.git.christophe.leroy@c-s.fr
|
#
176ed98c |
|
27-Oct-2019 |
Arnd Bergmann <arnd@arndb.de> |
y2038: vdso: powerpc: avoid timespec references As a preparation to stop using 'struct timespec' in the kernel, change the powerpc vdso implementation: - split up the vdso data definition to have equivalent members for seconds and nanoseconds instead of an xtime structure - use timespec64 as an intermediate for the xtime update - change the asm-offsets definition to be based the appropriate fixed-length types This is only a temporary fix for changing the types, in order to actually support a 64-bit safe vdso32 version of clock_gettime(), the entire powerpc vdso should be replaced with the generic lib/vdso/ implementation. If that happens first, this patch becomes obsolete. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
#
eb8e20f8 |
|
13-Oct-2019 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/pseries: Mark accumulate_stolen_time() as notrace accumulate_stolen_time() is called prior to interrupt state being reconciled, which can trip the warning in arch_local_irq_restore(): WARNING: CPU: 5 PID: 1017 at arch/powerpc/kernel/irq.c:258 .arch_local_irq_restore+0x9c/0x130 ... NIP .arch_local_irq_restore+0x9c/0x130 LR .rb_start_commit+0x38/0x80 Call Trace: .ring_buffer_lock_reserve+0xe4/0x620 .trace_function+0x44/0x210 .function_trace_call+0x148/0x170 .ftrace_ops_no_ops+0x180/0x1d0 ftrace_call+0x4/0x8 .accumulate_stolen_time+0x1c/0xb0 decrementer_common+0x124/0x160 For now just mark it as notrace. We may change the ordering to call it after interrupt state has been reconciled, but that is a larger change. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191024055932.27940-1-mpe@ellerman.id.au
|
#
f83eeb1a |
|
03-Oct-2019 |
Frederic Weisbecker <frederic@kernel.org> |
sched/cputime: Rename vtime_account_system() to vtime_account_kernel() vtime_account_system() decides if we need to account the time to the system (__vtime_account_system()) or to the guest (vtime_account_guest()). So this function is a misnomer as we are on a higher level than "system". All we know when we call that function is that we are accounting kernel cputime. Whether it belongs to guest or system time is a lower level detail. Rename this function to vtime_account_kernel(). This will clarify things and avoid too many underscored vtime_account_system() versions. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191003161745.28464-2-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
2874c5fd |
|
27-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
de269129 |
|
04-Mar-2019 |
Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> |
powerpc/hmi: Fix kernel hang when TB is in error state. On TOD/TB errors timebase register stops/freezes until HMI error recovery gets TOD/TB back into running state. On successful recovery, TB starts running again and udelay() that relies on TB value continues to function properly. But in case when HMI fails to recover from TOD/TB errors, the TB register stay freezed. With TB not running the __delay() function keeps looping and never return. If __delay() is called while in panic path then system hangs and never reboots after panic. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
6917735e |
|
23-Mar-2019 |
Jagadeesh Pagadala <jagdsh.linux@gmail.com> |
powerpc: Remove duplicate headers Remove duplicate headers inclusions. Signed-off-by: Jagadeesh Pagadala <jagdsh.linux@gmail.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
75f8a375 |
|
28-Jan-2019 |
Brajeswar Ghosh <brajeswar.linux@gmail.com> |
powerpc/kernel/time: Remove duplicate header Remove linux/rtc.h which is included more than once Signed-off-by: Brajeswar Ghosh <brajeswar.linux@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
51eeef9e |
|
02-Aug-2018 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc/time: no steal_time when CONFIG_PPC_SPLPAR is not selected If CONFIG_PPC_SPLPAR is not selected, steal_time will always be NUL, so accounting it is pointless Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
abcff86d |
|
02-Aug-2018 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc/time: Only set CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC64 scaled cputime is only meaningfull when the processor has SPURR and/or PURR, which means only on PPC64. Removing it on PPC32 significantly reduces the size of vtime_account_system() and vtime_account_idle() on an 8xx: Before: 00000000 l F .text 000000a8 vtime_delta 00000280 g F .text 0000010c vtime_account_system 0000038c g F .text 00000048 vtime_account_idle After: (vtime_delta gets inlined inside the two functions) 000001d8 g F .text 000000a0 vtime_account_system 00000278 g F .text 00000038 vtime_account_idle In terms of performance, we also get approximatly 7% improvement on task switch. The following small benchmark app is run with perf stat: void *thread(void *arg) { int i; for (i = 0; i < atoi((char*)arg); i++) pthread_yield(); } int main(int argc, char **argv) { pthread_t th1, th2; pthread_create(&th1, NULL, thread, argv[1]); pthread_create(&th2, NULL, thread, argv[1]); pthread_join(th1, NULL); pthread_join(th2, NULL); return 0; } Before the patch: Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs): 8228.476465 task-clock (msec) # 0.954 CPUs utilized ( +- 0.23% ) 200004 context-switches # 0.024 M/sec ( +- 0.00% ) After the patch: Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs): 7649.070444 task-clock (msec) # 0.955 CPUs utilized ( +- 0.27% ) 200004 context-switches # 0.026 M/sec ( +- 0.00% ) Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
b38a181c |
|
02-Aug-2018 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc/time: isolate scaled cputime accounting in dedicated functions. scaled cputime is only meaningfull when the processor has SPURR and/or PURR, which means only on PPC64. In preparation of the following patch that will remove CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC32, this patch moves all scaled cputing accounting logic into dedicated functions. This patch doesn't change any functionality. It's only code reorganisation. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
b4d16ab5 |
|
17-Oct-2018 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/time: Fix clockevent_decrementer initalisation for PR KVM In the recent commit 8b78fdb045de ("powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer") we changed the way we initialise the decrementer clockevent(s). We no longer initialise the mult & shift values of decrementer_clockevent itself. This has the effect of breaking PR KVM, because it uses those values in kvmppc_emulate_dec(). The symptom is guest kernels spin forever mid-way through boot. For now fix it by assigning back to decrementer_clockevent the mult and shift values. Fixes: 8b78fdb045de ("powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
81759360 |
|
01-Oct-2018 |
Anton Blanchard <anton@ozlabs.org> |
powerpc/time: Add set_state_oneshot_stopped decrementer callback If CONFIG_PPC_WATCHDOG is enabled we always cap the decrementer to 0x7fffffff: if (IS_ENABLED(CONFIG_PPC_WATCHDOG)) set_dec(0x7fffffff); else set_dec(decrementer_max); If there are no future events, we don't reprogram the decrementer after this and we end up with 0x7fffffff even on a large decrementer capable system. As suggested by Nick, add a set_state_oneshot_stopped callback so we program the decrementer with decrementer_max if there are no future events. Signed-off-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
8b78fdb0 |
|
01-Oct-2018 |
Anton Blanchard <anton@ozlabs.org> |
powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer We currently cap the decrementer clockevent at 4 seconds, even on systems with large decrementer support. Fix this by converting the code to use clockevents_register_device() which calculates the upper bound based on the max_delta passed in. Signed-off-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
60f1d289 |
|
29-May-2018 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc/time: inline arch_vtime_task_switch() arch_vtime_task_switch() is a small function which is called only from vtime_common_task_switch(), so it is worth inlining Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
34efabe4 |
|
23-Apr-2018 |
Arnd Bergmann <arnd@arndb.de> |
powerpc: remove unused to_tm() helper to_tm() is now completely unused, the only reference being in the _dump_time() helper that is also unused. This removes both, leaving the rest of the powerpc RTC code y2038 safe to as far as the hardware supports. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
5235afa8 |
|
23-Apr-2018 |
Arnd Bergmann <arnd@arndb.de> |
powerpc: use time64_t in update_persistent_clock update_persistent_clock() is deprecated because it suffers from overflow in 2038 on 32-bit architectures. This changes powerpc to use the update_persistent_clock64() replacement, and to pass down 64-bit timestamps consistently. This is now simpler, as we no longer have to worry about the offset numbers in tm_year and tm_mon that are different between the Linux conventions and RTAS. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
5bfd6435 |
|
23-Apr-2018 |
Arnd Bergmann <arnd@arndb.de> |
powerpc: use time64_t in read_persistent_clock Looking through the remaining users of the deprecated mktime() function, I found the powerpc rtc handlers, which use it in place of rtc_tm_to_time64(). To clean this up, I'm changing over the read_persistent_clock() function to the read_persistent_clock64() variant, and change all the platform specific handlers along with it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
e360cd37 |
|
04-May-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/time: account broadcast timer event interrupts separately These are not local timer interrupts but IPIs. It's good to be able to see how timer offloading is behaving, so split these out into their own category. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
bc907113 |
|
04-May-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: move timer broadcast code under GENERIC_CLOCKEVENTS_BROADCAST ifdef Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
a7cba02d |
|
04-May-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: allow soft-NMI watchdog to cover timer interrupts with large decrementers Large decrementers (e.g., POWER9) can take a very long time to wrap, so when the timer iterrupt handler sets the decrementer to max so as to avoid taking another decrementer interrupt when hard enabling interrupts before running timers, it effectively disables the soft NMI coverage for timer interrupts. Fix this by using the traditional 31-bit value instead, which wraps after a few seconds. masked interrupt code does the same thing, and in normal operation neither of these paths would ever wrap even the 31 bit value. Note: the SMP watchdog should catch timer interrupt lockups, but it is preferable for the local soft-NMI to catch them, mainly to avoid the IPI. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
3f984620 |
|
04-May-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: generic clockevents broadcast receiver call tick_receive_broadcast The broadcast tick recipient can call tick_receive_broadcast rather than re-running the full timer interrupt. It does not have to check for the next event time, because the sender already determined the timer has expired. It does not have to test irq_work_pending, because that's a direct decrementer interrupt and does not go through the clock events subsystem. And it does not have to read PURR because that was removed with the previous patch. This results in no code size change, but both the decrementer and broadcast path lengths are reduced. Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
3d3a6021 |
|
04-May-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/pseries: lparcfg calculate PURR on demand For SPLPAR, lparcfg provides a sum of PURR registers for all CPUs. Currently this is done by reading PURR in context switch and timer interrupt, and storing that into a per-CPU variable. These are summed to provide the value. This does not work with all timer schemes (e.g., NO_HZ_FULL), and it is sub-optimal for performance because it reads the PURR register on every context switch, although that's been difficult to distinguish from noise in the contxt_switch microbenchmark. This patch implements the sum by calling a function on each CPU, to read and add PURR values of each CPU. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
ebb37cf3 |
|
04-May-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled irq_work_raise should not cause a decrementer exception unless it is called from NMI context. Doing so often just results in an immediate masked decrementer interrupt: <...>-550 90d... 4us : update_curr_rt <-dequeue_task_rt <...>-550 90d... 5us : dbs_update_util_handler <-update_curr_rt <...>-550 90d... 6us : arch_irq_work_raise <-irq_work_queue <...>-550 90d... 7us : soft_nmi_interrupt <-soft_nmi_common <...>-550 90d... 7us : printk_nmi_enter <-soft_nmi_interrupt <...>-550 90d.Z. 8us : rcu_nmi_enter <-soft_nmi_interrupt <...>-550 90d.Z. 9us : rcu_nmi_exit <-soft_nmi_interrupt <...>-550 90d... 9us : printk_nmi_exit <-soft_nmi_interrupt <...>-550 90d... 10us : cpuacct_charge <-update_curr_rt The soft_nmi_interrupt here is the call into the watchdog, due to the decrementer interrupt firing with irqs soft-disabled. This is harmless, but sub-optimal. When it's not called from NMI context or with interrupts enabled, mark the decrementer pending in the irq_happened mask directly, rather than having the masked decrementer interupt handler do it. This will be replayed at the next local_irq_enable. See the comment for details. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
a6201da3 |
|
02-Apr-2018 |
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> |
powerpc: Fix oops due to bad access of lppaca on bare metal Commit 8e0b634b1327 ("powerpc/64s: Do not allocate lppaca if we are not virtualized") removed allocation of lppaca on bare metal platforms. But with CONFIG_PPC_SPLPAR enabled, we still access the lppaca on bare metal in some code paths. Fix this but adding runtime checks for SPLPAR (shared processor LPAR). Fixes: 8e0b634b1327 ("powerpc/64s: Do not allocate lppaca if we are not virtualized") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
890ae797 |
|
21-Feb-2018 |
Alexandre Belloni <alexandre.belloni@bootlin.com> |
powerpc/time: stop validating rtc_time in .read_time The RTC core is always calling rtc_valid_tm after the read_time callback. It is not necessary to call it just before returning from the callback. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
4e26bc4a |
|
19-Dec-2017 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/64: Rename soft_enabled to irq_soft_mask Rename the paca->soft_enabled to paca->irq_soft_mask as it is no longer used as a flag for interrupt state, but a mask. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
e0b5687b |
|
19-Dec-2017 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/64: Implement and use soft_enabled_return API Add a new wrapper function, soft_enabled_return(), added to return paca->soft_enabled value. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
0b63acf4 |
|
19-Dec-2017 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/64: Move set_soft_enabled() and rename Move set_soft_enabled() from powerpc/kernel/irq.c to asm/hw_irq.c, to encourage updates to paca->soft_enabled done via these access function. Add "memory" clobber to hint compiler since paca->soft_enabled memory is the target here. Renaming it as soft_enabled_set() will make namespaces works better as prefix than a postfix when new soft_enabled manipulation functions are introduced. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
c2e480ba |
|
19-Dec-2017 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/64: Add #defines for paca->soft_enabled flags Two #defines IRQS_ENABLED and IRQS_DISABLED are added to be used when updating paca->soft_enabled. Replace the hardcoded values used when updating paca->soft_enabled with IRQ_(EN|DIS)ABLED #define. No logic change. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
4e287e65 |
|
06-Jun-2017 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: use spin loop primitives in some functions Use the different spin loop primitives in some simple powerpc spin loops, including those which will spin as a common case. This will help to test the spin loop primitives before more conversions are done. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add some includes of <linux/processor.h>] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
d4cfb113 |
|
27-May-2017 |
Paul Mackerras <paulus@ozlabs.org> |
powerpc: Convert VDSO update function to use new update_vsyscall interface This converts the powerpc VDSO time update function to use the new interface introduced in commit 576094b7f0aa ("time: Introduce new GENERIC_TIME_VSYSCALL", 2012-09-11). Where the old interface gave us the time as of the last update in seconds and whole nanoseconds, with the new interface we get the nanoseconds part effectively in a binary fixed-point format with tk->tkr_mono.shift bits to the right of the binary point. With the old interface, the fractional nanoseconds got truncated, meaning that the value returned by the VDSO clock_gettime function would have about 1ns of jitter in it compared to the value computed by the generic timekeeping code in the kernel. The powerpc VDSO time functions (clock_gettime and gettimeofday) already work in units of 2^-32 seconds, or 0.23283 ns, because that makes it simple to split the result into seconds and fractional seconds, and represent the fractional seconds in either microseconds or nanoseconds. This is good enough accuracy for now, so this patch avoids changing how the VDSO works or the interface in the VDSO data page. This patch converts the powerpc update_vsyscall_old to be called update_vsyscall and use the new interface. We convert the fractional second to units of 2^-32 seconds without truncating to whole nanoseconds. (There is still a conversion to whole nanoseconds for any legacy users of the vdso_data/systemcfg stamp_xtime field.) In addition, this improves the accuracy of the computation of tb_to_xs for those systems with high-frequency timebase clocks (>= 268.5 MHz) by doing the right shift in two parts, one before the multiplication and one after, rather than doing the right shift before the multiplication. (We can't do all of the right shift after the multiplication unless we use 128-bit arithmetic.) Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Acked-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
6b847d79 |
|
20-Jun-2017 |
Santosh Sivaraj <santosh@fossix.org> |
powerpc/time: Fix tracing in time.c Since trace_clock is in a different file and already marked with notrace, enable tracing in time.c by removing it from the disabled list in Makefile. Also annotate clocksource read functions and sched_clock with notrace. Testing: Timer and ftrace selftests run with different trace clocks. Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Santosh Sivaraj <santosh@fossix.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
6e2f03e2 |
|
19-May-2017 |
Ivan Mikhaylov <ivan@de.ibm.com> |
powerpc/[booke|4xx]: Don't clobber TCR[WP] when setting TCR[DIE] Prevent a kernel panic caused by unintentionally clearing TCR watchdog bits. At this point in the kernel boot, the watchdog may have already been enabled by u-boot. The original code's attempt to write to the TCR register results in an inadvertent clearing of the watchdog configuration bits, causing the 476 to reset. Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
115631c3 |
|
30-Mar-2017 |
Nicolai Stange <nicstange@gmail.com> |
powerpc/time: Set ->min_delta_ticks and ->max_delta_ticks In preparation for making the clockevents core NTP correction aware, all clockevent device drivers must set ->min_delta_ticks and ->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a clockevent device's rate is going to change dynamically and thus, the ratio of ns to ticks ceases to stay invariant. Make the powerpc arch's clockevent driver initialize these fields properly. This patch alone doesn't introduce any change in functionality as the clockevents core still looks exclusively at the (untouched) ->min_delta_ns and ->max_delta_ns. As soon as this has changed, a followup patch will purge the initialization of ->min_delta_ns and ->max_delta_ns from this driver. Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Oliver O'Halloran <oohall@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
32ef5517 |
|
05-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h> Introduce a trivial, mostly empty <linux/sched/cputime.h> header to prepare for the moving of cputime functionality out of sched.h. Update all code that relies on these facilities. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
e6017571 |
|
01-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which will have to be picked up from other headers and .c files. Create a trivial placeholder <linux/sched/clock.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
9f3768e0 |
|
21-Feb-2017 |
Frédéric Weisbecker <fweisbec@gmail.com> |
powerpc: Remove leftover cputime_to_nsecs call causing build error This type conversion is a leftover that got ignored during the kcpustat conversion to nanosecs, resulting in build breakage with config having CONFIG_NO_HZ_FULL=y. arch/powerpc/kernel/time.c: In function 'running_clock': arch/powerpc/kernel/time.c:712:2: error: implicit declaration of function 'cputime_to_nsecs' [-Werror=implicit-function-declaration] return local_clock() - cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]); All we need is to remove it. Fixes: e7f340ca9c07 ("powerpc, sched/cputime: Remove unused cputime definitions") Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
e7f340ca |
|
30-Jan-2017 |
Frederic Weisbecker <fweisbec@gmail.com> |
powerpc, sched/cputime: Remove unused cputime definitions Since the core doesn't deal with cputime_t anymore, most of these APIs have been left unused. Lets remove these. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-33-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
fb8b049c |
|
30-Jan-2017 |
Frederic Weisbecker <fweisbec@gmail.com> |
sched/cputime: Push time to account_system_time() in nsecs This is one more step toward converting cputime accounting to pure nsecs. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-25-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
18b43a9b |
|
30-Jan-2017 |
Frederic Weisbecker <fweisbec@gmail.com> |
sched/cputime: Push time to account_idle_time() in nsecs This is one more step toward converting cputime accounting to pure nsecs. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-24-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
be9095ed |
|
30-Jan-2017 |
Frederic Weisbecker <fweisbec@gmail.com> |
sched/cputime: Push time to account_steal_time() in nsecs This is one more step toward converting cputime accounting to pure nsecs. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-23-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
23244a5c |
|
30-Jan-2017 |
Frederic Weisbecker <fweisbec@gmail.com> |
sched/cputime: Push time to account_user_time() in nsecs This is one more step toward converting cputime accounting to pure nsecs. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-22-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
5613fda9 |
|
30-Jan-2017 |
Frederic Weisbecker <fweisbec@gmail.com> |
sched/cputime: Convert task/group cputime to nsecs Now that most cputime readers use the transition API which return the task cputime in old style cputime_t, we can safely store the cputime in nsecs. This will eventually make cputime statistics less opaque and more granular. Back and forth convertions between cputime_t and nsecs in order to deal with cputime_t random granularity won't be needed anymore. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-8-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
c8d7dabf |
|
05-Jan-2017 |
Frederic Weisbecker <fweisbec@gmail.com> |
sched/cputime: Rename vtime_account_user() to vtime_flush() CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y used to accumulate user time and account it on ticks and context switches only through the vtime_account_user() function. Now this model has been generalized on the 3 archs for all kind of cputime (system, irq, ...) and all the cputime flushing happens under vtime_account_user(). So let's rename this function to better reflect its new role. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1483636310-6557-11-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
a19ff1a2 |
|
05-Jan-2017 |
Frederic Weisbecker <fweisbec@gmail.com> |
sched/cputime, powerpc/vtime: Accumulate cputime and account only on tick/task switch Currently CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y accounts the cputime on any context boundary: irq entry/exit, guest entry/exit, context switch, etc... Calling functions such as account_system_time(), account_user_time() and such can be costly, especially if they are called on many fastpath such as twice per IRQ. Those functions do more than just accounting to kcpustat and task cputime. Depending on the config, some subsystems can perform unpleasant multiplications and divisions, among other things. So lets accumulate the cputime instead and delay the accounting on ticks and context switches only. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1483636310-6557-8-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
f828c3d0 |
|
05-Jan-2017 |
Frederic Weisbecker <fweisbec@gmail.com> |
sched/cputime, powerpc: Migrate stolen_time field to the accounting structure That in order to gather all cputime accumulation to the same place. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1483636310-6557-7-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
8c8b73c4 |
|
05-Jan-2017 |
Frederic Weisbecker <fweisbec@gmail.com> |
sched/cputime, powerpc: Prepare accounting structure for cputime flush on tick In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay cputime accounting to the tick, provide finegrained accumulators to powerpc in order to store the cputime until flushing. While at it, normalize the name of several fields according to common cputime naming. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1483636310-6557-6-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
90d08ba2 |
|
05-Jan-2017 |
Frederic Weisbecker <fweisbec@gmail.com> |
sched/cputime, powerpc32: Fix stale scaled stime on context switch On context switch with powerpc32, the cputime is accumulated in the thread_info struct. So the switching-in task must move forward its start time snapshot to the current time in order to later compute the delta spent in system mode. This is what we do for the normal cputime by initializing the starttime field to the value of the previous task's starttime which got freshly updated. But we are missing the update of the scaled cputime start time. As a result we may be accounting too much scaled cputime later. Fix this by initializing the scaled cputime the same way we do for normal cputime. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1483636310-6557-2-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
a5a1d1c2 |
|
21-Dec-2016 |
Thomas Gleixner <tglx@linutronix.de> |
clocksource: Use a plain u64 instead of cycle_t There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>
|
#
7c0f6ba6 |
|
24-Dec-2016 |
Linus Torvalds <torvalds@linux-foundation.org> |
Replace <asm/uaccess.h> with <linux/uaccess.h> globally This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
40565b5a |
|
14-Nov-2016 |
Stanislaw Gruszka <sgruszka@redhat.com> |
sched/cputime, powerpc, s390: Make scaled cputime arch specific Only s390 and powerpc have hardware facilities allowing to measure cputimes scaled by frequency. On all other architectures utimescaled/stimescaled are equal to utime/stime (however they are accounted separately). Remove {u,s}timescaled accounting on all architectures except powerpc and s390, where those values are explicitly accounted in the proper places. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161031162143.GB12646@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
7008eb99 |
|
14-Nov-2016 |
Stanislaw Gruszka <sgruszka@redhat.com> |
sched/cputime, powerpc: Remove cputime_last_delta global variable Since commit: cf9efce0ce313 ("powerpc: Account time using timebase rather than PURR") cputime_last_delta is not initialized to other value than 0, hence it's not used except zero check and cputime_to_scaled() just returns the argument. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1479175612-14718-2-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
0545d543 |
|
05-Sep-2016 |
Daniel Axtens <dja@axtens.net> |
powerpc/sparse: Add more assembler prototypes Another set of things that are only called from assembler and so need prototypes to keep sparse happy. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
9445aa1a |
|
13-Jan-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
ppc: move exports to definitions Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
c223c903 |
|
17-May-2016 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc32: provide VIRT_CPU_ACCOUNTING This patch provides VIRT_CPU_ACCOUTING to PPC32 architecture. PPC32 doesn't have the PACA structure, so we use the task_info structure to store the accounting data. In order to reuse on PPC32 the PPC64 functions, all u64 data has been replaced by 'unsigned long' so that it is u32 on PPC32 and u64 on PPC64 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
|
#
79901024 |
|
01-Jul-2016 |
Oliver O'Halloran <oohall@gmail.com> |
powerpc/timer: Large Decrementer support Power ISAv3 adds a large decrementer (LD) mode which increases the size of the decrementer register. The size of the enlarged decrementer register is between 32 and 64 bits with the exact size being dependent on the implementation. When in LD mode, reads are sign extended to 64 bits and a decrementer exception is raised when the high bit is set (i.e the value goes below zero). Writes however are truncated to the physical register width so some care needs to be taken to ensure that the high bit is not set when reloading the decrementer. This patch adds support for using the LD inside the host kernel on processors that support it. When LD mode is supported firmware will supply the ibm,dec-bits property for CPU nodes to allow the kernel to determine the maximum decrementer value. Enabling LD mode is a hypervisor privileged operation so the kernel can only enable it manually when running in hypervisor mode. Guests that support LD mode can request it using the "ibm,client-architecture-support" firmware call (not implemented in this patch) or some other platform specific method. If this property is not supplied then the traditional decrementer width of 32 bit is assumed and LD mode will not be enabled. This patch was based on initial work by Jack Miller. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
169047f4 |
|
30-May-2016 |
Arnd Bergmann <arnd@arndb.de> |
rtc: powerpc: provide rtc_class_ops directly The rtc-generic driver provides an architecture specific wrapper on top of the generic rtc_class_ops abstraction, and powerpc has another abstraction on top, which is a bit silly. This changes the powerpc rtc-generic device to provide its rtc_class_ops directly, to reduce the number of layers by one. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
|
#
7f92bc56 |
|
05-Jan-2016 |
Daniel Axtens <dja@axtens.net> |
powerpc: sparse: Include headers for __weak symbols Sometimes when sparse warns about undefined symbols, it isn't because they should have 'static' added, it's because they're overriding __weak symbols defined elsewhere, and the header has been missed. Fix a few of them by adding appropriate headers. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
00b912b0 |
|
15-Dec-2015 |
Daniel Axtens <dja@axtens.net> |
powerpc: Remove broken GregorianDay() GregorianDay() is supposed to calculate the day of the week (tm->tm_wday) for a given day/month/year. In that calcuation it indexed into an array called MonthOffset using tm->tm_mon-1. However tm_mon is zero-based, not one-based, so this is off-by-one. It also means that every January, GregoiranDay() will access element -1 of the MonthOffset array. It also doesn't appear to be a correct algorithm either: see in contrast kernel/time/timeconv.c's time_to_tm function. It's been broken forever, which suggests no-one in userland uses this. It looks like no-one in the kernel uses tm->tm_wday either (see e.g. drivers/rtc/rtc-ds1305.c:319). tm->tm_wday is conventionally set to -1 when not available in hardware so we can simply set it to -1 and drop the function. (There are over a dozen other drivers in drivers/rtc that do this.) Found using UBSAN. Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrew Morton <akpm@linux-foundation.org> # as an example of what UBSan finds. Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Cc: rtc-linux@googlegroups.com Signed-off-by: Daniel Axtens <dja@axtens.net> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
37a13e78 |
|
16-Jul-2015 |
Viresh Kumar <viresh.kumar@linaro.org> |
powerpc/time: Migrate to new 'set-state' interface Migrate powerpc driver to the new 'set-state' interface provided by clockevents core, the earlier 'set-mode' interface is marked obsolete now. This also enables us to implement callbacks for new states of clockevent devices, for example: ONESHOT_STOPPED. We weren't doing anything in ->set_mode(ONSHOT) and so set_state_oneshot() isn't implemented. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
#
8f6b9512 |
|
01-May-2015 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
powerpc: use device_initcall for registering rtc devices Currently these two RTC devices are in core platform code where it is not possible for them to be modular. It will never be modular, so using module_init as an alias for __initcall can be somewhat misleading. Fix this up now, so that we can relocate module_init from init.h into module.h in the future. If we don't do this, we'd have to add module.h to obviously non-modular code, and that would be a worse thing. Note that direct use of __initcall is discouraged, vs. one of the priority categorized subgroups. As __initcall gets mapped onto device_initcall, our use of device_initcall directly in this change means that the runtime impact is zero -- they will remain at level 6 in initcall ordering. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Geoff Levand <geoff@infradead.org> Acked-by: Geoff Levand <geoff@infradead.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
b6c295df |
|
27-Mar-2015 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S HV: Accumulate timing information for real-mode code This reads the timebase at various points in the real-mode guest entry/exit code and uses that to accumulate total, minimum and maximum time spent in those parts of the code. Currently these times are accumulated per vcpu in 5 parts of the code: * rm_entry - time taken from the start of kvmppc_hv_entry() until just before entering the guest. * rm_intr - time from when we take a hypervisor interrupt in the guest until we either re-enter the guest or decide to exit to the host. This includes time spent handling hcalls in real mode. * rm_exit - time from when we decide to exit the guest until the return from kvmppc_hv_entry(). * guest - time spend in the guest * cede - time spent napping in real mode due to an H_CEDE hcall while other threads in the same vcore are active. These times are exposed in debugfs in a directory per vcpu that contains a file called "timings". This file contains one line for each of the 5 timings above, with the name followed by a colon and 4 numbers, which are the count (number of times the code has been executed), the total time, the minimum time, and the maximum time, all in nanoseconds. The overhead of the extra code amounts to about 30ns for an hcall that is handled in real mode (e.g. H_SET_DABR), which is about 25%. Since production environments may not wish to incur this overhead, the new code is conditional on a new config symbol, CONFIG_KVM_BOOK3S_HV_EXIT_TIMING. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
4be1b297 |
|
12-Feb-2015 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc: add running_clock for powerpc to prevent spurious softlockup warnings On POWER8 virtualised kernels the VTB register can be read to have a view of time that only increases while the guest is running. This will prevent guests from seeing time jump if a guest is paused for significant amounts of time. On POWER7 and below virtualised kernels stolen time is subtracted from local_clock as a best effort approximation. This will not eliminate spurious warnings in the case of a suspended guest but may reduce the occurance in the case of softlockups due to host over commit. Bare metal kernels should avoid reading the VTB as KVM does not restore sane values when not executing, the approxmation is fine as host kernels won't observe any stolen time. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrew Jones <drjones@redhat.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: chai wen <chaiw.fnst@cn.fujitsu.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Ben Zhang <benzh@chromium.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f0d37300 |
|
03-Dec-2014 |
Kevin Hao <haokexin@gmail.com> |
powerpc: call of_clk_init() from time_init() So the boards which has COMMON_CLK enabled don't have to invoke this in its board specific file. Signed-off-by: Kevin Hao <haokexin@gmail.com> Acked-by: Scott Wood <scottwood@freescale.com> Acked-by: Michael Turquette <mturquette@linaro.org> Signed-off-by: Michael Turquette <mturquette@linaro.org>
|
#
16b1d26e |
|
14-Oct-2014 |
Neelesh Gupta <neelegup@linux.vnet.ibm.com> |
rtc/tpo: Driver to support rtc and wakeup on PowerNV platform The patch implements the OPAL rtc driver that binds with the rtc driver subsystem. The driver uses the platform device infrastructure to probe the rtc device and register it to rtc class framework. The 'wakeup' is supported depending upon the property 'has-tpo' present in the OF node. It provides a way to load the generic rtc driver in in the absence of an OPAL driver. The patch also moves the existing OPAL rtc get/set time interfaces to the new driver and exposes the necessary OPAL calls using EXPORT_SYMBOL_GPL. Test results: ------------- Host: [root@tul169p1 ~]# ls -l /sys/class/rtc/ total 0 lrwxrwxrwx 1 root root 0 Oct 14 03:07 rtc0 -> ../../devices/opal-rtc/rtc/rtc0 [root@tul169p1 ~]# cat /sys/devices/opal-rtc/rtc/rtc0/time 08:10:07 [root@tul169p1 ~]# echo `date '+%s' -d '+ 2 minutes'` > /sys/class/rtc/rtc0/wakealarm [root@tul169p1 ~]# cat /sys/class/rtc/rtc0/wakealarm 1413274345 [root@tul169p1 ~]# FSP: $ smgr mfgState standby $ rtim timeofday System time is valid: 2014/10/14 08:12:04.225115 $ smgr mfgState ipling $ CC: devicetree@vger.kernel.org CC: tglx@linutronix.de CC: rtc-linux@googlegroups.com CC: a.zummo@towertech.it Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
69111bac |
|
21-Oct-2014 |
Christoph Lameter <cl@linux.com> |
powerpc: Replace __get_cpu_var uses This still has not been merged and now powerpc is the only arch that does not have this change. Sorry about missing linuxppc-dev before. V2->V2 - Fix up to work against 3.18-rc1 __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. At the end of the patch set all uses of __get_cpu_var have been removed so the macro is removed too. The patch set includes passes over all arches as well. Once these operations are used throughout then specialized macros can be defined in non -x86 arches as well in order to optimize per cpu access by f.e. using a global register that may be set to the per cpu base. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> Signed-off-by: Christoph Lameter <cl@linux.com> [mpe: Fix build errors caused by set/or_softirq_pending(), and rework assignment in __set_breakpoint() to use memcpy().] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
e51df2c1 |
|
19-Aug-2014 |
Anton Blanchard <anton@samba.org> |
powerpc: Make a bunch of things static Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
e1802b06 |
|
19-Aug-2014 |
Anton Blanchard <anton@samba.org> |
powerpc: Move more symbol exports next to function definitions Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
23f66e2d |
|
27-Aug-2014 |
Tejun Heo <tj@kernel.org> |
Revert "powerpc: Replace __get_cpu_var uses" This reverts commit 5828f666c069af74e00db21559f1535103c9f79a due to build failure after merging with pending powerpc changes. Link: http://lkml.kernel.org/g/20140827142243.6277eaff@canb.auug.org.au Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
5828f666 |
|
16-Aug-2014 |
Christoph Lameter <cl@linux.com> |
powerpc: Replace __get_cpu_var uses __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (*this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. At the end of the patch set all uses of __get_cpu_var have been removed so the macro is removed too. The patch set includes passes over all arches as well. Once these operations are used throughout then specialized macros can be defined in non -x86 arches as well in order to optimize per cpu access by f.e. using a global register that may be set to the per cpu base. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) tj: Folded a fix patch. http://lkml.kernel.org/g/alpine.DEB.2.11.1408172143020.9652@gentwo.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
4a0e6377 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
clocksource: Get rid of cycle_last cycle_last was added to the clocksource to support the TSC validation. We moved that to the core code, so we can get rid of the extra copy. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
6e0fdf9a |
|
20-May-2014 |
Paul Bolle <pebolle@tiscali.nl> |
powerpc: fix typo 'CONFIG_PMAC' Commit b0d278b7d3ae ("powerpc/perf_event: Reduce latency of calling perf_event_do_pending") added a check for CONFIG_PMAC were a check for CONFIG_PPC_PMAC was clearly intended. Fixes: b0d278b7d3ae ("powerpc/perf_event: Reduce latency of calling perf_event_do_pending") Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
8050936c |
|
09-May-2014 |
Anton Blanchard <anton@samba.org> |
powerpc: irq work racing with timer interrupt can result in timer interrupt hang I am seeing an issue where a CPU running perf eventually hangs. Traces show timer interrupts happening every 4 seconds even when a userspace task is running on the CPU. /proc/timer_list also shows pending hrtimers have not run in over an hour, including the scheduler. Looking closer, decrementers_next_tb is getting set to 0xffffffffffffffff, and at that point we will never take a timer interrupt again. In __timer_interrupt() we set decrementers_next_tb to 0xffffffffffffffff and rely on ->event_handler to update it: *next_tb = ~(u64)0; if (evt->event_handler) evt->event_handler(evt); In this case ->event_handler is hrtimer_interrupt. This will eventually call back through the clockevents code with the next event to be programmed: static int decrementer_set_next_event(unsigned long evt, struct clock_event_device *dev) { /* Don't adjust the decrementer if some irq work is pending */ if (test_irq_work_pending()) return 0; __get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt; If irq work came in between these two points, we will return before updating decrementers_next_tb and we never process a timer interrupt again. This looks to have been introduced by 0215f7d8c53f (powerpc: Fix races with irq_work). Fix it by removing the early exit and relying on code later on in the function to force an early decrementer: /* We may have raced with new irq work */ if (test_irq_work_pending()) set_dec(1); Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@vger.kernel.org # 3.14+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
0d948730 |
|
25-Feb-2014 |
Preeti U Murthy <preeti@linux.vnet.ibm.com> |
cpuidle/powernv: Add "Fast-Sleep" CPU idle state Fast sleep is one of the deep idle states on Power8 in which local timers of CPUs stop. On PowerPC we do not have an external clock device which can handle wakeup of such CPUs. Now that we have the support in the tick broadcast framework for archs that do not sport such a device and the low level support for fast sleep, enable it in the cpuidle framework on PowerNV. Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
1b783955 |
|
25-Feb-2014 |
Preeti U Murthy <preeti@linux.vnet.ibm.com> |
powerpc: Split timer_interrupt() into timer handling and interrupt handling routines Split timer_interrupt(), which is the local timer interrupt handler on ppc into routines called during regular interrupt handling and __timer_interrupt(), which takes care of running local timers and collecting time related stats. This will enable callers interested only in running expired local timers to directly call into __timer_interupt(). One of the use cases of this is the tick broadcast IPI handling in which the sleeping CPUs need to handle the local timers that have expired. Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
1b67bee1 |
|
25-Feb-2014 |
Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> |
powerpc: Implement tick broadcast IPI as a fixed IPI message For scalability and performance reasons, we want the tick broadcast IPIs to be handled as efficiently as possible. Fixed IPI messages are one of the most efficient mechanisms available - they are faster than the smp_call_function mechanism because the IPI handlers are fixed and hence they don't involve costly operations such as adding IPI handlers to the target CPU's function queue, acquiring locks for synchronization etc. Luckily we have an unused IPI message slot, so use that to implement tick broadcast IPIs efficiently. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> [Functions renamed to tick_broadcast* and Changelog modified by Preeti U. Murthy<preeti@linux.vnet.ibm.com>] Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com> Acked-by: Geoff Levand <geoff@infradead.org> [For the PS3 part] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
0215f7d8 |
|
13-Jan-2014 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Fix races with irq_work If we set irq_work on a processor and immediately afterward, before the irq work has a chance to be processed, we change the decrementer value, we can seriously delay the handling of that irq_work. Fix it by checking in a few places for pending irq work, first before changing the decrementer in decrementer_set_next_event() and after changing it in the same function and in timer_interrupt(). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
c041cfa2 |
|
23-Jan-2013 |
fan.du <fan.du@windriver.com> |
powerpc: Make irq_stat.timers_irqs counting more specific Current irq_stat.timers_irqs counting doesn't discriminate timer event handler and other timer interrupt(like arch_irq_work_raise). Sometimes we need to know exactly how much interrupts timer event handler fired, so let's be more specific on this. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
84b07386 |
|
16-Nov-2013 |
Anton Blanchard <anton@samba.org> |
powerpc/pseries: Duplicate dtl entries sometimes sent to userspace When reading from the dispatch trace log (dtl) userspace interface, I sometimes see duplicate entries. One example: # hexdump -C dtl.out 00000000 07 04 00 0c 00 00 48 44 00 00 00 00 00 00 00 00 00000010 00 0c a0 b4 16 83 6d 68 00 00 00 00 00 00 00 00 00000020 00 00 00 00 10 00 13 50 80 00 00 00 00 00 d0 32 00000030 07 04 00 0c 00 00 48 44 00 00 00 00 00 00 00 00 00000040 00 0c a0 b4 16 83 6d 68 00 00 00 00 00 00 00 00 00000050 00 00 00 00 10 00 13 50 80 00 00 00 00 00 d0 32 The problem is in scan_dispatch_log() where we call dtl_consumer() but bail out before incrementing the index. To fix this I moved dtl_consumer() after the timebase comparison. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
7ffcf8ec |
|
06-Aug-2013 |
Anton Blanchard <anton@samba.org> |
powerpc: Fix little endian lppaca, slb_shadow and dtl_entry The lppaca, slb_shadow and dtl_entry hypervisor structures are big endian, so we have to byte swap them in little endian builds. LE KVM hosts will also need to be fixed but for now add an #error to remind us. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
6f7aba7b |
|
06-Aug-2013 |
Anton Blanchard <anton@samba.org> |
powerpc: Add some endian annotations to time and xics code Fix a couple of sparse warnings. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
8c6ffba0 |
|
14-Jul-2013 |
Rusty Russell <rusty@rustcorp.com.au> |
PTR_RET is now PTR_ERR_OR_ZERO(): Replace most. Sweep of the simple cases. Cc: netdev@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-arm-kernel@lists.infradead.org Cc: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
061d19f2 |
|
24-Jun-2013 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
powerpc: Delete __cpuinit usage from all users The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the powerpc uses of the __cpuinit macros. There are no __CPUINIT users in assembly files in powerpc. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
09652b00 |
|
09-Mar-2013 |
Adrian-Leonard Radu <ady8radu@gmail.com> |
powerpc: Use PTR_RET instead of IS_ERR/PTR_ERR Signed-off-by: Adrian-Leonard Radu <ady8radu@gmail.com> Acked-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
|
#
689dfa89 |
|
15-Jan-2013 |
Tiejun Chen <tiejun.chen@windriver.com> |
powerpc: Max next_tb to prevent from replaying timer interrupt With lazy interrupt, we always call __check_irq_replaysome with decrementers_next_tb to check if we need to replay timer interrupt. So in hotplug case we also need to set decrementers_next_tb as MAX to make sure __check_irq_replay don't replay timer interrupt when return as we expect, otherwise we'll trap here infinitely. Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
c11f11fc |
|
20-Jan-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
kvm: Prepare to add generic guest entry/exit callbacks Do some ground preparatory work before adding guest_enter() and guest_exit() context tracking callbacks. Those will be later used to read the guest cputime safely when we run in full dynticks mode. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Gleb Natapov <gleb@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
abf917cd |
|
24-Jul-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
cputime: Generic on-demand virtual cputime accounting If we want to stop the tick further idle, we need to be able to account the cputime without using the tick. Virtual based cputime accounting solves that problem by hooking into kernel/user boundaries. However implementing CONFIG_VIRT_CPU_ACCOUNTING require low level hooks and involves more overhead. But we already have a generic context tracking subsystem that is required for RCU needs by archs which plan to shut down the tick outside idle. This patch implements a generic virtual based cputime accounting that relies on these generic kernel/user hooks. There are some upsides of doing this: - This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING if context tracking is already built (already necessary for RCU in full tickless mode). - We can rely on the generic context tracking subsystem to dynamically (de)activate the hooks, so that we can switch anytime between virtual and tick based accounting. This way we don't have the overhead of the virtual accounting when the tick is running periodically. And one downside: - There is probably more overhead than a native virtual based cputime accounting. But this relies on hooks that are already set anyway. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
023f333a |
|
17-Dec-2012 |
Jason Gunthorpe <jgg@ziepe.ca> |
NTP: Add a CONFIG_RTC_SYSTOHC configuration The purpose of this option is to allow ARM/etc systems that rely on the class RTC subsystem to have the same kind of automatic NTP based synchronization that we have on PC platforms. Today ARM does not implement update_persistent_clock and makes extensive use of the class RTC system. When enabled CONFIG_RTC_SYSTOHC will provide a generic rtc_update_persistent_clock that stores the current time in the RTC and is intended complement the existing CONFIG_RTC_HCTOSYS option that loads the RTC at boot. Like with RTC_HCTOSYS the platform's update_persistent_clock is used first, if it works. Platforms with mixed class RTC and non-RTC drivers need to return ENODEV when class RTC should be used. Such an update for PPC is included in this patch. Long term, implementations of update_persistent_clock should migrate to proper class RTC drivers and use CONFIG_RTC_SYSTOHC instead. Tested on ARM kirkwood and PPC405 Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
ce73ec6d |
|
08-Nov-2012 |
Shan Hai <shan.hai@windriver.com> |
powerpc/vdso: Remove redundant locking in update_vsyscall_tz() The locking in update_vsyscall_tz() is not only unnecessary because the vdso code copies the data unproteced in __kernel_gettimeofday() but also introduces a hard to reproduce race condition between update_vsyscall() and update_vsyscall_tz(), which causes user space process to loop forever in vdso code. The following patch removes the locking from update_vsyscall_tz(). Locking is not only unnecessary because the vdso code copies the data unprotected in __kernel_gettimeofday() but also erroneous because updating the tb_update_count is not atomic and introduces a hard to reproduce race condition between update_vsyscall() and update_vsyscall_tz(), which further causes user space process to loop forever in vdso code. The below scenario describes the race condition, x==0 Boot CPU other CPU proc_P: x==0 timer interrupt update_vsyscall x==1 x++;sync settimeofday update_vsyscall_tz x==2 x++;sync x==3 sync;x++ sync;x++ proc_P: x==3 (loops until x becomes even) Because the ++ operator would be implemented as three instructions and not atomic on powerpc. A similar change was made for x86 in commit 6c260d58634 ("x86: vdso: Remove bogus locking in update_vsyscall_tz") Signed-off-by: Shan Hai <shan.hai@windriver.com> CC: <stable@vger.kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
1b2852b1 |
|
19-Nov-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
vtime: Warn if irqs aren't disabled on system time accounting APIs System time accounting APIs such as vtime_account_system() and vtime_account_idle() need to be irqsafe. Current callers include irq entry, exit and kvm, all of which have been checked against that requirement. Now it's better to grow that with an automatic check in case we have further callers or we missed something. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
|
#
e3942ba0 |
|
13-Nov-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
vtime: Consolidate a bit the ctx switch code On ia64 and powerpc, vtime context switch only consists in flushing system and user pending time, plus a few arch housekeeping. Consolidate that into a generic implementation. s390 is a special case because pending user and system time accounting there is hard to dissociate. So it's keeping its own implementation. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
|
#
bcebdf84 |
|
13-Nov-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
vtime: Explicitly account pending user time on process tick All vtime implementations just flush the user time on process tick. Consolidate that in generic code by calling a user time accounting helper. This avoids an indirect call in ia64 and prepare to also consolidate vtime context switch code. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
|
#
fd25b4c2 |
|
13-Nov-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
vtime: Remove the underscore prefix invasion Prepending irq-unsafe vtime APIs with underscores was actually a bad idea as the result is a big mess in the API namespace that is even waiting to be further extended. Also these helpers are always called from irq safe callers except kvm. Just provide a vtime_account_system_irqsafe() for this specific case so that we can remove the underscore prefix on other vtime functions. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
|
#
11113334 |
|
24-Oct-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
vtime: Make vtime_account_system() irqsafe vtime_account_system() currently has only one caller with vtime_account() which is irq safe. Now we are going to call it from other places like kvm where irqs are not always disabled by the time we account the cputime. So let's make it irqsafe. The arch implementation part is now prefixed with "__". vtime_account_idle() arch implementation is prefixed accordingly to stay consistent. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
|
#
a7e1a9e3 |
|
08-Sep-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
vtime: Consolidate system/idle context detection Move the code that finds out to which context we account the cputime into generic layer. Archs that consider the whole time spent in the idle task as idle time (ia64, powerpc) can rely on the generic vtime_account() and implement vtime_account_system() and vtime_account_idle(), letting the generic code to decide when to call which API. Archs that have their own meaning of idle time, such as s390 that only considers the time spent in CPU low power mode as idle time, can just override vtime_account(). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
|
#
bf9fae9f |
|
08-Sep-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
cputime: Use a proper subsystem naming for vtime related APIs Use a naming based on vtime as a prefix for virtual based cputime accounting APIs: - account_system_vtime() -> vtime_account() - account_switch_vtime() -> vtime_task_switch() It makes it easier to allow for further declension such as vtime_account_system(), vtime_account_idle(), ... if we want to find out the context we account to from generic code. This also make it better to know on which subsystem these APIs refer to. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
|
#
70639421 |
|
04-Sep-2012 |
John Stultz <john.stultz@linaro.org> |
time: Convert CONFIG_GENERIC_TIME_VSYSCALL to CONFIG_GENERIC_TIME_VSYSCALL_OLD To help migrate archtectures over to the new update_vsyscall method, redfine CONFIG_GENERIC_TIME_VSYSCALL as CONFIG_GENERIC_TIME_VSYSCALL_OLD Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
189374ae |
|
04-Sep-2012 |
John Stultz <john.stultz@linaro.org> |
time: Move update_vsyscall definitions to timekeeper_internal.h Since users will need to include timekeeper_internal.h, move update_vsyscall definitions to timekeeper_internal.h. Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
e72bbbab |
|
10-Sep-2012 |
Li Zhong <zhong@linux.vnet.ibm.com> |
powerpc/trace: Fix interrupt tracepoints vs. RCU There are a few tracepoints in the interrupt code path, which is before irq_enter(), or after irq_exit(), like trace_irq_entry()/trace_irq_exit() in do_IRQ(), trace_timer_interrupt_entry()/trace_timer_interrupt_exit() in timer_interrupt(). If the interrupt is from idle(), and because tracepoint contains RCU read-side critical section, we could see following suspicious RCU usage reported: [ 145.127743] =============================== [ 145.127747] [ INFO: suspicious RCU usage. ] [ 145.127752] 3.6.0-rc3+ #1 Not tainted [ 145.127755] ------------------------------- [ 145.127759] /root/.workdir/linux/arch/powerpc/include/asm/trace.h:33 suspicious rcu_dereference_check() usage! [ 145.127765] [ 145.127765] other info that might help us debug this: [ 145.127765] [ 145.127771] [ 145.127771] RCU used illegally from idle CPU! [ 145.127771] rcu_scheduler_active = 1, debug_locks = 0 [ 145.127777] RCU used illegally from extended quiescent state! [ 145.127781] no locks held by swapper/0/0. [ 145.127785] [ 145.127785] stack backtrace: [ 145.127789] Call Trace: [ 145.127796] [c00000000108b530] [c000000000013c40] .show_stack +0x70/0x1c0 (unreliable) [ 145.127806] [c00000000108b5e0] [c0000000000f59d8] .lockdep_rcu_suspicious+0x118/0x150 [ 145.127813] [c00000000108b680] [c00000000000fc58] .do_IRQ+0x498/0x500 [ 145.127820] [c00000000108b750] [c000000000003950] hardware_interrupt_common+0x150/0x180 [ 145.127828] --- Exception: 501 at .plpar_hcall_norets+0x84/0xd4 [ 145.127828] LR = .check_and_cede_processor+0x38/0x70 [ 145.127836] [c00000000108bab0] [c0000000000665dc] .shared_cede_loop +0x5c/0x100 [ 145.127844] [c00000000108bb70] [c000000000588ab0] .cpuidle_enter +0x30/0x50 [ 145.127850] [c00000000108bbe0] [c000000000588b0c] .cpuidle_enter_state+0x3c/0xb0 [ 145.127857] [c00000000108bc60] [c000000000589730] .cpuidle_idle_call +0x150/0x6c0 [ 145.127863] [c00000000108bd30] [c000000000058440] .pSeries_idle +0x10/0x40 [ 145.127870] [c00000000108bda0] [c00000000001683c] .cpu_idle +0x18c/0x2d0 [ 145.127876] [c00000000108be60] [c00000000000b434] .rest_init +0x124/0x1b0 [ 145.127884] [c00000000108bef0] [c0000000009d0d28] .start_kernel +0x568/0x588 [ 145.127890] [c00000000108bf90] [c000000000009660] .start_here_common +0x20/0x40 This is because the RCU usage in interrupt context should be used in area marked by rcu_irq_enter()/rcu_irq_exit(), called in irq_enter()/irq_exit() respectively. Move them into the irq_enter()/irq_exit() area to avoid the reporting. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
dabe859e |
|
26-Jul-2012 |
Paul Mackerras <paulus@samba.org> |
powerpc: Give hypervisor decrementer interrupts their own handler At the moment the handler for hypervisor decrementer interrupts is the same as for decrementer interrupts, i.e. timer_interrupt(). This is bogus; if we ever do get a hypervisor decrementer interrupt it won't have anything to do with the next timer event. In fact the only time we get hypervisor decrementer interrupts is when one is left pending on exit from a KVM guest. When we get a hypervisor decrementer interrupt we don't need to do anything special to clear it, since they are edge-triggered on the transition of HDEC from 0 to -1. Thus this adds an empty handler function for them. We don't need to have them masked when interrupts are soft-disabled, so we use STD_EXCEPTION_HV instead of MASKABLE_EXCEPTION_HV. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
baa36046 |
|
18-Jun-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
cputime: Consolidate vtime handling on context switch The archs that implement virtual cputime accounting all flush the cputime of a task when it gets descheduled and sometimes set up some ground initialization for the next task to account its cputime. These archs all put their own hooks in their context switch callbacks and handle the off-case themselves. Consolidate this by creating a new account_switch_vtime() callback called in generic code right after a context switch and that these archs must implement to flush the prev task cputime and initialize the next task cputime related state. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
|
#
860aed25 |
|
01-Jun-2012 |
Paul Mackerras <paulus@samba.org> |
powerpc/time: Sanity check of decrementer expiration is necessary This reverts 68568add2c ("powerpc/time: Remove unnecessary sanity check of decrementer expiration"). We do need to check whether we have reached the expiration time of the next event, because we sometimes get an early decrementer interrupt, most notably when we set the decrementer to 1 in arch_irq_work_raise(). The effect of not having the sanity check is that if timer_interrupt() gets called early, we leave the decrementer set to its maximum value, which means we then don't get any more decrementer interrupts for about 4 seconds (or longer, depending on timebase frequency). I saw these pauses as a consequence of getting a stray hypervisor decrementer interrupt left over from exiting a KVM guest. This isn't quite a straight revert because of changes to the surrounding code, but it restores the same algorithm as was previously used. Cc: stable@vger.kernel.org Acked-by: Anton Blanchard <anton@samba.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
6e35994d |
|
18-Apr-2012 |
Bharat Bhushan <r65777@freescale.com> |
KVM: PPC: Use clockevent multiplier and shifter for decrementer Time for which the hrtimer is started for decrementer emulation is calculated using tb_ticks_per_usec. While hrtimer uses the clockevent for DEC reprogramming (if needed) and which calculate timebase ticks using the multiplier and shifter mechanism implemented within clockevent layer. It was observed that this conversion (timebase->time->timebase) are not correct because the mechanism are not consistent. In our setup it adds 2% jitter. With this patch clockevent multiplier and shifter mechanism are used when starting hrtimer for decrementer emulation. Now the jitter is < 0.5%. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
f5339277 |
|
15-Mar-2012 |
Stephen Rothwell <sfr@canb.auug.org.au> |
powerpc: Remove FW_FEATURE ISERIES from arch code This is no longer selectable, so just remove all the dependent code. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
7230c564 |
|
06-Mar-2012 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Rework lazy-interrupt handling The current implementation of lazy interrupts handling has some issues that this tries to address. We don't do the various workarounds we need to do when re-enabling interrupts in some cases such as when returning from an interrupt and thus we may still lose or get delayed decrementer or doorbell interrupts. The current scheme also makes it much harder to handle the external "edge" interrupts provided by some BookE processors when using the EPR facility (External Proxy) and the Freescale Hypervisor. Additionally, we tend to keep interrupts hard disabled in a number of cases, such as decrementer interrupts, external interrupts, or when a masked decrementer interrupt is pending. This is sub-optimal. This is an attempt at fixing it all in one go by reworking the way we do the lazy interrupt disabling from the ground up. The base idea is to replace the "hard_enabled" field with a "irq_happened" field in which we store a bit mask of what interrupt occurred while soft-disabled. When re-enabling, either via arch_local_irq_restore() or when returning from an interrupt, we can now decide what to do by testing bits in that field. We then implement replaying of the missed interrupts either by re-using the existing exception frame (in exception exit case) or via the creation of a new one from an assembly trampoline (in the arch_local_irq_enable case). This removes the need to play with the decrementer to try to create fake interrupts, among others. In addition, this adds a few refinements: - We no longer hard disable decrementer interrupts that occur while soft-disabled. We now simply bump the decrementer back to max (on BookS) or leave it stopped (on BookE) and continue with hard interrupts enabled, which means that we'll potentially get better sample quality from performance monitor interrupts. - Timer, decrementer and doorbell interrupts now hard-enable shortly after removing the source of the interrupt, which means they no longer run entirely hard disabled. Again, this will improve perf sample quality. - On Book3E 64-bit, we now make the performance monitor interrupt act as an NMI like Book3S (the necessary C code for that to work appear to already be present in the FSL perf code, notably calling nmi_enter instead of irq_enter). (This also fixes a bug where BookE perfmon interrupts could clobber r14 ... oops) - We could make "masked" decrementer interrupts act as NMIs when doing timer-based perf sampling to improve the sample quality. Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- v2: - Add hard-enable to decrementer, timer and doorbells - Fix CR clobber in masked irq handling on BookE - Make embedded perf interrupt act as an NMI - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want to retrigger an interrupt without preventing hard-enable v3: - Fix or vs. ori bug on Book3E - Fix enabling of interrupts for some exceptions on Book3E v4: - Fix resend of doorbells on return from interrupt on Book3E v5: - Rebased on top of my latest series, which involves some significant rework of some aspects of the patch. v6: - 32-bit compile fix - more compile fixes with various .config combos - factor out the asm code to soft-disable interrupts - remove the C wrapper around preempt_schedule_irq v7: - Fix a bug with hard irq state tracking on native power7
|
#
9f5072d4 |
|
09-Dec-2011 |
Andreas Schwab <schwab@linux-m68k.org> |
powerpc: Fix wrong divisor in usecs_to_cputime Commit d57af9b (taskstats: use real microsecond granularity for CPU times) renamed msecs_to_cputime to usecs_to_cputime, but failed to update all numbers on the way. This causes nonsensical cpu idle/iowait values to be displayed in /proc/stat (the only user of usecs_to_cputime so far). This also renames __cputime_msec_factor to __cputime_usec_factor, adapting its value and using it directly in cputime_to_usecs instead of doing two multiplications. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
7df10275 |
|
23-Nov-2011 |
Anton Blanchard <anton@samba.org> |
powerpc/time: Optimise decrementer_check_overflow decrementer_check_overflow is called from arch_local_irq_restore so we want to make it as light weight as possible. As such, turn decrementer_check_overflow into an inline function. To avoid a circular mess of includes, separate out the two components of struct decrementer_clock and keep the struct clock_event_device part local to time.c. The fast path improves from: arch_local_irq_restore 0: mflr r0 4: std r0,16(r1) 8: stdu r1,-112(r1) c: stb r3,578(r13) 10: cmpdi cr7,r3,0 14: beq- cr7,24 <.arch_local_irq_restore+0x24> ... 24: addi r1,r1,112 28: ld r0,16(r1) 2c: mtlr r0 30: blr to: arch_local_irq_restore 0: std r30,-16(r1) 4: ld r30,0(r2) 8: stb r3,578(r13) c: cmpdi cr7,r3,0 10: beq- cr7,6c <.arch_local_irq_restore+0x6c> ... 6c: ld r30,-16(r1) 70: blr Unfortunately we still setup a local TOC (due to -mminimal-toc). Yet another sign we should be moving to -mcmodel=medium. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
621692cb |
|
23-Nov-2011 |
Anton Blanchard <anton@samba.org> |
powerpc/time: Fix some style issues Fix some formatting issues and use the DECREMENTER_MAX define instead of 0x7fffffff. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
68568add |
|
23-Nov-2011 |
Anton Blanchard <anton@samba.org> |
powerpc/time: Remove unnecessary sanity check of decrementer expiration The clockevents code uses max_delta_ns to avoid calling a clockevent with too large a value. Remove the redundant version of this in the timer_interrupt code. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
11b8633a |
|
23-Nov-2011 |
Anton Blanchard <anton@samba.org> |
powerpc/time: Use clocksource_register_hz Use clocksource_register_hz which calculates the shift/mult factors for us. Also remove the shift = 22 assumption in vsyscall_update - thanks to Paul Mackerras and John Stultz for catching that. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
d8afc6fd |
|
23-Nov-2011 |
Anton Blanchard <anton@samba.org> |
powerpc/time: Use clockevents_calc_mult_shift We can use clockevents_calc_mult_shift instead of doing all the work ourselves. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
37fb9a02 |
|
23-Nov-2011 |
Anton Blanchard <anton@samba.org> |
powerpc/time: Handle wrapping of decrementer When re-enabling interrupts we have code to handle edge sensitive decrementers by resetting the decrementer to 1 whenever it is negative. If interrupts were disabled long enough that the decrementer wrapped to positive we do nothing. This means interrupts can be delayed for a long time until it finally goes negative again. While we hope interrupts are never be disabled long enough for the decrementer to go positive, we have a very good test team that can drive any kernel into the ground. The softlockup data we get back from these fails could be seconds in the future, completely missing the cause of the lockup. We already keep track of the timebase of the next event so use that to work out if we should trigger a decrementer exception. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
4b16f8e2 |
|
22-Jul-2011 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
powerpc: various straight conversions from module.h --> export.h All these files were including module.h just for the basic EXPORT_SYMBOL infrastructure. We can shift them off to the export.h header which is a way smaller footprint and thus realize some compile time gains. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
4f8b50bb |
|
27-Jun-2011 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
irq_work, ppc: Fix up arch hooks Commit e360adbe29 ("irq_work: Add generic hardirq context callbacks") fouled up the ppc bit, not properly naming the arch specific function that raises the 'self-IPI'. Cc: Huang Ying <ying.huang@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Anton Blanchard <anton@samba.org> Cc: Eric B Munson <emunson@mgebm.net> Cc: stable@kernel.org # 37+ Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-eg0aqien8p1aqvzu9dft6dtv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
84ffae55 |
|
07-Apr-2011 |
Anton Blanchard <anton@samba.org> |
powerpc: Fix oops if scan_dispatch_log is called too early We currently enable interrupts before the dispatch log for the boot cpu is setup. If a timer interrupt comes in early enough we oops in scan_dispatch_log: Unable to handle kernel paging request for data at address 0x00000010 ... .scan_dispatch_log+0xb0/0x170 .account_system_vtime+0xa0/0x220 .irq_enter+0x88/0xc0 .do_IRQ+0x48/0x230 The patch below adds a check to scan_dispatch_log to ensure the dispatch log has been allocated. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: <stable@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
963e5d3b |
|
28-Mar-2011 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Make decrementer interrupt robust against offlined CPUs With some implementations, it is possible that a timer interrupt occurs every few seconds on an offline CPU. In this case, just re-arm the decrementer and return immediately Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
ad5d1c88 |
|
20-Mar-2011 |
Anton Blanchard <anton@samba.org> |
powerpc: Fix accounting of softirq time when idle commit cf9efce0ce31 (powerpc: Account time using timebase rather than PURR) used in_irq() to detect if the time was spent in interrupt processing. This only catches hardirq context so if we are in softirq context and in the idle loop we end up accounting it as idle time. If we instead use in_interrupt() we catch both softirq and hardirq time. The issue was found when running a network intensive workload. top showed the following: 0.0%us, 1.1%sy, 0.0%ni, 85.7%id, 0.0%wa, 9.9%hi, 3.3%si, 0.0%st 85.7% idle. But this was wildly different to the perf events data. To confirm the suspicion I ran something to keep the core busy: # yes > /dev/null & 8.2%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 10.3%hi, 81.4%si, 0.0%st We only got 8.2% of the CPU for the userspace task and softirq has shot up to 81.4%. With the patch below top shows the correct stats: 0.0%us, 0.0%sy, 0.0%ni, 5.3%id, 0.0%wa, 13.3%hi, 81.3%si, 0.0%st Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
b18ae08d |
|
02-Jan-2011 |
Tejun Heo <tj@kernel.org> |
powerpc/cell: Use system_wq in cpufreq_spudemand With cmwq, there's no reason to use a separate workqueue in cpufreq_spudemand. Use system_wq instead. The work items are already sync canceled on stop, so it's already guaranteed that no work is running when spu_gov_exit() is entered. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linuxppc-dev@lists.ozlabs.org Cc: Dave Jones <davej@redhat.com> Cc: cpufreq@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
364a1246 |
|
22-Nov-2010 |
Heiko Schocher <hs@denx.de> |
powerpc/time: printk time stamp init not correct problem: I see sometimes on my mpc5200 based board such printk timing information: [ 0.000000] NR_IRQS:512 nr_irqs:512 16 [ 0.000000] MPC52xx PIC is up and running! [ 0.000000] clocksource: timebase mult[79364d9] shift[22] registered [ 0.000000] console [ttyPSC0] enabled [ 130.300633] pid_max: default: 32768 minimum: 301 [ 130.305647] Mount-cache hash table entries: 512 [ 130.315818] NET: Registered protocol family 16 reason: if the tbu not starts from 0 when linux boots, boot_tb maybe could not store the real 64 bit tbu value, because boot_tp is only a 32 bit unsigned long. solution: change boot_tb to u64 [BenH: Made it u64 instead of unsigned long long] Signed-off-by: Heiko Schocher <hs@denx.de> cc: Wolfgang Denk <wd@denx.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
e360adbe |
|
14-Oct-2010 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
irq_work: Add generic hardirq context callbacks Provide a mechanism that allows running code in IRQ context. It is most useful for NMI code that needs to interact with the rest of the system -- like wakeup a task to drain buffers. Perf currently has such a mechanism, so extract that and provide it as a generic feature, independent of perf so that others may also benefit. The IRQ context callback is generated through self-IPIs where possible, or on architectures like powerpc the decrementer (the built-in timer facility) is set to generate an interrupt immediately. Architectures that don't have anything like this get to do with a callback from the timer tick. These architectures can call irq_work_run() at the tail of any IRQ handlers that might enqueue such work (like the perf IRQ handler) to avoid undue latencies in processing the work. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [ various fixes ] Signed-off-by: Huang Ying <ying.huang@intel.com> LKML-Reference: <1287036094.7768.291.camel@yhuang-dev> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
55ec2fca |
|
20-Sep-2010 |
Timur Tabi <timur@freescale.com> |
powerpc: export ppc_proc_freq and ppc_tb_freq as GPL symbols Export the global variable 'ppc_tb_freq', so that modules (like the Book-E watchdog driver) can use it. To maintain consistency, ppc_proc_freq is changed to a GPL-only export. This is okay, because any module that needs this symbol should be an actual Linux driver, which must be GPL-licensed. Signed-off-by: Timur Tabi <timur@freescale.com> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
#
872e439a |
|
30-Aug-2010 |
Paul Mackerras <paulus@samba.org> |
powerpc/pseries: Re-enable dispatch trace log userspace interface Since the cpu accounting code uses the hypervisor dispatch trace log now when CONFIG_VIRT_CPU_ACCOUNTING = y, the previous commit disabled access to it via files in the /sys/kernel/debug/powerpc/dtl/ directory in that case. This restores those files. To do this, we now have a hook that the cpu accounting code will call as it processes each entry from the hypervisor dispatch trace log. The code in dtl.c now uses that to fill up its ring buffer, rather than having the hypervisor fill the ring buffer directly. This also fixes dtl_file_read() to handle overflow conditions a bit better and adds a spinlock to ensure that race conditions (multiple processes opening or reading the file concurrently) are handled correctly. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
cf9efce0 |
|
26-Aug-2010 |
Paul Mackerras <paulus@samba.org> |
powerpc: Account time using timebase rather than PURR Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the PURR register for measuring the user and system time used by processes, as well as other related times such as hardirq and softirq times. This turns out to be quite confusing for users because it means that a program will often be measured as taking less time when run on a multi-threaded processor (SMT2 or SMT4 mode) than it does when run on a single-threaded processor (ST mode), even though the program takes longer to finish. The discrepancy is accounted for as stolen time, which is also confusing, particularly when there are no other partitions running. This changes the accounting to use the timebase instead, meaning that the reported user and system times are the actual number of real-time seconds that the program was executing on the processor thread, regardless of which SMT mode the processor is in. Thus a program will generally show greater user and system times when run on a multi-threaded processor than on a single-threaded processor. On pSeries systems on POWER5 or later processors, we measure the stolen time (time when this partition wasn't running) using the hypervisor dispatch trace log. We check for new entries in the log on every entry from user mode and on every transition from kernel process context to soft or hard IRQ context (i.e. when account_system_vtime() gets called). So that we can correctly distinguish time stolen from user time and time stolen from system time, without having to check the log on every exit to user mode, we store separate timestamps for exit to user mode and entry from user mode. On systems that have a SPURR (POWER6 and POWER7), we read the SPURR in account_system_vtime() (as before), and then apportion the SPURR ticks since the last time we read it between scaled user time and scaled system time according to the relative proportions of user time and system time over the same interval. This avoids having to read the SPURR on every kernel entry and exit. On systems that have PURR but not SPURR (i.e., POWER5), we do the same using the PURR rather than the SPURR. This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl for now since it conflicts with the use of the dispatch trace log by the time accounting code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
b0d278b7 |
|
10-Aug-2010 |
Paul Mackerras <paulus@samba.org> |
powerpc/perf_event: Reduce latency of calling perf_event_do_pending Commit 0fe1ac48 ("powerpc/perf_event: Fix oops due to perf_event_do_pending call") moved the call to perf_event_do_pending in timer_interrupt() down so that it was after the irq_enter() call. Unfortunately this moved it after the code that checks whether it is time for the next decrementer clock event. The result is that the call to perf_event_do_pending() won't happen until the next decrementer clock event is due. This was pointed out by Milton Miller. This fixes it by moving the check for whether it's time for the next decrementer clock event down to the point where we're about to call the event handler, after we've called perf_event_do_pending. This has the side effect that on old pre-Core99 Powermacs where we use the ppc_n_lost_interrupts mechanism to replay interrupts, a replayed interrupt will incur a little more latency since it will now do the code from the irq_enter down to the irq_exit, that it used to skip. However, these machines are now old and rare enough that this doesn't matter. To make it clear that ppc_n_lost_interrupts is only used on Powermacs, and to speed up the code slightly on non-Powermac ppc32 machines, the code that tests ppc_n_lost_interrupts is now conditional on CONFIG_PMAC as well as CONFIG_PPC32. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: stable@kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
d75d68cf |
|
20-Jun-2010 |
Paul Mackerras <paulus@samba.org> |
powerpc: Clean up obsolete code relating to decrementer and timebase Since the decrementer and timekeeping code was moved over to using the generic clockevents and timekeeping infrastructure, several variables and functions have been obsolete and effectively unused. This deletes them. In particular, wakeup_decrementer() is no longer needed since the generic code reprograms the decrementer as part of the process of resuming the timekeeping code, which happens during sysdev resume. Thus the wakeup_decrementer calls in the suspend_enter methods for 52xx platforms have been removed. The call in the powermac cpu frequency change code has been replaced by set_dec(1), which will cause a timer interrupt as soon as interrupts are enabled, and the generic code will then reprogram the decrementer with the correct value. This also simplifies the generic_suspend_en/disable_irqs functions and makes them static since they are not referenced outside time.c. The preempt_enable/disable calls are removed because the generic code has disabled all but the boot cpu at the point where these functions are called, so we can't be moved to another cpu. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
0e469db8 |
|
20-Jun-2010 |
Paul Mackerras <paulus@samba.org> |
powerpc: Rework VDSO gettimeofday to prevent time going backwards Currently it is possible for userspace to see the result of gettimeofday() going backwards by 1 microsecond, assuming that userspace is using the gettimeofday() in the VDSO. The VDSO gettimeofday() algorithm computes the time in "xsecs", which are units of 2^-20 seconds, or approximately 0.954 microseconds, using the algorithm now = (timebase - tb_orig_stamp) * tb_to_xs + stamp_xsec and then converts the time in xsecs to seconds and microseconds. The kernel updates the tb_orig_stamp and stamp_xsec values every tick in update_vsyscall(). If the length of the tick is not an integer number of xsecs, then some precision is lost in converting the current time to xsecs. For example, with CONFIG_HZ=1000, the tick is 1ms long, which is 1048.576 xsecs. That means that stamp_xsec will advance by either 1048 or 1049 on each tick. With the right conditions, it is possible for userspace to get (timebase - tb_orig_stamp) * tb_to_xs being 1049 if the kernel is slightly late in updating the vdso_datapage, and then for stamp_xsec to advance by 1048 when the kernel does update it, and for userspace to then see (timebase - tb_orig_stamp) * tb_to_xs being zero due to integer truncation. The result is that time appears to go backwards by 1 microsecond. To fix this we change the VDSO gettimeofday to use a new field in the VDSO datapage which stores the nanoseconds part of the time as a fractional number of seconds in a 0.32 binary fraction format. (Or put another way, as a 32-bit number in units of 0.23283 ns.) This is convenient because we can use the mulhwu instruction to convert it to either microseconds or nanoseconds. Since it turns out that computing the time of day using this new field is simpler than either using stamp_xsec (as gettimeofday does) or stamp_xtime.tv_nsec (as clock_gettime does), this converts both gettimeofday and clock_gettime to use the new field. The existing __do_get_tspec function is converted to use the new field and take a parameter in r7 that indicates the desired resolution, 1,000,000 for microseconds or 1,000,000,000 for nanoseconds. The __do_get_xsec function is then unused and is deleted. The new algorithm is now = ((timebase - tb_orig_stamp) << 12) * tb_to_xs + (stamp_xtime_seconds << 32) + stamp_sec_fraction with 'now' in units of 2^-32 seconds. That is then converted to seconds and either microseconds or nanoseconds with seconds = now >> 32 partseconds = ((now & 0xffffffff) * resolution) >> 32 The 32-bit VDSO code also makes a further simplification: it ignores the bottom 32 bits of the tb_to_xs value, which is a 0.64 format binary fraction. Doing so gets rid of 4 multiply instructions. Assuming a timebase frequency of 1GHz or less and an update interval of no more than 10ms, the upper 32 bits of tb_to_xs will be at least 4503599, so the error from ignoring the low 32 bits will be at most 2.2ns, which is more than an order of magnitude less than the time taken to do gettimeofday or clock_gettime on our fastest processors, so there is no possibility of seeing inconsistent values due to this. This also moves update_gtod() down next to its only caller, and makes update_vsyscall use the time passed in via the wall_time argument rather than accessing xtime directly. At present, wall_time always points to xtime, but that could change in future. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
7615856e |
|
13-Jul-2010 |
John Stultz <johnstul@us.ibm.com> |
timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset update_vsyscall() did not provide the wall_to_monotoinc offset, so arch specific implementations tend to reference wall_to_monotonic directly. This limits future cleanups in the timekeeping core, so this patch fixes the update_vsyscall interface to provide wall_to_monotonic, allowing wall_to_monotonic to be made static as planned in Documentation/feature-removal-schedule.txt Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Anton Blanchard <anton@samba.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Tony Luck <tony.luck@intel.com> LKML-Reference: <1279068988-21864-7-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
06d518e3 |
|
13-Jul-2010 |
John Stultz <johnstul@us.ibm.com> |
powerpc: Cleanup xtime usage This removes powerpc's direct xtime usage, allowing for further generic timeekeping cleanups Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> LKML-Reference: <1279068988-21864-6-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
b0797b60 |
|
13-Jul-2010 |
John Stultz <johnstul@us.ibm.com> |
powerpc: Simplify update_vsyscall Currently powerpc's update_vsyscall calls an inline update_gtod. However, both are straightforward, and there are no other users, so this patch merges update_gtod into update_vsyscall. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Anton Blanchard <anton@samba.org> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1279068988-21864-5-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
c1aa687d |
|
20-Jun-2010 |
Paul Mackerras <paulus@samba.org> |
powerpc: Clean up obsolete code relating to decrementer and timebase Since the decrementer and timekeeping code was moved over to using the generic clockevents and timekeeping infrastructure, several variables and functions have been obsolete and effectively unused. This deletes them. In particular, wakeup_decrementer() is no longer needed since the generic code reprograms the decrementer as part of the process of resuming the timekeeping code, which happens during sysdev resume. Thus the wakeup_decrementer calls in the suspend_enter methods for 52xx platforms have been removed. The call in the powermac cpu frequency change code has been replaced by set_dec(1), which will cause a timer interrupt as soon as interrupts are enabled, and the generic code will then reprogram the decrementer with the correct value. This also simplifies the generic_suspend_en/disable_irqs functions and makes them static since they are not referenced outside time.c. The preempt_enable/disable calls are removed because the generic code has disabled all but the boot cpu at the point where these functions are called, so we can't be moved to another cpu. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
8fd63a9e |
|
20-Jun-2010 |
Paul Mackerras <paulus@samba.org> |
powerpc: Rework VDSO gettimeofday to prevent time going backwards Currently it is possible for userspace to see the result of gettimeofday() going backwards by 1 microsecond, assuming that userspace is using the gettimeofday() in the VDSO. The VDSO gettimeofday() algorithm computes the time in "xsecs", which are units of 2^-20 seconds, or approximately 0.954 microseconds, using the algorithm now = (timebase - tb_orig_stamp) * tb_to_xs + stamp_xsec and then converts the time in xsecs to seconds and microseconds. The kernel updates the tb_orig_stamp and stamp_xsec values every tick in update_vsyscall(). If the length of the tick is not an integer number of xsecs, then some precision is lost in converting the current time to xsecs. For example, with CONFIG_HZ=1000, the tick is 1ms long, which is 1048.576 xsecs. That means that stamp_xsec will advance by either 1048 or 1049 on each tick. With the right conditions, it is possible for userspace to get (timebase - tb_orig_stamp) * tb_to_xs being 1049 if the kernel is slightly late in updating the vdso_datapage, and then for stamp_xsec to advance by 1048 when the kernel does update it, and for userspace to then see (timebase - tb_orig_stamp) * tb_to_xs being zero due to integer truncation. The result is that time appears to go backwards by 1 microsecond. To fix this we change the VDSO gettimeofday to use a new field in the VDSO datapage which stores the nanoseconds part of the time as a fractional number of seconds in a 0.32 binary fraction format. (Or put another way, as a 32-bit number in units of 0.23283 ns.) This is convenient because we can use the mulhwu instruction to convert it to either microseconds or nanoseconds. Since it turns out that computing the time of day using this new field is simpler than either using stamp_xsec (as gettimeofday does) or stamp_xtime.tv_nsec (as clock_gettime does), this converts both gettimeofday and clock_gettime to use the new field. The existing __do_get_tspec function is converted to use the new field and take a parameter in r7 that indicates the desired resolution, 1,000,000 for microseconds or 1,000,000,000 for nanoseconds. The __do_get_xsec function is then unused and is deleted. The new algorithm is now = ((timebase - tb_orig_stamp) << 12) * tb_to_xs + (stamp_xtime_seconds << 32) + stamp_sec_fraction with 'now' in units of 2^-32 seconds. That is then converted to seconds and either microseconds or nanoseconds with seconds = now >> 32 partseconds = ((now & 0xffffffff) * resolution) >> 32 The 32-bit VDSO code also makes a further simplification: it ignores the bottom 32 bits of the tb_to_xs value, which is a 0.64 format binary fraction. Doing so gets rid of 4 multiply instructions. Assuming a timebase frequency of 1GHz or less and an update interval of no more than 10ms, the upper 32 bits of tb_to_xs will be at least 4503599, so the error from ignoring the low 32 bits will be at most 2.2ns, which is more than an order of magnitude less than the time taken to do gettimeofday or clock_gettime on our fastest processors, so there is no possibility of seeing inconsistent values due to this. This also moves update_gtod() down next to its only caller, and makes update_vsyscall use the time passed in via the wall_time argument rather than accessing xtime directly. At present, wall_time always points to xtime, but that could change in future. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
0fe1ac48 |
|
13-Apr-2010 |
Paul Mackerras <paulus@samba.org> |
powerpc/perf_event: Fix oops due to perf_event_do_pending call Anton Blanchard found that large POWER systems would occasionally crash in the exception exit path when profiling with perf_events. The symptom was that an interrupt would occur late in the exit path when the MSR[RI] (recoverable interrupt) bit was clear. Interrupts should be hard-disabled at this point but they were enabled. Because the interrupt was not recoverable the system panicked. The reason is that the exception exit path was calling perf_event_do_pending after hard-disabling interrupts, and perf_event_do_pending will re-enable interrupts. The simplest and cleanest fix for this is to use the same mechanism that 32-bit powerpc does, namely to cause a self-IPI by setting the decrementer to 1. This means we can remove the tests in the exception exit path and raw_local_irq_restore. This also makes sure that the call to perf_event_do_pending from timer_interrupt() happens within irq_enter/irq_exit. (Note that calling perf_event_do_pending from timer_interrupt does not mean that there is a possible 1/HZ latency; setting the decrementer to 1 ensures that the timer interrupt will happen immediately, i.e. within one timebase tick, which is a few nanoseconds or 10s of nanoseconds.) Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: stable@kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
89713ed1 |
|
31-Jan-2010 |
Anton Blanchard <anton@samba.org> |
powerpc: Add timer, performance monitor and machine check counts to /proc/interrupts With NO_HZ it is useful to know how often the decrementer is going off. The patch below adds an entry for it and also adds it into the /proc/stat summaries. While here, I added performance monitoring and machine check exceptions. I found it useful to keep an eye on the PMU exception rate when using the perf tool. Since it's possible to take a completely handled machine check on a System p box it also sounds like a good idea to keep a machine check summary. The event naming matches x86 to keep gratuitous differences to a minimum. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
b919ee82 |
|
07-Feb-2010 |
Anton Blanchard <anton@samba.org> |
powerpc: Only print clockevent settings once The clockevent multiplier and shift is useful information, but we only need to print it once. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
61c03ddb |
|
12-Jan-2010 |
Anton Blanchard <anton@samba.org> |
powerpc: Replace per_cpu(, smp_processor_id()) with __get_cpu_var() The cputime code has a few places that do per_cpu(, smp_processor_id()). Replace them with __get_cpu_var(). Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
3e7b4843 |
|
11-Jan-2010 |
Stefan Roese <sr@denx.de> |
powerpc: Fix decrementer setup on 1GHz boards We noticed that recent kernels didn't boot on our 1GHz Canyonlands 460EX boards anymore. As it seems, patch 8d165db1 [powerpc: Improve decrementer accuracy] introduced this problem. The routine div_sc() overflows with shift = 32 resulting in this incorrect setup: time_init: decrementer frequency = 1000.000012 MHz time_init: processor frequency = 1000.000012 MHz clocksource: timebase mult[400000] shift[22] registered clockevent: decrementer mult[33] shift[32] cpu[0] This patch now introduces a local div_dc64() version of this function so that this overflow doesn't happen anymore. Signed-off-by: Stefan Roese <sr@denx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Detlev Zundel <dzu@denx.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
0696b711 |
|
16-Nov-2009 |
Lin Ming <ming.m.lin@intel.com> |
timekeeping: Fix clock_gettime vsyscall time warp Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier to struct timekeeper" the clock multiplier of vsyscall is updated with the unmodified clock multiplier of the clock source and not with the NTP adjusted multiplier of the timekeeper. This causes user space observerable time warps: new CLOCK-warp maximum: 120 nsecs, 00000025c337c537 -> 00000025c337c4bf Add a new argument "mult" to update_vsyscall() and hand in the timekeeping internal NTP adjusted multiplier. Signed-off-by: Lin Ming <ming.m.lin@intel.com> Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
a362c638 |
|
13-Nov-2009 |
Thomas Gleixner <tglx@linutronix.de> |
clocksource/events: Fix fallout of generic code changes powerpc grew a new warning due to the type change of clockevent->mult. The architectures which use parts of the generic time keeping infrastructure tripped over my wrong assumption that clocksource_register is only used when GENERIC_TIME=y. I should have looked and also I should have known better. These renitent Gaul villages are racking my nerves. Some serious deprecating is due. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
978d7eb3 |
|
01-Nov-2009 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Avoid giving out RTC dates below EPOCH Doing so causes xtime to be negative which crashes the timekeeping code in funny ways when doing suspend/resume Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
4ab79aa8 |
|
29-Oct-2009 |
Alexander Graf <agraf@suse.de> |
Export symbols for KVM module We want to be able to build KVM as a module. To enable us doing so, we need some more exports from core Linux parts. This patch exports all functions and variables that are required for KVM. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
6795b85c |
|
26-Oct-2009 |
Anton Blanchard <anton@samba.org> |
powerpc: tracing: Add powerpc tracepoints for timer entry and exit We can monitor the effectiveness of our power management of both the kernel and hypervisor by probing the timer interrupt. For example, on this box we see 10.37s timer interrupts on an idle core: <idle>-0 [010] 3900.671297: timer_interrupt_entry: pt_regs=c0000000ce1e7b10 <idle>-0 [010] 3900.671302: timer_interrupt_exit: pt_regs=c0000000ce1e7b10 <idle>-0 [010] 3911.042963: timer_interrupt_entry: pt_regs=c0000000ce1e7b10 <idle>-0 [010] 3911.042968: timer_interrupt_exit: pt_regs=c0000000ce1e7b10 <idle>-0 [010] 3921.414630: timer_interrupt_entry: pt_regs=c0000000ce1e7b10 <idle>-0 [010] 3921.414635: timer_interrupt_exit: pt_regs=c0000000ce1e7b10 Since we have a 207MHz decrementer it will go negative and fire every 10.37s even if Linux is completely idle. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
cdd6c482 |
|
20-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf: Do the big rename: Performance Counters -> Performance Events Bye-bye Performance Counters, welcome Performance Events! In the past few months the perfcounters subsystem has grown out its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility. Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate. All in one, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names. (in an ABI compatible fashion) The word 'event' is also a bit shorter than 'counter' - which makes it slightly more convenient to write/handle as well. Thanks goes to Stephane Eranian who first observed this misnomer and suggested a rename. User-space tooling and ABI compatibility is not affected - this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.) This patch has been generated via the following script: FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/PERF_EVENT_/PERF_RECORD_/g' \ -e 's/PERF_COUNTER/PERF_EVENT/g' \ -e 's/perf_counter/perf_event/g' \ -e 's/nb_counters/nb_events/g' \ -e 's/swcounter/swevent/g' \ -e 's/tpcounter_event/tp_event/g' \ $FILES for N in $(find . -name perf_counter.[ch]); do M=$(echo $N | sed 's/perf_counter/perf_event/g') mv $N $M done FILES=$(find . -name perf_event.*) sed -i \ -e 's/COUNTER_MASK/REG_MASK/g' \ -e 's/COUNTER/EVENT/g' \ -e 's/\<event\>/event_id/g' \ -e 's/counter/event/g' \ -e 's/Counter/Event/g' \ $FILES ... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming. We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window. Namespace clashes were fixed up in a preparatory patch - and some stylistic fallout will be fixed up in a subsequent patch. ( NOTE: 'counters' are still the proper terminology when we deal with hardware registers - and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. ) Suggested-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
77c0a700 |
|
27-Aug-2009 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Properly start decrementer on BookE secondary CPUs This moves the code to start the decrementer on 40x and BookE into a separate function which is now called from time_init() and secondary_time_init(), before the respective clock sources are registered. We also remove the 85xx specific code for doing it from the platform code. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
d90246cd |
|
22-Aug-2009 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
timekeeping: Increase granularity of read_persistent_clock(), build fix Fix the following build problem on powerpc: arch/powerpc/kernel/time.c: In function 'read_persistent_clock': arch/powerpc/kernel/time.c:788: error: 'return' with a value, in function returning void arch/powerpc/kernel/time.c:791: error: 'return' with a value, in function returning void Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: dwalker@fifo99.com Cc: johnstul@us.ibm.com LKML-Reference: <20090822222313.74b9619c@skybase> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
14ea58ad |
|
01-Aug-2009 |
Julia Lawall <julia@diku.dk> |
powerpc: Use DIV_ROUND_CLOSEST in time init code The kernel.h macro DIV_ROUND_CLOSEST performs the computation (x + d/2)/d but is perhaps more readable. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @haskernel@ @@ #include <linux/kernel.h> @depends on haskernel@ expression x,__divisor; @@ - (((x) + ((__divisor) / 2)) / (__divisor)) + DIV_ROUND_CLOSEST(x,__divisor) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
d4f587c6 |
|
14-Aug-2009 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
timekeeping: Increase granularity of read_persistent_clock() The persistent clock of some architectures (e.g. s390) have a better granularity than seconds. To reduce the delta between the host clock and the guest clock in a virtualized system change the read_persistent_clock function to return a struct timespec. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134811.013873340@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
a42548a1 |
|
28-Jul-2009 |
Stanislaw Gruszka <sgruszka@redhat.com> |
cputime: Optimize jiffies_to_cputime(1) For powerpc with CONFIG_VIRT_CPU_ACCOUNTING jiffies_to_cputime(1) is not compile time constant and run time calculations are quite expensive. To optimize we use precomputed value. For all other architectures is is preprocessor definition. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> LKML-Reference: <1248862529-6063-5-git-send-email-sgruszka@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
105988c0 |
|
17-Jun-2009 |
Paul Mackerras <paulus@samba.org> |
perf_counter: powerpc: Enable use of software counters on 32-bit powerpc This enables the perf_counter subsystem on 32-bit powerpc. Since we don't have any support for hardware counters on 32-bit powerpc yet, only software counters can be used. Besides selecting HAVE_PERF_COUNTERS for 32-bit powerpc as well as 64-bit, the main thing this does is add an implementation of set_perf_counter_pending(). This needs to arrange for perf_counter_do_pending() to be called when interrupts are enabled. Rather than add code to local_irq_restore as 64-bit does, the 32-bit set_perf_counter_pending() generates an interrupt by setting the decrementer to 1 so that a decrementer interrupt will become pending in 1 or 2 timebase ticks (if a decrementer interrupt isn't already pending). When interrupts are enabled, timer_interrupt() will be called, and some new code in there calls perf_counter_do_pending(). We use a per-cpu array of flags to indicate whether we need to call perf_counter_do_pending() or not. This introduces a couple of new Kconfig symbols: PPC_HAVE_PMU_SUPPORT, which is selected by processor families for which we have hardware PMU support (currently only PPC64), and PPC_PERF_CTRS, which enables the powerpc-specific perf_counter back-end. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: linuxppc-dev@ozlabs.org Cc: benh@kernel.crashing.org LKML-Reference: <19000.55404.103840.393470@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
177996e6 |
|
09-Jun-2009 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Don't do generic calibrate_delay() Currently we are wasting time calling the generic calibrate_delay() function. We don't need it since our implementation of __delay() is based on the CPU timebase. So instead, we use our own small implementation that initializes loops_per_jiffy to something sensible to make the few users like spinlock debug be happy Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
8d165db1 |
|
10-May-2009 |
Anton Blanchard <anton@samba.org> |
powerpc: Improve decrementer accuracy I have been looking at sources of OS jitter and notice that after a long NO_HZ idle period we wakeup too early: relative time (us) event timer irq exit 999946.405 timer irq entry 4.835 timer irq exit 21.685 timer irq entry 3.540 timer (tick_sched_timer) entry Here we slept for just under a second then took a timer interrupt that did nothing. 21.685 us later we wake up again and do the work. We set a rather low shift value of 16 for the decrementer clockevent, which I think is causing this issue. On this box we have a 207MHz decrementer and see: clockevent: decrementer mult[3501] shift[16] cpu[0] For calculations of large intervals this mult/shift combination could be off by a significant amount. I notice the sparc code has a loop that iterates to find a mult/shift combination that maximises the shift value while keeping mult under 32bit. With the patch below we get: clockevent: decrementer mult[35015c20] shift[32] cpu[15] And we no longer see the spurious wakeups. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
8e19608e |
|
21-Apr-2009 |
Magnus Damm <damm@igel.co.jp> |
clocksource: pass clocksource to read() callback Pass clocksource pointer to the read() callback for clocksources. This allows us to share the callback between multiple instances. [hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods] [akpm@linux-foundation.org: cleanup] Signed-off-by: Magnus Damm <damm@igel.co.jp> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
bcd68a70 |
|
19-Feb-2009 |
Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> |
powerpc: Hook up rtc-generic, and kill rtc-ppc PowerPC has been a long time user of the generic RTC abstraction, so hook up rtc-generic: - Create the "rtc-generic" platform device if ppc_md.get_rtc_time is set, - Kill rtc-ppc, as rtc-generic offers the same functionality in a more generic way, and supports autoloading through udev. Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
|
#
79741dd3 |
|
31-Dec-2008 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[PATCH] idle cputime accounting The cpu time spent by the idle process actually doing something is currently accounted as idle time. This is plain wrong, the architectures that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the time spent doing nothing and the time spent by idle doing work. The first is accounted with account_idle_time and the second with account_system_time. The architectures that use the account_xxx_time interface directly and not the account_xxx_ticks interface now need to do the check for the idle process in their arch code. In particular to improve the system vs true idle time accounting the arch code needs to measure the true idle time instead of just testing for the idle process. To improve the tick based accounting as well we would need an architecture primitive that can tell us if the pt_regs of the interrupted context points to the magic instruction that halts the cpu. In addition idle time is no more added to the stime of the idle process. This field now contains the system time of the idle process as it should be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as every tick that occurs while idle is running will be accounted as idle time. This patch contains the necessary common code changes to be able to distinguish idle system time and true idle time. The architectures with support for VIRT_CPU_ACCOUNTING need some changes to exploit this. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
457533a7 |
|
31-Dec-2008 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[PATCH] fix scaled & unscaled cputime accounting The utimescaled / stimescaled fields in the task structure and the global cpustat should be set on all architectures. On s390 the calls to account_user_time_scaled and account_system_time_scaled never have been added. In addition system time that is accounted as guest time to the user time of a process is accounted to the scaled system time instead of the scaled user time. To fix the bugs and to prevent future forgetfulness this patch merges account_system_time_scaled into account_system_time and account_user_time_scaled into account_user_time. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Michael Neuling <mikey@neuling.org> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
320ab2b0 |
|
13-Dec-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
cpumask: convert struct clock_event_device to cpumask pointers. Impact: change calling convention of existing clock_event APIs struct clock_event_timer's cpumask field gets changed to take pointer, as does the ->broadcast function. Another single-patch change. For safety, we BUG_ON() in clockevents_register_device() if it's not set. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@elte.hu>
|
#
3cc69878 |
|
27-Oct-2008 |
Paul Mackerras <paulus@samba.org> |
powerpc: Eliminate unused do_gtod variable Since we started using the generic timekeeping code, we haven't had a powerpc-specific version of do_gettimeofday, and hence there is now nothing that reads the do_gtod variable in arch/powerpc/kernel/time.c. This therefore removes it and the code that sets it. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
597bc5c0 |
|
27-Oct-2008 |
Paul Mackerras <paulus@samba.org> |
powerpc: Improve resolution of VDSO clock_gettime Currently the clock_gettime implementation in the VDSO produces a result with microsecond resolution for the cases that are handled without a system call, i.e. CLOCK_REALTIME and CLOCK_MONOTONIC. The nanoseconds field of the result is obtained by computing a microseconds value and multiplying by 1000. This changes the code in the VDSO to do the computation for clock_gettime with nanosecond resolution. That means that the resolution of the result will ultimately depend on the timebase frequency. Because the timestamp in the VDSO datapage (stamp_xsec, the real time corresponding to the timebase count in tb_orig_stamp) is in units of 2^-20 seconds, it doesn't have sufficient resolution for computing a result with nanosecond resolution. Therefore this adds a copy of xtime to the VDSO datapage and updates it in update_gtod() along with the other time-related fields. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
ddb107e9 |
|
27-Jun-2008 |
Kumar Gala <galak@kernel.crashing.org> |
powerpc/booke: don't reinitialize time base For some reason long ago I decided that we should zero out the time base when we calibrate the decrementer. The problem is that this can be harmful in SMP systems where the firmware has already synchronized the time bases on the various cores. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
#
15c8b6c1 |
|
09-May-2008 |
Jens Axboe <jens.axboe@oracle.com> |
on_each_cpu(): kill unused 'retry' parameter It's not even passed on to smp_call_function() anymore, since that was removed. So kill it. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
#
1c21a293 |
|
07-May-2008 |
Michael Ellerman <michael@ellerman.id.au> |
[POWERPC] Fix sparse warnings in arch/powerpc/kernel Make a few things static in lparcfg.c Make init and exit routines static in rtas_flash.c Make things static in rtas_pci.c Make some functions static in rtas.c Make fops static in rtas-proc.c Remove unneeded extern for do_gtod in smp.c Make clocksource_init() static in time.c Make last_tick_len and ticklen_to_xs static in time.c Move the declaration of the pvr per-cpu into smp.h Make kexec_smp_down() and kexec_stack static in machine_kexec_64.c Don't return void in arch_teardown_msi_irqs() in msi.c Move declaration of GregorianDay()into asm/time.h Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
7fc5c784 |
|
01-May-2008 |
Roman Zippel <zippel@linux-m68k.org> |
ntp: rename TICK_LENGTH_SHIFT to NTP_SCALE_SHIFT As TICK_LENGTH_SHIFT is used for more than just the tick length, the name isn't quite approriate anymore, so this renames it to NTP_SCALE_SHIFT. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
074b3b87 |
|
01-May-2008 |
Roman Zippel <zippel@linux-m68k.org> |
ntp: increase time_freq resolution This changes time_freq to a 64bit value and makes it static (the only outside user had no real need to modify it). Intermediate values were already 64bit, so the change isn't that big, but it saves a little in shifts by replacing SHIFT_NSEC with TICK_LENGTH_SHIFT. PPM_SCALE is then used to convert between user space and kernel space representation. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
06b8e878 |
|
06-Feb-2008 |
Michael Neuling <mikey@neuling.org> |
taskstats scaled time cleanup This moves the ability to scale cputime into generic code. This allows us to fix the issue in kernel/timer.c (noticed by Balbir) where we could only add an unscaled value to the scaled utime/stime. This adds a cputime_to_scaled function. As before, the POWERPC version does the scaling based on the last SPURR/PURR ratio calculated. The generic and s390 (only other arch to implement asm/cputime.h) versions are both NOPs. Also moves the SPURR and PURR snapshots closer. Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7ac5dde9 |
|
12-Dec-2007 |
Scott Wood <scottwood@freescale.com> |
[POWERPC] Implement arch disable/enable irq hooks. These hooks ensure that a decrementer interrupt is not pending when suspending; otherwise, problems may occur on 6xx/7xx/7xxx-based systems (except for powermacs, which use a separate suspend path). For example, with deep sleep on the 831x, a pending decrementer will cause a system freeze because the SoC thinks the decrementer interrupt would have woken the system, but the core must have interrupts disabled due to the setup required for deep sleep. Changed via-pmu.c to use the new ppc_md hooks, and made the arch_* functions call the generic_* functions unconditionally. -- paulus Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
53024fe2 |
|
13-Dec-2007 |
Milton Miller <miltonm@bga.com> |
[POWERPC] Optimize account_system_vtime We have multiple calls to has_feature being inlined, but gcc can't be sure that the store via get_paca() doesn't alias the path to cur_cpu_spec->feature. Reorder to put the calls to read_purr and read_spurr adjacent to each other. To add a sense of consistency, reorder the remaining lines to perform parallel steps on purr and scaled purr of each line instead of calculating and then using one value before going on to the next. In addition, we can tell gcc that no SPURR means no PURR. The test is completely hidden in the PURR case, and in the !PURR case the second test is eliminated resulting in the simple register copy in the out-of-line branch. Further, gcc sees get_paca()->system_time referenced several times and allocates a register to address it (shadowing r13) instead of caching its value. Reading into a local varable saves the shadow of r13 and removes a potentially duplicate load (between the nested if and its parent). Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
db3801a8 |
|
13-Dec-2007 |
Milton Miller <miltonm@bga.com> |
[POWERPC] Depend on ->initialized in calc_steal_time If CPU_FTR_PURR is not set, we will never set cpu_purr_data->initialized. Checking via __get_cpu_var on 64 bit avoids one dependent load compared to cpu_has_feature in the not-present case, and is always required when it is present. The code is under CONFIG_VIRT_CPU_ACCOUNTING so 32 bit will not be affected. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
6e6b44e8 |
|
13-Dec-2007 |
Milton Miller <miltonm@bga.com> |
[POWERPC] Timer interrupt: use a struct for two per_cpu varables timer_interrupt() was calculating per_cpu_offset several times, having to start from the toc because of potential aliasing issues. Placing both decrementer per_cpu varables in a struct and calculating the address once with __get_cpu_var results in better code on both 32 and 64 bit. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
8b5621f1 |
|
13-Dec-2007 |
Milton Miller <miltonm@bga.com> |
[POWERPC] Use __get_cpu_var in time.c Use __get_cpu_var(x) instead of per_cpu(x, smp_processor_id()), as it is optimized on ppc64 to access the current cpu's per-cpu offset directly; it's local_paca.offset instead of TOC->paca[local_paca->processor_id].offset. This is the trivial portion, two functions with one use each. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
c481887f |
|
13-Dec-2007 |
Milton Miller <miltonm@bga.com> |
[POWERPC] init_decrementer_clockevent can be static __init as its only called from time_init, which is __init. Also remove unneeded forward declaration. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
2b46b567 |
|
19-Nov-2007 |
Michael Neuling <mikey@neuling.org> |
[POWERPC] Fix possible division by zero in scaled time accounting If we get no user time and no system time allocated since the last account_system_vtime, the system to user time ratio estimate can end up dividing by zero. This was causing a problem noticed by Balbir Singh. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
0302f12e |
|
11-Nov-2007 |
Tony Breeds <tony@bakeyournoodle.com> |
[POWERPC] Demote clockevent printk to KERN_DEBUG These don't need to be seen by everyone on every boot. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
fa13a5a1 |
|
09-Nov-2007 |
Paul Mackerras <paulus@samba.org> |
sched: restore deterministic CPU accounting on powerpc Since powerpc started using CONFIG_GENERIC_CLOCKEVENTS, the deterministic CPU accounting (CONFIG_VIRT_CPU_ACCOUNTING) has been broken on powerpc, because we end up counting user time twice: once in timer_interrupt() and once in update_process_times(). This fixes the problem by pulling the code in update_process_times that updates utime and stime into a separate function called account_process_tick. If CONFIG_VIRT_CPU_ACCOUNTING is not defined, there is a version of account_process_tick in kernel/timer.c that simply accounts a whole tick to either utime or stime as before. If CONFIG_VIRT_CPU_ACCOUNTING is defined, then arch code gets to implement account_process_tick. This also lets us simplify the s390 code a bit; it means that the s390 timer interrupt can now call update_process_times even when CONFIG_VIRT_CPU_ACCOUNTING is turned on, and can just implement a suitable account_process_tick(). account_process_tick() now takes the task_struct * as an argument. Tested both with and without CONFIG_VIRT_CPU_ACCOUNTING. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
43875cc0 |
|
31-Oct-2007 |
Paul Mackerras <paulus@samba.org> |
[POWERPC] Fix off-by-one error in setting decrementer on Book E/4xx (v2) The decrementer in Book E and 4xx processors interrupts on the transition from 1 to 0, rather than on the 0 to -1 transition as on 64-bit server and 32-bit "classic" (6xx/7xx/7xxx) processors. At the moment we subtract 1 from the count of how many decrementer ticks are required before the next interrupt before putting it into the decrementer, which is correct for server/classic processors, but could possibly cause the interrupt to happen too early on Book E and 4xx if the timebase/decrementer frequency is low. This fixes the problem by making set_dec subtract 1 from the count for server and classic processors, instead of having the callers subtract 1. Since set_dec already had a bunch of ifdefs to handle different processor types, there is no net increase in ugliness. :) Note that calling set_dec(0) may not generate an interrupt on some processors. To make sure that decrementer_set_next_event always calls set_dec with an interval of at least 1 tick, we set min_delta_ns of the decrementer_clockevent to correspond to 2 ticks (2 rather than 1 to compensate for truncations in the conversions between ticks and ns). This also removes a redundant call to set the decrementer to 0x7fffffff - it was already set to that earlier in timer_interrupt. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
4603ac18 |
|
18-Oct-2007 |
Michael Neuling <mikey@neuling.org> |
powerpc: add scaled time accounting This adds POWERPC specific hooks for scaled time accounting. POWER6 includes a SPURR register. The SPURR is based off the PURR register but is scaled based on CPU frequency and issue rates. This gives a more accurate account of the instructions used per task. The PURR and timebase will be constant relative to the wall clock, irrespective of the CPU frequency. This implementation reads the SPURR register in account_system_vtime which is only call called on context witch and hard and soft irq entry and exit. The percentage of user and system time is then estimated using the ratio of these accounted by the PURR. If the SPURR is not present, the PURR read. An earlier implementation of this patch read the SPURR whenever the PURR was read, which included the system call entry and exit path. Unfortunately this showed a performance regression on lmbench runs, so was re-implemented. I've included the lmbench results here when run bare metal on POWER6. 1st column is the unpatch results. 2nd column is the results using the below patch and the 3rd is the % diff of these results from the base. 4th and 5th columns are the results and % differnce from the base using the older patch (SPURR read in syscall entry/exit path). Base Scaled-Acct SPURR-in-syscall Result Result % diff Result % diff Simple syscall: 0.3086 0.3086 0.0000 0.3452 11.8600 Simple read: 0.4591 0.4671 1.7425 0.5044 9.86713 Simple write: 0.4364 0.4366 0.0458 0.4731 8.40971 Simple stat: 2.0055 2.0295 1.1967 2.0669 3.06158 Simple fstat: 0.5962 0.5876 -1.442 0.6368 6.80979 Simple open/close: 3.1283 3.1009 -0.875 3.2088 2.57328 Select on 10 fd's: 0.8554 0.8457 -1.133 0.8667 1.32101 Select on 100 fd's: 3.5292 3.6329 2.9383 3.6664 3.88756 Select on 250 fd's: 7.9097 8.1881 3.5197 8.2242 3.97613 Select on 500 fd's: 15.2659 15.836 3.7357 15.873 3.97814 Select on 10 tcp fd's: 0.9576 0.9416 -1.670 0.9752 1.83792 Select on 100 tcp fd's: 7.248 7.2254 -0.311 7.2685 0.28283 Select on 250 tcp fd's: 17.7742 17.707 -0.375 17.749 -0.1406 Select on 500 tcp fd's: 35.4258 35.25 -0.496 35.286 -0.3929 Signal handler installation: 0.6131 0.6075 -0.913 0.647 5.52927 Signal handler overhead: 2.0919 2.1078 0.7600 2.1831 4.35967 Protection fault: 0.7345 0.7478 1.8107 0.8031 9.33968 Pipe latency: 33.006 16.398 -50.31 33.475 1.42368 AF_UNIX sock stream latency: 14.5093 30.910 113.03 30.715 111.692 Process fork+exit: 219.8 222.8 1.3648 229.37 4.35623 Process fork+execve: 876.14 873.28 -0.32 868.66 -0.8533 Process fork+/bin/sh -c: 2830 2876.5 1.6431 2958 4.52296 File /var/tmp/XXX write bw: 1193497 1195536 0.1708 118657 -0.5799 Pagefaults on /var/tmp/XXX: 3.1272 3.2117 2.7020 3.2521 3.99398 Also, kernel compile times show no difference with this patch applied. [pbadari@us.ibm.com: Avoid unnecessary PURR reading] Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1281c8be |
|
14-Oct-2007 |
Anton Blanchard <anton@samba.org> |
[POWERPC] Quieten clockevent printk The clockevent bootup message only needs to be KERN_INFO. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
cdec12ae |
|
11-Oct-2007 |
Paul Mackerras <paulus@samba.org> |
[POWERPC] Make clockevents work on PPC601 processors In testing the new clocksource and clockevent code on a PPC601 processor, I discovered that the clockevent multiplier value for the decrementer clockevent was overflowing. Because the RTCL register in the 601 effectively counts at 1GHz (it doesn't actually, but it increases by 128 every 128ns), and the shift value was 32, that meant the multiplier value had to be 2^32, which won't fit in an unsigned long on 32-bit. The same problem would arise on any platform where the timebase frequency was 1GHz or more (not that we actually have any such machines today). This fixes it by reducing the shift value to 16. Doing the calculations with a resolution of 2^-16 nanoseconds (15 femtoseconds) should be quite adequate. :) Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
d968014b |
|
08-Oct-2007 |
Paul Mackerras <paulus@samba.org> |
[POWERPC] Prevent decrementer clockevents from firing early On old powermacs, we sometimes set the decrementer to 1 in order to trigger a decrementer interrupt, which we use to handle an interrupt that was pending at the time when it was re-enabled. This was causing the decrementer clock event device to call the event function for the next event early, which was causing problems when high-res timers were not enabled. This fixes the problem by recording the timebase value at which the next event should occur, and checking the current timebase against the recorded value in timer_interrupt. If it isn't time for the next event, it just reprograms the decrementer and returns. This also subtracts 1 from the value stored into the decrementer, which is appropriate because the decrementer interrupts on the transition from 0 to -1, not when the decrementer reaches 0. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
d831d0b8 |
|
20-Sep-2007 |
Tony Breeds <tony@bakeyournoodle.com> |
[POWERPC] Implement clockevents driver for powerpc This registers a clock event structure for the decrementer and turns on CONFIG_GENERIC_CLOCKEVENTS, which means that we now don't need most of timer_interrupt(), since the work is done in generic code. For secondary CPUs, their decrementer clockevent is registered when the CPU comes up (the generic code automatically removes the clockevent when the CPU goes down). Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
4a4cfe38 |
|
21-Sep-2007 |
Tony Breeds <tony@bakeyournoodle.com> |
[POWERPC] Implement generic time of day clocksource for powerpc Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
aa3be5f3 |
|
20-Sep-2007 |
Tony Breeds <tony@bakeyournoodle.com> |
[POWERPC] Implement {read,update}_persistent_clock With these functions implemented we cooperate better with the generic timekeeping code. This obsoletes the need for the timer sysdev as a bonus. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
c27da339 |
|
18-Sep-2007 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
[POWERPC] Fix timekeeping on PowerPC 601 Recent changes to the timekeeping code broke support for the PowerPC 601 processor which doesn't have the usual timebase facility but a slightly different thing called (yuck) the RTC. This fixes it, boot tested on an old 601 based PowerMac 7200. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
aab69292 |
|
20-Aug-2007 |
Josh Boyer <jwboyer@linux.vnet.ibm.com> |
[POWERPC] 40x decrementer fixes Allow generic_calibrate_decr to work for 40x platforms. Given that the hardware behavior is identical, this also changes the set_dec function to reload the PIT on 40x to match the behavior 44x currently has. Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
|
#
9420dc65 |
|
29-Jul-2007 |
Jesper Juhl <jesper.juhl@gmail.com> |
[POWERPC] Clean out a bunch of duplicate includes This removes several duplicate includes from arch/powerpc/. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
1474855d |
|
20-Jul-2007 |
Bob Nelson <rrnelson@linux.vnet.ibm.com> |
[CELL] oprofile: add support to OProfile for profiling CELL BE SPUs From: Maynard Johnson <mpjohn@us.ibm.com> This patch updates the existing arch/powerpc/oprofile/op_model_cell.c to add in the SPU profiling capabilities. In addition, a 'cell' subdirectory was added to arch/powerpc/oprofile to hold Cell-specific SPU profiling code. Exports spu_set_profile_private_kref and spu_get_profile_private_kref which are used by OProfile to store private profile information in spufs data structures. Also incorporated several fixes from other patches (rrn). Check pointer returned from kzalloc. Eliminated unnecessary cast. Better error handling and cleanup in the related area. 64-bit unsigned long parameter was being demoted to 32-bit unsigned int and eventually promoted back to unsigned long. Signed-off-by: Carl Love <carll@us.ibm.com> Signed-off-by: Maynard Johnson <mpjohn@us.ibm.com> Signed-off-by: Bob Nelson <rrnelson@us.ibm.com> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org>
|
#
fc9069fe |
|
03-Jul-2007 |
Tony Breeds <tony@bakeyournoodle.com> |
[POWERPC] Modify sched_clock() to make CONFIG_PRINTK_TIME more sane When booting a current kernel with CONFIG_PRINTK_TIME enabled you'll see messages like: [ 0.000000] time_init: decrementer frequency = 188.044000 MHz [ 0.000000] time_init: processor frequency = 1504.352000 MHz [3712914.436297] Console: colour dummy device 80x25 This cause by the initialisation of tb_to_ns_scale in time_init(), suddenly the multiplication in sched_clock() now does something :). This patch modifies sched_clock() to report the offset since the machine booted so the same printk's now look like: [ 0.000000] time_init: decrementer frequency = 188.044000 MHz [ 0.000000] time_init: processor frequency = 1504.352000 MHz [ 0.000135] Console: colour dummy device 80x25 Effectivly including the uptime in printk()s. This patch makes tb_to_ns_scale and tb_to_ns_shift static and read_mostly for good measure. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
71712b45 |
|
22-Jun-2007 |
Tony Breeds <tony@bakeyournoodle.com> |
[POWERPC] Move iSeries_tb_recal into its own late_initcall. Currently iSeries will recalibrate the cputime_factors in the first settimeofday() call. It seems the reason for doing this is to ensure a resaonable time delta after time_init(). On current kernels (with udev), this call is made 40-60 seconds into the boot process, by moving it to a late initcall it is called approximately 5 seconds after time_init() is called. This is sufficient to recalibrate the timebase. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> CC: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
4cefebb1 |
|
07-Jun-2007 |
Michael Neuling <mikey@neuling.org> |
[POWERPC] Fix stolen time for SMT without LPAR For POWERPC, stolen time accounts for cycles lost to the hypervisor or PURR cycles attributed to the other SMT thread. Hence, when a PURR is available, we should still calculate stolen time, irrespective of being virtualised. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
df211c8a |
|
22-May-2007 |
Nathan Lynch <ntl@pobox.com> |
[POWERPC] Remove spinlock from struct cpu_purr_data cpu_purr_data is a per-cpu array used to account for stolen time on partitioned systems. It used to be the case that cpus accessed each others' cpu_purr_data, so each entry was protected by a spinlock. However, the code was reworked ("Simplify stolen time calculation") with the result that each cpu accesses its own cpu_purr_data and not those of other cpus. This means we can get rid of the spinlock as long as we're careful to disable interrupts when accessing cpu_purr_data in process context. Signed-off-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
e147ec8f |
|
11-May-2007 |
will schmidt <will_schmidt@vnet.ibm.com> |
[POWERPC] Simplify smp_space_timers Greatly simplify the function smp_space_timers. The stolen time calculation (per comment within the code) doesn't need the half-jiffy stagger any more. There isn't an issue with bouncing off global locks, so we really shouldn't need any sort of staggering at all. However, the last_jiffy value still needs to be set. This removes the extra stagger logic, and just sets the values. This change should benefit applications that rely on barrier synchronization, and will help cut down OS jitter. Boot tested across the board (G5,power3,power4,power5,970mp blade). Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
e2eb6392 |
|
03-Apr-2007 |
Stephen Rothwell <sfr@canb.auug.org.au> |
[POWERPC] Rename get_property to of_get_property: arch/powerpc Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
501b6d29 |
|
20-Nov-2006 |
Stephen Rothwell <sfr@canb.auug.org.au> |
[POWERPC] iSeries: fix time.c for combined build Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
df9c2309 |
|
21-Nov-2006 |
Kim Phillips <kim.phillips@freescale.com> |
[POWERPC] Revert "[POWERPC] Add powerpc get/set_rtc_time interface to new generic rtc class" This reverts commit 7a69af63e788a324d162201a0b23df41bcf158dd. As advised by David Brownell: http://marc.theaimsgroup.com/?l=linux-kernel&m=116387226902131&w=2 Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
cbcdb93d |
|
17-Oct-2006 |
Stephen Rothwell <sfr@canb.auug.org.au> |
[POWERPC] Simplify stolen time calculation In calculating stolen time, we were trying to actually account for time spent in the hypervisor. We don't really have enough information to do that accurately, so don't try. Instead, we now calculate stolen time as time that the current cpu thread is not actually dispatching instructions. On chips without a PURR, we cannot do this, so stolen time will always be zero. On chips with a PURR, this is merely the difference between the elapsed PURR values and the elapsed TB values. This gives us much more sane vaules from tools such as mpstat, even if they are still a bit strange e.g. 2 busy threads on one cpu will both appear to have 50% user time and 50% stolen time while 1 busy thread on a cpu will look like 100% user on one of them and 100% idle on the other. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
35a84c2f |
|
07-Oct-2006 |
Olaf Hering <olaf@aepfle.de> |
[POWERPC] Fix up after irq changes Remove struct pt_regs * from all handlers. Also remove the regs argument from get_irq() functions. Compile tested with arch/powerpc/config/* and arch/ppc/configs/prep_defconfig Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
7d12e780 |
|
05-Oct-2006 |
David Howells <dhowells@redhat.com> |
IRQ: Maintain regs pointer globally rather than passing to IRQ handlers Maintain a per-CPU global "struct pt_regs *" variable which can be used instead of passing regs around manually through all ~1800 interrupt handlers in the Linux kernel. The regs pointer is used in few places, but it potentially costs both stack space and code to pass it around. On the FRV arch, removing the regs parameter from all the genirq function results in a 20% speed up of the IRQ exit path (ie: from leaving timer_interrupt() to leaving do_IRQ()). Where appropriate, an arch may override the generic storage facility and do something different with the variable. On FRV, for instance, the address is maintained in GR28 at all times inside the kernel as part of general exception handling. Having looked over the code, it appears that the parameter may be handed down through up to twenty or so layers of functions. Consider a USB character device attached to a USB hub, attached to a USB controller that posts its interrupts through a cascaded auxiliary interrupt controller. A character device driver may want to pass regs to the sysrq handler through the input layer which adds another few layers of parameter passing. I've build this code with allyesconfig for x86_64 and i386. I've runtested the main part of the code on FRV and i386, though I can't test most of the drivers. I've also done partial conversion for powerpc and MIPS - these at least compile with minimal configurations. This will affect all archs. Mostly the changes should be relatively easy. Take do_IRQ(), store the regs pointer at the beginning, saving the old one: struct pt_regs *old_regs = set_irq_regs(regs); And put the old one back at the end: set_irq_regs(old_regs); Don't pass regs through to generic_handle_irq() or __do_IRQ(). In timer_interrupt(), this sort of change will be necessary: - update_process_times(user_mode(regs)); - profile_tick(CPU_PROFILING, regs); + update_process_times(user_mode(get_irq_regs())); + profile_tick(CPU_PROFILING); I'd like to move update_process_times()'s use of get_irq_regs() into itself, except that i386, alone of the archs, uses something other than user_mode(). Some notes on the interrupt handling in the drivers: (*) input_dev() is now gone entirely. The regs pointer is no longer stored in the input_dev struct. (*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does something different depending on whether it's been supplied with a regs pointer or not. (*) Various IRQ handler function pointers have been moved to type irq_handler_t. Signed-Off-By: David Howells <dhowells@redhat.com> (cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
|
#
7a69af63 |
|
26-Sep-2006 |
Kim Phillips <kim.phillips@freescale.com> |
[POWERPC] Add powerpc get/set_rtc_time interface to new generic rtc class Add powerpc get/set_rtc_time interface to new generic rtc class. This abstracts rtc chip specific code from the platform code for rtc-over-i2c platforms. Specific RTC chip support is now configured under Device Drivers -> Real Time Clock. Setting time of day from the RTC on startup is also configurable. this time without the potentially platform breaking initcall. Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
8ef38609 |
|
01-Oct-2006 |
Atsushi Nemoto <anemo@mba.ocn.ne.jp> |
[PATCH] kill wall_jiffies With 2.6.18-rc4-mm2, now wall_jiffies will always be the same as jiffies. So we can kill wall_jiffies completely. This is just a cleanup and logically should not change any real behavior except for one thing: RTC updating code in (old) ppc and xtensa use a condition "jiffies - wall_jiffies == 1". This condition is never met so I suppose it is just a bug. I just remove that condition only instead of kill the whole "if" block. [heiko.carstens@de.ibm.com: s390 build fix and cleanup] Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Andi Kleen <ak@muc.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
3171a030 |
|
29-Sep-2006 |
Atsushi Nemoto <anemo@mba.ocn.ne.jp> |
[PATCH] simplify update_times (avoid jiffies/jiffies_64 aliasing problem) Pass ticks to do_timer() and update_times(), and adjust x86_64 and s390 timer interrupt handler with this change. Currently update_times() calculates ticks by "jiffies - wall_jiffies", but callers of do_timer() should know how many ticks to update. Passing ticks get rid of this redundant calculation. Also there are another redundancy pointed out by Martin Schwidefsky. This cleanup make a barrier added by 5aee405c662ca644980c184774277fc6d0769a84 needless. So this patch removes it. As a bonus, this cleanup make wall_jiffies can be removed easily, since now wall_jiffies is always synced with jiffies. (This patch does not really remove wall_jiffies. It would be another cleanup patch) Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Acked-by: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
a4dc7ff0 |
|
18-Sep-2006 |
Paul Mackerras <paulus@samba.org> |
[POWERPC] Define of_read_ulong helper There are various places where we want to extract an unsigned long value from a device-tree property that can be 1 or 2 cells in length. This replaces some open-coded calculations, and one place where we assumed without checking that properties were the length we wanted, with a little of_read_ulong() helper. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
eb36c288 |
|
30-Aug-2006 |
Paul Mackerras <paulus@samba.org> |
[PATCH] ppc32: fix last_jiffy time comparison This fixes a hang on ppc32. The problem was that I was comparing a 32-bit quantity with a 64-bit quantity, and consequently time wasn't advancing. This makes us use a 64-bit quantity on all platforms, which ends up simplifying the code since we can now get rid of the tb_last_stamp variable (which actually fixes another bug that Ben H and I noticed while going carefully through the code). This works fine on my G4 tibook. Let me know how it goes on your machines. Acked-by: Olaf Hering <olaf@aepfle.de> Acked-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
e0d872d5 |
|
29-Aug-2006 |
Paul Mackerras <paulus@samba.org> |
[POWERPC] Fix problem with time not advancing on 32-bit platforms This fixes a problem introduced in 5db9fa9593e2ff69f2b95f9d59229dc4faaa564d. The last_jiffy per-cpu variable is only 32 bits on 32-bit machines, but it was being compared with a 64-bit quantity (tb_next_jiffy), which resulted in time not advancing. This fixes it by changing last_jiffy to be 64 bits on all platforms. With this, we no longer need tb_last_stamp as a 32-bit version of tb_last_jiffy, so this gets rid of tb_last_stamp and we just use tb_last_jiffy instead. This also fixes a bug when the boot cpu is not online, because using tb_last_stamp could have caused the wrong timebase origin value to be used when calculating the time of day. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
5db9fa95 |
|
22-Aug-2006 |
Nathan Lynch <ntl@pobox.com> |
[POWERPC] Fix gettimeofday inaccuracies There are two problems in the powerpc gettimeofday code which can cause incorrect results to be returned. The first is that there is a race between do_gettimeofday and the timer interrupt: 1. do_gettimeofday does get_tb() 2. decrementer exception on boot cpu which runs timer_recalc_offset, which also samples the timebase and updates the do_gtod structure with a greater timebase value. 3. do_gettimeofday calls __do_gettimeofday, which leads to the negative result from tb_val - temp_varp->tb_orig_stamp. The second is caused by taking the boot cpu offline, which can cause the value of tb_last_jiffy to be increased past the currently available timebase, causing the same underflow as above. [paulus@samba.org - define and use data_barrier() instead of mb().] Signed-off-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
a7f67bdf |
|
11-Jul-2006 |
Jeremy Kerr <jk@ozlabs.org> |
[POWERPC] Constify & voidify get_property() Now that get_property() returns a void *, there's no need to cast its return value. Also, treat the return value as const, so we can constify get_property later. powerpc core changes. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
6ab3d562 |
|
30-Jun-2006 |
Jörn Engel <joern@wohnheim.fh-wedel.de> |
Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
19923c19 |
|
26-Jun-2006 |
Roman Zippel <zippel@linux-m68k.org> |
[PATCH] fix and optimize clock source update This fixes the clock source updates in update_wall_time() to correctly track the time coming in via current_tick_length(). Optimize the fast paths to be as short as possible to keep the overhead low. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
260a4230 |
|
26-Jun-2006 |
John Stultz <johnstul@us.ibm.com> |
[PATCH] Time: Let user request precision from current_tick_length() Change the current_tick_length() function so it takes an argument which specifies how much precision to return in shifted nanoseconds. This provides a simple way to convert between NTPs internal nanoseconds shifted by (SHIFT_SCALE - 10) to other shifted nanosecond units that are used by the clocksource abstraction. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
0bb474a4 |
|
20-Jun-2006 |
Anton Blanchard <anton@samba.org> |
[POWERPC] support ibm,extended-*-frequency properties Support the ibm,extended-*-frequency properties found in recent POWER5 firmware: cpus/PowerPC,POWER5@0/clock-frequency 59aa5880 (1504336000) cpus/PowerPC,POWER5@0/ibm,extended-clock-frequency 00000000 59aa5880 cpus/PowerPC,POWER5@0/timebase-frequency 0b354b10 (188042000) cpus/PowerPC,POWER5@0/ibm,extended-timebase-frequency 00000000 0b354b10 Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
4bd174fe |
|
18-Apr-2006 |
Olof Johansson <olof@lixom.net> |
[PATCH] powerpc: Remove stale iseries global Not even the iSeries maintainer seems to have access to this legendary piranha simulator. It adds a bit of ugliness in the common time init code, and if it's no longer used we might as well be done with it and remove the bloat. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
224ad80a |
|
12-Apr-2006 |
Olof Johansson <olof@lixom.net> |
[PATCH] powerpc: Quiet time init output Move time_init console output to KERN_DEBUG prink level. No need to print it at every boot. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
0e551954 |
|
28-Mar-2006 |
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> |
[PATCH] for_each_possible_cpu: powerpc for_each_cpu() actually iterates across all possible CPUs. We've had mistakes in the past where people were using for_each_cpu() where they should have been iterating across only online or present CPUs. This is inefficient and possibly buggy. We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the future. This patch replaces for_each_cpu with for_each_possible_cpu. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
0a45d449 |
|
14-Mar-2006 |
Paul Mackerras <paulus@samba.org> |
powerpc: Fix problem with time going backwards The recent changes to keep gettimeofday in sync with xtime had the side effect that it was occasionally possible for the time reported by gettimeofday to go back by a microsecond. There were two reasons: (1) when we recalculated the offsets used by gettimeofday every 2^31 timebase ticks, we lost an accumulated fractional microsecond, and (2) because the update is done some time after the notional start of jiffy, if ntp is slowing the clock, it is possible to see time go backwards when the timebase factor gets reduced. This fixes it by (a) slowing the gettimeofday clock by about 1us in 2^31 timebase ticks (a factor of less than 1 in 3.7 million), and (b) adjusting the timebase offsets in the rare case that the gettimeofday result could possibly go backwards (i.e. when ntp is slowing the clock and the timer interrupt is late). In this case the adjustment will reduce to zero eventually because of (a). Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
2cf82c02 |
|
26-Feb-2006 |
Paul Mackerras <paulus@samba.org> |
powerpc: Export variables used in conversions to/from cputime_t The inline cputime_to_foo and foo_to_cputime conversion functions in include/asm-powerpc/cputime.h refer to 5 variables, which need to be exported if those functions are to be usable from modules. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
c6622f63 |
|
23-Feb-2006 |
Paul Mackerras <paulus@samba.org> |
powerpc: Implement accurate task and CPU time accounting This implements accurate task and cpu time accounting for 64-bit powerpc kernels. Instead of accounting a whole jiffy of time to a task on a timer interrupt because that task happened to be running at the time, we now account time in units of timebase ticks according to the actual time spent by the task in user mode and kernel mode. We also count the time spent processing hardware and software interrupts accurately. This is conditional on CONFIG_VIRT_CPU_ACCOUNTING. If that is not set, we do tick-based approximate accounting as before. To get this accurate information, we read either the PURR (processor utilization of resources register) on POWER5 machines, or the timebase on other machines on * each entry to the kernel from usermode * each exit to usermode * transitions between process context, hard irq context and soft irq context in kernel mode * context switches. On POWER5 systems with shared-processor logical partitioning we also read both the PURR and the timebase at each timer interrupt and context switch in order to determine how much time has been taken by the hypervisor to run other partitions ("steal" time). Unfortunately, since we need values of the PURR on both threads at the same time to accurately calculate the steal time, and since we can only calculate steal time on a per-core basis, the apportioning of the steal time between idle time (time which we ceded to the hypervisor in the idle loop) and actual stolen time is somewhat approximate at the moment. This is all based quite heavily on what s390 does, and it uses the generic interfaces that were added by the s390 developers, i.e. account_system_time(), account_user_time(), etc. This patch doesn't add any new interfaces between the kernel and userspace, and doesn't change the units in which time is reported to userspace by things such as /proc/stat, /proc/<pid>/stat, getrusage(), times(), etc. Internally the various task and cpu times are stored in timebase units, but they are converted to USER_HZ units (1/100th of a second) when reported to userspace. Some precision is therefore lost but there should not be any accumulating error, since the internal accumulation is at full precision. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
092b8f34 |
|
19-Feb-2006 |
Paul Mackerras <paulus@samba.org> |
powerpc: Keep xtime and gettimeofday in sync This fixes a regression which was introduced by moving ppc32 to use the same sort of lockless gettimeofday as ppc64 has been using for some time. This involves getting the timebase and performing some simple arithmetic to convert it to seconds and microseconds. However, the factor and offset used there weren't being updated when NTP varied the tick length using adjtimex. 64-bit didn't notice the problem because it had a hook in the 32-bit adjtimex compat routine that attempted to work out what the generic timekeeping code would do and alter the factor and offset to match. However, that code was very complex and it wasn't clear that it still matched what the generic code would do. Now we use the generic current_tick_length() routine that was recently added to check that the current tick will be as long as we expect; if not we recompute the factor and offset. This keeps gettimeofday and xtime in sync. In addition we check that gettimeofday hasn't got ahead of xtime on each timer interrupt; if it has, we resync. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
d8a8188d |
|
04-Feb-2006 |
Olaf Hering <olh@suse.de> |
[PATCH] powerpc: remove pointer/integer confusion in generic_calibrate_decr remove pointer/integer confusion Signed-off-by: Olaf Hering <olh@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
3356bb9f7 |
|
12-Jan-2006 |
David Gibson <david@gibson.dropbear.id.au> |
[PATCH] powerpc: Remove lppaca structure from the PACA At present the lppaca - the structure shared with the iSeries hypervisor and phyp - is contained within the PACA, our own low-level per-cpu structure. This doesn't have to be so, the patch below removes it, making a separate array of lppaca structures. This saves approximately 500*NR_CPUS bytes of image size and kernel memory, because we don't need aligning gap between the Linux and hypervisor portions of every PACA. On the other hand it means an extra level of dereference in many accesses to the lppaca. The patch also gets rid of several places where we assign the paca address to a local variable for no particular reason. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
404849bb |
|
23-Nov-2005 |
David Gibson <david@gibson.dropbear.id.au> |
[PATCH] powerpc: Remove some unneeded fields from the paca This patch removes several unnecessary fields from the paca: - next_jiffy_update_tb was simply unused. Remove trivially. - The exdsi exception save area was not used. There were plans to use it, but they never seem to have gone anywhere. If they ever do, we can put it back. Remove from the paca, and from asm-offsets.c - The default_decr field was used from asm, but was only ever assigned the value of tb_ticks_per_jiffy. Just access tb_ticks_per_jiffy from asm directly instead. Built and booted on POWER5 LPAR and iSeries RS64. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
6defa38b |
|
17-Nov-2005 |
Paul Mackerras <paulus@samba.org> |
powerpc: Fix delay functions for 601 processors My earlier merge of delay.h introduced a timebase-based udelay for 32-bit machines but also broke the 601, which doesn't have the timebase register. This fixes it by using the 601's RTC register on the 601, and also moves __delay() and udelay() to be out-of-line in arch/powerpc/kernel/time.c. These functions aren't really performance critical, after all. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
a7f290da |
|
11-Nov-2005 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel This patch moves the vdso's to arch/powerpc, adds support for the 32 bits vdso to the 32 bits kernel, rename systemcfg (finally !), and adds some new (still untested) routines to both vdso's: clock_gettime() with support for CLOCK_REALTIME and CLOCK_MONOTONIC, clock_getres() (same clocks) and get_tbfreq() for glibc to retreive the timebase frequency. Tom,Steve: The implementation of get_tbfreq() I've done for 32 bits returns a long long (r3, r4) not a long. This is such that if we ever add support for >4Ghz timebases on ppc32, the userland interface won't have to change. I have tested gettimeofday() using some glibc patches in both ppc32 and ppc64 kernels using 32 bits userland (I haven't had a chance to test a 64 bits userland yet, but the implementation didn't change and was tested earlier). I haven't tested yet the new functions. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
cbe62e2b |
|
09-Nov-2005 |
Paul Mackerras <paulus@samba.org> |
powerpc: Fix SMP time initialization problem We were getting the last_jiffy per-cpu variable set ahead of the current timebase in smp_space_timers on SMP machines. This caused the loop in timer_interrupt to loop virtually forever, since tb_ticks_since assumes that it will never be called with the timebase behind the last_jiffy value. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
799d6046 |
|
09-Nov-2005 |
Paul Mackerras <paulus@samba.org> |
[PATCH] powerpc: merge code values for identifying platforms This patch merges platform codes. systemcfg->platform is no longer used, systemcfg use in general is deprecated as much as possible (and renamed _systemcfg before it gets completely moved elsewhere in a future patch), _machine is now used on ppc64 along as ppc32. Platform codes aren't gone yet but we are getting a step closer. A bunch of asm code in head[_64].S is also turned into C code. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
732ee21f |
|
07-Nov-2005 |
Olof Johansson <olof@lixom.net> |
[PATCH] POWERPC/PPC64: Fix CONFIG_SMP=n build for ppc64 Two CONFIG_SMP=n build fixes due to missing <asm/smp.h> includes. Signed-off-by: Olof Johansson <olof@lixom.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
2249ca9d |
|
06-Nov-2005 |
Paul Mackerras <paulus@samba.org> |
powerpc: Various UP build fixes Mostly this involves adding #include <asm/smp.h>, since that defines things like boot_cpuid[_phys] and [gs]et_hard_smp_processor_id, which are SMP-related but still needed on UP. This incorporates fixes posted by Olof Johansson and Heikki Lindholm. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
8875ccfb |
|
01-Nov-2005 |
Kelly Daly <kelly@au.ibm.com> |
merge filename and modify references to iseries/it_lp_queue.h Signed-off-by: Kelly Daly <kelly@au.ibm.com>
|
#
8021b8a7 |
|
01-Nov-2005 |
Kelly Daly <kelly@au.ibm.com> |
merge filename and modify references to iseries/hv_call_xm.h Signed-off-by: Kelly Daly <kelly@au.ibm.com>
|
#
734d6524 |
|
30-Oct-2005 |
Paul Mackerras <paulus@samba.org> |
powerpc: apply recent changes to merged code Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
5f6b5b97 |
|
30-Oct-2005 |
Paul Mackerras <paulus@samba.org> |
powerpc: Fix time setting bug on 32-bit This fixes a bug where settimeofday would set the wrong parameters in do_gtod, resulting in gettimeofday returning a value about 4 hours after the correct time. The bug was that we divided a negative 64-bit value with do_div, which treated it as unsigned and gave us a result that was approximately 1.8e10 too large (since the divisor was 1e9). Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
0fd6f717 |
|
25-Oct-2005 |
Kumar Gala <galak@freescale.com> |
[PATCH] powerpc: Add support for Book-E timer config to generic_calibrate_decr We need to initialize some control SPRS for timers on Book-E before we start taking decrementer interrupts. Signed-off-by: Kumar K. Gala <kumar.gala@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
d2e61512 |
|
20-Oct-2005 |
Kumar Gala <galak@freescale.com> |
[PATCH] powerpc: Make sure we have an RTC before trying to adjust it Its valid for ppc_md.set_rtc_time to be NULL. We need to check that its non-NULL before trying to update the RTC. Signed-off-by: Kumar K. Gala <kumar.gala@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
96c44507 |
|
23-Oct-2005 |
Paul Mackerras <paulus@samba.org> |
powerpc: Fix time code for 601 processors The 601 doesn't have the timebase register; instead it has an RTCL register that counts nanoseconds and wraps at 1000000000, and an RTCU register that counts seconds. This makes the necessary changes for the merged time code to use the RTCL/U registers when the kernel is running on a 601. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
a5b518ed |
|
21-Oct-2005 |
Paul Mackerras <paulus@samba.org> |
ppc64/powerpc: Fix time initialization on SMP systems This moves smp_space_timers from arch/ppc64/kernel/smp.c to arch/powerpc/kernel/time.c and makes it initialize last_jiffy[] instead of paca[].next_jiffy_update_tb, since last_jiffy[] is now what the time code uses. It also declares smp_space_timers in include/asm-powerpc/time.h and gets rid of an ifdef in div128_by_32. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
5d14a18d |
|
20-Oct-2005 |
Paul Mackerras <paulus@samba.org> |
powerpc: Fix some bugs in the new merged time code I had the sense of the test for when to use the old 601-style RTC registers inverted. pmac_calibrate_decr and via_calibrate_decr weren't setting ppc_tb_freq, on which all the further calculations depended. Lastly, update_gtod was losing the top 32 bits of the new tb_to_xs value. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
374e99d4 |
|
20-Oct-2005 |
Paul Mackerras <paulus@samba.org> |
powerpc: Move some calculations from xxx_calibrate_decr to time_init Previously the individual xxx_calibrate_decr functions would each print the timebase and cpu frequency and calculate several values such as tb_to_us and tb_to_xs. This moves those printks and calculations into time_init just after the call to the platform's calibrate_decr function. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
f2783c15 |
|
19-Oct-2005 |
Paul Mackerras <paulus@samba.org> |
powerpc: Merge time.c and asm/time.h. We now use the merged time.c for both 32-bit and 64-bit compilation with ARCH=powerpc, and for ARCH=ppc64, but not for ARCH=ppc32. This removes setup_default_decr (folds its function into time_init) and moves wakeup_decrementer into time.c. This also makes an asm-powerpc/rtc.h. Signed-off-by: Paul Mackerras <paulus@samba.org>
|