History log of /linux-master/arch/powerpc/platforms/cell/spufs/sched.c
Revision Date Author Comments
# 5986f6b6 08-Mar-2022 YueHaibing <yuehaibing@huawei.com>

powerpc/spufs: Fix build warning when CONFIG_PROC_FS=n

arch/powerpc/platforms/cell/spufs/sched.c:1055:12: warning: ‘show_spu_loadavg’ defined but not used [-Wunused-function]
 static int show_spu_loadavg(struct seq_file *s, void *private)
            ^~~~~~~~~~~~~~~~

Move it into #ifdef block to fix this, also remove unneeded semicolon.
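
A minimal sketch of the resulting guard, assuming the function body stays as-is (names are taken from the warning above):

    #ifdef CONFIG_PROC_FS
    static int show_spu_loadavg(struct seq_file *s, void *private)
    {
            /* ... print the SPU load averages into the seq_file ... */
            return 0;
    }
    #endif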

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308100928.23540-1-yuehaibing@huawei.com


# 925f76c5 08-May-2020 Julia Lawall <Julia.Lawall@inria.fr>

powerpc/spufs: adjust list element pointer type

Other uses of &gang->aff_list_head, e.g. in spufs_assert_affinity, indicate
that the list elements have type spu_context, not spu as used here. Change
the type of tmp accordingly.

This has no impact on the execution, because tmp is not used in the body of
the loop.
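
A minimal sketch of the corrected declaration, assuming the loop shape used elsewhere in this file (tmp is only a cursor here):

    struct spu_context *tmp;    /* was: struct spu *tmp */

    list_for_each_entry(tmp, &gang->aff_list_head, aff_list) {
            /* loop body never dereferences tmp */
    }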

Fixes: c5fc8d2a92461 ("[CELL] cell: add placement computation for scheduling of affinity contexts")
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1588929176-28527-1-git-send-email-Julia.Lawall@inria.fr


# 9d061ba6 28-Jan-2021 Dietmar Eggemann <dietmar.eggemann@arm.com>

sched: Remove USER_PRIO, TASK_USER_PRIO and MAX_USER_PRIO

The only remaining use of MAX_USER_PRIO (and USER_PRIO) is the
SCALE_PRIO() definition in the PowerPC Cell architecture's Synergistic
Processor Unit (SPU) scheduler. TASK_USER_PRIO isn't used anymore.

Commit fe443ef2ac42 ("[POWERPC] spusched: Dynamic timeslicing for
SCHED_OTHER") copied SCALE_PRIO() from the task scheduler in v2.6.23.

Commit a4ec24b48dde ("sched: tidy up SCHED_RR") removed it from the task
scheduler in v2.6.24.

Commit 3ee237dddcd8 ("sched/prio: Add 3 macros of MAX_NICE, MIN_NICE and
NICE_WIDTH in prio.h") introduced NICE_WIDTH much later.

With:

    MAX_USER_PRIO = USER_PRIO(MAX_PRIO)
                  = MAX_PRIO - MAX_RT_PRIO

         MAX_PRIO = MAX_RT_PRIO + NICE_WIDTH

    MAX_USER_PRIO = MAX_RT_PRIO + NICE_WIDTH - MAX_RT_PRIO
                  = NICE_WIDTH

MAX_USER_PRIO can be replaced by NICE_WIDTH to be able to remove all the
{*_}USER_PRIO defines.
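
A hedged sketch of the resulting spufs macro, assuming SCALE_PRIO() in spufs/sched.c keeps its shape and only swaps the constant:

    #define SCALE_PRIO(x, prio) \
            max(x * (MAX_PRIO - prio) / (NICE_WIDTH / 2), MIN_SPU_TIMESLICE)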

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210128131040.296856-3-dietmar.eggemann@arm.com


# 7a3c90df 14-Jan-2021 Viresh Kumar <viresh.kumar@linaro.org>

arch: powerpc: Stop building and using oprofile

The "oprofile" user-space tools don't use the kernel OPROFILE support
any more, and haven't in a long time. User-space has been converted to
the perf interfaces.

This commit stops building oprofile for powerpc and removes any
reference to it from directories in arch/powerpc/ apart from
arch/powerpc/oprofile, which will be removed in the next commit (the
work is split into two commits because the size of the change became
very big, ~5k lines).

Note that the member "oprofile_cpu_type" in "struct cpu_spec" isn't
removed as it was also used by other parts of the code.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Robert Richter <rric@kernel.org>
Acked-by: William Cohen <wcohen@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Thomas Gleixner <tglx@linutronix.de>


# 3bd37062 23-Apr-2019 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

sched/core: Provide a pointer to the valid CPU mask

In commit:

4b53a3412d66 ("sched/core: Remove the tsk_nr_cpus_allowed() wrapper")

the tsk_nr_cpus_allowed() wrapper was removed. There was not
much difference in !RT but in RT we used this to implement
migrate_disable(). Within a migrate_disable() section the CPU mask is
restricted to single CPU while the "normal" CPU mask remains untouched.

As an alternative implementation Ingo suggested to use:

    struct task_struct {
            const cpumask_t *cpus_ptr;
            cpumask_t       cpus_mask;
    };

with

    t->cpus_ptr = &t->cpus_mask;

In -RT we then can switch the cpus_ptr to

    t->cpus_ptr = cpumask_of(task_cpu(p));

in a migration-disabled region. The rules are simple:

- Code that 'uses' ->cpus_allowed would use the pointer.
- Code that 'modifies' ->cpus_allowed would use the direct mask.
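
A hedged illustration of the two rules (cpumask helpers are from <linux/cpumask.h>; p, new_mask and do_something() are placeholders):

    /* 'uses': read through the pointer, honouring migrate_disable() on -RT */
    if (cpumask_test_cpu(task_cpu(p), p->cpus_ptr))
            do_something();

    /* 'modifies': write the underlying mask directly */
    cpumask_copy(&p->cpus_mask, new_mask);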

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190423142636.14347-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# de6cc651 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 153

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 or at your option any
later version this program is distributed in the hope that it will
be useful but without any warranty without even the implied warranty
of merchantability or fitness for a particular purpose see the gnu
general public license for more details you should have received a
copy of the gnu general public license along with this program if
not write to the free software foundation inc 675 mass ave cambridge
ma 02139 usa

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 77 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.837555891@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 8508cf3f 26-Oct-2018 Johannes Weiner <hannes@cmpxchg.org>

sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD

There are several definitions of those functions/macros in places that
mess with fixed-point load averages. Provide an official version.
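
The helpers being consolidated are the classic fixed-point ones; a sketch of their shape as it appears in <linux/sched/loadavg.h>:

    #define FSHIFT          11              /* nr of bits of precision */
    #define FIXED_1         (1 << FSHIFT)   /* 1.0 as fixed-point */
    #define LOAD_INT(x)     ((x) >> FSHIFT)
    #define LOAD_FRAC(x)    LOAD_INT(((x) & (FIXED_1 - 1)) * 100)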

[akpm@linux-foundation.org: fix missed conversion in block/blk-iolatency.c]
Link: http://lkml.kernel.org/r/20180828172258.3185-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3f3942ac 15-May-2018 Christoph Hellwig <hch@lst.de>

proc: introduce proc_create_single{,_data}

Variants of proc_create{,_data} that directly take a seq_file show
callback and drastically reduce the boilerplate code in the callers.

All trivial callers converted over.
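
For this file the conversion is presumably a one-liner; a hedged sketch using the spu_loadavg entry named elsewhere in this log:

    proc_create_single("spu_loadavg", 0, NULL, show_spu_loadavg);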

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 9a1015b3 20-Apr-2018 Alexey Dobriyan <adobriyan@gmail.com>

proc: fix /proc/loadavg regression

Commit 95846ecf9dac ("pid: replace pid bitmap implementation with IDR
API") changed last field of /proc/loadavg (last pid allocated) to be off
by one:

# unshare -p -f --mount-proc cat /proc/loadavg
0.00 0.00 0.00 1/60 2 <===

It should be 1 after first fork into pid namespace.

This is formally a regression but given how useless this field is I
don't think anyone is affected.

Bug was found by /proc testsuite!

Link: http://lkml.kernel.org/r/20180413175408.GA27246@avx2
Fixes: 95846ecf9dac508 ("pid: replace pid bitmap implementation with IDR API")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gargi Sharma <gs051095@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e99e88a9 16-Oct-2017 Kees Cook <keescook@chromium.org>

treewide: setup_timer() -> timer_setup()

This converts all remaining cases of the old setup_timer() API into using
timer_setup(), where the callback argument is the structure already
holding the struct timer_list. These should have no behavioral changes,
since they just change which pointer is passed into the callback with
the same available pointers after conversion. It handles the following
examples, in addition to some other variations.

Casting from unsigned long:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
        ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, ptr);

and forced object casts:

    void my_callback(struct something *ptr)
    {
        ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);

become:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
        ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

Direct function assignments:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
        ...
    }
    ...
    ptr->my_timer.function = my_callback;

have a temporary cast added, along with converting the args:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
        ...
    }
    ...
    ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;

And finally, callbacks without a data assignment:

    void my_callback(unsigned long data)
    {
        ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, 0);

have their argument renamed to verify they're unused during conversion:

    void my_callback(struct timer_list *unused)
    {
        ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

The conversion is done with the following Coccinelle script:

spatch --very-quiet --all-includes --include-headers \
-I ./arch/x86/include -I ./arch/x86/include/generated \
-I ./include -I ./arch/x86/include/uapi \
-I ./arch/x86/include/generated/uapi -I ./include/uapi \
-I ./include/generated/uapi --include ./include/linux/kconfig.h \
--dir . \
--cocci-file ~/src/data/timer_setup.cocci

@fix_address_of@
expression e;
@@

setup_timer(
-&(e)
+&e
, ...)

// Update any raw setup_timer() usages that have a NULL callback, but
// would otherwise match change_timer_function_usage, since the latter
// will update all function assignments done in the face of a NULL
// function initialization in setup_timer().
@change_timer_function_usage_NULL@
expression _E;
identifier _timer;
type _cast_data;
@@

(
-setup_timer(&_E->_timer, NULL, _E);
+timer_setup(&_E->_timer, NULL, 0);
|
-setup_timer(&_E->_timer, NULL, (_cast_data)_E);
+timer_setup(&_E->_timer, NULL, 0);
|
-setup_timer(&_E._timer, NULL, &_E);
+timer_setup(&_E._timer, NULL, 0);
|
-setup_timer(&_E._timer, NULL, (_cast_data)&_E);
+timer_setup(&_E._timer, NULL, 0);
)

@change_timer_function_usage@
expression _E;
identifier _timer;
struct timer_list _stl;
identifier _callback;
type _cast_func, _cast_data;
@@

(
-setup_timer(&_E->_timer, _callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, &_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, &_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
_E->_timer@_stl.function = _callback;
|
_E->_timer@_stl.function = &_callback;
|
_E->_timer@_stl.function = (_cast_func)_callback;
|
_E->_timer@_stl.function = (_cast_func)&_callback;
|
_E._timer@_stl.function = _callback;
|
_E._timer@_stl.function = &_callback;
|
_E._timer@_stl.function = (_cast_func)_callback;
|
_E._timer@_stl.function = (_cast_func)&_callback;
)

// callback(unsigned long arg)
@change_callback_handle_cast
depends on change_timer_function_usage@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
identifier _handle;
@@

void _callback(
-_origtype _origarg
+struct timer_list *t
)
{
(
... when != _origarg
_handletype *_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _timer);
... when != _origarg
|
... when != _origarg
_handletype *_handle =
-(void *)_origarg;
+from_timer(_handle, t, _timer);
... when != _origarg
|
... when != _origarg
_handletype *_handle;
... when != _handle
_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _timer);
... when != _origarg
|
... when != _origarg
_handletype *_handle;
... when != _handle
_handle =
-(void *)_origarg;
+from_timer(_handle, t, _timer);
... when != _origarg
)
}

// callback(unsigned long arg) without existing variable
@change_callback_handle_cast_no_arg
depends on change_timer_function_usage &&
!change_callback_handle_cast@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
@@

void _callback(
-_origtype _origarg
+struct timer_list *t
)
{
+ _handletype *_origarg = from_timer(_origarg, t, _timer);
+
... when != _origarg
- (_handletype *)_origarg
+ _origarg
... when != _origarg
}

// Avoid already converted callbacks.
@match_callback_converted
depends on change_timer_function_usage &&
!change_callback_handle_cast &&
!change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier t;
@@

void _callback(struct timer_list *t)
{ ... }

// callback(struct something *handle)
@change_callback_handle_arg
depends on change_timer_function_usage &&
!match_callback_converted &&
!change_callback_handle_cast &&
!change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
@@

void _callback(
-_handletype *_handle
+struct timer_list *t
)
{
+ _handletype *_handle = from_timer(_handle, t, _timer);
...
}

// If change_callback_handle_arg ran on an empty function, remove
// the added handler.
@unchange_callback_handle_arg
depends on change_timer_function_usage &&
change_callback_handle_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
identifier t;
@@

void _callback(struct timer_list *t)
{
- _handletype *_handle = from_timer(_handle, t, _timer);
}

// We only want to refactor the setup_timer() data argument if we've found
// the matching callback. This undoes changes in change_timer_function_usage.
@unchange_timer_function_usage
depends on change_timer_function_usage &&
!change_callback_handle_cast &&
!change_callback_handle_cast_no_arg &&
!change_callback_handle_arg@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type change_timer_function_usage._cast_data;
@@

(
-timer_setup(&_E->_timer, _callback, 0);
+setup_timer(&_E->_timer, _callback, (_cast_data)_E);
|
-timer_setup(&_E._timer, _callback, 0);
+setup_timer(&_E._timer, _callback, (_cast_data)&_E);
)

// If we fixed a callback from a .function assignment, fix the
// assignment cast now.
@change_timer_function_assignment
depends on change_timer_function_usage &&
(change_callback_handle_cast ||
change_callback_handle_cast_no_arg ||
change_callback_handle_arg)@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_func;
typedef TIMER_FUNC_TYPE;
@@

(
_E->_timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
;
|
_E->_timer.function =
-&_callback
+(TIMER_FUNC_TYPE)_callback
;
|
_E->_timer.function =
-(_cast_func)_callback;
+(TIMER_FUNC_TYPE)_callback
;
|
_E->_timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
;
|
_E._timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
;
|
_E._timer.function =
-&_callback;
+(TIMER_FUNC_TYPE)_callback
;
|
_E._timer.function =
-(_cast_func)_callback
+(TIMER_FUNC_TYPE)_callback
;
|
_E._timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
;
)

// Sometimes timer functions are called directly. Replace matched args.
@change_timer_function_calls
depends on change_timer_function_usage &&
(change_callback_handle_cast ||
change_callback_handle_cast_no_arg ||
change_callback_handle_arg)@
expression _E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_data;
@@

_callback(
(
-(_cast_data)_E
+&_E->_timer
|
-(_cast_data)&_E
+&_E._timer
|
-_E
+&_E->_timer
)
)

// If a timer has been configured without a data argument, it can be
// converted without regard to the callback argument, since it is unused.
@match_timer_function_unused_data@
expression _E;
identifier _timer;
identifier _callback;
@@

(
-setup_timer(&_E->_timer, _callback, 0);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, 0L);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, 0UL);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0L);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0UL);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0L);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0UL);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0);
+timer_setup(_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0L);
+timer_setup(_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0UL);
+timer_setup(_timer, _callback, 0);
)

@change_callback_unused_data
depends on match_timer_function_unused_data@
identifier match_timer_function_unused_data._callback;
type _origtype;
identifier _origarg;
@@

void _callback(
-_origtype _origarg
+struct timer_list *unused
)
{
... when != _origarg
}

Signed-off-by: Kees Cook <keescook@chromium.org>


# 95846ecf 17-Nov-2017 Gargi Sharma <gs051095@gmail.com>

pid: replace pid bitmap implementation with IDR API

Patch series "Replacing PID bitmap implementation with IDR API", v4.

This series replaces kernel bitmap implementation of PID allocation with
IDR API. These patches are written to simplify the kernel by replacing
custom code with calls to generic code.

The following are the stats for pid and pid_namespace object files
before and after the replacement. There is a noteworthy change between
the IDR and bitmap implementation.

Before
text data bss dec hex filename
8447 3894 64 12405 3075 kernel/pid.o
After
text data bss dec hex filename
3397 304 0 3701 e75 kernel/pid.o

Before
text data bss dec hex filename
5692 1842 192 7726 1e2e kernel/pid_namespace.o
After
text data bss dec hex filename
2854 216 16 3086 c0e kernel/pid_namespace.o

The following are the stats for ps, pstree and calling readdir on /proc
for 10,000 processes.

ps:
With IDR API With bitmap
real 0m1.479s 0m2.319s
user 0m0.070s 0m0.060s
sys 0m0.289s 0m0.516s

pstree:
With IDR API With bitmap
real 0m1.024s 0m1.794s
user 0m0.348s 0m0.612s
sys 0m0.184s 0m0.264s

proc:
With IDR API With bitmap
real 0m0.059s 0m0.074s
user 0m0.000s 0m0.004s
sys 0m0.016s 0m0.016s

This patch (of 2):

Replace the current bitmap implementation for Process ID allocation.
Functions that are no longer required, for example, free_pidmap(),
alloc_pidmap(), etc. are removed. The rest of the functions are
modified to use the IDR API. The change was made to make the PID
allocation less complex by replacing custom code with calls to generic
API.
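
A hedged sketch of the core replacement (simplified; the lock and variable names follow kernel/pid.c after this change, error handling elided):

    idr_preload(GFP_KERNEL);
    spin_lock_irq(&pidmap_lock);
    nr = idr_alloc_cyclic(&ns->idr, NULL, RESERVED_PIDS, pid_max, GFP_ATOMIC);
    spin_unlock_irq(&pidmap_lock);
    idr_preload_end();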

[gs051095@gmail.com: v6]
Link: http://lkml.kernel.org/r/1507760379-21662-2-git-send-email-gs051095@gmail.com
[avagin@openvz.org: restore the old behaviour of the ns_last_pid sysctl]
Link: http://lkml.kernel.org/r/20171106183144.16368-1-avagin@openvz.org
Link: http://lkml.kernel.org/r/1507583624-22146-2-git-send-email-gs051095@gmail.com
Signed-off-by: Gargi Sharma <gs051095@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 174cd4b1 02-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>

Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 4f17722c 08-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/loadavg.h>

We are going to split <linux/sched/loadavg.h> out of <linux/sched.h>, which
will have to be picked up from a couple of .c files.

Create a trivial placeholder <linux/sched/loadavg.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0c98d344 05-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/core: Remove the tsk_cpus_allowed() wrapper

So the original intention of tsk_cpus_allowed() was to 'future-proof'
the field - but it's pretty ineffectual at that, because half of
the code uses ->cpus_allowed directly ...

Also, the wrapper makes the code longer than the original expression!

So just get rid of it. This also shrinks <linux/sched.h> a bit.
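
The wrapper was just a field accessor; a hedged before/after sketch at a call site like the one in spufs/sched.c:

    /* before */
    cpumask_copy(&ctx->cpus_allowed, tsk_cpus_allowed(current));
    /* after */
    cpumask_copy(&ctx->cpus_allowed, &current->cpus_allowed);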

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 027dfac6 01-Jun-2016 Michael Ellerman <mpe@ellerman.id.au>

powerpc: Various typo fixes

Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# f2dec1ea 16-Jul-2014 Thomas Gleixner <tglx@linutronix.de>

powerpc: cell: Use ktime_get_ns()

Replace the ever recurring:

    ktime_get_ts(&ts);
    ns = timespec_to_ns(&ts);

with

    ns = ktime_get_ns();

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>


# 74b8af78 10-Feb-2014 Jeremy Kerr <jk@ozlabs.org>

powerpc/spufs: Remove MAX_USER_PRIO define

Current ppc64_defconfig fails with:

arch/powerpc/platforms/cell/spufs/sched.c:86:0: error: "MAX_USER_PRIO" redefined [-Werror]
cc1: all warnings being treated as errors

Commit 6b6350f155af ("sched: Expose some macros related to priority")
introduced a generic MAX_USER_PRIO macro to sched/prio.h, which is
causing the conflict. Use that one instead of our own.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Cc: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1392098717.689604.970589769393.1.gpush@pablo
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 993db4b4 11-Feb-2013 Ingo Molnar <mingo@kernel.org>

sched, powerpc: Fix sched.h split-up build failure

Fix PowerPC/Cell build fallout from:

8bd75c77b7c6 sched/rt: Move rt specific bits into new header file

Reported-by: Michael Ellerman <michael@ellerman.id.au>
Cc: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20130207094707.7b9f825f@riff.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 17cf22c3 02-Mar-2010 Eric W. Biederman <ebiederm@xmission.com>

pidns: Use task_active_pid_ns where appropriate

The expressions tsk->nsproxy->pid_ns and task_active_pid_ns
aka ns_of_pid(task_pid(tsk)) should have the same number of
cache line misses, with the practical difference that
ns_of_pid(task_pid(tsk)) is released later in a process's life.

Furthermore, by using task_active_pid_ns it becomes trivial
to write an unshare implementation for the pid namespace.

So I have used task_active_pid_ns everywhere I can.

In fork, since the pid has not yet been attached to the
process, I use ns_of_pid to achieve the same effect.
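
A hedged sketch of the substitution (task_active_pid_ns() is the real helper):

    /* before */
    ns = tsk->nsproxy->pid_ns;
    /* after */
    ns = task_active_pid_ns(tsk);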

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>


# ead53f22 22-Jul-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

powerpc: remove non-required uses of include <linux/module.h>

None of the files touched here are modules, and they are not
exporting any symbols either -- so there is no need to be including
module.h. Builds of all the files remain successful.

Even kernel/module.c does not need to include it, since it includes
linux/moduleloader.h instead.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 104699c0 27-Apr-2011 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

powerpc: Convert old cpumask API into new one

Adapt to the new API.

Almost all of the changes are trivial. The most important change is the
line below, because we plan to change the task->cpus_allowed
implementation.

- ctx->cpus_allowed = current->cpus_allowed;
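
The assignment presumably becomes a copy through the accessor API of the era (later removed by the tsk_cpus_allowed() commit above); a hedged sketch, the exact replacement in the patch may differ:

    cpumask_copy(&ctx->cpus_allowed, tsk_cpus_allowed(current));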

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include
those headers directly instead of assuming their availability. As this
conversion needs to touch a large number of source files, the
following script is used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following:

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of the
specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# fc537766 17-Sep-2009 Christoph Hellwig <hch@lst.de>

tracing: Remove markers

Now that the last users of markers have migrated to the event
tracer we can kill off the (now orphan) support code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090917173527.GA1699@lst.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ae142e0c 11-Jun-2009 Christoph Hellwig <hch@lst.de>

powerpc/sputrace: Use the generic event tracer

I wrote sputrace before the generic tracing infrastructure was available.
Now that we have the generic event tracer we can convert it over and
remove a lot of code:

8 files changed, 45 insertions(+), 285 deletions(-)

To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
the spufs trace channel by

echo 1 > /sys/kernel/debug/tracing/events/spufs/spufs_context/enable

and then read the trace records using e.g.

cat /sys/kernel/debug/tracing/trace

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 74019224 17-Feb-2009 Ingo Molnar <mingo@elte.hu>

timers: add mod_timer_pending()

Impact: new timer API

Based on an idea from Martin Josefsson with the help of
Patrick McHardy and Stephen Hemminger:

introduce the mod_timer_pending() API which is a mod_timer()
offspring that is an invariant on already removed timers.

(regular mod_timer() re-activates non-pending timers.)

This is useful for the networking code in that it can
allow unserialized mod_timer_pending() timer-forwarding
calls, but a single del_timer*() will stop the timer
from being reactivated again.
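
A hedged usage sketch (q is a placeholder structure with an embedded timer):

    /* forward the timer, but never resurrect one already deleted */
    mod_timer_pending(&q->timer, jiffies + HZ);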

Also while at it:

- optimize the regular mod_timer() path some more, the
timer-stat and a debug check was needlessly duplicated
in __mod_timer().

- make the exports come straight after the function, as
most other exports in timer.c already did.

- eliminate __mod_timer() as an external API, change the
users to mod_timer().

The regular mod_timer() code path is not impacted
significantly, due to inlining optimizations and due to
the simplifications.

Based-on-patch-from: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 86c6f274 26-Dec-2008 Rusty Russell <rusty@rustcorp.com.au>

cpumask: powerpc: Introduce cpumask_of_{node,pcibus} to replace {node,pcibus}_to_cpumask

Impact: New APIs

The old node_to_cpumask/node_to_pcibus returned a cpumask_t: these
return a pointer to a struct cpumask. Part of removing cpumasks from
the stack.

(Also replaces powerpc internal uses of node_to_cpumask).
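
A hedged sketch of the change at a call site (nid is a placeholder node id):

    /* before: returns a cpumask_t by value, eating stack */
    cpumask_t mask = node_to_cpumask(nid);

    /* after: returns a pointer to a shared, read-only mask */
    const struct cpumask *maskp = cpumask_of_node(nid);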

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 34318c25 20-Oct-2008 Andre Detsch <adetsch@br.ibm.com>

powerpc/spufs: Explain conditional decrement of aff_sched_count

This patch adds a comment to clarify why atomic_dec_if_positive is being used
to decrement the gang's aff_sched_count on SPU context unbind.

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# 10baa26c 20-Oct-2008 Andre Detsch <adetsch@br.ibm.com>

powerpc/spufs: Improve search of node for contexts with SPU affinity

This patch improves readability of the code responsible for trying to find
a node with enough SPUs not committed to other affinity gangs.

An additional check is also added, to avoid taking into account gangs that
have no SPU affinity.

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# b2e601d1 04-Sep-2008 Andre Detsch <adetsch@br.ibm.com>

powerpc/spufs: Fix possible scheduling of a context to multiple SPEs

We currently have a race when scheduling a context to a SPE -
after we have found a runnable context in spusched_tick, the same
context may have been scheduled by spu_activate().

This may result in a panic if we try to unschedule a context that has
been freed in the meantime.

This change exits spu_schedule() if the context has already been
scheduled, so we don't end up scheduling it twice.
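
A hedged sketch of the resulting guard in spu_schedule() (SPU_STATE_SAVED and state_mutex are real spufs names; the exact code is assumed):

    mutex_lock(&ctx->state_mutex);
    if (ctx->state == SPU_STATE_SAVED)      /* nobody scheduled it meanwhile */
            __spu_schedule(spu, ctx);
    spu_release(ctx);                       /* drops state_mutex */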

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# b65fe035 03-Sep-2008 Jeremy Kerr <jk@ozlabs.org>

powerpc/spufs: Fix race for a free SPU

We currently have a race for a free SPE. With one thread doing a
spu_yield(), and another doing a spu_activate():

thread 1                                thread 2
spu_yield(oldctx)                       spu_activate(ctx)
  __spu_deactivate(oldctx)
  spu_unschedule(oldctx, spu)
  spu->alloc_state = SPU_FREE
                                        spu = spu_get_idle(ctx)
                                        - searches for a SPE in
                                          state SPU_FREE, gets
                                          the context just
                                          freed by thread 1
                                        spu_schedule(ctx, spu)
                                          spu->alloc_state = SPU_USED
  spu_schedule(newctx, spu)
  - assumes spu is still free
  - tries to schedule context on
    already-used spu

This change introduces a 'free_spu' flag to spu_unschedule, to indicate
whether or not the function should free the spu after descheduling the
context. We only set this flag if we're not going to re-schedule
another context on this SPU.

Add a comment to document this behaviour.
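
A hedged sketch of a caller after the change ('new' being the next runnable context, if any; the exact code is assumed):

    /* mark the SPU free only if we won't immediately reuse it */
    spu_unschedule(spu, ctx, new == NULL);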

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# 9f43e391 01-Sep-2008 Jeremy Kerr <jk@ozlabs.org>

powerpc/spufs: Fix multiple get_spu_context()

Commit 8d5636fbca202f61fdb808fc9e20c0142291d802 introduced a reference
count on SPU contexts during find_victim, but this may cause a leak in
the reference count if we later find a better contender for a context to
unschedule.

Take the reference only after we've found our victim context, so we
don't do the extra get_spu_context().

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# cb9808d3 18-Aug-2008 Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

powerpc/spufs: Remove invalid semicolon after if statement

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# 8d5636fb 13-Aug-2008 Jeremy Kerr <jk@ozlabs.org>

powerpc/spufs: reference context while dropping state mutex in scheduler

Based on an original patch from Christoph Hellwig <hch@lst.de>.

Currently, there is a possible reference-after-free in the spusched
code - contexts may be freed after we have released their state_mutex
in spusched_tick and find_victim.

This change takes a reference to the context before releasing the
mutex, so that the context doesn't get destroyed.
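
A hedged sketch of the pin-across-unlock pattern (get_spu_context()/put_spu_context() are the spufs refcounting helpers):

    struct spu_context *ctx = get_spu_context(victim);  /* pin it */
    mutex_unlock(&victim->state_mutex);
    /* ... the context cannot be freed here ... */
    put_spu_context(ctx);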

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# ad1ede12 23-Jul-2008 Andre Detsch <adetsch@br.ibm.com>

powerpc/spufs: better placement of spu affinity reference context

This patch adjusts the placement of a reference context from
a spu affinity chain. The reference context can now be placed
only on nodes that have enough spus not intended to be used by
another gang (already running on the node).

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# 0855b543 23-Jul-2008 Andre Detsch <adetsch@br.ibm.com>

powerpc/spufs: fix aff_mutex and cbe_spu_info[n].list_mutex deadlock

Currently, it is possible to lock aff_mutex and
cbe_spu_info[n].list_mutex in different orders, allowing a deadlock to
occur. With this change, aff_mutex is not taken within a list_mutex
critical section anymore.

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# fabb6570 04-Jul-2008 Maxim Shchetynin <maxim@de.ibm.com>

powerpc/spufs: add atomic busy_spus counter to struct cbe_spu_info

As the nr_active counter also includes spus waiting for syscalls to return,
we need a separate counter that only counts spus that are currently running
on the spu side. This counter shall be used by a cpufreq governor that
targets a frequency dependent on the number of running spus.

Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 2442a8ba 05-Jun-2008 Luke Browning <lukebrowning@us.ibm.com>

powerpc/spufs: don't extend time slice if context is not in spu_run

An spu context shouldn't get an extra tick if the time slice code
couldn't find something else to run. This means contexts that are not
within spu_run (ie, SPU_SCHED_SPU_RUN is cleared) will not receive
extra ticks while we have no other contexts waiting.

Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# 028fda0a 15-Jun-2008 Luke Browning <lukebrowning@us.ibm.com>

powerpc/spufs: fix missed stop-and-signal event

There is a delay in the transition to the stopped state for class 2
interrupts. In some cases, the controlling thread detects the state of
the spu as running, and goes back to sleep resulting in a hung
application as the event is missed.

This change detects the stop condition and re-generates the wakeup event
after a context save.

Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# 2c911a14 12-Jun-2008 Luke Browning <lukebrowning@us.ibm.com>

powerpc/spufs: synchronize interaction between spu exception handling and time slicing

Time slicing can occur at the same time as spu exception handling
resulting in the wakeup of the wrong thread.

This change uses the spu's register_lock to enforce synchronization
between bind/unbind and spu exception handling so that they are
mutually exclusive.

Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# 08fcf1d6 12-May-2008 Luke Browning <lukebr@linux.vnet.ibm.com>

[POWERPC] spufs: Fix pointer reference in find_victim

If victim (not ctx) is in spu_run, add victim to rq.
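
A hedged sketch of the one-line fix in find_victim (SPU_SCHED_SPU_RUN is the real flag; the surrounding code is assumed):

    if (test_bit(SPU_SCHED_SPU_RUN, &victim->sched_flags))  /* was: ctx */
            spu_add_to_rq(victim);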

Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 7a28a154 07-May-2008 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: don't requeue victim contex in find_victim if it's not in spu_run

We should not requeue the victim context in find_victim if the owner is
not in spu_run. First, it's not needed, because leaving the context on
the spu is an optimization; second, it's harmful, because it means the
owner could re-enter spu_run when the context is on the runqueue and
trip the BUG_ON in __spu_update_sched_info.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# 7a214200 27-Apr-2008 Luke Browning <lukebr@linux.vnet.ibm.com>

[POWERPC] spufs: try to route SPU interrupts to local node

Currently, we re-route SPU interrupts to the current cpu, which may be
on a remote node. In the case of time slicing, all spu interrupts will
end up routed to the same cpu, where the spusched_tick occurs.

This change routes mfc interrupts to the cpu where the controlling
thread last ran, provided that cpu is on the same node as the spu
(otherwise don't reroute interrupts).

This should improve performance and provide a more predictable
environment for processing spu exceptions. In the past we have seen
concurrent delivery of spu exceptions to two cpus. This eliminates that
concern.

Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# 8a476d49 29-Apr-2008 Julio M. Merino Vidal <jmerino@ac.upc.edu>

[POWERPC] spufs: fix marker name for find_victim

Fix a typo in the marker for the find_victim function, which prevented
it from being traced. It previously read find_vitim.

Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# 5158e9b5 29-Apr-2008 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: add context switch notification log

There are userspace instrumentation tools that need to monitor spu
context switches. This patch adds a new file called 'switch_log' to
each spufs context directory that can be used to monitor the context
switches.

Context switch in/out and exit from spu_run are logged once the
file has been opened, and the records can be read from it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# 66747138 29-Apr-2008 Denis V. Lunev <den@openvz.org>

powerpc: use non-racy method for proc entries creation

Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
are set up before gluing the PDE to the main tree.

Add the correct ->owner to proc_fops to fix the reading/module-unloading race.
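
For this file, the non-racy creation is presumably the one-step variant (spu_loadavg_fops being the file ops sched.c registers):

    entry = proc_create("spu_loadavg", 0, NULL, &spu_loadavg_fops);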

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ce7c191b 04-Mar-2008 Jeremy Kerr <jk@ozlabs.org>

[POWERPC] spufs: don't (ab)use SCHED_IDLE

commit 4ef11014 introduced a usage of SCHED_IDLE to detect when
a context is within spu_run.

Instead of SCHED_IDLE (which has another meaning), add a flag to
sched_flags to tell if a context should be running.
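
A hedged sketch of the replacement test (SPU_SCHED_SPU_RUN is the flag this change adds; the call site is assumed):

    if (!test_bit(SPU_SCHED_SPU_RUN, &ctx->sched_flags))
            return;     /* context is not inside spu_run(), leave it alone */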

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# 2a58aa33 25-Feb-2008 Andre Detsch <adetsch@br.ibm.com>

[POWERPC] spufs: fix use time accounting on SPE-overcommit

The spu_runcntl_RW register is restored within the spu_restore function.
So, at the end of spu_bind_context, the SPU context is not just loaded,
but running.

This change corrects the state switch to account the time as USER.

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# 4ef11014 18-Feb-2008 Jeremy Kerr <jk@ozlabs.org>

[POWERPC] spufs: fix scheduler starvation by idle contexts

2.6.25 has a regression where we can starve the scheduler by creating
(N_SPES+1) contexts, then running them one at a time.

The final context will never be run, as the other contexts are loaded on
the SPEs, none of which are reported as free (ie, spu->alloc_state !=
SPU_FREE), so spu_get_idle() doesn't give us a spu to run on. Because
all of the contexts are stopped, none are descheduled by the scheduler
tick, as spusched_tick returns if spu_stopped(ctx).

This change replaces the spu_stopped() check with checking for SCHED_IDLE
in ctx->policy. We set a context's policy to SCHED_IDLE when we're not
in spu_run(). We also favour SCHED_IDLE contexts when looking for contexts
to unbind, but leave their timeslice intact for later resumption.

This patch fixes the following test in the spufs-testsuite:
tests/20-scheduler/02-yield-starvation

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>


# 038200cf 10-Jan-2008 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: Add marker-based tracing facility

This adds markers at two important points in the spufs code and a new
module (sputrace.ko) that allows reading these out through a proc file.

Long-term I'd rather see something like lttng extended to use the spufs
instrumentation, but for now I think this is a good enough quick
solution. We'll probably want to add various additional events on top
of the ones I have already.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# a5a97112 01-Jan-2008 Paul Mackerras <paulus@samba.org>

[POWERPC] Fix build failure on Cell when CONFIG_SPU_FS=y

Commit aed3a8c9bb1a8623a618232087c5ff62718e3b9a introduced a
definition of notify_spus_active in .../cell/spu_syscalls.c, and
another definition under #ifndef MODULE in .../cell/spufs/sched.c.
The latter is not necessary and causes the build to fail when
CONFIG_SPU_FS=y, so this removes it. It also removes the export
of do_notify_spus_active, which is unnecessary.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jeremy Kerr <jk@ozlabs.org>


# aed3a8c9 14-Dec-2007 Bob Nelson <rrnelson@linux.vnet.ibm.com>

[POWERPC] Oprofile: Remove dependency on spufs module

This removes an OProfile dependency on the spufs module. This
dependency was causing a problem for multiplatform systems that are
built with support for Oprofile on Cell but try to load the oprofile
module on a non-Cell system.

Signed-off-by: Bob Nelson <rrnelson@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 90608a29 20-Dec-2007 Aegis Lin <aegislin@gmail.com>

[POWERPC] spufs: Use separate timer for /proc/spu_loadavg calculation

The original spusched_timer was designed to take effect only when
a context is waiting in the runqueue.

This change adds an additional, lower-frequency timer to purely
handle the spu_load updates. The new timer will be triggered
per LOAD_FREQ ticks.
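
A hedged sketch of the second timer (spuloadavg_timer/spuloadavg_wake follow this file's naming convention; the exact names are assumed):

    setup_timer(&spuloadavg_timer, spuloadavg_wake, 0);
    mod_timer(&spuloadavg_timer, jiffies + LOAD_FREQ);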

Signed-off-by: Aegis Lin <aegislin@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# c9101bdb 20-Dec-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: make state_mutex interruptible

Make most places that use spu_acquire/spu_acquire_saved interruptible;
this allows getting out of the spufs code when e.g. pressing ctrl+c.
There are a few places where we get called e.g. from spufs teardown
routines where we can't simply err out, so these are left with a comment.
For now I've also not touched the poll routines because it's open what
libspe would expect in terms of interrupted system calls.
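
With this change spu_acquire() can fail; a hedged sketch of the new calling convention (a nonzero return on signal is assumed):

    ret = spu_acquire(ctx);
    if (ret)
            return ret;     /* interrupted, e.g. by ctrl+c */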

Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# e65c2f6f 20-Dec-2007 Luke Browning <lukebr@linux.vnet.ibm.com>

[POWERPC] spufs: decouple spu scheduler from spufs_spu_run (asynchronous scheduling)

Change spufs_spu_run so that the context is queued directly to the
scheduler and the controlling thread advances directly to spufs_wait()
for spe errors and exceptions.

nosched contexts are treated the same as before.

Fixes from Christoph Hellwig <hch@lst.de>

Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# b192541b 20-Dec-2007 Luke Browning <lukebr@linux.vnet.ibm.com>

[POWERPC] spufs: spu_find_victim may choose wrong victim

Need to re-check priority after dropping lock. Otherwise, a
more favored context may be preempted.

Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 91569531 20-Dec-2007 Luke Browning <lukebr@linux.vnet.ibm.com>

[POWERPC] spufs: reorganize spu_run_init

This cleans up spu_run_init so that it does all of the spu
initialization for spufs_run_spu. It initializes the spu context as
much as possible before it activates the spu and writes the runcntl
register.

Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# d6ad39bc 20-Dec-2007 Jeremy Kerr <jk@ozlabs.org>

[POWERPC] spufs: rework class 0 and 1 interrupt handling

Based on original patches from
Arnd Bergmann <arnd.bergman@de.ibm.com>; and
Luke Browning <lukebr@linux.vnet.ibm.com>

Currently, spu contexts need to be loaded to the SPU in order to take
class 0 and class 1 exceptions.

This change makes the actual interrupt-handlers much simpler (ie,
set the exception information in the context save area), and defers the
handling code to the spufs_handle_class[01] functions, called from
spufs_run_spu.

This should improve the concurrency of the spu scheduling, leading to
greater SPU utilization when SPUs are overcommitted.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 33bfd7a7 20-Dec-2007 Arnd Bergmann <arnd.bergmann@de.ibm.com>

[POWERPC] spufs: block fault handlers in spu_acquire_runnable

This change disables the logic that faults-in spu contexts under the
covers from the page fault handler. When a fault requires a runnable
context, the handler will block until the context is scheduled by
other means.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 7cd58e43 20-Dec-2007 Jeremy Kerr <jk@ozlabs.org>

[POWERPC] spufs: move fault, lscsa_alloc and switch code to spufs module

Currently, part of the spufs code (switch.o, lscsa_alloc.o and fault.o)
is compiled directly into the kernel.

This change moves these components into the spufs module.

The lscsa and switch objects are fairly straightforward to move in.

For the fault.o module, we split the fault-handling code into two
parts: a/p/p/c/spu_fault.c and a/p/p/c/spufs/fault.c. The former is for
the in-kernel spu_handle_mm_fault function, and we move the rest of the
fault-handling code into spufs.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 9b1d21f8 20-Dec-2007 Julio M. Merino Vidal <jmerino@ac.upc.edu>

[POWERPC] spufs: fix typos in sched.c comments

Fix a few typos in the spufs scheduler comments

Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# c0e7b4aa 18-Sep-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spusched: Fix null pointer dereference in find_victim

find_victim can dereference a NULL pointer when iterating over the list
of victim spus because list_mutex only guarantees spu->ctx to be stable,
but of course not to be non-NULL.

Also fix find_victim to not call spu_unbind_context without list_mutex
because that violates the above guarantee.
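
A hedged sketch of the guarded scan (cbe_spu_info, list_mutex and cbe_list are real names in this code; the victim-selection condition is abridged):

    mutex_lock(&cbe_spu_info[node].list_mutex);
    list_for_each_entry(spu, &cbe_spu_info[node].spus, cbe_list) {
            struct spu_context *tmp = spu->ctx;     /* may be NULL */

            if (tmp && tmp->prio > ctx->prio)
                    victim = tmp;
    }
    mutex_unlock(&cbe_spu_info[node].list_mutex);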

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 36ddbb13 18-Sep-2007 Andre Detsch <adetsch@br.ibm.com>

[POWERPC] spufs: Fix race condition on gang->aff_ref_spu

Affinity reference point location (gang->aff_ref_spu) is reset
when the whole gang is descheduled. However, the last member of
a gang can be descheduled while we are trying to schedule another
member of the gang. This was leading to a race condition, and
the code was using gang->aff_ref_spu in an unsafe manner.

By holding the gang->aff_mutex a little bit longer, and incrementing
gang->aff_sched_count (which controls when gang->aff_ref_spu
should be reset) a little bit earlier, the problem is fixed.

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 683e3ab2 30-Jul-2007 Andre Detsch <adetsch@br.ibm.com>

[POWERPC] spufs: Fix affinity after introduction of node_allowed() calls

This patch fixes affinity reference point placement, which was not being
done in some situations, after the introduction of node_allowed() calls.

The previously used parameter, 'ctx', is just the iterator of the
previous list_for_each_entry_reverse loop, and its value might be
invalid at the end of the loop. Also, the right context to seek
for information when defining the reference ctx location
_is_ the reference ctx.

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 6f6a6dc0 24-Jul-2007 Masato Noguchi <Masato.Noguchi@jp.sony.com>

[POWERPC] spufs: Fix incorrect initialization of cbe_spu_info.spus

We currently initialize cbe_spu_info[].spus in both init_spu_base and
spu_sched_init. The initialization in spu_sched_init clears the SPU list,
so we end up with no physical SPUs. Because of this, the spu_run
syscall will block forever.

This change removes the unnecessary initialization in spu_sched_init.

Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 486acd48 20-Jul-2007 Christoph Hellwig <hch@lst.de>

[CELL] spufs: rework list management and associated locking

This sorts out the various lists and related locks in the spu code.

In detail:

- the per-node free_spus and active_list are gone. Instead struct spu
gained an alloc_state member telling whether the spu is free or not
- the per-node spus array is now locked by a per-node mutex, which
takes over from the global spu_lock and the per-node active_mutex
- the spu_alloc* and spu_free functions are gone, as the state change is
now done inline in the spufs code. This allows some more sharing of
code for the affinity vs normal case and more efficient locking
- some little refactoring in the affinity code for this locking scheme

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# 1474855d 20-Jul-2007 Bob Nelson <rrnelson@linux.vnet.ibm.com>

[CELL] oprofile: add support to OProfile for profiling CELL BE SPUs

From: Maynard Johnson <mpjohn@us.ibm.com>

This patch updates the existing arch/powerpc/oprofile/op_model_cell.c
to add in the SPU profiling capabilities. In addition, a 'cell' subdirectory
was added to arch/powerpc/oprofile to hold Cell-specific SPU profiling code.
Exports spu_set_profile_private_kref and spu_get_profile_private_kref which
are used by OProfile to store private profile information in spufs data
structures.

Also incorporated several fixes from other patches (rrn): check the
pointer returned from kzalloc, eliminate an unnecessary cast, better
error handling and cleanup in the related area, and a 64-bit unsigned
long parameter that was being demoted to a 32-bit unsigned int and
eventually promoted back to unsigned long.

Signed-off-by: Carl Love <carll@us.ibm.com>
Signed-off-by: Maynard Johnson <mpjohn@us.ibm.com>
Signed-off-by: Bob Nelson <rrnelson@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>


# 36aaccc1 20-Jul-2007 Bob Nelson <rrnelson@linux.vnet.ibm.com>

[CELL] oprofile: enable SPU switch notification to detect currently active SPU tasks

From: Maynard Johnson <mpjohn@us.ibm.com>

This patch adds to the capability of spu_switch_event_register so that
the caller is also notified of currently active SPU tasks.
Exports spu_switch_event_register and spu_switch_event_unregister so
that OProfile can get access to the notifications provided.

Signed-off-by: Maynard Johnson <mpjohn@us.ibm.com>
Signed-off-by: Carl Love <carll@us.ibm.com>
Signed-off-by: Bob Nelson <rrnelson@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>


# cbc23d3e 20-Jul-2007 Arnd Bergmann <arnd@arndb.de>

[CELL] spufs: integration of SPE affinity with the scheduler

This patch makes the scheduler honor affinity information for each
context being scheduled. If the context has no affinity information,
behaviour is unchanged. If there is affinity information, the context
is scheduled to run on the exact spu recommended by the affinity
placement algorithm.

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# c5fc8d2a 20-Jul-2007 Arnd Bergmann <arnd@arndb.de>

[CELL] cell: add placement computation for scheduling of affinity contexts

This patch provides the spu affinity placement logic for the spufs scheduler.
Each time a gang is going to be scheduled, the placement of a reference
context is defined. The placement of all other contexts with affinity from
the gang is defined based on this reference context location and on a
precomputed displacement offset.

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# aa6d5b20 20-Jul-2007 Arnd Bergmann <arnd@arndb.de>

[CELL] cell: add per BE structure with info about its SPUs

Addition of a spufs-global "cbe_info" array. Each entry contains information
about one Cell/B.E. node, namely:
* list of spus (both free and busy spus are in this list);
* list of free spus (replacing the static spu_list from spu_base.c)
* number of spus;
* number of reserved (non-schedulable) spus.

The SPE affinity implementation actually requires only access to one spu per
BE node (since it implements its own pointer to walk through the other spus
of the ring) and the number of schedulable spus (n_spus - non_sched_spus).
However, having this more general structure can be useful for other
functionality, concentrating per-cbe statistics / data.
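
Roughly, the structure has this shape (a sketch built from the list above; exact member names beyond those mentioned are assumptions):

struct cbe_spu_info {
        struct list_head spus;          /* all spus, both free and busy */
        struct list_head free_spus;     /* replaces the static spu_list */
        int n_spus;                     /* total spus on this BE node */
        int non_sched_spus;             /* reserved, not schedulable */
};

extern struct cbe_spu_info cbe_spu_info[MAX_NUMNODES];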

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# 7e90b749 20-Jul-2007 Masato Noguchi <Masato.Noguchi@jp.sony.com>

[CELL] spufs: use find_first_bit() instead of sched_find_first_bit()

spu_sched->bitmap is MAX_PRIO (= 140) bits wide. However, since
ff80a77f20f811c0cc5b251d0f657cbc6f788385, sched_find_first_bit()
only supports 100-bit bitmaps.

Thus, spu_sched->bitmap should be handled by the generic find_first_bit().
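
A minimal sketch of the resulting lookup (the generic helper takes an explicit size in bits):

#include <linux/bitops.h>
#include <linux/sched.h>        /* MAX_PRIO */

/*
 * spu_sched->bitmap is MAX_PRIO (140) bits wide; sched_find_first_bit()
 * assumes 100 bits, so the generic helper must be used instead.
 */
static int best_runnable_prio(unsigned long *bitmap)
{
        return find_first_bit(bitmap, MAX_PRIO);
}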

Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# 27ec41d3 20-Jul-2007 Andre Detsch <adetsch@br.ibm.com>

[CELL] spufs: add spu stats in sysfs and ctx stat file in spufs

This patch exports per-context statistics in spufs as well as spu
statistics in sysfs.

It was formed by merging:
"spufs: add spu stats in sysfs" From: Christoph Hellwig
"spufs: add stat file to spufs" From: Christoph Hellwig
"spufs: fix libassist accounting" From: Jeremy Kerr
"spusched: fix spu utilization statistics" From: Luke Browning
And some adjustments by myself, after suggestions on cbe-oss-dev.

Having separate patches was making the review process harder
than it should be, as we ended up integrating the spu and ctx statistics
accounting much more than in the first implementation.

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# e840cfe6 20-Jul-2007 Jeremy Kerr <jk@ozlabs.org>

[CELL] spufs: Remove spurious WARN_ON for spu_deactivate for NOSCHED contexts

In 6cbf93960e64f313f6e247cbca7afaa50e3ee2c we added a WARN_ON for
calling spu_deactivate on contexts created with the SPU_CREATE_NOSCHED
flag. However, all NOSCHED contexts will need to be deactivated when
the context is destroyed, so this gives a spurious warning when any
NOSCHED context is closed.

This change removes the WARN_ON.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# d1450317 20-Jul-2007 Sebastian Siewior <sebastian@breakpoint.cc>

[CELL] spufs: remove section mismatch warning

WARNING: arch/powerpc/platforms/cell/spufs/spufs.o(.init.text+0x158): Section
mismatch: reference to .exit.text:.spu_sched_exit (between '.init_module' and
'.spu_sched_init')

was introduced by c99c1994a2bb9493b4ac372b2b6ee2606d291171
This patch removes the warning.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# fe2f896d 28-Jun-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: Add spu stats in sysfs

Export spu statistics in sysfs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 27449971 28-Jun-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spusched: Fix runqueue corruption

spu_activate can be called from multiple threads at the same time on
behalf of the same spu context. We need to make sure to only add it
once to avoid runqueue corruption.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# c77239b8 28-Jun-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spusched: Disable tick when not needed

Only enable the scheduler tick if we have any context waiting to be
scheduled.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# e9f8a0b6 28-Jun-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: Add stat file to spufs

Export per-context statistics in spufs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 65de66f0 28-Jun-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: Implement /proc/spu_loadavg

Provide load average information for spu contexts. The format
is identical to /proc/loadavg, which is also where a lot of the code
and concepts are borrowed from.
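
The borrowed calculation is the classic fixed-point exponential decay; roughly (a sketch using the CALC_LOAD idiom from linux/sched.h of that era; the spu_avenrun name and the way 'active' is sampled are assumptions):

#include <linux/sched.h>        /* FIXED_1, EXP_1, EXP_5, EXP_15, CALC_LOAD */

static unsigned long spu_avenrun[3];

/* called periodically with the number of runnable spu contexts */
static void spu_calc_load(unsigned long active)
{
        active *= FIXED_1;      /* convert to 11-bit fixed point */
        CALC_LOAD(spu_avenrun[0], EXP_1, active);       /* 1 min */
        CALC_LOAD(spu_avenrun[1], EXP_5, active);       /* 5 min */
        CALC_LOAD(spu_avenrun[2], EXP_15, active);      /* 15 min */
}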

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 476273ad 28-Jun-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: Add tid file

The new tid file contains the ID of the thread currently running the
context, if any. This is used so that the new spu-top and spu-ps
tools can find the thread in /proc.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 7022543e 28-Jun-2007 Jeremy Kerr <jk@ozlabs.org>

[POWERPC] spufs: Trivial whitespace fixes

Remove redundant whitespace in arch/powerpc/platforms/cell/spufs/

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# df09cf3e 28-Jun-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spusched: No preemption for nosched contexts

And last but not least we need to make sure the scheduler tick never
preempts a nosched context.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 46cbf939 28-Jun-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spusched: Catch nosched contexts in spu_deactivate

spu_deactivate should never be called for nosched contexts. Put in
a check so we can print a stacktrace and exit early in case it
happens erroneously.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# ea1ae594 28-Jun-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spusched: fix cpu/node binding

Add a cpus_allowed field to struct spu_context so that we always
use the cpu mask of the owning thread instead of that of whichever
thread happens to call into the scheduler. Also use this information
in grab_runnable_context to avoid spurious wakeups.
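
A sketch of how the saved mask can feed the node check (node_allowed() is named elsewhere in this log; the body here is an assumption, written with the current cpumask API):

#include <linux/cpumask.h>
#include <linux/topology.h>

static int node_allowed(struct spu_context *ctx, int node)
{
        /* test against the owning thread's mask saved at spu_run time */
        return cpumask_intersects(cpumask_of_node(node), &ctx->cpus_allowed);
}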

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 2cf2b3b4 28-Jun-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spusched: Update scheduling parameters on every spu_run

Update scheduling information on every spu_run to allow for setting
threads to realtime priority just before running them. This requires
some slightly ugly code in spufs_run_spu, because we can just update
the information unlocked if the spu is not runnable, but we need to
acquire the active_mutex when it is runnable to protect against
find_victim. This locking scheme requires open-coding
spu_acquire_runnable in spufs_run_spu, which actually is a nice cleanup
all by itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# f3f59bec 28-Jun-2007 Jeremy Kerr <jk@ozlabs.org>

[POWERPC] spusched: Print out scheduling tunables with DEBUG

Print out a few scheduler tuning parameters when we've compiled
with DEBUG defined.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 60e24239 28-Jun-2007 Jeremy Kerr <jk@ozlabs.org>

[POWERPC] spusched: Fix timeslice calculations

The current timeslice code mixes 'jiffies' up with 'spesched ticks'. This
change correctly defines the number of time slices each SPE context is
given, and clarifies the comment.

This brings the default timeslice for SPE contexts into a reasonable
range.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# fe443ef2 28-Jun-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spusched: Dynamic timeslicing for SCHED_OTHER

Enable preemptive scheduling for non-RT contexts.

We use the same algorithms as the CPU scheduler to calculate the time
slice length, and for now we also use the same timeslice length as the
CPU scheduler. This might be not enough for good performance and can be
changed after some benchmarking.

Note that currently we do not boost the priority for contexts waiting
on the runqueue for a long time, so contexts with a higher nice value
could starve ones with less priority. This could easily be fixed once
the rework of the spu lists that Luke and I discussed is done.
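
The CPU-scheduler-style calculation amounts to something like this (a sketch consistent with the SCALE_PRIO() definition this file carries; the constants are illustrative, not authoritative):

#define SPUSCHED_TICK           10      /* cpu ticks per spu scheduler tick */
#define NORMAL_PRIO             120     /* boundary between rt and normal */
#define MIN_SPU_TIMESLICE       max(5 * HZ / (1000 * SPUSCHED_TICK), 1)
#define DEF_SPU_TIMESLICE       (100 * HZ / (1000 * SPUSCHED_TICK))

/* longer slices for better (numerically lower) priorities, with a floor */
#define SCALE_PRIO(x, prio) \
        max((x) * (MAX_PRIO - (prio)) / (MAX_USER_PRIO / 2), MIN_SPU_TIMESLICE)

static void spu_set_timeslice(struct spu_context *ctx)
{
        if (ctx->prio < NORMAL_PRIO)
                ctx->time_slice = SCALE_PRIO(DEF_SPU_TIMESLICE * 4, ctx->prio);
        else
                ctx->time_slice = SCALE_PRIO(DEF_SPU_TIMESLICE, ctx->prio);
}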

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 37901802 28-Jun-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spusched: Switch from workqueues to kthread + timer tick

Replace the scheduler workqueues, which complicated things a lot, with
a dedicated spu scheduler thread that gets woken by a traditional
scheduler tick. By default this scheduler tick runs at HZ / 10, i.e.
one spu scheduler tick for every 10 cpu ticks.

Currently the tick is not disabled when we have fewer contexts than
available spus, but I will implement this later.
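
In outline (a sketch using the timer API of that era; the function names and bodies are assumptions):

#include <linux/kthread.h>
#include <linux/timer.h>

static struct task_struct *spusched_task;
static struct timer_list spusched_timer;

static void spusched_wake(unsigned long data)
{
        /* rearm for the next spu tick, then kick the scheduler thread */
        mod_timer(&spusched_timer, jiffies + SPUSCHED_TICK);
        wake_up_process(spusched_task);
}

static int spusched_thread(void *unused)
{
        while (!kthread_should_stop()) {
                set_current_state(TASK_INTERRUPTIBLE);
                schedule();
                /* walk the active contexts and charge one tick to each */
        }
        return 0;
}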

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# e5c0b9ec 04-Jun-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: Don't yield nosched context

Nosched contexts should never be scheduled out, thus we must never
deactivate them in spu_yield.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# bb5db29a 04-Jun-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs scheduler: Fix wakeup races

Fix the race between checking for contexts on the runqueue and actually
waking them in spu_deactivate and spu_yield.

The guts of spu_reschedule are split into a new helper called
grab_runnable_context, which checks whether there is a runnable thread
below a specified priority and, if so, removes it from the runqueue and
uses it. This function is used by the new __spu_deactivate helper shared
by preemption and spu_yield to grab a new context before deactivating
the old one.
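
A sketch of the helper (the runqueue field names follow other entries in this log; __spu_del_from_rq is a hypothetical detail):

static struct spu_context *grab_runnable_context(int prio)
{
        struct spu_context *ctx = NULL;
        int best;

        spin_lock(&spu_prio->runq_lock);
        best = find_first_bit(spu_prio->bitmap, prio);
        if (best < prio) {
                /* take the first waiter at the best priority */
                ctx = list_first_entry(&spu_prio->runq[best],
                                       struct spu_context, rq);
                __spu_del_from_rq(ctx, best);
        }
        spin_unlock(&spu_prio->runq_lock);
        return ctx;
}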

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# e63340ae 08-May-2007 Randy Dunlap <randy.dunlap@oracle.com>

header cleaning: don't include smp_lock.h when not used

Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4e0f4ed0 23-Apr-2007 Luke Browning <lukebrowning@us.ibm.com>

[POWERPC] spu sched: make addition to stop_wq and runque atomic vs wakeup

Addition to stop_wq needs to happen before adding to the runqueue and
under the same lock, so that we don't have a race window for a lost
wakeup in the spu scheduler.

Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# a475c2f4 23-Apr-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: remove woken threads from the runqueue early

A single context should only be woken once, and we should not have
more wakeups for a given priority than the number of contexts on
that runqueue position.

Also add some asserts to trap future problems in this area more
easily.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# 390c5343 23-Apr-2007 Arnd Bergmann <arnd.bergmann@de.ibm.com>

[POWERPC] spufs: add memory barriers after set_bit

set_bit does not guarantee ordering on powerpc, so using it
for communication between threads requires explicit
mb() calls.
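
The pattern being fixed looks roughly like this (a sketch; the flag name is hypothetical):

/* set_bit() is atomic but implies no memory barrier on powerpc */
static void mark_runnable_and_wake(struct spu_context *ctx)
{
        set_bit(SPU_SCHED_WAKE, &ctx->sched_flags);
        mb();   /* publish the flag before waking the other thread */
        wake_up(&ctx->stop_wq);
}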

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# e097b513 23-Apr-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spu sched: ensure preempted threads are put back on the runqueue, part2

To not lose a spu thread, we need to make sure it always gets put back
on the runqueue: in find_victim as well as in the scheduler tick, as done
in the previous patch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# b3e76cc3 23-Apr-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spu sched: ensure preempted threads are put back on the runqueue

To not lose a spu thread we need to make sure it always gets put back
on the runqueue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# 08873095 23-Apr-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: use cancel_rearming_delayed_workqueue when stopping spu contexts

The scheduler workqueue may rearm itself and deadlock when we try to stop
it. Put a flag in place to skip the work if we're tearing down
the context.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# dbf8eefa 23-Mar-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: don't yield CPU in spu_yield

There is no reason to yield the CPU in spu_yield - if the backing
thread reenters spu_run it gets added to the end of the runqueue for
its priority. So the yield is just a slowdown for the case where
we have higher priority contexts waiting.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 94b2a439 09-Mar-2007 Benjamin Herrenschmidt <benh@kernel.crashing.org>

[POWERPC] Fix spu SLB invalidations

The SPU code doesn't properly invalidate the SPUs' SLBs when necessary,
for example when changing a segment size from the hugetlbfs code. In
addition, it saves and restores the SLB content on context switches,
which makes it harder to properly handle those invalidations.

This patch removes the saving & restoring for now, something more
efficient might be found later on. It also adds a spu_flush_all_slbs(mm)
that can be used by the core mm code to flush the SLBs of all SPEs that
are running a given mm at the time of the flush.

In order to do that, it adds a spinlock to the list of all SPEs and moves
some bits & pieces from spufs to spu_base.c

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 50b520d4 09-Mar-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] avoid SPU_ACTIVATE_NOWAKE optimization

This optimization was added recently but is still buggy,
so back it out for now.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# 2eb1b120 13-Feb-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spu sched: static timeslicing for SCHED_RR contexts

For SCHED_RR tasks we can do some really trivial timeslicing. Basically
we fire up a timer for every scheduler tick that searches for a higher
or same priority thread on the runqueue and, if there is one,
context switches to it. Because we can't lock spus from timer context,
we actually run this from a delayed workqueue instead of a timer.
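
Roughly (a sketch; the work item hookup and the names are assumptions):

#include <linux/workqueue.h>

static void spusched_tick_fn(struct work_struct *work);
static DECLARE_DELAYED_WORK(spusched_work, spusched_tick_fn);

static void spusched_tick_fn(struct work_struct *work)
{
        /*
         * Search the runqueue for a context of equal or higher priority
         * and, if one exists, switch to it; then rearm for the next tick.
         */
        schedule_delayed_work(&spusched_work, SPU_TIMESLICE);
}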

A nice optimization would be to skip the actual priority bitmap search
when there are fewer contexts than physical spus available. To implement
this I need a so far unpublished patch from Andre; it will be added
after we have that patch in.

Note that right now we only do the time slicing for SCHED_RR tasks.
The code would work for SCHED_OTHER tasks as well, but their prio
value is derived from the one the PPU thread has at the time of spu_run,
and using this for spu scheduling decisions would make the code very
unfair. SCHED_OTHER support will be enabled once the spu scheduler
knows how to calculate spu_context.prio (very soon).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# 72cb3608 13-Feb-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spu sched: use DECLARE_BITMAP

use DECLARE_BITMAP in the spu scheduler instead of reimplementing it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# 52f04fcf 13-Feb-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spu sched: forced preemption at execution

If we start a spu context with realtime priority, we want it to run
immediately and not wait until some other lower priority thread has
finished. Try to find a suitable victim and use its spu in this
case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# ae7b4c52 13-Feb-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spu sched: update some comments

Give spu_yield a kerneldoc comment and remove the old comment
documenting spu_activate, spu_deactivate and spu_yield, as all of them
now have descriptive kerneldoc comments of their own.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# 678b2ff1 13-Feb-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spu sched: simplify spu_remove_from_active_list

When we call spu_remove_from_active_list, the spu is always guaranteed
to be on the active list and in runnable state, so we can simply
do a list_del to remove it and unconditionally take the was_active
codepath.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# 26bec673 13-Feb-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: optimize spu_run

There is no need to directly wake up contexts in spu_activate when
called from spu_run, so add a flag to suppress this wakeup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# 079cdb61 13-Feb-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: runqueue simplification

This is the biggest patch in this series, and it reworks the guts of
the spu scheduler runqueue mechanism:

- instead of embedding a waitqueue in the runqueue there is now a
simple doubly-linked list; the actual wakeups happen by reusing
the stop_wq in the spu context (maybe we should rename it one day;
see the sketch below)
- spu_free and spu_prio_wakeup are merged into a single spu_reschedule
function
- various functionality is split out into small helpers, and kerneldoc
comments are added in various places to document what's going on.
- spu_activate is rewritten into a tight loop by removing tests for
various impossible conditions and using the infrastructure in this
patch.
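
The new runqueue boils down to this shape (a sketch; the names follow the list above and other entries in this log):

struct spu_prio_array {
        DECLARE_BITMAP(bitmap, MAX_PRIO);
        struct list_head runq[MAX_PRIO];
        spinlock_t runq_lock;
};
static struct spu_prio_array *spu_prio;

static void __spu_add_to_rq(struct spu_context *ctx)
{
        int prio = ctx->prio;

        list_add_tail(&ctx->rq, &spu_prio->runq[prio]);
        set_bit(prio, spu_prio->bitmap);
}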

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# 8389998a 13-Feb-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: move prio to spu_context

It doesn't make any sense to have a priority field in the physical spu
structure. Move it into the spu context instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# 650f8b02 13-Feb-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: simplify state_mutex

The r/w semaphore to lock the spus was overkill and can be replaced
with a mutex to make it faster, simpler and easier to debug. It also
helps to allow making most spufs operations interruptible in future
patches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# 202557d2 13-Feb-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: sched.c cleanups

Various cleanups to sched.c that don't change the global control flow:

- add kerneldoc comments to various functions
- add spu_ prefixes to various functions
- add/remove context from the runqueue in bind/unbind_context as
it's part of the logical operation
- add a call to put_active_spu to spu_unbind_context as it's logically
part of the unbind operation

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# 81998baf 13-Feb-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: bind_context sets SPU_STATE_RUNNABLE

Only bind_context/unbind_context change the spu context state. Thus
we can move all assignments of SPU_STATE_RUNNABLE into bind_context,
which parallels the unbind side as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# aa56c168 13-Feb-2007 Christoph Hellwig <hch@lst.de>

[POWERPC] spufs: remove superfluous SPU_STATE_SAVED assignments

unbind_context already sets the context state to SPU_STATE_SAVED, thus
the spu_deactivate callers don't need to do it again.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>


# 86767277 04-Oct-2006 Arnd Bergmann <arnd@arndb.de>

[POWERPC] spufs: add infrastructure for finding elf objects

This adds an 'object-id' file that the spe library can
use to store a pointer to its ELF object. This was
originally meant for use by oprofile, but is now
also used by the GNU debugger, if available.

In order for oprofile to find the location in an spu-elf
binary where an event counter triggered, we need a way
to identify the binary in the first place.

Unfortunately, that binary itself can be embedded in a
powerpc ELF binary. Since we can assume it is mapped into
the effective address space of the running process, we have
that process write the pointer value into a new spufs
file.

When a context switch occurs, pass the user value to
the profiler so that it can look at the mapped file (with
some care).

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 9add11da 04-Oct-2006 Arnd Bergmann <arnd@arndb.de>

[POWERPC] spufs: implement error event delivery to user space

This tries to fix spufs so we have an interface closer to what is
specified in the man page for events returned in the third argument of
spu_run.

Fortunately, libspe has never been using the returned contents of that
register, as they were the same as the return code of spu_run (duh!).

Unlike the specification that we never implemented correctly, we now
require a SPU_CREATE_EVENTS_ENABLED flag passed to spu_create, in
order to get the new behavior. When this flag is not passed, spu_run
will simply ignore the third argument now.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# a68cf983 04-Oct-2006 Mark Nutter <mnutter@us.ibm.com>

[POWERPC] spufs: scheduler support for NUMA.

This patch adds NUMA support to the spufs scheduler.

The new arch/powerpc/platforms/cell/spufs/sched.c is greatly
simplified, in an attempt to reduce complexity while adding
support for NUMA scheduler domains. SPUs are allocated starting
from the calling thread's node, moving to others as supported by
current->cpus_allowed. Preemption is gone as it was buggy, but
should be re-enabled in another patch when stable.

The new arch/powerpc/platforms/cell/spu_base.c maintains idle
lists on a per-node basis, and allows caller to specify which
node(s) an SPU should be allocated from, while passing -1 tells
spu_alloc() that any node is allowed.
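
A sketch of the allocation walk (the node_allowed()-style check stands in for the current->cpus_allowed test described above; details are assumptions):

static struct spu *spu_get_idle(struct spu_context *ctx)
{
        int node = cpu_to_node(raw_smp_processor_id());
        struct spu *spu = NULL;
        int n;

        /* start on the calling thread's node, then try the others */
        for (n = 0; n < MAX_NUMNODES; n++, node++) {
                node %= MAX_NUMNODES;
                if (!node_allowed(ctx, node))
                        continue;
                spu = spu_alloc_node(node);
                if (spu)
                        break;
        }
        return spu;
}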

Since the patch removes the currently implemented preemptive
scheduling, it is technically a regression, but practically
all users have since migrated to this version, as it is
part of the IBM SDK and the yellowdog distribution, so there
is not much point holding it back while the new preemptive
scheduling patch gets delayed further.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 6ab3d562 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de>

Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>


# a91942ae 19-Jun-2006 Geoff Levand <geoffrey.levand@am.sony.com>

[POWERPC] spufs: fix spu irq affinity setting

This changes the hypervisor abstraction of setting cpu affinity to a
higher level to avoid platform dependent interrupt controller
routines. I replaced spu_priv1_ops:spu_int_route_set() with a
new routine spu_priv1_ops:spu_cpu_affinity_set().

As a by-product, this change eliminated what looked like an
existing bug in the set affinity code where spu_int_route_set()
mistakenly called int_stat_get().
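
The abstraction boils down to a priv1 operation plus a thin wrapper (a sketch of the shape only; surrounding members are elided):

struct spu_priv1_ops {
        /* ... other low-level operations ... */
        void (*cpu_affinity_set)(struct spu *spu, int cpu);
};

/* higher-level call replacing spu_int_route_set() */
static inline void spu_cpu_affinity_set(struct spu *spu, int cpu)
{
        spu_priv1_ops->cpu_affinity_set(spu, cpu);
}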

Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# a33a7d73 22-Mar-2006 Arnd Bergmann <abergman@de.ibm.com>

[PATCH] spufs: implement mfc access for PPE-side DMA

This patch adds a new file called 'mfc' to each spufs directory.
The file accepts DMA commands that are a subset of what would
be legal DMA commands for problem state register access. Upon
reading the file, a bitmask is returned with the completed
tag groups set.

The file is meant to be used from an abstraction in libspe
that is added by a different patch.

From the kernel perspective, this means a process can now
offload a memory copy from or into an SPE local store
without having to run code on the SPE itself.

The transfer will only be performed while the SPE is owned
by one thread that is waiting in the spu_run system call
and the data will be transferred into that thread's
address space, independent of which thread started the
transfer.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 2fb9d206 05-Jan-2006 Arnd Bergmann <arnd@arndb.de>

[PATCH] spufs: set irq affinity for running threads

So far, all SPU-triggered interrupts have always ended up on
the first SMT thread, which is a bad solution.

This patch implements setting the affinity to the
CPU that was running last when entering execution on
an SPU. This should result in a significant reduction
in IPI calls and better cache locality for SPE thread
specific data.

Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 8837d921 04-Jan-2006 Arnd Bergmann <arnd@arndb.de>

[PATCH] spufs: clean up use of bitops

Checking bits manually might not be synchronized with
the use of set_bit/clear_bit. Make sure we always use
the correct bitops by removing the unnecessary
identifiers.

Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 7945a4a2 09-Dec-2005 Arnd Bergmann <arnd@arndb.de>

[PATCH] spufs: trivial compile fix

One of my last patches contained a broken line
from splitting out some other changes; this
restores a working version.

Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 2a911f0b 05-Dec-2005 Arnd Bergmann <arnd@arndb.de>

[PATCH] spufs: Improved SPU preemptability [part 2].

This patch reduces the lock complexity of the SPU scheduler, particularly
for involuntary preemptive switches. As a result, the new code
does a better job of mapping the highest priority tasks to SPUs.

Lock complexity is reduced by using the system default workqueue
to perform involuntary saves. In this way we avoid nasty lock
ordering problems that the previous code had. A "minimum timeslice"
for SPU contexts is also introduced. The intent here is to avoid
thrashing.

While the new scheduler does a better job at prioritization it
still does nothing for fairness.

From: Mark Nutter <mnutter@us.ibm.com>
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 5110459f 05-Dec-2005 Arnd Bergmann <arnd@arndb.de>

[PATCH] spufs: Improved SPU preemptability.

This patch makes it easier to preempt an SPU context by
having the scheduler hold ctx->state_sema for much shorter
periods of time.

As part of this restructuring, the control logic for the "run"
operation is moved from arch/ppc64/kernel/spu_base.c to
fs/spufs/file.c. Of course the base retains "bottom half"
handlers for class{0,1} irqs. The new run loop will re-acquire
an SPU if preempted.

From: Mark Nutter <mnutter@us.ibm.com>
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 3b3d22cb 05-Dec-2005 Arnd Bergmann <arnd@arndb.de>

[PATCH] spufs: Turn off debugging output

spufs is rather noisy when debugging is enabled, this
turns off the messages for production use.

Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 8b3d6663 15-Nov-2005 Arnd Bergmann <arnd@arndb.de>

[PATCH] spufs: cooperative scheduler support

This adds a scheduler for SPUs to make it possible to use
more logical SPUs than physical ones are present in the
system.

Currently, there is no support for preempting a running
SPU thread; they have to leave the SPU by either triggering
an event on the SPU that causes it to return to the
owning thread or by sending a signal to it.

This patch also adds operations that enable accessing an SPU
in either runnable or saved state. We use an RW semaphore
to protect the state of the SPU from changing underneath
us, while we are holding it readable. In order to change
the state, it is acquired writeable and a context save
or restore is executed before downgrading the semaphore
to read-only.
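
The access pattern described is roughly (a sketch; the save step itself is elided):

static void spu_context_save_sketch(struct spu_context *ctx)
{
        down_write(&ctx->state_sema);   /* exclusive for the state change */
        /* ... perform the context save ... */
        ctx->state = SPU_STATE_SAVED;
        downgrade_write(&ctx->state_sema);      /* back to shared access */
        /* ... callers continue with read access ... */
        up_read(&ctx->state_sema);
}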

From: Mark Nutter <mnutter@us.ibm.com>,
Uli Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>