#
5f423759 |
|
12-Sep-2023 |
Casey Schaufler <casey@schaufler-ca.com> |
LSM: wireup Linux Security Module syscalls Wireup lsm_get_self_attr, lsm_set_self_attr and lsm_list_modules system calls. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: linux-api@vger.kernel.org Reviewed-by: Mickaël Salaün <mic@digikod.net> [PM: forward ported beyond v6.6 due merge window changes] Signed-off-by: Paul Moore <paul@paul-moore.com>
|
#
f0a78b3e |
|
08-Jan-2024 |
Florian Fainelli <florian.fainelli@broadcom.com> |
arm64: Update __NR_compat_syscalls for statmount/listmount Commit d8b0f5465012 ("wire up syscalls for statmount/listmount") added two new system calls to arch/arm64/include/asm/unistd32.h but forgot to update the __NR_compat_syscalls number, thus causing the following build failures: arch/arm64/include/asm/unistd32.h:922:24: error: array index in initializer exceeds array bounds 922 | #define __NR_statmount 457 | ^~~ arch/arm64/kernel/sys32.c:130:34: note: in definition of macro '__SYSCALL' 130 | #define __SYSCALL(nr, sym) [nr] = __arm64_##sym, | ^~ Bump up the number by two to accomodate for the new system calls added. Fixes: d8b0f5465012 ("wire up syscalls for statmount/listmount") Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2fd0ebad |
|
14-Sep-2023 |
Sohil Mehta <sohil.mehta@intel.com> |
arch: Reserve map_shadow_stack() syscall number for all architectures commit c35559f94ebc ("x86/shstk: Introduce map_shadow_stack syscall") recently added support for map_shadow_stack() but it is limited to x86 only for now. There is a possibility that other architectures (namely, arm64 and RISC-V), that are implementing equivalent support for shadow stacks, might need to add support for it. Independent of that, reserving arch-specific syscall numbers in the syscall tables of all architectures is good practice and would help avoid future conflicts. map_shadow_stack() is marked as a conditional syscall in sys_ni.c. Adding it to the syscall tables of other architectures is harmless and would return ENOSYS when exercised. Note, map_shadow_stack() was assigned #453 during the merge process since #452 was taken by fchmodat2(). For Powerpc, map it to sys_ni_syscall() as is the norm for Powerpc syscall tables. For Alpha, map_shadow_stack() takes up #563 as Alpha still diverges from the common syscall numbering system in the other architectures. Link: https://lore.kernel.org/lkml/20230515212255.GA562920@debug.ba.rivosinc.com/ Link: https://lore.kernel.org/lkml/b402b80b-a7c6-4ef0-b977-c0f5f582b78a@sirena.org.uk/ Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
#
0f4b5f97 |
|
20-Sep-2023 |
peterz@infradead.org <peterz@infradead.org> |
futex: Add sys_futex_requeue() Finish off the 'simple' futex2 syscall group by adding sys_futex_requeue(). Unlike sys_futex_{wait,wake}() its arguments are too numerous to fit into a regular syscall. As such, use struct futex_waitv to pass the 'source' and 'destination' futexes to the syscall. This syscall implements what was previously known as FUTEX_CMP_REQUEUE and uses {val, uaddr, flags} for source and {uaddr, flags} for destination. This design explicitly allows requeueing between different types of futex by having a different flags word per uaddr. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20230921105248.511860556@noisy.programming.kicks-ass.net
|
#
cb8c4312 |
|
20-Sep-2023 |
peterz@infradead.org <peterz@infradead.org> |
futex: Add sys_futex_wait() To complement sys_futex_waitv()/wake(), add sys_futex_wait(). This syscall implements what was previously known as FUTEX_WAIT_BITSET except it uses 'unsigned long' for the value and bitmask arguments, takes timespec and clockid_t arguments for the absolute timeout and uses FUTEX2 flags. The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20230921105248.164324363@noisy.programming.kicks-ass.net
|
#
9f6c532f |
|
20-Sep-2023 |
peterz@infradead.org <peterz@infradead.org> |
futex: Add sys_futex_wake() To complement sys_futex_waitv() add sys_futex_wake(). This syscall implements what was previously known as FUTEX_WAKE_BITSET except it uses 'unsigned long' for the bitmask and takes FUTEX2 flags. The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20230921105247.936205525@noisy.programming.kicks-ass.net
|
#
78252deb |
|
11-Jul-2023 |
Palmer Dabbelt <palmer@sifive.com> |
arch: Register fchmodat2, usually as syscall 452 This registers the new fchmodat2 syscall in most places as nuber 452, with alpha being the exception where it's 562. I found all these sites by grepping for fspick, which I assume has found me everything. Signed-off-by: Palmer Dabbelt <palmer@sifive.com> Signed-off-by: Alexey Gladkov <legion@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Message-Id: <a677d521f048e4ca439e7080a5328f21eb8e960e.1689092120.git.legion@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
946e697c |
|
10-May-2023 |
Nhat Pham <nphamcs@gmail.com> |
cachestat: wire up cachestat for other architectures cachestat is previously only wired in for x86 (and architectures using the generic unistd.h table): https://lore.kernel.org/lkml/20230503013608.2431726-1-nphamcs@gmail.com/ This patch wires cachestat in for all the other architectures. [nphamcs@gmail.com: wire up cachestat for arm64] Link: https://lkml.kernel.org/r/20230511092843.3896327-1-nphamcs@gmail.com Link: https://lkml.kernel.org/r/20230510195806.2902878-1-nphamcs@gmail.com Signed-off-by: Nhat Pham <nphamcs@gmail.com> Tested-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: David S. Miller <davem@davemloft.net> Cc: Helge Deller <deller@gmx.de> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
f18ed30d |
|
05-Apr-2022 |
Guo Ren <guoren@kernel.org> |
fs: stat: compat: Add __ARCH_WANT_COMPAT_STAT RISC-V doesn't neeed compat_stat, so using __ARCH_WANT_COMPAT_STAT to exclude unnecessary SYSCALL functions. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Heiko Stuebner <heiko@sntech.de> Acked-by: Helge Deller <deller@gmx.de> # parisc Link: https://lore.kernel.org/r/20220405071314.3225832-6-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
#
21b084fd |
|
14-Jan-2022 |
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> |
mm/mempolicy: wire up syscall set_mempolicy_home_node Link: https://lkml.kernel.org/r/20211202123810.267175-4-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Ben Widawsky <ben.widawsky@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: <linux-api@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ea7c45fd |
|
23-Sep-2021 |
André Almeida <andrealmeid@collabora.com> |
futex,arm: Wire up sys_futex_waitv() Wire up syscall entry point for ARM architectures, for both 32 and 64-bit. Signed-off-by: André Almeida <andrealmeid@collabora.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210923171111.300673-19-andrealmeid@collabora.com
|
#
dce49103 |
|
02-Sep-2021 |
Suren Baghdasaryan <surenb@google.com> |
mm: wire up syscall process_mrelease Split off from prev patch in the series that implements the syscall. Link: https://lkml.kernel.org/r/20210809185259.405936-2-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Jan Engelhardt <jengelh@inai.de> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tim Murray <timmurray@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a49f4f81 |
|
22-Apr-2021 |
Mickaël Salaün <mic@linux.microsoft.com> |
arch: Wire up Landlock syscalls Wire up the following system calls for all architectures: * landlock_create_ruleset(2) * landlock_add_rule(2) * landlock_restrict_self(2) Cc: Arnd Bergmann <arnd@arndb.de> Cc: James Morris <jmorris@namei.org> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com> Link: https://lore.kernel.org/r/20210422154123.13086-10-mic@digikod.net Signed-off-by: James Morris <jamorris@linux.microsoft.com>
|
#
fa8b9007 |
|
04-Mar-2021 |
Sascha Hauer <s.hauer@pengutronix.de> |
quota: wire up quotactl_path Wire up the quotactl_path syscall added in the previous patch. Link: https://lore.kernel.org/r/20210304123541.30749-3-s.hauer@pengutronix.de Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
2a186721 |
|
21-Jan-2021 |
Christian Brauner <christian.brauner@ubuntu.com> |
fs: add mount_setattr() This implements the missing mount_setattr() syscall. While the new mount api allows to change the properties of a superblock there is currently no way to change the properties of a mount or a mount tree using file descriptors which the new mount api is based on. In addition the old mount api has the restriction that mount options cannot be applied recursively. This hasn't changed since changing mount options on a per-mount basis was implemented in [1] and has been a frequent request not just for convenience but also for security reasons. The legacy mount syscall is unable to accommodate this behavior without introducing a whole new set of flags because MS_REC | MS_REMOUNT | MS_BIND | MS_RDONLY | MS_NOEXEC | [...] only apply the mount option to the topmost mount. Changing MS_REC to apply to the whole mount tree would mean introducing a significant uapi change and would likely cause significant regressions. The new mount_setattr() syscall allows to recursively clear and set mount options in one shot. Multiple calls to change mount options requesting the same changes are idempotent: int mount_setattr(int dfd, const char *path, unsigned flags, struct mount_attr *uattr, size_t usize); Flags to modify path resolution behavior are specified in the @flags argument. Currently, AT_EMPTY_PATH, AT_RECURSIVE, AT_SYMLINK_NOFOLLOW, and AT_NO_AUTOMOUNT are supported. If useful, additional lookup flags to restrict path resolution as introduced with openat2() might be supported in the future. The mount_setattr() syscall can be expected to grow over time and is designed with extensibility in mind. It follows the extensible syscall pattern we have used with other syscalls such as openat2(), clone3(), sched_{set,get}attr(), and others. The set of mount options is passed in the uapi struct mount_attr which currently has the following layout: struct mount_attr { __u64 attr_set; __u64 attr_clr; __u64 propagation; __u64 userns_fd; }; The @attr_set and @attr_clr members are used to clear and set mount options. This way a user can e.g. request that a set of flags is to be raised such as turning mounts readonly by raising MOUNT_ATTR_RDONLY in @attr_set while at the same time requesting that another set of flags is to be lowered such as removing noexec from a mount tree by specifying MOUNT_ATTR_NOEXEC in @attr_clr. Note, since the MOUNT_ATTR_<atime> values are an enum starting from 0, not a bitmap, users wanting to transition to a different atime setting cannot simply specify the atime setting in @attr_set, but must also specify MOUNT_ATTR__ATIME in the @attr_clr field. So we ensure that MOUNT_ATTR__ATIME can't be partially set in @attr_clr and that @attr_set can't have any atime bits set if MOUNT_ATTR__ATIME isn't set in @attr_clr. The @propagation field lets callers specify the propagation type of a mount tree. Propagation is a single property that has four different settings and as such is not really a flag argument but an enum. Specifically, it would be unclear what setting and clearing propagation settings in combination would amount to. The legacy mount() syscall thus forbids the combination of multiple propagation settings too. The goal is to keep the semantics of mount propagation somewhat simple as they are overly complex as it is. The @userns_fd field lets user specify a user namespace whose idmapping becomes the idmapping of the mount. This is implemented and explained in detail in the next patch. [1]: commit 2e4b7fcd9260 ("[PATCH] r/o bind mounts: honor mount writer counts at remount") Link: https://lore.kernel.org/r/20210121131959.646623-35-christian.brauner@ubuntu.com Cc: David Howells <dhowells@redhat.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-api@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
#
b0a0c261 |
|
18-Dec-2020 |
Willem de Bruijn <willemb@google.com> |
epoll: wire up syscall epoll_pwait2 Split off from prev patch in the series that implements the syscall. Link: https://lkml.kernel.org/r/20201121144401.3727659-4-willemdebruijn.kernel@gmail.com Signed-off-by: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ecb8ac8b |
|
17-Oct-2020 |
Minchan Kim <minchan@kernel.org> |
mm/madvise: introduce process_madvise() syscall: an external memory hinting API There is usecase that System Management Software(SMS) want to give a memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the case of Android, it is the ActivityManagerService. The information required to make the reclaim decision is not known to the app. Instead, it is known to the centralized userspace daemon(ActivityManagerService), and that daemon must be able to initiate reclaim on its own without any app involvement. To solve the issue, this patch introduces a new syscall process_madvise(2). It uses pidfd of an external process to give the hint. It also supports vector address range because Android app has thousands of vmas due to zygote so it's totally waste of CPU and power if we should call the syscall one by one for each vma.(With testing 2000-vma syscall vs 1-vector syscall, it showed 15% performance improvement. I think it would be bigger in real practice because the testing ran very cache friendly environment). Another potential use case for the vector range is to amortize the cost ofTLB shootdowns for multiple ranges when using MADV_DONTNEED; this could benefit users like TCP receive zerocopy and malloc implementations. In future, we could find more usecases for other advises so let's make it happens as API since we introduce a new syscall at this moment. With that, existing madvise(2) user could replace it with process_madvise(2) with their own pid if they want to have batch address ranges support feature. ince it could affect other process's address range, only privileged process(PTRACE_MODE_ATTACH_FSCREDS) or something else(e.g., being the same UID) gives it the right to ptrace the process could use it successfully. The flag argument is reserved for future use if we need to extend the API. I think supporting all hints madvise has/will supported/support to process_madvise is rather risky. Because we are not sure all hints make sense from external process and implementation for the hint may rely on the caller being in the current context so it could be error-prone. Thus, I just limited hints as MADV_[COLD|PAGEOUT] in this patch. If someone want to add other hints, we could hear the usecase and review it for each hint. It's safer for maintenance rather than introducing a buggy syscall but hard to fix it later. So finally, the API is as follows, ssize_t process_madvise(int pidfd, const struct iovec *iovec, unsigned long vlen, int advice, unsigned int flags); DESCRIPTION The process_madvise() system call is used to give advice or directions to the kernel about the address ranges from external process as well as local process. It provides the advice to address ranges of process described by iovec and vlen. The goal of such advice is to improve system or application performance. The pidfd selects the process referred to by the PID file descriptor specified in pidfd. (See pidofd_open(2) for further information) The pointer iovec points to an array of iovec structures, defined in <sys/uio.h> as: struct iovec { void *iov_base; /* starting address */ size_t iov_len; /* number of bytes to be advised */ }; The iovec describes address ranges beginning at address(iov_base) and with size length of bytes(iov_len). The vlen represents the number of elements in iovec. The advice is indicated in the advice argument, which is one of the following at this moment if the target process specified by pidfd is external. MADV_COLD MADV_PAGEOUT Permission to provide a hint to external process is governed by a ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2). The process_madvise supports every advice madvise(2) has if target process is in same thread group with calling process so user could use process_madvise(2) to extend existing madvise(2) to support vector address ranges. RETURN VALUE On success, process_madvise() returns the number of bytes advised. This return value may be less than the total number of requested bytes, if an error occurred. The caller should check return value to determine whether a partial advice occurred. FAQ: Q.1 - Why does any external entity have better knowledge? Quote from Sandeep "For Android, every application (including the special SystemServer) are forked from Zygote. The reason of course is to share as many libraries and classes between the two as possible to benefit from the preloading during boot. After applications start, (almost) all of the APIs end up calling into this SystemServer process over IPC (binder) and back to the application. In a fully running system, the SystemServer monitors every single process periodically to calculate their PSS / RSS and also decides which process is "important" to the user for interactivity. So, because of how these processes start _and_ the fact that the SystemServer is looping to monitor each process, it does tend to *know* which address range of the application is not used / useful. Besides, we can never rely on applications to clean things up themselves. We've had the "hey app1, the system is low on memory, please trim your memory usage down" notifications for a long time[1]. They rely on applications honoring the broadcasts and very few do. So, if we want to avoid the inevitable killing of the application and restarting it, some way to be able to tell the OS about unimportant memory in these applications will be useful. - ssp Q.2 - How to guarantee the race(i.e., object validation) between when giving a hint from an external process and get the hint from the target process? process_madvise operates on the target process's address space as it exists at the instant that process_madvise is called. If the space target process can run between the time the process_madvise process inspects the target process address space and the time that process_madvise is actually called, process_madvise may operate on memory regions that the calling process does not expect. It's the responsibility of the process calling process_madvise to close this race condition. For example, the calling process can suspend the target process with ptrace, SIGSTOP, or the freezer cgroup so that it doesn't have an opportunity to change its own address space before process_madvise is called. Another option is to operate on memory regions that the caller knows a priori will be unchanged in the target process. Yet another option is to accept the race for certain process_madvise calls after reasoning that mistargeting will do no harm. The suggested API itself does not provide synchronization. It also apply other APIs like move_pages, process_vm_write. The race isn't really a problem though. Why is it so wrong to require that callers do their own synchronization in some manner? Nobody objects to write(2) merely because it's possible for two processes to open the same file and clobber each other's writes --- instead, we tell people to use flock or something. Think about mmap. It never guarantees newly allocated address space is still valid when the user tries to access it because other threads could unmap the memory right before. That's where we need synchronization by using other API or design from userside. It shouldn't be part of API itself. If someone needs more fine-grained synchronization rather than process level, there were two ideas suggested - cookie[2] and anon-fd[3]. Both are applicable via using last reserved argument of the API but I don't think it's necessary right now since we have already ways to prevent the race so don't want to add additional complexity with more fine-grained optimization model. To make the API extend, it reserved an unsigned long as last argument so we could support it in future if someone really needs it. Q.3 - Why doesn't ptrace work? Injecting an madvise in the target process using ptrace would not work for us because such injected madvise would have to be executed by the target process, which means that process would have to be runnable and that creates the risk of the abovementioned race and hinting a wrong VMA. Furthermore, we want to act the hint in caller's context, not the callee's, because the callee is usually limited in cpuset/cgroups or even freezed state so they can't act by themselves quick enough, which causes more thrashing/kill. It doesn't work if the target process are ptraced(e.g., strace, debugger, minidump) because a process can have at most one ptracer. [1] https://developer.android.com/topic/performance/memory" [2] process_getinfo for getting the cookie which is updated whenever vma of process address layout are changed - Daniel Colascione - https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224 [3] anonymous fd which is used for the object(i.e., address range) validation - Michal Hocko - https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/ [minchan@kernel.org: fix process_madvise build break for arm64] Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com [minchan@kernel.org: fix build error for mips of process_madvise] Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com [akpm@linux-foundation.org: fix patch ordering issue] [akpm@linux-foundation.org: fix arm64 whoops] [minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian] [akpm@linux-foundation.org: fix i386 build] [sfr@canb.auug.org.au: fix syscall numbering] Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au [sfr@canb.auug.org.au: madvise.c needs compat.h] Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au [minchan@kernel.org: fix mips build] Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com [yuehaibing@huawei.com: remove duplicate header which is included twice] Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com [minchan@kernel.org: do not use helper functions for process_madvise] Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com [akpm@linux-foundation.org: pidfd_get_pid() gained an argument] [sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"] Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <christian@brauner.io> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: <linux-man@vger.kernel.org> Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c8ffd8bc |
|
14-May-2020 |
Miklos Szeredi <mszeredi@redhat.com> |
vfs: add faccessat2 syscall POSIX defines faccessat() as having a fourth "flags" argument, while the linux syscall doesn't have it. Glibc tries to emulate AT_EACCESS and AT_SYMLINK_NOFOLLOW, but AT_EACCESS emulation is broken. Add a new faccessat(2) syscall with the added flags argument and implement both flags. The value of AT_EACCESS is defined in glibc headers to be the same as AT_REMOVEDIR. Use this value for the kernel interface as well, together with the explanatory comment. Also add AT_EMPTY_PATH support, which is not documented by POSIX, but can be useful and is trivial to implement. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
3568b889 |
|
19-Mar-2020 |
Vincenzo Frascino <vincenzo.frascino@arm.com> |
arm64: compat: Fix syscall number of compat_clock_getres The syscall number of compat_clock_getres was erroneously set to 247 (__NR_io_cancel!) instead of 264. This causes the vDSO fallback of clock_getres() to land on the wrong syscall for compat tasks. Fix the numbering. Cc: <stable@vger.kernel.org> Fixes: 53c489e1dfeb6 ("arm64: compat: Add missing syscall numbers") Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
|
#
fddb5d43 |
|
18-Jan-2020 |
Aleksa Sarai <cyphar@cyphar.com> |
open: introduce openat2(2) syscall /* Background. */ For a very long time, extending openat(2) with new features has been incredibly frustrating. This stems from the fact that openat(2) is possibly the most famous counter-example to the mantra "don't silently accept garbage from userspace" -- it doesn't check whether unknown flags are present[1]. This means that (generally) the addition of new flags to openat(2) has been fraught with backwards-compatibility issues (O_TMPFILE has to be defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old kernels gave errors, since it's insecure to silently ignore the flag[2]). All new security-related flags therefore have a tough road to being added to openat(2). Userspace also has a hard time figuring out whether a particular flag is supported on a particular kernel. While it is now possible with contemporary kernels (thanks to [3]), older kernels will expose unknown flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during openat(2) time matches modern syscall designs and is far more fool-proof. In addition, the newly-added path resolution restriction LOOKUP flags (which we would like to expose to user-space) don't feel related to the pre-existing O_* flag set -- they affect all components of path lookup. We'd therefore like to add a new flag argument. Adding a new syscall allows us to finally fix the flag-ignoring problem, and we can make it extensible enough so that we will hopefully never need an openat3(2). /* Syscall Prototype. */ /* * open_how is an extensible structure (similar in interface to * clone3(2) or sched_setattr(2)). The size parameter must be set to * sizeof(struct open_how), to allow for future extensions. All future * extensions will be appended to open_how, with their zero value * acting as a no-op default. */ struct open_how { /* ... */ }; int openat2(int dfd, const char *pathname, struct open_how *how, size_t size); /* Description. */ The initial version of 'struct open_how' contains the following fields: flags Used to specify openat(2)-style flags. However, any unknown flag bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR) will result in -EINVAL. In addition, this field is 64-bits wide to allow for more O_ flags than currently permitted with openat(2). mode The file mode for O_CREAT or O_TMPFILE. Must be set to zero if flags does not contain O_CREAT or O_TMPFILE. resolve Restrict path resolution (in contrast to O_* flags they affect all path components). The current set of flags are as follows (at the moment, all of the RESOLVE_ flags are implemented as just passing the corresponding LOOKUP_ flag). RESOLVE_NO_XDEV => LOOKUP_NO_XDEV RESOLVE_NO_SYMLINKS => LOOKUP_NO_SYMLINKS RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS RESOLVE_BENEATH => LOOKUP_BENEATH RESOLVE_IN_ROOT => LOOKUP_IN_ROOT open_how does not contain an embedded size field, because it is of little benefit (userspace can figure out the kernel open_how size at runtime fairly easily without it). It also only contains u64s (even though ->mode arguably should be a u16) to avoid having padding fields which are never used in the future. Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE is no longer permitted for openat(2). As far as I can tell, this has always been a bug and appears to not be used by userspace (and I've not seen any problems on my machines by disallowing it). If it turns out this breaks something, we can special-case it and only permit it for openat(2) but not openat2(2). After input from Florian Weimer, the new open_how and flag definitions are inside a separate header from uapi/linux/fcntl.h, to avoid problems that glibc has with importing that header. /* Testing. */ In a follow-up patch there are over 200 selftests which ensure that this syscall has the correct semantics and will correctly handle several attack scenarios. In addition, I've written a userspace library[4] which provides convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care must be taken when using RESOLVE_IN_ROOT'd file descriptors with other syscalls). During the development of this patch, I've run numerous verification tests using libpathrs (showing that the API is reasonably usable by userspace). /* Future Work. */ Additional RESOLVE_ flags have been suggested during the review period. These can be easily implemented separately (such as blocking auto-mount during resolution). Furthermore, there are some other proposed changes to the openat(2) interface (the most obvious example is magic-link hardening[5]) which would be a good opportunity to add a way for userspace to restrict how O_PATH file descriptors can be re-opened. Another possible avenue of future work would be some kind of CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace which openat2(2) flags and fields are supported by the current kernel (to avoid userspace having to go through several guesses to figure it out). [1]: https://lwn.net/Articles/588444/ [2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com [3]: commit 629e014bb834 ("fs: completely ignore unknown open flags") [4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523 [5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/ [6]: https://youtu.be/ggD-eb3yPVs Suggested-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
9a2cef09 |
|
07-Jan-2020 |
Sargun Dhillon <sargun@sargun.me> |
arch: wire up pidfd_getfd syscall This wires up the pidfd_getfd syscall for all architectures. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200107175927.4558-4-sargun@sargun.me Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
#
3e3c8ca5 |
|
02-Jan-2020 |
Amanieu d'Antras <amanieu@gmail.com> |
arm64: Move __ARCH_WANT_SYS_CLONE3 definition to uapi headers Previously this was only defined in the internal headers which resulted in __NR_clone3 not being defined in the user headers. Signed-off-by: Amanieu d'Antras <amanieu@gmail.com> Cc: linux-arm-kernel@lists.infradead.org Cc: <stable@vger.kernel.org> # 5.3.x Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200102172413.654385-2-amanieu@gmail.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
#
7615d9e1 |
|
23-May-2019 |
Christian Brauner <christian@brauner.io> |
arch: wire-up pidfd_open() This wires up the pidfd_open() syscall into all arches at once. Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jann Horn <jannh@google.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-api@vger.kernel.org Cc: linux-alpha@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-ia64@vger.kernel.org Cc: linux-m68k@lists.linux-m68k.org Cc: linux-mips@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-xtensa@linux-xtensa.org Cc: linux-arch@vger.kernel.org Cc: x86@kernel.org
|
#
53c489e1 |
|
21-Jun-2019 |
Vincenzo Frascino <vincenzo.frascino@arm.com> |
arm64: compat: Add missing syscall numbers vDSO requires gettimeofday() and clock_gettime() syscalls to implement the fallback mechanism. Add the missing syscall numbers to unistd.h for arm64. Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Shijith Thotton <sthotton@marvell.com> Tested-by: Andre Przywara <andre.przywara@arm.com> Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Huw Davies <huw@codeweavers.com> Link: https://lkml.kernel.org/r/20190621095252.32307-7-vincenzo.frascino@arm.com
|
#
d68dbb0c |
|
20-Jun-2019 |
Christian Brauner <christian@brauner.io> |
arch: handle arches who do not yet define clone3 This cleanly handles arches who do not yet define clone3. clone3() was initially placed under __ARCH_WANT_SYS_CLONE under the assumption that this would cleanly handle all architectures. It does not. Architectures such as nios2 or h8300 simply take the asm-generic syscall definitions and generate their syscall table from it. Since they don't define __ARCH_WANT_SYS_CLONE the build would fail complaining about sys_clone3 missing. The reason this doesn't happen for legacy clone is that nios2 and h8300 provide assembly stubs for sys_clone. This seems to be done for architectural reasons. The build failures for nios2 and h8300 were caught int -next luckily. The solution is to define __ARCH_WANT_SYS_CLONE3 that architectures can add. Additionally, we need a cond_syscall(clone3) for architectures such as nios2 or h8300 that generate their syscall table in the way I explained above. Fixes: 8f3220a80654 ("arch: wire-up clone3() syscall") Signed-off-by: Christian Brauner <christian@brauner.io> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: David Howells <dhowells@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Adrian Reber <adrian@lisas.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Florian Weimer <fweimer@redhat.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: x86@kernel.org
|
#
caab277b |
|
02-Jun-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 503 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
8f3220a8 |
|
25-May-2019 |
Christian Brauner <christian@brauner.io> |
arch: wire-up clone3() syscall Wire up the clone3() call on all arches that don't require hand-rolled assembly. Some of the arches look like they need special assembly massaging and it is probably smarter if the appropriate arch maintainers would do the actual wiring. Arches that are wired-up are: - x86{_32,64} - arm{64} - xtensa Signed-off-by: Christian Brauner <christian@brauner.io> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: David Howells <dhowells@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Adrian Reber <adrian@lisas.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Florian Weimer <fweimer@redhat.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: x86@kernel.org
|
#
d8076bdb |
|
15-May-2019 |
David Howells <dhowells@redhat.com> |
uapi: Wire up the mount API syscalls on non-x86 arches [ver #2] Wire up the mount API syscalls on non-x86 arches. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
39036cd2 |
|
28-Feb-2019 |
Arnd Bergmann <arnd@arndb.de> |
arch: add pidfd and io_uring syscalls everywhere Add the io_uring and pidfd_send_signal system calls to all architectures. These system calls are designed to handle both native and compat tasks, so all entries are the same across architectures, only arm-compat and the generic tale still use an old format. Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> (s390) Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
#
48166e6e |
|
09-Jan-2019 |
Arnd Bergmann <arnd@arndb.de> |
y2038: add 64-bit time_t syscalls to all 32-bit architectures This adds 21 new system calls on each ABI that has 32-bit time_t today. All of these have the exact same semantics as their existing counterparts, and the new ones all have macro names that end in 'time64' for clarification. This gets us to the point of being able to safely use a C library that has 64-bit time_t in user space. There are still a couple of loose ends to tie up in various areas of the code, but this is the big one, and should be entirely uncontroversial at this point. In particular, there are four system calls (getitimer, setitimer, waitid, and getrusage) that don't have a 64-bit counterpart yet, but these can all be safely implemented in the C library by wrapping around the existing system calls because the 32-bit time_t they pass only counts elapsed time, not time since the epoch. They will be dealt with later. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
4ab65ba7 |
|
30-Dec-2018 |
Arnd Bergmann <arnd@arndb.de> |
ARM: add kexec_file_load system call number A couple of architectures including arm64 already implement the kexec_file_load system call, on many others we have assigned a system call number for it, but not implemented it yet. Adding the number in arch/arm/ lets us use the system call on arm64 systems in compat mode, and also reduces the number of differences between architectures. If we want to implement kexec_file_load on ARM in the future, the number assignment means that kexec tools can already be built with the now current set of kernel headers. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
#
78594b95 |
|
30-Dec-2018 |
Arnd Bergmann <arnd@arndb.de> |
ARM: add migrate_pages() system call The migrate_pages system call has an assigned number on all architectures except ARM. When it got added initially in commit d80ade7b3231 ("ARM: Fix warning: #warning syscall migrate_pages not implemented"), it was intentionally left out based on the observation that there are no 32-bit ARM NUMA systems. However, there are now arm64 NUMA machines that can in theory run 32-bit kernels (actually enabling NUMA there would require additional work) as well as 32-bit user space on 64-bit kernels, so that argument is no longer very strong. Assigning the number lets us use the system call on 64-bit kernels as well as providing a more consistent set of syscalls across architectures. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
7e0b44e8 |
|
03-Jan-2019 |
Will Deacon <will@kernel.org> |
arm64: compat: Hook up io_pgetevents() for 32-bit tasks Commit 73aeb2cbcdc9 ("ARM: 8787/1: wire up io_pgetevents syscall") hooked up the io_pgetevents() system call for 32-bit ARM, so we can do the same for the compat wrapper on arm64. Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
169113ec |
|
03-Jan-2019 |
Will Deacon <will@kernel.org> |
arm64: compat: Avoid sending SIGILL for unallocated syscall numbers The ARM Linux kernel handles the EABI syscall numbers as follows: 0 - NR_SYSCALLS-1 : Invoke syscall via syscall table NR_SYSCALLS - 0xeffff : -ENOSYS (to be allocated in future) 0xf0000 - 0xf07ff : Private syscall or -ENOSYS if not allocated > 0xf07ff : SIGILL Our compat code gets this wrong and ends up sending SIGILL in response to all syscalls greater than NR_SYSCALLS which have a value greater than 0x7ff in the bottom 16 bits. Fix this by defining the end of the ARM private syscall region and checking the syscall number against that directly. Update the comment while we're at it. Cc: <stable@vger.kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Reported-by: Pi-Hsun Shih <pihsun@chromium.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
4faea239 |
|
16-Apr-2018 |
Arnd Bergmann <arnd@arndb.de> |
y2038: utimes: Rework #ifdef guards for compat syscalls After changing over to 64-bit time_t syscalls, many architectures will want compat_sys_utimensat() but not respective handlers for utime(), utimes() and futimesat(). This adds a new __ARCH_WANT_SYS_UTIME32 to complement __ARCH_WANT_SYS_UTIME. For now, all 64-bit architectures that support CONFIG_COMPAT set it, but future 64-bit architectures will not (tile would not have needed it either, but got removed). As older 32-bit architectures get converted to using CONFIG_64BIT_TIME, they will have to use __ARCH_WANT_SYS_UTIME32 instead of __ARCH_WANT_SYS_UTIME. Architectures using the generic syscall ABI don't need either of them as they never had a utime syscall. Since the compat_utimbuf structure is now required outside of CONFIG_COMPAT, I'm moving it into compat_time.h. Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- changed from last version: - renamed __ARCH_WANT_COMPAT_SYS_UTIME to __ARCH_WANT_SYS_UTIME32
|
#
caf6f9c8 |
|
12-Apr-2018 |
Arnd Bergmann <arnd@arndb.de> |
asm-generic: Remove unneeded __ARCH_WANT_SYS_LLSEEK macro The sys_llseek sytem call is needed on all 32-bit architectures and none of the 64-bit ones, so we can remove the __ARCH_WANT_SYS_LLSEEK guard and simplify the include/asm-generic/unistd.h header further. Since 32-bit tasks can run either natively or in compat mode on 64-bit architectures, we have to check for both !CONFIG_64BIT and CONFIG_COMPAT. There are a few 64-bit architectures that also reference sys_llseek in their 64-bit ABI (e.g. sparc), but I verified that those all select CONFIG_COMPAT, so the #if check is still correct here. It's a bit odd to include it in the syscall table though, as it's the same as sys_lseek() on 64-bit, but with strange calling conventions. Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
#
409d5db4 |
|
20-Jun-2018 |
Will Deacon <will@kernel.org> |
arm64: rseq: Implement backend rseq calls and select HAVE_RSEQ Implement calls to rseq_signal_deliver, rseq_handle_notify_resume and rseq_syscall so that we can select HAVE_RSEQ on arm64. Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
2611dc19 |
|
08-Apr-2017 |
Al Viro <viro@zeniv.linux.org.uk> |
Remove compat_sys_getdents64() Unlike normal compat syscall variants, it is needed only for biarch architectures that have different alignement requirements for u64 in 32bit and 64bit ABI *and* have __put_user() that won't handle a store of 64bit value at 32bit-aligned address. We used to have one such (ia64), but its biarch support has been gone since 2010 (after being broken in 2008, which went unnoticed since nobody had been using it). It had escaped removal at the same time only because back in 2004 a patch that switched several syscalls on amd64 from private wrappers to generic compat ones had switched to use of compat_sys_getdents64(), which hadn't needed (or used) a compat wrapper on amd64. Let's bury it - it's at least 7 years overdue. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
713cc9df |
|
21-Mar-2017 |
Will Deacon <will@kernel.org> |
arm64: compat: Update compat syscalls Hook up three pkey syscalls (which we don't implement) and the new statx syscall, as has been done for arch/arm/. Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
10fdf851 |
|
01-Jun-2016 |
Will Deacon <will@kernel.org> |
arm64: unistd32.h: wire up missing syscalls for compat tasks We're missing entries for mlock2, copy_file_range, preadv2 and pwritev2 in our compat syscall table, so hook them up. Only the last two need compat wrappers. Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
eb93ce2c |
|
14-Oct-2015 |
Will Deacon <will@kernel.org> |
arm64: compat: wire up new syscalls Commit 208473c1f3ac ("ARM: wire up new syscalls") hooked up the new userfaultfd and membarrier syscalls for ARM, so do the same for our compat syscall table in arm64. Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
0156411b |
|
06-Jan-2015 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Implement the compat_sys_call_table in C Unlike the sys_call_table[], the compat one was implemented in sys32.S making it impossible to notice discrepancies between the number of compat syscalls and the __NR_compat_syscalls macro, the latter having to be defined in asm/unistd.h as including asm/unistd32.h would cause conflicts on __NR_* definitions. With this patch, incorrect __NR_compat_syscalls values will result in a build-time error. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com>
|
#
cd25b85b |
|
12-Jan-2015 |
Will Deacon <will@kernel.org> |
arm64: compat: wire up compat_sys_execveat With 841ee230253f ("ARM: wire up execveat syscall"), arch/arm/ has grown support for the execveat system call. This patch wires up the compat variant for arm64. Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
0f9132ce |
|
04-Jan-2015 |
Mark Rutland <mark.rutland@arm.com> |
arm64: Correct __NR_compat_syscalls for bpf Commit 97b56be10352a70c (arm64: compat: Enable bpf syscall) made the usual mistake of forgetting to update __NR_compat_syscalls. Due to this, when el0_sync_compat calls el0_svc_naked, the test against sc_nr (__NR_compat_syscalls) will fail, and we'll call ni_sys, returning -ENOSYS to userspace. This patch bumps __NR_compat_syscalls appropriately, enabling the use of the bpf syscall from compat tasks. Due to the reorganisation of unistd{,32}.h as part of commit f3e5c847ec3d12b4 (arm64: Add __NR_* definitions for compat syscalls) it is not currently possible to include both headers and sanity-check the value of __NR_compat_syscalls at build-time to prevent this from happening again. Additional rework is required to make such niceties a possibility. Cc: Will Deacon <will.deacon@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
a1ae65b2 |
|
27-Nov-2014 |
AKASHI Takahiro <takahiro.akashi@linaro.org> |
arm64: add seccomp support secure_computing() is called first in syscall_trace_enter() so that a system call will be aborted quickly without doing succeeding syscall tracing if seccomp rules want to deny that system call. On compat task, syscall numbers for system calls allowed in seccomp mode 1 are different from those on normal tasks, and so _NR_seccomp_xxx_32's need to be redefined. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
a97a42c4 |
|
11-Aug-2014 |
Will Deacon <will@kernel.org> |
arm64: compat: wire up memfd_create and getrandom syscalls for aarch32 arch/arm/ just grew support for the new memfd_create and getrandom syscalls, so add them to our compat layer too. Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
f3e5c847 |
|
30-Jan-2014 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Add __NR_* definitions for compat syscalls This patch adds __NR_* definitions to asm/unistd32.h, moves the __NR_compat_* definitions to asm/unistd.h and removes all the explicit unistd32.h includes apart from the one building the compat syscall table. The aim is to have the compat __NR_* definitions available but without colliding with the native syscall definitions (required by lib/compat_audit.c to avoid duplicating the audit header files between native and compat). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
055b1212 |
|
30-Apr-2014 |
AKASHI Takahiro <takahiro.akashi@linaro.org> |
arm64: ftrace: Add system call tracepoint This patch allows system call entry or exit to be traced as ftrace events, ie. sys_enter_*/sys_exit_*, if CONFIG_FTRACE_SYSCALLS is enabled. Those events appear and can be controlled under ${sysfs}/tracing/events/syscalls/ Please note that we can't trace compat system calls here because AArch32 mode does not share the same syscall table with AArch64. Just define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS in order to avoid unexpected results (bogus syscalls reported or even hang-up). Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
0473c9b5 |
|
03-Mar-2014 |
Heiko Carstens <hca@linux.ibm.com> |
compat: let architectures define __ARCH_WANT_COMPAT_SYS_GETDENTS64 For architecture dependent compat syscalls in common code an architecture must define something like __ARCH_WANT_<WHATEVER> if it wants to use the code. This however is not true for compat_sys_getdents64 for which architectures must define __ARCH_OMIT_COMPAT_SYS_GETDENTS64 if they do not want the code. This leads to the situation where all architectures, except mips, get the compat code but only x86_64, arm64 and the generic syscall architectures actually use it. So invert the logic, so that architectures actively must do something to get the compat code. This way a couple of architectures get rid of otherwise dead code. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
#
d64008a8 |
|
25-Nov-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
burying unused conditionals __ARCH_WANT_SYS_RT_SIGACTION, __ARCH_WANT_SYS_RT_SIGSUSPEND, __ARCH_WANT_COMPAT_SYS_RT_SIGSUSPEND, __ARCH_WANT_COMPAT_SYS_SCHED_RR_GET_INTERVAL - not used anymore CONFIG_GENERIC_{SIGALTSTACK,COMPAT_RT_SIG{ACTION,QUEUEINFO,PENDING,PROCMASK}} - can be assumed always set.
|
#
ae903caa |
|
13-Dec-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
Bury the conditionals from kernel_thread/kernel_execve series All architectures have CONFIG_GENERIC_KERNEL_THREAD CONFIG_GENERIC_KERNEL_EXECVE __ARCH_WANT_SYS_EXECVE None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers of kernel_execve() (which is a trivial wrapper for do_execve() now) left. Kill the conditionals and make both callers use do_execve(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
0ad50c38 |
|
17-Dec-2012 |
Catalin Marinas <catalin.marinas@arm.com> |
compat: generic compat_sys_sched_rr_get_interval() implementation This function is used by sparc, powerpc tile and arm64 for compat support. The patch adds a generic implementation with a wrapper for PowerPC to do the u32->int sign extension. The reason for a single patch covering powerpc, tile, sparc and arm64 is to keep it bisectable, otherwise kernel building may fail with mismatched function declarations. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> [for tile] Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9ac08002 |
|
21-Oct-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
arm64: sanitize copy_thread(), switch to generic fork/vfork/clone Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
6212a512 |
|
07-Nov-2012 |
Will Deacon <will@kernel.org> |
arm64: compat: select CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION Commit c1d7e01d7877 ("ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION") replaced the __ARCH_WANT_COMPAT_IPC_PARSE_VERSION token with a corresponding Kconfig option instead. This patch updates arm64 to use the latter, rather than #define an unused token. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
6a872777 |
|
10-Sep-2012 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Use generic sys_execve() implementation This patch converts the arm64 port to use the generic sys_execve() implementation removing the arm64-specific (compat_)sys_execve_wrapper() functions. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
4262a727 |
|
11-Oct-2012 |
David Howells <dhowells@redhat.com> |
UAPI: (Scripted) Disintegrate arch/arm64/include/asm Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
|
#
f3d447a9 |
|
10-Oct-2012 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Do not include asm/unistd32.h in asm/unistd.h This patch only includes asm/unistd32.h where necessary and removes its inclusion in the asm/unistd.h file. The __SYSCALL_COMPAT guard is dropped. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Will Deacon <will.deacon@arm.com>
|
#
1c1e4362 |
|
03-Oct-2012 |
David Howells <dhowells@redhat.com> |
UAPI: Split compound conditionals containing __KERNEL__ in Arm64 Split compound conditionals containing __KERNEL__ in Arm64 where possible to make it easier for the UAPI disintegration scripts to handle them. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
89013952 |
|
03-Oct-2012 |
David Howells <dhowells@redhat.com> |
UAPI: Fix the guards on various asm/unistd.h files asm-generic/unistd.h and a number of asm/unistd.h files have been given reinclusion guards that allow the guard to be overridden if __SYSCALL is defined. Unfortunately, these files define __SYSCALL and don't undefine it when they've finished with it, thus rendering the guard ineffective. The reason for this override is to allow the file to be #included multiple times with different settings on __SYSCALL for purposes like generating syscall tables. The following guards are problematic: arch/arm64/include/asm/unistd.h:#if !defined(__ASM_UNISTD_H) || defined(__SYSCALL) arch/arm64/include/asm/unistd32.h:#if !defined(__ASM_UNISTD32_H) || defined(__SYSCALL) arch/c6x/include/asm/unistd.h:#if !defined(_ASM_C6X_UNISTD_H) || defined(__SYSCALL) arch/hexagon/include/asm/unistd.h:#if !defined(_ASM_HEXAGON_UNISTD_H) || defined(__SYSCALL) arch/openrisc/include/asm/unistd.h:#if !defined(__ASM_OPENRISC_UNISTD_H) || defined(__SYSCALL) arch/score/include/asm/unistd.h:#if !defined(_ASM_SCORE_UNISTD_H) || defined(__SYSCALL) arch/tile/include/asm/unistd.h:#if !defined(_ASM_TILE_UNISTD_H) || defined(__SYSCALL) arch/unicore32/include/asm/unistd.h:#if !defined(__UNICORE_UNISTD_H__) || defined(__SYSCALL) include/asm-generic/unistd.h:#if !defined(_ASM_GENERIC_UNISTD_H) || defined(__SYSCALL) On the assumption that the guards' ineffectiveness has passed unnoticed, just remove these guards entirely. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
7992d60d |
|
05-Mar-2012 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: System calls handling This patch adds support for system calls coming from 64-bit applications. It uses the asm-generic/unistd.h definitions with the canonical set of system calls. The private system calls are only used for 32-bit (compat) applications as 64-bit ones can set the TLS and flush the caches entirely from user space. The sys_call_table is just an array defined in a C file and it contains pointers to the syscall functions. The array is 4KB aligned to allow the use of the ADRP instruction (longer range ADR) in entry.S. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Olof Johansson <olof@lixom.net> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
|