History log of /linux-master/security/
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
b250e6d1 03-Sep-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'kbuild-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

- Add -s option (strict mode) to merge_config.sh to make it fail when
any symbol is redefined.

- Show a warning if a different compiler is used for building external
modules.

- Infer --target from ARCH for CC=clang to let you cross-compile the
kernel without CROSS_COMPILE.

- Make the integrated assembler default (LLVM_IAS=1) for CC=clang.

- Add <linux/stdarg.h> to the kernel source instead of borrowing
<stdarg.h> from the compiler.

- Add Nick Desaulniers as a Kbuild reviewer.

- Drop stale cc-option tests.

- Fix the combination of CONFIG_TRIM_UNUSED_KSYMS and CONFIG_LTO_CLANG
to handle symbols in inline assembly.

- Show a warning if 'FORCE' is missing for if_changed rules.

- Various cleanups

* tag 'kbuild-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (39 commits)
kbuild: redo fake deps at include/ksym/*.h
kbuild: clean up objtool_args slightly
modpost: get the *.mod file path more simply
checkkconfigsymbols.py: Fix the '--ignore' option
kbuild: merge vmlinux_link() between ARCH=um and other architectures
kbuild: do not remove 'linux' link in scripts/link-vmlinux.sh
kbuild: merge vmlinux_link() between the ordinary link and Clang LTO
kbuild: remove stale *.symversions
kbuild: remove unused quiet_cmd_update_lto_symversions
gen_compile_commands: extract compiler command from a series of commands
x86: remove cc-option-yn test for -mtune=
arc: replace cc-option-yn uses with cc-option
s390: replace cc-option-yn uses with cc-option
ia64: move core-y in arch/ia64/Makefile to arch/ia64/Kbuild
sparc: move the install rule to arch/sparc/Makefile
security: remove unneeded subdir-$(CONFIG_...)
kbuild: sh: remove unused install script
kbuild: Fix 'no symbols' warning when CONFIG_TRIM_UNUSD_KSYMS=y
kbuild: Switch to 'f' variants of integrated assembler flag
kbuild: Shuffle blank line to improve comment meaning
...


14726903 03-Sep-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'akpm' (patches from Andrew)

Merge misc updates from Andrew Morton:
"173 patches.

Subsystems affected by this series: ia64, ocfs2, block, and mm (debug,
pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure,
hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock,
oom-kill, migration, ksm, percpu, vmstat, and madvise)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits)
mm/madvise: add MADV_WILLNEED to process_madvise()
mm/vmstat: remove unneeded return value
mm/vmstat: simplify the array size calculation
mm/vmstat: correct some wrong comments
mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
selftests: vm: add COW time test for KSM pages
selftests: vm: add KSM merging time test
mm: KSM: fix data type
selftests: vm: add KSM merging across nodes test
selftests: vm: add KSM zero page merging test
selftests: vm: add KSM unmerge test
selftests: vm: add KSM merge test
mm/migrate: correct kernel-doc notation
mm: wire up syscall process_mrelease
mm: introduce process_mrelease system call
memblock: make memblock_find_in_range method private
mm/mempolicy.c: use in_task() in mempolicy_slab_node()
mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
mm/mempolicy: advertise new MPOL_PREFERRED_MANY
mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
...


5b78ed24 02-Sep-2021 Luigi Rizzo <lrizzo@google.com>

mm/pagemap: add mmap_assert_locked() annotations to find_vma*()

find_vma() and variants need protection when used. This patch adds
mmap_assert_lock() calls in the functions.

To make sure the invariant is satisfied, we also need to add a
mmap_read_lock() around the get_user_pages_remote() call in
get_arg_page(). The lock is not strictly necessary because the mm has
been newly created, but the extra cost is limited because the same mutex
was also acquired shortly before in __bprm_mm_init(), so it is hot and
uncontended.

[penguin-kernel@i-love.sakura.ne.jp: TOMOYO needs the same protection which get_arg_page() needs]
Link: https://lkml.kernel.org/r/58bb6bf7-a57e-8a40-e74b-39584b415152@i-love.sakura.ne.jp

Link: https://lkml.kernel.org/r/20210731175341.3458608-1-lrizzo@google.com
Signed-off-by: Luigi Rizzo <lrizzo@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e052826f 28-May-2021 Masahiro Yamada <masahiroy@kernel.org>

security: remove unneeded subdir-$(CONFIG_...)

All of these are unneeded. The directories to descend are specified
by obj-$(CONFIG_...).

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

aef4892a 02-Sep-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'integrity-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity subsystem updates from Mimi Zohar:

- Limit the allowed hash algorithms when writing security.ima xattrs or
verifying them, based on the IMA policy and the configured hash
algorithms.

- Return the calculated "critical data" measurement hash and size to
avoid code duplication. (Preparatory change for a proposed LSM.)

- and a single patch to address a compiler warning.

* tag 'integrity-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
IMA: reject unknown hash algorithms in ima_get_hash_algo
IMA: prevent SETXATTR_CHECK policy rules with unavailable algorithms
IMA: introduce a new policy option func=SETXATTR_CHECK
IMA: add a policy option to restrict xattr hash algorithms on appraisal
IMA: add support to restrict the hash algorithms used for file appraisal
IMA: block writes of the security.ima xattr with unsupported algorithms
IMA: remove the dependency on CRYPTO_MD5
ima: Add digest and digest_len params to the functions to measure a buffer
ima: Return int in the functions to measure a buffer
ima: Introduce ima_get_current_hash_algo()
IMA: remove -Wmissing-prototypes warning


b55060d7 02-Sep-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'hardening-v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:

- Expand lib/test_stackinit to include more initialization styles

- Improve Kconfig for CLang's auto-var-init feature

- Introduce support for GCC's zero-call-used-regs feature

* tag 'hardening-v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lib/test_stackinit: Add assigned initializers
lib/test_stackinit: Allow building stand-alone
lib/test_stackinit: Fix static initializer test
hardening: Clarify Kconfig text for auto-var-init
hardening: Introduce CONFIG_ZERO_CALL_USED_REGS


9e9fb765 31-Aug-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
"Core:

- Enable memcg accounting for various networking objects.

BPF:

- Introduce bpf timers.

- Add perf link and opaque bpf_cookie which the program can read out
again, to be used in libbpf-based USDT library.

- Add bpf_task_pt_regs() helper to access user space pt_regs in
kprobes, to help user space stack unwinding.

- Add support for UNIX sockets for BPF sockmap.

- Extend BPF iterator support for UNIX domain sockets.

- Allow BPF TCP congestion control progs and bpf iterators to call
bpf_setsockopt(), e.g. to switch to another congestion control
algorithm.

Protocols:

- Support IOAM Pre-allocated Trace with IPv6.

- Support Management Component Transport Protocol.

- bridge: multicast: add vlan support.

- netfilter: add hooks for the SRv6 lightweight tunnel driver.

- tcp:
- enable mid-stream window clamping (by user space or BPF)
- allow data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
- more accurate DSACK processing for RACK-TLP

- mptcp:
- add full mesh path manager option
- add partial support for MP_FAIL
- improve use of backup subflows
- optimize option processing

- af_unix: add OOB notification support.

- ipv6: add IFLA_INET6_RA_MTU to expose MTU value advertised by the
router.

- mac80211: Target Wake Time support in AP mode.

- can: j1939: extend UAPI to notify about RX status.

Driver APIs:

- Add page frag support in page pool API.

- Many improvements to the DSA (distributed switch) APIs.

- ethtool: extend IRQ coalesce uAPI with timer reset modes.

- devlink: control which auxiliary devices are created.

- Support CAN PHYs via the generic PHY subsystem.

- Proper cross-chip support for tag_8021q.

- Allow TX forwarding for the software bridge data path to be
offloaded to capable devices.

Drivers:

- veth: more flexible channels number configuration.

- openvswitch: introduce per-cpu upcall dispatch.

- Add internet mix (IMIX) mode to pktgen.

- Transparently handle XDP operations in the bonding driver.

- Add LiteETH network driver.

- Renesas (ravb):
- support Gigabit Ethernet IP

- NXP Ethernet switch (sja1105):
- fast aging support
- support for "H" switch topologies
- traffic termination for ports under VLAN-aware bridge

- Intel 1G Ethernet
- support getcrosststamp() with PCIe PTM (Precision Time
Measurement) for better time sync
- support Credit-Based Shaper (CBS) offload, enabling HW traffic
prioritization and bandwidth reservation

- Broadcom Ethernet (bnxt)
- support pulse-per-second output
- support larger Rx rings

- Mellanox Ethernet (mlx5)
- support ethtool RSS contexts and MQPRIO channel mode
- support LAG offload with bridging
- support devlink rate limit API
- support packet sampling on tunnels

- Huawei Ethernet (hns3):
- basic devlink support
- add extended IRQ coalescing support
- report extended link state

- Netronome Ethernet (nfp):
- add conntrack offload support

- Broadcom WiFi (brcmfmac):
- add WPA3 Personal with FT to supported cipher suites
- support 43752 SDIO device

- Intel WiFi (iwlwifi):
- support scanning hidden 6GHz networks
- support for a new hardware family (Bz)

- Xen pv driver:
- harden netfront against malicious backends

- Qualcomm mobile
- ipa: refactor power management and enable automatic suspend
- mhi: move MBIM to WWAN subsystem interfaces

Refactor:

- Ambient BPF run context and cgroup storage cleanup.

- Compat rework for ndo_ioctl.

Old code removal:

- prism54 remove the obsoleted driver, deprecated by the p54 driver.

- wan: remove sbni/granch driver"

* tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1715 commits)
net: Add depends on OF_NET for LiteX's LiteETH
ipv6: seg6: remove duplicated include
net: hns3: remove unnecessary spaces
net: hns3: add some required spaces
net: hns3: clean up a type mismatch warning
net: hns3: refine function hns3_set_default_feature()
ipv6: remove duplicated 'net/lwtunnel.h' include
net: w5100: check return value after calling platform_get_resource()
net/mlxbf_gige: Make use of devm_platform_ioremap_resourcexxx()
net: mdio: mscc-miim: Make use of the helper function devm_platform_ioremap_resource()
net: mdio-ipq4019: Make use of devm_platform_ioremap_resource()
fou: remove sparse errors
ipv4: fix endianness issue in inet_rtm_getroute_build_skb()
octeontx2-af: Set proper errorcode for IPv4 checksum errors
octeontx2-af: Fix static code analyzer reported issues
octeontx2-af: Fix mailbox errors in nix_rss_flowkey_cfg
octeontx2-af: Fix loop in free and unmap counter
af_unix: fix potential NULL deref in unix_dgram_connect()
dpaa2-eth: Replace strlcpy with strscpy
octeontx2-af: Use NDC TX for transmit packet data
...


efa916af 31-Aug-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-5.15/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

- Add DM infrastructure for IMA-based remote attestion. These changes
are the basis for deploying DM-based storage in a "cloud" that must
validate configurations end-users run to maintain trust. These DM
changes allow supported DM targets' configurations to be measured via
IMA. But the policy and enforcement (of which configurations are
valid) is managed by something outside the kernel (e.g. Keylime).

- Fix DM crypt scalability regression on systems with many cpus due to
percpu_counter spinlock contention in crypt_page_alloc().

- Use in_hardirq() instead of deprecated in_irq() in DM crypt.

- Add event counters to DM writecache to allow users to further assess
how the writecache is performing.

- Various code cleanup in DM writecache's main IO mapping function.

* tag 'for-5.15/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm crypt: use in_hardirq() instead of deprecated in_irq()
dm ima: update dm documentation for ima measurement support
dm ima: update dm target attributes for ima measurements
dm ima: add a warning in dm_init if duplicate ima events are not measured
dm ima: prefix ima event name related to device mapper with dm_
dm ima: add version info to dm related events in ima log
dm ima: prefix dm table hashes in ima log with hash algorithm
dm crypt: Avoid percpu_counter spinlock contention in crypt_page_alloc()
dm: add documentation for IMA measurement support
dm: update target status functions to support IMA measurement
dm ima: measure data on device rename
dm ima: measure data on table clear
dm ima: measure data on device remove
dm ima: measure data on device resume
dm ima: measure data on table load
dm writecache: add event counters
dm writecache: report invalid return from writecache_map helpers
dm writecache: further writecache_map() cleanup
dm writecache: factor out writecache_map_remap_origin()
dm writecache: split up writecache_map() to improve code readability


9b2eacd8 31-Aug-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'Smack-for-5.15' of git://github.com/cschaufler/smack-next

Pull smack updates from Casey Schaufler:
"There is a variable used only during start-up that's now marked
__initdata and a change where the code was working by sheer luck that
is now done properly.

Both have been in next for several weeks and pass the Smack testsuite"

* tag 'Smack-for-5.15' of git://github.com/cschaufler/smack-next:
smack: mark 'smack_enabled' global variable as __initdata
Smack: Fix wrong semantics in smk_access_entry()


befa491c 31-Aug-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20210830' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux update from Paul Moore:
"We've got an unusually small SELinux pull request for v5.15 that
consists of only one (?!) patch that is really pretty minor when you
look at it.

Unsurprisingly it passes all of our tests and merges cleanly on top of
your tree right now, please merge this for v5.15"

* tag 'selinux-pr-20210830' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: return early for possible NULL audit buffers


46f4945e 30-Aug-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'efi-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI updates from Ingo Molnar:
"A handful of EFI changes for this cycle:

- EFI CPER parsing improvements

- Don't take the address of efi_guid_t internal fields"

* tag 'efi-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: cper: check section header more appropriately
efi: Don't use knowledge about efi_guid_t internals
efi: cper: fix scnprintf() use in cper_mem_err_location()


b31eea2e 09-Feb-2021 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

efi: Don't use knowledge about efi_guid_t internals

When print GUIDs supply pointer to the efi_guid_t (guid_t) type rather
its internal members.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

cb181da1 22-Aug-2021 THOBY Simon <Simon.THOBY@viveris.fr>

IMA: reject unknown hash algorithms in ima_get_hash_algo

The new function validate_hash_algo() assumed that ima_get_hash_algo()
always return a valid 'enum hash_algo', but it returned the
user-supplied value present in the digital signature without
any bounds checks.

Update ima_get_hash_algo() to always return a valid hash algorithm,
defaulting on 'ima_hash_algo' when the user-supplied value inside
the xattr is invalid.

Signed-off-by: THOBY Simon <Simon.THOBY@viveris.fr>
Reported-by: syzbot+e8bafe7b82c739eaf153@syzkaller.appspotmail.com
Fixes: 50f742dd9147 ("IMA: block writes of the security.ima xattr with unsupported algorithms")
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

8ecd39cb 16-Aug-2021 THOBY Simon <Simon.THOBY@viveris.fr>

IMA: prevent SETXATTR_CHECK policy rules with unavailable algorithms

SETXATTR_CHECK policy rules assume that any algorithm listed in the
'appraise_algos' flag must be accepted when performing setxattr() on
the security.ima xattr. However nothing checks that they are
available in the current kernel. A userland application could hash
a file with a digest that the kernel wouldn't be able to verify.
However, if SETXATTR_CHECK is not in use, the kernel already forbids
that xattr write.

Verify that algorithms listed in appraise_algos are available to the
current kernel and reject the policy update otherwise. This will fix
the inconsistency between SETXATTR_CHECK and non-SETXATTR_CHECK
behaviors.

That filtering is only performed in ima_parse_appraise_algos() when
updating policies so that we do not have to pay the price of
allocating a hash object every time validate_hash_algo() is called
in ima_inode_setxattr().

Signed-off-by: THOBY Simon <Simon.THOBY@viveris.fr>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

4f2946aa 16-Aug-2021 THOBY Simon <Simon.THOBY@viveris.fr>

IMA: introduce a new policy option func=SETXATTR_CHECK

While users can restrict the accepted hash algorithms for the
security.ima xattr file signature when appraising said file, users
cannot restrict the algorithms that can be set on that attribute:
any algorithm built in the kernel is accepted on a write.

Define a new value for the ima policy option 'func' that restricts
globally the hash algorithms accepted when writing the security.ima
xattr.

When a policy contains a rule of the form
appraise func=SETXATTR_CHECK appraise_algos=sha256,sha384,sha512
only values corresponding to one of these three digest algorithms
will be accepted for writing the security.ima xattr. Attempting to
write the attribute using another algorithm (or "free-form" data)
will be denied with an audit log message. In the absence of such a
policy rule, the default is still to only accept hash algorithms
built in the kernel (with all the limitations that entails).

Signed-off-by: THOBY Simon <Simon.THOBY@viveris.fr>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

583a80ae 16-Aug-2021 THOBY Simon <Simon.THOBY@viveris.fr>

IMA: add a policy option to restrict xattr hash algorithms on appraisal

The kernel has the ability to restrict the set of hash algorithms it
accepts for the security.ima xattr when it appraises files.

Define a new IMA policy rule option "appraise_algos=", using the
mentioned mechanism to expose a user-toggable policy knob to opt-in
to that restriction and select the desired set of algorithms that
must be accepted.

When a policy rule uses the 'appraise_algos' option, appraisal of a
file referenced by that rule will now fail if the digest algorithm
employed to hash the file was not one of those explicitly listed in
the option. In its absence, any hash algorithm compiled in the
kernel will be accepted.

For example, on a system where SELinux is properly deployed, the rule
appraise func=BPRM_CHECK obj_type=iptables_exec_t \
appraise_algos=sha256,sha384
will block the execution of iptables if the xattr security.ima of its
executables were not hashed with either sha256 or sha384.

Signed-off-by: THOBY Simon <Simon.THOBY@viveris.fr>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

1624dc00 16-Aug-2021 THOBY Simon <Simon.THOBY@viveris.fr>

IMA: add support to restrict the hash algorithms used for file appraisal

The kernel accepts any hash algorithm as a value for the security.ima
xattr. Users may wish to restrict the accepted algorithms to only
support strong cryptographic ones.

Provide the plumbing to restrict the permitted set of hash algorithms
used for verifying file hashes and signatures stored in security.ima
xattr.

Signed-off-by: THOBY Simon <Simon.THOBY@viveris.fr>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

50f742dd 16-Aug-2021 THOBY Simon <Simon.THOBY@viveris.fr>

IMA: block writes of the security.ima xattr with unsupported algorithms

By default, writes to the extended attributes security.ima will be
allowed even if the hash algorithm used for the xattr is not compiled
in the kernel (which does not make sense because the kernel would not
be able to appraise that file as it lacks support for validating the
hash).

Prevent and audit writes to the security.ima xattr if the hash algorithm
used in the new value is not available in the current kernel.

Signed-off-by: THOBY Simon <Simon.THOBY@viveris.fr>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

8510505d 16-Aug-2021 THOBY Simon <Simon.THOBY@viveris.fr>

IMA: remove the dependency on CRYPTO_MD5

MD5 is a weak digest algorithm that shouldn't be used for cryptographic
operation. It hinders the efficiency of a patch set that aims to limit
the digests allowed for the extended file attribute namely security.ima.
MD5 is no longer a requirement for IMA, nor should it be used there.

The sole place where we still use the MD5 algorithm inside IMA is setting
the ima_hash algorithm to MD5, if the user supplies 'ima_hash=md5'
parameter on the command line. With commit ab60368ab6a4 ("ima: Fallback
to the builtin hash algorithm"), setting "ima_hash=md5" fails gracefully
when CRYPTO_MD5 is not set:
ima: Can not allocate md5 (reason: -2)
ima: Allocating md5 failed, going to use default hash algorithm sha256

Remove the CRYPTO_MD5 dependency for IMA.

Signed-off-by: THOBY Simon <Simon.THOBY@viveris.fr>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
[zohar@linux.ibm.com: include commit number in patch description for
stable.]
Cc: stable@vger.kernel.org # 4.17
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

f4083a75 13-Aug-2021 Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h
9e26680733d5 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp")
9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins")
099fdeda659d ("bnxt_en: Event handler for PPS events")

kernel/bpf/helpers.c
include/linux/bpf-cgroup.h
a2baf4e8bb0f ("bpf: Fix potentially incorrect results with bpf_get_local_storage()")
c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current")

drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
5957cc557dc5 ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray")
2d0b41a37679 ("net/mlx5: Refcount mlx5_irq with integer")

MAINTAINERS
7b637cd52f02 ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo")
7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>


91ccbbac 12-Jul-2021 Tushar Sugandhi <tusharsu@linux.microsoft.com>

dm ima: measure data on table load

DM configures a block device with various target specific attributes
passed to it as a table. DM loads the table, and calls each target’s
respective constructors with the attributes as input parameters.
Some of these attributes are critical to ensure the device meets
certain security bar. Thus, IMA should measure these attributes, to
ensure they are not tampered with, during the lifetime of the device.
So that the external services can have high confidence in the
configuration of the block-devices on a given system.

Some devices may have large tables. And a given device may change its
state (table-load, suspend, resume, rename, remove, table-clear etc.)
many times. Measuring these attributes each time when the device
changes its state will significantly increase the size of the IMA logs.
Further, once configured, these attributes are not expected to change
unless a new table is loaded, or a device is removed and recreated.
Therefore the clear-text of the attributes should only be measured
during table load, and the hash of the active/inactive table should be
measured for the remaining device state changes.

Export IMA function ima_measure_critical_data() to allow measurement
of DM device parameters, as well as target specific attributes, during
table load. Compute the hash of the inactive table and store it for
measurements during future state change. If a load is called multiple
times, update the inactive table hash with the hash of the latest
populated table. So that the correct inactive table hash is measured
when the device transitions to different states like resume, remove,
rename, etc.

Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com> # leak fix
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

51e1bb9e 08-Aug-2021 Daniel Borkmann <daniel@iogearbox.net>

bpf: Add lockdown check for probe_write_user helper

Back then, commit 96ae52279594 ("bpf: Add bpf_probe_write_user BPF helper
to be called in tracers") added the bpf_probe_write_user() helper in order
to allow to override user space memory. Its original goal was to have a
facility to "debug, divert, and manipulate execution of semi-cooperative
processes" under CAP_SYS_ADMIN. Write to kernel was explicitly disallowed
since it would otherwise tamper with its integrity.

One use case was shown in cf9b1199de27 ("samples/bpf: Add test/example of
using bpf_probe_write_user bpf helper") where the program DNATs traffic
at the time of connect(2) syscall, meaning, it rewrites the arguments to
a syscall while they're still in userspace, and before the syscall has a
chance to copy the argument into kernel space. These days we have better
mechanisms in BPF for achieving the same (e.g. for load-balancers), but
without having to write to userspace memory.

Of course the bpf_probe_write_user() helper can also be used to abuse
many other things for both good or bad purpose. Outside of BPF, there is
a similar mechanism for ptrace(2) such as PTRACE_PEEK{TEXT,DATA} and
PTRACE_POKE{TEXT,DATA}, but would likely require some more effort.
Commit 96ae52279594 explicitly dedicated the helper for experimentation
purpose only. Thus, move the helper's availability behind a newly added
LOCKDOWN_BPF_WRITE_USER lockdown knob so that the helper is disabled under
the "integrity" mode. More fine-grained control can be implemented also
from LSM side with this change.

Fixes: 96ae52279594 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>

71330842 09-Aug-2021 Daniel Borkmann <daniel@iogearbox.net>

bpf: Add _kernel suffix to internal lockdown_bpf_read

Rename LOCKDOWN_BPF_READ into LOCKDOWN_BPF_READ_KERNEL so we have naming
more consistent with a LOCKDOWN_BPF_WRITE_USER option that we are adding.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>

0ca8d3ca 05-Aug-2021 Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Build failure in drivers/net/wwan/mhi_wwan_mbim.c:
add missing parameter (0, assuming we don't want buffer pre-alloc).

Conflict in drivers/net/dsa/sja1105/sja1105_main.c between:
589918df9322 ("net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
0fac6aa098ed ("net: dsa: sja1105: delete the best_effort_vlan_filtering mode")

Follow the instructions from the commit message of the former commit
- removed the if conditions. When looking at commit 589918df9322 ("net:
dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
note that the mask_iotag fields get removed by the following patch.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>


0b53abfc 05-Aug-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20210805' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux fix from Paul Moore:
"One small SELinux fix for a problem where an error code was not being
propagated back up to userspace when a bogus SELinux policy is loaded
into the kernel"

* tag 'selinux-pr-20210805' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: correct the return value when loads initial sids


4c156084 28-Jul-2021 Xiu Jianfeng <xiujianfeng@huawei.com>

selinux: correct the return value when loads initial sids

It should not return 0 when SID 0 is assigned to isids.
This patch fixes it.

Cc: stable@vger.kernel.org
Fixes: e3e0b582c321a ("selinux: remove unused initial SIDs and improve handling")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
[PM: remove changelog from description]
Signed-off-by: Paul Moore <paul@paul-moore.com>

bc49d816 28-Jul-2021 Jeremy Kerr <jk@codeconstruct.com.au>

mctp: Add MCTP base

Add basic Kconfig, an initial (empty) af_mctp source object, and
{AF,PF}_MCTP definitions, and the required definitions for a new
protocol type.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

ca3c9bdb 23-Jul-2021 Roberto Sassu <roberto.sassu@huawei.com>

ima: Add digest and digest_len params to the functions to measure a buffer

This patch performs the final modification necessary to pass the buffer
measurement to callers, so that they provide a functionality similar to
ima_file_hash(). It adds the 'digest' and 'digest_len' parameters to
ima_measure_critical_data() and process_buffer_measurement().

These functions calculate the digest even if there is no suitable rule in
the IMA policy and, in this case, they simply return 1 before generating a
new measurement entry.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

ce5bb5a8 23-Jul-2021 Roberto Sassu <roberto.sassu@huawei.com>

ima: Return int in the functions to measure a buffer

ima_measure_critical_data() and process_buffer_measurement() currently
don't return a result as, unlike appraisal-related functions, the result is
not used by callers to deny an operation. Measurement-related functions
instead rely on the audit subsystem to notify the system administrator when
an error occurs.

However, ima_measure_critical_data() and process_buffer_measurement() are a
special case, as these are the only functions that can return a buffer
measurement (for files, there is ima_file_hash()). In a subsequent patch,
they will be modified to return the calculated digest.

In preparation to return the result of the digest calculation, this patch
modifies the return type from void to int, and returns 0 if the buffer has
been successfully measured, a negative value otherwise.

Given that the result of the measurement is still not necessary, this patch
does not modify the behavior of existing callers by processing the returned
value. For those, the return value is ignored.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Acked-by: Paul Moore <paul@paul-moore.com> (for the SELinux bits)
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

5d1ef2ce 23-Jul-2021 Roberto Sassu <roberto.sassu@huawei.com>

ima: Introduce ima_get_current_hash_algo()

Buffer measurements, unlike file measurements, are not accessible after the
measurement is done, as buffers are not suitable for use with the
integrity_iint_cache structure (there is no index, for files it is the
inode number). In the subsequent patches, the measurement (digest) will be
returned directly by the functions that perform the buffer measurement,
ima_measure_critical_data() and process_buffer_measurement().

A caller of those functions also needs to know the algorithm used to
calculate the digest. Instead of adding the algorithm as a new parameter to
the functions, this patch provides it separately with the new function
ima_get_current_hash_algo().

Since the hash algorithm does not change after the IMA setup phase, there
is no risk of races (obtaining a digest calculated with a different
algorithm than the one returned).

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
[zohar@linux.ibm.com: annotate ima_hash_algo as __ro_after_init]
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

a32ad904 29-Jun-2021 Austin Kim <austin.kim@lge.com>

IMA: remove -Wmissing-prototypes warning

With W=1 build, the compiler throws warning message as below:

security/integrity/ima/ima_mok.c:24:12: warning:
no previous prototype for ‘ima_mok_init’ [-Wmissing-prototypes]
__init int ima_mok_init(void)

Silence the warning by adding static keyword to ima_mok_init().

Signed-off-by: Austin Kim <austin.kim@lge.com>
Fixes: 41c89b64d718 ("IMA: create machine owner and blacklist keyrings")
Cc: stable@vger.kernel.org
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

dcb7c0b9 20-Jul-2021 Kees Cook <keescook@chromium.org>

hardening: Clarify Kconfig text for auto-var-init

Clarify the details around the automatic variable initialization modes
available. Specifically this details the values used for pattern init
and expands on the rationale for zero init safety. Additionally makes
zero init the default when available.

Cc: glider@google.com
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: linux-security-module@vger.kernel.org
Cc: clang-built-linux@googlegroups.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>

a82adfd5 12-Apr-2021 Kees Cook <keescook@chromium.org>

hardening: Introduce CONFIG_ZERO_CALL_USED_REGS

When CONFIG_ZERO_CALL_USED_REGS is enabled, build the kernel with
"-fzero-call-used-regs=used-gpr" (in GCC 11). This option will zero any
caller-used register contents just before returning from a function,
ensuring that temporary values are not leaked beyond the function
boundary. This means that register contents are less likely to be
available for side channel attacks and information exposures.

Additionally this helps reduce the number of useful ROP gadgets in the
kernel image by about 20%:

$ ROPgadget.py --nosys --nojop --binary vmlinux.stock | tail -n1
Unique gadgets found: 337245

$ ROPgadget.py --nosys --nojop --binary vmlinux.zero-call-regs | tail -n1
Unique gadgets found: 267175

and more notably removes simple "write-what-where" gadgets:

$ ROPgadget.py --ropchain --binary vmlinux.stock | sed -n '/Step 1/,/Step 2/p'
- Step 1 -- Write-what-where gadgets

[+] Gadget found: 0xffffffff8102d76c mov qword ptr [rsi], rdx ; ret
[+] Gadget found: 0xffffffff81000cf5 pop rsi ; ret
[+] Gadget found: 0xffffffff8104d7c8 pop rdx ; ret
[-] Can't find the 'xor rdx, rdx' gadget. Try with another 'mov [reg], reg'

[+] Gadget found: 0xffffffff814c2b4c mov qword ptr [rsi], rdi ; ret
[+] Gadget found: 0xffffffff81000cf5 pop rsi ; ret
[+] Gadget found: 0xffffffff81001e51 pop rdi ; ret
[-] Can't find the 'xor rdi, rdi' gadget. Try with another 'mov [reg], reg'

[+] Gadget found: 0xffffffff81540d61 mov qword ptr [rsi], rdi ; pop rbx ; pop rbp ; ret
[+] Gadget found: 0xffffffff81000cf5 pop rsi ; ret
[+] Gadget found: 0xffffffff81001e51 pop rdi ; ret
[-] Can't find the 'xor rdi, rdi' gadget. Try with another 'mov [reg], reg'

[+] Gadget found: 0xffffffff8105341e mov qword ptr [rsi], rax ; ret
[+] Gadget found: 0xffffffff81000cf5 pop rsi ; ret
[+] Gadget found: 0xffffffff81029a11 pop rax ; ret
[+] Gadget found: 0xffffffff811f1c3b xor rax, rax ; ret

- Step 2 -- Init syscall number gadgets

$ ROPgadget.py --ropchain --binary vmlinux.zero* | sed -n '/Step 1/,/Step 2/p'
- Step 1 -- Write-what-where gadgets

[-] Can't find the 'mov qword ptr [r64], r64' gadget

For an x86_64 parallel build tests, this has a less than 1% performance
impact, and grows the image size less than 1%:

$ size vmlinux.stock vmlinux.zero-call-regs
text data bss dec hex filename
22437676 8559152 14127340 45124168 2b08a48 vmlinux.stock
22453184 8563248 14110956 45127388 2b096dc vmlinux.zero-call-regs

Impact for other architectures may vary. For example, arm64 sees a 5.5%
image size growth, mainly due to needing to always clear x16 and x17:
https://lore.kernel.org/lkml/20210510134503.GA88495@C02TD0UTHF1T.local/

Signed-off-by: Kees Cook <keescook@chromium.org>

bfc3cac0 29-Jun-2021 Austin Kim <austin.kim@lge.com>

smack: mark 'smack_enabled' global variable as __initdata

Mark 'smack_enabled' as __initdata
since it is only used during initialization code.

Signed-off-by: Austin Kim <austin.kim@lge.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

6d14f5c7 15-Jul-2021 Tianjia Zhang <tianjia.zhang@linux.alibaba.com>

Smack: Fix wrong semantics in smk_access_entry()

In the smk_access_entry() function, if no matching rule is found
in the rust_list, a negative error code will be used to perform bit
operations with the MAY_ enumeration value. This is semantically
wrong. This patch fixes this issue.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

893c47d1 13-Jul-2021 Austin Kim <austin.kim@lge.com>

selinux: return early for possible NULL audit buffers

audit_log_start() may return NULL in below cases:

- when audit is not initialized.
- when audit backlog limit exceeds.

After the call to audit_log_start() is made and then possible NULL audit
buffer argument is passed to audit_log_*() functions,
audit_log_*() functions return immediately in case of a NULL audit buffer
argument.

But it is optimal to return early when audit_log_start() returns NULL,
because it is not necessary for audit_log_*() functions to be called with
NULL audit buffer argument.

So add exception handling for possible NULL audit buffers where
return value can be handled from callers.

Signed-off-by: Austin Kim <austin.kim@lge.com>
[PM: tweak subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>

4cad6719 02-Jul-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'asm-generic-unaligned-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm/unaligned.h unification from Arnd Bergmann:
"Unify asm/unaligned.h around struct helper

The get_unaligned()/put_unaligned() helpers are traditionally
architecture specific, with the two main variants being the
"access-ok.h" version that assumes unaligned pointer accesses always
work on a particular architecture, and the "le-struct.h" version that
casts the data to a byte aligned type before dereferencing, for
architectures that cannot always do unaligned accesses in hardware.

Based on the discussion linked below, it appears that the access-ok
version is not realiable on any architecture, but the struct version
probably has no downsides. This series changes the code to use the
same implementation on all architectures, addressing the few
exceptions separately"

Link: https://lore.kernel.org/lkml/75d07691-1e4f-741f-9852-38c0b4f520bc@synopsys.com/
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
Link: https://lore.kernel.org/lkml/20210507220813.365382-14-arnd@kernel.org/
Link: git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git unaligned-rework-v2
Link: https://lore.kernel.org/lkml/CAHk-=whGObOKruA_bU3aPGZfoDqZM1_9wBkwREp0H0FgR-90uQ@mail.gmail.com/

* tag 'asm-generic-unaligned-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: simplify asm/unaligned.h
asm-generic: uaccess: 1-byte access is always aligned
netpoll: avoid put_unaligned() on single character
mwifiex: re-fix for unaligned accesses
apparmor: use get_unaligned() only for multi-byte words
partitions: msdos: fix one-byte get_unaligned()
asm-generic: unaligned always use struct helpers
asm-generic: unaligned: remove byteshift helpers
powerpc: use linux/unaligned/le_struct.h on LE power7
m68k: select CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
sh: remove unaligned access for sh4a
openrisc: always use unaligned-struct header
asm-generic: use asm-generic/unaligned.h for most architectures


92183137 30-Jun-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'safesetid-5.14' of git://github.com/micah-morton/linux

Pull SafeSetID update from Micah Morton:
"One very minor code cleanup change that marks a variable as
__initdata"

* tag 'safesetid-5.14' of git://github.com/micah-morton/linux:
LSM: SafeSetID: Mark safesetid_initialized as __initdata


5c874a5b 30-Jun-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'Smack-for-5.14' of git://github.com/cschaufler/smack-next

Pull smack updates from Casey Schaufler:
"There is nothing more significant than an improvement to a byte count
check in smackfs.

All changes have been in next for weeks"

* tag 'Smack-for-5.14' of git://github.com/cschaufler/smack-next:
Smack: fix doc warning
Revert "Smack: Handle io_uring kernel thread privileges"
smackfs: restrict bytes count in smk_set_cipso()
security/smack/: fix misspellings using codespell tool


290fe0fa 30-Jun-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'audit-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
"Another merge window, another small audit pull request.

Four patches in total: one is cosmetic, one removes an unnecessary
initialization, one renames some enum values to prevent name
collisions, and one converts list_del()/list_add() to list_move().

None of these are earth shattering and all pass the audit-testsuite
tests while merging cleanly on top of your tree from earlier today"

* tag 'audit-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: remove unnecessary 'ret' initialization
audit: remove trailing spaces and tabs
audit: Use list_move instead of list_del/list_add
audit: Rename enum audit_state constants to avoid AUDIT_DISABLED redefinition
audit: add blank line after variable declarations


6bd344e5 30-Jun-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux updates from Paul Moore:

- The slow_avc_audit() function is now non-blocking so we can remove
the AVC_NONBLOCKING tricks; this also includes the 'flags' variant of
avc_has_perm().

- Use kmemdup() instead of kcalloc()+copy when copying parts of the
SELinux policydb.

- The InfiniBand device name is now passed by reference when possible
in the SELinux code, removing a strncpy().

- Minor cleanups including: constification of avtab function args,
removal of useless LSM/XFRM function args, SELinux kdoc fixes, and
removal of redundant assignments.

* tag 'selinux-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: kill 'flags' argument in avc_has_perm_flags() and avc_audit()
selinux: slow_avc_audit has become non-blocking
selinux: Fix kernel-doc
selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC
lsm_audit,selinux: pass IB device name by reference
selinux: Remove redundant assignment to rc
selinux: Corrected comment to match kernel-doc comment
selinux: delete selinux_xfrm_policy_lookup() useless argument
selinux: constify some avtab function arguments
selinux: simplify duplicate_policydb_cond_list() by using kmemdup()


a60c538e 28-Jun-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'integrity-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity subsystem updates from Mimi Zohar:
"The large majority of the changes are EVM portable & immutable
signature related: removing a dependency on loading an HMAC key,
safely allowing file metadata included in the EVM portable & immutable
signatures to be modified, allowing EVM signatures to fulfill IMA file
signature policy requirements, including the EVM file metadata
signature in lieu of an IMA file data signature in the measurement
list, and adding dynamic debugging of EVM file metadata.

In addition, in order to detect critical data or file change
reversions, duplicate measurement records are permitted in the IMA
measurement list.

The remaining patches address compiler, sparse, and doc warnings"

* tag 'integrity-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: (31 commits)
evm: Check xattr size discrepancy between kernel and user
evm: output EVM digest calculation info
IMA: support for duplicate measurement records
ima: Fix warning: no previous prototype for function 'ima_add_kexec_buffer'
ima: differentiate between EVM failures in the audit log
ima: Fix fall-through warning for Clang
ima: Pass NULL instead of 0 to ima_get_action() in ima_file_mprotect()
ima: Include header defining ima_post_key_create_or_update()
ima/evm: Fix type mismatch
ima: Set correct casting types
doc: Fix warning in Documentation/security/IMA-templates.rst
evm: Don't return an error in evm_write_xattrs() if audit is not enabled
ima: Define new template evm-sig
ima: Define new template fields xattrnames, xattrlengths and xattrvalues
evm: Verify portable signatures against all protected xattrs
ima: Define new template field imode
ima: Define new template fields iuid and igid
ima: Add ima_show_template_uint() template library function
ima: Don't remove security.ima if file must not be appraised
ima: Introduce template field evmsig and write to field sig as fallback
...


907a399d 21-Jun-2021 Roberto Sassu <roberto.sassu@huawei.com>

evm: Check xattr size discrepancy between kernel and user

The kernel and the user obtain an xattr value in two different ways:

kernel (EVM): uses vfs_getxattr_alloc() which obtains the xattr value from
the filesystem handler (raw value);

user (ima-evm-utils): uses vfs_getxattr() which obtains the xattr value
from the LSMs (normalized value).

Normally, this does not have an impact unless security.selinux is set with
setfattr, with a value not terminated by '\0' (this is not the recommended
way, security.selinux should be set with the appropriate tools such as
chcon and restorecon).

In this case, the kernel and the user see two different xattr values: the
former sees the xattr value without '\0' (raw value), the latter sees the
value with '\0' (value normalized by SELinux).

This could result in two different verification outcomes from EVM and
ima-evm-utils, if a signature was calculated with a security.selinux value
terminated by '\0' and the value set in the filesystem is not terminated by
'\0'. The former would report verification failure due to the missing '\0',
while the latter would report verification success (because it gets the
normalized value with '\0').

This patch mitigates this issue by comparing in evm_calc_hmac_or_hash() the
size of the xattr returned by the two xattr functions and by warning the
user if there is a discrepancy.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

87ac3d00 13-May-2021 Mimi Zohar <zohar@linux.ibm.com>

evm: output EVM digest calculation info

Output the data used in calculating the EVM digest and the resulting
digest as ascii hexadecimal strings.

Suggested-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> (CONFIG_DYNAMIC_DEBUG)
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reported-by: kernel test robot <lkp@intel.com> (Use %zu for size_t)
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

98eaa63e 10-Jun-2021 ChenXiaoSong <chenxiaosong2@huawei.com>

tomoyo: fix doc warnings

Fix gcc W=1 warnings:

security/tomoyo/audit.c:331: warning: Function parameter or member 'matched_acl' not described in 'tomoyo_get_audit'
security/tomoyo/securityfs_if.c:146: warning: Function parameter or member 'inode' not described in 'tomoyo_release'
security/tomoyo/tomoyo.c:122: warning: Function parameter or member 'path' not described in 'tomoyo_inode_getattr'
security/tomoyo/tomoyo.c:497: warning: Function parameter or member 'clone_flags' not described in 'tomoyo_task_alloc'
security/tomoyo/util.c:92: warning: Function parameter or member 'time64' not described in 'tomoyo_convert_time'

Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
[ penguin-kernel: Also adjust spaces and similar warnings ]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

0ecc6178 10-Jun-2021 Austin Kim <austin.kim@lge.com>

audit: remove unnecessary 'ret' initialization

The variable 'ret' is set to 0 when declared.
The 'ret' is unused until it is set to 0 again.

So it had better remove unnecessary initialization.

Signed-off-by: Austin Kim <austin.kim@lge.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

d99cf13f 16-Jan-2021 Al Viro <viro@zeniv.linux.org.uk>

selinux: kill 'flags' argument in avc_has_perm_flags() and avc_audit()

... along with avc_has_perm_flags() itself, since now it's identical
to avc_has_perm() (as pointed out by Paul Moore)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[PM: add "selinux:" prefix to subj and tweak for length]
Signed-off-by: Paul Moore <paul@paul-moore.com>

b17ec22f 16-Jan-2021 Al Viro <viro@zeniv.linux.org.uk>

selinux: slow_avc_audit has become non-blocking

dump_common_audit_data() is safe to use under rcu_read_lock() now;
no need for AVC_NONBLOCKING and games around it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul Moore <paul@paul-moore.com>

d0a83314 11-Jun-2021 Yang Li <yang.lee@linux.alibaba.com>

selinux: Fix kernel-doc

Fix function name and add comment for parameter state in ss/services.c
kernel-doc to remove some warnings found by running make W=1 LLVM=1.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

52c20839 10-May-2021 Tushar Sugandhi <tusharsu@linux.microsoft.com>

IMA: support for duplicate measurement records

IMA measures contents of a given file/buffer/critical-data record,
and properly re-measures it on change. However, IMA does not measure
the duplicate value for a given record, since TPM extend is a very
expensive operation. For example, if the record changes from value
'v#1' to 'v#2', and then back to 'v#1', IMA will not measure and log
the last change to 'v#1', since the hash of 'v#1' for that record is
already present in the IMA htable. This limits the ability of an
external attestation service to accurately determine the current state
of the system. The service would incorrectly conclude that the latest
value of the given record on the system is 'v#2', and act accordingly.

Define and use a new Kconfig option IMA_DISABLE_HTABLE to permit
duplicate records in the IMA measurement list.

In addition to the duplicate measurement records described above,
other duplicate file measurement records may be included in the log,
when CONFIG_IMA_DISABLE_HTABLE is enabled. For example,
- i_version is not enabled,
- i_generation changed,
- same file present on different filesystems,
- an inode is evicted from dcache

Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
[zohar@linux.ibm.com: updated list of duplicate measurement records]
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

c6791349 10-Jun-2021 Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

ima: Fix warning: no previous prototype for function 'ima_add_kexec_buffer'

The function prototype for ima_add_kexec_buffer() is present
in 'linux/ima.h'. But this header file is not included in
ima_kexec.c where the function is implemented. This results
in the following compiler warning when "-Wmissing-prototypes" flag
is turned on:

security/integrity/ima/ima_kexec.c:81:6: warning: no previous prototype
for function 'ima_add_kexec_buffer' [-Wmissing-prototypes]

Include the header file 'linux/ima.h' in ima_kexec.c to fix
the compiler warning.

Fixes: dce92f6b11c3 (arm64: Enable passing IMA log to next kernel on kexec)
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

648f2c61 09-Jun-2021 Minchan Kim <minchan@kernel.org>

selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC

In the field, we have seen lots of allocation failure from the call
path below.

06-03 13:29:12.999 1010315 31557 31557 W Binder : 31542_2: page allocation failure: order:0, mode:0x800(GFP_NOWAIT), nodemask=(null),cpuset=background,mems_allowed=0
...
...
06-03 13:29:12.999 1010315 31557 31557 W Call trace:
06-03 13:29:12.999 1010315 31557 31557 W : dump_backtrace.cfi_jt+0x0/0x8
06-03 13:29:12.999 1010315 31557 31557 W : dump_stack+0xc8/0x14c
06-03 13:29:12.999 1010315 31557 31557 W : warn_alloc+0x158/0x1c8
06-03 13:29:12.999 1010315 31557 31557 W : __alloc_pages_slowpath+0x9d8/0xb80
06-03 13:29:12.999 1010315 31557 31557 W : __alloc_pages_nodemask+0x1c4/0x430
06-03 13:29:12.999 1010315 31557 31557 W : allocate_slab+0xb4/0x390
06-03 13:29:12.999 1010315 31557 31557 W : ___slab_alloc+0x12c/0x3a4
06-03 13:29:12.999 1010315 31557 31557 W : kmem_cache_alloc+0x358/0x5e4
06-03 13:29:12.999 1010315 31557 31557 W : avc_alloc_node+0x30/0x184
06-03 13:29:12.999 1010315 31557 31557 W : avc_update_node+0x54/0x4f0
06-03 13:29:12.999 1010315 31557 31557 W : avc_has_extended_perms+0x1a4/0x460
06-03 13:29:12.999 1010315 31557 31557 W : selinux_file_ioctl+0x320/0x3d0
06-03 13:29:12.999 1010315 31557 31557 W : __arm64_sys_ioctl+0xec/0x1fc
06-03 13:29:12.999 1010315 31557 31557 W : el0_svc_common+0xc0/0x24c
06-03 13:29:12.999 1010315 31557 31557 W : el0_svc+0x28/0x88
06-03 13:29:12.999 1010315 31557 31557 W : el0_sync_handler+0x8c/0xf0
06-03 13:29:12.999 1010315 31557 31557 W : el0_sync+0x1a4/0x1c0
..
..
06-03 13:29:12.999 1010315 31557 31557 W SLUB : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
06-03 13:29:12.999 1010315 31557 31557 W cache : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0
06-03 13:29:12.999 1010315 31557 31557 W node 0 : slabs: 57, objs: 2907, free: 0
06-03 13:29:12.999 1010161 10686 10686 W SLUB : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
06-03 13:29:12.999 1010161 10686 10686 W cache : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0
06-03 13:29:12.999 1010161 10686 10686 W node 0 : slabs: 57, objs: 2907, free: 0
06-03 13:29:12.999 1010161 10686 10686 W SLUB : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
06-03 13:29:12.999 1010161 10686 10686 W cache : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0
06-03 13:29:12.999 1010161 10686 10686 W node 0 : slabs: 57, objs: 2907, free: 0
06-03 13:29:12.999 1010161 10686 10686 W SLUB : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
06-03 13:29:12.999 1010161 10686 10686 W cache : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0
06-03 13:29:12.999 1010161 10686 10686 W node 0 : slabs: 57, objs: 2907, free: 0
06-03 13:29:13.000 1010161 10686 10686 W SLUB : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
06-03 13:29:13.000 1010161 10686 10686 W cache : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0
06-03 13:29:13.000 1010161 10686 10686 W node 0 : slabs: 57, objs: 2907, free: 0
06-03 13:29:13.000 1010161 10686 10686 W SLUB : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
06-03 13:29:13.000 1010161 10686 10686 W cache : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0
06-03 13:29:13.000 1010161 10686 10686 W node 0 : slabs: 57, objs: 2907, free: 0
06-03 13:29:13.000 1010161 10686 10686 W SLUB : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
06-03 13:29:13.000 1010161 10686 10686 W cache : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0
06-03 13:29:13.000 1010161 10686 10686 W node 0 : slabs: 57, objs: 2907, free: 0
06-03 13:29:13.000 10230 30892 30892 W SLUB : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
06-03 13:29:13.000 10230 30892 30892 W cache : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0
06-03 13:29:13.000 10230 30892 30892 W node 0 : slabs: 57, objs: 2907, free: 0
06-03 13:29:13.000 10230 30892 30892 W SLUB : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
06-03 13:29:13.000 10230 30892 30892 W cache : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0

Based on [1], selinux is tolerate for failure of memory allocation.
Then, use __GFP_NOWARN together.

[1] 476accbe2f6e ("selinux: use GFP_NOWAIT in the AVC kmem_caches")

Signed-off-by: Minchan Kim <minchan@kernel.org>
[PM: subj fix, line wraps, normalized commit refs]
Signed-off-by: Paul Moore <paul@paul-moore.com>

55748ac6 02-Jun-2021 Mimi Zohar <zohar@linux.ibm.com>

ima: differentiate between EVM failures in the audit log

Differentiate between an invalid EVM portable signature failure
from other EVM HMAC/signature failures.

Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

1b8b7192 08-Jun-2021 Austin Kim <austindh.kim@gmail.com>

LSM: SafeSetID: Mark safesetid_initialized as __initdata

Mark safesetid_initialized as __initdata since it is only used
in initialization routine.

Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Signed-off-by: Micah Morton <mortonm@chromium.org>

7d2201d4 07-Jun-2021 Gustavo A. R. Silva <gustavoars@kernel.org>

ima: Fix fall-through warning for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix a
fall-through warning by explicitly adding a break statement instead
of just letting the code fall through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

531bf6a8 08-Jun-2021 Roberto Sassu <roberto.sassu@huawei.com>

ima: Pass NULL instead of 0 to ima_get_action() in ima_file_mprotect()

This patch fixes the sparse warning:

sparse: warning: Using plain integer as NULL pointer

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

8c559415 08-Jun-2021 Roberto Sassu <roberto.sassu@huawei.com>

ima: Include header defining ima_post_key_create_or_update()

This patch fixes the sparse warning for ima_post_key_create_or_update() by
adding the header file that defines the prototype (linux/ima.h).

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

6b26285f 08-Jun-2021 Roberto Sassu <roberto.sassu@huawei.com>

ima/evm: Fix type mismatch

The endianness of a variable written to the measurement list cannot be
determined at compile time, as it depends on the value of the
ima_canonical_fmt global variable (set through a kernel option with the
same name if the machine is big endian).

If ima_canonical_fmt is false, the endianness of a variable is the same as
the machine; if ima_canonical_fmt is true, the endianness is little endian.
The warning arises due to this type of instruction:

var = cpu_to_leXX(var)

which tries to assign a value in little endian to a variable with native
endianness (little or big endian).

Given that the variables set with this instruction are not used in any
operation but just written to a buffer, it is safe to force the type of the
value being set to be the same of the type of the variable with:

var = (__force <var type>)cpu_to_leXX(var)

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

24c9ae23 08-Jun-2021 Roberto Sassu <roberto.sassu@huawei.com>

ima: Set correct casting types

The code expects that the values being parsed from a buffer when the
ima_canonical_fmt global variable is true are in little endian. Thus, this
patch sets the casting types accordingly.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

fe6bde73 06-Jun-2021 ChenXiaoSong <chenxiaosong2@huawei.com>

Smack: fix doc warning

Fix gcc W=1 warning:

security/smack/smack_access.c:342: warning: Function parameter or member 'ad' not described in 'smack_log'
security/smack/smack_access.c:403: warning: Function parameter or member 'skp' not described in 'smk_insert_entry'
security/smack/smack_access.c:487: warning: Function parameter or member 'level' not described in 'smk_netlbl_mls'
security/smack/smack_access.c:487: warning: Function parameter or member 'len' not described in 'smk_netlbl_mls'

Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

d721c15f 28-May-2021 Roberto Sassu <roberto.sassu@huawei.com>

evm: Don't return an error in evm_write_xattrs() if audit is not enabled

This patch avoids that evm_write_xattrs() returns an error when audit is
not enabled. The ab variable can be NULL and still be passed to the other
audit_log_() functions, as those functions do not include any instruction.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

88016de3 03-Jun-2021 Roberto Sassu <roberto.sassu@huawei.com>

ima: Define new template evm-sig

With the recent introduction of the evmsig template field, remote verifiers
can obtain the EVM portable signature instead of the IMA signature, to
verify file metadata.

After introducing the new fields to include file metadata in the
measurement list, this patch finally defines the evm-sig template, whose
format is:

d-ng|n-ng|evmsig|xattrnames|xattrlengths|xattrvalues|iuid|igid|imode

xattrnames, xattrlengths and xattrvalues are populated only from defined
EVM protected xattrs, i.e. the ones that EVM considers to verify the
portable signature. xattrnames and xattrlengths are populated only if the
xattr is present.

xattrnames and xattrlengths are not necessary for verifying the EVM
portable signature, but they are included for completeness of information,
if a remote verifier wants to infer more from file metadata.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

8314b673 01-Jun-2021 Roberto Sassu <roberto.sassu@huawei.com>

ima: Define new template fields xattrnames, xattrlengths and xattrvalues

This patch defines the new template fields xattrnames, xattrlengths and
xattrvalues, which contain respectively a list of xattr names (strings,
separated by |), lengths (u32, hex) and values (hex). If an xattr is not
present, the name and length are not displayed in the measurement list.

Reported-by: kernel test robot <lkp@intel.com> (Missing prototype def)
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

8c7a703e 28-May-2021 Roberto Sassu <roberto.sassu@huawei.com>

evm: Verify portable signatures against all protected xattrs

Currently, the evm_config_default_xattrnames array contains xattr names
only related to LSMs which are enabled in the kernel configuration.
However, EVM portable signatures do not depend on local information and a
vendor might include in the signature calculation xattrs that are not
enabled in the target platform.

Just including all xattrs names in evm_config_default_xattrnames is not a
safe approach, because a target system might have already calculated
signatures or HMACs based only on the enabled xattrs. After applying this
patch, EVM would verify those signatures and HMACs with all xattrs instead.
The non-enabled ones, which could possibly exist, would cause a
verification error.

Thus, this patch adds a new field named enabled to the xattr_list
structure, which is set to true if the LSM associated to a given xattr name
is enabled in the kernel configuration. The non-enabled xattrs are taken
into account only in evm_calc_hmac_or_hash(), if the passed security.evm
type is EVM_XATTR_PORTABLE_DIGSIG.

The new function evm_protected_xattr_if_enabled() has been defined so that
IMA can include all protected xattrs and not only the enabled ones in the
measurement list, if the new template fields xattrnames, xattrlengths or
xattrvalues have been included in the template format.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

f8216f6b 28-May-2021 Roberto Sassu <roberto.sassu@huawei.com>

ima: Define new template field imode

This patch defines the new template field imode, which includes the
inode mode. It can be used by a remote verifier to verify the EVM portable
signature, if it was included with the template fields sig or evmsig.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

7dcfeacc 28-May-2021 Roberto Sassu <roberto.sassu@huawei.com>

ima: Define new template fields iuid and igid

This patch defines the new template fields iuid and igid, which include
respectively the inode UID and GID. For idmapped mounts, still the original
UID and GID are provided.

These fields can be used to verify the EVM portable signature, if it was
included with the template fields sig or evmsig.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

cde1391a 28-May-2021 Roberto Sassu <roberto.sassu@huawei.com>

ima: Add ima_show_template_uint() template library function

This patch introduces the new function ima_show_template_uint(). This can
be used for showing integers of different sizes in ASCII format. The
function ima_show_template_data_ascii() automatically determines how to
print a stored integer by checking the integer size.

If integers have been written in canonical format,
ima_show_template_data_ascii() calls the appropriate leXX_to_cpu() function
to correctly display the value.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

ed1b472f 14-May-2021 Roberto Sassu <roberto.sassu@huawei.com>

ima: Don't remove security.ima if file must not be appraised

Files might come from a remote source and might have xattrs, including
security.ima. It should not be IMA task to decide whether security.ima
should be kept or not. This patch removes the removexattr() system
call in ima_inode_post_setattr().

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

026d7fc9 14-May-2021 Roberto Sassu <roberto.sassu@huawei.com>

ima: Introduce template field evmsig and write to field sig as fallback

With the patch to accept EVM portable signatures when the
appraise_type=imasig requirement is specified in the policy, appraisal can
be successfully done even if the file does not have an IMA signature.

However, remote attestation would not see that a different signature type
was used, as only IMA signatures can be included in the measurement list.
This patch solves the issue by introducing the new template field 'evmsig'
to show EVM portable signatures and by including its value in the existing
field 'sig' if the IMA signature is not found.

Suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

7aa5783d 14-May-2021 Roberto Sassu <roberto.sassu@huawei.com>

ima: Allow imasig requirement to be satisfied by EVM portable signatures

System administrators can require that all accessed files have a signature
by specifying appraise_type=imasig in a policy rule.

Currently, IMA signatures satisfy this requirement. Appended signatures may
also satisfy this requirement, but are not applicable as IMA signatures.
IMA/appended signatures ensure data source authentication for file content
and prevent any change. EVM signatures instead ensure data source
authentication for file metadata. Given that the digest or signature of the
file content must be included in the metadata, EVM signatures provide the
same file data guarantees of IMA signatures, as well as providing file
metadata guarantees.

This patch lets systems protected with EVM signatures pass appraisal
verification if the appraise_type=imasig requirement is specified in the
policy. This facilitates deployment in the scenarios where only EVM
signatures are available.

The patch makes the following changes:

file xattr types:
security.ima: IMA_XATTR_DIGEST/IMA_XATTR_DIGEST_NG
security.evm: EVM_XATTR_PORTABLE_DIGSIG

execve(), mmap(), open() behavior (with appraise_type=imasig):
before: denied (file without IMA signature, imasig requirement not met)
after: allowed (file with EVM portable signature, imasig requirement met)

open(O_WRONLY) behavior (without appraise_type=imasig):
before: allowed (file without IMA signature, not immutable)
after: denied (file with EVM portable signature, immutable)

In addition, similarly to IMA signatures, this patch temporarily allows
new files without or with incomplete metadata to be opened so that content
can be written.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

1886ab01 14-May-2021 Roberto Sassu <roberto.sassu@huawei.com>

evm: Allow setxattr() and setattr() for unmodified metadata

With the patch to allow xattr/attr operations if a portable signature
verification fails, cp and tar can copy all xattrs/attrs so that at the
end of the process verification succeeds.

However, it might happen that the xattrs/attrs are already set to the
correct value (taken at signing time) and signature verification succeeds
before the copy has completed. For example, an archive might contains files
owned by root and the archive is extracted by root.

Then, since portable signatures are immutable, all subsequent operations
fail (e.g. fchown()), even if the operation is legitimate (does not alter
the current value).

This patch avoids this problem by reporting successful operation to user
space when that operation does not alter the current value of xattrs/attrs.

With this patch, the one that introduces evm_hmac_disabled() and the one
that allows a metadata operation on the INTEGRITY_FAIL_IMMUTABLE error, EVM
portable signatures can be used without disabling metadata verification
(by setting EVM_ALLOW_METADATA_WRITES). Due to keeping metadata
verification enabled, altering immutable metadata protected with a portable
signature that was successfully verified will be denied (existing
behavior).

Reported-by: kernel test robot <lkp@intel.com> [implicit declaration of function]
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

7e135dc7 14-May-2021 Roberto Sassu <roberto.sassu@huawei.com>

evm: Pass user namespace to set/remove xattr hooks

In preparation for 'evm: Allow setxattr() and setattr() for unmodified
metadata', this patch passes mnt_userns to the inode set/remove xattr hooks
so that the GID of the inode on an idmapped mount is correctly determined
by posix_acl_update_mode().

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

cdef685b 14-May-2021 Roberto Sassu <roberto.sassu@huawei.com>

evm: Allow xattr/attr operations for portable signatures

If files with portable signatures are copied from one location to another
or are extracted from an archive, verification can temporarily fail until
all xattrs/attrs are set in the destination. Only portable signatures may
be moved or copied from one file to another, as they don't depend on
system-specific information such as the inode generation. Instead portable
signatures must include security.ima.

Unlike other security.evm types, EVM portable signatures are also
immutable. Thus, it wouldn't be a problem to allow xattr/attr operations
when verification fails, as portable signatures will never be replaced with
the HMAC on possibly corrupted xattrs/attrs.

This patch first introduces a new integrity status called
INTEGRITY_FAIL_IMMUTABLE, that allows callers of
evm_verify_current_integrity() to detect that a portable signature didn't
pass verification and then adds an exception in evm_protect_xattr() and
evm_inode_setattr() for this status and returns 0 instead of -EPERM.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

4a804b8a 20-May-2021 Roberto Sassu <roberto.sassu@huawei.com>

evm: Introduce evm_hmac_disabled() to safely ignore verification errors

When a file is being created, LSMs can set the initial label with the
inode_init_security hook. If no HMAC key is loaded, the new file will have
LSM xattrs but not the HMAC. It is also possible that the file remains
without protected xattrs after creation if no active LSM provided it, or
because the filesystem does not support them.

Unfortunately, EVM will deny any further metadata operation on new files,
as evm_protect_xattr() will return the INTEGRITY_NOLABEL error if protected
xattrs exist without security.evm, INTEGRITY_NOXATTRS if no protected
xattrs exist or INTEGRITY_UNKNOWN if xattrs are not supported. This would
limit the usability of EVM when only a public key is loaded, as commands
such as cp or tar with the option to preserve xattrs won't work.

This patch introduces the evm_hmac_disabled() function to determine whether
or not it is safe to ignore verification errors, based on the ability of
EVM to calculate HMACs. If the HMAC key is not loaded, and it cannot be
loaded in the future due to the EVM_SETUP_COMPLETE initialization flag,
allowing an operation despite the attrs/xattrs being found invalid will not
make them valid.

Since the post hooks can be executed even when the HMAC key is not loaded,
this patch also ensures that the EVM_INIT_HMAC initialization flag is set
before the post hooks call evm_update_evmxattr().

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Suggested-by: Mimi Zohar <zohar@linux.ibm.com> (for ensuring EVM_INIT_HMAC is set)
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

e3ccfe1a 14-May-2021 Roberto Sassu <roberto.sassu@huawei.com>

evm: Introduce evm_revalidate_status()

When EVM_ALLOW_METADATA_WRITES is set, EVM allows any operation on
metadata. Its main purpose is to allow users to freely set metadata when it
is protected by a portable signature, until an HMAC key is loaded.

However, callers of evm_verifyxattr() are not notified about metadata
changes and continue to rely on the last status returned by the function.
For example IMA, since it caches the appraisal result, will not call again
evm_verifyxattr() until the appraisal flags are cleared, and will grant
access to the file even if there was a metadata operation that made the
portable signature invalid.

This patch introduces evm_revalidate_status(), which callers of
evm_verifyxattr() can use in their xattr hooks to determine whether
re-validation is necessary and to do the proper actions. IMA calls it in
its xattr hooks to reset the appraisal flags, so that the EVM status is
re-evaluated after a metadata operation.

Lastly, this patch also adds a call to evm_reset_status() in
evm_inode_post_setattr() to invalidate the cached EVM status after a
setattr operation.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

9acc89d3 14-May-2021 Roberto Sassu <roberto.sassu@huawei.com>

evm: Refuse EVM_ALLOW_METADATA_WRITES only if an HMAC key is loaded

EVM_ALLOW_METADATA_WRITES is an EVM initialization flag that can be set to
temporarily disable metadata verification until all xattrs/attrs necessary
to verify an EVM portable signature are copied to the file. This flag is
cleared when EVM is initialized with an HMAC key, to avoid that the HMAC is
calculated on unverified xattrs/attrs.

Currently EVM unnecessarily denies setting this flag if EVM is initialized
with a public key, which is not a concern as it cannot be used to trust
xattrs/attrs updates. This patch removes this limitation.

Fixes: ae1ba1676b88e ("EVM: Allow userland to permit modification of EVM-protected metadata")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Cc: stable@vger.kernel.org # 4.16.x
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

aa2ead71 14-May-2021 Roberto Sassu <roberto.sassu@huawei.com>

evm: Load EVM key in ima_load_x509() to avoid appraisal

The public builtin keys do not need to be appraised by IMA as the
restriction on the IMA/EVM trusted keyrings ensures that a key can be
loaded only if it is signed with a key on the builtin or secondary
keyrings.

However, when evm_load_x509() is called, appraisal is already enabled and
a valid IMA signature must be added to the EVM key to pass verification.

Since the restriction is applied on both IMA and EVM trusted keyrings, it
is safe to disable appraisal also when the EVM key is loaded. This patch
calls evm_load_x509() inside ima_load_x509() if CONFIG_IMA_LOAD_X509 is
enabled, which crosses the normal IMA and EVM boundary.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

9eea2904 14-May-2021 Roberto Sassu <roberto.sassu@huawei.com>

evm: Execute evm_inode_init_security() only when an HMAC key is loaded

evm_inode_init_security() requires an HMAC key to calculate the HMAC on
initial xattrs provided by LSMs. However, it checks generically whether a
key has been loaded, including also public keys, which is not correct as
public keys are not suitable to calculate the HMAC.

Originally, support for signature verification was introduced to verify a
possibly immutable initial ram disk, when no new files are created, and to
switch to HMAC for the root filesystem. By that time, an HMAC key should
have been loaded and usable to calculate HMACs for new files.

More recently support for requiring an HMAC key was removed from the
kernel, so that signature verification can be used alone. Since this is a
legitimate use case, evm_inode_init_security() should not return an error
when no HMAC key has been loaded.

This patch fixes this problem by replacing the evm_key_loaded() check with
a check of the EVM_INIT_HMAC flag in evm_initialized.

Fixes: 26ddabfe96b ("evm: enable EVM when X509 certificate is loaded")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Cc: stable@vger.kernel.org # 4.5.x
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

49219d9b 26-Apr-2021 Mimi Zohar <zohar@linux.ibm.com>

evm: fix writing <securityfs>/evm overflow

EVM_SETUP_COMPLETE is defined as 0x80000000, which is larger than INT_MAX.
The "-fno-strict-overflow" compiler option properly prevents signaling
EVM that the EVM policy setup is complete. Define and read an unsigned
int.

Fixes: f00d79750712 ("EVM: Allow userspace to signal an RSA key has been loaded")
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

0169d8f3 25-Mar-2021 Jens Axboe <axboe@kernel.dk>

Revert "Smack: Handle io_uring kernel thread privileges"

This reverts commit 942cb357ae7d9249088e3687ee6a00ed2745a0c7.

The io_uring PF_IO_WORKER threads no longer have PF_KTHREAD set, so no
need to special case them for credential checks.

Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

dd979d7a 07-May-2021 Arnd Bergmann <arnd@arndb.de>

apparmor: use get_unaligned() only for multi-byte words

Using get_unaligned() on a u8 pointer is pointless, and will
result in a compiler warning after a planned cleanup:

In file included from arch/x86/include/generated/asm/unaligned.h:1,
from security/apparmor/policy_unpack.c:16:
security/apparmor/policy_unpack.c: In function 'unpack_u8':
include/asm-generic/unaligned.h:13:15: error: 'packed' attribute ignored for field of type 'u8' {aka 'unsigned char'} [-Werror=attributes]
13 | const struct { type x __packed; } *__pptr = (typeof(__pptr))(ptr); \
| ^

Simply dereference this pointer directly.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: John Johansen <john.johansen@canonical.com>

869cbeef 12-May-2021 Ondrej Mosnacek <omosnace@redhat.com>

lsm_audit,selinux: pass IB device name by reference

While trying to address a Coverity warning that the dev_name string
might end up unterminated when strcpy'ing it in
selinux_ib_endport_manage_subnet(), I realized that it is possible (and
simpler) to just pass the dev_name pointer directly, rather than copying
the string to a buffer.

The ibendport variable goes out of scope at the end of the function
anyway, so the lifetime of the dev_name pointer will never be shorter
than that of ibendport, thus we can safely just pass the dev_name
pointer and be done with it.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

b3ad7855 29-Apr-2021 Ben Boeckel <mathstuf@gmail.com>

trusted-keys: match tpm_get_ops on all return paths

The `tpm_get_ops` call at the beginning of the function is not paired
with a `tpm_put_ops` on this return path.

Cc: stable@vger.kernel.org
Fixes: f2219745250f ("security: keys: trusted: use ASN.1 TPM2 key format for the blobs")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

83a775d5 29-Apr-2021 Colin Ian King <colin.king@canonical.com>

KEYS: trusted: Fix memory leak on object td

Two error return paths are neglecting to free allocated object td,
causing a memory leak. Fix this by returning via the error return
path that securely kfree's td.

Fixes clang scan-build warning:
security/keys/trusted-keys/trusted_tpm1.c:496:10: warning: Potential
memory leak [unix.Malloc]

Cc: stable@vger.kernel.org
Fixes: 5df16caada3f ("KEYS: trusted: Fix incorrect handling of tpm_get_random()")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

fd781f45 28-Apr-2021 Jiapeng Chong <jiapeng.chong@linux.alibaba.com>

selinux: Remove redundant assignment to rc

Variable rc is set to '-EINVAL' but this value is never read as
it is overwritten or not used later on, hence it is a redundant
assignment and can be removed.

Cleans up the following clang-analyzer warning:

security/selinux/ss/services.c:2103:3: warning: Value stored to 'rc' is
never read [clang-analyzer-deadcode.DeadStores].

security/selinux/ss/services.c:2079:2: warning: Value stored to 'rc' is
never read [clang-analyzer-deadcode.DeadStores].

security/selinux/ss/services.c:2071:2: warning: Value stored to 'rc' is
never read [clang-analyzer-deadcode.DeadStores].

security/selinux/ss/services.c:2062:2: warning: Value stored to 'rc' is
never read [clang-analyzer-deadcode.DeadStores].

security/selinux/ss/policydb.c:2592:3: warning: Value stored to 'rc' is
never read [clang-analyzer-deadcode.DeadStores].

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

7cffc377 25-Apr-2021 Souptick Joarder <jrdr.linux@gmail.com>

selinux: Corrected comment to match kernel-doc comment

Minor documentation update.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>

8a922805 08-Apr-2021 Zhongjun Tan <tanzhongjun@yulong.com>

selinux: delete selinux_xfrm_policy_lookup() useless argument

seliunx_xfrm_policy_lookup() is hooks of security_xfrm_policy_lookup().
The dir argument is uselss in security_xfrm_policy_lookup(). So
remove the dir argument from selinux_xfrm_policy_lookup() and
security_xfrm_policy_lookup().

Signed-off-by: Zhongjun Tan <tanzhongjun@yulong.com>
[PM: reformat the subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>

e1cce3a3 30-Mar-2021 Ondrej Mosnacek <omosnace@redhat.com>

selinux: constify some avtab function arguments

This makes the code a bit easier to reason about.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

fba472bb 30-Mar-2021 Ondrej Mosnacek <omosnace@redhat.com>

selinux: simplify duplicate_policydb_cond_list() by using kmemdup()

We can do the allocation + copying of expr.nodes in one go using
kmemdup().

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

49ec114a 12-Apr-2021 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>

smackfs: restrict bytes count in smk_set_cipso()

Oops, I failed to update subject line.

From 07571157c91b98ce1a4aa70967531e64b78e8346 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Mon, 12 Apr 2021 22:25:06 +0900
Subject: [PATCH] smackfs: restrict bytes count in smk_set_cipso()

Commit 7ef4c19d245f3dc2 ("smackfs: restrict bytes count in smackfs write
functions") missed that count > SMK_CIPSOMAX check applies to only
format == SMK_FIXED24_FMT case.

Reported-by: syzbot <syzbot+77c53db50c9fff774e8e@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

2e08fb55 18-Mar-2021 Xiong Zhenwu <xiong.zhenwu@zte.com.cn>

security/smack/: fix misspellings using codespell tool

A typo is found out by codespell tool in 383th line of smackfs.c:

$ codespell ./security/smack/
./smackfs.c:383: numer ==> number

Fix a typo found by codespell.

Signed-off-by: Xiong Zhenwu <xiong.zhenwu@zte.com.cn>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

d29c9bb0 05-May-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'safesetid-5.13' of git://github.com/micah-morton/linux

Pull SafeSetID update from Micah Morton:
"Simple code cleanup

This just has a single three-line code cleanup to eliminate some
unnecessary 'break' statements"

* tag 'safesetid-5.13' of git://github.com/micah-morton/linux:
LSM: SafeSetID: Fix code specification by scripts/checkpatch.pl


27787ba3 02-May-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull misc vfs updates from Al Viro:
"Assorted stuff all over the place"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
useful constants: struct qstr for ".."
hostfs_open(): don't open-code file_dentry()
whack-a-mole: kill strlen_user() (again)
autofs: should_expire() argument is guaranteed to be positive
apparmor:match_mn() - constify devpath argument
buffer: a small optimization in grow_buffers
get rid of autofs_getpath()
constify dentry argument of dentry_path()/dentry_path_raw()


17ae69ab 01-May-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull Landlock LSM from James Morris:
"Add Landlock, a new LSM from Mickaël Salaün.

Briefly, Landlock provides for unprivileged application sandboxing.

From Mickaël's cover letter:
"The goal of Landlock is to enable to restrict ambient rights (e.g.
global filesystem access) for a set of processes. Because Landlock
is a stackable LSM [1], it makes possible to create safe security
sandboxes as new security layers in addition to the existing
system-wide access-controls. This kind of sandbox is expected to
help mitigate the security impact of bugs or unexpected/malicious
behaviors in user-space applications. Landlock empowers any
process, including unprivileged ones, to securely restrict
themselves.

Landlock is inspired by seccomp-bpf but instead of filtering
syscalls and their raw arguments, a Landlock rule can restrict the
use of kernel objects like file hierarchies, according to the
kernel semantic. Landlock also takes inspiration from other OS
sandbox mechanisms: XNU Sandbox, FreeBSD Capsicum or OpenBSD
Pledge/Unveil.

In this current form, Landlock misses some access-control features.
This enables to minimize this patch series and ease review. This
series still addresses multiple use cases, especially with the
combined use of seccomp-bpf: applications with built-in sandboxing,
init systems, security sandbox tools and security-oriented APIs [2]"

The cover letter and v34 posting is here:

https://lore.kernel.org/linux-security-module/20210422154123.13086-1-mic@digikod.net/

See also:

https://landlock.io/

This code has had extensive design discussion and review over several
years"

Link: https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler-ca.com/ [1]
Link: https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.net/ [2]

* tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
landlock: Enable user space to infer supported features
landlock: Add user and kernel documentation
samples/landlock: Add a sandbox manager example
selftests/landlock: Add user space tests
landlock: Add syscall implementations
arch: Wire up Landlock syscalls
fs,security: Add sb_delete hook
landlock: Support filesystem access-control
LSM: Infrastructure management of the superblock
landlock: Add ptrace restrictions
landlock: Set up the security framework and manage credentials
landlock: Add ruleset and domain management
landlock: Add object management


e6f0bf09 01-May-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'integrity-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull IMA updates from Mimi Zohar:
"In addition to loading the kernel module signing key onto the builtin
keyring, load it onto the IMA keyring as well.

Also six trivial changes and bug fixes"

* tag 'integrity-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: ensure IMA_APPRAISE_MODSIG has necessary dependencies
ima: Fix fall-through warnings for Clang
integrity: Add declarations to init_once void arguments.
ima: Fix function name error in comment.
ima: enable loading of build time generated key on .ima keyring
ima: enable signing of modules with build time generated key
keys: cleanup build time module signing keys
ima: Fix the error code for restoring the PCR value
ima: without an IMA policy loaded, return quickly


9d31d233 29-Apr-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
"Core:

- bpf:
- allow bpf programs calling kernel functions (initially to
reuse TCP congestion control implementations)
- enable task local storage for tracing programs - remove the
need to store per-task state in hash maps, and allow tracing
programs access to task local storage previously added for
BPF_LSM
- add bpf_for_each_map_elem() helper, allowing programs to walk
all map elements in a more robust and easier to verify fashion
- sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT
redirection
- lpm: add support for batched ops in LPM trie
- add BTF_KIND_FLOAT support - mostly to allow use of BTF on
s390 which has floats in its headers files
- improve BPF syscall documentation and extend the use of kdoc
parsing scripts we already employ for bpf-helpers
- libbpf, bpftool: support static linking of BPF ELF files
- improve support for encapsulation of L2 packets

- xdp: restructure redirect actions to avoid a runtime lookup,
improving performance by 4-8% in microbenchmarks

- xsk: build skb by page (aka generic zerocopy xmit) - improve
performance of software AF_XDP path by 33% for devices which don't
need headers in the linear skb part (e.g. virtio)

- nexthop: resilient next-hop groups - improve path stability on
next-hops group changes (incl. offload for mlxsw)

- ipv6: segment routing: add support for IPv4 decapsulation

- icmp: add support for RFC 8335 extended PROBE messages

- inet: use bigger hash table for IP ID generation

- tcp: deal better with delayed TX completions - make sure we don't
give up on fast TCP retransmissions only because driver is slow in
reporting that it completed transmitting the original

- tcp: reorder tcp_congestion_ops for better cache locality

- mptcp:
- add sockopt support for common TCP options
- add support for common TCP msg flags
- include multiple address ids in RM_ADDR
- add reset option support for resetting one subflow

- udp: GRO L4 improvements - improve 'forward' / 'frag_list'
co-existence with UDP tunnel GRO, allowing the first to take place
correctly even for encapsulated UDP traffic

- micro-optimize dev_gro_receive() and flow dissection, avoid
retpoline overhead on VLAN and TEB GRO

- use less memory for sysctls, add a new sysctl type, to allow using
u8 instead of "int" and "long" and shrink networking sysctls

- veth: allow GRO without XDP - this allows aggregating UDP packets
before handing them off to routing, bridge, OvS, etc.

- allow specifing ifindex when device is moved to another namespace

- netfilter:
- nft_socket: add support for cgroupsv2
- nftables: add catch-all set element - special element used to
define a default action in case normal lookup missed
- use net_generic infra in many modules to avoid allocating
per-ns memory unnecessarily

- xps: improve the xps handling to avoid potential out-of-bound
accesses and use-after-free when XPS change race with other
re-configuration under traffic

- add a config knob to turn off per-cpu netdev refcnt to catch
underflows in testing

Device APIs:

- add WWAN subsystem to organize the WWAN interfaces better and
hopefully start driving towards more unified and vendor-
independent APIs

- ethtool:
- add interface for reading IEEE MIB stats (incl. mlx5 and bnxt
support)
- allow network drivers to dump arbitrary SFP EEPROM data,
current offset+length API was a poor fit for modern SFP which
define EEPROM in terms of pages (incl. mlx5 support)

- act_police, flow_offload: add support for packet-per-second
policing (incl. offload for nfp)

- psample: add additional metadata attributes like transit delay for
packets sampled from switch HW (and corresponding egress and
policy-based sampling in the mlxsw driver)

- dsa: improve support for sandwiched LAGs with bridge and DSA

- netfilter:
- flowtable: use direct xmit in topologies with IP forwarding,
bridging, vlans etc.
- nftables: counter hardware offload support

- Bluetooth:
- improvements for firmware download w/ Intel devices
- add support for reading AOSP vendor capabilities
- add support for virtio transport driver

- mac80211:
- allow concurrent monitor iface and ethernet rx decap
- set priority and queue mapping for injected frames

- phy: add support for Clause-45 PHY Loopback

- pci/iov: add sysfs MSI-X vector assignment interface to distribute
MSI-X resources to VFs (incl. mlx5 support)

New hardware/drivers:

- dsa: mv88e6xxx: add support for Marvell mv88e6393x - 11-port
Ethernet switch with 8x 1-Gigabit Ethernet and 3x 10-Gigabit
interfaces.

- dsa: support for legacy Broadcom tags used on BCM5325, BCM5365 and
BCM63xx switches

- Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches

- ath11k: support for QCN9074 a 802.11ax device

- Bluetooth: Broadcom BCM4330 and BMC4334

- phy: Marvell 88X2222 transceiver support

- mdio: add BCM6368 MDIO mux bus controller

- r8152: support RTL8153 and RTL8156 (USB Ethernet) chips

- mana: driver for Microsoft Azure Network Adapter (MANA)

- Actions Semi Owl Ethernet MAC

- can: driver for ETAS ES58X CAN/USB interfaces

Pure driver changes:

- add XDP support to: enetc, igc, stmmac

- add AF_XDP support to: stmmac

- virtio:
- page_to_skb() use build_skb when there's sufficient tailroom
(21% improvement for 1000B UDP frames)
- support XDP even without dedicated Tx queues - share the Tx
queues with the stack when necessary

- mlx5:
- flow rules: add support for mirroring with conntrack, matching
on ICMP, GTP, flex filters and more
- support packet sampling with flow offloads
- persist uplink representor netdev across eswitch mode changes
- allow coexistence of CQE compression and HW time-stamping
- add ethtool extended link error state reporting

- ice, iavf: support flow filters, UDP Segmentation Offload

- dpaa2-switch:
- move the driver out of staging
- add spanning tree (STP) support
- add rx copybreak support
- add tc flower hardware offload on ingress traffic

- ionic:
- implement Rx page reuse
- support HW PTP time-stamping

- octeon: support TC hardware offloads - flower matching on ingress
and egress ratelimitting.

- stmmac:
- add RX frame steering based on VLAN priority in tc flower
- support frame preemption (FPE)
- intel: add cross time-stamping freq difference adjustment

- ocelot:
- support forwarding of MRP frames in HW
- support multiple bridges
- support PTP Sync one-step timestamping

- dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like
learning, flooding etc.

- ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350,
SC7280 SoCs)

- mt7601u: enable TDLS support

- mt76:
- add support for 802.3 rx frames (mt7915/mt7615)
- mt7915 flash pre-calibration support
- mt7921/mt7663 runtime power management fixes"

* tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2451 commits)
net: selftest: fix build issue if INET is disabled
net: netrom: nr_in: Remove redundant assignment to ns
net: tun: Remove redundant assignment to ret
net: phy: marvell: add downshift support for M88E1240
net: dsa: ksz: Make reg_mib_cnt a u8 as it never exceeds 255
net/sched: act_ct: Remove redundant ct get and check
icmp: standardize naming of RFC 8335 PROBE constants
bpf, selftests: Update array map tests for per-cpu batched ops
bpf: Add batched ops support for percpu array
bpf: Implement formatted output helpers with bstr_printf
seq_file: Add a seq_bprintf function
sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues
net:nfc:digital: Fix a double free in digital_tg_recv_dep_req
net: fix a concurrency bug in l2tp_tunnel_register()
net/smc: Remove redundant assignment to rc
mpls: Remove redundant assignment to err
llc2: Remove redundant assignment to rc
net/tls: Remove redundant initialization of record
rds: Remove redundant assignment to nr_sig
dt-bindings: net: mdio-gpio: add compatible for microchip,mdio-smi0
...


0080665f 28-Apr-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'devicetree-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree updates from Rob Herring:

- Refactor powerpc and arm64 kexec DT handling to common code. This
enables IMA on arm64.

- Add kbuild support for applying DT overlays at build time. The first
user are the DT unittests.

- Fix kerneldoc formatting and W=1 warnings in drivers/of/

- Fix handling 64-bit flag on PCI resources

- Bump dtschema version required to v2021.2.1

- Enable undocumented compatible checks for dtbs_check. This allows
tracking of missing binding schemas.

- DT docs improvements. Regroup the DT docs and add the example schema
and DT kernel ABI docs to the doc build.

- Convert Broadcom Bluetooth and video-mux bindings to schema

- Add QCom sm8250 Venus video codec binding schema

- Add vendor prefixes for AESOP, YIC System Co., Ltd, and Siliconfile
Technologies Inc.

- Cleanup of DT schema type references on common properties and
standard unit properties

* tag 'devicetree-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (64 commits)
powerpc: If kexec_build_elf_info() fails return immediately from elf64_load()
powerpc: Free fdt on error in elf64_load()
of: overlay: Fix kerneldoc warning in of_overlay_remove()
of: linux/of.h: fix kernel-doc warnings
of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses
dt-bindings: bcm4329-fmac: add optional brcm,ccode-map
docs: dt: update writing-schema.rst references
dt-bindings: media: venus: Add sm8250 dt schema
of: base: Fix spelling issue with function param 'prop'
docs: dt: Add DT API documentation
of: Add missing 'Return' section in kerneldoc comments
of: Fix kerneldoc output formatting
docs: dt: Group DT docs into relevant sub-sections
docs: dt: Make 'Devicetree' wording more consistent
docs: dt: writing-schema: Include the example schema in the doc build
docs: dt: writing-schema: Remove spurious indentation
dt-bindings: Fix reference in submitting-patches.rst to the DT ABI doc
dt-bindings: ddr: Add optional manufacturer and revision ID to LPDDR3
dt-bindings: media: video-interfaces: Drop the example
devicetree: bindings: clock: Minor typo fix in the file armada3700-tbg-clock.txt
...


acd3d285 27-Apr-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'fixes-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull security layer fixes from James Morris:
"Miscellaneous minor fixes"

* tag 'fixes-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
security: commoncap: clean up kernel-doc comments
security: commoncap: fix -Wstringop-overread warning


f1c921fb 27-Apr-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:

- Add support for measuring the SELinux state and policy capabilities
using IMA.

- A handful of SELinux/NFS patches to compare the SELinux state of one
mount with a set of mount options. Olga goes into more detail in the
patch descriptions, but this is important as it allows more
flexibility when using NFS and SELinux context mounts.

- Properly differentiate between the subjective and objective LSM
credentials; including support for the SELinux and Smack. My clumsy
attempt at a proper fix for AppArmor didn't quite pass muster so John
is working on a proper AppArmor patch, in the meantime this set of
patches shouldn't change the behavior of AppArmor in any way. This
change explains the bulk of the diffstat beyond security/.

- Fix a problem where we were not properly terminating the permission
list for two SELinux object classes.

* tag 'selinux-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: add proper NULL termination to the secclass_map permissions
smack: differentiate between subjective and objective task credentials
selinux: clarify task subjective and objective credentials
lsm: separate security_task_getsecid() into subjective and objective variants
nfs: account for selinux security context when deciding to share superblock
nfs: remove unneeded null check in nfs_fill_super()
lsm,selinux: add new hook to compare new mount to an existing mount
selinux: fix misspellings using codespell tool
selinux: fix misspellings using codespell tool
selinux: measure state and policy capabilities
selinux: Allow context mounts for unpriviliged overlayfs


1ca86ac1 09-Mar-2021 Yanwei Gao <gaoyanwei.tx@gmail.com>

LSM: SafeSetID: Fix code specification by scripts/checkpatch.pl

First, the code is found to be irregular through checkpatch.pl.
Then I found break is really useless here.

Signed-off-by: Yanwei Gao <gaoyanwei.tx@gmail.com>
Signed-off-by: Micah Morton <mortonm@chromium.org>

a4a78bc8 26-Apr-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
"API:

- crypto_destroy_tfm now ignores errors as well as NULL pointers

Algorithms:

- Add explicit curve IDs in ECDH algorithm names

- Add NIST P384 curve parameters

- Add ECDSA

Drivers:

- Add support for Green Sardine in ccp

- Add ecdh/curve25519 to hisilicon/hpre

- Add support for AM64 in sa2ul"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (184 commits)
fsverity: relax build time dependency on CRYPTO_SHA256
fscrypt: relax Kconfig dependencies for crypto API algorithms
crypto: camellia - drop duplicate "depends on CRYPTO"
crypto: s5p-sss - consistently use local 'dev' variable in probe()
crypto: s5p-sss - remove unneeded local variable initialization
crypto: s5p-sss - simplify getting of_device_id match data
ccp: ccp - add support for Green Sardine
crypto: ccp - Make ccp_dev_suspend and ccp_dev_resume void functions
crypto: octeontx2 - add support for OcteonTX2 98xx CPT block.
crypto: chelsio/chcr - Remove useless MODULE_VERSION
crypto: ux500/cryp - Remove duplicate argument
crypto: chelsio - remove unused function
crypto: sa2ul - Add support for AM64
crypto: sa2ul - Support for per channel coherency
dt-bindings: crypto: ti,sa2ul: Add new compatible for AM64
crypto: hisilicon - enable new error types for QM
crypto: hisilicon - add new error type for SEC
crypto: hisilicon - support new error types for ZIP
crypto: hisilicon - dynamic configuration 'err_info'
crypto: doc - fix kernel-doc notation in chacha.c and af_alg.c
...


b0e22b47 26-Apr-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'keys-cve-2020-26541-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull x509 dbx/mokx UEFI support from David Howells:
"Here's a set of patches from Eric Snowberg[1] that add support for
EFI_CERT_X509_GUID entries in the dbx and mokx UEFI tables (such
entries cause matching certificates to be rejected).

These are currently ignored and only the hash entries are made use of.

Additionally Eric included his patches to allow such certificates to
be preloaded.

These patches deal with CVE-2020-26541.

To quote Eric:
'This is the fifth patch series for adding support for
EFI_CERT_X509_GUID entries [2]. It has been expanded to not only
include dbx entries but also entries in the mokx. Additionally
my series to preload these certificate [3] has also been
included'"

Link: https://lore.kernel.org/r/20210122181054.32635-1-eric.snowberg@oracle.com [1]
Link: https://patchwork.kernel.org/project/linux-security-module/patch/20200916004927.64276-1-eric.snowberg@oracle.com/ [2]
Link: https://lore.kernel.org/patchwork/cover/1315485/ [3]

* tag 'keys-cve-2020-26541-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
integrity: Load mokx variables into the blacklist keyring
certs: Add ability to preload revocation certs
certs: Move load_system_certificate_list to a common function
certs: Add EFI_CERT_X509_GUID support for dbx entries


87f27e7b 26-Apr-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'queue' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/tpmdd

Pull tpm fixes from James Bottomley:
"Fix a regression in the TPM trusted keys caused by the generic rework
to add ARM TEE based trusted keys.

Without this fix, the TPM trusted key subsystem fails to add or load
any keys"

* tag 'queue' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/tpmdd:
KEYS: trusted: fix TPM trusted keys for generic framework


7dd1ce1a 26-Apr-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'tpmdd-next-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull tpm updates from Jarkko Sakkinen:
"New features:

- ARM TEE backend for kernel trusted keys to complete the existing
TPM backend

- ASN.1 format for TPM2 trusted keys to make them interact with the
user space stack, such as OpenConnect VPN

Other than that, a bunch of bug fixes"

* tag 'tpmdd-next-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
KEYS: trusted: Fix missing null return from kzalloc call
char: tpm: fix error return code in tpm_cr50_i2c_tis_recv()
MAINTAINERS: Add entry for TEE based Trusted Keys
doc: trusted-encrypted: updates with TEE as a new trust source
KEYS: trusted: Introduce TEE based Trusted Keys
KEYS: trusted: Add generic trusted keys framework
security: keys: trusted: Make sealed key properly interoperable
security: keys: trusted: use ASN.1 TPM2 key format for the blobs
security: keys: trusted: fix TPM2 authorizations
oid_registry: Add TCG defined OIDS for TPM keys
lib: Add ASN.1 encoder
tpm: vtpm_proxy: Avoid reading host log when using a virtual device
tpm: acpi: Check eventlog signature before using it
tpm: efi: Use local variable for calculating final log size


3532b0b4 22-Apr-2021 Mickaël Salaün <mic@linux.microsoft.com>

landlock: Enable user space to infer supported features

Add a new flag LANDLOCK_CREATE_RULESET_VERSION to
landlock_create_ruleset(2). This enables to retreive a Landlock ABI
version that is useful to efficiently follow a best-effort security
approach. Indeed, it would be a missed opportunity to abort the whole
sandbox building, because some features are unavailable, instead of
protecting users as much as possible with the subset of features
provided by the running kernel.

This new flag enables user space to identify the minimum set of Landlock
features supported by the running kernel without relying on a filesystem
interface (e.g. /proc/version, which might be inaccessible) nor testing
multiple syscall argument combinations (i.e. syscall bisection). New
Landlock features will be documented and tied to a minimum version
number (greater than 1). The current version will be incremented for
each new kernel release supporting new Landlock features. User space
libraries can leverage this information to seamlessly restrict processes
as much as possible while being compatible with newer APIs.

This is a much more lighter approach than the previous
landlock_get_features(2): the complexity is pushed to user space
libraries. This flag meets similar needs as securityfs versions:
selinux/policyvers, apparmor/features/*/version* and tomoyo/version.

Supporting this flag now will be convenient for backward compatibility.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210422154123.13086-14-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>

265885da 22-Apr-2021 Mickaël Salaün <mic@linux.microsoft.com>

landlock: Add syscall implementations

These 3 system calls are designed to be used by unprivileged processes
to sandbox themselves:
* landlock_create_ruleset(2): Creates a ruleset and returns its file
descriptor.
* landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a
ruleset, identified by the dedicated file descriptor.
* landlock_restrict_self(2): Enforces a ruleset on the calling thread
and its future children (similar to seccomp). This syscall has the
same usage restrictions as seccomp(2): the caller must have the
no_new_privs attribute set or have CAP_SYS_ADMIN in the current user
namespace.

All these syscalls have a "flags" argument (not currently used) to
enable extensibility.

Here are the motivations for these new syscalls:
* A sandboxed process may not have access to file systems, including
/dev, /sys or /proc, but it should still be able to add more
restrictions to itself.
* Neither prctl(2) nor seccomp(2) (which was used in a previous version)
fit well with the current definition of a Landlock security policy.

All passed structs (attributes) are checked at build time to ensure that
they don't contain holes and that they are aligned the same way for each
architecture.

See the user and kernel documentation for more details (provided by a
following commit):
* Documentation/userspace-api/landlock.rst
* Documentation/security/landlock.rst

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Link: https://lore.kernel.org/r/20210422154123.13086-9-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>

83e804f0 22-Apr-2021 Mickaël Salaün <mic@linux.microsoft.com>

fs,security: Add sb_delete hook

The sb_delete security hook is called when shutting down a superblock,
which may be useful to release kernel objects tied to the superblock's
lifetime (e.g. inodes).

This new hook is needed by Landlock to release (ephemerally) tagged
struct inodes. This comes from the unprivileged nature of Landlock
described in the next commit.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422154123.13086-7-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>

cb2c7d1a 22-Apr-2021 Mickaël Salaün <mic@linux.microsoft.com>

landlock: Support filesystem access-control

Using Landlock objects and ruleset, it is possible to tag inodes
according to a process's domain. To enable an unprivileged process to
express a file hierarchy, it first needs to open a directory (or a file)
and pass this file descriptor to the kernel through
landlock_add_rule(2). When checking if a file access request is
allowed, we walk from the requested dentry to the real root, following
the different mount layers. The access to each "tagged" inodes are
collected according to their rule layer level, and ANDed to create
access to the requested file hierarchy. This makes possible to identify
a lot of files without tagging every inodes nor modifying the
filesystem, while still following the view and understanding the user
has from the filesystem.

Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
keep the same struct inodes for the same inodes whereas these inodes are
in use.

This commit adds a minimal set of supported filesystem access-control
which doesn't enable to restrict all file-related actions. This is the
result of multiple discussions to minimize the code of Landlock to ease
review. Thanks to the Landlock design, extending this access-control
without breaking user space will not be a problem. Moreover, seccomp
filters can be used to restrict the use of syscall families which may
not be currently handled by Landlock.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210422154123.13086-8-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>

1aea7808 22-Apr-2021 Casey Schaufler <casey@schaufler-ca.com>

LSM: Infrastructure management of the superblock

Move management of the superblock->sb_security blob out of the
individual security modules and into the security infrastructure.
Instead of allocating the blobs from within the modules, the modules
tell the infrastructure how much space is required, and the space is
allocated there.

Cc: John Johansen <john.johansen@canonical.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422154123.13086-6-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>

afe81f75 22-Apr-2021 Mickaël Salaün <mic@linux.microsoft.com>

landlock: Add ptrace restrictions

Using ptrace(2) and related debug features on a target process can lead
to a privilege escalation. Indeed, ptrace(2) can be used by an attacker
to impersonate another task and to remain undetected while performing
malicious activities. Thanks to ptrace_may_access(), various part of
the kernel can check if a tracer is more privileged than a tracee.

A landlocked process has fewer privileges than a non-landlocked process
and must then be subject to additional restrictions when manipulating
processes. To be allowed to use ptrace(2) and related syscalls on a
target process, a landlocked process must have a subset of the target
process's rules (i.e. the tracee must be in a sub-domain of the tracer).

Cc: James Morris <jmorris@namei.org>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422154123.13086-5-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>

385975dc 22-Apr-2021 Mickaël Salaün <mic@linux.microsoft.com>

landlock: Set up the security framework and manage credentials

Process's credentials point to a Landlock domain, which is underneath
implemented with a ruleset. In the following commits, this domain is
used to check and enforce the ptrace and filesystem security policies.
A domain is inherited from a parent to its child the same way a thread
inherits a seccomp policy.

Cc: James Morris <jmorris@namei.org>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422154123.13086-4-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>

ae271c1b 22-Apr-2021 Mickaël Salaün <mic@linux.microsoft.com>

landlock: Add ruleset and domain management

A Landlock ruleset is mainly a red-black tree with Landlock rules as
nodes. This enables quick update and lookup to match a requested
access, e.g. to a file. A ruleset is usable through a dedicated file
descriptor (cf. following commit implementing syscalls) which enables a
process to create and populate a ruleset with new rules.

A domain is a ruleset tied to a set of processes. This group of rules
defines the security policy enforced on these processes and their future
children. A domain can transition to a new domain which is the
intersection of all its constraints and those of a ruleset provided by
the current process. This modification only impact the current process.
This means that a process can only gain more constraints (i.e. lose
accesses) over time.

Cc: James Morris <jmorris@namei.org>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20210422154123.13086-3-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>

90945448 22-Apr-2021 Mickaël Salaün <mic@linux.microsoft.com>

landlock: Add object management

A Landlock object enables to identify a kernel object (e.g. an inode).
A Landlock rule is a set of access rights allowed on an object. Rules
are grouped in rulesets that may be tied to a set of processes (i.e.
subjects) to enforce a scoped access-control (i.e. a domain).

Because Landlock's goal is to empower any process (especially
unprivileged ones) to sandbox themselves, we cannot rely on a
system-wide object identification such as file extended attributes.
Indeed, we need innocuous, composable and modular access-controls.

The main challenge with these constraints is to identify kernel objects
while this identification is useful (i.e. when a security policy makes
use of this object). But this identification data should be freed once
no policy is using it. This ephemeral tagging should not and may not be
written in the filesystem. We then need to manage the lifetime of a
rule according to the lifetime of its objects. To avoid a global lock,
this implementation make use of RCU and counters to safely reference
objects.

A following commit uses this generic object management for inodes.

Cc: James Morris <jmorris@namei.org>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422154123.13086-2-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>

e4c82eaf 21-Apr-2021 Paul Moore <paul@paul-moore.com>

selinux: add proper NULL termination to the secclass_map permissions

This patch adds the missing NULL termination to the "bpf" and
"perf_event" object class permission lists.

This missing NULL termination should really only affect the tools
under scripts/selinux, with the most important being genheaders.c,
although in practice this has not been an issue on any of my dev/test
systems. If the problem were to manifest itself it would likely
result in bogus permissions added to the end of the object class;
thankfully with no access control checks using these bogus
permissions and no policies defining these permissions the impact
would likely be limited to some noise about undefined permissions
during policy load.

Cc: stable@vger.kernel.org
Fixes: ec27c3568a34 ("selinux: bpf: Add selinux check for eBPF syscall operations")
Fixes: da97e18458fb ("perf_event: Add support for LSM and SELinux checks")
Signed-off-by: Paul Moore <paul@paul-moore.com>

60dc5f1b 21-Apr-2021 James Bottomley <James.Bottomley@HansenPartnership.com>

KEYS: trusted: fix TPM trusted keys for generic framework

The generic framework patch broke the current TPM trusted keys because
it doesn't correctly remove the values consumed by the generic parser
before passing them on to the implementation specific parser. Fix
this by having the generic parser return the string minus the consumed
tokens.

Additionally, there may be no tokens left for the implementation
specific parser, so make it handle the NULL case correctly and finally
fix a TPM 1.2 specific check for no keyhandle.

Fixes: 5d0682be3189 ("KEYS: trusted: Add generic trusted keys framework")
Tested-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

9d5171ea 21-Apr-2021 James Bottomley <James.Bottomley@HansenPartnership.com>

KEYS: trusted: Fix TPM reservation for seal/unseal

The original patch 8c657a0590de ("KEYS: trusted: Reserve TPM for seal
and unseal operations") was correct on the mailing list:

https://lore.kernel.org/linux-integrity/20210128235621.127925-4-jarkko@kernel.org/

But somehow got rebased so that the tpm_try_get_ops() in
tpm2_seal_trusted() got lost. This causes an imbalanced put of the
TPM ops and causes oopses on TIS based hardware.

This fix puts back the lost tpm_try_get_ops()

Fixes: 8c657a0590de ("KEYS: trusted: Reserve TPM for seal and unseal operations")
Reported-by: Mimi Zohar <zohar@linux.ibm.com>
Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

28073eb0 19-Nov-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

ima: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of just
letting the code fall through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

8203c7ce 17-Apr-2021 Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
- keep the ZC code, drop the code related to reinit
net/bridge/netfilter/ebtables.c
- fix build after move to net_generic

Signed-off-by: Jakub Kicinski <kuba@kernel.org>


02c58773 16-Apr-2021 Walter Wu <walter-zh.wu@mediatek.com>

kasan: remove redundant config option

CONFIG_KASAN_STACK and CONFIG_KASAN_STACK_ENABLE both enable KASAN stack
instrumentation, but we should only need one config, so that we remove
CONFIG_KASAN_STACK_ENABLE and make CONFIG_KASAN_STACK workable. see [1].

When enable KASAN stack instrumentation, then for gcc we could do no
prompt and default value y, and for clang prompt and default value n.

This patch fixes the following compilation warning:

include/linux/kasan.h:333:30: warning: 'CONFIG_KASAN_STACK' is not defined, evaluates to 0 [-Wundef]

[akpm@linux-foundation.org: fix merge snafu]

Link: https://bugzilla.kernel.org/show_bug.cgi?id=210221 [1]
Link: https://lkml.kernel.org/r/20210226012531.29231-1-walter-zh.wu@mediatek.com
Fixes: d9b571c885a8 ("kasan: fix KASAN_STACK dependency for HW_TAGS")
Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com>
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

049ae601 11-Apr-2021 Randy Dunlap <rdunlap@infradead.org>

security: commoncap: clean up kernel-doc comments

Fix kernel-doc notation in commoncap.c.

Use correct (matching) function name in comments as in code.
Use correct function argument names in kernel-doc comments.
Use kernel-doc's "Return:" format for function return values.

Fixes these kernel-doc warnings:

../security/commoncap.c:1206: warning: expecting prototype for cap_task_ioprio(). Prototype was for cap_task_setioprio() instead
../security/commoncap.c:1219: warning: expecting prototype for cap_task_ioprio(). Prototype was for cap_task_setnice() instead

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: James Morris <jamorris@linux.microsoft.com>

aec00aa0 12-Apr-2021 Colin Ian King <colin.king@canonical.com>

KEYS: trusted: Fix missing null return from kzalloc call

The kzalloc call can return null with the GFP_KERNEL flag so
add a null check and exit via a new error exit label. Use the
same exit error label for another error path too.

Addresses-Coverity: ("Dereference null return value")
Fixes: 830027e2cb55 ("KEYS: trusted: Add generic trusted keys framework")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Sumit Garg <sumit.garg@linaro.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

0a95ebc9 01-Mar-2021 Sumit Garg <sumit.garg@linaro.org>

KEYS: trusted: Introduce TEE based Trusted Keys

Add support for TEE based trusted keys where TEE provides the functionality
to seal and unseal trusted keys using hardware unique key.

Refer to Documentation/staging/tee.rst for detailed information about TEE.

Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

5d0682be 01-Mar-2021 Sumit Garg <sumit.garg@linaro.org>

KEYS: trusted: Add generic trusted keys framework

Current trusted keys framework is tightly coupled to use TPM device as
an underlying implementation which makes it difficult for implementations
like Trusted Execution Environment (TEE) etc. to provide trusted keys
support in case platform doesn't posses a TPM device.

Add a generic trusted keys framework where underlying implementations
can be easily plugged in. Create struct trusted_key_ops to achieve this,
which contains necessary functions of a backend.

Also, define a module parameter in order to select a particular trust
source in case a platform support multiple trust sources. In case its
not specified then implementation itetrates through trust sources list
starting with TPM and assign the first trust source as a backend which
has initiazed successfully during iteration.

Note that current implementation only supports a single trust source at
runtime which is either selectable at compile time or during boot via
aforementioned module parameter.

Suggested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

e5fb5d2c 27-Jan-2021 James Bottomley <James.Bottomley@HansenPartnership.com>

security: keys: trusted: Make sealed key properly interoperable

The current implementation appends a migratable flag to the end of a
key, meaning the format isn't exactly interoperable because the using
party needs to know to strip this extra byte. However, all other
consumers of TPM sealed blobs expect the unseal to return exactly the
key. Since TPM2 keys have a key property flag that corresponds to
migratable, use that flag instead and make the actual key the only
sealed quantity. This is secure because the key properties are bound
to a hash in the private part, so if they're altered the key won't
load.

Backwards compatibility is implemented by detecting whether we're
loading a new format key or not and correctly setting migratable from
the last byte of old format keys.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

f2219745 27-Jan-2021 James Bottomley <James.Bottomley@HansenPartnership.com>

security: keys: trusted: use ASN.1 TPM2 key format for the blobs

Modify the TPM2 key format blob output to export and import in the
ASN.1 form for TPM2 sealed object keys. For compatibility with prior
trusted keys, the importer will also accept two TPM2B quantities
representing the public and private parts of the key. However, the
export via keyctl pipe will only output the ASN.1 format.

The benefit of the ASN.1 format is that it's a standard and thus the
exported key can be used by userspace tools (openssl_tpm2_engine,
openconnect and tpm2-tss-engine). The format includes policy
specifications, thus it gets us out of having to construct policy
handles in userspace and the format includes the parent meaning you
don't have to keep passing it in each time.

This patch only implements basic handling for the ASN.1 format, so
keys with passwords but no policy.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

de66514d 27-Jan-2021 James Bottomley <James.Bottomley@HansenPartnership.com>

security: keys: trusted: fix TPM2 authorizations

In TPM 1.2 an authorization was a 20 byte number. The spec actually
recommended you to hash variable length passwords and use the sha1
hash as the authorization. Because the spec doesn't require this
hashing, the current authorization for trusted keys is a 40 digit hex
number. For TPM 2.0 the spec allows the passing in of variable length
passwords and passphrases directly, so we should allow that in trusted
keys for ease of use. Update the 'blobauth' parameter to take this
into account, so we can now use plain text passwords for the keys.

so before

keyctl add trusted kmk "new 32 blobauth=f572d396fae9206628714fb2ce00f72e94f2258fkeyhandle=81000001" @u

after we will accept both the old hex sha1 form as well as a new
directly supplied password:

keyctl add trusted kmk "new 32 blobauth=hello keyhandle=81000001" @u

Since a sha1 hex code must be exactly 40 bytes long and a direct
password must be 20 or less, we use the length as the discriminator
for which form is input.

Note this is both and enhancement and a potential bug fix. The TPM
2.0 spec requires us to strip leading zeros, meaning empyty
authorization is a zero length HMAC whereas we're currently passing in
20 bytes of zeros. A lot of TPMs simply accept this as OK, but the
Microsoft TPM emulator rejects it with TPM_RC_BAD_AUTH, so this patch
makes the Microsoft TPM emulator work with trusted keys.

Fixes: 0fe5480303a1 ("keys, trusted: seal/unseal with TPM 2.0 chips")
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

8859a44e 09-Apr-2021 Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Conflicts:

MAINTAINERS
- keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
- simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
- trivial
include/linux/ethtool.h
- trivial, fix kdoc while at it
include/linux/skmsg.h
- move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
- add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
- trivial

Signed-off-by: Jakub Kicinski <kuba@kernel.org>


60144b23 09-Apr-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20210409' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux fixes from Paul Moore:
"Three SELinux fixes.

These fix known problems relating to (re)loading SELinux policy or
changing the policy booleans, and pass our test suite without problem"

* tag 'selinux-pr-20210409' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix race between old and new sidtab
selinux: fix cond_list corruption when changing booleans
selinux: make nslot handling in avtab more robust


282c0a4d 06-Apr-2021 Jiele Zhao <unclexiaole@gmail.com>

integrity: Add declarations to init_once void arguments.

init_once is a callback to kmem_cache_create. The parameter
type of this function is void *, so it's better to give a
explicit cast here.

Signed-off-by: Jiele Zhao <unclexiaole@gmail.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

41d75dd9 05-Apr-2021 Jiele Zhao <unclexiaole@gmail.com>

ima: Fix function name error in comment.

The original function name was ima_path_check(). The policy parsing
still supports PATH_CHECK. Commit 9bbb6cad0173 ("ima: rename
ima_path_check to ima_file_check") renamed the function to
ima_file_check(), but missed modifying the function name in the
comment.

Fixes: 9bbb6cad0173 ("ima: rename ima_path_check to ima_file_check").

Signed-off-by: Jiele Zhao <unclexiaole@gmail.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

6cbdfb3d 09-Apr-2021 Nayna Jain <nayna@linux.ibm.com>

ima: enable loading of build time generated key on .ima keyring

The kernel currently only loads the kernel module signing key onto the
builtin trusted keyring. Load the module signing key onto the IMA keyring
as well.

Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Acked-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

9ad6e9cb 07-Apr-2021 Ondrej Mosnacek <omosnace@redhat.com>

selinux: fix race between old and new sidtab

Since commit 1b8b31a2e612 ("selinux: convert policy read-write lock to
RCU"), there is a small window during policy load where the new policy
pointer has already been installed, but some threads may still be
holding the old policy pointer in their read-side RCU critical sections.
This means that there may be conflicting attempts to add a new SID entry
to both tables via sidtab_context_to_sid().

See also (and the rest of the thread):
https://lore.kernel.org/selinux/CAFqZXNvfux46_f8gnvVvRYMKoes24nwm2n3sPbMjrB8vKTW00g@mail.gmail.com/

Fix this by installing the new policy pointer under the old sidtab's
spinlock along with marking the old sidtab as "frozen". Then, if an
attempt to add new entry to a "frozen" sidtab is detected, make
sidtab_context_to_sid() return -ESTALE to indicate that a new policy
has been installed and that the caller will have to abort the policy
transaction and try again after re-taking the policy pointer (which is
guaranteed to be a newer policy). This requires adding a retry-on-ESTALE
logic to all callers of sidtab_context_to_sid(), but fortunately these
are easy to determine and aren't that many.

This seems to be the simplest solution for this problem, even if it
looks somewhat ugly. Note that other places in the kernel (e.g.
do_mknodat() in fs/namei.c) use similar stale-retry patterns, so I think
it's reasonable.

Cc: stable@vger.kernel.org
Fixes: 1b8b31a2e612 ("selinux: convert policy read-write lock to RCU")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

d8f5f0ea 02-Apr-2021 Ondrej Mosnacek <omosnace@redhat.com>

selinux: fix cond_list corruption when changing booleans

Currently, duplicate_policydb_cond_list() first copies the whole
conditional avtab and then tries to link to the correct entries in
cond_dup_av_list() using avtab_search(). However, since the conditional
avtab may contain multiple entries with the same key, this approach
often fails to find the right entry, potentially leading to wrong rules
being activated/deactivated when booleans are changed.

To fix this, instead start with an empty conditional avtab and add the
individual entries one-by-one while building the new av_lists. This
approach leads to the correct result, since each entry is present in the
av_lists exactly once.

The issue can be reproduced with Fedora policy as follows:

# sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True
allow ftpd_t public_content_rw_t:dir { add_name create link remove_name rename reparent rmdir setattr unlink watch watch_reads write }; [ ftpd_anon_write ]:True
# setsebool ftpd_anon_write=off ftpd_connect_all_unreserved=off ftpd_connect_db=off ftpd_full_access=off

On fixed kernels, the sesearch output is the same after the setsebool
command:

# sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True
allow ftpd_t public_content_rw_t:dir { add_name create link remove_name rename reparent rmdir setattr unlink watch watch_reads write }; [ ftpd_anon_write ]:True

While on the broken kernels, it will be different:

# sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True

While there, also simplify the computation of nslots. This changes the
nslots values for nrules 2 or 3 to just two slots instead of 4, which
makes the sequence more consistent.

Cc: stable@vger.kernel.org
Fixes: c7c556f1e81b ("selinux: refactor changing booleans")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

442dc00f 02-Apr-2021 Ondrej Mosnacek <omosnace@redhat.com>

selinux: make nslot handling in avtab more robust

1. Make sure all fileds are initialized in avtab_init().
2. Slightly refactor avtab_alloc() to use the above fact.
3. Use h->nslot == 0 as a sentinel in the access functions to prevent
dereferencing h->htable when it's not allocated.

Cc: stable@vger.kernel.org
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

4e53d170 26-Mar-2021 Jens Axboe <axboe@kernel.dk>

tomoyo: don't special case PF_IO_WORKER for PF_KTHREAD

Since commit 3bfe6106693b6b4b ("io-wq: fork worker threads from original
task") stopped using PF_KTHREAD flag for the io_uring PF_IO_WORKER threads,
tomoyo_kernel_service() no longer needs to check PF_IO_WORKER flag.

(This is a 5.12+ patch. Please don't send to stable kernels.)

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

947d7059 16-Mar-2021 Stefan Berger <stefanb@linux.ibm.com>

ima: Support EC keys for signature verification

Add support for IMA signature verification for EC keys. Since SHA type
of hashes can be used by RSA and ECDSA signature schemes we need to
look at the key and derive from the key which signature scheme to use.
Since this can be applied to all types of keys, we change the selection
of the encoding type to be driven by the key's signature scheme rather
than by the hash type.

Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: linux-integrity@vger.kernel.org
Cc: David Howells <dhowells@redhat.com>
Cc: keyrings@vger.kernel.org
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Vitaly Chikunov <vt@altlinux.org>
Reviewed-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

db24726b 25-Mar-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'integrity-v5.12-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity fix from Mimi Zohar:
"Just one patch to address a NULL ptr dereferencing when there is a
mismatch between the user enabled LSMs and IMA/EVM"

* tag 'integrity-v5.12-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
integrity: double check iint_cache was initialized


efd13b71 25-Mar-2021 David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Signed-off-by: David S. Miller <davem@davemloft.net>


82e5d8cc 22-Mar-2021 Arnd Bergmann <arnd@arndb.de>

security: commoncap: fix -Wstringop-overread warning

gcc-11 introdces a harmless warning for cap_inode_getsecurity:

security/commoncap.c: In function ‘cap_inode_getsecurity’:
security/commoncap.c:440:33: error: ‘memcpy’ reading 16 bytes from a region of size 0 [-Werror=stringop-overread]
440 | memcpy(&nscap->data, &cap->data, sizeof(__le32) * 2 * VFS_CAP_U32);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The problem here is that tmpbuf is initialized to NULL, so gcc assumes
it is not accessible unless it gets set by vfs_getxattr_alloc(). This is
a legitimate warning as far as I can tell, but the code is correct since
it correctly handles the error when that function fails.

Add a separate NULL check to tell gcc about it as well.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: James Morris <jamorris@linux.microsoft.com>

64b2f34f 24-Mar-2021 Al Viro <viro@zeniv.linux.org.uk>

apparmor:match_mn() - constify devpath argument

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7990ccaf 02-Mar-2021 Li Huafei <lihuafei1@huawei.com>

ima: Fix the error code for restoring the PCR value

In ima_restore_measurement_list(), hdr[HDR_PCR].data is pointing to a
buffer of type u8, which contains the dumped 32-bit pcr value.
Currently, only the least significant byte is used to restore the pcr
value. We should convert hdr[HDR_PCR].data to a pointer of type u32
before fetching the value to restore the correct pcr value.

Fixes: 47fdee60b47f ("ima: use ima_parse_buf() to parse measurements headers")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

1fb057dc 19-Feb-2021 Paul Moore <paul@paul-moore.com>

smack: differentiate between subjective and objective task credentials

With the split of the security_task_getsecid() into subjective and
objective variants it's time to update Smack to ensure it is using
the correct task creds.

Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

eb1231f7 18-Feb-2021 Paul Moore <paul@paul-moore.com>

selinux: clarify task subjective and objective credentials

SELinux has a function, task_sid(), which returns the task's
objective credentials, but unfortunately is used in a few places
where the subjective task credentials should be used. Most notably
in the new security_task_getsecid_subj() LSM hook.

This patch fixes this and attempts to make things more obvious by
introducing a new function, task_sid_subj(), and renaming the
existing task_sid() function to task_sid_obj().

This patch also adds an interesting function in task_sid_binder().
The task_sid_binder() function has a comment which hopefully
describes it's reason for being, but it basically boils down to the
simple fact that we can't safely access another task's subjective
credentials so in the case of binder we need to stick with the
objective credentials regardless.

Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

4ebd7651 19-Feb-2021 Paul Moore <paul@paul-moore.com>

lsm: separate security_task_getsecid() into subjective and objective variants

Of the three LSMs that implement the security_task_getsecid() LSM
hook, all three LSMs provide the task's objective security
credentials. This turns out to be unfortunate as most of the hook's
callers seem to expect the task's subjective credentials, although
a small handful of callers do correctly expect the objective
credentials.

This patch is the first step towards fixing the problem: it splits
the existing security_task_getsecid() hook into two variants, one
for the subjective creds, one for the objective creds.

void security_task_getsecid_subj(struct task_struct *p,
u32 *secid);
void security_task_getsecid_obj(struct task_struct *p,
u32 *secid);

While this patch does fix all of the callers to use the correct
variant, in order to keep this patch focused on the callers and to
ease review, the LSMs continue to use the same implementation for
both hooks. The net effect is that this patch should not change
the behavior of the kernel in any way, it will be up to the latter
LSM specific patches in this series to change the hook
implementations and return the correct credentials.

Acked-by: Mimi Zohar <zohar@linux.ibm.com> (IMA)
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

f873b28f 19-Mar-2021 Mimi Zohar <zohar@linux.ibm.com>

ima: without an IMA policy loaded, return quickly

Unless an IMA policy is loaded, don't bother checking for an appraise
policy rule. Return immediately.

Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

92063f3c 19-Mar-2021 Mimi Zohar <zohar@linux.ibm.com>

integrity: double check iint_cache was initialized

The kernel may be built with multiple LSMs, but only a subset may be
enabled on the boot command line by specifying "lsm=". Not including
"integrity" on the ordered LSM list may result in a NULL deref.

As reported by Dmitry Vyukov:
in qemu:
qemu-system-x86_64 -enable-kvm -machine q35,nvdimm -cpu
max,migratable=off -smp 4 -m 4G,slots=4,maxmem=16G -hda
wheezy.img -kernel arch/x86/boot/bzImage -nographic -vga std
-soundhw all -usb -usbdevice tablet -bt hci -bt device:keyboard
-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net
nic,model=virtio-net-pci -object
memory-backend-file,id=pmem1,share=off,mem-path=/dev/zero,size=64M
-device nvdimm,id=nvdimm1,memdev=pmem1 -append "console=ttyS0
root=/dev/sda earlyprintk=serial rodata=n oops=panic panic_on_warn=1
panic=86400 lsm=smack numa=fake=2 nopcid dummy_hcd.num=8" -pidfile
vm_pid -m 2G -cpu host

But it crashes on NULL deref in integrity_inode_get during boot:

Run /sbin/init as init process
BUG: kernel NULL pointer dereference, address: 000000000000001c
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc2+ #97
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.13.0-44-g88ab0c15525c-prebuilt.qemu.org 04/01/2014
RIP: 0010:kmem_cache_alloc+0x2b/0x370 mm/slub.c:2920
Code: 57 41 56 41 55 41 54 41 89 f4 55 48 89 fd 53 48 83 ec 10 44 8b
3d d9 1f 90 0b 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 <8b> 5f
1c 4cf
RSP: 0000:ffffc9000032f9d8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888017fc4f00 RCX: 0000000000000000
RDX: ffff888040220000 RSI: 0000000000000c40 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff888019263627
R10: ffffffff83937cd1 R11: 0000000000000000 R12: 0000000000000c40
R13: ffff888019263538 R14: 0000000000000000 R15: 0000000000ffffff
FS: 0000000000000000(0000) GS:ffff88802d180000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000001c CR3: 000000000b48e000 CR4: 0000000000750ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
integrity_inode_get+0x47/0x260 security/integrity/iint.c:105
process_measurement+0x33d/0x17e0 security/integrity/ima/ima_main.c:237
ima_bprm_check+0xde/0x210 security/integrity/ima/ima_main.c:474
security_bprm_check+0x7d/0xa0 security/security.c:845
search_binary_handler fs/exec.c:1708 [inline]
exec_binprm fs/exec.c:1761 [inline]
bprm_execve fs/exec.c:1830 [inline]
bprm_execve+0x764/0x19a0 fs/exec.c:1792
kernel_execve+0x370/0x460 fs/exec.c:1973
try_to_run_init_process+0x14/0x4e init/main.c:1366
kernel_init+0x11d/0x1b8 init/main.c:1477
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
Modules linked in:
CR2: 000000000000001c
---[ end trace 22d601a500de7d79 ]---

Since LSMs and IMA may be configured at build time, but not enabled at
run time, panic the system if "integrity" was not initialized before use.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 79f7865d844c ("LSM: Introduce "lsm=" for boottime LSM selection")
Cc: stable@vger.kernel.org
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

69c4a42d 26-Feb-2021 Olga Kornievskaia <kolga@netapp.com>

lsm,selinux: add new hook to compare new mount to an existing mount

Add a new hook that takes an existing super block and a new mount
with new options and determines if new options confict with an
existing mount or not.

A filesystem can use this new hook to determine if it can share
the an existing superblock with a new superblock for the new mount.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
[PM: tweak the subject line, fix tab/space problems]
Signed-off-by: Paul Moore <paul@paul-moore.com>

84196390 22-Mar-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20210322' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux fixes from Paul Moore:
"Three SELinux patches:

- Fix a problem where a local variable is used outside its associated
function. Thankfully this can only be triggered by reloading the
SELinux policy, which is a restricted operation for other obvious
reasons.

- Fix some incorrect, and inconsistent, audit and printk messages
when loading the SELinux policy.

All three patches are relatively minor and have been through our
testing with no failures"

* tag 'selinux-pr-20210322' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinuxfs: unify policy load error reporting
selinux: fix variable scope issue in live sidtab conversion
selinux: don't log MAC_POLICY_LOAD record on failed policy load


ee5de60a 18-Mar-2021 Ondrej Mosnacek <omosnace@redhat.com>

selinuxfs: unify policy load error reporting

Let's drop the pr_err()s from sel_make_policy_nodes() and just add one
pr_warn_ratelimited() call to the sel_make_policy_nodes() error path in
sel_write_load().

Changing from error to warning makes sense, since after 02a52c5c8c3b
("selinux: move policy commit after updating selinuxfs"), this error
path no longer leads to a broken selinuxfs tree (it's just kept in the
original state and policy load is aborted).

I also added _ratelimited to be consistent with the other prtin in the
same function (it's probably not necessary, but can't really hurt...
there are likely more important error messages to be printed when
filesystem entry creation starts erroring out).

Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

6406887a 18-Mar-2021 Ondrej Mosnacek <omosnace@redhat.com>

selinux: fix variable scope issue in live sidtab conversion

Commit 02a52c5c8c3b ("selinux: move policy commit after updating
selinuxfs") moved the selinux_policy_commit() call out of
security_load_policy() into sel_write_load(), which caused a subtle yet
rather serious bug.

The problem is that security_load_policy() passes a reference to the
convert_params local variable to sidtab_convert(), which stores it in
the sidtab, where it may be accessed until the policy is swapped over
and RCU synchronized. Before 02a52c5c8c3b, selinux_policy_commit() was
called directly from security_load_policy(), so the convert_params
pointer remained valid all the way until the old sidtab was destroyed,
but now that's no longer the case and calls to sidtab_context_to_sid()
on the old sidtab after security_load_policy() returns may cause invalid
memory accesses.

This can be easily triggered using the stress test from commit
ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve
performance"):
```
function rand_cat() {
echo $(( $RANDOM % 1024 ))
}

function do_work() {
while true; do
echo -n "system_u:system_r:kernel_t:s0:c$(rand_cat),c$(rand_cat)" \
>/sys/fs/selinux/context 2>/dev/null || true
done
}

do_work >/dev/null &
do_work >/dev/null &
do_work >/dev/null &

while load_policy; do echo -n .; sleep 0.1; done

kill %1
kill %2
kill %3
```

Fix this by allocating the temporary sidtab convert structures
dynamically and passing them among the
selinux_policy_{load,cancel,commit} functions.

Fixes: 02a52c5c8c3b ("selinux: move policy commit after updating selinuxfs")
Cc: stable@vger.kernel.org
Tested-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: merge fuzz in security.h and services.c]
Signed-off-by: Paul Moore <paul@paul-moore.com>

519dad3b 18-Mar-2021 Ondrej Mosnacek <omosnace@redhat.com>

selinux: don't log MAC_POLICY_LOAD record on failed policy load

If sel_make_policy_nodes() fails, we should jump to 'out', not 'out1',
as the latter would incorrectly log an MAC_POLICY_LOAD audit record,
even though the policy hasn't actually been reloaded. The 'out1' jump
label now becomes unused and can be removed.

Fixes: 02a52c5c8c3b ("selinux: move policy commit after updating selinuxfs")
Cc: stable@vger.kernel.org
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

3b0c2d3e 12-Mar-2021 Eric W. Biederman <ebiederm@xmission.com>

Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file capabilities")

It turns out that there are in fact userspace implementations that
care and this recent change caused a regression.

https://github.com/containers/buildah/issues/3071

As the motivation for the original change was future development,
and the impact is existing real world code just revert this change
and allow the ambiguity in v3 file caps.

Cc: stable@vger.kernel.org
Fixes: 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file capabilities")
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

710ec562 11-Mar-2021 Ido Schimmel <idosch@nvidia.com>

nexthop: Add netlink defines and enumerators for resilient NH groups

- RTM_NEWNEXTHOP et.al. that handle resilient groups will have a new nested
attribute, NHA_RES_GROUP, whose elements are attributes NHA_RES_GROUP_*.

- RTM_NEWNEXTHOPBUCKET et.al. is a suite of new messages that will
currently serve only for dumping of individual buckets of resilient next
hop groups. For nexthop group buckets, these messages will carry a nested
attribute NHA_RES_BUCKET, whose elements are attributes NHA_RES_BUCKET_*.

There are several reasons why a new suite of messages is created for
nexthop buckets instead of overloading the information on the existing
RTM_{NEW,DEL,GET}NEXTHOP messages.

First, a nexthop group can contain a large number of nexthop buckets (4k
is not unheard of). This imposes limits on the amount of information that
can be encoded for each nexthop bucket given a netlink message is limited
to 64k bytes.

Second, while RTM_NEWNEXTHOPBUCKET is only used for notifications at
this point, in the future it can be extended to provide user space with
control over nexthop buckets configuration.

- The new group type is NEXTHOP_GRP_TYPE_RES. Note that nexthop code is
adjusted to bounce groups with that type for now.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ebd9c2ae 22-Jan-2021 Eric Snowberg <eric.snowberg@oracle.com>

integrity: Load mokx variables into the blacklist keyring

During boot the Secure Boot Forbidden Signature Database, dbx,
is loaded into the blacklist keyring. Systems booted with shim
have an equivalent Forbidden Signature Database called mokx.
Currently mokx is only used by shim and grub, the contents are
ignored by the kernel.

Add the ability to load mokx into the blacklist keyring during boot.

Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
Suggested-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
cc: keyrings@vger.kernel.org
Link: https://lore.kernel.org/r/c33c8e3839a41e9654f41cc92c7231104931b1d7.camel@HansenPartnership.com/
Link: https://lore.kernel.org/r/20210122181054.32635-5-eric.snowberg@oracle.com/ # v5
Link: https://lore.kernel.org/r/161428674320.677100.12637282414018170743.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161433313205.902181.2502803393898221637.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161529607422.163428.13530426573612578854.stgit@warthog.procyon.org.uk/ # v3

56c58126 22-Jan-2021 Eric Snowberg <eric.snowberg@oracle.com>

certs: Add EFI_CERT_X509_GUID support for dbx entries

This fixes CVE-2020-26541.

The Secure Boot Forbidden Signature Database, dbx, contains a list of now
revoked signatures and keys previously approved to boot with UEFI Secure
Boot enabled. The dbx is capable of containing any number of
EFI_CERT_X509_SHA256_GUID, EFI_CERT_SHA256_GUID, and EFI_CERT_X509_GUID
entries.

Currently when EFI_CERT_X509_GUID are contained in the dbx, the entries are
skipped.

Add support for EFI_CERT_X509_GUID dbx entries. When a EFI_CERT_X509_GUID
is found, it is added as an asymmetrical key to the .blacklist keyring.
Anytime the .platform keyring is used, the keys in the .blacklist keyring
are referenced, if a matching key is found, the key will be rejected.

[DH: Made the following changes:
- Added to have a config option to enable the facility. This allows a
Kconfig solution to make sure that pkcs7_validate_trust() is
enabled.[1][2]
- Moved the functions out from the middle of the blacklist functions.
- Added kerneldoc comments.]

Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
cc: Randy Dunlap <rdunlap@infradead.org>
cc: Mickaël Salaün <mic@digikod.net>
cc: Arnd Bergmann <arnd@kernel.org>
cc: keyrings@vger.kernel.org
Link: https://lore.kernel.org/r/20200901165143.10295-1-eric.snowberg@oracle.com/ # rfc
Link: https://lore.kernel.org/r/20200909172736.73003-1-eric.snowberg@oracle.com/ # v2
Link: https://lore.kernel.org/r/20200911182230.62266-1-eric.snowberg@oracle.com/ # v3
Link: https://lore.kernel.org/r/20200916004927.64276-1-eric.snowberg@oracle.com/ # v4
Link: https://lore.kernel.org/r/20210122181054.32635-2-eric.snowberg@oracle.com/ # v5
Link: https://lore.kernel.org/r/161428672051.677100.11064981943343605138.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161433310942.902181.4901864302675874242.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161529605075.163428.14625520893961300757.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/bc2c24e3-ed68-2521-0bf4-a1f6be4a895d@infradead.org/ [1]
Link: https://lore.kernel.org/r/20210225125638.1841436-1-arnd@kernel.org/ [2]

431c3be1 08-Mar-2021 Xiong Zhenwu <xiong.zhenwu@zte.com.cn>

selinux: fix misspellings using codespell tool

A typo is f out by codespell tool in 422th line of security.h:

$ codespell ./security/selinux/include/
./security.h:422: thie ==> the, this

Fix a typo found by codespell.

Signed-off-by: Xiong Zhenwu <xiong.zhenwu@zte.com.cn>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>

63ddf1ba 08-Mar-2021 Xiong Zhenwu <xiong.zhenwu@zte.com.cn>

selinux: fix misspellings using codespell tool

A typo is found out by codespell tool in 16th line of hashtab.c

$ codespell ./security/selinux/ss/
./hashtab.c:16: rouding ==> rounding

Fix a typo found by codespell.

Signed-off-by: Xiong Zhenwu <xiong.zhenwu@zte.com.cn>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>

2554a48f 12-Feb-2021 Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

selinux: measure state and policy capabilities

SELinux stores the configuration state and the policy capabilities
in kernel memory. Changes to this data at runtime would have an impact
on the security guarantees provided by SELinux. Measuring this data
through IMA subsystem provides a tamper-resistant way for
an attestation service to remotely validate it at runtime.

Measure the configuration state and policy capabilities by calling
the IMA hook ima_measure_critical_data().

To enable SELinux data measurement, the following steps are required:

1, Add "ima_policy=critical_data" to the kernel command line arguments
to enable measuring SELinux data at boot time.
For example,
BOOT_IMAGE=/boot/vmlinuz-5.11.0-rc3+ root=UUID=fd643309-a5d2-4ed3-b10d-3c579a5fab2f ro nomodeset security=selinux ima_policy=critical_data

2, Add the following rule to /etc/ima/ima-policy
measure func=CRITICAL_DATA label=selinux

Sample measurement of SELinux state and policy capabilities:

10 2122...65d8 ima-buf sha256:13c2...1292 selinux-state 696e...303b

Execute the following command to extract the measured data
from the IMA's runtime measurements list:

grep "selinux-state" /sys/kernel/security/integrity/ima/ascii_runtime_measurements | tail -1 | cut -d' ' -f 6 | xxd -r -p

The output should be a list of key-value pairs. For example,
initialized=1;enforcing=0;checkreqprot=1;network_peer_controls=1;open_perms=1;extended_socket_class=1;always_check_network=0;cgroup_seclabel=1;nnp_nosuid_transition=1;genfs_seclabel_symlinks=0;

To verify the measurement is consistent with the current SELinux state
reported on the system, compare the integer values in the following
files with those set in the IMA measurement (using the following commands):

- cat /sys/fs/selinux/enforce
- cat /sys/fs/selinux/checkreqprot
- cat /sys/fs/selinux/policy_capabilities/[capability_file]

Note that the actual verification would be against an expected state
and done on a separate system (likely an attestation server) requiring
"initialized=1;enforcing=1;checkreqprot=0;"
for a secure state and then whatever policy capabilities are actually
set in the expected policy (which can be extracted from the policy
itself via seinfo, for example).

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

7fa2e79a 11-Feb-2021 Vivek Goyal <vgoyal@redhat.com>

selinux: Allow context mounts for unpriviliged overlayfs

Now overlayfs allow unpriviliged mounts. That is root inside a non-init
user namespace can mount overlayfs. This is being added in 5.11 kernel.

Giuseppe tried to mount overlayfs with option "context" and it failed
with error -EACCESS.

$ su test
$ unshare -rm
$ mkdir -p lower upper work merged
$ mount -t overlay -o lowerdir=lower,workdir=work,upperdir=upper,userxattr,context='system_u:object_r:container_file_t:s0' none merged

This fails with -EACCESS. It works if option "-o context" is not specified.

Little debugging showed that selinux_set_mnt_opts() returns -EACCESS.

So this patch adds "overlay" to the list, where it is fine to specific
context from non init_user_ns.

Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
[PM: trimmed the changelog from the description]
Signed-off-by: Paul Moore <paul@paul-moore.com>

fee3ff99 21-Feb-2021 Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c

The functions defined in "arch/powerpc/kexec/ima.c" handle setting up
and freeing the resources required to carry over the IMA measurement
list from the current kernel to the next kernel across kexec system call.
These functions do not have architecture specific code, but are
currently limited to powerpc.

Move remove_ima_buffer() and setup_ima_buffer() calls into
of_kexec_alloc_and_setup_fdt() defined in "drivers/of/kexec.c".

Move the remaining architecture independent functions from
"arch/powerpc/kexec/ima.c" to "drivers/of/kexec.c".
Delete "arch/powerpc/kexec/ima.c" and "arch/powerpc/include/asm/ima.h".
Remove references to the deleted files and functions in powerpc and
in ima.

Co-developed-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
Signed-off-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Tested-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210221174930.27324-11-nramas@linux.microsoft.com

0c605158 21-Feb-2021 Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

powerpc: Move ima buffer fields to struct kimage

The fields ima_buffer_addr and ima_buffer_size in "struct kimage_arch"
for powerpc are used to carry forward the IMA measurement list across
kexec system call. These fields are not architecture specific, but are
currently limited to powerpc.

arch_ima_add_kexec_buffer() defined in "arch/powerpc/kexec/ima.c"
sets ima_buffer_addr and ima_buffer_size for the kexec system call.
This function does not have architecture specific code, but is
currently limited to powerpc.

Move ima_buffer_addr and ima_buffer_size to "struct kimage".
Set ima_buffer_addr and ima_buffer_size in ima_add_kexec_buffer()
in security/integrity/ima/ima_kexec.c.

Co-developed-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
Signed-off-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Will Deacon <will@kernel.org>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210221174930.27324-9-nramas@linux.microsoft.com

c03c21ba 23-Feb-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'keys-misc-20210126' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull keyring updates from David Howells:
"Here's a set of minor keyrings fixes/cleanups that I've collected from
various people for the upcoming merge window.

A couple of them might, in theory, be visible to userspace:

- Make blacklist_vet_description() reject uppercase letters as they
don't match the all-lowercase hex string generated for a blacklist
search.

This may want reconsideration in the future, but, currently, you
can't add to the blacklist keyring from userspace and the only
source of blacklist keys generates lowercase descriptions.

- Fix blacklist_init() to use a new KEY_ALLOC_* flag to indicate that
it wants KEY_FLAG_KEEP to be set rather than passing KEY_FLAG_KEEP
into keyring_alloc() as KEY_FLAG_KEEP isn't a valid alloc flag.

This isn't currently a problem as the blacklist keyring isn't
currently writable by userspace.

The rest of the patches are cleanups and I don't think they should
have any visible effect"

* tag 'keys-misc-20210126' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
watch_queue: rectify kernel-doc for init_watch()
certs: Replace K{U,G}IDT_INIT() with GLOBAL_ROOT_{U,G}ID
certs: Fix blacklist flag type confusion
PKCS#7: Fix missing include
certs: Fix blacklisted hexadecimal hash string check
certs/blacklist: fix kernel doc interface issue
crypto: public_key: Remove redundant header file from public_key.h
keys: remove trailing semicolon in macro definition
crypto: pkcs7: Use match_string() helper to simplify the code
PKCS#7: drop function from kernel-doc pkcs7_validate_trust_one
encrypted-keys: Replace HTTP links with HTTPS ones
crypto: asymmetric_keys: fix some comments in pkcs7_parser.h
KEYS: remove redundant memset
security: keys: delete repeated words in comments
KEYS: asymmetric: Fix kerneldoc
security/keys: use kvfree_sensitive()
watch_queue: Drop references to /dev/watch_queue
keys: Remove outdated __user annotations
security: keys: Fix fall-through warnings for Clang


7d6beb71 23-Feb-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull idmapped mounts from Christian Brauner:
"This introduces idmapped mounts which has been in the making for some
time. Simply put, different mounts can expose the same file or
directory with different ownership. This initial implementation comes
with ports for fat, ext4 and with Christoph's port for xfs with more
filesystems being actively worked on by independent people and
maintainers.

Idmapping mounts handle a wide range of long standing use-cases. Here
are just a few:

- Idmapped mounts make it possible to easily share files between
multiple users or multiple machines especially in complex
scenarios. For example, idmapped mounts will be used in the
implementation of portable home directories in
systemd-homed.service(8) where they allow users to move their home
directory to an external storage device and use it on multiple
computers where they are assigned different uids and gids. This
effectively makes it possible to assign random uids and gids at
login time.

- It is possible to share files from the host with unprivileged
containers without having to change ownership permanently through
chown(2).

- It is possible to idmap a container's rootfs and without having to
mangle every file. For example, Chromebooks use it to share the
user's Download folder with their unprivileged containers in their
Linux subsystem.

- It is possible to share files between containers with
non-overlapping idmappings.

- Filesystem that lack a proper concept of ownership such as fat can
use idmapped mounts to implement discretionary access (DAC)
permission checking.

- They allow users to efficiently changing ownership on a per-mount
basis without having to (recursively) chown(2) all files. In
contrast to chown (2) changing ownership of large sets of files is
instantenous with idmapped mounts. This is especially useful when
ownership of a whole root filesystem of a virtual machine or
container is changed. With idmapped mounts a single syscall
mount_setattr syscall will be sufficient to change the ownership of
all files.

- Idmapped mounts always take the current ownership into account as
idmappings specify what a given uid or gid is supposed to be mapped
to. This contrasts with the chown(2) syscall which cannot by itself
take the current ownership of the files it changes into account. It
simply changes the ownership to the specified uid and gid. This is
especially problematic when recursively chown(2)ing a large set of
files which is commong with the aforementioned portable home
directory and container and vm scenario.

- Idmapped mounts allow to change ownership locally, restricting it
to specific mounts, and temporarily as the ownership changes only
apply as long as the mount exists.

Several userspace projects have either already put up patches and
pull-requests for this feature or will do so should you decide to pull
this:

- systemd: In a wide variety of scenarios but especially right away
in their implementation of portable home directories.

https://systemd.io/HOME_DIRECTORY/

- container runtimes: containerd, runC, LXD:To share data between
host and unprivileged containers, unprivileged and privileged
containers, etc. The pull request for idmapped mounts support in
containerd, the default Kubernetes runtime is already up for quite
a while now: https://github.com/containerd/containerd/pull/4734

- The virtio-fs developers and several users have expressed interest
in using this feature with virtual machines once virtio-fs is
ported.

- ChromeOS: Sharing host-directories with unprivileged containers.

I've tightly synced with all those projects and all of those listed
here have also expressed their need/desire for this feature on the
mailing list. For more info on how people use this there's a bunch of
talks about this too. Here's just two recent ones:

https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf
https://fosdem.org/2021/schedule/event/containers_idmap/

This comes with an extensive xfstests suite covering both ext4 and
xfs:

https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts

It covers truncation, creation, opening, xattrs, vfscaps, setid
execution, setgid inheritance and more both with idmapped and
non-idmapped mounts. It already helped to discover an unrelated xfs
setgid inheritance bug which has since been fixed in mainline. It will
be sent for inclusion with the xfstests project should you decide to
merge this.

In order to support per-mount idmappings vfsmounts are marked with
user namespaces. The idmapping of the user namespace will be used to
map the ids of vfs objects when they are accessed through that mount.
By default all vfsmounts are marked with the initial user namespace.
The initial user namespace is used to indicate that a mount is not
idmapped. All operations behave as before and this is verified in the
testsuite.

Based on prior discussions we want to attach the whole user namespace
and not just a dedicated idmapping struct. This allows us to reuse all
the helpers that already exist for dealing with idmappings instead of
introducing a whole new range of helpers. In addition, if we decide in
the future that we are confident enough to enable unprivileged users
to setup idmapped mounts the permission checking can take into account
whether the caller is privileged in the user namespace the mount is
currently marked with.

The user namespace the mount will be marked with can be specified by
passing a file descriptor refering to the user namespace as an
argument to the new mount_setattr() syscall together with the new
MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
of extensibility.

The following conditions must be met in order to create an idmapped
mount:

- The caller must currently have the CAP_SYS_ADMIN capability in the
user namespace the underlying filesystem has been mounted in.

- The underlying filesystem must support idmapped mounts.

- The mount must not already be idmapped. This also implies that the
idmapping of a mount cannot be altered once it has been idmapped.

- The mount must be a detached/anonymous mount, i.e. it must have
been created by calling open_tree() with the OPEN_TREE_CLONE flag
and it must not already have been visible in the filesystem.

The last two points guarantee easier semantics for userspace and the
kernel and make the implementation significantly simpler.

By default vfsmounts are marked with the initial user namespace and no
behavioral or performance changes are observed.

The manpage with a detailed description can be found here:

https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8

In order to support idmapped mounts, filesystems need to be changed
and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
patches to convert individual filesystem are not very large or
complicated overall as can be seen from the included fat, ext4, and
xfs ports. Patches for other filesystems are actively worked on and
will be sent out separately. The xfstestsuite can be used to verify
that port has been done correctly.

The mount_setattr() syscall is motivated independent of the idmapped
mounts patches and it's been around since July 2019. One of the most
valuable features of the new mount api is the ability to perform
mounts based on file descriptors only.

Together with the lookup restrictions available in the openat2()
RESOLVE_* flag namespace which we added in v5.6 this is the first time
we are close to hardened and race-free (e.g. symlinks) mounting and
path resolution.

While userspace has started porting to the new mount api to mount
proper filesystems and create new bind-mounts it is currently not
possible to change mount options of an already existing bind mount in
the new mount api since the mount_setattr() syscall is missing.

With the addition of the mount_setattr() syscall we remove this last
restriction and userspace can now fully port to the new mount api,
covering every use-case the old mount api could. We also add the
crucial ability to recursively change mount options for a whole mount
tree, both removing and adding mount options at the same time. This
syscall has been requested multiple times by various people and
projects.

There is a simple tool available at

https://github.com/brauner/mount-idmapped

that allows to create idmapped mounts so people can play with this
patch series. I'll add support for the regular mount binary should you
decide to pull this in the following weeks:

Here's an example to a simple idmapped mount of another user's home
directory:

u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt

u1001@f2-vm:/$ ls -al /home/ubuntu/
total 28
drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
drwxr-xr-x 4 root root 4096 Oct 28 04:00 ..
-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile
-rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo

u1001@f2-vm:/$ ls -al /mnt/
total 28
drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 .
drwxr-xr-x 29 root root 4096 Oct 28 22:01 ..
-rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile
-rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo

u1001@f2-vm:/$ touch /mnt/my-file

u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file

u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file

u1001@f2-vm:/$ ls -al /mnt/my-file
-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file

u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file

u1001@f2-vm:/$ getfacl /mnt/my-file
getfacl: Removing leading '/' from absolute path names
# file: mnt/my-file
# owner: u1001
# group: u1001
user::rw-
user:u1001:rwx
group::rw-
mask::rwx
other::r--

u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
getfacl: Removing leading '/' from absolute path names
# file: home/ubuntu/my-file
# owner: ubuntu
# group: ubuntu
user::rw-
user:ubuntu:rwx
group::rw-
mask::rwx
other::r--"

* tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
xfs: support idmapped mounts
ext4: support idmapped mounts
fat: handle idmapped mounts
tests: add mount_setattr() selftests
fs: introduce MOUNT_ATTR_IDMAP
fs: add mount_setattr()
fs: add attr_flags_to_mnt_flags helper
fs: split out functions to hold writers
namespace: only take read lock in do_reconfigure_mnt()
mount: make {lock,unlock}_mount_hash() static
namespace: take lock_mount_hash() directly when changing flags
nfs: do not export idmapped mounts
overlayfs: do not mount on top of idmapped mounts
ecryptfs: do not mount on top of idmapped mounts
ima: handle idmapped mounts
apparmor: handle idmapped mounts
fs: make helpers idmap mount aware
exec: handle idmapped mounts
would_dump: handle idmapped mounts
...


7b0b78df 22-Feb-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'userns-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull user namespace update from Eric Biederman:
"There are several pieces of active development, but only a single
change made it through the gauntlet to be ready for v5.12. That change
is tightening up the semantics of the v3 capabilities xattr. It is
just short of being a bug-fix/security issue as no user space is known
to even generate the problem case"

* 'userns-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
capabilities: Don't allow writing ambiguous v3 file capabilities


250a25e7 22-Feb-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'work.audit' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull RCU-safe common_lsm_audit() from Al Viro:
"Make common_lsm_audit() non-blocking and usable from RCU pathwalk
context.

We don't really need to grab/drop dentry in there - rcu_read_lock() is
enough. There's a couple of followups using that to simplify the
logics in selinux, but those hadn't soaked in -next yet, so they'll
have to go in next window"

* 'work.audit' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
make dump_common_audit_data() safe to be called from RCU pathwalk
new helper: d_find_alias_rcu()


a2b095e0 21-Feb-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'tpmdd-next-v5.12-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull tpm updates from Jarkko Sakkinen:
"New features:

- Cr50 I2C TPM driver

- sysfs exports of PCR registers in TPM 2.0 chips

Bug fixes:

- bug fixes for tpm_tis driver, which had a racy wait for hardware
state change to be ready to send a command to the TPM chip. The bug
has existed already since 2006, but has only made itself known in
recent past. This is the same as the "last time" :-)

- Otherwise there's bunch of fixes for not as alarming regressions. I
think the list is about the same as last time, except I added fixes
for some disjoint bugs in trusted keys that I found some time ago"

* tag 'tpmdd-next-v5.12-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
KEYS: trusted: Reserve TPM for seal and unseal operations
KEYS: trusted: Fix migratable=1 failing
KEYS: trusted: Fix incorrect handling of tpm_get_random()
tpm/ppi: Constify static struct attribute_group
ABI: add sysfs description for tpm exports of PCR registers
tpm: add sysfs exports for all banks of PCR registers
keys: Update comment for restrict_link_by_key_or_keyring_chain
tpm: Remove tpm_dev_wq_lock
char: tpm: add i2c driver for cr50
tpm: Fix fall-through warnings for Clang
tpm_tis: Clean up locality release
tpm_tis: Fix check_locality for correct locality acquisition


92ae63c0 21-Feb-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'Smack-for-v5.12' of git://github.com/cschaufler/smack-next

Pull smack updates from Casey Schaufler:
"Bounds checking for writes to smackfs interfaces"

* tag 'Smack-for-v5.12' of git://github.com/cschaufler/smack-next:
smackfs: restrict bytes count in smackfs write functions


d643a990 21-Feb-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'integrity-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull IMA updates from Mimi Zohar:
"New is IMA support for measuring kernel critical data, as per usual
based on policy. The first example measures the in memory SELinux
policy. The second example measures the kernel version.

In addition are four bug fixes to address memory leaks and a missing
'static' function declaration"

* tag 'integrity-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
integrity: Make function integrity_add_key() static
ima: Free IMA measurement buffer after kexec syscall
ima: Free IMA measurement buffer on error
IMA: Measure kernel version in early boot
selinux: include a consumer of the new IMA critical data hook
IMA: define a builtin critical data measurement policy
IMA: extend critical data hook to limit the measurement based on a label
IMA: limit critical data measurement based on a label
IMA: add policy rule to measure critical data
IMA: define a hook to measure kernel integrity critical data
IMA: add support to measure buffer data hash
IMA: generalize keyring specific measurement constructs
evm: Fix memleak in init_desc


d1fec221 21-Feb-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20210215' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:
"We've got a good handful of patches for SELinux this time around; with
everything passing the selinux-testsuite and applying cleanly to your
tree as of a few minutes ago. The highlights are:

- Add support for labeling anonymous inodes, and extend this new
support to userfaultfd.

- Fallback to SELinux genfs file labeling if the filesystem does not
have xattr support. This is useful for virtiofs which can vary in
its xattr support depending on the backing filesystem.

- Classify and handle MPTCP the same as TCP in SELinux.

- Ensure consistent behavior between inode_getxattr and
inode_listsecurity when the SELinux policy is not loaded. This
fixes a known problem with overlayfs.

- A couple of patches to prune some unused variables from the SELinux
code, mark private variables as static, and mark other variables as
__ro_after_init or __read_mostly"

* tag 'selinux-pr-20210215' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
fs: anon_inodes: rephrase to appropriate kernel-doc
userfaultfd: use secure anon inodes for userfaultfd
selinux: teach SELinux about anonymous inodes
fs: add LSM-supporting anon-inode interface
security: add inode_init_security_anon() LSM hook
selinux: fall back to SECURITY_FS_USE_GENFS if no xattr support
selinux: mark selinux_xfrm_refcount as __read_mostly
selinux: mark some global variables __ro_after_init
selinux: make selinuxfs_mount static
selinux: drop the unnecessary aurule_callback variable
selinux: remove unused global variables
selinux: fix inconsistency between inode_getxattr and inode_listsecurity
selinux: handle MPTCP consistently with TCP


e210761f 21-Feb-2021 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'tomoyo-pr-20210215' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1

Pull tomoyo updates from Tetsuo Handa:
"Detect kernel thread correctly, and ignore harmless data race"

* tag 'tomoyo-pr-20210215' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
tomoyo: recognize kernel threads correctly
tomoyo: ignore data race while checking quota


8c657a05 28-Jan-2021 Jarkko Sakkinen <jarkko@kernel.org>

KEYS: trusted: Reserve TPM for seal and unseal operations

When TPM 2.0 trusted keys code was moved to the trusted keys subsystem,
the operations were unwrapped from tpm_try_get_ops() and tpm_put_ops(),
which are used to take temporarily the ownership of the TPM chip. The
ownership is only taken inside tpm_send(), but this is not sufficient,
as in the key load TPM2_CC_LOAD, TPM2_CC_UNSEAL and TPM2_FLUSH_CONTEXT
need to be done as a one single atom.

Take the TPM chip ownership before sending anything with
tpm_try_get_ops() and tpm_put_ops(), and use tpm_transmit_cmd() to send
TPM commands instead of tpm_send(), reverting back to the old behaviour.

Fixes: 2e19e10131a0 ("KEYS: trusted: Move TPM2 trusted keys code")
Reported-by: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: stable@vger.kernel.org
Cc: David Howells <dhowells@redhat.com>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Cc: Sumit Garg <sumit.garg@linaro.org>
Acked-by Sumit Garg <sumit.garg@linaro.org>
Tested-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

8da7520c 28-Jan-2021 Jarkko Sakkinen <jarkko@kernel.org>

KEYS: trusted: Fix migratable=1 failing

Consider the following transcript:

$ keyctl add trusted kmk "new 32 blobauth=helloworld keyhandle=80000000 migratable=1" @u
add_key: Invalid argument

The documentation has the following description:

migratable= 0|1 indicating permission to reseal to new PCR values,
default 1 (resealing allowed)

The consequence is that "migratable=1" should succeed. Fix this by
allowing this condition to pass instead of return -EINVAL.

[*] Documentation/security/keys/trusted-encrypted.rst

Cc: stable@vger.kernel.org
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Fixes: d00a1c72f7f4 ("keys: add new trusted key-type")
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

5df16caa 28-Jan-2021 Jarkko Sakkinen <jarkko@kernel.org>

KEYS: trusted: Fix incorrect handling of tpm_get_random()

When tpm_get_random() was introduced, it defined the following API for the
return value:

1. A positive value tells how many bytes of random data was generated.
2. A negative value on error.

However, in the call sites the API was used incorrectly, i.e. as it would
only return negative values and otherwise zero. Returning he positive read
counts to the user space does not make any possible sense.

Fix this by returning -EIO when tpm_get_random() returns a positive value.

Fixes: 41ab999c80f1 ("tpm: Move tpm_get_random api into the TPM device driver")
Cc: stable@vger.kernel.org
Cc: Mimi Zohar <zohar@linux.ibm.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Kent Yoder <key@linux.vnet.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>

f6692213 10-Feb-2021 Wei Yongjun <weiyongjun1@huawei.com>

integrity: Make function integrity_add_key() static

The sparse tool complains as follows:

security/integrity/digsig.c:146:12: warning:
symbol 'integrity_add_key' was not declared. Should it be static?

This function is not used outside of digsig.c, so this
commit marks it static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 60740accf784 ("integrity: Load certs to the platform keyring")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

cccb0efd 10-Feb-2021 Mimi Zohar <zohar@linux.ibm.com>

Merge branch 'ima-kexec-fixes' into next-integrity


f31e3386 04-Feb-2021 Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

ima: Free IMA measurement buffer after kexec syscall

IMA allocates kernel virtual memory to carry forward the measurement
list, from the current kernel to the next kernel on kexec system call,
in ima_add_kexec_buffer() function. This buffer is not freed before
completing the kexec system call resulting in memory leak.

Add ima_buffer field in "struct kimage" to store the virtual address
of the buffer allocated for the IMA measurement list.
Free the memory allocated for the IMA measurement list in
kimage_file_post_load_cleanup() function.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Fixes: 7b8589cc29e7 ("ima: on soft reboot, save the measurement list")
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

6d14c651 04-Feb-2021 Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

ima: Free IMA measurement buffer on error

IMA allocates kernel virtual memory to carry forward the measurement
list, from the current kernel to the next kernel on kexec system call,
in ima_add_kexec_buffer() function. In error code paths this memory
is not freed resulting in memory leak.

Free the memory allocated for the IMA measurement list in
the error code paths in ima_add_kexec_buffer() function.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Fixes: 7b8589cc29e7 ("ima: on soft reboot, save the measurement list")
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

7ef4c19d 28-Jan-2021 Sabyrzhan Tasbolatov <snovitoll@gmail.com>

smackfs: restrict bytes count in smackfs write functions

syzbot found WARNINGs in several smackfs write operations where
bytes count is passed to memdup_user_nul which exceeds
GFP MAX_ORDER. Check count size if bigger than PAGE_SIZE.

Per smackfs doc, smk_write_net4addr accepts any label or -CIPSO,
smk_write_net6addr accepts any label or -DELETE. I couldn't find
any general rule for other label lengths except SMK_LABELLEN,
SMK_LONGLABEL, SMK_CIPSOMAX which are documented.

Let's constrain, in general, smackfs label lengths for PAGE_SIZE.
Although fuzzer crashes write to smackfs/netlabel on 0x400000 length.

Here is a quick way to reproduce the WARNING:
python -c "print('A' * 0x400000)" > /sys/fs/smackfs/netlabel

Reported-by: syzbot+a71a442385a0b2815497@syzkaller.appspotmail.com
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

9c83465f 31-Jan-2021 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

tomoyo: recognize kernel threads correctly

Commit db68ce10c4f0a27c ("new helper: uaccess_kernel()") replaced
segment_eq(get_fs(), KERNEL_DS) with uaccess_kernel(). But the correct
method for tomoyo to check whether current is a kernel thread in order
to assume that kernel threads are privileged for socket operations was
(current->flags & PF_KTHREAD). Now that uaccess_kernel() became 0 on x86,
tomoyo has to fix this problem. Do like commit 942cb357ae7d9249 ("Smack:
Handle io_uring kernel thread privileges") does.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

5797e861 31-Jan-2021 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

tomoyo: ignore data race while checking quota

syzbot is reporting that tomoyo's quota check is racy [1]. But this check
is tolerant of some degree of inaccuracy. Thus, teach KCSAN to ignore
this data race.

[1] https://syzkaller.appspot.com/bug?id=999533deec7ba6337f8aa25d8bd1a4d5f7e50476

Reported-by: syzbot <syzbot+0789a72b46fd91431bd8@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

f2b00be4 28-Jan-2021 Miklos Szeredi <mszeredi@redhat.com>

cap: fix conversions on getxattr

If a capability is stored on disk in v2 format cap_inode_getsecurity() will
currently return in v2 format unconditionally.

This is wrong: v2 cap should be equivalent to a v3 cap with zero rootid,
and so the same conversions performed on it.

If the rootid cannot be mapped, v3 is returned unconverted. Fix this so
that both v2 and v3 return -EOVERFLOW if the rootid (or the owner of the fs
user namespace in case of v2) cannot be mapped into the current user
namespace.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

b3f82afc 26-Jan-2021 Raphael Gianotti <raphgi@linux.microsoft.com>

IMA: Measure kernel version in early boot

The integrity of a kernel can be verified by the boot loader on cold
boot, and during kexec, by the current running kernel, before it is
loaded. However, it is still possible that the new kernel being
loaded is older than the current kernel, and/or has known
vulnerabilities. Therefore, it is imperative that an attestation
service be able to verify the version of the kernel being loaded on
the client, from cold boot and subsequent kexec system calls,
ensuring that only kernels with versions known to be good are loaded.

Measure the kernel version using ima_measure_critical_data() early on
in the boot sequence, reducing the chances of known kernel
vulnerabilities being exploited. With IMA being part of the kernel,
this overall approach makes the measurement itself more trustworthy.

To enable measuring the kernel version "ima_policy=critical_data"
needs to be added to the kernel command line arguments.
For example,
BOOT_IMAGE=/boot/vmlinuz-5.11.0-rc3+ root=UUID=fd643309-a5d2-4ed3-b10d-3c579a5fab2f ro nomodeset ima_policy=critical_data

If runtime measurement of the kernel version is ever needed, the
following should be added to /etc/ima/ima-policy:

measure func=CRITICAL_DATA label=kernel_info

To extract the measured data after boot, the following command can be used:

grep -m 1 "kernel_version" \
/sys/kernel/security/integrity/ima/ascii_runtime_measurements

Sample output from the command above:

10 a8297d408e9d5155728b619761d0dd4cedf5ef5f ima-buf
sha256:5660e19945be0119bc19cbbf8d9c33a09935ab5d30dad48aa11f879c67d70988
kernel_version 352e31312e302d7263332d31363138372d676564623634666537383234342d6469727479

The above hex-ascii string corresponds to the kernel version
(e.g. xxd -r -p):

5.11.0-rc3-16187-gedb64fe78244-dirty

Signed-off-by: Raphael Gianotti <raphgi@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

a2d2329e 21-Jan-2021 Christian Brauner <christian.brauner@ubuntu.com>

ima: handle idmapped mounts

IMA does sometimes access the inode's i_uid and compares it against the
rules' fowner. Enable IMA to handle idmapped mounts by passing down the
mount's user namespace. We simply make use of the helpers we introduced
before. If the initial user namespace is passed nothing changes so
non-idmapped mounts will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-27-christian.brauner@ubuntu.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

3cee6079 21-Jan-2021 Christian Brauner <christian.brauner@ubuntu.com>

apparmor: handle idmapped mounts

The i_uid and i_gid are mostly used when logging for AppArmor. This is
broken in a bunch of places where the global root id is reported instead
of the i_uid or i_gid of the file. Nonetheless, be kind and log the
mapped inode if we're coming from an idmapped mount. If the initial user
namespace is passed nothing changes so non-idmapped mounts will see
identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-26-christian.brauner@ubuntu.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

549c7297 21-Jan-2021 Christian Brauner <christian.brauner@ubuntu.com>

fs: make helpers idmap mount aware

Extend some inode methods with an additional user namespace argument. A
filesystem that is aware of idmapped mounts will receive the user
namespace the mount has been marked with. This can be used for
additional permission checking and also to enable filesystems to
translate between uids and gids if they need to. We have implemented all
relevant helpers in earlier patches.

As requested we simply extend the exisiting inode method instead of
introducing new ones. This is a little more code churn but it's mostly
mechanical and doesnt't leave us with additional inode methods.

Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

71bc356f 21-Jan-2021 Christian Brauner <christian.brauner@ubuntu.com>

commoncap: handle idmapped mounts

When interacting with user namespace and non-user namespace aware
filesystem capabilities the vfs will perform various security checks to
determine whether or not the filesystem capabilities can be used by the
caller, whether they need to be removed and so on. The main
infrastructure for this resides in the capability codepaths but they are
called through the LSM security infrastructure even though they are not
technically an LSM or optional. This extends the existing security hooks
security_inode_removexattr(), security_inode_killpriv(),
security_inode_getsecurity() to pass down the mount's user namespace and
makes them aware of idmapped mounts.

In order to actually get filesystem capabilities from disk the
capability infrastructure exposes the get_vfs_caps_from_disk() helper.
For user namespace aware filesystem capabilities a root uid is stored
alongside the capabilities.

In order to determine whether the caller can make use of the filesystem
capability or whether it needs to be ignored it is translated according
to the superblock's user namespace. If it can be translated to uid 0
according to that id mapping the caller can use the filesystem
capabilities stored on disk. If we are accessing the inode that holds
the filesystem capabilities through an idmapped mount we map the root
uid according to the mount's user namespace. Afterwards the checks are
identical to non-idmapped mounts: reading filesystem caps from disk
enforces that the root uid associated with the filesystem capability
must have a mapping in the superblock's user namespace and that the
caller is either in the same user namespace or is a descendant of the
superblock's user namespace. For filesystems that are mountable inside
user namespace the caller can just mount the filesystem and won't
usually need to idmap it. If they do want to idmap it they can create an
idmapped mount and mark it with a user namespace they created and which
is thus a descendant of s_user_ns. For filesystems that are not
mountable inside user namespaces the descendant rule is trivially true
because the s_user_ns will be the initial user namespace.

If the initial user namespace is passed nothing changes so non-idmapped
mounts will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

c7c7a1a1 21-Jan-2021 Tycho Andersen <tycho@tycho.pizza>

xattr: handle idmapped mounts

When interacting with extended attributes the vfs verifies that the
caller is privileged over the inode with which the extended attribute is
associated. For posix access and posix default extended attributes a uid
or gid can be stored on-disk. Let the functions handle posix extended
attributes on idmapped mounts. If the inode is accessed through an
idmapped mount we need to map it according to the mount's user
namespace. Afterwards the checks are identical to non-idmapped mounts.
This has no effect for e.g. security xattrs since they don't store uids
or gids and don't perform permission checks on them like posix acls do.

Link: https://lore.kernel.org/r/20210121131959.646623-10-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

e65ce2a5 21-Jan-2021 Christian Brauner <christian.brauner@ubuntu.com>

acl: handle idmapped mounts

The posix acl permission checking helpers determine whether a caller is
privileged over an inode according to the acls associated with the
inode. Add helpers that make it possible to handle acls on idmapped
mounts.

The vfs and the filesystems targeted by this first iteration make use of
posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to
translate basic posix access and default permissions such as the
ACL_USER and ACL_GROUP type according to the initial user namespace (or
the superblock's user namespace) to and from the caller's current user
namespace. Adapt these two helpers to handle idmapped mounts whereby we
either map from or into the mount's user namespace depending on in which
direction we're translating.
Similarly, cap_convert_nscap() is used by the vfs to translate user
namespace and non-user namespace aware filesystem capabilities from the
superblock's user namespace to the caller's user namespace. Enable it to
handle idmapped mounts by accounting for the mount's user namespace.

In addition the fileystems targeted in the first iteration of this patch
series make use of the posix_acl_chmod() and, posix_acl_update_mode()
helpers. Both helpers perform permission checks on the target inode. Let
them handle idmapped mounts. These two helpers are called when posix
acls are set by the respective filesystems to handle this case we extend
the ->set() method to take an additional user namespace argument to pass
the mount's user namespace down.

Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

21cb47be 21-Jan-2021 Christian Brauner <christian.brauner@ubuntu.com>

inode: make init and permission helpers idmapped mount aware

The inode_owner_or_capable() helper determines whether the caller is the
owner of the inode or is capable with respect to that inode. Allow it to
handle idmapped mounts. If the inode is accessed through an idmapped
mount it according to the mount's user namespace. Afterwards the checks
are identical to non-idmapped mounts. If the initial user namespace is
passed nothing changes so non-idmapped mounts will see identical
behavior as before.

Similarly, allow the inode_init_owner() helper to handle idmapped
mounts. It initializes a new inode on idmapped mounts by mapping the
fsuid and fsgid of the caller from the mount's user namespace. If the
initial user namespace is passed nothing changes so non-idmapped mounts
will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

0558c1bf 21-Jan-2021 Christian Brauner <christian.brauner@ubuntu.com>

capability: handle idmapped mounts

In order to determine whether a caller holds privilege over a given
inode the capability framework exposes the two helpers
privileged_wrt_inode_uidgid() and capable_wrt_inode_uidgid(). The former
verifies that the inode has a mapping in the caller's user namespace and
the latter additionally verifies that the caller has the requested
capability in their current user namespace.
If the inode is accessed through an idmapped mount map it into the
mount's user namespace. Afterwards the checks are identical to
non-idmapped inodes. If the initial user namespace is passed all
operations are a nop so non-idmapped mounts will not see a change in
behavior.

Link: https://lore.kernel.org/r/20210121131959.646623-5-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

4993e1f9 20-Nov-2020 David Howells <dhowells@redhat.com>

certs: Fix blacklist flag type confusion

KEY_FLAG_KEEP is not meant to be passed to keyring_alloc() or key_alloc(),
as these only take KEY_ALLOC_* flags. KEY_FLAG_KEEP has the same value as
KEY_ALLOC_BYPASS_RESTRICTION, but fortunately only key_create_or_update()
uses it. LSMs using the key_alloc hook don't check that flag.

KEY_FLAG_KEEP is then ignored but fortunately (again) the root user cannot
write to the blacklist keyring, so it is not possible to remove a key/hash
from it.

Fix this by adding a KEY_ALLOC_SET_KEEP flag that tells key_alloc() to set
KEY_FLAG_KEEP on the new key. blacklist_init() can then, correctly, pass
this to keyring_alloc().

We can also use this in ima_mok_init() rather than setting the flag
manually.

Note that this doesn't fix an observable bug with the current
implementation but it is required to allow addition of new hashes to the
blacklist in the future without making it possible for them to be removed.

Fixes: 734114f8782f ("KEYS: Add a system blacklist keyring")
Reported-by: Mickaël Salaün <mic@linux.microsoft.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Mickaël Salaün <mic@linux.microsoft.com>
cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: David Woodhouse <dwmw2@infradead.org>

c224926e 22-Jul-2020 Tom Rix <trix@redhat.com>

KEYS: remove redundant memset

Reviewing use of memset in keyctl_pkey.c

keyctl_pkey_params_get prologue code to set params up

memset(params, 0, sizeof(*params));
params->encoding = "raw";

keyctl_pkey_query has the same prologue
and calls keyctl_pkey_params_get.

So remove the prologue.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Ben Boeckel <mathstuf@gmail.com>

328c95db 07-Aug-2020 Randy Dunlap <rdunlap@infradead.org>

security: keys: delete repeated words in comments

Drop repeated words in comments.
{to, will, the}

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Ben Boeckel <mathstuf@gmail.com>
Cc: keyrings@vger.kernel.org
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: linux-security-module@vger.kernel.org

272a1219 27-Aug-2020 Denis Efremov <efremov@linux.com>

security/keys: use kvfree_sensitive()

Use kvfree_sensitive() instead of open-coding it.

Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Ben Boeckel <mathstuf@gmail.com>

8fe62e0c 24-Nov-2020 Gabriel Krisman Bertazi <krisman@collabora.com>

watch_queue: Drop references to /dev/watch_queue

The merged API doesn't use a watch_queue device, but instead relies on
pipes, so let the documentation reflect that.

Fixes: f7e47677e39a ("watch_queue: Add a key/keyring notification facility")
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Ben Boeckel <mathstuf@gmail.com>

796e46f9 23-Nov-2020 Jann Horn <jannh@google.com>

keys: Remove outdated __user annotations

When the semantics of the ->read() handlers were changed such that "buffer"
is a kernel pointer, some __user annotations survived.
Since they're wrong now, get rid of them.

Fixes: d3ec10aa9581 ("KEYS: Don't write out to userspace while holding key semaphore")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Ben Boeckel <mathstuf@gmail.com>

634c21bb 19-Nov-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

security: keys: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of letting the code fall
through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Ben Boeckel <mathstuf@gmail.com>

23d8f5b6 05-Jan-2021 Al Viro <viro@zeniv.linux.org.uk>

make dump_common_audit_data() safe to be called from RCU pathwalk

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d36a1dd9 05-Jan-2021 Al Viro <viro@zeniv.linux.org.uk>

dump_common_audit_data(): fix racy accesses to ->d_name

We are not guaranteed the locking environment that would prevent
dentry getting renamed right under us. And it's possible for
old long name to be freed after rename, leading to UAF here.

Cc: stable@kernel.org # v2.6.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fdd1ffe8 14-Jan-2021 Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

selinux: include a consumer of the new IMA critical data hook

SELinux stores the active policy in memory, so the changes to this data
at runtime would have an impact on the security guarantees provided
by SELinux. Measuring in-memory SELinux policy through IMA subsystem
provides a secure way for the attestation service to remotely validate
the policy contents at runtime.

Measure the hash of the loaded policy by calling the IMA hook
ima_measure_critical_data(). Since the size of the loaded policy
can be large (several MB), measure the hash of the policy instead of
the entire policy to avoid bloating the IMA log entry.

To enable SELinux data measurement, the following steps are required:

1, Add "ima_policy=critical_data" to the kernel command line arguments
to enable measuring SELinux data at boot time.
For example,
BOOT_IMAGE=/boot/vmlinuz-5.10.0-rc1+ root=UUID=fd643309-a5d2-4ed3-b10d-3c579a5fab2f ro nomodeset security=selinux ima_policy=critical_data

2, Add the following rule to /etc/ima/ima-policy
measure func=CRITICAL_DATA label=selinux

Sample measurement of the hash of SELinux policy:

To verify the measured data with the current SELinux policy run
the following commands and verify the output hash values match.

sha256sum /sys/fs/selinux/policy | cut -d' ' -f 1

grep "selinux-policy-hash" /sys/kernel/security/integrity/ima/ascii_runtime_measurements | tail -1 | cut -d' ' -f 6

Note that the actual verification of SELinux policy would require loading
the expected policy into an identical kernel on a pristine/known-safe
system and run the sha256sum /sys/kernel/selinux/policy there to get
the expected hash.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

03cee168 07-Jan-2021 Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

IMA: define a builtin critical data measurement policy

Define a new critical data builtin policy to allow measuring
early kernel integrity critical data before a custom IMA policy
is loaded.

Update the documentation on kernel parameters to document
the new critical data builtin policy.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

9f5d7d23 07-Jan-2021 Tushar Sugandhi <tusharsu@linux.microsoft.com>

IMA: extend critical data hook to limit the measurement based on a label

The IMA hook ima_measure_critical_data() does not support a way to
specify the source of the critical data provider. Thus, the data
measurement cannot be constrained based on the data source label
in the IMA policy.

Extend the IMA hook ima_measure_critical_data() to support passing
the data source label as an input parameter, so that the policy rule can
be used to limit the measurements based on the label.

Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

47d76a48 07-Jan-2021 Tushar Sugandhi <tusharsu@linux.microsoft.com>

IMA: limit critical data measurement based on a label

Integrity critical data may belong to a single subsystem or it may
arise from cross subsystem interaction. Currently there is no mechanism
to group or limit the data based on certain label. Limiting and
grouping critical data based on a label would make it flexible and
configurable to measure.

Define "label:=", a new IMA policy condition, for the IMA func
CRITICAL_DATA to allow grouping and limiting measurement of integrity
critical data.

Limit the measurement to the labels that are specified in the IMA
policy - CRITICAL_DATA+"label:=". If "label:=" is not provided with
the func CRITICAL_DATA, measure all the input integrity critical data.

Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

c4e43aa2 07-Jan-2021 Tushar Sugandhi <tusharsu@linux.microsoft.com>

IMA: add policy rule to measure critical data

A new IMA policy rule is needed for the IMA hook
ima_measure_critical_data() and the corresponding func CRITICAL_DATA for
measuring the input buffer. The policy rule should ensure the buffer
would get measured only when the policy rule allows the action. The
policy rule should also support the necessary constraints (flags etc.)
for integrity critical buffer data measurements.

Add policy rule support for measuring integrity critical data.

Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

d6e64501 07-Jan-2021 Tushar Sugandhi <tusharsu@linux.microsoft.com>

IMA: define a hook to measure kernel integrity critical data

IMA provides capabilities to measure file and buffer data. However,
various data structures, policies, and states stored in kernel memory
also impact the integrity of the system. Several kernel subsystems
contain such integrity critical data. These kernel subsystems help
protect the integrity of the system. Currently, IMA does not provide a
generic function for measuring kernel integrity critical data.

Define ima_measure_critical_data, a new IMA hook, to measure kernel
integrity critical data.

Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

291af651 07-Jan-2021 Tushar Sugandhi <tusharsu@linux.microsoft.com>

IMA: add support to measure buffer data hash

The original IMA buffer data measurement sizes were small (e.g. boot
command line), but the new buffer data measurement use cases have data
sizes that are a lot larger. Just as IMA measures the file data hash,
not the file data, IMA should similarly support the option for measuring
buffer data hash.

Introduce a boolean parameter to support measuring buffer data hash,
which would be much smaller, instead of the buffer itself.

Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

2b4a2474 07-Jan-2021 Tushar Sugandhi <tusharsu@linux.microsoft.com>

IMA: generalize keyring specific measurement constructs

IMA functions such as ima_match_keyring(), process_buffer_measurement(),
ima_match_policy() etc. handle data specific to keyrings. Currently,
these constructs are not generic to handle any func specific data.
This makes it harder to extend them without code duplication.

Refactor the keyring specific measurement constructs to be generic and
reusable in other measurement scenarios.

Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

29cd6591 08-Jan-2021 Daniel Colascione <dancol@google.com>

selinux: teach SELinux about anonymous inodes

This change uses the anon_inodes and LSM infrastructure introduced in
the previous patches to give SELinux the ability to control
anonymous-inode files that are created using the new
anon_inode_getfd_secure() function.

A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".

Example:

type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };

(The next patch in this series is necessary for making userfaultfd
support this new interface. The example above is just
for exposition.)

Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

215b674b 08-Jan-2021 Lokesh Gidra <lokeshgidra@google.com>

security: add inode_init_security_anon() LSM hook

This change adds a new LSM hook, inode_init_security_anon(), that will
be used while creating secure anonymous inodes. The hook allows/denies
its creation and assigns a security context to the inode.

The new hook accepts an optional context_inode parameter that callers
can use to provide additional contextual information to security modules
for granting/denying permission to create an anon-inode of the same type.
This context_inode's security_context can also be used to initialize the
newly created anon-inode's security_context.

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

08abe46b 13-Jan-2021 Ondrej Mosnacek <omosnace@redhat.com>

selinux: fall back to SECURITY_FS_USE_GENFS if no xattr support

When a superblock is assigned the SECURITY_FS_USE_XATTR behavior by the
policy yet it lacks xattr support, try to fall back to genfs rather than
rejecting the mount. If a genfscon rule is found for the filesystem,
then change the behavior to SECURITY_FS_USE_GENFS, otherwise reject the
mount as before. A similar fallback is already done in security_fs_use()
if no behavior specification is found for the given filesystem.

This is needed e.g. for virtiofs, which may or may not support xattrs
depending on the backing host filesystem.

Example:
# seinfo --genfs | grep ' ramfs'
genfscon ramfs / system_u:object_r:ramfs_t:s0
# echo '(fsuse xattr ramfs (system_u object_r fs_t ((s0) (s0))))' >ramfs_xattr.cil
# semodule -i ramfs_xattr.cil
# mount -t ramfs none /mnt

Before:
mount: /mnt: mount(2) system call failed: Operation not supported.

After:
(mount succeeds)
# ls -Zd /mnt
system_u:object_r:ramfs_t:s0 /mnt

See also:
https://lore.kernel.org/selinux/20210105142148.GA3200@redhat.com/T/
https://github.com/fedora-selinux/selinux-policy/pull/478

Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

ccf11dba 10-Jan-2021 Dinghao Liu <dinghao.liu@zju.edu.cn>

evm: Fix memleak in init_desc

tmp_tfm is allocated, but not freed on subsequent kmalloc failure, which
leads to a memory leak. Free tmp_tfm.

Fixes: d46eb3699502b ("evm: crypto hash replaced by shash")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
[zohar@linux.ibm.com: formatted/reworded patch description]
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

e0de8a9a 06-Jan-2021 Ondrej Mosnacek <omosnace@redhat.com>

selinux: mark selinux_xfrm_refcount as __read_mostly

This is motivated by a perfomance regression of selinux_xfrm_enabled()
that happened on a RHEL kernel due to false sharing between
selinux_xfrm_refcount and (the late) selinux_ss.policy_rwlock (i.e. the
.bss section memory layout changed such that they happened to share the
same cacheline). Since the policy rwlock's memory region was modified
upon each read-side critical section, the readers of
selinux_xfrm_refcount had frequent cache misses, eventually leading to a
significant performance degradation under a TCP SYN flood on a system
with many cores (32 in this case, but it's detectable on less cores as
well).

While upstream has since switched to RCU locking, so the same can no
longer happen here, selinux_xfrm_refcount could still share a cacheline
with another frequently written region, thus marking it __read_mostly
still makes sense. __read_mostly helps, because it will put the symbol
in a separate section along with other read-mostly variables, so there
should never be a clash with frequently written data.

Since selinux_xfrm_refcount is modified only in case of an explicit
action, it should be safe to do this (i.e. it shouldn't disrupt other
read-mostly variables too much).

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

cd2bb4cb 06-Jan-2021 Ondrej Mosnacek <omosnace@redhat.com>

selinux: mark some global variables __ro_after_init

All of these are never modified outside initcalls, so they can be
__ro_after_init.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

db478cd6 06-Jan-2021 Ondrej Mosnacek <omosnace@redhat.com>

selinux: make selinuxfs_mount static

It is not referenced outside selinuxfs.c, so remove its extern header
declaration and make it static.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

3c797e51 06-Jan-2021 Ondrej Mosnacek <omosnace@redhat.com>

selinux: drop the unnecessary aurule_callback variable

Its value is actually not changed anywhere, so it can be substituted for
a direct call to audit_update_lsm_rules().

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

46434ba0 06-Jan-2021 Ondrej Mosnacek <omosnace@redhat.com>

selinux: remove unused global variables

All of sel_ib_pkey_list, sel_netif_list, sel_netnode_list, and
sel_netport_list are declared but never used. Remove them.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

a9ffe682 18-Dec-2020 Amir Goldstein <amir73il@gmail.com>

selinux: fix inconsistency between inode_getxattr and inode_listsecurity

When inode has no listxattr op of its own (e.g. squashfs) vfs_listxattr
calls the LSM inode_listsecurity hooks to list the xattrs that LSMs will
intercept in inode_getxattr hooks.

When selinux LSM is installed but not initialized, it will list the
security.selinux xattr in inode_listsecurity, but will not intercept it
in inode_getxattr. This results in -ENODATA for a getxattr call for an
xattr returned by listxattr.

This situation was manifested as overlayfs failure to copy up lower
files from squashfs when selinux is built-in but not initialized,
because ovl_copy_xattr() iterates the lower inode xattrs by
vfs_listxattr() and vfs_getxattr().

Match the logic of inode_listsecurity to that of inode_getxattr and
do not list the security.selinux xattr if selinux is not initialized.

Reported-by: Michael Labriola <michael.d.labriola@gmail.com>
Tested-by: Michael Labriola <michael.d.labriola@gmail.com>
Link: https://lore.kernel.org/linux-unionfs/2nv9d47zt7.fsf@aldarion.sourceruckus.org/
Fixes: c8e222616c7e ("selinux: allow reading labels before policy is loaded")
Cc: stable@vger.kernel.org#v5.9+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

95ca9072 15-Dec-2020 Paolo Abeni <pabeni@redhat.com>

selinux: handle MPTCP consistently with TCP

The MPTCP protocol uses a specific protocol value, even if
it's an extension to TCP. Additionally, MPTCP sockets
could 'fall-back' to TCP at run-time, depending on peer MPTCP
support and available resources.

As a consequence of the specific protocol number, selinux
applies the raw_socket class to MPTCP sockets.

Existing TCP application converted to MPTCP - or forced to
use MPTCP socket with user-space hacks - will need an
updated policy to run successfully.

This change lets selinux attach the TCP socket class to
MPTCP sockets, too, so that no policy changes are needed in
the above scenario.

Note that the MPTCP is setting, propagating and updating the
security context on all the subflows and related request
socket.

Link: https://lore.kernel.org/linux-security-module/CAHC9VhTaK3xx0hEGByD2zxfF7fadyPP1kb-WeWH_YCyq9X-sRg@mail.gmail.com/T/#t
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
[PM: tweaked subject's prefix]
Signed-off-by: Paul Moore <paul@paul-moore.com>

95ebabde 17-Dec-2020 Eric W. Biederman <ebiederm@xmission.com>

capabilities: Don't allow writing ambiguous v3 file capabilities

The v3 file capabilities have a uid field that records the filesystem
uid of the root user of the user namespace the file capabilities are
valid in.

When someone is silly enough to have the same underlying uid as the
root uid of multiple nested containers a v3 filesystem capability can
be ambiguous.

In the spirit of don't do that then, forbid writing a v3 filesystem
capability if it is ambiguous.

Fixes: 8db6c34f1dbc ("Introduce v3 namespaced file capabilities")
Reviewed-by: Andrew G. Morgan <morgan@kernel.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

2f2fce3d 24-Dec-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'Smack-for-5.11-io_uring-fix' of git://github.com/cschaufler/smack-next

Pull smack fix from Casey Schaufler:
"Provide a fix for the incorrect handling of privilege in the face of
io_uring's use of kernel threads. That invalidated an long standing
assumption regarding the privilege of kernel threads.

The fix is simple and safe. It was provided by Jens Axboe and has been
tested"

* tag 'Smack-for-5.11-io_uring-fix' of git://github.com/cschaufler/smack-next:
Smack: Handle io_uring kernel thread privileges


4a1106af 24-Dec-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'efi_updates_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI updates from Borislav Petkov:
"These got delayed due to a last minute ia64 build issue which got
fixed in the meantime.

EFI updates collected by Ard Biesheuvel:

- Don't move BSS section around pointlessly in the x86 decompressor

- Refactor helper for discovering the EFI secure boot mode

- Wire up EFI secure boot to IMA for arm64

- Some fixes for the capsule loader

- Expose the RT_PROP table via the EFI test module

- Relax DT and kernel placement restrictions on ARM

with a few followup fixes:

- fix the build breakage on IA64 caused by recent capsule loader
changes

- suppress a type mismatch build warning in the expansion of
EFI_PHYS_ALIGN on ARM"

* tag 'efi_updates_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: arm: force use of unsigned type for EFI_PHYS_ALIGN
efi: ia64: disable the capsule loader
efi: stub: get rid of efi_get_max_fdt_addr()
efi/efi_test: read RuntimeServicesSupported
efi: arm: reduce minimum alignment of uncompressed kernel
efi: capsule: clean scatter-gather entries from the D-cache
efi: capsule: use atomic kmap for transient sglist mappings
efi: x86/xen: switch to efi_get_secureboot_mode helper
arm64/ima: add ima_arch support
ima: generalize x86/EFI arch glue for other EFI architectures
efi: generalize efi_get_secureboot
efi/libstub: EFI_GENERIC_STUB_INITRD_CMDLINE_LOADER should not default to yes
efi/x86: Only copy the compressed kernel image in efi_relocate_kernel()
efi/libstub/x86: simplify efi_is_native()


942cb357 22-Dec-2020 Casey Schaufler <casey@schaufler-ca.com>

Smack: Handle io_uring kernel thread privileges

Smack assumes that kernel threads are privileged for smackfs
operations. This was necessary because the credential of the
kernel thread was not related to a user operation. With io_uring
the credential does reflect a user's rights and can be used.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

92dbc9de 17-Dec-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'ovl-update-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs updates from Miklos Szeredi:

- Allow unprivileged mounting in a user namespace.

For quite some time the security model of overlayfs has been that
operations on underlying layers shall be performed with the
privileges of the mounting task.

This way an unprvileged user cannot gain privileges by the act of
mounting an overlayfs instance. A full audit of all function calls
made by the overlayfs code has been performed to see whether they
conform to this model, and this branch contains some fixes in this
regard.

- Support running on copied filesystem images by optionally disabling
UUID verification.

- Bug fixes as well as documentation updates.

* tag 'ovl-update-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: unprivieged mounts
ovl: do not get metacopy for userxattr
ovl: do not fail because of O_NOATIME
ovl: do not fail when setting origin xattr
ovl: user xattr
ovl: simplify file splice
ovl: make ioctl() safe
ovl: check privs before decoding file handle
vfs: verify source area in vfs_dedupe_file_range_one()
vfs: move cap_convert_nscap() call into vfs_setxattr()
ovl: fix incorrect extent info in metacopy case
ovl: expand warning in ovl_d_real()
ovl: document lower modification caveats
ovl: warn about orphan metacopy
ovl: doc clarification
ovl: introduce new "uuid=off" option for inodes index feature
ovl: propagate ovl_fs to ovl_decode_real_fh and ovl_encode_real_fh


8bda68d6 16-Dec-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'Smack-for-5.11' of git://github.com/cschaufler/smack-next

Pull smack updates from Casey Schaufler:
"There are no functional changes. Just one minor code clean-up and a
set of corrections in function header comments"

* tag 'Smack-for-5.11' of git://github.com/cschaufler/smack-next:
security/smack: remove unused varible 'rc'
Smack: fix kernel-doc interface on functions


e20a9b92 16-Dec-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'integrity-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity subsystem updates from Mimi Zohar:
"Just three patches here. Other integrity changes are being upstreamed
via EFI (defines a common EFI secure and trusted boot IMA policy) and
BPF LSM (exporting the IMA file cache hash info based on inode).

The three patches included here:

- bug fix: fail calculating the file hash, when a file not opened for
read and the attempt to re-open it for read fails.

- defer processing the "ima_appraise" boot command line option to
avoid enabling different modes (e.g. fix, log) to when the secure
boot flag is available on arm.

- defines "ima-buf" as the default IMA buffer measurement template in
preparation for the builtin integrity "critical data" policy"

* tag 'integrity-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: Don't modify file descriptor mode on the fly
ima: select ima-buf template for buffer measurement
ima: defer arch_ima_get_secureboot() call to IMA init time


ca5b877b 16-Dec-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20201214' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:
"While we have a small number of SELinux patches for v5.11, there are a
few changes worth highlighting:

- Change the LSM network hooks to pass flowi_common structs instead
of the parent flowi struct as the LSMs do not currently need the
full flowi struct and they do not have enough information to use it
safely (missing information on the address family).

This patch was discussed both with Herbert Xu (representing team
netdev) and James Morris (representing team
LSMs-other-than-SELinux).

- Fix how we handle errors in inode_doinit_with_dentry() so that we
attempt to properly label the inode on following lookups instead of
continuing to treat it as unlabeled.

- Tweak the kernel logic around allowx, auditallowx, and dontauditx
SELinux policy statements such that the auditx/dontauditx are
effective even without the allowx statement.

Everything passes our test suite"

* tag 'selinux-pr-20201214' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
lsm,selinux: pass flowi_common instead of flowi to the LSM hooks
selinux: Fix fall-through warnings for Clang
selinux: drop super_block backpointer from superblock_security_struct
selinux: fix inode_doinit_with_dentry() LABEL_INVALID error handling
selinux: allow dontauditx and auditallowx rules to take effect without allowx
selinux: fix error initialization in inode_doinit_with_dentry()


3d5de2dd 16-Dec-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'audit-pr-20201214' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
"A small set of audit patches for v5.11 with four patches in total and
only one of any real significance.

Richard's patch to trigger accompanying records causes the kernel to
emit additional related records when an audit event occurs; helping
provide some much needed context to events in the audit log. It is
also worth mentioning that this is a revised patch based on an earlier
attempt that had to be reverted in the v5.8 time frame.

Everything passes our test suite, and with no problems reported please
merge this for v5.11"

* tag 'audit-pr-20201214' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: replace atomic_add_return()
audit: fix macros warnings
audit: trigger accompanying records when no rules present
audit: fix a kernel-doc markup


9801ca27 15-Dec-2020 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

apparmor: remove duplicate macro list_entry_is_head()

Strangely I hadn't had noticed the existence of the list_entry_is_head()
in apparmor code when added the same one in the list.h. Luckily it's
fully identical and didn't break builds. In any case we don't need a
duplicate anymore, thus remove it from apparmor code.

Link: https://lkml.kernel.org/r/20201208100639.88182-1-andriy.shevchenko@linux.intel.com
Fixes: e130816164e244 ("include/linux/list.h: add a macro to test if entry is pointing to the head")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E . Hallyn " <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d635a69d 15-Dec-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
"Core:

- support "prefer busy polling" NAPI operation mode, where we defer
softirq for some time expecting applications to periodically busy
poll

- AF_XDP: improve efficiency by more batching and hindering the
adjacency cache prefetcher

- af_packet: make packet_fanout.arr size configurable up to 64K

- tcp: optimize TCP zero copy receive in presence of partial or
unaligned reads making zero copy a performance win for much smaller
messages

- XDP: add bulk APIs for returning / freeing frames

- sched: support fragmenting IP packets as they come out of conntrack

- net: allow virtual netdevs to forward UDP L4 and fraglist GSO skbs

BPF:

- BPF switch from crude rlimit-based to memcg-based memory accounting

- BPF type format information for kernel modules and related tracing
enhancements

- BPF implement task local storage for BPF LSM

- allow the FENTRY/FEXIT/RAW_TP tracing programs to use
bpf_sk_storage

Protocols:

- mptcp: improve multiple xmit streams support, memory accounting and
many smaller improvements

- TLS: support CHACHA20-POLY1305 cipher

- seg6: add support for SRv6 End.DT4/DT6 behavior

- sctp: Implement RFC 6951: UDP Encapsulation of SCTP

- ppp_generic: add ability to bridge channels directly

- bridge: Connectivity Fault Management (CFM) support as is defined
in IEEE 802.1Q section 12.14.

Drivers:

- mlx5: make use of the new auxiliary bus to organize the driver
internals

- mlx5: more accurate port TX timestamping support

- mlxsw:
- improve the efficiency of offloaded next hop updates by using
the new nexthop object API
- support blackhole nexthops
- support IEEE 802.1ad (Q-in-Q) bridging

- rtw88: major bluetooth co-existance improvements

- iwlwifi: support new 6 GHz frequency band

- ath11k: Fast Initial Link Setup (FILS)

- mt7915: dual band concurrent (DBDC) support

- net: ipa: add basic support for IPA v4.5

Refactor:

- a few pieces of in_interrupt() cleanup work from Sebastian Andrzej
Siewior

- phy: add support for shared interrupts; get rid of multiple driver
APIs and have the drivers write a full IRQ handler, slight growth
of driver code should be compensated by the simpler API which also
allows shared IRQs

- add common code for handling netdev per-cpu counters

- move TX packet re-allocation from Ethernet switch tag drivers to a
central place

- improve efficiency and rename nla_strlcpy

- number of W=1 warning cleanups as we now catch those in a patchwork
build bot

Old code removal:

- wan: delete the DLCI / SDLA drivers

- wimax: move to staging

- wifi: remove old WDS wifi bridging support"

* tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1922 commits)
net: hns3: fix expression that is currently always true
net: fix proc_fs init handling in af_packet and tls
nfc: pn533: convert comma to semicolon
af_vsock: Assign the vsock transport considering the vsock address flags
af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path
vsock_addr: Check for supported flag values
vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag
vm_sockets: Add flags field in the vsock address data structure
net: Disable NETIF_F_HW_TLS_TX when HW_CSUM is disabled
tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit
net: mscc: ocelot: install MAC addresses in .ndo_set_rx_mode from process context
nfc: s3fwrn5: Release the nfc firmware
net: vxget: clean up sparse warnings
mlxsw: spectrum_router: Use eXtended mezzanine to offload IPv4 router
mlxsw: spectrum: Set KVH XLT cache mode for Spectrum2/3
mlxsw: spectrum_router_xm: Introduce basic XM cache flushing
mlxsw: reg: Add Router LPM Cache Enable Register
mlxsw: reg: Add Router LPM Cache ML Delete Register
mlxsw: spectrum_router_xm: Implement L-value tracking for M-index
mlxsw: reg: Add XM Router M Table Register
...


9e4b0d55 14-Dec-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
"API:
- Add speed testing on 1420-byte blocks for networking

Algorithms:
- Improve performance of chacha on ARM for network packets
- Improve performance of aegis128 on ARM for network packets

Drivers:
- Add support for Keem Bay OCS AES/SM4
- Add support for QAT 4xxx devices
- Enable crypto-engine retry mechanism in caam
- Enable support for crypto engine on sdm845 in qce
- Add HiSilicon PRNG driver support"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (161 commits)
crypto: qat - add capability detection logic in qat_4xxx
crypto: qat - add AES-XTS support for QAT GEN4 devices
crypto: qat - add AES-CTR support for QAT GEN4 devices
crypto: atmel-i2c - select CONFIG_BITREVERSE
crypto: hisilicon/trng - replace atomic_add_return()
crypto: keembay - Add support for Keem Bay OCS AES/SM4
dt-bindings: Add Keem Bay OCS AES bindings
crypto: aegis128 - avoid spurious references crypto_aegis128_update_simd
crypto: seed - remove trailing semicolon in macro definition
crypto: x86/poly1305 - Use TEST %reg,%reg instead of CMP $0,%reg
crypto: x86/sha512 - Use TEST %reg,%reg instead of CMP $0,%reg
crypto: aesni - Use TEST %reg,%reg instead of CMP $0,%reg
crypto: cpt - Fix sparse warnings in cptpf
hwrng: ks-sa - Add dependency on IOMEM and OF
crypto: lib/blake2s - Move selftest prototype into header file
crypto: arm/aes-ce - work around Cortex-A57/A72 silion errata
crypto: ecdh - avoid unaligned accesses in ecdh_set_secret()
crypto: ccree - rework cache parameters handling
crypto: cavium - Use dma_set_mask_and_coherent to simplify code
crypto: marvell/octeontx - Use dma_set_mask_and_coherent to simplify code
...


da062855 14-Dec-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'tomoyo-pr-20201214' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1

Pull tomoyo updates from Tetsuo Handa:
"Limit recursion depth, fix clang warning, fix comment typo, and
silence memory allocation failure warning"

* tag 'tomoyo-pr-20201214' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
tomoyo: Fix typo in comments.
tomoyo: Fix null pointer check
tomoyo: Limit wildcard recursion depth.
tomoyo: fix clang pointer arithmetic warning
tomoyo: Loosen pathname/domainname validation.


7c03e2cd 14-Dec-2020 Miklos Szeredi <mszeredi@redhat.com>

vfs: move cap_convert_nscap() call into vfs_setxattr()

cap_convert_nscap() does permission checking as well as conversion of the
xattr value conditionally based on fs's user-ns.

This is needed by overlayfs and probably other layered fs (ecryptfs) and is
what vfs_foo() is supposed to do anyway.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>

e2437ac2 12-Dec-2020 Jakub Kicinski <kuba@kernel.org>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2020-12-12

Just one patch this time:

1) Redact the SA keys with kernel lockdown confidentiality.
If enabled, no secret keys are sent to uuserspace.
From Antony Antony.

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: redact SA secret with lockdown confidentiality
====================

Link: https://lore.kernel.org/r/20201212085737.2101294-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


15269fb1 05-Dec-2020 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

tomoyo: Fix typo in comments.

Spotted by developers and codespell program.

Co-developed-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Co-developed-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

a1dd1d86 04-Dec-2020 Jakub Kicinski <kuba@kernel.org>

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-12-03

The main changes are:

1) Support BTF in kernel modules, from Andrii.

2) Introduce preferred busy-polling, from Björn.

3) bpf_ima_inode_hash() and bpf_bprm_opts_set() helpers, from KP Singh.

4) Memcg-based memory accounting for bpf objects, from Roman.

5) Allow bpf_{s,g}etsockopt from cgroup bind{4,6} hooks, from Stanislav.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (118 commits)
selftests/bpf: Fix invalid use of strncat in test_sockmap
libbpf: Use memcpy instead of strncpy to please GCC
selftests/bpf: Add fentry/fexit/fmod_ret selftest for kernel module
selftests/bpf: Add tp_btf CO-RE reloc test for modules
libbpf: Support attachment of BPF tracing programs to kernel modules
libbpf: Factor out low-level BPF program loading helper
bpf: Allow to specify kernel module BTFs when attaching BPF programs
bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier
selftests/bpf: Add CO-RE relocs selftest relying on kernel module BTF
selftests/bpf: Add support for marking sub-tests as skipped
selftests/bpf: Add bpf_testmod kernel module for testing
libbpf: Add kernel module BTF support for CO-RE relocations
libbpf: Refactor CO-RE relocs to not assume a single BTF object
libbpf: Add internal helper to load BTF data by FD
bpf: Keep module's btf_data_size intact after load
bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address()
selftests/bpf: Add Userspace tests for TCP_WINDOW_CLAMP
bpf: Adds support for setting window clamp
samples/bpf: Fix spelling mistake "recieving" -> "receiving"
bpf: Fix cold build of test_progs-no_alu32
...
====================

Link: https://lore.kernel.org/r/20201204021936.85653-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


41dd9596 30-Nov-2020 Florian Westphal <fw@strlen.de>

security: add const qualifier to struct sock in various places

A followup change to tcp_request_sock_op would have to drop the 'const'
qualifier from the 'route_req' function as the
'security_inet_conn_request' call is moved there - and that function
expects a 'struct sock *'.

However, it turns out its also possible to add a const qualifier to
security_inet_conn_request instead.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

207cdd56 26-Nov-2020 Roberto Sassu <roberto.sassu@huawei.com>

ima: Don't modify file descriptor mode on the fly

Commit a408e4a86b36b ("ima: open a new file instance if no read
permissions") already introduced a second open to measure a file when the
original file descriptor does not allow it. However, it didn't remove the
existing method of changing the mode of the original file descriptor, which
is still necessary if the current process does not have enough privileges
to open a new one.

Changing the mode isn't really an option, as the filesystem might need to
do preliminary steps to make the read possible. Thus, this patch removes
the code and keeps the second open as the only option to measure a file
when it is unreadable with the original file descriptor.

Cc: <stable@vger.kernel.org> # 4.20.x: 0014cc04e8ec0 ima: Set file->f_mode
Fixes: 2fe5d6def1672 ("ima: integrity appraisal extension")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

1b6b924e 26-Nov-2020 Zheng Zengkai <zhengzengkai@huawei.com>

tomoyo: Fix null pointer check

Since tomoyo_memory_ok() will check for null pointer returned by
kzalloc() in tomoyo_assign_profile(), tomoyo_assign_namespace(),
tomoyo_get_name() and tomoyo_commit_ok(), then emit OOM warnings
if needed. And this is the expected behavior as informed by
Tetsuo Handa.

Let's add __GFP_NOWARN to kzalloc() in those related functions.
Besides, to achieve this goal, remove the null check for entry
right after kzalloc() in tomoyo_assign_namespace().

Reported-by: Hulk Robot <hulkci@huawei.com>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

c7a5899e 17-Nov-2020 Antony Antony <antony.antony@secunet.com>

xfrm: redact SA secret with lockdown confidentiality

redact XFRM SA secret in the netlink response to xfrm_get_sa()
or dumpall sa.
Enable lockdown, confidentiality mode, at boot or at run time.

e.g. when enabled:
cat /sys/kernel/security/lockdown
none integrity [confidentiality]

ip xfrm state
src 172.16.1.200 dst 172.16.1.100
proto esp spi 0x00000002 reqid 2 mode tunnel
replay-window 0
aead rfc4106(gcm(aes)) 0x0000000000000000000000000000000000000000 96

note: the aead secret is redacted.
Redacting secret is also a FIPS 140-2 requirement.

v1->v2
- add size checks before memset calls
v2->v3
- replace spaces with tabs for consistency
v3->v4
- use kernel lockdown instead of a /proc setting
v4->v5
- remove kconfig option

Reviewed-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

403319be 24-Nov-2020 KP Singh <kpsingh@google.com>

ima: Implement ima_inode_hash

This is in preparation to add a helper for BPF LSM programs to use
IMA hashes when attached to LSM hooks. There are LSM hooks like
inode_unlink which do not have a struct file * argument and cannot
use the existing ima_file_hash API.

An inode based API is, therefore, useful in LSM based detections like an
executable trying to delete itself which rely on the inode_unlink LSM
hook.

Moreover, the ima_file_hash function does nothing with the struct file
pointer apart from calling file_inode on it and converting it to an
inode.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20201124151210.1081188-2-kpsingh@chromium.org

3df98d79 27-Sep-2020 Paul Moore <paul@paul-moore.com>

lsm,selinux: pass flowi_common instead of flowi to the LSM hooks

As pointed out by Herbert in a recent related patch, the LSM hooks do
not have the necessary address family information to use the flowi
struct safely. As none of the LSMs currently use any of the protocol
specific flowi information, replace the flowi pointers with pointers
to the address family independent flowi_common struct.

Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

b2d99bcb 19-Nov-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

selinux: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of letting the code fall
through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>

8eb62169 16-Sep-2020 David Howells <dhowells@redhat.com>

keys: Provide the original description to the key preparser

Provide the proposed description (add key) or the original description
(update/instantiate key) when preparsing a key so that the key type can
validate it against the data.

This is important for rxrpc server keys as we need to check that they have
the right amount of key material present - and it's better to do that when
the key is loaded rather than deep in trying to process a response packet.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
cc: keyrings@vger.kernel.org

dea87d08 12-Nov-2020 Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

ima: select ima-buf template for buffer measurement

The default IMA template used for all policy rules is the value set
for CONFIG_IMA_DEFAULT_TEMPLATE if the policy rule does not specify
a template. The default IMA template for buffer measurements should be
'ima-buf' - so that the measured buffer is correctly included in the IMA
measurement log entry.

With the default template format, buffer measurements are added to
the measurement list, but do not include the buffer data, making it
difficult, if not impossible, to validate. Including 'ima-buf'
template records in the measurement list by default, should not impact
existing attestation servers without 'ima-buf' template support.

Initialize a global 'ima-buf' template and select that template,
by default, for buffer measurements.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

a24d22b2 12-Nov-2020 Eric Biggers <ebiggers@google.com>

crypto: sha - split sha.h into sha1.h and sha2.h

Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2,
and <crypto/sha3.h> contains declarations for SHA-3.

This organization is inconsistent, but more importantly SHA-1 is no
longer considered to be cryptographically secure. So to the extent
possible, SHA-1 shouldn't be grouped together with any of the other SHA
versions, and usage of it should be phased out.

Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and
<crypto/sha2.h>, and make everyone explicitly specify whether they want
the declarations for SHA-1, SHA-2, or both.

This avoids making the SHA-1 declarations visible to files that don't
want anything to do with SHA-1. It also prepares for potentially moving
sha1.h into a new insecure/ or dangerous/ directory.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

56495a24 19-Nov-2020 Jakub Kicinski <kuba@kernel.org>

Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Signed-off-by: Jakub Kicinski <kuba@kernel.org>


9b0072e2 07-Nov-2020 Alex Shi <alexs@kernel.org>

security/smack: remove unused varible 'rc'

This varible isn't used and can be removed to avoid a gcc warning:
security/smack/smack_lsm.c:3873:6: warning: variable ‘rc’ set but not
used [-Wunused-but-set-variable]

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: linux-security-module@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

30636a59 14-Nov-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20201113' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux fix from Paul Moore:
"One small SELinux patch to make sure we return an error code when an
allocation fails. It passes all of our tests, but given the nature of
the patch that isn't surprising"

* tag 'selinux-pr-20201113' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: Fix error return code in sel_ib_pkey_sid_slow()


07cbce2e 14-Nov-2020 Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2020-11-14

1) Add BTF generation for kernel modules and extend BTF infra in kernel
e.g. support for split BTF loading and validation, from Andrii Nakryiko.

2) Support for pointers beyond pkt_end to recognize LLVM generated patterns
on inlined branch conditions, from Alexei Starovoitov.

3) Implements bpf_local_storage for task_struct for BPF LSM, from KP Singh.

4) Enable FENTRY/FEXIT/RAW_TP tracing program to use the bpf_sk_storage
infra, from Martin KaFai Lau.

5) Add XDP bulk APIs that introduce a defer/flush mechanism to optimize the
XDP_REDIRECT path, from Lorenzo Bianconi.

6) Fix a potential (although rather theoretical) deadlock of hashtab in NMI
context, from Song Liu.

7) Fixes for cross and out-of-tree build of bpftool and runqslower allowing build
for different target archs on same source tree, from Jean-Philippe Brucker.

8) Fix error path in htab_map_alloc() triggered from syzbot, from Eric Dumazet.

9) Move functionality from test_tcpbpf_user into the test_progs framework so it
can run in BPF CI, from Alexander Duyck.

10) Lift hashtab key_size limit to be larger than MAX_BPF_STACK, from Florian Lehner.

Note that for the fix from Song we have seen a sparse report on context
imbalance which requires changes in sparse itself for proper annotation
detection where this is currently being discussed on linux-sparse among
developers [0]. Once we have more clarification/guidance after their fix,
Song will follow-up.

[0] https://lore.kernel.org/linux-sparse/CAHk-=wh4bx8A8dHnX612MsDO13st6uzAz1mJ1PaHHVevJx_ZCw@mail.gmail.com/T/
https://lore.kernel.org/linux-sparse/20201109221345.uklbp3lzgq6g42zb@ltop.local/T/

* git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (66 commits)
net: mlx5: Add xdp tx return bulking support
net: mvpp2: Add xdp tx return bulking support
net: mvneta: Add xdp tx return bulking support
net: page_pool: Add bulk support for ptr_ring
net: xdp: Introduce bulking for xdp tx return path
bpf: Expose bpf_d_path helper to sleepable LSM hooks
bpf: Augment the set of sleepable LSM hooks
bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP
bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP
bpf: Rename some functions in bpf_sk_storage
bpf: Folding omem_charge() into sk_storage_charge()
selftests/bpf: Add asm tests for pkt vs pkt_end comparison.
selftests/bpf: Add skb_pkt_end test
bpf: Support for pointers beyond pkt_end.
tools/bpf: Always run the *-clean recipes
tools/bpf: Add bootstrap/ to .gitignore
bpf: Fix NULL dereference in bpf_task_storage
tools/bpftool: Fix build slowdown
tools/runqslower: Build bpftool using HOSTCC
tools/runqslower: Enable out-of-tree build
...
====================

Link: https://lore.kernel.org/r/20201114020819.29584-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


7da31b85 13-Nov-2020 Alex Shi <alexs@kernel.org>

Smack: fix kernel-doc interface on functions

The are some kernel-doc interface issues:
security/smack/smackfs.c:1950: warning: Function parameter or member
'list' not described in 'smk_parse_label_list'
security/smack/smackfs.c:1950: warning: Excess function parameter
'private' description in 'smk_parse_label_list'
security/smack/smackfs.c:1979: warning: Function parameter or member
'list' not described in 'smk_destroy_label_list'
security/smack/smackfs.c:1979: warning: Excess function parameter 'head'
description in 'smk_destroy_label_list'
security/smack/smackfs.c:2141: warning: Function parameter or member
'count' not described in 'smk_read_logging'
security/smack/smackfs.c:2141: warning: Excess function parameter 'cn'
description in 'smk_read_logging'
security/smack/smackfs.c:2278: warning: Function parameter or member
'format' not described in 'smk_user_access'

Correct them in this patch.

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: linux-security-module@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

c350f8be 12-Nov-2020 Chen Zhou <chenzhou10@huawei.com>

selinux: Fix error return code in sel_ib_pkey_sid_slow()

Fix to return a negative error code from the error handling case
instead of 0 in function sel_ib_pkey_sid_slow(), as done elsewhere
in this function.

Cc: stable@vger.kernel.org
Fixes: 409dcf31538a ("selinux: Add a cache for quicker retreival of PKey SIDs")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

b159e86b 04-Nov-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: drop super_block backpointer from superblock_security_struct

It appears to have been needed for selinux_complete_init() in the past,
but today it's useless.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

4cf1bc1f 06-Nov-2020 KP Singh <kpsingh@google.com>

bpf: Implement task local storage

Similar to bpf_local_storage for sockets and inodes add local storage
for task_struct.

The life-cycle of storage is managed with the life-cycle of the
task_struct. i.e. the storage is destroyed along with the owning task
with a callback to the bpf_task_storage_free from the task_free LSM
hook.

The BPF LSM allocates an __rcu pointer to the bpf_local_storage in
the security blob which are now stackable and can co-exist with other
LSMs.

The userspace map operations can be done by using a pid fd as a key
passed to the lookup, update and delete operations.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201106103747.2780972-3-kpsingh@chromium.org

25519d68 30-Oct-2020 Chester Lin <clin@suse.com>

ima: generalize x86/EFI arch glue for other EFI architectures

Move the x86 IMA arch code into security/integrity/ima/ima_efi.c,
so that we will be able to wire it up for arm64 in a future patch.

Co-developed-by: Chester Lin <clin@suse.com>
Signed-off-by: Chester Lin <clin@suse.com>
Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

200ea5a2 03-Nov-2020 Paul Moore <paul@paul-moore.com>

selinux: fix inode_doinit_with_dentry() LABEL_INVALID error handling

A previous fix, commit 83370b31a915 ("selinux: fix error initialization
in inode_doinit_with_dentry()"), changed how failures were handled
before a SELinux policy was loaded. Unfortunately that patch was
potentially problematic for two reasons: it set the isec->initialized
state without holding a lock, and it didn't set the inode's SELinux
label to the "default" for the particular filesystem. The later can
be a problem if/when a later attempt to revalidate the inode fails
and SELinux reverts to the existing inode label.

This patch should restore the default inode labeling that existed
before the original fix, without affecting the LABEL_INVALID marking
such that revalidation will still be attempted in the future.

Fixes: 83370b31a915 ("selinux: fix error initialization in inode_doinit_with_dentry()")
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

e991a40b 02-Nov-2020 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

tomoyo: Limit wildcard recursion depth.

Since wildcards that need recursion consume kernel stack memory (or might
cause CPU stall warning problem), we cannot allow infinite recursion.

Since TOMOYO 1.8 survived with 20 recursions limit for 5 years, nobody
would complain if applying this limit to TOMOYO 2.6.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

b000d5cb 13-Oct-2020 Ard Biesheuvel <ardb@kernel.org>

ima: defer arch_ima_get_secureboot() call to IMA init time

Chester reports that it is necessary to introduce a new way to pass
the EFI secure boot status between the EFI stub and the core kernel
on ARM systems. The usual way of obtaining this information is by
checking the SecureBoot and SetupMode EFI variables, but this can
only be done after the EFI variable workqueue is created, which
occurs in a subsys_initcall(), whereas arch_ima_get_secureboot()
is called much earlier by the IMA framework.

However, the IMA framework itself is started as a late_initcall,
and the only reason the call to arch_ima_get_secureboot() occurs
so early is because it happens in the context of a __setup()
callback that parses the ima_appraise= command line parameter.

So let's refactor this code a little bit, by using a core_param()
callback to capture the command line argument, and deferring any
reasoning based on its contents to the IMA init routine.

Cc: Chester Lin <clin@suse.com>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Link: https://lore.kernel.org/linux-arm-kernel/20200904072905.25332-2-clin@suse.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reported-by: kernel test robot <lkp@intel.com> [missing core_param()]
[zohar@linux.ibm.com: included linux/module.h]
Tested-by: Chester Lin <clin@suse.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

4739eeaf 31-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

ima: Replace zero-length array with flexible-array member

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9-rc1/process/deprecated.html#zero-length-and-one-element-arrays

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

d9594e04 28-Oct-2020 Arnd Bergmann <arnd@arndb.de>

tomoyo: fix clang pointer arithmetic warning

clang warns about additions on NULL pointers being undefined in C:

security/tomoyo/securityfs_if.c:226:59: warning: arithmetic on a null pointer treated as a cast from integer to pointer is a GNU extension [-Wnull-pointer-arithmetic]
securityfs_create_file(name, mode, parent, ((u8 *) NULL) + key,

Change the code to instead use a cast through uintptr_t to avoid
the warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

44141f58 09-Oct-2020 bauen1 <j2468h@googlemail.com>

selinux: allow dontauditx and auditallowx rules to take effect without allowx

This allows for dontauditing very specific ioctls e.g. TCGETS without
dontauditing every ioctl or granting additional permissions.

Now either an allowx, dontauditx or auditallowx rules enables checking
for extended permissions.

Signed-off-by: Jonathan Hettwer <j2468h@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

83370b31 08-Oct-2020 Tianyue Ren <rentianyue@kylinos.cn>

selinux: fix error initialization in inode_doinit_with_dentry()

Mark the inode security label as invalid if we cannot find
a dentry so that we will retry later rather than marking it
initialized with the unlabeled SID.

Fixes: 9287aed2ad1f ("selinux: Convert isec->lock into a spinlock")
Signed-off-by: Tianyue Ren <rentianyue@kylinos.cn>
[PM: minor comment tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>

6d915476 22-Sep-2020 Richard Guy Briggs <rgb@redhat.com>

audit: trigger accompanying records when no rules present

When there are no audit rules registered, mandatory records (config,
etc.) are missing their accompanying records (syscall, proctitle, etc.).

This is due to audit context dummy set on syscall entry based on absence
of rules that signals that no other records are to be printed. Clear the dummy
bit if any record is generated, open coding this in audit_log_start().

The proctitle context and dummy checks are pointless since the
proctitle record will not be printed if no syscall records are printed.

The fds array is reset to -1 after the first syscall to indicate it
isn't valid any more, but was never set to -1 when the context was
allocated to indicate it wasn't yet valid.

Check ctx->pwd in audit_log_name().

The audit_inode* functions can be called without going through
getname_flags() or getname_kernel() that sets audit_names and cwd, so
set the cwd in audit_alloc_name() if it has not already been done so due to
audit_names being valid and purge all other audit_getcwd() calls.

Revert the LSM dump_common_audit_data() LSM_AUDIT_DATA_* cases from the
ghak96 patch since they are no longer necessary due to cwd coverage in
audit_alloc_name().

Thanks to bauen1 <j2468h@googlemail.com> for reporting LSM situations in
which context->cwd is not valid, inadvertantly fixed by the ghak96 patch.

Please see upstream github issue
https://github.com/linux-audit/audit-kernel/issues/120
This is also related to upstream github issue
https://github.com/linux-audit/audit-kernel/issues/96

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

81ecf91e 25-Oct-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'safesetid-5.10' of git://github.com/micah-morton/linux

Pull SafeSetID updates from Micah Morton:
"The changes are mostly contained to within the SafeSetID LSM, with the
exception of a few 1-line changes to change some ns_capable() calls to
ns_capable_setid() -- causing a flag (CAP_OPT_INSETID) to be set that
is examined by SafeSetID code and nothing else in the kernel.

The changes to SafeSetID internally allow for setting up GID
transition security policies, as already existed for UIDs"

* tag 'safesetid-5.10' of git://github.com/micah-morton/linux:
LSM: SafeSetID: Fix warnings reported by test bot
LSM: SafeSetID: Add GID security policy handling
LSM: Signal to SafeSetID when setting group IDs


91989c70 16-Oct-2020 Jens Axboe <axboe@kernel.dk>

task_work: cleanup notification modes

A previous commit changed the notification mode from true/false to an
int, allowing notify-no, notify-yes, or signal-notify. This was
backwards compatible in the sense that any existing true/false user
would translate to either 0 (on notification sent) or 1, the latter
which mapped to TWA_RESUME. TWA_SIGNAL was assigned a value of 2.

Clean this up properly, and define a proper enum for the notification
mode. Now we have:

- TWA_NONE. This is 0, same as before the original change, meaning no
notification requested.
- TWA_RESUME. This is 1, same as before the original change, meaning
that we use TIF_NOTIFY_RESUME.
- TWA_SIGNAL. This uses TIF_SIGPENDING/JOBCTL_TASK_WORK for the
notification.

Clean up all the callers, switching their 0/1/false/true to using the
appropriate TWA_* mode for notifications.

Fixes: e91b48162332 ("task_work: teach task_work_add() to do signal_wake_up()")
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9ff9b0d3 15-Oct-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:

- Add redirect_neigh() BPF packet redirect helper, allowing to limit
stack traversal in common container configs and improving TCP
back-pressure.

Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

- Expand netlink policy support and improve policy export to user
space. (Ge)netlink core performs request validation according to
declared policies. Expand the expressiveness of those policies
(min/max length and bitmasks). Allow dumping policies for particular
commands. This is used for feature discovery by user space (instead
of kernel version parsing or trial and error).

- Support IGMPv3/MLDv2 multicast listener discovery protocols in
bridge.

- Allow more than 255 IPv4 multicast interfaces.

- Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.

- In Multi-patch TCP (MPTCP) support concurrent transmission of data on
multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.

- Support SMC-Dv2 version of SMC, which enables multi-subnet
deployments.

- Allow more calls to same peer in RxRPC.

- Support two new Controller Area Network (CAN) protocols - CAN-FD and
ISO 15765-2:2016.

- Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.

- Add TC actions for implementing MPLS L2 VPNs.

- Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary
notifications and make it easier to offload nexthops into HW by
converting to a blocking notifier.

- Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific TCP
option use.

- Reorganize TCP congestion control (CC) initialization to simplify
life of TCP CC implemented in BPF.

- Add support for shipping BPF programs with the kernel and loading
them early on boot via the User Mode Driver mechanism, hence reusing
all the user space infra we have.

- Support sleepable BPF programs, initially targeting LSM and tracing.

- Add bpf_d_path() helper for returning full path for given 'struct
path'.

- Make bpf_tail_call compatible with bpf-to-bpf calls.

- Allow BPF programs to call map_update_elem on sockmaps.

- Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).

- Support listing and getting information about bpf_links via the bpf
syscall.

- Enhance kernel interfaces around NIC firmware update. Allow
specifying overwrite mask to control if settings etc. are reset
during update; report expected max time operation may take to users;
support firmware activation without machine reboot incl. limits of
how much impact reset may have (e.g. dropping link or not).

- Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.

- Adopt or extend devlink use for debug, monitoring, fw update in many
drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
dpaa2-eth).

- In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.

- Add XDP support for Intel's igb driver.

- Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.

- Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.

- Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.

- Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.

- Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.

- Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.

- Improve performance for packets which don't require much offloads on
recent Mellanox NICs by 20% by making multiple packets share a
descriptor entry.

- Move chelsio inline crypto drivers (for TLS and IPsec) from the
crypto subtree to drivers/net. Move MDIO drivers out of the phy
directory.

- Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.

- Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).

* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
net, sockmap: Don't call bpf_prog_put() on NULL pointer
bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
bpf, sockmap: Add locking annotations to iterator
netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
net: fix pos incrementment in ipv6_route_seq_next
net/smc: fix invalid return code in smcd_new_buf_create()
net/smc: fix valid DMBE buffer sizes
net/smc: fix use-after-free of delayed events
bpfilter: Fix build error with CONFIG_BPFILTER_UMH
cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
bpf: Fix register equivalence tracking.
rxrpc: Fix loss of final ack on shutdown
rxrpc: Fix bundle counting for exclusive connections
netfilter: restore NF_INET_NUMHOOKS
ibmveth: Identify ingress large send packets.
ibmveth: Switch order of ibmveth_helper calls.
cxgb4: handle 4-tuple PEDIT to NAT mode translation
selftests: Add VRF route leaking tests
...


840e5bb3 15-Oct-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'integrity-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity updates from Mimi Zohar:
"Continuing IMA policy rule cleanup and validation in particular for
measuring keys, adding/removing/updating informational and error
messages (e.g. "ima_appraise" boot command line option), and other bug
fixes (e.g. minimal data size validation before use, return code and
NULL pointer checking)"

* tag 'integrity-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: Fix NULL pointer dereference in ima_file_hash
evm: Check size of security.evm before using it
ima: Remove semicolon at the end of ima_get_binary_runtime_size()
ima: Don't ignore errors from crypto_shash_update()
ima: Use kmemdup rather than kmalloc+memcpy
integrity: include keyring name for unknown key request
ima: limit secure boot feedback scope for appraise
integrity: invalid kernel parameters feedback
ima: add check for enforced appraise option
integrity: Use current_uid() in integrity_audit_message()
ima: Fail rule parsing when asymmetric key measurement isn't supportable
ima: Pre-parse the list of keyrings in a KEY_CHECK rule


726eb70e 15-Oct-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'char-misc-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
"Here is the big set of char, misc, and other assorted driver subsystem
patches for 5.10-rc1.

There's a lot of different things in here, all over the drivers/
directory. Some summaries:

- soundwire driver updates

- habanalabs driver updates

- extcon driver updates

- nitro_enclaves new driver

- fsl-mc driver and core updates

- mhi core and bus updates

- nvmem driver updates

- eeprom driver updates

- binder driver updates and fixes

- vbox minor bugfixes

- fsi driver updates

- w1 driver updates

- coresight driver updates

- interconnect driver updates

- misc driver updates

- other minor driver updates

All of these have been in linux-next for a while with no reported
issues"

* tag 'char-misc-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (396 commits)
binder: fix UAF when releasing todo list
docs: w1: w1_therm: Fix broken xref, mistakes, clarify text
misc: Kconfig: fix a HISI_HIKEY_USB dependency
LSM: Fix type of id parameter in kernel_post_load_data prototype
misc: Kconfig: add a new dependency for HISI_HIKEY_USB
firmware_loader: fix a kernel-doc markup
w1: w1_therm: make w1_poll_completion static
binder: simplify the return expression of binder_mmap
test_firmware: Test partial read support
firmware: Add request_partial_firmware_into_buf()
firmware: Store opt_flags in fw_priv
fs/kernel_file_read: Add "offset" arg for partial reads
IMA: Add support for file reads without contents
LSM: Add "contents" flag to kernel_read_file hook
module: Call security_kernel_post_load_data()
firmware_loader: Use security_post_load_data()
LSM: Introduce kernel_post_load_data() hook
fs/kernel_read_file: Add file_size output argument
fs/kernel_read_file: Switch buffer size arg to size_t
fs/kernel_read_file: Remove redundant size argument
...


7b540812 13-Oct-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20201012' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:
"A decent number of SELinux patches for v5.10, twenty two in total. The
highlights are listed below, but all of the patches pass our test
suite and merge cleanly.

- A number of changes to how the SELinux policy is loaded and managed
inside the kernel with the goal of improving the atomicity of a
SELinux policy load operation.

These changes account for the bulk of the diffstat as well as the
patch count. A special thanks to everyone who contributed patches
and fixes for this work.

- Convert the SELinux policy read-write lock to RCU.

- A tracepoint was added for audited SELinux access control events;
this should help provide a more unified backtrace across kernel and
userspace.

- Allow the removal of security.selinux xattrs when a SELinux policy
is not loaded.

- Enable policy capabilities in SELinux policies created with the
scripts/selinux/mdp tool.

- Provide some "no sooner than" dates for the SELinux checkreqprot
sysfs deprecation"

* tag 'selinux-pr-20201012' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (22 commits)
selinux: provide a "no sooner than" date for the checkreqprot removal
selinux: Add helper functions to get and set checkreqprot
selinux: access policycaps with READ_ONCE/WRITE_ONCE
selinux: simplify away security_policydb_len()
selinux: move policy mutex to selinux_state, use in lockdep checks
selinux: fix error handling bugs in security_load_policy()
selinux: convert policy read-write lock to RCU
selinux: delete repeated words in comments
selinux: add basic filtering for audit trace events
selinux: add tracepoint on audited events
selinux: Create new booleans and class dirs out of tree
selinux: Standardize string literal usage for selinuxfs directory names
selinux: Refactor selinuxfs directory populating functions
selinux: Create function for selinuxfs directory cleanup
selinux: permit removing security.selinux xattr before policy load
selinux: fix memdup.cocci warnings
selinux: avoid dereferencing the policy prior to initialization
selinux: fix allocation failure check on newpolicy->sidtab
selinux: refactor changing booleans
selinux: move policy commit after updating selinuxfs
...


99a6740f 13-Oct-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'Smack-for-5.10' of git://github.com/cschaufler/smack-next

Pull smack updates from Casey Schaufler:
"Two minor fixes and one performance enhancement to Smack. The
performance improvement is significant and the new code is more like
its counterpart in SELinux.

- Two kernel test robot suggested clean-ups.

- Teach Smack to use the IPv4 netlabel cache. This results in a
12-14% improvement on TCP benchmarks"

* tag 'Smack-for-5.10' of git://github.com/cschaufler/smack-next:
Smack: Remove unnecessary variable initialization
Smack: Fix build when NETWORK_SECMARK is not set
Smack: Use the netlabel cache
Smack: Set socket labels only once
Smack: Consolidate uses of secmark into a function


b274279a 13-Oct-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'tomoyo-pr-20201012' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1

Pull tomoyo fix from Tetsuo HandaL
"One patch to make it possible to execute usermode-driver's path"

* tag 'tomoyo-pr-20201012' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
tomoyo: Loosen pathname/domainname validation.


03ca0ec1 11-Aug-2020 Thomas Cedeno <thomascedeno@google.com>

LSM: SafeSetID: Fix warnings reported by test bot

Fix multiple cast-to-union warnings related to casting kuid_t and kgid_t
types to kid_t union type. Also fix incompatible type warning that
arises from accidental omission of "__rcu" qualifier on the struct
setid_ruleset pointer in the argument list for safesetid_file_read().

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Cedeno <thomascedeno@google.com>
Signed-off-by: Micah Morton <mortonm@chromium.org>

5294bac9 16-Jul-2020 Thomas Cedeno <thomascedeno@google.com>

LSM: SafeSetID: Add GID security policy handling

The SafeSetID LSM has functionality for restricting setuid() calls based
on its configured security policies. This patch adds the analogous
functionality for setgid() calls. This is mostly a copy-and-paste change
with some code deduplication, plus slight modifications/name changes to
the policy-rule-related structs (now contain GID rules in addition to
the UID ones) and some type generalization since SafeSetID now needs to
deal with kgid_t and kuid_t types.

Signed-off-by: Thomas Cedeno <thomascedeno@google.com>
Signed-off-by: Micah Morton <mortonm@chromium.org>

39a5101f 13-Oct-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
"API:
- Allow DRBG testing through user-space af_alg
- Add tcrypt speed testing support for keyed hashes
- Add type-safe init/exit hooks for ahash

Algorithms:
- Mark arc4 as obsolete and pending for future removal
- Mark anubis, khazad, sead and tea as obsolete
- Improve boot-time xor benchmark
- Add OSCCA SM2 asymmetric cipher algorithm and use it for integrity

Drivers:
- Fixes and enhancement for XTS in caam
- Add support for XIP8001B hwrng in xiphera-trng
- Add RNG and hash support in sun8i-ce/sun8i-ss
- Allow imx-rngc to be used by kernel entropy pool
- Use crypto engine in omap-sham
- Add support for Ingenic X1830 with ingenic"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (205 commits)
X.509: Fix modular build of public_key_sm2
crypto: xor - Remove unused variable count in do_xor_speed
X.509: fix error return value on the failed path
crypto: bcm - Verify GCM/CCM key length in setkey
crypto: qat - drop input parameter from adf_enable_aer()
crypto: qat - fix function parameters descriptions
crypto: atmel-tdes - use semicolons rather than commas to separate statements
crypto: drivers - use semicolons rather than commas to separate statements
hwrng: mxc-rnga - use semicolons rather than commas to separate statements
hwrng: iproc-rng200 - use semicolons rather than commas to separate statements
hwrng: stm32 - use semicolons rather than commas to separate statements
crypto: xor - use ktime for template benchmarking
crypto: xor - defer load time benchmark to a later time
crypto: hisilicon/zip - fix the uninitalized 'curr_qm_qp_num'
crypto: hisilicon/zip - fix the return value when device is busy
crypto: hisilicon/zip - fix zero length input in GZIP decompress
crypto: hisilicon/zip - fix the uncleared debug registers
lib/mpi: Fix unused variable warnings
crypto: x86/poly1305 - Remove assignments with no effect
hwrng: npcm - modify readl to readb
...


85ed13e7 12-Oct-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull compat iovec cleanups from Al Viro:
"Christoph's series around import_iovec() and compat variant thereof"

* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
security/keys: remove compat_keyctl_instantiate_key_iov
mm: remove compat_process_vm_{readv,writev}
fs: remove compat_sys_vmsplice
fs: remove the compat readv/writev syscalls
fs: remove various compat readv/writev helpers
iov_iter: transparently handle compat iovecs in import_iovec
iov_iter: refactor rw_copy_check_uvector and import_iovec
iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
compat.h: fix a spelling error in <linux/compat.h>


e6412f98 12-Oct-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'efi-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI changes from Ingo Molnar:

- Preliminary RISC-V enablement - the bulk of it will arrive via the
RISCV tree.

- Relax decompressed image placement rules for 32-bit ARM

- Add support for passing MOK certificate table contents via a config
table rather than a EFI variable.

- Add support for 18 bit DIMM row IDs in the CPER records.

- Work around broken Dell firmware that passes the entire Boot####
variable contents as the command line

- Add definition of the EFI_MEMORY_CPU_CRYPTO memory attribute so we
can identify it in the memory map listings.

- Don't abort the boot on arm64 if the EFI RNG protocol is available
but returns with an error

- Replace slashes with exclamation marks in efivarfs file names

- Split efi-pstore from the deprecated efivars sysfs code, so we can
disable the latter on !x86.

- Misc fixes, cleanups and updates.

* tag 'efi-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
efi: mokvar: add missing include of asm/early_ioremap.h
efi: efivars: limit availability to X86 builds
efi: remove some false dependencies on CONFIG_EFI_VARS
efi: gsmi: fix false dependency on CONFIG_EFI_VARS
efi: efivars: un-export efivars_sysfs_init()
efi: pstore: move workqueue handling out of efivars
efi: pstore: disentangle from deprecated efivars module
efi: mokvar-table: fix some issues in new code
efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure
efivarfs: Replace invalid slashes with exclamation marks in dentries.
efi: Delete deprecated parameter comments
efi/libstub: Fix missing-prototypes in string.c
efi: Add definition of EFI_MEMORY_CPU_CRYPTO and ability to report it
cper,edac,efi: Memory Error Record: bank group/address and chip id
edac,ghes,cper: Add Row Extension to Memory Error Record
efi/x86: Add a quirk to support command line arguments on Dell EFI firmware
efi/libstub: Add efi_warn and *_once logging helpers
integrity: Load certs from the EFI MOK config table
integrity: Move import of MokListRT certs to a separate routine
efi: Support for MOK variable config table
...


a2075167 09-Sep-2020 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

tomoyo: Loosen pathname/domainname validation.

Since commit e2dc9bf3f5275ca3 ("umd: Transform fork_usermode_blob into
fork_usermode_driver") started calling execve() on a program written in
a local mount which is not connected to mount tree,
tomoyo_realpath_from_path() started returning a pathname in
"$fsname:/$pathname" format which violates TOMOYO's domainname rule that
it must start with "<$namespace>" followed by zero or more repetitions of
pathnames which start with '/'.

Since $fsname must not contain '.' since commit 79c0b2df79eb56fc ("add
filesystem subtype support"), tomoyo_correct_path() can recognize a token
which appears '/' before '.' appears (e.g. proc:/self/exe ) as a pathname
while rejecting a token which appears '.' before '/' appears (e.g.
exec.realpath="/bin/bash" ) as a condition parameter.

Therefore, accept domainnames which contain pathnames which do not start
with '/' but contain '/' before '.' (e.g. <kernel> tmpfs:/bpfilter_umh ).

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

edd61537 05-Oct-2020 Casey Schaufler <casey@schaufler-ca.com>

Smack: Remove unnecessary variable initialization

The initialization of rc in smack_from_netlbl() is pointless.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

0fa8e084 02-Oct-2020 Kees Cook <keescook@chromium.org>

fs/kernel_file_read: Add "offset" arg for partial reads

To perform partial reads, callers of kernel_read_file*() must have a
non-NULL file_size argument and a preallocated buffer. The new "offset"
argument can then be used to seek to specific locations in the file to
fill the buffer to, at most, "buf_size" per call.

Where possible, the LSM hooks can report whether a full file has been
read or not so that the contents can be reasoned about.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20201002173828.2099543-14-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

34736dae 02-Oct-2020 Scott Branden <scott.branden@broadcom.com>

IMA: Add support for file reads without contents

When the kernel_read_file LSM hook is called with contents=false, IMA
can appraise the file directly, without requiring a filled buffer. When
such a buffer is available, though, IMA can continue to use it instead
of forcing a double read here.

Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Link: https://lore.kernel.org/lkml/20200706232309.12010-10-scott.branden@broadcom.com/
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Link: https://lore.kernel.org/r/20201002173828.2099543-13-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2039bda1 02-Oct-2020 Kees Cook <keescook@chromium.org>

LSM: Add "contents" flag to kernel_read_file hook

As with the kernel_load_data LSM hook, add a "contents" flag to the
kernel_read_file LSM hook that indicates whether the LSM can expect
a matching call to the kernel_post_read_file LSM hook with the full
contents of the file. With the coming addition of partial file read
support for kernel_read_file*() API, the LSM will no longer be able
to always see the entire contents of a file during the read calls.

For cases where the LSM must read examine the complete file contents,
it will need to do so on its own every time the kernel_read_file
hook is called with contents=false (or reject such cases). Adjust all
existing LSMs to retain existing behavior.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Link: https://lore.kernel.org/r/20201002173828.2099543-12-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

4f2d99b0 02-Oct-2020 Kees Cook <keescook@chromium.org>

firmware_loader: Use security_post_load_data()

Now that security_post_load_data() is wired up, use it instead
of the NULL file argument style of security_post_read_file(),
and update the security_kernel_load_data() call to indicate that a
security_kernel_post_load_data() call is expected.

Wire up the IMA check to match earlier logic. Perhaps a generalized
change to ima_post_load_data() might look something like this:

return process_buffer_measurement(buf, size,
kernel_load_data_id_str(load_id),
read_idmap[load_id] ?: FILE_CHECK,
0, NULL);

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Link: https://lore.kernel.org/r/20201002173828.2099543-10-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

b64fcae7 02-Oct-2020 Kees Cook <keescook@chromium.org>

LSM: Introduce kernel_post_load_data() hook

There are a few places in the kernel where LSMs would like to have
visibility into the contents of a kernel buffer that has been loaded or
read. While security_kernel_post_read_file() (which includes the
buffer) exists as a pairing for security_kernel_read_file(), no such
hook exists to pair with security_kernel_load_data().

Earlier proposals for just using security_kernel_post_read_file() with a
NULL file argument were rejected (i.e. "file" should always be valid for
the security_..._file hooks, but it appears at least one case was
left in the kernel during earlier refactoring. (This will be fixed in
a subsequent patch.)

Since not all cases of security_kernel_load_data() can have a single
contiguous buffer made available to the LSM hook (e.g. kexec image
segments are separately loaded), there needs to be a way for the LSM to
reason about its expectations of the hook coverage. In order to handle
this, add a "contents" argument to the "kernel_load_data" hook that
indicates if the newly added "kernel_post_load_data" hook will be called
with the full contents once loaded. That way, LSMs requiring full contents
can choose to unilaterally reject "kernel_load_data" with contents=false
(which is effectively the existing hook coverage), but when contents=true
they can allow it and later evaluate the "kernel_post_load_data" hook
once the buffer is loaded.

With this change, LSMs can gain coverage over non-file-backed data loads
(e.g. init_module(2) and firmware userspace helper), which will happen
in subsequent patches.

Additionally prepare IMA to start processing these cases.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/r/20201002173828.2099543-9-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

88535288 02-Oct-2020 Kees Cook <keescook@chromium.org>

fs/kernel_read_file: Add file_size output argument

In preparation for adding partial read support, add an optional output
argument to kernel_read_file*() that reports the file size so callers
can reason more easily about their reading progress.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Link: https://lore.kernel.org/r/20201002173828.2099543-8-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

113eeb51 02-Oct-2020 Kees Cook <keescook@chromium.org>

fs/kernel_read_file: Switch buffer size arg to size_t

In preparation for further refactoring of kernel_read_file*(), rename
the "max_size" argument to the more accurate "buf_size", and correct
its type to size_t. Add kerndoc to explain the specifics of how the
arguments will be used. Note that with buf_size now size_t, it can no
longer be negative (and was never called with a negative value). Adjust
callers to use it as a "maximum size" when *buf is NULL.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Link: https://lore.kernel.org/r/20201002173828.2099543-7-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f7a4f689 02-Oct-2020 Kees Cook <keescook@chromium.org>

fs/kernel_read_file: Remove redundant size argument

In preparation for refactoring kernel_read_file*(), remove the redundant
"size" argument which is not needed: it can be included in the return
code, with callers adjusted. (VFS reads already cannot be larger than
INT_MAX.)

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Link: https://lore.kernel.org/r/20201002173828.2099543-6-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

b89999d0 02-Oct-2020 Scott Branden <scott.branden@broadcom.com>

fs/kernel_read_file: Split into separate include file

Move kernel_read_file* out of linux/fs.h to its own linux/kernel_read_file.h
include file. That header gets pulled in just about everywhere
and doesn't really need functions not related to the general fs interface.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Link: https://lore.kernel.org/r/20200706232309.12010-2-scott.branden@broadcom.com
Link: https://lore.kernel.org/r/20201002173828.2099543-4-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

c307459b 02-Oct-2020 Kees Cook <keescook@chromium.org>

fs/kernel_read_file: Remove FIRMWARE_PREALLOC_BUFFER enum

FIRMWARE_PREALLOC_BUFFER is a "how", not a "what", and confuses the LSMs
that are interested in filtering between types of things. The "how"
should be an internal detail made uninteresting to the LSMs.

Fixes: a098ecd2fa7d ("firmware: support loading into a pre-allocated buffer")
Fixes: fd90bc559bfb ("ima: based on policy verify firmware signatures (pre-allocated buffer)")
Fixes: 4f0496d8ffa3 ("ima: based on policy warn about loading firmware (pre-allocated buffer)")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201002173828.2099543-2-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5d47b394 24-Sep-2020 Christoph Hellwig <hch@lst.de>

security/keys: remove compat_keyctl_instantiate_key_iov

Now that import_iovec handles compat iovecs, the native version of
keyctl_instantiate_key_iov can be used for the compat case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

89cd35c5 24-Sep-2020 Christoph Hellwig <hch@lst.de>

iov_iter: transparently handle compat iovecs in import_iovec

Use in compat_syscall to import either native or the compat iovecs, and
remove the now superflous compat_import_iovec.

This removes the need for special compat logic in most callers, and
the remaining ones can still be simplified by using __import_iovec
with a bool compat parameter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0b7e44d3 20-Sep-2020 Tianjia Zhang <tianjia.zhang@linux.alibaba.com>

integrity: Asymmetric digsig supports SM2-with-SM3 algorithm

Asymmetric digsig supports SM2-with-SM3 algorithm combination,
so that IMA can also verify SM2's signature data.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Tested-by: Xufeng Zhang <yunbo.xufeng@linux.alibaba.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Vitaly Chikunov <vt@altlinux.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

3ab0a7a0 22-Sep-2020 David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Two minor conflicts:

1) net/ipv4/route.c, adding a new local variable while
moving another local variable and removing it's
initial assignment.

2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
One pretty prints the port mode differently, whilst another
changes the driver to try and obtain the port mode from
the port node rather than the switch node.

Signed-off-by: David S. Miller <davem@davemloft.net>


bf0afe67 22-Sep-2020 Casey Schaufler <casey@schaufler-ca.com>

Smack: Fix build when NETWORK_SECMARK is not set

Use proper conditional compilation for the secmark field in
the network skb.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

aa662fc0 16-Sep-2020 KP Singh <kpsingh@google.com>

ima: Fix NULL pointer dereference in ima_file_hash

ima_file_hash can be called when there is no iint->ima_hash available
even though the inode exists in the integrity cache. It is fairly
common for a file to not have a hash. (e.g. an mknodat, prior to the
file being closed).

Another example where this can happen (suggested by Jann Horn):

Process A does:

while(1) {
unlink("/tmp/imafoo");
fd = open("/tmp/imafoo", O_RDWR|O_CREAT|O_TRUNC, 0700);
if (fd == -1) {
perror("open");
continue;
}
write(fd, "A", 1);
close(fd);
}

and Process B does:

while (1) {
int fd = open("/tmp/imafoo", O_RDONLY);
if (fd == -1)
continue;
char *mapping = mmap(NULL, 0x1000, PROT_READ|PROT_EXEC,
MAP_PRIVATE, fd, 0);
if (mapping != MAP_FAILED)
munmap(mapping, 0x1000);
close(fd);
}

Due to the race to get the iint->mutex between ima_file_hash and
process_measurement iint->ima_hash could still be NULL.

Fixes: 6beea7afcc72 ("ima: add the ability to query the cached hash of a given file")
Signed-off-by: KP Singh <kpsingh@google.com>
Reviewed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

726bd896 04-Sep-2020 Lenny Szubowicz <lszubowi@redhat.com>

integrity: Load certs from the EFI MOK config table

Because of system-specific EFI firmware limitations, EFI volatile
variables may not be capable of holding the required contents of
the Machine Owner Key (MOK) certificate store when the certificate
list grows above some size. Therefore, an EFI boot loader may pass
the MOK certs via a EFI configuration table created specifically for
this purpose to avoid this firmware limitation.

An EFI configuration table is a much more primitive mechanism
compared to EFI variables and is well suited for one-way passage
of static information from a pre-OS environment to the kernel.

This patch adds the support to load certs from the MokListRT
entry in the MOK variable configuration table, if it's present.
The pre-existing support to load certs from the MokListRT EFI
variable remains and is used if the EFI MOK configuration table
isn't present or can't be successfully used.

Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com>
Link: https://lore.kernel.org/r/20200905013107.10457-4-lszubowi@redhat.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

38a1f03a 04-Sep-2020 Lenny Szubowicz <lszubowi@redhat.com>

integrity: Move import of MokListRT certs to a separate routine

Move the loading of certs from the UEFI MokListRT into a separate
routine to facilitate additional MokList functionality.

There is no visible functional change as a result of this patch.
Although the UEFI dbx certs are now loaded before the MokList certs,
they are loaded onto different key rings. So the order of the keys
on their respective key rings is the same.

Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Link: https://lore.kernel.org/r/20200905013107.10457-3-lszubowi@redhat.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

1e484d38 15-Sep-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'fixes-v5.9a' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull security layer fix from James Morris:
"A device_cgroup RCU warning fix from Amol Grover"

* tag 'fixes-v5.9a' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
device_cgroup: Fix RCU list debugging warning


8861d0af 14-Sep-2020 Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

selinux: Add helper functions to get and set checkreqprot

checkreqprot data member in selinux_state struct is accessed directly by
SELinux functions to get and set. This could cause unexpected read or
write access to this data member due to compiler optimizations and/or
compiler's reordering of access to this field.

Add helper functions to get and set checkreqprot data member in
selinux_state struct. These helper functions use READ_ONCE and
WRITE_ONCE macros to ensure atomic read or write of memory for
this data member.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Suggested-by: Paul Moore <paul@paul-moore.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

455b6c91 04-Sep-2020 Roberto Sassu <roberto.sassu@huawei.com>

evm: Check size of security.evm before using it

This patch checks the size for the EVM_IMA_XATTR_DIGSIG and
EVM_XATTR_PORTABLE_DIGSIG types to ensure that the algorithm is read from
the buffer returned by vfs_getxattr_alloc().

Cc: stable@vger.kernel.org # 4.19.x
Fixes: 5feeb61183dde ("evm: Allow non-SHA1 digital signatures")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

4be92db3 04-Sep-2020 Roberto Sassu <roberto.sassu@huawei.com>

ima: Remove semicolon at the end of ima_get_binary_runtime_size()

This patch removes the unnecessary semicolon at the end of
ima_get_binary_runtime_size().

Cc: stable@vger.kernel.org
Fixes: d158847ae89a2 ("ima: maintain memory size needed for serializing the measurement list")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

60386b85 04-Sep-2020 Roberto Sassu <roberto.sassu@huawei.com>

ima: Don't ignore errors from crypto_shash_update()

Errors returned by crypto_shash_update() are not checked in
ima_calc_boot_aggregate_tfm() and thus can be overwritten at the next
iteration of the loop. This patch adds a check after calling
crypto_shash_update() and returns immediately if the result is not zero.

Cc: stable@vger.kernel.org
Fixes: 3323eec921efd ("integrity: IMA as an integrity service provider")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

f60c826d 09-Sep-2020 Alex Dewar <alex.dewar90@gmail.com>

ima: Use kmemdup rather than kmalloc+memcpy

Issue identified with Coccinelle.

Signed-off-by: Alex Dewar <alex.dewar90@gmail.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

322dd63c 11-Aug-2020 Casey Schaufler <casey@schaufler-ca.com>

Smack: Use the netlabel cache

Utilize the Netlabel cache mechanism for incoming packet matching.
Refactor the initialization of secattr structures, as it was being
done in two places.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

a2af0318 11-Aug-2020 Casey Schaufler <casey@schaufler-ca.com>

Smack: Set socket labels only once

Refactor the IP send checks so that the netlabel value
is set only when necessary, not on every send. Some functions
get renamed as the changes made the old name misleading.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

36be8129 11-Aug-2020 Casey Schaufler <casey@schaufler-ca.com>

Smack: Consolidate uses of secmark into a function

Add a function smack_from_skb() that returns the Smack label
identified by a network secmark. Replace the explicit uses of
the secmark with this function.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

e8ba53d0 10-Sep-2020 Stephen Smalley <stephen.smalley.work@gmail.com>

selinux: access policycaps with READ_ONCE/WRITE_ONCE

Use READ_ONCE/WRITE_ONCE for all accesses to the
selinux_state.policycaps booleans to prevent compiler
mischief.

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

8c2f516c 04-Sep-2020 Bruno Meneguele <bmeneg@redhat.com>

integrity: include keyring name for unknown key request

Depending on the IMA policy rule a key may be searched for in multiple
keyrings (e.g. .ima and .platform) and possibly not found. This patch
improves feedback by including the keyring "description" (name) in the
error message.

Signed-off-by: Bruno Meneguele <bmeneg@redhat.com>
[zohar@linux.ibm.com: updated commit message]
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

e4d7e2df 04-Sep-2020 Bruno Meneguele <bmeneg@redhat.com>

ima: limit secure boot feedback scope for appraise

Only emit an unknown/invalid message when setting the IMA appraise mode
to anything other than "enforce", when secureboot is enabled.

Signed-off-by: Bruno Meneguele <bmeneg@redhat.com>
[zohar@linux.ibm.com: updated commit message]
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

7fe2bb7e 04-Sep-2020 Bruno Meneguele <bmeneg@redhat.com>

integrity: invalid kernel parameters feedback

Don't silently ignore unknown or invalid ima_{policy,appraise,hash} and evm
kernel boot command line options.

Signed-off-by: Bruno Meneguele <bmeneg@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

4afb28ab 04-Sep-2020 Bruno Meneguele <bmeneg@redhat.com>

ima: add check for enforced appraise option

The "enforce" string is allowed as an option for ima_appraise= kernel
paramenter per kernel-paramenters.txt and should be considered on the
parameter setup checking as a matter of completeness. Also it allows futher
checking on the options being passed by the user.

Signed-off-by: Bruno Meneguele <bmeneg@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

44a8c4f3 04-Sep-2020 Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

We got slightly different patches removing a double word
in a comment in net/ipv4/raw.c - picked the version from net.

Simple conflict in drivers/net/ethernet/ibm/ibmvnic.c. Use cached
values instead of VNIC login response buffer (following what
commit 507ebe6444a4 ("ibmvnic: Fix use-after-free of VNIC login
response buffer") did).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>


e44f1287 24-Aug-2020 Denis Efremov <efremov@linux.com>

integrity: Use current_uid() in integrity_audit_message()

Modify integrity_audit_message() to use current_uid().

Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

48ce1ddc 11-Aug-2020 Tyler Hicks <tyhicks@linux.microsoft.com>

ima: Fail rule parsing when asymmetric key measurement isn't supportable

Measuring keys is currently only supported for asymmetric keys. In the
future, this might change.

For now, the "func=KEY_CHECK" and "keyrings=" options are only
appropriate when CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS is enabled. Make
this clear at policy load so that IMA policy authors don't assume that
these policy language constructs are supported.

Fixes: 2b60c0ecedf8 ("IMA: Read keyrings= option from the IMA policy")
Fixes: 5808611cccb2 ("IMA: Add KEY_CHECK func to measure keys")
Suggested-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reviewed-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

176377d9 11-Aug-2020 Tyler Hicks <tyhicks@linux.microsoft.com>

ima: Pre-parse the list of keyrings in a KEY_CHECK rule

The ima_keyrings buffer was used as a work buffer for strsep()-based
parsing of the "keyrings=" option of an IMA policy rule. This parsing
was re-performed each time an asymmetric key was added to a kernel
keyring for each loaded policy rule that contained a "keyrings=" option.

An example rule specifying this option is:

measure func=KEY_CHECK keyrings=a|b|c

The rule says to measure asymmetric keys added to any of the kernel
keyrings named "a", "b", or "c". The size of the buffer size was
equal to the size of the largest "keyrings=" value seen in a previously
loaded rule (5 + 1 for the NUL-terminator in the previous example) and
the buffer was pre-allocated at the time of policy load.

The pre-allocated buffer approach suffered from a couple bugs:

1) There was no locking around the use of the buffer so concurrent key
add operations, to two different keyrings, would result in the
strsep() loop of ima_match_keyring() to modify the buffer at the same
time. This resulted in unexpected results from ima_match_keyring()
and, therefore, could cause unintended keys to be measured or keys to
not be measured when IMA policy intended for them to be measured.

2) If the kstrdup() that initialized entry->keyrings in ima_parse_rule()
failed, the ima_keyrings buffer was freed and set to NULL even when a
valid KEY_CHECK rule was previously loaded. The next KEY_CHECK event
would trigger a call to strcpy() with a NULL destination pointer and
crash the kernel.

Remove the need for a pre-allocated global buffer by parsing the list of
keyrings in a KEY_CHECK rule at the time of policy load. The
ima_rule_entry will contain an array of string pointers which point to
the name of each keyring specified in the rule. No string processing
needs to happen at the time of asymmetric key add so iterating through
the list and doing a string comparison is all that's required at the
time of policy check.

In the process of changing how the "keyrings=" policy option is handled,
a couple additional bugs were fixed:

1) The rule parser accepted rules containing invalid "keyrings=" values
such as "a|b||c", "a|b|", or simply "|".

2) The /sys/kernel/security/ima/policy file did not display the entire
"keyrings=" value if the list of keyrings was longer than what could
fit in the fixed size tbuf buffer in ima_policy_show().

Fixes: 5c7bac9fb2c5 ("IMA: pre-allocate buffer to hold keyrings string")
Fixes: 2b60c0ecedf8 ("IMA: Read keyrings= option from the IMA policy")
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reviewed-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

66ccd256 27-Aug-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: simplify away security_policydb_len()

Remove the security_policydb_len() calls from sel_open_policy() and
instead update the inode size from the size returned from
security_read_policy().

Since after this change security_policydb_len() is only called from
security_load_policy(), remove it entirely and just open-code it there.

Also, since security_load_policy() is always called with policy_mutex
held, make it dereference the policy pointer directly and drop the
unnecessary RCU locking.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

9ff9abc4 26-Aug-2020 Stephen Smalley <stephen.smalley.work@gmail.com>

selinux: move policy mutex to selinux_state, use in lockdep checks

Move the mutex used to synchronize policy changes (reloads and setting
of booleans) from selinux_fs_info to selinux_state and use it in
lockdep checks for rcu_dereference_protected() calls in the security
server functions. This makes the dependency on the mutex explicit
in the code rather than relying on comments.

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

0256b0aa 26-Aug-2020 Dan Carpenter <dan.carpenter@oracle.com>

selinux: fix error handling bugs in security_load_policy()

There are a few bugs in the error handling for security_load_policy().

1) If the newpolicy->sidtab allocation fails then it leads to a NULL
dereference. Also the error code was not set to -ENOMEM on that
path.
2) If policydb_read() failed then we call policydb_destroy() twice
which meands we call kvfree(p->sym_val_to_name[i]) twice.
3) If policydb_load_isids() failed then we call sidtab_destroy() twice
and that results in a double free in the sidtab_destroy_tree()
function because entry.ptr_inner and entry.ptr_leaf are not set to
NULL.

One thing that makes this code nice to deal with is that none of the
functions return partially allocated data. In other words, the
policydb_read() either allocates everything successfully or it frees
all the data it allocates. It never returns a mix of allocated and
not allocated data.

I re-wrote this to only free the successfully allocated data which
avoids the double frees. I also re-ordered selinux_policy_free() so
it's in the reverse order of the allocation function.

Fixes: c7c556f1e81b ("selinux: refactor changing booleans")
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
[PM: partially merged by hand due to merge fuzz]
Signed-off-by: Paul Moore <paul@paul-moore.com>

8ea63684 25-Aug-2020 KP Singh <kpsingh@google.com>

bpf: Implement bpf_local_storage for inodes

Similar to bpf_local_storage for sockets, add local storage for inodes.
The life-cycle of storage is managed with the life-cycle of the inode.
i.e. the storage is destroyed along with the owning inode.

The BPF LSM allocates an __rcu pointer to the bpf_local_storage in the
security blob which are now stackable and can co-exist with other LSMs.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200825182919.1118197-6-kpsingh@chromium.org

1b8b31a2 19-Aug-2020 Stephen Smalley <stephen.smalley.work@gmail.com>

selinux: convert policy read-write lock to RCU

Convert the policy read-write lock to RCU. This is significantly
simplified by the earlier work to encapsulate the policy data
structures and refactor the policy load and boolean setting logic.
Move the latest_granting sequence number into the selinux_policy
structure so that it can be updated atomically with the policy.
Since removing the policy rwlock and moving latest_granting reduces
the selinux_ss structure to nothing more than a wrapper around the
selinux_policy pointer, get rid of the extra layer of indirection.

At present this change merely passes a hardcoded 1 to
rcu_dereference_check() in the cases where we know we do not need to
take rcu_read_lock(), with the preceding comment explaining why.
Alternatively we could pass fsi->mutex down from selinuxfs and
apply a lockdep check on it instead.

Based in part on earlier attempts to convert the policy rwlock
to RCU by Kaigai Kohei [1] and by Peter Enderborg [2].

[1] https://lore.kernel.org/selinux/6e2f9128-e191-ebb3-0e87-74bfccb0767f@tycho.nsa.gov/
[2] https://lore.kernel.org/selinux/20180530141104.28569-1-peter.enderborg@sony.com/

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

c76a2f9e 07-Aug-2020 Randy Dunlap <rdunlap@infradead.org>

selinux: delete repeated words in comments

Drop a repeated word in comments.
{open, is, then}

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: selinux@vger.kernel.org
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: linux-security-module@vger.kernel.org
[PM: fix subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>

df561f66 23-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

30969bc8 21-Aug-2020 Peter Enderborg <peter.enderborg@sony.com>

selinux: add basic filtering for audit trace events

This patch adds further attributes to the event. These attributes are
helpful to understand the context of the message and can be used
to filter the events.

There are three common items. Source context, target context and tclass.
There are also items from the outcome of operation performed.

An event is similar to:
<...>-1309 [002] .... 6346.691689: selinux_audited:
requested=0x4000000 denied=0x4000000 audited=0x4000000
result=-13
scontext=system_u:system_r:cupsd_t:s0-s0:c0.c1023
tcontext=system_u:object_r:bin_t:s0 tclass=file

With systems where many denials are occurring, it is useful to apply a
filter. The filtering is a set of logic that is inserted with
the filter file. Example:
echo "tclass==\"file\" " > events/avc/selinux_audited/filter

This adds that we only get tclass=file.

The trace can also have extra properties. Adding the user stack
can be done with
echo 1 > options/userstacktrace

Now the output will be
runcon-1365 [003] .... 6960.955530: selinux_audited:
requested=0x4000000 denied=0x4000000 audited=0x4000000
result=-13
scontext=system_u:system_r:cupsd_t:s0-s0:c0.c1023
tcontext=system_u:object_r:bin_t:s0 tclass=file
runcon-1365 [003] .... 6960.955560: <user stack trace>
=> <00007f325b4ce45b>
=> <00005607093efa57>

Signed-off-by: Peter Enderborg <peter.enderborg@sony.com>
Reviewed-by: Thiébaud Weksteen <tweek@google.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

dd816621 21-Aug-2020 Thiébaud Weksteen <tweek@google.com>

selinux: add tracepoint on audited events

The audit data currently captures which process and which target
is responsible for a denial. There is no data on where exactly in the
process that call occurred. Debugging can be made easier by being able to
reconstruct the unified kernel and userland stack traces [1]. Add a
tracepoint on the SELinux denials which can then be used by userland
(i.e. perf).

Although this patch could manually be added by each OS developer to
trouble shoot a denial, adding it to the kernel streamlines the
developers workflow.

It is possible to use perf for monitoring the event:
# perf record -e avc:selinux_audited -g -a
^C
# perf report -g
[...]
6.40% 6.40% audited=800000 tclass=4
|
__libc_start_main
|
|--4.60%--__GI___ioctl
| entry_SYSCALL_64
| do_syscall_64
| __x64_sys_ioctl
| ksys_ioctl
| binder_ioctl
| binder_set_nice
| can_nice
| capable
| security_capable
| cred_has_capability.isra.0
| slow_avc_audit
| common_lsm_audit
| avc_audit_post_callback
| avc_audit_post_callback
|

It is also possible to use the ftrace interface:
# echo 1 > /sys/kernel/debug/tracing/events/avc/selinux_audited/enable
# cat /sys/kernel/debug/tracing/trace
tracer: nop
entries-in-buffer/entries-written: 1/1 #P:8
[...]
dmesg-3624 [001] 13072.325358: selinux_denied: audited=800000 tclass=4

The tclass value can be mapped to a class by searching
security/selinux/flask.h. The audited value is a bit field of the
permissions described in security/selinux/av_permissions.h for the
corresponding class.

[1] https://source.android.com/devices/tech/debug/native_stack_dump

Signed-off-by: Thiébaud Weksteen <tweek@google.com>
Suggested-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: Peter Enderborg <peter.enderborg@sony.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

0eea6091 19-Aug-2020 Daniel Burgener <dburgener@linux.microsoft.com>

selinux: Create new booleans and class dirs out of tree

In order to avoid concurrency issues around selinuxfs resource availability
during policy load, we first create new directories out of tree for
reloaded resources, then swap them in, and finally delete the old versions.

This fix focuses on concurrency in each of the two subtrees swapped, and
not concurrency between the trees. This means that it is still possible
that subsequent reads to eg the booleans directory and the class directory
during a policy load could see the old state for one and the new for the other.
The problem of ensuring that policy loads are fully atomic from the perspective
of userspace is larger than what is dealt with here. This commit focuses on
ensuring that the directories contents always match either the new or the old
policy state from the perspective of userspace.

In the previous implementation, on policy load /sys/fs/selinux is updated
by deleting the previous contents of
/sys/fs/selinux/{class,booleans} and then recreating them. This means
that there is a period of time when the contents of these directories do not
exist which can cause race conditions as userspace relies on them for
information about the policy. In addition, it means that error recovery in
the event of failure is challenging.

In order to demonstrate the race condition that this series fixes, you
can use the following commands:

while true; do cat /sys/fs/selinux/class/service/perms/status
>/dev/null; done &
while true; do load_policy; done;

In the existing code, this will display errors fairly often as the class
lookup fails. (In normal operation from systemd, this would result in a
permission check which would be allowed or denied based on policy settings
around unknown object classes.) After applying this patch series you
should expect to no longer see such error messages.

Signed-off-by: Daniel Burgener <dburgener@linux.microsoft.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

613ba187 19-Aug-2020 Daniel Burgener <dburgener@linux.microsoft.com>

selinux: Standardize string literal usage for selinuxfs directory names

Switch class and policy_capabilities directory names to be referred to with
global constants, consistent with booleans directory name. This will allow
for easy consistency of naming in future development.

Signed-off-by: Daniel Burgener <dburgener@linux.microsoft.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

66ec384a 19-Aug-2020 Daniel Burgener <dburgener@linux.microsoft.com>

selinux: Refactor selinuxfs directory populating functions

Make sel_make_bools and sel_make_classes take the specific elements of
selinux_fs_info that they need rather than the entire struct.

This will allow a future patch to pass temporary elements that are not in
the selinux_fs_info struct to these functions so that the original elements
can be preserved until we are ready to perform the switch over.

Signed-off-by: Daniel Burgener <dburgener@linux.microsoft.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

aeecf4a3 19-Aug-2020 Daniel Burgener <dburgener@linux.microsoft.com>

selinux: Create function for selinuxfs directory cleanup

Separating the cleanup from the creation will simplify two things in
future patches in this series. First, the creation can be made generic,
to create directories not tied to the selinux_fs_info structure. Second,
we will ultimately want to reorder creation and deletion so that the
deletions aren't performed until the new directory structures have already
been moved into place.

Signed-off-by: Daniel Burgener <dburgener@linux.microsoft.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

9530a3e0 20-Aug-2020 Stephen Smalley <stephen.smalley.work@gmail.com>

selinux: permit removing security.selinux xattr before policy load

Currently SELinux denies attempts to remove the security.selinux xattr
always, even when permissive or no policy is loaded. This was originally
motivated by the view that all files should be labeled, even if that label
is unlabeled_t, and we shouldn't permit files that were once labeled to
have their labels removed entirely. This however prevents removing
SELinux xattrs in the case where one "disables" SELinux by not loading
a policy (e.g. a system where runtime disable is removed and selinux=0
was not specified). Allow removing the xattr before SELinux is
initialized. We could conceivably permit it even after initialization
if permissive, or introduce a separate permission check here.

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

bc62d68e 06-Apr-2020 Amol Grover <frextrite@gmail.com>

device_cgroup: Fix RCU list debugging warning

exceptions may be traversed using list_for_each_entry_rcu()
outside of an RCU read side critical section BUT under the
protection of decgroup_mutex. Hence add the corresponding
lockdep expression to fix the following false-positive
warning:

[ 2.304417] =============================
[ 2.304418] WARNING: suspicious RCU usage
[ 2.304420] 5.5.4-stable #17 Tainted: G E
[ 2.304422] -----------------------------
[ 2.304424] security/device_cgroup.c:355 RCU-list traversed in non-reader section!!

Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: James Morris <jmorris@namei.org>

87922931 19-Aug-2020 kernel test robot <lkp@intel.com>

selinux: fix memdup.cocci warnings

Use kmemdup rather than duplicating its implementation

Generated by: scripts/coccinelle/api/memdup.cocci

Fixes: c7c556f1e81b ("selinux: refactor changing booleans")
CC: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

37ea433c 19-Aug-2020 Stephen Smalley <stephen.smalley.work@gmail.com>

selinux: avoid dereferencing the policy prior to initialization

Certain SELinux security server functions (e.g. security_port_sid,
called during bind) were not explicitly testing to see if SELinux
has been initialized (i.e. initial policy loaded) and handling
the no-policy-loaded case. In the past this happened to work
because the policydb was statically allocated and could always
be accessed, but with the recent encapsulation of policy state
and conversion to dynamic allocation, we can no longer access
the policy state prior to initialization. Add a test of
!selinux_initialized(state) to all of the exported functions that
were missing them and handle appropriately.

Fixes: 461698026ffa ("selinux: encapsulate policy state, refactor policy load")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

69ea651c 19-Aug-2020 Colin Ian King <colin.king@canonical.com>

selinux: fix allocation failure check on newpolicy->sidtab

The allocation check of newpolicy->sidtab is null checking if
newpolicy is null and not newpolicy->sidtab. Fix this.

Addresses-Coverity: ("Logically dead code")
Fixes: c7c556f1e81b ("selinux: refactor changing booleans")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

c7c556f1 11-Aug-2020 Stephen Smalley <stephen.smalley.work@gmail.com>

selinux: refactor changing booleans

Refactor the logic for changing SELinux policy booleans in a similar
manner to the refactoring of policy load, thereby reducing the
size of the critical section when the policy write-lock is held
and making it easier to convert the policy rwlock to RCU in the
future. Instead of directly modifying the policydb in place, modify
a copy and then swap it into place through a single pointer update.
Only fully copy the portions of the policydb that are affected by
boolean changes to avoid the full cost of a deep policydb copy.
Introduce another level of indirection for the sidtab since changing
booleans does not require updating the sidtab, unlike policy load.
While we are here, create a common helper for notifying
other kernel components and userspace of a policy change and call it
from both security_set_bools() and selinux_policy_commit().

Based on an old (2004) patch by Kaigai Kohei [1] to convert the policy
rwlock to RCU that was deferred at the time since it did not
significantly improve performance and introduced complexity. Peter
Enderborg later submitted a patch series to convert to RCU [2] that
would have made changing booleans a much more expensive operation
by requiring a full policydb_write();policydb_read(); sequence to
deep copy the entire policydb and also had concerns regarding
atomic allocations.

This change is now simplified by the earlier work to encapsulate
policy state in the selinux_policy struct and to refactor
policy load. After this change, the last major obstacle to
converting the policy rwlock to RCU is likely the sidtab live
convert support.

[1] https://lore.kernel.org/selinux/6e2f9128-e191-ebb3-0e87-74bfccb0767f@tycho.nsa.gov/
[2] https://lore.kernel.org/selinux/20180530141104.28569-1-peter.enderborg@sony.com/

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

02a52c5c 07-Aug-2020 Stephen Smalley <stephen.smalley.work@gmail.com>

selinux: move policy commit after updating selinuxfs

With the refactoring of the policy load logic in the security
server from the previous change, it is now possible to split out
the committing of the new policy from security_load_policy() and
perform it only after successful updating of selinuxfs. Change
security_load_policy() to return the newly populated policy
data structures to the caller, export selinux_policy_commit()
for external callers, and introduce selinux_policy_cancel() to
provide a way to cancel the policy load in the event of an error
during updating of the selinuxfs directory tree. Further, rework
the interfaces used by selinuxfs to get information from the policy
when creating the new directory tree to take and act upon the
new policy data structure rather than the current/active policy.
Update selinuxfs to use these updated and new interfaces. While
we are here, stop re-creating the policy_capabilities directory
on each policy load since it does not depend on the policy, and
stop trying to create the booleans and classes directories during
the initial creation of selinuxfs since no information is available
until first policy load.

After this change, a failure while updating the booleans and class
directories will cause the entire policy load to be canceled, leaving
the original policy intact, and policy load notifications to userspace
will only happen after a successful completion of updating those
directories. This does not (yet) provide full atomicity with respect
to the updating of the directory trees themselves.

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

46169802 07-Aug-2020 Stephen Smalley <stephen.smalley.work@gmail.com>

selinux: encapsulate policy state, refactor policy load

Encapsulate the policy state in its own structure (struct
selinux_policy) that is separately allocated but referenced from the
selinux_ss structure. The policy state includes the SID table
(particularly the context structures), the policy database, and the
mapping between the kernel classes/permissions and the policy values.
Refactor the security server portion of the policy load logic to
cleanly separate loading of the new structures from committing the new
policy. Unify the initial policy load and reload code paths as much
as possible, avoiding duplicated code. Make sure we are taking the
policy read-lock prior to any dereferencing of the policy. Move the
copying of the policy capability booleans into the state structure
outside of the policy write-lock because they are separate from the
policy and are read outside of any policy lock; possibly they should
be using at least READ_ONCE/WRITE_ONCE or smp_load_acquire/store_release.

These changes simplify the policy loading logic, reduce the size of
the critical section while holding the policy write-lock, and should
facilitate future changes to e.g. refactor the entire policy reload
logic including the selinuxfs code to make the updating of the policy
and the selinuxfs directory tree atomic and/or to convert the policy
read-write lock to RCU.

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

339949be 06-Aug-2020 Stephen Smalley <stephen.smalley.work@gmail.com>

scripts/selinux,selinux: update mdp to enable policy capabilities

Presently mdp does not enable any SELinux policy capabilities
in the dummy policy it generates. Thus, policies derived from
it will by default lack various features commonly used in modern
policies such as open permission, extended socket classes, network
peer controls, etc. Split the policy capability definitions out into
their own headers so that we can include them into mdp without pulling in
other kernel headers and extend mdp generate policycap statements for the
policy capabilities known to the kernel. Policy authors may wish to
selectively remove some of these from the generated policy.

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

9ad57f6d 12-Aug-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:

- most of the rest of MM (memcg, hugetlb, vmscan, proc, compaction,
mempolicy, oom-kill, hugetlbfs, migration, thp, cma, util,
memory-hotplug, cleanups, uaccess, migration, gup, pagemap),

- various other subsystems (alpha, misc, sparse, bitmap, lib, bitops,
checkpatch, autofs, minix, nilfs, ufs, fat, signals, kmod, coredump,
exec, kdump, rapidio, panic, kcov, kgdb, ipc).

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (164 commits)
mm/gup: remove task_struct pointer for all gup code
mm: clean up the last pieces of page fault accountings
mm/xtensa: use general page fault accounting
mm/x86: use general page fault accounting
mm/sparc64: use general page fault accounting
mm/sparc32: use general page fault accounting
mm/sh: use general page fault accounting
mm/s390: use general page fault accounting
mm/riscv: use general page fault accounting
mm/powerpc: use general page fault accounting
mm/parisc: use general page fault accounting
mm/openrisc: use general page fault accounting
mm/nios2: use general page fault accounting
mm/nds32: use general page fault accounting
mm/mips: use general page fault accounting
mm/microblaze: use general page fault accounting
mm/m68k: use general page fault accounting
mm/ia64: use general page fault accounting
mm/hexagon: use general page fault accounting
mm/csky: use general page fault accounting
...


64019a2e 11-Aug-2020 Peter Xu <peterx@redhat.com>

mm/gup: remove task_struct pointer for all gup code

After the cleanup of page fault accounting, gup does not need to pass
task_struct around any more. Remove that parameter in the whole gup
stack.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ce13266d 11-Aug-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull security subsystem updates from James Morris:
"A couple of minor documentation updates only for this release"

* tag 'for-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
LSM: drop duplicated words in header file comments
Replace HTTP links with HTTPS ones: security


453431a5 07-Aug-2020 Waiman Long <longman@redhat.com>

mm, treewide: rename kzfree() to kfree_sensitive()

As said by Linus:

A symmetric naming is only helpful if it implies symmetries in use.
Otherwise it's actively misleading.

In "kzalloc()", the z is meaningful and an important part of what the
caller wants.

In "kzfree()", the z is actively detrimental, because maybe in the
future we really _might_ want to use that "memfill(0xdeadbeef)" or
something. The "zero" part of the interface isn't even _relevant_.

The main reason that kzfree() exists is to clear sensitive information
that should not be leaked to other future users of the same memory
objects.

Rename kzfree() to kfree_sensitive() to follow the example of the recently
added kvfree_sensitive() and make the intention of the API more explicit.
In addition, memzero_explicit() is used to clear the memory to make sure
that it won't get optimized away by the compiler.

The renaming is done by using the command sequence:

git grep -w --name-only kzfree |\
xargs sed -i 's/kzfree/kfree_sensitive/'

followed by some editing of the kfree_sensitive() kerneldoc and adding
a kzfree backward compatibility macro in slab.h.

[akpm@linux-foundation.org: fs/crypto/inline_crypt.c needs linux/slab.h]
[akpm@linux-foundation.org: fix fs/crypto/inline_crypt.c some more]

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Joe Perches <joe@perches.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Link: http://lkml.kernel.org/r/20200616154311.12314-3-longman@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c9fecf50 05-Jul-2020 Alexander A. Klimov <grandmaster@al2klimov.de>

Replace HTTP links with HTTPS ones: security

Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
If not .svg:
For each line:
If doesn't contain `\bxmlns\b`:
For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>

4cec9293 06-Aug-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'integrity-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity updates from Mimi Zohar:
"The nicest change is the IMA policy rule checking. The other changes
include allowing the kexec boot cmdline line measure policy rules to
be defined in terms of the inode associated with the kexec kernel
image, making the IMA_APPRAISE_BOOTPARAM, which governs the IMA
appraise mode (log, fix, enforce), a runtime decision based on the
secure boot mode of the system, and including errno in the audit log"

* tag 'integrity-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
integrity: remove redundant initialization of variable ret
ima: move APPRAISE_BOOTPARAM dependency on ARCH_POLICY to runtime
ima: AppArmor satisfies the audit rule requirements
ima: Rename internal filter rule functions
ima: Support additional conditionals in the KEXEC_CMDLINE hook function
ima: Use the common function to detect LSM conditionals in a rule
ima: Move comprehensive rule validation checks out of the token parser
ima: Use correct type for the args_p member of ima_rule_entry.lsm elements
ima: Shallow copy the args_p member of ima_rule_entry.lsm elements
ima: Fail rule parsing when appraise_flag=blacklist is unsupportable
ima: Fail rule parsing when the KEY_CHECK hook is combined with an invalid cond
ima: Fail rule parsing when the KEXEC_CMDLINE hook is combined with an invalid cond
ima: Fail rule parsing when buffer hook functions have an invalid action
ima: Free the entire rule if it fails to parse
ima: Free the entire rule when deleting a list of rules
ima: Have the LSM free its audit rule
IMA: Add audit log for failure conditions
integrity: Add errno field in audit message


bfdd5aaa 06-Aug-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'Smack-for-5.9' of git://github.com/cschaufler/smack-next

Pull smack updates from Casey Schaufler:
"Minor fixes to Smack for the v5.9 release.

All were found by automated checkers and have straightforward
resolution"

* tag 'Smack-for-5.9' of git://github.com/cschaufler/smack-next:
Smack: prevent underflow in smk_set_cipso()
Smack: fix another vsscanf out of bounds
Smack: fix use-after-free in smk_write_relabel_self()


74858abb 04-Aug-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'cap-checkpoint-restore-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull checkpoint-restore updates from Christian Brauner:
"This enables unprivileged checkpoint/restore of processes.

Given that this work has been going on for quite some time the first
sentence in this summary is hopefully more exciting than the actual
final code changes required. Unprivileged checkpoint/restore has seen
a frequent increase in interest over the last two years and has thus
been one of the main topics for the combined containers &
checkpoint/restore microconference since at least 2018 (cf. [1]).

Here are just the three most frequent use-cases that were brought forward:

- The JVM developers are integrating checkpoint/restore into a Java
VM to significantly decrease the startup time.

- In high-performance computing environment a resource manager will
typically be distributing jobs where users are always running as
non-root. Long-running and "large" processes with significant
startup times are supposed to be checkpointed and restored with
CRIU.

- Container migration as a non-root user.

In all of these scenarios it is either desirable or required to run
without CAP_SYS_ADMIN. The userspace implementation of
checkpoint/restore CRIU already has the pull request for supporting
unprivileged checkpoint/restore up (cf. [2]).

To enable unprivileged checkpoint/restore a new dedicated capability
CAP_CHECKPOINT_RESTORE is introduced. This solution has last been
discussed in 2019 in a talk by Google at Linux Plumbers (cf. [1]
"Update on Task Migration at Google Using CRIU") with Adrian and
Nicolas providing the implementation now over the last months. In
essence, this allows the CRIU binary to be installed with the
CAP_CHECKPOINT_RESTORE vfs capability set thereby enabling
unprivileged users to restore processes.

To make this possible the following permissions are altered:

- Selecting a specific PID via clone3() set_tid relaxed from userns
CAP_SYS_ADMIN to CAP_CHECKPOINT_RESTORE.

- Selecting a specific PID via /proc/sys/kernel/ns_last_pid relaxed
from userns CAP_SYS_ADMIN to CAP_CHECKPOINT_RESTORE.

- Accessing /proc/pid/map_files relaxed from init userns
CAP_SYS_ADMIN to init userns CAP_CHECKPOINT_RESTORE.

- Changing /proc/self/exe from userns CAP_SYS_ADMIN to userns
CAP_CHECKPOINT_RESTORE.

Of these four changes the /proc/self/exe change deserves a few words
because the reasoning behind even restricting /proc/self/exe changes
in the first place is just full of historical quirks and tracking this
down was a questionable version of fun that I'd like to spare others.

In short, it is trivial to change /proc/self/exe as an unprivileged
user, i.e. without userns CAP_SYS_ADMIN right now. Either via ptrace()
or by simply intercepting the elf loader in userspace during exec.
Nicolas was nice enough to even provide a POC for the latter (cf. [3])
to illustrate this fact.

The original patchset which introduced PR_SET_MM_MAP had no
permissions around changing the exe link. They too argued that it is
trivial to spoof the exe link already which is true. The argument
brought up against this was that the Tomoyo LSM uses the exe link in
tomoyo_manager() to detect whether the calling process is a policy
manager. This caused changing the exe links to be guarded by userns
CAP_SYS_ADMIN.

All in all this rather seems like a "better guard it with something
rather than nothing" argument which imho doesn't qualify as a great
security policy. Again, because spoofing the exe link is possible for
the calling process so even if this were security relevant it was
broken back then and would be broken today. So technically, dropping
all permissions around changing the exe link would probably be
possible and would send a clearer message to any userspace that relies
on /proc/self/exe for security reasons that they should stop doing
this but for now we're only relaxing the exe link permissions from
userns CAP_SYS_ADMIN to userns CAP_CHECKPOINT_RESTORE.

There's a final uapi change in here. Changing the exe link used to
accidently return EINVAL when the caller lacked the necessary
permissions instead of the more correct EPERM. This pr contains a
commit fixing this. I assume that userspace won't notice or care and
if they do I will revert this commit. But since we are changing the
permissions anyway it seems like a good opportunity to try this fix.

With these changes merged unprivileged checkpoint/restore will be
possible and has already been tested by various users"

[1] LPC 2018
1. "Task Migration at Google Using CRIU"
https://www.youtube.com/watch?v=yI_1cuhoDgA&t=12095
2. "Securely Migrating Untrusted Workloads with CRIU"
https://www.youtube.com/watch?v=yI_1cuhoDgA&t=14400
LPC 2019
1. "CRIU and the PID dance"
https://www.youtube.com/watch?v=LN2CUgp8deo&list=PLVsQ_xZBEyN30ZA3Pc9MZMFzdjwyz26dO&index=9&t=2m48s
2. "Update on Task Migration at Google Using CRIU"
https://www.youtube.com/watch?v=LN2CUgp8deo&list=PLVsQ_xZBEyN30ZA3Pc9MZMFzdjwyz26dO&index=9&t=1h2m8s

[2] https://github.com/checkpoint-restore/criu/pull/1155

[3] https://github.com/nviennot/run_as_exe

* tag 'cap-checkpoint-restore-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests: add clone3() CAP_CHECKPOINT_RESTORE test
prctl: exe link permission error changed from -EINVAL to -EPERM
prctl: Allow local CAP_CHECKPOINT_RESTORE to change /proc/self/exe
proc: allow access in init userns for map_files with CAP_CHECKPOINT_RESTORE
pid_namespace: use checkpoint_restore_ns_capable() for ns_last_pid
pid: use checkpoint_restore_ns_capable() for set_tid
capabilities: Introduce CAP_CHECKPOINT_RESTORE


3950e975 04-Aug-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull execve updates from Eric Biederman:
"During the development of v5.7 I ran into bugs and quality of
implementation issues related to exec that could not be easily fixed
because of the way exec is implemented. So I have been diggin into
exec and cleaning up what I can.

This cycle I have been looking at different ideas and different
implementations to see what is possible to improve exec, and cleaning
the way exec interfaces with in kernel users. Only cleaning up the
interfaces of exec with rest of the kernel has managed to stabalize
and make it through review in time for v5.9-rc1 resulting in 2 sets of
changes this cycle.

- Implement kernel_execve

- Make the user mode driver code a better citizen

With kernel_execve the code size got a little larger as the copying of
parameters from userspace and copying of parameters from userspace is
now separate. The good news is kernel threads no longer need to play
games with set_fs to use exec. Which when combined with the rest of
Christophs set_fs changes should security bugs with set_fs much more
difficult"

* 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits)
exec: Implement kernel_execve
exec: Factor bprm_stack_limits out of prepare_arg_pages
exec: Factor bprm_execve out of do_execve_common
exec: Move bprm_mm_init into alloc_bprm
exec: Move initialization of bprm->filename into alloc_bprm
exec: Factor out alloc_bprm
exec: Remove unnecessary spaces from binfmts.h
umd: Stop using split_argv
umd: Remove exit_umh
bpfilter: Take advantage of the facilities of struct pid
exit: Factor thread_group_exited out of pidfd_poll
umd: Track user space drivers with struct pid
bpfilter: Move bpfilter_umh back into init data
exec: Remove do_execve_file
umh: Stop calling do_execve_file
umd: Transform fork_usermode_blob into fork_usermode_driver
umd: Rename umd_info.cmdline umd_info.driver_name
umd: For clarity rename umh_info umd_info
umh: Separate the user mode driver and the user mode helper support
umh: Remove call_usermodehelper_setup_file.
...


fd76a74d 04-Aug-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'audit-pr-20200803' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
"Aside from some smaller bug fixes, here are the highlights:

- add a new backlog wait metric to the audit status message, this is
intended to help admins determine how long processes have been
waiting for the audit backlog queue to clear

- generate audit records for nftables configuration changes

- generate CWD audit records for for the relevant LSM audit records"

* tag 'audit-pr-20200803' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: report audit wait metric in audit status reply
audit: purge audit_log_string from the intra-kernel audit API
audit: issue CWD record to accompany LSM_AUDIT_DATA_* records
audit: use the proper gfp flags in the audit_log_nfcfg() calls
audit: remove unused !CONFIG_AUDITSYSCALL __audit_inode* stubs
audit: add gfp parameter to audit_log_nfcfg
audit: log nftables configuration change events
audit: Use struct_size() helper in alloc_chunk


49e917de 04-Aug-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20200803' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:
"Beyond the usual smattering of bug fixes, we've got three small
improvements worth highlighting:

- improved SELinux policy symbol table performance due to a reworking
of the insert and search functions

- allow reading of SELinux labels before the policy is loaded,
allowing for some more "exotic" initramfs approaches

- improved checking an error reporting about process
class/permissions during SELinux policy load"

* tag 'selinux-pr-20200803' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: complete the inlining of hashtab functions
selinux: prepare for inlining of hashtab functions
selinux: specialize symtab insert and search functions
selinux: Fix spelling mistakes in the comments
selinux: fixed a checkpatch warning with the sizeof macro
selinux: log error messages on required process class / permissions
scripts/selinux/mdp: fix initial SID handling
selinux: allow reading labels before policy is loaded


5b5d3be5 04-Aug-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'var-init-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull automatic variable initialization updates from Kees Cook:
"This adds the "zero" init option from Clang, which is being used
widely in production builds of Android and Chrome OS (though it also
keeps the "pattern" init, which is better for debug builds).

- Introduce CONFIG_INIT_STACK_ALL_ZERO (Alexander Potapenko)"

* tag 'var-init-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
security: allow using Clang's zero initialization for stack variables


382625d0 03-Aug-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block

Pull core block updates from Jens Axboe:
"Good amount of cleanups and tech debt removals in here, and as a
result, the diffstat shows a nice net reduction in code.

- Softirq completion cleanups (Christoph)

- Stop using ->queuedata (Christoph)

- Cleanup bd claiming (Christoph)

- Use check_events, moving away from the legacy media change
(Christoph)

- Use inode i_blkbits consistently (Christoph)

- Remove old unused writeback congestion bits (Christoph)

- Cleanup/unify submission path (Christoph)

- Use bio_uninit consistently, instead of bio_disassociate_blkg
(Christoph)

- sbitmap cleared bits handling (John)

- Request merging blktrace event addition (Jan)

- sysfs add/remove race fixes (Luis)

- blk-mq tag fixes/optimizations (Ming)

- Duplicate words in comments (Randy)

- Flush deferral cleanup (Yufen)

- IO context locking/retry fixes (John)

- struct_size() usage (Gustavo)

- blk-iocost fixes (Chengming)

- blk-cgroup IO stats fixes (Boris)

- Various little fixes"

* tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block: (135 commits)
block: blk-timeout: delete duplicated word
block: blk-mq-sched: delete duplicated word
block: blk-mq: delete duplicated word
block: genhd: delete duplicated words
block: elevator: delete duplicated word and fix typos
block: bio: delete duplicated words
block: bfq-iosched: fix duplicated word
iocost_monitor: start from the oldest usage index
iocost: Fix check condition of iocg abs_vdebt
block: Remove callback typedefs for blk_mq_ops
block: Use non _rcu version of list functions for tag_set_list
blk-cgroup: show global disk stats in root cgroup io.stat
blk-cgroup: make iostat functions visible to stat printing
block: improve discard bio alignment in __blkdev_issue_discard()
block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers
block: defer flush request no matter whether we have elevator
block: make blk_timeout_init() static
block: remove retry loop in ioc_release_fn()
block: remove unnecessary ioc nested locking
block: integrate bd_start_claiming into __blkdev_get
...


3db0d0c2 01-Jul-2020 Colin Ian King <colin.king@canonical.com>

integrity: remove redundant initialization of variable ret

The variable ret is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.

Fixes: eb5798f2e28f ("integrity: convert digsig to akcipher api")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Addresses-Coverity: ("Unused value")
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

42a2df3e 23-Jul-2020 Dan Carpenter <dan.carpenter@oracle.com>

Smack: prevent underflow in smk_set_cipso()

We have an upper bound on "maplevel" but forgot to check for negative
values.

Fixes: e114e473771c ("Smack: Simplified Mandatory Access Control Kernel")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

a6bd4f6d 23-Jul-2020 Dan Carpenter <dan.carpenter@oracle.com>

Smack: fix another vsscanf out of bounds

This is similar to commit 84e99e58e8d1 ("Smack: slab-out-of-bounds in
vsscanf") where we added a bounds check on "rule".

Reported-by: syzbot+a22c6092d003d6fe1122@syzkaller.appspotmail.com
Fixes: f7112e6c9abf ("Smack: allow for significantly longer Smack labels v4")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

f1d9b23c 13-Jul-2020 Richard Guy Briggs <rgb@redhat.com>

audit: purge audit_log_string from the intra-kernel audit API

audit_log_string() was inteded to be an internal audit function and
since there are only two internal uses, remove them. Purge all external
uses of it by restructuring code to use an existing audit_log_format()
or using audit_log_format().

Please see the upstream issue
https://github.com/linux-audit/audit-kernel/issues/84

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

be619f7f 12-Jul-2020 Eric W. Biederman <ebiederm@xmission.com>

exec: Implement kernel_execve

To allow the kernel not to play games with set_fs to call exec
implement kernel_execve. The function kernel_execve takes pointers
into kernel memory and copies the values pointed to onto the new
userspace stack.

The calls with arguments from kernel space of do_execve are replaced
with calls to kernel_execve.

The calls do_execve and do_execveat are made static as there are now
no callers outside of exec.

The comments that mention do_execve are updated to refer to
kernel_execve or execve depending on the circumstances. In addition
to correcting the comments, this makes it easy to grep for do_execve
and verify it is not used.

Inspired-by: https://lkml.kernel.org/r/20200627072704.2447163-1-hch@lst.de
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/87wo365ikj.fsf@x220.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

311aa6aa 13-Jul-2020 Bruno Meneguele <bmeneg@redhat.com>

ima: move APPRAISE_BOOTPARAM dependency on ARCH_POLICY to runtime

The IMA_APPRAISE_BOOTPARAM config allows enabling different "ima_appraise="
modes - log, fix, enforce - at run time, but not when IMA architecture
specific policies are enabled.  This prevents properly labeling the
filesystem on systems where secure boot is supported, but not enabled on the
platform.  Only when secure boot is actually enabled should these IMA
appraise modes be disabled.

This patch removes the compile time dependency and makes it a runtime
decision, based on the secure boot state of that platform.

Test results as follows:

-> x86-64 with secure boot enabled

[ 0.015637] Kernel command line: <...> ima_policy=appraise_tcb ima_appraise=fix
[ 0.015668] ima: Secure boot enabled: ignoring ima_appraise=fix boot parameter option

-> powerpc with secure boot disabled

[ 0.000000] Kernel command line: <...> ima_policy=appraise_tcb ima_appraise=fix
[ 0.000000] Secure boot mode disabled

-> Running the system without secure boot and with both options set:

CONFIG_IMA_APPRAISE_BOOTPARAM=y
CONFIG_IMA_ARCH_POLICY=y

Audit prompts "missing-hash" but still allow execution and, consequently,
filesystem labeling:

type=INTEGRITY_DATA msg=audit(07/09/2020 12:30:27.778:1691) : pid=4976
uid=root auid=root ses=2
subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 op=appraise_data
cause=missing-hash comm=bash name=/usr/bin/evmctl dev="dm-0" ino=493150
res=no

Cc: stable@vger.kernel.org
Fixes: d958083a8f64 ("x86/ima: define arch_get_ima_policy() for x86")
Signed-off-by: Bruno Meneguele <bmeneg@redhat.com>
Cc: stable@vger.kernel.org # 5.0
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

1768215a 23-Jun-2020 Tyler Hicks <tyhicks@linux.microsoft.com>

ima: AppArmor satisfies the audit rule requirements

AppArmor meets all the requirements for IMA in terms of audit rules
since commit e79c26d04043 ("apparmor: Add support for audit rule
filtering"). Update IMA's Kconfig section for CONFIG_IMA_LSM_RULES to
reflect this.

Fixes: e79c26d04043 ("apparmor: Add support for audit rule filtering")
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

b8867eed 10-Jul-2020 Tyler Hicks <tyhicks@linux.microsoft.com>

ima: Rename internal filter rule functions

Rename IMA's internal filter rule functions from security_filter_rule_*()
to ima_filter_rule_*(). This avoids polluting the security_* namespace,
which is typically reserved for general security subsystem
infrastructure.

Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Suggested-by: Casey Schaufler <casey@schaufler-ca.com>
[zohar@linux.ibm.com: reword using the term "filter", not "audit"]
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

4834177e 09-Jul-2020 Tyler Hicks <tyhicks@linux.microsoft.com>

ima: Support additional conditionals in the KEXEC_CMDLINE hook function

Take the properties of the kexec kernel's inode and the current task
ownership into consideration when matching a KEXEC_CMDLINE operation to
the rules in the IMA policy. This allows for some uniformity when
writing IMA policy rules for KEXEC_KERNEL_CHECK, KEXEC_INITRAMFS_CHECK,
and KEXEC_CMDLINE operations.

Prior to this patch, it was not possible to write a set of rules like
this:

dont_measure func=KEXEC_KERNEL_CHECK obj_type=foo_t
dont_measure func=KEXEC_INITRAMFS_CHECK obj_type=foo_t
dont_measure func=KEXEC_CMDLINE obj_type=foo_t
measure func=KEXEC_KERNEL_CHECK
measure func=KEXEC_INITRAMFS_CHECK
measure func=KEXEC_CMDLINE

The inode information associated with the kernel being loaded by a
kexec_kernel_load(2) syscall can now be included in the decision to
measure or not

Additonally, the uid, euid, and subj_* conditionals can also now be
used in KEXEC_CMDLINE rules. There was no technical reason as to why
those conditionals weren't being considered previously other than
ima_match_rules() didn't have a valid inode to use so it immediately
bailed out for KEXEC_CMDLINE operations rather than going through the
full list of conditional comparisons.

Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: kexec@lists.infradead.org
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

592b24cb 09-Jul-2020 Tyler Hicks <tyhicks@linux.microsoft.com>

ima: Use the common function to detect LSM conditionals in a rule

Make broader use of ima_rule_contains_lsm_cond() to check if a given
rule contains an LSM conditional. This is a code cleanup and has no
user-facing change.

Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

30031b0e 09-Jul-2020 Tyler Hicks <tyhicks@linux.microsoft.com>

ima: Move comprehensive rule validation checks out of the token parser

Use ima_validate_rule(), at the end of the token parsing stage, to
verify combinations of actions, hooks, and flags. This is useful to
increase readability by consolidating such checks into a single function
and also because rule conditionals can be specified in arbitrary order
making it difficult to do comprehensive rule validation until the entire
rule has been parsed.

This allows for the check that ties together the "keyrings" conditional
with the KEY_CHECK function hook to be moved into the final rule
validation.

The modsig check no longer needs to compiled conditionally because the
token parser will ensure that modsig support is enabled before accepting
"imasig|modsig" appraise type values. The final rule validation will
ensure that appraise_type and appraise_flag options are only present in
appraise rules.

Finally, this allows for the check that ties together the "pcr"
conditional with the measure action to be moved into the final rule
validation.

Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

aa0c0227 09-Jul-2020 Tyler Hicks <tyhicks@linux.microsoft.com>

ima: Use correct type for the args_p member of ima_rule_entry.lsm elements

Make args_p be of the char pointer type rather than have it be a void
pointer that gets casted to char pointer when it is used. It is a simple
NUL-terminated string as returned by match_strdup().

Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

39e5993d 09-Jul-2020 Tyler Hicks <tyhicks@linux.microsoft.com>

ima: Shallow copy the args_p member of ima_rule_entry.lsm elements

The args_p member is a simple string that is allocated by
ima_rule_init(). Shallow copy it like other non-LSM references in
ima_rule_entry structs.

There are no longer any necessary error path cleanups to do in
ima_lsm_copy_rule().

Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

5f3e9265 09-Jul-2020 Tyler Hicks <tyhicks@linux.microsoft.com>

ima: Fail rule parsing when appraise_flag=blacklist is unsupportable

Verifying that a file hash is not blacklisted is currently only
supported for files with appended signatures (modsig). In the future,
this might change.

For now, the "appraise_flag" option is only appropriate for appraise
actions and its "blacklist" value is only appropriate when
CONFIG_IMA_APPRAISE_MODSIG is enabled and "appraise_flag=blacklist" is
only appropriate when "appraise_type=imasig|modsig" is also present.
Make this clear at policy load so that IMA policy authors don't assume
that other uses of "appraise_flag=blacklist" are supported.

Fixes: 273df864cf74 ("ima: Check against blacklisted hashes for files with modsig")
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reivewed-by: Nayna Jain <nayna@linux.ibm.com>
Tested-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

124ea650 18-Jul-2020 Adrian Reber <areber@redhat.com>

capabilities: Introduce CAP_CHECKPOINT_RESTORE

This patch introduces CAP_CHECKPOINT_RESTORE, a new capability facilitating
checkpoint/restore for non-root users.

Over the last years, The CRIU (Checkpoint/Restore In Userspace) team has
been asked numerous times if it is possible to checkpoint/restore a
process as non-root. The answer usually was: 'almost'.

The main blocker to restore a process as non-root was to control the PID
of the restored process. This feature available via the clone3 system
call, or via /proc/sys/kernel/ns_last_pid is unfortunately guarded by
CAP_SYS_ADMIN.

In the past two years, requests for non-root checkpoint/restore have
increased due to the following use cases:
* Checkpoint/Restore in an HPC environment in combination with a
resource manager distributing jobs where users are always running as
non-root. There is a desire to provide a way to checkpoint and
restore long running jobs.
* Container migration as non-root
* We have been in contact with JVM developers who are integrating
CRIU into a Java VM to decrease the startup time. These
checkpoint/restore applications are not meant to be running with
CAP_SYS_ADMIN.

We have seen the following workarounds:
* Use a setuid wrapper around CRIU:
See https://github.com/FredHutch/slurm-examples/blob/master/checkpointer/lib/checkpointer/checkpointer-suid.c
* Use a setuid helper that writes to ns_last_pid.
Unfortunately, this helper delegation technique is impossible to use
with clone3, and is thus prone to races.
See https://github.com/twosigma/set_ns_last_pid
* Cycle through PIDs with fork() until the desired PID is reached:
This has been demonstrated to work with cycling rates of 100,000 PIDs/s
See https://github.com/twosigma/set_ns_last_pid
* Patch out the CAP_SYS_ADMIN check from the kernel
* Run the desired application in a new user and PID namespace to provide
a local CAP_SYS_ADMIN for controlling PIDs. This technique has limited
use in typical container environments (e.g., Kubernetes) as /proc is
typically protected with read-only layers (e.g., /proc/sys) for
hardening purposes. Read-only layers prevent additional /proc mounts
(due to proc's SB_I_USERNS_VISIBLE property), making the use of new
PID namespaces limited as certain applications need access to /proc
matching their PID namespace.

The introduced capability allows to:
* Control PIDs when the current user is CAP_CHECKPOINT_RESTORE capable
for the corresponding PID namespace via ns_last_pid/clone3.
* Open files in /proc/pid/map_files when the current user is
CAP_CHECKPOINT_RESTORE capable in the root namespace, useful for
recovering files that are unreachable via the file system such as
deleted files, or memfd files.

See corresponding selftest for an example with clone3().

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200719100418.2112740-2-areber@redhat.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

eb624fe2 09-Jul-2020 Tyler Hicks <tyhicks@linux.microsoft.com>

ima: Fail rule parsing when the KEY_CHECK hook is combined with an invalid cond

The KEY_CHECK function only supports the uid, pcr, and keyrings
conditionals. Make this clear at policy load so that IMA policy authors
don't assume that other conditionals are supported.

Fixes: 5808611cccb2 ("IMA: Add KEY_CHECK func to measure keys")
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

db2045f5 09-Jul-2020 Tyler Hicks <tyhicks@linux.microsoft.com>

ima: Fail rule parsing when the KEXEC_CMDLINE hook is combined with an invalid cond

The KEXEC_CMDLINE hook function only supports the pcr conditional. Make
this clear at policy load so that IMA policy authors don't assume that
other conditionals are supported.

Since KEXEC_CMDLINE's inception, ima_match_rules() has always returned
true on any loaded KEXEC_CMDLINE rule without any consideration for
other conditionals present in the rule. Make it clear that pcr is the
only supported KEXEC_CMDLINE conditional by returning an error during
policy load.

An example of why this is a problem can be explained with the following
rule:

dont_measure func=KEXEC_CMDLINE obj_type=foo_t

An IMA policy author would have assumed that rule is valid because the
parser accepted it but the result was that measurements for all
KEXEC_CMDLINE operations would be disabled.

Fixes: b0935123a183 ("IMA: Define a new hook to measure the kexec boot command line arguments")
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

71218343 09-Jul-2020 Tyler Hicks <tyhicks@linux.microsoft.com>

ima: Fail rule parsing when buffer hook functions have an invalid action

Buffer based hook functions, such as KEXEC_CMDLINE and KEY_CHECK, can
only measure. The process_buffer_measurement() function quietly ignores
all actions except measure so make this behavior clear at the time of
policy load.

The parsing of the keyrings conditional had a check to ensure that it
was only specified with measure actions but the check should be on the
hook function and not the keyrings conditional since
"appraise func=KEY_CHECK" is not a valid rule.

Fixes: b0935123a183 ("IMA: Define a new hook to measure the kexec boot command line arguments")
Fixes: 5808611cccb2 ("IMA: Add KEY_CHECK func to measure keys")
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

2bdd737c 09-Jul-2020 Tyler Hicks <tyhicks@linux.microsoft.com>

ima: Free the entire rule if it fails to parse

Use ima_free_rule() to fix memory leaks of allocated ima_rule_entry
members, such as .fsname and .keyrings, when an error is encountered
during rule parsing.

Set the args_p pointer to NULL after freeing it in the error path of
ima_lsm_rule_init() so that it isn't freed twice.

This fixes a memory leak seen when loading an rule that contains an
additional piece of allocated memory, such as an fsname, followed by an
invalid conditional:

# echo "measure fsname=tmpfs bad=cond" > /sys/kernel/security/ima/policy
-bash: echo: write error: Invalid argument
# echo scan > /sys/kernel/debug/kmemleak
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff98e7e4ece6c0 (size 8):
comm "bash", pid 672, jiffies 4294791843 (age 21.855s)
hex dump (first 8 bytes):
74 6d 70 66 73 00 6b a5 tmpfs.k.
backtrace:
[<00000000abab7413>] kstrdup+0x2e/0x60
[<00000000f11ede32>] ima_parse_add_rule+0x7d4/0x1020
[<00000000f883dd7a>] ima_write_policy+0xab/0x1d0
[<00000000b17cf753>] vfs_write+0xde/0x1d0
[<00000000b8ddfdea>] ksys_write+0x68/0xe0
[<00000000b8e21e87>] do_syscall_64+0x56/0xa0
[<0000000089ea7b98>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: f1b08bbcbdaf ("ima: define a new policy condition based on the filesystem name")
Fixes: 2b60c0ecedf8 ("IMA: Read keyrings= option from the IMA policy")
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

465aee77 09-Jul-2020 Tyler Hicks <tyhicks@linux.microsoft.com>

ima: Free the entire rule when deleting a list of rules

Create a function, ima_free_rule(), to free all memory associated with
an ima_rule_entry. Use the new function to fix memory leaks of allocated
ima_rule_entry members, such as .fsname and .keyrings, when deleting a
list of rules.

Make the existing ima_lsm_free_rule() function specific to the LSM
audit rule array of an ima_rule_entry and require that callers make an
additional call to kfree to free the ima_rule_entry itself.

This fixes a memory leak seen when loading by a valid rule that contains
an additional piece of allocated memory, such as an fsname, followed by
an invalid rule that triggers a policy load failure:

# echo -e "dont_measure fsname=securityfs\nbad syntax" > \
/sys/kernel/security/ima/policy
-bash: echo: write error: Invalid argument
# echo scan > /sys/kernel/debug/kmemleak
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff9bab67ca12c0 (size 16):
comm "bash", pid 684, jiffies 4295212803 (age 252.344s)
hex dump (first 16 bytes):
73 65 63 75 72 69 74 79 66 73 00 6b 6b 6b 6b a5 securityfs.kkkk.
backtrace:
[<00000000adc80b1b>] kstrdup+0x2e/0x60
[<00000000d504cb0d>] ima_parse_add_rule+0x7d4/0x1020
[<00000000444825ac>] ima_write_policy+0xab/0x1d0
[<000000002b7f0d6c>] vfs_write+0xde/0x1d0
[<0000000096feedcf>] ksys_write+0x68/0xe0
[<0000000052b544a2>] do_syscall_64+0x56/0xa0
[<000000007ead1ba7>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: f1b08bbcbdaf ("ima: define a new policy condition based on the filesystem name")
Fixes: 2b60c0ecedf8 ("IMA: Read keyrings= option from the IMA policy")
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

9ff8a616 09-Jul-2020 Tyler Hicks <tyhicks@linux.microsoft.com>

ima: Have the LSM free its audit rule

Ask the LSM to free its audit rule rather than directly calling kfree().
Both AppArmor and SELinux do additional work in their audit_rule_free()
hooks. Fix memory leaks by allowing the LSMs to perform necessary work.

Fixes: b16942455193 ("ima: use the lsm policy update notifier")
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Janne Karhunen <janne.karhunen@gmail.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

34e980bb 18-Jun-2020 Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

IMA: Add audit log for failure conditions

process_buffer_measurement() and ima_alloc_key_entry() functions need to
log an audit message for auditing integrity measurement failures.

Add audit message in these two functions. Remove "pr_devel" log message
in process_buffer_measurement().

Sample audit messages:

[ 6.303048] audit: type=1804 audit(1592506281.627:2): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel op=measuring_key cause=ENOMEM comm="swapper/0" name=".builtin_trusted_keys" res=0 errno=-12

[ 8.019432] audit: type=1804 audit(1592506283.344:10): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 op=measuring_kexec_cmdline cause=hashing_error comm="systemd" name="kexec-cmdline" res=0 errno=-22

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

2f845882 18-Jun-2020 Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

integrity: Add errno field in audit message

Error code is not included in the audit messages logged by
the integrity subsystem.

Define a new function integrity_audit_message() that takes error code
in the "errno" parameter. Add "errno" field in the audit messages logged
by the integrity subsystem and set the value passed in the "errno"
parameter.

[ 6.303048] audit: type=1804 audit(1592506281.627:2): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel op=measuring_key cause=ENOMEM comm="swapper/0" name=".builtin_trusted_keys" res=0 errno=-12

[ 7.987647] audit: type=1802 audit(1592506283.312:9): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 op=policy_update cause=completed comm="systemd" res=1 errno=0

[ 8.019432] audit: type=1804 audit(1592506283.344:10): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 op=measuring_kexec_cmdline cause=hashing_error comm="systemd" name="kexec-cmdline" res=0 errno=-22

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Steve Grubb <sgrubb@redhat.com>
Suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

beb4ee67 08-Jul-2020 Eric Biggers <ebiggers@google.com>

Smack: fix use-after-free in smk_write_relabel_self()

smk_write_relabel_self() frees memory from the task's credentials with
no locking, which can easily cause a use-after-free because multiple
tasks can share the same credentials structure.

Fix this by using prepare_creds() and commit_creds() to correctly modify
the task's credentials.

Reproducer for "BUG: KASAN: use-after-free in smk_write_relabel_self":

#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

static void *thrproc(void *arg)
{
int fd = open("/sys/fs/smackfs/relabel-self", O_WRONLY);
for (;;) write(fd, "foo", 3);
}

int main()
{
pthread_t t;
pthread_create(&t, NULL, thrproc, NULL);
thrproc(NULL);
}

Reported-by: syzbot+e6416dabb497a650da40@syzkaller.appspotmail.com
Fixes: 38416e53936e ("Smack: limited capability for changing process label")
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

54b27f92 09-Jul-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: complete the inlining of hashtab functions

Move (most of) the definitions of hashtab_search() and hashtab_insert()
to the header file. In combination with the previous patch, this avoids
calling the callbacks indirectly by function pointers and allows for
better optimization, leading to a drastic performance improvement of
these operations.

With this patch, I measured a speed up in the following areas (measured
on x86_64 F32 VM with 4 CPUs):
1. Policy load (`load_policy`) - takes ~150 ms instead of ~230 ms.
2. `chcon -R unconfined_u:object_r:user_tmp_t:s0:c381,c519 /tmp/linux-src`
where /tmp/linux-src is an extracted linux-5.7 source tarball -
takes ~522 ms instead of ~576 ms. This is because of many
symtab_search() calls in string_to_context_struct() when there are
many categories specified in the context.
3. `stress-ng --msg 1 --msg-ops 10000000` - takes 12.41 s instead of
13.95 s (consumes 18.6 s of kernel CPU time instead of 21.6 s).
This is thanks to security_transition_sid() being ~43% faster after
this patch.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

24def7bb 09-Jul-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: prepare for inlining of hashtab functions

Refactor searching and inserting into hashtabs to pave the way for
converting hashtab_search() and hashtab_insert() to inline functions in
the next patch. This will avoid indirect calls and allow the compiler to
better optimize individual callers, leading to a significant performance
improvement.

In order to avoid the indirect calls, the key hashing and comparison
callbacks need to be extracted from the hashtab struct and passed
directly to hashtab_search()/_insert() by the callers so that the
callback address is always known at compile time. The kernel's
rhashtable library (<linux/rhashtable*.h>) does the same thing.

This of course makes the hashtab functions slightly easier to misuse by
passing a wrong callback set, but unfortunately there is no better way
to implement a hash table that is both generic and efficient in C. This
patch tries to somewhat mitigate this by only calling the hashtab
functions in the same file where the corresponding callbacks are
defined (wrapping them into more specialized functions as needed).

Note that this patch doesn't bring any benefit without also moving the
definitions of hashtab_search() and -_insert() to the header file, which
is done in a follow-up patch for easier review of the hashtab.c changes
in this patch.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

237389e3 08-Jul-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: specialize symtab insert and search functions

This encapsulates symtab a little better and will help with further
refactoring later.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

d7481b24 02-Jul-2020 Richard Guy Briggs <rgb@redhat.com>

audit: issue CWD record to accompany LSM_AUDIT_DATA_* records

The LSM_AUDIT_DATA_* records for PATH, FILE, IOCTL_OP, DENTRY and INODE
are incomplete without the task context of the AUDIT Current Working
Directory record. Add it.

This record addition can't use audit_dummy_context to determine whether
or not to store the record information since the LSM_AUDIT_DATA_*
records are initiated by various LSMs independent of any audit rules.
context->in_syscall is used to determine if it was called in user
context like audit_getname.

Please see the upstream issue
https://github.com/linux-audit/audit-kernel/issues/96

Adapted from Vladis Dronov's v2 patch.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

2c3d8dfe 06-Jul-2020 lihao <fly.lihao@huawei.com>

selinux: Fix spelling mistakes in the comments

Fix spelling mistakes in the comments
quering==>querying

Signed-off-by: lihao <fly.lihao@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

a1f9b1c0 08-May-2020 Christoph Hellwig <hch@lst.de>

integrity/ima: switch to using __kernel_read

__kernel_read has a bunch of additional sanity checks, and this moves
the set_fs out of non-core code.

Signed-off-by: Christoph Hellwig <hch@lst.de>

615bc218 30-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'fixes-v5.8-rc3-a' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull security subsystem fixes from James Morris:
"Two simple fixes for v5.8:

- Fix hook iteration and default value for inode_copy_up_xattr
(KP Singh)

- Fix the key_permission LSM hook function type (Sami Tolvanen)"

* tag 'fixes-v5.8-rc3-a' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
security: Fix hook iteration and default value for inode_copy_up_xattr
security: fix the key_permission LSM hook function type


65d96351 23-Jun-2020 Ethan Edwards <ethancarteredwards@gmail.com>

selinux: fixed a checkpatch warning with the sizeof macro

`sizeof buf` changed to `sizeof(buf)`

Signed-off-by: Ethan Edwards <ethancarteredwards@gmail.com>
[PM: rewrote the subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>

20c59ce0 23-Jun-2020 Maurizio Drocco <maurizio.drocco@ibm.com>

ima: extend boot_aggregate with kernel measurements

Registers 8-9 are used to store measurements of the kernel and its
command line (e.g., grub2 bootloader with tpm module enabled). IMA
should include them in the boot aggregate. Registers 8-9 should be
only included in non-SHA1 digests to avoid ambiguity.

Signed-off-by: Maurizio Drocco <maurizio.drocco@ibm.com>
Reviewed-by: Bruno Meneguele <bmeneg@redhat.com>
Tested-by: Bruno Meneguele <bmeneg@redhat.com> (TPM 1.2, TPM 2.0)
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

3f1266f1 20-Jun-2020 Christoph Hellwig <hch@lst.de>

block: move block-related definitions out of fs.h

Move most of the block related definition out of fs.h into more suitable
headers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7383c0f9 17-Jun-2020 Stephen Smalley <stephen.smalley.work@gmail.com>

selinux: log error messages on required process class / permissions

In general SELinux no longer treats undefined object classes or permissions
in the policy as a fatal error, instead handling them in accordance with
handle_unknown. However, the process class and process transition and
dyntransition permissions are still required to be defined due to
dependencies on these definitions for default labeling behaviors,
role and range transitions in older policy versions that lack an explicit
class field, and role allow checking. Log error messages in these cases
since otherwise the policy load will fail silently with no indication
to the user as to the underlying cause. While here, fix the checking for
process transition / dyntransition so that omitting either permission is
handled as an error; both are needed in order to ensure that role allow
checking is consistently applied.

Reported-by: bauen1 <j2468h@googlemail.com>
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

c8e22261 28-May-2020 Jonathan Lebon <jlebon@redhat.com>

selinux: allow reading labels before policy is loaded

This patch does for `getxattr` what commit 3e3e24b42043 ("selinux: allow
labeling before policy is loaded") did for `setxattr`; it allows
querying the current SELinux label on disk before the policy is loaded.

One of the motivations described in that commit message also drives this
patch: for Fedora CoreOS (and eventually RHEL CoreOS), we want to be
able to move the root filesystem for example, from xfs to ext4 on RAID,
on first boot, at initrd time.[1]

Because such an operation works at the filesystem level, we need to be
able to read the SELinux labels first from the original root, and apply
them to the files of the new root. The previous commit enabled the
second part of this process; this commit enables the first part.

[1] https://github.com/coreos/fedora-coreos-tracker/issues/94

Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

23e390cd 21-Jun-2020 KP Singh <kpsingh@google.com>

security: Fix hook iteration and default value for inode_copy_up_xattr

inode_copy_up_xattr returns 0 to indicate the acceptance of the xattr
and 1 to reject it. If the LSM does not know about the xattr, it's
expected to return -EOPNOTSUPP, which is the correct default value for
this hook. BPF LSM, currently, uses 0 as the default value and thereby
falsely allows all overlay fs xattributes to be copied up.

The iteration logic is also updated from the "bail-on-fail"
call_int_hook to continue on the non-decisive -EOPNOTSUPP and bail out
on other values.

Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks")
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: James Morris <jmorris@namei.org>

817d914d 21-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20200621' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux fixes from Paul Moore:
"Three small patches to fix problems in the SELinux code, all found via
clang.

Two patches fix potential double-free conditions and one fixes an
undefined return value"

* tag 'selinux-pr-20200621' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix undefined return of cond_evaluate_expr
selinux: fix a double free in cond_read_node()/cond_read_list()
selinux: fix double free


8231b0b9 17-Jun-2020 Tom Rix <trix@redhat.com>

selinux: fix undefined return of cond_evaluate_expr

clang static analysis reports an undefined return

security/selinux/ss/conditional.c:79:2: warning: Undefined or garbage value returned to caller [core.uninitialized.UndefReturn]
return s[0];
^~~~~~~~~~~

static int cond_evaluate_expr( ...
{
u32 i;
int s[COND_EXPR_MAXDEPTH];

for (i = 0; i < expr->len; i++)
...

return s[0];

When expr->len is 0, the loop which sets s[0] never runs.

So return -1 if the loop never runs.

Cc: stable@vger.kernel.org
Signed-off-by: Tom Rix <trix@redhat.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

aa449a79 15-Jun-2020 Tom Rix <trix@redhat.com>

selinux: fix a double free in cond_read_node()/cond_read_list()

Clang static analysis reports this double free error

security/selinux/ss/conditional.c:139:2: warning: Attempt to free released memory [unix.Malloc]
kfree(node->expr.nodes);
^~~~~~~~~~~~~~~~~~~~~~~

When cond_read_node fails, it calls cond_node_destroy which frees the
node but does not poison the entry in the node list. So when it
returns to its caller cond_read_list, cond_read_list deletes the
partial list. The latest entry in the list will be deleted twice.

So instead of freeing the node in cond_read_node, let list freeing in
code_read_list handle the freeing the problem node along with all of the
earlier nodes.

Because cond_read_node no longer does any error handling, the goto's
the error case are redundant. Instead just return the error code.

Cc: stable@vger.kernel.org
Fixes: 60abd3181db2 ("selinux: convert cond_list to array")
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>

f0fe00d4 16-Jun-2020 glider@google.com <glider@google.com>

security: allow using Clang's zero initialization for stack variables

In addition to -ftrivial-auto-var-init=pattern (used by
CONFIG_INIT_STACK_ALL now) Clang also supports zero initialization for
locals enabled by -ftrivial-auto-var-init=zero. The future of this flag
is still being debated (see https://bugs.llvm.org/show_bug.cgi?id=45497).
Right now it is guarded by another flag,
-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang,
which means it may not be supported by future Clang releases. Another
possible resolution is that -ftrivial-auto-var-init=zero will persist
(as certain users have already started depending on it), but the name
of the guard flag will change.

In the meantime, zero initialization has proven itself as a good
production mitigation measure against uninitialized locals. Unlike pattern
initialization, which has a higher chance of triggering existing bugs,
zero initialization provides safe defaults for strings, pointers, indexes,
and sizes. On the other hand, pattern initialization remains safer for
return values. Chrome OS and Android are moving to using zero
initialization for production builds.

Performance-wise, the difference between pattern and zero initialization
is usually negligible, although the generated code for zero
initialization is more compact.

This patch renames CONFIG_INIT_STACK_ALL to CONFIG_INIT_STACK_ALL_PATTERN
and introduces another config option, CONFIG_INIT_STACK_ALL_ZERO, that
enables zero initialization for locals if the corresponding flags are
supported by Clang.

Cc: Kees Cook <keescook@chromium.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Alexander Potapenko <glider@google.com>
Link: https://lore.kernel.org/r/20200616083435.223038-1-glider@google.com
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>

eb492c62 28-May-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

ima: Replace zero-length array with flexible-array

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

4a87b197 14-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'LSM-add-setgid-hook-5.8-author-fix' of git://github.com/micah-morton/linux

Pull SafeSetID update from Micah Morton:
"Add additional LSM hooks for SafeSetID

SafeSetID is capable of making allow/deny decisions for set*uid calls
on a system, and we want to add similar functionality for set*gid
calls.

The work to do that is not yet complete, so probably won't make it in
for v5.8, but we are looking to get this simple patch in for v5.8
since we have it ready.

We are planning on the rest of the work for extending the SafeSetID
LSM being merged during the v5.9 merge window"

* tag 'LSM-add-setgid-hook-5.8-author-fix' of git://github.com/micah-morton/linux:
security: Add LSM hooks to set*gid syscalls


39030e13 09-Jun-2020 Thomas Cedeno <thomascedeno@google.com>

security: Add LSM hooks to set*gid syscalls

The SafeSetID LSM uses the security_task_fix_setuid hook to filter
set*uid() syscalls according to its configured security policy. In
preparation for adding analagous support in the LSM for set*gid()
syscalls, we add the requisite hook here. Tested by putting print
statements in the security_task_fix_setgid hook and seeing them get hit
during kernel boot.

Signed-off-by: Thomas Cedeno <thomascedeno@google.com>
Signed-off-by: Micah Morton <mortonm@chromium.org>

6adc19fd 13-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

- fix build rules in binderfs sample

- fix build errors when Kbuild recurses to the top Makefile

- covert '---help---' in Kconfig to 'help'

* tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
treewide: replace '---help---' in Kconfig files with 'help'
kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables
samples: binderfs: really compile this sample and fix build issues


a7f7f624 13-Jun-2020 Masahiro Yamada <masahiroy@kernel.org>

treewide: replace '---help---' in Kconfig files with 'help'

Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.

This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.

There are a variety of indentation styles found.

a) 4 spaces + '---help---'
b) 7 spaces + '---help---'
c) 8 spaces + '---help---'
d) 1 space + 1 tab + '---help---'
e) 1 tab + '---help---' (correct indentation)
f) 1 tab + 1 space + '---help---'
g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the
following commend:

$ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

6c329784 13-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull notification queue from David Howells:
"This adds a general notification queue concept and adds an event
source for keys/keyrings, such as linking and unlinking keys and
changing their attributes.

Thanks to Debarshi Ray, we do have a pull request to use this to fix a
problem with gnome-online-accounts - as mentioned last time:

https://gitlab.gnome.org/GNOME/gnome-online-accounts/merge_requests/47

Without this, g-o-a has to constantly poll a keyring-based kerberos
cache to find out if kinit has changed anything.

[ There are other notification pending: mount/sb fsinfo notifications
for libmount that Karel Zak and Ian Kent have been working on, and
Christian Brauner would like to use them in lxc, but let's see how
this one works first ]

LSM hooks are included:

- A set of hooks are provided that allow an LSM to rule on whether or
not a watch may be set. Each of these hooks takes a different
"watched object" parameter, so they're not really shareable. The
LSM should use current's credentials. [Wanted by SELinux & Smack]

- A hook is provided to allow an LSM to rule on whether or not a
particular message may be posted to a particular queue. This is
given the credentials from the event generator (which may be the
system) and the watch setter. [Wanted by Smack]

I've provided SELinux and Smack with implementations of some of these
hooks.

WHY
===

Key/keyring notifications are desirable because if you have your
kerberos tickets in a file/directory, your Gnome desktop will monitor
that using something like fanotify and tell you if your credentials
cache changes.

However, we also have the ability to cache your kerberos tickets in
the session, user or persistent keyring so that it isn't left around
on disk across a reboot or logout. Keyrings, however, cannot currently
be monitored asynchronously, so the desktop has to poll for it - not
so good on a laptop. This facility will allow the desktop to avoid the
need to poll.

DESIGN DECISIONS
================

- The notification queue is built on top of a standard pipe. Messages
are effectively spliced in. The pipe is opened with a special flag:

pipe2(fds, O_NOTIFICATION_PIPE);

The special flag has the same value as O_EXCL (which doesn't seem
like it will ever be applicable in this context)[?]. It is given up
front to make it a lot easier to prohibit splice&co from accessing
the pipe.

[?] Should this be done some other way? I'd rather not use up a new
O_* flag if I can avoid it - should I add a pipe3() system call
instead?

The pipe is then configured::

ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);

Messages are then read out of the pipe using read().

- It should be possible to allow write() to insert data into the
notification pipes too, but this is currently disabled as the
kernel has to be able to insert messages into the pipe *without*
holding pipe->mutex and the code to make this work needs careful
auditing.

- sendfile(), splice() and vmsplice() are disabled on notification
pipes because of the pipe->mutex issue and also because they
sometimes want to revert what they just did - but one or more
notification messages might've been interleaved in the ring.

- The kernel inserts messages with the wait queue spinlock held. This
means that pipe_read() and pipe_write() have to take the spinlock
to update the queue pointers.

- Records in the buffer are binary, typed and have a length so that
they can be of varying size.

This allows multiple heterogeneous sources to share a common
buffer; there are 16 million types available, of which I've used
just a few, so there is scope for others to be used. Tags may be
specified when a watchpoint is created to help distinguish the
sources.

- Records are filterable as types have up to 256 subtypes that can be
individually filtered. Other filtration is also available.

- Notification pipes don't interfere with each other; each may be
bound to a different set of watches. Any particular notification
will be copied to all the queues that are currently watching for it
- and only those that are watching for it.

- When recording a notification, the kernel will not sleep, but will
rather mark a queue as having lost a message if there's
insufficient space. read() will fabricate a loss notification
message at an appropriate point later.

- The notification pipe is created and then watchpoints are attached
to it, using one of:

keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
watch_mount(AT_FDCWD, "/", 0, fd, 0x02);
watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03);

where in both cases, fd indicates the queue and the number after is
a tag between 0 and 255.

- Watches are removed if either the notification pipe is destroyed or
the watched object is destroyed. In the latter case, a message will
be generated indicating the enforced watch removal.

Things I want to avoid:

- Introducing features that make the core VFS dependent on the
network stack or networking namespaces (ie. usage of netlink).

- Dumping all this stuff into dmesg and having a daemon that sits
there parsing the output and distributing it as this then puts the
responsibility for security into userspace and makes handling
namespaces tricky. Further, dmesg might not exist or might be
inaccessible inside a container.

- Letting users see events they shouldn't be able to see.

TESTING AND MANPAGES
====================

- The keyutils tree has a pipe-watch branch that has keyctl commands
for making use of notifications. Proposed manual pages can also be
found on this branch, though a couple of them really need to go to
the main manpages repository instead.

If the kernel supports the watching of keys, then running "make
test" on that branch will cause the testing infrastructure to spawn
a monitoring process on the side that monitors a notifications pipe
for all the key/keyring changes induced by the tests and they'll
all be checked off to make sure they happened.

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch

- A test program is provided (samples/watch_queue/watch_test) that
can be used to monitor for keyrings, mount and superblock events.
Information on the notifications is simply logged to stdout"

* tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
smack: Implement the watch_key and post_notification hooks
selinux: Implement the watch_key security hook
keys: Make the KEY_NEED_* perms an enum rather than a mask
pipe: Add notification lossage handling
pipe: Allow buffers to be marked read-whole-or-error for notifications
Add sample notification program
watch_queue: Add a key/keyring notification facility
security: Add hooks to rule on setting a watch
pipe: Add general notification queue support
pipe: Add O_NOTIFICATION_PIPE
security: Add a hook for the point of notification insertion
uapi: General notification queue definitions


923ea163 12-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'integrity-v5.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity fix from Mimi Zohar:
"ima mprotect performance fix"

* tag 'integrity-v5.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: fix mprotect checking


4235b1a4 10-Jun-2020 Mimi Zohar <zohar@linux.ibm.com>

ima: fix mprotect checking

Make sure IMA is enabled before checking mprotect change. Addresses
report of a 3.7% regression of boot-time.dhcp.

Fixes: 8eb613c0b8f1 ("ima: verify mprotect change is consistent with mmap policy")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Tested-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

65de5096 10-Jun-2020 Tom Rix <trix@redhat.com>

selinux: fix double free

Clang's static analysis tool reports these double free memory errors.

security/selinux/ss/services.c:2987:4: warning: Attempt to free released memory [unix.Malloc]
kfree(bnames[i]);
^~~~~~~~~~~~~~~~
security/selinux/ss/services.c:2990:2: warning: Attempt to free released memory [unix.Malloc]
kfree(bvalues);
^~~~~~~~~~~~~~

So improve the security_get_bools error handling by freeing these variables
and setting their return pointers to NULL and the return len to 0

Cc: stable@vger.kernel.org
Signed-off-by: Tom Rix <trix@redhat.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

52435c86 09-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'ovl-update-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs updates from Miklos Szeredi:
"Fixes:

- Resolve mount option conflicts consistently

- Sync before remount R/O

- Fix file handle encoding corner cases

- Fix metacopy related issues

- Fix an unintialized return value

- Add missing permission checks for underlying layers

Optimizations:

- Allow multipe whiteouts to share an inode

- Optimize small writes by inheriting SB_NOSEC from upper layer

- Do not call ->syncfs() multiple times for sync(2)

- Do not cache negative lookups on upper layer

- Make private internal mounts longterm"

* tag 'ovl-update-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (27 commits)
ovl: remove unnecessary lock check
ovl: make oip->index bool
ovl: only pass ->ki_flags to ovl_iocb_to_rwf()
ovl: make private mounts longterm
ovl: get rid of redundant members in struct ovl_fs
ovl: add accessor for ofs->upper_mnt
ovl: initialize error in ovl_copy_xattr
ovl: drop negative dentry in upper layer
ovl: check permission to open real file
ovl: call secutiry hook in ovl_real_ioctl()
ovl: verify permissions in ovl_path_open()
ovl: switch to mounter creds in readdir
ovl: pass correct flags for opening real directory
ovl: fix redirect traversal on metacopy dentries
ovl: initialize OVL_UPPERDATA in ovl_lookup()
ovl: use only uppermetacopy state in ovl_lookup()
ovl: simplify setting of origin for index lookup
ovl: fix out of bounds access warning in ovl_check_fb_len()
ovl: return required buffer size for file handles
ovl: sync dirty data when remounting to ro mode
...


595a56ac 09-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'linux-kselftest-kunit-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kunit updates from Shuah Khan:
"This consists of:

- Several config fragment fixes from Anders Roxell to improve test
coverage.

- Improvements to kunit run script to use defconfig as default and
restructure the code for config/build/exec/parse from Vitor Massaru
Iha and David Gow.

- Miscellaneous documentation warn fix"

* tag 'linux-kselftest-kunit-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
security: apparmor: default KUNIT_* fragments to KUNIT_ALL_TESTS
fs: ext4: default KUNIT_* fragments to KUNIT_ALL_TESTS
drivers: base: default KUNIT_* fragments to KUNIT_ALL_TESTS
lib: Kconfig.debug: default KUNIT_* fragments to KUNIT_ALL_TESTS
kunit: default KUNIT_* fragments to KUNIT_ALL_TESTS
kunit: Kconfig: enable a KUNIT_ALL_TESTS fragment
kunit: Fix TabError, remove defconfig code and handle when there is no kunitconfig
kunit: use KUnit defconfig by default
kunit: use --build_dir=.kunit as default
Documentation: test.h - fix warnings
kunit: kunit_tool: Separate out config/build/exec/parse


c1e8d7c6 08-Jun-2020 Michel Lespinasse <michel@lespinasse.org>

mmap locking API: convert mmap_sem comments

Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a2b44706 07-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'apparmor-pr-2020-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor

Pull apparmor updates from John Johansen:
"Features:
- Replace zero-length array with flexible-array
- add a valid state flags check
- add consistency check between state and dfa diff encode flags
- add apparmor subdir to proc attr interface
- fail unpack if profile mode is unknown
- add outofband transition and use it in xattr match
- ensure that dfa state tables have entries

Cleanups:
- Use true and false for bool variable
- Remove semicolon
- Clean code by removing redundant instructions
- Replace two seq_printf() calls by seq_puts() in aa_label_seq_xprint()
- remove duplicate check of xattrs on profile attachment
- remove useless aafs_create_symlink

Bug fixes:
- Fix memory leak of profile proxy
- fix introspection of of task mode for unconfined tasks
- fix nnp subset test for unconfined
- check/put label on apparmor_sk_clone_security()"

* tag 'apparmor-pr-2020-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: Fix memory leak of profile proxy
apparmor: fix introspection of of task mode for unconfined tasks
apparmor: check/put label on apparmor_sk_clone_security()
apparmor: Use true and false for bool variable
security/apparmor/label.c: Clean code by removing redundant instructions
apparmor: Replace zero-length array with flexible-array
apparmor: ensure that dfa state tables have entries
apparmor: remove duplicate check of xattrs on profile attachment.
apparmor: add outofband transition and use it in xattr match
apparmor: fail unpack if profile mode is unknown
apparmor: fix nnp subset test for unconfined
apparmor: remove useless aafs_create_symlink
apparmor: add proc subdir to attrs
apparmor: add consistency check between state and dfa diff encode flags
apparmor: add a valid state flags check
AppArmor: Remove semicolon
apparmor: Replace two seq_printf() calls by seq_puts() in aa_label_seq_xprint()


8b8c704d 07-Jun-2020 Roberto Sassu <roberto.sassu@huawei.com>

ima: Remove __init annotation from ima_pcrread()

Commit 6cc7c266e5b4 ("ima: Call ima_calc_boot_aggregate() in
ima_eventdigest_init()") added a call to ima_calc_boot_aggregate() so that
the digest can be recalculated for the boot_aggregate measurement entry if
the 'd' template field has been requested. For the 'd' field, only SHA1 and
MD5 digests are accepted.

Given that ima_eventdigest_init() does not have the __init annotation, all
functions called should not have it. This patch removes __init from
ima_pcrread().

Cc: stable@vger.kernel.org
Fixes: 6cc7c266e5b4 ("ima: Call ima_calc_boot_aggregate() in ima_eventdigest_init()")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3622ad25 07-Jun-2020 John Johansen <john.johansen@canonical.com>

apparmor: Fix memory leak of profile proxy

When the proxy isn't replaced and the profile is removed, the proxy
is being leaked resulting in a kmemleak check message of

unreferenced object 0xffff888077a3a490 (size 16):
comm "apparmor_parser", pid 128041, jiffies 4322684109 (age 1097.028s)
hex dump (first 16 bytes):
03 00 00 00 00 00 00 00 b0 92 fd 4b 81 88 ff ff ...........K....
backtrace:
[<0000000084d5daf2>] aa_alloc_proxy+0x58/0xe0
[<00000000ecc0e21a>] aa_alloc_profile+0x159/0x1a0
[<000000004cc9ce15>] unpack_profile+0x275/0x1c40
[<000000007332b3ca>] aa_unpack+0x1e7/0x7e0
[<00000000e25e31bd>] aa_replace_profiles+0x18a/0x1d10
[<00000000350d9415>] policy_update+0x237/0x650
[<000000003fbf934e>] profile_load+0x122/0x160
[<0000000047f7b781>] vfs_write+0x139/0x290
[<000000008ad12358>] ksys_write+0xcd/0x170
[<000000001a9daa7b>] do_syscall_64+0x70/0x310
[<00000000b9efb0cf>] entry_SYSCALL_64_after_hwframe+0x49/0xb3

Make sure to cleanup the profile's embedded label which will result
on the proxy being properly freed.

Fixes: 637f688dc3dc ("apparmor: switch from profiles to using labels on contexts")
Signed-off-by: John Johansen <john.johansen@canonical.com>

dd2569fb 05-Jun-2020 John Johansen <john.johansen@canonical.com>

apparmor: fix introspection of of task mode for unconfined tasks

Fix two issues with introspecting the task mode.

1. If a task is attached to a unconfined profile that is not the
ns->unconfined profile then. Mode the mode is always reported
as -

$ ps -Z
LABEL PID TTY TIME CMD
unconfined 1287 pts/0 00:00:01 bash
test (-) 1892 pts/0 00:00:00 ps

instead of the correct value of (unconfined) as shown below

$ ps -Z
LABEL PID TTY TIME CMD
unconfined 2483 pts/0 00:00:01 bash
test (unconfined) 3591 pts/0 00:00:00 ps

2. if a task is confined by a stack of profiles that are unconfined
the output of label mode is again the incorrect value of (-) like
above, instead of (unconfined). This is because the visibile
profile count increment is skipped by the special casing of
unconfined.

Fixes: f1bd904175e8 ("apparmor: add the base fns() for domain labels")
Signed-off-by: John Johansen <john.johansen@canonical.com>

3b646abc 02-Jun-2020 Mauricio Faria de Oliveira <mfo@canonical.com>

apparmor: check/put label on apparmor_sk_clone_security()

Currently apparmor_sk_clone_security() does not check for existing
label/peer in the 'new' struct sock; it just overwrites it, if any
(with another reference to the label of the source sock.)

static void apparmor_sk_clone_security(const struct sock *sk,
struct sock *newsk)
{
struct aa_sk_ctx *ctx = SK_CTX(sk);
struct aa_sk_ctx *new = SK_CTX(newsk);

new->label = aa_get_label(ctx->label);
new->peer = aa_get_label(ctx->peer);
}

This might leak label references, which might overflow under load.
Thus, check for and put labels, to prevent such errors.

Note this is similarly done on:

static int apparmor_socket_post_create(struct socket *sock, ...)
...
if (sock->sk) {
struct aa_sk_ctx *ctx = SK_CTX(sock->sk);

aa_put_label(ctx->label);
ctx->label = aa_get_label(label);
}
...

Context:
-------

The label reference count leak is observed if apparmor_sock_graft()
is called previously: this sets the 'ctx->label' field by getting
a reference to the current label (later overwritten, without put.)

static void apparmor_sock_graft(struct sock *sk, ...)
{
struct aa_sk_ctx *ctx = SK_CTX(sk);

if (!ctx->label)
ctx->label = aa_get_current_label();
}

And that is the case on crypto/af_alg.c:af_alg_accept():

int af_alg_accept(struct sock *sk, struct socket *newsock, ...)
...
struct sock *sk2;
...
sk2 = sk_alloc(...);
...
security_sock_graft(sk2, newsock);
security_sk_clone(sk, sk2);
...

Apparently both calls are done on their own right, especially for
other LSMs, being introduced in 2010/2014, before apparmor socket
mediation in 2017 (see commits [1,2,3,4]).

So, it looks OK there! Let's fix the reference leak in apparmor.

Test-case:
---------

Exercise that code path enough to overflow label reference count.

$ cat aa-refcnt-af_alg.c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

int main() {
int sockfd;
struct sockaddr_alg sa;

/* Setup the crypto API socket */
sockfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
if (sockfd < 0) {
perror("socket");
return 1;
}

memset(&sa, 0, sizeof(sa));
sa.salg_family = AF_ALG;
strcpy((char *) sa.salg_type, "rng");
strcpy((char *) sa.salg_name, "stdrng");

if (bind(sockfd, (struct sockaddr *) &sa, sizeof(sa)) < 0) {
perror("bind");
return 1;
}

/* Accept a "connection" and close it; repeat. */
while (!close(accept(sockfd, NULL, 0)));

return 0;
}

$ gcc -o aa-refcnt-af_alg aa-refcnt-af_alg.c

$ ./aa-refcnt-af_alg
<a few hours later>

[ 9928.475953] refcount_t overflow at apparmor_sk_clone_security+0x37/0x70 in aa-refcnt-af_alg[1322], uid/euid: 1000/1000
...
[ 9928.507443] RIP: 0010:apparmor_sk_clone_security+0x37/0x70
...
[ 9928.514286] security_sk_clone+0x33/0x50
[ 9928.514807] af_alg_accept+0x81/0x1c0 [af_alg]
[ 9928.516091] alg_accept+0x15/0x20 [af_alg]
[ 9928.516682] SYSC_accept4+0xff/0x210
[ 9928.519609] SyS_accept+0x10/0x20
[ 9928.520190] do_syscall_64+0x73/0x130
[ 9928.520808] entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Note that other messages may be seen, not just overflow, depending on
the value being incremented by kref_get(); on another run:

[ 7273.182666] refcount_t: saturated; leaking memory.
...
[ 7273.185789] refcount_t: underflow; use-after-free.

Kprobes:
-------

Using kprobe events to monitor sk -> sk_security -> label -> count (kref):

Original v5.7 (one reference leak every iteration)

... (af_alg_accept+0x0/0x1c0) label=0xffff8a0f36c25eb0 label_refcnt=0x11fd2
... (af_alg_release_parent+0x0/0xd0) label=0xffff8a0f36c25eb0 label_refcnt=0x11fd4
... (af_alg_accept+0x0/0x1c0) label=0xffff8a0f36c25eb0 label_refcnt=0x11fd3
... (af_alg_release_parent+0x0/0xd0) label=0xffff8a0f36c25eb0 label_refcnt=0x11fd5
... (af_alg_accept+0x0/0x1c0) label=0xffff8a0f36c25eb0 label_refcnt=0x11fd4
... (af_alg_release_parent+0x0/0xd0) label=0xffff8a0f36c25eb0 label_refcnt=0x11fd6

Patched v5.7 (zero reference leak per iteration)

... (af_alg_accept+0x0/0x1c0) label=0xffff9ff376c25eb0 label_refcnt=0x593
... (af_alg_release_parent+0x0/0xd0) label=0xffff9ff376c25eb0 label_refcnt=0x594
... (af_alg_accept+0x0/0x1c0) label=0xffff9ff376c25eb0 label_refcnt=0x593
... (af_alg_release_parent+0x0/0xd0) label=0xffff9ff376c25eb0 label_refcnt=0x594
... (af_alg_accept+0x0/0x1c0) label=0xffff9ff376c25eb0 label_refcnt=0x593
... (af_alg_release_parent+0x0/0xd0) label=0xffff9ff376c25eb0 label_refcnt=0x594

Commits:
-------

[1] commit 507cad355fc9 ("crypto: af_alg - Make sure sk_security is initialized on accept()ed sockets")
[2] commit 4c63f83c2c2e ("crypto: af_alg - properly label AF_ALG socket")
[3] commit 2acce6aa9f65 ("Networking") a.k.a ("crypto: af_alg - Avoid sock_graft call warning)
[4] commit 56974a6fcfef ("apparmor: add base infastructure for socket mediation")

Fixes: 56974a6fcfef ("apparmor: add base infastructure for socket mediation")
Reported-by: Brian Moyles <bmoyles@netflix.com>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>

3c0ad98c 06-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'integrity-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity updates from Mimi Zohar:
"The main changes are extending the TPM 2.0 PCR banks with bank
specific file hashes, calculating the "boot_aggregate" based on other
TPM PCR banks, using the default IMA hash algorithm, instead of SHA1,
as the basis for the cache hash table key, and preventing the mprotect
syscall to circumvent an IMA mmap appraise policy rule.

- In preparation for extending TPM 2.0 PCR banks with bank specific
digests, commit 0b6cf6b97b7e ("tpm: pass an array of
tpm_extend_digest structures to tpm_pcr_extend()") modified
tpm_pcr_extend(). The original SHA1 file digests were
padded/truncated, before being extended into the other TPM PCR
banks. This pull request calculates and extends the TPM PCR banks
with bank specific file hashes completing the above change.

- The "boot_aggregate", the first IMA measurement list record, is the
"trusted boot" link between the pre-boot environment and the
running OS. With TPM 2.0, the "boot_aggregate" record is not
limited to being based on the SHA1 TPM PCR bank, but can be
calculated based on any enabled bank, assuming the hash algorithm
is also enabled in the kernel.

Other changes include the following and five other bug fixes/code
clean up:

- supporting both a SHA1 and a larger "boot_aggregate" digest in a
custom template format containing both the the SHA1 ('d') and
larger digests ('d-ng') fields.

- Initial hash table key fix, but additional changes would be good"

* tag 'integrity-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: Directly free *entry in ima_alloc_init_template() if digests is NULL
ima: Call ima_calc_boot_aggregate() in ima_eventdigest_init()
ima: Directly assign the ima_default_policy pointer to ima_rules
ima: verify mprotect change is consistent with mmap policy
evm: Fix possible memory leak in evm_calc_hmac_or_hash()
ima: Set again build_ima_appraise variable
ima: Remove redundant policy rule set in add_rules()
ima: Fix ima digest hash table key calculation
ima: Use ima_hash_algo for collision detection in the measurement list
ima: Calculate and extend PCR with digests in ima_template_entry
ima: Allocate and initialize tfm for each PCR bank
ima: Switch to dynamically allocated buffer for template digests
ima: Store template digest directly in ima_template_entry
ima: Evaluate error in init_ima()
ima: Switch to ima_hash_algo for boot aggregate


42413b49 05-Jun-2020 Roberto Sassu <roberto.sassu@huawei.com>

ima: Directly free *entry in ima_alloc_init_template() if digests is NULL

To support multiple template digests, the static array entry->digest has
been replaced with a dynamically allocated array in commit aa724fe18a8a
("ima: Switch to dynamically allocated buffer for template digests"). The
array is allocated in ima_alloc_init_template() and if the returned pointer
is NULL, ima_free_template_entry() is called.

However, (*entry)->template_desc is not yet initialized while it is used by
ima_free_template_entry(). This patch fixes the issue by directly freeing
*entry without calling ima_free_template_entry().

Fixes: aa724fe18a8a ("ima: Switch to dynamically allocated buffer for template digests")
Reported-by: syzbot+223310b454ba6b75974e@syzkaller.appspotmail.com
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

886d7de6 04-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'akpm' (patches from Andrew)

Merge yet more updates from Andrew Morton:

- More MM work. 100ish more to go. Mike Rapoport's "mm: remove
__ARCH_HAS_5LEVEL_HACK" series should fix the current ppc issue

- Various other little subsystems

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (127 commits)
lib/ubsan.c: fix gcc-10 warnings
tools/testing/selftests/vm: remove duplicate headers
selftests: vm: pkeys: fix multilib builds for x86
selftests: vm: pkeys: use the correct page size on powerpc
selftests/vm/pkeys: override access right definitions on powerpc
selftests/vm/pkeys: test correct behaviour of pkey-0
selftests/vm/pkeys: introduce a sub-page allocator
selftests/vm/pkeys: detect write violation on a mapped access-denied-key page
selftests/vm/pkeys: associate key on a mapped page and detect write violation
selftests/vm/pkeys: associate key on a mapped page and detect access violation
selftests/vm/pkeys: improve checks to determine pkey support
selftests/vm/pkeys: fix assertion in test_pkey_alloc_exhaust()
selftests/vm/pkeys: fix number of reserved powerpc pkeys
selftests/vm/pkeys: introduce powerpc support
selftests/vm/pkeys: introduce generic pkey abstractions
selftests: vm: pkeys: use the correct huge page size
selftests/vm/pkeys: fix alloc_random_pkey() to make it really random
selftests/vm/pkeys: fix assertion in pkey_disable_set/clear()
selftests/vm/pkeys: fix pkey_disable_clear()
selftests: vm: pkeys: add helpers for pkey bits
...


d4eaa283 04-Jun-2020 Waiman Long <longman@redhat.com>

mm: add kvfree_sensitive() for freeing sensitive data objects

For kvmalloc'ed data object that contains sensitive information like
cryptographic keys, we need to make sure that the buffer is always cleared
before freeing it. Using memset() alone for buffer clearing may not
provide certainty as the compiler may compile it away. To be sure, the
special memzero_explicit() has to be used.

This patch introduces a new kvfree_sensitive() for freeing those sensitive
data objects allocated by kvmalloc(). The relevant places where
kvfree_sensitive() can be used are modified to use it.

Fixes: 4f0882491a14 ("KEYS: Avoid false positive ENOMEM error on key read")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Joe Perches <joe@perches.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Link: http://lkml.kernel.org/r/20200407200318.11711-1-longman@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

15a2bc4d 04-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull execve updates from Eric Biederman:
"Last cycle for the Nth time I ran into bugs and quality of
implementation issues related to exec that could not be easily be
fixed because of the way exec is implemented. So I have been digging
into exec and cleanup up what I can.

I don't think I have exec sorted out enough to fix the issues I
started with but I have made some headway this cycle with 4 sets of
changes.

- promised cleanups after introducing exec_update_mutex

- trivial cleanups for exec

- control flow simplifications

- remove the recomputation of bprm->cred

The net result is code that is a bit easier to understand and work
with and a decrease in the number of lines of code (if you don't count
the added tests)"

* 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (24 commits)
exec: Compute file based creds only once
exec: Add a per bprm->file version of per_clear
binfmt_elf_fdpic: fix execfd build regression
selftests/exec: Add binfmt_script regression test
exec: Remove recursion from search_binary_handler
exec: Generic execfd support
exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC
exec: Move the call of prepare_binprm into search_binary_handler
exec: Allow load_misc_binary to call prepare_binprm unconditionally
exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
exec: Teach prepare_exec_creds how exec treats uids & gids
exec: Set the point of no return sooner
exec: Move handling of the point of no return to the top level
exec: Run sync_mm_rss before taking exec_update_mutex
exec: Fix spelling of search_binary_handler in a comment
exec: Move the comment from above de_thread to above unshare_sighand
exec: Rename flush_old_exec begin_new_exec
exec: Move most of setup_new_exec into flush_old_exec
exec: In setup_new_exec cache current in the local variable me
...


9ff72585 04-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull proc updates from Eric Biederman:
"This has four sets of changes:

- modernize proc to support multiple private instances

- ensure we see the exit of each process tid exactly

- remove has_group_leader_pid

- use pids not tasks in posix-cpu-timers lookup

Alexey updated proc so each mount of proc uses a new superblock. This
allows people to actually use mount options with proc with no fear of
messing up another mount of proc. Given the kernel's internal mounts
of proc for things like uml this was a real problem, and resulted in
Android's hidepid mount options being ignored and introducing security
issues.

The rest of the changes are small cleanups and fixes that came out of
my work to allow this change to proc. In essence it is swapping the
pids in de_thread during exec which removes a special case the code
had to handle. Then updating the code to stop handling that special
case"

* 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
proc: proc_pid_ns takes super_block as an argument
remove the no longer needed pid_alive() check in __task_pid_nr_ns()
posix-cpu-timers: Replace __get_task_for_clock with pid_for_clock
posix-cpu-timers: Replace cpu_timer_pid_type with clock_pid_type
posix-cpu-timers: Extend rcu_read_lock removing task_struct references
signal: Remove has_group_leader_pid
exec: Remove BUG_ON(has_group_leader_pid)
posix-cpu-timer: Unify the now redundant code in lookup_task
posix-cpu-timer: Tidy up group_leader logic in lookup_task
proc: Ensure we see the exit of each process tid exactly once
rculist: Add hlists_swap_heads_rcu
proc: Use PIDTYPE_TGID in next_tgid
Use proc_pid_ns() to get pid_namespace from the proc superblock
proc: use named enums for better readability
proc: use human-readable values for hidepid
docs: proc: add documentation for "hidepid=4" and "subset=pid" options and new mount behavior
proc: add option to mount only a pids subset
proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option
proc: allow to mount many instances of proc in one pid namespace
proc: rename struct proc_fs_info to proc_fs_opts


acf25aa6 04-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'Smack-for-5.8' of git://github.com/cschaufler/smack-next

Pull smack updates from Casey Schaufler:
"Clean out dead code and repair an out-of-bounds warning"

* tag 'Smack-for-5.8' of git://github.com/cschaufler/smack-next:
Smack: Remove unused inline function smk_ad_setfield_u_fs_path_mnt
Smack:- Remove redundant inode_smack cache
Smack:- Remove mutex lock "smk_lock" from inode_smack
Smack: slab-out-of-bounds in vsscanf
smack: remove redundant structure variable from header.
smack: avoid unused 'sip' variable warning


a484a497 04-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'keys-next-20200602' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull keyring updates from David Howells:

- Fix a documentation warning.

- Replace a zero-length array with a flexible one

- Make the big_key key type use ChaCha20Poly1305 and use the crypto
algorithm directly rather than going through the crypto layer.

- Implement the update op for the big_key type.

* tag 'keys-next-20200602' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
keys: Implement update for the big_key type
security/keys: rewrite big_key crypto to use library interface
KEYS: Replace zero-length array with flexible-array
Documentation: security: core.rst: add missing argument


cb8e59cc 03-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from David Miller:

1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
Augusto von Dentz.

2) Add GSO partial support to igc, from Sasha Neftin.

3) Several cleanups and improvements to r8169 from Heiner Kallweit.

4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
device self-test. From Andrew Lunn.

5) Start moving away from custom driver versions, use the globally
defined kernel version instead, from Leon Romanovsky.

6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin.

7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.

8) Add sriov and vf support to hinic, from Luo bin.

9) Support Media Redundancy Protocol (MRP) in the bridging code, from
Horatiu Vultur.

10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.

11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
Dubroca. Also add ipv6 support for espintcp.

12) Lots of ReST conversions of the networking documentation, from Mauro
Carvalho Chehab.

13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
from Doug Berger.

14) Allow to dump cgroup id and filter by it in inet_diag code, from
Dmitry Yakunin.

15) Add infrastructure to export netlink attribute policies to
userspace, from Johannes Berg.

16) Several optimizations to sch_fq scheduler, from Eric Dumazet.

17) Fallback to the default qdisc if qdisc init fails because otherwise
a packet scheduler init failure will make a device inoperative. From
Jesper Dangaard Brouer.

18) Several RISCV bpf jit optimizations, from Luke Nelson.

19) Correct the return type of the ->ndo_start_xmit() method in several
drivers, it's netdev_tx_t but many drivers were using
'int'. From Yunjian Wang.

20) Add an ethtool interface for PHY master/slave config, from Oleksij
Rempel.

21) Add BPF iterators, from Yonghang Song.

22) Add cable test infrastructure, including ethool interfaces, from
Andrew Lunn. Marvell PHY driver is the first to support this
facility.

23) Remove zero-length arrays all over, from Gustavo A. R. Silva.

24) Calculate and maintain an explicit frame size in XDP, from Jesper
Dangaard Brouer.

25) Add CAP_BPF, from Alexei Starovoitov.

26) Support terse dumps in the packet scheduler, from Vlad Buslov.

27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.

28) Add devm_register_netdev(), from Bartosz Golaszewski.

29) Minimize qdisc resets, from Cong Wang.

30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
eliminate set_fs/get_fs calls. From Christoph Hellwig.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
selftests: net: ip_defrag: ignore EPERM
net_failover: fixed rollback in net_failover_open()
Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
vmxnet3: allow rx flow hash ops only when rss is enabled
hinic: add set_channels ethtool_ops support
selftests/bpf: Add a default $(CXX) value
tools/bpf: Don't use $(COMPILE.c)
bpf, selftests: Use bpf_probe_read_kernel
s390/bpf: Use bcr 0,%0 as tail call nop filler
s390/bpf: Maintain 8-byte stack alignment
selftests/bpf: Fix verifier test
selftests/bpf: Fix sample_cnt shared between two threads
bpf, selftests: Adapt cls_redirect to call csum_level helper
bpf: Add csum_level helper for fixing up csum levels
bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
crypto/chtls: IPv6 support for inline TLS
Crypto/chcr: Fixes a coccinile check error
Crypto/chcr: Fixes compilations warnings
...


6cc7c266 03-Jun-2020 Roberto Sassu <roberto.sassu@huawei.com>

ima: Call ima_calc_boot_aggregate() in ima_eventdigest_init()

If the template field 'd' is chosen and the digest to be added to the
measurement entry was not calculated with SHA1 or MD5, it is
recalculated with SHA1, by using the passed file descriptor. However, this
cannot be done for boot_aggregate, because there is no file descriptor.

This patch adds a call to ima_calc_boot_aggregate() in
ima_eventdigest_init(), so that the digest can be recalculated also for the
boot_aggregate entry.

Cc: stable@vger.kernel.org # 3.13.x
Fixes: 3ce1217d6cd5d ("ima: define template fields library and new helpers")
Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

067a436b 03-Jun-2020 Roberto Sassu <roberto.sassu@huawei.com>

ima: Directly assign the ima_default_policy pointer to ima_rules

This patch prevents the following oops:

[ 10.771813] BUG: kernel NULL pointer dereference, address: 0000000000000
[...]
[ 10.779790] RIP: 0010:ima_match_policy+0xf7/0xb80
[...]
[ 10.798576] Call Trace:
[ 10.798993] ? ima_lsm_policy_change+0x2b0/0x2b0
[ 10.799753] ? inode_init_owner+0x1a0/0x1a0
[ 10.800484] ? _raw_spin_lock+0x7a/0xd0
[ 10.801592] ima_must_appraise.part.0+0xb6/0xf0
[ 10.802313] ? ima_fix_xattr.isra.0+0xd0/0xd0
[ 10.803167] ima_must_appraise+0x4f/0x70
[ 10.804004] ima_post_path_mknod+0x2e/0x80
[ 10.804800] do_mknodat+0x396/0x3c0

It occurs when there is a failure during IMA initialization, and
ima_init_policy() is not called. IMA hooks still call ima_match_policy()
but ima_rules is NULL. This patch prevents the crash by directly assigning
the ima_default_policy pointer to ima_rules when ima_rules is defined. This
wouldn't alter the existing behavior, as ima_rules is always set at the end
of ima_init_policy().

Cc: stable@vger.kernel.org # 3.7.x
Fixes: 07f6a79415d7d ("ima: add appraise action keywords and default rules")
Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

292f902a 02-Jun-2020 Miklos Szeredi <mszeredi@redhat.com>

ovl: call secutiry hook in ovl_real_ioctl()

Verify LSM permissions for underlying file, since vfs_ioctl() doesn't do
it.

[Stephen Rothwell] export security_file_ioctl

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

d9afbb35 02-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull lockdown update from James Morris:
"An update for the security subsystem to allow unprivileged users
to see the status of the lockdown feature. From Jeremy Cline"

Also an added comment to describe CAP_SETFCAP.

* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
capabilities: add description for CAP_SETFCAP
lockdown: Allow unprivileged users to see lockdown status


f41030a2 02-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux updates from Paul Moore:
"The highlights:

- A number of improvements to various SELinux internal data
structures to help improve performance. We move the role
transitions into a hash table. In the content structure we shift
from hashing the content string (aka SELinux label) to the
structure itself, when it is valid. This last change not only
offers a speedup, but it helps us simplify the code some as well.

- Add a new SELinux policy version which allows for a more space
efficient way of storing the filename transitions in the binary
policy. Given the default Fedora SELinux policy with the unconfined
module enabled, this change drops the policy size from ~7.6MB to
~3.3MB. The kernel policy load time dropped as well.

- Some fixes to the error handling code in the policy parser to
properly return error codes when things go wrong"

* tag 'selinux-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: netlabel: Remove unused inline function
selinux: do not allocate hashtabs dynamically
selinux: fix return value on error in policydb_read()
selinux: simplify range_write()
selinux: fix error return code in policydb_read()
selinux: don't produce incorrect filename_trans_count
selinux: implement new format of filename transitions
selinux: move context hashing under sidtab
selinux: hash context structure directly
selinux: store role transitions in a hash table
selinux: drop unnecessary smp_load_acquire() call
selinux: fix warning Comparison to bool


91681e84 02-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'tomoyo-pr-20200601' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1

Pull tomoyo update from Tetsuo Handa:
"One patch for suppressing coccicheck's warning"

* tag 'tomoyo-pr-20200601' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
tomoyo: use true for bool variable


b6f61c31 12-May-2020 David Howells <dhowells@redhat.com>

keys: Implement update for the big_key type

Implement the ->update op for the big_key type.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>

521fd61c 11-May-2020 Jason A. Donenfeld <Jason@zx2c4.com>

security/keys: rewrite big_key crypto to use library interface

A while back, I noticed that the crypto and crypto API usage in big_keys
were entirely broken in multiple ways, so I rewrote it. Now, I'm
rewriting it again, but this time using the simpler ChaCha20Poly1305
library function. This makes the file considerably more simple; the
diffstat alone should justify this commit. It also should be faster,
since it no longer requires a mutex around the "aead api object" (nor
allocations), allowing us to encrypt multiple items in parallel. We also
benefit from being able to pass any type of pointer, so we can get rid
of the ridiculously complex custom page allocator that big_key really
doesn't need.

[DH: Change the select CRYPTO_LIB_CHACHA20POLY1305 to a depends on as
select doesn't propagate and the build can end up with an =y dependending
on some =m pieces.

The depends on CRYPTO also had to be removed otherwise the configurator
complains about a recursive dependency.]

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: kernel-hardening@lists.openwall.com
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David Howells <dhowells@redhat.com>

e0cd9206 01-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'uaccess.access_ok' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull uaccess/access_ok updates from Al Viro:
"Removals of trivially pointless access_ok() calls.

Note: the fiemap stuff was removed from the series, since they are
duplicates with part of ext4 series carried in Ted's tree"

* 'uaccess.access_ok' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vmci_host: get rid of pointless access_ok()
hfi1: get rid of pointless access_ok()
usb: get rid of pointless access_ok() calls
lpfc_debugfs: get rid of pointless access_ok()
efi_test: get rid of pointless access_ok()
drm_read(): get rid of pointless access_ok()
via-pmu: don't bother with access_ok()
drivers/crypto/ccp/sev-dev.c: get rid of pointless access_ok()
omapfb: get rid of pointless access_ok() calls
amifb: get rid of pointless access_ok() calls
drivers/fpga/dfl-afu-dma-region.c: get rid of pointless access_ok()
drivers/fpga/dfl-fme-pr.c: get rid of pointless access_ok()
cm4000_cs.c cmm_ioctl(): get rid of pointless access_ok()
nvram: drop useless access_ok()
n_hdlc_tty_read(): remove pointless access_ok()
tomoyo_write_control(): get rid of pointless access_ok()
btrfs_ioctl_send(): don't bother with access_ok()
fat_dir_ioctl(): hadn't needed that access_ok() for more than a decade...
dlmfs_file_write(): get rid of pointless access_ok()


6d6861d4 11-May-2020 Anders Roxell <anders.roxell@linaro.org>

security: apparmor: default KUNIT_* fragments to KUNIT_ALL_TESTS

This makes it easier to enable all KUnit fragments.

Adding 'if !KUNIT_ALL_TESTS' so individual tests can not be turned off.
Therefore if KUNIT_ALL_TESTS is enabled that will hide the prompt in
menuconfig.

Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

a7092c82 01-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
"Kernel side changes:

- Add AMD Fam17h RAPL support

- Introduce CAP_PERFMON to kernel and user space

- Add Zhaoxin CPU support

- Misc fixes and cleanups

Tooling changes:

- perf record:

Introduce '--switch-output-event' to use arbitrary events to be
setup and read from a side band thread and, when they take place a
signal be sent to the main 'perf record' thread, reusing the core
for '--switch-output' to take perf.data snapshots from the ring
buffer used for '--overwrite', e.g.:

# perf record --overwrite -e sched:* \
--switch-output-event syscalls:*connect* \
workload

will take perf.data.YYYYMMDDHHMMSS snapshots up to around the
connect syscalls.

Add '--num-synthesize-threads' option to control degree of
parallelism of the synthesize_mmap() code which is scanning
/proc/PID/task/PID/maps and can be time consuming. This mimics
pre-existing behaviour in 'perf top'.

- perf bench:

Add a multi-threaded synthesize benchmark and kallsyms parsing
benchmark.

- Intel PT support:

Stitch LBR records from multiple samples to get deeper backtraces,
there are caveats, see the csets for details.

Allow using Intel PT to synthesize callchains for regular events.

Add support for synthesizing branch stacks for regular events
(cycles, instructions, etc) from Intel PT data.

Misc changes:

- Updated perf vendor events for power9 and Coresight.

- Add flamegraph.py script via 'perf flamegraph'

- Misc other changes, fixes and cleanups - see the Git log for details

Also, since over the last couple of years perf tooling has matured and
decoupled from the kernel perf changes to a large degree, going
forward Arnaldo is going to send perf tooling changes via direct pull
requests"

* tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (163 commits)
perf/x86/rapl: Add AMD Fam17h RAPL support
perf/x86/rapl: Make perf_probe_msr() more robust and flexible
perf/x86/rapl: Flip logic on default events visibility
perf/x86/rapl: Refactor to share the RAPL code between Intel and AMD CPUs
perf/x86/rapl: Move RAPL support to common x86 code
perf/core: Replace zero-length array with flexible-array
perf/x86: Replace zero-length array with flexible-array
perf/x86/intel: Add more available bits for OFFCORE_RESPONSE of Intel Tremont
perf/x86/rapl: Add Ice Lake RAPL support
perf flamegraph: Use /bin/bash for report and record scripts
perf cs-etm: Move definition of 'traceid_list' global variable from header file
libsymbols kallsyms: Move hex2u64 out of header
libsymbols kallsyms: Parse using io api
perf bench: Add kallsyms parsing
perf: cs-etm: Update to build with latest opencsd version.
perf symbol: Fix kernel symbol address display
perf inject: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
perf annotate: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
perf trace: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
perf script: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
...


81e8c10d 01-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
"API:
- Introduce crypto_shash_tfm_digest() and use it wherever possible.
- Fix use-after-free and race in crypto_spawn_alg.
- Add support for parallel and batch requests to crypto_engine.

Algorithms:
- Update jitter RNG for SP800-90B compliance.
- Always use jitter RNG as seed in drbg.

Drivers:
- Add Arm CryptoCell driver cctrng.
- Add support for SEV-ES to the PSP driver in ccp"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (114 commits)
crypto: hisilicon - fix driver compatibility issue with different versions of devices
crypto: engine - do not requeue in case of fatal error
crypto: cavium/nitrox - Fix a typo in a comment
crypto: hisilicon/qm - change debugfs file name from qm_regs to regs
crypto: hisilicon/qm - add DebugFS for xQC and xQE dump
crypto: hisilicon/zip - add debugfs for Hisilicon ZIP
crypto: hisilicon/hpre - add debugfs for Hisilicon HPRE
crypto: hisilicon/sec2 - add debugfs for Hisilicon SEC
crypto: hisilicon/qm - add debugfs to the QM state machine
crypto: hisilicon/qm - add debugfs for QM
crypto: stm32/crc32 - protect from concurrent accesses
crypto: stm32/crc32 - don't sleep in runtime pm
crypto: stm32/crc32 - fix multi-instance
crypto: stm32/crc32 - fix run-time self test issue.
crypto: stm32/crc32 - fix ext4 chksum BUG_ON()
crypto: hisilicon/zip - Use temporary sqe when doing work
crypto: hisilicon - add device error report through abnormal irq
crypto: hisilicon - remove codes of directly report device errors through MSI
crypto: hisilicon - QM memory management optimization
crypto: hisilicon - unify initial value assignment into QM
...


1806c13d 31-May-2020 David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

xdp_umem.c had overlapping changes between the 64-bit math fix
for the calculation of npgs and the removal of the zerocopy
memory type which got rid of the chunk_size_nohdr member.

The mlx5 Kconfig conflict is a case where we just take the
net-next copy of the Kconfig entry dependency as it takes on
the ESWITCH dependency by one level of indirection which is
what the 'net' conflicting change is trying to ensure.

Signed-off-by: David S. Miller <davem@davemloft.net>


56305aa9 29-May-2020 Eric W. Biederman <ebiederm@xmission.com>

exec: Compute file based creds only once

Move the computation of creds from prepare_binfmt into begin_new_exec
so that the creds need only be computed once. This is just code
reorganization no semantic changes of any kind are made.

Moving the computation is safe. I have looked through the kernel and
verified none of the binfmts look at bprm->cred directly, and that
there are no helpers that look at bprm->cred indirectly. Which means
that it is not a problem to compute the bprm->cred later in the
execution flow as it is not used until it becomes current->cred.

A new function bprm_creds_from_file is added to contain the work that
needs to be done. bprm_creds_from_file first computes which file
bprm->executable or most likely bprm->file that the bprm->creds
will be computed from.

The funciton bprm_fill_uid is updated to receive the file instead of
accessing bprm->file. The now unnecessary work needed to reset the
bprm->cred->euid, and bprm->cred->egid is removed from brpm_fill_uid.
A small comment to document that bprm_fill_uid now only deals with the
work to handle suid and sgid files. The default case is already
heandled by prepare_exec_creds.

The function security_bprm_repopulate_creds is renamed
security_bprm_creds_from_file and now is explicitly passed the file
from which to compute the creds. The documentation of the
bprm_creds_from_file security hook is updated to explain when the hook
is called and what it needs to do. The file is passed from
cap_bprm_creds_from_file into get_file_caps so that the caps are
computed for the appropriate file. The now unnecessary work in
cap_bprm_creds_from_file to reset the ambient capabilites has been
removed. A small comment to document that the work of
cap_bprm_creds_from_file is to read capabilities from the files
secureity attribute and derive capabilities from the fact the
user had uid 0 has been added.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

a7868323 29-May-2020 Eric W. Biederman <ebiederm@xmission.com>

exec: Add a per bprm->file version of per_clear

There is a small bug in the code that recomputes parts of bprm->cred
for every bprm->file. The code never recomputes the part of
clear_dangerous_personality_flags it is responsible for.

Which means that in practice if someone creates a sgid script
the interpreter will not be able to use any of:
READ_IMPLIES_EXEC
ADDR_NO_RANDOMIZE
ADDR_COMPAT_LAYOUT
MMAP_PAGE_ZERO.

This accentially clearing of personality flags probably does
not matter in practice because no one has complained
but it does make the code more difficult to understand.

Further remaining bug compatible prevents the recomputation from being
removed and replaced by simply computing bprm->cred once from the
final bprm->file.

Making this change removes the last behavior difference between
computing bprm->creds from the final file and recomputing
bprm->cred several times. Which allows this behavior change
to be justified for it's own reasons, and for any but hunts
looking into why the behavior changed to wind up here instead
of in the code that will follow that computes bprm->cred
from the final bprm->file.

This small logic bug appears to have existed since the code
started clearing dangerous personality bits.

History Tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: 1bb0fa189c6a ("[PATCH] NX: clean up legacy binary support")
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

00fca6b5 23-Apr-2020 Al Viro <viro@zeniv.linux.org.uk>

tomoyo_write_control(): get rid of pointless access_ok()

address is passed only to get_user()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0bffedbc 27-May-2020 Ingo Molnar <mingo@kernel.org>

Merge tag 'v5.7-rc7' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>


e32f8879 27-May-2020 Eric W. Biederman <ebiederm@xmission.com>

Merge commit a4ae32c71fe9 ("exec: Always set cap_ambient in cap_bprm_set_creds")

This is a bug fix and one of two places where I have found that the
result of calling security_bprm_repopulate_creds more than once on
different bprm->files depends on all of the bprm->files not just the
file bprm->file.

I intend to fix both of those cases and then modify the code to
only call security_bprm_repopulate_creds on the final bprm file.

So merge this change in so I hopefully reduce conflicts for others
and I make it possible to build on top of this change.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


3301f6ae 27-May-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-5.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

- Reverted stricter synchronization for cgroup recursive stats which
was prepping it for event counter usage which never got merged. The
change was causing performation regressions in some cases.

- Restore bpf-based device-cgroup operation even when cgroup1 device
cgroup is disabled.

- An out-param init fix.

* 'for-5.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
device_cgroup: Cleanup cgroup eBPF device filter code
xattr: fix uninitialized out-param
Revert "cgroup: Add memory barriers to plug cgroup_rstat_updated() race window"


006f38a1 27-May-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull execve fix from Eric Biederman:
"While working on my exec cleanups I found a bug in exec that winds up
miscomputing the ambient credentials during exec. Andy appears to have
to been confused as to why credentials are computed for both the
script and the interpreter

From the original patch description:

[3] Linux very confusingly processes both the script and the
interpreter if applicable, for reasons that elude me. The results
from thinking about a script's file capabilities and/or setuid
bits are mostly discarded.

The only value in struct cred that gets changed in cap_bprm_set_creds
that I could find that might persist between the script and the
interpreter was cap_ambient. Which is fixed with this trivial change"

* 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
exec: Always set cap_ambient in cap_bprm_set_creds


a4ae32c7 24-May-2020 Eric W. Biederman <ebiederm@xmission.com>

exec: Always set cap_ambient in cap_bprm_set_creds

An invariant of cap_bprm_set_creds is that every field in the new cred
structure that cap_bprm_set_creds might set, needs to be set every
time to ensure the fields does not get a stale value.

The field cap_ambient is not set every time cap_bprm_set_creds is
called, which means that if there is a suid or sgid script with an
interpreter that has neither the suid nor the sgid bits set the
interpreter should be able to accept ambient credentials.
Unfortuantely because cap_ambient is not reset to it's original value
the interpreter can not accept ambient credentials.

Given that the ambient capability set is expected to be controlled by
the caller, I don't think this is particularly serious. But it is
definitely worth fixing so the code works correctly.

I have tested to verify my reading of the code is correct and the
interpreter of a sgid can receive ambient capabilities with this
change and cannot receive ambient capabilities without this change.

Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Fixes: 58319057b784 ("capabilities: ambient capabilities")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

13209a8f 24-May-2020 David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

The MSCC bug fix in 'net' had to be slightly adjusted because the
register accesses are done slightly differently in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>


caffb99b 23-May-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from David Miller:

1) Fix RCU warnings in ipv6 multicast router code, from Madhuparna
Bhowmik.

2) Nexthop attributes aren't being checked properly because of
mis-initialized iterator, from David Ahern.

3) Revert iop_idents_reserve() change as it caused performance
regressions and was just working around what is really a UBSAN bug
in the compiler. From Yuqi Jin.

4) Read MAC address properly from ROM in bmac driver (double iteration
proceeds past end of address array), from Jeremy Kerr.

5) Add Microsoft Surface device IDs to r8152, from Marc Payne.

6) Prevent reference to freed SKB in __netif_receive_skb_core(), from
Boris Sukholitko.

7) Fix ACK discard behavior in rxrpc, from David Howells.

8) Preserve flow hash across packet scrubbing in wireguard, from Jason
A. Donenfeld.

9) Cap option length properly for SO_BINDTODEVICE in AX25, from Eric
Dumazet.

10) Fix encryption error checking in kTLS code, from Vadim Fedorenko.

11) Missing BPF prog ref release in flow dissector, from Jakub Sitnicki.

12) dst_cache must be used with BH disabled in tipc, from Eric Dumazet.

13) Fix use after free in mlxsw driver, from Jiri Pirko.

14) Order kTLS key destruction properly in mlx5 driver, from Tariq
Toukan.

15) Check devm_platform_ioremap_resource() return value properly in
several drivers, from Tiezhu Yang.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits)
net: smsc911x: Fix runtime PM imbalance on error
net/mlx4_core: fix a memory leak bug.
net: ethernet: ti: cpsw: fix ASSERT_RTNL() warning during suspend
net: phy: mscc: fix initialization of the MACsec protocol mode
net: stmmac: don't attach interface until resume finishes
net: Fix return value about devm_platform_ioremap_resource()
net/mlx5: Fix error flow in case of function_setup failure
net/mlx5e: CT: Correctly get flow rule
net/mlx5e: Update netdev txq on completions during closure
net/mlx5: Annotate mutex destroy for root ns
net/mlx5: Don't maintain a case of del_sw_func being null
net/mlx5: Fix cleaning unmanaged flow tables
net/mlx5: Fix memory leak in mlx5_events_init
net/mlx5e: Fix inner tirs handling
net/mlx5e: kTLS, Destroy key object after destroying the TIS
net/mlx5e: Fix allowed tc redirect merged eswitch offload cases
net/mlx5: Avoid processing commands before cmdif is ready
net/mlx5: Fix a race when moving command interface to events mode
net/mlx5: Add command entry handling completion
rxrpc: Fix a memory leak in rxkad_verify_response()
...


8eb613c0 02-May-2020 Mimi Zohar <zohar@linux.ibm.com>

ima: verify mprotect change is consistent with mmap policy

Files can be mmap'ed read/write and later changed to execute to circumvent
IMA's mmap appraise policy rules. Due to locking issues (mmap semaphore
would be taken prior to i_mutex), files can not be measured or appraised at
this point. Eliminate this integrity gap, by denying the mprotect
PROT_EXECUTE change, if an mmap appraise policy rule exists.

On mprotect change success, return 0. On failure, return -EACESS.

Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

c54d481d 21-Oct-2019 Navid Emamdoost <navid.emamdoost@gmail.com>

apparmor: Fix use-after-free in aa_audit_rule_init

In the implementation of aa_audit_rule_init(), when aa_label_parse()
fails the allocated memory for rule is released using
aa_audit_rule_free(). But after this release, the return statement
tries to access the label field of the rule which results in
use-after-free. Before releasing the rule, copy errNo and return it
after release.

Fixes: 52e8c38001d8 ("apparmor: Fix memory leak of rule on error exit path")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>

c6b39f07 19-Apr-2020 Xiyu Yang <xiyuyang19@fudan.edu.cn>

apparmor: Fix aa_label refcnt leak in policy_update

policy_update() invokes begin_current_label_crit_section(), which
returns a reference of the updated aa_label object to "label" with
increased refcount.

When policy_update() returns, "label" becomes invalid, so the refcount
should be decreased to keep refcount balanced.

The reference counting issue happens in one exception handling path of
policy_update(). When aa_may_manage_policy() returns not NULL, the
refcnt increased by begin_current_label_crit_section() is not decreased,
causing a refcnt leak.

Fix this issue by jumping to "end_section" label when
aa_may_manage_policy() returns not NULL.

Fixes: 5ac8c355ae00 ("apparmor: allow introspecting the loaded policy pre internal transform")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>

a0b845ff 04-Apr-2020 Xiyu Yang <xiyuyang19@fudan.edu.cn>

apparmor: fix potential label refcnt leak in aa_change_profile

aa_change_profile() invokes aa_get_current_label(), which returns
a reference of the current task's label.

According to the comment of aa_get_current_label(), the returned
reference must be put with aa_put_label().
However, when the original object pointed by "label" becomes
unreachable because aa_change_profile() returns or a new object
is assigned to "label", reference count increased by
aa_get_current_label() is not decreased, causing a refcnt leak.

Fix this by calling aa_put_label() before aa_change_profile() return
and dropping unnecessary aa_get_current_label().

Fixes: 9fcf78cca198 ("apparmor: update domain transitions that are subsets of confinement at nnp")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>

112b7147 13-May-2020 Eric W. Biederman <ebiederm@xmission.com>

exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds

Rename bprm->cap_elevated to bprm->active_secureexec and initialize it
in prepare_binprm instead of in cap_bprm_set_creds. Initializing
bprm->active_secureexec in prepare_binprm allows multiple
implementations of security_bprm_repopulate_creds to play nicely with
each other.

Rename security_bprm_set_creds to security_bprm_reopulate_creds to
emphasize that this path recomputes part of bprm->cred. This
recomputation avoids the time of check vs time of use problems that
are inherent in unix #! interpreters.

In short two renames and a move in the location of initializing
bprm->active_secureexec.

Link: https://lkml.kernel.org/r/87o8qkzrxp.fsf_-_@x220.int.ebiederm.org
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

0550cfe8 20-May-2020 KP Singh <kpsingh@google.com>

security: Fix hook iteration for secid_to_secctx

secid_to_secctx is not stackable, and since the BPF LSM registers this
hook by default, the call_int_hook logic is not suitable which
"bails-on-fail" and casues issues when other LSMs register this hook and
eventually breaks Audit.

In order to fix this, directly iterate over the security hooks instead
of using call_int_hook as suggested in:

https: //lore.kernel.org/bpf/9d0eb6c6-803a-ff3a-5603-9ad6d9edfc00@schaufler-ca.com/#t

Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks")
Fixes: 625236ba3832 ("security: Fix the default value of secid_to_secctx hook")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Link: https://lore.kernel.org/bpf/20200520125616.193765-1-kpsingh@chromium.org

b8bff599 22-Mar-2020 Eric W. Biederman <ebiederm@xmission.com>

exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds

Today security_bprm_set_creds has several implementations:
apparmor_bprm_set_creds, cap_bprm_set_creds, selinux_bprm_set_creds,
smack_bprm_set_creds, and tomoyo_bprm_set_creds.

Except for cap_bprm_set_creds they all test bprm->called_set_creds and
return immediately if it is true. The function cap_bprm_set_creds
ignores bprm->calld_sed_creds entirely.

Create a new LSM hook security_bprm_creds_for_exec that is called just
before prepare_binprm in __do_execve_file, resulting in a LSM hook
that is called exactly once for the entire of exec. Modify the bits
of security_bprm_set_creds that only want to be called once per exec
into security_bprm_creds_for_exec, leaving only cap_bprm_set_creds
behind.

Remove bprm->called_set_creds all of it's former users have been moved
to security_bprm_creds_for_exec.

Add or upate comments a appropriate to bring them up to date and
to reflect this change.

Link: https://lkml.kernel.org/r/87v9kszrzh.fsf_-_@x220.int.ebiederm.org
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com> # For the LSM and Smack bits
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

a8478a60 14-Jan-2020 David Howells <dhowells@redhat.com>

smack: Implement the watch_key and post_notification hooks

Implement the watch_key security hook in Smack to make sure that a key
grants the caller Read permission in order to set a watch on a key.

Also implement the post_notification security hook to make sure that the
notification source is granted Write permission by the watch queue.

For the moment, the watch_devices security hook is left unimplemented as
it's not obvious what the object should be since the queue is global and
didn't previously exist.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>

3e412ccc 14-Jan-2020 David Howells <dhowells@redhat.com>

selinux: Implement the watch_key security hook

Implement the watch_key security hook to make sure that a key grants the
caller View permission in order to set a watch on a key.

For the moment, the watch_devices security hook is left unimplemented as
it's not obvious what the object should be since the queue is global and
didn't previously exist.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>

8c0637e9 12-May-2020 David Howells <dhowells@redhat.com>

keys: Make the KEY_NEED_* perms an enum rather than a mask

Since the meaning of combining the KEY_NEED_* constants is undefined, make
it so that you can't do that by turning them into an enum.

The enum is also given some extra values to represent special
circumstances, such as:

(1) The '0' value is reserved and causes a warning to trap the parameter
being unset.

(2) The key is to be unlinked and we require no permissions on it, only
the keyring, (this replaces the KEY_LOOKUP_FOR_UNLINK flag).

(3) An override due to CAP_SYS_ADMIN.

(4) An override due to an instantiation token being present.

(5) The permissions check is being deferred to later key_permission()
calls.

The extra values give the opportunity for LSMs to audit these situations.

[Note: This really needs overhauling so that lookup_user_key() tells
key_task_permission() and the LSM what operation is being done and leaves
it to those functions to decide how to map that onto the available
permits. However, I don't really want to make these change in the middle
of the notifications patchset.]

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
cc: Paul Moore <paul@paul-moore.com>
cc: Stephen Smalley <stephen.smalley.work@gmail.com>
cc: Casey Schaufler <casey@schaufler-ca.com>
cc: keyrings@vger.kernel.org
cc: selinux@vger.kernel.org

f7e47677 14-Jan-2020 David Howells <dhowells@redhat.com>

watch_queue: Add a key/keyring notification facility

Add a key/keyring change notification facility whereby notifications about
changes in key and keyring content and attributes can be received.

Firstly, an event queue needs to be created:

pipe2(fds, O_NOTIFICATION_PIPE);
ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, 256);

then a notification can be set up to report notifications via that queue:

struct watch_notification_filter filter = {
.nr_filters = 1,
.filters = {
[0] = {
.type = WATCH_TYPE_KEY_NOTIFY,
.subtype_filter[0] = UINT_MAX,
},
},
};
ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);
keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);

After that, records will be placed into the queue when events occur in
which keys are changed in some way. Records are of the following format:

struct key_notification {
struct watch_notification watch;
__u32 key_id;
__u32 aux;
} *n;

Where:

n->watch.type will be WATCH_TYPE_KEY_NOTIFY.

n->watch.subtype will indicate the type of event, such as
NOTIFY_KEY_REVOKED.

n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
record.

n->watch.info & WATCH_INFO_ID will be the second argument to
keyctl_watch_key(), shifted.

n->key will be the ID of the affected key.

n->aux will hold subtype-dependent information, such as the key
being linked into the keyring specified by n->key in the case of
NOTIFY_KEY_LINKED.

Note that it is permissible for event records to be of variable length -
or, at least, the length may be dependent on the subtype. Note also that
the queue can be shared between multiple notifications of various types.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>

998f5040 12-Feb-2020 David Howells <dhowells@redhat.com>

security: Add hooks to rule on setting a watch

Add security hooks that will allow an LSM to rule on whether or not a watch
may be set. More than one hook is required as the watches watch different
types of object.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
cc: Casey Schaufler <casey@schaufler-ca.com>
cc: Stephen Smalley <sds@tycho.nsa.gov>
cc: linux-security-module@vger.kernel.org

344fa64e 12-Feb-2020 David Howells <dhowells@redhat.com>

security: Add a hook for the point of notification insertion

Add a security hook that allows an LSM to rule on whether a notification
message is allowed to be inserted into a particular watch queue.

The hook is given the following information:

(1) The credentials of the triggerer (which may be init_cred for a system
notification, eg. a hardware error).

(2) The credentials of the whoever set the watch.

(3) The notification message.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
cc: Casey Schaufler <casey@schaufler-ca.com>
cc: Stephen Smalley <sds@tycho.nsa.gov>
cc: linux-security-module@vger.kernel.org

9d78edea 18-May-2020 Alexey Gladkov <gladkov.alexey@gmail.com>

proc: proc_pid_ns takes super_block as an argument

syzbot found that

touch /proc/testfile

causes NULL pointer dereference at tomoyo_get_local_path()
because inode of the dentry is NULL.

Before c59f415a7cb6, Tomoyo received pid_ns from proc's s_fs_info
directly. Since proc_pid_ns() can only work with inode, using it in
the tomoyo_get_local_path() was wrong.

To avoid creating more functions for getting proc_ns, change the
argument type of the proc_pid_ns() function. Then, Tomoyo can use
the existing super_block to get pid_ns.

Link: https://lkml.kernel.org/r/0000000000002f0c7505a5b0e04c@google.com
Link: https://lkml.kernel.org/r/20200518180738.2939611-1-gladkov.alexey@gmail.com
Reported-by: syzbot+c1af344512918c61362c@syzkaller.appspotmail.com
Fixes: c59f415a7cb6 ("Use proc_pid_ns() to get pid_namespace from the proc superblock")
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

642b151f 18-May-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity fixes from Mimi Zohar:
"A couple of miscellaneous bug fixes for the integrity subsystem:

IMA:

- Properly modify the open flags in order to calculate the file hash.

- On systems requiring the IMA policy to be signed, the policy is
loaded differently. Don't differentiate between "enforce" and
either "log" or "fix" modes how the policy is loaded.

EVM:

- Two patches to fix an EVM race condition, normally the result of
attempting to load an unsupported hash algorithm.

- Use the lockless RCU version for walking an append only list"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
evm: Fix a small race in init_desc()
evm: Fix RCU list related warnings
ima: Fix return value of ima_write_policy()
evm: Check also if *tfm is an error pointer in init_desc()
ima: Set file->f_mode instead of file->f_flags in ima_calc_file_hash()


e3798609 28-Apr-2020 Zou Wei <zou_wei@huawei.com>

apparmor: Use true and false for bool variable

Fixes coccicheck warnings:

security/apparmor/file.c:162:9-10: WARNING: return of 0/1 in function 'is_deleted' with return type bool
security/apparmor/file.c:362:9-10: WARNING: return of 0/1 in function 'xindex_is_subset' with return type bool
security/apparmor/policy_unpack.c:246:9-10: WARNING: return of 0/1 in function 'unpack_X' with return type bool
security/apparmor/policy_unpack.c:292:9-10: WARNING: return of 0/1 in function 'unpack_nameX' with return type bool
security/apparmor/policy_unpack.c:646:8-9: WARNING: return of 0/1 in function 'unpack_rlimits' with return type bool
security/apparmor/policy_unpack.c:604:8-9: WARNING: return of 0/1 in function 'unpack_secmark' with return type bool
security/apparmor/policy_unpack.c:538:8-9: WARNING: return of 0/1 in function 'unpack_trans_table' with return type bool
security/apparmor/policy_unpack.c:327:9-10: WARNING: return of 0/1 in function 'unpack_u32' with return type bool
security/apparmor/policy_unpack.c:345:9-10: WARNING: return of 0/1 in function 'unpack_u64' with return type bool
security/apparmor/policy_unpack.c:309:9-10: WARNING: return of 0/1 in function 'unpack_u8' with return type bool
security/apparmor/policy_unpack.c:568:8-9: WARNING: return of 0/1 in function 'unpack_xattrs' with return type bool
security/apparmor/policy_unpack.c:1007:10-11: WARNING: return of 0/1 in function 'verify_dfa_xindex' with return type bool
security/apparmor/policy_unpack.c:997:9-10: WARNING: return of 0/1 in function 'verify_xindex' with return type bool

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>

c84b80cd 03-Mar-2020 Mateusz Nosek <mateusznosek0@gmail.com>

security/apparmor/label.c: Clean code by removing redundant instructions

Previously 'label->proxy->label' value checking
and conditional reassigning were done twice in the same function.
The second one is redundant and can be removed.

Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>

fe9fd23e 07-May-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

apparmor: Replace zero-length array with flexible-array

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>

a17b53c4 13-May-2020 Alexei Starovoitov <ast@kernel.org>

bpf, capability: Introduce CAP_BPF

Split BPF operations that are allowed under CAP_SYS_ADMIN into
combination of CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN.
For backward compatibility include them in CAP_SYS_ADMIN as well.

The end result provides simple safety model for applications that use BPF:
- to load tracing program types
BPF_PROG_TYPE_{KPROBE, TRACEPOINT, PERF_EVENT, RAW_TRACEPOINT, etc}
use CAP_BPF and CAP_PERFMON
- to load networking program types
BPF_PROG_TYPE_{SCHED_CLS, XDP, SK_SKB, etc}
use CAP_BPF and CAP_NET_ADMIN

There are few exceptions from this rule:
- bpf_trace_printk() is allowed in networking programs, but it's using
tracing mechanism, hence this helper needs additional CAP_PERFMON
if networking program is using this helper.
- BPF_F_ZERO_SEED flag for hash/lru map is allowed under CAP_SYS_ADMIN only
to discourage production use.
- BPF HW offload is allowed under CAP_SYS_ADMIN.
- bpf_probe_write_user() is allowed under CAP_SYS_ADMIN only.

CAPs are not checked at attach/detach time with two exceptions:
- loading BPF_PROG_TYPE_CGROUP_SKB is allowed for unprivileged users,
hence CAP_NET_ADMIN is required at attach time.
- flow_dissector detach doesn't check prog FD at detach,
hence CAP_NET_ADMIN is required at detach time.

CAP_SYS_ADMIN is required to iterate BPF objects (progs, maps, links) via get_next_id
command and convert them to file descriptor via GET_FD_BY_ID command.
This restriction guarantees that mutliple tasks with CAP_BPF are not able to
affect each other. That leads to clean isolation of tasks. For example:
task A with CAP_BPF and CAP_NET_ADMIN loads and attaches a firewall via bpf_link.
task B with the same capabilities cannot detach that firewall unless
task A explicitly passed link FD to task B via scm_rights or bpffs.
CAP_SYS_ADMIN can still detach/unload everything.

Two networking user apps with CAP_SYS_ADMIN and CAP_NET_ADMIN can
accidentely mess with each other programs and maps.
Two networking user apps with CAP_NET_ADMIN and CAP_BPF cannot affect each other.

CAP_NET_ADMIN + CAP_BPF allows networking programs access only packet data.
Such networking progs cannot access arbitrary kernel memory or leak pointers.

bpftool, bpftrace, bcc tools binaries should NOT be installed with
CAP_BPF and CAP_PERFMON, since unpriv users will be able to read kernel secrets.
But users with these two permissions will be able to use these tracing tools.

CAP_PERFMON is least secure, since it allows kprobes and kernel memory access.
CAP_NET_ADMIN can stop network traffic via iproute2.
CAP_BPF is the safest from security point of view and harmless on its own.

Having CAP_BPF and/or CAP_NET_ADMIN is not enough to write into arbitrary map
and if that map is used by firewall-like bpf prog.
CAP_BPF allows many bpf prog_load commands in parallel. The verifier
may consume large amount of memory and significantly slow down the system.

Existing unprivileged BPF operations are not affected.
In particular unprivileged users are allowed to load socket_filter and cg_skb
program types and to create array, hash, prog_array, map-in-map map types.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200513230355.7858-2-alexei.starovoitov@gmail.com

d00f26b6 14-May-2020 David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-05-14

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Merged tag 'perf-for-bpf-2020-05-06' from tip tree that includes CAP_PERFMON.

2) support for narrow loads in bpf_sock_addr progs and additional
helpers in cg-skb progs, from Andrey.

3) bpf benchmark runner, from Andrii.

4) arm and riscv JIT optimizations, from Luke.

5) bpf iterator infrastructure, from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>


84338569 12-May-2020 Dan Carpenter <dan.carpenter@oracle.com>

evm: Fix a small race in init_desc()

The IS_ERR_OR_NULL() function has two conditions and if we got really
unlucky we could hit a race where "ptr" started as an error pointer and
then was set to NULL. Both conditions would be false even though the
pointer at the end was NULL.

This patch fixes the problem by ensuring that "*tfm" can only be NULL
or valid. I have introduced a "tmp_tfm" variable to make that work. I
also reversed a condition and pulled the code in one tab.

Reported-by: Roberto Sassu <roberto.sassu@huawei.com>
Fixes: 53de3b080d5e ("evm: Check also if *tfm is an error pointer in init_desc()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Roberto Sassu <roberto.sassu@huawei.com>
Acked-by: Krzysztof Struczynski <krzysztof.struczynski@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

60cf7c5ed 14-May-2020 Jeremy Cline <jcline@redhat.com>

lockdown: Allow unprivileged users to see lockdown status

A number of userspace tools, such as systemtap, need a way to see the
current lockdown state so they can gracefully deal with the kernel being
locked down. The state is already exposed in
/sys/kernel/security/lockdown, but is only readable by root. Adjust the
permissions so unprivileged users can read the state.

Fixes: 000d388ed3bb ("security: Add a static lockdown policy LSM")
Cc: Frank Ch. Eigler <fche@redhat.com>
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>

fe5a90b8 09-May-2020 YueHaibing <yuehaibing@huawei.com>

selinux: netlabel: Remove unused inline function

There's no callers in-tree.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

27acbf41 14-Apr-2020 Zou Wei <zou_wei@huawei.com>

tomoyo: use true for bool variable

Fixes coccicheck warning:

security/tomoyo/common.c:1028:2-13: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

ef26650a 09-May-2020 YueHaibing <yuehaibing@huawei.com>

Smack: Remove unused inline function smk_ad_setfield_u_fs_path_mnt

commit a269434d2fb4 ("LSM: separate LSM_AUDIT_DATA_DENTRY from LSM_AUDIT_DATA_PATH")
left behind this, remove it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

bce395ee 01-May-2020 Eric Biggers <ebiggers@google.com>

KEYS: encrypted: use crypto_shash_tfm_digest()

Instead of manually allocating a 'struct shash_desc' on the stack and
calling crypto_shash_digest(), switch to using the new helper function
crypto_shash_tfm_digest() which does this for us.

Cc: keyrings@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

0c4395fb 14-Apr-2020 Roberto Sassu <roberto.sassu@huawei.com>

evm: Fix possible memory leak in evm_calc_hmac_or_hash()

Don't immediately return if the signature is portable and security.ima is
not present. Just set error so that memory allocated is freed before
returning from evm_calc_hmac_or_hash().

Fixes: 50b977481fce9 ("EVM: Add support for portable signature format")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

b59fda44 26-Apr-2020 Krzysztof Struczynski <krzysztof.struczynski@huawei.com>

ima: Set again build_ima_appraise variable

After adding the new add_rule() function in commit c52657d93b05
("ima: refactor ima_init_policy()"), all appraisal flags are added to the
temp_ima_appraise variable. Revert to the previous behavior instead of
removing build_ima_appraise, to benefit from the protection offered by
__ro_after_init.

The mentioned commit introduced a bug, as it makes all the flags
modifiable, while build_ima_appraise flags can be protected with
__ro_after_init.

Cc: stable@vger.kernel.org # 5.0.x
Fixes: c52657d93b05 ("ima: refactor ima_init_policy()")
Co-developed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Krzysztof Struczynski <krzysztof.struczynski@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

6ee28442 26-Apr-2020 Krzysztof Struczynski <krzysztof.struczynski@huawei.com>

ima: Remove redundant policy rule set in add_rules()

Function ima_appraise_flag() returns the flag to be set in
temp_ima_appraise depending on the hook identifier passed as an argument.
It is not necessary to set the flag again for the POLICY_CHECK hook.

Signed-off-by: Krzysztof Struczynski <krzysztof.struczynski@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

1129d31b 28-Apr-2020 Krzysztof Struczynski <krzysztof.struczynski@huawei.com>

ima: Fix ima digest hash table key calculation

Function hash_long() accepts unsigned long, while currently only one byte
is passed from ima_hash_key(), which calculates a key for ima_htable.

Given that hashing the digest does not give clear benefits compared to
using the digest itself, remove hash_long() and return the modulus
calculated on the first two bytes of the digest with the number of slots.
Also reduce the depth of the hash table by doubling the number of slots.

Cc: stable@vger.kernel.org
Fixes: 3323eec921ef ("integrity: IMA as an integrity service provider")
Co-developed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Krzysztof Struczynski <krzysztof.struczynski@huawei.com>
Acked-by: David.Laight@aculab.com (big endian system concerns)
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

770f6058 30-Apr-2020 Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>

evm: Fix RCU list related warnings

This patch fixes the following warning and few other instances of
traversal of evm_config_xattrnames list:

[ 32.848432] =============================
[ 32.848707] WARNING: suspicious RCU usage
[ 32.848966] 5.7.0-rc1-00006-ga8d5875ce5f0b #1 Not tainted
[ 32.849308] -----------------------------
[ 32.849567] security/integrity/evm/evm_main.c:231 RCU-list traversed in non-reader section!!

Since entries are only added to the list and never deleted, use
list_for_each_entry_lockless() instead of list_for_each_entry_rcu for
traversing the list. Also, add a relevant comment in evm_secfs.c to
indicate this fact.

Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org> (RCU viewpoint)
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

2e3a34e9 26-Apr-2020 Roberto Sassu <roberto.sassu@huawei.com>

ima: Fix return value of ima_write_policy()

This patch fixes the return value of ima_write_policy() when a new policy
is directly passed to IMA and the current policy requires appraisal of the
file containing the policy. Currently, if appraisal is not in ENFORCE mode,
ima_write_policy() returns 0 and leads user space applications to an
endless loop. Fix this issue by denying the operation regardless of the
appraisal mode.

Cc: stable@vger.kernel.org # 4.10.x
Fixes: 19f8a84713edc ("ima: measure and appraise the IMA policy itself")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Krzysztof Struczynski <krzysztof.struczynski@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

53de3b08 26-Apr-2020 Roberto Sassu <roberto.sassu@huawei.com>

evm: Check also if *tfm is an error pointer in init_desc()

This patch avoids a kernel panic due to accessing an error pointer set by
crypto_alloc_shash(). It occurs especially when there are many files that
require an unsupported algorithm, as it would increase the likelihood of
the following race condition:

Task A: *tfm = crypto_alloc_shash() <= error pointer
Task B: if (*tfm == NULL) <= *tfm is not NULL, use it
Task B: rc = crypto_shash_init(desc) <= panic
Task A: *tfm = NULL

This patch uses the IS_ERR_OR_NULL macro to determine whether or not a new
crypto context must be created.

Cc: stable@vger.kernel.org
Fixes: d46eb3699502b ("evm: crypto hash replaced by shash")
Co-developed-by: Krzysztof Struczynski <krzysztof.struczynski@huawei.com>
Signed-off-by: Krzysztof Struczynski <krzysztof.struczynski@huawei.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

0014cc04 26-Apr-2020 Roberto Sassu <roberto.sassu@huawei.com>

ima: Set file->f_mode instead of file->f_flags in ima_calc_file_hash()

Commit a408e4a86b36 ("ima: open a new file instance if no read
permissions") tries to create a new file descriptor to calculate a file
digest if the file has not been opened with O_RDONLY flag. However, if a
new file descriptor cannot be obtained, it sets the FMODE_READ flag to
file->f_flags instead of file->f_mode.

This patch fixes this issue by replacing f_flags with f_mode as it was
before that commit.

Cc: stable@vger.kernel.org # 4.20.x
Fixes: a408e4a86b36 ("ima: open a new file instance if no read permissions")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

3793faad 06-May-2020 David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Conflicts were all overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>


f87b87a1 06-May-2020 Alexei Starovoitov <ast@kernel.org>

Merge tag 'perf-for-bpf-2020-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into bpf-next

CAP_PERFMON for BPF


4ca75287 28-Apr-2020 Casey Schaufler <casey@schaufler-ca.com>

Smack:- Remove redundant inode_smack cache

The inode_smack cache is no longer used.
Remove it.

Signed-off-by: Vishal Goel <vishal.goel@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

921bb1cb 24-Apr-2020 Casey Schaufler <casey@schaufler-ca.com>

Smack:- Remove mutex lock "smk_lock" from inode_smack

"smk_lock" mutex is used during inode instantiation in
smack_d_instantiate()function. It has been used to avoid
simultaneous access on same inode security structure.
Since smack related initialization is done only once i.e during
inode creation. If the inode has already been instantiated then
smack_d_instantiate() function just returns without doing
anything.

So it means mutex lock is required only during inode creation.
But since 2 processes can't create same inodes or files
simultaneously. Also linking or some other file operation can't
be done simultaneously when the file is getting created since
file lookup will fail before dentry inode linkup which is done
after smack initialization.
So no mutex lock is required in inode_smack structure.

It will save memory as well as improve some performance.
If 40000 inodes are created in system, it will save 1.5 MB on
32-bit systems & 2.8 MB on 64-bit systems.

Signed-off-by: Vishal Goel <vishal.goel@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

84e99e58 09-Apr-2020 Casey Schaufler <casey@schaufler-ca.com>

Smack: slab-out-of-bounds in vsscanf

Add barrier to soob. Return -EOVERFLOW if the buffer
is exceeded.

Suggested-by: Hillf Danton <hdanton@sina.com>
Reported-by: syzbot+bfdd4a2f07be52351350@syzkaller.appspotmail.com
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

092c94ae 09-Apr-2020 Maninder Singh <maninder1.s@samsung.com>

smack: remove redundant structure variable from header.

commit afb1cbe37440 ("LSM: Infrastructure management
of the inode security") removed usage of smk_rcu,
thus removing it from structure.

Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

00720f0e 08-Apr-2020 Arnd Bergmann <arnd@arndb.de>

smack: avoid unused 'sip' variable warning

The mix of IS_ENABLED() and #ifdef checks has left a combination
that causes a warning about an unused variable:

security/smack/smack_lsm.c: In function 'smack_socket_connect':
security/smack/smack_lsm.c:2838:24: error: unused variable 'sip' [-Werror=unused-variable]
2838 | struct sockaddr_in6 *sip = (struct sockaddr_in6 *)sap;

Change the code to use C-style checks consistently so the compiler
can handle it correctly.

Fixes: 87fbfffcc89b ("broken ping to ipv6 linklocal addresses on debian buster")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

03414a49 28-Apr-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: do not allocate hashtabs dynamically

It is simpler to allocate them statically in the corresponding
structure, avoiding unnecessary kmalloc() calls and pointer
dereferencing.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: manual merging required in policydb.c]
Signed-off-by: Paul Moore <paul@paul-moore.com>

46619b44 01-May-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: fix return value on error in policydb_read()

The value of rc is still zero from the last assignment when the error
path is taken. Fix it by setting it to -ENOMEM before the
hashtab_create() call.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e67b2ec9f617 ("selinux: store role transitions in a hash table")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

3348bd33 28-Apr-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: simplify range_write()

No need to traverse the hashtab to count its elements, hashtab already
tracks it for us.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

4c09f8b6 29-Apr-2020 Wei Yongjun <weiyongjun1@huawei.com>

selinux: fix error return code in policydb_read()

Fix to return negative error code -ENOMEM from the kvcalloc() error
handling case instead of 0, as done elsewhere in this function.

Fixes: acdf52d97f82 ("selinux: convert to kvmalloc")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

39e16d93 30-Apr-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20200430' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux fixes from Paul Moore:
"Two more SELinux patches to fix problems in the v5.7-rcX releases.

Wei Yongjun's patch fixes a return code in an error path, and my patch
fixes a problem where we were not correctly applying access controls
to all of the netlink messages in the netlink_send LSM hook"

* tag 'selinux-pr-20200430' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: properly handle multiple messages in selinux_netlink_send()
selinux: fix error return code in cond_read_list()


fb739741 28-Apr-2020 Paul Moore <paul@paul-moore.com>

selinux: properly handle multiple messages in selinux_netlink_send()

Fix the SELinux netlink_send hook to properly handle multiple netlink
messages in a single sk_buff; each message is parsed and subject to
SELinux access control. Prior to this patch, SELinux only inspected
the first message in the sk_buff.

Cc: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

0b54142e 28-Apr-2020 Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'work.sysctl' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull in Christoph Hellwig's series that changes the sysctl's ->proc_handler
methods to take kernel pointers instead. It gets rid of the set_fs address
space overrides used by BPF. As per discussion, pull in the feature branch
into bpf-next as it relates to BPF sysctl progs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200427071508.GV23230@ZenIV.linux.org.uk/T/


292fed1f 26-Apr-2020 Wei Yongjun <weiyongjun1@huawei.com>

selinux: fix error return code in cond_read_list()

Fix to return negative error code -ENOMEM from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: 60abd3181db2 ("selinux: convert cond_list to array")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

32927393 24-Apr-2020 Christoph Hellwig <hch@lst.de>

sysctl: pass kernel pointers to ->proc_handler

Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from userspace in common code. This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.

As most handler just pass through the data to one of the common handlers
a lot of the changes are mechnical.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c59f415a 23-Apr-2020 Alexey Gladkov <gladkov.alexey@gmail.com>

Use proc_pid_ns() to get pid_namespace from the proc superblock

To get pid_namespace from the procfs superblock should be used a special
helper. This will avoid errors when s_fs_info will change the type.

Link: https://lore.kernel.org/lkml/20200423200316.164518-3-gladkov.alexey@gmail.com/
Link: https://lore.kernel.org/lkml/20200423112858.95820-1-gladkov.alexey@gmail.com/
Link: https://lore.kernel.org/lkml/06B50A1C-406F-4057-BFA8-3A7729EA7469@lca.pw/
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

9521eb3e 20-Apr-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: don't produce incorrect filename_trans_count

I thought I fixed the counting in filename_trans_read_helper() to count
the compat rule count correctly in the final version, but it's still
wrong. To really count the same thing as in the compat path, we'd need
to add up the cardinalities of stype bitmaps of all datums.

Since the kernel currently doesn't implement an ebitmap_cardinality()
function (and computing the proper count would just waste CPU cycles
anyway), just document that we use the field only in case of the old
format and stop updating it in filename_trans_read_helper().

Fixes: 430059024389 ("selinux: implement new format of filename transitions")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

87cfeb19 22-Apr-2020 Ingo Molnar <mingo@kernel.org>

Merge tag 'perf-core-for-mingo-5.8-20200420' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core fixes and improvements from Arnaldo Carvalho de Melo:

kernel + tools/perf:

Alexey Budankov:

- Introduce CAP_PERFMON to kernel and user space.

callchains:

Adrian Hunter:

- Allow using Intel PT to synthesize callchains for regular events.

Kan Liang:

- Stitch LBR records from multiple samples to get deeper backtraces,
there are caveats, see the csets for details.

perf script:

Andreas Gerstmayr:

- Add flamegraph.py script

BPF:

Jiri Olsa:

- Synthesize bpf_trampoline/dispatcher ksymbol events.

perf stat:

Arnaldo Carvalho de Melo:

- Honour --timeout for forked workloads.

Stephane Eranian:

- Force error in fallback on :k events, to avoid counting nothing when
the user asks for kernel events but is not allowed to.

perf bench:

Ian Rogers:

- Add event synthesis benchmark.

tools api fs:

Stephane Eranian:

- Make xxx__mountpoint() more scalable

libtraceevent:

He Zhe:

- Handle return value of asprintf.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


2592677c 25-Mar-2020 Roberto Sassu <roberto.sassu@huawei.com>

ima: Use ima_hash_algo for collision detection in the measurement list

Before calculating a digest for each PCR bank, collisions were detected
with a SHA1 digest. This patch includes ima_hash_algo among the algorithms
used to calculate the template digest and checks collisions on that digest.

The position in the measurement entry array of the template digest
calculated with the IMA default hash algorithm is stored in the
ima_hash_algo_idx global variable and is determined at IMA initialization
time.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

1ea973df 25-Mar-2020 Roberto Sassu <roberto.sassu@huawei.com>

ima: Calculate and extend PCR with digests in ima_template_entry

This patch modifies ima_calc_field_array_hash() to calculate a template
digest for each allocated PCR bank and SHA1. It also passes the tpm_digest
array of the template entry to ima_pcr_extend() or in case of a violation,
the pre-initialized digests array filled with 0xff.

Padding with zeros is still done if the mapping between TPM algorithm ID
and crypto ID is unknown.

This patch calculates again the template digest when a measurement list is
restored. Copying only the SHA1 digest (due to the limitation of the
current measurement list format) is not sufficient, as hash collision
detection will be done on the digest calculated with the IMA default hash
algorithm.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

6d94809a 25-Mar-2020 Roberto Sassu <roberto.sassu@huawei.com>

ima: Allocate and initialize tfm for each PCR bank

This patch creates a crypto_shash structure for each allocated PCR bank and
for SHA1 if a bank with that algorithm is not currently allocated.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

aa724fe1 25-Mar-2020 Roberto Sassu <roberto.sassu@huawei.com>

ima: Switch to dynamically allocated buffer for template digests

This patch dynamically allocates the array of tpm_digest structures in
ima_alloc_init_template() and ima_restore_template_data(). The size of the
array is equal to the number of PCR banks plus ima_extra_slots, to make
room for SHA1 and the IMA default hash algorithm, when PCR banks with those
algorithms are not allocated.

Calculating the SHA1 digest is mandatory, as SHA1 still remains the default
hash algorithm for the measurement list. When IMA will support the Crypto
Agile format, remaining digests will be also provided.

The position in the measurement entry array of the SHA1 digest is stored in
the ima_sha1_idx global variable and is determined at IMA initialization
time.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

7ca79645 25-Mar-2020 Roberto Sassu <roberto.sassu@huawei.com>

ima: Store template digest directly in ima_template_entry

In preparation for the patch that calculates a digest for each allocated
PCR bank, this patch passes to ima_calc_field_array_hash() the
ima_template_entry structure, so that digests can be directly stored in
that structure instead of ima_digest_data.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

e144d6b2 25-Mar-2020 Roberto Sassu <roberto.sassu@huawei.com>

ima: Evaluate error in init_ima()

Evaluate error in init_ima() before register_blocking_lsm_notifier() and
return if not zero.

Cc: stable@vger.kernel.org # 5.3.x
Fixes: b16942455193 ("ima: use the lsm policy update notifier")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

6f1a1d10 25-Mar-2020 Roberto Sassu <roberto.sassu@huawei.com>

ima: Switch to ima_hash_algo for boot aggregate

boot_aggregate is the first entry of IMA measurement list. Its purpose is
to link pre-boot measurements to IMA measurements. As IMA was designed to
work with a TPM 1.2, the SHA1 PCR bank was always selected even if a
TPM 2.0 with support for stronger hash algorithms is available.

This patch first tries to find a PCR bank with the IMA default hash
algorithm. If it does not find it, it selects the SHA256 PCR bank for
TPM 2.0 and SHA1 for TPM 1.2. Ultimately, it selects SHA1 also for TPM 2.0
if the SHA256 PCR bank is not found.

If none of the PCR banks above can be found, boot_aggregate file digest is
filled with zeros, as for TPM bypass, making it impossible to perform a
remote attestation of the system.

Cc: stable@vger.kernel.org # 5.1.x
Fixes: 879b589210a9 ("tpm: retrieve digest size of unknown algorithms with PCR read")
Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
Suggested-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

43005902 16-Apr-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: implement new format of filename transitions

Implement a new, more space-efficient way of storing filename
transitions in the binary policy. The internal structures have already
been converted to this new representation; this patch just implements
reading/writing an equivalent represntation from/to the binary policy.

This new format reduces the size of Fedora policy from 7.6 MB to only
3.3 MB (with policy optimization enabled in both cases). With the
unconfined module disabled, the size is reduced from 3.3 MB to 2.4 MB.

The time to load policy into kernel is also shorter with the new format.
On Fedora Rawhide x86_64 it dropped from 157 ms to 106 ms; without the
unconfined module from 115 ms to 105 ms.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

225621c9 17-Apr-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: move context hashing under sidtab

Now that context hash computation no longer depends on policydb, we can
simplify things by moving the context hashing completely under sidtab.
The hash is still cached in sidtab entries, but not for the in-flight
context structures.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

50077289 17-Apr-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: hash context structure directly

Always hashing the string representation is inefficient. Just hash the
contents of the structure directly (using jhash). If the context is
invalid (str & len are set), then hash the string as before, otherwise
hash the structured data.

Since the context hashing function is now faster (about 10 times), this
patch decreases the overhead of security_transition_sid(), which is
called from many hooks.

The jhash function seemed as a good choice, since it is used as the
default hashing algorithm in rhashtable.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Jeff Vander Stoep <jeffv@google.com>
Tested-by: Jeff Vander Stoep <jeffv@google.com>
[PM: fixed some spelling errors in the comments pointed out by JVS]
Signed-off-by: Paul Moore <paul@paul-moore.com>

e67b2ec9 07-Apr-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: store role transitions in a hash table

Currently, they are stored in a linked list, which adds significant
overhead to security_transition_sid(). On Fedora, with 428 role
transitions in policy, converting this list to a hash table cuts down
its run time by about 50%. This was measured by running 'stress-ng --msg
1 --msg-ops 100000' under perf with and without this patch.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

9786cab6 16-Apr-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20200416' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux fix from Paul Moore:
"One small SELinux fix to ensure we cleanup properly on an error
condition"

* tag 'selinux-pr-20200416' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: free str on error in str_read()


86d32f9a 14-Apr-2020 Vasily Averin <vvs@virtuozzo.com>

keys: Fix proc_keys_next to increase position index

If seq_file .next function does not change position index,
read after some lseek can generate unexpected output:

$ dd if=/proc/keys bs=1 # full usual output
0f6bfdf5 I--Q--- 2 perm 3f010000 1000 1000 user 4af2f79ab8848d0a: 740
1fb91b32 I--Q--- 3 perm 1f3f0000 1000 65534 keyring _uid.1000: 2
27589480 I--Q--- 1 perm 0b0b0000 0 0 user invocation_id: 16
2f33ab67 I--Q--- 152 perm 3f030000 0 0 keyring _ses: 2
33f1d8fa I--Q--- 4 perm 3f030000 1000 1000 keyring _ses: 1
3d427fda I--Q--- 2 perm 3f010000 1000 1000 user 69ec44aec7678e5a: 740
3ead4096 I--Q--- 1 perm 1f3f0000 1000 65534 keyring _uid_ses.1000: 1
521+0 records in
521+0 records out
521 bytes copied, 0,00123769 s, 421 kB/s

But a read after lseek in middle of last line results in the partial
last line and then a repeat of the final line:

$ dd if=/proc/keys bs=500 skip=1
dd: /proc/keys: cannot skip to specified offset
g _uid_ses.1000: 1
3ead4096 I--Q--- 1 perm 1f3f0000 1000 65534 keyring _uid_ses.1000: 1
0+1 records in
0+1 records out
97 bytes copied, 0,000135035 s, 718 kB/s

and a read after lseek beyond end of file results in the last line being
shown:

$ dd if=/proc/keys bs=1000 skip=1 # read after lseek beyond end of file
dd: /proc/keys: cannot skip to specified offset
3ead4096 I--Q--- 1 perm 1f3f0000 1000 65534 keyring _uid_ses.1000: 1
0+1 records in
0+1 records out
76 bytes copied, 0,000119981 s, 633 kB/s

See https://bugzilla.kernel.org/show_bug.cgi?id=206283

Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code ...")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

98073728 02-Apr-2020 Alexey Budankov <alexey.budankov@linux.intel.com>

capabilities: Introduce CAP_PERFMON to kernel and user space

Introduce the CAP_PERFMON capability designed to secure system
performance monitoring and observability operations so that CAP_PERFMON
can assist CAP_SYS_ADMIN capability in its governing role for
performance monitoring and observability subsystems.

CAP_PERFMON hardens system security and integrity during performance
monitoring and observability operations by decreasing attack surface that
is available to a CAP_SYS_ADMIN privileged process [2]. Providing the access
to system performance monitoring and observability operations under CAP_PERFMON
capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes
chances to misuse the credentials and makes the operation more secure.

Thus, CAP_PERFMON implements the principle of least privilege for
performance monitoring and observability operations (POSIX IEEE 1003.1e:
2.2.2.39 principle of least privilege: A security design principle that
states that a process or program be granted only those privileges
(e.g., capabilities) necessary to accomplish its legitimate function,
and only for the time that such privileges are actually required)

CAP_PERFMON meets the demand to secure system performance monitoring and
observability operations for adoption in security sensitive, restricted,
multiuser production environments (e.g. HPC clusters, cloud and virtual compute
environments), where root or CAP_SYS_ADMIN credentials are not available to
mass users of a system, and securely unblocks applicability and scalability
of system performance monitoring and observability operations beyond root
and CAP_SYS_ADMIN use cases.

CAP_PERFMON takes over CAP_SYS_ADMIN credentials related to system performance
monitoring and observability operations and balances amount of CAP_SYS_ADMIN
credentials following the recommendations in the capabilities man page [1]
for CAP_SYS_ADMIN: "Note: this capability is overloaded; see Notes to kernel
developers, below." For backward compatibility reasons access to system
performance monitoring and observability subsystems of the kernel remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN capability
usage for secure system performance monitoring and observability operations
is discouraged with respect to the designed CAP_PERFMON capability.

Although the software running under CAP_PERFMON can not ensure avoidance
of related hardware issues, the software can still mitigate these issues
following the official hardware issues mitigation procedure [2]. The bugs
in the software itself can be fixed following the standard kernel development
process [3] to maintain and harden security of system performance monitoring
and observability operations.

[1] http://man7.org/linux/man-pages/man7/capabilities.7.html
[2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
[3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Igor Lubashev <ilubashe@akamai.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: linux-doc@vger.kernel.org
Cc: linux-man@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: selinux@vger.kernel.org
Link: http://lore.kernel.org/lkml/5590d543-82c6-490a-6544-08e6a5517db0@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

433e3aa3 08-Apr-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: drop unnecessary smp_load_acquire() call

In commit 66f8e2f03c02 ("selinux: sidtab reverse lookup hash table") the
corresponding load is moved under the spin lock, so there is no race
possible and we can read the count directly. The smp_store_release() is
still needed to avoid racing with the lock-free readers.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

af15f14c 14-Apr-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: free str on error in str_read()

In [see "Fixes:"] I missed the fact that str_read() may give back an
allocated pointer even if it returns an error, causing a potential
memory leak in filename_trans_read_one(). Fix this by making the
function free the allocated string whenever it returns a non-zero value,
which also makes its behavior more obvious and prevents repeating the
same mistake in the future.

Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1461665 ("Resource leaks")
Fixes: c3a276111ea2 ("selinux: optimize storage of filename transitions")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>

4b850396 14-Apr-2020 Zou Wei <zou_wei@huawei.com>

selinux: fix warning Comparison to bool

fix below warnings reported by coccicheck

security/selinux/ss/mls.c:539:39-43: WARNING: Comparison to bool
security/selinux/ss/services.c:1815:46-50: WARNING: Comparison to bool
security/selinux/ss/services.c:1827:46-50: WARNING: Comparison to bool

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

eec8fd02 03-Apr-2020 Odin Ugedal <odin@ugedal.com>

device_cgroup: Cleanup cgroup eBPF device filter code

Original cgroup v2 eBPF code for filtering device access made it
possible to compile with CONFIG_CGROUP_DEVICE=n and still use the eBPF
filtering. Change
commit 4b7d4d453fc4 ("device_cgroup: Export devcgroup_check_permission")
reverted this, making it required to set it to y.

Since the device filtering (and all the docs) for cgroup v2 is no longer
a "device controller" like it was in v1, someone might compile their
kernel with CONFIG_CGROUP_DEVICE=n. Then (for linux 5.5+) the eBPF
filter will not be invoked, and all processes will be allowed access
to all devices, no matter what the eBPF filter says.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

c27c6bd2 31-Mar-2020 John Johansen <john.johansen@canonical.com>

apparmor: ensure that dfa state tables have entries

Currently it is possible to specify a state machine table with 0 length,
this is not valid as optional tables are specified by not defining
the table as present. Further this allows by-passing the base tables
range check against the next/check tables.

Fixes: d901d6a298dc ("apparmor: dfa split verification of table headers")
Reported-by: Mike Salvatore <mike.salvatore@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>

4c205c84 04-Apr-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'keys-fixes-20200329' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull keyrings fixes from David Howells:
"Here's a couple of patches that fix a circular dependency between
holding key->sem and mm->mmap_sem when reading data from a key.

One potential issue is that a filesystem looking to use a key inside,
say, ->readpages() could deadlock if the key being read is the key
that's required and the buffer the key is being read into is on a page
that needs to be fetched.

The case actually detected is a bit more involved - with a filesystem
calling request_key() and locking the target keyring for write - which
could be being read"

* tag 'keys-fixes-20200329' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
KEYS: Avoid false positive ENOMEM error on key read
KEYS: Don't write out to userspace while holding key semaphore


ff2ae607 03-Apr-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx

Pull SPDX updates from Greg KH:
"Here are three SPDX patches for 5.7-rc1.

One fixes up the SPDX tag for a single driver, while the other two go
through the tree and add SPDX tags for all of the .gitignore files as
needed.

Nothing too complex, but you will get a merge conflict with your
current tree, that should be trivial to handle (one file modified by
two things, one file deleted.)

All three of these have been in linux-next for a while, with no
reported issues other than the merge conflict"

* tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
ASoC: MT6660: make spdxcheck.py happy
.gitignore: add SPDX License Identifier
.gitignore: remove too obvious comments


7f218319 02-Apr-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity updates from Mimi Zohar:
"Just a couple of updates for linux-5.7:

- A new Kconfig option to enable IMA architecture specific runtime
policy rules needed for secure and/or trusted boot, as requested.

- Some message cleanup (eg. pr_fmt, additional error messages)"

* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: add a new CONFIG for loading arch-specific policies
integrity: Remove duplicate pr_fmt definitions
IMA: Add log statements for failure conditions
IMA: Update KBUILD_MODNAME for IMA files to ima


29d9f30d 31-Mar-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from David Miller:
"Highlights:

1) Fix the iwlwifi regression, from Johannes Berg.

2) Support BSS coloring and 802.11 encapsulation offloading in
hardware, from John Crispin.

3) Fix some potential Spectre issues in qtnfmac, from Sergey
Matyukevich.

4) Add TTL decrement action to openvswitch, from Matteo Croce.

5) Allow paralleization through flow_action setup by not taking the
RTNL mutex, from Vlad Buslov.

6) A lot of zero-length array to flexible-array conversions, from
Gustavo A. R. Silva.

7) Align XDP statistics names across several drivers for consistency,
from Lorenzo Bianconi.

8) Add various pieces of infrastructure for offloading conntrack, and
make use of it in mlx5 driver, from Paul Blakey.

9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.

10) Lots of parallelization improvements during configuration changes
in mlxsw driver, from Ido Schimmel.

11) Add support to devlink for generic packet traps, which report
packets dropped during ACL processing. And use them in mlxsw
driver. From Jiri Pirko.

12) Support bcmgenet on ACPI, from Jeremy Linton.

13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei
Starovoitov, and your's truly.

14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.

15) Fix sysfs permissions when network devices change namespaces, from
Christian Brauner.

16) Add a flags element to ethtool_ops so that drivers can more simply
indicate which coalescing parameters they actually support, and
therefore the generic layer can validate the user's ethtool
request. Use this in all drivers, from Jakub Kicinski.

17) Offload FIFO qdisc in mlxsw, from Petr Machata.

18) Support UDP sockets in sockmap, from Lorenz Bauer.

19) Fix stretch ACK bugs in several TCP congestion control modules,
from Pengcheng Yang.

20) Support virtual functiosn in octeontx2 driver, from Tomasz
Duszynski.

21) Add region operations for devlink and use it in ice driver to dump
NVM contents, from Jacob Keller.

22) Add support for hw offload of MACSEC, from Antoine Tenart.

23) Add support for BPF programs that can be attached to LSM hooks,
from KP Singh.

24) Support for multiple paths, path managers, and counters in MPTCP.
From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
and others.

25) More progress on adding the netlink interface to ethtool, from
Michal Kubecek"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
cxgb4/chcr: nic-tls stats in ethtool
net: dsa: fix oops while probing Marvell DSA switches
net/bpfilter: remove superfluous testing message
net: macb: Fix handling of fixed-link node
net: dsa: ksz: Select KSZ protocol tag
netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
net: stmmac: create dwmac-intel.c to contain all Intel platform
net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
net: dsa: bcm_sf2: Add support for matching VLAN TCI
net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
net: dsa: bcm_sf2: Disable learning for ASP port
net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
net: dsa: b53: Restore VLAN entries upon (re)configuration
net: dsa: bcm_sf2: Fix overflow checks
hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
...


b3aa112d 31-Mar-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20200330' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux updates from Paul Moore:
"We've got twenty SELinux patches for the v5.7 merge window, the
highlights are below:

- Deprecate setting /sys/fs/selinux/checkreqprot to 1.

This flag was originally created to deal with legacy userspace and
the READ_IMPLIES_EXEC personality flag. We changed the default from
1 to 0 back in Linux v4.4 and now we are taking the next step of
deprecating it, at some point in the future we will take the final
step of rejecting 1.

- Allow kernfs symlinks to inherit the SELinux label of the parent
directory. In order to preserve backwards compatibility this is
protected by the genfs_seclabel_symlinks SELinux policy capability.

- Optimize how we store filename transitions in the kernel, resulting
in some significant improvements to policy load times.

- Do a better job calculating our internal hash table sizes which
resulted in additional policy load improvements and likely general
SELinux performance improvements as well.

- Remove the unused initial SIDs (labels) and improve how we handle
initial SIDs.

- Enable per-file labeling for the bpf filesystem.

- Ensure that we properly label NFS v4.2 filesystems to avoid a
temporary unlabeled condition.

- Add some missing XFS quota command types to the SELinux quota
access controls.

- Fix a problem where we were not updating the seq_file position
index correctly in selinuxfs.

- We consolidate some duplicated code into helper functions.

- A number of list to array conversions.

- Update Stephen Smalley's email address in MAINTAINERS"

* tag 'selinux-pr-20200330' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: clean up indentation issue with assignment statement
NFS: Ensure security label is set for root inode
MAINTAINERS: Update my email address
selinux: avtab_init() and cond_policydb_init() return void
selinux: clean up error path in policydb_init()
selinux: remove unused initial SIDs and improve handling
selinux: reduce the use of hard-coded hash sizes
selinux: Add xfs quota command types
selinux: optimize storage of filename transitions
selinux: factor out loop body from filename_trans_read()
security: selinux: allow per-file labeling for bpffs
selinux: generalize evaluate_cond_node()
selinux: convert cond_expr to array
selinux: convert cond_av_list to array
selinux: convert cond_list to array
selinux: sel_avc_get_stat_idx should increase position index
selinux: allow kernfs symlinks to inherit parent directory context
selinux: simplify evaluate_cond_node()
Documentation,selinux: deprecate setting checkreqprot to 1
selinux: move status variables out of selinux_ss


c753924b 27-Mar-2020 Colin Ian King <colin.king@canonical.com>

selinux: clean up indentation issue with assignment statement

The assignment of e->type_names is indented one level too deep,
clean this up by removing the extraneous tab.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

a776c270 30-Mar-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI updates from Ingo Molnar:
"The EFI changes in this cycle are much larger than usual, for two
(positive) reasons:

- The GRUB project is showing signs of life again, resulting in the
introduction of the generic Linux/UEFI boot protocol, instead of
x86 specific hacks which are increasingly difficult to maintain.
There's hope that all future extensions will now go through that
boot protocol.

- Preparatory work for RISC-V EFI support.

The main changes are:

- Boot time GDT handling changes

- Simplify handling of EFI properties table on arm64

- Generic EFI stub cleanups, to improve command line handling, file
I/O, memory allocation, etc.

- Introduce a generic initrd loading method based on calling back
into the firmware, instead of relying on the x86 EFI handover
protocol or device tree.

- Introduce a mixed mode boot method that does not rely on the x86
EFI handover protocol either, and could potentially be adopted by
other architectures (if another one ever surfaces where one
execution mode is a superset of another)

- Clean up the contents of 'struct efi', and move out everything that
doesn't need to be stored there.

- Incorporate support for UEFI spec v2.8A changes that permit
firmware implementations to return EFI_UNSUPPORTED from UEFI
runtime services at OS runtime, and expose a mask of which ones are
supported or unsupported via a configuration table.

- Partial fix for the lack of by-VA cache maintenance in the
decompressor on 32-bit ARM.

- Changes to load device firmware from EFI boot service memory
regions

- Various documentation updates and minor code cleanups and fixes"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
efi/libstub/arm: Fix spurious message that an initrd was loaded
efi/libstub/arm64: Avoid image_base value from efi_loaded_image
partitions/efi: Fix partition name parsing in GUID partition entry
efi/x86: Fix cast of image argument
efi/libstub/x86: Use ULONG_MAX as upper bound for all allocations
efi: Fix a mistype in comments mentioning efivar_entry_iter_begin()
efi/libstub: Avoid linking libstub/lib-ksyms.o into vmlinux
efi/x86: Preserve %ebx correctly in efi_set_virtual_address_map()
efi/x86: Ignore the memory attributes table on i386
efi/x86: Don't relocate the kernel unless necessary
efi/x86: Remove extra headroom for setup block
efi/x86: Add kernel preferred address to PE header
efi/x86: Decompress at start of PE image load address
x86/boot/compressed/32: Save the output address instead of recalculating it
efi/libstub/x86: Deal with exit() boot service returning
x86/boot: Use unsigned comparison for addresses
efi/x86: Avoid using code32_start
efi/x86: Make efi32_pe_entry() more readable
efi/x86: Respect 32-bit ABI in efi32_pe_entry()
efi/x86: Annotate the LOADED_IMAGE_PROTOCOL_GUID with SYM_DATA
...


520b7aa0 28-Mar-2020 KP Singh <kpsingh@google.com>

bpf: lsm: Initialize the BPF LSM hooks

* The hooks are initialized using the definitions in
include/linux/lsm_hook_defs.h.
* The LSM can be enabled / disabled with CONFIG_BPF_LSM.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Florent Revest <revest@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Link: https://lore.kernel.org/bpf/20200329004356.27286-6-kpsingh@chromium.org

98e828a0 28-Mar-2020 KP Singh <kpsingh@google.com>

security: Refactor declaration of LSM hooks

The information about the different types of LSM hooks is scattered
in two locations i.e. union security_list_options and
struct security_hook_heads. Rather than duplicating this information
even further for BPF_PROG_TYPE_LSM, define all the hooks with the
LSM_HOOK macro in lsm_hook_defs.h which is then used to generate all
the data structures required by the LSM framework.

The LSM hooks are defined as:

LSM_HOOK(<return_type>, <default_value>, <hook_name>, args...)

with <default_value> acccessible in security.c as:

LSM_RET_DEFAULT(<hook_name>)

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Florent Revest <revest@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Link: https://lore.kernel.org/bpf/20200329004356.27286-3-kpsingh@chromium.org

4f088249 21-Mar-2020 Waiman Long <longman@redhat.com>

KEYS: Avoid false positive ENOMEM error on key read

By allocating a kernel buffer with a user-supplied buffer length, it
is possible that a false positive ENOMEM error may be returned because
the user-supplied length is just too large even if the system do have
enough memory to hold the actual key data.

Moreover, if the buffer length is larger than the maximum amount of
memory that can be returned by kmalloc() (2^(MAX_ORDER-1) number of
pages), a warning message will also be printed.

To reduce this possibility, we set a threshold (PAGE_SIZE) over which we
do check the actual key length first before allocating a buffer of the
right size to hold it. The threshold is arbitrary, it is just used to
trigger a buffer length check. It does not limit the actual key length
as long as there is enough memory to satisfy the memory request.

To further avoid large buffer allocation failure due to page
fragmentation, kvmalloc() is used to allocate the buffer so that vmapped
pages can be used when there is not a large enough contiguous set of
pages available for allocation.

In the extremely unlikely scenario that the key keeps on being changed
and made longer (still <= buflen) in between 2 __keyctl_read_key()
calls, the __keyctl_read_key() calling loop in keyctl_read_key() may
have to be iterated a large number of times, but definitely not infinite.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>

d3ec10aa 21-Mar-2020 Waiman Long <longman@redhat.com>

KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>

d198b34f 03-Mar-2020 Masahiro Yamada <masahiroy@kernel.org>

.gitignore: add SPDX License Identifier

Add SPDX License Identifier to all .gitignore files.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2985bed6 03-Mar-2020 Masahiro Yamada <masahiroy@kernel.org>

.gitignore: remove too obvious comments

Some .gitignore files have comments like "Generated files",
"Ignore generated files" at the header part, but they are
too obvious.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2e356101 27-Feb-2020 Yang Xu <xuyang2018.jy@cn.fujitsu.com>

KEYS: reaching the keys quotas correctly

Currently, when we add a new user key, the calltrace as below:

add_key()
key_create_or_update()
key_alloc()
__key_instantiate_and_link
generic_key_instantiate
key_payload_reserve
......

Since commit a08bf91ce28e ("KEYS: allow reaching the keys quotas exactly"),
we can reach max bytes/keys in key_alloc, but we forget to remove this
limit when we reserver space for payload in key_payload_reserve. So we
can only reach max keys but not max bytes when having delta between plen
and type->def_datalen. Remove this limit when instantiating the key, so we
can keep consistent with key_alloc.

Also, fix the similar problem in keyctl_chown_key().

Fixes: 0b77f5bfb45c ("keys: make the keyring quotas controllable through /proc/sys")
Fixes: a08bf91ce28e ("KEYS: allow reaching the keys quotas exactly")
Cc: stable@vger.kernel.org # 5.0.x
Cc: Eric Biggers <ebiggers@google.com>
Signed-off-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>

9e2b4be3 08-Mar-2020 Nayna Jain <nayna@linux.ibm.com>

ima: add a new CONFIG for loading arch-specific policies

Every time a new architecture defines the IMA architecture specific
functions - arch_ima_get_secureboot() and arch_ima_get_policy(), the IMA
include file needs to be updated. To avoid this "noise", this patch
defines a new IMA Kconfig IMA_SECURE_AND_OR_TRUSTED_BOOT option, allowing
the different architectures to select it.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Philipp Rudo <prudo@linux.ibm.com> (s390)
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

5e729e11 05-Mar-2020 Paul Moore <paul@paul-moore.com>

selinux: avtab_init() and cond_policydb_init() return void

The avtab_init() and cond_policydb_init() functions always return
zero so mark them as returning void and update the callers not to
check for a return value.

Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

34a2dab4 02-Mar-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: clean up error path in policydb_init()

Commit e0ac568de1fa ("selinux: reduce the use of hard-coded hash sizes")
moved symtab initialization out of policydb_init(), but left the cleanup
of symtabs from the error path. This patch fixes the oversight.

Suggested-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

555d6d71 18-Feb-2020 Tushar Sugandhi <tusharsu@linux.microsoft.com>

integrity: Remove duplicate pr_fmt definitions

The #define for formatting log messages, pr_fmt, is duplicated in the
files under security/integrity.

This change moves the definition to security/integrity/integrity.h and
removes the duplicate definitions in the other files under
security/integrity.

With this change, the messages in the following files will be prefixed
with 'integrity'.

security/integrity/platform_certs/platform_keyring.c
security/integrity/platform_certs/load_powerpc.c
security/integrity/platform_certs/load_uefi.c
security/integrity/iint.c

e.g. "integrity: Error adding keys to platform keyring %s\n"

And the messages in the following file will be prefixed with 'ima'.

security/integrity/ima/ima_mok.c

e.g. "ima: Allocating IMA blacklist keyring.\n"

For the rest of the files under security/integrity, there will be no
change in the message format.

Suggested-by: Shuah Khan <skhan@linuxfoundation.org>
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

72ec611c 18-Feb-2020 Tushar Sugandhi <tusharsu@linux.microsoft.com>

IMA: Add log statements for failure conditions

process_buffer_measurement() does not have log messages for failure
conditions.

This change adds a log statement in the above function.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

e2bf6814 18-Feb-2020 Tushar Sugandhi <tusharsu@linux.microsoft.com>

IMA: Update KBUILD_MODNAME for IMA files to ima

The kbuild Makefile specifies object files for vmlinux in the $(obj-y)
lists. These lists depend on the kernel configuration[1].

The kbuild Makefile for IMA combines the object files for IMA into a
single object file namely ima.o. All the object files for IMA should be
combined into ima.o. But certain object files are being added to their
own $(obj-y). This results in the log messages from those modules getting
prefixed with their respective base file name, instead of "ima". This is
inconsistent with the log messages from the IMA modules that are combined
into ima.o.

This change fixes the above issue.

[1] Documentation\kbuild\makefiles.rst

Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

e3e0b582 24-Feb-2020 Stephen Smalley <sds@tycho.nsa.gov>

selinux: remove unused initial SIDs and improve handling

Remove initial SIDs that have never been used or are no longer used by
the kernel from its string table, which is also used to generate the
SECINITSID_* symbols referenced in code. Update the code to
gracefully handle the fact that these can now be NULL. Stop treating
it as an error if a policy defines additional initial SIDs unknown to
the kernel. Do not load unused initial SID contexts into the sidtab.
Fix the incorrect usage of the name from the ocontext in error
messages when loading initial SIDs since these are not presently
written to the kernel policy and are therefore always NULL.

After this change, it is possible to safely reclaim and reuse some of
the unused initial SIDs without compatibility issues. Specifically,
unused initial SIDs that were being assigned the same context as the
unlabeled initial SID in policies can be reclaimed and reused for
another purpose, with existing policies still treating them as having
the unlabeled context and future policies having the option of mapping
them to a more specific context. For example, this could have been
used when the infiniband labeling support was introduced to define
initial SIDs for the default pkey and endport SIDs similar to the
handling of port/netif/node SIDs rather than always using
SECINITSID_UNLABELED as the default.

The set of safely reclaimable unused initial SIDs across all known
policies is igmp_packet (13), icmp_socket (14), tcp_socket (15), kmod
(24), policy (25), and scmp_packet (26); these initial SIDs were
assigned the same context as unlabeled in all known policies including
mls. If only considering non-mls policies (i.e. assuming that mls
users always upgrade policy with their kernels), the set of safely
reclaimable unused initial SIDs further includes file_labels (6), init
(7), sysctl_modprobe (16), and sysctl_fs (18) through sysctl_dev (23).

Adding new initial SIDs beyond SECINITSID_NUM to policy unfortunately
became a fatal error in commit 24ed7fdae669 ("selinux: use separate
table for initial SID lookup") and even before that it could cause
problems on a policy reload (collision between the new initial SID and
one allocated at runtime) ever since commit 42596eafdd75 ("selinux:
load the initial SIDs upon every policy load") so we cannot safely
start adding new initial SIDs to policies beyond SECINITSID_NUM (27)
until such a time as all such kernels do not need to be supported and
only those that include this commit are relevant. That is not a big
deal since we haven't added a new initial SID since 2004 (v2.6.7) and
we have plenty of unused ones we can reclaim if we truly need one.

If we want to avoid the wasted storage in initial_sid_to_string[]
and/or sidtab->isids[] for the unused initial SIDs, we could introduce
an indirection between the kernel initial SID values and the policy
initial SID values and just map the policy SID values in the ocontexts
to the kernel values during policy_load_isids(). Originally I thought
we'd do this by preserving the initial SID names in the kernel policy
and creating a mapping at load time like we do for the security
classes and permissions but that would require a new kernel policy
format version and associated changes to libsepol/checkpolicy and I'm
not sure it is justified. Simpler approach is just to create a fixed
mapping table in the kernel from the existing fixed policy values to
the kernel values. Less flexible but probably sufficient.

A separate selinux userspace change was applied in
https://github.com/SELinuxProject/selinux/commit/8677ce5e8f592950ae6f14cea1b68a20ddc1ac25
to enable removal of most of the unused initial SID contexts from
policies, but there is no dependency between that change and this one.
That change permits removing all of the unused initial SID contexts
from policy except for the fs and sysctl SID contexts. The initial
SID declarations themselves would remain in policy to preserve the
values of subsequent ones but the contexts can be dropped. If/when
the kernel decides to reuse one of them, future policies can change
the name and start assigning a context again without breaking
compatibility.

Here is how I would envision staging changes to the initial SIDs in a
compatible manner after this commit is applied:

1. At any time after this commit is applied, the kernel could choose
to reclaim one of the safely reclaimable unused initial SIDs listed
above for a new purpose (i.e. replace its NULL entry in the
initial_sid_to_string[] table with a new name and start using the
newly generated SECINITSID_name symbol in code), and refpolicy could
at that time rename its declaration of that initial SID to reflect its
new purpose and start assigning it a context going
forward. Existing/old policies would map the reclaimed initial SID to
the unlabeled context, so that would be the initial default behavior
until policies are updated. This doesn't depend on the selinux
userspace change; it will work with existing policies and userspace.

2. In 6 months or so we'll have another SELinux userspace release that
will include the libsepol/checkpolicy support for omitting unused
initial SID contexts.

3. At any time after that release, refpolicy can make that release its
minimum build requirement and drop the sid context statements (but not
the sid declarations) for all of the unused initial SIDs except for
fs and sysctl, which must remain for compatibility on policy
reload with old kernels and for compatibility with kernels that were
still using SECINITSID_SYSCTL (< 2.6.39). This doesn't depend on this
kernel commit; it will work with previous kernels as well.

4. After N years for some value of N, refpolicy decides that it no
longer cares about policy reload compatibility for kernels that
predate this kernel commit, and refpolicy drops the fs and sysctl
SID contexts from policy too (but retains the declarations).

5. After M years for some value of M, the kernel decides that it no
longer cares about compatibility with refpolicies that predate step 4
(dropping the fs and sysctl SIDs), and those two SIDs also become
safely reclaimable. This step is optional and need not ever occur unless
we decide that the need to reclaim those two SIDs outweighs the
compatibility cost.

6. After O years for some value of O, refpolicy decides that it no
longer cares about policy load (not just reload) compatibility for
kernels that predate this kernel commit, and both kernel and refpolicy
can then start adding and using new initial SIDs beyond 27. This does
not depend on the previous change (step 5) and can occur independent
of it.

Fixes: https://github.com/SELinuxProject/selinux-kernel/issues/12
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

e0ac568d 26-Feb-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: reduce the use of hard-coded hash sizes

Instead allocate hash tables with just the right size based on the
actual number of elements (which is almost always known beforehand, we
just need to defer the hashtab allocation to the right time). The only
case when we don't know the size (with the current policy format) is the
new filename transitions hashtable. Here I just left the existing value.

After this patch, the time to load Fedora policy on x86_64 decreases
from 790 ms to 167 ms. If the unconfined module is removed, it decreases
from 750 ms to 122 ms. It is also likely that other operations are going
to be faster, mainly string_to_context_struct() or mls_compute_sid(),
but I didn't try to quantify that.

The memory usage of all hash table arrays increases from ~58 KB to
~163 KB (with Fedora policy on x86_64).

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

e9765680 26-Feb-2020 Ingo Molnar <mingo@kernel.org>

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core

Pull EFI updates for v5.7 from Ard Biesheuvel:

This time, the set of changes for the EFI subsystem is much larger than
usual. The main reasons are:

- Get things cleaned up before EFI support for RISC-V arrives, which will
increase the size of the validation matrix, and therefore the threshold to
making drastic changes,

- After years of defunct maintainership, the GRUB project has finally started
to consider changes from the distros regarding UEFI boot, some of which are
highly specific to the way x86 does UEFI secure boot and measured boot,
based on knowledge of both shim internals and the layout of bootparams and
the x86 setup header. Having this maintenance burden on other architectures
(which don't need shim in the first place) is hard to justify, so instead,
we are introducing a generic Linux/UEFI boot protocol.

Summary of changes:

- Boot time GDT handling changes (Arvind)

- Simplify handling of EFI properties table on arm64

- Generic EFI stub cleanups, to improve command line handling, file I/O,
memory allocation, etc.

- Introduce a generic initrd loading method based on calling back into
the firmware, instead of relying on the x86 EFI handover protocol or
device tree.

- Introduce a mixed mode boot method that does not rely on the x86 EFI
handover protocol either, and could potentially be adopted by other
architectures (if another one ever surfaces where one execution mode
is a superset of another)

- Clean up the contents of struct efi, and move out everything that
doesn't need to be stored there.

- Incorporate support for UEFI spec v2.8A changes that permit firmware
implementations to return EFI_UNSUPPORTED from UEFI runtime services at
OS runtime, and expose a mask of which ones are supported or unsupported
via a configuration table.

- Various documentation updates and minor code cleanups (Heinrich)

- Partial fix for the lack of by-VA cache maintenance in the decompressor
on 32-bit ARM. Note that these patches were deliberately put at the
beginning so they can be used as a stable branch that will be shared with
a PR containing the complete fix, which I will send to the ARM tree.

Signed-off-by: Ingo Molnar <mingo@kernel.org>


6b75d54d 16-Feb-2020 Ard Biesheuvel <ardb@kernel.org>

integrity: Check properly whether EFI GetVariable() is available

Testing the value of the efi.get_variable function pointer is not
the right way to establish whether the platform supports EFI
variables at runtime. Instead, use the newly added granular check
that can test for the presence of each EFI runtime service
individually.

Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

e4cfa05e 20-Feb-2020 Richard Haines <richard_c_haines@btinternet.com>

selinux: Add xfs quota command types

Add Q_XQUOTAOFF, Q_XQUOTAON and Q_XSETQLIM to trigger filesystem quotamod
permission check.

Add Q_XGETQUOTA, Q_XGETQSTAT, Q_XGETQSTATV and Q_XGETNEXTQUOTA to trigger
filesystem quotaget permission check.

Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Moore <paul@paul-moore.com>

c3a27611 17-Feb-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: optimize storage of filename transitions

In these rules, each rule with the same (target type, target class,
filename) values is (in practice) always mapped to the same result type.
Therefore, it is much more efficient to group the rules by (ttype,
tclass, filename).

Thus, this patch drops the stype field from the key and changes the
datum to be a linked list of one or more structures that contain a
result type and an ebitmap of source types that map the given target to
the given result type under the given filename. The size of the hash
table is also incremented to 2048 to be more optimal for Fedora policy
(which currently has ~2500 unique (ttype, tclass, filename) tuples,
regardless of whether the 'unconfined' module is enabled).

Not only does this dramtically reduce memory usage when the policy
contains a lot of unconfined domains (ergo a lot of filename based
transitions), but it also slightly reduces memory usage of strongly
confined policies (modeled on Fedora policy with 'unconfined' module
disabled) and significantly reduces lookup times of these rules on
Fedora (roughly matches the performance of the rhashtable conversion
patch [1] posted recently to selinux@vger.kernel.org).

An obvious next step is to change binary policy format to match this
layout, so that disk space is also saved. However, since that requires
more work (including matching userspace changes) and this patch is
already beneficial on its own, I'm posting it separately.

Performance/memory usage comparison:

Kernel | Policy load | Policy load | Mem usage | Mem usage | openbench
| | (-unconfined) | | (-unconfined) | (createfiles)
-----------------|-------------|---------------|-----------|---------------|--------------
reference | 1,30s | 0,91s | 90MB | 77MB | 55 us/file
rhashtable patch | 0.98s | 0,85s | 85MB | 75MB | 38 us/file
this patch | 0,95s | 0,87s | 75MB | 75MB | 40 us/file

(Memory usage is measured after boot. With SELinux disabled the memory
usage was ~60MB on the same system.)

[1] https://lore.kernel.org/selinux/20200116213937.77795-1-dev@lynxeye.de/T/

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

ebe7acad 20-Feb-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull IMA fixes from Mimi Zohar:
"Two bug fixes and an associated change for each.

The one that adds SM3 to the IMA list of supported hash algorithms is
a simple change, but could be considered a new feature"

* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: add sm3 algorithm to hash algorithm configuration list
crypto: rename sm3-256 to sm3 in hash_algo_name
efi: Only print errors about failing to get certs if EFI vars are found
x86/ima: use correct identifier for SetupMode variable


5780b9ab 10-Feb-2020 Tianjia Zhang <tianjia.zhang@linux.alibaba.com>

ima: add sm3 algorithm to hash algorithm configuration list

sm3 has been supported by the ima hash algorithm, but it is not
yet in the Kconfig configuration list. After adding, both ima and tpm2
can support sm3 well.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

3be54d55 16-Feb-2020 Javier Martinez Canillas <javierm@redhat.com>

efi: Only print errors about failing to get certs if EFI vars are found

If CONFIG_LOAD_UEFI_KEYS is enabled, the kernel attempts to load the certs
from the db, dbx and MokListRT EFI variables into the appropriate keyrings.

But it just assumes that the variables will be present and prints an error
if the certs can't be loaded, even when is possible that the variables may
not exist. For example the MokListRT variable will only be present if shim
is used.

So only print an error message about failing to get the certs list from an
EFI variable if this is found. Otherwise these printed errors just pollute
the kernel log ring buffer with confusing messages like the following:

[ 5.427251] Couldn't get size: 0x800000000000000e
[ 5.427261] MODSIGN: Couldn't get UEFI db list
[ 5.428012] Couldn't get size: 0x800000000000000e
[ 5.428023] Couldn't get UEFI MokListRT

Reported-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

253050f5 11-Feb-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: factor out loop body from filename_trans_read()

It simplifies cleanup in the error path. This will be extra useful in
later patch.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

4ca54d3d 07-Feb-2020 Connor O'Brien <connoro@google.com>

security: selinux: allow per-file labeling for bpffs

Add support for genfscon per-file labeling of bpffs files. This allows
for separate permissions for different pinned bpf objects, which may
be completely unrelated to each other.

Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: Steven Moreland <smoreland@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

89d4d7c8 02-Feb-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: generalize evaluate_cond_node()

Both callers iterate the cond_list and call it for each node - turn it
into evaluate_cond_nodes(), which does the iteration for them.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

8794d783 02-Feb-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: convert cond_expr to array

Since it is fixed-size after allocation and we know the size beforehand,
using a plain old array is simpler and more efficient.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

2b3a003e 02-Feb-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: convert cond_av_list to array

Since it is fixed-size after allocation and we know the size beforehand,
using a plain old array is simpler and more efficient.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

60abd318 02-Feb-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: convert cond_list to array

Since it is fixed-size after allocation and we know the size beforehand,
using a plain old array is simpler and more efficient.

While there, also fix signedness of some related variables/parameters.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

a5650acb 10-Feb-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20200210' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux fixes from Paul Moore:
"Two small fixes: one fixes a locking problem in the recently merged
label translation code, the other fixes an embarrassing 'binderfs' /
'binder' filesystem name check"

* tag 'selinux-pr-20200210' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix sidtab string cache locking
selinux: fix typo in filesystem name


8d269a8e 01-Feb-2020 Vasily Averin <vvs@virtuozzo.com>

selinux: sel_avc_get_stat_idx should increase position index

If seq_file .next function does not change position index,
read after some lseek can generate unexpected output.

$ dd if=/sys/fs/selinux/avc/cache_stats # usual output
lookups hits misses allocations reclaims frees
817223 810034 7189 7189 6992 7037
1934894 1926896 7998 7998 7632 7683
1322812 1317176 5636 5636 5456 5507
1560571 1551548 9023 9023 9056 9115
0+1 records in
0+1 records out
189 bytes copied, 5,1564e-05 s, 3,7 MB/s

$# read after lseek to midle of last line
$ dd if=/sys/fs/selinux/avc/cache_stats bs=180 skip=1
dd: /sys/fs/selinux/avc/cache_stats: cannot skip to specified offset
056 9115 <<<< end of last line
1560571 1551548 9023 9023 9056 9115 <<< whole last line once again
0+1 records in
0+1 records out
45 bytes copied, 8,7221e-05 s, 516 kB/s

$# read after lseek beyond end of of file
$ dd if=/sys/fs/selinux/avc/cache_stats bs=1000 skip=1
dd: /sys/fs/selinux/avc/cache_stats: cannot skip to specified offset
1560571 1551548 9023 9023 9056 9115 <<<< generates whole last line
0+1 records in
0+1 records out
36 bytes copied, 9,0934e-05 s, 396 kB/s

https://bugzilla.kernel.org/show_bug.cgi?id=206283

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

7470d0d1 28-Jan-2020 Christian Göttsche <cgzones@googlemail.com>

selinux: allow kernfs symlinks to inherit parent directory context

Currently symlinks on kernel filesystems, like sysfs, are labeled on
creation with the parent filesystem root sid.

Allow symlinks to inherit the parent directory context, so fine-grained
kernfs labeling can be applied to symlinks too and checking contexts
doesn't complain about them.

For backward-compatibility this behavior is contained in a new policy
capability: genfs_seclabel_symlinks

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

06c2efe2 17-Jan-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: simplify evaluate_cond_node()

It never fails, so it can just return void.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

e9c38f9f 08-Jan-2020 Stephen Smalley <sds@tycho.nsa.gov>

Documentation,selinux: deprecate setting checkreqprot to 1

Deprecate setting the SELinux checkreqprot tunable to 1 via kernel
parameter or /sys/fs/selinux/checkreqprot. Setting it to 0 is left
intact for compatibility since Android and some Linux distributions
do so for security and treat an inability to set it as a fatal error.
Eventually setting it to 0 will become a no-op and the kernel will
stop using checkreqprot's value internally altogether.

checkreqprot was originally introduced as a compatibility mechanism
for legacy userspace and the READ_IMPLIES_EXEC personality flag.
However, if set to 1, it weakens security by allowing mappings to be
made executable without authorization by policy. The default value
for the SECURITY_SELINUX_CHECKREQPROT_VALUE config option was changed
from 1 to 0 in commit 2a35d196c160e3 ("selinux: change
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default") and both Android
and Linux distributions began explicitly setting
/sys/fs/selinux/checkreqprot to 0 some time ago.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

4b36cb77 17-Jan-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: move status variables out of selinux_ss

It fits more naturally in selinux_state, since it reflects also global
state (the enforcing and policyload fields).

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

c9d35ee0 08-Feb-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs file system parameter updates from Al Viro:
"Saner fs_parser.c guts and data structures. The system-wide registry
of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is
the horror switch() in fs_parse() that would have to grow another case
every time something got added to that system-wide registry.

New syntax types can be added by filesystems easily now, and their
namespace is that of functions - not of system-wide enum members. IOW,
they can be shared or kept private and if some turn out to be widely
useful, we can make them common library helpers, etc., without having
to do anything whatsoever to fs_parse() itself.

And we already get that kind of requests - the thing that finally
pushed me into doing that was "oh, and let's add one for timeouts -
things like 15s or 2h". If some filesystem really wants that, let them
do it. Without somebody having to play gatekeeper for the variants
blessed by direct support in fs_parse(), TYVM.

Quite a bit of boilerplate is gone. And IMO the data structures make a
lot more sense now. -200LoC, while we are at it"

* 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits)
tmpfs: switch to use of invalfc()
cgroup1: switch to use of errorfc() et.al.
procfs: switch to use of invalfc()
hugetlbfs: switch to use of invalfc()
cramfs: switch to use of errofc() et.al.
gfs2: switch to use of errorfc() et.al.
fuse: switch to use errorfc() et.al.
ceph: use errorfc() and friends instead of spelling the prefix out
prefix-handling analogues of errorf() and friends
turn fs_param_is_... into functions
fs_parse: handle optional arguments sanely
fs_parse: fold fs_parameter_desc/fs_parameter_spec
fs_parser: remove fs_parameter_description name field
add prefix to fs_context->log
ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log
new primitive: __fs_parse()
switch rbd and libceph to p_log-based primitives
struct p_log, variants of warnf() et.al. taking that one instead
teach logfc() to handle prefices, give it saner calling conventions
get rid of cg_invalf()
...


d7167b14 07-Sep-2019 Al Viro <viro@zeniv.linux.org.uk>

fs_parse: fold fs_parameter_desc/fs_parameter_spec

The former contains nothing but a pointer to an array of the latter...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

96cafb9c 06-Dec-2019 Eric Sandeen <sandeen@sandeen.net>

fs_parser: remove fs_parameter_description name field

Unused now.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

85e55296 06-Feb-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'Smack-for-5.6' of git://github.com/cschaufler/smack-next

Pull smack fix from Casey Schaufler:
"One fix for an obscure error found using an old version of ping(1)
that did not use IPv6 sockets in the documented way"

* tag 'Smack-for-5.6' of git://github.com/cschaufler/smack-next:
broken ping to ipv6 linklocal addresses on debian buster


39a706fb 03-Feb-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: fix sidtab string cache locking

Avoiding taking a lock in an IRQ context is not enough to prevent
deadlocks, as discovered by syzbot:

===
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
5.5.0-syzkaller #0 Not tainted
-----------------------------------------------------
syz-executor.0/8927 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
ffff888027c94098 (&(&s->cache_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
ffff888027c94098 (&(&s->cache_lock)->rlock){+.+.}, at: sidtab_sid2str_put.part.0+0x36/0x880 security/selinux/ss/sidtab.c:533

and this task is already holding:
ffffffff898639b0 (&(&nf_conntrack_locks[i])->rlock){+.-.}, at: spin_lock include/linux/spinlock.h:338 [inline]
ffffffff898639b0 (&(&nf_conntrack_locks[i])->rlock){+.-.}, at: nf_conntrack_lock+0x17/0x70 net/netfilter/nf_conntrack_core.c:91
which would create a new lock dependency:
(&(&nf_conntrack_locks[i])->rlock){+.-.} -> (&(&s->cache_lock)->rlock){+.+.}

but this new dependency connects a SOFTIRQ-irq-safe lock:
(&(&nf_conntrack_locks[i])->rlock){+.-.}

[...]

other info that might help us debug this:

Possible interrupt unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&(&s->cache_lock)->rlock);
local_irq_disable();
lock(&(&nf_conntrack_locks[i])->rlock);
lock(&(&s->cache_lock)->rlock);
<Interrupt>
lock(&(&nf_conntrack_locks[i])->rlock);

*** DEADLOCK ***
[...]
===

Fix this by simply locking with irqsave/irqrestore and stop giving up on
!in_task(). It makes the locking a bit slower, but it shouldn't make a
big difference in real workloads. Under the scenario from [1] (only
cache hits) it only increased the runtime overhead from the
security_secid_to_secctx() function from ~2% to ~3% (it was ~5-65%
before introducing the cache).

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1733259

Fixes: d97bd23c2d7d ("selinux: cache the SID -> context string translation")
Reported-by: syzbot+61cba5033e2072d61806@syzkaller.appspotmail.com
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

a20456ae 01-Feb-2020 Hridya Valsaraju <hridya@google.com>

selinux: fix typo in filesystem name

Correct the filesystem name to "binder" to enable genfscon per-file
labelling for binderfs.

Fixes: 7a4b5194747 ("selinux: allow per-file labelling for binderfs")
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: slight style changes to the subj/description]
Signed-off-by: Paul Moore <paul@paul-moore.com>

87fbfffc 03-Feb-2020 Casey Schaufler <casey@schaufler-ca.com>

broken ping to ipv6 linklocal addresses on debian buster

I am seeing ping failures to IPv6 linklocal addresses with Debian
buster. Easiest example to reproduce is:

$ ping -c1 -w1 ff02::1%eth1
connect: Invalid argument

$ ping -c1 -w1 ff02::1%eth1
PING ff02::01%eth1(ff02::1%eth1) 56 data bytes
64 bytes from fe80::e0:f9ff:fe0c:37%eth1: icmp_seq=1 ttl=64 time=0.059 ms

git bisect traced the failure to
commit b9ef5513c99b ("smack: Check address length before reading address family")

Arguably ping is being stupid since the buster version is not setting
the address family properly (ping on stretch for example does):

$ strace -e connect ping6 -c1 -w1 ff02::1%eth1
connect(5, {sa_family=AF_UNSPEC,
sa_data="\4\1\0\0\0\0\377\2\0\0\0\0\0\0\0\0\0\0\0\0\0\1\3\0\0\0"}, 28)
= -1 EINVAL (Invalid argument)

but the command works fine on kernels prior to this commit, so this is
breakage which goes against the Linux paradigm of "don't break userspace"

Cc: stable@vger.kernel.org
Reported-by: David Ahern <dsahern@gmail.com>
Suggested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

 security/smack/smack_lsm.c | 41 +++++++++++++++++++----------------------
1 file changed, 19 insertions(+), 22 deletions(-)

08a3ef8f 29-Jan-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest kunit updates from Shuah Khan:
"This kunit update consists of:

- Support for building kunit as a module from Alan Maguire

- AppArmor KUnit tests for policy unpack from Mike Salvatore"

* tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: building kunit as a module breaks allmodconfig
kunit: update documentation to describe module-based build
kunit: allow kunit to be loaded as a module
kunit: remove timeout dependence on sysctl_hung_task_timeout_seconds
kunit: allow kunit tests to be loaded as a module
kunit: hide unexported try-catch interface in try-catch-impl.h
kunit: move string-stream.h to lib/kunit
apparmor: add AppArmor KUnit tests for policy unpack


6aee4bad 29-Jan-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull openat2 support from Al Viro:
"This is the openat2() series from Aleksa Sarai.

I'm afraid that the rest of namei stuff will have to wait - it got
zero review the last time I'd posted #work.namei, and there had been a
leak in the posted series I'd caught only last weekend. I was going to
repost it on Monday, but the window opened and the odds of getting any
review during that... Oh, well.

Anyway, openat2 part should be ready; that _did_ get sane amount of
review and public testing, so here it comes"

From Aleksa's description of the series:
"For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown
flags are present[1].

This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road
to being added to openat(2).

Furthermore, the need for some sort of control over VFS's path
resolution (to avoid malicious paths resulting in inadvertent
breakouts) has been a very long-standing desire of many userspace
applications.

This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset
(which was a variant of David Drysdale's O_BENEATH patchset[4] which
was a spin-off of the Capsicum project[5]) with a few additions and
changes made based on the previous discussion within [6] as well as
others I felt were useful.

In line with the conclusions of the original discussion of
AT_NO_JUMPS, the flag has been split up into separate flags. However,
instead of being an openat(2) flag it is provided through a new
syscall openat2(2) which provides several other improvements to the
openat(2) interface (see the patch description for more details). The
following new LOOKUP_* flags are added:

LOOKUP_NO_XDEV:

Blocks all mountpoint crossings (upwards, downwards, or through
absolute links). Absolute pathnames alone in openat(2) do not
trigger this. Magic-link traversal which implies a vfsmount jump is
also blocked (though magic-link jumps on the same vfsmount are
permitted).

LOOKUP_NO_MAGICLINKS:

Blocks resolution through /proc/$pid/fd-style links. This is done
by blocking the usage of nd_jump_link() during resolution in a
filesystem. The term "magic-links" is used to match with the only
reference to these links in Documentation/, but I'm happy to change
the name.

It should be noted that this is different to the scope of
~LOOKUP_FOLLOW in that it applies to all path components. However,
you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
will *not* fail (assuming that no parent component was a
magic-link), and you will have an fd for the magic-link.

In order to correctly detect magic-links, the introduction of a new
LOOKUP_MAGICLINK_JUMPED state flag was required.

LOOKUP_BENEATH:

Disallows escapes to outside the starting dirfd's
tree, using techniques such as ".." or absolute links. Absolute
paths in openat(2) are also disallowed.

Conceptually this flag is to ensure you "stay below" a certain
point in the filesystem tree -- but this requires some additional
to protect against various races that would allow escape using
"..".

Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
can trivially beam you around the filesystem (breaking the
protection). In future, there might be similar safety checks done
as in LOOKUP_IN_ROOT, but that requires more discussion.

In addition, two new flags are added that expand on the above ideas:

LOOKUP_NO_SYMLINKS:

Does what it says on the tin. No symlink resolution is allowed at
all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this
can still be used with NOFOLLOW to open an fd for the symlink as
long as no parent path had a symlink component.

LOOKUP_IN_ROOT:

This is an extension of LOOKUP_BENEATH that, rather than blocking
attempts to move past the root, forces all such movements to be
scoped to the starting point. This provides chroot(2)-like
protection but without the cost of a chroot(2) for each filesystem
operation, as well as being safe against race attacks that
chroot(2) is not.

If a race is detected (as with LOOKUP_BENEATH) then an error is
generated, and similar to LOOKUP_BENEATH it is not permitted to
cross magic-links with LOOKUP_IN_ROOT.

The primary need for this is from container runtimes, which
currently need to do symlink scoping in userspace[7] when opening
paths in a potentially malicious container.

There is a long list of CVEs that could have bene mitigated by
having RESOLVE_THIS_ROOT (such as CVE-2017-1002101,
CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a
few).

In order to make all of the above more usable, I'm working on
libpathrs[8] which is a C-friendly library for safe path resolution.
It features a userspace-emulated backend if the kernel doesn't support
openat2(2). Hopefully we can get userspace to switch to using it, and
thus get openat2(2) support for free once it's ready.

Future work would include implementing things like
RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow
programs to be sure they don't hit DoSes though stale NFS handles)"

* 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Documentation: path-lookup: include new LOOKUP flags
selftests: add openat2(2) selftests
open: introduce openat2(2) syscall
namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
namei: LOOKUP_NO_XDEV: block mountpoint crossing
namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
namei: LOOKUP_NO_SYMLINKS: block symlink resolution
namei: allow set_root() to produce errors
namei: allow nd_jump_link() to produce errors
nsfs: clean-up ns_get_path() signature to return int
namei: only return -ECHILD from follow_dotdot_rcu()


b3a60822 28-Jan-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-v5.6' of git://git.kernel.org:/pub/scm/linux/kernel/git/jmorris/linux-security

Pull security subsystem update from James Morris:
"Just one minor fix this time"

* 'for-v5.6' of git://git.kernel.org:/pub/scm/linux/kernel/git/jmorris/linux-security:
security: remove EARLY_LSM_COUNT which never used


73a0bff2 28-Jan-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull IMA updates from Mimi Zohar:
"Two new features - measuring certificates and querying IMA for a file
hash - and three bug fixes:

- Measuring certificates is like the rest of IMA, based on policy,
but requires loading a custom policy. Certificates loaded onto a
keyring, for example during early boot, before a custom policy has
been loaded, are queued and only processed after loading the custom
policy.

- IMA calculates and caches files hashes. Other kernel subsystems,
and possibly kernel modules, are interested in accessing these
cached file hashes.

The bug fixes prevent classifying a file short read (e.g. shutdown) as
an invalid file signature, add a missing blank when displaying the
securityfs policy rules containing LSM labels, and, lastly, fix the
handling of the IMA policy information for unknown LSM labels"

* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
IMA: Defined delayed workqueue to free the queued keys
IMA: Call workqueue functions to measure queued keys
IMA: Define workqueue for early boot key measurements
IMA: pre-allocate buffer to hold keyrings string
ima: ima/lsm policy rule loading logic bug fixes
ima: add the ability to query the cached hash of a given file
ima: Add a space after printing LSM rules for readability
IMA: fix measuring asymmetric keys Kconfig
IMA: Read keyrings= option from the IMA policy
IMA: Add support to limit measuring keys
KEYS: Call the IMA hook to measure keys
IMA: Define an IMA hook to measure keys
IMA: Add KEY_CHECK func to measure keys
IMA: Check IMA policy flag
ima: avoid appraise error for hash calc interrupt


2cf64d7c 28-Jan-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'tomoyo-pr-20200128' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1

Pull tomoyo update from Tetsuo Handa:
"One 'int' -> 'atomic_t' conversion patch to suppress KCSAN's warning"

* tag 'tomoyo-pr-20200128' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
tomoyo: Use atomic_t for statistics counter


bd2463ac 28-Jan-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from David Miller:

1) Add WireGuard

2) Add HE and TWT support to ath11k driver, from John Crispin.

3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.

4) Add variable window congestion control to TIPC, from Jon Maloy.

5) Add BCM84881 PHY driver, from Russell King.

6) Start adding netlink support for ethtool operations, from Michal
Kubecek.

7) Add XDP drop and TX action support to ena driver, from Sameeh
Jubran.

8) Add new ipv4 route notifications so that mlxsw driver does not have
to handle identical routes itself. From Ido Schimmel.

9) Add BPF dynamic program extensions, from Alexei Starovoitov.

10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.

11) Add support for macsec HW offloading, from Antoine Tenart.

12) Add initial support for MPTCP protocol, from Christoph Paasch,
Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.

13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
Cherian, and others.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
udp: segment looped gso packets correctly
netem: change mailing list
qed: FW 8.42.2.0 debug features
qed: rt init valid initialization changed
qed: Debug feature: ilt and mdump
qed: FW 8.42.2.0 Add fw overlay feature
qed: FW 8.42.2.0 HSI changes
qed: FW 8.42.2.0 iscsi/fcoe changes
qed: Add abstraction for different hsi values per chip
qed: FW 8.42.2.0 Additional ll2 type
qed: Use dmae to write to widebus registers in fw_funcs
qed: FW 8.42.2.0 Parser offsets modified
qed: FW 8.42.2.0 Queue Manager changes
qed: FW 8.42.2.0 Expose new registers and change windows
qed: FW 8.42.2.0 Internal ram offsets modifications
MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
Documentation: net: octeontx2: Add RVU HW and drivers overview
octeontx2-pf: ethtool RSS config support
octeontx2-pf: Add basic ethtool support
...


b1dba247 27-Jan-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'selinux-pr-20200127' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux update from Paul Moore:
"This is one of the bigger SELinux pull requests in recent years with
28 patches. Everything is passing our test suite and the highlights
are below:

- Mark CONFIG_SECURITY_SELINUX_DISABLE as deprecated. We're some time
away from actually attempting to remove this in the kernel, but the
only distro we know that still uses it (Fedora) is working on
moving away from this so we want to at least let people know we are
planning to remove it.

- Reorder the SELinux hooks to help prevent bad things when SELinux
is disabled at runtime. The proper fix is to remove the
CONFIG_SECURITY_SELINUX_DISABLE functionality (see above) and just
take care of it at boot time (e.g. "selinux=0").

- Add SELinux controls for the kernel lockdown functionality,
introducing a new SELinux class/permissions: "lockdown { integrity
confidentiality }".

- Add a SELinux control for move_mount(2) that reuses the "file {
mounton }" permission.

- Improvements to the SELinux security label data store lookup
functions to speed up translations between our internal label
representations and the visible string labels (both directions).

- Revisit a previous fix related to SELinux inode auditing and
permission caching and do it correctly this time.

- Fix the SELinux access decision cache to cleanup properly on error.
In some extreme cases this could limit the cache size and result in
a decrease in performance.

- Enable SELinux per-file labeling for binderfs.

- The SELinux initialized and disabled flags were wrapped with
accessors to ensure they are accessed correctly.

- Mark several key SELinux structures with __randomize_layout.

- Changes to the LSM build configuration to only build
security/lsm_audit.c when needed.

- Changes to the SELinux build configuration to only build the IB
object cache when CONFIG_SECURITY_INFINIBAND is enabled.

- Move a number of single-caller functions into their callers.

- Documentation fixes (/selinux -> /sys/fs/selinux).

- A handful of cleanup patches that aren't worth mentioning on their
own, the individual descriptions have plenty of detail"

* tag 'selinux-pr-20200127' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (28 commits)
selinux: fix regression introduced by move_mount(2) syscall
selinux: do not allocate ancillary buffer on first load
selinux: remove redundant allocation and helper functions
selinux: remove redundant selinux_nlmsg_perm
selinux: fix wrong buffer types in policydb.c
selinux: reorder hooks to make runtime disable less broken
selinux: treat atomic flags more carefully
selinux: make default_noexec read-only after init
selinux: move ibpkeys code under CONFIG_SECURITY_INFINIBAND.
selinux: remove redundant msg_msg_alloc_security
Documentation,selinux: fix references to old selinuxfs mount point
selinux: deprecate disabling SELinux and runtime
selinux: allow per-file labelling for binderfs
selinuxfs: use scnprintf to get real length for inode
selinux: remove set but not used variable 'sidtab'
selinux: ensure the policy has been loaded before reading the sidtab stats
selinux: ensure we cleanup the internal AVC counters on error in avc_update()
selinux: randomize layout of key structures
selinux: clean up selinux_enabled/disabled/enforcing_boot
selinux: remove unnecessary selinux cred request
...


10c2d111 21-Jan-2020 Alex Shi <alexs@kernel.org>

security: remove EARLY_LSM_COUNT which never used

This macro is never used from it was introduced in commit e6b1db98cf4d5
("security: Support early LSMs"), better to remove it.

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: James Morris <jmorris@namei.org>

5b3014b9 22-Jan-2020 Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

IMA: Defined delayed workqueue to free the queued keys

Keys queued for measurement should be freed if a custom IMA policy
was not loaded. Otherwise, the keys will remain queued forever
consuming kernel memory.

This patch defines a delayed workqueue to handle the above scenario.
The workqueue handler is setup to execute 5 minutes after IMA
initialization is completed.

If a custom IMA policy is loaded before the workqueue handler is
scheduled to execute, the workqueue task is cancelled and any queued keys
are processed for measurement. But if a custom policy was not loaded then
the queued keys are just freed when the delayed workqueue handler is run.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reported-by: kernel test robot <rong.a.chen@intel.com> # sleeping
function called from invalid context
Reported-by: kbuild test robot <lkp@intel.com> # redefinition of
ima_init_key_queue() function.
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

450d0fd5 22-Jan-2020 Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

IMA: Call workqueue functions to measure queued keys

Measuring keys requires a custom IMA policy to be loaded. Keys should
be queued for measurement if a custom IMA policy is not yet loaded.
Keys queued for measurement, if any, should be processed when a custom
policy is loaded.

This patch updates the IMA hook function ima_post_key_create_or_update()
to queue the key if a custom IMA policy has not yet been loaded. And,
ima_update_policy() function, which is called when a custom IMA policy
is loaded, is updated to process queued keys.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

9f81a2ed 22-Jan-2020 Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

IMA: Define workqueue for early boot key measurements

Measuring keys requires a custom IMA policy to be loaded. Keys created
or updated before a custom IMA policy is loaded should be queued and
will be processed after a custom policy is loaded.

This patch defines a workqueue for queuing keys when a custom IMA policy
has not yet been loaded. An intermediate Kconfig boolean option namely
IMA_QUEUE_EARLY_BOOT_KEYS is used to declare the workqueue functions.

A flag namely ima_process_keys is used to check if the key should be
queued or should be processed immediately.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

5c7bac9f 16-Jan-2020 Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

IMA: pre-allocate buffer to hold keyrings string

ima_match_keyring() is called while holding rcu read lock. Since this
function executes in atomic context, it should not call any function
that can sleep (such as kstrdup()).

This patch pre-allocates a buffer to hold the keyrings string read from
the IMA policy and uses that to match the given keyring.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Fixes: e9085e0ad38a ("IMA: Add support to limit measuring keys")
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

483ec26e 15-Jan-2020 Janne Karhunen <janne.karhunen@gmail.com>

ima: ima/lsm policy rule loading logic bug fixes

Keep the ima policy rules around from the beginning even if they appear
invalid at the time of loading, as they may become active after an lsm
policy load. However, loading a custom IMA policy with unknown LSM
labels is only safe after we have transitioned from the "built-in"
policy rules to a custom IMA policy.

Patch also fixes the rule re-use during the lsm policy reload and makes
some prints a bit more human readable.

Changelog:
v4:
- Do not allow the initial policy load refer to non-existing lsm rules.
v3:
- Fix too wide policy rule matching for non-initialized LSMs
v2:
- Fix log prints

Fixes: b16942455193 ("ima: use the lsm policy update notifier")
Cc: Casey Schaufler <casey@schaufler-ca.com>
Reported-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Janne Karhunen <janne.karhunen@gmail.com>
Signed-off-by: Konsta Karsisto <konsta.karsisto@gmail.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

6beea7af 13-Jan-2020 Florent Revest <revest@google.com>

ima: add the ability to query the cached hash of a given file

This allows other parts of the kernel (perhaps a stacked LSM allowing
system monitoring, eg. the proposed KRSI LSM [1]) to retrieve the hash
of a given file from IMA if it's present in the iint cache.

It's true that the existence of the hash means that it's also in the
audit logs or in /sys/kernel/security/ima/ascii_runtime_measurements,
but it can be difficult to pull that information out for every
subsequent exec. This is especially true if a given host has been up
for a long time and the file was first measured a long time ago.

It should be kept in mind that this function gives access to cached
entries which can be removed, for instance on security_inode_free().

This is based on Peter Moody's patch:
https://sourceforge.net/p/linux-ima/mailman/message/33036180/

[1] https://lkml.org/lkml/2019/9/10/393

Signed-off-by: Florent Revest <revest@google.com>
Reviewed-by: KP Singh <kpsingh@chromium.org>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

5350ceb0 04-Jan-2020 Clay Chang <clayc@hpe.com>

ima: Add a space after printing LSM rules for readability

When reading ima_policy from securityfs, there is a missing
space between output string of LSM rules and the remaining
rules.

Signed-off-by: Clay Chang <clayc@hpe.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

01df52d7 29-Aug-2019 John Johansen <john.johansen@canonical.com>

apparmor: remove duplicate check of xattrs on profile attachment.

The second check to ensure the xattrs are present and checked is
unneeded as this is already done in the profile attachment xmatch.

Signed-off-by: John Johansen <john.johansen@canonical.com>

0df34a64 30-Jul-2019 John Johansen <john.johansen@canonical.com>

apparmor: add outofband transition and use it in xattr match

There are cases where the a special out of band transition that can
not be triggered by input is useful in separating match conditions
in the dfa encoding.

The null_transition is currently used as an out of band transition
for match conditions that can not contain a \0 in their input
but apparmor needs an out of band transition for cases where
the match condition is allowed to contain any input character.

Achieve this by allowing for an explicit transition out of input
range that can only be triggered by code.

Signed-off-by: John Johansen <john.johansen@canonical.com>

f05841a9 19-Dec-2019 John Johansen <john.johansen@canonical.com>

apparmor: fail unpack if profile mode is unknown

Profile unpack should fail if the profile mode is not a mode that the
kernel understands.

Signed-off-by: John Johansen <john.johansen@canonical.com>

3ed4aaa9 25-Sep-2019 John Johansen <john.johansen@canonical.com>

apparmor: fix nnp subset test for unconfined

The subset test is not taking into account the unconfined exception
which will cause profile transitions in the stacked confinement
case to fail when no_new_privs is applied.

This fixes a regression introduced in the fix for
https://bugs.launchpad.net/bugs/1839037

BugLink: https://bugs.launchpad.net/bugs/1844186
Signed-off-by: John Johansen <john.johansen@canonical.com>

a68d59ff 24-Sep-2019 John Johansen <john.johansen@canonical.com>

apparmor: remove useless aafs_create_symlink

commit 1180b4c757aa ("apparmor: fix dangling symlinks to policy
rawdata after replacement") reworked how the rawdata symlink is
handled but failedto remove aafs_create_symlink which was reduced to a
useles stub.

Fixes: 1180b4c757aa ("apparmor: fix dangling symlinks to policy rawdata after replacement")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John Johansen <john.johansen@canonical.com>

98aa0034 17-Jan-2020 Stephen Smalley <sds@tycho.nsa.gov>

selinux: fix regression introduced by move_mount(2) syscall

commit 2db154b3ea8e ("vfs: syscall: Add move_mount(2) to move mounts around")
introduced a new move_mount(2) system call and a corresponding new LSM
security_move_mount hook but did not implement this hook for any existing
LSM. This creates a regression for SELinux with respect to consistent
checking of mounts; the existing selinux_mount hook checks mounton
permission to the mount point path. Provide a SELinux hook
implementation for move_mount that applies this same check for
consistency. In the future we may wish to add a new move_mount
filesystem permission and check as well, but this addresses
the immediate regression.

Fixes: 2db154b3ea8e ("vfs: syscall: Add move_mount(2) to move mounts around")
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

dae60293 31-Aug-2019 John Johansen <john.johansen@canonical.com>

apparmor: add consistency check between state and dfa diff encode flags

Check that a states diff encode flag is only set if diff encode is
enabled in the dfa header.

Signed-off-by: John Johansen <john.johansen@canonical.com>

c6596969 31-Aug-2019 John Johansen <john.johansen@canonical.com>

apparmor: add a valid state flags check

Add a check to ensure only known state flags are set on each
state in the dfa.

Signed-off-by: John Johansen <john.johansen@canonical.com>

e4f4e6ba 26-Jul-2019 Vasyl Gomonovych <gomonovych@gmail.com>

AppArmor: Remove semicolon

Remove unneeded semicolon

Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>

278de07e 02-Jul-2019 Markus Elfring <elfring@users.sourceforge.net>

apparmor: Replace two seq_printf() calls by seq_puts() in aa_label_seq_xprint()

Two strings which did not contain a data format specification should be put
into a sequence. Thus use the corresponding function “seq_puts”.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: John Johansen <john.johansen@canonical.com>

dd89b9d9 16-Jan-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: do not allocate ancillary buffer on first load

In security_load_policy(), we can defer allocating the newpolicydb
ancillary array to after checking state->initialized, thereby avoiding
the pointless allocation when loading policy the first time.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: merged portions by hand]
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

cb89e246 10-Jan-2020 Paul Moore <paul@paul-moore.com>

selinux: remove redundant allocation and helper functions

This patch removes the inode, file, and superblock security blob
allocation functions and moves the associated code into the
respective LSM hooks. This patch also removes the inode_doinit()
function as it was a trivial wrapper around
inode_doinit_with_dentry() and called from one location in the code.

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

df4779b5 13-Jan-2020 Huaisheng Ye <yehs1@lenovo.com>

selinux: remove redundant selinux_nlmsg_perm

selinux_nlmsg_perm is used for only by selinux_netlink_send. Remove
the redundant function to simplify the code.

Fix a typo by suggestion from Stephen.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

ae3d8c2e 16-Jan-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: fix wrong buffer types in policydb.c

Two places used u32 where there should have been __le32.

Fixes sparse warnings:
CHECK [...]/security/selinux/ss/services.c
[...]/security/selinux/ss/policydb.c:2669:16: warning: incorrect type in assignment (different base types)
[...]/security/selinux/ss/policydb.c:2669:16: expected unsigned int
[...]/security/selinux/ss/policydb.c:2669:16: got restricted __le32 [usertype]
[...]/security/selinux/ss/policydb.c:2674:24: warning: incorrect type in assignment (different base types)
[...]/security/selinux/ss/policydb.c:2674:24: expected unsigned int
[...]/security/selinux/ss/policydb.c:2674:24: got restricted __le32 [usertype]
[...]/security/selinux/ss/policydb.c:2675:24: warning: incorrect type in assignment (different base types)
[...]/security/selinux/ss/policydb.c:2675:24: expected unsigned int
[...]/security/selinux/ss/policydb.c:2675:24: got restricted __le32 [usertype]
[...]/security/selinux/ss/policydb.c:2676:24: warning: incorrect type in assignment (different base types)
[...]/security/selinux/ss/policydb.c:2676:24: expected unsigned int
[...]/security/selinux/ss/policydb.c:2676:24: got restricted __le32 [usertype]
[...]/security/selinux/ss/policydb.c:2681:32: warning: incorrect type in assignment (different base types)
[...]/security/selinux/ss/policydb.c:2681:32: expected unsigned int
[...]/security/selinux/ss/policydb.c:2681:32: got restricted __le32 [usertype]
[...]/security/selinux/ss/policydb.c:2701:16: warning: incorrect type in assignment (different base types)
[...]/security/selinux/ss/policydb.c:2701:16: expected unsigned int
[...]/security/selinux/ss/policydb.c:2701:16: got restricted __le32 [usertype]
[...]/security/selinux/ss/policydb.c:2706:24: warning: incorrect type in assignment (different base types)
[...]/security/selinux/ss/policydb.c:2706:24: expected unsigned int
[...]/security/selinux/ss/policydb.c:2706:24: got restricted __le32 [usertype]
[...]/security/selinux/ss/policydb.c:2707:24: warning: incorrect type in assignment (different base types)
[...]/security/selinux/ss/policydb.c:2707:24: expected unsigned int
[...]/security/selinux/ss/policydb.c:2707:24: got restricted __le32 [usertype]

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

8dcea187 14-Jan-2020 Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

net: bridge: vlan: add rtm definitions and dump support

This patch adds vlan rtm definitions:
- NEWVLAN: to be used for creating vlans, setting options and
notifications
- DELVLAN: to be used for deleting vlans
- GETVLAN: used for dumping vlan information

Dumping vlans which can span multiple messages is added now with basic
information (vid and flags). We use nlmsg_parse() to validate the header
length in order to be able to extend the message with filtering
attributes later.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

35c57fc3 10-Jan-2020 Alan Maguire <alan.maguire@oracle.com>

kunit: building kunit as a module breaks allmodconfig

kunit tests that do not support module build should depend
on KUNIT=y rather than just KUNIT in Kconfig, otherwise
they will trigger compilation errors for "make allmodconfig"
builds.

Fixes: 9fe124bf1b77 ("kunit: allow kunit to be loaded as a module")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

cfff75d8 08-Jan-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: reorder hooks to make runtime disable less broken

Commit b1d9e6b0646d ("LSM: Switch to lists of hooks") switched the LSM
infrastructure to use per-hook lists, which meant that removing the
hooks for a given module was no longer atomic. Even though the commit
clearly documents that modules implementing runtime revmoval of hooks
(only SELinux attempts this madness) need to take special precautions to
avoid race conditions, SELinux has never addressed this.

By inserting an artificial delay between the loop iterations of
security_delete_hooks() (I used 100 ms), booting to a state where
SELinux is enabled, but policy is not yet loaded, and running these
commands:

while true; do ping -c 1 <some IP>; done &
echo -n 1 >/sys/fs/selinux/disable
kill %1
wait

...I was able to trigger NULL pointer dereferences in various places. I
also have a report of someone getting panics on a stock RHEL-8 kernel
after setting SELINUX=disabled in /etc/selinux/config and rebooting
(without adding "selinux=0" to kernel command-line).

Reordering the SELinux hooks such that those that allocate structures
are removed last seems to prevent these panics. It is very much possible
that this doesn't make the runtime disable completely race-free, but at
least it makes the operation much less fragile.

Cc: stable@vger.kernel.org
Fixes: b1d9e6b0646d ("LSM: Switch to lists of hooks")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

65cddd50 07-Jan-2020 Ondrej Mosnacek <omosnace@redhat.com>

selinux: treat atomic flags more carefully

The disabled/enforcing/initialized flags are all accessed concurrently
by threads so use the appropriate accessors that ensure atomicity and
document that it is expected.

Use smp_load/acquire...() helpers (with memory barriers) for the
initialized flag, since it gates access to the rest of the state
structures.

Note that the disabled flag is currently not used for anything other
than avoiding double disable, but it will be used for bailing out of
hooks once security_delete_hooks() is removed.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

b78b7d59 07-Jan-2020 Stephen Smalley <sds@tycho.nsa.gov>

selinux: make default_noexec read-only after init

SELinux checks whether VM_EXEC is set in the VM_DATA_DEFAULT_FLAGS
during initialization and saves the result in default_noexec for use
in its mmap and mprotect hook function implementations to decide
whether to apply EXECMEM, EXECHEAP, EXECSTACK, and EXECMOD checks.
Mark default_noexec as ro_after_init to prevent later clearing it
and thereby disabling these checks. It is only set legitimately from
init code.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

fe49c7e4 09-Jan-2020 Ravi Kumar Siddojigari <rsiddoji@codeaurora.org>

selinux: move ibpkeys code under CONFIG_SECURITY_INFINIBAND.

Move cache based pkey sid retrieval code which was added
with commit "409dcf31" under CONFIG_SECURITY_INFINIBAND.
As its going to alloc a new cache which impacts
low RAM devices which was enabled by default.

Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Ravi Kumar Siddojigari <rsiddoji@codeaurora.org>
[PM: checkpatch.pl cleanups, fixed capitalization in the description]
Signed-off-by: Paul Moore <paul@paul-moore.com>

b82f3f68 10-Jan-2020 Huaisheng Ye <yehs1@lenovo.com>

selinux: remove redundant msg_msg_alloc_security

selinux_msg_msg_alloc_security only calls msg_msg_alloc_security but
do nothing else. And also msg_msg_alloc_security is just used by the
former.

Remove the redundant function to simplify the code.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

4d944bcd 05-Nov-2019 Mike Salvatore <mike.salvatore@canonical.com>

apparmor: add AppArmor KUnit tests for policy unpack

Add KUnit tests to test AppArmor unpacking of userspace policies.
AppArmor uses a serialized binary format for loading policies. To find
policy format documentation see
Documentation/admin-guide/LSM/apparmor.rst.

In order to write the tests against the policy unpacking code, some
static functions needed to be exposed for testing purposes. One of the
goals of this patch is to establish a pattern for which testing these
kinds of functions should be done in the future.

Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Mike Salvatore <mike.salvatore@canonical.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

ea78979d 08-Jan-2020 Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

IMA: fix measuring asymmetric keys Kconfig

As a result of the asymmetric public keys subtype Kconfig option being
defined as tristate, with the existing IMA Makefile, ima_asymmetric_keys.c
could be built as a kernel module. To prevent this from happening, this
patch defines and uses an intermediate Kconfig boolean option named
IMA_MEASURE_ASYMMETRIC_KEYS.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: James.Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reported-by: kbuild test robot <lkp@intel.com> # ima_asymmetric_keys.c
is built as a kernel module.
Fixes: 88e70da170e8 ("IMA: Define an IMA hook to measure keys")
Fixes: cb1aa3823c92 ("KEYS: Call the IMA hook to measure keys")
[zohar@linux.ibm.com: updated patch description]
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

d41415eb 07-Jan-2020 Stephen Smalley <sds@tycho.nsa.gov>

Documentation,selinux: fix references to old selinuxfs mount point

selinuxfs was originally mounted on /selinux, and various docs and
kconfig help texts referred to nodes under it. In Linux 3.0,
/sys/fs/selinux was introduced as the preferred mount point for selinuxfs.
Fix all the old references to /selinux/ to /sys/fs/selinux/.
While we are there, update the description of the selinux boot parameter
to reflect the fact that the default value is always 1 since
commit be6ec88f41ba94 ("selinux: Remove SECURITY_SELINUX_BOOTPARAM_VALUE")
and drop discussion of runtime disable since it is deprecated.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

89b223bf 18-Dec-2019 Paul Moore <paul@paul-moore.com>

selinux: deprecate disabling SELinux and runtime

Deprecate the CONFIG_SECURITY_SELINUX_DISABLE functionality. The
code was originally developed to make it easier for Linux
distributions to support architectures where adding parameters to the
kernel command line was difficult. Unfortunately, supporting runtime
disable meant we had to make some security trade-offs when it came to
the LSM hooks, as documented in the Kconfig help text:

NOTE: selecting this option will disable the '__ro_after_init'
kernel hardening feature for security hooks. Please consider
using the selinux=0 boot parameter instead of enabling this
option.

Fortunately it looks as if that the original motivation for the
runtime disable functionality is gone, and Fedora/RHEL appears to be
the only major distribution enabling this capability at build time
so we are now taking steps to remove it entirely from the kernel.
The first step is to mark the functionality as deprecated and print
an error when it is used (what this patch is doing). As Fedora/RHEL
makes progress in transitioning the distribution away from runtime
disable, we will introduce follow-up patches over several kernel
releases which will block for increasing periods of time when the
runtime disable is used. Finally we will remove the option entirely
once we believe all users have moved to the kernel cmdline approach.

Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

7a4b5194 06-Jan-2020 Hridya Valsaraju <hridya@google.com>

selinux: allow per-file labelling for binderfs

This patch allows genfscon per-file labeling for binderfs.
This is required to have separate permissions to allow
access to binder, hwbinder and vndbinder devices which are
relocating to binderfs.

Acked-by: Jeff Vander Stoep <jeffv@google.com>
Acked-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

7e78c875 06-Jan-2020 liuyang34 <yangliuxm34@gmail.com>

selinuxfs: use scnprintf to get real length for inode

The return value of snprintf maybe over the size of TMPBUFLEN, use
scnprintf instead in sel_read_class and sel_read_perm.

Signed-off-by: liuyang34 <liuyang34@xiaomi.com>
[PM: cleaned up the description]
Signed-off-by: Paul Moore <paul@paul-moore.com>

a125bcda 04-Jan-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'apparmor-pr-2020-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor

Pull apparmor fixes from John Johansen:

- performance regression: only get a label reference if the fast path
check fails

- fix aa_xattrs_match() may sleep while holding a RCU lock

- fix bind mounts aborting with -ENOMEM

* tag 'apparmor-pr-2020-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock
apparmor: only get a label reference if the fast path check fails
apparmor: fix bind mounts aborting with -ENOMEM


8c62ed27 02-Jan-2020 John Johansen <john.johansen@canonical.com>

apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock

aa_xattrs_match() is unfortunately calling vfs_getxattr_alloc() from a
context protected by an rcu_read_lock. This can not be done as
vfs_getxattr_alloc() may sleep regardles of the gfp_t value being
passed to it.

Fix this by breaking the rcu_read_lock on the policy search when the
xattr match feature is requested and restarting the search if a policy
changes occur.

Fixes: 8e51f9087f40 ("apparmor: Add support for attaching profiles via xattr, presence and value")
Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John Johansen <john.johansen@canonical.com>

20d4e80d 18-Dec-2019 John Johansen <john.johansen@canonical.com>

apparmor: only get a label reference if the fast path check fails

The common fast path check can be done under rcu_read_lock() and
doesn't need a reference count on the label. Only take a reference
count if entering the slow path.

Fixes reported hackbench regression
- sha1 79e178a57dae ("Merge tag 'apparmor-pr-2019-12-03' of
git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor")

hackbench -l (256000/#grp) -g #grp
128 groups 19.679 ±0.90%

- previous sha1 01d1dff64662 ("Merge tag 's390-5.5-2' of
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux")

hackbench -l (256000/#grp) -g #grp
128 groups 3.1689 ±3.04%

Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Fixes: bce4e7e9c45e ("apparmor: reduce rcu_read_lock scope for aa_file_perm mediation")
Signed-off-by: John Johansen <john.johansen@canonical.com>

9c95a278 10-Dec-2019 Patrick Steinhardt <ps@pks.im>

apparmor: fix bind mounts aborting with -ENOMEM

With commit df323337e507 ("apparmor: Use a memory pool instead per-CPU
caches, 2019-05-03"), AppArmor code was converted to use memory pools. In
that conversion, a bug snuck into the code that polices bind mounts that
causes all bind mounts to fail with -ENOMEM, as we erroneously error out
if `aa_get_buffer` returns a pointer instead of erroring out when it
does _not_ return a valid pointer.

Fix the issue by correctly checking for valid pointers returned by
`aa_get_buffer` to fix bind mounts with AppArmor.

Fixes: df323337e507 ("apparmor: Use a memory pool instead per-CPU caches")
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: John Johansen <john.johansen@canonical.com>

a8772fad 01-Jan-2020 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

tomoyo: Use atomic_t for statistics counter

syzbot is reporting that there is a race at tomoyo_stat_update() [1].
Although it is acceptable to fail to track exact number of times policy
was updated, convert to atomic_t because this is not a hot path.

[1] https://syzkaller.appspot.com/bug?id=a4d7b973972eeed410596e6604580e0133b0fc04

Reported-by: syzbot <syzbot+efea72d4a0a1d03596cd@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

c5c928c6 31-Dec-2019 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'tomoyo-fixes-for-5.5' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1

Pull tomoyo fixes from Tetsuo Handa:
"Two bug fixes:

- Suppress RCU warning at list_for_each_entry_rcu()

- Don't use fancy names on sockets"

* tag 'tomoyo-fixes-for-5.5' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
tomoyo: Suppress RCU warning at list_for_each_entry_rcu().
tomoyo: Don't use nifty names on sockets.


f1268534 24-Dec-2019 YueHaibing <yuehaibing@huawei.com>

selinux: remove set but not used variable 'sidtab'

security/selinux/ss/services.c: In function security_port_sid:
security/selinux/ss/services.c:2346:17: warning: variable sidtab set but not used [-Wunused-but-set-variable]
security/selinux/ss/services.c: In function security_ib_endport_sid:
security/selinux/ss/services.c:2435:17: warning: variable sidtab set but not used [-Wunused-but-set-variable]
security/selinux/ss/services.c: In function security_netif_sid:
security/selinux/ss/services.c:2480:17: warning: variable sidtab set but not used [-Wunused-but-set-variable]
security/selinux/ss/services.c: In function security_fs_use:
security/selinux/ss/services.c:2831:17: warning: variable sidtab set but not used [-Wunused-but-set-variable]

Since commit 66f8e2f03c02 ("selinux: sidtab reverse lookup hash table")
'sidtab' is not used any more, so remove it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

15b590a8 23-Dec-2019 Paul Moore <paul@paul-moore.com>

selinux: ensure the policy has been loaded before reading the sidtab stats

Check to make sure we have loaded a policy before we query the
sidtab's hash stats. Failure to do so could result in a kernel
panic/oops due to a dereferenced NULL pointer.

Fixes: 66f8e2f03c02 ("selinux: sidtab reverse lookup hash table")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

030b995a 17-Dec-2019 Jaihind Yadav <jaihindyadav@codeaurora.org>

selinux: ensure we cleanup the internal AVC counters on error in avc_update()

In AVC update we don't call avc_node_kill() when avc_xperms_populate()
fails, resulting in the avc->avc_cache.active_nodes counter having a
false value. In last patch this changes was missed , so correcting it.

Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
Signed-off-by: Jaihind Yadav <jaihindyadav@codeaurora.org>
Signed-off-by: Ravi Kumar Siddojigari <rsiddoji@codeaurora.org>
[PM: merge fuzz, minor description cleanup]
Signed-off-by: Paul Moore <paul@paul-moore.com>

5c108d4e 13-Dec-2019