History log of /linux-master/include/uapi/linux/magic.h
Revision Date Author Comments
# cb12fd8e 12-Feb-2024 Christian Brauner <brauner@kernel.org>

pidfd: add pidfs

This moves pidfds from the anonymous inode infrastructure to a tiny
pseudo filesystem. This has been on my todo for quite a while as it will
unblock further work that we weren't able to do simply because of the
very justified limitations of anonymous inodes. Moving pidfds to a tiny
pseudo filesystem allows:

* statx() on pidfds becomes useful for the first time.
* pidfds can be compared simply via statx() and then comparing inode
numbers.
* pidfds have unique inode numbers for the system lifetime.
* struct pid is now stashed in inode->i_private instead of
file->private_data. This means it is now possible to introduce
concepts that operate on a process once all file descriptors have been
closed. A concrete example is kill-on-last-close.
* file->private_data is freed up for per-file options for pidfds.
* Each struct pid will refer to a different inode but the same struct
pid will refer to the same inode if it's opened multiple times. In
contrast to now where each struct pid refers to the same inode. Even
if we were to move to anon_inode_create_getfile() which creates new
inodes we'd still be associating the same struct pid with multiple
different inodes.

The tiny pseudo filesystem is not visible anywhere in userspace exactly
like e.g., pipefs and sockfs. There's no lookup, there's no complex
inode operations, nothing. Dentries and inodes are always deleted when
the last pidfd is closed.

We allocate a new inode for each struct pid and we reuse that inode for
all pidfds. We use iget_locked() to find that inode again based on the
inode number which isn't recycled. We allocate a new dentry for each
pidfd that uses the same inode. That is similar to anonymous inodes
which reuse the same inode for thousands of dentries. For pidfds we're
talking way less than that. There usually won't be a lot of concurrent
openers of the same struct pid. They can probably often be counted on
two hands. I know that systemd does use separate pidfd for the same
struct pid for various complex process tracking issues. So I think with
that things actually become way simpler. Especially because we don't
have to care about lookup. Dentries and inodes continue to be always
deleted.

The code is entirely optional and fairly small. If it's not selected we
fallback to anonymous inodes. Heavily inspired by nsfs which uses a
similar stashing mechanism just for namespaces.

Link: https://lore.kernel.org/r/20240213-vfs-pidfd_fs-v1-2-f863f58cfce1@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 68f2736a 07-Jun-2022 Matthew Wilcox (Oracle) <willy@infradead.org>

mm: Convert all PageMovable users to movable_operations

These drivers are rather uncomfortably hammered into the
address_space_operations hole. They aren't filesystems and don't behave
like filesystems. They just need their own movable_operations structure,
which we can point to directly from page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>


# c086df49 10-Jan-2022 Jeff Layton <jlayton@kernel.org>

fuse: move FUSE_SUPER_MAGIC definition to magic.h

...to help userland apps that need to identify FUSE mounts.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# a0b3a15e 10-Jan-2022 Jeff Layton <jlayton@kernel.org>

ceph: move CEPH_SUPER_MAGIC definition to magic.h

The uapi headers are missing the ceph definition. Move it there so
userland apps can ID cephfs.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>


# dea29037 10-Jan-2022 Jeff Layton <jlayton@kernel.org>

cifs: move superblock magic defitions to magic.h

Help userland apps to identify cifs and smb2 mounts.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>


# 1ed147e2 25-Nov-2021 Namjae Jeon <linkinjeon@kernel.org>

exfat: move super block magic number to magic.h

Move exfat superblock magic number from local definition to magic.h.
It is also needed by userspace programs that call fstatfs().

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>


# 1507f512 07-Jul-2021 Mike Rapoport <rppt@kernel.org>

mm: introduce memfd_secret system call to create "secret" memory areas

Introduce "memfd_secret" system call with the ability to create memory
areas visible only in the context of the owning process and not mapped not
only to other processes but in the kernel page tables as well.

The secretmem feature is off by default and the user must explicitly
enable it at the boot time.

Once secretmem is enabled, the user will be able to create a file
descriptor using the memfd_secret() system call. The memory areas created
by mmap() calls from this file descriptor will be unmapped from the kernel
direct map and they will be only mapped in the page table of the processes
that have access to the file descriptor.

Secretmem is designed to provide the following protections:

* Enhanced protection (in conjunction with all the other in-kernel
attack prevention systems) against ROP attacks. Seceretmem makes
"simple" ROP insufficient to perform exfiltration, which increases the
required complexity of the attack. Along with other protections like
the kernel stack size limit and address space layout randomization which
make finding gadgets is really hard, absence of any in-kernel primitive
for accessing secret memory means the one gadget ROP attack can't work.
Since the only way to access secret memory is to reconstruct the missing
mapping entry, the attacker has to recover the physical page and insert
a PTE pointing to it in the kernel and then retrieve the contents. That
takes at least three gadgets which is a level of difficulty beyond most
standard attacks.

* Prevent cross-process secret userspace memory exposures. Once the
secret memory is allocated, the user can't accidentally pass it into the
kernel to be transmitted somewhere. The secreremem pages cannot be
accessed via the direct map and they are disallowed in GUP.

* Harden against exploited kernel flaws. In order to access secretmem,
a kernel-side attack would need to either walk the page tables and
create new ones, or spawn a new privileged uiserspace process to perform
secrets exfiltration using ptrace.

The file descriptor based memory has several advantages over the
"traditional" mm interfaces, such as mlock(), mprotect(), madvise(). File
descriptor approach allows explicit and controlled sharing of the memory
areas, it allows to seal the operations. Besides, file descriptor based
memory paves the way for VMMs to remove the secret memory range from the
userspace hipervisor process, for instance QEMU. Andy Lutomirski says:

"Getting fd-backed memory into a guest will take some possibly major
work in the kernel, but getting vma-backed memory into a guest without
mapping it in the host user address space seems much, much worse."

memfd_secret() is made a dedicated system call rather than an extension to
memfd_create() because it's purpose is to allow the user to create more
secure memory mappings rather than to simply allow file based access to
the memory. Nowadays a new system call cost is negligible while it is way
simpler for userspace to deal with a clear-cut system calls than with a
multiplexer or an overloaded syscall. Moreover, the initial
implementation of memfd_secret() is completely distinct from
memfd_create() so there is no much sense in overloading memfd_create() to
begin with. If there will be a need for code sharing between these
implementation it can be easily achieved without a need to adjust user
visible APIs.

The secret memory remains accessible in the process context using uaccess
primitives, but it is not exposed to the kernel otherwise; secret memory
areas are removed from the direct map and functions in the
follow_page()/get_user_page() family will refuse to return a page that
belongs to the secret memory area.

Once there will be a use case that will require exposing secretmem to the
kernel it will be an opt-in request in the system call flags so that user
would have to decide what data can be exposed to the kernel.

Removing of the pages from the direct map may cause its fragmentation on
architectures that use large pages to map the physical memory which
affects the system performance. However, the original Kconfig text for
CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can
improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736
("x86: add gbpages switches")) and the recent report [1] showed that "...
although 1G mappings are a good default choice, there is no compelling
evidence that it must be the only choice". Hence, it is sufficient to
have secretmem disabled by default with the ability of a system
administrator to enable it at boot time.

Pages in the secretmem regions are unevictable and unmovable to avoid
accidental exposure of the sensitive data via swap or during page
migration.

Since the secretmem mappings are locked in memory they cannot exceed
RLIMIT_MEMLOCK. Since these mappings are already locked independently
from mlock(), an attempt to mlock()/munlock() secretmem range would fail
and mlockall()/munlockall() will ignore secretmem mappings.

However, unlike mlock()ed memory, secretmem currently behaves more like
long-term GUP: secretmem mappings are unmovable mappings directly consumed
by user space. With default limits, there is no excessive use of
secretmem and it poses no real problem in combination with
ZONE_MOVABLE/CMA, but in the future this should be addressed to allow
balanced use of large amounts of secretmem along with ZONE_MOVABLE/CMA.

A page that was a part of the secret memory area is cleared when it is
freed to ensure the data is not exposed to the next user of that page.

The following example demonstrates creation of a secret mapping (error
handling is omitted):

fd = memfd_secret(0);
ftruncate(fd, MAP_SIZE);
ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);

[1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/

[akpm@linux-foundation.org: suppress Kconfig whine]

Link: https://lkml.kernel.org/r/20210518072034.31572-5-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3234ac66 21-May-2020 Dan Williams <dan.j.williams@intel.com>

/dev/mem: Revoke mappings when a driver claims the region

Close the hole of holding a mapping over kernel driver takeover event of
a given address range.

Commit 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
introduced CONFIG_IO_STRICT_DEVMEM with the goal of protecting the
kernel against scenarios where a /dev/mem user tramples memory that a
kernel driver owns. However, this protection only prevents *new* read(),
write() and mmap() requests. Established mappings prior to the driver
calling request_mem_region() are left alone.

Especially with persistent memory, and the core kernel metadata that is
stored there, there are plentiful scenarios for a /dev/mem user to
violate the expectations of the driver and cause amplified damage.

Teach request_mem_region() to find and shoot down active /dev/mem
mappings that it believes it has successfully claimed for the exclusive
use of the driver. Effectively a driver call to request_mem_region()
becomes a hole-punch on the /dev/mem device.

The typical usage of unmap_mapping_range() is part of
truncate_pagecache() to punch a hole in a file, but in this case the
implementation is only doing the "first half" of a hole punch. Namely it
is just evacuating current established mappings of the "hole", and it
relies on the fact that /dev/mem establishes mappings in terms of
absolute physical address offsets. Once existing mmap users are
invalidated they can attempt to re-establish the mapping, or attempt to
continue issuing read(2) / write(2) to the invalidated extent, but they
will then be subject to the CONFIG_IO_STRICT_DEVMEM checking that can
block those subsequent accesses.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/159009507306.847224.8502634072429766747.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 8dcc1a9d 25-Dec-2019 Damien Le Moal <damien.lemoal@wdc.com>

fs: New zonefs file system

zonefs is a very simple file system exposing each zone of a zoned block
device as a file. Unlike a regular file system with zoned block device
support (e.g. f2fs), zonefs does not hide the sequential write
constraint of zoned block devices to the user. Files representing
sequential write zones of the device must be written sequentially
starting from the end of the file (append only writes).

As such, zonefs is in essence closer to a raw block device access
interface than to a full featured POSIX file system. The goal of zonefs
is to simplify the implementation of zoned block device support in
applications by replacing raw block device file accesses with a richer
file API, avoiding relying on direct block device file ioctls which may
be more obscure to developers. One example of this approach is the
implementation of LSM (log-structured merge) tree structures (such as
used in RocksDB and LevelDB) on zoned block devices by allowing SSTables
to be stored in a zone file similarly to a regular file system rather
than as a range of sectors of a zoned device. The introduction of the
higher level construct "one file is one zone" can help reducing the
amount of changes needed in the application as well as introducing
support for different application programming languages.

Zonefs on-disk metadata is reduced to an immutable super block to
persistently store a magic number and optional feature flags and
values. On mount, zonefs uses blkdev_report_zones() to obtain the device
zone configuration and populates the mount point with a static file tree
solely based on this information. E.g. file sizes come from the device
zone type and write pointer offset managed by the device itself.

The zone files created on mount have the following characteristics.
1) Files representing zones of the same type are grouped together
under a common sub-directory:
* For conventional zones, the sub-directory "cnv" is used.
* For sequential write zones, the sub-directory "seq" is used.
These two directories are the only directories that exist in zonefs.
Users cannot create other directories and cannot rename nor delete
the "cnv" and "seq" sub-directories.
2) The name of zone files is the number of the file within the zone
type sub-directory, in order of increasing zone start sector.
3) The size of conventional zone files is fixed to the device zone size.
Conventional zone files cannot be truncated.
4) The size of sequential zone files represent the file's zone write
pointer position relative to the zone start sector. Truncating these
files is allowed only down to 0, in which case, the zone is reset to
rewind the zone write pointer position to the start of the zone, or
up to the zone size, in which case the file's zone is transitioned
to the FULL state (finish zone operation).
5) All read and write operations to files are not allowed beyond the
file zone size. Any access exceeding the zone size is failed with
the -EFBIG error.
6) Creating, deleting, renaming or modifying any attribute of files and
sub-directories is not allowed.
7) There are no restrictions on the type of read and write operations
that can be issued to conventional zone files. Buffered, direct and
mmap read & write operations are accepted. For sequential zone files,
there are no restrictions on read operations, but all write
operations must be direct IO append writes. mmap write of sequential
files is not allowed.

Several optional features of zonefs can be enabled at format time.
* Conventional zone aggregation: ranges of contiguous conventional
zones can be aggregated into a single larger file instead of the
default one file per zone.
* File ownership: The owner UID and GID of zone files is by default 0
(root) but can be changed to any valid UID/GID.
* File access permissions: the default 640 access permissions can be
changed.

The mkzonefs tool is used to format zoned block devices for use with
zonefs. This tool is available on Github at:

git@github.com:damien-lemoal/zonefs-tools.git.

zonefs-tools also includes a test suite which can be run against any
zoned block device, including null_blk block device created with zoned
mode.

Example: the following formats a 15TB host-managed SMR HDD with 256 MB
zones with the conventional zones aggregation feature enabled.

$ sudo mkzonefs -o aggr_cnv /dev/sdX
$ sudo mount -t zonefs /dev/sdX /mnt
$ ls -l /mnt/
total 0
dr-xr-xr-x 2 root root 1 Nov 25 13:23 cnv
dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq

The size of the zone files sub-directories indicate the number of files
existing for each type of zones. In this example, there is only one
conventional zone file (all conventional zones are aggregated under a
single file).

$ ls -l /mnt/cnv
total 137101312
-rw-r----- 1 root root 140391743488 Nov 25 13:23 0

This aggregated conventional zone file can be used as a regular file.

$ sudo mkfs.ext4 /mnt/cnv/0
$ sudo mount -o loop /mnt/cnv/0 /data

The "seq" sub-directory grouping files for sequential write zones has
in this example 55356 zones.

$ ls -lv /mnt/seq
total 14511243264
-rw-r----- 1 root root 0 Nov 25 13:23 0
-rw-r----- 1 root root 0 Nov 25 13:23 1
-rw-r----- 1 root root 0 Nov 25 13:23 2
...
-rw-r----- 1 root root 0 Nov 25 13:23 55354
-rw-r----- 1 root root 0 Nov 25 13:23 55355

For sequential write zone files, the file size changes as data is
appended at the end of the file, similarly to any regular file system.

$ dd if=/dev/zero of=/mnt/seq/0 bs=4K count=1 conv=notrunc oflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000452219 s, 9.1 MB/s

$ ls -l /mnt/seq/0
-rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0

The written file can be truncated to the zone size, preventing any
further write operation.

$ truncate -s 268435456 /mnt/seq/0
$ ls -l /mnt/seq/0
-rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0

Truncation to 0 size allows freeing the file zone storage space and
restart append-writes to the file.

$ truncate -s 0 /mnt/seq/0
$ ls -l /mnt/seq/0
-rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0

Since files are statically mapped to zones on the disk, the number of
blocks of a file as reported by stat() and fstat() indicates the size
of the file zone.

$ stat /mnt/seq/0
File: /mnt/seq/0
Size: 0 Blocks: 524288 IO Block: 4096 regular empty file
Device: 870h/2160d Inode: 50431 Links: 1
Access: (0640/-rw-r-----) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2019-11-25 13:23:57.048971997 +0900
Modify: 2019-11-25 13:52:25.553805765 +0900
Change: 2019-11-25 13:52:25.553805765 +0900
Birth: -

The number of blocks of the file ("Blocks") in units of 512B blocks
gives the maximum file size of 524288 * 512 B = 256 MB, corresponding
to the device zone size in this example. Of note is that the "IO block"
field always indicates the minimum IO size for writes and corresponds
to the device physical sector size.

This code contains contributions from:
* Johannes Thumshirn <jthumshirn@suse.de>,
* Darrick J. Wong <darrick.wong@oracle.com>,
* Christoph Hellwig <hch@lst.de>,
* Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> and
* Ting Yao <tingyao@hust.edu.cn>.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# fe030c9b 31-Oct-2019 David Hildenbrand <david@redhat.com>

powerpc/pseries/cmm: Implement balloon compaction

We can now get rid of the cmm_lock and completely rely on the balloon
compaction internals, which now also manage the page list and the
lock.

Inflated/"loaned" pages are now movable. Memory blocks that contain
such pages can get offlined. Also, all such pages will be marked
PageOffline() and can therefore be excluded in memory dumps using
recent versions of makedumpfile.

Don't switch to balloon_page_alloc() yet (due to the GFP_NOIO). Will
do that separately to discuss this change in detail.

Signed-off-by: David Hildenbrand <david@redhat.com>
[mpe: Add isolated_pages-- in cmm_migratepage() as suggested by David]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191031142933.10779-9-david@redhat.com


# 47e4937a 22-Aug-2019 Gao Xiang <xiang@kernel.org>

erofs: move erofs out of staging

EROFS filesystem has been merged into linux-staging for a year.

EROFS is designed to be a better solution of saving extra storage
space with guaranteed end-to-end performance for read-only files
with the help of reduced metadata, fixed-sized output compression
and decompression inplace technologies.

In the past year, EROFS was greatly improved by many people as
a staging driver, self-tested, betaed by a large number of our
internal users, successfully applied to almost all in-service
HUAWEI smartphones as the part of EMUI 9.1 and proven to be stable
enough to be moved out of staging.

EROFS is a self-contained filesystem driver. Although there are
still some TODOs to be more generic, we have a dedicated team
actively keeping on working on EROFS in order to make it better
with the evolution of Linux kernel as the other in-kernel filesystems.

As Pavel suggested, it's better to do as one commit since git
can do moves and all histories will be saved in this way.

Let's promote it from staging and enhance it more actively as
a "real" part of kernel for more wider scenarios!

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Pavel Machek <pavel@denx.de>
Cc: David Sterba <dsterba@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Darrick J . Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Richard Weinberger <richard@nod.at>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Miao Xie <miaoxie@huawei.com>
Cc: Li Guifu <bluce.liguifu@huawei.com>
Cc: Fang Wei <fangwei1@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190822213659.5501-1-hsiangkao@aol.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# ed63bb1d 13-Jun-2019 Greg Hackmann <ghackmann@google.com>

dma-buf: give each buffer a full-fledged inode

By traversing /proc/*/fd and /proc/*/map_files, processes with CAP_ADMIN
can get a lot of fine-grained data about how shmem buffers are shared
among processes. stat(2) on each entry gives the caller a unique
ID (st_ino), the buffer's size (st_size), and even the number of pages
currently charged to the buffer (st_blocks / 512).

In contrast, all dma-bufs share the same anonymous inode. So while we
can count how many dma-buf fds or mappings a process has, we can't get
the size of the backing buffers or tell if two entries point to the same
dma-buf. On systems with debugfs, we can get a per-buffer breakdown of
size and reference count, but can't tell which processes are actually
holding the references to each buffer.

Replace the singleton inode with full-fledged inodes allocated by
alloc_anon_inode(). This involves creating and mounting a
mini-pseudo-filesystem for dma-buf, following the example in fs/aio.c.

Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190613223408.139221-2-fengc@google.com


# ea8157ab 21-May-2019 David Howells <dhowells@redhat.com>

zsfold: Convert zsfold to use the new mount API

Convert the zsfold filesystem to the new internal mount API as the old one
will be obsoleted and removed. This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.

See Documentation/filesystems/mount_api.txt for more information.

Signed-off-by: David Howells <dhowells@redhat.com>


# 3ad20fe3 14-Dec-2018 Christian Brauner <christian@brauner.io>

binder: implement binderfs

As discussed at Linux Plumbers Conference 2018 in Vancouver [1] this is the
implementation of binderfs.

/* Abstract */
binderfs is a backwards-compatible filesystem for Android's binder ipc
mechanism. Each ipc namespace will mount a new binderfs instance. Mounting
binderfs multiple times at different locations in the same ipc namespace
will not cause a new super block to be allocated and hence it will be the
same filesystem instance.
Each new binderfs mount will have its own set of binder devices only
visible in the ipc namespace it has been mounted in. All devices in a new
binderfs mount will follow the scheme binder%d and numbering will always
start at 0.

/* Backwards compatibility */
Devices requested in the Kconfig via CONFIG_ANDROID_BINDER_DEVICES for the
initial ipc namespace will work as before. They will be registered via
misc_register() and appear in the devtmpfs mount. Specifically, the
standard devices binder, hwbinder, and vndbinder will all appear in their
standard locations in /dev. Mounting or unmounting the binderfs mount in
the initial ipc namespace will have no effect on these devices, i.e. they
will neither show up in the binderfs mount nor will they disappear when the
binderfs mount is gone.

/* binder-control */
Each new binderfs instance comes with a binder-control device. No other
devices will be present at first. The binder-control device can be used to
dynamically allocate binder devices. All requests operate on the binderfs
mount the binder-control device resides in.
Assuming a new instance of binderfs has been mounted at /dev/binderfs
via mount -t binderfs binderfs /dev/binderfs. Then a request to create a
new binder device can be made as illustrated in [2].
Binderfs devices can simply be removed via unlink().

/* Implementation details */
- dynamic major number allocation:
When binderfs is registered as a new filesystem it will dynamically
allocate a new major number. The allocated major number will be returned
in struct binderfs_device when a new binder device is allocated.
- global minor number tracking:
Minor are tracked in a global idr struct that is capped at
BINDERFS_MAX_MINOR. The minor number tracker is protected by a global
mutex. This is the only point of contention between binderfs mounts.
- struct binderfs_info:
Each binderfs super block has its own struct binderfs_info that tracks
specific details about a binderfs instance:
- ipc namespace
- dentry of the binder-control device
- root uid and root gid of the user namespace the binderfs instance
was mounted in
- mountable by user namespace root:
binderfs can be mounted by user namespace root in a non-initial user
namespace. The devices will be owned by user namespace root.
- binderfs binder devices without misc infrastructure:
New binder devices associated with a binderfs mount do not use the
full misc_register() infrastructure.
The misc_register() infrastructure can only create new devices in the
host's devtmpfs mount. binderfs does however only make devices appear
under its own mountpoint and thus allocates new character device nodes
from the inode of the root dentry of the super block. This will have
the side-effect that binderfs specific device nodes do not appear in
sysfs. This behavior is similar to devpts allocated pts devices and
has no effect on the functionality of the ipc mechanism itself.

[1]: https://goo.gl/JL2tfX
[2]: program to allocate a new binderfs binder device:

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/android/binder_ctl.h>

int main(int argc, char *argv[])
{
int fd, ret, saved_errno;
size_t len;
struct binderfs_device device = { 0 };

if (argc < 2)
exit(EXIT_FAILURE);

len = strlen(argv[1]);
if (len > BINDERFS_MAX_NAME)
exit(EXIT_FAILURE);

memcpy(device.name, argv[1], len);

fd = open("/dev/binderfs/binder-control", O_RDONLY | O_CLOEXEC);
if (fd < 0) {
printf("%s - Failed to open binder-control device\n",
strerror(errno));
exit(EXIT_FAILURE);
}

ret = ioctl(fd, BINDER_CTL_ADD, &device);
saved_errno = errno;
close(fd);
errno = saved_errno;
if (ret < 0) {
printf("%s - Failed to allocate new binder device\n",
strerror(errno));
exit(EXIT_FAILURE);
}

printf("Allocated new binder device with major %d, minor %d, and "
"name %s\n", device.major, device.minor,
device.name);

exit(EXIT_SUCCESS);
}

Cc: Martijn Coenen <maco@android.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# dddde68b 18-Oct-2018 Adam Borowski <kilobyte@angband.pl>

xfs: add a define for statfs magic to uapi

Needed by userspace programs that call fstatfs().

It'd be natural to publish XFS_SB_MAGIC in uapi, but while these two
have identical values, they have different semantic meaning: one is
an enum cookie meant for statfs, the other a signature of the
on-disk format.

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# f044c884 02-Nov-2017 David Howells <dhowells@redhat.com>

afs: Lay the groundwork for supporting network namespaces

Lay the groundwork for supporting network namespaces (netns) to the AFS
filesystem by moving various global features to a network-namespace struct
(afs_net) and providing an instance of this as a temporary global variable
that everything uses via accessor functions for the moment.

The following changes have been made:

(1) Store the netns in the superblock info. This will be obtained from
the mounter's nsproxy on a manual mount and inherited from the parent
superblock on an automount.

(2) The cell list is made per-netns. It can be viewed through
/proc/net/afs/cells and also be modified by writing commands to that
file.

(3) The local workstation cell is set per-ns in /proc/net/afs/rootcell.
This is unset by default.

(4) The 'rootcell' module parameter, which sets a cell and VL server list
modifies the init net namespace, thereby allowing an AFS root fs to be
theoretically used.

(5) The volume location lists and the file lock manager are made
per-netns.

(6) The AF_RXRPC socket and associated I/O bits are made per-ns.

The various workqueues remain global for the moment.

Changes still to be made:

(1) /proc/fs/afs/ should be moved to /proc/net/afs/ and a symlink emplaced
from the old name.

(2) A per-netns subsys needs to be registered for AFS into which it can
store its per-netns data.

(3) Rather than the AF_RXRPC socket being opened on module init, it needs
to be opened on the creation of a superblock in that netns.

(4) The socket needs to be closed when the last superblock using it is
destroyed and all outstanding client calls on it have been completed.
This prevents a reference loop on the namespace.

(5) It is possible that several namespaces will want to use AFS, in which
case each one will need its own UDP port. These can either be set
through /proc/net/afs/cm_port or the kernel can pick one at random.
The init_ns gets 7001 by default.

Other issues that need resolving:

(1) The DNS keyring needs net-namespacing.

(2) Where do upcalls go (eg. DNS request-key upcall)?

(3) Need something like open_socket_in_file_ns() syscall so that AFS
command line tools attempting to operate on an AFS file/volume have
their RPC calls go to the right place.

Signed-off-by: David Howells <dhowells@redhat.com>


# 6f52b16c 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX license identifier to uapi header files with no license

Many user space API headers are missing licensing information, which
makes it hard for compliance tools to determine the correct license.

By default are files without license information under the default
license of the kernel, which is GPLV2. Marking them GPLV2 would exclude
them from being included in non GPLV2 code, which is obviously not
intended. The user space API headers fall under the syscall exception
which is in the kernels COPYING file:

NOTE! This copyright does *not* cover user programs that use kernel
services by normal system calls - this is merely considered normal use
of the kernel, and does *not* fall under the heading of "derived work".

otherwise syscall usage would not be possible.

Update the files which contain no license information with an SPDX
license identifier. The chosen identifier is 'GPL-2.0 WITH
Linux-syscall-note' which is the officially assigned identifier for the
Linux syscall exception. SPDX license identifiers are a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne. See the previous patch in this series for the
methodology of how this patch was researched.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 62aa81d7 06-Jul-2017 Fabian Frederick <fabf@skynet.be>

ocfs2: use magic.h

Filesystems generally use SUPER_MAGIC values from magic.h instead of a
local definition.

Link: http://lkml.kernel.org/r/20170521154217.27917-1-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a481f4d9 25-May-2017 John Johansen <john.johansen@canonical.com>

apparmor: add custom apparmorfs that will be used by policy namespace files

AppArmor policy needs to be able to be resolved based on the policy
namespace a task is confined by. Add a base apparmorfs filesystem that
(like nsfs) will exist as a kern mount and be accessed via jump_link
through a securityfs file.

Setup the base apparmorfs fns and data, but don't use it yet.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Seth Arnold <seth.arnold@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>


# 5ff193fb 28-Oct-2016 Fenghua Yu <fenghua.yu@intel.com>

x86/intel_rdt: Add basic resctrl filesystem support

Use kernfs as basis for our user interface filesystem. This patch
supports mount/umount, and one mount parameter "cdp" to enable code/data
prioritization (though all we do at this point is ensure that the system
can support CDP). The file system is not populated yet in this patch.

[ tglx: Fixed up a few nits and added cdp handling in case of error ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 3bc52c45 24-Jul-2016 Dan Williams <dan.j.williams@intel.com>

dax: define a unified inode/address_space for device-dax mappings

In support of enabling resize / truncate of device-dax instances, define
a pseudo-fs to provide a unified inode/address space for vm operations.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 48b4800a 26-Jul-2016 Minchan Kim <minchan@kernel.org>

zsmalloc: page migration support

This patch introduces run-time migration feature for zspage.

For migration, VM uses page.lru field so it would be better to not use
page.next field which is unified with page.lru for own purpose. For
that, firstly, we can get first object offset of the page via runtime
calculation instead of using page.index so we can use page.index as link
for page chaining instead of page.next.

In case of huge object, it stores handle to page.index instead of next
link of page chaining because huge object doesn't need to next link for
page chaining. So get_next_page need to identify huge object to return
NULL. For it, this patch uses PG_owner_priv_1 flag of the page flag.

For migration, it supports three functions

* zs_page_isolate

It isolates a zspage which includes a subpage VM want to migrate from
class so anyone cannot allocate new object from the zspage.

We could try to isolate a zspage by the number of subpage so subsequent
isolation trial of other subpage of the zpsage shouldn't fail. For
that, we introduce zspage.isolated count. With that, zs_page_isolate
can know whether zspage is already isolated or not for migration so if
it is isolated for migration, subsequent isolation trial can be
successful without trying further isolation.

* zs_page_migrate

First of all, it holds write-side zspage->lock to prevent migrate other
subpage in zspage. Then, lock all objects in the page VM want to
migrate. The reason we should lock all objects in the page is due to
race between zs_map_object and zs_page_migrate.

zs_map_object zs_page_migrate

pin_tag(handle)
obj = handle_to_obj(handle)
obj_to_location(obj, &page, &obj_idx);

write_lock(&zspage->lock)
if (!trypin_tag(handle))
goto unpin_object

zspage = get_zspage(page);
read_lock(&zspage->lock);

If zs_page_migrate doesn't do trypin_tag, zs_map_object's page can be
stale by migration so it goes crash.

If it locks all of objects successfully, it copies content from old page
to new one, finally, create new zspage chain with new page. And if it's
last isolated subpage in the zspage, put the zspage back to class.

* zs_page_putback

It returns isolated zspage to right fullness_group list if it fails to
migrate a page. If it find a zspage is ZS_EMPTY, it queues zspage
freeing to workqueue. See below about async zspage freeing.

This patch introduces asynchronous zspage free. The reason to need it
is we need page_lock to clear PG_movable but unfortunately, zs_free path
should be atomic so the apporach is try to grab page_lock. If it got
page_lock of all of pages successfully, it can free zspage immediately.
Otherwise, it queues free request and free zspage via workqueue in
process context.

If zs_free finds the zspage is isolated when it try to free zspage, it
delays the freeing until zs_page_putback finds it so it will free free
the zspage finally.

In this patch, we expand fullness_list from ZS_EMPTY to ZS_FULL. First
of all, it will use ZS_EMPTY list for delay freeing. And with adding
ZS_FULL list, it makes to identify whether zspage is isolated or not via
list_empty(&zspage->list) test.

[minchan@kernel.org: zsmalloc: keep first object offset in struct page]
Link: http://lkml.kernel.org/r/1465788015-23195-1-git-send-email-minchan@kernel.org
[minchan@kernel.org: zsmalloc: zspage sanity check]
Link: http://lkml.kernel.org/r/20160603010129.GC3304@bbox
Link: http://lkml.kernel.org/r/1464736881-24886-12-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b1123ea6 26-Jul-2016 Minchan Kim <minchan@kernel.org>

mm: balloon: use general non-lru movable page feature

Now, VM has a feature to migrate non-lru movable pages so balloon
doesn't need custom migration hooks in migrate.c and compaction.c.

Instead, this patch implements the page->mapping->a_ops->
{isolate|migrate|putback} functions.

With that, we could remove hooks for ballooning in general migration
functions and make balloon compaction simple.

[akpm@linux-foundation.org: compaction.h requires that the includer first include node.h]
Link: http://lkml.kernel.org/r/1464736881-24886-4-git-send-email-minchan@kernel.org
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2a28900b 28-Apr-2016 Jan Kara <jack@suse.cz>

udf: Export superblock magic to userspace

Currently UDF superblock magic doesn't appear in any userspace header
files and thus userspace apps have hard time checking for this fs. Let's
export the magic to userspace as with any other filesystem.

Signed-off-by: Jan Kara <jack@suse.cz>


# 67e9c74b 16-Nov-2015 Tejun Heo <tj@kernel.org>

cgroup: replace __DEVEL__sane_behavior with cgroup2 fs type

With major controllers - cpu, memory and io - shaping up for the
unified hierarchy, cgroup2 is about ready to be, gradually, released
into the wild. Replace __DEVEL__sane_behavior flag which was used to
select the unified hierarchy with a separate filesystem type "cgroup2"
so that unified hierarchy can be mounted as follows.

mount -t cgroup2 none $MOUNT_POINT

The cgroup2 fs has its own magic number - 0x63677270 ("cgrp").

v2: Assign a different magic number to cgroup2 fs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>


# 257f8719 04-Nov-2015 Stephen Hemminger <stephen@networkplumber.org>

ovl: move super block magic number to magic.h

The overlayfs file system is not recognized by programs
like tail because the magic number is not in standard header location.

Move it so that the value will propagate on for the GNU library
and utilities. Needs to go in the fstatfs manual page as well.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>


# b2197755 29-Oct-2015 Daniel Borkmann <daniel@iogearbox.net>

bpf: add support for persistent maps/progs

This work adds support for "persistent" eBPF maps/programs. The term
"persistent" is to be understood that maps/programs have a facility
that lets them survive process termination. This is desired by various
eBPF subsystem users.

Just to name one example: tc classifier/action. Whenever tc parses
the ELF object, extracts and loads maps/progs into the kernel, these
file descriptors will be out of reach after the tc instance exits.
So a subsequent tc invocation won't be able to access/relocate on this
resource, and therefore maps cannot easily be shared, f.e. between the
ingress and egress networking data path.

The current workaround is that Unix domain sockets (UDS) need to be
instrumented in order to pass the created eBPF map/program file
descriptors to a third party management daemon through UDS' socket
passing facility. This makes it a bit complicated to deploy shared
eBPF maps or programs (programs f.e. for tail calls) among various
processes.

We've been brainstorming on how we could tackle this issue and various
approches have been tried out so far, which can be read up further in
the below reference.

The architecture we eventually ended up with is a minimal file system
that can hold map/prog objects. The file system is a per mount namespace
singleton, and the default mount point is /sys/fs/bpf/. Any subsequent
mounts within a given namespace will point to the same instance. The
file system allows for creating a user-defined directory structure.
The objects for maps/progs are created/fetched through bpf(2) with
two new commands (BPF_OBJ_PIN/BPF_OBJ_GET). I.e. a bpf file descriptor
along with a pathname is being passed to bpf(2) that in turn creates
(we call it eBPF object pinning) the file system nodes. Only the pathname
is being passed to bpf(2) for getting a new BPF file descriptor to an
existing node. The user can use that to access maps and progs later on,
through bpf(2). Removal of file system nodes is being managed through
normal VFS functions such as unlink(2), etc. The file system code is
kept to a very minimum and can be further extended later on.

The next step I'm working on is to add dump eBPF map/prog commands
to bpf(2), so that a specification from a given file descriptor can
be retrieved. This can be used by things like CRIU but also applications
can inspect the meta data after calling BPF_OBJ_GET.

Big thanks also to Alexei and Hannes who significantly contributed
in the design discussion that eventually let us end up with this
architecture here.

Reference: https://lkml.org/lkml/2015/10/15/925
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4282d606 20-Jan-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracefs: Add new tracefs file system

Add a separate file system to handle the tracing directory. Currently it
is part of debugfs, but that is starting to show its limits.

One thing is that in order to access the tracing infrastructure, you need
to mount debugfs. As that includes debugging from all sorts of sub systems
in the kernel, it is not considered advisable to mount such an all
encompassing debugging system.

Having the tracing system in its own file systems gives access to the
tracing sub system without needing to include all other systems.

Another problem with tracing using the debugfs system is that the
instances use mkdir to create sub buffers. debugfs does not support mkdir
from userspace so to implement it, special hacks were used. By controlling
the file system that the tracing infrastructure uses, this can be properly
done without hacks.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e149ed2b 01-Nov-2014 Al Viro <viro@zeniv.linux.org.uk>

take the targets of /proc/*/ns/* symlinks to separate fs

New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now.
It's not mountable (not even registered, so it's not in /proc/filesystems,
etc.). Files on it *are* bindable - we explicitly permit that in do_loopback().

This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
get_proc_ns() is a macro now (it's simply returning ->i_private; would
have been an inline, if not for header ordering headache).
proc_ns_inode() is an ex-parrot. The interface used in procfs is
ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).

Dentries and inodes are never hashed; a non-counting reference to dentry
is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
of that mechanism.

As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets
from ns_get_path().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 294e30fe 08-Oct-2013 Josef Bacik <jbacik@fusionio.com>

Btrfs: add tests for find_lock_delalloc_range

So both Liu and I made huge messes of find_lock_delalloc_range trying to fix
stuff, me first by fixing extent size, then him by fixing something I broke and
then me again telling him to fix it a different way. So this is obviously a
candidate for some testing. This patch adds a pseudo fs so we can allocate fake
inodes for tests that need an inode or pages. Then it addes a bunch of tests to
make sure find_lock_delalloc_range is acting the way it is supposed to. With
this patch and all of our previous patches to find_lock_delalloc_range I am sure
it is working as expected now. Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>


# 2b3b9bb0 27-Mar-2013 James Hogan <jhogan@kernel.org>

hostfs: move HOSTFS_SUPER_MAGIC to <linux/magic.h>

Move HOSTFS_SUPER_MAGIC to <linux/magic.h> to be with it's magical
friends from other file systems.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# cee7e443 06-Nov-2012 Jarkko Sakkinen <jarkko.sakkinen@iki.fi>

smack: SMACK_MAGIC to include/uapi/linux/magic.h

SMACK_MAGIC moved to a proper place for easy user space access
(i.e. libsmack).

Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@iki.fi>


# 39a53e0c 27-Nov-2012 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add superblock and major in-memory structure

This adds the following major in-memory structures in f2fs.

- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.

- f2fs_inode_info:
contains vfs_inode and other fs-specific information.

- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.

- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.

- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.

In addition, add F2FS_SUPER_MAGIC in magic.h.

Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 91716322 22-Oct-2012 Matt Fleming <matt.fleming@intel.com>

efivarfs: Add unique magic number

Using pstore's superblock magic number is no doubt going to cause
problems in the future. Give efivarfs its own magic number.

Acked-by: Jeremy Kerr <jeremy.kerr@canonical.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>


# 607ca46e 13-Oct-2012 David Howells <dhowells@redhat.com>

UAPI: (Scripted) Disintegrate include/linux

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>