History log of /freebsd-current/sys/compat/linux/linux_stats.c
Revision Date Author Comments
# 3460fab5 18-Aug-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Remove sys/cdefs.h inclusion where it's not needed due to 685dc743


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# fd745e1d 29-May-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Use pwd_altroot() to tell namei() about ABI root path

PR: 72920
Differential Revision: https://reviews.freebsd.org/D40090
MFC after: 2 month


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 166e2e5a 28-Apr-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Uniformly dev_t arguments translation

The two main uses of dev_t are in struct stat and as a parameter of the
mknod system calls.
As of version 2.6.0 of the Linux kernel, dev_t is a 32-bit quantity
with 12 bits set asaid for the major number and 20 for the minor number.
The in-kernel dev_t encoded as MMMmmmmm, where M is a hex digit of the
major number and m is a hex digit of the minor number.
The user-space dev_t encoded as mmmM MMmm, where M and m is the major
and minor numbers accordingly. This is downward compatible with legacy
systems where dev_t is 16 bits wide, encoded as MMmm.
In glibc dev_t is a 64-bit quantity, with 32-bit major and minor numbers,
encoded as MMMM Mmmm mmmM MMmm. This is downward compatible with the Linux
kernel and with legacy systems where dev_t is 16 bits wide.
In the FreeBSD dev_t is a 64-bit quantity. The major and minor numbers
are encoded as MMMmmmMm, therefore conversion of the device numbers between
Linux user-space and FreeBSD kernel required.


# 994ed958 28-Apr-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Add a dedicated fstat() implementation

In between kern_fstat() and translate_fd_major_minor(), another process
having the same filedesc could modify or close fd.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D39763


# cb858340 28-Apr-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Add a dedicated statat() implementation

Get rid of calling Linux stat translation hook and specific to Linux
handling of non-vnode dirfd from kern_statat(),

Reviewed by: kib, mjg
Differential revision: https://reviews.freebsd.org/D35474


# a408fc09 28-Apr-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Rename obsolete old struct l_stat to struct l_old_stat


# e9204c5c 28-Apr-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Move statx_copyout() close to linux_statx()

Just for future changes of the conditional Linuxulator build. We need
a small refactoring of the MI code to help porting Linuxulator to other
platforms.


# 6072eea0 28-Apr-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Move translate_vnhook_major_minor() into the Linux common module


# 2a38f51c 28-Apr-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Sort includes in the linux_stats.c


# d8e53d94 14-Feb-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Cleanup includes under compat/linux

Cleanup unneeded includes, sort the rest according to style(9).
No functional changes.

MFC after: 2 weeks


# 10d16789 12-Feb-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Get rid of the opt_compat.h include.

Since e013e369 COMPAT_LINUX, COMPAT_LINUX32 build options are removed,
so include of opt_compat.h is no more needed.

MFC after: 2 weeks


# 9922bccb 28-Jan-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Convert mount exported flags for statfs system calls.

MFC after: 1 week


# 953688e8 28-Jan-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Rework statfs conversion routine.

Rework the routines to convert a native statfs structure (with fixed-size 64-bit
counters) to a Linux statfs structure (with long-sized counters) for 32-bit apps.

Instead of following Linux and return an EOVERFLOW error from statfs() family of
syscalls when actual fs stat value(s) are large enough to not fit into 32 bits,
apply scale logics used by FreeBSD to convert a 5.x statfs structure to a 4.x
statfs structure.

For more details see cc479dda.

Tested by: glebius
MFC after: 1 week


# 09d60bfa 31-Mar-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Cleanup empty lines.

MFC after: 2 weeks


# ff39d74a 25-Mar-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Add AT_NO_AUTOMOUNT to statx.

Specific to Linux AT_NO_AUTOMOUNT flag tells the kernel to not automount the
terminal component of pathname if it is a directory that is an automount point.
As it is the default for FreeBSD silencly ignore this flag.

glibc-2.34 uses this flag in the stat64 system calls which is used by i386.

Reviewed by: trasz
Differential revision: https://reviews.freebsd.org/D31524
MFC after: 2 weeks


# af4051d2 25-Nov-2021 Mateusz Guzik <mjg@FreeBSD.org>

linux: remove the always curthread argument from lconvpath


# 9d167945 15-Jun-2021 Edward Tomasz Napierala <trasz@FreeBSD.org>

linux: improve reporting for unsupported syscall flags

Filter out the flags we do support; previously we would print
out the flag value verbatim.

Reviewed By: dchagin
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30693


# 2362ad45 08-Jun-2021 Philippe Michaud-Boudreault <pitwuu@gmail.com>

linux: implement statx(2)

PR: 252106
Reviewed By: dchagin
Differential Revision: https://reviews.freebsd.org/D30466


# 4b45c2bb 16-Apr-2021 Edward Tomasz Napierala <trasz@FreeBSD.org>

linux: make fstatat(2) handle AT_EMPTY_PATH

Without it, Qt5 apps from Focal fail to start, being unable to load
their plugins. It's also necessary for glibc 2.33, as found in recent
Arch snapshots.

PR: 254112
Reviewed By: kib
Sponsored by: The FreeBSD Foundation, EPSRC
Differential Revision: https://reviews.freebsd.org/D28192


# 1a34e9fa 20-Oct-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Fix potential race condition in linux stat(2).

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25618


# 1a180032 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

compat: clean up empty lines in .c and .h files


# 7ad2a82d 18-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error

Most consumers pass NULL.


# a125ed50 18-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

linux: add sysctl compat.linux.use_emul_path

This is a step towards facilitating jails with only Linux binaries.
Supporting emul_path adds path lookups which are completely spurious
if the binary at hand runs in a Linux-based root directory.

It defaults to on (== current behavior).

make -C /root/linux-5.3-rc8 -s -j 1 bzImage:

use_emul_path=1: 101.65s user 68.68s system 100% cpu 2:49.62 total
use_emul_path=0: 101.41s user 64.32s system 100% cpu 2:45.02 total


# 17f701a3 11-Jul-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Make linux stat(2) return the same st_dev for every devfs instance.
The reason for this is to work around an idiosyncrasy of glibc
getttynam(3) implementation: it checks whether st_dev returned for
fd 0 is the same as st_dev returned for the target of /proc/self/fd/0
symlink, and with linux chroots having their own devfs instance,
the check will fail if you chrooted into it.

PR: kern/240767
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25559


# c8b3463d 07-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT)

The previous behavior of leaving VI_OWEINACT vnodes on the active list without
a hold count is eliminated. Hold count is kept and inactive processing gets
explicitly deferred by setting the VI_DEFINACT flag. The syncer is then
responsible for vdrop.

Reviewed by: kib (previous version)
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D23036


# 273ce4ae 29-Dec-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Make Linux stat(2) et al distinguish between block and character
devices. It's required for LTP, among other things. It's not
complete, but good enough for now.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22950


# c5156c77 13-May-2019 Dmitry Chagin <dchagin@FreeBSD.org>

Linuxulator depends on a fundamental kernel settings such as SMP. Many
of them listed in opt_global.h which is not generated while building
modules outside of a kernel and such modules never match real cofigured
kernel.

So, we should prevent our users from building obviously defective modules.

Therefore, remove the root cause of the building of modules outside of a
kernel - the possibility of building modules with DEBUG or KTR flags.
And remove all of DEBUG printfs as it is incomplete and in threaded
programms not informative, also a half of system call does not have DEBUG
printf. For debuging Linux programms we have dtrace, ktr and ktrace ability.

PR: 222861
Reviewed by: trasz
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20178


# 931e2a1a 15-Jun-2018 Ed Maste <emaste@FreeBSD.org>

linuxulator: do not include legacy syscalls on arm64

Existing linuxulator platforms (i386, amd64) support legacy syscalls,
such as non-*at ones like open, but arm64 and other new platforms do
not.

Wrap these in #ifdef LINUX_LEGACY_SYSCALLS, #defined in the MD linux.h
files. We may need finer grained control in the future but this is
sufficient for now.

Reviewed by: andrew
Sponsored by: Turing Robotic Industries
Differential Revision: https://reviews.freebsd.org/D15237


# 407a8126 12-Jun-2018 Bruce Evans <bde@FreeBSD.org>

Oops, r335053 had an old version of the comment about 16-bit linux dev_t
translation.


# ab35e1c7 12-Jun-2018 Bruce Evans <bde@FreeBSD.org>

Fix the encoding of major and minor numbers in 64-bit dev_t by restoring
the old encodings for the lower 16 and 32 bits and only using the
higher 32 bits for unusually large major and minor numbers. This
change breaks compatibility with the previous encoding (which was only
used in -current).

Fix truncation to (essentially) 16-bit dev_t in newnfs v3.

Any encoding of device numbers gives an ABI, so it can't be changed
without translations for compatibility. Extra bits give the much
larger complication that the translations need to compress into fewer
bits. Fortunately, more than 32 bits are rarely needed, so
compression is rarely needed except for 16-bit linux dev_t where it
was always needed but never done.

The previous encoding moved the major number into the top 32 bits.
Almost no translation code handled this, so the major number was blindly
truncated away in most 32-bit encodings. E.g., for ffs, mknod(8) with
major = 1 and minor = 2 gave dev_t = 0x10000002; ffs cannot represent
this and blindly truncated it to 2. But if this mknod was run on any
released version of FreeBSD, it gives dev_t = 0x102. ffs can represent
this, but in the previous encoding it was not decoded, giving major = 0,
minor = 0x102.

The presence of bugs was most obvious for exporting dev_t's from an
old system to -current, since bugs in newnfs augment them. I fixed
oldnfs to support 32-bit dev_t in 1996 (r16634), but this regressed
to 16-bit dev_t in newnfs, first to the old 16-bit encoding and then
further in -current. E.g., old ad0 with major = 234, minor = 0x10002
had the correct (major, minor) number on the wire, but newnfs truncated
this to (234, 2) and then the previous encoding shifted the major
number into oblivion as seen by ffs or old applications.

I first tried to fix this by translating on every ABI/API boundary, but
there are too many boundaries and too many sloppy translations by blind
truncation. So use the old encoding for the low 32 bits so that sloppy
translations work no worse than before provided the high 32 bits are
not set. Add some error checking for when bits are lost. Keep not
doing any error checking for translations for almost everything in
compat/linux.

compat/freebsd32/freebsd32_misc.c:
Optionally check for losing bits after possibly-truncating assignments as
before.

compat/linux/linux_stats.c:
Depend on the representation being compatible with Linux's (or just with
itself for local use) and spell some of the translations as assignments in
a macro that hides the details.

fs/nfsclient/nfs_clcomsubs.c:
Essentially the same fix as in 1996, except there is now no possible
truncation in makedev() itself. Also fix nearby style bugs.

kern/vfs_syscalls.c:
As for freebsd32. Also update the sysctl description to include file
numbers, and change it to describe device ids as device numbers.

sys/types.h:
Use inline functions (wrapped by macros) since the expressions are now
a bit too complicated for plain macros. Describe the encoding and
some of the reasons for it. 16-bit compatibility didn't leave many
reasonable choices for the 32-bit encoding, and 32-bit compatibility
doesn't leave many reasonable choices for the 64-bit encoding. My
choice is to put the 8 new minor bits in the low 8 bits of the top 32
bits. This minimizes discontiguities.

Reviewed by: kib (except for rewrite of the comment in linux_stats.c)


# 372639f9 13-Jun-2018 Bruce Evans <bde@FreeBSD.org>

Fix some bugs found while fixing the representation and translation
of 64-bit dev_t's (but not ones involving dev_t's).

st_size was supposed to be clamped in cvtstat() and linux's copy_stat(),
but the clamping code wasn't aware that st_size is signed, and also had
an obfuscated off-by-1 value for the unsigned limit, so its effect was
to produce a bizarre negative size instead of clamping.

Change freebsd32's copy_ostat() to be no worse than cvtstat(). It was
missing clamping and bzero()ing of padding.

Reviewed by: kib (except a final fix of the clamp to the signed maximum)


# cbd92ce6 09-May-2018 Matt Macy <mmacy@FreeBSD.org>

Eliminate the overhead of gratuitous repeated reinitialization of cap_rights

- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
A 3.6% speedup in fstat was measured with this change.

Reported by: mjg
Reviewed by: oshogbo
Approved by: sbruno
MFC after: 1 month


# 340f4a8d 12-Mar-2018 Ed Maste <emaste@FreeBSD.org>

Linuxulator: apply style(9) to return

Sponsored by: Turing Robotic Industries Inc.


# 0ba1b365 16-Feb-2018 Ed Maste <emaste@FreeBSD.org>

Rationalize license text on Linuxolator files

Many licenses on Linuxolator files contained small variations from the
standard FreeBSD license text. To avoid license proliferation switch to
the standard 2-clause FreeBSD license for those files where I have
permission from each of the listed copyright holders. Additional files
waiting on permission from others are listed in review D14210.

Approved by: kan, marcel, sos, rdivacky
MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# 7f2d13d6 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/compat: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# 2c75d7b0 24-Sep-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

Small style(9) issue: spaces vs TAB.


# dbaa9ebf 18-Jun-2017 Ed Maste <emaste@FreeBSD.org>

Add ZFS to Linux statfs ftype

PR: 220086
Reviewed by: cem
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D11252


# e801ac78 25-Feb-2017 Edward Tomasz Napierala <trasz@FreeBSD.org>

Fix linux_fstatfs() to return proper value for f_frsize. Without it,
linux df(1) binary from Xenial shows garbage.

Reviewed by: dchagin
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D9692


# 3b51ec08 21-Feb-2017 Edward Tomasz Napierala <trasz@FreeBSD.org>

Get rid of foo_sys() in linuxulator code. It was commented out, and it
would be useless anyway - there is no point in pretending to have block
devices; our "block" devices are in fact character ones, and can only
be accessed as such.

Discussed with: dchagin
MFC after: 2 weeks
Sponsored by: DARPA, AFRL


# 2f304845 05-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Do not allocate struct statfs on kernel stack.

Right now size of the structure is 472 bytes on amd64, which is
already large and stack allocations are indesirable. With the ino64
work, MNAMELEN is increased to 1024, which will make it impossible to have
struct statfs on the stack.

Extracted from: ino64 work by gleb
Discussed with: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 2ad02313 20-Mar-2016 Dmitry Chagin <dchagin@FreeBSD.org>

Check bsd_to_linux_statfs() return value. Forgotten in r297070.

MFC after: 1 week


# 525c9796 20-Mar-2016 Dmitry Chagin <dchagin@FreeBSD.org>

Return EOVERFLOW in case when actual statfs values are large enough and
not fit into 32 bit fileds of a Linux struct statfs.

PR: 181012
MFC after: 1 week


# 7958a34c 20-Mar-2016 Dmitry Chagin <dchagin@FreeBSD.org>

Whitespaces, style(9) fixes. No functional changes.

MFC after: 1 week


# 99546279 20-Mar-2016 Dmitry Chagin <dchagin@FreeBSD.org>

Implement fstatfs64 system call.

PR: 181012
Submitted by: John Wehle
MFC after: 1 week


# f131759f 05-Jul-2015 Mateusz Guzik <mjg@FreeBSD.org>

fd: make 'rights' a manadatory argument to fget* functions


# 9802eb9e 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Implement Linux specific syncfs() system call.


# 2166e4e0 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

As for now our tmpfs is no longer being considered
"highly experimental" remove /dev/shm magic commited
in r218497 and convert tmpfs type to an expected magic number.

Differential Revision: https://reviews.freebsd.org/D1497
Reviewed by: emaste, trasz


# 606bcc17 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Add newfstatat system call for 64-bit Linuxulator.

Differential Revision: https://reviews.freebsd.org/D1071
Reviewed by: trasz


# 4ca75bed 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Fix compilation with -DDEBUG option.

Differential Revision: https://reviews.freebsd.org/D1070
Reviewed by: trasz


# 7f8f1d7f 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Disable i386 call for x86-64 Linux.

Differential Revision: https://reviews.freebsd.org/D1067
Reviewed by: trasz


# 297f61cc 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Get ready to commit x86_64 Linux emulation.
All fields of type l_int in struct statfs are defined
as l_long on i386 and amd64.

Differential Revision: https://reviews.freebsd.org/D1064
Reviewed by: trasz


# 6e646651 13-Nov-2014 Konstantin Belousov <kib@FreeBSD.org>

Remove the no-at variants of the kern_xx() syscall helpers. E.g., we
have both kern_open() and kern_openat(); change the callers to use
kern_openat().

This removes one (sometimes two) levels of indirection and
consolidates arguments checks.

Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 7870adb6 09-Feb-2012 Ed Schouten <ed@FreeBSD.org>

Remove direct access to si_name.

Code should just use the devtoname() function to obtain the name of a
character device. Also add const keywords to pieces of code that need it
to build properly.

MFC after: 2 weeks


# 9a14aa01 15-Jan-2012 Ulrich Spörlein <uqs@FreeBSD.org>

Convert files to UTF-8


# a9d2f8d8 10-Aug-2011 Robert Watson <rwatson@FreeBSD.org>

Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *. With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by: re (bz)
Submitted by: jonathan
Sponsored by: Google Inc


# 529844c7 09-Feb-2011 Alexander Leidinger <netchild@FreeBSD.org>

Linux' shm_open() fails because it wants to find some funky shmfs
to construct the full pathname. It starts to search at the default
mountpoint which is /dev/shm. If this fails it runs through fstab
and searches for shmfs and tmpfs. Whatever it finds will be
statfs()'ed to be checked for Linux' fs magic for shmfs (0x01021994).

Ideally our tmpfs should deliver this fs magic to Linux processes, but
as our tmpfs is considered to be an experimental feature we can not
assume that there is always a tmpfs available.

To make shared memory work in the Linuxulator, force the fs type of
/dev/shm (which can be a symlink) to match what Linux expects. The user
is responsible (info has to be added to the linux base ports and the docs)
to setup a suitable link for /dev/shm.

Noticed by: Andre Albsmeier <Andre.Albsmeier@siemens.com>
Submitted by: Andre Albsmeier <Andre.Albsmeier@siemens.com>
MFC after: 1 month


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 510ea843 28-Mar-2010 Ed Schouten <ed@FreeBSD.org>

Rename st_*timespec fields to st_*tim for POSIX 2008 compliance.

A nice thing about POSIX 2008 is that it finally standardizes a way to
obtain file access/modification/change times in sub-second precision,
namely using struct timespec, which we already have for a very long
time. Unfortunately POSIX uses different names.

This commit adds compatibility macros, so existing code should still
build properly. Also change all source code in the kernel to work
without any of the compatibility macros. This makes it all a less
ambiguous.

I am also renaming st_birthtime to st_birthtim, even though it was a
local extension anyway. It seems Cygwin also has a st_birthtim.


# 957d68dd 18-Feb-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

No need to include security/mac/mac_framework.h here.


# bcf11e8d 05-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


# 7ae27ff4 07-May-2009 Jamie Gritton <jamie@FreeBSD.org>

Move the per-prison Linux MIB from a private one-off pointer to the new
OSD-based jail extensions. This allows the Linux MIB to accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.

Reviewed by: dchagin, kib
Approved by: bz (mentor)


# 0eee862a 20-Feb-2009 Ed Schouten <ed@FreeBSD.org>

Don't make Linux stat() open character devices to resolve its name.

The existing code calls kern_open() to resolve the vnode of a pathname
right after a stat(). This is not correct, because it causes random
character devices to be opened in /dev. This means ls'ing a tape
streamer will cause it to rewind, for example. Changes I have made:

- Add kern_statat_vnhook() to allow binary emulators to `post-process'
struct stat, using the proper vnode.

- Remove unneeded printf's from stat() and statfs().

- Make the Linuxolator use kern_statat_vnhook(), replacing
translate_path_major_minor_at().

- Let translate_fd_major_minor() use vp->v_rdev instead of
vp->v_un.vu_cdev.

Result:

crw-rw-rw- 1 root root 0, 14 Feb 20 13:54 /dev/ptmx
crw--w---- 1 root adm 136, 0 Feb 20 14:03 /dev/pts/0
crw--w---- 1 root adm 136, 1 Feb 20 14:02 /dev/pts/1
crw--w---- 1 ed tty 136, 2 Feb 20 14:03 /dev/pts/2

Before this commit, ptmx also had a major number of 136, because it
silently allocated and deallocated a pseudo-terminal. Device nodes that
cannot be opened now have proper major/minor-numbers.

Reviewed by: kib, netchild, rdivacky (thanks!)


# a4611ab6 28-Jan-2009 Ed Schouten <ed@FreeBSD.org>

Last step of splitting up minor and unit numbers: remove minor().

Inside the kernel, the minor() function was responsible for obtaining
the device minor number of a character device. Because we made device
numbers dynamically allocated and independent of the unit number passed
to make_dev() a long time ago, it was actually a misnomer. If you really
want to obtain the device number, you should use dev2udev().

We already converted all the drivers to use dev2unit() to obtain the
device unit number, which is still used by a lot of drivers. I've
noticed not a single driver passes NULL to dev2unit(). Even if they
would, its behaviour would make little sense. This is why I've removed
the NULL check.

Ths commit removes minor(), minor2unit() and unit2minor() from the
kernel. Because there was a naming collision with uminor(), we can
rename umajor() and uminor() back to major() and minor(). This means
that the makedev(3) manual page also applies to kernel space code now.

I suspect umajor() and uminor() isn't used that often in external code,
but to make it easier for other parties to port their code, I've
increased __FreeBSD_version to 800062.


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# bc093719 20-Aug-2008 Ed Schouten <ed@FreeBSD.org>

Integrate the new MPSAFE TTY layer to the FreeBSD operating system.

The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.

If a PPP implementation would be built on top of the new TTY layer
(still needs a hooks layer, though), it would allow the PPP
implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

With the old TTY layer, it isn't entirely safe to destroy TTY's from
the system. This implementation has a two-step destructing design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc).

The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is both
used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan


# a147e6ca 02-Jun-2008 Ed Schouten <ed@FreeBSD.org>

Push down the major/minor conversion for pts/%u to improve consistency.

In the mpsafetty branch, Linux sshd seems to work properly inside a
jail. Some small modifications had to be made to the Linux compatibility
layer.

The Linux PTY routines always expect the device major number to be 136
or higher. Our code always set the major/minor number pair to 136:0.
This makes routines like ttyname() and ptsname() fail, because we'll end
up having ambiguous device numbers.

The conversion was not performed on all *stat() routines, which meant in
some cases the numbers didn't get transformed. By pushing the conversion
into linux_driver_get_major_minor(), the transformation will take place
on all calls.

Approved by: philip (mentor), rdivacky


# 48b05c3f 08-Apr-2008 Konstantin Belousov <kib@FreeBSD.org>

Implement the linux syscalls
openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat,
renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat.

Submitted by: rdivacky
Sponsored by: Google Summer of Code 2007
Tested by: pho


# d075105d 04-Jan-2008 Konstantin Belousov <kib@FreeBSD.org>

After applying LCONVPATH() to the path, do use the converted path
instead of original user-mode string in the linux_stat() and
linux_lstat() syscalls.

Tested by: Peter Holm
MFC after: 3 days


# 15b78ac5 29-Dec-2007 Konstantin Belousov <kib@FreeBSD.org>

Apply the LCONVPATH() to the (old) linux_stat() and linux_lstat() syscalls.
Without it, code has two problems:
- behaviour of the old and new [l]stat are different with regard of
the /compat/linux
- directly accessing the userspace data from the kernel asks for
the panics.

Reported and tested by: Peter Holm
Reviewed by: rdivacky
MFC after: 3 days


# 3ab85269 18-Sep-2007 David Malone <dwmalone@FreeBSD.org>

The kernel version of Linux statfs64 is actually supposed to take
3 arguments, but we had forgotten the second argument. Also make the
Linux statfs64 struct depend on the architecture because it has an
extra 4 bytes padding on amd64 compared to i386.

The three argument fix is from David Taylor, the struct statfs64
stuff is my fault. With this patch I can install i386 Linux matlab
on an amd64 machine.

Submitted by: David Taylor <davidt_at_yadt.co.uk>
Approved by: re (kensmith)


# b77ad8fc 06-Mar-2007 Robert Watson <rwatson@FreeBSD.org>

In translate_path_major_minor(), do not calculate otherwise unused 'fp'
variable, avoiding an extra locking of the file descriptor array.


# b256a1e1 04-Dec-2006 Jung-uk Kim <jkim@FreeBSD.org>

MFP4: 109652

Fixes for 'blocking in fifoor state' problem of LTP tests.
linux_*stat*() functions were opening files with O_RDONLY to get
major/minor pair for char/block special files. Unfortunately,
when these functions are used against fifo, it is blocked forever
because there is no writer. Instead, we only open char/block special
files for major/minor conversion. We have to get rid of kern_open()
entirely from translate_path_major_minor() but today is not the day.
While I am here, add checks for errors before calling
translate_path_major_minor().


# aed55708 22-Oct-2006 Robert Watson <rwatson@FreeBSD.org>

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


# 835e5061 27-Aug-2006 Alexander Leidinger <netchild@FreeBSD.org>

Add the linux statfs64 call. This allows Tivoli backup to proceed a little
but further on -current (still not successful, but a step into the right
direction).

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Tested by: Paul Mather <paul@gromit.dlib.vt.edu>


# edb75eca 16-May-2006 Doug Ambrisko <ambrisko@FreeBSD.org>

Fix file leaking in translate_path_major_minor.


# 01e0ffba 10-May-2006 Alexander Leidinger <netchild@FreeBSD.org>

Now that we don't have a linuxolator on alpha anymore:
- unifdef __alpha__
- revert rev. 1.66 of linux_socket.c


# 03487601 05-May-2006 Doug Ambrisko <ambrisko@FreeBSD.org>

Fix the the duplicate cut-n-paste in linux_fstat64 pointed out by
Alexander Leidinger. I forget to fix it in this version.


# 060e4882 05-May-2006 Doug Ambrisko <ambrisko@FreeBSD.org>

Enhance the Linux emulation layer to make MegaRAID SAS managements tool happy.
Add back in a scheme to emulate old type major/minor numbers via hooks into
stat, linprocfs to return major/minors that Linux app's expect. Currently
only /dev/null is always registered. Drivers can register via the Linux
type shim similar to the ioctl shim but by using
linux_device_register_handler/linux_device_unregister_handler functions.
The structure is:

struct linux_device_handler {
char *bsd_driver_name;
char *linux_driver_name;
char *bsd_device_name;
char *linux_device_name;
int linux_major;
int linux_minor;
int linux_char_device;
};

Linprocfs uses this to display the major number of the driver. The
soon to be available linsysfs will use it to fill in the driver name.
Linux_stat uses it to translate the major/minor into Linux type values.

Note major numbers are dynamically assigned via passing in a -1 for
the major number so we don't need to keep track of them.

This is somewhat needed due to us switching to our devfs. MegaCli
will not run until I add in the linsysfs and mfi Linux compat changes.

Sponsored by: IronPort Systems


# 61da9d97 20-Mar-2006 Alexander Leidinger <netchild@FreeBSD.org>

Fix tinderbox on alpha.

Tested by: cross-compile


# aefce619 19-Mar-2006 Ruslan Ermilov <ru@FreeBSD.org>

Unbreak COMPAT_LINUX32 option support on amd64.

Broken by: netchild


# d4a3f5dd 18-Mar-2006 Alexander Leidinger <netchild@FreeBSD.org>

Fixup some problems in my previous commit (COMPAT_43).

Pointyhat to: netchild


# 5c8919ad 18-Mar-2006 Alexander Leidinger <netchild@FreeBSD.org>

Get rid of the need of COMPAT_43 in the linuxolator.

Submitted by: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Obtained from: DragonFly (some parts)


# c4be1946 06-Feb-2006 Jeff Roberson <jeff@FreeBSD.org>

- Remove ifdef disabled code that doesn't have a chance of working anymore.


# d425dbec 26-Jan-2006 Olivier Houchard <cognet@FreeBSD.org>

Fix a typo : deivce => device

Spotted by: rwatson


# e83d253b 25-Jan-2006 Olivier Houchard <cognet@FreeBSD.org>

Linux compat bits needed to make linux programs use the new ptys :
linux_ioctl.[ch] : Implement LINUX_TIOCGPTN, which returns the pty number
linux_stats.c :
- Return the magic number for devfs.
- In various stats()-related functions, check that we're stating a
file in /dev/pts, and if so, change the st_rdev field to match what linux
expects to be there for a slave pty device. The glibc checks for this, and
their openpty() fails if it is no correct.


# 06a13778 23-Jun-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Actually only protect mount-point if security.jail.enforce_statfs is set to 2.
If we don't return statistics about requested file systems, system tools
may not work correctly or at all.

Approved by: re (scottl)


# 820a0de9 09-Jun-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Rename sysctl security.jail.getfsstatroot_only to security.jail.enforce_statfs
and extend its functionality:

value policy
0 show all mount-points without any restrictions
1 show only mount-points below jail's chroot and show only part of the
mount-point's path (if jail's chroot directory is /jails/foo and
mount-point is /jails/foo/usr/home only /usr/home will be shown)
2 show only mount-point where jail's chroot directory is placed.

Default value is 2.

Discussed with: rwatson


# d0cad55d 27-May-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove (now) unused argument 'td' from bsd_to_linux_statfs().


# 672d95c5 22-May-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

The code is under '#ifdef not_that_way', but anyway:

- Add missing prison_check_mount() check.


# a0e96a49 22-May-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

If we need to hide fsid, kern_statfs()/kern_fstatfs() will do it for us,
so do not duplicate the code in cvtstatfs().
Note, that we now need to clear fsid in freebsd4_getfsstat().

This moves all security related checks from functions like cvtstatfs()
and will allow to add more security related stuff (like statfs(2), etc.
protection for jails) a bit easier.


# bbbc2d96 15-Mar-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Neuter the duplicated disk-device magic code for now. Somebody with
serious linux-clue is necessary to fix this properly.


# 1e247cc2 22-Feb-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Neuter linux_ustat() until somebody finds time to try to fix it.

The fundamental problem is that we get only the lower 8 bits of the
minor device number so there is no guarantee that we can actually
find the disk device in question at all.

This was probably a bigger issue pre-GEOM where the upper bits
signaled which slice were in use.

The secondary problem is how we get from (partial) dev_t to vnode.

The correct implementation will involve traversing the mount list
looking for a perfect match or a possible match (for truncated
minor).


# f7a25872 07-Feb-2005 John Baldwin <jhb@FreeBSD.org>

- Use kern_{l,f,}stat() and kern_{f,}statfs() functions rather than
duplicating the contents of the same functions inline.
- Consolidate common code to convert a BSD statfs struct to a Linux struct
into a static worker function.


# 1997c537 13-Jan-2005 David E. O'Brien <obrien@FreeBSD.org>

Match the LINUX32's style with existing style
Submitted by: Jung-uk Kim <jkim@niksun.com>

Use positive, not negative logic.


# f69f5fbd 24-Sep-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Hold thread reference while frobbing cdevsw.


# 4af27623 16-Aug-2004 Tim J. Robbins <tjr@FreeBSD.org>

Changes to MI Linux emulation code necessary to run 32-bit Linux binaries
on AMD64, and the general case where the emulated platform has different
size pointers than we use natively:
- declare certain structure members as l_uintptr_t and use the new PTRIN
and PTROUT macros to convert to and from native pointers.
- declare some structures __packed on amd64 when the layout would differ
from that used on i386.
- include <machine/../linux32/linux.h> instead of <machine/../linux/linux.h>
if compiling with COMPAT_LINUX32. This will need to be revisited before
32-bit and 64-bit Linux emulation support can coexist in the same kernel.
- other small scattered changes.

This should be a no-op on i386 and Alpha.


# 41befa53 14-Aug-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Add XXX comment about findcdev() misuse.


# f3732fd1 17-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Second half of the dev_t cleanup.

The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.


# 89c9c53d 16-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


# 651b11ea 11-Mar-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unused second arg to vfinddev().
Don't call addaliasu() on VBLK nodes.


# 816d62bb 21-Feb-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Device megapatch 5/6:

Remove the unused second argument from udev2dev().

Convert all remaining users of makedev() to use udev2dev(). The
semantic difference is that udev2dev() will only locate a pre-existing
dev_t, it will not line makedev() create a new one.

Apart from the tiny well controlled windown in D_PSEUDO drivers,
there should no longer be any "anonymous" dev_t's in the system
now, only dev_t's created with make_dev() and make_dev_alias()


# 0b399cc8 05-Nov-2003 Eric Anholt <anholt@FreeBSD.org>

Prevent leaking of fsid to non-root users in linux_statfs and linux_fstatfs.
Matches native syscalls now.

PR: kern/58793
Submitted by: David P. Reese Jr. <daver@gomerbud.com>
MFC after: 1 week


# 3b6d9652 22-Jun-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Add a f_vnode field to struct file.

Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.

By giving the vnode a speparate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up to subtype specific use.

At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.


# 16dbc7f2 10-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# a966b13d 29-Apr-2003 Martin Blapp <mbr@FreeBSD.org>

Initialize tbuf in newstat_copyout() too.

Reviewed by: phk


# 616aa29a 28-Apr-2003 Martin Blapp <mbr@FreeBSD.org>

Do the same thing for stat64_copyout() as we already
do for newstat_copyout().

Lie about disk drives which are character devices
in FreeBSD but block devices under Linux.

PR: 37227
Submitted by: Vladimir B. Grebenschikov <vova@sw.ru>
Reviewed by: phk
MFC after: 2 weeks


# 31566c96 20-Mar-2003 John Baldwin <jhb@FreeBSD.org>

Use td->td_ucred instead of td->td_proc->p_ucred.


# 1d062e2b 03-Mar-2003 Dag-Erling Smørgrav <des@FreeBSD.org>

Clean up whitespace and remove register keyword.


# 4b7ef73d 03-Mar-2003 Dag-Erling Smørgrav <des@FreeBSD.org>

More caddr_t removal, in conjunction with copy{in,out}(9) this time.
Also clean up some egregious casts and incorrect use of sizeof.


# 48e3128b 12-Jan-2003 Matthew Dillon <dillon@FreeBSD.org>

Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.


# cd72f218 11-Jan-2003 Matthew Dillon <dillon@FreeBSD.org>

Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary. There are no operational changes in this
commit.


# 85422e62 05-Sep-2002 Bruce Evans <bde@FreeBSD.org>

Include <sys/malloc.h> instead of depending on namespace pollution 2
layers deep in <sys/proc.h> or <sys/vnode.h>.

Removed unused includes. Sorted includes.


# 206a5d3a 01-Sep-2002 Ian Dowse <iedowse@FreeBSD.org>

Use the new kern_* functions to avoid the need to store arguments
in the stack gap. This converts most VFS and signal related system
calls, as well as select().

Discussed on: -arch
Approved by: marcel


# ea6027a8 15-Aug-2002 Robert Watson <rwatson@FreeBSD.org>

Make similar changes to fo_stat() and fo_poll() as made earlier to
fo_read() and fo_write(): explicitly use the cred argument to fo_poll()
as "active_cred" using the passed file descriptor's f_cred reference
to provide access to the file credential. Add an active_cred
argument to fo_stat() so that implementers have access to the active
credential as well as the file credential. Generally modify callers
of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which
was redundantly provided via the fp argument. This set of modifications
also permits threads to perform these operations on behalf of another
thread without modifying their credential.

Trickle this change down into fo_stat/poll() implementations:

- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to
MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather
than cred to pru_sopoll() to maintain current semantics.
- sopoll(): moidfy arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments
to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL()
to maintian current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat()
and consumers so that this distinction is maintained at the VFS
as well as 'struct file' layer. Pass active_cred instead of
td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.

- fifofs: modify the creation of a "filetemp" so that the file
credential is properly initialized and can be used in the socket
code if desired. Pass ap->a_td->td_ucred as the active
credential to soo_poll(). If we teach the vnop interface about
the distinction between file and active credentials, we would use
the active credential here.

Note that current inconsistent passing of active_cred vs. file_cred to
VOP's is maintained. It's not clear why GETATTR would be authorized
using active_cred while POLL would be authorized using file_cred at
the file system level.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 9702d652 11-Aug-2002 Robert Watson <rwatson@FreeBSD.org>

Another fix that wasn't pulled in from the MAC branch: the
struct mount is not cached as *mp at this point, so use
vp->v_mount directly, following the check that it's non-NULL.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# eddc160e 01-Aug-2002 Robert Watson <rwatson@FreeBSD.org>

Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke appropriate MAC entry points for a number of VFS-related
operations in the Linux ABI module. In particular, handle uselib
in a manner similar to open() (more work is probably needed here),
as well as handle statfs(), and linux readdir()-like calls.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 21dc7d4f 02-Jun-2002 Jens Schweikhardt <schweikh@FreeBSD.org>

Fix typo in the BSD copyright: s/withough/without/

Spotted and suggested by: des
MFC after: 3 weeks


# a4db4953 13-Jan-2002 Alfred Perlstein <alfred@FreeBSD.org>

Replace ffind_* with fget calls.

Make fget MPsafe.

Make fgetvp and fgetsock use the fget subsystem to reduce code bloat.

Push giant down in fpathconf().


# 426da3bc 13-Jan-2002 Alfred Perlstein <alfred@FreeBSD.org>

SMP Lock struct file, filedesc and the global file list.

Seigo Tanimura (tanimura) posted the initial delta.

I've polished it quite a bit reducing the need for locking and
adapting it for KSE.

Locks:

1 mutex in each filedesc
protects all the fields.
protects "struct file" initialization, while a struct file
is being changed from &badfileops -> &pipeops or something
the filedesc should be locked.

1 mutex in each struct file
protects the refcount fields.
doesn't protect anything else.
the flags used for garbage collection have been moved to
f_gcflag which was the FILLER short, this doesn't need
locking because the garbage collection is a single threaded
container.
could likely be made to use a pool mutex.

1 sx lock for the global filelist.

struct file * fhold(struct file *fp);
/* increments reference count on a file */

struct file * fhold_locked(struct file *fp);
/* like fhold but expects file to locked */

struct file * ffind_hold(struct thread *, int fd);
/* finds the struct file in thread, adds one reference and
returns it unlocked */

struct file * ffind_lock(struct thread *, int fd);
/* ffind_hold, but returns file locked */

I still have to smp-safe the fget cruft, I'll get to that asap.


# 962cf420 18-Sep-2001 Maxim Sobolev <sobomax@FreeBSD.org>

Fix abuse of vtagtype. In addition, after this the linux programs will be
able correctly distinguish ext2fs from the ufs filesystem (previously ext2fs
was indistinguishable from the ufs).

Reviewed by: phk, marcel


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 5002a60f 08-Sep-2001 Marcel Moolenaar <marcel@FreeBSD.org>

Round of cleanups and enhancements. These include (in random order):

o Introduce private types for use in linux syscalls for two reasons:
1. establish type independence for ease in porting and,
2. provide a visual queue as to which syscalls have proper
prototypes to further cleanup the i386/alpha split.
Linuxulator types are prefixed by 'l_'. void and char have not
been "virtualized".

o Provide dummy functions for all syscalls and remove dummy functions
or implementations of truely obsolete syscalls.

o Sanitize the shm*, sem* and msg* syscalls.

o Make a first attempt to implement the linux_sysctl syscall. At this
time it only returns one MIB (KERN_VERSION), but most importantly,
it tells us when we need to add additional sysctls :-)

o Bump the kenel version up to 2.4.2 (this is not the same as the
KERN_VERSION MIB, BTW).

o Implement new syscalls, of which most are specific to i386. Our
syscall table is now up to date with Linux 2.4.2. Some highlights:
- Implement the 32-bit uid_t and gid_t bases syscalls.
- Implement a couple of 64-bit file size/offset bases syscalls.

o Fix or improve numerous syscalls and prototypes.

o Reduce style(9) violations while I'm here. Especially indentation
inconsistencies within the same file are addressed. Re-indenting
did not obfuscate actual changes to the extend that it could not
be combined.

NOTE: I spend some time testing these changes and found that if there
were regressions, they were not caused by these changes AFAICT.
It was observed that installing a RH 7.1 runtime environment
did make matters worse. Hangs and/or reboots have been observed
with and without these changes, so when it failed to make life
better in cases it doesn't look like it made it worse.


# 34c40997 03-Jun-2001 Paul Richards <paul@FreeBSD.org>

S_IFCHR is not a bit mask, it's just a value in a field. The correct
way to clear that field is to use S_IFMT.

Pointed out by BDE.


# 0b381bf1 01-Jun-2001 Ruslan Ermilov <ru@FreeBSD.org>

Remove vestiges of MFS.


# 753d4978 29-May-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Remove MFS


# 8a8402d3 26-May-2001 Ruslan Ermilov <ru@FreeBSD.org>

- sys/n[tw]fs moved to sys/fs/n[tw]fs
- /usr/include/n[tw]fs moved to /usr/include/fs/n[tw]fs


# 9ca3a84a 25-Apr-2001 Paul Richards <paul@FreeBSD.org>

A bogus check for a char device also matched symbolic links.
Replace it with a correct check using S_ISCHR()

Symbolic links will now work again in linux compatibility.


# 24593369 16-Feb-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Allow debugging output to be controlled on a per-syscall granularity.
Also clean up debugging output in a slightly more uniform fashion.

The default behavior remains the same (all debugging output is turned on)


# 4c3a3ec0 14-Jan-2001 Josef Karthauser <joe@FreeBSD.org>

Instead of hard coding the major numbers for IDE and SCSI disks
look in the device's cdevsw for the D_DISK flag.


# 7842f151 28-Dec-2000 Paul Richards <paul@FreeBSD.org>

Map FreeBSD character device hard disks to Linux block device hard disks.

This fixes the problem with VMWARE not being able to use raw disks.


# b4c6727a 02-Dec-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Don't auto-generate the syscalls.


# ebea8660 10-Nov-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Revert auto-generation. The Alpha port is broken.
Syncing with it is wrong.


# 2da829a0 09-Nov-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Sync with Alpha:
Do not use sysent.c, proto.h and syscall.h in source tree;
use auto-generated versions.


# a4e13024 01-Nov-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Fix linux_ustat syscall. We only have cdevs now, so looking
for a block device isn't that useful anymore.

Reported by: Wesley Morgan <morganw@chemicals.tacorp.com>
Submitted by: gallatin
Acknowledged by: phk


# 5231fb20 01-Nov-2000 David E. O'Brien <obrien@FreeBSD.org>

The MI/MD split wasn't perfect and the MI files need hacks for the
AlphaLinux compat bits. This will be better cleaned up soon.

Agreed to what ever was necessary by: marcel


# ac951e62 21-Aug-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Update include directives.


# 2c9b67a8 30-Apr-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unneeded #include <vm/vm_zone.h>

Generated by: src/tools/tools/kerninclude


# dca60efc 08-Jan-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Convert the filesystem type returned in struct statfs by syscalls
linux_statfs and linux_fstatfs. Linux binaries testing this expect
the filesystem's magic number and not our vnode's tag.

PR: 15425
Tested by: Vladimir N. Silyaev <vsilyaev@mindspring.com>


# 762e6b85 15-Dec-1999 Eivind Eklund <eivind@FreeBSD.org>

Introduce NDFREE (and remove VOP_ABORTOP)


# af3a75ec 09-Dec-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Remove unused includes.

Found by: phk-scan


# 408da119 27-Nov-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Implement linux_ustat.

Reviewed by: bde


# 53c2c4e2 07-Nov-1999 Peter Wemm <peter@FreeBSD.org>

Use fo_stat() rather than Yet Another duplication of kern_descrip.c's stat
code.


# e459b442 29-Aug-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Fix a braino: Linux minor device numbers are 8 bits wide and not 10.


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# 4e0eaf69 25-Aug-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Fix linux_newlstat in that it doesn't return the attributes of its containing
directory. Also, update arguments of NDINIT for both newstat and newlstat.

While I'm at it, fix style bugs in all {s|ls|fs}tat syscalls.

Reported by: bde


# bfbb9ce6 11-May-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
major() umajor()
minor() uminor()
makedev() umakedev()
dev2udev() udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to a actual device.

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few houskeeping bits. This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference. If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I belive this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).


# 90fb3368 09-May-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Yet another place which knew too much. Still not sure how much
good this does in the end.


# d5558c00 06-May-1999 Peter Wemm <peter@FreeBSD.org>

Fix up a few easy 'assignment used as truth value' and 'suggest parens
around && within ||' type warnings. I'm pretty sure I have not masked
any problems here, I've committed real problem fixes seperately.


# cb226aaa 06-Nov-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# bb65f5a1 19-Dec-1996 Bruce Evans <bde@FreeBSD.org>

Fixed lseek() on named pipes. It always succeeded but should always fail.
Broke locking on named pipes in the same way as locking on non-vnodes
(wrong errno). This will be fixed later.

The fix involves negative logic. Named pipes are now distinguished from
other types of files with vnodes, and there is additional code to handle
vnodes and named pipes in the same way only where that makes sense (not
for lseek, locking or TIOCSCTTY).


# d66a5066 02-Mar-1996 Peter Wemm <peter@FreeBSD.org>

Mega-commit for Linux emulator update.. This has been stress tested under
netscape-2.0 for Linux running all the Java stuff. The scrollbars are now
working, at least on my machine. (whew! :-)

I'm uncomfortable with the size of this commit, but it's too
inter-dependant to easily seperate out.

The main changes:

COMPAT_LINUX is *GONE*. Most of the code has been moved out of the i386
machine dependent section into the linux emulator itself. The int 0x80
syscall code was almost identical to the lcall 7,0 code and a minor tweak
allows them to both be used with the same C code. All kernels can now
just modload the lkm and it'll DTRT without having to rebuild the kernel
first. Like IBCS2, you can statically compile it in with "options LINUX".

A pile of new syscalls implemented, including getdents(), llseek(),
readv(), writev(), msync(), personality(). The Linux-ELF libraries want
to use some of these.

linux_select() now obeys Linux semantics, ie: returns the time remaining
of the timeout value rather than leaving it the original value.

Quite a few bugs removed, including incorrect arguments being used in
syscalls.. eg: mixups between passing the sigset as an int, vs passing
it as a pointer and doing a copyin(), missing return values, unhandled
cases, SIOC* ioctls, etc.

The build for the code has changed. i386/conf/files now knows how
to build linux_genassym and generate linux_assym.h on the fly.

Supporting changes elsewhere in the kernel:

The user-mode signal trampoline has moved from the U area to immediately
below the top of the stack (below PS_STRINGS). This allows the different
binary emulations to have their own signal trampoline code (which gets rid
of the hardwired syscall 103 (sigreturn on BSD, syslog on Linux)) and so
that the emulator can provide the exact "struct sigcontext *" argument to
the program's signal handlers.

The sigstack's "ss_flags" now uses SS_DISABLE and SS_ONSTACK flags, which
have the same values as the re-used SA_DISABLE and SA_ONSTACK which are
intended for sigaction only. This enables the support of a SA_RESETHAND
flag to sigaction to implement the gross SYSV and Linux SA_ONESHOT signal
semantics where the signal handler is reset when it's triggered.

makesyscalls.sh no longer appends the struct sysentvec on the end of the
generated init_sysent.c code. It's a lot saner to have it in a seperate
file rather than trying to update the structure inside the awk script. :-)

At exec time, the dozen bytes or so of signal trampoline code are copied
to the top of the user's stack, rather than obtaining the trampoline code
the old way by getting a clone of the parent's user area. This allows
Linux and native binaries to freely exec each other without getting
trampolines mixed up.


# 8cbf6e58 29-Jan-1996 Peter Wemm <peter@FreeBSD.org>

Call pipe_stat() when presented with a DTYPE_PIPE file in the linux
fstat() syscall, rather than panic("linux newfstat").

(Note: I've extracted this from a larger set of diffs, I'm confident I've
not missed any dependencies but can't modload it to test it on my system)


# 1f3dad5a 22-Nov-1995 Bruce Evans <bde@FreeBSD.org>

Completed function declarations and added prototypes.

Removed some unnecessary #includes.

Fixed warnings about nested externs.


# 2b14f991 28-Aug-1995 Julian Elischer <julian@FreeBSD.org>

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadbal modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root of disk still works just fine..
mfs should work but is untested. (tomorrows task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.


# c21dee17 25-Jun-1995 Søren Schmidt <sos@FreeBSD.org>

First incarnation of our Linux emulator or rather compatibility code.
This first shot only incorporaties so much functionality that DOOM
can run (the X version), signal handling is VERY weak, so is many
other things. But it meets my milestone number one (you guessed it
- running DOOM).

Uses /compat/linux as prefix for loading shared libs, so it won't
conflict with our own libs.

Kernel must be compiled with "options COMPAT_LINUX" for this to work.