History log of /freebsd-10.0-release/sys/compat/
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
259065 07-Dec-2013 gjb

- Copy stable/10 (r259064) to releng/10.0 as part of the
10.0-RELEASE cycle.
- Update __FreeBSD_version [1]
- Set branch name to -RC1

[1] 10.0-CURRENT __FreeBSD_version value ended at '55', so
start releng/10.0 at '100' so the branch is started with
a value ending in zero.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

258929 04-Dec-2013 peter

MFC: r258718: fix emulated jail_v0 byte order

Approved by: re (gjb)


258885 03-Dec-2013 kib

MFC r258661:
Add sysctl KERN_PROC_SIGTRAMP to retrieve signal trampoline location for the
given process.

Approved by: re (gjb)


256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


256061 04-Oct-2013 kib

Add padding to match the compat32 struct stat32 definition to the real
struct stat on 32bit architectures.

Debugged and tested by: bsam
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (marius)


255971 01-Oct-2013 markj

Fix some typos that were causing probe argument types to show up as unknown.

Reviewed by: rwatson (mac provider)
Approved by: re (glebius)
MFC after: 1 week


255778 21-Sep-2013 markj

Regenerate syscall argument strings after r255777.

Approved by: re (gjb)
MFC after: 1 week


255709 19-Sep-2013 jhb

Regen.

Approved by: re (delphij)


255708 19-Sep-2013 jhb

Extend the support for exempting processes from being killed when swap is
exhausted.
- Add a new protect(1) command that can be used to set or revoke protection
from arbitrary processes. Similar to ktrace it can apply a change to all
existing descendants of a process as well as future descendants.
- Add a new procctl(2) system call that provides a generic interface for
control operations on processes (as opposed to the debugger-specific
operations provided by ptrace(2)). procctl(2) uses a combination of
idtype_t and an id to identify the set of processes on which to operate
similar to wait6().
- Add a PROC_SPROTECT control operation to manage the protection status
of a set of processes. MADV_PROTECT still works for backwards
compatability.
- Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc)
the first bit of which is used to track if P_PROTECT should be inherited
by new child processes.

Reviewed by: kib, jilles (earlier version)
Approved by: re (delphij)
MFC after: 1 month


255675 18-Sep-2013 rdivacky

Revert r255672, it has some serious flaws, leaking file references etc.

Approved by: re (delphij)


255672 18-Sep-2013 rdivacky

Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue
to implement epoll subset of functionality. The kqueue user data are 32bit
on i386 which is not enough for epoll user data so this patch overrides
kqueue fileops to maintain enough space in struct file.

Initial patch developed by me in 2007 and then extended and finished
by Yuri Victorovich.

Approved by: re (delphij)
Sponsored by: Google Summer of Code
Submitted by: Yuri Victorovich <yuri at rawbw dot com>
Tested by: Yuri Victorovich <yuri at rawbw dot com>


255658 17-Sep-2013 jilles

Regenerate for freebsd32_cap_enter().

Approved by: re (hrs)


255657 17-Sep-2013 jilles

Disallow cap_enter() in freebsd32 compatibility mode.

The freebsd32 compatibility mode (for running 32-bit binaries on 64-bit
kernels) does not currently allow any system calls in capability mode, but
still permits cap_enter(). As a result, 32-bit binaries on 64-bit kernels
that use capability mode do not work (they crash after being disallowed to
call sys_exit()). Affected binaries include dhclient and uniq. The latter's
crashes cause obscure build failures.

This commit makes freebsd32 cap_enter() fail with [ENOSYS], as if capability
mode was not compiled in. Applications deal with this by doing their work
without capability mode.

This commit does not fix the uncommon situation where a 64-bit process
enters capability mode and then executes a 32-bit binary using fexecve().

This commit should be reverted when allowing the necessary freebsd32 system
calls in capability mode.

Reviewed by: pjd
Approved by: re (hrs)


255426 09-Sep-2013 jhb

Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use
an address in the first 2GB of the process's address space. This flag should
have the same semantics as the same flag on Linux.

To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address. While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.

Reviewed by: alc
Approved by: re (kib)


255220 05-Sep-2013 pjd

Regenerate after r255219.

Sponsored by: The FreeBSD Foundation


255219 05-Sep-2013 pjd

Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

struct cap_rights {
uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)

#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);

bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

cap_rights_t rights;

cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

#define cap_rights_set(rights, ...) \
__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by: The FreeBSD Foundation


254943 27-Aug-2013 will

Add the ability to display the default FIB number for a process to the
ps(1) utility, e.g. "ps -O fib".

bin/ps/keyword.c:
Add the "fib" keyword and default its column name to "FIB".

bin/ps/ps.1:
Add "fib" as a supported keyword.

sys/compat/freebsd32/freebsd32.h:
sys/kern/kern_proc.c:
sys/sys/user.h:
Add the default fib number for a process (p->p_fibnum)
to the user land accessible process data of struct kinfo_proc.

Submitted by: Oliver Fromme <olli@fromme.com>, gibbs


254842 25-Aug-2013 andre

Give (*ext_free) an int return value allowing for very sophisticated
external mbuf buffer management capabilities in the future.

For now only EXT_FREE_OK is defined with current legacy behavior.

Sponsored by: The FreeBSD Foundation


254492 18-Aug-2013 pjd

Regenerate after r254491.


254491 18-Aug-2013 pjd

The cap_rights_limit(2) system calls needs a wrapper for 32bit binaries
running under 64bit kernels as the 'rights' argument has to be split
into two registers or the half of the rights will disappear.

Reported by: jilles
Sponsored by: The FreeBSD Foundation


254490 18-Aug-2013 pjd

Move the PAIR32TO64() macro and the RETVAL_HI/RETVAL_LO defines to a
header file for use by other .c files.

Sponsored by: The FreeBSD Foundation


254482 18-Aug-2013 pjd

Regenerate after r254481.


254481 18-Aug-2013 pjd

Implement 32bit versions of the cap_ioctls_limit(2) and cap_ioctls_get(2)
system calls as unsigned longs have different size on i386 and amd64.

Reported by: jilles
Sponsored by: The FreeBSD Foundation


254467 17-Aug-2013 markj

Remove a couple of unused macros.

MFC after: 3 days


254448 17-Aug-2013 pjd

Regenerate after r254447.

Sponsored by: The FreeBSD Foundation


254447 17-Aug-2013 pjd

Make pdfork(2), pdkill(2) and pdgetpid(2) syscalls available for 32bit
binaries running under 64bit kernel.

Sponsored by: The FreeBSD Foundation


254356 15-Aug-2013 glebius

Make sendfile() a method in the struct fileops. Currently only
vnode backed file descriptors have this method implemented.

Reviewed by: kib
Sponsored by: Nginx, Inc.
Sponsored by: Netflix


254025 07-Aug-2013 jeff

Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


253531 21-Jul-2013 kib

Regenerate.


253530 21-Jul-2013 kib

Implement compat32 wrappers for the ktimer_* syscalls.

Reported, reviewed and tested by: Petr Salinger <Petr.Salinger@seznam.cz>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


253529 21-Jul-2013 kib

Wrap kmq_notify(2) for compat32 to properly consume struct sigevent32
argument.

Reviewed and tested by: Petr Salinger <Petr.Salinger@seznam.cz>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


253528 21-Jul-2013 kib

The freebsd32_lio_listio() compat syscall takes the struct sigevent32.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


253527 21-Jul-2013 kib

Move the convert_sigevent32() utility function into freebsd32_misc.c
for consumption outside the vfs_aio.c.

For SIGEV_THREAD_ID and SIGEV_SIGNAL notification delivery methods,
also copy in the sigev_value, since librt event pumping loop compares
note generation number with the value passed through sigev_value.

Tested by: Petr Salinger <Petr.Salinger@seznam.cz>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


253525 21-Jul-2013 kib

Cosmetic change, use the same union name on the left and right sides
of the conversion.

Tested by: Petr Salinger <Petr.Salinger@seznam.cz>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


253495 20-Jul-2013 kib

Regenerate


253494 20-Jul-2013 kib

id_t is 64bit, provide the compat32 wrapper for clock_getcpuclockid2(2).

Reported and tested by: Petr Salinger <Petr.Salinger@seznam.cz>
PR: threads/180652
Sponsored by: The FreeBSD Foundation


253338 14-Jul-2013 hselasky

Add some missing LIBUSB IOCTL conversion codes.


252892 06-Jul-2013 netchild

- Move videodev headers from compat/linux to contrib/v4l (cp from vendor and
apply diff to compat/linux versions).
- The cp implies an update of videodev2.h to the linux kernel 2.6.34.14 one.

The update makes video in skype v4 work on FreeBSD.

Tested by: Artyom Mirgorodskiy <artyom.mirgorodsky@gmail.com>
(update of header only)


251527 08-Jun-2013 glebius

aio_mlock() added:
- Regen for r251526.
- Bump __FreeBSD_version.


251526 08-Jun-2013 glebius

Add new system call - aio_mlock(). The name speaks for itself. It allows
to perform the mlock(2) operation, which can consume a lot of time, under
control of aio(4).

Reviewed by: kib, jilles
Sponsored by: Nginx, Inc.


251423 05-Jun-2013 alc

Relax the vm object locking. Use a read lock.

Sponsored by: EMC / Isilon Storage Division


251198 31-May-2013 obrien

Add a "kern.features" MIB for 32bit support under a 64bit kernel.


250854 21-May-2013 kib

Regenerate.


250853 21-May-2013 kib

Fix the wait6(2) on 32bit architectures and for the compat32, by using
the right type for the argument in syscalls.master. Also fix the
posix_fallocate(2) and posix_fadvise(2) compat32 syscalls on the
architectures which require padding of the 64bit argument.

Noted and reviewed by: jhb
Pointy hat to: kib
MFC after: 1 week


250160 01-May-2013 jilles

Regenerate files for pipe2().


250159 01-May-2013 jilles

Add pipe2() system call.

The pipe2() function is similar to pipe() but allows setting FD_CLOEXEC and
O_NONBLOCK (on both sides) as part of the function.

If p points to two writable ints, pipe2(p, 0) is equivalent to pipe(p).

If the pointer is not valid, behaviour differs: pipe2() writes into the
array from the kernel like socketpair() does, while pipe() writes into the
array from an architecture-specific assembler wrapper.

Reviewed by: kan, kib


250155 01-May-2013 jilles

Regenerate files for accept4().


250154 01-May-2013 jilles

Add accept4() system call.

The accept4() function, compared to accept(), allows setting the new file
descriptor atomically close-on-exec and explicitly controlling the
non-blocking status on the new socket. (Note that the latter point means
that accept() is not equivalent to any form of accept4().)

The linuxulator's accept4 implementation leaves a race window where the new
file descriptor is not close-on-exec because it calls sys_accept(). This
implementation leaves no such race window (by using falloc() flags). The
linuxulator could be fixed and simplified by using the new code.

Like accept(), accept4() is async-signal-safe, a cancellation point and
permitted in capability mode.


248996 02-Apr-2013 mdf

Regen.

MFC after: 1 week


248995 02-Apr-2013 mdf

Fix return type of extattr_set_* and fix rmextattr(8) utility.

extattr_set_{fd,file,link} is logically a write(2)-like operation and
should return ssize_t, just like extattr_get_*. Also, the user-space
utility was using an int for the return value of extattr_get_* and
extattr_list_*, both of which return an ssize_t.

MFC after: 1 week


248951 31-Mar-2013 jilles

Rename do_pipe() to kern_pipe2() and declare it properly.


248600 21-Mar-2013 pjd

Regenerate after r248599.

Sponsored by: The FreeBSD Foundation


248599 21-Mar-2013 pjd

Implement chflagsat(2) system call, similar to fchmodat(2), but operates on
file flags.

Reviewed by: kib, jilles
Sponsored by: The FreeBSD Foundation


248598 21-Mar-2013 pjd

Regenerate after r248597.

Sponsored by: The FreeBSD Foundation


248597 21-Mar-2013 pjd

- Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type
u_long. Before this change it was of type int for syscalls, but prototypes
in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not
for lchflags(2)) stated that it was u_long. Now some related functions
use u_long type for flags (strtofflags(3), fflagstostr(3)).
- Make path argument of type 'const char *' for consistency.

Discussed on: arch
Sponsored by: The FreeBSD Foundation


248324 15-Mar-2013 glebius

Use m_get/m_gethdr instead of compat macros.

Sponsored by: Nginx, Inc.


248084 09-Mar-2013 attilio

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


247764 04-Mar-2013 eadler

Remove check for NULL prior to free(9) and m_freem(9).

Approved by: cperciva (mentor)


247668 02-Mar-2013 pjd

Regen after r247667.


247667 02-Mar-2013 pjd

- Implement two new system calls:

int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen);
int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen);

which allow to bind and connect respectively to a UNIX domain socket with a
path relative to the directory associated with the given file descriptor 'fd'.

- Add manual pages for the new syscalls.

- Make the new syscalls available for processes in capability mode sandbox.

- Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on
the directory descriptor for the syscalls to work.

- Update audit(4) to support those two new syscalls and to handle path
in sockaddr_un structure relative to the given directory descriptor.

- Update procstat(1) to recognize the new capability rights.

- Document the new capability rights in cap_rights_limit(2).

Sponsored by: The FreeBSD Foundation
Discussed with: rwatson, jilles, kib, des


247604 02-Mar-2013 pjd

Regen after r247602.


247602 02-Mar-2013 pjd

Merge Capsicum overhaul:

- Capability is no longer separate descriptor type. Now every descriptor
has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
cap_new(2), which limits capability rights of the given descriptor
without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
ioctls can be retrived with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
that can be used with the new cap_fcntls_limit(2) syscall and retrive
them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
heavly modified.

- The audit subsystem, kdump and procstat tools were updated to
recognize new syscalls.

- Capability rights were revised and eventhough I tried hard to provide
backward API and ABI compatibility there are some incompatible changes
that are described in detail below:

CAP_CREATE old behaviour:
- Allow for openat(2)+O_CREAT.
- Allow for linkat(2).
- Allow for symlinkat(2).
CAP_CREATE new behaviour:
- Allow for openat(2)+O_CREAT.

Added CAP_LINKAT:
- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
- Allow to be target for renameat(2).

Added CAP_SYMLINKAT:
- Allow for symlinkat(2).

Removed CAP_DELETE. Old behaviour:
- Allow for unlinkat(2) when removing non-directory object.
- Allow to be source for renameat(2).

Removed CAP_RMDIR. Old behaviour:
- Allow for unlinkat(2) when removing directory.

Added CAP_RENAMEAT:
- Required for source directory for the renameat(2) syscall.

Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
- Allow for unlinkat(2) on any object.
- Required if target of renameat(2) exists and will be removed by this
call.

Removed CAP_MAPEXEC.

CAP_MMAP old behaviour:
- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
PROT_WRITE.
CAP_MMAP new behaviour:
- Allow for mmap(2)+PROT_NONE.

Added CAP_MMAP_R:
- Allow for mmap(PROT_READ).
Added CAP_MMAP_W:
- Allow for mmap(PROT_WRITE).
Added CAP_MMAP_X:
- Allow for mmap(PROT_EXEC).
Added CAP_MMAP_RW:
- Allow for mmap(PROT_READ | PROT_WRITE).
Added CAP_MMAP_RX:
- Allow for mmap(PROT_READ | PROT_EXEC).
Added CAP_MMAP_WX:
- Allow for mmap(PROT_WRITE | PROT_EXEC).
Added CAP_MMAP_RWX:
- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

Renamed CAP_MKDIR to CAP_MKDIRAT.
Renamed CAP_MKFIFO to CAP_MKFIFOAT.
Renamed CAP_MKNODE to CAP_MKNODEAT.

CAP_READ old behaviour:
- Allow pread(2).
- Disallow read(2), readv(2) (if there is no CAP_SEEK).
CAP_READ new behaviour:
- Allow read(2), readv(2).
- Disallow pread(2) (CAP_SEEK was also required).

CAP_WRITE old behaviour:
- Allow pwrite(2).
- Disallow write(2), writev(2) (if there is no CAP_SEEK).
CAP_WRITE new behaviour:
- Allow write(2), writev(2).
- Disallow pwrite(2) (CAP_SEEK was also required).

Added convinient defines:

#define CAP_PREAD (CAP_SEEK | CAP_READ)
#define CAP_PWRITE (CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ)
#define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
#define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W)
#define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X)
#define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X)
#define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
#define CAP_RECV CAP_READ
#define CAP_SEND CAP_WRITE

#define CAP_SOCK_CLIENT \
(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
#define CAP_SOCK_SERVER \
(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
CAP_SETSOCKOPT | CAP_SHUTDOWN)

Added defines for backward API compatibility:

#define CAP_MAPEXEC CAP_MMAP_X
#define CAP_DELETE CAP_UNLINKAT
#define CAP_MKDIR CAP_MKDIRAT
#define CAP_RMDIR CAP_UNLINKAT
#define CAP_MKFIFO CAP_MKFIFOAT
#define CAP_MKNOD CAP_MKNODAT
#define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by: The FreeBSD Foundation
Reviewed by: Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with: rwatson, benl, jonathan
ABI compatibility discussed with: kib


247595 01-Mar-2013 delphij

Fix wrong assignment.

Submitted by: Sascha Wildner <saw online de>
Obtained from: DragonFly rev 9568dd07a22a136e380e6c19a8ea188eb92976d5
MFC after: 2 weeks


246085 29-Jan-2013 jhb

Reduce duplication between i386/linux/linux.h and amd64/linux32/linux.h
by moving bits that are MI out into headers in compat/linux.

Reviewed by: Chagin Dmitry dmitry | gmail
MFC after: 2 weeks


245908 25-Jan-2013 dchagin

Arithmetic on pointers takes into account the size of the type. Properly cast the pointer to avoid incorrect pointer scaling.

MFC after: 1 Week


245849 23-Jan-2013 jhb

Don't assume that all Linux TCP-level socket options are identical to
FreeBSD TCP-level socket options (only the first two are). Instead,
using a mapping function and fail unsupported options as we do for other
socket option levels.

MFC after: 2 weeks


243882 05-Dec-2012 glebius

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually


243419 23-Nov-2012 cperciva

MFS security patches which seem to have accidentally not reached HEAD:

Fix insufficient message length validation for EAP-TLS messages.

Fix Linux compatibility layer input validation error.

Security: FreeBSD-SA-12:07.hostapd
Security: FreeBSD-SA-12:08.linux
Security: CVE-2012-4445, CVE-2012-4576
With hat: so@


243133 16-Nov-2012 kib

Style fixes for r242958.

Reported and reviewed by: bde
MFC after: 28 days


242959 13-Nov-2012 kib

Regen


242958 13-Nov-2012 kib

Add the wait6(2) system call. It takes POSIX waitid()-like process
designator to select a process which is waited for. The system call
optionally returns siginfo_t which would be otherwise provided to
SIGCHLD handler, as well as extended structure accounting for child
and cumulative grandchild resource usage.

Allow to get the current rusage information for non-exited processes
as well, similar to Solaris.

The explicit WEXITED flag is required to wait for exited processes,
allowing for more fine-grained control of the events the waiter is
interested in.

Fix the handling of siginfo for WNOWAIT option for all wait*(2)
family, by not removing the queued signal state.

PR: standards/170346
Submitted by: "Jukka A. Ukkonen" <jau@iki.fi>
MFC after: 1 month


242476 02-Nov-2012 kib

The r241025 fixed the case when a binary, executed from nullfs mount,
was still possible to open for write from the lower filesystem. There
is a symmetric situation where the binary could already has file
descriptors opened for write, but it can be executed from the nullfs
overlay.

Handle the issue by passing one v_writecount reference to the lower
vnode if nullfs vnode has non-zero v_writecount. Note that only one
write reference can be donated, since nullfs only keeps one use
reference on the lower vnode. Always use the lower vnode v_writecount
for the checks.

Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is
currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT
to manipulate the v_writecount value, which manages a single bypass
reference to the lower vnode. Caling the VOPs instead of directly
accessing v_writecount provide the fix described in the previous
paragraph.

Tested by: pho
MFC after: 3 weeks


241896 22-Oct-2012 kib

Remove the support for using non-mpsafe filesystem modules.

In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by: attilio
Tested by: pho


241394 10-Oct-2012 kevlo

Revert previous commit...

Pointyhat to: kevlo (myself)


241370 09-Oct-2012 kevlo

Prefer NULL over 0 for pointers


241025 28-Sep-2012 kib

Fix the mis-handling of the VV_TEXT on the nullfs vnodes.

If you have a binary on a filesystem which is also mounted over by
nullfs, you could execute the binary from the lower filesystem, or
from the nullfs mount. When executed from lower filesystem, the lower
vnode gets VV_TEXT flag set, and the file cannot be modified while the
binary is active. But, if executed as the nullfs alias, only the
nullfs vnode gets VV_TEXT set, and you still can open the lower vnode
for write.

Add a set of VOPs for the VV_TEXT query, set and clear operations,
which are correctly bypassed to lower vnode.

Tested by: pho (previous version)
MFC after: 2 weeks


240387 12-Sep-2012 kevlo

Remove redundant check


240325 10-Sep-2012 jhb

Remove some more NetBSD compat shims and other unused bits from these
drivers:
- Remove scsi_low_pisa.*, they were unused.
- Remove <compat/netbsd/physio_proc.h> and calls to the stubs in that
header. They were empty nops.
- Retire sl_xname and use device_get_nameunit() and device_printf() with
the underlying device_t instead.
- Remove unused {ct,ncv,nsp,stg}print() functions.
- Remove empty SOFT_INTR_REQUIRED() macro and the unused sl_irq member.


239349 17-Aug-2012 davidxu

regen.


239347 17-Aug-2012 davidxu

Implement syscall clock_getcpuclockid2, so we can get a clock id
for process, thread or others we want to support.
Use the syscall to implement POSIX API clock_getcpuclock and
pthread_getcpuclockid.

PR: 168417


239297 15-Aug-2012 kib

Regenerate.


239296 15-Aug-2012 kib

Provide 32bit compat for truncate(2) and ftruncate(2).

MFC after: 1 week


239249 14-Aug-2012 kib

Regenerate.


239248 14-Aug-2012 kib

Implement the old mmap syscall for compat32, when COMPAT_43 option is
enabled. The syscall is used by FreeBSD 1.1.5.1 dynamic linker.

MFC after: 1 week


238687 22-Jul-2012 kib

Cosmetics: define FREEBSD32_MINUSER and AOUT32_MINUSER for struct
sysentvec .sv_minuser. Also improve style.

Submitted by: Oliver Pinter <oliver.pinter@gmail.com>
MFC after: 1 week


238029 02-Jul-2012 kib

Extend the KPI to lock and unlock f_offset member of struct file. It
now fully encapsulates all accesses to f_offset, and extends f_offset
locking to other consumers that need it, in particular, to lseek() and
variants of getdirentries().

Ensure that on 32bit architectures f_offset, which is 64bit quantity,
always read and written under the mtxpool protection. This fixes
apparently easy to trigger race when parallel lseek()s or lseek() and
read/write could destroy file offset.

The already broken ABI emulations, including iBCS and SysV, are not
converted (yet).

Tested by: pho
No objections from: jhb
MFC after: 3 weeks


236213 29-May-2012 kevlo

Make sure that each va_start has one and only one matching va_end,
especially in error cases.


236136 27-May-2012 kib

Fix ki_cow for compat32 binaries.

MFC after: 3 days


236027 25-May-2012 ed

Regenerate system call tables.


236026 25-May-2012 ed

Remove use of non-ISO-C integer types from system call tables.

These files already use ISO-C-style integer types, so make them less
inconsistent by preferring the standard types.


235886 24-May-2012 gleb

Add kern_fhstat(), adjust sys_fhstat() to use it.

Extend kern_getdirentries() to accept uio segflag and optionally return
buffer residue.

Sponsored by: Google Summer of Code 2011


235063 05-May-2012 netchild

- >500 static DTrace probes for the linuxulator
- DTrace scripts to check for errors, performance, ...
they serve mostly as examples of what you can do with the static probe;s
with moderate load the scripts may be overwhelmed, excessive lock-tracing
may influence program behavior (see the last design decission)

Design decissions:
- use "linuxulator" as the provider for the native bitsize; add the
bitsize for the non-native emulation (e.g. "linuxuator32" on amd64)
- Add probes only for locks which are acquired in one function and released
in another function. Locks which are aquired and released in the same
function should be easy to pair in the code, inter-function
locking is more easy to verify in DTrace.
- Probes for locks should be fired after locking and before releasing to
prevent races (to provide data/function stability in DTrace, see the
man-page of "dtrace -v ..." and the corresponding DTrace docs).


234352 16-Apr-2012 jkim

- Implement pipe2 syscall for Linuxulator. This syscall appeared in 2.6.27
but GNU libc used it without checking its kernel version, e. g., Fedora 10.
- Move pipe(2) implementation for Linuxulator from MD files to MI file,
sys/compat/linux/linux_file.c. There is no MD code for this syscall at all.
- Correct an argument type for pipe() from l_ulong * to l_int *. Probably
this was the source of MI/MD confusion.

Reviewed by: emulation


233127 18-Mar-2012 tijl

Remove some unnecessary includes.


233125 18-Mar-2012 tijl

Eliminate ia32_reg.h by moving its contents to x86 and ia64 reg.h.

Reviewed by: kib


233124 18-Mar-2012 tijl

Copy i386 reg.h to x86 and merge with amd64 reg.h. Replace i386/amd64/pc98
reg.h with stubs.

The tREGISTER macros are only made visible on i386. These macros are
deprecated and should not be available on amd64.

The i386 and amd64 versions of struct reg have been renamed to struct
__reg32 and struct __reg64. During compilation either __reg32 or __reg64
is defined as reg depending on the machine architecture. On amd64 the i386
struct is also available as struct reg32 which is used in COMPAT_FREEBSD32
code.

Most of compat/ia32/ia32_reg.h is now IA64 only.

Reviewed by: kib (previous version)


233044 16-Mar-2012 tijl

Move userland bits of i386 npx.h and amd64 fpu.h to x86 fpu.h.
Remove FPU types from compat/ia32/ia32_reg.h that are no longer needed.
Create machine/npx.h on amd64 to allow compiling i386 code that uses
this header.

The original npx.h and fpu.h define struct envxmm differently. Both
definitions have been included in the new x86 header as struct __envxmm32
and struct __envxmm64. During compilation either __envxmm32 or __envxmm64
is defined as envxmm depending on machine architecture. On amd64 the i386
struct is also available as struct envxmm32.

Reviewed by: kib


232509 04-Mar-2012 brucec

Fix race condition in KfRaiseIrql().

After getting the current irql, if the kthread gets preempted and
subsequently runs on a different CPU, the saved irql could be wrong.

Also, correct the panic string.

PR: kern/165630
Submitted by: Vladislav Movchan <vladislav.movchan at gmail.com>


232475 03-Mar-2012 jmallett

On MIPS, _ALIGN always aligns to 8 bytes, even for 32-bit binaries. This might
not be ideal, but is the ABI we've shipped so far. Fix macros which reflect
the results of _ALIGN on 32-bit MIPS to use the right alignment.

This fixes sendmsg under COMPAT_FREEBSD32 on n64 MIPS kernels.


232449 03-Mar-2012 jmallett

o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with userlands
using the o32 ABI. This mostly follows nwhitehorn's lead in implementing
COMPAT_FREEBSD32 on powerpc64.
o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the
32-bit ABI being used. Since the MIPS port is relatively-new, even the 32-bit
ABIs use a 64-bit time_t.
o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS
with 32-bit compatibility, then, disable some code which assumes otherwise
wrongly when built for MIPS. A more general macro to check in this case would
seem like a good idea eventually. If someone adds support for using n32
userland with n64 kernels on MIPS, then they will have to add a variety of
flags related to each piece of the ABI that can vary. That's probably the
right time to generalize further.
o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the
freebsd32 compat code. Probably this should be generalized at some point.

Reviewed by: gonzo


232278 29-Feb-2012 mm

Add procfs to jail-mountable filesystems.

Reviewed by: jamie
MFC after: 1 week


231949 21-Feb-2012 kib

Fix found places where uio_resid is truncated to int.

Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.

Discussed with: bde, das (previous versions)
MFC after: 1 month


231885 17-Feb-2012 kib

Fix misuse of the kernel map in miscellaneous image activators.
Vnode-backed mappings cannot be put into the kernel map, since it is a
system map.

Use exec_map for transient mappings, and remove the mappings with
kmem_free_wakeup() to notify the waiters on available map space.

Do not map the whole executable into KVA at all to copy it out into
usermode. Directly use vn_rdwr() for the case of not page aligned
binary.

There is one place left where the potentially unbounded amount of data
is mapped into exec_map, namely, in the COFF image activator
enumeration of the needed shared libraries.

Reviewed by: alc
MFC after: 2 weeks


231378 10-Feb-2012 ed

Remove direct access to si_name.

Code should just use the devtoname() function to obtain the name of a
character device. Also add const keywords to pieces of code that need it
to build properly.

MFC after: 2 weeks


231006 05-Feb-2012 davidxu

Add 32-bit compat code for AIO kevent flags introduced in revision 230857.


230426 21-Jan-2012 kib

Add support for the extended FPU states on amd64, both for native
64bit and 32bit ABIs. As a side-effect, it enables AVX on capable
CPUs.

In particular:

- Query the CPU support for XSAVE, list of the supported extensions
and the required size of FPU save area. The hw.use_xsave tunable is
provided for disabling XSAVE, and hw.xsave_mask may be used to
select the enabled extensions.

- Remove the FPU save area from PCB and dynamically allocate the
(run-time sized) user save area on the top of the kernel stack,
right above the PCB. Reorganize the thread0 PCB initialization to
postpone it after BSP is queried for save area size.

- The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as
well. FPU state is only useful for suspend, where it is saved in
dynamically allocated suspfpusave area.

- Use XSAVE and XRSTOR to save/restore FPU state, if supported and
enabled.

- Define new mcontext_t flag _MC_HASFPXSTATE, indicating that
mcontext_t has a valid pointer to out-of-struct extended FPU
state. Signal handlers are supplied with stack-allocated fpu
state. The sigreturn(2) and setcontext(2) syscall honour the flag,
allowing the signal handlers to inspect and manipilate extended
state in the interrupted context.

- The getcontext(2) never returns extended state, since there is no
place in the fixed-sized mcontext_t to place variable-sized save
area. And, since mcontext_t is embedded into ucontext_t, makes it
impossible to fix in a reasonable way. Instead of extending
getcontext(2) syscall, provide a sysarch(2) facility to query
extended FPU state.

- Add ptrace(2) support for getting and setting extended state; while
there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries.

- Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to
consumers, making it opaque. Internally, struct fpu_kern_ctx now
contains a space for the extended state. Convert in-kernel consumers
of fpu_kern KPI both on i386 and amd64.

First version of the support for AVX was submitted by Tim Bird
<tim.bird am sony com> on behalf of Sony. This version was written
from scratch.

Tested by: pho (previous version), Yamagi Burmeister <lists yamagi org>
MFC after: 1 month


230249 17-Jan-2012 mckusick

Make sure all intermediate variables holding mount flags (mnt_flag)
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.

MFC after: 2 weeks


230145 15-Jan-2012 trociny

Abrogate nchr argument in proc_getargv() and proc_getenvv(): we always want
to read strings completely to know the actual size.

As a side effect it fixes the issue with kern.proc.args and kern.proc.env
sysctls, which didn't return the size of available data when calling
sysctl(3) with the NULL argument for oldp.

Note, in get_ps_strings(), which does actual work for proc_getargv() and
proc_getenvv(), we still have a safety limit on the size of data read in
case of a corrupted procces stack.

Suggested by: kib
MFC after: 3 days


230132 15-Jan-2012 uqs

Convert files to UTF-8


229402 03-Jan-2012 dim

In sys/compat/linux/linux_ioctl.c, work around a warning when a pointer
is compared to an integer, by casting the pointer to l_uintptr_t. No
functional difference on both i386 and amd64.

Reviewed by: ed, jhb
MFC after: 1 week


229004 30-Dec-2011 dim

In sys/compat/ndis/subr_ntoskrnl.c, change the RtlFillMemory function
definition from K&R to ANSI, to avoid a clang warning about the uint8_t
parameter being promoted to int, which is not compatible with the type
declared in the earlier prototype.

MFC after: 1 week


228957 29-Dec-2011 jhb

Implement linux_fadvise64() and linux_fadvise64_64() using
kern_posix_fadvise().

Reviewed by: silence on emulation@
MFC after: 2 weeks


228268 04-Dec-2011 trociny

Protect process environment variables with p_candebug().

Discussed with: jilles, kib, rwatson
MFC after: 2 weeks


227836 22-Nov-2011 trociny

Retire linprocfs_doargv(). Instead use new functions, proc_getargv()
and proc_getenvv(), which were implemented using linprocfs_doargv() as
a reference.

Suggested by: kib
Reviewed by: kib
Approved by: des (linprocfs maintainer)
MFC after: 2 weeks


227776 21-Nov-2011 lstewart

- Add the ffclock_getcounter(), ffclock_getestimate() and ffclock_setestimate()
system calls to provide feed-forward clock management capabilities to
userspace processes. ffclock_getcounter() returns the current value of the
kernel's feed-forward clock counter. ffclock_getestimate() returns the current
feed-forward clock parameter estimates and ffclock_setestimate() updates the
feed-forward clock parameter estimates.

- Document the syscalls in the ffclock.2 man page.

- Regenerate the script-derived syscall related files.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by: Julien Ridoux (jridoux at unimelb edu au)


227693 19-Nov-2011 ed

Make the Linux *at() calls a bit more complete.

Properly support:

- AT_EACCESS for faccessat(),
- AT_SYMLINK_FOLLOW for linkat().


227692 19-Nov-2011 ed

Regenerate system call tables.


227691 19-Nov-2011 ed

Improve *access*() parameter name consistency.

The current code mixes the use of `flags' and `mode'. This is a bit
confusing, since the faccessat() function as a `flag' parameter to store
the AT_ flag.

Make this less confusing by using the same name as used in the POSIX
specification -- `amode'.


227502 14-Nov-2011 jhb

- Split out a kern_posix_fadvise() from the posix_fadvise() system call so
it can be used by in-kernel consumers.
- Make kern_posix_fallocate() public.
- Use kern_posix_fadvise() and kern_posix_fallocate() to implement the
freebsd32 wrappers for the two system calls.


227447 11-Nov-2011 pluknet

struct timespec32: change types of tv_sec and tv_nsec fields to signed
to match native struct timespec ABI on __LP32__.

This change is a prerequisite for upcoming futimens()/utimensat() in whose
implementations it is assumed that timespec32 can take a negative value.

MFC after: 1 week


227441 11-Nov-2011 rstone

Correct the types of the arguments to return probes of the syscall
provider. Previously we were erroneously supplying the argument types of
the corresponding entry probe.

Reviewed by: rpaulo
MFC after: 1 week


227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


227293 07-Nov-2011 ed

Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.

This means that their use is restricted to a single C file.


227071 04-Nov-2011 jhb

Regen.


227070 04-Nov-2011 jhb

Add the posix_fadvise(2) system call. It is somewhat similar to
madvise(2) except that it operates on a file descriptor instead of a
memory region. It is currently only supported on regular files.

Just as with madvise(2), the advice given to posix_fadvise(2) can be
divided into two types. The first type provide hints about data access
patterns and are used in the file read and write routines to modify the
I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are
thus filesystem independent. Note that to ease implementation (and
since this API is only advisory anyway), only a single non-normal
range is allowed per file descriptor.

The second type of hints are used to hint to the OS that data will or
will not be used. These hints are implemented via a new VOP_ADVISE().
A default implementation is provided which does nothing for the WILLNEED
request and attempts to move any clean pages to the cache page queue for
the DONTNEED request. This latter case required two other changes.
First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests
vinvalbuf() to only flush clean buffers for the vnode from the buffer
cache and to not remove any backing pages from the vnode. This is
used to ensure clean pages are not wired into the buffer cache before
attempting to move them to the cache page queue. The second change adds
a new vm_object_page_cache() method. This method is somewhat similar to
vm_object_page_remove() except that instead of freeing each page in the
specified range, it attempts to move clean pages to the cache queue if
possible.

To preserve the ABI of struct file, the f_cdevpriv pointer is now reused
in a union to point to the currently active advice region if one is
present for regular files.

Reviewed by: jilles, kib, arch@
Approved by: re (kib)
MFC after: 1 month


226388 15-Oct-2011 kib

Control the execution permission of the readable segments for
i386 binaries on the amd64 and ia64 with the sysctl, instead of
unconditionally enabling it.

Reviewed by: marcel


226365 14-Oct-2011 jhb

Regen.


226364 14-Oct-2011 jhb

Use PAIR32TO64() for the offset and length parameters to
freebsd32_posix_fallocate() to properly handle big-endian platforms.

Reviewed by: mdf
MFC after: 1 week


226353 13-Oct-2011 marcel

Use PTRIN().


226349 13-Oct-2011 marcel

Wrap mprotect(2) so that we can add execute permissions when read
permissions are requested. This is needed on amd64 and ia64 for
JDK 1.4.x


226348 13-Oct-2011 marcel

Wrap mprotect(2)


226347 13-Oct-2011 marcel

In freebsd32_mmap() and when compiling for amd64 or ia64, also
ask for execute permissions when read permissions are wanted.
This is needed for JDK 1.4.x on i386.


226253 11-Oct-2011 brueffer

Add curly braces missed in r226247.

Pointy hat to: brueffer
Submitted by: many
MFC after: 1 week


226247 11-Oct-2011 brueffer

Properly free linux_gidset in case of an error.

CID: 4136
Found with: Coverity Prevent(tm)
MFC after: 1 week


226079 06-Oct-2011 jkim

Use the caculated length instead of maximum length.


226078 06-Oct-2011 jkim

Remove a now-defunct variable.


226074 06-Oct-2011 jkim

Use uint32_t instead of u_int32_t. Fix style(9) nits.


226073 06-Oct-2011 jkim

Make sure to ignore the leading NULL byte from Linux abstract namespace.


226072 06-Oct-2011 jkim

Restore the original socket address length if it was not really AF_INET6.


226071 06-Oct-2011 jkim

Retern more appropriate errno when Linux path name is too long.


226069 06-Oct-2011 jkim

Inline do_sa_get() function and remove an unused return value.


226068 06-Oct-2011 jkim

Unroll inlined strnlen(9) and make it easier to read. No functional change.


226023 04-Oct-2011 cperciva

Fix a bug in UNIX socket handling in the linux emulator which was
exposed by the security fix in FreeBSD-SA-11:05.unix.

Approved by: so (cperciva)
Approved by: re (kib)
Security: Related to FreeBSD-SA-11:05.unix, but not actually
a security fix.


225618 16-Sep-2011 kmacy

Auto-generated code from sys_ prefixing makesyscalls.sh change

Approved by: re(bz)


225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


224987 18-Aug-2011 jonathan

Add experimental support for process descriptors

A "process descriptor" file descriptor is used to manage processes
without using the PID namespace. This is required for Capsicum's
Capability Mode, where the PID namespace is unavailable.

New system calls pdfork(2) and pdkill(2) offer the functional equivalents
of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote
process for debugging purposes. The currently-unimplemented pdwait(2) will,
in the future, allow querying rusage/exit status. In the interim, poll(2)
may be used to check (and wait for) process termination.

When a process is referenced by a process descriptor, it does not issue
SIGCHLD to the parent, making it suitable for use in libraries---a common
scenario when using library compartmentalisation from within large
applications (such as web browsers). Some observers may note a similarity
to Mach task ports; process descriptors provide a subset of this behaviour,
but in a UNIX style.

This feature is enabled by "options PROCDESC", but as with several other
Capsicum kernel features, is not enabled by default in GENERIC 9.0.

Reviewed by: jhb, kib
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc


224778 11-Aug-2011 rwatson

Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *. With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by: re (bz)
Submitted by: jonathan
Sponsored by: Google Inc


224582 01-Aug-2011 kib

Implement the linprocfs swaps file, providing information about the
configured swap devices in the Linux-compatible format.

Based on the submission by: Robert Millan <rmh debian org>
PR: kern/159281
Reviewed by: bde
Approved by: re (kensmith)
MFC after: 2 weeks


224199 18-Jul-2011 bz

Rename ki_ocomm to ki_tdname and OCOMMLEN to TDNAMLEN.
Provide backward compatibility defines under BURN_BRIDGES.

Suggested by: jhb
Reviewed by: emaste
Sponsored by: Sandvine Incorporated
Approved by: re (kib)


224140 17-Jul-2011 marck

Correct small typo in a do{}while(0) define

Approved by: kib
MFC after: 2 weeks


224123 17-Jul-2011 bz

Remove the 'either' from the comment as it'll be less obvious that we
removed semmap in a bit of time from now. Re-wrap.

Suggested by: jhb


224067 15-Jul-2011 jonathan

Auto-generated system call code with cap_new(), cap_getrights().

Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc


224066 15-Jul-2011 jonathan

Add cap_new() and cap_getrights() system calls.

Implement two previously-reserved Capsicum system calls:
- cap_new() creates a capability to wrap an existing file descriptor
- cap_getrights() queries the rights mask of a capability.

Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc


224016 14-Jul-2011 bz

Remove semaphore map entry count "semmap" field and its tuning
option that is highly recommended to be adjusted in too much
documentation while doing nothing in FreeBSD since r2729 (rev 1.1).

ipcs(1) needs to be recompiled as it is accessing _KERNEL private
variables.

Reviewed by: jhb (before comment change on linux code)
Sponsored by: Sandvine Incorporated


223182 17-Jun-2011 pluknet

Return empty cmdline/environ string for processes with kernel address
space. This is consistent with the behavior in linux.

PR: kern/157871
Reported by: Petr Salinger <Petr Salinger att seznam cz>
Verified on: GNU/kFreeBSD debian 8.2-1-amd64 (by reporter)
Reviewed by: kib (some time ago)
MFC after: 2 weeks


223167 16-Jun-2011 kib

Regen.


223166 16-Jun-2011 kib

Implement compat32 for old lseek, for the a.out binaries on amd64.


221434 04-May-2011 netchild

Commit the missing linux_videdev2_compat.h (lost somewhere between
commit tree patch generation -> successful compile tree build test -> commmit).

Pointy hat to: netchild


221428 04-May-2011 netchild

Add FEATURE macros for v4l and v4l2 to the linuxulator.

Suggested by: ae


221426 04-May-2011 netchild

This is v4l2 support for the linuxulator. This allows to access FreeBSD
native devices which support the v4l2 API from processes running within
the linuxulator, e.g. skype or flash can access the multimedia/pwcbsd
or multimedia/webcamd supplied drivers.

Submitted by: nox
MFC after: 1 month


221425 04-May-2011 netchild

Fix typo in comment, improve comment.


221424 04-May-2011 netchild

Add explanation about the use-permission and FreeBSDify it.


221423 04-May-2011 netchild

Copy the v4l2 header unchanged from the vendor branch.


220792 18-Apr-2011 mdf

Regen.


220791 18-Apr-2011 mdf

Add the posix_fallocate(2) syscall. The default implementation in
vop_stdallocate() is filesystem agnostic and will run as slow as a
read/write loop in userspace; however, it serves to correctly
implement the functionality for filesystems that do not implement a
VOP_ALLOCATE.

Note that __FreeBSD_version was already bumped today to 900036 for any
ports which would like to use this function.

Also reserve space in the syscall table for posix_fadvise(2).

Reviewed by: -arch (previous version)


220515 10-Apr-2011 trasz

Remove stray semicolon.


220433 07-Apr-2011 jkim

Use atomic load & store for TSC frequency. It may be overkill for amd64 but
safer for i386 because it can be easily over 4 GHz now. More worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly). Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).


220373 05-Apr-2011 trasz

Add accounting for most of the memory-related resources.

Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)


220281 02-Apr-2011 kib

Implement compat32 shims for PCIOCGETCONF.

There is a generic problem with the shims for ioctls that receive
pointers to the usermode data areas in the data argument. We either have
to modify the handler to accept UIO_USERSPACE/UIO_SYSSPACE indicator, or
allocate and fill a usermode memory for data buffer in the host format.
The change goes the second route, in particular because we do not need
to modify the handler.

Submitted by: John Wehle <john feith com>
MFC after: 2 weeks


220280 02-Apr-2011 kib

Provide the structures and ioctl number definition for handling
PCIOCGETCONF compat32.

Submitted by: John Wehle <john feith com>
MFC after: 2 weeks


220239 01-Apr-2011 kib

Regen


220238 01-Apr-2011 kib

Add support for executing the FreeBSD 1/i386 a.out binaries on amd64.

In particular:
- implement compat shims for old stat(2) variants and ogetdirentries(2);
- implement delivery of signals with ancient stack frame layout and
corresponding sigreturn(2);
- implement old getpagesize(2);
- provide a user-mode trampoline and LDT call gate for lcall $7,$0;
- port a.out image activator and connect it to the build as a module
on amd64.

The changes are hidden under COMPAT_43.

MFC after: 1 month


220186 31-Mar-2011 avg

Revert r220032:linux compat: add SO_PASSCRED option with basic handling

I have not properly thought through the commit. After r220031 (linux
compat: improve and fix sendmsg/recvmsg compatibility) the basic
handling for SO_PASSCRED is not sufficient as it breaks recvmsg
functionality for SCM_CREDS messages because now we would need to handle
sockcred data in addition to cmsgcred. And that is not implemented yet.

Pointyhat to: avg


220164 30-Mar-2011 trasz

Regenerate.


220163 30-Mar-2011 trasz

Add rctl. It's used by racct to take user-configurable actions based
on the set of rules it maintains and the current resource usage. It also
privides userland API to manage that ruleset.

Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)


220159 30-Mar-2011 kib

Regen.


220158 30-Mar-2011 kib

Provide compat32 shims for kldstat(2).

Requested and tested by: jpaetzel
MFC after: 1 week


220032 26-Mar-2011 avg

linux compat: add SO_PASSCRED option with basic handling

This seems to have been a part of a bigger patch by dchagin that either
haven't been committed or committed partially.

Submitted by: dchagin, nox
MFC after: 2 weeks


220031 26-Mar-2011 avg

linux compat: improve and fix sendmsg/recvmsg compatibility

- implement baseic stubs for capget, capset, prctl PR_GET_KEEPCAPS
and prctl PR_SET_KEEPCAPS.
- add SCM_CREDS support to sendmsg and recvmsg
- modify sendmsg to ignore control messages if not using UNIX
domain sockets

This should allow linux pulse audio daemon and client work on FreeBSD
and interoperate with native counter-parts modulo the differences in
pulseaudio versions.

PR: kern/149168
Submitted by: John Wehle <john@feith.com>
Reviewed by: netchild
MFC after: 2 weeks


219989 25-Mar-2011 kib

Implement compat32 MEMRANGE_GET and MEMRANGE_SET. This is needed to
run 32bit Xorg server with VESA driver.

Submitted by: John Wehle <john feith com>
MFC after: 1 week


219988 25-Mar-2011 kib

Fully emulate MDIOCLIST for compat32.

MFC after: 1 week


219987 25-Mar-2011 kib

Remove unneccessary panics, that can be easily triggered by user.
The copyin() function handles NULL as well as any other pointer.

MFC after: 3 days


219986 25-Mar-2011 kib

Fix file leakage in the freebsd32_ioctl routines.

Code inspection shows freebsd32_ioctl calls fget for a fd and calls
a subroutine to handle each specific ioctl. It is expected that the
subroutine will call fdrop when done. However many of the subroutines
will exit out early if copyin encounters an error resulting in fdrop
never being called.

Submitted by: John Wehle <john feith com>
MFC after: 3 days


219968 24-Mar-2011 jhb

Fix some locking nits with the p_state field of struct proc:
- Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL
in fork to honor the locking requirements. While here, expand the scope
of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously
the code was locking the new child process (p2) after it had locked the
parent process (p1). However, when locking two processes, the safe order
is to lock the child first, then the parent.
- Fix various places that were checking p_state against PRS_NEW without
having the process locked to use PROC_LOCK(). Every place was already
locking the process, just after the PRS_NEW check.
- Remove or reduce the use of PROC_SLOCK() for places that were checking
p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading
the current state.
- Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once.

MFC after: 1 week


219668 15-Mar-2011 netchild

Staticize functions which are not used somewhere else, move the
corresponding prototypes from the header to the code file.


219560 12-Mar-2011 avg

add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls

Regenerate system call and systrace support files.

PR: kern/152822
Submitted by: Artem Belevich <fbsdlist@src.cx>
Reviewed by: jhb (earlier version)
MFC after: 3 weeks


219559 12-Mar-2011 avg

add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls

This commits makes necessary changes in syscall/sysent generation
infrastructure.

PR: kern/152822
Submitted by: Artem Belevich <fbsdlist@src.cx>
Reviewed by: jhb (ealier version)
MFC after: 3 weeks


219558 12-Mar-2011 dchagin

Style(9) fixes. No functional changes.

MFC after: 2 Week


219460 10-Mar-2011 jhb

Remove now-obsolete comment.

Submitted by: netchild
MFC after: 1 week


219430 09-Mar-2011 jkim

Remove custom interrupt dispatcher. This is a pointless micro-optimization
and it may cause problems if SS and SP are modified by real-mode code.

MFC after: 1 month


219421 09-Mar-2011 dchagin

Indeed, remove bogus since r219405 check of the Linux ABI.

Pointed out: jhb

MFC after: 2 Week


219405 08-Mar-2011 dchagin

Extend struct sysvec with new method sv_schedtail, which is used for an
explicit process at fork trampoline path instead of eventhadler(schedtail)
invocation for each child process.

Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.

While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.

Discussed with: kib

MFC after: 2 Week


219307 05-Mar-2011 trasz

Export login class information via kinfo and make it possible to view
it using "ps -o class".


219305 05-Mar-2011 trasz

Regenerate.


219304 05-Mar-2011 trasz

Add two new system calls, setloginclass(2) and getloginclass(2). This makes
it possible for the kernel to track login class the process is assigned to,
which is required for RCTL. This change also make setusercontext(3) call
setloginclass(2) and makes it possible to retrieve current login class using
id(1).

Reviewed by: kib (as part of a larger patch)


219242 03-Mar-2011 dchagin

Print out shared flag for debug purpose.

MFC after: 1 Week


219240 03-Mar-2011 dchagin

Switch PROCESS_SHARE to AUTO_SHARE (as umtx do). Even for SHARED,
if page mapped MAP_ANON linux uses private algorithm too.

Disscussed with: jhb

MFC after: 3 Days


219132 01-Mar-2011 rwatson

Regenerate system call files following addition of cap_enter(2),
cap_getmode(2), and capabilities.conf.

Reviewed by: anderson
Discussed with: benl, kris, pjd
Obtained from: Capsicum Project
Sponsored by: Google, Inc.
MFC after: 3 months


219129 01-Mar-2011 rwatson

Add initial support for Capsicum's Capability Mode to the FreeBSD kernel,
compiled conditionally on options CAPABILITIES:

Add a new credential flag, CRED_FLAG_CAPMODE, which indicates that a
subject (typically a process) is in capability mode.

Add two new system calls, cap_enter(2) and cap_getmode(2), which allow
setting and querying (but never clearing) the flag.

Export the capability mode flag via process information sysctls.

Sponsored by: Google, Inc.
Reviewed by: anderson
Discussed with: benl, kris, pjd
Obtained from: Capsicum Project
MFC after: 3 months


218985 23-Feb-2011 brucec

Use the cprd_mem field when setting the start and length for a memory
resource - the layout of cprd_port is identical but using cprd_mem
makes the code easier to understand.

PR: kern/118493
Submitted by: Weongyo Jeong <weongyo.jeong at gmail.com>
MFC after: 3 days


218970 23-Feb-2011 jhb

Use umtx_key objects to uniquely identify futexes. Private futexes in
different processes that happen to use the same user address in the
separate processes will now be treated as distinct futexes rather than the
same futex. We can now honor shared futexes properly by mapping them to a
PROCESS_SHARED umtx_key. Private futexes use THREAD_SHARED umtx_key
objects.

In conjunction with: dchagin
Reviewed by: kib
MFC after: 1 week


218909 21-Feb-2011 brucec

Fix typos - remove duplicate "the".

PR: bin/154928
Submitted by: Eitan Adler <lists at eitanadler.com>
MFC after: 3 days


218879 20-Feb-2011 dchagin

Do not clobber %rdx.
Before calling vfork() syscall the linux user-space stores the current PID
in the %rdx and restore it when the parent process will leave the kernel.


218720 15-Feb-2011 dchagin

For realtime signals fill the sigval value.


218719 15-Feb-2011 dchagin

Make a linux_rt_sigtimedwait() system call is actually working.

1) Translate the native signal number in the appropriate Linux signal.
2) Remove bogus code, which can lead to a panic as it calls
kern_sigtimedwait with same ksiginfo.
3) Return the corresponding signal number.


218718 15-Feb-2011 dchagin

Style(9) fix. Wrap long lines in linux_rt_sigtimedwait().


218717 15-Feb-2011 dchagin

Put the macro declaration in the relevant include file for future use.


218686 14-Feb-2011 dchagin

Style(9) fix. Do not initialize variables in the declarations.


218668 13-Feb-2011 dchagin

Sort include files in the alphabetical order.


218655 13-Feb-2011 dchagin

Remove comment about 'ftlk' LOR.


218654 13-Feb-2011 dchagin

Stop printing the LOR, as this is expected behavior.


218646 13-Feb-2011 dchagin

The bitset field of freshly created futex should be initialized explicity.
Otherwise, REQUEUE operations fails.


218621 12-Feb-2011 dchagin

Rename used_requeue and use it as bitwise field to store more flags.
Reimplement used_requeue logic with LINUX_XDEPR_REQUEUEOP flag.


218618 12-Feb-2011 dchagin

Slightly rewrite linux_fork:

1) Remove bogus error checking.
2) A new process exit from kernel through fork_trampoline(),
so remove bogus check.


218617 12-Feb-2011 dchagin

Remove bogus include <machine/frame.h>


218616 12-Feb-2011 dchagin

Move linux_clone(), linux_fork(), linux_vfork() to a MI path.


218497 09-Feb-2011 netchild

Linux' shm_open() fails because it wants to find some funky shmfs
to construct the full pathname. It starts to search at the default
mountpoint which is /dev/shm. If this fails it runs through fstab
and searches for shmfs and tmpfs. Whatever it finds will be
statfs()'ed to be checked for Linux' fs magic for shmfs (0x01021994).

Ideally our tmpfs should deliver this fs magic to Linux processes, but
as our tmpfs is considered to be an experimental feature we can not
assume that there is always a tmpfs available.

To make shared memory work in the Linuxulator, force the fs type of
/dev/shm (which can be a symlink) to match what Linux expects. The user
is responsible (info has to be added to the linux base ports and the docs)
to setup a suitable link for /dev/shm.

Noticed by: Andre Albsmeier <Andre.Albsmeier@siemens.com>
Submitted by: Andre Albsmeier <Andre.Albsmeier@siemens.com>
MFC after: 1 month


218118 31-Jan-2011 dchagin

Yet another unimplemented futex operation, print out about.

Submitted by: arundel
MFC after: 1 month.


218117 31-Jan-2011 dchagin

Implement a futex BITSET op.

Submitted by: arundel
MFC after: 1 month.


218114 31-Jan-2011 bz

Update interface stats counters to match the current format in linux and
try to export as much information as we can match.

Requested on: Debian GNU/kFreeBSD list (debian-bsd lists.debian.org) 2010-12
Tested by: Mats Erik Andersson (mats.andersson gisladisker.se)
MFC after: 10 days


218031 28-Jan-2011 dchagin

Style(9) fixes.

MFC after: 1 Month.


218030 28-Jan-2011 dchagin

Implement a variation of the linux_common_wait() which should
be used by linuxolator itself.

Move linux_wait4() to MD path as it requires native struct
rusage translation to struct l_rusage on linux32/amd64.

MFC after: 1 Month.


218005 28-Jan-2011 dchagin

Style(9) fix.

MFC after: 1 month.


217896 26-Jan-2011 dchagin

Add macro to test the sv_flags of any process. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures.

MFC after: 1 month


217743 23-Jan-2011 dchagin

Style(9) fix.

Approved by: kib(mentor)
MFC after: 1 month


217578 19-Jan-2011 kib

In linuxolator getdents_common(), it seems there is no reason to loop
if no records where returned by VOP_READDIR(). Readdir implementations
allowed to return 0 records when first record is larger then supplied
buffer. In this case trying to execute VOP_READDIR() again causes the
syscall looping forewer.

The goto was there from the day 1, which goes back to 1995 year.

Reported and tested by: Beat G?tzi <beat chruetertee ch>
MFC after: 2 weeks


217566 19-Jan-2011 mdf

Fix a few more SYSCTL_PROC() that were missing a CTLFLAG type specifier.


217151 08-Jan-2011 kib

Create shared (readonly) page. Each ABI may specify the use of page by
setting SV_SHP flag and providing pointer to the vm object and mapping
address. Provide simple allocator to carve space in the page, tailored
to put the code with alignment restrictions.

Enable shared page use for amd64, both native and 32bit FreeBSD
binaries. Page is private mapped at the top of the user address
space, moving a start of the stack one page down. Move signal
trampoline code from the top of the stack to the shared page.

Reviewed by: alc


216813 30-Dec-2010 scf

Fix the LINUX_SOUND_MIXER_INFO ioctl to return success after the
information is set to FreeBSD. It had been falling through to the end
of linux_ioctl_sound() and returning ENOIOCTL. Noticed when running the
Linux ALSA amixer tool.

Add a LINUX_SOUND_MIXER_READ_CAPS ioctl which is used by the Skype
v2.1.0.81 binary.

Reviewed by: gavin
MFC after: 2 weeks


216592 20-Dec-2010 tijl

Merge amd64 and i386 bus.h and move the resulting header to x86. Replace
the original amd64 and i386 headers with stubs.

Rename (AMD64|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere.

Reviewed by: imp (previous version), jhb
Approved by: kib (mentor)


216572 19-Dec-2010 kib

Restore the ABI of struct kinfo_proc32 after r213536.

MFC after: 3 days


216242 06-Dec-2010 bschmidt

Implement NdisGetRoutineAddress and MmGetSystemRoutineAddress used in
newer Ralink drivers.

Submitted by: Paul B Mahol <onemda at gmail.com>


216050 29-Nov-2010 bschmidt

Add a dummy for IoOpenDeviceRegistryKey().

With that change the Atheros 9xxx driver is actually usable and does not
panic anymore.

Submitted by: Paul B Mahol <onemda at gmail.com>
MFC after: 2 weeks


216049 29-Nov-2010 bschmidt

Some drivers rely on the existence of certain keys. The Atheros 9xxx
driver for example requests the NetCfgInstanceId but doesn't check the
returned status code and will happily access random memory instead.

Submitted by: Paul B Mahol <onemda at gmail.com>
MFC after: 2 weeks


215782 23-Nov-2010 bschmidt

Add prototype for InitializeSListHead().


215779 23-Nov-2010 bschmidt

Add a few functions used in newer drivers. Fix RtlCompareMemory() while
here.

Submitted by: Paul B Mahol <onemda@gmail.com>


215747 23-Nov-2010 pluknet

Update MNT_ROOTFS comments after changes in the root mount logic.

Reported by: arundel
Suggested by: marcel (phrasing)
Approved by: kib (mentor)


215741 23-Nov-2010 kib

Add include guards.

MFC after: 3 days


215708 22-Nov-2010 bschmidt

Resurrect amd64 support.
- Many drivers on amd64 are picking system uptime, interrupt time and ticks
via global data structure instead of calling functions for performance
reasons. For now just patch such address so driver will not trigger page
fault when trying to access such data. In future, additional callout may
be added to update data in periodic intervals.
- On amd64 we need to allocate "shadow space" on stack before calling any
function.

Submitted by: Paul B Mahol <onemda at gmail.com>


215707 22-Nov-2010 bschmidt

Prefer pmap_extract() over pmap_kextract() as done in MmIsAddressValid().
According to the comment for MmIsAddressValid() there are issues on PAE
kernels using pmap_kextract().

Submitted by: Paul B Mahol <onemda at gmail.com>


215706 22-Nov-2010 dim

Fix linux kernel module breakage introduced in r215675, by including
<sys/sysent.h>.

Noticed by: many
Pointy hat to: netchild


215679 22-Nov-2010 attilio

Add the ability for GDB to printout the thread name along with other
thread specific informations.

In order to do that, and in order to avoid KBI breakage with existing
infrastructure the following semantic is implemented:
- For live programs, a new member to the PT_LWPINFO is added (pl_tdname)
- For cores, a new ELF note is added (NT_THRMISC) that can be used for
storing thread specific, miscellaneous, informations. Right now it is
just popluated with a thread name.

GDB, then, retrieves the correct informations from the corefile via the
BFD interface, as it groks the ELF notes and create appropriate
pseudo-sections.

Sponsored by: Sandvine Incorporated
Tested by: gianni
Discussed with: dim, kan, kib
MFC after: 2 weeks


215675 22-Nov-2010 netchild

Do not take the process lock. The assignment to u_short inside the
properly aligned structure is atomic on all supported architectures, and
the thread that should see side-effect of assignment is the same thread
that does assignment.

Use a more appropriate conditional to detect the linux ABI.

Suggested by: kib
X-MFC: together with r215664


215666 22-Nov-2010 netchild

Remove trailing dot from the unimplemented futex messages to make
them consistent with the syscall and ipc messages.

Submitted by: arundel
MFC after: 3 days


215664 22-Nov-2010 netchild

By using the 32-bit Linux version of Sun's Java Development Kit 1.6
on FreeBSD (amd64), invocations of "javac" (or "java") eventually
end with the output of "Killed" and exit code 137.

This is caused by:
1. After calling exec() in multithreaded linux program threads are not
destroyed and continue running. They get killed after program being
executed finishes.

2. linux_exit_group doesn't return correct exit code when called not
from group leader. Which happens regularly using sun jvm.

The submitters fix this in a similar way to how NetBSD handles this.

I took the PRs away from dchagin, who seems to be out of touch of
this since a while (no response from him).

The patches committed here are from [2], with some little modifications
from me to the style.

PR: 141439 [1], 144194 [2]
Submitted by: Stefan Schmidt <stefan.schmidt@stadtbuch.de>, gk
Reviewed by: rdivacky (in april 2010)
MFC after: 5 days


215420 17-Nov-2010 bschmidt

Fix a panic on i386 for drivers using MmAllocateContiguousMemory()
and MmAllocateContiguousMemorySpecifyCache().

Those two functions take 64-bit variable(s) for their arguments. On i386
that takes additional 32-bit variable per argument. This is required so
that windrv_wrap() can correctly wrap function that miniport driver calls
with stdcall convention. Similar explanation is provided in subr_ndis.c for
other functions.

Submitted by: Paul B Mahol <onemda at gmail.com>


215419 17-Nov-2010 bschmidt

Use kmem_alloc_contig() to honour the cache_type variable.

Pointed out by: alc


215354 15-Nov-2010 des

Remove no-op assignment.

Submitted by: clang via arundel@
MFC after: 2 weeks


215339 15-Nov-2010 netchild

Some style(9) fixes.

Submitted by: arundel
MFC after: 1 week


215338 15-Nov-2010 netchild

- print out the PID and program name of the program trying to use an
unsupported futex operation
- for those futex operations which are known to be not supported,
print out which futex operation it is
- shortcut the error return of the unsupported FUTEX_CLOCK_REALTIME in
some cases:
FUTEX_CLOCK_REALTIME can be used to tell linux to use
CLOCK_REALTIME instead of CLOCK_MONOTONIC. FUTEX_CLOCK_REALTIME
however must only be set, if either FUTEX_WAIT_BITSET or
FUTEX_WAIT_REQUEUE_PI are set too. If that's not the case
we can die with ENOSYS right at the beginning.

Submitted by: arundel
Reviewed by: rdivacky (earlier iteration of the patch)
MFC after: 1 week


215135 11-Nov-2010 bschmidt

According to specs for MmAllocateContiguousMemorySpecifyCache() physically
contiguous memory with requested restrictions must be allocated.

Submitted by: Paul B Mahol <onemda at gmail.com>


214985 08-Nov-2010 des

Break long line.


214982 08-Nov-2010 des

Fix CPU ID in /proc/cpuinfo.

PR: kern/56451
Submitted by: arundel@
MFC after: 3 weeks


214798 04-Nov-2010 bschmidt

Remove 4.x, 5.x and 6.x compatibility bits.

Submitted by: Paul B Mahol <onemda at gmail.com>


213846 14-Oct-2010 kib

Remove stale comment.

Submitted by: arundel
MFC after: 3 days


213716 12-Oct-2010 kib

Add macro DECLARE_MODULE_TIED to denote a module as requiring the
kernel of exactly the same __FreeBSD_version as the headers module was
compiled against.

Mark our in-tree ABI emulators with DECLARE_MODULE_TIED. The modules
use kernel interfaces that the Release Engineering Team feel are not
stable enough to guarantee they will not change during the life cycle
of a STABLE branch. In particular, the layout of struct sysentvec is
declared to be not part of the STABLE KBI.

Discussed with: bz, rwatson
Approved by: re (bz, kensmith)
MFC after: 2 weeks


213490 06-Oct-2010 jkim

Simplify timeout check in futex_wait() using itimerfix() and return error
if the given timeout is invalid. Consistently use int type for timeout and
correct a format string in futex_sleep().


213471 06-Oct-2010 netchild

Fix a comparision of an uninitialised pointer.

Submitted by: arundel
Found by: clang analysis (automatic service by uqs@)
Reviewed by: rdivacky


213461 05-Oct-2010 thompsa

Use the printf-like capability from kproc_create().

Submitted by: Paul B Mahol


213458 05-Oct-2010 jkim

Prefer pmap_unmapbios() over pmap_unmapdev(). The binary does not change
after this because pmap_unmapbios() is a macro for pmap_unmapdev() on amd64.


213246 28-Sep-2010 kib

In linprocfs_doargv():
- handle compat32 processes;
- remove the checks for copied in addresses to belong into valid
usermode range, proc_rwmem() does this;
- simplify loop reading single string, limit the total amount of strings
collected by ARG_MAX bytes;
- correctly add '\0' at the end of each copied string;
- fix style.

In linprocfs_doprocenviron():
- unlock the process before calling copyin code [1]. The process is held
by pseudofs.

In linprocfs_doproccmdline:
- use linprocfs_doargv() to handle !curproc case for which p_args is not cached.

Reported by: plulnet [1]
Tested by: pluknet
Approved by: des (linprocfs maintainer, previous
version of the patch)
MFC after: 3 weeks


212723 16-Sep-2010 des

Implement proc/$$/environment.

Submitted by: Fernando Apesteguía <fernando.apesteguia@gmail.com>
MFC after: 3 weeks


212425 10-Sep-2010 mdf

Replace sbuf_overflowed() with sbuf_error(), which returns any error
code associated with overflow or with the drain function. While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.


211824 25-Aug-2010 jkim

Add x86bios_set_intr() to set interrupt vectors for real mode and simplify
x86bios_get_intr() a little.


211823 25-Aug-2010 jkim

Check opcode for short jump as well. Some option ROMs do short jumps
(e.g., some NVIDIA video cards) and we were not able to do POST while
resuming because we only honored long jump.

MFC after: 3 days


211412 17-Aug-2010 kib

Supply some useful information to the started image using ELF aux vectors.
In particular, provide pagesize and pagesizes array, the canary value
for SSP use, number of host CPUs and osreldate.

Tested by: marius (sparc64)
MFC after: 1 month


211148 10-Aug-2010 jkim

Place spinlock_enter() and spinlock_exit() just around X86EMU calls.


211131 10-Aug-2010 jkim

Tidy up locking and memory allocation for the real mode emulator wrapper.
Now we use a regular mutex instead of a spin mutex. When we enter and exit
the emulator, spinlock_enter() and spinlock_exit() are additionally used.
Move some page table related stuff from x86bios_init() and x86bios_uninit()
to x86bios_map_mem() and x86bios_unmap_mem().


211120 09-Aug-2010 jkim

Tidy up printf() calls for debugging.


211114 09-Aug-2010 jkim

Initialize a variable just before its use.


211112 09-Aug-2010 jkim

Reduce diffs between VM86 and X86EMU wrappers for x86bios_alloc() and
x86bios_free(). Add strict sanity checks for VM86 wrapper and add strict
page table locking for X86EMU wrapper.


211006 07-Aug-2010 kib

Prefer struct sysentvec sv_psstrings to hardcoding FREEBSD32_PS_STRINGS
in the compat32 code. Use sv_usrstack instead of FREEBSD32_USRSTACK as well.

MFC after: 1 week


211005 07-Aug-2010 kib

Add compat32 definition for (old) struct ostat.

MFC after: 1 week


210993 07-Aug-2010 jkim

Do not block any I/O port on amd64.


210992 07-Aug-2010 jkim

Optimize interrupt vector lookup. There is no need to check the page table.


210938 06-Aug-2010 jkim

Consistently use architecture specific macros.


210934 06-Aug-2010 jkim

Fix allocation of multiple pages, which forgot to increase page number.
Particularly, it caused "vm86_addpage: overlap" panics under VirtualBox.
Add a safety check before freeing memory while I am here.


210887 05-Aug-2010 jkim

Re-add flag register for output. Some BIOS calls actually use it to return
success/failure status. Oops.


210885 05-Aug-2010 jkim

Do not copy stack pointer and flags. These registers are unconditionally
destroyed from vm86_prepcall().


210877 05-Aug-2010 jkim

Implement a simple native VM86 backend for X86BIOS. Now i386 uses native
VM86 calls instead of the real mode emulator as a backend. VM86 has been
proven reliable for very long time and it is actually few times faster than
emulation. Increase maximum number of page table entries per VM86 context
from 3 to 8 pages. It was (ridiculously) low and insufficient for new VM86
backend, which shares one context globally. Slighly rearrange and clean up
the emulator backend to accommodate new code. The only visible change here
is stack size, which is decreased from 64K to 4K bytes to sync. with VM86.
Actually, it seems there is no need for big stack in real mode.

MFC after: 1 month


210848 04-Aug-2010 kib

Copy inode birthtime to the struct stat32.

MFC after: 1 week


210847 04-Aug-2010 kib

Fix style.

MFC after: 1 week


210796 03-Aug-2010 kib

When compat32 recvmsg(2) does not need to copy out control messages, set
msg_controllen to 0.

PR: kern/149227
Submitted by: Stef Walter <stef memberwebs com>
MFC after: 1 weeks


210545 27-Jul-2010 alc

Introduce exec_alloc_args(). The objective being to encapsulate the
details of the string buffer allocation in one place.

Eliminate the portion of the string buffer that was dedicated to storing
the interpreter name. The pointer to the interpreter name can simply be
made to point to the appropriate argument string.

Reviewed by: kib


210498 26-Jul-2010 kib

Revert r210451, and the similar part of the r210431. The forward-declaration
for the enum tag when enum definition is not complete is not allowed by
C99, and is gcc extension.

Requested by: stefanf
MFC after: 28 days


210475 25-Jul-2010 alc

Change the order in which the file name, arguments, environment, and
shell command are stored in exec*()'s demand-paged string buffer. For
a "buildworld" on an 8GB amd64 multiprocessor, the new order reduces
the number of global TLB shootdowns by 31%. It also eliminates about
330k page faults on the kernel address space.

Change exec_shell_imgact() to use "args->begin_argv" consistently as
the start of the argument and environment strings. Previously, it
would sometimes use "args->buf", which is the start of the overall
buffer, but no longer the start of the argument and environment
strings. While I'm here, eliminate unnecessary passing of "&length"
to copystr(), where we don't actually care about the length of the
copied string.

Clean up the initialization of the exec map. In particular, use the
correct size for an entry, and express that size in the same way that
is used when an entry is allocated. The old size was one page too
large. (This discrepancy originated in 2004 when I rewrote
exec_map_first_page() to use sf_buf_alloc() instead of the exec map
for mapping the first page of the executable.)

Reviewed by: kib


210431 23-Jul-2010 kib

Remove the linux_exec_copyin_args(), freebsd32_exec_copyin_args() may
server as well. COMPAT_FREEBSD32 is a prerequisite for COMPAT_LINUX32.

Reviewed by: alc
MFC after: 3 weeks


210429 23-Jul-2010 alc

Eliminate a little bit of duplicated code.


210197 17-Jul-2010 trasz

Remove proc locking, it's not needed after r210132.


210132 15-Jul-2010 trasz

Make svr4(4) version of poll(2) use the same limit of file descriptors as the
usual poll(2) does, instead of checking resource limits.


209687 04-Jul-2010 kib

Constify source argument for siginfo_to_siginfo32().

MFC after: 1 week


209592 29-Jun-2010 jhb

Tweak the in-kernel API for sending signals to threads:
- Rename tdsignal() to tdsendsignal() and make it private to kern_sig.c.
- Add tdsignal() and tdksignal() routines that mirror psignal() and
pksignal() except that they accept a thread as an argument instead of
a process. They send a signal to a specific thread rather than to an
individual process.

Reviewed by: kib


209581 28-Jun-2010 kib

Regenerate


209579 28-Jun-2010 kib

Count number of threads that enter and leave dynamically registered
syscalls. On the dynamic syscall deregistration, wait until all
threads leave the syscall code. This somewhat increases the safety
of the loadable modules unloading.

Reviewed by: jhb
Tested by: pho
MFC after: 1 month


209472 23-Jun-2010 jkim

Let x86bios_alloc() pass contigmalloc(9) flags. Use it to set M_WAITOK
from VESA BIOS initialization. All other malloc(9) uses in the function is
blocking any way.


209102 12-Jun-2010 ed

ANSIfy prototypes in subr_usbd.c.

Clang generates the following warnings when building subr_usbd.c:

| subr_usbd.c:598:13: warning: promoted type 'int' of K&R function
| parameter is not compatible with the parameter type 'uint8_t' (aka
| 'unsigned char') declared in a previous prototype
| subr_usbd.c:627:13: warning: promoted type 'int' of K&R function
| parameter is not compatible with the parameter type 'uint8_t' (aka
| 'unsigned char') declared in a previous prototype
| subr_usbd.c:649:13: warning: promoted type 'int' of K&R function
| parameter is not compatible with the parameter type 'uint8_t' (aka
| 'unsigned char') declared in a previous prototype

Instead of just ANSIfying these three prototypes, do it for the entire
file.

Spotted by: clang


209059 11-Jun-2010 jhb

Update several places that iterate over CPUs to use CPU_FOREACH().


208486 24-May-2010 wkoszek

Bring USB fixes for linux(4).

Intention of this commit is to let us take a full advantage
of libusb(8) ported to Linux. This decreases a possibility of getting
any collisions within ioctl() "command" space, especially with
relation to LINUX_SNDCTL_SEQ... stuff.

Basically, we provide commands, that will be mapped in the kernel
to correct ones and forward those to the USB layer. Port enabling
functionality brought with this patch is here:

http://www.freebsd.org/cgi/query-pr.cgi?pr=146895

Bump __FreeBSD_version to catch, since which version installing a
port makes sense.

This patch should bring no regressions. So far, only i386 is tested.

Tested by: thompsa@
Reviewed by: thompsa@
OKed by: netchild@


208453 23-May-2010 kib

Reorganize syscall entry and leave handling.

Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month


207569 03-May-2010 netchild

- #ifdef out the cliplist part, skype seems like using an uninitialized
variable and can cause problems, without the cliplist handling it works
without problems
- improve the cliplist error handling
- fix VIDIOCGTUNER and VIDIOCSMICROCODE (still no hardware available to test)

Submitted by: J.R. Oldroyd <jr@opal.com>
X-MFC after: soon (together with all the v4l stuff)


207456 01-May-2010 jkim

Reduce MD code further. At least, it compiles on ia64 now (but it is not
connected to build). The idea/code was shamelessly taken from r207329.


207454 01-May-2010 jkim

Do not initialize mutex and return error if it cannot map memory.


207008 21-Apr-2010 kib

Provide compat32 shims for kinfo_proc sysctl. This allows 32bit ps(1) to
mostly work on 64bit host.

The work is based on an original patch submitted by emaste, obtained
from Sandvine's source tree.

Reviewed by: jhb
MFC after: 1 week


207007 21-Apr-2010 kib

Extract the code to copy-out struct rusage32 from struct rusage
into the new function.

Reviewed by: jhb
MFC after: 1 week


206597 14-Apr-2010 emaste

Linux puts a blank line between each CPU.


206136 03-Apr-2010 bz

Add a forward declaration to silence a warning when compiling ia32_genassym.c.

Reviewed by: kib
MFC after: 3 days


206081 02-Apr-2010 netchild

Re-apply r205683 with some modifications:
Fix some bogus values in linprocfs.

Submitted by: Petr Salinger <Petr.Salinger@seznam.cz>
Verified on: GNU/kFreeBSD debian 8.0-1-686 (by submitter)
PR: 144584

Reviewed by / discussed with: kib, des, jhb, submitter


205792 28-Mar-2010 ed

Rename st_*timespec fields to st_*tim for POSIX 2008 compliance.

A nice thing about POSIX 2008 is that it finally standardizes a way to
obtain file access/modification/change times in sub-second precision,
namely using struct timespec, which we already have for a very long
time. Unfortunately POSIX uses different names.

This commit adds compatibility macros, so existing code should still
build properly. Also change all source code in the kernel to work
without any of the compatibility macros. This makes it all a less
ambiguous.

I am also renaming st_birthtime to st_birthtim, even though it was a
local extension anyway. It seems Cygwin also has a st_birthtim.


205695 26-Mar-2010 netchild

Revert r205683 to resolve some code quality issues which do not affect the
build or use of linprocfs, before committing the reworked patch again.

Requested by: des


205683 26-Mar-2010 netchild

Fix some bogus values in linprocfs.

Submitted by: Petr Salinger <Petr.Salinger@seznam.cz>
Verified on: GNU/kFreeBSD debian 8.0-1-686 (by submitter)
PR: 144584


205678 26-Mar-2010 netchild

Fix some problems which may lead to a panic:
- right order of src and dst in memcpy
- NULL out the clips after freeing to prevent an accident

Noticed by: hselasky


205650 25-Mar-2010 jkim

Revert accidentally committed initial real mode %sp change of r205347.
Note I am keeping %ds change because X.org int10 handler does it and
it seems reasonable.


205649 25-Mar-2010 jkim

Optimize real mode page table lookup.


205647 25-Mar-2010 jkim

Fix stupid typos. Some VESA BIOSes directly call BIOS interrupt handlers
within the VBE interrupt handler. Unfortunately it was causing real mode
page faults because we were fetching instructions from bogus addresses.
Pass me the pointyhat, please.

PR: kern/144654
MFC after: 3 days


205642 25-Mar-2010 nwhitehorn

Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by: jhb


205592 24-Mar-2010 jhb

Add missing Giant locking for the vfsconf list.

Submitted by: kib


205541 23-Mar-2010 jhb

Implement /proc/filesystems.

Submitted by: Fernando Apesteguia fernando.apesteguia (gmail)


205455 22-Mar-2010 jkim

Support memory wraparound instead of high memory as VM86 mode does.

Suggested by: delphij


205452 22-Mar-2010 jkim

Fix i386 PAE kernel build.

Reported by: tinderbox


205423 21-Mar-2010 ed

Actually make O_DIRECTORY work.

According to POSIX open() must return ENOTDIR when the path name does
not refer to a path name. Change vn_open() to respect this flag. This
also simplifies the Linuxolator a bit.


205347 19-Mar-2010 jkim

- Map EBDA if available and add 64KB above 1MB (high memory), just in case.
- Print the initial memory map when bootverbose is set.
- Change the page fault address format from linear to %cs:%ip style.
- Move duplicate code into a newly added function.
- Add strictly aligned memory access for distant future. ;-)


205328 19-Mar-2010 kib

Regen


205327 19-Mar-2010 kib

Remove empty line.

MFC after: 2 weeks


205325 19-Mar-2010 kib

Implement compat32 shims for mqueuefs.

Reviewed by: jhb
MFC after: 2 weeks


205324 19-Mar-2010 kib

Implement compat32 shims for ksem syscalls.

Reviewed by: jhb
MFC after: 2 weeks


205323 19-Mar-2010 kib

Move SysV IPC freebsd32 compat shims from freebsd32_misc.c to corresponding
sysv_{msg,sem,shm}.c files.

Mark SysV IPC freebsd32 syscalls as NOSTD and add required
SYSCALL_INIT_HELPER/SYSCALL32_INIT_HELPERs to provide auto
register/unregister on module load.

This makes COMPAT_FREEBSD32 functional with SysV IPC compiled and loaded
as modules.

Reviewed by: jhb
MFC after: 2 weeks


205322 19-Mar-2010 kib

Move SysV IPC freebsd32 compat shims helpers from freebsd32_misc.c to
sysv_ipc.c.

Reviewed by: jhb
MFC after: 2 weeks


205321 19-Mar-2010 kib

Introduce SYSCALL_INIT_HELPER and SYSCALL32_INIT_HELPER macros and
neccessary support functions to allow registering dynamically loaded
syscalls from the MOD_LOAD handlers. Helpers handle registration
failures semi-automatically.

Reviewed by: jhb
MFC after: 2 weeks


205320 19-Mar-2010 kib

FOr SYSCALL_MODULE_HELPER, use "sys/<syscallname>" module name.
FOr SYSCALL32_MODULE_HELPER, use "sys32/<syscallname>" module name.
This avoids modules name conflict when compat32 syscall does not
need shims.

Note that SYSCALL_MODULE_HELPER is going to be unused in the tree by
several next commits.

Suggested by: jhb
MFC after: 2 weeks


205319 19-Mar-2010 kib

Make freebsd32_copyiniov() available outside of freebsd32_misc.

MFC after: 2 weeks


205297 18-Mar-2010 jkim

Detect illegal access to unmapped memory within real mode emulator to aid
debugging. Update copyright date while I am here.


205016 11-Mar-2010 nwhitehorn

Regen after big endian compatibility import.


205014 11-Mar-2010 nwhitehorn

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by: kib, jhb


204825 07-Mar-2010 ed

Make /proc/self/fd `work'.

On Linux, /proc/<pid>/fd is comparable to fdescfs, where it allows you
to inspect the file descriptors used by each process. Glibc's ttyname()
works by performing a readlink() on these nodes, since all nodes in this
directory are symlinks.

It is a bit hard to implement this in linprocfs right now, so I am not
going to bother. Add a way to make ttyname(3) work, by adding a
/proc/<pid>/fd symlink, which points to /dev/fd only if the calling
process matches. When fdescfs is mounted, this will cause the
readlink() in ttyname() to fail, causing it to fall back on manually
finding a matching node in /dev.

Discussed on: emulation@


204523 01-Mar-2010 joel

The NetBSD Foundation has granted permission to remove clause 3 and 4 from
their software.

Obtained from: NetBSD


204068 18-Feb-2010 pjd

No need to include security/mac/mac_framework.h here.


203728 09-Feb-2010 delphij

- Return EAFNOSUPPORT instead of EINVAL for unsupported address family,
this matches the Linux behavior.
- Check if we have sufficient space allocated for socket structure, which
fixes a buffer overflow when wrong length is being passed into the
emulation layer. [1]

PR: kern/138860
Submitted by: Mateusz Guzik <mjguzik gmail com>
Reported by: Alexander Best [1]
MFC after: 2 weeks


203660 08-Feb-2010 ed

Remove unused LIBCOMPAT keyword from syscalls.master.


202598 18-Jan-2010 wkoszek

Let us to use our libusb(3) in Linuxolator.

With this change, Linux binaries can work with our libusb(3) when
it's compiled against our header files on GNU/Linux system -- this
solves the problem with differences between /dev layouts.

With ported libusb(3), I am able to use my USB JTAG cable with Linux
binaries that support it.

Reviewed by: thompsa


202376 15-Jan-2010 netchild

Whitespace change to be able to provide the correct commit log for r202364:
---snip---
Add video clipping support but with the caveats below.

Background info:

Video clipping allows the user to provide either a series of clip rectangles
or a clip bitmap to the driver and have the driver mask the video according
to the clipping specs provided.

Adding support for clipping to the FreeBSD Linux emulator is problematic
because it seems that this feature is not supported by many drivers and
therefore it is ignored by many applications. Unfortunately, when not
using it, rather than passing in a null clipping list, some apps leave the
clipping fields uninitialized, casuing random values to be passed in. In
the case where the driver does not use the clipping info, this is not a
problem (although it is bad form). But the Linux emulator does not know
which drivers will use this and which won't, so the Linux emulator must
try to handle this clip list, and deal gracefully with cases where the
values seem to be uninitialized.

Video clipping info is passed in using the VIDIOCSWIN ioctl in two fields
in the video_window structure: the integer clipcount and the pointer clips.

How the linuxulator handles this from this commit on:

* if (clipcount == VIDEO_CLIP_BITMAP)
The clips variable is a void * pointer to a 128*625 byte
(1024*625 bit) memory area containing a bitmap of the clipping area.
The pointer in the video_window structure is copied, but no
video_clip structures are copied.
* if (clipcount > 0 && clipcount <= 16384)
The clips variable is pointer to a list of video_clip structures. Up
to clipcount structures are copied and passed to the driver.
The upper limit of 16384 was imposed here so that user code that does
not properly initialize clipcount falls through below and no attempt
is made to copy an uninitialized list. This value was found by
examining Linux drivers that support the clip list.
* else
The clipcount is either negative (but not VIDEO_CLIP_BITMAP), zero or
positive (> 16384).
All these cases are treated as invalid data. Both the clipcount field
and clips pointer are forced to zero/NULL and passed to the driver.

It should be noted that, at the time of developing this V4L emulator code,
the pwc(4) V4L driver does not support clipping.

Submitted by: J.R. Oldroyd <fbsd@opal.com>
MFC after: 1 month
---snip---


202364 15-Jan-2010 netchild

This is v4l support for the linuxulator. This allows to access FreeBSD
native devices which support the v4l API from processes running within
the linuxulator, e.g. skype or flash can access the multimedia/pwcbsd driver.

Not tested is firmware upload, framebuffer stuff and video tuner stuff
due to lack of hardware.
The clipping part (VIDIOCSWIN) needs a little bit of further work (partly
in progress, but can not be tested due to lack of a suitable device).

The submitter tested this sucessfully with Skype and flash apps on amd64 and
i386 with the multimedia/pwcbsd driver.

Submitted by: J.R. Oldroyd <fbsd@opal.com>


202341 15-Jan-2010 brooks

Since all other comparisons involving ngroups_max use
"ngroups_max + 1", use ">= ngroups_max+1" instead of the equivalent
"> ngroups_max" to reduce confusion.


202143 12-Jan-2010 brooks

Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic
kern.ngroups+1. kern.ngroups can range from NGROUPS_MAX=1023 to
INT_MAX-1. Given that the Windows group limit is 1024, this range
should be sufficient for most applications.

MFC after: 1 month


202113 11-Jan-2010 mckusick

Background:

When renaming a directory it passes through several intermediate
states. First its new name will be created causing it to have two
names (from possibly different parents). Next, if it has different
parents, its value of ".." will be changed from pointing to the old
parent to pointing to the new parent. Concurrently, its old name
will be removed bringing it back into a consistent state. When fsck
encounters an extra name for a directory, it offers to remove the
"extraneous hard link"; when it finds that the names have been
changed but the update to ".." has not happened, it offers to rewrite
".." to point at the correct parent. Both of these changes were
considered unexpected so would cause fsck in preen mode or fsck in
background mode to fail with the need to run fsck manually to fix
these problems. Fsck running in preen mode or background mode now
corrects these expected inconsistencies that arise during directory
rename. The functionality added with this update is used by fsck
running in background mode to make these fixes.

Solution:

This update adds three new fsck sysctl commands to support background
fsck in correcting expected inconsistencies that arise from incomplete
directory rename operations. They are:

setcwd(dirinode) - set the current directory to dirinode in the
filesystem associated with the snapshot.
setdotdot(oldvalue, newvalue) - Verify that the inode number for ".."
in the current directory is oldvalue then change it to newvalue.
unlink(nameptr, oldvalue) - Verify that the inode number associated
with nameptr in the current directory is oldvalue then unlink it.

As with all other fsck sysctls, these new ones may only be used by
processes with appropriate priviledge.

Reported by: jeff
Security issues: rwatson


201758 07-Jan-2010 mbr

Remove extraneous semicolons, no functional changes.

Submitted by: Marc Balmer <marc@msys.ch>
MFC after: 1 week


200667 18-Dec-2009 kib

Signal 0 is used to check the permission for current process to signal
target one. Since r184058, linux_do_tkill() calls tdsignal() instead of
kill(), without checking for validity of supplied signal number. Prevent
panic when supplied signal is 0 by finishing work after checks.

Found and tested by: scf
MFC after: 3 days


200619 16-Dec-2009 imp

Revert 200606.


200606 16-Dec-2009 imp

Fix compiling FREEBSD_COMPAT[4,5,6] without FREEBSD_COMPAT7.

Note: Not sure this is the right way to do compat, but it makes the
headers consistent with the implementations.


200591 15-Dec-2009 jkim

Add two new debugging tunables for x86bios instead of abusing bootverbose,
i.e., debug.x86bios.call and debug.x86bios.int.


200112 04-Dec-2009 kib

Regenerate.


200111 04-Dec-2009 kib

Add several syscall compat32 entries for acl manipulation.
They do not require translation of the arguments.

Tested by: bsam
MFC after: 1 week


200110 04-Dec-2009 netchild

This is v4l support for the linuxulator. This allows to access FreeBSD
native devices which support the v4l API from processes running within
the linuxulator, e.g. skype or flash can access the multimedia/pwcbsd driver.

Not tested is firmware upload, framebuffer stuff and video tuner stuff
due to lack of hardware.
The clipping part (VIDIOCSWIN) needs a little bit of further work (partly
in progress, but can not be tested due to lack of a suitable device).

The submitter tested this sucessfully with Skype and flash apps on amd64 and
i386 with the multimedia/pwcbsd driver.

Submitted by: J.R. Oldroyd <fbsd@opal.com>


200109 04-Dec-2009 netchild

Import the unchanged v4l videodev.h from the vendor branch.


199882 28-Nov-2009 ed

Include <sys/tty.h> instead of <sys/termios.h>.

Right now <sys/termios.h> includes <sys/ttycom.h>, which provides the
TTY ioctls to the svr4 code. We need both struct termios and the ioctls,
so include <sys/tty.h> for now.


198945 05-Nov-2009 netchild

Fix typo in kernel message. The fix is based upon the patch in the PR.

PR: kern/140279
Submitted by: Alexander Best <alexbestms@math.uni-muenster.de>
MFC after: 1 week


198819 02-Nov-2009 rpaulo

Revert a functional change that snuck in.


198816 02-Nov-2009 rpaulo

Fix a non-style change that snuck in.

Spotted by: danfe


198786 02-Nov-2009 rpaulo

Big style cleanup. While there remove references to FreeBSD versions
older than 6.0.

Submitted by: Paul B Mahol <onemda at gmail.com>


198512 27-Oct-2009 kib

Regenerate


198508 27-Oct-2009 kib

Current pselect(3) is implemented in usermode and thus vulnerable to
well-known race condition, which elimination was the reason for the
function appearance in first place. If sigmask supplied as argument to
pselect() enables a signal, the signal might be delivered before thread
called select(2), causing lost wakeup. Reimplement pselect() in kernel,
making change of sigmask and sleep atomic.

Since signal shall be delivered to the usermode, but sigmask restored,
set TDP_OLDMASK and save old mask in td_oldsigmask. The TDP_OLDMASK
should be cleared by ast() in case signal was not gelivered during
syscall execution.

Reviewed by: davidxu
Tested by: pho
MFC after: 1 month


198507 27-Oct-2009 kib

In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by: davidxu
Tested by: pho
MFC after: 1 month


198506 27-Oct-2009 kib

In kern_sigsuspend(), better manipulate thread signal mask using
kern_sigprocmask() to properly notify other possible candidate threads
for signal delivery.

Since sigsuspend() shall only return to usermode after a signal was
delivered, do cursig/postsig loop immediately after waiting for
signal, repeating the wait if wakeup was spurious due to race with
other thread fetching signal from the process queue before us. Add
thread_suspend_check() call to allow the thread to be stopped or killed
while in loop.

Modify last argument of kern_sigprocmask() from boolean to flags,
allowing the function to be called with locked proc. Convertion of the
callers that supplied 1 to the old argument will be done in the next
commit, and due to SIGPROCMASK_OLD value equial to 1, code is formally
correct in between.

Reviewed by: davidxu
Tested by: pho
MFC after: 1 month


198467 25-Oct-2009 bz

Unconditionally call the setsockopt for IPV6_V6ONLY for v6 linux sockets
no matter whether we are compiled as module or if our default of the
net.inet6.ip6.v6only sysctl already matches what we would set.

This avoids unnecessary complications with modules, VIMAGES, INET6 and
the sysctl value, especially considering that most users will use
linux compat as a module.

Discussed with: kib, rwatson (weeks ago)
Reviewed by: rwatson
MFC after: 6 weeks


198252 19-Oct-2009 jkim

Fix a copy-and-pasto in the previous commit.


198251 19-Oct-2009 jkim

Rewrite x86bios and update its dependent drivers.

- Do not map entire real mode memory (1MB). Instead, we map IVT/BDA and
ROM area separately. Most notably, ROM area is mapped as device memory
(uncacheable) as it should be. User memory is dynamically allocated and
free'ed with contigmalloc(9) and contigfree(9). Remove now redundant and
potentially dangerous x86bios_alloc.c. If this emulator ever grows to
support non-PC hardware, we may implement it with rman(9) later.
- Move all host-specific initializations from x86emu_util.c to x86bios.c and
remove now unnecessary x86emu_util.c. Currently, non-PC hardware is not
supported. We may use bus_space(9) later when the KPI is fixed.
- Replace all bzero() calls for emulated registers with more obviously named
x86bios_init_regs(). This function also initializes DS and SS properly.
- Add x86bios_get_intr(). This function checks if the interrupt vector is
available for the platform. It is not necessary for PC-compatible hardware
but it may be needed later. ;-)
- Do not try turning off monitor if DPMS does not support the state.
- Allocate stable memory for VESA OEM strings instead of just holding
pointers to them. They may or may not be accessible always. Fix a memory
leak of video mode table while I am here.
- Add (experimental) BIOS POST call for vesa(4). This function calls VGA
BIOS POST code from the current VGA option ROM. Some video controllers
cannot save and restore the state properly even if it is claimed to be
supported. Usually the symptom is blank display after resuming from suspend
state. If the video mode does not match the previous mode after restoring,
we try BIOS POST and force the known good initial state. Some magic was
taken from NetBSD (and it was taken from vbetool, I believe.)
- Add a loader tunable for vgapci(4) to give a hint to dpms(4) and vesa(4)
to identify who owns the VESA BIOS. This is very useful for multi-display
adapter setup. By default, the POST video controller is automatically
probed and the tunable "hw.pci.default_vgapci_unit" is set to corresponding
vgapci unit number. You may override it from loader but it is very unlikely
to be necessary. Unfortunately only AGP/PCI/PCI-E controllers can be
matched because ISA controller does not have necessary device IDs.
- Fix a long standing bug in state save/restore function. The state buffer
pointer should be ES:BX, not ES:DI according to VBE 3.0. If it ever worked,
that's because BX was always zero. :-)
- Clean up register initializations more clearer per VBE 3.0.
- Fix a lot of style issues with vesa(4).


197729 03-Oct-2009 bz

Make sure that the primary native brandinfo always gets added
first and the native ia32 compat as middle (before other things).
o(ld)brandinfo as well as third party like linux, kfreebsd, etc.
stays on SI_ORDER_ANY coming last.

The reason for this is only to make sure that even in case we would
overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo
would still be there and the system would be operational.

Reviewed by: kib
MFC after: 1 month


197637 30-Sep-2009 rwatson

Regenerate system call files following r197636.


197636 30-Sep-2009 rwatson

Reserve system call numbers for Capsicum security framework capabilities,
capability mode, and process descriptors: cap_new, cap_getrights, cap_enter,
cap_getmode, pdfork, pdkill, pdgetpid, and pdwait.

Obtained from: TrustedBSD Project
Sponsored by: Google
MFC after: 3 weeks


197571 28-Sep-2009 delphij

Use a 2 clause BSD-style license instead of stating the code as public
domain, as requested by core@ and reviewed by the author.


197493 25-Sep-2009 jkim

- Reduce BIOS memory mapping. We want 1MB of physical memory, not 12MB[1].
- Remove CS and IP registers from x86bios.h. They have no use for us.
- Adjust register dump to make it little bit more useful for debugging.

Submitted by: paradox (ddkprog yahoo com)[1] (initial version)


197475 24-Sep-2009 jkim

Dump real mode registers under bootverbose to help debugging BIOS emulator.


197466 24-Sep-2009 jkim

- Use FreeBSD function naming convention.
- Change x86biosCall() to more appropriate x86bios_intr().[1]

Discussed with: delphij, paradox (ddkprog yahoo com)
Submitted by: paradox (ddkprog yahoo com)[1]


197444 23-Sep-2009 jkim

Move sys/dev/x86bios to sys/compat/x86bios.

It may not be optimal but it is clearly better than the old place.

OK'ed by: delphij, paradox (ddkprog yahoo com)


197176 13-Sep-2009 zec

Lock the ifnet list while iterating over it.

Submitted by: julian
MFC after: 3 days


197064 10-Sep-2009 des

As jhb@ pointed out to me, r197057 was incorrect, not least because these
are generated files.


197057 10-Sep-2009 des

If a certain feature that was present in FreeBSD 7 was removed or changed in
FreeBSD 8, the compatibility shims should be built not just when FreeBSD 7
compatibility is requested, but also when compatibility with any older
FreeBSD version where that feature was present is requested.o

Without this patch, a kernel config that sets COMPAT_FREEBSD6 but not *7
would fail to build due to inconsistencies between the declaration of the
compatibility shims and their use in the SysV code.

There are similar errors in other *proto.h headers in the tree.

MFC after: 3 weeks


197049 09-Sep-2009 kib

kern_select(9) copies fd_set in and out of userspace in quantities of
longs. Since 32bit processes longs are 4 bytes, 64bit kernel may copy in
or out 4 bytes more then the process expected.

Calculate the amount of bytes to copy taking into account size of fd_set
for the current process ABI.

Diagnosed and tested by: Peter Jeremy <peterjeremy acm org>
Reviewed by: jhb
MFC after: 1 week


196653 30-Aug-2009 bz

Make sure FreeBSD binaries without .note.ABI-tag section work
correctly and do not match a colliding Debian GNU/kFreeBSD
brandinfo statements.
For this mark the Debian GNU/kFreeBSD brandinfo that it must have
an .note.ABI-tag section and ignore the old EI_OSABI brandinfo
when comparing a possibly colliding set of options.

Due to SYSINIT we add the brandinfo in a non-deterministic order,
so native FreeBSD is not always first. We may want to consider
to force native FreeBSD to come first as well.

The only way a problem could currently be noticed is when running an
i386 binary without the .note.ABI-tag on amd64 and the Debian GNU/kFreeBSD
brandinfo was matched first, as the fallback to ld-elf32.so.1 does
not exist in that case.

Reported and tested by: ticso
In collaboration with: kib
MFC after: 3 days


196635 28-Aug-2009 zec

Fix a few panics in linuxulator + VIMAGE due to curvnet not being set.

This change affects only options VIMAGE builds.

Reviewed by: julian
MFC after: 3 days


196512 24-Aug-2009 bz

Fix handling of .note.ABI-tag section for GNU systems [1].
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.

Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.

Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].

In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].

These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).

PR: kern/135468
Submitted by: dchagin [1] (initial patch)
Suggested by: kib [2]
Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by: kib
MFC after: 3 days


196481 23-Aug-2009 rwatson

Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stablize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by: bz, julian
MFC after: 3 days


196019 01-Aug-2009 rwatson

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


195911 27-Jul-2009 jhb

Fix the freebsd32 versions of semsys(), shmsys(), and msgsys() to use the
old ABI versions of the relevant control system call (e.g.
freebsd7_freebsd32_msgctl() instead of freebsd32_msgctl() for msgsys()).

Approved by: re (kib)


195870 25-Jul-2009 jamie

Some jail parameters (in particular, "ip4" and "ip6" for IP address
restrictions) were found to be inadequately described by a boolean.
Define a new parameter type with three values (disable, new, inherit)
to handle these and future cases.

Approved by: re (kib), bz (mentor)
Discussed with: rwatson


195699 14-Jul-2009 rwatson

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


195469 08-Jul-2009 trasz

Regen the freebsd32 parts.

Approved by: re (kib)


195468 08-Jul-2009 trasz

Fix freebsd32 version of lpathconf(2).

Approved by: re (kib)


195458 08-Jul-2009 trasz

There is an optimization in chmod(1), that makes it not to call chmod(2)
if the new file mode is the same as it was before; however, this
optimization must be disabled for filesystems that support NFSv4 ACLs.
Chmod uses pathconf(2) to determine whether this is the case - however,
pathconf(2) always follows symbolic links, while the 'chmod -h' doesn't.

This change adds lpathconf(3) to make it possible to solve that problem
in a clean way.

Reviewed by: rwatson (earlier version)
Approved by: re (kib)


195104 27-Jun-2009 rwatson

Replace AUDIT_ARG() with variable argument macros with a set more more
specific macros for each audit argument type. This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by: brooks
Approved by: re (kib)
Obtained from: TrustedBSD Project
MFC after: 1 week


195031 26-Jun-2009 weongyo

provides a extra write buffer when the NDIS driver want to send a
request whose body has some datas through the default pipe.

Tested by: Nikos Vassiliadis <nvass9573 at gmx.com>


194919 24-Jun-2009 jhb

Regen.


194910 24-Jun-2009 jhb

Change the ABI of some of the structures used by the SYSV IPC API:
- The uid/cuid members of struct ipc_perm are now uid_t instead of unsigned
short.
- The gid/cgid members of struct ipc_perm are now gid_t instead of unsigned
short.
- The mode member of struct ipc_perm is now mode_t instead of unsigned short
(this is merely a style bug).
- The rather dubious padding fields for ABI compat with SV/I386 have been
removed from struct msqid_ds and struct semid_ds.
- The shm_segsz member of struct shmid_ds is now a size_t instead of an
int. This removes the need for the shm_bsegsz member in struct
shmid_kernel and should allow for complete support of SYSV SHM regions
>= 2GB.
- The shm_nattch member of struct shmid_ds is now an int instead of a
short.
- The shm_internal member of struct shmid_ds is now gone. The internal
VM object pointer for SHM regions has been moved into struct
shmid_kernel.
- The existing __semctl(), msgctl(), and shmctl() system call entries are
now marked COMPAT7 and new versions of those system calls which support
the new ABI are now present.
- The new system calls are assigned to the FBSD-1.1 version in libc. The
FBSD-1.0 symbols in libc now refer to the old COMPAT7 system calls.
- A simplistic framework for tagging system calls with compatibility
symbol versions has been added to libc. Version tags are added to
system calls by adding an appropriate __sym_compat() entry to
src/lib/libc/incldue/compat.h. [1]

PR: kern/16195 kern/113218 bin/129855
Reviewed by: arch@, rwatson
Discussed with: kan, kib [1]


194833 24-Jun-2009 jhb

Add a new COMPAT7 flag for FreeBSD 7.x compatibility system calls.


194739 23-Jun-2009 bz

After cleaning up rt_tables from vnet.h and cleaning up opt_route.h
a lot of files no longer need route.h either. Garbage collect them.
While here remove now unneeded vnet.h #includes as well.


194682 23-Jun-2009 thompsa

Fix a typeo in the frame len function to unbreak the build, make it shorter
while I am here.


194677 23-Jun-2009 thompsa

- Make struct usb_xfer opaque so that drivers can not access the internals
- Reduce the number of headers needed for a usb driver, the common case is just usb.h and usbdi.h


194647 22-Jun-2009 jhb

Regen.


194645 22-Jun-2009 jhb

Fix a typo in a comment.


194498 19-Jun-2009 brooks

Rework the credential code to support larger values of NGROUPS and
NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024
and 1023 respectively. (Previously they were equal, but under a close
reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it
is the number of supplemental groups, not total number of groups.)

The bulk of the change consists of converting the struct ucred member
cr_groups from a static array to a pointer. Do the equivalent in
kinfo_proc.

Introduce new interfaces crcopysafe() and crsetgroups() for duplicating
a process credential before modifying it and for setting group lists
respectively. Both interfaces take care for the details of allocating
groups array. crsetgroups() takes care of truncating the group list
to the current maximum (NGROUPS) if necessary. In the future,
crsetgroups() may be responsible for insuring invariants such as sorting
the supplemental groups to allow groupmember() to be implemented as a
binary search.

Because we can not change struct xucred without breaking application
ABIs, we leave it alone and introduce a new XU_NGROUPS value which is
always 16 and is to be used or NGRPS as appropriate for things such as
NFS which need to use no more than 16 groups. When feasible, truncate
the group list rather than generating an error.

Minor changes:
- Reduce the number of hand rolled versions of groupmember().
- Do not assign to both cr_gid and cr_groups[0].
- Modify ipfw to cache ucreds instead of part of their contents since
they are immutable once referenced by more than one entity.

Submitted by: Isilon Systems (initial implementation)
X-MFC after: never
PR: bin/113398 kern/133867


194392 17-Jun-2009 jhb

Regen.


194390 17-Jun-2009 jhb

- Add the ability to mix multiple flags seperated by pipe ('|') characters
in the type field of system call tables. Specifically, one can now use
the 'NO*' types as flags in addition to the 'COMPAT*' types. For example,
to tag 'COMPAT*' system calls as living in a KLD via NOSTD. The COMPAT*
type is required to be listed first in this case.
- Add new functions 'type()' and 'flag()' to the embedded awk script in
makesyscalls.sh that return true if a requested flag is found in the
type field ($3). The flag() function checks all of the flags in the
field, but type() only checks the first flag. type() is meant to be
used in the top-level "switch" statement and flag() should be used
otherwise.
- Retire the CPT_NOA type, it is now replaced with "COMPAT|NOARGS" using
the flags approach.
- Tweak the comment descriptions of COMPAT[46] system calls so that they
say "freebsd[46] foo" rather than "old foo".
- Document the COMPAT6 type.
- Sync comments in compat32 syscall table with the master table.


194368 17-Jun-2009 bz

Add explicit includes for jail.h to the files that need them and
remove the "hidden" one from vimage.h.


194263 15-Jun-2009 jhb

Regen.


194262 15-Jun-2009 jhb

Add a new 'void closefrom(int lowfd)' system call. When called, it closes
any open file descriptors >= 'lowfd'. It is largely identical to the same
function on other operating systems such as Solaris, DFly, NetBSD, and
OpenBSD. One difference from other *BSD is that this closefrom() does not
fail with any errors. In practice, while the manpages for NetBSD and
OpenBSD claim that they return EINTR, they ignore internal errors from
close() and never return EINTR. DFly does return EINTR, but for the common
use case (closing fd's prior to execve()), the caller really wants all
fd's closed and returning EINTR just forces callers to call closefrom() in
a loop until it stops failing.

Note that this implementation of closefrom(2) does not make any effort to
resolve userland races with open(2) in other threads. As such, it is not
multithread safe.

Submitted by: rwatson (initial version)
Reviewed by: rwatson
MFC after: 2 weeks


194252 15-Jun-2009 jamie

Get vnets from creds instead of threads where they're available, and from
passed threads instead of curthread.

Reviewed by: zec, julian
Approved by: bz (mentor)


194228 15-Jun-2009 thompsa

s/usb2_/usb_|usbd_/ on all function names for the USB stack.


194203 14-Jun-2009 dchagin

Unlock process lock when return error from getrobustlist call.

Tested by: Alexander Best <alexbestms at math uni-muenster de>
Approved by: kib (mentor)
MFC after: 3 days


194090 13-Jun-2009 jamie

Add counterparts to getcredhostname:
getcreddomainname, getcredhostuuid, getcredhostid

Suggested by: rmacklem
Approved by: bz


193917 10-Jun-2009 kib

Regenerate


193916 10-Jun-2009 kib

Add several syscall compat32 entries for extattr manipulation syscalls,
that do not require translation of the arguments.

Requested by: kientzle
Reviewed by: jhb (previous wrong version)
MFC after: 1 week


193744 08-Jun-2009 bz

After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.


193644 07-Jun-2009 thompsa

Rename usb pipes to endpoints as it better represents what they are, and struct
usb_pipe may be used for a different purpose later on.


193511 05-Jun-2009 rwatson

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


193265 01-Jun-2009 dchagin

Add forgotten in previous commit flags argument.

Approved by: kib (mentor)
MFC after: 1 month


193264 01-Jun-2009 dchagin

Implement accept4 syscall.

Approved by: kib (mentor)
MFC after: 1 month


193263 01-Jun-2009 dchagin

Implement a variation of the accept_common() which takes
a flags argument.

Do not preserve td_retval before kern_fcntl(F_SETFL) as it does not
changed.

Approved by: kib (mentor)
MFC after: 1 month


193262 01-Jun-2009 dchagin

Split linux_accept() syscall onto linux_accept_common() which should
be used by linuxulator and linux_accept() itself.

Approved by: kib (mentor)
MFC after: 1 month


193235 01-Jun-2009 rwatson

Regenerate generated syscall files following changes to struct sysent in
r193234.


193168 31-May-2009 dchagin

Implement a variation of the socketpair() syscall which takes a flags
in addition to the type argument.

Approved by: kib (mentor)
MFC after: 1 month


193165 31-May-2009 dchagin

Move new socket flags handling into a separate function as Linux
introduced more syscalls which uses these flags.

Approved by: kib (mentor)
MFC after: 1 month


193164 31-May-2009 dchagin

Remove empty lines.

Approved by: kib (mentor)
MFC after: 1 month


193084 30-May-2009 delphij

Attempt to fix build by updating hostid to follow the new world order.


193066 29-May-2009 jamie

Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex. Jails may
have their own host information, or they may inherit it from the
parent/system. The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL. The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by: bz (mentor)


193045 29-May-2009 thompsa

s/usb2_/usb_/ on all typedefs for the USB stack.


193017 29-May-2009 delphij

Implement SI_ISALIST.

PR: kern/91293
Submitted by: "Pedro f. Giffuni" <giffunip asme org>
Obtained from: NetBSD


193016 29-May-2009 delphij

Fix the sysinfo(SI_HW_SERIAL, emulation so that we actually get the
hostid of the machine rather than always getting "0".

PR: kern/91293
Submitted by: "Pedro f. Giffuni" <giffunip asme org>
Obtained from: NetBSD


193015 29-May-2009 delphij

copyinstr(9) takes parameter 'len' as a size_t *, not int *.

PR: kern/91293
Submitted by: "Pedro f. Giffuni" <giffunip asme org>
Obtained from: NetBSD


193014 29-May-2009 delphij

de-register.

Submitted by: "Pedro f. Giffuni" <giffunip asme org>
Obtained from: NetBSD
PR: kern/91293


193013 29-May-2009 delphij

svr4_sys_getdents64() should not assume that the cookie would exist
everywhere.

PR: kern/91293
Submitted by: "Pedro f. Giffuni" <giffunip asme org>
Obtained from: NetBSD


193012 29-May-2009 delphij

Add new sysconfig bits, Fix the bogus numbering of the old bits.

Submitted by: "Pedro f. Giffuni" <giffunip asme org>
Obtained from: NetBSD
PR: kern/91293


192994 28-May-2009 delphij

Use strlcpy().


192984 28-May-2009 thompsa

s/usb2_/usb_/ on all C structs for the USB stack.


192899 27-May-2009 avg

linux_ioctl_cdrom: reduce stack usage

... by moving two ~2KB structures from stack to heap allocation.
I experienced stack overflow in linux emulation on i386 (8K stack)
when LINUX_DVD_READ_STRUCT ioctl was performed on atapicam cd
device and there was an error that resulted in additional quite
heavy stack use in cam layer.

Reviewed by: dchagin
Approved by: jhb (mentor)


192895 27-May-2009 jamie

Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails. Child jails may be restricted more than their parents,
but never less. Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system. Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings. The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by: bz (mentor)


192692 24-May-2009 antoine

Remove an unused variable.


192459 20-May-2009 jhb

Comment nits.


192455 20-May-2009 jhb

Put the vnode returned from namei() immediately after namei() returns in
svr4_sys_resolvepath().


192373 19-May-2009 dchagin

Validate user-supplied arguments values.
Args argument is a pointer to the structure located in user space in
which the socketcall arguments are packed. The structure must be
copied to the kernel instead of direct dereferencing.

Approved by: kib (mentor)
MFC after: 1 week


192284 18-May-2009 dchagin

Implement MSG_CMSG_CLOEXEC flag for linux_recvmsg().

Approved by: kib (mentor)
MFC after: 1 month


192206 16-May-2009 dchagin

Somewhere between 2.6.23 and 2.6.27, Linux added SOCK_CLOEXEC and
SOCK_NONBLOCK flags, that allow to save fcntl() calls.

Implement a variation of the socket() syscall which takes a flags
in addition to the type argument.

Approved by: kib (mentor)
MFC after: 1 month


192205 16-May-2009 dchagin

Return EINVAL in case when the incorrect or unsupported
type argument is specified.

Do not map type argument value as its Linux values are
identical to FreeBSD values.

Approved by: kib (mentor)


192204 16-May-2009 dchagin

Use the protocol family constants for the domain argument validation.
Return immediately when the socket() failed.

Approved by: kib (mentor)
MFC after: 1 month


192203 16-May-2009 dchagin

Emulate SO_PEERCRED socket option.
Temporarily use 0 for pid member as the FreeBSD does not cache remote
UNIX domain socket peer pid.

PR: kern/102956
Reviewed by: rwatson
Approved by: kib (mentor)
MFC after: 1 month


192090 14-May-2009 brueffer

Remove an unused variable.

Found with: Coverity Prevent(tm)
CID: 1167


192036 13-May-2009 brueffer

Fix memory leak in an error case.

Found with: Coverity Prevent(tm)
CID: 371
MFC after: 2 weeks


191989 11-May-2009 dchagin

Translate l_timeval arg to native struct timeval in
linux_setsockopt()/linux_getsockopt() for SO_RCVTIMEO,
SO_SNDTIMEO opts as l_timeval has MD members.

Remove bogus __packed attribute from l_timeval struct on __amd64__.

PR: kern/134276
Submitted by: Thomas Mueller <tmueller sysgo com>
Approved by: kib (mentor)
MFC after: 2 weeks


191988 11-May-2009 dchagin

Add forgotten linux to bsd flags argument mapping into the linux_recv().

PR: kern/134276
Submitted by: Thomas Mueller <tmueller sysgo com>
Approved by: kib (mentor)
MFC after: 2 weeks


191973 10-May-2009 dchagin

Do not export AT_CLKTCK when emulating Linux kernel prior
to 2.4.0, as it has appeared in the 2.4.0-rc7 first time.
Being exported, AT_CLKTCK is returned by sysconf(_SC_CLK_TCK),
glibc falls back to the hard-coded CLK_TCK value when aux entry
is not present.

Glibc versions prior to 2.2.1 always use hard-coded CLK_TCK value.

For older applications/libc's which depends on hard-coded CLK_TCK
value user should set compat.linux.osrelease less than 2.4.0.

Approved by: kib (mentor)


191972 10-May-2009 dchagin

Introduce linux_kernver() interface which is intended for an exact
designation of the emulated kernel version.

linux_kernver() returns integer value formatted as 'VVVMMMIII' where
VVV - version, MMM - major revision, III - minor revision.

Approved by: kib (mentor)


191966 10-May-2009 dchagin

Rework r189362, r191883.
The frequency of the statistics clock is given by stathz.
Use stathz if it is available, otherwise use hz.

Pointed out by: bde

Approved by: kib (mentor)


191921 08-May-2009 ed

Regenerate system call tables to use SVN ids.


191919 08-May-2009 ed

Burn TTY ioctl bridges in compat layers.

I really don't want any pieces of code to include ioctl_compat.h, so let
the ibcs2 and svr4 compat leave sgtty alone. If they want to support
sgtty, they should emulate it on top of termios, not sgtty.

The code has been marked with BURN_BRIDGES for a long time. ibcs2 and
svr4 are not really popular pieces of code anyway.


191915 08-May-2009 zec

Introduce a new virtualization container, provisionally named vprocg, to hold
virtualized instances of hostname and domainname, as well as a new top-level
virtualization struct vimage, which holds pointers to struct vnet and struct
vprocg. Struct vprocg is likely to become replaced in the near future with
a new jail management API import.

As a consequence of this change, change struct ucred to point to a struct
vimage, instead of directly pointing to a vnet.

Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage
branch.

Permit kldload / kldunload operations to be executed only from the default
vimage context.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by: bz
Approved by: julian (mentor)


191898 07-May-2009 jamie

Give vfs_getopt the type it's expecting.
Write 100 times: "32 bits is so twentieth century."

Noticed by: dchagin


191896 07-May-2009 jamie

Move the per-prison Linux MIB from a private one-off pointer to the new
OSD-based jail extensions. This allows the Linux MIB to accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.

Reviewed by: dchagin, kib
Approved by: bz (mentor)


191887 07-May-2009 dchagin

Add KTR(9) tracing for futex emulation.

Approved by: kib (mentor)
MFC after: 1 month


191883 07-May-2009 dchagin

Linux exports HZ value to user space via AT_CLKTCK auxiliary vector entry,
which is available for Glibc as sysconf(_SC_CLK_TCK). If AT_CLKTCK entry is
not exported, Glibc uses 100.

linux_times() shall use the value that is exported to user space.

Pointyhat to: dchagin

PR: kern/134251
Approved by: kib (mentor)
MFC after: 2 weeks


191880 07-May-2009 dchagin

Change linux struct tms definition to match actual linux one.

Approved by: kib (mentor)
MFC after: 2 weeks


191877 07-May-2009 dchagin

Add preliminary KTR(9) support to the linux emulation layer.

Approved by: kib (mentor)
MFC after: 1 month


191876 07-May-2009 dchagin

To avoid excessive code duplication move MI definitions to the MI
header file. As it is defined in Linux.

Approved by: kib (mentor)
MFC after: 1 month


191875 07-May-2009 dchagin

Return EAFNOSUPPORT instead of EINVAL in case when the incorrect or
unsupported domain argument is specified.

Approved by: kib (mentor)


191871 07-May-2009 dchagin

Rework r191742.
Use the protocol family constants for the domain argument validation.

Return EAFNOSUPPORT in case when the incorrect domain argument
is specified.

Return EPROTONOSUPPORT instead of passing values that are not 0
to the BSD layer.

Suggested by: rwatson

Approved by: kib (mentor)
MFC after: 1 month


191792 04-May-2009 jamie

Mark Linux MIB sysctls MPSAFE.

Reviewed by: dchagin, kib
Approved by: bz (mentor)


191742 02-May-2009 dchagin

Linux socketpair() call expects explicit specified protocol for
AF_LOCAL domain unlike FreeBSD which expects 0 in this case.

Approved by: kib (mentor)
MFC after: 1 month


191741 02-May-2009 dchagin

Move extern variable definitions to the header file.

Approved by: kib (mentor)
MFC after: 1 month


191719 01-May-2009 dchagin

Reimplement futexes.
Old implemention used Giant to protect the kernel data structures,
but at the same time called malloc(M_WAITOK), that could cause the
calling thread to sleep and lost Giant protection. User-visible
result was the missed wakeup.

New implementation uses one sx lock per futex. The sx protects
the futex structures and allows to sleep while copyin or copyout
are performed.

Unlike linux, we return EINVAL when FUTEX_CMP_REQUEUE operation
is requested and either caller specified futexes are equial or
second futex already exists. This is acceptable since the situation
can only occur from the application error, and glibc falls back to
old FUTEX_WAKE operation when FUTEX_CMP_REQUEUE returns an error.

Approved by: kib (mentor)
MFC after: 1 month


191675 29-Apr-2009 jamie

Regen for new jail system calls in r191673.

Approved by: bz (mentor)


191673 29-Apr-2009 jamie

Introduce the extensible jail framework, using the same "name=value"
interface as nmount(2). Three new system calls are added:
* jail_set, to create jails and change the parameters of existing jails.
This replaces jail(2).
* jail_get, to read the parameters of existing jails. This replaces the
security.jail.list sysctl.
* jail_remove to kill off a jail's processes and remove the jail.
Most jail parameters may now be changed after creation, and jails may be
set to exist without any attached processes. The current jail(2) system
call still exists, though it is now a stub to jail_set(2).

Approved by: bz (mentor)


191548 26-Apr-2009 zec

In preparation for turning on options VIMAGE in next commits,
rearrange / replace / adjust several INIT_VNET_* initializer
macros, all of which currently resolve to whitespace.

Reviewed by: bz (an older version of the patch)
Approved by: julian (mentor)


191269 19-Apr-2009 dchagin

Remove support for FUTEX_REQUEUE operation.
Glibc does not use this operation since 2.3.3 version (Jun 2004),
as it is racy and replaced by FUTEX_CMP_REQUEUE operation.
Glibc versions prior to 2.3.3 fall back to FUTEX_WAKE when
FUTEX_REQUEUE returned EINVAL.

Any application directly using FUTEX_REQUEUE without return
value checking are definitely broken.

Limit quantity of messages per process about unsupported
operation.

Approved by: kib (mentor)
MFC after: 1 month


190734 05-Apr-2009 thompsa

MFp4 //depot/projects/usb@159909

- make usb2_power_mask_t 16-bit
- remove "usb2_config_sub" structure from "usb2_config". To compensate for this
"usb2_config" has a new field called "usb_mode" which select for which mode
the current xfer entry is active. Options are: a) Device mode only b) Host
mode only (default-by-zero) c) Both modes. This change was scripted using
the following sed script: "s/\.mh\././g".
- the standard packet size table in "usb_transfer.c" is now a function, hence
the code for the function uses less memory than the table itself.

Submitted by: Hans Petter Selasky


190708 05-Apr-2009 dchagin

Fix KBI breakage by r190520 which affects older linux.ko binaries:

1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
is valid.
3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old
modules won't have the flag set, so the new field brand_note would be
ignored.

Suggested by: jhb
Reviewed by: jhb
Approved by: kib (mentor)
MFC after: 6 days


190622 01-Apr-2009 kib

Regen


190621 01-Apr-2009 kib

Rename implementation function for freebsd32 sysarch(2) to allow for
the arguments translations. Provide ABI-compatible definition of the
struct i386_ldt_args for freebsd32 compat layer.

In collaboration with: pho
Reviewed by: jhb


190616 01-Apr-2009 kib

Add all segment registers for the amd64 CPU to struct reg and mcontext.
To keep these structures ABI-compatible, half the size of r_trapno,
r_err, mc_trapno, mc_flags.

Add fsbase and gsbase to mcontext on both amd64 and i386.
Add flags to amd64 mcontext to indicate that it contains valid segments
or bases.

In collaboration with: pho
Discussed with: peter
Reviewed by: jhb


190529 29-Mar-2009 ed

Emulate the FIODGNAME ioctl in our 32-bit emulator.

It's quite strange that nobody reported this issue before. It turns out
functions like ttyname(), ptsname() and fdevname() don't work in
compat32. This means it't not even possible to run applications like
script(1) inside a 32-bit FreeBSD jail.

Fix this by converting 32-bit fiodgname_arg structures to their 64-bit
equivalent.

Reported by: kris
Tested by: kris


190466 27-Mar-2009 jamie

Whitespace/spelling fixes in advance of upcoming functional changes.

Approved by: bz (mentor)


190445 26-Mar-2009 ambrisko

Add stuff to support upcoming BMC/IPMI flashing of newer Dell machine
via the Linux tool.
- Add Linux shim to ipmi(4)
- Create a partitions file to linprocfs to make Linux fdisk see
disks. This file is dynamic so we can see disks come and go.
- Convert msdosfs to vfat in mtab since Linux uses that for
msdosfs.
- In the Linux mount path convert vfat passed in to msdosfs
so Linux mount works on FreeBSD. Note that tasting works
so that if da0 is a msdos file system
/compat/linux/bin/mount /dev/da0 /mnt
works.
- fix a 64it bug for l_off_t.
Grabing sh, mount, fdisk, df from Linux, creating a symlink of mtab to
/compat/linux/etc/mtab and then some careful unpacking of the Linux bmc
update tool and hacking makes it work on newer Dell boxes. Note, probably
if you can't figure out how to do this, then you probably shouldn't be
doing it :-)


189950 18-Mar-2009 weongyo

Some NDIS USB drivers try to call URB funcs like URB_FUNCTION_VENDOR_xxx
or URB_FUNCTION_CLASS_xxx with HAL preemption lock that means it's
non-sleepable during USB requests though usb2_do_request() requires a
sleep so it needs to send queries to the default pipe without those
interfaces to avoid sleep.


189942 18-Mar-2009 weongyo

If the caller sets irp_usriostat or irp_usrevent it try to process it
whatever the IRP flag is because some drivers (eg. RTL8187L NDIS driver)
call IoCompleteRequest() without setting flags. It will prevent waiting
a event forever at attach.


189927 17-Mar-2009 kib

Supply AT_EXECPATH auxinfo entry to the interpreter, both for native and
compat32 binaries.

Tested by: pho
Reviewed by: kan


189917 17-Mar-2009 weongyo

grab NDIS USB lock instead of HAL preemption. This change should be
happened in the previous.


189874 16-Mar-2009 weongyo

use usb2_desc_foreach() to iterate the USB config descriptor instread of
accessing structures directly to check some invalid descriptors.

Pointed by: hps


189867 16-Mar-2009 dchagin

Sort include files in the alphabetical order.

Approved by: kib (mentor)
MFC after: 2 weeks


189862 15-Mar-2009 dchagin

Ignore FUTEX_FD op, as it is done by linux.

Approved by: kib (mentor)
MFC after: 2 weeks


189861 15-Mar-2009 dchagin

Include linux_futex.h before linux_emul.h

Approved by: kib (mentor)
MFC after: 6 days


189771 13-Mar-2009 dchagin

Implement new way of branding ELF binaries by looking to a
".note.ABI-tag" section.

The search order of a brand is changed, now first of all the
".note.ABI-tag" is looked through.

Move code which fetch osreldate for ELF binary to check_note() handler.

PR: 118473
Approved by: kib (mentor)


189719 12-Mar-2009 weongyo

o change a lock model based on HAL preemption lock to a normal mtx.
Based on the HAL preemption lock there is a problem on SMP machines
and causes a panic.
o When a device detached the current tactic to detach NDIS USB driver is
to call SURPRISE_REMOVED event. So it don't need to call
ndis_halt_nic() again. This fixes some page faults when some drivers
work abnormal.
o it assumes now that URB_FUNCTION_BULK_OR_INTERRUPT_TRANSFER is in
DISPATCH_LEVEL (non-sleepable) and as further work
URB_FUNCTION_VENDOR_XXX and URB_FUNCTION_CLASS_XXX should be.

Reviewed by: Hans Petter Selasky <hselasky_at_freebsd.org>
Tested by: Paul B. Mahol <onemda_at_gmail.com>


189488 07-Mar-2009 weongyo

o port NDIS USB support from USB1 to the new usb(USB2).
o implement URB_FUNCTION_ABORT_PIPE handling.
o remove unused code related with canceling the timer list for USB
drivers.
o whitespace cleanup and style(9)

Obtained from: hps's original patch


189423 05-Mar-2009 jhb

A better fix for handling different FPU initial control words for different
ABIs:
- Store the FPU initial control word in the pcb for each thread.
- When first using the FPU, load the initial control word after restoring
the clean state if it is not the standard control word.
- Provide a correct control word for Linux/i386 binaries under
FreeBSD/amd64.
- Adjust the control word returned for fpugetregs()/npxgetregs() when a
thread hasn't used the FPU yet to reflect the real initial control
word for the current ABI.
- The Linux/i386 ABI for FreeBSD/i386 now properly sets the right control
word instead of trashing whatever the current state of the FPU is.

Reviewed by: bde


189362 04-Mar-2009 dchagin

Add AT_PLATFORM, AT_HWCAP and AT_CLKTCK auxiliary vector entries which
are used by glibc. This silents the message "2.4+ kernel w/o ELF notes?"
from some programs at start, among them are top and pkill.

Do the assignment of the vector entries in elf_linux_fixup()
as it is done in glibc.

Fix some minor style issues.

Submitted by: Marcin Cieslak <saper at SYSTEM PL>
Approved by: kib (mentor)
MFC after: 1 week


189290 02-Mar-2009 jamie

Extend the "vfsopt" mount options for more general use. Make struct
vfsopt and the vfs_buildopts function public, and add some new fields
to struct vfsopt (pos and seen), and new functions vfs_getopt_pos and
vfs_opterror.

Further extend the interface to allow reading options from the kernel
in addition to sending them to the kernel, with vfs_setopt and related
functions.

While this allows the "name=value" option interface to be used for more
than just FS mounts (planned use is for jails), it retains the current
"vfsopt" name and <sys/mount.h> requirement.

Approved by: bz (mentor)


189106 27-Feb-2009 bz

For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.


189004 24-Feb-2009 rdivacky

Change the functions to ANSI in those cases where it breaks promotion
to int rule. See ISO C Standard: SS6.7.5.3:15.

Approved by: kib (mentor)
Reviewed by: warner
Tested by: silence on -current


188939 23-Feb-2009 thompsa

Move usb to a graveyard location under sys/legacy/dev, it is intended that the
new USB2 stack will fully replace this for 8.0.

Remove kernel modules, a subsequent commit will update conf/files. Unhook
usbdevs from the build.


188849 20-Feb-2009 ed

Don't make Linux stat() open character devices to resolve its name.

The existing code calls kern_open() to resolve the vnode of a pathname
right after a stat(). This is not correct, because it causes random
character devices to be opened in /dev. This means ls'ing a tape
streamer will cause it to rewind, for example. Changes I have made:

- Add kern_statat_vnhook() to allow binary emulators to `post-process'
struct stat, using the proper vnode.

- Remove unneeded printf's from stat() and statfs().

- Make the Linuxolator use kern_statat_vnhook(), replacing
translate_path_major_minor_at().

- Let translate_fd_major_minor() use vp->v_rdev instead of
vp->v_un.vu_cdev.

Result:

crw-rw-rw- 1 root root 0, 14 Feb 20 13:54 /dev/ptmx
crw--w---- 1 root adm 136, 0 Feb 20 14:03 /dev/pts/0
crw--w---- 1 root adm 136, 1 Feb 20 14:02 /dev/pts/1
crw--w---- 1 ed tty 136, 2 Feb 20 14:03 /dev/pts/2

Before this commit, ptmx also had a major number of 136, because it
silently allocated and deallocated a pseudo-terminal. Device nodes that
cannot be opened now have proper major/minor-numbers.

Reviewed by: kib, netchild, rdivacky (thanks!)


188588 13-Feb-2009 jhb

Use shared vnode locks when invoking VOP_READDIR().

MFC after: 1 month


188579 13-Feb-2009 jhb

Fix a bug in the previous change to the mtab handler: use the path returned
by vn_fullpath() when vn_fullpath() succeeds instead of when it fails.

Submitted by: Artem Belevich fbsdlist of src.cx
MFC after: 3 days


188572 13-Feb-2009 netchild

Fix an edge-case of the linux readdir: We need the size of a linux dirent
structure, not the size of a pointer to it.

PR: 131099
Submitted by: Andreas Kies <andikies@gmail.com>
MFC after: 2 weeks


187948 31-Jan-2009 obrien

Change some movl's to mov's. Newer GAS no longer accept 'movl' instructions
for moving between a segment register and a 32-bit memory location.

Looked at by: jhb


187830 28-Jan-2009 ed

Last step of splitting up minor and unit numbers: remove minor().

Inside the kernel, the minor() function was responsible for obtaining
the device minor number of a character device. Because we made device
numbers dynamically allocated and independent of the unit number passed
to make_dev() a long time ago, it was actually a misnomer. If you really
want to obtain the device number, you should use dev2udev().

We already converted all the drivers to use dev2unit() to obtain the
device unit number, which is still used by a lot of drivers. I've
noticed not a single driver passes NULL to dev2unit(). Even if they
would, its behaviour would make little sense. This is why I've removed
the NULL check.

Ths commit removes minor(), minor2unit() and unit2minor() from the
kernel. Because there was a naming collision with uminor(), we can
rename umajor() and uminor() back to major() and minor(). This means
that the makedev(3) manual page also applies to kernel space code now.

I suspect umajor() and uminor() isn't used that often in external code,
but to make it easier for other parties to port their code, I've
increased __FreeBSD_version to 800062.


187594 22-Jan-2009 jkim

Replace couple of strcmp(cpu_vendor, "foo") with cpu_vendor_id for i386
and hide i386-specific code under #ifdef.


186564 29-Dec-2008 ed

Push down Giant inside sysctl. Also add some more assertions to the code.

In the existing code we didn't really enforce that callers hold Giant
before calling userland_sysctl(), even though there is no guarantee it
is safe. Fix this by just placing Giant locks around the call to the oid
handler. This also means we only pick up Giant for a very short period
of time. Maybe we should add MPSAFE flags to sysctl or phase it out all
together.

I've also added SYSCTL_LOCK_ASSERT(). We have to make sure sysctl_root()
and name2oid() are called with the sysctl lock held.

Reviewed by: Jille Timmermans <jille quis cx>


186563 29-Dec-2008 kib

vm_map_lock_read() does not increment map->timestamp, so we should
compare map->timestamp with saved timestamp after map read lock is
reacquired, not with saved timestamp + 1. The only consequence of the +1
was unconditional lookup of the next map entry, though.

Tested by: pho
Approved by: des
MFC after: 2 weeks


186540 28-Dec-2008 ganbold

Remove unused variable.

Found with: Coverity Prevent(tm)
CID: 542

Approved by: weongyo


186509 27-Dec-2008 weongyo

fix a bug to handling the argument that it passed `device_t' but it's
handled as `struct ndis_softc'. It'll cause a panic when the driver is
detached.


186507 27-Dec-2008 weongyo

Integrate the NDIS USB support code to CURRENT.

Now the NDISulator supports NDIS USB drivers that it've tested with
devices as follows:

- Anygate XM-142 (Conexant)
- Netgear WG111v2 (Realtek)
- U-Khan UW-2054u (Marvell)
- Shuttle XPC Accessory PN20 (Realtek)
- ipTIME G054U2 (Ralink)
- UNiCORN WL-54G (ZyDAS)
- ZyXEL G-200v2 (ZyDAS)

All of them succeeded to attach and worked though there are still some
problems that it's expected to be solved.

To use NDIS USB support, you should rebuild and install ndiscvt(8) and
if you encounter a problem to attach please set `hw.ndisusb.halt' to
0 then retry.

I expect no changes of the NDIS code for PCI, PCMCIA devices.

Obtained from: //depot/projects/ndisusb/...


186225 17-Dec-2008 kib

Remove two remnant uses of AT_DEBUG.


185984 12-Dec-2008 kib

Reference the vmspace of the process being inspected by procfs, linprocfs
and sysctl kern_proc_vmmap handlers.

Reported and tested by: pho
Reviewed by: rwatson, des
MFC after: 1 week


185898 11-Dec-2008 bz

Add 32-bit compat support for AIO.

jhb probably forgot to commit this file with r185878 and will want to
review this. It unbreaks the build here.

Obtained from: p4 //depot/user/jhb/lock/compat/freebsd32/freebsd32_signal.h#2


185879 10-Dec-2008 jhb

Regen.


185878 10-Dec-2008 jhb

- Add 32-bit compat system calls for VFS_AIO. The system calls live in the
aio code and are registered via the recently added SYSCALL32_*() helpers.
- Since the aio code likes to invoke fuword and suword a lot down in the
"bowels" of system calls, add a structure holding a set of operations for
things like storing errors, copying in the aiocb structure, storing
status, etc. The 32-bit system calls use a separate operations vector to
handle fuword32 vs fuword, etc. Also, the oldsigevent handling is now
done by having seperate operation vectors with different aiocb copyin
routines.
- Split out kern_foo() functions for the various AIO system calls so the
32-bit front ends can manage things like copying in and converting
timespec structures, etc.
- For both the native and 32-bit aio_suspend() and lio_listio() calls,
just use copyin() to read the array of aiocb pointers instead of using
a for loop that iterated over fuword/fuword32. The error handling in
the old case was incomplete (lio_listio() just ignored any aiocb's that
it got an EFAULT trying to read rather than reporting an error), and
possibly slower.

MFC after: 1 month


185864 10-Dec-2008 kib

Relock user map earlier, to have the lock held when break leaves the
loop earlier due to sbuf error.

Pointy hat to: me
Submitted by: dchagin


185766 08-Dec-2008 kib

Make two style changes to create new commit and document proper commit
message for r185765.

Noted by: rdivacky
Requested by: des

Commit message for r185765 should be:
In procfs map handler, and in linprocfs maps handler, do not call
vn_fullpath() while having vm map locked. This is done in anticipation
of the vop_vptocnp commit, that would make vn_fullpath sometime
acquire vnode lock.

Also, in linprocfs, maps handler already acquires vnode lock.

No objections from: des
MFC after: 2 week


185765 08-Dec-2008 kib

Change the linprocfs <pid>/maps and procfs <pid>/map handlers to use
sbuf instead of doing uiomove. This allows for reads from non-zero
offsets to work.

Patch is forward-ported des@' one, and was adopted to current code
by dchagin@ and me.

Reviewed by: des (linprocfs part)
PR: kern/101453
MFC after: 1 week


185589 03-Dec-2008 jhb

When unloading a 32-bit system call module, restore the sysent vector in
the 32-bit system call table instead of the main system call table.


185571 02-Dec-2008 bz

Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation


185442 29-Nov-2008 kib

Make linux_sendmsg() and linux_recvmsg() work on linux32/amd64.
Change types used in the linux' struct msghdr and struct cmsghdr
definitions to the properly-sized architecture-specific types.
Move ancillary data handler from linux_sendit() to linux_sendmsg().

Submitted by: dchagin


185436 29-Nov-2008 bz

Regen after jail support was added in r185435.


185435 29-Nov-2008 bz

MFp4:
Bring in updated jail support from bz_jail branch.

This enhances the current jail implementation to permit multiple
addresses per jail. In addtion to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with
restricted process view, no networking,..

SCTP support was updated and supports IPv6 in jails as well.

Cpuset support permits jails to be bound to specific processor
sets after creation.

Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.

DDB 'show jails' command was added to aid debugging.

Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.

Both jail as well as prison version were updated for the new features.
A gap was intentionally left as the intermediate versions had been
used by various patches floating around the last years.

Bump __FreeBSD_version for the afore mentioned and in kernel changes.

Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as early adopter testing changes
on cluster machines as well as all the testers and people
who provided feedback the last months on freebsd-jail and
other channels.
- My employer, CK Software GmbH, for the support so I could work on this.

Reviewed by: (see above)
MFC after: 3 months (this is just so that I get the mail)
X-MFC Before: 7.2-RELEASE if possible


185337 26-Nov-2008 rdivacky

Document that all the other commands are either
identical to the FreeBSD ones or rejected by
kern_msgctl().

Found with: Coverity Prevent(tm)
CID: 3456
Approved by: kib (mentor)


185169 22-Nov-2008 kib

Add sv_flags field to struct sysentvec with intention to provide description
of the ABI of the currently executing image. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures to determine ABI features.

Discussed with: dchagin, imp, jhb, peter


185002 16-Nov-2008 kib

In the robust futexes list head, futex_offset shall be signed,
and glibc actually supplies negative offsets. Change l_ulong to l_long.

Submitted by: dchagin


184829 10-Nov-2008 peter

Sigh. Fix a pointer/int compile error.


184828 10-Nov-2008 peter

Fix a signal emulation bug introduced in r163018 (and present in 7.x).
This prevents 32 bit signal handlers from finding out what the faulting
address is. Both the secret 4th argument and siginfo->si_addr are zero.


184790 09-Nov-2008 ed

Regenerate system call tables for r184789.


184789 09-Nov-2008 ed

Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4.

Looking at our source code history, it seems the uname(),
getdomainname() and setdomainname() system calls got deprecated
somewhere after FreeBSD 1.1, but they have never been phased out
properly. Because we don't have a COMPAT_FREEBSD1, just use
COMPAT_FREEBSD4.

Also fix the Linuxolator to build without the setdomainname() routine by
just making it call userland_sysctl on kern.domainname. Also replace the
setdomainname()'s implementation to use this approach, because we're
duplicating code with sysctl_domainname().

I wasn't able to keep these three routines working in our
COMPAT_FREEBSD32, because that would require yet another keyword for
syscalls.master (COMPAT4+NOPROTO). Because this routine is probably
unused already, this won't be a problem in practice. If it turns out to
be a problem, we'll just restore this functionality.

Reviewed by: rdivacky, kib


184691 05-Nov-2008 des

utf-8

MFC after: 3 weeks


184649 04-Nov-2008 jhb

Don't leak a reference on the /compat/linux vnode everytime
the linprocfs 'mtab' file is read.

MFC after: 1 month


184589 03-Nov-2008 dfr

Regen.


184588 03-Nov-2008 dfr

Implement support for RPCSEC_GSS authentication to both the NFS client
and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager. I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.

The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.

To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.

As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.

Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.

The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.

Sponsored by: Isilon Systems
MFC after: 1 month


184501 31-Oct-2008 kib

The code in linux_proc_exit() contains a race when multiple linux based
processes exits at the same time. The linux_emuldata structure is freed
but p->p_emuldata is left as a dangling pointer to the just freed memory.

The check for W_EXIT in the loop scanning the child processes isn't safe
since the state of the child process can change right afterwards. Lock
the process and check the W_EXIT before delivering signal.

Submitted by: tegge
Reviewed by: davidxu
MFC after: 1 week


184413 28-Oct-2008 trasz

Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.

Approved by: rwatson (mentor)


184205 23-Oct-2008 des

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


184184 22-Oct-2008 jhb

Regen for freebsd32_getdirentries().


184183 22-Oct-2008 jhb

Split the copyout of *base at the end of getdirentries() out leaving the
rest in kern_getdirentries(). Use kern_getdirentries() to implement
freebsd32_getdirentries(). This fixes a bug where calls to getdirentries()
in 32-bit binaries would trash the 4 bytes after the 'long base' in
userland.

Submitted by: ups
MFC after: 1 week


184058 19-Oct-2008 kib

Correctly fill siginfo for the signals delivered by linux tkill/tgkill.
It is required for async cancellation to work.

Fix PROC_LOCK leak in linux_tgkill when signal delivery attempt is made
to not linux process.

Do not call em_find(p, ...) with p unlocked.

Move common code for linux_tkill() and linux_tgkill() into
linux_do_tkill().

Change linux siginfo_t definition to match actual linux one. Extend
uid fields to 4 bytes from 2. The extension does not change structure
layout and is binary compatible with previous definition, because i386
is little endian, and each uid field has 2 byte padding after it.

Reported by: Nicolas Joly <njoly pasteur fr>
Submitted by: dchangin
MFC after: 1 month


183871 14-Oct-2008 kib

Make robust futexes work on linux32/amd64. Use PTRIN to read
user-mode pointers. Change types used in the structures definitions to
properly-sized architecture-specific types.

Submitted by: dchagin
MFC after: 1 week


183612 04-Oct-2008 kib

Current linux_fooaffinity() emulation fails, as the FreeBSD affinity
syscalls expect the bitmap size in the range from 32 to 128. Old glibc
always assumed size 1024, while newer glibc searches for approriate
size, starting from 1024 and going up.

For now, use FreeBSD size of cpuset_t for bitmap size parameter and
return EINVAL if length of user space bitmap less than our size of
cpuset_t.

Submitted by: dchagin
MFC after: 1 week
[This requires MFC of the actual linux affinity syscalls]


183600 04-Oct-2008 kib

Change the linprocfs <pid>/maps and procfs <pid>/map handlers to use
sbuf instead of doing uiomove. This allows for reads from non-zero
offsets to work.

Patch is forward-ported des@' one, and was adopted to current code
by dchagin@ and me.

Reviewed by: des (linprocfs part)
PR: kern/101453
MFC after: 1 week


183550 02-Oct-2008 zec

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


183385 26-Sep-2008 cognet

Advertise bit 26 as sse2.

Spotted out by: gahr


183365 25-Sep-2008 jhb

Add support for installing 32-bit system calls from kernel modules. This
includes syscall32_{de,}register() routines as well as a module handler
and wrapper macros similar to the support for native syscalls in
<sys/sysent.h>.

MFC after: 1 month


183363 25-Sep-2008 jhb

Sort includes and add multiple include guards.


183362 25-Sep-2008 jhb

Regen.


183361 25-Sep-2008 jhb

Tidy up a few things with syscall generation:
- Instead of using a syscall slot (370) just to get a function prototype
for lkmressys(), add an explicit function prototype to <sys/sysent.h>.
This also removes unused special case checks for 'lkmressys' from
makesyscalls.sh.
- Instead of having magic logic in makesyscalls.sh to only generate a
function prototype the first time 'lkmnosys' is seen, make 'NODEF'
always not generate a function prototype and include an explicit
prototype for 'lkmnosys' in <sys/sysent.h>.
- As a result of the fix in (2), update the LKM syscall entries in
the freebsd32 syscall table to use 'lkmnosys' rather than 'nosys'.
- Use NOPROTO for the __syscall() entry (198) in the native ABI. This
avoids the need for magic logic in makesyscalls.h to only generate
a function prototype the first time 'nosys' is encountered.


183322 24-Sep-2008 kib

Change the static struct sysentvec and struct Elf_Brandinfo initializers
to the C99 style. At least, it is easier to read sysent definitions
that way, and search for the actual instances of sigcode etc.

Explicitely initialize sysentvec.sv_maxssiz that was missed in most
sysvecs.

No objection from: jhb
MFC after: 1 month


183275 22-Sep-2008 trasz

Fix usage of mac_vnode_check_open() in linuxulator - last argument
should be VREAD, not FREAD.

Approved by: rwatson (mentor)


183273 22-Sep-2008 obrien

Add freebsd32 compat shims for ioctl(2)
CDIOREADTOCHEADER and CDIOREADTOCENTRYS requests.


183271 22-Sep-2008 obrien

Regenerate for r183270.


183270 22-Sep-2008 obrien

Add freebsd32 compat shims for ioctl(2)
MDIOCATTACH, MDIOCDETACH, MDIOCQUERY, and MDIOCLIST requests.


183189 19-Sep-2008 obrien

Regenerate for r183188.


183188 19-Sep-2008 obrien

Add freebsd32 compat shim for nmount(2).
(and quiet some compiler warnings for vfs_donmount)


183044 15-Sep-2008 obrien

style(9)


183043 15-Sep-2008 obrien

Regenerate for r183042.


183042 15-Sep-2008 obrien

Fix bug in r100384 (rev 1.2) in which the 32-bit swapon(2) was made
"obsolete, not included in system", where as the system call does exist.


183040 15-Sep-2008 ed

Allow COMPAT_SVR4 to be built without COMPAT_43.

It seems we only depend on COMPAT_43 to implement the send() and recv()
routines. We can easily implement them using sendto() and recvfrom(),
just like we do inside our very own C library.

I wasn't able to really test it, apart from simple compilation testing.
I've heard rumours that COMPAT_SVR4 is broken inside execve() anyway.
It's still worth to fix this, because I suspect we'll get rid of
COMPAT_43 somewhere in the future...

Reviewed by: rdivacky
Discussed with: jhb


183003 13-Sep-2008 thompsa

Allow PAGE_SHIFT to already be defined.

Submitted by: Hans Petter Selasky


182935 11-Sep-2008 rdivacky

The ERESTART to EINTR conversion is already done in
kern_select so there is no need to repeat it in
linux_select().

Submitted by: Dmitry Chagin <dchagin@>
MFC after: 1 week
Approved by: kib (mentor)


182892 09-Sep-2008 rdivacky

Getdents requires padding with 2 bytes instead of 1 byte
as with getdents64. The last byte is used for storing
the d_type, add this to plain getdents case where it was
missing before. Also change the code to use strlcpy instead
of plain strcpy. This changes fix the getdents crash we
had reports about (hl2 server etc.)

PR: kern/117010
MFC after: 1 week
Submitted by: Dmitry Chagin (dchagin@)
Tested by: MITA Yoshio <mita ee.t.u-tokyo.ac jp>
Approved by: kib (mentor)


182890 09-Sep-2008 kib

Remove superfluous copyin() of args, structures are already in kernel space.

Submitted by: dchagin
MFC after: 1 week


182371 28-Aug-2008 attilio

Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


182145 25-Aug-2008 julian

We left out V_static_len from ip_fw2.c
(also a whitespace diff that i'd rahter fix her ethan break in the
vimage branch.)


182141 25-Aug-2008 julian

All opt_x.h includes go at the top of other includes.


182124 24-Aug-2008 rwatson

Regenerate following r182123.


182123 24-Aug-2008 rwatson

When MPSAFE ttys were merged, a new BSM audit event identifier was
allocated for posix_openpt(2). Unfortunately, that identifier
conflicts with other events already allocated to other systems in
OpenBSM. Assign a new globally unique identifier and conform
better to the AUE_ event naming scheme.

This is a stopgap until a new OpenBSM import is done with the
correct identifier, so we'll maintain this as a local diff in svn
until then.

Discussed with: ed
Obtained from: TrustedBSD Project


181972 21-Aug-2008 obrien

Add comments on NOARGS, NODEF, and NOPROTO.


181906 20-Aug-2008 ed

Update system call tables.

The previous commit also included changes to all the system call lists,
but it is a tradition to update these lists in a second commit, so rerun
make sysent to update the $FreeBSD$ tags inside these files to refer to
the latest version of syscalls.master.

Requested by: rwatson


181905 20-Aug-2008 ed

Integrate the new MPSAFE TTY layer to the FreeBSD operating system.

The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.

If a PPP implementation would be built on top of the new TTY layer
(still needs a hooks layer, though), it would allow the PPP
implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

With the old TTY layer, it isn't entirely safe to destroy TTY's from
the system. This implementation has a two-step destructing design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc).

The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is both
used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan


181803 17-Aug-2008 bz

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


180768 23-Jul-2008 ed

Add TIOCPKT and TIOCSPTLCK to the Linuxolator.

We're very lucky, because the flags used by our TIOCPKT implementation
are the same as flags used by Linux. We can safely enable TIOCPKT,
assuming EXTPROC is not used.

TIOCSPTLCK is used by unlockpt(). Because we don't need unlockpt() in
our implementation, make this ioctl a no-op.

Approved by: philip (mentor, implicit), rdivacky
Obtained from: P4 (//depot/projects/mpsafetty/...)


180766 23-Jul-2008 rdivacky

Fix linux_alarm, the linux behaviour is to limit the
secs to INT_MAX when the passed in parameter is bigger
than INT_MAX.

Submitted by: Dmitry Chagin <chagin.dmitry gmail com>
Approved by: kib (mentor)


180754 23-Jul-2008 weongyo

when NDIS framework try to query/set informations NDIS drivers can
return NDIS_STATUS_PENDING. In this case, it's waiting for 5 secs to
get the response from drivers now. However, some NDIS drivers can send
the response before NDIS framework gets ready to receive it so we might
always be blocked for 5 secs in current implementation. NDIS framework
should reset the event before calling NDIS driver's callback not after.

MFC after: 1 month


180436 10-Jul-2008 brooks

style(9): put parentheses around return values.


180434 10-Jul-2008 brooks

Regen


180433 10-Jul-2008 brooks

id_t is a 64-bit integer and thus is passed as two arguments like off_t is.
As a result, those arguments must be recombined before calling the real
syscal implementation. This change fixes 32-bit compatibility for
cpuset_getid(), cpuset_setid(), cpuset_getaffinity(), and
cpuset_setaffinity().


180291 05-Jul-2008 rwatson

Introduce a new lock, hostname_mtx, and use it to synchronize access
to global hostname and domainname variables. Where necessary, copy
to or from a stack-local buffer before performing copyin() or
copyout(). A few uses, such as in cd9660 and daemon_saver, remain
under-synchronized and will require further updates.

Correct a bug in which a failed copyin() of domainname would leave
domainname potentially corrupted.

MFC after: 3 weeks


179806 15-Jun-2008 cokane

Silence warning about missing IoGetDeviceObjectPointer by implementing
a simple stub that always returns STATUS_SUCCESS.

Submitted by: Paul B. Mahol <onemda@gmail.com>
Reviewed by: thompsa
MFC after: 1 week


179785 14-Jun-2008 wkoszek

Remove obselete PECOFF image activator support.

PRs assigned at the time of removal: kern/80742

Discussed on: freebsd-current (silence), IRC
Tested by: make universe
Approved by: cognet (mentor)


179720 11-Jun-2008 weongyo

fix a page fault that it occurred during ifp is NULL. This bug happens
when NDIS driver's initialization is failed and NDIS driver's trying to
call NdisWriteErrorLogEntry().


179651 08-Jun-2008 rdivacky

d_ino member of linux_dirent structure should be unsigned long.

Submitted by: Chagin Dmitry <chagin.dmitry@gmail.com>
Approved by: kib (mentor)


179523 03-Jun-2008 rdivacky

Switch to emulating Linux 2.6 on default.

Approved by: kib (mentor)


179486 02-Jun-2008 ed

Push down the major/minor conversion for pts/%u to improve consistency.

In the mpsafetty branch, Linux sshd seems to work properly inside a
jail. Some small modifications had to be made to the Linux compatibility
layer.

The Linux PTY routines always expect the device major number to be 136
or higher. Our code always set the major/minor number pair to 136:0.
This makes routines like ttyname() and ptsname() fail, because we'll end
up having ambiguous device numbers.

The conversion was not performed on all *stat() routines, which meant in
some cases the numbers didn't get transformed. By pushing the conversion
into linux_driver_get_major_minor(), the transformation will take place
on all calls.

Approved by: philip (mentor), rdivacky


179423 30-May-2008 weongyo

Fix a panic that a priority value which is passed to cv_broadcastpri(9)
can be < 0. We don't ignore a `increment' argument but at least we keep
a priority value of NDIS threads over PRI_MIN_KERN.

Reviewed by: thompsa


179009 15-May-2008 weongyo

Fix a panic when it occurred during initializing the ndis driver because
it try to read network address through ifnet structure which is NULL
until the ndis driver's initialization is finished.

Reviewed by: thompsa


178976 13-May-2008 rdivacky

Implement robust futexes. Most of the code is modelled after
what Linux does. This is because robust futexes are mostly
userspace thing which we cannot alter. Two syscalls maintain
pointer to userspace list and when process exits a routine
walks this list waking up processes sleeping on futexes
from that list.

Reviewed by: kib (mentor)
MFC after: 1 month


178439 23-Apr-2008 rdivacky

Implement linux_truncate64() syscall.

Tested by: Aline de Freitas <aline@riseup.net>
Approved by: kib (mentor)


178391 21-Apr-2008 rdivacky

The vmspace->vm_daddr is constant until freed, there is no need
to hold lock while accessing it.

Approved by: kib (mentor)


178036 09-Apr-2008 rdivacky

Remove using magic value of -1 to distinguish between linux_open()
and linux_openat(). Instead just pass AT_FDCWD into linux_common_open()
for the linux_open() case. This prevents passing -1 as a dirfd to
openat() from succeeding which is wrong.

Suggested by: rwatson, kib
Approved by: kib (mentor)


177997 08-Apr-2008 kib

Implement the linux syscalls
openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat,
renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat.

Submitted by: rdivacky
Sponsored by: Google Summer of Code 2007
Tested by: pho


177790 31-Mar-2008 kib

Regen


177789 31-Mar-2008 kib

Add the freebsd32 compatibility shims for the *at() syscalls.

Reviewed by: rwatson, rdivacky
Tested by: pho


177785 31-Mar-2008 kib

Add the support for the AT_FDCWD and fd-relative name lookups to the
namei(9).

Based on the submission by rdivacky,
sponsored by Google Summer of Code 2007
Reviewed by: rwatson, rdivacky
Tested by: pho


177675 28-Mar-2008 jb

Remove files that have been repo copied to their new location
in cddl-specific parts of the source tree.


177634 26-Mar-2008 dfr

Regen.


177633 26-Mar-2008 dfr

Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.

* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.

Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks


177613 25-Mar-2008 jhb

Regen.


177612 25-Mar-2008 jhb

Add entries for the cpuset-related system calls. The existing system calls
can be used on little endian systems.

Pointy hat to: jeff


177604 25-Mar-2008 ru

Fix build.

Reported by: ache, tinderbox


177460 20-Mar-2008 rdivacky

o Add stub support for some new futex operations,
so the annoying message is not printed.

o Don't warn about FUTEX_FD not being implemented
and return ENOSYS instead of 0 (eg. success).

o Clear FUTEX_PRIVATE_FLAG as we actually implement
only private futexes so there is no reason to
return ENOSYS when app asks for a private futex.
We don't reject shared futexes because they worked
just fine with our implementation so far.

Approved by: kib (mentor)
Tested by: bsam
MFC after: 1 week


177314 17-Mar-2008 antoine

Simplify fcntl(SVR4_F_DUP2FD) code now that FreeBSD has F_DUP2FD.

Approved by: rwatson (mentor)


177257 16-Mar-2008 rdivacky

Implement sched_setaffinity and get_setaffinity using
real cpu affinity setting primitives.

Reviewed by: jeff
Approved by: kib (mentor)


177127 12-Mar-2008 jeff

- The P_SA flag has been removed. Don't reference it in a KASSERT.


177091 12-Mar-2008 jeff

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


176740 02-Mar-2008 kib

Return ENOSYS instead of 0 for the unknown futex operations.

Submitted by: rdivacky
Reported and tested by: Gary Stanley <gary velocity-servers net>


176460 22-Feb-2008 kib

Sanitize arguments to linux_mremap().
Check that only MREMAP_FIXED and MREMAP_MAYMOVE flags are specified.
Check for the page alignment of the addr argument.

Submitted by: rdivacky
MFC after: 1 week


176216 12-Feb-2008 ru

Regenerate for readlink(2).


176215 12-Feb-2008 ru

Change readlink(2)'s return type and type of the last argument
to match POSIX.

Prodded by: Alexey Lyashkov


175872 01-Feb-2008 phk

Give MEXTADD() another argument to make both void pointers to the
free function controlable, instead of passing the KVA of the buffer
storage as the first argument.

Fix all conventional users of the API to pass the KVA of the buffer
as the first argument, to make this a no-op commit.

Likely break the only non-convetional user of the API, after informing
the relevant committer.

Update the mbuf(9) manual page, which was already out of sync on
this point.

Bump __FreeBSD_version to 800016 as there is no way to tell how
many arguments a CPP macro needs any other way.

This paves the way for giving sendfile(9) a way to wait for the
passed storage to have been accessed before returning.

This does not affect the memory layout or size of mbufs.

Parental oversight by: sam and rwatson.

No MFC is anticipated.


175632 24-Jan-2008 pjd

Change type of kmem_used() and kmem_size() functions to uint64_t, so it
doesn't overflow in arc.c in this check:

if (kmem_used() > (kmem_size() * 4) / 5)
return (1);

With this bug ZFS almost doesn't cache.

Only 32bit machines are affected that have vm.kmem_size set to values >=1GB.

Reported by: David Taylor <davidt@yadt.co.uk>


175518 20-Jan-2008 rwatson

Regenerate.


175517 20-Jan-2008 rwatson

Use audit events AUE_SHMOPEN and AUE_SHMUNLINK with new system calls
shm_open() and shm_unlink(). More auditing will need to be done for
these calls to capture arguments properly.


175294 13-Jan-2008 attilio

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


175202 10-Jan-2008 attilio

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


175165 08-Jan-2008 jhb

Regen for shm_open(2) and shm_unlink(2).


175164 08-Jan-2008 jhb

Add a new file descriptor type for IPC shared memory objects and use it to
implement shm_open(2) and shm_unlink(2) in the kernel:
- Each shared memory file descriptor is associated with a swap-backed vm
object which provides the backing store. Each descriptor starts off with
a size of zero, but the size can be altered via ftruncate(2). The shared
memory file descriptors also support fstat(2). read(2), write(2),
ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared
memory file descriptors.
- shm_open(2) and shm_unlink(2) are now implemented as system calls that
manage shared memory file descriptors. The virtual namespace that maps
pathnames to shared memory file descriptors is implemented as a hash
table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash
of the pathname.
- As an extension, the constant 'SHM_ANON' may be specified in place of the
path argument to shm_open(2). In this case, an unnamed shared memory
file descriptor will be created similar to the IPC_PRIVATE key for
shmget(2). Note that the shared memory object can still be shared among
processes by sharing the file descriptor via fork(2) or sendmsg(2), but
it is unnamed. This effectively serves to implement the getmemfd() idea
bandied about the lists several times over the years.
- The backing store for shared memory file descriptors are garbage
collected when they are not referenced by any open file descriptors or
the shm_open(2) virtual namespace.

Submitted by: dillon, peter (previous versions)
Submitted by: rwatson (I based this on his version)
Reviewed by: alc (suggested converting getmemfd() to shm_open())


175107 05-Jan-2008 kib

After applying LCONVPATH() to the path, do use the converted path
instead of original user-mode string in the linux_stat() and
linux_lstat() syscalls.

Tested by: Peter Holm
MFC after: 3 days


174988 30-Dec-2007 jeff

Remove explicit locking of struct file.
- Introduce a finit() which is used to initailize the fields of struct file
in such a way that the ops vector is only valid after the data, type,
and flags are valid.
- Protect f_flag and f_count with atomic operations.
- Remove the global list of all files and associated accounting.
- Rewrite the unp garbage collection such that it no longer requires
the global list of all files and instead uses a list of all unp sockets.
- Mark sockets in the accept queue so we don't incorrectly gc them.

Tested by: kris, pho


174975 29-Dec-2007 kib

Plug the leaks in the present (hopefully, soon to be replaced)
implementation of the linux_openat() for the quick MFC.

Reported and tested by: Peter Holm
MFC after: 3 days


174974 29-Dec-2007 kib

Apply the LCONVPATH() to the (old) linux_stat() and linux_lstat() syscalls.
Without it, code has two problems:
- behaviour of the old and new [l]stat are different with regard of
the /compat/linux
- directly accessing the userspace data from the kernel asks for
the panics.

Reported and tested by: Peter Holm
Reviewed by: rdivacky
MFC after: 3 days


174898 25-Dec-2007 rwatson

Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument. This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.


174526 10-Dec-2007 jhb

Bah, remove last vestiges of some statfs conversion fixes that aren't quite
ready for CVS yet that snuck into 1.68.

Pointy hat to: jhb


174430 08-Dec-2007 scottl

Grrr, remove an unused variable missed in the last commit.


174424 07-Dec-2007 scottl

Don't expect a return value from statfs_scale_blocks().


174383 06-Dec-2007 jhb

Regen.


174382 06-Dec-2007 jhb

Add freebsd32 compat wrappers for msgctl() and __semctl() using
kern_msgctl() and kern_semctl().

MFC after: 1 week


174381 06-Dec-2007 jhb

Add freebsd32 compat wrappers for msgctl() and _semctl() using
kern_msgctl() and kern_semctl().

MFC after: 1 week


174380 06-Dec-2007 jhb

Move 32-bit SYSV IPC structure definitions into freebsd32_ipc.h.

MFC after: 1 week


174377 06-Dec-2007 jhb

Move several data structure definitions out of freebsd32_misc.c and into
freebsd32.h instead.

MFC after: 1 week


174268 04-Dec-2007 jkim

Remove redundant checks for msgsnd(3) and msgrcv(3).
COMPAT_IA32 (implicitly) requires SYSVSEM, SYSVSHM and SYSVMSG in kernel.

Pointed out by: jhb


174240 03-Dec-2007 thompsa

Implement functions required by some ndis drivers.

NdisIMCopySendPerPacketInfo [1]
KeQuerySystemTime [1]
KeTickCount [1]
strncat [1]
KeBugCheckEx

Submitted by: Marcin Simonides [1]


174150 02-Dec-2007 thompsa

Correct the calculation for the number of 100ns intervals since
January 1, 1601. The 1601 - 1970 period was in seconds rather than 100ns
units.

Remove duplication by having NdisGetCurrentSystemTime call ntoskrnl_time.


174141 02-Dec-2007 thompsa

Correct the nwbx_ies field type in struct ndis_wlan_bssid_ex.

PR: kern/118369
Submitted by: Weongyo Jeong


174070 29-Nov-2007 peter

Move the shared cp_time array (counts %sys, %user, %idle etc) to the
per-cpu area. cp_time[] goes away and a new function creates a merged
cp_time-like array for things like linprocfs, sysctl etc. The
atomic ops for updating cp_time[] in statclock go away, and the scope
of the thread lock is reduced.

sysctl kern.cp_time returns a backwards compatible cp_time[] array.
A new kern.cp_times sysctl returns the individual per-cpu stats.

I have pending changes to make top and vmstat optionally show per-cpu
stats.

I'm very aware that there are something like 5 or 6 other versions "out
there" for doing this - but none were handy when I needed them.

I did merge my changes with John Baldwin's, and ended up replacing a
few chunks of my stuff with his, and stealing some other code.

Reviewed by: jhb
Partly obtained from: jhb


174065 29-Nov-2007 jb

Remove some compatibility stuff that we now get from the Solaris header.


174042 28-Nov-2007 jb

Add more OpenSolaris compatibility headers.


174041 28-Nov-2007 jb

Remove an extern that is defined elsewhere.


174040 28-Nov-2007 jb

Add compatibility cruft moved from under _SOLARIS_C_SOURCE in sys/types.h


174039 28-Nov-2007 jb

Remove a typedef which was just a hack to avoid including vmem.h.
That typedef breaks other Solaris code.


174037 28-Nov-2007 jb

Add a missing volatile so that the code compiles cleanly.


174036 28-Nov-2007 jb

Rename the definition of lbolt to LBOLT to avoid a clash with a global
variable in FreeBSD. Until now lbolt in sys/proc.h has been #ifdef'ed
out based on _SOLARIS_C_SOURCE, but that is going away now.


173422 07-Nov-2007 kib

Implement LINUX_SIOCGIFCOUNT and LINUX_SIOCGIFINDEX/LINUX_SIOGIFINDEX.

LINUX_SIOCGIFCOUNT just returns 0 since it is not implemented in the
Linux 2.6.16.

LINUX_SIOCGIFINDEX/LINUX_SIOGIFINDEX are mapped to the FreeBSD native
SIOCGIFINDEX.

Tested by: Peter Kostouros <kpeter@melbpc.org.au>
Reviewed by: brooks, rpaulo (on net@)
Submitted by: rdivacky
MFC after: 1 week


173371 05-Nov-2007 pjd

Remove "zfs:" prefix from lock and condvar names and also skip non-letter
characters (mostly "&"). Because top(1) shows only first six characters of
wait channel, without this change we saw only one meaningful character.

Requested by: kris & others
MFC after: 1 week


173361 05-Nov-2007 kib

Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with: Peter Holm
Reviewed by: jhb


173247 01-Nov-2007 pjd

- Move crfree() outside MNT_ILOCK()/MNT_IUNLOCK() to eliminate a LOR:
1st 0xc4cea568 struct mount mtx (struct mount mtx) @ /usr/src/sys/modules/zfs/../../compat/opensolaris/kern/opensolaris_vfs.c:209
2nd 0xc3ee9010 sleep mtxpool (sleep mtxpool) @ /usr/src/sys/kern/kern_resource.c:1266
- Move crdup() outside MNT_ILOCK()/MNT_IUNLOCK(), as it can sleep.

Reported by: Olli Hauer <ohauer@gmx.de>
MFC after: 3 days


172930 24-Oct-2007 rwatson

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


172836 20-Oct-2007 julian

Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0 so that we can eventually MFC the
new kthread_xxx() calls.


172568 12-Oct-2007 kevlo

Spelling fix for interupt -> interrupt


172316 24-Sep-2007 jhb

Allow the ia32 resource limits (compat.ia32.max{dsiz,ssiz,vmem} to be
set via loader tunables. They are already tunable via sysctl.

MFC after: 1 week
Approved by: re (kensmith)


172220 18-Sep-2007 dwmalone

The kernel version of Linux statfs64 is actually supposed to take
3 arguments, but we had forgotten the second argument. Also make the
Linux statfs64 struct depend on the architecture because it has an
extra 4 bytes padding on amd64 compared to i386.

The three argument fix is from David Taylor, the struct statfs64
stuff is my fault. With this patch I can install i386 Linux matlab
on an amd64 machine.

Submitted by: David Taylor <davidt_at_yadt.co.uk>
Approved by: re (kensmith)


172003 28-Aug-2007 jhb

Rework the routines to convert a 5.x+ statfs structure (with fixed-size
64-bit counters) to a 4.x statfs structure (with long-sized counters).
- For block counters, we scale up the block size sufficiently large so
that the resulting block counts fit into a the long-sized (long for the
ABI, so 32-bit in freebsd32) counters. In 4.x the NFS client's statfs
VOP did this already. This can lie about the block size to 4.x binaries,
but it presents a more accurate picture of the ratios of free and
available space.
- For non-block counters, fix the freebsd32 stats converter to cap the
values at INT32_MAX rather than losing the upper 32-bits to match the
behavior of the 4.x statfs conversion routine in vfs_syscalls.c

Approved by: re (kensmith)


171998 28-Aug-2007 kib

Implement fake linux sched_getaffinity() syscall to enable java to work
with Linux 2.6 emulation. This shall be reimplemented once FreeBSD gets
native scheduler affinity syscalls.

Submitted by: rdivacky
Reviewed by: jkim
Sponsored by: Google Summer of Code 2007
Approved by: re (kensmith)


171863 16-Aug-2007 pjd

Some ZFS threads needs stack larger than the default 8kB, so use 16kB of
alternate stack if the default is smaller than 16kB.

Approved by: re (rwatson)


171861 16-Aug-2007 davidxu

Regenerate.

Approved by: re(kensmith)


171860 16-Aug-2007 davidxu

Add thr_kill2 compat32 syscall.

Submitted by: Tijl Coosemans tijl at ulyssis dot org
Approved by: re (kensmith)


171744 06-Aug-2007 rwatson

Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
previously conditionally acquired Giant based on debug.mpsafenet. As that
has now been removed, they are no longer required. Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.

While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option. Clean up some related gotos for
consistency.

Reviewed by: bz, csjp
Tested by: kris
Approved by: re (kensmith)


171548 22-Jul-2007 thompsa

ndis will signal the kthread to exit and then sleep on the proc pointer to
be woken up by kthread_exit. This is racey and in some cases the kthread will
exit before ndis gets around to sleep so it will be stuck indefinitely. This
change reuses the kq_exit variable to indicate that the thread has gone and
will loop on tsleep with a timeout waiting for it. If the kthread has already
exited then it will not sleep at all.

Approved by: re (rwatson)


171410 12-Jul-2007 jhb

Fix a couple of issues with the stack limit for 32-bit processes on 64-bit
kernels exposed by the recent fixes to resource limits for 32-bit processes
on 64-bit kernels:
- Let ABIs expose their maximum stack size via a new pointer in sysentvec
and use that in preference to maxssiz during exec() rather than always
using maxssiz for all processses.
- Apply the ABI's limit fixup to the previous stack size when adjusting
RLIMIT_STACK to determine if the existing mapping for the stack needs to
be grown or shrunk (as well as how much it should be grown or shrunk).

Approved by: re (kensmith)


171242 05-Jul-2007 peter

Quiet warnings. I believe gcc is incorrect about these.

Approved by: re (rwatson)


171216 04-Jul-2007 peter

Don't add the 'pad' argument to the mmap/truncate/etc syscalls.

Submitted by: kensmith
Approved by: re (kensmith)


171215 04-Jul-2007 peter

Add compat6 wrapper code for mmap/lseek/pread/pwrite/truncate/ftruncate.

Approved by: re (kensmith)


171214 04-Jul-2007 peter

Regenerate after mmap/lseek/etc syscall changes

Approved by: re (kensmith)


171213 04-Jul-2007 peter

Add i386 emulation wrappers for mmap/lseek/etc. These use COMPAT6, so
you must use the already existing, already in generic, COMPAT_FREEBSD6
kernel option for running old 32 bit binaries.

Approved by: re (kensmith)


170870 17-Jun-2007 mjacob

Try a cheap way to get around gcc4.2 believing that user arguments
to system calls can change across intervening functions.


170795 15-Jun-2007 emaste

Remove stale 'XXX implement' comments for syscalls which have since been
implemented.


170587 12-Jun-2007 rwatson

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


170487 10-Jun-2007 mjacob

Quiesce warnings by initializing irql values to zero.


170486 10-Jun-2007 mjacob

Ensure that newpath is always initialized, even for the error case.


170472 09-Jun-2007 attilio

rufetch and calcru sometimes should be called atomically together.
This patch fixes places where they should be called atomically changing
their locking requirements (both assume per-proc spinlock held) and
introducing rufetchcalc which wrappers both calls to be performed in
atomic way.

Reviewed by: jeff
Approved by: jeff (mentor)


170466 09-Jun-2007 attilio

The current rusage code show peculiar problems:
- Unsafeness on ruadd() in thread_exit()
- Unatomicity of thread_exiit() in the exit1() operations

This patch addresses these problems allocating p_fd as part of the
process and modifying the way it is accessed.

A small chunk of this patch, resolves a race about p_state in kern_wait(),
since we have to be sure about the zombif-ing process.

Submitted by: jeff
Approved by: jeff (mentor)


170431 08-Jun-2007 pjd

- Reduce number of atomic operations needed to be implemented in asm by
implementing some of them using existing ones.
- Allow to compile ZFS on all archs and use atomic operations surrounded
by global mutex on archs we don't have or can't have all atomic
operations needed by ZFS.


170307 05-Jun-2007 jeff

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170289 04-Jun-2007 dwmalone

Despite several examples in the kernel, the third argument of
sysctl_handle_int is not sizeof the int type you want to export.
The type must always be an int or an unsigned int.

Remove the instances where a sizeof(variable) is passed to stop
people accidently cut and pasting these examples.

In a few places this was sysctl_handle_int was being used on 64 bit
types, which would truncate the value to be exported. In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.


170281 04-Jun-2007 pjd

Reimplement traverse() helper function:
1. Pass locking flags to VFS_ROOT().
2. Check v_mountedhere while the vnode is locked.
3. Always return locked vnode on success.

Change 1 fixes problem reported by Stephen M. Rumble - after
zfs_vfsops.c,1.9 change, zfs_root() no longer locks the vnode
unconditionally and traverse() didn't pass right lock type to
VFS_ROOT(). The result was that kernel paniced when .zfs/ directory
was accessed via NFS.


170170 31-May-2007 attilio

Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)


170152 31-May-2007 kib

Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.

Proposed and reviewed by: jhb
Reviewed by: daichi (unionfs)
Approved by: re (kensmith)


170006 26-May-2007 pjd

There are too many false positive LORs reported by WITNESS, so when ZFS
debug is turned off, initialize locks with NOWITNESS flag.
At some point I'll get back to them, we would probably need BLESSING
functionality, which is currently turned off by default.


169934 24-May-2007 pjd

DNLC_NO_VNODE can't be NULL.

Reported by: ru


169920 23-May-2007 pjd

FreeBSD's namecache works quite well with ZFS, so remove DNLC.


169901 23-May-2007 cognet

Remove duplicate includes.

Submitted by: Cyril Nguyen Huu <cyril ci0 org>


169895 23-May-2007 kib

Move futex support code from <arch>/support.s into linux compat directory.
Implement all futex atomic operations in assembler to not depend on the
fuword() that does not allow to distinguish between -1 and failure return.
Correctly return 0 from atomic operations on success.

In collaboration with: rdivacky
Tested by: Scot Hetzel <swhetzel gmail com>, Milos Vyletel <mvyletel mzm cz>
Sponsored by: Google SoC 2007


169846 22-May-2007 kan

Allow FreeBSD's native ELF image activators to execute shared libraries the
same way it was enabled for Linux binares in linuxulator.

This allows binaries built with -pie. Many ports auto-detect -fPIE support
in GCC 4.2 and build binaries FreeBSD was unable to run.


169802 20-May-2007 jeff

- Move GDT/LDT locking into a seperate spinlock, removing the global
scheduler lock from this responsibility.

Contributed by: Attilio Rao <attilio@FreeBSD.org>
Tested by: jeff, kkenn


169667 18-May-2007 jeff

- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.

Contributed by: Attilio Rao <attilio@FreeBSD.org>


169565 14-May-2007 jhb

Rework the support for ABIs to override resource limits (used by 32-bit
processes under 64-bit kernels). Previously, each 32-bit process overwrote
its resource limits at exec() time. The problem with this approach is that
the new limits affect all child processes of the 32-bit process, including
if the child process forks and execs a 64-bit process. To fix this, don't
ovewrite the resource limits during exec(). Instead, sv_fixlimits() is
now replaced with a different function sv_fixlimit() which asks the ABI to
sanitize a single resource limit. We then use this when querying and
setting resource limits. Thus, if a 32-bit process sets a limit, then
that new limit will be inherited by future children. However, if the
32-bit process doesn't change a limit, then a future 64-bit child will
see the "full" 64-bit limit rather than the 32-bit limit.

MFC is tentative since it will break the ABI of old linux.ko modules (no
other modules are affected).

MFC after: 1 week


169199 02-May-2007 pjd

Share-lock a vnode where possible.


169181 01-May-2007 alc

Eliminate the use of Giant from ia64-specific code in freebsd32_mmap().


169156 01-May-2007 alc

Synchronize vm map and object accesses.

Approved by: des@


168962 23-Apr-2007 pjd

MFp4: Reduce diff against vendor code:
- Move FreeBSD-specific code to zfs_freebsd_*() functions in zfs_vnops.c
and keep original functions as similar to vendor's code as possible.
- Add various includes back, now that we have them.


168942 22-Apr-2007 des

Now that we're MPSAFE, tell namei() to acquire Giant if necessary.


168926 21-Apr-2007 pjd

MFp4:

@118370 Correct typo.

@118371 Integrate changes from vendor.

@118491 Show backtrace on unexpected code paths.

@118494 Integrate changes from vendor.

@118504 Fix sendfile(2). I had two ways of fixing it:
1. Fixing sendfile(2) itself to use VOP_GETPAGES() instead of
hacking around with vn_rdwr(UIO_NOCOPY), which was suggested
by ups.
2. Modify ZFS behaviour to handle this special case.

Although 1 is more correct, I've choosen 2, because hack from 1
have a side-effect of beeing faster - it reads ahead MAXBSIZE
bytes instead of reading page by page. This is not easy to implement
with VOP_GETPAGES(), at least not for me in this very moment.

Reported by: Andrey V. Elsukov <bu7cher@yandex.ru>

@118525 Reorganize the code to reduce diff.

@118526 This code path is expected. It is simply when file is opened with
O_FSYNC flag.

Reported by: kris
Reported by: Michal Suszko <dry@dry.pl>


168840 18-Apr-2007 pjd

MFp4: Fix automatic snapshot mount when unprivileged user does lookup
on a snapshot directory:
- Remove PRIV_VFS_MOUNT check - regular users can mount snapshots
via lookups on snapshot directory.
- Reset mount credential to kcred, so user won't be able to unmount
the snapshot.
- Reset owner uid.
- Unlock vnode in case of a failure.

Reported by: simokawa


168824 17-Apr-2007 pjd

- Fix a leftover - vfs_mount_alloc() is now exported properly.
This fixes stange panics when listing .zfs/snapshot/ directory for me.
Reported by: simokawa
Reported by: Johan Hendriks <Johan@double-l.nl>
- Hide cache_purge() under FREEBSD_NAMECACHE like in other files.
- Protect mnt_flag with mount interlock.


168762 15-Apr-2007 des

Whitespace cleanup.


168711 14-Apr-2007 rwatson

Some Linux applications (ping) pass a non-NULL msg_control argument to
sendmsg() while using a 0-length msg_controllen. This isn't allowed in
the FreeBSD system call ABI, so detect this case and set msg_control to
NULL. This allows Linux ping to work.

Submitted by: rdivacky


168604 10-Apr-2007 wkoszek

strchr() and strrchr() are already present in the kernel, but with less
popular names. Hence:

- comment current index() and rindex() functions, as these serve the same
functionality as, respectively, strchr() and strrchr() from userland;
- add inlined version of strchr() and strrchr(), as we tend to use them more
often;
- remove str[r]chr() definitions from ZFS code;

Reviewed by: pjd
Approved by: cognet (mentor)


168602 10-Apr-2007 scottl

Whitespace fixes


168566 10-Apr-2007 pjd

Try to stabilize ZFS with regard to memory consumption:
- Allow to shrink ARC down to 16MB (instead of 64MB).
- Set arc_max to 1/2 of kmem_map by default.
- Start freeing things earlier when low memory situation is detected.
- Serialize execution of arc_lowmem().

I decided to setup minimum ZFS memory requirements to 512MB of RAM and 256MB of
kmem_map size. If there is less RAM or kmem_map, a warning will be printed.
World is cruel, be no better. In other words: modern file system requires
modern hardware:)

From ZFS administration guide:

"Currently the minimum amount of memory recommended to install a Solaris
system is 512 Mbytes. However, for good ZFS performance, at least one
Gbyte or more of memory is recommended."


168514 09-Apr-2007 pjd

Instead of detecting if lock is already initialized based on standard 1 bit
check, use more accurate 13 bits check. We had too many false-positives with
the standard check.

Reported by: mlaier


168508 08-Apr-2007 pjd

Extend kobj compatibility KPI to support operating on files before and
after the root file system is mounted.
This is one of the changes that will allow to put root file system on ZFS.


168498 08-Apr-2007 pjd

MFp4: Synchronize with recent OpenSolaris changes.


168477 07-Apr-2007 scottl

Add the CAM 'SG' peripheral device. This device implements a subset of the
Linux SCSI SG passthrough device API. The intention is to allow for both
running of Linux apps that want to talk to /dev/sg* nodes, and to facilitate
porting of apps from Linux to FreeBSD. As such, both native and linuxolator
entry points and definitions are provided.

Caveats:
- This does not support the procfs and sysfs nodes that the Linux SG
driver provides. Some Linux apps may rely on these for operation,
others may only use them for informational purposes.
- More ioctls need to be implemented.
- Linux uses a naming scheme of "sg[a-z]" for devices, while FreeBSD uses a
scheme of "sg[0-9]". Devfs aliasis (symlinks) are automatically created
to link the two together. However, tools like camcontrol only see the
native names.
- Some operations were originally designed to return byte counts or other
data directly as the syscall return value. The linuxolator doesn't appear
to support this well, so this driver just punts for these cases.

Now that the driver is in place, others are welcome to add missing
functionality. Thanks to Roman Divacky for pushing this work along.


168440 06-Apr-2007 jkim

Fix kernel module dependency. linprocfs depends on sysvmsg and sysvsem.

Submitted by: nork


168421 06-Apr-2007 pjd

We have strcasecmp() in libkern now.


168404 06-Apr-2007 pjd

Please welcome ZFS - The last word in file systems.

ZFS file system was ported from OpenSolaris operating system. The code in under
CDDL license.

I'd like to thank all SUN developers that created this great piece of software.

Supported by: Wheel LTD (http://www.wheel.pl/)
Supported by: The FreeBSD Foundation (http://www.freebsdfoundation.org/)
Supported by: Sentex (http://www.sentex.net/)


168355 04-Apr-2007 rwatson

Replace custom file descriptor array sleep lock constructed using a mutex
and flags with an sxlock. This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention. All of these are imported for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently. Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.

- Generally eliminate the distinction between "fast" and regular
acquisisition of the filedesc lock; the plan is that they will now all
be fast. Change all locking instances to either shared or exclusive
locks.

- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
was called without the mutex held; sx_sleep() is now always called with
the sxlock held exclusively.

- Universally hold the struct file lock over changes to struct file,
rather than the filedesc lock or no lock. Always update the f_ops
field last. A further memory barrier is required here in the future
(discussed with jhb).

- Improve locking and reference management in linux_at(), which fails to
properly acquire vnode references before using vnode pointers. Annotate
improper use of vn_fullpath(), which will be replaced at a future date.

In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).

Tested by: kris
Discussed with: jhb, kris, attilio, jeff


168275 02-Apr-2007 jkim

MFP4: Turn emul_lock into a mutex.

Submitted by: rdivacky


168067 30-Mar-2007 jkim

Use underlying structures instead of kernel_sysctlbyname() for msginfo and
seminfo because kernel_sysctlbyname() is slow. There is no dependency
problem since linux module depends on both sysvmsg and sysvsem and linprocfs
depends on it in turn.

Pointed out by: des
Reviewed by: des


168037 30-Mar-2007 jkim

MFP4: Linux futex support for amd64.

Initial patch was submitted by kib and additional work was done
by Divacky Roman.

Tested by: emulation


168014 29-Mar-2007 julian

Implement the openat() linux syscall
Submitted by: Roman Divacky (rdivacky@)
MFC after: 2 weeks


167482 12-Mar-2007 des

Add a pn_destroy field to pfs_node. This field points to a destructor
function which is called from pfs_destroy() before the node is reclaimed.

Modify pfs_create_{dir,file,link}() to accept a pointer to a destructor
function in addition to the usual attr / fill / vis pointers.

This breaks both the programming and binary interfaces between pseudofs
and its consumers. It is believed that there are no pseudofs consumers
outside the source tree, so that the impact of this change is minimal.

Submitted by: Aniruddha Bohra <bohra@cs.rutgers.edu>


167257 06-Mar-2007 rwatson

In translate_path_major_minor(), do not calculate otherwise unused 'fp'
variable, avoiding an extra locking of the file descriptor array.


167159 02-Mar-2007 jkim

MFP4: 113090, 113130, 113132

Add Linux kernel version strings to /proc/sys/kernel.


167157 02-Mar-2007 jkim

MFP4: 115220, 115222

- Fix style(9) and reduce diff between amd64 and i386.
- Prefix Linuxulator macros with LINUX_ to prevent future collision.


166970 25-Feb-2007 netchild

MFp4 (110541):
Sync with rev 1.7 in NetBSD.

Obtained from: NetBSD


166969 25-Feb-2007 netchild

MFp4 (110523, parts which apply cleanly):
semi-automatic style(9)

The futex stuff already differs a lot (only a small part does not differ)
from NetBSD, so we are already way off and can't apply changes from NetBSD
automatically. As we need to merge everything by hand already, we can even
make the files comply to our world order.


166944 24-Feb-2007 netchild

Partial MFp4 of 114977:
Whitespace commit: Fix grammar, spelling and punctuation.

Submitted by: "Scot Hetzel" <swhetzel@gmail.com>


166931 23-Feb-2007 netchild

MFp4 (114193 (i386 part), 114194, 114195, 114200):
- Dont "return" in linux_clone() after we forked the new process in a case
of problems.
- Move the copyout of p2->p_pid outside the emul_lock coverage in
linux_clone().
- Cache the em->pdeath_signal in a local variable and move the copyout
out of the emul_lock coverage.
- Move the free() out of the emul_shared_lock coverage in a preparation
to switch emul_lock to non-sleepable lock (mutex).

Submitted by: rdivacky


166930 23-Feb-2007 netchild

MFp4 (part of 114132):
- Fix a LOR caused by holding emul_lock and proctree_lock at once.

Submitted by: rdivacky


166909 23-Feb-2007 jhb

Use 'pause' in several places rather than trying to tsleep() on NULL (which
triggers a KASSERT) or local variables. In the case of kern_ndis, the
tsleep() actually used a common sleep address (curproc) making it
susceptible to a premature wakeup.


166901 23-Feb-2007 piso

o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@


166420 02-Feb-2007 kib

Remove extern int hz; use proper include file instead.


166398 01-Feb-2007 kib

Introduce some more SO_ option equivalents from Linux to FreeBSD.

The msg variable in linux_recvmsg() was not initialized.
Copy it from userspace.

Submitted by: rdivacky


166397 01-Feb-2007 kib

No need to lock emul_lock in exit_group() because em->shared
cannot change (because its referenced by curthread). This fixes
a LOR caused by acquiring emul_shared_lock while holding emul_lock.

Fix typo in comment.

Submitted by: rdivacky


166396 01-Feb-2007 kib

No need to synchronize linux_schedtail with linux_proc_init.
p->p_emuldata is properly initialized in the time when the child can run.

Do not set p->p_emuldata to NULL when the process is exiting.
It does not make any sense and only costs 2 mutex operations.

Do not lock emul_data to unlock it on the very next line.
Comment on possible race while there.

Reparent all procs that are part of a threading group but not its leaders
to init and SIGCHLD init to finish the zombies off. This fixes zombies
left after opera's exit. [1]

There is no need to lock p_em in the linux_proc_init CLONE_THREAD
case because the process cannot change the address of the p_em->shared
because its currently running this code path.
Move assigning of em->shared outside emul_shared_lock.

Noticed by: Scott Robbins <scottro@nyc.rr.com> [1]
Submitted by: rdivacky


166162 21-Jan-2007 netchild

Use a printf-modifier which doesn't need a cast.

Submitted by: scottl


166155 20-Jan-2007 netchild

Fix tinderbox build on amd64.


166150 20-Jan-2007 netchild

MFp4 (113077, 113083, 113103, 113124, 113097):

Dont expose em->shared to the outside world before its properly
initialized. Might not affect anything but its at least a better
coding style.

Dont expose em via p->p_emuldata until its properly initialized.
This also enables us to get rid of some locking and simplify the
code because we are workin on a local copy.

In linux_fork and linux_vfork create the process in stopped state
to be sure that the new process runs with fully initialized emuldata
structure [1]. Also fix the vfork (both in linux_clone and linux_vfork)
race that could result in never woken up process [2].

Reported by: Scot Hetzel [1]
Suggested by: jhb [2]
Reviewed by: jhb (at least some important parts)
Submitted by: rdivacky
Tested by: Scot Hetzel (on amd64)

Change 2 comments (in the new code) to comply to style(9).

Suggested by: jhb


166141 20-Jan-2007 netchild

Ooops, fix the ratelimit.


166140 20-Jan-2007 netchild

Convert a KASSERT into a runtime warning (rate limited) + failsafe fallback.

Because of a stupid bug (also fixed with this commit) the KASSERT was
triggered when runnung the linux top.

Pointy hat to: netchild


166085 18-Jan-2007 kib

Add support for LINUX_O_DIRECT, LINUX_O_DIRECT and LINUX_O_NOFOLLOW flags
to open() [1].
Improve locking for accessing session control structures [2].
Try to document (most likely harmless) races in the code [3].

Based on submission by: Intron (intron at intron ac) [1]
Reviewed by: jhb [2]
Discussed with: netchild, rwatson, jhb [3]


166008 14-Jan-2007 netchild

MFp4 (112379):
Implement SETALL/GETALL IPC primitives. This fixes some LTP testcases and
LabView is able to proceed a little bit further.

Submitted by: rdivacky


166006 14-Jan-2007 netchild

MFp4 (112705):
Inherit setting of the default emulation version to the jails.

Pointed out by: jhb
Submitted by: rdivacky


165871 07-Jan-2007 netchild

MFp4 (112646):
Now (ok it's been a while...) that FreeBSD has RLIMIT_AS too, we can use
it in the linuxolator instead of ignoring it.

This fixes a LTP test.

Submitted by: rdivacky


165870 07-Jan-2007 netchild

MFp4 (112535):
No need to lock prison in a case of linux_use26 because the int
setting is atomic and process cannot leave jail.

Submitted by: kib
Reviewed by: jhb
Requested by: rdivacky


165869 07-Jan-2007 netchild

MFp4 (112534):
Dont lock em in a case of just using em->shared->group_pid because
the group_pid never changes.

Submitted by: rdivacky
Reviewed by: kib
Glanced at by: jhb


165868 07-Jan-2007 netchild

MFp4 (112499):
Protect em->shared with the lock in case of CLONE_THREAD.

Submitted by: rdivacky


165867 07-Jan-2007 netchild

MFp4 (112498):
Rename the locking flags to EMUL_DOLOCK and EMUL_DONTLOCK to prevent confusion.

Submitted by: rdivacky


165718 01-Jan-2007 delphij

Fix amd64 build.

Submitted by: Divacky Roman <xdivac02 stud fit vutbr cz>


165689 31-Dec-2006 netchild

MFp4 (111746, 108671, 108945, 112352):
- add linux utimes syscall [1]
- add linux rt_sigtimedwait syscall [2]

Submitted by: "Scot Hetzel" <swhetzel@gmail.com> [1]
Submitted by: Bruce Becker <hostmaster@whois.gts.net> [2]
PR: 93199 [2]


165688 31-Dec-2006 netchild

MFp4:
- semi-automatic style fixes


165687 31-Dec-2006 netchild

MFp4 (111746+):
Redo the checking for 2.6 emulation. We now cache the value of
use26 and replace calls to linux_get_osrelease() + parsing with
a call to linux_use26(). Typical path is lockless now.

Pointed out by: kib

This allows to ship RELENG_7_0 with a default osrelease of 2.4.2 and the
possibility to enable 2.6.x emulation without the possible performance
impact of the previous version of the check.

Submitted by: rdivacky


165686 31-Dec-2006 netchild

MFp4:
- semi-automatic style fixes
- spelling fixes in comments
- add some comments


165544 25-Dec-2006 sam

add entry points required by newer broadcom wireless driver

PR: kern/106131
Submitted by: Scot Hetzel
MFC after: 2 weeks


165439 21-Dec-2006 netchild

MFP4 (110956):
Add definition for LINUX_MSG_INFO.

This fixes the tinderbox errors.

Submitted by: rdivacky


165408 20-Dec-2006 jkim

MFP4: 109655

- Move linux_nanosleep() from src/sys/amd64/linux32/linux32_machdep.c to
src/sys/compat/linux/linux_time.c.
- Validate timespec ranges before use as Linux kernel does.
- Fix l_timespec structure.
- Clean up style(9) nits.


165407 20-Dec-2006 jkim

MFP4: 110179

Add rudimentary IPC_INFO/MSG_INFO command support for linux_msgctl()
to pacify Linux ipcs(1). While I am here, add more bound checks
for linux_msgsnd() and linux_msgrcv().


165406 20-Dec-2006 jkim

Regen.


165405 20-Dec-2006 jkim

MFP4: (part of) 110058

Fix 32-bit msgsnd(3) and msgrcv(3) emulations for amd64.


165404 20-Dec-2006 jkim

MFP4: (part of) 110058

Use new kern_msgsnd()/kern_msgrcv() to fix linux32 emulation on amd64.


164893 04-Dec-2006 jkim

MFP4: 109653

Linux mknod(2) can open any files, not just char/block or fifo files.
This fixes Linux Test Project test cases mknod01, mknod07 and mknod09.


164890 04-Dec-2006 jkim

MFP4: 109652

Fixes for 'blocking in fifoor state' problem of LTP tests.
linux_*stat*() functions were opening files with O_RDONLY to get
major/minor pair for char/block special files. Unfortunately,
when these functions are used against fifo, it is blocked forever
because there is no writer. Instead, we only open char/block special
files for major/minor conversion. We have to get rid of kern_open()
entirely from translate_path_major_minor() but today is not the day.
While I am here, add checks for errors before calling
translate_path_major_minor().


164858 03-Dec-2006 netchild

MFP4 (110957)

Use TAILQ_FOREACH_SAFE instead of the unsafe one where an item is removed
from the queue.

This prevents a panic on kldunload.

Submitted by: rdivacky
Tested by: bsam


164826 02-Dec-2006 netchild

MFP4 (108673, 110519, 110874):
- Currently LINUX_MAX_COMM_LEN is smaller than MAXCOMLEN, but in case
this will change we have a buffer overflow. Apply some defensive
programming to DTRT when this should happen.
- Use copyinstr() instead of copyin where appropriate.
* Fallback to copyin() in case of ENAMETOOLONG. [1]
* Use the right source and destination (it was wrong before).
- Use strlcpy instead of strcpy.
- Properly lock the read case (PR_GET_NAME) like the write case.

Reviewed by: rwatson (except [1])
Suggested by: rwatson [1]


164692 27-Nov-2006 jkim

MFP4: Change 109654

Add two linprocfs entries for Linux IPC:

/proc/sys/kernel/msgmni -> kern.ipc.msgmni
/proc/sys/kernel/sem -> kern.ipc.semmsl
kern.ipc.semmns
kern.ipc.semopm
kern.ipc.semmni

This fixes msgget03 and semget05 from Linux Test Project (LTP) test suite.
msgctl08 and msgctl09 also use /proc/sys/kernel/msgmni but another fix is
required from p4 (Change 110179).

Requested by: netchild


164383 18-Nov-2006 kib

Add missed ")". Fix the build.

Pointy hat to: kib


164380 18-Nov-2006 kib

Sync struct sysinfo with real one from linux.

Submitted by: rdivacky


164379 18-Nov-2006 kib

Use standard debugging facilities in linux_getcwd().

Submitted by: rdivacky


164378 18-Nov-2006 kib

Add debuging printfs to syscalls that do not contain it yet. In
sethostname do not print the hostname because it would require to copyin
the string. Sethostname is not very frequently used.

Submitted by: rdivacky


164377 18-Nov-2006 kib

Remove unecessary locking of process in linux_getpid.

Suggested by: jhb
Submitted by: rdivacky


164297 15-Nov-2006 kib

Group pid and parent are shared in a case of CLONE_THREAD not CLONE_VM.
This fix lets clone02 LTP test pass with 2.6 emulation. In reality 99%
of the cases are that CLONE_VM and CLONE_THREAD are both set so it
seemed to work.

Submitted by: rdivacky


164296 15-Nov-2006 kib

In rev 1.188 of linux_misc.c the added check for valid options ommited
__WCLONE. This fixes it thus fixing skype/teamspeak to not keep zombies
after exit.

Submitted by: rdivacky
Reported by: Bakul Shah (bakul at bitblocks com)


164199 11-Nov-2006 ru

Regen.

Forgotten by: trhodes


164184 11-Nov-2006 trhodes

Merge posix4/* into normal kernel hierarchy.

Reviewed by: glanced at by jhb
Approved by: silence on -arch@ and -standards@


164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


163961 03-Nov-2006 ru

Regen.


163960 03-Nov-2006 ru

Fix build breakage introduced in previous commit (redeclatation
of sctp functions).


163956 03-Nov-2006 rrs

This commits the remake in kern/ make sysent to get
the correct syscalls.master's $FreeBSD$ tag record and
a make sysent in sys/compat/freebsd32. Thanks Ruslan
for pointing out the steps I missed :-0
Approved by: gnn


163953 03-Nov-2006 rrs

Ok, here it is, we finally add SCTP to current. Note that this
work is not just mine, but it is also the works of Peter Lei
and Michael Tuexen. They both are my two key other developers
working on the project.. and they need ata-boy's too:
****
peterlei@cisco.com
tuexen@fh-muenster.de
****
I did do a make sysent which updated the
syscall's and sysproto.. I hope that is correct... without
it you don't build since we have new syscalls for SCTP :-0

So go out and look at the NOTES, add
option SCTP (make sure inet and inet6 are present too)
and play with SCTP.

I will see about comitting some test tools I have after I
figure out where I should place them. I also have a
lib (libsctp.a) that adds some of the missing socketapi
functions that I need to put into lib's.. I will talk
to George about this :-)

There may still be some 64 bit issues in here, none of
us have a 64 bit processor to test with yet.. Michael
may have a MAC but thats another beast too..

If you have a mac and want to use SCTP contact Michael
he maintains a web site with a loadable module with
this code :-)

Reviewed by: gnn
Approved by: gnn


163760 29-Oct-2006 netchild

Backout the linux aio stuff. Several problems where identified and the
dynamic nature (if no native aio code is available, the linux part
returns ENOSYS because of missing requisites) should be solved differently
than it is.

All this will be done in P4.

Not included in this commit is a backout of the changes to the native aio
code (removing static in some places). Those changes (and some more) will
also be needed when the reworked linux aio stuff will reenter the tree.

Requested by: rwatson
Discussed with: rwatson


163757 29-Oct-2006 netchild

style(9)

Noticed by: rwatson


163740 28-Oct-2006 netchild

Fix style(9).

Noticed by: rwatson


163734 28-Oct-2006 netchild

MFP4:
Implement prctl().

Submitted by: rdivacky
Tested with: LTP


163664 24-Oct-2006 sobomax

Regen.


163663 24-Oct-2006 sobomax

Fix kernel breakage introduced in the previous commit (redeclatation
of the audit functions).


163658 24-Oct-2006 rwatson

Regenerate.


163657 24-Oct-2006 rwatson

Hook up audit functions in the freebsd32 compatibility code. It is
believed these likely don't require wrappers.

Reported by: sobomax
MFC after: 3 days


163606 22-Oct-2006 rwatson

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


163451 17-Oct-2006 davidxu

Regenerate.


163450 17-Oct-2006 davidxu

Sync with master.


163381 15-Oct-2006 netchild

Fix compile (use the right variable name).


163379 15-Oct-2006 netchild

MFP4 (with some minor changes):

Implement the linux_io_* syscalls (AIO). They are only enabled if the native
AIO code is available (either compiled in to the kernel or as a module) at
the time the functions are used. If the AIO stuff is not available there
will be a ENOSYS.

From the submitter:
---snip---
DESIGN NOTES:

1. Linux permits a process to own multiple AIO queues (distinguished by
"context"), but FreeBSD creates only one single AIO queue per process.
My code maintains a request queue (STAILQ of queue(3)) per "context",
and throws all AIO requests of all contexts owned by a process into
the single FreeBSD per-process AIO queue.

When the process calls io_destroy(2), io_getevents(2), io_submit(2) and
io_cancel(2), my code can pick out requests owned by the specified context
from the single FreeBSD per-process AIO queue according to the per-context
request queues maintained by my code.

2. The request queue maintained by my code stores contrast information between
Linux IO control blocks (struct linux_iocb) and FreeBSD IO control blocks
(struct aiocb). FreeBSD IO control block actually exists in userland memory
space, required by FreeBSD native aio_XXXXXX(2).

3. It is quite troubling that the function io_getevents() of libaio-0.3.105
needs to use Linux-specific "struct aio_ring", which is a partial mirror
of context in user space. I would rather take the address of context in
kernel as the context ID, but the io_getevents() of libaio forces me to
take the address of the "ring" in user space as the context ID.

To my surprise, one comment line in the file "io_getevents.c" of
libaio-0.3.105 reads:

Ben will hate me for this

REFERENCE:

1. Linux kernel source code: http://www.kernel.org/pub/linux/kernel/v2.6/
(include/linux/aio_abi.h, fs/aio.c)

2. Linux manual pages: http://www.kernel.org/pub/linux/docs/manpages/
(io_setup(2), io_destroy(2), io_getevents(2), io_submit(2), io_cancel(2))

3. Linux Scalability Effort: http://lse.sourceforge.net/io/aio.html
The design notes: http://lse.sourceforge.net/io/aionotes.txt

4. The package libaio, both source and binary:
http://rpmfind.net/linux/rpm2html/search.php?query=libaio
Simple transparent interface to Linux AIO system calls.

5. Libaio-oracle: http://oss.oracle.com/projects/libaio-oracle/
POSIX AIO implementation based on Linux AIO system calls (depending on
libaio).
---snip---

Submitted by: Li, Xiao <intron@intron.ac>


163369 15-Oct-2006 netchild

MFP4 (107868 - 107870):
Use a macro to test for a valid signal instead of doing it my hand everywhere.

Submitted by: rdivacky


163251 11-Oct-2006 keramida

Spell proc/sys/kernel/pid_max correctly in a comment.

Submitted by: rdivacky


163217 10-Oct-2006 jhb

Don't pass unused bufsz to kern_shmctl().


163216 10-Oct-2006 jhb

Only try to copyin a msqid for the IPC_SET command to msgctl(). Other
commands (such as IPC_RMID) were bogusly failing with EFAULT.

Tested by: jkim


163215 10-Oct-2006 jhb

Remove unnecessary casts before PTRIN().


163132 08-Oct-2006 netchild

- change if (cond) panic() to KASSERT.
- Dont forget to free em in a case of error.

Suggested by: ssouhlal
Submitted by: rdivacky
Tested with: LTP


163131 08-Oct-2006 netchild

- Replace homegrown check for FIFO with S_ISFIFO. [1]
- Check the status of the options before messing with it.

Inspired by: NetBSD [1]
Submitted by: rdivacky
Tested with: LTP


163129 08-Oct-2006 netchild

Implement /proc/sys/kernel/pid_max.

Submitted by: rdivacky
Tested with: LTP


163047 06-Oct-2006 davidxu

Regenerate.


163046 06-Oct-2006 davidxu

Implement 32bit umtx_lock and umtx_unlock system calls, these two system
calls are not used by libthr in RELENG_6 and HEAD, it is only used by
the libthr in RELENG-5, the _umtx_op system call can do more incremental
dirty works than these two system calls without having to introduce new
system calls or throw away old system calls when things are going on.


163020 05-Oct-2006 davidxu

Regenerate.


163019 05-Oct-2006 davidxu

Oops, add the missing file.


163018 05-Oct-2006 davidxu

Move some declaration of 32-bit signal structures into file
freebsd32-signal.h, implement sigtimedwait and sigwaitinfo system calls.


162993 03-Oct-2006 rwatson

Regenerate.


162992 03-Oct-2006 rwatson

Change getpagesize() system call audit event to more clearly indicate
that we don't audit it.

MFC after: 3 days
Obtained from: TrustedBSD Project


162954 02-Oct-2006 phk

First part of a little cleanup in the calendar/timezone/RTC handling.

Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.


162585 23-Sep-2006 netchild

MFp4:
- Linux returns ENOPROTOOPT in a case of not supported opt to setsockopt.
- Return EISDIR in pread() when arg is a directory.
- Return EINVAL instead of EFAULT when namelen is not correct in accept().
- Return EINVAL instead of EACCESS if invalid access mode is entered in
access().
- Return EINVAL instead of EADDRNOTAVAIL in a case of bad salen param
to bind().

Submitted by: rdivacky
Tested with: LTP (vfork01 fails now, but it seems to be a race and
not caused by those changes)
MFC after: 1 week


162566 23-Sep-2006 davidxu

Regenerate.


162565 23-Sep-2006 davidxu

Enable sigwait.


162552 22-Sep-2006 davidxu

Regenerate.


162551 22-Sep-2006 davidxu

Add compatible code to let 32bit libthr work on 64bit kernel.


162537 22-Sep-2006 davidxu

Regenerate.


162536 22-Sep-2006 davidxu

Add umtx support for 32bit process on AMD64 machine.


162502 21-Sep-2006 davidxu

Regenerate.


162501 21-Sep-2006 davidxu

sync with master.


162374 17-Sep-2006 rwatson

Regenerate.


162373 17-Sep-2006 rwatson

AUE_SIGALTSTACK instead of AUE_SIGPENDING for sigaltstack().

Obtained from: TrustedBSD Project
MFC after: 3 days


162358 16-Sep-2006 netchild

- don't reboot() when feed with wrong parameters (and enough permissions) [1]
- add support to power off the system [2]
- check the linux magic values [3]

Submitted by: Marcin Cieslak <saper@SYSTEM.PL> [1,2]
Modelled after: linux man page of the reboot() syscall [3]
Found by: LTP testcase "reboot02" [1]
Tested with: LTP testcase "reboot02" [1,3]
MFC after: 1 week


162201 10-Sep-2006 netchild

The Linux unlink syscall uses a different errno value when trying to unlink
a directory.

PR: 102897 [1]
Noticed by: Knut Anders Hatlen <kahatlen@gmail.com>, testrun with LTP [1]
Submitted by: Marcin Cieslak <saper@SYSTEM.PL>
Tested by: netchild (LTP test run)


162184 09-Sep-2006 netchild

- Extend the coverage of PROC_LOCK to cover wakeup(&p->p_emuldata);
- Lock the emuldata in a case when we just created it.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Suggested by: jhb


162182 09-Sep-2006 netchild

Change futex lock from mutex to sx. Make futex_get atomic (protected by the
futex lock).

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Suggested by: jhb


162179 09-Sep-2006 netchild

- don't wake every sleeper just the first one [1]
- remove debuging printf [2]

Submitted by: intron <mag@intron.ac> [1], rdivacky [2]


162167 09-Sep-2006 davidxu

The following functions need not to be reimplemented, reuse 64bit
syscalls instead:
sigqueue, thr_set_name, thr_setscheduler, thr_getscheduler,
thr_setschedparam.


161960 03-Sep-2006 rwatson

Regenerate.


161958 03-Sep-2006 rwatson

Set freebsd32 system call event identifiers for:

- old truncate, ftruncate
- old getpeername, gethostid, sethostid, getrlimit, setrlimit, killpg.
- old quota, getsockname, getdirentries.
- lgetfh
- old getdomainname, setdomainname
- sysarch, rtprio, __getcwd, jail, sigtimedwait
- extattrctl, extattr_{get,set,delete,list}_{file,fd,link}
- getresgid, getresuid, kqueue, eaccess, nmount, sendfile
- fhstatfs, kldunloadf

Right identifiers for:

- nfssvc

Remove incorrect identifier for:

- __acl_get_file

Compile tested with help of: sam
Obtained from: TrustedBSD Project


161948 03-Sep-2006 rwatson

Regenerate. Looks like someone missed doing this previously as more than
just the audit event change appears in the diff.


161947 03-Sep-2006 rwatson

Use AUE_NTP_ADJTIME instead of AUE_ADJTIME for ntp_adjtime().

Obtained from: TrustedBSD Project


161860 02-Sep-2006 rwatson

Remove two hypothetical calls to suser() in ifdef'd (and uncompilable)
svr4 code: this code would call centralized sysctl code that does
these checks also.

MFC after: 1 week
Obtained from: TrustedBSD Project
Sponsored by: nCircle Network Security, Inc.


161697 28-Aug-2006 ssouhlal

FREE -> free

Submitted by: rdivacky


161665 27-Aug-2006 netchild

Add the linux statfs64 call. This allows Tivoli backup to proceed a little
but further on -current (still not successful, but a step into the right
direction).

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Tested by: Paul Mather <paul@gromit.dlib.vt.edu>


161637 26-Aug-2006 netchild

Correct the number of retries in a futex_wake() call.

Sponsored by: Google SoC 2006
Submitted by: rdivacky


161610 25-Aug-2006 rwatson

Don't call suser_cred() directly from linux_sethostname(), as it just
wraps userland_sysctl(), which performs necessary privilege checks as
part of its normal operation.

MFC after: 1 week


161474 20-Aug-2006 netchild

Sync the MI parts for amd64 with i386 and remove the corresponding special
handling for amd64 in the common code. The MD parts for amd64 are still
outstanding, but at least this fixes some panics on amd64.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Tested by: bsam


161461 19-Aug-2006 netchild

Get rid of some nested includes.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Noticed by: jhb


161460 19-Aug-2006 ssouhlal

MALLOC -> malloc and FREE -> free

Submitted by: rdivacky
Pointed out by: jhb


161459 19-Aug-2006 ssouhlal

ifdef DEBUG a printf

Submitted by: rdivacky


161425 17-Aug-2006 imp

while (0); -> while (0) in multi-line macros


161420 17-Aug-2006 netchild

- disable some more code when osrelease=2.4.2
- protect td->td_proc->p_pid with the proc lock in linux_getpid
in the amd64 (= non i386) case [1]

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Noticed by: netchild [1]


161419 17-Aug-2006 netchild

Move some stuff into headers where they belong.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Noticed by: jhb, ssouhlal


161398 17-Aug-2006 netchild

Fix the DEBUG build:
- linux_emul.c [1]
- linux_futex.c [2]

Sponsored by: Google SoC 2006 [1]
Submitted by: rdivacky [1]
netchild [2]


161367 16-Aug-2006 peter

Grab two syscall numbers. One is used to emulate functionality that linux
has in its procfs (do a readlink of /proc/self/fd/<nn> to find the pathname
that corresponds to a given file descriptor). Valgrind-3.x needs this
functionality. This is a placeholder only at this time.


161365 16-Aug-2006 netchild

Style fixes to comments.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Noticed by: jhb, ssouhlal


161343 16-Aug-2006 jkim

Include sys/limits.h for INT_MAX. freebsd32_proto.h 1.58 does not include
sys/umtx.h any more and previously it was included from there.


161330 15-Aug-2006 jhb

Regen to propogate <prefix>_AUE_<mumble> changes as well as the earlier
systrace changes.


161328 15-Aug-2006 jhb

- Remove unused sysvec variables from various syscalls.conf.
- Send the systrace_args files for all the compat ABIs to /dev/null for
now. Right now makesyscalls.sh generates a file with a hardcoded
function name, so it wouldn't work for any of the ABIs anyway. Probably
the function name should be configurable via a 'systracename' variable
and the functions should be stored in a function pointer in the sysvec
structure.


161317 15-Aug-2006 netchild

Disable some parts of the code on amd64 for now to prevent a panic. A better
fix will come later.

Sponsored by: Google SoC 2006
Submitted by: rdivacky


161310 15-Aug-2006 netchild

Add the linux 2.6.x stuff (not used by default!):
- TLS - complete
- pid/tid mangling - complete
- thread area - complete
- futexes - complete with issues
- clone() extension - complete with some possible minor issues
- mq*/timer*/clock* stuff - complete but untested and the mq* stuff is
disabled when not build as part of the kernel with native FreeBSD mq*
support (module support for this will come later)

Tested with:
- linux-firefox - works, tested
- linux-opera - works, tested
- linux-realplay - doesnt work, issue with futexes
- linux-skype - doesnt work, issue with futexes
- linux-rt2-demo - works, tested
- linux-acroread - doesnt work, unknown reason (coredump) and sometimes
issue with futexes
- various unix utilities in linux-base-gentoo3 and linux-base-fc4:
everything tried worked

On amd64 not everything is supported like on i386, the catchup is planned for
later when the remaining bugs in the new functions are fixed.

To test this new stuff, you have to run
sysctl compat.linux.osrelease=2.6.16
to switch back use
sysctl compat.linux.osrelease=2.4.2

Don't switch while running a linux program, strange things may or may not
happen.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Some suggestions/help by: jhb, kib, manu@NetBSD.org, netchild


161304 15-Aug-2006 netchild

Add some new files needed for linux 2.6.x compatibility.

Please don't style(9) the NetBSD code, we want to stay in sync. Not imported
on a vendor branch since we need local changes.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
With help from: manu@NetBSD.org
Obtained from: NetBSD (linux_{futex,time}.*)


161094 08-Aug-2006 kib

Lock the vnode around the call to VOP_GETATTR. Move the locked code
and vn_fullpath (that call malloc(..., M_WAITOK)) from under the
vm object lock, since sleep is not allowed while holding the mutex.

Being there, wrap VOP_GETATTR call with conditional Giant aquire.
Currently this is (almost) noop because pseudofs is Giant-locked.

Tested by: kris
Approved by: pjd (mentor)
MFC after: 2 weeks


161012 05-Aug-2006 rwatson

With socket code no longer in svr4_stream.c, MAC includes are no longer
required, so GC.


160980 04-Aug-2006 brooks

Use TAILQ_EMPTY instead of checking if TAILQ_FIRST is NULL.


160799 28-Jul-2006 jhb

Regen for MPSAFE flag removal.


160798 28-Jul-2006 jhb

Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to
mark system calls as being MPSAFE:
- Stop conditionally acquiring Giant around system call invocations.
- Remove all of the 'M' prefixes from the master system call files.
- Remove support for the 'M' prefix from the script that generates the
syscall-related files from the master system call files.
- Don't explicitly set SYF_MPSAFE when registering nfssvc.


160797 28-Jul-2006 jhb

Various fixes to comments in the syscall master files including removing
cruft from the audit import and adding mention of COMPAT4 to freebsd32.


160795 28-Jul-2006 jhb

Regen.


160794 28-Jul-2006 jhb

- Explicitly lock Giant to protect the fields in the svr4_strm structure
except for s_family (which is read-only once after it is set when the
structure is created).
- Mark svr4_sys_ioctl(), svr4_sys_getmsg(), and svr4_sys_putmsg() MPSAFE.


160765 27-Jul-2006 jhb

Fix a file descriptor race I reintroduced when I split accept1() up into
kern_accept() and accept1(). If another thread closed the new file
descriptor and the first thread later got an error trying to copyout the
socket address, then it would attempt to close the wrong file object. To
fix, add a struct file ** argument to kern_accept(). If it is non-NULL,
then on success kern_accept() will store a pointer to the new file object
there and not release any of the references. It is up to the calling code
to drop the references appropriately (including a call to fdclose() in case
of error to safely handle the aforementioned race). While I'm at it, go
ahead and fix the svr4 streams code to not leak the accept fd if it gets an
error trying to copyout the streams structures.


160559 21-Jul-2006 jhb

Regen.


160558 21-Jul-2006 jhb

Clean up the svr4 socket cache and streams code some to make it more easily
locked.
- Move all the svr4 socket cache code into svr4_socket.c, specifically
move svr4_delete_socket() over from streams.c. Make the socket cache
entry structure and svr4_head private to svr4_socket.c as a result.
- Add a mutex to protect the svr4 socket cache.
- Change svr4_find_socket() to copy the sockaddr_un struct into a
caller-supplied sockaddr_un rather than giving the caller a pointer to
our internal one. This removes the one case where code outside of
svr4_socket.c could access data in the cache.
- Add an eventhandler for process_exit and process_exec to purge the cache
of any entries for the exiting or execing process.
- Add methods to init and destroy the socket cache and call them from the
svr4 ABI module's event handler.
- Conditionally grab Giant around socreate() in streamsopen().
- Use fdclose() instead of inlining it in streamsopen() when handling
socreate() failure.
- Only allocate a stream structure and attach it to a socket in
streamsopen(). Previously, if a svr4 program performed a stream
operation on an arbitrary socket not opened via the streams device,
we would attach streams state data to it and change f_ops of the
associated struct file while it was in use. The latter was especially
not safe, and if a program wants a stream object it should open it via
the streams device anyway.
- Don't bother locking so_emuldata in the streams code now that we only
touch it right after creating a socket (in streamsopen()) or when
tearing it down when the file is closed.
- Remove D_NEEDGIANT from the streams device as it is no longer needed.


160557 21-Jul-2006 jhb

Add conditional VFS Giant locking to svr4_sys_fchroot() and mark it MPSAFE.
Also, call change_dir() instead of doing part of it inline (this now adds
a mac_check_vnode_chdir() call) to match fchdir() and call
mac_check_vnode_chroot() to match chroot(). Also, use the change_root()
function to do the actual change root to match chroot().

Reviewed by: rwatson


160555 21-Jul-2006 jhb

- Pass the MPSAFE flag to namei() in linux_uselib() and handle conditional
Giant VFS locking in that function.
- Remove bogus code to handle the case where namei() returns success but a
NULL vnode pointer.
- Note that this code duplicates exec_check_permissions() and annotate
where it differs.
- Hold the vnode lock longer to protect the write to set VV_TEXT in
v_vflag.
- Mark linux_uselib() MPSAFE.

Reviewed by: rwatson


160512 19-Jul-2006 jhb

Regen.


160511 19-Jul-2006 jhb

Add conditional VFS Giant locking to svr4_sys_resolvepath() and mark it
MPSAFE.


160510 19-Jul-2006 jhb

Make svr4_sys_waitsys() a lot less ugly and mark it MPSAFE.
- If the WNOWAIT flag isn't specified and either of WEXITED or WTRAPPED is
set, then just call kern_wait() and let it do all the work. This means
that this function no longer has to duplicate the work to teardown
zombies that is done in kern_wait(). Instead, if the above conditions
aren't true, then it uses a simpler loop to implement WNOWAIT and/or
tracing for only stopped or continued processes. This function still
has to duplicate code from kern_wait() for the latter two cases, but
those are much simpler.
- Sync the code to handle the WCONTINUED and WSTOPPED cases with the
equivalent code in kern_wait().
- Fix several places that would return with the proctree lock still held.
- Lock the current process to prevent lost wakeup races when blocking.


160506 19-Jul-2006 jhb

Don't free the sockaddr in kern_bind() and kern_connect() as not all
callers pass a sockaddr allocated via malloc() from M_SONAME anymore.
Instead, free it in the callers when necessary.


160504 19-Jul-2006 jhb

Initialize svr4_head during MOD_LOAD rather than on demand.


160333 14-Jul-2006 davidxu

sync with master.


160277 11-Jul-2006 jhb

Regen.


160276 11-Jul-2006 jhb

- Add conditional VFS Giant locking to getdents_common() (linux ABIs),
ibcs2_getdents(), ibcs2_read(), ogetdirentries(), svr4_sys_getdents(),
and svr4_sys_getdents64() similar to that in getdirentries().
- Mark ibcs2_getdents(), ibcs2_read(), linux_getdents(), linux_getdents64(),
linux_readdir(), ogetdirentries(), svr4_sys_getdents(), and
svr4_sys_getdents64() MPSAFE.


160249 10-Jul-2006 jhb

- Split out kern_accept(), kern_getpeername(), and kern_getsockname() for
use by ABI emulators.
- Alter the interface of kern_recvit() somewhat. Specifically, go ahead
and hard code UIO_USERSPACE in the uio as that's what all the callers
specify. In place, add a new uioseg to indicate what type of pointer
is in mp->msg_name. Previously it was always a userland address, but
ABI emulators may pass in kernel-side sockaddrs. Also, remove the
namelenp field and instead require the two places that used it to
explicitly copy mp->msg_namelen out to userland.
- Use the patched kern_recvit() to replace svr4_recvit() and the stock
kern_sendit() to replace svr4_sendit().
- Use kern_bind() instead of stackgap use in ti_bind().
- Use kern_getpeername() and kern_getsockname() instead of stackgap in
svr4_stream_ti_ioctl().
- Use kern_connect() instead of stackgap in svr4_do_putmsg().
- Use kern_getpeername() and kern_accept() instead of stackgap in
svr4_do_getmsg().
- Retire the stackgap from SVR4 compat as it is no longer used.


160246 10-Jul-2006 jhb

Unexpand PTRIN() in several places and fix one instance where 0 was being
used instead of NULL.


160190 08-Jul-2006 jhb

Add a kern_close() so that the ABIs can close a file descriptor w/o having
to populate a close_args struct and change some of the places that do.


160187 08-Jul-2006 jhb

Rework kern_semctl a bit to always assume the UIO_SYSSPACE case. This
mostly consists of pushing a few copyin's and copyout's up into
__semctl() as all the other callers were already doing the UIO_SYSSPACE
case. This also changes kern_semctl() to set the return value in a passed
in pointer to a register_t rather than td->td_retval[0] directly so that
callers can only set td->td_retval[0] if all the various copyout's succeed.

As a result of these changes, kern_semctl() no longer does copyin/copyout
(except for GETALL/SETALL) so simplify the locking to acquire the semakptr
mutex before the MAC check and hold it all the way until the end of the
big switch statement. The GETALL/SETALL cases have to temporarily drop it
while they do copyin/malloc and copyout. Also, simplify the SETALL case to
remove handling for a non-existent race condition.


160143 06-Jul-2006 jhb

- Protect the list of linux ioctl handlers with an sx lock.
- Hold Giant while calling linux ioctl handlers for now as they aren't all
known to be MPSAFE yet.
- Mark linux_ioctl() MPSAFE.


160141 06-Jul-2006 jhb

Don't try to copyin extra data for IPC_RMID requests to msgctl() or
shmctl(). None of the other ABI's do this (including the native FreeBSD
ABI), and uselessly trying to do a copyin() can actually result in a
bogus EFAULT if the a process specifies NULL for the optional argument
(which is what they should do in this case).


160063 01-Jul-2006 markm

Housekeeping. Update for maintainers who have handed in their commit bits
or (in my case) no longer feel that oversight is necessary.


159995 27-Jun-2006 netchild

Improve linprovfs to provide/fix the
- process state (idle, sleeping, running, ...) [1]
- the process group ID of the process which owns the connected tty
- some page fault stats
- time spend in kernel/userland
- priority/nice value
- starttime [1]
- memory/swap stats
- scheduling policy

Additionally add some new fields and correct some not filled out ones.

This brings us down to 15 dummy fields.

The fields marked with [1] are needed to get Oracle 10 running. The starttime
field is not completely right, since it displays the _same_ starttime for
_every_ process, but at least it is not 0 and Oracle accepts this.

This is a RELENG_x_y candidate.

Noticed by: Dmitry Ganenko <dima@apk-inform.com> [1]
Reviewed by: des, rdivacky
MFC after: 1 week


159994 27-Jun-2006 jhb

Regen.


159993 27-Jun-2006 jhb

Use kern_shmctl() in svr4_sys_shmctl() and drop use of the stackgap. Mark
svr4_sys_shmctl() MPSAFE.


159992 27-Jun-2006 jhb

Axe the stackgap macros as the Linux ABIs no longer use the stackgap.


159991 27-Jun-2006 jhb

- Add a kern_semctl() helper function for __semctl(). It accepts a pointer
to a copied-in copy of the 'union semun' and a uioseg to indicate which
memory space the 'buf' pointer of the union points to. This is then used
in linux_semctl() and svr4_sys_semctl() to eliminate use of the stackgap.
- Mark linux_ipc() and svr4_sys_semsys() MPSAFE.


159983 27-Jun-2006 jhb

Regen.


159982 27-Jun-2006 jhb

- Expand the scope of Giant some in mount(2) to protect the vfsp structure
from going away. mount(2) is now MPSAFE.
- Expand the scope of Giant some in unmount(2) to protect the mp structure
(or rather, to handle concurrent unmount races) from going away.
umount(2) is now MPSAFE, as well as linux_umount() and linux_oldumount().
- nmount(2) and linux_mount() were already MPSAFE.


159961 26-Jun-2006 jhb

Regen.


159960 26-Jun-2006 jhb

Change svr4_sys_break() to just call obreak() and mark it MPSAFE.

Not objected to by: alc


159958 26-Jun-2006 jhb

- Sync with master: rmdir(), mkdir(), and extattr_*() are all MPSAFE.
- freebsd32_utimes() is MPSAFE.


159896 23-Jun-2006 netchild

The linux times syscall can be called with a NULL pointer, so keep cool
and don't panic.

This fix is different from the patch submitted as it not only prevents
a NULL-pointer dereference, but also skips some work in this case.

Noticed by: Dmitry Ganenko <dima@apk-inform.com>
Reviewed by: rdivacky (the original version as in emulation@)
MFC after: 1 week
Security: This is a RELENG_x_y candidate (local DoS).
Go ahead by: secteam (cperciva)


159856 22-Jun-2006 dds

Move conditional preprocessing out of the SYSCTL_ADD_STRING macro
invocation. Per C99 6.10.3 paragraph 11 preprocessing directives
appearing inside macro arguments yield undefined behavior.


159808 20-Jun-2006 jhb

Conditionally acquire Giant around VFS operations.


159797 20-Jun-2006 jhb

- Add a new linker_file_foreach() function that walks the list of linker
file objects calling a user-specified predicate function on each object.
The iteration terminates either when the entire list has been iterated
over or the predicate function returns a non-zero value.
linker_file_foreach() returns the value returned by the last invocation
of the predicate function. It also accepts a void * context pointer that
is passed to the predicate function as well. Using an iterator function
avoids exposing linker internals to the rest of the kernel making locking
simpler.
- Use linker_file_foreach() instead of walking the list of linker files
manually to lookup ndis files in ndis(4).
- Use linker_file_foreach() to implement linker_hwpmc_list_objects().


159548 12-Jun-2006 jhb

Forcefully turn off GPROF in this file if it is enabled as GPROF's
attempt to use a macro for 'ret' doesn't play well with the wrappers
trying to implement 'Pascal-style' calling conventions.


159544 12-Jun-2006 des

Add the model name, obtained from the hw.model sysctl variable.

MFC after: 3 weeks


159412 08-Jun-2006 ps

Do not copy out the iovec in the 32bit recvmsg call since soreceive
calls uiomove directly.

Reviewed by: ups
MFC after: 1 week


159170 02-Jun-2006 des

As far as I can tell, the correct CPU family for amd64 (which Linux calls
x86_64) is 15, not 6.

MFC after: 3 weeks


158658 16-May-2006 ambrisko

Fix file leaking in translate_path_major_minor.


158651 16-May-2006 phk

Since DELAY() was moved, most <machine/clock.h> #includes have been
unnecessary.


158471 12-May-2006 jhb

Remove various bits of conditional Alpha code and fixup a few comments.


158434 11-May-2006 ambrisko

Remove the dependency on procfs since it isn't used.

Noticed by: des


158415 10-May-2006 netchild

Now that we don't have a linuxolator on alpha anymore:
- unifdef __alpha__
- revert rev. 1.66 of linux_socket.c


158406 10-May-2006 netchild

Implement rt_sigpending in the linuxolator.

PR: 92671
Submitted by: Markus Niemist"o <markus.niemisto@gmx.net>


158381 09-May-2006 ambrisko

Add in linsysfs. A linux 2.6 like sys filesystem to pacify the Linux
LSI MegaRAID SAS utility.

Sponsored by: IronPort Systems
Man page help from: brueffer


158312 05-May-2006 ambrisko

Fix the the duplicate cut-n-paste in linux_fstat64 pointed out by
Alexander Leidinger. I forget to fix it in this version.


158311 05-May-2006 ambrisko

Enhance the Linux emulation layer to make MegaRAID SAS managements tool happy.
Add back in a scheme to emulate old type major/minor numbers via hooks into
stat, linprocfs to return major/minors that Linux app's expect. Currently
only /dev/null is always registered. Drivers can register via the Linux
type shim similar to the ioctl shim but by using
linux_device_register_handler/linux_device_unregister_handler functions.
The structure is:

struct linux_device_handler {
char *bsd_driver_name;
char *linux_driver_name;
char *bsd_device_name;
char *linux_device_name;
int linux_major;
int linux_minor;
int linux_char_device;
};

Linprocfs uses this to display the major number of the driver. The
soon to be available linsysfs will use it to fill in the driver name.
Linux_stat uses it to translate the major/minor into Linux type values.

Note major numbers are dynamically assigned via passing in a -1 for
the major number so we don't need to keep track of them.

This is somewhat needed due to us switching to our devfs. MegaCli
will not run until I add in the linsysfs and mfi Linux compat changes.

Sponsored by: IronPort Systems


157369 01-Apr-2006 rwatson

Annotate uses of fgetsock() with indications that they should rely
on their existing file descriptor references to sockets, rather than
use fgetsock() to retrieve a direct socket reference.

MFC after: 3 months


157286 30-Mar-2006 ps

regen for 32bit System V shared memory


157285 30-Mar-2006 ps

Properly support for FreeBSD 4 32bit System V shared memory.

Submitted by: peter
Obtained from: Yahoo!
MFC after: 3 weeks


157189 27-Mar-2006 avatar

Unbreaking build by removing a now unused variable.


157183 27-Mar-2006 jhb

Use td_ucred rather than p_ucred to avoid panics and general unhappiness.

Pointy hat to: netchild


156976 21-Mar-2006 netchild

Fix the LINT build on alpha:
- rename some file local structure definitions, the names clash with
autogenerated names
- on !alpha add some compatibility defines for those renamed structures
- make some functions globally visible on alpha


156921 20-Mar-2006 netchild

Fix tinderbox on alpha.

Tested by: cross-compile


156874 19-Mar-2006 ru

Unbreak COMPAT_LINUX32 option support on amd64.

Broken by: netchild


156850 18-Mar-2006 netchild

Fixup some problems in my previous commit (COMPAT_43).

Pointyhat to: netchild


156842 18-Mar-2006 netchild

Get rid of the need of COMPAT_43 in the linuxolator.

Submitted by: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Obtained from: DragonFly (some parts)


156440 08-Mar-2006 ups

Fix exec_map resource leaks.

Tested by: kris@


156266 04-Mar-2006 ps

use strlcpy in cvtstatfs and copy_statfs instead of bcopy to ensure
the copied strings are properly terminated.

bzero the statfs32 struct in copy_statfs.


156115 28-Feb-2006 ps

regen for 32bit sendfile


156114 28-Feb-2006 ps

Fix 32bit sendfile by implementing kern_sendfile so that it takes
the header and trailers as iovec arguments instead of copying them
in inside of sendfile.

Reviewed by: jhb
MFC after: 3 weeks


155402 06-Feb-2006 jhb

- Always call exec_free_args() in kern_execve() instead of doing it in all
the callers if the exec either succeeds or fails early.
- Move the code to call exit1() if the exec fails after the vmspace is
gone to the bottom of kern_execve() to cut down on some code duplication.


155382 06-Feb-2006 jeff

- Remove ifdef disabled code that doesn't have a chance of working anymore.


155295 04-Feb-2006 rwatson

Regenerate.


155294 04-Feb-2006 rwatson

Audit FreeBSD 32-bit system calls on 64-bit FreeBSD systems.

Obtained from: TrustedBSD Project


155033 30-Jan-2006 jeff

- vn_lock with LK_RETRY can not return an error. The code that handled this
case was not necessary.

Sponsored by: Isilon Systems, Inc.


154872 26-Jan-2006 cognet

Fix a typo : deivce => device

Spotted by: rwatson


154834 26-Jan-2006 cognet

Linux compat bits needed to make linux programs use the new ptys :
linux_ioctl.[ch] : Implement LINUX_TIOCGPTN, which returns the pty number
linux_stats.c :
- Return the magic number for devfs.
- In various stats()-related functions, check that we're stating a
file in /dev/pts, and if so, change the st_rdev field to match what linux
expects to be there for a slave pty device. The glibc checks for this, and
their openpty() fails if it is no correct.


154596 20-Jan-2006 ambrisko

Fix the build. When I added the lutimes the futimes definitions
went away in the generated files? This didn't happen on my amd64
test machine but did when I committed it on my other i386 machine.
I need to figure this out since a regen on the amd64 doesn't fix it
now. For now make the build work again. Matt caught this before
my local mirror caught up.


154587 20-Jan-2006 ambrisko

Regen.


154586 20-Jan-2006 ambrisko

Add 32bit version of lutimes so untar doesn't mess up sym-links on amd64.


153775 28-Dec-2005 trhodes

Cast tv_sec to intmax_t and print with %jd in some ifdef'ed code.


153744 27-Dec-2005 glebius

Add \n to log() message.

Submitted by: Stanislaw Halik <weirdo tehran.lain.pl>


153741 26-Dec-2005 sobomax

Remove kern.elf32.can_exec_dyn sysctl. Instead extend Brandinfo structure
with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually
allow executing elf dynamic binaries (aka shared libraries). When it is
requested to execute ET_DYN elf image check if this flag is on after we
know the elf brand allowing execution if so.

PR: kern/87615
Submitted by: Marcin Koziej <creep@desk.pl>


153692 23-Dec-2005 ru

Regen.


153691 23-Dec-2005 ru

Fix build.


153681 23-Dec-2005 phk

Regenerate sysent with new abort2 system call.

Implement abort2(const char *reason, int narg, void **args);

Submitted by: "Wojciech A. Koszek" <dunstan@freebsd.czest.pl>


153680 23-Dec-2005 phk

Add missing 455-462 syscalls as unimplemented


153679 23-Dec-2005 phk

Add abort2() systemcall.


153448 15-Dec-2005 jhb

Remove linux_mib_destroy() (which I actually added in between 5.0 and 5.1)
which existed to cleanup the linux_osname mutex. Now that MTX_SYSINIT()
has grown a SYSUNINIT to destroy mutexes on unload, the extra destroy here
was redundant and resulted in panics in debug kernels.

MFC after: 1 week
Reported by: Goran Gajic ggajic at afrodita dot rcub dot bg dot ac dot yu


153378 13-Dec-2005 delphij

In Linux, kernel parameters passed to ioctl are by value, while in FreeBSD
they are passed by reference. Handle the difference within the
linux_ioctl_termio on the LINUX_TCFLSH path.

Submitted by: Jaroslav Drzik <jaro_AT_coop-voz_dot_sk>


153310 11-Dec-2005 mlaier

Fix calculation of meminfo's swaptotal and swapfree on at least amd64.

MFC after: 3 days


153248 08-Dec-2005 ambrisko

Regen for futimes.


153247 08-Dec-2005 ambrisko

Add 32bit version of futimes so untar doesn't result in bad dates
(Jan 1, 1970) when run on amd64.

Reviewed by: ps


153236 08-Dec-2005 glebius

Suppress logging about unimplemented syscalls to one time per process. This
prevents hard flood of the system console.

Reviewed by: bde


153180 06-Dec-2005 peter

Catch up to the system siginfo changes. Use a union for the ia32 layout
of siginfo just like the system one. There are now two fields to copy
instead of one.


153072 04-Dec-2005 ru

Fix -Wundef.


152912 29-Nov-2005 rodrigc

Remove MNT_NODEV mount option. In RELENG_6, MNT_NODEV was a no-op.
The presence of MNT_NODEV was confusing the am-utils autoconf scripts.

PR: conf/79715


152721 23-Nov-2005 wpaul

Somehow memmove() got mapped to memset() in the patch table. Create a
real memmove() implementation and use that instead.


152626 20-Nov-2005 wpaul

Correct the API for Windows interupt handling a little. The prototype
for a Windows ISR is 'BOOLEAN isrfunc(KINTERRUPT *, void *)' meaning
the ISR get a pointer to the interrupt object and a context pointer,
and returns TRUE if the ISR determines the interrupt was really generated
by the associated device, or FALSE if not.

I had mistakenly used 'void isrfunc(void *)' instead. It happens the
only thing this affects is the internal ndis_intr() ISR in subr_ndis.c,
but it should be fixed just in case we ever need to register a real
Windows ISR vi IoConnectInterrupt().

For NDIS miniports that provide a MiniportISR() method, the 'is_our_intr'
value returned by the method serves as the return value from ndis_isr(),
and 'call_isr' is used to decide whether or not to schedule the interrupt
handler via DPC. For drivers that only supply MiniportEnableInterrupt()
and MiniportDisableInterrupt() methods, call_isr is always TRUE and
is_our_intr is always FALSE.

In the end, there should be no functional changes, except that now
ntoskrnl_intr() can terminate early once it finds the ISR that wants
to service the interrupt.


152423 14-Nov-2005 ru

Unlike the rest of the world, NDIS code can access "struct
ifnet" before is has been fully initialized by if_attach().
Account for that to avoid a null pointer dereference.


152399 13-Nov-2005 wpaul

Restore backwards source compatibility with 6.x and 5.x.


152315 11-Nov-2005 ru

- Store pointer to the link-level address right in "struct ifnet"
rather than in ifindex_table[]; all (except one) accesses are
through ifp anyway. IF_LLADDR() works faster, and all (except
one) ifaddr_byindex() users were converted to use ifp->if_addr.

- Stop storing a (pointer to) Ethernet address in "struct arpcom",
and drop the IFP2ENADDR() macro; all users have been converted
to use IF_LLADDR() instead.


152257 10-Nov-2005 wpaul

Implement RtlZeroMemory() and RtlCopyMemory(). This seems to allow
the Broadcom Win64 wireless driver for the BCM4318 to work on amd64.


152159 07-Nov-2005 wpaul

Change the definition for EXT_NDIS to EXT_NET_DRV. Since the latest
mbuf code changes, MEXTADD() can be used to add an external buffer with
arbitrary type, but mb_ext_free() won't let you free it.


152136 06-Nov-2005 wpaul

The latest version of the Intel 2200BG/2915ABG driver (9.0.0.3-9) from
Intel's web site requires some minor tweaks to get it to work:

- The driver seems to have been released with full WMI tracing enabled,
and makes references to some WMI APIs, namely IoWMIRegistrationControl(),
WmiQueryTraceInformation() and WmiTraceMessage(). Only the first
one is ever called (during intialization). These have been implemented
as do-nothing stubs for now. Also added a definition for STATUS_NOT_FOUND
to ntoskrnl_var.h, which is used as a return code for one of the WMI
routines.

- The driver references KeRaiseIrqlToDpcLevel() and KeLowerIrql()
(the latter as a function, which is unusual because normally
KeLowerIrql() is a macro in the Windows DDK that calls KfLowewIrql()).
I'm not sure why these are being called since they're not really
part of WDM. Presumeably they're being used for backwards
compatibility with old versions of Windows. These have been
implemented in subr_hal.c. (Note that they're _stdcall routines
instead of _fastcall.)

- When querying the OID_802_11_BSSID_LIST OID to get a BSSID list,
you don't know ahead of time how many networks the NIC has found
during scanning, so you're allowed to pass 0 as the list length.
This should cause the driver to return an 'insufficient resources'
error and set the length to indicate how many bytes are actually
needed. However for some reason, the Intel driver does not honor
this convention: if you give it a length of 0, it returns some
other error and doesn't tell you how much space is really needed.
To get around this, if using a length of 0 yields anything besides
the expected error case, we arbitrarily assume a length of 64K.
This is similar to the hack that wpa_supplicant uses when doing
a BSSID list query.


152134 06-Nov-2005 ps

Copy out the number of iovecs in freebsd32_recvmsg, not the length
of a single iovec.


151980 02-Nov-2005 ps

Calling setrlimit from 32bit apps could potentially increase certain
limits beyond what should be capiable in a 32bit process, so we
must fixup the limits.

Reviewed by: jhb


151977 02-Nov-2005 wpaul

Tests with my dual Opteron system have shown that it's possible
for code to start out on one CPU when thunking into Windows
mode in ctxsw_utow(), and then be pre-empted and migrated to another
CPU before thunking back to UNIX mode in ctxsw_wtou(). This is
bad, because then we can end up looking at the wrong 'thread environment
block' when trying to come back to UNIX mode. To avoid this, we now
pin ourselves to the current CPU when thunking into Windows code.

Few other cleanups, since I'm here:

- Get rid of the ndis_isr(), ndis_enable_interrupt() and
ndis_disable_interrupt() wrappers from kern_ndis.c and just invoke
the miniport's methods directly in the interrupt handling routines
in subr_ndis.c. We may as well lose the function call overhead,
since we don't need to export these things outside of ndis.ko
now anyway.

- Remove call to ndis_enable_interrupt() from ndis_init() in if_ndis.c.
We don't need to do it there anyway (the miniport init routine handles
it, if needed).

- Fix the logic in NdisWriteErrorLogEntry() a little.

- Change some NDIS_STATUS_xxx codes in subr_ntoskrnl.c into STATUS_xxx
codes.

- Handle kthread_create() failure correctly in PsCreateSystemThread().


151967 02-Nov-2005 andre

Retire MT_HEADER mbuf type and change its users to use MT_DATA.

Having an additional MT_HEADER mbuf type is superfluous and redundant
as nothing depends on it. It only adds a layer of confusion. The
distinction between header mbuf's and data mbuf's is solely done
through the m->m_flags M_PKTHDR flag.

Non-native code is not changed in this commit. For compatibility
MT_HEADER is mapped to MT_DATA.

Sponsored by: TCP/IP Optimization Fundraise 2005


151924 01-Nov-2005 wpaul

Clean up one remaining 'multiple DPC thread' bogon: only bzero() one
sizeof(kq_queue), not sizeof(kq_queue) * mp_ncpus.


151909 31-Oct-2005 ps

Reformat socket control messages on input/output for 32bit compatibility
on 64bit systems.

Submitted by: ps, ups
Reviewed by: jhb


151721 26-Oct-2005 peter

Regenerate (with the correct #ifdef COMPAT_43 tests now)


151720 26-Oct-2005 peter

There is no 'freebsd3_' prefix for COMPAT_43 syscalls. Those are all
bundled under MCOMPAT and have an 'o' prefix. Adjust as appropriate.
This re-enables compiling without COMPAT_43 again.


151709 26-Oct-2005 wpaul

Minor nit: in ntoskrnl_finddev(), only free the 'children' device_t
array if device_find_children() actually returned a non-NULL array pointer.


151703 26-Oct-2005 wpaul

Clean up and apply the fix for PR 83477. The calculation for locating
the start of the section headers has to take into account the fact
that the image_nt_header is really variable sized. It happens that
the existing calculation is correct for _most_ production binaries
produced by the Windows DDK, but if we get a binary with oddball
offsets, the PE loader could crash.

Changes from the supplied patch are:

- We don't really need to use the IMAGE_SIZEOF_NT_HEADER() macro when
computing how much of the header to return to callers of
pe_get_optional_header(). While it's important to take the variable
size of the header into account in other calculations, we never
actually look at anything outside the non-variable portion of the
header. This saves callers from having to allocate a variable sized
buffer off the heap (I purposely tried to avoid using malloc()
in subr_pe.c to make it easier to compile in both the -D_KERNEL and
!-D_KERNEL case), and since we're copying into a buffer on the
stack, we always have to copy the same amount of data or else
we'll trash the stack something fierce.

- We need <stddef.h> to get offsetof() in the !-D_KERNEL case.

- ndiscvt.c needs the IMAGE_FIRST_SECTION() macro too, since it does
a little bit of section pre-processing.

PR: kern/83477


151691 26-Oct-2005 wpaul

Get rid of the timer tracking and reaping code in NdisMInitializeTimer()
and ndis_halt_nic(). It's been disabled for some time anyway, and
it turns out there's a possible deadlock in NdisMInitializeTimer() when
acquiring the miniport block lock to modify the timer list: it's
possible for a driver to call NdisMInitializeTimer() when the miniport
block lock has already been acquired by an earlier piece of code. You
can't acquire the same spinlock twice, so this can deadlock.

Also, implement MmMapIoSpace() and MmUnmapIoSpace(), and make
NdisMMapIoSpace() and NdisMUnmapIoSpace() use them. There are some
drivers that want MmMapIoSpace() and MmUnmapIoSpace() so that they can
map arbitrary register spaces not directly associated with their
device resources. For example, there's an Atheros driver for
a miniPci card (0x168C:0x1014) on the IBM Thinkpad x40 that wants
to map some I/O spaces at 0xF00000 and 0xE00000 which are held by
the acpi0 device. I don't know what it wants these ranges for,
but if it can't map and access them, the MiniportInitialize() method
fails.


151606 24-Oct-2005 wpaul

Fix handling of message table messages that got broken when I
converted NdisWriteErrorLogEntry() to use the RtlXXX unicode/ansi
conversion routines.


151597 23-Oct-2005 obrien

Add a 'clean' target.


151583 23-Oct-2005 ps

regen


151582 23-Oct-2005 ps

Implement for FreeBSD 3 32 binaries:
sigaction, sigprocmask, sigpending, sigvec, sigblock, sigsetmask,
sigsuspend, sigstack


151548 22-Oct-2005 wpaul

Make the multiple DPC threads an option, and create only one by default.
This avoids the need for sched_bind() in the default case so that you
can start up the NDIS subsystem at boot time when only CPU 0 is running.

There are potentially ways to fix it so that the DPC threads aren't
started until after the other CPUs are launched, but doing it correctly
is tricky. You need to defer the startup of the ntoskrnl subsystem
(ntoskrnl_libinit()), not just defer ndis_attach().

For now, I don't think it will make much difference having just the
single DPC thread (I started out with just one anyway). Note that this
turns the KeSetTargetProcessorDpc() routine into a no-op, since the
CPU number in struct kdpc is now ignored.


151529 21-Oct-2005 wpaul

Correct the macro definition for KeRaiseIrql(). The official API
is KeRaiseIrql(newirql, &oldirql), not oldirql = KeRaiseIrql(newirql).
(The macro ultimately translates to KfRaiseIrql() which does use
the latter API, so this has no effect on generated code.)

Also, wait for thread termination the right way: kthread_exit()
will ultimately do a wakeup(td->td_proc). This is the event we
should wait on. Eliminate the previous synchronization machinery
for this since it was never guaranteed to work correctly.


151520 20-Oct-2005 wpaul

Use sched_bind() to make sure the DPC threads are bound to the correct
processor, to insure DPC thread 0 runs on CPU0, DPC thread 1 runs on
CPU1, and so on.

Elevate the priority of the workitem threads, though don't use as
high a priority as the DPC threads.


151463 19-Oct-2005 davidxu

Fix compiling problem by adding prefix name svr4 to si_xxx macro, the
si_xxx macro should not be used in compat headers, as these are standard
member names or only can be used in our native header file signal.h.


151451 18-Oct-2005 wpaul

Another round of cleanups and fixes:

- Change ndis_return() from a DPC to a workitem so that it doesn't
run at DISPATCH_LEVEL (with the dispatcher lock held).

- In if_ndis.c, submit packets to the stack via (*ifp->if_input)() in
a workitem instead of doing it directly in ndis_rxeof(), because
ndis_rxeof() runs in a DPC, and hence at DISPATCH_LEVEL. This
implies that the 'dispatch level' mutex for the current CPU is
being held, and we don't want to call if_input while holding
any locks.

- Reimplement IoConnectInterrupt()/IoDisconnectInterrupt(). The original
approach I used to track down the interrupt resource (by scanning
the device tree starting at the nexus) is prone to problems when
two devices share an interrupt. (E.g removing ndis1 might disable
interrupts for ndis0.) The new approach is to multiplex all the
NDIS interrupts through a common internal dispatcher (ntoskrnl_intr())
and allow IoConnectInterrupt()/IoDisconnectInterrupt() to add or
remove interrupts from the dispatch list.

- Implement KeAcquireInterruptSpinLock() and KeReleaseInterruptSpinLock().

- Change the DPC and workitem threads to use the KeXXXSpinLock
API instead of mtx_lock_spin()/mtx_unlock_spin().

- Simplify the NdisXXXPacket routines by creating an actual
packet pool structure and using the InterlockedSList routines
to manage the packet queue.

- Only honor the value returned by OID_GEN_MAXIMUM_SEND_PACKETS
for serialized drivers. For deserialized drivers, we now create
a packet array of 64 entries. (The Microsoft DDK documentation
says that for deserialized miniports, OID_GEN_MAXIMUM_SEND_PACKETS
is ignored, and the driver for the Marvell 8335 chip, which is
a deserialized miniport, returns 1 when queried.)

- Clean up timer handling in subr_ntoskrnl.

- Add the following conditional debugging code:
NTOSKRNL_DEBUG_TIMERS - add debugging and stats for timers
NDIS_DEBUG_PACKETS - add extra sanity checking for NdisXXXPacket API
NTOSKRNL_DEBUG_SPINLOCKS - add test for spinning too long

- In kern_ndis.c, always start the HAL first and shut it down last,
since Windows spinlocks depend on it. Ntoskrnl should similarly be
started second and shut down next to last.


151360 15-Oct-2005 ps

regen after recvmsg, recvfrom, sendmsg


151359 15-Oct-2005 ps

Implement the 32bit versions of recvmsg, recvfrom, sendmsg

Partially obtained from: jhb


151358 15-Oct-2005 ps

regen for clock_gettime, clock_settime, clock_getres


151357 15-Oct-2005 ps

Implement 32bit wrappers for clock_gettime, clock_settime, and
clock_getres.


151356 15-Oct-2005 ps

regen


151355 15-Oct-2005 ps

Correct the prototype for freebsd32_nanosleep and use the proper
size when copying struct timespec32 in and out.


151316 14-Oct-2005 davidxu

1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
sendsig use discrete parameters, now they uses member fields of
ksiginfo_t structure. For sendsig, this change allows us to pass
POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
an extra siginfo list holds additional siginfo_t data for signals.
kernel code uses psignal() still behavior as before, it won't be failed
even under memory pressure, only exception is when deleting a signal,
we should call sigqueue_delete to remove signal from sigqueue but
not SIGDELSET. Current there is no kernel code will deliver a signal
with additional data, so kernel should be as stable as before,
a ksiginfo can carry more information, for example, allow signal to
be delivered but throw away siginfo data if memory is not enough.
SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
not be caught or masked.
The sigqueue() syscall allows user code to queue a signal to target
process, if resource is unavailable, EAGAIN will be returned as
specification said.
Just before thread exits, signal queue memory will be freed by
sigqueue_flush.
Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64


151248 12-Oct-2005 wpaul

Convert ndis_set_info() and ndis_get_info() from using msleep()
to KeSetEvent()/KeWaitForSingleObject(). Also make object argument
of KeWaitForSingleObject() a void * like it's supposed to be.


151207 10-Oct-2005 wpaul

This commit makes a big round of updates and fixes many, many things.

First and most importantly, I threw out the thread priority-twiddling
implementation of KeRaiseIrql()/KeLowerIrq()/KeGetCurrentIrql() in
favor of a new scheme that uses sleep mutexes. The old scheme was
really very naughty and sought to provide the same behavior as
Windows spinlocks (i.e. blocking pre-emption) but in a way that
wouldn't raise the ire of WITNESS. The new scheme represents
'DISPATCH_LEVEL' as the acquisition of a per-cpu sleep mutex. If
a thread on cpu0 acquires the 'dispatcher mutex,' it will block
any other thread on the same processor that tries to acquire it,
in effect only allowing one thread on the processor to be at
'DISPATCH_LEVEL' at any given time. It can then do the 'atomic sit
and spin' routine on the spinlock variable itself. If a thread on
cpu1 wants to acquire the same spinlock, it acquires the 'dispatcher
mutex' for cpu1 and then it too does an atomic sit and spin to try
acquiring the spinlock.

Unlike real spinlocks, this does not disable pre-emption of all
threads on the CPU, but it does put any threads involved with
the NDISulator to sleep, which is just as good for our purposes.

This means I can now play nice with WITNESS, and I can safely do
things like call malloc() when I'm at 'DISPATCH_LEVEL,' which
you're allowed to do in Windows.

Next, I completely re-wrote most of the event/timer/mutex handling
and wait code. KeWaitForSingleObject() and KeWaitForMultipleObjects()
have been re-written to use condition variables instead of msleep().
This allows us to use the Windows convention whereby thread A can
tell thread B "wake up with a boosted priority." (With msleep(), you
instead have thread B saying "when I get woken up, I'll use this
priority here," and thread A can't tell it to do otherwise.) The
new KeWaitForMultipleObjects() has been better tested and better
duplicates the semantics of its Windows counterpart.

I also overhauled the IoQueueWorkItem() API and underlying code.
Like KeInsertQueueDpc(), IoQueueWorkItem() must insure that the
same work item isn't put on the queue twice. ExQueueWorkItem(),
which in my implementation is built on top of IoQueueWorkItem(),
was also modified to perform a similar test.

I renamed the doubly-linked list macros to give them the same names
as their Windows counterparts and fixed RemoveListTail() and
RemoveListHead() so they properly return the removed item.

I also corrected the list handling code in ntoskrnl_dpc_thread()
and ntoskrnl_workitem_thread(). I realized that the original logic
did not correctly handle the case where a DPC callout tries to
queue up another DPC. It works correctly now.

I implemented IoConnectInterrupt() and IoDisconnectInterrupt() and
modified NdisMRegisterInterrupt() and NdisMDisconnectInterrupt() to
use them. I also tried to duplicate the interrupt handling scheme
used in Windows. The interrupt handling is now internal to ndis.ko,
and the ndis_intr() function has been removed from if_ndis.c. (In
the USB case, interrupt handling isn't needed in if_ndis.c anyway.)

NdisMSleep() has been rewritten to use a KeWaitForSingleObject()
and a KeTimer, which is how it works in Windows. (This is mainly
to insure that the NDISulator uses the KeTimer API so I can spot
any problems with it that may arise.)

KeCancelTimer() has been changed so that it only cancels timers, and
does not attempt to cancel a DPC if the timer managed to fire and
queue one up before KeCancelTimer() was called. The Windows DDK
documentation seems to imply that KeCantelTimer() will also call
KeRemoveQueueDpc() if necessary, but it really doesn't.

The KeTimer implementation has been rewritten to use the callout API
directly instead of timeout()/untimeout(). I still cheat a little in
that I have to manage my own small callout timer wheel, but the timer
code works more smoothly now. I discovered a race condition using
timeout()/untimeout() with periodic timers where untimeout() fails
to actually cancel a timer. I don't quite understand where the race
is, using callout_init()/callout_reset()/callout_stop() directly
seems to fix it.

I also discovered and fixed a bug in winx32_wrap.S related to
translating _stdcall calls. There are a couple of routines
(i.e. the 64-bit arithmetic intrinsics in subr_ntoskrnl) that
return 64-bit quantities. On the x86 arch, 64-bit values are
returned in the %eax and %edx registers. However, it happens
that the ctxsw_utow() routine uses %edx as a scratch register,
and x86_stdcall_wrap() and x86_stdcall_call() were only preserving
%eax before branching to ctxsw_utow(). This means %edx was getting
clobbered in some cases. Curiously, the most noticeable effect of this
bug is that the driver for the TI AXC110 chipset would constantly drop
and reacquire its link for no apparent reason. Both %eax and %edx
are preserved on the stack now. The _fastcall and _regparm
wrappers already handled everything correctly.

I changed if_ndis to use IoAllocateWorkItem() and IoQueueWorkItem()
instead of the NdisScheduleWorkItem() API. This is to avoid possible
deadlocks with any drivers that use NdisScheduleWorkItem() themselves.

The unicode/ansi conversion handling code has been cleaned up. The
internal routines have been moved to subr_ntoskrnl and the
RtlXXX routines have been exported so that subr_ndis can call them.
This removes the incestuous relationship between the two modules
regarding this code and fixes the implementation so that it honors
the 'maxlen' fields correctly. (Previously it was possible for
NdisUnicodeStringToAnsiString() to possibly clobber memory it didn't
own, which was causing many mysterious crashes in the Marvell 8335
driver.)

The registry handling code (NdisOpen/Close/ReadConfiguration()) has
been fixed to allocate memory for all the parameters it hands out to
callers and delete whem when NdisCloseConfiguration() is called.
(Previously, it would secretly use a single static buffer.)

I also substantially updated if_ndis so that the source can now be
built on FreeBSD 7, 6 and 5 without any changes. On FreeBSD 5, only
WEP support is enabled. On FreeBSD 6 and 7, WPA-PSK support is enabled.

The original WPA code has been updated to fit in more cleanly with
the net80211 API, and to eleminate the use of magic numbers. The
ndis_80211_setstate() routine now sets a default authmode of OPEN
and initializes the RTS threshold and fragmentation threshold.
The WPA routines were changed so that the authentication mode is
always set first, followed by the cipher. Some drivers depend on
the operations being performed in this order.

I also added passthrough ioctls that allow application code to
directly call the MiniportSetInformation()/MiniportQueryInformation()
methods via ndis_set_info() and ndis_get_info(). The ndis_linksts()
routine also caches the last 4 events signalled by the driver via
NdisMIndicateStatus(), and they can be queried by an application via
a separate ioctl. This is done to allow wpa_supplicant to directly
program the various crypto and key management options in the driver,
allowing things like WPA2 support to work.

Whew.


150883 03-Oct-2005 jhb

Use the constants for the syscall names from syscall.h rather than
hardcoding the numbers for the SYSVIPC syscalls.


150663 28-Sep-2005 rwatson

Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57,
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:

Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer leads to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.

This aligns this code more with the eventual locking of ttys.

Suggested by: bde


150632 27-Sep-2005 peter

Regenerate


150631 27-Sep-2005 peter

Implement 32 bit getcontext/setcontext/swapcontext on amd64. I've added
stubs for ia64 to keep it compiling. These are used by 32 bit apps such
as gdb.


150335 19-Sep-2005 rwatson

Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).

Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions. In most cases, this means adding an
acquisition of Giant immediately around the function. In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.

With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.

NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.

NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset. This needs to be fixed, but was a problem before
this change.

NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having
non-MPSAFE tty code.

MFC after: 1 week


149632 30-Aug-2005 andre

Test the mbuf flags against the correct constant. The previous version
worked as intended but only by chance. MT_HEADER == M_PKTHDR == 0x2.


149551 28-Aug-2005 delphij

Fix kernel build.

Reported by: tinderbox


149524 27-Aug-2005 rodrigc

Rewrite linux_ifconf() to be more like ifconf() in net/if.c
so that we do not call uiomove() while IFNET_RLOCK() is held.
This eliminates the witness warning:

Calling uiomove() with the following non-sleepable locks held:
exclusive sleep mutex ifnet r = 0 (0xc096dd60) locked @
/usr/src/sys/modules/linux/../../compat/linux/linux_ioctl.c:2170

MFC after: 2 days


148887 09-Aug-2005 rwatson

Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags. Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags. This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.

Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.

Reviewed by: pjd, bz
MFC after: 7 days


148541 29-Jul-2005 jhb

Add missing dependencies on the SYSVIPC modules.


148540 29-Jul-2005 jhb

Move MODULE_DEPEND() statements for SYSVIPC dependencies to linux_ipc.c
so that they aren't duplicated 3 times and are also in the same file as
the code that depends on the SYSVIPC modules.


147975 13-Jul-2005 jhb

Regen.


147974 13-Jul-2005 jhb

Make a pass through all the compat ABIs sychronizing the MP safe flags
with the master syscall table as well as marking several ABI wrapper
functions safe.

MFC after: 1 week


147966 13-Jul-2005 jhb

Regen.


147965 13-Jul-2005 jhb

- Stop hardcoding #define's for options and use the appropriate
opt_foo.h headers instead.
- Hook up the IPC SVR4 syscalls.

MFC after: 3 days


147964 13-Jul-2005 jhb

Wrap the ia64-specific freebsd32_mmap_partial() hack in Giant for now
since it calls into VFS and VM. This makes the freebsd32_mmap() routine
MP safe and the extra Giants here can be revisited later.

Glanced at by: marcel
MFC after: 3 days


147854 09-Jul-2005 jhb

Add Giant around linux_getcwd_common() in linux_getcwd().

Approved by: re (scottl)


147853 09-Jul-2005 jhb

Add missing locking to linux_connect() so that it can be marked MP safe:
- Conditionally grab Giant around the EISCONN hack at the end based on
debug.mpsafenet.
- Protect access to so_emuldata via SOCK_LOCK.

Reviewed by: rwatson
Approved by: re (scottl)


147837 08-Jul-2005 rik

Use implicit type cast for ->k_lock to fix compilation of ndis
as a part of the GENERIC kernel with INVARIANT* and WITNESS*
turned off.
(For non GENERIC kernel KTR and MUTEX_PROFILING should be also
off).

Submitted by: Eygene A. Ryabinkin <rea at rea dot mbslab dot kiae dot ru>
Approved by: re (scottl)
PR: 81767


147819 07-Jul-2005 jhb

Lock Giant in svr4_add_socket() so that the various svr4_*stat() calls
can be marked MP safe as this is the only part of them that is not
already MP safe.

Approved by: re (scottl)


147818 07-Jul-2005 jhb

Remove an unused syscallarg() macro leftover from this code's origins in
NetBSD.

Approved by: re (scottl)


147817 07-Jul-2005 jhb

Rototill this file so that it actually compiles. It doesn't do anything
in the build still due to some #undef's in svr4.h, but if you hack around
that and add some missing entries to syscalls.master, then this file will
now compile. The changes involved proc -> thread, using FreeBSD syscall
names instead of NetBSD, and axeing syscallarg() and retval arguments.

Approved by: re (scottl)


147816 07-Jul-2005 jhb

Fix the computation of uptime for linux_sysinfo(). Before it was returning
the uptime in seconds mod 60 which wasn't very useful.

Approved by: re (scottl)


147814 07-Jul-2005 jhb

Regenerate.

Approved by: re (scottl)


147813 07-Jul-2005 jhb

- Add two new system calls: preadv() and pwritev() which are like readv()
and writev() except that they take an additional offset argument and do
not change the current file position. In SAT speak:
preadv:readv::pread:read and pwritev:writev::pwrite:write.
- Try to reduce code duplication some by merging most of the old
kern_foov() and dofilefoo() functions into new dofilefoo() functions
that are called by kern_foov() and kern_pfoov(). The non-v functions
now all generate a simple uio on the stack from the passed in arguments
and then call kern_foov(). For example, read() now just builds a uio and
calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev().

PR: kern/80362
Submitted by: Marc Olzheim marcolz at stack dot nl (1)
Approved by: re (scottl)
MFC after: 1 week


147692 30-Jun-2005 peter

Jumbo-commit to enhance 32 bit application support on 64 bit kernels.
This is good enough to be able to run a RELENG_4 gdb binary against
a RELENG_4 application, along with various other tools (eg: 4.x gcore).
We use this at work.

ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace,
procfs and core dumps.
procfs_*regs.c: vary the format of proc/XXX/*regs depending on the client
and target application.
procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their
sscanf fails. They expect an unsigned long.
imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps.
sys_process.c: handle 32 bit consumers debugging 32 bit targets. Note
that 64 bit consumers can still debug 32 bit targets.

IA64 has got stubs for ia32_reg.c.

Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't
implemented in the 32/64 wrapper yet. We also make a tiny patch to
gdb pacify it over conflicting formats of ld-elf.so.1.

Approved by: re


147654 29-Jun-2005 jhb

- Change the commented out freebsd32_xxx() example to use kern_xxx() along
with a single copyin() + translate and translate + copyout() rather than
using the stackgap.
- Remove implementation of the stackgap for freebsd32 since it is no longer
used for that compat ABI.

Approved by: re (scottl)


147588 24-Jun-2005 jhb

Correct the amount of data to allocate in these local copies of
exec_copyin_strings() to catch up to rev 1.266 of kern_exec.c. This fixes
panics on amd64 with compat binaries since exec_free_args() was freeing
more memory than these functions were allocating and the mismatch could
cause memory to be freed out from under other concurrent execs.

Approved by: re (scottl)


147559 23-Jun-2005 pjd

Actually only protect mount-point if security.jail.enforce_statfs is set to 2.
If we don't return statistics about requested file systems, system tools
may not work correctly or at all.

Approved by: re (scottl)


147302 11-Jun-2005 pjd

Do not allocate memory based on not-checked argument from userland.
It can be used to panic the kernel by giving too big value.
Fix it by moving allocation and size verification into kern_getfsstat().
This even simplifies kern_getfsstat() consumers, but destroys symmetry -
memory is allocated inside kern_getfsstat(), but has to be freed by the
caller.

Found by: FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/
Reported by: Peter Holm <peter@holm.cc>


147256 10-Jun-2005 brooks

Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the mac address
from driver private storage rather than sometimes being ac_enaddr.

Reviewed by: sobomax, sam


147185 09-Jun-2005 pjd

Rename sysctl security.jail.getfsstatroot_only to security.jail.enforce_statfs
and extend its functionality:

value policy
0 show all mount-points without any restrictions
1 show only mount-points below jail's chroot and show only part of the
mount-point's path (if jail's chroot directory is /jails/foo and
mount-point is /jails/foo/usr/home only /usr/home will be shown)
2 show only mount-point where jail's chroot directory is placed.

Default value is 2.

Discussed with: rwatson


147178 09-Jun-2005 pjd

Avoid code duplication in serval places by introducing universal
kern_getfsstat() function.

Obtained from: jhb


147141 08-Jun-2005 sobomax

Properly convert FreeBSD priority values into Linux values in the
getpriority(2) syscall.

PR: kern/81951
Submitted by: Andriy Gapon <avg@icyb.net.ua>


146950 03-Jun-2005 ps

Wrap copyin/copyout for kevent so the 32bit wrapper does not have
to malloc nchanges * sizeof(struct kevent) AND/OR nevents *
sizeof(struct kevent) on every syscall.

Glanced at by: peter, jmg
Obtained from: Yahoo!
MFC after: 2 weeks


146807 30-May-2005 rwatson

Rebuild generated system call definition files following the addition of
the audit event field to the syscalls.master file format.

Submitted by: wsalamon
Obtained from: TrustedBSD Project


146806 30-May-2005 rwatson

Introduce a new field in the syscalls.master file format to hold the
audit event identifier associated with each system call, which will
be stored by makesyscalls.sh in the sy_auevent field of struct sysent.
For now, default the audit identifier on all system calls to AUE_NULL,
but in the near future, other BSM event identifiers will be used. The
mapping of system calls to event identifiers is many:one due to
multiple system calls that map to the same end functionality across
compatibility wrappers, ABI wrappers, etc.

Submitted by: wsalamon
Obtained from: TrustedBSD Project


146734 29-May-2005 nyan

Remove bus_{mem,p}io.h and related code for a micro-optimization on i386
and amd64. The optimization is a trivial on recent machines.

Reviewed by: -arch (imp, marcel, dfr)


146695 27-May-2005 pjd

Remove (now) unused argument 'td' from bsd_to_linux_statfs().


146583 24-May-2005 ps

Copyout to userland if kern_sigaction succeeds


146505 22-May-2005 pjd

The code is under '#ifdef not_that_way', but anyway:

- Add missing prison_check_mount() check.


146502 22-May-2005 pjd

If we need to hide fsid, kern_statfs()/kern_fstatfs() will do it for us,
so do not duplicate the code in cvtstatfs().
Note, that we now need to clear fsid in freebsd4_getfsstat().

This moves all security related checks from functions like cvtstatfs()
and will allow to add more security related stuff (like statfs(2), etc.
protection for jails) a bit easier.


146428 20-May-2005 wpaul

Missed kern_windrv.c in the last checkin.


146427 20-May-2005 wpaul

Deal with a few bootstrap issues:

We can't call KeFlushQueuedDpcs() during bootstrap (cold == 1), since
the flush operation sleeps to wait for completion, and we can't sleep
here (clowns will eat us).

On an i386 SMP system, if we're loaded/probed/attached during bootstrap,
smp_rendezvous() won't run us anywhere except CPU 0 (since the other CPUs
aren't launched until later), which means we won't be able to set up
the GDTs anywhere except CPU 0. To deal with this case, ctxsw_utow()
now checks to see if the TID for the current processor has been properly
initialized and sets up the GTD for the current CPU if not.

Lastly, in if_ndis.c:ndis_shutdown(), do an ndis_stop() to insure we
really halt the NIC and stop interrupts from happening.

Note that loading a driver during bootstrap is, unfortunately, kind of
a hit or miss sort of proposition. In Windows, the expectation is that
by the time a given driver's MiniportInitialize() method is called,
the system is already in 'multiuser' state, i.e. it's up and running
enough to support all the stuff specified in the NDIS API, which includes
the underlying OS-supplied facilities it implicitly depends on, such as
having all CPUs running, having the DPC queues initialized, WorkItem
threads running, etc. But in UNIX, a lot of that stuff won't work during
bootstrap. This causes a problem since we need to call MiniportInitialize()
at least once during ndis_attach() in order to find out what kind of NIC
we have and learn its station address.

What this means is that some cards just plain won't work right if
you try to pre-load the driver along with the kernel: they'll only be
probed/attach correctly if the driver is kldloaded _after_ the system
has reached multiuser. I can't really think of a way around this that
would still preserve the ability to use an NDIS device for diskless
booting.


146424 20-May-2005 wpaul

In ndis_halt_nic(), invalidate the miniportadapterctx early to try and
prevent anything from making calls to the NIC while it's being shut down.
This is yet another attempt to stop things like mdnsd from trying to
poke at the card while it's not properly initialized and panicking
the system.

Also, remove unneeded debug message from if_ndis.c.


146364 19-May-2005 wpaul

Fix some of the things I broke so that the SMC2602W (AMD Am1772) driver
works again.

This driver uses NdisScheduleWorkItem(), and we have to take special steps
to insure that its workitems don't collide with any of the other workitems
used by the NDISulator. In particular, if one of the driver's work jobs
blocks, it can prevent NdisMAllocateSharedMemoryAsync() from completing
when expected.

The original hack to fix this was to have NdisMAllocateSharedMemoryAsync()
defer its work to the DPC queue instead of the general task queue. To
fix it now, I decided to add some additional workitem threads. (There's
supposed to be a pool of worker threads in Windows anyway.) Currently,
there are 4. There should be at least 2. One is reserved for the legacy
ExQueueWorkItem() API, while the others are used in round-robin by the
IoQueueWorkItem() API. NdisMAllocateSharedMemoryAsync() uses the latter
API while NdisScheduleWorkItem() uses the former, so the deadlock is
avoided.

Fixed NdisMRegisterDevice()/NdisMDeregisterDevice() to work a little
more sensibly with the new driver_object/device_object framework. It
doesn't really register a working user-mode interface, but the existing
code was completely wrong for the new framework.

Fixed a couple of bugs dealing with the cancellation of events and
DPCs. When cancelling an event that's still on the timer queue (i.e.
hasn't expired yet), reset dh_inserted in its dispatch header to FALSE.
Previously, it was left set to TRUE, which would make a cancelled
timer appear to have not been cancelled. Also, when removing a DPC
from a queue, reset its list pointers, otherwise a cancelled DPC
might mistakenly be treated as still pending.

Lastly, fix the behavior of ntoskrnl_wakeup() when dealing with
objects that have nobody waiting on them: sync event objects get
their signalled state reset to FALSE, but notification objects
should still be set to TRUE.


146274 16-May-2005 wpaul

Remove harmless bit of leftover debug code.


146273 16-May-2005 wpaul

Correct some problems with workitem usage. NdisScheduleWorkItem() does
not use exactly the same workitem sturcture as ExQueueWorkItem() like
I originally thought it did.


146230 15-May-2005 wpaul

Add support for NdisMEthIndicateReceive() and MiniportTransferData().
The Ralink RT2500 driver uses this API instead of NdisMIndicateReceivePacket().

Drivers use NdisMEthIndicateReceive() when they know they support
802.3 media and expect to hand their packets only protocols that want
to deal with that particular media type. With this API, the driver does
not manage its own NDIS_PACKET/NDIS_BUFFER structures. Instead, it
lets bound protocols have a peek at the data, and then they supply
an NDIS_PACKET/NDIS_BUFFER combo to the miniport driver, into which
it copies the packet data.

Drivers use NdisMIndicateReceivePacket() to allow their packets to
be read by any protocol, not just those bound to 802.3 media devices.

To make this work, we need an internal pool of NDIS_PACKETS for
receives. Currently, we check to see if the driver exports a
MiniportTransferData() method in its characteristics structure,
and only allocate the pool for drivers that have this method.

This should allow the RT2500 driver to work correctly, though I
still have to fix ndiscvt(8) to parse its .inf file properly.

Also, change kern_ndis.c:ndis_halt_nic() to reap timers before
acquiring NDIS_LOCK(), since the reaping process might entail sleeping
briefly (and we can't sleep with a lock held).


146016 08-May-2005 wpaul

More fixes for multibus drivers. When calling out to the match
function in if_ndis_pci.c and if_ndis_pccard.c, provide the bustype
too so the stubs can ignore devlists that don't concern them.


146015 08-May-2005 wpaul

Fix support for Windows drivers that support both PCI and PCMCIA devices at
the same time.

Fix if_ndis_pccard.c so that it sets sc->ndis_dobj and sc->ndis_regvals.

Correct IMPORT_SFUNC() macros for the READ_PORT_BUFFER_xxx() routines,
which take 3 arguments, not 2.

This fixes it so that the Windows driver for my Cisco Aironet 340 PCMCIA
card works again. (Yes, I know the an(4) driver supports this card natively,
but it's the only PCMCIA device I have with a Windows XP driver.)


145999 08-May-2005 wpaul

Correct the patch table entries for the 64-bit intrinsic math
routines (_alldiv(), _allmul(), _alludiv(), _aullmul(), etc...)
that use the _stdcall calling convention.

These routines all take two arguments, but the arguments are 64 bits wide.
On the i386 this means they each consume two 32-bit slots on the stack.
Consequently, when we specify the argument count in the IMPORT_SFUNC()
macro, we have to lie and claim there are 4 arguments instead of two.
This will cause the resulting i386 assembly wrapper to push the right
number of longwords onto the stack.

This fixes a crash I discovered with the RealTek 8180 driver, which
uses these routines a lot during initialization.


145935 05-May-2005 wpaul

Cast 64 bit quantity to uintmax_t to print it with %jx. This is
technically a no-op since uintmax_t is uint64_t on all currently
supported architectures, but we should use an explicit cast instead
of depending on this obscure coincidence.


145906 05-May-2005 wpaul

Use %jx instead of %qx to silence compiler warning on amd64.


145898 05-May-2005 wpaul

Avoid sleeping with mutex held in kern_ndis.c.

Remove unused fields from ndis_miniport_block.

Fix a bug in KeFlushQueuedDpcs() (we weren't calculating the kq pointer
correctly).

In if_ndis.c, clear the IFF_RUNNING flag before calling ndis_halt_nic().

Add some guards in kern_ndis.c to avoid letting anyone invoke ndis_get_info()
or ndis_set_info() if the NIC isn't fully initialized. Apparently, mdnsd
will sometimes try to invoke the ndis_ioctl() routine at exactly the
wrong moment (to futz with its multicast filters) when the interface
comes up, and can trigger a crash unless we guard against it.


145896 05-May-2005 wpaul

Remove extranaous free() of ASCII filename from NdisOpenFile().

Oh, one additional change I forgot to mention in the last commit:
NdisOpenFile() was broken in the case for firmware files that were
pre-loaded as modules. When searching for the module in NdisOpenFile(),
we would match against a symbol name, which would contain the string
we were looking for, then save a pointer to the linker file handle.
Later, in NdisMapFile(), we would refer to the filename hung off
this handle when trying to find the starting address symbol. Only
problem is, this filename is different from the embedded symbol
name we're searching for, so the mapping would fail. I found this
problem while testing the AirGo driver, which requires a small
firmware file.


145895 05-May-2005 wpaul

This commit makes a bunch of changes, some big, some not so big.

- Remove the old task threads from kern_ndis.c and reimplement them in
subr_ntoskrnl.c, in order to more properly emulate the Windows DPC
API. Each CPU gets its own DPC queue/thread, and each queue can
have low, medium and high importance DPCs. New APIs implemented:
KeSetTargetProcessorDpc(), KeSetImportanceDpc() and KeFlushQueuedDpcs().
(This is the biggest change.)

- Fix a bug in NdisMInitializeTimer(): the k_dpc pointer in the
nmt_timer embedded in the ndis_miniport_timer struct must be set
to point to the DPC, also embedded in the struct. Failing to do
this breaks dequeueing of DPCs submitted via timers, and in turn
breaks cancelling timers.

- Fix a bug in KeCancelTimer(): if the timer is interted in the timer
queue (i.e. the timeout callback is still pending), we have to both
untimeout() the timer _and_ call KeRemoveQueueDpc() to nuke the DPC
that might be pending. Failing to do this breaks cancellation of
periodic timers, which always appear to be inserted in the timer queue.

- Make use of the nmt_nexttimer field in ndis_miniport_timer: keep a
queue of pending timers and cancel them all in ndis_halt_nic(), prior
to calling MiniportHalt(). Also call KeFlushQueuedDpcs() to make sure
any DPCs queued by the timers have expired.

- Modify NdisMAllocateSharedMemory() and NdisMFreeSharedMemory() to keep
track of both the virtual and physical addresses of the shared memory
buffers that get handed out. The AirGo MIMO driver appears to have a bug
in it: for one of the segments is allocates, it returns the wrong
virtual address. This would confuse NdisMFreeSharedMemory() and cause
a crash. Why it doesn't crash Windows too I have no idea (from reading
the documentation for NdisMFreeSharedMemory(), it appears to be a violation
of the API).

- Implement strstr(), strchr() and MmIsAddressValid().

- Implement IoAllocateWorkItem(), IoFreeWorkItem(), IoQueueWorkItem() and
ExQueueWorkItem(). (This is the second biggest change.)

- Make NdisScheduleWorkItem() call ExQueueWorkItem(). (Note that the
ExQueueWorkItem() API is deprecated by Microsoft, but NDIS still uses
it, since NdisScheduleWorkItem() is incompatible with the IoXXXWorkItem()
API.)

- Change if_ndis.c to use the NdisScheduleWorkItem() interface for scheduling
tasks.

With all these changes and fixes, the AirGo MIMO driver for the Belkin
F5D8010 Pre-N card now works. Special thanks to Paul Robinson
(paul dawt robinson at pwermedia dawt net) for the loan of a card
for testing.


145584 27-Apr-2005 jeff

- Pass the ISOPEN flag to namei so filesystems will know we're about to
open them or otherwise access the data.


145485 24-Apr-2005 wpaul

Throw the switch on the new driver generation/loading mechanism. From
here on in, if_ndis.ko will be pre-built as a module, and can be built
into a static kernel (though it's not part of GENERIC). Drivers are
created using the new ndisgen(8) script, which uses ndiscvt(8) under
the covers, along with a few other tools. The result is a driver module
that can be kldloaded into the kernel.

A driver with foo.inf and foo.sys files will be converted into
foo_sys.ko (and foo_sys.o, for those who want/need to make static
kernels). This module contains all of the necessary info from the
.INF file and the driver binary image, converted into an ELF module.
You can kldload this module (or add it to /boot/loader.conf) to have
it loaded automatically. Any required firmware files can be bundled
into the module as well (or converted/loaded separately).

Also, add a workaround for a problem in NdisMSleep(). During system
bootstrap (cold == 1), msleep() always returns 0 without actually
sleeping. The Intel 2200BG driver uses NdisMSleep() to wait for
the NIC's firmware to come to life, and fails to load if NdisMSleep()
doesn't actually delay. As a workaround, if msleep() (and hence
ndis_thsuspend()) returns 0, use a hard DELAY() to sleep instead).
This is not really the right thing to do, but we can't really do much
else. At the very least, this makes the Intel driver happy.

There are probably other drivers that fail in this way during bootstrap.
Unfortunately, the only workaround for those is to avoid pre-loading
them and kldload them once the system is running instead.


145205 17-Apr-2005 wpaul

Now that the GDT has been reorganized and GNDIS_SEL has been reserved
for us, use it if it's available, otherwise default to using slot 7
as before.


145133 16-Apr-2005 wpaul

When setting up the new stack for a function in x86_64_wrap(), make
sure to make it 16-byte aligned, in keeping with amd64 calling
convention requirements.

Submitted by: Mikore Li at sun dot com


145006 13-Apr-2005 jeff

- Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case. Se vfs_lookup.c r1.79 for details.

Sponsored by: Isilon Systems, Inc.


144988 13-Apr-2005 mdodd

Implement SOUND_MIXER_INFO ioctl in compat layer.


144987 13-Apr-2005 mdodd

Add support for O_NOFOLLOW and O_DIRECT to Linux fcntl() F_GETFL/F_SETFL.


144913 11-Apr-2005 wpaul

In winx32_wrap.S, preserve return values in the fastcall and regparm
wrappers by pushing them onto the stack rather than keeping them in %esi
and %edi.


144888 11-Apr-2005 wpaul

Create new i386 windows/bsd thunking layer, similar to the amd64 thunking
layer, but with a twist.

The twist has to do with the fact that Microsoft supports structured
exception handling in kernel mode. On the i386 arch, exception handling
is implemented by hanging an exception registration list off the
Thread Environment Block (TEB), and the TEB is accessed via the %fs
register. The problem is, we use %fs as a pointer to the pcpu stucture,
which means any driver that tries to write through %fs:0 will overwrite
the curthread pointer and make a serious mess of things.

To get around this, Project Evil now creates a special entry in
the GDT on each processor. When we call into Windows code, a context
switch routine will fix up %fs so it points to our new descriptor,
which in turn points to a fake TEB. When the Windows code returns,
or calls out to an external routine, we swap %fs back again. Currently,
Project Evil makes use of GDT slot 7, which is all 0s by default.
I fully expect someone to jump up and say I can't do that, but I
couldn't find any code that makes use of this entry anywhere. Sadly,
this was the only method I could come up with that worked on both
UP and SMP. (Modifying the LDT works on UP, but becomes incredibly
complicated on SMP.) If necessary, the context switching stuff can
be yanked out while preserving the convention calling wrappers.

(Fortunately, it looks like Microsoft uses some special epilog/prolog
code on amd64 to implement exception handling, so the same nastiness
won't be necessary on that arch.)

The advantages are:

- Any driver that uses %fs as though it were a TEB pointer won't
clobber pcpu.
- All the __stdcall/__fastcall/__regparm stuff that's specific to
gcc goes away.

Also, while I'm here, switch NdisGetSystemUpTime() back to using
nanouptime() again. It turns out nanouptime() is way more accurate
than just using ticks(). On slower machines, the Atheros drivers
I tested seem to take a long time to associate due to the loss
in accuracy.


144689 05-Apr-2005 peter

Fix 32 bit signals on amd64. It turns out that I was sign extending
the register values coming back from sigreturn(2). Normally this wouldn't
matter because the 32 bit environment would truncate the upper 32 bits
and re-save the truncated values at the next trap. However, if we got
a fast second signal and it was pending while we were returning from
sigreturn(2) in the signal trampoline, we'd never have had a chance to
truncate the bogus values in 32 bit mode, and the new sendsig would get
an EFAULT when trying to write to the bogus user stack address.


144501 01-Apr-2005 jhb

- Change the vm_mmap() function to accept an objtype_t parameter specifying
the type of object represented by the handle argument.
- Allow vm_mmap() to map device memory via cdev objects in addition to
vnodes and anonymous memory. Note that mmaping a cdev directly does not
currently perform any MAC checks like mapping a vnode does.
- Unbreak the DRM getbufs ioctl by having it call vm_mmap() directly on the
cdev the ioctl is acting on rather than trying to find a suitable vnode
to map from.

Reviewed by: alc, arch@


144495 01-Apr-2005 wpaul

Fix another KeInitializeDpc()/amd64 calling convention issue:
ndis_intrhand() has to be wrapped for the same reason as ndis_timercall().


144450 31-Mar-2005 jhb

- Use a custom version of copyinuio() to implement readv/writev using
kern_readv/writev.
- Use kern_settimeofday() and kern_adjtime() rather than stackgapping it.


144428 31-Mar-2005 wpaul

Apparently I'm cursed. ndis_findwrap() should be searching ndis_functbl,
not ntoskrnl_functbl.


144402 31-Mar-2005 wpaul

Fix an amd64 issue I overlooked. When setting up a callout to
ndis_timercall() in NdisMInitializeTimer(), we can't use the raw
function pointer. This is because ntoskrnl_run_dpc() expects to
invoke a function with Microsoft calling conventions. On i386,
this works because ndis_timercall() is declared with the __stdcall
attribute, but this is a no-op on amd64. To do it correctly, we
have to generate a wrapper for ndis_timercall() and us the wrapper
instead of of the raw function pointer.

Fix this by adding ndis_timercall() to the funcptr table in subr_ndis.c,
and create ndis_findwrap() to extract the wrapped function from the
table in NdisMInitializeTimer() instead of just passing ndis_timercall()
to KeInitializeDpc() directly.


144342 30-Mar-2005 wpaul

Fix a possible mutex leak in KeSetTimerEx(): if timer is NULL, we
bail out without releasing the dispatcher lock. Move the lock acquisition
after the pointer test to avoid this.


144317 30-Mar-2005 wpaul

Remove a couple of #ifdef 0'ed code blocks left over from Atheros debugging.

Remember to reset ndis_pendingreq to NULL when bailing out of
ndis_set_info() or ndis_get_info() due to miniportadapterctx not
being set.


144290 29-Mar-2005 jeff

- Initial cn_lkflags to LK_EXCLUSIVE.

Sponsored by: Isilon Systems, Inc.


144256 28-Mar-2005 wpaul

The filehandle allocated in NdisOpenFile() is allocated using
ExAllocatePoolWithTag(), not malloc(), so it should be released
with ExFreePool(), not free(). Fix a couple if instances of
free(fh, ...) that got overlooked.


144254 28-Mar-2005 wpaul

Another Coverity fix from Sam: add NULL pointer test in
NdisMFreeSharedMemory() (if the list is already empty, just bail).


144253 28-Mar-2005 wpaul

More additions for amd64:

- On amd64, InterlockedPushEntrySList() and InterlockedPopEntrySList()
are mapped to ExpInterlockedPushEntrySList and
ExpInterlockedPopEntrySList() via macros (which do the same thing).
Add IMPORT_FUNC_MAP()s for these.

- Implement ExQueryDepthSList().


144251 28-Mar-2005 wpaul

Fix resource leak found by Coverity (via Sam Leffler).


144250 28-Mar-2005 wpaul

Fix for amd64.


144248 28-Mar-2005 wpaul

Fix another amd64 issue with lookaside lists: we initialize the
alloc and free routine pointers in the lookaside list with pointers
to ExAllocatePoolWithTag() and ExFreePool() (in the case where the
driver does not provide its own alloc and free routines). For amd64,
this is wrong: we have to use pointers to the wrapped versions of these
functions, not the originals.


144240 28-Mar-2005 wpaul

Tweak to hopefully make lookaside lists work on amd64: in Windows, the
nll_obsoletelock field in the lookaside list structure is only defined
for the i386 arch. For amd64, the field is gone, and different list
update routines are used which do their locking internally. Apparently
the Inprocomm amd64 driver uses lookaside lists. I'm not positive this
will make it work yet since I don't have an Inprocomm NIC to test, but
this needs to be fixed anyway.


144239 28-Mar-2005 wpaul

Spell '0' as 'FALSE' when initializing npp_validcounts. (Doesn't change
the code, but emphasises that this field is used as a boolean.)


144238 28-Mar-2005 wpaul

Unbreak the build: correct the resource list traversal code for
__FreeBSD_version >= 600022.


144176 27-Mar-2005 wpaul

Argh. PCI resource list became an STAILQ instead of an SLIST. Try to
deal with this while maintaining backards source compatibility with
stable.


144175 27-Mar-2005 wpaul

Check in ntoskrnl_var.h, which should have been included in the
previous commit.


144174 27-Mar-2005 wpaul

Finally bring an end to the great "make the Atheros NDIS driver
work on SMP" saga. After several weeks and much gnashing of teeth,
I have finally tracked down all the problems, despite their best
efforts to confound and annoy me.

Problem nunmber one: the Atheros windows driver is _NOT_ a de-serialized
miniport! It used to be that NDIS drivers relied on the NDIS library
itself for all their locking and serialization needs. Transmit packet
queues were all handled internally by NDIS, and all calls to
MiniportXXX() routines were guaranteed to be appropriately serialized.
This proved to be a performance problem however, and Microsoft
introduced de-serialized miniports with the NDIS 5.x spec. Microsoft
still supports serialized miniports, but recommends that all new drivers
written for Windows XP and later be deserialized. Apparently Atheros
wasn't listening when they said this.

This means (among other things) that we have to serialize calls to
MiniportSendPackets(). We also have to serialize calls to MiniportTimer()
that are triggered via the NdisMInitializeTimer() routine. It finally
dawned on me why NdisMInitializeTimer() takes a special
NDIS_MINIPORT_TIMER structure and a pointer to the miniport block:
the timer callback must be serialized, and it's only by saving the
miniport block handle that we can get access to the serialization
lock during the timer callback.

Problem number two: haunted hardware. The thing that was _really_
driving me absolutely bonkers for the longest time is that, for some
reason I couldn't understand, my test machine would occasionally freeze
or more frustratingly, reset completely. That's reset and in *pow!*
back to the BIOS startup. No panic, no crashdump, just a reset. This
appeared to happen most often when MiniportReset() was called. (As
to why MiniportReset() was being called, see problem three below.)
I thought maybe I had created some sort of horrible deadlock
condition in the process of adding the serialization, but after three
weeks, at least 6 different locking implementations and heroic efforts
to debug the spinlock code, the machine still kept resetting. Finally,
I started single stepping through the MiniportReset() routine in
the driver using the kernel debugger, and this ultimately led me to
the source of the problem.

One of the last things the Atheros MiniportReset() routine does is
call NdisReadPciSlotInformation() several times to inspect a portion
of the device's PCI config space. It reads the same chunk of config
space repeatedly, in rapid succession. Presumeably, it's polling
the hardware for some sort of event. The reset occurs partway through
this process. I discovered that when I single-stepped through this
portion of the routine, the reset didn't occur. So I inserted a 1
microsecond delay into the read loop in NdisReadPciSlotInformation().
Suddenly, the reset was gone!!

I'm still very puzzled by the whole thing. What I suspect is happening
is that reading the PCI config space so quickly is causing a severe
PCI bus error. My test system is a Sun w2100z dual Opteron system,
and the NIC is a miniPCI card mounted in a miniPCI-to-PCI carrier card,
plugged into a 100Mhz PCI slot. It's possible that this combination of
hardware causes a bus protocol violation in this scenario which leads
to a fatal machine check. This is pure speculation though. Really all I
know for sure is that inserting the delay makes the problem go away.
(To quote Homer Simpson: "I don't know how it works, but fire makes
it good!")

Problem number three: NdisAllocatePacket() needs to make sure to
initialize the npp_validcounts field in the 'private' section of
the NDIS_PACKET structure. The reason if_ndis was calling the
MiniportReset() routine in the first place is that packet transmits
were sometimes hanging. When sending a packet, an NDIS driver will
call NdisQueryPacket() to learn how many physical buffers the packet
resides in. NdisQueryPacket() is actually a macro, which traverses
the NDIS_BUFFER list attached to the NDIS_PACKET and stashes some
of the results in the 'private' section of the NDIS_PACKET. It also
sets the npp_validcounts field to TRUE To indicate that the results are
now valid. The problem is, now that if_ndis creates a pool of transmit
packets via NdisAllocatePacketPool(), it's important that each time
a new packet is allocated via NdisAllocatePacket() that validcounts
be initialized to FALSE. If it isn't, and a previously transmitted
NDIS_PACKET is pulled out of the pool, it may contain stale data
from a previous transmission which won't get updated by NdisQueryPacket().
This would cause the driver to miscompute the number of fragments
for a given packet, and botch the transmission.

Fixing these three problems seems to make the Atheros driver happy
on SMP, which hopefully means other serialized miniports will be
happy too.

And there was much rejoicing.

Other stuff fixed along the way:

- Modified ndis_thsuspend() to take a mutex as an argument. This
allows KeWaitForSingleObject() and KeWaitForMultipleObjects() to
avoid any possible race conditions with other routines that
use the dispatcher lock.

- Fixed KeCancelTimer() so that it returns the correct value for
'pending' according to the Microsoft documentation

- Modfied NdisGetSystemUpTime() to use ticks and hz rather than
calling nanouptime(). Also added comment that this routine wraps
after 49.7 days.

- Added macros for KeAcquireSpinLock()/KeReleaseSpinLock() to hide
all the MSCALL() goop.

- For x86, KeAcquireSpinLockRaiseToDpc() needs to be a separate
function. This is because it's supposed to be _stdcall on the x86
arch, whereas KeAcquireSpinLock() is supposed to be _fastcall.
On amd64, all routines use the same calling convention so we can
just map KeAcquireSpinLockRaiseToDpc() directly to KfAcquireSpinLock()
and it will work. (The _fastcall attribute is a no-op on amd64.)

- Implement and use IoInitializeDpcRequest() and IoRequestDpc() (they're
just macros) and use them for interrupt handling. This allows us to
move the ndis_intrtask() routine from if_ndis.c to kern_ndis.c.

- Fix the MmInitializeMdl() macro so that is uses sizeof(vm_offset_t)
when computing mdl_size instead of uint32_t, so that it matches the
MmSizeOfMdl() routine.

- Change a could of M_WAITOKs to M_NOWAITs in the unicode routines in
subr_ndis.c.

- Use the dispatcher lock a little more consistently in subr_ntoskrnl.c.

- Get rid of the "wait for link event" hack in ndis_init(). Now that
I fixed NdisReadPciSlotInformation(), it seems I don't need it anymore.
This should fix the witness panic a couple of people have reported.

- Use MSCALL1() when calling the MiniportHangCheck() function in
ndis_ticktask(). I accidentally missed this one when adding the
wrapping for amd64.


144075 24-Mar-2005 brooks

Use the CTASSERT() macro instead of rolling my own, non-portable one
using #error.

Suggested by: jhb


144070 24-Mar-2005 brooks

Compile errors are way more useful then panics later.

Replace a KASSERT of LINUX_IFNAMSIZ == IFNAMSIZ with a preprocessor
check and #error message. This will prevent nasty suprises if users
change IFNAMSIZ without updating the linux code appropriatly.


144014 23-Mar-2005 das

Bounds check the user-supplied length used in a copyout() in
svr4_do_getmsg(). In principle this bug could disclose data from
kernel memory, but in practice, the SVR4 emulation layer is probably
not functional enough to cause the relevant code path to be executed.
In any case, the emulator has been disconnected from the build since
5.0-RELEASE.

Found by: Coverity Prevent analysis tool


144012 23-Mar-2005 das

Reject packets larger than IP_MAXPACKET in linux_sendto() for sockets
with the IP_HDRINCL option set. Without this change, a Linux process
with access to a raw socket could cause a kernel panic. Raw sockets
must be created by root, and are generally not consigned to untrusted
applications; hence, the security implications of this bug are
minimal. I believe this only affects 6-CURRENT on or after 2005-01-30.

Found by: Coverity Prevent analysis tool
Security: Local DOS


143801 18-Mar-2005 phk

s/SLIST/STAILQ/
/imp/a\
pointy hat
.


143635 15-Mar-2005 phk

Neuter the duplicated disk-device magic code for now. Somebody with
serious linux-clue is necessary to fix this properly.


143295 08-Mar-2005 sobomax

Add kernel-only flag MSG_NOSIGNAL to be used in emulation layers to surpress
SIGPIPE signal for the duration of the sento-family syscalls. Use it to
replace previously added hack in Linux layer based on temporarily setting
SO_NOSIGPIPE flag.

Suggested by: alfred


143233 07-Mar-2005 sobomax

Handle MSG_NOSIGNAL flag in linux_send() by setting SO_NOSIGPIPE on socket
for the duration of the send() call. Such approach may be less than ideal
in threading environment, when several threads share the same socket and it
might happen that several of them are calling linux_send() at the same time
with and without SO_NOSIGPIPE set.

However, such race condition is very unlikely in practice, therefore this
change provides practical improvement compared to the previous behaviour.

PR: kern/76426
Submitted by: Steven Hartland <killing@multiplay.co.uk>
MFC after: 3 days


143204 07-Mar-2005 wpaul

When you call MiniportInitialize() for an 802.11 driver, it will
at some point result in a status event being triggered (it should
be a link down event: the Microsoft driver design guide says you
should generate one when the NIC is initialized). Some drivers
generate the event during MiniportInitialize(), such that by the
time MiniportInitialize() completes, the NIC is ready to go. But
some drivers, in particular the ones for Atheros wireless NICs,
don't generate the event until after a device interrupt occurs
at some point after MiniportInitialize() has completed.

The gotcha is that you have to wait until the link status event
occurs one way or the other before you try to fiddle with any
settings (ssid, channel, etc...). For the drivers that set the
event sycnhronously this isn't a problem, but for the others
we have to pause after calling ndis_init_nic() and wait for the event
to arrive before continuing. Failing to wait can cause big trouble:
on my SMP system, calling ndis_setstate_80211() after ndis_init_nic()
completes, but _before_ the link event arrives, will lock up or
reset the system.

What we do now is check to see if a link event arrived while
ndis_init_nic() was running, and if it didn't we msleep() until
it does.

Along the way, I discovered a few other problems:

- Defered procedure calls run at PASSIVE_LEVEL, not DISPATCH_LEVEL.
ntoskrnl_run_dpc() has been fixed accordingly. (I read the documentation
wrong.)

- Similarly, the NDIS interrupt handler, which is essentially a
DPC, also doesn't need to run at DISPATCH_LEVEL. ndis_intrtask()
has been fixed accordingly.

- MiniportQueryInformation() and MiniportSetInformation() run at
DISPATCH_LEVEL, and each request must complete before another
can be submitted. ndis_get_info() and ndis_set_info() have been
fixed accordingly.

- Turned the sleep lock that guards the NDIS thread job list into
a spin lock. We never do anything with this lock held except manage
the job list (no other locks are held), so it's safe to do this,
and it's possible that ndis_sched() and ndis_unsched() can be
called from DISPATCH_LEVEL, so using a sleep lock here is
semantically incorrect. Also updated subr_witness.c to add the
lock to the order list.


143197 07-Mar-2005 sobomax

Handle unimplemented syscall by instantly returning ENOSYS instead of sending
signal first and only then returning ENOSYS to match what real linux does.

PR: kern/74302
Submitted by: Travis Poppe <tlp@LiquidX.org>


143194 06-Mar-2005 sobomax

Always produce cpuX entries, even in the case when there is only one CPU
in the system. This is consistent with what real linuxes do.

PR: kern/75848
Submitted by: Andriy Gapon <avg@icyb.net.ua>
MFC after: 3 days


143086 03-Mar-2005 wpaul

MAXPATHLEN is 1024, which means NdisOpenFile() and ndis_find_sym() were
both consuming 1K of stack space. This is unfriendly. Allocate the buffers
off the heap instead. It's a little slower, but these aren't performance
critical routines.

Also, add a spinlock to NdisAllocatePacketPool(), NdisAllocatePacket(),
NdisFreePacketPool() and NdisFreePacket(). The pool is maintained as a
linked list. I don't know for a fact that it can be corrupted, but why
take chances.


142939 01-Mar-2005 jhb

Remove linux_emul_find() and the CHECKALT*() macros as they are no longer
used.


142934 01-Mar-2005 ps

Use kern_kevent instead of the stackgap for 32bit syscall wrapping.

Submitted by: jhb
Tested on: amd64


142931 01-Mar-2005 wpaul

In windrv_load(), I was allocating the driver object using
malloc(sizeof(device_object), ...) by mistake. Correct this, and
rename "dobj" to "drv" to make it a bit clearer what this variable
is supposed to be.

Spotted by: Mikore Li at Sun dot comnospamplzkthx


142918 01-Mar-2005 ps

Ooops. I will compile test before committing. The stackgap version
of kevent32 will be going away shortly, so this is temporary until
I commit the non-stackgap version.


142874 01-Mar-2005 ps

Correct the freebsd32_kevent prototype.


142553 26-Feb-2005 wpaul

Don't need to do MmInitializeMdl() in ndis_mtop() anymore:
IoInitializeMdl() does it internally (and doing it again here
messes things up).


142530 26-Feb-2005 wpaul

MDLs are supposed to be variable size (they include an array of pages
that describe a buffer of variable size). The problem is, allocating
MDLs off the heap is slow, and it can happen that drivers will allocate
lots and lots of lots of MDLs as they run.

As a compromise, we now do the following: we pre-allocate a zone for
MDLs big enough to describe any buffer with 16 or less pages. If
IoAllocateMdl() needs a MDL for a buffer with 16 or less pages, we'll
allocate it from the zone. Otherwise, we allocate it from the heap.
MDLs allocate from the zone have a flag set in their mdl_flags field.
When the MDL is released, IoMdlFree() will uma_zfree() the MDL if
it has the MDL_ZONE_ALLOCED flag set, otherwise it will release it
to the heap.

The assumption is that 16 pages is a "big number" and we will rarely
need MDLs larger than that.

- Moved the ndis_buffer zone to subr_ntoskrnl.c from kern_ndis.c
and named it mdl_zone.

- Modified IoAllocateMdl() and IoFreeMdl() to use uma_zalloc() and
uma_zfree() if necessary.

- Made ndis_mtop() use IoAllocateMdl() instead of calling uma_zalloc()
directly.

Inspired by: discussion with Giridhar Pemmasani


142500 25-Feb-2005 sam

fixup signal mapping:
o change the mapping arrays to have a zero offset rather than base 1;
this eliminates lots of signo adjustments and brings the code
back inline with the original netbsd code
o purge use of SVR4_SIGTBLZ; SVR4_NSIG is the only definition for
how big a mapping array is
o change the mapping loops to explicitly ignore signal 0
o purge some bogus code from bsd_to_svr4_sigset
o adjust svr4_sysentvec to deal with the mapping table change

Enticed into fixing by: Coverity Prevent analysis tool
Glanced at by: marcel, jhb


142497 25-Feb-2005 wpaul

Add macros to construct Windows IOCTL codes, and to extract function
codes from an IOCTL. (The USB module will need them later.)


142443 25-Feb-2005 wpaul

Fix a couple of callback instances that should have been wrapped with
MSCALLx().

Add definition for STATUS_PENDING error code.


142433 25-Feb-2005 wpaul

Compute the right length to use with bzero() when initializing an IRP
in IoInitializeIrp() (must use IoSizeOfIrp() to account for the stack
locations).


142399 24-Feb-2005 wpaul

- Correct one aspect of the driver_object/device_object/IRP framework:
when we create a PDO, the driver_object associated with it is that
of the parent driver, not the driver we're trying to attach. For
example, if we attach a PCI device, the PDO we pass to the NdisAddDevice()
function should contain a pointer to fake_pci_driver, not to the NDIS
driver itself. For PCI or PCMCIA devices this doesn't matter because
the child never needs to talk to the parent bus driver, but for USB,
the child needs to be able to send IRPs to the parent USB bus driver, and
for that to work the parent USB bus driver has to be hung off the PDO.

This involves modifying windrv_lookup() so that we can search for
bus drivers by name, if necessary. Our fake bus drivers attach themselves
as "PCI Bus," "PCCARD Bus" and "USB Bus," so we can search for them
using those names.

The individual attachment stubs now create and attach PDOs to the
parent bus drivers instead of hanging them off the NDIS driver's
object, and in if_ndis.c, we now search for the correct driver
object depending on the bus type, and use that to find the correct PDO.

With this fix, I can get my sample USB ethernet driver to deliver
an IRP to my fake parent USB bus driver's dispatch routines.

- Add stub modules for USB support: subr_usbd.c, usbd_var.h and
if_ndis_usb.c. The subr_usbd.c module is hooked up the build
but currently doesn't do very much. It provides the stub USB
parent driver object and a dispatch routine for
IRM_MJ_INTERNAL_DEVICE_CONTROL. The only exported function at
the moment is USBD_GetUSBDIVersion(). The if_ndis_usb.c stub
compiles, but is not hooked up to the build yet. I'm putting
these here so I can keep them under source code control as I
flesh them out.


142390 24-Feb-2005 jhb

Regen.


142389 24-Feb-2005 jhb

Use msync() to implement msync() for freebsd32 emulation. This isn't quite
right for certain MAP_FIXED mappings on ia64 but it will work fine for all
other mappings and works fine on amd64.

Requested by: ps, Christian Zander
MFC after: 1 week


142387 24-Feb-2005 wpaul

Couple of lessons learned during USB driver testing:

- In kern_ndis.c:ndis_unload_driver(), test that ndis_block->nmb_rlist
is not NULL before trying to free() it.

- In subr_pe.c:pe_get_import_descriptor(), do a case-insensitive
match on the import module name. Most drivers I have encountered
link against "ntoskrnl.exe" but the ASIX USB ethernet driver I'm
testing with wants "NTOSKRNL.EXE."

- In subr_ntoskrnl.c:IoAllocateIrp(), return a pointer to the IRP
instead of NULL. (Stub code leftover.)

- Also in subr_ntoskrnl.c, add ExAllocatePoolWithTag() and ExFreePool()
to the function table list so they'll get exported to drivers properly.


142311 23-Feb-2005 wpaul

Implement IoCancelIrp(), IoAcquireCancelSpinLock(), IoReleaseCancelSpinLock()
and a machine-independent though inefficient InterlockedExchange().
In Windows, InterlockedExchange() appears to be implemented in header
files via inline assembly. I would prefer using an atomic.h macro for
this, but there doesn't seem to be one that just does a plain old
atomic exchange (as opposed to compare and exchange). Also implement
IoSetCancelRoutine(), which is just a macro that uses InterlockedExchange().

Fill in IoBuildSynchronousFsdRequest(), IoBuildAsynchronousFsdRequest()
and IoBuildDeviceIoControlRequest() so that they do something useful,
and add a bunch of #defines to ntoskrnl_var.h to help make these work.
These may require some tweaks later.


142220 22-Feb-2005 phk

Neuter linux_ustat() until somebody finds time to try to fix it.

The fundamental problem is that we get only the lower 8 bits of the
minor device number so there is no guarantee that we can actually
find the disk device in question at all.

This was probably a bigger issue pre-GEOM where the upper bits
signaled which slice were in use.

The secondary problem is how we get from (partial) dev_t to vnode.

The correct implementation will involve traversing the mount list
looking for a perfect match or a possible match (for truncated
minor).


142196 22-Feb-2005 sam

remove dead code

Submitted by: Coverity Prevent analysis tool


142059 18-Feb-2005 jhb

- Add a custom version of exec_copyin_args() to deal with the 32-bit
pointers in argv and envv in userland and use that together with
kern_execve() and exec_free_args() to implement freebsd32_execve()
without using the stackgap.
- Fix freebsd32_adjtime() to call adjtime() rather than utimes(). Still
uses stackgap for now.
- Use kern_setitimer(), kern_getitimer(), kern_select(), kern_utimes(),
kern_statfs(), kern_fstatfs(), kern_fhstatfs(), kern_stat(),
kern_fstat(), and kern_lstat().

Tested by: cokane (amd64)
Silence on: amd64, ia64


142037 18-Feb-2005 wpaul

Fix a couple of u_int_foos that should have been uint_foos.


142035 18-Feb-2005 wpaul

Make the Win64 -> ELF64 template a little smaller by using a string
copy op to shift arguments on the stack instead of transfering each
argument one by one through a register. Probably doesn't affect overall
operation, but makes the code a little less grotty and easier to update
later if I choose to make the wrapper handle more args. Also add
comments.


141990 16-Feb-2005 wpaul

Remove redundant label.


141982 16-Feb-2005 wpaul

Fix freeing of custom driver extensions. (ExFreePool() was being
called with the wrong pointer.)


141980 16-Feb-2005 wpaul

KeAcquireSpinLockRaiseToDpc() and KeReleaseSpinLock() are (at least
for now) exactly the same as KfAcquireSpinLock() and KfReleaseSpinLock().
I implemented the former as small routines in subr_ntoskrnl.c that just
turned around and invoked the latter. But I don't really need the wrapper
routines: I can just create an entries in the ntoskrnl func table that
map KeAcquireSpinLockRaiseToDpc() and KeReleaseSpinLock() to
KfAcquireSpinLock() and KfReleaseSpinLock() directly. This means
the stubs can go away.


141963 16-Feb-2005 wpaul

Add support for Windows/x86-64 binaries to Project Evil.
Ville-Pertti Keinonen (will at exomi dot comohmygodnospampleasekthx)
deserves a big thanks for submitting initial patches to make it
work. I have mangled his contributions appropriately.

The main gotcha with Windows/x86-64 is that Microsoft uses a different
calling convention than everyone else. The standard ABI requires using
6 registers for argument passing, with other arguments on the stack.
Microsoft uses only 4 registers, and requires the caller to leave room
on the stack for the register arguments incase the callee needs to
spill them. Unlike x86, where Microsoft uses a mix of _cdecl, _stdcall
and _fastcall, all routines on Windows/x86-64 uses the same convention.
This unfortunately means that all the functions we export to the
driver require an intermediate translation wrapper. Similarly, we have
to wrap all calls back into the driver binary itself.

The original patches provided macros to wrap every single routine at
compile time, providing a secondary jump table with a customized
wrapper for each exported routine. I decided to use a different approach:
the call wrapper for each function is created from a template at
runtime, and the routine to jump to is patched into the wrapper as
it is created. The subr_pe module has been modified to patch in the
wrapped function instead of the original. (On x86, the wrapping
routine is a no-op.)

There are some minor API differences that had to be accounted for:

- KeAcquireSpinLock() is a real function on amd64, not a macro wrapper
around KfAcquireSpinLock()
- NdisFreeBuffer() is actually IoFreeMdl(). I had to change the whole
NDIS_BUFFER API a bit to accomodate this.

Bugs fixed along the way:
- IoAllocateMdl() always returned NULL
- kern_windrv.c:windrv_unload() wasn't releasing private driver object
extensions correctly (found thanks to memguard)

This has only been tested with the driver for the Broadcom 802.11g
chipset, which was the only Windows/x86-64 driver I could find.


141829 13-Feb-2005 njl

Unbreak the kernel build. Pointy hat to: sobomax.


141815 13-Feb-2005 sobomax

Backout previous change (disabling of security checks for signals delivered
in emulation layers), since it appears to be too broad.

Requested by: rwatson


141812 13-Feb-2005 sobomax

Split out kill(2) syscall service routine into user-level and kernel part, the
former is callable from user space and the latter from the kernel one. Make
kernel version take additional argument which tells if the respective call
should check for additional restrictions for sending signals to suid/sugid
applications or not.

Make all emulation layers using non-checked version, since signal numbers in
emulation layers can have different meaning that in native mode and such
protection can cause misbehaviour.

As a result remove LIBTHR from the signals allowed to be delivered to a
suid/sugid application.

Requested (sorta) by: rwatson
MFC after: 2 weeks


141691 11-Feb-2005 sobomax

Semctl with IPC_STAT command should return zero in case of success.

PR: 73778
Submitted by: Andriy Gapon <avg@icyb.net.ua>
MFC after: 2 weeks


141524 08-Feb-2005 wpaul

Next step on the road to IRPs: create and use an imitation of the
Windows DRIVER_OBJECT and DEVICE_OBJECT mechanism so that we can
simulate driver stacking.

In Windows, each loaded driver image is attached to a DRIVER_OBJECT
structure. Windows uses the registry to match up a given vendor/device
ID combination with a corresponding DRIVER_OBJECT. When a driver image
is first loaded, its DriverEntry() routine is invoked, which sets up
the AddDevice() function pointer in the DRIVER_OBJECT and creates
a dispatch table (based on IRP major codes). When a Windows bus driver
detects a new device, it creates a Physical Device Object (PDO) for
it. This is a DEVICE_OBJECT structure, with semantics analagous to
that of a device_t in FreeBSD. The Windows PNP manager will invoke
the driver's AddDevice() function and pass it pointers to the DRIVER_OBJECT
and the PDO.

The AddDevice() function then creates a new DRIVER_OBJECT structure of
its own. This is known as the Functional Device Object (FDO) and
corresponds roughly to a private softc instance. The driver uses
IoAttachDeviceToDeviceStack() to add this device object to the
driver stack for this PDO. Subsequent drivers (called filter drivers
in Windows-speak) can be loaded which add themselves to the stack.
When someone issues an IRP to a device, it travel along the stack
passing through several possible filter drivers until it reaches
the functional driver (which actually knows how to talk to the hardware)
at which point it will be completed. This is how Windows achieves
driver layering.

Project Evil now simulates most of this. if_ndis now has a modevent
handler which will use MOD_LOAD and MOD_UNLOAD events to drive the
creation and destruction of DRIVER_OBJECTs. (The load event also
does the relocation/dynalinking of the image.) We don't have a registry,
so the DRIVER_OBJECTS are stored in a linked list for now. Eventually,
the list entry will contain the vendor/device ID list extracted from
the .INF file. When ndis_probe() is called and detectes a supported
device, it will create a PDO for the device instance and attach it
to the DRIVER_OBJECT just as in Windows. ndis_attach() will then call
our NdisAddDevice() handler to create the FDO. The NDIS miniport block
is now a device extension hung off the FDO, just as it is in Windows.
The miniport characteristics table is now an extension hung off the
DRIVER_OBJECT as well (the characteristics are the same for all devices
handled by a given driver, so they don't need to be per-instance.)
We also do an IoAttachDeviceToDeviceStack() to put the FDO on the
stack for the PDO. There are a couple of fake bus drivers created
for the PCI and pccard buses. Eventually, there will be one for USB,
which will actually accept USB IRP.s

Things should still work just as before, only now we do things in
the proper order and maintain the correct framework to support passing
IRPs between drivers.

Various changes:

- corrected the comments about IRQL handling in subr_hal.c to more
accurately reflect reality
- update ndiscvt to make the drv_data symbol in ndis_driver_data.h a
global so that if_ndis_pci.o and/or if_ndis_pccard.o can see it.
- Obtain the softc pointer from the miniport block by referencing
the PDO rather than a private pointer of our own (nmb_ifp is no
longer used)
- implement IoAttachDeviceToDeviceStack(), IoDetachDevice(),
IoGetAttachedDevice(), IoAllocateDriverObjectExtension(),
IoGetDriverObjectExtension(), IoCreateDevice(), IoDeleteDevice(),
IoAllocateIrp(), IoReuseIrp(), IoMakeAssociatedIrp(), IoFreeIrp(),
IoInitializeIrp()
- fix a few mistakes in the driver_object and device_object definitions
- add a new module, kern_windrv.c, to handle the driver registration
and relocation/dynalinkign duties (which don't really belong in
kern_ndis.c).
- made ndis_block and ndis_chars in the ndis_softc stucture pointers
and modified all references to it
- fixed NdisMRegisterMiniport() and NdisInitializeWrapper() so they
work correctly with the new driver_object mechanism
- changed ndis_attach() to call NdisAddDevice() instead of ndis_load_driver()
(which is now deprecated)
- used ExAllocatePoolWithTag()/ExFreePool() in lookaside list routines
instead of kludged up alloc/free routines
- added kern_windrv.c to sys/modules/ndis/Makefile and files.i386.


141486 07-Feb-2005 jhb

- Implement svr4_emul_find() using kern_alternate_path(). This changes
the semantics in that the returned filename to use is now a kernel
pointer rather than a user space pointer. This required changing the
arguments to the CHECKALT*() macros some and changing the various system
calls that used pathnames to use the kern_foo() functions that can accept
kernel space filename pointers instead of calling the system call
directly.
- Use kern_open(), kern_access(), kern_msgctl(), kern_execve(),
kern_mkfifo(), kern_mknod(), kern_statfs(), kern_fstatfs(),
kern_setitimer(), kern_stat(), kern_lstat(), kern_fstat(), kern_utimes(),
kern_pathconf(), and kern_unlink().


141473 07-Feb-2005 jhb

- Use kern_{l,f,}stat() and kern_{f,}statfs() functions rather than
duplicating the contents of the same functions inline.
- Consolidate common code to convert a BSD statfs struct to a Linux struct
into a static worker function.


141472 07-Feb-2005 jhb

Make linux_emul_convpath() a simple wrapper for kern_alternate_path().


141471 07-Feb-2005 jhb

- Tweak kern_msgctl() to return a copy of the requested message queue id
structure in the struct pointed to by the 3rd argument for IPC_STAT and
get rid of the 4th argument. The old way returned a pointer into the
kernel array that the calling function would then access afterwards
without holding the appropriate locks and doing non-lock-safe things like
copyout() with the data anyways. This change removes that unsafeness and
resulting race conditions as well as simplifying the interface.
- Implement kern_foo wrappers for stat(), lstat(), fstat(), statfs(),
fstatfs(), and fhstatfs(). Use these wrappers to cut out a lot of
code duplication for freebsd4 and netbsd compatability system calls.
- Add a new lookup function kern_alternate_path() that looks up a filename
under an alternate prefix and determines which filename should be used.
This is basically a more general version of linux_emul_convpath() that
can be shared by all the ABIs thus allowing for further reduction of
code duplication.


141467 07-Feb-2005 jhb

Use kern_setitimer() to implement linux_alarm() instead of fondling the
real interval timer directly.


141031 30-Jan-2005 sobomax

Boot away another stackgap (one of the lest ones in linuxlator/i386) by
providing special version of CDIOCREADSUBCHANNEL ioctl(), which assumes that
result has to be placed into kernel space not user space. In the long run
more generic solution has to be designed WRT emulating various ioctl()s
that operate on userspace buffers, but right now there is only one such
ioctl() is emulated, so that it makes little sense.

MFC after: 2 weeks


141029 30-Jan-2005 sobomax

Extend kern_sendit() to take another enum uio_seg argument, which specifies
where the buffer to send lies and use it to eliminate yet another stackgap
in linuxlator.

MFC after: 2 weeks


140992 29-Jan-2005 sobomax

o Split out kernel part of execve(2) syscall into two parts: one that
copies arguments into the kernel space and one that operates
completely in the kernel space;

o use kernel-only version of execve(2) to kill another stackgap in
linuxlator/i386.

Obtained from: DragonFlyBSD (partially)
MFC after: 2 weeks


140839 26-Jan-2005 sobomax

Split out kernel side of msgctl(2) into two parts: the first that pops data
from the userland and pushes results back and the second which does
actual processing. Use the latter to eliminate stackgap in the linux wrapper
of that syscall.

MFC after: 2 weeks


140832 25-Jan-2005 sobomax

Split out kernel side of {get,set}itimer(2) into two parts: the first that
pops data from the userland and pushes results back and the second which does
actual processing. Use the latter to eliminate stackgap in the linux wrappers
of those syscalls.

MFC after: 2 weeks


140827 25-Jan-2005 wpaul

Apparently, the Intel icc compiler doesn't like it when you use
attributes in casts (i.e. foo = (__stdcall sometype)bar). This only
happens in two places where we need to set up function pointers, so
work around the problem with some void pointer magic.


140751 24-Jan-2005 wpaul

Begin the first phase of trying to add IRP support (and ultimately
USB device support):

- Convert all of my locally chosen function names to their actual
Windows equivalents, where applicable. This is a big no-op change
since it doesn't affect functionality, but it helps avoid a bit
of confusion (it's now a lot easier to see which functions are
emulated Windows API routines and which are just locally defined).

- Turn ndis_buffer into an mdl, like it should have been. The structure
is the same, but now it belongs to the subr_ntoskrnl module.

- Implement a bunch of MDL handling macros from Windows and use them where
applicable.

- Correct the implementation of IoFreeMdl().

- Properly implement IoAllocateMdl() and MmBuildMdlForNonPagedPool().

- Add the definitions for struct irp and struct driver_object.

- Add IMPORT_FUNC() and IMPORT_FUNC_MAP() macros to make formatting
the module function tables a little cleaner. (Should also help
with AMD64 support later on.)

- Fix if_ndis.c to use KeRaiseIrql() and KeLowerIrql() instead of
the previous calls to hal_raise_irql() and hal_lower_irql() which
have been renamed.

The function renaming generated a lot of churn here, but there should
be very little operational effect.


140482 19-Jan-2005 ps

Add a 32bit syscall wrapper for modstat

Obtained from: Yahoo!


140481 19-Jan-2005 ps

- rename nanosleep1 to kern_nanosleep
- Add a 32bit syscall entry for nanosleep

Reviewed by: peter
Obtained from: Yahoo!


140267 14-Jan-2005 wpaul

Fix a problem reported by Pierre Beyssac. Sometinmes when ndis_get_info()
calls MiniportQueryInformation(), it will return NDIS_STATUS_PENDING.
When this happens, ndis_get_info() will sleep waiting for a completion
event. If two threads call ndis_get_info() and both end up having to
sleep, they will both end up waiting on the same wait channel, which
can cause a panic in sleepq_add() if INVARIANTS are turned on.

Fix this by having ndis_get_info() use a common mutex rather than
using the process mutex with PROC_LOCK(). Also do the same for
ndis_set_info(). Note that Pierre's original patch also made ndis_thsuspend()
use the new mutex, but ndis_thsuspend() shouldn't need this since
it will make each thread that calls it sleep on a unique wait channel.

Also, it occured to me that we probably don't want to enter
MiniportQueryInformation() or MiniportSetInformation() from more
than one thread at any given time, so now we acquire a Windows
spinlock before calling either of them. The Microsoft documentation
says that MiniportQueryInformation() and MiniportSetInformation()
are called at DISPATCH_LEVEL, and previously we would call
KeRaiseIrql() to set the IRQL to DISPATCH_LEVEL before entering
either routine, but this only guarantees mutual exclusion on
uniprocessor machines. To make it SMP safe, we need to use a real
spinlock. For now, I'm abusing the spinlock embedded in the
NDIS_MINIPORT_BLOCK structure for this purpose. (This may need to be
applied to some of the other routines in kern_ndis.c at a later date.)

Export ntoskrnl_init_lock() (KeInitializeSpinlock()) from subr_ntoskrnl.c
since we need to use in in kern_ndis.c, and since it's technically part
of the Windows kernel DDK API along with the other spinlock routines. Use
it in subr_ndis.c too rather than frobbing the spinlock directly.


140214 14-Jan-2005 obrien

Match the LINUX32's style with existing style
Submitted by: Jung-uk Kim <jkim@niksun.com>

Use positive, not negative logic.


140213 14-Jan-2005 obrien

Fix Linux compat 'uname -m' on AMD64.

Submitted by: Jung-uk Kim <jkim@niksun.com>
(patch reworked by me)


140199 13-Jan-2005 phk

Remove duplicate code.


139743 05-Jan-2005 imp

Start each of the license/copyright comments with /*-


139739 05-Jan-2005 jhb

- Move the function prototypes for kern_setrlimit() and kern_wait() to
sys/syscallsubr.h where all the other kern_foo() prototypes live.
- Resort kern_execve() while I'm there.


139682 04-Jan-2005 jhb

Regenerate.


139681 04-Jan-2005 jhb

Partial sync up to the master syscalls.master file:
- Mark mount, unmount and nmount MPSAFE.
- Add a stub for _umtx_op().
- Mark open(), link(), unlink(), and freebsd32_sigaction() MPSAFE.

Pointy hats to: several


139451 30-Dec-2004 jhb

Stop explicitly touching td_base_pri outside of the scheduler and simply
set a thread's priority via sched_prio() when that is the desired action.
The schedulers will start managing td_base_pri internally shortly.


138353 03-Dec-2004 phk

Do not blindly pass linux filesystem specific mount data across.


138281 01-Dec-2004 cperciva

Fix unvalidated pointer dereference. This is FreeBSD-SA-04:17.procfs.


138129 27-Nov-2004 das

Don't include sys/user.h merely for its side-effect of recursively
including other headers.


138126 27-Nov-2004 das

Axe the semblance of support for PECOFF and Linux a.out core dumps.


138107 26-Nov-2004 phk

Ignore MNT_NODEV option, it is implicit in choice of filesystem.


137921 20-Nov-2004 das

Maintain the broken state of backwards compatibilty for a.out (and
PECOFF!) core dumps. None of the old versions of gdb I tried were
able to read a.out core dumps before or after this change.

Reviewed by: arch@


137877 18-Nov-2004 marks

Rebuild from compat/freebsd32/syscalls.master:1.43

Reviewed by: imp, phk, njl, peter
Approved by: njl


137876 18-Nov-2004 marks

32-bit FreeBSD ABI compatibility stubs from syscalls.master:1.179

Reviewed by: imp, phk, njl, peter
Approved by: njl


137647 13-Nov-2004 phk

Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST.

Use this in all the places where sleeping with the lock held is not
an issue.

The distinction will become significant once we finalize the exact
lock-type to use for this kind of case.


137507 10-Nov-2004 phk

Pick up the inode number using VOP_GETATTR() rather than caching it
in all vnodes on the off chance that linprocfs needs it. If we can afford
to call vn_fullpath() we can afford the much cheaper VOP_GETATTR().


137339 07-Nov-2004 phk

More sensible FILEDESC_ locking.


136834 23-Oct-2004 rwatson

Rebuild from FreeBSD32 syscalls.master:1.42.


136833 23-Oct-2004 rwatson

32-bit FreeBSD ABI compatibility stubs from syscalls.master:1.178.


136404 11-Oct-2004 peter

Put on my peril sensitive sunglasses and add a flags field to the internal
sysctl routines and state. Add some code to use it for signalling the need
to downconvert a data structure to 32 bits on a 64 bit OS when requested by
a 32 bit app.

I tried to do this in a generic abi wrapper that intercepted the sysctl
oid's, or looked up the format string etc, but it was a real can of worms
that turned into a fragile mess before I even got it partially working.

With this, we can now run 'sysctl -a' on a 32 bit sysctl binary and have
it not abort. Things like netstat, ps, etc have a long way to go.

This also fixes a bug in the kern.ps_strings and kern.usrstack hacks.
These do matter very much because they are used by libc_r and other things.


136356 10-Oct-2004 dwmalone

Rename thread args to be called "td" rather than "p" to be
consistent with other bits of this file. There should be no
functional change.

Submitted by: Andrea Campi (many moons ago)
MFC after: 2 month


136192 06-Oct-2004 mtm

Close a race between a thread exiting and the freeing of it's stack.
After some discussion the best option seems to be to signal the thread's
death from within the kernel. This requires that thr_exit() take an
argument.

Discussed with: davidxu, deischen, marcel
MFC after: 3 days


136152 05-Oct-2004 jhb

Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
times it needs rather than calling getrusage() twice with associated
stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
for user, system, and interrupt time as well as a bintime of the total
runtime. A new p_rux field in struct proc replaces the same inline fields
from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux
field in struct proc contains the "raw" child time usage statistics.
ruadd() has been changed to handle adding the associated rusage_ext
structures as well as the values in rusage. Effectively, the values in
rusage_ext replace the ru_utime and ru_stime values in struct rusage. These
two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
calculates appropriate timevals for user and system time as well as updating
the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a
copy of the process' p_rux structure to compute the timevals after updating
the runtime appropriately if any of the threads in that process are
currently executing. It also now only locks sched_lock internally while
doing the rux_runtime fixup. calcru() now only requires the caller to
hold the proc lock and calcru1() only requires the proc lock internally.
calcru() also no longer allows callers to ask for an interrupt timeval
since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
calling calcru1() on p_crux. Note that this means that any code that wants
child times must now call this function rather than reading from p_cru
directly. This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
proctree lock is used to protect the process group rather than the process
group lock. By holding this lock until the end of the function we now
ensure that the process/thread that we pick to dump info about will no
longer vanish while we are trying to output its info to the console.

Submitted by: bde (mostly)
MFC after: 1 month


135762 24-Sep-2004 jhb

Add a proc *p pointer for td->td_proc to make this code easier to read.


135715 24-Sep-2004 phk

Hold thread reference while frobbing cdevsw.


135573 22-Sep-2004 jhb

Various small style fixes.


135399 17-Sep-2004 bms

Fix compiler warnings, when __stdcall is #defined, by adding explicit casts.
These normally only manifest if the ndis compat module is statically
compiled into a kernel image by way of 'options NDISAPI'.

Submitted by: Dmitri Nikulin
Approved by: wpaul
PR: kern/71449
MFC after: 1 week


134267 24-Aug-2004 jhb

Regenerate after fcntl() wrappers were marked MP safe.


134266 24-Aug-2004 jhb

Fix the ABI wrappers to use kern_fcntl() rather than calling fcntl()
directly. This removes a few more users of the stackgap and also marks
the syscalls using these wrappers MP safe where appropriate.

Tested on: i386 with linux acroread5
Compiled on: i386, alpha LINT


134209 23-Aug-2004 des

Don't try to translate the control message unless we're certain it's
valid; otherwise a caller could trick us into changing any 32-bit word
in kernel memory to LINUX_SOL_SOCKET (0x00000001) if its previous value
is SOL_SOCKET (0x0000ffff).

MFC after: 3 days


133880 16-Aug-2004 wpaul

I'm a dumbass: remember to initialize fh->nf_map to NULL in
ndis_open_file() in the module loading case.


133877 16-Aug-2004 wpaul

The Texas Instruments ACX111 driver wants srand(), so provide it.


133876 16-Aug-2004 wpaul

Make the Texas Instruments 802.11g chipset work with the NDISulator.
This was tested with a Netgear WG311v2 802.11b/g PCI card. Things
that were fixed:

- This chip has two memory mapped regions, one at PCIR_BAR(0) and the
other at PCIR_BAR(1). This is a little different from the other
chips I've seen with two PCI shared memory regions, since they tend
to have the second BAR ad PCIR_BAR(2). if_ndis_pci.c tests explicitly
for PCIR_BAR(2). This has been changed to simply fill in ndis_res_mem
first and ndis_res_altmem second, if a second shared memory range
exists. Given that NDIS drivers seem to scan for BARs in ascending
order, I think this should be ok.

- Fixed the code that tries to process firmware images that have been
loaded as .ko files. To save a step, I was setting up the address
mapping in ndis_open_file(), but ndis_map_file() flags pre-existing
mappings as an error (to avoid duplicate mappings). Changed this so
that the mapping is now donw in ndis_map_file() as expected.

- Made the typedef for 'driver_entry' explicitly include __stdcall
to silence gcc warning in ndis_load_driver().

NOTE: the Texas Instruments ACX111 driver needs firmware. With my
card, there were 3 .bin files shipped with the driver. You must
either put these files in /compat/ndis or convert them with
ndiscvt -f and kldload them so the driver can use them. Without
the firmware image, the NIC won't work.


133850 16-Aug-2004 obrien

Fix the 'DEBUG' argument code to unbreak the amd64 LINT build.


133845 16-Aug-2004 obrien

Fix the 'DEBUG' argument code to unbreak the amd64 LINT build.


133840 16-Aug-2004 obrien

Fix the 'DEBUG' argument code to unbreak the LINT build.


133822 16-Aug-2004 tjr

Add support for 32-bit Linux binary emulation on amd64:
- include <machine/../linux32/linux.h> instead of <machine/../linux/linux.h>
if building with the COMPAT_LINUX32 option.
- make minimal changes to the i386 linprocfs_docpuinfo() function to support
amd64. We return a fake CPU family of 6 for now.


133816 16-Aug-2004 tjr

Changes to MI Linux emulation code necessary to run 32-bit Linux binaries
on AMD64, and the general case where the emulated platform has different
size pointers than we use natively:
- declare certain structure members as l_uintptr_t and use the new PTRIN
and PTROUT macros to convert to and from native pointers.
- declare some structures __packed on amd64 when the layout would differ
from that used on i386.
- include <machine/../linux32/linux.h> instead of <machine/../linux/linux.h>
if compiling with COMPAT_LINUX32. This will need to be revisited before
32-bit and 64-bit Linux emulation support can coexist in the same kernel.
- other small scattered changes.

This should be a no-op on i386 and Alpha.


133749 15-Aug-2004 tjr

Replace linux_getitimer() and linux_setitimer() with implementations
based on those in freebsd32_misc.c, removing the assumption that Linux
uses the same layout for struct itimerval as we use natively.


133747 15-Aug-2004 tjr

Avoid assuming that l_timeval is the same as the native struct timeval
in linux_select().


133745 15-Aug-2004 tjr

Use sv_psstrings from the current process's sysentvec structure instead
of PS_STRINGS. This is a no-op at present, but it will be needed when
running 32-bit Linux binaries on amd64 to ensure PS_STRINGS is in
addressable memory.


133716 14-Aug-2004 phk

Add XXX comment about findcdev() misuse.


133464 11-Aug-2004 marcel

Add __elfN(dump_thread). This function is called from __elfN(coredump)
to allow dumping per-thread machine specific notes. On ia64 we use this
function to flush the dirty registers onto the backingstore before we
write out the PRSTATUS notes.

Tested on: alpha, amd64, i386, ia64 & sparc64
Not tested on: arm, powerpc


133127 04-Aug-2004 wpaul

More minor cleanups and one small bug fix:

- In ntoskrnl_var.h, I had defined compat macros for
ntoskrnl_acquire_spinlock() and ntoskrnl_release_spinlock() but
never used them. This is fortunate since they were stale. Fix them
to work properly. (In Windows/x86 KeAcquireSpinLock() is a macro that
calls KefAcquireSpinLock(), which lives in HAL.dll. To imitate this,
ntoskrnl_acquire_spinlock() is just a macro that calls hal_lock(),
which lives in subr_hal.o.)

- Add macros for ntoskrnl_raise_irql() and ntoskrnl_lower_irql() that
call hal_raise_irql() and hal_lower_irql().

- Use these macros in kern_ndis.c, subr_ndis.c and subr_ntoskrnl.c.

- Along the way, I realised subr_ndis.c:ndis_lock() was not calling
hal_lock() correctly (it was using the FASTCALL2() wrapper when
in reality this routine is FASTCALL1()). Using the
ntoskrnl_acquire_spinlock() fixes this. Not sure if this actually
caused any bugs since hal_lock() would have just ignored what
was in %edx, but it was still bogus.

This hides many of the uses of the FASTCALLx() macros which makes the
code a little cleaner. Should not have any effect on generated object
code, other than the one fix in ndis_lock().


132980 01-Aug-2004 wpaul

In ndis_alloc_bufpool() and ndis_alloc_packetpool(), the test to see if
allocating pool memory succeeded was checking the wrong pointer (should
have been looking at *pool, not pool). Corrected this.


132973 01-Aug-2004 wpaul

Big mess 'o changes:

- Give ndiscvt(8) the ability to process a .SYS file directly into
a .o file so that we don't have to emit big messy char arrays into
the ndis_driver_data.h file. This behavior is currently optional, but
may become the default some day.

- Give ndiscvt(8) the ability to turn arbitrary files into .ko files
so that they can be pre-loaded or kldloaded. (Both this and the
previous change involve using objcopy(1)).

- Give NdisOpenFile() the ability to 'read' files out of kernel memory
that have been kldloaded or pre-loaded, and disallow the use of
the normal vn_open() file opening method during bootstrap (when no
filesystems have been mounted yet). Some people have reported that
kldloading if_ndis.ko works fine when the system is running multiuser
but causes a panic when the modile is pre-loaded by /boot/loader. This
happens with drivers that need to use NdisOpenFile() to access
external files (i.e. firmware images). NdisOpenFile() won't work
during kernel bootstrapping because no filesystems have been mounted.
To get around this, you can now do the following:

o Say you have a firmware file called firmware.img
o Do: ndiscvt -f firmware.img -- this creates firmware.img.ko
o Put the firmware.img.ko in /boot/kernel
o add firmware.img_load="YES" in /boot/loader.conf
o add if_ndis_load="YES" and ndis_load="YES" as well

Now the loader will suck the additional file into memory as a .ko. The
phony .ko has two symbols in it: filename_start and filename_end, which
are generated by objcopy(1). ndis_open_file() will traverse each module
in the module list looking for these symbols and, if it finds them, it'll
use them to generate the file mapping address and length values that
the caller of NdisOpenFile() wants.

As a bonus, this will even work if the file has been statically linked
into the kernel itself, since the "kernel" module is searched too.
(ndiscvt(8) will generate both filename.o and filename.ko for you).

- Modify the mechanism used to provide make-pretend FASTCALL support.
Rather than using inline assembly to yank the first two arguments
out of %ecx and %edx, we now use the __regparm__(3) attribute (and
the __stdcall__ attribute) and use some macro magic to re-order
the arguments and provide dummy arguments as needed so that the
arguments passed in registers end up in the right place. Change
taken from DragonflyBSD version of the NDISulator.


132708 27-Jul-2004 phk

Use kernel_vmount() instead of vfs_nmount().


132653 26-Jul-2004 cperciva

Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with: rwatson, scottl
Requested by: jhb


132468 20-Jul-2004 wpaul

*sigh* Fix source code compatibility with 5.2.1-RELEASE _again_.
(Make kdb stuff conditional.)


132347 18-Jul-2004 dwmalone

I missed two pieces of the commit to this file. Robert has already
added one, this adds the other.


132331 18-Jul-2004 rwatson

Remove 'sg' argument to linux_sendto_hdrincl, which is what I think was
intended. This fixes the build, but might require revision.


132313 17-Jul-2004 dwmalone

Add a kern_setsockopt and kern_getsockopt which can read the option
values from either user land or from the kernel. Use them for
[gs]etsockopt and to clean up some calls to [gs]etsockopt in the
Linux emulation code that uses the stackgap.


132263 16-Jul-2004 obrien

/usr/libexec/ld-elf.so.1 -> /libexec/ld-elf32.so.1


132199 15-Jul-2004 phk

Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".


132127 14-Jul-2004 peter

Regen


132126 14-Jul-2004 peter

Unmapped syscalls should be NOPROTO so that we don't get a duplicate
prototype. (kldunloadf in this case)


132117 13-Jul-2004 phk

Give kldunload a -f(orce) argument.

Add a MOD_QUIESCE event for modules. This should return error (EBUSY)
of the module is in use.

MOD_UNLOAD should now only fail if it is impossible (as opposed to
inconvenient) to unload the module. Valid reasons are memory references
into the module which cannot be tracked down and eliminated.

When kldunloading, we abandon if MOD_UNLOAD fails, and if -force is
not given, MOD_QUIESCE failing will also prevent the unload.

For backwards compatibility, we treat EOPNOTSUPP from MOD_QUIESCE as
success.

Document that modules should return EOPNOTSUPP for unknown events.


132116 13-Jul-2004 phk

Add kldunloadf() system call. Stay tuned for follwing commit messages.


131953 11-Jul-2004 wpaul

Make NdisReadPcmciaAttributeMemory() and NdisWritePcmciaAttributeMemory()
actually work.

Make the PCI and PCCARD attachments provide a bus_get_resource_list()
method so that resource listing for PCCARD works. PCCARD does not
have a bus_get_resource_list() method (yet), so I faked up the
resource list management in if_ndis_pccard.c, and added
bus_get_resource_list() methods to both if_ndis_pccard.c and if_ndis_pci.c.
The one in the PCI attechment just hands off to the PCI bus code.
The difference is transparent to the NDIS resource handler code.

Fixed ndis_open_file() so that opening files which live on NFS
filesystems work: pass an actual ucred structure to VOP_GETATTR()
(NFS explodes if the ucred structure is NOCRED).

Make NdisMMapIoSpace() handle mapping of PCMCIA attribute memory
resources correctly.

Turn subr_ndis.c:my_strcasecmp() into ndis_strcasecmp() and export
it so that if_ndis_pccard.c can use it, and junk the other copy
of my_strcasecmp() from if_ndis_pccard.c.


131909 10-Jul-2004 marcel

Update for the KDB framework:
o Call kdb_enter() instead of Debugger().

While here, remove a redundant return.


131897 10-Jul-2004 phk

Clean up and wash struct iovec and struct uio handling.

Add copyiniov() which copies a struct iovec array in from userland into
a malloc'ed struct iovec. Caller frees.

Change uiofromiov() to malloc the uio (caller frees) and name it
copyinuio() which is more appropriate.

Add cloneuio() which returns a malloc'ed copy. Caller frees.

Use them throughout.


131796 08-Jul-2004 phk

Use a couple of regular kernel entry points, rather than COMPAT_43
entry points.


131750 07-Jul-2004 wpaul

Fix two problems:

- In subr_ndis.c:ndis_allocate_sharemem(), create the busdma tags
used for shared memory allocations with a lowaddr of 0x3E7FFFFF.
This forces the buffers to be mapped to physical/bus addresses within
the first 1GB of physical memory. It seems that at least one card
(Linksys Instant Wireless PCI V2.7) depends on this behavior. I
don't know if this is a hardware restriction, or if the NDIS
driver for this card is truncating the addresses itself, but using
physical/bus addresses beyong the 1GB limit causes initialization
failures.

- Create am NDIS_INITIALIZED() macro in if_ndisvar.h and use it in
if_ndis.c to test whether the device has been initialized rather
than checking for the presence of the IFF_UP flag in if_flags.
While debugging the previous problem, I noticed that bringing
up the device would always produce failures from ndis_setmulti().
It turns out that the following steps now occur during device
initialization:

- IFF_UP flag is set in if_flags
- ifp->if_ioctl() called with SIOCSIFADDR (which we don't handle)
- ifp->if_ioctl() called with SIOCADDMULTI
- ifp->if_ioctl() called with SIOCADDMULTI (again)
- ifp->if_ioctl() called with SIOCADDMULTI (yet again)
- ifp->if_ioctl() called with SIOCSIFFLAGS

Setting the receive filter and multicast filters can only be done
when the underlying NDIS driver has been initialized, which is done
by ifp->if_init(). However, we don't call ifp->if_init() until
ifp->if_ioctl() is called with SIOCSIFFLAGS and IFF_UP has been
set. It appears that now, the network stack tries to add multicast
addresses to interface's filter before those steps occur. Normally,
ndis_setmulti() would trap this condition by checking for the IFF_UP
flag, but the network code has in fact set this flag already, so
ndis_setmulti() is fooled into thinking the interface has been
initialized when it really hasn't.

It turns out this is usually harmless because the ifp->if_init()
routine (in this case ndis_init()) will set up the multicast
filter when it initializes the hardware anyway, and the underlying
routines (ndis_get_info()/ndis_set_info()) know that the driver/NIC
haven't been initialized yet, but you end up spurious error messages
on the console all the time.

Something tells me this new behavior isn't really correct. I think
the intention was to fix it so that ifp->if_init() is only called
once when we ifconfig an interface up, but the end result seems a
little bogus: the change of the IFF_UP flag should be propagated
down to the driver before calling any other ioctl() that might actually
require the hardware to be up and running.


131461 02-Jul-2004 netchild

Implement SNDCTL_DSP_SETDUPLEX. This may fix sound apps which want to
use full duplex mode.

Approved by: matk


131431 02-Jul-2004 marcel

Change the thread ID (thr_id_t) used for 1:1 threading from being a
pointer to the corresponding struct thread to the thread ID (lwpid_t)
assigned to that thread. The primary reason for this change is that
libthr now internally uses the same ID as the debugger and the kernel
when referencing to a kernel thread. This allows us to implement the
support for debugging without additional translations and/or mappings.

To preserve the ABI, the 1:1 threading syscalls, including the umtx
locking API have not been changed to work on a lwpid_t. Instead the
1:1 threading syscalls operate on long and the umtx locking API has
not been changed except for the contested bit. Previously this was
the least significant bit. Now it's the most significant bit. Since
the contested bit should not be tested by userland, this change is
not expected to be visible. Just to be sure, UMTX_CONTESTED has been
removed from <sys/umtx.h>.

Reviewed by: mtm@
ABI preservation tested on: i386, ia64


131430 02-Jul-2004 marcel

Regen.


131013 24-Jun-2004 obrien

Cast variable-sized (based on platform) quantities before printing out.


130959 23-Jun-2004 bde

Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of
depending on namespace pollution in <sys/vnode.h> for the definition
of GIANT_REQUIRED.

Sorted includes.


130902 22-Jun-2004 rwatson

Mark linux_emul_convpath() as GIANT_REQUIRED.


130892 21-Jun-2004 phk

Put the pre FreeBSD-2.x tty compat code under BURN_BRIDGES.


130691 18-Jun-2004 bms

Add stub for Linux SOUND_MIXER_READ_RECMASK, required by some Linux sound
applications.

PR: misc/27471
Submitted by: Gavin Atkinson (with cleanups)


130689 18-Jun-2004 bms

Add a stub for the Linux SOUND_MIXER_INFO ioctl (even though we don't
actually implement it), as some applications, such as RealProducer,
expect to be able to use it.

PR: kern/65971
Submitted by: Matt Wright


130688 18-Jun-2004 bms

Linux applications expect to be able to call SIOCGIFCONF with an
NULL ifc.ifc_buf pointer, to determine the expected buffer size.

The submitted fix only takes account of interfaces with an AF_INET
address configured. This could no doubt be improved.

PR: kern/45753
Submitted by: Jacques Garrigue (with cleanups)


130687 18-Jun-2004 bms

Fix the VT_SETMODE/CDROMIOCTOCENTRY problem correctly.

Reviewed by: tjr


130682 18-Jun-2004 bms

Fix two attempts to use an unchecked NULL pointer provided from the
userland, for the CDIOREADTOCENTRY and VT_SETMODE cases respectively.

Noticed by: tjr


130640 17-Jun-2004 phk

Second half of the dev_t cleanup.

The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.


130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


130453 14-Jun-2004 phk

Add support for more linux ioctls.

I've had this sitting in my tree for a long time and I can't seem to
find who sent it to me in the first place, apologies to whoever is
missing out on a Contributed by: line here.

I belive it works as it should.


130398 13-Jun-2004 rwatson

Socket MAC labels so_label and so_peerlabel are now protected by
SOCK_LOCK(so):

- Hold socket lock over calls to MAC entry points reading or
manipulating socket labels.

- Assert socket lock in MAC entry point implementations.

- When externalizing the socket label, first make a thread-local
copy while holding the socket lock, then release the socket lock
to externalize to userspace.


130344 11-Jun-2004 phk

Deorbit COMPAT_SUNOS.

We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither
a sparc32 port nor a SunOS4.x compatibility desire these days.


130166 07-Jun-2004 wpaul

Add another 5.2.1 source compatibility tweak: acquire Giant before calling
kthread_exit() if FreeBSD_version is old enough.


130101 05-Jun-2004 tjr

Change the types of vn_rdwr_inchunks()'s len and aresid arguments to
size_t and size_t *, respectively. Update callers for the new interface.
This is a better fix for overflows that occurred when dumping segments
larger than 2GB to core files.


130097 04-Jun-2004 des

Take advantage of the dev sysctl tree.

Approved by: wpaul


130052 04-Jun-2004 wpaul

Grrr. Really check subr_ndis.c in this time. (fixed my_strcasecmp())


129970 01-Jun-2004 wpaul

Explicitly #include <sys/module.h> instead of depending on <sys/kernel.h>
to do it for us.


129850 29-May-2004 wpaul

Fix build with ndisulator: Add prototype for my_strcasecmp().


129834 29-May-2004 wpaul

In subr_ndis.c, when searching for keys in our make-pretend registry,
make the key name matching case-insensitive. There are some drivers
and .inf files that have mismatched cases, e.g. the driver will look
for "AdhocBand" whereas the .inf file specifies a registry key to be
created called "AdHocBand." The mismatch is probably a typo that went
undetected (so much for QA), but since Windows seems to be case-insensitive,
we should be too.

In if_ndis.c, initialize rates and channels correctly so that specify
frequences correctly when trying to set channels in the 5Ghz band, and
so that 802.11b rates show up for some a/b/g cards (which otherwise
appear to have no 802.11b modes).

Also, when setting OID_802_11_CONFIGURATION in ndis_80211_setstate(),
provide default values for the beacon interval, ATIM window and dwelltime.
The Atheros "Aries" driver will crash if you try to select ad-hoc mode
and leave the beacon interval set to 0: it blindly uses this value and
does a division by 0 in the interrupt handler, causing an integer
divide trap.


128780 30-Apr-2004 wpaul

Small timer cleanups:

- Use the dh_inserted member of the dispatch header in the Windows
timer structure to indicate that the timer has been "inserted into
the timer queue" (i.e. armed via timeout()). Use this as the value
to return to the caller in KeCancelTimer(). Previously, I was using
callout_pending(), but you can't use that with timeout()/untimeout()
without creating a potential race condition.

- Make ntoskrnl_init_timer() just a wrapper around ntoskrnl_init_timer_ex()
(reduces some code duplication).

- Drop Giant when entering if_ndis.c:ndis_tick() and
subr_ntorkrnl.c:ntoskrnl_timercall(). At the moment, I'm forced to
use system callwheel via timeout()/untimeout() to handle timers rather
than the callout API (struct callout is too big to fit inside the
Windows struct KTIMER, so I'm kind of hosed). Unfortunately, all
the callouts in the callwhere are not marked as MPSAFE, so when
one of them fires, it implicitly acquires Giant before invoking the
callback routine (and releases it when it returns). I don't need to
hold Giant, but there's no way to stop the callout code from acquiring
it as long as I'm using timeout()/untimeout(), so for now we cheat
by just dropping Giant right away (and re-acquiring it right before
the routine returns so keep the callout code happy). At some point,
I will need to solve this better, but for now this should be a suitable
workaround.


128597 24-Apr-2004 marcel

Fix build for non-COMPAT_FREEBSD4 configurations. Make the FreeBSD 4
statfs functions conditional upon the option.


128546 22-Apr-2004 wpaul

Ok, _really_ fix the Intel 2100B Centrino deadlock problems this time.
(I hope.)

My original instinct to make ndis_return_packet() asynchronous was correct.
Making ndis_rxeof() submit packets to the stack asynchronously fixes
one recursive spinlock acquisition, but it's also possible for it to
happen via the ndis_txeof() path too. So:

- In if_ndis.c, revert ndis_rxeof() to its old behavior (and don't bother
putting ndis_rxeof_serial() back since we don't need it anymore).

- In kern_ndis.c, make ndis_return_packet() submit the call to the
MiniportReturnPacket() function to the "ndis swi" thread so that
it always happens in another context no matter who calls it.


128449 20-Apr-2004 wpaul

Correct the AT_DISPATCH_LEVEL() macro to match earlier changes.


128447 19-Apr-2004 wpaul

Try to handle recursive attempts to raise IRQL to DISPATCH_LEVEL better
(among other things).


128406 18-Apr-2004 wpaul

In ntoskrnl_unlock_dpc(), use atomic_store instead of atomic_cmpset
to give up the spinlock.

Suggested by: bde


128295 16-Apr-2004 wpaul

- Use memory barrier with atomic operations in ntoskrnl_lock_dpc() and
ntoskrnl_unlocl_dpc().
- hal_raise_irql(), hal_lower_irql() and hal_irql() didn't work right
on SMP (priority inheritance makes things... interesting). For now,
use only two states: DISPATCH_LEVEL (PI_REALTIME) and PASSIVE_LEVEL
(everything else). Tested on a dual PIII box.
- Use ndis_thsuspend() in ndis_sleep() instead of tsleep(). (I added
ndis_thsuspend() and ndis_thresume() to replace kthread_suspend()
and kthread_resume(); the former will preserve a thread's priority
when it wakes up, the latter will not.)
- Change use of tsleep() in ndis_stop_thread() to prevent priority
change on wakeup.


128262 14-Apr-2004 peter

Check in structure definitions for the FreeBSD-3.x signal syscall stuff.
Nothing uses these yet, but I dont want to lose them.


128261 14-Apr-2004 peter

Regen


128260 14-Apr-2004 peter

Catch up to the not-so-recent statfs(2) changes.


128229 14-Apr-2004 wpaul

Continue my efforts to imitate Windows as closely as possible by
attempting to duplicate Windows spinlocks. Windows spinlocks differ
from FreeBSD spinlocks in the way they block preemption. FreeBSD
spinlocks use critical_enter(), which masks off _all_ interrupts.
This prevents any other threads from being scheduled, but it also
prevents ISRs from running. In Windows, preemption is achieved by
raising the processor IRQL to DISPATCH_LEVEL, which prevents other
threads from preempting you, but does _not_ prevent device ISRs
from running. (This is essentially what Solaris calls dispatcher
locks.) The Windows spinlock itself (kspin_lock) is just an integer
value which is atomically set when you acquire the lock and atomically
cleared when you release it.

FreeBSD doesn't have IRQ levels, so we have to cheat a little by
using thread priorities: normal thread priority is PASSIVE_LEVEL,
lowest interrupt thread priority is DISPATCH_LEVEL, highest thread
priority is DEVICE_LEVEL (PI_REALTIME) and critical_enter() is
HIGH_LEVEL. In practice, only PASSIVE_LEVEL and DISPATCH_LEVEL
matter to us. The immediate benefit of all this is that I no
longer have to rely on a mutex pool.

Now, I'm sure many people will be seized by the urge to criticize
me for doing an end run around our own spinlock implementation, but
it makes more sense to do it this way. Well, it does to me anyway.

Overview of the changes:

- Properly implement hal_lock(), hal_unlock(), hal_irql(),
hal_raise_irql() and hal_lower_irql() so that they more closely
resemble their Windows counterparts. The IRQL is determined by
thread priority.

- Make ntoskrnl_lock_dpc() and ntoskrnl_unlock_dpc() do what they do
in Windows, which is to atomically set/clear the lock value. These
routines are designed to be called from DISPATCH_LEVEL, and are
actually half of the work involved in acquiring/releasing spinlocks.

- Add FASTCALL1(), FASTCALL2() and FASTCALL3() macros/wrappers
that allow us to call a _fastcall function in spite of the fact
that our version of gcc doesn't support __attribute__((__fastcall__))
yet. The macros take 1, 2 or 3 arguments, respectively. We need
to call hal_lock(), hal_unlock() etc... ourselves, but can't really
invoke the function directly. I could have just made the underlying
functions native routines and put _fastcall wrappers around them for
the benefit of Windows binaries, but that would create needless bloat.

- Remove ndis_mtxpool and all references to it. We don't need it
anymore.

- Re-implement the NdisSpinLock routines so that they use hal_lock()
and friends like they do in Windows.

- Use the new spinlock methods for handling lookaside lists and
linked list updates in place of the mutex locks that were there
before.

- Remove mutex locking from ndis_isr() and ndis_intrhand() since they're
already called with ndis_intrmtx held in if_ndis.c.

- Put ndis_destroy_lock() code under explicit #ifdef notdef/#endif.
It turns out there are some drivers which stupidly free the memory
in which their spinlocks reside before calling ndis_destroy_lock()
on them (touch-after-free bug). The ADMtek wireless driver
is guilty of this faux pas. (Why this doesn't clobber Windows I
have no idea.)

- Make NdisDprAcquireSpinLock() and NdisDprReleaseSpinLock() into
real functions instead of aliasing them to NdisAcaquireSpinLock()
and NdisReleaseSpinLock(). The Dpr routines use
KeAcquireSpinLockAtDpcLevel() level and KeReleaseSpinLockFromDpcLevel(),
which acquires the lock without twiddling the IRQL.

- In ndis_linksts_done(), do _not_ call ndis_80211_getstate(). Some
drivers may call the status/status done callbacks as the result of
setting an OID: ndis_80211_getstate() gets OIDs, which means we
might cause the driver to recursively access some of its internal
structures unexpectedly. The ndis_ticktask() routine will call
ndis_80211_getstate() for us eventually anyway.

- Fix the channel setting code a little in ndis_80211_setstate(),
and initialize the channel to IEEE80211_CHAN_ANYC. (The Microsoft
spec says you're not supposed to twiddle the channel in BSS mode;
I may need to enforce this later.) This fixes the problems I was
having with the ADMtek adm8211 driver: we were setting the channel
to a non-standard default, which would cause it to fail to associate
in BSS mode.

- Use hal_raise_irql() to raise our IRQL to DISPATCH_LEVEL when
calling certain miniport routines, per the Microsoft documentation.

I think that's everything. Hopefully, other than fixing the ADMtek
driver, there should be no apparent change in behavior.


128012 07-Apr-2004 wpaul

In ndis_convert_res(), initialize the head of our temporary list
before calling BUS_GET_RESOURCE_LIST(). Previously, the list head would
only be initialized if BUS_GET_RESOURCE_LIST() succeeded; it needs to
be initialized unconditionally so that the list cleanup code won't
trip over potential stack garbage.


127887 05-Apr-2004 wpaul

- The MiniportReset() function can return NDIS_STATUS_PENDING, in which
case we should wait for the resetdone handler to be called before
returning.

- When providing resources via ndis_query_resources(), uses the
computed rsclen when using bcopy() to copy out the resource data
rather than the caller-supplied buffer length.

- Avoid using ndis_reset_nic() in if_ndis.c unless we really need
to reset the NIC because of a problem.

- Allow interrupts to be fielded during ndis_attach(), at least
as far as allowing ndis_isr() and ndis_intrhand() to run.

- Use ndis_80211_rates_ex when probing for supported rates. Technically,
this isn't supposed to work since, although Microsoft added the extended
rate structure with the NDIS 5.1 update, the spec still says that
the OID_802_11_SUPPORTED_RATES OID uses ndis_80211_rates. In spite of
this, it appears some drivers use it anyway.

- When adding in our guessed rates, check to see if they already exist
so that we avoid any duplicates.

- Add a printf() to ndis_open_file() that alerts the user when a
driver attempts to open a file under /compat/ndis.

With these changes, I can get the driver for the SMC 2802W 54g PCI
card to load and run. This board uses a Prism54G chip. Note that in
order for this driver to work, you must place the supplied smc2802w.arm
firmware image under /compat/ndis. (The firmware is not resident on
the device.)

Note that this should also allow the 3Com 3CRWE154G72 card to work
as well; as far as I can tell, these cards also use a Prism54G chip.


127694 01-Apr-2004 pjd

Remove ps_argsopen from this check, because of two reasons:
1. This check if wrong, because it is true by default
(kern.ps_argsopen is 1 by default) (p_cansee() is not even checked).
2. Sysctl kern.ps_argsopen is going away.


127552 29-Mar-2004 wpaul

Add missing cprd_flags member to partial resource structure in
resource_var.h.

In kern_ndis.c:ndis_convert_res(), fill in the cprd_flags and
cprd_sharedisp fields as best we can.

In if_ndis.c:ndis_setmulti(), don't bother updating the multicast
filter if our multicast address list is empty.

Add some missing updates to ndis_var.h and ntoskrnl_var.h that I
forgot to check in when I added the KeDpc stuff.


127503 27-Mar-2004 wpaul

Apparently, some atheros drivers want rand(), so implement it (in terms
of random()).

Requested by: juli
Bribe offered: tacos


127484 27-Mar-2004 mtm

Regen for libthr thread synchronization syscalls.


127482 27-Mar-2004 mtm

Separate thread synchronization from signals in libthr. Instead
use msleep() and wakeup_one().

Discussed with: jhb, peter, tjr


127411 25-Mar-2004 wpaul

- In subr_ndis.c:ndis_init_event(), initialize events as notification
objects rather than synchronization objects. When a sync object is
signaled, only the first thread waiting on it is woken up, and then
it's automatically reset to the not-signaled state. When a
notification object is signaled, all threads waiting on it will
be woken up, and it remains in the signaled state until someone
resets it manually. We want the latter behavior for NDIS events.

- In kern_ndis.c:ndis_convert_res(), we have to create a temporary
copy of the list returned by BUS_GET_RESOURCE_LIST(). When the PCI
bus code probes resources for a given device, it enters them into
a singly linked list, head first. The result is that traversing
this list gives you the resources in reverse order. This means when
we create the Windows resource list, it will be in reverse order too.
Unfortunately, this can hose drivers for devices with multiple I/O
ranges of the same type, like, say, two memory mapped I/O regions (one
for registers, one to map the NVRAM/bootrom/whatever). Some drivers
test the range size to figure out which region is which, but others
just assume that the resources will be listed in ascending order from
lowest numbered BAR to highest. Reversing the order means such drivers
will choose the wrong resource as their I/O register range.

Since we can't traverse the resource SLIST backwards, we have to
make a temporary copy of the list in the right order and then build
the Windows resource list from that. I suppose we could just fix
the PCI bus code to use a TAILQ instead, but then I'd have to track
down all the consumers of the BUS_GET_RESOURCE_LIST() and fix them
too.


127393 25-Mar-2004 wpaul

- In kern_ndis.c, implement ndis_unsched(), the complement to ndis_sched(),
which pulls a job off a thread work queue (assuming it hasn't run yet).
This is needed for KeRemoveQueueDpc().

- In subr_ntoskrnl.c, implement KeInsertQueueDpc() and KeRemoveQueueDpc(),
to go with KeInitializeDpc() to round out the API. Also change the
KeTimer implementation to use this API instead of the private
timer callout scheduler. Functionality of the timer API remains
unchanged, but we get a couple new Windows kernel API routines and
more closely imitate the way thing works in Windows. (As of yet
I haven't encountered any drivers that use KeInsertQueueDpc() or
KeRemoveQueueDpc(), but it doesn't hurt to have them.)


127320 22-Mar-2004 wpaul

Remove another case of grabbing Giant before doing a kthread_exit()
which is now no longer needed.


127311 22-Mar-2004 wpaul

I'm a dumbass: the test in the MOD_SHUTDOWN case in ndis_modevent()
that checks to see if any devices are still in the devlist was reversed.


127284 22-Mar-2004 wpaul

The Intel 2200BG NDIS driver does an alloca() of about 5000 bytes
when it associates with a net. Because FreeBSD's kstack size is only
2 pages by default, this blows the stack and causes a double fault.

To deal with this, we now create all our kthreads with 8 stack pages.
Also, we now run all timer callouts in the ndis swi thread (since
they would otherwise run in the clock ithread, whose stack is too
small). It happens that the alloca() in this case was occuring within
the interrupt handler, which was already running in the ndis swi
thread, but I want to deal with the callouts too just to be extra
safe.

NOTE: this will only work if you update vm_machdep.c with the change
I just committed. If you don't include this fix, setting the number
of stack pages with kthread_create() has essentially no effect.


127251 21-Mar-2004 peter

Change (yet again, sorry!) the path of the 32 bit ld-elf.so.1.


127248 20-Mar-2004 wpaul

- Rewrite the timer and event API routines in subr_ndis.c so that they
are actually layered on top of the KeTimer API in subr_ntoskrnl.c, just
as it is in Windows. This reduces code duplication and more closely
imitates the way things are done in Windows.

- Modify ndis_encode_parm() to deal with the case where we have
a registry key expressed as a hex value ("0x1") which is being
read via NdisReadConfiguration() as an int. Previously, we tried
to decode things like "0x1" with strtol() using a base of 10, which
would always yield 0. This is what was causing problems with the
Intel 2200BG Centrino 802.11g driver: the .inf file that comes
with it has a key called RadioEnable with a value of 0x1. We
incorrectly decoded this value to '0' when it was queried, hence
the driver thought we wanted the radio turned off.

- In if_ndis.c, most drivers don't accept NDIS_80211_AUTHMODE_AUTO,
but NDIS_80211_AUTHMODE_SHARED may not be right in some cases,
so for now always use NDIS_80211_AUTHMODE_OPEN.

NOTE: There is still one problem with the Intel 2200BG driver: it
happens that the kernel stack in Windows is larger than the kernel
stack in FreeBSD. The 2200BG driver sometimes eats up more than 2
pages of stack space, which can lead to a double fault panic.
For the moment, I got things to work by adding the following to
my kernel config file:

options KSTACK_PAGES=8

I'm pretty sure 8 is too big; I just picked this value out of a hat
as a test, and it happened to work, so I left it. 4 pages might be
enough. Unfortunately, I don't think you can dynamically give a
thread a larger stack, so I'm not sure how to handle this short of
putting a note in the man page about it and dealing with the flood
of mail from people who never read man pages.


127140 17-Mar-2004 jhb

- Replace wait1() with a kern_wait() function that accepts the pid,
options, status pointer and rusage pointer as arguments. It is up to
the caller to copyout the status and rusage to userland if needed. This
lets us axe the 'compat' argument and hide all that functionality in
owait(), by the way. This also cleans up some locking in kern_wait()
since it no longer has to drop locks around copyout() since all the
copyout()'s are deferred.
- Convert owait(), wait4(), and the various ABI compat wait() syscalls to
use kern_wait() rather than wait1() or wait4(). This removes a bit
more stackgap usage.

Tested on: i386
Compiled on: i386, alpha, amd64


127059 16-Mar-2004 tjr

Use vfs_nmount() to mount linprocfs filesystems in linux_mount();
linprocfs doesn't support the old mount interface.


127057 16-Mar-2004 tjr

Correct size argument passed to copyinstr() in linux_mount(): mntfromname
and mntonname are both MNAMELEN characters long, not MFSNAMELEN.


127026 15-Mar-2004 wpaul

Add vectors for _snprintf() and _vsnprintf() (redirected straight to
snprintf() and vsnprintf() in FreeBSD kernel land).

This is needed by the Intel Centrino 2200BG driver. Unfortunately, this
driver still doesn't work right with Project Evil even with this tweak,
but I'm unable to diagnose the problem since I don't have access to a
sample card.


126928 13-Mar-2004 peter

Move the non-MD machine/dvcfg.h and machine/physio_proc.h to a common
MI area before they proliferate more.


126851 11-Mar-2004 phk

Remove unused second arg to vfinddev().
Don't call addaliasu() on VBLK nodes.


126834 11-Mar-2004 wpaul

Fix mind-o: sanity check in ndis_disable_ndis() is not sane.


126833 11-Mar-2004 wpaul

Fix the problem with the Cisco Aironet 340 PCMCIA card. Most newer drivers
for Windows are deserialized miniports. Such drivers maintain their own
queues and do their own locking. This particular driver is not deserialized
though, and we need special support to handle it correctly.

Typically, in the ndis_rxeof() handler, we pass all incoming packets
directly to (*ifp->if_input)(). This in turn may cause another thread
to run and preempt us, and the packet may actually be processed and
then released before we even exit the ndis_rxeof() routine. The
problem with this is that releasing a packet calls the ndis_return_packet()
function, which hands the packet and its buffers back to the driver.
Calling ndis_return_packet() before ndis_rxeof() returns will screw
up the driver's internal queues since, not being deserialized,
it does no locking.

To avoid this problem, if we detect a serialized driver (by checking
the attribute flags passed to NdisSetAttributesEx(), we use an alternate
ndis_rxeof() handler, ndis_rxeof_serial(), which puts the call to
(*ifp->if_input)() on the NDIS SWI work queue. This guarantees the
packet won't be processed until after ndis_rxeof_serial() returns.

Note that another approach is to always copy the packet data into
another mbuf and just let the driver retain ownership of the ndis_packet
structure (ndis_return_packet() never needs to be called in this
case). I'm not sure which method is faster.


126795 10-Mar-2004 wpaul

Fix several issues related to the KeInitializeTimer() etc... API stuff
that I added recently:

- When a periodic timer fires, it's automatically re-armed. We must
make sure to re-arm the timer _before_ invoking any caller-supplied
defered procedure call: the DPC may choose to call KeCancelTimer(),
and re-arming the timer after the DPC un-does the effect of the
cancel.

- Fix similar issue with periodic timers in subr_ndis.c.

- When calling KeSetTimer() or KeSetTimerEx(), if the timer is
already pending, untimeout() it first before timeout()ing
it again.

- The old Atheros driver for the 5211 seems to use KeSetTimerEx()
incorrectly, or at the very least in a very strange way that
doesn't quite follow the Microsoft documentation. In one case,
it calls KeSetTimerEx() with a duetime of 0 and a period of 5000.
The Microsoft documentation says that negative duetime values
are relative to the current time and positive values are absolute.
But it doesn't say what's supposed to happen with positive values
that less than the current time, i.e. absolute values that are
in the past.

Lacking any further information, I have decided that timers with
positive duetimes that are in the past should fire right away (or
in our case, after only 1 tick). This also takes care of the other
strange usage in the Atheros driver, where the duetime is
specified as 500000 and the period is 50. I think someone may
have meant to use -500000 and misinterpreted the documentation.

- Also modified KeWaitForSingleObject() and KeWaitForMultipleObjects()
to make the same duetime adjustment, since they have the same rules
regarding timeout values.

- Cosmetic: change name of 'timeout' variable in KeWaitForSingleObject()
and KeWaitForMultipleObjects() to 'duetime' to avoid senseless
(though harmless) overlap with timeout() function name.

With these fixes, I can get the 5211 card to associate properly with
my adhoc net using driver AR5211.SYS version 2.4.1.6.


126706 07-Mar-2004 wpaul

Add preliminary support for PCMCIA devices in addition to PCI/cardbus.
if_ndis.c has been split into if_ndis_pci.c and if_ndis_pccard.c.
The ndiscvt(8) utility should be able to parse device info for PCMCIA
devices now. The ndis_alloc_amem() has moved from kern_ndis.c to
if_ndis_pccard.c so that kern_ndis.c no longer depends on pccard.

NOTE: this stuff is not guaranteed to work 100% correctly yet. So
far I have been able to load/init my PCMCIA Cisco Aironet 340 card,
but it crashes in the interrupt handler. The existing support for
PCI/cardbus devices should still work as before.


126674 05-Mar-2004 jhb

kthread_exit() no longer requires Giant, so don't force callers to acquire
Giant just to call kthread_exit().

Requested by: many


126620 04-Mar-2004 wpaul

- Some older Atheros drivers want KeInitializeTimer(), so implement it,
along with KeInitializeTimerEx(), KeSetTimer(), KeSetTimerEx(),
KeCancelTimer(), KeReadStateTimer() and KeInitializeDpc(). I don't
know for certain that these will make the Atheros driver happy since
I don't have the card/driver combo needed to test it, but these are
fairly independent so they shouldn't break anything else.

- Debugger() is present even in kernels without options DDB, so no
conditional compilation is necessary (pointed out by bde).

- Remove the extra km_acquirecnt member that I added to struct kmutant
and embed it within an unused portion of the structure instead, so that
we don't make the structure larger than it's defined to be in Windows.
I don't know what crack I was smoking when I decided it was ok to do
this, but it's worn off now.


126568 04-Mar-2004 wpaul

Add sanity checks to the ndis_packet and ndis_buffer pool handling
routines to guard against problems caused by (possibly) buggy drivers.

The RealTek 8180 wireless driver calls NdisFreeBuffer() to release
some of its buffers _after_ it's already called NdisFreeBufferPool()
to destroy the pool to which the buffers belong. In our implementation,
this error causes NdisFreeBuffer() to touch stale heap memory.

If you are running a release kernel, and hence have INVARIANTS et al
turned off, it turns out nothing happens. But if you're using a
development kernel config with INVARIANTS on, the malloc()/free()
sanity checks will scribble over the pool memory with 0xdeadc0de
once it's released so that any attempts to touch it will cause a
trap, and indeed this is what happens. It happens that I run 5.2-RELEASE
on my laptop, so when I tested the rtl8180.sys driver, it worked fine
for me, but people trying to run it with development systems checked
out or cvsupped from -current would get a page fault on driver load.

I can't find any reason why the NDISulator would cause the RealTek
driver to do the NdisFreeBufferPool() prematurely, and the same driver
obviously works with Windows -- or at least, it doesn't cause a crash:
the Microsoft documentation for NdisFreeBufferPool() says that failing
to return all buffers to the pool before calling NdisFreeBufferPool()
causes a memory leak.

I've written to my contacts at RealTek asking them to check if this
is indeed a bug in their driver. In the meantime, these new sanity checks
will catch this problem and issue a warning rather than causing a trap.
The trick is to keep a count of outstanding buffers for each buffer pool,
and if the driver tries to call NdisFreeBufferPool() while there are still
buffers outstanding, we mark the pool for deletion and then defer
destroying it until after the last buffer has been reclaimed.


126559 03-Mar-2004 wpaul

Add proper support for DbgPrint(): only print messages if bootverbose
is set, since some drivers with debug info can be very chatty.

Also implement DbgBreakPoint(), which is the Windows equivalent of
Debugger(). Unfortunately, this forces subr_ntoskrnl.c to include
opt_ddb.h.


126093 21-Feb-2004 peter

Regen (FWIW)


126092 21-Feb-2004 peter

Try and make the compat sigreturn prototypes closer to reality.


126091 21-Feb-2004 peter

Add a note about the landmine in the middle of struct ia32_sigframe.


126090 21-Feb-2004 peter

DOH!!! Fix signals for freebsd-4.x/i386 binaries. The ucontext has
different alignments due to the sse fxsave dump area.


126081 21-Feb-2004 phk

Device megapatch 5/6:

Remove the unused second argument from udev2dev().

Convert all remaining users of makedev() to use udev2dev(). The
semantic difference is that udev2dev() will only locate a pre-existing
dev_t, it will not line makedev() create a new one.

Apart from the tiny well controlled windown in D_PSEUDO drivers,
there should no longer be any "anonymous" dev_t's in the system
now, only dev_t's created with make_dev() and make_dev_alias()


125997 19-Feb-2004 bms

Add BSD compatibility tty ioctls LINUX_TIOCSBRK and LINUX_TIOCCBRK. This
addition appears to allow VMware 3 Workstation to operate with nmdm(4)
as a virtual COM device.

Tested by: Guido van Rooij


125950 17-Feb-2004 wpaul

Add vector for memmove() (currently aliased to memcpy()) a implement
ExInterlockedAddLargeStatistic().


125860 16-Feb-2004 wpaul

More cleanups/fixes for the AMD Am1771 driver:

- When adding new waiting threads to the waitlist for an object,
use INSERT_LIST_TAIL() instead of INSERT_LIST_HEAD() so that new
waiters go at the end of the list instead of the beginning. When we
wake up a synchronization object, only the first waiter is awakened,
and this needs to be the first thread that actually waited on the object.

- Correct missing semicolon in INSERT_LIST_TAIL() macro.

- Implement lookaside lists correctly. Note that the Am1771 driver
uses lookaside lists to manage shared memory (i.e. DMAable) buffers
by specifying its own alloc and free routines. The Microsoft documentation
says you should avoid doing this, but apparently this did not deter
the developers at AMD from doing it anyway.

With these changes (which are the result of two straight days of almost
non-stop debugging), I think I finally have the object/thread handling
semantics implemented correctly. The Am1771 driver no longer crashes
unexpectedly during association or bringing the interface up.


125814 14-Feb-2004 wpaul

Fix a problem with the way we schedule work on the NDIS worker threads.
The Am1771 driver will sometimes do the following:

- Some thread-> NdisScheduleWorkItem(some work)
- Worker thread -> do some work, KeWaitForSingleObject(some event)
- Some other thread -> NdisScheduleWorkItem(some other work)

When the second call to NdisScheduleWorkItem() occurs, the NDIS worker
thread (in our case ndis taskqueue) is suspended in KeWaitForSingleObject()
and waiting for an event to be signaled. This is different from when
the worker thread is idle and waiting on NdisScheduleWorkItem() to
send it more jobs. However, the ndis_sched() function in kern_ndis.c
always calls kthread_resume() when queueing a new job. Normally this
would be ok, but here this causes KeWaitForSingleObject() to return
prematurely, which is not what we want.

To fix this, the NDIS threads created by kern_ndis.c maintain a state
variable to indicate whether they are running (scanning the job list
and executing jobs) or sleeping (blocked on kthread_suspend() in
ndis_runq()), and ndis_sched() will only call kthread_resume() if
the thread is in the sleeping state.

Note that we can't just check to see if the thread is on the run queue:
in both cases, the thread is sleeping, but it's sleeping for different
reasons.

This stops the Am1771 driver from emitting various "NDIS ERROR" messages
and fixes some cases where it crashes.


125723 11-Feb-2004 wpaul

Correct instance of *timeout that should have been timeout.

Noticed by: mlaier


125718 11-Feb-2004 wpaul

Add yet more bulletproofing. This is to guard against the case that
ndis_init_nic() works one during attach, but fails later. Many things
will blow up if ndis_init_nic() fails and we aren't careful.


125676 10-Feb-2004 wpaul

Add some bulletproofing: don't allow the ndis_get_info() or ndis_set_info()
routines to do anything except return error if the miniport adapter context
is not set (meaning we either having init'ed the driver yet, or the
initialization failed).

Also, be sure to NULL out the adapter context along with the
miniport characteristics pointers if calling the MiniportInitialize()
method fails.


125630 09-Feb-2004 des

Remove VFS_STATFS() call which violated the lock order and wasn't
really required anyway.

PR: kern/61994
Submitted by: Bjoern Groenvall <bg@sics.se>


125628 09-Feb-2004 wpaul

Add stub implementations of KfLowerIrql() and KfRaiseIrql() (both of
which are _fastcall).


125599 08-Feb-2004 wpaul

Make NdisMMapIoSpace() guard against NULL/uninitialized resource pointers too.


125598 08-Feb-2004 wpaul

Make NdisMMapIoSpace() handle the case where a device has both mem
and altmem ranges mapped.


125582 07-Feb-2004 wpaul

Argh. kthread_suspend() when in P_KTHREAD context, tsleep() when not,
not the other way around.


125577 07-Feb-2004 wpaul

Correct an intance of mtx_pool_lock() that should have been mtx_pool_unlock().


125575 07-Feb-2004 phk

I guess nobody has needed to use the SVR4olator to create device
nodes, or if they did, they're now locked away on the Kurt Gdel
memorial home for the numerically confused:

Don't cast a kernel pointer (from makedev(9)) to an integer (maj+minor combo).


125551 07-Feb-2004 wpaul

Add a whole bunch of new stuff to make the driver for the AMD Am1771/Am1772
802.11b chipset work. This chip is present on the SMC2602W version 3
NIC, which is what was used for testing. This driver creates kernel
threads (12 of them!) for various purposes, and required the following
routines:

PsCreateSystemThread()
PsTerminateSystemThread()
KeInitializeEvent()
KeSetEvent()
KeResetEvent()
KeInitializeMutex()
KeReleaseMutex()
KeWaitForSingleObject()
KeWaitForMultipleObjects()
IoGetDeviceProperty()

and several more. Also, this driver abuses the fact that NDIS events
and timers are actually Windows events and timers, and uses NDIS events
with KeWaitForSingleObject(). The NDIS event routines have been rewritten
to interface with the ntoskrnl module. Many routines with incorrect
prototypes have been cleaned up.

Also, this driver puts jobs on the NDIS taskqueue (via NdisScheduleWorkItem())
which block on events, and this interferes with the operation of
NdisMAllocateSharedMemoryAsync(), which was also being put on the
NDIS taskqueue. To avoid the deadlock, NdisMAllocateSharedMemoryAsync()
is now performed in the NDIS SWI thread instead.

There's still room for some cleanups here, and I really should implement
KeInitializeTimer() and friends.


125529 06-Feb-2004 jhb

Regen.


125527 06-Feb-2004 jhb

Sync up MP safe flags with global syscalls.master. This includes read(),
write(), close(), getpid(), setuid(), getuid(), svr4_sys_pause(),
svr4_sys_nice(), svr4_sys_kill(), svr4_sys_pgrpsys(), dup(), pipe(),
setgid(), getgid(), svr4_sys_signal(), umask(), getgroups(), setgroups(),
svr4_sys_sigprocmask(), svr4_sys_sigsuspend(), svr4_sys_sigaltstack(),
svr4_sys_sigaction(), svr4_sys_sigpending(), mprotect(), munmap(),
setegid(), seteuid(), setreuid(), setregid().


125457 04-Feb-2004 jhb

Regen.


125455 04-Feb-2004 jhb

The following compat syscalls are now mpsafe: linux_getrlimit(),
linux_setrlimit(), linux_old_getrlimit(), osf1_getrlimit(),
osf1_setrlimit(), svr4_sys_ulimit(), svr4_sys_setrlimit(),
svr4_sys_getrlimit(), svr4_sys_setrlimit64(), svr4_sys_getrlimit64(),
ibcs2_sysconf(), and ibcs2_ulimit().


125454 04-Feb-2004 jhb

Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count. The plimit
structure is treated similarly to struct ucred in that is is always copy
on write, so having a reference to a structure is sufficient to read from
it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
limits from a process to keep the limit structure from changing out from
under you while reading from it.
- Various global limits that are ints are not protected by a lock since
int writes are atomic on all the archs we support and thus a lock
wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
either an rlimit, or the current or max individual limit of the specified
resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
(it didn't used the stackgap when it should have) but uses lim_rlimit()
and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It
also no longer uses the stackgap for accessing sysctl's for the
ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result,
ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by: mtm (mostly, I only did a few cleanups and catchups)
Tested on: i386
Compiled on: alpha, amd64


125413 04-Feb-2004 wpaul

Correct/improve the implementation of NdisMAllocateSharedMemoryAsync().
Since we have a worker thread now, we can actually do the allocation
asynchronously in that thread's context. Also, we need to return a
status value: if we're unable to queue up the async allocation, we
return NDIS_STATUS_FAILURE, otherwise we return NDIS_STATUS_PENDING
to indicate the allocation has been queued and will occur later.

This replaces the kludge where we just invoked the callback routine
right away in the current context.


125377 03-Feb-2004 wpaul

Implement support for single packet sends. The Intel Centrino driver
that Asus provides on its CDs has both a MiniportSend() routine
and a MiniportSendPackets() function. The Microsoft NDIS docs say
that if a driver has both, only the MiniportSendPackets() routine
will be used. Although I think I implemented the support correctly,
calling the MiniportSend() routine seems to result in no packets going
out on the air, even though no error status is returned. The
MiniportSendPackets() function does work though, so at least in
this case it doesn't matter.

In if_ndis.c:ndis_getstate_80211(), if ndis_get_assoc() returns
an error, don't bother trying to obtain any other state since the
calls may fail, or worse cause the underlying driver to crash.

(The above two changes make the Asus-supplied Centrino work.)

Also, when calling the OID_802_11_CONFIGURATION OID, remember
to initialize the structure lengths correctly.

In subr_ndis.c:ndis_open_file(), set the current working directory
to rootvnode if we're in a thread that doesn't have a current
working directory set.


125371 03-Feb-2004 deischen

Regen.


125370 03-Feb-2004 deischen

Sync with kern/syscalls.master.


125171 28-Jan-2004 peter

Regen


125170 28-Jan-2004 peter

Add getitimer swab stub


125069 27-Jan-2004 wpaul

Implement NdisVirtualBufferAddress() and NdisVirtualBufferAddressSafe().

The RealTek 8180 driver seems to need this.


125057 26-Jan-2004 wpaul

Reorganize the timer code a little and implement NdisInitializeTimer()
and NdisCancelTimer(). NdisInitializeTimer() doesn't accept an NDIS
miniport context argument, so we have to derive it from the timer
function context (which is supposed to be the adapter private context).
NdisCancelTimer is now an alias for NdisMCancelTimer().

Also add stubs for NdisMRegisterDevice() and NdisMDeregisterDevice().
These are no-ops for now, but will likely get fleshed in once I start
working on the Am1771/Am1772 wireless driver.


125006 26-Jan-2004 wpaul

Avoid possible panic on shutdown: if there are still some devices
attached when shutting down, kill our kthreads, but don't destroy
the mutex pool and uma zone resources since the driver shutdown
routine may need them later.


124813 21-Jan-2004 wpaul

Add structures and definitions for task offload (TCP/IP checksum,
IPSec, TCP large send).


124809 21-Jan-2004 wpaul

Make sure to trap failures correctly in ndis_get_info() and ndis_set_info().


124802 21-Jan-2004 rwatson

Reduce gratuitous includes: don't include jail.h if it's not needed.
Presumably, at some point, you had to include jail.h if you included
proc.h, but that is no longer required.

Result of: self injury involving adding something to struct prison


124729 19-Jan-2004 wpaul

Add WDM major/minor #defines.


124727 19-Jan-2004 wpaul

Implement IofCompleteRequest() and IoIsWdmVersionAvailable().
Correct IofCallDriver(): it's fastcall, not stdcall.
Add vector to vsprintf().


124725 19-Jan-2004 wpaul

Implement atoi() and atol(). Some drivers appear to need these. Note
that like most C library routines, these appear to be _cdecl in Windows.


124724 19-Jan-2004 wpaul

Eliminate some code duplication: since ndis_runq() and ndis_intq() were
basically the same function, compact them into a single loop which can
be used for both threads.


124697 18-Jan-2004 wpaul

Convert from using taskqueue_swi to using private kernel threads. The
problem with using taskqueue_swi is that some of the things we defer
into threads might block for up to several seconds. This is an unfriendly
thing to do to taskqueue_swi, since it is assumed the taskqueue threads
will execute fairly quickly once a task is submitted. Reorganized the
locking in if_ndis.c in the process.

Cleaned up ndis_write_cfg() and ndis_decode_parm() a little.


124582 16-Jan-2004 obrien

The ndis_kspin_lock type is called KSPIN_LOCK in MS-Windows.
According to the Windows DDK header files, KSPIN_LOCK is defined like this:
typedef ULONG_PTR KSPIN_LOCK;

From basetsd.h (SDK, Feb. 2003):
typedef [public] unsigned __int3264 ULONG_PTR, *PULONG_PTR;
typedef unsigned __int64 ULONG_PTR, *PULONG_PTR;
typedef _W64 unsigned long ULONG_PTR, *PULONG_PTR;

The keyword __int3264 specifies an integral type that has the following
properties:
+ It is 32-bit on 32-bit platforms
+ It is 64-bit on 64-bit platforms
+ It is 32-bit on the wire for backward compatibility.
It gets truncated on the sending side and extended appropriately
(signed or unsigned) on the receiving side.

Thus register_t seems the proper mapping onto FreeBSD for spin locks.


124576 15-Jan-2004 wpaul

The definition for __stdcall logically belongs in pe_var.h, but
the definitions for NDIS_BUS_SPACE_IO and NDIS_BUS_SPACE_MEM logically
belong in hal_var.h. At least, that's my story, and I'm sticking to it.

Also, remove definition of __stdcall from if_ndis.c now that it's pulled
in from pe_var.h.


124574 15-Jan-2004 obrien

Create NDIS_BUS_SPACE_{IO,MEM} to abstract MD BUS_SPACE macros.
Provide appropriate definitions for i386 and AMD64.


124541 15-Jan-2004 wpaul

Implement NdisCopyFromPacketToPacket() and NdisCopyFromPacketToPacketSafe().
I only have one driver that references this routine (for the 3Com 3cR990)
and it never gets called, but just in case, here it is.


124537 14-Jan-2004 truckman

VOP_GETATTR() wants the vnode passed to it to be locked. Instead
of adding the code to lock and unlock the vnodes and taking care
to avoid deadlock, simplify linux_emul_convpath() by comparing the
vnode pointers directly instead of comparing their va_fsid and
va_fileid attributes. This allows the removal of the calls to
VOP_GETATTR().


124509 14-Jan-2004 wpaul

mp_ncpus is always defined now, so no need to do an #ifdef SMP in
ndis_cpu_cnt().


124504 13-Jan-2004 obrien

AMD64 has a single MS-Win calling convention, so provide an empty __stdcall.
Centralize the definition to make it easier to change.


124502 13-Jan-2004 obrien

Use 'vm_offset_t' rather than 'u_int32_t'.

Tested on: AMD64
Reviewed by: wpaul


124501 13-Jan-2004 obrien

AMD64 has a single MS-Win calling convention, so provide an empty __stdcall.


124463 13-Jan-2004 wpaul

Implement some more unicode handling routines. This will hopefully bring
us closer to being able to run the Intel PRO/Wireless 5000 driver.


124454 13-Jan-2004 wpaul

Loosen up the range test in ndis_register_ioport(). Allow drivers to
map ranges that are smaller than what our resource manager code knows
is available, rather than requiring that they match exactly. This
fixes a problem with the Intel PRO/1000 gigE driver: it wants to map
a range of 32 I/O ports, even though some chips appear set up to
decode a range of 64. With this fix, it loads and runs correctly.


124450 12-Jan-2004 wpaul

Ugh. I am not having a good day. Remove debugging #ifdef that accidentally
crept into the last commit.


124448 12-Jan-2004 wpaul

Ugh. Last commit went horribly wrong. Back out changes to subr_ntoskrnl.c,
make sure if_ndis.c really gets checked in this time.


124446 12-Jan-2004 wpaul

In if_ndis.c:ndis_intr(), be a bit more intelligent about squelching
unexpected interrupts. If an interrupt is triggered and we're not
finished initializing yet, bail. If we have finished initializing,
but IFF_UP isn't set yet, drain the interrupt with ndis_intr() or
ndis_disable_intr() as appropriate, then return _without_ scheduling
ndis_intrtask().

In kern_ndis.c:ndis_load_driver() only relocate/dynalink a given driver
image once. Trying to relocate an image that's already been relocated
will trash the image. We poison a part of the image header that we
don't otherwise need with a magic value to indicate it's already been
fixed up. This fixes the case where there are multiple units of the
same kind of device.


124409 12-Jan-2004 wpaul

Merge in some changes submitted by Brian Feldman. Among other things,
these add support for listing BSSIDs via wicontrol -l. I added code
to call OID_802_11_BSSID_LIST_SCAN to allow scanning for any nearby
wirelsss nets.

Convert from using individual mutexes to a mutex pool, created in
subr_ndis.c. This deals with the problem of drivers creating locks
in their DriverEntry() routines which might get trashed later.

Put some messages under IFF_DEBUG.


124407 12-Jan-2004 rwatson

Correct for proper vn_fullpath() failure mode: "== -1" -> "!= 0"

Discussed with: des


124278 09-Jan-2004 wpaul

The private data section of ndis_packets has a 'packet flags' byte
which has two important flags in it: the 'allocated by NDIS' flag
and the 'media specific info present' flag. There are two Windows macros
for getting/setting media specific info fields within the ndis_packet
structure which can behave improperly if these flags are not initialized
correctly when a packet is allocated. It seems the correct thing
to do is always set the NDIS_PACKET_ALLOCATED_BY_NDIS flag on
all newly allocated packets.

This fixes the crashes with the Intel Centrino wireless driver.
My sample card now seems to work correctly.

Also, fix a potential LOR involving ndis_txeof() in if_ndis.c.


124272 09-Jan-2004 wpaul

Implement NdisOpenFile()/NdisCloseFile()/NdisMapFile()/NdisUnmapFile().
By default, we search for files in /compat/ndis. This can be changed with
a systcl. These routines are used by some drivers which need to download
firmware or microcode into their respective devices during initialization.

Also, remove extraneous newlines from the 'built-in' sysctl/registry
variables.


124246 08-Jan-2004 wpaul

Correct the definition of the ndis_miniport_interrupt structure:
the ni_dpccountlock member is an ndis_kspin_lock, not an
ndis_spin_lock (the latter is too big).

Run if_ndis.c:ndis_tick() via taskqueue_schedule(). Also run
ndis_start() via taskqueue in certain circumstances.

Using these tweaks, I can now get the Broadcom BCM5701 NDIS
driver to load and run. Unfortunately, the version I have seems
to suffer from the same bug as the SMC 83820 driver, which is
that it creates a spinlock during its DriverEntry() routine.
I'm still debating the right way to deal with this.


124231 07-Jan-2004 wpaul

Correct and simplify the implementation of RtlEqualUnicodeString().


124228 07-Jan-2004 wpaul

It appears drivers may call NdisWriteErrorLogEntry() with locks
held. However, if we need to translate a unicode message table message,
ndis_unicode_to_ascii() might malloc() some memory, which causes
a warning from witness. Avoid this by using some stack space to hold
the translated message. (Also bounds check to make sure we don't
overrun the stack buffer.)


124203 07-Jan-2004 wpaul

Use atomic ops for the interlocked increment and decrement routines
in subr_ndis and subr_ntoskrnl. This is faster and avoids potential
LOR whinage from witness (an LOR couldn't happen with the old code
since the interlocked inc/dec routines could not sleep with a lock
held, but this will keep witness happy and it's more efficient
anyway. I think.)


124202 07-Jan-2004 wpaul

In subr_ndis.c: correct ndis_interlock_inc() and ndis_interlock_dec()
so we increment the right thing. (All work and not enough parens
make Bill something something...) This makes the RealTek 8139C+
driver work correctly.

Also fix some mtx_lock_spin()s and mtx_unlock_spin()s that should
have been just plain mtx_lock()s and mtx_unlock()s.

In kern_ndis.c: remove duplicate code from ndis_send_packets() and
just call the senddone handler (ndis_txeof()).


124173 06-Jan-2004 wpaul

Clean up pe_get_message(). Allow the caller to obtain the resource
flag so that it can see if the message string is unicode or not and
do the conversion itself rather than doing it in subr_pe.c. This
prevents subr_pe.c from being dependent on subr_ndis.c.


124165 06-Jan-2004 wpaul

- Add pe_get_message() and pe_get_messagetable() for processing
the RT_MESSAGETABLE resources that some driver binaries have.
This allows us to print error messages in ndis_syslog().

- Correct the implementation of InterlockedIncrement() and
InterlockedDecrement() -- they return uint32_t, not void.

- Correct the declarations of the 64-bit arithmetic shift
routines in subr_ntoskrnl.c (_allshr, allshl, etc...). These
do not follow the _stdcall convention: instead, they appear
to be __attribute__((regparm(3)).

- Change the implementation of KeInitializeSpinLock(). There is
no complementary KeFreeSpinLock() function, so creating a new
mutex on each call to KeInitializeSpinLock() leaks resources
when a driver is unloaded. For now, KeInitializeSpinLock()
returns a handle to the ntoskrnl interlock mutex.

- Use a driver's MiniportDisableInterrupt() and MiniportEnableInterrupt()
routines if they exist. I'm not sure if I'm doing this right
yet, but at the very least this shouldn't break any currently
working drivers, and it makes the Intel PRO/1000 driver work.

- In ndis_register_intr(), save some state that might be needed
later, and save a pointer to the driver's interrupt structure
in the ndis_miniport_block.

- Save a pointer to the driver image for use by ndis_syslog()
when it calls pe_get_message().


124135 04-Jan-2004 wpaul

Modify if_ndis.c so that the MiniportISR function runs in ndis_intr()
and MiniportHandleInterrupt() is fired off later via a task queue in
ndis_intrtask(). This more accurately follows the NDIS interrupt handling
model, where the ISR does a minimal amount of work in interrupt context
and the handler is defered and run at a lower priority.

Create a separate ndis_intrmtx mutex just for the guarding the ISR.

Modify NdisSynchronizeWithInterrupt() to aquire the ndis_intrmtx
mutex before invoking the synchronized procedure. (The purpose of
this function is to provide mutual exclusion for code that shares
variables with the ISR.)

Modify NdisMRegisterInterrupt() to save a pointer to the miniport
block in the ndis_miniport_interrupt structure so that
NdisSynchronizeWithInterrupt() can grab it later and derive
ndis_intrmtx from it.


124122 04-Jan-2004 wpaul

Implement NdisScheduleWorkItem() and RtlCompareMemory().

Also, call the libinit and libfini routines from the modevent
handler in kern_ndis.c. This simplifies the initialization a little.


124116 04-Jan-2004 wpaul

In ndis_attach(), report the NDIS API level that the Windows miniport
driver was compiled with.

Remove debug printf from ndis_assicn_pcirsc(). It doesn't serve
much purpose.

Implement NdisMIndicateStatus() and NdisMIndicateStatusComplete()
as functions in subr_ndis.c. In NDIS 4.0, they were functions. In
NDIS 5.0 and later, they're just macros.

Allocate a few extra packets/buffers beyond what the driver asks
for since sometimes it seems they can lie about how many they really
need, and some extra stupid ones don't check to see if NdisAllocatePacket()
and/or NdisAllocateBuffer() actually succeed.


124100 03-Jan-2004 wpaul

In if_ndis.c:ndis_attach(), temporarily set the IFF_UP flag while
calling the haltfunc. If an interrupt is triggered by the init
or halt func, the IFF_UP flag must be set in order for us to be able
to service it.

In kern_ndis.c: implement a handler for NdisMSendResourcesAvailable()
(currently does nothing since we don't really need it).

In subr_ndis.c:
- Correct ndis_init_string() and ndis_unicode_to_ansi(),
which were both horribly broken.
- Implement NdisImmediateReadPciSlotInformation() and
NdisImmediateWritePciSlotInformation().
- Implement NdisBufferLength().
- Work around my first confirmed NDIS driver bug.
The SMC 9462 gigE driver (natsemi 83820-based copper)
incorrectly creates a spinlock in its DriverEntry()
routine and then destroys it in its MiniportHalt()
handler. This is wrong: spinlocks should be created
in MiniportInit(). In a Windows environment, this is
often not a problem because DriverEntry()/MiniportInit()
are called once when the system boots and MiniportHalt()
or the shutdown handler is called when the system halts.

With this stuff in place, this driver now seems to work:

ndis0: <SMC EZ Card 1000> port 0xe000-0xe0ff mem 0xda000000-0xda000fff irq 10 at device 9.0 on pci0
ndis0: assign PCI resources...
ndis_open_file("FLASH9.hex", 18446744073709551615)
ndis0: Ethernet address: 00:04:e2:0e:d3:f0


124097 03-Jan-2004 wpaul

subr_hal.c: implement WRITE_PORT_BUFFER_xxx() and READ_PORT_BUFFER_xxx()
subr_ndis.c: implement NdisDprAllocatePacket() and NdisDprFreePacket()
(which are aliased to NdisAllocatePacket() and NdisFreePacket()), and
bump the value we return in ndis_mapreg_cnt() to something ridiculously
large, since some drivers apparently expect to be able to allocate
way more than just 64.

These changes allow the Level 1 1000baseSX driver to work for
the following card:

ndis0: <SMC TigerCard 1000 Adapter> port 0xe000-0xe0ff mem 0xda004000-0xda0043ff irq 10 at device 9.0 on pci0
ndis0: Ethernet address: 00:e0:29:6f:cc:04

This is already supported by the lge(4) driver, but I decided
to take a try at making the Windows driver that came with it work too,
since I still had the floppy diskette for it lying around.


124094 03-Jan-2004 wpaul

Tweak ndiscvt to support yet another flavor of .INF files (look for
the NTx86 section decoration).

subr_ndis.c: correct the behavior of ndis_query_resources(): if the
caller doesn't provide enough space to return the resources, tell it
how much it needs to provide and return an error.

subr_hal.c & subr_ntoskrnl.c: implement/stub a bunch of new routines;

ntoskrnl:

KefAcquireSpinLockAtDpcLevel
KefReleaseSpinLockFromDpcLevel
MmMapLockedPages
InterlockedDecrement
InterlockedIncrement
IoFreeMdl
KeInitializeSpinLock

HAL:

KfReleaseSpinLock
KeGetCurrentIrql
KfAcquireSpinLock

Lastly, correct spelling of "_aullshr" in the ntoskrnl functable.


124082 02-Jan-2004 alc

Lock the traversal of the vm object list. Use TAILQ_FOREACH consistently.


124060 02-Jan-2004 wpaul

Clean up ndiscvt a bit (leaving out the -i flag didn't work) and add
copyrights to the inf parser files.

Add a -n flag to ndiscvt to allow the user to override the default
device name of NDIS devices. Instead of "ndis0, ndis1, etc..."
you can have "foo0, foo1, etc..." This allows you to have more than
one kind of NDIS device in the kernel at the same time.

Convert from printf() to device_printf() in if_ndis.c, kern_ndis.c
and subr_ndis.c.

Create UMA zones for ndis_packet and ndis_buffer structs allocated
on transmit. The zones are created and destroyed in the modevent
handler in kern_ndis.c.

printf() and UMA changes submitted by green@freebsd.org


124014 31-Dec-2003 wpaul

- subr_ntoskrnl.c: improve the _fastcall hack based on suggestions from
peter and jhb: use __volatile__ to prevent gcc from possibly reordering
code, use a null inline instruction instead of a no-op movl (I would
have done this myself if I knew it was allowed) and combine two register
assignments into a single asm statement.
- if_ndis.c: set the NDIS_STATUS_PENDING flag on all outgoing packets
in ndis_start(), make the resource allocation code a little smarter
about how it selects the altmem range, correct a lock order reversal
in ndis_tick().


124005 30-Dec-2003 wpaul

- Add new 802.11 OID information obtained from NDIS 5.1 update to
ndis_var.h
- In kern_ndis.c:ndis_send_packets(), avoid dereferencing NULL pointers
created when the driver's send routine immediately calls the txeof
handler (which releases the packets for us anyway).
- In if_ndis.c:ndis_80211_setstate(), implement WEP support.


123976 29-Dec-2003 wpaul

Rework resource allocation. Replace the "feel around like a blind man"
method with something a little more intelligent: use BUS_GET_RESOURCE_LIST()
to run through all resources allocated to us and map them as needed. This
way we know exactly what resources need to be mapped and what their RIDs
are without having to guess. This simplifies both ndis_attach() and
ndis_convert_res(), and eliminates the unfriendly "ndisX: couldn't map
<foo>" messages that are sometimes emitted during driver load.


123941 28-Dec-2003 wpaul

Implement NdisInitUnicodeString().


123940 28-Dec-2003 wpaul

Remove the sanity test in ndis_adjust_buflen(). I'm not sure what the
nb_size field in an ndis_buffer is meant to represent, but it does not
represent the original allocation size, so the sanity check doesn't
make any sense now that we're using the Windows-mandated initialization
method.

Among other things, this makes the following card work with the
NDISulator:

ndis0: <NETGEAR PA301 Phoneline10X PCI Adapter> mem 0xda004000-0xda004fff irq 10 at device 9.0 on pci0

This is that notoriously undocumented 10Mbps HomePNA Broadcom chipset
that people wanted support for many moons ago. Sadly, the only other
HomePNA NIC I have handy is a 1Mbps device, so I can't actually do
any 10Mbps performance tests, but it talks to my 1Mbps ADMtek card
just fine.


123858 26-Dec-2003 wpaul

Attempt to handle the status field in the ndis_packet oob area correctly.

For received packets, an status of NDIS_STATUS_RESOURCES means we need
to copy the packet data and return the ndis_packet to the driver immediatel.
NDIS_STATUS_SUCCESS means we get to hold onto the packet, but we have
to set the status to NDIS_STATUS_PENDING so the driver knows we're
going to hang onto it for a while.

For transmit packets, NDIS_STATUS_PENDING means the driver will
asynchronously return the packet to us via the ndis_txeof() routine,
and NDIS_STATUS_SUCCESS means the driver sent the frame, and NDIS
(i.e. the OS) retains ownership of the packet and can free it
right away.


123848 26-Dec-2003 wpaul

Back out the last batch of changes until I have a chance to properly
evaluate them. Whatever they're meant to do, they're doing it wrong.

Also:

- Clean up last bits of NULL fallout in subr_pe
- Don't let ndis_ifmedia_sts() do anything if the IFF_UP flag isn't set
- Implement NdisSystemProcessorCount() and NdisQueryMapRegisterCount().


123846 26-Dec-2003 green

Don't call the miniport driver's releasepacket function unless the
packet being freed has NDIS_STATUS_PENDING in the status field of
the OOB data. Finish implementing the "alternative" packet-releasing
function so it doesn't crash.

For those that are curious about ndis0: <ORiNOCO 802.11abg ComboCard Gold>:
1123 packets transmitted, 1120 packets received, 0% packet loss
round-trip min/avg/max/stddev = 3.837/6.146/13.919/1.925 ms

Not bad!


123832 25-Dec-2003 wpaul

Give the timer API one last overhaul: this time, use the new callout
API instead of the old timeout/untimeout mechanism.


123828 25-Dec-2003 bde

Quick fix for LINT breakage caused by interface changes in accept(2), etc.
The log message for rev.1.160 of kern/uipc_syscalls.c and associated
changes only claimed to add restrict qualifiers (which have no effect in
the kernel so they probably shouldn't be added), but the following
interface changes were also made:
- caddr_t to `void *' and `struct sockaddr_t *'
- `int *' to `socklen_t *'.
These interface changes are not quite null, and this fix is quick (like
the changes in uipc_syscalls 1.160) because it uses bogus casts instead
of complete bounds-checked conversions.

Things should be fixed better when the conversions can be done without
using the stack gap. linux_check_hdrincl() already uses the stack gap
and is fixed completely though the type mismatches in it were not fatal
(there were only fatal type mismatches from unopaquing pointers to
[o]sockaddr't's -- the difference between accept()'s args and oaccept()'s
args is now non-opaque, but this is not reflected in their args structs).


123826 25-Dec-2003 wpaul

Avoid using any of the ndis_packet/ndis_packet_private fields for
mbuf<->packet housekeeping. Instead, add a couple of extra fields
to the end of ndis_packet. These should be invisible to the Windows
driver module.

This also lets me get rid of a little bit of evil from ndis_ptom()
(frobbing of the ext_buf field instead of relying on the MEXTADD()
macro).


123822 25-Dec-2003 wpaul

- Add stubs for Ndis*File() functions
- Fix ndis_time().
- Implement NdisGetSystemUpTime().
- Implement RtlCopyUnicodeString() and RtlUnicodeStringToAnsiString().
- In ndis_getstate_80211(), use sc->ndis_link to determine connect
status.

Submitted by: Brian Feldman <green@freebsd.org>


123821 24-Dec-2003 wpaul

- Fix some compiler warnings in subr_pe.c
- Add explicit cardbus attachment in if_ndis.c
- Clean up after moving bus_setup_intr() in ndis_attach().
- When setting an ssid, program an empty ssid as a 1-byte string
with a single 0 byte. The Microsoft documentation says this is
how you're supposed to tell the NIC to attach to 'any' ssid.
- Keep trace of callout handles for timers externally from the
ndis_miniport_timer structs, and run through and clobber them
all after invoking the haltfunc just in case the driver left one
running. (We need to make sure all timers are cancelled on driver
unload.)
- Handle the 'cancelled' argument in ndis_cancel_timer() correctly.


123810 24-Dec-2003 alfred

change NULL to 0 to silence warning.


123790 24-Dec-2003 peter

GC unused 'syshide' override to /dev/null. This was here to disable
the output of the namespc column. Its functionality was removed some time
ago, but the overrides and the namespc column remained.


123784 24-Dec-2003 peter

Regen. This should have been a NOP, but its not been regenerated for
ages and is missing the changes from the last few makesyscalls.sh
revisions.


123783 24-Dec-2003 peter

GC OBE namespc column and un-wrap longer lines that now fit


123778 23-Dec-2003 wpaul

Correct the definitions for NDIS_80211_NET_INFRA_IBSS and
NDIS_80211_NET_INFRA_BSS: I accidentally reversed them during
transcription from the Microsoft headers. Note that the
driver will default to BSS mode, and you need to specify
'mediaopt adhoc' to get it into IBSS mode.


123757 23-Dec-2003 wpaul

Re-do the handling of ndis_buffers. The NDIS_BUFFER structure is
supposed to be opaque to the driver, however it is exposed through
several macros which expect certain behavior. In my original
implementation, I used the mappedsystemva member of the structure
to hold a pointer to the buffer and bytecount to hold the length.
It turns out you must use the startva pointer to point to the
page containing the start of the buffer and set byteoffset to
the offset within the page where the buffer starts. So, for a buffer
with address 'baseva,' startva is baseva & ~(PAGE_SIZE -1) and
byteoffset is baseva & (PAGE_SIZE -1). We have to maintain this
convention everywhere that ndis_buffers are used.

Fortunately, Microsoft defines some macros for initializing and
manipulating NDIS_BUFFER structures in ntddk.h. I adapted some
of them for use here and used them where appropriate.

This fixes the discrepancy I observed between how RX'ed packet sizes
were being reported in the Broadcom wireless driver and the sample
ethernet drivers that I've tested. This should also help the
Intel Centrino wireless driver work.

Also try to properly initialize the 802.11 BSS and IBSS channels.
(Sadly, the channel value is meaningless since there's no way
in the existing NDIS API to get/set the channel, but this should
take care of any 'invalid channel (NULL)' messages printed on
the console.


123756 23-Dec-2003 peter

Regen (should be a NOP except for rcsid)


123755 23-Dec-2003 peter

GC unused namespc column.


123748 23-Dec-2003 peter

Regen


123747 23-Dec-2003 peter

freebsd32_fstat(2) is now MPSAFE


123746 23-Dec-2003 peter

Rather than screw around with the (unsafe) stackgap, call vn_stat/fo_stat
directly for stat/fstat/lstat syscall emulation. It turns out not only
safer, but the code is smaller this way too.


123745 23-Dec-2003 peter

Regen


123744 23-Dec-2003 peter

Eliminate stackgap usage for the (woefully incomplete) path translations
since it isn't needed here anymore.
Use standard open(2)/access(2) and chflags(2) syscalls now.


123742 23-Dec-2003 peter

Add an additional field to the elf brandinfo structure to support
quicker exec-time replacement of the elf interpreter on an emulation
environment where an entire /compat/* tree isn't really warranted.


123723 22-Dec-2003 wpaul

Some minor touchups:

In NdisQueryBuffer() and NdisQueryBufferSafe(), the vaddr argument is
optional, so test it before trying to dereference it.

Also correct NdisGetFirstBufferFromPacket()/NdisGetFirstBufferFromPacketSafe():
we need to use nb_mappedsystemva from the buffer, not nb_systemva.


123721 22-Dec-2003 wpaul

Now that I finally have power back, implement a couple more NDIS API
routines: NdisUnchainBufferAtBack(), NdisGetFirstBufferFromPacketSafe()
and NdisGetFirstBufferFromPacket(). This should bring us a little
closer to getting the Intel centrino wireless NIC to work.

Note: I have not actually tested these additions since I don't
have a driver that calls them, however they're pretty simple, and
one of them is taken pretty much directly from the Windows ndis.h
header file, so I'm fairly confident they work, but disclaimers
apply.


123695 21-Dec-2003 wpaul

Big round of updates:

- Make ndis_get_info()/ndis_set_info() sleep on the setdone/getdone
routines if they get back NDIS_STATUS_PENDING.

- Add a bunch of net80211 support so that 802.11 cards can be twiddled
with ifconfig. This still needs more work and is not guaranteed to
work for everyone. It works on my 802.11b/g card anyway.

The problem here is Microsoft doesn't provide a good way to a) learn
all the rates that a card supports (if it has more than 8, you're
kinda hosed) and b) doesn't provide a good way to distinguish between
802.11b, 802.11b/g an 802.11a/b/g cards, so you sort of have to guess.

Setting the SSID and switching between infrastructure/adhoc modes
should work. WEP still needs to be implemented. I can't find any API
for getting/setting the channel other than the registry/sysctl keys.


123620 18-Dec-2003 wpaul

Deal with the duplicate sysctl leaf problem. A .inf file may contain
definitions for more than one device (usually differentiated by
the PCI subvendor/subdevice ID). Each device also has its own tree
of registry keys. In some cases, each device has the same keys, but
sometimes each device has a unique tree but with overlap. Originally,
I just had ndiscvt(8) dump out all the keys it could find, and we
would try to apply them to every device we could find. Now, each key
has an index number that matches it to a device in the device ID list.
This lets us create just the keys that apply to a particular device.

I also added an extra field to the device list to hold the subvendor
and subdevice ID.

Some devices are generic, i.e. there is no subsystem definition. If
we have a device that doesn't match a specific subsystem value and
we have a generic entry, we use the generic entry.


123573 16-Dec-2003 wpaul

Implement NdisGetBufferPhysicalArraySize(), which apparently is a
synonym for NDIS_BUFFER_TO_SPAN_PAGES().


123536 14-Dec-2003 wpaul

Whups... remove some debug code that accidentally crept in.


123535 14-Dec-2003 wpaul

Rework mbuf<->ndis_packet/ndis_packet<->mbuf translation a little to
make it more robust. This should fix problems with crashes under
heavy traffic loads that have been reported. Also add a 'query done'
callback handler to satisfy the e100bex.sys sample Intel driver.


123526 14-Dec-2003 wpaul

Implement a few new NDIS routines: NdisInitAnsiString(),
NdisAnsiStringToUnicodeString(), NdisWriteConfiguration().

Also add stubs for NdisMGetDeviceProperty(), NdisTerminateWrapper(),
NdisOpenConfigurationKeyByName(), NdisOpenConfigurationKeyByIndex()
and NdisMGetDeviceProperty().


123512 13-Dec-2003 wpaul

Correct the implementation of NDIS_BUFFER_TO_SPAN_PAGES().


123507 13-Dec-2003 wpaul

subr_ndis.c:
- fix ndis_time() so that it returns a time based on the proper
epoch (wacky though it may be)
- implement NdisInitializeString() and NdisFreeString(), and add
stub for NdisMRemoveMiniport()

ntoskrnl_var.h:
- add missing member to the general_lookaside struct (gl_listentry)

subr_ntoskrnl.c:
- Fix arguments to the interlocked push/pop routines: 'head' is an
slist_header *, not an slist_entry *
- Kludge up _fastcall support for the push/pop routines. The _fastcall
convention is similar to _stdcall, except the first two available
DWORD-sized arguments are passed in %ecx and %edx, respectively.
One kludge for this __attribute__ ((regparm(3))), however this
isn't entirely right, as it assumes %eax, %ecx and %edx will be
used (regparm(2) assumes %eax and %edx). Another kludge is to
declare the two fastcall-ed args as local register variables and
explicitly assign them to %ecx and %edx, but experimentation showed
that gcc would not guard %ecx and %edx against being clobbered.
Thus, I came up with a 3rd kludge, which is to use some inline
assembly of the form:

void *arg1;
void *arg2;

__asm__("movl %%ecx, %%ecx" : "=c" (arg1));
__asm__("movl %%edx, %%edx" : "=d" (arg2));

This lets gcc know that we're going to reference %ecx and %edx and
that it should make an effort not to let it get trampled. This wastes
an instruction (movl %reg, %reg is a no-op) but insures proper
behavior. It's possible there's a better way to do this though:
this is the first time I've used inline assembler in this fashion.

The above fixes to ntoskrnl_var.h an subr_ntoskrnl.c make lookaside
lists work for the two drivers I have that use them, one of which
is an NDIS 5.0 miniport and another which is 5.1.


123504 12-Dec-2003 wpaul

Implement some more NDIS and ntoskrnl API calls:

subr_ndis.c: NdisGetCurrentSystemTime() which, according to the
Microsoft documentation returns "the number of 100 nanosecond
intervals since January 1, 1601." I have no idea what's so special
about that epoch or why they chose 100 nanosecond ticks. I don't
know the proper offset to convert nanotime() from the UNIX epoch
to January 1, 1601, so for now I'm just doing the unit convertion
to 100s of nanoseconds.

subr_ntoskrnl.c: memcpy(), memset(), ExInterlockedPopEntrySList(),
ExInterlockedPushEntrySList().

The latter two are different from InterlockedPopEntrySList()
and InterlockedPushEntrySList() in that they accept a spinlock to
hold while executing, whereas the non-Ex routines use a lock
internal to ntoskrnl. I also modified ExInitializePagedLookasideList()
and ExInitializeNPagedLookasideList() to initialize mutex locks
within the lookaside structures. It seems that in NDIS 5.0,
the lookaside allocate/free routines ExInterlockedPopEntrySList()
and ExInterlockedPushEntrySList(), which require the use of the
per-lookaside spinlock, whereas in NDIS 5.1, the per-lookaside
spinlock is deprecated. We need to support both cases.

Note that I appear to be doing something wrong with
ExInterlockedPopEntrySList() and ExInterlockedPushEntrySList():
they don't appear to obtain proper pointers to their arguments,
so I'm probably doing something wrong in terms of their calling
convention (they're declared to be FASTCALL in Widnows, and I'm
not sure what that means for gcc). It happens that in my stub
lookaside implementation, they don't need to do any work anyway,
so for now I've hacked them to always return NULL, which avoids
corrupting the stack. I need to do this right though.


123488 12-Dec-2003 wpaul

Correct the behavior of ndis_adjust_buflen(): the NDIS spec says
it's an error to set the buffer bytecount to anything larger than
the buffer's original allocation size, but anything less than that
is ok.

Also, in ndis_ptom(), use the same logic: if the bytecount is
larger than the allocation size, consider the bytecount invalid
and the allocation size as the packet fragment length (m_len)
instead of the bytecount.

This corrects a consistency problem between the Broadcom wireless
driver and some of the ethernet drivers I've tested: the ethernet
drivers all report the packet frag sizes in buf->nb_bytecount, but
the Broadcom wireless driver reports them in buf->nb_size. This
seems like a bug to me, but it clearly must work in Windows, so
we have to deal with it here too.


123485 12-Dec-2003 wpaul

In NDIS 5.1 miniport drivers, the shutdown handler function pointer
is provided to NDIS via the the miniport characteristics structure
supplied in the call to NdisMRegisterMiniport(). But in NDIS 5.0
and earlier, you had to call NdisMRegisterAdapterShutdownHandler()
and supply both a function pointer and context pointer.

We try to handle both cases in ndis_shutdown_nic(). If the
driver registered a shutdown routine and a context,then used
that context, otherwise pass it the adapter context from
NdisMSetAttributesEx().

This fixes a panic on shutdown with the sample Intel 82559 e100bex.sys
driver from the Windows DDK.
function pointer


123474 11-Dec-2003 wpaul

Commit the first cut of Project Evil, also known as the NDISulator.

Yes, it's what you think it is. Yes, you should run away now.

This is a special compatibility module for allowing Windows NDIS
miniport network drivers to be used with FreeBSD/x86. This provides
_binary_ NDIS compatibility (not source): you can run NDIS driver
code, but you can't build it. There are three main parts:

sys/compat/ndis: the NDIS compat API, which provides binary
compatibility functions for many routines in NDIS.SYS, HAL.dll
and ntoskrnl.exe in Windows (these are the three modules that
most NDIS miniport drivers use). The compat module also contains
a small PE relocator/dynalinker which relocates the Windows .SYS
image and then patches in our native routines.

sys/dev/if_ndis: the if_ndis driver wrapper. This module makes
use of the ndis compat API and can be compiled with a specially
prepared binary image file (ndis_driver_data.h) containing the
Windows .SYS image and registry key information parsed out of the
accompanying .INF file. Once if_ndis.ko is built, it can be loaded
and unloaded just like a native FreeBSD kenrel module.

usr.sbin/ndiscvt: a special utility that converts foo.sys and foo.inf
into an ndis_driver_data.h file that can be compiled into if_ndis.o.
Contains an .inf file parser graciously provided by Matt Dodd (and
mercilessly hacked upon by me) that strips out device ID info and
registry key info from a .INF file and packages it up with a binary
image array. The ndiscvt(8) utility also does some manipulation of
the segments within the .sys file to make life easier for the kernel
loader. (Doing the manipulation here saves the kernel code from having
to move things around later, which would waste memory.)

ndiscvt is only built for the i386 arch. Only files.i386 has been
updated, and none of this is turned on in GENERIC. It should probably
work on pc98. I have no idea about amd64 or ia64 at this point.

This is still a work in progress. I estimate it's about %85 done, but
I want it under CVS control so I can track subsequent changes. It has
been tested with exactly three drivers: the LinkSys LNE100TX v4 driver
(Lne100v4.sys), the sample Intel 82559 driver from the Windows DDK
(e100bex.sys) and the Broadcom BCM43xx wireless driver (bcmwl5.sys). It
still needs to have a net80211 stuff added to it. To use it, you would
do something like this:

# cd /sys/modules/ndis
# make; make load
# cd /sys/modules/if_ndis
# ndiscvt -i /path/to/foo.inf -s /path/to/foo.sys -o ndis_driver_data.h
# make; make load
# sysctl -a | grep ndis

All registry keys are mapped to sysctl nodes. Sometimes drivers refer
to registry keys that aren't mentioned in foo.inf. If this happens,
the NDIS API module creates sysctl nodes for these keys on the fly so
you can tweak them.

An example usage of the Broadcom wireless driver would be:

# sysctl hw.ndis0.EnableAutoConnect=1
# sysctl hw.ndis0.SSID="MY_SSID"
# sysctl hw.ndis0.NetworkType=0 (0 for bss, 1 for adhoc)
# ifconfig ndis0 <my ipaddr> netmask 0xffffff00 up

Things to be done:

- get rid of debug messages
- add in ndis80211 support
- defer transmissions until after a status update with
NDIS_STATUS_CONNECTED occurs
- Create smarter lookaside list support
- Split off if_ndis_pci.c and if_ndis_pccard.c attachments
- Make sure PCMCIA support works
- Fix ndiscvt to properly parse PCMCIA device IDs from INF files
- write ndisapi.9 man page


123427 11-Dec-2003 peter

regen


123426 11-Dec-2003 peter

Mark freebsd32_gettimeofday() as mpsafe


123425 11-Dec-2003 peter

Just implementing a 32 bit version of gettimeofday() was smaller than
the wrapper code. And it doesn't use the stackgap as a bonus.


123424 11-Dec-2003 peter

Move the ia32_sigtramp.S file back under amd64/. This interfaces closely
with the sendsig code in the MD area. It is not safe to assume that all
the register conventions will be the same. Also, the way of producing
32 bit code (.code32 directives) in this file is amd64 specific.


123423 11-Dec-2003 peter

Assimilate ia64 back into the fold with the common freebsd32/ia32 code.
The split-up code is derived from the ia64 code originally.

Note that I have only compile-tested this, not actually run-tested it.
The ia64 side of the force is missing some significant chunks of signal
delivery code.


123422 10-Dec-2003 peter

Use the correct syscall table limit


123417 10-Dec-2003 peter

Regen


123416 10-Dec-2003 peter

Add missing extattr_list_fd(), extattr_list_file(), extattr_list_link()
and kse_switchin() syscall slots.


123415 10-Dec-2003 peter

The osigpending, oaccept, orecvfrom and ogetdirentries entries were
accidently being compiled in as standard. These are part of the
set of unimplemented COMPAT_43 syscall set.


123246 07-Dec-2003 des

Use mp_ncpus instead of the hw.ncpu sysctl.


122892 19-Nov-2003 kan

Do not call VOP_GETATTR in getdents function. It does not serve any
purpose and the resulting vattr structure was ignored. In addition,
the VOP_GETATTR call was made with no vnode lock held, resulting in
vnode locking violation panic with debug kernels.

Reported by: truckman

Approved by: re@ (rwatson)


122861 17-Nov-2003 rwatson

Add a MAC check for VOP_LOOKUP() in the Linux getwcd() implementation.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


122802 16-Nov-2003 sobomax

Pull latest changes from OpenBSD:

- improve sysinfo(2) syscall;
- add dummy fadvise64(2) syscall;
- add dummy *xattr(2) family of syscalls;
- add protos for the syscalls 222-225, 238-249 and 253-267;
- add exit_group(2) syscall, which is currently just wired to exit(2).

Obtained from: OpenBSD
MFC after: 2 weeks


122358 09-Nov-2003 dwmalone

Use kern_sendit rather than sendit for the Linux send* syscalls.
This means we can avoid using the stack gap for most send* syscalls
now (it is still used in the IP_HDRINCL case).


122303 08-Nov-2003 peter

Move a MD 32 bit binary support routine into the MD areas. exec_setregs
is highly MD in an emulation environment since it operates on the host
environment. Although the setregs functions are really for exec support
rather than signals, they deal with the same sorts of context and include
files. So I put it there rather than create yet another file.


122302 08-Nov-2003 peter

Regen


122301 08-Nov-2003 peter

"implement" vfork(). Add comments next to the other syscalls that need
to be implemented. This is enough to run i386 /bin/tcsh. /bin/sh is still
not happy because of some strange job control problem.


122293 08-Nov-2003 peter

Remove some duplicated comments that refer to npx. XXX The setregs
function is actually MD (not MI) though..


122277 08-Nov-2003 peter

Point the description of the fpu data in the context structures to
i386/include/npx.h instead of the host's machine/npx.h (which might not
exist)


122253 07-Nov-2003 peter

Dont write to the stackgap directly in execve().


122245 07-Nov-2003 jhb

Regen.


122244 07-Nov-2003 jhb

Sync with global syscalls.master by marking ptrace(), dup(), pipe(),
ktrace(), freebsd32_sigaltstack(), sysarch(), issetugid(), utrace(), and
freebsd32_sigaction() as MP safe.


122153 05-Nov-2003 anholt

Prevent leaking of fsid to non-root users in linux_statfs and linux_fstatfs.
Matches native syscalls now.

PR: kern/58793
Submitted by: David P. Reese Jr. <daver@gomerbud.com>
MFC after: 1 week


122088 05-Nov-2003 fjoe

Back out the following revisions:

1.36 +73 -60 src/sys/compat/linux/linux_ipc.c
1.83 +102 -48 src/sys/kern/sysv_shm.c
1.8 +4 -0 src/sys/sys/syscallsubr.h

That change was intended to support vmware3, but
wantrem parameter is useless because vmware3 uses SYSV shared memory
to talk with X server and X server is native application.
The patch worked because check for wantrem was not valid
(wantrem and SHMSEG_REMOVED was never checked for SHMSEG_ALLOCATED segments).

Add kern.ipc.shm_allow_removed (integer, rw) sysctl (default 0) which when set
to 1 allows to return removed segments in
shm_find_segment_by_shmid() and shm_find_segment_by_shmidx().

MFC after: 1 week


121816 31-Oct-2003 brooks

Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.

Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)


121720 30-Oct-2003 peter

Oops, forgot to save these in the editor. Add CTASSERTS for signal and
context related things.


121719 30-Oct-2003 peter

Add CTASSERT()'s to check that the sizes of our replicas of the 32 bit
structures come out the right size.

Fix the ones that broke. stat32 had some missing fields from the end
and statfs32 was broken due to the strange definition of MNAMELEN
(which is dependent on sizeof(long))

I'm not sure if this fixes any actual problems or not.


121302 21-Oct-2003 tjr

Reject negative ngrp arguments in linux_setgroups() and linux_setgroups16();
stops users being able to cause setgroups to clobber the kernel stack by
copying in data past the end of the linux_gidset array.


121286 20-Oct-2003 sam

fix build: linux_to_bsd_msf_lba is no longer used because of previous commit


121275 20-Oct-2003 tjr

Fix some security bugs in the SVR4 emulator:
- Return NULL instead of returning memory outside of the stackgap
in stackgap_alloc() (FreeBSD-SA-00:42.linux)
- Check for stackgap_alloc() returning NULL in svr4_emul_find(),
and clean_pipe().
- Avoid integer overflow on large nfds argument in svr4_sys_poll()
- Reject negative nbytes argument in svr4_sys_getdents()
- Don't copy out past the end of the struct componentname
pathname buffer in svr4_sys_resolvepath()
- Reject out-of-range signal numbers in svr4_sys_sigaction(),
svr4_sys_signal(), and svr4_sys_kill().
- Don't malloc() user-specified lengths in show_ioc() and
show_strbuf(), place arbitrary limits instead.
- Range-check lengths in si_listen(), ti_getinfo(), ti_bind(),
svr4_do_putmsg(), svr4_do_getmsg(), svr4_stream_ti_ioctl().

Some fixes obtain from OpenBSD.


121272 20-Oct-2003 sos

We dont support CDROMREADAUDIO anymore.


121265 20-Oct-2003 cognet

Various style and type fixes in my last commit.

Suggested by: mux


121246 19-Oct-2003 cognet

Implement partially /proc/<pid>/maps.
It looks enough to make SImics run.

Reviewed by: des


121008 11-Oct-2003 iwasaki

Fix some problems in linux_sendmsg() and linux_recvmsg().
- Allocate storage for uap->msg always because it is copyin()'ed in
native sendmsg().
- Convert sockopt level from Linux to FreeBSD after native recvmsg() calling.
- Some cleanups.

Tested with: Oracle 9i shared server connection mode.

MFC after: 1 week


120912 08-Oct-2003 gallatin

make kernel_sysctl()'s args match its prototype in order to fix the
alpha build


120605 30-Sep-2003 des

Fix a (fortunately harmless) signed / unsigned bug.


120422 25-Sep-2003 peter

Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit
systems where the data/stack/etc limits are too big for a 32 bit process.

Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.

Supply an ia32_fixlimits function. Export the clip/default values to
sysctl under the compat.ia32 heirarchy.

Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable. This allows mmap to
place mappings at sensible locations when limits have been reduced.

Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.

Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'. And that causes exec etc to fail since it can no
longer find space to mmap things.


120340 22-Sep-2003 des

Previous commit contained too-smart-for-its-own-good code that might
produce incorrect (though harmless) output on single-CPU systems.


120339 22-Sep-2003 des

Fake multi-cpu statistics for proc/stat by dividing the totals by the
number of CPUs.

PR: kern/27522


119923 09-Sep-2003 des

Fix some broken comments.


119911 09-Sep-2003 des

Add cwd, root and statm (modeled on a 2.4.20 kernel). De-obfuscate
linprocfs_init() a little and remove some gratuitous whitespace.


119839 07-Sep-2003 bde

Restored a non-egregious cast so that this file compiles on i386's
with 64-bit longs again. This was fixed in rev.1.42 but the fix
rotted non-fatally in rev.1.105 and fatally in rev.1.137.

Many more non-egregrious casts are strictly required for conversions
from semi-opaque types to pointers, but we avoid most of them by using
types that are almost certain to be compatible with uintptr_t for
representing pointers (e.g., vm_offset_t). Here we don't really want
the u_longs, but we have them because a.out.h and its support code
doesn't use typedefs (it uses unsigned in V7 and unsigned long in
FreeBSD) and is too obsolete to fix now.


119336 23-Aug-2003 peter

Switch to using the emulator in the common compat area.
Still work-in-progress.


119334 22-Aug-2003 peter

Initial sweep at dividing up the generic 32bit-on-64bit kernel support
from the ia32 specific stuff. Some of this still needs to move to the MI
freebsd32 area, and some needs to move to the MD area. This is still
work-in-progress.


119333 22-Aug-2003 peter

Initial sweep to de-i386-ify this


119332 22-Aug-2003 peter

Regen


119331 22-Aug-2003 peter

Begin attempting to consolidate the two different i386 emulations
on ia64 and amd64. I'm attempting to keep the generic 32bit-on-64bit
binary support seperate from the i386 support and the MD backend support.


119194 21-Aug-2003 peter

Regen


119193 21-Aug-2003 peter

This is too funny for words. Swap syscalls 416 and 417 around. It works
better that way when sigaction() and sigreturn() do the right thing.


119068 18-Aug-2003 des

Whitespace cleanup.


119008 17-Aug-2003 marcel

Cleanup <machine/cpu.h> by moving MD prototypes to <machine/md_var.h>
like we have on other platforms. Move savectx() to <machine/pcb.h>.
A lot of files got these MD prototypes through the indirect inclusion
of <machine/cpu.h> and now need to include <machine/md_var.h>. The
number of which is unexpectedly large...

osf1_misc.c especially is tricky because szsigcode is redefined in
one of the osf1 header files. Reordering of the include files was
needed.

linprocfs.c now needs an explicit extern declaration.

Tested with: LINT


118560 06-Aug-2003 phk

Remove dangling extern reference to swap_pager_full


118421 04-Aug-2003 des

Add support for multiple CPUs to cpuinfo.


118149 29-Jul-2003 des

Try to make 'uname -a' look more like it does on Linux:

- cut the version string at the newline, suppressing information about
who built the kernel and in what directory. Most of this information
was already lost to truncation.

- on i386, return the precise CPU class (if known) rather than just
"i386". Linux software which uses this information to select
which binary to run often does not know what to make of "i386".


118047 26-Jul-2003 phk

Add a "int fd" argument to VOP_OPEN() which in the future will
contain the filedescriptor number on opens from userland.

The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*

For now pass -1 all over the place.


118031 25-Jul-2003 obrien

Use __FBSDID().

Brought to you by: a boring talk at Ottawa Linux Symposium


117723 18-Jul-2003 phk

Add a new function swap_pager_status() which reports the total size of the
paging space and how much of it is in use (in pages).

Use this interface from the Linuxolator instead of groping around in the
internals of the swap_pager.


116999 28-Jun-2003 marcel

Don't map LINUX_POSIX_VDISABLE to _POSIX_VDISABLE and vice versa for
the VMIN and VTIME members of the c_cc array. These members are not
special control characters. By not excluding these members we
changed the noncanonical mode input processing when both members
were 0 on entry (=LINUX_POSIX_VDISABLE) as we would remap them to 255
(=_POSIX_VDISABLE). See termios(4) case A for how that screws up
your terminal I/O.

PR: 23173
Originator: Bjarne Blichfeldt <bbl@dk.damgaard.com>
Patch by: Boris Nikolaus <bn@dali.tellique.de> (original submission)
Philipp Mergenthaler <philipp.mergenthaler@stud.uni-karlsruhe.de>
Reminders by: Joseph Holland King <gte743n@cad.gatech.edu>
MFC after: 5 days


116678 22-Jun-2003 phk

Add a f_vnode field to struct file.

Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.

By giving the vnode a speparate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up to subtype specific use.

At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.


116361 15-Jun-2003 davidxu

Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.


116174 10-Jun-2003 obrien

Use __FBSDID().


116173 10-Jun-2003 obrien

Use __FBSDID().


115550 31-May-2003 phk

Put definition of struct svr4_sockcache_entry in a .h file rather than
having two independent definitions in two .c files.
Fiddle surrounding details to match.

Found by: FlexeLint


115430 31-May-2003 peter

Regenerate.


115429 31-May-2003 peter

Make this compile with WITNESS enabled. It wants the syscall names.


115252 23-May-2003 peter

Deal with the user VM space expanding. 32 bit applications do not like
having their stack at the 512GB mark. Give 4GB of user VM space for 32
bit apps. Note that this is significantly more than on i386 which gives
only about 2.9GB of user VM to a process (1GB for kernel, plus page
table pages which eat user VM space).

Approved by: re (blanket)


115006 15-May-2003 peter

Collect the nastiness for preserving the kernel MSR_GSBASE around the
load_gs() calls into a single place that is less likely to go wrong.

Eliminate the per-process context switching of MSR_GSBASE, because it
should be constant for a single cpu. Instead, save/restore it during
the loading of the new %gs selector for the new process.

Approved by: re (amd64/* blanket)


114988 14-May-2003 peter

Regen

Approved by: re (amd64 blanket)


114987 14-May-2003 peter

Add BASIC i386 binary support for the amd64 kernel. This is largely
stolen from the ia64/ia32 code (indeed there was a repocopy), but I've
redone the MD parts and added and fixed a few essential syscalls. It
is sufficient to run i386 binaries like /bin/ls, /usr/bin/id (dynamic)
and p4. The ia64 code has not implemented signal delivery, so I had
to do that.

Before you say it, yes, this does need to go in a common place. But
we're in a freeze at the moment and I didn't want to risk breaking ia64.
I will sort this out after the freeze so that the common code is in a
common place.

On the AMD64 side, this required adding segment selector context switch
support and some other support infrastructure. The %fs/%gs etc code
is hairy because loading %gs will clobber the kernel's current MSR_GSBASE
setting. The segment selectors are not used by the kernel, so they're only
changed at context switch time or when changing modes. This still needs
to be optimized.

Approved by: re (amd64/* blanket)


114983 13-May-2003 jhb

- Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.

Reviewed by: arch@
Approved by: re (rwatson)


114934 12-May-2003 phk

Don't #define memset() to bzero(), it is far too prone to bite somebody.

Approved by: re/scottl


114724 05-May-2003 mbr

Change the semantics of sysv shm emulation to take a additional
argument to the functions shm{at,ctl}1 and shm_find_segment_by_shmid{x}.
The BSD semantics didn't allow the usage of shared segment after
being marked for removal through IPC_RMID.

The patch involves the following functions:
- shmat
- shmctl
- shm_find_segment_by_shmid
- shm_find_segment_by_shmidx
- linux_shmat
- linux_shmctl

Submitted by: Orlando Bassotto <orlando.bassotto@ieo-research.it>
Reviewed by: marcel


114230 29-Apr-2003 mbr

Initialize tbuf in newstat_copyout() too.

Reviewed by: phk


114216 29-Apr-2003 kan

Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on: standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>


114214 29-Apr-2003 mbr

Do the same thing for stat64_copyout() as we already
do for newstat_copyout().

Lie about disk drives which are character devices
in FreeBSD but block devices under Linux.

PR: 37227
Submitted by: Vladimir B. Grebenschikov <vova@sw.ru>
Reviewed by: phk
MFC after: 2 weeks


114174 28-Apr-2003 jhb

Argh! We want to return the old signal set when the error return is zero
(i.e. success), not non-zero (failure).

Submitted by: tegge
Pointy hat to: jhb


114023 25-Apr-2003 jhb

Use a switch to convert the Linux sigprocmask flags to the equivalent
FreeBSD flags instead of just adding one to the Linux flags. This should
be identical to the previous version except that I have at least one report
of this patch fixing problems people were having with Linux apps after my
last commit to this file. It is safer to use the switch then to make
assumptions about the flag values anyways, esp. since we currently use
MD defines for the values of the flags and this is MI code.

Tested by: Michael Class <michael_class@gmx.net>


114017 25-Apr-2003 jhb

Regen.


114016 25-Apr-2003 jhb

Oops, the thr_* and jail_attach() syscall entries should be NOPROTO rather
than STD.


113991 24-Apr-2003 anholt

Add an ioctl handler for the DRM. This removes the need for the DRM_LINUX
option, which has been a source of frustration for many users.


113989 24-Apr-2003 jhb

Regen.


113987 24-Apr-2003 jhb

Fix the thr_create() entry by adding a trailing \. Also, sync up the
MP safe flag for thr_* with the main table.


113917 23-Apr-2003 jhb

Fix a lock order reversal. Unlock the proc before calling fget().

Reported by: kris


113859 22-Apr-2003 jhb

- Replace inline implementations of sigprocmask() with calls to
kern_sigprocmask() in the various binary compatibility emulators.
- Replace calls to sigsuspend(), sigaltstack(), sigaction(), and
sigprocmask() that used the stackgap with calls to the corresponding
kern_sig*() functions instead without using the stackgap.


113616 17-Apr-2003 jhb

The proc lock is sufficient to test p_state against PRS_ZOMBIE, so don't
needlessly lock sched_lock.


113615 17-Apr-2003 jhb

Don't hold the proc lock while performing sigset conversions on local
variables.


113613 17-Apr-2003 jhb

Use local struct proc variables to reduce repeated td->td_proc dereferences
and improve readability.


113611 17-Apr-2003 jhb

P_SHOULDSTOP used to be p_stat == SSTOP and needed the sched_lock, now it
is protected by the proc lock and doesnt' need sched_lock, so adjust the
locking appropriately.


113581 16-Apr-2003 phk

Don't include <sys/disklabel.h>


113579 16-Apr-2003 jhb

Explicitly cast a l_ulong to an unsigned long to make all arch's happy
with the printf format.


113577 16-Apr-2003 jhb

Fix printf format in a debug printf.


113574 16-Apr-2003 jhb

Fix multiple printf warnings on Alpha:
- Prefer long long to quad_t to match printf args.
- Use uintmax_t and %j to print segsz_t and vm_size_t values.
- Fix others in Alpha-specific code.


113275 09-Apr-2003 mike

o In struct prison, add an allprison linked list of prisons (protected
by allprison_mtx), a unique prison/jail identifier field, two path
fields (pr_path for reporting and pr_root vnode instance) to store
the chroot() point of each jail.
o Add jail_attach(2) to allow a process to bind to an existing jail.
o Add change_root() to perform the chroot operation on a specified
vnode.
o Generalize change_dir() to accept a vnode, and move namei() calls
to callers of change_dir().
o Add a new sysctl (security.jail.list) which is a group of
struct xprison instances that represent a snapshot of active jails.

Reviewed by: rwatson, tjr


112938 01-Apr-2003 phk

Add #include <sys/conf.h> so we don't rely on <sys/disk.h> doing it.


112931 01-Apr-2003 phk

Don't include <sys/buf.h> needlessly.


112908 01-Apr-2003 jeff

- Add thr and umtx system calls.


112896 31-Mar-2003 jeff

- Add a placeholder for sigwait


112888 31-Mar-2003 jeff

- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
- signotify() now operates on a thread since unmasked pending signals are
stored in the thread.
- PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.


112740 28-Mar-2003 phk

Fix an XXX: and implement LINUX_BLKGETSIZE correctly.


112682 26-Mar-2003 jhb

Add a cleanup function to destroy the osname_lock and call it on module
unload.

Submitted by: gallatin
Reported by: Martin Karlsson <mk-freebsd@bredband.net>


112470 21-Mar-2003 jhb

Sync up linux and svr compat elf fixup functions for exec(). These
functions are now all basically identical except that alpha linux uses
Elf64 arguments and svr4 and i386 linux use Elf32. The fixups include
changing the first argument to be a register_t ** to match the prototype
for fixup functions, asserting that the process in the image_params struct
is always curproc and removing unnecessary locking to read credentials as a
result, and a few style fixes.


112451 20-Mar-2003 jhb

Use td->td_ucred instead of td->td_proc->p_ucred.


112430 20-Mar-2003 phk

Backout the getcwd changes, a more comprehensive effort will be needed.


112342 17-Mar-2003 phk

(This commit certainly increases the need for a wash&clean of vfs_cache.c,
but I decided that it was important for this patch to not bit-rot, and
since it is mainly moving code around, the total amount of entropy is
epsilon /phk)

This is a patch to move the common parts of linux_getcwd() back into
kern/vfs_cache.c so that the standard FreeBSD libc getcwd() can use it's
extended functionality. The linux syscall linux_getcwd() in
compat/linux/linux_getcwd.c has been rewritten to use it too. It should
be possible to simplify libc's getcwd() after this. No doubt this code
needs some cleaning up, since I've left in the sysctl variables I used
for debugging.

PR: 48169
Submitted by: James Whitwell <abacau@yahoo.com.au>


112206 13-Mar-2003 jhb

- Change the linux_[gs]et_os{name, release, s_version}() functions to
take a thread instead of a proc for their first argument.
- Add a mutex to protect the system-wide Linux osname, osrelease, and
oss_version variables.
- Change linux_get_prison() to take a thread instead of a proc for its
first argument and to use td_ucred rather than p_ucred. This is ok
because a thread's prison does not change even though it's ucred might.
- Also, change linux_get_prison() to return a struct prison * instead of
a struct linux_prison * since it returns with the struct prison locked
and this makes it easier to safely unlock the prison when we are done
messing with it.


111798 03-Mar-2003 des

Clean up whitespace and remove register keyword.


111797 03-Mar-2003 des

More caddr_t removal, in conjunction with copy{in,out}(9) this time.
Also clean up some egregious casts and incorrect use of sizeof.


111748 02-Mar-2003 des

More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).


111742 02-Mar-2003 des

Clean up whitespace, s/register //, refrain from strong urge to ANSIfy.


111741 02-Mar-2003 des

uiomove-related caddr_t -> void * (just the low-hanging fruit)


111173 20-Feb-2003 ume

Add M_WAITOK


111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


111034 17-Feb-2003 tjr

Use the proc lock to protect p_realtimer instead of Giant, and obtain
sched_lock around accesses to p_stats->p_timer[] to avoid a potential
race with hardclock. getitimer(), setitimer() and the realitexpire()
callout are now Giant-free.


111002 16-Feb-2003 phk

Remove #include <sys/dkstat.h>


110980 16-Feb-2003 tjr

Add MPSAFE comment to linux_sigpending().


110848 14-Feb-2003 tjr

Obtain proc lock around modification of p_siglist in linux_wait4().


110538 08-Feb-2003 dwmalone

1) Linux_sendto was trashing the BSD sockaddr it put in the stackgap,
so be more careful about calling stackgap_init.

Tested by: Fred Souza <fred@storming.org>

2) Linux_sendmsg was forgetting to fill out the bsd_args struct.

Reviewed by: ume

3) The args to linux_connect have differently named types on alpha and
i386, so add a cast to stop gcc complaining.

Spotted by: peter


110376 05-Feb-2003 ume

Avoid undefined symbol error with an IPv4 only kernel.

Reported by: "Sergey A. Osokin" <osa@freebsd.org.ru>


110295 03-Feb-2003 ume

Add IPv6 support for Linuxlator.

Reviewed by: dwmalone
MFC after: 10 days


110232 02-Feb-2003 alfred

Consolidate MIN/MAX macros into one place (param.h).

Submitted by: Hiten Pandya <hiten@unixdaemons.com>


109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


109254 14-Jan-2003 dillon

Add missing #include

Submitted by: "Sam Leffler" <sam@errno.com>


109203 13-Jan-2003 dillon

Apply bandaid to bring svr4_sys_waitsys() in line with exit1(). This
routine really need to be gutted and merged with exit1().

Reviewed by: jhb


109153 13-Jan-2003 dillon

Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.


109123 12-Jan-2003 dillon

Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary. There are no operational changes in this
commit.


108541 02-Jan-2003 alfred

Add function linux_msg() for regulating output from the linux emulation
code, make the emulator use it.

Rename unsupported_msg() to unimplemented_syscall(). Rename some arguments
for clarity

Fixup grammar.

Requested by: bde


108523 01-Jan-2003 alfred

When complaining about obsolete/unimplemented syscalls output the process
name to make things more clear for the user.

PR: 46661
MFC After: 3 days


108409 29-Dec-2002 rwatson

Synchronize to kern/syscalls.master:1.139.

Obtained from: TrustedBSD Project


108172 22-Dec-2002 hsu

SMP locking for ifnet list.


107923 16-Dec-2002 marcel

Regen: swapoff


107922 16-Dec-2002 marcel

Change swapoff from MNOPROTO to UNIMPL. The former doesn't work.


107913 15-Dec-2002 dillon

This is David Schultz's swapoff code which I am finally able to commit.
This should be considered highly experimental for the moment.

Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU>
MFC after: 3 weeks


107849 14-Dec-2002 alfred

SCARGS removal take II.


107839 13-Dec-2002 alfred

Backout removal SCARGS, the code freeze is only "selectively" over.


107838 13-Dec-2002 alfred

Remove SCARGS.

Reviewed by: md5


107680 08-Dec-2002 iedowse

Fix emulation of the fcntl64() syscall. In Linux, this is exactly
the same as fcntl() except that it supports the new 64-bit file
locking commands (LINUX_F_GETLK64 etc) that use the `flock64'
structure. We had been interpreting all flock structures passed to
fcntl64() as `struct flock64' instead of only the ones from F_*64
commands.

The glibc in linux_base-7 uses fcntl64() by default, but the bug
was often non-fatal since the misinterpretation typically only
causes junk to appear in the `l_len' field and most junk values are
accepted as valid range lengths. The result is occasional EINVAL
errors from F_SETLK and a few bytes after the supplied `struct
flock' getting clobbered during F_GETLK.

PR: kern/37656
Reviewed by: marcel
Approved by: re
MFC after: 1 week


106993 16-Nov-2002 deischen

Regenerate after adding syscalls.


106989 16-Nov-2002 deischen

Add *context() syscalls to ia64 32-bit compatability table as requested
in kern/syscalls.master.


106468 05-Nov-2002 rwatson

Bring in two sets of changes:

(1) Permit userland applications to request a change of label atomic
with an execve() via mac_execve(). This is required for the
SEBSD port of SELinux/FLASK. Attempts to invoke this without
MAC compiled in result in ENOSYS, as with all other MAC system
calls. Complexity, if desired, is present in policy modules,
rather than the framework.

(2) Permit policies to have access to both the label of the vnode
being executed as well as the interpreter if it's a shell
script or related UNIX nonsense. Because we can't hold both
vnode locks at the same time, cache the interpreter label.
SEBSD relies on this because it supports secure transitioning
via shell script executables. Other policies might want to
take both labels into account during an integrity or
confidentiality decision at execve()-time.

Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


106437 05-Nov-2002 rwatson

Remove reference to struct execve_args from struct imgact, which
describes an image activation instance. Instead, make use of the
existing fname structure entry, and introduce two new entries,
userspace_argv, and userspace_envv. With the addition of
mac_execve(), this divorces the image structure from the specifics
of the execve() system call, removes a redundant pointer, etc.
No semantic change from current behavior, but it means that the
structure doesn't depend on syscalls.master-generated includes.

There seems to be some redundant initialization of imgact entries,
which I have maintained, but which could probably use some cleaning
up at some point.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


106364 02-Nov-2002 rwatson

Sync to src/sys/kern/syscalls.master


105663 21-Oct-2002 julian

Remove the process state PRS_WAIT.
It is never used. I left it there from pre-KSE days as I didn't know
if I'd need it or not but now I know I don't.. It's functionality
is in TDI_IWAIT in the thread.


105490 19-Oct-2002 peter

Stake a claim on 418 (__xstat), 419 (__xfstat), 420 (__xlstat)


105486 19-Oct-2002 peter

Grab 416/417 real estate before I get burned while testing again.
This is for the not-quite-ready signal/fpu abi stuff. It may not see
the light of day, but I'm certainly not going to be able to validate it
when getting shot in the foot due to syscall number conflicts.


105477 19-Oct-2002 marcel

Implement the CDROMREADAUDIO ioctl.


105476 19-Oct-2002 rwatson

Add a placeholder for the execve_mac() system call, similar to SELinux's
execve_secure() system call, which permits a process to pass in a label
for a label change during exec. This permits SELinux to change the
label for the resulting exec without a race following a manual label
change on the process. Because this interface uses our general purpose
MAC label abstraction, we call it execve_mac(), and wrap our port of
SELinux's execve_secure() around it with appropriate sid mappings.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


105360 17-Oct-2002 robert

Replace the conventional usage of strncpy() by using strlcpy().


105359 17-Oct-2002 robert

- Use strlcpy() rather than strncpy() to copy NUL terminated
strings.
- Pass the correct buffer size to getcredhostname().


104893 11-Oct-2002 sobomax

- Add support for IPC_64 extensions into shmctl(2), semctl(2) and msgctl(2);
- add wrappers for mmap2(2) and ftruncate64(2) system calls;
- don't spam console with printf's when VFAT_READDIR_BOTH ioctl(2) is invoked;
- add support for SOUND_MIXER_READ_STEREODEVS ioctl(2);
- make msgctl(IPC_STAT) and IPC_SET actually working by converting from
BSD msqid_ds to Linux and vice versa;
- properly return EINVAL if semget(2) is called with nsems being negative.

Reviewed by: marcel
Approved by: marcel
Tested with: LSB runtime test


104741 09-Oct-2002 peter

re-regen. Sigh.


104740 09-Oct-2002 peter

Sigh. Fix fat-fingering of diff. I knew this was going to happen.


104739 09-Oct-2002 peter

regenerate. sendfile stuff and other recently picked up stubs.


104738 09-Oct-2002 peter

Try and deal with the #ifdef COMPAT_FREEBSD4 sendfile stuff. This would
have been a lot easier if do_sendfile() was usable externally.


104736 09-Oct-2002 peter

Try and patch up some tab-to-space spammage.


104735 09-Oct-2002 peter

Add placeholder stubs for nsendfile, mac_syscall, ksem_close, ksem_post,
ksem_wait, ksem_trywait, ksem_init, ksem_open, ksem_unlink, ksem_getvalue,
ksem_destroy, __mac_get_pid, __mac_get_link, __mac_set_link,
extattr_set_link, extattr_get_link, extattr_delete_link.


104571 06-Oct-2002 rwatson

Integrate mac_check_socket_send() and mac_check_socket_receive()
checks from the MAC tree: allow policies to perform access control
for the ability of a process to send and receive data via a socket.
At some point, we might also pass in additional address information
if an explicit address is requested on send.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


104379 02-Oct-2002 archie

Let kse_wakeup() take a KSE mailbox pointer argument.

Reviewed by: julian


104306 01-Oct-2002 jmallett

Back our kernel support for reliable signal queues.

Requested by: rwatson, phk, and many others


104233 30-Sep-2002 jmallett

First half of implementation of ksiginfo, signal queues, and such. This
gets signals operating based on a TailQ, and is good enough to run X11,
GNOME, and do job control. There are some intricate parts which could be
more refined to match the sigset_t versions, but those require further
evaluation of directions in which our signal system can expand and contract
to fit our needs.

After this has been in the tree for a while, I will make in kernel API
changes, most notably to trapsignal(9) and sendsig(9), to use ksiginfo
more robustly, such that we can actually pass information with our
(queued) signals to the userland. That will also result in using a
struct ksiginfo pointer, rather than a signal number, in a lot of
kern_sig.c, to refer to an individual pending signal queue member, but
right now there is no defined behaviour for such.

CODAFS is unfinished in this regard because the logic is unclear in
some places.

Sponsored by: New Gold Technology
Reviewed by: bde, tjr, jake [an older version, logic similar]


103972 25-Sep-2002 archie

Make the following name changes to KSE related functions, etc., to better
represent their purpose and minimize namespace conflicts:

kse_fn_t -> kse_func_t
struct thread_mailbox -> struct kse_thr_mailbox
thread_interrupt() -> kse_thr_interrupt()
kse_yield() -> kse_release()
kse_new() -> kse_create()

Add missing declaration of kse_thr_interrupt() to <sys/kse.h>.
Regenerate the various generated syscall files. Minor style fixes.

Reviewed by: julian


103941 25-Sep-2002 jeff

- Hold the vn lock over vm_mmap().


103886 24-Sep-2002 mini

Back out last commit. Linux uses the old 4.3BSD sockaddr format.


103873 23-Sep-2002 jhb

Ok, make this compile for real this time. recvfrom_args doesn't have a
fromlen member, instead it has a fromlenaddr pointer member. Set it to
NULL.


103872 23-Sep-2002 jhb

Use correct variable name so that previous commit actually compiles.


103839 23-Sep-2002 mini

Don't use compatability syscall wrappers in emulation code.
This is needed for the COMPAT_FREEBSD3 option split.

Reviewed by: alfred, jake


103767 21-Sep-2002 jake

Use the fields in the sysentvec and in the vm map header in place of the
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types. Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.


103712 20-Sep-2002 mdodd

Remove NVIDIA ioctl bits. They will be provided in a kernel module.


103705 20-Sep-2002 phk

Put an XXX comment here to point somebody in the right direction.


103664 20-Sep-2002 imp

Current uses struct thread *td rather than struct proc *p.


103652 19-Sep-2002 mdodd

Pass flags to msync() accounting for differences in the definition of
MS_SYNC on FreeBSD and Linux.

Submitted by: Christian Zander <zander@minion.de>


103651 19-Sep-2002 mdodd

This patch extends the FreeBSD Linux compatibility layer to support
NVIDIA API calls; more specifically, it adds an ioctl() handler for
the range of possible NVIDIA ioctl numbers.

Submitted by: Christian Zander <zander@minion.de>


103216 11-Sep-2002 julian

Completely redo thread states.

Reviewed by: davidxu@freebsd.org


103086 07-Sep-2002 peter

Tidy up some loose ends that bde pointed out. caddr_t bad, ok?
Move fill_kinfo_proc to before we copy the results instead of after
the copy and too late.

There is still more to do here.


103084 07-Sep-2002 peter

The true value of how the kernel was configured for KSTACK_PAGES was not
available at module compile time. Do not #include the bogus
opt_kstack_pages.h at this point and instead refer to the variables that
are also exported via sysctl.


103065 07-Sep-2002 peter

Fix a missing line in a cut/paste error.


103047 07-Sep-2002 peter

Collect the a.out coredump code into the calling functions.
XXX why does pecoff dump in a.out format?


102963 05-Sep-2002 bde

Do not cast from a pointer to an integer of a possibly different size.
This fixes a warning on i386's with 64-bit longs.


102954 05-Sep-2002 bde

Include <sys/malloc.h> instead of depending on namespace pollution 2
layers deep in <sys/proc.h> or <sys/vnode.h>.

Removed unused includes. Sorted includes.


102947 05-Sep-2002 marcel

Implement LINUX_TIOCSCTTY.

PR: kern/42404


102872 02-Sep-2002 iedowse

Use the new kern_*() functions to avoid using the stack gap in
linux_fcntl*() and linux_getcwd().


102814 01-Sep-2002 iedowse

Use the new kern_* functions to avoid the need to store arguments
in the stack gap. This converts most VFS and signal related system
calls, as well as select().

Discussed on: -arch
Approved by: marcel


102808 01-Sep-2002 jake

Added fields for VM_MIN_ADDRESS, PS_STRINGS and stack protections to
sysentvec. Initialized all fields of all sysentvecs, which will allow
them to be used instead of constants in more places. Provided stack
fixup routines for emulations that previously used the default.


102803 01-Sep-2002 iedowse

Add a new function linux_emul_convpath(), which is a version of
linux_emul_find() that does not use stack gap storage but instead
always returns the resulting path in a malloc'd kernel buffer.
Implement linux_emul_find() in terms of this function. Also add
LCONVPATH* macros that wrap linux_emul_convpath in the same way
that the CHECKALT* macros wrap linux_emul_find().


102727 31-Aug-2002 jake

Make this compile.


102630 30-Aug-2002 dillon

Implement data, text, and vmem limit checking in the elf loader and svr4
compat code. Clean up accounting for multiple segments. Part 1/2.

Submitted by: Andrey Alekseyev <uitm@zenon.net> (with some modifications)
MFC after: 3 days


102291 22-Aug-2002 archie

Replace (ab)uses of "NULL" where "0" is really meant.


102052 18-Aug-2002 sobomax

Increase size of ifnet.if_flags from 16 bits (short) to 32 bits (int). To avoid
breaking application ABI use unused ifreq.ifru_flags[1] for upper 16 bits in
SIOCSIFFLAGS and SIOCGIFFLAGS ioctl's.

Reviewed by: -hackers, -net


102003 17-Aug-2002 rwatson

In continuation of early fileop credential changes, modify fo_ioctl() to
accept an 'active_cred' argument reflecting the credential of the thread
initiating the ioctl operation.

- Change fo_ioctl() to accept active_cred; change consumers of the
fo_ioctl() interface to generally pass active_cred from td->td_ucred.
- In fifofs, initialize filetmp.f_cred to ap->a_cred so that the
invocations of soo_ioctl() are provided access to the calling f_cred.
Pass ap->a_td->td_ucred as the active_cred, but note that this is
required because we don't yet distinguish file_cred and active_cred
in invoking VOP's.
- Update kqueue_ioctl() for its new argument.
- Update pipe_ioctl() for its new argument, pass active_cred rather
than td_ucred to MAC for authorization.
- Update soo_ioctl() for its new argument.
- Update vn_ioctl() for its new argument, use active_cred rather than
td->td_ucred to authorize VOP_IOCTL() and the associated VOP_GETATTR().

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101983 16-Aug-2002 rwatson

Make similar changes to fo_stat() and fo_poll() as made earlier to
fo_read() and fo_write(): explicitly use the cred argument to fo_poll()
as "active_cred" using the passed file descriptor's f_cred reference
to provide access to the file credential. Add an active_cred
argument to fo_stat() so that implementers have access to the active
credential as well as the file credential. Generally modify callers
of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which
was redundantly provided via the fp argument. This set of modifications
also permits threads to perform these operations on behalf of another
thread without modifying their credential.

Trickle this change down into fo_stat/poll() implementations:

- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to
MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather
than cred to pru_sopoll() to maintain current semantics.
- sopoll(): moidfy arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments
to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL()
to maintian current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat()
and consumers so that this distinction is maintained at the VFS
as well as 'struct file' layer. Pass active_cred instead of
td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.

- fifofs: modify the creation of a "filetemp" so that the file
credential is properly initialized and can be used in the socket
code if desired. Pass ap->a_td->td_ucred as the active
credential to soo_poll(). If we teach the vnop interface about
the distinction between file and active credentials, we would use
the active credential here.

Note that current inconsistent passing of active_cred vs. file_cred to
VOP's is maintained. It's not clear why GETATTR would be authorized
using active_cred while POLL would be authorized using file_cred at
the file system level.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101941 15-Aug-2002 rwatson

In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to cliarfy which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
"cred", and change the semantics of consumers of fo_read() and
fo_write() to pass the active credential of the thread requesting
an operation rather than the cached file cred. The cached file
cred is still available in fo_read() and fo_write() consumers
via fp->f_cred. These changes largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
pipe_read/write() now authorize MAC using active_cred rather
than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred. Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not. If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101924 15-Aug-2002 rwatson

On MAC check failure for readdir, use 'goto out' to use the common exit
handling, rather than returning directly to prevent leaking of vnode
reference/lock.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101846 13-Aug-2002 jeff

- Add the missing td argument to vn_lock that I missed in my last commit.


101771 13-Aug-2002 jeff

- Hold the vnode lock throughout execve.
- Set VV_TEXT in the top level execve code.
- Fixup the image activators to deal with the newly locked vnode.


101709 12-Aug-2002 rwatson

Enforce MAC policies for the locally implemented vnode services in
SVR4 emulation relating to readdir() and fd_revoke(). All other
services appear to be implemented by simply wrapping existing
FreeBSD native system call implementations, so don't require local
instrumentation in the emulator module.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101707 12-Aug-2002 rwatson

Another fix that wasn't pulled in from the MAC branch: the
struct mount is not cached as *mp at this point, so use
vp->v_mount directly, following the check that it's non-NULL.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101706 12-Aug-2002 rwatson

Fix missing parens in MAC readdir() check. This fix was in the MAC
branch, but apparently didn't get moved over when it was made.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101308 04-Aug-2002 jeff

- Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.

Idea stolen from: BSD/OS


101189 01-Aug-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke appropriate MAC entry points for a number of VFS-related
operations in the Linux ABI module. In particular, handle uselib
in a manner similar to open() (more work is probably needed here),
as well as handle statfs(), and linux readdir()-like calls.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


100385 20-Jul-2002 peter

Regenerate


100384 20-Jul-2002 peter

Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable
handler in the kernel at the same time. Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment. This is a big help for execing i386 binaries
on ia64. The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.

Flesh out the i386 emulation support for ia64. At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.

Obtained from: dfr (mostly, many tweaks from me).


99687 09-Jul-2002 robert

Move the switch statement labels for the explicit 64-bit
command arguments into the correct function, linux_fcntl64(),
and thus out of the scope of a compilation for the alpha
platform.

Requested by: obrien


99670 09-Jul-2002 robert

Enable emulation of the F_GETLK64, F_SETLK64, and F_SETLKW64
lock commands arguments to linux_fcntl64().


99669 09-Jul-2002 robert

The comment marked with XXX was right: emulate SVR4 for
ELF binaries branded with ELFOSABI_SYSV, this is reported
to work and brandelf(1) puts this type into files if "SVR4"
was specified.


99072 29-Jun-2002 julian

Part 1 of KSE-III

The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


98878 26-Jun-2002 arr

- Remove the Giant acquisition from linux_socket_ioctl() as it was really
there to protect fdrop() (which in turn can call vrele()), however,
fdrop_locked() grabs Giant for us, so we do not have to.

Reviewed by: jhb
Inspired by: alc


98209 14-Jun-2002 rwatson

Add a comment about how we should use vn_open() here instead of directly
invoking VOP_OPEN(). This would reduce code redundancy with the rest
of the kernel, and also is required for MAC to work properly.


98124 11-Jun-2002 alfred

catch up with ktrace changes, KTRPOINT takes a 'struct thread' not
'struct proc' now.


97994 07-Jun-2002 jhb

Catch up to changes in ktrace API.


97748 02-Jun-2002 schweikh

Fix typo in the BSD copyright: s/withough/without/

Spotted and suggested by: des
MFC after: 3 weeks


97658 31-May-2002 tanimura

Back out my lats commit of locking down a socket, it conflicts with hsu's work.

Requested by: hsu


97555 30-May-2002 alfred

correct commented out preprocessor test for i386 to __i386__


97271 25-May-2002 bde

Fixed a printf format error. It was old and should have been detected by
gcc-2.9x, but somehow wasn't fixed already.


96972 20-May-2002 tanimura

Lock down a socket, milestone 1.

o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
socket buffer. The mutex in the receive buffer also protects the data
in struct socket.

o Determine the lock strategy for each members in struct socket.

o Lock down the following members:

- so_count
- so_options
- so_linger
- so_state

o Remove *_locked() socket APIs. Make the following socket APIs
touching the members above now require a locked socket:

- sodisconnect()
- soisconnected()
- soisconnecting()
- soisdisconnected()
- soisdisconnecting()
- sofree()
- soref()
- sorele()
- sorwakeup()
- sotryfree()
- sowakeup()
- sowwakeup()

Reviewed by: alfred


96886 19-May-2002 jhb

Change p_can{debug,see,sched,signal}()'s first argument to be a thread
pointer instead of a proc pointer and require the process pointed to
by the second argument to be locked. We now use the thread ucred reference
for the credential checks in p_can*() as a result. p_canfoo() should now
no longer need Giant.


96840 18-May-2002 marcel

In msgrcv(), set msgtyp correctly. Hardwiring 0 as the message type
yields incorrect behaviour. The hardwiring was present in the very
first commit that implemented msgrcv() (revision 1.4) and hasn't been
changed since. The native implementation was complete at that time,
so there doesn't seem to be a reason for the hardwiring from a
technical point of view.

Submitted by: Reinier Bezuidenhout <rbezuide@yahoo.com>


96398 11-May-2002 dd

sysctl -w -> sysctl


95837 01-May-2002 peter

Zap some stale unused headers, including one machine/psl.h (which is
a stub on alpha). Compile tested on alpha and x86.


95130 20-Apr-2002 rwatson

Add an XXX: linux_uselib() should be using vn_open() rather than invoking
VOP_OPEN() and doing lots of manual checking. This would further
centralize use of the name functions, and once the MAC code is integrated,
meaning few extraneous MAC checks scattered all over the place. I don't
have time to fix this now, but want to make sure it doesn't get
forgotten. Anyone interested in fixing this should feel free.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


94858 16-Apr-2002 jhb

- Lock proctree_lock instead of pgrpsess_lock.
- Exclusively lock proctree_lock while calling leavepgrp().


94621 13-Apr-2002 jhb

Rework logic of syscalls that modify process credentials as described in
rev 1.152 of sys/kern/kern_prot.c.


94620 13-Apr-2002 jhb

- p_cansee() needs the target process locked.
- We need the proc lock held for more of procfs_doprocstatus().


94455 11-Apr-2002 jhb

Use proc lock to protect p_ucred pointer while we deference it to read a
few values.


94454 11-Apr-2002 jhb

Use td_ucred in a few spots.


94380 10-Apr-2002 dfr

Initial support for executing IA-32 binaries. This will not compile
without a few patches for the rest of the kernel to allow the image
activator to override exec_copyout_strings and setregs.

None of the syscall argument translation has been done. Possibly, this
translation layer can be shared with any platform that wants to support
running ILP32 binaries on an LP64 host (e.g. sparc32 binaries?)


94307 09-Apr-2002 jhb

- Change fill_kinfo_proc() to require that the process is locked when it
is called.
- Change sysctl_out_proc() to require that the process is locked when it
is called and to drop the lock before it returns. If this proves too
complex we can change sysctl_out_proc() to simply acquire the lock at
the very end and have the calling code drop the lock right after it
returns.
- Lock the process we are going to export before the p_cansee() in the
loop in sysctl_kern_proc() and hold the lock until we call
sysctl_out_proc().
- Don't call p_cansee() on the process about to be exported twice in
the aforementioned loop.


93793 04-Apr-2002 bde

Moved signal handling and rescheduling from userret() to ast() so that
they aren't in the usual path of execution for syscalls and traps.
The main complication for this is that we have to set flags to control
ast() everywhere that changes the signal mask.

Avoid locking in userret() in most of the remaining cases.

Submitted by: luoqi (first part only, long ago, reorganized by me)
Reminded by: dillon


93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


93393 29-Mar-2002 alfred

Protect proc struct (p_args and p_comm) when doing procfs IO that pulls
data from it.

Submitted by: Jonathan Mini <mini@haikugeek.com>


93295 27-Mar-2002 alfred

Make the reference counting of 'struct pargs' SMP safe.

There is still some locations where the PROC lock should be held
in order to prevent inconsistent views from outside (like the
proc->p_fd fix for kern/vfs_syscalls.c:checkdirs()) that can be
fixed later.

Submitted by: Jonathan Mini <mini@haikugeek.com>


93073 24-Mar-2002 bde

Fixed some style bugs in the removal of __P(()). Tabs before "__P(("
were not removed.


92787 20-Mar-2002 jeff

Remove references to vm_zone.h and switch over to the new uma API.


92761 20-Mar-2002 alfred

Remove __P.


91406 27-Feb-2002 jhb

Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.


91392 27-Feb-2002 robert

Use the updated getcredhostname() function.


91386 27-Feb-2002 robert

- Use the new getcredhostname function in the SVR4 uname system call.
- Remove spurious empty line.

Reviewed by: phk


91385 27-Feb-2002 robert

Use the getcredhostname function to fill the hostname into
the linux_newuname_args structure. This should fix the case
of jailed linux processes not using the jail's hostname.

PR: 35336
Reviewed by: phk


91334 26-Feb-2002 julian

remove "discards qualifier" erro by not potentially writing to
a const *.


91140 23-Feb-2002 tanimura

Lock struct pgrp, session and sigio.

New locks are:

- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.

Please refer to sys/proc.h for the coverage of these locks.

Changes on the pgrp/session interface:

- pgfind() needs the pgrpsess_lock held.

- The caller of enterpgrp() is responsible to allocate a new pgrp and
session.

- Call enterthispgrp() in order to enter an existing pgrp.

- pgsignal() requires a pgrp lock held.

Reviewed by: jhb, alfred
Tested on: cvsup.jp.FreeBSD.org
(which is a quad-CPU machine running -current)


90984 20-Feb-2002 alfred

fix file descriptor leak.

Submitted by: Mark Santcroos <marks@ripe.net>


90690 15-Feb-2002 bde

Garbage collect options AVM_A1_PCI, AVM_A1_PCMCIA, DEBUG_LINUX, DEV_APM,
GUS_DMA, GUS_DMA2, GUS_IRQ, OLTR_NO_BULLSEYE_MAC, OLTR_NO_HAWKEYE_MAC,
OLTR_NO_TMS_MAC and PCIC_RESUME_RESET.


90371 07-Feb-2002 peter

Attempt to unmangle some code touched in the previous commit.


90361 07-Feb-2002 julian

Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


90002 30-Jan-2002 alfred

include sys/lock.h and sys/mutex.h to make compile.

Noticed by: Vincent Poy <vince@oahu.WURLDLINK.NET>


89944 29-Jan-2002 marcel

Have SIOCGIFCONF return all (if any) AF_INET addresses for the
interfaces we encounter. In Linux, all addresses are returned for
which gifconf handlers are installed. This boils down to AF_DECnet
and AF_INET. We care mostly about AF_INET for now. Adding additional
families is simple enough.

Returning the addresses is important for RPC clients to function
properly. Andrew found in some reference code that the logic that
handles the retransmission looks for an interface that's up and has
an AF_INET address. This obviously failed as we didn't return any
addresses at all.

Note also that with this change we don't return interfaces that don't
have AF_INET addresses, whereas before we returned any interface
present in the system. This is in line with what Linux does (modulo
interfaces with only AF_DECnet addresses of course :-)

Reported by: "Andrew Atrens" <atrens@nortelnetworks.com>
MFC after: 1 week


89717 23-Jan-2002 gallatin

Linux/alpha uses the same BSDish return mechanism we do for
getpid, getuid, getgid and pipe, since they bootstrapped from
OSF/1 and never cleaned up. Switch to the native syscalls
on alpha so that the above functions work

MFC after: 7 days


89545 19-Jan-2002 tanimura

Lock the caller process if the pid passed to getsid() or getpgid()
equals to zero.


89541 19-Jan-2002 tanimura

For getsid(), return the sid stored in struct session. This prevents
panic in case where a session has no session leader.

Inspired by: Solaris 8


89536 19-Jan-2002 alfred

Make compile, remove extra fdrop() calls.
Change name of function to what it's supposed to be (s/sys/do)


89535 19-Jan-2002 alfred

make compile, add missing { and variable declaration.


89534 19-Jan-2002 alfred

Semi-backout previous fgetvp change, we need the struct file pointer
to perform relative offset calculations, so use fget instead.


89411 16-Jan-2002 alfred

fix typo, there's uap, just fd


89379 15-Jan-2002 marcel

Reinstate linux_ifname. Although the Linuxulator doesn't use it
itself, it's used outside the Linuxulator. Reimplement the
function so that its behaviour matches the current renaming
scheme. It's probably better to formalize these interdependencies.


89319 14-Jan-2002 alfred

Replace ffind_* with fget calls.

Make fget MPsafe.

Make fgetvp and fgetsock use the fget subsystem to reduce code bloat.

Push giant down in fpathconf().


89311 13-Jan-2002 alfred

Remove unused variable.


89308 13-Jan-2002 alfred

Some of the KSE stuff was accidentally reverted by file locking,
fix it.

Pointed out by: jhb


89306 13-Jan-2002 alfred

SMP Lock struct file, filedesc and the global file list.

Seigo Tanimura (tanimura) posted the initial delta.

I've polished it quite a bit reducing the need for locking and
adapting it for KSE.

Locks:

1 mutex in each filedesc
protects all the fields.
protects "struct file" initialization, while a struct file
is being changed from &badfileops -> &pipeops or something
the filedesc should be locked.

1 mutex in each struct file
protects the refcount fields.
doesn't protect anything else.
the flags used for garbage collection have been moved to
f_gcflag which was the FILLER short, this doesn't need
locking because the garbage collection is a single threaded
container.
could likely be made to use a pool mutex.

1 sx lock for the global filelist.

struct file * fhold(struct file *fp);
/* increments reference count on a file */

struct file * fhold_locked(struct file *fp);
/* like fhold but expects file to locked */

struct file * ffind_hold(struct thread *, int fd);
/* finds the struct file in thread, adds one reference and
returns it unlocked */

struct file * ffind_lock(struct thread *, int fd);
/* ffind_hold, but returns file locked */

I still have to smp-safe the fget cruft, I'll get to that asap.


89182 10-Jan-2002 marcel

Further fixes related to the interface renaming. Now that we
properly translate the interface name passed to us, make sure
we also translate correctly before we return the list of
interfaces with the SIOCGIFCONF ioctl. It is common to use
the interface names returned by that ioctl in further ioctls,
such as SIOCGIFFLAGS.

Remove linux_ifname as it is no longer used. Also remove
ifname_bsd_to_linux as it cannot be used anymore now that
linux_ifname is removed (was deadcode anyway).

Reported and tested by: Andrew Atrens <atrens@nortelnetworks.com>


89064 08-Jan-2002 msmith

Gut this header; since physio_proc_init is never called, the code never does
anything more than multiply declare some unused variables.


87599 10-Dec-2001 obrien

Update to C99, s/__FUNCTION__/__func__/,
also don't use ANSI string concatenation.


87543 09-Dec-2001 des

Pull in more stuff from procfs now that it's been pseudofsized.


87335 04-Dec-2001 marcel

When translating the interface name when "eth?" is given, do not
use the internal index number as the unit number to compare with.
The first ethernet interface in Linux is called "eth0", whereas
our internal index starts wth 1 and is not unique to ethernet
interfaces (lo0 has index 1 for example). Instead, use a function-
local index number that starts with 0 and is incremented only
for ethernet interfaces. This way the unit number will match the
n-th ethernet interface in the system, which is exactly what it
means in Linux.

Tested by: Glenn Johnson <gjohnson@srrc.ars.usda.gov>
MFC after: 3 days


87275 03-Dec-2001 rwatson

o Introduce pr_mtx into struct prison, providing protection for the
mutable contents of struct prison (hostname, securelevel, refcount,
pr_linux, ...)
o Generally introduce mtx_lock()/mtx_unlock() calls throughout kern/
so as to enforce these protections, in particular, in kern_mib.c
protection sysctl access to the hostname and securelevel, as well as
kern_prot.c access to the securelevel for access control purposes.
o Rewrite linux emulator abstractions for accessing per-jail linux
mib entries (osname, osrelease, osversion) so that they don't return
a pointer to the text in the struct linux_prison, rather, a copy
to an array passed into the calls. Likewise, update linprocfs to
use these primitives.
o Update in_pcb.c to always use prison_getip() rather than directly
accessing struct prison.

Reviewed by: jhb


86852 24-Nov-2001 des

Revert incorrect KSEfication: realitexpire expects a struct proc *, not a
struct thread *.


86607 19-Nov-2001 iedowse

Deal with a few issues that cropped up following the recent changes
to the code for translating socket and private ioctls:

- Only perform socket ioctl translation if the file descriptor is a
socket.
- Treat socket ioctls on non-sockets specially, and for now assume
that these are directed at a tap/vmnet device, so translate the
ioctl numbers as appropriate (the way if_tap abuses some socket
ioctls to pass non-ifreq data is utterly bogus, but this is how
VMware on FreeBSD has always "worked"; I will deal with this
later).
- Add (untested) support for translating SIOCSIFADDR.
- In all cases where we fail to translate an ioctl, return ENOIOCTL
so that other handlers have a chance to do the translation.

This should fix the "/dev/vmnet1: Invalid argument" errors that
users of VMware were experiencing, though I have only verified this
on RELENG_4.

Submitted by: des (mostly)
MFC after: 3 days


86555 18-Nov-2001 marcel

Implement DVD-ROM ioctls.

PR: 26955
Submitted by: Boris Nikolaus (email unknown)


86540 18-Nov-2001 marcel

Implement missing SOUND_MIXER_WRITE_RECSRC ioctl.

PR: 22971
Tested by: dougb


86504 17-Nov-2001 dillon

Fix missing holdsock()->fgetsock()

Submitted by: Hisashi Hiramoto <hiramoto@phys.chs.nihon-u.ac.jp>


86487 17-Nov-2001 dillon

Give struct socket structures a ref counting interface similar to
vnodes. This will hopefully serve as a base from which we can
expand the MP code. We currently do not attempt to obtain any
mutex or SX locks, but the door is open to add them when we nail
down exactly how that part of it is going to work.


86484 17-Nov-2001 peter

Forward declare struct ifnet - this fixes a warning in tdfx_pci.c


86483 17-Nov-2001 peter

Fix printf warnings (int/long)
#if 0 around unused ifname_bsd_to_linux() function


86482 17-Nov-2001 peter

Fix warning in debug printf. This is a long on alpha, and int on i386,
but printed with %ld always.


86183 08-Nov-2001 rwatson

o Replace reference to 'struct proc' with 'struct thread' in 'struct
sysctl_req', which describes in-progress sysctl requests. This permits
sysctl handlers to have access to the current thread, permitting work
on implementing td->td_ucred, migration of suser() to using struct
thread to derive the appropriate ucred, and allowing struct thread to be
passed down to other code, such as network code where td is not currently
available (and curproc is used).

o Note: netncp and netsmb are not updated to reflect this change, as they
are not currently KSE-adapted.

Reviewed by: julian
Obtained from: TrustedBSD Project


85657 29-Oct-2001 dillon

promote tv_sec in printf to make it type agnostic


85623 28-Oct-2001 mr

Introduce [IPC|SHM]_[INFO|STAT] to shmctl to make
`/compat/linux/usr/bin/ipcs -m` happy.


85599 27-Oct-2001 des

Eliminate the prefix parameter to linux_emul_find(), which was always
linux_emul_path anyway. Linux_emul_find() has interesting bugs in its
prefix handling (which luckily are not currently exploitable); this
commit is preliminary to an attempt at cleaning it up.

Approved by: marcel


85569 26-Oct-2001 fenner

Force the length of the sockaddr to be correct for AF_INET and AF_INET6
in bind() and connect(). Linux doesn't care if the length of the
sockaddr matches its address family; FreeBSD does. This fixes the
known issues with the resolver in linux_base-7.


85538 26-Oct-2001 phk

Reporting device drivers by traversing cdevsw[] is at best a hack
which may or may not return something which is partially right.

Disable the "devices" file until we find out what this is needed for,
and what exactly those apps need.

This will allow cdevsw to become static again.

Approved by: DES


85289 21-Oct-2001 des

Add proc/mtab which simulates a Linux system's /etc/mtab.


85203 20-Oct-2001 des

Tweak the way we determine if an interface needs to have its name translated.
Add some missing break statements in the socket ioctl switch.
Check the return value from copyin() / copyout().
Fix some disorderings and misindentations.
Support a couple more socket ioctls.
Add missing break statements.


85139 19-Oct-2001 marcel

Fix Alpha related brokenness. We used to have a MD linux_ioctl.h
that appeared to be very different from the MI version. These
differences were mostly bogus and caused by copying octal
definitions and write them as hexadecimal values without doing
any base conversion (ie 010 was copied to 0x10). After filtering
out these differences, any remaining (real) incompatibilities
have been merged into the MI header file to make them more visible.

While here, fix the termios <-> termio conversion WRT to the c_cc
field for Alpha. The termios values do not match the termio values
and thus prevents us from copying.

By eliminating the Alpha MD copy of linux_ioctl.h we also fixed
the recent build breakage caused by putting new bits in the MI
header and not in the MD header.


85130 19-Oct-2001 des

#if 0 out some code that depends on other uncommitted patches.


85129 19-Oct-2001 des

Adapt to pseudofs changes (dynamic initialization, not static).
Use the new linux_ifname() function from the linuxulator rather than roll
our own interface name translation.


85127 19-Oct-2001 des

Add support for the "device private" ioctls soon to be used by the an driver.
Also slightly change the name translation policy - only rename interfaces
that have the IFF_BROADCAST flag set. This is not perfect, but is closer to
how Linux names network interfaces.


85125 19-Oct-2001 des

Whitespace fix.


85022 16-Oct-2001 marcel

Implement linux_chown and linux_lchown. The fchown syscall maps
directly to the native syscall, because no filename handling
needs to be done.

Tested by: Martin Blapp <mb@imp.ch>


85012 15-Oct-2001 des

Try to make Linux socket ioctls work. Up until now they've only *pretended*
to work, but haven't really due to subtle differences in structs etc.

This is still not perfect (some ioctls are still known not to work, while
others haven't been tested at all), but it's enough to get Debian's ifconfig
to produce relatively sane output.

More work will be needed to get all ioctls (or at least a reasonable subset)
working, and to support the Cisco Aironet config tool mentioned in the PR.

PR: 26546
Submitted by: Doug Ambrisko <ambrisko@ambrisko.com>


84916 14-Oct-2001 marcel

When casting from uid16/gid16 to uid/gid respectively, make sure
that "no change" (ie 0xFFFF) is properly cast to (int)-1 for those
syscalls that set uids and/or gids.

Verified by: LTP


84811 11-Oct-2001 jhb

Add missing includes of sys/lock.h.


84783 10-Oct-2001 ps

Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by: peter
MFC after: 2 weeks


84248 01-Oct-2001 des

Catch up with the visibility callback stuff, and give up trying to keep the
file definitions on single lines.


84189 30-Sep-2001 des

Specify readability and / or writeability for all nodes that need it.


84150 29-Sep-2001 des

Adapt to pseudofs version 2. Sorry about the breakage - I had this ready
to commit along with the pseudofs patches, but just plain forgot.


84075 28-Sep-2001 marcel

Remove linux_getpgid(). We map the syscall natively now.

PR: kern/21402


84067 28-Sep-2001 marcel

Swap the src and dst arguments of the bcopy added in the
previous commit. It ain't memcpy... *cough*


83955 26-Sep-2001 marcel

The arg parameter is passed by value in Linux, but not in FreeBSD.
We still have to account for a copyin. Make sure the copyin will
succeed by passing the FreeBSD syscall a pointer to userspace,
albeit one that's automagically mapped into kernel space.

Reported by: mr, Mitsuru IWASAKI <iwasaki@jp.FreeBSD.org>
Tested by: Mitsuru IWASAKI <iwasaki@jp.FreeBSD.org>


83926 25-Sep-2001 des

Clean up my source tree to avoid getting hit too badly by the next KSE or
whatever mega-commit. No real functional changes, just some experiments /
work in progress.


83667 19-Sep-2001 sobomax

Fix abuse of vtagtype. In addition, after this the linux programs will be
able correctly distinguish ext2fs from the ufs filesystem (previously ext2fs
was indistinguishable from the ufs).

Reviewed by: phk, marcel


83503 15-Sep-2001 mr

Add a wrapper for linux_getsid -> getsid Syscall.


83501 15-Sep-2001 mr

Implement LINUX_[SEM|IPC]_[STAT|INFO]
to make /compat/linux/usr/bin/ipcs -s happy.

PR: kern/29698 (part)
Reviewed by: audit


83436 14-Sep-2001 marcel

Fix off by one error introduced by the use of the ifnet_byindex()
macro. The commit log clearly states that the index given to the
macro is one higher than previously used to index the array. This
wasn't represented in the code and resulted in kernel page faults.

Reported by: Andrew Atrens <atrens@nortelnetworks.com>


83418 13-Sep-2001 julian

Fix typo.
noticed by: jhb


83382 12-Sep-2001 jhb

Whitespace fix.


83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


83222 08-Sep-2001 dillon

This brings in a Yahoo coredump patch from Paul, with additional mods by
me (addition of vn_rdwr_inchunks). The problem Yahoo is solving is that
if you have large process images core dumping, or you have a large number of
forked processes all core dumping at the same time, the original coredump code
would leave the vnode locked throughout. This can cause the directory vnode
to get locked up, which can cause the parent directory vnode to get locked
up, and so on all the way to the root node, locking the entire machine up
for extremely long periods of time.

This patch solves the problem in two ways. First it uses an advisory
non-blocking lock to abort multiple processes trying to core to the same
file. Second (my contribution) it chunks up the writes and uses bwillwrite()
to avoid holding the vnode locked while blocking in the buffer cache.

Submitted by: ps
Reviewed by: dillon
MFC after: 2 weeks


83221 08-Sep-2001 marcel

Round of cleanups and enhancements. These include (in random order):

o Introduce private types for use in linux syscalls for two reasons:
1. establish type independence for ease in porting and,
2. provide a visual queue as to which syscalls have proper
prototypes to further cleanup the i386/alpha split.
Linuxulator types are prefixed by 'l_'. void and char have not
been "virtualized".

o Provide dummy functions for all syscalls and remove dummy functions
or implementations of truely obsolete syscalls.

o Sanitize the shm*, sem* and msg* syscalls.

o Make a first attempt to implement the linux_sysctl syscall. At this
time it only returns one MIB (KERN_VERSION), but most importantly,
it tells us when we need to add additional sysctls :-)

o Bump the kenel version up to 2.4.2 (this is not the same as the
KERN_VERSION MIB, BTW).

o Implement new syscalls, of which most are specific to i386. Our
syscall table is now up to date with Linux 2.4.2. Some highlights:
- Implement the 32-bit uid_t and gid_t bases syscalls.
- Implement a couple of 64-bit file size/offset bases syscalls.

o Fix or improve numerous syscalls and prototypes.

o Reduce style(9) violations while I'm here. Especially indentation
inconsistencies within the same file are addressed. Re-indenting
did not obfuscate actual changes to the extend that it could not
be combined.

NOTE: I spend some time testing these changes and found that if there
were regressions, they were not caused by these changes AFAICT.
It was observed that installing a RH 7.1 runtime environment
did make matters worse. Hangs and/or reboots have been observed
with and without these changes, so when it failed to make life
better in cases it doesn't look like it made it worse.


83130 06-Sep-2001 jlemon

Wrap array accesses in macros, which also happen to be lvalues:

ifnet_addrs[i - 1] -> ifaddr_byindex(i)
ifindex2ifnet[i] -> ifnet_byindex(i)

This is intended to ease the conversion to SMPng.


82753 01-Sep-2001 dillon

Synchronize syscalls.master(s) with recent Giant pushdown work


82745 01-Sep-2001 marcel

Speculatively add this file. It's part of the Linuxulator update
to make it emulate Linux kernel version 2.4.2, which is required
in order to upgrade the linux_base port to RH 7.1.

Note that this file is only needed for 32-bit architectures. To
us this means i386 (for now?)


82518 29-Aug-2001 gallatin

Fix linux_getcwd() so that if the cwd isn't cached (__getcwd() fails),
the cwd is looked up inside the kernel. The native getcwd() in libc
handles this in userland if __getcwd() fails.

Obtained from: NetBSD via OpenBSD
Tested by: Chris Casey <chriss@phys.ksu.edu>, Markus Holmberg <markush@acc.umu.se>
Reviewed by: Darrell Anderson <anderson@cs.duke.edu>
PR: kern/24315


80180 23-Jul-2001 pirzyk

Added the linux_sysinfo function to implement sysinfo(2).

PR: kern/27759
Reviewed by: marcel
Approved by: marcel
MFC after: 1 week


80114 22-Jul-2001 assar

get rid of some printf and pointer type warnings


79335 05-Jul-2001 rwatson

o Replace calls to p_can(..., P_CAN_xxx) with calls to p_canxxx().
The p_can(...) construct was a premature (and, it turns out,
awkward) abstraction. The individual calls to p_canxxx() better
reflect differences between the inter-process authorization checks,
such as differing checks based on the type of signal. This has
a side effect of improving code readability.
o Replace direct credential authorization checks in ktrace() with
invocation of p_candebug(), while maintaining the special case
check of KTR_ROOT. This allows ktrace() to "play more nicely"
with new mandatory access control schemes, as well as making its
authorization checks consistent with other "debugging class"
checks.
o Eliminate "privused" construct for p_can*() calls which allowed the
caller to determine if privilege was required for successful
evaluation of the access control check. This primitive is currently
unused, and as such, serves only to complicate the API.

Approved by: ({procfs,linprocfs} changes) des
Obtained from: TrustedBSD Project


78264 15-Jun-2001 peter

Bah, back out part of previous commit. I got too carried away.
linux_debug_map[] is referred to from elsewhere.


78259 15-Jun-2001 peter

Fix warnings:
235: warning: unsigned int format, pointer arg (arg 3)
621: warning: cast discards qualifiers from pointer target type


78258 15-Jun-2001 peter

Fix warning:
239: warning: no previous prototype for `linux_debug'


78257 15-Jun-2001 peter

Fix warning:
413: warning: long unsigned int format, vm_offset_t arg (arg 2)


78161 13-Jun-2001 peter

With this commit, I hereby pronounce gensetdefs past its use-by date.

Replace the a.out emulation of 'struct linker_set' with something
a little more flexible. <sys/linker_set.h> now provides macros for
accessing elements and completely hides the implementation.

The linker_set.h macros have been on the back burner in various
forms since 1998 and has ideas and code from Mike Smith (SET_FOREACH()),
John Polstra (ELF clue) and myself (cleaned up API and the conversion
of the rest of the kernel to use it).

The macros declare a strongly typed set. They return elements with the
type that you declare the set with, rather than a generic void *.

For ELF, we use the magic ld symbols (__start_<setname> and
__stop_<setname>). Thanks to Richard Henderson <rth@redhat.com> for the
trick about how to force ld to provide them for kld's.

For a.out, we use the old linker_set struct.

NOTE: the item lists are no longer null terminated. This is why
the code impact is high in certain areas.

The runtime linker has a new method to find the linker set
boundaries depending on which backend format is in use.

linker sets are still module/kld unfriendly and should never be used
for anything that may be modular one day.

Reviewed by: eivind


78116 11-Jun-2001 des

Say one thing, do the other... nextpid -> lastpid


78113 11-Jun-2001 des

Implement proc/cpuinfo for the Alpha (thanks to gallatin).
Implement proc/pid/cmdline.


78031 11-Jun-2001 des

Minor whitespace changes.


78026 10-Jun-2001 des

These aren't needed any more.


78025 10-Jun-2001 des

New pseudofs-based linprocfs (repo-copied from linprocfs_misc.c).


77675 04-Jun-2001 paul

S_IFCHR is not a bit mask, it's just a value in a field. The correct
way to clear that field is to use S_IFMT.

Pointed out by BDE.


77575 01-Jun-2001 ru

Remove vestiges of MFS.


77435 29-May-2001 phk

Remove MFS


77223 26-May-2001 ru

- sys/n[tw]fs moved to sys/fs/n[tw]fs
- /usr/include/n[tw]fs moved to /usr/include/fs/n[tw]fs


77183 25-May-2001 rwatson

o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
pcred->pc_uidinfo, which was associated with the real uid, only rename
it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
original macro that pointed.
p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
we figure out locking and optimizations; generally speaking, this
means moving to a structure like this:
newcred = crdup(oldcred);
...
p->p_ucred = newcred;
crfree(oldcred);
It's not race-free, but better than nothing. There are also races
in sys_process.c, all inter-process authorization, fork, exec, and
exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
allocation.
o Clean up ktrcanset() to take into account changes, and move to using
suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
calls to better document current behavior. In a couple of places,
current behavior is a little questionable and we need to check
POSIX.1 to make sure it's "right". More commenting work still
remains to be done.
o Update credential management calls, such as crfree(), to take into
account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
change_euid()
change_egid()
change_ruid()
change_rgid()
change_svuid()
change_svgid()
In each case, the call now acts on a credential not a process, and as
such no longer requires more complicated process locking/etc. They
now assume the caller will do any necessary allocation of an
exclusive credential reference. Each is commented to document its
reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
processes and pcreds. Note that this authorization, as well as
CANSIGIO(), needs to be updated to use the p_cansignal() and
p_cansched() centralized authorization routines, as they currently
do not take into account some desirable restrictions that are handled
by the centralized routines, as well as being inconsistent with other
similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from: TrustedBSD Project
Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit


76941 21-May-2001 jhb

Sort includes.


76839 19-May-2001 jlemon

Add new 'loadavg' entry, fix overflow with meminfo.

PR: 27253, 27350
Submitted by: Jim Pirzyk


76827 19-May-2001 alfred

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


76405 09-May-2001 des

Avoid overflow when converting ticks to jiffies.

PR: 27215
Submitted by: Jim Pirzyk <Jim.Pirzyk@disney.com>


76267 04-May-2001 jlemon

Fix the problem of some directory entries going missing when
read by the linux version of 'ls'.

Spotted by: rwatson


76166 01-May-2001 markm

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


76131 29-Apr-2001 phk

Add a vop_stdbmap(), and make it part of the default vop vector.

Make 7 filesystems which don't really know about VOP_BMAP rely
on the default vector, rather than more or less complete local
vop_nopbmap() implementations.


75988 25-Apr-2001 paul

A bogus check for a char device also matched symbolic links.
Replace it with a correct check using S_ISCHR()

Symbolic links will now work again in linux compatibility.


75916 24-Apr-2001 rwatson

o Change a suser() call to a suser_xxx(..., PRISON_ROOT) call in the
linuxulator so as to allow privileged processes within a jail() to
invoke the Linux initgroups() system call. This allows the Linux
"su" to work properly (better) when running a complete Linux
environment under jail(). This problem was reported by Attila
Nagy <bra@fsn.hu>.

Reviewed by: marcel


75893 24-Apr-2001 jhb

Change the pfind() and zpfind() functions to lock the process that they
find before releasing the allproc lock and returning.

Reviewed by: -smp, dfr, jake


75053 01-Apr-2001 alc

Add linux_sched_get_priority_max() and linux_sched_get_priority_min(): The
policy parameter requires translation.


74940 28-Mar-2001 jhb

Add missing includes of <sys/sx.h>

Reported by: peter


74927 28-Mar-2001 jhb

Convert the allproc and proctree locks from lockmgr locks to sx locks.


74701 23-Mar-2001 gallatin

fix linux_times() to take into account linux's value of CLK_TCK on the alpha.
Previously, results were off by a factor of 10

Tested by: Yoriaki FUJIMORI <fujimori@grafin.fujimori.cache.waseda.ac.jp>


74135 12-Mar-2001 jlemon

Eliminate global node types and instead use an operations vector for
each node in order to make it easier to add new entries.

Rewrite the internal directory structure so that it is possible to
have independent subdirectories. Utilize this to add /proc/net/dev.

Reviewed by: DES


73929 07-Mar-2001 jhb

Grab the process lock while calling psignal and before calling psignal.


73923 07-Mar-2001 jhb

Just hold the proc lock while getting the parent's PID rather than a
proctree lock.


73907 07-Mar-2001 jhb

- Hold both an exclusive proctree lock and the proc lock when reparenting
a traced process during exit.
- Lock the parent process while sending it SIGCHLD.


73353 02-Mar-2001 jlemon

Only pick up so_error the first time through with EISCONN, as advertised.
The sense of the test was reversed, so we were returning EISCONN, then 0.

Pointed out and tested by: Martin Blapp <mb@imp.ch>


73288 01-Mar-2001 jlemon

Correctly emulate linux_connect. For nonblocking sockets, the behavior
is to return EINPROGRESS, EALREADY, (so_error ONCE), EISCONN. Certain
linux applications rely on the so_error (normally 0) being returned in
order to operate properly.

Tested by: Thomas Moestl <tmoestl@gmx.net>


73286 01-Mar-2001 adrian

Reviewed by: jlemon

An initial tidyup of the mount() syscall and VFS mount code.

This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.

* the guts of the mount work has been moved into vfs_mount().

* move `type', `path' and `flags' from being userland variables into being
kernel variables in vfs_mount(). `data' remains a pointer into
userspace.

* Attempt to verify the `type' and `path' strings passed to vfs_mount()
aren't too long.

* rework mount() and linux_mount() to take the userland parameters
(besides data, as mentioned) and pass kernel variables to vfs_mount().
(linux_mount() already did this, I've just tidied it up a little more.)

* remove the copyin*() stuff for `path'. `data' still requires copyin*()
since its a pointer into userland.

* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
filesystem. This variable is generally initialised with `path', and
each filesystem can override it if they want to.

* NOTE: f_mntonname is intiailised with "/" in the case of a root mount.


72999 24-Feb-2001 obrien

MFS: bring the consistent `compat_3_brand' support into -CURRENT
(the work was first done in the RELENG_4 branch near a release
during a MFC to make the code cleaner and more consistent)


72786 21-Feb-2001 rwatson

o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
pr_free(), invoked by the similarly named credential reference
management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
required to protect the reference count plus some fields in the
structure.

Reviewed by: freebsd-arch
Obtained from: TrustedBSD Project


72543 16-Feb-2001 jlemon

Allow debugging output to be controlled on a per-syscall granularity.
Also clean up debugging output in a slightly more uniform fashion.

The default behavior remains the same (all debugging output is turned on)


72538 16-Feb-2001 jlemon

Add mount syscall to linux emulation. Also improve emulation of reboot.


72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


72091 06-Feb-2001 asmodai

Fix typo: seperate -> separate.

Seperate does not exist in the english language.


72082 06-Feb-2001 asmodai

Fix typo: wierd -> weird.

There is no such thing as wierd in the english language.


71699 27-Jan-2001 jhb

Back out proc locking to protect p_ucred for obtaining additional
references along with the actual obtaining of additional references.


71697 26-Jan-2001 jhb

- Back out over-aggressive locking of p->p_cred.
- Back out locking ucred's and bumping refcounts for vnode operations.


71490 24-Jan-2001 jhb

Use queue macros.


71471 23-Jan-2001 jhb

- Proc locking.
- Use queue macros.
- Use NULL instead of 0 for pointers.

Reviewed by: des


71456 23-Jan-2001 jhb

Argh, atomic_store_rel -> atomic_store_rel_int.


71455 23-Jan-2001 jhb

Woops, add in missing headers.


71454 23-Jan-2001 jhb

Proc locking.


71453 23-Jan-2001 jhb

Use queue macros.


71452 23-Jan-2001 jhb

- Add proc locking.
- Fix several bugs in the wait syscall, including freeing the actual
proc start, freeing the args, freeing the prison, and other minor
nits.
- Use appropriate queue(3) macros.
- Use zpfind() instead of walking zombproc ourselves.


71449 23-Jan-2001 jhb

- Use proper atomic operations to make the run time initialization
controlled by svr_str_initialized be MP safe.


71447 23-Jan-2001 jhb

FreeBSD doesn't have p_emuldata, and our stackgap_init() doesn't take an
argument.


71446 23-Jan-2001 jhb

Use proc lock to safely obtain references to p_ucred before vnode
operations.


71445 23-Jan-2001 jhb

Protect calcru() with sched_lock.


71435 23-Jan-2001 takawata

Map BSS section in PECOFF executable.

Submitted by: KUROSAWA Takahiro <fwkg7679@mb.infoweb.ne.jp>


71286 20-Jan-2001 wollman

Finish deprecating <sys/select.h> in favor of <sys/selinfo.h> in kernel code.


71048 14-Jan-2001 joe

Instead of hard coding the major numbers for IDE and SCSI disks
look in the device's cdevsw for the D_DISK flag.


70889 10-Jan-2001 jake

Protect proc.p_pptr with the proctree lock.


70835 09-Jan-2001 green

Take 10 seconds to actually fix the chgproccnt rather than just make it
explicitly error. If the module is horribly broken, it should be
temporarily removed from src/sys/modules.


70830 09-Jan-2001 wollman

With some trepidation, add a `#error' directive to this module. It was
broken and not fixed by whoever changed the interface of chgproccnt();
in the state it is in it could not possibly work (dereferencing an integer).


70458 29-Dec-2000 paul

Map FreeBSD character device hard disks to Linux block device hard disks.

This fixes the problem with VMWARE not being able to use raw disks.


70317 23-Dec-2000 jake

Protect proc.p_pptr and proc.p_children/p_sibling with the
proctree_lock.

linprocfs not locked pending response from informal maintainer.

Reviewed by: jhb, -smp@


70224 20-Dec-2000 takawata

Add PECOFF (WIN32 Execution file format) support.
To use it, some dll is needed. And currently, the dll is only for NetBSD.
So one more kernel module is needed.
For more infomation,
http://chiharu.haun.org/peace/ .

Reviewed by: bp


70178 19-Dec-2000 assar

translate the flags in recvfrom and recvmsg from linux to bsd ones

Approved by: marcel


70061 15-Dec-2000 jhb

Lock access to proc members.

Glanced over by: marcel


69995 13-Dec-2000 des

Use kinfo_proc instead of eproc (which Kirk deep-sixed earlier this week)

Generate a version string that looks just like a real Linux one - almost :)

Use sbufs everywhere instead of sprintf(). Note that this is still imperfect,
as the code does not check whether the sbuf overflowed - but it'll still
work better than before, since if the sbuf overflows, the code now simply
copies out 0 bytes instead of causing a trap (or worse, corrupting kernel
structures)


69994 13-Dec-2000 des

Add dependency on linux, which is needed for proc/version.


69969 13-Dec-2000 jake

Lock the allproc list.

Approved by: DES


69947 13-Dec-2000 jake

- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr. Also provides macros for the flags
pased to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.


69933 12-Dec-2000 des

Point #includes at compat/linprocfs instead of i386/linux/linprocfs.


69802 09-Dec-2000 des

Add proc/<pid>/cmdline.


69801 09-Dec-2000 des

Add a dependency on procfs.


69799 09-Dec-2000 des

A bunch of fixes that have been rotting in my tree for a month or two
waiting for procfs to get fixed:

- Use fill_eproc() to obtain correct VM stats. Attempt to compute VmLib.

- Fill some more fields in proc/<pid>/stat, and add four (unimplemented)
fields after studying a recent Linux kernel.

- Compute CPU frequency only once instead of twice.

- Fix some comments that were OBE.

- Fix indentation except where it makes the code less readable.


69595 05-Dec-2000 marcel

Remove call to bzero after MALLOC and instead add M_ZERO
to MALLOC.


69541 03-Dec-2000 marcel

Include machine/cpu.h for cpu_getstack().

Spotted by: jake


69539 03-Dec-2000 marcel

Don't auto-generate the syscalls.


69442 01-Dec-2000 jhb

Protect access to p_stat with sched_lock.


69379 30-Nov-2000 marcel

Don't use p->p_sigstk.ss_flags to keep state of whether the
process is on the alternate stack or not. For compatibility
with sigstack(2) state is being updated if such is needed.

We now determine whether the process is on the alternate
stack by looking at its stack pointer. This allows a process
to siglongjmp from a signal handler on the alternate stack
to the place of the sigsetjmp on the normal stack. When
maintaining state, this would have invalidated the state
information and causing a subsequent signal to be delivered
on the normal stack instead of the alternate stack.

PR: 22286


69286 27-Nov-2000 jake

Use callout_reset instead of timeout(9). Most callouts are statically
allocated, 2 have been added to struct proc for setitimer and sleep.

Reviewed by: jhb, jlemon


69269 27-Nov-2000 des

Add bogomips to cpuinfo (set it equal to the CPU frequency, which is bogus
but not more so than Linux' definition).
This should get the IBM JDK 1.3 working again.

Prompted by: sobomax


69084 23-Nov-2000 dillon

Forgot to patch this file in file descriptor race fix commit

Submitted-by: "Danny J. Zerkel" <dzerkel@columbus.rr.com>


68803 16-Nov-2000 gallatin

Use the linux_connect() on alpha rather than passing directly through
to our native connect(). This is required to deal with the differences
in the way linux handles connects on non-blocking sockets.

This gets the private beta of the Compaq Linux/alpha JDK working
on FreeBSD/alpha

Approved by: marcel


68662 13-Nov-2000 marcel

Fix F_SETOWN on pipes. Linux returns EINVAL while we send a SIGIO
signal. There's at least 1 program that is known to break.
Submitted patch has been edited to match current code.

MFC: yes
Submitted by: bde


68583 10-Nov-2000 marcel

Revert auto-generation. The Alpha port is broken.
Syncing with it is wrong.


68520 09-Nov-2000 marcel

Make MINSIGSTKSZ machine dependent, and have the sigaltstack
syscall compare against a variable sv_minsigstksz in struct
sysentvec as to properly take the size of the machine- and
ABI dependent struct sigframe into account.

The SVR4 and iBCS2 modules continue to have a minsigstksz of
8192 to preserve behavior. The real values (if different) are
not known at this time. Other ABI modules use the real
values.

The native MINSIGSTKSZ is now defined as follows:

Arch MINSIGSTKSZ
---- -----------
alpha 4096
i386 2048
ia64 12288

Reviewed by: mjacob
Suggested by: bde


68519 09-Nov-2000 marcel

Sync with Alpha:
Do not use sysent.c, proto.h and syscall.h in source tree;
use auto-generated versions.


68377 06-Nov-2000 des

Check that p->p_pptr is not NULL - kernel processes have no parents!


68347 05-Nov-2000 marcel

Fix getdents syscall.

The offset field in struct dirent was set to the offset of
the next dirent in rev 1.36. The offset was calculated from
the current offset and the record length. This offset does
not necessarily match the real offset when we are using
cookies. Therefore, also use the cookies to set the offset
field in struct dirent if we're using cookies to iterate
through the dirents.


68251 02-Nov-2000 gallatin

zap a stray include that snuck in with rev 1.56

Submitted by: Clive Lin <clive@CirX.ORG>


68247 02-Nov-2000 gallatin

fix a comment that was inadvertantly changed by a cvs merge
pointed out by: obrien


68225 02-Nov-2000 marcel

Fix linux_ustat syscall. We only have cdevs now, so looking
for a block device isn't that useful anymore.

Reported by: Wesley Morgan <morganw@chemicals.tacorp.com>
Submitted by: gallatin
Acknowledged by: phk


68214 01-Nov-2000 gallatin

Support for the linux ipc syscalls on the alpha, where each one has
its own syscall rather than going through a demux function like
linux_ipc() on i386


68210 01-Nov-2000 gallatin

fix linux_termio and linux_termios structs on alpha. alpha differences
are in the termios struct (probably because linux wants to be compatible
with the osf/1 termios struct), not the termio struct.


68201 01-Nov-2000 obrien

The MI/MD split wasn't perfect and the MI files need hacks for the
AlphaLinux compat bits. This will be better cleaned up soon.

Agreed to what ever was necessary by: marcel


68157 01-Nov-2000 obrien

Make the target a little bit more generic.


67589 25-Oct-2000 des

Bring cpuinfo closer to what it looks like in Linux 2.2.

Submitted by: R Bradford Jones <brad@kazrak.com>


67588 25-Oct-2000 des

Add /proc/<pid>/status and /proc/<pid>/stat (the latter being mostly
zeroes for the time being).

Prompted by: Nathan Boeger <nathan@khmere.com>


67468 23-Oct-2000 non

Add PC-Card/ISA SCSI host adpater drivers from NetBSD/pc98
(a NetBSD port for NEC PC-98x1 machines). They are ncv for NCR 53C500,
nsp for Workbit Ninja SCSI-3, and stg for TMC 18C30 and 18C50.

I thank NetBSD/pc98 and bsd-nomads people.

Obtained from: NetBSD/pc98


67234 17-Oct-2000 gallatin

A start at an implemention of linux_rt_sendsig & linux_rt_sigreturn
and associated user-level signal trampoline glue.

Without this patch, an SA_SIGINFO style handler can be installed by a linux
app, but if the handler accesses its sip argument, it will get a garbage
pointer and likely segfault.

We currently supply a valid pointer, but its contents are mainly
garbage. Filling this in properly is future work.

This is the second of 3 commits that will get IBM's JDK 1.3 working with
FreeBSD ...


66923 10-Oct-2000 des

Mark directories as directories, not as regular files.


66834 08-Oct-2000 phk

Initiate deorbit burn sequence for <machine/console.h>.

Replace all in-tree uses with necessary subset of <sys/{fb,kb,cons}io.h>.
This is also the appropriate fix for exo-tree sources.

Put warnings in <machine/console.h> to discourage use.
November 15th 2000 the warnings will be converted to errors.
January 15th 2001 the <machine/console.h> files will be removed.


66037 18-Sep-2000 des

Fix cut'n'paste bogon.

Submitted by: Jim Pirzyk <Jim.Pirzyk@disney.com>


65635 09-Sep-2000 des

Remove unused variables.


65633 09-Sep-2000 des

Add stat, uptime and version.
Note that version currently returns the first line of the version string
from vers.c, which is not quite what a Linux system would return.


65577 07-Sep-2000 des

Pierre Beyssac originally derived linprocfs from procfs, and I've made (and
will keep making) significant modifications, so I'm adding both our copyrights
to the top of these files.


65446 04-Sep-2000 des

Remove obsolete comment (see rev 1.84 of procfs_vnops.c)


65337 01-Sep-2000 rwatson

o Synchronize linprocfs authorization with procfs authorization improvements
(better hiding of hidden processes, more access checks, use vaccess(), et
al)

Approved by: des
Obtained from: TrustedBSD Project


65302 31-Aug-2000 obrien

Cleanup after repo copy of sys/svr4 to sys/compat/svr4.


65258 30-Aug-2000 rwatson

o Update linprocfs to include similar changes as those in procfs, fixing
the build (oops!): replace calls to p_trespass() and PRISON_CHECK()
with p_can(..., {P_CAN_SEE, P_CAN_DEBUG}, NULL)
o Remove volatile usage from procfs_readdir() to remove warnings
o Apply bp's CREATE fix to linprocfs, causing EROFS to be returned on
CREATE calls to procfs_lookup()
o Some further synchronization still needs to occur: only existing
access checks were replaced, to fix the build--the new ones were not
added. I'll do this later today, this is a "fix the build quickly"
commit. This means that, in the interim, some information leakage
can still occur via linprocfs when using jail or kern.ps_showallprocs

Submitted by: knu
Approved by: des
Obtained from: TrustedBSD Project


65108 26-Aug-2000 marcel

Whitespace change: (near) KNF


65106 26-Aug-2000 marcel

Fix bug in previous commit. We need to trim the limits to fit
the datatype (= long). Use ULONG_MAX and LONG_MAX to avoid
creating MD code.


65099 26-Aug-2000 marcel

Re-implement linux_{g|s}etrlimit in terms of {g|s}etrlimit
instead of the o{g|s}etrlimit so that the dependency on
COMPAT_43 is removed.


65067 25-Aug-2000 marcel

Fix typo in license.


64913 22-Aug-2000 marcel

Update include directives.


64911 22-Aug-2000 marcel

Update include directives.

Make linux_to_bsd_sigset and linux_do_sigaction non-static.

Move linux_sigaction. linux_sigsuspend, linux_rt_sigsuspend,
linux_pause and linux_sigaltstack to MD code.


64909 22-Aug-2000 marcel

Update include directives.

Move linux_select to MD code (i386 compat. syscall).

Move linux_fork, linux_vfork, linux_clone, linux_mmap,
linux_pipe, linux_ioperm, linux_iopl and linux_modify_ldt
to MD code.


64907 22-Aug-2000 marcel

Update include directives.


64906 22-Aug-2000 marcel

Update include directives.

Make the sem*, msg* and shm* function non-static as they are
called from MD code.

Move linux_ipc to MD code.


64905 22-Aug-2000 marcel

Update include directives and remove linux_execve.


64904 22-Aug-2000 marcel

Provide prototypes for functions used by MD code.


64560 12-Aug-2000 bde

Fixed null pointer panic for accessing "meminfo" when there is no swap.


64002 29-Jul-2000 peter

Regen. (Fix SYS_exit)


64001 29-Jul-2000 peter

Sigh. Fix SYS_exit problems. I misunderstood the significance of these
trailing options.


63987 29-Jul-2000 peter

Regenerate with makesyscalls.sh


63986 29-Jul-2000 peter

Change the 'exit()' system call to 'sys_exit()'. This avoids overlapping
gcc's internal exit() prototypes and the (futile) hackery that we did to
try and avoid warnings. main() was renamed for similar reasons.
Remove an exit related hack from makesyscalls.sh.


63903 27-Jul-2000 marcel

Remove the only use of SCARG and perform dead code elimination.


63778 23-Jul-2000 marcel

Add bounds checking to stackgap_alloc. Previously it was possible
to construct a path that was long enough (ie longer than
SPARE_USRSPACE bytes) and trash the stack.

Note that SPARE_USRSPACE is much smaller than MAXPATHLEN so that
the Linuxulator will now return ENAMETOOLONG even if the path
is smaller than MAXPATHLEN.

PR: 12749


63605 20-Jul-2000 marcel

Revert implementation of setfsuid and setfsgid due to security
issues.

Requested by: rwatson
Backed by: kris


63285 17-Jul-2000 marcel

Implement pread and pwrite.

PR: 17991
Submitted by: Geoffrey Speicher <geoff@caribbean.sea-incorporated.com>


63280 16-Jul-2000 marcel

Implement setfsuid and setfsgid. Implementation derived from patch
in PR.

PR: 16993
Submitted by: Bjoern Groenvall <bg@sics.se>


63233 15-Jul-2000 marcel

Simplify the F_GETOWN and F_SETOWN fcntl commands. The workaround
is not needed since the FreeBSD native implementation switched
from TIOC{G|S}PGRP to FIO{G|S}ETOWN (kern_descrip.c rev 1.55).

PR: 16946
Submitted by: Victor Salaman <salaman@teknos.com>


62976 11-Jul-2000 mckusick

Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).


62573 04-Jul-2000 phk

Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.

Pointed out by: bde


62454 03-Jul-2000 phk

Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:

Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our
sources:

-sysctl_vm_zone SYSCTL_HANDLER_ARGS
+sysctl_vm_zone (SYSCTL_HANDLER_ARGS)


62378 02-Jul-2000 green

Modify ktrace's general I/O tracing, ktrgenio(), to use a struct uio *
instead of a struct iovec * array and int len. Get rid of stupidly trying
to allocate all of the memory and copyin()ing the entire iovec[], and
instead just do the proper VOP_WRITE() in ktrwrite() using a copy of
the struct uio that the syscall originally used.

This solves the DoS which could easily be performed; to work around the
DoS, one could also remove "options KTRACE" from the kernel. This is
a very strong MFC candidate for 4.1.

Found by: art@OpenBSD.org


61976 22-Jun-2000 alfred

fix races in the uidinfo subsystem, several problems existed:

1) while allocating a uidinfo struct malloc is called with M_WAITOK,
it's possible that while asleep another process by the same user
could have woken up earlier and inserted an entry into the uid
hash table. Having redundant entries causes inconsistancies that
we can't handle.

fix: do a non-waiting malloc, and if that fails then do a blocking
malloc, after waking up check that no one else has inserted an entry
for us already.

2) Because many checks for sbsize were done as "test then set" in a non
atomic manner it was possible to exceed the limits put up via races.

fix: instead of querying the count then setting, we just attempt to
set the count and leave it up to the function to return success or
failure.

3) The uidinfo code was inlining and repeating, lookups and insertions
and deletions needed to be in their own functions for clarity.

Reviewed by: green


61702 15-Jun-2000 cracauer

Linux allows to mmap annonymous with a file descriptor passed, FreeBSD
doesn't. In the Linux emulation layer, ignore the fd passed when
MAP_ANON is specified.

Known application to be fixed: Xanalys/Harlequin Lispworks

Also improve debug output for mmap, now showing what the emulation
layer mapped to what (-DDEBUG).

Reviewed by: marcel


60938 26-May-2000 jake

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


60860 24-May-2000 des

Make exe a symlink.


60833 23-May-2000 jake

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


60325 10-May-2000 bde

Regenerated (to fix "created from" lines, and to fix the previous
regeneration which somehow used the wrong syscalls.master file,
resulting in unbuildable svr4_sysent.c).


60324 10-May-2000 bde

Fixed the "created from" lines generated from this file. makesyscalls.sh
expects the active id to be on the first line of the specification file.

Fixed some nearby gratuitous differences with kern/syscalls.master.


60290 09-May-2000 bde

Regenerated (fixed the calculation of sy_nargs in sysent tables).


60289 09-May-2000 bde

Don't forget to back up svr4_syscallnames.c. Don't depend on side effects
to generate it.


60271 09-May-2000 bde

Fixed the return type and args struct tag for exit(). They were wrong in
all emulators. These entries were unused, so the bug had no effect, but
the the args struct tag will be used to calculate sy_nargs correctly.


60060 06-May-2000 green

Give the "streams" modulea version (1) and depend on it from the
"svr4elf" module. This unbreaks the SVR4 KLD (which had an undefined
function because of thenewly-committed KLD enhancements).


59874 01-May-2000 peter

Add $FreeBSD$


59794 30-Apr-2000 phk

Remove unneeded #include <vm/vm_zone.h>

Generated by: src/tools/tools/kerninclude


59760 29-Apr-2000 phk

Remove unneeded #include <sys/kernel.h>


59458 21-Apr-2000 msmith

Fix include paths so that this builds correctly.

Submitted by: Mike Pritchard <mpp@mppsystems.com>


59412 20-Apr-2000 msmith

Move the linprocfs bits under the rest of the i386 linux compatibility
code.


59391 19-Apr-2000 phk

Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>


59368 18-Apr-2000 phk

Remove unneeded <sys/buf.h> includes.

Due to some interesting cpp tricks in lockmgr, the LINT kernel shrinks
by 924 bytes.


59342 18-Apr-2000 obrien

Change our ELF binary branding to something more acceptable to the Binutils
maintainers.

After we established our branding method of writing upto 8 characters of
the OS name into the ELF header in the padding; the Binutils maintainers
and/or SCO (as USL) decided that instead the ELF header should grow two new
fields -- EI_OSABI and EI_ABIVERSION. Each of these are an 8-bit unsigned
integer. SCO has assigned official values for the EI_OSABI field. In
addition to this, the Binutils maintainers and NetBSD decided that a better
ELF branding method was to include ABI information in a ".note" ELF
section.

With this set of changes, we will now create ELF binaries branded using
both "official" methods. Due to the complexity of adding a section to a
binary, binaries branded with ``brandelf'' will only brand using the
EI_OSABI method. Also due to the complexity of pulling a section out of an
ELF file vs. poking around in the ELF header, our image activator only
looks at the EI_OSABI header field.

Note that a new kernel can still properly load old binaries except for
Linux static binaries branded in our old method.

*
* For a short period of time, ``ld'' will also brand ELF binaries
* using our old method. This is so people can still use kernel.old
* with a new world. This support will be removed before 5.0-RELEASE,
* and may not last anywhere upto the actual release. My expiration
* time for this is about 6mo.
*


57998 13-Mar-2000 nsayer

Fix some style bugs. The long line is in a chunk of code that's
being rewritten, though.

Submitted by: bde


57867 09-Mar-2000 marcel

Fix bug in linux_wait4 and linux_waitpid where garbage in the status
argument could panic the kernel.

Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
Prompted by: jkh, gallatin
Approved by: prompters


57858 09-Mar-2000 nsayer

Implement Linux BLKGETSIZE ioctl, and open the door to implementing
other BLK.* ioctls should the desire arize.

Approved by: jkh (via dufault)


57564 28-Feb-2000 marcel

Fix accept(2) behavior in that accepted sockets don't inherit the
parents flags.

Note on the PR:
The PR contains another patch that's not being committed without
further background information. The PR stays open for now.

PR: 16946 (Victor A. Salaman <salaman@teknos.com>)
Prompted by: msmith
Indirect/implicit approval: jkh (shoot me if I'm wrong :-)


56940 01-Feb-2000 nsayer

Avoid passing an uninitialized structure member to the real
READSUBCHANNEL ioctl. This makes vmware work with SCSI CDROM
drives.

Approved by: jkh


56046 15-Jan-2000 newton

Fix handling of svr4_sigsets, which are implemented in SysVR4 as a sequence
of 4 longs used as a bitmask. sv4r4_sigfillset has been broken for a
while, probably since rev 1.5.

This patch fixes SVR4_NSIG (i.e.: sets it to the actual number of signals,
instead of the number of bits in the mask) because some SysVR4 clients
honestly seem to care about whether bits in the signal mask are set for
non-existant signals.

Additionally, the svr4_sigfillset macro has been replaced by a
fully fledged function, because the macro didn't actually work
(it returned an all-ones mask, but we don't want that: we want 0's
set where FreeBSD doesn't actually have a signal which is the same
as an SysVR4 signal, for example).

SysVR4 clients can now successfully ignore signals, although catching
them remains problematic (see commit log message for rev1.13 of
sys/i386/svr4/svr4_machdep.c for more info).


56045 15-Jan-2000 newton

Remove some all-too-wordy debugging prints


55771 10-Jan-2000 marcel

Return Linux kernel version 2.2.12 by default. This is in line
with linux_base-6.1.


55657 09-Jan-2000 bde

Removed bogus include of opt_global.h. opt_global.h is automatically
included in all C files if it makes sense (i.e., for compiling kernels
but not for compiling modules), so including it explicitly just
complicates module makefiles.


55629 08-Jan-2000 marcel

Convert the filesystem type returned in struct statfs by syscalls
linux_statfs and linux_fstatfs. Linux binaries testing this expect
the filesystem's magic number and not our vnode's tag.

PR: 15425
Tested by: Vladimir N. Silyaev <vsilyaev@mindspring.com>


55355 03-Jan-2000 newton

Need to #include vm_zone.h to pick up inline definition of zfree() so that
NDFREE() macro from namei.h will be happy.


54655 15-Dec-1999 eivind

Introduce NDFREE (and remove VOP_ABORTOP)


54494 12-Dec-1999 newton

Replace the svr4_sys_getdents64() routine with a port of linux_getdents() --
differences between the VFS interface between FreeBSD and NetBSD make
it easier to pick up the Linux one than to continue development with the
NetBSD port.

This patch fixes a bug which caused duplicate filenames to be seen by
callers to svr4_sys_getdents64(), leading to malformed directory listings
from Solaris client programs.

Obtained from: The Linuxulator, with a pointer from marcel


54493 12-Dec-1999 newton

Avoid excessive redundancy in svr4_sys_getmsg() and svr4_sys_putmsg():
Only look up the provided descriptor in fd_ofiles[] once.

Submitted by: Ville-Pertti Keinone <will@iki.fi>


54492 12-Dec-1999 newton

fd_revoke() shouldn't panic if the descriptor provided is not a file or
socket. Return EINVAL instead.

Submitted by: Ville-Pertti Keinone <will@iki.fi>


54399 10-Dec-1999 marcel

Remove unused includes.

Found by: phk-scan


54305 08-Dec-1999 newton

Remove unnecessary includes

Prodded by: phk


54300 08-Dec-1999 newton

SVR4 emulator source files now take their compilation options from
opt_global.h and opt_svr4.h, instead of from the command line. This
brings them in-line with most of the rest of the kernel.

svr4_ioctl.c has also failed to compile with debugging for a while
now; fixed by adding systm.h and socketvar.

Some svr4 source files are automatically generated from syscalls.master;
these have been committed as consequential changes, otherwise everyone
will have to "make svr4_sysent.c".

Changes:

sys/svr4/svr4.h include opt_global.h and opt_svr4.h
sys/svr4/svr4_ioctl.c include svr4.h, sys/systm.h and sys/socketvar.h
sys/svr4/svr4_ipc.c include svr4.h
sys/svr4/svr4_resource.c include svr4.h
sys/svr4/svr4_socket.c include svr4.h
sys/svr4/svr4_ttold.c include svr4.h
sys/svr4/syscalls.master include svr4.h
sys/svr4/svr4_syscallnames.c dependent on syscalls.master
sys/svr4/svr4_sysent.c dependent on syscalls.master
sys/svr4/svr4_syscall.h dependent on syscalls.master
sys/svr4/svr4_proto.h dependent on syscalls.master
sys/modules/svr4/Makefile create opt_global.h and opt_svr4.h


54152 05-Dec-1999 archie

Fix LINT breakage.


54122 04-Dec-1999 marcel

Implement pluggable ioctl handlers.

Other modules can register and unregister ioctl handlers to extend the
ioctls known by the Linuxulator. A recent application is the vmware
port. The Linuxulator itself uses the new interface to register its
handlers as well. Handlers for the following types of ioctls have been
defined:
cdrom
console (=keyboard and VT handling)
socket
sound
termio

All ioctl related defines and declarations have been moved to a new
file (linux_ioctl.h), except for the pluggable ioctl handler interface
definition.

While there, cleanup linux.h some more.

linux.h and linux_ioctl.[ch] have been made to conform to style(9) as
much as possible.

Inspired and reviewed by: Vladimir N. Silyaev


53954 30-Nov-1999 marcel

Implement linux_sigaltstack.


53902 29-Nov-1999 alfred

add linuxulator wrapper for SNDCTL_DSP_GETODELAY


53758 27-Nov-1999 marcel

Implement linux_ustat.

Reviewed by: bde


53713 26-Nov-1999 marcel

Implement fdatasync in terms of fsync. The regeneration of proto.h,
syscall.h and sysent.h was probably forgotten after the last change
syscalls.master.


53678 24-Nov-1999 phk

General clean-up of socket.h and associated sources to synchronise up
with NetBSD and the Single Unix Specification v2.

This updates some structures with other, almost equivalent types and
effort is under way to get the whole more consistent.

Also removes a double definition of INET6 and some other clean-ups.

Reviewed by: green, bde, phk
Some part obtained from: NetBSD, SUSv2 specification


53503 21-Nov-1999 phk

s/p_cred->pc_ucred/p_ucred/g


53009 08-Nov-1999 phk

simplify check for device.


52986 08-Nov-1999 peter

Use fo_stat() rather than Yet Another duplication of kern_descrip.c's stat
code.


52635 29-Oct-1999 phk

useracc() the prequel:

Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


52421 21-Oct-1999 marcel

Fix the duplicate filenames that are the result of using getdents.

glibc2 defines struct dirent differently than the Linux kernel does.
The getdents function therefore needs to read a heuristically defined
number of kernel dirents to satisfy the request. In case where too
many kernel dirents have been read, the function lseeks on the
directory so that a next call will start with the right dirent. The
offset used in lseeking is the offset-field in the last dirent passed
to the application. This can only mean that the offset-field holds
the offset of the next dirent and not the offset of the dirent itself.


52334 17-Oct-1999 newton

Remove unnecessary includes.

phk's script walked through .c and .h files, but some of the ones on
the list are actually derived from sys/svr4/syscalls.master. Make
the necessary changes here and the others will implicitly follow...

Submitted by: phk


52333 17-Oct-1999 newton

Remove unnecessary includes.

Submitted by: phk


52140 11-Oct-1999 luoqi

Add a per-signal flag to mark handlers registered with osigaction, so we
can provide the correct context to each signal handler.

Fix broken sigsuspend(): don't use p_oldsigmask as a flag, use SAS_OLDMASK
as we did before the linuxthreads support merge (submitted by bde).

Move ps_sigstk from to p_sigacts to the main proc structure since signal
stack should not be shared among threads.

Move SAS_OLDMASK and SAS_ALTSTACK flags from sigacts::ps_flags to proc::p_flag.
Move PS_NOCLDSTOP and PS_NOCLDWAIT flags from proc::p_flag to procsig::ps_flag.

Reviewed by: marcel, jdp, bde


51969 06-Oct-1999 jhay

Swap IOC_OUT and IOC_IN for the SETDIR macro. The linux ioctl read and
write bits are swapped.

Reviewed by: luoqi, marcel


51957 05-Oct-1999 n_hibma

Removal of sys/device.h

- Move intrhook stuff into kernel.h
- Remove all occurrences of #device <device.h>
- Add kernel.h were necessary (nowhere)
- delete device.h

This file contained the structures for cfdata (old style config) and is no
longer used. It was included by most drivers.

It confuses the remote debugger as the definition of 'struct device' in
device.h is found before the one in bus_private.h.


51844 01-Oct-1999 peter

Oops. That'll teach me to commit without testing. I either replaced
one trigraph with another, or completely missed the point. Kill it for
real this time.


51843 01-Oct-1999 peter

Zap a trigraph (???)


51793 29-Sep-1999 marcel

sigset_t change (part 4 of 5)
-----------------------------

The compatibility code and/or emulators have been updated:

iBCS2 now mostly uses the older syscalls. SVR4 now properly
handles all signals. This has been achieved by using the
new sigset_t throughout the emulator. The Linuxulator has
been severely updated. Internally the new Linux sigset_t is
made the default. These are then mapped to and from the
new FreeBSD sigset_t.

Also, rt_sigsuspend has been implemented in the Linuxulator.
Implementing this syscall basicly caused all this sigset_t
changing in the first place and the syscall has been used
throughout the change as a means for testing. It basicly is
too much work to undo the implementation so that it can
later be added again.

A special note on the use of sv_sigtbl and sv_sigsize in
struct sysentvec:
Every signal larger than sv_sigsize is not translated and is
passed on to the signal handler unmodified. Signals in the
range 1 upto and including sv_sigsize are translated.
The rationale is that only the system defined signals need to
be translated.

The emulators also have been updated so that the translation
tables are only indexed for valid (system defined) signals.
This change also fixes the translation bug already in the
SVR4 emulator.


51654 25-Sep-1999 phk

This patch clears the way for removing a number of tty related
fields in struct cdevsw:

d_stop moved to struct tty.
d_reset already unused.
d_devtotty linkage now provided by dev_t->si_tty.

These fields will be removed from struct cdevsw together with
d_params and d_maxio Real Soon Now.

The changes in this patch consist of:

initialize dev->si_tty in *_open()
initialize tty->t_stop
remove devtotty functions
rename ttpoll to ttypoll
a few adjustments to these changes in the generic code
a bump of __FreeBSD_version
add a couple of FreeBSD tags


51602 23-Sep-1999 marcel

Linux doesn't complain if you remove a msg queue that doesn't exist
(given the proper permissions).


51569 22-Sep-1999 luoqi

Implement linux_ioperm() syscall. Fix linux_iopl() to use the level argument.
SVGAlib should now work.

Reviewed by: marcel


51418 19-Sep-1999 green

This is what was "fdfix2.patch," a fix for fd sharing. It's pretty
far-reaching in fd-land, so you'll want to consult the code for
changes. The biggest change is that now, you don't use
fp->f_ops->fo_foo(fp, bar)
but instead
fo_foo(fp, bar),
which increments and decrements the fp refcount upon entry and exit.
Two new calls, fhold() and fdrop(), are provided. Each does what it
seems like it should, and if fdrop() brings the refcount to zero, the
fd is freed as well.

Thanks to peter ("to hell with it, it looks ok to me.") for his review.
Thanks to msmith for keeping me from putting locks everywhere :)

Reviewed by: peter


51348 17-Sep-1999 marcel

Fix getcwd. It must return the length of the path including the terminating 0.
While I'm here, fix style and debug printf.

Fix derived from patch by: Darryl Okahata <darrylo@sr.hp.com>


50903 04-Sep-1999 peter

<machine/soundcard.h> -> <sys/soundcard.h>, since it's an exported API
that's arch neutral and OSS API and Linux API compatable.


50833 03-Sep-1999 marcel

I missed the namechange of field desc in struct i386_ldt_args into descs while
reviewing luoqi's changes...

Pointed out by: luoqi


50818 02-Sep-1999 marcel

Implementation of the modify_ldt syscall. Use the sysarch() interface to do
the actual work. When USER_LDT is not defined for a kernel, sysarch returns
EOPNOTSUPP. Display a message in that case and return ENOSYS to userland.

Reviewed by: luoqi


50718 01-Sep-1999 newton

Add MAINTAINER line


50558 29-Aug-1999 marcel

Fix a braino: Linux minor device numbers are 8 bits wide and not 10.


50546 29-Aug-1999 marcel

Fix a missing '-1' in the size argument of copyout in getgroups. Spotted while
reviewing the MFC in -stable.


50500 28-Aug-1999 marcel

Implement the OSS_GETVERSION ioctl. The version returned can be changed through
the sysctl variable `compat.linux.oss_version'.

PR: 12917
Originator: Dean Lombardo <dlombardo@excite.com>


50480 28-Aug-1999 peter

$Id$ -> $FreeBSD$


50477 28-Aug-1999 peter

$Id$ -> $FreeBSD$


50465 27-Aug-1999 marcel

Add sysctl variables for the Linuxulator. These reside under `compat.linux' as
discussed on current.

The following variables are defined (for now):

osname (defaults to "Linux")
Allow users to change the name of the OS as returned by uname(2),
specially added for all those Linux Netscape users and statistics
maniacs :-) We now have what we all wanted!

osrelease (defaults to "2.2.5")
Allow users to change the version of the OS as returned by uname(2).
Since -current supports glibc2.1 now, change the default to 2.2.5
(was 2.0.36).

oss_version (defaults to 198144 [0x030600])
This one will be used by the OSS_GETVERSION ioctl (PR 12917) which I
can commit now that we have the MIB. The default version number is the
lowest version possible with the current 'encoding'.

A note about imprisoned processes (see jail(2)):
These variables are copy-on-write (as suggested by phk). This means that
imprisoned processes will use the system wide value unless it is written/set
by the process. From that moment on, a copy local to the prison will be
used.

A note about the implementation:
I choose to add a single pointer to struct prison, because I didn't like the
idea of changing struct prison every time I come up with a new variable. As
a side effect, the extra storage is only needed when a variable is set from
within the prison. This also minimizes kernel bloat when the Linuxulator is
not used; both compiled in or as a module.

Reviewed by: bde (first version only) and phk


50405 26-Aug-1999 phk

Simplify the handling of VCHR and VBLK vnodes using the new dev_t:

Make the alias list a SLIST.

Drop the "fast recycling" optimization of vnodes (including
the returning of a prexisting but stale vnode from checkalias).
It doesn't buy us anything now that we don't hardlimit
vnodes anymore.

Rename checkalias2() and checkalias() to addalias() and
addaliasu() - which takes dev_t and udev_t arg respectively.

Make the revoke syscalls use vcount() instead of VALIASED.

Remove VALIASED flag, we don't need it now and it is faster
to traverse the much shorter lists than to maintain the
flag.

vfs_mountedon() can check the dev_t directly, all the vnodes
point to the same one.

Print the devicename in specfs/vprint().

Remove a couple of stale LFS vnode flags.

Remove unimplemented/unused LK_DRAINED;


50356 25-Aug-1999 marcel

Fix linux_newlstat in that it doesn't return the attributes of its containing
directory. Also, update arguments of NDINIT for both newstat and newlstat.

While I'm at it, fix style bugs in all {s|ls|fs}tat syscalls.

Reported by: bde


50350 25-Aug-1999 marcel

Fix {g|s}etgroups semantics. We use cr_groups[0] to hold egid. This means that
egid will be twice in the set and that setting cr_groups[0] will change egid.
This is simply solved by ignoring cr_groups[0]. That is; linux_getgroups does
not return cr_groups[0] and linux_setgroups does not touch it.

Noticed by: bde
Brought to my attention by: sheldonh


50345 25-Aug-1999 marcel

Change all UNIMPL syscalls to STD and add them to linux_dummy. Now we always
know if and when an unimplemented or obsoleted syscall is being used. Make the
message more end-user friendly.

And as long as we're here, rename some unimplemeted syscalls (linux_phys ->
linux_umount2, linux_vm86 -> linux_vm86old, linux_new_vm86 -> linux_vm86).

Change prototype for linux_newuname from `struct linux_newuname_t *' into
`struct linux_new_utsname *'. This change is reflected in linux.h and
linux_misc.c.


49960 17-Aug-1999 marcel

Fix a bug in debug-printfs of struct linux_termios fields, where I forgot to
change the format specifier after changing the definition of the structure.

Submitted by: billf
Commented on by: bde


49959 17-Aug-1999 marcel

Fix bug in the debug-printf of the vfork syscall, where the format specifier
didn't match the argument (p->p_pid).

While I'm at it, also fix the dupo in the format string and fix the annoying
inconsistency in all the debug-printfs wrt p_pid arguments. Change all of them
to use the %ld format specifier and cast the p_pid arguments to long.

Submitted by: billf


49890 16-Aug-1999 marcel

Implement linux_vfork() syscall by calling vfork(). Analogous to the
linux_fork() implementation.


49849 15-Aug-1999 marcel

Provide wrappers for sched_{s|g}etscheduler. We need to convert the policy
argument.

PR: 12006
Originator: Jean-Claude MICHOT <jcmichot@teaser.fr>


49845 15-Aug-1999 marcel

Fix bug in the fcntl syscall where 'arg' was not set properly.

PR: 12147
Submitted by: Allan Saddi <asaddi@philosophysw.com>


49842 15-Aug-1999 marcel

Include opt_compat.h so that COMPAT_43 is defined. This gives us the proper
prototypes of o{s|g}etrlimit (from sys/sysproto.h). Update linux_{s|g}etrlimit
so that the arguments to o{s|g}etrlimit are corresponding the prototypes.

Pointed out by: bde


49788 14-Aug-1999 marcel

Implementation of the linux_getcwd syscall.


49786 14-Aug-1999 marcel

Implementation of linux_rt_sigaction and linux_rt_sigprocmask syscalls. Both
functions use the new sigset_t and sigaction_t which allows support for more
than 32 signals. Only the lower 32 signals are supported for now.

linux_rt_sigaction, linux_sigaction and linux_signal use linux_do_sigaction
to do the actual work. That way unnecessary redundancy is avoided. The same
has been done for linux_rt_sigprocmask and linux_sigprocmask. They call
linux_do_sigprocmask to do the actual work.


49774 14-Aug-1999 marcel

Fix LINUX_TIOC{S|G}SERIAL implementation. Both do not copy data in or out
of kernel space. Remove the ioctl supporting functions, and move the actual
code to the switch-statement. Now everybody can clearly see that the
implementation is really poor.

Also fix a typo in LINUX_TIOCGETD. The underlying function was given command
TIOCSETD instead op TIOCGETD...


49770 14-Aug-1999 newton

Avoid possible panic by checking for EFAULT from copyinstr() during
pathname translation.

Submitted by: green


49768 14-Aug-1999 marcel

Fix the LINUX_TCSET{A|AW|AF} and LINUX_TCSET{S|SW|SF} ioctls. These all suffer
from the same bug in that the argument is not first copied from user space
before it is used. This is part 2 (of 2) of the termios fixes.


49766 14-Aug-1999 marcel

Fix a couple of termio/termios conversion bugs/typos/dupos/brainos and other
changes. This is part 1 of the complete termios ioctl fixes.

o change type of c_{i|o|c|l}flag in struct termios from unsigned long to
unsigned int. The type now matches the Linux definitions.
o replaced constants by the corresponding defines in sptab[] for clarity.
Since there's no define for 135 baud, its mapping has been dropped.

function bsd_to_linux_termios:
o Fix typo IXON -> IXANY.
o Remove bogus assignment to c_cc[LINUX_VSWTC].

function linux_to_bsd_termios:
o Fix dupo LINUX_IXON -> LINUX_IXANY.
o Add LINUX_CREAD mapping.
o Fix typo IEXTEN -> LINUX_IEXTEN.

function linux_to_bsd_termio:
o Small optimization: Don't preset the complete c_cc array when we next
assign to the first LINUX_NCC entries.


49688 13-Aug-1999 marcel

Implementation of the CDROMSUBCHNL ioctl.


49676 13-Aug-1999 marcel

In doing lock type conversion (struct flock), make sure that carbage in results
in deterministic behaviour. In this case known garbage out.
The fix is different than suggested in the PR.

PR: 12749
Originator: Boris Nikolaus <boris@cs.tu-berlin.de>


49662 12-Aug-1999 marcel

Use a wrapper for the link syscall that does name translations.

PR: 12749
Submitted by: Boris Nikolaus <boris@cs.tu-berlin.de>


49626 11-Aug-1999 marcel

Do not map {s|g}etrlimit onto FreeBSD syscalls. The arguments don't match.
The linux syscalls translate the arguments first before invoking the
FreeBSD native syscalls.

PR: kern/9591
Originator: John Plevyak <jplevyak@inktomi.com>


49523 08-Aug-1999 marcel

Fix page fault in linux_uselib syscall.

PR: 12910
Submitted by: Peter Holm <peter@holm.cc>


49478 07-Aug-1999 green

We don't end up checking for a return value of EFAULT from the copyinstr()
in the pathname translation procedure. This proves fatal, and can be
easily fixed. This or a similar change needs to be committed to svr4_util.h
and ibcs2_util.h. I will update ibcs2_util.h, if noone else thinks of a
better way to do this, in the same manner. I will leave svr4 to the
respective maintainer.

This closes the problem of the only crash I've been able to produce as
a user recently, except for (currently not-in-the-source tree) fd
table sharing fixes. Thanks goes to pho for his stress-testers.


49280 30-Jul-1999 newton

Previous commit also removed some 'const' qualifiers on args for
svr4_sys_sendto() which probably shouldn't have been 'const'.


49279 30-Jul-1999 newton

Previous commit also finished cleaning up some dev_t -> udev_t transformations
related to the commit for rev 1.3 of svr4_stat.c.
svr4_sysvec.c also received a copyright message (which is why it grew by
28 lines).


49278 30-Jul-1999 newton

Fix svr4_sys_poll(); SysV STREAMS produce return values from poll() which
BSD sockets don't. Guess at a correct emulation for those values (it seems
to work for telnet, ftp and friends)


49267 30-Jul-1999 newton

Add $Id$ tags


49264 30-Jul-1999 newton

Fix panic caused when *stat64() family of syscalls try to fill-in
their svr4_stat64 structures with old dev_t values instead of udev_t's.
Panic was caused when major() and minor() were called with args which
weren't pointers. The panic was probably introduced in rev 1.51 of
kern_conf.c


48885 18-Jul-1999 phk

Use the vn_todev() function, rather than VOP_GETATTR


48851 17-Jul-1999 marcel

Implementation of TCXONC.

Reviewed by: bde


48685 08-Jul-1999 marcel

Implement VT_RELDISP ioctl

Submitted by: Kazutaka Yokota <yokota@FreeBSD.org>


48628 06-Jul-1999 marcel

Trivial implementation of TIOCM{S|G}ET and TIOCMBI{S|C} ioctls. No need
to convert the arguments.


48620 06-Jul-1999 cracauer

Rename struct members sa_siginfo. POSIX reserves identifiers starting
with sa_ when <signal.h> is included. They would conflict with the
upcoming SA_SIGINFO implementation.

Reviewed by: BDE


48595 05-Jul-1999 marcel

Let newuname return "Linux" as the OS name and not "FreeBSD". Also, return a
more sensible (for Linux applications) release number. Hardcoding a release
number has its drawbacks, but it will do for now.


48503 03-Jul-1999 green

sys/buf.h needs to have included sys/systm.h for spl prototypes.


47028 11-May-1999 phk

Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
major() umajor()
minor() uminor()
makedev() umakedev()
dev2udev() udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to a actual device.

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few houskeeping bits. This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference. If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I belive this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).


46889 10-May-1999 peter

Ack! I deleted "struct", not "const".. Oh boy...

Submitted by: jkh


46803 09-May-1999 peter

Fix a couple of warnings and some bitrot in comments.


46778 09-May-1999 phk

Yet another place which knew too much. Still not sure how much
good this does in the end.


46676 08-May-1999 phk

I got tired of seeing all the cdevsw[major(foo)] all over the place.

Made a new (inline) function devsw(dev_t dev) and substituted it.

Changed to the BDEV variant to this format as well: bdevsw(dev_t dev)

DEVFS will eventually benefit from this change too.


46571 06-May-1999 peter

Fix up a few easy 'assignment used as truth value' and 'suggest parens
around && within ||' type warnings. I'm pretty sure I have not masked
any problems here, I've committed real problem fixes seperately.


46163 29-Apr-1999 luoqi

- Handle mixer read ioctls correctly. They have the same group, number and
argument size as their write counterparts and were handled as write ioctls.
- Emulate some cdrom ioctls.


46129 28-Apr-1999 luoqi

Enable vmspace sharing on SMP. Major changes are,
- %fs register is added to trapframe and saved/restored upon kernel entry/exit.
- Per-cpu pages are no longer mapped at the same virtual address.
- Each cpu now has a separate gdt selector table. A new segment selector
is added to point to per-cpu pages, per-cpu global variables are now
accessed through this new selector (%fs). The selectors in gdt table are
rearranged for cache line optimization.
- fask_vfork is now on as default for both UP and SMP.
- Some aio code cleanup.

Reviewed by: Alan Cox <alc@cs.rice.edu>
John Dyson <dyson@iquest.net>
Julian Elischer <julian@whistel.com>
Bruce Evans <bde@zeta.org.au>
David Greenman <dg@root.com>


46116 27-Apr-1999 phk

Change suser_xxx() to suser() where it applies.


46112 27-Apr-1999 phk

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


45821 19-Apr-1999 peter

unifdef -DVM_STACK - it's been on for a while for x86 and was checked
and appeared to be working for the Alpha some time ago.


45739 17-Apr-1999 peter

Well folks, this is it - The second stage of the removal for build support
for LKM's..


45738 17-Apr-1999 peter

Image activators use EXEC_SET(), not TEXT_SET(). (The first is a macro
wrapper for DECLARE_MODULE(), the second is a linker set declaration)


44384 02-Mar-1999 julian

Fix thread/process tracking and differentiation for Linux threads emulation.

Submitted by: Richard Seaman, Jr." <dick@tar.com>

Also clean some compiler warnings in surrounding code.


43597 04-Feb-1999 newton

svr4 emulator will refuse to unload itself if it is currently in use.


43499 01-Feb-1999 newton

Acquiesce to proc.h for declarations of M_ZOMBIE, M_SUBPROC (and reorder
includes so proc.h knows the right type for 'em).

Suggested by: bde


43424 30-Jan-1999 newton

Oops - Ripped out a bit of debugging code which will stop certain bits
of networking from working for people without DEC Tulip ethernet cards.


43412 30-Jan-1999 newton

Emulator KLD for SysVR4 executables grabbed from NetBSD.
See http://www.freebsd.org/~newton/freebsd-svr4 for limitations,
capabilities, history and TO-DO list.


43208 26-Jan-1999 julian

Enable Linux threads support by default.
This takes the conditionals out of the code that has been tested by
various people for a while.
ps and friends (libkvm) will need a recompile as some proc structure
changes are made.

Submitted by: "Richard Seaman, Jr." <dick@tar.com>


42509 11-Jan-1999 msmith

Fix linux sendmsg() emulation

Submitted by: Brian Feldman <green@unixhelp.org>


42499 10-Jan-1999 eivind

Use truncate() instead of otruncate() - step on the way to stopping
the linulator from depending on COMPAT_43.


42360 06-Jan-1999 julian

Add (but don't activate) code for a special VM option to make
downward growing stacks more general.
Add (but don't activate) code to use the new stack facility
when running threads, (specifically the linux threads support).
This allows people to use both linux compiled linuxthreads, and also the
native FreeBSD linux-threads port.

The code is conditional on VM_STACK. Not using this will
produce the old heavily tested system.

Submitted by: Richard Seaman <dick@tar.com>


42186 30-Dec-1998 sos

Commit patch in

PR: 9232
Submitted by: marcel@scc.nl <Marcel Moolenaar>


42185 30-Dec-1998 sos

Commit #2 of

PR: 9235
Submitted by: marcel@scc.nl <Marcel Moolenaar>


42054 24-Dec-1998 julian

According to the author..

"I've been having a problem running the patches [committed to current]
installed with the COMPAT_LINUX_THREADS option along
with the VM_STACK patches I did. I'm not sure what
the problem is, since it seemed to work before.

In any event, the attached patch fixes the problem for
me. While I've had no report of problems from anyone
else, possibly it would be wise to commit the patch
until the problem is found.

Also, there was some left-over junk in the linux_misc.c
file from some earlier work I did. The attached patch
cleans that up too."

Submitted by: "Richard Seaman, Jr." <dick@tar.com>


41986 21-Dec-1998 sos

Kill(pid, 0) normally returns 0 on both FreeBSD and Redhat after having
performed all sorts of sanity checks. The FreeBSD linux emulator returns
EINVAL in such a case.
Allowing signal 0 to be passed to kill will result in compatible behaviour.

PR: 9082
Submitted by: Marcel Moolenaar <marcel@scc.nl>


41931 19-Dec-1998 julian

Reviewed by: Luoqi Chen, Jordan Hubbard
Submitted by: "Richard Seaman, Jr." <lists@tar.com>
Obtained from: linux :-)

Code to allow Linux Threads to run under FreeBSD.

By default not enabled
This code is dependent on the conditional
COMPAT_LINUX_THREADS (suggested by Garret)
This is not yet a 'real' option but will be within some number of hours.


41871 16-Dec-1998 bde

Removed the cast to a pointer in the definition of PS_STRINGS and
adjusted related casts to match (only in the kernel in this commit).
The pointer was only wanted in one place in kern_exec.c. Applications
should use the kern.ps_strings sysctl instead of PS_STRINGS, so they
shouldn't notice this change.


41650 10-Dec-1998 jkh

linux_pipe does not preserve the edx register. Linux and
programs using glibc expect edx to be preserved accross syscalls.
As a result, linux programs running in emulation mode can
have whatever value may be represented by edx clobbered.

PR: 9038
Submitted-By: Richard Seaman, Jr. <dick@tar.com>


41514 04-Dec-1998 archie

Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>


41105 12-Nov-1998 jkh

MF22: Bring in some linux sound ioctl support which I committed to 2.2
for PR 7792 but did not bring forward.

Submitted by: Avatar Liang <avatar@www.mmlab.cse.yzu.edu.tw>
PR: 8656


40203 11-Oct-1998 jdp

Fix a couple of out-of-bounds array references in mapping between
Linux and FreeBSD signal numbers. Also, check signal numbers passed
in from application programs for validity. Without these checks,
it is trivial to panic the system from a Linux program.


39978 05-Oct-1998 jfieber

Make async I/O on a socket work.

Although the current Sybase license does not permit running under
emulation, FreeBSD 3.0 is now "Sybase Ready" should the license change.


39977 05-Oct-1998 sos

In linux_newuname bzero the right type of struct (linux_newuname_t).


39799 30-Sep-1998 jfieber

Add several missing ioctl handlers. One needed by Sybase, the others
found while looking for the one.


39620 24-Sep-1998 jkh

MF22: revert time bogon.


39598 23-Sep-1998 jkh

return time in proper format for linux.


38679 31-Aug-1998 jkh

Argh! *Now* the correct 3.0 fix is committed.


38677 31-Aug-1998 jkh

Whoops! Stamp out a 2.2-ism that snuck between branches here.


38672 31-Aug-1998 jkh

Initial support for using linux X servers under emulation - to use an
XFree86 server, users need to create the following links in their
/compat/linux/dev directory (assuming kernel configured with 4 VTs).

lrwxrwxrwx 1 root wheel 7 Aug 30 22:59 tty0 -> console
lrwxrwxrwx 1 root wheel 5 Aug 30 22:45 tty1 -> ttyv0
lrwxrwxrwx 1 root wheel 5 Aug 30 22:45 tty2 -> ttyv1
lrwxrwxrwx 1 root wheel 5 Aug 30 22:45 tty3 -> ttyv2
lrwxrwxrwx 1 root wheel 5 Aug 30 22:45 tty4 -> ttyv3

VT switching is still not yet supported. Attempting to switch VT
currently will cause Xserver bus error.

Submitted by: Chain Lee <chain@110.net>


38354 16-Aug-1998 bde

Use [u]intptr_t instead of [u_]long for casts between pointers and
integers. Don't forget to cast to (void *) as well.


38344 15-Aug-1998 bde

Oops, the previous fix confused Linux's sigset_t with a pointer type.
It can be integral or a struct in POSIX, so it is difficult to print,
but it is actually declared as unsigned long. Assume that it is
unsigned integral.


38127 05-Aug-1998 bde

Converted the second last instance of hzto() to tvtohz().

Fixed nearby bugs (in linux_alarm()):
- the itimer for the alarm was relative to the epoch instead of relative
to the boot time. This was harmless because the itimer's interval is 0.
- the seconds arg was not checked for validity before converting it to a
possibly different value.
- printf format errors.

Improvements:
Don't use splclock(). splsoftclock() suffices. Don't complicate things
by micro-optimizing interrupt latency.

Minor improvements:
Various micro-optimizations to exploit the specialness of the alarm itimer
and the value 0.


37950 29-Jul-1998 bde

Fixed print format errors.


37548 10-Jul-1998 jkh

Quick and dirty support for Linux's mremap. Not used by anything
but quake2 AFAIK.

Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>


37287 30-Jun-1998 jmg

remove option LINUX as it did nothing, add DEBUG_LINUX to debug the
linux emulation...

(actually moved LINUX to opt_dontuse.h)


36735 07-Jun-1998 dfr

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


36587 02-Jun-1998 jkh

".. x11amp appears to be calling shmctl(id, IPC_RMID, 0) and the emulation
layer does not like the null shmid_ds buffer pointer. The emulation layer
returned an error without ever calling FreeBSD's shmctl, so the segments
were not being deleted when the reference count went to zero."

Submitted by: Kevin Street <street@iname.com>


36119 17-May-1998 phk

s/nanoruntime/nanouptime/g
s/microruntime/microuptime/g

Reviewed by: bde


35058 06-Apr-1998 phk

Make a kernel version of the timer* functions called timerval* to be
more consistent.

OK'ed by: bde


35034 04-Apr-1998 phk

Use microruntime() rather than doing it by hand.


34961 30-Mar-1998 phk

Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time. Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by: bde


34941 29-Mar-1998 peter

The linux chown syscall is more like lchown, a new chown syscall that
follows links was added.


34924 28-Mar-1998 bde

Moved some #includes from <sys/param.h> nearer to where they are actually
used.


33821 25-Feb-1998 bde

Removed redundant test against MAXDSIZ (the rlimit test is stronger).


33148 07-Feb-1998 msmith

In the words of the submitter:

----
I've worked to enhance the connect() patches.

I've just tested this with the Linux JDK appletviewer on an applet
that does a lot of connects, and it works as well as during my
previous tests.

The connect() patch is now a merge between my older patch and the
OpenBSD stuff. It ensures that any async error is returned by
connect() instead of getsockopt(SOL_SOCKET, SO_ERROR) as reasonnable
systems do.

There are also minor patches to implement IPPROTO_TCP for
get/setsocktopt(). These are also tested (with Linux Apache).
----

I would appreciate any feedback regarding these changes, as they'd
be very useful in 2.2.6.

Submitted by: pb@fasterix.freenix.org (Pierre Beyssac)


33044 03-Feb-1998 bde

Changed `inline' to `__inline' so that this file can be compiled by
`gcc -ansi'.

Changed NULL to 0 so that this file is more self-sufficient (now
only <sys/types.h> is a prerequisite).


32266 05-Jan-1998 jmb

sigh....forgot to update the DEBUG printf
to show both the path and the length args
to linux emulation truncate()

Submitted by: jmb


32265 05-Jan-1998 jmb

length argument to truncate() in linux emulation
was not being set copied to the bsd arguments..
frequently, resulting in files of over 100MB of NULs

PR: 386/5044
Reviewed by: jmb
Submitted by: (Richard Winkel) rich@math.missouri.edu


31784 16-Dec-1997 eivind

Make hidden COMPAT_43 dependencies explict. Options in headers is a
pain in the backside.


31778 16-Dec-1997 eivind

Make COMPAT_43 and COMPAT_SUNOS new-style options.


31730 15-Dec-1997 msmith

As described by the submitter:

These patches enables us to play quake2 .

Support linux keyboard ioctl for setting RAW, MEDIUMRAW and XLATE.

Support linux virtual terminal operations:
OPENQRY, GETMODE, SETMODE, GETSTATE, ACTIVATE, and WAITACTIVE.

Submitted by: Amancio Hasty <hasty@rah.star-gate.com>


31711 14-Dec-1997 msmith

As described by the submitter:

- emulate Linux IP_HDRINCL behaviour in sendto(): byte order fixed
Note that we do an extra getsockopt() on every sendto()
to check if the option is set because we don't keep state
in the emulator code. Is there a better way to implement
this?
- correct a bug (value of "name" not passed) with
getsockopt()

Submitted by: pb@fasterix.freenix.org (Pierre Beyssac)


31561 05-Dec-1997 bde

Don't include <sys/lock.h> in headers when only `struct simplelock' is
required. Fixed everything that depended on the pollution.


31198 17-Nov-1997 ahasty

Added support for linux sound ioctls:
LINUX_SNDCTL_DSP_GETOPTR
LINUX_SNDCTL_DSP_GETIPTR
LINUX_SNDCTL_DSP_SETTRIGGER
LINUX_SNDCTL_DSP_GETCAPS

With this rev level the linux realaudio player 5 and xquake should work.


30994 06-Nov-1997 phk

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.


30855 30-Oct-1997 kato

Securelevel and formatting fixes, and trapframe simplification.

Reviewed by: sos
Submitted by: bde


30837 29-Oct-1997 kato

Implement linux_iopl and linux_nice.


30804 28-Oct-1997 kato

Implement linux_semop, linux_semget and linux_semctl.

PR: 4355


29679 21-Sep-1997 gibbs

Update for changes in the callout interface.


28861 28-Aug-1997 kato

Moved include files which are independent of bs driver.


28039 10-Aug-1997 sos

Ops the arguments to copyin was in the wrong order..
This has survived since the first version, sigh.


27557 20-Jul-1997 bde

Removed unused #includes.


26378 02-Jun-1997 dfr

Make this thing actually compile.


26366 02-Jun-1997 msmith

Oops, remove some bogus debugging code that crept in with the last commit.


26364 02-Jun-1997 msmith

Add support for the SIOCGIFHWADDR ioctl, commonly used by
license managers to obtain the host's ethernet address as
a key.

Note that this implementation takes the first hardware address for
the first ethernet interface found, and disregards the interface name
that may be passed in, as linux ethernet devices are all "ethX".


25219 28-Apr-1997 msmith

Always include PROT_READ for Linux mmap operations.
Submitted by: Hannu Savolainen <hannu@voxware.pp.fi> via jkh


24672 06-Apr-1997 dfr

Remove dependancy on UFS' DIRBLKSIZ definition.

2.2 candidate.

Submitted by: bde


24654 05-Apr-1997 dfr

Fix linux_getdents so that it can cope with filesystems which translate
the directory format (ext2fs, cd9660). For these filesystems, it must use
cookies to find the correct offset to use for subsequent reads. Without it,
linux /bin/ls tends to loop re-reading the same block over and over again.

2.2 candidate.


24478 01-Apr-1997 bde

Removed potentially harmful garbage <vm/lock.h> and fixed bogus
use of it. It was actually harmless because the use was null due
to fortuitous include orders and identical (wrong) idempotency
macros.


24205 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 3: include
<sys/filio.h> instead of <sys/ioctl.h> in non-network non-tty files.


24203 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 1: don't include
it when it is not used. In most cases, the reasons for including it
went away when the special ioctl headers became self-sufficient.


24131 23-Mar-1997 bde

Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place. Most things don't depend on file.h stuff at all.


22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


22543 10-Feb-1997 mpp

Make this compile again after the Lite2 merge.

VOP_UNLOCK was being called with the wrong mumber of arguments.

Also silenced a -Wall warning.


22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


20691 19-Dec-1996 bde

Fixed lseek() on named pipes. It always succeeded but should always fail.
Broke locking on named pipes in the same way as locking on non-vnodes
(wrong errno). This will be fixed later.

The fix involves negative logic. Named pipes are now distinguished from
other types of files with vnodes, and there is additional code to handle
vnodes and named pipes in the same way only where that makes sense (not
for lseek, locking or TIOCSCTTY).


20101 03-Dec-1996 fenner

Add IP_OPTIONS and the multicast-related setsockopts to the
list of IP setsockopts the Linux emulator recognizes.

Explicitly disallow IP_HDRINCL since Linux's handling of
raw output is different than BSD's.

Closes PR#kern/2111.

Submitted by: y-nakaga@ccs.mt.nec.co.jp (Yoshihisa NAKAGAWA)


19414 05-Nov-1996 smpatel

Add audio mixer ioctls.
Only writing to the mixer is implemented.


18027 03-Sep-1996 bde

Changed type of ni_dirp in `struct namei' from caddr_t to `const char *'
so that the compiler can see that it is OK to use const strings in
NDINIT(). Some emulators want to use paths of the form "/compat/foo".
Removed the casts that hid the non-problem. Didn't fix the missing
consts in syscalls.master that hid the non-problem.


17450 05-Aug-1996 nate

Fix memory leak bug in the path parsing code which never released it's
buffer in certain error conditions. Sync up the code to that in NetBSD
where applicable.

Reviewed by: Gary Jennejohn <garyj@munich.netsurf.de>
Submitted by: Michael Smith <msmith@atrad.adelaide.edu.au>
Obtained from: NetBSD sources


16632 23-Jun-1996 bde

Removed unused #include. Linux doesn't support SCO consoles.


16322 12-Jun-1996 gpalmer

Clean up -Wunused warnings.

Reviewed by: bde


15538 02-May-1996 phk

First pass at cleaning up macros relating to pages, clusters and all that.


15117 07-Apr-1996 bde

Removed never-used #includes of <machine/cpu.h>. Many were apparently
copied from bad examples.


14703 19-Mar-1996 bde

Fixed unsigned longs that should have been vm_offset_t.

vm_offset_t is currently unsigned long but should probably be plain
unsigned for i386's to match the choice of minimal types to represent
for fixed-width types in Lite2. Anyway, it shouldn't be assumed
to be unsigned long.

I only fixed the type mismatches that were detected when I changed
vm_offset_t to unsigned. Only pointer type mismatches were detected.


14584 12-Mar-1996 peter

Remove references to MAP_FILE.. That is now "default" and is only
a "#define MAP_FILE 0" that is still there for net-2 source compatability.


14471 10-Mar-1996 peter

Fix the vm_map_remove and vm_map_protect calls.. Somewhere along the
line, these had got (start, length) arguments instead of (start, end)
args. This could be the cause of Robert Sanders lockups with ZMAGIC
binaries.


14466 10-Mar-1996 peter

Implement rudumentry support for the linux TIOC[SG]ETSERIAL ioctl's.
To complete this, some extra state has to be kept somewhere so that the
B38400 flag in Linux can be correctly translated to/from either 38400,
57600 or 115200.

Submitted by: Robert Sanders <rsanders@mindspring.com>


14465 10-Mar-1996 peter

Fix the getdents() emulation, the Linux ELF libraries use this, and
this code was not quite right (linux has a readdir and getdents syscall,
with the same args. readdir only returns one entry and uses a mutant
dirent structure. This code was also returning the mutant form for
getdents as well. My fault for missing this before.)


14463 10-Mar-1996 peter

Fix a (mostly harmless) bogon when allocating space above the stack
in the stack gap..


14456 10-Mar-1996 sos

First attempt at FreeBSD & Linux ELF support.

Compile and link a new kernel, that will give native ELF support, and
provide the hooks for other ELF interpreters as well.

To make native ELF binaries use John Polstras elf-kit-1.0.1..
For the time being also use his ld-elf.so.1 and put it in
/usr/libexec.

The Linux emulator has been enhanced to also run ELF binaries, it
is however in its very first incarnation.
Just get some Linux ELF libs (Slackware-3.0) and put them in the
prober place (/compat/linux/...).
I've ben able to run all the Slackware-3.0 binaries I've tried
so far.
(No it won't run quake yet :)


14381 04-Mar-1996 peter

update linux_times() and linux_utime() emulation,
fix sigsuspend() (actually back out my recent change there)
and regen the syscall tables..


14371 04-Mar-1996 peter

Add support for LINUX_TCSETAW and LINUX_TCSETAF, which Linux-pine uses.

Submitted by: Robert Sanders <rsanders@mindspring.com>


14361 03-Mar-1996 peter

Add support for the old-style Linux termio (not termios) TCGETA etc.

Also, LINUX_POSIX_VDISABLE is \0, FreeBSD's is 0xff. Convert between them.

This enables some more programs to run, including the Livingston Portmaster
utilities (PMtools).

Submitted by: Robert Sanders <rsanders@mindspring.com>


14342 02-Mar-1996 peter

Minor touch-up... make two functions static, and add missing $Id$


14331 02-Mar-1996 peter

Mega-commit for Linux emulator update.. This has been stress tested under
netscape-2.0 for Linux running all the Java stuff. The scrollbars are now
working, at least on my machine. (whew! :-)

I'm uncomfortable with the size of this commit, but it's too
inter-dependant to easily seperate out.

The main changes:

COMPAT_LINUX is *GONE*. Most of the code has been moved out of the i386
machine dependent section into the linux emulator itself. The int 0x80
syscall code was almost identical to the lcall 7,0 code and a minor tweak
allows them to both be used with the same C code. All kernels can now
just modload the lkm and it'll DTRT without having to rebuild the kernel
first. Like IBCS2, you can statically compile it in with "options LINUX".

A pile of new syscalls implemented, including getdents(), llseek(),
readv(), writev(), msync(), personality(). The Linux-ELF libraries want
to use some of these.

linux_select() now obeys Linux semantics, ie: returns the time remaining
of the timeout value rather than leaving it the original value.

Quite a few bugs removed, including incorrect arguments being used in
syscalls.. eg: mixups between passing the sigset as an int, vs passing
it as a pointer and doing a copyin(), missing return values, unhandled
cases, SIOC* ioctls, etc.

The build for the code has changed. i386/conf/files now knows how
to build linux_genassym and generate linux_assym.h on the fly.

Supporting changes elsewhere in the kernel:

The user-mode signal trampoline has moved from the U area to immediately
below the top of the stack (below PS_STRINGS). This allows the different
binary emulations to have their own signal trampoline code (which gets rid
of the hardwired syscall 103 (sigreturn on BSD, syslog on Linux)) and so
that the emulator can provide the exact "struct sigcontext *" argument to
the program's signal handlers.

The sigstack's "ss_flags" now uses SS_DISABLE and SS_ONSTACK flags, which
have the same values as the re-used SA_DISABLE and SA_ONSTACK which are
intended for sigaction only. This enables the support of a SA_RESETHAND
flag to sigaction to implement the gross SYSV and Linux SA_ONESHOT signal
semantics where the signal handler is reset when it's triggered.

makesyscalls.sh no longer appends the struct sysentvec on the end of the
generated init_sysent.c code. It's a lot saner to have it in a seperate
file rather than trying to update the structure inside the awk script. :-)

At exec time, the dozen bytes or so of signal trampoline code are copied
to the top of the user's stack, rather than obtaining the trampoline code
the old way by getting a clone of the parent's user area. This allows
Linux and native binaries to freely exec each other without getting
trampolines mixed up.


14114 16-Feb-1996 peter

This is an extract of changes from what I am currently running...
- Optimise the linux a.out loading and uselib system calls so they
take advantage of some of John's recent interface improvements.
Basically, this means they make far less map changes than before.
- Attempt to plug some potentially nasty kernel_map memory leaks..
- Improve support for QMAGIC libs (I only use QMAGIC (ie: a.out libraries from
the slackware 3.0 dist) but this depends on other changes to enhance
the /compat/linux support)
- uselib goes out through a single exit as part of the resource tracking
that I did when closing the resource leaks on errors. This could be
cleaner than what I did, but making a 30-deep nested if/else was not my
idea of fun, neither did I want to repeat the same code 30 times over for
each failure possibility. I guess this function needs to be split into
smaller functions to solve this.

I've been running the Linux Netscape-2.0 (with Java) to test this, and apart
from the long-standing problem with the missing scrollbars, it appears to
still work as before with ZMAGIC libs (and the leaks).. However, I've
been using it with mods for the signal trampoline code for native linux stack
frames on signals and exterminated the blasted sigreturn printf() problem,
so I can't be certain that there is not a dependency on something else.


13739 30-Jan-1996 peter

Call pipe_stat() when presented with a DTYPE_PIPE file in the linux
fstat() syscall, rather than panic("linux newfstat").

(Note: I've extracted this from a larger set of diffs, I'm confident I've
not missed any dependencies but can't modload it to test it on my system)


13503 19-Jan-1996 dyson

Fixed vm_map_find for new vm updates.


13420 14-Jan-1996 sos

Add linux_mknod so that it will do mkfifo if needed...


13334 08-Jan-1996 peter

reran makesyscalls

Always call the SYSV ipc functions, stubs will take their place if
necessary.


13264 05-Jan-1996 wollman

The Linux emulator depends on SYSV IPC but doesn't actually reference
the options.


13113 30-Dec-1995 sos

Oops, forgot a little difference between my src-tree and ours...


13111 29-Dec-1995 sos

My first shot at get sound to work on the emulator.
Inspired by the work Amancio Hasty has done, but implemented
somewhat differently.


12867 15-Dec-1995 peter

Update linux_ipc.c to use the now generated prototypes for the shm* calls
it makes while emulating the linux equivalents.


12860 15-Dec-1995 peter

Initial attempt at getting Linux QMAGIC shared lib support. I have
successfully run linux netscape 2.0b3 with a QMAGIC ld.so and libc/libm
that I found on some linux machine that I _think_ is running slackware 3.0.

There are still problems.. ld.so claims the libraries are the wrong
format, but it still runs anyway.. :-/ The QMAGIC ld.so also screams
about needing ld.so.cache, and running a linux ldconfig is quite
educational. You soon learn to run "chroot /compat/linux /bin/ldconfig"
where ldconfig is living in /compat/linux/bin. :-]

(Lets just say that it puts loads of symlinks in /usr/lib otherwise :-)


12858 15-Dec-1995 peter

Clean up some warnings by using the generated structures in <sys/sysproto.h>
for passing to the bsd system calls, rather than inveninting our own
equivalent structures.


12842 14-Dec-1995 bde

Restored a vm #include.


12689 09-Dec-1995 peter

Attempt to make the Linux LKM compile again after the recent VM include
de-nesting changes...
(I figured this might be usefulif it actually built, since I've told
everybody to rebuild it or die.. :-)


12652 06-Dec-1995 bde

Include <vm/vm.h> explicitly to avoid breaking when vnode_if.h doesn't
include vm stuff.


12458 22-Nov-1995 bde

Completed function declarations and added prototypes.

Removed some unnecessary #includes.

Fixed warnings about nested externs.


12130 06-Nov-1995 dg

All:
Changed vnodep -> vp for consistency with the rest of the kernel, and
changed iparams -> imgp for brevity.

kern_exec.c:
Explicitly initialized some additional parts of the image_params struct
to avoid bzeroing it. Rewrote the set-id code to reduce the number of
logical tests. The rewrite exposed a mostly benign bug in the algorithm:
traced set-id images would get ktracing disabled even if the set-id didn't
happen for other reasons.


11418 10-Oct-1995 swallace

Fix the getdirentries of ibcs2 to handle uneven DIRBLKSIZ offsets.
Slight modification from previous fix.

Also, fix problem where an entry would be skipped next call if not enough room
in buffer current call.


11163 04-Oct-1995 julian

Submitted by: Juergen Lock <nox@jelal.hb.north.de>
Obtained from: other people on the net ?

1. stepping over syscalls (gdb ni) sends you to DDB, and returned
to the wrong address afterwards, with or without DDB. patch in
i386/i386/trap.c below.

2. the linux emulator (modload'ed) still causes panics with DIAGNOSTIC,
re-applied a patch posted to one of the lists...


10358 28-Aug-1995 julian

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadbal modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root of disk still works just fine..
mfs should work but is untested. (tomorrows task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.


10355 28-Aug-1995 swallace

Modified linux_readdir() function to properly handle Linux readdir()
calls with a byte size of 1. This special case was not
correctly emulated. Now programs such as a simple 'ls' to a commercial
Macintosh emulator called 'executor' will work correctly.


9313 25-Jun-1995 sos

First incarnation of our Linux emulator or rather compatibility code.
This first shot only incorporaties so much functionality that DOOM
can run (the X version), signal handling is VERY weak, so is many
other things. But it meets my milestone number one (you guessed it
- running DOOM).

Uses /compat/linux as prefix for loading shared libs, so it won't
conflict with our own libs.

Kernel must be compiled with "options COMPAT_LINUX" for this to work.